
On Sparsity Awareness in Distributed Computations

Keren Censor-Hillel    Dean Leitersdorf    Volodymyr Polosukhin
(Technion, {ckeren, leitersdorf, po}@cs.technion.ac.il)

We extract a core principle that underlies seemingly different fundamental distributed settings, which is that sparsity awareness may induce faster algorithms for core problems in these settings. To leverage this, we establish a new framework by developing an intermediate auxiliary model which is weak enough to be successfully simulated in the classic $\mathsf{CONGEST}$ model given low mixing time, as well as in the recently introduced $\mathsf{Hybrid}$ model. We prove that despite imposing harsh restrictions, this artificial model allows balancing massive data transfers with a maximal utilization of bandwidth. We then exemplify the power we gain from our methods, by deriving fast shortest-paths algorithms which greatly improve upon the state-of-the-art.

Specifically, we obtain the following end results for graphs of $n$ nodes:

  • A $(3+\varepsilon)$ approximation for weighted, undirected APSP in $(n^{1/2}+n/\delta)\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}$ rounds in the $\mathsf{CONGEST}$ model, where $\delta$ is the minimum degree of the graph and $\tau_{\text{mix}}$ is its mixing time. For graphs with $\delta=\tau_{\text{mix}}\cdot 2^{\omega(\sqrt{\log n})}$, this takes $o(n)$ rounds, despite the $\Omega(n)$ lower bound known for general graphs [Nanongkai, STOC'14].

  • An $(n^{7/6}/m^{1/2}+n^{2}/m)\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}$-round exact SSSP algorithm in the $\mathsf{CONGEST}$ model, for graphs with $m$ edges and a mixing time of $\tau_{\text{mix}}$. This improves upon the previous algorithm of [Chechik and Mukhtar, PODC'20] for significant ranges of values of $m$ and $\tau_{\text{mix}}$.

  • A $\mathsf{Congested\ Clique}$ simulation in the $\mathsf{CONGEST}$ model which improves upon the previous state-of-the-art simulation of [Ghaffari, Kuhn, and Su, PODC'17] by a factor that is proportional to the average degree in the graph.

  • An $\tilde{O}(n^{5/17}/\varepsilon^{9})$-round algorithm for a $(1+\varepsilon)$ approximation for SSSP in the $\mathsf{Hybrid}$ model. The only previous $o(n^{1/3})$-round algorithm for distance approximations in this model is for a much larger approximation factor of $(1/\varepsilon)^{O(1/\varepsilon)}$ in $\tilde{O}(n^{\varepsilon})$ rounds [Augustine, Hinnenthal, Kuhn, Scheideler, Schneider, SODA'20].

1 Introduction

The overarching theme of this paper is laying down an algorithmic infrastructure and employing it for designing fast algorithms in seemingly unrelated distributed settings, namely, the classic $\mathsf{CONGEST}$ model and the recently-introduced $\mathsf{Hybrid}$ model.

The $\mathsf{CONGEST}$ model [56] abstracts a synchronous network of $n$ nodes, in which in each round of computation, each node can send a message of $O(\log n)$ bits on each of its links. A recent line of work addresses computing MST, distances, and data summarization in $\mathsf{CONGEST}$ on graphs with low mixing time [38, 40, 60], and finding small subgraphs in $\mathsf{CONGEST}$ benefits from efficient computation inside components of low mixing time [20, 21, 22, 15, 13, 30, 48]. Low mixing time is, in particular, a property of expander graphs, which have been shown to be useful for designing data centers [43, 27].

The $\mathsf{Hybrid}$ model, recently introduced by [8], abstracts networks supporting high-bandwidth communication over local edges, as well as very low-bandwidth communication over global edges. Aligned with most previous work on this model, we assume here unbounded bandwidth for messages to neighbors, and $O(\log n)$-bit messages to a constant number of nodes anywhere in the network. This model in particular abstracts recent developments in hybrid data centers [26, 42, 47, 62]. Most research in the $\mathsf{Hybrid}$ model has been devoted to distance computation problems [8, 49, 32, 18].

While these settings highly differ, we pinpoint an approach that underlies computation in both, namely, sparsity and density awareness. The key tasks we tackle are those requiring the transfer of massive amounts of data. Our general approach is to design load balancing mechanisms that leverage the full bandwidth of a given communication model. Examples of tasks that enjoy our framework are matrix multiplication and distance computations.

For the purpose of our framework, as an auxiliary tool (not as a computational model in its own right), we define the $\mathsf{Anonymous\ Capacitated}$ model with capacity $c$ (abbreviated $\mathsf{AC(c)}$), which is an all-to-all setting whose main characteristics are a limit on the bandwidth per node and the anonymity of nodes. To cope with the harsh nature of the $\mathsf{AC(c)}$ model, which is needed in order to allow it to be efficiently simulated, we develop a distributed data structure and accompanying algorithms, dedicated towards load balancing and full utilization of the available bandwidth.

We then show sparsity aware simulations of the $\mathsf{AC(c)}$ model in the $\mathsf{CONGEST}$ and $\mathsf{Hybrid}$ settings. Specifically, the simulations focus on utilizing all the available bandwidth in the underlying models, even when given highly skewed inputs. Combined, these yield our end results: fast algorithms for distance computations in low-mixing-time $\mathsf{CONGEST}$ and in the $\mathsf{Hybrid}$ model.

A first flavor of our end results: One of our main contributions is proving that the size $n$ of the graph is not fine-grained enough to capture the complexity of the all-pairs shortest-paths problem (APSP) in the $\mathsf{CONGEST}$ model. While there is an $\tilde{\Omega}(n)$ lower bound for general graphs, even when allowing large approximation factors [54], and a matching randomized algorithm giving an exact solution [12], we show that one can go significantly below this complexity, depending on the minimal degree in the graph and its mixing time.

Theorem 1.1 ($(3+\varepsilon)$-Approximation for APSP in $\mathsf{CONGEST}$).

For any constant $0<\varepsilon<1$, and weighted, undirected graph $G$ with minimal degree $\delta$ and mixing time $\tau_{\text{mix}}$, there is an algorithm in the $\mathsf{CONGEST}$ model computing a $(3+\varepsilon)$ approximation to APSP on $G$ within $(n^{1/2}+n/\delta)\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}$ rounds (note that there is a typo in the complexity stated for this theorem in the SPAA'21 proceedings), w.h.p.

For any constant $0<\gamma<1$, consider a graph $G$ with $\delta=n^{\gamma}\cdot\tau_{\text{mix}}\cdot 2^{\omega(\sqrt{\log n})}$. Using Theorem 1.1, it is possible to approximate weighted, undirected APSP on $G$ in $n^{1/2}\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}+O(n^{1-\gamma})$ rounds, w.h.p., in the $\mathsf{CONGEST}$ model, which is a major improvement over the linear complexity of the general case. This is aligned with a conclusion obtained from the single-source shortest paths (SSSP) algorithm of [40], which reflects that $n$ and the diameter $D$ are insufficient for capturing the complexity of SSSP. Our result suggests that for APSP the dependence parameters could be $n$, $\delta$ and $\tau_{\text{mix}}$, and this opens the complexity landscape of APSP in the $\mathsf{CONGEST}$ model to further exciting research.
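To make the arithmetic explicit, substituting $\delta=n^{\gamma}\cdot\tau_{\text{mix}}\cdot 2^{\omega(\sqrt{\log n})}$ into the bound of Theorem 1.1 gives

$$(n^{1/2}+n/\delta)\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}=n^{1/2}\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}+\frac{n^{1-\gamma}}{\tau_{\text{mix}}\cdot 2^{\omega(\sqrt{\log n})}}\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}=n^{1/2}\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}+O(n^{1-\gamma}),$$

where the last step holds since the $\tau_{\text{mix}}$ factors cancel and the $2^{\omega(\sqrt{\log n})}$ factor in $\delta$ swallows the $2^{O(\sqrt{\log n})}$ overhead.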

1.1 Our Contributions

1.1.1 Fast Algorithms for the $\mathsf{Hybrid}$ Model

The pioneering works of [8, 49] lay down technical foundations showing that utilizing both the local and global edges in the $\mathsf{Hybrid}$ model allows solutions which are much faster than algorithms using only the local or only the global edges. One of their prime contributions is showing that the complexity of exact and approximate APSP is $\tilde{\Theta}(n^{1/2})$ rounds.

The $\tilde{\Theta}(n^{1/3})$-round regime is also of importance in this model, as [49, 18] show a variety of algorithms with this complexity. For diameter, a lower bound of $\tilde{\Omega}(n^{1/3})$ rounds for exact unweighted diameter and for a $(2-\varepsilon)$ approximation for weighted diameter is shown (the first lower bound in the $\mathsf{Hybrid}$ model for a problem with a small output), matched by an $\tilde{O}(n^{1/3})$-round algorithm for a $(1+\varepsilon)$ approximation for unweighted diameter, and a $2$ approximation for weighted diameter. For weighted distances, $\tilde{O}(n^{1/3})$-round algorithms are shown for approximations from polynomially many sources, in addition to exact distances from $\tilde{O}(n^{1/3})$ sources.

Our main contribution in the $\mathsf{Hybrid}$ model is breaking below $o(n^{1/3})$ rounds for a $(1+\varepsilon)$ approximation for weighted single-source shortest paths (SSSP), which also implies a $(2+\varepsilon)$ approximation for weighted diameter. As we elaborate upon in the following subsections, this requires establishing an entire foundation of techniques which are a core contribution of the paper. We show the following algorithm which completes in $\tilde{O}(n^{5/17})$ rounds, w.h.p. (as is common, high probability means at least $1-n^{-c}$, for a constant $c>1$).

Theorem 1.2 ($(1+\varepsilon)$-Approximation for SSSP in $\mathsf{Hybrid}$).

Given a weighted, undirected graph $G=(V,E)$, with $n=|V|$ and $m=|E|$, a value $0<\varepsilon<1$, and a source $s\in V$, there is a $\mathsf{Hybrid}$ model algorithm computing a $(1+\varepsilon)$-approximation of SSSP from $s$, in $\tilde{O}(n^{5/17}/\varepsilon^{9})$ rounds, w.h.p.

Our results raise an interesting open question: what is the best round complexity for a $2$ approximation of the diameter? We show that for a $(2+\varepsilon)$ approximation, $o(n^{1/3})$ rounds suffice, while a $(2-\varepsilon)$ approximation requires $\tilde{\Omega}(n^{1/3})$ rounds, as stated above.

1.1.2 Fast Algorithms for the $\mathsf{CONGEST}$ Model

As a warm-up, we illustrate the power of our $\mathsf{AC(c)}$ model and its simulation in the $\mathsf{CONGEST}$ model, by showing how to simulate the $\mathsf{Congested\ Clique}$ model (a synchronous model where every two nodes can exchange messages of $O(\log n)$ bits in every round) in $\mathsf{AC(c)}$ and hence in $\mathsf{CONGEST}$. Simulation of algorithms from the $\mathsf{Congested\ Clique}$ model may both significantly simplify algorithm design as well as improve results in other distributed models, as seen in previous works [38, 20, 21, 22, 15, 13, 30, 49].

We get the following result, from which one can already show new algorithms which beat the state-of-the-art in the $\mathsf{CONGEST}$ model by simulating existing $\mathsf{Congested\ Clique}$ algorithms.

Theorem 1.3 ($\mathsf{Congested\ Clique}$ Simulation in $\mathsf{CONGEST}$).

Consider a graph $G$. Let $A$ be an algorithm which runs in the $\mathsf{Congested\ Clique}$ model over $G$ in $t$ rounds. If each node $v$ has $O(\deg{v}\cdot\log n)$ bits of input and output, then there is an algorithm which simulates $A$ in $(t\cdot n^{2}/m)\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}$ rounds of the $\mathsf{CONGEST}$ model over $G$, w.h.p.

Here, $\tau_{\text{mix}}$ is the mixing time of the graph, which is roughly the number of rounds required for a lazy random walk to reach the stationary distribution (a precise definition is not needed for understanding our results). A previous non-trivial simulation of the $\mathsf{Congested\ Clique}$ model in the $\mathsf{CONGEST}$ model is due to [38]. It emulates one $\mathsf{Congested\ Clique}$ round in $O(n\cdot\tau_{\text{mix}}\cdot(1+\frac{\Delta^{2}\log{n}\cdot\tau_{\text{mix}}}{n})\log{n}\log^{*}{n})$ rounds of the $\mathsf{CONGEST}$ model, where $\Delta$ is the maximum degree in the graph. Moreover, for certain graphs, this is improved to $\tilde{O}(n\cdot\tau_{\text{mix}})$. Thus, our simulation improves upon both previous algorithms for all graphs with $m=n\cdot 2^{\omega(\sqrt{\log n})}$ edges by a factor that is roughly proportional to the average degree, namely, faster by a factor of $m/(n\cdot 2^{\alpha\cdot\sqrt{\log n}})$ for some constant $\alpha\geq 1$. Intuitively, in [38], if every node in $\mathsf{CONGEST}$ desires to message every other node then this requires many rounds. We circumvent this by sending the input of low-degree nodes to high-degree ones, which then simulate the $\mathsf{Congested\ Clique}$ model and send back the output to the low-degree nodes.

Finally, such a simulation implies a relation between lower bounds in the $\mathsf{CONGEST}$ and $\mathsf{Congested\ Clique}$ models. Specifically, simulating one $\mathsf{Congested\ Clique}$ round in $T$ rounds of the $\mathsf{CONGEST}$ model shows that a lower bound of $R$ rounds in the $\mathsf{CONGEST}$ model implies a lower bound of $R/T$ rounds in the $\mathsf{Congested\ Clique}$ model. Due to [29, Theorem 4], we know that lower bounds for some problems in the $\mathsf{Congested\ Clique}$ model imply lower bounds in bounded-depth circuit complexity, and are therefore considered hard to obtain. Plugging our results from Theorem 1.3 into the $R/T$ lower bound for the $\mathsf{Congested\ Clique}$ model shows that if one constructs a family of graphs $\mathcal{G}$ with $m$ edges and $\tau_{\text{mix}}$ mixing time, for which solving some problem $P$ in the $\mathsf{CONGEST}$ model requires $R$ rounds, then $P$ has a lower bound of $R\cdot m/(n^{2}\cdot\tau_{\text{mix}}\cdot 2^{\alpha\sqrt{\log n}})$ rounds in the $\mathsf{Congested\ Clique}$ model for some constant $\alpha\geq 1$. This means that any value of $\tau_{\text{mix}}$ below $R\cdot m/(n^{2}\cdot 2^{\alpha\sqrt{\log n}})$ implies a lower bound that is considered hard in the $\mathsf{Congested\ Clique}$ model.

SSSP.

The current state-of-the-art exact SSSP algorithm in the $\mathsf{Congested\ Clique}$ model, due to [14], runs in $O(n^{1/6})$ rounds. Using our result from Theorem 1.3, a simulation of this algorithm in the $\mathsf{CONGEST}$ model runs in $(n^{13/6}/m)\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}$ rounds, w.h.p. We further improve this result by presenting a solution that is faster for any graph which is not extremely dense, namely, for which $m=o(n^{2})$. In a nutshell, our speed-up is due to the fact that the $\mathsf{Congested\ Clique}$ algorithm does not use all of the $\Omega(n^{2})$ bandwidth available to it in every round, and so it is inefficient to simulate directly in $\mathsf{CONGEST}$. Thus, we instead simulate our $\mathsf{AC(c)}$ model, which better reflects the complexity of such algorithms, giving us the following.

Theorem 1.4 (Exact SSSP in $\mathsf{CONGEST}$).

Given a weighted, undirected graph $G$ and a source node $s\in G$, there is an algorithm in the $\mathsf{CONGEST}$ model that ensures that every node $v\in G$ knows the value $d_{G}(s,v)$, within $(n^{7/6}/m^{1/2}+n^{2}/m)\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}$ rounds, w.h.p.

Consider graphs with a mixing time $\tau_{\text{mix}}=\tilde{O}(1)=\operatorname{polylog}(n)$. The diameter $D$ of such graphs is also $\tilde{O}(1)$. If such a graph has at least $m=n^{3/2}\cdot 2^{\omega(\sqrt{\log n})}$ edges, then our SSSP algorithm runs asymptotically faster than the state-of-the-art $\tilde{O}(n^{1/2}D^{1/4})$-round algorithm of [23].

APSP.

We now turn our attention to the APSP problem. As observed by Nanongkai [54, Observation 1.4], to solve APSP in the $\mathsf{CONGEST}$ model, a node $v$ is required to learn $\tilde{O}(n)$ bits of information. In the worst case, for a node with a constant degree, this takes $\tilde{\Omega}(n)$ rounds. For this reason, slightly modified requirements for the output have been considered.

We consider a shortest-path query problem, in which we separate the computation of shortest paths into two phases: one in which the input graph is pre-processed, and another in which a query set of pairs of nodes $Q\subseteq V\times V$ is given, and every node $v$ is required to learn the distance to every node $u$ such that $(v,u)\in Q$. The round complexity of this problem is thus bi-criteria, measured both in terms of pre-processing time and in terms of query time. We analyze the round complexity of the query in terms of the query load, where given a node $v$, $q_{v}=\left|\{u\ |\ (v,u)\in Q\lor(u,v)\in Q\}\right|$ denotes the number of queries which $v$ is a part of, and the total, normalized query load is $\ell=\max_{v\in V}q_{v}/\deg(v)$.
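To make the load definition concrete, the following minimal Python sketch (our own illustration; the dictionary-based representations of the query set and degrees are assumptions for the example) computes $q_{v}$ and $\ell$:

def query_load(queries, degree):
    # queries: iterable of (v, u) pairs from Q; degree: dict node -> deg(v).
    # q[v] counts the nodes u with (v, u) or (u, v) in Q.
    partners = {v: set() for v in degree}
    for v, u in queries:
        partners[v].add(u)
        partners[u].add(v)
    q = {v: len(p) for v, p in partners.items()}
    # normalized query load: max over v of q_v / deg(v)
    return q, max(q[v] / degree[v] for v in degree)

# Example: query_load({(1, 2), (3, 2)}, {1: 2, 2: 4, 3: 1}) returns
# ({1: 1, 2: 2, 3: 1}, 1.0), the load being attained at node 3.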

Theorem 1.5 ($(3+\varepsilon)$-Approximation for Shortest-Path Query in $\mathsf{CONGEST}$).

For any constant $0<\varepsilon<1$ and a weighted, undirected graph $G$ with $m$ edges, there is a $\mathsf{CONGEST}$ algorithm which, after $(n^{1/2}+n^{11/6}/m)\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}$ rounds of pre-processing, solves any instance of the $(3+\varepsilon)$-approximate shortest path query problem with a known load $\ell$, in $\ell\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}$ rounds, w.h.p.

Denoting by $\delta$ the minimal degree in the graph, one gets $\ell\leq n/\delta$ and $m\geq n\cdot\delta/2$, which implies Theorem 1.1 stated above.

Finally, we consider a version of APSP, which we call Scattered APSP, where the distance between every pair of nodes is known to some node, not necessarily the endpoints themselves. That is, we require that every node $u$ knows, for every node $v$, the identity of a node $w_{uv}$ which stores the distance $d_{G}(u,v)$.

Theorem 1.6 ($(3+\varepsilon)$-Approximation for Scattered APSP in $\mathsf{CONGEST}$).

There exists an algorithm in the $\mathsf{CONGEST}$ model, that given a weighted, undirected input graph $G=(V,E)$ with $n=|V|$ and $m=|E|$, and some constant $0<\varepsilon<1$, solves the $(3+\varepsilon)$-approximate Scattered APSP problem on $G$, within $((n^{11/6}/m+n^{1/3}+m/n)/\varepsilon+n^{1/2})\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}$ rounds (the typo note to Theorem 1.1 applies here as well), w.h.p.

Roadmap.

After a survey of additional related work, Section 2 provides required definitions. Additional preliminaries appear in Appendix A. Section 3 is dedicated to the definition of carrier configurations in the $\mathsf{AC(c)}$ model, and to a sample proof that gives a flavor of our techniques, namely, sparse matrix multiplication. The bulk of our infrastructure for $\mathsf{AC(c)}$ is deferred to Appendix B. Finally, Section 4 and Section 5 provide proofs of our end results in the $\mathsf{Hybrid}$ and $\mathsf{CONGEST}$ models, respectively. Both sections have some content deferred to Appendix C and Appendix D, respectively. We conclude with a discussion in Section 6.

1.2 Additional Related Work

$\mathsf{Hybrid}$.

There is growing interest in the recently introduced $\mathsf{Hybrid}$ network model [8], which is further studied by [49, 32, 18]. [8, 49, 18] consider the same values for the local bandwidth $\lambda$ and global bandwidth $\gamma$ as we do, and address mainly distance computations. The complexity of exact weighted APSP is $\tilde{\Theta}(n^{1/2})$ rounds, by combining the upper bound of [49] and the lower bound of [8]. The complexity of $3$-approximate weighted $n^{2/3}$-SSP is $\tilde{\Theta}(n^{1/3})$ rounds, by combining the upper bound of [18] and the lower bound of [49]. [8, 49] show how to simulate one round of the $\mathsf{Broadcast\ Congested\ Clique}$ (a synchronous distributed model where every node can broadcast a (same) $O(\log n)$-bit message to all nodes per round) and $\mathsf{Congested\ Clique}$ models on the skeleton graph with $\tilde{O}(n^{2/3})$ nodes in $\tilde{O}(n^{1/3})$ rounds of the $\mathsf{Hybrid}$ model, and obtain various distance related results using them. For example, [49] show $(7+\varepsilon)$ and $(3+\varepsilon)$ approximations for weighted $n^{x}$-SSP in $O(n^{1/3}/\varepsilon)$ and $\tilde{O}(n^{0.397}+n^{x/2})$ rounds w.h.p., respectively. [18] improve on some of those results by simulating ad-hoc models which exploit additional communication abilities of the $\mathsf{Hybrid}$ model. For instance, they show a $3$-approximation for weighted $n^{x}$-SSP in $O(n^{1/3}/\varepsilon)$ rounds w.h.p. [32] solve distance problems for sparse families of graphs using local congestion bounded by $\log n$. Another special case of the $\mathsf{Hybrid}$ model is the $\mathsf{Node\text{-}Capacitated\ Clique}$ model, which contains only global edges [6, 7].

$\mathsf{CONGEST}$.

Distance problems are extensively studied in the $\mathsf{CONGEST}$ model, being fundamental tasks. One of the important problems in the $\mathsf{CONGEST}$ model is $(1+\varepsilon)$-approximate SSSP [58, 11, 44]. The state-of-the-art randomized algorithm [11] solves it in $\tilde{O}((\sqrt{n}+D)\varepsilon^{-3})$ rounds w.h.p., even in the more restricted Broadcast $\mathsf{CONGEST}$ model. This is close to the $\tilde{\Omega}(\sqrt{n}+D)$ lower bound of [58]. The state-of-the-art deterministic algorithm [44] completes in $n^{1/2+o(1)}+D^{1+o(1)}$ rounds. The complexity of exact SSSP is still open, with algorithms given in [31, 39, 33, 23]. The current best known algorithm, due to [23], runs in $\tilde{O}(n^{1/2}D^{1/4}+D)$ rounds, w.h.p.

There is a lower bound of $\tilde{\Omega}(n)$ rounds for computing APSP [54], which was later tweaked to give an $\Omega(n)$ lower bound for the weighted case [17]. Over the years there has been much progress in understanding the complexity of this problem [57, 52, 51, 4, 31, 12, 2, 3]. For weighted exact APSP, the best known randomized algorithm [12] is optimal up to polylogarithmic terms. The best deterministic algorithm to date [3] completes in $\tilde{O}(n^{4/3})$ rounds. For unweighted exact APSP, $\tilde{O}(n)$-round algorithms are known [57, 52, 46].

To go below the $\Omega(n)$ lower bound, [50, 51] propose computing name-dependent routing tables and APSP. This means that the algorithm is allowed to choose small labels and output results that are consistent with those labels. Choosing the labels carefully thus overcomes the need to send too many bits of information to low-degree nodes. There is also a line of work which breaks below the lower bound for certain graph families [36, 37, 53, 55]. [57, 34, 1, 41, 5] study exact and approximate diameter, eccentricities, girth, and other problems.

Recently, [38, 40, 60] noticed that it is possible to develop faster $\mathsf{CONGEST}$ algorithms when the underlying graph has low mixing time. This was shown for MST [38], maximum flow, SSSP and transshipment [40], and frequency moments [60]. Algorithms for subgraph-freeness and related variants also enjoy fast computations on graphs with low mixing time [20, 21, 22, 15, 13, 30, 48].

$\mathsf{Congested\ Clique}$ and $\mathsf{Broadcast\ Congested\ Clique}$.

In the $\mathsf{Congested\ Clique}$ and $\mathsf{Broadcast\ Congested\ Clique}$ models, distance related problems such as APSP, $k$-SSP and diameter, and their approximations, are studied in [54, 16, 35, 19, 14, 28, 45, 11]. We mention that models related to the $\mathsf{Broadcast\ Congested\ Clique}$ and $\mathsf{Congested\ Clique}$ models have been studied in [9, 10], which explore the regime between broadcast and unicast.

2 Preliminaries

The following are some required definitions, while Appendix A contains additional definitions and basic claims. We begin with a variant of the $\mathsf{Hybrid}$ model, introduced in [8].

Definition 1 ($\mathsf{Hybrid}$ Model).

In the $\mathsf{Hybrid}$ model, a synchronous network of $n$ nodes with identifiers in $[n]$ is given by a graph $G=(V,E)$. In each round, every node can send and receive $\lambda$ messages of $O(\log n)$ bits to/from each of its neighbors (over local edges $E$), and an additional $\gamma$ messages in total, each of $O(\log n)$ bits, to/from any other nodes in the network (over global edges). If in some round more than $\gamma$ messages are sent via global edges to/from a node, only $\gamma$ messages, selected adversarially, are delivered.

All of [8, 49, 18] use $\lambda=\infty,\ \gamma=\log n$. To our knowledge, only [32] considers the more restrictive setting of $\lambda=1,\ \gamma=\log n$.
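For intuition, here is a minimal Python sketch of the communication constraints of a single $\mathsf{Hybrid}$ round (our own toy simulator, not from the paper; the adversarial drop rule is modeled by simply truncating the excess):

def hybrid_round(local_msgs, global_msgs, neighbors, lam, gamma):
    # local_msgs / global_msgs: lists of (sender, receiver, payload).
    delivered, edge_used, node_used = [], {}, {}
    for u, v, m in local_msgs:              # local edges: lam messages per edge
        assert v in neighbors[u]
        edge_used[(u, v)] = edge_used.get((u, v), 0) + 1
        if edge_used[(u, v)] <= lam:
            delivered.append((u, v, m))
    for u, v, m in global_msgs:             # global edges: gamma per node, counting
        if node_used.get(u, 0) < gamma and node_used.get(v, 0) < gamma:
            node_used[u] = node_used.get(u, 0) + 1
            node_used[v] = node_used.get(v, 0) + 1
            delivered.append((u, v, m))     # excess is dropped, as in the model
    return delivered

The setting used in this paper then corresponds to calling this with lam=float('inf') and gamma set to log n.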

We introduce the $\mathsf{Anonymous\ Capacitated}$ model with capacity $c$ (abbreviated $\mathsf{AC(c)}$), which we show is powerful for extracting core distributed principles. The model has two defining characteristics: anonymity and restricted message capacity.

Definition 2 (The $\mathsf{AC(c)}$ Model).

The $\mathsf{Anonymous\ Capacitated}$ model with capacity $c$ is a distributed synchronous communication model, over a graph $G=(V,E)$ with $n$ nodes, where each node $v$ can send and receive $c$ messages in every round, each of $O(\log n)$ bits, to and from any node in the graph. Nodes have identifiers in $[n]$; however, every node has a unique $O(\log n)$-bit communication token, initially known only to itself. Node $v$ can send each message either to some node $w$ whose communication token is already known to $v$, or to a node selected uniformly, independently at random from the entire graph.

$\mathsf{AC(c)}$ bears some similarity to the $\mathsf{NCC}_{0}$ model [6] with an empty initial knowledge graph. Unlike the $\mathsf{NCC}_{0}$ model, $\mathsf{AC(c)}$ has the additional capacity to send $c$ messages from each node and the ability to send messages to random nodes. Our motivation for defining $\mathsf{AC(c)}$ is to provide a setting which is strong enough to solve challenging problems, yet at the same time weak enough to be simulated efficiently in the settings of interest. We note the importance of having both identifiers and communication tokens. An identifier is chosen from a given, hardcoded set and thus can be used for assigning tasks to specific nodes. Communication tokens both assist in dealing with anonymity, and enhance the ability of the $\mathsf{AC(c)}$ model to be easily simulated in other distributed settings: a simulating algorithm can encode routing information of the underlying model in the tokens.
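The following Python sketch (a toy illustration with hypothetical class and field names) captures the $\mathsf{AC(c)}$ sending rule: a node addresses each message either to a communication token it has already learned, or to a uniformly random node:

import random

class ACNode:
    def __init__(self, ident, token, all_tokens):
        # all_tokens models the network side; a node never reads it directly.
        self.id, self.token = ident, token
        self.known = {token}                # initially, only its own token
        self._net = all_tokens
        self.outbox = []

    def send(self, payload, to_token=None):
        if to_token is not None:
            assert to_token in self.known   # may only address known tokens
            self.outbox.append((to_token, payload))
        else:                               # otherwise: a uniformly random node
            self.outbox.append((random.choice(self._net), payload))

A round scheduler would then deliver at most $c$ messages per node, in and out; a message typically carries the sender's token so that the receiver can reply.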

Many of our results hold for weighted graphs $G=(V,E,w)$, where $w\colon E\to\{1,2,\dots,W\}$ for a $W$ which is polynomial in $n$. Whenever we send an edge $e$ as part of a message, we assume $w(e)$ is sent as well. We assume that all graphs that we deal with are connected.

Given a graph $G=(V,E)$ and a pair of nodes $u,v\in V$, we denote by $hop(u,v)$ the hop distance between $u$ and $v$, by $N^{k}_{G}(v)$ a subset of the $k$ closest nodes to $v$ with ties broken arbitrarily, and by $d_{G}^{h}(u,v)$ the weight of the lightest path between $u$ and $v$ of at most $h$ hops, where if there is no such path then $d_{G}^{h}(u,v)=\infty$. In the special case of $h=\infty$, we denote by $d_{G}(u,v)$ the weight of a lightest path between $u$ and $v$. We also denote by $\deg_{G}(v)$ the degree of $v$ in $G$, and, in the directed case, $\deg_{G}^{in}(v),\ \deg_{G}^{out}(v)$ denote the in-degree and out-degree of $v$ in $G$, respectively. When it is clear from the context we sometimes drop the subscript $G$.
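For concreteness, $d_{G}^{h}$ is exactly what $h$ rounds of Bellman–Ford relaxation compute; a minimal centralized Python sketch (our own illustration, not the distributed algorithm) is:

import math

def h_hop_distances(n, edges, s, h):
    # d[v] = weight of the lightest s-v path with at most h hops (inf if none).
    d = [math.inf] * n
    d[s] = 0
    for _ in range(h):
        nd = d[:]                          # relax every edge once per hop
        for u, v, w in edges:
            nd[v] = min(nd[v], d[u] + w)
            nd[u] = min(nd[u], d[v] + w)
        d = nd
    return d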

Definition 3 ($k$-Source Shortest Paths ($k$-SSP)).

Given a graph $G=(V,E)$, in the $k$-source shortest paths problem, we are given a set $S\subseteq V$ of $k$ sources. Every $u\in V$ is required to learn the distance $d_{G}(u,s)$ for each source $s\in S$. The cases $k=1$ and $k=n$ are called the single source shortest paths problem (SSSP) and the all pairs shortest paths problem (APSP), respectively.

Definition 4 (Scattered APSP).

Given a graph $G=(V,E)$, in the Scattered APSP problem, for every pair of nodes $u,v\in V$, there exist nodes $w_{uv},w_{vu}\in V$ (potentially $w_{uv}=w_{vu}$), such that $w_{uv}$ and $w_{vu}$ know $d_{G}(u,v)$, $u$ knows the identifier of $w_{uv}$, and $v$ knows the identifier of $w_{vu}$.

In the approximate versions of these problems, each $u\in V$ is required to learn an $(\alpha,\beta)$-approximate distance $\widetilde{d}(u,v)$ which satisfies $d(u,v)\leq\widetilde{d}(u,v)\leq\alpha\cdot d(u,v)+\beta$; in case $\beta=0$, $\widetilde{d}(u,v)$ is called an $\alpha$-approximate distance.

Definition 5 (Diameter).

Given a graph $G=(V,E)$, the diameter $D=\max_{u,v\in V}\{d(u,v)\}$ is the maximum distance in the graph. An $\alpha$-approximation $\widetilde{D}$ of the diameter satisfies $D/\alpha\leq\widetilde{D}\leq D$.

3 The $\mathsf{Anonymous\ Capacitated}$ Model

The value in defining the $\mathsf{AC(c)}$ model lies in the power it gains from our ability to efficiently simulate it in the $\mathsf{Hybrid}$ and $\mathsf{CONGEST}$ models. Applications of this strength are exemplified by improved algorithms for distance computation problems in these models.

To this end, we design fast algorithms in the $\mathsf{AC(c)}$ model for the useful tools of sparse matrix multiplication and hopset construction (Section 3.2). Such algorithms already exist in all-to-all communication models, such as the $\mathsf{Congested\ Clique}$ [14, 28]. However, fundamental load balancing and synchronization steps that are simple to implement when assuming a bandwidth of $\Theta(n^{2})$ messages per round, as in the $\mathsf{Congested\ Clique}$ model, pose formidable challenges when nodes cannot receive $\Theta(n)$ messages each. For instance, when multiplying two matrices, while the number of finite elements in row $v$ of both input matrices might be small, row $v$ of the output matrix might be very dense, so that node $v$ would not be able to learn all of this information. This even implies that it is not always the case that every node can know all the edges incident to it in some overlay graph (e.g., a hopset). The crux of the way we overcome these challenges is in introducing the carrier configuration distributed data structure, which performs automatic load balancing (Section 3.1). Missing proofs are deferred to Appendix B.

3.1 Carrier Configurations

A carrier configuration is a distributed data structure for holding graphs and matrices, whose main objective is to provide a unified framework for load balancing in situations where substantial amounts of data need to be transferred. The key is that when using the carrier configuration data structure, an algorithm does not need to address many load balancing issues, as those are dealt with under the hood by the data structure itself. Therefore, this data structure allows us to focus on the core concepts of each algorithm and abstract away challenges that arise due to skewed inputs.

The data structure crucially enables our algorithms to enjoy sparsity awareness, by yielding complexities that depend on the average degree in an input graph rather than its maximal degree. This allows one to eventually deal with data which is highly skewed and which would otherwise cause a slow-down by having some nodes send and receive significantly more messages than others.

We stress that the manner in which we implement carrier configurations is inherently distributed in nature. In many cases, when two nodes $u,v$ store data $D(u),D(v)$, respectively, in a carrier configuration, the data is dispersed among many other nodes, and when operations are performed using both $D(u)$ and $D(v)$, the nodes which store the data perform direct communication between themselves, without necessarily involving $u$ or $v$.

In more detail, the carrier configuration data structure is based on three key parts. (1) Carrier Nodes: every node $v$ gets a set $C_{v}$ of carrier nodes, where $|C_{v}|=\Theta(\deg(v)/k)$ and $k$ is the average degree in the graph, which help $v$ carry information and store its input edges in a distributed fashion. A key insight is that it is possible to create such an assignment of carrier nodes while maintaining that each node is not a carrier for too many other nodes, thus avoiding congestion. (2) Ordered Data Storage: the data of $v$ is stored in a sorted fashion across $C_{v}$ in order to enable its efficient usage. In particular, it takes $O(\log n)$ bits to describe which ranges of data are stored in a specific carrier node. (3) Communication Structure: the nodes $C_{v}\cup\{v\}$ are connected using a communication tree (Definition 11), which enables fast broadcasting and aggregation of information between $v$ and $C_{v}$.
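As a toy illustration of part (1), the following Python sketch (our own simplification; the actual construction is distributed and is given in Appendix B) assigns each node $\lceil\deg(v)/k\rceil$ carriers round-robin, so that $\sum_{v}|C_{v}|=O(n)$ and no node carries for too many others:

def assign_carriers(degree):
    # degree: dict node -> deg(v); k is the average degree.
    nodes = sorted(degree)
    k = max(1, sum(degree.values()) // len(nodes))
    carriers, ptr = {}, 0
    for v in nodes:
        need = -(-degree[v] // k)          # ceil(deg(v) / k) carriers for v
        carriers[v] = [nodes[(ptr + i) % len(nodes)] for i in range(need)]
        ptr += need                        # round-robin keeps the per-node carrier load O(1) on average
    return carriers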

A formal definition of the carrier configuration data structure (Definition 12), as well as an extended $\mathsf{AC(c)}$-specific definition (Definition 13), are given in Section B.2.

Carrier Configuration Toolbox. Typically, data is converted from a classical storage setting (every node knows the edges incident to it) into being stored in a carrier configuration, and then operations are applied to change the state of the configuration. Thus, we show in the $\mathsf{AC(c)}$ model how to convert a classical graph representation to a carrier configuration. Then, we provide basic tools, e.g., given two matrices held in carrier configurations, produce a third configuration holding the matrix resulting from computing the point-wise minimum of the two input matrices. The descriptions and implementations of these are deferred to Section B.2.

3.2 Sparsity Aware Distance Computations

In order to give a taste of the type of algorithms which we construct in the $\mathsf{AC(c)}$ model, we present an outline of our sparse matrix multiplication algorithm.

We build a foundation in the $\mathsf{AC(c)}$ model which enables us to eventually implement sparse matrix multiplication and hopset construction algorithms, inspired by the $\mathsf{Congested\ Clique}$ algorithms of [14] (we note that while some results of [14] were improved in [28], the improvements focus on reducing the number of synchronous rounds of the algorithm, yet do not decrease the message complexity in a way that assists us), in order to solve distance related problems. Ultimately, compared to [14], our main contribution is significantly reducing the message complexity overhead of the load balancing mechanisms. In the $\mathsf{Congested\ Clique}$ implementation, various load balancing steps require $\Theta(n^{2})$ messages, which is trivial in the $\mathsf{Congested\ Clique}$ model, yet is highly problematic in the $\mathsf{AC(c)}$ model (and in the $\mathsf{CONGEST}$ and $\mathsf{Hybrid}$ models which ultimately simulate it). Interestingly, for the sparsity relevant to distance computations, the sparse matrix multiplication itself requires $o(n^{2})$ messages for the actual transfer of matrix elements, and in the $\mathsf{Congested\ Clique}$ algorithm, the message complexity is dominated by the $\Theta(n^{2})$ overhead. We reduce this overhead significantly, such that the message complexity of the algorithm is dominated by the messages which actually transfer matrix elements.

We show how to perform sparse matrix multiplication in the $\mathsf{AC(c)}$ model. To simplify our proofs, we assume that the two input matrices and the output matrix have the same average number of finite elements per row. We note that it is possible to use the same ideas we show here in order to prove the general case where the matrices have different densities.

Theorem 3.1 (Sparse Matrix Multiplication).

Given two $n\times n$ input matrices $S,T$, both with an average number of finite elements per row of at most $k$, and stored in carrier configurations $A$ and $B$, it is possible to output the matrix $\hat{P}$, in a carrier configuration $C$, where $\hat{P}$ is $P=S\cdot T$ with only the smallest $k$ elements of each row computed, breaking ties arbitrarily. This takes $\tilde{O}(k\cdot n^{1/3}/c+n/(k\cdot c)+1)$ rounds in the $\mathsf{AC(c)}$ model, w.h.p.

Proof of Theorem 3.1.

Throughout the algorithm, we assume that every piece of data sent directly to a node is also known by all its carrier nodes. This is possible with only an $\tilde{O}(1)$ multiplicative factor in the round complexity, due to Lemma B.14 (Carriers Broadcast and Aggregate). Further, as the input matrices are stored in carrier configurations, due to Item 4, every entry $S[i][j]$ is stored in $A$ alongside the communication tokens of both $i$ and $j$ (likewise for entries of $T$ stored in $B$). Thus, whenever a value from one of the input matrices is sent in the algorithm, we assume that it is sent alongside these communication tokens.

Denote by $\Delta$ the maximal number of finite elements in a row of $S$ or $T$. In the proof below, we show a round complexity of $\tilde{O}(k\cdot n^{1/3}/c+n/(k\cdot c)+\Delta/(n^{1/3}\cdot c)+n^{2/3}/c+1)$, which is bounded by the claimed round complexity of $\tilde{O}(k\cdot n^{1/3}/c+n/(k\cdot c)+1)$. (To see this, note that: 1) it always holds that $\Delta\leq n$, and so $\Delta/(n^{1/3}\cdot c)=O(n^{2/3}/c)$; and 2) if $k<n^{1/3}$, then $n^{2/3}/c=O(n/(k\cdot c))$, while if $k\geq n^{1/3}$, then $n^{2/3}/c=O(k\cdot n^{1/3}/c)$.)

Throughout the proof, we construct various sets of nodes (for instance, $V_{i}$) where every node in the set knows the communication tokens and identifiers of all the other nodes in the set ($Tokens(V_{i})$), and thus we assume that the nodes in each set locally compute some arbitrary node (some fixed $v_{i}\in V_{i}$) which is the leader of the set.

Step: Partitioning the Input – Sets $V_{i}$
Denote by $\deg_{S}(v),\ \deg_{T}(v)$ the number of finite elements in row $v$ of $S$, and in column $v$ of $T$, respectively.

Denote by $W_{1},\dots,W_{n^{1/3}/2}$ a hardcoded partition of $V$ into equally sized sets. Using Lemma B.1 (Routing) and Lemma B.7 (Grouping), every $v\in W_{i}$ broadcasts $\deg_{S}(v),\ \deg_{T}(v)$ to $W_{i}$, within $O(n^{2/3}/c+1)$ rounds. The nodes in $W_{i}$ locally partition $W_{i}$ into $W_{i,1},\dots,W_{i,j_{i}}$, for some $j_{i}$, where for each $r\in[j_{i}]$, $\sum_{v\in W_{i,r}}\deg_{S}(v)+\deg_{T}(v)\leq 4k\cdot n^{2/3}+4\Delta$. Since $\sum_{v\in V}\deg_{S}(v)+\deg_{T}(v)\leq 2nk$, there exists a way to create this partitioning such that $\sum_{i\in[n^{1/3}]}j_{i}\leq n^{1/3}$.

From here on, we refer to the sets $W_{i,r}$ as $V_{j}$. We denote by $S[V_{j}]$ the rows of $S$ corresponding to the nodes $V_{j}$, and by $T[V_{j}]$ the columns of $T$ corresponding to the nodes $V_{j}$. As $\sum_{i\in[n^{1/3}]}j_{i}\leq n^{1/3}$, there are at most $n^{1/3}$ sets $V_{i}$: the sets $V_{1},\dots,V_{n^{1/3}}$. For each set $V_{i}$, an arbitrary $v_{i}\in V_{i}$ is designated as its leader. The identifiers and communication tokens of all the leaders are broadcast using Lemma B.5 (Broadcasting), within $\tilde{O}(n^{1/3}/c+1)$ rounds.

We partition the information held by the sets $V_{i}$. We compute $C_{i}=\{0=c_{0},c_{1},\dots,c_{n^{1/3}/4-1},c_{n^{1/3}/4}=n\}$, where the total number of finite entries in $S[V_{i}][c_{j}+1:c_{j+1}]$ (the notation $S[X][a:b]$ denotes all rows of $S$ with indices in $X$, from column $a$ to column $b$) is at most $16k\cdot n^{1/3}+16\Delta/n^{1/3}+|V_{i}|\leq 16k\cdot n^{1/3}+16\Delta/n^{1/3}+2n^{2/3}$. Recall that for each $V_{i}$, it holds that $\sum_{v\in V_{i}}\deg_{S}(v)+\deg_{T}(v)\leq 4k\cdot n^{2/3}+4\Delta$, and also $|V_{i}|\leq 2n^{2/3}$, implying the existence of such a set $C_{i}$.

The nodes in $V_{i}$ compute the values in $C_{i}$ using binary search. To see this, given any $p_{1},\dots,p_{c}$, the number of finite entries in each of $S[V_{i}][0:p_{1}],\dots,S[V_{i}][0:p_{c}]$ can be computed by the nodes in $V_{i}$ in $\tilde{O}(1)$ rounds, using Corollary B.9 (Group Broadcasting and Aggregating). Thus, each $c_{j}\in C_{i}$ can be found using binary search, with $c$ binary searches run in parallel in $\tilde{O}(1)$ rounds. In total, $\tilde{O}(|C_{i}|/c+1)=\tilde{O}(n^{1/3}/c+1)$ rounds are required to compute $C_{i}$.
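A centralized sketch of this cutoff computation (our own illustration; the sorted list of column indices stands in for the distributed prefix counts that Corollary B.9 aggregates):

def column_cutoffs(finite_cols, n, parts):
    # finite_cols: sorted column indices (with multiplicity) of the finite
    # entries of S[V_i]. Returns 0 = c_0 <= ... <= c_parts = n such that each
    # slice (c_j, c_{j+1}] holds about total/parts entries; a column shared by
    # many rows may overshoot, mirroring the additive slack in the proof.
    total = len(finite_cols)
    per_slice = -(-total // parts)         # ceil(total / parts)
    cutoffs = [0]
    for j in range(1, parts):
        idx = min(j * per_slice, total) - 1
        cutoffs.append(max(finite_cols[idx], cutoffs[-1]) if idx >= 0 else 0)
    cutoffs.append(n)
    return cutoffs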

Step: Creating Intermediate Representations – Sets $U_{i,j}$, $P_{i,j,\ell}$ Matrices
Let $U_{i,j}$, for $i,j\in[n^{1/3}]$, be a hard-coded partition of $V$ into equally sized sets. The goal of this step is for the nodes $U_{i,j}$ to compute an intermediate representation of the product $S[V_{i}]\times T[V_{j}]$. Therefore, we desire that each $V_{i}$ sends all its data to $U_{i,j},U_{j,i}$, for each $j\in[n^{1/3}]$, in some load-balanced manner. We show how, for each $i\in[n^{1/3}]$, $V_{i}$ sends $S[V_{i}]$ to $U_{i,j}$, for each $j\in[n^{1/3}]$, and in a symmetric way (with matrix $T$ instead of $S$) this can be done for $U_{j,i}$.

In $\tilde{O}(n^{2/3}/c+1)$ rounds, allow communication within each $U_{i,j}$ by invoking Lemma B.7 (Grouping). Let some $u_{i,j}\in U_{i,j}$ be denoted leader, and broadcast the leaders of all the sets using Lemma B.5 (Broadcasting) within $\tilde{O}(n^{2/3}/c+1)$ rounds. Node $u_{i,j}$ sends to both $v_{i},v_{j}$ all of $Tokens(U_{i,j})$. Each node $v_{i}$ receives $n^{2/3}$ messages, thus this takes $\tilde{O}(n^{2/3}/c+1)$ rounds using Lemma B.1 (Routing). Now, each node $v_{i}$ broadcasts to $V_{i}$ all the tokens it received, using Corollary B.9 (Group Broadcasting and Aggregating), within $\tilde{O}(n^{2/3}/c+1)$ rounds.

Leader $v_{i}$ sends the contents of $C_{i}$ to all the leader nodes $u_{i,j},u_{j,i}$, for any $j\in[n^{1/3}]$, in $\tilde{O}(n^{2/3}/c+1)$ rounds using Lemma B.1 (Routing). Each $u_{i,j}$ broadcasts to $U_{i,j}$ the contents of $C_{i}$ and $C_{j}$, within $\tilde{O}(n^{1/3}/c+1)$ rounds using Corollary B.9 (Group Broadcasting and Aggregating).

We send information from $V_{i}$ to $U_{i,j}$. The $\ell$-th node in $U_{i,j}$ learns the finite elements in $S[V_{i}][c_{\ell}+1:c_{\ell+1}]$. By definition of $C_{i}$, for any $\ell$, the number of finite elements in $S[V_{i}][c_{\ell}+1:c_{\ell+1}]$ is at most $16k\cdot n^{1/3}+16\Delta/n^{1/3}+2n^{2/3}$, bounding the number of messages each node desires to receive. Every finite element held by a node $v\in V_{i}$ needs to be sent to $O(n^{1/3})$ nodes in the graph, in total. Further, since the finite elements of $v$ are stored in its carrier nodes, and each carrier node of $v$ stores $O(k)$ elements, each node sends at most a total of $O(k\cdot n^{1/3})$ messages. Thus, this routing can be accomplished in $\tilde{O}(k\cdot n^{1/3}/c+\Delta/(n^{1/3}\cdot c)+n^{2/3}/c+1)$ rounds, using Lemma B.1 (Routing). (To perform the routing, it is required that the carrier nodes of $v\in V_{i}$ know the communication tokens of the nodes in $U_{i,j}$ which receive the messages. All the nodes in $V_{i}$ received $Tokens(U_{i,j})$, and so $v$ knows the required tokens. At the start of the proof, we stated that we assume that every message a node receives is also broadcast from it to its carriers, and so the carriers of $v$ also know the required communication tokens.)

We shuffle data within $U_{i,j}$. Recall that $|U_{i,j}|=n^{1/3}$. The first $|C_{i}|=|C_{j}|=n^{1/3}/4$ nodes of $U_{i,j}$ received data from $S[V_{i}],\ T[V_{j}]$, according to $C_{i},C_{j}$. Denote $C_{i,j}=C_{i}\cup C_{j}$. We desire that for each interval $[c_{\ell}+1,c_{\ell+1}]$, for $c_{\ell}\in C_{i,j}$, node $\ell\in U_{i,j}$ knows all the elements $S[V_{i}][c_{\ell}+1,c_{\ell+1}]$ and $T[c_{\ell}+1,c_{\ell+1}][V_{j}]$. To do so, notice that $S[V_{i}][c_{\ell}+1,c_{\ell+1}]$ (likewise with $T$) is fully contained in the data which some node $q\in U_{i,j}$, where $q\in[n^{1/3}/4]$, already knows. Thus, each node $\ell\in[n^{1/3}/4]$ of $U_{i,j}$ denotes by $val(\ell)$ how many other nodes in $U_{i,j}$ are reliant on the data which it itself received from $V_{i}$ (this can be done since all the nodes in $U_{i,j}$ know all of $C_{i,j}$). Then, we invoke Lemma B.10 (Group Multicasting) in $\tilde{O}(k\cdot n^{1/3}/c+\Delta/(n^{1/3}\cdot c)+n^{2/3}/c+1)$ rounds in order to route the required information. Finally, each node $\ell\in[n^{1/3}/2]$ of $U_{i,j}$ knows $S[V_{i}][c_{\ell}+1,c_{\ell+1}]$ and $T[c_{\ell}+1,c_{\ell+1}][V_{j}]$, denoted $S_{i,j,\ell},T_{i,j,\ell}$, respectively, and can compute the product $P_{i,j,\ell}=S_{i,j,\ell}\times T_{i,j,\ell}$.

We are at a state where for each $U_{i,j}$, every $\ell\in[n^{1/3}/2]$ of $U_{i,j}$ knows some matrix $P_{i,j,\ell}$ such that $S[V_{i}]T[V_{j}]=\sum_{\ell\in[n^{1/3}/2]}P_{i,j,\ell}$.
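This is the standard block decomposition of the product along the column intervals $I_{\ell}=[c_{\ell}+1,c_{\ell+1}]$ induced by $C_{i,j}$: since the intervals partition the inner dimension $[n]$,

$$S[V_{i}]\cdot T[V_{j}]=\sum_{\ell\in[n^{1/3}/2]}S[V_{i}][\cdot,I_{\ell}]\cdot T[I_{\ell},\cdot][V_{j}]=\sum_{\ell\in[n^{1/3}/2]}S_{i,j,\ell}\cdot T_{i,j,\ell}=\sum_{\ell\in[n^{1/3}/2]}P_{i,j,\ell},$$

where each entry of the product accumulates its inner-index contributions exactly once across the intervals (for the min-plus distance product, the outer sum is read as a pointwise minimum).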

Step: Sparsification – $\hat{P}_{i,j,\ell}$ Matrices
We sparsify the $P_{i,j,\ell}$ matrices. Recall that we desire to output $\hat{P}$, which is $P=S\times T$ with only the $k$ smallest entries in each row. Fix $i\in[n^{1/3}]$, $\ell\in[n^{1/3}/2]$, and denote by $Q_{i,\ell}$ the matrix of size $|V_{i}|\times n$ created by concatenating $P_{i,0,\ell},\dots,P_{i,n^{1/3},\ell}$. As shown in [14], we are allowed to keep only the $k$ smallest entries in each row of $Q_{i,\ell}$ without damaging the final output $\hat{P}$. That is, throwing out elements at this stage is guaranteed to throw out only elements which are in $P$ and not in $\hat{P}$.

Fix $i\in[n^{1/3}],\ell\in[n^{1/3}/2]$, and denote by $N_{i,\ell}$ the nodes numbered $\ell$ in each $U_{i,j}$. The nodes in $N_{i,\ell}$ perform a binary search for each row of $Q_{i,\ell}$, to determine the cutoff for the $k^{th}$ smallest element in that row. The leaders $u_{i,0},\dots,u_{i,n^{1/3}}$ broadcast to each other $Tokens(U_{i,0}),\dots,Tokens(U_{i,n^{1/3}})$, using Lemma B.1 (Routing), taking $O(n^{2/3}/c+1)$ rounds. Each leader $u_{i,j}$ broadcasts the $O(n^{2/3})$ tokens it received to $U_{i,j}$, using Corollary B.9 (Group Broadcasting and Aggregating), within $\tilde{O}(n^{2/3}/c+1)$ rounds. Now, the nodes $N_{i,\ell}$ all know $Tokens(N_{i,\ell})$.

The nodes $N_{i,\ell}$ proceed in $\tilde{O}(1)$ iterations to perform concurrent binary searches on the threshold values for each row in $Q_{i,\ell}$. Let $n_{i,\ell}$ be an arbitrarily chosen leader node for $N_{i,\ell}$. In every iteration, node $n_{i,\ell}$ broadcasts $p_{1},\dots,p_{|V_{i}|}$ values, each a tentative threshold value for a row in $Q_{i,\ell}$, using Corollary B.9 (Group Broadcasting and Aggregating), and in response aggregates from the nodes of $N_{i,\ell}$, using Corollary B.9, the total number of entries in each row of $Q_{i,\ell}$ below, equal to, and above the queried threshold. Each such iteration takes $\tilde{O}(|V_{i}|/c+1)=\tilde{O}(n^{2/3}/c+1)$ rounds, and after $\tilde{O}(1)$ iterations of this procedure, all the nodes in $N_{i,\ell}$ know a threshold for every row they possess, informing them which values in $Q_{i,\ell}$ can be thrown out.

Define the matrices $\hat{P}_{i,j,\ell}$ by removing from $P_{i,j,\ell}$ the entries which are thrown away due to the thresholds.

Step: Balancing the Intermediate Representation 
The nodes have computed the matrices $\hat{P}_{i,j,\ell}$, yet some matrices $\hat{P}_{i,j,\ell}$ may be too dense to transport out of the nodes which locally hold them. Even though we sparsified the matrices above, the sparsification steps performed were over several $\hat{P}_{i,j,\ell}$ matrices at once, and thus we can still have single $\hat{P}_{i,j,\ell}$ matrices which remain very dense. Let $\hat{P}_{x,y,z}$ be such a very dense matrix. We overcome this challenge by having more nodes compute $\hat{P}_{x,y,z}$ from scratch, allowing each node to take responsibility for retaining only a part of $\hat{P}_{x,y,z}$.

For each $i\in[n^{1/3}]$, we pool the nodes $U_{i}=\bigcup_{j\in[n^{1/3}]}U_{i,j}$ and redistribute them such that areas in the matrix which are too dense get more nodes.

Each node $\ell\in[n^{1/3}/2]$ of $U_{i,j}$ computes the number of finite values in $\hat{P}_{i,j,\ell}$, denoted $p_{i,j,\ell}$. Node $u_{i,j}$ computes $p_{i,j}=\sum_{\ell\in[n^{1/3}/2]}p_{i,j,\ell}$ using aggregation on $U_{i,j}$ within $\tilde{O}(n^{1/3}/c+1)$ rounds, due to Corollary B.9 (Group Broadcasting and Aggregating). Finally, $u_{i,j}$ broadcasts to the other leader nodes $u_{i,0},\dots,u_{i,n^{1/3}}$ the values $p_{i,j,0},\dots,p_{i,j,n^{1/3}/2}$, and then broadcasts to $U_{i,j}$ all the $O(n^{2/3})$ values that it received, taking $O(n^{2/3}/c+1)$ rounds due to Lemma B.1 (Routing) and Corollary B.9 (Group Broadcasting and Aggregating).

Due to the fact that each row of $\hat{P}$ has at most $k$ elements in total, and currently $\hat{P}$ is distributed across $n^{1/3}/2$ matrices which need to be summed, all the nodes $U_{i}$ hold at most $k\cdot|V_{i}|\cdot n^{1/3}/2\leq k\cdot n$ elements. That is, the sum $p_{i}=\sum_{j\in[n^{1/3}]}p_{i,j}$ is at most $k\cdot n$. Each node $\ell\in[n^{1/3}/2]$ of $U_{i,j}$ observes $\hat{P}_{i,j,\ell}$ and locally breaks it up into $t(i,j,\ell)=\lceil p_{i,j,\ell}/(2k\cdot n^{1/3})\rceil$ matrices $\hat{P}_{i,j,\ell,1},\dots,\hat{P}_{i,j,\ell,t(i,j,\ell)}$ which sum up to $\hat{P}_{i,j,\ell}$ and each have at most $2k\cdot n^{1/3}$ finite elements. This creates at most $n^{2/3}$ such matrices. That is, $\sum_{j\in[n^{1/3}],\ell\in[n^{1/3}/2]}t(i,j,\ell)\leq\sum_{j\in[n^{1/3}],\ell\in[n^{1/3}/2]}\lceil p_{i,j,\ell}/(2k\cdot n^{1/3})\rceil\leq n^{2/3}$. Thus, the total number of intermediate matrices, spread over the nodes $U_{i}$, is at most $n^{2/3}$. Each node $\ell\in[n^{1/3}/2]$ of $U_{i,j}$ is allocated $t(i,j,\ell)-1$ other nodes from $U_{i}$, called auxiliary nodes, in order to send them the matrices $\hat{P}_{i,j,\ell,2},\dots,\hat{P}_{i,j,\ell,t(i,j,\ell)}$. Notice that since all nodes in $U_{i}$ know all the values $p_{i,j,\ell}$, all the nodes can locally know which node in $U_{i}$ is allocated to help which other node in $U_{i}$. However, node $\ell$ cannot send all of this information to all its auxiliary nodes; instead, it sends to each of its auxiliary nodes the data it received from $S$ and $T$ with which it computed the matrix $P_{i,j,\ell}$, and the $O(n^{2/3})$ thresholds which it used to turn $P_{i,j,\ell}$ into $\hat{P}_{i,j,\ell}$. This can be accomplished via Lemma B.10 (Group Multicasting) within $\tilde{O}(k\cdot n^{1/3}/c+n^{2/3}/c+1)$ rounds, since each node wishes to multicast at most $\tilde{O}(k\cdot n^{1/3})$ data (as this is the bound on the total data each node received from $S$ and $T$), to at most $|U_{i}|=O(n^{2/3})$ other nodes, and each node desires to receive multicast messages from only one other node.
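The local break-up rule is simple; a minimal Python sketch over entry lists (our own illustration, with a pointwise-min semantics in mind for the eventual summation):

def break_up(entries, bound):
    # entries: the finite entries (row, col, val) of one matrix hat-P_{i,j,l};
    # returns t = ceil(len(entries)/bound) chunks of at most `bound` entries
    # each, which together recover the original matrix.
    if not entries:
        return [[]]
    return [entries[s:s + bound] for s in range(0, len(entries), bound)]

# e.g., with bound = 2 * k * round(n ** (1 / 3)), node l keeps the first
# chunk and ships the remaining t - 1 chunks to its auxiliary nodes.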

Step: Summation 
We send data from $U_{i}$ to $V_{i}$ to perform the final summation step. Node $v\in V_{i}$ learns all of the information held by the nodes in $U_{i}$ which pertains to $\hat{P}[v]$. All the nodes in $U_{i}$ know $Tokens(V_{i})$, and vice versa, and can thus communicate. Each node in $V_{i}$ wishes to receive $O(k\cdot n^{1/3})$ data from $U_{i}$, and likewise, every node in $U_{i}$ wishes to send $O(k\cdot n^{1/3})$ data. Thus, this communication is executed using Lemma B.1 (Routing) in $\tilde{O}(k\cdot n^{1/3}/c+1)$ rounds. Upon receiving the data, each node $v\in V_{i}$ computes the (at most) $k$ smallest entries of $\hat{P}[v]$. Now, $\hat{P}$ is stored in a partial carrier configuration (with every node $v$ having $C_{v}^{out}=\{v\}$), and thus we invoke Lemma B.18 (Partial Configuration Completion), within $\tilde{O}(\sqrt{nk}/c+n/(k\cdot c)+1)$ rounds, to ensure that $\hat{P}$ is stored in a carrier configuration $C$. Notice that $\tilde{O}(\sqrt{nk}/c+1)=\tilde{O}(k\cdot n^{1/3}/c+n^{2/3}/c+1)$, and so we are within the stated round complexity. ∎

4 Breaking Below $o(n^{1/3})$ in $\mathsf{Hybrid}$

We show a $(1+\varepsilon)$ approximation for weighted SSSP in the $\mathsf{Hybrid}$ model within $\tilde{O}(n^{5/17}/\varepsilon^{9})$ rounds, further implying that it is possible to compute a $(2+\varepsilon)$ approximation for the weighted diameter in this number of rounds. We achieve this by combining a simulation of our $\mathsf{AC(c)}$ model and of the $\mathsf{Broadcast\ Congested\ Clique}$ model. Roughly speaking, this incorporates density awareness in addition to the sparsity awareness discussed thus far.

A key approach in previous distance computations in the $\mathsf{Hybrid}$ model [8, 49, 18] is to construct an overlay skeleton graph, and show that solving distance problems on such skeleton graphs can be extended to solutions on the entire graph. In a nutshell, given a graph $G=(V,E)$ and some constant $0<x<1$, a skeleton graph $S_{x}=(M,E_{S})$ is generated by letting every node in $V$ independently join $M$ with probability $n^{x-1}$. Two skeleton nodes in $M$ have an edge in $E_{S}$ if there exists a path between them in $G$ of at most $\tilde{O}(n^{1-x})$ hops. In particular, the nodes of $S_{x}$ are well spaced in the graph and satisfy a variety of useful properties. The central distance related property is that every pair of far enough nodes in $G$ has skeleton nodes at predictable intervals on some shortest path between them.

Given such a skeleton graph $S_x=(M,E_S)$ with $|M|=\Theta(n^x)$, previous work showed that it is possible in the $\mathsf{Hybrid}$ model to let the nodes in $M$ take control over the other nodes in the graph and use all the global bandwidth available to perform messaging between nodes in $M$. In essence, after $\tilde{\Theta}(n^{1-x})$ pre-processing rounds, the $\tilde{\Theta}(n)$ global bandwidth available to the entire graph in each round of the $\mathsf{Hybrid}$ model is utilized such that every node in $M$ can send and receive $\tilde{\Theta}(n/|M|)=\tilde{\Theta}(n^{1-x})$ messages per round from any other node in $M$, in an amortized manner (that is, in $\tilde{\Theta}(n^{1-x})$ rounds, each node in $M$ can send and receive $\tilde{\Theta}(n^{2-2x})$ messages to other nodes in $M$). For a given $v\in M$, we denote by $H_v$ the set of helper nodes of $v$, which contribute their global communication capacity to $v$; it is guaranteed that all nodes in $H_v$ are at most $\tilde{O}(n^{1-x})$ hops away from $v$ in $G$.

A formal definition of skeleton graphs encapsulating all of the above is given in Definition 20 in Section C.1.2. These skeleton graphs are built upon nodes randomly selected from the input graph $G$, and thus the number of edges in $S_x$ correlates with the density of neighborhoods in $G$ – the graph $S_x$ is either sparse or dense, depending on $G$. We split into cases according to the sparsity of $S_x$, which can be computed using known techniques from the $\mathsf{LOCAL}$ and $\mathsf{Node\text{-}Capacitated\ Clique}$ models.

Sparse $S_x$: A hurdle that stands in the way of going below $o(n^{1/3})$ rounds is that one must choose $x>2/3$ in order for the $\tilde{\Theta}(n^{1-x})$-round pre-processing step to not exceed the goal complexity. However, in order to use the previously shown routing techniques, the identifiers of the nodes in $M$ must be globally known, a task which can be shown to take $\tilde{\Omega}(n^{x/2})=\omega(n^{1/3})$ rounds. This leads to an anonymity issue – letting nodes in $M$ communicate with one another although no node in the graph knows the identifiers of all the nodes in $M$.

We overcome this anonymity problem by showing a routing algorithm which allows messaging over $S_x$ without assuming that the identifiers of the nodes in $M$ are globally known. This allows us to simulate the $\mathsf{AC(c)}$ model over $S_x$. By simulating algorithms from the $\mathsf{AC(c)}$ model on $S_x$, we directly get a $(1+\varepsilon)$-approximation for SSSP in $o(n^{1/3})$ rounds, with the exact round complexity depending on the sparsity of $S_x$. However, as $S_x$ approaches having $|M|^2=\Theta(n^{2x})$ edges, the round complexity of the simulated $\mathsf{AC(c)}$ model algorithms approaches $\tilde{\Theta}(n^{1/3})$. As such, using the techniques so far, we solve all cases except for very dense skeleton graphs $S_x$.

Dense $S_x$: To tackle a dense $S_x$, we present a density-aware simulation of the $\mathsf{BCC}$ model over $S_x$. The $\mathsf{BCC}$ model is simulated over $S_x$ in [8] within $\tilde{O}(n^{1/3})$ rounds. Our observation is that the denser $S_x$ is, the more efficiently messages can be broadcast in the input graph. In essence, when $S_x$ is dense, neighborhoods in the original input graph $G$ are closely packed, and so when a node receives some message, it can efficiently share it with many nodes in its neighborhood. With this in hand, we can simulate the $\mathsf{BCC}$ algorithm from [11, Theorem 8] for approximate SSSP very quickly on dense skeleton graphs.

Tying up the pieces: Each simulation result by itself is insufficient, as in the extreme cases each solution takes $\tilde{O}(n^{1/3})$ rounds. Yet, by combining them and using each when it is better, based on the sparsity of $S_x$, we achieve the resulting $\tilde{O}(n^{5/17})=o(n^{1/3})$ round algorithm, for all graphs.

The outline of the rest of this section is as follows. We first show how to perform routing over skeleton graphs where the receivers of messages do not know which nodes desire to send them messages (Section 4.1). In Section 4.2, we simulate the $\mathsf{AC(c)}$ model in the $\mathsf{Hybrid}$ model. Next, we state that the $\mathsf{BCC}$ model can be simulated in the $\mathsf{Hybrid}$ model, and defer the proof to Appendix C. Finally, in Section 4.3, we combine our various algorithms to yield the SSSP approximation result, from which the weighted diameter approximation result also follows.

4.1 Oblivious Token Routing

The following claim shows how to route unicast messages inside a skeleton graph, and is based on Lemma C.11 (Oblivious Token Routing), shown in Section C.2.

Claim 4.1 (Skeleton Unicast).

Given a graph $G$, a skeleton graph $S_x=(M,E_S)$, and a set of messages between the nodes of $M$, s.t. each $v\in M$ is the sender and receiver of at most $k=\tilde{O}(n^{2-2x})$ messages, and each message is initially known to its sender, it is possible to route all given messages within $\tilde{O}(n^{1-x})$ rounds of the $\mathsf{Hybrid}$ model.

Claim 4.1 (Skeleton Unicast) is an extremely important ingredient in showing our $o(n^{1/3})$-round algorithms. Previously, [49] required that every node knows how many messages every other node intends to send to it. In turn, this would require that for each $v\in M$, all the other nodes in $M$ know the identifier of $v$. But the latter can be shown to take $\omega(n^{x/2})=\omega(n^{1/3})$ rounds, since $x>2/3$ (as elaborated above). Therefore, the necessity of our strengthened claim follows.

4.2 $\mathsf{AC(c)}$ and $\mathsf{BCC}$ Simulations in $\mathsf{Hybrid}$

Theorem 4.2 ($\mathsf{AC(c)}$ Simulation in $\mathsf{Hybrid}$).

Consider a graph $G=(V,E)$, and a skeleton graph $S_x=(M,E_S)$ of $G$, for some constant $2/3<x<1$. Let $ALG_{AC}$ be an algorithm which runs in the $\mathsf{AC(c)}$ model with capacity $c=\tilde{\Theta}(n^{2-2x})$ over $S_x$ in $t$ rounds. Then there exists an algorithm which simulates $ALG_{AC}$ within $\tilde{\Theta}(t\cdot n^{1-x})$ rounds of the $\mathsf{Hybrid}$ model over $G$, w.h.p. Further, it is ensured that at the start of the simulation, every node in $M$ knows the communication tokens of all its neighbors in $S_x$.

Proof of Theorem 4.2.

We show that we can instantiate the $\mathsf{AC(c)}$ model over $S_x$ and then show how to simulate each round.

In order to instantiate the $\mathsf{AC(c)}$ model over the nodes $M$, the $\mathsf{AC(c)}$ model definition asserts that every node $v\in M$ has an identifier in $[|M|]$ as well as a communication token whose knowledge enables other nodes to communicate with $v$. In order to ensure that the condition related to identifiers is satisfied, we invoke Claim C.7 (Unique IDs) over $G$ and $S_x$ in $\tilde{O}(1)$ rounds. For the second condition, each node $v\in M$ uses its original identifier in the $\mathsf{Hybrid}$ model as its communication token in the $\mathsf{AC(c)}$ model, which enables us to use Claim 4.1 (Skeleton Unicast) in order to later simulate the rounds of the $\mathsf{AC(c)}$ model. Finally, notice that every node in $M$ knows the communication tokens of all its neighbors in $S_x$, as the communication tokens are the $\mathsf{Hybrid}$ model identifiers.

In each round of the $\mathsf{AC(c)}$ model, each node sends/receives at most $c$ messages to/from any other node, such that for each message it either knows the communication token of the recipient or the recipient is chosen independently, uniformly at random. We split each round of the $\mathsf{AC(c)}$ model into two phases. First, we route the messages which use communication tokens, and second, we route the messages destined to random nodes. Since the communication tokens in the $\mathsf{AC(c)}$ model are the identifiers from the $\mathsf{Hybrid}$ model, the first phase is implemented straightforwardly using Claim 4.1 (Skeleton Unicast), taking $\tilde{O}(n^{1-x})$ rounds of the $\mathsf{Hybrid}$ model over $G$, as required.

For the second phase, denote by $R_v$ the messages which node $v$ desires to send to random targets. Node $v$ selects, uniformly and independently, $|R_v|\leq c$ random nodes from $G$, where each node is the target of a different message in $R_v$ – from here on, we assume that a message in $R_v$ has the identifier of a random node in $G$ attached to it. For each helper $u\in H_v$, node $v$ assigns $c/|H_v|=\tilde{O}(n^{1-x})$ messages from $R_v$, denoted by $R_v^u$, to $u$. Within $\tilde{O}(n^{1-x})$ rounds, node $v$ uses the local edges of the $\mathsf{Hybrid}$ model in order to inform each $u\in H_v$ about $R_v^u$. Next, each node $u$ uses the global edges of the $\mathsf{Hybrid}$ model in order to send the messages $R_v^u$ to their targets in $G$, within $\tilde{O}(n^{1-x})$ rounds, due to Claim C.4 (Uniform Sending). Finally, given that a node $u\in V$ received a message in this way, node $u$ selects, uniformly and independently at random, a node $v$ such that $u\in H_v$ and forwards that message to $v$. Notice that it can be the case that a certain $u\in V$ does not help any node, and thus there does not exist a node $v\in M$ such that $u\in H_v$ – in such a case, $u$ reports back to whichever node sent it the message that it cannot forward it. However, the definition of a skeleton graph promises that the total number of helper nodes of nodes in $M$ is at least $\tilde{\Omega}(n)$ (Property 7d of Definition 20 (Extended Skeleton Graph)). As every node assists at most $\tilde{O}(1)$ other nodes, this implies that at least a poly-logarithmic fraction of the nodes assist other nodes, and so if we repeat the above process and resend messages that bounced back, then within $\tilde{O}(1)$ iterations of the above algorithm, w.h.p., all messages are forwarded.

Notice that the above methodology does not produce a uniform distribution of receivers over the nodes $M$, since there might be some node $v\in M$ such that all of $H_v$ help only $v$, and a node $u\in M$ such that all the nodes in $H_u$ also help other nodes in $M$ (thus $u$ is less likely than $v$ to receive a random message). This happens even though each node helps at most $\tilde{O}(1)$ other nodes and the number of nodes in $H_v$ is exactly the same for each $v\in M$. Thus, we augment the probabilities with which a node $v\in M$ accepts a message. Each node $v\in M$ observes its helpers $H_v$ and computes the probability, denoted $p_v$, that given that a uniformly chosen node $u\in H_v$ received a message, $u$ forwards the message to $v$. Now, the nodes in $M$ utilize the Aggregate and Broadcast tool of [7, Theorem 2.2] (see Claim C.1 (Aggregate and Broadcast)) in order to compute $p=\min_{v\in M}p_v$. Every node $v\in M$ now accepts each message it received with probability $p/p_v$, independently. In the case that a node $v\in M$ rejects a message, it notifies the original sender of the message that the message was rejected – this is done by sending messages in the reverse direction to the way they were sent previously. In the case that a node $v\in M$ hears that a message it sent was rejected, it attempts to resend it by repeating the entire algorithm above. As every node $u\in V$ helps at most $\tilde{O}(1)$ nodes $v\in M$, it holds that $p=\tilde{\Omega}(1)$, and so within $\tilde{O}(1)$ iterations of the above algorithm, w.h.p., every message is successfully delivered. ∎
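For concreteness, a minimal sketch of this rebalancing, under one natural reading of the forwarding rule above (each helper $u$, having received a message, forwards it to a uniformly random node among the skeleton nodes it helps); the data structures and names here are ours:

import random

def acceptance_prob(helpers, helps_count):
    """p_v: probability that a uniformly chosen helper u of v, having
    received a message, forwards it to v, where u picks uniformly among
    the helps_count[u] skeleton nodes it helps."""
    return sum(1.0 / helps_count[u] for u in helpers) / len(helpers)

def accept_message(p_v, p_min):
    """Accept with probability p_min / p_v (p_min = min over all p_v),
    equalizing the chance that any skeleton node accepts a message;
    rejected messages bounce back to their senders for a retry."""
    return random.random() < p_min / p_v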

The following is proven in Appendix C.

Theorem 4.3 ($\mathsf{BCC}$ Simulation in $\mathsf{Hybrid}$).

Given a graph $G=(V,E)$, a skeleton graph $S_x=(M,E_S)$, for some constant $2/3<x<1$, with average degree $k=\Theta(|E_S|/|M|)$, and an algorithm $ALG_{BCC}$ in the $\mathsf{Broadcast\ Congested\ Clique}$ model which runs on $S_x$ in $t$ rounds, it is possible to simulate $ALG_{BCC}$ by executing $\tilde{O}(t\cdot(n^{2x-1}/\sqrt{k}+n^{1-x}))$ rounds of the $\mathsf{Hybrid}$ model on $G$. This assumes that prior to running $ALG_{BCC}$, each node $v\in M$ has at most $\tilde{O}(\deg_G(v))$ bits of input used in $ALG_{BCC}$, including, potentially, the incident edges of $v$ in $S_x$, and that the output of each node in $ALG_{BCC}$ is at most $O(t\log n)$ bits.

4.3 A $(1+\varepsilon)$-Approximation for SSSP

We show an $\tilde{O}(n^{5/17}/\varepsilon^9)$-round algorithm for a $(1+\varepsilon)$-approximation of weighted SSSP in the $\mathsf{Hybrid}$ model. We begin by showing how to use the $\mathsf{AC(c)}$ algorithm from Theorem B.25 ($(1+\varepsilon)$-Approximation for SSSP in $\mathsf{AC(c)}$ (Wrapper)) in the $\mathsf{Hybrid}$ model for sparse skeleton graphs with low maximal degree and even lower average degree. We then show an algorithm in the $\mathsf{Hybrid}$ model for graphs with low average degree yet high maximal degree. Finally, we show how to use the $\mathsf{Broadcast\ Congested\ Clique}$ algorithm from [11, Theorem 8] in the $\mathsf{Hybrid}$ model for dense graphs with high average degree. Combining these claims in the proof of Theorem 1.2 ($(1+\varepsilon)$-Approximation for SSSP in $\mathsf{Hybrid}$) gives the desired result.

Lemma 4.4 (SSSP with Low Average and Maximal Degrees).

Given a weighted input graph $G=(V,E)$, a skeleton graph $S_x=(M,E_S)$ s.t. the average and maximal degrees in $S_x$ are $k=\tilde{O}(n^{x/2})$ and $\Delta_{S_x}=O(n^{2-2x})$, respectively, a value $0<\varepsilon<1$, and a source node $s\in M$, there is an algorithm that ensures that every node in $M$ knows a $(1+\varepsilon)$-approximation to its distance from $s$ over the edges $E_S$, within $\tilde{O}((n^{11x/6-1}+n^{1-x})/\varepsilon)$ rounds in the $\mathsf{Hybrid}$ model, w.h.p.

Proof of Lemma 4.4.

Use Theorem 4.2 ($\mathsf{AC(c)}$ Simulation in $\mathsf{Hybrid}$) to simulate, in the $\mathsf{Hybrid}$ model over $G$, the SSSP algorithm from Theorem B.25 ($(1+\varepsilon)$-Approximation for SSSP in $\mathsf{AC(c)}$ (Wrapper)) over $S_x$. Setting the capacity $c=\tilde{\Theta}(n^{2-2x})$ gives a round complexity of $\tilde{O}(n^{1-x}\cdot(((n^x)^{5/6}+k\cdot(n^x)^{1/3})/(c\cdot\varepsilon)+1/\varepsilon+\Delta_{S_x}/c))=\tilde{O}(n^{1-x}\cdot(((n^x)^{5/6}+n^{x/2}\cdot(n^x)^{1/3})/(c\cdot\varepsilon)+1/\varepsilon+\Delta_{S_x}/c))=\tilde{O}((n^{5x/6-2+2x+1-x}+n^{1-x})/\varepsilon+\Delta_{S_x}/n^{1-x})=\tilde{O}((n^{11x/6-1}+n^{1-x})/\varepsilon)$, where the last step uses $\Delta_{S_x}=O(n^{2-2x})$. ∎

Theorem B.25 ($(1+\varepsilon)$-Approximation for SSSP in $\mathsf{AC(c)}$ (Wrapper)) depends on the maximal degree, and so applying it to a skeleton graph with a high maximal degree is inefficient. Thus, for sparse skeleton graphs with a high maximal degree, we show a $\mathsf{Hybrid}$ algorithm ensuring that one node in the skeleton graph can learn all of the skeleton graph and inform the nodes of their desired outputs. The proof of the following appears in Appendix C.

Lemma 4.5 (SSSP with Low Average and High Maximal Degrees).

Given a weighted input graph $G=(V,E)$, a skeleton graph $S_x=(M,E_S)$, for some constant $2/3<x\leq 12/17$, such that the average and maximal degrees in $S_x$ are at most $\tilde{O}(n^{x/2})$ and at least $\Delta_{S_x}=\omega(n^{2-2x})$, respectively, and a source node $s\in M$, there is an algorithm that ensures that every node in $M$ knows its distance from $s$ over the edges $E_S$, within $\tilde{O}(n^{1-x})$ rounds in the $\mathsf{Hybrid}$ model, w.h.p.

We use our efficient $\mathsf{BCC}$ simulation for dense skeleton graphs.

Lemma 4.6 (SSSP with High Average Degree).

Given a weighted, undirected input graph $G=(V,E)$, a skeleton graph $S_x=(M,E_S)$, for some constant $2/3<x<1$, such that the average degree in $S_x$ is at least $k=\tilde{\Omega}(n^{x/2})$, a value $0<\varepsilon<1$, and a source node $s\in M$, there is an algorithm that ensures that every node in $M$ knows a $(1+\varepsilon)$-approximation to its distance from $s$ over the edges $E_S$, within $\tilde{O}((n^{7x/4-1}+n^{1-x})/\varepsilon^9)$ rounds in the $\mathsf{Hybrid}$ model, w.h.p.

Proof of Lemma 4.6.
We use Theorem 4.3 ($\mathsf{BCC}$ Simulation in $\mathsf{Hybrid}$) to simulate the SSSP approximation algorithm of [11, Theorem 8] on $S_x$. As the complexity of [11, Theorem 8] is $\tilde{O}(\varepsilon^{-9})$ rounds of the $\mathsf{BCC}$ model, the simulation takes $\tilde{O}(\varepsilon^{-9}\cdot(n^{2x-1}/\sqrt{k}+n^{1-x}))=\tilde{O}(\varepsilon^{-9}\cdot(n^{2x-1-(x/2)/2}+n^{1-x}))=\tilde{O}((n^{7x/4-1}+n^{1-x})/\varepsilon^9)$ rounds of the $\mathsf{Hybrid}$ model. ∎

See Theorem 1.2.

Proof of Theorem 1.2.

Denote $x=12/17$. Construct a skeleton graph $S_x=(M,E_S)$ in $\tilde{O}(n^{1-x})=\tilde{O}(n^{5/17})$ rounds using Corollary C.6 (Construct Skeleton). Notice that Corollary C.6 can ensure that the source $s$ is also in $M$. Using Claim C.1 (Aggregate and Broadcast), compute $k=\Theta(|E_S|/|M|)$ and the maximal degree in $S_x$, the value $\Delta_{S_x}$, in $\tilde{O}(1)$ rounds.

First, ensure that every node in $S_x$ knows a $(1+\varepsilon)$-approximation to its distance from $s$ over the edge set $E_S$ only. To do so, selectively deploy Lemmas 4.6, 4.4 and 4.5, as follows. If $k=\tilde{\Omega}(n^{x/2})$, invoke Lemma 4.6 in $\tilde{O}((n^{7x/4-1}+n^{1-x})/\varepsilon^9)=\tilde{O}((n^{(7/4)\cdot(12/17)-1}+n^{1-12/17})/\varepsilon^9)=\tilde{O}((n^{4/17}+n^{5/17})/\varepsilon^9)=\tilde{O}(n^{5/17}/\varepsilon^9)$ rounds. Otherwise, $k=O(n^{x/2})$, and so split the algorithm into two cases, according to $\Delta_{S_x}$. If $\Delta_{S_x}=O(n^{2-2x})$, invoke Lemma 4.4 in $\tilde{O}((n^{11x/6-1}+n^{1-x})/\varepsilon)=\tilde{O}((n^{(11/6)\cdot(12/17)-1}+n^{1-12/17})/\varepsilon)=\tilde{O}(n^{5/17}/\varepsilon)$ rounds, and otherwise invoke Lemma 4.5 in $\tilde{O}(n^{1-x})=\tilde{O}(n^{5/17})$ rounds.
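Indeed, the choice $x=12/17$ is precisely the point at which the dominant exponents of Lemmas 4.4 and 4.5 balance:
\[
\frac{11x}{6}-1=1-x \iff \frac{17x}{6}=2 \iff x=\frac{12}{17}, \qquad\text{so}\qquad 1-x=\frac{5}{17},
\]
while at this value the exponent $7x/4-1=4/17$ arising from Lemma 4.6 is already dominated by $5/17$.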

Finally, due to Property 4 of the definition of a skeleton graph, we invoke Claim C.10 (Extend Distances), in $\tilde{O}(n^{5/17})$ rounds, to ensure that every $v\in V$ knows a $(1+\varepsilon)$-approximation to its distance from $s$ in $G$. ∎

Theorem 4.7 ($(2+\varepsilon)$-Approximation for Weighted Diameter in $\mathsf{Hybrid}$).

It is possible to compute a $(2+\varepsilon)$-approximation for the weighted diameter in $\tilde{O}(n^{5/17}/\varepsilon^9)$ rounds in the $\mathsf{Hybrid}$ model, w.h.p.

5 Fast Distance Computations in the $\mathsf{CONGEST}$ Model

In Section 5.1 we show how to simulate the $\mathsf{AC(c)}$ model in $\mathsf{CONGEST}$. We then employ this simulation to simulate the $\mathsf{Congested\ Clique}$ model in $\mathsf{CONGEST}$ (Section 5.2), and to derive our distance computations in $\mathsf{CONGEST}$ (Section 5.3).

5.1 $\mathsf{AC(c)}$ Simulation in $\mathsf{CONGEST}$

Key Principle. In the $\mathsf{CONGEST}$ model, each node $v$ can send or receive $\deg_G(v)$ messages in each round, implying a total bandwidth of $2m$ messages per round. We aim to build efficient distance tools which utilize this bandwidth completely. Consider the $\mathsf{AC(m/n)}$ model, which also has a bandwidth of $2m$ messages per round. Potentially, by comparing the bandwidths, it could be possible to simulate one round of the $\mathsf{AC(m/n)}$ model in a single round of the $\mathsf{CONGEST}$ model. However, in the $\mathsf{CONGEST}$ model, nodes with degree $o(m/n)$ cannot send $m/n$ messages in a single round, regardless of how an algorithm tries to do this.

To overcome this problem, we notice that the total bandwidth of the nodes with degree at least $m/(4n)$ is at least $7m/4$. Thus, the key principle we show is that the high degree nodes, denoted by $H$, can learn all of the input of the low degree nodes, denoted by $L$. The higher the degree of a node, the more nodes it simulates. Formally, we create an assignment $\rho\colon L\mapsto H$ where for every $\ell\in L$, the node $\rho(\ell)\in H$ simulates $\ell$, and we ensure that $\rho$ is globally known; a minimal sketch of such an assignment follows.
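The following centralized sketch shows one globally consistent greedy packing of this flavor; it is an illustration of ours, where the budget $4\lfloor\deg/k\rfloor$ follows the partition guaranteed by Claim D.4 (Assignment) invoked later, and the code assumes, as that claim guarantees, that the budgets suffice:

def build_assignment(low, high, deg, k):
    """low, high: lists of node ids, identically ordered at every node;
    deg: degree of each node; k: the (integer) average degree m/n.
    Packs low-degree nodes greedily into high-degree nodes' budgets,
    so every node can compute the same map rho locally."""
    rho, j, used = {}, 0, 0
    for ell in low:
        # advance to the next high-degree node with remaining budget
        while used >= max(1, 4 * (deg[high[j]] // k)):
            j, used = j + 1, 0
        rho[ell] = high[j]
        used += 1
    return rho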

We then simulate the $\mathsf{AC(m/n)}$ model using only the nodes in $H$, and finally we send back the resulting output to the nodes in $L$. To allow all-to-all communication between the nodes in $H$, we use the routing algorithms developed in [38, 20] and stated in Appendix D, and pay an overhead of $\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}$ rounds for the simulation.

Some problems, such as Scattered APSP, require the output to be stored on some node, but do not specify on which. We adapt to this case by allowing nodes to produce an auxiliary output. After the simulation is over, every node $u$, given the communication token of any node $v$, can compute the identifier of the node $w$ where the auxiliary output of $v$ is stored.

Carrier Configuration and Communication Tokens. In addition to simulating the $\mathsf{AC(c)}$ model in the $\mathsf{CONGEST}$ model, we let every node know the communication tokens of its neighbors, as well as construct a carrier configuration directly in the $\mathsf{CONGEST}$ model. This greatly benefits some graph problems.

Note that in the simulation in the $\mathsf{Hybrid}$ model, the carrier configuration is constructed in the $\mathsf{AC(c)}$ model itself. However, in the case of the $\mathsf{CONGEST}$ model, we cannot delegate this task efficiently to the $\mathsf{AC(c)}$ model, since building a carrier configuration requires every node to be able to send its incident edges to arbitrary nodes in the graph. Doing so takes $\Omega(\Delta\cdot n/m)$ rounds in the $\mathsf{AC(m/n)}$ model, yet only $O(\tau_{\text{mix}}\cdot n^{o(1)})$ rounds in $\mathsf{CONGEST}$.

However, building a carrier configuration in the $\mathsf{CONGEST}$ model is also not directly possible, as a low degree carrier might learn $\Omega(m/n)$ edges of other nodes, which could take $\Omega(m/n)$ rounds. Therefore, instead of each node sending its edges directly to its carriers, it sends them to the nodes which simulate its carriers.

Overall, this might sound like a back-and-forth process, as we simulate low degree nodes by high degree nodes and then split the edges of the high degree nodes among nodes which may themselves be simulated nodes. However, using the $\mathsf{AC(c)}$ model grants us the modular approach we aim for.

Supergraphs. Theorem B.19 (Hopset Construction) constructs hopsets and Theorem B.28 computes Scattered APSP in the $\mathsf{AC(c)}$ model, and both require the input graph to have $\Omega(n^{3/2})$ edges. (This assumption can be removed at the expense of more complicated proofs, yet doing so would not imply any speed-up for our end results.) We aim to apply those results to general graphs, and so we augment $G$ with $O(n)$ added nodes and $\Theta(n^{3/2})$ added edges while preserving distances between the nodes of $V$ and ensuring that all added edges are globally known. We call the resulting graph $G'$ an $n^{3/2}$-supergraph, build a carrier configuration holding $G'$, and apply the $\mathsf{AC(c)}$ algorithms on $G'$.

Definition 6.

Given a weighted graph $G=(V,E,w)$ with $n=|V|$ and $m=|E|$, and a number $0\leq m'\leq n^2$, a weighted graph $G'=(V',E',w')$ is an $m'$-supergraph of $G$ if $G'$ is obtained from $G$ by adding $\lceil m'/n\rceil$ new nodes and adding an infinite-weight edge between every added node and every original node.

Clearly, a supergraph preserves the original distances.
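Definition 6 is directly constructive; a minimal sketch (with INF standing in for the infinite weight) is:

INF = float("inf")

def make_supergraph(n, edges, m_prime):
    """Return the node count and edge list of an m'-supergraph of G.
    Adds ceil(m'/n) nodes, each joined to every original node by an
    INF-weight edge, so no finite distance in G changes."""
    extra = -(-m_prime // n)  # ceil(m'/n)
    new_edges = [(u, v, INF) for u in range(n, n + extra) for v in range(n)]
    return n + extra, edges + new_edges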

We now simulate the $\mathsf{AC(c)}$ model in the $\mathsf{CONGEST}$ model.

Theorem 5.1 ($\mathsf{AC(c)}$ Simulation in $\mathsf{CONGEST}$).

Consider a graph $G$ and some constant $c$. Let $m'$ be some number s.t. $0\leq m'\leq n^2$. Let $k=m/n$ be the average degree of $G$ and let $A$ be an algorithm which runs in the $\mathsf{AC(c)}$ model over $G$ in $t$ rounds. For each $v$ denote by $i_v\log n$ and $o_v\log n$ the number of bits in the input and output of the node $v$, respectively. Let $i_c=\max_{v\in V}\{1+i_v/\deg_G(v)\}$ and $o_c=\max_{v\in V}\{1+o_v/\deg_G(v)\}$ be the input and output capacities: the minimum number of rounds required for any node to send or receive its input or output, respectively.

There exists an algorithm which simulates $A$ within $(m'/m+i_c+\lceil c/k\rceil\cdot t+o_c)\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}$ rounds of the $\mathsf{CONGEST}$ model over $G$. The above works even if $A$ requires a carrier configuration of $G'$ (an $m'$-supergraph of $G$) or communication tokens of neighbors as an input. Furthermore, each node $u$ might produce some unbounded auxiliary output, in which case the output is known to some (not necessarily the same) node $v$ such that each node $w$ can compute the identifier of $v$ given the communication token of $u$.

Proof of Theorem 5.1.

Identifiers: First, compute $k=m/n$ in $O(D)=O(\tau_{\text{mix}})$ rounds. By the definition of the $\mathsf{CONGEST}$ model, each node $v$ has an identifier $v\in[n]$, denoted the original identifier. Use Corollary D.2 (Identifiers) to compute a set of new identifiers $ID_{new}\colon V\mapsto[n]$. We abuse notation and denote by $\deg_G(i)$ the degree of the node $v$ with identifier $ID_{new}(v)=i$. Each node $v$ locally computes for each new identifier $i\in[n]$ the value $\lfloor\log\deg_G(i)\rfloor$. A node $v\in V$ with degree less than $2^{\lfloor\log k\rfloor-2}$ is a low degree node $v\in L$, and otherwise it is a high degree node $v\in H$. According to the properties of $ID_{new}$, the new identifiers of nodes in $L$ are smaller than those of nodes in $H$. We abuse notation and treat $L$ and $H$ as sets of new identifiers. For each $i\in L$ denote $x_i=\lfloor\log\deg_G(i)\rfloor$ and for each $j\in H$ denote $y_j=\lfloor\log\deg_G(j)\rfloor$.

Simulation Assignment: The numbers $|L|,|H|$ and the sets $\{x_i\}_{i\in L},\{y_j\}_{j\in H}$ satisfy the conditions of Claim D.4 (Assignment), since $\sum_{i\in L}2^{x_i}+\sum_{j\in H}2^{y_j}=\sum_{v\in V}2^{\lfloor\log\deg_G(v)\rfloor}>\sum_{v\in V}2^{\log\deg_G(v)-1}=2m/2=m$. Hence, there is a partition of $L$ into $|H|$ sets $\{I_j\}$, satisfying $|I_j|\leq 4\cdot\lfloor 2^{y_j}/k\rfloor\leq 4\lfloor\deg_G(j)/k\rfloor$. The node $j\in H$ simulates the nodes in $I_j$, and for simplicity it also simulates itself. Denote by $\rho(i)$ the new identifier of the node simulating the node with new identifier $i$.

Input: For every $j\in H$, we now deliver the information node $j$ requires to simulate the nodes in $I_j$. Each $u$ sends to its neighbors its new identifier $ID_{new}(u)$. As $u$ knows its new identifier $ID_{new}(u)$, and the assignment is computable locally from the globally known rounded degrees, it also knows the new identifier of the node $j=\rho(u)$ which simulates it. Using Claim D.1 ($\mathsf{CONGEST}$ Routing), each $u\in L$ sends to the node $\rho(u)$ its new and original identifiers, together with the new and original identifiers of its neighbors, and its input. A node $v\in H$ can now locally compute, for each neighbor $w$ of each node $u\in I_v$, the new identifier of $\rho(w)$. This invokes the algorithm from Claim D.1 at most $O(i_c)$ times. In each invocation, each $u\in L$ sends at most $\deg_G(u)$ messages and each node $v\in H$ receives at most $4\cdot\lfloor\deg_G(v)/k\rfloor\cdot k\leq 4\deg_G(v)$ messages, as required.

Instantiation: We now instantiate the $\mathsf{AC(c)}$ model. As the communication token of a node $v$ in the $\mathsf{AC(c)}$ model, we use the concatenation of $v$, $\rho(v)$ and $ID_{new}(v)$. Clearly, the identifiers are unique in $[n]$ and the communication tokens are unique and of size $O(\log n)$ bits. While pre-processing, we already guaranteed that the node which simulates $u$ knows the communication tokens of all neighbors of $u$. Now the new identifier assignment $ID_{new}$ and the simulation assignment $\rho$ satisfy the demands of Claim D.3 (Build Carrier Configurations in $\mathsf{CONGEST}$), and we use it to build a carrier configuration.

Round Simulation: During one round of the $\mathsf{AC(c)}$ model, each node can send and receive at most $c$ messages. Each message is sent either to a random node or to a node with a known communication token. We split into two phases: first, sending the messages to nodes with known communication tokens, and second, sending the messages to random nodes.

In the first phase, we use the fact that the new identifier of the destination is a part of the communication token. Each node $v\in H$ has to send and receive messages on behalf of all of $I_v$. Thus, node $v\in H$ has to send and receive at most $(4\cdot\lfloor\deg(v)/k\rfloor+1)\cdot c\leq(4\cdot\deg(v)+1)\cdot\lceil c/k\rceil$ messages. It is therefore enough to invoke the algorithm from Claim D.1 ($\mathsf{CONGEST}$ Routing) $\tilde{O}(\lceil c/k\rceil)$ times to deliver all the messages, w.h.p.

In the second phase, for each message, we choose a new identifier independently and uniformly. Since for a node $u$ we know the new identifier of $\rho(u)$, we also know where to route the message. As $n$ nodes sample uniformly at most $c$ messages each, each new identifier is sampled at most $\tilde{O}(c)$ times, w.h.p. Thus, the number of messages that a high degree node $v\in H$ has to send or receive is at most $\tilde{O}(\deg(v)\cdot c/k)$. Thus, w.h.p., $\tilde{O}(\lceil c/k\rceil)$ invocations of the algorithm from Claim D.1 ($\mathsf{CONGEST}$ Routing) suffice.

Main Output: Finally, we send the outputs back to the simulated nodes. This works in a similar manner, with a node $v\in H$ splitting the output of each node $u\in I_v$ into $\lceil o_u/\deg(u)\rceil$ batches of size at most $\deg(u)$. Notice that since $v$ received the identifiers of all neighbors of $u$, it knows $\deg(u)$. Then, for $o_c$ rounds, each node uses Claim D.1 ($\mathsf{CONGEST}$ Routing) to send one batch to each node it simulates.

Auxiliary Output: The auxiliary output that some node $u$ produces is stored in the node $v=\rho(u)$ which simulates it. Since $\rho(u)=v$ is a part of the communication token of $u$, each node $w$ which knows the communication token of $u$ also knows $v$.

Round Complexity: By Corollary D.2 (Identifiers), computing the new identifiers takes $O(\tau_{\text{mix}}+\log n)$ rounds. By Claim D.1 ($\mathsf{CONGEST}$ Routing), sending the input requires $i_c\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}$ rounds, the simulation of the $t$ rounds of the $\mathsf{AC(c)}$ model requires $\lceil c/k\rceil\cdot t\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}$ rounds, and sending the output back takes $o_c\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}$ rounds, w.h.p. By Claim D.3 (Build Carrier Configurations in $\mathsf{CONGEST}$), building the carrier configuration takes $\lceil m'/m\rceil\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}$ rounds, w.h.p. Thus, the execution terminates in $(m'/m+i_c+\lceil c/k\rceil\cdot t+o_c)\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}$ rounds, w.h.p. ∎

5.2 Faster $\mathsf{Congested\ Clique}$ Simulation

The $\mathsf{AC(c)}$ model is, in a sense, a generalization of the $\mathsf{Congested\ Clique}$ model, directly implying the following.

Claim 5.2 ($\mathsf{Congested\ Clique}$ Simulation in $\mathsf{AC(c)}$).

There is an algorithm which executes one round of the $\mathsf{Congested\ Clique}$ model in the $\mathsf{AC(c)}$ model in $\tilde{O}(n/c)$ rounds, w.h.p.

Proof of Claim 5.2.

Initially, for $\tilde{O}(n/c)$ rounds, each node sends its communication token and identifier to $c$ (not necessarily distinct) randomly sampled nodes. By Chernoff and union bounds, all nodes receive the identifiers and communication tokens of all other nodes, w.h.p. For an additional $\tilde{O}(n/c)$ rounds, the nodes use the learned communication tokens to deliver the messages they have. ∎
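A minimal sketch of the dissemination phase, where send is an assumed primitive delivering a single $\mathsf{AC(c)}$ message:

import random

def disseminate(n, c, my_id, my_token, send, rounds):
    """For rounds = O~(n/c) rounds, push (id, token) to c uniformly random
    nodes per round; a coupon-collector / Chernoff argument makes every
    token known to every node w.h.p."""
    for _ in range(rounds):
        for _ in range(c):
            send(random.randrange(n), (my_id, my_token))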

Combining Claim 5.2 ($\mathsf{Congested\ Clique}$ Simulation in $\mathsf{AC(c)}$) with Theorem 5.1 ($\mathsf{AC(c)}$ Simulation in $\mathsf{CONGEST}$) implies a density aware simulation of the $\mathsf{Congested\ Clique}$ model in the $\mathsf{CONGEST}$ model – the more edges the input graph has, the faster the simulation.

Theorem 1.3 ($\mathsf{Congested\ Clique}$ Simulation in $\mathsf{CONGEST}$).

Consider a graph $G=(V,E)$ and an algorithm $A$ which runs in the $\mathsf{Congested\ Clique}$ model over $G$ in $t$ rounds. For each $v$ denote by $i_v\log n$ and $o_v\log n$ its number of input and output bits, respectively. Let $i_c=\max_{v\in V}\{1+i_v/\deg_G(v)\}$ and $o_c=\max_{v\in V}\{1+o_v/\deg_G(v)\}$ be the input and output capacities: the minimum number of rounds required for any node to send or receive its input or output, respectively. Then $A$ can be simulated within $(i_c+(n^2/m)\cdot t+o_c)\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}$ rounds of the $\mathsf{CONGEST}$ model over $G$, w.h.p.

5.3 Improved Distance Computations

We show an exact SSSP algorithm via our simulation from Theorem 5.1 ($\mathsf{AC(c)}$ Simulation in $\mathsf{CONGEST}$). See Theorem 1.4.

Proof of Theorem 1.4.

The claim follows from simulating the algorithm of Theorem B.23 (Exact SSSP in $\mathsf{AC(c)}$) in the $\mathsf{AC(m/n)}$ model over $G$ using Theorem 5.1 ($\mathsf{AC(c)}$ Simulation in $\mathsf{CONGEST}$), in $(1+\tilde{O}(m^{1/2}n^{1/6}/c+n/c+n^{7/6}/m^{1/2})\cdot\lceil c/k\rceil)\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}=(n^{7/6}/m^{1/2}+n^2/m)\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}$ rounds, w.h.p. ∎
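To verify the simplification in the proof above: the simulation runs with capacity $c=k=m/n$, so $\lceil c/k\rceil=1$ and
\[
\frac{m^{1/2}n^{1/6}}{c}=\frac{m^{1/2}n^{1/6}\cdot n}{m}=\frac{n^{7/6}}{m^{1/2}},
\qquad
\frac{n}{c}=\frac{n^{2}}{m},
\]
while the term $n^{7/6}/m^{1/2}$ is unchanged, yielding the stated bound.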

Now, we define and approximate our first APSP relaxation.

Definition 7 (Shortest Path Query Problem).

Given an input graph $G$, a query set is a set of $q$ source-destination pairs $Q=\{(s_i,t_i)\}_{i=1}^q$ called queries. For each node $u$, the source and destination loads, $u_s$ and $u_t$, respectively, are the number of times $u$ appears as a source and as a destination in $Q$, divided by its degree. The maximum $\ell$ over all source and destination loads is the query set load.

A shortest path query problem is a query set of size $q$ and load $\ell$, s.t. every $s_i$ knows the identifier of $t_i$. The goal is to answer all queries, that is, $s_i$ computes or approximates $d_G(s_i,t_i)$.

Given an input graph $G$, an algorithm is a shortest path query algorithm if, after a pre-processing stage taking $T_{pre-processing}$ rounds, given any query set of size $q$ and load $\ell$, the algorithm solves the shortest path query problem within an additional $T_{query}$ rounds.
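For instance, if a node $u$ of degree $5$ appears as a source in $10$ queries and as a destination in $2$ queries, then $u_s=10/5=2$ and $u_t=2/5$, so $u$ contributes $\max\{2,2/5\}=2$ towards the load $\ell$.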

We follow the approach of [14] in order to design a $(3+\varepsilon)$-approximate shortest path query algorithm, using our methods from the $\mathsf{AC(c)}$ model. For this we use the following important tool, whose proof is deferred to Appendix D.

Lemma 5.3 ($k$-nearest in $\mathsf{CONGEST}$).

Given a graph $G$, it is possible in the $\mathsf{CONGEST}$ model, within $(k\cdot n^{4/3}/m+n^{5/3}/m+1)\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}$ rounds, w.h.p., to compute the distance $d_G(v,u)$ from every node $v$ to every node $u$ which is one of the $k$ closest nodes to $v$ (with ties broken arbitrarily).

See Theorem 1.5.

Proof of Theorem 1.5.

To approximate the distance $d_G(s,t)$, we compute $\min\{d_G^{n^{1/2}}(s,t),d_G(s,p(s))+d_G(p(s),t)\}$, where $p(s)$ is the closest node to $s$ in some globally known hitting set $A$ of all the sets $N_G^{n^{1/2}}(s)$. Thus, while pre-processing, we ensure that each node $s$ knows $d_G(s,v)$ for each $v\in A$.

Pre-processing: First, we execute the algorithm from Lemma 5.3 with $k=n^{1/2}$, so that every node $v$ learns the distance to each node in $N_G^{n^{1/2}}(v)$. Now, each node enters $A$ independently with probability $\tilde{O}(n^{-1/2})$. W.h.p., the set $A$ is of size $\tilde{O}(n^{1/2})$ and is a hitting set for each $N_G^{n^{1/2}}(v)$. Let $\varepsilon'=\varepsilon/3$. We compute a $(1+\varepsilon')$-approximate MSSP from $A$ using $\tilde{O}(n^{1/2})$ invocations of the $(1+\varepsilon')$-approximate SSSP algorithm of [40, Corollary 5]. Each $v$ computes $p(v)\in A\cap N_G^{n^{1/2}}(v)$, which is the sampled node closest to $v$; such a node exists in the set of its $n^{1/2}$ nearest neighbors, w.h.p.

Computing the $n^{1/2}$-nearest nodes takes $(k\cdot n^{4/3}/m+n^{5/3}/m+1)\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}$ rounds with $k=n^{1/2}$, w.h.p. The complexity of the $\tilde{O}(n^{1/2})$ SSSP invocations is $n^{1/2}\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}$ rounds, w.h.p. The overall complexity of the pre-processing is thus $(n^{1/2}+n^{11/6}/m)\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}$ rounds, w.h.p.

Query: Whenever a node $s_i$ needs to approximate its distance to $t_i$, the node $s_i$ requests from $t_i$ the distance to $p(s_i)$, for which $t_i$ knows a $(1+\varepsilon')$-approximation $\tilde{d}(p(s_i),t_i)$, due to [40, Corollary 5]. The node $s_i$ approximates its distance to $t_i$ as $\tilde{d}(s_i,t_i)=d(s_i,p(s_i))+\tilde{d}(p(s_i),t_i)$. The approximation factor follows from Claim A.3 (APSP using $k$-nearest and MSSP). To execute the routing, the nodes invoke Claim D.1. The number of rounds for solving a query set with load $\ell$ is $\ell\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}$, w.h.p. ∎
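In summary, the query-time estimate admits the following short local sketch; the table names are ours, and all tables are assumed to be populated during the pre-processing above:

def query_distance(s, t, near, pivot, d_pivot, mssp_est):
    """near[s]: exact distances from s to its n^(1/2) nearest nodes;
    pivot[s] = p(s); d_pivot[s] = d(s, p(s));
    mssp_est(a, t): (1 + eps')-approximate distance from a in A to t."""
    direct = near[s].get(t, float("inf"))   # exact if t is among the nearest
    detour = d_pivot[s] + mssp_est(pivot[s], t)
    return min(direct, detour)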

Denoting by $\delta$ the minimum degree in the graph, one gets load $\ell\leq n/\delta$ and $m\geq n\cdot\delta/2$, which implies our main result, Theorem 1.1.

See Theorem 1.1.

Finally, we use our $\mathsf{AC(c)}$ simulation together with Theorem B.28 ($(3+\varepsilon)$-Approximation for Scattered APSP in $\mathsf{AC(c)}$) to obtain our Scattered APSP algorithm in the $\mathsf{CONGEST}$ model.

See Theorem 1.6.

Proof of Theorem 1.6.

Simulate Theorem B.28 ($(3+\varepsilon)$-Approximation for Scattered APSP in $\mathsf{AC(c)}$) using Theorem 5.1 ($\mathsf{AC(c)}$ Simulation in $\mathsf{CONGEST}$). Split the output of each node $u$ into two parts. The first part, $s_u$, is $\tilde{O}(n^{1/2})$ bits encoding the communication tokens of the nodes which store the distances from $u$; thus, the output capacity is $o_c=\tilde{O}(n^{1/2})$. The communication tokens decoded from $s_u$ allow $u$ to know where its distances are stored. The second part of the output of $u$ is the auxiliary output, which consists of the distances it knows. Thus, the algorithm completes in $((n^{11/6}/m+n^{1/3}+m/n)/\varepsilon+n^{1/2})\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}$ rounds, w.h.p. ∎

6 Discussion

We believe that additional problems in various fundamental distributed settings could be solvable using our infrastructure for sparsity aware computation. This is a broad open direction for further research.

With respect to the specific results shown here, a major goal would be to construct a sparser hopset in the $\mathsf{AC(c)}$ model. Further, one could attempt to show sparse matrix multiplication algorithms which relax the assumption that the input matrices and the output matrix are bounded by the same number of finite elements, as this could directly improve the complexity of our $k$-SSP algorithm. Either of these improvements is likely to significantly reduce the round complexity of many of our end results, in both the $\mathsf{Hybrid}$ and $\mathsf{CONGEST}$ models.

Acknowledgements

This project was partially supported by the European Union’s Horizon 2020 Research and Innovation Programme under grant agreement no. 755839. The authors would like to thank Michal Dory and Yuval Efron for various helpful conversations. We also thank Fabian Kuhn for sharing a preprint of [49] with us.

References

  • [1] Amir Abboud, Keren Censor-Hillel, and Seri Khoury. Near-linear lower bounds for distributed distance computations, even in sparse networks. In Cyril Gavoille and David Ilcinkas, editors, Distributed Computing - 30th International Symposium, DISC 2016, September 27-29, 2016. Proceedings, volume 9888 of Lecture Notes in Computer Science, pages 29–42, Paris, France, 2016. Springer. doi:10.1007/978-3-662-53426-7_3.
  • [2] Udit Agarwal and Vijaya Ramachandran. Distributed weighted all pairs shortest paths through pipelining. In 2019 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2019, May 20-24, 2019, pages 23–32, Rio de Janeiro, Brazil, 2019. IEEE. doi:10.1109/IPDPS.2019.00014.
  • [3] Udit Agarwal and Vijaya Ramachandran. Faster deterministic all pairs shortest paths in congest model. In Christian Scheideler and Michael Spear, editors, SPAA ’20: 32nd ACM Symposium on Parallelism in Algorithms and Architectures, July 15-17, 2020, pages 11–21, Virtual Event, USA, 2020. ACM. doi:10.1145/3350755.3400256.
  • [4] Udit Agarwal, Vijaya Ramachandran, Valerie King, and Matteo Pontecorvi. A deterministic distributed algorithm for exact weighted all-pairs shortest paths in $\tilde{O}(n^{3/2})$ rounds. In Calvin Newport and Idit Keidar, editors, Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing, PODC 2018, July 23-27, 2018, pages 199–205, Egham, United Kingdom, 2018. ACM. doi:10.1145/3212734.3212773.
  • [5] Bertie Ancona, Keren Censor-Hillel, Mina Dalirrooyfard, Yuval Efron, and Virginia Vassilevska Williams. Distributed distance approximation. In Quentin Bramas, Rotem Oshman, and Paolo Romano, editors, 24th International Conference on Principles of Distributed Systems, OPODIS 2020, December 14-16, 2020, volume 184 of LIPIcs, pages 30:1–30:17, Strasbourg, France (Virtual Conference), 2020. Schloss Dagstuhl - Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.OPODIS.2020.30.
  • [6] John Augustine, Keerti Choudhary, Avi Cohen, David Peleg, Sumathi Sivasubramaniam, and Suman Sourav. Distributed graph realizations. In 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 18-22, 2020, pages 158–167, New Orleans, LA, USA, 2020. IEEE. doi:10.1109/IPDPS47924.2020.00026.
  • [7] John Augustine, Mohsen Ghaffari, Robert Gmyr, Kristian Hinnenthal, Christian Scheideler, Fabian Kuhn, and Jason Li. Distributed computation in node-capacitated networks. In Christian Scheideler and Petra Berenbrink, editors, The 31st ACM on Symposium on Parallelism in Algorithms and Architectures, SPAA 2019, June 22-24, 2019, pages 69–79, Phoenix, AZ, USA, 2019. ACM. doi:10.1145/3323165.3323195.
  • [8] John Augustine, Kristian Hinnenthal, Fabian Kuhn, Christian Scheideler, and Philipp Schneider. Shortest paths in a hybrid network model. In Shuchi Chawla, editor, Proceedings of the 2020 ACM-SIAM Symposium on Discrete Algorithms, SODA 2020, January 5-8, 2020, pages 1280–1299, Salt Lake City, UT, USA, 2020. SIAM. doi:10.1137/1.9781611975994.78.
  • [9] Florent Becker, Antonio Fernández Anta, Ivan Rapaport, and Eric Rémila. Brief announcement: A hierarchy of congested clique models, from broadcast to unicast. In Chryssis Georgiou and Paul G. Spirakis, editors, Proceedings of the 2015 ACM Symposium on Principles of Distributed Computing, PODC 2015, July 21 - 23, 2015, pages 167–169, Donostia-San Sebastián, Spain, 2015. ACM. doi:10.1145/2767386.2767447.
  • [10] Florent Becker, Antonio Fernández Anta, Ivan Rapaport, and Eric Rémila. The effect of range and bandwidth on the round complexity in the congested clique model. In Thang N. Dinh and My T. Thai, editors, Computing and Combinatorics - 22nd International Conference, COCOON 2016, August 2-4, 2016, Proceedings, volume 9797 of Lecture Notes in Computer Science, pages 182–193, Ho Chi Minh City, Vietnam, 2016. Springer. doi:10.1007/978-3-319-42634-1_15.
  • [11] Ruben Becker, Andreas Karrenbauer, Sebastian Krinninger, and Christoph Lenzen. Near-optimal approximate shortest paths and transshipment in distributed and streaming models. In Andréa W. Richa, editor, 31st International Symposium on Distributed Computing, DISC 2017, October 16-20, 2017, volume 91 of LIPIcs, pages 7:1–7:16, Vienna, Austria, 2017. Schloss Dagstuhl - Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.DISC.2017.7.
  • [12] Aaron Bernstein and Danupon Nanongkai. Distributed exact weighted all-pairs shortest paths in near-linear time. In Moses Charikar and Edith Cohen, editors, Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019, June 23-26, 2019, pages 334–342, Phoenix, AZ, USA, 2019. ACM. doi:10.1145/3313276.3316326.
  • [13] Keren Censor-Hillel, Yi-Jun Chang, François Le Gall, and Dean Leitersdorf. Tight distributed listing of cliques. In Dániel Marx, editor, Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms, SODA 2021, January 10 - 13, 2021, pages 2878–2891, Virtual Conference, 2021. SIAM. doi:10.1137/1.9781611976465.171.
  • [14] Keren Censor-Hillel, Michal Dory, Janne H. Korhonen, and Dean Leitersdorf. Fast approximate shortest paths in the congested clique. In Peter Robinson and Faith Ellen, editors, Proceedings of the 2019 ACM Symposium on Principles of Distributed Computing, PODC 2019, Canada, July 29 - August 2, 2019, pages 74–83, Toronto, ON, 2019. ACM. doi:10.1145/3293611.3331633.
  • [15] Keren Censor-Hillel, François Le Gall, and Dean Leitersdorf. On distributed listing of cliques. In Yuval Emek and Christian Cachin, editors, PODC ’20: ACM Symposium on Principles of Distributed Computing, August 3-7, 2020, pages 474–482, Virtual Event, Italy, 2020. ACM. doi:10.1145/3382734.3405742.
  • [16] Keren Censor-Hillel, Petteri Kaski, Janne H. Korhonen, Christoph Lenzen, Ami Paz, and Jukka Suomela. Algebraic methods in the congested clique. Distributed Comput., 32(6):461–478, 2019. doi:10.1007/s00446-016-0270-2.
  • [17] Keren Censor-Hillel, Seri Khoury, and Ami Paz. Quadratic and near-quadratic lower bounds for the CONGEST model. In Andréa W. Richa, editor, 31st International Symposium on Distributed Computing, DISC 2017, October 16-20, 2017, volume 91 of LIPIcs, pages 10:1–10:16, Vienna, Austria, 2017. Schloss Dagstuhl - Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.DISC.2017.10.
  • [18] Keren Censor-Hillel, Dean Leitersdorf, and Volodymyr Polosukhin. Distance computations in the hybrid network model via oracle simulations. In Markus Bläser and Benjamin Monmege, editors, 38th International Symposium on Theoretical Aspects of Computer Science, STACS 2021, March 16-19, 2021, volume 187 of LIPIcs, pages 21:1–21:19, Saarbrücken, Germany (Virtual Conference), 2021. Schloss Dagstuhl - Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.STACS.2021.21.
  • [19] Keren Censor-Hillel, Dean Leitersdorf, and Elia Turner. Sparse matrix multiplication and triangle listing in the congested clique model. Theor. Comput. Sci., 809:45–60, 2020. doi:10.1016/j.tcs.2019.11.006.
  • [20] Yi-Jun Chang, Seth Pettie, and Hengjie Zhang. Distributed triangle detection via expander decomposition. In Timothy M. Chan, editor, Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2019, January 6-9, 2019, pages 821–840, San Diego, California, USA, 2019. SIAM. doi:10.1137/1.9781611975482.51.
  • [21] Yi-Jun Chang and Thatchaphol Saranurak. Improved distributed expander decomposition and nearly optimal triangle enumeration. In Peter Robinson and Faith Ellen, editors, Proceedings of the 2019 ACM Symposium on Principles of Distributed Computing, PODC 2019, Canada, July 29 - August 2, 2019, pages 66–73, Toronto, ON, 2019. ACM. doi:10.1145/3293611.3331618.
  • [22] Yi-Jun Chang and Thatchaphol Saranurak. Deterministic distributed expander decomposition and routing with applications in distributed derandomization. In 61st IEEE Annual Symposium on Foundations of Computer Science, FOCS 2020, November 16-19, 2020, pages 377–388, Durham, NC, USA, 2020. IEEE. doi:10.1109/FOCS46700.2020.00043.
  • [23] Shiri Chechik and Doron Mukhtar. Single-source shortest paths in the CONGEST model with improved bound. In Yuval Emek and Christian Cachin, editors, PODC ’20: ACM Symposium on Principles of Distributed Computing, August 3-7, 2020, pages 464–473, Virtual Event, Italy, 2020. ACM. doi:10.1145/3382734.3405729.
  • [24] Edith Cohen. Polylog-time and near-linear work approximation scheme for undirected shortest paths. J. ACM, 47(1):132–166, 2000. doi:10.1145/331605.331610.
  • [25] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms, 3rd Edition. MIT Press, 2009. URL: http://mitpress.mit.edu/books/introduction-algorithms.
  • [26] Yong Cui, Hongyi Wang, and Xiuzhen Cheng. Channel allocation in wireless data center networks. In INFOCOM 2011. 30th IEEE International Conference on Computer Communications, Joint Conference of the IEEE Computer and Communications Societies, 10-15 April 2011, pages 1395–1403, Shanghai, China, 2011. IEEE. doi:10.1109/INFCOM.2011.5934925.
  • [27] Michael Dinitz, Michael Schapira, and Gal Shahaf. Approximate moore graphs are good expanders. J. Comb. Theory, Ser. B, 141:240–263, 2020. doi:10.1016/j.jctb.2019.08.003.
  • [28] Michal Dory and Merav Parter. Exponentially faster shortest paths in the congested clique. In Yuval Emek and Christian Cachin, editors, PODC ’20: ACM Symposium on Principles of Distributed Computing, August 3-7, 2020, pages 59–68, Virtual Event, Italy, 2020. ACM. doi:10.1145/3382734.3405711.
  • [29] Andrew Drucker, Fabian Kuhn, and Rotem Oshman. On the power of the congested clique model. In Magnús M. Halldórsson and Shlomi Dolev, editors, ACM Symposium on Principles of Distributed Computing, PODC ’14, July 15-18, 2014, pages 367–376, Paris, France, 2014. ACM. doi:10.1145/2611462.2611493.
  • [30] Talya Eden, Nimrod Fiat, Orr Fischer, Fabian Kuhn, and Rotem Oshman. Sublinear-time distributed algorithms for detecting small cliques and even cycles. In Jukka Suomela, editor, 33rd International Symposium on Distributed Computing, DISC 2019, October 14-18, 2019, volume 146 of LIPIcs, pages 15:1–15:16, Budapest, Hungary, 2019. Schloss Dagstuhl - Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.DISC.2019.15.
  • [31] Michael Elkin. Distributed exact shortest paths in sublinear time. In Hamed Hatami, Pierre McKenzie, and Valerie King, editors, Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2017, Canada, June 19-23, 2017, pages 757–770, Montreal, QC, 2017. ACM. doi:10.1145/3055399.3055452.
  • [32] Michael Feldmann, Kristian Hinnenthal, and Christian Scheideler. Fast hybrid network algorithms for shortest paths in sparse graphs. In Quentin Bramas, Rotem Oshman, and Paolo Romano, editors, 24th International Conference on Principles of Distributed Systems, OPODIS 2020, December 14-16, 2020, volume 184 of LIPIcs, pages 31:1–31:16, Strasbourg, France (Virtual Conference), 2020. Schloss Dagstuhl - Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.OPODIS.2020.31.
  • [33] Sebastian Forster and Danupon Nanongkai. A faster distributed single-source shortest paths algorithm. In Mikkel Thorup, editor, 59th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2018, October 7-9, 2018, pages 686–697, Paris, France, 2018. IEEE Computer Society. doi:10.1109/FOCS.2018.00071.
  • [34] Silvio Frischknecht, Stephan Holzer, and Roger Wattenhofer. Networks cannot compute their diameter in sublinear time. In Yuval Rabani, editor, Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2012, January 17-19, 2012, pages 1150–1162, Kyoto, Japan, 2012. SIAM. doi:10.1137/1.9781611973099.91.
  • [35] François Le Gall. Further algebraic algorithms in the congested clique model and applications to graph-theoretic problems. In Cyril Gavoille and David Ilcinkas, editors, Distributed Computing - 30th International Symposium, DISC 2016, September 27-29, 2016. Proceedings, volume 9888 of Lecture Notes in Computer Science, pages 57–70, Paris, France, 2016. Springer. doi:10.1007/978-3-662-53426-7_5.
  • [36] Mohsen Ghaffari and Bernhard Haeupler. Distributed algorithms for planar networks I: planar embedding. In George Giakkoupis, editor, Proceedings of the 2016 ACM Symposium on Principles of Distributed Computing, PODC 2016, July 25-28, 2016, pages 29–38, Chicago, IL, USA, 2016. ACM. doi:10.1145/2933057.2933109.
  • [37] Mohsen Ghaffari and Bernhard Haeupler. Distributed algorithms for planar networks II: low-congestion shortcuts, mst, and min-cut. In Robert Krauthgamer, editor, Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2016, January 10-12, 2016, pages 202–219, Arlington, VA, USA, 2016. SIAM. doi:10.1137/1.9781611974331.ch16.
  • [38] Mohsen Ghaffari, Fabian Kuhn, and Hsin-Hao Su. Distributed MST and routing in almost mixing time. In Elad Michael Schiller and Alexander A. Schwarzmann, editors, Proceedings of the ACM Symposium on Principles of Distributed Computing, PODC 2017, July 25-27, 2017, pages 131–140, Washington, DC, USA, 2017. ACM. doi:10.1145/3087801.3087827.
  • [39] Mohsen Ghaffari and Jason Li. Improved distributed algorithms for exact shortest paths. In Ilias Diakonikolas, David Kempe, and Monika Henzinger, editors, Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2018, June 25-29, 2018, pages 431–444, Los Angeles, CA, USA, 2018. ACM. doi:10.1145/3188745.3188948.
  • [40] Mohsen Ghaffari and Jason Li. New distributed algorithms in almost mixing time via transformations from parallel algorithms. In Ulrich Schmid and Josef Widder, editors, 32nd International Symposium on Distributed Computing, DISC 2018, October 15-19, 2018, volume 121 of LIPIcs, pages 31:1–31:16, New Orleans, LA, USA, 2018. Schloss Dagstuhl - Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.DISC.2018.31.
  • [41] Ofer Grossman, Seri Khoury, and Ami Paz. Improved hardness of approximation of diameter in the CONGEST model. In Hagit Attiya, editor, 34th International Symposium on Distributed Computing, DISC 2020, October 12-16, 2020, volume 179 of LIPIcs, pages 19:1–19:16, Virtual Conference, 2020. Schloss Dagstuhl - Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.DISC.2020.19.
  • [42] Kai Han, Zhiming Hu, Jun Luo, and Liu Xiang. RUSH: routing and scheduling for hybrid data center networks. In 2015 IEEE Conference on Computer Communications, INFOCOM 2015, April 26 - May 1, 2015, pages 415–423, Kowloon, Hong Kong, 2015. IEEE. doi:10.1109/INFOCOM.2015.7218407.
  • [43] Vipul Harsh, Sangeetha Abdu Jyothi, Inderdeep Singh, and Philip Brighten Godfrey. Expander datacenters: From theory to practice. CoRR, abs/1811.00212, 2018. URL: http://arxiv.org/abs/1811.00212, arXiv:1811.00212.
  • [44] Monika Henzinger, Sebastian Krinninger, and Danupon Nanongkai. A deterministic almost-tight distributed algorithm for approximating single-source shortest paths. In Daniel Wichs and Yishay Mansour, editors, Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2016, June 18-21, 2016, pages 489–498, Cambridge, MA, USA, 2016. ACM. doi:10.1145/2897518.2897638.
  • [45] Stephan Holzer and Nathan Pinsker. Approximation of distances and shortest paths in the broadcast congest clique. In Emmanuelle Anceaume, Christian Cachin, and Maria Gradinariu Potop-Butucaru, editors, 19th International Conference on Principles of Distributed Systems, OPODIS 2015, December 14-17, 2015, volume 46 of LIPIcs, pages 6:1–6:16, Rennes, France, 2015. Schloss Dagstuhl - Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.OPODIS.2015.6.
  • [46] Stephan Holzer and Roger Wattenhofer. Optimal distributed all pairs shortest paths and applications. In Darek Kowalski and Alessandro Panconesi, editors, ACM Symposium on Principles of Distributed Computing, PODC ’12, July 16-18, 2012, pages 355–364, Funchal, Madeira, Portugal, 2012. ACM. doi:10.1145/2332432.2332504.
  • [47] He Huang, Xiangke Liao, Shanshan Li, Shaoliang Peng, Xiaodong Liu, and Bin Lin. The architecture and traffic management of wireless collaborated hybrid data center network. In Dah Ming Chiu, Jia Wang, Paul Barford, and Srinivasan Seshan, editors, ACM SIGCOMM 2013 Conference, SIGCOMM’13, August 12-16, 2013, pages 511–512, Hong Kong, China, 2013. ACM. doi:10.1145/2486001.2491724.
  • [48] Taisuke Izumi, François Le Gall, and Frédéric Magniez. Quantum distributed algorithm for triangle finding in the CONGEST model. In Christophe Paul and Markus Bläser, editors, 37th International Symposium on Theoretical Aspects of Computer Science, STACS 2020, March 10-13, 2020, volume 154 of LIPIcs, pages 23:1–23:13, Montpellier, France, 2020. Schloss Dagstuhl - Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.STACS.2020.23.
  • [49] Fabian Kuhn and Philipp Schneider. Computing shortest paths and diameter in the hybrid network model. In Yuval Emek and Christian Cachin, editors, PODC ’20: ACM Symposium on Principles of Distributed Computing, August 3-7, 2020, pages 109–118, Virtual Event, Italy, 2020. ACM. doi:10.1145/3382734.3405719.
  • [50] Christoph Lenzen and Boaz Patt-Shamir. Fast routing table construction using small messages: extended abstract. In Dan Boneh, Tim Roughgarden, and Joan Feigenbaum, editors, Symposium on Theory of Computing Conference, STOC’13, June 1-4, 2013, pages 381–390, Palo Alto, CA, USA, 2013. ACM. doi:10.1145/2488608.2488656.
  • [51] Christoph Lenzen and Boaz Patt-Shamir. Fast partial distance estimation and applications. In Chryssis Georgiou and Paul G. Spirakis, editors, Proceedings of the 2015 ACM Symposium on Principles of Distributed Computing, PODC 2015, July 21 - 23, 2015, pages 153–162, Donostia-San Sebastián, Spain, 2015. ACM. doi:10.1145/2767386.2767398.
  • [52] Christoph Lenzen and David Peleg. Efficient distributed source detection with limited bandwidth. In Panagiota Fatourou and Gadi Taubenfeld, editors, ACM Symposium on Principles of Distributed Computing, PODC ’13, Canada, July 22-24, 2013, pages 375–382, Montreal, QC, 2013. ACM. doi:10.1145/2484239.2484262.
  • [53] Jason Li and Merav Parter. Planar diameter via metric compression. In Moses Charikar and Edith Cohen, editors, Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019, June 23-26, 2019, pages 152–163, Phoenix, AZ, USA, 2019. ACM. doi:10.1145/3313276.3316358.
  • [54] Danupon Nanongkai. Distributed approximation algorithms for weighted shortest paths. In David B. Shmoys, editor, Symposium on Theory of Computing, STOC 2014, May 31 - June 03, 2014, pages 565–573, New York, NY, USA, 2014. ACM. doi:10.1145/2591796.2591850.
  • [55] Merav Parter. Distributed planar reachability in nearly optimal time. In Hagit Attiya, editor, 34th International Symposium on Distributed Computing, DISC 2020, October 12-16, 2020, volume 179 of LIPIcs, pages 38:1–38:17, Virtual Conference, 2020. Schloss Dagstuhl - Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.DISC.2020.38.
  • [56] David Peleg. Distributed Computing: A Locality-Sensitive Approach. Society for Industrial and Applied Mathematics, USA, 2000.
  • [57] David Peleg, Liam Roditty, and Elad Tal. Distributed algorithms for network diameter and girth. In Artur Czumaj, Kurt Mehlhorn, Andrew M. Pitts, and Roger Wattenhofer, editors, Automata, Languages, and Programming - 39th International Colloquium, ICALP 2012, July 9-13, 2012, Proceedings, Part II, volume 7392 of Lecture Notes in Computer Science, pages 660–672, Warwick, UK, 2012. Springer. doi:10.1007/978-3-642-31585-5_58.
  • [58] Atish Das Sarma, Stephan Holzer, Liah Kor, Amos Korman, Danupon Nanongkai, Gopal Pandurangan, David Peleg, and Roger Wattenhofer. Distributed verification and hardness of distributed approximation. SIAM J. Comput., 41(5):1235–1265, 2012. doi:10.1137/11085178X.
  • [59] Jeanette P. Schmidt, Alan Siegel, and Aravind Srinivasan. Chernoff-hoeffding bounds for applications with limited independence. SIAM J. Discret. Math., 8(2):223–250, 1995. doi:10.1137/S089548019223872X.
  • [60] Hsin-Hao Su and Hoa T. Vu. Distributed dense subgraph detection and low outdegree orientation. In Hagit Attiya, editor, 34th International Symposium on Distributed Computing, DISC 2020, October 12-16, 2020, volume 179 of LIPIcs, pages 15:1–15:18, Virtual Conference, 2020. Schloss Dagstuhl - Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.DISC.2020.15.
  • [61] Salil P. Vadhan. Pseudorandomness. Found. Trends Theor. Comput. Sci., 7(1-3):1–336, 2012. doi:10.1561/0400000010.
  • [62] Guohui Wang, David G. Andersen, Michael Kaminsky, Konstantina Papagiannaki, T. S. Eugene Ng, Michael Kozuch, and Michael P. Ryan. c-through: part-time optics in data centers. In Shivkumar Kalyanaraman, Venkata N. Padmanabhan, K. K. Ramakrishnan, Rajeev Shorey, and Geoffrey M. Voelker, editors, Proceedings of the ACM SIGCOMM 2010 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, August 30 -September 3, 2010, pages 327–338, New Delhi, India, 2010. ACM. doi:10.1145/1851182.1851222.

Appendix A Preliminaries – Extended

A.1 Definitions

Additional Models. Throughout the paper, we also refer to the 𝖢𝗈𝗇𝗀𝖾𝗌𝗍𝖾𝖽𝖢𝗅𝗂𝗊𝗎𝖾\mathsf{Congested\ Clique} and 𝖡𝗋𝗈𝖺𝖽𝖼𝖺𝗌𝗍𝖢𝗈𝗇𝗀𝖾𝗌𝗍𝖾𝖽𝖢𝗅𝗂𝗊𝗎𝖾\mathsf{Broadcast\ Congested\ Clique} models. Both are synchronous models, where every node can communicate in each round with every other node by sending messages of O(logn)O(\log n) bits. In the 𝖢𝗈𝗇𝗀𝖾𝗌𝗍𝖾𝖽𝖢𝗅𝗂𝗊𝗎𝖾\mathsf{Congested\ Clique} model, the messages between every pair of nodes can be unique, while in the 𝖡𝗋𝗈𝖺𝖽𝖼𝖺𝗌𝗍𝖢𝗈𝗇𝗀𝖾𝗌𝗍𝖾𝖽𝖢𝗅𝗂𝗊𝗎𝖾\mathsf{Broadcast\ Congested\ Clique} model each node sends the same message to all the other nodes.

Similarly to [7], we define a distributive aggregate function and the Aggregate-and-Broadcast problem.

Definition 8 (Aggregate Function).

An aggregate function ff maps a multiset S={x1,,xN}S=\set{x_{1},...,x_{N}} of input values to some value f(S)f(S). An aggregate function ff is called distributive if there is an aggregate function gg such that for any multiset SS and any partition S1,,SS_{1},\cdots,S_{\ell} of SS, it holds that f(S)=g(f(S1),,f(S))f(S)=g(f(S_{1}),...,f(S_{\ell})).
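For instance (an illustration, not taken from [7]), min is a distributive aggregate function with combiner g = min, as the following minimal Python sketch checks:

```python
# An illustration of Definition 8 (not from the paper): f = min is a
# distributive aggregate function, with combiner g = min, since
# min(S) = min(min(S_1), ..., min(S_l)) for any partition of S.

def f(values):            # the aggregate function f on a multiset
    return min(values)

def g(partials):          # the combiner g acting on partial aggregates
    return min(partials)

S = [7, 3, 9, 3, 5]
S1, S2 = S[:2], S[2:]                 # an arbitrary partition of S
assert f(S) == g([f(S1), f(S2)])      # distributivity of f
```

Sum and max are likewise distributive, whereas, e.g., the median is not.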

Definition 9 (Aggregate-and-Broadcast Problem).

In the Aggregate-and-Broadcast Problem we are given a distributive aggregate function ff and each node uu stores exactly one input value val(u)val(u). The goal is to let every node learn f({val(u)}uV)f(\Set{val(u)}_{u\in V}).

A.2 Mathematical Tools

Similarly to [49], we will use families of kk-wise independent functions.

Definition 10 (Family of kk-wise Independent Random Functions).

For finite sets A, B, let ℋ = {h: A → B} be a family of hash functions. Then ℋ is called k-wise independent if, for a uniformly random function h ∈ ℋ and for any k distinct keys {a_i}_{i=1}^k ⊆ A, the values b_i = h(a_i) are independent and uniformly distributed random variables in B.

In particular, we are interested in a hash function on which nodes can agree within a small amount of communication.

Claim A.1 (Seed).

[61][49, Lemma D.1] For A={0,1}aA=\set{0,1}^{a} and B={0,1}bB=\set{0,1}^{b}, there is a family of kk-wise independent hash functions ={h:AB}\mathcal{H}=\set{h\colon A\mapsto B} such that selecting a function from \mathcal{H} requires kmax{a,b}k\cdot\max\set{a,b} random bits and computing h(x)h(x) for any xAx\in A can be done in poly(a,b,k)\operatorname{poly}(a,b,k) time.
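As a concrete illustration of such a family, the following is a minimal sketch of the classical polynomial construction over a prime field (an assumed construction for illustration, not necessarily the exact one referenced from [61]): h(x) = Σ_{i<k} c_i·x^i mod p, whose seed is the k uniformly random coefficients c_0, …, c_{k−1}.

```python
import random

# A sketch of a classical k-wise independent family (an illustrative
# construction): a uniformly random polynomial of degree < k over the prime
# field F_p. Its evaluations at any k distinct keys are independent and
# uniform over F_p, and the seed is the k coefficients, i.e., roughly
# k * ceil(log p) random bits.

def sample_seed(k, p, rng=random):
    return [rng.randrange(p) for _ in range(k)]

def h(x, seed, p):
    # Horner evaluation of the degree-(k-1) polynomial at x, modulo p.
    acc = 0
    for c in reversed(seed):
        acc = (acc * x + c) % p
    return acc

p = 2**31 - 1                 # a prime at least as large as the key space A
seed = sample_seed(k=4, p=p)  # a 4-wise independent hash function
print(h(12345, seed, p))
```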

Unlike [49], we do not necessarily apply the hash function h ∈ ℋ, sampled from the family of k-wise independent random functions, to distinct sets of arguments, but rather to a multiset of arguments in which each argument appears at most O~(1) times. Hence, we show in Claim A.2 (Conflicts) a property similar to [49, Lemma D.2], which bounds the number of collisions.

Claim A.2 (Conflicts).

There exists a value k = Θ~(1) such that the following holds for sufficiently large n. Given a function h: A → B (with |A|, |B| = Θ~(n)) sampled from a family ℋ of k-wise independent hash functions, and a multiset of keys A′ = {a_i}_{i=1}^{O~(n)} in which each key appears at most O~(1) times, each value b ∈ B appears in the multiset of values B′ = h(A′) = {h(a_i)}_{i=1}^{O~(n)} at most O~(1) times w.h.p.

Proof of Claim A.2.

Split A′ greedily into O~(1) sets of distinct keys {A′_j}_{j=1}^{O~(1)}. Consider some A′_j. Let ℋ be a family of k-wise independent hash functions, for some k to be determined. By the definition of h ∈ ℋ, the random variables B′_j = h(A′_j) = {h(a): a ∈ A′_j} ⊆ B′ are k-wise independent and uniformly distributed in B. Thus, the probability to sample some particular b ∈ B is 1/|B| = Θ~(1/n). By a Chernoff Bound for variables with bounded independence [59, Theorem 2] and a union bound over all b ∈ B and all A′_j, there is a large enough k = Θ~(1) such that each value b ∈ B appears in B′_j at most O~(1) times w.h.p. Thus, each value b appears in B′ = ⋃_{j=1}^{O~(1)} B′_j at most O~(1)·O~(1) = O~(1) times w.h.p. ∎

A.3 Distance Tools

Claim A.3 (APSP using kk-nearest and MSSP).

(see e.g. [14]) Let G = (V,E,w) be a weighted graph, let c be a constant, let k be a value, and let A be a set of nodes, each marked independently with probability at least (c+1)·log(n)/k.

With probability at least 1 − n^{−c}, it holds for every node v ∈ V that N^k(v) ∩ A ≠ ∅, where N^k(v) denotes the k nodes nearest to v. Denote by p(s) ∈ A ∩ N^k(s) one of the closest nodes to s in A ∩ N^k(s). Denote by d̃: A × V → ℝ an α-approximation of the distances from A to the other nodes, for some α. With probability at least 1 − n^{−c}, for any pair of nodes s, t ∈ V, it holds that d̂(s,t) = d(s,p(s)) + d̃(p(s),t) is a 3α-approximate weighted distance between s and t.
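For intuition, the approximation factor follows from the triangle inequality. The derivation below is a sketch under the assumptions that d ≤ d̃ ≤ α·d with α ≥ 1, and that d(s,p(s)) ≤ d(s,t), which holds whenever t ∉ N^k(s) (distances within N^k(s) are assumed to be handled exactly by the k-nearest computation):

d̂(s,t) = d(s,p(s)) + d̃(p(s),t) ≤ d(s,p(s)) + α·(d(p(s),s) + d(s,t)) = (1+α)·d(s,p(s)) + α·d(s,t) ≤ (1+2α)·d(s,t) ≤ 3α·d(s,t),

and, in the other direction, d̂(s,t) ≥ d(s,p(s)) + d(p(s),t) ≥ d(s,t) by the triangle inequality.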

Appendix B The 𝖠𝗇𝗈𝗇𝗒𝗆𝗈𝗎𝗌𝖢𝖺𝗉𝖺𝖼𝗂𝗍𝖺𝗍𝖾𝖽\mathsf{Anonymous\ Capacitated} Model – Extended

Section B.1 contains proofs of various technical tools for routing information in the 𝖠𝖢(𝖼) model – we note that if taken as black boxes, its contents can be skipped without harming the understanding of the main contributions of this section. Then, we show how to build carrier configurations and how to work with them in Section B.2. We use the sparse matrix multiplication algorithm of Theorem 3.1 to construct hopsets in Section B.3, which eventually allows us to obtain our fast algorithms for SSSP and MSSP in Section B.4 and Section B.5, respectively.

B.1 General Tools

We show basic tools which are useful in the 𝖠𝖢(𝖼)\mathsf{AC(c)} model, for overcoming the anonymity challenges, as well as for solving problems related to communication with limited bandwidth.

We introduce the following notation. Given a set of nodes WVW\subseteq V, denote by Tokens(W)Tokens(W) the set of pairs of communication tokens and identifiers of the nodes in WW.

B.1.1 Basic Message Routing

Lemma B.1 (Routing).

Given a set of messages and a globally known value kk, if each node desires to send at most kk messages and knows the communication tokens of their recipients, and each node is the recipient of at most kk messages, then it is possible to deliver the messages in O~(k/c+1){\tilde{{O}}}(k/c+1) rounds of the 𝖠𝖢(𝖼)\mathsf{AC(c)} model, w.h.p.

Proof of Lemma B.1.

Denote by mvm_{v} the messages that node vv desires to send. We proceed in Θ(klog2n/c)\Theta{(\left\lceil k\log^{2}{n}/c\right\rceil)} rounds, where in each round each node vv samples messages from mvm_{v} that are not yet sent, independently with probability Θ(cklogn)\Theta{(\frac{c}{k\log n})}, and sends them to their destinations. The probability that some message is not sampled during this procedure is (1Θ(cklogn))Θ(klog2n/c)=O(1/polyn)\left(1-\Theta{(\frac{c}{k\log n})}\right)^{\Theta{(\left\lceil k\log^{2}{n}/c\right\rceil)}}={O}(1/\operatorname{poly}{n}). Thus, by applying a union bound over all messages, each message is sent w.h.p.

For any given round, the probability for a specific message to be sent or received by some node is at most Θ(c/(k·log n)) (it is zero for rounds after the one in which it has been sent). Thus, due to the independence between messages, by a Chernoff Bound and a union bound over senders, receivers, and rounds, in each round each node sends or receives at most c messages w.h.p. ∎
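The following minimal simulation sketch (with illustrative constants, a single sender, and abstracted delivery – simplifying assumptions, not the distributed protocol itself) mirrors the sampling schedule used in this proof:

```python
import math, random

# A simulation sketch of the schedule in the proof of Lemma B.1: in each of
# Theta(k log^2 n / c) rounds, every still-unsent message is sampled
# independently with probability ~ c/(k log n) and delivered.

def route(k, c, n, rng=random):
    p = min(1.0, c / (k * math.log(n)))          # per-round sampling prob.
    rounds = math.ceil(k * math.log(n) ** 2 / c) # Theta(k log^2 n / c)
    unsent = set(range(k))
    for _ in range(rounds):
        sent_now = {m for m in unsent if rng.random() < p}
        unsent -= sent_now
    return len(unsent)                           # 0 w.h.p.

print(route(k=200, c=10, n=2**16))
```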

B.1.2 Anonymous Communication Primitives

Definition 11 (Communication Tree).

Given a graph G = (V,E) and a node s ∈ V, a communication tree rooted at s in G is an O~(1)-depth directed tree which is rooted at s and spans V, such that each node has at most 2 edges directed away from it. A communication tree over W ⊆ V satisfies the conditions above, yet spans only W rather than V.

A communication tree rooted at s allows efficiently broadcasting messages from s to the entire graph, as well as computing aggregation functions.

We show it is possible to build many communication trees in parallel.

Lemma B.2 (Constructing Communication Trees).

Given a set of nodes SS, it is possible to construct for each sSs\in S a communication tree rooted at ss, TsT_{s}, such that each node in the graph knows the edges incident to it in each tree. This takes O~(|S|/c+1){\tilde{{O}}}(|S|/c+1) rounds of the 𝖠𝖢(𝖼)\mathsf{AC(c)} model, w.h.p.

Proof of Lemma B.2.

Consider the task of constructing TsT_{s} for a single node ss. Node ss randomly samples two nodes, v1,v2v_{1},v_{2}, and tells them it is their parent in TsT_{s}. Nodes v1,v2v_{1},v_{2} each sample two nodes and repeat this process. At each step, a node viv_{i} might sample a node vjv_{j} which is already in TsT_{s}. In such a case, vjv_{j} rejects the demand of viv_{i} to add it as a child. Thus, when building the next level of the tree, we repeat the choosing step O~(1)\tilde{O}(1) times, ensuring, w.h.p., that each node has two nodes as its children. Notice that this ensures, w.h.p., that at every level in TsT_{s}, except for the last, each node has exactly two children, and thus the depth of TsT_{s} is O~(1)\tilde{O}(1) w.h.p. Thus, in O~(1)\tilde{O}(1) rounds, a communication tree from ss which spans the entire graph is constructed.

In each round, every node sends and receives O~(1) messages, w.h.p., thus we can perform this for O~(c) nodes in parallel, taking O~(|S|/c + 1) rounds overall to build such trees for all s ∈ S. ∎
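A centralized sketch of this random doubling process (an illustration that abstracts away the token exchange and the O~(1) retries per level; the frontier simply resamples until it finds nodes outside the tree):

```python
import random

# A centralized sketch (illustration only) of building one communication
# tree T_s: every frontier node samples until it acquires two children not
# yet in the tree, so each level roughly doubles and the depth is O(log n).

def build_tree(n, s, rng=random):
    children = {v: [] for v in range(n)}
    in_tree = {s}
    frontier = [s]
    while len(in_tree) < n:
        next_frontier = []
        for v in frontier:
            while len(children[v]) < 2 and len(in_tree) < n:
                u = rng.randrange(n)       # sample a uniformly random node
                if u not in in_tree:       # u rejects if already in T_s
                    children[v].append(u)
                    in_tree.add(u)
                    next_frontier.append(u)
        frontier = next_frontier
    return children

tree = build_tree(n=1000, s=0)
```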

Lemma B.3 (Message Doubling on Communication Trees).

Let WVW\subseteq V be a set of nodes, and TsT_{s} a communication tree rooted at sWs\in W, which spans WW. It is possible for ss to broadcast a set of MM messages to the entire set WW within O~(|M|/c+1)\tilde{O}(|M|/c+1) rounds of the 𝖠𝖢(𝖼)\mathsf{AC(c)} model, w.h.p., while utilizing only the communication bandwidth of the nodes in WW. Likewise, it is possible to compute kk aggregation functions on values of the nodes in WW, in O~(k/c+1)\tilde{O}(k/c+1) rounds.

Proof of Lemma B.3.

On the first round, ss sends to its children in TsT_{s}, ss_{\ell} and srs_{r}, some set Q1MQ_{1}\subseteq M, where |Q1|=c/2|Q_{1}|=c/2. On the second round, ss_{\ell} and srs_{r} forward Q1Q_{1} to their children, while ss sends them some other such set Q2Q_{2}. This continues for |M|/c+O~(1)|M|/c+\tilde{O}(1) rounds. Notice that every node sends and receives at most cc messages per round.

To solve kk aggregation functions, reversing the flow of messages in the above algorithm suffices. ∎
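As a quick illustration of the pipelining argument in the proof above (a sketch; the batch size c/2 and the tree depth are taken as parameters):

```python
import math

# A sketch of the round count in Lemma B.3: the root injects a new batch of
# c/2 messages every round, and the last batch needs depth(T) more rounds to
# reach the leaves, giving ceil(|M|/(c/2)) + depth(T) = O(|M|/c + depth(T)).

def pipelined_broadcast_rounds(num_messages, c, depth):
    batches = math.ceil(num_messages / (c // 2))
    return batches + depth

print(pipelined_broadcast_rounds(num_messages=10_000, c=100, depth=20))  # 220
```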

Lemma B.4 (Synchronization).

In the 𝖠𝖢(𝖼)\mathsf{AC(c)} model, given a communication tree TT rooted at some node ss and assuming that every node vv has a value val(v)val(v), it is possible in O~(1)\tilde{O}(1) rounds to ensure that vv knows the sum prev(v)prev(v) of all the values val(u)val(u) of the nodes uu which come before it in the in-order traversal of TT. Further, it is possible to solve kk instances of this problem in parallel in O~(k/c+1)\tilde{O}(k/c+1) rounds.

Proof of Lemma B.4.

We treat only a single value; handling k values in parallel follows since, as in the single-value case, each node sends and receives only a constant number of messages per round. For a node v, denote by v_ℓ, v_r its left and right children in T, respectively, and by v_p its parent.

Starting from the leaves of T, sum the values upwards towards the root. To clarify, a leaf v sends val(v) to v_p. Denote by subtreeVal(v) the sum of all of the values of the nodes in the subtree of T rooted at v. Once v receives subtreeVal(v_ℓ) and subtreeVal(v_r) from its children, it sends up to v_p the sum subtreeVal(v_ℓ) + val(v) + subtreeVal(v_r).

Then, the root ss of TT sets prev(s)=subtreeVal(s)prev(s)=subtreeVal(s_{\ell}). Further, ss sends to ss_{\ell} the value zero, and sends to srs_{r} the sum prev(s)+val(s)prev(s)+val(s). Then, every node vv, upon receiving a value ii from vpv_{p}, sets prev(v)=i+subtreeVal(v)prev(v)=i+subtreeVal(v_{\ell}), forwards the value ii to vv_{\ell}, and sends prev(v)+val(v)prev(v)+val(v) to vrv_{r}. This algorithm takes O~(1)\tilde{O}(1) rounds and achieves the desired result. ∎
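The two passes of this proof can be summarized by the following sequential sketch (an illustration; the tree is represented as a dict mapping each node to its (left, right) children, with None marking a missing child):

```python
# A sequential sketch (illustration) of Lemma B.4: an upward pass computes
# subtreeVal(v), then a downward pass computes prev(v), the sum of val(u)
# over all u preceding v in the in-order traversal of T.

def subtree_sums(tree, val, v, sub):
    if v is None:
        return 0
    left, right = tree.get(v, (None, None))
    sub[v] = subtree_sums(tree, val, left, sub) + val[v] + \
             subtree_sums(tree, val, right, sub)
    return sub[v]

def prev_values(tree, val, v, incoming, sub, prev):
    if v is None:
        return
    left, right = tree.get(v, (None, None))
    prev[v] = incoming + (sub[left] if left is not None else 0)
    prev_values(tree, val, left, incoming, sub, prev)            # forward i
    prev_values(tree, val, right, prev[v] + val[v], sub, prev)   # prev+val

# Usage: in-order traversal is a, s, b, so prev == {'a': 0, 's': 1, 'b': 3}.
tree = {'s': ('a', 'b'), 'a': (None, None), 'b': (None, None)}
val = {'s': 2, 'a': 1, 'b': 4}
sub, prev = {}, {}
subtree_sums(tree, val, 's', sub)
prev_values(tree, val, 's', 0, sub, prev)
```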

Lemma B.5 (Broadcasting).

Let MM be a set of messages distributed across the nodes arbitrarily. It is possible to broadcast this set of messages to all nodes in O~(|M|/c+1)\tilde{O}(|M|/c+1) rounds of the 𝖠𝖢(𝖼)\mathsf{AC(c)} model, w.h.p.

Proof of Lemma B.5.

Construct a communication tree T from the node with ID 1, and then send down T the communication token of node 1, which implies that from now on every node can communicate with node 1. Using Lemma B.1 (Routing), node 1 receives all of M in O~(|M|/c + 1) rounds.

Once node 1 knows all of MM, we send MM down along TT in O~(|M|/c+1)\tilde{O}(|M|/c+1) rounds, using Lemma B.3. ∎

Reversing the flow of messages in the proof of Lemma B.5 proves Corollary B.6.

Corollary B.6 (Aggregation).

It is possible to solve c aggregation problems in O~(1) rounds of the 𝖠𝖢(𝖼) model. That is, if every node v has a vector of values {v_1, …, v_c}, and we denote, for i ∈ [c], S_i = {v_i | v ∈ V}, then given c aggregation functions f_1, …, f_c, it is possible to ensure within O~(1) rounds that all the nodes know the values f_1(S_1), …, f_c(S_c).

B.1.3 Communication Tools Within Groups of Nodes

We show the following communication tools related to allowing subsets of nodes in the graph to communicate with one another.

Lemma B.7 (Grouping) allows grouping together disjoint sets of nodes such that each node in a given set knows all the communication tokens of the other nodes in the set. Corollary B.9 (Group Broadcasting and Aggregating) allows a single node in the set to quickly broadcast messages to, or perform aggregation operations on, the set.

Lemma B.7 (Grouping).

Let V1,,VkVV_{1},\dots,V_{k}\subseteq V be disjoint sets where 1|Vi|p1\leq|V_{i}|\leq p, for some pp, and where every node vv knows if and to which set it belongs. It is possible in O~(k/c+p/c+1){\tilde{{O}}}(k/c+p/c+1) rounds of the 𝖠𝖢(𝖼)\mathsf{AC(c)} model to ensure that, for each ii, every node vViv\in V_{i} knows Tokens(Vi)Tokens(V_{i}) w.h.p.

Proof of Lemma B.7.

Due to the definition of the sets, we know that k ≤ n. Thus, the nodes [k] (with identifiers 1, …, k) broadcast Tokens([k]) using Lemma B.5 (Broadcasting) in O~(k/c + 1) rounds. Using Lemma B.1 (Routing), for each i ∈ [k], the nodes in V_i send Tokens(V_i) to node i, in O~(p/c + 1) rounds.

Fix i ∈ [k]. Node i performs message duplication to tell all nodes in V_i the set Tokens(V_i), as follows. Node i chooses some m_i ∈ V_i, and tells it, within O(p/c + 1) rounds, Tokens(V_i). Now, i and m_i each proceed to tell another node in V_i all this information, doubling the number of nodes which know Tokens(V_i) to four. This continues for O~(1) iterations, each taking O(p/c + 1) rounds. ∎

Claim B.8 (Group Communication Tree Construction).

Given a set of nodes W, where every v ∈ W knows Tokens(W), it is possible to build a communication tree T over W such that the in-order traversal of the tree imposes any desired ordering of the nodes in W. This is done using local computation only and requires no communication.

Proof of Claim B.8.

Since each node in WW knows Tokens(W)Tokens(W), this simply entails having every node locally decide which other nodes in WW are its parent, left, and right children in the output tree, TT. ∎

Corollary B.9 (Group Broadcasting and Aggregating).

Given a set of nodes W, where every v ∈ W knows Tokens(W), it is possible for a single node s ∈ W to broadcast a set of M messages to the entire set W within O~(|M|/c + 1) rounds of the 𝖠𝖢(𝖼) model, w.h.p., while utilizing only the communication bandwidth of the nodes in W. Likewise, it is possible to compute k aggregation functions on values of the nodes in W, in O~(k/c + 1) rounds.

Lemma B.10 (Group Multicasting) extends Corollary B.9 (Group Broadcasting and Aggregating) in order to allow nodes to efficiently send multicast messages within their given set.

Lemma B.10 (Group Multicasting).

Given a set of nodes WW, where every vWv\in W knows Tokens(W)Tokens(W), and a value kk such that every node vWv\in W desires to multicast at most kk messages to mvm_{v} nodes in WW, where each node uWu\in W is the destination of messages originating in at most one node, it is possible to perform the communication task in O~(k/c+m/c+1)\tilde{O}(k/c+m/c+1) rounds of the 𝖠𝖢(𝖼)\mathsf{AC(c)} model, where mm is an upper bound for all mvm_{v}.

Proof of Lemma B.10.

Assume, w.l.o.g., that the nodes in W have identifiers [|W|]. Then, we use Claim B.8 (Group Communication Tree Construction) to construct a communication tree T over W, with the demand that the in-order traversal of the tree will output the nodes of W in ascending order from 1 to |W|. Then, execute, in O~(1) rounds, the algorithm given by Lemma B.4 (Synchronization) over T in order to ensure that every node i ∈ W knows the sum s_i = Σ_{j∈[i−1]} m_j.

Now, node ii tells node si+1s_{i}+1 the values sis_{i} and mim_{i}. Then, node ii tells these values to si+2s_{i}+2, while node si+1s_{i}+1 tells them to node si+3s_{i}+3, and we continue in this fashion for O~(1)\tilde{O}(1) rounds until all nodes in [si+1,si+mi][s_{i}+1,s_{i}+m_{i}] know the values sis_{i} and mim_{i}. We now call these nodes the gateways of node ii. Notice that every node has distinct gateways – that is, no node is a gateway of more than one node.

Node ii now sends all of its kk messages to its gateways. To do so, it tells its first gateway all of its kk messages in O(k/c+1)O(k/c+1) rounds. Then, it tells its second gateway, while the first gateway tells the third gateway. This continues for O~(1)\tilde{O}(1) iterations, each taking O(k/c+1)O(k/c+1) rounds, until all the gateways of node ii know all the messages of ii. Next, in O~(mi/c+1)\tilde{O}(m_{i}/c+1) rounds, node ii sequentially tells its jthj^{th} gateway the identifier of the jthj^{th} node which is set to receive the multicast messages from ii. Finally, every jthj^{th} gateway forwards the kk messages to the target which ii tells it to send the messages to, in O(k/c+1)O(k/c+1) rounds. ∎

B.2 Carrier Configurations

Figure 1: The Carrier Configuration Distributed Data Structure
In this example, k = 2, C_v^out = {v_1, v_2, v_3, v_4}, C_u^in = {u_1, u_2, u_3}. The two arrays denote which edges v and u have, with a checkmark indicating the existence of an edge. The node u_1 holds information about u and the first four nodes. That is, it knows that there are edges from the first two nodes to u, and that there are no edges from the following two nodes to u. Notice that in this case v_2 and u_1 both hold the edge e = (v,u) and thus will know its weight, direction, the communication tokens of v and u, and the communication tokens of each other (v_2 and u_1). Further, v, u have communication trees (not shown), which allow them to perform broadcast and aggregate operations on all of C_v^out, C_u^in, respectively.
Definition 12 (Carrier Configuration).

Given a set of nodes VV, a data structure CC is a Carrier Configuration holding a graph G=(V,E)G=(V,E) with average degree k=|E|/|V|k=|E|/|V|, if for every node vVv\in V the following hold:  
 
Carrier Node Allocations

  1.

    Cvout,CvinVC_{v}^{out},C_{v}^{in}\subseteq V are the outgoing and incoming carrier nodes of vv, where |Cvout|=degout(v)/k|C_{v}^{out}|=\lceil\deg^{out}(v)/k\rceil, |Cvin|=degin(v)/k|C_{v}^{in}|=\lceil\deg^{in}(v)/k\rceil.

  2.

    vv is in at most ζlogn\zeta\log n sets CioutC_{i}^{out} and ζlogn\zeta\log n sets CjinC_{j}^{in}, for a constant ζ\zeta, and knows which sets it is in.

Data Storage

  3.

    An edge eEe\in E is always stored alongside its weight and direction.

  4.

    For each uCvoutu\in C_{v}^{out}, there exists an interval Iu[n]I_{u}\subseteq[n], such that uu knows all of the edges directed away from vv and towards nodes with IDs in the interval IuI_{u}, and there are at most kk such edges. It further holds that the intervals {Iu|uCvout}\{I_{u}\ |\ u\in C_{v}^{out}\} partition [n][n]. Similar constraints hold for CvinC_{v}^{in}.

  5.

    Node vv knows, for each uCvoutCvinu\in C_{v}^{out}\cup C_{v}^{in}, the two delimiters of the interval IuI_{u}.

Communication Structure

  6.

For each v, the nodes in {v} ∪ C_v^in are connected by the communication tree T_v^in, implying that each node knows its parent and children in the tree. The same holds for the nodes in {v} ∪ C_v^out and the tree T_v^out.

The definition of the data structure is compatible with both directed and undirected graphs, where for undirected graphs each edge appears in both directions. We also use carrier configurations for holding matrices, where it can be thought that every finite entry at indices (i,j)(i,j) in a matrix represents an edge from node ii to jj. Each node ii stores the finite entries of row ii as edges outgoing from ii, and the finite entries of column ii as edges incoming to ii.
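As a local illustration of Items 4 and 5, the following sketch (an assumed representation, with a sparse row stored as a Python dict) partitions the identifier space [1, n] into intervals, each containing at most k finite entries, one interval per carrier:

```python
# A sketch of the interval partition in Item 4 (illustration): split [1, n]
# into intervals, each containing at most k of the row's finite entries, so
# each carrier u stores row[I_u] with at most k edges.

def carrier_intervals(row, n, k):
    # row: dict mapping column id (1..n) -> finite entry; k: per-carrier load.
    cols = sorted(row)
    intervals, lo = [], 1
    while cols:
        chunk, cols = cols[:k], cols[k:]
        hi = cols[0] - 1 if cols else n         # extend up to the next chunk
        intervals.append((lo, hi))
        lo = hi + 1
    if not intervals:
        intervals.append((1, n))
    return intervals                            # the intervals partition [1, n]

print(carrier_intervals({2: 0.5, 5: 1.0, 9: 2.0}, n=10, k=2))  # [(1, 8), (9, 10)]
```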

In order to use carrier configurations in the 𝖠𝖢(𝖼)\mathsf{AC(c)} model, we must slightly extend the definition in order to address the usage of communication tokens. Thus, we present the following definition for 𝖠𝖢(𝖼)\mathsf{AC(c)} Carrier Configurations, to which we often refer simply as ‘carrier configurations’.

Definition 13 (𝖠𝖢(𝖼)\mathsf{AC(c)} Carrier Configuration).

Given a set of nodes V, a data structure C is an 𝖠𝖢(𝖼) carrier configuration holding a graph G = (V,E) with average degree k = |E|/|V|, if it is a Carrier Configuration and, additionally, for every node v ∈ V the following hold:

  1.

    Node vv knows Tokens(Cvin)Tokens(C_{v}^{in}) and Tokens(Cvout)Tokens(C_{v}^{out}).

  2.

    Node uu knows the communication tokens and identifiers of each vv such that uCvinu\in C_{v}^{in} or uCvoutu\in C_{v}^{out}.

  3.

    Node uu knows the communication tokens of its parent and children in each communication tree that it belongs to as part of the data structure.

  4.

    An edge e=(u,v)e=(u,v) is always stored alongside the communication tokens of uu and vv.

  5.

    Every uCvoutu\in C_{v}^{out} knows for every edge e=(v,w)e=(v,w) which it holds, the communication token of node uCwinu^{\prime}\in C_{w}^{in} which also holds ee. Similarly, uu^{\prime} knows the communication token of uu.

Throughout this entire section, the term ‘carrier configuration’ refers to Definition 13 (𝖠𝖢(𝖼) Carrier Configuration), unless otherwise specified.

B.2.1 Initialization

We show how to construct a carrier configuration, given that the edges of the graph are initially known to the nodes incident to them. As the stages taken during the construction can be partially reused in other algorithms which we show, we break up the construction into two statements – Lemma B.11 (Initialize Carriers) creates an empty carrier configuration by allocating the carrier node sets and creating communication trees spanning them, and Lemma B.12 (Populate Carriers) transfers the data from nodes to their carrier nodes.

Lemma B.11 (Initialize Carriers).

Given a graph G = (V,E), with k = |E|/|V|, and Δ the maximal degree in G, where each node initially knows only deg^in_G(v), deg^out_G(v) (but not the edges incident to it), it is possible to assign each node v ∈ V sets C_v^in, C_v^out which satisfy Items 1, 2 and 6 of Definition 12 and Items 1, 2 and 3 of Definition 13, in O~(Δ/(k·c) + 1) rounds, w.h.p.

Note: We do not assume that kk and Δ\Delta are originally known to all the nodes.

Proof of Lemma B.11.

We perform two operations in this proof. First, we allocate the carrier node sets. Then, we construct communication trees across them. We show the case of outgoing carrier nodes, CvoutC_{v}^{out}, and note that the case of CvinC_{v}^{in} is symmetric.

Carrier Allocations: We start by computing the values k = |E|/|V| and Δ, using Corollary B.6 (Aggregation), in O~(1) rounds. Each node v selects C_v^out by sending its communication token and identifier to ⌈deg^out(v)/k⌉ random nodes, and each node which v reaches replies to v with its communication token and identifier. The expected number of times a node u is picked as a carrier node is at most (1/n)·Σ_{v∈V} ⌈deg^out(v)/k⌉ ≤ (2|E|/k + n)/n = 3, and thus, by applying a Chernoff Bound, there exists a constant ζ such that each node is picked by at most ζ·log n (not necessarily distinct) nodes for their carrier node sets, w.h.p. This concludes the creation of the sets themselves, and satisfies Items 1 and 2 of both Definitions 12 and 13.

The round complexity of this step is O~(Δ/(kc)+1)\tilde{O}(\Delta/(k\cdot c)+1), as each node vv initially sends O(deg(v)/k+1)O(\deg{(v)}/k+1) messages, and then replies to the at most O~(1)\tilde{O}(1) nodes which chose it for their carrier configuration sets.

Communication Trees: Node v locally builds a balanced binary tree T_v which spans C_v^out, and sends to each u ∈ C_v^out the communication tokens of its parent and children in T_v, taking O~(|C_v^out|/c + 1) = O~(Δ/(k·c) + 1) rounds w.h.p. Notice that T_v is a communication tree (Definition 11), as it is of depth O(log n), and thus we satisfy Item 6 of Definition 12 and Item 3 of Definition 13. ∎
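The carrier-allocation step above can be summarized by the following centralized sketch (an illustration; the token exchange is omitted, and the returned maximal load is the quantity bounded by ζ·log n in the proof):

```python
import math, random

# A centralized sketch (illustration) of the allocation step: every node v
# picks ceil(deg_out(v)/k) uniformly random carriers. Since the expected
# number of times any node is picked is O(1), a Chernoff bound gives a
# maximal load of O(log n) w.h.p.

def allocate_carriers(deg_out, k, rng=random):
    n = len(deg_out)
    load = [0] * n
    carriers = {}
    for v, d in enumerate(deg_out):
        picks = [rng.randrange(n) for _ in range(math.ceil(d / k))]
        carriers[v] = picks
        for u in picks:
            load[u] += 1
    return carriers, max(load)

deg_out = [random.randrange(1, 50) for _ in range(1000)]
k = sum(deg_out) // len(deg_out)               # the average degree
_, max_load = allocate_carriers(deg_out, k)
print(max_load)                                # O(log n) w.h.p.
```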

Now, we assume that we are given a carrier configuration which is still incomplete and only satisfies the conditions from the previous statement, and we complete it to a proper carrier configuration.

Lemma B.12 (Populate Carriers).

Let G = (V,E) be a graph where each node v knows all the edges incident to it and the communication tokens of all of its neighbors. Assume that we have a carrier configuration C which is currently in-construction and satisfies all of the properties of a carrier configuration, except for Items 3, 4 and 5 of Definition 12 and Items 4 and 5 of Definition 13. Then, it is possible, within O~(Δ/c + 1) rounds, where Δ is the maximal degree in the graph, to reach a state where C satisfies all of the properties of a carrier configuration, w.h.p.

Proof of Lemma B.12.

We show the procedure for C_v^out, and note that the case of C_v^in is symmetric.

Node v partitions the identifier space [n] into |C_v^out| intervals, I_v^1, …, I_v^{|C_v^out|}, such that for every such interval I_v^i, the number of edges directed from v to nodes with identifiers in I_v^i is at most k. Denote by δ_v^i the edges from v to nodes with identifiers in I_v^i. For each node u ∈ C_v^out, node v assigns u a unique interval I_v^{i_u}, and sends to u the delimiters of the interval as well as all the edges in δ_v^{i_u}. Every edge is sent along with its weight, direction, and the communication tokens and identifiers of both of its endpoints.

The above procedure satisfies Items 3, 4 and 5 of Definition 12 and Item 4 of Definition 13. We proceed to analyze the round complexity of this step. Clearly, every node v desires to send at most O(Δ) messages. To bound the number of messages each node receives, recall that by Item 2, each node v is a carrier node of at most O~(1) nodes, and the number of messages it receives on behalf of each of them is at most k ≤ Δ. Thus, each node desires to receive O~(Δ) messages. Therefore, by Lemma B.1 (Routing), this stage can be executed in O~(Δ/c + 1) rounds w.h.p.

Finally, we need to satisfy Item 5 of Definition 13. We assume that every node v followed the above procedure to construct both sets C_v^out and C_v^in, which satisfy all properties except for this one, and now we show how, at once, this property can be satisfied for both C_v^out and C_v^in. For every node v, and every edge e = (v,w) for some w, let u ∈ C_v^out be the node holding e; then node u asks node w which node x ∈ C_w^in holds e. Node w, which knows this information due to the above, replies to u with the answer. As each node carries O~(k) ≤ O~(Δ) edges, and each node receives Δ requests, one for each of its edges, by Lemma B.1 (Routing) it takes an additional O~(Δ/c + 1) rounds. ∎

Applying Lemma B.11 (Initialize Carriers), followed by Lemma B.12 (Populate Carriers), directly gives the following.

Lemma B.13 (Initialize Carrier Configuration).

Given a graph G=(V,E)G=(V,E), where each node vv knows all the edges incident to it and the communication tokens of all of its neighbors in GG, it is possible, within O~(Δ/c+1)\tilde{O}(\Delta/c+1) rounds, where Δ\Delta is the maximal degree in the graph, to reach a state where GG is held in a carrier configuration CC, w.h.p.

B.2.2 Basic Tools

We show a basic communication tool within carrier configurations.

Lemma B.14 (Carriers Broadcast and Aggregate).

Let G=(V,E)G=(V,E) be a graph held in a carrier configuration CC. In parallel for all nodes, every vVv\in V can broadcast cc messages to all the nodes in CvinC_{v}^{in} and CvoutC_{v}^{out}, as well as solve cc aggregation tasks over CvinC_{v}^{in} and CvoutC_{v}^{out}. This requires O~(1)\tilde{O}(1) rounds.

Proof of Lemma B.14.

Due to Item 6 of Definition 12 and Item 3 of Definition 13, there is a communication tree spanning each of C_v^in and C_v^out, and every carrier node knows the communication tokens of its parent and children in the tree. Further, since each node is a member of at most O~(1) sets of carrier nodes, it is possible to apply Lemma B.3 (Message Doubling on Communication Trees) simultaneously across all the communication trees in the carrier configuration, in O~(1) rounds, proving the claim. ∎

We show the following helpful statement which enables nodes to query a carrier configuration and return to the classical state in which edges are known by the nodes incident to them.

Lemma B.15 (Learn Carried Information).

Given a graph G=(V,E)G=(V,E) with average degree k=|E|/|V|k=\left|E\right|/\left|V\right| held in a carrier configuration CC, it is possible for each node vv to learn all edges adjacent to it in GG in O~(Δ/c+1){\tilde{{O}}}(\Delta/c+1) rounds w.h.p., where Δ\Delta is the maximal degree in GG. It is possible to invoke this procedure for only outgoing or incoming edges separately, requiring O~(Δout/c+1){\tilde{{O}}}(\Delta_{out}/c+1), O~(Δin/c+1){\tilde{{O}}}(\Delta_{in}/c+1) rounds, respectively, where Δout\Delta_{out} is the maximal out-degree, and Δin\Delta_{in} is the maximal in-degree.

Proof of Lemma B.15.

First, each node v computes deg(v) by summing up the numbers of edges held by the nodes in C_v^out and C_v^in. Then, the nodes compute the maximum of their degrees, the value Δ. Every node in C_v^out and C_v^in sends to v the edges incident to v which it holds. Node v desires to receive at most O(Δ) messages, and each node desires to send at most O~(Δ) messages, as every node is the carrier of at most O~(1) nodes. Thus, due to Lemma B.1 (Routing), this requires O~(Δ/c + 1) rounds. ∎

It is possible to extend Lemma B.15 and show that if, for a node v, both v and all the carrier nodes of v know some predicate over edges, then it is possible to send to v only the edges incident to it which satisfy the predicate. We formalize this as follows.

Lemma B.16 (Learn Carried Information with Predicate).

Assume that we are given a graph G=(V,E)G=(V,E) with average degree k=|E|/|V|k=\left|E\right|/\left|V\right| held in a carrier configuration CC. If each node vv has a predicate pvp_{v} over the edges incident to vv, which both vv and the nodes Cvout,CvinC_{v}^{out},C_{v}^{in} know, then it is possible for each node vv to learn all edges incident to it in GG which satisfy the predicate. The round complexity for this procedure is O~(Δp/c+1){\tilde{{O}}}(\Delta_{p}/c+1) rounds w.h.p., where Δp\Delta_{p} is the maximal number of edges incident to any node vv which satisfy pvp_{v}.

B.2.3 Merging Carrier Configurations

We present a useful tool, which shows how to compute the point-wise minimum of two matrices. With respect to graphs, this can be seen as adding edges to a graph, and if an edge exists twice, then setting its weight to the minimum of the two. This tool can be used in order to merge two carrier configurations.

Lemma B.17 (Merging).

Let VV be a set of nodes which hold two n×nn\times n matrices S,TS,T in carrier configurations AA, BB, respectively. Denote by P=min{S,T}P=\min\{S,\ T\} the matrix generated by taking the point-wise minimum of the two given matrices. It is possible within O~((kS+kT+hA+hB)/c+1)\tilde{O}((k_{S}+k_{T}+h_{A}+h_{B})/c+1) rounds to output PP in a carrier configuration CC, where the values kS,kTk_{S},k_{T} denote the average number of finite elements per row of S,TS,T, respectively, and the values hA,hBh_{A},h_{B} denote the maximal number of carriers each node has in AA, BB, respectively, w.h.p.

Proof of Lemma B.17.

We show how to set up CvoutC_{v}^{out}, and note that the case of CvinC_{v}^{in} is symmetric. Thus, we sometimes drop the superscripts and denote Av=Avout,Bv=Bvout,Cv=CvoutA_{v}=A_{v}^{out},B_{v}=B_{v}^{out},C_{v}=C_{v}^{out}. A critical note is that in the following proof, when we denote AvBvA_{v}\cup B_{v}, if a node appears in both carrier sets, we count it twice in the union. That is, AvBvA_{v}\cup B_{v} denotes a multiset. Further, let ee be some edge which is held in AA or BB by some carrier node ww. At the onset of the algorithm, ww attaches its identifier and communication token to ee – that is, whenever ee is sent in a message, it is sent along with these values as metadata.

Proof Overview

Goal: Consider a node vVv\in V. Essentially, node vv has a sparse array (row vv in matrix SS, denoted as S[v]S[v]) held, in a distributed fashion, over the nodes in AvA_{v}, and a sparse array (row vv in matrix TT, denoted as T[v]T[v]) held over the nodes in BvB_{v}. Node vv wishes to merge these two arrays into one sparse array (the currently not-yet computed P[v]P[v]), and hold it in some (currently not allocated) carrier set CvC_{v}. In the case that an entry appears in both S[v]S[v] and T[v]T[v], it should keep the minimum of the values.

Merging: Initially, vv performs some merging mechanism in order to compute the sparse array P[v]P[v]. At the end of this step, the array P[v]P[v] is distributed across the nodes AvA_{v} and BvB_{v}, as we have yet to allocate CvC_{v}.

Constructing CC: Finally, we allocate the set CvC_{v}, and move the data of P[v]P[v] from its temporary storage in the nodes AvA_{v} and BvB_{v} to be distributed across CvC_{v}. Further, several steps are taken to ensure that CC is a valid carrier configuration.

Step: Merging

Observe some node vv. In this step, our goal is to compute P[v]P[v], and store it in a convenient distributed representation across the nodes in AvBvA_{v}\cup B_{v}.

Initially, we desire for all the nodes in A_v ∪ B_v to be able to communicate with one another. Node v knows the communication tokens and identifiers of the nodes in A_v ∪ B_v (Item 1), and broadcasts all of them to all the nodes in A_v ∪ B_v in O~(|A_v|/c + |B_v|/c + 1) = O~(h_A/c + h_B/c + 1) rounds using Lemma B.14 (Carriers Broadcast and Aggregate).

Due to Item 4, S[v] is distributed across A_v such that an interval I_w = [w_b, w_e] corresponds to each w ∈ A_v, where w holds all the finite elements in S[v][I_w] (entries from index w_b to index w_e of S[v]). The same holds for B_v. For a set of carrier nodes X, denote I(X) = {I_w | w ∈ X}, and for a set of intervals J, denote by D(J) = {x, y | [x,y] ∈ J} the set of delimiters of J. Due to Item 5, node v knows D(I(A_v)) and D(I(B_v)), and hence D(I(A_v) ∪ I(B_v)). We now perform the following steps.

  1.

Node v computes J = {[x,y] | x, y ∈ D(I(A_v) ∪ I(B_v)), x < y, and there is no z ∈ D(I(A_v) ∪ I(B_v)) with x < z < y}, that is, the partition of [n] into intervals using all of the delimiters in D(I(A_v) ∪ I(B_v)). Notice that |J| ≤ 2(|I(A_v)| + |I(B_v)|) = 2(|A_v| + |B_v|), and every I ∈ J is contained in exactly one interval in I(A_v) and in one interval in I(B_v). Further, ⋃_{K∈J} K = [n], since ⋃_{K∈I(A_v)} K = [n] and ⋃_{K∈I(B_v)} K = [n].

  2.

    Node vv broadcasts D(J)D(J) to AvBvA_{v}\cup B_{v} in O~(|J|/c+1)=O~(|Av|/c+|Bv|/c+1)=O~(hA/c+hB/c+1)\tilde{O}(|J|/c+1)=\tilde{O}(|A_{v}|/c+|B_{v}|/c+1)=\tilde{O}(h_{A}/c+h_{B}/c+1) rounds using Lemma B.14.

Notice that all the nodes in AvBvA_{v}\cup B_{v} know the identifiers of one another (guaranteed above), and also all of D(J)D(J). Thus, it is possible for the nodes in AvBvA_{v}\cup B_{v} to perform local computation which allocates to each uAvBvu\in A_{v}\cup B_{v} two intervals, Ku1,Ku2JK_{u}^{1},K_{u}^{2}\in J, and every node in AvBvA_{v}\cup B_{v} knows that uu is assigned Ku1,Ku2K_{u}^{1},K_{u}^{2}.

Now, we wish for u to learn the finite entries in S[v][K_u^1], T[v][K_u^1], S[v][K_u^2], T[v][K_u^2], and compute P[v][K_u^1], P[v][K_u^2]. To do so, we need to route the finite entries which u requires from their current storage in the nodes A_v ∪ B_v to u. We bound the amount of information u receives. For any intervals L_a ∈ I(A_v) and L_b ∈ I(B_v), there are at most k_S and k_T finite elements in S[v][L_a] and T[v][L_b], respectively. Further, every interval K ∈ J is contained in exactly one interval in I(A_v) and one interval in I(B_v), and so the numbers of finite elements in S[v][K] and T[v][K] are at most k_S and k_T, respectively. Therefore, node u desires to learn at most O(k_S + k_T) finite elements. We now bound the amount of information node u sends to other nodes in A_v ∪ B_v in order to let them learn their desired intervals. Node u originally holds at most O(k_S + k_T) finite elements, and each element is desired by exactly one node. Therefore, node u sends at most O(k_S + k_T) finite elements. Thus, we conclude that every node in A_v ∪ B_v sends and receives at most O(k_S + k_T) messages to and from other nodes in A_v ∪ B_v, showing that this step can be completed in O~((k_S + k_T)/c + 1) rounds, using Lemma B.1 (Routing).

Finally, we wish for all the nodes in A_v ∪ B_v to know, for each K ∈ J, how many finite entries are in P[v][K]. Using Lemma B.1 (Routing), every node u ∈ A_v ∪ B_v sends to v the number of finite entries in P[v][K_u^1] and in P[v][K_u^2], within O~(|A_v|/c + |B_v|/c + 1) = O~(h_A/c + h_B/c + 1) rounds. Then, v broadcasts all of this information to A_v ∪ B_v using Lemma B.14 (Carriers Broadcast and Aggregate) in O~(|A_v|/c + |B_v|/c + 1) = O~(h_A/c + h_B/c + 1) rounds.
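As a local illustration of the merge computed in this step, the following sketch takes the point-wise minimum of two sparse rows (missing entries are treated as +∞; in the distributed execution, each carrier performs this merge only on its assigned intervals K_u^1, K_u^2):

```python
# A local sketch (illustration) of the merge in this step: P[v] is the
# point-wise minimum of the sparse rows S[v] and T[v]; missing entries are
# treated as +infinity, matching min{S, T} on the finite entries.

def pointwise_min(row_s, row_t):
    merged = dict(row_s)
    for j, x in row_t.items():
        merged[j] = min(merged.get(j, float('inf')), x)
    return merged

S_v = {1: 3.0, 4: 7.0}          # finite entries of row v of S
T_v = {4: 2.0, 8: 5.0}          # finite entries of row v of T
print(pointwise_min(S_v, T_v))  # {1: 3.0, 4: 2.0, 8: 5.0}
```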

Step: Constructing CC

We perform several operations in this step. First, we invoke Lemma B.11 (Initialize Carriers) w.r.t. P, in order to create the carrier sets C_v, which satisfy all of the properties of a carrier configuration, except for Items 3, 4 and 5 of Definition 12 and Items 4 and 5 of Definition 13. These remaining properties relate to populating the sets C_v with data. Therefore, we then populate C_v with the data pertaining to P[v].

Sub-step – Invoking Lemma B.11: In order to invoke Lemma B.11 w.r.t. P, every node v needs to know the number of finite entries in row v of P and in column v of P. Notice that v can compute the number of finite entries in P[v] by aggregating over the nodes A_v ∪ B_v. In order to compute the number of finite entries in column v of P, recall that, as stated at the beginning of the proof, our analysis follows only the rows of the matrices; inherently, one also runs the algorithm up to this point on the columns of the matrices. Therefore, in a symmetric way, v can know the number of finite entries in column v of P. Next, we analyze the round complexity of invoking Lemma B.11 w.r.t. P. Denote by k_P the average number of finite elements in a row of P; by aggregation, all of the nodes of the graph compute k_P. Notice that k_P ≥ k_S and k_P ≥ k_T, as in each row the number of finite elements could only have increased due to the minimization operation. Further, the maximal number of finite elements in a row of P is at most the maximal number of finite elements in a row of S plus the maximal number of finite elements in a row of T. Thus, the round complexity is O~((k_S·h_A + k_T·h_B)/(k_P·c) + 1) = O~((h_A + h_B)/c + 1) rounds.

Sub-step – Ensuring Items 3, 4 and 5 of Definition 12 and Item 4 of Definition 13: First, we begin by computing the intervals I(C_v). Notice that |C_v| ≤ |A_v| + |B_v|, since k_P ≥ k_S, k_P ≥ k_T, and A_v and B_v can themselves hold the vectors S[v], T[v] with each node in each set carrying at most k_S, k_T finite elements, respectively. Now, we partition [n] into |C_v| intervals I(C_v) = L_1, …, L_{|C_v|}, such that the number of finite elements in every P[v][L_i] is at most k_P. As the nodes in A_v ∪ B_v all know D(J), as well as, for each K ∈ J, how many finite entries are in P[v][K], every node in A_v ∪ B_v knows, for every finite element in P[v] which it holds, the number of finite elements preceding it in P[v]. Thus, for every interval L_i ∈ I(C_v), there exist two nodes x, y ∈ A_v ∪ B_v such that x can compute the left endpoint of L_i and y can compute the right. We show this for left endpoints, as the proof for right endpoints is symmetric. The left endpoint of L_i is the index of the ((i−1)·k_P + 1)-th finite entry in P[v], and thus the node in A_v ∪ B_v which holds this finite element knows the left endpoint of L_i. In O~(|C_v|/c + 1) = O~(|A_v|/c + |B_v|/c + 1) = O~(h_A/c + h_B/c + 1) rounds, the nodes in A_v ∪ B_v tell v the contents of D(I(C_v)), using Lemma B.1 (Routing). In the same round complexity, v broadcasts D(I(C_v)) to A_v, B_v, and C_v, using Lemma B.14 (Carriers Broadcast and Aggregate).

Now, we move the finite entries of P[v] from the nodes in A_v ∪ B_v to the nodes in C_v. Node v broadcasts the communication tokens and identifiers of all the nodes in C_v to all the nodes in A_v ∪ B_v, in O~(|C_v|/c + 1) = O~(|A_v|/c + |B_v|/c + 1) = O~(h_A/c + h_B/c + 1) rounds. The nodes in A_v ∪ B_v communicate all of the finite entries of P[v] to C_v, each node knowing where to send the information which it holds, as all the nodes in A_v ∪ B_v know D(I(C_v)). Each node sends or receives at most O(k_S + k_T + k_P) = O(k_S + k_T) messages, therefore routing these messages takes O~(k_S/c + k_T/c + 1) rounds using Lemma B.1 (Routing).

Observe that the above procedure ensures that C satisfies Items 3, 4 and 5 of Definition 12 and Item 4 of Definition 13.

Sub-step – Ensuring Item 5 of Definition 13: Recall that, as stated at the beginning of the proof, we show how to construct C_v^out, while the case of C_v^in is a symmetric algorithm. At this point, we require that all of the above be executed w.r.t. both C_v^out and C_v^in. This is due to the fact that in order to satisfy Item 5 for C_v^out, we query the nodes of C_v^in for some information which they compute above. Thus, we now show how Item 5 is satisfied for C_v^out; in a symmetric way, it can be shown for C_v^in.

Let there be some edge e = (v,w), for some w ∈ V, which is now held by γ ∈ C_v^out. Denote by u ∈ A_v^out ∪ B_v^out the node which originally held e at the onset of the algorithm, and recall that at the onset of the algorithm (as stated at the beginning of the proof), we attach to e the communication token and identifier of u, and so γ knows u. W.l.o.g., assume that u ∈ A_v^out. Due to the fact that A is a carrier configuration, node u knows the communication token and identifier of the node w′ ∈ A_w^in which also holds e. Again, assume that at the onset of the algorithm, node u attached to e the communication token and identifier of w′. Thus, node γ knows the communication token and identifier of w′.

As such, γ asks w′ which node in C_w^in holds e. Node w′ is able to answer this query, as all the nodes in A_w^in ∪ B_w^in know which intervals are held by which nodes in C_w^in. The answer to this query is exactly the information which node γ needs in order to satisfy Item 5. We analyze the round complexity of this routing. Each node in C_v^out sends queries only w.r.t. edges it holds as part of C, and each node in A_v^in, B_v^in answers queries only w.r.t. edges it holds in A, B. Thus, this step can be executed in O~(k_S/c + k_T/c + k_P/c + 1) = O~(k_S/c + k_T/c + 1) rounds, using Lemma B.1 (Routing). ∎

B.2.4 Partial Carrier Configuration

We now prove a fundamental tool which can be roughly viewed as computing the transpose of a matrix. Notice that each entry of data is stored twice in a carrier configuration CC. For instance, an edge e=(v,w)e=(v,w) is stored in both CvoutC_{v}^{out}, and CwinC_{w}^{in}. We show that if only the outgoing carrier sets CvoutC_{v}^{out} are stored, one can complete the data structure to contain also the incoming carrier sets CvinC_{v}^{in}.

This is a very useful tool, as sometimes nodes can only compute the edges directed away from them, and not the edges directed towards them. For instance, in Theorem 3.1 (Sparse Matrix Multiplication), we reach a state where there are few edges directed away from every node, but potentially Θ(n) edges directed towards some nodes. If one were to simply invoke Lemma B.13 (Initialize Carrier Configuration), this would require every node to learn all of the edges directed both away from and towards it, which would incur a high round complexity. Instead, Lemma B.18 (Partial Configuration Completion) shows that given that every node v has a partial carrier set holding the edges directed away from it, the matching carrier set for edges directed towards v can be allocated and directly populated with these edges, without the edges ever being known to v itself.

Definition 14 (Partial Carrier Configuration).

Given a set of nodes VV, a data structure CC is a partial carrier configuration holding a graph G=(V,E)G=(V,E) if all the conditions of \IfAppendixLABEL:\next ( (𝖠𝖢(𝖼)\mathsf{AC(c)} Carrier Configuration).)Definition 13 hold, yet, only for the outgoing edges. That is, each node vVv\in V only has CvoutC_{v}^{out}.

Notice that Item 5 is not demanded, as it requires the existence of both CvoutC_{v}^{out} and CvinC_{v}^{in}.

Lemma B.18 (Partial Configuration Completion).

Given a graph G=(V,E)G=(V,E) which is held in a partial carrier configuration CC, there exists an algorithm which runs in O~(nk/c+n/(kc)+1)\tilde{O}(\sqrt{nk}/c+n/(k\cdot c)+1) rounds, where k=|E|/|V|k=|E|/|V|, and outputs a carrier configuration DD holding GG, w.h.p.

Proof of Lemma B.18.

We assign Dvout=CvoutD_{v}^{out}=C_{v}^{out} for every vVv\in V, and thus we are required to show two items in this proof: how to allocate the sets DvinD_{v}^{in}, and how to populate them with data.

Allocating DvinD_{v}^{in}:

In order to allocate DvinD_{v}^{in}, node vv needs to know degGin(v)\deg_{G}^{in}{(v)}. Denote m=|E|m=|E|, t=mt=\sqrt{m}, and L={vV|degGin(v)2t}L=\{v\in V\ |\ \deg_{G}^{in}{(v)}\leq 2t\}, H=VLH=V\setminus L. The sets LL and HH contain light and heavy nodes, respectively, yet, notice that at the current stage in the algorithm, no node knows whether it itself is in LL or in HH, as it does not know degGin(v)\deg_{G}^{in}{(v)}. Our goal is to satisfy an even stronger condition – to make every node vVv\in V know for every uVu\in V whether uu is in LL or HH.

For a set of nodes XVX\subseteq V, denote by degGin(X)=vXdegGin(v)\deg_{G}^{in}{(X)}=\sum_{v\in X}\deg_{G}^{in}{(v)}. Let 𝕍={V1,,Vt}\mathbb{V}=\{V_{1},\dots,V_{t}\} be an arbitrary, hardcoded, globally known partition of VV, where all the parts are of roughly equal size. Using \IfAppendixLABEL:\next ( (Aggregation).)Corollary B.6, all nodes compute the values degGin(V1),,degGin(Vt)\deg_{G}^{in}{(V_{1})},\dots,\deg_{G}^{in}{(V_{t})} in O~(t/c+1)\tilde{O}(t/c+1) rounds. Denote the set of parts in 𝕍\mathbb{V} which have low in-degree by 𝕃={Vi𝕍|degGin(Vi)2t}\mathbb{L}=\{V_{i}\in\mathbb{V}|\deg_{G}^{in}{(V_{i})}\leq 2t\}, and the high in-degree ones by =𝕍𝕃\mathbb{H}=\mathbb{V}\setminus\mathbb{L}. Given vVv\in V, if vv belongs to some set in 𝕃\mathbb{L}, that is, vVi,Vi𝕃v\in V_{i},\ V_{i}\in\mathbb{L}, then certainly vLv\in L. As 𝕍\mathbb{V} is hardcoded and globally known, and all the nodes know 𝕃\mathbb{L}, all such nodes vv know that they are in LL, and further all the nodes in the graph know this as well.

As degGin(V1)++degGin(Vt)=degGin(V)=m=t2\deg_{G}^{in}{(V_{1})}+\dots+\deg_{G}^{in}{(V_{t})}=\deg_{G}^{in}{(V)}=m=t^{2}, it holds that ||<t/2|\mathbb{H}|<t/2, implying |𝕃|t/2|\mathbb{L}|\geq t/2. Since the sets in 𝕍\mathbb{V} are of roughly equal sizes, we have guaranteed that at least half of the nodes in VV are now identified as belonging to LL. These nodes are set aside, and we iterate over this procedure. In each iteration, at least half of the remaining nodes are marked as belonging to LL, up until the final iteration where only nodes in HH remain. Thus, after O(logn)O(\log n) iterations, every node in the graph knows which nodes belong to LL and which to HH.
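The following toy simulation (ours, centralized) illustrates why this halving procedure classifies every node within O(log n) iterations; the distributed algorithm only ever learns the part-sums, via Corollary B.6:

```python
import math

# A part whose total in-degree is at most 2t contains only light nodes.
# Since the total in-degree is m = t^2, fewer than t/2 of the ~t parts can
# exceed 2t, so at least half of the remaining nodes are classified in each
# iteration; heavy nodes survive until the parts shrink to singletons.
def classify(in_deg):
    m = sum(in_deg.values())
    t = max(1, math.isqrt(m))
    light, remaining = set(), sorted(in_deg)
    while remaining:
        size = max(1, math.ceil(len(remaining) / t))
        parts = [remaining[i:i + size] for i in range(0, len(remaining), size)]
        unknown = []
        for part in parts:
            if sum(in_deg[v] for v in part) <= 2 * t:
                light.update(part)  # every node here has in-degree <= 2t
            else:
                unknown.extend(part)
        if len(unknown) == len(remaining):
            break                   # only heavy nodes remain
        remaining = unknown
    return light, set(in_deg) - light

deg = {v: (1 if v < 90 else 100) for v in range(100)}  # m = 1090, t = 33
print(len(classify(deg)[0]), "light nodes")            # 90 light nodes
```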

Fix vLv\in L. It holds that degGin(v)=O(t)\deg_{G}^{in}{(v)}=O(t). Each node uu which holds an edge directed towards vv now sends that edge to vv. This is done using \IfAppendixLABEL:\next ( (Routing).)Lemma B.1. We must show that uu knows the communication token of vv. This follows from Item 4, which states that every edge is stored alongside the communication tokens of both of its endpoints. The execution of Lemma B.1 completes in O~(t/c+1)\tilde{O}(t/c+1) rounds, as every node receives at most O~(t)\tilde{O}(t) messages, and sends at most O(k)=O(nk)=O(m)=O(t)O(k)=O(\sqrt{nk})=O(\sqrt{m})=O(t) messages. Thus, vv computes degGin(v)\deg_{G}^{in}{(v)}, as it knows all of the edges which are directed towards itself.

We turn to HH – we show that every vHv\in H also computes degGin(v)\deg_{G}^{in}{(v)}. As the in-degree of every node in HH exceeds 2t2t, and t2=mt^{2}=m, we get |H|=O(t)|H|=O(t). Further, recall that every node knows which nodes are in HH. Therefore, using \IfAppendixLABEL:\next ( (Aggregation).)Corollary B.6, within O~(|H|/c+1)=O~(t/c+1)\tilde{O}(|H|/c+1)=\tilde{O}(t/c+1) rounds, every node in the graph knows the in-degree of every node in HH.

Finally, we allocate DvinD_{v}^{in}. We are given as input the partial carrier configuration CC, allowing each node vVv\in V to compute degGout(v)\deg_{G}^{out}{(v)} in O~(1)\tilde{O}(1) rounds, using \IfAppendixLABEL:\next ( (Carriers Broadcast and Aggregate).)Lemma B.14. Thus, we invoke \IfAppendixLABEL:\next ( (Initialize Carriers).)Lemma B.11, in O~(n/(kc)+1)\tilde{O}(n/(k\cdot c)+1) rounds, to create the sets Evout,EvinE_{v}^{out},E_{v}^{in}, which satisfy all of the properties of a carrier configuration, except for Items 3, 4 and 5. These remaining properties relate to populating the carrier node sets with input data. We throw away EvoutE_{v}^{out} and set Dvin=EvinD_{v}^{in}=E_{v}^{in}.

Populating DvinD_{v}^{in}:

Fix vLv\in L. As vv knows all the edges directed towards it, it sends these edges to its carriers in O~(degGin(v)/c+1)=O~(t/c+1)\tilde{O}(\deg_{G}^{in}{(v)}/c+1)=\tilde{O}(t/c+1) rounds, using \IfAppendixLABEL:\next ( (Carriers Broadcast and Aggregate).)Lemma B.14, and trivially completes Items 3 and 4 by sending at most O~(1)\tilde{O}(1) messages to each node in DvinD_{v}^{in}, requiring O~(degGin(v)/(kc)+1)=O~(t/(kc)+1)\tilde{O}(\deg_{G}^{in}{(v)}/(k\cdot c)+1)=\tilde{O}(t/(k\cdot c)+1) rounds. The only challenging task is ensuring Item 5, which requires that for every edge e=(w,v)e=(w,v), the node in DvinD_{v}^{in} which holds ee knows the communication token and identifier of the node wDwoutw^{\prime}\in D_{w}^{out} which holds ee. However, since every edge in this algorithm is sent directly from the carrier node which holds it (node ww^{\prime} sends ee), that carrier node can attach its own communication token and identifier to the edge itself when sending it, thus providing the information which the nodes in DvinD_{v}^{in} need in order to satisfy Item 5.

Partition VV into t/kt/k sets, W1=[1,nk/t],,Wt/k=[nnk/t+1,n]W_{1}=[1,nk/t],\dots,W_{t/k}=[n-nk/t+1,n], and use \IfAppendixLABEL:\next ( (Grouping).)Lemma B.7 to ensure that for each WiW_{i}, every node in WiW_{i} knows Tokens(Wi)Tokens(W_{i}), within O~(t/(kc)+(nk)/(tc)+1)=O~(t/c+1)\tilde{O}(t/(k\cdot c)+(nk)/(t\cdot c)+1)=\tilde{O}(t/c+1) rounds. For each set WiW_{i}, denote some arbitrary, hardcoded wiWiw_{i}\in W_{i} as the leader of WiW_{i}, and in O~(t/(kc)+1)\tilde{O}(t/(k\cdot c)+1) rounds, broadcast the communication tokens and identifiers of all the leaders, using \IfAppendixLABEL:\next ( (Broadcasting).)Lemma B.5.

Fix vHv\in H. Observe that |Dvin|degGin(v)/kt/k|D_{v}^{in}|\geq\deg_{G}^{in}{(v)}/k\geq t/k. For each set WiW_{i}, the ii-th carrier in DvinD_{v}^{in} sends its communication token and identifier to wiw_{i}, using \IfAppendixLABEL:\next ( (Routing).)Lemma B.1, in O~(t/c+1)\tilde{O}(t/c+1) rounds, as every carrier sends at most one message and every leader receives at most O(|H|)=O(t)O(|H|)=O(t) messages. Then, leader wiw_{i} broadcasts to WiW_{i} all the communication tokens and identifiers of the carrier nodes which it received, taking O~(|H|/c+1)=O~(t/c+1)\tilde{O}(|H|/c+1)=\tilde{O}(t/c+1) rounds, using \IfAppendixLABEL:\next ( (Group Broadcasting and Aggregating).)Corollary B.9.

Finally, every node in WiW_{i} which holds an edge directed towards vv, tells the ii-th carrier in DvinD_{v}^{in} about this edge using Lemma B.1, in O~(nk/(tc)+1)=O~(t/c+1)\tilde{O}(nk/(t\cdot c)+1)=\tilde{O}(t/c+1) rounds, as each carrier node receives at most |Wi|=nk/t|W_{i}|=nk/t messages, and each node sends at most O(k)=O(nk)=O(t)O(k)=O(\sqrt{nk})=O(t) messages since CC is a partial carrier configuration. Now, the nodes DvinD_{v}^{in} know all of the edges directed towards vv. Since i<ji<j implies that every xWi,yWjx\in W_{i},\ y\in W_{j} satisfy x<yx<y, within O~(nk/(tc)+|Dvin|/c+1)=O~(t/c+n/(kc)+1)\tilde{O}(nk/(t\cdot c)+|D_{v}^{in}|/c+1)=\tilde{O}(t/c+n/(k\cdot c)+1) rounds, the nodes in DvinD_{v}^{in} can rearrange the information stored in them, as well as communicate with vv, in order to satisfy Items 3 and 4. To satisfy Item 5, an identical claim to the case of vLv\in L can be used. ∎

B.3 Efficient Hopset Construction

We efficiently compute, in the 𝖠𝖢(𝖼)\mathsf{AC(c)} model, a hopset [24], which allows approximating distance-related problems quickly. We follow the general outline of [14], and solve kk-nearest and (S,d,k)(S,d,k)-source detection, defined below, in order to construct the hopset. We solve these problems mainly using \IfAppendixLABEL:\next ( (Sparse Matrix Multiplication).)Theorem 3.1.

However, in contrast to the 𝖢𝗈𝗇𝗀𝖾𝗌𝗍𝖾𝖽𝖢𝗅𝗂𝗊𝗎𝖾\mathsf{Congested\ Clique} implementation, in the 𝖠𝖢(𝖼)\mathsf{AC(c)} model we are met with additional challenges, as many operations which are trivial in the 𝖢𝗈𝗇𝗀𝖾𝗌𝗍𝖾𝖽𝖢𝗅𝗂𝗊𝗎𝖾\mathsf{Congested\ Clique} model become highly complex. For instance, upon computing the edges EHE_{H} of a hopset HH, one must add the edges to the graph – an operation which is straightforward in the 𝖢𝗈𝗇𝗀𝖾𝗌𝗍𝖾𝖽𝖢𝗅𝗂𝗊𝗎𝖾\mathsf{Congested\ Clique} model, yet requires \IfAppendixLABEL:\next ( (Merging).)Lemma B.17 in the 𝖠𝖢(𝖼)\mathsf{AC(c)} model. Further, in the 𝖢𝗈𝗇𝗀𝖾𝗌𝗍𝖾𝖽𝖢𝗅𝗂𝗊𝗎𝖾\mathsf{Congested\ Clique} model, when we consider undirected graphs, once a node vv adds an edge (v,u)(v,u) to the graph then the edge (u,v)(u,v) is added as well, or updated to the minimum cost, if it exists already. To accomplish this in the 𝖠𝖢(𝖼)\mathsf{AC(c)} model, one should invoke the algorithm in \IfAppendixLABEL:\next ( (Merging).)Lemma B.17 on the matrix and the transpose of the matrix. However, transposing a matrix is not trivial and we accomplish it due to the definition of the carrier configuration, which implies that whenever nodes hold a matrix AA, they also implicitly hold ATA^{T}. This goes to show why various new tools are required in the 𝖠𝖢(𝖼)\mathsf{AC(c)} model for this problem.

Definition 15 ((β,ε)(\beta,\varepsilon)-Hopset).

For a given weighted graph G=(V,E)G=(V,E), a (β,ε)(\beta,\varepsilon)-hopset, H=(V,EH)H=(V,E_{H}) is a set of edges such that paths of length at most β\beta hops in GHG\cup H approximate distances in GG by a multiplicative factor of at most (1+ε)(1+\varepsilon). That is, for each u,vVu,v\in V, dG(u,v)dGHβ(u,v)(1+ε)dG(u,v)d_{G}^{\infty}(u,v)\leq d_{G\cup H}^{\beta}(u,v)\leq(1+\varepsilon)d_{G}^{\infty}(u,v), where dGh(u,v)d_{G}^{h}(u,v) is the weight of the shortest path with at most hh hops between u,vu,v in GG.

We demonstrate how to construct the (logn/ε,ε)(\log n/\varepsilon,\varepsilon)-hopset HH over the input graph, where the number of edges in HH is O~(n3/2)\tilde{O}(n^{3/2}). This is done using \IfAppendixLABEL:\next ( (Hopset Construction).)Theorem B.19.

Theorem B.19 (Hopset Construction).

There exists an algorithm in the 𝖠𝖢(𝖼)\mathsf{AC(c)} model, such that given a weighted undirected input graph G=(V,E)G=(V,E) with n=|V|n=|V| and m=|E|=Ω(n3/2)m=|E|=\Omega(n^{3/2}), held in a carrier configuration CC, and given some 0<ε<10<\varepsilon<1, computes a (logn/ε,ε)(\log n/\varepsilon,\varepsilon)-hopset HH, with |H|=O~(n3/2)|H|=\tilde{O}(n^{3/2}), and outputs G=(V,EH)G^{\prime}=(V,E\cup H) in a carrier configuration CC^{\prime}. The round complexity of this algorithm is O~((n5/6/c+m/(n2/3c)+1)/ε)\tilde{O}((n^{5/6}/c+m/(n^{2/3}\cdot c)+1)/\varepsilon), w.h.p.

Before proving \IfAppendixLABEL:\next ( (Hopset Construction).)Theorem B.19, we prove several theorems related to the following two problems.

Definition 16 (kk-nearest).

Given a graph G=(V,E)G=(V,E) and a value k[n]k\in[n], in the kk-nearest problem, each node vv must learn its distances to kk of the closest nodes to it in GG, breaking ties arbitrarily.

Definition 17 ((S,d,k)(S,d,k)-source detection).

Given a graph G=(V,E)G=(V,E), a set SVS\subseteq V, a value d[n]d\in[n], and a value k|S|k\leq|S|, in the (S,d,k)(S,d,k)-source detection problem, each node vv is required to learn its kk closest nodes in SS, while considering paths of at most dd hops only.

We solve the kk-nearest and (S,d,k)(S,d,k)-source detection problems for the case where k=Ω(n1/3)k=\Omega(n^{1/3}), as the round complexity of our solutions does not improve for k=o(n1/3)k=o(n^{1/3}) due to pre-processing costs.

Lemma B.20 (kk-nearest Algorithm).

Given a graph G=(V,E)G=(V,E), where n=|V|n=|V|, held in a carrier configuration CC, and some value k=Ω(n1/3)k=\Omega(n^{1/3}), it is possible in the 𝖠𝖢(𝖼)\mathsf{AC(c)} model, within O~(kn1/3/c+n2/3/c+1)\tilde{O}(k\cdot n^{1/3}/c+n^{2/3}/c+1) rounds, w.h.p., to output a directed graph G=(V,E)G^{\prime}=(V,E^{\prime}) held in a carrier configuration CC^{\prime}, where EE^{\prime} contains an edge from every node vVv\in V to every node uu which is one of the closest kk nodes to vv (with ties broken arbitrarily), with weight dG(v,u)d_{G}(v,u). Notice that it can be the case that EEE\not\subset E^{\prime}.

Proof of Lemma B.20.

As shown in [14], the following process solves the problem. Take the adjacency matrix AA of GG, and in each row keep the kk smallest entries (breaking ties arbitrarily), to create some matrix AA^{\prime} (given AA, there are potentially many options for AA^{\prime}, since ties can be broken arbitrarily). Then, the matrix AA^{\prime} is iteratively squared, for at most logn\log n iterations, while after each product only the kk smallest entries in each row are preserved.

We first create such a matrix AA^{\prime} from AA. Fix vVv\in V. Node vv computes degout(v)\deg_{out}{(v)}, in O~(1)\tilde{O}(1) rounds, using \IfAppendixLABEL:\next ( (Carriers Broadcast and Aggregate).)Lemma B.14. If degout(v)k\deg_{out}{(v)}\leq k, then vv uses \IfAppendixLABEL:\next ( (Learn Carried Information).)Lemma B.15 in O~(k/c+1)\tilde{O}(k/c+1) rounds to learn all of the edges outgoing from it. Otherwise degout(v)>k\deg_{out}{(v)}>k, and denote by f(v,p,i)f(v,p,i) the edges directed away from vv with weight at most pp and towards nodes with identifiers at most ii. Node vv computes two values, pvp_{v} and ivi_{v}, such that pvp_{v} is the maximal value for which there exists an ivi_{v} such that |f(v,pv,iv)|=k|f(v,p_{v},i_{v})|=k. Given any value pp, node vv can compute |f(v,p,)||f(v,p,\infty)| within O~(1)\tilde{O}(1) rounds using \IfAppendixLABEL:\next ( (Carriers Broadcast and Aggregate).)Lemma B.14. Thus, in O~(1)\tilde{O}(1) rounds, vv can compute pvp_{v} using binary search. Likewise, |f(v,pv,i)||f(v,p_{v},i)| can be computed in O~(1)\tilde{O}(1) rounds for any ii, and thus using binary search vv computes ivi_{v}. Then, node vv broadcasts pvp_{v} and ivi_{v} to CvoutC_{v}^{out}, and using \IfAppendixLABEL:\next ( (Learn Carried Information with Predicate).)Lemma B.16, within O~(k/c+1)\tilde{O}(k/c+1) rounds, learns all of the edges in f(v,pv,iv)f(v,p_{v},i_{v}).

We need to hold AA^{\prime} in a carrier configuration in order to use it for matrix multiplication. Each node vv with degout(v)>k\deg_{out}{(v)}>k knows the entries of row vv in AA^{\prime} – they are f(v,pv,iv)f(v,p_{v},i_{v}). Each node vv with degout(v)<k\deg_{out}{(v)}<k locally adds arbitrary edges directed away from it with infinite weight, to have exactly kk edges directed away from it. We denote the new matrix created by this process by A′′A^{\prime\prime}, and notice that A′′A^{\prime\prime} has the same properties w.r.t. distances as AA^{\prime}, since edges of infinite weight do not affect shortest paths. As each node holds exactly kk edges directed away from it, the nodes themselves are a partial carrier configuration, DD, holding A′′A^{\prime\prime}. That is, for each node vv, we set Dvout={v}D_{v}^{out}=\{v\}. We invoke \IfAppendixLABEL:\next ( (Partial Configuration Completion).)Lemma B.18, within O~(nk/c+n/(kc)+1)=O~(kn1/3/c+n2/3/c+1)\tilde{O}(\sqrt{nk}/c+n/(k\cdot c)+1)=\tilde{O}(k\cdot n^{1/3}/c+n^{2/3}/c+1) rounds, since k=Ω(n1/3)k=\Omega(n^{1/3}), in order to get a carrier configuration DD^{\prime} which holds A′′A^{\prime\prime}.

Finally, we iteratively square A′′A^{\prime\prime} by applying \IfAppendixLABEL:\next ( (Sparse Matrix Multiplication).)Theorem 3.1. That is, we compute h(A′′×A′′)h(A^{\prime\prime}\times A^{\prime\prime}), where hh takes a matrix and leaves only the kk smallest entries in each row. Then, we compute h(h((A′′)2)×h((A′′)2))h(h((A^{\prime\prime})^{2})\times h((A^{\prime\prime})^{2})), and so forth. Repeating this procedure for O(logn)O(\log n) iterations results in an output matrix which holds in row vv edges only to some kk closest nodes to vv. We perform O(logn)O(\log n) matrix multiplications, each taking O~(kn1/3/c+n/(kc)+1)=O~(kn1/3/c+n2/3/c+1)\tilde{O}(k\cdot n^{1/3}/c+n/(k\cdot c)+1)=\tilde{O}(k\cdot n^{1/3}/c+n^{2/3}/c+1) rounds, since k=Ω(n1/3)k=\Omega(n^{1/3}) and we always multiply two matrices with at most kk elements per row, due to \IfAppendixLABEL:\next ( (Sparse Matrix Multiplication).)Theorem 3.1. ∎
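For intuition, the following centralized reference sketch (ours) mirrors the process of [14] which the proof above implements in the 𝖠𝖢(𝖼) model: truncate each row of the adjacency matrix to its k smallest entries, then repeatedly square under the (min,+) product, re-truncating after every squaring.

```python
import math

INF = math.inf

def truncate_rows(A, k):
    """Keep only the k smallest entries in each row (ties broken arbitrarily)."""
    n = len(A)
    B = [[INF] * n for _ in range(n)]
    for i in range(n):
        for j in sorted(range(n), key=lambda j: A[i][j])[:k]:
            B[i][j] = A[i][j]
    return B

def minplus(A, B):
    """The (min,+) matrix product."""
    n = len(A)
    C = [[INF] * n for _ in range(n)]
    for i in range(n):
        for l in range(n):
            if A[i][l] < INF:
                for j in range(n):
                    if A[i][l] + B[l][j] < C[i][j]:
                        C[i][j] = A[i][l] + B[l][j]
    return C

def k_nearest(adj, k):
    """adj: n x n weighted adjacency matrix with 0 on the diagonal."""
    A = truncate_rows(adj, k)
    for _ in range(math.ceil(math.log2(len(adj))) + 1):  # O(log n) squarings
        A = truncate_rows(minplus(A, A), k)
    return A  # row v holds distances to (some) k closest nodes to v
```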

Lemma B.21 ((S,d,k)(S,d,k)-source detection Algorithm).

Given a graph G=(V,E)G=(V,E), where n=|V|n=|V| and m=|E|m=|E|, held in a carrier configuration CC, and given S,d,kS,d,k, where k=|S|=Ω(n1/3)k=|S|=\Omega(n^{1/3}) and d[n]d\in[n], it is possible in the 𝖠𝖢(𝖼)\mathsf{AC(c)} model, within O~(([k+m/n]n1/3/c+n2/3/c+1)(d+1))\tilde{O}(([k+m/n]\cdot n^{1/3}/c+n^{2/3}/c+1)\cdot(d+1)) rounds, w.h.p., to output a directed graph G=(V,E)G^{\prime}=(V,E^{\prime}) held in a carrier configuration CC^{\prime}, where EE^{\prime} contains an edge from every node vv to every node sSs\in S which is at most dd hops away from vv, with weight dGd(v,s)d_{G}^{d}(v,s). Notice that it can be the case that EEE\not\subset E^{\prime}.

It is assumed that the IDs of the nodes in SS are known to all of VV.

Proof of Lemma B.21.

In [14], it is shown that the following process solves the problem. Denote by AA the adjacency matrix of GG. Denote by AA^{\prime} the sparsified adjacency matrix with edges only entering nodes in SS. The matrix B=Ad1×AB=A^{d-1}\times A^{\prime} is the solution to the problem.

We construct a carrier configuration which holds AA^{\prime}. Fix vVv\in V. Denote by E(v,S)E(v,S) the edges from vv directed towards nodes in SS. Node vv uses \IfAppendixLABEL:\next ( (Learn Carried Information with Predicate).)Lemma B.16 to learn E(v,S)E(v,S), in O~(|S|/c+1)=O~(k/c+1)\tilde{O}(|S|/c+1)=\tilde{O}(k/c+1) rounds. We construct a partial carrier configuration FF which contains for each vv the edges E(v,S)E(v,S), by setting Fvout={v}F_{v}^{out}=\{v\}, since the average degree in FF is exactly |S||S| (assuming that if a node vv does not have an edge to a node in SS, it inserts a dummy edge with infinite weight). Using \IfAppendixLABEL:\next ( (Partial Configuration Completion).)Lemma B.18, we turn FF into a carrier configuration DD holding AA^{\prime}, in O~(nk/c+n/(kc)+1)=O~(kn1/3/c+n2/3/c+1)\tilde{O}(\sqrt{nk}/c+n/(k\cdot c)+1)=\tilde{O}(k\cdot n^{1/3}/c+n^{2/3}/c+1) rounds, since k=Ω(n1/3)k=\Omega(n^{1/3}). Finally, we perform d1d-1 multiplications. We first compute the product A×AA\times A^{\prime} by invoking \IfAppendixLABEL:\next ( (Sparse Matrix Multiplication).)Theorem 3.1 with the carrier configurations C,DC,D, to get a carrier configuration EE, which holds A×AA\times A^{\prime}, in O~([k+m/n]n1/3/c+n/(kc)+1)=O~([k+m/n]n1/3/c+n2/3/c+1)\tilde{O}([k+m/n]\cdot n^{1/3}/c+n/(k\cdot c)+1)=\tilde{O}([k+m/n]\cdot n^{1/3}/c+n^{2/3}/c+1) rounds, since the average number of finite elements per row in AA is at most O(m/n)O(m/n), in AA^{\prime} it is at most kk, and k=Ω(n1/3)k=\Omega(n^{1/3}). Notice that while this invocation of \IfAppendixLABEL:\next ( (Sparse Matrix Multiplication).)Theorem 3.1 only computes the kk smallest entries in each row, there are only at most kk finite entries in each row of A×AA\times A^{\prime} – all the columns not corresponding to nodes in SS do not contain finite values. Thus, the invocation of \IfAppendixLABEL:\next ( (Sparse Matrix Multiplication).)Theorem 3.1 actually computes A×AA\times A^{\prime} exactly. We now multiply CC by EE in the same way and repeat d2d-2 more times until achieving the final result B=Ad1×AB=A^{d-1}\times A^{\prime}, taking O~(([k+m/n]n1/3/c+n2/3/c+1)(d+1))\tilde{O}(([k+m/n]\cdot n^{1/3}/c+n^{2/3}/c+1)\cdot(d+1)) rounds in total. ∎
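Similarly, a centralized sketch (ours) of the (S,d,k)-source detection process from [14]: sparsify A into A' by keeping only the columns of S, then left-multiply by A under (min,+) a total of d-1 times; with 0 on the diagonal of A, the result holds the d-hop-bounded distances into S.

```python
import math

INF = math.inf

def minplus(A, B):
    n = len(A)
    C = [[INF] * n for _ in range(n)]
    for i in range(n):
        for l in range(n):
            if A[i][l] < INF:
                for j in range(n):
                    if A[i][l] + B[l][j] < C[i][j]:
                        C[i][j] = A[i][l] + B[l][j]
    return C

def source_detection(adj, S, d):
    """adj: n x n matrix with 0 on the diagonal; S: set of source indices.
    Returns B = A^{d-1} x A', so B[v][s] is the d-hop-bounded distance v -> s."""
    n = len(adj)
    A_sparse = [[adj[i][j] if j in S else INF for j in range(n)]
                for i in range(n)]
    B = A_sparse
    for _ in range(d - 1):
        B = minplus(adj, B)  # prepend one more hop on the left
    return B
```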

We turn our attention to proving \IfAppendixLABEL:\next ( (Hopset Construction).)Theorem B.19.

Proof of Theorem B.19.

In the 𝖠𝖢(𝖼)\mathsf{AC(c)} model, some preparation is necessary before constructing the desired hopset.  

Initialization: We initialize the hopset HH and denote by DD the carrier configuration holding it. We initialize HH with Θ~(n3/2)\tilde{\Theta}(n^{3/2}) arbitrary, hardcoded edges all with infinite weights. While adding arbitrary edges of infinite weight does not affect distances, it ensures that throughout the entire algorithm HH will contain Ω~(n3/2)\tilde{\Omega}(n^{3/2}) edges. Further, as no more than O~(n3/2)\tilde{O}(n^{3/2}) are added in the algorithm which follows, HH will always contain Θ~(n3/2)\tilde{\Theta}(n^{3/2}) edges, ensuring that the average degree in HH is always Θ~(n)\tilde{\Theta}(\sqrt{n}), and thus the maximal number of carrier nodes that each node has is at most O~(n/n)=O~(n)\tilde{O}(n/\sqrt{n})=\tilde{O}(\sqrt{n}).

Whenever a set of edges is added to HH, it is assumed that if an edge is added from node vv to node uu, then also an edge is added in the opposite direction. As such, assume that whenever we add sets of edges to HH, we then reset HH to be min(H,HT)\min(H,H^{T}), by invoking \IfAppendixLABEL:\next ( (Merging).)Lemma B.17 on DD. Notice that due to \IfAppendixLABEL:\next ( (𝖠𝖢(𝖼)\mathsf{AC(c)} Carrier Configuration).)Definition 13, if we set Dvin=Dvout{D^{\prime}}_{v}^{in}={D}_{v}^{out} and Dvout=Dvin{D^{\prime}}_{v}^{out}={D}_{v}^{in}, we get that DD^{\prime} is a carrier configuration holding HTH^{T}. As the maximal number of carrier nodes each node has in DD is O~(n)\tilde{O}(\sqrt{n}), these invocations of \IfAppendixLABEL:\next ( (Merging).)Lemma B.17 take O~(n1/2/c+1)\tilde{O}(n^{1/2}/c+1) rounds.  

Construction: We now begin computing the edges of HH. Initially, we use \IfAppendixLABEL:\next ( (kk-nearest Algorithm).)Lemma B.20 on CC, with k=Θ~(n)k=\tilde{\Theta}(\sqrt{n}) in order to get a carrier configuration KK with an edge from each node vVv\in V to its Θ~(n)\tilde{\Theta}(\sqrt{n}) nearest neighbors. This takes O~(n5/6/c+1)\tilde{O}(n^{5/6}/c+1) rounds. We add the edges from KK to DD using \IfAppendixLABEL:\next ( (Merging).)Lemma B.17 in O~(n1/2/c+1)\tilde{O}(n^{1/2}/c+1) rounds.

Next, we sample nodes SS, where |S|=Θ~(n)|S|=\tilde{\Theta}(\sqrt{n}), by letting every node join SS independently with probability Θ~(n1/2)\tilde{\Theta}(n^{-1/2}), ensuring, w.h.p., that each node vVv\in V holds in DD its distance to at least one node of SS. We use \IfAppendixLABEL:\next ( (Broadcasting).)Lemma B.5, in O~(n1/2/c+1)\tilde{O}(n^{1/2}/c+1) rounds, in order to let every node vVv\in V know all of SS.

We solve the (S,d,k)(S,d,k)-source detection problem with SS over the graph GHG\cup H, and add the resulting edges to HH. We need to do this iteratively for O~(1)\tilde{O}(1) iterations. In each iteration, we invoke \IfAppendixLABEL:\next ( ((S,d,k)(S,d,k)-source detection Algorithm).)Lemma B.21 with S,d=O(1/ε),k=|S|=Θ~(n)S,d=O(1/\varepsilon),k=|S|=\tilde{\Theta}(\sqrt{n}) in O~(((n+m/n)n1/3/c+n2/3/c+1)(1/ε+1))=O~((n5/6/c+m/(n2/3c)+1)/ε)\tilde{O}(((\sqrt{n}+m/n)\cdot n^{1/3}/c+n^{2/3}/c+1)\cdot(1/\varepsilon+1))=\tilde{O}((n^{5/6}/c+m/(n^{2/3}\cdot c)+1)/\varepsilon) rounds. We have now constructed the hopset HH, and therefore can create CC^{\prime} by executing \IfAppendixLABEL:\next ( (Merging).)Lemma B.17 on CC and DD, taking O~((n1/2+m/n)/c+1)\tilde{O}((n^{1/2}+m/n)/c+1) rounds, since we assume that m=Ω(n3/2)m=\Omega(n^{3/2}), completing the proof. ∎

B.4 SSSP

We begin by showing how to perform Bellman-Ford iterations [25] in the 𝖠𝖢(𝖼)\mathsf{AC(c)} model using carrier configurations. Given a source node ss in a graph GG, in Bellman-Ford iteration ii, every node vGv\in G broadcasts to its neighbors dGi1(s,v)d^{i-1}_{G}(s,v), its distance to ss using at most i1i-1 hops, and then calculates dGi(s,v)d^{i}_{G}(s,v) by taking the minimal distance to ss which it receives from its neighbors in this iteration.
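For reference, a minimal centralized sketch (ours) of these synchronous, hop-bounded iterations; the lemma below implements exactly this message pattern through the carrier nodes.

```python
import math

def bellman_ford_iterations(n, edges, s, iters):
    """edges: list of (u, v, w) triples, with both directions present for an
    undirected graph. Returns dist[v], the weight of a shortest path from s
    to v using at most iters hops."""
    dist = [math.inf] * n
    dist[s] = 0
    for _ in range(iters):
        new = dist[:]                 # estimates from the previous iteration
        for u, v, w in edges:
            if dist[u] + w < new[v]:
                new[v] = dist[u] + w  # v hears d^{i-1}(s,u) + w(u,v) from u
        dist = new
    return dist

edges = [(0, 1, 4), (1, 0, 4), (1, 2, 1), (2, 1, 1), (0, 2, 7), (2, 0, 7)]
print(bellman_ford_iterations(3, edges, s=0, iters=2))  # [0, 4, 5]
```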

Lemma B.22 (Bellman-Ford Iterations in 𝖠𝖢(𝖼)\mathsf{AC(c)}).

Given a (directed or undirected) weighted graph G=(V,E)G=(V,E) with average degree kk held in a carrier configuration CC, and a source node sVs\in V, it is possible in the 𝖠𝖢(𝖼)\mathsf{AC(c)} model, within O~((k/c+1)i)\tilde{O}((k/c+1)\cdot i) rounds, w.h.p., to perform ii iterations of the Bellman-Ford algorithm on GG with ss as the source.

Proof of Lemma B.22.

Fix vVv\in V. Node vv computes dG1(s,v)d_{G}^{1}(s,v), within O~(1)\tilde{O}(1) rounds, using \IfAppendixLABEL:\next ( (Carriers Broadcast and Aggregate).)Lemma B.14. Then, to simulate the jj-th iteration, node vv broadcasts to CvoutC_{v}^{out} the value dGj1(s,v)d_{G}^{j-1}(s,v), in O~(1)\tilde{O}(1) rounds. Each node uCvoutu\in C_{v}^{out}, for every edge e={v,w}e=\{v,w\} which uu stores, sends to the node uCwinu^{\prime}\in C_{w}^{in} which stores ee, the value dGj1(s,v)+w(e)d_{G}^{j-1}(s,v)+w(e). Since the average degree in GG is kk, and due to \IfAppendixLABEL:\next ( (𝖠𝖢(𝖼)\mathsf{AC(c)} Carrier Configuration).)Definition 13, it holds that each uVu\in V sends and receives at most kk messages in this step, thus taking O~(k/c+1)\tilde{O}(k/c+1) rounds, using \IfAppendixLABEL:\next ( (Routing).)Lemma B.1. Finally, vv sets dGj(s,v)d_{G}^{j}(s,v) to be the minimum over all the values which nodes CvinC_{v}^{in} received in this iteration, within O~(1)\tilde{O}(1) rounds. We repeat the above process i1i-1 times. ∎

We show how to compute exact SSSP in the 𝖠𝖢(𝖼)\mathsf{AC(c)} model.

Theorem B.23 (Exact SSSP in 𝖠𝖢(𝖼)\mathsf{AC(c)}).

Given a weighted undirected graph G=(V,E)G=(V,E) with n=|V|n=|V| and m=|E|m=|E|, held in a carrier configuration CC, and a source node sVs\in V, it is possible in the 𝖠𝖢(𝖼)\mathsf{AC(c)} model, within O~(m1/2n1/6/c+n/c+n7/6/m1/2)\tilde{O}(m^{1/2}n^{1/6}/c+n/c+n^{7/6}/m^{1/2}) rounds, w.h.p., to ensure that every node vGv\in G knows the value dG(s,v)d_{G}(s,v).

Proof of Theorem B.23.

It was proven in [54] that it is possible to solve exact SSSP on a weighted, undirected graph GG with a source ss by using the following steps. First, one solves the kk-nearest problem, for some kk, and creates the graph GG^{\prime} by starting with GG and adding weighted edges from each node to kk of its closest neighbors, with the weights equal to the weighted distance between the nodes in GG. Then, one performs O(n/k)O(n/k) Bellman-Ford iterations on GG^{\prime} with ss as the source, and it is guaranteed that for each vVv\in V, it holds that dGO(n/k)(s,v)=dGn(s,v)d^{O(n/k)}_{G^{\prime}}(s,v)=d^{n}_{G}(s,v).

Thus, in order to solve exact SSSP, we choose the value k=m1/2/n1/6k^{\prime}=m^{1/2}/n^{1/6}, which balances the number of rounds required for our kk-nearest and Bellman-Ford implementations (\IfAppendixLABEL:\next ( (kk-nearest Algorithm).)Lemma B.20 requires k=Ω(n1/3)k^{\prime}=\Omega(n^{1/3}), and this holds since we assume that the graph is connected, implying mnm\geq n).
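To see why this choice balances the two costs, note that the kk^{\prime}-nearest computation costs O~(kn1/3/c)\tilde{O}(k^{\prime}\cdot n^{1/3}/c) rounds, while the O(n/k)O(n/k^{\prime}) Bellman-Ford iterations on a graph of average degree Θ(m/n+k)\Theta(m/n+k^{\prime}) cost O~((m/n)(n/k)/c)=O~(m/(kc))\tilde{O}((m/n)\cdot(n/k^{\prime})/c)=\tilde{O}(m/(k^{\prime}\cdot c)) rounds. Equating the dominant terms, kn1/3=m/kk^{\prime}\cdot n^{1/3}=m/k^{\prime}, gives k2=m/n1/3k^{\prime 2}=m/n^{1/3}, that is, k=m1/2/n1/6k^{\prime}=m^{1/2}/n^{1/6}.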

Due to \IfAppendixLABEL:\next ( (kk-nearest Algorithm).)Lemma B.20, we can solve the kk^{\prime}-nearest problem in O~(kn1/3/c+n2/3/c+1)=O~(m1/2n1/6/c+n/c+1)\tilde{O}(k^{\prime}\cdot n^{1/3}/c+n^{2/3}/c+1)=\tilde{O}(m^{1/2}n^{1/6}/c+n/c+1) rounds. This gives us a carrier configuration DD, which for every node holds edges directed away from it to kk^{\prime} of its nearest nodes. We need to add these edges from DD to the carrier configuration CC which holds GG. Thus, we invoke \IfAppendixLABEL:\next ( (Merging).)Lemma B.17, in order to get a carrier configuration EE which includes the edges from CC and from DD. Notice that \IfAppendixLABEL:\next ( (Merging).)Lemma B.17 always takes at most O~(n/c+1)\tilde{O}(n/c+1) rounds, and so this fits within our desired complexity.

Finally, we perform O(n/k)O(n/k^{\prime}) Bellman-Ford iterations on EE. Notice that the average degree in EE is Θ(m/n+k)\Theta(m/n+k^{\prime}). Therefore, due to \IfAppendixLABEL:\next ( (Bellman-Ford Iterations in 𝖠𝖢(𝖼)\mathsf{AC(c)}).)Lemma B.22, this completes within O~(((m/n+k)/c+1)n/k)=O~(((m/n+k)/c)(n/k)+n/k)=O~(((m/n+k)/c)(n/k)+n7/6/m1/2)=O~((m/k+n)/c+n7/6/m1/2)=O~(m1/2n1/6/c+n/c+n7/6/m1/2)\tilde{O}(((m/n+k^{\prime})/c+1)\cdot n/k^{\prime})=\tilde{O}(((m/n+k^{\prime})/c)\cdot(n/k^{\prime})+n/k^{\prime})=\tilde{O}(((m/n+k^{\prime})/c)\cdot(n/k^{\prime})+n^{7/6}/m^{1/2})=\tilde{O}((m/k^{\prime}+n)/c+n^{7/6}/m^{1/2})=\tilde{O}(m^{1/2}n^{1/6}/c+n/c+n^{7/6}/m^{1/2}) rounds. ∎

We now proceed to showing how to compute an approximation of SSSP in the 𝖠𝖢(𝖼)\mathsf{AC(c)} model.

Theorem B.24 ((1+ε)(1+\varepsilon)-Approximation for SSSP in 𝖠𝖢(𝖼)\mathsf{AC(c)}).

There exists an algorithm in the 𝖠𝖢(𝖼)\mathsf{AC(c)} model, such that given a weighted, undirected input graph G=(V,E)G=(V,E), with n=|V|n=|V| and m=|E|=Ω(n3/2)m=|E|=\Omega(n^{3/2}), held in some carrier configuration CC, some 0<ε<10<\varepsilon<1, and a source sVs\in V, ensures that each node knows a (1+ε)(1+\varepsilon)-approximation to its distance from ss. The round complexity of this algorithm is O~((n5/6/c+m/(n2/3c)+1)/ε)\tilde{O}((n^{5/6}/c+m/(n^{2/3}\cdot c)+1)/\varepsilon), w.h.p.

Proof of Theorem B.24.

We construct a (logn/ε,ε)(\log n/\varepsilon,\varepsilon)-hopset HH by using \IfAppendixLABEL:\next ( (Hopset Construction).)Theorem B.19 on CC, in O~((n5/6/c+m/(n2/3c)+1)/ε)\tilde{O}((n^{5/6}/c+m/(n^{2/3}\cdot c)+1)/\varepsilon) rounds, and obtain G=(V,EH)G^{\prime}=(V,E\cup H) held in a carrier configuration DD. Due to the definition of HH, for every vVv\in V, it holds that dG(s,v)dGlogn/ε(s,v)(1+ε)dG(s,v)d_{G}(s,v)\leq d_{G^{\prime}}^{\log n/\varepsilon}(s,v)\leq(1+\varepsilon)\cdot d_{G}(s,v). Notice that since GG, HH, and GG^{\prime} are undirected, for each vVv\in V, the sets DvoutD_{v}^{out}, DvinD_{v}^{in} hold the same edges, and so it is irrelevant which one of them we use. Thus, we denote Dv=DvoutD_{v}=D_{v}^{out} from here on.

We now perform O(logn/ε)O(\log n/\varepsilon) Bellman-Ford iterations on GG^{\prime}, in order to ensure that every node vv knows dGlogn/ε(s,v)d_{G^{\prime}}^{\log n/\varepsilon}(s,v) and as such a (1+ε)(1+\varepsilon)-approximation to dG(s,v)d_{G}(s,v). To do so, we invoke \IfAppendixLABEL:\next ( (Bellman-Ford Iterations in 𝖠𝖢(𝖼)\mathsf{AC(c)}).)Lemma B.22 on GG^{\prime}, which is held in DD, and the source ss, requiring O~((logn/ε)(n+m/n)/c+1)=O~((n1/2/c+m/(nc)+1)/ε)\tilde{O}((\log n/\varepsilon)\cdot(\sqrt{n}+m/n)/c+1)=\tilde{O}((n^{1/2}/c+m/(n\cdot c)+1)/\varepsilon) rounds, since |H|=O~(n3/2)|H|=\tilde{O}(n^{3/2}). ∎

Finally, as our goal is to simulate our SSSP approximation algorithm in other distributed models directly, we provide the following wrapper statement which receives as input a graph where each node knows its neighbors, instead of a graph held in a carrier configuration.

Theorem B.25 ((1+ε)(1+\varepsilon)-Approximation for SSSP in 𝖠𝖢(𝖼)\mathsf{AC(c)} (Wrapper)).

There exists an algorithm in the 𝖠𝖢(𝖼)\mathsf{AC(c)} model, such that given a weighted, undirected input graph G=(V,E)G=(V,E), with n=|V|n=|V| and m=|E|m=|E|, where each node vv knows all the edges incident to it, and the communication tokens of all of its neighbors in GG, some 0<ε<10<\varepsilon<1, and a source sVs\in V, ensures that each node knows a (1+ε)(1+\varepsilon)-approximation to its distance from ss. The round complexity of this algorithm is O~((n5/6/c+m/(n2/3c)+1)/ε+Δ/c)\tilde{O}((n^{5/6}/c+m/(n^{2/3}\cdot c)+1)/\varepsilon+\Delta/c), where Δ\Delta is the maximal degree in the graph, w.h.p.

Proof of Theorem B.25.

We wish to invoke \IfAppendixLABEL:\next ( ((1+ε)(1+\varepsilon)-Approximation for SSSP in 𝖠𝖢(𝖼)\mathsf{AC(c)}).)Theorem B.24, yet the main hurdle in our way is that it requires a graph with at least Ω(n3/2)\Omega(n^{3/2}) edges. Therefore, we build G=(V,E)G^{\prime}=(V,E^{\prime}) such that |E|=Ω(n3/2)\left|E^{\prime}\right|={\Omega}(n^{3/2}), EEE\subseteq E^{\prime}, and for each u,vVu,v\in V, dG(u,v)=dG(u,v)d_{G^{\prime}}(u,v)=d_{G}(u,v). We call a node uu with degree less than O(n1/2){O}(n^{1/2}) a low degree node. Each low degree node uu meets Ω(n1/2)\Omega(n^{1/2}) nodes which are not its neighbors in GG, and adds edges with infinite weight to those nodes. To meet new nodes, each low degree node uu sends its identifier and communication token to O~(n1/2){\tilde{{O}}}(n^{1/2}) random nodes, and each node which received the communication token of uu responds with its identifier and communication token. By \IfAppendixLABEL:\next ( (Sampling Unique Elements).)Lemma B.26, a low degree node uu meets at least Ω(n1/2){\Omega}(n^{1/2}) unique nodes which are not its original neighbors in GG, w.h.p. As such, uu connects itself with edges to these nodes which it meets. Notice that each node was sampled at most O~(n1/2){\tilde{{O}}}(n^{1/2}) times w.h.p. Thus, the maximum degree, Δ\Delta^{\prime}, in GG^{\prime} is Δ+O~(n1/2)\Delta+{\tilde{{O}}}(n^{1/2}). The number of edges added is Θ~(n3/2){\tilde{{\Theta}}}(n^{3/2}), w.h.p., implying that the number of edges in GG^{\prime} is m=m+Θ~(n3/2)=Ω~(n3/2)m^{\prime}=m+{\tilde{{\Theta}}}(n^{3/2})={\tilde{{\Omega}}}(n^{3/2}).

We initialize a carrier configuration CC from GG^{\prime}, using \IfAppendixLABEL:\next ( (Initialize Carrier Configuration).)Lemma B.13 in O~(Δ/c+1)=O~(Δ/c+n1/2/c+1)\tilde{O}(\Delta^{\prime}/c+1)={\tilde{{O}}}(\Delta/c+n^{1/2}/c+1) rounds. Then, we invoke \IfAppendixLABEL:\next ( ((1+ε)(1+\varepsilon)-Approximation for SSSP in 𝖠𝖢(𝖼)\mathsf{AC(c)}).)Theorem B.24 in O~((n5/6/c+m/(n2/3c)+1)/ε)=O~((n5/6/c+m/(n2/3c)+1)/ε)\tilde{O}((n^{5/6}/c+m^{\prime}/(n^{2/3}\cdot c)+1)/\varepsilon)=\tilde{O}((n^{5/6}/c+m/(n^{2/3}\cdot c)+1)/\varepsilon) rounds w.h.p. ∎
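A toy sketch (ours, with hypothetical names) of the densification step above; infinite-weight edges leave all distances unchanged while raising the edge count as Theorem B.24 requires.

```python
import math, random

def densify(n, edges):
    """edges: set of frozenset({u, v}) pairs; assumes sqrt(n) << n.
    Returns the set of added edges, each carrying weight +infinity."""
    deg = [0] * n
    for e in edges:
        for v in e:
            deg[v] += 1
    target = math.isqrt(n)
    added = set()
    for u in range(n):
        while deg[u] < target:       # u is a low degree node
            v = random.randrange(n)  # "meet" a uniformly random node
            e = frozenset((u, v))
            if v != u and e not in edges and e not in added:
                added.add(e)         # a new infinite-weight edge
                deg[u] += 1
                deg[v] += 1
    return added

edges = {frozenset((i, i + 1)) for i in range(99)}  # a path on 100 nodes
print(len(densify(100, edges)), "infinite-weight edges added")
```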

Lemma B.26 (Sampling Unique Elements).

Let cc be some constant. Let SS be a set of nn elements. Let BSB\subseteq S be a set of at most bnb\sqrt{n} bad elements. Denote by G=SBG=S\setminus B the set of good elements. Let rnr\sqrt{n} be the number of required good elements. Let DD be a sequence of length at least dnr0.5lnnlnr+10.5lnnln(b+r)+clnn0.5lnnln(b+r)d\geq\sqrt{n}r\frac{0.5\ln{n}-\ln{r}+1}{0.5\ln{n}-\ln{(b+r)}}+\frac{c\ln{n}}{0.5\ln{n}-\ln{(b+r)}} elements, where each element is sampled independently and uniformly at random from the set SS. Then, there are more than rnr\sqrt{n} unique good elements in the sequence DD with probability at least 1nc1-n^{-c}.

Proof of Lemma B.26.

We upper bound the number of sequences which contain rnr\sqrt{n} or fewer different good elements. There are (nbnrn){\binom{n-b\sqrt{n}}{r\sqrt{n}}} subsets of rnr\sqrt{n} good elements which may appear, and for each of these subsets there are at most ((b+r)n)d((b+r)\sqrt{n})^{d} sequences all of whose elements come from the subset or from BB. Note that many sequences are counted multiple times; however, since we only need an upper bound, this suffices. So the number of bad sequences is upper bounded by:

(nbnrn)((b+r)n)d((nbn)ern)rn((b+r)n)d\displaystyle{\binom{n-b\sqrt{n}}{r\sqrt{n}}}\cdot((b+r)\sqrt{n})^{d}\leq\left(\frac{(n-b\sqrt{n})\cdot e}{r\sqrt{n}}\right)^{r\sqrt{n}}\cdot\left((b+r)\sqrt{n}\right)^{d}\leq
(ne)rn(rn)rn((b+r)n)dndc,\displaystyle\frac{(n\cdot e)^{r\sqrt{n}}}{{(r\sqrt{n})}^{r\sqrt{n}}}\cdot\left((b+r)\sqrt{n}\right)^{d}\leq n^{d-c},

for dd which satisfies nd(b+r)d(ner)rnnc\frac{\sqrt{n}^{d}}{(b+r)^{d}}\geq\left(\frac{\sqrt{n}e}{r}\right)^{r\sqrt{n}}n^{c}. Solving this condition for dd results in dnr0.5lnnlnr+10.5lnnln(b+r)+clnn0.5lnnln(b+r)d\geq\sqrt{n}r\frac{0.5\ln{n}-\ln{r}+1}{0.5\ln{n}-\ln{(b+r)}}+\frac{c\ln{n}}{0.5\ln{n}-\ln{(b+r)}}, for large enough nn.

Since the total number of sequences is ndn^{d}, and all sequences are obtained with the same probability, the probability of getting a bad sequence is upper bounded by ncn^{-c}. ∎

B.5 kk-SSP and APSP

We further show results pertaining to approximating distances from more than one source.

Theorem B.27 ((1+ε)(1+\varepsilon)-Approximation for kk-SSP in 𝖠𝖢(𝖼)\mathsf{AC(c)}).

There exists an algorithm in the 𝖠𝖢(𝖼)\mathsf{AC(c)} model, such that given a weighted, undirected input graph G=(V,E)G=(V,E) with n=|V|n=|V| and m=|E|=Ω(n3/2)m=|E|=\Omega(n^{3/2}), held in a carrier configuration CC, some 0<ε<10<\varepsilon<1, and a set of sources MVM\subseteq V, k=|M|=Ω(n1/3)k=|M|=\Omega(n^{1/3}), outputs:

1.

    A directed graph G=(V,E)G^{\prime}=(V,E^{\prime}) held in a carrier configuration DD, where EE^{\prime} contains an edge from every node vv to every node sMs\in M, where the weight of the edge maintains dG(v,s)w((v,s))(1+ε)dG(v,s)d_{G}(v,s)\leq w((v,s))\leq(1+\varepsilon)\cdot d_{G}(v,s). Notice that it can be the case that EEE\not\subset E^{\prime}.

2.

    Every node vVv\in V, knows a (1+ε)(1+\varepsilon)-approximation for dG(s,v)d_{G}(s,v) for every sMs\in M.

The round complexity of this algorithm is O~((n5/6/c+m/(n2/3c)+kn1/3/c+1)/ε)\tilde{O}((n^{5/6}/c+m/(n^{2/3}\cdot c)+k\cdot n^{1/3}/c+1)/\varepsilon), w.h.p.

Proof of Theorem B.27.

This proof is split into three parts.

First, we construct a (logn/ε,ε)(\log n/\varepsilon,\varepsilon)-hopset HH by using \IfAppendixLABEL:\next ( (Hopset Construction).)Theorem B.19 on CC, in O~((n5/6/c+m/(n2/3c)+1)/ε)\tilde{O}((n^{5/6}/c+m/(n^{2/3}\cdot c)+1)/\varepsilon) rounds, and obtain G=(V,EH)G^{\prime}=(V,E\cup H) held in a carrier configuration CC^{\prime}. Due to the definition of HH, for every vVv\in V, it holds that dG(s,v)dGlogn/ε(s,v)(1+ε)dG(s,v)d_{G}(s,v)\leq d_{G^{\prime}}^{\log n/\varepsilon}(s,v)\leq(1+\varepsilon)\cdot d_{G}(s,v).

Next, we make the IDs of the nodes in MM globally known within O~(k/c+1)\tilde{O}(k/c+1) rounds, using \IfAppendixLABEL:\next ( (Broadcasting).)Lemma B.5. This enables us to invoke \IfAppendixLABEL:\next ( ((S,d,k)(S,d,k)-source detection Algorithm).)Lemma B.21 on GG^{\prime} with S=MS=M, k=|M|k=|M|, and d=O(logn/ε)d=O(\log n/\varepsilon), creating the carrier configuration DD which is described in the statement of this theorem. This requires O~(([k+m/n]n1/3/c+n2/3/c+1)(d+1))=O~((m/(n2/3c)+kn1/3/c+n2/3/c+1)/ε)\tilde{O}(([k+m/n]\cdot n^{1/3}/c+n^{2/3}/c+1)\cdot(d+1))=\tilde{O}((m/(n^{2/3}\cdot c)+k\cdot n^{1/3}/c+n^{2/3}/c+1)/\varepsilon) rounds.

Finally, to satisfy the second guarantee, node vv learns all the edges held in DvoutD_{v}^{out}, using \IfAppendixLABEL:\next ( (Learn Carried Information).)Lemma B.15 within O~(k/c+1)\tilde{O}(k/c+1) rounds. ∎

Theorem B.28 ((3+ε)(3+\varepsilon)-Approximation for Scattered APSP in 𝖠𝖢(𝖼)\mathsf{AC(c)}).

There exists an algorithm in the 𝖠𝖢(𝖼)\mathsf{AC(c)} model, such that given a weighted, undirected input graph G=(V,E)G=(V,E) with n=|V|n=|V| and m=|E|=Ω(n3/2)m=|E|=\Omega(n^{3/2}), held in a carrier configuration CC, and some 0<ε<10<\varepsilon<1, solves the (3+ε)(3+\varepsilon)-Approximate Scattered APSP problem (\IfAppendixLABEL:\next ( (Scattered APSP).)Definition 4) on GG.

That is, the algorithm ensures that for every u,vVu,v\in V, there exist nodes wuvw_{uv}, wvuw_{vu} (potentially wuv=wvuw_{uv}=w_{vu}), which each know a (3+ε)(3+\varepsilon) approximation to dG(u,v)d_{G}(u,v), and node uu knows the identifier and communication token of node wuvw_{uv}, while node vv knows the identifier and communication token of wvuw_{vu}.

Further, for a given node uu, the following hold:

1.

    The set Wu={wuv|vV}W_{u}=\{w_{uv}\ |\ v\in V\} contains at most O~(n1/2)\tilde{O}(n^{1/2}) unique nodes.

2.

    Node uu can compute a string of O~(n1/2)\tilde{O}(n^{1/2}) bits, sus_{u}, such that using sus_{u}, for any vVv\in V, it is possible to determine wWuw\in W_{u} such that w=wuvw=w_{uv}.

3.

    Denote Pu={xV|vVP_{u}=\{x\in V\ |\ \exists v\in V s.t. u=wxv}u=w_{xv}\}. It holds that |Pu|=O~(n1/2)|P_{u}|=\tilde{O}(n^{1/2}).

The round complexity of this algorithm is O~((n5/6/c+m/(n2/3c)+1)/ε)\tilde{O}((n^{5/6}/c+m/(n^{2/3}\cdot c)+1)/\varepsilon), w.h.p.

The outline of the proof breaks into two parts – initialization and reshuffling.

Initialization: First, every node vv computes Θ(n1/2)\Theta(n^{1/2}) of its nearest neighbors, K(v)K(v). Next, a (1+ε)(1+\varepsilon)-approximation for distances from VV to a random set AA of Θ~(n1/2)\tilde{\Theta}(n^{1/2}) nodes is computed, denoted by d~\tilde{d}.

It holds that for each vVv\in V, w.h.p., AK(v)A\cap K(v)\neq\emptyset, and so we denote by p(v)p(v) a closest node to vv in AK(v)A\cap K(v).

Reshuffling: Due to \IfAppendixLABEL:\next ( (APSP using kk-nearest and MSSP).)Claim A.3, for every two nodes v,uv,u, it holds that dG(v,p(v))+d~(p(v),u)d_{G}(v,p(v))+\tilde{d}(p(v),u) is a (3+ε)(3+\varepsilon) approximation to dG(v,u)d_{G}(v,u). As GG is undirected, both d~(p(v),u)\tilde{d}(p(v),u) and d~(u,p(v))\tilde{d}(u,p(v)) can be used, and so we work with d~(u,p(v))\tilde{d}(u,p(v)). Thus, we desire a state where for every two nodes v,uv,u, there exists a node wvuw_{vu} which knows dG(v,p(v))d_{G}(v,p(v)) and d~(u,p(v))\tilde{d}(u,p(v)), and whose identifier is known to vv, and a node wuvw_{uv} which knows dG(u,p(u))d_{G}(u,p(u)) and d~(v,p(u))\tilde{d}(v,p(u)) and whose identifier is known to uu, concluding the proof.

Proof of Theorem B.28.

Initialization: Invoke \IfAppendixLABEL:\next ( (kk-nearest Algorithm).)Lemma B.20 on GG, with k=Θ(n1/2)k=\Theta(n^{1/2}), within O~(kn1/3/c+n2/3/c+1)=O~(n5/6/c+1)\tilde{O}(k\cdot n^{1/3}/c+n^{2/3}/c+1)=\tilde{O}(n^{5/6}/c+1) rounds, to get a directed graph G=(V,E)G^{\prime}=(V,E^{\prime}) held in a carrier configuration CC^{\prime}, where EE^{\prime} contains an edge from every node vVv\in V to every node uK(v)u\in K(v) with weight dG(v,u)d_{G}(v,u), where K(v)K(v) is a set of kk closest nodes to vv. Using \IfAppendixLABEL:\next ( (Learn Carried Information).)Lemma B.15, in O~(k/c)=O~(n1/2/c)\tilde{O}(k/c)=\tilde{O}(n^{1/2}/c) rounds, every node vv itself knows all the distances to the nodes in K(v)K(v).

Then, a random set AA of Θ~(n1/2)\tilde{\Theta}(n^{1/2}) nodes is selected, by letting each node join AA with probability Θ(n1/2)\Theta(n^{-1/2}), and Tokens(A)Tokens(A) are broadcast, using \IfAppendixLABEL:\next ( (Broadcasting).)Lemma B.5 within O~(|A|/c)=O~(n1/2/c+1)\tilde{O}(|A|/c)=\tilde{O}(n^{1/2}/c+1) rounds. As seen in \IfAppendixLABEL:\next ( (APSP using kk-nearest and MSSP).)Claim A.3, it holds that for each vVv\in V, w.h.p., AK(v)A\cap K(v)\neq\emptyset, and so we denote by p(v)p(v) a closest node to vv in AK(v)A\cap K(v). Node vv knows p(v)p(v), since it knows the distances to all the nodes in K(v)K(v). Finally, invoke \IfAppendixLABEL:\next ( ((1+ε)(1+\varepsilon)-Approximation for kk-SSP in 𝖠𝖢(𝖼)\mathsf{AC(c)}).)Theorem B.27 on GG, using M=AM=A as the source set, to compute a (1+ε)(1+\varepsilon) approximation for distances from VV to all of AA, denoted by d~\tilde{d}, requiring O~((n5/6/c+m/(n2/3c)+1)/ε)\tilde{O}((n^{5/6}/c+m/(n^{2/3}\cdot c)+1)/\varepsilon) rounds. As per the specifications of Theorem B.27, d~\tilde{d} is stored in a carrier configuration DD as edges in a directed graph G′′G^{\prime\prime}, where for each vV,aAv\in V,a\in A, there is an edge with weight wG′′(v,a)=d~(v,a)w_{G^{\prime\prime}}(v,a)=\tilde{d}(v,a).  

Reshuffling: For every node aAa\in A, denote by C(a)={vV|p(v)=a}C(a)=\{v\in V\ |\ p(v)=a\}. Notice that aa does not know the set C(a)C(a).

First, we compute all the values {|C(a)||aA}\{|C(a)|\ |\ a\in A\} at once and make them known to all the nodes in VV, using \IfAppendixLABEL:\next ( (Aggregation).)Corollary B.6 within O~(|A|/c)=O~(n1/2/c+1)\tilde{O}(|A|/c)=\tilde{O}(n^{1/2}/c+1) rounds.

Base Case (The Set A0A_{0}): Denote by A0A_{0} the nodes aAa\in A with |C(a)|2n/|A|=Θ~(n1/2)|C(a)|\leq 2n/|A|=\tilde{\Theta}(n^{1/2}). Since the values {|C(a)||aA}\{|C(a)|\ |\ a\in A\} are globally known, every node knows which nodes are in A0A_{0}. Fix a node aA0a\in A_{0}. Every node vC(a)v\in C(a) sends to aa the following values: (1) the identifier vv, (2) the value dG(v,a)d_{G}(v,a), and (3) the communication token of vv. Node vv knows all of these values, and also knows the communication token of aa (as Tokens(A)Tokens(A) are broadcast initially), and so node vv can send these three messages to node aa. This requires O~(|C(a)|/c+1)=O~(n1/2/c+1)\tilde{O}(|C(a)|/c+1)=\tilde{O}(n^{1/2}/c+1) rounds using \IfAppendixLABEL:\next ( (Routing).)Lemma B.1. Node aa broadcasts the information it receives to DainD_{a}^{in}, in O~(n1/2/c+1)\tilde{O}(n^{1/2}/c+1) rounds, using \IfAppendixLABEL:\next ( (Carriers Broadcast and Aggregate).)Lemma B.14. Observe some node wDainw\in D_{a}^{in}. Notice that due to Item 4, and due to the fact that there is an edge in DD from every node in VV to aa, there exists some interval Iw=[wb,we][n]I_{w}=[w_{b},w_{e}]\subseteq[n], such that node ww knows the values {d~(v,a)|vIw}\{\tilde{d}(v,a)|v\in I_{w}\}. Node ww sends to each vC(a)v\in C(a) the values wbw_{b} and wew_{e}, using \IfAppendixLABEL:\next ( (Routing).)Lemma B.1. This is possible as ww knows Tokens(C(a))Tokens(C(a)) (since aa broadcast the information it received, including Tokens(C(a))Tokens(C(a)), to DainD_{a}^{in}), and takes O~(|C(a)|/c+|Dain|/c+1)=O~(n1/2/c+1)\tilde{O}(|C(a)|/c+|D_{a}^{in}|/c+1)=\tilde{O}(n^{1/2}/c+1) rounds (|Dain|=O(n/|A|)=O~(n1/2)|D_{a}^{in}|=O(n/|A|)=\tilde{O}(n^{1/2}), as the average degree in DD is |A||A|, and each node aAa\in A has nn edges directed towards it in DD), as ww sends O(|C(a)|)O(|C(a)|) messages and every node in C(a)C(a) receives |Dain||D_{a}^{in}| messages.

Fix some vVv\in V with p(v)A0p(v)\in A_{0}; we claim that the output of the theorem is satisfied for vv. Observe that given any uVu\in V, there exists some node wvuDp(v)inw_{vu}\in D_{p(v)}^{in} which knows both d(v,p(v))d(v,p(v)) and d~(u,p(v))\tilde{d}(u,p(v)), and further vv knows which node wvuw_{vu} is, as it is the node such that uIwu\in I_{w}. Finally, notice that we also satisfy that Wu={wuv|vV}W_{u}=\{w_{uv}\ |\ v\in V\} and Pu={xV|vVP_{u}=\{x\in V\ |\ \exists v\in V s.t. u=wxv}u=w_{xv}\} contain O~(n1/2)\tilde{O}(n^{1/2}) unique nodes, and that it is possible to condense into O~(n1/2)\tilde{O}(n^{1/2}) bits the information describing which node in WuW_{u} is wuvw_{uv}, for any vVv\in V, as this depends only on the intervals which each node in DainD_{a}^{in} holds. Thus, all the conditions of the statement we are proving are satisfied for the case of A0A_{0}.

Iterative Case (Sets AiA_{i}): We proceed in O(logn)O(\log n) iterations. Fix the iteration counter i[O(logn)]i\in[O(\log n)]. Denote by AiA_{i} the set of nodes aAa\in A with 2in/|A|<|C(a)|2i+1n/|A|=Θ~(2i+1n1/2)2^{i}n/|A|<|C(a)|\leq 2^{i+1}n/|A|=\tilde{\Theta}(2^{i+1}n^{1/2}). Notice that aA|C(a)|=n\sum_{a\in A}|C(a)|=n, and therefore |Ai||A|/2i|A_{i}|\leq|A|/2^{i}. Further, the values {|C(a)||aA}\{|C(a)|\ |\ a\in A\} are globally known, implying that the contents of AiA_{i} are globally known, and so all the nodes locally compute an assignment of 2i2^{i} unique nodes H(a)AH(a)\subseteq A to each aAia\in A_{i}.

Fix aAia\in A_{i}. The nodes DainD_{a}^{in} duplicate the information which they hold, so that for each aH(a)a^{\prime}\in H(a), the nodes DainD_{a^{\prime}}^{in} will contain the information held in DainD_{a}^{in}. We use the following observation: Since the graph is connected, for any xAx\in A, the nodes DxinD_{x}^{in} hold exactly nn values, {d~(v,x)|vV}\{\tilde{d}(v,x)\ |\ v\in V\}. Combining this with the average degree in DD being Θ(|A|)\Theta(|A|), gives that |Dain|=O~(n/|A|)=O~(n1/2)|D_{a}^{in}|=\tilde{O}(n/|A|)=\tilde{O}(n^{1/2}). Node aa selects some node h1H(a)h_{1}\in H(a), and sends to h1h_{1} the values Tokens(Dain)Tokens(D_{a}^{in}), in O~(n1/2/c+1)\tilde{O}(n^{1/2}/c+1) rounds. Then, h1h_{1} broadcasts Tokens(Dain)Tokens(D_{a}^{in}) to Dh1inD_{h_{1}}^{in}, in O~(n1/2/c+1)\tilde{O}(n^{1/2}/c+1) rounds, using \IfAppendixLABEL:\next ( (Carriers Broadcast and Aggregate).)Lemma B.14. Due to the observation above, DainD_{a}^{in} and Dh1inD_{h_{1}}^{in} each hold exactly nn values, and so each carrier node in Dh1inD_{h_{1}}^{in} selects a unique carrier node in DainD_{a}^{in} and, within O~(n1/2/c+1)\tilde{O}(n^{1/2}/c+1) rounds, learns all of the information which it holds. Thus, the nodes Dh1inD_{h_{1}}^{in} know all of the information which the nodes DainD_{a}^{in} know. Then, nodes aa and h1h_{1} select nodes h2,h3H(a)h_{2},h_{3}\in H(a), and repeat the process where aa sends information to h2h_{2} and h1h_{1} sends information to h3h_{3}. This process is repeated for log|H(a)|=log2i=i=O~(1)\log|H(a)|=\log 2^{i}=i=\tilde{O}(1) iterations, until the information has been spread to all of H(a)H(a) and their corresponding carrier nodes.

Fix aAia\in A_{i}. The set C(a)C(a) is split into 2i2^{i} roughly equal-sized parts, C1(a),,C2i(a)C_{1}(a),\dots,C_{2^{i}}(a). The main challenge is that no node in the graph knows all of C(a)C(a), and therefore partitioning C(a)C(a) into C1(a),,C2i(a)C_{1}(a),\dots,C_{2^{i}}(a) is not trivial. We overcome this final challenge as follows. Every node wDainw\in D_{a}^{in} observes IwI_{w} (defined above as the interval such that ww knows d~(v,a)\tilde{d}(v,a) for all vIwv\in I_{w}) and sends each vIwv\in I_{w} a message asking if it is in C(a)C(a). Notice that this is possible since ww knows the communication tokens of all of IwI_{w}, due to Item 4. Since |Iw|=O~(|A|)=O~(n1/2)|I_{w}|=\tilde{O}(|A|)=\tilde{O}(n^{1/2}), node ww sends at most O~(n1/2)\tilde{O}(n^{1/2}) messages. Further, as |A|=O~(n1/2)|A|=\tilde{O}(n^{1/2}), each node in the graph receives at most O~(n1/2)\tilde{O}(n^{1/2}) messages, and so all of these messages may be routed in O~(n1/2/c+1)\tilde{O}(n^{1/2}/c+1) rounds using \IfAppendixLABEL:\next ( (Routing).)Lemma B.1. Thus, the identities of all of C(a)C(a) are dispersed across the carrier nodes DainD_{a}^{in}. Similarly to the step above, this information is duplicated, in O~(n1/2/c+1)\tilde{O}(n^{1/2}/c+1) rounds, in order to make sure that for each hH(a)h\in H(a), the carrier nodes DhinD_{h}^{in} hold the identifiers of C(a)C(a) as well. Finally, for each hH(a)h\in H(a) (w.l.o.g. nodes in H(a)H(a) are numbered from 1 to |H(a)||H(a)| – possible since H(a)H(a) is globally known), node hh performs a binary search, by querying its carrier nodes DhinD_{h}^{in}, in order to find the interval Jh=[hb,he]J_{h}=[h_{b},h_{e}] of nodes, such that the number of nodes in C(a)C(a) with identifiers less than hbh_{b} is at most (h1)|C(a)|/|H(a)|(h-1)\cdot|C(a)|/|H(a)|, and the number of nodes in C(a)C(a) with identifiers in the interval JhJ_{h} is roughly |C(a)|/|H(a)||C(a)|/|H(a)|. These nodes form the set Ch(a)C_{h}(a). Node hh broadcasts hbh_{b} and heh_{e} to DhinD_{h}^{in}, and then uses \IfAppendixLABEL:\next ( (Learn Carried Information with Predicate).)Lemma B.16 in order to learn the identifiers and communication tokens of Ch(a)C_{h}(a), in O~(|Ch(a)|/c+1)=O~(n1/2/c+1)\tilde{O}(|C_{h}(a)|/c+1)=\tilde{O}(n^{1/2}/c+1) rounds. Finally, within O~(|Ch(a)|/c+1)=O~(n1/2/c+1)\tilde{O}(|C_{h}(a)|/c+1)=\tilde{O}(n^{1/2}/c+1) rounds, node hh messages the nodes Ch(a)C_{h}(a) to notify them that they are in Ch(a)C_{h}(a), which overcomes the challenge.

Fix aAia\in A_{i} and hH(a)h\in H(a). The nodes in Ch(a)C_{h}(a) repeat the same process as done for the case of A0A_{0}, by communicating with hh. That is, each vCh(a)v\in C_{h}(a) sends to hh the values: (1) the identifier vv, (2) the value dG(v,a)d_{G}(v,a), and (3) the communication token of vv. Then, hh broadcasts this to DhinD_{h}^{in}. As shown for the case of A0A_{0}, this requires O~(n1/2/c+1)\tilde{O}(n^{1/2}/c+1) rounds, and completes the proof. ∎

Appendix C The 𝖧𝗒𝖻𝗋𝗂𝖽\mathsf{Hybrid} Model – Missing Proofs

C.1 Preliminaries – Extended Subsection

C.1.1 Communication Primitives

We observe several basic routing claims which are known in the 𝖧𝗒𝖻𝗋𝗂𝖽\mathsf{Hybrid} model.

In [7, Theorem 2.2], the following is shown for the weaker 𝖭𝗈𝖽𝖾-𝖢𝖺𝗉𝖺𝖼𝗂𝗍𝖺𝗍𝖾𝖽𝖢𝗅𝗂𝗊𝗎𝖾\mathsf{Node\text{-}Capacitated\ Clique} model (in this model there are only global edges), and trivially holds in the 𝖧𝗒𝖻𝗋𝗂𝖽\mathsf{Hybrid} model.

Claim C.1 (Aggregate and Broadcast).

There is an Aggregate-and-Broadcast Algorithm that solves any Aggregate-and-Broadcast Problem in O(logn){O}(\log n) rounds in the 𝖧𝗒𝖻𝗋𝗂𝖽\mathsf{Hybrid} model.

In [49, 8], solutions are presented for the problems in \IfAppendixLABEL:\next ( (Token Dissemination).)Claim C.2 (see [8, Theorem 2.1]) and \IfAppendixLABEL:\next ( (Token Routing).)Claim C.3 (see [49, Theorem 2.2]). Token dissemination is useful for broadcasting, while token routing can be used in a fashion that is more similar to unicast.

Definition 18 (Token Dissemination Problem).

The problem of making kk distinct tokens globally known, where each token is initially known to one node, and each node initially knows at most \ell tokens is called the (k,)(k,\ell)-Token Dissemination (TD) problem.

Claim C.2 (Token Dissemination).

There is an algorithm that solves (k,)(k,\ell)-TD in the 𝖧𝗒𝖻𝗋𝗂𝖽\mathsf{Hybrid} model in O~(k+){\tilde{{O}}}(\sqrt{k}+\ell) rounds, w.h.p.

The following is discussed in [49] for token routing. We later redefine the problem and remove the strong assumption which requires that each receiver knows the number of messages each sender sends it.

Definition 19 (Token Routing Problem).

The token routing problem is defined as follows. Let SVS\subseteq V be a set of sender nodes and RVR\subseteq V be a set of receiver nodes. Each sender needs to send at most kSk_{S} tokens and each receiver needs to receive at most kRk_{R} tokens, of size O(logn){O}(\log n) bits each. Each token has a dedicated receiver node rRr\in R, and each receiver rRr\in R knows the senders it must receive a token from and how many tokens it needs to receive from each sender. The token routing problem is solved when all nodes in RR know all tokens they are the receivers of.

Claim C.3 (Token Routing).

Let S,RVS,R\subseteq V be sets of nodes sampled from VV with probabilities pS=nxS1p_{S}=n^{x_{S}-1} and pR=nxR1p_{R}=n^{x_{R}-1}, for constants xS,xR(0,1]x_{S},x_{R}\in(0,1], respectively. Let kSk_{S} and kRk_{R} be the number of tokens to be sent or received by any node in SS and RR, respectively. Let K=|S|kS+|R|kRK=\left|S\right|\cdot k_{S}+\left|R\right|\cdot k_{R} be the total workload. The token routing problem can be solved in O~(Kn+kS+kR){\tilde{{O}}}(\frac{K}{n}+\sqrt{k_{S}}+\sqrt{k_{R}}) rounds in the 𝖧𝗒𝖻𝗋𝗂𝖽\mathsf{Hybrid} model w.h.p.

The following claim enables sending a polynomial number of messages uniformly at random while obeying the constraints of the model.

Claim C.4 (Uniform Sending).

[8, Lemma 3.1] Presume some 𝖧𝗒𝖻𝗋𝗂𝖽\mathsf{Hybrid} model algorithm takes at most p(n)p(n) rounds for some polynomial pp. Presume that each round, every node sends at most σ=Θ(logn)\sigma={\Theta}(\log n) messages via global edges to σ\sigma targets in VV sampled independently and uniformly at random. Then there is a ρ=Θ(logn)\rho={\Theta}(\log n) such that for sufficiently large nn, in every round, every node in VV receives at most ρ\rho messages per round w.h.p.

C.1.2 Skeleton Graph

We use the notion of the skeleton graph presented in [8, 49] and augment it with additional conditions. In particular, its nodes are well spaced in the graph and satisfy the properties of marked nodes stated above.

Definition 20 (Extended Skeleton Graph).

Given a graph G=(V,E)G=(V,E) and a value 0<x<10<x<1, a graph Sx=(M,ES)S_{x}=(M,E_{S}) is called a skeleton graph in GG, if all of the following hold.

1.

    {v,u}ES\{v,u\}\in E_{S} if and only if there is a path of at most h=Θ~(n1x)h=\tilde{\Theta}{(n^{1-x})} edges between v,uv,u in GG.

2.

    Every node vMv\in M knows all its incident edges in ESE_{S}.

3.

    SxS_{x} is connected.

4.

    For any two nodes v,vMv,v^{\prime}\in M, dS(v,v)=dG(v,v)d_{S}(v,v^{\prime})=d_{G}(v,v^{\prime}).

5.

    For any two nodes u,vVu,v\in V with hop(u,v)hhop(u,v)\geq h, there is at least one shortest path PP from uu to vv in GG, such that any sub-path QQ of PP with at least hh nodes contains a node wMw\in M.

6.

    |M|=Θ~(nx)|M|={\tilde{{\Theta}}}(n^{x}).

7.

    For each vMv\in M there is a helper set HvH_{v} which satisfies:

(a)

      |Hv|=n1x|H_{v}|=n^{1-x}.

(b)

      uHv:hop(u,v)=O~(n1x)\forall u\in H_{v}\colon hop(u,v)={\tilde{{O}}}(n^{1-x}).

(c)

      For each node uVu\in V, there are at most O~(1){\tilde{{O}}}(1) nodes VuMV_{u}\subseteq M such that for each wVuw\in V_{u}, uHwu\in H_{w}.

    4. (d)

      |vMHv|=Ω~(n)\left|\bigcup_{v\in M}H_{v}\right|={\tilde{{\Omega}}}(n).

In this definition, we merge the properties used by [8, 49], slightly adjust Property 7a and prove Property 7d.

Claim C.5 (Skeleton From Random Nodes).

Given a graph G=(V,E), a value 0<x<1, and a set of nodes M marked independently with probability n^{x-1}, there is an algorithm which constructs a skeleton graph S_x=(M,E_S) in Õ(n^{1-x}) rounds w.h.p. If also given a single node s∈V, it is possible to construct S_x=(M∪{s},E_S) without damaging the properties of S_x.

Proof of Claim C.5.

Similarly to [8, Algorithm 7] and [49, Algorithm 6], the algorithm for constructing the skeleton graph S_x=(M,E_S) is to learn the Θ̃(n^{1-x})-hop neighborhood and to run [49, Algorithm 1] to compute the helper sets.

We group and slightly extend the claims given in [8, 49]. Properties 3 and 4 hold w.h.p. since G is connected, see [8, Lemma 4.3] or [49, Lemma C.2]. Property 5 follows from [8, Lemma 4.2] or [49, Lemma C.1]. Property 6 follows from Chernoff Bounds. The helper sets described in Property 7 are computed using [49, Algorithm 1], and in [49, Lemma 2.2], their Properties 7b and 7c are proven. It is also shown there that, w.h.p., for every v∈M it holds that |H_v|≥n^{1-x}, and thus in an additional Õ(n^{1-x}) rounds of local communication, we select exactly n^{1-x} helpers and obtain Property 7a. The remaining Property 7d of the helper sets states that almost all of the nodes in the graph help other nodes. This holds since there are |M|=Ω̃(n^x) skeleton nodes, each has n^{1-x} helpers, and each helper helps Õ(1) skeleton nodes, so, by double counting, the overall number of distinct helpers is at least Ω̃(n^x)·n^{1-x}/Õ(1)=Ω̃(n).

Finally, for adding a given node s to the skeleton graph, notice that, as stated in [49], this node can take as helpers the n^{1-x} nodes closest to it. For each helper node, this at most doubles the number of skeleton nodes it helps (see Property 7c). ∎

For the sake of formality in the following proofs, as some are stated for a set of marked nodes and some for the skeleton graph, we also show the following Corollary C.6 (Construct Skeleton).

Corollary C.6 (Construct Skeleton).

Given a graph G=(V,E) and a value 0<x<1, there is an algorithm which constructs a skeleton graph S_x=(M,E_S) in Õ(n^{1-x}) rounds w.h.p. Further, if also given a single node s∈V, it is possible to ensure that s∈M without damaging the properties of S_x.

Proof of Corollary C.6.

First, mark each node independently with probability n^{x-1}, obtaining a set of skeleton nodes M. Then, using the algorithm from Claim C.5 (Skeleton From Random Nodes), it is possible to construct the skeleton graph S_x=(M,E_S) within Õ(n^{1-x}) rounds w.h.p. If a single node s is given, it is added to the skeleton using the second part of Claim C.5. ∎

We show several primitives related to communication within skeleton graphs.

We show the following claim which, given a skeleton graph S_x=(M,E_S), assigns the nodes M unique IDs from the set [|M|]. This is useful, among other things, for symmetry breaking and synchronization among the skeleton nodes.

Claim C.7 (Unique IDs).

Given a graph G=(V,E) and a skeleton graph S_x=(M,E_S), it is possible to assign the nodes M unique IDs from the set [|M|] within Õ(1) rounds in the Hybrid model, w.h.p.

Proof of Claim C.7.

We construct a binary tree of Õ(1) depth over the nodes M, and then assign each node an ID equal to its index in the pre-order traversal of the tree.

The nodes M compute the node v′∈M with the minimal initial ID (the ID it has due to the definition of the Hybrid model). Notice it is possible to identify v′ and ensure that all nodes in M know the identifier of v′ within Õ(1) rounds due to Claim C.1 (Aggregate and Broadcast). Further, using Claim C.1, the nodes compute |M|.

Next, node v′ chooses two nodes a,b at random from G and sends them each a message. Nodes a,b each reply with a random node α,β, respectively, where a∈H_α, b∈H_β. Node v′ repeats this process as long as it does not receive two distinct nodes α,β. Node v′ then sends messages to both α,β and lets them know that they are its children in the tree. The nodes added to the tree continue this process, each of them randomly choosing two nodes as its children, until it receives two distinct nodes which are not already in the tree, or until some Õ(1) rounds have elapsed. Clearly, w.h.p., this process constructs a binary tree of depth Õ(1) within Õ(1) rounds.

Finally, we would like to assign an ordering to the nodes. Each node tells its parent the size of its subtree. That is, the leaves tell their parents that they are leaves, and whenever a node has heard from all its children, it tells its parent how many nodes are in its subtree. Then, the root of the tree, v′, begins with the ID palette [|M|], takes the first ID for itself, and passes down two contiguous intervals of possible IDs, broken according to the sizes of the subtrees of its children, to its two children – with the left child receiving the interval with smaller IDs. Inductively, each node takes the first ID from the palette it receives from its parent, breaks the palette into two contiguous parts according to the sizes of the subtrees of its children, and sends the part with smaller IDs to its left child and the higher part to its right child. Since the depth of the tree is Õ(1), this completes in Õ(1) rounds, w.h.p. ∎
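The two passes over the tree can be made concrete. The following self-contained Python sketch (our illustration; the Node class is a stand-in for the distributed tree, and IDs are 0-based for simplicity) computes subtree sizes bottom-up and then splits the ID palette top-down, exactly as in the proof:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    size: int = field(init=False, default=1)    # subtree size, reported bottom-up
    new_id: int = field(init=False, default=-1)

def compute_sizes(v: Optional[Node]) -> int:
    """Bottom-up pass: leaves report 1, internal nodes sum their children."""
    if v is None:
        return 0
    v.size = 1 + compute_sizes(v.left) + compute_sizes(v.right)
    return v.size

def assign_preorder_ids(v: Optional[Node], lo: int) -> None:
    """Top-down pass: v takes the first ID of its interval [lo, lo + v.size),
    passes the next left-subtree-size IDs left and the remainder right."""
    if v is None:
        return
    v.new_id = lo
    left_size = v.left.size if v.left else 0
    assign_preorder_ids(v.left, lo + 1)
    assign_preorder_ids(v.right, lo + 1 + left_size)

# Example: a five-node tree receives the IDs 0..4 in pre-order.
root = Node(Node(Node(), Node()), Node())
compute_sizes(root)
assign_preorder_ids(root, 0)
```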

We use the following statement from [18] to prove Lemma 4.5 (SSSP with Low Average and High Maximal Degrees).

Lemma C.8 (Reassign Skeletons).

[18, Lemma 29] Given a graph G=(V,E), a skeleton graph S_x=(M,E_S), a value k which is known to all the nodes, and nodes A⊆V such that each u∈A has at least Θ̃(k·|A|) nodes M_u⊆M in its Θ̃(n^{1-x}) neighborhood, there is an algorithm that assigns K_u⊆M_u nodes to u, where |K_u|=Ω̃(k), such that each node in M is assigned to at most Õ(1) nodes in A. With respect to the set A, it is only required that every node in G knows whether or not it itself is in A – that is, the entire contents of A do not have to be globally known. The algorithm runs in Õ(n^{1-x}) rounds in the Hybrid model, w.h.p.

The skeleton-based techniques allow us to quickly approximate weighted SSSP in the Hybrid model. Once this is done, the following well-known simple reduction allows us to compute a (2+ε)-approximation of the weighted diameter.

Claim C.9 (Diameter from SSSP).

(see e.g. [18, Claim 34]) Given a graph G=(V,E), a value α>0, and an algorithm which computes an α-approximation of weighted SSSP in T rounds of the Hybrid model, there is an algorithm which computes a 2α-approximation of the weighted diameter in T+Õ(1) rounds of the Hybrid model.
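The reduction is the standard one: run the approximate SSSP from an arbitrary source and output twice the largest approximate distance; by the triangle inequality, the result lies between D and 2αD, where D is the weighted diameter. A minimal Python sketch of this step (ours; approx_sssp is a hypothetical α-approximate SSSP oracle):

```python
def approx_diameter(approx_sssp, graph, source):
    """2*alpha-approximate weighted diameter from one alpha-approximate SSSP:
    D <= 2*ecc(source) and d(source, v) <= dist[v] <= alpha * d(source, v)."""
    dist = approx_sssp(graph, source)  # maps each node v to d~(source, v)
    return 2 * max(dist.values())
```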

We use the following basic claim regarding the use of skeleton graphs for distance computations in the Hybrid model. It is proven in [8].

Claim C.10 (Extend Distances).

[8, Theorem 2.7] Let G=(V,E), let S_x=(M,E_S) be a skeleton graph, and let V′⊆V be the set of source nodes. If for each source node s∈V′, each skeleton node v∈M knows the (α,β)-approximate distance d̃(v,s) such that d(v,s)≤d̃(v,s)≤α·d(v,s)+β, then each node u∈V can compute, for all source nodes s∈V′, a value d̃(u,s) such that d(u,s)≤d̃(u,s)≤α·d(u,s)+β, in Õ(n^{1-x}) rounds.

C.2 Oblivious Token Routing

In [49], the token routing problem over a skeleton graph is introduced and solved, where each receiver r knows the number of tokens each sender s has for r. This is insufficient for our purposes, since we work in the o(n^{1/3}) complexity realm with ω(n^{2/3}) skeleton nodes, where we cannot make the identifiers of the skeleton nodes globally known, let alone the number of messages between pairs of nodes. Therefore, we define the following routing problem, in which the receivers know neither the identifiers of the senders nor the number of messages each sender intends to send them.

Definition 21 (Oblivious Token Routing Problem).

The oblivious-token routing problem is defined as follows. Let S⊆V be a set of sender nodes and R⊆V be a set of receiver nodes. Each sender needs to send, and each receiver needs to receive, at most k tokens of size O(log n) bits each. Each token has a dedicated receiver node r∈R, and each sender s∈S and receiver r∈R knows the bound k on the number of tokens the receiver is going to receive. The oblivious-token routing problem is solved when all nodes in R know all tokens they are the receivers of.

Notice that the assumption of knowing a bound on the number k of messages each receiver gets is easy to eliminate, by having the receiver double its estimate and repeat the algorithm until success, for O(log k) iterations. To verify whether some particular invocation succeeded, we can make a node broadcast failure if it sent or received more than half of its global capacity at some point.
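A sketch of this standard guess-and-double wrapper in Python (ours; route_with_bound stands in for one invocation of the routing algorithm, returning False when some node broadcasts failure):

```python
def route_without_known_bound(route_with_bound, instance) -> None:
    """Run the routing algorithm with a doubled estimate k_hat of the
    per-receiver load until it succeeds; since the true bound k is reached
    after at most ceil(log2(k)) doublings, O(log k) iterations suffice."""
    k_hat = 1
    while not route_with_bound(instance, k_hat):
        k_hat *= 2  # failure was broadcast; retry with a doubled estimate
```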

Lemma C.11 (Oblivious Token Routing).

Given a graph G=(V,E) and a skeleton graph S_x=(M,E_S), let k be an upper bound on the number of tokens to be sent or received by any node in M and let K=2·|M|·k be the total workload. The oblivious-token routing problem can be solved in Õ(k/n^{1-x}+n^{1-x}) rounds, w.h.p., in the Hybrid model.

Proof of Lemma C.11.

The problem overcome in [49, Theorem 2.2] (the non-oblivious case) is that even though there are enough helpers near each skeleton node to send and receive all the messages, it is not straightforward to connect the senders' and receivers' helpers. So, in [49] it is suggested to relay messages via some intermediate receivers. This way, a message is sent by a sender to one of its helpers, by the helper to an intermediate receiver, from there to a helper of the receiver, and from there it is sent to the receiver. To compute the intermediate receiver for message number i from s to r, they apply a pseudo-random hash function h(s,r,i).

However, the receiver needs to be able to compute h(s,r,i) as well, so it needs to know the number of messages it is to receive from each sender s, and we cannot assume this for our purposes.

To overcome this limitation, we assign to helper number i∈[n^{1-x}] of the receiver r the intermediate receiver whose identifier is computed as w=h(r,i), where h is a pseudo-random hash function. We deliver the messages in ⌈k/n^{1-x}⌉ phases. To keep the load balanced between phases, for each message j we sample, independently and uniformly at random, a phase p_j∼U[⌈k/n^{1-x}⌉] on which it will be sent. In order to keep the load balanced between the receivers' helpers and intermediate receivers in some phase p, for each message j we also sample, independently and uniformly at random, a receiver's helper index i_j∼U[n^{1-x}]. The intermediate receiver is decided by the hash function h, i.e., we route message j with final receiver r_j via w=h(r_j,i_j). Unlike [49], we apply h on arguments that are not necessarily distinct, which could increase the number of conflicts. However, we show that every time all nodes apply h, each key (r_j,i_j) is used at most Õ(1) times w.h.p., so due to Claim A.2 (Conflicts) the congestion on each intermediate receiver is Õ(1) w.h.p.

The pseudo-code is provided by Algorithm 1.

Algorithm 1: Oblivious-Token Routing Protocol

1. The node with the minimum ID samples and broadcasts Õ(1) bits of seed.
2. Each node uses the seed to sample a pseudo-random hash function h∈H.
3. Each sender s∈M balances the tokens it needs to send between its helpers H_s.
4. Each receiver r enumerates its helpers u∈H_r and informs them about their indices.
5. Each sender's helper v∈V, for each message j it is assigned to send, samples a receiver's helper index i_j∼U[n^{1-x}] and a phase p_j∼U[⌈k/n^{1-x}⌉].
6. For p from 0 to ⌈k/n^{1-x}⌉ do:
7.     Each sender's helper v∈V, for each message j such that p_j=p, sends it to h(r_j,i_j).
8.     Each receiver's helper u∈V, for each receiver r it helps, sends ⟨u,r⟩ to w=h(r,i), where i is the index of u in H_r.
9.     Each intermediate receiver w∈V which receives ⟨u,r⟩ sends all the messages it received for r in this phase to u.
10. Each receiver r∈M collects the messages addressed to it from its helpers H_r.
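To see why a single phase stays light, the following toy Python simulation (ours; the pseudo-random function h is modeled by a lazily-sampled random function, and only one receiver and phase 0 are simulated) mimics Lines 5 and 7 and reports the heaviest intermediate receiver:

```python
import math
import random
from collections import Counter

def one_phase_load(n: int, x: float, k: int) -> int:
    """Sample a phase and a helper index per message (Line 5), route the
    phase-0 messages to h(r, i) (Line 7), and return the maximum number of
    messages any intermediate receiver gets."""
    helpers = max(1, round(n ** (1 - x)))        # |H_r| = n^{1-x}
    phases = max(1, math.ceil(k / helpers))
    h = {}                                        # lazily-sampled random h(r, i)
    load = Counter()
    for _ in range(k):                            # k messages destined to r
        if random.randrange(phases) != 0:         # keep only phase-0 messages
            continue
        i = random.randrange(helpers)
        w = h.setdefault(("r", i), random.randrange(n))
        load[w] += 1
    return max(load.values(), default=0)

# The heaviest intermediate receiver stays polylogarithmic.
print(one_phase_load(n=10_000, x=0.5, k=5_000))
```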

Notice that each node can play five different roles: it could be a sender s, a receiver r, a sender's helper v, a receiver's helper u, and an intermediate receiver w. Moreover, it can be a sender's or a receiver's helper for up to Õ(1) nodes. We show that it can be an intermediate receiver for Õ(1) receivers' helpers w.h.p.

First, all nodes sample a globally known pseudo-random hash function h: V×[n^{1-x}]↦V from the family H of Θ̃(1)-wise independent random functions, which is used to compute the intermediate receivers for each message (Lines 1 and 2). For this, by Claim A.1 (Seed), Õ(1) bits of globally known seed are enough, and the node with the minimal identifier samples and broadcasts them using Claim C.1 (Aggregate and Broadcast). Afterwards, each sender s∈M distributes the tokens between its helpers H_s in a balanced manner – each sender's helper is assigned at most ⌈k/n^{1-x}⌉ messages to send (Line 3). Each receiver enumerates its helpers by identifiers (Line 4). Each sender's helper v, for each message j it has to send, samples a random phase p_j∼U[⌈k/n^{1-x}⌉] and a random receiver's helper index i_j (Line 5).

We then proceed for ⌈k/n^{1-x}⌉ phases. In phase p, each sender's helper v sends each message j for which it sampled p_j=p to the node h(r_j,i_j) (Line 7). Afterwards, in Line 8, each receiver's helper u, for each receiver r it helps, sends ⟨u,r⟩ to h(r,i), where i is the index of u in H_r computed in Line 4. Each intermediate receiver w sends all messages j it received with destination r_j to the node u from which it received ⟨u,r_j⟩ (Line 9).

Line 1 takes Õ(1) rounds by Claim C.1 (Aggregate and Broadcast), and Lines 3, 4 and 10 are implemented using local edges in Õ(n^{1-x}) rounds. There are Õ(⌈k/n^{1-x}⌉) iterations of the loop in Line 6, and we argue that each of them requires Õ(1) rounds of communication via global edges w.h.p. Overall, the complexity is Õ(k/n^{1-x}+n^{1-x}) rounds.

For each of the k messages designated to some receiver r, the phase number p_j is sampled independently with probability Õ(min{1, n^{1-x}/k}), therefore by a Chernoff Bound, there are Õ(n^{1-x}) messages with r as the final destination which are sent in the p-th phase w.h.p. In the p-th phase, a receiver's helper index for each of these messages is sampled with probability 1/n^{1-x}, therefore by a Chernoff Bound it is sampled Õ(1) times w.h.p. By a union bound over all phases, receivers, and receivers' helper indices, in each phase, for each receiver, each receiver's helper index is selected Õ(1) times w.h.p. Thus, by Claim A.2 (Conflicts), each w∈V is selected as an intermediate receiver Õ(1) times and receives Õ(1) messages in Õ(1) rounds w.h.p. This implies that no message is lost during Line 7 and that Lines 7 and 9 take Õ(1) rounds.

Since each node helps at most Õ(1) senders and due to Chernoff Bounds, each sender's helper sends Õ(1) messages w.h.p. in each phase in Line 7. Since each node is a helper to at most Õ(1) receivers, Line 8 also takes Õ(1) rounds w.h.p. Similarly, by Claim A.2 (Conflicts), since there are Õ(n^x)·n^{1-x}=Õ(n) distinct pairs of receiver and receiver's helper index, w.h.p. each intermediate receiver is assigned to at most Õ(1) receivers' helpers. Thus, Line 9 also takes Õ(1) rounds w.h.p. ∎

See 4.1

Proof of Claim 4.1.

The claim follows by an invocation of Lemma C.11 (Oblivious Token Routing) with parameters x, k, K=Õ(n^x·k), resulting in Õ(k/n^{1-x}+n^{1-x})=Õ(n^{1-x}) rounds, as required. ∎

C.3 𝖡𝗋𝗈𝖺𝖽𝖼𝖺𝗌𝗍𝖢𝗈𝗇𝗀𝖾𝗌𝗍𝖾𝖽𝖢𝗅𝗂𝗊𝗎𝖾\mathsf{Broadcast\ Congested\ Clique} Simulation

We use the following claims from [18] to improve the simulation of the Broadcast Congested Clique model in the Hybrid model.

Lemma C.12 (𝖫𝖮𝖢𝖠𝖫\mathsf{LOCAL} Simulation in 𝖧𝗒𝖻𝗋𝗂𝖽\mathsf{Hybrid}).

[18, Lemma 16] Given a graph G=(V,E) and a skeleton graph S_x=(M,E_S), it is possible to simulate one round of the LOCAL model over S_x within Õ(n^{1-x}) rounds in G in the Hybrid model. That is, within Õ(n^{1-x}) rounds in G in the Hybrid model, any two adjacent nodes in S_x can communicate any amount of data between each other.

Lemma C.13 (Sampled neighbors [18, Lemma 3.1]).

Given a graph G=(V,E). For a value q≤n, there is a value x=Õ(n/q) such that the following holds w.h.p.: let V′⊆V be a subset of |V′|=x nodes sampled uniformly at random from V. Then each node u∈V with deg(u)≥q has a neighbor in V′.
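The statement is easy to probe empirically. The following toy Python experiment (ours, on an Erdős–Rényi graph rather than the setting of [18]) samples x=Θ̃(n/q) nodes and checks that every node of degree at least q has a sampled neighbor:

```python
import math
import random

def check_sampled_neighbors(n: int = 3_000, p: float = 0.05, c: float = 1.5) -> bool:
    """Build G(n, p), sample x = c * (n/q) * ln(n) nodes uniformly, and verify
    that every node of degree >= q = np/2 sees at least one sampled node."""
    adj = [set() for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if random.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    q = max(1, int(n * p / 2))                   # degree threshold
    x = min(n, int(c * (n / q) * math.log(n)))   # |V'| = O~(n/q)
    sampled = set(random.sample(range(n), x))
    return all(adj[u] & sampled for u in range(n) if len(adj[u]) >= q)

print(check_sampled_neighbors())  # True w.h.p.
```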

We show how to simulate the Broadcast Congested Clique model using the AC(c) model and the LOCAL model together; it then follows from Theorem 4.2 (AC(c) Simulation in Hybrid) and Lemma C.12 (LOCAL Simulation in Hybrid) that this can be converted into a simulation in the Hybrid model. The intuition behind the simulation follows from observing Lemma C.13 (Sampled neighbors) – if every node desires to broadcast a single message to the entire graph, then with relatively little bandwidth it is possible to ensure that all nodes above a certain minimal degree will get these messages from all the nodes in the graph. We begin with the simulation of the Broadcast Congested Clique model in the combined AC(c) and LOCAL models.

Lemma C.14 (𝖡𝖢𝖢\mathsf{BCC} Simulation in 𝖠𝖢(𝖼)\mathsf{AC(c)} and 𝖫𝖮𝖢𝖠𝖫\mathsf{LOCAL}).

Given a graph G=(V,E) with average degree k, given an algorithm ALG_BCC in the Broadcast Congested Clique model which runs on G in t rounds, and given some value c, there exists an algorithm which uses Õ(t·n/(√k·c)) rounds of the AC(c) model and O(t) rounds of the LOCAL model on G and simulates ALG_BCC on G. It is assumed that prior to running ALG_BCC, each node v∈V has at most Õ(deg_G(v)) bits of input used in ALG_BCC, including, potentially, the incident edges of v in G. Further, it is assumed that the output of each node in ALG_BCC is at most O(t·log n) bits.

Proof of Lemma C.14.

The outline of the simulation is as follows. We split the graph into high degree nodes, H⊆V, and low degree nodes, L=V∖H, at a certain cut-off. The key idea is that if every node v∈V takes a single message and sends it randomly to c nodes in V, then every u∈H will have at least one neighbor, w.h.p., which hears the message from v, for every v∈V, due to Lemma C.13 (Sampled neighbors). Thus, we choose some subset F⊂H and assign to each node v∈L some node u∈F which partially simulates v. By partially simulating, we mean that, initially, node v tells node u all of its input to ALG_BCC, and then for each round, node u tells v what message v wants to send in that round, and v then sends this message (that it wishes to broadcast) to c random nodes. Finally, we are guaranteed that every node in H hears all the messages broadcast in the graph, which allows u to internally simulate the local computation which v should perform in ALG_BCC before the next round. In a sense, when u simulates v, after each round node v knows what message it wants to send in that round of ALG_BCC, yet not necessarily other information that it would have learned from other nodes in the graph during that round of ALG_BCC. Thus, node v might not know its output in ALG_BCC. To overcome this, notice that u knows the output of v in ALG_BCC, and due to our assumption in the statement of this theorem, each node outputs at most O(t·log n) bits, and so we can simulate another t rounds where each v just broadcasts its output (ensuring that it itself receives it from u).

Initialization
We begin by showing how to initialize the nodes of high degree which simulate those of low degree. The cut-off for being a high or low degree node is Θ(√k). That is, we desire to simulate every node v∈V with deg(v)=o(√k) using a node u∈V with degree deg(u)=Ω(√k). Observe that since k is the average degree, there are Θ(nk) edges in the graph. Since the maximal degree is at most n, there must be at least k nodes with degree at least Ω(√k). Thus, we denote by F the k nodes in G with the highest degrees, and are guaranteed that for each v∈F, deg(v)=Ω(√k). Notice that it is possible within Õ(1) rounds to count the number of nodes in V with degree above a threshold, using Corollary B.6 (Aggregation), and thus within Õ(1) rounds it is possible to binary search for the degree of the node with the k-th highest degree, allowing each node to know whether or not it is in F.
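A sketch of this binary search in Python (ours; count_at_least stands in for one Õ(1)-round aggregation of Corollary B.6 (Aggregation) that counts the nodes whose degree is at least a given threshold):

```python
def kth_highest_degree(count_at_least, k: int, max_deg: int) -> int:
    """Binary search for the largest threshold d such that at least k nodes
    have degree >= d; each probe costs one aggregation, so O(log max_deg)
    aggregations suffice."""
    lo, hi = 0, max_deg
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if count_at_least(mid) >= k:
            lo = mid   # still at least k nodes at or above mid; search higher
        else:
            hi = mid - 1
    return lo
```

A node then knows it is in F when its degree is at least the returned threshold (breaking ties, say, by identifiers).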

Let v∈F. The node v now knows that it is in F, and thus randomly sends Θ̃(n/k) messages, using Θ̃(n/(k·c))≤Õ(n/(√k·c)) rounds of the AC(c) model, containing its ID and its communication token in the AC(c) model. Clearly, w.h.p., every node u∈V∖F has received a message from at least one node v∈F. Thus, if node u needs simulating, that is, deg(u)=o(√k), it arbitrarily chooses some node v∈F from which it heard, and tells v that it should simulate u. Denote by J_v the set of nodes which choose v to simulate them. Each node v∈F, upon receiving J_v, chooses an arbitrary order for J_v and sends back to each u∈J_v its index in that order.

Every node v∈F now attempts to learn all the input to ALG_BCC of the nodes which it simulates. Notice that now, for every node v∈F, it holds that |J_v| is at most Õ(n/k). Notice that each node u∈J_v has degree deg(u)=o(√k)=O(√k), since otherwise it would have opted not to be simulated by any node, implying, by the constraints of this theorem, that u has at most Õ(√k) bits of input to ALG_BCC, and therefore all the nodes J_v together desire to send to v at most Õ(n/√k) messages. Since v synchronized all the nodes in J_v by sending each of them its index in some order of J_v, it is possible to send all this data to v in Õ(n/(√k·c)) rounds of the AC(c) model: To do so, assume that every node u∈J_v wishes to send exactly d=Θ̃(√k) messages to v (we can assume this since u wants to send at most Õ(√k) messages to v, and so it can just add extra empty messages at the end). Therefore, since node u knows its index in J_v, for some ordering which v decided on, it is possible to order all the messages from all of J_v to v in such a way that each node u∈J_v knows the indices of its messages, and such that in every round neither v receives more than c messages, nor a node u∈J_v sends more than c messages.
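The implicit ordering can be realized by a static slot schedule, sketched below (our illustration): the i-th message of the node with index idx in J_v occupies the global slot idx·d+i, and slot t is transmitted in round ⌊t/c⌋, so v receives at most c messages per round and no sender exceeds c messages per round either.

```python
def send_round(idx: int, i: int, d: int, c: int) -> int:
    """Round in which the i-th of the d messages of the idx-th node of J_v is
    sent to v: slots are consumed c per round, so v receives at most c messages
    per round, and each sender's d consecutive slots span O(d/c) rounds."""
    slot = idx * d + i   # every sender is padded to exactly d messages
    return slot // c
```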

Round Simulation
We now show how to simulate each of the t rounds of the given ALG_BCC. That is, we show the two final steps of the simulation: how v∈F tells each node in J_v what value to randomly send to nodes across the graph, and how each node in F gets all the messages which were sent by all the nodes. The first part is simple – we already saw that node v can send a single, unique message to each u∈J_v within Õ(n/(√k·c)) rounds of the AC(c) model. The second part follows from Lemma C.13 (Sampled neighbors): for every node v∈F (which has deg_G(v)≥q=√k) to receive a single message from some node u∈V, it is enough for node u to send the message x=Õ(n/q)=Õ(n/√k) times to nodes sampled uniformly at random, and for each node v∈F to learn the received messages from its neighbors in G. Sending x=Õ(n/√k) messages, each to a random node, requires Õ(n/(√k·c)) rounds of the AC(c) model, and aggregating messages from neighbors requires a single round of the LOCAL model.

Output
It is critical that every node knows its output at the end of the simulation of ALG_BCC. This is ensured since we assume in the statement of this theorem that the output of every node in ALG_BCC is at most O(t·log n) bits. Thus, instead of simulating ALG_BCC directly, we simulate an algorithm ALG′_BCC which is just like ALG_BCC, yet is followed by t rounds in which each node broadcasts its output. Since ALG_BCC takes t rounds, this simply doubles the round complexity achieved above. Due to the fact that our simulation maintains that each node knows all the messages which it broadcasts during the simulated algorithm, every node will necessarily know its output. ∎

Finally, we show how to use Lemma C.14 (BCC Simulation in AC(c) and LOCAL) for simulating a Broadcast Congested Clique algorithm in the Hybrid model.

See 4.3

Proof of Theorem 4.3.

We execute the simulation of Lemma C.14 (BCC Simulation in AC(c) and LOCAL) on G and S_x with c=Θ̃(n^{2-2x}), in order to obtain an algorithm ALG_{WACC,LOCAL} which simulates ALG_BCC on S_x within t_1=Õ(t·n^x/(√k·c))=Õ(t·n^x/(√k·n^{2-2x}))=Õ(t·n^{3x-2}/√k) rounds of the AC(c) model and t_2=O(t) rounds of the LOCAL model on S_x. Due to Theorem 4.2 (AC(c) Simulation in Hybrid), since c=Θ̃(n^{2-2x}), it is possible to simulate the AC(c) rounds of ALG_{WACC,LOCAL} on S_x in Õ(t_1·n^{1-x})=Õ(t·n^{2x-1}/√k) rounds of the Hybrid model on G. Likewise, using Lemma C.12 (LOCAL Simulation in Hybrid), it is possible to simulate the LOCAL rounds of ALG_{WACC,LOCAL} on S_x in Õ(t_2·n^{1-x})=Õ(t·n^{1-x}) rounds of the Hybrid model on G. ∎

C.4 A (1+ε)(1+\varepsilon)-Approximation for SSSP

We show the missing proof of how we compute SSSP with low average degree and high maximal degree.

See 4.5

Proof of Lemma 4.5.

Let v′∈M be some node with degree deg_{S_x}(v′)=ω(n^{2-2x}). Observe that such a node v′ exists and can be found and agreed upon by all the nodes of G using Claim C.1 (Aggregate and Broadcast) within Õ(1) rounds. Denote by N_{v′}, where |N_{v′}|=Θ(n^{2-2x}), an arbitrary subset of the neighbors of v′. Using Claim C.2 (Token Dissemination), it is possible to ensure within Õ(n^{1-x}) rounds that every node in the graph knows which nodes are in N_{v′}. We strive to have the nodes of M send all the contents of E_S to the nodes N_{v′}.

We now show how the nodes N_{v′} can learn all of E_S. We show that there is a way to do this in which every node in M desires to send and receive at most Õ(n^{2-2x}) messages, and therefore, due to Claim 4.1 (Skeleton Unicast), the routing completes in Õ(n^{1-x}) rounds.

We start by showing that the nodes N_{v′} even have the bandwidth to receive E_S within at most Õ(n^{1-x}) rounds. Due to Claim 4.1 (Skeleton Unicast), each node in N_{v′} can receive Θ̃(n^{2-2x}) messages in Õ(n^{1-x}) rounds, implying that in total N_{v′} can receive Θ̃(|N_{v′}|·n^{2-2x})=Θ̃(n^{4-4x}) messages. Since the average degree in S_x is Õ(n^{x/2}), we have |E_S|=Õ(n^{3x/2}), and 3x/2≤4-4x if and only if x≤8/11, which holds since we assume x≤12/17≤8/11.

We now show that each node v∈M has the bandwidth to send all its incident edges within at most Õ(n^{1-x}) rounds. Notice that if deg_{S_x}(v)=Õ(n^{2-2x}), then it can clearly do so. Thus, for all other nodes in M, which have higher degrees, we assign some of the other nodes of M in their Θ̃(n^{1-x}) neighborhood to assist them. Let A⊆M be the set of all nodes in M such that v∈A has deg_{S_x}(v)=Ω(n^{2-2x}). We strive to invoke Lemma C.8 (Reassign Skeletons) on A in order to assign each v∈A some Θ̃(n^{3x-2}) nodes, denoted A_v⊆M, where A_v are in the Õ(n^{1-x})-hop neighborhood of v in G, and where each node in M is assigned to at most Õ(1) nodes in A. Thus, we must show that each node v∈A has in its Õ(n^{1-x}) neighborhood in G at least Θ̃(|A|·n^{3x-2}) nodes of M. Recall that deg_{S_x}(v)=Ω(n^{2-2x}), implying that v has at least Ω(n^{2-2x}) nodes of M in its Õ(n^{1-x}) neighborhood in G. We thus strive to show that Θ̃(|A|·n^{3x-2})=O(n^{2-2x}). Notice that |A|=Õ(n^{3x/2}/n^{2-2x})=Õ(n^{7x/2-2}), since the average degree in S_x is at most Õ(n^{x/2}) and the minimal degree of a node in A is Ω(n^{2-2x}). Thus, Θ̃(|A|·n^{3x-2})=Õ(n^{13x/2-4}), and since 13x/2-4≤2-2x if and only if x≤12/17, which is given in the conditions of this statement, we conclude. Therefore, it is possible to invoke Lemma C.8 (Reassign Skeletons), and thus we can assume that each node v∈A is assigned the nodes A_v defined previously. Now, node v∈A distributes its incident edges in S_x to the nodes A_v uniformly, using the local edges of the Hybrid model, within Õ(n^{1-x}) rounds. Since v has at most |M|=Õ(n^x) incident edges in S_x, this means that each node u∈A_v receives at most Õ(n^{x-3x+2})=Õ(n^{2-2x}) messages from v. Every node u∈A_v takes responsibility for the edges it received from v and later forwards them to the nodes N_{v′}. Notice that since each node u∈M is assigned to at most Õ(1) nodes in A, each u takes responsibility for at most Õ(n^{2-2x}) messages in total.

At last, notice that we reach a state where every node in S_x wishes to send at most Õ(n^{2-2x}) messages to the nodes in N_{v′} – this is since nodes with at most Õ(n^{2-2x}) neighbors in S_x (nodes in M∖A) have at most that many messages, and each node v∈A distributed that many messages per node in A_v. Further, as stated above, the total number of messages to send is Õ(n^{3x/2}). Notice that for each message it does not matter which node in N_{v′} receives it, and so for each message we select the target in N_{v′} independently and uniformly. The expected number of messages each node in N_{v′} receives is Õ(n^{3x/2-2+2x})=Õ(n^{7x/2-2})=Õ(n^{2-2x}), where the last transition holds since x≤12/17≤8/11, as seen previously. Since the targets of the messages are independent, by an application of a Chernoff Bound and a union bound over all of N_{v′}, the number of messages each node receives is Õ(n^{2-2x}) w.h.p. Thus, by Claim 4.1 (Skeleton Unicast), it is possible to route the messages within Õ(n^{1-x}) rounds w.h.p. This, combined with the fact that the contents of N_{v′} were previously made globally known using Claim C.2 (Token Dissemination), allows every node v∈M to locally compute to which node in N_{v′} it should deliver each of its messages in a way such that every node in N_{v′} receives roughly the same number of messages across all the messages being sent. At this point, since every node in M desires to send and receive at most Θ̃(n^{2-2x}) messages, it is possible to invoke Claim 4.1 (Skeleton Unicast) in order to route all these messages within Θ̃(n^{1-x}) rounds.

Finally, since now the nodes in N_{v′} know all of E_S, node v′ can learn all of E_S by learning all the information stored in N_{v′} within Õ(n^{1-x}) rounds. Node v′ can then compute the exact distance from the source s∈M to any node v∈M. Thus, node v′ desires to tell every node v∈M the value of d_{S_x}(s,v). This is possible since S_x is connected, and thus every node v∈M sent at least one message toward N_{v′} throughout the above algorithm; therefore, it is possible to reverse the direction of the messages sent above in order to ensure that, for each v∈M, node v′ can send a unique message to v, within the same round complexity as the above algorithm. ∎

Appendix D The 𝖢𝖮𝖭𝖦𝖤𝖲𝖳\mathsf{CONGEST} Model – Missing Proofs

We begin with some preliminaries for this section.

Claim D.1 (𝖢𝖮𝖭𝖦𝖤𝖲𝖳\mathsf{CONGEST} Routing).

[38, Theorem 1.2][20, Theorem 2] Consider a graph G=(V,E) with an identifier assignment ID: V↦[n] such that any node u, given ID(v), can compute ⌊log deg(v)⌋, and a set of point-to-point routing requests, each given by the identifiers of the corresponding source-destination pair. If each node v of G is the source and the destination of at most deg_G(v)·2^{O(√log n)} messages, there is a randomized distributed algorithm that delivers all messages in time τ_mix·2^{O(√log n)} in the CONGEST model, w.h.p.

Corollary D.2 (Identifiers).

In the CONGEST model, in O(τ_mix+log n) time, we can compute an ID assignment ID: V↦[n] and other information such that ID(u)<ID(v) implies ⌊log deg(u)⌋≤⌊log deg(v)⌋, and such that any vertex u, given ID(v), can locally compute ⌊log deg(v)⌋ for any v.

Proof of Corollary D.2.

[20, Lemma 4.1] shows how to compute the aforementioned set of identifiers in O(D+log n) rounds in the CONGEST model, where D is the diameter of the graph. Since the diameter is at most the mixing time τ_mix, the identifiers are also computable in O(τ_mix+log n) rounds w.h.p. ∎
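One way to realize such an assignment, sketched in Python (our illustration, not the algorithm of [20]), is to sort the nodes by ⌊log deg⌋ and hand out IDs in that order; since the resulting bucket sequence is non-decreasing, publishing its O(log n) bucket boundaries lets any node map an ID back to a log-degree:

```python
import math

def assign_ids_by_log_degree(degrees):
    """Hand out IDs 1..n in non-decreasing order of floor(log2(deg)); return
    the ID map and the (sorted) per-ID log-degree sequence."""
    order = sorted(range(len(degrees)), key=lambda v: int(math.log2(degrees[v])))
    id_of = {v: i + 1 for i, v in enumerate(order)}
    log_degs = [int(math.log2(degrees[v])) for v in order]  # non-decreasing
    return id_of, log_degs

def log_deg_from_id(ident: int, log_degs) -> int:
    """Recover floor(log2(deg(v))) from ID(v); as log_degs is sorted, it is
    determined by the O(log n) bucket boundaries alone."""
    return log_degs[ident - 1]

# Example: four nodes of degrees 1, 2, 4, 5 get IDs sorted by log-degree.
ids, log_degs = assign_ids_by_log_degree([1, 2, 4, 5])
```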

The following claim shows how to build a carrier configuration of the m′-supergraph G′=(V′,E′,w′) of the input graph G=(V,E,w) in the CONGEST model. Let ID_new be an assignment of identifiers such that any node u can compute ⌊log deg(v)⌋ using ID_new(v). For an added node v∈V′∖V of the supergraph, we assume that the old identifier v and the new identifier ID_new(v) are equal, i.e., v=ID_new(v), and greater than the identifier of any original node v′∈V. Denote by ρ: [n]↦[n] a globally known simulation assignment, which satisfies |{i: ρ(i)=i′}|=O(deg_G(i′)/k) for each new identifier i′.

Claim D.3 (Build Carrier Configurations in 𝖢𝖮𝖭𝖦𝖤𝖲𝖳\mathsf{CONGEST}).

Given a graph G, k=|E|/|V|, and an assignment of new identifiers ID_new: V↦[n]. Let m′ be such that 0≤m′≤n². Assume that for each v∈V, node ρ(ID_new(v)) knows the original identifier of v and ID_new(v). There is an algorithm that builds a carrier configuration C, which holds an m′-supergraph G′ of G. The communication token in C for node u is a concatenation of u, ρ(u) and ID_new(u). The algorithm runs in ⌈m′/m⌉·τ_mix·2^{O(√log n)} rounds w.h.p. and ensures that the information for carrier node i∈[n] (carried edges and communication tokens) is stored in the node ρ(i), which simulates i.

Proof of Claim D.3.

We show how to build the outgoing carrier configuration C^out which holds the m′-supergraph G′. The incoming carrier configuration is built similarly and simultaneously.

Representation: The added ⌈m′/n⌉ nodes are represented by the first ⌈m′/n⌉ nodes with the lowest ID_new identifiers. So the i-th added node is simulated by ρ(i).

Carrier Allocation for Added Nodes: We preallocate outgoing carriers for the added ⌈m′/n⌉≤n carried nodes V′∖V. For this, we compute |E′|=|E|+m′ using BFS, and set |V′|=|V|+⌈m′/n⌉ and k′=|E′|/|V′|. Then, we split the outgoing edges of each added node into ⌈n/k′⌉ batches of size at most k′. Notice that there are at most ⌈m′/n⌉·⌈|V′|/k′⌉=O(n) added batches, which we assign to the original nodes V to carry, such that each outgoing carrier node carries a constant number of batches. This assignment is done in terms of ID_new. Now each node knows locally, for each added node v∈V′∖V, the identifiers ID_new of its outgoing carrier nodes C_v^out. In particular, each outgoing carrier u knows which added edges it carries.

Carrier Allocation for Original Nodes: Now, we allocate the part of the outgoing carrier configuration which stores the original nodes and edges. Each node v samples ⌈deg_{G′}(v)/k′⌉ identifiers (ID_new) independently and uniformly at random. Those become its outgoing carriers C_v^out (Item 1). By Chernoff bounds, there is a constant ζ such that each node is an outgoing carrier for at most ζ·log n nodes w.h.p. (Item 2).

Acquainting: For each assigned (for added nodes) or sampled (for original nodes) outgoing carrier identifier i, which belongs to some outgoing carrier u, the carried node v or its representative knows the identifier ρ(i) of i's simulating node. Node v (or its representative) sends the identifiers v, ρ(v) and ID_new(v), and the identifier i of the carrier node u, directly to the simulating node with the new identifier ρ(i). This requires each node v to send at most Õ(⌈deg_{G′}(v)/k′⌉)=Õ(deg_G(v)) messages and to receive ⌈deg_G(v)/k⌉=Õ(deg_G(v)) messages w.h.p. The simulating node ρ(i) responds with the identifiers u, ρ(u). Again, each node sends and receives at most Õ(deg_G(v)) messages w.h.p. Now, each carried node v, for each u∈C_v^out, knows the communication token of u, which is the concatenation of u, ρ(u) and ID_new(u) (Item 1). Also, for each carrier node u, u's simulating node w with identifier ρ(ID_new(u)) knows the communication tokens of each carried node v whose edges u carries (Item 2).

Each carried node v sorts its outgoing carrier nodes by identifiers. It partitions the interval [n′] into ⌈deg_{G′}(v)/k′⌉ contiguous sub-intervals, each containing at most k′ identifiers of opposite endpoints of outgoing edges. We assign the j-th sub-interval to the j-th carrier. For each carrier u, we send to its simulating node w the boundaries of its interval. Each node sends Õ(⌈deg_{G′}(v)/k′⌉)=Õ(deg_G(v)) and receives ⌈deg_G(v)/k⌉=Õ(deg_G(v)) messages w.h.p. (Items 4 and 5)

Then, the carried node v, for each original outgoing edge e=(v,v′), sends to its other endpoint v′ the communication token of the outgoing carrier which is assigned to carry the edge e. Now, Item 5 is satisfied for original outgoing edges, but not yet for added outgoing edges. This requires sending O(1) messages over the edges of G.

Consider an added outgoing edge e=(v,v′), where v is an added node and v′ is an original one. Let u∈C_v^out be the carrier of the outgoing part of the edge e and u′∈C_{v′}^out be the carrier of the incoming part of the edge e. The new identifier ID_new(u) is globally known by construction, given v=ID_new(v). Thus, the new identifier of u is globally known, as well as the identifier of its simulating node w′. Let w be the simulating node of u′. Node w sends to w′ the tuple ⟨u′,w,ID_new(u′)⟩. This requires each simulating node to send or to receive Õ(k′·deg_G(v)/k) messages. This makes Item 5 satisfied for added outgoing edges as well.

Communication Tree: Each carried node v (or its representative) locally builds a communication tree on its outgoing carrier nodes and sends to each node which simulates a carrier node its parent and children in the tree (Items 6 and 3). Here each node sends O(1) messages and receives O(deg_G(v)/k)=O(deg_G(v)) messages.

Carrier Population: Each node v, for each of its outgoing carriers u, sends to the node w which simulates u the batch of original edges assigned for u to carry, along with the communication tokens of the carriers of the opposite direction of these edges (Items 3 and 4). For this, each node sends Õ(deg_G(v)) messages and receives Õ(k′·deg_G(v)/k) messages. For the added edges, we send the identifiers of the first and last edges they store. To do so, each node sends Õ(1) messages and receives Õ(deg_G(v)/k) messages.

Round Complexity: The carrier allocation phase is done locally, thus requires no communication.

In the acquainting phase, we use the routing algorithm from Claim D.1 (CONGEST Routing) for the problems where each node v sends and receives Õ(deg_G(v)) messages and Õ(k′·deg_G(v)/k)=Õ(⌈m′/m⌉·deg_G(v)) messages. Thus, it requires ⌈m′/m⌉·τ_mix·2^{O(√log n)} rounds w.h.p.

Building the communication trees requires only a single invocation of Claim D.1 (CONGEST Routing), and thus terminates in ⌈m′/m⌉·τ_mix·2^{O(√log n)} rounds w.h.p.

For the carrier population phase, each node v sends Õ(deg_G(v)) and receives Õ(k′·deg_G(v)/k)=Õ(⌈m′/m⌉·deg_G(v)) messages, and thus it runs in ⌈m′/m⌉·τ_mix·2^{O(√log n)} rounds w.h.p.

The overall complexity is ⌈m′/m⌉·τ_mix·2^{O(√log n)} rounds w.h.p. ∎

Claim D.4 (Assignment).

Let n₁,n₂,m,x₁,…,x_{n₁},y₁,…,y_{n₂} be integers such that m<Σ_{i=1}^{n₁} 2^{x_i}+Σ_{i=1}^{n₂} 2^{y_i}, where for each i∈[n₁]: 0≤x_i<⌊log k⌋−2 and for each j∈[n₂]: ⌊log k⌋−2≤y_j, with n=n₁+n₂ and k=m/n. There is a partition of [n₁] into n₂ sets I₁,…,I_{n₂}, such that for each j∈[n₂]: |I_j|≤4·⌊2^{y_j}/k⌋.

Proof of Claim D.4.

We construct the sets greedily, by adding new elements to the set I_j as long as its size is less than 4·⌊2^{y_j}/k⌋. We notice that the total capacity of the sets,

    Σ_{j=1}^{n₂} 4·⌊2^{y_j}/k⌋ ≥ 4·Σ_{j=1}^{n₂} 2^{y_j}/k − n
        > 4·(m − Σ_{i=1}^{n₁} 2^{x_i})/k − n > 4·(m − Σ_{i=1}^{n₁} 2^{⌊log k⌋−2})/k − n
        ≥ 4·(m − n·2^{log k − 2})/k − n = 4·(m − n·(m/4n))/(m/n) − n
        = 2n ≥ n,

is enough to hold all elements. ∎
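The greedy construction translates directly into code; a minimal Python sketch (ours):

```python
def greedy_partition(n1: int, ys, k: int):
    """Place the elements of [n1] into sets I_1..I_{n2} greedily, filling set j
    up to its capacity 4 * floor(2**ys[j] / k), as in the proof of Claim D.4."""
    sets = [[] for _ in ys]
    j = 0
    for i in range(n1):
        while j < len(sets) and len(sets[j]) >= 4 * (2 ** ys[j] // k):
            j += 1  # set j reached its capacity; move to the next set
        if j == len(sets):
            # cannot happen under the capacity count in the proof above
            raise ValueError("total capacity exceeded")
        sets[j].append(i)
    return sets

# Example: six small elements, capacities 4*floor(16/8)=8 and 4*floor(32/8)=16.
print(greedy_partition(6, [4, 5], 8))  # [[0, 1, 2, 3, 4, 5], []]
```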

See 5.3

Proof of Lemma 5.3.

First, we compute m and m/n using a BFS algorithm in O(D)=O(τ_mix) rounds. We simulate the algorithm from Lemma B.20 (k-nearest Algorithm) to compute the max{k,n^{1/3}}-nearest problem in a carrier configuration (notice that Lemma B.20 works for k≥n^{1/3}), and then we simulate the algorithm from Lemma B.15 (Learn Carried Information) to learn the edges stored in the output carrier configuration C^out by the nodes in the AC(m/n) model. The simulation in the CONGEST model is done by the algorithm from Theorem 5.1 (AC(c) Simulation in CONGEST). Notice that the out-degree in the resulting graph is max{k,n^{1/3}}, and we truncate the output of each node to k·log n bits before the end of the simulation. In the AC(c) model, solving k-nearest requires Õ(max{k,n^{1/3}}·n^{1/3}/c + n^{2/3}/c + 1) rounds; thus, in the CONGEST model, the simulation round complexity is ((max{k,n^{1/3}}·n^{1/3}/(m/n) + n^{2/3}/(m/n) + 1 + k/c)·(m/n)/(m/n) + k)·τ_mix·2^{O(√log n)} = (k·n^{4/3}/m + n^{5/3}/m + 1)·τ_mix·2^{O(√log n)} rounds w.h.p. ∎