
On Sparsity Awareness in Distributed Computations

Keren Censor-Hillel    Dean Leitersdorf    Volodymyr Polosukhin
(Technion, {ckeren, leitersdorf, po}@cs.technion.ac.il)

We extract a core principle that underlies seemingly different fundamental distributed settings, which is that sparsity awareness may induce faster algorithms for core problems in these settings. To leverage this, we establish a new framework by developing an intermediate auxiliary model which is weak enough to be successfully simulated in the classic $\mathsf{CONGEST}$ model given low mixing time, as well as in the recently introduced $\mathsf{Hybrid}$ model. We prove that despite imposing harsh restrictions, this artificial model allows balancing massive data transfers with a maximal utilization of bandwidth. We then exemplify the power we gain from our methods, by deriving fast shortest-paths algorithms which greatly improve upon the state-of-the-art.

Specifically, we obtain the following end results for graphs of $n$ nodes:

  • A $(3+\varepsilon)$ approximation for weighted, undirected APSP in $(n^{1/2}+n/\delta)\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}$ rounds in the $\mathsf{CONGEST}$ model, where $\delta$ is the minimum degree of the graph and $\tau_{\text{mix}}$ is its mixing time. For graphs with $\delta=\tau_{\text{mix}}\cdot 2^{\omega(\sqrt{\log n})}$, this takes $o(n)$ rounds, despite the $\Omega(n)$ lower bound known for general graphs [Nanongkai, STOC'14].

  • An $(n^{7/6}/m^{1/2}+n^{2}/m)\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}$-round exact SSSP algorithm in the $\mathsf{CONGEST}$ model, for graphs with $m$ edges and a mixing time of $\tau_{\text{mix}}$. This improves upon the previous algorithm of [Chechik and Mukhtar, PODC'20] for significant ranges of values of $m$ and $\tau_{\text{mix}}$.

  • A $\mathsf{Congested\ Clique}$ simulation in the $\mathsf{CONGEST}$ model which improves upon the previous state-of-the-art simulation of [Ghaffari, Kuhn, and Su, PODC'17] by a factor that is proportional to the average degree in the graph.

  • An $\tilde{O}(n^{5/17}/\varepsilon^{9})$-round algorithm for a $(1+\varepsilon)$ approximation for SSSP in the $\mathsf{Hybrid}$ model. The only previous $o(n^{1/3})$-round algorithm for distance approximations in this model is for a much larger approximation factor of $(1/\varepsilon)^{O(1/\varepsilon)}$ in $\tilde{O}(n^{\varepsilon})$ rounds [Augustine, Hinnenthal, Kuhn, Scheideler, Schneider, SODA'20].

1 Introduction

The overarching theme of this paper is laying down an algorithmic infrastructure and employing it for designing fast algorithms in seemingly unrelated distributed settings, namely, the classic $\mathsf{CONGEST}$ model and the recently-introduced $\mathsf{Hybrid}$ model.

The $\mathsf{CONGEST}$ model [56] abstracts a synchronous network of $n$ nodes, in which in each round of computation, each node can send a message of $O(\log n)$ bits on each of its links. A recent line of work addresses computing MST, distances, and data summarization in $\mathsf{CONGEST}$ on graphs with low mixing time [38, 40, 60], and finding small subgraphs in $\mathsf{CONGEST}$ benefits from efficient computation inside components of low mixing time [20, 21, 22, 15, 13, 30, 48]. Low mixing time is, in particular, a property of expander graphs, which have been shown to be useful for designing data centers [43, 27].

The $\mathsf{Hybrid}$ model, recently introduced by [8], abstracts networks supporting high-bandwidth communication over local edges, as well as very low-bandwidth communication over global edges. Aligned with most previous work on this model, we assume here unbounded bandwidth for messages to neighbors, and $O(\log n)$-bit messages to a constant number of nodes anywhere in the network. This model in particular abstracts recent developments in hybrid data centers [26, 42, 47, 62]. Most research in the $\mathsf{Hybrid}$ model has been devoted to distance computation problems [8, 49, 32, 18].

While these settings highly differ, we pinpoint an approach that underlies computation in both, namely, sparsity and density awareness. The key tasks we tackle are those requiring the transfer of massive amounts of data. Our general approach is to design load balancing mechanisms that leverage the full bandwidth of a given communication model. Examples of tasks that enjoy our framework are matrix multiplication and distance computations.

For the purpose of our framework, as an auxiliary tool (not as a computational model in its own right), we define the $\mathsf{Anonymous\ Capacitated}$ model with capacity $c$ (abbreviated $\mathsf{AC(c)}$), which is an all-to-all setting whose main characteristics are a limit on the bandwidth per node and the anonymity of nodes. To cope with the harsh nature of the $\mathsf{AC(c)}$ model, which is needed in order to allow it to be efficiently simulated, we develop a distributed data structure and accompanying algorithms, dedicated towards load balancing and full utilization of the available bandwidth.

We then show sparsity aware simulations of the $\mathsf{AC(c)}$ model in the $\mathsf{CONGEST}$ and $\mathsf{Hybrid}$ settings. Specifically, the simulations focus on utilizing all the available bandwidth in the underlying models, even when given highly skewed inputs. Combined, these yield our end results: fast algorithms for distance computations in low-mixing-time $\mathsf{CONGEST}$ and in the $\mathsf{Hybrid}$ model.

A first flavor of our end results: One of our main contributions is proving that the size $n$ of the graph is not fine-grained enough to capture the complexity of the all-pairs shortest-paths problem (APSP) in the $\mathsf{CONGEST}$ model. While there is an $\tilde{\Omega}(n)$ lower bound for general graphs, even when allowing large approximation factors [54], and a matching randomized algorithm giving an exact solution [12], we show that one can go significantly below this complexity, depending on the minimal degree in the graph and its mixing time.

Theorem 1.1 ($(3+\varepsilon)$-Approximation for APSP in $\mathsf{CONGEST}$).

For any constant $0<\varepsilon<1$, and weighted, undirected graph $G$ with minimal degree $\delta$ and mixing time $\tau_{\text{mix}}$, there is an algorithm in the $\mathsf{CONGEST}$ model computing a $(3+\varepsilon)$ approximation to APSP on $G$ within $(n^{1/2}+n/\delta)\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}$ rounds (note that there is a typo in the complexity stated for this theorem in the SPAA'21 proceedings), w.h.p.

For any constant $0<\gamma<1$, consider a graph $G$ with $\delta=n^{\gamma}\cdot\tau_{\text{mix}}\cdot 2^{\omega(\sqrt{\log n})}$. Using Theorem 1.1, it is possible to approximate weighted, undirected APSP on $G$ in $n^{1/2}\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}+O(n^{1-\gamma})$ rounds, w.h.p., in the $\mathsf{CONGEST}$ model, which is a major improvement over the linear complexity of the general case. This is aligned with a conclusion obtained from the single-source shortest paths (SSSP) algorithm of [40], which reflects that $n$ and the diameter $D$ are insufficient for capturing the complexity of SSSP. Our result suggests that for APSP the dependence parameters could be $n$, $\delta$ and $\tau_{\text{mix}}$, and this opens the complexity landscape of APSP in the $\mathsf{CONGEST}$ model to further exciting research.
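To make the arithmetic explicit, substituting $\delta=n^{\gamma}\cdot\tau_{\text{mix}}\cdot 2^{\omega(\sqrt{\log n})}$ into the bound of Theorem 1.1 gives

$$(n^{1/2}+n/\delta)\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}=n^{1/2}\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}+\frac{n^{1-\gamma}}{\tau_{\text{mix}}\cdot 2^{\omega(\sqrt{\log n})}}\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}=n^{1/2}\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}+O(n^{1-\gamma}),$$

where the last step holds since the $\tau_{\text{mix}}$ factors cancel and the $2^{\omega(\sqrt{\log n})}$ factor in $\delta$ swallows the $2^{O(\sqrt{\log n})}$ overhead.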

1.1 Our Contributions

1.1.1 Fast Algorithms for the $\mathsf{Hybrid}$ Model

The pioneering works of [8, 49] lay down technical foundations showing that utilizing both the local and global edges in the $\mathsf{Hybrid}$ model allows solutions which are much faster than algorithms using only the local or only the global edges. One of their prime contributions is showing that the complexity of exact and approximate APSP is $\tilde{\Theta}(n^{1/2})$ rounds.

The $\tilde{\Theta}(n^{1/3})$-round regime is also of importance in this model, as [49, 18] show a variety of algorithms with this complexity. For diameter, a lower bound of $\tilde{\Omega}(n^{1/3})$ rounds for exact unweighted diameter and for a $(2-\varepsilon)$ approximation for weighted diameter is shown (the first lower bound in the $\mathsf{Hybrid}$ model for a problem with a small output), matched by an $\tilde{O}(n^{1/3})$-round algorithm for a $(1+\varepsilon)$ approximation for unweighted diameter, and a $2$ approximation for weighted diameter. For weighted distances, $\tilde{O}(n^{1/3})$-round algorithms are shown for approximations from polynomially many sources, in addition to exact distances from $\tilde{O}(n^{1/3})$ sources.

Our main contribution in the $\mathsf{Hybrid}$ model is breaking below $o(n^{1/3})$ rounds for a $(1+\varepsilon)$ approximation for weighted single-source shortest paths (SSSP), which also implies a $(2+\varepsilon)$ approximation for weighted diameter. As we elaborate upon in the following subsections, this requires establishing an entire foundation of techniques which are a core contribution of the paper. We show the following algorithm which completes in $\tilde{O}(n^{5/17})$ rounds, w.h.p. (as is common, high probability means at least $1-n^{-c}$, for a constant $c>1$).

Theorem 1.2 ($(1+\varepsilon)$-Approximation for SSSP in $\mathsf{Hybrid}$).

Given a weighted, undirected graph $G=(V,E)$, with $n=|V|$ and $m=|E|$, a value $0<\varepsilon<1$, and a source $s\in V$, there is a $\mathsf{Hybrid}$ model algorithm computing a $(1+\varepsilon)$-approximation of SSSP from $s$, in $\tilde{O}(n^{5/17}/\varepsilon^{9})$ rounds, w.h.p.

Our results raise an interesting open question: what is the best round complexity for a $2$ approximation of the diameter? We show that for a $(2+\varepsilon)$ approximation, $o(n^{1/3})$ rounds suffice, while a $(2-\varepsilon)$ approximation requires $\tilde{\Omega}(n^{1/3})$ rounds, as stated above.

1.1.2 Fast Algorithms for the $\mathsf{CONGEST}$ Model

As a warm-up, we illustrate the power of our $\mathsf{AC(c)}$ model and its simulation in the $\mathsf{CONGEST}$ model, by showing how to simulate the $\mathsf{Congested\ Clique}$ model (a synchronous model where every two nodes can exchange messages of $O(\log n)$ bits in every round) in $\mathsf{AC(c)}$ and hence in $\mathsf{CONGEST}$. Simulation of algorithms from the $\mathsf{Congested\ Clique}$ model may both significantly simplify algorithm design as well as improve results in other distributed models, as seen in previous works [38, 20, 21, 22, 15, 13, 30, 49].

We get the following result, from which one can already show new algorithms which beat the state-of-the-art in the $\mathsf{CONGEST}$ model by simulating existing $\mathsf{Congested\ Clique}$ algorithms.

Theorem 1.3 ($\mathsf{Congested\ Clique}$ Simulation in $\mathsf{CONGEST}$).

Consider a graph $G$. Let $A$ be an algorithm which runs in the $\mathsf{Congested\ Clique}$ model over $G$ in $t$ rounds. If each node $v$ has $O(\deg{v}\cdot\log n)$ bits of input and output, then there is an algorithm which simulates $A$ in $(t\cdot n^{2}/m)\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}$ rounds of the $\mathsf{CONGEST}$ model over $G$, w.h.p.

Here, $\tau_{\text{mix}}$ is the mixing time of the graph, which is roughly the number of rounds required for a lazy random walk to reach the stationary distribution (a precise definition is not needed for understanding our results). A previous non-trivial simulation of the $\mathsf{Congested\ Clique}$ model in the $\mathsf{CONGEST}$ model is due to [38]. It emulates one $\mathsf{Congested\ Clique}$ round in $O(n\cdot\tau_{\text{mix}}\cdot(1+\frac{\Delta^{2}\log{n}\cdot\tau_{\text{mix}}}{n})\log{n}\log^{*}{n})$ rounds of the $\mathsf{CONGEST}$ model, where $\Delta$ is the maximum degree in the graph. Moreover, for certain graphs, this is improved to $\tilde{O}(n\cdot\tau_{\text{mix}})$. Thus, our simulation improves upon both previous algorithms for all graphs with $m=n\cdot 2^{\omega(\sqrt{\log n})}$ edges by a factor that is roughly proportional to the average degree, namely, faster by a factor of $m/(n\cdot 2^{\alpha\cdot\sqrt{\log n}})$ for some constant $\alpha\geq 1$. Intuitively, in [38], if every node in $\mathsf{CONGEST}$ desires to message every other node then this requires many rounds. We circumvent this by sending the input of low-degree nodes to high-degree ones, which then simulate the $\mathsf{Congested\ Clique}$ model and send back the output to the low-degree nodes.

Finally, such a simulation implies a relation between lower bounds in the $\mathsf{CONGEST}$ and $\mathsf{Congested\ Clique}$ models. Specifically, simulating one $\mathsf{Congested\ Clique}$ round in $T$ rounds of the $\mathsf{CONGEST}$ model shows that a lower bound of $R$ rounds in the $\mathsf{CONGEST}$ model implies a lower bound of $R/T$ rounds in the $\mathsf{Congested\ Clique}$ model. Due to [29, Theorem 4], we know that lower bounds for some problems in the $\mathsf{Congested\ Clique}$ model imply lower bounds in bounded-depth circuit complexity, and are therefore considered hard to obtain. Plugging our results from Theorem 1.3 into the $R/T$ lower bound for the $\mathsf{Congested\ Clique}$ model shows that if one constructs a family of graphs $\mathcal{G}$ with $m$ edges and $\tau_{\text{mix}}$ mixing time, for which solving some problem $P$ in the $\mathsf{CONGEST}$ model requires $R$ rounds, then $P$ has a lower bound of $R\cdot m/(n^{2}\cdot\tau_{\text{mix}}\cdot 2^{\alpha\sqrt{\log n}})$ rounds in the $\mathsf{Congested\ Clique}$ model for some constant $\alpha\geq 1$. This means that any value of $\tau_{\text{mix}}$ below $R\cdot m/(n^{2}\cdot 2^{\alpha\sqrt{\log n}})$ implies a lower bound that is considered hard in the $\mathsf{Congested\ Clique}$ model.

SSSP.

The current state-of-the-art exact SSSP algorithm in the $\mathsf{Congested\ Clique}$ model, due to [14], runs in $O(n^{1/6})$ rounds. Using our result from Theorem 1.3, a simulation of this algorithm in the $\mathsf{CONGEST}$ model runs in $(n^{13/6}/m)\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}$ rounds, w.h.p. We further improve this result by presenting a solution that is faster for any graph which is not extremely dense, namely, for which $m=o(n^{2})$. In a nutshell, our speed-up is due to the fact that the $\mathsf{Congested\ Clique}$ algorithm does not use all of the $\Omega(n^{2})$ bandwidth available to it in every round, and so it is inefficient to simulate directly in $\mathsf{CONGEST}$. Thus, we instead simulate our $\mathsf{AC(c)}$ model, which better reflects the complexity of such algorithms, giving us the following.

Theorem 1.4 (Exact SSSP in $\mathsf{CONGEST}$).

Given a weighted, undirected graph $G$ and a source node $s\in G$, there is an algorithm in the $\mathsf{CONGEST}$ model that ensures that every node $v\in G$ knows the value $d_{G}(s,v)$, within $(n^{7/6}/m^{1/2}+n^{2}/m)\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}$ rounds, w.h.p.

Consider graphs with a mixing time $\tau_{\text{mix}}=\tilde{O}(1)=\operatorname{polylog}(n)$. The diameter $D$ of such graphs is also $\tilde{O}(1)$. If such a graph has at least $m=n^{3/2}\cdot 2^{\omega(\sqrt{\log n})}$ edges, then our SSSP algorithm runs asymptotically faster than the state-of-the-art $\tilde{O}(n^{1/2}D^{1/4})$-round algorithm of [23].

APSP.

We now turn our attention to the APSP problem. As observed by Nanongkai [54, Observation 1.4], to solve APSP in the $\mathsf{CONGEST}$ model, a node $v$ is required to learn $\tilde{O}(n)$ bits of information. In the worst case, for a node with a constant degree, this takes $\tilde{\Omega}(n)$ rounds. For this reason, slightly modified requirements for the output have been considered.

We consider a shortest-path query problem, in which we separate the computation of shortest paths into two phases: one in which the input graph is pre-processed, and another in which a query set of pairs of nodes $Q\subseteq V\times V$ is given, and every node $v$ is required to learn the distance to every node $u$ such that $(v,u)\in Q$. The round complexity of this problem is thus bi-criteria, measured both in terms of pre-processing time and in terms of query time. We analyze the round complexity of the query in terms of the query load, where given a node $v$, $q_{v}=\left|\{u\ |\ (v,u)\in Q\lor(u,v)\in Q\}\right|$ denotes the number of queries which $v$ is a part of, and the total, normalized query load is $\ell=\max_{v\in V}q_{v}/\deg(v)$.
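To make the load definition concrete, the following minimal Python sketch (our own illustration; the dictionary-based representations of the query set and degrees are assumptions for the example) computes $q_{v}$ and $\ell$:

def query_load(queries, degree):
    # queries: iterable of (v, u) pairs from Q; degree: dict node -> deg(v).
    # q[v] counts the nodes u with (v, u) or (u, v) in Q.
    partners = {v: set() for v in degree}
    for v, u in queries:
        partners[v].add(u)
        partners[u].add(v)
    q = {v: len(p) for v, p in partners.items()}
    # normalized query load: max over v of q_v / deg(v)
    return q, max(q[v] / degree[v] for v in degree)

# Example: query_load({(1, 2), (3, 2)}, {1: 2, 2: 4, 3: 1}) returns
# ({1: 1, 2: 2, 3: 1}, 1.0), the load being attained at node 3.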

Theorem 1.5 ($(3+\varepsilon)$-Approximation for Shortest-Path Query in $\mathsf{CONGEST}$).

For any constant $0<\varepsilon<1$ and a weighted, undirected graph $G$ with $m$ edges, there is a $\mathsf{CONGEST}$ algorithm which, after $(n^{1/2}+n^{11/6}/m)\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}$ rounds of pre-processing, solves any instance of the $(3+\varepsilon)$-approximate shortest path query problem with a known load $\ell$, in $\ell\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}$ rounds, w.h.p.

Denoting by $\delta$ the minimal degree in the graph, one gets $\ell\leq n/\delta$ and $m\geq n\cdot\delta/2$, which implies Theorem 1.1 stated above.

Finally, we consider a version of APSP, which we call Scattered APSP, where the distance between every pair of nodes is known to some node, not necessarily the endpoints themselves. That is, we require that every node $u$ knows, for every node $v$, the identity of a node $w_{uv}$ which stores the distance $d_{G}(u,v)$.

Theorem 1.6 ($(3+\varepsilon)$-Approximation for Scattered APSP in $\mathsf{CONGEST}$).

There exists an algorithm in the $\mathsf{CONGEST}$ model, that given a weighted, undirected input graph $G=(V,E)$ with $n=|V|$ and $m=|E|$, and some constant $0<\varepsilon<1$, solves the $(3+\varepsilon)$-approximate Scattered APSP problem on $G$, within $((n^{11/6}/m+n^{1/3}+m/n)/\varepsilon+n^{1/2})\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}$ rounds (the typo note to Theorem 1.1 applies here as well), w.h.p.

Roadmap.

After a survey of additional related work, Section 2 provides required definitions. Additional preliminaries appear in Appendix A. Section 3 is dedicated to the definition of carrier configurations in the $\mathsf{AC(c)}$ model, and to a sample proof that gives a flavor of our techniques, namely, sparse matrix multiplication. The bulk of our infrastructure for $\mathsf{AC(c)}$ is deferred to Appendix B. Finally, Section 4 and Section 5 provide proofs of our end results in the $\mathsf{Hybrid}$ and $\mathsf{CONGEST}$ models, respectively. Both sections have some content deferred to Appendix C and Appendix D, respectively. We conclude with a discussion in Section 6.

1.2 Additional Related Work

$\mathsf{Hybrid}$.

There is growing interest in the recently introduced $\mathsf{Hybrid}$ network model [8], which is further studied by [49, 32, 18]. [8, 49, 18] consider the same values for the local bandwidth $\lambda$ and global bandwidth $\gamma$ as we do, and address mainly distance computations. The complexity of exact weighted APSP is $\tilde{\Theta}(n^{1/2})$ rounds, by combining the upper bound of [49] and the lower bound of [8]. The complexity of $3$-approximate weighted $n^{2/3}$-SSP is $\tilde{\Theta}(n^{1/3})$ rounds, by combining the upper bound of [18] and the lower bound of [49]. [8, 49] show how to simulate one round of the $\mathsf{Broadcast\ Congested\ Clique}$ (a synchronous distributed model where every node can broadcast a (same) $O(\log n)$-bit message to all nodes per round) and $\mathsf{Congested\ Clique}$ models on the skeleton graph with $\tilde{O}(n^{2/3})$ nodes in $\tilde{O}(n^{1/3})$ rounds of the $\mathsf{Hybrid}$ model, and obtain various distance related results using them. For example, [49] show $(7+\varepsilon)$ and $(3+\varepsilon)$ approximations for weighted $n^{x}$-SSP in $O(n^{1/3}/\varepsilon)$ and $\tilde{O}(n^{0.397}+n^{x/2})$ rounds w.h.p., respectively. [18] improve on some of those results by simulating ad-hoc models which exploit additional communication abilities of the $\mathsf{Hybrid}$ model. For instance, they show a $3$-approximation for weighted $n^{x}$-SSP in $O(n^{1/3}/\varepsilon)$ rounds w.h.p. [32] solve distance problems for sparse families of graphs using local congestion bounded by $\log n$. Another special case of the $\mathsf{Hybrid}$ model is the $\mathsf{Node\text{-}Capacitated\ Clique}$ model, which contains only global edges [6, 7].

$\mathsf{CONGEST}$.

Distance problems are extensively studied in the $\mathsf{CONGEST}$ model, being fundamental tasks. One of the important problems in the $\mathsf{CONGEST}$ model is $(1+\varepsilon)$-approximate SSSP [58, 11, 44]. The state-of-the-art randomized algorithm [11] solves it in $\tilde{O}((\sqrt{n}+D)\varepsilon^{-3})$ rounds w.h.p., even in the more restricted Broadcast $\mathsf{CONGEST}$ model. This is close to the $\tilde{\Omega}(\sqrt{n}+D)$ lower bound of [58]. The state-of-the-art deterministic algorithm [44] completes in $n^{1/2+o(1)}+D^{1+o(1)}$ rounds. The complexity of exact SSSP is still open, with algorithms given in [31, 39, 33, 23]. The current best known algorithm, due to [23], runs in $\tilde{O}(n^{1/2}D^{1/4}+D)$ rounds, w.h.p.

There is a lower bound of $\tilde{\Omega}(n)$ rounds for computing APSP [54], which was later tweaked to give an $\Omega(n)$ lower bound for the weighted case [17]. Over the years there has been much progress in understanding the complexity of this problem [57, 52, 51, 4, 31, 12, 2, 3]. For weighted exact APSP, the best known randomized algorithm [12] is optimal up to polylogarithmic terms. The best deterministic algorithm to date [3] completes in $\tilde{O}(n^{4/3})$ rounds. For unweighted exact APSP, $\tilde{O}(n)$-round algorithms are known [57, 52, 46].

To go below the $\Omega(n)$ lower bound, [50, 51] propose computing name-dependent routing tables and APSP. This means that the algorithm is allowed to choose small labels and output results that are consistent with those labels. Choosing the labels carefully thus overcomes the need to send too many bits of information to low-degree nodes. There is also a line of work which breaks below the lower bound for certain graph families [36, 37, 53, 55]. [57, 34, 1, 41, 5] study exact and approximate diameter, eccentricities, girth, and other problems.

Recently, [38, 40, 60] noticed that it is possible to develop faster $\mathsf{CONGEST}$ algorithms when the underlying graph has low mixing time. This was shown for MST [38], maximum flow, SSSP and transshipment [40], and frequency moments [60]. Algorithms for subgraph-freeness and related variants also enjoy fast computations on graphs with low mixing time [20, 21, 22, 15, 13, 30, 48].

$\mathsf{Congested\ Clique}$ and $\mathsf{Broadcast\ Congested\ Clique}$.

In the $\mathsf{Congested\ Clique}$ and $\mathsf{Broadcast\ Congested\ Clique}$ models, distance related problems such as APSP, $k$-SSP and diameter, and their approximations, are studied in [54, 16, 35, 19, 14, 28, 45, 11]. We mention that models related to the $\mathsf{Broadcast\ Congested\ Clique}$ and $\mathsf{Congested\ Clique}$ models have been studied in [9, 10], which explore the regime between broadcast and unicast.

2 Preliminaries

The following are some required definitions, while Appendix A contains additional definitions and basic claims. We begin with a variant of the $\mathsf{Hybrid}$ model, introduced in [8].

Definition 1 ($\mathsf{Hybrid}$ Model).

In the $\mathsf{Hybrid}$ model, a synchronous network of $n$ nodes with identifiers in $[n]$ is given by a graph $G=(V,E)$. In each round, every node can send and receive $\lambda$ messages of $O(\log n)$ bits to/from each of its neighbors (over local edges $E$), and an additional $\gamma$ messages in total, each of $O(\log n)$ bits, to/from any other nodes in the network (over global edges). If in some round more than $\gamma$ messages are sent via global edges to/from a node, only $\gamma$ messages, selected adversarially, are delivered.

All of [8, 49, 18] use $\lambda=\infty,\ \gamma=\log n$. To our knowledge, only [32] considers the more restrictive setting of $\lambda=1,\ \gamma=\log n$.
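For intuition, here is a minimal Python sketch of the communication constraints of a single $\mathsf{Hybrid}$ round (our own toy simulator, not from the paper; the adversarial drop rule is modeled by simply truncating the excess):

def hybrid_round(local_msgs, global_msgs, neighbors, lam, gamma):
    # local_msgs / global_msgs: lists of (sender, receiver, payload).
    delivered, edge_used, node_used = [], {}, {}
    for u, v, m in local_msgs:              # local edges: lam messages per edge
        assert v in neighbors[u]
        edge_used[(u, v)] = edge_used.get((u, v), 0) + 1
        if edge_used[(u, v)] <= lam:
            delivered.append((u, v, m))
    for u, v, m in global_msgs:             # global edges: gamma per node, counting
        if node_used.get(u, 0) < gamma and node_used.get(v, 0) < gamma:
            node_used[u] = node_used.get(u, 0) + 1
            node_used[v] = node_used.get(v, 0) + 1
            delivered.append((u, v, m))     # excess is dropped, as in the model
    return delivered

The setting used in this paper then corresponds to calling this with lam=float('inf') and gamma set to log n.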

We introduce the $\mathsf{Anonymous\ Capacitated}$ model with capacity $c$ (abbreviated $\mathsf{AC(c)}$), which we show is powerful for extracting core distributed principles. The model has two defining characteristics: anonymity and restricted message capacity.

Definition 2 (The $\mathsf{AC(c)}$ Model).

The $\mathsf{Anonymous\ Capacitated}$ model with capacity $c$ is a distributed synchronous communication model, over a graph $G=(V,E)$ with $n$ nodes, where each node $v$ can send and receive $c$ messages in every round, each of $O(\log n)$ bits, to and from any node in the graph. Nodes have identifiers in $[n]$; however, every node has a unique $O(\log n)$-bit communication token, initially known only to itself. Node $v$ can send each message either to some node $w$ whose communication token is already known to $v$, or to a node selected uniformly, independently at random from the entire graph.

$\mathsf{AC(c)}$ bears some similarity to the $\mathsf{NCC}_{0}$ model [6] with an empty initial knowledge graph. Unlike the $\mathsf{NCC}_{0}$ model, $\mathsf{AC(c)}$ has the additional capacity to send $c$ messages from each node and the ability to send messages to random nodes. Our motivation for defining $\mathsf{AC(c)}$ is to provide a setting which is strong enough to solve challenging problems, yet at the same time weak enough to be simulated efficiently in the settings of interest. We note the importance of having both identifiers and communication tokens. An identifier is chosen from a given, hardcoded set and thus can be used for assigning tasks to specific nodes. Communication tokens both assist in dealing with anonymity, and enhance the ability of the $\mathsf{AC(c)}$ model to be easily simulated in other distributed settings: a simulating algorithm can encode routing information of the underlying model in the tokens.
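The following Python sketch (a toy illustration with hypothetical class and field names) captures the $\mathsf{AC(c)}$ sending rule: a node addresses each message either to a communication token it has already learned, or to a uniformly random node:

import random

class ACNode:
    def __init__(self, ident, token, all_tokens):
        # all_tokens models the network side; a node never reads it directly.
        self.id, self.token = ident, token
        self.known = {token}                # initially, only its own token
        self._net = all_tokens
        self.outbox = []

    def send(self, payload, to_token=None):
        if to_token is not None:
            assert to_token in self.known   # may only address known tokens
            self.outbox.append((to_token, payload))
        else:                               # otherwise: a uniformly random node
            self.outbox.append((random.choice(self._net), payload))

A round scheduler would then deliver at most $c$ messages per node, in and out; a message typically carries the sender's token so that the receiver can reply.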

Many of our results hold for weighted graphs $G=(V,E,w)$, where $w\colon E\to\{1,2,\dots,W\}$ for a $W$ which is polynomial in $n$. Whenever we send an edge $e$ as part of a message, we assume $w(e)$ is sent as well. We assume that all graphs that we deal with are connected.

Given a graph $G=(V,E)$ and a pair of nodes $u,v\in V$, we denote by $hop(u,v)$ the hop distance between $u$ and $v$, by $N^{k}_{G}(v)$ a subset of the $k$ closest nodes to $v$ with ties broken arbitrarily, and by $d_{G}^{h}(u,v)$ the weight of the lightest path between $u$ and $v$ of at most $h$ hops, where if there is no such path then $d_{G}^{h}(u,v)=\infty$. In the special case of $h=\infty$, we denote by $d_{G}(u,v)$ the weight of a lightest path between $u$ and $v$. We also denote by $\deg_{G}(v)$ the degree of $v$ in $G$, and, in the directed case, $\deg_{G}^{in}(v),\ \deg_{G}^{out}(v)$ denote the in-degree and out-degree of $v$ in $G$, respectively. When it is clear from the context we sometimes drop the subscript $G$.
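For concreteness, $d_{G}^{h}$ is exactly what $h$ rounds of Bellman–Ford relaxation compute; a minimal centralized Python sketch (our own illustration, not the distributed algorithm) is:

import math

def h_hop_distances(n, edges, s, h):
    # d[v] = weight of the lightest s-v path with at most h hops (inf if none).
    d = [math.inf] * n
    d[s] = 0
    for _ in range(h):
        nd = d[:]                          # relax every edge once per hop
        for u, v, w in edges:
            nd[v] = min(nd[v], d[u] + w)
            nd[u] = min(nd[u], d[v] + w)
        d = nd
    return d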

Definition 3 ($k$-Source Shortest Paths ($k$-SSP)).

Given a graph $G=(V,E)$, in the $k$-source shortest paths problem, we are given a set $S\subseteq V$ of $k$ sources. Every $u\in V$ is required to learn the distance $d_{G}(u,s)$ for each source $s\in S$. The cases $k=1$ and $k=n$ are called the single source shortest paths problem (SSSP) and the all pairs shortest paths problem (APSP), respectively.

Definition 4 (Scattered APSP).

Given a graph $G=(V,E)$, in the Scattered APSP problem, for every pair of nodes $u,v\in V$, there exist nodes $w_{uv},w_{vu}\in V$ (potentially $w_{uv}=w_{vu}$), such that $w_{uv}$ and $w_{vu}$ know $d_{G}(u,v)$, $u$ knows the identifier of $w_{uv}$, and $v$ knows the identifier of $w_{vu}$.

In the approximate versions of these problems, each $u\in V$ is required to learn an $(\alpha,\beta)$-approximate distance $\widetilde{d}(u,v)$ which satisfies $d(u,v)\leq\widetilde{d}(u,v)\leq\alpha\cdot d(u,v)+\beta$; in case $\beta=0$, $\widetilde{d}(u,v)$ is called an $\alpha$-approximate distance.

Definition 5 (Diameter).

Given a graph $G=(V,E)$, the diameter $D=\max_{u,v\in V}\{d(u,v)\}$ is the maximum distance in the graph. An $\alpha$-approximation $\widetilde{D}$ of the diameter satisfies $D/\alpha\leq\widetilde{D}\leq D$.

3 The $\mathsf{Anonymous\ Capacitated}$ Model

The value in defining the $\mathsf{AC(c)}$ model lies in the power it gains from our ability to efficiently simulate it in the $\mathsf{Hybrid}$ and $\mathsf{CONGEST}$ models. Applications of this strength are exemplified by improved algorithms for distance computation problems in these models.

To this end, we design fast algorithms in the $\mathsf{AC(c)}$ model for the useful tools of sparse matrix multiplication and hopset construction (Section 3.2). Such algorithms already exist in all-to-all communication models, such as the $\mathsf{Congested\ Clique}$ [14, 28]. However, fundamental load balancing and synchronization steps that are simple to implement when assuming a bandwidth of $\Theta(n^{2})$ messages per round, as in the $\mathsf{Congested\ Clique}$ model, pose formidable challenges when nodes cannot receive $\Theta(n)$ messages each. For instance, when multiplying two matrices, while the number of finite elements in row $v$ of both input matrices might be small, row $v$ of the output matrix might be very dense, so that node $v$ would not be able to learn all of this information. This even implies that it is not always the case that every node can know all the edges incident to it in some overlay graph (e.g., a hopset). The crux of the way we overcome these challenges is in introducing the carrier configuration distributed data structure, which performs automatic load balancing (Section 3.1). Missing proofs are deferred to Appendix B.

3.1 Carrier Configurations

A carrier configuration is a distributed data structure for holding graphs and matrices, whose main objective is to provide a unified framework for load balancing in situations where substantial amounts of data need to be transferred. The key is that when using the carrier configuration data structure, an algorithm does not need to address many load balancing issues, as those are dealt with under the hood by the data structure itself. Therefore, this data structure allows us to focus on the core concepts of each algorithm and abstract away challenges that arise due to skewed inputs.

The data structure crucially enables our algorithms to enjoy sparsity awareness, by yielding complexities that depend on the average degree in an input graph rather than its maximal degree. This allows one to eventually deal with data which is highly skewed and which would otherwise cause a slow-down by having some nodes send and receive significantly more messages than others.

We stress that the manner in which we implement carrier configurations is inherently distributed in nature. In many cases, when two nodes $u,v$ store data $D(u),D(v)$, respectively, in a carrier configuration, the data is dispersed among many other nodes, and when operations are performed using both $D(u)$ and $D(v)$, the nodes which store the data perform direct communication between themselves, without necessarily involving $u$ or $v$.

In more detail, the carrier configuration data structure is based on three key parts. (1) Carrier Nodes: every node $v$ gets a set $C_{v}$ of carrier nodes, where $|C_{v}|=\Theta(\deg(v)/k)$ and $k$ is the average degree in the graph, which help $v$ carry information and store its input edges in a distributed fashion. A key insight is that it is possible to create such an assignment of carrier nodes while maintaining that each node is not a carrier for too many other nodes, thus avoiding congestion. (2) Ordered Data Storage: the data of $v$ is stored in a sorted fashion across $C_{v}$ in order to enable its efficient usage. In particular, it takes $O(\log n)$ bits to describe which ranges of data are stored in a specific carrier node. (3) Communication Structure: the nodes $C_{v}\cup\{v\}$ are connected using a communication tree (Definition 11), which enables fast broadcasting and aggregation of information between $v$ and $C_{v}$.
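As a toy illustration of part (1), the following Python sketch (our own simplification; the actual construction is distributed and is given in Appendix B) assigns each node $\lceil\deg(v)/k\rceil$ carriers round-robin, so that $\sum_{v}|C_{v}|=O(n)$ and no node carries for too many others:

def assign_carriers(degree):
    # degree: dict node -> deg(v); k is the average degree.
    nodes = sorted(degree)
    k = max(1, sum(degree.values()) // len(nodes))
    carriers, ptr = {}, 0
    for v in nodes:
        need = -(-degree[v] // k)          # ceil(deg(v) / k) carriers for v
        carriers[v] = [nodes[(ptr + i) % len(nodes)] for i in range(need)]
        ptr += need                        # round-robin keeps the per-node carrier load O(1) on average
    return carriers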

A formal definition of the carrier configuration data structure (Definition 12), as well as an extended $\mathsf{AC(c)}$-specific definition (Definition 13), are given in Section B.2.

Carrier Configuration Toolbox. Typically, data is converted from a classical storage setting (every node knows the edges incident to it) into being stored in a carrier configuration, and then operations are applied to change the state of the configuration. Thus, we show in the $\mathsf{AC(c)}$ model how to convert a classical graph representation to a carrier configuration. Then, we provide basic tools, e.g., given two matrices held in carrier configurations, produce a third configuration holding the matrix resulting from computing the point-wise minimum of the two input matrices. The descriptions and implementations of these are deferred to Section B.2.

3.2 Sparsity Aware Distance Computations

In order to give a taste of the type of algorithms which we construct in the $\mathsf{AC(c)}$ model, we present an outline of our sparse matrix multiplication algorithm.

We build a foundation in the $\mathsf{AC(c)}$ model which enables us to eventually implement sparse matrix multiplication and hopset construction algorithms, inspired by the $\mathsf{Congested\ Clique}$ algorithms of [14] (we note that while some results of [14] were improved in [28], the improvements focus on reducing the number of synchronous rounds of the algorithm, yet do not decrease the message complexity in a way that assists us), in order to solve distance related problems. Ultimately, compared to [14], our main contribution is significantly reducing the message complexity overhead of the load balancing mechanisms. In the $\mathsf{Congested\ Clique}$ implementation, various load balancing steps require $\Theta(n^{2})$ messages, which is trivial in the $\mathsf{Congested\ Clique}$ model, yet is highly problematic in the $\mathsf{AC(c)}$ model (and in the $\mathsf{CONGEST}$ and $\mathsf{Hybrid}$ models which ultimately simulate it). Interestingly, for the sparsity relevant to distance computations, the sparse matrix multiplication itself requires $o(n^{2})$ messages for the actual transfer of matrix elements, and in the $\mathsf{Congested\ Clique}$ algorithm, the message complexity is dominated by the $\Theta(n^{2})$ overhead. We reduce this overhead significantly, such that the message complexity of the algorithm is dominated by the messages which actually transfer matrix elements.

We show how to perform sparse matrix multiplication in the $\mathsf{AC(c)}$ model. To simplify our proofs, we assume that the two input matrices and the output matrix have the same average number of finite elements per row. We note that it is possible to use the same ideas we show here in order to prove the general case where the matrices have different densities.

Theorem 3.1 (Sparse Matrix Multiplication).

Given two $n\times n$ input matrices $S,T$, both with an average number of finite elements per row of at most $k$, and stored in carrier configurations $A$ and $B$, it is possible to output the matrix $\hat{P}$, in a carrier configuration $C$, where $\hat{P}$ is $P=S\cdot T$ with only the smallest $k$ elements of each row computed, breaking ties arbitrarily. This takes $\tilde{O}(k\cdot n^{1/3}/c+n/(k\cdot c)+1)$ rounds in the $\mathsf{AC(c)}$ model, w.h.p.

Proof of Theorem 3.1.

Throughout the algorithm, we assume that every piece of data sent directly to a node is also known by all its carrier nodes. This is possible with only an $\tilde{O}(1)$ multiplicative factor in the round complexity, due to Lemma B.14 (Carriers Broadcast and Aggregate). Further, as the input matrices are stored in carrier configurations, due to Item 4, every entry $S[i][j]$ is stored in $A$ alongside the communication tokens of both $i$ and $j$ (likewise for entries of $T$ stored in $B$). Thus, whenever a value from one of the input matrices is sent in the algorithm, we assume that it is sent alongside these communication tokens.

Denote by $\Delta$ the maximal number of finite elements in a row of $S$ or $T$. In the proof below, we show a round complexity of $\tilde{O}(k\cdot n^{1/3}/c+n/(k\cdot c)+\Delta/(n^{1/3}\cdot c)+n^{2/3}/c+1)$, which is bounded by the claimed round complexity of $\tilde{O}(k\cdot n^{1/3}/c+n/(k\cdot c)+1)$. (To see this, note that: 1) it always holds that $\Delta\leq n$, and so $\Delta/(n^{1/3}\cdot c)=O(n^{2/3}/c)$; and 2) if $k<n^{1/3}$, then $n^{2/3}/c=O(n/(k\cdot c))$, while if $k\geq n^{1/3}$, then $n^{2/3}/c=O(k\cdot n^{1/3}/c)$.)

Throughout the proof, we construct various sets of nodes (for instance, $V_{i}$) where every node in the set knows the communication tokens and identifiers of all the other nodes in the set ($Tokens(V_{i})$), and thus we assume that the nodes in each set locally compute some arbitrary node (some fixed $v_{i}\in V_{i}$) which is the leader of the set.

Step: Partitioning the Input – Sets $V_{i}$
Denote by $\deg_{S}(v),\ \deg_{T}(v)$ the number of finite elements in row $v$ of $S$, and in column $v$ of $T$, respectively.

Denote by $W_{1},\dots,W_{n^{1/3}/2}$ a hardcoded partition of $V$ into equally sized sets. Using Lemma B.1 (Routing) and Lemma B.7 (Grouping), every $v\in W_{i}$ broadcasts $\deg_{S}(v),\ \deg_{T}(v)$ to $W_{i}$, within $O(n^{2/3}/c+1)$ rounds. The nodes in $W_{i}$ locally partition $W_{i}$ into $W_{i,1},\dots,W_{i,j_{i}}$, for some $j_{i}$, where for each $r\in[j_{i}]$, $\sum_{v\in W_{i,r}}\deg_{S}(v)+\deg_{T}(v)\leq 4k\cdot n^{2/3}+4\Delta$. Since $\sum_{v\in V}\deg_{S}(v)+\deg_{T}(v)\leq 2nk$, there exists a way to create this partitioning such that $\sum_{i\in[n^{1/3}]}j_{i}\leq n^{1/3}$.

From here on, we refer to the sets $W_{i,r}$ as $V_{j}$. We denote by $S[V_{j}]$ the rows of $S$ corresponding to the nodes $V_{j}$, and by $T[V_{j}]$ the columns of $T$ corresponding to the nodes $V_{j}$. As $\sum_{i\in[n^{1/3}]}j_{i}\leq n^{1/3}$, there are at most $n^{1/3}$ sets $V_{i}$: the sets $V_{1},\dots,V_{n^{1/3}}$. For each set $V_{i}$, an arbitrary $v_{i}\in V_{i}$ is designated as its leader. The identifiers and communication tokens of all the leaders are broadcast using Lemma B.5 (Broadcasting), within $\tilde{O}(n^{1/3}/c+1)$ rounds.

We partition the information held by the sets $V_{i}$. We compute $C_{i}=\{0=c_{0},c_{1},\dots,c_{n^{1/3}/4-1},c_{n^{1/3}/4}=n\}$, where the total number of finite entries in $S[V_{i}][c_{j}+1:c_{j+1}]$ (the notation $S[X][a:b]$ denotes all rows of $S$ with indices in $X$, from column $a$ to column $b$) is at most $16k\cdot n^{1/3}+16\Delta/n^{1/3}+|V_{i}|\leq 16k\cdot n^{1/3}+16\Delta/n^{1/3}+2n^{2/3}$. Recall that for each $V_{i}$, it holds that $\sum_{v\in V_{i}}\deg_{S}(v)+\deg_{T}(v)\leq 4k\cdot n^{2/3}+4\Delta$, and also $|V_{i}|\leq 2n^{2/3}$, implying the existence of such a set $C_{i}$.

The nodes in $V_{i}$ compute the values in $C_{i}$ using binary search. To see this, given any $p_{1},\dots,p_{c}$, the number of finite entries in each of $S[V_{i}][0:p_{1}],\dots,S[V_{i}][0:p_{c}]$ can be computed by the nodes in $V_{i}$ in $\tilde{O}(1)$ rounds, using Corollary B.9 (Group Broadcasting and Aggregating). Thus, each $c_{j}\in C_{i}$ can be found using binary search, with $c$ binary searches run in parallel in $\tilde{O}(1)$ rounds. In total, $\tilde{O}(|C_{i}|/c+1)=\tilde{O}(n^{1/3}/c+1)$ rounds are required to compute $C_{i}$.
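A centralized sketch of this cutoff computation (our own illustration; the sorted list of column indices stands in for the distributed prefix counts that Corollary B.9 aggregates):

def column_cutoffs(finite_cols, n, parts):
    # finite_cols: sorted column indices (with multiplicity) of the finite
    # entries of S[V_i]. Returns 0 = c_0 <= ... <= c_parts = n such that each
    # slice (c_j, c_{j+1}] holds about total/parts entries; a column shared by
    # many rows may overshoot, mirroring the additive slack in the proof.
    total = len(finite_cols)
    per_slice = -(-total // parts)         # ceil(total / parts)
    cutoffs = [0]
    for j in range(1, parts):
        idx = min(j * per_slice, total) - 1
        cutoffs.append(max(finite_cols[idx], cutoffs[-1]) if idx >= 0 else 0)
    cutoffs.append(n)
    return cutoffs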

Step: Creating Intermediate Representations – Sets $U_{i,j}$, $P_{i,j,\ell}$ Matrices
Let $U_{i,j}$, for $i,j\in[n^{1/3}]$, be a hard-coded partition of $V$ into equally sized sets. The goal of this step is for the nodes $U_{i,j}$ to compute an intermediate representation of the product $S[V_{i}]\times T[V_{j}]$. Therefore, we desire that each $V_{i}$ sends all its data to $U_{i,j},U_{j,i}$, for each $j\in[n^{1/3}]$, in some load-balanced manner. We show how, for each $i\in[n^{1/3}]$, $V_{i}$ sends $S[V_{i}]$ to $U_{i,j}$, for each $j\in[n^{1/3}]$, and in a symmetric way (with matrix $T$ instead of $S$) this can be done for $U_{j,i}$.

In $\tilde{O}(n^{2/3}/c+1)$ rounds, allow communication within each $U_{i,j}$ by invoking Lemma B.7 (Grouping). Let some $u_{i,j}\in U_{i,j}$ be denoted leader, and broadcast the leaders of all the sets using Lemma B.5 (Broadcasting) within $\tilde{O}(n^{2/3}/c+1)$ rounds. Node $u_{i,j}$ sends to both $v_{i},v_{j}$ all of $Tokens(U_{i,j})$. Each node $v_{i}$ receives $n^{2/3}$ messages, thus this takes $\tilde{O}(n^{2/3}/c+1)$ rounds using Lemma B.1 (Routing). Now, each node $v_{i}$ broadcasts to $V_{i}$ all the tokens it received, using Corollary B.9 (Group Broadcasting and Aggregating), within $\tilde{O}(n^{2/3}/c+1)$ rounds.

Leader $v_{i}$ sends the contents of $C_{i}$ to all the leader nodes $u_{i,j},u_{j,i}$, for any $j\in[n^{1/3}]$, in $\tilde{O}(n^{2/3}/c+1)$ rounds using Lemma B.1 (Routing). Each $u_{i,j}$ broadcasts to $U_{i,j}$ the contents of $C_{i}$ and $C_{j}$, within $\tilde{O}(n^{1/3}/c+1)$ rounds using Corollary B.9 (Group Broadcasting and Aggregating).

We send information from $V_{i}$ to $U_{i,j}$. The $\ell$-th node in $U_{i,j}$ learns the finite elements in $S[V_{i}][c_{\ell}+1:c_{\ell+1}]$. By definition of $C_{i}$, for any $\ell$, the number of finite elements in $S[V_{i}][c_{\ell}+1:c_{\ell+1}]$ is at most $16k\cdot n^{1/3}+16\Delta/n^{1/3}+2n^{2/3}$, bounding the number of messages each node desires to receive. Every finite element held by a node $v\in V_{i}$ needs to be sent to $O(n^{1/3})$ nodes in the graph, in total. Further, since the finite elements of $v$ are stored in its carrier nodes, and each carrier node of $v$ stores $O(k)$ elements, each node sends at most a total of $O(k\cdot n^{1/3})$ messages. Thus, this routing can be accomplished in $\tilde{O}(k\cdot n^{1/3}/c+\Delta/(n^{1/3}\cdot c)+n^{2/3}/c+1)$ rounds, using Lemma B.1 (Routing). (To perform the routing, it is required that the carrier nodes of $v\in V_{i}$ know the communication tokens of the nodes in $U_{i,j}$ which receive the messages. All the nodes in $V_{i}$ received $Tokens(U_{i,j})$, and so $v$ knows the required tokens. At the start of the proof, we stated that we assume that every message a node receives is also broadcast from it to its carriers, and so the carriers of $v$ also know the required communication tokens.)

We shuffle data within $U_{i,j}$. Recall that $|U_{i,j}|=n^{1/3}$. The first $|C_{i}|=|C_{j}|=n^{1/3}/4$ nodes of $U_{i,j}$ received data from $S[V_{i}],\ T[V_{j}]$, according to $C_{i},C_{j}$. Denote $C_{i,j}=C_{i}\cup C_{j}$. We desire that for each interval $[c_{\ell}+1,c_{\ell+1}]$, for $c_{\ell}\in C_{i,j}$, node $\ell\in U_{i,j}$ knows all the elements $S[V_{i}][c_{\ell}+1,c_{\ell+1}]$ and $T[c_{\ell}+1,c_{\ell+1}][V_{j}]$. To do so, notice that $S[V_{i}][c_{\ell}+1,c_{\ell+1}]$ (likewise with $T$) is fully contained in the data which some node $q\in U_{i,j}$, where $q\in[n^{1/3}/4]$, already knows. Thus, each node $\ell\in[n^{1/3}/4]$ of $U_{i,j}$ denotes by $val(\ell)$ how many other nodes in $U_{i,j}$ are reliant on the data which it itself received from $V_{i}$ (this can be done since all the nodes in $U_{i,j}$ know all of $C_{i,j}$). Then, we invoke Lemma B.10 (Group Multicasting) in $\tilde{O}(k\cdot n^{1/3}/c+\Delta/(n^{1/3}\cdot c)+n^{2/3}/c+1)$ rounds in order to route the required information. Finally, each node $\ell\in[n^{1/3}/2]$ of $U_{i,j}$ knows $S[V_{i}][c_{\ell}+1,c_{\ell+1}]$ and $T[c_{\ell}+1,c_{\ell+1}][V_{j}]$, denoted $S_{i,j,\ell},T_{i,j,\ell}$, respectively, and can compute the product $P_{i,j,\ell}=S_{i,j,\ell}\times T_{i,j,\ell}$.

We are at a state where for each $U_{i,j}$, every $\ell\in[n^{1/3}/2]$ of $U_{i,j}$ knows some matrix $P_{i,j,\ell}$ such that $S[V_{i}]T[V_{j}]=\sum_{\ell\in[n^{1/3}/2]}P_{i,j,\ell}$.
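This is the standard block decomposition of the product along the column intervals $I_{\ell}=[c_{\ell}+1,c_{\ell+1}]$ induced by $C_{i,j}$: since the intervals partition the inner dimension $[n]$,

$$S[V_{i}]\cdot T[V_{j}]=\sum_{\ell\in[n^{1/3}/2]}S[V_{i}][\cdot,I_{\ell}]\cdot T[I_{\ell},\cdot][V_{j}]=\sum_{\ell\in[n^{1/3}/2]}S_{i,j,\ell}\cdot T_{i,j,\ell}=\sum_{\ell\in[n^{1/3}/2]}P_{i,j,\ell},$$

where each entry of the product accumulates its inner-index contributions exactly once across the intervals (for the min-plus distance product, the outer sum is read as a pointwise minimum).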

Step: Sparsification – $\hat{P}_{i,j,\ell}$ Matrices
We sparsify the $P_{i,j,\ell}$ matrices. Recall that we desire to output $\hat{P}$, which is $P=S\times T$ with only the $k$ smallest entries in each row. Fix $i\in[n^{1/3}]$, $\ell\in[n^{1/3}/2]$, and denote by $Q_{i,\ell}$ the matrix of size $|V_{i}|\times n$ created by concatenating $P_{i,0,\ell},\dots,P_{i,n^{1/3},\ell}$. As shown in [14], we are allowed to keep only the $k$ smallest entries in each row of $Q_{i,\ell}$ without damaging the final output $\hat{P}$. That is, throwing out elements at this stage is guaranteed to throw out only elements which are in $P$ and not in $\hat{P}$.

Fix $i\in[n^{1/3}],\ell\in[n^{1/3}/2]$, and denote by $N_{i,\ell}$ the nodes numbered $\ell$ in each $U_{i,j}$. The nodes in $N_{i,\ell}$ perform a binary search for each row of $Q_{i,\ell}$, to determine the cutoff for the $k^{th}$ smallest element in that row. The leaders $u_{i,0},\dots,u_{i,n^{1/3}}$ broadcast to each other $Tokens(U_{i,0}),\dots,Tokens(U_{i,n^{1/3}})$, using Lemma B.1 (Routing), taking $O(n^{2/3}/c+1)$ rounds. Each leader $u_{i,j}$ broadcasts the $O(n^{2/3})$ tokens it received to $U_{i,j}$, using Corollary B.9 (Group Broadcasting and Aggregating), within $\tilde{O}(n^{2/3}/c+1)$ rounds. Now, the nodes $N_{i,\ell}$ all know $Tokens(N_{i,\ell})$.

The nodes $N_{i,\ell}$ proceed in $\tilde{O}(1)$ iterations to perform concurrent binary searches on the threshold values for each row in $Q_{i,\ell}$. Let $n_{i,\ell}$ be an arbitrarily chosen leader node for $N_{i,\ell}$. In every iteration, node $n_{i,\ell}$ broadcasts $p_{1},\dots,p_{|V_{i}|}$ values, each a tentative threshold value for a row in $Q_{i,\ell}$, using Corollary B.9 (Group Broadcasting and Aggregating), and in response aggregates from the nodes of $N_{i,\ell}$, using Corollary B.9, the total number of entries in each row of $Q_{i,\ell}$ below, equal to, and above the queried threshold. Each such iteration takes $\tilde{O}(|V_{i}|/c+1)=\tilde{O}(n^{2/3}/c+1)$ rounds, and after $\tilde{O}(1)$ iterations of this procedure, all the nodes in $N_{i,\ell}$ know a threshold for every row they possess, informing them which values in $Q_{i,\ell}$ can be thrown out.

Define the matrices $\hat{P}_{i,j,\ell}$ by removing from $P_{i,j,\ell}$ the entries which are thrown away due to the thresholds.

Step: Balancing the Intermediate Representation 
The nodes have computed the matrices $\hat{P}_{i,j,\ell}$, yet some matrices $\hat{P}_{i,j,\ell}$ may be too dense to transport out of the nodes which locally hold them. Even though we sparsified the matrices above, the sparsification steps performed were over several $\hat{P}_{i,j,\ell}$ matrices at once, and thus we can still have single $\hat{P}_{i,j,\ell}$ matrices which remain very dense. Let $\hat{P}_{x,y,z}$ be such a very dense matrix. We overcome this challenge by having more nodes compute $\hat{P}_{x,y,z}$ from scratch, allowing each node to take responsibility for retaining only a part of $\hat{P}_{x,y,z}$.

For each $i\in[n^{1/3}]$, we pool the nodes $U_{i}=\bigcup_{j\in[n^{1/3}]}U_{i,j}$ and redistribute them such that areas in the matrix which are too dense get more nodes.

Each node $\ell\in[n^{1/3}/2]$ of $U_{i,j}$ computes the number of finite values in $\hat{P}_{i,j,\ell}$, denoted $p_{i,j,\ell}$. Node $u_{i,j}$ computes $p_{i,j}=\sum_{\ell\in[n^{1/3}/2]}p_{i,j,\ell}$ using aggregation on $U_{i,j}$ within $\tilde{O}(n^{1/3}/c+1)$ rounds, due to Corollary B.9 (Group Broadcasting and Aggregating). Finally, $u_{i,j}$ broadcasts to the other leader nodes $u_{i,0},\dots,u_{i,n^{1/3}}$ the values $p_{i,j,0},\dots,p_{i,j,n^{1/3}/2}$, and then broadcasts to $U_{i,j}$ all the $O(n^{2/3})$ values that it received, taking $O(n^{2/3}/c+1)$ rounds due to Lemma B.1 (Routing) and Corollary B.9 (Group Broadcasting and Aggregating).

Due to the fact that each row of $\hat{P}$ has at most $k$ elements in total, and currently $\hat{P}$ is distributed across $n^{1/3}/2$ matrices which need to be summed, all the nodes $U_{i}$ hold at most $k\cdot|V_{i}|\cdot n^{1/3}/2\leq k\cdot n$ elements. That is, the sum $p_{i}=\sum_{j\in[n^{1/3}]}p_{i,j}$ is at most $k\cdot n$. Each node $\ell\in[n^{1/3}/2]$ of $U_{i,j}$ observes $\hat{P}_{i,j,\ell}$ and locally breaks it up into $t(i,j,\ell)=\lceil p_{i,j,\ell}/(2k\cdot n^{1/3})\rceil$ matrices $\hat{P}_{i,j,\ell,1},\dots,\hat{P}_{i,j,\ell,t(i,j,\ell)}$ which sum up to $\hat{P}_{i,j,\ell}$ and each have at most $2k\cdot n^{1/3}$ finite elements. This creates at most $n^{2/3}$ such matrices. That is, $\sum_{j\in[n^{1/3}],\ell\in[n^{1/3}/2]}t(i,j,\ell)\leq\sum_{j\in[n^{1/3}],\ell\in[n^{1/3}/2]}\lceil p_{i,j,\ell}/(2k\cdot n^{1/3})\rceil\leq n^{2/3}$. Thus, the total number of intermediate matrices, spread over the nodes $U_{i}$, is at most $n^{2/3}$. Each node $\ell\in[n^{1/3}/2]$ of $U_{i,j}$ is allocated $t(i,j,\ell)-1$ other nodes from $U_{i}$, called auxiliary nodes, in order to send them the matrices $\hat{P}_{i,j,\ell,2},\dots,\hat{P}_{i,j,\ell,t(i,j,\ell)}$. Notice that since all nodes in $U_{i}$ know all the values $p_{i,j,\ell}$, all the nodes can locally know which node in $U_{i}$ is allocated to help which other node in $U_{i}$. However, node $\ell$ cannot send all of this information to all its auxiliary nodes; instead, it sends to each of its auxiliary nodes the data it received from $S$ and $T$ with which it computed the matrix $P_{i,j,\ell}$, and the $O(n^{2/3})$ thresholds which it used to turn $P_{i,j,\ell}$ into $\hat{P}_{i,j,\ell}$. This can be accomplished via Lemma B.10 (Group Multicasting) within $\tilde{O}(k\cdot n^{1/3}/c+n^{2/3}/c+1)$ rounds, since each node wishes to multicast at most $\tilde{O}(k\cdot n^{1/3})$ data (as this is the bound on the total data each node received from $S$ and $T$), to at most $|U_{i}|=O(n^{2/3})$ other nodes, and each node desires to receive multicast messages from only one other node.
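The local break-up rule is simple; a minimal Python sketch over entry lists (our own illustration, with a pointwise-min semantics in mind for the eventual summation):

def break_up(entries, bound):
    # entries: the finite entries (row, col, val) of one matrix hat-P_{i,j,l};
    # returns t = ceil(len(entries)/bound) chunks of at most `bound` entries
    # each, which together recover the original matrix.
    if not entries:
        return [[]]
    return [entries[s:s + bound] for s in range(0, len(entries), bound)]

# e.g., with bound = 2 * k * round(n ** (1 / 3)), node l keeps the first
# chunk and ships the remaining t - 1 chunks to its auxiliary nodes.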

Step: Summation 
We send data from $U_{i}$ to $V_{i}$ to perform the final summation step. Node $v\in V_{i}$ learns all of the information held by the nodes in $U_{i}$ which pertains to $\hat{P}[v]$. All the nodes in $U_{i}$ know $Tokens(V_{i})$, and vice versa, and can thus communicate. Each node in $V_{i}$ wishes to receive $O(k\cdot n^{1/3})$ data from $U_{i}$, and likewise, every node in $U_{i}$ wishes to send $O(k\cdot n^{1/3})$ data. Thus, this communication is executed using Lemma B.1 (Routing) in $\tilde{O}(k\cdot n^{1/3}/c+1)$ rounds. Upon receiving the data, each node $v\in V_{i}$ computes the (at most) $k$ smallest entries of $\hat{P}[v]$. Now, $\hat{P}$ is stored in a partial carrier configuration (with every node $v$ having $C_{v}^{out}=\{v\}$), and thus we invoke Lemma B.18 (Partial Configuration Completion), within $\tilde{O}(\sqrt{nk}/c+n/(k\cdot c)+1)$ rounds, to ensure that $\hat{P}$ is stored in a carrier configuration $C$. Notice that $\tilde{O}(\sqrt{nk}/c+1)=\tilde{O}(k\cdot n^{1/3}/c+n^{2/3}/c+1)$, and so we are within the stated round complexity. ∎

4 Breaking Below $o(n^{1/3})$ in $\mathsf{Hybrid}$

We show a $(1+\varepsilon)$ approximation for weighted SSSP in the $\mathsf{Hybrid}$ model within $\tilde{O}(n^{5/17}/\varepsilon^{9})$ rounds, further implying that it is possible to compute a $(2+\varepsilon)$ approximation for the weighted diameter in this number of rounds. We achieve this by combining a simulation of our $\mathsf{AC(c)}$ model and of the $\mathsf{Broadcast\ Congested\ Clique}$ model. Roughly speaking, this incorporates density awareness in addition to the sparsity awareness discussed thus far.

A key approach in previous distance computations in the $\mathsf{Hybrid}$ model [8, 49, 18] is to construct an overlay skeleton graph, and show that solving distance problems on such skeleton graphs can be extended to solutions on the entire graph. In a nutshell, given a graph $G=(V,E)$ and some constant $0<x<1$, a skeleton graph $S_{x}=(M,E_{S})$ is generated by letting every node in $V$ independently join $M$ with probability $n^{x-1}$. Two skeleton nodes in $M$ have an edge in $E_{S}$ if there exists a path between them in $G$ of at most $\tilde{O}(n^{1-x})$ hops. In particular, the nodes of $S_{x}$ are well spaced in the graph and satisfy a variety of useful properties. The central distance related property is that every pair of far enough nodes in $G$ has skeleton nodes at predictable intervals on some shortest path between them.

Given such a skeleton graph $S_x=(M,E_S)$ with $|M|=\Theta(n^x)$, previous work showed that it is possible in the $\mathsf{Hybrid}$ model to let the nodes in $M$ take control over the other nodes in the graph and use all the global bandwidth available to perform messaging between nodes in $M$. In essence, after $\tilde{\Theta}(n^{1-x})$ pre-processing rounds, the $\tilde{\Theta}(n)$ global bandwidth available to the entire graph in each round of the $\mathsf{Hybrid}$ model is utilized such that every node in $M$ can send and receive $\tilde{\Theta}(n/|M|)=\tilde{\Theta}(n^{1-x})$ messages per round from any other node in $M$, in an amortized manner (that is, in $\tilde{\Theta}(n^{1-x})$ rounds, each node in $M$ can send and receive $\tilde{\Theta}(n^{2-2x})$ messages to other nodes in $M$). For a given $v\in M$, we denote by $H_v$ the set of helper nodes of $v$, which contribute their global communication capacity to $v$; it is guaranteed that all nodes in $H_v$ are at most $\tilde{O}(n^{1-x})$ hops away from $v$ in $G$.

A formal definition of skeleton graphs encapsulating all of the above is given in Definition 20 in Section C.1.2. These skeleton graphs are built upon nodes randomly selected from the input graph $G$, and thus the number of edges in $S_x$ correlates with the density of neighborhoods in $G$ – the graph $S_x$ is either sparse or dense, depending on $G$. We split into cases according to the sparsity of $S_x$, which can be computed using known techniques from the $\mathsf{LOCAL}$ and $\mathsf{Node\text{-}Capacitated\ Clique}$ models.

Sparse $S_x$: A hurdle that stands in the way of going below $o(n^{1/3})$ rounds is that one must choose $x>2/3$ in order for the $\tilde{\Theta}(n^{1-x})$-round pre-processing step to not exceed the goal complexity. However, in order to use the previously shown routing techniques, the identifiers of the nodes in $M$ must be globally known, a task which can be shown to take $\tilde{\Omega}(n^{x/2})=\omega(n^{1/3})$ rounds. This leads to an anonymity issue – letting nodes in $M$ communicate with one another although no node in the graph knows the identifiers of all the nodes in $M$.

We overcome this anonymity problem by showing a routing algorithm which allows messaging over $S_x$ without assuming that the identifiers of the nodes in $M$ are globally known. This allows us to simulate the $\mathsf{AC(c)}$ model over $S_x$. By simulating algorithms from the $\mathsf{AC(c)}$ model on $S_x$, we directly get a $(1+\varepsilon)$-approximation for SSSP in $o(n^{1/3})$ rounds, with the exact round complexity depending on the sparsity of $S_x$. However, as $S_x$ approaches having $|M|^2=\Theta(n^{2x})$ edges, the round complexity of the simulated $\mathsf{AC(c)}$ model algorithms approaches $\tilde{\Theta}(n^{1/3})$. As such, using the techniques so far, we solve all cases except for very dense skeleton graphs $S_x$.

Dense $S_x$: To tackle a dense $S_x$, we present a density-aware simulation of the $\mathsf{BCC}$ model over $S_x$. The $\mathsf{BCC}$ model is simulated over $S_x$ in [8] within $\tilde{O}(n^{1/3})$ rounds. Our observation is that the denser $S_x$ is, the more efficiently messages can be broadcast in the input graph. In essence, when $S_x$ is dense, neighborhoods in the original input graph $G$ are closely packed, and so when a node receives some message, it can efficiently share it with many nodes in its neighborhood. With this in hand, we can simulate the $\mathsf{BCC}$ algorithm from [11, Theorem 8] for approximate SSSP very quickly on dense skeleton graphs.

Tying up the pieces: Each simulation result by itself is insufficient, as in the extreme cases each solution takes $\tilde{O}(n^{1/3})$ rounds. Yet, by combining them and using each when it is better, based on the sparsity of $S_x$, we achieve the resulting $\tilde{O}(n^{5/17})=o(n^{1/3})$ round algorithm, for all graphs.

The outline of the rest of this section is as follows. We first show how to perform routing over skeleton graphs where the receivers of messages do not know which nodes desire to send them messages (Section 4.1). In Section 4.2, we simulate the $\mathsf{AC(c)}$ model in the $\mathsf{Hybrid}$ model. Next, we state that the $\mathsf{BCC}$ model can be simulated in the $\mathsf{Hybrid}$ model, and defer the proof to Appendix C. Finally, in Section 4.3, we combine our various algorithms to yield the SSSP approximation result, from which the weighted diameter approximation result also follows.

4.1 Oblivious Token Routing

The following claim shows how to route unicast messages inside a skeleton graph, and is based on Lemma C.11 (Oblivious Token Routing), shown in Section C.2.

Claim 4.1 (Skeleton Unicast).

Given a graph $G$, a skeleton graph $S_x=(M,E_S)$, and a set of messages between the nodes of $M$, s.t. each $v\in M$ is the sender and receiver of at most $k=\tilde{O}(n^{2-2x})$ messages, and each message is initially known to its sender, it is possible to route all given messages within $\tilde{O}(n^{1-x})$ rounds of the $\mathsf{Hybrid}$ model.

Claim 4.1 (Skeleton Unicast) is an extremely important ingredient in showing our $o(n^{1/3})$-round algorithms. Previously, [49] required that every node knows how many messages every other node intends to send to it. In turn, this would require that for each $v\in M$, all the other nodes in $M$ know the identifier of $v$. But the latter can be shown to take $\omega(n^{x/2})=\omega(n^{1/3})$ rounds, since $x>2/3$ (as elaborated above). Therefore, the necessity of our strengthened claim follows.

4.2 $\mathsf{AC(c)}$ and $\mathsf{BCC}$ Simulations in $\mathsf{Hybrid}$

Theorem 4.2 ($\mathsf{AC(c)}$ Simulation in $\mathsf{Hybrid}$).

Consider a graph $G=(V,E)$, and a skeleton graph $S_x=(M,E_S)$ of $G$, for some constant $2/3<x<1$. Let $ALG_{AC}$ be an algorithm which runs in the $\mathsf{AC(c)}$ model with capacity $c=\tilde{\Theta}(n^{2-2x})$ over $S_x$ in $t$ rounds. Then there exists an algorithm which simulates $ALG_{AC}$ within $\tilde{\Theta}(t\cdot n^{1-x})$ rounds of the $\mathsf{Hybrid}$ model over $G$, w.h.p. Further, it is ensured that at the start of the simulation, every node in $M$ knows the communication tokens of all its neighbors in $S_x$.

Proof of Theorem 4.2.

We show that we can instantiate the $\mathsf{AC(c)}$ model over $S_x$ and then show how to simulate each round.

In order to instantiate the $\mathsf{AC(c)}$ model over the nodes $M$, the $\mathsf{AC(c)}$ model definition asserts that every node $v\in M$ has an identifier in $[|M|]$ as well as a communication token whose knowledge enables other nodes to communicate with $v$. In order to ensure that the condition related to identifiers is satisfied, we invoke Claim C.7 (Unique IDs) over $G$ and $S_x$ in $\tilde{O}(1)$ rounds. For the second condition, each node $v\in M$ uses its original identifier in the $\mathsf{Hybrid}$ model as its communication token in the $\mathsf{AC(c)}$ model, which enables us to use Claim 4.1 (Skeleton Unicast) in order to later simulate the rounds of the $\mathsf{AC(c)}$ model. Finally, notice that every node in $M$ knows the communication tokens of all its neighbors in $S_x$, as the communication tokens are the $\mathsf{Hybrid}$ model identifiers.

In each round of the $\mathsf{AC(c)}$ model, each node sends/receives at most $c$ messages to/from any other node, such that for each message it either knows the communication token of the recipient or the recipient is chosen independently, uniformly at random. We split each round of the $\mathsf{AC(c)}$ model into two phases. First, we route the messages which use communication tokens, and second, we route the messages destined to random nodes. Since the communication tokens in the $\mathsf{AC(c)}$ model are the identifiers from the $\mathsf{Hybrid}$ model, the first phase is implemented straightforwardly using Claim 4.1 (Skeleton Unicast), taking $\tilde{O}(n^{1-x})$ rounds of the $\mathsf{Hybrid}$ model over $G$, as required.

For the second phase, denote by $R_v$ the messages which node $v$ desires to send to random targets. Node $v$ selects, uniformly and independently, $|R_v|\leq c$ random nodes from $G$, where each node is the target of a different message in $R_v$ – from here on, we assume that a message in $R_v$ has the identifier of a random node in $G$ attached to it. For each helper $u\in H_v$, node $v$ assigns $c/|H_v|=\tilde{O}(n^{1-x})$ messages from $R_v$, denoted by $R_v^u$, to $u$. Within $\tilde{O}(n^{1-x})$ rounds, node $v$ uses the local edges of the $\mathsf{Hybrid}$ model in order to inform each $u\in H_v$ about $R_v^u$. Next, each node $u$ uses the global edges of the $\mathsf{Hybrid}$ model in order to send the messages $R_v^u$ to their targets in $G$, within $\tilde{O}(n^{1-x})$ rounds, due to Claim C.4 (Uniform Sending). Finally, given that a node $u\in V$ received a message in this way, node $u$ selects, uniformly and independently at random, a node $v$ such that $u\in H_v$ and forwards that message to $v$. Notice that it can be the case that a certain $u\in V$ does not help any node, and thus there does not exist a node $v\in M$ such that $u\in H_v$ – in such a case, $u$ reports back to whichever node sent it the message that it cannot forward it. However, the definition of a skeleton graph promises that the total number of helper nodes of nodes in $M$ is at least $\tilde{\Omega}(n)$ (Property 7d of Definition 20 (Extended Skeleton Graph)). As every node assists at most $\tilde{O}(1)$ other nodes, this implies that at least a poly-logarithmic fraction of the nodes assist other nodes, and so if we repeat the above process and resend messages that bounced back, then within $\tilde{O}(1)$ iterations of the above algorithm, w.h.p., all messages are forwarded.

Notice that the above methodology does not produce a uniform distribution of receivers over the nodes $M$, since there might be some node $v\in M$ such that all of $H_v$ help only $v$, and a node $u\in M$ such that all the nodes in $H_u$ also help other nodes in $M$ (thus $u$ is less likely than $v$ to receive a random message). This happens even though each node helps at most $\tilde{O}(1)$ other nodes and the number of nodes in $H_v$ is exactly the same for each $v\in M$. Thus, we augment the probabilities with which a node $v\in M$ accepts a message. Each node $v\in M$ observes its helpers $H_v$ and computes the probability, denoted $p_v$, that given that a uniformly chosen node $u\in H_v$ received a message, $u$ forwards the message to $v$. Now, the nodes in $M$ utilize the Aggregate and Broadcast tool of [7, Theorem 2.2] (see Claim C.1 (Aggregate and Broadcast)) in order to compute $p=\min_{v\in M}p_v$. Every node $v\in M$ now accepts each message it received with probability $p/p_v$, independently. In the case that a node $v\in M$ rejects a message, it notifies the original sender of the message that the message was rejected – this is done by sending messages in the reverse direction to the way they were sent previously. In the case that a node $v\in M$ hears that a message it sent was rejected, it attempts to resend it by repeating the entire algorithm above. As every node $u\in V$ helps at most $\tilde{O}(1)$ nodes $v\in M$, it holds that $p=\tilde{\Omega}(1)$, and so within $\tilde{O}(1)$ iterations of the above algorithm, w.h.p., every message is successfully delivered. ∎
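For concreteness, a minimal sketch of this rebalancing, under one natural reading of the forwarding rule above (each helper $u$, having received a message, forwards it to a uniformly random node among the skeleton nodes it helps); the data structures and names here are ours:

import random

def acceptance_prob(helpers, helps_count):
    """p_v: probability that a uniformly chosen helper u of v, having
    received a message, forwards it to v, where u picks uniformly among
    the helps_count[u] skeleton nodes it helps."""
    return sum(1.0 / helps_count[u] for u in helpers) / len(helpers)

def accept_message(p_v, p_min):
    """Accept with probability p_min / p_v (p_min = min over all p_v),
    equalizing the chance that any skeleton node accepts a message;
    rejected messages bounce back to their senders for a retry."""
    return random.random() < p_min / p_v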

The following is proven in Appendix C.

Theorem 4.3 ($\mathsf{BCC}$ Simulation in $\mathsf{Hybrid}$).

Given a graph $G=(V,E)$, a skeleton graph $S_x=(M,E_S)$, for some constant $2/3<x<1$, with average degree $k=\Theta(|E_S|/|M|)$, and an algorithm $ALG_{BCC}$ in the $\mathsf{Broadcast\ Congested\ Clique}$ model which runs on $S_x$ in $t$ rounds, it is possible to simulate $ALG_{BCC}$ by executing $\tilde{O}(t\cdot(n^{2x-1}/\sqrt{k}+n^{1-x}))$ rounds of the $\mathsf{Hybrid}$ model on $G$. This assumes that prior to running $ALG_{BCC}$, each node $v\in M$ has at most $\tilde{O}(\deg_G(v))$ bits of input used in $ALG_{BCC}$, including, potentially, the incident edges of $v$ in $S_x$, and that the output of each node in $ALG_{BCC}$ is at most $O(t\log n)$ bits.

4.3 A $(1+\varepsilon)$-Approximation for SSSP

We show an $\tilde{O}(n^{5/17}/\varepsilon^9)$-round algorithm for a $(1+\varepsilon)$-approximation of weighted SSSP in the $\mathsf{Hybrid}$ model. We begin by showing how to use the $\mathsf{AC(c)}$ algorithm from Theorem B.25 ($(1+\varepsilon)$-Approximation for SSSP in $\mathsf{AC(c)}$ (Wrapper)) in the $\mathsf{Hybrid}$ model for sparse skeleton graphs with low maximal degree and even lower average degree. We then show an algorithm in the $\mathsf{Hybrid}$ model for graphs with low average degree yet high maximal degree. Finally, we show how to use the $\mathsf{Broadcast\ Congested\ Clique}$ algorithm from [11, Theorem 8] in the $\mathsf{Hybrid}$ model for dense graphs with high average degree. Combining these claims in the proof of Theorem 1.2 ($(1+\varepsilon)$-Approximation for SSSP in $\mathsf{Hybrid}$) gives the desired result.

Lemma 4.4 (SSSP with Low Average and Maximal Degrees).

Given a weighted input graph $G=(V,E)$, a skeleton graph $S_x=(M,E_S)$ s.t. the average and maximal degrees in $S_x$ are $k=\tilde{O}(n^{x/2})$ and $\Delta_{S_x}=O(n^{2-2x})$, respectively, a value $0<\varepsilon<1$, and a source node $s\in M$, there is an algorithm that ensures that every node in $M$ knows a $(1+\varepsilon)$-approximation to its distance from $s$ over the edges $E_S$, within $\tilde{O}((n^{11x/6-1}+n^{1-x})/\varepsilon)$ rounds in the $\mathsf{Hybrid}$ model, w.h.p.

Proof of Lemma 4.4.

Use Theorem 4.2 ($\mathsf{AC(c)}$ Simulation in $\mathsf{Hybrid}$) to simulate, in the $\mathsf{Hybrid}$ model over $G$, the SSSP algorithm from Theorem B.25 ($(1+\varepsilon)$-Approximation for SSSP in $\mathsf{AC(c)}$ (Wrapper)) over $S_x$. Setting the capacity $c=\tilde{\Theta}(n^{2-2x})$ gives a round complexity of $\tilde{O}(n^{1-x}\cdot(((n^x)^{5/6}+k\cdot(n^x)^{1/3})/(c\cdot\varepsilon)+1/\varepsilon+\Delta_{S_x}/c))=\tilde{O}(n^{1-x}\cdot(((n^x)^{5/6}+n^{x/2}\cdot(n^x)^{1/3})/(c\cdot\varepsilon)+1/\varepsilon+\Delta_{S_x}/c))=\tilde{O}((n^{5x/6-2+2x+1-x}+n^{1-x})/\varepsilon+\Delta_{S_x}/n^{1-x})=\tilde{O}((n^{11x/6-1}+n^{1-x})/\varepsilon)$, where the last step uses $\Delta_{S_x}=O(n^{2-2x})$. ∎

Theorem B.25 ($(1+\varepsilon)$-Approximation for SSSP in $\mathsf{AC(c)}$ (Wrapper)) depends on the maximal degree, and so applying it to a skeleton graph with a high maximal degree is inefficient. Thus, for sparse skeleton graphs with a high maximal degree, we show a $\mathsf{Hybrid}$ algorithm ensuring that one node in the skeleton graph can learn all of the skeleton graph and inform the nodes of their desired outputs. The proof of the following appears in Appendix C.

Lemma 4.5 (SSSP with Low Average and High Maximal Degrees).

Given a weighted input graph $G=(V,E)$, a skeleton graph $S_x=(M,E_S)$, for some constant $2/3<x\leq 12/17$, such that the average and maximal degrees in $S_x$ are at most $\tilde{O}(n^{x/2})$ and at least $\Delta_{S_x}=\omega(n^{2-2x})$, respectively, and a source node $s\in M$, there is an algorithm that ensures that every node in $M$ knows its distance from $s$ over the edges $E_S$, within $\tilde{O}(n^{1-x})$ rounds in the $\mathsf{Hybrid}$ model, w.h.p.

We use our efficient $\mathsf{BCC}$ simulation for dense skeleton graphs.

Lemma 4.6 (SSSP with High Average Degree).

Given a weighted, undirected input graph $G=(V,E)$, a skeleton graph $S_x=(M,E_S)$, for some constant $2/3<x<1$, such that the average degree in $S_x$ is at least $k=\tilde{\Omega}(n^{x/2})$, a value $0<\varepsilon<1$, and a source node $s\in M$, there is an algorithm that ensures that every node in $M$ knows a $(1+\varepsilon)$-approximation to its distance from $s$ over the edges $E_S$, within $\tilde{O}((n^{7x/4-1}+n^{1-x})/\varepsilon^9)$ rounds in the $\mathsf{Hybrid}$ model, w.h.p.

Proof of Lemma 4.6.
We use Theorem 4.3 ($\mathsf{BCC}$ Simulation in $\mathsf{Hybrid}$) to simulate the SSSP approximation algorithm of [11, Theorem 8] on $S_x$. As the complexity of [11, Theorem 8] is $\tilde{O}(\varepsilon^{-9})$ rounds of the $\mathsf{BCC}$ model, the simulation takes $\tilde{O}(\varepsilon^{-9}\cdot(n^{2x-1}/\sqrt{k}+n^{1-x}))=\tilde{O}(\varepsilon^{-9}\cdot(n^{2x-1-(x/2)/2}+n^{1-x}))=\tilde{O}((n^{7x/4-1}+n^{1-x})/\varepsilon^9)$ rounds of the $\mathsf{Hybrid}$ model. ∎

See Theorem 1.2.

Proof of Theorem 1.2.

Denote $x=12/17$. Construct a skeleton graph $S_x=(M,E_S)$ in $\tilde{O}(n^{1-x})=\tilde{O}(n^{5/17})$ rounds using Corollary C.6 (Construct Skeleton). Notice that Corollary C.6 can ensure that the source $s$ is also in $M$. Using Claim C.1 (Aggregate and Broadcast), compute $k=\Theta(|E_S|/|M|)$ and the maximal degree in $S_x$, the value $\Delta_{S_x}$, in $\tilde{O}(1)$ rounds.

First, ensure that every node in $S_x$ knows a $(1+\varepsilon)$-approximation to its distance from $s$ over the edge set $E_S$ only. To do so, selectively deploy Lemmas 4.6, 4.4 and 4.5, as follows. If $k=\tilde{\Omega}(n^{x/2})$, invoke Lemma 4.6 in $\tilde{O}((n^{7x/4-1}+n^{1-x})/\varepsilon^9)=\tilde{O}((n^{(7/4)\cdot(12/17)-1}+n^{1-12/17})/\varepsilon^9)=\tilde{O}((n^{4/17}+n^{5/17})/\varepsilon^9)=\tilde{O}(n^{5/17}/\varepsilon^9)$ rounds. Otherwise, $k=O(n^{x/2})$, and so split the algorithm into two cases, according to $\Delta_{S_x}$. If $\Delta_{S_x}=O(n^{2-2x})$, invoke Lemma 4.4 in $\tilde{O}((n^{11x/6-1}+n^{1-x})/\varepsilon)=\tilde{O}((n^{(11/6)\cdot(12/17)-1}+n^{1-12/17})/\varepsilon)=\tilde{O}(n^{5/17}/\varepsilon)$ rounds, and otherwise invoke Lemma 4.5 in $\tilde{O}(n^{1-x})=\tilde{O}(n^{5/17})$ rounds.
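Indeed, the choice $x=12/17$ is precisely the point at which the dominant exponents of Lemmas 4.4 and 4.5 balance:
\[
\frac{11x}{6}-1=1-x \iff \frac{17x}{6}=2 \iff x=\frac{12}{17}, \qquad\text{so}\qquad 1-x=\frac{5}{17},
\]
while at this value the exponent $7x/4-1=4/17$ arising from Lemma 4.6 is already dominated by $5/17$.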

Finally, due to Property 4 of the definition of a skeleton graph, we invoke Claim C.10 (Extend Distances), in $\tilde{O}(n^{5/17})$ rounds, to ensure that every $v\in V$ knows a $(1+\varepsilon)$-approximation to its distance from $s$ in $G$. ∎

Theorem 4.7 ($(2+\varepsilon)$-Approximation for Weighted Diameter in $\mathsf{Hybrid}$).

It is possible to compute a $(2+\varepsilon)$-approximation for the weighted diameter in $\tilde{O}(n^{5/17}/\varepsilon^9)$ rounds in the $\mathsf{Hybrid}$ model, w.h.p.

5 Fast Distance Computations in the $\mathsf{CONGEST}$ Model

In Section 5.1 we show how to simulate the $\mathsf{AC(c)}$ model in $\mathsf{CONGEST}$. We then employ this simulation to simulate the $\mathsf{Congested\ Clique}$ model in $\mathsf{CONGEST}$ (Section 5.2), and to derive our distance computations in $\mathsf{CONGEST}$ (Section 5.3).

5.1 $\mathsf{AC(c)}$ Simulation in $\mathsf{CONGEST}$

Key Principle. In the $\mathsf{CONGEST}$ model, each node $v$ can send or receive $\deg_G(v)$ messages in each round, implying a total bandwidth of $2m$ messages per round. We aim to build efficient distance tools which utilize this bandwidth completely. Consider the $\mathsf{AC(m/n)}$ model, which also has a bandwidth of $2m$ messages per round. Potentially, by comparing the bandwidths, it could be possible to simulate one round of the $\mathsf{AC(m/n)}$ model in a single round of the $\mathsf{CONGEST}$ model. However, in the $\mathsf{CONGEST}$ model, nodes with degree $o(m/n)$ cannot send $m/n$ messages in a single round, regardless of how an algorithm tries to do this.

To overcome this problem, we notice that the total bandwidth of the nodes with degree at least $m/(4n)$ is at least $7m/4$. Thus, the key principle we show is that the high degree nodes, denoted by $H$, can learn all of the input of the low degree nodes, denoted by $L$. The higher the degree of a node, the more nodes it simulates. Formally, we create an assignment $\rho\colon L\mapsto H$ where for every $\ell\in L$, the node $\rho(\ell)\in H$ simulates $\ell$, and we ensure that $\rho$ is globally known; a minimal sketch of such an assignment follows.
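The following centralized sketch shows one globally consistent greedy packing of this flavor; it is an illustration of ours, where the budget $4\lfloor\deg/k\rfloor$ follows the partition guaranteed by Claim D.4 (Assignment) invoked later, and the code assumes, as that claim guarantees, that the budgets suffice:

def build_assignment(low, high, deg, k):
    """low, high: lists of node ids, identically ordered at every node;
    deg: degree of each node; k: the (integer) average degree m/n.
    Packs low-degree nodes greedily into high-degree nodes' budgets,
    so every node can compute the same map rho locally."""
    rho, j, used = {}, 0, 0
    for ell in low:
        # advance to the next high-degree node with remaining budget
        while used >= max(1, 4 * (deg[high[j]] // k)):
            j, used = j + 1, 0
        rho[ell] = high[j]
        used += 1
    return rho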

We then simulate the $\mathsf{AC(m/n)}$ model using only the nodes in $H$, and finally we send back the resulting output to the nodes in $L$. To allow all-to-all communication between the nodes in $H$, we use the routing algorithms developed in [38, 20] and stated in Appendix D, and pay an overhead of $\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}$ rounds for the simulation.

Some problems, such as Scattered APSP, require the output to be stored on some node, but do not specify on which. We adapt to this case by allowing nodes to produce an auxiliary output. After the simulation is over, every node $u$, given the communication token of any node $v$, can compute the identifier of the node $w$ where the auxiliary output of $v$ is stored.

Carrier Configuration and Communication Tokens. In addition to simulating the $\mathsf{AC(c)}$ model in the $\mathsf{CONGEST}$ model, we let every node know the communication tokens of its neighbors, as well as construct a carrier configuration directly in the $\mathsf{CONGEST}$ model. This greatly benefits some graph problems.

Note that in the simulation in the $\mathsf{Hybrid}$ model, the carrier configuration is constructed in the $\mathsf{AC(c)}$ model itself. However, in the case of the $\mathsf{CONGEST}$ model, we cannot delegate this task efficiently to the $\mathsf{AC(c)}$ model, since building a carrier configuration requires every node to be able to send its incident edges to arbitrary nodes in the graph. Doing so takes $\Omega(\Delta\cdot n/m)$ rounds in the $\mathsf{AC(m/n)}$ model, yet only $O(\tau_{\text{mix}}\cdot n^{o(1)})$ rounds in $\mathsf{CONGEST}$.

However, building a carrier configuration in the $\mathsf{CONGEST}$ model is also not directly possible, as a low degree carrier might learn $\Omega(m/n)$ edges of other nodes, which could take $\Omega(m/n)$ rounds. Therefore, instead of each node sending its edges directly to its carriers, it sends them to the nodes which simulate its carriers.

Overall, this might sound like a back-and-forth process, as we simulate low degree nodes by high degree nodes and then split the edges of the high degree nodes among nodes which may themselves be simulated nodes. However, using the $\mathsf{AC(c)}$ model grants us the modular approach we aim for.

Supergraphs. Theorem B.19 (Hopset Construction) constructs hopsets and Theorem B.28 computes Scattered APSP in the $\mathsf{AC(c)}$ model, and both require the input graph to have $\Omega(n^{3/2})$ edges. (This assumption can be removed at the expense of more complicated proofs, yet doing so would not imply any speed-up for our end results.) We aim to apply those results to general graphs, and so we augment $G$ with $O(n)$ added nodes and $\Theta(n^{3/2})$ added edges while preserving distances between the nodes of $V$ and ensuring that all added edges are globally known. We call the resulting graph $G'$ an $n^{3/2}$-supergraph, build a carrier configuration holding $G'$, and apply the $\mathsf{AC(c)}$ algorithms on $G'$.

Definition 6.

Given a weighted graph $G=(V,E,w)$ with $n=|V|$ and $m=|E|$, and a number $0\leq m'\leq n^2$, a weighted graph $G'=(V',E',w')$ is an $m'$-supergraph of $G$ if $G'$ is obtained from $G$ by adding $\lceil m'/n\rceil$ new nodes and adding an infinite-weight edge between every added node and every original node.

Clearly, a supergraph preserves the original distances.
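Definition 6 is directly constructive; a minimal sketch (with INF standing in for the infinite weight) is:

INF = float("inf")

def make_supergraph(n, edges, m_prime):
    """Return the node count and edge list of an m'-supergraph of G.
    Adds ceil(m'/n) nodes, each joined to every original node by an
    INF-weight edge, so no finite distance in G changes."""
    extra = -(-m_prime // n)  # ceil(m'/n)
    new_edges = [(u, v, INF) for u in range(n, n + extra) for v in range(n)]
    return n + extra, edges + new_edges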

We now simulate the $\mathsf{AC(c)}$ model in the $\mathsf{CONGEST}$ model.

Theorem 5.1 ($\mathsf{AC(c)}$ Simulation in $\mathsf{CONGEST}$).

Consider a graph $G$ and some constant $c$. Let $m'$ be some number s.t. $0\leq m'\leq n^2$. Let $k=m/n$ be the average degree of $G$ and let $A$ be an algorithm which runs in the $\mathsf{AC(c)}$ model over $G$ in $t$ rounds. For each $v$ denote by $i_v\log n$ and $o_v\log n$ the number of bits in the input and output of the node $v$, respectively. Let $i_c=\max_{v\in V}\{1+i_v/\deg_G(v)\}$ and $o_c=\max_{v\in V}\{1+o_v/\deg_G(v)\}$ be the input and output capacities: the minimum number of rounds required for any node to send or receive its input or output, respectively.

There exists an algorithm which simulates $A$ within $(m'/m+i_c+\lceil c/k\rceil\cdot t+o_c)\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}$ rounds of the $\mathsf{CONGEST}$ model over $G$. The above works even if $A$ requires a carrier configuration of $G'$ (an $m'$-supergraph of $G$) or communication tokens of neighbors as an input. Furthermore, each node $u$ might produce some unbounded auxiliary output, in which case the output is known to some (not necessarily the same) node $v$ such that each node $w$ can compute the identifier of $v$ given the communication token of $u$.

Proof of Theorem 5.1.

Identifiers: First, compute $k=m/n$ in $O(D)=O(\tau_{\text{mix}})$ rounds. By the definition of the $\mathsf{CONGEST}$ model, each node $v$ has an identifier $v\in[n]$, denoted the original identifier. Use Corollary D.2 (Identifiers) to compute a set of new identifiers $ID_{new}\colon V\mapsto[n]$. We abuse notation and denote by $\deg_G(i)$ the degree of the node $v$ with identifier $ID_{new}(v)=i$. Each node $v$ locally computes for each new identifier $i\in[n]$ the value $\lfloor\log\deg_G(i)\rfloor$. A node $v\in V$ with degree less than $2^{\lfloor\log k\rfloor-2}$ is a low degree node $v\in L$, and otherwise it is a high degree node $v\in H$. According to the properties of $ID_{new}$, the new identifiers of nodes in $L$ are smaller than those of nodes in $H$. We abuse notation and treat $L$ and $H$ as sets of new identifiers. For each $i\in L$ denote $x_i=\lfloor\log\deg_G(i)\rfloor$ and for each $j\in H$ denote $y_j=\lfloor\log\deg_G(j)\rfloor$.

Simulation Assignment: The numbers $|L|,|H|$ and the sets $\{x_i\}_{i\in L},\{y_j\}_{j\in H}$ satisfy the conditions of Claim D.4 (Assignment), since $\sum_{i\in L}2^{x_i}+\sum_{j\in H}2^{y_j}=\sum_{v\in V}2^{\lfloor\log\deg_G(v)\rfloor}>\sum_{v\in V}2^{\log\deg_G(v)-1}=2m/2=m$. Hence, there is a partition of $L$ into $|H|$ sets $\{I_j\}$, satisfying $|I_j|\leq 4\cdot\lfloor 2^{y_j}/k\rfloor\leq 4\lfloor\deg_G(j)/k\rfloor$. The node $j\in H$ simulates the nodes in $I_j$, and for simplicity it also simulates itself. Denote by $\rho(i)$ the new identifier of the node simulating the node with new identifier $i$.

Input: For every $j\in H$, we now deliver the information node $j$ requires to simulate the nodes in $I_j$. Each $u$ sends to its neighbors its new identifier $ID_{new}(u)$. As $u$ knows its new identifier $ID_{new}(u)$, and the assignment is computable locally from the globally known rounded degrees, it also knows the new identifier of the node $j=\rho(u)$ which simulates it. Using Claim D.1 ($\mathsf{CONGEST}$ Routing), each $u\in L$ sends to the node $\rho(u)$ its new and original identifiers, together with the new and original identifiers of its neighbors, and its input. A node $v\in H$ can now locally compute, for each neighbor $w$ of each node $u\in I_v$, the new identifier of $\rho(w)$. This invokes the algorithm from Claim D.1 at most $O(i_c)$ times. In each invocation, each $u\in L$ sends at most $\deg_G(u)$ messages and each node $v\in H$ receives at most $4\cdot\lfloor\deg_G(v)/k\rfloor\cdot k\leq 4\deg_G(v)$ messages, as required.

Instantiation: We now instantiate the $\mathsf{AC(c)}$ model. As the communication token of a node $v$ in the $\mathsf{AC(c)}$ model, we use the concatenation of $v$, $\rho(v)$ and $ID_{new}(v)$. Clearly, the identifiers are unique in $[n]$ and the communication tokens are unique and of size $O(\log n)$ bits. While pre-processing, we already guaranteed that the node which simulates $u$ knows the communication tokens of all neighbors of $u$. Now the new identifier assignment $ID_{new}$ and the simulation assignment $\rho$ satisfy the demands of Claim D.3 (Build Carrier Configurations in $\mathsf{CONGEST}$), and we use it to build a carrier configuration.

Round Simulation: During one round of the $\mathsf{AC(c)}$ model, each node can send and receive at most $c$ messages. Each message is sent either to a random node or to a node with a known communication token. We split into two phases: first, sending the messages to nodes with known communication tokens, and second, sending the messages to random nodes.

In the first phase, we use the fact that the new identifier of the destination is a part of the communication token. Each node $v\in H$ has to send and receive messages on behalf of all of $I_v$. Thus, node $v\in H$ has to send and receive at most $(4\cdot\lfloor\deg(v)/k\rfloor+1)\cdot c\leq(4\cdot\deg(v)+1)\cdot\lceil c/k\rceil$ messages. It is therefore enough to invoke the algorithm from Claim D.1 ($\mathsf{CONGEST}$ Routing) $\tilde{O}(\lceil c/k\rceil)$ times to deliver all the messages, w.h.p.

In the second phase, for each message, we choose a new identifier independently and uniformly. Since for a node $u$ we know the new identifier of $\rho(u)$, we also know where to route the message. As $n$ nodes sample uniformly at most $c$ messages each, each new identifier is sampled at most $\tilde{O}(c)$ times, w.h.p. Thus, the number of messages that a high degree node $v\in H$ has to send or receive is at most $\tilde{O}(\deg(v)\cdot c/k)$. Thus, w.h.p., $\tilde{O}(\lceil c/k\rceil)$ invocations of the algorithm from Claim D.1 ($\mathsf{CONGEST}$ Routing) suffice.

Main Output: Finally, we send the outputs back to the simulated nodes. This works in a similar manner, with a node $v\in H$ splitting the output of each node $u\in I_v$ into $\lceil o_u/\deg(u)\rceil$ batches of size at most $\deg(u)$. Notice that since $v$ received the identifiers of all neighbors of $u$, it knows $\deg(u)$. Then, for $o_c$ rounds, each node uses Claim D.1 ($\mathsf{CONGEST}$ Routing) to send one batch to each node it simulates.

Auxiliary Output: The auxiliary output that some node $u$ produces is stored in the node $v=\rho(u)$ which simulates it. Since $\rho(u)=v$ is a part of the communication token of $u$, each node $w$ which knows the communication token of $u$ also knows $v$.

Round Complexity: By Corollary D.2 (Identifiers), computing the new identifiers takes $O(\tau_{\text{mix}}+\log n)$ rounds. By Claim D.1 ($\mathsf{CONGEST}$ Routing), sending the input requires $i_c\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}$ rounds, the simulation of the $t$ rounds of the $\mathsf{AC(c)}$ model requires $\lceil c/k\rceil\cdot t\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}$ rounds, and sending the output back takes $o_c\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}$ rounds, w.h.p. By Claim D.3 (Build Carrier Configurations in $\mathsf{CONGEST}$), building the carrier configuration takes $\lceil m'/m\rceil\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}$ rounds, w.h.p. Thus, the execution terminates in $(m'/m+i_c+\lceil c/k\rceil\cdot t+o_c)\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}$ rounds, w.h.p. ∎

5.2 Faster $\mathsf{Congested\ Clique}$ Simulation

The $\mathsf{AC(c)}$ model is, in a sense, a generalization of the $\mathsf{Congested\ Clique}$ model, directly implying the following.

Claim 5.2 ($\mathsf{Congested\ Clique}$ Simulation in $\mathsf{AC(c)}$).

There is an algorithm which executes one round of the $\mathsf{Congested\ Clique}$ model in the $\mathsf{AC(c)}$ model in $\tilde{O}(n/c)$ rounds, w.h.p.

Proof of Claim 5.2.

Initially, for $\tilde{O}(n/c)$ rounds, each node sends its communication token and identifier to $c$ (not necessarily distinct) randomly sampled nodes. By Chernoff and union bounds, all nodes receive the identifiers and communication tokens of all other nodes, w.h.p. For an additional $\tilde{O}(n/c)$ rounds, the nodes use the learned communication tokens to deliver the messages they have. ∎
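A minimal sketch of the dissemination phase, where send is an assumed primitive delivering a single $\mathsf{AC(c)}$ message:

import random

def disseminate(n, c, my_id, my_token, send, rounds):
    """For rounds = O~(n/c) rounds, push (id, token) to c uniformly random
    nodes per round; a coupon-collector / Chernoff argument makes every
    token known to every node w.h.p."""
    for _ in range(rounds):
        for _ in range(c):
            send(random.randrange(n), (my_id, my_token))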

Combining Claim 5.2 ($\mathsf{Congested\ Clique}$ Simulation in $\mathsf{AC(c)}$) with Theorem 5.1 ($\mathsf{AC(c)}$ Simulation in $\mathsf{CONGEST}$) implies a density aware simulation of the $\mathsf{Congested\ Clique}$ model in the $\mathsf{CONGEST}$ model – the more edges the input graph has, the faster the simulation.

Theorem 1.3 ($\mathsf{Congested\ Clique}$ Simulation in $\mathsf{CONGEST}$).

Consider a graph $G=(V,E)$ and an algorithm $A$ which runs in the $\mathsf{Congested\ Clique}$ model over $G$ in $t$ rounds. For each $v$ denote by $i_v\log n$ and $o_v\log n$ its number of input and output bits, respectively. Let $i_c=\max_{v\in V}\{1+i_v/\deg_G(v)\}$ and $o_c=\max_{v\in V}\{1+o_v/\deg_G(v)\}$ be the input and output capacities: the minimum number of rounds required for any node to send or receive its input or output, respectively. Then $A$ can be simulated within $(i_c+(n^2/m)\cdot t+o_c)\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}$ rounds of the $\mathsf{CONGEST}$ model over $G$, w.h.p.

5.3 Improved Distance Computations

We show an exact SSSP algorithm via our simulation from Theorem 5.1 ($\mathsf{AC(c)}$ Simulation in $\mathsf{CONGEST}$). See Theorem 1.4.

Proof of Theorem 1.4.

The claim follows from simulating the algorithm of Theorem B.23 (Exact SSSP in $\mathsf{AC(c)}$) in the $\mathsf{AC(m/n)}$ model over $G$ using Theorem 5.1 ($\mathsf{AC(c)}$ Simulation in $\mathsf{CONGEST}$), in $(1+\tilde{O}(m^{1/2}n^{1/6}/c+n/c+n^{7/6}/m^{1/2})\cdot\lceil c/k\rceil)\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}=(n^{7/6}/m^{1/2}+n^2/m)\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}$ rounds, w.h.p. ∎
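To verify the simplification in the proof above: the simulation runs with capacity $c=k=m/n$, so $\lceil c/k\rceil=1$ and
\[
\frac{m^{1/2}n^{1/6}}{c}=\frac{m^{1/2}n^{1/6}\cdot n}{m}=\frac{n^{7/6}}{m^{1/2}},
\qquad
\frac{n}{c}=\frac{n^{2}}{m},
\]
while the term $n^{7/6}/m^{1/2}$ is unchanged, yielding the stated bound.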

Now, we define and approximate our first APSP relaxation.

Definition 7 (Shortest Path Query Problem).

Given an input graph $G$, a query set is a set of $q$ source-destination pairs $Q=\{(s_i,t_i)\}_{i=1}^q$ called queries. For each node $u$, the source and destination loads, $u_s$ and $u_t$, respectively, are the number of times $u$ appears as a source and as a destination in $Q$, divided by its degree. The maximum $\ell$ over all source and destination loads is the query set load.

A shortest path query problem is a query set of size $q$ and load $\ell$, s.t. every $s_i$ knows the identifier of $t_i$. The goal is to answer all queries, that is, $s_i$ computes or approximates $d_G(s_i,t_i)$.

Given an input graph $G$, an algorithm is a shortest path query algorithm if, after a pre-processing stage taking $T_{pre-processing}$ rounds, given any query set of size $q$ and load $\ell$, the algorithm solves the shortest path query problem within an additional $T_{query}$ rounds.
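For instance, if a node $u$ of degree $5$ appears as a source in $10$ queries and as a destination in $2$ queries, then $u_s=10/5=2$ and $u_t=2/5$, so $u$ contributes $\max\{2,2/5\}=2$ towards the load $\ell$.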

We follow the approach of [14] in order to design a $(3+\varepsilon)$-approximate shortest path query algorithm, using our methods from the $\mathsf{AC(c)}$ model. For this we use the following important tool, whose proof is deferred to Appendix D.

Lemma 5.3 ($k$-nearest in $\mathsf{CONGEST}$).

Given a graph $G$, it is possible in the $\mathsf{CONGEST}$ model, within $(k\cdot n^{4/3}/m+n^{5/3}/m+1)\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}$ rounds, w.h.p., to compute the distance $d_G(v,u)$ from every node $v$ to every node $u$ which is one of the $k$ closest nodes to $v$ (with ties broken arbitrarily).

See Theorem 1.5.

Proof of Theorem 1.5.

To approximate the distance $d_G(s,t)$, we compute $\min\{d_G^{n^{1/2}}(s,t),d_G(s,p(s))+d_G(p(s),t)\}$, where $p(s)$ is the closest node to $s$ in some globally known hitting set $A$ of all the sets $N_G^{n^{1/2}}(s)$. Thus, while pre-processing, we ensure that each node $s$ knows $d_G(s,v)$ for each $v\in A$.

Pre-processing: First, we execute the algorithm from Lemma 5.3 with $k=n^{1/2}$, so that every node $v$ learns the distance to each node in $N_G^{n^{1/2}}(v)$. Now, each node enters $A$ independently with probability $\tilde{O}(n^{-1/2})$. W.h.p., the set $A$ is of size $\tilde{O}(n^{1/2})$ and is a hitting set for each $N_G^{n^{1/2}}(v)$. Let $\varepsilon'=\varepsilon/3$. We compute a $(1+\varepsilon')$-approximate MSSP from $A$ using $\tilde{O}(n^{1/2})$ invocations of the $(1+\varepsilon')$-approximate SSSP algorithm of [40, Corollary 5]. Each $v$ computes $p(v)\in A\cap N_G^{n^{1/2}}(v)$, which is the sampled node closest to $v$; such a node exists in the set of its $n^{1/2}$ nearest neighbors, w.h.p.

Computing the $n^{1/2}$-nearest nodes takes $(k\cdot n^{4/3}/m+n^{5/3}/m+1)\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}$ rounds with $k=n^{1/2}$, w.h.p. The complexity of the $\tilde{O}(n^{1/2})$ SSSP invocations is $n^{1/2}\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}$ rounds, w.h.p. The overall complexity of the pre-processing is thus $(n^{1/2}+n^{11/6}/m)\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}$ rounds, w.h.p.

Query: Whenever a node $s_i$ needs to approximate its distance to $t_i$, the node $s_i$ requests from $t_i$ the distance to $p(s_i)$, for which $t_i$ knows a $(1+\varepsilon')$-approximation $\tilde{d}(p(s_i),t_i)$, due to [40, Corollary 5]. The node $s_i$ approximates its distance to $t_i$ as $\tilde{d}(s_i,t_i)=d(s_i,p(s_i))+\tilde{d}(p(s_i),t_i)$. The approximation factor follows from Claim A.3 (APSP using $k$-nearest and MSSP). To execute the routing, the nodes invoke Claim D.1. The number of rounds for solving a query set with load $\ell$ is $\ell\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}$, w.h.p. ∎
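In summary, the query-time estimate admits the following short local sketch; the table names are ours, and all tables are assumed to be populated during the pre-processing above:

def query_distance(s, t, near, pivot, d_pivot, mssp_est):
    """near[s]: exact distances from s to its n^(1/2) nearest nodes;
    pivot[s] = p(s); d_pivot[s] = d(s, p(s));
    mssp_est(a, t): (1 + eps')-approximate distance from a in A to t."""
    direct = near[s].get(t, float("inf"))   # exact if t is among the nearest
    detour = d_pivot[s] + mssp_est(pivot[s], t)
    return min(direct, detour)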

Denoting by $\delta$ the minimum degree in the graph, one gets load $\ell\leq n/\delta$ and $m\geq n\cdot\delta/2$, which implies our main result, Theorem 1.1.

See Theorem 1.1.

Finally, we use our $\mathsf{AC(c)}$ simulation together with Theorem B.28 ($(3+\varepsilon)$-Approximation for Scattered APSP in $\mathsf{AC(c)}$) to obtain our Scattered APSP algorithm in the $\mathsf{CONGEST}$ model.

See Theorem 1.6.

Proof of Theorem 1.6.

Simulate Theorem B.28 ($(3+\varepsilon)$-Approximation for Scattered APSP in $\mathsf{AC(c)}$) using Theorem 5.1 ($\mathsf{AC(c)}$ Simulation in $\mathsf{CONGEST}$). Split the output of each node $u$ into two parts. The first part, $s_u$, is $\tilde{O}(n^{1/2})$ bits encoding the communication tokens of the nodes which store the distances from $u$; thus, the output capacity is $o_c=\tilde{O}(n^{1/2})$. The communication tokens decoded from $s_u$ allow $u$ to know where its distances are stored. The second part of the output of $u$ is the auxiliary output, which consists of the distances it knows. Thus, the algorithm completes in $((n^{11/6}/m+n^{1/3}+m/n)/\varepsilon+n^{1/2})\cdot\tau_{\text{mix}}\cdot 2^{O(\sqrt{\log n})}$ rounds, w.h.p. ∎

6 Discussion

We believe that additional problems in various fundamental distributed settings could be solvable using our infrastructure for sparsity aware computation. This is a broad open direction for further research.

With respect to the specific results shown here, a major goal would be to construct a sparser hopset in the $\mathsf{AC(c)}$ model. Further, one could attempt to show sparse matrix multiplication algorithms which relax the assumption that the input matrices and the output matrix are bounded by the same number of finite elements, as this could directly improve the complexity of our $k$-SSP algorithm. Either of these improvements is likely to significantly reduce the round complexity of many of our end results, in both the $\mathsf{Hybrid}$ and $\mathsf{CONGEST}$ models.

Acknowledgements

This project was partially supported by the European Union’s Horizon 2020 Research and Innovation Programme under grant agreement no. 755839. The authors would like to thank Michal Dory and Yuval Efron for various helpful conversations. We also thank Fabian Kuhn for sharing a preprint of [49] with us.

References

  • [1] Amir Abboud, Keren Censor-Hillel, and Seri Khoury. Near-linear lower bounds for distributed distance computations, even in sparse networks. In Cyril Gavoille and David Ilcinkas, editors, Distributed Computing - 30th International Symposium, DISC 2016, September 27-29, 2016. Proceedings, volume 9888 of Lecture Notes in Computer Science, pages 29–42, Paris, France, 2016. Springer. doi:10.1007/978-3-662-53426-7_3.
  • [2] Udit Agarwal and Vijaya Ramachandran. Distributed weighted all pairs shortest paths through pipelining. In 2019 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2019, May 20-24, 2019, pages 23–32, Rio de Janeiro, Brazil, 2019. IEEE. doi:10.1109/IPDPS.2019.00014.
  • [3] Udit Agarwal and Vijaya Ramachandran. Faster deterministic all pairs shortest paths in congest model. In Christian Scheideler and Michael Spear, editors, SPAA ’20: 32nd ACM Symposium on Parallelism in Algorithms and Architectures, July 15-17, 2020, pages 11–21, Virtual Event, USA, 2020. ACM. doi:10.1145/3350755.3400256.
  • [4] Udit Agarwal, Vijaya Ramachandran, Valerie King, and Matteo Pontecorvi. A deterministic distributed algorithm for exact weighted all-pairs shortest paths in $\tilde{O}(n^{3/2})$ rounds. In Calvin Newport and Idit Keidar, editors, Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing, PODC 2018, July 23-27, 2018, pages 199–205, Egham, United Kingdom, 2018. ACM. doi:10.1145/3212734.3212773.
  • [5] Bertie Ancona, Keren Censor-Hillel, Mina Dalirrooyfard, Yuval Efron, and Virginia Vassilevska Williams. Distributed distance approximation. In Quentin Bramas, Rotem Oshman, and Paolo Romano, editors, 24th International Conference on Principles of Distributed Systems, OPODIS 2020, December 14-16, 2020, volume 184 of LIPIcs, pages 30:1–30:17, Strasbourg, France (Virtual Conference), 2020. Schloss Dagstuhl - Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.OPODIS.2020.30.
  • [6] John Augustine, Keerti Choudhary, Avi Cohen, David Peleg, Sumathi Sivasubramaniam, and Suman Sourav. Distributed graph realizations. In 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 18-22, 2020, pages 158–167, New Orleans, LA, USA, 2020. IEEE. doi:10.1109/IPDPS47924.2020.00026.
  • [7] John Augustine, Mohsen Ghaffari, Robert Gmyr, Kristian Hinnenthal, Christian Scheideler, Fabian Kuhn, and Jason Li. Distributed computation in node-capacitated networks. In Christian Scheideler and Petra Berenbrink, editors, The 31st ACM on Symposium on Parallelism in Algorithms and Architectures, SPAA 2019, June 22-24, 2019, pages 69–79, Phoenix, AZ, USA, 2019. ACM. doi:10.1145/3323165.3323195.
  • [8] John Augustine, Kristian Hinnenthal, Fabian Kuhn, Christian Scheideler, and Philipp Schneider. Shortest paths in a hybrid network model. In Shuchi Chawla, editor, Proceedings of the 2020 ACM-SIAM Symposium on Discrete Algorithms, SODA 2020, January 5-8, 2020, pages 1280–1299, Salt Lake City, UT, USA, 2020. SIAM. doi:10.1137/1.9781611975994.78.
  • [9] Florent Becker, Antonio Fernández Anta, Ivan Rapaport, and Eric Rémila. Brief announcement: A hierarchy of congested clique models, from broadcast to unicast. In Chryssis Georgiou and Paul G. Spirakis, editors, Proceedings of the 2015 ACM Symposium on Principles of Distributed Computing, PODC 2015, July 21 - 23, 2015, pages 167–169, Donostia-San Sebastián, Spain, 2015. ACM. doi:10.1145/2767386.2767447.
  • [10] Florent Becker, Antonio Fernández Anta, Ivan Rapaport, and Eric Rémila. The effect of range and bandwidth on the round complexity in the congested clique model. In Thang N. Dinh and My T. Thai, editors, Computing and Combinatorics - 22nd International Conference, COCOON 2016, August 2-4, 2016, Proceedings, volume 9797 of Lecture Notes in Computer Science, pages 182–193, Ho Chi Minh City, Vietnam, 2016. Springer. doi:10.1007/978-3-319-42634-1_15.
  • [11] Ruben Becker, Andreas Karrenbauer, Sebastian Krinninger, and Christoph Lenzen. Near-optimal approximate shortest paths and transshipment in distributed and streaming models. In Andréa W. Richa, editor, 31st International Symposium on Distributed Computing, DISC 2017, October 16-20, 2017, volume 91 of LIPIcs, pages 7:1–7:16, Vienna, Austria, 2017. Schloss Dagstuhl - Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.DISC.2017.7.
  • [12] Aaron Bernstein and Danupon Nanongkai. Distributed exact weighted all-pairs shortest paths in near-linear time. In Moses Charikar and Edith Cohen, editors, Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019, June 23-26, 2019, pages 334–342, Phoenix, AZ, USA, 2019. ACM. doi:10.1145/3313276.3316326.
  • [13] Keren Censor-Hillel, Yi-Jun Chang, François Le Gall, and Dean Leitersdorf. Tight distributed listing of cliques. In Dániel Marx, editor, Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms, SODA 2021, January 10 - 13, 2021, pages 2878–2891, Virtual Conference, 2021. SIAM. doi:10.1137/1.9781611976465.171.
  • [14] Keren Censor-Hillel, Michal Dory, Janne H. Korhonen, and Dean Leitersdorf. Fast approximate shortest paths in the congested clique. In Peter Robinson and Faith Ellen, editors, Proceedings of the 2019 ACM Symposium on Principles of Distributed Computing, PODC 2019, Canada, July 29 - August 2, 2019, pages 74–83, Toronto, ON, 2019. ACM. doi:10.1145/3293611.3331633.
  • [15] Keren Censor-Hillel, François Le Gall, and Dean Leitersdorf. On distributed listing of cliques. In Yuval Emek and Christian Cachin, editors, PODC ’20: ACM Symposium on Principles of Distributed Computing, August 3-7, 2020, pages 474–482, Virtual Event, Italy, 2020. ACM. doi:10.1145/3382734.3405742.
  • [16] Keren Censor-Hillel, Petteri Kaski, Janne H. Korhonen, Christoph Lenzen, Ami Paz, and Jukka Suomela. Algebraic methods in the congested clique. Distributed Comput., 32(6):461–478, 2019. doi:10.1007/s00446-016-0270-2.
  • [17] Keren Censor-Hillel, Seri Khoury, and Ami Paz. Quadratic and near-quadratic lower bounds for the CONGEST model. In Andréa W. Richa, editor, 31st International Symposium on Distributed Computing, DISC 2017, October 16-20, 2017, volume 91 of LIPIcs, pages 10:1–10:16, Vienna, Austria, 2017. Schloss Dagstuhl - Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.DISC.2017.10.
  • [18] Keren Censor-Hillel, Dean Leitersdorf, and Volodymyr Polosukhin. Distance computations in the hybrid network model via oracle simulations. In Markus Bläser and Benjamin Monmege, editors, 38th International Symposium on Theoretical Aspects of Computer Science, STACS 2021, March 16-19, 2021, volume 187 of LIPIcs, pages 21:1–21:19, Saarbrücken, Germany (Virtual Conference), 2021. Schloss Dagstuhl - Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.STACS.2021.21.
  • [19] Keren Censor-Hillel, Dean Leitersdorf, and Elia Turner. Sparse matrix multiplication and triangle listing in the congested clique model. Theor. Comput. Sci., 809:45–60, 2020. doi:10.1016/j.tcs.2019.11.006.
  • [20] Yi-Jun Chang, Seth Pettie, and Hengjie Zhang. Distributed triangle detection via expander decomposition. In Timothy M. Chan, editor, Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2019, January 6-9, 2019, pages 821–840, San Diego, California, USA, 2019. SIAM. doi:10.1137/1.9781611975482.51.
  • [21] Yi-Jun Chang and Thatchaphol Saranurak. Improved distributed expander decomposition and nearly optimal triangle enumeration. In Peter Robinson and Faith Ellen, editors, Proceedings of the 2019 ACM Symposium on Principles of Distributed Computing, PODC 2019, Canada, July 29 - August 2, 2019, pages 66–73, Toronto, ON, 2019. ACM. doi:10.1145/3293611.3331618.
  • [22] Yi-Jun Chang and Thatchaphol Saranurak. Deterministic distributed expander decomposition and routing with applications in distributed derandomization. In 61st IEEE Annual Symposium on Foundations of Computer Science, FOCS 2020, November 16-19, 2020, pages 377–388, Durham, NC, USA, 2020. IEEE. doi:10.1109/FOCS46700.2020.00043.
  • [23] Shiri Chechik and Doron Mukhtar. Single-source shortest paths in the CONGEST model with improved bound. In Yuval Emek and Christian Cachin, editors, PODC ’20: ACM Symposium on Principles of Distributed Computing, August 3-7, 2020, pages 464–473, Virtual Event, Italy, 2020. ACM. doi:10.1145/3382734.3405729.
  • [24] Edith Cohen. Polylog-time and near-linear work approximation scheme for undirected shortest paths. J. ACM, 47(1):132–166, 2000. doi:10.1145/331605.331610.
  • [25] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms, 3rd Edition. MIT Press, 2009. URL: http://mitpress.mit.edu/books/introduction-algorithms.
  • [26] Yong Cui, Hongyi Wang, and Xiuzhen Cheng. Channel allocation in wireless data center networks. In INFOCOM 2011. 30th IEEE International Conference on Computer Communications, Joint Conference of the IEEE Computer and Communications Societies, 10-15 April 2011, pages 1395–1403, Shanghai, China, 2011. IEEE. doi:10.1109/INFCOM.2011.5934925.
  • [27] Michael Dinitz, Michael Schapira, and Gal Shahaf. Approximate moore graphs are good expanders. J. Comb. Theory, Ser. B, 141:240–263, 2020. doi:10.1016/j.jctb.2019.08.003.
  • [28] Michal Dory and Merav Parter. Exponentially faster shortest paths in the congested clique. In Yuval Emek and Christian Cachin, editors, PODC ’20: ACM Symposium on Principles of Distributed Computing, August 3-7, 2020, pages 59–68, Virtual Event, Italy, 2020. ACM. doi:10.1145/3382734.3405711.
  • [29] Andrew Drucker, Fabian Kuhn, and Rotem Oshman. On the power of the congested clique model. In Magnús M. Halldórsson and Shlomi Dolev, editors, ACM Symposium on Principles of Distributed Computing, PODC ’14, July 15-18, 2014, pages 367–376, Paris, France, 2014. ACM. doi:10.1145/2611462.2611493.
  • [30] Talya Eden, Nimrod Fiat, Orr Fischer, Fabian Kuhn, and Rotem Oshman. Sublinear-time distributed algorithms for detecting small cliques and even cycles. In Jukka Suomela, editor, 33rd International Symposium on Distributed Computing, DISC 2019, October 14-18, 2019, volume 146 of LIPIcs, pages 15:1–15:16, Budapest, Hungary, 2019. Schloss Dagstuhl - Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.DISC.2019.15.
  • [31] Michael Elkin. Distributed exact shortest paths in sublinear time. In Hamed Hatami, Pierre McKenzie, and Valerie King, editors, Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2017, Canada, June 19-23, 2017, pages 757–770, Montreal, QC, 2017. ACM. doi:10.1145/3055399.3055452.
  • [32] Michael Feldmann, Kristian Hinnenthal, and Christian Scheideler. Fast hybrid network algorithms for shortest paths in sparse graphs. In Quentin Bramas, Rotem Oshman, and Paolo Romano, editors, 24th International Conference on Principles of Distributed Systems, OPODIS 2020, December 14-16, 2020, volume 184 of LIPIcs, pages 31:1–31:16, Strasbourg, France (Virtual Conference), 2020. Schloss Dagstuhl - Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.OPODIS.2020.31.
  • [33] Sebastian Forster and Danupon Nanongkai. A faster distributed single-source shortest paths algorithm. In Mikkel Thorup, editor, 59th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2018, October 7-9, 2018, pages 686–697, Paris, France, 2018. IEEE Computer Society. doi:10.1109/FOCS.2018.00071.
  • [34] Silvio Frischknecht, Stephan Holzer, and Roger Wattenhofer. Networks cannot compute their diameter in sublinear time. In Yuval Rabani, editor, Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2012, January 17-19, 2012, pages 1150–1162, Kyoto, Japan, 2012. SIAM. doi:10.1137/1.9781611973099.91.
  • [35] François Le Gall. Further algebraic algorithms in the congested clique model and applications to graph-theoretic problems. In Cyril Gavoille and David Ilcinkas, editors, Distributed Computing - 30th International Symposium, DISC 2016, September 27-29, 2016. Proceedings, volume 9888 of Lecture Notes in Computer Science, pages 57–70, Paris, France, 2016. Springer. doi:10.1007/978-3-662-53426-7_5.
  • [36] Mohsen Ghaffari and Bernhard Haeupler. Distributed algorithms for planar networks I: planar embedding. In George Giakkoupis, editor, Proceedings of the 2016 ACM Symposium on Principles of Distributed Computing, PODC 2016, July 25-28, 2016, pages 29–38, Chicago, IL, USA, 2016. ACM. doi:10.1145/2933057.2933109.
  • [37] Mohsen Ghaffari and Bernhard Haeupler. Distributed algorithms for planar networks II: low-congestion shortcuts, mst, and min-cut. In Robert Krauthgamer, editor, Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2016, January 10-12, 2016, pages 202–219, Arlington, VA, USA, 2016. SIAM. doi:10.1137/1.9781611974331.ch16.
  • [38] Mohsen Ghaffari, Fabian Kuhn, and Hsin-Hao Su. Distributed MST and routing in almost mixing time. In Elad Michael Schiller and Alexander A. Schwarzmann, editors, Proceedings of the ACM Symposium on Principles of Distributed Computing, PODC 2017, July 25-27, 2017, pages 131–140, Washington, DC, USA, 2017. ACM. doi:10.1145/3087801.3087827.
  • [39] Mohsen Ghaffari and Jason Li. Improved distributed algorithms for exact shortest paths. In Ilias Diakonikolas, David Kempe, and Monika Henzinger, editors, Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2018, June 25-29, 2018, pages 431–444, Los Angeles, CA, USA, 2018. ACM. doi:10.1145/3188745.3188948.
  • [40] Mohsen Ghaffari and Jason Li. New distributed algorithms in almost mixing time via transformations from parallel algorithms. In Ulrich Schmid and Josef Widder, editors, 32nd International Symposium on Distributed Computing, DISC 2018, October 15-19, 2018, volume 121 of LIPIcs, pages 31:1–31:16, New Orleans, LA, USA, 2018. Schloss Dagstuhl - Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.DISC.2018.31.
  • [41] Ofer Grossman, Seri Khoury, and Ami Paz. Improved hardness of approximation of diameter in the CONGEST model. In Hagit Attiya, editor, 34th International Symposium on Distributed Computing, DISC 2020, October 12-16, 2020, volume 179 of LIPIcs, pages 19:1–19:16, Virtual Conference, 2020. Schloss Dagstuhl - Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.DISC.2020.19.
  • [42] Kai Han, Zhiming Hu, Jun Luo, and Liu Xiang. RUSH: routing and scheduling for hybrid data center networks. In 2015 IEEE Conference on Computer Communications, INFOCOM 2015, April 26 - May 1, 2015, pages 415–423, Kowloon, Hong Kong, 2015. IEEE. doi:10.1109/INFOCOM.2015.7218407.
  • [43] Vipul Harsh, Sangeetha Abdu Jyothi, Inderdeep Singh, and Philip Brighten Godfrey. Expander datacenters: From theory to practice. CoRR, abs/1811.00212, 2018. URL: http://arxiv.org/abs/1811.00212, arXiv:1811.00212.
  • [44] Monika Henzinger, Sebastian Krinninger, and Danupon Nanongkai. A deterministic almost-tight distributed algorithm for approximating single-source shortest paths. In Daniel Wichs and Yishay Mansour, editors, Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2016, June 18-21, 2016, pages 489–498, Cambridge, MA, USA, 2016. ACM. doi:10.1145/2897518.2897638.
  • [45] Stephan Holzer and Nathan Pinsker. Approximation of distances and shortest paths in the broadcast congest clique. In Emmanuelle Anceaume, Christian Cachin, and Maria Gradinariu Potop-Butucaru, editors, 19th International Conference on Principles of Distributed Systems, OPODIS 2015, December 14-17, 2015, volume 46 of LIPIcs, pages 6:1–6:16, Rennes, France, 2015. Schloss Dagstuhl - Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.OPODIS.2015.6.
  • [46] Stephan Holzer and Roger Wattenhofer. Optimal distributed all pairs shortest paths and applications. In Darek Kowalski and Alessandro Panconesi, editors, ACM Symposium on Principles of Distributed Computing, PODC ’12, July 16-18, 2012, pages 355–364, Funchal, Madeira, Portugal, 2012. ACM. doi:10.1145/2332432.2332504.
  • [47] He Huang, Xiangke Liao, Shanshan Li, Shaoliang Peng, Xiaodong Liu, and Bin Lin. The architecture and traffic management of wireless collaborated hybrid data center network. In Dah Ming Chiu, Jia Wang, Paul Barford, and Srinivasan Seshan, editors, ACM SIGCOMM 2013 Conference, SIGCOMM’13, August 12-16, 2013, pages 511–512, Hong Kong, China, 2013. ACM. doi:10.1145/2486001.2491724.
  • [48] Taisuke Izumi, François Le Gall, and Frédéric Magniez. Quantum distributed algorithm for triangle finding in the CONGEST model. In Christophe Paul and Markus Bläser, editors, 37th International Symposium on Theoretical Aspects of Computer Science, STACS 2020, March 10-13, 2020, volume 154 of LIPIcs, pages 23:1–23:13, Montpellier, France, 2020. Schloss Dagstuhl - Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.STACS.2020.23.
  • [49] Fabian Kuhn and Philipp Schneider. Computing shortest paths and diameter in the hybrid network model. In Yuval Emek and Christian Cachin, editors, PODC ’20: ACM Symposium on Principles of Distributed Computing, August 3-7, 2020, pages 109–118, Virtual Event, Italy, 2020. ACM. doi:10.1145/3382734.3405719.
  • [50] Christoph Lenzen and Boaz Patt-Shamir. Fast routing table construction using small messages: extended abstract. In Dan Boneh, Tim Roughgarden, and Joan Feigenbaum, editors, Symposium on Theory of Computing Conference, STOC’13, June 1-4, 2013, pages 381–390, Palo Alto, CA, USA, 2013. ACM. doi:10.1145/2488608.2488656.
  • [51] Christoph Lenzen and Boaz Patt-Shamir. Fast partial distance estimation and applications. In Chryssis Georgiou and Paul G. Spirakis, editors, Proceedings of the 2015 ACM Symposium on Principles of Distributed Computing, PODC 2015, July 21 - 23, 2015, pages 153–162, Donostia-San Sebastián, Spain, 2015. ACM. doi:10.1145/2767386.2767398.
  • [52] Christoph Lenzen and David Peleg. Efficient distributed source detection with limited bandwidth. In Panagiota Fatourou and Gadi Taubenfeld, editors, ACM Symposium on Principles of Distributed Computing, PODC ’13, Canada, July 22-24, 2013, pages 375–382, Montreal, QC, 2013. ACM. doi:10.1145/2484239.2484262.
  • [53] Jason Li and Merav Parter. Planar diameter via metric compression. In Moses Charikar and Edith Cohen, editors, Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019, June 23-26, 2019, pages 152–163, Phoenix, AZ, USA, 2019. ACM. doi:10.1145/3313276.3316358.
  • [54] Danupon Nanongkai. Distributed approximation algorithms for weighted shortest paths. In David B. Shmoys, editor, Symposium on Theory of Computing, STOC 2014, May 31 - June 03, 2014, pages 565–573, New York, NY, USA, 2014. ACM. doi:10.1145/2591796.2591850.
  • [55] Merav Parter. Distributed planar reachability in nearly optimal time. In Hagit Attiya, editor, 34th International Symposium on Distributed Computing, DISC 2020, October 12-16, 2020, volume 179 of LIPIcs, pages 38:1–38:17, Virtual Conference, 2020. Schloss Dagstuhl - Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.DISC.2020.38.
  • [56] David Peleg. Distributed Computing: A Locality-Sensitive Approach. Society for Industrial and Applied Mathematics, USA, 2000.
  • [57] David Peleg, Liam Roditty, and Elad Tal. Distributed algorithms for network diameter and girth. In Artur Czumaj, Kurt Mehlhorn, Andrew M. Pitts, and Roger Wattenhofer, editors, Automata, Languages, and Programming - 39th International Colloquium, ICALP 2012, July 9-13, 2012, Proceedings, Part II, volume 7392 of Lecture Notes in Computer Science, pages 660–672, Warwick, UK, 2012. Springer. doi:10.1007/978-3-642-31585-5_58.
  • [58] Atish Das Sarma, Stephan Holzer, Liah Kor, Amos Korman, Danupon Nanongkai, Gopal Pandurangan, David Peleg, and Roger Wattenhofer. Distributed verification and hardness of distributed approximation. SIAM J. Comput., 41(5):1235–1265, 2012. doi:10.1137/11085178X.
  • [59] Jeanette P. Schmidt, Alan Siegel, and Aravind Srinivasan. Chernoff-hoeffding bounds for applications with limited independence. SIAM J. Discret. Math., 8(2):223–250, 1995. doi:10.1137/S089548019223872X.
  • [60] Hsin-Hao Su and Hoa T. Vu. Distributed dense subgraph detection and low outdegree orientation. In Hagit Attiya, editor, 34th International Symposium on Distributed Computing, DISC 2020, October 12-16, 2020, volume 179 of LIPIcs, pages 15:1–15:18, Virtual Conference, 2020. Schloss Dagstuhl - Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.DISC.2020.15.
  • [61] Salil P. Vadhan. Pseudorandomness. Found. Trends Theor. Comput. Sci., 7(1-3):1–336, 2012. doi:10.1561/0400000010.
  • [62] Guohui Wang, David G. Andersen, Michael Kaminsky, Konstantina Papagiannaki, T. S. Eugene Ng, Michael Kozuch, and Michael P. Ryan. c-through: part-time optics in data centers. In Shivkumar Kalyanaraman, Venkata N. Padmanabhan, K. K. Ramakrishnan, Rajeev Shorey, and Geoffrey M. Voelker, editors, Proceedings of the ACM SIGCOMM 2010 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, August 30 -September 3, 2010, pages 327–338, New Delhi, India, 2010. ACM. doi:10.1145/1851182.1851222.

Appendix A Preliminaries – Extended

A.1 Definitions

Additional Models. Throughout the paper, we also refer to the 𝖢𝗈𝗇𝗀𝖾𝗌𝗍𝖾𝖽𝖢𝗅𝗂𝗊𝗎𝖾\mathsf{Congested\ Clique} and 𝖡𝗋𝗈𝖺𝖽𝖼𝖺𝗌𝗍𝖢𝗈𝗇𝗀𝖾𝗌𝗍𝖾𝖽𝖢𝗅𝗂𝗊𝗎𝖾\mathsf{Broadcast\ Congested\ Clique} models. Both are synchronous models, where every node can communicate in each round with every other node by sending messages of O(logn)O(\log n) bits. In the 𝖢𝗈𝗇𝗀𝖾𝗌𝗍𝖾𝖽𝖢𝗅𝗂𝗊𝗎𝖾\mathsf{Congested\ Clique} model, the messages between every pair of nodes can be unique, while in the 𝖡𝗋𝗈𝖺𝖽𝖼𝖺𝗌𝗍𝖢𝗈𝗇𝗀𝖾𝗌𝗍𝖾𝖽𝖢𝗅𝗂𝗊𝗎𝖾\mathsf{Broadcast\ Congested\ Clique} model each node sends the same message to all the other nodes.

Similarly to [7], we define a distributive aggregate function and the Aggregate-and-Broadcast problem.

Definition 8 (Aggregate Function).

An aggregate function ff maps a multiset S={x1,,xN}S=\set{x_{1},...,x_{N}} of input values to some value f(S)f(S). An aggregate function ff is called distributive if there is an aggregate function gg such that for any multiset SS and any partition S1,,SS_{1},\cdots,S_{\ell} of SS, it holds that f(S)=g(f(S1),,f(S))f(S)=g(f(S_{1}),...,f(S_{\ell})).
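For instance (an illustration, not taken from [7]), min is a distributive aggregate function with combiner g = min, as the following minimal Python sketch checks:

```python
# An illustration of Definition 8 (not from the paper): f = min is a
# distributive aggregate function, with combiner g = min, since
# min(S) = min(min(S_1), ..., min(S_l)) for any partition of S.

def f(values):            # the aggregate function f on a multiset
    return min(values)

def g(partials):          # the combiner g acting on partial aggregates
    return min(partials)

S = [7, 3, 9, 3, 5]
S1, S2 = S[:2], S[2:]                 # an arbitrary partition of S
assert f(S) == g([f(S1), f(S2)])      # distributivity of f
```

Sum and max are likewise distributive, whereas, e.g., the median is not.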

Definition 9 (Aggregate-and-Broadcast Problem).

In the Aggregate-and-Broadcast Problem we are given a distributive aggregate function ff and each node uu stores exactly one input value val(u)val(u). The goal is to let every node learn f({val(u)}uV)f(\Set{val(u)}_{u\in V}).

A.2 Mathematical Tools

Similarly to [49], we will use families of kk-wise independent functions.

Definition 10 (Family of kk-wise Independent Random Functions).

For finite sets A, B, let ℋ = {h: A → B} be a family of hash functions. Then ℋ is called k-wise independent if, for a uniformly random function h ∈ ℋ and for any k distinct keys {a_i}_{i=1}^k ⊆ A, the values b_i = h(a_i) are independent and uniformly distributed random variables in B.

In particular, we are interested in a hash function on which nodes can agree within a small amount of communication.

Claim A.1 (Seed).

[61][49, Lemma D.1] For A={0,1}aA=\set{0,1}^{a} and B={0,1}bB=\set{0,1}^{b}, there is a family of kk-wise independent hash functions ={h:AB}\mathcal{H}=\set{h\colon A\mapsto B} such that selecting a function from \mathcal{H} requires kmax{a,b}k\cdot\max\set{a,b} random bits and computing h(x)h(x) for any xAx\in A can be done in poly(a,b,k)\operatorname{poly}(a,b,k) time.
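As a concrete illustration of such a family, the following is a minimal sketch of the classical polynomial construction over a prime field (an assumed construction for illustration, not necessarily the exact one referenced from [61]): h(x) = Σ_{i<k} c_i·x^i mod p, whose seed is the k uniformly random coefficients c_0, …, c_{k−1}.

```python
import random

# A sketch of a classical k-wise independent family (an illustrative
# construction): a uniformly random polynomial of degree < k over the prime
# field F_p. Its evaluations at any k distinct keys are independent and
# uniform over F_p, and the seed is the k coefficients, i.e., roughly
# k * ceil(log p) random bits.

def sample_seed(k, p, rng=random):
    return [rng.randrange(p) for _ in range(k)]

def h(x, seed, p):
    # Horner evaluation of the degree-(k-1) polynomial at x, modulo p.
    acc = 0
    for c in reversed(seed):
        acc = (acc * x + c) % p
    return acc

p = 2**31 - 1                 # a prime at least as large as the key space A
seed = sample_seed(k=4, p=p)  # a 4-wise independent hash function
print(h(12345, seed, p))
```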

Unlike [49], we do not necessarily apply the hash function h ∈ ℋ, sampled from the family of k-wise independent random functions, to distinct sets of arguments, but rather to a multiset of arguments in which each argument appears at most O~(1) times. Hence, we show in Claim A.2 (Conflicts) a property similar to [49, Lemma D.2], which bounds the number of collisions.

Claim A.2 (Conflicts).

There exists a value k = Θ~(1) such that the following holds for sufficiently large n. Given a function h: A → B (with |A|, |B| = Θ~(n)) sampled from a family ℋ of k-wise independent hash functions, and a multiset of keys A′ = {a_i}_{i=1}^{O~(n)} in which each key appears at most O~(1) times, each value b ∈ B appears in the multiset of values B′ = h(A′) = {h(a_i)}_{i=1}^{O~(n)} at most O~(1) times w.h.p.

Proof of Claim A.2.

Split A′ greedily into O~(1) sets of distinct keys {A′_j}_{j=1}^{O~(1)}. Consider some A′_j. Let ℋ be a family of k-wise independent hash functions, for some k to be determined. By the definition of h ∈ ℋ, the random variables B′_j = h(A′_j) = {h(a): a ∈ A′_j} ⊆ B′ are k-wise independent and uniformly distributed in B. Thus, the probability to sample some particular b ∈ B is 1/|B| = Θ~(1/n). By a Chernoff Bound for variables with bounded independence [59, Theorem 2] and a union bound over all b ∈ B and all A′_j, there is a large enough k = Θ~(1) such that each value b ∈ B appears in B′_j at most O~(1) times w.h.p. Thus, each value b appears in B′ = ⋃_{j=1}^{O~(1)} B′_j at most O~(1)·O~(1) = O~(1) times w.h.p. ∎

A.3 Distance Tools

Claim A.3 (APSP using kk-nearest and MSSP).

(see e.g. [14]) Let G = (V,E,w) be a weighted graph, let c be a constant, let k be a value, and let A be a set of nodes, each marked independently with probability at least (c+1)·log(n)/k.

With probability at least 1 − n^{−c}, it holds for every node v ∈ V that N^k(v) ∩ A ≠ ∅, where N^k(v) denotes the k nodes nearest to v. Denote by p(s) ∈ A ∩ N^k(s) one of the closest nodes to s in A ∩ N^k(s). Denote by d̃: A × V → ℝ an α-approximation of the distances from A to the other nodes, for some α. With probability at least 1 − n^{−c}, for any pair of nodes s, t ∈ V, it holds that d̂(s,t) = d(s,p(s)) + d̃(p(s),t) is a 3α-approximate weighted distance between s and t.
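For intuition, the approximation factor follows from the triangle inequality. The derivation below is a sketch under the assumptions that d ≤ d̃ ≤ α·d with α ≥ 1, and that d(s,p(s)) ≤ d(s,t), which holds whenever t ∉ N^k(s) (distances within N^k(s) are assumed to be handled exactly by the k-nearest computation):

d̂(s,t) = d(s,p(s)) + d̃(p(s),t) ≤ d(s,p(s)) + α·(d(p(s),s) + d(s,t)) = (1+α)·d(s,p(s)) + α·d(s,t) ≤ (1+2α)·d(s,t) ≤ 3α·d(s,t),

and, in the other direction, d̂(s,t) ≥ d(s,p(s)) + d(p(s),t) ≥ d(s,t) by the triangle inequality.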

Appendix B The 𝖠𝗇𝗈𝗇𝗒𝗆𝗈𝗎𝗌𝖢𝖺𝗉𝖺𝖼𝗂𝗍𝖺𝗍𝖾𝖽\mathsf{Anonymous\ Capacitated} Model – Extended

Section B.1 contains proofs of various technical tools for routing information in the 𝖠𝖢(𝖼) model – we note that if taken as black boxes, its contents can be skipped without harming the understanding of the main contributions of this section. Then, we show how to build carrier configurations and how to work with them in Section B.2. We use the sparse matrix multiplication algorithm of Theorem 3.1 to construct hopsets in Section B.3, which eventually allows us to obtain our fast algorithms for SSSP and MSSP in Section B.4 and Section B.5, respectively.

B.1 General Tools

We show basic tools which are useful in the 𝖠𝖢(𝖼)\mathsf{AC(c)} model, for overcoming the anonymity challenges, as well as for solving problems related to communication with limited bandwidth.

We introduce the following notation. Given a set of nodes WVW\subseteq V, denote by Tokens(W)Tokens(W) the set of pairs of communication tokens and identifiers of the nodes in WW.

B.1.1 Basic Message Routing

Lemma B.1 (Routing).

Given a set of messages and a globally known value kk, if each node desires to send at most kk messages and knows the communication tokens of their recipients, and each node is the recipient of at most kk messages, then it is possible to deliver the messages in O~(k/c+1){\tilde{{O}}}(k/c+1) rounds of the 𝖠𝖢(𝖼)\mathsf{AC(c)} model, w.h.p.

Proof of Lemma B.1.

Denote by mvm_{v} the messages that node vv desires to send. We proceed in Θ(klog2n/c)\Theta{(\left\lceil k\log^{2}{n}/c\right\rceil)} rounds, where in each round each node vv samples messages from mvm_{v} that are not yet sent, independently with probability Θ(cklogn)\Theta{(\frac{c}{k\log n})}, and sends them to their destinations. The probability that some message is not sampled during this procedure is (1Θ(cklogn))Θ(klog2n/c)=O(1/polyn)\left(1-\Theta{(\frac{c}{k\log n})}\right)^{\Theta{(\left\lceil k\log^{2}{n}/c\right\rceil)}}={O}(1/\operatorname{poly}{n}). Thus, by applying a union bound over all messages, each message is sent w.h.p.

For any given round, the probability for a specific message to be sent or received by some node is at most Θ(c/(k·log n)) (it is zero for rounds after the one in which it has been sent). Thus, due to the independence between messages, by a Chernoff Bound and a union bound over senders, receivers, and rounds, in each round each node sends or receives at most c messages w.h.p. ∎
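The following minimal simulation sketch (with illustrative constants, a single sender, and abstracted delivery – simplifying assumptions, not the distributed protocol itself) mirrors the sampling schedule used in this proof:

```python
import math, random

# A simulation sketch of the schedule in the proof of Lemma B.1: in each of
# Theta(k log^2 n / c) rounds, every still-unsent message is sampled
# independently with probability ~ c/(k log n) and delivered.

def route(k, c, n, rng=random):
    p = min(1.0, c / (k * math.log(n)))          # per-round sampling prob.
    rounds = math.ceil(k * math.log(n) ** 2 / c) # Theta(k log^2 n / c)
    unsent = set(range(k))
    for _ in range(rounds):
        sent_now = {m for m in unsent if rng.random() < p}
        unsent -= sent_now
    return len(unsent)                           # 0 w.h.p.

print(route(k=200, c=10, n=2**16))
```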

B.1.2 Anonymous Communication Primitives

Definition 11 (Communication Tree).

Given a graph G = (V,E) and a node s ∈ V, a communication tree rooted at s in G is an O~(1)-depth directed tree which is rooted at s and spans V, such that each node has at most 2 edges directed away from it. A communication tree over W ⊆ V satisfies the conditions above, yet spans only W rather than V.

A communication tree rooted at s allows efficiently broadcasting messages from s to the entire graph, as well as computing aggregation functions.

We show it is possible to build many communication trees in parallel.

Lemma B.2 (Constructing Communication Trees).

Given a set of nodes SS, it is possible to construct for each sSs\in S a communication tree rooted at ss, TsT_{s}, such that each node in the graph knows the edges incident to it in each tree. This takes O~(|S|/c+1){\tilde{{O}}}(|S|/c+1) rounds of the 𝖠𝖢(𝖼)\mathsf{AC(c)} model, w.h.p.

Proof of Lemma B.2.

Consider the task of constructing TsT_{s} for a single node ss. Node ss randomly samples two nodes, v1,v2v_{1},v_{2}, and tells them it is their parent in TsT_{s}. Nodes v1,v2v_{1},v_{2} each sample two nodes and repeat this process. At each step, a node viv_{i} might sample a node vjv_{j} which is already in TsT_{s}. In such a case, vjv_{j} rejects the demand of viv_{i} to add it as a child. Thus, when building the next level of the tree, we repeat the choosing step O~(1)\tilde{O}(1) times, ensuring, w.h.p., that each node has two nodes as its children. Notice that this ensures, w.h.p., that at every level in TsT_{s}, except for the last, each node has exactly two children, and thus the depth of TsT_{s} is O~(1)\tilde{O}(1) w.h.p. Thus, in O~(1)\tilde{O}(1) rounds, a communication tree from ss which spans the entire graph is constructed.

In each round, every node sends and receives O~(1) messages, w.h.p., thus we can perform this for O~(c) nodes in parallel, taking O~(|S|/c + 1) rounds overall to build such trees for all s ∈ S. ∎
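A centralized sketch of this random doubling process (an illustration that abstracts away the token exchange and the O~(1) retries per level; the frontier simply resamples until it finds nodes outside the tree):

```python
import random

# A centralized sketch (illustration only) of building one communication
# tree T_s: every frontier node samples until it acquires two children not
# yet in the tree, so each level roughly doubles and the depth is O(log n).

def build_tree(n, s, rng=random):
    children = {v: [] for v in range(n)}
    in_tree = {s}
    frontier = [s]
    while len(in_tree) < n:
        next_frontier = []
        for v in frontier:
            while len(children[v]) < 2 and len(in_tree) < n:
                u = rng.randrange(n)       # sample a uniformly random node
                if u not in in_tree:       # u rejects if already in T_s
                    children[v].append(u)
                    in_tree.add(u)
                    next_frontier.append(u)
        frontier = next_frontier
    return children

tree = build_tree(n=1000, s=0)
```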

Lemma B.3 (Message Doubling on Communication Trees).

Let WVW\subseteq V be a set of nodes, and TsT_{s} a communication tree rooted at sWs\in W, which spans WW. It is possible for ss to broadcast a set of MM messages to the entire set WW within O~(|M|/c+1)\tilde{O}(|M|/c+1) rounds of the 𝖠𝖢(𝖼)\mathsf{AC(c)} model, w.h.p., while utilizing only the communication bandwidth of the nodes in WW. Likewise, it is possible to compute kk aggregation functions on values of the nodes in WW, in O~(k/c+1)\tilde{O}(k/c+1) rounds.

Proof of Lemma B.3.

On the first round, ss sends to its children in TsT_{s}, ss_{\ell} and srs_{r}, some set Q1MQ_{1}\subseteq M, where |Q1|=c/2|Q_{1}|=c/2. On the second round, ss_{\ell} and srs_{r} forward Q1Q_{1} to their children, while ss sends them some other such set Q2Q_{2}. This continues for |M|/c+O~(1)|M|/c+\tilde{O}(1) rounds. Notice that every node sends and receives at most cc messages per round.

To solve kk aggregation functions, reversing the flow of messages in the above algorithm suffices. ∎
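As a quick illustration of the pipelining argument in the proof above (a sketch; the batch size c/2 and the tree depth are taken as parameters):

```python
import math

# A sketch of the round count in Lemma B.3: the root injects a new batch of
# c/2 messages every round, and the last batch needs depth(T) more rounds to
# reach the leaves, giving ceil(|M|/(c/2)) + depth(T) = O(|M|/c + depth(T)).

def pipelined_broadcast_rounds(num_messages, c, depth):
    batches = math.ceil(num_messages / (c // 2))
    return batches + depth

print(pipelined_broadcast_rounds(num_messages=10_000, c=100, depth=20))  # 220
```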

Lemma B.4 (Synchronization).

In the 𝖠𝖢(𝖼)\mathsf{AC(c)} model, given a communication tree TT rooted at some node ss and assuming that every node vv has a value val(v)val(v), it is possible in O~(1)\tilde{O}(1) rounds to ensure that vv knows the sum prev(v)prev(v) of all the values val(u)val(u) of the nodes uu which come before it in the in-order traversal of TT. Further, it is possible to solve kk instances of this problem in parallel in O~(k/c+1)\tilde{O}(k/c+1) rounds.

Proof of Lemma B.4.

We treat only a single value; handling k values in parallel follows since, as in the single-value case, each node sends and receives only a constant number of messages per round. For a node v, denote by v_ℓ, v_r its left and right children in T, respectively, and by v_p its parent.

Starting from the leaves of T, sum the values upwards towards the root. To clarify, a leaf v sends val(v) to v_p. Denote by subtreeVal(v) the sum of all of the values of the nodes in the subtree of T rooted at v. Once v receives subtreeVal(v_ℓ) and subtreeVal(v_r) from its children, it sends up to v_p the sum subtreeVal(v_ℓ) + val(v) + subtreeVal(v_r).

Then, the root ss of TT sets prev(s)=subtreeVal(s)prev(s)=subtreeVal(s_{\ell}). Further, ss sends to ss_{\ell} the value zero, and sends to srs_{r} the sum prev(s)+val(s)prev(s)+val(s). Then, every node vv, upon receiving a value ii from vpv_{p}, sets prev(v)=i+subtreeVal(v)prev(v)=i+subtreeVal(v_{\ell}), forwards the value ii to vv_{\ell}, and sends prev(v)+val(v)prev(v)+val(v) to vrv_{r}. This algorithm takes O~(1)\tilde{O}(1) rounds and achieves the desired result. ∎
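The two passes of this proof can be summarized by the following sequential sketch (an illustration; the tree is represented as a dict mapping each node to its (left, right) children, with None marking a missing child):

```python
# A sequential sketch (illustration) of Lemma B.4: an upward pass computes
# subtreeVal(v), then a downward pass computes prev(v), the sum of val(u)
# over all u preceding v in the in-order traversal of T.

def subtree_sums(tree, val, v, sub):
    if v is None:
        return 0
    left, right = tree.get(v, (None, None))
    sub[v] = subtree_sums(tree, val, left, sub) + val[v] + \
             subtree_sums(tree, val, right, sub)
    return sub[v]

def prev_values(tree, val, v, incoming, sub, prev):
    if v is None:
        return
    left, right = tree.get(v, (None, None))
    prev[v] = incoming + (sub[left] if left is not None else 0)
    prev_values(tree, val, left, incoming, sub, prev)            # forward i
    prev_values(tree, val, right, prev[v] + val[v], sub, prev)   # prev+val

# Usage: in-order traversal is a, s, b, so prev == {'a': 0, 's': 1, 'b': 3}.
tree = {'s': ('a', 'b'), 'a': (None, None), 'b': (None, None)}
val = {'s': 2, 'a': 1, 'b': 4}
sub, prev = {}, {}
subtree_sums(tree, val, 's', sub)
prev_values(tree, val, 's', 0, sub, prev)
```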

Lemma B.5 (Broadcasting).

Let MM be a set of messages distributed across the nodes arbitrarily. It is possible to broadcast this set of messages to all nodes in O~(|M|/c+1)\tilde{O}(|M|/c+1) rounds of the 𝖠𝖢(𝖼)\mathsf{AC(c)} model, w.h.p.

Proof of Lemma B.5.

Construct a communication tree T from the node with ID 1, and then send down T the communication token of node 1, which implies that from now on every node can communicate with node 1. Using Lemma B.1 (Routing), node 1 receives all of M in O~(|M|/c + 1) rounds.

Once node 1 knows all of MM, we send MM down along TT in O~(|M|/c+1)\tilde{O}(|M|/c+1) rounds, using Lemma B.3. ∎

Reversing the flow of messages in the proof of Lemma B.5 proves Corollary B.6.

Corollary B.6 (Aggregation).

It is possible to solve c aggregation problems in O~(1) rounds of the 𝖠𝖢(𝖼) model. That is, if every node v has a vector of values {v_1, …, v_c}, and we denote, for i ∈ [c], S_i = {v_i | v ∈ V}, then given c aggregation functions f_1, …, f_c, it is possible to ensure within O~(1) rounds that all the nodes know the values f_1(S_1), …, f_c(S_c).

B.1.3 Communication Tools Within Groups of Nodes

We show the following communication tools related to allowing subsets of nodes in the graph to communicate with one another.

Lemma B.7 (Grouping) allows grouping together disjoint sets of nodes such that each node in a given set knows all the communication tokens of the other nodes in the set. Corollary B.9 (Group Broadcasting and Aggregating) allows a single node in the set to quickly broadcast messages to, or perform aggregation operations on, the set.

Lemma B.7 (Grouping).

Let V1,,VkVV_{1},\dots,V_{k}\subseteq V be disjoint sets where 1|Vi|p1\leq|V_{i}|\leq p, for some pp, and where every node vv knows if and to which set it belongs. It is possible in O~(k/c+p/c+1){\tilde{{O}}}(k/c+p/c+1) rounds of the 𝖠𝖢(𝖼)\mathsf{AC(c)} model to ensure that, for each ii, every node vViv\in V_{i} knows Tokens(Vi)Tokens(V_{i}) w.h.p.

Proof of Lemma B.7.

Due to the definition of the sets, we know that k ≤ n. Thus, the nodes [k] (with identifiers 1, …, k) broadcast Tokens([k]) using Lemma B.5 (Broadcasting) in O~(k/c + 1) rounds. Using Lemma B.1 (Routing), for each i ∈ [k], the nodes in V_i send Tokens(V_i) to node i, in O~(p/c + 1) rounds.

Fix i ∈ [k]. Node i performs message duplication to tell all nodes in V_i the set Tokens(V_i), as follows. Node i chooses some m_i ∈ V_i, and tells it, within O(p/c + 1) rounds, Tokens(V_i). Now, i and m_i each proceed to tell another node in V_i all this information, doubling the number of nodes which know Tokens(V_i) to four. This continues for O~(1) iterations, each taking O(p/c + 1) rounds. ∎

Claim B.8 (Group Communication Tree Construction).

Given a set of nodes W, where every v ∈ W knows Tokens(W), it is possible to build a communication tree T over W such that the in-order traversal of the tree imposes any desired ordering of the nodes in W. This is done using local computation only and requires no communication.

Proof of Claim B.8.

Since each node in WW knows Tokens(W)Tokens(W), this simply entails having every node locally decide which other nodes in WW are its parent, left, and right children in the output tree, TT. ∎

Corollary B.9 (Group Broadcasting and Aggregating).

Given a set of nodes W, where every v ∈ W knows Tokens(W), it is possible for a single node s ∈ W to broadcast a set of M messages to the entire set W within O~(|M|/c + 1) rounds of the 𝖠𝖢(𝖼) model, w.h.p., while utilizing only the communication bandwidth of the nodes in W. Likewise, it is possible to compute k aggregation functions on values of the nodes in W, in O~(k/c + 1) rounds.

Lemma B.10 (Group Multicasting) extends Corollary B.9 (Group Broadcasting and Aggregating) in order to allow nodes to efficiently send multicast messages within their given set.

Lemma B.10 (Group Multicasting).

Given a set of nodes WW, where every vWv\in W knows Tokens(W)Tokens(W), and a value kk such that every node vWv\in W desires to multicast at most kk messages to mvm_{v} nodes in WW, where each node uWu\in W is the destination of messages originating in at most one node, it is possible to perform the communication task in O~(k/c+m/c+1)\tilde{O}(k/c+m/c+1) rounds of the 𝖠𝖢(𝖼)\mathsf{AC(c)} model, where mm is an upper bound for all mvm_{v}.

Proof of Lemma B.10.

Assume, w.l.o.g., that the nodes in W have identifiers [|W|]. Then, we use Claim B.8 (Group Communication Tree Construction) to construct a communication tree T over W, with the demand that the in-order traversal of the tree will output the nodes of W in ascending order from 1 to |W|. Then, execute, in O~(1) rounds, the algorithm given by Lemma B.4 (Synchronization) over T in order to ensure that every node i ∈ W knows the sum s_i = Σ_{j∈[i−1]} m_j.

Now, node ii tells node si+1s_{i}+1 the values sis_{i} and mim_{i}. Then, node ii tells these values to si+2s_{i}+2, while node si+1s_{i}+1 tells them to node si+3s_{i}+3, and we continue in this fashion for O~(1)\tilde{O}(1) rounds until all nodes in [si+1,si+mi][s_{i}+1,s_{i}+m_{i}] know the values sis_{i} and mim_{i}. We now call these nodes the gateways of node ii. Notice that every node has distinct gateways – that is, no node is a gateway of more than one node.

Node ii now sends all of its kk messages to its gateways. To do so, it tells its first gateway all of its kk messages in O(k/c+1)O(k/c+1) rounds. Then, it tells its second gateway, while the first gateway tells the third gateway. This continues for O~(1)\tilde{O}(1) iterations, each taking O(k/c+1)O(k/c+1) rounds, until all the gateways of node ii know all the messages of ii. Next, in O~(mi/c+1)\tilde{O}(m_{i}/c+1) rounds, node ii sequentially tells its jthj^{th} gateway the identifier of the jthj^{th} node which is set to receive the multicast messages from ii. Finally, every jthj^{th} gateway forwards the kk messages to the target which ii tells it to send the messages to, in O(k/c+1)O(k/c+1) rounds. ∎

B.2 Carrier Configurations

Figure 1: The Carrier Configuration Distributed Data Structure
In this example, k = 2, C_v^out = {v_1, v_2, v_3, v_4}, C_u^in = {u_1, u_2, u_3}. The two arrays denote which edges v and u have, with a checkmark indicating the existence of an edge. The node u_1 holds information about u and the first four nodes. That is, it knows that there are edges from the first two nodes to u, and that there are no edges from the following two nodes to u. Notice that in this case v_2 and u_1 both hold the edge e = (v,u) and thus will know its weight, direction, the communication tokens of v and u, and the communication tokens of each other (v_2 and u_1). Further, v, u have communication trees (not shown), which allow them to perform broadcast and aggregate operations on all of C_v^out, C_u^in, respectively.
Definition 12 (Carrier Configuration).

Given a set of nodes VV, a data structure CC is a Carrier Configuration holding a graph G=(V,E)G=(V,E) with average degree k=|E|/|V|k=|E|/|V|, if for every node vVv\in V the following hold:  
 
Carrier Node Allocations

  1.

    Cvout,CvinVC_{v}^{out},C_{v}^{in}\subseteq V are the outgoing and incoming carrier nodes of vv, where |Cvout|=degout(v)/k|C_{v}^{out}|=\lceil\deg^{out}(v)/k\rceil, |Cvin|=degin(v)/k|C_{v}^{in}|=\lceil\deg^{in}(v)/k\rceil.

  2.

    vv is in at most ζlogn\zeta\log n sets CioutC_{i}^{out} and ζlogn\zeta\log n sets CjinC_{j}^{in}, for a constant ζ\zeta, and knows which sets it is in.

Data Storage

  3.

    An edge eEe\in E is always stored alongside its weight and direction.

  4.

    For each uCvoutu\in C_{v}^{out}, there exists an interval Iu[n]I_{u}\subseteq[n], such that uu knows all of the edges directed away from vv and towards nodes with IDs in the interval IuI_{u}, and there are at most kk such edges. It further holds that the intervals {Iu|uCvout}\{I_{u}\ |\ u\in C_{v}^{out}\} partition [n][n]. Similar constraints hold for CvinC_{v}^{in}.

  5.

    Node vv knows, for each uCvoutCvinu\in C_{v}^{out}\cup C_{v}^{in}, the two delimiters of the interval IuI_{u}.

Communication Structure

  6.

For each v, the nodes in {v} ∪ C_v^in are connected by the communication tree T_v^in, implying that each node knows its parent and children in the tree. The same holds for the nodes in {v} ∪ C_v^out and the tree T_v^out.

The definition of the data structure is compatible with both directed and undirected graphs, where for undirected graphs each edge appears in both directions. We also use carrier configurations for holding matrices, where it can be thought that every finite entry at indices (i,j)(i,j) in a matrix represents an edge from node ii to jj. Each node ii stores the finite entries of row ii as edges outgoing from ii, and the finite entries of column ii as edges incoming to ii.
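As a local illustration of Items 4 and 5, the following sketch (an assumed representation, with a sparse row stored as a Python dict) partitions the identifier space [1, n] into intervals, each containing at most k finite entries, one interval per carrier:

```python
# A sketch of the interval partition in Item 4 (illustration): split [1, n]
# into intervals, each containing at most k of the row's finite entries, so
# each carrier u stores row[I_u] with at most k edges.

def carrier_intervals(row, n, k):
    # row: dict mapping column id (1..n) -> finite entry; k: per-carrier load.
    cols = sorted(row)
    intervals, lo = [], 1
    while cols:
        chunk, cols = cols[:k], cols[k:]
        hi = cols[0] - 1 if cols else n         # extend up to the next chunk
        intervals.append((lo, hi))
        lo = hi + 1
    if not intervals:
        intervals.append((1, n))
    return intervals                            # the intervals partition [1, n]

print(carrier_intervals({2: 0.5, 5: 1.0, 9: 2.0}, n=10, k=2))  # [(1, 8), (9, 10)]
```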

In order to use carrier configurations in the 𝖠𝖢(𝖼)\mathsf{AC(c)} model, we must slightly extend the definition in order to address the usage of communication tokens. Thus, we present the following definition for 𝖠𝖢(𝖼)\mathsf{AC(c)} Carrier Configurations, to which we often refer simply as ‘carrier configurations’.

Definition 13 (𝖠𝖢(𝖼)\mathsf{AC(c)} Carrier Configuration).

Given a set of nodes V, a data structure C is an 𝖠𝖢(𝖼) carrier configuration holding a graph G = (V,E) with average degree k = |E|/|V|, if it is a Carrier Configuration and, additionally, for every node v ∈ V the following hold:

  1.

    Node vv knows Tokens(Cvin)Tokens(C_{v}^{in}) and Tokens(Cvout)Tokens(C_{v}^{out}).

  2.

    Node uu knows the communication tokens and identifiers of each vv such that uCvinu\in C_{v}^{in} or uCvoutu\in C_{v}^{out}.

  3.

    Node uu knows the communication tokens of its parent and children in each communication tree that it belongs to as part of the data structure.

  4.

    An edge e=(u,v)e=(u,v) is always stored alongside the communication tokens of uu and vv.

  5.

    Every uCvoutu\in C_{v}^{out} knows for every edge e=(v,w)e=(v,w) which it holds, the communication token of node uCwinu^{\prime}\in C_{w}^{in} which also holds ee. Similarly, uu^{\prime} knows the communication token of uu.

Throughout this entire section, the term ‘carrier configuration’ refers to Definition 13 (𝖠𝖢(𝖼) Carrier Configuration), unless otherwise specified.

B.2.1 Initialization

We show how to construct a carrier configuration, given that the edges of the graph are initially known to the nodes incident to them. As the stages taken during the construction can be partially reused in other algorithms which we show, we break up the construction into two statements – Lemma B.11 (Initialize Carriers) creates an empty carrier configuration by allocating the carrier node sets and creating communication trees spanning them, and Lemma B.12 (Populate Carriers) transfers the data from nodes to their carrier nodes.

Lemma B.11 (Initialize Carriers).

Given a graph G = (V,E), with k = |E|/|V|, and Δ the maximal degree in G, where each node initially knows only deg^in_G(v), deg^out_G(v) (but not the edges incident to it), it is possible to assign each node v ∈ V sets C_v^in, C_v^out which satisfy Items 1, 2 and 6 of Definition 12 and Items 1, 2 and 3 of Definition 13, in O~(Δ/(k·c) + 1) rounds, w.h.p.

Note: We do not assume that kk and Δ\Delta are originally known to all the nodes.

Proof of Lemma B.11.

We perform two operations in this proof. First, we allocate the carrier node sets. Then, we construct communication trees across them. We show the case of outgoing carrier nodes, CvoutC_{v}^{out}, and note that the case of CvinC_{v}^{in} is symmetric.

Carrier Allocations: We start by computing the values k = |E|/|V| and Δ, using Corollary B.6 (Aggregation), in O~(1) rounds. Each node v selects C_v^out by sending its communication token and identifier to ⌈deg^out(v)/k⌉ random nodes, and each node which v reaches replies to v with its communication token and identifier. The expected number of times a node u is picked as a carrier node is at most (1/n)·Σ_{v∈V} ⌈deg^out(v)/k⌉ ≤ (2|E|/k + n)/n = 3, and thus, by applying a Chernoff Bound, there exists a constant ζ such that each node is picked by at most ζ·log n (not necessarily distinct) nodes for their carrier node sets, w.h.p. This concludes the creation of the sets themselves, and satisfies Items 1 and 2 of both Definitions 12 and 13.

The round complexity of this step is O~(Δ/(kc)+1)\tilde{O}(\Delta/(k\cdot c)+1), as each node vv initially sends O(deg(v)/k+1)O(\deg{(v)}/k+1) messages, and then replies to the at most O~(1)\tilde{O}(1) nodes which chose it for their carrier configuration sets.

Communication Trees: Node v locally builds a balanced binary tree T_v which spans C_v^out, and sends to each u ∈ C_v^out the communication tokens of its parent and children in T_v, taking O~(|C_v^out|/c + 1) = O~(Δ/(k·c) + 1) rounds w.h.p. Notice that T_v is a communication tree (Definition 11), as it is of depth O(log n), and thus we satisfy Item 6 of Definition 12 and Item 3 of Definition 13. ∎
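The carrier-allocation step above can be summarized by the following centralized sketch (an illustration; the token exchange is omitted, and the returned maximal load is the quantity bounded by ζ·log n in the proof):

```python
import math, random

# A centralized sketch (illustration) of the allocation step: every node v
# picks ceil(deg_out(v)/k) uniformly random carriers. Since the expected
# number of times any node is picked is O(1), a Chernoff bound gives a
# maximal load of O(log n) w.h.p.

def allocate_carriers(deg_out, k, rng=random):
    n = len(deg_out)
    load = [0] * n
    carriers = {}
    for v, d in enumerate(deg_out):
        picks = [rng.randrange(n) for _ in range(math.ceil(d / k))]
        carriers[v] = picks
        for u in picks:
            load[u] += 1
    return carriers, max(load)

deg_out = [random.randrange(1, 50) for _ in range(1000)]
k = sum(deg_out) // len(deg_out)               # the average degree
_, max_load = allocate_carriers(deg_out, k)
print(max_load)                                # O(log n) w.h.p.
```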

Now, we assume that we are given a carrier configuration which is still incomplete and only satisfies the conditions from the previous statement, and we complete it to a proper carrier configuration.

Lemma B.12 (Populate Carriers).

Let G = (V,E) be a graph where each node v knows all the edges incident to it and the communication tokens of all of its neighbors. Assume that we have a carrier configuration C which is currently in-construction and satisfies all of the properties of a carrier configuration, except for Items 3, 4 and 5 of Definition 12 and Items 4 and 5 of Definition 13. Then, it is possible, within O~(Δ/c + 1) rounds, where Δ is the maximal degree in the graph, to reach a state where C satisfies all of the properties of a carrier configuration, w.h.p.

Proof of Lemma B.12.

We show the procedure for C_v^out, and note that the case of C_v^in is symmetric.

Node v partitions the identifier space [n] into |C_v^out| intervals, I_v^1, …, I_v^{|C_v^out|}, such that for every such interval I_v^i, the number of edges directed from v to nodes with identifiers in I_v^i is at most k. Denote by δ_v^i the edges from v to nodes with identifiers in I_v^i. For each node u ∈ C_v^out, node v assigns u a unique interval I_v^{i_u}, and sends to u the delimiters of the interval as well as all the edges in δ_v^{i_u}. Every edge is sent along with its weight, direction, and the communication tokens and identifiers of both of its endpoints.

The above procedure satisfies Items 3, 4 and 5 of Definition 12 and Item 4 of Definition 13. We proceed to analyze the round complexity of this step. Clearly, every node v desires to send at most O(Δ) messages. To bound the number of messages each node receives, recall that by Item 2, each node v is a carrier node of at most O~(1) nodes, and the number of messages it receives on behalf of each of them is at most k ≤ Δ. Thus, each node desires to receive O~(Δ) messages. Therefore, by Lemma B.1 (Routing), this stage can be executed in O~(Δ/c + 1) rounds w.h.p.

Finally, we need to satisfy Item 5 of Definition 13. We assume that every node v followed the above procedure to construct both sets C_v^out and C_v^in, which satisfy all properties except for this one, and now we show how, at once, this property can be satisfied for both C_v^out and C_v^in. For every node v, and every edge e = (v,w) for some w, let u ∈ C_v^out be the node holding e; then node u asks node w which node x ∈ C_w^in holds e. Node w, which knows this information due to the above, replies to u with the answer. As each node carries O~(k) ≤ O~(Δ) edges, and each node receives Δ requests, one for each of its edges, by Lemma B.1 (Routing) it takes an additional O~(Δ/c + 1) rounds. ∎

Applying Lemma B.11 (Initialize Carriers), followed by Lemma B.12 (Populate Carriers), directly gives the following.

Lemma B.13 (Initialize Carrier Configuration).

Given a graph G=(V,E)G=(V,E), where each node vv knows all the edges incident to it and the communication tokens of all of its neighbors in GG, it is possible, within O~(Δ/c+1)\tilde{O}(\Delta/c+1) rounds, where Δ\Delta is the maximal degree in the graph, to reach a state where GG is held in a carrier configuration CC, w.h.p.

B.2.2 Basic Tools

We show a basic communication tool within carrier configurations.

Lemma B.14 (Carriers Broadcast and Aggregate).

Let G=(V,E)G=(V,E) be a graph held in a carrier configuration CC. In parallel for all nodes, every vVv\in V can broadcast cc messages to all the nodes in CvinC_{v}^{in} and CvoutC_{v}^{out}, as well as solve cc aggregation tasks over CvinC_{v}^{in} and CvoutC_{v}^{out}. This requires O~(1)\tilde{O}(1) rounds.

Proof of Lemma B.14.

Due to Item 6 of Definition 12 and Item 3 of Definition 13, there is a communication tree spanning each of C_v^in and C_v^out, and every carrier node knows the communication tokens of its parent and children in the tree. Further, since each node is a member of at most O~(1) sets of carrier nodes, it is possible to apply Lemma B.3 (Message Doubling on Communication Trees) simultaneously across all the communication trees in the carrier configuration, in O~(1) rounds, proving the claim. ∎

We show the following helpful statement which enables nodes to query a carrier configuration and return to the classical state in which edges are known by the nodes incident to them.

Lemma B.15 (Learn Carried Information).

Given a graph G=(V,E)G=(V,E) with average degree k=|E|/|V|k=\left|E\right|/\left|V\right| held in a carrier configuration CC, it is possible for each node vv to learn all edges adjacent to it in GG in O~(Δ/c+1){\tilde{{O}}}(\Delta/c+1) rounds w.h.p., where Δ\Delta is the maximal degree in GG. It is possible to invoke this procedure for only outgoing or incoming edges separately, requiring O~(Δout/c+1){\tilde{{O}}}(\Delta_{out}/c+1), O~(Δin/c+1){\tilde{{O}}}(\Delta_{in}/c+1) rounds, respectively, where Δout\Delta_{out} is the maximal out-degree, and Δin\Delta_{in} is the maximal in-degree.

Proof of Lemma B.15.

First, each node v computes deg(v) by summing up the numbers of edges held by the nodes in C_v^out and C_v^in. Then, the nodes compute the maximum of their degrees, the value Δ. Every node in C_v^out and C_v^in sends to v the edges incident to v which it holds. Node v desires to receive at most O(Δ) messages, and each node desires to send at most O~(Δ) messages, as every node is the carrier of at most O~(1) nodes. Thus, due to Lemma B.1 (Routing), this requires O~(Δ/c + 1) rounds. ∎

It is possible to extend Lemma B.15 and show that if, for a node v, both v and all the carrier nodes of v know some predicate over edges, then it is possible to send to v only the edges incident to it which satisfy the predicate. We formalize this as follows.

Lemma B.16 (Learn Carried Information with Predicate).

Assume that we are given a graph G=(V,E)G=(V,E) with average degree k=|E|/|V|k=\left|E\right|/\left|V\right| held in a carrier configuration CC. If each node vv has a predicate pvp_{v} over the edges incident to vv, which both vv and the nodes Cvout,CvinC_{v}^{out},C_{v}^{in} know, then it is possible for each node vv to learn all edges incident to it in GG which satisfy the predicate. The round complexity for this procedure is O~(Δp/c+1){\tilde{{O}}}(\Delta_{p}/c+1) rounds w.h.p., where Δp\Delta_{p} is the maximal number of edges incident to any node vv which satisfy pvp_{v}.

B.2.3 Merging Carrier Configurations

We present a useful tool, which shows how to compute the point-wise minimum of two matrices. With respect to graphs, this can be seen as adding edges to a graph, and if an edge exists twice, then setting its weight to the minimum of the two. This tool can be used in order to merge two carrier configurations.

Lemma B.17 (Merging).

Let VV be a set of nodes which hold two n×nn\times n matrices S,TS,T in carrier configurations AA, BB, respectively. Denote by P=min{S,T}P=\min\{S,\ T\} the matrix generated by taking the point-wise minimum of the two given matrices. It is possible within O~((kS+kT+hA+hB)/c+1)\tilde{O}((k_{S}+k_{T}+h_{A}+h_{B})/c+1) rounds to output PP in a carrier configuration CC, where the values kS,kTk_{S},k_{T} denote the average number of finite elements per row of S,TS,T, respectively, and the values hA,hBh_{A},h_{B} denote the maximal number of carriers each node has in AA, BB, respectively, w.h.p.

Proof of Lemma B.17.

We show how to set up CvoutC_{v}^{out}, and note that the case of CvinC_{v}^{in} is symmetric. Thus, we sometimes drop the superscripts and denote Av=Avout,Bv=Bvout,Cv=CvoutA_{v}=A_{v}^{out},B_{v}=B_{v}^{out},C_{v}=C_{v}^{out}. A critical note is that in the following proof, when we denote AvBvA_{v}\cup B_{v}, if a node appears in both carrier sets, we count it twice in the union. That is, AvBvA_{v}\cup B_{v} denotes a multiset. Further, let ee be some edge which is held in AA or BB by some carrier node ww. At the onset of the algorithm, ww attaches its identifier and communication token to ee – that is, whenever ee is sent in a message, it is sent along with these values as metadata.

Proof Overview

Goal: Consider a node vVv\in V. Essentially, node vv has a sparse array (row vv in matrix SS, denoted as S[v]S[v]) held, in a distributed fashion, over the nodes in AvA_{v}, and a sparse array (row vv in matrix TT, denoted as T[v]T[v]) held over the nodes in BvB_{v}. Node vv wishes to merge these two arrays into one sparse array (the currently not-yet computed P[v]P[v]), and hold it in some (currently not allocated) carrier set CvC_{v}. In the case that an entry appears in both S[v]S[v] and T[v]T[v], it should keep the minimum of the values.

Merging: Initially, vv performs some merging mechanism in order to compute the sparse array P[v]P[v]. At the end of this step, the array P[v]P[v] is distributed across the nodes AvA_{v} and BvB_{v}, as we have yet to allocate CvC_{v}.

Constructing CC: Finally, we allocate the set CvC_{v}, and move the data of P[v]P[v] from its temporary storage in the nodes AvA_{v} and BvB_{v} to be distributed across CvC_{v}. Further, several steps are taken to ensure that CC is a valid carrier configuration.

Step: Merging

Observe some node vv. In this step, our goal is to compute P[v]P[v], and store it in a convenient distributed representation across the nodes in AvBvA_{v}\cup B_{v}.

Initially, we desire for all the nodes in A_v ∪ B_v to be able to communicate with one another. Node v knows the communication tokens and identifiers of the nodes in A_v ∪ B_v (Item 1), and broadcasts all of them to all the nodes in A_v ∪ B_v in O~(|A_v|/c + |B_v|/c + 1) = O~(h_A/c + h_B/c + 1) rounds using Lemma B.14 (Carriers Broadcast and Aggregate).

Due to Item 4, S[v] is distributed across A_v such that an interval I_w = [w_b, w_e] corresponds to each w ∈ A_v, where w holds all the finite elements in S[v][I_w] (entries from index w_b to index w_e of S[v]). The same holds for B_v. For a set of carrier nodes X, denote I(X) = {I_w | w ∈ X}, and for a set of intervals J, denote by D(J) = {x, y | [x,y] ∈ J} the set of delimiters of J. Due to Item 5, node v knows D(I(A_v)) and D(I(B_v)), and hence D(I(A_v) ∪ I(B_v)). We now perform the following steps.

  1.

Node v computes J = {[x,y] | x, y ∈ D(I(A_v) ∪ I(B_v)), x < y, and there is no z ∈ D(I(A_v) ∪ I(B_v)) with x < z < y}, that is, the partition of [n] into intervals using all of the delimiters in D(I(A_v) ∪ I(B_v)). Notice that |J| ≤ 2(|I(A_v)| + |I(B_v)|) = 2(|A_v| + |B_v|), and every I ∈ J is contained in exactly one interval in I(A_v) and in one interval in I(B_v). Further, ⋃_{K∈J} K = [n], since ⋃_{K∈I(A_v)} K = [n] and ⋃_{K∈I(B_v)} K = [n].

  2.

    Node vv broadcasts D(J)D(J) to AvBvA_{v}\cup B_{v} in O~(|J|/c+1)=O~(|Av|/c+|Bv|/c+1)=O~(hA/c+hB/c+1)\tilde{O}(|J|/c+1)=\tilde{O}(|A_{v}|/c+|B_{v}|/c+1)=\tilde{O}(h_{A}/c+h_{B}/c+1) rounds using Lemma B.14.

Notice that all the nodes in AvBvA_{v}\cup B_{v} know the identifiers of one another (guaranteed above), and also all of D(J)D(J). Thus, it is possible for the nodes in AvBvA_{v}\cup B_{v} to perform local computation which allocates to each uAvBvu\in A_{v}\cup B_{v} two intervals, Ku1,Ku2JK_{u}^{1},K_{u}^{2}\in J, and every node in AvBvA_{v}\cup B_{v} knows that uu is assigned Ku1,Ku2K_{u}^{1},K_{u}^{2}.

Now, we wish for u to learn the finite entries in S[v][K_u^1], T[v][K_u^1], S[v][K_u^2], T[v][K_u^2], and compute P[v][K_u^1], P[v][K_u^2]. To do so, we need to route the finite entries which u requires from their current storage in the nodes A_v ∪ B_v to u. We bound the amount of information u receives. For any intervals L_a ∈ I(A_v) and L_b ∈ I(B_v), there are at most k_S and k_T finite elements in S[v][L_a] and T[v][L_b], respectively. Further, every interval K ∈ J is contained in exactly one interval in I(A_v) and one interval in I(B_v), and so the numbers of finite elements in S[v][K] and T[v][K] are at most k_S and k_T, respectively. Therefore, node u desires to learn at most O(k_S + k_T) finite elements. We now bound the amount of information node u sends to other nodes in A_v ∪ B_v in order to let them learn their desired intervals. Node u originally holds at most O(k_S + k_T) finite elements, and each element is desired by exactly one node. Therefore, node u sends at most O(k_S + k_T) finite elements. Thus, we conclude that every node in A_v ∪ B_v sends and receives at most O(k_S + k_T) messages to and from other nodes in A_v ∪ B_v, showing that this step can be completed in O~((k_S + k_T)/c + 1) rounds, using Lemma B.1 (Routing).

Finally, we wish for all the nodes in A_v ∪ B_v to know, for each K ∈ J, how many finite entries are in P[v][K]. Using Lemma B.1 (Routing), every node u ∈ A_v ∪ B_v sends to v the number of finite entries in P[v][K_u^1] and in P[v][K_u^2], within O~(|A_v|/c + |B_v|/c + 1) = O~(h_A/c + h_B/c + 1) rounds. Then, v broadcasts all of this information to A_v ∪ B_v using Lemma B.14 (Carriers Broadcast and Aggregate) in O~(|A_v|/c + |B_v|/c + 1) = O~(h_A/c + h_B/c + 1) rounds.
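As a local illustration of the merge computed in this step, the following sketch takes the point-wise minimum of two sparse rows (missing entries are treated as +∞; in the distributed execution, each carrier performs this merge only on its assigned intervals K_u^1, K_u^2):

```python
# A local sketch (illustration) of the merge in this step: P[v] is the
# point-wise minimum of the sparse rows S[v] and T[v]; missing entries are
# treated as +infinity, matching min{S, T} on the finite entries.

def pointwise_min(row_s, row_t):
    merged = dict(row_s)
    for j, x in row_t.items():
        merged[j] = min(merged.get(j, float('inf')), x)
    return merged

S_v = {1: 3.0, 4: 7.0}          # finite entries of row v of S
T_v = {4: 2.0, 8: 5.0}          # finite entries of row v of T
print(pointwise_min(S_v, T_v))  # {1: 3.0, 4: 2.0, 8: 5.0}
```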

Step: Constructing CC

We perform several operations in this step. First, we invoke Lemma B.11 (Initialize Carriers) w.r.t. P, in order to create the carrier sets C_v, which satisfy all of the properties of a carrier configuration, except for Items 3, 4 and 5 of Definition 12 and Items 4 and 5 of Definition 13. These remaining properties relate to populating the sets C_v with data. Therefore, we then populate C_v with the data pertaining to P[v].

Sub-step – Invoking Lemma B.11: In order to invoke Lemma B.11 w.r.t. P, every node v needs to know the number of finite entries in row v of P and in column v of P. Notice that v can compute the number of finite entries in P[v] by aggregating over the nodes A_v ∪ B_v. In order to compute the number of finite entries in column v of P, recall that, as stated at the beginning of the proof, our analysis follows only the rows of the matrices; inherently, one also runs the algorithm up to this point on the columns of the matrices. Therefore, in a symmetric way, v can know the number of finite entries in column v of P. Next, we analyze the round complexity of invoking Lemma B.11 w.r.t. P. Denote by k_P the average number of finite elements in a row of P; by aggregation, all of the nodes of the graph compute k_P. Notice that k_P ≥ k_S and k_P ≥ k_T, as in each row the number of finite elements could only have increased due to the minimization operation. Further, the maximal number of finite elements in a row of P is at most the maximal number of finite elements in a row of S plus the maximal number of finite elements in a row of T. Thus, the round complexity is O~((k_S·h_A + k_T·h_B)/(k_P·c) + 1) = O~((h_A + h_B)/c + 1) rounds.

Sub-step – Ensuring Items 3, 4 and 5 of Definition 12 and Item 4 of Definition 13: First, we begin by computing the intervals I(C_v). Notice that |C_v| ≤ |A_v| + |B_v|, since k_P ≥ k_S, k_P ≥ k_T, and A_v and B_v can themselves hold the vectors S[v], T[v] with each node in each set carrying at most k_S, k_T finite elements, respectively. Now, we partition [n] into |C_v| intervals I(C_v) = L_1, …, L_{|C_v|}, such that the number of finite elements in every P[v][L_i] is at most k_P. As the nodes in A_v ∪ B_v all know D(J), as well as, for each K ∈ J, how many finite entries are in P[v][K], every node in A_v ∪ B_v knows, for every finite element in P[v] which it holds, the number of finite elements preceding it in P[v]. Thus, for every interval L_i ∈ I(C_v), there exist two nodes x, y ∈ A_v ∪ B_v such that x can compute the left endpoint of L_i and y can compute the right. We show this for left endpoints, as the proof for right endpoints is symmetric. The left endpoint of L_i is the index of the ((i−1)·k_P + 1)-th finite entry in P[v], and thus the node in A_v ∪ B_v which holds this finite element knows the left endpoint of L_i. In O~(|C_v|/c + 1) = O~(|A_v|/c + |B_v|/c + 1) = O~(h_A/c + h_B/c + 1) rounds, the nodes in A_v ∪ B_v tell v the contents of D(I(C_v)), using Lemma B.1 (Routing). In the same round complexity, v broadcasts D(I(C_v)) to A_v, B_v, and C_v, using Lemma B.14 (Carriers Broadcast and Aggregate).

Now, we move the finite entries of P[v] from the nodes in A_v ∪ B_v to the nodes in C_v. Node v broadcasts the communication tokens and identifiers of all the nodes in C_v to all the nodes in A_v ∪ B_v, in O~(|C_v|/c + 1) = O~(|A_v|/c + |B_v|/c + 1) = O~(h_A/c + h_B/c + 1) rounds. The nodes in A_v ∪ B_v communicate all of the finite entries of P[v] to C_v, each node knowing where to send the information which it holds, as all the nodes in A_v ∪ B_v know D(I(C_v)). Each node sends or receives at most O(k_S + k_T + k_P) = O(k_S + k_T) messages, therefore routing these messages takes O~(k_S/c + k_T/c + 1) rounds using Lemma B.1 (Routing).

Observe that the above procedure ensures that C satisfies Items 3, 4 and 5 of Definition 12 and Item 4 of Definition 13.

Sub-step – Ensuring Item 5 of Definition 13: Recall that, as stated at the beginning of the proof, we show how to construct C_v^out, while the case of C_v^in is a symmetric algorithm. At this point, we require that all of the above be executed w.r.t. both C_v^out and C_v^in. This is due to the fact that in order to satisfy Item 5 for C_v^out, we query the nodes of C_v^in for some information which they compute above. Thus, we now show how Item 5 is satisfied for C_v^out; in a symmetric way, it can be shown for C_v^in.

Let there be some edge e = (v,w), for some w ∈ V, which is now held by γ ∈ C_v^out. Denote by u ∈ A_v^out ∪ B_v^out the node which originally held e at the onset of the algorithm, and recall that at the onset of the algorithm (as stated at the beginning of the proof), we attach to e the communication token and identifier of u, and so γ knows u. W.l.o.g., assume that u ∈ A_v^out. Due to the fact that A is a carrier configuration, node u knows the communication token and identifier of the node w′ ∈ A_w^in which also holds e. Again, assume that at the onset of the algorithm, node u attached to e the communication token and identifier of w′. Thus, node γ knows the communication token and identifier of w′.

As such, γ asks w′ which node in C_w^in holds e. Node w′ is able to answer this query, as all the nodes in A_w^in ∪ B_w^in know which intervals are held by which nodes in C_w^in. The answer to this query is exactly the information which node γ needs in order to satisfy Item 5. We analyze the round complexity of this routing. Each node in C_v^out sends queries only w.r.t. edges it holds as part of C, and each node in A_v^in, B_v^in answers queries only w.r.t. edges it holds in A, B. Thus, this step can be executed in O~(k_S/c + k_T/c + k_P/c + 1) = O~(k_S/c + k_T/c + 1) rounds, using Lemma B.1 (Routing). ∎

B.2.4 Partial Carrier Configuration

We now prove a fundamental tool which can be roughly viewed as computing the transpose of a matrix. Notice that each entry of data is stored twice in a carrier configuration CC. For instance, an edge e=(v,w)e=(v,w) is stored in both CvoutC_{v}^{out}, and CwinC_{w}^{in}. We show that if only the outgoing carrier sets CvoutC_{v}^{out} are stored, one can complete the data structure to contain also the incoming carrier sets CvinC_{v}^{in}.

This is a very useful tool, as sometimes nodes can only compute the edges directed away from them, and not the edges directed towards them. For instance, in Theorem 3.1 (Sparse Matrix Multiplication), we reach a state where there are few edges directed away from every node, but potentially Θ(n) edges directed towards some nodes. If one were to simply invoke Lemma B.13 (Initialize Carrier Configuration), this would require every node to learn all of the edges directed both away from and towards it, which would incur a high round complexity. Instead, Lemma B.18 (Partial Configuration Completion) shows that given that every node v has a partial carrier set holding the edges directed away from it, the matching carrier set for edges directed towards v can be allocated and directly populated with these edges, without the edges ever being known to v itself.

Definition 14 (Partial Carrier Configuration).

Given a set of nodes VV, a data structure CC is a partial carrier configuration holding a graph G=(V,E)G=(V,E) if all the conditions of \IfAppendixLABEL:\next ( (𝖠𝖢(𝖼)\mathsf{AC(c)} Carrier Configuration).)Definition 13 hold, yet, only for the outgoing edges. That is, each node vVv\in V only has CvoutC_{v}^{out}.

Notice that Item 5 is not demanded, as it requires the existence of both CvoutC_{v}^{out} and CvinC_{v}^{in}.

Lemma B.18 (Partial Configuration Completion).

Given a graph G=(V,E)G=(V,E) which is held in a partial carrier configuration CC, there exists an algorithm which runs in O~(nk/c+n/(kc)+1)\tilde{O}(\sqrt{nk}/c+n/(k\cdot c)+1) rounds, where k=|E|/|V|k=|E|/|V|, and outputs a carrier configuration DD holding GG, w.h.p.

Proof of Lemma B.18.

We assign Dvout=CvoutD_{v}^{out}=C_{v}^{out} for every vVv\in V, and thus we are required to show two items in this proof: how to allocate the sets DvinD_{v}^{in}, and how to populate them with data.

Allocating DvinD_{v}^{in}:

In order to allocate DvinD_{v}^{in}, node vv needs to know degGin(v)\deg_{G}^{in}{(v)}. Denote m=|E|m=|E|, t=mt=\sqrt{m}, and L={vV|degGin(v)2t}L=\{v\in V\ |\ \deg_{G}^{in}{(v)}\leq 2t\}, H=VLH=V\setminus L. The sets LL and HH contain light and heavy nodes, respectively, yet, notice that at the current stage in the algorithm, no node knows whether it itself is in LL or in HH, as it does not know degGin(v)\deg_{G}^{in}{(v)}. Our goal is to satisfy an even stronger condition – to make every node vVv\in V know for every uVu\in V whether uu is in LL or HH.

For a set of nodes XVX\subseteq V, denote by degGin(X)=vXdegGin(v)\deg_{G}^{in}{(X)}=\sum_{v\in X}\deg_{G}^{in}{(v)}. Let 𝕍={V1,,Vt}\mathbb{V}=\{V_{1},\dots,V_{t}\} be an arbitrary, hardcoded, globally known partition of VV, where all the parts are of roughly equal size. Using \IfAppendixLABEL:\next ( (Aggregation).)Corollary B.6, all nodes compute the values degGin(V1),,degGin(Vt)\deg_{G}^{in}{(V_{1})},\dots,\deg_{G}^{in}{(V_{t})} in O~(t/c+1)\tilde{O}(t/c+1) rounds. Denote the set of parts in 𝕍\mathbb{V} which have low in-degree by 𝕃={Vi𝕍|degGin(Vi)2t}\mathbb{L}=\{V_{i}\in\mathbb{V}|\deg_{G}^{in}{(V_{i})}\leq 2t\}, and the high in-degree ones by =𝕍𝕃\mathbb{H}=\mathbb{V}\setminus\mathbb{L}. Given vVv\in V, if vv belongs to some set in 𝕃\mathbb{L}, that is, vVi,Vi𝕃v\in V_{i},\ V_{i}\in\mathbb{L}, then certainly vLv\in L. As 𝕍\mathbb{V} is hardcoded and globally known, and all the nodes know 𝕃\mathbb{L}, all such nodes vv know that they are in LL, and further all the nodes in the graph know this as well.

As degGin(V1)++degGin(Vt)=degGin(V)=m=t2\deg_{G}^{in}{(V_{1})}+\dots+\deg_{G}^{in}{(V_{t})}=\deg_{G}^{in}{(V)}=m=t^{2}, it holds that ||<t/2|\mathbb{H}|<t/2, implying |𝕃|t/2|\mathbb{L}|\geq t/2. Since the sets in 𝕍\mathbb{V} are of roughly equal sizes, we have guaranteed that at least half of the nodes in VV are now identified as belonging to LL. These nodes are set aside, and we iterate over this procedure. In each iteration, at least half of the remaining nodes are marked as belonging to LL, up until the final iteration where only nodes in HH remain. Thus, after O(logn)O(\log n) iterations, every node in the graph knows which nodes belong to LL and which to HH.
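The following toy simulation (ours, centralized) illustrates why this halving procedure classifies every node within O(log n) iterations; the distributed algorithm only ever learns the part-sums, via Corollary B.6:

```python
import math

# A part whose total in-degree is at most 2t contains only light nodes.
# Since the total in-degree is m = t^2, fewer than t/2 of the ~t parts can
# exceed 2t, so at least half of the remaining nodes are classified in each
# iteration; heavy nodes survive until the parts shrink to singletons.
def classify(in_deg):
    m = sum(in_deg.values())
    t = max(1, math.isqrt(m))
    light, remaining = set(), sorted(in_deg)
    while remaining:
        size = max(1, math.ceil(len(remaining) / t))
        parts = [remaining[i:i + size] for i in range(0, len(remaining), size)]
        unknown = []
        for part in parts:
            if sum(in_deg[v] for v in part) <= 2 * t:
                light.update(part)  # every node here has in-degree <= 2t
            else:
                unknown.extend(part)
        if len(unknown) == len(remaining):
            break                   # only heavy nodes remain
        remaining = unknown
    return light, set(in_deg) - light

deg = {v: (1 if v < 90 else 100) for v in range(100)}  # m = 1090, t = 33
print(len(classify(deg)[0]), "light nodes")            # 90 light nodes
```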

Fix vLv\in L. It holds that degGin(v)=O(t)\deg_{G}^{in}{(v)}=O(t). Each node uu which holds an edge directed towards vv now sends that edge to vv. This is done using \IfAppendixLABEL:\next ( (Routing).)Lemma B.1. We must show that uu knows the communication token of vv. This follows from Item 4, which states that every edge is stored alongside the communication tokens of both of its endpoints. The execution of Lemma B.1 completes in O~(t/c+1)\tilde{O}(t/c+1) rounds, as every node receives at most O~(t)\tilde{O}(t) messages, and sends at most O(k)=O(nk)=O(m)=O(t)O(k)=O(\sqrt{nk})=O(\sqrt{m})=O(t) messages. Thus, vv computes degGin(v)\deg_{G}^{in}{(v)}, as it knows all of the edges which are directed towards itself.

We turn to HH – we show that every vHv\in H also computes degGin(v)\deg_{G}^{in}{(v)}. As the in-degree of every node in HH exceeds 2t2t, and t2=mt^{2}=m, we get |H|=O(t)|H|=O(t). Further, recall that every node knows which nodes are in HH. Therefore, using \IfAppendixLABEL:\next ( (Aggregation).)Corollary B.6, within O~(|H|/c+1)=O~(t/c+1)\tilde{O}(|H|/c+1)=\tilde{O}(t/c+1) rounds, every node in the graph knows the in-degree of every node in HH.

Finally, we allocate DvinD_{v}^{in}. We are given as input the partial carrier configuration CC, allowing each node vVv\in V to compute degGout(v)\deg_{G}^{out}{(v)} in O~(1)\tilde{O}(1) rounds, using \IfAppendixLABEL:\next ( (Carriers Broadcast and Aggregate).)Lemma B.14. Thus, we invoke \IfAppendixLABEL:\next ( (Initialize Carriers).)Lemma B.11, in O~(n/(kc)+1)\tilde{O}(n/(k\cdot c)+1) rounds, to create the sets Evout,EvinE_{v}^{out},E_{v}^{in}, which satisfy all of the properties of a carrier configuration, except for Items 3, 4 and 5. These remaining properties relate to populating the carrier node sets with input data. We throw away EvoutE_{v}^{out} and set Dvin=EvinD_{v}^{in}=E_{v}^{in}.

Populating DvinD_{v}^{in}:

Fix vLv\in L. As vv knows all the edges directed towards it, it sends these edges to its carriers in O~(degGin(v)/c+1)=O~(t/c+1)\tilde{O}(\deg_{G}^{in}{(v)}/c+1)=\tilde{O}(t/c+1) rounds, using \IfAppendixLABEL:\next ( (Carriers Broadcast and Aggregate).)Lemma B.14, and trivially completes Items 3 and 4 by sending at most O~(1)\tilde{O}(1) messages to each node in DvinD_{v}^{in}, requiring O~(degGin(v)/(kc)+1)=O~(t/(kc)+1)\tilde{O}(\deg_{G}^{in}{(v)}/(k\cdot c)+1)=\tilde{O}(t/(k\cdot c)+1) rounds. The only challenging task is ensuring Item 5, which requires that for every edge e=(w,v)e=(w,v), the node in DvinD_{v}^{in} which holds ee knows the communication token and identifier of the node wDwoutw^{\prime}\in D_{w}^{out} which holds ee. However, since every edge in this algorithm is sent directly from the carrier node which holds it (node ww^{\prime} sends ee), that carrier node can attach its own communication token and identifier to the edge itself when sending it, thus providing the information which the nodes in DvinD_{v}^{in} need in order to satisfy Item 5.

Partition VV into t/kt/k sets, W1=[1,nk/t],,Wt/k=[nnk/t+1,n]W_{1}=[1,nk/t],\dots,W_{t/k}=[n-nk/t+1,n], and use \IfAppendixLABEL:\next ( (Grouping).)Lemma B.7 to ensure that for each WiW_{i}, every node in WiW_{i} knows Tokens(Wi)Tokens(W_{i}), within O~(t/(kc)+(nk)/(tc)+1)=O~(t/c+1)\tilde{O}(t/(k\cdot c)+(nk)/(t\cdot c)+1)=\tilde{O}(t/c+1) rounds. For each set WiW_{i}, denote some arbitrary, hardcoded wiWiw_{i}\in W_{i} as the leader of WiW_{i}, and in O~(t/(kc)+1)\tilde{O}(t/(k\cdot c)+1) rounds, broadcast the communication tokens and identifiers of all the leaders, using \IfAppendixLABEL:\next ( (Broadcasting).)Lemma B.5.

Fix vHv\in H. Observe that |Dvin|degGin(v)/kt/k|D_{v}^{in}|\geq\deg_{G}^{in}{(v)}/k\geq t/k. For each set WiW_{i}, the ii-th carrier in DvinD_{v}^{in} sends its communication token and identifier to wiw_{i}, using \IfAppendixLABEL:\next ( (Routing).)Lemma B.1, in O~(t/c+1)\tilde{O}(t/c+1) rounds, as every carrier sends at most one message and every leader receives at most O(|H|)=O(t)O(|H|)=O(t) messages. Then, leader wiw_{i} broadcasts to WiW_{i} all the communication tokens and identifiers of the carrier nodes which it received, taking O~(|H|/c+1)=O~(t/c+1)\tilde{O}(|H|/c+1)=\tilde{O}(t/c+1) rounds, using \IfAppendixLABEL:\next ( (Group Broadcasting and Aggregating).)Corollary B.9.

Finally, every node in WiW_{i} which holds an edge directed towards vv, tells the ii-th carrier in DvinD_{v}^{in} about this edge using Lemma B.1, in O~(nk/(tc)+1)=O~(t/c+1)\tilde{O}(nk/(t\cdot c)+1)=\tilde{O}(t/c+1) rounds, as each carrier node receives at most |Wi|=nk/t|W_{i}|=nk/t messages, and each node sends at most O(k)=O(nk)=O(t)O(k)=O(\sqrt{nk})=O(t) messages since CC is a partial carrier configuration. Now, the nodes DvinD_{v}^{in} know all of the edges directed towards vv. Since i<ji<j implies that every xWi,yWjx\in W_{i},\ y\in W_{j} satisfy x<yx<y, within O~(nk/(tc)+|Dvin|/c+1)=O~(t/c+n/(kc)+1)\tilde{O}(nk/(t\cdot c)+|D_{v}^{in}|/c+1)=\tilde{O}(t/c+n/(k\cdot c)+1) rounds, the nodes in DvinD_{v}^{in} can rearrange the information stored in them, as well as communicate with vv, in order to satisfy Items 3 and 4. To satisfy Item 5, an identical claim to the case of vLv\in L can be used. ∎

B.3 Efficient Hopset Construction

We efficiently compute, in the 𝖠𝖢(𝖼)\mathsf{AC(c)} model, a hopset [24], which allows approximating distance-related problems quickly. We follow the general outline of [14], and solve kk-nearest and (S,d,k)(S,d,k)-source detection, defined below, in order to construct the hopset. We solve these problems mainly using \IfAppendixLABEL:\next ( (Sparse Matrix Multiplication).)Theorem 3.1.

However, in contrast to the 𝖢𝗈𝗇𝗀𝖾𝗌𝗍𝖾𝖽𝖢𝗅𝗂𝗊𝗎𝖾\mathsf{Congested\ Clique} implementation, in the 𝖠𝖢(𝖼)\mathsf{AC(c)} model we are met with additional challenges, as many operations which are trivial in the 𝖢𝗈𝗇𝗀𝖾𝗌𝗍𝖾𝖽𝖢𝗅𝗂𝗊𝗎𝖾\mathsf{Congested\ Clique} model become highly complex. For instance, upon computing the edges EHE_{H} of a hopset HH, one must add the edges to the graph – an operation which is straightforward in the 𝖢𝗈𝗇𝗀𝖾𝗌𝗍𝖾𝖽𝖢𝗅𝗂𝗊𝗎𝖾\mathsf{Congested\ Clique} model, yet requires \IfAppendixLABEL:\next ( (Merging).)Lemma B.17 in the 𝖠𝖢(𝖼)\mathsf{AC(c)} model. Further, in the 𝖢𝗈𝗇𝗀𝖾𝗌𝗍𝖾𝖽𝖢𝗅𝗂𝗊𝗎𝖾\mathsf{Congested\ Clique} model, when we consider undirected graphs, once a node vv adds an edge (v,u)(v,u) to the graph then the edge (u,v)(u,v) is added as well, or updated to the minimum cost, if it exists already. To accomplish this in the 𝖠𝖢(𝖼)\mathsf{AC(c)} model, one should invoke the algorithm in \IfAppendixLABEL:\next ( (Merging).)Lemma B.17 on the matrix and the transpose of the matrix. However, transposing a matrix is not trivial and we accomplish it due to the definition of the carrier configuration, which implies that whenever nodes hold a matrix AA, they also implicitly hold ATA^{T}. This goes to show why various new tools are required in the 𝖠𝖢(𝖼)\mathsf{AC(c)} model for this problem.

Definition 15 ((β,ε)(\beta,\varepsilon)-Hopset).

For a given weighted graph G=(V,E)G=(V,E), a (β,ε)(\beta,\varepsilon)-hopset, H=(V,EH)H=(V,E_{H}) is a set of edges such that paths of length at most β\beta hops in GHG\cup H approximate distances in GG by a multiplicative factor of at most (1+ε)(1+\varepsilon). That is, for each u,vVu,v\in V, dG(u,v)dGHβ(u,v)(1+ε)dG(u,v)d_{G}^{\infty}(u,v)\leq d_{G\cup H}^{\beta}(u,v)\leq(1+\varepsilon)d_{G}^{\infty}(u,v), where dGh(u,v)d_{G}^{h}(u,v) is the weight of the shortest path with at most hh hops between u,vu,v in GG.

We demonstrate how to construct the (logn/ε,ε)(\log n/\varepsilon,\varepsilon)-hopset HH over the input graph, where the number of edges in HH is O~(n3/2)\tilde{O}(n^{3/2}). This is done using \IfAppendixLABEL:\next ( (Hopset Construction).)Theorem B.19.

Theorem B.19 (Hopset Construction).

There exists an algorithm in the 𝖠𝖢(𝖼)\mathsf{AC(c)} model, such that given a weighted undirected input graph G=(V,E)G=(V,E) with n=|V|n=|V| and m=|E|=Ω(n3/2)m=|E|=\Omega(n^{3/2}), held in a carrier configuration CC, and given some 0<ε<10<\varepsilon<1, computes a (logn/ε,ε)(\log n/\varepsilon,\varepsilon)-hopset HH, with |H|=O~(n3/2)|H|=\tilde{O}(n^{3/2}), and outputs G=(V,EH)G^{\prime}=(V,E\cup H) in a carrier configuration CC^{\prime}. The round complexity of this algorithm is O~((n5/6/c+m/(n2/3c)+1)/ε)\tilde{O}((n^{5/6}/c+m/(n^{2/3}\cdot c)+1)/\varepsilon), w.h.p.

Before proving \IfAppendixLABEL:\next ( (Hopset Construction).)Theorem B.19, we prove several theorems related to the following two problems.

Definition 16 (kk-nearest).

Given a graph G=(V,E)G=(V,E) and a value k[n]k\in[n], in the kk-nearest problem, each node vv must learn its distances to kk of the closest nodes to it in GG, breaking ties arbitrarily.

Definition 17 ((S,d,k)(S,d,k)-source detection).

Given a graph G=(V,E)G=(V,E), a set SVS\subseteq V, a value d[n]d\in[n], and a value k|S|k\leq|S|, in the (S,d,k)(S,d,k)-source detection problem, each node vv is required to learn its kk closest nodes in SS, while considering paths of at most dd hops only.

We solve the kk-nearest and (S,d,k)(S,d,k)-source detection problems for the case where k=Ω(n1/3)k=\Omega(n^{1/3}), as the round complexity of our solutions does not improve for k=o(n1/3)k=o(n^{1/3}) due to pre-processing costs.

Lemma B.20 (kk-nearest Algorithm).

Given a graph G=(V,E)G=(V,E), where n=|V|n=|V|, held in a carrier configuration CC, and some value k=Ω(n1/3)k=\Omega(n^{1/3}), it is possible in the 𝖠𝖢(𝖼)\mathsf{AC(c)} model, within O~(kn1/3/c+n2/3/c+1)\tilde{O}(k\cdot n^{1/3}/c+n^{2/3}/c+1) rounds, w.h.p., to output a directed graph G=(V,E)G^{\prime}=(V,E^{\prime}) held in a carrier configuration CC^{\prime}, where EE^{\prime} contains an edge from every node vVv\in V to every node uu which is one of the closest kk nodes to vv (with ties broken arbitrarily), with weight dG(v,u)d_{G}(v,u). Notice that it can be the case that EEE\not\subset E^{\prime}.

Proof of Lemma B.20.

As shown in [14], the following process solves the problem. Take the adjacency matrix AA of GG, and in each row keep the kk smallest entries (breaking ties arbitrarily), to create some matrix AA^{\prime} (given AA, there are potentially many options for AA^{\prime}, since ties can be broken arbitrarily). Then, the matrix AA^{\prime} is iteratively squared, for at most logn\log n iterations, while after each product only the kk smallest entries in each row are preserved.

We first create such a matrix AA^{\prime} from AA. Fix vVv\in V. Node vv computes degout(v)\deg_{out}{(v)}, in O~(1)\tilde{O}(1) rounds, using \IfAppendixLABEL:\next ( (Carriers Broadcast and Aggregate).)Lemma B.14. If degout(v)k\deg_{out}{(v)}\leq k, then vv uses \IfAppendixLABEL:\next ( (Learn Carried Information).)Lemma B.15 in O~(k/c+1)\tilde{O}(k/c+1) rounds to learn all of the edges outgoing from it. Otherwise degout(v)>k\deg_{out}{(v)}>k, and denote by f(v,p,i)f(v,p,i) the edges directed away from vv with weight at most pp and towards nodes with identifiers at most ii. Node vv computes two values, pvp_{v} and ivi_{v}, such that pvp_{v} is the maximal value for which there exists an ivi_{v} such that |f(v,pv,iv)|=k|f(v,p_{v},i_{v})|=k. Given any value pp, node vv can compute |f(v,p,)||f(v,p,\infty)| within O~(1)\tilde{O}(1) rounds using \IfAppendixLABEL:\next ( (Carriers Broadcast and Aggregate).)Lemma B.14. Thus, in O~(1)\tilde{O}(1) rounds, vv can compute pvp_{v} using binary search. Likewise, |f(v,pv,i)||f(v,p_{v},i)| can be computed in O~(1)\tilde{O}(1) rounds for any ii, and thus using binary search vv computes ivi_{v}. Then, node vv broadcasts pvp_{v} and ivi_{v} to CvoutC_{v}^{out}, and using \IfAppendixLABEL:\next ( (Learn Carried Information with Predicate).)Lemma B.16, within O~(k/c+1)\tilde{O}(k/c+1) rounds, learns all of the edges in f(v,pv,iv)f(v,p_{v},i_{v}).

We need to hold AA^{\prime} in a carrier configuration in order to use it for matrix multiplication. Each node vv with degout(v)>k\deg_{out}{(v)}>k knows the entries of row vv in AA^{\prime} – they are f(v,pv,iv)f(v,p_{v},i_{v}). Each node vv with degout(v)<k\deg_{out}{(v)}<k locally adds arbitrary edges directed away from it with infinite weight, to have exactly kk edges directed away from it. We denote the new matrix created by this process by A′′A^{\prime\prime}, and notice that A′′A^{\prime\prime} has the same properties w.r.t. distances as AA^{\prime}, since edges of infinite weight do not affect shortest paths. As each node holds exactly kk edges directed away from it, the nodes themselves are a partial carrier configuration, DD, holding A′′A^{\prime\prime}. That is, for each node vv, we set Dvout={v}D_{v}^{out}=\{v\}. We invoke \IfAppendixLABEL:\next ( (Partial Configuration Completion).)Lemma B.18, within O~(nk/c+n/(kc)+1)=O~(kn1/3/c+n2/3/c+1)\tilde{O}(\sqrt{nk}/c+n/(k\cdot c)+1)=\tilde{O}(k\cdot n^{1/3}/c+n^{2/3}/c+1) rounds, since k=Ω(n1/3)k=\Omega(n^{1/3}), in order to get a carrier configuration DD^{\prime} which holds A′′A^{\prime\prime}.

Finally, we iteratively square A′′A^{\prime\prime} by applying \IfAppendixLABEL:\next ( (Sparse Matrix Multiplication).)Theorem 3.1. That is, we compute h(A′′×A′′)h(A^{\prime\prime}\times A^{\prime\prime}), where hh takes a matrix and leaves only the kk smallest entries in each row. Then, we compute h(h((A′′)2)×h((A′′)2))h(h((A^{\prime\prime})^{2})\times h((A^{\prime\prime})^{2})), and so forth. Repeating this procedure for O(logn)O(\log n) iterations results in an output matrix which holds in row vv edges only to some kk closest nodes to vv. We perform O(logn)O(\log n) matrix multiplications, each taking O~(kn1/3/c+n/(kc)+1)=O~(kn1/3/c+n2/3/c+1)\tilde{O}(k\cdot n^{1/3}/c+n/(k\cdot c)+1)=\tilde{O}(k\cdot n^{1/3}/c+n^{2/3}/c+1) rounds, since k=Ω(n1/3)k=\Omega(n^{1/3}) and we always multiply two matrices with at most kk elements per row, due to \IfAppendixLABEL:\next ( (Sparse Matrix Multiplication).)Theorem 3.1. ∎
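For intuition, the following centralized reference sketch (ours) mirrors the process of [14] which the proof above implements in the 𝖠𝖢(𝖼) model: truncate each row of the adjacency matrix to its k smallest entries, then repeatedly square under the (min,+) product, re-truncating after every squaring.

```python
import math

INF = math.inf

def truncate_rows(A, k):
    """Keep only the k smallest entries in each row (ties broken arbitrarily)."""
    n = len(A)
    B = [[INF] * n for _ in range(n)]
    for i in range(n):
        for j in sorted(range(n), key=lambda j: A[i][j])[:k]:
            B[i][j] = A[i][j]
    return B

def minplus(A, B):
    """The (min,+) matrix product."""
    n = len(A)
    C = [[INF] * n for _ in range(n)]
    for i in range(n):
        for l in range(n):
            if A[i][l] < INF:
                for j in range(n):
                    if A[i][l] + B[l][j] < C[i][j]:
                        C[i][j] = A[i][l] + B[l][j]
    return C

def k_nearest(adj, k):
    """adj: n x n weighted adjacency matrix with 0 on the diagonal."""
    A = truncate_rows(adj, k)
    for _ in range(math.ceil(math.log2(len(adj))) + 1):  # O(log n) squarings
        A = truncate_rows(minplus(A, A), k)
    return A  # row v holds distances to (some) k closest nodes to v
```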

Lemma B.21 ((S,d,k)(S,d,k)-source detection Algorithm).

Given a graph G=(V,E)G=(V,E), where n=|V|n=|V| and m=|E|m=|E|, held in a carrier configuration CC, and given S,d,kS,d,k, where k=|S|=Ω(n1/3)k=|S|=\Omega(n^{1/3}) and d[n]d\in[n], it is possible in the 𝖠𝖢(𝖼)\mathsf{AC(c)} model, within O~(([k+m/n]n1/3/c+n2/3/c+1)(d+1))\tilde{O}(([k+m/n]\cdot n^{1/3}/c+n^{2/3}/c+1)\cdot(d+1)) rounds, w.h.p., to output a directed graph G=(V,E)G^{\prime}=(V,E^{\prime}) held in a carrier configuration CC^{\prime}, where EE^{\prime} contains an edge from every node vv to every node sSs\in S which is at most dd hops away from vv, with weight dGd(v,s)d_{G}^{d}(v,s). Notice that it can be the case that EEE\not\subset E^{\prime}.

It is assumed that the IDs of the nodes in SS are known to all of VV.

Proof of Lemma B.21.

In [14], it is shown that the following process solves the problem. Denote by AA the adjacency matrix of GG. Denote by AA^{\prime} the sparsified adjacency matrix with edges only entering nodes in SS. The matrix B=Ad1×AB=A^{d-1}\times A^{\prime} is the solution to the problem.

We construct a carrier configuration which holds AA^{\prime}. Fix vVv\in V. Denote by E(v,S)E(v,S) the edges from vv directed towards nodes in SS. Node vv uses \IfAppendixLABEL:\next ( (Learn Carried Information with Predicate).)Lemma B.16 to learn E(v,S)E(v,S), in O~(|S|/c+1)=O~(k/c+1)\tilde{O}(|S|/c+1)=\tilde{O}(k/c+1) rounds. We construct a partial carrier configuration FF which contains for each vv the edges E(v,S)E(v,S), by setting Fvout={v}F_{v}^{out}=\{v\}, since the average degree in FF is exactly |S||S| (assuming that if a node vv does not have an edge to a node in SS, it inserts a dummy edge with infinite weight). Using \IfAppendixLABEL:\next ( (Partial Configuration Completion).)Lemma B.18, we turn FF into a carrier configuration DD holding AA^{\prime}, in O~(nk/c+n/(kc)+1)=O~(kn1/3/c+n2/3/c+1)\tilde{O}(\sqrt{nk}/c+n/(k\cdot c)+1)=\tilde{O}(k\cdot n^{1/3}/c+n^{2/3}/c+1) rounds, since k=Ω(n1/3)k=\Omega(n^{1/3}). Finally, we perform d1d-1 multiplications. We first compute the product A×AA\times A^{\prime} by invoking \IfAppendixLABEL:\next ( (Sparse Matrix Multiplication).)Theorem 3.1 with the carrier configurations C,DC,D, to get a carrier configuration EE, which holds A×AA\times A^{\prime}, in O~([k+m/n]n1/3/c+n/(kc)+1)=O~([k+m/n]n1/3/c+n2/3/c+1)\tilde{O}([k+m/n]\cdot n^{1/3}/c+n/(k\cdot c)+1)=\tilde{O}([k+m/n]\cdot n^{1/3}/c+n^{2/3}/c+1) rounds, since the average number of finite elements per row in AA is at most O(m/n)O(m/n), in AA^{\prime} it is at most kk, and k=Ω(n1/3)k=\Omega(n^{1/3}). Notice that while this invocation of \IfAppendixLABEL:\next ( (Sparse Matrix Multiplication).)Theorem 3.1 only computes the kk smallest entries in each row, there are only at most kk finite entries in each row of A×AA\times A^{\prime} – all the columns not corresponding to nodes in SS do not contain finite values. Thus, the invocation of \IfAppendixLABEL:\next ( (Sparse Matrix Multiplication).)Theorem 3.1 actually computes A×AA\times A^{\prime} exactly. We now multiply CC by EE in the same way and repeat d2d-2 more times until achieving the final result B=Ad1×AB=A^{d-1}\times A^{\prime}, taking O~(([k+m/n]n1/3/c+n2/3/c+1)(d+1))\tilde{O}(([k+m/n]\cdot n^{1/3}/c+n^{2/3}/c+1)\cdot(d+1)) rounds in total. ∎
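Similarly, a centralized sketch (ours) of the (S,d,k)-source detection process from [14]: sparsify A into A' by keeping only the columns of S, then left-multiply by A under (min,+) a total of d-1 times; with 0 on the diagonal of A, the result holds the d-hop-bounded distances into S.

```python
import math

INF = math.inf

def minplus(A, B):
    n = len(A)
    C = [[INF] * n for _ in range(n)]
    for i in range(n):
        for l in range(n):
            if A[i][l] < INF:
                for j in range(n):
                    if A[i][l] + B[l][j] < C[i][j]:
                        C[i][j] = A[i][l] + B[l][j]
    return C

def source_detection(adj, S, d):
    """adj: n x n matrix with 0 on the diagonal; S: set of source indices.
    Returns B = A^{d-1} x A', so B[v][s] is the d-hop-bounded distance v -> s."""
    n = len(adj)
    A_sparse = [[adj[i][j] if j in S else INF for j in range(n)]
                for i in range(n)]
    B = A_sparse
    for _ in range(d - 1):
        B = minplus(adj, B)  # prepend one more hop on the left
    return B
```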

We turn our attention to proving \IfAppendixLABEL:\next ( (Hopset Construction).)Theorem B.19.

Proof of Theorem B.19.

In the 𝖠𝖢(𝖼)\mathsf{AC(c)} model, some preparation is necessary before constructing the desired hopset.  

Initialization: We initialize the hopset HH and denote by DD the carrier configuration holding it. We initialize HH with Θ~(n3/2)\tilde{\Theta}(n^{3/2}) arbitrary, hardcoded edges all with infinite weights. While adding arbitrary edges of infinite weight does not affect distances, it ensures that throughout the entire algorithm HH will contain Ω~(n3/2)\tilde{\Omega}(n^{3/2}) edges. Further, as no more than O~(n3/2)\tilde{O}(n^{3/2}) are added in the algorithm which follows, HH will always contain Θ~(n3/2)\tilde{\Theta}(n^{3/2}) edges, ensuring that the average degree in HH is always Θ~(n)\tilde{\Theta}(\sqrt{n}), and thus the maximal number of carrier nodes that each node has is at most O~(n/n)=O~(n)\tilde{O}(n/\sqrt{n})=\tilde{O}(\sqrt{n}).

Whenever a set of edges is added to HH, it is assumed that if an edge is added from node vv to node uu, then also an edge is added in the opposite direction. As such, assume that whenever we add sets of edges to HH, we then reset HH to be min(H,HT)\min(H,H^{T}), by invoking \IfAppendixLABEL:\next ( (Merging).)Lemma B.17 on DD. Notice that due to \IfAppendixLABEL:\next ( (𝖠𝖢(𝖼)\mathsf{AC(c)} Carrier Configuration).)Definition 13, if we set Dvin=Dvout{D^{\prime}}_{v}^{in}={D}_{v}^{out} and Dvout=Dvin{D^{\prime}}_{v}^{out}={D}_{v}^{in}, we get that DD^{\prime} is a carrier configuration holding HTH^{T}. As the maximal number of carrier nodes each node has in DD is O~(n)\tilde{O}(\sqrt{n}), these invocations of \IfAppendixLABEL:\next ( (Merging).)Lemma B.17 take O~(n1/2/c+1)\tilde{O}(n^{1/2}/c+1) rounds.  

Construction: We now begin computing the edges of HH. Initially, we use \IfAppendixLABEL:\next ( (kk-nearest Algorithm).)Lemma B.20 on CC, with k=Θ~(n)k=\tilde{\Theta}(\sqrt{n}) in order to get a carrier configuration KK with an edge from each node vVv\in V to its Θ~(n)\tilde{\Theta}(\sqrt{n}) nearest neighbors. This takes O~(n5/6/c+1)\tilde{O}(n^{5/6}/c+1) rounds. We add the edges from KK to DD using \IfAppendixLABEL:\next ( (Merging).)Lemma B.17 in O~(n1/2/c+1)\tilde{O}(n^{1/2}/c+1) rounds.

Next, we sample nodes SS, where |S|=Θ~(n)|S|=\tilde{\Theta}(\sqrt{n}), by letting every node join SS independently with probability Θ~(n1/2)\tilde{\Theta}(n^{-1/2}), ensuring, w.h.p., that each node vVv\in V holds in DD its distance to at least one node of SS. We use \IfAppendixLABEL:\next ( (Broadcasting).)Lemma B.5, in O~(n1/2/c+1)\tilde{O}(n^{1/2}/c+1) rounds, in order to let every node vVv\in V know all of SS.

We solve the (S,d,k)(S,d,k)-source detection problem with SS over the graph GHG\cup H, and add the resulting edges to HH. We need to do this iteratively for O~(1)\tilde{O}(1) iterations. In each iteration, we invoke \IfAppendixLABEL:\next ( ((S,d,k)(S,d,k)-source detection Algorithm).)Lemma B.21 with S,d=O(1/ε),k=|S|=Θ~(n)S,d=O(1/\varepsilon),k=|S|=\tilde{\Theta}(\sqrt{n}) in O~(((n+m/n)n1/3/c+n2/3/c+1)(1/ε+1))=O~((n5/6/c+m/(n2/3c)+1)/ε)\tilde{O}(((\sqrt{n}+m/n)\cdot n^{1/3}/c+n^{2/3}/c+1)\cdot(1/\varepsilon+1))=\tilde{O}((n^{5/6}/c+m/(n^{2/3}\cdot c)+1)/\varepsilon) rounds. We have now constructed the hopset HH, and therefore can create CC^{\prime} by executing \IfAppendixLABEL:\next ( (Merging).)Lemma B.17 on CC and DD, taking O~((n1/2+m/n)/c+1)\tilde{O}((n^{1/2}+m/n)/c+1) rounds, since we assume that m=Ω(n3/2)m=\Omega(n^{3/2}), completing the proof. ∎

B.4 SSSP

We begin by showing how to perform Bellman-Ford iterations [25] in the 𝖠𝖢(𝖼)\mathsf{AC(c)} model using carrier configurations. Given a source node ss in a graph GG, in Bellman-Ford iteration ii, every node vGv\in G broadcasts to its neighbors dGi1(s,v)d^{i-1}_{G}(s,v), its distance to ss using at most i1i-1 hops, and then calculates dGi(s,v)d^{i}_{G}(s,v) by taking the minimal distance to ss which it receives from its neighbors in this iteration.
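For reference, a minimal centralized sketch (ours) of these synchronous, hop-bounded iterations; the lemma below implements exactly this message pattern through the carrier nodes.

```python
import math

def bellman_ford_iterations(n, edges, s, iters):
    """edges: list of (u, v, w) triples, with both directions present for an
    undirected graph. Returns dist[v], the weight of a shortest path from s
    to v using at most iters hops."""
    dist = [math.inf] * n
    dist[s] = 0
    for _ in range(iters):
        new = dist[:]                 # estimates from the previous iteration
        for u, v, w in edges:
            if dist[u] + w < new[v]:
                new[v] = dist[u] + w  # v hears d^{i-1}(s,u) + w(u,v) from u
        dist = new
    return dist

edges = [(0, 1, 4), (1, 0, 4), (1, 2, 1), (2, 1, 1), (0, 2, 7), (2, 0, 7)]
print(bellman_ford_iterations(3, edges, s=0, iters=2))  # [0, 4, 5]
```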

Lemma B.22 (Bellman-Ford Iterations in 𝖠𝖢(𝖼)\mathsf{AC(c)}).

Given a (directed or undirected) weighted graph G=(V,E)G=(V,E) with average degree kk held in a carrier configuration CC, and a source node sVs\in V, it is possible in the 𝖠𝖢(𝖼)\mathsf{AC(c)} model, within O~((k/c+1)i)\tilde{O}((k/c+1)\cdot i) rounds, w.h.p., to perform ii iterations of the Bellman-Ford algorithm on GG with ss as the source.

Proof of Lemma B.22.

Fix vVv\in V. Node vv computes dG1(s,v)d_{G}^{1}(s,v), within O~(1)\tilde{O}(1) rounds, using \IfAppendixLABEL:\next ( (Carriers Broadcast and Aggregate).)Lemma B.14. Then, to simulate the jj-th iteration, node vv broadcasts to CvoutC_{v}^{out} the value dGj1(s,v)d_{G}^{j-1}(s,v), in O~(1)\tilde{O}(1) rounds. Each node uCvoutu\in C_{v}^{out}, for every edge e={v,w}e=\{v,w\} which uu stores, sends to the node uCwinu^{\prime}\in C_{w}^{in} which stores ee, the value dGj1(s,v)+w(e)d_{G}^{j-1}(s,v)+w(e). Since the average degree in GG is kk, and due to \IfAppendixLABEL:\next ( (𝖠𝖢(𝖼)\mathsf{AC(c)} Carrier Configuration).)Definition 13, it holds that each uVu\in V sends and receives at most kk messages in this step, thus taking O~(k/c+1)\tilde{O}(k/c+1) rounds, using \IfAppendixLABEL:\next ( (Routing).)Lemma B.1. Finally, vv sets dGj(s,v)d_{G}^{j}(s,v) to be the minimum over all the values which nodes CvinC_{v}^{in} received in this iteration, within O~(1)\tilde{O}(1) rounds. We repeat the above process i1i-1 times. ∎

We show how to compute exact SSSP in the 𝖠𝖢(𝖼)\mathsf{AC(c)} model.

Theorem B.23 (Exact SSSP in 𝖠𝖢(𝖼)\mathsf{AC(c)}).

Given a weighted undirected graph G=(V,E)G=(V,E) with n=|V|n=|V| and m=|E|m=|E|, held in a carrier configuration CC, and a source node sVs\in V, it is possible in the 𝖠𝖢(𝖼)\mathsf{AC(c)} model, within O~(m1/2n1/6/c+n/c+n7/6/m1/2)\tilde{O}(m^{1/2}n^{1/6}/c+n/c+n^{7/6}/m^{1/2}) rounds, w.h.p., to ensure that every node vGv\in G knows the value dG(s,v)d_{G}(s,v).

Proof of Theorem B.23.

It was proven in [54] that it is possible to solve exact SSSP on a weighted, undirected graph GG with a source ss by using the following steps. First, one solves the kk-nearest problem, for some kk, and creates the graph GG^{\prime} by starting with GG and adding weighted edges from each node to kk of its closest neighbors, with the weights equal to the weighted distance between the nodes in GG. Then, one performs O(n/k)O(n/k) Bellman-Ford iterations on GG^{\prime} with ss as the source, and it is guaranteed that for each vVv\in V, it holds that dGO(n/k)(s,v)=dGn(s,v)d^{O(n/k)}_{G^{\prime}}(s,v)=d^{n}_{G}(s,v).

Thus, in order to solve exact SSSP, we choose the value k=m1/2/n1/6k^{\prime}=m^{1/2}/n^{1/6}, which balances the number of rounds required for our kk-nearest and Bellman-Ford implementations (\IfAppendixLABEL:\next ( (kk-nearest Algorithm).)Lemma B.20 requires k=Ω(n1/3)k^{\prime}=\Omega(n^{1/3}), and this holds since we assume that the graph is connected, implying mnm\geq n).
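To see why this choice balances the two costs, note that the kk^{\prime}-nearest computation costs O~(kn1/3/c)\tilde{O}(k^{\prime}\cdot n^{1/3}/c) rounds, while the O(n/k)O(n/k^{\prime}) Bellman-Ford iterations on a graph of average degree Θ(m/n+k)\Theta(m/n+k^{\prime}) cost O~((m/n)(n/k)/c)=O~(m/(kc))\tilde{O}((m/n)\cdot(n/k^{\prime})/c)=\tilde{O}(m/(k^{\prime}\cdot c)) rounds. Equating the dominant terms, kn1/3=m/kk^{\prime}\cdot n^{1/3}=m/k^{\prime}, gives k2=m/n1/3k^{\prime 2}=m/n^{1/3}, that is, k=m1/2/n1/6k^{\prime}=m^{1/2}/n^{1/6}.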

Due to \IfAppendixLABEL:\next ( (kk-nearest Algorithm).)Lemma B.20, we can solve the kk^{\prime}-nearest problem in O~(kn1/3/c+n2/3/c+1)=O~(m1/2n1/6/c+n/c+1)\tilde{O}(k^{\prime}\cdot n^{1/3}/c+n^{2/3}/c+1)=\tilde{O}(m^{1/2}n^{1/6}/c+n/c+1) rounds. This gives us a carrier configuration DD, which for every node holds edges directed away from it to kk^{\prime} of its nearest nodes. We need to add these edges from DD to the carrier configuration CC which holds GG. Thus, we invoke \IfAppendixLABEL:\next ( (Merging).)Lemma B.17, in order to get a carrier configuration EE which includes the edges from CC and from DD. Notice that \IfAppendixLABEL:\next ( (Merging).)Lemma B.17 always takes at most O~(n/c+1)\tilde{O}(n/c+1) rounds, and so this fits within our desired complexity.

Finally, we perform O(n/k)O(n/k^{\prime}) Bellman-Ford iterations on EE. Notice that the average degree in EE is Θ(m/n+k)\Theta(m/n+k^{\prime}). Therefore, due to \IfAppendixLABEL:\next ( (Bellman-Ford Iterations in 𝖠𝖢(𝖼)\mathsf{AC(c)}).)Lemma B.22, this completes within O~(((m/n+k)/c+1)n/k)=O~(((m/n+k)/c)(n/k)+n/k)=O~(((m/n+k)/c)(n/k)+n7/6/m1/2)=O~((m/k+n)/c+n7/6/m1/2)=O~(m1/2n1/6/c+n/c+n7/6/m1/2)\tilde{O}(((m/n+k^{\prime})/c+1)\cdot n/k^{\prime})=\tilde{O}(((m/n+k^{\prime})/c)\cdot(n/k^{\prime})+n/k^{\prime})=\tilde{O}(((m/n+k^{\prime})/c)\cdot(n/k^{\prime})+n^{7/6}/m^{1/2})=\tilde{O}((m/k^{\prime}+n)/c+n^{7/6}/m^{1/2})=\tilde{O}(m^{1/2}n^{1/6}/c+n/c+n^{7/6}/m^{1/2}) rounds. ∎

We now proceed to showing how to compute an approximation of SSSP in the 𝖠𝖢(𝖼)\mathsf{AC(c)} model.

Theorem B.24 ((1+ε)(1+\varepsilon)-Approximation for SSSP in 𝖠𝖢(𝖼)\mathsf{AC(c)}).

There exists an algorithm in the 𝖠𝖢(𝖼)\mathsf{AC(c)} model, such that given a weighted, undirected input graph G=(V,E)G=(V,E), with n=|V|n=|V| and m=|E|=Ω(n3/2)m=|E|=\Omega(n^{3/2}), held in some carrier configuration CC, some 0<ε<10<\varepsilon<1, and a source sVs\in V, ensures that each node knows a (1+ε)(1+\varepsilon)-approximation to its distance from ss. The round complexity of this algorithm is O~((n5/6/c+m/(n2/3c)+1)/ε)\tilde{O}((n^{5/6}/c+m/(n^{2/3}\cdot c)+1)/\varepsilon), w.h.p.

Proof of Theorem B.24.

We construct a (logn/ε,ε)(\log n/\varepsilon,\varepsilon)-hopset HH by using \IfAppendixLABEL:\next ( (Hopset Construction).)Theorem B.19 on CC, in O~((n5/6/c+m/(n2/3c)+1)/ε)\tilde{O}((n^{5/6}/c+m/(n^{2/3}\cdot c)+1)/\varepsilon) rounds, and obtain G=(V,EH)G^{\prime}=(V,E\cup H) held in a carrier configuration DD. Due to the definition of HH, for every vVv\in V, it holds that dG(s,v)dGlogn/ε(s,v)(1+ε)dG(s,v)d_{G}(s,v)\leq d_{G^{\prime}}^{\log n/\varepsilon}(s,v)\leq(1+\varepsilon)\cdot d_{G}(s,v). Notice that since GG, HH, and GG^{\prime} are undirected, for each vVv\in V, the sets DvoutD_{v}^{out}, DvinD_{v}^{in} hold the same edges, and so it is irrelevant which one of them we use. Thus, we denote Dv=DvoutD_{v}=D_{v}^{out} from here on.

We now perform O(logn/ε)O(\log n/\varepsilon) Bellman-Ford iterations on GG^{\prime}, in order to ensure that every node vv knows dGlogn/ε(s,v)d_{G^{\prime}}^{\log n/\varepsilon}(s,v) and as such a (1+ε)(1+\varepsilon)-approximation to dG(s,v)d_{G}(s,v). To do so, we invoke \IfAppendixLABEL:\next ( (Bellman-Ford Iterations in 𝖠𝖢(𝖼)\mathsf{AC(c)}).)Lemma B.22 on GG^{\prime}, which is held in DD, and the source ss, requiring O~((logn/ε)(n+m/n)/c+1)=O~((n1/2/c+m/(nc)+1)/ε)\tilde{O}((\log n/\varepsilon)\cdot(\sqrt{n}+m/n)/c+1)=\tilde{O}((n^{1/2}/c+m/(n\cdot c)+1)/\varepsilon) rounds, since |H|=O~(n3/2)|H|=\tilde{O}(n^{3/2}). ∎

Finally, as our goal is to simulate our SSSP approximation algorithm in other distributed models directly, we provide the following wrapper statement which receives as input a graph where each node knows its neighbors, instead of a graph held in a carrier configuration.

Theorem B.25 ((1+ε)(1+\varepsilon)-Approximation for SSSP in 𝖠𝖢(𝖼)\mathsf{AC(c)} (Wrapper)).

There exists an algorithm in the 𝖠𝖢(𝖼)\mathsf{AC(c)} model, such that given a weighted, undirected input graph G=(V,E)G=(V,E), with n=|V|n=|V| and m=|E|m=|E|, where each node vv knows all the edges incident to it, and the communication tokens of all of its neighbors in GG, some 0<ε<10<\varepsilon<1, and a source sVs\in V, ensures that each node knows a (1+ε)(1+\varepsilon)-approximation to its distance from ss. The round complexity of this algorithm is O~((n5/6/c+m/(n2/3c)+1)/ε+Δ/c)\tilde{O}((n^{5/6}/c+m/(n^{2/3}\cdot c)+1)/\varepsilon+\Delta/c), where Δ\Delta is the maximal degree in the graph, w.h.p.

Proof of Theorem B.25.

We wish to invoke \IfAppendixLABEL:\next ( ((1+ε)(1+\varepsilon)-Approximation for SSSP in 𝖠𝖢(𝖼)\mathsf{AC(c)}).)Theorem B.24, yet the main hurdle in our way is that it requires a graph with at least Ω(n3/2)\Omega(n^{3/2}) edges. Therefore, we build G=(V,E)G^{\prime}=(V,E^{\prime}) such that |E|=Ω(n3/2)\left|E^{\prime}\right|={\Omega}(n^{3/2}), EEE\subseteq E^{\prime}, and for each u,vVu,v\in V, dG(u,v)=dG(u,v)d_{G^{\prime}}(u,v)=d_{G}(u,v). We call a node uu with degree less than O(n1/2){O}(n^{1/2}) a low degree node. Each low degree node uu meets Ω(n1/2)\Omega(n^{1/2}) nodes which are not its neighbors in GG, and adds edges with infinite weight to those nodes. To meet new nodes, each low degree node uu sends its identifier and communication token to O~(n1/2){\tilde{{O}}}(n^{1/2}) random nodes, and each node which received the communication token of uu responds with its identifier and communication token. By \IfAppendixLABEL:\next ( (Sampling Unique Elements).)Lemma B.26, a low degree node uu meets at least Ω(n1/2){\Omega}(n^{1/2}) unique nodes which are not its original neighbors in GG, w.h.p. As such, uu connects itself with edges to these nodes which it meets. Notice that each node was sampled at most O~(n1/2){\tilde{{O}}}(n^{1/2}) times w.h.p. Thus, the maximum degree, Δ\Delta^{\prime}, in GG^{\prime} is Δ+O~(n1/2)\Delta+{\tilde{{O}}}(n^{1/2}). The number of edges added is Θ~(n3/2){\tilde{{\Theta}}}(n^{3/2}), w.h.p., implying that the number of edges in GG^{\prime} is m=m+Θ~(n3/2)=Ω~(n3/2)m^{\prime}=m+{\tilde{{\Theta}}}(n^{3/2})={\tilde{{\Omega}}}(n^{3/2}).

We initialize a carrier configuration CC from GG^{\prime}, using \IfAppendixLABEL:\next ( (Initialize Carrier Configuration).)Lemma B.13 in O~(Δ/c+1)=O~(Δ/c+n1/2/c+1)\tilde{O}(\Delta^{\prime}/c+1)={\tilde{{O}}}(\Delta/c+n^{1/2}/c+1) rounds. Then, we invoke \IfAppendixLABEL:\next ( ((1+ε)(1+\varepsilon)-Approximation for SSSP in 𝖠𝖢(𝖼)\mathsf{AC(c)}).)Theorem B.24 in O~((n5/6/c+m/(n2/3c)+1)/ε)=O~((n5/6/c+m/(n2/3c)+1)/ε)\tilde{O}((n^{5/6}/c+m^{\prime}/(n^{2/3}\cdot c)+1)/\varepsilon)=\tilde{O}((n^{5/6}/c+m/(n^{2/3}\cdot c)+1)/\varepsilon) rounds w.h.p. ∎
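A toy sketch (ours, with hypothetical names) of the densification step above; infinite-weight edges leave all distances unchanged while raising the edge count as Theorem B.24 requires.

```python
import math, random

def densify(n, edges):
    """edges: set of frozenset({u, v}) pairs; assumes sqrt(n) << n.
    Returns the set of added edges, each carrying weight +infinity."""
    deg = [0] * n
    for e in edges:
        for v in e:
            deg[v] += 1
    target = math.isqrt(n)
    added = set()
    for u in range(n):
        while deg[u] < target:       # u is a low degree node
            v = random.randrange(n)  # "meet" a uniformly random node
            e = frozenset((u, v))
            if v != u and e not in edges and e not in added:
                added.add(e)         # a new infinite-weight edge
                deg[u] += 1
                deg[v] += 1
    return added

edges = {frozenset((i, i + 1)) for i in range(99)}  # a path on 100 nodes
print(len(densify(100, edges)), "infinite-weight edges added")
```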

Lemma B.26 (Sampling Unique Elements).

Let cc be some constant. Let SS be a set of nn elements. Let BSB\subseteq S be a set of at most bnb\sqrt{n} bad elements. Denote by G=SBG=S\setminus B the set of good elements. Let rnr\sqrt{n} be the number of required good elements. Let DD be a sequence of length at least dnr0.5lnnlnr+10.5lnnln(b+r)+clnn0.5lnnln(b+r)d\geq\sqrt{n}r\frac{0.5\ln{n}-\ln{r}+1}{0.5\ln{n}-\ln{(b+r)}}+\frac{c\ln{n}}{0.5\ln{n}-\ln{(b+r)}} elements, where each element is sampled independently and uniformly at random from the set SS. Then, there are more than rnr\sqrt{n} unique good elements in the sequence DD with probability at least 1nc1-n^{-c}.

Proof of Lemma B.26.

We upper bound the number of sequences which contain rnr\sqrt{n} or fewer different good elements. There are (nbnrn){\binom{n-b\sqrt{n}}{r\sqrt{n}}} subsets of rnr\sqrt{n} good elements which may appear, and for each of these subsets there are at most ((b+r)n)d((b+r)\sqrt{n})^{d} sequences all of whose elements come from the subset or from BB. Note that many sequences are counted multiple times; however, since we only need an upper bound, this suffices. So the number of bad sequences is upper bounded by:

(nbnrn)((b+r)n)d((nbn)ern)rn((b+r)n)d\displaystyle{\binom{n-b\sqrt{n}}{r\sqrt{n}}}\cdot((b+r)\sqrt{n})^{d}\leq\left(\frac{(n-b\sqrt{n})\cdot e}{r\sqrt{n}}\right)^{r\sqrt{n}}\cdot\left((b+r)\sqrt{n}\right)^{d}\leq
(ne)rn(rn)rn((b+r)n)dndc,\displaystyle\frac{(n\cdot e)^{r\sqrt{n}}}{{(r\sqrt{n})}^{r\sqrt{n}}}\cdot\left((b+r)\sqrt{n}\right)^{d}\leq n^{d-c},

for dd which satisfies nd(b+r)d(ner)rnnc\frac{\sqrt{n}^{d}}{(b+r)^{d}}\geq\left(\frac{\sqrt{n}e}{r}\right)^{r\sqrt{n}}n^{c}. Solving this condition for dd results in dnr0.5lnnlnr+10.5lnnln(b+r)+clnn0.5lnnln(b+r)d\geq\sqrt{n}r\frac{0.5\ln{n}-\ln{r}+1}{0.5\ln{n}-\ln{(b+r)}}+\frac{c\ln{n}}{0.5\ln{n}-\ln{(b+r)}}, for large enough nn.

Since the total number of sequences is ndn^{d}, and all sequences are obtained with the same probability, the probability of getting a bad sequence is upper bounded by ncn^{-c}. ∎

B.5 kk-SSP and APSP

We further show results pertaining to approximating distances from more than one source.

Theorem B.27 ((1+ε)(1+\varepsilon)-Approximation for kk-SSP in 𝖠𝖢(𝖼)\mathsf{AC(c)}).

There exists an algorithm in the 𝖠𝖢(𝖼)\mathsf{AC(c)} model, such that given a weighted, undirected input graph G=(V,E)G=(V,E) with n=|V|n=|V| and m=|E|=Ω(n3/2)m=|E|=\Omega(n^{3/2}), held in a carrier configuration CC, some 0<ε<10<\varepsilon<1, and a set of sources MVM\subseteq V, k=|M|=Ω(n1/3)k=|M|=\Omega(n^{1/3}), outputs:

1.

    A directed graph G=(V,E)G^{\prime}=(V,E^{\prime}) held in a carrier configuration DD, where EE^{\prime} contains an edge from every node vv to every node sMs\in M, where the weight of the edge maintains dG(v,s)w((v,s))(1+ε)dG(v,s)d_{G}(v,s)\leq w((v,s))\leq(1+\varepsilon)\cdot d_{G}(v,s). Notice that it can be the case that EEE\not\subset E^{\prime}.

2.

    Every node vVv\in V, knows a (1+ε)(1+\varepsilon)-approximation for dG(s,v)d_{G}(s,v) for every sMs\in M.

The round complexity of this algorithm is O~((n5/6/c+m/(n2/3c)+kn1/3/c+1)/ε)\tilde{O}((n^{5/6}/c+m/(n^{2/3}\cdot c)+k\cdot n^{1/3}/c+1)/\varepsilon), w.h.p.

Proof of Theorem B.27.

This proof is split into three parts.

First, we construct a (logn/ε,ε)(\log n/\varepsilon,\varepsilon)-hopset HH by using \IfAppendixLABEL:\next ( (Hopset Construction).)Theorem B.19 on CC, in O~((n5/6/c+m/(n2/3c)+1)/ε)\tilde{O}((n^{5/6}/c+m/(n^{2/3}\cdot c)+1)/\varepsilon) rounds, and obtain G=(V,EH)G^{\prime}=(V,E\cup H) held in a carrier configuration CC^{\prime}. Due to the definition of HH, for every vVv\in V, it holds that dG(s,v)dGlogn/ε(s,v)(1+ε)dG(s,v)d_{G}(s,v)\leq d_{G^{\prime}}^{\log n/\varepsilon}(s,v)\leq(1+\varepsilon)\cdot d_{G}(s,v).

Next, we make the IDs of the nodes in MM globally known within O~(k/c+1)\tilde{O}(k/c+1) rounds, using \IfAppendixLABEL:\next ( (Broadcasting).)Lemma B.5. This enables us to invoke \IfAppendixLABEL:\next ( ((S,d,k)(S,d,k)-source detection Algorithm).)Lemma B.21 on GG^{\prime} with S=MS=M, k=|M|k=|M|, and d=O(logn/ε)d=O(\log n/\varepsilon), creating the carrier configuration DD which is described in the statement of this theorem. This requires O~(([k+m/n]n1/3/c+n2/3/c+1)(d+1))=O~((m/(n2/3c)+kn1/3/c+n2/3/c+1)/ε)\tilde{O}(([k+m/n]\cdot n^{1/3}/c+n^{2/3}/c+1)\cdot(d+1))=\tilde{O}((m/(n^{2/3}\cdot c)+k\cdot n^{1/3}/c+n^{2/3}/c+1)/\varepsilon) rounds.

Finally, to satisfy the second guarantee, node vv learns all the edges held in DvoutD_{v}^{out}, using \IfAppendixLABEL:\next ( (Learn Carried Information).)Lemma B.15 within O~(k/c+1)\tilde{O}(k/c+1) rounds. ∎

Theorem B.28 ((3+ε)(3+\varepsilon)-Approximation for Scattered APSP in 𝖠𝖢(𝖼)\mathsf{AC(c)}).

There exists an algorithm in the 𝖠𝖢(𝖼)\mathsf{AC(c)} model, such that given a weighted, undirected input graph G=(V,E)G=(V,E) with n=|V|n=|V| and m=|E|=Ω(n3/2)m=|E|=\Omega(n^{3/2}), held in a carrier configuration CC, and some 0<ε<10<\varepsilon<1, solves the (3+ε)(3+\varepsilon)-Approximate Scattered APSP problem (\IfAppendixLABEL:\next ( (Scattered APSP).)Definition 4) on GG.

That is, the algorithm ensures that for every u,vVu,v\in V, there exist nodes wuvw_{uv}, wvuw_{vu} (potentially wuv=wvuw_{uv}=w_{vu}), which each know a (3+ε)(3+\varepsilon) approximation to dG(u,v)d_{G}(u,v), and node uu knows the identifier and communication token of node wuvw_{uv}, while node vv knows the identifier and communication token of wvuw_{vu}.

Further, for a given node uu, the following hold:

1.

    The set Wu={wuv|vV}W_{u}=\{w_{uv}\ |\ v\in V\} contains at most O~(n1/2)\tilde{O}(n^{1/2}) unique nodes.

2.

    Node uu can compute a string of O~(n1/2)\tilde{O}(n^{1/2}) bits, sus_{u}, such that using sus_{u}, for any vVv\in V, it is possible to determine wWuw\in W_{u} such that w=wuvw=w_{uv}.

3.

    Denote Pu={xV|vVP_{u}=\{x\in V\ |\ \exists v\in V s.t. u=wxv}u=w_{xv}\}. It holds that |Pu|=O~(n1/2)|P_{u}|=\tilde{O}(n^{1/2}).

The round complexity of this algorithm is O~((n5/6/c+m/(n2/3c)+1)/ε)\tilde{O}((n^{5/6}/c+m/(n^{2/3}\cdot c)+1)/\varepsilon), w.h.p.

The outline of the proof breaks into two parts – initialization and reshuffling.

Initialization: First, every node vv computes Θ(n1/2)\Theta(n^{1/2}) of its nearest neighbors, K(v)K(v). Next, a (1+ε)(1+\varepsilon)-approximation for distances from VV to a random set AA of Θ~(n1/2)\tilde{\Theta}(n^{1/2}) nodes is computed, denoted by d~\tilde{d}.

It holds that for each vVv\in V, w.h.p., AK(v)A\cap K(v)\neq\emptyset, and so we denote by p(v)p(v) a closest node to vv in AK(v)A\cap K(v).

Reshuffling: Due to \IfAppendixLABEL:\next ( (APSP using kk-nearest and MSSP).)Claim A.3, for every two nodes v,uv,u, it holds that dG(v,p(v))+d~(p(v),u)d_{G}(v,p(v))+\tilde{d}(p(v),u) is a (3+ε)(3+\varepsilon) approximation to dG(v,u)d_{G}(v,u). As GG is undirected, both d~(p(v),u)\tilde{d}(p(v),u) and d~(u,p(v))\tilde{d}(u,p(v)) can be used, and so we work with d~(u,p(v))\tilde{d}(u,p(v)). Thus, we desire a state where for every two nodes v,uv,u, there exists a node wvuw_{vu} which knows dG(v,p(v))d_{G}(v,p(v)) and d~(u,p(v))\tilde{d}(u,p(v)), and whose identifier is known to vv, and a node wuvw_{uv} which knows dG(u,p(u))d_{G}(u,p(u)) and d~(v,p(u))\tilde{d}(v,p(u)) and whose identifier is known to uu, concluding the proof.

Proof of Theorem B.28.

Initialization: Invoke \IfAppendixLABEL:\next ( (kk-nearest Algorithm).)Lemma B.20 on GG, with k=Θ(n1/2)k=\Theta(n^{1/2}), within O~(kn1/3/c+n2/3/c+1)=O~(n5/6/c+1)\tilde{O}(k\cdot n^{1/3}/c+n^{2/3}/c+1)=\tilde{O}(n^{5/6}/c+1) rounds, to get a directed graph G=(V,E)G^{\prime}=(V,E^{\prime}) held in a carrier configuration CC^{\prime}, where EE^{\prime} contains an edge from every node vVv\in V to every node uK(v)u\in K(v) with weight dG(v,u)d_{G}(v,u), where K(v)K(v) is a set of kk closest nodes to vv. Using \IfAppendixLABEL:\next ( (Learn Carried Information).)Lemma B.15, in O~(k/c)=O~(n1/2/c)\tilde{O}(k/c)=\tilde{O}(n^{1/2}/c) rounds, every node vv itself knows all the distances to the nodes in K(v)K(v).

Then, a random set AA of Θ~(n1/2)\tilde{\Theta}(n^{1/2}) nodes is selected, by letting each node join AA with probability Θ(n1/2)\Theta(n^{-1/2}), and Tokens(A)Tokens(A) are broadcast, using \IfAppendixLABEL:\next ( (Broadcasting).)Lemma B.5 within O~(|A|/c)=O~(n1/2/c+1)\tilde{O}(|A|/c)=\tilde{O}(n^{1/2}/c+1) rounds. As seen in \IfAppendixLABEL:\next ( (APSP using kk-nearest and MSSP).)Claim A.3, it holds that for each vVv\in V, w.h.p., AK(v)A\cap K(v)\neq\emptyset, and so we denote by p(v)p(v) a closest node to vv in AK(v)A\cap K(v). Node vv knows p(v)p(v), since it knows the distances to all the nodes in K(v)K(v). Finally, invoke \IfAppendixLABEL:\next ( ((1+ε)(1+\varepsilon)-Approximation for kk-SSP in 𝖠𝖢(𝖼)\mathsf{AC(c)}).)Theorem B.27 on GG, using M=AM=A as the source set, to compute a (1+ε)(1+\varepsilon) approximation for distances from VV to all of AA, denoted by d~\tilde{d}, requiring O~((n5/6/c+m/(n2/3c)+1)/ε)\tilde{O}((n^{5/6}/c+m/(n^{2/3}\cdot c)+1)/\varepsilon) rounds. As per the specifications of Theorem B.27, d~\tilde{d} is stored in a carrier configuration DD as edges in a directed graph G′′G^{\prime\prime}, where for each vV,aAv\in V,a\in A, there is an edge with weight wG′′(v,a)=d~(v,a)w_{G^{\prime\prime}}(v,a)=\tilde{d}(v,a).  

Reshuffling: For every node aAa\in A, denote by C(a)={vV|p(v)=a}C(a)=\{v\in V\ |\ p(v)=a\}. Notice that aa does not know the set C(a)C(a).

First, we compute all the values {|C(a)||aA}\{|C(a)|\ |\ a\in A\} at once and make them known to all the nodes in VV, using \IfAppendixLABEL:\next ( (Aggregation).)Corollary B.6 within O~(|A|/c)=O~(n1/2/c+1)\tilde{O}(|A|/c)=\tilde{O}(n^{1/2}/c+1) rounds.

Base Case (The Set A0A_{0}): Denote by A0A_{0} the nodes aAa\in A with |C(a)|2n/|A|=Θ~(n1/2)|C(a)|\leq 2n/|A|=\tilde{\Theta}(n^{1/2}). Since the values {|C(a)||aA}\{|C(a)|\ |\ a\in A\} are globally known, every node knows which nodes are in A0A_{0}. Fix a node aA0a\in A_{0}. Every node vC(a)v\in C(a) sends to aa the following values: (1) the identifier vv, (2) the value dG(v,a)d_{G}(v,a), and (3) the communication token of vv. Node vv knows all of these values, and also knows the communication token of aa (as Tokens(A)Tokens(A) are broadcast initially), and so node vv can send these three messages to node aa. This requires O~(|C(a)|/c+1)=O~(n1/2/c+1)\tilde{O}(|C(a)|/c+1)=\tilde{O}(n^{1/2}/c+1) rounds using \IfAppendixLABEL:\next ( (Routing).)Lemma B.1. Node aa broadcasts the information it receives to DainD_{a}^{in}, in O~(n1/2/c+1)\tilde{O}(n^{1/2}/c+1) rounds, using \IfAppendixLABEL:\next ( (Carriers Broadcast and Aggregate).)Lemma B.14. Observe some node wDainw\in D_{a}^{in}. Notice that due to Item 4, and due to the fact that there is an edge in DD from every node in VV to aa, there exists some interval Iw=[wb,we][n]I_{w}=[w_{b},w_{e}]\subseteq[n], such that node ww knows the values {d~(v,a)|vIw}\{\tilde{d}(v,a)|v\in I_{w}\}. Node ww sends to each vC(a)v\in C(a) the values wbw_{b} and wew_{e}, using \IfAppendixLABEL:\next ( (Routing).)Lemma B.1. This is possible as ww knows Tokens(C(a))Tokens(C(a)) (since aa broadcast the information it received, including Tokens(C(a))Tokens(C(a)), to DainD_{a}^{in}), and takes O~(|C(a)|/c+|Dain|/c+1)=O~(n1/2/c+1)\tilde{O}(|C(a)|/c+|D_{a}^{in}|/c+1)=\tilde{O}(n^{1/2}/c+1) rounds (|Dain|=O(n/|A|)=O~(n1/2)|D_{a}^{in}|=O(n/|A|)=\tilde{O}(n^{1/2}), as the average degree in DD is |A||A|, and each node aAa\in A has nn edges directed towards it in DD), as ww sends O(|C(a)|)O(|C(a)|) messages and every node in C(a)C(a) receives |Dain||D_{a}^{in}| messages.

Fix some vVv\in V with p(v)A0p(v)\in A_{0}; we claim that the output of the theorem is satisfied for vv. Observe that given any uVu\in V, there exists some node wvuDp(v)inw_{vu}\in D_{p(v)}^{in} which knows both d(v,p(v))d(v,p(v)) and d~(u,p(v))\tilde{d}(u,p(v)), and further vv knows which node wvuw_{vu} is, as it is the node such that uIwu\in I_{w}. Finally, notice that we also satisfy that Wu={wuv|vV}W_{u}=\{w_{uv}\ |\ v\in V\} and Pu={xV|vVP_{u}=\{x\in V\ |\ \exists v\in V s.t. u=wxv}u=w_{xv}\} contain O~(n1/2)\tilde{O}(n^{1/2}) unique nodes, and that it is possible to condense into O~(n1/2)\tilde{O}(n^{1/2}) bits the information describing which node in WuW_{u} is wuvw_{uv}, for any vVv\in V, as this depends only on the intervals which each node in DainD_{a}^{in} holds. Thus, all the conditions of the statement we are proving are satisfied for the case of A0A_{0}.

Iterative Case (Sets AiA_{i}): We proceed in O(logn)O(\log n) iterations. Fix the iteration counter i[O(logn)]i\in[O(\log n)]. Denote by AiA_{i} the set of nodes aAa\in A with 2in/|A|<|C(a)|2i+1n/|A|=Θ~(2i+1n1/2)2^{i}n/|A|<|C(a)|\leq 2^{i+1}n/|A|=\tilde{\Theta}(2^{i+1}n^{1/2}). Notice that aA|C(a)|=n\sum_{a\in A}|C(a)|=n, and therefore |Ai||A|/2i|A_{i}|\leq|A|/2^{i}. Further, the values {|C(a)||aA}\{|C(a)|\ |\ a\in A\} are globally known, implying that the contents of AiA_{i} are globally known, and so all the nodes locally compute an assignment of 2i2^{i} unique nodes H(a)AH(a)\subseteq A to each aAia\in A_{i}.

Fix aAia\in A_{i}. The nodes DainD_{a}^{in} duplicate the information which they hold, so that for each aH(a)a^{\prime}\in H(a), the nodes DainD_{a^{\prime}}^{in} will contain the information held in DainD_{a}^{in}. We use the following observation: Since the graph is connected, for any xAx\in A, the nodes DxinD_{x}^{in} hold exactly nn values, {d~(v,x)|vV}\{\tilde{d}(v,x)\ |\ v\in V\}. Combining this with the average degree in DD being Θ(|A|)\Theta(|A|), gives that |Dain|=O~(n/|A|)=O~(n1/2)|D_{a}^{in}|=\tilde{O}(n/|A|)=\tilde{O}(n^{1/2}). Node aa selects some node h1H(a)h_{1}\in H(a), and sends to h1h_{1} the values Tokens(Dain)Tokens(D_{a}^{in}), in O~(n1/2/c+1)\tilde{O}(n^{1/2}/c+1) rounds. Then, h1h_{1} broadcasts Tokens(Dain)Tokens(D_{a}^{in}) to Dh1inD_{h_{1}}^{in}, in O~(n1/2/c+1)\tilde{O}(n^{1/2}/c+1) rounds, using \IfAppendixLABEL:\next ( (Carriers Broadcast and Aggregate).)Lemma B.14. Due to the observation above, DainD_{a}^{in} and Dh1inD_{h_{1}}^{in} each hold exactly nn values, and so each carrier node in Dh1inD_{h_{1}}^{in} selects a unique carrier node in DainD_{a}^{in} and, within O~(n1/2/c+1)\tilde{O}(n^{1/2}/c+1) rounds, learns all of the information which it holds. Thus, the nodes Dh1inD_{h_{1}}^{in} know all of the information which the nodes DainD_{a}^{in} know. Then, nodes aa and h1h_{1} select nodes h2,h3H(a)h_{2},h_{3}\in H(a), and repeat the process where aa sends information to h2h_{2} and h1h_{1} sends information to h3h_{3}. This process is repeated for log|H(a)|=log2i=i=O~(1)\log|H(a)|=\log 2^{i}=i=\tilde{O}(1) iterations, until the information has been spread to all of H(a)H(a) and their corresponding carrier nodes.

Fix aAia\in A_{i}. The set C(a)C(a) is split into 2i2^{i} roughly equal-sized parts, C1(a),,C2i(a)C_{1}(a),\dots,C_{2^{i}}(a). The main challenge is that no node in the graph knows all of C(a)C(a), and therefore partitioning C(a)C(a) into C1(a),,C2i(a)C_{1}(a),\dots,C_{2^{i}}(a) is not trivial. We overcome this final challenge as follows. Every node wDainw\in D_{a}^{in} observes IwI_{w} (defined above as the interval such that ww knows d~(v,a)\tilde{d}(v,a) for all vIwv\in I_{w}) and sends each vIwv\in I_{w} a message asking if it is in C(a)C(a). Notice that this is possible since ww knows the communication tokens of all of IwI_{w}, due to Item 4. Since |Iw|=O~(|A|)=O~(n1/2)|I_{w}|=\tilde{O}(|A|)=\tilde{O}(n^{1/2}), node ww sends at most O~(n1/2)\tilde{O}(n^{1/2}) messages. Further, as |A|=O~(n1/2)|A|=\tilde{O}(n^{1/2}), each node in the graph receives at most O~(n1/2)\tilde{O}(n^{1/2}) messages, and so all of these messages may be routed in O~(n1/2/c+1)\tilde{O}(n^{1/2}/c+1) rounds using \IfAppendixLABEL:\next ( (Routing).)Lemma B.1. Thus, the identities of all of C(a)C(a) are dispersed across the carrier nodes DainD_{a}^{in}. Similarly to the step above, this information is duplicated, in O~(n1/2/c+1)\tilde{O}(n^{1/2}/c+1) rounds, in order to make sure that for each hH(a)h\in H(a), the carrier nodes DhinD_{h}^{in} hold the identifiers of C(a)C(a) as well. Finally, for each hH(a)h\in H(a) (w.l.o.g. nodes in H(a)H(a) are numbered from 1 to |H(a)||H(a)| – possible since H(a)H(a) is globally known), node hh performs a binary search, by querying its carrier nodes DhinD_{h}^{in}, in order to find the interval Jh=[hb,he]J_{h}=[h_{b},h_{e}] of nodes, such that the number of nodes in C(a)C(a) with identifiers less than hbh_{b} is at most (h1)|C(a)|/|H(a)|(h-1)\cdot|C(a)|/|H(a)|, and the number of nodes in C(a)C(a) with identifiers in the interval JhJ_{h} is roughly |C(a)|/|H(a)||C(a)|/|H(a)|. These nodes form the set Ch(a)C_{h}(a). Node hh broadcasts hbh_{b} and heh_{e} to DhinD_{h}^{in}, and then uses \IfAppendixLABEL:\next ( (Learn Carried Information with Predicate).)Lemma B.16 in order to learn the identifiers and communication tokens of Ch(a)C_{h}(a), in O~(|Ch(a)|/c+1)=O~(n1/2/c+1)\tilde{O}(|C_{h}(a)|/c+1)=\tilde{O}(n^{1/2}/c+1) rounds. Finally, within O~(|Ch(a)|/c+1)=O~(n1/2/c+1)\tilde{O}(|C_{h}(a)|/c+1)=\tilde{O}(n^{1/2}/c+1) rounds, node hh messages the nodes Ch(a)C_{h}(a) to notify them that they are in Ch(a)C_{h}(a), which overcomes the challenge.

Fix aAia\in A_{i} and hH(a)h\in H(a). The nodes in Ch(a)C_{h}(a) repeat the same process as done for the case of A0A_{0}, by communicating with hh. That is, each vCh(a)v\in C_{h}(a) sends to hh the values: (1) the identifier vv, (2) the value dG(v,a)d_{G}(v,a), and (3) the communication token of vv. Then, hh broadcasts this to DhinD_{h}^{in}. As shown for the case of A0A_{0}, this requires O~(n1/2/c+1)\tilde{O}(n^{1/2}/c+1) rounds, and completes the proof. ∎

Appendix C The 𝖧𝗒𝖻𝗋𝗂𝖽\mathsf{Hybrid} Model – Missing Proofs

C.1 Preliminaries – Extended Subsection

C.1.1 Communication Primitives

We observe several basic routing claims which are known in the 𝖧𝗒𝖻𝗋𝗂𝖽\mathsf{Hybrid} model.

In [7, Theorem 2.2], the following is shown for the weaker 𝖭𝗈𝖽𝖾-𝖢𝖺𝗉𝖺𝖼𝗂𝗍𝖺𝗍𝖾𝖽𝖢𝗅𝗂𝗊𝗎𝖾\mathsf{Node\text{-}Capacitated\ Clique} model (in this model there are only global edges), and trivially holds in the 𝖧𝗒𝖻𝗋𝗂𝖽\mathsf{Hybrid} model.

Claim C.1 (Aggregate and Broadcast).

There is an Aggregate-and-Broadcast Algorithm that solves any Aggregate-and-Broadcast Problem in O(logn){O}(\log n) rounds in the 𝖧𝗒𝖻𝗋𝗂𝖽\mathsf{Hybrid} model.

In [49, 8], solutions are presented for the problems in \IfAppendixLABEL:\next ( (Token Dissemination).)Claim C.2 (see [8, Theorem 2.1]) and \IfAppendixLABEL:\next ( (Token Routing).)Claim C.3 (see [49, Theorem 2.2]). Token dissemination is useful for broadcasting, while token routing can be used in a fashion that is more similar to unicast.

Definition 18 (Token Dissemination Problem).

The problem of making kk distinct tokens globally known, where each token is initially known to one node, and each node initially knows at most \ell tokens is called the (k,)(k,\ell)-Token Dissemination (TD) problem.

Claim C.2 (Token Dissemination).

There is an algorithm that solves (k,)(k,\ell)-TD in the 𝖧𝗒𝖻𝗋𝗂𝖽\mathsf{Hybrid} model in O~(k+){\tilde{{O}}}(\sqrt{k}+\ell) rounds, w.h.p.

The following is discussed in [49] for token routing. We later redefine the problem and remove the strong assumption which requires that each receiver knows the number of messages each sender sends it.

Definition 19 (Token Routing Problem).

The token routing problem is defined as follows. Let SVS\subseteq V be a set of sender nodes and RVR\subseteq V be a set of receiver nodes. Each sender needs to send at most kSk_{S} tokens and each receiver needs to receive at most kRk_{R} tokens, of size O(logn){O}(\log n) bits each. Each token has a dedicated receiver node rRr\in R, and each receiver rRr\in R knows the senders it must receive a token from and how many tokens it needs to receive from each sender. The token routing problem is solved when all nodes in RR know all tokens they are the receivers of.

Claim C.3 (Token Routing).

Let S,RVS,R\subseteq V be sets of nodes sampled from VV with probabilities pS=nxS1p_{S}=n^{x_{S}-1} and pR=nxR1p_{R}=n^{x_{R}-1}, for constants xS,xR(0,1]x_{S},x_{R}\in(0,1], respectively. Let kSk_{S} and kRk_{R} be the number of tokens to be sent or received by any node in SS and RR, respectively. Let K=|S|kS+|R|kRK=\left|S\right|\cdot k_{S}+\left|R\right|\cdot k_{R} be the total workload. The token routing problem can be solved in O~(Kn+kS+kR){\tilde{{O}}}(\frac{K}{n}+\sqrt{k_{S}}+\sqrt{k_{R}}) rounds in the 𝖧𝗒𝖻𝗋𝗂𝖽\mathsf{Hybrid} model w.h.p.

The following claim enables sending a polynomial number of messages uniformly at random while obeying the constraints of the model.

Claim C.4 (Uniform Sending).

[8, Lemma 3.1] Presume some 𝖧𝗒𝖻𝗋𝗂𝖽\mathsf{Hybrid} model algorithm takes at most p(n)p(n) rounds for some polynomial pp. Presume that each round, every node sends at most σ=Θ(logn)\sigma={\Theta}(\log n) messages via global edges to σ\sigma targets in VV sampled independently and uniformly at random. Then there is a ρ=Θ(logn)\rho={\Theta}(\log n) such that for sufficiently large nn, in every round, every node in VV receives at most ρ\rho messages per round w.h.p.

C.1.2 Skeleton Graph

We use the notion of the skeleton graph presented in [8, 49] and augment it with additional conditions. In particular, its nodes are well spaced in the graph and satisfy the properties of marked nodes stated above.

Definition 20 (Extended Skeleton Graph).

Given a graph G=(V,E)G=(V,E) and a value 0<x<10<x<1, a graph Sx=(M,ES)S_{x}=(M,E_{S}) is called a skeleton graph in GG, if all of the following hold.

1.

    {v,u}ES\{v,u\}\in E_{S} if and only if there is a path of at most h=Θ~(n1x)h=\tilde{\Theta}{(n^{1-x})} edges between v,uv,u in GG.

2.

    Every node vMv\in M knows all its incident edges in ESE_{S}.

3.

    SxS_{x} is connected.

4.

    For any two nodes v,vMv,v^{\prime}\in M, dS(v,v)=dG(v,v)d_{S}(v,v^{\prime})=d_{G}(v,v^{\prime}).

5.

    For any two nodes u,vVu,v\in V with hop(u,v)hhop(u,v)\geq h, there is at least one shortest path PP from uu to vv in GG, such that any sub-path QQ of PP with at least hh nodes contains a node wMw\in M.

6.

    |M|=Θ~(nx)|M|={\tilde{{\Theta}}}(n^{x}).

7.

    For each vMv\in M there is a helper set HvH_{v} which satisfies:

(a)

      |Hv|=n1x|H_{v}|=n^{1-x}.

(b)

      uHv:hop(u,v)=O~(n1x)\forall u\in H_{v}\colon hop(u,v)={\tilde{{O}}}(n^{1-x}).

(c)

      For each node uVu\in V, there are at most O~(1){\tilde{{O}}}(1) nodes VuMV_{u}\subseteq M such that for each wVuw\in V_{u}, uHwu\in H_{w}.

    4. (d)

      |vMHv|=Ω~(n)\left|\bigcup_{v\in M}H_{v}\right|={\tilde{{\Omega}}}(n).

In this definition, we merge the properties used by [8, 49], slightly adjust Property 7a and prove Property 7d.

Claim C.5 (Skeleton From Random Nodes).

Given a graph G=(V,E), a value 0<x<1, and a set of nodes M marked independently with probability n^{x-1}, there is an algorithm which constructs a skeleton graph S_x=(M,E_S) in Õ(n^{1-x}) rounds w.h.p. If also given a single node s∈V, it is possible to construct S_x=(M∪{s},E_S) without damaging the properties of S_x.

Proof of Claim C.5.

Similarly to [8, Algorithm 7] and [49, Algorithm 6], the algorithm for constructing the skeleton graph S_x=(M,E_S) is to learn the Θ̃(n^{1-x})-hop neighborhood and to run [49, Algorithm 1] to compute the helper sets.

We group and slightly extend the claims given in [8, 49]. Properties 3 and 4 hold w.h.p. since G is connected, see [8, Lemma 4.3] or [49, Lemma C.2]. Property 5 follows from [8, Lemma 4.2] or [49, Lemma C.1]. Property 6 follows from Chernoff Bounds. The helper sets described in Property 7 are computed using [49, Algorithm 1], and in [49, Lemma 2.2], their Properties 7b and 7c are proven. It is also shown there that, w.h.p., for every v∈M it holds that |H_v|≥n^{1-x}, and thus in an additional Õ(n^{1-x}) rounds of local communication, we select exactly n^{1-x} helpers and obtain Property 7a. The remaining Property 7d of the helper sets states that almost all of the nodes in the graph help other nodes. This holds since there are |M|=Ω̃(n^x) skeleton nodes, each has n^{1-x} helpers, and each helper helps Õ(1) skeleton nodes, so, by double counting, the overall number of distinct helpers is at least Ω̃(n^x)·n^{1-x}/Õ(1)=Ω̃(n).

Finally, for adding a given node s to the skeleton graph, notice that, as stated in [49], this node can take as helpers the n^{1-x} nodes closest to it. For each helper node, this at most doubles the number of skeleton nodes it helps (see Property 7c). ∎

For the sake of formality in the following proofs, as some are stated for a set of marked nodes and some for the skeleton graph, we also show the following Corollary C.6 (Construct Skeleton).

Corollary C.6 (Construct Skeleton).

Given a graph G=(V,E) and a value 0<x<1, there is an algorithm which constructs a skeleton graph S_x=(M,E_S) in Õ(n^{1-x}) rounds w.h.p. Further, if also given a single node s∈V, it is possible to ensure that s∈M without damaging the properties of S_x.

Proof of Corollary C.6.

First, mark each node independently with probability n^{x-1}, obtaining a set of skeleton nodes M. Then, using the algorithm from Claim C.5 (Skeleton From Random Nodes), it is possible to construct the skeleton graph S_x=(M,E_S) within Õ(n^{1-x}) rounds w.h.p. If a single node s is given, it is added to the skeleton using the second part of Claim C.5. ∎

We show several primitives related to communication within skeleton graphs.

We show the following claim which, given a skeleton graph S_x=(M,E_S), assigns the nodes M unique IDs from the set [|M|]. This is useful, among other things, for symmetry breaking and synchronization among the skeleton nodes.

Claim C.7 (Unique IDs).

Given a graph G=(V,E) and a skeleton graph S_x=(M,E_S), it is possible to assign the nodes M unique IDs from the set [|M|] within Õ(1) rounds in the Hybrid model, w.h.p.

Proof of Claim C.7.

We construct a binary tree of Õ(1) depth over the nodes M, and then assign each node an ID equal to its index in the pre-order traversal of the tree.

The nodes M compute the node v′∈M with the minimal initial ID (the ID it has due to the definition of the Hybrid model). Notice it is possible to identify v′ and ensure that all nodes in M know the identifier of v′ within Õ(1) rounds due to Claim C.1 (Aggregate and Broadcast). Further, using Claim C.1, the nodes compute |M|.

Next, node v′ chooses two nodes a,b at random from G and sends them each a message. Nodes a,b each reply with a random node α,β, respectively, where a∈H_α, b∈H_β. Node v′ repeats this process as long as it does not receive two distinct nodes α,β. Node v′ then sends messages to both α,β and lets them know that they are its children in the tree. The nodes added to the tree continue this process, each of them randomly choosing two nodes as its children, until it receives two distinct nodes which are not already in the tree, or until some Õ(1) rounds have elapsed. Clearly, w.h.p., this process constructs a binary tree of depth Õ(1) within Õ(1) rounds.

Finally, we would like to assign an ordering to the nodes. Each node tells its parent the size of its subtree. That is, the leaves tell their parents that they are leaves, and whenever a node has heard from all its children, it tells its parent how many nodes are in its subtree. Then, the root of the tree, v′, begins with the ID palette [|M|], takes the first ID for itself, and passes down two contiguous intervals of possible IDs, broken according to the sizes of the subtrees of its children, to its two children – with the left child receiving the interval with smaller IDs. Inductively, each node takes the first ID from the palette it receives from its parent, breaks the palette into two contiguous parts according to the sizes of the subtrees of its children, and sends the part with smaller IDs to its left child and the higher part to its right child. Since the depth of the tree is Õ(1), this completes in Õ(1) rounds, w.h.p. ∎
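The two passes over the tree can be made concrete. The following self-contained Python sketch (our illustration; the Node class is a stand-in for the distributed tree, and IDs are 0-based for simplicity) computes subtree sizes bottom-up and then splits the ID palette top-down, exactly as in the proof:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    size: int = field(init=False, default=1)    # subtree size, reported bottom-up
    new_id: int = field(init=False, default=-1)

def compute_sizes(v: Optional[Node]) -> int:
    """Bottom-up pass: leaves report 1, internal nodes sum their children."""
    if v is None:
        return 0
    v.size = 1 + compute_sizes(v.left) + compute_sizes(v.right)
    return v.size

def assign_preorder_ids(v: Optional[Node], lo: int) -> None:
    """Top-down pass: v takes the first ID of its interval [lo, lo + v.size),
    passes the next left-subtree-size IDs left and the remainder right."""
    if v is None:
        return
    v.new_id = lo
    left_size = v.left.size if v.left else 0
    assign_preorder_ids(v.left, lo + 1)
    assign_preorder_ids(v.right, lo + 1 + left_size)

# Example: a five-node tree receives the IDs 0..4 in pre-order.
root = Node(Node(Node(), Node()), Node())
compute_sizes(root)
assign_preorder_ids(root, 0)
```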

We use the following statement from [18] to prove Lemma 4.5 (SSSP with Low Average and High Maximal Degrees).

Lemma C.8 (Reassign Skeletons).

[18, Lemma 29] Given a graph G=(V,E), a skeleton graph S_x=(M,E_S), a value k which is known to all the nodes, and nodes A⊆V such that each u∈A has at least Θ̃(k·|A|) nodes M_u⊆M in its Θ̃(n^{1-x}) neighborhood, there is an algorithm that assigns K_u⊆M_u nodes to u, where |K_u|=Ω̃(k), such that each node in M is assigned to at most Õ(1) nodes in A. With respect to the set A, it is only required that every node in G knows whether or not it itself is in A – that is, the entire contents of A do not have to be globally known. The algorithm runs in Õ(n^{1-x}) rounds in the Hybrid model, w.h.p.

The skeleton-based techniques allow us to quickly approximate weighted SSSP in the Hybrid model. Once this is done, the following well-known simple reduction allows us to compute a (2+ε)-approximation of the weighted diameter.

Claim C.9 (Diameter from SSSP).

(see e.g. [18, Claim 34]) Given a graph G=(V,E), a value α>0, and an algorithm which computes an α-approximation of weighted SSSP in T rounds of the Hybrid model, there is an algorithm which computes a 2α-approximation of the weighted diameter in T+Õ(1) rounds of the Hybrid model.
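The reduction is the standard one: run the approximate SSSP from an arbitrary source and output twice the largest approximate distance; by the triangle inequality, the result lies between D and 2αD, where D is the weighted diameter. A minimal Python sketch of this step (ours; approx_sssp is a hypothetical α-approximate SSSP oracle):

```python
def approx_diameter(approx_sssp, graph, source):
    """2*alpha-approximate weighted diameter from one alpha-approximate SSSP:
    D <= 2*ecc(source) and d(source, v) <= dist[v] <= alpha * d(source, v)."""
    dist = approx_sssp(graph, source)  # maps each node v to d~(source, v)
    return 2 * max(dist.values())
```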

We use the following basic claim regarding the use of skeleton graphs for distance computations in the Hybrid model. It is proven in [8].

Claim C.10 (Extend Distances).

[8, Theorem 2.7] Let G=(V,E), let S_x=(M,E_S) be a skeleton graph, and let V′⊆V be the set of source nodes. If for each source node s∈V′, each skeleton node v∈M knows the (α,β)-approximate distance d̃(v,s) such that d(v,s)≤d̃(v,s)≤α·d(v,s)+β, then each node u∈V can compute, for all source nodes s∈V′, a value d̃(u,s) such that d(u,s)≤d̃(u,s)≤α·d(u,s)+β, in Õ(n^{1-x}) rounds.

C.2 Oblivious Token Routing

In [49], the token routing problem over a skeleton graph is introduced and solved, where each receiver r knows the number of tokens each sender s has for r. This is insufficient for our purposes, since we work in the o(n^{1/3}) complexity realm with ω(n^{2/3}) skeleton nodes, where we cannot make the identifiers of the skeleton nodes globally known, let alone the number of messages between pairs of nodes. Therefore, we define the following routing problem, in which the receivers know neither the identifiers of the senders nor the number of messages each sender intends to send them.

Definition 21 (Oblivious Token Routing Problem).

The oblivious-token routing problem is defined as follows. Let S⊆V be a set of sender nodes and R⊆V be a set of receiver nodes. Each sender needs to send, and each receiver needs to receive, at most k tokens of size O(log n) bits each. Each token has a dedicated receiver node r∈R, and each sender s∈S and receiver r∈R knows the bound k on the number of tokens the receiver is going to receive. The oblivious-token routing problem is solved when all nodes in R know all tokens they are the receivers of.

Notice that the assumption of knowing a bound on the number k of messages each receiver gets is easy to eliminate, by having the receiver double its estimate and repeat the algorithm until success, for O(log k) iterations. To verify whether some particular invocation succeeded, we can make a node broadcast failure if it sent or received more than half of its global capacity at some point.
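A sketch of this standard guess-and-double wrapper in Python (ours; route_with_bound stands in for one invocation of the routing algorithm, returning False when some node broadcasts failure):

```python
def route_without_known_bound(route_with_bound, instance) -> None:
    """Run the routing algorithm with a doubled estimate k_hat of the
    per-receiver load until it succeeds; since the true bound k is reached
    after at most ceil(log2(k)) doublings, O(log k) iterations suffice."""
    k_hat = 1
    while not route_with_bound(instance, k_hat):
        k_hat *= 2  # failure was broadcast; retry with a doubled estimate
```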

Lemma C.11 (Oblivious Token Routing).

Given a graph G=(V,E) and a skeleton graph S_x=(M,E_S), let k be an upper bound on the number of tokens to be sent or received by any node in M and let K=2·|M|·k be the total workload. The oblivious-token routing problem can be solved in Õ(k/n^{1-x}+n^{1-x}) rounds, w.h.p., in the Hybrid model.

Proof of Lemma C.11.

The problem overcome in [49, Theorem 2.2] (the non-oblivious case) is that even though there are enough helpers near each skeleton node to send and receive all the messages, it is not straightforward to connect the senders' and receivers' helpers. So, in [49] it is suggested to relay messages via some intermediate receivers. This way, a message is sent by a sender to one of its helpers, by the helper to an intermediate receiver, from there to a helper of the receiver, and from there it is sent to the receiver. To compute the intermediate receiver for message number i from s to r, they apply a pseudo-random hash function h(s,r,i).

However, the receiver needs to be able to compute h(s,r,i) as well, so it needs to know the number of messages it is to receive from each sender s, and we cannot assume this for our purposes.

To overcome this limitation, we assign to helper number i∈[n^{1-x}] of the receiver r the intermediate receiver whose identifier is computed as w=h(r,i), where h is a pseudo-random hash function. We deliver the messages in ⌈k/n^{1-x}⌉ phases. To keep the load balanced between phases, for each message j we sample, independently and uniformly at random, a phase p_j∼U[⌈k/n^{1-x}⌉] on which it will be sent. In order to keep the load balanced between the receivers' helpers and intermediate receivers in some phase p, for each message j we also sample, independently and uniformly at random, a receiver's helper index i_j∼U[n^{1-x}]. The intermediate receiver is decided by the hash function h, i.e., we route message j with final receiver r_j via w=h(r_j,i_j). Unlike [49], we apply h on arguments that are not necessarily distinct, which could increase the number of conflicts. However, we show that every time all nodes apply h, each key (r_j,i_j) is used at most Õ(1) times w.h.p., so due to Claim A.2 (Conflicts) the congestion on each intermediate receiver is Õ(1) w.h.p.

The pseudo-code is provided by Algorithm 1.

Algorithm 1: Oblivious-Token Routing Protocol

1. The node with the minimum ID samples and broadcasts Õ(1) bits of seed.
2. Each node uses the seed to sample a pseudo-random hash function h∈H.
3. Each sender s∈M balances the tokens it needs to send between its helpers H_s.
4. Each receiver r enumerates its helpers u∈H_r and informs them about their indices.
5. Each sender's helper v∈V, for each message j it is assigned to send, samples a receiver's helper index i_j∼U[n^{1-x}] and a phase p_j∼U[⌈k/n^{1-x}⌉].
6. For p from 0 to ⌈k/n^{1-x}⌉ do:
7.     Each sender's helper v∈V, for each message j such that p_j=p, sends it to h(r_j,i_j).
8.     Each receiver's helper u∈V, for each receiver r it helps, sends ⟨u,r⟩ to w=h(r,i), where i is the index of u in H_r.
9.     Each intermediate receiver w∈V which receives ⟨u,r⟩ sends all the messages it received for r in this phase to u.
10. Each receiver r∈M collects the messages addressed to it from its helpers H_r.
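To see why a single phase stays light, the following toy Python simulation (ours; the pseudo-random function h is modeled by a lazily-sampled random function, and only one receiver and phase 0 are simulated) mimics Lines 5 and 7 and reports the heaviest intermediate receiver:

```python
import math
import random
from collections import Counter

def one_phase_load(n: int, x: float, k: int) -> int:
    """Sample a phase and a helper index per message (Line 5), route the
    phase-0 messages to h(r, i) (Line 7), and return the maximum number of
    messages any intermediate receiver gets."""
    helpers = max(1, round(n ** (1 - x)))        # |H_r| = n^{1-x}
    phases = max(1, math.ceil(k / helpers))
    h = {}                                        # lazily-sampled random h(r, i)
    load = Counter()
    for _ in range(k):                            # k messages destined to r
        if random.randrange(phases) != 0:         # keep only phase-0 messages
            continue
        i = random.randrange(helpers)
        w = h.setdefault(("r", i), random.randrange(n))
        load[w] += 1
    return max(load.values(), default=0)

# The heaviest intermediate receiver stays polylogarithmic.
print(one_phase_load(n=10_000, x=0.5, k=5_000))
```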

Notice that each node can play five different roles: it could be a sender s, a receiver r, a sender's helper v, a receiver's helper u, and an intermediate receiver w. Moreover, it can be a sender's or a receiver's helper for up to Õ(1) nodes. We show that it can be an intermediate receiver for Õ(1) receivers' helpers w.h.p.

First, all nodes sample a globally known pseudo-random hash function h: V×[n^{1-x}]↦V from the family H of Θ̃(1)-wise independent random functions, which is used to compute the intermediate receivers for each message (Lines 1 and 2). For this, by Claim A.1 (Seed), Õ(1) bits of globally known seed are enough, and the node with the minimal identifier samples and broadcasts them using Claim C.1 (Aggregate and Broadcast). Afterwards, each sender s∈M distributes the tokens between its helpers H_s in a balanced manner – each sender's helper is assigned at most ⌈k/n^{1-x}⌉ messages to send (Line 3). Each receiver enumerates its helpers by identifiers (Line 4). Each sender's helper v, for each message j it has to send, samples a random phase p_j∼U[⌈k/n^{1-x}⌉] and a random receiver's helper index i_j (Line 5).

We then proceed for ⌈k/n^{1-x}⌉ phases. In phase p, each sender's helper v sends each message j for which it sampled p_j=p to the node h(r_j,i_j) (Line 7). Afterwards, in Line 8, each receiver's helper u, for each receiver r it helps, sends ⟨u,r⟩ to h(r,i), where i is the index of u in H_r computed in Line 4. Each intermediate receiver w sends all messages j it received with destination r_j to the node u from which it received ⟨u,r_j⟩ (Line 9).

Line 1 takes Õ(1) rounds by Claim C.1 (Aggregate and Broadcast), and Lines 3, 4 and 10 are implemented using local edges in Õ(n^{1-x}) rounds. There are Õ(⌈k/n^{1-x}⌉) iterations of the loop in Line 6, and we argue that each of them requires Õ(1) rounds of communication via global edges w.h.p. Overall, the complexity is Õ(k/n^{1-x}+n^{1-x}) rounds.

For each of the k messages designated to some receiver r, the phase number p_j is sampled independently with probability Õ(min{1, n^{1-x}/k}), therefore by a Chernoff Bound, there are Õ(n^{1-x}) messages with r as the final destination which are sent in the p-th phase w.h.p. In the p-th phase, a receiver's helper index for each of these messages is sampled with probability 1/n^{1-x}, therefore by a Chernoff Bound it is sampled Õ(1) times w.h.p. By a union bound over all phases, receivers, and receivers' helper indices, in each phase, for each receiver, each receiver's helper index is selected Õ(1) times w.h.p. Thus, by Claim A.2 (Conflicts), each w∈V is selected as an intermediate receiver Õ(1) times and receives Õ(1) messages in Õ(1) rounds w.h.p. This implies that no message is lost during Line 7 and that Lines 7 and 9 take Õ(1) rounds.

Since each node helps at most Õ(1) senders and due to Chernoff Bounds, each sender's helper sends Õ(1) messages w.h.p. in each phase in Line 7. Since each node is a helper to at most Õ(1) receivers, Line 8 also takes Õ(1) rounds w.h.p. Similarly, by Claim A.2 (Conflicts), since there are Õ(n^x)·n^{1-x}=Õ(n) distinct pairs of receiver and receiver's helper index, w.h.p. each intermediate receiver is assigned to at most Õ(1) receivers' helpers. Thus, Line 9 also takes Õ(1) rounds w.h.p. ∎

See 4.1

Proof of Claim 4.1.

The claim follows by an invocation of Lemma C.11 (Oblivious Token Routing) with parameters x, k, K=Õ(n^x·k), resulting in Õ(k/n^{1-x}+n^{1-x})=Õ(n^{1-x}) rounds, as required. ∎

C.3 𝖡𝗋𝗈𝖺𝖽𝖼𝖺𝗌𝗍𝖢𝗈𝗇𝗀𝖾𝗌𝗍𝖾𝖽𝖢𝗅𝗂𝗊𝗎𝖾\mathsf{Broadcast\ Congested\ Clique} Simulation

We use the following claims from [18] to improve the simulation of the Broadcast Congested Clique model in the Hybrid model.

Lemma C.12 (𝖫𝖮𝖢𝖠𝖫\mathsf{LOCAL} Simulation in 𝖧𝗒𝖻𝗋𝗂𝖽\mathsf{Hybrid}).

[18, Lemma 16] Given a graph G=(V,E) and a skeleton graph S_x=(M,E_S), it is possible to simulate one round of the LOCAL model over S_x within Õ(n^{1-x}) rounds in G in the Hybrid model. That is, within Õ(n^{1-x}) rounds in G in the Hybrid model, any two adjacent nodes in S_x can communicate any amount of data between each other.

Lemma C.13 (Sampled neighbors [18, Lemma 3.1]).

Given a graph G=(V,E). For a value q≤n, there is a value x=Õ(n/q) such that the following holds w.h.p.: let V′⊆V be a subset of |V′|=x nodes sampled uniformly at random from V. Then each node u∈V with deg(u)≥q has a neighbor in V′.
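The statement is easy to probe empirically. The following toy Python experiment (ours, on an Erdős–Rényi graph rather than the setting of [18]) samples x=Θ̃(n/q) nodes and checks that every node of degree at least q has a sampled neighbor:

```python
import math
import random

def check_sampled_neighbors(n: int = 3_000, p: float = 0.05, c: float = 1.5) -> bool:
    """Build G(n, p), sample x = c * (n/q) * ln(n) nodes uniformly, and verify
    that every node of degree >= q = np/2 sees at least one sampled node."""
    adj = [set() for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if random.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    q = max(1, int(n * p / 2))                   # degree threshold
    x = min(n, int(c * (n / q) * math.log(n)))   # |V'| = O~(n/q)
    sampled = set(random.sample(range(n), x))
    return all(adj[u] & sampled for u in range(n) if len(adj[u]) >= q)

print(check_sampled_neighbors())  # True w.h.p.
```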

We show how to simulate the Broadcast Congested Clique model using the AC(c) model and the LOCAL model together; it then follows from Theorem 4.2 (AC(c) Simulation in Hybrid) and Lemma C.12 (LOCAL Simulation in Hybrid) that this can be converted into a simulation in the Hybrid model. The intuition behind the simulation follows from observing Lemma C.13 (Sampled neighbors) – if every node desires to broadcast a single message to the entire graph, then with relatively little bandwidth it is possible to ensure that all nodes above a certain minimal degree will get these messages from all the nodes in the graph. We begin with the simulation of the Broadcast Congested Clique model in the combined AC(c) and LOCAL models.

Lemma C.14 (𝖡𝖢𝖢\mathsf{BCC} Simulation in 𝖠𝖢(𝖼)\mathsf{AC(c)} and 𝖫𝖮𝖢𝖠𝖫\mathsf{LOCAL}).

Given a graph G=(V,E) with average degree k, given an algorithm ALG_BCC in the Broadcast Congested Clique model which runs on G in t rounds, and given some value c, there exists an algorithm which uses Õ(t·n/(√k·c)) rounds of the AC(c) model and O(t) rounds of the LOCAL model on G and simulates ALG_BCC on G. It is assumed that prior to running ALG_BCC, each node v∈V has at most Õ(deg_G(v)) bits of input used in ALG_BCC, including, potentially, the incident edges of v in G. Further, it is assumed that the output of each node in ALG_BCC is at most O(t·log n) bits.

Proof of Lemma C.14.

The outline of the simulation is as follows. We split the graph into high degree nodes, H⊆V, and low degree nodes, L=V∖H, at a certain cut-off. The key idea is that if every node v∈V takes a single message and sends it randomly to c nodes in V, then every u∈H will have at least one neighbor, w.h.p., which hears the message from v, for every v∈V, due to Lemma C.13 (Sampled neighbors). Thus, we choose some subset F⊂H and assign to each node v∈L some node u∈F which partially simulates v. By partially simulating, we mean that, initially, node v tells node u all of its input to ALG_BCC, and then for each round, node u tells v what message v wants to send in that round, and v then sends this message (that it wishes to broadcast) to c random nodes. Finally, we are guaranteed that every node in H hears all the messages broadcast in the graph, which allows u to internally simulate the local computation which v should perform in ALG_BCC before the next round. In a sense, when u simulates v, after each round node v knows what message it wants to send in that round of ALG_BCC, yet not necessarily other information that it would have learned from other nodes in the graph during that round of ALG_BCC. Thus, node v might not know its output in ALG_BCC. To overcome this, notice that u knows the output of v in ALG_BCC, and due to our assumption in the statement of this theorem, each node outputs at most O(t·log n) bits, and so we can simulate another t rounds where each v just broadcasts its output (ensuring that it itself receives it from u).

Initialization
We begin by showing how to initialize the nodes of high degree which simulate those of low degree. The cut-off for being a high or low degree node is Θ(√k). That is, we desire to simulate every node v∈V with deg(v)=o(√k) using a node u∈V with degree deg(u)=Ω(√k). Observe that since k is the average degree, there are Θ(nk) edges in the graph. Since the maximal degree is at most n, there must be at least k nodes with degree at least Ω(√k). Thus, we denote by F the k nodes in G with the highest degrees, and are guaranteed that for each v∈F, deg(v)=Ω(√k). Notice that it is possible within Õ(1) rounds to count the number of nodes in V with degree above a threshold, using Corollary B.6 (Aggregation), and thus within Õ(1) rounds it is possible to binary search for the degree of the node with the k-th highest degree, allowing each node to know whether or not it is in F.
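A sketch of this binary search in Python (ours; count_at_least stands in for one Õ(1)-round aggregation of Corollary B.6 (Aggregation) that counts the nodes whose degree is at least a given threshold):

```python
def kth_highest_degree(count_at_least, k: int, max_deg: int) -> int:
    """Binary search for the largest threshold d such that at least k nodes
    have degree >= d; each probe costs one aggregation, so O(log max_deg)
    aggregations suffice."""
    lo, hi = 0, max_deg
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if count_at_least(mid) >= k:
            lo = mid   # still at least k nodes at or above mid; search higher
        else:
            hi = mid - 1
    return lo
```

A node then knows it is in F when its degree is at least the returned threshold (breaking ties, say, by identifiers).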

Let v∈F. The node v now knows that it is in F, and thus randomly sends Θ̃(n/k) messages, using Θ̃(n/(k·c))≤Õ(n/(√k·c)) rounds of the AC(c) model, containing its ID and its communication token in the AC(c) model. Clearly, w.h.p., every node u∈V∖F has received a message from at least one node v∈F. Thus, if node u needs simulating, that is, deg(u)=o(√k), it arbitrarily chooses some node v∈F from which it heard, and tells v that it should simulate u. Denote by J_v the set of nodes which choose v to simulate them. Each node v∈F, upon receiving J_v, chooses an arbitrary order for J_v and sends back to each u∈J_v its index in that order.

Every node v∈F now attempts to learn all the input to ALG_BCC of the nodes which it simulates. Notice that now, for every node v∈F, it holds that |J_v| is at most Õ(n/k). Notice that each node u∈J_v has degree deg(u)=o(√k)=O(√k), since otherwise it would have opted not to be simulated by any node, implying, by the constraints of this theorem, that u has at most Õ(√k) bits of input to ALG_BCC, and therefore all the nodes J_v together desire to send to v at most Õ(n/√k) messages. Since v synchronized all the nodes in J_v by sending each of them its index in some order of J_v, it is possible to send all this data to v in Õ(n/(√k·c)) rounds of the AC(c) model: To do so, assume that every node u∈J_v wishes to send exactly d=Θ̃(√k) messages to v (we can assume this since u wants to send at most Õ(√k) messages to v, and so it can just add extra empty messages at the end). Therefore, since node u knows its index in J_v, for some ordering which v decided on, it is possible to order all the messages from all of J_v to v in such a way that each node u∈J_v knows the indices of its messages, and such that in every round neither v receives more than c messages, nor a node u∈J_v sends more than c messages.
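The implicit ordering can be realized by a static slot schedule, sketched below (our illustration): the i-th message of the node with index idx in J_v occupies the global slot idx·d+i, and slot t is transmitted in round ⌊t/c⌋, so v receives at most c messages per round and no sender exceeds c messages per round either.

```python
def send_round(idx: int, i: int, d: int, c: int) -> int:
    """Round in which the i-th of the d messages of the idx-th node of J_v is
    sent to v: slots are consumed c per round, so v receives at most c messages
    per round, and each sender's d consecutive slots span O(d/c) rounds."""
    slot = idx * d + i   # every sender is padded to exactly d messages
    return slot // c
```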

Round Simulation
We now show how to simulate each of the t rounds of the given ALG_BCC. That is, we show the two final steps of the simulation: how v∈F tells each node in J_v what value to randomly send to nodes across the graph, and how each node in F gets all the messages which were sent by all the nodes. The first part is simple – we already saw that node v can send a single, unique message to each u∈J_v within Õ(n/(√k·c)) rounds of the AC(c) model. The second part follows from Lemma C.13 (Sampled neighbors): for every node v∈F (which has deg_G(v)≥q=√k) to receive a single message from some node u∈V, it is enough for node u to send the message x=Õ(n/q)=Õ(n/√k) times to nodes sampled uniformly at random, and for each node v∈F to learn the received messages from its neighbors in G. Sending x=Õ(n/√k) messages, each to a random node, requires Õ(n/(√k·c)) rounds of the AC(c) model, and aggregating messages from neighbors requires a single round of the LOCAL model.

Output
It is critical that every node knows its output at the end of the simulation of ALG_BCC. This is ensured since we assume in the statement of this theorem that the output of every node in ALG_BCC is at most O(t·log n) bits. Thus, instead of simulating ALG_BCC directly, we simulate an algorithm ALG′_BCC which is just like ALG_BCC, yet is followed by t rounds in which each node broadcasts its output. Since ALG_BCC takes t rounds, this simply doubles the round complexity achieved above. Due to the fact that our simulation maintains that each node knows all the messages which it broadcasts during the simulated algorithm, every node will necessarily know its output. ∎

Finally, we show how to use Lemma C.14 (BCC Simulation in AC(c) and LOCAL) for simulating a Broadcast Congested Clique algorithm in the Hybrid model.

See 4.3

Proof of Theorem 4.3.

We execute the simulation of Lemma C.14 (BCC Simulation in AC(c) and LOCAL) on G and S_x with c=Θ̃(n^{2-2x}), in order to obtain an algorithm ALG_{WACC,LOCAL} which simulates ALG_BCC on S_x within t_1=Õ(t·n^x/(√k·c))=Õ(t·n^x/(√k·n^{2-2x}))=Õ(t·n^{3x-2}/√k) rounds of the AC(c) model and t_2=O(t) rounds of the LOCAL model on S_x. Due to Theorem 4.2 (AC(c) Simulation in Hybrid), since c=Θ̃(n^{2-2x}), it is possible to simulate the AC(c) rounds of ALG_{WACC,LOCAL} on S_x in Õ(t_1·n^{1-x})=Õ(t·n^{2x-1}/√k) rounds of the Hybrid model on G. Likewise, using Lemma C.12 (LOCAL Simulation in Hybrid), it is possible to simulate the LOCAL rounds of ALG_{WACC,LOCAL} on S_x in Õ(t_2·n^{1-x})=Õ(t·n^{1-x}) rounds of the Hybrid model on G. ∎

C.4 A (1+ε)(1+\varepsilon)-Approximation for SSSP

We show the missing proof of how we compute SSSP with low average degree and high maximal degree.

See 4.5

Proof of Lemma 4.5.

Let v′∈M be some node with degree deg_{S_x}(v′)=ω(n^{2-2x}). Observe that such a node v′ exists and can be found and agreed upon by all the nodes of G using Claim C.1 (Aggregate and Broadcast) within Õ(1) rounds. Denote by N_{v′}, where |N_{v′}|=Θ(n^{2-2x}), an arbitrary subset of the neighbors of v′. Using Claim C.2 (Token Dissemination), it is possible to ensure within Õ(n^{1-x}) rounds that every node in the graph knows which nodes are in N_{v′}. We strive to have the nodes of M send all the contents of E_S to the nodes N_{v′}.

We now show how the nodes N_{v′} can learn all of E_S. We show that there is a way to do this in which every node in M desires to send and receive at most Õ(n^{2-2x}) messages, and therefore, due to Claim 4.1 (Skeleton Unicast), the routing completes in Õ(n^{1-x}) rounds.

We start by showing that the nodes N_{v′} even have the bandwidth to receive E_S within at most Õ(n^{1-x}) rounds. Due to Claim 4.1 (Skeleton Unicast), each node in N_{v′} can receive Θ̃(n^{2-2x}) messages in Õ(n^{1-x}) rounds, implying that in total N_{v′} can receive Θ̃(|N_{v′}|·n^{2-2x})=Θ̃(n^{4-4x}) messages. Since the average degree in S_x is Õ(n^{x/2}), we have |E_S|=Õ(n^{3x/2}), and 3x/2≤4-4x if and only if x≤8/11, which holds since we assume x≤12/17≤8/11.

We now show that each node v∈M has the bandwidth to send all its incident edges within at most Õ(n^{1-x}) rounds. Notice that if deg_{S_x}(v)=Õ(n^{2-2x}), then it can clearly do so. Thus, for all other nodes in M, which have higher degrees, we assign some of the other nodes of M in their Θ̃(n^{1-x}) neighborhood to assist them. Let A⊆M be the set of all nodes in M such that v∈A has deg_{S_x}(v)=Ω(n^{2-2x}). We strive to invoke Lemma C.8 (Reassign Skeletons) on A in order to assign each v∈A some Θ̃(n^{3x-2}) nodes, denoted A_v⊆M, where A_v are in the Õ(n^{1-x})-hop neighborhood of v in G, and where each node in M is assigned to at most Õ(1) nodes in A. Thus, we must show that each node v∈A has in its Õ(n^{1-x}) neighborhood in G at least Θ̃(|A|·n^{3x-2}) nodes of M. Recall that deg_{S_x}(v)=Ω(n^{2-2x}), implying that v has at least Ω(n^{2-2x}) nodes of M in its Õ(n^{1-x}) neighborhood in G. We thus strive to show that Θ̃(|A|·n^{3x-2})=O(n^{2-2x}). Notice that |A|=Õ(n^{3x/2}/n^{2-2x})=Õ(n^{7x/2-2}), since the average degree in S_x is at most Õ(n^{x/2}) and the minimal degree of a node in A is Ω(n^{2-2x}). Thus, Θ̃(|A|·n^{3x-2})=Õ(n^{13x/2-4}), and since 13x/2-4≤2-2x if and only if x≤12/17, which is given in the conditions of this statement, we conclude. Therefore, it is possible to invoke Lemma C.8 (Reassign Skeletons), and thus we can assume that each node v∈A is assigned the nodes A_v defined previously. Now, node v∈A distributes its incident edges in S_x to the nodes A_v uniformly, using the local edges of the Hybrid model, within Õ(n^{1-x}) rounds. Since v has at most |M|=Õ(n^x) incident edges in S_x, this means that each node u∈A_v receives at most Õ(n^{x-3x+2})=Õ(n^{2-2x}) messages from v. Every node u∈A_v takes responsibility for the edges it received from v and later forwards them to the nodes N_{v′}. Notice that since each node u∈M is assigned to at most Õ(1) nodes in A, each u takes responsibility for at most Õ(n^{2-2x}) messages in total.

At last, notice that we reach a state where every node in S_x wishes to send at most Õ(n^{2-2x}) messages to the nodes in N_{v′} – this is since nodes with at most Õ(n^{2-2x}) neighbors in S_x (nodes in M∖A) have at most that many messages, and each node v∈A distributed that many messages per node in A_v. Further, as stated above, the total number of messages to send is Õ(n^{3x/2}). Notice that for each message it does not matter which node in N_{v′} receives it, and so for each message we select the target in N_{v′} independently and uniformly. The expected number of messages each node in N_{v′} receives is Õ(n^{3x/2-2+2x})=Õ(n^{7x/2-2})=Õ(n^{2-2x}), where the last transition holds since x≤12/17≤8/11, as seen previously. Since the targets of the messages are independent, by an application of a Chernoff Bound and a union bound over all of N_{v′}, the number of messages each node receives is Õ(n^{2-2x}) w.h.p. Thus, by Claim 4.1 (Skeleton Unicast), it is possible to route the messages within Õ(n^{1-x}) rounds w.h.p. This, combined with the fact that the contents of N_{v′} were previously made globally known using Claim C.2 (Token Dissemination), allows every node v∈M to locally compute to which node in N_{v′} it should deliver each of its messages in a way such that every node in N_{v′} receives roughly the same number of messages across all the messages being sent. At this point, since every node in M desires to send and receive at most Θ̃(n^{2-2x}) messages, it is possible to invoke Claim 4.1 (Skeleton Unicast) in order to route all these messages within Θ̃(n^{1-x}) rounds.

Finally, since now the nodes in N_{v′} know all of E_S, node v′ can learn all of E_S by learning all the information stored in N_{v′} within Õ(n^{1-x}) rounds. Node v′ can then compute the exact distance from the source s∈M to any node v∈M. Thus, node v′ desires to tell every node v∈M the value of d_{S_x}(s,v). This is possible since S_x is connected, and thus every node v∈M sent at least one message toward N_{v′} throughout the above algorithm; therefore, it is possible to reverse the direction of the messages sent above in order to ensure that, for each v∈M, node v′ can send a unique message to v, within the same round complexity as the above algorithm. ∎

Appendix D The 𝖢𝖮𝖭𝖦𝖤𝖲𝖳\mathsf{CONGEST} Model – Missing Proofs

We begin with some preliminaries for this section.

Claim D.1 (𝖢𝖮𝖭𝖦𝖤𝖲𝖳\mathsf{CONGEST} Routing).

[38, Theorem 1.2][20, Theorem 2] Consider a graph G=(V,E) with an identifier assignment ID: V↦[n] such that any node u, given ID(v), can compute ⌊log deg(v)⌋, and a set of point-to-point routing requests, each given by the identifiers of the corresponding source-destination pair. If each node v of G is the source and the destination of at most deg_G(v)·2^{O(√log n)} messages, there is a randomized distributed algorithm that delivers all messages in time τ_mix·2^{O(√log n)} in the CONGEST model, w.h.p.

Corollary D.2 (Identifiers).

In the CONGEST model, in O(τ_mix+log n) time, we can compute an ID assignment ID: V↦[n] and other information such that ID(u)<ID(v) implies ⌊log deg(u)⌋≤⌊log deg(v)⌋, and such that any vertex u, given ID(v), can locally compute ⌊log deg(v)⌋ for any v.

Proof of Corollary D.2.

[20, Lemma 4.1] shows how to compute the aforementioned set of identifiers in O(D+log n) rounds in the CONGEST model, where D is the diameter of the graph. Since the diameter is at most the mixing time τ_mix, the identifiers are also computable in O(τ_mix+log n) rounds w.h.p. ∎
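One way to realize such an assignment, sketched in Python (our illustration, not the algorithm of [20]), is to sort the nodes by ⌊log deg⌋ and hand out IDs in that order; since the resulting bucket sequence is non-decreasing, publishing its O(log n) bucket boundaries lets any node map an ID back to a log-degree:

```python
import math

def assign_ids_by_log_degree(degrees):
    """Hand out IDs 1..n in non-decreasing order of floor(log2(deg)); return
    the ID map and the (sorted) per-ID log-degree sequence."""
    order = sorted(range(len(degrees)), key=lambda v: int(math.log2(degrees[v])))
    id_of = {v: i + 1 for i, v in enumerate(order)}
    log_degs = [int(math.log2(degrees[v])) for v in order]  # non-decreasing
    return id_of, log_degs

def log_deg_from_id(ident: int, log_degs) -> int:
    """Recover floor(log2(deg(v))) from ID(v); as log_degs is sorted, it is
    determined by the O(log n) bucket boundaries alone."""
    return log_degs[ident - 1]

# Example: four nodes of degrees 1, 2, 4, 5 get IDs sorted by log-degree.
ids, log_degs = assign_ids_by_log_degree([1, 2, 4, 5])
```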

The following claim shows how to build a carrier configuration of the m′-supergraph G′=(V′,E′,w′) of the input graph G=(V,E,w) in the CONGEST model. Let ID_new be an assignment of identifiers such that any node u can compute ⌊log deg(v)⌋ using ID_new(v). For an added node v∈V′∖V of the supergraph, we assume that the old identifier v and the new identifier ID_new(v) are equal, i.e., v=ID_new(v), and greater than the identifier of any original node v′∈V. Denote by ρ: [n]↦[n] a globally known simulation assignment, which satisfies |{i: ρ(i)=i′}|=O(deg_G(i′)/k) for each new identifier i′.

Claim D.3 (Build Carrier Configurations in 𝖢𝖮𝖭𝖦𝖤𝖲𝖳\mathsf{CONGEST}).

Given a graph G, k=|E|/|V|, and an assignment of new identifiers ID_new: V↦[n]. Let m′ be such that 0≤m′≤n². Assume that for each v∈V, node ρ(ID_new(v)) knows the original identifier of v and ID_new(v). There is an algorithm that builds a carrier configuration C, which holds an m′-supergraph G′ of G. The communication token in C for node u is a concatenation of u, ρ(u) and ID_new(u). The algorithm runs in ⌈m′/m⌉·τ_mix·2^{O(√log n)} rounds w.h.p. and ensures that the information for carrier node i∈[n] (carried edges and communication tokens) is stored in the node ρ(i), which simulates i.

Proof of Claim D.3.

We show how to build the outgoing carrier configuration C^out which holds the m′-supergraph G′. The incoming carrier configuration is built similarly and simultaneously.

Representation: The added ⌈m′/n⌉ nodes are represented by the first ⌈m′/n⌉ nodes with the lowest ID_new identifiers. So the i-th added node is simulated by ρ(i).

Carrier Allocation for Added Nodes: We preallocate outgoing carriers for the added ⌈m′/n⌉≤n carried nodes V′∖V. For this, we compute |E′|=|E|+m′ using BFS, and set |V′|=|V|+⌈m′/n⌉ and k′=|E′|/|V′|. Then, we split the outgoing edges of each added node into ⌈n/k′⌉ batches of size at most k′. Notice that there are at most ⌈m′/n⌉·⌈|V′|/k′⌉=O(n) added batches, which we assign to the original nodes V to carry, such that each outgoing carrier node carries a constant number of batches. This assignment is done in terms of ID_new. Now each node knows locally, for each added node v∈V′∖V, the identifiers ID_new of its outgoing carrier nodes C_v^out. In particular, each outgoing carrier u knows which added edges it carries.

Carrier Allocation for Original Nodes: Now, we allocate the part of the outgoing carrier configuration which stores the original nodes and edges. Each node v samples ⌈deg_{G′}(v)/k′⌉ identifiers (ID_new) independently and uniformly at random. Those become its outgoing carriers C_v^out (Item 1). By Chernoff bounds, there is a constant ζ such that each node is an outgoing carrier for at most ζ·log n nodes w.h.p. (Item 2).

Acquainting: For each assigned (for added nodes) or sampled (for original nodes) outgoing carrier identifier i, which belongs to some outgoing carrier u, the carried node v or its representative knows the identifier ρ(i) of i's simulating node. Node v (or its representative) sends the identifiers v, ρ(v) and ID_new(v), and the identifier i of the carrier node u, directly to the simulating node with the new identifier ρ(i). This requires each node v to send at most Õ(⌈deg_{G′}(v)/k′⌉)=Õ(deg_G(v)) messages and to receive ⌈deg_G(v)/k⌉=Õ(deg_G(v)) messages w.h.p. The simulating node ρ(i) responds with the identifiers u, ρ(u). Again, each node sends and receives at most Õ(deg_G(v)) messages w.h.p. Now, each carried node v, for each u∈C_v^out, knows the communication token of u, which is the concatenation of u, ρ(u) and ID_new(u) (Item 1). Also, for each carrier node u, u's simulating node w with identifier ρ(ID_new(u)) knows the communication tokens of each carried node v whose edges u carries (Item 2).

Each carried node v sorts its outgoing carrier nodes by identifiers. It partitions the interval [n′] into ⌈deg_{G′}(v)/k′⌉ contiguous sub-intervals, each containing at most k′ identifiers of opposite endpoints of outgoing edges. We assign the j-th sub-interval to the j-th carrier. For each carrier u, we send to its simulating node w the boundaries of its interval. Each node sends Õ(⌈deg_{G′}(v)/k′⌉)=Õ(deg_G(v)) and receives ⌈deg_G(v)/k⌉=Õ(deg_G(v)) messages w.h.p. (Items 4 and 5)

Then, the carried node v, for each original outgoing edge e=(v,v′), sends to its other endpoint v′ the communication token of the outgoing carrier which is assigned to carry the edge e. Now, Item 5 is satisfied for original outgoing edges, but not yet for added outgoing edges. This requires sending O(1) messages over the edges of G.

Consider an added outgoing edge e=(v,v′), where v is an added node and v′ is an original one. Let u∈C_v^out be the carrier of the outgoing part of the edge e and u′∈C_{v′}^out be the carrier of the incoming part of the edge e. The new identifier ID_new(u) is globally known by construction, given v=ID_new(v). Thus, the new identifier of u is globally known, as well as the identifier of its simulating node w′. Let w be the simulating node of u′. Node w sends to w′ the tuple ⟨u′,w,ID_new(u′)⟩. This requires each simulating node to send or to receive Õ(k′·deg_G(v)/k) messages. This makes Item 5 satisfied for added outgoing edges as well.

Communication Tree: Each carried node v (or its representative) locally builds a communication tree on its outgoing carrier nodes and sends to each node which simulates a carrier node its parent and children in the tree (Items 6 and 3). Here each node sends O(1) messages and receives O(deg_G(v)/k)=O(deg_G(v)) messages.

Carrier Population: Each node v, for each of its outgoing carriers u, sends to the node w which simulates u the batch of original edges assigned for u to carry, along with the communication tokens of the carriers of the opposite direction of these edges (Items 3 and 4). For this, each node sends Õ(deg_G(v)) messages and receives Õ(k′·deg_G(v)/k) messages. For the added edges, we send the identifiers of the first and last edges they store. To do so, each node sends Õ(1) messages and receives Õ(deg_G(v)/k) messages.

Round Complexity: The carrier allocation phase is done locally, thus requires no communication.

In the acquainting phase, we use the routing algorithm from Claim D.1 (CONGEST Routing) for the problems where each node v sends and receives Õ(deg_G(v)) messages and Õ(k′·deg_G(v)/k)=Õ(⌈m′/m⌉·deg_G(v)) messages. Thus, it requires ⌈m′/m⌉·τ_mix·2^{O(√log n)} rounds w.h.p.

Building the communication trees requires only a single invocation of Claim D.1 (CONGEST Routing), and thus terminates in ⌈m′/m⌉·τ_mix·2^{O(√log n)} rounds w.h.p.

For the carrier population phase, each node v sends Õ(deg_G(v)) and receives Õ(k′·deg_G(v)/k)=Õ(⌈m′/m⌉·deg_G(v)) messages, and thus it runs in ⌈m′/m⌉·τ_mix·2^{O(√log n)} rounds w.h.p.

The overall complexity is ⌈m′/m⌉·τ_mix·2^{O(√log n)} rounds w.h.p. ∎

Claim D.4 (Assignment).

Let n₁,n₂,m,x₁,…,x_{n₁},y₁,…,y_{n₂} be integers such that m<Σ_{i=1}^{n₁} 2^{x_i}+Σ_{i=1}^{n₂} 2^{y_i}, where for each i∈[n₁]: 0≤x_i<⌊log k⌋−2 and for each j∈[n₂]: ⌊log k⌋−2≤y_j, with n=n₁+n₂ and k=m/n. There is a partition of [n₁] into n₂ sets I₁,…,I_{n₂}, such that for each j∈[n₂]: |I_j|≤4·⌊2^{y_j}/k⌋.

Proof of Claim D.4.

We construct the sets greedily, by adding new elements to the set I_j as long as its size is less than 4·⌊2^{y_j}/k⌋. We notice that the total capacity of the sets,

    Σ_{j=1}^{n₂} 4·⌊2^{y_j}/k⌋ ≥ 4·Σ_{j=1}^{n₂} 2^{y_j}/k − n
        > 4·(m − Σ_{i=1}^{n₁} 2^{x_i})/k − n > 4·(m − Σ_{i=1}^{n₁} 2^{⌊log k⌋−2})/k − n
        ≥ 4·(m − n·2^{log k − 2})/k − n = 4·(m − n·(m/4n))/(m/n) − n
        = 2n ≥ n,

is enough to hold all elements. ∎
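The greedy construction translates directly into code; a minimal Python sketch (ours):

```python
def greedy_partition(n1: int, ys, k: int):
    """Place the elements of [n1] into sets I_1..I_{n2} greedily, filling set j
    up to its capacity 4 * floor(2**ys[j] / k), as in the proof of Claim D.4."""
    sets = [[] for _ in ys]
    j = 0
    for i in range(n1):
        while j < len(sets) and len(sets[j]) >= 4 * (2 ** ys[j] // k):
            j += 1  # set j reached its capacity; move to the next set
        if j == len(sets):
            # cannot happen under the capacity count in the proof above
            raise ValueError("total capacity exceeded")
        sets[j].append(i)
    return sets

# Example: six small elements, capacities 4*floor(16/8)=8 and 4*floor(32/8)=16.
print(greedy_partition(6, [4, 5], 8))  # [[0, 1, 2, 3, 4, 5], []]
```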

See 5.3

Proof of Lemma 5.3.

First, we compute m and m/n using a BFS algorithm in O(D)=O(τ_mix) rounds. We simulate the algorithm from Lemma B.20 (k-nearest Algorithm) to compute the max{k,n^{1/3}}-nearest problem in a carrier configuration (notice that Lemma B.20 works for k≥n^{1/3}), and then we simulate the algorithm from Lemma B.15 (Learn Carried Information) to learn the edges stored in the output carrier configuration C^out by the nodes in the AC(m/n) model. The simulation in the CONGEST model is done by the algorithm from Theorem 5.1 (AC(c) Simulation in CONGEST). Notice that the out-degree in the resulting graph is max{k,n^{1/3}}, and we truncate the output of each node to k·log n bits before the end of the simulation. In the AC(c) model, solving k-nearest requires Õ(max{k,n^{1/3}}·n^{1/3}/c + n^{2/3}/c + 1) rounds; thus, in the CONGEST model, the simulation round complexity is ((max{k,n^{1/3}}·n^{1/3}/(m/n) + n^{2/3}/(m/n) + 1 + k/c)·(m/n)/(m/n) + k)·τ_mix·2^{O(√log n)} = (k·n^{4/3}/m + n^{5/3}/m + 1)·τ_mix·2^{O(√log n)} rounds w.h.p. ∎