
Distributed D-core Decomposition over Large Directed Graphs

Xuankun Liao Hong Kong Baptist University [email protected] Qing Liu Hong Kong Baptist University [email protected] Jiaxin Jiang Hong Kong Baptist University [email protected] Xin Huang Hong Kong Baptist University [email protected] Jianliang Xu Hong Kong Baptist University [email protected]  and  Byron Choi Hong Kong Baptist University [email protected]
Abstract.

Given a directed graph $G$ and integers $k$ and $l$, a D-core is the maximal subgraph $H \subseteq G$ such that for every vertex of $H$, its in-degree and out-degree are no smaller than $k$ and $l$, respectively. For a directed graph $G$, the problem of D-core decomposition aims to compute the non-empty D-cores for all possible values of $k$ and $l$. In the literature, several peeling-based algorithms have been proposed to handle D-core decomposition. However, these algorithms work in a sequential fashion and require global graph information during processing; as they are mainly designed for centralized settings, they cannot handle large-scale graphs efficiently in distributed settings. Motivated by this, we study the distributed D-core decomposition problem in this paper. We start by defining a concept called anchored coreness, based on which we propose a new H-index-based algorithm for distributed D-core decomposition. Furthermore, we devise a novel concept, namely skyline coreness, and show that the D-core decomposition problem is equivalent to the computation of skyline corenesses for all vertices. We design an efficient D-index to compute the skyline corenesses distributedly. We implement the proposed algorithms under both vertex-centric and block-centric distributed graph processing frameworks. Moreover, we theoretically analyze the algorithm and message complexities. Extensive experiments on large real-world graphs with billions of edges demonstrate the efficiency of the proposed algorithms in terms of both running time and communication overhead.

PVLDB Reference Format:
PVLDB, 14(1): XXX-XXX, 2020.
doi:XX.XX/XXX.XX

This work is licensed under the Creative Commons BY-NC-ND 4.0 International License. Visit https://creativecommons.org/licenses/by-nc-nd/4.0/ to view a copy of this license. For any use beyond those covered by this license, obtain permission by emailing [email protected]. Copyright is held by the owner/author(s). Publication rights licensed to the VLDB Endowment.
Proceedings of the VLDB Endowment, Vol. 14, No. 1 ISSN 2150-8097.
doi:XX.XX/XXX.XX

1. Introduction

Graphs are a widely used data structure to depict entities and their relationships. In a directed graph, edges have directions to represent links from one vertex to another, which has many real-life applications, e.g., the following relationships in online social networks such as Twitter, the money flows in financial networks, the traffic routes in road networks, and the message forwarding paths in communication networks. Among many graph analysis tasks, cohesive subgraph analysis aims to discover densely connected subgraphs under a cohesive subgraph model. A well-known model used for undirected graphs is the $k$-core, which requires every vertex in the subgraph to have at least $k$ neighbors (Seidman, 1983). As a directed version of the $k$-core, the D-core, a.k.a. $(k,l)$-core, is the maximal directed subgraph such that every vertex has at least $k$ in-neighbors and $l$ out-neighbors within this subgraph (Giatsidis et al., 2013). For example, in Figure 1(a), the whole directed graph $G$ is a (2, 2)-core since every vertex has an in-degree of at least 2 and an out-degree of at least 2.

Figure 1. An example of D-core decomposition on $G$: (a) a directed graph $G$; (b) all non-empty D-cores.

As a foundation of D-core discovery, the problem of D-core decomposition aims to compute the non-empty D-cores of a directed graph for all possible values of $k$ and $l$. D-core decomposition has a number of applications. It has been used to build coreness-based indexes for speeding up community search (Fang et al., 2018; Chen et al., 2020), to measure influence in social networks (Garcia et al., 2017), to evaluate graph collaboration features of communities (Giatsidis et al., 2013), to visualize and characterize complex networks (Montresor et al., 2012), and to discover hubs and authorities of directed networks (Soldano et al., 2017). For example, based on the D-core decomposition results, we can index a graph's vertices by their corenesses using a table (Fang et al., 2018) or D-Forest (Chen et al., 2020); then, D-core-based community search can be accelerated by looking up the table or D-Forest directly, instead of performing the search from scratch. In the literature, peeling-based algorithms have been proposed for D-core decomposition in centralized settings (Giatsidis et al., 2013; Fang et al., 2018). They work in a sequential fashion to remove disqualified vertices one by one from a graph. That is, they first determine all possible values of $k$ (i.e., from 0 to the maximum in-degree of the graph). Next, for each value of $k$, they compute the $(k,l)$-cores for all possible values of $l$ by iteratively deleting the vertices with the smallest out-degree. Figure 1(b) shows the D-core decomposition results of $G$ for different $k$ and $l$ values.

In this paper, we study the problem of D-core decomposition in distributed settings, where the input graph $G$ is stored on a collection of machines and each machine holds only a partial subgraph of $G$. The motivation is twofold. First, due to the large size of graph data, D-core decomposition necessitates huge memory space, which may exceed the capacity of a single machine. For example, the existing algorithms could not work for billion-scale graphs due to excessive memory space costs (Fang et al., 2018). Second, in practical applications, many large graphs are inherently distributed over a collection of machines, making distributed processing a natural solution (Linial, 1992; Aridhi et al., 2016; Montresor et al., 2012; Mandal and Hasan, 2017).

However, the existing peeling-based algorithms are not efficient when extended to distributed settings. In particular, when computing the $(k,l)$-cores for a given $k$, the algorithms need to iteratively find the vertices with the smallest out-degree to delete and then update the out-degrees of the remaining vertices, until the graph becomes empty. This process (i) is not parallelizable, since the update of out-degrees in each iteration depends on the vertex deletions in the previous iteration, and (ii) entails expensive network communications, since it needs global graph information.

To address these issues, we design new distributed D-core decomposition algorithms by exploiting the relationships between a vertex and its neighbors. First, inspired by the notion of k-list (Fang et al., 2018), we propose an anchored coreness-based algorithm. Specifically, for a vertex $v$, if we fix the value of $k_v$, we can compute the maximum value of $l_v$ such that $v$ is contained in the $(k_v, l_v)$-core. We call this pair $(k_v, l_v)$ an anchored coreness of $v$. For example, for vertex $v_2$ in Figure 1, when $k_{v_2}=0$, the maximum value of $l_{v_2}$ is 2 since $v_2 \in (0,2)$-core but $v_2 \notin (0,3)$-core. Hence, $(0,2)$ is an anchored coreness of $v_2$. The other anchored corenesses of $v_2$ are $(1,2)$, $(2,2)$, and $(3,1)$. Once we have computed the anchored corenesses for every vertex, we can easily derive the D-cores from these anchored corenesses. Specifically, given integers $k$ and $l$, the $(k,l)$-core consists of the vertices whose anchored coreness $(k_v, l_v)$ satisfies $k_v = k$ and $l_v \geq l$. Thus, the problem of distributed D-core decomposition is equivalent to computing the anchored corenesses in a distributed way. To do so, we first exploit the in-degree relationship between a vertex and its in-neighbors and define an in-H-index, based on which we compute the maximum value of $k$ for each vertex. Then, we study the property of the $(k,0)$-core and define an out-H-index. On the basis of that, for each possible value of $k$ with respect to a vertex, we iteratively compute the corresponding upper bound of $l$, for all values of $k$ simultaneously. Finally, we utilize the definition of D-core to iteratively refine all the upper bounds to obtain the anchored corenesses of all vertices.

Note that the anchored coreness-based algorithm first fixes one dimension and then computes the anchored corenesses for the other dimension, which may lead to suboptimal performance. To improve performance, we further propose a novel concept, called skyline coreness, and develop a skyline coreness-based algorithm. Specifically, we say the pair $(k_v, l_v)$ is a skyline coreness of a vertex $v$ if there is no other pair $(k_v', l_v')$ such that $k_v' \geq k_v$, $l_v' \geq l_v$, and $v \in (k_v', l_v')$-core. For example, in Figure 1, the skyline corenesses of $v_2$ are $\{(2,2), (3,1)\}$. Compared with anchored corenesses, a vertex's skyline corenesses contain fewer pairs of $(k_v, l_v)$. Nevertheless, based on the skyline corenesses, we can still easily find all the D-cores containing the corresponding vertex. To be specific, if $(k_v, l_v)$ is a skyline coreness of $v$, then $v$ is also in the $(k,l)$-cores with $k \leq k_v$ and $l \leq l_v$. The basic idea of the skyline coreness-based algorithm is to use neighbors' skyline corenesses to iteratively estimate the skyline corenesses of each vertex. To this end, we define a new index, called D-index, for each vertex based on the following unique property of skyline corenesses. If $(k_v, l_v)$ is one of the skyline corenesses of a vertex $v$, we have (i) $v$ has at least $k_v$ in-neighbors such that each of these in-neighbors, $v_i$, has a skyline coreness $(k_{v_i}', l_{v_i}')$ satisfying $k_{v_i}' \geq k_v$ and $l_{v_i}' \geq l_v$; and (ii) $v$ has at least $l_v$ out-neighbors such that each of these out-neighbors, $v_j$, has a skyline coreness $(k_{v_j}'', l_{v_j}'')$ satisfying $k_{v_j}'' \geq k_v$ and $l_{v_j}'' \geq l_v$. With this property, we design a distributed algorithm to iteratively compute the D-index of each vertex from its neighbors' D-indexes. To deal with the combinatorial blow-ups in the computation of D-indexes, we further develop three optimization strategies to improve efficiency.

We implement our algorithms under two well-known distributed graph processing frameworks, i.e., vertex-centric (Malewicz et al., 2010; gir, 2012; Salihoglu and Widom, 2013; Low et al., 2012) and block-centric (Tian et al., 2013; Yan et al., 2014; Fan et al., 2017). Empirical results on small graphs demonstrate that our algorithms run faster than the peeling-based algorithm by up to 3 orders of magnitude. For larger graphs with more than 50 million edges, the peeling-based algorithm cannot finish within 5 days, while our algorithms can finish within 1 hour for most datasets. Moreover, our proposed algorithms require fewer than 100 rounds to converge for most datasets, and more than 90% of the vertices converge within 10 rounds.

This paper’s main contributions are summarized as follows:

  • For the first time in the literature, we study the problem of distributed D-core decomposition over large directed graphs.

  • We develop an anchored coreness-based distributed algorithm using well-defined in-H-index and out-H-index. To efficiently compute the anchored corenesses, we propose tight upper bounds that can be iteratively refined to exact anchored corenesses with reduced network communications.

  • We further propose a novel concept of skyline coreness and show that the problem is equivalent to the computation of skyline corenesses for all vertices. A new two-dimensional D-index that unifies the in- and out-neighbor relationships, together with three optimization strategies, is designed to compute the skyline corenesses distributedly.

  • Both theoretical analysis and empirical evaluation validate the efficiency of our algorithms for distributed D-core decomposition.

The rest of the paper is organized as follows. Section 2 reviews related work. Section 3 formally defines the problem. Sections 4 and 5 propose two distributed algorithms for computing anchored coreness and skyline coreness, respectively. Experimental results are reported in Section 6. Finally, Section 7 concludes the paper.

2. Related Work

In this section, we review the related work in two aspects, i.e., core decomposition and distributed graph computation.

Core Decomposition. As a well-known dense subgraph model, a $k$-core is the maximal subgraph of an undirected graph such that every vertex has at least $k$ neighbors within this subgraph (Seidman, 1983). The core decomposition task aims at finding the $k$-cores for all possible values of $k$ in a graph. Many efficient algorithms have been proposed to handle core decomposition over undirected graphs, such as peeling-based algorithms (Batagelj and Zaversnik, 2003; Cheng et al., 2011; Khaouid et al., 2015), disk-based algorithms (Cheng et al., 2011; Khaouid et al., 2015), a semi-external algorithm (Wen et al., 2016), streaming algorithms (Saríyüce et al., 2013; Sarıyüce et al., 2016), parallel algorithms (Dasari et al., 2014; Esfandiari et al., 2018), and distributed algorithms (Aridhi et al., 2016; Montresor et al., 2012; Mandal and Hasan, 2017). It is worth mentioning that the distributed algorithms for $k$-core decomposition (Aridhi et al., 2016; Montresor et al., 2012; Mandal and Hasan, 2017) cannot be used for distributed D-core decomposition. Specifically, the distributed $k$-core decomposition algorithms use the neighbors' corenesses to estimate a vertex's coreness, where all neighbors are of the same type. For D-core, a vertex's neighbors include in-neighbors and out-neighbors, which affect each other and should be considered simultaneously. If we consider only one type of neighbor, we cannot get the correct answer. Inspired by the H-index-based computation for core decomposition (Lü et al., 2016) and nucleus decomposition (Sariyüce et al., 2018), we apply a similar idea in the design of our distributed algorithms. Nevertheless, our technical novelty lies in the non-trivial extension of the H-index from the one-dimensional undirected coreness to the two-dimensional anchored/skyline coreness, which needs to consider the computations of in-degrees and out-degrees simultaneously in a unified way.

In addition, core decomposition has been studied for different types of networks, such as weighted graphs (Eidsaa and Almaas, 2013; Zhou et al., 2021), uncertain graphs (Bonchi et al., 2014; Peng et al., 2018), bipartite graphs (Liu et al., 2019), temporal graphs (Galimberti et al., 2021; Wu et al., 2015), and heterogeneous information networks (Fang et al., 2020). Recently, a new problem of distance-generalized core decomposition has been studied by considering vertices' $h$-hop connectivity (Bonchi et al., 2019; Liu et al., 2021). Note that a directed graph can be viewed as a bipartite graph. After transforming a directed graph into a bipartite graph, the $(k,l)$-core in the directed graph has a corresponding $(\alpha,\beta)$-core in the bipartite graph (Liu et al., 2019), but not vice versa. Therefore, the problems of $(k,l)$-core decomposition and $(\alpha,\beta)$-core decomposition are not equivalent, and $(\alpha,\beta)$-core decomposition algorithms cannot be used in our work.

Distributed Graph Computation. In the literature, there exist various distributed graph computing models and systems to support big graph analytics. Among them, the vertex-centric framework (Malewicz et al., 2010; Low et al., 2012; McCune et al., 2015) and the block-centric framework (Tian et al., 2013; Yan et al., 2014) are the two most popular frameworks.

The vertex-centric framework assumes that each vertex is associated with one computing node and communication occurs through edges. The workflow of the vertex-centric framework consists of a set of synchronous supersteps. Within each superstep, the vertices execute a user-defined function asynchronously after receiving messages from their neighbors. If a vertex does not receive any message, it will be marked as inactive. The framework stops once all vertices become inactive. Typical vertex-centric systems include Pregel (Malewicz et al., 2010), Giraph (gir, 2012), GPS (Salihoglu and Widom, 2013), and GraphLab (Low et al., 2012). For the block-centric framework, one computing node stores the vertices within a block together and communication occurs between blocks after the computation within a block reaches convergence. Compared with the vertex-centric framework, the block-centric framework can reduce the network traffic and better balance the workload among nodes. Distributed graph processing systems such as Giraph++ (Tian et al., 2013), Blogel (Yan et al., 2014), and GRAPE (Fan et al., 2017) belong to the block-centric framework.

Note that in this paper we mainly focus on algorithm designs for distributed D-core decomposition. To demonstrate the flexibility of our proposed algorithms, we implement them for performance evaluation in both vertex-centric and block-centric frameworks.

3. Problem Formulation

In this paper, we consider a directed, unweighted simple graph $G = (V_G, E_G)$, where $V_G$ and $E_G$ are the sets of vertices and edges, respectively. Each edge $e \in E_G$ has a direction. For an edge $e = \langle u, v \rangle \in E_G$, we say $u$ is an in-neighbor of $v$ and $v$ is an out-neighbor of $u$. Correspondingly, we denote by $N_G^{in}(v)$ and $N_G^{out}(v)$ the in-neighbor set and the out-neighbor set of a vertex $v$ in $G$, respectively. We define three kinds of degrees for a vertex $v$ as follows: (1) the in-degree is the number of $v$'s in-neighbors in $G$, i.e., $deg_G^{in}(v) = |N_G^{in}(v)|$; (2) the out-degree is the number of $v$'s out-neighbors in $G$, i.e., $deg_G^{out}(v) = |N_G^{out}(v)|$; (3) the degree is the sum of its in-degree and out-degree, i.e., $deg_G(v) = deg_G^{in}(v) + deg_G^{out}(v)$. Based on the in-degree and out-degree, we give the definition of D-core as follows.

Definition 3.1.

D-core (Giatsidis et al., 2013). Given a directed graph $G = (V_G, E_G)$ and two integers $k$ and $l$, a D-core of $G$, also denoted as $(k,l)$-core, is the maximal subgraph $H = (V_H, E_H) \subseteq G$ such that $\forall v \in V_H$, $deg_H^{in}(v) \geq k$ and $deg_H^{out}(v) \geq l$.
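To make Definition 3.1 concrete, here is a minimal Python sketch (our own illustration, not the paper's distributed method) that computes a $(k,l)$-core by centralized peeling, in the spirit of the peeling-based algorithms discussed in Section 1: repeatedly delete vertices violating the degree constraints until none remains.

```python
def kl_core(vertices, edges, k, l):
    """Vertex set of the (k, l)-core: the maximal subgraph in which every
    vertex has in-degree >= k and out-degree >= l. An edge (u, v) means u -> v."""
    alive = set(vertices)
    while True:
        deg_in = {v: 0 for v in alive}
        deg_out = {v: 0 for v in alive}
        for u, v in edges:
            if u in alive and v in alive:
                deg_out[u] += 1
                deg_in[v] += 1
        removed = {v for v in alive if deg_in[v] < k or deg_out[v] < l}
        if not removed:
            return alive  # no violating vertex left: this is the (k, l)-core
        alive -= removed  # peel and re-check, since degrees drop

# A toy graph: a 3-cycle plus a pendant edge; only the cycle survives (1,1)-peeling.
V = {1, 2, 3, 4}
E = {(1, 2), (2, 3), (3, 1), (3, 4)}
assert kl_core(V, E, 1, 1) == {1, 2, 3}
```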

According to Definition 3.1, a D-core should satisfy both the degree constraints and the maximality constraint. The degree constraints ensure the cohesiveness of the D-core in terms of in-degree and out-degree. The maximality constraint guarantees the uniqueness of the D-core, i.e., for a specific pair $(k,l)$, there exists at most one D-core in $G$. Moreover, D-cores have the following partial nesting property.

Property 3.1.

Partial Nesting. Given two D-cores, the $(k_1, l_1)$-core $D_1$ and the $(k_2, l_2)$-core $D_2$, $D_1$ is nested in $D_2$ (i.e., $D_1 \subseteq D_2$) if $k_1 \geq k_2$ and $l_1 \geq l_2$. Note that if $k_1 \geq k_2$ and $l_1 < l_2$, or $k_1 < k_2$ and $l_1 \geq l_2$, $D_1$ and $D_2$ may not be nested in each other.

Figure 2. D-core
Example 3.2.

In Figure 2, the directed subgraph $H_1$ induced by the vertices $v_1$, $v_4$, $v_5$, and $v_6$ is a $(2,2)$-core since $\forall v \in V_{H_1}$, $deg_{H_1}^{in}(v) = deg_{H_1}^{out}(v) = 2$. Moreover, $H_1 \subseteq H_2 = (2,0)$-core and $H_1 \subseteq H_3 = (1,1)$-core. On the other hand, $H_2 \nsubseteq H_3$ and $H_3 \nsubseteq H_2$, due to the non-overlapping vertices $v_2$, $v_3$, and $v_7$.

In this paper, we study the problem of D-core decomposition to find all D-cores of a directed graph $G$ in distributed settings. In-memory algorithms for D-core decomposition have been studied in (Giatsidis et al., 2013; Fang et al., 2018), assuming that the entire graph and associated structures can fit into the memory of a single machine. To the best of our knowledge, the problem of distributed D-core decomposition, which considers a large graph distributed over a collection of machines, has not been investigated in the literature. We formulate the new problem of distributed D-core decomposition as follows.

Problem 1.

(Distributed D-core Decomposition). Given a directed graph $G = (V_G, E_G)$ that is distributed over a collection of machines $\{M_i : $ a machine $M_i$ holds a partial subgraph $G_i \subseteq G$, $1 \leq i \leq n\}$, where $n \geq 2$ and $\cup_{i=1}^{n} G_i = G$, the problem of distributed D-core decomposition is to find all D-cores of $G$ using the $n$ machines, i.e., identifying the $(k,l)$-cores for all possible $(k,l)$ pairs.

Consider applying D-core decomposition on $G$ in Figure 2. We can obtain a total of 9 different D-cores: $(0,0)$-core $=$ $(1,0)$-core $=$ $G$; $(0,1)$-core $=$ $(1,1)$-core $=$ $H_3$; $(0,2)$-core $=$ the subgraph of $G$ induced by the vertices in $V_{H_1} \cup \{v_7\}$; $(1,2)$-core $=$ $(2,1)$-core $=$ $(2,2)$-core $=$ $H_1$; $(2,0)$-core $=$ $H_2$.

In the following two sections, we propose two new distributed algorithms for D-core decomposition. Without loss of generality, we mainly present the algorithms under the vertex-centric framework. At the end of Sections 4 and 5, we discuss how to extend our proposed algorithms to the block-centric framework.

4. Distributed Anchored Coreness-Based Algorithm

In this section, we first give a definition of anchored coreness, which is useful for D-core decomposition. Then, we present a vertex-centric distributed algorithm for anchored coreness computation. Finally, we analyze the correctness and complexity of our proposed algorithm, and discuss its block-centric extension.

4.1. Anchored Coreness

Recall that, in the undirected $k$-core model (Seidman, 1983), every vertex $v$ has a unique value called coreness, i.e., the maximum value $k \in \mathbb{N}_0$ such that $v$ is contained in a non-empty $k$-core. Similarly, we give a definition of anchored coreness for directed graphs as follows.

Definition 4.1.

(Anchored Coreness). Given a directed graph $G$ and an integer $k$, the anchored coreness of a vertex $v \in V_G$ is a pair $(k, l_{max}(v,k))$, where $l_{max}(v,k) = \max_{l \in \mathbb{N}_0}\{l \mid \exists (k,l)\text{-core } H \subseteq G \wedge v \in V_H\}$. The entire anchored corenesses of the vertex $v$ are defined as $\Phi(v) = \{(k', l_{max}(v,k')) \mid 0 \leq k' \leq k_{max}(v)\}$, where $k_{max}(v) = \max_{k'' \in \mathbb{N}_0}\{k'' \mid \exists (k'',0)\text{-core } H \wedge v \in V_H\}$.

Different from the undirected coreness, the anchored coreness is a two-dimensional feature of in-degree and out-degree in directed graphs. For example, consider the graph $G$ in Figure 1 and $k=3$: the anchored coreness of vertex $v_2$ is $(3,1)$, as $l_{max}(v_2, 3) = 1$. Correspondingly, $\Phi(v_2) = \{(0,2), (1,2), (2,2), (3,1)\}$. The anchored corenesses can facilitate distributed D-core decomposition as follows. According to Property 3.1, for a vertex $v$ with the anchored coreness $(k,l)$, $v$ belongs to any $(k,l')$-core with $l' \leq l$. Hence, as long as we compute the anchored corenesses of $v$ for each possible $k$, we can get all D-cores containing $v$. As a result, for a given directed graph $G$, the problem of D-core decomposition is equivalent to computing the entire anchored corenesses of every vertex $v \in V_G$, i.e., $\{\Phi(v) \mid v \in V_G\}$.
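As a small illustration of this equivalence (our own sketch; the helper name is hypothetical), the vertex set of any $(k,l)$-core can be read off the precomputed anchored corenesses:

```python
def d_core_members(anchored, k, l):
    """anchored maps each vertex v to {k': l_max(v, k')}. A vertex belongs
    to the (k, l)-core iff its anchored coreness for k satisfies l_max >= l."""
    return {v for v, phi in anchored.items() if phi.get(k, -1) >= l}

# Anchored corenesses of v2 in Figure 1: {(0,2), (1,2), (2,2), (3,1)}.
anchored = {"v2": {0: 2, 1: 2, 2: 2, 3: 1}}
assert "v2" in d_core_members(anchored, 3, 1)      # v2 is in the (3,1)-core
assert "v2" not in d_core_members(anchored, 0, 3)  # but not in the (0,3)-core
```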

4.2. Distributed Anchored Coreness Computing

In this section, we present a distributed algorithm for computing the entire anchored corenesses of every vertex in $G$.

Overview. To handle the anchored coreness computation simultaneously in a distributed setting, we propose a distributed vertex-centric algorithm to compute all feasible anchored corenesses $(k,l)$ of a vertex $v$. The general idea is to first identify the feasible range $k \in [0, k_{max}(v)]$ by exploring $(k,0)$-cores and then refine an estimated upper bound of $l_{max}(v,k)$ to the exact value for all possible values of $k$. The framework is outlined in Algorithm 1, which gives an overview of the anchored coreness updating procedure in three phases: 1) deriving $k_{max}(v)$; 2) computing the upper bound of $l_{max}(v,k)$ for each $k$; and 3) refining the upper bound to the exact anchored coreness $l_{max}(v,k)$. Note that in the second and third phases, the upper bounds of $l_{max}(v,k)$ can be computed and refined in batch, instead of one by one sequentially, for the different values of $k \in [0, k_{max}(v)]$.

Table 1. An illustration of distributed D-core decomposition using Algorithm 1 on graph $G$ in Figure 2. Multi-valued entries list one value per $k \in [0, k_{max}(v)]$, in the order $k = 0; 1; \ldots$

| Phase | Index | $v_1$ | $v_2$ | $v_3$ | $v_4$ | $v_5$ | $v_6$ | $v_7$ | $v_8$ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| I | $\mathrm{iH}^{(0)}(v)$ | 3 | 2 | 2 | 2 | 2 | 3 | 1 | 2 |
| I | $\mathrm{iH}^{(1)}(v)$ | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 2 |
| I | $\mathrm{iH}^{(2)}(v) = k_{max}(v)$ | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 2 |
| II | $\mathrm{oH}_{G[k]}^{(0)}(v)$ | 3; 3; 3 | 0; 0; 0 | 0; 0; 0 | 5; 5; 5 | 3; 3; 3 | 2; 2; 2 | 2; 2 | 2; 2; 2 |
| II | $\mathrm{oH}_{G[k]}^{(1)}(v)$ | 2; 2; 2 | 0; 0; 0 | 0; 0; 0 | 2; 2; 2 | 2; 2; 2 | 2; 2; 2 | 2; 2 | 1; 1; 0 |
| II | $\mathrm{oH}_{G[k]}^{(2)}(v)$ | 2; 2; 2 | 0; 0; 0 | 0; 0; 0 | 2; 2; 2 | 2; 2; 2 | 2; 2; 2 | 2; 2 | 1; 1; 0 |
| III | $l_{upp}(k,v)$ | 2; 2; 2 | 0; 0; 0 | 0; 0; 0 | 2; 2; 2 | 2; 2; 2 | 2; 2; 2 | 2; 2 | 1; 1; 0 |
| III | $l'_{upp}(k,v)$ | 2; 2; 2 | 0; 0; 0 | 0; 0; 0 | 2; 2; 2 | 2; 2; 2 | 2; 2; 2 | 2; 1 | 1; 1; 0 |
| III | $l_{max}(k,v)$ | 2; 2; 2 | 0; 0; 0 | 0; 0; 0 | 2; 2; 2 | 2; 2; 2 | 2; 2; 2 | 2; 1 | 1; 1; 0 |
Input: directed graph $G$, vertex $v$
Output: anchored corenesses of vertex $v$
1: Compute $k_{max}(v)$ for vertex $v$ using Algorithm 2;
2: Compute the upper bounds $l_{upp}(k,v)$, where $k \in [0, k_{max}(v)]$, by invoking Algorithm 3;
3: Refine the upper bounds $l_{upp}(k,v)$ to the anchored corenesses $l_{max}(k,v)$ using Algorithm 4;
4: return the entire anchored corenesses of $v$ as $\Phi(v)$;
Algorithm 1: Distributed Anchored Coreness Computation (routine executed by a vertex $v$)

Phase I: Computing the in-degree limit $k_{max}(v)$. To compute $k_{max}(v)$, we first introduce the concept of H-index (Hirsch, 2005). Specifically, given a collection of integers $S$, the H-index of $S$, denoted as $\mathcal{H}(S)$, is the maximum integer $h$ such that $S$ has at least $h$ elements whose values are no less than $h$. For example, given $S = \{1, 2, 3, 3, 4, 6\}$, the H-index $\mathcal{H}(S) = 3$, as $S$ has at least 3 elements whose values are no less than 3. Based on the H-index, we give a new definition of the $n$-order in-H-index for directed graphs.
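For reference, a minimal Python helper for the operator $\mathcal{H}(S)$ (our own code; later sketches in this paper reuse it):

```python
def h_index(values):
    """H(S): the largest h such that at least h of the values are >= h."""
    h = 0
    for i, x in enumerate(sorted(values, reverse=True), start=1):
        if x >= i:
            h = i  # the i largest values are all >= i
        else:
            break
    return h

assert h_index([1, 2, 3, 3, 4, 6]) == 3  # the example S above
```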

Definition 4.2.

($n$-order in-H-index). Given a vertex $v$ in $G$, the $n$-order in-H-index of $v$, denoted by $\mathrm{iH}_G^{(n)}(v)$, is defined as

(1) $\mathrm{iH}_G^{(n)}(v) = \begin{cases} deg_G^{in}(v), & n = 0 \\ \mathcal{H}(I), & n > 0 \end{cases}$

where the integer set $I = \{\mathrm{iH}_G^{(n-1)}(u) \mid u \in N_G^{in}(v)\}$.

Theorem 4.3 (Convergence).

(2) $k_{max}(v) = \lim_{n \rightarrow \infty} \mathrm{iH}_G^{(n)}(v)$

Proof.

Due to space limitation, we give a proof sketch here; the detailed proof can be found in (Liao et al., 2022). First, we prove that $\mathrm{iH}_G^{(n)}(v)$ is non-increasing with the increase of the order $n$. Thus, $\mathrm{iH}_G^{(n)}(v)$ finally converges to an integer when $n$ is big enough. Then, we prove $k_{max}(v) \leq \mathrm{iH}_{G'}^{(\infty)}(v) \leq \mathrm{iH}_G^{(\infty)}(v)$, where $G' \subseteq G$ is the subgraph induced by the vertices $v'$ with $k_{max}(v') \geq k_{max}(v)$. Also, we know $k_{max}(v) \geq \mathrm{iH}_G^{(\infty)}(v)$ by definition. Hence, $k_{max}(v) = \mathrm{iH}_G^{(\infty)}(v)$. ∎

According to Theorem 4.3, $\mathrm{iH}_G^{(n)}(v)$ finally converges to $k_{max}(v)$, based on which we present a distributed algorithm, shown in Algorithm 2, to compute $k_{max}(v)$. Algorithm 2 has an initialization step (lines 1-4) and two update procedures, executed after receiving one message (lines 5-7) and after receiving all messages (lines 8-11), respectively. It first uses 0 to initialize the set $I$, which keeps the latest $n$-order in-H-indexes of $v$'s in-neighbors (lines 1-2). Then, the algorithm sets the $n$-order in-H-index of $v$ to its in-degree (line 3) and sends the message $\langle v, \mathrm{iH}(v)\rangle$ to all its out-neighbors (line 4). When $v$ receives a message $\langle v', \mathrm{iH}(v')\rangle$ from its in-neighbor $v'$, the algorithm updates the recorded $n$-order in-H-index of $v'$ (line 5). If $\mathrm{iH}(v') < \mathrm{iH}(v)$, the $n$-order in-H-index of $v$ may decrease; thus, $flag$ is set to True to indicate the re-computation of $v$'s $n$-order in-H-index (lines 6-7). After receiving all messages, if $flag$ is True, Algorithm 2 updates $v$'s $n$-order in-H-index $\mathrm{iH}(v)$ and informs all its out-neighbors if $\mathrm{iH}(v)$ decreases (lines 8-11). Algorithm 2 completes and returns $\mathrm{iH}(v)$ as $k_{max}(v)$ when no vertex broadcasts messages (line 12).

Example 4.4.

We use the directed graph $G$ in Figure 2 to illustrate Algorithm 2, whose calculation process is shown in Table 1. Take vertex $v_1$ as an example. First, $v_1$'s 0-order in-H-index is initialized with its in-degree, i.e., $\mathrm{iH}_G^{(0)}(v_1) = 3$. Then, Algorithm 2 iteratively computes $\mathrm{iH}_G^{(n)}(v_1)$. After one iteration, the 1-order in-H-index of $v_1$ has converged to $\mathcal{H}(\mathrm{iH}_G^{(0)}(v_4), \mathrm{iH}_G^{(0)}(v_6), \mathrm{iH}_G^{(0)}(v_7)) = \mathcal{H}(2, 3, 1) = 2$. Thus, $k_{max}(v_1) = \mathrm{iH}_G^{(2)}(v_1) = \mathrm{iH}_G^{(1)}(v_1) = 2$.

Input: directed graph $G$, vertex $v$
Output: $k_{max}(v)$
Initialization:
1: for each $v' \in N_G^{in}(v)$ do
2:   $I[v'] \leftarrow 0$;
3: $\mathrm{iH}(v) \leftarrow deg_G^{in}(v)$;
4: Send message $\langle v, \mathrm{iH}(v)\rangle$ to all out-neighbors of $v$;
On receiving message $\langle v', \mathrm{iH}(v')\rangle$ from $v$'s in-neighbor $v'$:
5: $I[v'] \leftarrow \mathrm{iH}(v')$;
6: if $\mathrm{iH}(v') < \mathrm{iH}(v)$ then
7:   $flag \leftarrow$ True;
After receiving all messages:
8: if $flag =$ True then
9:   if $\mathcal{H}(I) < \mathrm{iH}(v)$ then
10:    $\mathrm{iH}(v) \leftarrow \mathcal{H}(I)$; $flag \leftarrow$ False;
11:    Send message $\langle v, \mathrm{iH}(v)\rangle$ to all out-neighbors of $v$;
When no vertex broadcasts messages:
12: return $k_{max}(v) \leftarrow \mathrm{iH}(v)$;
Algorithm 2: Computing $k_{max}(v)$

Phase II: Computing the upper bounds of $l_{max}(k,v)$. In a distributed setting, the computation of $l_{max}(k,v)$ faces technical challenges. It is difficult to compute $l_{max}(k,v)$ using only the "intermediate" neighborhood information, because some vertices $u \in N_G(v)$ may become disqualified and thus be removed from the candidate set of the $(k, l_{max}(k,v))$-core during the iteration process. Even worse, verifying the candidacy of $u$ requires a large number of message exchanges between vertices. To address these issues, we design a novel upper bound for $l_{max}(k,v)$, denoted by $l_{upp}(k,v)$, which can be iteratively computed from "intermediate" corenesses to reduce communication costs. To start with, we give a new definition of the $n$-order out-H-index, similar to Definition 4.2.

Definition 4.5.

($n$-order out-H-index). Given a vertex $v$ in $G$, the $n$-order out-H-index of $v$, denoted by $\mathrm{oH}_G^{(n)}(v)$, is defined as

(3) $\mathrm{oH}_G^{(n)}(v) = \begin{cases} deg_G^{out}(v), & n = 0 \\ \mathcal{H}(O), & n > 0 \end{cases}$

where $O = \{\mathrm{oH}_G^{(n-1)}(u) \mid u \in N_G^{out}(v)\}$.

Based on $\mathrm{oH}_G^{(n)}(v)$, we have the following theorem.

Theorem 4.6.

Given a vertex $v$ in $G$ and an integer $k \in [0, k_{max}(v)]$, let $G[k]$ be the subgraph of $G$ induced by the vertices in $V_k = \{u \mid u \in V_G \wedge k_{max}(u) \geq k\}$. Then, it holds that

(4) $l_{max}(k,v) \leq \lim_{n \rightarrow \infty} \mathrm{oH}_{G[k]}^{(n)}(v).$

Proof.

Similar to Theorem 4.3, we can prove $\lim_{n \rightarrow \infty} \mathrm{oH}_{G[k]}^{(n)}(v) = l'$ such that $v \in (0,l')$-core of $G[k]$ but $v \notin (0,l'+1)$-core of $G[k]$. Then, we have the following relationship for the D-cores of $G[k]$: $(k, l_{max}(k,v))$-core $\subseteq$ $(0, l_{max}(k,v))$-core $\subseteq$ $(0, l')$-core. According to the partial nesting property of D-core, $l' \geq l_{max}(k,v)$ holds. ∎

Theorem 4.6 indicates that $\lim_{n \rightarrow \infty} \mathrm{oH}_{G[k]}^{(n)}(v)$ can serve as an upper bound of $l_{max}(k,v)$, i.e., $l_{upp}(k,v) = \lim_{n \rightarrow \infty} \mathrm{oH}_{G[k]}^{(n)}(v)$. Thus, we can compute $l_{upp}(k,v)$ by iteratively calculating the $n$-order out-H-index of $v$ in the directed subgraph $G[k]$. Moreover, to efficiently compute $l_{upp}(k,v)$ for all values $k \in [0, k_{max}(v)]$ in parallel, our distributed algorithm sends updating messages in batch and computes all the $l_{upp}(k,v)$'s simultaneously.

Input: directed graph $G$, vertex $v$, $k_{max}(v)$
Output: the upper bounds $l_{upp}(k,v)$ for $k \in [0, k_{max}(v)]$
Initialization:
1: for each $k \in [0, k_{max}(v)]$ do
2:   for each $v' \in N_{G[k]}^{out}(v)$ do
3:     $I[k][v'] \leftarrow 0$;
4:   $\mathrm{oH}_v[k] \leftarrow deg_{G[k]}^{out}(v)$; $change[k] \leftarrow$ True;
5: Send message $\langle v, \mathrm{oH}_v[\cdot], change[\cdot]\rangle$ to all in-neighbors of $v$;
On receiving message $\langle v', \mathrm{oH}_{v'}[\cdot], change[\cdot]\rangle$ from $v$'s out-neighbor $v'$:
6: for each $k \in [0, k_{max}(v)]$ do
7:   if $change[k] =$ True then
8:     $I[k][v'] \leftarrow \mathrm{oH}_{v'}[k]$;
9:     if $\mathrm{oH}_{v'}[k] < \mathrm{oH}_v[k]$ then
10:      $flag[k] \leftarrow$ True;
After receiving all messages:
11: for each $k \in [0, k_{max}(v)]$ do
12:   if $flag[k] =$ True then
13:     if $\mathcal{H}(I[k]) < \mathrm{oH}_v[k]$ then
14:       $\mathrm{oH}_v[k] \leftarrow \mathcal{H}(I[k])$; $flag[k] \leftarrow$ False; $change[k] \leftarrow$ True;
15: if $\exists k \in [0, k_{max}(v)]$ such that $change[k] =$ True then
16:   Send message $\langle v, \mathrm{oH}_v[\cdot], change[\cdot]\rangle$ to all in-neighbors of $v$, and reset $change[\cdot] \leftarrow$ False;
When no vertex broadcasts messages:
17: $l_{upp}[\cdot] \leftarrow \mathrm{oH}_v[\cdot]$;
Algorithm 3: Computing Upper Bounds $l_{upp}(k,v)$

Based on the above discussion, we propose a distributed algorithm for computing the upper bounds $l_{upp}(k,v)$; Algorithm 3 presents the detailed procedure. First, it initializes the $n$-order out-H-index of $v$ for each possible value of $k$ and sends them to $v$'s in-neighbors (lines 1-5). When $v$ receives a message from its out-neighbor $v'$, $v$ records the $n$-order out-H-index of $v'$ for subsequent calculation (lines 6-10). After receiving all messages, $v$ updates its own $n$-order out-H-index for each possible value of $k$ (lines 11-14). If any of $v$'s $n$-order out-H-indexes decreases, $v$ informs all its in-neighbors (lines 15-16). Finally, when no vertex broadcasts messages, we get the upper bound $l_{upp}(k,v)$ for each $k \in [0, k_{max}(v)]$ (line 17).

Example 4.7.

We illustrate Algorithm 3 by continuing Example 4.4. As shown in Table 1, since $k_{max}(v_1) = 2$, we first initialize the 0-order out-H-indexes of $v_1$ as $\mathrm{oH}_{G[k]}^{(0)}(v_1) = 3$ for each $k \in \{0, 1, 2\}$. After one iteration of computing the $n$-order out-H-indexes, all 1-order out-H-indexes of $v_1$ have converged to 2. Thus, we have $\mathrm{oH}_{G[0]}^{(1)}(v_1) = 2$, $\mathrm{oH}_{G[1]}^{(1)}(v_1) = 2$, and $\mathrm{oH}_{G[2]}^{(1)}(v_1) = 2$.

Phase III: Refining $l_{upp}(k,v)$ to $l_{max}(k,v)$. Finally, we present the third phase, which refines the upper bound $l_{upp}(k,v)$ to the exact anchored coreness $l_{max}(k,v)$. To this end, we first present the following theorem.

Theorem 4.8.

Given a vertex $v$ in $G$ and an integer $k$, if $(k, l_{upp}(k,v))$ is an anchored coreness of $v$, it should satisfy two constraints on in-neighbors and out-neighbors: (i) $v$ has at least $k$ in-neighbors $v'$ such that $l_{upp}(k,v') \geq l_{upp}(k,v)$; and (ii) $v$ has at least $l_{upp}(k,v)$ out-neighbors $v''$ such that $l_{upp}(k,v'') \geq l_{upp}(k,v)$.

Theorem 4.8 obviously holds, according to Definition 3.1 of D-core and the upper bound $l_{upp}(k,v) \geq l_{max}(k,v)$. Based on Theorem 4.8, we can refine $l_{upp}(k,v)$ decrementally by checking the upper bounds $l_{upp}(k,v')$ of $v$'s in- and out-neighbors. If $v$ satisfies the above two constraints in Theorem 4.8, $l_{upp}(k,v)$ remains unchanged; otherwise, $l_{upp}(k,v)$ decreases by 1, as the current $(k, l_{upp}(k,v))$ is not an anchored coreness of $v$. The above process repeats for all vertices and all possible values of $k$, until none of the $(k, l_{upp}(k,v))$'s changes. Finally, we obtain all anchored corenesses $\{\Phi(v) \mid v \in V_G\}$.
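The per-vertex check behind this refinement can be sketched as follows (our own illustration; in Algorithm 4 each decrement is interleaved with message exchanges, whereas this sketch assumes the neighbors' bounds are momentarily fixed):

```python
def refine(k, l_v, in_bounds, out_bounds):
    """Decrease v's upper bound l_v for a fixed k until the two constraints
    of Theorem 4.8 hold: >= k in-neighbors and >= l_v out-neighbors whose
    own upper bounds l_upp(k, .) are at least l_v."""
    while l_v > 0:
        n_in = sum(1 for x in in_bounds if x >= l_v)
        n_out = sum(1 for x in out_bounds if x >= l_v)
        if n_in >= k and n_out >= l_v:
            break  # (k, l_v) passes both constraints
        l_v -= 1   # (k, l_v) cannot be an anchored coreness
    return l_v
```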

Algorithm 4 outlines the procedure of the distributed refinement phase. First, the algorithm initializes some auxiliary structures and broadcasts $v$'s upper bound $l_{upp}(k,v)$ for each possible $k \in [0, k_{max}(v)]$ (lines 1-3). When it receives a message from $v$'s neighbor $v'$, the algorithm updates the recorded upper bounds of $v'$ (lines 4-7). After receiving all messages, the algorithm refines $l_{upp}(k,v)$ for each $k \in [0, k_{max}(v)]$ based on Theorem 4.8 (lines 8-13). If there exists a $(k, l_{upp}(k,v))$ whose $l_{upp}(k,v)$ has decreased, the algorithm broadcasts the new upper bounds to $v$'s neighbors (lines 14-15). As soon as no vertex broadcasts messages, Algorithm 4 terminates and we get all anchored corenesses of $v$ (lines 16-17).

Example 4.9.

We continue Example 4.7 to illustrate Algorithm 4 in Phase III, which refines the upper bounds $l_{upp}(k,v_1)$ to the exact $l_{max}(k,v_1)$. For $k_{max}(v_1) = 2$ and each $k \in [0, k_{max}(v_1)]$, Table 1 reports the final results $l_{upp}(k,v_1) = l_{max}(k,v_1) = 2$. Therefore, the entire anchored corenesses of $v_1$ are $\Phi(v_1) = \{(0,2), (1,2), (2,2)\}$.

Input: graph $G$, vertex $v$, $k_{max}(v)$, upper bounds $l_{upp}[\cdot]$
Output: the entire anchored corenesses of $v$ as $\Phi(v)$
Initialization:
1: for each $k \in [0, k_{max}(v)]$ do
2:   $change[k] \leftarrow$ True; $l[k][v] \leftarrow 0$;
3: Send message $\langle v, l_{upp}[\cdot], change[\cdot]\rangle$ to all neighbors of $v$;
On receiving message $\langle v', l_{upp}[\cdot], change[\cdot]\rangle$ from $v$'s neighbor $v'$:
4: for each $k \in [0, k_{max}(v)]$ do
5:   if $change[k] =$ True then
6:     $l[k][v'] \leftarrow l_{upp}[k]$;
7:     $flag[k] \leftarrow$ True;
After receiving all messages:
8: for each $k \in [0, k_{max}(v)]$ do
9:   if $flag[k] =$ True then
10:    $V' \leftarrow \{v' \mid v' \in N_G^{in}(v) \wedge l[k][v'] \geq l_{upp}[k]\}$; $V'' \leftarrow \{v'' \mid v'' \in N_G^{out}(v) \wedge l[k][v''] \geq l_{upp}[k]\}$;
11:    if $|V'| < k$ or $|V''| < l_{upp}[k]$ then
12:      $l_{upp}[k] \leftarrow l_{upp}[k] - 1$;
13:      $change[k] \leftarrow$ True;
14: if $\exists k \in [0, k_{max}(v)]$ such that $change[k] =$ True then
15:   Send message $\langle v, l_{upp}[\cdot], change[\cdot]\rangle$ to all neighbors of $v$, and reset $change[\cdot] \leftarrow$ False;
When no vertex broadcasts messages:
16: for each $k \in [0, k_{max}(v)]$ do
17:   Add $(k, l_{upp}[k])$ to the anchored corenesses $\Phi(v)$;
Algorithm 4: Anchored Coreness Refinement

4.3. Algorithm Analysis and Extension

Complexity analysis. We first analyze the time, space, and message complexities of Algorithm 1. Let the edge size be $|E_G| = m$, the maximum in-degree $\Delta_{in} = \max_{v \in V_G} deg_G^{in}(v)$, the maximum out-degree $\Delta_{out} = \max_{v \in V_G} deg_G^{out}(v)$, and the maximum degree $\Delta = \max_{v \in V_G} deg_G(v)$. In addition, let $R_{AC\text{-}I}$, $R_{AC\text{-}II}$, and $R_{AC\text{-}III}$ be the numbers of convergence rounds required by the three phases of Algorithm 1, respectively, and let the total number of convergence rounds of Algorithm 1 be $R_{AC} = R_{AC\text{-}I} + R_{AC\text{-}II} + R_{AC\text{-}III}$, where $R_{AC} \in O(\Delta)$. We have the following theorems (their detailed proofs can be found in (Liao et al., 2022)):

Theorem 4.10.

(Time and Space Complexities) Algorithm 1 takes $O(R_{AC} \cdot \Delta_{in} \cdot \Delta)$ time and $O(\Delta_{in} \cdot \Delta)$ space. The total time and space complexities for computing all vertices' corenesses are $O(R_{AC} \cdot \Delta_{in} \cdot m)$ and $O(\Delta_{in} \cdot m)$, respectively.

Theorem 4.11.

(Message Complexity) The message complexity (i.e., the total number of times that a vertex sends messages) of Algorithm 1 is $O(\Delta_{in} \cdot \Delta_{out} \cdot \Delta)$. The total message complexity for computing all vertices' corenesses is $O(\Delta_{in} \cdot \Delta_{out} \cdot m)$.

Block-centric extension of Algorithm 1. We further discuss how to extend the vertex-centric D-core decomposition of Algorithm 1 to the block-centric framework. The extension can be achieved by changing the update operation performed after receiving all messages. That is, instead of having Algorithms 2, 3, and 4 perform the update operation only once after receiving all messages, in the block-centric framework, the algorithms should update the H-indexes multiple times until the local block converges. For example, Algorithm 2 computes the $n$-order in-H-index of $v$ only once in each round (lines 8-11). In contrast, the block-centric version should compute $v$'s $n$-order in-H-index iteratively with $v$'s in-neighbors located in the same block as $v$, before broadcasting messages to other blocks to enter the next round. Note that, in the worst case, every vertex of a block-centric algorithm converges within its block after computing the in-H-index/out-H-index only once, which is the same as in the vertex-centric algorithms. Therefore, the worst-case cost of the block-centric algorithms is the same as that of the vertex-centric algorithms.

5. Distributed Skyline Coreness-Based Algorithm

In this section, we propose a novel concept of skyline coreness, which is more elegant than the anchored coreness. Then, we give a new definition of the $n$-order D-index for computing skyline corenesses. Based on the D-index, we propose a distributed algorithm that accomplishes D-core decomposition via skyline coreness computation.

5.1. Skyline Coreness

Motivation. The motivation for proposing the skyline coreness lies in an important observation that the anchored corenesses $(k,l)$'s may have redundancy. For example, in Figure 1, the vertex $v_2$ has four anchored corenesses, i.e., $\Phi(v_2) = \{(0,2), (1,2), (2,2), (3,1)\}$. According to D-core's partial nesting property, if $v_2 \in (2,2)$-core, $v_2$ must also belong to the $(0,2)$-core and the $(1,2)$-core. Thus, it is sufficient and more efficient to keep the corenesses of $v_2$ as $\{(2,2), (3,1)\}$, where the (2, 2)-core represents the other two D-cores, the (0, 2)-core and the (1, 2)-core. This compact representation is termed the skyline coreness, which facilitates space saving and fast computation of D-core decomposition. Based on the above observation, we formally define the dominance operations and skyline coreness as follows.

Definition 5.1.

(Dominance Operations). Given two coreness pairs $(k,l)$ and $(k',l')$, we define two operations '$\prec$' and '$\preceq$' to compare them: (i) $(k',l') \prec (k,l)$ indicates that $(k,l)$ dominates $(k',l')$, i.e., either $k' < k$ and $l' \leq l$ hold, or $k' \leq k$ and $l' < l$ hold; and (ii) $(k',l') \preceq (k,l)$ represents that $k' \leq k$ and $l' \leq l$ hold.
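A direct Python rendering of the two operations (our own helper code, reused by later sketches):

```python
def strictly_dominated(p, q):
    """p < q (dominance): q = (k, l) dominates p = (k', l') iff
    k' <= k and l' <= l with at least one strict inequality."""
    return p[0] <= q[0] and p[1] <= q[1] and p != q

def dominated_or_equal(p, q):
    """p <= q: componentwise comparison, equality allowed."""
    return p[0] <= q[0] and p[1] <= q[1]

assert strictly_dominated((1, 2), (2, 2))
assert not strictly_dominated((2, 2), (2, 2))
assert dominated_or_equal((2, 2), (2, 2))
```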

Definition 5.2.

(Skyline Coreness). Given a vertex $v$ in a directed graph $G$ and a coreness pair $(k,l)$, we say that $(k,l)$ is a skyline coreness of $v$ iff (i) $v \in (k,l)$-core; and (ii) there exists no other pair $(k',l')$ such that $(k,l) \prec (k',l')$ and $v \in (k',l')$-core. We use $\mathsf{SC}(v)$ to denote the entire skyline corenesses of the vertex $v$, i.e., $\mathsf{SC}(v) = \{(k,l) \mid (k,l)$ is a skyline coreness of $v\}$.

In other words, a skyline coreness of a vertex $v$ is a non-dominated pair $(k,l)$ whose corresponding $(k,l)$-core contains $v$. For instance, vertex $v_2$ in Figure 1 has the skyline corenesses $\mathsf{SC}(v_2) = \{(2,2), (3,1)\}$, reflecting that no other coreness $(k,l)$ can dominate any skyline coreness in $\mathsf{SC}(v_2)$. According to D-core's partial nesting property, for a skyline coreness $(k,l)$ of $v$, $v$ is contained in every $(k',l')$-core with $(k',l') \prec (k,l)$. Therefore, if we compute all skyline corenesses $\mathsf{SC}(v)$ of a vertex $v$, we can find all D-cores to which the vertex $v$ belongs. As a result, the problem of D-core decomposition is equivalent to computing the entire skyline corenesses of every vertex in $G$, i.e., $\{\mathsf{SC}(v) \mid v \in V_G\}$.

Structural properties of skyline coreness. We next analyze the structural properties of skyline corenesses, which relate a vertex's skyline corenesses to those of its neighbors.

Property 5.1.

Let $(k_v, l_v)$ be a skyline coreness of $v$; then the following properties hold:

  • (I) There exist $k_v$ in-neighbors $v' \in N_G^{in}(v)$ such that $(k_v, l_v) \preceq (k_{v'}, l_{v'})$, and also $l_v$ out-neighbors $v'' \in N_G^{out}(v)$ such that $(k_v, l_v) \preceq (k_{v''}, l_{v''})$.

  • (II) The following two cases cannot hold simultaneously: there exist $k_v + 1$ in-neighbors $v' \in N_G^{in}(v)$ such that $(k_v + 1, l_v) \preceq (k_{v'}, l_{v'})$, and there exist $l_v$ out-neighbors $v'' \in N_G^{out}(v)$ such that $(k_v + 1, l_v) \preceq (k_{v''}, l_{v''})$.

  • (III) The following two cases cannot hold simultaneously: there exist $k_v$ in-neighbors $v' \in N_G^{in}(v)$ such that $(k_v, l_v + 1) \preceq (k_{v'}, l_{v'})$, and there exist $l_v + 1$ out-neighbors $v'' \in N_G^{out}(v)$ such that $(k_v, l_v + 1) \preceq (k_{v''}, l_{v''})$.

Proof.

First, we prove Property 5.1(I). Let $D_1$ be the $(k_v, l_v)$-core of $G$; we have $deg_{D_1}^{in}(v) \geq k_v$ and $deg_{D_1}^{out}(v) \geq l_v$. For each $v' \in (N_{D_1}^{in}(v) \cup N_{D_1}^{out}(v))$, $v'$ may be in a $(k',l')$-core with $k_v \leq k' \leq k_{v'}$ and $l_v \leq l' \leq l_{v'}$. Therefore, (I) of Property 5.1 holds.

Next, we prove Property 5.1(II). Assume that $v$ has $k_v + 1$ in-neighbors $V'$ and $l_v$ out-neighbors $V''$ satisfying the constraints of (II). Then, $V'$ and $V''$ must be in the $(k_v + 1, l_v)$-core. Moreover, $\{v\} \cup (k_v + 1, l_v)$-core is also a $(k_v + 1, l_v)$-core. Hence, $(k_v + 1, l_v)$ rather than $(k_v, l_v)$ is a skyline coreness of $v$, which contradicts the condition of Property 5.1. Therefore, the assumption does not hold.

Finally, Property 5.1(III) can be proved in the same way as Property 5.1(II). It is omitted due to space limitation. ∎

For example, $(2,2)$ is a skyline coreness of $v_2$ in Figure 1. The in-neighbors of $v_2$ are $v_3$, $v_4$, $v_5$, and $v_7$, whose skyline corenesses are $\{(3,3)\}$, $\{(2,2)\}$, $\{(3,3)\}$, and $\{(2,2), (3,1)\}$, respectively. These four vertices all have skyline corenesses that dominate or are identical to $v_2$'s skyline coreness $(2,2)$. But only the two vertices $v_3$ and $v_5$ have skyline corenesses that dominate $(k_{v_2}+1, l_{v_2}) = (3,2)$. Hence, $(3,2)$ is not a skyline coreness of $v_2$. Property 5.1 reveals the relationships among vertices' skyline corenesses, based on which we propose an algorithm for skyline coreness computation in the next subsection.

5.2. Distributed Skyline Coreness Computing

We begin with a novel concept of D-index.

Definition 5.3.

(D-index). Given two sets of integer pairs $R_{in}, R_{out} \subseteq \mathbb{N}_0 \times \mathbb{N}_0$, the D-index of $R_{in}$ and $R_{out}$ is denoted by $\mathcal{D}(R_{in}, R_{out}) \subseteq \mathbb{N}_0 \times \mathbb{N}_0$, where each element $(k,l) \in \mathcal{D}(R_{in}, R_{out})$ satisfies: (i) there exist at least $k$ pairs $(k_i, l_i) \in R_{in}$ such that $(k,l) \preceq (k_i, l_i)$ for $1 \leq i \leq k$; (ii) there exist at least $l$ pairs $(k_j, l_j) \in R_{out}$ such that $(k,l) \preceq (k_j, l_j)$ for $1 \leq j \leq l$; (iii) there does not exist another $(k',l') \in \mathbb{N}_0 \times \mathbb{N}_0$ satisfying the above conditions (i) and (ii) with $(k,l) \prec (k',l')$.

The idea of the D-index is very similar to that of the H-index. Actually, the D-index is an extension of the H-index that handles two-dimensional integer pairs. For $\mathcal{D}(R_{in}, R_{out})$, it finds a series of skyline pairs $(k,l)$ such that each is dominated by (or identical to) at least $k$ pairs in $R_{in}$ and at least $l$ pairs in $R_{out}$, in a joint indexing way. For example, let $R_{in} = \{(1,1), (2,2)\}$ and $R_{out} = \{(3,3), (4,4)\}$; then $\mathcal{D}(R_{in}, R_{out}) = \{(1,2)\}$. Note that $\mathcal{D}(R_{in}, R_{out}) \neq \mathcal{D}(R_{out}, R_{in})$ may hold for the D-index, as $\mathcal{D}(R_{out}, R_{in}) = \{(2,1)\} \neq \{(1,2)\} = \mathcal{D}(R_{in}, R_{out})$ in this example. A naive sketch of Definition 5.3 is given below; we then introduce the concept of the $n$-order D-index for distributed D-core decomposition.
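The sketch (our own code) simply enumerates candidate pairs and filters out the dominated ones:

```python
from itertools import product

def d_index(r_in, r_out):
    """Naive D(R_in, R_out): all non-dominated pairs (k, l) such that at
    least k pairs of r_in and at least l pairs of r_out are componentwise
    >= (k, l). For illustration only; Optimization-1 below reduces the
    candidate ranges."""
    def supported(k, l):
        n_in = sum(1 for (ki, li) in r_in if ki >= k and li >= l)
        n_out = sum(1 for (kj, lj) in r_out if kj >= k and lj >= l)
        return n_in >= k and n_out >= l

    cand = [(k, l) for k, l in product(range(len(r_in) + 1),
                                       range(len(r_out) + 1)) if supported(k, l)]
    return {p for p in cand
            if not any(p[0] <= q[0] and p[1] <= q[1] and p != q for q in cand)}

assert d_index([(1, 1), (2, 2)], [(3, 3), (4, 4)]) == {(1, 2)}
assert d_index([(3, 3), (4, 4)], [(1, 1), (2, 2)]) == {(2, 1)}
```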

Definition 5.4.

($n$-order D-index). Given a vertex $v$ in $G$, the $n$-order D-index of $v$, denoted by $D^{(n)}(v) \subseteq \mathbb{N}_0 \times \mathbb{N}_0$, is defined as

(5) $D^{(n)}(v) = \begin{cases} \{(deg_G^{in}(v), deg_G^{out}(v))\}, & n = 0 \\ \mathcal{D}(R_{in}^{(n-1)}(v), R_{out}^{(n-1)}(v)), & n > 0 \end{cases}$

Here, $R_{in}^{(n-1)}(v) = \{(k_u, l_u) \in D^{(n-1)}(u) \mid u \in N_G^{in}(v)\}$ and $R_{out}^{(n-1)}(v) = \{(k_u, l_u) \in D^{(n-1)}(u) \mid u \in N_G^{out}(v)\}$. Note that $D^{(n)}(v)$ is the largest non-dominated D-index such that it dominates or at least is identical to $\mathcal{D}(R_{in}^{(n-1)}(v), R_{out}^{(n-1)}(v))$, for each $R_{in}^{(n-1)}(v) \in D^{(n-1)}(u_1) \times \ldots \times D^{(n-1)}(u_i)$ when $N_G^{in}(v) = \{u_1, \ldots, u_i\}$ and each $R_{out}^{(n-1)}(v) \in D^{(n-1)}(u_1) \times \ldots \times D^{(n-1)}(u_j)$ when $N_G^{out}(v) = \{u_1, \ldots, u_j\}$.

The $n$-order D-index $D^{(n)}(v)$ may contain more than one pair $(k,l)$, i.e., $|D^{(n)}(v)| \geq 1$. Note that $R_{in}^{(n-1)}(v)$ and $R_{out}^{(n-1)}(v)$ consist of one pair $(k_u, l_u)$ for each in-neighbor $u \in N_G^{in}(v)$ and each out-neighbor $u \in N_G^{out}(v)$, respectively. Therefore, there exist multiple combinations of $R_{in}^{(n-1)}(v)$ and $R_{out}^{(n-1)}(v)$. Moreover, $D^{(n)}(v)$ should consider all combinations of $R_{in}^{(n-1)}(v)$ and $R_{out}^{(n-1)}(v)$, and finally select the "best" choice as the largest non-dominated set of D-indexes $\mathcal{D}(R_{in}^{(n-1)}(v), R_{out}^{(n-1)}(v))$.

For two pair sets $R_1, R_2 \subseteq \mathbb{N}_0 \times \mathbb{N}_0$, we say $R_2 \preceq R_1$ if and only if $\forall (k,l) \in R_2$, $\exists (k',l') \in R_1$ such that $(k,l) \preceq (k',l')$. Then, we have the following theorem of $n$-order D-index convergence.

Theorem 5.5 ($n$-order D-index Convergence).

For a vertex $v$ in $G$, it holds that

(6) $\mathsf{SC}(v) = \lim_{n \rightarrow \infty} D^{(n)}(v)$

Proof.

The proof can be done similarly to that of Theorem 4.3. ∎

By Theorem 5.5, we can compute the vertices' skyline corenesses by iteratively computing their $n$-order D-indexes until convergence.

5.3. Algorithms and Optimizations

A naive implementation of the distributed algorithm to compute $D^{(n)}(v)$ may suffer from serious performance problems, due to the combinatorial blow-up in the large number of choices of $R_{in}^{(n-1)}(v)$ and $R_{out}^{(n-1)}(v)$. Thus, we first tackle three critical issues for fast distributed computation of the $n$-order D-index.

Optimization-1: Fast computation of the D-index $\mathcal{D}(R_{in}, R_{out})$. The first issue is, given $R_{in}$ and $R_{out}$, how to compute the D-index $\mathcal{D}(R_{in}, R_{out})$. A straightforward way is to list all candidate pairs and return the pairs satisfying Def. 5.3. According to conditions (i) and (ii) in Def. 5.3, if $(k,l)$ belongs to the D-index, there exist at least $k$ pairs of $R_{in}$ satisfying the dominance relationship; therefore, $0 \leq k \leq |R_{in}|$. Similarly, $0 \leq l \leq |R_{out}|$. Thus, there are a total of $(|R_{in}|+1) \cdot (|R_{out}|+1)$ candidate pairs to be checked, which is costly for large $|R_{in}|$ and $|R_{out}|$. In addition, the basic operation of D-index computation is frequently invoked in the process of computing $D^{(n)}(v)$. Hence, it is necessary to develop faster algorithms. To this end, we reduce the number of pairs to be examined as much as possible through the following two optimizations (a code sketch follows the two bullets).

  • Reducing the ranges of k and l. By conditions (1)&(2) in Def. 5.3, if (k,l) belongs to 𝒟(R_in, R_out), there exist at least k pairs (k_i, l_i) in R_in such that (k,l) ⪯ (k_i, l_i); in other words, at least k pairs (k_i, l_i) in R_in have k_i ≥ k. Thus, the maximum possible k, denoted by k_max, is ℋ(I_k), where I_k = {k_i | (k_i, l_i) ∈ R_in}. Similarly, the maximum possible l, denoted by l_max, is ℋ(O_l), where O_l = {l_j | (k_j, l_j) ∈ R_out}. Since ℋ(I_k) ≤ |R_in| and ℋ(O_l) ≤ |R_out|, the total number of candidate pairs decreases.

  • Pruning disqualified candidate pairs. Let (k,l) ∈ 𝒟(R_in, R_out). According to condition (3) in Def. 5.3, if (k′,l′) ∈ 𝒟(R_in, R_out) with k′ < k, then l′ must satisfy l′ > l; otherwise, (k′,l′) ≺ (k,l) and (k′,l′) ∉ 𝒟(R_in, R_out). This rule prunes disqualified pairs based on the pairs already found.
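Combining the two optimizations, a single D-index computation can be sketched in Python as follows (the function names and list-based encoding are ours; ℋ is the H-index operator used throughout the paper). Note that we let l go down to 0, initializing l_min to -1, so that pairs of the form (k, 0), such as SC(v_2) = {(2,0)} in Table 2, can be produced; this boundary handling is our reading of Def. 5.3.

```python
def h_index(values):
    """H-index of a multiset: the largest h such that
    at least h of the values are >= h."""
    h = 0
    for i, v in enumerate(sorted(values, reverse=True), start=1):
        if v >= i:
            h = i
        else:
            break
    return h

def d_index(R_in, R_out):
    """Compute the D-index D(R_in, R_out) of Def. 5.3: the maximal
    (non-dominated) pairs (k, l) such that at least k pairs of R_in
    and at least l pairs of R_out dominate or equal (k, l)."""
    k_max = h_index([k for (k, _) in R_in])   # reduced range of k
    l_max = h_index([l for (_, l) in R_out])  # reduced range of l
    D, l_min = [], -1
    for k in range(k_max, -1, -1):            # decreasing k
        for l in range(l_max, l_min, -1):     # only l > l_min can be non-dominated
            if (sum(1 for (ki, li) in R_in if ki >= k and li >= l) >= k and
                    sum(1 for (kj, lj) in R_out if kj >= k and lj >= l) >= l):
                D.append((k, l))
                l_min = l                     # prune pairs dominated by (k, l)
                break                         # largest feasible l for this k found
    return D
```

For example, d_index([(2,1),(1,2),(2,2)], [(1,1),(2,2)]) returns [(2, 1)]: here k_max = ℋ({2,1,2}) = 2 and l_max = ℋ({1,2}) = 1, and (2,1) is supported by two pairs of R_in and one pair of R_out.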

Optimization-2: Fast computation of the n-order D-index D^(n)(v). The second issue is the computation of D^(n)(v). By Def. 5.4, both R_in^(n-1)(v) and R_out^(n-1)(v) may have multiple instances. A straightforward way is thus to compute the D-index for every instance and finally integrate the results, which requires O(∏_{v′∈N^in_G(v)} |D^(n-1)(v′)| · ∏_{v″∈N^out_G(v)} |D^(n-1)(v″)|) D-index computations and is very inefficient. In fact, many of these computations are redundant because the instances overlap heavily; for example, once we have verified in one instance that (k,l) belongs to the n-order D-index, there is no need to verify (k,l) in other instances. This motivates a more efficient method that computes D^(n)(v) with only a single D-index computation. Specifically, we first compute k_max and l_max, and then enumerate candidate pairs for dominance checking. Here, we highlight two differences from the original D-index computation method.

  • The difference in k_max and l_max computation. In the D-index computation, I_k (resp. O_l) is formed by simply collecting k_i (resp. l_i) from each pair in R_in (resp. R_out). For the n-order D-index computation, a neighbor's (n-1)-order D-index may contain more than one pair, so we select the maximum k_i and l_i among these pairs. Specifically, for v's n-order D-index computation, to compute k_max, I_k(v) = {k_i | v′ ∈ N^in_G(v), k_i = max_{(k′_i, l′_i) ∈ D^(n-1)(v′)} k′_i}. In the same way, O_l(v) = {l_j | v′ ∈ N^out_G(v), l_j = max_{(k′_j, l′_j) ∈ D^(n-1)(v′)} l′_j}.

  • The difference in dominance checking. For a candidate pair (k,l), the D-index computation finds the pairs in R_in and R_out that dominate or are identical to (k,l). To compute D^(n)(v), we instead find all of v's neighbors v′ whose (n-1)-order D-index contains a pair dominating or identical to (k,l). If D^(n-1)(v′) has multiple pairs, we examine the dominance relationship between each of these pairs and (k,l); as soon as one pair dominates or is identical to (k,l), v′ is counted.

Input: directed graph G, vertex v
Output: the skyline corenesses SC(v)

Initialization:
1   Compute iH^(∞)_G(v) and oH^(∞)_G(v) using Algorithm 2;
2   D_v ← {(iH^(∞)_G(v), oH^(∞)_G(v))};
3   Send message ⟨v, D_v⟩ to all neighbors of v;

On receiving message ⟨v′, D_{v′}⟩ from v's neighbor v′:
4   D_k[v′] ← 0; D_l[v′] ← 0;
5   D[v′] ← D_{v′};
6   for each (k, l) ∈ D_{v′} do
7       D_k[v′] ← max(D_k[v′], k); D_l[v′] ← max(D_l[v′], l);
8   flag ← True;

After receiving all messages:
9   if flag = True then
10      Apply Algorithm 6 to compute the n-order D-index D;
11      if D[v] ≠ D then
12          D[v] ← D; D_v ← D;
13          Send message ⟨v, D_v⟩ to all neighbors of v;

When no vertex broadcasts messages:
14  return SC(v) ← D[v];

Algorithm 5. Distributed skyline coreness computation: the routine executed by vertex v
Input: vertex v, with the stored neighbor D-indexes D[v′] and their per-neighbor maxima D_k[v′], D_l[v′]
Output: v's n-order D-index
1   D ← ∅; l_min ← 0;
2   I_k ← {D_k[v′] | v′ ∈ N^in_G(v)}; k_max ← ℋ(I_k);
3   O_l ← {D_l[v′] | v′ ∈ N^out_G(v)}; l_max ← ℋ(O_l);
4   for k ← k_max downto 0 do
5       l ← l_max;
6       while l > l_min do
7           V_1 ← {v′ | v′ ∈ N^in_G(v) and ∃(k′, l′) ∈ D[v′] with (k, l) ⪯ (k′, l′)};
8           V_2 ← {v′ | v′ ∈ N^out_G(v) and ∃(k′, l′) ∈ D[v′] with (k, l) ⪯ (k′, l′)};
9           if |V_1| ≥ k ∧ |V_2| ≥ l then
10              l_min ← l; D ← D ∪ {(k, l)};
11          l ← l − 1;
12  return D;

Algorithm 6. n-order D-index computation
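For reference, a direct Python transcription of Algorithm 6 might look as follows (reusing h_index from the sketch in Optimization-1; as there, we let l reach 0, and we assume every received D-index is non-empty).

```python
def n_order_d_index(in_nbr_D, out_nbr_D):
    """One execution of Algorithm 6 at a vertex v.
    in_nbr_D / out_nbr_D: the (n-1)-order D-indexes (lists of pairs)
    received from v's in-neighbors and out-neighbors, respectively."""
    # Per-neighbor maxima (Optimization-2): one representative k (resp. l)
    # per neighbor, even if its D-index holds several pairs.
    I_k = [max(k for (k, _) in D) for D in in_nbr_D]
    O_l = [max(l for (_, l) in D) for D in out_nbr_D]
    k_max, l_max = h_index(I_k), h_index(O_l)
    D_new, l_min = [], -1
    for k in range(k_max, -1, -1):
        for l in range(l_max, l_min, -1):
            # A neighbor supports (k, l) if any of its pairs dominates
            # or equals (k, l)  (the modified dominance check).
            V1 = sum(1 for D in in_nbr_D
                     if any(kp >= k and lp >= l for (kp, lp) in D))
            V2 = sum(1 for D in out_nbr_D
                     if any(kp >= k and lp >= l for (kp, lp) in D))
            if V1 >= k and V2 >= l:
                D_new.append((k, l))
                l_min = l
                break
    return D_new
```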
Table 2. An illustration of distributed skyline coreness computation using Algorithm 5 on graph G in Figure 2.

            v1        v2        v3        v4        v5        v6        v7              v8
D^(0)(v)    {(2,2)}   {(2,0)}   {(2,0)}   {(2,2)}   {(2,2)}   {(2,2)}   {(1,2)}         {(2,1)}
D^(1)(v)    {(2,2)}   {(2,0)}   {(2,0)}   {(2,2)}   {(2,2)}   {(2,2)}   {(0,2),(1,1)}   {(1,1),(2,0)}
D^(2)(v)    {(2,2)}   {(2,0)}   {(2,0)}   {(2,2)}   {(2,2)}   {(2,2)}   {(0,2),(1,1)}   {(1,1),(2,0)}

Optimization-3: Tight initialization. Finally, we present an optimization for the D^(n)(v) computation using a tight initialization. In Def. 5.4, the 0-order D-index is initialized with the vertex's in-degree and out-degree. The idea of this optimization is that if we initialize the vertex's 0-order D-index with tighter (i.e., smaller) values, denoted by D^(0)(v) = {(k_0(v), l_0(v))}, the n-order D-index converges faster to the exact skyline corenesses. We highlight two principles for choosing (k_0(v), l_0(v)): (i) k_0(v) ≥ max_{(k_i,l_i)∈SC(v)} k_i and l_0(v) ≥ max_{(k_i,l_i)∈SC(v)} l_i; otherwise, D^(n)(v) cannot converge to SC(v); (ii) (k_0(v), l_0(v)) should be easy to compute in distributed settings. As a result, we present the following theorem.

Theorem 5.6.

For any vertex v in G, it holds that k_max(v) ≥ max{k_i | (k_i, l_i) ∈ SC(v)} and l_max(v) ≥ max{l_i | (k_i, l_i) ∈ SC(v)}, where k_max(v) = max{k | v ∈ (k,0)-core ∧ v ∉ (k+1,0)-core} and l_max(v) = max{l | v ∈ (0,l)-core ∧ v ∉ (0,l+1)-core}.

Theorem 5.6 offers tight upper bounds for k_0 and l_0, namely k_max(v) and l_max(v), respectively. In addition, according to Theorems 2 and 4, k_max(v) and l_max(v) can be computed by iteratively computing v's n-order in-H-index and out-H-index, respectively. Therefore, we initialize D^(0)(v) = {(k_max(v), l_max(v))}.

Algorithms. Based on the above theoretical analysis and optimizations, we present the distributed skyline coreness computation algorithm in Algorithm 5. In the initialization phase, the algorithm computes iH^(∞)_G(v) and oH^(∞)_G(v) using Algorithm 2 and uses them to initialize the 0-order D-index of v, which is broadcast to all neighbors of v (lines 1-3). When v receives a message from its neighbor v′, Algorithm 5 updates the copy of v′'s n-order D-index stored at v and records the maximum k and l values over its pairs (lines 4-8). After v has received all messages, Algorithm 5 computes the n-order D-index of v using Algorithm 6, and broadcasts it to all neighbors of v if it has changed (lines 9-13). When no vertex broadcasts messages any more, Algorithm 5 returns the latest n-order D-index as the skyline corenesses (line 14).
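For intuition, the whole of Algorithm 5 can be simulated in a single process by a synchronous round loop; the sketch below (our naming) initializes D^(0)(v) from the degrees as in Def. 5.4, and Algorithm 2's tight bounds (k_max(v), l_max(v)) from Optimization-3 would simply replace that first line.

```python
def skyline_corenesses(in_nbrs, out_nbrs):
    """Synchronous simulation of Algorithm 5's message rounds.
    in_nbrs / out_nbrs: dicts mapping each vertex to the list of its
    in-/out-neighbors. Returns a dict {v: SC(v)}.
    Reuses n_order_d_index from the sketch above."""
    # 0-order D-index (Def. 5.4): a single pair of in- and out-degree.
    D = {v: [(len(in_nbrs[v]), len(out_nbrs[v]))] for v in in_nbrs}
    changed = True
    while changed:  # one pass = one round of receive/compute/broadcast
        new_D = {v: n_order_d_index([D[u] for u in in_nbrs[v]],
                                    [D[u] for u in out_nbrs[v]])
                 for v in in_nbrs}
        changed = new_D != D
        D = new_D
    return D
```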

Next, we describe Algorithm 6 for the n-order D-index computation. It first computes k_max and l_max as described in Optimization-1 and Optimization-2 (lines 2-3), which determine the range of candidate pairs. Then, the algorithm enumerates the candidate pairs (k,l) and examines whether each belongs to the n-order D-index of v (lines 4-11). Note that l_min records the largest l among the pairs found so far, so any remaining candidate with l ≤ l_min is dominated and pruned.

Example 5.7.

We use the graph G in Figure 2 to illustrate Algorithm 5. Table 2 reports the process of computing the skyline corenesses. Take vertex v7 as an example. First, the 0-order D-index of v7 is initialized to {(1,2)}, i.e., D^(0)(v7) = {(1,2)}. Then, we iteratively compute the n-order D-index of v7. After only one iteration, the D-index of v7 has already converged, as D^(2)(v7) = D^(1)(v7) = {(0,2),(1,1)}. Thus, the skyline corenesses of v7 are SC(v7) = {(0,2),(1,1)}.

5.4. Algorithm Analysis and Extension

Complexity analysis. Let RSCR_{SC} be the number of convergence rounds taken by Algorithm 5. In practice, our algorithms achieve RSCRACΔR_{SC}\leq R_{AC}\ll\Delta on real datasets. We show the time, space, and message complexities of Algorithm 5 below.

Theorem 5.8.

(Time and Space Complexities) Algorithm 5 takes O(RSCΔinΔout)O(R_{SC}\cdot\Delta_{in}\cdot\Delta_{out}) time and O(Δmin{Δin,Δout})O(\Delta\cdot\min\{\Delta_{in},\Delta_{out}\}) space. The total time and space complexities for computing all vertices’ corenesses are O(RSCΔinm)O(R_{SC}\cdot\Delta_{in}\cdot m) and O(min{Δin,Δout}m)O(\min\{\Delta_{in},\Delta_{out}\}\cdot m), respectively.

Theorem 5.9.

(Message Complexity) The message complexity of Algorithm 5 is O(Δ2)O(\Delta^{2}). The total message complexity for computing all vertices’ corenesses is O(Δm)O(\Delta\cdot m).

Through the above analysis, we can see that the skyline coreness-based approach in Algorithm 5 takes less space and runs much faster than the anchored coreness approach in Algorithm 1.

Block-centric extension. Algorithm 5 can be easily extended to the block-centric framework. The only difference is that each machine iteratively computes the n-order D-indexes of its local vertices until they converge within the local block, before broadcasting to other blocks (lines 9-13 of Algorithm 5).
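A sketch of one block-centric superstep under this scheme (our naming; D holds the local vertices' D-indexes plus mirrored copies of boundary neighbors):

```python
def block_superstep(block, in_nbrs, out_nbrs, D, inbox):
    """One GRAPE-style superstep: absorb the D-indexes received from
    other blocks, iterate the n-order D-index update to a local
    fixpoint, then emit the changed D-indexes of boundary vertices."""
    D.update(inbox)  # refresh mirrored copies of remote neighbors
    outbox, changed = {}, True
    while changed:   # local convergence before any cross-block message
        changed = False
        for v in block:
            new = n_order_d_index([D[u] for u in in_nbrs[v]],
                                  [D[u] for u in out_nbrs[v]])
            if new != D[v]:
                D[v], changed = new, True
                if any(u not in block for u in in_nbrs[v] + out_nbrs[v]):
                    outbox[v] = new  # only boundary vertices message out
    return outbox
```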

6. Performance Evaluation

In this section, we empirically evaluate our proposed algorithms. We conduct our experiments on a collection of Amazon EC2 r5.2xlarge instances, each with 8 vCPUs and 64GB of memory. The network bandwidth is up to 10 Gb/s. All algorithms are implemented in C++ and run on the Ubuntu 18.04 operating system.

Datasets. We use 11 real-world graphs in our experiments. Table 3 shows the statistics of these graphs. Specifically, Wiki-vote is a voting graph; Email-EuAll is a communication graph; Amazon is a product co-purchasing graph; Hollywood is an actor collaboration graph; Pokec, Live Journal, and Slashdot are social graphs; Citation is a citation graph; UK-2002, IT-2004, and UK-2005 are web graphs. Wiki-vote, Email-EuAll, Amazon, Pokec, Live Journal, Slashdot, and Citation are from SNAP (http://snap.stanford.edu/data/index.html); Hollywood, UK-2002, IT-2004, and UK-2005 are from LAW (http://law.di.unimi.it/datasets.php).

Table 3. Statistics of the datasets (deg_avg denotes the average degree; K = 10^3, M = 10^6, and B = 10^9)
Dataset       Abbr.   |V_G|    |E_G|    deg_avg   k_max   l_max
Wiki-vote WV 7.1K 103.6K 14.57 19 15
Email-EuAll EE 265.2K 420K 1.58 28 28
Slashdot SL 82.1K 948.4K 11.54 54 9
Amazon AM 400.7K 3.2M 7.99 10 10
Citation CT 3.7M 16.5M 4.37 1 1
Pokec PO 1.6M 30.6M 18.75 32 31
Live Journal LJ 4.8M 69.0M 14.23 253 254
Hollywood HW 2.1M 228.9M 105.00 1,297 99
UK-2002 UK2 18.5M 298.1M 16.09 942 99
UK-2005 UK5 39.4M 936.3M 23.73 584 99
IT-2004 IT 41.2M 1.1B 27.87 3,198 990

Algorithms. We compare five algorithms in our experiments.

  • AC-V and AC-B: The distributed anchored coreness-based D-core decomposition algorithms implemented in the vertex-centric and block-centric frameworks, respectively.

  • SC-V and SC-B: The distributed skyline coreness-based D-core decomposition algorithms implemented in the vertex-centric and block-centric frameworks, respectively.

  • Peeling: The distributed version of the peeling algorithm for D-core decomposition (Fang et al., 2018), in which one machine is assigned as the coordinator to collect global graph information and dispatch decomposition tasks.

We employ GRAPE (Fan et al., 2017) as the block-centric framework and use the hash partitioner for graph partitioning by default. For fairness, we also employ GRAPE to simulate the vertex-centric framework: in each round, every vertex within a block executes its computation exactly once, and after all vertices complete their computation, the messages are broadcast to their neighbors.

Parameters and Metrics. The parameters tested in the experiments are the number of machines and the graph size, whose default settings are 8 and 100%·|V_G|, respectively. The performance metrics are the number of iterations required for convergence, the convergence rate (i.e., the percentage of vertices that have computed their coreness), the running time (in seconds), and the communication overhead (i.e., the total number of messages sent by all vertices).

6.1. Convergence Evaluation

The first set of experiments evaluates the convergence of our proposed algorithms.

Table 4. # Iterations required for the algorithms

Algorithm            WV      EE      SL      AM      CT
Upper Bound          1,167   7,636   5,064   2,757   793
AC-V   Phase I       19      17      40      16      32
       Phase II      32      19      53      64      32
       Phase III     33      22      61      61      2
       Total         84      58      154     141     66
AC-B   Phase I       14      14      35      13      28
       Phase II      15      7       43      30      28
       Phase III     16      21      45      25      2
       Total         45      42      123     68      58
SC-V                 33      19      61      65      2
SC-B                 17      6       46      25      2
Figure 3. Convergence rates of our algorithms (AM): (a) Phase I of AC-V and AC-B; (b) Phase II of AC-V and AC-B; (c) Phase III of AC-V and AC-B; (d) SC-V and SC-B.

Exp-1: Evaluation of the number of iterations. We start by evaluating the number of iterations required for our algorithms to converge. Note that an iteration here refers to one cycle of receiving messages, performing computations, and broadcasting messages. Table 4 reports the results on datasets WV, EE, SL, AM, and CT. We make several observations. First, on every graph, all of our proposed algorithms need far fewer iterations than the upper bound (i.e., the maximum degree of the graph), which demonstrates their efficiency. Second, SC-V and SC-B need fewer iterations than AC-V and AC-B. This is because the computation of anchored corenesses is more cumbersome than that of skyline corenesses, so both AC-V and AC-B take more iterations. Third, for the same type of algorithm, i.e., AC or SC, the block-centric implementation takes fewer iterations than the vertex-centric one. The reason is that the block-centric framework allows vertices located in the same block to converge locally within a single round, which leads to faster convergence.

Exp-2: Evaluation of the convergence rate. Since different vertices require different numbers of iterations to converge, in this experiment we evaluate the algorithms' convergence rates. Figure 3 shows the results on Amazon. As expected, the algorithms implemented in the block-centric framework converge faster. For example, in Figure 3(d), after 8 iterations, the convergence rates of SC-V and SC-B reach 89.9% and 98.6%, respectively. Moreover, most vertices converge within just a few iterations; specifically, for SC-B, more than 95% of the vertices converge within 5 iterations. In addition, the SC algorithms converge faster than the AC algorithms; for example, AC-B takes 68 iterations to reach convergence while SC-B takes only 25.

6.2. Efficiency Evaluation

Next, we evaluate the efficiency of our proposed algorithms against the state-of-the-art peeling algorithm, denoted as Peeling. If an algorithm cannot finish within 5 days, we report its result as 'INF'.

Exp-3: Our algorithms vs. Peeling. We first compare the performance of our proposed algorithms with Peeling under the default experimental settings. Figures 4(a) and 4(b) report the running time and communication overhead, respectively. First, we can see that Peeling cannot finish within 5 days on the large-scale graphs with more than 50 million edges, including LJ, HW, UK2, UK5, and IT, while our algorithms finish within 1 hour on most of these datasets, and our fastest algorithm needs no more than 10 hours even on the largest billion-scale graph. Moreover, on the datasets where Peeling can finish, our algorithms outperform it by up to 3 orders of magnitude. This well demonstrates the efficiency of our proposed algorithms. Second, SC-V and SC-B perform better than AC-V and AC-B in terms of both running time and communication overhead; for example, on the biggest graph IT with over a billion edges, the improvement is nearly 1 order of magnitude. This is because SC-V and SC-B compute fewer corenesses than AC-V and AC-B. Third, AC-V (resp. SC-V) is better than AC-B (resp. SC-B) in terms of running time, while the opposite holds for communication overhead. This is due to the straggler effect (Ananthanarayanan et al., 2010). Specifically, in each iteration of the block-centric framework, the algorithms use block information to converge locally (i.e., within each block) and cannot start the next iteration until all blocks have converged; some blocks may converge very slowly, which deteriorates the overall performance of the block-centric algorithms.

Figure 4. Performance comparisons: (a) running time; (b) communication.

Exp-4: Effect of the number of machines. Next, we vary the number of machines from 2 to 16 and test its effect on performance. Figure 5 reports the results on datasets UK2 and HW. As shown in Figures 5(a) and 5(b), all of our algorithms take less running time as the number of machines increases, because more machines provide more computing power for our distributed algorithms to exploit. In addition, Figures 5(c) and 5(d) show that the communication overheads of all algorithms do not change with the number of machines. This is because the communication overhead is determined by the convergence rate of the algorithms, which is not influenced by the number of machines.

Figure 5. Effect of # machines: (a) running time (HW); (b) running time (UK2); (c) communication (HW); (d) communication (UK2).

Exp-5: Effect of dataset cardinality. We evaluate the effect of dataset cardinality on our proposed algorithms using datasets PO and UK5. For this purpose, we extract a set of subgraphs from the original graphs by randomly selecting different fractions of vertices, varying from 20% to 100%. As shown in Figure 6, both the running time and the communication overhead increase with the dataset cardinality. This is expected because the larger the dataset, the more vertex corenesses there are to compute, resulting in poorer performance.

Exp-6: Effect of partition strategies. We evaluate the effect of different partition strategies on the block-centric algorithms, i.e., AC-B and SC-B. Specifically, we compare four partition strategies: SEG (Fan et al., 2017), HASH (Fan et al., 2017), FENNEL (Tsourakakis et al., 2014), and METIS (Karypis and Kumar, 1998); simple sketches of the two built-in partitioners follow the list.

  • SEG is a built-in partitioner of GRAPE. Let C be the maximum cardinality of the partitioned subgraphs. A vertex v with ID v_id ∈ [0, n−1] is allocated to the i-th subgraph, where i = ⌊v_id / C⌋.

  • HASH is also a built-in partitioner of GRAPE. Let N be the number of partitioned subgraphs. A vertex v with ID v_id ∈ [0, n−1] is allocated to the i-th subgraph, where i = v_id mod N.

  • FENNEL subsumes two popular heuristics for graph partitioning: the folklore heuristic that places a vertex in the subgraph with the fewest non-neighbors, and the degree-based heuristic that places a vertex using different heuristics depending on its degree.

  • METIS is a popular edge-cut partitioner that partitions the graph into subgraphs with minimum crossing edges.
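For concreteness, the two built-in GRAPE partitioners reduce to one line each; a minimal sketch under the ID conventions above:

```python
def seg_partition(vid, C):
    """SEG: vertices with consecutive IDs (ranges of size C) land in
    the same subgraph, preserving ID locality."""
    return vid // C

def hash_partition(vid, N):
    """HASH: vertex IDs are spread over the N subgraphs modulo N,
    yielding balanced but locality-oblivious partitions."""
    return vid % N
```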

Figure 7 shows the results. We can observe that HASH has the best running time on most datasets but the worst communication cost. This is because HASH produces more balanced partitions (i.e., each partition has almost an equal number of vertices), while METIS and FENNEL achieve higher locality; higher locality reduces the communication overhead but increases the running time due to the straggler effect. Considering the importance of efficiency in practice, we employ HASH as the default partition strategy in our experiments, as mentioned earlier.

Figure 6. Effect of cardinality: (a) running time (PO); (b) running time (UK5); (c) communication (PO); (d) communication (UK5).
Figure 7. Effect of partition strategies: (a) running time (AC-B); (b) running time (SC-B); (c) communication (AC-B); (d) communication (SC-B).

7. Conclusion

In this paper, we study the problem of D-core decomposition in distributed settings. We propose two efficient distributed algorithms, namely the anchored-coreness-based algorithm and the skyline-coreness-based algorithm. The anchored-coreness-based algorithm employs the in-H-index and out-H-index to compute anchored corenesses in a distributed way, while the skyline-coreness-based algorithm uses a newly designed index, called the D-index, to perform D-core decomposition through skyline coreness computation. Both theoretical analysis and empirical evaluation show the efficiency of our proposed algorithms with fast convergence rates.

As for future work, we are interested in further improving the performance of the skyline-coreness-based algorithm, in particular in accelerating the computation of the D-index on each single machine. We are also interested in investigating efficient algorithms for distributed D-core maintenance over dynamic graphs.

References

  • gir (2012) 2012. Giraph. https://giraph.apache.org/.
  • Ananthanarayanan et al. (2010) Ganesh Ananthanarayanan, Srikanth Kandula, Albert G. Greenberg, Ion Stoica, Yi Lu, Bikas Saha, and Edward Harris. 2010. Reining in the Outliers in Map-Reduce Clusters using Mantri. In OSDI. 265–278.
  • Aridhi et al. (2016) Sabeur Aridhi, Martin Brugnara, Alberto Montresor, and Yannis Velegrakis. 2016. Distributed k-core decomposition and maintenance in large dynamic graphs. In Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems. 161–168.
  • Batagelj and Zaversnik (2003) Vladimir Batagelj and Matjaz Zaversnik. 2003. An O(m) algorithm for cores decomposition of networks. arXiv preprint cs/0310049 (2003).
  • Bonchi et al. (2014) Francesco Bonchi, Francesco Gullo, Andreas Kaltenbrunner, and Yana Volkovich. 2014. Core decomposition of uncertain graphs. In Proceedings of the 20th International Conference on Knowledge Discovery and Data Mining. 1316–1325.
  • Bonchi et al. (2019) Francesco Bonchi, Arijit Khan, and Lorenzo Severini. 2019. Distance-generalized core decomposition. In Proceedings of the 2019 International Conference on Management of Data. 1006–1023.
  • Chen et al. (2020) Yankai Chen, Jie Zhang, Yixiang Fang, Xin Cao, and Irwin King. 2020. Efficient community search over large directed graphs: An augmented index-based approach. In International Joint Conference on Artificial Intelligence. 3544–3550.
  • Cheng et al. (2011) James Cheng, Yiping Ke, Shumo Chu, and M Tamer Özsu. 2011. Efficient core decomposition in massive networks. In 2011 IEEE 27th International Conference on Data Engineering. 51–62.
  • Dasari et al. (2014) Naga Shailaja Dasari, Ranjan Desh, and Mohammad Zubair. 2014. ParK: An efficient algorithm for k-core decomposition on multicore processors. In 2014 IEEE International Conference on Big Data. 9–16.
  • Eidsaa and Almaas (2013) Marius Eidsaa and Eivind Almaas. 2013. S-core network decomposition: A generalization of k-core analysis to weighted networks. Physical Review E 88, 6 (2013), 062819.
  • Esfandiari et al. (2018) Hossein Esfandiari, Silvio Lattanzi, and Vahab Mirrokni. 2018. Parallel and streaming algorithms for k-core decomposition. In International Conference on Machine Learning. 1397–1406.
  • Fan et al. (2017) Wenfei Fan, Jingbo Xu, Yinghui Wu, Wenyuan Yu, and Jiaxin Jiang. 2017. GRAPE: Parallelizing sequential graph computations. Proceedings of the VLDB Endowment 10, 12 (2017), 1889–1892.
  • Fang et al. (2018) Yixiang Fang, Zhongran Wang, Reynold Cheng, Hongzhi Wang, and Jiafeng Hu. 2018. Effective and efficient community search over large directed graphs. IEEE Transactions on Knowledge and Data Engineering 31, 11 (2018), 2093–2107.
  • Fang et al. (2020) Yixiang Fang, Yixing Yang, Wenjie Zhang, Xuemin Lin, and Xin Cao. 2020. Effective and Efficient Community Search over Large Heterogeneous Information Networks. Proceedings of the VLDB Endowment 13, 6 (2020), 854–867.
  • Galimberti et al. (2021) Edoardo Galimberti, Martino Ciaperoni, Alain Barrat, Francesco Bonchi, Ciro Cattuto, and Francesco Gullo. 2021. Span-core Decomposition for Temporal Networks: Algorithms and Applications. ACM Transactions on Knowledge Discovery from Data 15, 1 (2021), 2:1–2:44.
  • Garcia et al. (2017) David Garcia, Pavlin Mavrodiev, Daniele Casati, and Frank Schweitzer. 2017. Understanding popularity, reputation, and social influence in the twitter society. Policy & Internet 9, 3 (2017), 343–364.
  • Giatsidis et al. (2013) Christos Giatsidis, Dimitrios M Thilikos, and Michalis Vazirgiannis. 2013. D-cores: measuring collaboration of directed graphs based on degeneracy. Knowledge and Information Systems 35, 2 (2013), 311–343.
  • Hirsch (2005) Jorge E. Hirsch. 2005. H-index. https://en.wikipedia.org/wiki/H-index/.
  • Karypis and Kumar (1998) George Karypis and Vipin Kumar. 1998. Metis: A software package for partitioning unstructured graphs. Partitioning Meshes, and Computing Fill-Reducing Orderings of Sparse Matrices Version 4 (1998).
  • Khaouid et al. (2015) Wissam Khaouid, Marina Barsky, Venkatesh Srinivasan, and Alex Thomo. 2015. K-core decomposition of large networks on a single pc. Proceedings of the VLDB Endowment 9, 1 (2015), 13–23.
  • Liao et al. (2022) Xuankun Liao, Qing Liu, Jiaxin Jiang, Xin Huang, Jianliang Xu, and Byron Choi. 2022. Distributed D-core Decomposition over Large Directed Graphs. arXiv preprint arXiv:XXXXXXX.
  • Linial (1992) Nathan Linial. 1992. Locality in Distributed Graph Algorithms. SIAM J. Comput. 21, 1 (1992), 193–201.
  • Liu et al. (2019) Boge Liu, Long Yuan, Xuemin Lin, Lu Qin, Wenjie Zhang, and Jingren Zhou. 2019. Efficient (α,β)-core Computation: an Index-based Approach. In International World Wide Web Conference. 1130–1141.
  • Liu et al. (2021) Qing Liu, Xuliang Zhu, Xin Huang, and Jianliang Xu. 2021. Local Algorithms for Distance-generalized Core Decomposition over Large Dynamic Graphs. The Proceedings of the VLDB Endowment (2021).
  • Low et al. (2012) Yucheng Low, Joseph Gonzalez, Aapo Kyrola, Danny Bickson, Carlos Guestrin, and Joseph M Hellerstein. 2012. Distributed graphlab: A framework for machine learning in the cloud. arXiv preprint arXiv:1204.6078 (2012).
  • Lü et al. (2016) Linyuan Lü, Tao Zhou, Qian-Ming Zhang, and H Eugene Stanley. 2016. The H-index of a network node and its relation to degree and coreness. Nature Communications 7, 1 (2016), 1–7.
  • Malewicz et al. (2010) Grzegorz Malewicz, Matthew H Austern, Aart JC Bik, James C Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. 2010. Pregel: a system for large-scale graph processing. In Proceedings of the 2010 International Conference on Management of Data. 135–146.
  • Mandal and Hasan (2017) Aritra Mandal and Mohammad Al Hasan. 2017. A distributed k-core decomposition algorithm on spark. In IEEE International Conference on Big Data. 976–981.
  • McCune et al. (2015) Robert Ryan McCune, Tim Weninger, and Greg Madey. 2015. Thinking Like a Vertex: A Survey of Vertex-Centric Frameworks for Large-Scale Distributed Graph Processing. Comput. Surveys 48, 2 (2015), 25:1–25:39.
  • Montresor et al. (2012) Alberto Montresor, Francesco De Pellegrini, and Daniele Miorandi. 2012. Distributed k-core decomposition. IEEE Transactions on Parallel and Distributed Systems 24, 2 (2012), 288–300.
  • Peng et al. (2018) You Peng, Ying Zhang, Wenjie Zhang, Xuemin Lin, and Lu Qin. 2018. Efficient Probabilistic K-Core Computation on Uncertain Graphs. In IEEE International Conference on Data Engineering. 1192–1203.
  • Salihoglu and Widom (2013) Semih Salihoglu and Jennifer Widom. 2013. Gps: A graph processing system. In Proceedings of the 25th International Conference on Scientific and Statistical Database Management. 1–12.
  • Saríyüce et al. (2013) Ahmet Erdem Saríyüce, Buğra Gedik, Gabriela Jacques-Silva, Kun-Lung Wu, and Ümit V Çatalyürek. 2013. Streaming algorithms for k-core decomposition. Proceedings of the VLDB Endowment 6, 6 (2013), 433–444.
  • Sarıyüce et al. (2016) Ahmet Erdem Sarıyüce, Buğra Gedik, Gabriela Jacques-Silva, Kun-Lung Wu, and Ümit V Çatalyürek. 2016. Incremental k-core decomposition: algorithms and evaluation. The VLDB Journal 25, 3 (2016), 425–447.
  • Sariyüce et al. (2018) Ahmet Erdem Sariyüce, C. Seshadhri, and Ali Pinar. 2018. Local Algorithms for Hierarchical Dense Subgraph Discovery. Proceedings of the VLDB Endowment 12, 1 (2018), 43–56.
  • Seidman (1983) Stephen B Seidman. 1983. Network structure and minimum degree. Social Networks 5, 3 (1983), 269–287.
  • Soldano et al. (2017) Henry Soldano, Guillaume Santini, Dominique Bouthinon, and Emmanuel Lazega. 2017. Hub-authority cores and attributed directed network mining. In International Conference on Tools with Artificial Intelligence. 1120–1127.
  • Tian et al. (2013) Yuanyuan Tian, Andrey Balmin, Severin Andreas Corsten, Shirish Tatikonda, and John McPherson. 2013. From “think like a vertex” to “think like a graph”. Proceedings of the VLDB Endowment 7, 3 (2013), 193–204.
  • Tsourakakis et al. (2014) Charalampos Tsourakakis, Christos Gkantsidis, Bozidar Radunovic, and Milan Vojnovic. 2014. FENNEL: Streaming Graph Partitioning for Massive Scale Graphs. In Proceedings of the 7th ACM International Conference on Web Search and Data Mining. 333–342.
  • Wen et al. (2016) Dong Wen, Lu Qin, Ying Zhang, Xuemin Lin, and Jeffrey Xu Yu. 2016. I/O efficient Core Graph Decomposition at web scale. In International Conference on Data Engineering. 133–144.
  • Wu et al. (2015) Huanhuan Wu, James Cheng, Yi Lu, Yiping Ke, Yuzhen Huang, Da Yan, and Hejun Wu. 2015. Core decomposition in large temporal graphs. In IEEE International Conference on Big Data. 649–658.
  • Yan et al. (2014) Da Yan, James Cheng, Yi Lu, and Wilfred Ng. 2014. Blogel: A block-centric framework for distributed computation on real-world graphs. Proceedings of the VLDB Endowment 7, 14 (2014), 1981–1992.
  • Zhou et al. (2021) Wei Zhou, Hong Huang, Qiang-Sheng Hua, Dongxiao Yu, Hai Jin, and Xiaoming Fu. 2021. Core decomposition and maintenance in weighted graph. World Wide Web 24, 2 (2021), 541–561.

Appendix A Proof of Theorem 2

Theorem 2 (Convergence).

kmax(v)=limniHG(n)(v)k_{max}(v)=\lim_{n\rightarrow\infty}\mathrm{iH}_{G}^{(n)}(v)
Proof.

First, we prove that iH^(n)_G(v) is non-increasing as the order n increases, and hence converges. Next, we prove that k_max(v) ≤ iH^(∞)_{G′}(v) ≤ iH^(∞)_G(v), where G′ ⊆ G is the subgraph induced by the vertices v′ with k_max(v′) ≥ k_max(v). Finally, we prove that k_max(v) ≥ iH^(∞)_G(v), which together give k_max(v) = iH^(∞)_G(v). The complete proof is as follows.

First, we prove by mathematical induction that iH^(n)_G(v) is non-increasing in n. (1) iH^(0)_G(v) ≥ iH^(1)_G(v) holds directly by the definition of iH^(n)_G(v). (2) Assume that iH^(m)_G(v) ≥ iH^(m+1)_G(v) holds for every vertex. Since the operator ℋ is monotone in each argument, iH^(m+2)_G(v) = ℋ(iH^(m+1)_G(u_1), …, iH^(m+1)_G(u_d)) ≤ ℋ(iH^(m)_G(u_1), …, iH^(m)_G(u_d)) = iH^(m+1)_G(v), where u_1, …, u_d are the in-neighbors of v. Thus, if iH^(n)_G(v) ≥ iH^(n+1)_G(v) holds for n = m, it also holds for n = m+1. Combining (1) and (2), iH^(n)_G(v) is non-increasing in n. Moreover, since iH^(n)_G(v) is a non-negative integer, it converges to a certain value.

Next, we prove that iH^(∞)_G(v) ≥ k_max(v). To this end, we use two properties of iH^(n)_G(v): (i) if G′ ⊆ G, then for all v ∈ V_{G′} and all n ∈ ℕ_0, iH^(n)_{G′}(v) ≤ iH^(n)_G(v); and (ii) for all v ∈ V_G and all n ∈ ℕ_0, iH^(n)_G(v) ≥ indeg_min(G), where indeg_min(G) is the minimum in-degree of G. Let G′ ⊆ G be the subgraph of G induced by vertex v and the vertices v′ with k_max(v′) ≥ k_max(v). We have iH^(∞)_G(v) ≥ iH^(∞)_{G′}(v) ≥ indeg_min(G′) ≥ k_max(v).

Then, we prove that k_max(v) ≥ iH^(∞)_G(v). To this end, we construct the subgraph G″ ⊆ G induced by vertex v and the vertices v″ satisfying iH^(∞)_G(v″) ≥ iH^(∞)_G(v). Every vertex of G″ has at least iH^(∞)_G(v) in-neighbors inside G″, so G″ is contained in the (iH^(∞)_G(v), 0)-core. Hence, k_max(v) ≥ iH^(∞)_G(v).

Finally, combining the above three steps, we obtain Theorem 2. ∎

Appendix B Proof of Theorem 4.10

Theorem 4.10 (Time and Space Complexities). Algorithm 1 takes O(R_AC·Δ_in·Δ) time and O(Δ_in·Δ) space. The total time and space complexities for computing all vertices' corenesses are O(R_AC·Δ_in·m) and O(Δ_in·m), respectively.

Proof.

Algorithm 1 consists of three phases, given in Algorithms 2, 3, and 4. Algorithm 2 iteratively computes the n-order in-H-index for all vertices. Its time complexity is determined by the number of rounds needed to compute the n-order in-H-index and the processing time of each round, which are R_{AC-I} and O(deg^in_G(v)), respectively. Hence, the time complexity of Algorithm 2 is O(R_{AC-I}·deg^in_G(v)). Next, Algorithm 3 iteratively computes l_upp(k,v); each round takes O(deg^in_G(v)·deg^out_G(v)) time. Algorithm 4 iteratively refines l_upp(k,v) to l_max(k,v); each round takes O(deg^in_G(v)·deg_G(v)) time. Hence, the time complexities of Algorithms 3 and 4 are O(R_{AC-II}·deg^in_G(v)·deg^out_G(v)) and O(R_{AC-III}·deg^in_G(v)·deg_G(v)), respectively. Therefore, the time complexity of Algorithm 1 is O(R_{AC-I}·deg^in_G(v) + R_{AC-II}·deg^in_G(v)·deg^out_G(v) + R_{AC-III}·deg^in_G(v)·deg_G(v)) = O(R_AC·deg^in_G(v)·deg_G(v)) = O(R_AC·Δ_in·Δ). The total time complexity for all vertices is O(∑_{v∈V_G} R_AC·deg^in_G(v)·deg_G(v)) = O(R_AC·Δ_in·∑_{v∈V_G} deg_G(v)) = O(R_AC·Δ_in·m).

Next, we analyze the space complexity. Algorithm 2 computes iH^(n)_G(v) by storing iH^(n-1)_G(u) for each u ∈ N^in_G(v), which costs O(deg^in_G(v)) space. Algorithm 3 computes l_upp(k,v); for each k ∈ [0, k_max], it stores the out-H-indexes of the out-neighbors in O(deg^out_G(v)) space, so Phase II requires O(deg^in_G(v)·deg^out_G(v)) space. Phase III refines l_upp(k,v) to l_max(k,v) in Algorithm 4; for each k ∈ [0, k_max], it stores l_upp(k,v′) for all of v's neighbors in O(deg^in_G(v)·deg_G(v)) space. Overall, the space complexity of Algorithm 1 is O(deg^in_G(v) + deg^in_G(v)·deg^out_G(v) + deg^in_G(v)·deg_G(v)) = O(deg^in_G(v)·deg_G(v)) = O(Δ_in·Δ). The total space complexity for all vertices is O(∑_{v∈V_G} deg^in_G(v)·deg_G(v)) = O(Δ_in·m). ∎

Appendix C Proof of Theorem 4.11

Theorem 4.11 (Message Complexity). The message complexity (i.e., the total number of times that a vertex sends messages) of Algorithm 1 is O(Δ_in·Δ_out·Δ). The total message complexity for computing all vertices' corenesses is O(Δ_in·Δ_out·m).

Proof.

We analyze the message complexity in two parts: Algorithm 2, and Algorithms 3 and 4. First, Algorithm 2 calculates k_max(v) by iteratively computing the n-order in-H-index of v. Whenever the n-order in-H-index differs from the (n−1)-order in-H-index, Algorithm 2 sends messages to all of v's out-neighbors. Since the in-H-index of v can decrease at most deg^in_G(v) times, v sends messages at most deg^in_G(v) times. Hence, the message complexity of Algorithm 2 is O(deg^in_G(v)·deg^out_G(v)).

Next, for each value of k, Algorithm 3 first computes the upper bound l_upp(k,v), and then Algorithm 4 refines l_upp(k,v) to l_max(k,v). Algorithm 3 obtains l_upp(k,v) by iteratively computing the n-order out-H-index of v; whenever the n-order out-H-index of v decreases, Algorithm 3 sends messages to v's in-neighbors, and it continues until the n-order out-H-index of v equals l_upp(k,v). Then, Algorithm 4 refines l_upp(k,v) by gradually decreasing it to l_max(k,v); each time l_upp(k,v) decreases, Algorithm 4 sends messages to v's neighbors. Since l_max(k,v) is at least 1 and there are at most deg^in_G(v) values of k, the total number of messages sent by v is deg^out_G(v)·deg^in_G(v)·deg_G(v) in the worst case. Overall, the message complexity of Algorithms 3 and 4 is O(deg^out_G(v)·deg^in_G(v)·deg_G(v)).

As a result, the message complexity of Algorithm 1 is O(degGin(v)degGout(v)+degGout(v)degGin(v)degG(v))O(deg^{in}_{G}(v)\cdot deg^{out}_{G}(v)+deg^{out}_{G}(v)\cdot deg^{in}_{G}(v)\cdot deg_{G}(v)) == O(degGout(v)degGin(v)degG(v))O(deg^{out}_{G}(v)\cdot deg^{in}_{G}(v)\cdot deg_{G}(v)) == O(ΔinΔoutΔ)O(\Delta_{in}\cdot\Delta_{out}\cdot\Delta). The total message complexity for all vertices is O(vVGdegGout(v)degGin(v)degG(v))O(\sum\limits_{v\in V_{G}}deg^{out}_{G}(v)\cdot deg^{in}_{G}(v)\cdot deg_{G}(v)) == O(ΔinΔoutvVGdegG(v))O(\Delta_{in}\cdot\Delta_{out}\cdot\sum\limits_{v\in V_{G}}deg_{G}(v)) == O(ΔinΔoutm)O(\Delta_{in}\cdot\Delta_{out}\cdot m). ∎

Appendix D Proof of Theorem 5.8

Theorem 5.8 (Time and Space Complexities). Algorithm 5 takes O(R_SC·Δ_in·Δ_out) time and O(Δ·min{Δ_in, Δ_out}) space. The total time and space complexities for computing all vertices' corenesses are O(R_SC·Δ_in·m) and O(min{Δ_in, Δ_out}·m), respectively.

Proof.

The time complexity of Algorithm 5 is dominated by the iterative computation of nn-order D-index. In each round, Algorithm 6 examines at most degGin(v)degGout(v)deg^{in}_{G}(v)\cdot deg^{out}_{G}(v) pairs. Therefore, the time complexity of Algorithm 5 is O(RSCdegGin(v)degGout(v))O(R_{SC}\cdot deg^{in}_{G}(v)\cdot deg^{out}_{G}(v)) == O(RSCΔinΔout)O(R_{SC}\cdot\Delta_{in}\cdot\Delta_{out}). The total time complexity for all vertices is O(vVGRSCdegGin(v)degGout(v))O(\sum\limits_{v\in V_{G}}R_{SC}\cdot deg^{in}_{G}(v)\cdot deg^{out}_{G}(v)) == O(RSCΔinvVGdegG(v))O(R_{SC}\cdot\Delta_{in}\cdot\sum\limits_{v\in V_{G}}deg_{G}(v)) == O(RSCΔinm)O(R_{SC}\cdot\Delta_{in}\cdot m).

Next, we analyze the space complexity. To compute v's n-order D-index, Algorithm 5 stores the (n−1)-order D-indexes of all of v's neighbors, and the size of each vertex's D-index is bounded by min{Δ_in, Δ_out}. Thus, the space complexity of Algorithm 5 is O(deg_G(v)·min{Δ_in, Δ_out}) = O(Δ·min{Δ_in, Δ_out}). The total space complexity for all vertices is O(∑_{v∈V_G} deg_G(v)·min{Δ_in, Δ_out}) = O(min{Δ_in, Δ_out}·∑_{v∈V_G} deg_G(v)) = O(min{Δ_in, Δ_out}·m). ∎

Appendix E Proof of Theorem 5.9

Theorem 5.9 (Message Complexity). The message complexity of Algorithm 5 is O(Δ^2). The total message complexity for computing all vertices' corenesses is O(Δ·m).

Proof.

The message complexity of Algorithm 5 is dominated by the iterative computation of the n-order D-index at vertex v: whenever the n-order D-index differs from the (n−1)-order D-index, v broadcasts messages to its deg_G(v) neighbors. The n-order D-index contains skyline pairs, and in each such round some skyline pair decreases by at least one in one dimension, so v broadcasts in at most deg_G(v) rounds. Hence, the message complexity of Algorithm 5 is O(deg_G(v)·deg_G(v)) = O(Δ^2). The total message complexity for all vertices is O(∑_{v∈V_G} deg_G(v)·deg_G(v)) = O(Δ·∑_{v∈V_G} deg_G(v)) = O(Δ·m). ∎

Appendix F Performance comparisons on a single machine

This set of experiments evaluates the performance of our proposed algorithms and the state-of-the-art centralized method (Fang et al., 2018) on a single machine. Figure 8 shows the running time results. We can observe that for most of the small graphs, the peeling algorithm is more efficient than our algorithms. The reason is twofold: (1) our proposed algorithms require iterative computations of the in-H-index/out-H-index/D-index for every vertex before convergence, which is time-consuming in centralized settings; and (2) on a single machine, a vertex state update is directly visible to other vertices in the peeling algorithm, whereas our algorithms exchange messages, which incurs a higher overhead. Nevertheless, our proposed algorithms run much faster than the peeling method in distributed settings, as reported in Section 6.2. On the other hand, for the billion-scale graphs UK5 and IT, none of the algorithms could finish due to memory overflow. This indeed motivates the need for distributed D-core decomposition over large directed graphs using a collection of machines.

Figure 8. Performance comparisons on a single machine.