
A Modular Framework for
Centrality and Clustering in Complex Networks

Frederique Oggier, Silivanxay Phetsouvanh, and Anwitaman Datta F. Oggier is with Division of Mathematical Sciences, Nanyang Technological University, Singapore.
E-mail: [email protected] S. Phetsouvanh and A. Datta were/are with the School of Computer Science and Engineering, Nanyang Technological University, Singapore.
E-mail: [email protected]
Abstract

The structure of many complex networks includes edge directionality and weights on top of their topology. Network analysis techniques that can seamlessly consider these properties in combination are desirable. In this paper, we study two important such network analysis techniques, namely, centrality and clustering. An information-flow based model is adopted for clustering, which itself builds upon an information theoretic measure for computing centrality. Our principal contributions include a generalized model of Markov entropic centrality with the flexibility to tune the importance of node degrees, edge weights and directions, with a closed-form asymptotic analysis. It leads to a novel two-stage graph clustering algorithm. The centrality analysis helps reason about the suitability of our approach to cluster a given graph, and determine ‘query’ nodes, around which to explore local community structures, leading to an agglomerative clustering mechanism. The entropic centrality computations are amortized by our clustering algorithm, making it computationally efficient: compared to prior approaches using Markov entropic centrality for clustering, our experiments demonstrate multiple orders of magnitude of speed-up. Our clustering algorithm naturally inherits the flexibility to accommodate edge directionality, as well as different interpretations of, and interplay between, edge weights and node degrees. Overall, this paper thus not only makes significant theoretical and conceptual contributions, but also translates the findings into artifacts of practical relevance, yielding new, effective and scalable centrality computations and graph clustering algorithms, whose efficacy has been validated through extensive benchmarking experiments.

Index Terms:
Directed Weighted Graphs, Entropy, Centrality, Graph Clustering, Random Walkers.

1 Introduction

Complex networks [9] are a form of structured data that model interactions among individuals in diverse real-life applications. New instances of complex networks may be studied with existing techniques, or may motivate the introduction of new approaches. An example of a relatively recent and fast evolving complex network is the Bitcoin network, where individuals or nodes are Bitcoin addresses, and edges represent transactions from one address to another; they are thus directed and weighted. There are many reasons to study the Bitcoin network and other cryptocurrency networks as complex networks: understanding their topology and their evolution over time in turn offers economic or societal insights. Centrality analysis and clustering (also known as community detection) are two popular techniques that are used to study network structures and could be applied.

Centrality: The identification of key individuals in a complex network is often done using centrality, a family of measures that captures the importance of nodes in a graph. Centrality measures are often categorized (see Figure 1) by either flow or walk structure. (i) Flow-based measures come with a 2-dimensional typology [6]: the trajectory of the flow (categorized as - geodesics, paths where neither nodes nor edges are repeated, trails where nodes are repeated but edges are not, and walks, where both nodes and edges can be repeated) and the method of spread (which includes - transfer of an atomic flow, duplication or replication of the flow or a split of a volume conserved flow). (ii) Walk structure-based measures are partitioned into radial centralities (counting walks which start/end from a given vertex, e.g. the degree and the eigenvalue centralities) and medial centralities (counting walks which pass through a given vertex, e.g. betweenness centrality).

Our focus is on flow-based measures, motivated by the study of phenomena such as the movement of cryptocurrencies across a network from one sender to a receiver, where the volume of the flow is conserved. In [36], it was argued that the existing flow-based centrality measures back then left a void in the typology because they did not capture well the idea of path-transfer flow. Accordingly entropy, as a measure of uncertainty of dispersal, was advanced as a new centrality measure. The idea was modified in [23], where the path restriction was relaxed from the original idea of [36], allowing for the study of the flow as a random walker using a Markov model.

In order to distinguish senders from receivers, and to treat as equivalent or distinct (as a specific analysis might require) users involved in a few transactions of large amounts versus users carrying out a large number of small transactions, among other such nuances, it is desirable to have the flexibility to consider the directionality and weights of the edges, which may carry their own semantics. While a notion of weighted degree centrality has been proposed [28], this paper proposes a notion of flow-based centrality which is seamlessly adjustable to undirected, directed and/or weighted graphs. Akin to [23], our work uses a random walker starting its walk at any given node, whose centrality is captured by the spread of the walk, measured in terms of entropy, and an absorption probability is used to model the walker stopping at any destination. While [23] rendered the usage of entropic centrality more practical than [36] by introducing the idea of a random walker, it also has limitations both in the theoretical exploration of the concepts as well as the resulting practical implications. These limitations, together with the potential of the ideas themselves, serve as motivations for our current work. Particularly, we provide, among other improvements, closed-form expressions to characterize the behaviour of the walker. The specific details are elaborated when we discuss our contributions below.

Refer to caption
Figure 1: Positioning our work in the current centrality and clustering landscape.

Graph clustering: A set of nodes that share common or similar features are considered to comprise a cluster in a network. A general understanding of a good cluster is that it is dense (nodes within are well connected), and has few connections with the rest of the network [35, 22]. Beyond that, there is no unique criterion that necessarily captures or reflects the quality of the clusters. Depending on the particular aspects being captured, different approaches may be relevant, and likewise, different outcomes may be expected and desirable. In [35], two broad families of graph clustering methods have been identified (see Figure 1) - (i) those based on vertex similarity in some manner, e.g., a distance or similarity measure which may or may not take into account structural information (for instance, web-page similarity may consider only content similarity, or may also consider the hyperlinks to determine similarity), adjacency or connectivity based measures; and (ii) those based on cluster fitness measures, including density and cut-based measures, e.g., modularity. In [22], the former are referred to as pattern based, since these methods go beyond basic edge density characteristics. They include algorithms relying on random walkers [1, 4, 16], where the basic premise is that random walks originating at a given ‘query’ node are more likely to stay confined within a subgraph which forms a local community containing the query node, than to visit nodes outside said community. The probability distribution of a random walker’s visit (and absorption) at different nodes in the graph is used as a signal to identify cluster boundaries.

It is observed in [22] that while numerous clustering algorithms exist for undirected (weighted) graphs (see [35] for a survey of well known clustering techniques for undirected graphs), clustering algorithms for directed graphs are fewer. Most algorithms rely on creating an “undirected” representation of the directed graph considered, which may result in the loss of semantics and information contained in the directionalities. Examples of notable exceptions where the clustering algorithm is designed specifically for directed graphs are the information flow approaches of [34], where clusters are identified as subgraphs where the information flow within is larger compared to outside node groups, and [16], where an information flow-based cluster is a group of nodes where a random walker (referred to as a surfer in that work) is more likely to get trapped, rather than moving outside the group. Tellingly, the other random walker based approaches mentioned above, namely [1, 4], because of the additional constraints imposed on the walkers in those algorithms, are only suitable for undirected unweighted graphs.

Since directionality is of importance in many set-ups such as the example of a cryptocurrency network, we will focus on clustering algorithms that can handle both (un)directed and/or (un)weighted graphs, and thus we explore flow circulation based algorithms, which fall within the pattern based (vertex similarity) family of graph clustering algorithms. This choice has a natural coupling with our proposed entropic centrality measure. Under the premise of confinement of flows within a subset of nodes as a way to interpret community structures, one may further consider the flow using any of the many characterizations of flows described above in the context of centrality measures, including those supporting the treatment of directed and weighted graphs, each in turn capturing distinct real world phenomena. We would like to emphasize that these distinct approaches of interpreting the flows are not meant to compete with each other (let alone, compete with other clustering algorithms such as those based on local density), and are instead targeted at modelling and capturing distinct behaviors. Given a complex network which is inherently both directed and weighted, e.g., the Bitcoin network, one may need to study its properties considering its directionality, with or without the influence of weights, considering possibly different interpretations of weights, or just its bare topology. It is thus important to have a single clustering method with enough versatility to accommodate and instantiate the spectrum of combinations. Our proposed approach is designed to meet this particular objective.

Organization. The paper is logically organized in two halves: the first part (Sections 2 and 3) is dedicated to the centrality analysis, while the second (Section 4), which relies on the first part, delves into the clustering algorithm. Relevant experiments are reported within the corresponding sections.

In Section 2, we carry out a study of the proposed Markov entropic centrality for unweighted (un)directed graphs. We coin the term ‘Markov’ entropic centrality, as opposed to the original path based model studied in [36], because our model leverages the random walks being absorbed in an abstraction called auxiliary nodes with certain probability, and the future steps of the random walker depend solely on its current location, irrespective of its past. We analyze the behavior of this centrality as a function of (a) the probability that the walker stops at given nodes, and (b) time (or number of steps in the walk). Results are analytically proven in terms of bounds and a closed-form expression captures the asymptotic behavior of the random walk probability distribution for unweighted directed graphs (undirected graphs can be readily captured by introducing directed edges for both directions), in contrast to the numerical computations carried out in [23]. An immediate practical implication of such a closed-form solution instead of reliance on numerical computations is computational efficiency and scalability of the model. Our analysis provides a thorough understanding of the role of an associated parameter termed as the absorption probability, as well as the flexibility to explore the probability being a constant at all nodes, or being dependent on the individual nodes’ degrees — which in turn makes the model versatile to capture different situations, and enables reasoning about the choice of the parameter. This is in contrast to the ad-hoc choices used for experiments in [23]. Finally, we introduce the notion of Markov entropic centralization which we study analytically in the asymptotic time regime as a way to provide further insights regarding the relative importance of the nodes.

In Section 3, we generalize the theory of entropic centrality for weighted graphs, taking into account the need to accommodate different interpretations of edge weights, and their competition with the number of edges, in determining node centrality, e.g., tune the model as per the need of an analysis to let an edge with weight 3 have either more, less, or equal contribution to a node’s centrality compared with three edges of weight 1. We refer to [28] for a more elaborate discussion on interpreting centrality for weighted graphs. This is furthermore in contrast to the treatment in [23], where weighted graphs are dealt with by changing the transition probabilities of the random walker as per edge weights, which is one specific instance within our tunable model.

Overall, Sections 2 and 3 result in our first set of contributions, which is vis-à-vis entropic centrality. We provide (i) a generalization of the entropic centrality model, which can be tuned to capture a spectrum of combinations in terms of the role of the edges’ weights and directionality, accompanied with (ii) a mathematically rigorous analysis which helps understand the role of the model parameters and choose them judiciously, ultimately resulting in (iii) a significantly more computationally efficient, and thus scalable, model with respect to prior work.

Section 4 contains the proposed clustering algorithm. The above analysis of the random walker probability distributions carried out in the context of entropic centrality, accompanied with an analytical study of the graph centralization measure which captures how tightly the graph is organized around high centrality nodes, directly lends itself to the design of a novel graph clustering (community detection) algorithm, which amortizes the computations carried out for computing the node centralities. Foremost, from the study of the (absorption) probability distributions of the random walkers and the resulting entropic centrality scores of the nodes, we observed that: (i) nodes with a low entropic centrality have local clusters where ties are relatively strong, while nodes tend to have a relatively high entropic centrality when they either are embedded within a large cluster, or when they are at the boundary of two clusters (and could arguably belong to either), and moreover, (ii) sharp changes in the random walker absorption probability distribution signal boundaries of local community structures.

This leads to a two-stage clustering algorithm: First, observing that nodes at the boundary of clusters often act as hubs and have high centrality, we determine meaningful node centric local communities around nodes with low entropic centrality. This approach inherits the flexibility of the underlying centrality model, and is applicable to the spectrum of combinations (un/directed, un/weighted) of graphs. Furthermore, the strong coupling of the algorithm’s design with the aforementioned centrality measure allows the random walkers to be initiated in an informed manner, in contrast to previous works based on random walks, such as [1, 4], which, in the absence of any guidance on where to initiate random walks, needed more complex (computationally, as well as in terms of implementation and analysis), constrained random walkers. Second, the process is reapplied on the created clusters (instead of the nodes) to effectuate a bottom-up, scalable, hierarchical clustering. Working principles behind the heuristic algorithms are supported by formal derivations, and the performance of the algorithms has been tested with experiments in terms of sensitivity and correctness, as well as through rigorous benchmarking using synthetic and real world networks with ground truth.

Finally, the existing flow based clustering approaches (and most clustering approaches, in general) do not have an inherent way to reason about the quality of clusters they may yield, given any graph. In contrast, the distribution of the relative values of the entropic centrality of nodes, together with the graph centralization, informs us on whether and how much a given graph is amenable to the use of our approach for carrying out clustering.

In [23], a radically different approach (adapting [11]) for community detection is applied, where edges are iteratively removed, and in each iteration one needs to identify the edge such that the average entropic centrality over all the nodes is reduced the most. This requires the computation of entropic centrality of all the nodes multiple times for the different resulting graph snapshots from edge removals, and do so numerically over and over again. Naturally, the approach is computationally intensive and not scalable. For a specific undirected unweighted graph, our algorithm took 1.072 seconds to compute the communities, which is multiple orders of magnitude faster than the 3196.07 seconds needed by the edge removal algorithm [23] (more details on the specific experiments can be found in Subsection 4.4). The communities detected by our approach were also qualitatively better for the network where the two approaches were compared head-to-head. Moreover, the authors in [23] noted that their approach was not useful for community detection in directed graphs. We demonstrate with experiments that our approach works also with directed graphs.

To summarize, our second set of contributions are vis-à-vis graph clustering. The salient aspects of our work comprise (i) the use of entropic centrality (and underlying analysis) to inform on whether our approach will yield good quality of clusters given a graph, and if so, (ii) how to identify good candidate ‘query nodes’ around which initial sets of local community structures should be explored, which in turn yields (iii) an effective novel two-stage clustering algorithm even while using a single random walker (as opposed to prior studies, where random walkers initiated at random query nodes were shown not to be effective, necessitating more complex and constrained walks by multiple random walkers). Our clustering algorithm (iv) amortizes the previous computation of entropic centralities in computing these communities, and (v) inherits (from the underlying centrality model) the flexibility of interpreting the roles of edge weights and directionality, allowing for capturing different behaviors of a weighted directed graph as per application needs. Our approach is thus one of the few graph clustering approaches in the literature that handles directed graphs naturally, and additionally, it allows for flexible incorporation and interpretation of edge weights.

Experiments and Benchmarking: Emerging complex networks such as the Bitcoin network motivated the introduction of a framework for centrality and clustering that handles a variety of angles from which the network structures can be studied, considering in combination the direction and weight of edges. The principal objective of the proposed clustering algorithm is to seamlessly accommodate the various combinations in which the network can be interpreted. A second objective is to ensure that our approach also yields meaningful clusters in general, and in particular, that the obtained results are meaningful when applied to standard graphs used for graph clustering benchmarking. Experiments to support our work are designed accordingly.

Clustering of Bitcoin transaction subgraphs [24, 25] is performed to explore the versatility of our model vis-à-vis the spectrum of choices in interpreting the directionality and weights of edges. Since no ground truth is known, we compare the obtained results with an existing Bitcoin forensics study [32].

Evaluating in general the quality of a clustering algorithm is a debated topic [37, 30]. Complications arise because of multiple reasons, including that different clustering algorithms may identify communities which may reflect different aspects of relationships and thus different results may not necessarily be wrong or worse; there may be multiple interpretations of relationships, resulting in different ‘ground truths’ being valid. We thus report results from a variety of networks, falling into two categories: (a) synthetic networks, and (b) real world graphs with meta-information used to assign a ground truth.

We apply our algorithm on synthetically generated networks, which eliminates the ambiguity of ‘ground truth’ since it is determined by design. We refer to [17, 18] for more details advocating the use of synthetic graphs for benchmarking community detection algorithms. A caveat with using synthetic benchmark graphs is that they may themselves be created following certain rules, e.g., [18] is generated guided by local density measures. That in turn may favor specific families of clustering algorithms. Ranging from small graphs (of 100 to 500 nodes) to larger graphs (of 1000 and 4000 nodes), the proposed algorithm fares well and is competitive with Louvain [5], with our results showing neither algorithm clearly outperforming the other in determining the clusters, but rather, each algorithm outperforming the other depending on individual instances.

We carry out comparative studies with classical networks and connect these experimental studies back to the conceptual foundations of our approach, which allow us to understand whether our approach is expected to yield good clustering results for a given graph depending on the distribution of the relative values of entropic centralities. This includes the classical karate club [38] and the small cocaine dealing network [8] to interpret and explain the meaning of our model and validate the corresponding findings guided by the well understood narrative associated with these graphs, but also the dolphin network [21] and the American football network [11], for which the performance of our clustering algorithm is compared with well known algorithms, namely Louvain [5], Infomap [34] and label propagation [2]. Remark: All experiments and implementations presented in this paper were done in Python, using in particular the libraries NetworkX [14] and Scikit-Learn [29]. All graphs were drawn using either NetworkX or Gephi [3].

2 Entropic Centrality for Unweighted Graphs

A right stochastic matrix $P$ is a square matrix where every coefficient is a non-negative real number and each row sums to 1. A right $n\times n$ stochastic matrix describes a Markov chain $X_{t}$, $t\geq 1$, over $n$ states. The coefficient $P_{u,v}$ of $P$ is the probability of moving from state $u$ to state $v$, and since the total $\sum_{v=1}^{n}P_{u,v}$ of transition probabilities from state $u$ to all other states must be 1 by definition of probability, the matrix $P$ is indeed a right stochastic matrix, called a transition matrix. To show that $P^{k}$ is again a right stochastic matrix for any positive integer $k$, take the all-one column vector ${\bf 1}$ and notice that $P{\bf 1}={\bf 1}$ for a right stochastic matrix. Then $P^{k}{\bf 1}=P^{k-1}(P{\bf 1})=P^{k-1}{\bf 1}=\cdots={\bf 1}$. Let $(P_{u,l})_{l}$ be the $u$th row of $P$, and $(P_{l,v})_{l}$ be its $v$th column. Their product $\sum_{l}P_{u,l}P_{l,v}$ gives the probability of going from state $u$ to state $v$ in two steps, where the intermediate state $l$ ranges over all possible states. In general, $P^{k}$ gives the probabilities of going from any state to any other state in $k$ steps.
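As a quick sanity check of these properties, the following sketch (using NumPy; the matrix is illustrative and not from the paper) verifies that $P{\bf 1}={\bf 1}$ and that powers of $P$ remain right stochastic:

```python
import numpy as np

# A small right stochastic matrix: each row is a probability distribution.
P = np.array([
    [0.5, 0.5, 0.0],
    [0.2, 0.3, 0.5],
    [0.0, 1.0, 0.0],
])

ones = np.ones(3)
assert np.allclose(P @ ones, ones)  # P 1 = 1

# Hence P^k 1 = P^(k-1) (P 1) = ... = 1: P^k is again right stochastic.
Pk = np.linalg.matrix_power(P, 4)
assert np.allclose(Pk @ ones, ones)

# (P^k)[u, v] is the probability of moving from state u to state v in k steps.
print(Pk)
```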

Consider now a connected directed graph $G=(V,E)$ with $n=|V|$ nodes labeled from $1$ to $n$ and $|E|$ directed edges, and a random walker on $G$, which starts a random walk at any given node according to an initial probability distribution. If the walker decides on which node $v$ to walk to based only on the current node $u$ at which it currently is ($v$ does not have to be distinct from $u$), the random walk is a Markov chain, which can be modeled by a stochastic matrix $P$ where $P_{uv}$ is the probability to go to node $v$ from node $u$. We will assume that every node has a self-loop (a self-loop is needed for the model to encapsulate the case of nodes with zero out-degree). Let $d_{out}(u)$ denote the out-degree of $u$, which includes the self-loop as per our model. A typical choice that we consider in this section is $P_{uv}=\frac{1}{d_{out}(u)}$ for every node $v$ belonging to the set $\mathcal{N}_{u}$ of out-neighbors of $u$ (this is saying that each neighbor is chosen uniformly at random). Other choices are possible, and indeed, subsequently in this paper, we shall consider the case of weighted graphs where the transition probabilities depend on the edge weights.
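For concreteness, a transition matrix of this form can be built as follows; a minimal sketch, with a hypothetical 4-node edge list chosen purely for illustration:

```python
import numpy as np

n = 4
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 0)]  # hypothetical directed edges

adj = np.eye(n)  # a self-loop at every node, as the model requires
for u, v in edges:
    adj[u, v] = 1.0

# P_uv = 1 / d_out(u): each out-neighbor (self-loop included) is chosen
# uniformly at random; d_out(u) here counts the self-loop.
P = adj / adj.sum(axis=1, keepdims=True)
assert np.allclose(P.sum(axis=1), 1.0)  # P is right stochastic
```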

2.1 Markov Entropic Centrality

The notion of entropic centrality $C_{H}(u)$ for a node $u$ in an undirected graph was originally defined in [36] as

$$C_{H}(u)=-\sum_{v\in V}q_{uv}\log_{2}(q_{uv}) \qquad (1)$$

where $q_{uv}$ denotes the probability of reaching $v$ from $u$ following any path. This formalizes the idea that a node is central if there is a high uncertainty about the destination of a random walker starting at it, assuming that at each hop, the random walker is equally likely to continue to any unvisited node, or to stop there. This is because the Shannon entropy $H(X)=-\sum_{x}p(X=x)\log_{2}p(X=x)$ of a random variable $X$ measures the amount of uncertainty contained in $X$. Entropic centrality given in (1) was then extended to non-atomic flows in [26, 27] and to a Markov model in [23] by relaxing the assumption (in [36]) that a random walker cannot revisit previously traversed nodes. In this paper we generalize the Markov entropic centrality model. This notion of centrality captures the influence of “friends”, and of friends of friends (see e.g. [15] for a discussion of the role of friends in considering centralities).
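The entropy computation in (1) is straightforward; a minimal sketch (the destination distributions below are illustrative, not derived from any particular graph):

```python
import numpy as np

def shannon_entropy(q):
    """H(q) = -sum_v q_v log2(q_v), with the convention 0 log 0 = 0."""
    q = np.asarray(q, dtype=float)
    nz = q > 0
    return -np.sum(q[nz] * np.log2(q[nz]))

# A walker certain of its destination: 0 bits of uncertainty (low centrality).
print(shannon_entropy([1.0, 0.0, 0.0, 0.0]))  # -> 0.0
# A walker equally likely to end at any of 8 nodes: 3 bits (high centrality).
print(shannon_entropy([1 / 8] * 8))           # -> 3.0
```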

To define a notion of entropic centrality similar to (1), the random walker actually needs to have the choice of stopping at any given node. This is not well captured by the transition matrix $P$ described above, even if there is a self-loop at every node. Thus for every node $v$ in $G$, an auxiliary node $v'$ is added [23], together with a single directed edge $(v,v')$, and a probability $p_{vv'}$ to be chosen (the original probabilities $p_{uv}$ are adjusted to the probabilities $\tilde{p}_{uv}$ so that the overall matrix remains stochastic). Once the flow reaches an auxiliary node, it is absorbed because the auxiliary nodes have no outgoing edges. This gives a notion of Markov entropic centrality as defined in [23].

Definition 1

The Markov entropic centrality of a node $u$ at time $t$ is defined to be

$$C_{H}^{(t)}(u)=-\sum_{v\in V}\big(\tilde{p}_{uv}^{(t)}+p_{uv'}^{(t)}\big)\log_{2}\big(\tilde{p}_{uv}^{(t)}+p_{uv'}^{(t)}\big) \qquad (2)$$

where $\tilde{p}_{uv}^{(t)}$ is the probability to reach $v$ at time $t$ from $u$, for $v$ a node in $V$, and $p_{uv'}^{(t)}$ is the probability to reach the auxiliary node $v'$ from $u$.

A node $u$ is central if $C_{H}^{(t)}(u)$ is high: using the underlying entropy interpretation, this means that for a random walker starting at node $u$, the uncertainty about its destination $v$ after $t$ steps is high, and thus that $u$ is well connected.

The time parameter $t$ can also be interpreted as a notion of locality. It describes a diameter around the node $u$, of length $t$ steps. Thus the entropic centrality at $t=1$ emphasizes a node’s degree, $t$ being the graph diameter implies that we are considering a period of time by which the whole graph can first be reached, and $t\rightarrow\infty$ describes the asymptotic behavior over time. Thus $C_{H}^{(t)}(u)$ can be regarded as a measure of influence of $u$ over its close neighborhood for small values of $t$, and over the whole graph asymptotically.

Given an $n\times n$ stochastic matrix $P$ describing the moves of a random walker on a directed graph, let us then introduce $n=|V|$ auxiliary nodes, one for each node $v\in V$, with corresponding probabilities $p_{vv'}=D_{vv}$ to walk from node $v$ to node $v'$. This means the out-degree of $u$ increases by 1 because of the addition of $u'$. Nevertheless, by $d_{out}(u)$, we will actually refer to the degree of $u$ before the addition of $u'$. This creates the following right stochastic matrix

$$\hat{P}=\begin{bmatrix}\tilde{P}&D\\ {\bf 0}_{n}&{\bf I}_{n}\end{bmatrix}$$

where $D=(D_{uu})$ is a diagonal matrix, and $\tilde{P}$ is such that $[\tilde{P},D]{\bf 1}={\bf 1}$. We assume that $\tilde{P}$ has for $u$-th row $(\tilde{P}_{u,l})_{l}=(1-p_{uu'})(P_{u,l})_{l}$. Then $\sum_{l=1}^{2n}(\hat{P})_{ul}=(1-p_{uu'})\sum_{l=1}^{n}p_{ul}+p_{uu'}=1$ for every $u$. This is alternatively written as $\tilde{P}=({\bf I}_{n}-D)P$. The identity matrix ${\bf I}_{n}$ represents the stoppage of the flow at the auxiliary nodes (an absorption of the flow arriving at these nodes). To determine the centrality of a specific node $u$, we assume an initial distribution that gives a probability of 1 to start at $u$, and 0 elsewhere.
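Putting the pieces together, Definition 1 can be evaluated directly by forming $\hat{P}$ and reading off the $u$th row of $\hat{P}^{t}$. The sketch below is illustrative only (a directed 3-cycle with self-loops and a constant absorption probability of 0.2), not the authors' implementation:

```python
import numpy as np

def markov_entropic_centrality(P, absorb, u, t):
    """C_H^(t)(u) per Definition 1: P is the original transition matrix,
    absorb[v] = p_{vv'} the absorption probabilities (a sketch)."""
    n = P.shape[0]
    D = np.diag(absorb)
    P_tilde = (np.eye(n) - D) @ P  # adjusted transition probabilities
    P_hat = np.block([[P_tilde, D],
                      [np.zeros((n, n)), np.eye(n)]])
    row = np.linalg.matrix_power(P_hat, t)[u]
    q = row[:n] + row[n:]  # p~_uv^(t) + p_uv'^(t), summed per node v
    nz = q > 0
    return -np.sum(q[nz] * np.log2(q[nz]))

# Directed 3-cycle with self-loops: each node moves to itself or its successor
# with probability 1/2 each (d_out = 2).
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])
print(markov_entropic_centrality(P, [0.2, 0.2, 0.2], u=0, t=5))
```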

The definition of Markov entropic centrality was used as part of a clustering algorithm in [23], and the probabilities $p_{vv'}$ were explored numerically to optimize the clustering results. Our first contribution is the following closed-form expression for the asymptotic behavior of the transition matrix.

Lemma 1

For an integer $t\geq 1$ and $D\neq{\bf 0}$, we have

$$\hat{P}^{t}=\begin{bmatrix}\tilde{P}^{t}&\big(\sum_{j=0}^{t-1}\tilde{P}^{j}\big)D\\ {\bf 0}_{n}&{\bf I}_{n}\end{bmatrix}. \qquad (3)$$

In particular

$$\hat{P}^{t}=\begin{bmatrix}{\bf 0}_{n}&({\bf I}_{n}-\tilde{P})^{-1}D\\ {\bf 0}_{n}&{\bf I}_{n}\end{bmatrix} \qquad (4)$$

when $t\rightarrow\infty$.

Proof:

Formula (3) follows from an immediate computation. Since $\tilde{P},D$ have non-negative real coefficients, so has $\tilde{P}^{j}D$, thus $\big(\sum_{j=0}^{l}\tilde{P}^{j}\big)D\geq\big(\sum_{j=0}^{l-1}\tilde{P}^{j}\big)D$ for any $l\geq 1$, and equality holds if and only if $\tilde{P}^{N}=0$ for some $N$. But this $N$ must exist when $t\rightarrow\infty$, because $\hat{P}$ is a stochastic matrix, meaning that the sum of every row must remain 1, while the coefficients of $\big(\sum_{j=0}^{t-1}\tilde{P}^{j}\big)D$ increase at every increment of $t\leq N$. Then

$$\hat{P}^{N}=\begin{bmatrix}{\bf 0}_{n}&\big(\sum_{j=0}^{N-1}\tilde{P}^{j}\big)D\\ {\bf 0}_{n}&{\bf I}_{n}\end{bmatrix}$$

but since $(\tilde{P}-{\bf I}_{n})\big(\sum_{j=0}^{N-1}\tilde{P}^{j}\big)=\tilde{P}^{N}-{\bf I}_{n}=-{\bf I}_{n}$, the matrix $(\tilde{P}-{\bf I}_{n})$ is invertible, $\sum_{j=0}^{N-1}\tilde{P}^{j}=-(\tilde{P}-{\bf I}_{n})^{-1}$, yielding (4). ∎
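Lemma 1 is easy to check numerically; the sketch below compares the upper-right block of $\hat{P}^{t}$ for a large $t$ against the closed form $({\bf I}_{n}-\tilde{P})^{-1}D$, on a randomly generated stochastic matrix (the example parameters are ours, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)            # a random right stochastic matrix
D = np.diag(rng.uniform(0.1, 0.3, size=n))   # nonzero absorption probabilities

P_tilde = (np.eye(n) - D) @ P
P_hat = np.block([[P_tilde, D],
                  [np.zeros((n, n)), np.eye(n)]])

closed_form = np.linalg.inv(np.eye(n) - P_tilde) @ D   # limit block in (4)
iterated = np.linalg.matrix_power(P_hat, 200)[:n, n:]
assert np.allclose(iterated, closed_form, atol=1e-6)

# The upper-left block P_tilde^t vanishes as t grows, matching (4).
assert np.allclose(np.linalg.matrix_power(P_hat, 200)[:n, :n], 0.0, atol=1e-6)
```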

The matrix $\hat{P}^{t}$ in (3) contains a first block $\tilde{P}^{t}$ whose coefficients $\tilde{p}_{uv}^{(t)}$ are the probabilities to go from $u$ to $v$ in $t$ steps. The second block $\big(\sum_{j=0}^{t-1}\tilde{P}^{j}\big)D$ contains as coefficients the probabilities to go from $u$ to $v'$ in $t$ steps, which we denote by $p_{uv'}^{(t)}$. Therefore the probabilities $\tilde{p}_{uv}^{(t)}+p_{uv'}^{(t)}$ in (2) are obtained from the $u$th row of the matrix $\hat{P}^{t}$, by summing the coefficients in the columns $v$ and $v'$. When $t\rightarrow\infty$, the term in column $v$ becomes 0, and we are left with the term in column $v'$,

of the form

πuv:=puv()={t1p~uv(t)pvvuv(t1p~uv(t)+1)pvvu=v.\pi_{uv}:=p_{uv^{\prime}}^{(\infty)}=\left\{\begin{array}[]{ll}\sum_{t\geq 1}\tilde{p}_{uv}^{(t)}p_{vv^{\prime}}&u\neq v\\ (\sum_{t\geq 1}\tilde{p}_{uv}^{(t)}+1)p_{vv^{\prime}}&u=v.\\ \end{array}\right.

This means that we are looking at the probability to start at uu and reach vv in t0t\geq 0 steps, and then to get absorbed at vv (that is reaching vv^{\prime}, and then not leaving vv^{\prime}). Asymptotically, the entropic centrality defined in (2) becomes

C_{H}^{\infty}(u)=-\sum_{v\in V}\pi_{uv}\log_{2}(\pi_{uv}).
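The asymptotic centrality lends itself to a direct matrix computation via Lemma 1. The following sketch (in Python with numpy; the function name and the choice p_{uu^{\prime}}=1/(d_{out}(u)+1), one of the options discussed below, are ours) computes the absorption probabilities \pi_{uv} as the rows of ({\bf I}_{n}-\tilde{P})^{-1}D, and then the entropy of each row:

```python
import numpy as np

def asymptotic_entropic_centrality(A):
    """Sketch of the asymptotic Markov entropic centrality.

    A: n x n adjacency matrix of a directed graph (row u = out-edges of u).
    Assumes the absorption probability p_{uu'} = 1/(d_out(u)+1).
    Returns (Pi, H): Pi[u, v] = pi_{uv}, H[u] = C_H^infinity(u).
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    d_out = A.sum(axis=1)
    # Transition matrix P on the original nodes (rows with d_out = 0 stay 0).
    P = np.divide(A, d_out[:, None], out=np.zeros_like(A),
                  where=d_out[:, None] > 0)
    p_abs = 1.0 / (d_out + 1.0)            # D_uu = 1/(d_out(u)+1)
    # Sub-stochastic walk on the original nodes: tilde{p}_uv = (1-p_uu') p_uv.
    P_tilde = (1.0 - p_abs)[:, None] * P
    # Lemma 1: asymptotic absorption block (I - tilde{P})^{-1} D.
    Pi = np.linalg.solve(np.eye(n) - P_tilde, np.diag(p_abs))
    # Row-wise entropy, with the convention 0 * log2(0) = 0.
    H = -np.where(Pi > 0, Pi * np.log2(Pi), 0.0).sum(axis=1)
    return Pi, H
```

On a directed 3-cycle, for instance, every row of \Pi sums to 1 (the walk is absorbed somewhere with probability 1) and all three nodes receive the same centrality by symmetry.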

We discuss next how the choice of the probabilities p_{uu^{\prime}} to reach an auxiliary node u^{\prime} from u influences the random walker at time t.

Lemma 2

The probability \tilde{p}_{uv}^{(t)} to start a walk at u and to reach v in t steps is bounded as follows:

(1-p_{uu^{\prime}})(\max_{w\in V}(1-p_{ww^{\prime}}))^{t-1}p_{uv}^{(t)}\;\geq\;\tilde{p}_{uv}^{(t)}\;\geq\;(1-p_{uu^{\prime}})(\min_{w\in V}(1-p_{ww^{\prime}}))^{t-1}p_{uv}^{(t)}.
Proof:

Since \tilde{p}_{uv}=(1-p_{uu^{\prime}})p_{uv}, we have

\tilde{p}_{uv}^{(2)}=\sum_{w\in out(u)\cap in(v)}\tilde{p}_{uw}\tilde{p}_{wv}=\sum_{w}(1-p_{uu^{\prime}})p_{uw}(1-p_{ww^{\prime}})p_{wv}\geq(1-p_{uu^{\prime}})\min_{w}(1-p_{ww^{\prime}})\sum_{w}p_{uw}p_{wv}=(1-p_{uu^{\prime}})\min_{w}(1-p_{ww^{\prime}})p^{(2)}_{uv},

\tilde{p}_{uv}^{(3)}=\sum_{x}\tilde{p}_{ux}\tilde{p}_{xv}^{(2)}\geq\sum_{x}(1-p_{uu^{\prime}})p_{ux}(1-p_{xx^{\prime}})\min_{w\in out(x)\cap in(v)}(1-p_{ww^{\prime}})p^{(2)}_{xv}\geq(1-p_{uu^{\prime}})\min_{x\in out(u)\cap in(w)}(1-p_{xx^{\prime}})\min_{w}(1-p_{ww^{\prime}})p_{uv}^{(3)}

and we observe that the minimization is taken over all walks from u to v in t steps, or more precisely over all nodes involved in such walks, except u and v. Certainly, for x in such a walk, \min_{x}(1-p_{xx^{\prime}})\geq\min_{w\in V}(1-p_{ww^{\prime}}). The other inequality is established identically, and thus

(1-p_{uu^{\prime}})(\max_{w\in V}(1-p_{ww^{\prime}}))^{t-1}p_{uv}^{(t)}\geq\tilde{p}_{uv}^{(t)}\geq(1-p_{uu^{\prime}})(\min_{w\in V}(1-p_{ww^{\prime}}))^{t-1}p_{uv}^{(t)}.

For t small, or for u an isolated node, better bounds are obtained by replacing the minimum/maximum over all nodes by those over the nodes appearing in the different walks from u to v; but for t large, starting from t\geq n-1, the walks from a well-connected node are likely to visit most nodes, so the bounds are likely to be tight. ∎
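The bounds of Lemma 2 can be checked numerically. The sketch below uses an arbitrary 4-node digraph of our choosing with p_{uu^{\prime}}=1/(d_{out}(u)+1), and compares \tilde{p}^{(t)} with the bounding multiples of p^{(t)} for small t:

```python
import numpy as np

# Numerical sanity check of Lemma 2 on a small example graph (our choice,
# not one from the paper), with absorption p_{uu'} = 1/(d_out(u)+1).
A = np.array([[0, 1, 1, 0],
              [0, 0, 1, 1],
              [1, 0, 0, 1],
              [1, 0, 0, 0]], dtype=float)
n = A.shape[0]
d_out = A.sum(axis=1)
P = A / d_out[:, None]                     # unabsorbed walk p_uv
p_abs = 1.0 / (d_out + 1.0)                # p_{uu'}
P_tilde = (1.0 - p_abs)[:, None] * P       # absorbed walk tilde{p}_uv

lo, hi = (1.0 - p_abs).min(), (1.0 - p_abs).max()
Pt, Pt_tilde = np.eye(n), np.eye(n)
for t in range(1, 6):
    Pt, Pt_tilde = Pt @ P, Pt_tilde @ P_tilde
    lower = (1.0 - p_abs)[:, None] * lo ** (t - 1) * Pt
    upper = (1.0 - p_abs)[:, None] * hi ** (t - 1) * Pt
    assert np.all(Pt_tilde >= lower - 1e-12)   # lower bound of Lemma 2
    assert np.all(Pt_tilde <= upper + 1e-12)   # upper bound of Lemma 2
```

At t=1 both bounds coincide with \tilde{p}_{uv} itself, matching the equality case of the lemma.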

Corollary 1

In particular:

  1. If p_{uu^{\prime}}=a<1 for all u\in V, we have \tilde{p}_{uv}^{(t)}=(1-a)^{t}p_{uv}^{(t)}.

  2. If p_{uv}=\frac{1}{d_{out}(u)} and p_{uu^{\prime}}=\frac{1}{d_{out}(u)+1}, then \tilde{p}_{uv}=\frac{1}{d_{out}(u)+1}, and

  (1-\tfrac{1}{d_{out}(u)+1})(\max_{w\in V}\tfrac{d_{out}(w)}{d_{out}(w)+1})^{t-1}p_{uv}^{(t)}\geq\tilde{p}_{uv}^{(t)}\geq(1-\tfrac{1}{d_{out}(u)+1})(\min_{w\in V}\tfrac{d_{out}(w)}{d_{out}(w)+1})^{t-1}p_{uv}^{(t)}.
Proof:
  1. All inequalities in the proof of the lemma are equalities in this case.

  2. By definition, \tilde{p}_{uv}=(1-\frac{1}{d_{out}(u)+1})\frac{1}{d_{out}(u)}=\frac{1}{d_{out}(u)+1}. ∎

The case when p_{uu^{\prime}}=a<1 is reminiscent of the notion of Katz centrality. A factor (1-a)^{t} is introduced so that the longer the path, the lower the probability. If a is chosen close to 1, e.g. a=0.9, then (1-a)^{t} (here \tfrac{1}{10^{t}}) quickly becomes negligible. If instead a is chosen close to 0, e.g. a=0.1, then it takes longer for probabilities to become negligible (e.g. (\tfrac{9}{10})^{50}\approx 0.0051).

The case p_{uu^{\prime}}=\frac{1}{d_{out}(u)+1} instead uses (inverse) proportionality to the number of outgoing edges. Both the upper and lower bounds given in the proof of the above corollary depend on the degrees of the nodes included in walks from u to v, and when the walk length grows, the number of distinct nodes is likely to increase, giving the bounds stated in the corollary. These bounds depend on the function \tfrac{x}{x+1}, which converges quickly to 1: already for d_{out}(v)=9 we get 0.9. Therefore \tilde{p}_{uv}^{(t)} mostly behaves like p_{uv}^{(t)}, except if d_{out}(u) is small enough (say less than 8).

In summary, the emphasis of the case p_{uu^{\prime}}=a<1 is on the length of the walk, not on the walk itself, while for p_{uu^{\prime}}=\frac{1}{d_{out}(u)+1} it is on the nodes traversed during the walk, with the ability to separate the nodes of low entropic centrality from the others.

The bounds of Lemma 2 can be applied to the asymptotic case.

Lemma 3

The probability \pi_{uw}=\sum_{t\geq 1}\tilde{p}_{uw}^{(t)}p_{ww^{\prime}} to start at u and to be absorbed at w\neq u over time is bounded as follows:

(1-p_{uu^{\prime}})\Big(\sum_{t\geq 1}(\min_{x\in V}(1-p_{xx^{\prime}}))^{t-1}p_{uw}^{(t)}\Big)p_{ww^{\prime}}\leq\pi_{uw}\leq(1-p_{uu^{\prime}})\Big(\sum_{t\geq 1}(\max_{x\in V}(1-p_{xx^{\prime}}))^{t-1}p_{uw}^{(t)}\Big)p_{ww^{\prime}}.
Proof:

Recall that \pi_{uw}=\sum_{t\geq 1}\tilde{p}_{uw}^{(t)}p_{ww^{\prime}} is the probability to start at u and to be absorbed at w over time. Then

\pi_{uw}=\sum_{t\geq 1}\tilde{p}_{uw}^{(t)}p_{ww^{\prime}}=\Big(\tilde{p}_{uw}+\sum_{t\geq 2}\sum_{i=1}^{t-1}\tilde{p}_{uv}^{(i)}\tilde{p}_{vw}^{(t-i)}+\sum_{t\geq 2}\sum_{i=1}^{t-1}\sum_{y\neq v}\tilde{p}_{uy}^{(i)}\tilde{p}_{yw}^{(t-i)}\Big)p_{ww^{\prime}} \qquad (5)

and use Lemma 2:

\tilde{p}_{uv}^{(i)}\geq(1-p_{uu^{\prime}})(\min_{x\in V}(1-p_{xx^{\prime}}))^{i-1}p_{uv}^{(i)},\quad\tilde{p}_{vw}^{(t-i)}\geq(1-p_{vv^{\prime}})(\min_{x\in V}(1-p_{xx^{\prime}}))^{t-i-1}p_{vw}^{(t-i)}

to get

\sum_{t\geq 2}\sum_{i=1}^{t-1}\tilde{p}_{uv}^{(i)}\tilde{p}_{vw}^{(t-i)}\geq(1-p_{uu^{\prime}})(1-p_{vv^{\prime}})\sum_{t\geq 2}(\min_{x\in V}(1-p_{xx^{\prime}}))^{t-2}\sum_{i=1}^{t-1}p_{uv}^{(i)}p_{vw}^{(t-i)}

\sum_{t\geq 2}\sum_{i=1}^{t-1}\sum_{y\neq v}\tilde{p}_{uy}^{(i)}\tilde{p}_{yw}^{(t-i)}\geq(1-p_{uu^{\prime}})\sum_{t\geq 2}(\min_{x\in V}(1-p_{xx^{\prime}}))^{t-2}\sum_{i=1}^{t-1}\sum_{y\neq v}(1-p_{yy^{\prime}})p_{uy}^{(i)}p_{yw}^{(t-i)}\geq(1-p_{uu^{\prime}})\sum_{t\geq 2}(\min_{x\in V}(1-p_{xx^{\prime}}))^{t-1}\sum_{i=1}^{t-1}\sum_{y\neq v}p_{uy}^{(i)}p_{yw}^{(t-i)}.

Therefore

\pi_{uw}\geq(1-p_{uu^{\prime}})\Big[p_{uw}+(1-p_{vv^{\prime}})\sum_{t\geq 2}(\min_{x\in V}(1-p_{xx^{\prime}}))^{t-2}\sum_{i=1}^{t-1}p_{uv}^{(i)}p_{vw}^{(t-i)}+\sum_{t\geq 2}(\min_{x\in V}(1-p_{xx^{\prime}}))^{t-1}\sum_{i=1}^{t-1}\sum_{y\neq v}p_{uy}^{(i)}p_{yw}^{(t-i)}\Big]p_{ww^{\prime}} \qquad (6)

for v an intermediate node between u and w.

The bound can be made coarser, using that (1-p_{vv^{\prime}})\geq\min_{x\in V}(1-p_{xx^{\prime}}):

\pi_{uw}\geq(1-p_{uu^{\prime}})\Big[p_{uw}+\sum_{t\geq 2}(\min_{x\in V}(1-p_{xx^{\prime}}))^{t-1}\sum_{i=1}^{t-1}\Big(p_{uv}^{(i)}p_{vw}^{(t-i)}+\sum_{y\neq v}p_{uy}^{(i)}p_{yw}^{(t-i)}\Big)\Big]p_{ww^{\prime}}=(1-p_{uu^{\prime}})\Big[p_{uw}+\sum_{t\geq 2}(\min_{x\in V}(1-p_{xx^{\prime}}))^{t-1}p_{uw}^{(t)}\Big]p_{ww^{\prime}}=(1-p_{uu^{\prime}})\sum_{t\geq 1}(\min_{x\in V}(1-p_{xx^{\prime}}))^{t-1}p_{uw}^{(t)}p_{ww^{\prime}}.

The upper bound is obtained in the same way, with the maximum in place of the minimum. ∎

The bounds match the intuition well: three components mostly influence \pi_{uw}: the likelihood of leaving u (the term 1-p_{uu^{\prime}}), that of reaching w from u, and that of getting trapped at w (the term p_{ww^{\prime}}). When p_{uu^{\prime}}=a<1 for all u\in V, the bounds on \pi_{uw} are met with equality, and we have that

\pi_{uw}=(1-a)\sum_{t\geq 1}(1-a)^{t-1}p_{uw}^{(t)}a,\quad\pi_{uu}=(1-a)\sum_{t\geq 1}(1-a)^{t-1}p_{uu}^{(t)}a+a.

Therefore \pi_{uw} is weighted by (1-a) and a irrespectively of the choice of u,w, and so is the corresponding entropic centrality: when a grows, \pi_{uu} grows, thus \pi_{uw} for w\neq u must decrease since probabilities sum up to 1, and

\sum_{w}\pi_{uw}\log_{2}(\tfrac{1}{\pi_{uw}})=\pi_{uu}\log_{2}(\tfrac{1}{\pi_{uu}})+\sum_{w\neq u}\pi_{uw}\log_{2}(\tfrac{1}{\pi_{uw}})\leq\pi_{uu}\log_{2}(\tfrac{1}{\pi_{uu}})+(1-a)\log_{2}(\tfrac{n-1}{1-a})

since the second sum contains n-1 terms whose sum is at most 1-a, and whose maximum is reached when all probabilities are \tfrac{1-a}{n-1}. Also, the function -x\log_{2}(x)=-\tfrac{1}{\ln 2}x\ln x is concave and has a global maximum at x such that -\tfrac{1}{\ln 2}(\ln x+1)=0, that is x=\tfrac{1}{e}\approx 0.36788, for which -x\log_{2}(x)\approx 0.53074. Thus

C_{H}^{\infty}(u)\leq 0.53074+(1-a)\log_{2}(\tfrac{n-1}{1-a}) \qquad (7)

and for a given n, this upper bound becomes small when a grows. The same argument repeated for p_{uu^{\prime}}=\frac{1}{d_{out}(u)+1} shows (part of) the role of the out-degree of u in its centrality.
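The bound (7) can be verified empirically for the constant-absorption case. In the sketch below (a random digraph of our choosing), Lemma 1 gives \Pi=({\bf I}_{n}-(1-a)P)^{-1}a{\bf I}_{n}, and the resulting entropies stay below the bound for each a:

```python
import numpy as np

# Empirical check of the upper bound (7) with constant absorption
# p_{uu'} = a, on a random directed graph (our choice, not from the paper).
rng = np.random.default_rng(0)
n = 20
A = (rng.random((n, n)) < 0.3).astype(float)
np.fill_diagonal(A, 0)
A[A.sum(axis=1) == 0, 0] = 1.0          # guard against sink nodes
P = A / A.sum(axis=1)[:, None]

for a in (0.1, 0.2, 0.5, 0.9):
    # Pi = (I - (1-a)P)^{-1} (a I): absorption probabilities of Lemma 1.
    Pi = np.linalg.solve(np.eye(n) - (1 - a) * P, a * np.eye(n))
    H = -np.where(Pi > 0, Pi * np.log2(Pi), 0.0).sum(axis=1)
    bound = 0.53074 + (1 - a) * np.log2((n - 1) / (1 - a))
    assert np.all(H <= bound + 1e-9)    # bound (7) holds for every node
```

As a grows, the bound (and with it every centrality value) shrinks, matching the flattening observed in the experiments below.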

In what follows, we provide some experiments to illustrate the consequences of choosing p_{uu^{\prime}}=a<1, but we then focus on the case p_{uu^{\prime}}=\frac{1}{d_{out}(u)+1} for the reasons just discussed: (1) \tilde{p}_{uv} is influenced by the walk itself rather than its length, and (2) \pi_{uv} depends on the degrees of u and v rather than on the constant a.

Lemma 4

Set n=|V|. The asymptotic Markov entropic centralization (see [10]) defined by

C_{H}^{\infty}(G)=\frac{\sum_{v\in V}C_{H}^{\infty}(\hat{v})-C_{H}^{\infty}(v)}{\max\sum_{v\in V}C_{H}^{\infty}(\hat{v})-C_{H}^{\infty}(v)},

where \hat{v} is the node with the highest Markov entropic centrality in V, is given by

C_{H}^{\infty}(G)=\frac{\sum_{v\in V}C_{H}^{\infty}(\hat{v})-C_{H}^{\infty}(v)}{n\log_{2}(n)}

when p_{uu^{\prime}}=\frac{1}{d_{out}(u)+1}.

Proof:

The maximum in the denominator is taken over all possible graphs with the same number of vertices. A graph that maximizes the denominator has one node with the maximum centrality, and all the other nodes with a centrality 0 (thus minimizing the terms contributing negatively). This graph is the star graph on |V| vertices: the middle node has |V| outgoing edges (counting its self-loop), and the |V|-1 leaves have none besides their self-loops. Therefore all leaves have an entropic centrality of 0, and the center \hat{v} of the star has maximum entropic centrality. Formally, with |V|=n and putting \hat{v} in the first row:

\hat{P}=\begin{bmatrix}\tilde{P}&D\\ {\bf 0}_{n}&{\bf I}_{n}\end{bmatrix},\quad\tilde{P}=\begin{bmatrix}\tfrac{1}{n+1}&\tfrac{1}{n+1}&\ldots&\tfrac{1}{n+1}\\ {\bf 0}_{n-1,1}&\tfrac{1}{2}{\bf I}_{n-1,n-1}\end{bmatrix},\quad D=\begin{bmatrix}\tfrac{1}{n+1}&{\bf 0}_{1,n-1}\\ {\bf 0}_{n-1,1}&\tfrac{1}{2}{\bf I}_{n-1,n-1}\end{bmatrix}.

Then

{\bf I}_{n}-\tilde{P}=\begin{bmatrix}\tfrac{n}{n+1}&-\tfrac{1}{n+1}&\ldots&-\tfrac{1}{n+1}\\ {\bf 0}_{n-1,1}&\tfrac{1}{2}{\bf I}_{n-1,n-1}\end{bmatrix},

and using the Schur complement:

({\bf I}_{n}-\tilde{P})^{-1}=\begin{bmatrix}\tfrac{n+1}{n}&\tfrac{2}{n}&\ldots&\tfrac{2}{n}\\ {\bf 0}_{n-1,1}&2{\bf I}_{n-1,n-1}\end{bmatrix},

so

({\bf I}_{n}-\tilde{P})^{-1}D=\begin{bmatrix}\tfrac{1}{n}&\tfrac{1}{n}&\ldots&\tfrac{1}{n}\\ {\bf 0}_{n-1,1}&{\bf I}_{n-1,n-1}\end{bmatrix}.

Lemma 1 tells us that \pi_{\hat{v}v}=\frac{1}{n} for all v, and thus the Markov entropic centrality of \hat{v} is \log_{2}(n), while C_{H}^{\infty}(v)=0 for v\neq\hat{v}. ∎
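The star-graph computation in this proof can be reproduced numerically. The sketch below builds the star with a self-loop at every node, uses p_{uu^{\prime}}=1/(d_{out}(u)+1), and recovers \pi_{\hat{v}v}=1/n, C_{H}^{\infty}(\hat{v})=\log_{2}(n) and zero centrality for the leaves:

```python
import numpy as np

# Numerical check of the star graph of Lemma 4: the centre has edges to
# the n-1 leaves plus a self-loop, each leaf keeps only a self-loop,
# and the absorption probability is p_{uu'} = 1/(d_out(u)+1).
n = 6
A = np.zeros((n, n))
A[0, :] = 1.0                        # centre: self-loop + n-1 leaves
np.fill_diagonal(A[1:, 1:], 1.0)     # leaves: self-loops only
d_out = A.sum(axis=1)
p_abs = 1.0 / (d_out + 1.0)
P = A / d_out[:, None]
P_tilde = (1.0 - p_abs)[:, None] * P
Pi = np.linalg.solve(np.eye(n) - P_tilde, np.diag(p_abs))
H = -np.where(Pi > 0, Pi * np.log2(Pi), 0.0).sum(axis=1)

assert np.allclose(Pi[0], 1.0 / n)   # pi_{centre, v} = 1/n for all v
assert np.isclose(H[0], np.log2(n))  # centre: log2(n)
assert np.allclose(H[1:], 0.0)       # leaves: centrality 0
```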

Definition 2

Given a graph G, label its vertices in increasing order with respect to their Markov entropic centralities, that is, C_{H}^{\infty}(v_{i})\leq C_{H}^{\infty}(v_{i+1}) for i=1,\ldots,n-1, with p_{uu^{\prime}}=\tfrac{1}{d_{out}(u)+1}. We define the centralization sequence of G to be the ordered sequence

\Big(\frac{\sum_{v\in V}\big(C_{H}^{\infty}(v_{i})-C_{H}^{\infty}(v)\big)}{n\log_{2}(n)}\Big)_{i=1,\ldots,n}.

It is an increasing sequence, with values ranging from -1 to 1. We already know that the maximum is 1 by Lemma 4. The minimum is achieved when v_{1} has centrality 0 and every other node has centrality \log_{2}(n)/n. This is the case for the graph on n vertices defined as follows: build a complete graph on n-1 vertices, that is, each of the n-1 vertices has n-1 outgoing edges (including one to itself); then add one additional vertex (v_{1}), and an outgoing edge from each of the previous n-1 vertices of the complete graph to this new vertex. The advantage of the centralization sequence over, say, the sequence of ordered centralities is that it captures for every node how central it is with respect to other nodes, normalized by a factor that takes into account the size of the graph, which allows comparisons across different graphs.
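Reading the i-th term of the definition as the gap between C_{H}^{\infty}(v_{i}) and the average centrality, normalized by \log_{2}(n) (the reading that reproduces a zero mean, as in the karate club example below), the sequence can be computed as follows (a sketch; the function name is ours):

```python
import numpy as np

def centralization_sequence(centralities):
    """Sketch of the centralization sequence of Definition 2.

    Assumes the reading sum_v (C(v_i) - C(v)) / (n log2 n), i.e.
    (C(v_i) - mean(C)) / log2(n), with values sorted increasingly.
    """
    c = np.sort(np.asarray(centralities, dtype=float))
    n = c.size
    return (c - c.mean()) / np.log2(n)

# Toy example: four synthetic centrality values.
seq = centralization_sequence([0.0, 1.0, 1.0, 2.0])
assert np.allclose(seq, [-0.5, 0.0, 0.0, 0.5])
assert np.isclose(seq.mean(), 0.0)   # the sequence is centred at 0
```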

2.2 A Markov Entropic Centrality Analysis of the Karate Club Network

Figure 2: The karate club network: we show in Figure 3 the path and Markov entropic centralities (for different values of D and t) for the nodes 1, 5, 12, 29, 33 and 34. Their degrees are respectively 16, 3, 1, 3, 12, 17.

Consider the karate club network [38] (used as an example in [23]) shown in Figure 2. We use this small social network comprising 34 members (nodes) as a toy example to illustrate and validate some of the ideas explored in this paper. The 78 edges represent the interactions between members outside the club, which eventually led to a split of the club into two, and are used to predict which members will join which group. This is an undirected unweighted graph, which is treated as a particular case of directed graph (d_{out}(u) is the degree of u). Let P denote the transition matrix such that P_{uv}=\frac{1}{d_{out}(u)} for every u\in V and every neighbor v of u. The work [23] explores the choice of D (and t) in the context of clustering. Here, we start by investigating the role of D in terms of the resulting Markov entropic centrality, for t finite (t=1,\ldots,6, since the karate club network has a diameter of 5) and asymptotic (using Lemma 1).

Figure 3: The Markov entropic centrality C_{H}^{t}(u) for u=1,34,33,29,12,5 (from Figure 2) for t=1,\ldots,6: on the upper left, D=0.001{\bf I}_{34}, on the upper right, D=\tfrac{1}{5}{\bf I}_{34}, on the lower left, D=\tfrac{1}{2}{\bf I}_{34}, and D such that D_{uu}=\tfrac{1}{d_{out}(u)+1} on the lower right. The dots show the asymptotic values C_{H}^{\infty}(u).
Influence of D.

Figure 3 illustrates the Markov entropic centrality C_{H}^{t}(u) of the nodes u\in\{1,34,33,29,12,5\} (these specific nodes were studied explicitly in [23]) for different values of D. For D=a{\bf I}_{34} (where {\bf I}_{M} denotes the M\times M identity matrix) with a=0.001,0.2,0.5, we observe that C_{H}^{t}(u) is decreasing and flattening, for all nodes and for every value of t. (We cannot use D={\bf 0}, since this means no absorption probability; also the matrix ({\bf I}_{n}-\tilde{P}) in Lemma 1 would then have no reason to be invertible.) This is expected: when the probabilities at the auxiliary nodes increase as a constant, the overall uncertainty about the random walk reduces (as computed in (7)). Thus the higher the absorption probability, the greater the attenuation of the entropic centralities for all nodes. More precisely, (7) upper bounds C_{H}^{\infty}(u) by

0.53074+(1-a)\log_{2}(\tfrac{33}{1-a})\approx 5.5715,~4.8238,~3.5529\quad\text{for }a=0.001,~0.2,~0.5,

which is consistent with the numerical values obtained (\approx 4.8232, 4.1319, 2.9475).

For D such that D_{uu}=\tfrac{1}{d_{out}(u)+1} (shown on the lower right subfigure), the Markov entropic centralities are more separated than previously: indeed, a node with small degree then has a high absorption probability, which induces a large attenuation on its entropic centrality (as discussed after (7)). The net effect is a wide gulf in the centrality scores between nodes with low and high degrees.

Influence of t.

We notice how the centrality of a node is influenced over time by its neighbors. Consider for example the upper left corner figure for nodes 12 and 5. Node 12 starts (at t=1) with an entropic centrality significantly lower than node 5: indeed, node 5 has three neighbors (d_{out}(5)=2 neighbors plus the self-loop), more than node 12, which has only one (thus C_{H}^{1}(12)=1, including the self-loop). Yet node 12 reaches a “hub” (a node of high centrality itself), namely node 1, at t=2, and thus for t\geq 2, its entropic centrality grows, eventually ending up almost as high as that of node 1 itself. In contrast, even though node 5 reaches the same hub, it also belongs to a local community within the graph, inside which a significant amount of its influence (flow) stays confined. This explains why node 5 ends up having a lower entropic centrality than node 12 in particular. In the upper right and lower left plots, we have assigned a significant volume of the flow to be absorbed at the auxiliary nodes, which has the net effect of attenuating the absolute values of entropic centrality for all nodes, i.e., a downward shift and flattening. This happens for nodes 5, 12, 29 on the lower right plot, since \tfrac{1}{d_{out}(u)+1} is significant if the node has a low degree like 5, 12 and 29. Taken together with the initial gap between the centralities of nodes 5 and 12, we thus do not observe the overtake in these experiments, unlike in the case where the absorption probability was negligible.

Another less prominent behavior that we notice for node 34, and to a certain extent for node 1, is that there is a dip in their entropic centralities at the beginning, before they rise to their long term values. We hypothesize that this is because some of the immediate neighbors of these nodes form local groupings (for instance, 34's neighbor node 15 has only 33 and 34 as neighbors, and node 27 has only nodes 30 and 34 as neighbors), and thus after the initial dispersal of influence among a relatively large number of these neighbors (hence the initial high centrality), their influence gets concentrated within the local cluster containing the initiating node for a while (thus the dip), before reaching out more evenly to the rest of the network, leading to another rise in entropic centrality.

Centralization sequence.

The min and the max of the centralities are 3.0859 and 4.7216, while the mean and the median are 4.07636 and 3.9111. The min, median, mean and max of the centralization sequence as defined in Definition 2 are -0.19467, -0.03248, 0 and 0.12682. The values for the above studied nodes of interest are 0.1226 for node 1, 0.1241 for node 34, 0.10195 for node 33, 0.03485 for node 29, -0.17471 for node 12 and -0.06289 for node 5.

Hypothesis.

The above analysis suggests that changes in the Markov entropic centralities over time are indicative of local communities in the graph, with changes in gradient corresponding to traversal of boundaries from one community to another. Nodes with low centralization have a low centrality with respect to both other nodes in the graph and graphs of the same size, since they are likely to be either isolated or to belong to a small community (e.g. node 5). The reverse argument suggests that nodes with high centralization (e.g. nodes 1, 33 and 34) are likely to be either hubs or close to hubs. We will explore and exploit this observation to design a clustering algorithm in Section 4.

Comparison with known centralities.

Table I compares different centralities. In addition to the previously mentioned prominent centrality measures, we also consider load centrality [12] in our experiments, because it captures the fraction of traffic that flows through individual nodes (load), considering all pairwise communications to be routed through the corresponding shortest paths. The path and the asymptotic Markov centralities give the same ranking, which is similar to the ranking given by the degree centrality C_{D} (with some difference regarding nodes 5 and 12, explained by the above discussion). The betweenness and load centralities agree on their ranking, which differs from the other ones. These results are not surprising: for a small graph, the degree heavily influences short paths/walks, explaining the agreement among C_{H}, C_{H}^{\infty} and C_{D}, but it has little to do with the betweenness and the load. The agreement between betweenness and load centrality is expected, since the latter was proposed as a different interpretation of betweenness centrality [12], even though some minor differences have been identified subsequently [7].

node      C_{H}     C_{H}^{\infty}   C_{D}    C_{B}    C_{L}
node 34   4.83992   4.82504          0.5151   0.3040   0.2984
node 1    4.83041   4.81999          0.4848   0.4376   0.4346
node 33   4.76892   4.72539          0.3636   0.1452   0.1476
node 29   4.50092   4.34323          0.0909   0.0017   0.0017
node 5    4.07244   3.90674          0.0909   0.0006   0.0006
node 12   3.39469   3.26763          0.0303   0        0

TABLE I: Different centrality measures: C_{H} and C_{H}^{\infty} are the path (1) and asymptotic (2) Markov entropic centralities; C_{D}, C_{B}, C_{L} are respectively the degree, betweenness and load centralities.

3 Entropic Centrality for Weighted Graphs

Consider now a weighted directed graph G_{w}=(V,E), where the weight function w:E\rightarrow\mathbb{R}_{\geq 0} attaches a non-negative weight w(u,v) to every edge (u,v)\in E. For a node u\in V, let \mathcal{N}_{u}=\{v\in V,~(u,v)\in E\} be the set of out-neighbors of u, d_{out}(u)=|\mathcal{N}_{u}| the out-degree of u, and d_{w,out}(u)=\sum_{v\in\mathcal{N}_{u}}w(u,v) the weighted out-degree of u.

A natural way to define a transition matrix P_{w} describing a random walk over G_{w} that takes the weight function w into account is to set P_{w,uv}=\frac{w(u,v)}{d_{w,out}(u)}. Now at t=1, a node u_{1} whose out-degree is 2 with outgoing edges of weight 1 is considered less important than a node u_{2} with out-degree 6 and all 6 outgoing edges of weight 1, because u_{1} contributes a probability \frac{1}{2} instead of \frac{1}{6} for u_{2}, therefore reducing the uncertainty from \log_{2}6\approx 2.585 to \log_{2}2=1. But if a node u_{3} has two outgoing edges, one with weight 2 and one with weight 3, then its entropic centrality is -\frac{2}{5}\log_{2}\frac{2}{5}-\frac{3}{5}\log_{2}\frac{3}{5}\approx 0.97<1. Thus the node u_{3} with weighted outgoing edges has a lower entropic centrality than the node u_{1} with unweighted outgoing edges. This is meaningful from an entropy viewpoint, since entropy captures uncertainty, and the uniform distribution over the outgoing edges gives the most uncertainty about the random walk. However, this may not be desired if the given scenario requires nodes with high weighted out-degree to be more central than others, e.g., when using the centrality measure as a proxy for determining influence. It is well known that centrality measures for unweighted graphs can be adapted to weighted graphs, but at the risk of changing their meaning, and one way to remedy this is the introduction of a weight parameter, see e.g. [28].
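The one-step entropies in this comparison can be computed directly; the sketch below (the helper name is ours) reproduces the values 1, \log_{2}6\approx 2.585 and \approx 0.97 for u_{1}, u_{2} and u_{3}:

```python
import numpy as np

def step_entropy(weights):
    """One-step entropy of a node whose outgoing edges carry these weights."""
    p = np.asarray(weights, dtype=float)
    p = p / p.sum()                      # transition probabilities P_w,uv
    return float(-(p * np.log2(p)).sum())

h1 = step_entropy([1, 1])      # u1: two unit-weight edges -> 1
h2 = step_entropy([1] * 6)     # u2: six unit-weight edges -> log2(6) ~ 2.585
h3 = step_entropy([2, 3])      # u3: weights 2 and 3       -> ~ 0.971
assert h3 < h1 < h2            # the weighted node u3 is the least central
```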

3.1 Weighted Markov Entropic Centrality

In order to capture the weights of the outgoing edges in the current framework of Markov entropic centrality, we introduce two tuning parameters: a conversion function \alpha:w(e)\rightarrow\alpha(w(e)) and a node weight function \mu:v\rightarrow\mu(v).

The conversion function \alpha:w(e)\rightarrow\alpha(w(e)) adjusts the transition matrix P_{\alpha(w)} such that P_{\alpha(w),uv}=\frac{\alpha(w(u,v))}{d_{\alpha(w),out}(u)} (with d_{\alpha(w),out}(u)=\sum_{v\in\mathcal{N}_{u}}\alpha(w(u,v))), depending on the importance that weights are supposed to play compared to edges. The two obvious choices for \alpha are \alpha(w(e))=1 for all edges e, which treats the weighted graph as unweighted, and \alpha(w(e))=w(e), which considers the graph with its given weights. More generally, one could define \alpha(w(e))=w(e)^{\beta} for some parameter \beta. This formulation has the flexibility to give more or less importance to weights with respect to edges. It is enough here to consider the case where weighted edges are more important than unweighted ones.

The conversion function \alpha allows the weights to play a role in the transition matrix P_{\alpha(w)}, based on the scenario needs, but it does not address the issue of defining entropic centrality for weighted graphs. To do so, we need a tuning parameter (e.g., [28]) within the definition of entropic centrality that maintains the semantics and meaning of the notion of entropy. We use the node weight function \mu:v\rightarrow\mu(v) and propose the notion of weighted Markov entropic centrality, which is inspired by the concept of weighted entropy [13].

Definition 3

The weighted Markov entropic centrality C_{\alpha(w),H}^{t}(u) of a node u at time t is defined to be

-\sum_{v\in V}\mu(v)(\tilde{p}_{\alpha(w),uv}^{(t)}+p_{uv^{\prime}}^{(t)})\log_{2}(\tilde{p}_{\alpha(w),uv}^{(t)}+p_{uv^{\prime}}^{(t)}), \qquad (8)

where \tilde{p}_{\alpha(w),uv}^{(t)} is the probability to reach v at time t from u, for v in V, taking into account the weights \alpha(w(e)) for every edge in E. The auxiliary nodes defined for the unweighted case are still present, and p_{uv^{\prime}}^{(t)} is the probability to reach an auxiliary node v^{\prime} from u at time t, which depends on the choice of the absorption probability matrix D, as in the unweighted case. The probabilities \tilde{p}_{\alpha(w),uv}^{(t)}+p_{uv^{\prime}}^{(t)} are obtained from the matrix \hat{P}^{t}_{\alpha(w)} (use P_{\alpha(w)} instead of P in Lemma 1).

Before discussing the choice of \mu(v), we recall that when t\rightarrow\infty, as in the unweighted case, the term in column v of \hat{P}^{t}_{\alpha(w)} becomes 0, and we are left with the term in column v^{\prime}, of the form

\pi_{\alpha(w),uv}:=\begin{cases}\sum_{t\geq 1}\tilde{p}_{\alpha(w),uv}^{(t)}p_{vv^{\prime}}&u\neq v\\ (\sum_{t\geq 1}\tilde{p}_{\alpha(w),uv}^{(t)}+1)p_{vv^{\prime}}&u=v,\end{cases}

asymptotically yielding the entropic centrality

C_{\alpha(w),H}^{\infty}(u)=-\sum_{v\in V}\mu(v)\pi_{\alpha(w),uv}\log_{2}(\pi_{\alpha(w),uv}).

Lemma 3 still holds; therefore, similarly to the unweighted case, if the probability of absorption is p_{uu^{\prime}}=a, then repeating the arguments leading to (7) yields

\sum_{w}\mu(w)\pi_{uw}\log_{2}(\tfrac{1}{\pi_{uw}})=\mu(u)\pi_{uu}\log_{2}(\tfrac{1}{\pi_{uu}})+\sum_{w\neq u}\mu(w)\pi_{uw}\log_{2}(\tfrac{1}{\pi_{uw}})\leq\max_{v\in V}\mu(v)\Big(0.53074+(1-a)\log_{2}(\tfrac{n-1}{1-a})\Big), \qquad (9)

which shows that while \max_{v\in V}\mu(v) may increase the overall centrality, it remains true, as in the unweighted case, that increasing a reduces the overall centrality. Therefore we will continue to use p_{uu^{\prime}}=\tfrac{1}{d_{out}(u)+1} as absorption probability.

The computation of C_{\alpha(w),H}^{t}(u) involves a sum over all nodes v\in V, including v=u, and the probability to go from u to every v. The weight \mu(v) captures the influence of the reached node v: if we reach influential nodes v of weight \mu(v) in t time steps, then u should be more central than if the only nodes we reach from it have no influence themselves. We define \mu(v)=\big(\tfrac{d_{w,out}(v)}{d_{out}(v)}\big)^{\gamma} if d_{out}(v)\neq 0. When d_{out}(v)=0, then d_{w,out}(v)=0 and we set \mu(v)=1. The weighted Markov entropic centrality (8) gives an influence which is proportional to the ratio \big(\tfrac{d_{w,out}(v)}{d_{out}(v)}\big)^{\gamma}, while \gamma gives a way to amplify or reduce this influence.

If all weights are 1, then the ratio simplifies to 1, and we are back to the unweighted definition of Markov entropic centrality (as also happens with \gamma=0). Other possible choices for \mu(v) could be explored, such as \mu(v)=\tfrac{\log_{2}d_{w,out}(v)}{\log_{2}d_{out}(v)}. Indeed, \log_{2}d_{w,out}(v) would be the influence that v would have had over its neighbors in terms of entropic centrality if it had d_{w,out}(v) neighbors of weight 1, while \log_{2}d_{out}(v) is the entropic centrality over the neighbors of v, ignoring the weights.
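A sketch of the asymptotic weighted Markov entropic centrality combining \alpha and \mu is given below (the function name is ours; it assumes the absorption probability p_{uu^{\prime}}=1/(d_{out}(u)+1) adopted above, with \mu(v)=(d_{w,out}(v)/d_{out}(v))^{\gamma}):

```python
import numpy as np

def weighted_entropic_centrality(W, alpha=lambda w: w, gamma=1.0):
    """Sketch of the asymptotic weighted Markov entropic centrality (8).

    W: n x n matrix of edge weights (0 = no edge). alpha is applied
    entrywise to the weights; mu(v) = (d_{w,out}(v)/d_out(v))^gamma,
    with mu(v) = 1 when d_out(v) = 0. Absorption: p_{uu'} = 1/(d_out(u)+1).
    """
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    d_out = (W > 0).sum(axis=1).astype(float)   # unweighted out-degree
    dw_out = W.sum(axis=1)                      # weighted out-degree
    Aw = np.where(W > 0, alpha(W), 0.0)         # converted weights alpha(w)
    row = Aw.sum(axis=1)
    P = np.divide(Aw, row[:, None], out=np.zeros_like(Aw),
                  where=row[:, None] > 0)       # P_{alpha(w)}
    p_abs = 1.0 / (d_out + 1.0)
    P_tilde = (1.0 - p_abs)[:, None] * P
    Pi = np.linalg.solve(np.eye(n) - P_tilde, np.diag(p_abs))
    mu = np.where(d_out > 0, (dw_out / np.maximum(d_out, 1.0)) ** gamma, 1.0)
    # Weighted entropy of each row, as in (8) taken asymptotically.
    return -(mu[None, :] * np.where(Pi > 0, Pi * np.log2(Pi), 0.0)).sum(axis=1)
```

When all weights equal 1 the ratio d_{w,out}/d_{out} is 1, so the value of \gamma is irrelevant and the unweighted centrality is recovered, consistently with the remark above.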

If the weights are normalized such that the lowest weight is mapped to 1, the entropic centrality values obtained with a non-trivial function \mu(v) will necessarily be at least those obtained with \mu(v)=1.

The centralization as computed in Lemma 4 can be generalized to the weighted case. In the unweighted case, the comparison was done among graphs with the same number of vertices; in the weighted case, it is done among graphs with the same number of vertices and given weights \mu(v). A graph giving the maximum centrality is again a star graph, with center v_{1}, assuming that the leaves have a (possibly weighted) self-loop, in which case d_{out}(v_{i})=1 for i\neq 1 and \mu(v_{i}) is any weight. The centrality of the leaves is zero, but that of v_{1} is \sum_{i=1}^{n}\mu(v_{i})p_{i}\log_{2}(1/p_{i}). Maximizing this quantity over the p_{i} does not have a closed-form expression, but loose bounds can be computed: \tfrac{1}{n}\log_{2}(n)\sum_{i=1}^{n}\mu(v_{i})\leq\max_{p_{1},\ldots,p_{n}}\sum_{i=1}^{n}\mu(v_{i})p_{i}\log_{2}(1/p_{i})\leq 0.53074\sum_{i=1}^{n}\mu(v_{i}) (see before (7) for a bound on p\log_{2}(1/p)).

3.2 A Weighted Markov Entropic Centrality Analysis of a Cocaine Dealing Network

We consider the cocaine dealing network [8], a small directed weighted graph coming from an investigation into a large cocaine trafficking operation in New York City. It involves 28 persons (nodes) and 40 directed weighted edges, which indicate communication exchanges obtained from police wiretapping.

Figure 4: The cocaine dealing network [8]: edge thickness is quantized according to the edge weights. Figure 5 shows the (weighted) Markov entropic centrality of the nodes 2, 27, 20 and 14.

The (weighted) Markov entropic centrality depends on the choice of the absorption matrix $D$; we thus start by looking at how a change in $D$ influences the entropic centralities. We keep the same choices for $D$ as for the unweighted case, namely $D=0.001{\bf I}_{28}$, $D=0.2{\bf I}_{28}$, $D=0.5{\bf I}_{28}$ and $D$ such that $D_{uu}=\tfrac{1}{d_{w,out}(u)+1}$; the latter is consistent with the unweighted case, since when $w(e)=1$ for all edges, $d_{w,out}(u)=d_{out}(u)$ for all nodes.

Figure 5: The Markov entropic centrality $C_{\alpha(w),H}^{t}(u)$ for $u=27,14$ (from Figure 4) and $t=1,\ldots,5$: on the upper left, $D=0.001{\bf I}_{28}$; on the upper right, $D=0.2{\bf I}_{28}$; on the lower left, $D=0.5{\bf I}_{28}$; and $D$ with $D_{uu}=\tfrac{1}{d_{w,out}(u)+1}$ on the lower right.

In Figure 5, we plot the (weighted) Markov entropic centrality $C_{\alpha(w),H}^{t}(u)$ for $u=27,14$ and $t=1,\ldots,5$, for the 4 choices of absorption probabilities $D$. For each of $u=27,14$, 4 variations of Markov entropic centralities are considered: $\alpha(w)=1$ and $\mu(v)=1$ (straight lines), corresponding to the unweighted case; $\alpha(w)=1$ and $\mu(v)=\tfrac{d_{w,out}(v)}{d_{out}(v)}$ (long dash lines), for the case where the transition matrix $P$ is used; $\alpha(w)=w$ and $\mu(v)=1$ (short dash lines), showing how centrality is computed based purely on $P_{w}$; and finally $\alpha(w)=w$ with $\mu(v)=\tfrac{d_{w,out}(v)}{d_{out}(v)}$ (dotted lines), for which $P_{w}$ is used as the transition matrix, together with the weighted entropic centrality. Nodes 27 and 14 are different in nature: node 14 is a hub, with high degree compared to the rest of the nodes, while node 27 has only one incoming edge and two outgoing edges. They are chosen as representatives of nodes of respectively high and low degree (even though degree is not the only factor contributing to node centrality, it plays an important role for small values of $t$). In each of the 4 plots, the 4 upper lines characterize the behavior of node 14, and the 4 lower lines that of node 27. We observe that the behavior of the entropic centralities when $\mu(v)=1$ is similar to what was already observed for the karate club network: for $D=0.2{\bf I}_{28}$ and $D=0.5{\bf I}_{28}$, the centralities are flattened because the path uncertainty is reduced as the absorption probabilities increase (as shown in (9)), while for $D$ such that $D_{uu}=\tfrac{1}{d_{w,out}(u)+1}$, the gap among the centralities is (slightly) increasing. When $\mu(v)=\tfrac{d_{w,out}(v)}{d_{out}(v)}$, both centralities are amplified.
Since node 14 has many outgoing edges, with weights including 10, 11, 14, 18 and 19, the introduction of $\mu(v)$ creates a higher amplification for node 14 than for node 27, whose edge weights are all less than 5. Zooming in at time $t=2$ for node 14 and $D=0.001{\bf I}_{28}$, we notice that the short dashed line has a peak before going down. This is explained by the fact that when $\alpha(w)=w$ and $\mu(v)=1$, the entropic centrality is that of a node whose degree is amplified by the weights, thus creating an initial jump in the entropy. However, at the next time step, several of the reached nodes have in turn very few (or no) neighbors and edges of low weights, which leads to a saturation effect. This behavior is less prominent for the other choices of $D$, where the absorption probability is high and hence the effect of the edge weights is attenuated.

We then focus on the case where $D_{uu}=\tfrac{1}{d_{w,out}(u)+1}$, which depends only on the network itself, unlike the other choices of $D$, which are parameters whose range would have to be explored one by one. With this choice of $D$, we consider the (weighted) Markov entropic centrality for $u=2,27,20$, as displayed in Figure 6. These nodes have outgoing edges with weights $1,1,1$ for $u=2$, weights $4,3,1$ for $u=27$, and weights $2,1,1,1$ for $u=20$. They are chosen because they have similar out-degrees but different weighted out-degrees. The same 4 scenarios as above are considered. For node 2, its edge to node 10 leads to a dead end, its edge to node 5 leads only to node 23 through an edge of weight 1, and its last edge goes to node 25, which has no further connection either; therefore the 4 centralities are actually of the same quantity, and all 4 lines coincide. Node 27 has the same out-degree as node 2, so in the unweighted case it starts at the same centrality as node 2. However, it is even less influential, since none of its own neighbors have neighbors. In the 3rd scenario, its entropic centrality is even lower than in the unweighted case, which is expected, since in this case the walk distribution is not uniform anymore. In the two cases where $\mu(v)=\tfrac{d_{w,out}(v)}{d_{out}(v)}$, we see a jump in the centrality, explained by the weights of its outgoing edges, which are higher than those of node 2. However, the centrality of node 27 remains below that of node 20, whose weighted out-degree is less than that of node 27, but whose out-degree is higher.

Figure 6: The Markov entropic centrality $C_{\alpha(w),H}^{t}(u)$ with $D_{uu}=\tfrac{1}{d_{w,out}(u)+1}$, for $u=2,27,20$ (from Figure 4), for $t=1,\ldots,5$ and for $(\alpha(w),\mu(v))=(1,1)$ (straight lines), $(1,\tfrac{d_{w,out}(v)}{d_{out}(v)})$ (long dash lines), $(w,1)$ (short dash lines) and $(w,\tfrac{d_{w,out}(v)}{d_{out}(v)})$ (dotted lines).

3.3 Interpreting Entropic Centrality with a Bitcoin Subgraph

We next consider a small subgraph extracted from the bitcoin network, comprising 178 nodes and 250 edges. Nodes are bitcoin addresses, and there is an edge from one node to another if a bitcoin payment has been made between them. Since the proposed Markov entropic centrality measure is suitable for undirected, directed and weighted graphs, we look at the chosen bitcoin subgraph as an undirected unweighted graph by ignoring the direction of transactions, as a directed unweighted graph by just considering whether any transaction exists, and as a directed weighted graph (with $\alpha(w(e))=w(e)$, $\mu(v)=\tfrac{d_{w,out}(v)}{d_{out}(v)}$) to capture the effect of transaction amounts.

In Figure 7 we show the three variations, with nodes colored according to their Markov entropic centrality at $t=\infty$ (asymptotic); the darker the node color, the higher the Markov entropic centrality. In subfigure 7a, one hub stands apart with the highest entropic centrality: it is a node which is highly connected to other nodes. If we look at the same node in subfigure 7b, we see that it is not central anymore. This is because in the directed graph, this node actually receives bitcoin amounts from many other nodes, so as far as sending money is concerned, it has very little actual influence (in the same graph with edge directions reversed, this node would have the highest centrality). On the other hand, nodes that were not so prominent in the undirected graph now appear as important. The graph in subfigure 7b highlights nodes which carry out many bitcoin transactions. Since a node may carry out several transactions of small bitcoin amounts, we may want a node that makes few but high-amount transactions to be more visible, which is illustrated in subfigure 7c. It turns out that for the subgraph we studied, high centrality nodes in the unweighted case remain high centrality nodes in the weighted case, meaning that nodes performing many bitcoin transactions are also those sending high-amount transactions.

Figure 7: Asymptotic Markov entropic centralities for a 178 node subgraph of the bitcoin network: undirected and unweighted on the left, directed and unweighted in the middle, directed and weighted on the right. The top 10%, next 20%, 30% and remaining 40% nodes in terms of their entropic centrality are shown in progressively lighter colors.

For the sake of completeness, we provide Kendall-$\tau$ rank correlation coefficients among different variations of the (weighted) Markov entropic centralities for the directed graph, for $t=\infty$: for $(\alpha(w(e))=1,\mu(v)=1)$ (shown in subfigure 7b), for $(\alpha(w(e))=w(e),\mu(v)=\tfrac{d_{w,out}(v)}{d_{out}(v)})$ (shown in subfigure 7c), but also for $(\alpha(w(e))=w(e),\mu(v)=1)$ and for $(\alpha(w(e))=1,\mu(v)=\tfrac{d_{w,out}(v)}{d_{out}(v)})$. The out-degree centrality is also tested. The Kendall-$\tau$ coefficient used here measures rank disagreement: the higher the coefficient, the higher the pairwise mismatch of ranks across two lists. Results are shown in Table II. Above the diagonal, coefficients are obtained using all 178 nodes; below the diagonal, only 67 nodes are used, namely the union of the top 20% nodes for each of the 5 aforementioned centrality measures. In the given graph, many (peripheral) nodes have the same (low) entropic centrality, so when all nodes are considered, the ranking correlation is misleadingly low. Since high centrality nodes are often of interest, ranking variation among the top nodes is pertinent, and we observe that (i) out-degree is not a good proxy for most entropic centrality variants, and (ii) the different variations yield significantly distinct sets of high centrality nodes, corroborating the need for our flexible entropic centrality framework.

| $\alpha,\mu$ | $1,1$ | $d_{out}$ | $w,1$ | $1,\tfrac{d_{w,out}(v)}{d_{out}(v)}$ | $w,\tfrac{d_{w,out}(v)}{d_{out}(v)}$ |
| $1,1$ | 0 | 0.159 | 0.075 | 0.084 | 0.086 |
| $d_{out}$ | 0.419 | 0 | 0.193 | 0.214 | 0.218 |
| $w,1$ | 0.315 | 0.60 | 0 | 0.120 | 0.050 |
| $1,\tfrac{d_{w,out}(v)}{d_{out}(v)}$ | 0.186 | 0.538 | 0.305 | 0 | 0.089 |
| $w,\tfrac{d_{w,out}(v)}{d_{out}(v)}$ | 0.346 | 0.671 | 0.097 | 0.283 | 0 |

TABLE II: Kendall-$\tau$ coefficients of 5 centrality measures: 4 of them are the Markov entropic centralities for different values of $\alpha(w(e))$ and $\mu(v)$; $d_{out}$ is the out-degree centrality. Above the diagonal, coefficients are obtained using all 178 nodes. Below the diagonal, only 67 nodes are used, namely the union of the top 20% nodes for each of these five centrality measures.
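Coefficients of this kind can be computed as a normalized Kendall distance (fraction of discordant pairs, so that identical rankings give 0, as on the diagonal of Table II). The paper's exact tie-handling convention is not specified; the sketch below counts only strictly discordant pairs:

```python
from itertools import combinations

def kendall_distance(a, b):
    """Normalized Kendall-tau distance between two score lists: the fraction
    of index pairs ranked in opposite order. Ties in either list are not
    counted as discordant (an assumption, not the paper's stated convention)."""
    assert len(a) == len(b)
    pairs = list(combinations(range(len(a)), 2))
    discordant = sum(1 for i, j in pairs
                     if (a[i] - a[j]) * (b[i] - b[j]) < 0)
    return discordant / len(pairs)

assert kendall_distance([1, 2, 3], [1, 2, 3]) == 0.0   # identical rankings
assert kendall_distance([1, 2, 3], [3, 2, 1]) == 1.0   # fully reversed
```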

4 Entropic Centrality Based Clustering

We next explore whether and how the local communities inferred from the gradients observed in the evolution of the entropic centralities as a function of $t$ (see Section 2) can be exploited to realize effective clustering.

The formulas (2) and (8) for the (weighted) Markov entropic centralities involve the sums of probabilities $p_{uv}^{(t)}+p_{uv^{\prime}}^{(t)}$ and $p_{\alpha(w),uv}^{(t)}+p_{\alpha(w),uv^{\prime}}^{(t)}$ respectively. We will use $\hat{P}$ to denote the matrix whose $u$th row contains, in column $v$, the coefficient $p_{uv}^{(t)}+p_{uv^{\prime}}^{(t)}$ in the unweighted case, and $p_{\alpha(w),uv}^{(t)}+p_{\alpha(w),uv^{\prime}}^{(t)}$ in the weighted case. Since the algorithm that we describe next uses $\hat{P}$ in the same manner, irrespectively of $t$ (though we will focus on the case $t=\infty$) and of whether there is a weight function, we keep the same notation $\hat{P}$.
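At $t=\infty$, $\hat{P}$ reduces to the matrix of absorption probabilities of the absorbing random walk, which admits a closed form. The sketch below assumes the transient part of the walk is $\tilde{P}=(I-D)P$ and that absorption at the start node is counted; the transition matrix is a hypothetical example, not data from the paper:

```python
import numpy as np

# Hypothetical row-stochastic transition matrix P on 3 nodes, and
# per-node absorption probabilities D = 0.2 I (one of the paper's choices).
P = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])
d = np.full(3, 0.2)

P_transient = (1 - d)[:, None] * P            # move without being absorbed
# Summing over all walk lengths, the probability to start at u and be
# absorbed at v is entry (u, v) of (I - P_transient)^{-1} D.
P_hat = np.linalg.inv(np.eye(3) - P_transient) @ np.diag(d)

# Each row is a probability distribution over absorption locations.
assert np.allclose(P_hat.sum(axis=1), 1.0)
```

By the symmetry of this toy graph, each row also assigns equal probability to the two non-starting nodes.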

The proposed algorithm works in two stages: first, we create local clusters around ‘query nodes’, and then we carry out a hierarchical agglomeration. The choice of query nodes in our approach is informed by the entropic centrality, which we describe first (Subsection 4.1), before elaborating on the algorithm (Subsection 4.2).

4.1 Query Node Selection

In the initial step of the algorithm, we look for a cluster around and inclusive of a specific query node, similarly to [4]. In [4] though, it was not possible a priori to determine suitable query nodes (e.g., a node could be at the boundary of two clusters, and thus yield the union of subsets of these clusters as a resultant cluster), and hence multiple constrained random walkers were deployed to increase the confinement probability of the random walks within a single cluster. In contrast, our choice of query nodes is informed by their Markov entropic centrality.

Let us first formalize some of the information captured by the notion of entropy.

Proposition 1

Given the probabilities $p_{1}\leq p_{2}\leq\ldots\leq p_{m}<p_{m+1}=\frac{1}{n}=\ldots=p_{m+s}<p_{m+s+1}\leq\ldots\leq p_{n}$, $p_{i}\geq 0$, $\sum_{i=1}^{n}p_{i}=1$, and $\gamma>1$ such that $\sum_{i=1}^{n}p_{i}\log_{2}(1/p_{i})=\frac{\log_{2}(n)}{\gamma}$, we have:

  1. $s\leq\tfrac{n}{\gamma}$, and $n-m-s\geq n\tfrac{\tau-1}{\tau}\tfrac{\gamma-1}{\gamma}$ for $\tau>1$,

  2. $\sum_{i=m+s+1}^{n}p_{i}>\tfrac{\tau-1}{\tau}\tfrac{\gamma-1}{\gamma}\iff\sum_{i=1}^{m}p_{i}<1-\frac{s}{n}-\tfrac{\tau-1}{\tau}\tfrac{\gamma-1}{\gamma}$.

Proof:

Since the $n$ probabilities are ordered increasingly, we write

$$\sum_{i=1}^{n}p_{i}\log_{2}(1/p_{i})=\sum_{i=1}^{m}p_{i}\log_{2}(1/p_{i})+\frac{s}{n}\log_{2}(n)+\sum_{i=m+s+1}^{n}p_{i}\log_{2}(1/p_{i}),$$

and $\sum_{i=1}^{n}p_{i}\log_{2}(1/p_{i})=\frac{\log_{2}(n)}{\gamma}$ for some $\gamma>1$ certainly implies $\frac{s}{n}\log_{2}(n)\leq\frac{\log_{2}(n)}{\gamma}$. Thus there must be $s\leq\tfrac{n}{\gamma}$ probabilities of the form $\tfrac{1}{n}$ (in particular, if $\gamma>n$ then $s=0$; likewise, when $\gamma>1$, $s\leq n-1$). In turn, the remaining $n-s$ probabilities, of which there are at least $\tfrac{n(\gamma-1)}{\gamma}$, are shared between $p_{1},\ldots,p_{m}$ and $p_{m+s+1},\ldots,p_{n}$, and they must sum up to $1-\tfrac{s}{n}=\tfrac{n-s}{n}$.

It is not possible that all $n-s$ probabilities belong to the group $p_{1},\ldots,p_{m}$ (i.e., are $<\frac{1}{n}$): if all of them were strictly less than $\frac{1}{n}$, the total probability would be at most $\frac{s}{n}+(n-s)p_{m}<\frac{s}{n}+\frac{n-s}{n}=1$, a contradiction (a similar argument shows that not all of them can be in the group $p_{m+s+1},\ldots,p_{n}$). More generally, this says that the deficiency from $1/n$ among the $p_{1},\ldots,p_{m}$ has to be compensated by the $p_{m+s+1},\ldots,p_{n}$:

$$\sum_{i=1}^{m}p_{i}=1-\frac{s}{n}-\sum_{i=m+s+1}^{n}p_{i},$$

where $0<\sum_{i=1}^{m}p_{i}<\frac{n-s}{n}$, and accordingly $\sum_{i=m+s+1}^{n}p_{i}$ varies from being strictly less than $1-\frac{s}{n}$ to strictly more than 0.

We can refine these ranges using the number of probabilities we are allowed in each sum. Suppose that among the $n-s$ probabilities that are left to be assigned, $\frac{1}{\tau}(n-s)$ are the probabilities $p_{1},\ldots,p_{m}$ for some $\tau>1$, and the rest, namely $\tfrac{\tau-1}{\tau}(n-s)\geq\tfrac{\tau-1}{\tau}\tfrac{n(\gamma-1)}{\gamma}$, are the probabilities $p_{m+s+1}\leq\ldots\leq p_{n}$.

With at least $\tfrac{\tau-1}{\tau}\tfrac{n(\gamma-1)}{\gamma}$ probabilities strictly more than $1/n$, we have:

$$\sum_{i=m+s+1}^{n}p_{i}>\tfrac{\tau-1}{\tau}\tfrac{\gamma-1}{\gamma}\iff\sum_{i=1}^{m}p_{i}<1-\frac{s}{n}-\tfrac{\tau-1}{\tau}\tfrac{\gamma-1}{\gamma}.$$
∎
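Proposition 1 can be sanity-checked numerically; the distribution below is an arbitrary illustration (not data from the paper), sorted increasingly with $s=2$ probabilities equal to $1/n$:

```python
import math

# Hypothetical sorted distribution over n = 8 outcomes; 1/n = 0.125.
p = [0.01, 0.02, 0.04, 0.05, 0.125, 0.125, 0.28, 0.35]
n = len(p)
assert math.isclose(sum(p), 1.0)

H = sum(pi * math.log2(1 / pi) for pi in p)   # entropy of the distribution
gamma = math.log2(n) / H                      # so that H = log2(n)/gamma
assert gamma > 1

s = sum(1 for pi in p if pi == 1 / n)         # probabilities equal to 1/n
m = sum(1 for pi in p if pi < 1 / n)          # probabilities below 1/n

# Item 1: s <= n/gamma.
assert s <= n / gamma

# The identity underlying item 2: the mass below 1/n compensates
# the mass above 1/n.
assert math.isclose(sum(p[:m]), 1 - s / n - sum(p[m + s:]))
```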

We apply this proposition to $\pi_{uv}$, for a given node $u$, and for $v$ ranging over the possible nodes in $V$. Let $\tau>1$ be a parameter that decides the proportion of probabilities less than $\frac{1}{|V|}$ (the larger $\tau$, the smaller the proportion), and let $\gamma>1$ be a parameter that characterizes how small the Markov entropic centrality of a node $u$ is (the larger $\gamma$, the lower the entropic centrality). The result says that given $\gamma$, there is a tension between the agglomeration of small probabilities on the one hand and the agglomeration of large probabilities on the other hand: the larger the agglomeration of large probabilities (as a function of $\tau$ and $\gamma$), the lower the agglomeration of small probabilities, and conversely. Furthermore, as $\gamma$ grows, it creates an agglomeration of large probabilities which is detached from that with small probabilities. This suggests that nodes with a low entropic centrality have local clusters where ties are relatively strong, while nodes tend to have a relatively high entropic centrality when they either are embedded within a large cluster, or when they are at the boundary of two clusters (and could arguably belong to either). Accordingly, we choose to start identifying local communities by choosing the low entropic centrality nodes as query nodes.

This is illustrated in Figure 8 for the 178 node Bitcoin subgraph, treated as unweighted and undirected. We consider a node with high centrality on the left-hand side (centrality $\approx 5.08297$), and one with low centrality on the right-hand side (centrality $\approx 1.73539$). By high and low, we mean that both belong to the top 10 nodes in terms of respectively high/low centrality. The histograms show the probability that a random walker is (asymptotically) absorbed at a node; the $y$-axis is in log scale. The bins are placed at 0, $\frac{1}{2|V|}$, $\frac{2}{|V|}$, and then at half the maximum probability (as observed across the two cases) and at the maximum probability. In the first bin, for both cases, there are more than 100 nodes at which the probability of the random walker being absorbed is negligible. As predicted by Proposition 1, for the node with high entropic centrality, the probabilities are less spread apart (and more of them are close to $1/|V|$) than for the low entropic centrality node, which furthermore has one high probability (between 0.3 and 0.6). Such higher probabilities of absorption at a few nodes in the graph are precisely what we use as a signal for a local community structure.

(a) A high centrality query node.
(b) A low centrality query node.
Figure 8: Histogram showing the number of nodes ($y$-axis in log scale) at which a random walker is absorbed for given (ranges of) values of the absorption probability ($x$-axis), for the 178 node Bitcoin subgraph, treated as unweighted and undirected.

This finding is consistent with the hypothesis formulated at the end of Section 2.2, namely that if there is no significant change in the entropic centrality over time, then the node belongs to a small community, while larger entropic centralities indicate nodes that are well connected (possibly indirectly) to many nodes, rather than being strongly embedded in a particular well defined community. In Figure 9, we show a scatter plot illustrating, again for the 178 node Bitcoin subgraph, the relation between the entropic centrality of a node and the highest probability of absorption at any node for a random walker starting at that node. We see that nodes with high centralities have highest absorption probabilities below 0.3, while low entropic centrality nodes have highest absorption probabilities of more than 0.4. For this subgraph, the centralization sequence has a minimum of $-0.27962$, a maximum of $0.40350$, a mean of 0 and a median of $0.00560$. The circled nodes are those with the highest centralization in red (centralization values more than 0.18), and those with the lowest centralization in blue (centralization values less than $-0.2$).

Figure 9: On the $x$-axis, the centrality of nodes; on the $y$-axis, the value of the highest probability of absorption at any node, for a random walker starting at that node (in the 178 node Bitcoin subgraph, treated as unweighted and undirected). The two vertical bars demarcate the top 30% and 40% of nodes in terms of entropic centrality. Circled nodes have highest/lowest centralization.

The Markov entropic centrality score thus acts as a good summary of the detailed probability distributions of absorption for a random walker starting at different query nodes. Specifically, using low entropic centrality nodes as query nodes yields meaningful local clusters, precisely by identifying the nodes at which the random walker is absorbed with high probability as constituting a local community. Such communities can then be further coalesced into larger (and fewer) clusters, or re-clustered into smaller (and more numerous) communities, as deemed suitable, in a hierarchical manner. We will see, in our experiments, what constitutes a suitably ‘low’ or ‘high’ entropic centrality; in particular, this depends on the distribution of the centrality scores. The distribution of the relative Markov centrality scores for a given graph is thus also a good indicator of whether the proposed approach would be effective for clustering that graph instance, and helps us reason about whether to use it. This will be illustrated in our experiments, in particular in Subsection 4.6.

4.2 Entropic Centrality Based Clustering Algorithms

Keeping in mind the above discussion, we describe the first stage (without the iterative hierarchical process) of clustering in Algorithms 1 and 2.

In Algorithm 1, we maintain $S_{cluster}$ as the current global view of clusters. We start from the lowest entropic centrality node as a query node, and repeat the process as long as there are nodes that do not already belong to some existing cluster (listed in $S_{cluster}$).

Algorithm 1 Probability distribution based graph clustering
1: procedure ProbDistClustering($G=(V,E),N,t$)
2: ▷ $N\ll|V|$ stands for the top-$N$ most central nodes
3:     Compute $\hat{P}$ and the entropic centrality of all $v\in V$.
4:     Assign $S_{HE}=\{$the top-$N$ entropic centrality nodes$\}$.
5:     ▷ Initialization
6:     Set $Q$: nodes in ascending order of entropic centrality.
7:     Set $S_{cluster}=\varnothing$. ▷ Current clusters
8:     ▷ Global clustering
9:     while $Q$ is not empty do
10:         Take the query node $v_{q}$ from $Q$’s head (remove it).
11:         ▷ Obtain query node centric local cluster
12:         Apply a(ny) clustering algorithm on the row $(\hat{P}_{v_{q},v})_{v\in V}$ of $\hat{P}$ to form the set $S_{ini,v_{q}}$ of clusters.
13:         Choose $S_{v_{q}}$ from $S_{ini,v_{q}}$ with the highest average transition probability (include $v_{q}$). ▷ $v_{q}$’s raw cluster
14:         ▷ Prune the raw cluster $S_{v_{q}}$ using Algorithm 2.
15:         ProcessRawCluster($S_{v_{q}},S_{HE},S_{cluster}$)
16:         $\forall v\in S_{v_{q}}$, remove $v$ from $Q$.
17:         ▷ Integrate the local result with the global view.
18:         if $S_{v_{q}}$ intersects with any cluster(s) in $S_{cluster}$ then
19:             Merge them (update $S_{cluster}$ accordingly).
20:         else Add $S_{v_{q}}$ to $S_{cluster}$.
21:     return $S_{cluster}$

We consider the transition probabilities from a query node $v_{q}$ to all the other nodes as per $\hat{P}$, and carry out a clustering of these (one-dimensional, scalar) probability values. Proposition 1 shows the existence of (up to) “three clusters”, formed by probabilities of values around $1/|V|$, and of values away from $1/|V|$, either by being small or large enough. How clearly defined these clusters are depends on $\gamma$: if $\gamma$ is small, close to 1, the probability values can still be close to uniform; at the other extreme, if $\gamma$ is large, there may be very few or no values around $1/|V|$, resulting in mostly two clusters. In our implementation, we use the Python scikit-learn agglomerative clustering to look for this initial set of (up to) three clusters $S_{ini,v_{q}}$. Among these clusters, we choose the cluster with the highest average transition probability from $v_{q}$. We define $S_{v_{q}}$ to be the nodes corresponding to this cluster, along with $v_{q}$ itself, since (i) the comprising nodes have similar probabilities for random walks originating from $v_{q}$ to end up there (this follows from the clustering of the probability values), and crucially, (ii) this being the largest (in expectation) such probability, these nodes are considered to comprise the immediate local community.
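The paper uses scikit-learn's agglomerative clustering for this step; as a dependency-free stand-in, the sketch below splits the one-dimensional probability values at their two largest gaps to obtain up to three groups, and keeps the group with the highest mean (the function name and data structures are our assumptions):

```python
def local_cluster(vq, row):
    """Form the query node's raw cluster from a row of P-hat.

    vq: query node; row: dict mapping node -> absorption probability from vq.
    Splits the sorted 1-D values at the two largest consecutive gaps
    (a simple stand-in for agglomerative clustering into three groups),
    then keeps the group with the highest average probability, plus vq.
    """
    items = sorted(row.items(), key=lambda kv: kv[1])
    vals = [p for _, p in items]
    # indices of the two largest consecutive gaps
    gaps = sorted(range(len(vals) - 1), key=lambda i: vals[i + 1] - vals[i])[-2:]
    cuts = sorted(i + 1 for i in gaps)
    groups, start = [], 0
    for c in cuts + [len(items)]:
        if c > start:
            groups.append(items[start:c])
        start = c
    best = max(groups, key=lambda g: sum(p for _, p in g) / len(g))
    return {v for v, _ in best} | {vq}

# Two nodes (3 and 4) clearly stand out with high absorption probability:
row = {1: 0.010, 2: 0.012, 5: 0.12, 3: 0.30, 4: 0.31}
assert local_cluster(0, row) == {0, 3, 4}
```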

We consider the absorption probabilities for a random walker starting at $v_{q}$ to be absorbed at any of the nodes in $S_{v_{q}}$, and define the minimum of these values as $\sigma$, i.e., $\sigma=\min_{v\in S_{v_{q}}}\hat{P}_{v_{q},v}$. Thus, $\sigma$ can be understood as an effective threshold (deduced a posteriori) above which the probability of being absorbed in the local cluster of $v_{q}$ is high enough. Proposition 2 and its corollary below show that if $v$ belongs to the local cluster of $v_{q}$, and $w$ should also belong to the local cluster of $v$, then, provided that the absorption probability $p_{vv^{\prime}}$ at $v$ is not too large ($p_{vv^{\prime}}\leq\sigma$), $w$ will also belong to $S_{v_{q}}$, together with $v$.

Proposition 2

The probability $\pi_{uw}=\sum_{t\geq 1}\tilde{p}_{uw}^{(t)}p_{ww^{\prime}}$ to start at $u$ and be absorbed at $w\neq u$ over time is lower bounded by
$$\tilde{p}_{uw}p_{ww^{\prime}}+\pi_{uv}\frac{\pi_{vw}}{p_{vv^{\prime}}}.$$
Proof:

We have that $\pi_{uw}=\sum_{t\geq 1}\tilde{p}_{uw}^{(t)}p_{ww^{\prime}}$ is the probability to start at $u$ and be absorbed at $w$ over time. We start again from (5):

$$\begin{aligned}
\pi_{uw} &= \Big(\tilde{p}_{uw}+\sum_{t\geq 2}\sum_{i=1}^{t-1}\tilde{p}_{uv}^{(i)}\tilde{p}_{vw}^{(t-i)}+\sum_{t\geq 2}\sum_{i=1}^{t-1}\sum_{y\neq v}\tilde{p}_{uy}^{(i)}\tilde{p}_{yw}^{(t-i)}\Big)p_{ww^{\prime}}\\
&\geq \Big(\tilde{p}_{uw}+\sum_{t\geq 2}\sum_{i=1}^{t-1}\tilde{p}_{uv}^{(i)}\tilde{p}_{vw}^{(t-i)}\Big)p_{ww^{\prime}}\\
&= \Big(\tilde{p}_{uw}+\sum_{s\geq 1}\tilde{p}_{vw}^{(s)}\sum_{i\geq 1}\tilde{p}_{uv}^{(i)}\Big)p_{ww^{\prime}}=\tilde{p}_{uw}p_{ww^{\prime}}+\frac{\pi_{uv}}{p_{vv^{\prime}}}\pi_{vw},
\end{aligned}$$

since we recognize a Cauchy product and invoke Mertens' theorem, with $\sum_{i\geq 1}\tilde{p}_{uv}^{(i)}=\pi_{uv}/p_{vv^{\prime}}$ and $\sum_{s\geq 1}\tilde{p}_{vw}^{(s)}=\pi_{vw}/p_{ww^{\prime}}$, both series being absolutely convergent. ∎

Corollary 2

Given that the probability $\pi_{uv}$ to start at $u$ and be absorbed at $v$ is more than a threshold $\sigma$, and that the probability to start at $v$ and be absorbed at $w$ is also more than $\sigma$, then if $p_{vv^{\prime}}\leq\sigma$, we have $\pi_{uw}\geq\sigma$.

Proof:

Suppose $\pi_{uv},\pi_{vw}\geq\sigma$; then by the above proposition,

$$\pi_{uw}\geq\tilde{p}_{uw}p_{ww^{\prime}}+\pi_{uv}\frac{\pi_{vw}}{p_{vv^{\prime}}}\geq\pi_{uv}\frac{\pi_{vw}}{p_{vv^{\prime}}}\geq\frac{\sigma^{2}}{p_{vv^{\prime}}},$$

so $p_{vv^{\prime}}\leq\sigma$ implies $\pi_{uw}\geq\sigma$. Note that this reasoning is iterative: if we now consider that from $u$ we got absorbed at $w$, but also from $w$ we got absorbed at $y$, with $\pi_{wy}\geq\sigma$, then

$$\pi_{uy}\geq\tilde{p}_{uy}p_{yy^{\prime}}+\frac{\sigma^{2}}{p_{ww^{\prime}}},$$

and again $p_{ww^{\prime}}\leq\sigma$ implies $\pi_{uy}\geq\sigma$. ∎

If the query node $v_{q}$ is a low entropic centrality node, it is expected that, once a cluster $S_{v_{q}}$ is formed, the nodes belonging to $S_{v_{q}}$ are unlikely to be high centrality nodes themselves (otherwise $v_{q}$ would likely have inherited their influence and would not be a low centrality node itself). However, if the relative difference between what are considered low and high entropic centralities is not large, then several well connected nodes may get clubbed together in the clustering process, further bringing in many other nodes and coalescing a large group of nodes in the early stage of the clustering. Likewise, if the query node $v_{q}$ belongs to a pre-designated group of high entropic centrality nodes $S_{HE}$, then there is a risk that it may inadvertently merge multiple clusters which one may want to keep separate. These cases motivate the addition of the pruning procedure described in Algorithm 2, which works as follows.

Algorithm 2 Pruning of the raw cluster $S_{v_{q}}$.
1: procedure ProcessRawCluster($S_{v_{q}},S_{HE},S_{cluster}$)
2: ▷ most central node list $S_{HE}$, current cluster list $S_{cluster}$
3:     if $v_{q}\in S_{HE}$ then
4:         Set $\{C_{1},\ldots,C_{r}\}=\arg\max_{C\in S_{cluster}}|C\cap S_{v_{q}}|$.
5:         if $r>1$ then $C^{\prime}=\mathrm{rand}\{C_{1},\ldots,C_{r}\}$
6:         else $C^{\prime}=C_{1}$
7:         $S_{v_{q}}=S_{v_{q}}\backslash(\cup_{C\in S_{cluster}}(C\cap S_{v_{q}})\backslash C^{\prime})$.
8:     if $|S_{v_{q}}\cap(S_{HE}\backslash v_{q})|>1$ then
9:     ▷ $S_{v_{q}}$ contains multiple high entropy nodes besides $v_{q}$
10:         Among the nodes in $S_{HE}\backslash v_{q}$, keep only the node(s) which have the highest transition probability from $v_{q}$.
11:         ▷ nodes not in $S_{HE}$ are not affected
12:     return $S_{v_{q}}$

If the query node $v_{q}$ is a high entropic centrality node, we check the intersection of $S_{v_{q}}$ against the existing clusters in $S_{cluster}$. If there are non-trivial intersections, and yet no unique cluster with the largest intersection, we retain only one of the largest intersecting clusters, chosen randomly, since otherwise we risk merging clusters which ought to stay distinct. Otherwise, we retain in $S_{v_{q}}$ the nodes from the largest intersection, as well as the nodes that were in no intersection, but discard the rest. Irrespective of the query node, if there are multiple pre-designated high entropic centrality nodes in $S_{v_{q}}$ (other than $v_{q}$), we retain among these only the ones to which the transition probability from $v_{q}$ is highest. This pruned list is returned to Algorithm 1, where it is checked against the existing clusters in $S_{cluster}$: if there is any intersection, they are merged; otherwise, $S_{v_{q}}$ is added as a new cluster to $S_{cluster}$. Note that the pruning mechanism introduces an element of randomization in our algorithm. As such, all the reported results in this paper are based on ten experiment runs. For all but one of the graphs used in our experiments, the conditional statement which introduces the randomization was in fact never triggered, and thus the results are produced consistently. Only some variations of the 178 node Bitcoin subgraph (results shown in Figure 12) triggered the conditional statement: when the graph was treated as undirected and unweighted, the conditional statement was triggered but nevertheless resulted in the same clusters across the ten experiment runs; when treated as directed but unweighted, the randomization yielded distinct but very similar cluster results (F-score 0.982 among the distinct results), and one of these result instances is shown.
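A schematic Python rendering of this pruning step (Algorithm 2), using sets for clusters and a dictionary for the relevant row of $\hat{P}$; the data structures and function signature are our assumptions, not the authors' implementation:

```python
import random

def process_raw_cluster(vq, S_vq, S_HE, S_cluster, row):
    """Prune the raw cluster S_vq (sketch of Algorithm 2).

    vq: query node; S_vq: raw cluster (set of nodes);
    S_HE: top-N high entropic centrality nodes (set);
    S_cluster: existing clusters (list of sets);
    row: dict node -> transition probability from vq (a row of P-hat).
    """
    if vq in S_HE and S_cluster:
        # Existing cluster(s) with the largest overlap with S_vq.
        best = max(len(C & S_vq) for C in S_cluster)
        if best > 0:
            candidates = [C for C in S_cluster if len(C & S_vq) == best]
            C_star = random.choice(candidates)   # break ties randomly
            # Keep the largest intersection plus nodes in no existing cluster.
            overlapped = set().union(*(C & S_vq for C in S_cluster))
            S_vq = (S_vq - overlapped) | (C_star & S_vq)
    others = (S_vq & S_HE) - {vq}
    if len(others) > 1:
        # Keep only the high-centrality node(s) with the highest
        # transition probability from vq; other nodes are unaffected.
        pmax = max(row[v] for v in others)
        S_vq = (S_vq - others) | {v for v in others if row[v] == pmax}
    return S_vq

S_HE = {0, 1, 2}
S_cluster = [{3, 4}, {8, 9}]
row = {1: 0.3, 2: 0.1, 3: 0.05, 4: 0.05, 7: 0.2, 8: 0.02}

# vq not in S_HE; of the two other high-centrality nodes, only node 1 is kept:
assert process_raw_cluster(5, {5, 1, 2, 7}, S_HE, S_cluster, row) == {5, 1, 7}
# vq in S_HE: keep the largest intersection {3, 4} plus unclustered nodes:
assert process_raw_cluster(0, {0, 3, 4, 8}, S_HE, S_cluster, row) == {0, 3, 4}
```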

Having identified localized query-node centric community structures, in the second stage we agglomerate these to identify clusters at different degrees of granularity. A single stage of agglomeration is almost identical to the initial clustering process described above, with the following subtle changes. The cluster results from the previous step are treated as the new nodes. We still use only the matrix \hat{P}, and hence we do not (need to) explicitly define edges connecting the clustered nodes. Each new coalesced node is assigned an entropic centrality value equal to the average of the entropic centralities of its constituent nodes. For transition probabilities across clustered nodes (say C̄ and C̃), we considered the minimum, mean and maximum transition probabilities amongst all node pairs u ∈ C̄, v ∈ C̃ as per \hat{P}. Finally, we discard a specific agglomeration if the resulting agglomerated cluster would not form a (weakly) connected graph. Our experiments indicate that the best clustering results are obtained using the minimum transition probabilities; we only report the corresponding results next.
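The two aggregation rules of this stage (averaged centrality for a coalesced node, and min/mean/max inter-cluster transition probability) admit a direct sketch. This is an illustrative sketch under the assumption that \hat{P} is stored as a nested mapping P_hat[u][v]; the names are our own.

```python
def cluster_transition(P_hat, C_bar, C_tilde, mode="min"):
    """Aggregate transition probability from cluster C_bar to cluster
    C_tilde, over all node pairs (u, v) with u in C_bar, v in C_tilde,
    as read off the transition matrix P_hat (nested mapping)."""
    probs = [P_hat[u][v] for u in C_bar for v in C_tilde]
    if mode == "min":    # the experiments favor the minimum
        return min(probs)
    if mode == "mean":
        return sum(probs) / len(probs)
    return max(probs)

def cluster_centrality(entropic, C):
    """Entropic centrality of a coalesced node: the average of the
    entropic centralities of its constituent nodes."""
    return sum(entropic[v] for v in C) / len(C)
```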

Figure 10: Clustering of the karate club network: (a) edge removal clustering of [23] (20 iterations), (b) initial clusters obtained with the proposed algorithm (stage 1), and (c) the final 2-cluster result (stage 2, with agglomeration).

4.3 Clustering of the Karate Club Network

We consider the karate club network [38] (see Figure 10 for the cluster results) to illustrate and analyze the workings of the clustering algorithm on a toy example with a known baseline, before studying larger graphs. In Figure 10b, the initial set S_{ini,v_q} of clusters is shown along with the dendrogram for agglomeration, and Figure 10c shows the final clusters. The clustering obtained using the edge removal technique (20 iterations) from [23] is shown in Figure 10a for comparison; this top-down approach follows the idea of edge removal from [11], but uses the reduction of average Markov entropic centrality to determine which edge to remove. We also show the time evolution of some of the nodes with the highest asymptotic entropic centrality, and of the node with the lowest asymptotic entropic centrality, in Figure 11 (in addition to Figure 3, where we demonstrated the time evolution of certain nodes), to illustrate the behavior of nodes which are hubs, at the interface of clusters, or at the periphery of the network.

While it is visually apparent that our approach yields better clustering, we quantify this based on the ground truth [38] using the F-score [31]. The result obtained with our approach has an F-score of 0.884, while the one obtained with [23] has an F-score of 0.737.

Figure 11: Time evolution of the entropic centrality of the three nodes with the highest centralities (nodes 3, 34 and 1 respectively), of node 14, which visually appears to be at the border of the clusters (and has, in fact, the seventh highest centrality), and of the node with the lowest entropic centrality (node 17).

We furthermore benchmark the computation time: the edge removal based clustering approach [23] took 14.154 seconds, while our final 2-cluster result was computed in 0.026 seconds. These experiments were run on a 64-bit PC with an x64-based Intel(R) Xeon(R) CPU E5-1650 0 @3.20GHz processor and 16 GB RAM. The gap is easily explained: in our algorithm, the transition probability matrix \hat{P} needs to be computed only once. In contrast, even within a single iteration, the edge removal algorithm [23] needs to recompute the said matrix for every graph instance created by removal of each possible edge, to determine which edge to remove, and this exercise is then repeated in every iteration. That accounts for the huge discrepancy in the computation time, and demonstrates the computational efficacy of our approach.

4.4 Clustering of Bitcoin Subgraphs

We apply our proposed clustering algorithm to variations (un/directed, un/weighted) of the 178 node Bitcoin network subgraph [24] and report the results in Figure 12. The results obtained using the edge removal algorithm [23] on the undirected unweighted graph variant are shown in Figure 13. Unless otherwise stated, we use the parameter N=53, essentially considering S_{HE} to comprise the top 30% entropic centrality nodes (see the sensitivity analysis below).

Figure 12: Clustering of the 178 node Bitcoin subgraph (results obtained in 1.072, 1.077 and 1.123 seconds respectively). 1st row, undirected unweighted graph: (a) initial clusters (Algo. 1 output), (b) two rounds of agglomeration, (c) agglomeration dendrogram. 2nd row, directed unweighted graph: (d) initial clusters (Algo. 1 output), (e) one round of agglomeration, (f) two rounds of agglomeration. 3rd row, directed weighted graph (α=1, μ = d_{w,out}(v)/d_{out}(v)): (g) initial clusters (Algo. 1 output), (h) four rounds of agglomeration, (i) agglomeration dendrogram.

While it is visually clear from Figure 12 that we obtain different clusters depending on the scenario considered, Table III confirms this by reporting F-scores [31] across the graph variants. Also, the effect of the parameter μ alone (without the transition probabilities being altered by edge weights) is rather low. This reinforces an underlying motivation of our work: which graph variant (un/directed and/or weighted) to study is application dependent, and hence having one graph clustering algorithm that works across all variants is beneficial.

The clusterings of the undirected unweighted graph found by our algorithm in 1.072 seconds (Figure 12) and by the edge removal algorithm in 3196.07 seconds (Figure 13) are very different (F-score of 0.125). Our algorithm visually yields (more) meaningful results, though in the absence of a ground truth, this assertion remains subjective. The agglomeration process in our algorithm could have been continued beyond the final result shown here, as indicated by the associated dendrogram. The edge removal based clustering was stopped at the 50th iteration, after which no further average entropy reduction was observed with any single edge removal.

graph   |  UU   | D_{1,1} | D_{w,1} | D_{1,M} | D_{w,M} | UU_er
UU      |   1   |  0.416  |  0.506  |  0.372  |  0.271  | 0.125
D_{1,1} | 0.416 |    1    |  0.473  |  0.806  |  0.364  | 0.146
D_{w,1} | 0.506 |  0.473  |    1    |  0.408  |  0.416  | 0.122
D_{1,M} | 0.372 |  0.806  |  0.408  |    1    |  0.413  | 0.145
D_{w,M} | 0.271 |  0.346  |  0.416  |  0.413  |    1    | 0.131
UU_er   | 0.125 |  0.146  |  0.122  |  0.145  |  0.131  |   1
TABLE III: Pairwise F-scores among the clusterings achieved for different graph variants. Legend — UU: undirected & unweighted; D_{x,y}: directed with α=x, and μ=1 if y=1, else μ = d_{w,out}(v)/d_{out}(v) if y=M; UU_er: edge removal algorithm [23].
Figure 13: Clustering by iterative edge removal [23].

When to stop agglomerating is typically purpose dependent. The second row of Figure 12 shows various stages of agglomeration; the agglomeration dendrogram is given for the 1st and 3rd rows. Many community structures repeat across the considered scenarios, and most of the cluster boundaries can be traced back to the high entropy nodes (Figure 7). Yet there are also subtle differences: e.g., in the weighted directed graph, there are instances of single isolated nodes, which stay isolated for several iterations of agglomeration because of weak (low weight) connections.

Sensitivity analysis. For each graph variant, the size of S_{HE} was varied to comprise 10%, 20%, 30% and 40% of the top entropic centrality nodes. For the unweighted and weighted directed graphs, the F-scores between the clusterings obtained with 10% and the others are 0.824 and 0.989 respectively, while all the other pairs have an F-score of 1. This suggests very consistent results in these cases, irrespective of the choice of |S_{HE}|. However, for the unweighted undirected graph, |S_{HE}| has a significant impact, with the F-score between 20% and 40% being the lowest at 0.626, while the best score of 0.912 is obtained between 30% and 40%. This justifies the use of the top 30% entropic centrality nodes for the reported results. Looking back at Figure 9, we observe that considering only 20% for the size of S_{HE} means including a number of query nodes with relatively low highest probabilities, while between 30% and 40% these highest probabilities do not change significantly, which is consistent with the observed variations in sensitivity.
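The quantity varied in this sensitivity analysis, the set S_{HE} of a top fraction of nodes ranked by entropic centrality, can be extracted with a minimal helper (illustrative sketch; the function name and interface are our own):

```python
def top_fraction(centrality, frac):
    """Return S_HE: the top `frac` fraction of nodes ranked by entropic
    centrality. `centrality` maps node -> centrality score."""
    k = max(1, round(frac * len(centrality)))
    ranked = sorted(centrality, key=centrality.get, reverse=True)
    return set(ranked[:k])
```

For instance, `frac=0.3` reproduces the top 30% choice used for the reported results.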

Figure 14: Clustering of a Bitcoin subgraph involved in the Ashley-Madison extortion scam [32].

We next consider another Bitcoin subgraph; this time, we use a specific subgraph [25] of 4571 nodes, constructed around Bitcoin addresses suspected to be involved in the Ashley-Madison extortion scam [32]. The results of the clustering algorithm are shown in Figure 14 and Figure 15: (1) the graph was considered once unweighted (the emphasis is thus on node connections) and once weighted (to capture the amount of Bitcoins involved), (2) the asymptotic Markov entropic centrality was used (we have no specific diameter of interest), (3) the top 30% nodes with the highest centrality were chosen as the high centrality nodes S_{HE}, and (4) five agglomeration iterations were performed for clustering the unweighted graph, and one for the weighted variant.

In Figure 14, showing the overall unweighted graph, there are three visually obvious main clusters: the upper green cluster, the purple cluster on the right, and the grey cluster on the left. The first observation is that grey here only marks nodes whose cluster size is too small to be significant (only 5 iterations were performed); they are kept in grey so as to make the other clusters more visible. The green and purple clusters are easily interpreted: each contains one Bitcoin address that is a hub for all its neighbors.

We then zoom into the central clusters, shown in Figure 15a. The actual relationship among the constituent wallet addresses in a cluster can be determined, e.g., by using blockchain.com/explorer. We observe a green cluster near the middle (boxed). In our layout, we have isolated one of its constituent nodes (on the right, encircled), to show that the nodes in this collection have multiple mutual connections, as expected among nodes within a cluster. The encircled node above the boxed group has also been assigned to the same cluster. This node is in fact connected to several of the other clusters identified by our algorithm, and is one of the high centrality nodes lying at the interface of clusters. It happens to have been assigned to the green cluster, since each node is assigned to at most a single cluster. Some of the nodes in the (boxed) cluster were previously identified as suspect addresses involved in the Ashley-Madison data breach extortion scam [32]. The resulting clusters thus help draw attention to the other nodes grouped together with them, since the grouping indicates that Bitcoin flows have circulated among them; their relationship with the already known suspected nodes can then be investigated further.

Zooming into the weighted graph gives a very different picture: since the amounts of Bitcoin involved drive the clustering in this case, we prominently see two clusters highlighting nodes dealing with high volumes of Bitcoin. This confirms an expected behavior from scammers, namely collecting few Bitcoins from many addresses to avoid attention. Combining both clustering results, however, correlates nodes that are likely to be involved in the scam with nodes dealing with high volumes of Bitcoin. This could be a direction to consider for Bitcoin forensics: nodes appearing in clusters under both interpretations of the graph could possibly be involved in aggregating scam money, since they stand out both in terms of the volume of Bitcoin they handle, and in terms of the frequency of interactions with multiple suspected wallet addresses.

Figure 15: Zoom in of the Ashley-Madison extortion scam (Bitcoin transactions induced) graph: (a) graph treated as unweighted, (b) graph treated as weighted.

4.5 Benchmarking With Synthetic Graphs

In the previous subsection, we looked at the clusterings that the algorithm provides with Bitcoin subgraphs, for which no ground truth is available. In order to benchmark our proposed algorithm rigorously, we thus further experiment with graphs for which some form of ground truth is assumed.

An acknowledged concern in the research community is that a unique objective benchmark to compare graph clustering algorithms is not feasible [37, 30]. Different algorithms may yield different clusters for a given graph, each of which may be meaningfully interpreted in a distinct context. Conversely, in the real world, connections may have been induced as a consequence of multiple causes (contexts), and using the meta-data representing a subset of these contexts to determine a ‘ground truth’ for the resulting graph may not be accurate. Furthermore, there may be implicit or explicit hierarchical community structures in the graph, or the communities may be fuzzy, and a clustering algorithm may find clusters at a coarser or finer granularity than the one considered as the ground truth.

Synthetic graphs (for example [18]) are considered to alleviate the issue of the lack of a unique and objective ground truth. Yet such synthetic graphs may not carry all the characteristics and associated complications of a real network.

Considering the merits of benchmarking with graphs with ground truth, but also the inherent limitations associated with any particular instance(s) of real or synthetic (family of) graph(s), we study a wide range of graphs with assumed ground truth. We next discuss our experiments with synthetic graphs, before studying some real graphs in the following subsection.

The principal idea of generating synthetic graphs with a known ground truth is to first create isolated subgraphs (with certain properties, such as a given node degree distribution) that represent the ground truth communities. Then, a fraction of the connections is rewired to establish cross community links, such that, probabilistically, a 1−μ fraction of a node's links are with nodes within the same community, while a fraction μ (the mixing probability) of connections are with other nodes. Though this rewiring process might affect the neighborhood of individual nodes to an extent that it materially changes the community a node actually belongs to (particularly for high values of the mixing probability μ), the original allocation of the communities is considered to continue to hold, and is treated as the ground truth. For the reported experiments below, we used synthetic benchmark graph instances randomly generated using NetworkX.
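As an illustration, a benchmark graph of this kind, with planted communities and mixing parameter μ, can be generated with NetworkX's LFR generator [18]. The parameter values below follow the NetworkX documentation example and are illustrative only, not the exact settings used in our experiments:

```python
import networkx as nx

# LFR benchmark graph: planted communities, power-law degree and
# community-size distributions, mixing parameter mu.
G = nx.LFR_benchmark_graph(
    n=250, tau1=3, tau2=1.5, mu=0.1,
    average_degree=5, min_community=20, seed=10,
)

# The planted (ground truth) community of each node is stored in the
# "community" node attribute, as the set of nodes of that community.
communities = {frozenset(G.nodes[v]["community"]) for v in G}
print(G.number_of_nodes(), "nodes,", len(communities), "planted communities")
```

The `communities` set then serves as the assumed ground truth against which F-scores are computed.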

Figure 16: Clusterings of synthetically generated networks with mixing probability μ=0.1: (a) 100 nodes graph, 4 clusters detected with F-score 0.900; (b) 300 nodes graph, 15 clusters detected with F-score 0.973; (c) 500 nodes graph, 21 clusters detected with F-score 0.909.

As a sanity check, and to manually (visually) interpret the results, we start the benchmarking with small graphs and a small mixing probability μ=0.1. The clusters identified by our algorithm for synthetic graphs with 100, 300 and 500 nodes are shown in Figure 16. Visually, we see that the algorithm yields meaningful clusters. We also determine the F-score with respect to the ground truth as determined by the network generation process, and across the different networks we observe very good (0.9 or above) F-score values. For the 100 nodes graph, one can visually spot certain nodes, particularly in the middle group, being allocated mismatching cluster labels, explaining the relatively lower score of 0.900. For the 500 nodes graph instance, the isolated nodes had distinct labels in the ground truth, and some other misattributions can also be seen visually, explaining the relatively lower score of 0.909. The 300 nodes graph instance had disconnected components, which might have made it easier for the clustering algorithm to find relevant communities, yielding a noticeably high score of 0.973.

Figure 17: Scatter plots of F-scores for large-scale benchmarking of our entropy based graph clustering approach with randomly generated synthetic graphs comprising 1000 and 4000 nodes respectively, for a wide range of inter-cluster linkage (μ) characteristics. The modularity optimization based Louvain [5] community detection algorithm is also shown to provide a point of comparison.

We next extend our study with larger scale experiments, both (i) in terms of graph size (1000 and 4000 nodes) and (ii) in terms of the range of μ values representing different extents of cross-community linkage. We show the scatter plots of the observed F-scores for our entropy based graph clustering algorithm. For a point of reference, we also provide the results observed for clustering with modularity optimization [5]. For relatively small values of μ, perfect clustering is obtained by [5], while near perfect clustering is also obtained by our approach. For the rest of the spectrum of μ values, the performance of both approaches varies to a certain extent (more so in the larger graphs with 4000 nodes), but overall both algorithms yield good results: e.g., the F-score is consistently higher than 0.8 for μ<0.3, and the deterioration of the F-score with increasing μ is gradual rather than sharp. Furthermore, while the performance varies across graph instances, very high (absolute values of) Pearson's correlation coefficients between F-score and μ (the precise correlation coefficient r values are indicated in the figures) indicate a good degree of consistency in the behaviour of both algorithms. From individual data points, we observe that among the two algorithms there is no clear winner, and each outperforms the other on some graph instances across most of the μ value range. These large-scale benchmarking experiments with randomly generated synthetic graphs demonstrate the efficacy of our proposed approach on its own. While the original objective of our proposal was to investigate a new way to carry out graph clustering, rather than to necessarily compete with existing approaches, the comparison using synthetic graphs with one of the popular existing approaches further demonstrates that the quality of the clustering results obtained by our approach is in fact comparable.

4.6 Benchmarking With Real World Graphs

Since synthetic graphs may not exhibit all the nuances of communities occurring in the real world, we complement our study with benchmarking experiments using networks with known ground truth, namely, the dolphin network [21] and the American college football network [11] which were previously used in the work [23] that we extend. Moreover, we extend the comparative aspect of our evaluation, and to that end we compare our approach with other popular community detection algorithms, namely, InfoMap [34], label propagation [2] and Louvain clustering [5].

4.6.1 Clustering of the Dolphin Network

We consider the dolphin network [21], an undirected unweighted social network where bottlenose dolphins are represented as nodes and associations between dolphin pairs are represented as links. The network comprises 62 nodes and 159 links, and it was observed that the dolphin population splits into two communities [21], comprising 20 nodes and 42 nodes. We use this as the ground truth.

In Figure 18a, we show the network structure and the corresponding relative (that is, normalized by the maximum observed value) Markov entropic centralities: the darker the node color, the higher the relative centrality. Figures 18b and 19f respectively depict the relative Markov entropic centrality (fractional, bucketed) distribution, and the scatter diagram of the absolute entropic centrality score (x-axis) versus the maximum absorption probability at any node (y-axis) for a random walker starting from the given node.

The histogram indicates that it is more meaningful to choose the clustering parameter S_{HE} to include the top 50%-70% of nodes (instead of ≈30%, as used in the previous experiments), since more than 60% of the nodes have a normalized entropy value above 0.7. The scatter diagram indeed shows that taking the top 30% of nodes for S_{HE} would mean including many nodes whose highest probability is relatively small, while there are few such nodes when fixing the threshold at 60% (the vertical line demarcates the top 60% nodes on the right). The result obtained with two iterations of the algorithm is shown in Figure 19a, next to the clustering results obtained using InfoMap [34], Louvain [5] and label propagation [2] (with their default parameters). We observe visually that the proposed algorithm produces a better result than the other clusterings. This is confirmed by computing the F-score [31] of each clustering result against the ground truth: the F-score of the proposed algorithm is 0.858; in contrast, it is 0.545 for InfoMap, 0.565 for Louvain, and 0.657 for label propagation. These three community detection techniques also find more than two clusters. From Figure 19, we also visually infer that the other clustering results could be improved if agglomeration techniques were applied to the smaller communities located on the right hand side of the dolphin network. In fact, it could be argued that even though the group of dolphins split into two groups (which is the basis of the ground truth), this does not preclude the existence of further smaller communities within those two split groups, which could then be what is being detected by these algorithms.

Figure 18: The dolphin network: (a) the asymptotic Markov entropic centralities (the darker the color, the higher the relative entropic centrality score); (b) histogram of the normalized (by the maximum) centralities.
Figure 19: Clusterings of the dolphin network: (a) proposed algorithm (2 iterations, top 60%); (b) InfoMap; (c) Louvain; (d) label propagation; (e) the ground truth [21]; (f) maximum absorption probability as a function of the entropic centrality.

4.6.2 Clustering of the American College Football Network

We next consider the American college football network [11], an undirected unweighted network representing the Division-I football games from Fall 2000. A team is represented as a node, and a game between two teams is represented as a link between the two corresponding nodes. There were in total 115 teams and 613 games. Teams were divided into 12 conferences, and teams in the same conference played against each other more frequently. We treat the 12 conferences as the network's ground truth, comprising 12 clusters.

Figure 20: The American College football network [11] (the darker the color, the higher the relative entropic centrality score), together with the histogram of the normalized (by the maximum) centralities for the same network.
Figure 21: Clusterings of the American College football network: (a) proposed technique (one iteration); (b) proposed technique (applied on the 3 largest subgraphs in Figure 21a); (c) InfoMap; (d) Louvain; (e) label propagation; (f) the ground truth [11].
Figure 22: Scatter diagrams with the entropic centrality of nodes on the x-axis and, on the y-axis, the maximum absorption probability at any node for a random walker starting at that node: the American college football network (left), and the EU mail network (right).

Figure 20 shows the network and the corresponding relative (that is, normalized by the maximum entropic centrality value) Markov entropic centralities: the darker the color, the higher the relative entropic centrality score. We also see in the distribution that a majority of nodes have a normalized entropic centrality between 0.2 and 0.4. This helps us choose our clustering parameter S_{HE}, the set of high entropy nodes deemed as center/border of a cluster. Accordingly, we chose S_{HE} to comprise the top 50/60/70/80% entropic centrality nodes, and the results obtained had F-scores against the ground truth of 0.273, 0.406, 0.409, and 0.517 respectively. The scatter diagram showing the (absolute) entropic centrality and the maximum absorption probability at any node for a random walker starting at the corresponding node is shown in Figure 22 (left). The vertical line in the diagram shows the demarcation of S_{HE} for 80%. Unlike for the dolphin network, there is barely any node with a distinctively high probability. The threshold of 80% separates a few nodes with both the slightly highest entropic centrality and the highest probability. The result with F-score 0.517 is shown in Figure 21a. We stop at the first iteration since our clustering technique is a bottom-up approach, meaning that a second iteration would produce a smaller number of clusters. Based on the ground truth (Fig. 21f), we notice for our algorithm a behavior similar to that observed with the other algorithms on the dolphin network, namely: the algorithm coalesced several of the ground truth communities to create larger communities. By extracting the three largest subgraphs of these larger communities and re-applying the algorithm on each of them (again with S_{HE} consisting of the top 80% entropic centrality nodes), we obtained a new group of clusters, shown in Fig. 21b, with a significantly improved F-score of 0.811. The overall computation time was a total of 0.221 seconds.
We compare this with the results obtained with InfoMap [34] (F-score of 0.904, total time of 0.013 seconds), Louvain [5] (F-score of 0.823, total time of 0.002 seconds), and label propagation [2] (F-score of 0.796, total time of 0.001 seconds), as shown in Figure 21 along with the ground truth. In the ground truth [11], a cluster with yellow color and another with crimson color have their members spread out over the network. Considering this anomalous ‘ground truth’, ours as well as the other clustering techniques produce very good results. InfoMap has the highest F-score, Louvain has an F-score similar to our technique's, and label propagation has the lowest F-score.
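The re-application described above, extracting the largest coalesced communities as induced subgraphs and clustering each of them anew, can be sketched with NetworkX (illustrative helper; the name and interface are our own assumptions):

```python
import networkx as nx

def largest_community_subgraphs(G, communities, k=3):
    """Induced subgraphs of the k largest detected communities
    (each community an iterable of nodes), on each of which the
    clustering algorithm can then be re-applied."""
    biggest = sorted(communities, key=len, reverse=True)[:k]
    return [G.subgraph(C).copy() for C in biggest]
```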

Finally, we considered the European email network [20], representing email communication in a large European research institution among members that belong to 42 departments (thus 42 clusters). The corresponding scatter diagram is shown on the right of Figure 22. Looking at the scatter diagram from left to right, we observe a few nodes with entropy 0; these are isolated nodes with no edge. Then there is a small group with entropies varying between 4 and 6, and finally, on the right, the bulk of nodes have both an entropy above 6 and a highest probability below 0.3. This suggests that the proposed algorithm will have difficulties in identifying clusters, since choosing S_{HE} to include the large right group gives too many clusters, while either iterating or taking a smaller S_{HE} leads to too few clusters. This demonstrates how the scatter diagram informs whether and when our approach is suitable for clustering a given graph instance.

5 Concluding Remarks

In this paper, we investigated the entropic centrality of a graph, using the spread/uncertainty of a random walker's eventual destination as a measure that is applicable to all families of graphs: un/weighted, un/directed. Studying the probability distribution for a random walker to be absorbed at any given node when initiated at a node of a given entropic centrality, we established principled insights on how to choose query nodes for random walkers, and on how to exploit said probability distribution to identify local community structures. We utilized these mechanisms to realize heuristic bottom-up clustering algorithms relying on the centrality informed choice of query nodes, which inherit the universality of the entropic centrality model, and hence are also applicable across families of graphs. We benchmarked the proposed clustering mechanism using a variety of data sets and by comparing it with other popular algorithms. Given the principles that guided the design of our heuristics, we also explored how the underlying analysis can be leveraged to reason about when and whether our algorithm is suitable to cluster a given graph.

Given the bottom-up and localized nature of the most relevant information used in the decision making process, in the future we also want to explore the trade-offs between the quality of the results obtained and computational scalability, as well as possible distribution/parallelization, if partial information is used to compute approximate centrality scores and random walker distributions.

Our model naturally fits applications such as the flow of money and its confined circulation among subsets of users, where the volume or frequency of interactions can be mapped to edge weights, and the direction of the flow is vital. Money laundering and cryptocurrency forensics are thus application areas of interest which we want to explore with the designed tools in the immediate future.

Funding

This work was not supported by any funding.

References

  • [1] Alamgir, M. & von Luxburg, U. (2010) Multi-agent random walks for local clustering on graphs. in IEEE Intl. Conf. on Data Mining (ICDM).
  • [2] Andersen, R., Chung, F. & Lang, K. (2006) Local graph partitioning using pagerank vectors. in 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pp. 475–486. IEEE.
  • [3] Bastian, M., Heymann, S. & Jacomy, M. (2009) Gephi: An open source software for exploring and manipulating networks. in AAAI Conf. on Weblogs and Social Media.
  • [4] Bian, Y., Ni, J., Cheng, W. & Zhang, X. (2017) Many heads are better than one: Local community detection by the multi-walker chain. in IEEE Intl. Conf. on Data Mining (ICDM).
  • [5] Blondel, V. D., Guillaume, J. L., Lambiotte, R. & Lefebvre, E. (2008) Fast unfolding of communities in large networks. Journal of statistical mechanics: theory and experiment, 2008(10), P10008.
  • [6] Borgati, S.P. (2005) Centrality and network flow. Social Networks, 27(1).
  • [7] Borgati, S.P. (2008) On variants of shortest-path betweenness centrality and their generic computation. Social Networks, 30.
  • [8] Coutinho, J. (2017) https://sites.google.com/site/ucinetsoftware/datasets/covert-networks/cocainedealingnatarajan.
  • [9] Estrada, E. (2011) Centrality measures. in The structure of complex networks: Theory and applications. Oxford University Press.
  • [10] Freeman, L. (1978) Centrality in social networks conceptual clarification. Social Networks, 1(3).
  • [11] Girvan, M. & Newman, M. E. (2002) Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12), 7821–7826.
  • [12] Goh, K., Kahng, B. & Kim, D. (2001) Universal behavior of load distribution in scale-free networks. Physical Review Letters, 87(27).
  • [13] Guiasu, S. (1971) Weighted Entropy. Reports on Mathematical Physics, 2(3).
  • [14] Hagberg, A., Schult, D. & Swart, P. (2008) Exploring network structure, dynamics, and function using NetworkX. in 7th Python in Science Conference (SciPy2008).
  • [15] Higham, D. J. (2019) Centrality-friendship paradoxes: when our friends are more important than us. Journal of Complex Networks, 7(4), 515–528.
  • [16] Kim, Y., Son, S.-W. & Jeong, H. (2010) Finding communities in directed networks. Physical Review E, 81(016103).
  • [17] Lancichinetti, A. & Fortunato, S. (2009) Community detection algorithms: A comparative analysis. Physical Review E, 80(056117).
  • [18] Lancichinetti, A., Fortunato, S. & Radicchi, F. (2008) Benchmark graphs for testing community detection algorithms. Physical Review E, 78(046110).
  • [19] Lee, C. & Cunningham, P. (2014) Community detection: effective evaluation on large social networks. Journal of Complex Networks, 2(1), 19–37.
  • [20] Leskovec, J., Kleinberg, J. & Faloutsos, C. (2007) Graph evolution: Densification and shrinking diameters. ACM Transactions on Knowledge Discovery from Data (TKDD), 1(1).
  • [21] Lusseau, D., Schneider, K., Boisseau, O. J., Haase, P., Slooten, E. & Dawson, S. M. (2003) The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations. Behavioral Ecology and Sociobiology, 54(4), 396–405.
  • [22] Malliaros, F. D. & Vazirgiannis, M. (2013) Clustering and community detection in directed networks: A survey. Physics Reports, 533(4).
  • [23] Nikolaev, A., Razib, R. & Kucheriya, A. (2015) On efficient use of entropy centrality for social network analysis and community detection. Social Networks, 40.
  • [24] Oggier, F., Phetsouvanh, S. & Datta, A. (2018a) A 178 node directed Bitcoin address subgraph. https://doi.org/10.21979/N9/TJMQ8L, DR-NTU (Data), V1.
  • [25]    (2018b) A 4571 node directed weighted Bitcoin address subgraph. https://doi.org/10.21979/N9/IEPBXV, DR-NTU (Data), V1.
  • [26]    (2018c) Entropic centrality for non-atomic flow networks. in International Symposium on Information Theory and Its Applications (ISITA).
  • [27]    (2019) A split-and-transfer flow based entropic centrality. PeerJ, Computer Science.
  • [28] Opsahl, T., Agneessens, F. & Skvoretz, J. (2010) Node centrality in weighted networks: Generalizing degree and shortest paths. Social Networks, 32.
  • [29] Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M. & Duchesnay, E. (2011) Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12.
  • [30] Peel, L., Larremore, D. B. & Clauset, A. (2017) The ground truth about metadata and community detection in networks. Science Advances, 3(5).
  • [31] Pfitzner, D., Leibbrandt, R. & Powers, D. (2009) Characterization and evaluation of similarity measures for pairs of clusterings. Knowledge and Information Systems, 19(3).
  • [32] Phetsouvanh, S., Oggier, F. & Datta, A. (2018) EGRET: Extortion Graph Exploration Techniques in the Bitcoin Network. in IEEE ICDM Workshop on Data Mining in Networks (DaMNet).
  • [33] Quax, R., Apolloni, A. & Sloot, P. (2013) Towards understanding the behavior of physical systems using information theory. The European Physical Journal - Special Topics, 222(6), 1389–1401.
  • [34] Rosvall, M. & Bergstrom, C. (2008) Maps of random walks on complex networks reveal community structure. Proceedings of the National Academy of Sciences, 105(4).
  • [35] Schaeffer, S. E. (2007) Graph clustering. Computer Science Review, 1.
  • [36] Tutzauer, F. (2007) Entropy as a measure of centrality in networks characterized by path-transfer flow. Social Networks, 29.
  • [37] von Luxburg, U., Williamson, R. & Guyon, I. (2012) Clustering: Science or art? in Workshop on Unsupervised and Transfer Learning.
  • [38] Zachary, W. (1977) An information flow model for conflict and fission in small groups. Journal of Anthropological Research, 33(4).