Centrality of shortest paths: Algorithms and complexity results
Abstract
The degree centrality of a node, defined as the number of nodes adjacent to it, is often used as a measure of importance of a node to the structure of a network. This metric can be extended to paths in a network, where the degree centrality of a path is defined as the number of nodes adjacent to it. In this paper, we reconsider the problem of finding the most degree-central shortest path in an unweighted network. We propose a polynomial algorithm with the worst-case running time of , where is the number of vertices in the network, is the number of edges in the network, and is the maximum degree of the graph. We conduct a numerical study of our algorithm on synthetic and real-world networks and compare our results to the existing literature. In addition, we show that the same problem is NP-hard when a weighted graph is considered. Furthermore, we consider other centrality measures, such as the betweenness and closeness centrality, showing that the problem of finding the most betweenness-central shortest path is solvable in polynomial time and finding the most closeness-central shortest path is NP-hard, regardless of whether the graph is weighted or not.
1 Introduction
1.1 Motivation
The centrality of nodes in a network indicates the relative importance of a node in the network’s topology. For instance, in social networks, the more central a node is, the better information can spread from and through the node. Various metrics have been proposed to measure the centrality of nodes, such as the betweenness, closeness, degree, and eigenvector centralities, which all offer some insight into the influence of a node in how information propagates through a network. As proposed by Everett and Borgatti, (1999), some of these metrics can be extended to consider the centrality of a group of nodes. In this pioneering paper, the authors extend the definitions of degree, closeness, and betweenness centrality to groups of nodes to quantitatively study their influence. With these measures, a natural question arises: what is the most central set of nodes in a network with some special structure? Optimisation problems with this structure were first proposed by Vogiatzis et al., (2014), who first studied the problem of finding the most and the least central cliques in a graph. In this paper, the authors formulate linear 0-1 integer programs and examine their performance on real-world and synthetic instances. Vogiatzis and Camur, (2019) later studied a similar problem of finding the most degree-central induced star graph. They proposed an integer program and heuristic greedy algorithms and applied these to find the most essential proteins in protein-protein interaction networks.
Later, Matsypura et al., (2023) investigated the problem of finding the most degree-central walks in a graph. The authors consider problems of finding the most degree-central walk, trail, path, induced path, and shortest path. They show that the problems are nested in the sense that a shortest path is an instance of an induced path, an induced path is an instance of a simple path, which is an instance of a trail, which, in turn, is an instance of a walk. Further, all the problems considered in that study were shown to be NP-hard, except for the problem of finding the most degree-central shortest path, which can be solved in polynomial time using an algorithm based on the breadth-first search technique. We will refer to this algorithm as the MVP algorithm. For all the remaining problems, the authors developed mixed-integer programs and proposed heuristic solutions as a warm start to improve performance. For a more thorough review of group centrality optimisation problems, we refer the reader to a recent survey by Camur and Vogiatzis, (2024).
In this paper, we extend the work of Matsypura et al., (2023) and focus on centrality of shortest paths. We first develop an efficient polynomial algorithm to find the most degree-central shortest path. Our computational experiments demonstrate the advantage of the proposed algorithm relative to the MVP. We also show that the same problem on a weighted graph is NP-hard even in the presence of two distinct weights. We then study the problem of finding the most betweenness-central shortest path and show that it is solvable in polynomial time for both weighted and unweighted cases. Finally, we prove that the problem of finding the most closeness-central shortest path is NP-hard regardless of whether the weights are present.
The problems under consideration have many applications, including the design of distribution channels and social network analysis with security implications. In the context of distribution network design, a planner may be interested in constructing a hub line network by connecting cities in a single path whilst maximising the network’s reach (de Sá et al.,, 2015). This model can be used to determine routes for efficient and direct high-capacity transport systems such as rail, which are supplemented by trucks and drones for last-mile shipping. During natural disasters, authorities may use this model to consider prioritising locations to set up evacuation centres by maximising the number of affected communities that can be reached most efficiently. In social networks, by considering each user as a node, we can use this model to find the most influential users that can be reached most efficiently (Peng et al.,, 2018), which, from an adversarial player point of view, can be used with significant effect when information and software with malicious intent are distributed.
1.2 Problem definition
Let be an unweighted (possibly directed) graph with a set of vertices and a set of edges . Following the definitions provided by Matsypura et al., (2023), we use to denote a path, i.e., a finite sequence of distinct adjacent vertices in , i.e., . We use to denote the neighbourhood of path , where is the set of vertices adjacent to , excluding the vertices of itself, i.e.,
Further, we define the centrality of path , . Note that when the path consists of a single node, this reduces to the traditional definition of the degree centrality of a node. We also denote by the set of all shortest paths between any pair of nodes in .
Given a graph , we aim to solve the following problem:
(1) |
i.e., find a path with the largest degree centrality under the condition that is a shortest path between a pair of nodes in .
1.3 Structure of the paper
The remaining paper is structured as follows: in Section 2, we present a polynomial time algorithm that can be used to solve Problem (1). In Section 3, we extend our analysis to the case of graphs with weighted edges and show that the problem becomes NP-hard even with just two distinct weights. This is followed by Section 4, where we conduct an extensive numerical study of the proposed algorithm on synthetic and real-world graph instances and provide insights into the results. In Section 5, we consider two other measures of centrality, the betweenness and closeness centrality, and prove the complexity status of finding the most central shortest paths with these metrics. Finally, in Section 6, we provide concluding remarks.
2 Algorithm
Before formally introducing the algorithm, we prove the following lemma that underpins many efficient algorithms for solving problems involving shortest paths.
Lemma 1.
If is a shortest path from to , then is a shortest path from to .
Proof.
Suppose there is a shorter path from to , and let this be . This implies that would be a shorter path from to . However, this cannot be true as is a shortest path from to . ∎
We now extend the lemma above to consider the degree centrality of the path.
Lemma 2.
If is a most degree-central shortest path from to , then is a most degree-central shortest path from to .
Proof.
First, observe that vertex cannot share any neighbours with . To see this, assume that vertex on the path, where , shares a neighbour with vertex . Then cannot possibly be the most degree-central shortest path between and as would be a shorter path.
Now consider the graph in Figure 1 where vertex does not share any neighbours with any node on path , but with and .
From the diagram, it is clear that if is a most degree-central shortest path from to , then is a most degree-central shortest path from to . To see this, suppose that there is an alternate shortest path to vertex that shares no neighbours with vertex and has a higher centrality. In this case, path , would have a centrality greater than that of path as vertices and contribute the same vertices to the neighbourhoods of both paths. ∎
We now introduce the following notation to simplify the exposition. First, consider a source node . Let and be the shortest distance and most degree-central shortest path between nodes and , respectively, and let be the set of vertices that can directly precede vertex on any shortest path from to . Additionally, we introduce the operation , which we use to denote extending the path to vertices , then .
To solve Problem (1), we extend the breadth-first search (BFS) algorithm. The BFS algorithm can be used to traverse a graph from a source node to any other node in the network by iteratively extending paths already constructed in previous iterations. Specifically, we begin with the source node and, at each iteration of the algorithm, consider a queue of vertices such that , if a path from to has been established in a previous iteration and hasn’t been processed yet. Then, we pop the first element in , denoting it , and consider the set , i.e., edges that can be connected to . We then check if has been processed in a previous iteration, and if it has not, we extend the path to and add to the end of (this is the traditional BFS algorithm). Note that this procedure also identifies the shortest path from a source node to all other nodes in an unweighted graph. To simplify the explanation of our algorithm, we adopt some aspects of Dijkstra’s algorithm (Dijkstra,, 1959), which extends the BFS procedure to find shortest paths in weighted graphs. Instead of checking if a node has been reached, we record the shortest distance from a source node to all other vertices in the graph. When we extend a path, we compare the length of the previously calculated shortest path from to and a new path from to and then to and take the shorter path. Then, to obtain the most degree-central shortest path, we modify the algorithm in the following manner: for each vertex , we place all nodes that can directly precede on the shortest path from to in the set . Each time we consider extending the shortest path to , we compare the neighbourhood of the existing most degree-central shortest path from to against the neighbourhoods of paths obtained by extending the most degree-central shortest path to , , to then for all . Furthermore, instead of only considering the case where a path is shorter than a previous path in the traditional Dijkstra’s algorithm, we also consider tied shortest paths as we need to compute and compare the centrality to ensure the most degree-central shortest path is found.
This is summarised in Algorithm 1.
Proposition 1.
Given a graph and a source vertex , Algorithm 1 returns the most degree-central shortest paths from to all other vertices in .
Proof.
We show this by induction.
Base case: The most degree-central shortest path to the source vertex is just . Also, it is easy to see that any path connected in the first iteration of the while loop must be the shortest path from as it is directly connected to the source node. Furthermore, this will be the most degree-central path as there cannot be another path of the same or shorter length to this set of vertices.
Inductive step: Assume that in the th iteration of the while loop, the most degree-central shortest paths to all nodes in have been found.
In the th iteration, we consider node where is the smallest distance between the source vertex and all other previously connected nodes that have not yet been processed. The for-loop in line 8 allows us to branch to all adjacent nodes, while lines 10, 13 and 21 guarantee that only the shortest path will be recorded. The for-loops and if-statements in lines 16-20 and 23-27 then allow us to extend paths to then for all , which, by the inductive hypothesis and Lemma 2, will return the most degree-central shortest path from to that passes through vertex . After considering extending all shortest paths of length in subsequent iterations of the while-loop, and therefore considering all possible , we are guaranteed the most degree-central shortest path from to . ∎
Proposition 2.
Given a graph , the worst-case running time of Algorithm 1 is .
Proof.
Algorithm 1 is a modification of the BFS algorithm. The BFS algorithm has a runtime of as we enumerate all the edges in the graph once in the outer while loop, and we add to the queue times to ensure all vertices are processed.
Assuming that and have been stored and can be easily accessed at each iteration of the while loop, computing in the inner for-loop for all possible ’s takes time in the worst case as we need to consider at most ’s, and computing takes time, which dominates the running time of the inner for-loop. This operation is completed at most times as there are iterations of the while-loop to process all the edges. As a result, Algorithm 1 has runtime of . ∎
This bound can be tightened. Let be the maximum degree of graph . Then, the following is true.
Proposition 3.
Given a graph , the worst-case running time of Algorithm 1 is .
Proof.
There are only iterations of the inner for-loops in lines 16-20 and 23-27 as there are at most nodes that may directly precede vertex on a shortest path to vertex . As a result, one order of can be replaced with from the runtime quoted in Proposition 2. ∎
Theorem 1.
Problem (1) can be solved in time.
Proof.
Comparing this to the running time of Algorithm MVP, which has a runtime of , with being the diameter of the graph, Algorithm 1 has a runtime of as and , which is an improvement of , and in addition, we only require space, rather than .
3 The case of weighted graphs
In this section, we examine solving Problem (1) on a graph with weighted edges. Hence, throughout this section, we assume that graph is weighted.
We first show that Problem (1) is NP-hard even with just two distinct weights by showing it is at least as hard as the Maximum Satisfiability (MaxSAT) problem, which is known to be NP-complete (Krentel,, 1988).
MaxSAT | |
---|---|
Instance: | Set of variables, collection of clauses over and positive integer . |
Question: | Is there a truth assignment for that simultaneously satisfies at least of the clauses in ? |
Before we show the reduction, we formally define the decision variant of Problem (1) for a weighted graph.
MDCWSP-D | |
---|---|
Instance: | A weighted graph and a positive integer . |
Question: | Is there a shortest path between two vertices in such that its degree centrality is greater than ? |
Theorem 2.
MDCWSP-D is NP-complete.
Proof.
First, observe that the degree centrality of a path can be computed in polynomial time; hence, MDCWSP-D is in NP.
Given an instance of MaxSAT, we can construct a directed weighted graph as follows. First, consider an unweighted graph with nodes and edges with the following structure: we start with two unlinked paths and both of length . For , we add additional edges and to . This means that the shortest path between any nodes or and or is of length for . Finally, to complete , we add four additional nodes , that are both connected to and , and and that are connected to and . Note that we connect the nodes such that the edges point towards , and . To obtain the weighted graph , we add vertices and edges to depending on the clauses. For each clause , we add a vertex to and undirected edges for each variable in clause , each with edge weight . Given these edge weights, any shortest path in cannot contain vertex in the middle of the path because an alternate shorter path will exist. Note: The additional edges can be directed; however, we set them to be undirected to amplify the difference between the unweighted and weighted settings.
An example of the structure of is given in Figure 2.
We now show that there exists a truth assignment for with more than satisfiable clauses if and only if there exists a shortest path in with a degree of at least .
If there is some truth assignment for with at least satisfiable clauses, then the corresponding path in will have at least neighbours by construction as we have neighbours from the False assignments and neighbours from the satisfied clauses, with an additional neighbours coming from the source and sink vertices.
Suppose we are given a shortest path in with at least neighbours. We can transform this path into the form , where is the exclusive or operator, and maintain a degree centrality of at least . Consider the following cases:
-
•
If the given path ends at , , or , we remove this node, and the centrality will increase by one.
-
•
If the given path starts and ends in , for example, , then any arbitrary extension of the path so that it starts at and ends at will maintain or increase the centrality as the neighbourhood will contain the unselected nodes in .
-
•
If the path starts or ends at one of the additional nodes , for example, , we can remove from the path and extend the path using the result from the above case, resulting in the path . This new path will not have a centrality smaller than the original path as we will gain as a neighbour and the unselected nodes in .
In all the above cases, the new paths can be computed in polynomial time. The path returned will be a solution to MaxSAT because after the removal of the two start and end nodes, as well as the unvisited nodes in , the remaining nodes in the neighbourhood correspond to the satisfied clauses and the path corresponds to the truth assignment. ∎
Corollary 1.
Problem (1) with a weighted graph is NP-hard.
We now discuss two special cases of Problem (1) on a weighted graph and show how to solve them. The first is the case of positive integer-valued weights, and the second is the case where weights are generated from some positive continuous distribution.
Case 1: Although Dijkstra’s algorithm can be used to solve the problem of finding the shortest path in a graph with positive weighted edges, we note that Lemma 2 does not hold for weighted graphs. However, if the weights are positive integers, we can augment the network with auxiliary vertices and edges to represent the weighted edges. Then, Algorithm 1 can be applied to the augmented graph with a slight modification to ensure that the auxiliary nodes are not counted in the neighbourhood. We demonstrate the importance of this using the example in Figure 3. Given the weighted graph in Figure 3(a), the most degree-central shortest path is with and . We now construct the augmented graph as shown in Figure 3(b). We label the auxiliary vertices using tuples to indicate that the new node is edges away from node on the original arc. If we were to naïvely apply Algorithm 1 on the augmented graph, the optimal path would be with and . However, the corresponding path without the auxiliary nodes only has the centrality of 3. This discrepancy arises because the auxiliary vertices introduced for the edges linking nodes 3 and 4 to the path should only contribute one neighbour to the neighbourhood.
To avoid the issue of double-counting nodes when computing the neighbourhood of a path, additional work needs to be done to consider the topology of the original graph. This will not worsen the asymptotic running time of Algorithm 1 as the additional work in lines 17 and 24 of the algorithm is checking what type of node we are processing. As a result, Algorithm 1 will have a running time of , where is the sum of all edge weights. Note that the running time is not as we only need to change the number of times the while-loop is executed. The remaining runtime remains the same as we only need to consider setting the starting vertex to be one that is in the original graph, and we only need to compute and compare the centrality of the paths when processing a vertex in the original graph. By applying Algorithm 1 on the augmented graph, the runtime is now pseudo-polynomial, as it depends on the sum of the edge weights.
Case 2: If the edge weights are generated from some positive continuous distribution for all , the shortest path will be unique with probability 1 as for all . In this case, the problem can be solved by first solving the all pairs shortest path problem and then evaluating and comparing the centralities of the paths. This can be done in time with the Floyd-Warshall algorithm.
4 Computational experiments
We tested Algorithm 1 on a large number of synthetic and real-world graphs. The experiments were conducted on a Mac Studio (2023) with Apple M2 Ultra and 128 GB of RAM running macOS Ventura 13.6. The algorithm was implemented in Python 3.9.
4.1 Data and results
The synthetic graphs were randomly generated using two procedures outlined below (each type of graph was generated 30 times for each parameter setting):
-
•
Watts–Strogatz graphs. We considered the ‘small-world’ network proposed by Watts and Strogatz, (1998), which represents networks that are “highly clustered, like regular lattices, yet have small characteristic path lengths”. We considered Watts–Strogatz graphs with 100, 500, 1000, 5000, and 10000 nodes. The number of initial neighbours was set to 4, and the rewiring probability was set to one of 0.1 or 0.2.
-
•
Barabási-Albert graphs. The ‘scale-free preferential attachment’ model proposed by Barabási and Albert, (1999) generates graphs with edges whose degree follows the power rule. We considered graphs with the same number of nodes as the Watts–Strogatz instances and set the number of edges attached to any new node to 2 to mirror the edge density of the Watts-Strogatz graphs.
In total, we generated and analysed 450 synthetic instances. We averaged the results for each parameter instance and summarised them in Tables 1 and 2.
To compare our results to the MVP algorithm, we also considered the following undirected real-world instances:
-
•
Krebs: The terrorist network of the 9/11 hijackers (Krebs,, 2002).
-
•
Dolphins: A network representing the social interactions of bottlenose dolphins (Lusseau et al.,, 2003).
-
•
Sandi Auths: An academic collaboration network (Rossi and Ahmed,, 2015).
-
•
IEEE Bus: A subnetwork of the US Electric Power System in 1962 (Davis and Hu,, 2011).
-
•
Santa Fe: A collaboration network for the Santa Fe Institute (Girvan and Newman,, 2002).
- •
-
•
Bus: Bus power system network (Davis and Hu,, 2011).
- •
-
•
Cerevisiae: Biological network of yeast protein-protein interactions (Davis and Hu,, 2011).
The results from these real-world graph instances are presented in Table 3.
In addition to the unweighted graphs, we also consider the following weighted graphs:
-
•
Copenhagen calls and Copenhagen SMS: Undirected and directed graphs with weights for the number of phone calls made and text messages sent between university students in Copenhagen over multiple weeks (Sapiezynski et al.,, 2019).
-
•
Bitcoin Alpha and Bitcoin OTC: Directed graphs of trust scores between users on the Bitcoin Alpha and Bitcoin OTC platforms (Kumar et al.,, 2016). Note that we have transformed the weights from a range to to ensure positive weights.
-
•
Advogato: Directed graph of trust scores from users of Advogato, an open source software community (Massa et al.,, 2009). Similar to the Bitcoin graphs, the weights have been rescaled from having weights of to .
For these weighted graphs, we examine the performance of Algorithm 1 on the unweighted graph obtained by ignoring the weights and the augmented graph with the modified version of Algorithm 1. The results are presented in Tables 4 and 5. Note that these two tables also include rows for the number of nodes traversed so that comparisons can be made to the unweighted case.
In addition to the standard statistics, we report the size of the feasible set , i.e., the number of shortest paths in , as well as which is the number of pairwise connectable nodes in (i.e., the number of shortest paths if they were all unique), and finally the ratio which can be used to compare the size of the feasible sets. For the sake of brevity, the values of and can be found in the appendix.
Watts–Strogatz (4, 0.1) | Watts–Strogatz (4, 0.2) | |||||||||
---|---|---|---|---|---|---|---|---|---|---|
100 | 500 | 1000 | 5000 | 10000 | 100 | 500 | 1000 | 5000 | 10000 | |
200 | 1000 | 2000 | 10000 | 20000 | 200 | 1000 | 2000 | 10000 | 20000 | |
5.87 | 6.40 | 6.70 | 7.27 | 7.33 | 6.60 | 7.40 | 7.43 | 8.03 | 8.20 | |
2.45 | 3.18 | 3.53 | 4.37 | 4.80 | 2.04 | 2.25 | 2.35 | 2.57 | 2.66 | |
diam | 10.43 | 15.40 | 17.73 | 22.33 | 24.50 | 8.43 | 11.50 | 12.97 | 16.13 | 17.47 |
diam centrality | 22.23 | 31.20 | 35.37 | 42.53 | 45.20 | 21.67 | 28.80 | 30.60 | 37.73 | 40.17 |
path length | 9.03 | 12.97 | 14.73 | 18.63 | 19.37 | 7.03 | 9.33 | 10.17 | 12.53 | 13.27 |
path centrality | 23.17 | 33.87 | 37.60 | 46.90 | 50.40 | 23.33 | 31.73 | 35.17 | 43.77 | 47.07 |
MVP runtime | 0.41 | 59.73 | 513.84 | - | - | 0.41 | 55.73 | 482.51 | - | - |
Alg 1 runtime | 0.07 | 2.08 | 10.26 | 366.88 | 1712.52 | 0.06 | 2.08 | 9.75 | 359.95 | 1704.02 |
Barabási-Albert | |||||
---|---|---|---|---|---|
100 | 500 | 1000 | 5000 | 10000 | |
196 | 996 | 1996 | 9996 | 19996 | |
25.27 | 53.83 | 76.53 | 188.20 | 257.40 | |
2.13 | 2.67 | 2.87 | 3.34 | 3.53 | |
diam | 5.57 | 7.03 | 7.27 | 8.53 | 9.00 |
diam centrality | 46.17 | 120.10 | 179.30 | 382.57 | 553.27 |
path length | 3.87 | 4.67 | 5.07 | 5.80 | 6.03 |
path centrality | 51.87 | 138.33 | 199.83 | 483.93 | 676.47 |
MVP runtime | 0.36 | 45.92 | 381.45 | - | - |
Alg 1 runtime | 0.07 | 2.31 | 13.46 | 713.06 | 3792.77 |
Kreb | Dolphins | Sandi Auths | IEEE Bus | Santa Fe | US Air 97 | Bus | Cerevisiae | ||
62 | 62 | 86 | 118 | 118 | 332 | 662 | 1133 | 1458 | |
153 | 159 | 124 | 179 | 200 | 2126 | 906 | 5451 | 1948 | |
22 | 12 | 12 | 9 | 29 | 139 | 9 | 71 | 56 | |
1.95 | 2.94 | 1.36 | 2.26 | 1.51 | 5.55 | 2.44 | 6.73 | 2.57 | |
diam | 5 | 8 | 11 | 14 | 12 | 6 | 25 | 8 | 19 |
diam centrality | 38 | 30 | 32 | 32 | 90 | 167 | 45 | 159 | 57 |
path length | 3 | 6 | 7 | 8 | 10 | 3 | 20 | 4 | 7 |
path centrality | 40 | 36 | 34 | 33 | 92 | 206 | 50 | 187 | 156 |
MVP runtime | 0.09 | 0.11 | 0.25 | 0.73 | 0.66 | 21.96 | 147.03 | 807.92 | 1317.34 |
Alg 1 runtime | 0.03 | 0.03 | 0.04 | 0.09 | 0.09 | 2.17 | 3.88 | 26.53 | 22.89 |
Copenhagen calls | Copenhagen SMS | |||||||
weighted | no | yes | no | yes | no | yes | no | yes |
directed | yes | yes | no | no | yes | yes | no | no |
536 | 3212 | 536 | 3515 | 568 | 23598 | 568 | 24204 | |
924 | 3600 | 621 | 3600 | 1303 | 24333 | 697 | 24333 | |
18 | 18 | 18 | 18 | 11 | 11 | 11 | 11 | |
1.43 | 1.35 | 1.79 | 1.50 | 1.99 | 1.30 | 2.12 | 1.32 | |
diam | 21 | 75 | 22 | 197 | 22 | 1783 | 20 | 3781 |
diam nodes traversed | 22 | 25 | 23 | 15 | 23 | 10 | 21 | 12 |
diam centrality | 61 | 25 | 41 | 42 | 30 | 22 | 42 | 30 |
path length | 16 | 22 | 15 | 24 | 18 | 61 | 10 | 67 |
path nodes traversed | 17 | 15 | 16 | 15 | 19 | 20 | 11 | 20 |
path centrality | 63 | 52 | 66 | 63 | 49 | 59 | 50 | 64 |
Alg 1 runtime | 0.40 | 4.31 | 0.81 | 4.53 | 1.10 | 74.19 | 1.33 | 96.71 |
Bitcoin Alpha | Bitcoin OTC | Advogato | ||||
weighted | no | yes | no | yes | no | yes |
3783 | 281050 | 5881 | 397821 | 6539 | 167369 | |
24186 | 301453 | 35592 | 427532 | 51127 | 211957 | |
511 | 511 | 795 | 795 | 805 | 805 | |
9.50 | 2.68 | 10.74 | 19.18 | 8.54 | 1.83 | |
diam | 10 | 107 | 11 | 107 | 11 | 35 |
diam nodes traversed | 11 | 12 | 12 | 12 | 12 | 7 |
diam centrality | 689 | 266 | 971 | 299 | 318 | 846 |
path length | 5 | 39 | 4 | 31 | 4 | 11 |
path nodes traversed | 6 | 9 | 5 | 12 | 5 | 4 |
path centrality | 865 | 891 | 1349 | 1433 | 1318 | 1165 |
Alg 1 runtime | 1134.96 | 20996.75 | 4189.95 | 58050.34 | 1345.74 | 24927.09 |
4.2 Discussion and insights
We observe that using Algorithm 1, we can solve problems much more efficiently than using the MVP algorithm. For instance, the Watts-Strogatz instances with 1000 nodes, which took around 8 minutes to solve, can now be solved in roughly 10 seconds. Problems involving 5000 and 10000 nodes can now be solved within a fifth and half of an hour, respectively. A similar observation can be made for the Barabási-Albert graphs. We did not consider solving larger instances with the MVP algorithm, as one 5000-node instance took over 13 hours to solve. This is consistent with the expected runtime based on the difference between solving 100 and 500-node instances. For real-world instances, the improvement in runtime is also significant. For example, the Cerevisiae instance no longer requires over twenty minutes to solve; it can be solved in just over twenty seconds. We also note that in practice, Algorithm 1 scales much more efficiently than stated in Theorem 1 because, for most vertices and paths , the size of and is small as the number of immediate neighbours is smaller than the maximum degree of the graph . We also observe the difference in running time over the two types of synthetic instances. Although the number of vertices and edges in both sets of graphs are nearly identical, the differences in running times for both algorithms on both sets of graphs are pronounced. Algorithm 1 returns a solution faster in Watts-Strogatz graphs than Barabási-Albert graphs of similar sizes, while the converse is true for Algorithm MVP. This is consistent with the theoretical running times of the MVP algorithm, and Theorem 1 as Watts-Strogatz graphs have smaller maximum degrees but larger diameters than Barabási-Albert graphs.
We also observe that the weighted case is much harder to solve. However, as mentioned earlier, we see that the ratio of the running times between the weighted and unweighted case is not as implied by the analysis. It is larger. This is because the work done in each iteration to compute and compare is not the same across both cases. More work needs to be done in the weighted case as we need to check the type of node to which we are extending the path. Table 4 also offers insight into the algorithm’s performance on directed and undirected graphs. In order to model the undirected edges, we use two directed edges. This means that although undirected graphs have fewer edges, the algorithm is faster on directed graphs as there is not necessarily bidirectional flow between all pairs of connected vertices. In addition, we can see that the centrality of the optimal path is different when weights are considered. There is no guarantee on whether this value is higher or lower. Hence, there is a need to compare the cases separately depending on the problem instance.
We finally note that although in our experiments, both Algorithm MVP and Algorithm 1 were run sequentially over all starting vertices, both could be run in parallel for an improved runtime.
5 Other measures of centrality
As mentioned in the introduction, other centrality measures can also be applied. In this section, we consider two alternative metrics: the betweenness centrality and the closeness centrality.
5.1 Betweenness centrality
We first consider the problem of finding the most betweenness-central shortest path, which we define as follows.
For a path , let be the number of shortest paths between any pair of nodes not on that traverse through at least one node in , with
where is the number of shortest (geodesic) paths between nodes and that traverse through at least one node in . Although this definition is different to the one provided by Everett and Borgatti, (1999), who define the betweenness centrality for a set of nodes as
where is the number of geodesics between and , we disregard the term in the denominator so that this becomes a counting function. If a normalised measure is desired, one may divide by to yield a similar result.
It then follows that the problem of finding the most betweenness-central shortest path becomes
(2) |
To solve this problem, we rely on the following lemma:
Lemma 3.
If is a most betweenness-central shortest path from to , then is a most betweenness-central shortest path from to .
Proof.
By Lemma 1, it is clear that if is a shortest path from to then is a shortest path from to .
Consider the graph in Figure 4 where the nodes have been arranged from left to right in terms of distance from . This implies that a shortest path from to any other node must travel from left to right and cannot contain a vertical edge. Additionally, assume there is a path which is a shortest path from to , and for the sake of argument, is another shortest path from to .
Given that is the most betweenness-central shortest path between and , removing from the end of the path will update the betweenness centralilty score by adding paths that end at and traverse through at least one of the nodes in and subtracting paths that only traverse through and none of the other nodes in the remaining path. The number of paths subtracted in this case is fixed regardless of whether or was the most betweenness-central shortest path between and . This, in turn, implies that for to be the most betweenness-central shortest path, must be the most betweenness-central shortest path between and as this path already accounts for shortest paths that end in that traverse through at least one node in the path. ∎
Lemma 3 implies that an algorithm similar to Algorithm 1 can be applied to solve Problem 2; however, some simplifications can be made. Comparing Lemma 2 and Lemma 3, we only need to extend the existing most central shortest paths by one node rather than two nodes. This means that the nested for-loops in lines 16-20 and 23-27 can be removed and replaced with an if-statement that checks whether is larger than . As can be computed in polynomial time, we arrive at the following theorem.
Theorem 3.
Problem (2) is solvable in polynomial time.
Furthermore, solving Problem (2) with weighted graphs is easier than solving Problem (1). By Lemma 3, we only need to look back to the preceding node, and no distance comparison is required. As a result, Lemma 3 also holds for graphs with positively weighted edges. This means that the algorithm described above can be further adapted to handle graphs with positively weighted edges by updating the new distance with in line 9, dropping lines 10-12 and using a priority queue instead of a standard queue, which is more akin to Dijkstra’s algorithm (Dijkstra,, 1959) than the standard BFS procedure. This leads to the following result.
Corollary 2.
Problem (2) on a graph with positively weighted edges is solvable in polynomial time.
5.2 Closeness centrality
We now consider the problem of finding the most closeness-central shortest path, where the closeness centrality of a path is defined below.
Let be the shortest distance from a node to the path ,
where is the length of the shortest path from to . To compute the closeness centrality for path , we take the maximum closeness centrality scores over all nodes that are not on
This leads to the problem of finding the most closeness-central shortest path:
(3) |
Although other aggregation functions can be used to define a path’s closeness centrality (see Vogiatzis et al., (2014), where the authors study the sum and maximum of the closest distances), we chose this definition as it represents the furthest distance one must travel to reach the path . For example, this could represent the largest number of towns any commuter in a region must travel through to reach the closest train station. Even with the choice of aggregation functions that make this objective easy to work with, we show that the problem is as hard as the Satisfiability (SAT) problem, which is known to be NP-complete (Garey and Johnson,, 1979).
SAT | |
---|---|
Instance: | Set of variables, collection of clauses over . |
Question: | Is there a satisfying truth assignment for ? |
We define the corresponding decision version of Problem (3) (MCCSP-D) below.
MCCSP-D | |
---|---|
Instance: | A graph and a positive integer . |
Question: | Is there a shortest path between two vertices in such that its closeness centrality is less than ? |
Theorem 4.
MCCSP-D is NP-complete.
Proof.
MCCSP-D is in NP because all pairwise distances in can be computed in polynomial time. Hence, computing the closeness centrality of a path can also be completed in polynomial time.
Given an instance of SAT, we construct a graph in the following manner. Similar to the proof of Theorem 2, we start by constructing a graph with nodes and directed edges. Again, we start with two unlinked paths and both of length . For , we add edges and to connect these two paths. To complete , we add in two nodes, and , connected to the existing graph with the following edges , , and . Finally, to obtain , we start with , and for each clause , we add the following nodes and edges. For each variable in clause , where , we construct an undirected path of length . Overall, has nodes and directed and undirected edges, where is the length of the formula, i.e., the number of variables in the clauses. Comparing this graph to the one used in the proof of Theorem 2, there is only one copy of the source and sink nodes and , and the paths from to consists of unit-weighted edges rather than a single edge with weight .
We now show that a solution to SAT exists if and only if there exists a shortest path in with a closeness centrality less than or equal to .
Given a solution to SAT, the corresponding path in will have a closeness centrality of exactly (or 1 if the instance of SAT contains no clauses). This is because any nodes that are associated with unsatisfied clauses will be at away from the shortest path.
Given a solution to MCCSP-D with a closeness centrality less than or equal , we have the following cases:
-
1.
The path starts at and ends at . Then, removing and yields the corresponding satisfying truth assignment for .
-
2.
The path starts at or and ends at or . Then, any arbitrary extension of the path through any nodes in such that the path starts at and ends at leads to the first case.
-
3.
The path starts and/or ends at one of the nodes in but not in . Then, removing the nodes only in and arbitrarily extending the path such that it starts at and ends at again leads to the first case. Note that if the path only traverses nodes corresponding to one clause, then the mapping to SAT is trivial.
As these transformations can be computed in polynomial time, MCCSP-D is NP-complete. ∎
Corollary 3.
Problem (3) is NP-hard.
5.3 Discussion and insights
We summarise the complexity status of the problems considered in this paper in Table 6.
Centrality measure | Unweighted graph | Weighted graph |
---|---|---|
Betweenness | P | P |
Degree | P | NP-hard |
Closeness | NP-hard | NP-hard |
Comparing the three centrality measures analysed in this paper, we observe that the complexity of the problem of finding the most central shortest path depends on the measure of centrality employed. We see that the problem of finding the most betweenness-central shortest path is the easiest as it is solvable in polynomial time for both unweighted and weighted graphs. On the other hand, the problem of finding the most degree-central shortest path can be solved in polynomial time on unweighted graphs but is NP-hard on weighted graphs. In contrast, the problem of finding the most closeness-central shortest path is NP-hard on both unweighted and weighted graphs. We hypothesise that this relationship arises as betweenness centrality is distance-invariant, i.e., we are just counting the number of shortest paths. Similarly, finding the most degree-central shortest path in an unweighted graph can be computed in polynomial time because optimal paths of length can be obtained by extending optimal paths of length . However, when weights are added to the edges, the result no longer holds, and the problem becomes NP-hard. Unlike these two metrics, closeness centrality has a built-in distance measure, and hence the problem of finding the most closeness-central shortest path is NP-hard regardless of whether the graph has weighted edges or not.
6 Concluding remarks
In this paper, we consider the problem of finding the most central shortest path in a graph. We show that the problem of finding the most degree-central shortest path on a simple graph can be solved in polynomial time by extending the BFS algorithm. This new algorithm is more efficient than the one proposed by Matsypura et al., (2023). Our numerical tests on synthetic and real-world graph instances demonstrate significant improvement. In addition to this, we show that the above problem with a weighted graph is NP-hard, even with two distinct weights, and we provide a pseudo-polynomial time extension of our algorithm for the integer weighted problem, as well as a polynomial time solution to the case with continuous weights.
Furthermore, we show that finding the most betweenness-central shortest path can be completed in polynomial time regardless of whether the graph has weighted edges or not. In contrast, the problem of finding the most closeness-central shortest path is NP-hard, irrespective of whether the graph is weighted or not. Future work in this area may focus on studying the problem of finding the most closeness-central shortest path with other aggregation functions or investigate heuristics and approximation schemes for problems established to be NP-hard in this paper.
References
- Barabási and Albert, (1999) Barabási, A.-L. and Albert, R. (1999). Emergence of scaling in random networks. Science (American Association for the Advancement of Science), 286(5439):509–512.
- Batagelj and Mrvar, (2006) Batagelj, V. and Mrvar, A. (2006). Pajek datasets.
- Camur and Vogiatzis, (2024) Camur, M. C. and Vogiatzis, C. (2024). A survey on optimization studies of group centrality metrics.
- Davis and Hu, (2011) Davis, T. A. and Hu, Y. (2011). The University of Florida Sparse Matrix Collection. ACM Transactions on Mathematical Software (TOMS), 38(1).
- de Sá et al., (2015) de Sá, E. M., Contreras, I., Cordeau, J.-F., de Camargo, R. S., and de Miranda, G. (2015). The hub line location problem. Transportation Science, 49(3):500–518.
- Dijkstra, (1959) Dijkstra, E. W. (1959). A note on two problems in connexion with graphs. Numerische Mathematik, 1(1):269–271.
- Everett and Borgatti, (1999) Everett, M. G. and Borgatti, S. P. (1999). The centrality of groups and classes. The Journal of Mathematical Sociology, 23(3):181–201.
- Garey and Johnson, (1979) Garey, M. R. and Johnson, D. S. (1979). Computers and intractability: a guide to the theory of NP-completeness. A Series of books in the mathematical sciences. W. H. Freeman, San Francisco.
- Girvan and Newman, (2002) Girvan, M. and Newman, M. E. (2002). Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12):7821–7826.
- Guimera et al., (2003) Guimera, R., Danon, L., Diaz-Guilera, A., Giralt, F., and Arenas, A. (2003). Self-similar community structure in a network of human interactions. Physical Review E, 68(6):065103.
- Krebs, (2002) Krebs, V. (2002). Uncloaking terrorist networks. First Monday, 7(4).
- Krentel, (1988) Krentel, M. W. (1988). The complexity of optimization problems. Journal of Computer and System Sciences, 36(3):490–509.
- Kumar et al., (2016) Kumar, S., Spezzano, F., Subrahmanian, V., and Faloutsos, C. (2016). Edge weight prediction in weighted signed networks. In 2016 IEEE 16th International Conference on Data Mining (ICDM), pages 221–230.
- Lusseau et al., (2003) Lusseau, D., Schneider, K., Boisseau, O. J., Haase, P., Slooten, E., and Dawson, S. M. (2003). The bottlenose dolphin community of doubtful sound features a large proportion of long-lasting associations. Behavioral Ecology and Sociobiology, 54(4):396–405.
- Massa et al., (2009) Massa, P., Salvetti, M., and Tomasoni, D. (2009). Bowling alone and trust decline in social network sites. In 2009 Eighth IEEE International Conference on Dependable, Autonomic and Secure Computing, pages 658–663. IEEE.
- Matsypura et al., (2023) Matsypura, D., Veremyev, A., Pasiliao, E. L., and Prokopyev, O. A. (2023). Finding the most degree-central walks and paths in a graph: Exact and heuristic approaches. European Journal of Operational Research, 308(3):1021–1036.
- Peng et al., (2018) Peng, S., Zhou, Y., Cao, L., Yu, S., Niu, J., and Jia, W. (2018). Influence analysis in social networks: A survey. Journal of Network and Computer Applications, 106:17–32.
- Rossi and Ahmed, (2015) Rossi, R. and Ahmed, N. (2015). The network data repository with interactive graph analytics and visualization. Proceedings of the AAAI Conference on Artificial Intelligence, 29(1).
- Sapiezynski et al., (2019) Sapiezynski, P., Stopczynski, A., Lassen, D. D., and Lehmann, S. (2019). Interaction data from the Copenhagen Networks Study. Scientific data, 6(1):315–10.
- Vogiatzis and Camur, (2019) Vogiatzis, C. and Camur, M. C. (2019). Identification of essential proteins using induced stars in protein–protein interaction networks. INFORMS Journal on Computing, 31(4):703–718.
- Vogiatzis et al., (2014) Vogiatzis, C., Veremyev, A., Pasiliao, E., and Pardalos, P. (2014). An integer programming approach for finding the most and the least central cliques. Optimization Letters, 9:1–19.
- Watts and Strogatz, (1998) Watts, D. J. and Strogatz, S. H. (1998). Collective dynamics of ‘small-world’ networks. Nature (London), 393(6684):440–442.
Appendix A Additional tables
Watts–Strogatz (4, 0.1) | |||||
---|---|---|---|---|---|
100 | 500 | 1000 | 5000 | 10000 | |
200 | 1000 | 2000 | 10000 | 20000 | |
12388.6 | 398706.1 | 1766514.3 | 54612545.7 | 239916860.4 | |
5050 | 125250 | 500500 | 12502500 | 50005000 | |
2.45 | 3.18 | 3.53 | 4.37 | 4.80 |
Watts–Strogatz (4, 0.2) | |||||
---|---|---|---|---|---|
100 | 500 | 1000 | 5000 | 10000 | |
200 | 1000 | 2000 | 10000 | 20000 | |
10314.1 | 282350.7 | 1178139.3 | 32168149.2 | 133122135.8 | |
5050 | 125250 | 500500 | 12502500 | 50005000 | |
2.04 | 2.25 | 2.35 | 2.57 | 2.66 |
Barabási-Albert | |||||
---|---|---|---|---|---|
100 | 500 | 1000 | 5000 | 10000 | |
196 | 996 | 1996 | 9996 | 19996 | |
10750.7 | 334680.5 | 1435649.6 | 41703849.9 | 176462004.9 | |
5050 | 125250 | 500500 | 12502500 | 50005000 | |
2.13 | 2.67 | 2.87 | 3.34 | 3.53 |
Kreb | Dolphins | Sandi Auths | IEEE Bus | Santa Fe | US Air 97 | Bus | Cerevisiae | ||
62 | 62 | 86 | 118 | 118 | 332 | 662 | 1133 | 1458 | |
153 | 159 | 124 | 179 | 200 | 2126 | 906 | 5451 | 1948 | |
3807 | 5734 | 5071 | 15897 | 10613 | 306814 | 536222 | 4321934 | 2738552 | |
1953 | 1953 | 3741 | 7021 | 7021 | 55278 | 219453 | 642411 | 1063611 | |
1.95 | 2.94 | 1.36 | 2.26 | 1.51 | 5.55 | 2.44 | 6.73 | 2.57 |
Copenhagen calls | Copenhagen SMS | |||||||
weighted | no | yes | no | yes | no | yes | no | yes |
directed | yes | yes | no | no | yes | yes | no | no |
536 | 3212 | 536 | 3515 | 568 | 23598 | 568 | 24204 | |
924 | 3600 | 621 | 3600 | 1303 | 24333 | 697 | 24333 | |
73136 | 69157 | 109296 | 91811 | 364599 | 239031 | 222517 | 138371 | |
51224 | 51224 | 61040 | 61040 | 183411 | 183411 | 104869 | 104869 | |
1.43 | 1.35 | 1.79 | 1.50 | 1.99 | 1.30 | 2.12 | 1.32 |
Bitcoin Alpha | Bitcoin OTC | Advogato | ||||
weighted | no | yes | no | yes | no | yes |
3783 | 281050 | 5881 | 397821 | 6539 | 167369 | |
24186 | 301453 | 35592 | 427532 | 51127 | 211957 | |
116049178 | 32751385 | 297445256 | 531265752 | 138259270 | 29573473 | |
12215353 | 12215353 | 27696379 | 27696379 | 16190692 | 16190692 | |
9.50 | 2.68 | 10.74 | 19.18 | 8.54 | 1.83 |