Centrality of shortest paths: Algorithms and complexity results

Johnson Phosavanh The University of Sydney Dmytro Matsypura The University of Sydney

Abstract

The degree centrality of a node, defined as the number of nodes adjacent to it, is often used as a measure of importance of a node to the structure of a network. This metric can be extended to paths in a network, where the degree centrality of a path is defined as the number of nodes adjacent to it. In this paper, we reconsider the problem of finding the most degree-central shortest path in an unweighted network. We propose a polynomial algorithm with the worst-case running time of $O(|E||V|^{2}\Delta(G))$ , where $|V|$ is the number of vertices in the network, $|E|$ is the number of edges in the network, and $\Delta(G)$ is the maximum degree of the graph. We conduct a numerical study of our algorithm on synthetic and real-world networks and compare our results to the existing literature. In addition, we show that the same problem is NP-hard when a weighted graph is considered. Furthermore, we consider other centrality measures, such as the betweenness and closeness centrality, showing that the problem of finding the most betweenness-central shortest path is solvable in polynomial time and finding the most closeness-central shortest path is NP-hard, regardless of whether the graph is weighted or not.

1 Introduction

1.1 Motivation

The centrality of nodes in a network indicates the relative importance of a node in the network’s topology. For instance, in social networks, the more central a node is, the better information can spread from and through the node. Various metrics have been proposed to measure the centrality of nodes, such as the betweenness, closeness, degree, and eigenvector centralities, which all offer some insight into the influence of a node in how information propagates through a network. As proposed by Everett and Borgatti, (1999), some of these metrics can be extended to consider the centrality of a group of nodes. In this pioneering paper, the authors extend the definitions of degree, closeness, and betweenness centrality to groups of nodes to quantitatively study their influence. With these measures, a natural question arises: what is the most central set of nodes in a network with some special structure? Optimisation problems with this structure were first proposed by Vogiatzis et al., (2014), who first studied the problem of finding the most and the least central cliques in a graph. In this paper, the authors formulate linear 0-1 integer programs and examine their performance on real-world and synthetic instances. Vogiatzis and Camur, (2019) later studied a similar problem of finding the most degree-central induced star graph. They proposed an integer program and heuristic greedy algorithms and applied these to find the most essential proteins in protein-protein interaction networks.

Later, Matsypura et al., (2023) investigated the problem of finding the most degree-central walks in a graph. The authors consider problems of finding the most degree-central walk, trail, path, induced path, and shortest path. They show that the problems are nested in the sense that a shortest path is an instance of an induced path, an induced path is an instance of a simple path, which is an instance of a trail, which, in turn, is an instance of a walk. Further, all the problems considered in that study were shown to be NP-hard, except for the problem of finding the most degree-central shortest path, which can be solved in polynomial time using an algorithm based on the breadth-first search technique. We will refer to this algorithm as the MVP algorithm. For all the remaining problems, the authors developed mixed-integer programs and proposed heuristic solutions as a warm start to improve performance. For a more thorough review of group centrality optimisation problems, we refer the reader to a recent survey by Camur and Vogiatzis, (2024).

In this paper, we extend the work of Matsypura et al., (2023) and focus on centrality of shortest paths. We first develop an efficient polynomial algorithm to find the most degree-central shortest path. Our computational experiments demonstrate the advantage of the proposed algorithm relative to the MVP. We also show that the same problem on a weighted graph is NP-hard even in the presence of two distinct weights. We then study the problem of finding the most betweenness-central shortest path and show that it is solvable in polynomial time for both weighted and unweighted cases. Finally, we prove that the problem of finding the most closeness-central shortest path is NP-hard regardless of whether the weights are present.

The problems under consideration have many applications, including the design of distribution channels and social network analysis with security implications. In the context of distribution network design, a planner may be interested in constructing a hub line network by connecting cities in a single path whilst maximising the network’s reach (de Sá et al.,, 2015). This model can be used to determine routes for efficient and direct high-capacity transport systems such as rail, which are supplemented by trucks and drones for last-mile shipping. During natural disasters, authorities may use this model to consider prioritising locations to set up evacuation centres by maximising the number of affected communities that can be reached most efficiently. In social networks, by considering each user as a node, we can use this model to find the most influential users that can be reached most efficiently (Peng et al.,, 2018), which, from an adversarial player point of view, can be used with significant effect when information and software with malicious intent are distributed.

1.2 Problem definition

Let $G=(V,E)$ be an unweighted (possibly directed) graph with a set of vertices $V$ and a set of edges $E$ . Following the definitions provided by Matsypura et al., (2023), we use $P$ to denote a path, i.e., a finite sequence of distinct adjacent vertices in $G$ , i.e., $P:=\langle i_{1},i_{2},\dots,i_{n}\rangle$ . We use $\mathcal{N}(P)$ to denote the neighbourhood of path $P$ , where $\mathcal{N}(P)$ is the set of vertices adjacent to $P$ , excluding the vertices of $P$ itself, i.e.,

\mathcal{N}(P):=\{v:(u,v)\in E,u\in P\}\setminus P.

Further, we define the centrality of path $P$ , $C(P):=|\mathcal{N}(P)|$ . Note that when the path consists of a single node, this reduces to the traditional definition of the degree centrality of a node. We also denote by $\mathcal{SP}(G)$ the set of all shortest paths between any pair of nodes in $G$ .

Given a graph $G$ , we aim to solve the following problem:

\displaystyle\max\{C(P):P\in\mathcal{SP}(G)\},

(1)

i.e., find a path $P$ with the largest degree centrality under the condition that $P$ is a shortest path between a pair of nodes in $G$ .

1.3 Structure of the paper

The remaining paper is structured as follows: in Section 2, we present a polynomial time algorithm that can be used to solve Problem (1). In Section 3, we extend our analysis to the case of graphs with weighted edges and show that the problem becomes NP-hard even with just two distinct weights. This is followed by Section 4, where we conduct an extensive numerical study of the proposed algorithm on synthetic and real-world graph instances and provide insights into the results. In Section 5, we consider two other measures of centrality, the betweenness and closeness centrality, and prove the complexity status of finding the most central shortest paths with these metrics. Finally, in Section 6, we provide concluding remarks.

2 Algorithm

Before formally introducing the algorithm, we prove the following lemma that underpins many efficient algorithms for solving problems involving shortest paths.

Lemma 1.

If $\langle s,p_{1},\ldots,p_{k},t\rangle$ is a shortest path from $s$ to $t$ , then $\langle s,p_{1},\ldots,p_{k}\rangle$ is a shortest path from $s$ to $p_{k}$ .

Proof.

Suppose there is a shorter path from $s$ to $p_{k}$ , and let this be $\langle s,\tilde{p}_{1},\ldots,\tilde{p}_{j},p_{k}\rangle$ . This implies that $\langle s,\tilde{p}_{1},\ldots,\tilde{p}_{j},p_{k},t\rangle$ would be a shorter path from $s$ to $t$ . However, this cannot be true as $\langle s,p_{1},\ldots,p_{k},t\rangle$ is a shortest path from $s$ to $t$ . ∎

We now extend the lemma above to consider the degree centrality of the path.

Lemma 2.

If $\langle s,p_{1},\ldots,p_{k-1},p_{k},t\rangle$ is a most degree-central shortest path from $s$ to $t$ , then $\langle s,p_{1},\ldots,p_{k-1}\rangle$ is a most degree-central shortest path from $s$ to $p_{k-1}$ .

Proof.

First, observe that vertex $t$ cannot share any neighbours with $\langle s,p_{1},\ldots,p_{k-2}\rangle$ . To see this, assume that vertex $p_{i}$ on the path, where $i\leq k-2$ , shares a neighbour $v$ with vertex $t$ . Then $\langle s,p_{1},\ldots,p_{k-1},p_{k},t\rangle$ cannot possibly be the most degree-central shortest path between $s$ and $t$ as $\langle s,p_{1},\ldots,p_{i},v,t\rangle$ would be a shorter path.

Now consider the graph in Figure 1 where vertex $t$ does not share any neighbours with any node on path $\langle s,p_{1},\ldots,p_{k-2}\rangle$ , but with $p_{k-1}$ and $p_{k}$ .

Figure 1: Illustration of Lemma 2. The path

\langle s,p_{1},\ldots,p_{k-1}\rangle

must be a most degree-central shortest path between

s

and

p_{k-1}

if the path

\langle s,p_{1},\ldots,p_{k},t\rangle

is a most degree-central shortest path between

s

and

t

From the diagram, it is clear that if $\langle s,p_{1},\ldots,p_{k-1},p_{k},t\rangle$ is a most degree-central shortest path from $s$ to $t$ , then $\langle s,p_{1},\ldots,p_{k-1}\rangle$ is a most degree-central shortest path from $s$ to $p_{k-1}$ . To see this, suppose that there is an alternate shortest path $\langle s,\tilde{p}_{1},\ldots,\tilde{p}_{k-2},p_{k-1}\rangle$ to vertex $p_{k-1}$ that shares no neighbours with vertex $t$ and has a higher centrality. In this case, path $\langle s,\tilde{p}_{1},\ldots,\tilde{p}_{k-2},p_{k-1},p_{k},t\rangle$ , would have a centrality greater than that of path $\langle s,p_{1},\ldots,p_{k-1},p_{k},t\rangle$ as vertices $p_{k}$ and $t$ contribute the same vertices to the neighbourhoods of both paths. ∎

We now introduce the following notation to simplify the exposition. First, consider a source node $s$ . Let $d_{u}$ and $P_{u}$ be the shortest distance and most degree-central shortest path between nodes $s$ and $u$ , respectively, and let $\mathcal{P}_{u}$ be the set of vertices that can directly precede vertex $u$ on any shortest path from $s$ to $u$ . Additionally, we introduce the operation $\langle P,u,v\rangle$ , which we use to denote extending the path $P$ to vertices $u$ , then $v$ .

To solve Problem (1), we extend the breadth-first search (BFS) algorithm. The BFS algorithm can be used to traverse a graph from a source node $s$ to any other node in the network by iteratively extending paths already constructed in previous iterations. Specifically, we begin with the source node $s$ and, at each iteration of the algorithm, consider a queue of vertices $Q$ such that $q\in Q$ , if a path from $s$ to $q$ has been established in a previous iteration and $q$ hasn’t been processed yet. Then, we pop the first element in $Q$ , denoting it $u$ , and consider the set $\{(u,v):(u,v)\in E\}$ , i.e., edges that can be connected to $u$ . We then check if $v$ has been processed in a previous iteration, and if it has not, we extend the path to $v$ and add $v$ to the end of $Q$ (this is the traditional BFS algorithm). Note that this procedure also identifies the shortest path from a source node $s$ to all other nodes in an unweighted graph. To simplify the explanation of our algorithm, we adopt some aspects of Dijkstra’s algorithm (Dijkstra,, 1959), which extends the BFS procedure to find shortest paths in weighted graphs. Instead of checking if a node has been reached, we record the shortest distance from a source node $s$ to all other vertices in the graph. When we extend a path, we compare the length of the previously calculated shortest path from $s$ to $v$ and a new path from $s$ to $u$ and then to $v$ and take the shorter path. Then, to obtain the most degree-central shortest path, we modify the algorithm in the following manner: for each vertex $v$ , we place all nodes that can directly precede $v$ on the shortest path from $s$ to $v$ in the set $\mathcal{P}_{v}$ . Each time we consider extending the shortest path $\langle s,\dots,u\rangle$ to $v$ , we compare the neighbourhood of the existing most degree-central shortest path from $s$ to $v$ against the neighbourhoods of paths obtained by extending the most degree-central shortest path to $w$ , $P_{w}$ , to $u$ then $v$ for all $w\in\mathcal{P}_{u}$ . Furthermore, instead of only considering the case where a path is shorter than a previous path in the traditional Dijkstra’s algorithm, we also consider tied shortest paths as we need to compute and compare the centrality to ensure the most degree-central shortest path is found.

This is summarised in Algorithm 1.

Algorithm 1 Finding the most degree-central shortest path

1:Graph

G=(V,E)

, starting vertex

s

2:Most degree-central shortest paths

P_{v}

from node

s

v

for

v\in V

3:for

v\in V

d_{v}\leftarrow+\infty

P_{v}\leftarrow\text{undefined}

\mathcal{P}_{v}=\emptyset

5:end for

P_{s}\leftarrow\langle s\rangle

7:Insert

(0,s)

to the queue

Q

8:while

Q

not empty do

(d_{u},u)\leftarrow

pop next element of

Q

10: for

v\in\{v^{\prime}\in V:(u,v^{\prime})\in E

} do

11:

d_{\text{new}}\leftarrow d_{u}+1

12: if

d_{\text{new}}=1

then

13: Insert

(d_{\text{new}},v)

to the queue

Q

14:

d_{v}\leftarrow d_{\text{new}}

P_{v}=\langle s,v\rangle

\mathcal{P}_{v}\leftarrow\{s\}

15: else if

d_{\text{new}}<d_{v}

then

16: Insert

(d_{\text{new}},v)

to the queue

Q

17:

d_{v}\leftarrow d_{\text{new}}

\mathcal{P}_{v}\leftarrow\mathcal{P}_{v}\cup\{u\}

18: for

w\in\mathcal{P}_{u}

19: if

C(\langle P_{w},u,v\rangle)>C(P_{v})

then

20:

P_{v}\leftarrow\langle P_{w},u,v\rangle

21: end if

22: end for

23: else if

d_{\text{new}}=d_{v}

then

24:

\mathcal{P}_{v}\leftarrow\mathcal{P}_{v}\cup\{u\}

25: for

w\in\mathcal{P}_{u}

26: if

C(\langle P_{w},u,v\rangle)>C(P_{v})

then

27:

P_{v}\leftarrow\langle P_{w},u,v\rangle

28: end if

29: end for

30: end if

31: end for

32:end while

Proposition 1.

Given a graph $G=(V,E)$ and a source vertex $s\in V$ , Algorithm 1 returns the most degree-central shortest paths from $s$ to all other vertices in $V$ .

Proof.

We show this by induction.

Base case: The most degree-central shortest path to the source vertex $s$ is just $\langle s\rangle$ . Also, it is easy to see that any path connected in the first iteration of the while loop must be the shortest path from $s$ as it is directly connected to the source node. Furthermore, this will be the most degree-central path as there cannot be another path of the same or shorter length to this set of vertices.

Inductive step: Assume that in the $k$ th iteration of the while loop, the most degree-central shortest paths to all nodes in $S\subseteq V$ have been found.

In the $(k+1)$ th iteration, we consider node $u$ where $d_{u}$ is the smallest distance between the source vertex $s$ and all other previously connected nodes that have not yet been processed. The for-loop in line 8 allows us to branch to all adjacent nodes, while lines 10, 13 and 21 guarantee that only the shortest path will be recorded. The for-loops and if-statements in lines 16-20 and 23-27 then allow us to extend paths $P_{w}$ to $u$ then $v$ for all $w\in\mathcal{P}_{u}$ , which, by the inductive hypothesis and Lemma 2, will return the most degree-central shortest path from $s$ to $v$ that passes through vertex $u$ . After considering extending all shortest paths of length $d_{u}$ in subsequent iterations of the while-loop, and therefore considering all possible $u\in\mathcal{P}_{v}$ , we are guaranteed the most degree-central shortest path from $s$ to $v$ . ∎

Proposition 2.

Given a graph $G=(V,E)$ , the worst-case running time of Algorithm 1 is $O(|E||V|^{2})$ .

Proof.

Algorithm 1 is a modification of the BFS algorithm. The BFS algorithm has a runtime of $O(|V|+|E|)$ as we enumerate all the edges in the graph once in the outer while loop, and we add to the queue $|V|$ times to ensure all vertices are processed.

Assuming that $\mathcal{N}(P_{w})$ and $\mathcal{N}(P_{v})$ have been stored and can be easily accessed at each iteration of the while loop, computing $\mathcal{N}(\langle P_{w},u,v\rangle)$ in the inner for-loop for all possible $w$ ’s takes $O(|V|^{2})$ time in the worst case as we need to consider at most $|V|$ $P_{w}$ ’s, and computing $C(\langle P_{w},u,v\rangle)$ takes $O(|V|)$ time, which dominates the running time of the inner for-loop. This operation is completed at most $O(|E|)$ times as there are $|E|$ iterations of the while-loop to process all the edges. As a result, Algorithm 1 has runtime of $O(|E||V|^{2})$ . ∎

This bound can be tightened. Let $\Delta(G)$ be the maximum degree of graph $G$ . Then, the following is true.

Proposition 3.

Given a graph $G=(V,E)$ , the worst-case running time of Algorithm 1 is $O(|E||V|\Delta(G))$ .

Proof.

There are only $|\mathcal{P}_{u}|\leq\Delta(G)$ iterations of the inner for-loops in lines 16-20 and 23-27 as there are at most $\Delta(G)$ nodes that may directly precede vertex $u$ on a shortest path to vertex $u$ . As a result, one order of $|V|$ can be replaced with $\Delta(G)$ from the runtime quoted in Proposition 2. ∎

Theorem 1.

Problem (1) can be solved in $O(|E||V|^{2}\Delta(G))$ time.

Proof.

By Proposition 3, Algorithm 1 can be used to solve the problem of finding all most degree-central shortest paths from a given source node in $O(|E||V|\Delta(G))$ time. To solve Problem (1), we can apply Algorithm 1 over all vertices in the network, resulting in an overall running time of $O(|E||V|^{2}\Delta(G))$ . ∎

Comparing this to the running time of Algorithm MVP, which has a runtime of $O(k|V|^{6})\sim O(|V|^{7})$ , with $k$ being the diameter of the graph, Algorithm 1 has a runtime of $O(|E||V|^{2}\Delta(G))\sim O(|V|^{5})$ as $O(|E|)\leq O(|V|^{2})$ and $O(\Delta(G))\leq O(|V|)$ , which is an improvement of $O(|V|^{2})$ , and in addition, we only require $O(|V|^{2})$ space, rather than $O(|V|^{3})$ .

3 The case of weighted graphs

In this section, we examine solving Problem (1) on a graph with weighted edges. Hence, throughout this section, we assume that graph $G$ is weighted.

We first show that Problem (1) is NP-hard even with just two distinct weights by showing it is at least as hard as the Maximum Satisfiability (MaxSAT) problem, which is known to be NP-complete (Krentel,, 1988).

MaxSAT
Instance:	Set of $U$ variables, collection $C$ of clauses over $U$ and positive integer $k$ .
Question:	Is there a truth assignment for $U$ that simultaneously satisfies at least $k$ of the clauses in $C$ ?

Before we show the reduction, we formally define the decision variant of Problem (1) for a weighted graph.

MDCWSP-D
Instance:	A weighted graph $G$ and a positive integer $\alpha$ .
Question:	Is there a shortest path between two vertices in $G$ such that its degree centrality is greater than $\alpha$ ?

Theorem 2.

MDCWSP-D is NP-complete.

Proof.

First, observe that the degree centrality of a path can be computed in polynomial time; hence, MDCWSP-D is in NP.

Given an instance of MaxSAT, we can construct a directed weighted graph as follows. First, consider an unweighted graph $H$ with $2(|U|+2)$ nodes and $4(|U|+1)$ edges with the following structure: we start with two unlinked paths $\langle x_{1},\dots,x_{|U|}\rangle$ and $\langle\bar{x}_{1},\dots,\bar{x}_{|U|}\rangle$ both of length $|U|-1$ . For $i=1,\dots,|U|-1$ , we add additional edges $(x_{i},\bar{x}_{i+1})$ and $(\bar{x}_{i},x_{i+1})$ to $H$ . This means that the shortest path between any nodes $x_{i}$ or $\bar{x}_{i}$ and $x_{j}$ or $\bar{x}_{j}$ is of length $j-i$ for $i<j$ . Finally, to complete $H$ , we add four additional nodes $s$ , $\bar{s}$ that are both connected to $x_{1}$ and $\bar{x}_{1}$ , and $t$ and $\bar{t}$ that are connected to $x_{|U|}$ and $\bar{x}_{|U|}$ . Note that we connect the nodes such that the edges point towards $s$ , $\bar{s}$ $t$ and $\bar{t}$ . To obtain the weighted graph $G$ , we add vertices and edges to $H$ depending on the clauses. For each clause $c\in C$ , we add a vertex $y_{c}$ to $G$ and undirected edges $(z,y_{c})$ for each variable $z$ in clause $c$ , each with edge weight $|U|$ . Given these edge weights, any shortest path in $G$ cannot contain vertex $y_{c}$ in the middle of the path because an alternate shorter path will exist. Note: The additional edges can be directed; however, we set them to be undirected to amplify the difference between the unweighted and weighted settings.

An example of the structure of $G$ is given in Figure 2.

Figure 2: Example of a weighted graph

G

used in Theorem 2, with the blue vertices and edges indicating the initial unweighted graph

H

and the red vertices and edges indicating the additional vertices and edges with edge weight

|U|

We now show that there exists a truth assignment for $U$ with more than $k$ satisfiable clauses if and only if there exists a shortest path in $G$ with a degree of at least $|U|+k+4$ .

$\Longrightarrow$ If there is some truth assignment for $U$ with at least $k$ satisfiable clauses, then the corresponding path in $G$ will have at least $|U|+k+4$ neighbours by construction as we have $|U|$ neighbours from the False assignments and $k$ neighbours from the satisfied clauses, with an additional $4$ neighbours coming from the source and sink vertices.

$\Longleftarrow$ Suppose we are given a shortest path in $G$ with at least $|U|+k+4$ neighbours. We can transform this path into the form $\langle(x_{1}\oplus\bar{x}_{1}),\dots,(x_{|U|}\oplus\bar{x}_{|U|})\rangle\>$ , where $\oplus$ is the exclusive or operator, and maintain a degree centrality of at least $|U|+k+4$ . Consider the following cases:

•

If the given path ends at $s$ , $\bar{s}$ , $t$ or $\bar{t}$ , we remove this node, and the centrality will increase by one.
•

If the given path starts and ends in $H$ , for example, $\langle x_{i},\dots,x_{j}\rangle$ , then any arbitrary extension of the path so that it starts at $(x_{1}\oplus\bar{x}_{1})$ and ends at $(x_{|U|}\oplus\bar{x}_{|U|})$ will maintain or increase the centrality as the neighbourhood will contain the unselected nodes in $H$ .
•

If the path starts or ends at one of the additional nodes $y_{c}$ , for example, $\langle y_{c},x_{i},\dots,x_{j}\rangle$ , we can remove $y_{c}$ from the path and extend the path using the result from the above case, resulting in the path $\langle(x_{1}\oplus\bar{x}_{1}),\dots,(x_{i-1}\oplus\bar{x}_{i-1}),x_{i},\dots,x_{j},(x_{j+1}\oplus\bar{x}_{j+1}),\dots,(x_{|U|}\oplus\bar{x}_{|U|})\rangle$ . This new path will not have a centrality smaller than the original path as we will gain $y_{c}$ as a neighbour and the unselected nodes in $H$ .

In all the above cases, the new paths can be computed in polynomial time. The path returned will be a solution to MaxSAT because after the removal of the two start and end nodes, as well as the unvisited $|U|$ nodes in $H$ , the remaining $k$ nodes in the neighbourhood correspond to the satisfied clauses and the path corresponds to the truth assignment. ∎

Corollary 1.

Problem (1) with a weighted graph is NP-hard.

We now discuss two special cases of Problem (1) on a weighted graph and show how to solve them. The first is the case of positive integer-valued weights, and the second is the case where weights are generated from some positive continuous distribution.

Case 1: Although Dijkstra’s algorithm can be used to solve the problem of finding the shortest path in a graph with positive weighted edges, we note that Lemma 2 does not hold for weighted graphs. However, if the weights are positive integers, we can augment the network with auxiliary vertices and edges to represent the weighted edges. Then, Algorithm 1 can be applied to the augmented graph with a slight modification to ensure that the auxiliary nodes are not counted in the neighbourhood. We demonstrate the importance of this using the example in Figure 3. Given the weighted graph in Figure 3(a), the most degree-central shortest path is $P^{*}=\langle 1,2\rangle$ with $\mathcal{N}(P^{*})=\{3,4,5,6\}$ and $C(P^{*})=4$ . We now construct the augmented graph as shown in Figure 3(b). We label the auxiliary vertices using tuples $(u,v,i)$ to indicate that the new node is $i$ edges away from node $u$ on the original $(u,v)$ arc. If we were to naïvely apply Algorithm 1 on the augmented graph, the optimal path would be $\tilde{P}=\langle 1,2,(2,6,1),(2,6,2),6\rangle$ with $\mathcal{N}(\tilde{P})=\{(1,3,1),(1,4,1),(3,6,1),4,5\}$ and $C(\tilde{P})=5$ . However, the corresponding path without the auxiliary nodes $\langle 1,2,6\rangle$ only has the centrality of 3. This discrepancy arises because the auxiliary vertices introduced for the edges linking nodes 3 and 4 to the path should only contribute one neighbour to the neighbourhood.

(a) Original weighted graph (weights on edges).

(b) Augmented graph (auxiliary nodes and edges in red).

Figure 3: Augmenting a weighted graph.

To avoid the issue of double-counting nodes when computing the neighbourhood of a path, additional work needs to be done to consider the topology of the original graph. This will not worsen the asymptotic running time of Algorithm 1 as the additional work in lines 17 and 24 of the algorithm is checking what type of node we are processing. As a result, Algorithm 1 will have a running time of $O(w_{\text{sum}}|V|^{2}\Delta(G))$ , where $w_{\text{sum}}$ is the sum of all edge weights. Note that the running time is not $O(w_{\text{sum}}^{3}\Delta(G))$ as we only need to change the number of times the while-loop is executed. The remaining runtime remains the same as we only need to consider setting the starting vertex to be one that is in the original graph, and we only need to compute and compare the centrality of the paths when processing a vertex in the original graph. By applying Algorithm 1 on the augmented graph, the runtime is now pseudo-polynomial, as it depends on the sum of the edge weights.

Case 2: If the edge weights $w_{uv}$ are generated from some positive continuous distribution $W_{uv}$ for all $(u,v)\in E$ , the shortest path will be unique with probability 1 as $P(W_{uv}=w_{uv})=0$ for all $(u,v)\in E$ . In this case, the problem can be solved by first solving the all pairs shortest path problem and then evaluating and comparing the centralities of the paths. This can be done in $O(|V|^{3})$ time with the Floyd-Warshall algorithm.

4 Computational experiments

We tested Algorithm 1 on a large number of synthetic and real-world graphs. The experiments were conducted on a Mac Studio (2023) with Apple M2 Ultra and 128 GB of RAM running macOS Ventura 13.6. The algorithm was implemented in Python 3.9.

4.1 Data and results

The synthetic graphs were randomly generated using two procedures outlined below (each type of graph was generated 30 times for each parameter setting):

•

Watts–Strogatz graphs. We considered the ‘small-world’ network proposed by Watts and Strogatz, (1998), which represents networks that are “highly clustered, like regular lattices, yet have small characteristic path lengths”. We considered Watts–Strogatz graphs with 100, 500, 1000, 5000, and 10000 nodes. The number of initial neighbours was set to 4, and the rewiring probability was set to one of 0.1 or 0.2.
•

Barabási-Albert graphs. The ‘scale-free preferential attachment’ model proposed by Barabási and Albert, (1999) generates graphs with edges whose degree follows the power rule. We considered graphs with the same number of nodes as the Watts–Strogatz instances and set the number of edges attached to any new node to 2 to mirror the edge density of the Watts-Strogatz graphs.

In total, we generated and analysed 450 synthetic instances. We averaged the results for each parameter instance and summarised them in Tables 1 and 2.

To compare our results to the MVP algorithm, we also considered the following undirected real-world instances:

•

Krebs: The terrorist network of the 9/11 hijackers (Krebs,, 2002).
•

Dolphins: A network representing the social interactions of bottlenose dolphins (Lusseau et al.,, 2003).
•

Sandi Auths: An academic collaboration network (Rossi and Ahmed,, 2015).
•

IEEE Bus: A subnetwork of the US Electric Power System in 1962 (Davis and Hu,, 2011).
•

Santa Fe: A collaboration network for the Santa Fe Institute (Girvan and Newman,, 2002).
•

US Air 97: A transportation network of US Air in 1997 (Batagelj and Mrvar,, 2006; Davis and Hu,, 2011).
•

Bus: Bus power system network (Davis and Hu,, 2011).
•

Email: Network of e-mail interchanges (Guimera et al.,, 2003; Davis and Hu,, 2011).
•

Cerevisiae: Biological network of yeast protein-protein interactions (Davis and Hu,, 2011).

The results from these real-world graph instances are presented in Table 3.

In addition to the unweighted graphs, we also consider the following weighted graphs:

•

Copenhagen calls and Copenhagen SMS: Undirected and directed graphs with weights for the number of phone calls made and text messages sent between university students in Copenhagen over multiple weeks (Sapiezynski et al.,, 2019).
•

Bitcoin Alpha and Bitcoin OTC: Directed graphs of trust scores between users on the Bitcoin Alpha and Bitcoin OTC platforms (Kumar et al.,, 2016). Note that we have transformed the weights from a $[-10,10]$ range to $[1,21]$ to ensure positive weights.
•

Advogato: Directed graph of trust scores from users of Advogato, an open source software community (Massa et al.,, 2009). Similar to the Bitcoin graphs, the weights have been rescaled from having weights of $\{0.6,0.8,1.0\}$ to $\{3,4,5\}$ .

For these weighted graphs, we examine the performance of Algorithm 1 on the unweighted graph obtained by ignoring the weights and the augmented graph with the modified version of Algorithm 1. The results are presented in Tables 4 and 5. Note that these two tables also include rows for the number of nodes traversed so that comparisons can be made to the unweighted case.

In addition to the standard statistics, we report the size of the feasible set $|\mathcal{SP}(G)|$ , i.e., the number of shortest paths in $G$ , as well as $U(G)$ which is the number of pairwise connectable nodes in $G$ (i.e., the number of shortest paths if they were all unique), and finally the ratio $|\mathcal{SP}(G)|/U(G)$ which can be used to compare the size of the feasible sets. For the sake of brevity, the values of $|\mathcal{SP}(G)|$ and $U(G)$ can be found in the appendix.

	Watts–Strogatz (4, 0.1)					Watts–Strogatz (4, 0.2)
$\|V\|$	100	500	1000	5000	10000	100	500	1000	5000	10000
$\|E\|$	200	1000	2000	10000	20000	200	1000	2000	10000	20000
$\Delta(G)$	5.87	6.40	6.70	7.27	7.33	6.60	7.40	7.43	8.03	8.20
$\|\mathcal{SP}(G)\|/U(G)$	2.45	3.18	3.53	4.37	4.80	2.04	2.25	2.35	2.57	2.66
diam	10.43	15.40	17.73	22.33	24.50	8.43	11.50	12.97	16.13	17.47
diam centrality	22.23	31.20	35.37	42.53	45.20	21.67	28.80	30.60	37.73	40.17
path length	9.03	12.97	14.73	18.63	19.37	7.03	9.33	10.17	12.53	13.27
path centrality	23.17	33.87	37.60	46.90	50.40	23.33	31.73	35.17	43.77	47.07
MVP runtime	0.41	59.73	513.84	-	-	0.41	55.73	482.51	-	-
Alg 1 runtime	0.07	2.08	10.26	366.88	1712.52	0.06	2.08	9.75	359.95	1704.02

Table 1: Results for Watts–Strogatz graphs averaged over 30 instances.

	Barabási-Albert
$\|V\|$	100	500	1000	5000	10000
$\|E\|$	196	996	1996	9996	19996
$\Delta(G)$	25.27	53.83	76.53	188.20	257.40
$\|\mathcal{SP}(G)\|/U(G)$	2.13	2.67	2.87	3.34	3.53
diam	5.57	7.03	7.27	8.53	9.00
diam centrality	46.17	120.10	179.30	382.57	553.27
path length	3.87	4.67	5.07	5.80	6.03
path centrality	51.87	138.33	199.83	483.93	676.47
MVP runtime	0.36	45.92	381.45	-	-
Alg 1 runtime	0.07	2.31	13.46	713.06	3792.77

Table 2: Results for Barabási-Albert graphs averaged over 30 instances.

	Kreb	Dolphins	Sandi Auths	IEEE Bus	Santa Fe	US Air 97	Bus	Email	Cerevisiae
$\|V\|$	62	62	86	118	118	332	662	1133	1458
$\|E\|$	153	159	124	179	200	2126	906	5451	1948
$\Delta(G)$	22	12	12	9	29	139	9	71	56
$\|\mathcal{SP}(G)\|/U(G)$	1.95	2.94	1.36	2.26	1.51	5.55	2.44	6.73	2.57
diam	5	8	11	14	12	6	25	8	19
diam centrality	38	30	32	32	90	167	45	159	57
path length	3	6	7	8	10	3	20	4	7
path centrality	40	36	34	33	92	206	50	187	156
MVP runtime	0.09	0.11	0.25	0.73	0.66	21.96	147.03	807.92	1317.34
Alg 1 runtime	0.03	0.03	0.04	0.09	0.09	2.17	3.88	26.53	22.89

Table 3: Results for real-world instances.

	Copenhagen calls				Copenhagen SMS
weighted	no	yes	no	yes	no	yes	no	yes
directed	yes	yes	no	no	yes	yes	no	no
$\|V\|$	536	3212	536	3515	568	23598	568	24204
$\|E\|$	924	3600	621	3600	1303	24333	697	24333
$\Delta(G)$	18	18	18	18	11	11	11	11
$\|\mathcal{SP}(G)\|/U(G)$	1.43	1.35	1.79	1.50	1.99	1.30	2.12	1.32
diam	21	75	22	197	22	1783	20	3781
diam nodes traversed	22	25	23	15	23	10	21	12
diam centrality	61	25	41	42	30	22	42	30
path length	16	22	15	24	18	61	10	67
path nodes traversed	17	15	16	15	19	20	11	20
path centrality	63	52	66	63	49	59	50	64
Alg 1 runtime	0.40	4.31	0.81	4.53	1.10	74.19	1.33	96.71

Table 4: Results for Copenhagen calls and SMS graph instances. Note that

|E|=w_{\text{sum}}

for the augmented graphs.

	Bitcoin Alpha		Bitcoin OTC		Advogato
weighted	no	yes	no	yes	no	yes
$\|V\|$	3783	281050	5881	397821	6539	167369
$\|E\|$	24186	301453	35592	427532	51127	211957
$\Delta(G)$	511	511	795	795	805	805
$\|\mathcal{SP}(G)\|/U(G)$	9.50	2.68	10.74	19.18	8.54	1.83
diam	10	107	11	107	11	35
diam nodes traversed	11	12	12	12	12	7
diam centrality	689	266	971	299	318	846
path length	5	39	4	31	4	11
path nodes traversed	6	9	5	12	5	4
path centrality	865	891	1349	1433	1318	1165
Alg 1 runtime	1134.96	20996.75	4189.95	58050.34	1345.74	24927.09

Table 5: Results for trust graph instances. Note that

|E|=w_{\text{sum}}

for the augmented graphs.

4.2 Discussion and insights

We observe that using Algorithm 1, we can solve problems much more efficiently than using the MVP algorithm. For instance, the Watts-Strogatz instances with 1000 nodes, which took around 8 minutes to solve, can now be solved in roughly 10 seconds. Problems involving 5000 and 10000 nodes can now be solved within a fifth and half of an hour, respectively. A similar observation can be made for the Barabási-Albert graphs. We did not consider solving larger instances with the MVP algorithm, as one 5000-node instance took over 13 hours to solve. This is consistent with the expected runtime based on the difference between solving 100 and 500-node instances. For real-world instances, the improvement in runtime is also significant. For example, the Cerevisiae instance no longer requires over twenty minutes to solve; it can be solved in just over twenty seconds. We also note that in practice, Algorithm 1 scales much more efficiently than stated in Theorem 1 because, for most vertices $u$ and paths $P_{u}$ , the size of $\mathcal{P}_{u}$ and $\mathcal{N}(P_{u})$ is small as the number of immediate neighbours is smaller than the maximum degree of the graph $\Delta(G)$ . We also observe the difference in running time over the two types of synthetic instances. Although the number of vertices and edges in both sets of graphs are nearly identical, the differences in running times for both algorithms on both sets of graphs are pronounced. Algorithm 1 returns a solution faster in Watts-Strogatz graphs than Barabási-Albert graphs of similar sizes, while the converse is true for Algorithm MVP. This is consistent with the theoretical running times of the MVP algorithm, and Theorem 1 as Watts-Strogatz graphs have smaller maximum degrees but larger diameters than Barabási-Albert graphs.

We also observe that the weighted case is much harder to solve. However, as mentioned earlier, we see that the ratio of the running times between the weighted and unweighted case is not $w_{\text{sum}}/|E|$ as implied by the analysis. It is larger. This is because the work done in each iteration to compute and compare $C(P)$ is not the same across both cases. More work needs to be done in the weighted case as we need to check the type of node to which we are extending the path. Table 4 also offers insight into the algorithm’s performance on directed and undirected graphs. In order to model the undirected edges, we use two directed edges. This means that although undirected graphs have fewer edges, the algorithm is faster on directed graphs as there is not necessarily bidirectional flow between all pairs of connected vertices. In addition, we can see that the centrality of the optimal path is different when weights are considered. There is no guarantee on whether this value is higher or lower. Hence, there is a need to compare the cases separately depending on the problem instance.

We finally note that although in our experiments, both Algorithm MVP and Algorithm 1 were run sequentially over all starting vertices, both could be run in parallel for an improved runtime.

5 Other measures of centrality

As mentioned in the introduction, other centrality measures can also be applied. In this section, we consider two alternative metrics: the betweenness centrality and the closeness centrality.

5.1 Betweenness centrality

We first consider the problem of finding the most betweenness-central shortest path, which we define as follows.

For a path $P$ , let $C_{b}(P)$ be the number of shortest paths between any pair of nodes not on $P$ that traverse through at least one node in $P$ , with

C_{b}(P):=\sum_{u<v:u,v\in V\setminus P}g_{uv}(P),

where $g_{uv}(P)$ is the number of shortest (geodesic) paths between nodes $u$ and $v$ that traverse through at least one node in $P$ . Although this definition is different to the one provided by Everett and Borgatti, (1999), who define the betweenness centrality for a set of nodes $P$ as

\sum_{u<v:u,v\in V\setminus P}\frac{g_{uv}(P)}{g_{uv}},

where $g_{uv}$ is the number of geodesics between $u$ and $v$ , we disregard the term in the denominator so that this becomes a counting function. If a normalised measure is desired, one may divide $C_{b}(P)$ by $\sum_{u<v:u,v\in V\setminus P}g_{uv}$ to yield a similar result.

It then follows that the problem of finding the most betweenness-central shortest path becomes

\displaystyle\max\left\{C_{b}(P):P\in\mathcal{SP}(G)\right\}.

(2)

To solve this problem, we rely on the following lemma:

Lemma 3.

If $\langle s,p_{1},\ldots,p_{k},t\rangle$ is a most betweenness-central shortest path from $s$ to $t$ , then $\langle s,p_{1},\ldots,p_{k}\rangle$ is a most betweenness-central shortest path from $s$ to $p_{k}$ .

Proof.

By Lemma 1, it is clear that if $\langle s,p_{1},\ldots,p_{k},t\rangle$ is a shortest path from $s$ to $t$ then $\langle s,p_{1},\ldots,p_{k}\rangle$ is a shortest path from $s$ to $p_{k}$ .

Consider the graph in Figure 4 where the nodes have been arranged from left to right in terms of distance from $s$ . This implies that a shortest path from $s$ to any other node must travel from left to right and cannot contain a vertical edge. Additionally, assume there is a path $\langle s,\tilde{p}_{1},\ldots,\tilde{p}_{k-1},\tilde{p}_{k},\tilde{p}_{k+1},\tilde{p}_{k+2}\rangle$ which is a shortest path from $s$ to $\tilde{p}_{k+2}$ , and for the sake of argument, $\langle s,\tilde{p}_{1},\ldots,\tilde{p}_{k-1},p_{k}\rangle$ is another shortest path from $s$ to $p_{k}$ .

Figure 4: Illustration of graph used in the proof of Lemma 3.

Given that $\langle s,p_{1},\ldots,p_{k},t\rangle$ is the most betweenness-central shortest path between $s$ and $t$ , removing $t$ from the end of the path will update the betweenness centralilty score by adding paths that end at $t$ and traverse through at least one of the nodes in $\langle s,p_{1},\ldots,p_{k}\rangle$ and subtracting paths that only traverse through $t$ and none of the other nodes in the remaining path. The number of paths subtracted in this case is fixed regardless of whether $\langle s,p_{1},\ldots,p_{k},t\rangle$ or $\langle s,\tilde{p}_{1},\ldots,\tilde{p}_{k-1},p_{k},t\rangle$ was the most betweenness-central shortest path between $s$ and $t$ . This, in turn, implies that for $\langle s,p_{1},\ldots,p_{k-1},p_{k},t\rangle$ to be the most betweenness-central shortest path, $\langle s,p_{1},\ldots,p_{k-1},p_{k}\rangle$ must be the most betweenness-central shortest path between $s$ and $p_{k}$ as this path already accounts for shortest paths that end in $t$ that traverse through at least one node in the path. ∎

Lemma 3 implies that an algorithm similar to Algorithm 1 can be applied to solve Problem 2; however, some simplifications can be made. Comparing Lemma 2 and Lemma 3, we only need to extend the existing most central shortest paths by one node rather than two nodes. This means that the nested for-loops in lines 16-20 and 23-27 can be removed and replaced with an if-statement that checks whether $C_{b}(\langle P_{u},v\rangle)$ is larger than $C_{b}(P_{v})$ . As $C_{b}(P)$ can be computed in polynomial time, we arrive at the following theorem.

Theorem 3.

Problem (2) is solvable in polynomial time.

Furthermore, solving Problem (2) with weighted graphs is easier than solving Problem (1). By Lemma 3, we only need to look back to the preceding node, and no distance comparison is required. As a result, Lemma 3 also holds for graphs with positively weighted edges. This means that the algorithm described above can be further adapted to handle graphs with positively weighted edges by updating the new distance with $d_{\text{new}}\leftarrow d_{u}+w_{uv}$ in line 9, dropping lines 10-12 and using a priority queue instead of a standard queue, which is more akin to Dijkstra’s algorithm (Dijkstra,, 1959) than the standard BFS procedure. This leads to the following result.

Corollary 2.

Problem (2) on a graph with positively weighted edges is solvable in polynomial time.

5.2 Closeness centrality

We now consider the problem of finding the most closeness-central shortest path, where the closeness centrality of a path is defined below.

Let $d(u,P)$ be the shortest distance from a node $u$ to the path $P$ ,

d(u,P):=\min_{v\in P}\{d_{uv}\},

where $d_{uv}$ is the length of the shortest path from $u$ to $v$ . To compute the closeness centrality for path $P$ , we take the maximum closeness centrality scores over all nodes that are not on $P$

C_{c}^{\max}(P):=\max_{u:u\in V\setminus P}\{d(u,P)\}.

This leads to the problem of finding the most closeness-central shortest path:

\displaystyle\min\left\{C_{c}^{\max}(P):P\in\mathcal{SP}(G)\right\}.

(3)

Although other aggregation functions can be used to define a path’s closeness centrality (see Vogiatzis et al., (2014), where the authors study the sum and maximum of the closest distances), we chose this definition as it represents the furthest distance one must travel to reach the path $P$ . For example, this could represent the largest number of towns any commuter in a region must travel through to reach the closest train station. Even with the choice of aggregation functions that make this objective easy to work with, we show that the problem is as hard as the Satisfiability (SAT) problem, which is known to be NP-complete (Garey and Johnson,, 1979).

SAT
Instance:	Set of $U$ variables, collection $C$ of clauses over $U$ .
Question:	Is there a satisfying truth assignment for $C$ ?

We define the corresponding decision version of Problem (3) (MCCSP-D) below.

MCCSP-D
Instance:	A graph $G$ and a positive integer $\alpha$ .
Question:	Is there a shortest path between two vertices in $G$ such that its closeness centrality is less than $\alpha$ ?

Theorem 4.

MCCSP-D is NP-complete.

Proof.

MCCSP-D is in NP because all pairwise distances in $G$ can be computed in polynomial time. Hence, computing the closeness centrality of a path can also be completed in polynomial time.

Given an instance of SAT, we construct a graph $G$ in the following manner. Similar to the proof of Theorem 2, we start by constructing a graph $H$ with $2(|U|+1)$ nodes and $4|U|$ directed edges. Again, we start with two unlinked paths $\langle x_{1},\dots,x_{|U|}\rangle$ and $\langle\bar{x}_{1},\dots,\bar{x}_{|U|}\rangle$ both of length $|U|-1$ . For $i=1,\dots,|U|-1$ , we add edges $(x_{i},\bar{x}_{i+1})$ and $(\bar{x}_{i},x_{i+1})$ to connect these two paths. To complete $H$ , we add in two nodes, $s$ and $t$ , connected to the existing graph with the following edges $(s,x_{1})$ , $(s,\bar{x}_{1})$ , $(x_{|U|},t)$ and $(\bar{x}_{|U|},t)$ . Finally, to obtain $G$ , we start with $H$ , and for each clause $c\in C$ , we add the following nodes and edges. For each variable $j$ in clause $c$ , where $j\in\cup_{i=1}^{|U|}\{x_{i},\bar{x}_{i}\}$ , we construct an undirected path $\langle j,z_{c,j,1},\dots,z_{c,j,|U|-1},y_{c}\rangle$ of length $|U|$ . Overall, $G$ has $2(|U|+1)+|C|+C_{\text{len}}(|U|-1)$ nodes and $4|U|$ directed and $C_{\text{len}}(|U|-1)$ undirected edges, where $C_{\text{len}}=\sum_{i=1}^{|C|}|c_{i}|$ is the length of the formula, i.e., the number of variables in the clauses. Comparing this graph to the one used in the proof of Theorem 2, there is only one copy of the source and sink nodes $s$ and $t$ , and the paths from $H$ to $y_{c}$ consists of $|U|$ unit-weighted edges rather than a single edge with weight $|U|$ .

We now show that a solution to SAT exists if and only if there exists a shortest path in $G$ with a closeness centrality less than or equal to $|U|$ .

$\Longrightarrow$ Given a solution to SAT, the corresponding path in $G$ will have a closeness centrality of exactly $|U|$ (or 1 if the instance of SAT contains no clauses). This is because any nodes $z_{c}$ that are associated with unsatisfied clauses will be at $|U|+1$ away from the shortest path.

$\Longleftarrow$ Given a solution to MCCSP-D with a closeness centrality less than or equal $|U|$ , we have the following cases:

1.

The path starts at $s$ and ends at $t$ . Then, removing $s$ and $t$ yields the corresponding satisfying truth assignment for $C$ .
2.

The path starts at $s$ or $(x_{i}\oplus\bar{x}_{i})$ and ends at $t$ or $(x_{j}\oplus\bar{x}_{j})$ . Then, any arbitrary extension of the path through any nodes in $H$ such that the path starts at $s$ and ends at $t$ leads to the first case.
3.

The path starts and/or ends at one of the nodes in $G$ but not in $H$ . Then, removing the nodes only in $G$ and arbitrarily extending the path such that it starts at $s$ and ends at $t$ again leads to the first case. Note that if the path only traverses nodes corresponding to one clause, then the mapping to SAT is trivial.

As these transformations can be computed in polynomial time, MCCSP-D is NP-complete. ∎

Corollary 3.

Problem (3) is NP-hard.

As a result of Corollary 3, a polynomial time algorithm for Problem (3) does not exist unless $\text{P}=\text{NP}$ .

5.3 Discussion and insights

We summarise the complexity status of the problems considered in this paper in Table 6.

Centrality measure	Unweighted graph	Weighted graph
Betweenness	P	P
Degree	P	NP-hard
Closeness	NP-hard	NP-hard

Table 6: Comparison of the complexity status of the problems considered on unweighted and positive edge-weighted graphs.

Comparing the three centrality measures analysed in this paper, we observe that the complexity of the problem of finding the most central shortest path depends on the measure of centrality employed. We see that the problem of finding the most betweenness-central shortest path is the easiest as it is solvable in polynomial time for both unweighted and weighted graphs. On the other hand, the problem of finding the most degree-central shortest path can be solved in polynomial time on unweighted graphs but is NP-hard on weighted graphs. In contrast, the problem of finding the most closeness-central shortest path is NP-hard on both unweighted and weighted graphs. We hypothesise that this relationship arises as betweenness centrality is distance-invariant, i.e., we are just counting the number of shortest paths. Similarly, finding the most degree-central shortest path in an unweighted graph can be computed in polynomial time because optimal paths of length $k$ can be obtained by extending optimal paths of length $k-2$ . However, when weights are added to the edges, the result no longer holds, and the problem becomes NP-hard. Unlike these two metrics, closeness centrality has a built-in distance measure, and hence the problem of finding the most closeness-central shortest path is NP-hard regardless of whether the graph has weighted edges or not.

6 Concluding remarks

In this paper, we consider the problem of finding the most central shortest path in a graph. We show that the problem of finding the most degree-central shortest path on a simple graph can be solved in polynomial time by extending the BFS algorithm. This new algorithm is more efficient than the one proposed by Matsypura et al., (2023). Our numerical tests on synthetic and real-world graph instances demonstrate significant improvement. In addition to this, we show that the above problem with a weighted graph is NP-hard, even with two distinct weights, and we provide a pseudo-polynomial time extension of our algorithm for the integer weighted problem, as well as a polynomial time solution to the case with continuous weights.

Furthermore, we show that finding the most betweenness-central shortest path can be completed in polynomial time regardless of whether the graph has weighted edges or not. In contrast, the problem of finding the most closeness-central shortest path is NP-hard, irrespective of whether the graph is weighted or not. Future work in this area may focus on studying the problem of finding the most closeness-central shortest path with other aggregation functions or investigate heuristics and approximation schemes for problems established to be NP-hard in this paper.

References

Barabási and Albert, (1999) Barabási, A.-L. and Albert, R. (1999). Emergence of scaling in random networks. Science (American Association for the Advancement of Science), 286(5439):509–512.
Batagelj and Mrvar, (2006) Batagelj, V. and Mrvar, A. (2006). Pajek datasets.
Camur and Vogiatzis, (2024) Camur, M. C. and Vogiatzis, C. (2024). A survey on optimization studies of group centrality metrics.
Davis and Hu, (2011) Davis, T. A. and Hu, Y. (2011). The University of Florida Sparse Matrix Collection. ACM Transactions on Mathematical Software (TOMS), 38(1).
de Sá et al., (2015) de Sá, E. M., Contreras, I., Cordeau, J.-F., de Camargo, R. S., and de Miranda, G. (2015). The hub line location problem. Transportation Science, 49(3):500–518.
Dijkstra, (1959) Dijkstra, E. W. (1959). A note on two problems in connexion with graphs. Numerische Mathematik, 1(1):269–271.
Everett and Borgatti, (1999) Everett, M. G. and Borgatti, S. P. (1999). The centrality of groups and classes. The Journal of Mathematical Sociology, 23(3):181–201.
Garey and Johnson, (1979) Garey, M. R. and Johnson, D. S. (1979). Computers and intractability: a guide to the theory of NP-completeness. A Series of books in the mathematical sciences. W. H. Freeman, San Francisco.
Girvan and Newman, (2002) Girvan, M. and Newman, M. E. (2002). Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12):7821–7826.
Guimera et al., (2003) Guimera, R., Danon, L., Diaz-Guilera, A., Giralt, F., and Arenas, A. (2003). Self-similar community structure in a network of human interactions. Physical Review E, 68(6):065103.
Krebs, (2002) Krebs, V. (2002). Uncloaking terrorist networks. First Monday, 7(4).
Krentel, (1988) Krentel, M. W. (1988). The complexity of optimization problems. Journal of Computer and System Sciences, 36(3):490–509.
Kumar et al., (2016) Kumar, S., Spezzano, F., Subrahmanian, V., and Faloutsos, C. (2016). Edge weight prediction in weighted signed networks. In 2016 IEEE 16th International Conference on Data Mining (ICDM), pages 221–230.
Lusseau et al., (2003) Lusseau, D., Schneider, K., Boisseau, O. J., Haase, P., Slooten, E., and Dawson, S. M. (2003). The bottlenose dolphin community of doubtful sound features a large proportion of long-lasting associations. Behavioral Ecology and Sociobiology, 54(4):396–405.
Massa et al., (2009) Massa, P., Salvetti, M., and Tomasoni, D. (2009). Bowling alone and trust decline in social network sites. In 2009 Eighth IEEE International Conference on Dependable, Autonomic and Secure Computing, pages 658–663. IEEE.
Matsypura et al., (2023) Matsypura, D., Veremyev, A., Pasiliao, E. L., and Prokopyev, O. A. (2023). Finding the most degree-central walks and paths in a graph: Exact and heuristic approaches. European Journal of Operational Research, 308(3):1021–1036.
Peng et al., (2018) Peng, S., Zhou, Y., Cao, L., Yu, S., Niu, J., and Jia, W. (2018). Influence analysis in social networks: A survey. Journal of Network and Computer Applications, 106:17–32.
Rossi and Ahmed, (2015) Rossi, R. and Ahmed, N. (2015). The network data repository with interactive graph analytics and visualization. Proceedings of the AAAI Conference on Artificial Intelligence, 29(1).
Sapiezynski et al., (2019) Sapiezynski, P., Stopczynski, A., Lassen, D. D., and Lehmann, S. (2019). Interaction data from the Copenhagen Networks Study. Scientific data, 6(1):315–10.
Vogiatzis and Camur, (2019) Vogiatzis, C. and Camur, M. C. (2019). Identification of essential proteins using induced stars in protein–protein interaction networks. INFORMS Journal on Computing, 31(4):703–718.
Vogiatzis et al., (2014) Vogiatzis, C., Veremyev, A., Pasiliao, E., and Pardalos, P. (2014). An integer programming approach for finding the most and the least central cliques. Optimization Letters, 9:1–19.
Watts and Strogatz, (1998) Watts, D. J. and Strogatz, S. H. (1998). Collective dynamics of ‘small-world’ networks. Nature (London), 393(6684):440–442.

Appendix A Additional tables

	Watts–Strogatz (4, 0.1)
$\|V\|$	100	500	1000	5000	10000
$\|E\|$	200	1000	2000	10000	20000
$\|\mathcal{SP}(G)\|$	12388.6	398706.1	1766514.3	54612545.7	239916860.4
$U(G)$	5050	125250	500500	12502500	50005000
$\|\mathcal{SP}(G)\|/U(G)$	2.45	3.18	3.53	4.37	4.80

Table 7: Number of paths for Watts–Strogatz graphs with the rewiring probability of 0.1, averaged over 30 instances.

	Watts–Strogatz (4, 0.2)
$\|V\|$	100	500	1000	5000	10000
$\|E\|$	200	1000	2000	10000	20000
$\|\mathcal{SP}(G)\|$	10314.1	282350.7	1178139.3	32168149.2	133122135.8
$U(G)$	5050	125250	500500	12502500	50005000
$\|\mathcal{SP}(G)\|/U(G)$	2.04	2.25	2.35	2.57	2.66

Table 8: Number of paths for Watts–Strogatz graphs with the rewiring probability of 0.2, averaged over 30 instances.

	Barabási-Albert
$\|V\|$	100	500	1000	5000	10000
$\|E\|$	196	996	1996	9996	19996
$\|\mathcal{SP}(G)\|$	10750.7	334680.5	1435649.6	41703849.9	176462004.9
$U(G)$	5050	125250	500500	12502500	50005000
$\|\mathcal{SP}(G)\|/U(G)$	2.13	2.67	2.87	3.34	3.53

Table 9: Number of paths for Barabási-Albert graphs averaged over 30 instances.

	Kreb	Dolphins	Sandi Auths	IEEE Bus	Santa Fe	US Air 97	Bus	Email	Cerevisiae
$\|V\|$	62	62	86	118	118	332	662	1133	1458
$\|E\|$	153	159	124	179	200	2126	906	5451	1948
$\|\mathcal{SP}(G)\|$	3807	5734	5071	15897	10613	306814	536222	4321934	2738552
$U(G)$	1953	1953	3741	7021	7021	55278	219453	642411	1063611
$\|\mathcal{SP}(G)\|/U(G)$	1.95	2.94	1.36	2.26	1.51	5.55	2.44	6.73	2.57

Table 10: Number of paths for real-world instances.

	Copenhagen calls				Copenhagen SMS
weighted	no	yes	no	yes	no	yes	no	yes
directed	yes	yes	no	no	yes	yes	no	no
$\|V\|$	536	3212	536	3515	568	23598	568	24204
$\|E\|$	924	3600	621	3600	1303	24333	697	24333
$\|\mathcal{SP}(G)\|$	73136	69157	109296	91811	364599	239031	222517	138371
$U(G)$	51224	51224	61040	61040	183411	183411	104869	104869
$\|\mathcal{SP}(G)\|/U(G)$	1.43	1.35	1.79	1.50	1.99	1.30	2.12	1.32

Table 11: Number of paths for Copenhagen calls and SMS graph instances.

	Bitcoin Alpha		Bitcoin OTC		Advogato
weighted	no	yes	no	yes	no	yes
$\|V\|$	3783	281050	5881	397821	6539	167369
$\|E\|$	24186	301453	35592	427532	51127	211957
$\|\mathcal{SP}(G)\|$	116049178	32751385	297445256	531265752	138259270	29573473
$U(G)$	12215353	12215353	27696379	27696379	16190692	16190692
$\|\mathcal{SP}(G)\|/U(G)$	9.50	2.68	10.74	19.18	8.54	1.83

Table 12: Number of paths for trust graph instances.