A Distributed Solution for Efficient K Shortest Paths Computation over Dynamic Road Networks

Ziqiang Yu¹, Xiaohui Yu^2∗, Nick Koudas³, Yueting Chen², Yang Liu⁴
*Corresponding author ¹Yantai University ²York University ³University of Toronto ⁴Wilfrid Laurier University

Abstract

The problem of identifying the k-shortest paths (KSPs for short) in a dynamic road network is essential to many location-based services. Road networks are dynamic in the sense that the weights of the edges in the corresponding graph constantly change over time, representing evolving traffic conditions. Very often such services have to process numerous KSP queries over large road networks at the same time, thus there is a pressing need to identify distributed solutions for this problem. However, most existing approaches are designed to identify KSPs on a static graph in a sequential manner (i.e., the $(i+1)^{th}$ shortest path is generated based on the $i^{th}$ shortest path), restricting their scalability and applicability in a distributed setting. We therefore propose KSP-DG, a distributed algorithm for identifying k-shortest paths in a dynamic graph. It is based on partitioning the entire graph into smaller subgraphs, and reduces the problem of determining KSPs into the computation of partial KSPs in relevant subgraphs, which can execute in parallel on a cluster of servers. A distributed two-level index called DTLP is developed to facilitate the efficient identification of relevant subgraphs. A salient feature of DTLP is that it indexes a set of virtual paths that are insensitive to varying traffic conditions in an efficient and compact fashion, leading to very low maintenance cost in dynamic road networks. This is the first treatment of the problem of processing KSP queries over dynamic road networks. Extensive experiments conducted on real road networks confirm the superiority of our proposal over baseline methods.

Index Terms:

k

shortest paths, road networks, dynamic graph, distributed index.

1 Introduction

In this work, we are concerned with identifying k-shortest paths (KSPs for short) over dynamic road networks. That is, for a given dynamic road network and a pair of origin and destination, identify the k shortest loop-less paths according to a predefined measure (e.g., travel time). The road network can be considered a graph where the intersections/endpoints are represented as vertices and roads as edges. It is dynamic in the sense that the travel time (or other similar measures such as congestion level) changes over time, corresponding to evolving weights of edges in the graph.

Identifying KSPs in a dynamic road network is an essential building block in many location-based services such as route navigation and ride-sharing [1, 4, 2, 3]. Most of the existing work to address the KSP problem in a road network (or more generally, in a graph) assumes a centralized approach [5, 6, 7, 8, 9, 10, 11, 12, 13, 14], and is incapable of handling large volumes of concurrent queries over a dynamic graph for two main reasons. First, the query processing strategies employed are sequential, namely they generate the $(i+1)^{th}$ shortest path based on the $i^{th}$ shortest path, which limits their scalability with respect to the number of concurrent queries. Second, some previous approaches [8, 10, 11] require building a heavy-weight path index, which is costly and easily becomes invalid once the edge weights in the graph change.

Distributed algorithms have recently been developed to compute shortest paths in static graphs [15, 16, 17, 18, 19, 20], but also cannot be directly adopted to process KSP queries in dynamic road networks because they also need to require building indexes beforehand to enhance search efficiency, which is impractical for dynamic graphs.

In order to tackle the aforementioned problems, we have proposed a distributed solution that can be deployed on a cluster of servers to handle KSP queries in a distributed fashion over large dynamic graphs. At a very high level, this solution adopts a divide-and-conquer strategy: the large graph is partitioned into multiple subgraphs maintained on different servers; given a query, we first compute in parallel partial shortest paths in the subgraphs on different servers, which are subsequently merged to construct the k shortest paths. Although similar strategies have been adopted for distributed query evaluation on graphs [21, 22, 23, 24, 25], the processing of KSP queries presents some unique challenges: (1) It is non-trivial to identify the partitioned subgraphs containing the required partial shortest paths. (2) It is vital but difficult to build an effective index for identifying KSPs that is easy to maintain in the presence of constantly changing weights.

With these challenges in mind, we first propose a Distributed Two-Level Path (DTLP) index. This index is based on partitioning any graph $G$ into multiple subgraphs, where two subgraphs may overlap at a small number of vertices called boundary vertices but have no common edges. For any two boundary vertices in the same subgraph, we compute specific paths between them called bounding paths. The boundary vertices and the bounding paths in all subgraphs form the first level of DTLP, which provides a lower bound of the shortest distance between a pair of boundary vertices. These bounding paths do not change due to varying weights, making the index easily maintainable. However, the volume of bounding paths in a subgraph can be prohibitively large. We thus propose a tree-based index structure to compactly store the bounding paths without sacrificing maintenance efficiency. The second level, is a skeleton graph $G_{\lambda}$ , in which the vertices correspond to the boundary vertices of all subgraphs, and there exists an edge between a pair of vertices in $G_{\lambda}$ if and only if there is a set of bounding paths between the corresponding vertices in the original graph, with the weight of the edge computed based on that set of bounding paths. Graph $G_{\lambda}$ serves the purpose of supplying an approximate search direction for identifying KSPs.

We propose an iterative algorithm, KSP-DG, to identify KSPs in Dynamic Graphs based on DTLP, which follows a ”filter-and-refine” strategy. At each iteration of the algorithm, we first use the skeleton graph $G_{\lambda}$ in the filter step to compute a shortest path in $G_{\lambda}$ that has not been examined before. Then the subgraphs covering the vertices on the path (i.e., boundary vertices in subgraphs) are selected for further examination. In the refine step, $k$ shortest paths between the boundary vertices are generated from each subgraph, which are combined to form complete paths in $G$ . These paths are then used to update the list of shortest paths in $G$ that have been obtained so far. This process terminates when there is no more change in the list, and the final result is guaranteed to be the KSPs. The algorithm is distributed by design, in that the partitioning of the graph allows the refine step to run in parallel over the cluster, and the small footprint of the skeleton graph $G_{\lambda}$ lends itself well to be replicated to any node in the cluster where it is needed for generating the shortest paths in $G_{\lambda}$ .

In the refine step of KSP-DG, a frequently invoked but expensive operation is to compute the partial KSPs between boundary vertices in given subgraphs. In order to speed up this computation, We propose a progressive version of the widely adopted Yen’s algorithm [6], which was originally designed for computing pair-wise shortest distances in a graph, to improve the search efficiency through the following the enhancements. We call this new algorithm PYen (for Progressive Yen’s algorithm). (1) PYen parallelizes the generation of each potential shortest path and can prune unpromising potential shortest paths early before the complete generation; and (2) it fully reuses identified shortest paths from previous results and only compute shortest paths for unknown vertex pairs incrementally to avoid redundant computation.

In summary, we make the following contributions.

•

We introduce a comprehensive distributed solution for processing KSP queries, utilizing a divide-and-conquer strategy. This independent exploration across different divisions allows our solution to be easily scaled across multiple servers and executed in parallel, thereby accelerating the search process.
•

We devise DTLP, a dual-level path index suitable for deployment in a distributed setting to facilitate the processing of KSP queries over dynamic graphs.
•

We condense the extensive volume of bounding paths of DTLP into a space-efficient modified prefix tree using locality sensitive hashing. This enhances the space efficiency without compromising the indexing power.
•

We propose KSP-DG that decomposes the problem of identifying KSPs in the entire graph into parallel search for partial KSPs in different subgraphs, suitable for distributed processing on a cluster. Furthermore, we present PYen, a Progressive Yen’s algorithm, to expedite the computation of partial KSPs within subgraphs, a crucial operation frequently used by KSP-DG.
•

We conduct extensive experiments on real road networks to evaluate the performance of the proposed approach, confirming its effectiveness and superiority over other approaches across a variety of settings.

The rest of the paper is organized as follows. Section 2 defines the problem of finding KSPs. Section 3 presents the DTLP index, and its maintenance strategy is described in Section 4. Section 5 discusses KSP-DG as well as further optimizations. Section 6 presents the distributed implementation of DTLP and KSP-DG, and experimentally evaluates the performance of our proposal. Section 7 discusses related work, and Section 8 concludes this paper.

2 Problem Definition

Definition 1 ( Dynamic undirected graph).

A dynamic graph $G$ = ( $\mathcal{V}$ , $\mathcal{E}$ , $\mathcal{W}$ ) consists of (1) a finite set of vertices $\mathcal{V}$ , (2) a set of edges $\mathcal{E}\subseteq\mathcal{V}\times\mathcal{V}$ , where $e_{i,j}$ ( $e_{i,j}\in\mathcal{E}$ ) denotes an edge between vertices $v_{i}$ and $v_{j}$ , and (3) a set of non-negative weights $\mathcal{W}$ , where $w_{i,j}\in\mathcal{W}$ is the weight of edge $e_{i,j}$ , which may change by a negative or non-negative value $\Delta w$ at any time point.

In what follows, we present the terminologies and solutions in the context of graphs instead of road networks, in view of their potential applicability in other scenarios involving dynamic undirected graphs.

Definition 2 (Subgraph).

A graph ${SG}$ =( $\mathcal{V}^{\prime}$ , $\mathcal{E}^{\prime}$ , $\mathcal{W}^{\prime}$ ) is a subgraph of the graph $G$ = ( $\mathcal{V}$ , $\mathcal{E}$ , $\mathcal{W}$ ) iff

•

$\mathcal{V}^{\prime}\subseteq\mathcal{V}$ ,
•

$\mathcal{E}^{\prime}\subseteq\mathcal{E}$ $\land(e_{m,n}\in\mathcal{E}^{\prime}\to v_{m},v_{n}\in\mathcal{V}^{\prime})$ ,
•

$\mathcal{W}^{\prime}\subseteq\mathcal{W}$ $\land(w_{m,n}\in\mathcal{W}^{\prime}\to e_{m,n}\in\mathcal{E}^{\prime})$ .

Definition 3 (Path, Distance of Path).

Path $P(s,t)$ from one vertex $v_{s}$ (the source vertex) to another vertex $v_{t}$ (the destination vertex) in graph $G$ is a sequence of vertices $\langle\texttt{v}_{0}$ = $v_{s}$ , $\cdots$ $\texttt{v}_{l}$ , $\cdots$ , $\texttt{v}_{n}$ = $v_{t}\rangle$ such that $\forall l\in[1,n]$ , $e_{i,j}\in\mathcal{E}$ if $\texttt{v}_{l-1}=v_{i}$ and $\texttt{v}_{l}=v_{j}$ . In this work, we only consider simple paths, i.e., paths with no repeat vertices. The distance of $P(s,t)$ is defined as $D(P(s,t))$ = $\sum\limits_{i=1}^{n}w_{i-1,i}$ .

Definition 4 (k-shortest path query (KSP query)).

For a given pair of source and destination vertices, $v_{s}$ and $v_{t}$ , the $k$ -shortest path query, $q(v_{s},v_{t})$ , identifies the set of $k$ paths from $v_{s}$ to $v_{t}$ in graph $G$ , $\mathcal{P}_{s,t}=\{{P_{1}}(s,t),\ldots,P_{k}(s,t)\}$ , such that $D(P_{i}(s,t))\leq D(P_{i+1}(s,t))(i\in[1,k-1]$ ), and $\forall P(s,t)\notin\mathcal{P}_{s,t}$ , $D(P(s,t))\geq D(P_{k}(s,t))$ .

To ensure timely processing of the queries, we assume that query processing takes place in main memory. Moreover, as graph $G$ constantly evolves as queries arrive, we use a buffer $G_{curr}$ to model the current version of $G$ to ensure unambiguous semantics of the query answers. This buffer is updated continuously and asynchronously as the graph changes. At fixed time intervals, a snapshot $G_{curr}$ is taken, and the answer to an incoming KSP query is processed against the most recent snapshot. Each query answer has a timestamp indicating the moment at which the answer is exact.

3 Distributed Two-Level Path Index

A naive approach to process KSP queries is to compute from scratch the shortest paths on graph $G$ directly each time a query arrives. In practice, however, both the size and the dynamic nature of graphs, as well the vast volume of concurrent queries, make it infeasible to process queries this way. For this reason we consider an index structure that could scale and speed up the processing.

3.1 Desiderata

Our problem setting necessitates the following desirable properties in an index to support processing KSP queries.

(1) Low maintenance cost for dynamic graphs. Some existing index structures for processing shortest path queries maintain the shortest path between selected “landmark” vertices. However, when the edge weights of the graph constantly change, this becomes extremely expensive to maintain. As such, one of our requirements is that the index must be easy to maintain for dynamic graphs and imposes as little overhead as possible when the graph evolves.

(2) Suitable for deployment in a distributed setting. As the size of the graph and the number of concurrent queries keep increasing, a distributed index becomes a more attractive solution than a centralized one due to its ability to scale out. Since the graph $G$ itself may have to be partitioned and stored across the cluster, we should be able to maintain parts of the index on different nodes in a cluster corresponding to subgraphs of $G$ .

(3) Supporting distributed algorithms. With the graph partitioned and subgraphs stored across the cluster, query processing takes place on different nodes in parallel. It is thus necessary for the index to assist the identification of relevant subgraphs that may contain pieces of the shortest paths to avoid examination of all subgraphs. Such an index should provide sufficient pruning power, and at the same time guarantee the correctness of the query result.

3.2 Overview of DTLP

In view of the above requirements, we devise an index called DTLP that facilitates distributed KSP query processing over dynamic graphs. Based on a partitioning of $G$ , DTLP has a two-level structure: the first level indexes each subgraph by maintaining a list of bounding paths (Section 3.4) between any pair of boundary vertices (Section 3.3). This provides the basis for computing the lower bounds of the shortest distances between the boundary vertices. The second level keeps a skeleton graph (Section 3.6) with all boundary vertices in all subgraphs, and it is computed based on the bounding paths identified in the first level. Fig. 1 illustrates the structure of the DTLP index.

Refer to caption — Figure 1: Schematic Diagram of DTLP

3.3 Graph Partitioning and Boundary Vertices

The construction of DTLP starts with partitioning graph G into multiple subgraphs such that each subgraph has at most $z$ vertices. Different subgraphs may share vertices but not edges. For graph partitioning, we start from any vertex and traverse the graph $G$ using a breadth-first strategy to generate the subgraphs. The set of subgraphs is denoted as $\mathcal{S}$ = $\{{SG}_{1}$ , $\cdots$ , $SG_{n}\}$ such that (1) $\mathcal{V}_{1}\cup\cdots\cup\mathcal{V}_{n}=\mathcal{V}$ ; (2) ${\mathcal{E}}_{1}\cup\cdots\cup{\mathcal{E}}_{n}=\mathcal{E}$ ; (3) ${\mathcal{W}}_{1}\cup\cdots\cup{\mathcal{W}}_{n}=\mathcal{W}$ , where $n$ is the number of subgraphs and $SG_{i}=\{\mathcal{V}_{i},\mathcal{E}_{i},\mathcal{W}_{i}\}$ $(i\in[1,n])$ .

The vertices shared by two or more subgraphs are called boundary vertices. Evidently, any path from a non-boundary vertex in $SG_{i}$ to a non-boundary vertex in $SG_{j}$ must pass through one or more boundary vertices, as those boundary vertices are the only ”contact vertices” between subgraphs.

Example 1.

Figure 2 gives an example of graph $G$ that is partitioned into four subgraphs in Figure 3, where the threshold $z$ (maximum number of vertices in a subgraph) is set to 6. The boundary vertices in each subgraph are shaded.

(a)

(b)

(c)

(d)

Figure 3: Subgraphs of

G

3.4 Bounding Paths

Once graph partitions are in place, we move on to identify for each pair of boundary vertices in a subgraph a set of bounding paths. Bounding paths are specific paths in the subgraph serving as a reference to establish the bound distance, a stable lower bound of the shortest distance between the respective boundary vertices. The bounding paths do not change when the edge weights in the graph change, but the bound distances have to be updated to reflect the new information on edge weights. Those bounding paths will in turn be used to construct the skeleton graph.

As a first attempt, we consider utilizing the path(s) with the fewest edges between two boundary vertices as the bounding path(s). For any bounding path $P(i,j)$ with $m$ edges between two boundary vertices $v_{i}$ and $v_{j}$ in subgraph $SG$ , the corresponding bound distance, denoted by $BD(P(i,j))$ , is computed as the sum of the $m$ smallest edge weights in $SG$ . It is easy to show that $BD(P(i,j))$ is not greater than the shortest distance between $v_{i}$ and $v_{j}$ in $SG$ , but it very likely has a large discrepancy with the actual shortest distance. Next, we seek to further improve this bound in two ways. First, we identify more than one bounding path between a pair of boundary vertices, which provides more bound distances and we can choose the maximum one to narrow down the discrepancy with the actual shortest distance.

Second, we decompose the edge weight to finer granularity to provide better resolution when computing the bound distance. In particular, we assign virtual fragments (vfrags for short) for each edge in the graph $G$ and the number of vfrags of any edge $e_{i,j}$ equals to $w^{0}_{i,j}$ , where $w^{0}_{i,j}$ is the initial weight of $e_{i,j}$ at the beginning of the DTLP construction. Such vfrags are additive, i.e., the number of vfrags on a path, $\phi(P(i,j))$ , is the sum of the number of vfrags on each edge in the path. We call the weight of each vfrag in $e_{i,j}$ the unit weight, defined as ${w_{i,j}}/{w^{0}_{i,j}}$ , where $w_{i,j}$ is the current weight of edge $e_{i,j}$ , which varies over time. For any edge, the number of vfrags always remains the same, but the unit weight will change with varying weight.

With the two improvements above, we compute the bounding path and the bound distance as follows. For two boundary vertices $v_{i}$ and $v_{j}$ , given a configurable parameter $\xi$ , we compute a set $\mathcal{B}_{i,j}$ consisting of at most $\xi$ bounding paths that contain the least number of vfrags (where bounding paths containing the same number of vfrags are counted as only one path). Here, ${P^{\prime}_{l}}(i,j)$ ( $l\in[1,|\mathcal{B}_{i,j}|]$ ) is called a bounding path, and ${P^{\prime}_{1}}(i,j)$ represents the bounding path with the fewest vfrags.

For a bounding path ${{P_{l}}^{\prime}}(i,j)$ from $v_{i}$ to $v_{j}$ in subgraph $SG$ , we can compute a corresponding bound distance, which is the sum of the $\phi({{P_{l}}^{\prime}}(i,j))$ smallest unit weights in $SG$ , denoted by $BD({{P_{l}}^{\prime}}(i,j))$ .

Example 2.

For boundary vertices $v_{13}$ and $v_{14}$ in $SG_{4}$ in Fig. 4(a), ${P^{\prime}_{1}}(13,14)$ = $\langle v_{13},v_{16},v_{14}\rangle$ and ${P^{\prime}_{2}}(13,14)$ = $\langle v_{13},v_{18},v_{17},v_{16},v_{14}\rangle$ are the bounding paths if $\xi=2$ . As to ${P^{\prime}_{1}}(13,14)$ , $\phi({P^{\prime}_{1}}(13,14))=8$ and all unit weights of vfrags in $SG_{4}$ are 1 initially, so $BD({P^{\prime}_{1}}(13,14))=8$ . When $SG_{4}$ change to $SG^{\prime}_{4}$ , the unit weights in $SG^{\prime}_{4}$ are updated to $(1/3,3)$ , $(1/2,4)$ , $(1,8)$ , and $(2,3)$ , meaning there are 3 vfrags of unit weight $1/3$ , 4 vfrags of unit weight $1/2$ , and so on. Thus, $BD({P^{\prime}_{1}}(13,14))$ can be computed using the 8 smallest unit weights, consisting of 3 unit weights of $1/3$ , 4 unit weights of $1/2$ , and 1 unit weight of $1$ , with the result being 4.

Bounding paths are be computed in the initial graph offline and always remain the same regardless of the changing weights.

3.5 Lower Bound Distances

After identifying the set of bounding paths between $v_{i}$ and $v_{j}$ in subgraph $SG$ , we aim to determine the bounding path that would impose as tight a lower bound on the shortest distance as possible. This translates to locating the maximal bound distance less than or equal to the shortest distance corresponding to this set of bounding paths. It will be referred to as the lower bound distance between $v_{i}$ and $v_{j}$ , and the corresponding bounding path is called the lower bounding path.

To reach a tighter lower bound, recall that for each bounding path ${p_{l}}^{\prime}(i,j)$ between $v_{i}$ and $v_{j}$ in $SG$ , its bound distance ( $BD({p_{l}}^{\prime}(i,j))$ ) computed using vfrags is never greater than its actual distance ( $D({p_{l}}^{\prime}(i,j))$ ). Consequently, if $D({p_{l}}^{\prime}(i,j))\leq BD({p_{g}}^{\prime}(i,j))$ , we can conclude $D({p_{l}}^{\prime}(i,j))\leq D({p_{g}}^{\prime}(i,j))$ . Exploiting this relationship allows us to identify the shortest path between $v_{i}$ and $v_{j}$ among the set of bounding paths in many cases. If this fails, however, we can still utilize the bounding path with the maximal bound distance as the lower bounding path. We detail this approach below.

Definition 5 (Lower Bounding Path).

Let $\mathcal{B}_{i,j}$ be a set of bounding paths between $v_{i}$ and $v_{j}$ in $SG$ and ${P_{1}}(i,j)$ be the shortest path connecting these two vertices in the original graph $G$ . A bounding path ${P^{\prime}_{b}}(i,j)$ is the lower bounding path between $v_{i}$ and $v_{j}$ if it satisfies one of the following conditions:

(1) $D({P^{\prime}_{b}}(i,j))=D({P_{1}}(i,j))$ ; or

(2) the bound distance of ${P^{\prime}_{b}}(i,j)$ is maximal among all bounding paths in $\mathcal{B}_{i,j}$ .

Definition 6 (Lower Bound Distance).

Following Definition 5, the lower bound distance between $v_{i}$ and $v_{j}$ , denoted by $LBD(i,j)$ , equals to $D({P_{1}}(i,j))$ if condition (1) is established, or $BD({P^{\prime}_{b}}(i,j))$ if condition (2) is met.

Next, we introduce Theorem1¹¹1We have excluded the proofs for the theorems and lemmas in consideration of space limitations; however, they can be found in the appendix. to help identify the lower bound distance for a pair of boundary vertices.

Theorem 1.

Let $\mathcal{B}_{i,j}$ = $\{{P^{\prime}_{l}(i,j)}\}$ ( $l\in[1,r]$ , $r=|\mathcal{B}_{i,j}|$ ) be the set of bounding paths connecting $v_{i}$ and $v_{j}$ in subgraph $SG$ , and let $P^{\prime}_{u}(i,j)$ ( $u\in[1,r]$ ) be the path whose actual distance in $SG$ is the shortest within $\mathcal{B}_{i,j}$ . Assuming that the paths in $\mathcal{B}_{i,j}$ are sorted in ascending order based on their bound distances, we can make the following claims:

1.

If $BD(P^{\prime}_{l}(i,j))\leq D(P^{\prime}_{u}(i,j))$ and $BD(P^{\prime}_{l+1}$ $(i,j))$ $\geq$ $D(P^{\prime}_{u}(i,j))$ ( $l+1\in[1,r]$ ), then $P^{\prime}_{u}(i,j)$ is the shortest path between $v_{i}$ and $v_{j}$ in $SG$ . In this case, $P^{\prime}_{u}(i,j)$ is the lower bounding path.
2.

If $BD(P^{\prime}_{r}(i,j))<D(P^{\prime}_{u}(i,j))$ , then $BD(P^{\prime}_{r}(i,j))$ must be less than the actual shortest distance between $v_{i}$ and $v_{j}$ in $SG$ and $P^{\prime}_{r}(i,j)$ is the lower bounding path.

Proof.

To show the first claim holds, suppose the shortest path from $v_{i}$ to $v_{j}$ is not $P^{\prime}_{u}(i,j)$ but another path denoted by $P^{\prime}_{f}(i,j)$ that is not covered by $\mathcal{B}_{i,j}$ . Based on this assumption, it can be inferred that $\phi(P^{\prime}_{f}(i,j))>\phi(P^{\prime}_{r}(i,j))$ , so $BD(P^{\prime}_{f}(i,j))>BD(P^{\prime}_{r}(i,j))$ . As the actual distance of a path is no less than its bound distance, we have $D(P^{\prime}_{f}(i,j))>BD(P^{\prime}_{r}(i,j))\geq BD(P^{\prime}_{l+1}(i,j))$ . Also, because $BD(P^{\prime}_{l+1}(i,j))>D(P^{\prime}_{u}(i,j))$ , we have $D(P^{\prime}_{f}(i,j))>D(P^{\prime}_{u}(i,j))$ which is a contradiction.

The second claim is true if $P^{\prime}_{u}(i,j)$ is the shortest path from $v_{i}$ to $v_{j}$ in $SG$ ; otherwise, we infer that the shortest path from $v_{i}$ to $v_{j}$ is not in $\mathcal{B}_{i,j}$ . In this case, the bound distance of this shortest path must be greater than $BD(P^{\prime}_{r}(i,j))$ , so its actual distance is also greater than $BD$ $(P^{\prime}_{r}(i,j))$ . ∎

Example 3.

This example illustrates the two cases in Theorem 1. For the graph in Fig. 5(a), $P^{\prime}_{1}(s,t)$ = $\langle v_{s},v_{1},v_{t}\rangle$ , $P^{\prime}_{2}(s,t)$ = $\langle v_{s},v_{2},v_{3},$ $v_{t}\rangle$ , and $P^{\prime}_{3}(s,t)$ = $\langle v_{s},v_{4},v_{5},v_{6},v_{t}\rangle$ are bounding paths between $v_{s}$ and $v_{t}$ when $\xi=3$ . If the weights change as shown in Fig. 5(b), $P^{\prime}_{3}(s,t)$ becomes the bounding path with the shortest distance and the unit weights in this graph are updated to (2, 4), (4, 3), and (8, 2). As $BD(P^{\prime}_{1}(s,t))=4$ , $BD(P^{\prime}_{2}(s,t))=6$ , and $BD(P^{\prime}_{3}(s,t))=8$ , the bound distance of $BD$ $(P^{\prime}_{3}(s,t))$ equals to its actual distance. Hence, $P^{\prime}_{3}(s,t)$ is the shortest path from $v_{s}$ to $v_{t}$ , which follows the first claim in Theorem 1.

For the scenario shown as Fig. 5(c), $P^{\prime}_{1}(s,t)$ , $P^{\prime}_{2}(s,t)$ , and $P^{\prime}_{3}(s,t)$ are still the bounding paths connecting $v_{s}$ and $v_{t}$ . When the graph in Figure 5(c) changes to the one in Fig. 5(d), there are five vfrags with unit weight 1. In this case, $BD(P^{\prime}_{1}(s,t))=2$ , $BD(P^{\prime}_{2}(s,t))=3$ , and $BD(P^{\prime}_{3}(s,t))=4$ . Because $D(P^{\prime}_{3}(s,t))>BD(P^{\prime}_{3}(s,t))$ , it is uncertain whether $P^{\prime}_{3}(s,t)$ is the shortest path from $v_{s}$ to $v_{t}$ , but one can guarantee that the shortest distance between $v_{s}$ and $v_{t}$ is greater than $BD(P^{\prime}_{3}(s,t))$ , conforming to the second claim in Theorem 1.

Based on Theorem 1, we can identify the lower bounding paths and the lower bound distances between any pair of boundary vertices in each subgraph. Since any pair of boundary vertices $v$ can co-occur in more than one subgraph, there may be multiple lower bound distances associated with them, each for one subgraph. We call the least of these lower bound distances the minimum lower bound distance, denoted by $MBD(i,j)$ , that provides the basis for constructing the skeleton graph in the Section 3.6.

Figure 6: Skeleton Graph

G_{\lambda}

3.6 Skeleton Graph

The skeleton graph $G_{\lambda}$ contains all boundary vertices of all subgraphs. Any pair of boundary vertices $v_{i}$ and $v_{j}$ within the same subgraph is connected by edge $e^{\prime}_{i,j}$ with its weight being the minimum lower bound distance between $v_{i}$ and $v_{j}$ . The rationale behind introducing the skeleton graph is that KSPs between two vertices in the original graph $G$ possibly pass through the same sequence of boundary vertices as their shortest paths in $G_{\lambda}$ . Thus, $G_{\lambda}$ can provide an approximate search guideline to identify the different subgraphs in finding KSPs between a pair of vertices in $G$ . Fig. 6 shows a skeleton graph $G_{\lambda}$ corresponding to the graph $G$ in Fig. 2.

4 Maintenance of DTLP

As the edge weights of the graph evolve over time, the main task in maintaining DTLP is to update the lower bound distances between boundary vertices in each subgraph. After that, the lower bound distances between boundary vertices can be easily deduced, so do the edge weights in the skeleton graph $G_{\lambda}$ . In this section, we first introduce an Edges and Bounding Paths Inverted Index EBP-II in Section 4.1 to ease the maintenance of lower bound distances, and then transform EBP-II into a prefix tree-based index structure with higher storage efficiency in Section 4.2.

4.1 EBP-II

EBP-II is an inverted index built for each subgraph. Each entity in EBP-II is a key-value pair, where the edges appearing in the bounding paths are viewed as keys and the value of each key (i.e., an edge) is the set of bounding paths containing the corresponding edge. The design of EBP-II is such that when the weight of an edge changes by $\Delta w$ , we can immediately identify the bounding paths containing that edge, and update their actual distances by $\Delta w$ . Subsequently, we compute the unit weights of the edge based on its new weight, and then update the bound distances of the bounding paths in the subgraph. Finally, for any two boundary vertices, if their bounding paths are impacted by the update on the actual or bound distances, we can easily update their lower bound distance as per Theorem 1. Fig. 7 shows a subgraph instance as well as the bounding paths between the boundary vertices $v_{1}$ and $v_{10}$ , and its corresponding EBP-II is illustrated in Fig. 8.

4.2 Compaction of EBP-II

EBP-II supports efficient maintenance of lower bound distances. However, it requires large storage overhead due to two major reasons: 1) a subgraph could contain a large number of edges, and therefore indexing them using edges as keys and set of bounding paths as values could lead to large storage overhead; and 2) there could be many duplicate bounding paths associated with different keys, which unnecessarily consumes extra storage. In view of these issues, we introduce an algorithm to compact EBP-II without sacrificing the maintenance efficiency.

The compaction strategy first partitions the key-value pairs in EBP-II into different sets with the purpose of assigning the values having as many common paths as possible into the same set. In this way, the shared paths corresponding to different values within the same set have a higher likelihood of being compacted. Following this, a Modified Prefix Tree (MPTree for short) is introduced to condense the repeated bounding paths in the values within each group. We describe the procedure in more detail below.

4.2.1 Partitioning EBP-II.

Suppose $\mathcal{P}$ and $\mathcal{P}^{\prime}$ are two bounding path sets associated with two distinct edges. The compaction ratio for storing $\mathcal{P}$ and $\mathcal{P}^{\prime}$ refers to the proportion of common paths between $\mathcal{P}$ and $\mathcal{P}^{\prime}$ (which are stored only once) among all paths in these two sets. Ideally, their compaction ratio can reach $\tfrac{|\mathcal{P}\cap\mathcal{P}^{\prime}|}{|\mathcal{P}\cup\mathcal{P}^{\prime}|}$ , i.e., their Jaccard similarity. It is more likely to achieve a higher compaction ratio if values with a higher Jaccard similarity are grouped into the same set. As such, we employ Locality Sensitive Hashing (LSH) [26] to hash the edges whose values have high Jaccard similarity into the same set with the following procedure.

Step 1. Transforming EBP-II into a matrix called PE-Matrix, where each bounding path becomes a row and edges correspond to columns. If a bounding path includes an edge, the corresponding position of the matrix is set to 1 and 0 otherwise. The PE-Matrix helps depict the relationship between bounding paths and edges using 0s and 1s, simplifying the subsequent computation of a signature matrix with Minhash [27]. Fig. 9 gives the PE-Matrix of EP-Index in Fig. 8.

Step 2. Generating a signature matrix of PE-Matrix with MinHash [27], denoted by Sig-Matrix. The columns of Sig-Matrix still correspond to all edges, and rows correspond to $h$ hash functions. We employ each hash function to hash the row numbers of PE-Matrix and select a hashed row number as the signature for each column of Sig-Matrix. The signatures produced by $h$ hash functions constitute $h$ rows of Sig-Matrix. Similar sequences of signatures in Sig-Matrix indicate edges likely to share common paths. Example 4 illustrates the generation of the Sig-Matrix.

Step 3. Employing LSH to divide columns (edges) in a Sig-Matrix into different sets. More specifically, the rows of Sig-Matrix are first partitioned into $b$ bands. In every band, the sequence of $\frac{h}{b}$ signatures in every column is used to hash the columns into different sets such that any two columns hashed into the same set are identical in at least one band. As such, the columns within the same set are likely to have more common bounding paths than those in different sets.

For the edges within the same partitioned set, we propose the MPTree to further compact their bounding path sets, which will be discussed in Section 4.2.2.

Example 4.

For the given PE-Matrix in Fig.9, we introduce two hash functions ${h_{1}}(x)={(x+1)}$ mod 4 and ${h_{2}}(x)={(5x+1)}$ mod 4 to compute the corresponding Sig-Matrix that is initialized as depicted by Fig.10(a). Next, we process each row $r$ of PE-Matrix as follows. First, we compute ${h_{1}(r)}$ and ${h_{2}(r)}$ , shown in the right part of Fig.9, where $r$ is the row number. Second, for each column $c$ in $r$ do the following: 1) If $c$ has 0 in row $r$ , do nothing; 2) otherwise, all rows in column $c$ of Sig-Matrix are set to the smaller values between the current ones and $\{{h_{1}}(r),{h_{2}}(r)\}$ . Fig. 10(b) shows the procedure of computing the Sig-Matrix, where the edges with the same color are partitioned into the same group.

4.2.2 MPTree

MPTree is a modified prefix tree used to compact the sets of bounding paths corresponding to the edges within the same partitioned set by LSH. Prior to constructing MPTree, we first sort the bounding paths in each set in descending order based on their frequency. Next, we initialize MPTree as a root node, and then insert each edge and its bounding paths (as nodes) into the MPTree with the following procedure.

(1) For an edge $e_{i,j}$ and the set of its bounding paths $\mathcal{P}_{i,j}$ = $\{p_{0},\cdots,p_{l}\}$ to be inserted into MPTree, we first combine them as a sequence ${L}={\langle p_{0},\cdots,p_{l},e_{i,j}\rangle}$ , and then identify the longest matching prefix of ${L}$ in the MPTree, denoted by $\tilde{L}$ . Please note that $\tilde{L}$ does not need to start from the root but possibly from any node in the MPTree.

(2) If $\tilde{L}$ exists, the remaining part of ${L}$ will be directly appended to $\tilde{L}$ . Otherwise, ${L}$ will be inserted at the root. Here, $p_{j}$ ( $j\in[0,l]$ ) and $e_{i,j}$ are called normal and tail nodes respectively.

(3) When $\tilde{L}$ is inserted, the root node will record $e_{i,j}$ that has a reference to the tail node $e_{i,j}$ ; meanwhile, the tail node $e_{i,j}$ keeps $|\mathcal{P}_{i}|$ (i.e., the size of $\mathcal{P}_{i}$ ) that is used to efficiently identify the bounding paths set $\mathcal{P}_{i,j}$ for $e_{i,j}$ in the MPTree.

After all edges and their bounding path sets are inserted, the construction of MPTree is complete. Since each partitioned set of edges corresponds to a MPTree and a subgraph usually has multiple partitioned sets, we thus merge these MPTrees into a Global MPTree (G-MPTree for short) for the subgraph. In the merging process, we add a common parent node for the root of each MPTree as the root of G-MPTree. The root of G-MPTree records the edges kept by each of its child nodes. Fig. 11 shows a G-MPTree that consists of two MPTrees corresponding to the green and purple groups of edges shown in Fig. 10(b).

4.3 Maintenance of lower bound distances

When the weight of an edge $e_{i,j}$ changes by $\Delta w$ , we update the lower bound distances as follows. First, we identify the bounding paths covering $e_{i,j}$ based on G-MPTree. In particular, we first visit the root of G-MPTree to identify the subtree containing $e_{i,j}$ , and then locate the tail node $e_{i,j}$ in the subtree based on the reference to $e_{i,j}$ kept by the root. Subsequently, we traverse upward $|\mathcal{P}_{i}|$ steps from $e_{i,j}$ , where the visited normal nodes form $\mathcal{P}_{i}$ , i.e., the set of bounding paths containing $e_{i,j}$ . Finally, as discussed in Section 4.1, we update the actual and bound distances of the bounding paths in $\mathcal{P}_{i}$ . After that, the lower bound distances between the corresponding boundary vertices as specified in Section 4.1 can be efficiently updated.

5 KSP-DG algorithm

In this section, we first discuss the theoretical basis of KSP-DG (Section 5.1), and then present the detailed algorithm (Section 5.2). Subsequently, we propose a progressive Yen’s algorithm to optimize KSP-DG ( Section 5.3) and prove its correctness (Section 5.4). For clarity, we use ${P_{i}}(s,t)$ and ${P^{\lambda}_{j}}(s,t)$ to denote the $i^{th}$ and the $j^{th}$ shortest path in the original graph $G$ and the skeleton graph $G_{\lambda}$ respectively. For a given query $q(v_{s},v_{t})$ , we assume $v_{s}$ and $v_{t}$ are both boundary vertices in $G$ for ease of presentation, which means both of them are in $G_{\lambda}$ . Cases where $v_{s}$ and $v_{t}$ are not boundary vertices in $G$ are discussed in Section 5.

5.1 Underpinnings of KSP-DG

For a given query $q(v_{s},v_{t})$ , the basic idea of KSP-DG is to use the shortest paths between $v_{s}$ and $v_{t}$ in $G_{\lambda}$ one by one (in increasing order of distance) as a reference path to identify the corresponding shortest paths in $G$ that have the same sequence of boundary vertices. This iterative process continues until the KSPs for the query are found. For this to succeed, one key observation is that the path between $v_{s}$ and $v_{t}$ in $G_{\lambda}$ is not longer than the path linking $v_{s}$ and $v_{t}$ in $G$ with the same sequence of boundary vertices. which is formally presented as follows.

Lemma 1.

Given two boundary vertices $v_{i}$ and $v_{j}$ in $G_{\lambda}$ , if the shortest path between $v_{i}$ and $v_{j}$ in $G_{\lambda}$ , ${P^{\lambda}_{1}}(i,j)$ , contains only $v_{i}$ and $v_{j}$ , then $D({P^{\lambda}_{1}}(i,j))\leq D({P_{1}}(i,j))$ , where $D({P^{\lambda}_{1}}(i,j))$ and $D({P_{1}}(i,j))$ are the shortest distances between $v_{i}$ to $v_{j}$ in $G_{\lambda}$ and $G$ respectively.

Proof.

Assume for the sake of contradiction that $D({P^{\lambda}_{1}}(i,j))$ $>$ $D({P_{1}}(i,j))$ . By definition, the weight of edge ( $v_{i}$ , $v_{j}$ ) in $G_{\lambda}$ is the minimum lower bound distance between $v_{i}$ and $v_{j}$ , which is not greater than $D({P_{1}}(i,j))$ . Therefore, if $D({P^{\lambda}_{1}}(i,j))$ $>$ $D({P_{1}}(i,j))$ , then there must exist one or more vertices between $v_{i}$ and $v_{j}$ in ${P^{\lambda}_{1}}(i,j)$ , which contradicts the initial assumption. ∎

Theorem 2.

$\forall v_{s},v_{t}\in G_{\lambda}$ , $D({P^{\lambda}_{1}}(s,t))\leq D(P_{1}(s,t))$ .

Proof.

For the shortest path $P_{1}(s,t)$ between $v_{s}$ and $v_{t}$ in $G$ , there must be a corresponding path in graph $G_{\lambda}$ with the same sequence of boundary vertices as those present in $P_{1}(s,t)$ ; let us denote this path by ${P^{\lambda}_{f}}(s,t)$ . For any pair of adjacent vertices $v_{i}$ and $v_{j}$ in this sequence of boundary vertices (i.e., there exists an edge $(v_{i},v_{j})$ in ${P^{\lambda}_{f}}(s,t)$ ), it follows from Lemma 1 that the distance between $v_{i}$ and $v_{j}$ in ${P^{\lambda}_{f}}(i,j)$ cannot be greater than the distance connecting $v_{i}$ and $v_{j}$ in $P_{1}(i,j)$ . Hence, $D({P^{\lambda}_{f}}(s,t))\leq D(P_{1}(s,t))$ . If ${P^{\lambda}_{f}}(s,t)$ is the shortest path between $v_{s}$ and $v_{t}$ in $G^{\lambda}$ , then we immediately have $D({P^{\lambda}_{1}}(s,t))\leq D(P_{1}(s,t))$ ; otherwise, there must exist a shortest path ${P^{\lambda}_{1}}(s,t)$ such that $D({P^{\lambda}_{1}}(s,t))\leq D({P^{\lambda}_{f}}(s,t))$ . As $D({P^{\lambda}_{f}}(s,t))\leq D(P_{1}(s,t))$ , we have $D({P^{\lambda}_{1}}(s,t))$ $\leq$ $D(P_{1}(s,t))$ . ∎

5.2 KSP-DG Algorithm in Detail

KSP-DG is designed to run in a distributed cluster with the master-worker model. The master node maintains the original graph $G$ , receives the weight updates for edges in $G$ , and serves as the entry point of KSP queries. DTLP, as the fundamental index structure, is distributed across workers. In particular, the subgraphs resulting from partitioning $G$ are allocated to different workers on a many-to-one basis based on their load. Each worker maintains the assigned subgraphs as well as the bounding paths between boundary vertices (i.e., the first level of DTLP) within each subgraph. Moreover, a copy of the skeleton graph $G_{\lambda}$ (the second level of the DTLP index) is also kept on each worker.

Using the DLTP index, KSP-DG adopts a filter-and-refine strategy to iteratively find KSPs for the query $q(v_{s},v_{t})$ . Each iteration of KSP-DG consists of two steps: filter and refine. Without loss of generality, we describe the two steps for the $i^{th}$ iteration.

In the filter step, we compute the $i^{th}$ shortest path between $v_{s}$ and $v_{t}$ in the skeleton graph $G_{\lambda}$ , and use it as the reference path. This path corresponds to a sequence of boundary vertices, denoted by $\mathcal{R}$ , between $v_{s}$ and $v_{t}$ in $G$ . This step can be executed on one of the workers responsible for processing this query.

In the refine step, we aim to compute the corresponding k shortest paths connecting $v_{s}$ and $v_{t}$ in $G$ that traverse the same sequence of boundary vertices as those present in the reference path. Since any two adjacent boundary vertices in $\mathcal{R}$ must belong to the same subgraph, we can identify partial k shortest paths for each pair of adjacent boundary vertices in $\mathcal{R}$ from the corresponding subgraphs individually. This operation can be carried out by the workers maintaining those respective subgraphs in parallel. Next, the generated partial k shortest paths are reported back by these workers to the worker responsible for this query, which will then merge the partial k shortest paths received to form $k$ complete shortest paths, which all share the same sequence of boundary vertices as the reference path.

In each iteration, the generated k shortest paths corresponding to the reference path, called the candidate KSPs, are used to update a list $\mathcal{L}$ of the shortest paths that have been obtained so far. $\mathcal{L}$ keeps only k shortest paths found so far in ascending order of distance. After using the generated candidate KSPs to update $\mathcal{L}$ in the $i^{th}$ iteration, if the distance of the $k^{th}$ path in $\mathcal{L}$ is not greater than that of the reference path generated in the $(i+1)^{th}$ iteration, the algorithm terminates, and the paths in $\mathcal{L}$ are the final answer, i.e., the KSPs between $v_{s}$ and $v_{t}$ in $G$ . Otherwise, the iteration continues.

Algorithm 1 shows the procedure of KSP-DG. It first initializes the parameters in Line 1, and then executes the filter and refine steps in each iteration in Lines 2-12, where $P^{\lambda}_{i}(s,t)$ denotes the $i^{th}$ reference path. The function candidateKSP is to identify the candidate KSPs for a given reference path, and its pseudo-code is shown in Algorithm 2. Line 6 in Algorithm 2 uses a Progressive Yen’s (PYen’s for short) Algorithm that will be discussed in Section 5.3.2 to compute the $k$ -shortest paths in subgraph SG between the $j^{th}$ and $(j+1)^{th}$ vertices of the reference path.

G_{\lambda}

q(v_{s},v_{t})

\mathcal{S}

\{{SG}_{1}

\cdots

SG_{n}\}

;

0: KSPs from

v_{s}

v_{t}

G

;

\mathcal{L}=\phi

;

Dist=\infty

;

i=1

;

2: while

\mathcal{L}=\phi

||

Dist\leq D({P^{\lambda}_{i+1}}(s,t))

{\mathcal{C}}=

candidateKSP (

\mathcal{S}

{P^{\lambda}_{i}}(s,t)

);

4: Add

{\mathcal{C}}

into

{\mathcal{L}}

;

5: if

|{\mathcal{L}}|>k

then

6: Keep the k shortest paths in

\mathcal{L}

and remove others;

7: end if

8: Dist = the distance of the

k^{th}

path in

\mathcal{L}

;

i++

;

10: end while

11: return

\mathcal{L}

;

Algorithm 1 KSP-DG

\mathcal{S}

\{{SG}_{1}

\cdots

SG_{n}\}

P^{\lambda}_{i}(s,t)

;

0: Candidate KSPs from

v_{s}

v_{t}

G

;

1: Set

\mathcal{C}=\phi

; Set

\mathcal{Y}=\phi

;

j

=1;

2: while

j<|P^{\lambda}_{i}(s,t)|

3: Identify

\mathcal{U}

, the set of subgraphs containing the

j^{th}

and

(j+1)^{th}

vertices in

P^{\lambda}_{i}(s,t)

\texttt{v}_{j}

and

\texttt{v}_{j+1}

;

\mathcal{Y}=\phi

;

5: for all subgraph

SG

\in

\mathcal{U}

\mathcal{Y}

\mathcal{Y}

\cup

PYen(

\texttt{v}_{j}

\texttt{v}_{j+1}

SG

);

7: end for

8: Keep only

k

shortest paths in

\mathcal{Y}

;

\mathcal{C}

\mathcal{C}

\Join

\mathcal{Y}

;

10: Keep only

k

shortest paths in

\mathcal{C}

;

11:

j++

;

12: end while

13: Return

\mathcal{C}

;

Algorithm 2 candidateKSP

Example 5.

Suppose $v_{4}$ and $v_{13}$ in Fig. 2 are the source and destination vertices respectively, and $k=2$ . In the first iteration, KSP-DG identifies the first reference path from $v_{4}$ to $v_{13}$ in $G_{\lambda}$ (shown as Fig. 6) as ${P^{\lambda}_{1}(4,13)}=\langle v_{4},v_{6},v_{9},v_{13}\rangle$ with distance 19. Next, KSP-DG computes $k=2$ shortest paths between any two adjacent boundary vertices as shown in the following table, where the third column shows the subgraphs involved. Then, KSP-DG joins the partial shortest paths to generate $k=2$ candidate shortest paths from $v_{4}$ to $v_{13}$ , denoted by ${P_{1}}=\langle v_{4},v_{6},v_{9},v_{11},v_{12},v_{13}\rangle$ with distance 19 and ${P_{2}}=\langle v_{4},v_{5},v_{6},v_{9},v_{11},v_{12},v_{13}\rangle$ with distance 25. So, $\mathcal{L}=\{{P_{1}},{P_{2}}\}$ .

adjacent boundary vertices	partial shortest paths	involved subgraphs
$(v_{4}$ , $v_{6})$	$\langle v_{4},v_{5},v_{6}\rangle$ , $\langle v_{4},v_{6}\rangle$	$SG_{1}$ , $SG_{2}$
$(v_{6}$ , $v_{9})$	$\langle v_{6},v_{9}\rangle$	$SG_{2}$
$(v_{9}$ , $v_{13})$	$\langle v_{9},v_{11},v_{12},v_{13}\rangle$ $\langle v_{9},v_{11},v_{10},v_{14},v_{13}\rangle$	$SG_{3}$

Since the second reference path is ${P^{\lambda}_{2}}(4,13)$ = $\langle v_{4},v_{9},v_{13}\rangle$ with distance 22, and $D(P_{2})>D({P^{\lambda}_{2}}(4,13))$ , KSP-DG continues onto the second iteration, where it calculates candidate KSPs w.r.t. ${P^{\lambda}_{2}}(4,13)$ , denoted by $P_{3}$ = $\langle v_{4},v_{7},v_{8},v_{9},v_{11},v_{12},v_{13}\rangle$ with distance 22, and $P_{4}$ = $\langle v_{4},v_{7},v_{8},v_{9},v_{11},$ $v_{10},v_{14},v_{13}\rangle$ with distance 34. Then $\mathcal{L}$ is updated to $\{P_{1},P_{3}\}$ . After identifying the third reference path $P^{\lambda}_{3}(4,13)$ = $\langle v_{4},v_{6},v_{10},v_{13}\rangle$ with distance 25, it is safe for KSP-DG to conclude that $P_{1}$ and $P_{3}$ are the two shortest paths from $v_{4}$ to $v_{13}$ , as $D(P_{3})<Dist(P^{\lambda}_{3}(4,13))$ .

Finding KSPs in directed graphs. KSP-DG can be adapted for KSP queries in directed graphs with minor DTLP index modifications. Instead of computing a single lower bound distance for a pair of boundary vertices $v_{i}$ and $v_{j}$ , we maintain two sets of bounding paths and their corresponding lower bound distances, one for each direction (from $v_{i}$ to $v_{j}$ and vice versa). This modification transforms the skeleton graph into a directed one with two edges connecting each adjacent vertex pair, enabling KSP-DG to operate effectively for KSP search.

Non-boundary Vertices as Source or Destination. In scenarios where $v_{s}$ and/or $v_{t}$ are not boundary vertices, we address this by initially treating them as boundary vertices within subgraph $SG_{x}$ and $SG_{y}$ respectively. This tactical maneuver involves connecting $v_{s}$ to every boundary vertex $v_{i}$ in $SG_{x}$ and $v_{t}$ to every boundary vertex $v_{j}$ in $SG_{y}$ within the skeleton graph $G_{\lambda}$ . These connections are made with edge weights set to the minimum lower bound distance between $v_{s}$ and $v_{i}$ , and similarly for $v_{t}$ . Afterward, both $v_{s}$ and $v_{t}$ are incorporated into $G_{\lambda}$ , and the KSP-DG process, as outlined in Algorithm 1, is executed as if $v_{s}$ and $v_{t}$ were boundary vertices.

5.3 Optimization on Computing Partial KSPs

In the refinement step of KSP-DG, a computationally intensive yet frequently used operation involves calculating partial KSPs for specified pairs of boundary vertices within subgraphs. Typically, this task is performed using Yen’s algorithm, which can be inefficient. To address this inefficiency, we introduce PYen, a Progressive Yen’s algorithm designed to expedite this computation. In this section, we first provide an overview of Yen’s algorithm and then delve into the optimizations introduced in PYen.

5.3.1 Yen’s algorithm

In Yen’s algorithm, given source and destination vertices $v_{s}$ and $v_{t}$ in a subgraph $SG$ , it iteratively identifies the $(i+1)^{th}$ shortest path $P_{i+1}(s,t)$ based on the previously determined $i^{th}$ shortest path $P_{i}(s,t)$ ( $1\leq i<k$ ). The algorithm computes a deviation path $P^{l}i(s,t)$ for each vertex $v_{l}$ (excluding $v_{t}$ ) in $P{i}(s,t)$ , referred to as a deviation vertex. A deviation path $P^{l}i(s,t)$ comprises two partial paths: the base partial path from $v_{s}$ to $v_{l}$ following $P{i}(s,t)$ and the spur partial path, which represents the shortest path from $v_{l}$ to $v_{t}$ in the subgraph with the weight of edge $e_{l,l+1}$ set to infinity. Here, $v_{l}$ and $v_{l+1}$ are adjacent vertices in ${P_{i}}(s,t)$ . Since the base partial path is already known, the focus is on computing the spur partial path, typically done using existing shortest path algorithms like Dijkstra. Once all deviation paths of ${P_{i}}(s,t)$ are determined, the shortest one among them becomes the next shortest path $P_{i+1}(s,t)$ . This process continues until the KSPs between $v_{s}$ and $v_{t}$ are determined.

5.3.2 PYen algorithm

PYen has the following optimizations over Yen’s algorithm.

Parallel deviation path identification. Following insights from Para-Yen[28] that parallelized deviation path computation, PYen improves Yen’s efficiency by initiating separate search instances to compute spur partial paths from each vertex in $P_{i}(s,t)$ to the destination concurrently. This fully parallelized approach yields significant speedup over Yen’s algorithm, particularly when ample computing resources, such as CPU cores, enable greater parallelism. In our distributed solution, where computing resources are already stretched with three parallel levels for processing KSP queries, computing partial KSPs, and identifying deviation paths, PYen shifts focus from single shortest path parallelization, as in Para-Yen[28], to optimizing search efficiency by reducing redundant computations and early termination of unpromising deviation path exploration.

Avoiding repetitive computation. Yen’s algorithm redundantly computes the same shortest paths for spur partial paths, whereas PYen mitigates this by reusing previously computed shortest paths. Unlike prior work, such as Feng’s algorithm [29], which precomputes and indexes the shortest paths to the destination vertex, PYen addresses dynamic graphs where precomputed paths often become invalid due to changing edge weights. Furthermore, PYen tackles the unique challenge of reusing previously identified shortest paths in parallel deviation path computation, a departure from the sequential processing in Feng’s algorithm [29].

Challenge: In Yen’s algorithm, each deviation vertex must find a loop-free spur path by avoiding specific impassable edges. To achieve this, each deviation vertex corresponds to a residual subgraph, created by excluding impassable edges from the original subgraph, facilitating the computation of the corresponding spur path. However, disparities in the shortest paths between the same vertices in different residual subgraphs hinder straightforward reuse. Specifically, for any two deviation vertices, the one with more impassable edges involves a residual subgraph nested within the other’s residual subgraph. Due to parallel deviation path computation, the spur path of the vertex with more impassable edges is often computed before the other. Consequently, the shortest paths computed in its subgraph cannot be directly applied to the other, as they tend to be longer in the former’s subgraph.

Solution: To tackle this challenge, we index only those shortest paths that are entirely consistent in both the residual subgraph they are computed and the original subgraph. Such paths inherently cannot be shorter within any other subgraph version and can thus be used for computing the spur path from any deviation vertex, provided it lacks impassable edges related to the spur path. To facilitate this approach, PYen maintains a lightweight index consisting of a distance array $\mathcal{A}_{D}$ and a path array $\mathcal{A}_{P}$ . These arrays store identified shortest distances and paths from deviation vertices to $v_{t}$ , with sizes matching the vertex count of the subgraph. We sort the vertices in the subgraph based on their identifiers in ascending order and arrange the items in $\mathcal{A}_{D}$ and $\mathcal{A}_{P}$ so that each item corresponds to an vertex in the same order. For a vertex $v_{i}$ ( $v_{i}\neq v_{t}$ ) in the subgraph, we let $\mathcal{A}_{D}[v_{i}]$ and $\mathcal{A}_{P}[v_{i}]$ represent the values of $v_{i}$ in $\mathcal{A}_{D}$ and $\mathcal{A}_{P}$ respectively, where $\mathcal{A}_{D}[v_{i}]$ is the shortest distance from $v_{i}$ to $v_{t}$ , while $\mathcal{A}_{P}[v_{i}]$ keeps the identifier of the vertex next to $v_{i}$ in the shortest path from $v_{i}$ to $v_{t}$ .

All values in $\mathcal{A}_{D}$ and $\mathcal{A}_{P}$ are initialized to $\infty$ and null respectively. Once the shortest path $P(i,t)$ from $v_{i}$ to $v_{t}$ is identified and identical with the shortest path from $v_{i}$ to $v_{t}$ in $SG$ , the shortest distances from all vertices in $P(i,t)$ to $v_{t}$ are determined and stored in $\mathcal{A}_{D}$ . Meanwhile, $P(i,t)$ is inserted into $\mathcal{A}_{P}$ . With $\mathcal{A}_{D}$ and $\mathcal{A}_{P}$ , an search instance first accesses $\mathcal{A}_{D}[v_{h}]$ to detect whether the shortest distance $SD(v_{h},v_{t})$ from $v_{h}$ to $v_{t}$ is known when visiting a vertex $v_{h}$ for computing a spur partial path. If $SD(v_{h},v_{t})$ has been previously determined (i.e., not $\infty$ ), we can easily obtain the shortest path from $v_{h}$ to $v_{t}$ , $P(h,t)$ , from $\mathcal{A}_{P}$ by sequentially obtaining the identifier of the next vertex in $P(h,t)$ from $v_{h}$ . If $P(h,t)$ contains no impassable edges, it can be readily used to expedite the spur path computation; otherwise, the shortest path from $v_{h}$ to $v_{t}$ needs to be recomputed.

Early termination of unpromising deviation paths. Deviation paths that are guaranteed not to be the partial KSPs are pruned early in PYen. Once the $i^{th}$ shortest path $P_{i}(s,t)$ is found and $(k-i)$ shortest paths remain to be computed, we only keep track of the already identified $(k-i)$ shortest deviation paths. We use the $(k-i)^{th}$ deviation path ( $P^{k-i}$ ) as a reference to prune deviation paths that show no promise. Specifically, if the current distance of a deviation path being explored for $P_{i}(s,t)$ exceeds that of $P^{k-i}$ before it is fully generated, we safely discard it. In contrast, deviation paths with a smaller distance than $P^{k-i}$ are utilized to update the maintained $(k-i)$ shortest deviation paths. After all deviation paths of $P_{i}(s,t)$ have been processed in this way, the shortest one among the maintained $(k-i)$ deviation paths is the next shortest path $P_{i+1}(s,t)$ .

5.4 Analysis of KSP-DG

Correctness. The KSP-DG algorithm is provably correct, as shown below.

Lemma 2.

Let ${P^{\lambda}_{i}}(s,t)$ be the $i^{th}$ reference path from $v_{s}$ to $v_{t}$ and $\mathcal{C}_{i}$ be the set of candidate KSPs w.r.t. ${P^{\lambda}_{i}}(s,t)$ . We have $\forall P_{i}(s,t)\in\mathcal{C}_{i}$ , $D(P^{\lambda}_{i}(s,t))$ $\leq$ $D(P_{i}(s,t))$ .

Proof.

Let $\mathcal{S}_{i}$ denote the sequence of boundary vertices on ${P^{\lambda}_{i}}(s,t)$ in graph $G$ , where $v_{s}$ and $v_{t}$ are viewed as boundary vertices. Since $P_{i}(s,t)\in\mathcal{C}_{i}$ , the sequence of boundary vertices on $P_{i}(s,t)$ is the same as $\mathcal{S}_{i}$ in graph $G$ . For any two adjacent boundary vertices $v_{j}$ and $v_{j+1}$ in $\mathcal{S}_{i}$ , based on Lemma 1, we infer that the shortest distance between $v_{j}$ and $v_{j+1}$ in skeleton graph $G_{\lambda}$ is not greater than their shortest distance in graph $G$ . Accumulating the shortest distances of all pairs of adjacent vertices in $\mathcal{S}_{i}$ on $G_{\lambda}$ and $G$ , we have $D(P^{\lambda}_{i}(s,t))$ $\leq$ $D(P_{i}(s,t))$ . ∎

Theorem 3.

In the $i^{th}$ ( $i\geq 1$ ) iteration of KSP-DG, if $D(P_{k})\leq D({P^{\lambda}_{i+1}}(s,t))$ , where $P_{k}$ is the $k^{th}$ path in $\mathcal{L}$ , then the paths in $\mathcal{L}$ are KSPs from $v_{s}$ to $v_{t}$ in $G$ , .

Proof.

Suppose for the sake of contradiction that there exists a path ${P_{f}}(s,t)$ with a smaller distance than $P_{k}$ and not in $\mathcal{L}$ . ${P_{f}}(s,t)$ is thus not a candidate shortest path generated based on any reference path ${P^{\lambda}_{j}}(s,t)$ ( $j\in[1,i]$ ). Therefore, there must be another reference path ${P^{\lambda}_{f}}(s,t)$ that matches ${P_{f}}(s,t)$ and $D({P^{\lambda}_{f}}(s,t))\geq D({P^{\lambda}_{i+1}}(s,t))$ . Based on the given condition $D(P_{k})\leq D({P^{\lambda}_{i+1}}(s,t))$ , we can infer that $D(P_{k})\leq D({P^{\lambda}_{f}}(s,t))$ . Because $D({P^{\lambda}_{f}}(s,t))\leq D({P_{f}}(s,t))$ follows from Lemma 2, we have $D(P_{k})\leq D({P_{f}}(s,t))$ , which contradicts the initial assumption. ∎

Computation Cost. KSP-DG comprises two primary operations: (1) identifying reference paths and (2) computing candidate KSPs for each reference path. In Operation (1), the worker computes a total of $r$ reference paths with a time complexity of $O(r(|\mathcal{E}_{\lambda}|+|\mathcal{V}_{\lambda}|)\log{|\mathcal{V}_{\lambda}|})$ . Here, $|\mathcal{V}_{\lambda}|$ and $|\mathcal{E}_{\lambda}|$ refer to the vertex and edge counts in the skeleton graph $G_{\lambda}$ . In Operation (2), during each KSP-DG iteration, partial KSPs are computed between adjacent boundary vertices in $P_{\lambda}$ . As such, there are ${|P_{\lambda}|}-1$ pairs of adjacent boundary vertices to process in each iteration, resulting in a computation complexity of $O(k\cdot{|P_{\lambda}|}\cdot(|\mathcal{V}_{A}|+|\mathcal{E}_{A}|)\log{|\mathcal{V}_{A}|})$ per iteration, where $|\mathcal{V}_{A}|$ and $|\mathcal{E}_{A}|$ represent the maximum number of vertices and edges of each relevant subgraph. Given that this operation is distributed across $w$ workers and executed $r$ times, the overall computation cost for KSP-DG is expressed as: $O(r(|\mathcal{E}_{\lambda}|+|\mathcal{V}_{\lambda}|)\log{|\mathcal{V}_{\lambda}|})$ + $O(k\cdot r\cdot(|\mathcal{V}_{A}|+|\mathcal{E}_{A}|)\log{|\mathcal{V}_{A}|}\cdot\left\lceil\frac{|P_{\lambda}|}{w}\right\rceil$ ).

6 Experiments

6.1 Implementation of KSP-DG on Storm

We implement KSP-DG on Apache Storm[30], a popular distributed stream processing framework, to evaluate its performance. Following the Storm paradigm, KSP-DG is designed as a “topology” of a directed acyclic graph with Spouts and Bolts as nodes of this graph. In Storm, spouts and bolts, as logical processors, are connected with streams, where each stream is an unbounded sequence of tuples containing the data to be processed. A spout acts as a source of streams in a topology, and every bolt processes the tuples according to a user-specified logic.

In the KSP-DG topology, only one Spout is set to receive edge weight updates and new KSP queries. This Spout maintains the partitioning of subgraphs to assist itself in sending edge weight update directly to corresponding subgraphs. SubgraphBolts are responsible for maintaining a set of subgraphs, while QueryBolts are in charge of executing KSP queries, where each query is processed by one unique QueryBolt.

Figure 12 presents the procedure of processing $q(v_{s},v_{t})$ by KSP-DG on Storm. When query $q$ arrives at the Spout, we first detect whether $v_{s}$ and $v_{t}$ are boundary vertices. If not, they will be processed by Step 1 first; otherwise, $q$ will be directly processed by Step 2.

Step 1 : The Spout sends $v_{s}$ and $v_{t}$ to the SubgraphBolt(s) that maintains the subgraph(s) covering $v_{s}$ and $v_{t}$ respectively. Then each of the relevant SubgraphBolt(s) computes the lower bound distances from $v_{s}$ and $v_{t}$ to each boundary vertex in the subgraph(s) covering $v_{s}$ and $v_{t}$ respectively. Next, those identified lower bound distances as well as $v_{s}$ and $v_{t}$ are delivered to every QueryBolt, which inserts $v_{s}$ and $v_{t}$ into the skeleton graph $G_{\lambda}$ using the principle discussed in Section 5.

Step 2 : The Spout assigns $q$ to a QueryBolt $QB_{i}$ , which computes a reference path $P^{\lambda}$ for this query, and then broadcasts $P^{\lambda}$ and $q$ to all SubgraphBolts. Next, every SubgraphBolt identifies the subgraphs within those it maintains that cover a pair of two adjacent vertices in $P^{\lambda}$ , and then generates partial KSPs between each pair of vertices from each such subgraph. These partial KSPs and $q$ are then returned to ${QB}_{i}$ . After receiving all partial KSPs related to $P^{\lambda}$ , ${QB}_{i}$ joins the received partial paths to generate candidate KSPs corresponding to $P^{\lambda}$ . These candidate KSPs are used to update the list of paths obtained so far. When the identified KSPs satisfy the terminating condition, ${QB}_{i}$ outputs the final result; otherwise, it generates the next reference path and continues onto the next iteration.

6.2 Experiment Setup and Datasets

The system is deployed on a cluster of 10 servers from a public cloud service provider, and each server has a quadcore CPU of 2.5GHz and 32GB memory. These servers are connected via Ethernet with a bandwidth of 1Gbps. We use as the datasets four real road networks with travel times from New York, Colorado, Florida, and Central USA [31], which are directed graphs, denoted by NY, COL, FLA, and CUSA respectively. The number of vertices and edges in these graphs are given in Table I, along with the number of subgraphs and skeleton graphs that can be obtained when $z$ (the maximum subgraph size) takes on their typical values. Moreover, we employ 20 hash functions with in the form of $h_{i}(r)=(a_{i}\cdot r+1)\bmod c$ ( $i\in[1,20]$ ) in LSH to partition EBP-II and the number of bands $b$ is set to 2 as discussed in Section 4.2.1. Notably, $c$ is the maximum prime number based on the the number of rows in PE-Matrix, and $a_{i}$ sequentially takes prime numbers ranging from 2 to 71, totaling 20 prime numbers.

TABLE I: Statistics on the Road Network Datasets

road network	#vertices	#edges	z	#subgraphs ( $n_{b}>$ 5)	#vertices of $G_{\lambda}$
NY	264,346	733,846	200	4,173 (1,586)	24461
COL	435,666	1,057,066	200	8,001 (2,004)	27,665
FLA	1,070,376	2,712,798	500	13,701 (3,682)	52,640
CUSA	14,081,816	34,292,496	1000	121,725 (18,251)	514,618

We use the (normalized) travel time on each road in the road networks as the edge weights in the graphs. Since only one snapshot of the travel times in each road network is given in the dataset, we adopt a well-established model [32] to dynamically vary the travel time in each road to simulate real-world traffic conditions. We use $\alpha$ to represent the percentage of edges that change weights at each snapshot, and $[-\tau,\tau]$ to denote the range of weight variation. We apply identical changes to the weights of the two edges in the opposite direction between a pair of vertices to simulate varying undirected graphs; for CUSA, we also experiment with the case where the weights of the opposite edges change independently to simulate varying directed graphs. All results shown are the average of 20 runs on the cluster of 10 servers unless otherwise specified.

6.3 Evaluation of DTLP

We evaluate the influence of parameters $z$ , $\xi$ , $\alpha$ , and $\tau$ on the performance of DTLP, which are summarized in Table II. The subgraph size $z$ significantly impacts the scale of the skeleton graph ${|G^{\lambda}|}$ , the number of relevant subgraphs for each query ( $r$ ), and the computation cost of partial KSPs within a relevant subgraph ( $t_{r}$ ). Ideally, $z$ should be relatively small to ensure that $r$ for each query exceeds the number of distributed workers ( $w$ ), facilitating effective parallel computation of partial KSPs and reducing $t_{r}$ . However, setting $z$ excessively small is unwise. An overly small $z$ increases $|G^{\lambda}|$ , elevates the cost of computing the reference path based on $G^{\lambda}$ , and results in an excess of relevant subgraphs for each query, exceeding $w$ and inflating scheduling costs. In light of these considerations, it is advisable to set $z$ to be three or four orders of magnitude smaller than the graph size, particularly for graphs containing hundreds of thousands of vertices.

TABLE II: Summary of Parameters Used in Evaluation

Parameters

Meaning

Default value

z

size of subgraph

100 (NY), 200 (COL) 400 (FLA), 1000 (CUSA)

\xi

number of bounding paths between

a pair of boundary vertices

\alpha

percentage of edges changing weights at each snapshot

50%

\tau

range of edge weight variation

50%

6.3.1 Construction Cost

Figures 15(a)-15(d) depict the time and memory usage to build DTLP for NY, COL, FLA, and CUSA varying the values of $z$ . Building time first decreases and then increases as $z$ grows. This is because the number of subgraphs is reduced when the road network is partitioned into larger subgraphs, resulting in less subgraphs being assigned to each server, so the total building time declines. When $z$ grows beyond a certain threshold (e.g., $z$ =100 for NY), however, this decrease is outweighed by the increase in the number, as well as, the average length of bounding paths in each subgraph. For the same reason, memory consumption by MPTree and the skeleton graph shows a trend similar to that of building time. Moreover, we compare the time for building DTLP in the directed and undirected graphs using CUSA. Fig. 15(d) shows that the building time for the directed graph is double that of the undirected graph, as two sets of bounding paths (in opposite directions) for each pair of boundary vertices have to be computed in directed graphs. For the same reason, the maintenance cost of DTLP for directed graphs is also almost doubled, as shown in Fig. 14(d). Fig. 15(e) depicts the memory consumption comparison of MPTree and EBP-II on NY. MPTree consumes significantly less memory than EBP-II due to the compaction of duplicate bounding paths, while its maintenance cost is merely a little greater than that of EBP-II, as shown in Fig. 14(e).

We further evaluate the influence of the size of a graph on the building time of DTLP. For this purpose, we choose five subgraphs from COL with 50K, 100K, 150K, 200K, and 250K vertices respectively, and use $N_{g}$ to denote their sizes. We measure the time for building DTLP for these selected graphs, and the results are shown in Fig. 14(a) (left vertical axis). Apparently, the construction cost of DTLP increases almost linearly with the size of the graph, as the cost of computing bounding paths is roughly proportional to $N_{g}$ .

6.3.2 Maintenance Cost

We study the trend of the maintenance cost of DTLP on graphs of varying sizes. We change the weights of half of the edges in each graph. Therefore, the number of varying weights is directly proportional to $N_{g}$ , the size of the graph examined. As shown in Fig. 14(a) (right vertical axis), there is an approximately linear ascending trend in the maintenance time with the size of a graph, as the maintenance time is heavily affected by the number of the varying weights which is proportional to $N_{g}$ .

We further evaluate the time required to update DTLP with different values of $\alpha$ and $\xi$ , and the results are shown in Fig. 14(b) and Fig. 14(c) respectively. In these two groups of experiments, all weight updates are fed into the system as a batch, and the maintenance time of DTLP is measured as the time between receiving the new weights and finishing updating the DTLP index. Fig. 14(b) shows an ascending trend of the maintenance cost w.r.t. $\xi$ , as larger values of $\xi$ cause more bounding paths with bound distances to be updated, resulting in higher maintenance costs. However, the rate of growth slows down when $\xi$ exceeds a certain value (e.g., $\xi$ =15 in FLA) because the number of bounding paths in some subgraphs stops increasing when $\xi$ is large enough. Fig. 14(c) shows an ascending trend with increasing $\alpha$ , as more weights need to be processed.

6.4 Evaluation of KSP-DG

Our next task is to study the impact of different parameters ( $z$ , $\alpha$ , $\xi$ , and $k$ ) on the performance of KSP-DG. The default value of $k$ is 2, but we vary it as needed for our evaluation.

6.4.1 Number of Iterations

The number of iterations required by KSP-DG with varying values of $\xi$ , $\alpha$ , $k$ , and $\tau$ are shown in Figures LABEL:iterations-xi-LABEL:iterations-varying-rate. $k$ is set to 50 for effective measuring the the number of iterations as the variation is more apparent when $k$ takes larger values. Fig. LABEL:iterations-xi shows that the number of iterations significantly decreases with increasing $\xi$ , which is as expected, because when more bounding paths are indexed, it narrows the gap between the lower bound distance and the actual shortest distance for the given pair of vertices and thus reduces the number of iterations needed. However, as a higher construction and maintenance cost of DTLP is associated with a larger $\xi$ , the value of $\xi$ has to be chosen in a way that balances the processing time of KSP-DG and the construction/maintenance time of DTLP.

As shown in Fig. LABEL:iterations-varying-range, the number of iterations increases with growing $\tau$ (the varying range of weights). The reason is that, greater variation in the weights would loosen the lower bound distance and thus weaken the pruning power of the skeleton graph $G_{\lambda}$ . Moreover, greater values of k would incur more iterations in KSP-DG, as shown in Fig. LABEL:iterations-k. But the good news is that the rate of increase is very small when k is not large (i.e., $k<30$ ), which should be sufficient for most applications. The influence of $\alpha$ , as shown in Fig. LABEL:iterations-varying-rate, differs from one dataset to another, implying that its effect may depend on the particular distribution of edges with varying weights. However, it is apparent that the numbers of iterations for all the datasets are small when the weights of the graph are not changing dramatically (i.e., $\alpha<30\%$ ).

TABLE III: Number of Vertices in Skeleton Graph

G_{\lambda}

with Varying

z

NY,COL FLA	z=100 z=350	z=150 z=400	z=200 z=450	z=250 z=500	z=300 z=550
$G_{\lambda}$ (NY)	32,534	27,668	24,461	22,604	20,775
$G_{\lambda}$ (COL)	36,831	30,886	27,655	25,329	23,271
$G_{\lambda}$ (FLA)	60,125	57,085	54,695	52,640	50,411

6.4.2 Query Processing Time

Figures 16(a)-16(b) depict the influence of parameters $z$ and $k$ on the query processing time using KSP-DG. In this group of experiments, we randomly generate 1,000 queries ( $N_{q}=1,000$ ), feed them into the system simultaneously, and measure the total processing time of all the queries. As can be observed from the plots, the processing time first decreases and then increases as $z$ grows. This is because as $z$ increases, the number of subgraphs decreases, and so does the number of boundary vertices, which in turn leads to a smaller skeleton graph $G_{\lambda}$ (shown in Table III). Roughly speaking, a smaller $G_{\lambda}$ means fewer vertices in a reference path and fewer subgraphs to be explored in each iteration. Moreover, the cost of generating partial k shortest paths within each subgraph grows very slowly when the size of the subgraph is small. Therefore, the processing time decreases as $z$ increases but is still small. However, when $z$ grows beyond a certain threshold (e.g., 200 for $k=2$ on NY), the cost of computing partial k shortest paths in subgraphs increases significantly and dominates the overall cost. Consequently, the processing time starts to grow as $z$ becomes sufficiently large. In the following discussion, we set the value of $z$ to 200, 200, 500, and 1000 in NY, COL, FLA, and CUSA respectively, unless otherwise specified.

From each figure we also observe that the processing time of KSP-DG increases linearly with $k$ , which is because larger values of $k$ lead to more candidate k shortest paths being generated in each iteration. Moreover, the number of iterations also increases with k, as shown in Figure LABEL:iterations-k.

The scalability of KSP-DG w.r.t. the number of concurrent queries is evaluated, and the results are shown in Fig. 16(c). We generate multiple batches of queries with different batch sizes (number of queries), and feed each batch into the system to measure the total processing time. From the curves in Fig. 16(c), we observe that running time of KSP-DG increases approximately linearly w.r.t. the number of queries with a low rate of growth, benefiting from its distributed paradigm.

Figures 16(d)-16(e) present the impact of $\xi$ and $\tau$ on the running time of KSP-DG. We only display the performance on NY; similar trends are observed on the other datasets. Fig. 16(d) shows that the running time decreases with increasing $\xi$ . This is because a larger $\xi$ leads to a smaller number of iterations, as demonstrated in Fig. LABEL:iterations-xi. This trend is more apparent when k takes larger values as more iterations are required by KSP-DG when $k$ is large (as shown in Fig. LABEL:iterations-k). The relationship of $\tau$ and the processing time is evaluated in Fig. 16(e), where the processing time slowly increases when $\tau$ grows, as a larger $\tau$ leads to a greater number of iterations (Fig. LABEL:iterations-varying-range).

6.5 Comparison with Baseline Algorithms

We compare KSP-DG to FindKSP [5], Para-Yen[29], and Yen’s algorithm [6] on scalability with respect to the number of queries and $k$ . Since FindKSP, Para-Yen and Yen’s algorithm are centralized, we run these three algorithms on every server individually and then distribute all queries to the adopted servers randomly for fair comparison. To evaluate PYen’s effectiveness, we included two KSP-DG versions, namely KSP-DG-Yen and Para-KSP-DG, which employ Yen [6] and Para-Yen [29] for computing partial KSPs in place of PYen respectively, serving as baseline comparisons. We refer to KSP-DG, KSP-DG-Yen, and Para-KSP-DG collectively as “KSP-DG-based algorithms”.

Figures 17(a)-17(d) show the scalability comparison of the all algorithms when processing an increasing number of queries in various graphs. KSP-DG-based algorithms outperform the three centralized algorithms by exhibiting a lower rate of processing time growth. Their advantage stems from the ability to decompose the KSP search problem into smaller parallel procedures, which is challenging for the sequential processing required by FindKSP, Para-Yen, and Yen’s algorithm. The performance advantages of KSP-DG-based algorithms are particularly evident in large graphs (e.g., CUSA), making them highly suitable for large graph scenarios with substantial concurrent queries.

Moreover, KSP-DG outperforms KSP-DG-Yen and Para-KSP-Yen on each dataset due to the effectiveness of PYen in computing partial KSPs. This advantage becomes more remarkable as $k$ increases, as depicted by Fig.17(e). This is because PYen’s optimizations, especially the reuse of previous computations, become more effective with higher $k$ values. In contrast, Para-Yen’s approach of parallelizing single shortest path computations for each deviation path within KSP-DG’s resource-intensive processes fails to expedite partial KSP computation and instead adds scheduling overhead, especially noticeable as $k$ grows.

6.6 Scaling-out

To further evaluate the horizontal scalability of DTLP and the search algorithms, we extend the cluster to 20 servers and the results are shown in Figures 18(a)-18(e). As is clear from Fig. 18(a), the building time of DTLP decreases when more servers are introduced, as DTLP can distribute the load to all servers, and thus is able to utilize the power of a cluster.

Fig. 18(b) demonstrates the performance of KSP-DG with a varying number of servers. We feed a batch of 1,000 queries into KSP-DG with a different number of servers employed. In each figure, we observe a notable reduction of processing cost when more servers are used, which testifies to the horizontal scalability of KSP-DG. We also feed KSP-DG with 1,000 queries with different values of $k$ to further validate its scalability by varying the number of servers, and the results are illustrated in Fig. 18(c). It is observed that the running time of KSP-DG significantly decreases with more servers being used, regardless of the value of $k$ .

Fig. 18(d) depicts the scalability of the six algorithms when processing the same group of queries on a different number of servers. The results show that the three KSP-DG based algorithms always outperform FindKSP, Para-Yen and Yen’s algorithm. The relative speedups of the three algorithms with a varying number of servers are shown in Fig. 18(e). It can be observed that the relative speedup of each algorithm grows linearly with the number of servers.

7 Related work

Centralized KSP Algorithms. Yen’s algorithm [6] identifies KSPs based on a deviation paradigm, which first computes the shortest path, and then generates all candidate paths that deviate from this path by applying Dijkstra’s algorithm repeatedly, from which the shortest is selected as the next shortest path. It repeats the above steps until KSPs have been determined. Some methods [34, 9, 35, 8, 10, 11, 5, 28] are proposed to further optimize the generation of candidate paths. [9, 34, 35] partition the candidate shortest paths into equivalence classes and safely prune specific classes that are impossible to contain k shortest paths. Others adopt the Shortest Path Tree (SPT for short) [8, 10, 11, 29, 35, 5, 36] to help identify candidate shortest paths, where SPT maintains the shortest paths from all vertices to the terminal vertex.

All of the above proposals suffer from the following drawbacks if directly applied to our problem. First, most of them require access to the entire graph during their operation; as a result the only option is to replicate the entire dynamic graph on each server, allowing them to operate in a distributed fashion. This however has significant cost and scalability implications. Second, the majority of these algorithms adopt a sequential strategy that necessitates a search for the shortest paths one after the other, which limits their ability to handle many concurrent queries in a distributed setting. Finally, some of them require building a path index such as SPT [8, 10, 11] for every query, which is too heavy-weight for dynamic graphs as the index often become invalid due to varying weights.

Distributed SSP Algorithms. Past work focuses on identifying a Single Shortest Path (SSP) [15, 16, 17, 18, 37, 19, 20, 38] over a static graph in a distributed fashion. [15, 16, 17, 18] aim to determine the SSP in a communication network, which is distributed if each vertex represents a processor. Others investigate the distributed computation of the SSP on a cluster of servers [19, 20, 38]. Li et al. investigate the SSP in multimodal road network that includes buses, trains, cars, bikes, and so on. Qiu et al. [20] identify the SSP based on the principle of pruned landmark labeling [39], while Aridhi et al. [19] propose a distributed algorithm to identify an approximate shortest path in a large-scale network with MapReduce.

These distributed SSP algorithms suffer from the following problems when applied to our setting. First, they are designed for static graphs, and thus most index structures employed in those approaches such as k-shortcut hopset [17, 18, 37] and node labels [20] quickly become obsolete in a dynamic graph rendering them unusable. Second, all of them focus on the search for a single shortest path and its nontrivial to extend, to the case of finding KSPs.

SSP in Dynamic Graphs. Previous research has addressed SSP computation in dynamic graphs[22, 40]. Shang et al. investigated monitoring the shortest path in a dynamic graph, yet their centralized search algorithm, reliant on a road network expansion tree, proves challenging to deploy in a distributed setting. Yang et al. [22] propose a distributed algorithm CANDS to identify the SSP in a dynamic graph. It partitions the entire graph into subgraphs residing on different servers, and indexes the shortest path between any pair of boundary vertices within each subgraph. Although seemingly similar to our solution, this approach would not work well when directly applied to our problem. First, its search procedure is essentially sequential. For a given query, it starts from the subgraph covering the source vertex and iteratively expands to other subgraphs via the indexed shortest paths using Dijkstra’s algorithm until reaching the subgraph containing the destination vertex. The examined subgraphs are not known initially and they have to be explored in order. Second, the indexed shortest paths in CANDS require frequent re-computation in a dynamic graph, which can be quite expensive. Finally, CANDS is designed for identifying single shortest path, and it is non-trivial to adapt this paradigm to identify KSPs.

8 Conclusions and Future Work

We focus on the problem of identifying KSPs over road networks and propose a suite of distributed solutions. Consisting of the bounding paths and the skeleton graph, DTLP provides reference paths assisting the identification of relevant subgraphs that need to be explored to identify KSPs. Since the bounding paths compacted by G-MPTrees in DTLP do not change with varying weights, DTLP is light-weight for dynamic graphs with low maintenance cost. Based on DTLP, the KSP-DG algorithm is designed to run in a distributed setting. We have demonstrated through extensive experiments that our proposal significantly outperforms baseline methods across a variety of settings.

As future work, one may consider addressing variants of the problem studied in this paper. For example, a constrained version of the KSP query may require all shortest paths to pass through some designated vertices; another version may involve limiting the diversity of the shortest paths to below a certain threshold. These variants have important practical applications, and it is worthwhile to investigate their solutions in a distributed environment.

References

[1] S. Shang, L. Chen, Z. Wei, C. S. Jensen, J.-R. Wen, and P. Kalnis, “Collective travel planning in spatial networks,” IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 5, pp. 1132–1146, 2016.
[2] L. Chen, Y. Gao, Z. Fang, X. Miao, C. S. Jensen, and C. Guo, “Real-time distributed co-movement pattern detection on streaming trajectories,” Proceedings of the VLDB Endowment, vol. 12, no. 10, pp. 1208–1220, 2019.
[3] L. Chen, Q. Zhong, X. Xiao, Y. Gao, P. Jin, and C. S. Jensen, “Price-and-time-aware dynamic ridesharing,” in 2018 IEEE 34th international conference on data engineering (ICDE). IEEE, 2018, pp. 1061–1072.
[4] Z. Yu, X. Yu, N. Koudas, Y. Liu, Y. Li, Y. Chen, and D. Yang, “Distributed processing of k shortest path queries over dynamic road networks,” in SIGMOD, 2020, pp. 665–679.
[5] H. Liu, C. Jin, B. Yang, and A. Zhou, “Finding top-k shortest paths with diversity,” IEEE Transactions on Knowledge and Data Engineering, vol. 30, no. 3, pp. 488–502, 2018.
[6] J. Y. Yen, “Finding the k shortest loopless paths in a network,” Management Science, vol. 17, no. 11, pp. 712–716, 1971.
[7] N. Katoh, T. Ibaraki, and H. Mine, “An efficient algorithm for k shortest simple paths,” Networks, vol. 12, no. 4, pp. 411–427, 1982.
[8] D. Eppstein, “Finding the k shortest paths,” SIAM Journal on computing, vol. 28, no. 2, pp. 652–673, 1998.
[9] J. Hershberger, M. Maxel, and S. Suri, “Finding the k shortest simple paths: A new algorithm and its implementation,” ACM Transactions on Algorithms, vol. 3, no. 4, pp. 45–es, 2007.
[10] J. Gao, H. Qiu, X. Jiang, T. Wang, and D. Yang, “Fast top-k simple shortest paths discovery in graphs,” in CIKM, 2010, pp. 509–518.
[11] J. Gao, J. Yu, H. Qiu, X. Jiang, T. Wang, and D. Yang, “Holistic top-k simple shortest path join in graphs,” IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 4, pp. 665–677, 2012.
[12] B. Zheng, C. Huang, C. S. Jensen, L. Chen, N. Q. V. Hung, G. Liu, G. Li, and K. Zheng, “Online trichromatic pickup and delivery scheduling in spatial crowdsourcing,” in ICDE. IEEE, 2020, pp. 973–984.
[13] B. Zheng, K. Zheng, C. S. Jensen, N. Q. V. Hung, H. Su, G. Li, and X. Zhou, “Answering why-not group spatial keyword queries,” IEEE Trans. Knowl. Data Eng., vol. 32, no. 1, pp. 26–39, 2020.
[14] B. Zheng, L. Bi, J. Cao, H. Chai, J. Fang, L. Chen, Y. Gao, X. Zhou, and C. S. Jensen, “Speaknav: Voice-based route description language understanding for template driven path search,” Proc. VLDB Endow., vol. 14, no. 12, pp. 3056–3068, 2021.
[15] K. M. Chandy and J. Misra, “Distributed computation on graphs: Shortest path algorithms,” Programming Techniques and Data Structures, pp. 833–837, 1982.
[16] B. Awerbuch, “Randomized distributed shortest paths algorithms,” in STOC, 1989, pp. 490–500.
[17] M. Elkin, “Distributed exact shortest paths in sublinear time,” in STOC, 2017, pp. 757–770.
[18] M. Ghaffari and J. Li, “Improved distributed algorithms for exact shortest paths,” in STOC, 2018, p. 431–444.
[19] S. Aridhi, P. Lacomme, L. Ren, and B. Vincent, “A mapreduce-based approach for shortest path problem in large-scale networks,” Engineering Applications of Artificial Intelligence, vol. 41, pp. 151–165, 2015.
[20] K. Qiu, Y. Zhu, J. Yuan, J. Zhao, X. Wang, and T. Wolf, “Parapll: Fast parallel shortest-path distance query on large-scale weighted graphs,” in ICPP, 2018, pp. 1–10.
[21] W. Fan, X. Wang, and Y. Wu, “Performance guarantees for distributed reachability queries,” in VLDB, 2012, pp. 1304–1315.
[22] D. Yang, D. Zhang, K.-L. Tan, J. Cao, and F. Le Mouël, “Cands: continuous optimal navigation via distributed stream processing,” in VLDB, 2014, pp. 137–148.
[23] S. Shang, L. Chen, Z. Wei, C. S. Jensen, K. Zheng, and P. Kalnis, “Trajectory similarity join in spatial networks,” in VLDB, 2017, pp. 1178–1189.
[24] B. Yao, W. Zhang, Z.-J. Wang, Z. Chen, S. Shang, K. Zheng, and M. Guo, “Distributed in-memory analytics for big temporal data,” in DASFAA, 2018.
[25] L. Chen, S. Shang, C. S. Jensen, B. Yao, and P. Kalnis, “Parallel semantic trajectory similarity join,” in ICDE, 2020, pp. 997–1008.
[26] R. Anand and U. Jeffrey David, Mining of massive datasets. Cambridge university press, 2011.
[27] J. Leskovec, A. Rajaraman, and J. D. Ullman, Mining of Massive Datasets. USA: Cambridge University Press, 2014.
[28] D. Ajwani, E. Duriakova, N. Hurley, U. Meyer, and A. Schickedanz, “An empirical comparison of k-shortest simple path algorithms on multicores,” in ICPP, 2018.
[29] G. Feng, “Finding k shortest simple paths in directed graphs: A node classification algorithm,” Networks, pp. 6–17, 2014.
[30] “Apache storm,” http://storm.apache.org/, 2019.
[31] DIMACS, “http://users.diag.uniroma1.it/challenge9,” 2005.
[32] F. Bernhard, G. Martin, and G. Stefan, “Time-varying travel times in vehicle routing,” Transportation Science, vol. 38, no. 2, pp. 121–255, 2004.
[33] B. Hannah, D. Daniel, G. Andrew, M.-H. Matthias, P. Thomas, S. Peter, W. Dorothea, and R. F. Werneck, “Route planning in transportation networks,” Algorithm Engineering, pp. 19–80, 2016.
[34] J. Hershberger and S. Suri, “Vickrey prices and shortest paths: What is an edge worth?” in IEEE International Conference on Cluster Computing, 2001, pp. 252–259.
[35] L. Chang, X. Lin, L. Qin, J. X. Yu, and J. Pei, “Efficiently computing top-k shortest path join,” in EDBT, 2015, pp. 133–144.
[36] Z. Luo, L. Li, M. Zhang, W. Hua, Y. Xu, and X. Zhou, “Diversified top-k route planning in road network,” in VLDB, 2022, pp. 3199–3212.
[37] S. Forster and D. Nanongkai, “A faster distributed single-source shortest paths algorithm,” in IEEE 59th Annual Symposium on Foundations of Computer Science, 2018, pp. 686–697.
[38] Y. Li, Y. Yuan, Y. Wang, X. Lian, Y. Ma, and G. Wang, “Distributed multimodal path queries,” IEEE Transactions on Knowledge and Data Engineering, vol. 34, no. 7, pp. 3196–3210, 2022.
[39] T. Akiba, Y. Iwata, and Y. Yoshida, “Fast exact shortest-path distance queries on large networks by pruned landmark labeling,” in SIGMOD, 2013, pp. 349–360.
[40] S. Shang, L. Chen, Z.-W. Wei, D.-H. Guo, and J.-R. Wen, “Dynamic shortest path monitoring in spatial networks,” Journal of Computer Science and Technology, vol. 31, pp. 637–648, 2016.

Acknowledgments

This work was supported by NSFC Grants [No.62172351] and NSERC [RGPIN-2022-04623].