
On Optimization of the \frac{1}{2}-Approximation Path Cover Algorithm

Junyuan Lin, Guangpeng Ren
(August 2020)
Abstract

In this paper, we propose a deterministic algorithm that approximates the optimal path cover on weighted undirected graphs. Based on the \frac{1}{2}-Approximation Path Cover Algorithm by Moran et al., we add a procedure that removes redundant edges as the algorithm progresses. Our optimized algorithm not only significantly reduces the computation time but also maintains the theoretical guarantee of the original \frac{1}{2}-Approximation Path Cover Algorithm. To test the time complexity, we conduct numerical tests on graphs with various structures and random weights, from structured ring graphs to random graphs such as Erdos-Renyi graphs. The tests demonstrate the effectiveness of our proposed algorithm, especially on graphs with high-degree nodes, and the advantage grows as the graph gets larger. Moreover, we also run tests on graphs/networks derived from a wide range of real-world problems to demonstrate the effectiveness and applicability of the proposed algorithm.

Keywords: path cover; graph algorithms; Watts-Strogatz graph; Erdos-Renyi graph; greedy algorithm; social network

1 Introduction

In graph theory, a path cover refers to a set of paths that covers all vertices of a graph, where every vertex belongs to exactly one path. To find an optimal path cover, we use a path cover algorithm to select a set of paths of minimum or maximum total edge weight. Such algorithms have been widely applied to many classes of graphs, such as cocomparability graphs, cographs, interval graphs, block graphs, and permutation graphs [5, 4, 6, 7]. Path cover also provides solutions to real-world problems. In [2], a path analysis of site visits helps understand website traffic and improve marketing strategies. The shortest-path problem in [8] shares a similar idea with the path cover algorithm by finding the path of minimum weight, such as the shortest distance or the shortest travel time on a map. In [3], path cover provides the minimum number of test paths needed to satisfy different structural coverage criteria.

Finding an optimal path cover in a general graph is NP-complete. Therefore, we turn to approximation algorithms to estimate an optimal path cover. In [1], Moran et al. introduced three fundamental approximation algorithms for the maximum weighted path cover of a graph. The \frac{1}{2}-Approximation Covering Algorithm is the first one in the paper. It is a greedy algorithm that works on undirected graphs, and it obtains at least \frac{1}{2} of the weight of the optimal path cover. The other two path cover approximation algorithms introduced in [1] are based on linear programming. They both guarantee \frac{2}{3} of the optimal weight, on undirected and directed graphs respectively. These two algorithms share a similar idea: find a maximum-weight degree-constrained subgraph of the original graph. The degree of each node in the subgraph is constrained to be at most 2, so only paths and cycles remain in the subgraph. Each cycle has at least three edges, so at least one edge in the cycle has weight less than \frac{1}{3} of the cycle's total weight, and removing it leaves a path cover. These results give the \frac{2}{3} theoretical bound of the algorithm.

In recent years, approximation algorithms taking different approaches have also been developed. In [11], the authors use a primal-dual method to design a \frac{1}{2}-approximation algorithm for VCP_3 (Vertex Cover P_3), where P_3 is a path with 3 vertices. The main idea is to remove redundant vertices as the algorithm goes. Its time complexity is O(mn), where m is the number of edges and n is the number of vertices. In [12], the authors provide an algorithm specifically for MWVCC_k (Minimum Weight Connected k-Subgraph Cover). This algorithm generalizes the one in [11], which only considers the P_3 case, and [12] proves a \frac{1}{k-1}-approximation for the MWVCC_k problem. Its time complexity is O(n^2 \cdot m).

In applications of path cover, most algorithmic approaches derive from the approximation algorithms by Moran et al. Depending on the properties of the problem, generalizations and optimizations are made to boost efficiency. In [9], a generalized algorithm is developed specifically for flow graphs, which represent all possible execution paths in a program. In [10], the authors propose an adaptive multigrid method for linear systems with weighted graph Laplacians. Using the \frac{1}{2}-Approximation Covering Algorithm, the authors approximate the level sets of the smooth error and further increase the accuracy and robustness of the multilevel structures.

As data sets become larger and more complex, more efficient path cover approximation algorithms are needed to handle problems on these large graphs/networks. In this paper, we propose a deterministic algorithm based on the \frac{1}{2}-Approximation Covering Algorithm. Our algorithm inherits several advantages from the original one: 1) it is a greedy algorithm and straightforward to implement; 2) every step of the algorithm is deterministic, with no randomness involved; 3) there is a theoretical guarantee on time complexity. On top of these advantages, we take a further step and sequentially optimize it by removing redundant edges as the algorithm progresses. Adding this step significantly reduces the computational time while still guaranteeing the \frac{1}{2} weight lower bound. In Section 2, we introduce the \frac{1}{2}-Approximation Covering Algorithm: some preliminaries and its time complexity, as well as its pseudocode and analysis. In Section 3, we present the optimized algorithm: our motivation, the pseudocode, an analysis showing that the theoretical guarantees are maintained, and the edge-removal process. In Section 4, we provide numerical results and visualizations of both algorithms on Watts-Strogatz graphs (used in this paper as deterministically structured ring graphs), Erdos-Renyi graphs (random graphs), and graphs from real-world problems to show the effectiveness and advantages of the optimized algorithm.

2 \frac{1}{2}-Approximation Covering Algorithm

We first introduce some preliminaries that help explain the \frac{1}{2}-Approximation Covering Algorithm, denoted as Algorithm 1, and the complexity of the algorithm. We then show the pseudocode and review the proof of the \frac{1}{2} theoretical bound from [1].

2.1 Preliminaries and complexity

Consider an undirected weighted graph G with vertex set V, edge set E, and associated weights W. An edge e=\{u,v\} in E has two endpoints, and w(e) denotes the weight of e from W. We define N to be the number of vertices in V and M to be the number of edges in E. We denote OPT as the optimal (maximal) path cover of G and cover as the approximation of OPT generated by the approximating algorithms. Later, when we analyze the lower bound on the weight that the algorithms can obtain, we denote d_A(u) as the number of edges of cover incident to u, \epsilon(p) as the set of edges in cover, S(e) as the set of edges of cover incident to e, and E_p=\{e=(u,v) \mid e\in\epsilon(p),\ d_A(u)=1\} [1]. In addition, we denote H as the number of edges in the path cover and K as the number of paths in the path cover.

Algorithm 1 has a \frac{1}{2} theoretical bound on the weight of OPT, which is where its name comes from. Its time complexity is O(M\log M): the initialization, the execution of the loop, and the output take O(M) time, while sorting the edges takes O(M\log M). Thus, the entire algorithm runs in O(M\log M).

2.2 Pseudocode

Algorithm 1 is the fundamental path cover algorithm in [1]. Each step is deterministic and easy to follow. While these advantages are obvious, Algorithm 1 can be time-consuming as the graph gets larger.

Algorithm 1 \frac{1}{2}-Approximation Path Cover
Input: A, an undirected positive weighted graph G=(V,E,W)
Output: cover, a path cover of graph G
1: procedure PathCover(A)
2:     sorted_edges ← Sort the edges in descending order based on W
3:     for e(u, v) ∈ sorted_edges do
4:         if neither u nor v is in any path in cover then
5:             Add {u, v} as a new path in cover
6:         else if u is the endpoint of a path in cover and v is not in any path then
7:             Append {v} to the path in cover that contains u
8:         else if v is the endpoint of a path in cover and u is not in any path then
9:             Append {u} to the path in cover that contains v
10:        else if u and v are the endpoints of different paths in cover then
11:            Merge the two paths
12:        end if
13:    end for
14: end procedure

We can observe that Algorithm 1 is essentially a greedy algorithm: it checks each edge in descending order of weight to build the path cover. Eventually, the weight of the path cover is at least \frac{1}{2} of the weight of OPT. More rigorous details are discussed in Section 2.3. A minimal implementation sketch follows.
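The following is a minimal Python sketch of Algorithm 1, assuming a networkx.Graph whose every edge carries a weight attribute; the function and helper names are ours, not from [1]. For brevity, it merges paths by list concatenation, which costs O(N) per merge; a production version would track endpoints with a disjoint-set structure to stay within the sorting-dominated O(M log M) bound.

import networkx as nx

def half_approx_path_cover(G):
    """Greedy 1/2-approximation path cover (sketch of Algorithm 1)."""
    path_of = {}    # node -> id of the path containing it
    paths = {}      # path id -> ordered list of nodes, endpoints first/last
    next_id = 0
    # scan the edges in descending order of weight
    for u, v, w in sorted(G.edges(data="weight"),
                          key=lambda e: e[2], reverse=True):
        pu, pv = path_of.get(u), path_of.get(v)
        u_end = pu is not None and u in (paths[pu][0], paths[pu][-1])
        v_end = pv is not None and v in (paths[pv][0], paths[pv][-1])
        if pu is None and pv is None:
            # neither endpoint is covered yet: open a new path {u, v}
            paths[next_id] = [u, v]
            path_of[u] = path_of[v] = next_id
            next_id += 1
        elif u_end and pv is None:
            # u is an endpoint of an existing path: append v to that path
            p = paths[pu]
            if p[-1] == u:
                p.append(v)
            else:
                p.insert(0, v)
            path_of[v] = pu
        elif v_end and pu is None:
            p = paths[pv]
            if p[-1] == v:
                p.append(u)
            else:
                p.insert(0, u)
            path_of[u] = pv
        elif u_end and v_end and pu != pv:
            # u and v are endpoints of two different paths: merge them
            a, b = paths.pop(pu), paths.pop(pv)
            if a[-1] != u:
                a.reverse()
            if b[0] != v:
                b.reverse()
            paths[next_id] = a + b
            for x in paths[next_id]:
                path_of[x] = next_id
            next_id += 1
    # any vertex left uncovered forms a trivial single-node path
    for x in G.nodes():
        if x not in path_of:
            paths[next_id] = [x]
            next_id += 1
    return list(paths.values())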

2.3 Algorithm 1 Analysis

The goal of the analysis is to show the following theorem for Algorithm 1:

Theorem 1

The weight of the path cover is at least \frac{1}{2} of the weight of OPT:

w(\texttt{cover}) \geq \frac{1}{2}\cdot w(OPT)

To review the \frac{1}{2} lower bound proof in [1], we first make the following classification:

(1) E_1 = \{e=(u,v) \mid e \in \texttt{cover} \cap OPT\}

(2) E_2 = \{e=(u,v) \mid e \in OPT - \texttt{cover},\ d_A(u)=d_A(v)=1\}

(3) E_3 = \{e=(u,v) \mid e \in OPT - \texttt{cover},\ \max\{d_A(u), d_A(v)\}=2\}

To generalize, we first consider e=(u,v) \in E_2, and let S(e)=(e_1, e_2) be the set of edges of cover incident to e. It is not hard to see that w(e) \leq \min(w(e_1), w(e_2)) because, otherwise, the algorithm would have taken e into cover before both e_1 and e_2. Since the minimum of two values is at most their average, we can claim that w(e) \leq \frac{1}{2}(w(e_1)+w(e_2)).

Next consider e=(u,v) \in E_3, whose endpoints connect to at most two edges in cover. By a similar argument, let S_1(e)=(e_1, e_2) be the edges of cover incident to u and S_2(e)=(e_3, e_4) be the edges of cover incident to v. We can still claim that w(e) \leq \frac{1}{2}(w(e_1)+w(e_2)) or w(e) \leq \frac{1}{2}(w(e_3)+w(e_4)).

Based on the above statements, we conclude that

w(OPT) = w(E_1)+w(E_2)+w(E_3) \leq w_1+w_2+w_3

where

w_1 = \sum_{e\in E_1} w(e)

w_2 = \sum_{e\in E_2,\ S(e)=(e_1,e_2)} \frac{1}{2}(w(e_1)+w(e_2))

w_3 = \sum_{e\in E_3,\ S(e)=(e_1,e_2)} \frac{1}{2}(w(e_1)+w(e_2))

Let W = w_1+w_2+w_3. By bounding the contribution of each edge of cover to W, the lower bound of \frac{1}{2} can be proved by discussing the following 4 cases.

Assume first that e=(u,v) is an edge in \epsilon(p)-E_p, which means d_A(u) \neq 1.

Case 1: If e \in OPT, then w(e) contributes once to w_1, since e \in OPT \cap \texttt{cover}, and at most twice to w_3, since E_3 requires d_A(u) = 2, each time multiplied by \frac{1}{2}. Thus, e contributes at most 2 \cdot w(e) to W.

Case 2: If e \notin OPT, then e contributes nothing to w_1 and w_2, since e \notin OPT and d_A(u) = 2, and it contributes at most 2 \cdot w(e) to w_3, since each endpoint u and v appears at most twice in w_3.

Now assume that e \in E_p, which means d_A(u)=1.

Case 3: If e \in OPT, then w(e) appears once in w_1. At most one edge in E_2 can be incident to e, so e appears at most once in w_2, multiplied by \frac{1}{2}. e also appears at most once in w_3, where e may be one of the two edges incident to an edge in E_3, again multiplied by \frac{1}{2}. Thus, e contributes at most 2 \cdot w(e) to W.

Case 4: If e \notin OPT, then w(e) does not appear in w_1. It appears at most once in w_2, since it may be incident to an edge from E_2, multiplied by \frac{1}{2}. It also appears at most twice in w_3, since at most two edges from E_3 can be incident to one side of e, each appearance multiplied by \frac{1}{2}. Thus, e contributes at most \frac{3}{2} \cdot w(e) to W.

Therefore, we can conclude that each edge of cover contributes at most 2 \cdot w(e) to W. Taking a further step, it is guaranteed that cover obtains at least \frac{1}{2} of W. This relation can be presented as the following:

w(OPT) = w(E_1)+w(E_2)+w(E_3) \leq W \leq 2\cdot w(\texttt{cover})


Rewriting the expression, we obtain:

\frac{1}{2}\cdot w(OPT) \leq w(\texttt{cover})
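As a sanity check of Theorem 1, the following sketch (assuming the half_approx_path_cover function from Section 2.2) brute-forces w(OPT) on tiny random graphs by enumerating every edge subset whose components are vertex-disjoint paths, that is, maximum degree at most 2 and no cycles:

from itertools import combinations
import random
import networkx as nx

def brute_force_opt(G):
    """Maximum-weight path cover by exhausting all valid edge subsets."""
    edges = list(G.edges(data="weight"))
    best = 0
    for r in range(len(edges) + 1):
        for subset in combinations(edges, r):
            H = nx.Graph()
            H.add_nodes_from(G.nodes())
            H.add_weighted_edges_from(subset)
            # a subset is a path cover iff every degree <= 2 and no cycles
            if max((d for _, d in H.degree()), default=0) <= 2 and nx.is_forest(H):
                best = max(best, sum(w for _, _, w in subset))
    return best

random.seed(0)
for trial in range(10):
    G = nx.gnp_random_graph(5, 0.8, seed=trial)
    for u, v in G.edges():
        G[u][v]["weight"] = random.randint(1, 10)
    cover = half_approx_path_cover(G)
    w_cover = sum(G[p[i]][p[i + 1]]["weight"]
                  for p in cover for i in range(len(p) - 1))
    assert 2 * w_cover >= brute_force_opt(G)   # Theorem 1 bound holds

The exhaustive search is exponential in M, so this check is only feasible on very small graphs; it is a test of the bound, not a practical alternative.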

3 Optimized Algorithm

Since Algorithm 1 builds the path cover in a straightforward and deterministic way, we are motivated to develop an approach that decreases the computational time while preserving the theoretical results, by removing redundant edges.

3.1 Motivations

Figure 1: Redundancy

In Figure 1, the edges in red are in the path cover, and the edges in black are incident to the path. Algorithm 1 checks every edge to decide whether it should be added to the path cover. However, by the definition of a path, only edges incident to the endpoints of a path can still be added to it. Checking the other incident edges is therefore redundant, and we add a step that removes these redundant edges in Algorithm 2 to save computational time. The time saved becomes more significant as the average degree of the graph gets higher. Meanwhile, the time complexity and the \frac{1}{2} weight bound are still guaranteed in Algorithm 2.

3.2 Pseudocode

Steps 8, 11 and 14 of Algorithm 2 remove the redundant edges and save the time of checking them as the algorithm progresses. More numerical details on the efficiency gains are discussed in Section 4.

Algorithm 2 Sequential Optimized Path Cover
Input: A, an undirected positive weighted graph G=(V,E,W)
Output: cover, a path cover of graph G
1: procedure PathCover(A)
2:     sorted_edges ← Sort the edges in descending order based on W
3:     for e(u, v) ∈ sorted_edges do
4:         if neither u nor v is in any path in cover then
5:             Add {u, v} as a new path in cover
6:         else if u is the endpoint of a path in cover and v is not in any path then
7:             Append {v} to the path in cover that contains u
8:             Remove {adj(u), u} from sorted_edges    ▷ adj(u): adjacent nodes of u in G but not in cover
9:         else if v is the endpoint of a path in cover and u is not in any path then
10:            Append {u} to the path in cover that contains v
11:            Remove {adj(v), v} from sorted_edges    ▷ adj(v): adjacent nodes of v in G but not in cover
12:        else if u and v are the endpoints of different paths in cover then
13:            Merge the two paths
14:            Remove {adj(u), u} and {adj(v), v} from sorted_edges
15:        end if
16:    end for
17: end procedure

Compared to the pseudocode of Algorithm 1, these extra steps only remove redundant edges from the loop over sorted_edges without influencing how the path cover is built.
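The following is a minimal Python sketch of Algorithm 2 under the same assumptions as the Algorithm 1 sketch in Section 2.2. Rather than deleting entries from the sorted edge list (which costs O(M) per deletion in a Python list), this sketch marks edges incident to a vertex that has just become interior as dead and skips them during the scan; the effect on the scan is the same as removal.

import networkx as nx

def seq_optimized_path_cover(G):
    """Algorithm 2 sketch: greedy cover plus redundant-edge removal."""
    path_of = {}    # node -> id of the path containing it
    paths = {}      # path id -> ordered list of nodes
    next_id = 0
    dead = set()    # removed (redundant) edges, stored as frozensets

    def retire(x):
        # x just became an interior node: none of its unused edges can be taken
        for y in G.neighbors(x):
            dead.add(frozenset((x, y)))

    for u, v, w in sorted(G.edges(data="weight"),
                          key=lambda e: e[2], reverse=True):
        if frozenset((u, v)) in dead:
            continue    # edge was removed in an earlier step: skipped in O(1)
        pu, pv = path_of.get(u), path_of.get(v)
        u_end = pu is not None and u in (paths[pu][0], paths[pu][-1])
        v_end = pv is not None and v in (paths[pv][0], paths[pv][-1])
        if pu is None and pv is None:
            paths[next_id] = [u, v]
            path_of[u] = path_of[v] = next_id
            next_id += 1
        elif u_end and pv is None:
            p = paths[pu]
            if p[-1] == u:
                p.append(v)
            else:
                p.insert(0, v)
            path_of[v] = pu
            retire(u)               # step 8: u is no longer an endpoint
        elif v_end and pu is None:
            p = paths[pv]
            if p[-1] == v:
                p.append(u)
            else:
                p.insert(0, u)
            path_of[u] = pv
            retire(v)               # step 11
        elif u_end and v_end and pu != pv:
            a, b = paths.pop(pu), paths.pop(pv)
            if a[-1] != u:
                a.reverse()
            if b[0] != v:
                b.reverse()
            paths[next_id] = a + b
            for x in paths[next_id]:
                path_of[x] = next_id
            next_id += 1
            retire(u)               # step 14: both u and v become interior
            retire(v)
    for x in G.nodes():
        if x not in path_of:
            paths[next_id] = [x]
            next_id += 1
    return list(paths.values())

Here retire(u) plays the role of Remove {adj(u), u}; marking a few edges that are already in the cover is harmless, since each edge is visited only once in the scan.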

3.3 Algorithm 2 Analysis

As mentioned before, removing redundant edges does not prevent the algorithm from obtaining \frac{1}{2} of W, since the removal never interacts with the path cover being built. Therefore, the lower bound is still guaranteed in the new algorithm, as is the time complexity.

Figure 2: OPT
Figure 3: cover by Algorithm 1

We walk through an example for Algorithm 2 following the same argument as in the proof for Algorithm 1. For visual comparison, Figures 2 and 3 present the optimal path cover (OPT), which has the highest possible weight, and the path cover generated by the optimized algorithm (cover). OPT obtains a weight of 51, and cover obtains a weight of 50. The classification of edges is the following:

(1) E_1 = \{\{e, f\}, \{f, g\}, \{c, d\}, \{a, b\}\}

(2) E_2 = \{\{d, g\}\}

(3) E_3 = \{\{b, e\}, \{c, h\}\}

We first pick {d, g} from E_2 and compare it with the weights of its incident edges in cover. We can claim that w(d, g) \leq \min(w(c, d), w(f, g)); otherwise, the algorithm would have selected {d, g} before its two incident edges {c, d} and {f, g}. This implies w(d, g) \leq \frac{1}{2}(w(c, d)+w(f, g)).

Picking edges from E3E_{3}, we have {b, e} and {c, h}. By similar argument, we can claim the following expressions:

w(b, e) \leq \frac{1}{2}(w(a, b)+w(b, c)) and w(c, h) \leq \frac{1}{2}(w(b, c)+w(c, d))

Then we discuss the weight contribution of each edge. Referring back to the 4 cases above, we classify {c, d} and {f, g} under case 1; {c, d}, {f, g}, {e, f} and {a, b} under case 3; and {a, e} and {b, c} under case 4.

Edges in case 1 contribute once to w_1, since these edges are in both OPT and cover. In this example, they contribute 1 \cdot w(e) to W.

Edges in case 3 contribute once to w_1 since they are all in E_1. Only one edge of E_2, namely {d, g}, is incident to {c, d} and {f, g}; thus these two edges appear at most once in w_2, with their weight multiplied by \frac{1}{2}. The edges appearing at most once in w_3, as incident edges of E_3 edges, are {a, b}, {c, d} and {e, f}, again multiplied by \frac{1}{2}. Therefore, the edges in case 3 contribute at most 2 \cdot w(e) to W.

Edges in case 4 do not contribute to w_1 since they are not in OPT. They could appear once in w_2, multiplied by \frac{1}{2}, if they were incident to an edge from E_2, but this does not happen in this example. Our case 4 edges happen to appear in w_3, so they contribute 1 \cdot w(e), which is within the \frac{3}{2} bound mentioned earlier.

This example satisfies the expressions we showed for Algorithm 1, and they still hold for Algorithm 2:

w(OPT) = w(E_1)+w(E_2)+w(E_3) \leq W \leq 2\cdot w(\texttt{cover})


and

\frac{1}{2}\cdot w(OPT) \leq w(\texttt{cover})


The time complexity is still O(M\log M). As before, the initialization, the execution of the loop, and the output take O(M) time, and the removal of redundant edges is also O(M) in total, since each edge is removed at most once. The sorting time remains O(M\log M). Thus, the entire algorithm runs in O(M\log M).

3.4 Removed Edges

To visualize the process of removing redundant edges in Algorithm 2, we use a Watts-Strogatz graph in the structured ring setting, which makes it straightforward to show the difference between the outputs of the two algorithms.

Figure 4: Initialized Watts-Strogatz Graph
Figure 5: Output of Algorithm 1
Figure 6: Output of Algorithm 2

Figure 4 shows an initialized Watts-Strogatz graph with degree 4, used to exemplify how the two algorithms generate a path cover. Figure 5 is the output of Algorithm 1, with the path cover in blue. Figure 6 is the output of Algorithm 2, with the path cover in yellow.

The process of Algorithm 1, starting from Figure 4, is the following. Based on the descending order of weights, it takes {1, 0} as the first edge, {1, 4} to extend node 1, {3, 4} to extend node 4, and finally {2, 3} to extend node 3. Based on weight, it should then check {0, 2}, but connecting the endpoints of {1, 0} and {2, 3} would close a cycle rather than a path. Therefore, Algorithm 1 terminates.

The process of Algorithm 2, starting from Figure 4, mostly mirrors Algorithm 1, except that redundant edges are removed along the way. It first takes {1, 0}; nothing is redundant yet, since every other edge can still potentially be chosen. It then takes {1, 4}, upon which {1, 3} and {1, 2} are removed because node 1 is now interior and the cover must remain a set of paths. It takes {3, 4} next and removes {0, 4} and {2, 4} for the same reason. Lastly, it removes {0, 3} after taking {2, 3}. {0, 2} is now the only remaining edge, and it cannot be taken. Therefore, Algorithm 2 terminates.
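This walkthrough can be reproduced with the sketches from Sections 2.2 and 3.2; the weights below are hypothetical, chosen only so that the edges sort in the order described above (the 5-node Watts-Strogatz ring with degree 4 is the complete graph K_5):

import networkx as nx

# hypothetical weights reproducing the walkthrough above
weights = {(1, 0): 10, (1, 4): 9, (3, 4): 8, (2, 3): 7, (0, 2): 6,
           (1, 3): 5, (1, 2): 4, (0, 4): 3, (2, 4): 2, (0, 3): 1}
G = nx.Graph()
for (u, v), w in weights.items():
    G.add_edge(u, v, weight=w)

print(half_approx_path_cover(G))      # [[2, 3, 4, 1, 0]]
print(seq_optimized_path_cover(G))    # same cover; 5 redundant edges skipped

Both sketches return the single path 2-3-4-1-0 of weight 34; the optimized one simply never re-examines the five edges marked redundant along the way.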

We summarize our findings on the number of edges removed in the following claims:

Lemma 2

In a connected weighted graph G, let H be the number of edges in the path cover and N be the number of vertices in the graph. If G is covered by one single path, we have:

H = N - 1

That is, if the path cover of the connected graph G consists of a single path, then H must be N-1, since a path on N vertices has exactly N-1 edges.

According to this lemma, we can arrive at the following corollary:

Corollary 2.1

In a fully connected weighted graph G, denoting H as the number of edges in the path cover and N as the number of vertices in the graph, we have

H = N - 1.

This is because a fully connected graph is guaranteed to be covered by a single path: if two paths remained at termination, the edge between their endpoints would exist in the graph and the algorithm would have merged them. This is not guaranteed for arbitrary connected graphs.

Figure 7: Original graph
Figure 8: Algorithm 1 (no edges removed)

For example, Figure 7 shows a graph that is connected but not fully connected. By running Algorithm 1, we can see that there are two paths in the path cover (see Figure 8). In this case, H does not equal N-1. For general graphs that are less connected, we use the following theorem to count the edges of the path cover, from which the number of removed edges follows.

Theorem 3

In a weighted undirected graph, let H be the number of edges in the path cover, N the number of vertices in the graph, and K the number of paths in the path cover. Then H can be expressed as the following:

H = N - K
Figure 9: Theorem 3 visualization 1
Figure 10: Theorem 3 visualization 2

Figure 9 visualizes a graph containing three paths without the redundant edges removed, and Figure 10 visualizes the same graph with the redundant edges removed. To verify the theorem, we observe that N=14 and K=3; thus H=11, which agrees with the formula given by Theorem 3.

We include a direct proof for Theorem 3.

Proof: To show H = N - K.

By Lemma 2,

H = \sum_{i=1}^{K}H_{i} = \sum_{i=1}^{K}(N_{i}-1) = \sum_{i=1}^{K}N_{i}-K = N-K

where H_i is the number of edges and N_i is the number of nodes in the i-th path of the path cover. Since the K paths together cover the entire vertex set exactly once, \sum_{i=1}^{K}N_{i}=N, and Theorem 3 is therefore proved.
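Theorem 3 is easy to check empirically with the Algorithm 2 sketch from Section 3.2; that sketch returns uncovered vertices as trivial single-node paths, and a trivial path contributes N_i - 1 = 0 edges, so the identity is unaffected:

import random
import networkx as nx

random.seed(7)
G = nx.gnp_random_graph(200, 0.02, seed=7)
for u, v in G.edges():
    G[u][v]["weight"] = random.randint(1, 10)

cover = seq_optimized_path_cover(G)
N = G.number_of_nodes()
K = len(cover)                        # paths, counting single-node paths
H = sum(len(p) - 1 for p in cover)    # edges in the path cover
assert H == N - K                     # Theorem 3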

4 Numerical Tests

The numerical tests are conducted on a 1.80 GHz Intel Core i7-8550U CPU, a quad-core processor, with 16 GB of RAM. The test program is written in Python, importing the networkx and numpy packages. The dictionary is the main data structure throughout the program, since it is implemented as a hash table whose operations take O(1) average time. The line-profiler and sys packages are used to record the computational time.

The following Tables 1-9 and Figures 11-18 provide the numerical results from the tests and visualizations on both Watts-Strogatz graphs and Erdos-Renyi graphs. We set 4, 6 and 8 as the node degrees in the Watts-Strogatz graphs, and \frac{2\ln N}{N}, \frac{2.5\ln N}{N} and \frac{3\ln N}{N}, where N is the number of nodes, as the edge probabilities in the Erdos-Renyi graphs, to see how the results are affected as the density gets higher.
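As a rough illustration of this setup (not the authors' tuned test program; the sizes here are reduced so the sketches from earlier sections finish quickly, and absolute timings will not match the tables), the test graphs can be generated with networkx's built-in generators:

import math
import random
import time
import networkx as nx

def add_random_weights(G, seed=0):
    rng = random.Random(seed)
    for u, v in G.edges():
        G[u][v]["weight"] = rng.randint(1, 10)   # weights drawn from 1 to 10
    return G

n = 2**14
# structured ring setting: rewiring probability 0 keeps every degree fixed at k
ws = add_random_weights(nx.watts_strogatz_graph(n, k=6, p=0.0))
# Erdos-Renyi with edge probability 2*ln(N)/N, as in Table 4
er = add_random_weights(nx.gnp_random_graph(n, 2 * math.log(n) / n, seed=1))

for name, G in (("WS", ws), ("ER", er)):
    t0 = time.perf_counter()
    half_approx_path_cover(G)
    t1 = time.perf_counter()
    seq_optimized_path_cover(G)
    t2 = time.perf_counter()
    print(f"{name}: algo1 {t1 - t0:.2f}s, algo2 {t2 - t1:.2f}s, "
          f"ratio {(t2 - t1) / (t1 - t0):.2f}")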

Figure 11: Watts-Strogatz graph (degree = 4)
Figure 12: Watts-Strogatz graph (degree = 6)
Figure 13: Watts-Strogatz graph (degree = 8)
Figure 14: Watts-Strogatz Total Comparison
Table 1: Watts-Strogatz Graph (Degree of nodes = 4)
Nodes Edges Avg. Degree Algo 1 Algo 2 Time Ratio
131,072 262,144 4 1.73s 1.71s 0.99
262,144 524,288 4 3.39s 3.26s 0.96
524,288 1,048,576 4 6.90s 6.93s 1.00
1,048,576 2,097,152 4 15.40s 16.01s 1.04
2,097,152 4,194,304 4 31.15s 29.50s 0.95
4,194,304 8,388,608 4 63.01s 60.54s 0.96
Table 2: Watts-Strogatz Graph (Degree of nodes = 6)
Nodes Edges Avg. Degree Algo 1 Algo 2 Time Ratio
131,072 393,216 6 3.18s 2.62s 0.82
262,144 786,432 6 5.82s 4.96s 0.85
524,288 1,572,864 6 12.11s 10.28s 0.85
1,048,576 3,145,728 6 26.33s 20.34s 0.77
2,097,152 6,291,456 6 47.68s 41.17s 0.86
4,194,304 12,582,912 6 95.24s 83.69s 0.88
Table 3: Watts-Strogatz Graph (Degree of nodes = 8)
Nodes Edges Avg. Degree Algo 1 Algo 2 Time Ratio
131,072 524,288 8 5.59s 3.31s 0.59
262,144 1,048,576 8 7.89s 6.64s 0.84
524,288 2,097,152 8 14.29s 11.68s 0.82
1,048,576 4,194,304 8 29.34s 23.04s 0.79
2,097,152 8,388,608 8 48.31s 38.92s 0.81
4,194,304 16,777,216 8 104.24s 78.95s 0.76

Tables 1-3 and Figures 11-14 present the numerical results and visualizations for Watts-Strogatz graphs. The Watts-Strogatz setting in this paper is a deterministically structured graph: the node degree is fixed at 4, 6, or 8, and only the edge weights are assigned randomly from 1 to 10. The O(M\log M) reference curve is rescaled to match the starting point of Algorithm 1, to check whether Algorithm 1 meets the time complexity bound, and Algorithms 1 and 2 are plotted from the table values to compare computational time. We observe that Algorithms 1 and 2 run slightly above the theoretical O(M\log M) curve when the node degree is 4, due to algorithmic overhead, but they perform better as the node degree gets higher. When the node degree is 4, the graph is relatively sparse, and it is difficult to claim that Algorithm 2 outperforms Algorithm 1. However, as the node degree increases, Algorithm 2 starts to save more time, and the ratio of time(Algorithm 2) over time(Algorithm 1) stabilizes around 0.8. One notable result, in Tables 2-3 and Figure 14, is that from 2,097,152 nodes onward the time cost of Algorithm 2 at degree 8 is less than at degree 6, breaking the pattern that computational time increases with node degree. Therefore, on a deterministically structured graph, we can claim that Algorithm 2 is expected to obtain better results as the graph and its average node degree get larger.

Figure 15: Erdos-Renyi graph (probability = \frac{2\ln N}{N})
Figure 16: Erdos-Renyi graph (probability = \frac{2.5\ln N}{N})
Figure 17: Erdos-Renyi graph (probability = \frac{3\ln N}{N})
Table 4: Erdos-Renyi Graph (Probability = \frac{2\ln N}{N})
Nodes Edges Avg. Degree Algo 1 Algo 2 Time Ratio
25,806 261,786 20.28 1.46s 0.92s 0.63
48,585 524,926 21.60 2.99s 1.84s 0.61
91,763 1,048,862 22.86 6.22s 3.76s 0.60
173,811 2,097,921 24.14 12.96s 7.52s 0.58
330,076 4,195,049 25.42 23.19s 14.10s 0.61
628,322 8,388,585 26.70 46.78s 27.09s 0.58
Table 5: Erdos-Renyi Graph (Probability = \frac{2.5\ln N}{N})
Nodes Edges Avg. Degree Algo 1 Algo 2 Time Ratio
25,806 327,924 25.42 1.77s 1.02s 0.58
48,585 655,398 26.98 3.57s 2.13s 0.60
91,763 1,310,775 28.56 7.83s 4.35s 0.56
173,811 2,623,118 30.62 13.75s 8.14s 0.59
330,076 5,242,429 31.76 28.40s 16.24s 0.57
628,322 10,485,256 33.38 58.83s 32.82s 0.56
Table 6: Erdos-Renyi Graph (Probability = \frac{3\ln N}{N})
Nodes Edges Avg. Degree Algo 1 Algo 2 Time Ratio
25,806 393,057 30.46 1.97s 1.18s 0.60
48,585 786,773 32.38 4.07s 2.35s 0.58
91,763 1,572,307 34.26 8.39s 4.75s 0.57
173,811 3,147,013 36.22 18.31s 10.75s 0.59
330,076 6,288,240 38.10 34.63s 19.54s 0.56
628,322 12,580,367 40.04 74.24s 40.26s 0.54

While Algorithm 2 tests well on a deterministically structured graph, we would also like to see its performance on a random graph such as the Erdos-Renyi graph. The Erdos-Renyi model generates each edge with a fixed probability, and the edge weights are again assigned randomly from 1 to 10. In Figures 15-17, Algorithm 1 stays close to the theoretical bound, and Algorithm 2 clearly has a slower growth trend than Algorithm 1. In Tables 4-6, the time ratio of Algorithm 2 to Algorithm 1 is stable in the range of 0.5 to 0.6 and appears to be decreasing. We can claim from these results that, as the graph and its average degree get larger, Algorithm 2 shows better performance.

Table 7: Erdos-Renyi Graph (Node = 25,806, Prob = \frac{3\ln N}{N})
Edges Avg. Degree Skew. Kurt. Algo 1 Algo 2 Time Ratio
394,137 30.54 0.1940 0.0185 1.91s 1.10s 0.58
393,468 30.50 0.1961 0.0075 1.92s 1.11s 0.58
393,088 30.46 0.1876 0.0639 1.93s 1.13s 0.59
393,690 30.52 0.1647 0.0707 1.92s 1.15s 0.60
392,296 30.40 0.1783 0.0148 1.96s 1.15s 0.59
391,919 30.36 0.1934 0.0669 2.05s 1.21s 0.59
393,543 30.50 0.1729 0.0541 1.99s 1.21s 0.61
393,203 30.48 0.1892 0.0591 1.98s 1.22s 0.62
392,256 30.40 0.1675 0.0786 2.07s 1.27s 0.61
392,968 30.46 0.1661 0.0414 1.97s 1.27s 0.64
Table 8: Erdos-Renyi Graph (Node = 48,585, Prob = \frac{3\ln N}{N})
Edges Avg. Degree Skew. Kurt. Algo 1 Algo 2 Time Ratio
786571 32.38 0.1771 0.0490 4.10s 2.19s 0.53
787096 32.40 0.1827 0.0238 3.98s 2.21s 0.59
786128 32.36 0.1808 0.0142 4.11s 2.24s 0.55
786808 32.38 0.1649 0.0602 4.05s 2.26s 0.56
788167 32.44 0.1722 0.0588 4.08s 2.31s 0.57
784711 32.30 0.1982 0.1131 4.02s 2.34s 0.58
785326 32.32 0.1817 0.0457 4.20s 2.38s 0.57
788140 32.44 0.1774 0.0486 4.06s 2.40s 0.59
786928 32.40 0.1762 0.0236 4.07s 2.49s 0.61
787851 32.44 0.1746 0.3380 4.07s 2.53s 0.62
Table 9: Erdos-Renyi Graph (Node = 91,763, Prob = \frac{3\ln N}{N})
Edges Avg. Degree Skew. Kurt. Algo 1 Algo 2 Time Ratio
1571521 34.26 0.1649 0.0092 7.73s 4.34s 0.56
1571816 34.26 0.1737 0.0157 8.20s 4.56s 0.56
1573021 34.28 0.1618 0.0542 8.10s 4.61s 0.57
1573531 34.30 0.1606 0.0046 8.23s 4.63s 0.56
1573614 34.30 0.1585 0.0194 8.56s 4.67s 0.55
1572042 34.26 0.1684 0.0304 8.15s 4.77s 0.59
1573326 34.30 0.1755 0.0283 8.16s 4.87s 0.60
1572499 34.28 0.1711 0.0136 8.29s 4.91s 0.59
1573169 34.28 0.1620 -0.0068 9.25s 5.01s 0.54
1568528 34.18 0.1752 0.0399 9.19s 5.16s 0.56
Figure 18: Least and Most Time Comparison (degree-distribution panels (a)-(f))

On a random graph, the variance between test results with the same number of nodes can be high. To investigate what might affect the computational time of each test, we consider the skewness and kurtosis of the node degree distribution. In Tables 7-9, we explicitly list 10 test results for each of three node counts, sorted by Algorithm 2 time in ascending order. The idea of Algorithm 2 is to eliminate extra edges to save time, so we look for graphs whose nodes have a higher average degree. Hypothetically, we would prefer a degree distribution with low skewness and kurtosis; in other words, we want a distribution that leans toward the right with a heavy tail, meaning more nodes with a high degree. Looking at the tables, we find this to be mostly true when the density and number of edges are close. In Figure 18, the top 3 panels show the degree distributions of the fastest run for each node count, and the bottom 3 panels show those of the slowest run. The top 3 distributions indeed lean further right and have heavier tails than the corresponding bottom 3.
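The degree statistics in Tables 7-9 can be reproduced along the following lines, assuming scipy is available; note that scipy's default Fisher-style excess kurtosis may differ from the exact convention behind the reported numbers:

import math
import networkx as nx
from scipy.stats import kurtosis, skew

def degree_stats(G):
    # skewness and excess kurtosis of the node degree distribution
    degrees = [d for _, d in G.degree()]
    return skew(degrees), kurtosis(degrees)

n = 25806
G = nx.gnp_random_graph(n, 3 * math.log(n) / n, seed=0)
print(degree_stats(G))   # compare with the Skew./Kurt. columns of Table 7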

Table 10: gemsec-Facebook
Group Nodes Edges Avg. Degree Skew. Kurt. Algo 1 Algo 2 Time Ratio
Government 7,057 89,429 25.34 6.02 64.63 0.45s 0.26s 0.58
New Sites 27,917 205,964 14.76 8.83 154.37 1.18s 0.66s 0.56
Athletes 13,866 86,811 12.52 6.54 85.59 0.44s 0.28s 0.64
Public Figures 11,565 67,038 11.59 5.40 42.56 0.36s 0.21s 0.58
TV Shows 3,892 17,239 8.86 3.71 18.21 0.10s 0.06s 0.60
Politician 5,908 41,706 14.12 4.46 33.79 0.28s 0.15s 0.54
Artist 50,515 819,090 32.43 7.40 90.50 4.37s 2.43s 0.56
Company 14,113 52,126 7.39 6.43 77.74 0.28s 0.19s 0.68
Table 11: gemsec-Deezer
Group Nodes Edges Avg. Degree Skew. Kurt. Algo 1 Algo 2 Time Ratio
HR 54,573 498,202 18.26 2.91 21.63 2.87s 1.66s 0.58
HU 47,538 222,887 9.38 2.07 9.03 1.26s 0.86s 0.68
RO 41,773 125,826 6.02 3.27 23.64 0.72s 0.53s 0.74
Table 12: Road Network
Group Nodes Edges Avg. Degree Skew. Kurt. Algo 1 Algo 2 Time Ratio
TX 1,379,917 1,921,660 2.79 -0.59 -0.48 15.04s 14.97s 1.00
CA 1,965,206 2,766,607 2.82 -0.55 -0.39 21.51s 21.49s 1.00
PA 1,088,092 1,541,898 2.83 -0.58 -0.42 11.85s 12.19s 1.03
Figure 19: gemsec-Facebook (degree-distribution panels (a)-(h))
Figure 20: gemsec-Deezer (degree-distribution panels (a)-(c))
Figure 21: Road Network (degree-distribution panels (a)-(c))
Figure 22: Real-World Graphs Boxplot

After testing on the deterministically structured graphs and the random graphs generated by the computer, we can claim that Algorithm 2 costs less time than Algorithm 1, and the advantage expands as the graph gets larger and the average degree gets higher. Moreover, within a group of similar graphs, Algorithm 2 works better when the skewness and kurtosis of the degree distribution are low, since it favors high-degree nodes.

Next, we test Algorithm 2 on real-world problems to see whether the above advantages persist. We select two social network data sets [14] and one road network data set [13] from the Stanford Large Network Dataset Collection.

Figures 19-21 show the degree distributions of the three real-world data sets. We notice that the two social network data sets have high skewness and some extremely high-degree nodes that are hard to see in the plots, whereas the road networks have low degrees in general. In Tables 10-12, the time ratios for gemsec-Facebook and gemsec-Deezer are relatively stable. While the node and edge counts vary widely, the average degree is high enough to ensure that Algorithm 2 outperforms Algorithm 1. When the average degree is low, as in the road networks, the advantages of Algorithm 2 are not as pronounced. This can be explained by the nature of these data sets: the social networks have clusters and high-degree nodes that supply redundant edges for Algorithm 2 to remove, saving time, whereas in road network graphs Algorithm 2 can only remove edges at crossroads and T-intersections. The structure of the road network thus constrains the effectiveness of Algorithm 2. Therefore, on real-world examples, Algorithm 2 still works better on graphs with a high average node degree. In Figure 22, we pre-processed the data by removing outliers to make the boxplots easier to read. A higher mean degree tends to bring the time ratios in the tables lower, and the ratio of IQR length to total boxplot length appears stable across the graphs, which matches the distribution plots. Overall, we conclude that Algorithm 2 performs increasingly well on larger graphs with a higher average node degree.

5 Conclusion and Future Works

Finding an optimal path cover in a general graph is NP-complete, yet much work has gone into approximation algorithms that estimate the optimal path cover. In this paper, we first introduce the \frac{1}{2}-Approximation Covering Algorithm as a fundamental path cover algorithm. A greedy algorithm that works on undirected graphs, it guarantees a path cover that obtains at least \frac{1}{2} of the weight of the optimal path cover, with time complexity O(M\log M). Since the steps of this algorithm are straightforward and deterministic, we optimize it by adding a step that removes the redundant edges connected to the interior of an existing path. We go over the proof in [1] to show that the \frac{1}{2} theoretical bound still holds for our optimized algorithm and visualize the process of removing redundant edges with a simple Watts-Strogatz graph. We then discuss our test results on both the deterministic version of Watts-Strogatz graphs (ring-structured graphs) and Erdos-Renyi graphs (random graphs). From both the measured running times and the visualizations, we observe that Algorithm 2 works well on both types of graphs, especially as the average degree increases. We also show that Algorithm 2 is effective on most real-world problems, except for low-degree sparse graphs such as road networks. Therefore, a high node degree in the graph leads to satisfying performance of Algorithm 2.

Overall, the sequentially optimized algorithm attains superior performance to the original algorithm, with lower computational time and a stable time ratio, and the advantage expands as the graph and its average degree get larger. Because modern problems tend to involve large data sets, the new algorithm should be competitive in applications, especially on problems in the form of high-degree networks, such as social activities and e-commerce. It is also applicable to graph theory problems on cocomparability graphs, cographs, interval graphs, block graphs, and permutation graphs. Each structure may require some changes to the algorithm, but it is a general approach in many cases.

Even though the current program meets the time complexity bound, we still want to explore whether other data structures, functions, and Python programming techniques could further optimize it. We also want to rewrite the program in other languages, such as MATLAB or Java, which have better computation packages or work more directly with the hardware. Additionally, we plan to pursue parallel optimization: by removing local clustering, we can separate one graph into multiple subgraphs and solve them all at once to generate the path cover, which would bring the efficiency to a higher level.

References

  • [1] S. Moran, I. Newman and Y. Wolfstahl, Approximation algorithms for covering a graph by vertex-disjoint paths of maximum total weight, Networks, 20 (1990), 55–64.
  • [2] S. Ehikioya and S. Lu, A Path Analysis Model for Effective E-Commerce Transactions, Afr. J. Comp. & ICT, 12 (2019), 55–71.
  • [3] A. Dwarakanath and A. Jankiti, Minimum Number of Test Paths for Prime Path and Other Structural Coverage Criteria, Proceedings of the 26th ICTSS, 8763 (2014), 63–79.
  • [4] R. Lin, S. Olariu and G. Pruesse, An optimal path cover algorithm for cographs, Comput. Math. Appl., 30 (1995), 75–83.
  • [5] D. G. Corneil, B. Dalton and M. Habib, LDFS-Based Certifying Algorithm for the Minimum Path Cover Problem on Cocomparability Graphs, SIAM J. Comput., 42 (2013), 792–807.
  • [6] S. R. Arikati and C. P. Rangan, Linear algorithm for optimal path cover problem on interval graphs, Inf. Process. Lett., 35 (1990), 149–153.
  • [7] R. Srikant, R. Sundaram, K. S. Singh and C. P. Rangan, Optimal path cover problem on block graphs and bipartite permutation graphs, Theor. Comput. Sci., 115 (1993), 351–357.
  • [8] E. W. Dijkstra, A note on two problems in connexion with graphs, Numer. Math., 1 (1959), 269–271.
  • [9] A. Bertolino and M. Marre, Automatic generation of path covers based on the control flow analysis of computer programs, IEEE Trans. Softw. Eng., 20 (1994), 885–899.
  • [10] X. Hu, J. Lin and L. T. Zikatanov, An Adaptive Multigrid Method Based on Path Cover, SIAM J. Sci. Comput., 41 (2019), 220–241.
  • [11] J. Tu and W. Zhou, A primal-dual approximation algorithm for the vertex cover P_3 problem, Theor. Comput. Sci., 412 (2011), 7044–7048.
  • [12] Y. Zhang, Y. Shi and Z. Zhang, Approximation algorithm for the minimum weight connected k-subgraph cover problem, Theor. Comput. Sci., 535 (2014), 54–58.
  • [13] J. Leskovec, K. Lang, A. Dasgupta and M. Mahoney, Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters, Internet Mathematics, 6 (2009), 29–123.
  • [14] B. Rozemberczki, R. Davies, R. Sarkar and C. Sutton, GEMSEC: Graph Embedding with Self Clustering, ASONAM (2019), 65–72.

Junyuan Lin
Loyola Marymount University
1 LMU Drive
Los Angeles, CA 90045
E-mail: [email protected]

Guangpeng Ren
Loyola Marymount University
1 LMU Drive
Los Angeles, CA 90045
E-mail: [email protected]