
Graph Ordering: Towards the Optimal by Learning

Kangfei Zhao, The Chinese University of Hong Kong; Yu Rong, Tencent AI Lab; Jeffrey Xu Yu, The Chinese University of Hong Kong; Junzhou Huang, Tencent AI Lab; and Hao Zhang, The Chinese University of Hong Kong
Abstract.

Graph representation learning has achieved remarkable success in many graph-based applications, such as node classification, link prediction, and community detection. These models are usually designed to preserve vertex information at different granularity and to reduce problems in discrete space to machine learning tasks in continuous space. However, regardless of the fruitful progress, for some kinds of graph applications, such as graph compression and edge partitioning, it is very hard to reduce them to graph representation learning tasks. Moreover, these problems are closely related to reformulating a global layout for a specific graph, which is an important NP-hard combinatorial optimization problem: graph ordering. In this paper, we propose to attack the graph ordering problem behind such applications by a novel learning approach. Distinguished from greedy algorithms based on predefined heuristics, we propose a neural network model, Deep Order Network (DON), to capture the hidden locality structure from partial vertex order sets. Supervised by sampled partial orders, DON has the ability to infer unseen combinations. Furthermore, to alleviate the combinatorial explosion in the training space of DON and to enable efficient partial vertex order sampling, we employ a reinforcement learning model, the Policy Network, to adjust the partial order sampling probabilities during the training phase of DON automatically. To this end, the Policy Network improves the training efficiency and guides DON to evolve towards a more effective model automatically. The two networks are trained interactively, and the whole framework is called DON-RL. Comprehensive experiments on both synthetic and real data validate that DON-RL consistently outperforms the current state-of-the-art heuristic algorithm. Two case studies on graph compression and edge partitioning demonstrate the potential power of DON-RL in real applications.

1. Introduction

Graphs have been widely used to model complex relations among data, and support many large real applications including social networks, biological networks, citation networks and road networks. In the literature, in addition to the graph algorithms and graph processing systems designed and developed, graph representation learning has also been studied recently (DBLP:conf/www/AhmedSNJS13, ; DBLP:conf/nips/BelkinN01, ; fu2012graph, ; DBLP:conf/kdd/PerozziAS14, ; DBLP:conf/kdd/GroverL16, ; DBLP:conf/www/TangQWZYM15, ; DBLP:conf/www/TsitsulinMKM18, ; DBLP:conf/kdd/ZhangCWPY018, ; DBLP:conf/aaai/WangCWP0Y17, ; DBLP:conf/icde/FengCKLLC18, ; DBLP:conf/icde/HuAMH16, ; DBLP:conf/ijcai/ManSLJC16, ). These approaches are designed to preserve vertex information at different granularity, from microscopic neighborhood structure to macroscopic community structure, and are effectively used for node classification, link prediction, influence diffusion prediction, anomaly detection, network alignment, and recommendation. Surveys can be found in (cui2018survey, ; DBLP:journals/debu/HamiltonYL17, ).

Figure 1. Matrix Visualization for the Reordered Facebook: (a) Order by GO (DBLP:conf/sigmod/WeiYLL16, ); (b) Order by DON-RL.

In this paper, we concentrate on a rather different set of graph problems: graph visualization (DBLP:journals/cgf/BehrischBRSF16, ), graph compression (KangF11, ), and graph edge partitioning (DBLP:conf/kdd/BourseLV14, ; DBLP:conf/kdd/ZhangWLTL17, ). Here, graph visualization is to provide an effective way to visualize complex data, graph compression is to find a compact edge layout, and graph edge partitioning is to partition edges instead of vertices for better load balancing (graphlab, ; DBLP:conf/osdi/GonzalezXDCFS14, ). Such problems can be addressed if we can arrange all vertices in a good order. The state-of-the-art graph ordering (DBLP:conf/sigmod/WeiYLL16, ) reassigns every vertex a unique number in [1..n], where n is the number of vertices in the graph, by maximizing a cumulated locality score function for vertices in a sliding window of size w (> 1). This is an NP-hard combinatorial optimization problem, being a general form of the maximum Travelling Salesman Problem (TSP) (the case w = 1). The GO algorithm proposed in (DBLP:conf/sigmod/WeiYLL16, ) achieves a 1/(2w)-approximation, and its real performance is very close to the optimal in their testing, as it is close to the upper bound of the optimal solution.

Given such a high-quality solution for graph ordering, a natural question is whether we can do better by learning. The answer is yes. We demonstrate it in Fig. 1(b), which shows the matrix visualization of Facebook (snapnets, ) reordered by the GO algorithm and by our new learning-based approach. The left figures are the overall visualizations and the right ones zoom in on the upper-left part of the overall ordering. Compared with the GO algorithm (DBLP:conf/sigmod/WeiYLL16, ) (Fig. 1(a)), our learning-based approach (Fig. 1(b)) keeps the permutation more compact from the local perspective, and avoids getting stuck in forming local dense areas from the global perspective. Moreover, the learned permutation has a large potential to support other real applications. Based on our case studies, the learned permutation reduces the storage cost by more than 2× compared with the ordering used for real-graph compression (KangF11, ), and the edge partitioning derived from our approach outperforms the widely-used partitioning algorithm (DBLP:conf/kdd/BourseLV14, ) by up to 37%.

The main contributions of this paper are summarized below.

  • For the graph ordering problem, distinguished from the traditional heuristics-based algorithm, we propose a neural network based model: Deep Order Network (DON) to replace the predefined evaluation function to capture the complex implicit locality in graphs. Hence our proposed model can produce a better graph ordering than the current state-of-the-art heuristics-based algorithm.

  • To address the combinatorial explosion in the training space of DON, we exploit the idea of reinforcement learning and construct a policy network to learn the sampling probability of vertices. This policy network enables DON to focus on vertices that make a more remarkable contribution to the solution and guides DON to evolve towards a more effective model automatically. The whole learning framework is called DON-RL.

  • We conduct extensive experiments on both synthetic and real-world graphs with different properties. Experimental results show that DON-RL consistently outperforms the other algorithms. Furthermore, we conduct two case studies: graph compression and graph edge partitioning to demonstrate the potential value of our algorithm in real applications.

The paper is organized as follows. Section 2 introduces the preliminaries, including the problem definition, the intuitions and challenges. Section 3 gives an overview of our learning framework, which is composed of a central network for ordering and an auxiliary Policy Network. The design principles of these two neural networks are introduced in Section 4 and Section 5, respectively. Section 6 presents the experimental result of our approach in solving the graph ordering problem. We review the related works in Section 7. Finally, we conclude our approach in Section 8.

Table 1. Frequently Used Notations
Notations Definitions
G=(V,E): directed graph with vertex set V and edge set E.
𝚂(u,v): the pair-wise similarity function of vertices u and v.
F(Φ): the locality score function of a permutation Φ.
w: the given window size.
𝒬^(S,𝐯;Θ): Deep Order Network (DON).
Π(s;Θ′): Policy Network.
𝗉𝗋𝗈𝖻 (𝗉𝗋𝗈𝖻_t): the vertex sampling probability (at time t).
𝐚 (𝐚_t): the tuning action on the vertex sampling probability (at time t).
D_t / D_E: the training data at time t / the evaluation data.
Algorithm 1 GO (G=(V,E))
1:  S ← ∅;
2:  while |S| ≠ |V| do
3:     v* = argmax_{v∈V} 𝒬(S,v);
4:     S ← S ∪ {v*};
5:  end while
6:  return S;
Figure 2. An Example of Graph Ordering (DBLP:conf/sigmod/WeiYLL16, ): (a) A Directed Graph; (b) Maximize F(Φ).

2. Preliminaries

In this section, first, the graph ordering problem and its existing solution are introduced. Then, we elaborate on our intuitive improvement and the challenges.

2.1. Graph Ordering

Given a graph G=(V,E), where V(G) is a set of vertices and E(G) is a set of edges of G, the graph ordering problem is to find the optimal order for all vertices. In brief, let Φ(·) be a permutation function which assigns a vertex a unique number in [1..n], where n=|V(G)|. The problem is to find the optimal graph ordering Φ(·) by maximizing a cumulated locality score F(·) (Eq. (1)) over a sliding window of size w.

(1) F(Φ) = Σ_{0 < Φ(v)−Φ(u) ≤ w} 𝚂(u,v)

Here, u and v are two vertices ordered within a window of size w in the permutation, and 𝚂(u,v) is a pair-wise similarity function to measure the closeness of u and v. In (DBLP:conf/sigmod/WeiYLL16, ), 𝚂(u,v) is defined as 𝚂(u,v) = 𝚂_s(u,v) + 𝚂_n(u,v), where 𝚂_s(u,v) is the number of times that u and v are siblings, and 𝚂_n(u,v) is the number of times that u and v are neighbors.

Example 2.1.

Taking the directed graph shown in Fig. 2(a) as an example, there are 12 nodes numbered from 1 to 12, which is considered as one possible graph ordering. Table 2 lists the pair-wise 𝚂(u,v) values in matrix form for vertices 1 to 5 in Fig. 2(a), due to limited space. Given a window size w = 3, consider a partial permutation of length 3, Φ = [1,2,5], for the vertices in Fig. 2(a); its locality score is F(Φ) = 𝚂(1,2) + 𝚂(1,5) + 𝚂(2,5) = 4 (a short Python sketch after Table 2 reproduces this computation). Fig. 2(b) shows the optimal graph ordering which maximizes F(Φ) given the window size w = 3.

Table 2. 𝚂(u,v) Values for Fig. 2(a)
𝚂(u,v) 1 2 3 4 5
1 - 2 0 1 1
2 2 - 0 1 1
3 0 0 - 0 0
4 1 1 0 - 1
5 1 1 0 1 -
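
To make the locality score concrete, the following minimal Python sketch recomputes F for the partial permutation Φ = [1,2,5] of Example 2.1 with w = 3, using the 𝚂(u,v) values transcribed from Table 2. The dictionary encoding of 𝚂 is only for illustration.

# Pair-wise similarity S(u,v) for vertices 1-5, transcribed from Table 2 (symmetric).
S = {(1, 2): 2, (1, 3): 0, (1, 4): 1, (1, 5): 1,
     (2, 3): 0, (2, 4): 1, (2, 5): 1,
     (3, 4): 0, (3, 5): 0,
     (4, 5): 1}

def sim(u, v):
    # Look up the unordered pair.
    return S.get((u, v), S.get((v, u), 0))

def locality_score(phi, w):
    # Eq. (1): sum S(u,v) over all pairs whose positions in phi differ by at most w.
    total = 0
    for i, u in enumerate(phi):
        for j in range(i + 1, min(i + w + 1, len(phi))):
            total += sim(u, phi[j])
    return total

print(locality_score([1, 2, 5], w=3))  # prints 4, matching Example 2.1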

In general, maximizing F(Φ) is a variant of maximum TSP (the case w = 1) in which the weights are summed within a window of size w > 1, and it is thus NP-hard, as proven in (DBLP:conf/sigmod/WeiYLL16, ).

Wei et al. in (DBLP:conf/sigmod/WeiYLL16, ) proposed a greedy algorithm (named GO), as sketched in Algorithm 1, which iteratively extends a partial solution S by a vertex v*. The vertex v* is selected by maximizing a potential function 𝒬(S,v), which measures the quality of the partial solution extended by a vertex v, based on the current partial solution S.

In detail, in the i-th iteration, it computes the cumulative weight k(v) (Eq. (2)) for each remaining vertex v and inserts the one with the maximum k(v) into the permutation:

(2) k(v) = Σ_{j=max(1, i−w)}^{i−1} 𝚂(v_j, v),

where k(v) serves as the potential function 𝒬, and v_j (j = max(1, i−w), …, i−1) are the most recently inserted vertices. This procedure repeats until all the vertices in G are placed. The total time complexity of GO is O(w·d_max·n²), where d_max denotes the maximum in-degree of the graph G. There are n iterations, and in each iteration, it scans the remaining vertices in O(n) and computes the score k(v) for each scanned vertex v in O(w·d_max). The algorithm achieves a 1/(2w)-approximation for maximizing F(Φ). It achieves high-quality approximations for small window sizes, and its practical performance is close to the optimal in the experiments.
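
The greedy procedure of Algorithm 1, with k(v) of Eq. (2) as the potential function, can be sketched in a few lines of Python. The sim(u, v) function stands for 𝚂(u,v) (e.g., the lookup in the previous sketch); this is a simplified sketch that breaks ties arbitrarily and ignores the optimized bookkeeping of the original GO implementation.

def go_ordering(vertices, sim, w):
    # Greedily append the vertex v maximizing k(v): the summed similarity
    # between v and the last (at most w) vertices already placed.
    order = []
    remaining = set(vertices)
    while remaining:
        window = order[-w:]
        best = max(remaining, key=lambda v: sum(sim(vj, v) for vj in window))
        order.append(best)
        remaining.remove(best)
    return order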

2.2. Intuitions and Challenges

Intuitions: Given the greedy framework, Algorithm 1 can be regarded as a discrete Markov Decision Process (MDP) (DBLP:books/lib/SuttonB98, ), where the states are feasible partial permutations and the actions insert one vertex into the permutation. For a traditional greedy algorithm with predefined heuristics, the solution is built along a deterministic trajectory with locally optimal reward. With regard to the graph ordering problem, Q(s,a) (Eq. (3)) is simply approximated by the evaluation function 𝒬(S,v), which is specifically defined in Eq. (2). A natural question arises: can we do better by learning an expressive, parameterized 𝒬^(S,𝐯;Θ) instead?

Inspired by the universal approximation theorem (csaji2001approximation, ), deep reinforcement learning (DRL) learns a policy π from states to actions that specifies what action to take in each state. We can formulate Algorithm 1 as an MDP, which is composed of a tuple (𝒮, 𝒜, ℛ, ℙ, γ): 1) 𝒮 is a set of possible states; 2) 𝒜 is a set of possible actions; 3) ℛ = r(s_t, a_t) is the distribution of reward given a state-action pair (s_t, a_t); 4) ℙ = P(s_{t+1}|s_t, a_t) is the transition probability given (s_t, a_t); 5) γ is the reward discount factor. The policy is a function π: 𝒮 → 𝒜 that specifies what action to perform in each state.

The objective of RL is to find the optimal policy π\pi^{*} which maximizes the expected cumulative discounted reward. This cumulative discounted reward can be defined by the Bellman Equation as below for all states recursively:

(3) Q(s,a) = 𝔼[r(s,a) + γ·max_{a′} Q(s′,a′) | s,a],

where Q(s,a) and Q(s′,a′) are the maximum expected cumulative rewards achievable from the state-action pair (s,a) and the next state-action pair (s′,a′), respectively. To avoid exhaustively exploring the MDP, Monte Carlo simulation is used to sample state-action trajectories (s_0, a_0, r_0, s_1, a_1, r_1, …, s_T, a_T, r_T) to estimate the values of downstream state-action pairs. In this vein, DRL has been adopted to solve combinatorial optimization problems (DBLP:journals/corr/BelloPLNB16, ; DBLP:conf/nips/KhalilDZDS17, ), where the policy is learned by specific deep neural networks.

Figure 3. The overview of DON-RL. DON-RL contains two components. One is in the blue box, named DON. The other is in the orange box, named the Policy Network. DON-RL learns a 𝒬^(S,𝐯;Θ) by sampling partial solutions over a graph. The Policy Network takes the sampling probability 𝗉𝗋𝗈𝖻_t as input to predict a sampling tuning action 𝐚_t. The training of the two networks is performed interactively, where the updating of the Policy Network is controlled by the evaluation result of DON.

Challenges: Recently, these RL-based approaches (DBLP:journals/corr/BelloPLNB16, ; DBLP:conf/nips/KhalilDZDS17, ) have surpassed traditional greedy strategies in finding better feasible solutions. However, in practice, the pure RL framework is inefficient when the discrete action space becomes large. This drawback prevents RL-based approaches from solving the graph ordering problem over a single large real-world graph. Concretely, RL algorithms use Thompson Sampling (DBLP:conf/icml/GopalanMM14, ) to explore trajectories and estimate the expected cumulative reward (Eq. (3)). The larger the action space, the lower the probability that trajectories with a high accumulated reward can be sampled in a limited number of steps. The low-quality samples severely degrade the effectiveness of the model as well as the convergence in the training phase. Therefore, (DBLP:journals/corr/BelloPLNB16, ; DBLP:conf/nips/KhalilDZDS17, ) can only support training on graphs with tens to one hundred vertices, where the optimal solutions are provided by state-of-the-art integer programming solvers (cplex201412, ; applegate2006concorde, ). Recall that AlphaGo (DBLP:journals/nature/SilverHMGSDSAPL16, ) has at most 19×19 actions, which is already an enormously large action space requiring the computation power of a large cluster to explore. Unfortunately, due to the combinatorial explosion in the training space of 𝒬(S,v), any real large graph has a larger search space than that of AlphaGo.

In a nutshell, the construction of the parametric model 𝒬^(S,𝐯;Θ) and the efficient training in an exponential training space are two main challenges when we try to adopt the model-based framework to solve the graph ordering problem.

3. A New Approach: DON-RL

In order to address the challenges mentioned in Section 2.2, we propose a new framework, Deep Ordering Network with Reinforcement Learning (DON-RL), to capture the complex combinatorial structures of a single large graph in solving the graph ordering problem. To better approximate the evaluation function 𝒬(S,v), we propose a new model, Deep Order Network (DON), which takes the partial vertex order set S as input and predicts the next vertex which can maximize the cumulated locality score. Furthermore, to prune the exponential training space and improve the training efficiency of DON, we employ a policy gradient RL model, the Policy Network, to explore the valuable vertex order segments efficiently. Fig. 3 demonstrates the overall learning framework.

Deep Order Network: As a parametric permutation-invariant model of the evaluation function 𝒬(S,v) (Algorithm 1), DON takes a partial vertex order solution, i.e., a set of vertices, as input, and outputs the probability distribution over the next vertex to be selected. In other words, DON learns a parameterized 𝒬^(S,𝐯;Θ) function by exploiting vertex sets over the whole graph G, where every set encodes the contribution of its vertices to F(Φ) if they coexist in a window of size w. DON is trained by supervised learning over partial solutions. Here, a partial solution is a set of m ≤ w vertices for a window of size m, obtained by first sampling m−1 vertices from G and then computing the likelihood of the last vertex to be in the window. The last vertex to be added to a partial solution is determined by the ground-truth 𝒬 value, which is easy to calculate when m is small.

Since the function 𝒬^(S,𝐯;Θ) is permutation invariant with respect to any permutation of the vertices in S, we employ a permutation-invariant neural network architecture to model it. We present the design and principle of our Deep Order Network in detail in Section 4.

RL-based Training: In practice, the space of all possible partial vertex order solutions is exponential with respect to the number of vertices of a graph. Theoretically, a neural network can memorize and serve as an oracle for any query solution. However, it is intractable to enumerate all the partial solutions in training DON. Hence, sampling high-quality partial solutions, i.e., sets of vertices, is vital to train a DON model with high generalization ability, one that can infer unseen vertex combinations rather than merely memorize seen ones.

Rather than specifying the sampling probability uniformly or by human experience, we employ the Policy Network to learn how to adjust the probability dynamically, based on the evaluation results of DON. In this vein, we regard DON as an environment and the Policy Network as an agent which obtains rewards from the environment and predicts an adjusting action. The sampler of vertex sets adopts the adjusted sampling probability (i.e., the state transformed by the adjusting action of the Policy Network) to sample the training data for the next time slot and feeds these data to DON. After DON is trained for a certain number of iterations, the evaluation result of DON is received by the Policy Network as the reward for updating its weights, as well as for subsequent predictions. The training of the two networks is performed interactively, as shown in Fig. 3.

The significance of using an RL framework to control the input pipeline is twofold. First, it improves the training efficiency by a self-adaptive sampling scheme. Second, it guides DON to evolve towards a more effective model automatically.

In contrast to applying RL to solve the problem directly, using RL to make decisions on the sampling probability adjustment is feasible. On one hand, in practice, the action space of this probability adjustment task is smaller than that of directly attacking the whole problem. Since a main property of real-world graphs is that they are scale-free, the significance of vertices for building high-quality solutions is skewed, so the patterns of raising and lowering vertex sampling probabilities are not diverse. On the other hand, in the DON-RL framework, RL only serves as an auxiliary model to improve the training effectiveness and efficiency of DON, instead of estimating 𝒬^(S,𝐯;Θ) directly. Thus, the effectiveness of DON is less sensitive to low-quality state-action trajectories. Section 5 introduces the RL formulation and our algorithm in detail.

4. Deep Order Network

In this section, we propose the model Deep Order Network (DON) as a parameterized evaluation function 𝒬^(S,𝐯;Θ) to replace the evaluation function 𝒬(S,v) in Algorithm 1. Concretely, 𝒬^(S,𝐯;Θ) takes a vectorized representation of a partial solution S of w−1 vertices as input, and outputs a vector 𝐯 ∈ ℝ^N, where 𝐯_i represents the likelihood of inserting vertex v_i into the permutation, given the current partial solution S. The learned 𝒬^(S,𝐯;Θ) then replaces 𝒬(S,v) seamlessly in Algorithm 1 to generate a solution.

To learn 𝒬^(S,𝐯;Θ), we learn a vertex representation that preserves the quality of partial solutions, i.e., partial permutations Φ(S) composed of any subset of vertices of size w. For a given graph, this representation encodes the hidden optimality structure of vertices in F(Φ(S)) within a window. Intuitively, vertices with a higher 𝒬^ value tend to cluster in the vector space, whereas vertices with a lower 𝒬^ value are kept away. A high-quality feasible solution can be constructed by set expansion towards a high 𝒬^ value in the vector space. It is worth mentioning that, for a partial permutation S within a window, the locality score function F(Φ(S)) should be permutation invariant with respect to Φ(S). Therefore, the learned evaluation function 𝒬^ is permutation invariant under any permutation of the elements in S.

Property 1.

(DBLP:conf/nips/ZaheerKRPSS17, ) A function of sets f: X → y is permutation invariant, i.e., for any set 𝐱 = {x_1, …, x_M} ∈ X and any permutation π of the elements in 𝐱, f({x_1, …, x_M}) = f({x_{π(1)}, …, x_{π(M)}}).

DeepSets (DBLP:conf/nips/ZaheerKRPSS17, ) is a neural network architecture to model permutation-invariant functions defined on sets. For a set 𝐱 whose domain is the power set of a countable set, any valid, i.e., permutation invariant, function can be represented in the form of Eq. (4), for two suitable functions ϕ and ρ.

(4) f(𝐱) = ρ(Σ_{i=1}^{M} ϕ(x_i))

In Eq. (4), x_i ∈ ℝ^D is the i-th element in the set 𝐱, ϕ: ℝ^D → ℝ^K, and ρ: ℝ^K → ℝ. Intuitively, ϕ generates the element-wise representation for each x_i, and Σ is a pooling function which performs a generalization of classic aggregation functions, e.g., 𝖲𝖴𝖬, 𝖠𝖵𝖦 and 𝖬𝖠𝖷, on the set. The added-up representation is processed by ρ in the same manner as in any neural network. In our problem, the domain of partial solutions S is the set of combinations of V with size w−1, where V = {v_1, v_2, …, v_N} is the vertex set. The evaluation function 𝒬^(S,𝐯;Θ) for any such S can be learned by parameterizing the two functions (ϕ and ρ) with a specified pooling function Σ.
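
The following NumPy sketch illustrates the ρ(Σ ϕ(·)) structure of Eq. (4) with randomly initialized two-layer MLPs for ϕ and ρ and 𝖲𝖴𝖬 pooling. It is only meant to show that the output is invariant under any permutation of the input set; it is not the trained DON, and the dimensions are arbitrary.

import numpy as np

rng = np.random.default_rng(0)
D, H, K, N = 8, 32, 16, 5     # element dim, hidden width, set-embedding dim, #vertices

# Random weights for phi (D -> K) and rho (K -> N); a sketch, not a trained model.
W1, W2 = rng.normal(size=(D, H)), rng.normal(size=(H, K))
W3, W4 = rng.normal(size=(K, H)), rng.normal(size=(H, N))
relu = lambda x: np.maximum(x, 0.0)

def phi(X):
    # Element-wise representation of each set member.
    return relu(X @ W1) @ W2

def deep_set(X):
    # rho(sum_i phi(x_i)) with SUM pooling, followed by a softmax over N vertices.
    pooled = phi(X).sum(axis=0)
    logits = relu(pooled @ W3) @ W4
    return np.exp(logits) / np.exp(logits).sum()

X = rng.normal(size=(4, D))            # a set of 4 elements
assert np.allclose(deep_set(X), deep_set(X[rng.permutation(4)]))  # permutation invariant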

We measure the output of 𝒬^(S,𝐯;Θ) by a distribution p(𝐯|S), where p(v_i|S) is the i-th element of p(𝐯|S), meaning how probable the i-th vertex can be added into the current partial solution S. With the Bayes rule, this probability is given in Eq. (5).

(5) p(v_i|S) = p(v_i, S) / p(S) ∝ p(S ∪ {v_i})

In Eq. (5), p(S ∪ {v_i}) is the marginal probability of the extended solution, which can be estimated by explicitly computing the F score of the partial permutation ϕ(S ∪ {v_i}).

(6) p(S ∪ {v_i}) = F(ϕ(S ∪ {v_i})) / Σ_{j=1}^{N} F(ϕ(S ∪ {v_j}))

Additionally, we use the normalized F(ϕ(S ∪ {v_i})) values (Eq. (6)) to make training more stable. As shown in Fig. 3, a training instance is constructed by sampling a partial solution S with w−1 vertices as the input feature; then p(𝐯|S) (Eq. (6)) serves as the soft label for training. We use the cross entropy as the training loss, as given in Eq. (7). Here, p^(𝐯|S) is the prediction of DON.

(7) L(S) = −Σ_{i=1}^{N} p(v_i|S) log(p^(v_i|S))

To model 𝒬^(S,𝐯;Θ), batches of partial solutions are fed into DON, where a partial solution S is generated by sampling w−1 vertices over V by a probability distribution 𝗉𝗋𝗈𝖻 without replacement.
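
Under the assumption that F_score implements F of Eq. (1) on a window (e.g., the locality_score sketch in Section 2), one training instance can be built as sketched below: sample w−1 vertices according to 𝗉𝗋𝗈𝖻, compute the normalized F scores of all single-vertex extensions as the soft label (Eqs. (5)-(6)), and train with the cross-entropy loss of Eq. (7). The uniform fallback when all scores are zero is our own guard, not part of the paper.

import numpy as np

def make_training_instance(vertices, prob, w, F_score):
    # Sample a partial solution S of w-1 vertices without replacement, by prob.
    S = list(np.random.choice(vertices, size=w - 1, replace=False, p=prob))
    # Soft label: normalized F scores of extending S by each candidate vertex.
    scores = np.array([0.0 if v in S else F_score(S + [v]) for v in vertices], dtype=float)
    label = scores / scores.sum() if scores.sum() > 0 else np.full(len(vertices), 1.0 / len(vertices))
    return S, label

def cross_entropy(label, pred, eps=1e-12):
    # Eq. (7): L(S) = -sum_i p(v_i|S) * log p_hat(v_i|S).
    return float(-(label * np.log(np.asarray(pred) + eps)).sum())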

Figure 4. The F score, the increment of the F score, and the vertex degree for the permutation results of GO on two real graphs: (a) Wiki-Vote and (b) Air Traffic.

5. RL-based Training

In this section, we discuss how we adjust the sampling probability adaptively. As shown in Section 4, the feasible input of 𝒬^(S,𝐯;Θ) is all possible combinations of the vertices with size w−1, which indicates that it is intractable to enumerate all the partial solutions in training DON. To alleviate the combinatorial explosion issue, we employ a sampling strategy in the training space to reduce the candidate partial solutions.

Intuitively, high-degree vertices sharing many common neighbors play a significant role in constructing high-quality feasible solutions. However, since graph ordering is a discrete optimization problem, the significance of a discrete element's contribution to a solution can be very different from that of others, and there exists skewness regarding such significance among all elements.

To validate this claim, we report the permutation results of two real graphs, Wiki-vote (snapnets, ) and Air Traffic (konect, ), in Fig. 4. Here, the horizontal axis is the linear graph ordering. Vertically, we show the F score, the increment of F, and the degree of the vertex from top to bottom in three subfigures, respectively. As shown in Fig. 4, the patterns of the two solutions are very different. For Wiki-vote, over 2/3 of the vertices in the ordering make almost no contribution to enlarging the F score, while this is not the case for Air Traffic: as vertices are added into the permutation, the F score increases continuously. Interestingly, and counterintuitively, the high-degree vertices do not always make a significant contribution to increasing the F score.

This result inspires us to ask: what if the sampling probability is adjusted dynamically during the training phase? This further motivates us to use an RL framework to tune the sampling probability. Therefore, we model the tuning of the sampling probability as an MDP as follows:

States: a state s is a vector of sampling probabilities 𝗉𝗋𝗈𝖻 ∈ (0,1)^n, 𝗉𝗋𝗈𝖻 = {𝗉𝗋𝗈𝖻(v_i)}_{i=1}^{n}, where n is the number of vertices. 𝗉𝗋𝗈𝖻(v_i) is the probability of sampling vertex v_i for one training instance, and 𝗉𝗋𝗈𝖻 is normalized, i.e., Σ_{i=1}^{n} 𝗉𝗋𝗈𝖻(v_i) = 1. It is worth mentioning that the initialization of 𝗉𝗋𝗈𝖻 cannot be random. What the Policy Network does is to adjust the experience-based 𝗉𝗋𝗈𝖻 slightly instead of learning it directly without any bias, which would be extremely difficult for on-policy RL in an online fashion.

Actions: an action is a 0-1 vector 𝐚 = {a(i)}_{i=1}^{n} ∈ {0,1}^n. Each element a(i) denotes a tuning action imposed on 𝗉𝗋𝗈𝖻(v_i), i.e., increase (0) or decrease (1). A tuning rate λ ∈ ℝ controls the constant delta change of each element 𝗉𝗋𝗈𝖻(v_i) for a given a(i). The adjustment is specified in Eq. (8). The final step of an adjustment is the L1 normalization of 𝗉𝗋𝗈𝖻.

(8) 𝗉𝗋𝗈𝖻(v_i) = 𝗉𝗋𝗈𝖻(v_i) + λ, if a(i) = 0;  𝗉𝗋𝗈𝖻(v_i) − λ, if a(i) = 1

Transition: given the state-action pair (𝗉𝗋𝗈𝖻_t, 𝐚_t) at time t, the transition to the next state 𝗉𝗋𝗈𝖻_{t+1} is deterministic (a short sketch of this update follows the Policy definition below).

Reward: the reward r_t = r(𝗉𝗋𝗈𝖻_t, 𝐚_t) reflects the benefit of updating a state 𝗉𝗋𝗈𝖻_t by an action 𝐚_t at time t. We use the evaluation result (e.g., root mean square error, cross entropy, etc.) of DON trained on the training data generated by the updated 𝗉𝗋𝗈𝖻_t as r_t. To avoid over-fitting of DON, instead of sampling the evaluation set D_E according to 𝗉𝗋𝗈𝖻_t directly, we adopt a fixed degree sampler, i.e., the initial 𝗉𝗋𝗈𝖻, to sample the first vertex and generate the optimal partial solution with window size w. The evaluation is performed on a selected evaluation set, D_E, after training DON for a certain number of steps. The cumulative reward is computed by adding future rewards discounted by a discounting factor γ ∈ [0,1].

Policy: a policy function π(𝐚_t|𝗉𝗋𝗈𝖻_t) specifies a tuning operation 𝐚_t to perform on a given state 𝗉𝗋𝗈𝖻_t. Here we employ a parametric function π_{Θ′}(𝐚_t|𝗉𝗋𝗈𝖻_t) with parameters Θ′ to approximate the output of the policy. This model can be any multi-label classification model (DBLP:journals/jdwm/TsoumakasK07, ), e.g., a multilayer perceptron (MLP), which takes 𝗉𝗋𝗈𝖻 as input and outputs the probability of an action 𝐚. We use π_{Θ′}(𝐚_t|s_t) to denote the output probability at time t. To distinguish it from DON 𝒬^(S,𝐯;Θ), we use Π(s;Θ′) to denote the Policy Network.
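
A minimal sketch of applying an action to the state, i.e., the update of Eq. (8) followed by L1 normalization (the deterministic transition), is given below. The clipping that keeps every probability strictly positive is our implementation assumption, not part of Eq. (8).

import numpy as np

def apply_action(prob, action, lam):
    # Eq. (8): raise prob(v_i) by lam when a(i) = 0, lower it by lam when a(i) = 1.
    delta = np.where(np.asarray(action) == 0, lam, -lam)
    new_prob = np.clip(np.asarray(prob) + delta, 1e-8, None)  # keep strictly positive
    return new_prob / new_prob.sum()                          # L1 normalization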

5.1. Train Tuning Policy

To find the optimal policy, there are two kinds of RL algorithms (DBLP:books/lib/SuttonB98, ). One is value-based, e.g., Q-learning, which learns the optimal state-action values in Eq. (3), and the corresponding optimal policy takes the best action in any state. The other is policy-based, which learns the optimal policy directly from a collection of policies. Since policy-based RL algorithms are more suitable for high-dimensional action spaces and have better convergence properties, we use a model-free, policy-based RL algorithm to train the Policy Network directly. The objective is to find the optimal policy π* by maximizing the expected accumulated and discounted rewards of Eq. (9), which is estimated by sampling state-action trajectories ς = (s_0, a_0, r_0, s_1, a_1, …) of T steps.

(9) J(Θ′) = 𝔼[Σ_{t=0}^{T} γ^t r_t | π_{Θ′}] = 𝔼_{ς∼π_{Θ′}(ς)}[Σ_{t=0}^{T} γ^t r_t]

The gradient of Eq. (9) is formulated using the 𝖱𝖤𝖨𝖭𝖥𝖮𝖱𝖢𝖤 algorithm (DBLP:journals/ml/Williams92, ) as given in Eq. (10).

(10) ∇J(Θ′) = 𝔼_{ς∼π_{Θ′}(ς)}[∇ log π(s_t) (Σ_{t=0}^{T} γ^t r_t − b(s_t))]

Here, since the raw reward of a trajectory, Σ_{t=0}^{T} γ^t r_t, is always positive, a baseline function b(s) is introduced to reduce the variance of the gradients. The quantity Σ_{t=0}^{T} γ^t r_t − b(s) indicates whether a reward is better or worse than the expected value we should get from state s. For the baseline function b(s), we adopt the widely-used moving average of the historical rewards, i.e., the evaluation results of DON.
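
The scalar weights that multiply ∇ log π(s_t) in Eq. (10) can be sketched as below: compute the discounted return R_t for every step of the sampled trajectory and subtract the baseline b (here the moving average of historical rewards, passed in as a number). The actual gradient of log π is left to the automatic differentiation of whatever framework hosts the Policy Network.

def discounted_returns(rewards, gamma):
    # R_t = sum_{i >= t} gamma^(i - t) * r_i, computed backwards over the trajectory.
    returns, R = [], 0.0
    for r in reversed(rewards):
        R = r + gamma * R
        returns.append(R)
    return list(reversed(returns))

def reinforce_weights(rewards, gamma, baseline):
    # Scalar factors (R_t - b) applied to grad log pi(a_t|s_t) in Eq. (10).
    return [R - baseline for R in discounted_returns(rewards, gamma)]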

Algorithm 2 Train the Policy Network Π(s;Θ′)
1:  Input: graph G, initial state 𝗉𝗋𝗈𝖻_0, evaluation set D_E, learning rate α, discounting factor γ
2:  Initialize Π(s;Θ′)
3:  for each RL step do
4:     for t = 0 to T do
5:        Sample a_t ∝ π_{Θ′}(𝐚_t|s_t), where s_t = 𝗉𝗋𝗈𝖻_t
6:        Compute 𝗉𝗋𝗈𝖻_{t+1} by 𝐚_t and normalization
7:        Train and update 𝒬^(S,𝐯;Θ) based on D_{t+1}, where D_{t+1} is sampled from G by 𝗉𝗋𝗈𝖻_{t+1}
8:        Evaluate 𝒬^(S,𝐯;Θ), and receive the reward r_t computed on D_E
9:     end for
10:     for t = 0 to T do
11:        Compute the cumulated reward R_t = Σ_{i=t}^{T} γ^{i−t} r_i
12:        Θ′ ← Θ′ + α(R_t − b(s_t)) ∇ log π(s_t)
13:     end for
14:  end for
15:  Output: the Policy Network Π(s;Θ′)

The training algorithm of the Policy Network is given in Algorithm 2. It takes a graph G, an initial, experience-based sampling probability 𝗉𝗋𝗈𝖻_0, the evaluation set D_E, and the learning rate of RL as input, and trains the Policy Network Π(s;Θ′) by interacting with the training process of the Deep Order Network 𝒬^(S,𝐯;Θ). The intuition is that if the reward of an action is positive, the action is good and the gradients should be applied to make the action even more likely to be chosen in the future. However, if the reward is negative, the action is bad and the opposite gradients should be applied to make this action less likely in the future.

We briefly explain the algorithm below. First, the Policy Network Π(s;Θ′) is initialized (line 2). The training with policy gradients starts at line 3. Before that, the training of DON has already run several steps to collect enough evaluation results for computing the baseline. In each RL step, the algorithm updates Π(s;Θ′) once by one Monte Carlo sampling of a trajectory of length T (line 4). For each time t, it repeats the following 4 steps. First, draw a random action 𝐚_t based on the probability output by the Policy Network, i.e., π_{Θ′}(𝐚_t|𝗉𝗋𝗈𝖻_t) (line 5). Instead of directly choosing the action with the highest probability, the random action manages a balance between exploring new actions and exploiting the actions which are learned to work well. Second, apply the action 𝐚_t to the state 𝗉𝗋𝗈𝖻_t by imposing an increment/decrement of λ (line 6), to generate the next state 𝗉𝗋𝗈𝖻_{t+1}. Third, generate the training data set D_{t+1} of DON by sampling G with probability 𝗉𝗋𝗈𝖻_{t+1}, and feed D_{t+1} to train 𝒬^(S,𝐯;Θ) (line 7). Fourth, after training 𝒬^(S,𝐯;Θ) for a certain number of steps, we use the evaluation set D_E to evaluate DON, and collect the evaluation result as the reward r_t (line 8). When a trajectory has been simulated, we compute the cumulative rewards (line 11) and apply gradient ascent to update the Policy Network Π(s;Θ′) (line 12).

6. Experimental Studies

In this section, we present our experimental evaluation. First, we give the specific setting of the experiments, including the datasets and the settings of the models and training. Then, we compare the proposed model, DON-RL, with the state-of-the-art algorithmic heuristic, conduct an A/B test to validate the effect of the Policy Network, and observe the performance of the models as w varies. Finally, we present two case studies, compressing a graph and deriving a competitive edge partitioning from the model-based ordering.

6.1. Experimental Setup

Table 3. Datasets
Graphs |V| |E| Density Description
Wiki-vote (snapnets, ) 7,115 103,689 0.0020 vote graph
Facebook (snapnets, ) 4,039 88,234 0.0108 social graph
p2p (snapnets, ) 6,301 20,777 0.0005 file sharing graph
Arxiv-HEP (snapnets, ) 9,877 25,998 0.0003 co-authorship graph
Cora (konect, ) 23,166 91,500 0.0001 citation graph
PPI (snapnets, ) 21,557 342,353 0.0015 biological graph
PL10K_1.6 10,000 121,922 0.0024 power-law graph
PL10K_1.8 10,000 58,934 0.0011 power-law graph
PL10K_2.0 10,000 31,894 0.0006 power-law graph
ER10K_0.02 10,000 1,000,337 0.0200 ER graph
ER10K_0.05 10,000 2,498,836 0.0500 ER graph
ER10K_0.1 10,000 5,004,331 0.1000 ER graph

Datasets: We use six real graphs collected from Stanford SNAP (snapnets, ) and KONECT (The Koblenz Network Collection) (konect, ). Wiki-vote is the Wikipedia adminship vote network. Facebook is the Facebook friendship network. p2p is a snapshot of the Gnutella peer-to-peer file sharing network. Arxiv-HEP is the Arxiv collaboration network between authors in the field of High Energy Physics. Cora is a scientific paper citation network. PPI is a protein-protein interaction network where vertices represent human proteins and edges represent physical interactions between proteins.

In addition, to validate the performance of our models on graphs with different structures explicitly, we generate six synthetic graphs with the SNAP random graph generator: power-law and Erdös-Rényi. For power-law, we choose three power-law distributions with shape parameter γ ∈ {1.6, 1.8, 2.0} and adopt Newman's method (newman2001random, ) to generate three power-law graphs: PL10K_1.6, PL10K_1.8 and PL10K_2.0. For Erdös-Rényi, we use the parameter p ∈ {0.02, 0.05, 0.1}, which indicates the probability of an edge existing between two vertices, to generate three synthetic graphs with different densities, namely ER10K_0.02, ER10K_0.05 and ER10K_0.1. All synthetic graphs have 10,000 vertices. Table 3 summarizes the information of these real and synthetic graphs.

Figure 5. Reducing the Sampling Classes V(G): (a) Order Equivalence (u with leaves v_1, v_2, v_3); (b) Merged Vertex (u with merged vertex v*).

Reduce the Sampling Classes: Recall that DON takes a vertex set sampled from V(G) as input to predict the likelihood of appending each vertex to the permutation. Based on the RL-based sampling probability tuning approach, we can further reduce the sampling classes V(G) by a simple but effective strategy. The reduction is based on the automorphism of vertices on the fringe of the graph. We use the example in Fig. 5 to illustrate this reduction. In Fig. 5(a), vertices v_1, v_2 and v_3 have a common in-neighbor u, which is also their only neighbor. For v_i (i = 1, 2, 3), we have 𝚂(v_i, u) = 1, and the 𝚂 value is 0 for all other vertices. v_1, v_2 and v_3 are automorphic, so permuting them can be regarded as permuting one vertex v* three times, where 𝚂(v*, u) = 1 and 𝚂 is 0 for the other vertices. In other words, if three positions are targeted for the three vertices v_i in the permutation, the F value is equivalent for all 3! assignments of the v_i. In this vein, we merge v_1, v_2 and v_3 into one vertex v* as shown in Fig. 5(b). Sampling vertex sets and inference to generate a permutation can be conducted directly on the merged graph (Fig. 5(b)). Each time the model selects v* to extend the permutation, we insert a vertex randomly drawn from v_1, v_2, v_3. For real-world graphs, due to their power-law degree distributions, our experiments show this preprocessing reduces about 9% of vertices on average.
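
The reduction can be sketched as follows: degree-1 vertices hanging off the same (and only) neighbor are interchangeable in any permutation, so each such group is collapsed into a single representative and expanded back when the permutation is generated. The undirected neighbor-map input is an assumption for illustration; edge directions do not change the argument.

from collections import defaultdict

def merge_degree_one_vertices(neighbors):
    # neighbors: dict vertex -> set of adjacent vertices (undirected view).
    # Returns (kept_vertices, merged) where merged maps each representative
    # to the whole group of interchangeable degree-1 vertices it stands for.
    groups = defaultdict(list)
    for v, nbrs in neighbors.items():
        if len(nbrs) == 1:                 # fringe vertex with a single anchor
            groups[next(iter(nbrs))].append(v)
    kept, merged = set(neighbors), {}
    for anchor, leaves in groups.items():
        if len(leaves) > 1:                # only merge genuine groups
            rep, rest = leaves[0], leaves[1:]
            kept -= set(rest)
            merged[rep] = leaves           # expanded back in arbitrary order later
    return kept, merged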

Model settings: In DON, both the ϕ and ρ networks are two-layer MLPs. We use 𝖲𝖴𝖬 as the pooling layer to add up the representations produced by ϕ, and the output is further processed by ρ. The input of ϕ is a binary indicator vector representation, {0,1}^n, of a partial solution of size w−1, and the output is a probability distribution p(𝐯|S) ∈ (0,1)^n generated by a softmax function. All the hidden units are activated by the ReLU function ReLU(x) = max(0, x).

The Policy Network is a multi-label classifier implemented as a two-layer MLP. The input is the state 𝗉𝗋𝗈𝖻_t ∈ (0,1)^n, and the output layer predicts the probability of performing action 𝐚_t, i.e., a vector in (0,1)^n produced by sigmoid activation. Similar to DON, the hidden units are activated by the ReLU function.

Table 4. The Hyper-parameter Configuration
Hyper-parameters Values
DON: learning rate 10^{-3} ∼ 10^{-4}
DON: mini-batch size 64 ∼ 512
DON: # hidden units 32, 64, 128, 256
DON: global steps 5×10^3 ∼ 10^6
Policy Network: learning rate 10^{-3} ∼ 10^{-5}
Policy Network: trajectory length T 1 ∼ 10
Policy Network: RL steps 50 ∼ 300
Policy Network: discounting factor γ 0.9, 0.95
Policy Network: tuning rate λ 0.1n ∼ 0.2n
Policy Network: # hidden units 32, 64, 128, 256
Policy Network: # evaluation set 2000, 5000

Implementation and training: The learning framework is built on Tensorflow (DBLP:conf/osdi/AbadiBCCDDDGIIK16, ) 1.8 with Python 2.7. We use Adam (DBLP:journals/corr/KingmaB14, ) and RMSProp optimizer to train DON and the Policy Network, respectively, which are trained interactively as shown in Algorithm 2.

Table 4 shows the hyper-parameter settings used in training. The models are learned with these parameters tuned within the corresponding experienced ranges. In Table 4, the global steps and RL steps are the numbers of parameter updates of DON and the Policy Network, respectively. For one RL step, DON is trained along a trajectory of length T. To this end, (global steps) / (RL steps × T) DON steps are trained in one timestamp. For computing the rewards, we use the Root Mean Square Error (RMSE) as the evaluation metric on the evaluation set. The reward is defined as the negative of the RMSE. We use the best-neighbor heuristic to generate the evaluation set, randomly choosing the first vertex with the initial 𝗉𝗋𝗈𝖻, the degree distribution. A bare DON is trained by keeping the sampling probability fixed at 𝗉𝗋𝗈𝖻_0. In this section, we use DON and DON-RL to denote the bare DON model and DON with the Policy Network, respectively.
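
Since the reward is the negative RMSE on the evaluation set, it can be computed as in the following small sketch, where pred and label are DON's predicted distribution and the soft label for the evaluation instances.

import numpy as np

def rmse_reward(pred, label):
    # Reward = -RMSE between DON's predictions and the evaluation soft labels.
    pred, label = np.asarray(pred, dtype=float), np.asarray(label, dtype=float)
    return -float(np.sqrt(np.mean((pred - label) ** 2)))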

Table 5. Result of Graph Ordering
F(Φ) |V′| GO DON DON-RL
Wiki-vote 5,880 145,736 156,871 157,669
Facebook 3,974 230,031 207,511 234,151
p2p 5,624 20,472 21,422 22,086
Arxiv-HEP 9,877 85,958 90,629 91,001
Cora 22,317 98,334 101,063 100,966
PPI 19,041 383,343 347,237 404,728
PL10K_1.6 8,882 166,540 190,021 197,622
PL10K_1.8 8,292 66,272 86,840 89,930
PL10K_2.0 8,484 29,373 37,332 38,071
ER10K_0.02 10,000 136,925 145,084 162,462
ER10K_0.05 10,000 615,706 673,357 724,743
ER10K_0.1 10,000 2,319,250 2,554,032 2,647,686
Figure 6. DON vs. DON-RL on (a) PPI, (b) p2p, (c) Facebook, (d) Cora, (e) Wikivote, (f) PL10K_1.6, (g) PL10K_1.8, (h) PL10K_2.0, (i) Arxiv-HEP, (j) ER10K_0.02, (k) ER10K_0.05, (l) ER10K_0.1.

6.2. Result of Graph Ordering

Table 5 shows the overall results in terms of the F score of the permutations generated by DON and DON-RL, compared with the baseline graph ordering algorithm GO (DBLP:conf/sigmod/WeiYLL16, ). As a preprocessing step, vertices on the fringe of the graph, i.e., vertices whose degree is 1, are merged into one equivalent vertex as shown in Fig. 5. In Table 5, the column |V′| is the compacted vertex number of the graph. We can see that this trick compresses up to 17% of the vertices for real and power-law graphs, but it has no effect on the three Erdös-Rényi graphs due to their degree distribution. With regard to maximizing F, the solutions of DON and DON-RL surpass those of the greedy algorithm significantly. For the real graphs, all the solutions of DON-RL outperform those of GO by 1.8% up to 8.2%. For DON, on the three real graphs Wiki-Vote, p2p and Cora, the improvements of F are 7.6%, 4.7% and 2.8%, respectively. For the synthetic graphs, both DON and DON-RL generate solutions with much higher F scores than GO. DON-RL performs best, and its solutions surpass those of GO by up to 35.0% for power-law graphs and 18.6% for ER graphs. As the matrix visualization results in Fig. 1(b) show, the model-based approach DON-RL focuses on the overall performance and can optimize the permutation of the sub-significant vertices. The greedy heuristic is powerful for ordering the top significant vertices, while for the sub-significant vertices it fails to capture the global information when ordering them w.r.t. the common neighbor relations. It is easy for GO to get stuck in forming local dense areas while incurring sparsity in the global scope. This phenomenon agrees with our intuition that the greedy algorithm can neglect better solutions in the future scope.

6.3. Adaptive Training by RL

In order to validate the effectiveness of the Policy Network, we perform an A/B test to observe the effect of RL-based adaptive training, i.e., using the same hyper-parameters to train a DON model and a joint DON-RL model. During the overall training process, we generate a solution after every 10% of the global training steps of DON. We directly observe the change of the objective function F(Φ), as the loss function of the model is not a direct reflection of the objective of this NP-hard problem. Fig. 6 presents the results over the 12 graphs in Table 3.

We observe that the performance of DON and DON-RL differs among different graphs during training. In general, imposing RL on the sampling probability adjustment improves the performance of DON in most cases. As shown in Fig. 6(a), 6(c), 6(e)-6(g) and 6(j)-6(l), DON-RL can achieve better results than DON on graphs with high density, since the locality of vertices is significantly skewed on such graphs. On the other hand, for graphs with low density and flat degree distributions, such as Cora (Fig. 6(d)) and p2p (Fig. 6(b)), using RL does not make a significant improvement during training. We conjecture that, in this scenario, the potential action space is relatively large, so it is difficult for the RL algorithm to find an action trajectory with high reward, especially when the locality has already been well captured by DON. In addition, due to the dynamic adjusting mechanism, DON-RL can make the training process robust and avoid over-fitting. As shown in Fig. 6(b) and 6(i), DON reaches its best result much earlier than DON-RL and then degrades due to over-fitting. In contrast, DON-RL can rectify the over-fitting autonomously in Fig. 6(c) and 6(h): the performance of DON-RL increases steadily and surpasses DON in the end.

Figure 7. Varying Window Size w

Therefore, the RL-based DON-RL is suitable for graphs whose structures are highly skewed. The results in Fig. 6 also confirm the effectiveness of the Policy Network in DON-RL.

Figure 8. Compression Costs: (a) b = 8; (b) b = 16; (c) b = 32.
Figure 9. Edge Partitioning of Facebook

6.4. Varying the Window Size ww

In this section, we investigate the effect of the window size w in Eq. (1). Fig. 7 shows, for the graph PL10K_1.6, the F scores of 5 models trained with w set to {4, 5, 6, 7, 8}. These DON models perform stably better than GO as w grows. In Fig. 7, DON-RL denotes the 5 models trained with RL, which generate the permutations with the highest F. Recall that the approximation ratio of GO is 1/(2w), which means that, theoretically, the larger w is, the larger the gap between GO and the optimal solution, i.e., the improvement space of GO. Interestingly, in Fig. 7, as w grows, the increment of the locality score of the model-based approaches grows a little faster than that of GO. It implies that the model-based approaches can better take advantage of this gap to optimize the learned solution.

In addition, another interesting observation is that models trained with smaller w perform better than those trained with larger w. It indicates that we can use pre-trained models with small w to generate permutations for larger w.

6.5. Case Studies

In this section, we perform two case studies to demonstrate the power of DON-RL in solving real-world problems.

Graph Compression: First, we explore the potential of using DON-RL to deal with the graph compression problem. Concretely, given a graph, we want to lay out its edges so that its adjacency matrix A is easy to compress. Formally, this problem is equivalent to finding a vertex ordering that minimizes the storage cost 𝖼𝗈𝗌𝗍_nz(A,b), where b is the block width. Concretely, we divide the matrix A into b×b square blocks and count the number of nonempty blocks as 𝖼𝗈𝗌𝗍_nz(A,b) (KangF11, ). Since 𝖼𝗈𝗌𝗍_nz(A,b) is not comparable across different block widths b, we adopt a normalized version 𝖼𝗈𝗌𝗍_r(A,b):

(11) 𝖼𝗈𝗌𝗍_r(A,b) = 𝖼𝗈𝗌𝗍_nz(A,b) / ⌈N/b⌉²,

where ⌈N/b⌉² is the number of blocks in A. A good ordering for compression should have a low 𝖼𝗈𝗌𝗍_nz(A,b) for a given block size b.
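
The block-counting cost of Eq. (11) can be sketched as follows for an ordering produced by any of the compared methods; the edge-list input and 0-based positions are assumptions for illustration.

import math

def compression_cost(edges, order, b):
    # edges: iterable of (u, v); order: dict vertex -> position in [0, N).
    # Returns (cost_nz, cost_r) for block width b, as in Eq. (11).
    n = len(order)
    nonempty = {(order[u] // b, order[v] // b) for u, v in edges}
    cost_nz = len(nonempty)
    return cost_nz, cost_nz / math.ceil(n / b) ** 2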

Fig. 8 shows the compression cost 𝖼𝗈𝗌𝗍_r(A,b) on the 5 real graphs in Table 3, when the block width b is 8 (Fig. 8(a)), 16 (Fig. 8(b)) and 32 (Fig. 8(c)). We compare the compression performance with two methods: Degree permutes the vertices in decreasing order of degree, and SlashBurn (KangF11, ) is a graph ordering algorithm proposed for graph compression. We set w = 5 for our method, DON-RL, on all datasets. As shown in Fig. 8, DON-RL outperforms the other methods on three datasets (Wikivote, Facebook and Cora) and achieves comparable results on two datasets (p2p and PPI). In terms of the number of nonempty blocks, DON-RL reduces the counts by 2.97× and 2.25× compared with the second-best orderings on Facebook and Cora. Interestingly, Degree is a quite good heuristic ordering for graph compression: it achieves the best performance on p2p and PPI. The results validate that our learned vertex order is a potentially good criterion for graph compression.

Graph Edge Partitioning: The second application we investigate is graph edge partitioning, which can achieve better load balancing than traditional vertex partitioning for skewed graph data in a parallel/distributed environment. The problem is, given a partition number k, to partition the edge set E disjointly into k balanced subsets E_i, minimizing the replication factor (RF) of the vertices in the k partitions:

(12) RF(E_1, …, E_k) = (1/|V|) Σ_{i∈[k]} |V(E_i)|, where ∪_{i∈[k]} E_i = E.

This problem is also NP-hard (DBLP:conf/kdd/ZhangWLTL17, ). Since the graph ordering is based on vertex locality, we use the ordering result of DON-RL to generate an edge partitioning directly. The partitioning is performed by assigning the edges of adjacent vertices in the permutation to one partition under the balance constraint.
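
One simple way to turn a vertex permutation into an edge partitioning, in the spirit described above, is sketched below: cut the permutation into k contiguous chunks and assign each edge to the chunk of its earlier-positioned endpoint, then measure the replication factor of Eq. (12). The contiguous-chunk rule is our illustrative reading of "assigning edges of adjacent vertices in the permutation to one partition"; it only approximates the balance constraint.

def partition_by_order(edges, order, k):
    # order: dict vertex -> position in [0, n); returns k edge sets.
    n = len(order)
    chunk = (n + k - 1) // k                    # contiguous ranges of the permutation
    parts = [set() for _ in range(k)]
    for u, v in edges:
        p = min(order[u], order[v]) // chunk    # partition of the earlier endpoint
        parts[min(p, k - 1)].add((u, v))
    return parts

def replication_factor(parts, num_vertices):
    # Eq. (12): average number of partitions in which each vertex appears.
    return sum(len({x for e in part for x in e}) for part in parts) / num_vertices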

Fig. 9 shows the partitioning derived from DON-RL with w = 5, together with 5 other edge partitioning and graph ordering algorithms, on Facebook. Here, Random (DBLP:conf/kdd/BourseLV14, ) assigns each edge independently and uniformly to the k partitions. The Greedy (DBLP:conf/kdd/BourseLV14, ) heuristic preferentially assigns an edge to a partition that already contains one or both of its end vertices, while trading off the balance. NE and SNE (DBLP:conf/kdd/ZhangWLTL17, ) are the state-of-the-art edge partitioning heuristics based on neighbor expansion. The replication factor of DON-RL is smaller than that of the widely used Greedy by up to 37%. Also, as the number of partitions increases, the replication factor of DON-RL increases much more slowly than those of Greedy and SlashBurn. It is interesting that a good partitioning can be derived from a high-quality ordering result.

7. Related Works

Graph Representation Learning aims to learn an effective graph representation that encodes sufficient features for downstream machine learning/deep learning tasks. Recently, graph representation learning has been extensively studied; surveys can be found in (cui2018survey, ; DBLP:journals/debu/HamiltonYL17, ). Traditional graph representation learning, which uses matrix factorization to find a low-rank space for the adjacency matrix (DBLP:conf/www/AhmedSNJS13, ; DBLP:conf/nips/BelkinN01, ; fu2012graph, ), cannot be used to solve our problem, because these methods are designed to reconstruct the original graph. Structure-preserving representation learning techniques can encode/decode graph structures and properties. DeepWalk (DBLP:conf/kdd/PerozziAS14, ) and Node2Vec (DBLP:conf/kdd/GroverL16, ) encode the neighborhood relationship by exploiting truncated random walks as the embedding context. Furthermore, there are graph representations that preserve vertex information at different granularity, from microscopic neighborhood structure to macroscopic community structure. (DBLP:conf/www/TangQWZYM15, ; DBLP:conf/www/TsitsulinMKM18, ; DBLP:conf/kdd/ZhangCWPY018, ) carry high-order proximity and closeness measures to provide flexible and diverse representations for unsupervised graph learning tasks. M-NMF (DBLP:conf/aaai/WangCWP0Y17, ) incorporates macroscopic graph proximity, i.e., the community affiliation of vertices. Such approaches are designed for machine learning tasks (e.g., node classification, link prediction, as well as recommendation). Unfortunately, these representations cannot be directly applied to our problem. First, these representations are learned in an unsupervised fashion. Second, as they do not incorporate abundant vertex features, the learned graph proximity is insufficient. Third, our problem aims to find an optimal combinatorial structure instead of performing inference on individual instances, which only involves local information.

There are also reported studies on representation learning over special graphs, such as heterogeneous graphs (DBLP:conf/kdd/DongCS17, ), dynamic graphs (DBLP:journals/corr/abs-1803-04051, ), and attributed graphs (DBLP:journals/corr/KipfW16, ). In addition, some graph machine learning applications require specialized graph representations, for instance, influence diffusion prediction (DBLP:conf/icde/FengCKLLC18, ), anomaly detection (DBLP:conf/icde/HuAMH16, ), and network alignment (DBLP:conf/ijcai/ManSLJC16, ). These models are too specific to deal with our problem.

Neural Networks for Combinatorial Optimization: Early works that use neural networks to construct solutions for NP-hard optimization problems are summarized in (DBLP:journals/informs/Smith99, ). Recently, deep learning has also been adopted to solve combinatorial optimization problems. A new attention-based sequence-to-sequence neural architecture, Pointer Network (NIPS2015_5866, ), is proposed and used to solve the planar Traveling Salesman Problem (TSP). Pointer Network learns the planar convex hull supervised by a set of problem instances and solutions alone; (DBLP:journals/corr/BelloPLNB16, ) then trains Pointer Network by reinforcement learning with the Actor-Critic algorithm (DBLP:conf/nips/KondaT99, ). Instead of using given training instances, an actor network makes trial tours and evaluates them using a critic network. To further improve the performance, (anonymous2019attention, ) proposes an alternative neural network that encodes input nodes with multi-head attention (DBLP:conf/nips/VaswaniSPUJGKP17, ). These approaches (NIPS2015_5866, ; DBLP:journals/corr/BelloPLNB16, ; anonymous2019attention, ) concentrate on TSP in 2D Euclidean space, and it is non-trivial to extend them to deal with graphs. For graph data, Dai et al. (DBLP:conf/nips/KhalilDZDS17, ) propose a framework to learn a greedy meta-algorithm over graphs. First, a graph is encoded by a graph embedding network. Second, the embedding is fed into additional layers to estimate an evaluation function. The overall network is trained by deep Q-learning (DBLP:journals/ml/Williams92, ) in an end-to-end fashion, and the framework is demonstrated on Minimum Vertex Cover, Maximum Cut and TSP. These studies aim to generalize the process of problem-solving over a distribution of problem instances offline. A recent study provides a supervised learning approach to deal with NP-hard problems equivalent to maximum independent set (MIS). It adopts a graph convolutional network (GCN) (DBLP:journals/corr/KipfW16, ) to predict the likelihood that each vertex is in the optimal solution (DBLP:conf/nips/LiCK18, ). Since it relies on state-of-the-art MIS local search heuristics to refine the candidate solutions, the bare utility of the model needs to be further studied.

8. Conclusion

In this paper, we focus on an NP-hard problem, graph ordering, from a novel, machine learning-based perspective. Distinguished from recent research, the NP-hard problem here is defined over a specific large graph. We design a new model, Deep Order Network (DON), to learn the underlying combinatorial closeness over the vertices of a graph by sampling small vertex sets as local partial solutions. To further improve the sampling effectiveness, we propose an adaptive training approach based on reinforcement learning, which automatically adjusts the sampling probabilities. Compared with the high-quality greedy algorithm, our overall model, DON-RL, improves the quality of the solution by up to 7.6% and 35% for real graphs and synthetic graphs, respectively. Our study reveals that a simple neural network has the ability to deal with NP-hard optimization problems by encoding hidden features of combinatorial structures. We will release our code and pre-trained models later.

References

  • [1] KONECT (the koblenz network collection). http://konect.uni-koblenz.de.
  • [2] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, M. Kudlur, J. Levenberg, R. Monga, S. Moore, D. G. Murray, B. Steiner, P. A. Tucker, V. Vasudevan, P. Warden, M. Wicke, Y. Yu, and X. Zheng. Tensorflow: A system for large-scale machine learning. In Proc. of OSDI’16, 2016.
  • [3] A. Ahmed, N. Shervashidze, S. M. Narayanamurthy, V. Josifovski, and A. J. Smola. Distributed large-scale natural graph factorization. In Proc. of WWW’13, 2013.
  • [4] Anonymous. Attention, learn to solve routing problems! In Submitted to International Conference on Learning Representations, 2019.
  • [5] D. Applegate, R. Bixby, V. Chvatal, and W. Cook. Concorde tsp solver, 2006.
  • [6] M. Behrisch, B. Bach, N. H. Riche, T. Schreck, and J. Fekete. Matrix reordering methods for table and network visualization. Comput. Graph. Forum, 35(3), 2016.
  • [7] M. Belkin and P. Niyogi. Laplacian eigenmaps and spectral techniques for embedding and clustering. In Proc. of NIPS’01, 2001.
  • [8] I. Bello, H. Pham, Q. V. Le, M. Norouzi, and S. Bengio. Neural combinatorial optimization with reinforcement learning. CoRR, abs/1611.09940, 2016.
  • [9] F. Bourse, M. Lelarge, and M. Vojnovic. Balanced graph edge partition. In Proc. of KDD’14, 2014.
  • [10] I. I. CPLEX. 12.6. CPLEX User's Manual, 2014.
  • [11] B. C. Csáji. Approximation with artificial neural networks. Faculty of Sciences, Eötvös Loránd University, Hungary, 24:48, 2001.
  • [12] P. Cui, X. Wang, J. Pei, and W. Zhu. A survey on network embedding. IEEE Trans. on Knowledge and Data Engineering, 2018.
  • [13] Y. Dong, N. V. Chawla, and A. Swami. metapath2vec: Scalable representation learning for heterogeneous networks. In Proc. of KDD’17, 2017.
  • [14] S. Feng, G. Cong, A. Khan, X. Li, Y. Liu, and Y. M. Chee. Inf2vec: Latent representation model for social influence embedding. In Proc. of ICDE’18, 2018.
  • [15] Y. Fu and Y. Ma. Graph embedding for pattern analysis. Springer Science & Business Media, 2012.
  • [16] J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. Powergraph: Distributed graph-parallel computation on natural graphs. In Proc. of OSDI’12, 2012.
  • [17] J. E. Gonzalez, R. S. Xin, A. Dave, D. Crankshaw, M. J. Franklin, and I. Stoica. Graphx: Graph processing in a distributed dataflow framework. In Proc. of OSDI’14, 2014.
  • [18] A. Gopalan, S. Mannor, and Y. Mansour. Thompson sampling for complex online problems. In Proc. of ICML’14, 2014.
  • [19] A. Grover and J. Leskovec. node2vec: Scalable feature learning for networks. In Proc. of KDD’16, 2016.
  • [20] W. L. Hamilton, R. Ying, and J. Leskovec. Representation learning on graphs: Methods and applications. IEEE Data Eng. Bull., 40(3), 2017.
  • [21] R. Hu, C. C. Aggarwal, S. Ma, and J. Huai. An embedding approach to anomaly detection. In Proc. of ICDE’16, 2016.
  • [22] U. Kang and C. Faloutsos. Beyond ’caveman communities’: Hubs and spokes for graph compression and mining. In Proc. of ICDM’11, 2011.
  • [23] E. B. Khalil, H. Dai, Y. Zhang, B. Dilkina, and L. Song. Learning combinatorial optimization algorithms over graphs. In Proc. of NIPS’17, 2017.
  • [24] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. In Proc. of ICLR’15, 2015.
  • [25] T. N. Kipf and M. Welling. Semi-supervised classification with graph convolutional networks. CoRR, abs/1609.02907, 2016.
  • [26] V. R. Konda and J. N. Tsitsiklis. Actor-critic algorithms. In Proc. of NIPS’99, 1999.
  • [27] J. Leskovec and A. Krevl. SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data, June 2014.
  • [28] Z. Li, Q. Chen, and V. Koltun. Combinatorial optimization with graph convolutional networks and guided tree search. In Proc. of NeurIPS’18, 2018.
  • [29] T. Man, H. Shen, S. Liu, X. Jin, and X. Cheng. Predict anchor links across social networks via an embedding approach. In Proc. of IJCAI’16, 2016.
  • [30] M. E. Newman, S. H. Strogatz, and D. J. Watts. Random graphs with arbitrary degree distributions and their applications. Physical review E, 64(2):026118, 2001.
  • [31] B. Perozzi, R. Al-Rfou, and S. Skiena. Deepwalk: online learning of social representations. In Proc. of KDD’14, 2014.
  • [32] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. P. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis. Mastering the game of go with deep neural networks and tree search. Nature, 529(7587), 2016.
  • [33] K. A. Smith. Neural networks for combinatorial optimization: A review of more than a decade of research. INFORMS Journal on Computing, 11(1), 1999.
  • [34] R. S. Sutton and A. G. Barto. Reinforcement learning - an introduction. Adaptive computation and machine learning. MIT Press, 1998.
  • [35] J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei. LINE: large-scale information network embedding. In Proc. of WWW’15, 2015.
  • [36] R. Trivedi, M. Farajtabar, P. Biswal, and H. Zha. Representation learning over dynamic graphs. CoRR, abs/1803.04051, 2018.
  • [37] A. Tsitsulin, D. Mottin, P. Karras, and E. Müller. VERSE: versatile graph embeddings from similarity measures. In Proc. of WWW’18, 2018.
  • [38] G. Tsoumakas and I. Katakis. Multi-label classification: An overview. IJDWM, 3(3), 2007.
  • [39] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin. Attention is all you need. In Proc. of NIPS’17, 2017.
  • [40] O. Vinyals, M. Fortunato, and N. Jaitly. Pointer networks. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems 28. 2015.
  • [41] X. Wang, P. Cui, J. Wang, J. Pei, W. Zhu, and S. Yang. Community preserving network embedding. In Proc. of AAAI’17, 2017.
  • [42] H. Wei, J. X. Yu, C. Lu, and X. Lin. Speedup graph processing by graph ordering. In Proc. of SIGMOD’16, 2016.
  • [43] R. J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8, 1992.
  • [44] M. Zaheer, S. Kottur, S. Ravanbakhsh, B. Póczos, R. R. Salakhutdinov, and A. J. Smola. Deep sets. In Proc. of NIPS’17, 2017.
  • [45] C. Zhang, F. Wei, Q. Liu, Z. G. Tang, and Z. Li. Graph edge partitioning via neighborhood heuristic. In Proc. of KDD’17, 2017.
  • [46] Z. Zhang, P. Cui, X. Wang, J. Pei, X. Yao, and W. Zhu. Arbitrary-order proximity preserved network embedding. In Proc. of KDD’18, 2018.