
Local Graph Embeddings Based on Neighbors Degree Frequency of Nodes

Vahid Shirbisheh [email protected]
Abstract.

We propose a local-to-global strategy for graph machine learning and network analysis by defining certain local features and vector representations of nodes and then using them to learn globally defined metrics and properties of the nodes by means of deep neural networks. By extending the notion of the degree of a node via Breadth-First Search, a general family of parametric centrality functions is defined which are able to reveal the importance of nodes. We introduce the neighbors degree frequency (NDF), a locally defined embedding of nodes of undirected graphs into euclidean spaces. This gives rise to a vectorized labeling of nodes which encodes the structure of local neighborhoods of nodes and can be used for graph isomorphism testing. We add flexibility to our construction so that it can handle dynamic graphs as well. Afterwards, Breadth-First Search is used to extend NDF vector representations into two different matrix representations of nodes which contain higher order information about the neighborhoods of nodes. Our matrix representations of nodes provide us with a new way of visualizing the shape of the neighborhood of a node. Furthermore, we use these matrix representations to obtain feature vectors, which are suitable for typical deep learning algorithms. To demonstrate that these node embeddings actually contain some information about the nodes, in a series of examples, we show that PageRank and closeness centrality can be learned by applying deep learning to these local features. Our constructions are flexible enough to handle evolving graphs. In fact, after training a model on a graph and then modifying the graph slightly, one can still use the model on the new graph. Moreover, our methods are inductive, meaning that one can train a model based on a known graph and then apply the trained model on another graph with a similar structure. Finally, we explain how to adapt our constructions for directed graphs. We propose several methods of feature engineering based on higher order NDF embeddings and test their performance in several specific machine learning algorithms on graphs.

Key words and phrases:
Graph Embeddings, Vector Representation of Nodes, Matrix Representation of Nodes, Local Features of Graphs, Centrality Measures, Closeness Centrality, PageRank, Parametric Centrality Measures, Dynamic Graphs, Inductive Graph Deep Learning Models, Graph Machine Learning, Graph Feature Engineering.

1. Introduction

A fundamental challenge in graph machine learning is that graphs in real world applications (aka networks) have a large and varying number of nodes (vertices) and edges (links or relationships). Therefore, machine learning models on evolving graphs require considerable time for feature extraction, training and inference. This limits their practical usage in real time applications concerning dynamic graphs. On the other hand, many graph algorithms and machine learning algorithms on graphs rely on operations involving the adjacency matrix or the Laplacian matrix of graphs, which are computationally very expensive on large graphs. The time complexity problem also arises in the computation of various globally defined centrality measures, such as closeness centrality. Therefore, for machine learning on large dynamic graphs, and generally for large network analytics, one wishes for two things: feature engineering based on local structures of graphs and inductive algorithms suitable for dealing with dynamic graphs.

In this work, we try to fulfill both of these wishes by proposing various locally defined embeddings of nodes into euclidean vector spaces and then designing inductive algorithms to learn globally defined features and properties of nodes of graphs. We only use the adjacency lists of graphs and never get involved with the complexity of matrix computations on adjacency matrices or Laplacian matrices of graphs. Among other benefits, this makes our methods more suitable for parallel and distributed computations, although we do not propose any parallel algorithm in this article.

The adjective “local” in our nomenclature has a very specific and concrete meaning. It means we use the Breadth-First Search (BFS) algorithm to traverse the neighborhood of a node and collect the local information. Therefore, implementations of our methods may include a natural number parameter to determine how many layers of the BFS should be performed. Another basic concept that we use is the degree of nodes. By combining only these two very basic notions, we are able to design several rule-based node representations that contain a great deal of information about the local structure of the neighborhoods of nodes.

These representations of nodes have several theoretical and machine learning applications. They can be used in graph isomorphism testing. They give rise to a local approximation of neighborhoods of nodes by simple trees. They provide a simple way of visualizing the neighborhoods of nodes as color maps. They can be considered as feature vectors to be fed into deep learning algorithms to learn global properties of nodes. Finally, we show how one can perform feature engineering on them to extract richer feature vectors suitable for specific downstream tasks. These applications are just the tip of the iceberg, and we expect our methods of graph representation will find more applications in other areas of machine learning on graphs and network analysis.

We begin with exploiting the BFS and the degrees of nodes to define a general family of parametric centrality functions on undirected graphs in Section 2. These centrality functions are defined as certain summations. We demonstrate the prospect of these parametric centrality functions on very small graphs appearing in network science. However, they are not our primary objective, but they serve us mainly as the simple prototypes and introductory constructions that guide us to more sophisticated methods and constructions.

To achieve our aforementioned goals in machine learning on huge dynamic graphs, we propose a decomposition of the degrees of nodes into vectors in Section 3. Our way of decomposing the degrees of nodes, which is the main source of our various vector representations, is the counting of occurrences of different degrees among the neighbors of nodes, hence the name neighbors degree frequency or shortly NDF. It is a vector representation of nodes, namely a mapping that maps a node to its NDF vector inside a euclidean vector space. In other words, it is a node embedding. The NDF vector of a node contains not only the information of its degree, but also the information of the degrees of its neighbors. These two layers of information naturally lay out a perspective of the neighborhood of the node, and they can be interpreted and utilized in several ways. For instance, we show that the NDF vector of a node amounts to an approximation of its neighborhood by means of a tree of height 2. We also interpret the NDF vectors as a labeling of nodes that needs to be preserved by isomorphisms of graphs. In this way, NDF vectors and their higher order generalizations find application in graph isomorphism testing. The details of these topics are presented in Subsections 3.2 and 5.2.

Whenever there exists a large number of different degrees in a graph, bookkeeping of the frequency of degrees among neighbors of nodes can increase the space complexity of algorithms. On the other hand, the set of occurring degrees in a dynamic graph can change with every node/edge added to or removed from the graph. Therefore we propose a form of aggregation in the degree frequency counting in Section 4. This gives rise to a diverse family of dynamic NDF vector representations which are capable of both reducing the dimension of the NDF vectors and handling changes in dynamic graphs easily. Dynamic NDF vectors depend on how we partition the set of possible degrees in a graph, so we discuss several practical and intuitively reasonable ways of implementing these partitions.

Two layers of information about each node do not seem enough for employing NDF vectors as feature vectors to be fed into deep learning algorithms. Therefore we extend the definition of NDF vector representations into two different higher order NDF matrix representations in Section 5. Every row of these matrix representations of nodes is an aggregation of NDF vectors of layers of the BFS (NDFC matrices) or the degree frequency of layers (CDF matrices). Besides using these matrix representations as feature vectors in deep learning algorithms, they have other applications. Like NDF vectors, higher order NDF matrices can be used in graph isomorphism testing. They perform better than the color refinement algorithm in several examples. One can also use these matrices to visualize the neighborhoods of nodes in the form of color maps.

In Section 6, we flatten NDFC matrices as one dimensional arrays and consider them as feature vectors associated to nodes of graphs. We feed these feature vectors into a simple feedforward neural network model and obtain a supervised deep learning algorithm to learn and predict PageRanks of nodes in undirected graphs. We test some other ways of feature engineering and a special form of convolutional neural networks too. We also show how our method for learning PageRank is applicable for predicting PageRank in dynamic and evolving graphs and even on unseen graphs.

We delve further into feature engineering using our higher order NDF matrix representations in Section 7. We consider the un-normalized version of CDF matrices and perform a specific type of aggregation on them. The result is a set of feature vectors suitable for learning the closeness centrality of nodes by means of feedforward neural networks. Our methods in this section are also inductive and easily applicable to dynamic graphs.

PageRanks of nodes are usually computed in directed graphs, specifically in the network of webpages and hyperlinks on the internet. Therefore one would like to extend the method developed in Section 6 for learning PageRanks of undirected graphs to directed graphs. We explain how NDF embeddings and their higher order generalizations can be defined on directed graphs in Section 8. We also propose a minor change in the definition of NDFC matrices, which is needed to take into account the adverse effect of certain directed edges on PageRank. Afterwards, we apply the directed version of NDFC matrices for learning the PageRanks of directed graphs.

Our methods and constructions depend only on very basic concepts in graph theory. On the other hand, they have shown several early successes in learning PageRank and closeness centrality by means of deep learning. Therefore we believe the theory of NDF embeddings and its higher order generalizations will find a wide range of applications in graph theory, network science, machine learning on graphs and related areas. Besides, it appears that our constructions are new in graph theory and computer science, and so there are many details that need a thorough investigation to mature the subject.

Related works. The author is new to network science, graph representation learning and machine learning on graphs and does not claim expertise in these research areas. That being said, we believe almost all constructions and methods developed in this article are new and have not appeared in the work of other researchers. However, near the completion of this work, we found out that a very special case of what we call dynamic NDF vectors had appeared before in the work of Ai et al. [1]. They introduced a centrality measure based on the distribution of various degrees among the neighbors of nodes. They use the iterated logarithm to associate a vector to each node. In our terminology, this vector is the dynamic NDF vector associated to the list of starting points \{1,2,3,5,17,\cdots,2^{n}+1,2^{2^{n}}+1\}, see [1] and Section 4 for details and a precise comparison. We also notice that what we call the NDF-equivalence among nodes of a graph coincides with the coloring of the nodes obtained after two rounds of color refinement, see Subsection 3.2 for details.

In the following sections, all graphs are assumed to be undirected and simple, with no loops or multiple edges, except in Section 8 where we deal with directed graphs. We denote graphs using the notation G=(V,E), meaning G is the graph, V is the set of its vertices, which we call nodes, and E is the set of edges of the graph.

2. Parametric Centrality Functions

The Breadth-First Search algorithm in graphs has a mathematical interpretation in terms of a metric structure on the set of nodes. Here, we review this point of view and use it to associate a certain sequence of numbers to each node of a graph. We show how these sequences of numbers may be used to construct parametric centrality measures applicable in network science.

In mathematics, a metric space is a set X equipped with a metric function d:X\times X\rightarrow[0,+\infty) such that for all x,y,z\in X, we have

  • (i)

    d(x,y)=d(y,x) (symmetry).

  • (ii)

    d(x,y)\geq 0, and the equality holds if and only if x=y (positivity).

  • (iii)

    d(x,z)\leq d(x,y)+d(y,z) (triangle inequality).

A metric space is usually denoted by the pair (X,d).

The natural metric space structure on the set of nodes of a graph G=(V,E) is defined by taking the distance between every two nodes as the metric function. We recall that the distance d(u,v) of two nodes u,v\in V is defined as the number of edges in any shortest path connecting u and v. In order for this definition to make sense, here, we assume that G is connected. With this assumption, it is easy to verify the axioms of a metric space for (V,d). By adding +\infty to the target space of the metric function and defining the distance of two nodes in two distinct components to be +\infty, one can extend this definition to all undirected graphs, possibly disconnected. We use this extended definition when the graph under consideration is not connected. Although d is a metric on the set V, we may interchangeably call it the natural metric on the graph G. One notes that this metric structure on V induces the discrete topology on V, meaning every subset of V is both open and closed in this topology. Therefore we drop the terms open and closed in our discussion.

The notion of a metric function extends the euclidean distance in \mathbb{R}^{n}. Thus, similar to intervals, disks and balls in \mathbb{R}, \mathbb{R}^{2} and \mathbb{R}^{3}, respectively, we can define corresponding notions in any metric space as well. However, since most graph drawings are in two dimensions, we borrow the terms “disk” and “circle” from \mathbb{R}^{2} to use in our terminology.

Given an undirected graph G=(V,E), for every node v\in V and integers k=0,1,2,\cdots, we define the disk of radius k centered at v by

D_{k}(v)=\{u\in V;\ d(v,u)\leq k\},

and similarly, the circle of radius k centered at v is defined by

C_{k}(v)=\{u\in V;\ d(v,u)=k\}.

Next, computing the cardinality of the above circles defines a sequence of functions S=\{s_{k}\}_{k\geq 0} on V, i.e.

s_{k}(v)=|C_{k}(v)|,\quad\forall v\in V.

Here S stands for the word “size” and let us call this sequence the sizes of circles around a node. Now, we are ready to define the general family of parametric functions on graphs.

Definition 2.1.

Let G=(V,E) be an undirected graph. To any given sequence \Lambda=\{\lambda_{k}\}_{k\geq 0} of real numbers, we associate a parametric function \Phi_{\Lambda}:V\rightarrow\mathbb{R} defined by

\Phi_{\Lambda}(v)=\sum_{k=0}^{\infty}\lambda_{k}s_{k}(v),\qquad\forall v\in V.

We call this function the parametric centrality function associated with \Lambda.

When \Lambda is known, we may drop it from our notation. We note the following:

Remarks 2.2.
  • (i)

    Since we are dealing with finite graphs, the number of non-zero functions in the sequence \{s_{k}\}_{k\geq 0} is bounded by the diameter of the graph (plus one). Therefore the above series is always finite and well defined.

  • (ii)

    By choosing \lambda_{1}=1 and \lambda_{k}=0 otherwise, we get the degree centrality (up to a scaling factor).

  • (iii)

    In practice, we let only the first few parameters in \Lambda be non-zero. Thus these parametric functions are indeed locally defined features on graph nodes.

  • (iv)

    By carefully choosing \Lambda according to the downstream applications, \Phi_{\Lambda} can be regarded as a parametric centrality measure of graph nodes, see Examples 2.3, 2.4 and 2.5 below.

  • (v)

    Even the sequence \{s_{k}\}_{k\geq 0} contains some information about the local structure of the graph around its nodes. For instance, comparing the value s_{2} against another local feature of the graph gives rise to a measure of the inter-connectivity of neighborhoods of nodes, see Proposition 3.4.

  • (vi)

    The elements of the sequence \{s_{k}\} increase in higher density neighborhoods. Alternatively, one can use the sequence \{\frac{1}{s_{k}}\} (or better \{\frac{1}{s_{k}^{\alpha}+\varepsilon}\} for some \alpha,\varepsilon>0) in lieu of \{s_{k}\} to define a family of parametric functions such that they decrease on nodes in higher density neighborhoods.

  • (vii)

    When nodes or edges are weighted, one can incorporate those weights in the definition of the sequence \{s_{k}\} to obtain more meaningful centrality measures on graph nodes.

In the following examples, we expand on the prospect of the parametric centrality functions in network science. Here, we consider a relatively simple way of defining the sequence \Lambda of parameters. For some 0<p<1, we set \lambda_{k}=p^{k-1} for k>0 and \lambda_{0}=0. For the sake of simplicity, let us call the parametric centrality function associated to this \Lambda the p-centrality function and denote it simply by \Phi_{p}. It is interesting to note that this very simple family of parametric centrality measures can mimic (to some extent) some of the well known centrality measures.
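To make the computation concrete, the following Python sketch (ours, not code from the paper) computes the circle sizes s_{k}(v) with a layer-by-layer BFS over an adjacency list and then evaluates the p-centrality function; the adjacency-dictionary input and the function names are our own choices.

def circle_sizes(adj, v, max_radius):
    """Return [s_0(v), ..., s_{max_radius}(v)] via a layer-by-layer BFS.
    adj: dict mapping each node to the list of its neighbors."""
    sizes = [1]                      # s_0(v) = |{v}| = 1
    visited = {v}
    frontier = [v]
    for _ in range(max_radius):
        next_frontier = []
        for u in frontier:
            for w in adj[u]:
                if w not in visited:
                    visited.add(w)
                    next_frontier.append(w)
        sizes.append(len(next_frontier))
        frontier = next_frontier
    return sizes

def p_centrality(adj, v, p=0.6, max_radius=7):
    """Phi_p(v) = sum_{k >= 1} p^(k-1) * s_k(v), i.e. lambda_0 = 0 and
    lambda_k = p^(k-1) for k > 0."""
    sizes = circle_sizes(adj, v, max_radius)
    return sum(p ** (k - 1) * sizes[k] for k in range(1, len(sizes)))

Replacing the weights p ** (k - 1) by an arbitrary finite list of parameters \lambda_{k} gives the general parametric centrality function \Phi_{\Lambda} of Definition 2.1.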

Example 2.3.

Let G=(V,E) be the network of neighboring countries in a continent, i.e. the nodes are countries and two countries are connected if they share a land border. It is a reasonable assumption that the financial benefits of mutual trade between two countries decrease by some factor 0<p<1 for every other country lying on a shortest path connecting them. Therefore the total economic benefits of continental trade for each country depend on both the distance between countries and the number of other countries in each circle around the country. In mathematical terms, for a given country X, the financial benefit of trade between X and countries at distance k from X is p^{k-1}s_{k}(X), because there are exactly k-1 countries in any shortest path connecting them. The total benefit of continental trade is the summation over all distances k, i.e. \sum_{k=1}^{\infty}p^{k-1}s_{k}(X), and this is exactly \Phi_{p}(X). Therefore the p-centrality function \Phi_{p} models the total financial benefits of continental trade for each country according to the graph structure of this network (and our assumptions!).

Figure 1. The graph in Examples 2.4, 3.3, 3.5, 3.8(iii) and 4.4.

We use the graph shown in Figure 1 on several occasions in this paper to explain various concepts and constructions. It was originally designed to explain the content of Section 3. However, it appears to be explanatory for the prospects of parametric centrality functions as well.

Example 2.4.

Let G=(V,E) be the undirected graph shown in Figure 1. We heuristically set p=0.6 and compute the p-centrality function \Phi_{p} on the nodes of G. The result is divided by the scaling factor 21.15 in order to be comparable to closeness centrality. In Table 1, we compare the sorted dictionary of the p-centrality function with the sorted dictionaries of closeness, eigenvector and betweenness centrality measures (computed using the Python package NetworkX [4]). In each case, we arrange the result in decreasing order to show that the order of nodes according to the p-centrality function is similar to the order with respect to closeness centrality. One observes that those nodes which are not in the same order happen to have very close p-centrality as well as closeness centrality values. The order of values of the p-centrality function shows less similarity with the order of values of eigenvector centrality. One can also spot more disarray while comparing the p-centrality function with betweenness centrality. In Table 2, we compare the values of the p-centrality function (again divided by the scaling factor 21.15) with the values of closeness centrality. It shows that the p-centrality function fluctuates around closeness centrality with an average error of about 3.938 percent.

The diameter of graph G, the maximum distance between two nodes, is 7. So for computing the p-centrality function, we only need to consider circles of radius at most 7. If we restrict our computations to circles with strictly smaller radiuses, we get less accurate results. For instance, the average percentage of differences between closeness and p-centrality (the last row of Table 2) will be 4.149 percent (respectively 4.411, 5.868, 13.401, 30.941 and 65.599 percent) if we use circles of radius at most 6 (respectively 5, 4, 3, 2 and 1). This suggests that the higher the maximum radius, the more precise the estimation of closeness centrality. It also shows that by restricting our computations to circles of radius at most 5, we lose only about 0.473 percentage points of accuracy, while decreasing the computational burden. Therefore, in larger graphs, the maximum radius of circles involved in the computation can be a notable hyperparameter to establish a balance between precision and time complexity of the computation. Finally, we must mention that in a low maximum radius computation of p-centrality, one should consider a different scaling factor to achieve a better estimation. For example, in the computation of the p-centrality function of G involving circles of radius at most 2 and with the scaling factor 15.7, the average percentage of differences drops to 14.631 percent (from 30.941 percent with the scaling factor 21.15).

Closeness p-Centrality Eigenvector Betweenness
Centrality Function Centrality Centrality
A 0.485 A 0.485 Y 0.444 A 0.613
H 0.444 H 0.448 A 0.437 B 0.346
Y 0.432 Y 0.437 H 0.436 H 0.261
B 0.410 B 0.418 I 0.347 D 0.242
I 0.364 I 0.378 K 0.236 Y 0.229
C 0.356 D 0.361 B 0.224 K 0.156
L 0.348 E 0.353 L 0.211 I 0.128
M 0.340 C 0.351 M 0.175 M 0.086
D, E, J 0.333 K 0.348 C 0.165 C 0.083
F 0.314 L 0.347 E 0.143 L 0.061
K 0.308 M 0.336 D 0.141 E 0.046
N 0.291 F 0.311 J 0.136 F, N 0.021
Q, R 0.254 J 0.310 N 0.128 J, P, Q, R 0.0
P 0.239 N 0.298 F 0.096
Q, R 0.235 P 0.073
P 0.227 Q, R 0.044
Table 1. The sorted lists of closeness centrality, p-centrality function, eigenvector centrality and betweenness centrality of the graph shown in Figure 1. Each value has been rounded to three decimal places.
Node Closeness p-Centrality Percentage of
Centrality Function Difference
A 0.485 0.485 0.014%
H 0.444 0.448 0.766%
B 0.410 0.418 1.898%
Y 0.432 0.437 0.941%
I 0.364 0.378 3.852%
C 0.356 0.351 1.149%
L 0.348 0.347 0.127%
E 0.333 0.353 5.886%
D 0.333 0.361 8.155%
J 0.333 0.310 7.041%
M 0.340 0.336 1.289%
F 0.314 0.311 0.759%
K 0.308 0.348 12.979%
N 0.291 0.298 2.594%
Q 0.254 0.235 7.381%
R 0.254 0.235 7.381%
P 0.239 0.227 4.739%
Average percentage of Differences 3.938%
Table 2. The value comparison of closeness centrality and the p-centrality function of nodes in the graph shown in Figure 1. Each value has been rounded to three decimal places. The last column is computed using the formula \frac{100|C-\Phi_{p}|}{C}, where C is the closeness centrality and \Phi_{p} is the p-centrality function. We rounded the result after applying the formula to raw values. This explains why node A has equal closeness centrality and p-centrality in the table, but the percentage of the difference is slightly bigger than zero.

In the next example, we again compare certain p-centrality functions to closeness centrality on three famous graphs in network science and we obtain results similar to those of the above example.

Examples 2.5.

We use the Python library NetworkX, [4], for generating the following graphs and for computing the closeness centrality of their nodes. To compute p-centrality functions for all three cases, we use circles of radius at most 5. We refer the reader to [4] for the corresponding references for each graph.

  • (i)

    For Zachary’s Karate Club graph, we choose p=0.6. Then by dividing the p-centrality function by the scaling factor 42, we get an estimation of the closeness centrality function with an average error of about 2 percent (see the sketch after this list).

  • (ii)

    For the graph of Florentine families, we set p=0.6 and let the scaling factor be 17.96. Then, the average difference between the p-centrality function and closeness centrality is about 3.416 percent.

  • (iii)

    For the graph of co-occurrence of characters in the novel Les Miserables, we set p=0.565 and let the scaling factor be 86.8. In this case, the average difference between the p-centrality function and closeness centrality is about 5.46 percent.
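The following snippet is a sketch (ours, not the paper's code) of the computation in part (i), using NetworkX with p=0.6, circles of radius at most 5 and the scaling factor 42 quoted above; the exact average error may differ slightly depending on rounding.

import networkx as nx

def p_centrality(G, v, p=0.6, max_radius=5):
    # distances to all nodes within max_radius hops (BFS under the hood)
    dist = nx.single_source_shortest_path_length(G, v, cutoff=max_radius)
    sizes = [0] * (max_radius + 1)
    for d in dist.values():
        sizes[d] += 1                      # s_d(v)
    return sum(p ** (k - 1) * sizes[k] for k in range(1, max_radius + 1))

G = nx.karate_club_graph()
closeness = nx.closeness_centrality(G)
errors = [abs(closeness[v] - p_centrality(G, v) / 42) / closeness[v] for v in G]
print(f"average relative error: {100 * sum(errors) / len(errors):.2f}%")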

A word of caution is in order regarding the above examples. Our primary goal is not to prove that the p-centrality function is similar to (or approximates) other centrality measures. These examples only suggest that p-centrality functions, and more generally parametric centrality functions, can equally be considered as certain measures to evaluate the importance of nodes in a graph. Furthermore, since we compute parametric centrality functions for each node based on its local neighborhood, the computation of these functions is more practical and computationally cheaper than that of other centrality measures, which are computed based on the information of the whole graph. Therefore, in certain applications in network science, they may offer cheaper and more flexible substitutes for well-known centrality measures.

In certain problems in network science, the factor p can be the probability of some interaction (transmission of information or diseases, etc.) between nodes in a network. In these cases, one might need to define \lambda_{k}=p^{k} in lieu of p^{k-1} for k>0, and the appropriate parametric centrality function would be \sum_{k=1}^{\infty}p^{k}s_{k}. The bottom line is that our definition has enough flexibility to be useful in many situations occurring in real world problems in network science.

Counting the number of nodes in each circle (layer of the BFS) around graph nodes reveals only a small portion of the local structure of graphs. To extract more refined local information of each node, we continue with constructing a vector representation of nodes in the next section.

3. The Neighbors Degree Frequency (NDF) Vector Representation of Nodes

In this section, we construct the simple version of our graph embedding and discuss its interpretations, usages and shortcomings. This leads us to more mature versions of our construction in upcoming sections.

Definition 3.1.

Let G=(V,E) be an undirected graph and let m be the maximum degree of nodes in G. For every node v\in V, we define a vector vndf(v)=(vndf(v)_{1},\cdots,vndf(v)_{m})\in\mathbb{R}^{m} by setting vndf(v)_{i} to be the number of neighbors of v of degree i for all i=1,\cdots,m. We call the function vndf:V\rightarrow\mathbb{R}^{m} the vanilla neighbors degree frequency embedding of nodes of G, and briefly vanilla NDF.

The main point of our definition is to count the frequency of various degrees among the neighbors of a node, hence the name Neighbors Degree Frequency and the acronym NDF. We basically decompose the degree of a node into a vector consisting of the degree frequencies of its neighbors, as one computes deg(v)=s_{1}(v)=\sum_{i=1}^{m}vndf(v)_{i}. Using vndf(v), one can also find an upper bound for the size of the circle of radius 2 around v, i.e. s_{2}(v), see Proposition 3.4.
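As a concrete illustration, the vanilla NDF embedding of Definition 3.1 can be computed directly from an adjacency list; the sketch below is ours and the names are our own choices.

def vanilla_ndf(adj):
    """adj: dict mapping each node to the list of its neighbors.
    Returns a dict mapping each node to its vanilla NDF vector of length m,
    where m is the maximum degree in the graph."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    m = max(degree.values())
    ndf = {}
    for v, nbrs in adj.items():
        vec = [0] * m
        for u in nbrs:
            vec[degree[u] - 1] += 1     # degree d is stored at index d - 1
        ndf[v] = vec
    return ndf

Note that sum(vec) recovers deg(v)=s_{1}(v), as observed above.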

The vanilla version of the neighbors degree frequency embedding has three obvious shortcomings: First, it is wasteful, because there might exist numbers between 1 and m-1 that never occur as a degree of a node in G. For example, consider a star-shaped graph with 10 nodes. It has 9 nodes with degree 1 that are adjacent to the central node of degree 9. The vanilla NDF requires 9 euclidean dimensions, but only the first and the last entries are used in the image of the mapping vndf. To remedy this issue, we propose a minimal version of the NDF embedding in Definition 3.2 below.

The second problem occurs when networks grow. Since the nodes with higher degrees usually attract most of the new links (for instance, because of the preferential attachment process), there is a good chance that the maximum degree m of graph nodes increases in growing graphs, and so a fixed m would no longer be sufficient for the above definition. This issue on small dynamic graphs has an easy solution: One can count all nodes of degree greater than or equal to m as degree m.

The third problem is the curse of dimensionality for graphs having nodes with very high degrees. For instance, graphs appearing in social networks may contain nodes with millions of links. Clearly, a vector representation in a euclidean space of several million dimensions is not a practical embedding! For the case of big dynamic graphs, we propose a dynamic version of NDF in Section 4. Our solution for big dynamic graphs is perfectly applicable to the above cases too. However, since the vanilla and minimal NDF vector representations have considerable conceptual and theoretical importance, we devote a few more paragraphs to them.

Definition 3.2.

Let G=(V,E) be an undirected graph. Let \{d_{1},\cdots,d_{m}\} be the ascending list of all occurring degrees of nodes in G, which we call the degrees list of the graph G. The minimal neighbors degree frequency embedding of nodes of G is defined as

mndf(v)=(mndf(v)_{1},\cdots,mndf(v)_{m})\in\mathbb{R}^{m},\qquad\forall v\in V,

where mndf(v)_{i} is the number of neighbors of v of degree d_{i}, for all i=1,\cdots,m.
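A Python sketch of this definition (again ours, with our own naming) only changes the indexing used in the previous sketch: degrees are mapped to coordinates through their position in the degrees list, so never occurring degrees take no space.

def minimal_ndf(adj):
    """adj: dict mapping each node to the list of its neighbors."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    degrees_list = sorted(set(degree.values()))         # ascending occurring degrees
    index = {d: i for i, d in enumerate(degrees_list)}  # degree -> coordinate
    mndf = {}
    for v, nbrs in adj.items():
        vec = [0] * len(degrees_list)
        for u in nbrs:
            vec[index[degree[u]]] += 1
        mndf[v] = vec
    return mndf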

The minimal NDF is essentially the same as vanilla NDF except that we have omitted never occurring degrees while recording the frequency of degrees of neighbors of each node. The important point we need to emphasize here is that we substitute the role played by the “maximum degree” of graph nodes with “degrees list” of the graph. This paves the way for more variations of NDF by upgrading the concept of “degrees list” into “intervals list”. This is our main pivot to define more practical versions of NDF for huge dynamic graphs in Section 4. In the following example, we examine some aspects of vanilla (and minimal) NDF:

Example 3.3.

Let G=(V,E) be the graph shown in Figure 1. The ascending list of degrees in G is \{1,2,3,4,5\}. Hence the minimal NDF and the vanilla NDF are the same and they map the nodes of G into \mathbb{R}^{5}. The NDF vector representations of the nodes of G are shown in Table 3. One immediately notices the following:

  • (i)

    Nodes B, E and I are all of degree 3, but they have very different NDF vector representations. This shows that NDF is a much more detailed criterion than degree centrality for classifying nodes.

  • (ii)

    Nodes Y and H have the same NDF vector representations, because their neighbors have the same degree frequency. But the circles of radius 2 around them do not have the same size. In fact, one easily checks that s_{2}(Y)=5 and s_{2}(H)=6. This hints that in order to define a more faithful representation of nodes, we cannot content ourselves with just NDF vectors. Using the sequence \{s_{k}\}_{k\geq 2} is one way to gain more in-depth knowledge of the neighborhood topology. We shall also propose two extensions of NDF vectors to circles of higher radius around a node to extract more information about the bigger neighborhoods of nodes in Section 5.

  • (iii)

    Nodes Q and R cannot be distinguished from each other by any of these measures. They essentially have the same statistics of degree frequency and circle sizes at all levels. In this case, it is clear that the permutation which only switches these two nodes is an automorphism of G (an isomorphism from G into itself). Therefore it is expected that they have the same NDF vector representation, see also Example 3.9.

Node(s) NDF Vector
A (1, 1, 1, 2, 0)
D (2, 0, 2, 0, 0)
K (1, 2, 1, 0, 0)
Y, H (0, 1, 1, 1, 1)
B (0, 0, 1, 1, 1)
E (0, 1, 1, 1, 0)
I (0, 0, 0, 3, 0)
C (0, 1, 0, 0, 1)
F (0, 1, 1, 0, 0)
L (0, 0, 0, 2, 0)
N, M (0, 1, 0, 1, 0)
J (0, 0, 0, 0, 1)
P, Q, R (0, 0, 0, 1, 0)
Table 3. The vanilla (and minimal) NDF embedding of nodes of the graph G, shown in Figure 1 and explained in Examples 3.3, 3.5 and 3.9.

3.1. Tree approximations of neighborhoods of nodes

One might interpret the vanilla and minimal NDF vector representations of nodes as an extension of the degree of nodes into an embedding of nodes into vector spaces. However, there is a more explanatory picture of NDF, which is the subject of this subsection. Here we explain how NDF gives rise to an approximation of local neighborhoods of nodes by means of trees. The following proposition, with minor modifications, is valid for the minimal NDF too. For the sake of simplicity, we state it for the vanilla NDF only. First, we need some notation.

Let G=(V,E) be an undirected graph. We recall that the ego subgraph of a graph G centered at v\in V of radius r, which we denote by E_{r}(v), is the induced graph whose set of nodes equals D_{r}(v), the disk of radius r around v, and whose set of edges consists of those edges of G whose both ends belong to D_{r}(v), see [4].

Proposition 3.4.

With the notation of Definition 3.1, we define a function \hat{s}:V\rightarrow\mathbb{N} by setting \hat{s}(v):=\sum_{i=1}^{m}(i-1)vndf(v)_{i}, i.e. \hat{s}(v) is defined as the dot product of vndf(v) and the vector (0,1,\cdots,m-1). Then, for every v\in V, we have the following:

  • (i)

    s_{2}(v)\leq\hat{s}(v), for all v\in V.

  • (ii)

    If E_{2}(v) is a tree, then s_{2}(v)=\hat{s}(v).

  • (iii)

    Let E_{2}^{\prime}(v) be the graph obtained from E_{2}(v) by omitting the edges whose both ends lie in C_{2}(v). Then s_{2}(v)=\hat{s}(v) if and only if E_{2}^{\prime}(v) is a tree.

Proof.

Given a node u\in C_{1}(v), let S(u) denote C_{1}(u)-\{v\}, the set of neighbors of u besides v. Then we have

C_{2}(v)\subseteq\bigcup_{u\in C_{1}(v)}S(u)=\bigcup_{i=1}^{m}\left(\bigcup_{u\in C_{1}(v),\,deg(u)=i}S(u)\right).

Taking the cardinality of both sides of this inclusion proves (i).

Assertion (ii) follows from (iii). To prove (iii), we only need to notice that the equality s_{2}(v)=\hat{s}(v) is logically equivalent to the presence of the following two conditions:

S(u)\subseteq C_{2}(v),\qquad\forall u\in C_{1}(v),
S(u)\cap S(w)=\emptyset,\qquad\forall u\neq w\in C_{1}(v).

And this equivalence is clear from the above inclusion. ∎
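As a small numerical check of the proposition, the following sketch (ours) compares s_{2}(v) and \hat{s}(v) on a toy graph consisting of a triangle with two pendant nodes; this toy graph is our own example, not the graph of Figure 1.

import networkx as nx

# toy graph: triangle a-b-c with pendant nodes d (attached to a) and e (attached to b)
G = nx.Graph([("a", "b"), ("b", "c"), ("a", "c"), ("a", "d"), ("b", "e")])

def s2(G, v):
    dist = nx.single_source_shortest_path_length(G, v, cutoff=2)
    return sum(1 for d in dist.values() if d == 2)

def s_hat(G, v):
    # dot product of vndf(v) with (0, 1, ..., m-1), i.e. the sum of deg(u) - 1
    # over the neighbors u of v
    return sum(G.degree(u) - 1 for u in G.neighbors(v))

for v in G:
    print(v, s2(G, v), s_hat(G, v))   # s2 <= s_hat for every node

For instance, for the node a one gets s_{2}(a)=1 and \hat{s}(a)=3, since E_{2}^{\prime}(a) is not a tree, while for the pendant node d the two values agree.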

Example 3.5.

In this example, we apply the content of the above proposition to some selected nodes of the graph G in Figure 1.

  • (i)

    One computes s_{2}(A)=6 and \hat{s}(A)=9. This is because E_{2}^{\prime}(A) is not a tree; first S(Y) and S(H) are not subsets of C_{2}(A), secondly S(Y)\cap S(H)=\{I\}.

  • (ii)

    We observe that E_{2}(P) is a tree and we have s_{2}(P)=\hat{s}(P)=3.

  • (iii)

    We have s_{2}(F)=\hat{s}(F)=3, but the ego subgraph E_{2}(F) is not a tree, because of the triangle \Delta BDE. However, the graph E_{2}^{\prime}(F) is a tree (after omitting the edge (B, D) from E_{2}(F)).

The insight gained in Proposition 3.4 leads us to the following concept of local approximation:

Remark 3.6.

For every node v in a graph G=(V,E), we define a rooted tree T_{v} of height 2 as follows: We call the root of the tree t_{v} and the set of its children is \{t_{u},u\in C_{1}(v)\}. In other words, the children of the root are indexed by the neighbors of v. Next, for every neighbor u of v, we associate deg(u)-1 children t_{u}^{1},\cdots,t_{u}^{deg(u)-1} to the node t_{u}. With this setting in place, the intuition behind NDF and Proposition 3.4 is that the vanilla (and minimal) NDF simplifies (or approximates) every local neighborhood of a node v in a graph G as the tree T_{v}. In fact, s_{2}(v)=\hat{s}(v) if and only if there is an isomorphism \varphi:T_{v}\rightarrow E_{2}^{\prime}(v) of graphs such that \varphi(t_{v})=v.

As an example for the last statement of the above remark, consider node N in the graph G in Figure 1. Then the mapping \varphi:T_{N}\rightarrow E_{2}^{\prime}(N) defined as \varphi(t_{N})=N, \varphi(t_{M})=M, \varphi(t_{K})=K, \varphi(t_{K}^{1})=L, \varphi(t_{K}^{2})=I, \varphi(t_{K}^{3})=P and \varphi(t_{M}^{1})=H is an isomorphism of graphs, so s_{2}(N)=\hat{s}(N).

This approximation is similar to the role played by tangent spaces in the geometry of smooth manifolds, because they locally approximate the neighborhood of points on a manifold by means of flat vector spaces. For example, consider the tangent line at a point of a smooth curve lying in \mathbb{R}^{2}. At a small neighborhood of that point, the tangent line is a locally good approximation of the curve. Here by local approximation, we mean that there is an invertible function f between an open neighborhood of the point on the curve and an open interval around zero in the tangent line such that both f and f^{-1} are smooth mappings.

One notes that this analogy is not perfect. For instance, tangent spaces vary “smoothly” when points vary smoothly in the manifold. But there is no notion of smoothness in graphs (at least nothing that I am aware of), because graph nodes are essentially discrete creatures. Another important gap in this analogy is the notion of “dimension”: In smooth (or even topological) manifolds, every point in a connected manifold enjoys a tangent space with the same dimension, which is the dimension of the manifold. Again, we do not have a notion of dimension in graphs or trees yet (to my knowledge). However this analogy helps us to think about higher order approximations of the neighborhood of a node in the graph. In Section 5, we expand the idea of counting the frequency of degrees to higher circles around each node.

3.2. NDF-equivalence as a requirement for graph isomorphisms

Another application of NDF embeddings is in the graph isomorphism testing problem. It starts with the observation that if \theta:G\rightarrow H is an isomorphism between two graphs G and H, then for every node v in G, we must have vndf(v)=vndf(\theta(v)). Therefore, vanilla NDF can serve as an easy test for two graphs not being isomorphic. In other words, a necessary condition for two undirected graphs to be isomorphic is that their vanilla NDF embeddings should have the same image with the same multiplicity for each element in the image. When studying the group of automorphisms of a graph, vanilla NDF can be used similarly to filter the set of potential permutations of nodes eligible to be an automorphism of the graph, see Part (ii) of Example 3.8 and Example 3.9. In order to formulate these ideas in mathematical terms, we need to define two levels of equivalence relations; between nodes and between graphs.

Definition 3.7.
  • (i)

    Two nodes in an undirected graph G=(V,E) are called NDF-equivalent if their vanilla NDF vectors are the same. The partition of V induced by this equivalence relation is called the NDF-partition.

  • (ii)

    Let G_{1}=(V_{1},E_{1}) and G_{2}=(V_{2},E_{2}) be two undirected graphs. The graphs G_{1} and G_{2} are called NDF-equivalent if the following conditions hold:

    • (a)

      The vanilla NDF embedding on them has the same target space, equivalently the maximum degrees of G_{1} and G_{2} are equal.

    • (b)

      The images of the vanilla NDF embedding on G_{1} and G_{2} are equal, that is vndf(G_{1})=vndf(G_{2}).

    • (c)

      For every vector X\in vndf(G_{1})=vndf(G_{2}), the inverse images of X in V_{1} and V_{2} have the same number of nodes. That is

      |\{v\in V_{1};vndf(v)=X\}|=|\{v\in V_{2};vndf(v)=X\}|.

These equivalence relations are defined with respect to the vanilla NDF. Similar definitions are possible for other types of NDF embeddings, including higher order NDF embeddings. When it is necessary to distinguish between them, we will add an adjective (for example “vanilla” or “minimal”, etc.) to specify what kind of NDF was used to define the equivalence. It is straightforward to see that two isomorphic graphs are NDF-equivalent. But the converse is not true, even for trees, see Part (i) of the following example:

Example 3.8.

Let G_{1} and G_{2} be the following graphs:

G_{1}: the path 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 together with an extra node 10 attached to node 5.

G_{2}: the path 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 together with an extra node 10 attached to node 4.
  • (i)

    Clearly, G_{1} and G_{2} are not isomorphic, but they are NDF-equivalent, see Table 4.

  • (ii)

    If an automorphism of G_{1} (resp. G_{2}) does not change nodes 1 and 9, it must be the trivial automorphism. On the other hand, since nodes 1 and 9 are NDF-equivalent in graph G_{1} (resp. G_{2}), every non-trivial automorphism of G_{1} (resp. G_{2}) should switch nodes 1 and 9. Therefore, with a little analysis, one observes that there is only one non-trivial automorphism of G_{1}, given by the following permutation:

    \left(\begin{array}{cccccccccc}1&2&3&4&5&6&7&8&9&10\\ 9&8&7&6&5&4&3&2&1&10\end{array}\right).

    However, since node 4 has a unique NDF vector in G_{2}, it is mapped to itself by all automorphisms of G_{2}. Therefore, if an automorphism were to switch 1 and 9, it would have to map the path (1,2,3,4) isomorphically onto the path (9,8,7,6,5,4), which is impossible. Thus, there is no non-trivial automorphism of graph G_{2}.

Nodes in G_{1} Nodes in G_{2} The Associated NDF Vector
1, 9 1, 9 (0, 1, 0)
2, 8 2, 8 (1, 1, 0)
3, 7 6, 7 (0, 2, 0)
4, 6 3, 5 (0, 1, 1)
5 4 (1, 2, 0)
10 10 (0, 0, 1)
Table 4. Comparing the NDF embeddings of graphs G_{1} and G_{2} of Example 3.8
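The NDF-equivalence test of Definition 3.7(ii) amounts to comparing the multisets of vanilla NDF vectors of the two graphs. The following sketch (ours, using NetworkX) encodes G_{1} and G_{2} as described above and confirms part (i).

from collections import Counter
import networkx as nx

def vndf(G, v, m):
    vec = [0] * m
    for u in G.neighbors(v):
        vec[G.degree(u) - 1] += 1
    return tuple(vec)

def ndf_equivalent(G1, G2):
    m1, m2 = max(dict(G1.degree()).values()), max(dict(G2.degree()).values())
    if m1 != m2:                      # condition (a): same target space
        return False
    # conditions (b) and (c): same image with the same multiplicities
    return Counter(vndf(G1, v, m1) for v in G1) == Counter(vndf(G2, v, m2) for v in G2)

path = [(i, i + 1) for i in range(1, 9)]   # the path 1 - 2 - ... - 9
G1 = nx.Graph(path + [(5, 10)])            # node 10 attached to node 5
G2 = nx.Graph(path + [(4, 10)])            # node 10 attached to node 4
print(ndf_equivalent(G1, G2))              # True, although G1 and G2 are not isomorphic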
Example 3.9.

In this example, we apply the vanilla NDF of the graph G shown in Figure 1 to prove that the only non-trivial automorphism of G is the permutation switching only nodes Q and R. According to the vanilla NDF of the nodes of G listed in Table 3, we have three NDF-equivalence classes with more than one element. Regarding our discussion at the beginning of this subsection, these are the only chances for having a non-trivial automorphism of G. Again using NDF vectors, we try to eliminate most of these chances. Nodes Y and H both have only one neighbor of degree 2, namely L and M, respectively. If an automorphism switches Y and H, then it must switch L and M too, which is impossible because vndf(L)\neq vndf(M). Similar arguments show that no automorphism can switch the pairs (M, N), (P, Q) and (P, R). Therefore, the only possibility for an automorphism of graph G is the permutation on the set of nodes which switches nodes Q and R. And one easily checks that this permutation is actually an automorphism of G.

Color refinement, or the 1-dimensional Weisfeiler-Leman algorithm, is an iterative algorithm to partition the set of nodes of undirected graphs and is a popular algorithm for isomorphism testing, see [3] for an introduction to the subject. To show the relationship between NDF-equivalence and color refinement, we only need to recall its definition. At the start of this algorithm, a single color is associated to all nodes. In each round (iteration) of color refinement, two nodes u and v with the same color are assigned different colors only if there is a color, say c, such that the number of neighbors of u with color c is different from the number of neighbors of v with color c. So, after the first round of color refinement, the set of nodes of the graph is partitioned according to their degrees, that is, u and v have the same color if and only if they have the same degree. After the second round, two nodes u and v have the same color if and only if for every occurring degree, say k, the number of neighbors of u of degree k is equal to the number of neighbors of v of degree k. This is exactly the NDF-partition.
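To make the comparison concrete, the following sketch (ours) implements the color refinement iteration just described; running it for two rounds on an adjacency list yields the NDF-partition.

def color_refinement(adj, rounds):
    """adj: dict mapping each node to the list of its neighbors.
    Returns a dict mapping each node to its color (an integer) after the
    given number of refinement rounds."""
    color = {v: 0 for v in adj}                 # round 0: a single color
    for _ in range(rounds):
        # a node's signature is its old color plus the multiset of its
        # neighbors' old colors
        signature = {
            v: (color[v], tuple(sorted(color[u] for u in adj[v]))) for v in adj
        }
        # relabel the signatures with small integers to get the new coloring
        palette = {sig: i for i, sig in enumerate(sorted(set(signature.values())))}
        color = {v: palette[signature[v]] for v in adj}
    return color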

We shall come back to this subject in Subsection 5.2, where we show how RNDFC-equivalence (a version of higher order NDF-equivalence) performs better than color refinement in certain examples.

4. NDF Embeddings for Dynamic Graphs

To address the problems arising from the abundance of possible degrees in big graphs and the ever-changing nature of dynamic graphs appearing in real world applications, our main approach is to replace the degrees list appearing in the definition of the minimal NDF, see Definition 3.2, with a list of intervals and then compute the NDF vector representation of nodes with respect to the intervals in this list. To make this work, we have to impose three necessary conditions:

  • (1)

    The union of intervals must cover all natural numbers.

  • (2)

    The intervals must be mutually disjoint, that is the intersection of every two different intervals must be empty.

  • (3)

    The list of starting numbers of intervals must be in ascending order.

The first two conditions mean that the set of intervals must be a partition of the set of natural numbers. The third condition determines the order of components of the NDF vector. These conditions imply that the first interval must start with 1 and the last interval has to be infinite (with no end). Furthermore, since we intend to use the NDF vectors as features in machine learning algorithms, we impose a rather non-mandatory condition too:

  • (4)

    The last interval must include the maximum degree of nodes in the graph.

Condition (4) guarantees that the last entry of the NDF vector of (at least) one node is non-zero. Otherwise, the last entry would be redundant. One may wish to impose a stronger version of Condition (4) to ensure that every interval in the intervals list contains some occurring degree, but this can be a requirement that is very hard to satisfy in general. We formalize requirements (1)-(4) in the following definition:

Definition 4.1.

Let G=(V,E) be an undirected graph.

  • (i)

    An intervals list \mathcal{I} for the graph G can be realized by a finite sequence \{n_{1},n_{2},\cdots,n_{m}\} of natural numbers such that 1=n_{1}<n_{2}<\cdots<n_{m}\leq d, where d is the maximum degree of nodes in G. We call such a sequence the list of starting points of the intervals list. Then the intervals list \mathcal{I} is the ordered set \{I_{1},I_{2},\cdots,I_{m}\}, where I_{i}=\{n_{i},n_{i}+1,\cdots,n_{i+1}-1\} for i=1,\cdots,m-1 and I_{m}=\{n_{m},n_{m}+1,\cdots\}.

  • (ii)

    To any given intervals list \mathcal{I} for graph G as above, we associate a vector representation dndf:V\rightarrow\mathbb{R}^{m}, called the dynamic neighbors degree frequency embedding of nodes of G, defined as dndf(v)=(dndf(v)_{1},\cdots,dndf(v)_{m}), where for i=1,\cdots,m, dndf(v)_{i} is the number of neighbors of node v whose degrees lie in I_{i}.

What we are referring to as an “interval” here can also be called a “range”. However, we prefer the word interval because each interval is indeed a usual interval of real numbers restricted to the natural numbers. Although the definition of a dynamic NDF embedding depends on the intervals list \mathcal{I} employed in its construction, for the sake of simplicity we omit the intervals list \mathcal{I} from our notation. When it is important, one may want to emphasize this dependency by denoting the dynamic NDF by dndf_{\mathcal{I}}.
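To make the construction concrete, the following Python sketch (an illustration only; the graph is assumed to be a NetworkX graph and the intervals list is passed as its ascending list of starting points) computes the dynamic NDF vector of a single node.

from bisect import bisect_right

def dndf(G, v, starting_points):
    # starting_points is ascending, e.g. [1, 3] encodes the intervals {1,2}, {3,4,...}.
    vec = [0] * len(starting_points)
    for u in G.neighbors(v):
        # Index of the interval containing the degree of the neighbor u.
        i = bisect_right(starting_points, G.degree(u)) - 1
        vec[i] += 1
    return vec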

It is clear from the above definition that the dimension of the vectors obtained from a dynamic NDF equals the number of intervals in the chosen intervals list. In the following example, we explain that our definition of dynamic NDFs has enough flexibility to cover dynamic versions of the vanilla and minimal NDF as well as degree centrality.

Example 4.2.

In this example, let G be an undirected graph.

  • (i)

    Let d denote the maximum degree of nodes in G. Then the intervals list \mathcal{I} constructed from the list of starting points \{1,2,\cdots,d\} gives rise to the dynamic version of the vanilla NDF. For future reference, we call \mathcal{I} the vanilla intervals list.

  • (ii)

    As in Definition 3.2, let \{d_{1},\cdots,d_{m}\} be the ascending list of all occurring degrees of nodes in G. If d_{1}\neq 1, we change it to 1. Then it is a list of starting points and defines an intervals list, which we call the minimal intervals list. The dynamic NDF associated to this intervals list is the dynamic version of the minimal NDF.

  • (iii)

    The singleton \{1\} can be considered as a list of starting points. The intervals list associated to it consists of only one interval, which is the whole set of natural numbers, and the dynamic NDF defined on top of this intervals list simply assigns its degree to each node of the graph. Up to some normalizing factor, this one dimensional vector representation of nodes is the same as the degree centrality on G.

The above example demonstrates the relationship between dynamic NDF embeddings and the previously defined NDF embeddings. In the following definition, we present two general methods to construct intervals lists. Based on the intuition gained from network science, we assume that most nodes of a graph have low degrees and substantially fewer nodes have high degrees. This suggests that we devote more intervals to lower degrees and fewer intervals to higher degrees. On the other hand, nodes with higher degrees appear as the neighbors of a much greater number of nodes, so it is equally important to differentiate between high degree nodes too. We also note that the distances between higher degrees are much bigger than the distances between lower degrees, which suggests smaller intervals around lower degrees and bigger intervals around higher degrees. Based on these observations, we propose an “adjustable design” for partitioning the set of natural numbers into intervals lists. Therefore, we include some parameters in our construction to create more flexibility and control over the design of intervals lists.

Definition 4.3.

Let G be an undirected graph and let d be the maximum degree of nodes in G. We set a maximum length m for intervals.

  • (i)

    As the first approach, we try to partition the set of natural numbers into intervals of fixed length m (besides the last interval, which has to be infinite). Due to the possible remainder, we let the first interval be the potentially smaller one. Therefore, our algorithm for choosing a “uniform” list of starting points of the intervals list is as follows:

    Input: Integers d ≥ m ≥ 1 and d > 1
    // d: the maximum degree of nodes in G
    // m: the maximum length of intervals
    Output: A list of starting points
    next_point ← d
    // Assign an empty list to starting_points
    starting_points ← [ ]
    while next_point > 0 do
           starting_points.append(next_point)
           next_point ← next_point − m
           if next_point ≤ 1 then
                 starting_points.append(1)
                 break
           end if
    end while
    // Reverse the order of starting_points
    starting_points.reverse()
    return starting_points
    Algorithm 1 Generating a uniform list of starting points
  • (ii)

    As the second approach, we try to generate an intervals list such that the lengths of the intervals are increasing. In Algorithm 2, we set a starting length s and a maximum length m for intervals. Then, in each step, we increase the length of intervals by the ratio r until we reach the maximum length.

    Input: Integers d ≥ m ≥ s ≥ 1 and a real number r > 1
    // d: the maximum degree of nodes in G
    // m: the maximum length of intervals
    // s: the starting length of intervals
    // r: the growth ratio of the lengths of intervals
    Output: A list of starting points
    next_length ← s
    starting_points ← [1, s+1]
    while starting_points[-1] < d do
           next_length ← min(next_length * r, m)
           next_point ← int(starting_points[-1] + next_length)
           starting_points.append(next_point)
    end while
    if starting_points[-1] > d then
           starting_points.pop(-1)
    end if
    return starting_points
    Algorithm 2 Generating an increasing list of starting points

One notes that if we set s=m (the starting length being equal to the maximum length) in Algorithm 2, we do not necessarily get the same intervals list as the one obtained from Algorithm 1 with the same maximum length, even though the lengths of intervals do not increase in either case. In this scenario, the intervals list obtained from Algorithm 2 consists of intervals of fixed length m except for the last interval, which contains the maximum degree of nodes and has to be infinite. However, the intervals list obtained from Algorithm 1 consists of intervals of length m, the last interval, which starts at the maximum degree of nodes, and possibly a first interval of length less than m.
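For convenience, the following Python functions are a direct transcription of Algorithms 1 and 2 (a sketch; the parameter names d, m, s and r follow the pseudocode above).

def uniform_starting_points(d, m):
    # Algorithm 1: intervals of fixed length m; d is the maximum degree of nodes.
    starting_points = []
    next_point = d
    while next_point > 0:
        starting_points.append(next_point)
        next_point -= m
        if next_point <= 1:
            starting_points.append(1)
            break
    starting_points.reverse()
    return starting_points

def increasing_starting_points(d, m, s, r):
    # Algorithm 2: interval lengths grow by the ratio r from s up to the maximum length m.
    next_length = s
    starting_points = [1, s + 1]
    while starting_points[-1] < d:
        next_length = min(next_length * r, m)
        starting_points.append(int(starting_points[-1] + next_length))
    if starting_points[-1] > d:
        starting_points.pop(-1)
    return starting_points

# For example, assuming the maximum degree in Figure 1 is 5 (as Example 4.4 indicates),
# increasing_starting_points(5, 2, 1, 2) returns [1, 2, 4], the list of starting points
# of the intervals list I_2 of Example 4.4.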

Example 4.4.

Let G be the graph shown in Figure 1. We define two intervals lists as follows:

\mathcal{I}_{1} = \{\{1,2\},\{3,4,\cdots\}\}
\mathcal{I}_{2} = \{\{1\},\{2,3\},\{4,5,\cdots\}\}

The intervals list \mathcal{I}_{1} can be generated by Algorithm 2 by setting s=2, m any integer greater than or equal to 3, and r any real number greater than or equal to 1.5. Similarly, \mathcal{I}_{2} can be generated by Algorithm 2 by setting s=1, m=2 and r any real number greater than or equal to 2. The dynamic NDF embeddings of nodes of G with respect to \mathcal{I}_{1} and \mathcal{I}_{2} are shown in Table 5. We notice that by reducing the dimension of the NDF embedding from 5 (for vanilla NDF) to 3 (for the dynamic NDF w.r.t. \mathcal{I}_{2}) or even to 2 (for the dynamic NDF w.r.t. \mathcal{I}_{1}), we do not lose a whole lot of precision. One can expect this phenomenon to occur more often and more notably in graphs whose maximum degrees are very large.

Node(s) Dynamic NDF w.r.t. \mathcal{I}_{1} Node(s) Dynamic NDF w.r.t. \mathcal{I}_{2}
A (2, 3) A (1, 2, 2)
D (2, 2) D (2, 2, 0)
K (3, 1) K (1, 3, 0)
Y, H (1, 3) Y, H (0, 2, 2)
B, I (0, 3) B (0, 1, 2)
I (0, 0, 3)
E (1, 2) E (0, 2, 1)
C, F, M, N (1, 1) C, M, N (0, 1, 1)
F (0, 2, 0)
L (0, 2) L (0, 0, 2)
J, P, Q, R (0, 1) J, P, Q, R (0, 0, 1)
Table 5. Two dynamic NDF embeddings of the nodes of graph G shown in Figure 1, with respect to the intervals lists \mathcal{I}_{1} and \mathcal{I}_{2} explained in Example 4.4.

The above algorithms demonstrate only two possible ways of generating starting points for intervals lists, and consequently two ways of defining reasonable dynamic NDFs on graphs.

Remark 4.5.

The only feature of the graph used in these methods is the maximum degree of nodes in the graph; the rest of the parameters are set by the user. The best way of defining an intervals list for a specific graph depends on the distribution of the degrees of nodes among the natural numbers. Therefore, for a given graph with a particular structure, in order to define the most relevant dynamic NDF embedding, one has to incorporate more detailed information about the set of occurring degrees. There are also situations where we need to define a common intervals list for two or more graphs. In these situations, we choose an auxiliary point less than the maximum degree (call it the “last point”). Then, we use one of the above algorithms to choose a list of starting points up to the last point. To cover the rest of the degrees, we choose a complementary list of points according to the task at hand and the set of degrees of nodes. The concatenation of these two lists of starting points yields the final list of starting points. For example, assume we have a graph whose set of occurring degrees up to 100 can be perfectly covered by Algorithm 2 and which has only 5 nodes whose degrees are greater than 100, say \{108,111,121,148,293\}. We let the last point be 100 and choose the primary list SP_{1} of starting points using Algorithm 2. We heuristically set the complementary list SP_{2} of points to be \{103,116,136,201\}. Then the concatenation SP_{1}+SP_{2} of these two lists is our final list of starting points. While choosing the complementary list SP_{2}, we did not insist on any particular growth rate for the lengths of intervals. Instead, we tried to impose two things: first, to avoid any empty interval, that is, an interval which contains none of the degrees in the set \{108,111,121,148,293\}; secondly, intervals should contain some margin around the degrees they contain (e.g. the first interval contains some numbers less than 108 and some numbers greater than 111). We use this method several times in this article to build mixed lists of starting points, for instance see Examples 6.4, 8.2 and 8.3.

We note that by choosing appropriate values for the parameters of the above algorithms, we can control the dimension of the vector representation obtained from dynamic NDF embeddings. For instance, two and three dimensional dynamic NDF embeddings of the graph shown in Figure 1 are discussed in Example 4.4. To some extent, this resolves the “curse of dimensionality” explained after Definition 3.1. But aggressively reducing the dimension of the embedding by this method imposes a coarse aggregation among nodes, because all neighbors of a node whose degrees lie in the same interval are considered to be of the same type. An alternative way of reducing the dimension of the embedding would be to apply a deep neural network algorithm equipped with an appropriate objective function to the vectors obtained from dynamic NDFs. Then the number of neurons in the last layer (before applying the objective function) determines the dimension of the final embedding. Since the objective function should be designed according to the downstream tasks, we postpone the detailed analysis of these types of compositions of neural networks and NDF embeddings to future works.

Dynamic NDF embeddings employ some sort of aggregation while counting the degree frequency of neighbors. Therefore, some precision has been lost and we cannot repeat the content of Subsection 3.1 for dynamic NDFs. However, it is still true that the degree of a node equals the sum of all entries of its dynamic NDF vector, i.e. deg(v)=s_{1}(v)=\sum_{i=1}^{m}dndf(v)_{i} for all nodes v in the graph.

The content of Subsection 3.2 can be repeated for dynamic embeddings as well, although it is obvious that dynamic NDF embeddings have less expressive power than vanilla and minimal NDF embeddings, see Example 4.4. To some extent, the expressive power of dynamic NDF embeddings can be increased by considering higher order NDF embeddings, which is the subject of the next section.

5. Higher Order NDF Embeddings as Matrix Representations of Nodes

So far for NDF embeddings, we have been considering the degree frequency of the neighbors of a node and deriving a vector representation from it. For higher order generalizations of NDF embeddings, we have two natural choices for deriving vectors from the circles around the node:

In the first approach, we consider the previously defined NDF vector as the radius zero NDF. That is, for a node v, we computed the neighbors degree frequency of the elements of C_{0}(v)=\{v\}, the circle of radius zero centered at v. So naturally, for n>0, the radius n NDF vector would be an aggregation of the neighbors degree frequencies of the elements of C_{n}(v). The aggregation method we adopt here is simply the mean of the NDF vectors of the elements of C_{n}(v), but one can try other aggregation methods, depending on downstream tasks. A more advanced option would be to learn the aggregation method, perhaps similar to what has been done in [5], but at the level of each circle. Computing the mean amounts to dividing the sum of the NDF vectors of the elements of C_{n}(v) by the size of C_{n}(v), i.e. s_{n}(v). Dividing by s_{n}(v) is necessary, because it makes the set of NDF vectors associated to different radii more homogeneous and comparable to each other. Whenever C_{n}(v) is empty for some n, the radius n NDF vector is simply defined to be the zero vector.

In the second approach, we consider the previously defined NDF vector as the radius one degree frequency (DF). The justification is that the NDF vector of a node v is computed by counting the “degree frequency” of the elements of C_{1}(v). Therefore, for an arbitrary natural number n, the order n generalization of the NDF vector could be the degree frequency of the elements of C_{n}(v). This method produces a sequence of vectors with sharply varying magnitudes, which are thus not comparable to each other; see for instance Example 5.2(iii) below. In order to normalize the degree frequency vectors at different radii, we divide each nonzero DF vector of radius n by s_{n}(v). Again, when C_{n}(v) is empty (i.e. s_{n}(v)=0), the DF vector of radius n is defined to be the zero vector. One checks that s_{n}(v) is exactly the \ell^{1}-norm of the DF vector of radius n centered at node v, so this method of normalization is called the \ell^{1}-normalization. It converts every nonzero DF vector (including the previously defined NDF vector) into a probability vector. In this approach, one should also note that we use the phrase “degree frequency” (DF) in lieu of “neighbors degree frequency” (NDF), because the degree frequencies of the elements of the circles themselves are computed, not those of their neighbors.

Finally, these methods of extending NDF vector representations can be useful in certain topics even without dividing by the size of circles (normalization), so we record both versions for later reference. The formal definitions of these generalizations of NDF embeddings are as follows:

Definition 5.1.

Let G=(V,E) be an undirected graph and let \mathcal{I}=\{I_{1},I_{2},\cdots,I_{m}\} be an intervals list for G.

  • (i)

    The order r matrix of neighbors degree frequency of circles (NDFC matrix) with respect to the intervals list \mathcal{I} associates an (r+1)\times m matrix ndfc_{r}(v) to each node v\in V such that the k-th row of ndfc_{r}(v) is computed as follows:

    \frac{1}{s_{k-1}(v)}\sum_{u\in C_{k-1}(v)}dndf(u),\qquad\text{when }s_{k-1}(v)\neq 0,

    and the zero vector of dimension m when s_{k-1}(v)=0.

    The order r RNDFC matrix, the raw version of the NDFC matrix, of a node v is denoted by rndfc_{r}(v) and is defined similarly except that the k-th row of rndfc_{r}(v) is not divided by s_{k-1}(v).

  • (ii)

    The order r matrix of circles degree frequency (CDF matrix) with respect to the intervals list \mathcal{I} assigns an r\times m matrix cdf_{r}(v) to each node v\in V such that when s_{i}(v)\neq 0, the (i,j)-th entry of cdf_{r}(v) is the number of nodes in C_{i}(v) whose degrees belong to I_{j} divided by s_{i}(v), that is

    cdf_{r}(v)_{ij}=\frac{|\{u\in C_{i}(v);\ deg(u)\in I_{j}\}|}{s_{i}(v)},\qquad\text{when }s_{i}(v)\neq 0.

    When s_{i}(v)=0, the entire i-th row of cdf_{r}(v) is defined to be the zero vector of dimension m.

    The order r RCDF matrix, the raw version of the CDF matrix, of a node v is denoted by rcdf_{r}(v) and is defined similarly except that the (i,j)-th entry of rcdf_{r}(v) is not divided by s_{i}(v).

One notes that the first row of the NDFC (respectively RNDFC and RCDF) matrix is the same as the dynamic NDF vector. However, the first row of the CDF matrix is the \ell^{1}-normalized version of the dynamic NDF vector, i.e. dndf(v)/\|dndf(v)\|_{1}, where \|\cdot\|_{1} is the \ell^{1}-norm. In the following examples, we highlight some of the features of these matrix representations of nodes of a graph:
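As an illustration of how these matrices can be computed in practice, the following Python sketch (NetworkX and NumPy assumed, and building on the dndf helper sketched after Definition 4.1) obtains the circles around a node by Breadth-First Search and assembles the NDFC and RCDF matrices.

from bisect import bisect_right
import networkx as nx
import numpy as np

def circles(G, v, r):
    # C_0(v), ..., C_r(v) computed by Breadth-First Search.
    lengths = nx.single_source_shortest_path_length(G, v, cutoff=r)
    levels = [[] for _ in range(r + 1)]
    for u, dist in lengths.items():
        levels[dist].append(u)
    return levels

def ndfc_matrix(G, v, r, starting_points):
    m = len(starting_points)
    rows = []
    for C in circles(G, v, r):  # rows indexed by the radii 0, 1, ..., r
        if not C:
            rows.append(np.zeros(m))
        else:
            rows.append(np.mean([dndf(G, u, starting_points) for u in C], axis=0))
    return np.vstack(rows)  # shape (r + 1, m)

def rcdf_matrix(G, v, r, starting_points):
    mat = np.zeros((r, len(starting_points)))
    for i, C in enumerate(circles(G, v, r)[1:]):  # radii 1, ..., r
        for u in C:
            mat[i, bisect_right(starting_points, G.degree(u)) - 1] += 1
    return mat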

Examples 5.2.

Let G be the graph in Figure 1. Let \mathcal{I}_{0} be the vanilla intervals list of G and let \mathcal{I}_{1} be as defined in Example 4.4. In the following, we say a matrix representation separates two nodes if its values on these nodes are different. We round all float numbers to three decimal digits.

  • (i)

    The order 3 NDFC matrices w.r.t. \mathcal{I}_{0} for nodes H and Y are as follows:

    ndfc_{3}(H)=\left[\begin{array}{lllll}0.&1.&1.&1.&1.\\ 0.25&0.75&0.5&1.75&0.25\\ 0.167&0.667&0.333&0.667&0.5\\ 0.5&0.5&1.&0.5&0.\end{array}\right],\quad ndfc_{3}(Y)=\left[\begin{array}{lllll}0.&1.&1.&1.&1.\\ 0.25&0.5&0.5&2.&0.25\\ 0.2&0.8&0.4&0.4&0.6\\ 0.4&0.6&0.8&0.6&0.\end{array}\right]

    Although the vanilla NDF vectors of H and Y are the same, the other rows of the NDFC matrices of these nodes are different. In fact, even the order 1 NDFC matrix representation of nodes of G w.r.t. \mathcal{I}_{0} can separate all nodes of G, except R and Q, for which there is an automorphism of G that flips them, so they must have identical neighborhood statistics. This emphasizes the fact that higher order NDF representations of nodes have more expressive power, namely they are better at revealing the differences between neighborhoods of the nodes of a graph. This is in line with our discussion in Subsection 3.2 and will be explained more in Subsection 5.2.

  • (ii)

    The order 5 NDFC matrices w.r.t. \mathcal{I}_{1} for nodes B and I are as follows:

    ndfc_{5}(B)=\left[\begin{array}{ll}0.&3.\\ 1.667&2.333\\ 0.571&1.571\\ 0.333&2.\\ 2.&1.\\ 0.&1.\end{array}\right],\quad ndfc_{5}(I)=\left[\begin{array}{ll}0.&3.\\ 1.667&2.333\\ 0.8&1.6\\ 0.333&1.667\\ 1.333&1.667\\ 0.&1.\end{array}\right]

    Although nodes B and I have different vanilla NDF vectors, the first two rows of their NDFC matrices w.r.t. \mathcal{I}_{1} are the same. In fact, it takes the order 2 NDFC matrix representation of nodes of G w.r.t. \mathcal{I}_{1} to separate all nodes of G (of course, except R and Q).

  • (iii)

    The order 3 CDF and RCDF matrices w.r.t. \mathcal{I}_{1} of nodes C, F, M and N are as follows:

    cdf_{3}(C)=\left[\begin{array}{ll}0.5&0.5\\ 0.2&0.8\\ 0.5&0.5\end{array}\right],\quad cdf_{3}(F)=\left[\begin{array}{ll}0.5&0.5\\ 0.&1.\\ 0.6&0.4\end{array}\right],\quad cdf_{3}(M)=\left[\begin{array}{ll}0.5&0.5\\ 0.&1.\\ 0.8&0.2\end{array}\right],\quad cdf_{3}(N)=\left[\begin{array}{ll}0.5&0.5\\ 0.5&0.5\\ 0.&1.\end{array}\right]
    rcdf_{3}(C)=\left[\begin{array}{ll}1&1\\ 1&4\\ 2&2\end{array}\right],\quad rcdf_{3}(F)=\left[\begin{array}{ll}1&1\\ 0&3\\ 3&2\end{array}\right],\quad rcdf_{3}(M)=\left[\begin{array}{ll}1&1\\ 0&4\\ 4&1\end{array}\right],\quad rcdf_{3}(N)=\left[\begin{array}{ll}1&1\\ 2&2\\ 0&2\end{array}\right]

    One can easily check that the CDF matrix representation has to be of order at least 3 to be able to separate the nodes of G (except R and Q), while order 2 is enough for RCDF to do the same task. This example clearly shows that RCDF can be more efficient than CDF in separating the nodes of G.

To use our matrix representations of nodes in the exemplary machine learning algorithms of the next sections, we shall apply certain aggregations to turn them into 1-dimensional arrays. However, the present 2-dimensional array structure of these representations offers certain secondary applications.

5.1. Shapes of neighborhoods of nodes

The matrix representations of nodes as 2-dimensional arrays can be used to produce a simple visualization of the neighborhoods of nodes in a graph. For this purpose, we use only NDFC matrices, for several reasons: (1) In NDFC matrices, all rows are NDF vectors or their means (aggregations), so there is a meaningful relationship between different rows. (2) The rows of RNDFC and RCDF matrices are not comparable to each other, and heavily populated rows suppress less populated rows, so one cannot compare them by looking at the visualization. For example, consider rcdf_{3}(C) in Part (iii) of Example 5.2: there is no obvious and meaningful way to compare the second row with the first row. (3) The row-by-row \ell^{1}-normalization of CDF matrices might suppress some important values. For instance, consider cdf_{3}(F) in Part (iii) of Example 5.2. In the matrix rcdf_{3}(F), we observe that there are two nodes in C_{3}(F) whose degrees are greater than 2, while only one node in C_{1}(F) has a degree greater than 2. However, in cdf_{3}(F), the entry (3,2) is less than the entry (1,2), which is clearly counter-intuitive.

We slightly modify NDFC matrices to obtain a more extensive visualization. With the notation of Definition 5.1, for every node v in the graph, we define a vector of dimension m all of whose entries are zero except the j-th entry, which is 1, where j is the index such that the degree of v belongs to I_{j}. We call this vector the degree vector of v with respect to the intervals list \mathcal{I}. We add this vector as the first row of the NDFC matrix and shift the other rows down. In this way, we can view the degree of the node as the NDF vector of an imaginary circle of radius -1 around the node and take it into account while visualizing the NDF vectors of the other circles. Let us call the matrix representation obtained in this way the visual NDFC (or VNDFC) matrix representation of nodes of the graph and denote the order r VNDFC matrix representation of a node v\in V by vndfc_{r}(v).

Example 5.3.

We choose 7 nodes A, J, C, D, Q, M, and N of the graph G in Figure 1 and use the Python package matplotlib to draw the (gray) color map of the order 7 VNDFC matrix representations of these nodes w.r.t. the vanilla intervals list. The order 7 is chosen because the diameter of G is 7, so all 9 rows of vndfc_{7}(Q) are non-zero. Figure 2 illustrates the visualizations of these nodes. It is worth noting that since nodes J and Q are connected to the rest of the graph via nodes A and D, respectively, their visualizations are very similar to those of A and D, respectively. One observes less similarity between the visualizations of nodes A and C, because A only connects C to the left part of the graph. One also notices some similarity between the visualizations of nodes M and N, which is not surprising regarding the similarity of their neighborhoods in G.

Figure 2. Visualizations of several selected nodes of the graph depicted in Figure 1 as described in Example 5.3.
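The visualization itself takes only a few lines; the sketch below (matplotlib assumed, building on the ndfc_matrix helper sketched in the previous section) prepends the degree vector described above and draws the gray color map of the resulting VNDFC matrix.

from bisect import bisect_right
import numpy as np
import matplotlib.pyplot as plt

def vndfc_matrix(G, v, r, starting_points):
    # The degree vector of v: a one-hot row marking the interval containing deg(v).
    degree_vector = np.zeros(len(starting_points))
    degree_vector[bisect_right(starting_points, G.degree(v)) - 1] = 1
    return np.vstack([degree_vector, ndfc_matrix(G, v, r, starting_points)])

def plot_vndfc(G, v, r, starting_points):
    plt.imshow(vndfc_matrix(G, v, r, starting_points), cmap="gray")
    plt.title(f"VNDFC matrix of node {v}")
    plt.show()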

The representation of nodes by 2-dimensional arrays and the above visualizations suggest the idea that one can feed the higher order NDF matrices into convolutional neural network models and extract some patterns. Unfortunately, there is an important reason that prevents two-dimensional convolutions from detecting and revealing meaningful patterns in these matrices: the rows and columns of these matrices have totally different natures, so, unlike images, the patterns in higher order NDF matrices are not equivariant under most two-dimensional geometric transformations. That being said, inspired by the observations made in Example 5.3, there is some hope that one-dimensional convolution operations can be useful in detecting certain patterns among these matrices. Specifically, we expect vertical (column-wise) convolution operations to be able to detect hidden patterns in higher order NDF matrices. Further evidence for this idea is provided by the success of applying certain column-wise aggregations to the RCDF matrix representations of nodes in Section 7. Example 6.2 shows an instance of such a convolutional neural network, though it is less successful than the other neural network, which consists of fully connected linear layers.

5.2. RNDFC-equivalence vs color refinement

We have four options for higher order generalizations of NDF-equivalence: equivalences based on the NDFC, RNDFC, CDF, and RCDF matrix representations of nodes. During the normalization of rows in NDFC and CDF matrices, we lose some information, so the raw versions of these matrix representations, i.e. RNDFC and RCDF, work better for separating the nodes of a graph. On the other hand, since RNDFC matrices are computed based on neighbors degree frequency, they are more similar to color refinement and heuristically better than RCDF matrices, which are computed based on degree frequency. Therefore we work only with vanilla RNDFC-equivalence of graphs and nodes here. The definition of RNDFC-equivalence is the same as Definition 3.7. Also, one checks that isomorphism of graphs implies RNDFC-equivalence.

Here, we content ourselves with two examples in which RNDFC-equivalence can distinguish two non-isomorphic graphs while color refinement cannot, and postpone the detailed study of RNDFC-equivalence as a tool for graph isomorphism testing to our future works.

Example 5.4.

Let G_{1} be the cycle graph of length 6 and let G_{2} be the disjoint union of two cycle graphs of length 3. They are both 2-regular graphs, so color refinement does not change the colors of nodes and therefore cannot tell the difference between these graphs. On the other hand, the diameter of each connected component of G_{2} is 1. This implies that only the first two rows of the RNDFC matrices of its nodes are non-zero. However, the diameter of G_{1} is 3, and so there are 4 non-zero rows in the order n RNDFC matrices of nodes of G_{1} for all n\geq 3. This clearly shows that these graphs are not RNDFC-equivalent, and so they are not isomorphic.

In the above example, we did not even have to compute the RNDFC matrices; our argument was based on the diameters of the graphs. This shows an advantage of RNDFC-equivalence over color refinement. But the main superiority of RNDFC-equivalence over color refinement is based on the fact that it labels nodes by means of concrete matrices. Therefore, in cases where color refinement and RNDFC-equivalence partition the sets of nodes of two graphs similarly, the matrices associated to each equivalence class might still tell the difference between the two graphs.

Figure 3. The graphs in Example 5.5
Example 5.5.

Let G_{1} and G_{2} be the graphs illustrated in Figure 3. Because of the vertical and horizontal symmetries in both graphs, color refinement partitions both of them as follows:

P_{1}=\{0,1\},\quad P_{2}=\{2,3,4,5\},\quad P_{3}=\{6,7,8,9\}.

So color refinement cannot tell that they are not isomorphic. Again, because of those symmetries, the RNDFC-partitions of these graphs are the same too. However, the RNDFC matrices associated to the classes P_{2} and P_{3} are different in each graph. To see this, we list the order 5 RNDFC matrices (w.r.t. the vanilla intervals list) of each RNDFC-equivalence class in graphs G_{1} and G_{2}:

  • (G_{1})
    rndfc_{5}(P_{1})=\left[\begin{array}{lll}0&2&1\\ 0&4&3\\ 0&6&2\\ 0&4&0\\ 0&0&0\\ 0&0&0\end{array}\right],\quad rndfc_{5}(P_{2})=\left[\begin{array}{lll}0&1&1\\ 0&4&1\\ 0&5&2\\ 0&4&2\\ 0&2&0\\ 0&0&0\end{array}\right],\quad rndfc_{5}(P_{3})=\left[\begin{array}{lll}0&2&0\\ 0&3&1\\ 0&3&2\\ 0&3&2\\ 0&3&1\\ 0&2&0\end{array}\right].
  • (G_{2})
    rndfc_{5}(P_{1})=\left[\begin{array}{lll}0&2&1\\ 0&4&3\\ 0&6&2\\ 0&4&0\\ 0&0&0\\ 0&0&0\end{array}\right],\quad rndfc_{5}(P_{2})=\left[\begin{array}{lll}0&1&1\\ 0&4&1\\ 0&5&2\\ 0&2&2\\ 0&4&0\\ 0&0&0\end{array}\right],\quad rndfc_{5}(P_{3})=\left[\begin{array}{lll}0&2&0\\ 0&3&1\\ 0&3&2\\ 0&2&1\\ 0&2&2\\ 0&4&0\end{array}\right].

Therefore, G_{1} and G_{2} are not RNDFC-equivalent. Consequently, they cannot be isomorphic. In fact, order 3 RNDFC matrices were already enough to prove this.

6. Learning PageRank

In this section, we present several simple examples to show how the higher order NDF matrix representations can be fed as feature vectors into simple feedforward neural networks consisting of only fully connected linear layers to learn PageRanks of graphs. We run these experiments/examples on two undirected graphs appearing in social networks: The first graph is the network of facebook pages of 14113 companies. The second graph is the network of 27917 facebook pages of media. The datasets of these graphs are available in [9]. They are big enough to contain a sufficient number of nodes for training deep learning models and small enough to make it possible to test each sensible scenario in less than an hour on an ordinary PC. Along the way, we also examine various options for feature engineering and the architecture of the models. After training our model on the first graph, we modify the graph slightly and observe that the trained model still has a good predictive power on the modified version of the graph. This proves that our method is capable of handling dynamic graphs. In order to show that our method of learning PageRank is inductive, we next consider two random graphs generated by the Barabási–Albert algorithm with different seeds. We train our model on one of the graphs and show it is able to predict the PageRank of the other graph as well.

Example 6.1.

Let G=(V,E) be the graph of facebook pages of companies. The nodes represent the pages and the edges represent mutual likes. It has 14113 nodes and the average degree of nodes is 7. One finds more specifics on the webpage of the graph [9]. To produce an intervals list, we use Algorithm 2 with the parameters s=1, m=50, d=\max_{v\in V}deg(v) and r=1.3. For the higher order NDF matrix representation, we choose the order 5 NDFC matrices associated to this intervals list. The result is 14113 matrices of size 6\times 17 indexed by the nodes of G. We flatten these matrices into vectors with 102 entries, and the result is the set of feature vectors for training/testing our model. For the target variable, we compute the PageRanks of nodes using the Python package NetworkX [4]. In order to establish a rough balance between the target variables and the feature vectors, we multiply all PageRanks of nodes by 10000. The dataset is shuffled and divided into train/test sets, 10000 nodes for training and the remaining 4113 nodes for evaluating the model. The following feedforward neural network model is used to learn the PageRank of the graph:

Listing 1: The feedforward neural network model, written in PyTorch, used for learning PageRank of the network of facebook pages of companies.
class FFNN_model(nn.Module):
    def __init__(self):
        super().__init__()
        num_features = features.shape[1]  # in this example: 102
        self.fc1 = nn.Linear(num_features, 400)
        self.fc2 = nn.Linear(400, 800)
        self.fc3 = nn.Linear(800, 200)
        self.fc4 = nn.Linear(200, 64)
        self.fc5 = nn.Linear(64, 8)
        self.fc6 = nn.Linear(8, 1)
        self.dropout1 = nn.Dropout(0.4)
        self.dropout2 = nn.Dropout(0.3)
        self.dropout3 = nn.Dropout(0.5)

    def forward(self, X):
        X = torch.tanh(self.fc1(X))
        X = torch.relu(self.fc2(X))
        X = self.dropout1(X)
        X = torch.relu(self.fc3(X))
        X = self.dropout3(X)
        X = torch.relu(self.fc4(X))
        X = self.dropout2(X)
        X = torch.tanh(self.fc5(X))
        return self.fc6(X)

Our objective function is the Mean Square Error function and we apply the Adam optimizer with learning rate 0.001. In each epoch, the training dataset is shuffled again and divided into 25 batches of size 400, so 25 rounds of gradient descent are performed in each epoch. After training this model for 2000 epochs, our model is able to predict the PageRanks of nodes in the test set with around 90% accuracy (more precisely, 9.651% inaccuracy on average). One should note that due to some random factors in our algorithm, like shuffling the data and parameter initialization, retraining the model may result in slightly different accuracy rates. After saving this model, we randomly remove 500 edges from G and add 500 new edges in such a way that the modified graph stays connected. We apply the same feature engineering to the modified graph and compute the PageRanks of its nodes using NetworkX as targets. Our saved model is able to predict the PageRanks of the nodes of the modified graph with an average inaccuracy rate of 9.783%.
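For completeness, a minimal PyTorch training-loop sketch matching this setup is shown below; the tensors features and targets (the flattened NDFC matrices and the scaled PageRanks) are assumed to be prepared as described above.

import torch
import torch.nn as nn

model = FFNN_model()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

for epoch in range(2000):
    permutation = torch.randperm(features.shape[0])  # reshuffle the training set
    for batch in permutation.split(400):             # 25 batches of size 400
        optimizer.zero_grad()
        loss = criterion(model(features[batch]).squeeze(), targets[batch])
        loss.backward()
        optimizer.step()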

We have also run an experiment to train a neural network model equipped with convolutional layers to learn the PageRanks of nodes. Although the final accuracy rate is worse than in the above example, we present this experiment here as an example of a typical convolutional neural network applied to higher order NDF matrices, as explained at the end of Subsection 5.1.

Example 6.2.

Every detail of this experiment is similar to Example 6.1, except that we do not reshape the matrices into vectors; the model starts with two convolution layers, and a pooling layer is applied after the first convolution layer. Afterwards, we flatten the data to continue with the fully connected linear layers. The sizes of the kernels of the convolution layers are 3\times 1 and 4\times 1, respectively. The specifics of these layers are listed in Listing 2.

Listing 2: The definition of convolution layers of the convolutional neural network model, written in PyTorch, used for learning PageRank of the network of facebook pages of companies.
# Convolutional layers
final_out_channels = 6
self.conv_layer_1 = nn.Conv2d(in_channels=1, out_channels=3, kernel_size=(3, 1), stride=(1, 1), padding=(1, 0), padding_mode='zeros')
self.conv_layer_2 = nn.Conv2d(in_channels=3, out_channels=final_out_channels, kernel_size=(4, 1), stride=(1, 1), padding=(1, 0), padding_mode='zeros')
self.pooling_1 = nn.MaxPool2d(kernel_size=(3, 1), stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)

The trained model predicts the PageRank of nodes in the test set with around 64% accuracy on average.

In the following example, we discuss our experiments concerning deep learning models for PageRank of the network of facebook pages of media:

Example 6.3.

Let G=(V,E) be the graph of facebook pages of media. The nodes represent the pages and the edges represent mutual likes. It has 27917 nodes and the average degree of nodes is 14. One finds more specifics on the webpage of the graph [9]. To build an intervals list, we apply Algorithm 2 with the parameters s=1, m=70, d=\max_{v\in V}deg(v) and r=1.3. For the higher order NDF, we take the order 5 NDFC matrices w.r.t. this intervals list. The result is matrices of size 6\times 23, and after flattening them the feature vectors have 138 entries. Again the PageRanks of nodes are the target variables. To balance the target variables with the feature vectors, this time we multiply PageRanks by 24000. The dataset is shuffled and divided into 20000 nodes for training and 7917 nodes for evaluating the model. The feedforward neural network model is the same as the model described in Example 6.1. Again we train the model for 2000 epochs, and the average inaccuracy of the predictions on the test set is 12.438%. Obviously, there are some random processes in our training algorithm, such as shuffling and splitting the data, initialization of parameters, etc. To make sure that these elements of randomness do not undermine the promise of our methods, we repeat the training, and in the second try the average inaccuracy on the test set is 13.551%. We have also trained the same model with the order 6 CDF matrix representation in lieu of the order 5 NDFC matrix representation, and the trained model had an average inaccuracy of 28.44% on the test set. This suggests that the NDFC matrix representation is a better choice than the CDF matrix representation for learning PageRank.

In order to demonstrate that our method of learning PageRank based on the higher order NDF matrix representation is inductive, we run another experiment in the following example.

Example 6.4.

We use the Python package NetworkX to build two random graphs with 20000 nodes each. The method we apply to generate these random graphs is based on preferential attachment and is called the dual Barabási–Albert algorithm; see [7, 4] for details. It has the following parameters:

  • n: The number of nodes. We set n=20000 for both graphs.

  • p: A probability, i.e. 0<p<1. We set p=0.5 for both graphs.

  • m_{1}: The number of edges to link each new node to existing nodes with probability p. We set m_{1}=3 for both graphs.

  • m_{2}: The number of edges to link each new node to existing nodes with probability 1-p. We set m_{2}=1 for both graphs.

  • seed: The seed indicating the random number generation state. We set seed=1 and seed=2 for the first graph and the second graph, respectively.

  • initial_graph: The initial graph to start the preferential attachment process. This parameter is set to “None” for both graphs.

The graphs we obtain both have around 40000 edges and an average degree of around 4. However, the lists of degrees occurring in these graphs are very different, especially for higher degrees. For instance, the maximum degree of the first graph is 280, while it is 314 for the second graph. Therefore, to choose a common list of starting points, we have to use the method explained in Remark 4.5, and our choice for the list of starting points is as follows:

\{1,2,3,4,5,7,9,11,14,18,23,29,36,44,60,80,100,127,150,165,205\}.

We compute the order 5 NDFC matrix representations of both graphs w.r.t. this list of starting points and obtain a matrix of size 6\times 21 for each node of these graphs. We reshape these matrices as 1-dimensional arrays with 126 entries and consider them as the feature vectors. As for the targets, we compute the PageRanks of the nodes of these graphs. To balance the target variables with the feature vectors, we multiply PageRanks by 1000. Afterwards, we train the same feedforward neural network model described in Example 6.1 with 10000 randomly chosen feature vectors of the first graph as the feature set and the PageRanks of the corresponding nodes as the target set. The average error of the trained model on the remaining 10000 nodes is 8.296%. Then, we apply the trained model to the whole dataset of the second graph. The average error of predicting the PageRanks of nodes of the second graph is 8.926%.

This example shows that the “local” feature vectors obtained from the order 5 NDFC matrix representation of the first graph contain enough statistical information about the “global” structure of the graph to be useful for learning and predicting not only the PageRanks of nodes in the first graph, but also the PageRanks of nodes in the second graph. In other words, the above example confirms that our method is a local-to-global strategy. It also clearly shows that our method can learn PageRank using the data of a known graph and then predict the PageRank of another graph, as long as these graphs share some statistical similarities. Hence our method of learning PageRank is inductive. We shall return to the subject of learning PageRank in Section 8, after adopting a modification inspired by directed graphs.

7. Aggregations and learning the closeness centrality

The operation of concatenating rows (resp. columns) is one of the most basic methods of aggregation on the entries of a matrix, which amounts to reshaping the matrix (resp. the transpose of the matrix) into a 1-dimensional array. Some other basic aggregation methods are minimum, maximum, summation, mean and median, which can be performed column-wise, row-wise or on all entries of the matrix. However, since the order of rows in our matrix representations comes from the Breadth-First Search (or equivalently, the radius of the circles around nodes), we have a natural order in each column. On the other hand, the order of columns comes from the natural increasing order of starting points of the underlying intervals list, so there is a meaningful order between the entries of each row too. Exploiting these vertical and horizontal orderings gives us leverage and more creative opportunities for feature engineering based on these matrix representations. The following definition, which was inspired by our previous constructions in Section 2, is only one instance of the new types of possible aggregations:

Definition 7.1.

Let G=(V,E) be a graph and let \pi:V\rightarrow M_{r\times m} be a matrix representation of nodes of G as r\times m matrices. For any given sequence \Lambda=\{\lambda_{i}\}_{i=1}^{r} of real parameters, we define a vector representation \pi_{\Lambda}:V\rightarrow\mathbb{R}^{m} as follows:

\pi_{\Lambda}(v)_{j}=\sum_{i=1}^{r}\lambda_{i}\pi(v)_{i,j}\qquad\forall v\in V,\quad j=1,\dots,m.

In other words, \pi_{\Lambda}=M_{\Lambda}\pi, where M_{\Lambda} is the vector (or the row matrix) \left[\lambda_{1},\dots,\lambda_{r}\right]. We call \pi_{\Lambda} the parametric aggregation of \pi with respect to \Lambda.

More precisely, the above aggregation is a column-wise parametric aggregation. Obviously, we can repeat this definition for a row-wise parametric aggregation, which replaces each row by a weighted sum of its elements. Even more generally, one can consider an r\times m matrix of parameters and perform a 2-dimensional aggregation of the given matrix representation, or apply any other pooling technique to obtain a lower dimensional (matrix or vector) representation of nodes. Of course, they all depend on the prior knowledge about the structure of the graph and the downstream task at hand. Inspired by the p-centrality function discussed in Section 2, in the following definition we introduce a special case of parametric aggregation for our matrix representations of nodes:

Definition 7.2.

Let G=(V,E) and \pi:V\rightarrow M_{r\times m} be as in Definition 7.1. For a given real number 0<p<1, the p-aggregation of the matrix representation \pi is the parametric aggregation of \pi with respect to the parameter sequence \Lambda=\{p^{i-1}\}_{i=1}^{r}, and it is denoted simply by \pi_{p}.
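In vectorized form, the p-aggregation is simply a weighted sum of the rows of each matrix. The following NumPy sketch (the stacked matrix representations are an assumed input of shape (number of nodes, r, m)) computes it for all nodes at once.

import numpy as np

def p_aggregation(matrices, p):
    # matrices has shape (num_nodes, r, m); the output has shape (num_nodes, m).
    r = matrices.shape[1]
    weights = p ** np.arange(r)  # (1, p, p^2, ..., p^(r-1))
    return np.tensordot(matrices, weights, axes=([1], [0]))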

Given a node v in a graph, the sum of all entries of the i-th row of the RCDF matrix representation of v is s_{i}(v). So one can consider the rows of the RCDF matrix of v as a decomposition of the first few terms of the sequence \{s_{i}(v)\} with respect to the underlying intervals list. Therefore the p-aggregation of the RCDF matrix is a decomposition of the p-centrality function. Combining this observation with the intuition gained in Examples 2.4 and 2.5 leads us to believe that the p-aggregation of the RCDF matrix contains a good deal of information about the closeness centrality of the graph. In the following examples, we feed the vectors obtained from the p-aggregation of the RCDF matrix representations into simple feedforward deep learning models to learn and predict the closeness centrality of the graphs discussed in the previous section.

Example 7.3.

Let G=(V,E) be the graph of facebook pages of companies discussed in Example 6.1.

  • (i)

    Let \mathcal{I} be the intervals list associated with the starting points generated by Algorithm 2 with the parameters s=1, m=35, d=\max_{v\in V}deg(v) and r=1.5. We set p=0.3 and consider the p-aggregation of the order 4 RCDF matrix representation of nodes of G w.r.t. the intervals list \mathcal{I} as our feature set. We compute the closeness centrality of the nodes of the graph using the NetworkX library [4] and consider it as the target set. We employ a not-so-deep feedforward neural network model as follows:

    Listing 3: The feedforward neural network model, written in PyTorch, used for learning closeness centrality of the network of facebook pages of companies.
    class FFNN_model(nn.Module):
        def __init__(self):
            super().__init__()
            num_features = features.shape[1]
            self.fc1 = nn.Linear(num_features, 64)
            self.fc2 = nn.Linear(64, 8)
            self.fc3 = nn.Linear(8, 1)
            self.dropout1 = nn.Dropout(0.3)

        def forward(self, X):
            X = torch.tanh(self.fc1(X))
            X = self.dropout1(X)
            X = torch.relu(self.fc2(X))
            return self.fc3(X)

    We use the Mean Square Error function as the objective function and apply the Adam optimizer. The nodes are shuffled and divided into train/test sets with 10000 nodes for training and 4113 nodes for testing. During each epoch, the training dataset is shuffled and divided into 25 batches of size 400, so 25 rounds of gradient descent are performed. After training this simple model for 2000 epochs, we obtain a model for predicting the closeness centrality of nodes based on the aforementioned local features with an average error of 1.86% on the test set.

  • (ii)

    We delete 500 edges randomly from G and add 500 new edges randomly in such a way that the modified graph remains connected. Then we compute the feature set and target set of the new graph exactly with the same method as in Part (i). We apply the model trained in Part (i) to the feature set of all 14113 nodes of the modified graph. The average error is 2.195%. A word of caution is necessary here. After removing and adding 500 edges to graph G, the list of starting points of the intervals list computed for graph G was still a good choice for the modified graph. In practice, it is possible that the lists of degrees occurring in the two graphs are very different and one has to develop a method to choose a common list of starting points for them.

The above example clearly shows the robustness of our method of learning closeness centrality by applying deep learning to the p-aggregation of the RCDF matrix representations of nodes of the graph. The fact that we only used order 4 RCDF matrix representations of nodes (a locally engineered feature) to learn closeness centrality (a globally defined centrality measure) confirms the local-to-global nature of our method. Moreover, Part (ii) of the above example demonstrates that our method is applicable to dynamic graphs. In Example 7.5, we demonstrate the inductive power of our method to learn closeness centrality. In the following example, we train a model to learn the closeness centrality of the network of facebook pages of media. We also test several alternative ideas for feature engineering.

Example 7.4.
  • (i)

    Let G=(V,E) be the graph of facebook pages of media. We use Algorithm 2 with parameters s=1, m=70, d=\max_{v\in V}deg(v) and r=1.3 to construct an intervals list. For the higher order NDF, we take the order 3 RCDF and RNDFC matrices associated to this intervals list. For the p-aggregation, we try all values in the set \{0.3, 0.2, 0.15\}. The results of all p-aggregations are vectors with 23 entries. We compute the closeness centrality of nodes using the NetworkX library for the target set. The dataset is shuffled and divided into 10000 nodes for training and 17917 nodes for evaluating the model. The feedforward neural network model is the same as the model described in Example 7.3. Again we train the model for 2000 epochs. For the various options described above, the average inaccuracies of the predictions on the test set are shown in Table 6. Although the order 3 RNDFC matrices have one row more than the order 3 RCDF matrices, the model using RNDFC matrices performs worse than the model using RCDF matrices. This shows the relative advantage of RCDF matrices for learning closeness centrality. We also notice that the parameter p in the p-aggregation is an important hyperparameter for our method.

  • (ii)

    To see whether a deeper model can help to lower the average inaccuracy, we consider the order 3 RCDF matrix representation with p-aggregation for p=0.2 and feed the feature vectors obtained into the model described in Example 6.1. The average inaccuracy of the trained model is 2.089%, which is surprisingly higher than that of the shallower model!

p=0.3 p=0.2 p=0.15
RCDF 2.122% 1.755% 1.752%
RNDFC 4.398% 3.471% 2.404%
Table 6. The average inaccuracies of the model described in Example 7.4 on the test set according to different options. Columns are indexed according to parameter pp of the aggregation and rows are marked by the matrix representation which was used.
Example 7.5.

Consider the two random graphs discussed in Example 6.4 and the list of starting points we constructed there. We compute the order 2 RCDF matrix w.r.t. the intervals list associated to these starting points and obtain a 2\times 21 matrix representation for each node of these graphs. Next, we set p=0.2 and apply the p-aggregation to this matrix representation. The final feature vectors are 21-dimensional. We train the same feedforward neural network model described in Example 7.3 with 10000 randomly chosen feature vectors of the first graph as the feature set and the closeness centrality of the corresponding nodes as the target set. The average error of the trained model on the remaining 10000 nodes is 1.42%. Afterwards, we apply the trained model to the whole dataset of the second graph. The average error of predicting the closeness centrality of nodes of the second graph is 2.053%. This clearly shows that the “local” feature vectors obtained from the p-aggregation of the order 2 RCDF matrix representation of the first graph contain enough statistical information to be used to predict the closeness centrality of not only the nodes of the first graph, but also the nodes of the second graph.

The above example confirms the inductive nature of our method. The convenience of learning closeness centrality using our methods offers a number of research opportunities on applications of closeness centrality measure in network science which will be developed in our future works.

Finally, we note that replacing the vanilla or minimal intervals lists with a dynamic intervals list induces a row-wise aggregation at the level of higher order NDF matrix representations. We leave the exact formulation of these types of aggregations and the study of their generalizations and possible applications to the interested reader.

8. NDF for Directed Graphs

All of our previous constructions are based on two notions in graphs: the degrees of nodes and the distance between nodes. There are two possibilities for each of these notions in directed graphs, “inward” and “outward”, according to whether the directed edges point towards a node or start from a node, respectively. Therefore, to extend the various constructions of NDF embeddings and their higher order generalizations to directed graphs, we need to consider both of these variations. We do not bore the reader by repeating all aforementioned definitions and constructions according to the inward/outward dichotomy and content ourselves with a few instances. We begin with the crucial difficulties that arise while dealing with directed graphs. The directed graph shown in Figure 4 is used to demonstrate our reasoning, constructions and basic examples, so we briefly denote it by G_{0}=(V_{0},E_{0}) in this section.

Figure 4. The directed graph illustrating the concepts and constructions of Section 8.

Let G=(V,E) be a directed graph. The main issue is that edges in directed graphs are not symmetric. This has several major consequences that should be addressed:

  • Given a node v\in V, a node u\in V is called an in-neighbor (resp. out-neighbor) of v if there is a directed edge (u,v) (resp. (v,u)). The first consequence of asymmetric edges in directed graphs is that the number of in-neighbors of v (also called the in-degree of v and denoted by deg^{in}(v)) is not always equal to the number of out-neighbors of v (also called the out-degree of v and denoted by deg^{out}(v)). Therefore, all degree frequency calculations must be done accordingly in two parallel ways. We use the superscripts in and out to denote which way (inward or outward) is intended. For example, the inward vanilla NDF (i.e. the vanilla in-neighbors in-degree frequency) vector of a node v\in V is denoted by vndf^{in}(v).

  • The second consequence is the possibility for a node to have in-neighbors (resp. out-neighbors) whose in-degrees (resp. out-degrees) are zero. For example, in graph G_{0}, node B is an in-neighbor of node E and the in-degree of B is zero. This requires us to consider an entry for zero in NDF vectors. Therefore, any (vanilla or dynamic) list of starting points should begin with zero instead of 1. When we consider a minimal intervals list for a static graph, the list of starting points is allowed to start with a number bigger than 0, but for the dynamic version of the minimal intervals list, the list of starting points must begin with 0 as well.

  • Moreover, we have two maximum degrees in directed graphs, the maximum in-degree and the maximum out-degree, so the construction of a dynamic NDF for a directed graph starts with choosing two lists of starting points accordingly. For graph G_{0}, both the (inward/outward) vanilla (and minimal) lists of starting points are equal to \{0,1,2,3\}, the in-degrees list and the out-degrees list of G_{0}. The vanilla inward/outward NDF vectors of some selected nodes of G_{0} are listed below:

    vndf^{in}(A) = (1,0,1,0),\qquad vndf^{out}(A)=(1,1,1,0)
    vndf^{in}(H) = (0,0,2,0),\qquad vndf^{out}(H)=(0,1,0,1)
    vndf^{in}(B) = (0,0,0,0),\qquad vndf^{out}(B)=(1,0,1,1)
    vndf^{in}(K) = (0,2,0,0),\qquad vndf^{out}(K)=(1,2,0,0)
  • Finally, we note that the existence of a directed path (v_{0},\cdots,v_{n}) starting from a node v_{0}\in V and ending at another node v_{n}\in V does not guarantee that there is a directed path from v_{n} to v_{0}. For example, there is a directed path (B,A,C,F) from node B to node F in graph G_{0}, but not vice versa. Moreover, if there is such a path, there is no guarantee that the shortest path from v_{0} to v_{n} has the same length as the shortest path from v_{n} to v_{0}. For example, the shortest path from K to Y, i.e. (K,I,H,Y), has length 3 while the one from Y to K, i.e. (Y,L,K), has length 2. This means we have no well-defined symmetric distance (metric) function on the set of nodes of a directed graph. However, we can perform the inward/outward Breadth-First Search and define in-circles as well as out-circles around each node. Specifically, the in-circle of radius n centered at a node v\in V is defined and denoted as

    C_{n}^{in}(v)=\{u\in V : \text{there is a shortest directed path from }u\text{ to }v\text{ of length }n\},

    for n\geq 0. The out-circle of radius n centered at v is defined similarly and denoted by C_{n}^{out}(v). For example, in G_0 we have

    C_{2}^{in}(A)=\{L,H\},\qquad C_{2}^{out}(A)=\{M,Y,F\}.
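To make the above notions concrete, the following Python sketch (our own illustration, assuming the graph is given as a networkx.DiGraph; the function names and the toy edge list are ours, and G_0 itself is not reproduced here) computes inward/outward vanilla NDF vectors and in/out-circles via Breadth-First Search.

    import networkx as nx

    def vanilla_ndf_directed(G, v, direction="in"):
        # Inward (resp. outward) vanilla NDF vector of v: entry d counts the
        # in-neighbors (resp. out-neighbors) of v whose in-degree (resp.
        # out-degree) equals d; index 0 is included, as discussed above.
        if direction == "in":
            neighbors, degree = G.predecessors(v), G.in_degree
            max_deg = max(d for _, d in G.in_degree())
        else:
            neighbors, degree = G.successors(v), G.out_degree
            max_deg = max(d for _, d in G.out_degree())
        vec = [0] * (max_deg + 1)
        for u in neighbors:
            vec[degree(u)] += 1
        return vec

    def circles_directed(G, v, radius, direction="in"):
        # C_n^{in}(v) (resp. C_n^{out}(v)) for n = 0, ..., radius, obtained by
        # a Breadth-First Search along incoming (resp. outgoing) edges.
        step = G.predecessors if direction == "in" else G.successors
        circles, visited = [{v}], {v}
        for _ in range(radius):
            nxt = {u for w in circles[-1] for u in step(w)} - visited
            visited |= nxt
            circles.append(nxt)
        return circles

    # A toy example (not G_0): B -> A -> C -> F
    G = nx.DiGraph([("B", "A"), ("A", "C"), ("C", "F")])
    print(vanilla_ndf_directed(G, "A", "in"))   # inward vanilla NDF of A
    print(circles_directed(G, "F", 2, "in"))    # [{'F'}, {'C'}, {'A'}]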

Having the notions of (inward/outward) circles in our arsenal, we are able to define all variations of the higher order NDF matrix representations of nodes in a directed graph as straightforward generalizations of these notions for undirected graphs. Therefore, we are ready to apply the methods of the previous sections to directed graphs as well. We are particularly interested in learning the PageRank of directed graphs.

8.1. Learning PageRanks of directed graphs

The original definition of PageRank and our experiments in Section 6 suggest that we should employ the inward NDFC matrix representations as our feature engineering method. Before applying our deep learning methods to learning PageRank, we introduce an intuitive adjustment to the NDFC matrix representation as a secondary option for the feature engineering needed for learning PageRank. Then we use both versions of NDFC matrices in several examples. We hope this adjustment inspires further ideas for advanced feature engineering methods based on higher order NDF representations.

PageRank was originally introduced for hyperlink analysis of webpages on the internet, see [2]. As an iterative algorithm, the PageRank of a node v in the directed graph of webpages (with hyperlinks as directed edges) is defined as

PR_{n+1}(v)=\sum_{u\in C_{1}^{in}(v)}\frac{PR_{n}(u)}{deg^{out}(u)},

where PR_{n}(x) is the PageRank of a node x in the n-th iteration. The denominator of the summand takes into account the adverse effect of the out-degrees of in-neighbors of v on its PageRank. A similar phenomenon can be observed while building the NDFC matrix representation of nodes of a directed graph. For example, assume graph G_0 consists of webpages and consider the second row of the inward NDFC matrix of node H in G_0 w.r.t. the vanilla intervals list. We have C_{1}^{in}(H)=\{A,I\}, and so the second row of the matrix ndfc^{in}(H) is the mean of the inward vanilla NDF vectors of A and I, that is

\frac{(1,0,1,0)+(0,0,2,0)}{2}.

One observes that I has only one out-neighbor, which is H, and so it does not lower the chance of H being visited after I. But A has 3 out-neighbors, \{H,J,C\}, and besides H, the other two out-neighbors of A may attract some of the visitors of A and consequently lower the chance of H being visited after A. Therefore, giving the same weight to the NDF vectors of A and I does not seem right, so we need to discount the inward NDF vectors of the in-neighbors of H in proportion to their out-degrees. One obvious solution would be to divide the inward NDF vectors of the in-neighbors of H by their out-degrees, just like the iterative formula for PageRank mentioned above.

On the other hand, we note that the discounting of NDF vectors must be applied to all circles of positive radius. So, besides the first row of the NDFC matrix, all the remaining rows are adjusted by the discounting process. With these considerations in mind, we propose the following discounted version of the inward NDFC matrix representation:

Definition 8.1.

Let G=(V,E) be a directed graph. The order r discounted inward NDFC matrix representation with respect to an intervals list \mathcal{I}=\{I_{1},I_{2},\cdots,I_{m}\} assigns an (r+1)\times m matrix ndfc_{r}^{in\_dis}(v) to each node v\in V such that its first row is the same as the first row of ndfc_{r}^{in}(v) and, for k>1, its k-th row is defined by

\frac{1}{s_{k-1}^{in}(v)}\sum_{u\in C_{k-1}^{in}(v)}\frac{dndf^{in}(u)}{deg^{out}(u)},\qquad\text{when }s_{k-1}^{in}(v)\neq 0,

and the zero vector of dimension m when s_{k-1}^{in}(v)=0.
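For instance, consider node H in G_0 as discussed above. Reading s_{1}^{in}(H) as the size of C_{1}^{in}(H)=\{A,I\}, using the vanilla inward NDF vectors from the example above, and recalling that deg^{out}(A)=3 and deg^{out}(I)=1, the second row of ndfc^{in\_dis}(H) becomes

\frac{1}{2}\left(\frac{(1,0,1,0)}{3}+\frac{(0,0,2,0)}{1}\right)=\left(\tfrac{1}{6},0,\tfrac{7}{6},0\right),

so the contribution of A is damped relative to that of I, in line with the motivation above.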

When u is an in-neighbor of some node, we have deg^{out}(u)\geq 1, so the denominator of the summand in the above definition is non-zero. One also notes that the mapping u\mapsto\frac{1}{deg^{out}(u)} is only one simple proposal for the discount function; finding the best discount function depends on the downstream application and requires further research. In the following examples, we use both versions (discounted and not discounted) of the inward NDFC matrix representation as feature vectors to train deep learning models for learning PageRanks of directed graphs.
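Before turning to the examples, here is a minimal Python sketch of Definition 8.1 under our own reading of the notation: dndf^{in} is taken to be the dynamic inward NDF vector determined by a list of starting points, and s_{k-1}^{in}(v) is taken to be |C_{k-1}^{in}(v)|, consistent with the averaging in the example above. It is an illustration, not the author's reference implementation.

    import bisect
    import networkx as nx

    def dndf_in(G, u, starts):
        # Dynamic inward NDF vector of u: entry j counts the in-neighbors of u
        # whose in-degree lies in the j-th interval [starts[j], starts[j+1]).
        vec = [0] * len(starts)
        for w in G.predecessors(u):
            vec[bisect.bisect_right(starts, G.in_degree(w)) - 1] += 1
        return vec

    def discounted_ndfc_in(G, v, r, starts):
        # Order r discounted inward NDFC matrix of v (Definition 8.1).
        # Row 1 is taken to be the dynamic inward NDF vector of v itself (the
        # assumed first row of the ordinary inward NDFC matrix); row k (k > 1)
        # is the out-degree-discounted average of dndf_in over C_{k-1}^{in}(v).
        m = len(starts)
        rows = [dndf_in(G, v, starts)]
        circle, visited = {v}, {v}
        for k in range(1, r + 1):
            circle = {w for u in circle for w in G.predecessors(u)} - visited
            visited |= circle
            if circle:
                acc = [0.0] * m
                for u in circle:
                    vec, d = dndf_in(G, u, starts), G.out_degree(u)  # d >= 1 here
                    acc = [a + x / d for a, x in zip(acc, vec)]
                rows.append([a / len(circle) for a in acc])
            else:
                rows.append([0.0] * m)
        return rows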

Example 8.2.

Let G=(V,E) be the directed graph of the “Wikipedia vote network” [6], in which Wikipedia users are the nodes and votes between them are the directed edges. This graph has 7115 nodes. To build a list of starting points according to the in-degrees list of G, we apply the method explained in Remark 4.5 and choose the following list of starting points:

\{0,1,2,4,7,12,19,30,47,72,110,167,237,307,450\}

Then we compute the order 5 inward NDFC matrix representations of the nodes of G w.r.t. this list of starting points. After flattening these matrices into one-dimensional arrays, we obtain feature vectors with 90 entries each. As usual, we compute the PageRank of the nodes using the Python package NetworkX as the target variable. To balance the target variables and the feature vectors, we multiply all PageRanks by 5000. The dataset is shuffled and divided into train/test sets, 5000 nodes for training and the remaining 2115 nodes for evaluating the model. The rest of the details of the training process are as in Example 6.1, except that we choose the learning rate 0.0005 to avoid overfitting. The average inaccuracy of the PageRank predictions on the test set is 13.09%. Afterwards, we repeat this experiment with the order 5 discounted inward NDFC matrices and obtain an average inaccuracy rate of 13.87% on the test set. Due to various sources of randomness in the training process, there is a fluctuation of a few percentage points around the average inaccuracy rate. Therefore, we can conclude that both versions of NDFC matrices perform almost identically on this graph.
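For readers who wish to reproduce experiments of this kind, the following Python sketch outlines one possible pipeline. The exact network architecture, loss and the precise meaning of “average inaccuracy” follow Example 6.1 and are not reproduced here; the fully connected regressor, the MSE loss and the mean relative error below are our own placeholder choices, and ndfc_in_matrix stands for either of the two feature constructions sketched above.

    import networkx as nx
    import numpy as np
    import torch
    from torch import nn

    def build_dataset(G, starts, r=5, scale=5000.0):
        # Features: flattened order r inward NDFC matrices (ndfc_in_matrix is
        # an assumed helper implementing one of the constructions above).
        # Targets: NetworkX PageRanks, multiplied by `scale` for balance.
        pr = nx.pagerank(G)
        nodes = list(G.nodes())
        X = np.array([np.ravel(ndfc_in_matrix(G, v, r, starts)) for v in nodes],
                     dtype=np.float32)
        y = np.array([pr[v] * scale for v in nodes], dtype=np.float32)
        return X, y

    def train_and_evaluate(X, y, n_train=5000, lr=0.0005, epochs=2000):
        idx = np.random.permutation(len(y))              # shuffle, then split
        tr, te = idx[:n_train], idx[n_train:]
        Xtr, ytr = torch.tensor(X[tr]), torch.tensor(y[tr]).unsqueeze(1)
        Xte, yte = torch.tensor(X[te]), torch.tensor(y[te]).unsqueeze(1)
        model = nn.Sequential(nn.Linear(X.shape[1], 128), nn.ReLU(),
                              nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        for _ in range(epochs):
            opt.zero_grad()
            loss = nn.functional.mse_loss(model(Xtr), ytr)
            loss.backward()
            opt.step()
        with torch.no_grad():
            rel_err = ((model(Xte) - yte).abs() / yte.abs()).mean().item()
        return model, 100 * rel_err                      # inaccuracy in percent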

In the following example, we try our methods on a larger directed graph:

Example 8.3.

Let G=(V,E) be the directed graph of the “Epinions social network” [8], in which the nodes are the members of the site epinions.com and directed edges denote who trusts whom. This graph has 75879 nodes and its maximum in-degree is 3035. Again, we apply the method explained in Remark 4.5 to build a list of starting points according to the in-degrees list of G. The result is the following list of starting points:

\{0,1,2,3,4,5,7,9,11,14,17,21,25,30,36,43,51,60,70,82,96,112,130,151,175,203,
235,272,315,365,422,488,558,628,698,768,838,908,978,1100,1200,1450,2800\}

Then we compute the order 3 inward NDFC matrix representations of the nodes of G w.r.t. this list of starting points, which yields matrices of size 4\times 43. The feature vectors obtained from flattening these matrices have 172 entries. As usual, we compute the PageRanks of the nodes using the Python package NetworkX as the target variables. To balance the target variables and the feature vectors, we multiply all PageRanks by 10000. The dataset is shuffled and divided into train/test sets, 20000 nodes for training and the remaining 55879 nodes for evaluating the model. The rest of the details of the training process are as in Example 6.1, except that we train this model for only 1000 epochs to avoid overfitting. The average inaccuracy of the PageRank predictions on the test set is 27.196%. Afterwards, we repeat this experiment with the order 3 discounted inward NDFC matrices and obtain an average inaccuracy of 19.425% on the test set. We attribute this notable improvement to the discounting applied in our feature engineering.

8.2. PageRanks of undirected graphs using discounted NDFC matrix representations

Every undirected graph can be considered as a directed graph by viewing each symmetric undirected edge as two opposite directed edges. In the directed graph obtained this way, all the inward and outward concepts and constructions discussed in this section coincide, and in particular the discounting process simply divides by the degree of the neighbor. That is, given an undirected graph G=(V,E), v\in V and k>0, the k-th row of ndfc_{r}^{dis}(v), the order r discounted NDFC matrix representation of v, is defined by

\frac{1}{s_{k-1}(v)}\sum_{u\in C_{k-1}(v)}\frac{dndf(u)}{deg(u)},\qquad\text{when }s_{k-1}(v)\neq 0,

and the zero vector when s_{k-1}(v)=0. In the following examples, we apply the discounted NDFC matrix representations to learn the PageRanks of two undirected graphs discussed in Section 6, i.e. the graph of Facebook pages of companies and the graph of Facebook pages of media.
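As a small illustration, the row just displayed could be computed as follows, under our assumptions that dndf returns the dynamic NDF vector of a node and circle returns C_{k-1}(v), with s_{k-1}(v) read as the size of that circle; these helpers come from the earlier constructions and are not redefined here.

    def discounted_ndfc_row(G, v, k, dndf, circle):
        # k-th row (k > 0) of the discounted NDFC matrix in an undirected graph:
        # the degree-discounted average of the dynamic NDF vectors over C_{k-1}(v).
        nodes = circle(G, v, k - 1)
        if not nodes:
            return [0.0] * len(dndf(G, v))
        acc = [0.0] * len(dndf(G, v))
        for u in nodes:
            acc = [a + x / G.degree(u) for a, x in zip(acc, dndf(G, u))]
        return [a / len(nodes) for a in acc]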

Example 8.4.

In Example 6.1, we replace the NDFC matrices with the discounted NDFC matrices and train the model with the same details as before. The trained model has an average inaccuracy of 9.09% on the test set. We notice that the discounted NDFC matrices speed up the training process slightly, and during 2000 epochs some overfitting occurs. So we reduce the number of epochs to 1500 and repeat the experiment. This time the trained model achieves an even lower average inaccuracy of 8.069%.

The above example suggests that the discounted NDFC matrix representation is slightly more suitable for learning PageRanks of undirected graphs. However, using the discounted NDFC matrices does not yield an improvement in the following example:

Example 8.5.

In Example 6.3, we replace the NDFC matrices with the discounted NDFC matrices. The average inaccuracy of the trained model on the test set is 13.143%.

Our conclusion is that discounting the NDF vectors according to the out-degrees of nodes has a mixed effect on the quality of feature vectors for learning PageRank. More comprehensive research is needed to determine the exact conditions under which this technique produces better results.

9. Conclusion and Final Remarks

In this article, we introduced the following constructions based on local data gathered from the neighborhoods of nodes of graphs:

  • A general family of parametric centrality measures.

  • Static and dynamic versions of NDF vector representations of nodes.

  • Higher order NDF matrix representations of nodes.

  • An effective aggregation method on higher order NDF matrix representations.

  • A discounting process for engineering more efficient feature vectors.

Using these constructions, we proposed the following methods:

  • An approximation of the neighborhoods of nodes by trees of height 2.

  • A method for graph isomorphism testing based on NDF vectors and RNDFC matrices.

  • An easy way to visualize the neighborhoods of nodes as color maps.

  • An inductive deep learning algorithm for learning PageRank which is applicable to dynamic graphs.

  • An inductive deep learning algorithm for learning closeness centrality which is applicable to dynamic graphs.

The above list of applications is not exhaustive, and we expect our graph embeddings and machine learning methods to find more applications in other subjects related to machine learning and data science on graphs, network science, graph isomorphism testing, graph labeling, etc.

Final remarks. There are several important points that need to be highlighted here:

  • Our focus in this article was to define NDF vector embeddings, their higher order matrix representations, and other feature engineering methods. We did not pay thorough attention to the architecture of the deep learning algorithms we used. Therefore, we think interested readers will find a number of untouched research ideas concerning various designs of deep learning algorithms for the applications mentioned here or for new applications, for example, the possibility of learning other centrality measures using NDF embeddings.

  • As we have already noted several times, our way of training depends on a number of random procedures such as parameter initialization, data shuffling and splitting, etc. Therefore, one should expect minor differences while verifying the performance of our methods.

  • The complete code for all the constructions and examples in this article will be available in the author’s GitHub account. We hope curious readers will perform new experiments with our code and gain more insights.

  • Definitely, the most important choice to be made in our methods is choosing the right list of starting points according to various statistics of the graph and the task at hand. We strongly encourage readers interested in our methods to pay special attention to this issue, as it lies at the heart of NDF embeddings.

  • An interesting research idea is to feed the NDF vectors or their higher order matrix representations into well-known graph representation learning models and graph neural network models as initial states. Due to the competitive performance of NDF and RNDFC embeddings in graph isomorphism testing, it is expected that those deep learning models would gain more expressive power from the NDF embeddings.

References

  • [1] J. Ai, H. Zhao, K.M. Carley, Z. Su, H. Li, Neighbor vector centrality of complex networks based on neighbors degree distribution. Eur. Phys. J. B 86, 163, (2013).
  • [2] S. Brin, L. Page, The anatomy of a large-scale hypertextual web search engine. Computer Networks and ISDN Systems, 33:107–17, (1998).
  • [3] M. Grohe, K. Kersting, M. Mladenov, P. Schweitzer, Color refinement and its applications. Chapter 15 in: An Introduction to Lifted Probabilistic Inference (Neural Information Processing series), The MIT Press, (2021).
  • [4] A. Hagberg, D. Schult, P. Swart, NetworkX Reference, Release 2.6.2. (Available at networkx.org, accessed in 2021).
  • [5] W.L. Hamilton, R. Ying, J. Leskovec, Inductive representation learning on large graphs. Proceedings of NIPS, pp. 1024–1034, (2017).
  • [6] J. Leskovec, D. Huttenlocher, J. Kleinberg, Predicting positive and negative links in online social networks. WWW ’10: Proceedings of the 19th International Conference on World Wide Web, April 2010, 641–650, (2010). (Dataset available at http://snap.stanford.edu/data/wiki-Vote.html, accessed in 2022.)
  • [7] N. Moshiri, The dual-Barabasi-Albert model. arXiv:1810.10538.
  • [8] M. Richardson, R. Agrawal, P. Domingos, Trust Management for the Semantic Web. In: The Semantic Web - ISWC 2003, Lecture Notes in Computer Science, vol 2870, Springer, Berlin, Heidelberg, pp. 351–368, (2003). (Dataset available at http://snap.stanford.edu/data/soc-Epinions1.html, accessed in 2022.)
  • [9] R.A. Rossi, N.K. Ahmed, The Network Data Repository with Interactive Graph Analytics and Visualization. (Datasets available at https://networkrepository.com/fb-pages-company.php and https://networkrepository.com/fb-pages-media.php, accessed in 2022.)