
A hyper-distance-based method for hypernetwork comparison

Tao Xu Xiaowen Xie Zi-Ke Zhang Chuang Liu Xiu-Xiu Zhan [email protected] Alibaba Research Center for Complexity Sciences, Hangzhou Normal University, Hangzhou, 311121, P. R. China Digital Communication Research Center, Zhejiang University, Hangzhou 310058, China College of Media and International Culture, Zhejiang University, Hangzhou 310058, PR China
Abstract

Hypernetworks are a useful way to depict multiple connections between nodes, making them an ideal tool for representing complex relationships in network science. In recent years, there has been a marked increase in studies on hypernetworks; however, the comparison of differences between two hypernetworks has received less attention. This paper proposes a hyper-distance-based method (HD) for comparing hypernetworks. The method takes into account high-order information, such as the high-order distance between nodes. Experiments carried out on synthetic hypernetworks show that HD is capable of distinguishing between hypernetworks generated with different parameters and is successful in the classification of hypernetworks. Furthermore, HD outperforms current state-of-the-art baselines in distinguishing empirical hypernetworks when hyperedges are disrupted.

keywords:
Hypernetwork, Higher-order Distance, Hypernetwork Comparison, Jensen-Shannon Divergence

1 Introduction

With the rapid development of the Internet and the explosive growth of information  [1], the size and complexity of the network data continue to increase. In the modern digital era, it is essential to compare and evaluate the distinctions between various networks [2] in order to gain a thorough understanding of the structure, function, and behavior of the network. The exploration of network comparison is a significant area of study with implications for disciplines such as network science, data mining, and machine learning [3, 4, 5, 6] and can be used to solve problems such as network classification, algorithm evaluation, and anomaly detection [7, 8, 9]. Nevertheless, the customary techniques for contrasting networks mainly concentrate on the low-order features of networks. For instance, Han et al. [4] proposed a straightforward and effective measure, i.e., the Jaccard method, which calculates the difference between the adjacency matrices of two networks. Koutra et al. [10] employed a method called Deltacon to compare the similarity between all pairs of nodes in two networks. Aliakbary et al. [11] analyzed the graphlet distribution of a network and compared the similarity between two networks through the graphlet distribution. Furthermore, Bagrow et al. [12] introduced the network portrait and KL divergence to compare two networks. Despite the fact that these techniques may partially uncover the fundamental characteristics of networks [13], they have restrictions on grasping the intricate multivariate connections existing in complex systems. In the real world, many complex systems exhibit rich multivariate relationships, including dependencies among multiple variables and interactions among multiple objects. For example, in collaboration networks, multiple individuals may collaborate on one project. In the brain’s neural system, multiple regions can be highly active simultaneously. 
In social platforms, groups made up of multiple individuals share information and interact with each other [14, 15, 16, 17]. The relationships mentioned above can be modeled by hypernetworks, which are networks composed of nodes and hyperedges. A hyperedge may contain more than two nodes, and a single node may belong to multiple hyperedges [18]. Representing a complex system as a hypernetwork can break the limitation of "binary interactions" in an ordinary network and better reflect the multidimensional interactions present in the system [19]. Several methods have been proposed to compare differences in hypernetworks, such as those of Surana et al. [20]. However, most of these methods are adaptations of ordinary network comparison methods and do not fully utilize the high-order characteristics specific to hypernetworks.

To address this problem, we propose a hyper-distance-based network comparison method (HD), which reveals a comprehensive picture of network dissimilarity and characterizes the dissimilarity between hypernetworks using high-order information. The method compares the differences between two hypernetworks through the shortest path lengths defined by the high-order distance. We apply the method to compare hypernetworks generated by network models as well as empirical hypernetworks deduced by real data from different domains. The results show that our method can successfully compare different networks and perform better than the baselines.

The rest of this paper is structured as follows. Section 2 outlines the specifics of the hyper-distance-based network comparison approach. Section 3 provides an overview of the baseline methods. Section 4 presents the experimental results of comparing different hypernetworks. Finally, Section 5 summarizes the findings and suggests potential future research directions.

2 Hyper-distance-based network comparison method

In this section, we propose a hypernetwork comparison method based on the local and global properties of hypernetworks. We first illustrate the basic definitions including the high-order distance, and the hypernetwork comparison method is further defined based on the high-order distance.

2.1 Basic Definitions

A hypernetwork is defined as $H=(V,E)$, with a set of nodes $V=\{v_{1},v_{2},\cdots,v_{N}\}$ and a set of hyperedges $E=\{e_{1},e_{2},\cdots,e_{M}\}$, where $N$ and $M$ represent the number of nodes and hyperedges, respectively. An incidence matrix $I_{N\times M}$ is constructed based on the membership relationship between nodes and hyperedges: if node $v_{i}$ is part of hyperedge $e_{j}$, then $I_{ij}=1$; otherwise $I_{ij}=0$. Therefore, we can obtain the following.

k^{H}(v_{i}) = \sum_{e_{j}\in E} I_{ij}, \quad (1)
k^{E}(e_{j}) = \sum_{v_{i}\in V} I_{ij}, \quad (2)

where $k^{H}(v_{i})$ and $k^{E}(e_{j})$ represent the hyperdegree of node $v_{i}$ and the size of hyperedge $e_{j}$, respectively.
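As a quick illustration of Eqs. (1-2), the hyperdegree and hyperedge-size sequences are simply the row and column sums of the incidence matrix. A minimal sketch on a hypothetical toy hypernetwork (the node and hyperedge labels are illustrative, not taken from the paper):

```python
import numpy as np

# Toy incidence matrix I for a hypothetical hypernetwork with
# N = 4 nodes (rows) and M = 3 hyperedges (columns).
I = np.array([
    [1, 0, 1],  # v_1 belongs to e_1 and e_3
    [1, 1, 0],  # v_2 belongs to e_1 and e_2
    [0, 1, 1],  # v_3 belongs to e_2 and e_3
    [0, 0, 1],  # v_4 belongs to e_3
])

hyperdegree = I.sum(axis=1)  # k^H(v_i), Eq. (1): row sums
edge_size = I.sum(axis=0)    # k^E(e_j), Eq. (2): column sums

print(hyperdegree.tolist())  # [2, 2, 2, 1]
print(edge_size.tolist())    # [2, 2, 3]
```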

We create the ordinary network $G=(V,E^{o})$ of $H$ by linking all nodes that belong to a common hyperedge in $H$. In other words, an edge $e^{o}=(v_{i},v_{j})\in E^{o}$ exists if $v_{i}$ and $v_{j}$ both appear in at least one hyperedge in $E$. The network $G$ is unweighted and undirected, and its adjacency matrix is denoted by $A^{o}$: if there is an edge between nodes $v_{i}$ and $v_{j}$ in $G$, then $A^{o}_{ij}=1$; otherwise $A^{o}_{ij}=0$.
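The projection of $H$ onto its ordinary network $G$ can be sketched as follows (a minimal implementation assuming hyperedges are given as sets of 0-indexed node labels; the toy hyperedge list is hypothetical):

```python
from itertools import combinations
import numpy as np

def clique_expansion(hyperedges, n):
    """Adjacency matrix A^o of the ordinary network G: every pair of
    nodes that co-occurs in at least one hyperedge is linked."""
    A = np.zeros((n, n), dtype=int)
    for e in hyperedges:
        for i, j in combinations(sorted(e), 2):
            A[i, j] = A[j, i] = 1
    return A

E = [{0, 1}, {1, 2}, {0, 2, 3}]  # hypothetical toy hyperedges
A = clique_expansion(E, 4)
print(A[0, 1], A[1, 3])  # 1 0
```

Note that the projection is lossy: different hypernetworks can produce the same $G$, which is precisely why the HD method works with $s$-distances rather than with $G$ alone.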

The $s$-distance between nodes: We employ the hyperedge distance from [21] to calculate the $s$-distance between two nodes. Two hyperedges are considered $s$-adjacent if they have at least $s$ nodes in common. Concretely, the $s$-distance between two hyperedges $e_{i}$ and $e_{j}$ is denoted by $L_{s}^{E}(e_{i},e_{j})$ and is expressed as:

L_{s}^{E}(e_{i},e_{j}) = \begin{cases} \text{the shortest } s\text{-path length}, & \text{if there exists an } s\text{-path between } e_{i} \text{ and } e_{j} \\ \infty, & \text{if there is no } s\text{-path between } e_{i} \text{ and } e_{j} \end{cases}. \quad (3)

Based on the above definition, the $s$-distance between two nodes $v_{i}$ and $v_{j}$, denoted $L_{s}^{V}(v_{i},v_{j})$, can be defined as follows:

L_{s}^{V}(v_{i},v_{j}) = \begin{cases} 1, & \text{if } v_{i} \text{ and } v_{j} \text{ exist in the same hyperedge} \\ \min(L_{s}^{E}(e_{m},e_{n})) + 1, & v_{i}\in e_{m},\ v_{j}\in e_{n} \end{cases}. \quad (4)
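A minimal sketch of Eqs. (3-4): hyperedges overlapping in at least $s$ nodes form an $s$-line graph, a multi-source breadth-first search over that graph gives the shortest $s$-path length between hyperedges, and the node $s$-distance follows Eq. (4). Hyperedges are assumed to be sets of hashable node labels, and the toy hyperedges are hypothetical:

```python
from collections import deque
from itertools import combinations
import math

def s_distance(hyperedges, u, v, s):
    """s-distance between nodes u and v (Eq. 4): 1 if they share a
    hyperedge, otherwise the shortest s-path length between hyperedges
    containing them, plus one. Returns math.inf when no s-path exists."""
    if any(u in e and v in e for e in hyperedges):
        return 1
    # s-line graph: hyperedges are s-adjacent if they share >= s nodes.
    M = len(hyperedges)
    adj = {i: [] for i in range(M)}
    for i, j in combinations(range(M), 2):
        if len(hyperedges[i] & hyperedges[j]) >= s:
            adj[i].append(j)
            adj[j].append(i)
    sources = [i for i, e in enumerate(hyperedges) if u in e]
    targets = {i for i, e in enumerate(hyperedges) if v in e}
    # Multi-source BFS over hyperedges realizes min L_s^E(e_m, e_n).
    dist = {i: 0 for i in sources}
    q = deque(sources)
    while q:
        i = q.popleft()
        if i in targets:
            return dist[i] + 1
        for j in adj[i]:
            if j not in dist:
                dist[j] = dist[i] + 1
                q.append(j)
    return math.inf  # no s-path connects any pair of containing hyperedges

E = [{0, 1, 2}, {2, 3, 4}, {4, 5}]  # hypothetical toy hyperedges
print(s_distance(E, 0, 5, 1))  # 3: the 1-path e_1 -> e_2 -> e_3 has length 2
print(s_distance(E, 0, 5, 2))  # inf: no pair of hyperedges shares 2 nodes
```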

2.2 Hyper-distance-based method for hypernetwork comparison

The $s$-distance distribution of a hypernetwork: The proportion of nodes whose $s$-distance from node $v_{i}$ equals $j$ is denoted by $p_{i}^{s}(j)$. The $s$-distance distribution of node $v_{i}$ is represented by a vector of size $1\times(d^{s}+1)$, denoted $P_{i}^{s}=\{p_{i}^{s}(j)\}$, where $d^{s}$ is the maximum value of the $s$-distance. The proportion of nodes at infinite $s$-distance from node $v_{i}$ is represented by $p_{i}^{s}(d^{s}+1)$. Consequently, we define the $s$-distance distribution of the hypernetwork $H$ as $P^{s}=\{P_{1}^{s},P_{2}^{s},\cdots,P_{N}^{s}\}$, which captures detailed topological information with respect to the $s$-distance between nodes. When $s=1$, we obtain the distance distribution of the ordinary network $G$ derived from $H$; when $s>1$, a high-order distance distribution is revealed, reflecting the various high-order topological relationships between nodes indicated by different values of $s$. It should be noted that for a hypernetwork $H$, we have $1\leq s\leq S_{m}$, where $S_{m}$ represents the maximum number of overlapping nodes between hyperedges.
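Given a matrix $D$ of pairwise $s$-distances, the node distribution $P_{i}^{s}$ can be assembled as follows (a sketch; we exclude the self-distance and place the infinity proportion in the last entry, per the definition above, and the toy distance matrix is hypothetical):

```python
import numpy as np

def distance_distribution(D, i):
    """P_i^s of node i from an N x N matrix D of pairwise s-distances
    (np.inf marks unreachable pairs). Entry j-1 holds p_i^s(j) for
    j = 1..d^s, and the last entry holds the proportion of nodes at
    infinite s-distance (sketch; self-distances are excluded)."""
    row = np.delete(D[i], i)              # distances to the other N-1 nodes
    d_max = int(D[np.isfinite(D)].max())  # d^s over the whole hypernetwork
    P = np.zeros(d_max + 1)
    for j in range(1, d_max + 1):
        P[j - 1] = np.mean(row == j)
    P[-1] = np.mean(np.isinf(row))
    return P

# Hypothetical 4-node example: node 3 is disconnected from the rest.
D = np.array([[0, 1, 2, np.inf],
              [1, 0, 1, np.inf],
              [2, 1, 0, np.inf],
              [np.inf, np.inf, np.inf, 0]])
# One third of node 0's peers sit at distance 1, 2, and infinity, respectively.
print(distance_distribution(D, 0))
```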

Dissimilarity between two hypernetworks: Given a hypernetwork $H$ and its $s$-distance distributions $P^{s}=\{P_{1}^{s},P_{2}^{s},\cdots,P_{N}^{s}\}$ $(1\leq s\leq S_{m})$, we leverage the Jensen-Shannon divergence to define the $s$-order heterogeneity, named the $s$-order hypernetwork node dispersion $NND^{s}(H)$. The definition of $NND^{s}$ can be expressed as follows:

NND^{s}(H) = \frac{J(P_{1}^{s},P_{2}^{s},\cdots,P_{N}^{s})}{\ln(d^{s}+1)}, \quad (5)

where $J(P_{1}^{s},P_{2}^{s},\cdots,P_{N}^{s})$ represents the Jensen-Shannon divergence of the node $s$-distance distributions, defined as:

J(P_{1}^{s},P_{2}^{s},\cdots,P_{N}^{s}) = \frac{1}{N} \sum_{i,j} p_{i}^{s}(j) \ln\left(\frac{p_{i}^{s}(j)}{\mu_{j}^{s}}\right), \quad (6)

where $\mu_{j}^{s}$ denotes the average of the $N$ $s$-distance distributions, given by:

\mu_{j}^{s} = \frac{1}{N} \sum_{i=1}^{N} p_{i}^{s}(j). \quad (7)

We compare the dissimilarity between two hypernetworks, i.e., $H$ and $H^{\prime}$, based on the $s$-order heterogeneity defined above. Specifically, the structural dissimilarity $D(H,H^{\prime})$ is written as follows:

D(H,H^{\prime}) = \beta \sum_{s=1}^{S_{m}} \sqrt{\frac{J(\mu_{H}^{s},\mu_{H^{\prime}}^{s})}{\ln 2}} + (1-\beta) \sum_{s=1}^{S_{m}} \left| \sqrt{NND^{s}(H)} - \sqrt{NND^{s}(H^{\prime})} \right|, \quad (8)

where $J(\mu_{H}^{s},\mu_{H^{\prime}}^{s})$ denotes the Jensen-Shannon divergence of the average $s$-distance distributions of $H$ and $H^{\prime}$. The dissimilarity $D(H,H^{\prime})$ is composed of two terms: the first compares the average connectivity of nodes at different $s$-distances, capturing the global dissimilarity between the two hypernetworks; the second integrates the dissimilarities among the node dispersions obtained by varying $s$, indicating the local differences between the two hypernetworks. We use a parameter $\beta$ $(0<\beta<1)$ to adjust the relative importance of these two terms. A higher value of $D(H,H^{\prime})$ indicates a larger difference between $H$ and $H^{\prime}$.
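Under the definitions above, Eqs. (5-8) can be sketched for a single order $s$ as follows (assuming the two $N\times(d^{s}+1)$ distribution matrices are padded to the same number of columns; summing the returned value over $s=1,\dots,S_{m}$ yields the full measure):

```python
import numpy as np

def nnd(P):
    """s-order node dispersion NND^s (Eqs. 5-7). P is the N x (d^s+1)
    matrix whose rows are the node s-distance distributions."""
    mu = P.mean(axis=0)                        # Eq. (7)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(P > 0, P * np.log(P / mu), 0.0)
    J = terms.sum() / P.shape[0]               # Eq. (6)
    return J / np.log(P.shape[1]), mu          # Eq. (5)

def js_divergence(p, q):
    """Jensen-Shannon divergence of two discrete distributions."""
    m = (p + q) / 2
    def kl(a, b):
        mask = a > 0
        return np.sum(a[mask] * np.log(a[mask] / b[mask]))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def hd_dissimilarity(P_H, P_Hp, beta=0.5):
    """Contribution of one order s to Eq. (8)."""
    nnd_H, mu_H = nnd(P_H)
    nnd_Hp, mu_Hp = nnd(P_Hp)
    term1 = np.sqrt(js_divergence(mu_H, mu_Hp) / np.log(2))
    term2 = abs(np.sqrt(nnd_H) - np.sqrt(nnd_Hp))
    return beta * term1 + (1 - beta) * term2

# Hypothetical toy distributions: P is homogeneous, Q is maximally dispersed.
P = np.array([[0.5, 0.5], [0.5, 0.5]])
Q = np.array([[1.0, 0.0], [0.0, 1.0]])
print(hd_dissimilarity(P, P))  # 0.0
print(hd_dissimilarity(P, Q))  # 0.5
```

For identical distribution matrices both terms vanish, so the dissimilarity is zero, as the definition requires.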

For clarity, we give a flow diagram of the proposed hypernetwork comparison method in Figure 1. Taking a hypernetwork $H$ as an example, we first compute the $s$-distance distributions ($P^{1},P^{2},\cdots,P^{S_{m}}$). Subsequently, the values of $NND^{s}(H)$ and $\mu_{H}^{s}$ are calculated using Eqs. (5-7); the values of $NND^{s}(H^{\prime})$ and $\mu_{H^{\prime}}^{s}$ are computed in the same way. We then obtain the dissimilarity between $H$ and $H^{\prime}$ following Eq. (8).

Figure 1: Flow diagram of hyper-distance-based method for hypernetwork comparison.

3 Baseline Methods

Previous research [20] has largely relied on the projected ordinary network when comparing hypernetworks: two hypernetworks are first converted into ordinary networks, and the comparison methods for ordinary networks are then applied. We assess the efficacy of our approach against baselines that extend ordinary-network comparison methods to hypernetworks, including a portrait-based information-theoretic approach, an adjacency-tensor-based method, and hypernetwork centrality-based methods. The specifics of each method are outlined below.

3.1 Portrait-based hypernetwork comparison method

For a given value of $s$, the $s$-distance-based hypernetwork portrait $B^{s}$ is an array with elements indexed by $(l,k)$ [12, 22], and we have

B_{l,k}^{s} \equiv \text{the number of nodes that have } k \text{ nodes at } s\text{-distance } l, \quad (9)

where $0\leq l\leq d^{s}$, $0\leq k\leq N-1$, and $d^{s}$ denotes the maximum $s$-distance value. It is noteworthy that the hypernetwork portrait $B^{s}$ is independent of the ordering or labeling of nodes. We use the probability distribution $Q^{s}$ to interpret the entries of $B^{s}$. Thus, the probability that a randomly selected node has $k$ nodes at distance $l$ is expressed by

Q^{s}(l|k) = \frac{1}{N} B^{s}_{l,k}. \quad (10)

We consider two hypernetworks, $H$ and $H^{\prime}$, and use $Q^{s}$ and $Q^{s^{\prime}}$ as probability distributions to interpret the rows of the $s$-distance-based hypernetwork portraits of each. The difference between $H$ and $H^{\prime}$ is expressed by $D^{P}(H,H^{\prime})$ and is stated as:

D^{P}(H,H^{\prime}) = \sum_{s=1}^{S_{m}} \left( \frac{1}{2} KL(Q^{s}\|M^{s}) + \frac{1}{2} KL(Q^{s^{\prime}}\|M^{s}) \right), \quad (11)

where $M^{s}=\frac{1}{2}(Q^{s}+Q^{s^{\prime}})$ and $KL(\cdot\|\cdot)$ is the Kullback-Leibler divergence [23] between two distributions. We compare the difference between $H$ and $H^{\prime}$ by comparing the difference of the portrait distributions at every order of distance as $s$ varies.

3.2 Tensor-based hypernetwork comparison method

The adjacency tensor of a hypernetwork $H$ is denoted by $T\in\mathbb{R}^{\xi\times N\times N}$ [24, 25, 26], where $\xi$ is the maximum size of the hyperedges and $N$ is the number of nodes. If nodes $v_{i}$ and $v_{j}$ are both present in a hyperedge of size $k$, then $T_{kij}=1$; otherwise $T_{kij}=0$. Subsequently, we use the adjacency tensor $T$ to compare the difference between two hypernetworks $H$ and $H^{\prime}$, which is represented as follows:

D_{T}(H,H^{\prime}) = \sum_{k=2}^{\xi} \sum_{i,j=1}^{N} \frac{1}{\gamma^{k}} \left| T_{kij} - T^{\prime}_{kij} \right|, \quad (12)

where $\gamma$ is a tunable parameter; we choose $\gamma=2$ in the subsequent experiments.
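A sketch of Eq. (12) built directly from hyperedge lists (node labels are 0-indexed integers; the toy hyperedges are hypothetical):

```python
import numpy as np

def tensor_dissimilarity(E1, E2, n, gamma=2.0):
    """Tensor-based comparison of Eq. (12): a hyperedge of size k
    sets entries in slice k of the adjacency tensor, and slices are
    weighted by 1/gamma^k (sketch; slice index k is used directly)."""
    xi = max(len(e) for e in E1 + E2)   # maximum hyperedge size
    T1 = np.zeros((xi + 1, n, n))
    T2 = np.zeros((xi + 1, n, n))
    for T, E in ((T1, E1), (T2, E2)):
        for e in E:
            k = len(e)
            for i in e:
                for j in e:
                    if i != j:
                        T[k, i, j] = 1
    return sum((1.0 / gamma ** k) * np.abs(T1[k] - T2[k]).sum()
               for k in range(2, xi + 1))

E1 = [{0, 1}, {1, 2, 3}]  # hypothetical toy hypernetworks
E2 = [{0, 1}, {0, 2, 3}]
print(tensor_dissimilarity(E1, E2, 4))  # the size-3 slices differ in 8 entries
```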

3.3 Centrality-based hypernetwork comparison methods

To perform hypernetwork comparisons, we construct centrality vectors for nodes or hyperedges based on various centrality measures. The centralities are collected in a vector $c=(c_{1},c_{2},\dots,c_{N})^{T}$ for nodes or $c=(c_{1},c_{2},\dots,c_{M})^{T}$ for hyperedges, with $N$ and $M$ being the total numbers of nodes and hyperedges. The formula to compare two hypernetworks $H$ and $H^{\prime}$ using node centrality methods can be written as:

D_{c}(H,H^{\prime}) = \sum_{i=1}^{N} |c_{i} - c^{\prime}_{i}|, \quad (13)

and the formula for comparing $H$ and $H^{\prime}$ using hyperedge centrality methods is given by

D_{c}(H,H^{\prime}) = \sum_{i=1}^{M} |c_{i} - c^{\prime}_{i}|. \quad (14)

We utilize six centrality measures [2, 27, 21, 28, 29, 30] to compare hypernetworks: three node centrality methods, namely Hyper Degree Centrality (HDC), Normalized Degree Centrality (NDC), and Vector Centrality (VC), and three hyperedge centrality methods, namely $s$-Eccentricity Centrality (ECC), Edge-based Degree Centrality (EDC), and $s$-Harmonic Closeness Centrality (HCC). Each centrality measure yields a centrality vector for nodes or hyperedges, which is then used as the characteristic for hypernetwork comparison.

4 Results

4.1 Comparison of synthetic hypernetworks

We evaluate the performance of our approach by comparing synthetic uniform hypernetworks created by the Erdős-Rényi hypernetwork model (ERH) [20], the Watts-Strogatz hypernetwork model (WSH) [31], and the Scale-Free hypernetwork model (SFH) [32, 33]. Details of how the synthetic hypernetworks are generated are given below:

Erdős-Rényi hypernetwork model (ERH). The ERH model constructs a $k^{E}$-uniform hypernetwork $H=(V,E)$ given the number of nodes $N$ and the number of hyperedges $M$. Specifically, we first randomly select $k^{E}$ nodes to form a hyperedge $e$ and add $e$ to $E$ if $e\notin E$; otherwise, we reselect the nodes. We repeat the above process $M$ times to obtain $H$.

Watts-Strogatz hypernetwork model (WSH). The WSH model generates a $k^{E}$-uniform WS hypernetwork given the number of nodes $N$ and the rewiring probability $q$. The model is described as follows: initially, a $k^{E}$-ring hypernetwork $H=(V,E)$ is given, in which all nodes are arranged in a circle and each node forms two hyperedges with its $k^{E}-1$ left and right neighbors, respectively. For each hyperedge $e$, we randomly select $k^{E}$ nodes to form a new hyperedge $e^{\prime}$. If $e^{\prime}\notin E$, we replace $e$ by $e^{\prime}$ with probability $q$. The process terminates when all hyperedges in $H$ have been traversed.

Scale-free hypernetwork model (SFH). Given the number of nodes $N$, the number of hyperedges $M$, and the degree distribution $p(k)\sim k^{-r}$, we generate a $k^{E}$-uniform SF hypernetwork through the following steps: (i) we create a node degree sequence based on the degree distribution $p(k)\sim k^{-r}$, and each node $v_{i}$ is assigned a probability $p_{i}$ according to the degree sequence; (ii) for an empty hyperedge $e$, we add node $v_{i}$ to $e$ with probability $p_{i}$ until the size of $e$ reaches $k^{E}$; if $e$ already exists in $H$, we regenerate $e$; (iii) repeat step (ii) $M$ times.
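The ERH construction, the simplest of the three generators, can be sketched as follows (a minimal implementation; the function and variable names are our own):

```python
import random

def erdos_renyi_hypernetwork(n, m, k, seed=None):
    """ERH model sketch: draw m distinct hyperedges, each a uniform
    random k-subset of the n nodes, yielding a k-uniform hypernetwork."""
    rng = random.Random(seed)
    edges = set()
    while len(edges) < m:
        e = frozenset(rng.sample(range(n), k))
        edges.add(e)  # a duplicate hyperedge is rejected by the set
    return [set(e) for e in edges]

H = erdos_renyi_hypernetwork(n=1000, m=500, k=4, seed=42)
print(len(H), all(len(e) == 4 for e in H))  # 500 True
```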

The results of comparing uniform hypernetworks generated by ERH using the various methods are depicted in Figure 2, with $k^{E}$ representing the size of the hyperedges. For HD, we observe that hypernetworks are similar when their values of $k^{E}$ are close, which is in agreement with the way the hypernetworks are generated. The other baseline methods (except Portrait and ECC) show trends similar to those of HD, indicating that the majority of the methods can distinguish hypernetworks generated by ERH with different $k^{E}$. We analyze the hypernetworks created by WSH and compare them in Figure 3 based on different rewiring probabilities $q$. The results demonstrate that HD, Tensors, EDC, HCC, and NDC are capable of distinguishing hypernetworks created with different $q$ values, whereas the other methods cannot. For instance, the dissimilarity between networks generated with $q=0.1$ and $q=0.5$ and that between networks generated with $q=0.3$ and $q=0.5$ are roughly the same, which is counter-intuitive. For the uniform hypernetworks generated by SFH, we show the difference between networks generated with different values of $r$ (see Figure 4), the exponent of the degree distribution. Intuitively, hypernetworks should be alike when their values of $r$ are close, and HD exhibits this pattern. In contrast, the other baselines are unable to distinguish between hypernetworks created with different $r$ values. Overall, HD robustly distinguishes hypernetworks generated by different models, whereas the baselines show weaknesses in different scenarios.

Figure 2: Comparison of hypernetworks generated by ERH, where $N=1000$ and $M=500$. The methods are: (a) HD; (b) Portrait; (c) Tensors; (d) HDC; (e) NDC; (f) VC; (g) ECC; (h) EDC; (i) HCC.
Figure 3: Comparison of hypernetworks generated by WSH, where $N=1000$, $M=3000$, and $k^{E}=4$. The methods are: (a) HD; (b) Portrait; (c) Tensors; (d) HDC; (e) NDC; (f) VC; (g) ECC; (h) EDC; (i) HCC.
Figure 4: Comparison of hypernetworks generated by SFH, where $N=1000$, $M=500$, and $k^{E}=4$. The methods are: (a) HD; (b) Portrait; (c) Tensors; (d) HDC; (e) NDC; (f) VC; (g) ECC; (h) EDC; (i) HCC.

We further explore whether our method can classify hypernetworks generated by different mechanisms; the results are given in Figure 5. We generate ERHs, WSHs, and SFHs with a network size of $N=1000$, with 30 hypernetworks for each type. The dissimilarities between the hypernetworks are calculated by HD, and we adopt a multidimensional scaling map to show the distances between them in a geometric space. A shorter distance indicates that two hypernetworks are more similar. The results demonstrate that HD is effective in classifying hypernetworks, as those generated by the same mechanism are grouped together and those generated by different mechanisms are separated.

Figure 5: Synthetic hypernetwork classification. Red: ERH ($N=1000$, $M=500$, $k^{E}=4$); Blue: SFH ($N=1000$, $M=500$, $k^{E}=4$, $r=2.1$); Green: WSH ($N=1000$, $M=3000$, $k^{E}=4$, $q=0.05$).

4.2 Comparison of empirical hypernetworks

We use six hypernetworks created from empirical data to demonstrate the effectiveness of our approach. Yuecai and Chuancai are recipe hypernetworks in which the hyperedges are made up of the components of each dish. Restaurants-Rev and Bars-Rev are review hypernetworks obtained from Yelp.com, composed of users who have reviewed the same restaurant or bar. NDC-classes is a hypernetwork of drugs, with each hyperedge representing a combination of class labels that make up a single drug. Algebra is collected from MathOverflow.net, where users who answered the same question are enclosed in a hyperedge. The topological characteristics of the hypernetworks are presented in Table 1, including the number of nodes, the number of hyperedges, the average degree, the average hyperdegree, the average size of the hyperedges, the clustering coefficient, the average shortest path length, the diameter, and the average link density of each hypernetwork. It is worth noting that the clustering coefficient, the average shortest path length, the diameter, and the average link density are computed on the ordinary networks derived from the hypernetworks.

Table 1: The topological properties of the hypernetworks, where $N$, $M$, $\langle k\rangle$, $\langle k^{H}\rangle$, $\langle k^{E}\rangle$, $C$, $\langle l\rangle$, $d$, and $\rho$ denote the number of nodes, the number of hyperedges, the average degree, the average hyperdegree, the average size of the hyperedges, the clustering coefficient, the average shortest path length, the diameter, and the average link density of the hypernetwork, respectively.
Data $N$ $M$ $\langle k\rangle$ $\langle k^{H}\rangle$ $\langle k^{E}\rangle$ $C$ $\langle l\rangle$ $d$ $\rho$
Yuecai 497 867 4.52 3.17 1.82 0.39 3.44 8 0.01
Chuancai 438 835 5.42 3.40 1.79 0.35 3.42 8 0.01
Restaurants-Rev 565 601 8.14 8.14 7.66 0.54 1.98 5 0.14
Bars-Rev 1234 1194 174.30 9.62 9.93 0.58 2.10 6 0.14
NDC-classes 1161 1088 10.72 5.54 5.92 0.61 3.50 9 0.01
Algebra 423 1268 78.90 19.53 6.52 0.79 1.95 5 0.19
Figure 6: Results of comparing the differences under perturbation across different methods for the empirical hypernetworks: (a) Yuecai; (b) Chuancai; (c) Restaurants-Rev; (d) Bars-Rev; (e) NDC-classes; (f) Algebra.

We assess the effectiveness of the proposed method by performing hyperedge perturbations on the empirical hypernetworks. We randomly add or remove a certain proportion ($f\in[-0.9,0.9]$) of hyperedges from a hypernetwork: a positive value of $f$ indicates the addition of hyperedges, while a negative value implies deletion. The size of each added hyperedge is randomly chosen from the range $[2,\xi]$, where $\xi$ is the maximum hyperedge size. We compare the difference between the original hypernetwork and the perturbed hypernetworks; the results are shown in Figure 6. The proposed method, HD, exhibits a clear increasing trend of the dissimilarity values under both hyperedge deletion and addition in all the empirical hypernetworks. It follows that the more we change the number of hyperedges, the more the resulting hypernetwork differs from the original one, which is in line with our expectations. In contrast, the baseline methods are inferior to HD. For example, the Portrait method shows a clear decreasing trend of dissimilarity values under hyperedge deletion for hypernetworks such as Yuecai, Chuancai, and Algebra, which would imply that deleting more hyperedges makes the resulting hypernetwork more similar to the original one. Similarly, the Tensors method shows a decreasing trend under hyperedge deletion for all six hypernetworks, while its increasing trend under hyperedge addition is not significant for Restaurants-Rev, Bars-Rev, NDC-classes, and Algebra. Moreover, the six centrality methods, namely ECC, EDC, HCC, HDC, NDC, and VC, show either an indistinct increasing trend or a decreasing trend under hyperedge deletion for the six hypernetworks.
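The perturbation procedure can be sketched as follows (a minimal implementation assuming hyperedges are sets of 0-indexed node labels; the helper name and toy example are ours):

```python
import random

def perturb_hyperedges(E, f, n, xi, seed=None):
    """Hyperedge perturbation sketch for f in [-0.9, 0.9]: a negative f
    deletes a fraction |f| of hyperedges at random; a positive f adds
    f*|E| new hyperedges whose sizes are drawn uniformly from [2, xi]."""
    rng = random.Random(seed)
    E = list(E)
    num = int(abs(f) * len(E))
    if f < 0:
        keep = rng.sample(range(len(E)), len(E) - num)
        return [E[i] for i in keep]
    for _ in range(num):
        size = rng.randint(2, xi)
        E.append(set(rng.sample(range(n), size)))
    return E

E = [{0, 1}, {1, 2}, {2, 3}, {3, 4}]  # hypothetical toy hyperedges
print(len(perturb_hyperedges(E, -0.5, 5, 3, seed=1)))  # 2
print(len(perturb_hyperedges(E, 0.5, 5, 3, seed=1)))   # 6
```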

4.3 Parameter Analysis

The method we propose contains a parameter $\beta$ that balances the importance of the global and local dissimilarity between two hypernetworks. We further explore how the value of HD changes with $\beta$; the results are given in Figure 7 for the six empirical hypernetworks. We observe that the values of HD with higher $\beta$ are lower for the hyperedge deletion process. However, for the hyperedge addition process, there is no consistent pattern of how HD changes with $\beta$.

Figure 7: Comparing the differences under perturbation for different hypernetworks while altering the value of $\beta$: (a) Yuecai; (b) Chuancai; (c) Restaurants-Rev; (d) Bars-Rev; (e) NDC-classes; (f) Algebra.

5 Conclusions & Discussion

In this paper, we propose the hyper-distance-based network comparison method (HD) to compare the dissimilarity between hypernetworks. HD is based on higher-order distances and utilizes node high-order distance distributions together with the Jensen-Shannon divergence. Specifically, we first compute the $s$-distance between each pair of nodes using $s$-walks and generate the $s$-distance distribution matrices of the hypernetwork. Next, we compare two hypernetworks via the Jensen-Shannon divergence, which encodes both the difference in average connectivity between nodes at different $s$-distances and the heterogeneity of the distance distributions. We compare the proposed HD method with other baselines on various synthetic and empirical hypernetworks. The experimental results demonstrate that HD is superior to the baselines in revealing the dissimilarity between hypernetworks. Future research directions include further expanding and improving the higher-order-distance-based method for comparing hypernetwork dissimilarity to adapt to more complex and large-scale network datasets. Furthermore, it can be extended to other types of networks, such as multilayer hypernetworks and temporal hypernetworks [34, 35].

6 Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

7 Data availability

Data will be available on request.

8 Acknowledgments

This work was supported by Natural Science Foundation of Zhejiang Province (Grant No. LQ22F030008), the National Natural Science Foundation of China (Grant No. 92146001), the Fundamental Research Funds for the Central Universities of Ministry of Education of China, and the Scientific Research Foundation for Scholars of HZNU (2021QDL030).

References

  • [1] Boccaletti S, Latora V, Moreno Y, Chavez M, Hwang DU. Complex networks: Structure and dynamics. Phys Rep 2006;424:175–308.
  • [2] Tantardini M, Ieva F, Tajoli L, Piccardi C. Comparing methods for comparing networks. Sci Rep 2019;9:1–19.
  • [3] Albert R, Barabási AL. Statistical mechanics of complex networks. Rev Mod Phys 2002;74:47.
  • [4] Hand DJ. Principles of data mining. Drug Saf 2007;30:621–622.
  • [5] Rolnick D, Donti PL, Kaack LH, Kochanski K, Lacoste A, Sankaran K, Ross AS, Milojevic-Dupont N, Jaques N, Waldman-Brown A, et al. Tackling climate change with machine learning. ACM Comput Surv 2022;55:1–96.
  • [6] Xu HC, Wang ZY, Jawadi F, Zhou WX. Reconstruction of international energy trade networks with given marginal data: A comparative analysis. Chaos Solit Fractals 2023;167:113031.
  • [7] Fkih F. Similarity measures for Collaborative Filtering-based Recommender Systems: Review and experimental comparison. J King Saud Univ Comput Inf Sci 2022;34:7645–7669.
  • [8] Tabak MA, Norouzzadeh MS, Wolfson DW, Sweeney SJ, VerCauteren KC, Snow NP, Halseth JM, Di Salvo PA, Lewis JS, White MD, et al. Machine learning to classify animal species in camera trap images: Applications in ecology. Methods Ecol Evol 2019;10:585–590.
  • [9] Greener JG, Kandathil SM, Moffat L, Jones DT. A guide to machine learning for biologists. Nat Rev Mol Cell Biol 2022;23:40–55.
  • [10] Koutra D, Shah N, Vogelstein JT, Gallagher B, Faloutsos C. Deltacon: Principled massive-graph similarity function with attribution. ACM Trans Knowl Discov Data 2016;10:1–43.
  • [11] Aliakbary S, Motallebi S, Rashidian S, Habibi J, Movaghar A. Distance metric learning for complex networks: Towards size-independent comparison of network structures. Chaos 2015;25:023111.
  • [12] Bagrow JP, Bollt EM. An information-theoretic, all-scales approach to comparing networks. Appl Netw Sci 2019;4:1–15.
  • [13] Wang Z, Zhan XX, Liu C, Zhang ZK. Quantification of network structural dissimilarities based on network embedding. iScience 2022;25.
  • [14] Newman ME. The structure and function of networks. Comput Phys Commun 2002;147:40–45.
  • [15] Battiston F, Cencetti G, Iacopini I, Latora V, Lucas M, Patania A, Young JG, Petri G. Networks beyond pairwise interactions: structure and dynamics. Phys Rep 2020;874:1–92.
  • [16] Benson AR, Gleich DF, Leskovec J. Higher-order organization of complex networks. Science 2016;353:163–166.
  • [17] Sweeney P, Chen C, Rajapakse I, Cone RD. Network dynamics of hypothalamic feeding neurons. Proc Natl Acad Sci 2021;118:e2011140118.
  • [18] Xie M, Zhan XX, Liu C, Zhang ZK. An efficient adaptive degree-based heuristic algorithm for influence maximization in hypergraphs. Inf Process Manag 2023;60:103161.
  • [19] Chodrow PS, Veldt N, Benson AR. Generative hypergraph clustering: From blockmodels to modularity. Sci Adv 2021;7:eabh1303.
  • [20] Surana A, Chen C, Rajapakse I. Hypergraph similarity measures. IEEE Trans Netw Sci Eng 2022;10:658–674.
  • [21] Aksoy SG, Joslyn C, Marrero CO, Praggastis B, Purvine E. Hypernetwork science via high-order hypergraph walks. EPJ Data Sci 2020;9:16.
  • [22] Bagrow JP, Bollt EM, Skufca JD, Ben-Avraham D. Portraits of complex networks. EPL 2008;81:68004.
  • [23] Van Erven T, Harremos P. Rényi divergence and Kullback-Leibler divergence. IEEE Trans Inf Theory 2014;60:3797–3820.
  • [24] Banerjee A, Char A, Mondal B. Spectra of general hypergraphs. Linear Algebra Appl 2017;518:14–30.
  • [25] Li G, Qi L, Yu G. The Z-eigenvalues of a symmetric tensor and its application to spectral hypergraph theory. Numer Linear Algebra Appl 2013;20:1001–1029.
  • [26] Chang J, Chen Y, Qi L, Yan H. Hypergraph clustering using a new laplacian tensor with applications in image processing. SIAM J Imaging Sci 2020;13:1157–1178.
  • [27] Kovalenko K, Romance M, Vasilyeva E, Aleja D, Criado R, Musatov D, Raigorodskii AM, Flores J, Samoylenko I, Alfaro-Bittner K, et al. Vector centrality in hypergraphs. Chaos Solitons Fractals 2022;162:112397.
  • [28] Hu F, Ma L, Zhan XX, Zhou Y, Liu C, Zhao H, Zhang ZK. The aging effect in evolving scientific citation networks. Scientometrics 2021;126:4297–4309.
  • [29] Wang JW, Rong LL, Deng QH, Zhang JY. Evolving hypernetwork model. Eur Phys J B 2010;77:493–498.
  • [30] Xie X, Zhan X, Zhang Z, Liu C. Vital node identification in hypergraphs via gravity model. Chaos 2023;33:013104.
  • [31] Watts DJ, Strogatz SH. Collective dynamics of 'small-world' networks. Nature 1998;393:440–442.
  • [32] Ko J, Kook Y, Shin K. Growth patterns and models of real-world hypergraphs. Knowl Inf Syst 2022;64:2883–2920.
  • [33] Barabási AL, Albert R. Emergence of scaling in random networks. Science 1999;286:509–512.
  • [34] Fischer MT, Arya D, Streeb D, Seebacher D, Keim DA, Worring M. Visual analytics for temporal hypergraph model exploration. IEEE Trans Vis Comput Graph 2020;27:550–560.
  • [35] Holme P, Saramäki J. Temporal networks. Phys Rep 2012;519:97–125.