
A hyper-distance-based method for hypernetwork comparison

Tao Xu Xiaowen Xie Zi-Ke Zhang Chuang Liu Xiu-Xiu Zhan [email protected] Alibaba Research Center for Complexity Sciences, Hangzhou Normal University, Hangzhou, 311121, P. R. China Digital Communication Research Center, Zhejiang University, Hangzhou 310058, China College of Media and International Culture, Zhejiang University, Hangzhou 310058, PR China
Abstract

Hypernetworks are a useful way to depict multiple connections between nodes, making them an ideal tool for representing complex relationships in network science. In recent years, there has been a marked increase in studies on hypernetworks; however, the comparison of differences between two hypernetworks has received less attention. This paper proposes a hyper-distance-based method (HD) for comparing hypernetworks. The method takes into account high-order information, such as the high-order distance between nodes. Experiments carried out on synthetic hypernetworks show that HD is capable of distinguishing between hypernetworks generated with different parameters and is successful in the classification of hypernetworks. Furthermore, HD outperforms current state-of-the-art baselines in distinguishing empirical hypernetworks when hyperedges are disrupted.

keywords:
Hypernetwork, Higher-order Distance, Hypernetwork Comparison, Jensen-Shannon Divergence

1 Introduction

With the rapid development of the Internet and the explosive growth of information  [1], the size and complexity of the network data continue to increase. In the modern digital era, it is essential to compare and evaluate the distinctions between various networks [2] in order to gain a thorough understanding of the structure, function, and behavior of the network. The exploration of network comparison is a significant area of study with implications for disciplines such as network science, data mining, and machine learning [3, 4, 5, 6] and can be used to solve problems such as network classification, algorithm evaluation, and anomaly detection [7, 8, 9]. Nevertheless, the customary techniques for contrasting networks mainly concentrate on the low-order features of networks. For instance, Han et al. [4] proposed a straightforward and effective measure, i.e., the Jaccard method, which calculates the difference between the adjacency matrices of two networks. Koutra et al. [10] employed a method called Deltacon to compare the similarity between all pairs of nodes in two networks. Aliakbary et al. [11] analyzed the graphlet distribution of a network and compared the similarity between two networks through the graphlet distribution. Furthermore, Bagrow et al. [12] introduced the network portrait and KL divergence to compare two networks. Despite the fact that these techniques may partially uncover the fundamental characteristics of networks [13], they have restrictions on grasping the intricate multivariate connections existing in complex systems. In the real world, many complex systems exhibit rich multivariate relationships, including dependencies among multiple variables and interactions among multiple objects. For example, in collaboration networks, multiple individuals may collaborate on one project. In the brain’s neural system, multiple regions can be highly active simultaneously. 
In social platforms, groups made up of multiple individuals share information and interact with each other [14, 15, 16, 17]. The relationships mentioned above can be modeled by hypernetworks, which are networks composed of nodes and hyperedges. A hyperedge may contain more than two nodes, and a single node may belong to multiple hyperedges [18]. Representing a complex system as a hypernetwork can break the limitation of "binary interactions" in an ordinary network and better reflect the multidimensional interactions present in the system [19]. Several methods have been proposed to compare differences in hypernetworks, such as those of Surana et al. [20]. However, most of these methods are adaptations of ordinary network comparison methods and do not fully utilize the high-order characteristics specific to hypernetworks.

To address this problem, we propose a hyper-distance-based network comparison method (HD), which reveals a comprehensive picture of network dissimilarity and characterizes the dissimilarity between hypernetworks using high-order information. The method compares the differences between two hypernetworks through the shortest path lengths defined by the high-order distance. We apply the method to compare hypernetworks generated by network models as well as empirical hypernetworks deduced by real data from different domains. The results show that our method can successfully compare different networks and perform better than the baselines.

The rest of this paper is structured as follows. Section 2 outlines the specifics of the hyper-distance-based network comparison approach. Section 3 provides an overview of the baseline methods. Section 4 presents the experimental results of comparing different hypernetworks. Finally, Section 5 summarizes the findings and suggests potential future research directions.

2 Hyper-distance-based network comparison method

In this section, we propose a hypernetwork comparison method based on the local and global properties of hypernetworks. We first illustrate the basic definitions including the high-order distance, and the hypernetwork comparison method is further defined based on the high-order distance.

2.1 Basic Definitions

A hypernetwork is defined as $H=(V,E)$, with a set of nodes $V=\{v_{1},v_{2},\cdots,v_{N}\}$ and a set of hyperedges $E=\{e_{1},e_{2},\cdots,e_{M}\}$, where $N$ and $M$ represent the number of nodes and hyperedges, respectively. An incidence matrix $I_{N\times M}$ is constructed based on the membership relationship between nodes and hyperedges: if node $v_{i}$ is part of hyperedge $e_{j}$, then $I_{ij}=1$; otherwise $I_{ij}=0$. Therefore, we can obtain the following.

k^{H}(v_{i}) = \sum_{e_{j}\in E} I_{ij}, \quad (1)
k^{E}(e_{j}) = \sum_{v_{i}\in V} I_{ij}, \quad (2)

where $k^{H}(v_{i})$ and $k^{E}(e_{j})$ represent the hyperdegree of node $v_{i}$ and the size of hyperedge $e_{j}$, respectively.
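As a quick illustration of Eqs. (1-2), the hyperdegree and hyperedge-size sequences are simply the row and column sums of the incidence matrix. A minimal sketch on a hypothetical toy hypernetwork (the node and hyperedge labels are illustrative, not taken from the paper):

```python
import numpy as np

# Toy incidence matrix I for a hypothetical hypernetwork with
# N = 4 nodes (rows) and M = 3 hyperedges (columns).
I = np.array([
    [1, 0, 1],  # v_1 belongs to e_1 and e_3
    [1, 1, 0],  # v_2 belongs to e_1 and e_2
    [0, 1, 1],  # v_3 belongs to e_2 and e_3
    [0, 0, 1],  # v_4 belongs to e_3
])

hyperdegree = I.sum(axis=1)  # k^H(v_i), Eq. (1): row sums
edge_size = I.sum(axis=0)    # k^E(e_j), Eq. (2): column sums

print(hyperdegree.tolist())  # [2, 2, 2, 1]
print(edge_size.tolist())    # [2, 2, 3]
```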

We create the ordinary network $G=(V,E^{o})$ of $H$ by linking all nodes that belong to a common hyperedge in $H$. In other words, an edge $e^{o}=(v_{i},v_{j})\in E^{o}$ exists if $v_{i}$ and $v_{j}$ both appear in at least one hyperedge in $E$. The network $G$ is unweighted and undirected, and its adjacency matrix is denoted by $A^{o}$: if there is an edge between nodes $v_{i}$ and $v_{j}$ in $G$, then $A^{o}_{ij}=1$; otherwise $A^{o}_{ij}=0$.
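The projection of $H$ onto its ordinary network $G$ can be sketched as follows (a minimal implementation assuming hyperedges are given as sets of 0-indexed node labels; the toy hyperedge list is hypothetical):

```python
from itertools import combinations
import numpy as np

def clique_expansion(hyperedges, n):
    """Adjacency matrix A^o of the ordinary network G: every pair of
    nodes that co-occurs in at least one hyperedge is linked."""
    A = np.zeros((n, n), dtype=int)
    for e in hyperedges:
        for i, j in combinations(sorted(e), 2):
            A[i, j] = A[j, i] = 1
    return A

E = [{0, 1}, {1, 2}, {0, 2, 3}]  # hypothetical toy hyperedges
A = clique_expansion(E, 4)
print(A[0, 1], A[1, 3])  # 1 0
```

Note that the projection is lossy: different hypernetworks can produce the same $G$, which is precisely why the HD method works with $s$-distances rather than with $G$ alone.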

The $s$-distance between nodes: We employ the hyperedge distance from [21] to calculate the $s$-distance between two nodes. Two hyperedges are considered $s$-adjacent if they have at least $s$ nodes in common. Concretely, the $s$-distance between two hyperedges $e_{i}$ and $e_{j}$ is denoted by $L_{s}^{E}(e_{i},e_{j})$ and is expressed as:

L_{s}^{E}(e_{i},e_{j}) = \begin{cases} \text{the shortest } s\text{-path length}, & \text{if there exists an } s\text{-path between } e_{i} \text{ and } e_{j} \\ \infty, & \text{if there is no } s\text{-path between } e_{i} \text{ and } e_{j} \end{cases}. \quad (3)

Based on the above definition, the $s$-distance between two nodes $v_{i}$ and $v_{j}$, denoted $L_{s}^{V}(v_{i},v_{j})$, can be defined as follows:

L_{s}^{V}(v_{i},v_{j}) = \begin{cases} 1, & \text{if } v_{i} \text{ and } v_{j} \text{ exist in the same hyperedge} \\ \min(L_{s}^{E}(e_{m},e_{n})) + 1, & v_{i}\in e_{m},\ v_{j}\in e_{n} \end{cases}. \quad (4)
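A minimal sketch of Eqs. (3-4): hyperedges overlapping in at least $s$ nodes form an $s$-line graph, a multi-source breadth-first search over that graph gives the shortest $s$-path length between hyperedges, and the node $s$-distance follows Eq. (4). Hyperedges are assumed to be sets of hashable node labels, and the toy hyperedges are hypothetical:

```python
from collections import deque
from itertools import combinations
import math

def s_distance(hyperedges, u, v, s):
    """s-distance between nodes u and v (Eq. 4): 1 if they share a
    hyperedge, otherwise the shortest s-path length between hyperedges
    containing them, plus one. Returns math.inf when no s-path exists."""
    if any(u in e and v in e for e in hyperedges):
        return 1
    # s-line graph: hyperedges are s-adjacent if they share >= s nodes.
    M = len(hyperedges)
    adj = {i: [] for i in range(M)}
    for i, j in combinations(range(M), 2):
        if len(hyperedges[i] & hyperedges[j]) >= s:
            adj[i].append(j)
            adj[j].append(i)
    sources = [i for i, e in enumerate(hyperedges) if u in e]
    targets = {i for i, e in enumerate(hyperedges) if v in e}
    # Multi-source BFS over hyperedges realizes min L_s^E(e_m, e_n).
    dist = {i: 0 for i in sources}
    q = deque(sources)
    while q:
        i = q.popleft()
        if i in targets:
            return dist[i] + 1
        for j in adj[i]:
            if j not in dist:
                dist[j] = dist[i] + 1
                q.append(j)
    return math.inf  # no s-path connects any pair of containing hyperedges

E = [{0, 1, 2}, {2, 3, 4}, {4, 5}]  # hypothetical toy hyperedges
print(s_distance(E, 0, 5, 1))  # 3: the 1-path e_1 -> e_2 -> e_3 has length 2
print(s_distance(E, 0, 5, 2))  # inf: no pair of hyperedges shares 2 nodes
```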

2.2 Hyper-distance-based method for hypernetwork comparison

The $s$-distance distribution of a hypernetwork: The proportion of nodes whose $s$-distance from node $v_{i}$ equals $j$ is denoted by $p_{i}^{s}(j)$. The $s$-distance distribution of node $v_{i}$ is represented by a vector of size $1\times(d^{s}+1)$, denoted $P_{i}^{s}=\{p_{i}^{s}(j)\}$, where $d^{s}$ is the maximum value of the $s$-distance. The proportion of nodes at infinite $s$-distance from node $v_{i}$ is represented by $p_{i}^{s}(d^{s}+1)$. Consequently, we define the $s$-distance distribution of the hypernetwork $H$ as $P^{s}=\{P_{1}^{s},P_{2}^{s},\cdots,P_{N}^{s}\}$, which captures detailed topological information with respect to the $s$-distance between nodes. When $s=1$, we obtain the distance distribution of the ordinary network $G$ derived from $H$; when $s>1$, a high-order distance distribution is revealed, reflecting the various high-order topological relationships between nodes indicated by different values of $s$. It should be noted that for a hypernetwork $H$, we have $1\leq s\leq S_{m}$, where $S_{m}$ represents the maximum number of overlapping nodes between hyperedges.
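Given a matrix $D$ of pairwise $s$-distances, the node distribution $P_{i}^{s}$ can be assembled as follows (a sketch; we exclude the self-distance and place the infinity proportion in the last entry, per the definition above, and the toy distance matrix is hypothetical):

```python
import numpy as np

def distance_distribution(D, i):
    """P_i^s of node i from an N x N matrix D of pairwise s-distances
    (np.inf marks unreachable pairs). Entry j-1 holds p_i^s(j) for
    j = 1..d^s, and the last entry holds the proportion of nodes at
    infinite s-distance (sketch; self-distances are excluded)."""
    row = np.delete(D[i], i)              # distances to the other N-1 nodes
    d_max = int(D[np.isfinite(D)].max())  # d^s over the whole hypernetwork
    P = np.zeros(d_max + 1)
    for j in range(1, d_max + 1):
        P[j - 1] = np.mean(row == j)
    P[-1] = np.mean(np.isinf(row))
    return P

# Hypothetical 4-node example: node 3 is disconnected from the rest.
D = np.array([[0, 1, 2, np.inf],
              [1, 0, 1, np.inf],
              [2, 1, 0, np.inf],
              [np.inf, np.inf, np.inf, 0]])
# One third of node 0's peers sit at distance 1, 2, and infinity, respectively.
print(distance_distribution(D, 0))
```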

Dissimilarity between two hypernetworks: Given a hypernetwork $H$ and its $s$-distance distributions $P^{s}=\{P_{1}^{s},P_{2}^{s},\cdots,P_{N}^{s}\}$ $(1\leq s\leq S_{m})$, we leverage the Jensen-Shannon divergence to define the $s$-order heterogeneity, named the $s$-order hypernetwork node dispersion $NND^{s}(H)$. The definition of $NND^{s}$ can be expressed as follows:

NND^{s}(H) = \frac{J(P_{1}^{s},P_{2}^{s},\cdots,P_{N}^{s})}{\ln(d^{s}+1)}, \quad (5)

where $J(P_{1}^{s},P_{2}^{s},\cdots,P_{N}^{s})$ represents the Jensen-Shannon divergence of the node $s$-distance distributions, defined as:

J(P_{1}^{s},P_{2}^{s},\cdots,P_{N}^{s}) = \frac{1}{N} \sum_{i,j} p_{i}^{s}(j) \ln\left(\frac{p_{i}^{s}(j)}{\mu_{j}^{s}}\right), \quad (6)

where $\mu_{j}^{s}$ denotes the average of the $N$ $s$-distance distributions, given by:

\mu_{j}^{s} = \frac{1}{N} \sum_{i=1}^{N} p_{i}^{s}(j). \quad (7)

We compare the dissimilarity between two hypernetworks, i.e., $H$ and $H^{\prime}$, based on the $s$-order heterogeneity defined above. Specifically, the structural dissimilarity $D(H,H^{\prime})$ is written as follows:

D(H,H^{\prime}) = \beta \sum_{s=1}^{S_{m}} \sqrt{\frac{J(\mu_{H}^{s},\mu_{H^{\prime}}^{s})}{\ln 2}} + (1-\beta) \sum_{s=1}^{S_{m}} \left| \sqrt{NND^{s}(H)} - \sqrt{NND^{s}(H^{\prime})} \right|, \quad (8)

where $J(\mu_{H}^{s},\mu_{H^{\prime}}^{s})$ denotes the Jensen-Shannon divergence of the average $s$-distance distributions of $H$ and $H^{\prime}$. The dissimilarity $D(H,H^{\prime})$ is composed of two terms: the first compares the average connectivity of nodes at different $s$-distances, capturing the global dissimilarity between the two hypernetworks; the second integrates the dissimilarities among the node dispersions obtained by varying $s$, indicating the local differences between the two hypernetworks. We use a parameter $\beta$ $(0<\beta<1)$ to adjust the relative importance of these two terms. A higher value of $D(H,H^{\prime})$ indicates a larger difference between $H$ and $H^{\prime}$.
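Under the definitions above, Eqs. (5-8) can be sketched for a single order $s$ as follows (assuming the two $N\times(d^{s}+1)$ distribution matrices are padded to the same number of columns; summing the returned value over $s=1,\dots,S_{m}$ yields the full measure):

```python
import numpy as np

def nnd(P):
    """s-order node dispersion NND^s (Eqs. 5-7). P is the N x (d^s+1)
    matrix whose rows are the node s-distance distributions."""
    mu = P.mean(axis=0)                        # Eq. (7)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(P > 0, P * np.log(P / mu), 0.0)
    J = terms.sum() / P.shape[0]               # Eq. (6)
    return J / np.log(P.shape[1]), mu          # Eq. (5)

def js_divergence(p, q):
    """Jensen-Shannon divergence of two discrete distributions."""
    m = (p + q) / 2
    def kl(a, b):
        mask = a > 0
        return np.sum(a[mask] * np.log(a[mask] / b[mask]))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def hd_dissimilarity(P_H, P_Hp, beta=0.5):
    """Contribution of one order s to Eq. (8)."""
    nnd_H, mu_H = nnd(P_H)
    nnd_Hp, mu_Hp = nnd(P_Hp)
    term1 = np.sqrt(js_divergence(mu_H, mu_Hp) / np.log(2))
    term2 = abs(np.sqrt(nnd_H) - np.sqrt(nnd_Hp))
    return beta * term1 + (1 - beta) * term2

# Hypothetical toy distributions: P is homogeneous, Q is maximally dispersed.
P = np.array([[0.5, 0.5], [0.5, 0.5]])
Q = np.array([[1.0, 0.0], [0.0, 1.0]])
print(hd_dissimilarity(P, P))  # 0.0
print(hd_dissimilarity(P, Q))  # 0.5
```

For identical distribution matrices both terms vanish, so the dissimilarity is zero, as the definition requires.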

For clarity, we give a flow diagram of the proposed hypernetwork comparison method in Figure 1. Taking a hypernetwork $H$ as an example, we first compute the $s$-distance distributions ($P^{1},P^{2},\cdots,P^{S_{m}}$). Subsequently, the values of $NND^{s}(H)$ and $\mu_{H}^{s}$ are calculated using Eqs. (5-7); the values of $NND^{s}(H^{\prime})$ and $\mu_{H^{\prime}}^{s}$ are computed in the same way. We then obtain the dissimilarity between $H$ and $H^{\prime}$ following Eq. (8).

Figure 1: Flow diagram of hyper-distance-based method for hypernetwork comparison.

3 Baseline Methods

Previous research [20] has largely relied on the projected ordinary network when comparing hypernetworks: two hypernetworks are first converted into ordinary networks, and the comparison methods for ordinary networks are then applied. We assess the efficacy of our approach against baselines that extend ordinary-network comparison methods to hypernetworks, including a portrait-based information-theoretic approach, an adjacency-tensor-based method, and hypernetwork centrality-based methods. The specifics of each method are outlined below.

3.1 Portrait-based hypernetwork comparison method

For a given value of $s$, the $s$-distance-based hypernetwork portrait $B^{s}$ is an array with elements indexed by $(l,k)$ [12, 22], and we have

B_{l,k}^{s} \equiv \text{the number of nodes that have } k \text{ nodes at } s\text{-distance } l, \quad (9)

where $0\leq l\leq d^{s}$, $0\leq k\leq N-1$, and $d^{s}$ denotes the maximum $s$-distance value. It is noteworthy that the hypernetwork portrait $B^{s}$ is independent of the ordering or labeling of nodes. We use the probability distribution $Q^{s}$ to interpret the entries of $B^{s}$. Thus, the probability that a randomly selected node has $k$ nodes at distance $l$ is expressed by

Q^{s}(l|k) = \frac{1}{N} B^{s}_{l,k}. \quad (10)

We consider two hypernetworks, $H$ and $H^{\prime}$, and use $Q^{s}$ and $Q^{s^{\prime}}$ as probability distributions to interpret the rows of the $s$-distance-based hypernetwork portraits of each. The difference between $H$ and $H^{\prime}$ is expressed by $D^{P}(H,H^{\prime})$ and is stated as:

D^{P}(H,H^{\prime}) = \sum_{s=1}^{S_{m}} \left( \frac{1}{2} KL(Q^{s}\|M^{s}) + \frac{1}{2} KL(Q^{s^{\prime}}\|M^{s}) \right), \quad (11)

where $M^{s}=\frac{1}{2}(Q^{s}+Q^{s^{\prime}})$ and $KL(\cdot\|\cdot)$ is the Kullback-Leibler divergence [23] between two distributions. We compare the difference between $H$ and $H^{\prime}$ by comparing the difference of the portrait distributions at every order of distance as $s$ varies.

3.2 Tensor-based hypernetwork comparison method

The adjacency tensor of a hypernetwork $H$ is denoted by $T\in\mathbb{R}^{\xi\times N\times N}$ [24, 25, 26], where $\xi$ is the maximum size of the hyperedges and $N$ is the number of nodes. If nodes $v_{i}$ and $v_{j}$ are both present in a hyperedge of size $k$, then $T_{kij}=1$; otherwise $T_{kij}=0$. Subsequently, we use the adjacency tensor $T$ to compare the difference between two hypernetworks $H$ and $H^{\prime}$, which is represented as follows:

D_{T}(H,H^{\prime}) = \sum_{k=2}^{\xi} \sum_{i,j=1}^{N} \frac{1}{\gamma^{k}} \left| T_{kij} - T^{\prime}_{kij} \right|, \quad (12)

where $\gamma$ is a tunable parameter; we choose $\gamma=2$ in the subsequent experiments.
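A sketch of Eq. (12) built directly from hyperedge lists (node labels are 0-indexed integers; the toy hyperedges are hypothetical):

```python
import numpy as np

def tensor_dissimilarity(E1, E2, n, gamma=2.0):
    """Tensor-based comparison of Eq. (12): a hyperedge of size k
    sets entries in slice k of the adjacency tensor, and slices are
    weighted by 1/gamma^k (sketch; slice index k is used directly)."""
    xi = max(len(e) for e in E1 + E2)   # maximum hyperedge size
    T1 = np.zeros((xi + 1, n, n))
    T2 = np.zeros((xi + 1, n, n))
    for T, E in ((T1, E1), (T2, E2)):
        for e in E:
            k = len(e)
            for i in e:
                for j in e:
                    if i != j:
                        T[k, i, j] = 1
    return sum((1.0 / gamma ** k) * np.abs(T1[k] - T2[k]).sum()
               for k in range(2, xi + 1))

E1 = [{0, 1}, {1, 2, 3}]  # hypothetical toy hypernetworks
E2 = [{0, 1}, {0, 2, 3}]
print(tensor_dissimilarity(E1, E2, 4))  # the size-3 slices differ in 8 entries
```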

3.3 Centrality-based hypernetwork comparison methods

To perform hypernetwork comparisons, we construct centrality vectors for nodes or hyperedges based on various centrality measures. The centralities are collected in a vector $c=(c_{1},c_{2},\dots,c_{N})^{T}$ for nodes or $c=(c_{1},c_{2},\dots,c_{M})^{T}$ for hyperedges, with $N$ and $M$ being the total numbers of nodes and hyperedges. The formula to compare two hypernetworks $H$ and $H^{\prime}$ using node centrality methods can be written as:

D_{c}(H,H^{\prime}) = \sum_{i=1}^{N} |c_{i} - c^{\prime}_{i}|, \quad (13)

and the formula for comparing $H$ and $H^{\prime}$ using hyperedge centrality methods is given by

D_{c}(H,H^{\prime}) = \sum_{i=1}^{M} |c_{i} - c^{\prime}_{i}|. \quad (14)

We utilize six centrality measures [2, 27, 21, 28, 29, 30] to compare hypernetworks: three node centrality methods, namely Hyper Degree Centrality (HDC), Normalized Degree Centrality (NDC), and Vector Centrality (VC), and three hyperedge centrality methods, namely $s$-Eccentricity Centrality (ECC), Edge-based Degree Centrality (EDC), and $s$-Harmonic Closeness Centrality (HCC). Each centrality measure yields a centrality vector for nodes or hyperedges, which is then used as the characteristic for hypernetwork comparison.

4 Results

4.1 Comparison of synthetic hypernetworks

We evaluate the performance of our approach by comparing synthetic uniform hypernetworks created by the Erdős-Rényi hypernetwork model (ERH) [20], the Watts-Strogatz hypernetwork model (WSH) [31], and the Scale-Free hypernetwork model (SFH) [32, 33]. Details of how the synthetic hypernetworks are generated are given below:

Erdős-Rényi hypernetwork model (ERH). The ERH model constructs a $k^{E}$-uniform hypernetwork $H=(V,E)$ given the number of nodes $N$ and the number of hyperedges $M$. Specifically, we first randomly select $k^{E}$ nodes to form a hyperedge $e$ and add $e$ to $E$ if $e\notin E$; otherwise, we reselect the nodes. We repeat the above process $M$ times to obtain $H$.

Watts-Strogatz hypernetwork model (WSH). The WSH model generates a $k^{E}$-uniform WS hypernetwork given the number of nodes $N$ and the rewiring probability $q$. The model is described as follows: initially, a $k^{E}$-ring hypernetwork $H=(V,E)$ is given, in which all nodes are arranged in a circle and each node forms two hyperedges with its $k^{E}-1$ left and right neighbors, respectively. For each hyperedge $e$, we randomly select $k^{E}$ nodes to form a new hyperedge $e^{\prime}$. If $e^{\prime}\notin E$, we replace $e$ by $e^{\prime}$ with probability $q$. The process terminates when all hyperedges in $H$ have been traversed.

Scale-free hypernetwork model (SFH). Given the number of nodes $N$, the number of hyperedges $M$, and the degree distribution $p(k)\sim k^{-r}$, we generate a $k^{E}$-uniform SF hypernetwork through the following steps: (i) we create a node degree sequence based on the degree distribution $p(k)\sim k^{-r}$, and each node $v_{i}$ is assigned a probability $p_{i}$ according to the degree sequence; (ii) for an empty hyperedge $e$, we add node $v_{i}$ to $e$ with probability $p_{i}$ until the size of $e$ reaches $k^{E}$; if $e$ already exists in $H$, we regenerate $e$; (iii) repeat step (ii) $M$ times.
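The ERH construction, the simplest of the three generators, can be sketched as follows (a minimal implementation; the function and variable names are our own):

```python
import random

def erdos_renyi_hypernetwork(n, m, k, seed=None):
    """ERH model sketch: draw m distinct hyperedges, each a uniform
    random k-subset of the n nodes, yielding a k-uniform hypernetwork."""
    rng = random.Random(seed)
    edges = set()
    while len(edges) < m:
        e = frozenset(rng.sample(range(n), k))
        edges.add(e)  # a duplicate hyperedge is rejected by the set
    return [set(e) for e in edges]

H = erdos_renyi_hypernetwork(n=1000, m=500, k=4, seed=42)
print(len(H), all(len(e) == 4 for e in H))  # 500 True
```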

The results of comparing uniform hypernetworks generated by ERH using the various methods are depicted in Figure 2, with $k^{E}$ representing the size of the hyperedges. For HD, we observe that hypernetworks are similar when their values of $k^{E}$ are close, which is in agreement with the way the hypernetworks are generated. The other baseline methods (except Portrait and ECC) show trends similar to those of HD, indicating that the majority of the methods can distinguish hypernetworks generated by ERH with different $k^{E}$. We analyze the hypernetworks created by WSH and compare them in Figure 3 based on different rewiring probabilities $q$. The results demonstrate that HD, Tensors, EDC, HCC, and NDC are capable of distinguishing hypernetworks created with different $q$ values, whereas the other methods cannot. For instance, the dissimilarity between networks generated with $q=0.1$ and $q=0.5$ and that between networks generated with $q=0.3$ and $q=0.5$ are roughly the same, which is counter-intuitive. For the uniform hypernetworks generated by SFH, we show the difference between networks generated with different values of $r$ (see Figure 4), the exponent of the degree distribution. Intuitively, hypernetworks should be alike when their values of $r$ are close, and HD exhibits this pattern. In contrast, the other baselines are unable to distinguish between hypernetworks created with different $r$ values. Overall, HD robustly distinguishes hypernetworks generated by different models, whereas the baselines show weaknesses in different scenarios.

Figure 2: Comparison of hypernetworks generated by ERH, where $N=1000$ and $M=500$. The methods are: (a) HD; (b) Portrait; (c) Tensors; (d) HDC; (e) NDC; (f) VC; (g) ECC; (h) EDC; (i) HCC.
Figure 3: Comparison of hypernetworks generated by WSH, where $N=1000$, $M=3000$, and $k^{E}=4$. The methods are: (a) HD; (b) Portrait; (c) Tensors; (d) HDC; (e) NDC; (f) VC; (g) ECC; (h) EDC; (i) HCC.
Figure 4: Comparison of hypernetworks generated by SFH, where $N=1000$, $M=500$, and $k^{E}=4$. The methods are: (a) HD; (b) Portrait; (c) Tensors; (d) HDC; (e) NDC; (f) VC; (g) ECC; (h) EDC; (i) HCC.

We further explore whether our method can classify hypernetworks generated by different mechanisms; the results are given in Figure 5. We generate ERHs, WSHs, and SFHs with a network size of $N=1000$, with 30 hypernetworks for each type. The dissimilarities between the hypernetworks are calculated by HD, and we adopt a multidimensional scaling map to show the distances between them in a geometric space. A shorter distance indicates that two hypernetworks are more similar. The results demonstrate that HD is effective in classifying hypernetworks, as those generated by the same mechanism are grouped together and those generated by different mechanisms are separated.

Figure 5: Synthetic hypernetwork classification. Red: ERH ($N=1000$, $M=500$, $k^{E}=4$); Blue: SFH ($N=1000$, $M=500$, $k^{E}=4$, $r=2.1$); Green: WSH ($N=1000$, $M=3000$, $k^{E}=4$, $q=0.05$).

4.2 Comparison of empirical hypernetworks

We use six hypernetworks created from empirical data to demonstrate the effectiveness of our approach. Yuecai and Chuancai are recipe hypernetworks in which the hyperedges are made up of the components of each dish. Restaurants-Rev and Bars-Rev are review hypernetworks obtained from Yelp.com, composed of users who have reviewed the same restaurant or bar. NDC-classes is a hypernetwork of drugs, with each hyperedge representing a combination of class labels that make up a single drug. Algebra is collected from MathOverflow.net, where users who answered the same question are enclosed in a hyperedge. The topological characteristics of the hypernetworks are presented in Table 1, including the number of nodes, the number of hyperedges, the average degree, the average hyperdegree, the average size of the hyperedges, the clustering coefficient, the average shortest path length, the diameter, and the average link density of each hypernetwork. It is worth noting that the clustering coefficient, the average shortest path length, the diameter, and the average link density are computed on the ordinary networks derived from the hypernetworks.

Table 1: The topological properties of the hypernetworks, where $N$, $M$, $\langle k\rangle$, $\langle k^{H}\rangle$, $\langle k^{E}\rangle$, $C$, $\langle l\rangle$, $d$, and $\rho$ denote the number of nodes, the number of hyperedges, the average degree, the average hyperdegree, the average size of the hyperedges, the clustering coefficient, the average shortest path length, the diameter, and the average link density of the hypernetwork, respectively.
Data $N$ $M$ $\langle k\rangle$ $\langle k^{H}\rangle$ $\langle k^{E}\rangle$ $C$ $\langle l\rangle$ $d$ $\rho$
Yuecai 497 867 4.52 3.17 1.82 0.39 3.44 8 0.01
Chuancai 438 835 5.42 3.40 1.79 0.35 3.42 8 0.01
Restaurants-Rev 565 601 8.14 8.14 7.66 0.54 1.98 5 0.14
Bars-Rev 1234 1194 174.30 9.62 9.93 0.58 2.10 6 0.14
NDC-classes 1161 1088 10.72 5.54 5.92 0.61 3.50 9 0.01
Algebra 423 1268 78.90 19.53 6.52 0.79 1.95 5 0.19
Figure 6: Results of comparing the differences under perturbation across different methods for the empirical hypernetworks: (a) Yuecai; (b) Chuancai; (c) Restaurants-Rev; (d) Bars-Rev; (e) NDC-classes; (f) Algebra.

We assess the effectiveness of the proposed method by performing hyperedge perturbations on the empirical hypernetworks. We randomly add or remove a certain proportion ($f\in[-0.9,0.9]$) of hyperedges from a hypernetwork: a positive value of $f$ indicates the addition of hyperedges, while a negative value implies deletion. The size of each added hyperedge is randomly chosen from the range $[2,\xi]$, where $\xi$ is the maximum hyperedge size. We compare the difference between the original hypernetwork and the perturbed hypernetworks; the results are shown in Figure 6. The proposed method, HD, exhibits a clear increasing trend of the dissimilarity values under both hyperedge deletion and addition in all the empirical hypernetworks. It follows that the more we change the number of hyperedges, the more the resulting hypernetwork differs from the original one, which is in line with our expectations. In contrast, the baseline methods are inferior to HD. For example, the Portrait method shows a clear decreasing trend of dissimilarity values under hyperedge deletion for hypernetworks such as Yuecai, Chuancai, and Algebra, which would imply that deleting more hyperedges makes the resulting hypernetwork more similar to the original one. Similarly, the Tensors method shows a decreasing trend under hyperedge deletion for all six hypernetworks, while its increasing trend under hyperedge addition is not significant for Restaurants-Rev, Bars-Rev, NDC-classes, and Algebra. Moreover, the six centrality methods, namely ECC, EDC, HCC, HDC, NDC, and VC, show either an indistinct increasing trend or a decreasing trend under hyperedge deletion for the six hypernetworks.
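The perturbation procedure can be sketched as follows (a minimal implementation assuming hyperedges are sets of 0-indexed node labels; the helper name and toy example are ours):

```python
import random

def perturb_hyperedges(E, f, n, xi, seed=None):
    """Hyperedge perturbation sketch for f in [-0.9, 0.9]: a negative f
    deletes a fraction |f| of hyperedges at random; a positive f adds
    f*|E| new hyperedges whose sizes are drawn uniformly from [2, xi]."""
    rng = random.Random(seed)
    E = list(E)
    num = int(abs(f) * len(E))
    if f < 0:
        keep = rng.sample(range(len(E)), len(E) - num)
        return [E[i] for i in keep]
    for _ in range(num):
        size = rng.randint(2, xi)
        E.append(set(rng.sample(range(n), size)))
    return E

E = [{0, 1}, {1, 2}, {2, 3}, {3, 4}]  # hypothetical toy hyperedges
print(len(perturb_hyperedges(E, -0.5, 5, 3, seed=1)))  # 2
print(len(perturb_hyperedges(E, 0.5, 5, 3, seed=1)))   # 6
```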

4.3 Parameter Analysis

The method we propose contains a parameter $\beta$ that balances the importance of the global and local dissimilarity between two hypernetworks. We further explore how the value of HD changes with $\beta$; the results are given in Figure 7 for the six empirical hypernetworks. We observe that the values of HD with higher $\beta$ are lower for the hyperedge deletion process. However, for the hyperedge addition process, there is no consistent pattern of how HD changes with $\beta$.

Figure 7: Comparing the differences under perturbation for different hypernetworks while altering the value of $\beta$: (a) Yuecai; (b) Chuancai; (c) Restaurants-Rev; (d) Bars-Rev; (e) NDC-classes; (f) Algebra.

5 Conclusions & Discussion

In this paper, we propose the hyper-distance-based network comparison method (HD) to compare the dissimilarity between hypernetworks. HD is based on higher-order distances and utilizes node high-order distance distributions together with the Jensen-Shannon divergence. Specifically, we first compute the $s$-distance between each pair of nodes using $s$-walks and generate the $s$-distance distribution matrices of the hypernetwork. Next, we compare two hypernetworks via the Jensen-Shannon divergence, which encodes both the difference in average connectivity between nodes at different $s$-distances and the heterogeneity of the distance distributions. We compare the proposed HD method with other baselines on various synthetic and empirical hypernetworks. The experimental results demonstrate that HD is superior to the baselines in revealing the dissimilarity between hypernetworks. Future research directions include further expanding and improving the higher-order-distance-based method for comparing hypernetwork dissimilarity to adapt to more complex and large-scale network datasets. Furthermore, it can be extended to other types of networks, such as multilayer hypernetworks and temporal hypernetworks [34, 35].

6 Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

7 Data availability

Data will be available on request.

8 Acknowledgments

This work was supported by Natural Science Foundation of Zhejiang Province (Grant No. LQ22F030008), the National Natural Science Foundation of China (Grant No. 92146001), the Fundamental Research Funds for the Central Universities of Ministry of Education of China, and the Scientific Research Foundation for Scholars of HZNU (2021QDL030).

References

  • [1] Boccaletti S, Latora V, Moreno Y, Chavez M, Hwang DU. Complex networks: Structure and dynamics. Phys Rep 2006;424:175–308.
  • [2] Tantardini M, Ieva F, Tajoli L, Piccardi C. Comparing methods for comparing networks. Sci Rep 2019;9:1–19.
  • [3] Albert R, Barabási AL. Statistical mechanics of complex networks. Rev Mod Phys 2002;74:47.
  • [4] Hand DJ. Principles of data mining. Drug Saf 2007;30:621–622.
  • [5] Rolnick D, Donti PL, Kaack LH, Kochanski K, Lacoste A, Sankaran K, Ross AS, Milojevic-Dupont N, Jaques N, Waldman-Brown A, et al. Tackling climate change with machine learning. ACM Comput Surv 2022;55:1–96.
  • [6] Xu HC, Wang ZY, Jawadi F, Zhou WX. Reconstruction of international energy trade networks with given marginal data: A comparative analysis. Chaos Solit Fractals 2023;167:113031.
  • [7] Fkih F. Similarity measures for Collaborative Filtering-based Recommender Systems: Review and experimental comparison. J King Saud Univ Comput Inf Sci 2022;34:7645–7669.
  • [8] Tabak MA, Norouzzadeh MS, Wolfson DW, Sweeney SJ, VerCauteren KC, Snow NP, Halseth JM, Di Salvo PA, Lewis JS, White MD, et al. Machine learning to classify animal species in camera trap images: Applications in ecology. Methods Ecol Evol 2019;10:585–590.
  • [9] Greener JG, Kandathil SM, Moffat L, Jones DT. A guide to machine learning for biologists. Nat Rev Mol Cell Biol 2022;23:40–55.
  • [10] Koutra D, Shah N, Vogelstein JT, Gallagher B, Faloutsos C. Deltacon: Principled massive-graph similarity function with attribution. ACM Trans Knowl Discov Data 2016;10:1–43.
  • [11] Aliakbary S, Motallebi S, Rashidian S, Habibi J, Movaghar A. Distance metric learning for complex networks: Towards size-independent comparison of network structures. Chaos 2015;25:023111.
  • [12] Bagrow JP, Bollt EM. An information-theoretic, all-scales approach to comparing networks. Appl Netw Sci 2019;4:1–15.
  • [13] Wang Z, Zhan XX, Liu C, Zhang ZK. Quantification of network structural dissimilarities based on network embedding. iScience 2022;25.
  • [14] Newman ME. The structure and function of networks. Comput Phys Commun 2002;147:40–45.
  • [15] Battiston F, Cencetti G, Iacopini I, Latora V, Lucas M, Patania A, Young JG, Petri G. Networks beyond pairwise interactions: structure and dynamics. Phys Rep 2020;874:1–92.
  • [16] Benson AR, Gleich DF, Leskovec J. Higher-order organization of complex networks. Science 2016;353:163–166.
  • [17] Sweeney P, Chen C, Rajapakse I, Cone RD. Network dynamics of hypothalamic feeding neurons. Proc Natl Acad Sci 2021;118:e2011140118.
  • [18] Xie M, Zhan XX, Liu C, Zhang ZK. An efficient adaptive degree-based heuristic algorithm for influence maximization in hypergraphs. Inf Process Manag 2023;60:103161.
  • [19] Chodrow PS, Veldt N, Benson AR. Generative hypergraph clustering: From blockmodels to modularity. Sci Adv 2021;7:eabh1303.
  • [20] Surana A, Chen C, Rajapakse I. Hypergraph similarity measures. IEEE Trans Netw Sci Eng 2022;10:658–674.
  • [21] Aksoy SG, Joslyn C, Marrero CO, Praggastis B, Purvine E. Hypernetwork science via high-order hypergraph walks. EPJ Data Sci 2020;9:16.
  • [22] Bagrow JP, Bollt EM, Skufca JD, Ben-Avraham D. Portraits of complex networks. EPL 2008;81:68004.
  • [23] Van Erven T, Harremos P. Rényi divergence and Kullback-Leibler divergence. IEEE Trans Inf Theory 2014;60:3797–3820.
  • [24] Banerjee A, Char A, Mondal B. Spectra of general hypergraphs. Linear Algebra Appl 2017;518:14–30.
  • [25] Li G, Qi L, Yu G. The Z-eigenvalues of a symmetric tensor and its application to spectral hypergraph theory. Numer Linear Algebra Appl 2013;20:1001–1029.
  • [26] Chang J, Chen Y, Qi L, Yan H. Hypergraph clustering using a new laplacian tensor with applications in image processing. SIAM J Imaging Sci 2020;13:1157–1178.
  • [27] Kovalenko K, Romance M, Vasilyeva E, Aleja D, Criado R, Musatov D, Raigorodskii AM, Flores J, Samoylenko I, Alfaro-Bittner K, et al. Vector centrality in hypergraphs. Chaos Solitons Fractals 2022;162:112397.
  • [28] Hu F, Ma L, Zhan XX, Zhou Y, Liu C, Zhao H, Zhang ZK. The aging effect in evolving scientific citation networks. Scientometrics 2021;126:4297–4309.
  • [29] Wang JW, Rong LL, Deng QH, Zhang JY. Evolving hypernetwork model. Eur Phys J B 2010;77:493–498.
  • [30] Xie X, Zhan X, Zhang Z, Liu C. Vital node identification in hypergraphs via gravity model. Chaos 2023;33:013104.
  • [31] Watts DJ, Strogatz SH. Collective dynamics of 'small-world' networks. Nature 1998;393:440–442.
  • [32] Ko J, Kook Y, Shin K. Growth patterns and models of real-world hypergraphs. Knowl Inf Syst 2022;64:2883–2920.
  • [33] Barabási AL, Albert R. Emergence of scaling in random networks. Science 1999;286:509–512.
  • [34] Fischer MT, Arya D, Streeb D, Seebacher D, Keim DA, Worring M. Visual analytics for temporal hypergraph model exploration. IEEE Trans Vis Comput Graph 2020;27:550–560.
  • [35] Holme P, Saramäki J. Temporal networks. Phys Rep 2012;519:97–125.