From Cubes to Networks: Fast Generic Model for Synthetic Networks Generation

Min Shaojie¹&Liu Ji¹ ¹Chongqing University {alexmin, liujiboy}@cqu.edu.cn

Abstract

Analytical explorations on complex networks and cubes (i.e., multi-dimensional datasets) are currently two separate research fields with different strategies. To gain more insights into cube dynamics via unique network-domain methodologies and to obtain abundant synthetic networks, we need a transformation approach from cubes into associated networks. To this end, we propose FGM, a fast generic model converting cubes into interrelated networks, whereby samples are remodeled into nodes and network dynamics are guided under the concept of nearest-neighbor searching. Through comparison with previous models, we show that FGM can cost-efficiently generate networks exhibiting typical patterns more closely aligned to factual networks, such as more authentic degree distribution, power-law average nearest-neighbor degree dependency, and the influence decay phenomenon we consider vital for networks. Furthermore, we evaluate the networks that FGM generates through various cubes. Results show that FGM is resilient to input perturbations, producing networks with consistent fine properties.

1 Introduction

Factual networks are often sophisticated and hard to analyze. Thanks to the discovery of common network patterns and the advancement of network-generation models Travers and Milgram (1977); Watts and Strogatz (1998); Barabási and Albert (1999), network researchers can utilize synthetic networks created by models, rather than merely relying on factual networks. Generative network models create synthetic graphs with provided inputs using sets of pre-defined regulations and equations Chakrabarti and Faloutsos (2006); Derr et al. (2018). The generated graph is only applicable when it matches patterns of reality Chakrabarti and Faloutsos (2006), i.e., if we can guarantee synthetic graphs are similar to real-life networks in certain aspects, perceptions will be provided into dynamic causalities on network development.

Integration of generative network models and real-life data inherits both the advantage of system dynamic modeling provided by unique network domain methodologies and the advantage of abundant real-life data from various research domains, i.e., to predict how a specific system evolves with the help of network approaches and to create sufficient synthetic networks with reality interrelations.

However, designing such a data-driven network model can be problematic. Firstly, heterogeneity lies in data format, i.e., traditional research mainly inquiries cubes (i.e. multi-dimensional data) whereas network studies nodes and connections. Secondly and more importantly, as Wen et al argued Wen et al. (2022) that the ultimate goal of the data-driven network model is to create a synthetic network as the “Digital Twin” of reality, it is vital for the generated network to be interrelated with the original data. Achieving this requirement is yet more arduous when we attempt the network to mimic real-life patterns. Additionally, the generator should also be cost-efficient and sustained to input perturbations.

Current solutions for such a transformation (i.e., from real-life data to networks) have limited applicability as their focuses are mainly confined to a single dataset or a specific research topic Chung et al. (2019); Roeder et al. (2021); Lai et al. (2022a); Gürsoy and Bertan (2022); Inoue et al. (2020). A relatively generic solution, the OSN Evolution model Khan and Lee (2018a), introduce a similarity-based method for user-network creation. Although its approach of mapping users’ characteristics into node similarities provides flexibility to the model, the generation process is computationally expensive and cannot be applied to non-user-based fields.

The primary goal of this paper is to create a generative model pursuing the aforementioned objectives. To this end, we propose FGM, Fast Generic Network-Generation Model for transformation from real-life data to interrelated networks. Real-life data for FGM input is in the form of multi-dimensional datasets (i.e., cubes). As a consequence, we refer to the input of FGM as cubes hereafter. The model converts samples in a cube into nodes with representative attributes and the network evolution is guided by those attributes and a potential-neighbor resampling strategy.

FGM is able to generate networks with low computational complexity while still reproducing real-life phenomena (e.g., properties of factual networks and the diminishing of individual’s influence). Through comparison with previous works and simulations with varied cubes, we provide validation for FGM’s excellent performance and robustness to perturbations.

Our methodology is highly flexible, but to ensure generated networks remain dynamic and retain desirable properties, we stipulate two integral restrictions on the input cube:

1.

Individuals within ought to process orderly relationships, portraying the corresponding nodes’ successive sequence of arrival.
2.

Individuals within ought to retain representative values, either generated from the original cube or manually assigned.
3.

(Optional) Individuals within may have geo-graphic feature(s) specifying the positional information, which ultimately yields a location-based network.

It is worth mentioning that those restrictions don’t have to be directly satisfied by the cube. Instead, they can be calculated or inferred. We provide detailed elaborations concerning those attributes in Section 3.1.

2 Related Works

Generative Network Models

Previous generative models of networks can be roughly divided into two categories based on the formation determinant of the network. The first category is property-based-solely models, in which network arrangements and connections are solely determined by the network’s own properties. This type of model is invaluable for unveiling the causality of observed patterns, yet their abilities of associating with reality are still limited because no additional data is supplemented during the procedure. Examples include BA model Barabási and Albert (1999), Small-World Network Watts and Strogatz (1998), Binary Relation model Holland and Leinhardt (1981), Fitness-Weighted Preferential Attachment model Romero et al. (2020), etc. The second is network models with hybrid inputs. Inputs for this category include distribution sequences, a pre-defined graph partition, samples with attributes from a dataset, etc. Hybrid inputs provide generative models with not only additional information but also the potentiality to integrate with other research fields. Representative of this type is the Configuration model Newman (2003). Other examples include DSNG-M Duan et al. (2019), which produce dynamic graphs from community partitioning alongside provided initial graphs, and NetGAN Bojchevski et al. (2018), taking existing graphs with hidden links as input while learning the hidden details through sequential characteristics. To a certain degree, DSNG-M and NetGAN are analogous as they both require a part of the network provided beforehand. Those models can be innovative within their specific research focuses, however, the study of network generation with seamless integration of reality is not investigated.

Data-Driven Network Models

Data-driven network models explore integration of networks and factual data. For synthetic network systems with reality interrelations, Wen et al Wen et al. (2022) proposed a uniﬁed assessment criteria and argue that the ultimate goal of such an approach is creating a Digital Twin (DT) that perfectly matches reality. Unfortunately, current methodologies still fall far short of desired outcomes. Using network methods to model reality has received increasing attention over the years Sahafizadeh and Ladani (2020); Lai et al. (2022b); Sottile et al. (2022); Han and Wang (2022). As those explorations are restricted to a specific domain, their applicability is limited. OSN Evolution model Khan and Lee (2018b) is a relatively more generic approach for user-network generation but it can be computationally expensive. To our best knowledge, there are still no generic approaches for transformation from real-life data into interrelated networks.

3 The Model

FGM transmutes cubes into affiliated networks. As illustrated in Figure 1, the procedure involves three major steps: nodes generation for converting individuals into nodes; neighbors resampling for finding each node’s potential neighbors; and edge generation for settling the final connections.

Refer to caption — Figure 1: The workflow of FGM.

3.1 Nodes Generation

As a first step, we generate $N$ nodes with attributes according to the $N$ samples within the cube, denoted as $V=(T,C,I)$ , where $T,C,I$ represent nodes’ order, geographic and influence attributes respectively.

The Order Attribute

The order attribute $T$ establishes a non-strict partial order relation“ $\leq$ ” among nodes that is reflexive, antisymmetric, and transitive. As $T$ defines the ”order of arrival”, the simplest approach is employing original temporal information, nevertheless, any other intended features that satisfy the following restrictions are suitable: any $t_{i},t_{j},t_{k}\in T$ satisfies:

1.

Reflexivity: $t_{i}\leq t_{i}$ , (i.e., every element is related to itself).
2.

Anti-symmetry: if $t_{i}\leq t_{j}$ and $t_{j}\leq t_{i}$ , then $t_{i}=t_{i}$ (i.e., no two distinct elements precede each other).
3.

Transitivity: if $t_{i}\leq t_{j}$ and $t_{j}\leq t_{k}$ , then $t_{i}\leq t_{k}$ .

The Geographic Attribute

The geographic attribute $C$ can have more than one dimension. It specifies the positional information among nodes for finding potential neighbors thereafter. With respect to cubes with geo-based properties such as longitude and latitude, we simply apply those properties as $C$ , resulting in geo-based networks with positional nodes. For others without, we also provide an alternative strategy in Section 6.

The Influence Attribute

The influence attribute $I$ indicates the node’s impact on the network by dictating the number of potential edges and establishing the edge-formation probabilities during Section 3.3. Although $I$ can be manually assigned, our research has shown that due to the general applicability of the Pareto Principle, for a plentiful of cubes, individuals within can be portrayed with values that belong to a power-law population (examples in Figure 2 a.START (2020); b.UK Department for Transport (2022); c.Preda (2022), d.Rojas (2019), e.Jha (2022); f.Scheijen (2022)). This transformation is accomplished via a linear function:

infRt_{v_{s}}=Scale(\alpha{X_{s}}+\beta)

(1)

where $infRt_{v_{s}}\in I$ is the matching node’s influence attribute, $X_{s}$ is the original features of sample $s$ , and $Scale(\cdot)$ is the min-max scaler controlling total number of edges. As for other cases where a power-law $I$ cannot be attained, we consider them to have random $I$ , and FGM is still capable of converting this type of cube into scale-free networks with desirable properties.

3.2 Neighbors Resampling

The following step is determining candidates of final neighbors for each node (i.e., potential neighbors). The $T$ sequence yields an arrival sequence of $V$ . For each $t\in T$ , let $v_{t}$ describe the corresponding node and $G_{t}=(V_{t},E_{t})$ describe the graph. The potential neighbors of $v_{t}$ , denoted by $\mathit{PNbr_{v_{t}}}$ , are sampled through a nearest neighbor searching algorithm, denoted by $\Upsilon(\cdot)$ :

PNbr_{v_{t}}=\Upsilon_{k_{t}}(v_{t},V_{t-1})

(2)

Within $\Upsilon(\cdot)$ , the number of potential neighbors (i.e., $k_{t}$ ) of node $v_{t}$ is determined by $v_{t}$ ’s influence attribute $infRt_{v_{t}}$ regardless of its geographic attribute $c_{v_{t}}$ :

k_{t}=min(n_{t-1},\ \eta*infRt_{v_{t}}^{\theta})

(3)

where $n_{t-1}=\mid V_{t-1}\mid$ and $\eta,\theta\in\mathbf{R}^{+}$ are parameters guiding the global edge scale and the degree deviation among nodes respectively.

As we have discussed previously, $I$ is either pow-law or considered random. For the former, $\theta$ is set to 1, and for the latter, $\theta$ is generally between 3-9 to maintain power-law degree distribution. Once $k_{t}$ is settled, the distance measurement within $\Upsilon(\cdot)$ is obtained as:

d(i,j)=dist_{measure}(attr_{i},attr_{j})

(4)

where $i,j\in V$ , $dist_{measure}(\cdot)$ represents an intended measurement, and $attr$ is the weighted order and geographic attributes:

attr_{i}=[\mu_{t}t_{i},\ \mu_{c}(c_{i,1},\ c_{i,2},...,c_{i,\lambda})]

(5)

where $t_{i}\in T$ , $c_{i}\in C$ , $i\in V$ , and $\lambda$ being the geographic dimension. Note that the order and geographic attributes (i.e., $T$ and $C$ ) are both put into consideration when calculating the distance. Because in FGM, it is imperative that we fully account for the impact of both order and geographic properties on node associations (the benefit will be illustrated in Section 4.4).

As the result of neighbors resampling simply alters the recipient of connections, the only variation that can make an impact on the network’s overall properties here is $k_{t}$ . This decoupling between specific algorithms and network overall properties gives more flexibility to the model. However, with different implementations leading to varied link participants, the interrelations between generated network and original data will ultimately be affected. As a result, although various implementations can be employed here, it is advised to choose according to the data under investigation.

3.3 Edges Generation

After the potential neighbors (i.e., $PNbr$ ) are determined, edges are generated under the control of the influence attribute $I$ .

For network $G_{t}$ , let $E_{v_{t},j}=1$ indicates the existence of an undirected edge between node $v_{t}$ and $j$ . The node $v_{t}$ connects to $j$ with probability:

P(E_{v_{t},j}=1)=\Gamma(infRt_{v_{t}},\ infRt_{j})

(6)

where $j\in PNbr_{v_{t}}$ and $\Gamma(\cdot)$ represents a generalized function shifting pairs of influence attributes to edge formation determinant.

4 Evaluation of Generated Networks

Here, we focus on patterns and measurements of synthetic FGM networks by comparing them with other generative network models. As a generic model, FGM evaluations should not be fixed with any specific cube. Hence, node attributes are obtained as follows throughout this section:

•

The order and geographic attributes (i.e., $T$ and $C$ ) are created as random samples, whereby the dimension of $C$ is set to 2;
•

Two scenarios of the influence attribute (i.e., $I$ ), are evaluated. In one scenario, denoted by $\mathrm{FGM_{p}}$ , $I$ is acquired as samples from Lomax distribution with the probability density function as $p(x)=\frac{\alpha\mu^{\alpha}}{(x+\mu)^{\alpha+1}}$ , in which $\mu,\alpha$ are set to 1 and 3 respectively. In the other scenario, denoted by $\mathrm{FGM_{r}},$ $I$ is obtained as random samples.

Other experiment settings and regulations that are shared across this section, if not specified otherwise, are listed below:

•

For the scale parameters in Equation 3, $\eta$ is set to 40 and $\theta$ is set to 1 for $\mathrm{FGM_{p}}$ and 9 for $\mathrm{FGM_{r}}$ .
•

Previous models under comparison include ER network, Small-World network, and the two most representative models of our model categories in Section 2 — BA network and Configuration model.
•

The network scales under evaluation are 500, 1000, 2000, 5000, and 10000.
•

As $T$ and $C$ are random, the nearest neighbor algorithm in Equation 2 and distance measurement in Equation 4 won’t manipulate the simulation results. Nevertheless, to clarify, we employ KNN with Minkowski distance.
•

The version of FGM won’t affect what we try to convert after Section 4.3. For simplicity, we just utilize $\mathrm{FGM_{p}}$ , directly addressing them as FGM.

4.1 Degree Distribution

In this experiment, we test whether synthetic networks produced by FGM portray intended degree distributions similar to reality. We use the term Gnode (giant node) referring to the few nodes with large degrees (i.e., the tail of the power-law distribution).

For both scenarios of FGM (see Figure 3), networks are able to exhibit power-law tendencies. Compared with other models (i.e., BA network and Configuration model) where the log-log curve gradient remains constant, the curve head of FGM is flatter than the tail, particularly in $\mathrm{FGM_{p}}$ . Since the power law often applies only for values greater than some minimum $x_{min}$ for real-life networks Clauset et al. (2009), we believe our model better captures the nature of factual degree distribution (see Figure 4). The results also indicate that the larger the network scale is, the more obvious the pattern becomes.

Figure 4 shows the degree distribution of several real-life networks (i.e., a.Rozemberczki et al. (2019); b.McAuley and Leskovec (2012); c/e.Rozemberczki and Sarkar (2020); d.Leskovec et al. (2007)) and $\mathrm{FGM_{p}}$ ’s simulations with corresponding scales. Regardless of the field or scale of real-life networks, the actual distribution curve head is always flatter than the tail. This network property is correctly captured by our model while others cannot.

4.2 Average Nearest-Neighbor Degree

Starting here, we omit the assessment on ER networks because the randomness is too high to show observable patterns.

Another network measurement we present in detail is the average nearest-neighbor degree (ANND) Pastor-Satorras et al. (2001). As a network quantization of degree-degree correlations, ANND distribution reflects the connectivity details by specifying how connected nodes with certain degrees are. When ANND was first introduced in 2001, researchers found that the Internet in 1998 exhibited a distinct power-law ANND dependence and the same patterns have been later observed to be applicable to many other factual networks Pastor-Satorras et al. (2001); Catanzaro et al. (2005) (examples in Figure 5, a.Leskovec et al. (2005); b/c.Rozemberczki and Sarkar (2021); d.Zitnik et al. (2018); e.Rozemberczki et al. (2019); f.Cho et al. (2011)).

The ANND simulation results under different network scales show that both cases of FGM display a clear power-law ANND dependency while others remain relatively constant. To this concern, we also believe FGM outperforms others because the power-law ANND tendencies were also observed in reality.

4.3 Other Network Properties

This section provides a comparison of other network properties that help assess the synthetic networks.

Model	# of Nodes	# of Edges	Parallel Edges	Clustering Coefficient	Average Path Length
Small-World Network	500	2000	No	0.225	3.552
	1000	4000		0.221	3.992
	2000	8000		0.227	4.427
	5000	20000		0.227	4.974
	10000	40000		0.227	5.387
BA Network	500	7275	No	0.126	2.123
	1000	14775		0.076	2.333
	2000	29775		0.050	2.527
	5000	74775		0.024	2.739
	10000	149775		0.014	2.855
Configuration Model	500	3745	Yes	NA	2.556
	1000	7296			2.783
	2000	14330			3.044
	5000	36381			3.389
	10000	73529			3.602
$\mathrm{FGM_{p}}$	500	3610	No	0.553	2.466
	1000	7415		0.468	2.968
	2000	13069		0.480	3.377
	5000	39024		0.497	4.323
	10000	76987		0.478	5.313
$\mathrm{FGM_{r}}$	500	4409	No	0.560	2.779
	1000	9294		0.553	3.319
	2000	20027		0.542	3.975
	5000	53785		0.544	5.234
	10000	103114		0.540	6.553

Table 1: Statistics of synthetic networks under different network scales (NA stands for ”Not Applicable”).

Table 1 shows that, under varied scales, FGM is the only model being able to create synthetic networks with low average length paths and high clustering coefficients while still providing network variations.

Although Small-World networks process relatively high clustering coefficients and low average path lengths, their fixed edge number severely restricts model flexibility. BA networks inherit low clustering coefficients and the coefficient seems to deteriorate while the network becomes larger. The clustering coefficient measurement for the Configuration model is not applicable due to its parallel edges, making it more troublesome to measure the network density.

4.4 The Influence Decay

Influence Decay Phenomenon

For network modeling reality, the edge-formation probability implies influence, i.e., how vital a node signifies in the network or how many links the node is able to establish. Common sense indicates that any individual’s influence, even for the most important ones cannot last forever. While the system evolves, the influence of individual will diminish. This is what we address as the influence decay phenomenon.

To the network domain’s concern, influence decay implies that any node’s edge-forming probability ought to demise while the network evolves, i.e., any node only interacts with a vanishing fraction of previous nodes Bloem-Reddy et al. (2018). To explain how this influence decay is uniquely reproduced in FGM, we compare FGM network with BA network from a micro-perspective: the edge-formation probability dynamics of randomly sampled Gnodes.

For FGM, All Gnodes’ probabilities dissolve at around 200 nodes after their own arrivals, satisfying the phenomenon that “any individual’s influence eventually demises”. But even as the network grows 20 times bigger, those probabilities in BA network remain relatively stable, indicating sustained long-term influences.

Note that although we use the BA network as illustrations because of its profound influence, any generative model under our category property-based-solely models will potentially behave as the opposite of influence decay. Because properties inevitably grow while the network unfolds.

Although the edge-forming probabilities decay in FGM, the probability determinant $InfRt_{v}$ remains constant. What has changed is the less likelihood of current arrival including earlier ones in its neighbor resampling results as a result of introducing the order attribute into distance calculation.

This is an essential component of what we attempt to accomplish in FGM — the notable decayed influence with an intrinsic property remained. Because in many real-life situations, as the environment changes, individuals within have a tendency to remain in the status quo, i.e., “the leopard cannot change its spots”.

4.5 Large Scale with High Generation Proficiency

In this section, we demonstrate the generation time complexity of FGM by employing Locality-Sensitive Hashing (LSH) as the neighbor-searching function (i.e., $\Upsilon(\cdot$ )) in Equation 2.

Generation time of a 5000-node FGM network is shown in Figure 8. The generation time complexity of FGM is solely dependent upon the process of finding $PNbr$ owing to two factors:

1.

Every node only has to consider the possibility of forming edges with a small faction of existing nodes;
2.

The determinant of edge formation (i.e., $InfRt$ ) is constant, circumventing the necessity of recalculation when network changes.

A constant time of finding PNbr will lead to a network generation complexity of $O(n)$ , which is also the case in our demonstration of FGM with LSH’s $O(1)$ query time. This makes it possible for modeling any large-scale cubes within tolerable time as dataset scale will not accumulate the computational burden. However, it is not the case for models whose dynamics are dependent on the ever-changing network properties because of the need for recalculation. Here, we compare the time complexities of different models.

We believe the comparison of proportional time costs is much more suitable than the absolute time costs. With the very nature of network models being several pre-defined regulations, varied implementations of the same model will alter the time cost to a great extent. Still, if compared to the time cost of a small network, the proportional time costs of bigger networks will still reflect the model’s tolerance of large-scale data.

Thus, every model’s generation time of a 1000-node network is denoted as {1000} uniformly beforehand. We also include the OSN Evolution model Khan and Lee (2018b) which, like FGM, models data into networks.

# of Nodes	1000	2000	5000	10000
Configuration model	1.0	2.1	12.2	20.1
Small-World network	1.0	4.0	26.2	96.4
BA network	1.0	6.2	81.6	606.7
OSN Evolution model	1.0	9.2	130.5	1112.7
FGM	1.0	4.1	25.5	103.8

Table 2: The proportional time cost of models with unit as {1000}.

Table 2 shows that within the field of scale-free networks, the Configuration model, BA model Atwood et al. (2015), and FGM process linear generation time complexities. As for the other data-driven approach, the OSN Evolution model becomes extremely time-consuming as the scale gets bigger.

5 Cube Experimentations

5.1 Properties of Cube-Networks

Cubes	# of Nodes	# of Edges	Clustering Coefficient	Avg. Path Length	Time Cost (Sec.)
Youtube Channels	1000	9588	0.458	3.934	less than 1
COVID19 Vaccination	86512	711170	0.375	4.311	6
Global Terrorism Attacks	190036	1036035	0.245	5.756	10
UK Road-Traffic	2047256	12703808	0.282	3.501	117
Dutch Crimes	2596254	14435180	0.251	5.221	133
IP Traffic Flows	3577296	28797789	0.392	3.822	230

Table 3: Statistics of synthetic networks derived from six cubes of varied domains

In this section, we provide evaluations of FGM from a macro-perspective analysis, i.e., properties of interrelated networks from various cubes. We convert previously illustrated cubes in Figure 2 to interrelated networks and further investigate whether synthetic networks can reproduce typical patterns as real-life networks while, in the meantime, reflecting patterns of original cubes.

From Table 3, it is clear that, regardless of cube sizes, FGM synthetic networks can maintain high clustering coefficients with low average path lengths as factual networks. Figure 9 shows the degree distributions exhibit power-law patterns. Notably, when examined in conjunction with the $I$ distribution of the original cubes, these degree distributions also accurately capture the uniqueness of the original patterns, that is, the manner in which the data points are distributed in the plot.

Now, from a macro-perspective analysis, we believe that FGM can generate synthetic networks with interrelations to reality, where common properties of factual networks are reproduced and original data patterns are preserved to a certain degree.

5.2 Decayed Influence of 9-11

Here, we present the evaluations of FGM from a micro-perspective analysis, i.e., how the Gnode (i.e., 9-11 attack) in Global Terrorism Database (GTD) START (2020) reflects the prescribed influence decay phenomenon (see Section 4.4). To elaborate, we compare the edge dynamics of 9-11 attack in FGM and BA network of modeling GTD between 2001 and 2005.

Figure 10 shows that, in FGM Gnode 9-11’s degree remained stationary after around 3 months whereas the edge dynamic in BA network kept evolving throughout the simulation. That is to say, the 9-11 influence on other attacks in FGM eventually decays despite its enormous impact at the time, whereby, in property-based model like BA Network, the impact keeps growing. To make things worse, 9-11 is forever the biggest node for each new arrivals no matter how the world evolves. Nowadays clearly more recent events have bigger impacts.

From a micro-perspective analysis, we also believe FGM reflects factual phenomenon to a centain degree where previous works cannot.

6 Discussion

During the process of neighbors resampling (see Section 3.2), we obtained potential neighbors (i.e., $PNbr$ ) through the order and geographic attributes (i.e., $T$ and $C$ ). However, we believe the acquirement of $PNbr$ (i.e. $\Upsilon(\cdot)$ in Equation 2) doesn’t have to utilize node attributes at all. An alternative implementation of $\Upsilon(\cdot)$ can take the original dataset’s features as input and directly output PNbr through machine learning algorithms such as Bayesian methods, Clustering, or any others that produce classification results. The direct attainment of $PNbr$ introduce better flexibility to FGM as the geographic attribute $C$ is no longer required, i.e., FGM becomes geo-free allowing for geographic irrelevance data to be modeled into non-locale-based networks.

This alternative provides yet another benefit besides more flexibility: with $PNbr$ acquired beforehand, FGM’s network generation time will be fixed to O(n) regardless of model’s implementation details.

7 Conclusion

In this paper, we answer the question “Can we efficiently model cubes from varied domains to interrelated networks?” With the proposed FGM, we believe we reach a positive answer to this question. Extensive experiments show the generated networks can both reproduce patterns shared by factual networks and perserve the uniqueness of original data features from both marco-perspective and micro-perspective. Moreover, FGM’s decoupling from specific algorithm and linear generation time provide the approach with noble flexibility and efficiency.

FGM opens doors to utilizing network-unique methodologies for systematic dynamic analysis of cubes and to generating abundant synthetic networks for future network research.

References

Atwood et al. [2015] James Atwood, Bruno Ribeiro, and Don Towsley. Efficient network generation under general preferential attachment. Computational Social Networks, 2(1):7, December 2015.
Barabási and Albert [1999] Albert L. Barabási and Réka Albert. Emergence of scaling in random networks. Science, 286(5439):509–512, October 1999.
Bloem-Reddy et al. [2018] Benjamin Bloem-Reddy, Adam Foster, Emile Mathieu, and Yee W. Teh. Sampling and inference for beta neutral-to-the-left models of sparse networks. In Uncertainty in Artificial Intelligence, pages 477–486, Monterey, CA, August 2018. 34th Conference on Uncertainty in Artificial Intelligence (UAI).
Bojchevski et al. [2018] Aleksandar Bojchevski, Oleksandr Shchur, Daniel Zugner, and Stephan Gunnemann. Netgan: Generating graphs via random walks. In 35th International Conference on Machine Learning (ICML), July 2018.
Catanzaro et al. [2005] Michele Catanzaro, Marián Boguñá, and Romualdo Pastor-Satorras. Generation of uncorrelated random scale-free networks. Physical Review E, 71(2):027103, February 2005.
Chakrabarti and Faloutsos [2006] Deepayan Chakrabarti and Christos Faloutsos. Graph mining: Laws, generators, and algorithms. Acm Computing Surveys, 38(1):A1–A69, March 2006.
Cho et al. [2011] Eunjoon Cho, Seth A. Myers, and Jure Leskovec. Friendship and mobility: User movement in location-based social networks. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1082––1090, New York, NY, USA, August 2011. Association for Computing Machinery.
Chung et al. [2019] Wingyan Chung, Bingbing Rao, and Liqiang Wang. Interaction models for detecting nodal activities in temporal social media networks. Acm Transactions on Management Information Systems, 10(4), December 2019.
Clauset et al. [2009] Aaron Clauset, Cosma Rohilla, and Mark E.J. Newman. Power-law distributions in empirical data. SIAM REVIEW, 51(4):661–703, December 2009.
Derr et al. [2018] Tyler Derr, Charu Aggarwal, and Jiliang Tang. Signed network modeling based on structural balance theory. In 27th ACM International Conference on Information and Knowledge Management (CIKM), pages 557–566, New York, NY, USA, October 2018. Association for Computing Machinery.
Duan et al. [2019] Binyao Duan, Wenjian Luo, Hao Jiang, and Li Ni. Dynamic social networks generator based on modularity: Dsng-m. In 2nd International Conference on Data Intelligence and Security (ICDIS), pages 167–173, June 2019.
Gürsoy and Bertan [2022] Furkan Gürsoy and Badur Bertan. An agent-based modeling approach to brain drain. IEEE transactions on computational social systems, 9(2):9, March 2022.
Han and Wang [2022] Xuehua Han and Juanle Wang. Modelling and analyzing the semantic evolution of social media user behaviors during disaster events: A case study of covid-19. Isprs International Journal of Geo-Information, 11(7), July 2022.
Holland and Leinhardt [1981] Paul W. Holland and Samuel Leinhardt. An exponential family of probability-distributions for directed-graphs. Journal of the American Statistical Association, 76(373):33–50, 1981.
Inoue et al. [2020] Masaaki Inoue, Pham Thong, and Hidetoshi Shimodaira. Joint estimation of non-parametric transitivity and preferential attachment functions in scientific co-authorship networks. Journal of Informetrics, 14(3), August 2020.
Jha [2022] Suraj Jha. Most subscribed youtube channels. https://www.kaggle.com/datasets/surajjha101/top-youtube-channels-data, 2022.
Khan and Lee [2018a] Jebran Khan and Sungchang Lee. Online social networks (osn) evolution model based on homophily and preferential attachment. Symmetry-Basel, 10(11), November 2018.
Khan and Lee [2018b] Jebran Khan and Sungchang Lee. Online social networks (osn) evolution model based on homophily and preferential attachment. Symmetry-Basel, 10(11), November 2018.
Lai et al. [2022a] Guijun Lai, Yuzhen Shang, Binbao He, Guanwei Zhao, and Muzhuang Yang. Revealing taxi interaction network of urban functional area units in shenzhen, china. Isprs International Journal of Geo-Information, 11(7), July 2022.
Lai et al. [2022b] Guijun Lai, Yuzhen Shang, Binbao He, Guanwei Zhao, and Muzhuang Yang. Revealing taxi interaction network of urban functional area units in shenzhen, china. Isprs International Journal of Geo-Information, 11(7), July 2022.
Leskovec et al. [2005] Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. Graphs over time: Densification laws, shrinking diameters and possible explanations. In Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, pages 177––187, New York, NY, USA, August 2005. Association for Computing Machinery.
Leskovec et al. [2007] Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. Graph evolution: Densification and shrinking diameters. ACM Trans. Knowl. Discov. Data, 1(1):2––es, March 2007.
McAuley and Leskovec [2012] Julian McAuley and Jure Leskovec. Learning to discover social circles in ego networks. http://snap.stanford.edu/data/ego-Facebook.html, 2012.
Newman [2003] Mark E.J. Newman. The structure and function of complex networks. Siam Review, 45(2):167–256, Jun 2003.
Pastor-Satorras et al. [2001] Romualdo Pastor-Satorras, Alexei Vázquez, and Alessandro Vespignani. Dynamical and correlation properties of the internet. Physical Review Letters, 87(25):258701, November 2001.
Preda [2022] Gabriel Preda. Covid-19 world vaccination progress. https://www.kaggle.com/datasets/gpreda/covid-world-vaccination-progress, 2022.
Roeder et al. [2021] Michael Roeder, Nguyen Pham Thuy Sy, Felix Conrads, Ana Alexandra Morim da Silva, Axel-Cyrille Ngonga Ngomo, and Ieee. Lemming - example-based mimicking of knowledge graphs. In 15th IEEE International Conference on Semantic Computing (ICSC), pages 62–69, January 2021.
Rojas [2019] Juan S. Rojas. Ip network traffic flows labeled with 75 apps. https://www.kaggle.com/datasets/jsrojas/ip-network-traffic-flows-labeled-with-87-apps, 2019.
Romero et al. [2020] Juan Romero, Jorge Finke, and Andres Salazar. Fitness-weighted preferential attachment with varying number of new connections. In 8th International Conference on Complex Networks and Their Applications (COMPLEX NETWORKS), pages 612–620, December 2020.
Rozemberczki and Sarkar [2020] Benedek Rozemberczki and Rik Sarkar. Characteristic Functions on Graphs: Birds of a Feather, from Statistical Descriptors to Parametric Models. In Proceedings of the 29th ACM International Conference on Information and Knowledge Management (CIKM ’20), pages 1325––1334. ACM, 2020.
Rozemberczki and Sarkar [2021] Benedek Rozemberczki and Rik Sarkar. Twitch gamers: a dataset for evaluating proximity preserving and structural role-based node embeddings. http://snap.stanford.edu/data/twitch_gamers.html, 2021.
Rozemberczki et al. [2019] Benedek Rozemberczki, Carl Allen, and Rik Sarkar. Multi-scale attributed node embedding. http://snap.stanford.edu/data/github-social.html, 2019.
Sahafizadeh and Ladani [2020] Ebrahim Sahafizadeh and Behrouz Tork Ladani. A model for social communication network in mobile instant messaging systems. Ieee Transactions on Computational Social Systems, 7(1):68–83, February 2020.
Scheijen [2022] Max Scheijen. Dutch crimes. https://www.kaggle.com/datasets/maxscheijen/dutch-crimes, 2022.
Sottile et al. [2022] Sara Sottile, Ozan Kahramanogullari, and Mattia Sensi. How network properties and epidemic parameters influence stochastic sir dynamics on scale-free random networks. Journal of Simulation, August 2022.
START [2020] START. National consortium for the study of terrorism and responses to terrorism. https://www.start.umd.edu/gtd/, 2020.
Travers and Milgram [1977] Jeffrey Travers and Stanley Milgram. An experimental study of the small world problem. In Social Networks, pages 179–197. Academic Press, 1977.
UK Department for Transport [2022] UK Department for Transport. Road safety data. https://www.data.gov.uk/dataset/cb7ae6f0-4be6-4935-9277-47e5ce24a11f/road-safety-data/, 2022.
Watts and Strogatz [1998] Duncan J. Watts and Steven H. Strogatz. Collective dynamics of ’small-world’ networks. Nature, 393(6684):440–442, June 1998.
Wen et al. [2022] Jiaqi Wen, Bogdan Gabrys, and Katarzyna Musial. Toward digital twin oriented modeling of complex networked systems and their dynamics: A comprehensive survey. Ieee Access, 10:66886–66923, June 2022.
Zitnik et al. [2018] Marinka Zitnik, Rok Sosič, Sagar Maheshwari, and Jure Leskovec. BioSNAP Datasets: Stanford biomedical network dataset collection. http://snap.stanford.edu/biodata, August 2018.