This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

From Cubes to Networks: Fast Generic Model for Synthetic Networks Generation

Min Shaojie1&Liu Ji1 1Chongqing University {alexmin, liujiboy}@cqu.edu.cn
Abstract

Analytical explorations on complex networks and cubes (i.e., multi-dimensional datasets) are currently two separate research fields with different strategies. To gain more insights into cube dynamics via unique network-domain methodologies and to obtain abundant synthetic networks, we need a transformation approach from cubes into associated networks. To this end, we propose FGM, a fast generic model converting cubes into interrelated networks, whereby samples are remodeled into nodes and network dynamics are guided under the concept of nearest-neighbor searching. Through comparison with previous models, we show that FGM can cost-efficiently generate networks exhibiting typical patterns more closely aligned to factual networks, such as more authentic degree distribution, power-law average nearest-neighbor degree dependency, and the influence decay phenomenon we consider vital for networks. Furthermore, we evaluate the networks that FGM generates through various cubes. Results show that FGM is resilient to input perturbations, producing networks with consistent fine properties.

1 Introduction

Factual networks are often sophisticated and hard to analyze. Thanks to the discovery of common network patterns and the advancement of network-generation models Travers and Milgram (1977); Watts and Strogatz (1998); Barabási and Albert (1999), network researchers can utilize synthetic networks created by models, rather than merely relying on factual networks. Generative network models create synthetic graphs with provided inputs using sets of pre-defined regulations and equations Chakrabarti and Faloutsos (2006); Derr et al. (2018). The generated graph is only applicable when it matches patterns of reality Chakrabarti and Faloutsos (2006), i.e., if we can guarantee synthetic graphs are similar to real-life networks in certain aspects, perceptions will be provided into dynamic causalities on network development.

Integration of generative network models and real-life data inherits both the advantage of system dynamic modeling provided by unique network domain methodologies and the advantage of abundant real-life data from various research domains, i.e., to predict how a specific system evolves with the help of network approaches and to create sufficient synthetic networks with reality interrelations.

However, designing such a data-driven network model can be problematic. Firstly, heterogeneity lies in data format, i.e., traditional research mainly inquiries cubes (i.e. multi-dimensional data) whereas network studies nodes and connections. Secondly and more importantly, as Wen et al argued Wen et al. (2022) that the ultimate goal of the data-driven network model is to create a synthetic network as the “Digital Twin” of reality, it is vital for the generated network to be interrelated with the original data. Achieving this requirement is yet more arduous when we attempt the network to mimic real-life patterns. Additionally, the generator should also be cost-efficient and sustained to input perturbations.

Current solutions for such a transformation (i.e., from real-life data to networks) have limited applicability as their focuses are mainly confined to a single dataset or a specific research topic Chung et al. (2019); Roeder et al. (2021); Lai et al. (2022a); Gürsoy and Bertan (2022); Inoue et al. (2020). A relatively generic solution, the OSN Evolution model Khan and Lee (2018a), introduce a similarity-based method for user-network creation. Although its approach of mapping users’ characteristics into node similarities provides flexibility to the model, the generation process is computationally expensive and cannot be applied to non-user-based fields.

The primary goal of this paper is to create a generative model pursuing the aforementioned objectives. To this end, we propose FGM, Fast Generic Network-Generation Model for transformation from real-life data to interrelated networks. Real-life data for FGM input is in the form of multi-dimensional datasets (i.e., cubes). As a consequence, we refer to the input of FGM as cubes hereafter. The model converts samples in a cube into nodes with representative attributes and the network evolution is guided by those attributes and a potential-neighbor resampling strategy.

FGM is able to generate networks with low computational complexity while still reproducing real-life phenomena (e.g., properties of factual networks and the diminishing of individual’s influence). Through comparison with previous works and simulations with varied cubes, we provide validation for FGM’s excellent performance and robustness to perturbations.

Our methodology is highly flexible, but to ensure generated networks remain dynamic and retain desirable properties, we stipulate two integral restrictions on the input cube:

  1. 1.

    Individuals within ought to process orderly relationships, portraying the corresponding nodes’ successive sequence of arrival.

  2. 2.

    Individuals within ought to retain representative values, either generated from the original cube or manually assigned.

  3. 3.

    (Optional) Individuals within may have geo-graphic feature(s) specifying the positional information, which ultimately yields a location-based network.

It is worth mentioning that those restrictions don’t have to be directly satisfied by the cube. Instead, they can be calculated or inferred. We provide detailed elaborations concerning those attributes in Section 3.1.

2 Related Works

Generative Network Models

Previous generative models of networks can be roughly divided into two categories based on the formation determinant of the network. The first category is property-based-solely models, in which network arrangements and connections are solely determined by the network’s own properties. This type of model is invaluable for unveiling the causality of observed patterns, yet their abilities of associating with reality are still limited because no additional data is supplemented during the procedure. Examples include BA model Barabási and Albert (1999), Small-World Network Watts and Strogatz (1998), Binary Relation model Holland and Leinhardt (1981), Fitness-Weighted Preferential Attachment model Romero et al. (2020), etc. The second is network models with hybrid inputs. Inputs for this category include distribution sequences, a pre-defined graph partition, samples with attributes from a dataset, etc. Hybrid inputs provide generative models with not only additional information but also the potentiality to integrate with other research fields. Representative of this type is the Configuration model Newman (2003). Other examples include DSNG-M Duan et al. (2019), which produce dynamic graphs from community partitioning alongside provided initial graphs, and NetGAN Bojchevski et al. (2018), taking existing graphs with hidden links as input while learning the hidden details through sequential characteristics. To a certain degree, DSNG-M and NetGAN are analogous as they both require a part of the network provided beforehand. Those models can be innovative within their specific research focuses, however, the study of network generation with seamless integration of reality is not investigated.

Data-Driven Network Models

Data-driven network models explore integration of networks and factual data. For synthetic network systems with reality interrelations, Wen et al Wen et al. (2022) proposed a unified assessment criteria and argue that the ultimate goal of such an approach is creating a Digital Twin (DT) that perfectly matches reality. Unfortunately, current methodologies still fall far short of desired outcomes. Using network methods to model reality has received increasing attention over the years Sahafizadeh and Ladani (2020); Lai et al. (2022b); Sottile et al. (2022); Han and Wang (2022). As those explorations are restricted to a specific domain, their applicability is limited. OSN Evolution model Khan and Lee (2018b) is a relatively more generic approach for user-network generation but it can be computationally expensive. To our best knowledge, there are still no generic approaches for transformation from real-life data into interrelated networks.

3 The Model

FGM transmutes cubes into affiliated networks. As illustrated in Figure 1, the procedure involves three major steps: nodes generation for converting individuals into nodes; neighbors resampling for finding each node’s potential neighbors; and edge generation for settling the final connections.

Refer to caption
Figure 1: The workflow of FGM.

3.1 Nodes Generation

As a first step, we generate NN nodes with attributes according to the NN samples within the cube, denoted as V=(T,C,I)V=(T,C,I), where T,C,IT,C,I represent nodes’ order, geographic and influence attributes respectively.

The Order Attribute

The order attribute TT establishes a non-strict partial order relation“\leq” among nodes that is reflexive, antisymmetric, and transitive. As TT defines the ”order of arrival”, the simplest approach is employing original temporal information, nevertheless, any other intended features that satisfy the following restrictions are suitable: any ti,tj,tkTt_{i},t_{j},t_{k}\in T satisfies:

  1. 1.

    Reflexivity: titit_{i}\leq t_{i}, (i.e., every element is related to itself).

  2. 2.

    Anti-symmetry: if titjt_{i}\leq t_{j} and tjtit_{j}\leq t_{i}, then ti=tit_{i}=t_{i} (i.e., no two distinct elements precede each other).

  3. 3.

    Transitivity: if titjt_{i}\leq t_{j} and tjtkt_{j}\leq t_{k}, then titkt_{i}\leq t_{k}.

The Geographic Attribute

The geographic attribute CC can have more than one dimension. It specifies the positional information among nodes for finding potential neighbors thereafter. With respect to cubes with geo-based properties such as longitude and latitude, we simply apply those properties as CC, resulting in geo-based networks with positional nodes. For others without, we also provide an alternative strategy in Section 6.

The Influence Attribute

The influence attribute II indicates the node’s impact on the network by dictating the number of potential edges and establishing the edge-formation probabilities during Section 3.3. Although II can be manually assigned, our research has shown that due to the general applicability of the Pareto Principle, for a plentiful of cubes, individuals within can be portrayed with values that belong to a power-law population (examples in Figure 2 a.START (2020); b.UK Department for Transport (2022); c.Preda (2022), d.Rojas (2019), e.Jha (2022); f.Scheijen (2022)). This transformation is accomplished via a linear function:

infRtvs=Scale(αXs+β)infRt_{v_{s}}=Scale(\alpha{X_{s}}+\beta) (1)

where infRtvsIinfRt_{v_{s}}\in I is the matching node’s influence attribute, XsX_{s} is the original features of sample ss, and Scale()Scale(\cdot) is the min-max scaler controlling total number of edges. As for other cases where a power-law II cannot be attained, we consider them to have random II, and FGM is still capable of converting this type of cube into scale-free networks with desirable properties.

Refer to caption
Figure 2: Distributions (log-log) of influence attribute II from various cubes.

3.2 Neighbors Resampling

The following step is determining candidates of final neighbors for each node (i.e., potential neighbors). The TT sequence yields an arrival sequence of VV. For each tTt\in T, let vtv_{t} describe the corresponding node and Gt=(Vt,Et)G_{t}=(V_{t},E_{t}) describe the graph. The potential neighbors of vtv_{t}, denoted by 𝑃𝑁𝑏𝑟vt\mathit{PNbr_{v_{t}}}, are sampled through a nearest neighbor searching algorithm, denoted by Υ()\Upsilon(\cdot):

PNbrvt=Υkt(vt,Vt1)PNbr_{v_{t}}=\Upsilon_{k_{t}}(v_{t},V_{t-1}) (2)

Within Υ()\Upsilon(\cdot), the number of potential neighbors (i.e., ktk_{t}) of node vtv_{t} is determined by vtv_{t}’s influence attribute infRtvtinfRt_{v_{t}} regardless of its geographic attribute cvtc_{v_{t}}:

kt=min(nt1,ηinfRtvtθ)k_{t}=min(n_{t-1},\ \eta*infRt_{v_{t}}^{\theta}) (3)

where nt1=Vt1n_{t-1}=\mid V_{t-1}\mid and η,θ𝐑+\eta,\theta\in\mathbf{R}^{+} are parameters guiding the global edge scale and the degree deviation among nodes respectively.

As we have discussed previously, II is either pow-law or considered random. For the former, θ\theta is set to 1, and for the latter, θ\theta is generally between 3-9 to maintain power-law degree distribution. Once ktk_{t} is settled, the distance measurement within Υ()\Upsilon(\cdot) is obtained as:

d(i,j)=distmeasure(attri,attrj)d(i,j)=dist_{measure}(attr_{i},attr_{j}) (4)

where i,jVi,j\in V, distmeasure()dist_{measure}(\cdot) represents an intended measurement, and attrattr is the weighted order and geographic attributes:

attri=[μtti,μc(ci,1,ci,2,,ci,λ)]attr_{i}=[\mu_{t}t_{i},\ \mu_{c}(c_{i,1},\ c_{i,2},...,c_{i,\lambda})] (5)

where tiTt_{i}\in T, ciCc_{i}\in C, iVi\in V, and λ\lambda being the geographic dimension. Note that the order and geographic attributes (i.e., TT and CC) are both put into consideration when calculating the distance. Because in FGM, it is imperative that we fully account for the impact of both order and geographic properties on node associations (the benefit will be illustrated in Section 4.4).

As the result of neighbors resampling simply alters the recipient of connections, the only variation that can make an impact on the network’s overall properties here is ktk_{t}. This decoupling between specific algorithms and network overall properties gives more flexibility to the model. However, with different implementations leading to varied link participants, the interrelations between generated network and original data will ultimately be affected. As a result, although various implementations can be employed here, it is advised to choose according to the data under investigation.

3.3 Edges Generation

After the potential neighbors (i.e., PNbrPNbr) are determined, edges are generated under the control of the influence attribute II.

For network GtG_{t}, let Evt,j=1E_{v_{t},j}=1 indicates the existence of an undirected edge between node vtv_{t} and jj. The node vtv_{t} connects to jj with probability:

P(Evt,j=1)=Γ(infRtvt,infRtj)P(E_{v_{t},j}=1)=\Gamma(infRt_{v_{t}},\ infRt_{j}) (6)

where jPNbrvtj\in PNbr_{v_{t}} and Γ()\Gamma(\cdot) represents a generalized function shifting pairs of influence attributes to edge formation determinant.

4 Evaluation of Generated Networks

Here, we focus on patterns and measurements of synthetic FGM networks by comparing them with other generative network models. As a generic model, FGM evaluations should not be fixed with any specific cube. Hence, node attributes are obtained as follows throughout this section:

  • The order and geographic attributes (i.e., TT and CC) are created as random samples, whereby the dimension of CC is set to 2;

  • Two scenarios of the influence attribute (i.e., II), are evaluated. In one scenario, denoted by FGMp\mathrm{FGM_{p}}, II is acquired as samples from Lomax distribution with the probability density function as p(x)=αμα(x+μ)α+1p(x)=\frac{\alpha\mu^{\alpha}}{(x+\mu)^{\alpha+1}}, in which μ,α\mu,\alpha are set to 1 and 3 respectively. In the other scenario, denoted by FGMr,\mathrm{FGM_{r}}, II is obtained as random samples.

Other experiment settings and regulations that are shared across this section, if not specified otherwise, are listed below:

  • For the scale parameters in Equation 3, η\eta is set to 40 and θ\theta is set to 1 for FGMp\mathrm{FGM_{p}} and 9 for FGMr\mathrm{FGM_{r}}.

  • Previous models under comparison include ER network, Small-World network, and the two most representative models of our model categories in Section 2 — BA network and Configuration model.

  • The network scales under evaluation are 500, 1000, 2000, 5000, and 10000.

  • As TT and CC are random, the nearest neighbor algorithm in Equation 2 and distance measurement in Equation 4 won’t manipulate the simulation results. Nevertheless, to clarify, we employ KNN with Minkowski distance.

  • The version of FGM won’t affect what we try to convert after Section 4.3. For simplicity, we just utilize FGMp\mathrm{FGM_{p}}, directly addressing them as FGM.

4.1 Degree Distribution

In this experiment, we test whether synthetic networks produced by FGM portray intended degree distributions similar to reality. We use the term Gnode (giant node) referring to the few nodes with large degrees (i.e., the tail of the power-law distribution).

Refer to caption
Figure 3: Degree distributions of synthetic networks under varied network scales.

For both scenarios of FGM (see Figure 3), networks are able to exhibit power-law tendencies. Compared with other models (i.e., BA network and Configuration model) where the log-log curve gradient remains constant, the curve head of FGM is flatter than the tail, particularly in FGMp\mathrm{FGM_{p}}. Since the power law often applies only for values greater than some minimum xminx_{min} for real-life networks Clauset et al. (2009), we believe our model better captures the nature of factual degree distribution (see Figure 4). The results also indicate that the larger the network scale is, the more obvious the pattern becomes.

Refer to caption
Figure 4: Degree distributions of several factual networks and the corresponding simulations of FGMp\mathrm{FGM_{p}}.

Figure 4 shows the degree distribution of several real-life networks (i.e., a.Rozemberczki et al. (2019); b.McAuley and Leskovec (2012); c/e.Rozemberczki and Sarkar (2020); d.Leskovec et al. (2007)) and FGMp\mathrm{FGM_{p}}’s simulations with corresponding scales. Regardless of the field or scale of real-life networks, the actual distribution curve head is always flatter than the tail. This network property is correctly captured by our model while others cannot.

4.2 Average Nearest-Neighbor Degree

Starting here, we omit the assessment on ER networks because the randomness is too high to show observable patterns.

Another network measurement we present in detail is the average nearest-neighbor degree (ANND) Pastor-Satorras et al. (2001). As a network quantization of degree-degree correlations, ANND distribution reflects the connectivity details by specifying how connected nodes with certain degrees are. When ANND was first introduced in 2001, researchers found that the Internet in 1998 exhibited a distinct power-law ANND dependence and the same patterns have been later observed to be applicable to many other factual networks Pastor-Satorras et al. (2001); Catanzaro et al. (2005) (examples in Figure 5, a.Leskovec et al. (2005); b/c.Rozemberczki and Sarkar (2021); d.Zitnik et al. (2018); e.Rozemberczki et al. (2019); f.Cho et al. (2011)).

Refer to caption
Figure 5: ANND distributions of several factual networks.
Refer to caption
Figure 6: ANND distributions with fitting curves of FGM and other models under varied network scales. Results show clear power-law patterns in FGM.

The ANND simulation results under different network scales show that both cases of FGM display a clear power-law ANND dependency while others remain relatively constant. To this concern, we also believe FGM outperforms others because the power-law ANND tendencies were also observed in reality.

4.3 Other Network Properties

This section provides a comparison of other network properties that help assess the synthetic networks.

Model # of Nodes # of Edges Parallel Edges Clustering Coefficient Average Path Length
Small-World Network 500 2000 No 0.225 3.552
1000 4000 0.221 3.992
2000 8000 0.227 4.427
5000 20000 0.227 4.974
10000 40000 0.227 5.387
BA Network 500 7275 No 0.126 2.123
1000 14775 0.076 2.333
2000 29775 0.050 2.527
5000 74775 0.024 2.739
10000 149775 0.014 2.855
Configuration Model 500 3745 Yes NA 2.556
1000 7296 2.783
2000 14330 3.044
5000 36381 3.389
10000 73529 3.602
FGMp\mathrm{FGM_{p}} 500 3610 No 0.553 2.466
1000 7415 0.468 2.968
2000 13069 0.480 3.377
5000 39024 0.497 4.323
10000 76987 0.478 5.313
FGMr\mathrm{FGM_{r}} 500 4409 No 0.560 2.779
1000 9294 0.553 3.319
2000 20027 0.542 3.975
5000 53785 0.544 5.234
10000 103114 0.540 6.553
Table 1: Statistics of synthetic networks under different network scales (NA stands for ”Not Applicable”).

Table 1 shows that, under varied scales, FGM is the only model being able to create synthetic networks with low average length paths and high clustering coefficients while still providing network variations.

Although Small-World networks process relatively high clustering coefficients and low average path lengths, their fixed edge number severely restricts model flexibility. BA networks inherit low clustering coefficients and the coefficient seems to deteriorate while the network becomes larger. The clustering coefficient measurement for the Configuration model is not applicable due to its parallel edges, making it more troublesome to measure the network density.

4.4 The Influence Decay

Influence Decay Phenomenon

For network modeling reality, the edge-formation probability implies influence, i.e., how vital a node signifies in the network or how many links the node is able to establish. Common sense indicates that any individual’s influence, even for the most important ones cannot last forever. While the system evolves, the influence of individual will diminish. This is what we address as the influence decay phenomenon.

To the network domain’s concern, influence decay implies that any node’s edge-forming probability ought to demise while the network evolves, i.e., any node only interacts with a vanishing fraction of previous nodes Bloem-Reddy et al. (2018). To explain how this influence decay is uniquely reproduced in FGM, we compare FGM network with BA network from a micro-perspective: the edge-formation probability dynamics of randomly sampled Gnodes.

Refer to caption
Figure 7: Edge-forming probability dynamics of random sampled Gnodes under network of 2000 nodes.

For FGM, All Gnodes’ probabilities dissolve at around 200 nodes after their own arrivals, satisfying the phenomenon that “any individual’s influence eventually demises”. But even as the network grows 20 times bigger, those probabilities in BA network remain relatively stable, indicating sustained long-term influences.

Note that although we use the BA network as illustrations because of its profound influence, any generative model under our category property-based-solely models will potentially behave as the opposite of influence decay. Because properties inevitably grow while the network unfolds.

Although the edge-forming probabilities decay in FGM, the probability determinant InfRtvInfRt_{v} remains constant. What has changed is the less likelihood of current arrival including earlier ones in its neighbor resampling results as a result of introducing the order attribute into distance calculation.

This is an essential component of what we attempt to accomplish in FGM — the notable decayed influence with an intrinsic property remained. Because in many real-life situations, as the environment changes, individuals within have a tendency to remain in the status quo, i.e., “the leopard cannot change its spots”.

4.5 Large Scale with High Generation Proficiency

In this section, we demonstrate the generation time complexity of FGM by employing Locality-Sensitive Hashing (LSH) as the neighbor-searching function (i.e., Υ(\Upsilon(\cdot)) in Equation 2.

Refer to caption
Figure 8: Generation time of FGM with LSH and its fitting curve with original scatters in plot regularly sampled into 1/50 for legibility.

Generation time of a 5000-node FGM network is shown in Figure 8. The generation time complexity of FGM is solely dependent upon the process of finding PNbrPNbr owing to two factors:

  1. 1.

    Every node only has to consider the possibility of forming edges with a small faction of existing nodes;

  2. 2.

    The determinant of edge formation (i.e., InfRtInfRt) is constant, circumventing the necessity of recalculation when network changes.

A constant time of finding PNbr will lead to a network generation complexity of O(n)O(n), which is also the case in our demonstration of FGM with LSH’s O(1)O(1) query time. This makes it possible for modeling any large-scale cubes within tolerable time as dataset scale will not accumulate the computational burden. However, it is not the case for models whose dynamics are dependent on the ever-changing network properties because of the need for recalculation. Here, we compare the time complexities of different models.

We believe the comparison of proportional time costs is much more suitable than the absolute time costs. With the very nature of network models being several pre-defined regulations, varied implementations of the same model will alter the time cost to a great extent. Still, if compared to the time cost of a small network, the proportional time costs of bigger networks will still reflect the model’s tolerance of large-scale data.

Thus, every model’s generation time of a 1000-node network is denoted as {1000} uniformly beforehand. We also include the OSN Evolution model Khan and Lee (2018b) which, like FGM, models data into networks.

# of Nodes 1000 2000 5000 10000
Configuration model 1.0 2.1 12.2 20.1
Small-World network 1.0 4.0 26.2 96.4
BA network 1.0 6.2 81.6 606.7
OSN Evolution model 1.0 9.2 130.5 1112.7
FGM 1.0 4.1 25.5 103.8
Table 2: The proportional time cost of models with unit as {1000}.

Table 2 shows that within the field of scale-free networks, the Configuration model, BA model Atwood et al. (2015), and FGM process linear generation time complexities. As for the other data-driven approach, the OSN Evolution model becomes extremely time-consuming as the scale gets bigger.

5 Cube Experimentations

5.1 Properties of Cube-Networks

Cubes # of Nodes # of Edges Clustering Coefficient Avg. Path Length Time Cost (Sec.)
Youtube Channels 1000 9588 0.458 3.934 less than 1
COVID19 Vaccination 86512 711170 0.375 4.311 6
Global Terrorism Attacks 190036 1036035 0.245 5.756 10
UK Road-Traffic 2047256 12703808 0.282 3.501 117
Dutch Crimes 2596254 14435180 0.251 5.221 133
IP Traffic Flows 3577296 28797789 0.392 3.822 230
Table 3: Statistics of synthetic networks derived from six cubes of varied domains

In this section, we provide evaluations of FGM from a macro-perspective analysis, i.e., properties of interrelated networks from various cubes. We convert previously illustrated cubes in Figure 2 to interrelated networks and further investigate whether synthetic networks can reproduce typical patterns as real-life networks while, in the meantime, reflecting patterns of original cubes.

Refer to caption
Figure 9: Cubes InfRtInfRt distributions alongside degree distributions of corresponding networks FGM generates.

From Table 3, it is clear that, regardless of cube sizes, FGM synthetic networks can maintain high clustering coefficients with low average path lengths as factual networks. Figure 9 shows the degree distributions exhibit power-law patterns. Notably, when examined in conjunction with the II distribution of the original cubes, these degree distributions also accurately capture the uniqueness of the original patterns, that is, the manner in which the data points are distributed in the plot.

Now, from a macro-perspective analysis, we believe that FGM can generate synthetic networks with interrelations to reality, where common properties of factual networks are reproduced and original data patterns are preserved to a certain degree.

5.2 Decayed Influence of 9-11

Here, we present the evaluations of FGM from a micro-perspective analysis, i.e., how the Gnode (i.e., 9-11 attack) in Global Terrorism Database (GTD) START (2020) reflects the prescribed influence decay phenomenon (see Section 4.4). To elaborate, we compare the edge dynamics of 9-11 attack in FGM and BA network of modeling GTD between 2001 and 2005.

Refer to caption
Figure 10: GTD (2001-2005) modeling of FGM and BA networkt (line chart is horizontally aligned with network plots).

Figure 10 shows that, in FGM Gnode 9-11’s degree remained stationary after around 3 months whereas the edge dynamic in BA network kept evolving throughout the simulation. That is to say, the 9-11 influence on other attacks in FGM eventually decays despite its enormous impact at the time, whereby, in property-based model like BA Network, the impact keeps growing. To make things worse, 9-11 is forever the biggest node for each new arrivals no matter how the world evolves. Nowadays clearly more recent events have bigger impacts.

From a micro-perspective analysis, we also believe FGM reflects factual phenomenon to a centain degree where previous works cannot.

6 Discussion

During the process of neighbors resampling (see Section 3.2), we obtained potential neighbors (i.e., PNbrPNbr) through the order and geographic attributes (i.e., TT and CC). However, we believe the acquirement of PNbrPNbr (i.e. Υ()\Upsilon(\cdot) in Equation 2) doesn’t have to utilize node attributes at all. An alternative implementation of Υ()\Upsilon(\cdot) can take the original dataset’s features as input and directly output PNbr through machine learning algorithms such as Bayesian methods, Clustering, or any others that produce classification results. The direct attainment of PNbrPNbr introduce better flexibility to FGM as the geographic attribute CC is no longer required, i.e., FGM becomes geo-free allowing for geographic irrelevance data to be modeled into non-locale-based networks.

This alternative provides yet another benefit besides more flexibility: with PNbrPNbr acquired beforehand, FGM’s network generation time will be fixed to O(n) regardless of model’s implementation details.

7 Conclusion

In this paper, we answer the question “Can we efficiently model cubes from varied domains to interrelated networks?” With the proposed FGM, we believe we reach a positive answer to this question. Extensive experiments show the generated networks can both reproduce patterns shared by factual networks and perserve the uniqueness of original data features from both marco-perspective and micro-perspective. Moreover, FGM’s decoupling from specific algorithm and linear generation time provide the approach with noble flexibility and efficiency.

FGM opens doors to utilizing network-unique methodologies for systematic dynamic analysis of cubes and to generating abundant synthetic networks for future network research.

References

  • Atwood et al. [2015] James Atwood, Bruno Ribeiro, and Don Towsley. Efficient network generation under general preferential attachment. Computational Social Networks, 2(1):7, December 2015.
  • Barabási and Albert [1999] Albert L. Barabási and Réka Albert. Emergence of scaling in random networks. Science, 286(5439):509–512, October 1999.
  • Bloem-Reddy et al. [2018] Benjamin Bloem-Reddy, Adam Foster, Emile Mathieu, and Yee W. Teh. Sampling and inference for beta neutral-to-the-left models of sparse networks. In Uncertainty in Artificial Intelligence, pages 477–486, Monterey, CA, August 2018. 34th Conference on Uncertainty in Artificial Intelligence (UAI).
  • Bojchevski et al. [2018] Aleksandar Bojchevski, Oleksandr Shchur, Daniel Zugner, and Stephan Gunnemann. Netgan: Generating graphs via random walks. In 35th International Conference on Machine Learning (ICML), July 2018.
  • Catanzaro et al. [2005] Michele Catanzaro, Marián Boguñá, and Romualdo Pastor-Satorras. Generation of uncorrelated random scale-free networks. Physical Review E, 71(2):027103, February 2005.
  • Chakrabarti and Faloutsos [2006] Deepayan Chakrabarti and Christos Faloutsos. Graph mining: Laws, generators, and algorithms. Acm Computing Surveys, 38(1):A1–A69, March 2006.
  • Cho et al. [2011] Eunjoon Cho, Seth A. Myers, and Jure Leskovec. Friendship and mobility: User movement in location-based social networks. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1082––1090, New York, NY, USA, August 2011. Association for Computing Machinery.
  • Chung et al. [2019] Wingyan Chung, Bingbing Rao, and Liqiang Wang. Interaction models for detecting nodal activities in temporal social media networks. Acm Transactions on Management Information Systems, 10(4), December 2019.
  • Clauset et al. [2009] Aaron Clauset, Cosma Rohilla, and Mark E.J. Newman. Power-law distributions in empirical data. SIAM REVIEW, 51(4):661–703, December 2009.
  • Derr et al. [2018] Tyler Derr, Charu Aggarwal, and Jiliang Tang. Signed network modeling based on structural balance theory. In 27th ACM International Conference on Information and Knowledge Management (CIKM), pages 557–566, New York, NY, USA, October 2018. Association for Computing Machinery.
  • Duan et al. [2019] Binyao Duan, Wenjian Luo, Hao Jiang, and Li Ni. Dynamic social networks generator based on modularity: Dsng-m. In 2nd International Conference on Data Intelligence and Security (ICDIS), pages 167–173, June 2019.
  • Gürsoy and Bertan [2022] Furkan Gürsoy and Badur Bertan. An agent-based modeling approach to brain drain. IEEE transactions on computational social systems, 9(2):9, March 2022.
  • Han and Wang [2022] Xuehua Han and Juanle Wang. Modelling and analyzing the semantic evolution of social media user behaviors during disaster events: A case study of covid-19. Isprs International Journal of Geo-Information, 11(7), July 2022.
  • Holland and Leinhardt [1981] Paul W. Holland and Samuel Leinhardt. An exponential family of probability-distributions for directed-graphs. Journal of the American Statistical Association, 76(373):33–50, 1981.
  • Inoue et al. [2020] Masaaki Inoue, Pham Thong, and Hidetoshi Shimodaira. Joint estimation of non-parametric transitivity and preferential attachment functions in scientific co-authorship networks. Journal of Informetrics, 14(3), August 2020.
  • Jha [2022] Suraj Jha. Most subscribed youtube channels. https://www.kaggle.com/datasets/surajjha101/top-youtube-channels-data, 2022.
  • Khan and Lee [2018a] Jebran Khan and Sungchang Lee. Online social networks (osn) evolution model based on homophily and preferential attachment. Symmetry-Basel, 10(11), November 2018.
  • Khan and Lee [2018b] Jebran Khan and Sungchang Lee. Online social networks (osn) evolution model based on homophily and preferential attachment. Symmetry-Basel, 10(11), November 2018.
  • Lai et al. [2022a] Guijun Lai, Yuzhen Shang, Binbao He, Guanwei Zhao, and Muzhuang Yang. Revealing taxi interaction network of urban functional area units in shenzhen, china. Isprs International Journal of Geo-Information, 11(7), July 2022.
  • Lai et al. [2022b] Guijun Lai, Yuzhen Shang, Binbao He, Guanwei Zhao, and Muzhuang Yang. Revealing taxi interaction network of urban functional area units in shenzhen, china. Isprs International Journal of Geo-Information, 11(7), July 2022.
  • Leskovec et al. [2005] Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. Graphs over time: Densification laws, shrinking diameters and possible explanations. In Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, pages 177––187, New York, NY, USA, August 2005. Association for Computing Machinery.
  • Leskovec et al. [2007] Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. Graph evolution: Densification and shrinking diameters. ACM Trans. Knowl. Discov. Data, 1(1):2––es, March 2007.
  • McAuley and Leskovec [2012] Julian McAuley and Jure Leskovec. Learning to discover social circles in ego networks. http://snap.stanford.edu/data/ego-Facebook.html, 2012.
  • Newman [2003] Mark E.J. Newman. The structure and function of complex networks. Siam Review, 45(2):167–256, Jun 2003.
  • Pastor-Satorras et al. [2001] Romualdo Pastor-Satorras, Alexei Vázquez, and Alessandro Vespignani. Dynamical and correlation properties of the internet. Physical Review Letters, 87(25):258701, November 2001.
  • Preda [2022] Gabriel Preda. Covid-19 world vaccination progress. https://www.kaggle.com/datasets/gpreda/covid-world-vaccination-progress, 2022.
  • Roeder et al. [2021] Michael Roeder, Nguyen Pham Thuy Sy, Felix Conrads, Ana Alexandra Morim da Silva, Axel-Cyrille Ngonga Ngomo, and Ieee. Lemming - example-based mimicking of knowledge graphs. In 15th IEEE International Conference on Semantic Computing (ICSC), pages 62–69, January 2021.
  • Rojas [2019] Juan S. Rojas. Ip network traffic flows labeled with 75 apps. https://www.kaggle.com/datasets/jsrojas/ip-network-traffic-flows-labeled-with-87-apps, 2019.
  • Romero et al. [2020] Juan Romero, Jorge Finke, and Andres Salazar. Fitness-weighted preferential attachment with varying number of new connections. In 8th International Conference on Complex Networks and Their Applications (COMPLEX NETWORKS), pages 612–620, December 2020.
  • Rozemberczki and Sarkar [2020] Benedek Rozemberczki and Rik Sarkar. Characteristic Functions on Graphs: Birds of a Feather, from Statistical Descriptors to Parametric Models. In Proceedings of the 29th ACM International Conference on Information and Knowledge Management (CIKM ’20), pages 1325––1334. ACM, 2020.
  • Rozemberczki and Sarkar [2021] Benedek Rozemberczki and Rik Sarkar. Twitch gamers: a dataset for evaluating proximity preserving and structural role-based node embeddings. http://snap.stanford.edu/data/twitch_gamers.html, 2021.
  • Rozemberczki et al. [2019] Benedek Rozemberczki, Carl Allen, and Rik Sarkar. Multi-scale attributed node embedding. http://snap.stanford.edu/data/github-social.html, 2019.
  • Sahafizadeh and Ladani [2020] Ebrahim Sahafizadeh and Behrouz Tork Ladani. A model for social communication network in mobile instant messaging systems. Ieee Transactions on Computational Social Systems, 7(1):68–83, February 2020.
  • Scheijen [2022] Max Scheijen. Dutch crimes. https://www.kaggle.com/datasets/maxscheijen/dutch-crimes, 2022.
  • Sottile et al. [2022] Sara Sottile, Ozan Kahramanogullari, and Mattia Sensi. How network properties and epidemic parameters influence stochastic sir dynamics on scale-free random networks. Journal of Simulation, August 2022.
  • START [2020] START. National consortium for the study of terrorism and responses to terrorism. https://www.start.umd.edu/gtd/, 2020.
  • Travers and Milgram [1977] Jeffrey Travers and Stanley Milgram. An experimental study of the small world problem. In Social Networks, pages 179–197. Academic Press, 1977.
  • UK Department for Transport [2022] UK Department for Transport. Road safety data. https://www.data.gov.uk/dataset/cb7ae6f0-4be6-4935-9277-47e5ce24a11f/road-safety-data/, 2022.
  • Watts and Strogatz [1998] Duncan J. Watts and Steven H. Strogatz. Collective dynamics of ’small-world’ networks. Nature, 393(6684):440–442, June 1998.
  • Wen et al. [2022] Jiaqi Wen, Bogdan Gabrys, and Katarzyna Musial. Toward digital twin oriented modeling of complex networked systems and their dynamics: A comprehensive survey. Ieee Access, 10:66886–66923, June 2022.
  • Zitnik et al. [2018] Marinka Zitnik, Rok Sosič, Sagar Maheshwari, and Jure Leskovec. BioSNAP Datasets: Stanford biomedical network dataset collection. http://snap.stanford.edu/biodata, August 2018.