Networks with Growth and Preferential Attachment: Modeling and Applications

Gabriel G. Piva¹¹1 https://orcid.org/0000-0001-7636-6568 [email protected] Departamento de Física, Universidade Federal de Lavras, 37200-900 Lavras, MG, Brazil Departamento de Fisica, Pontifícia Universidade Católica do Rio de Janeiro, 22451-900 Rio de Janeiro, RJ, Brazil Fabiano L. Ribeiro²²2 https://orcid.org/0000-0002-2719-6061 [email protected] Departamento de Física, Universidade Federal de Lavras, 37200-900 Lavras, MG, Brazil Angélica S. Mata³³3 https://orcid.org/0000-0002-3892-5274 [email protected] Departamento de Física, Universidade Federal de Lavras, 37200-900 Lavras, MG, Brazil

Abstract

In this article we presented a brief study of the main network models with growth and preferential attachment. Such models are interesting because they present several characteristics of real systems. We started with the classical model proposed by Barabàsi and Albert Barabási and Albert (1999): nodes are added to the network connecting preferably to other nodes that are more connected. We also presented models that consider more representative elements from social perspectives, such as the homophily between the vertices or the fitness that each node has to build connections Bianconi and Barabási (2001); de Almeida Maurício L. et. al. (2013). Furthermore, we showed a version of these models including the Euclidean distance between the nodes as a preferential attachment rule Soares et al. (2005). Our objective is to investigate the basic properties of these networks as distribution of connectivity, degree correlation, shortest path, cluster coefficient and how these characteristics are affected by the preferential attachment rules. Finally, we also provided a comparison of these synthetic networks with real ones. We found that characteristics as homophily, fitness and geographic distance are significant preferential attachment rules to modeling real networks. These rules can change the degree distribution form of these synthetic network models and make them more suitable to model real networks.

I Introduction

Complex systems has become a widely applied area of research because of everything around us can be described by a complex network, including social, technological or biological organisms. The growth and the preferential attachment considering that a node has higher probability to connect with a other node that already have many edges are famous ingredients Barabási and Albert (1999) to produce a power law degree distribution, frequently used topology to describe real systems.

In general, it has been shown that real networks present a power law degree distribution with $2<\gamma<3$ Radicchi (2015); Newman and Park (2003); Radicchi and Claudio (2015); Caldarelli (2007); Newman (2002); Barabási and Pãsfai (2016); Song et al. (2010); Ebel et al. (2002); Leskovec et al. (2007). However, this is a controversial topic Broido and Clauset (2019); Holme (2019); Barabási and Pãsfai (2016). In a recent study, Broido and Clauset Broido and Clauset (2019) investigated nearly 1000 of real networks using statistical tools. They showed evidences that power-law degree structured is not usual to be found in real-life. They evaluated social, biological, technological, transportation, and information networks. Their main conclusion is social networks are weakly scale-free while technological and biological networks are strongly scale-free. However, they also found that $51\%$ of the real data set can be classified as some kind of scale-free category. Barabàsi also arguments⁴⁴4blog post: https://www.barabasilab.com/post/love-is-all-you-need that real networks, ruled by growth and preferential attachment, have power law with an exponential cutoff degree distribution.

In this paper, we investigated social and technological real networks and we found that they can be modeled by networks with growth and preferential attachment. To account for more realistic aspects, we considered other concepts in the preferential attachment as fitness Bianconi and Barabási (2001), homophily de Almeida Maurício L. et. al. (2013), and Euclidean distance between nodes Soares et al. (2005). Indeed, social systems often present these kind of feature’s connections Currarini et al. (2016); Bisgin et al. (2010) and real-world systems in general are often embedded in Euclidean space Rozenfeld et al. (2002); Liu (2018); Laniado and Kaltenbrunner (2017); Lengyel et al. (2015). We investigated the phone calls Song et al. (2010), collaboration Leskovec et al. (2007) and e-mails networks Ebel et al. (2002). The first two are social networks because they describe family, friendship and/or professional interactions while the email network behaves as a technological network. We also found that the email network present a more “scale-free” behavior in its degree distribuiton while social networks are better described by a $q$ -exponential degree distribution, according to the model proposed by Soares and collaborators Soares et al. (2005).

The paper is divided as follows: The detailed description of networks models with growth and different rules of preferential attachments are found in section II, where we also studied some properties of these networks as degree distribution and assortativity. The main information and results about the networks are summarized in table (1). In section III, we provided a comparison of these synthetic networks with real ones. At last, we presented our final considerations in section IV.

II Network Models with Growth and Preferential Attachment

A network model has properties similar to real systems. Networks are considered a powerful tools to represent patterns of connections between parts of systems such as Internet, power grid, food webs, social networks, etc Caldarelli (2007); Bollobás and Riordan (2003); Dorogovtsev and Mendes (2002). Some particular metrics properties, like degree distribution, shortest path length, and clustering coefficient have been attracted attention of physics communities.
Watts and Strogatz Watts and Strogatz (1998) shows that real networks is characterized by average shortest path distance between two vertex and large clustering coefficient, describing these properties by small-world model.

Based on that, Barabàsi-Albert Barabási and Albert (1999) proposed two basics mechanisms that try to better characterize a real network: growth of system, adding new agents and preferential attachment, where a new agent connects preferentially with most connected nodes already on the network. The web expands with adding of new documents which links with older or well known sites, for instance. The probability that a new node will connect to a node with $k$ links is proportional to $k$ , independently of geographic distance.

However, there are other examples of real networks whose connectivity may depend on the geographic distance between the nodes, as a power grid. In addition to geographic distance, there may be other relevant ingredients to consider when connecting the elements of the system. Social interaction between people have intrinsic characteristics that should be taken into account as for example the influence one person has on another and the affinity between them, representing friendship, familiar or professional ties.

To model these features, some networks have been studied through over the years. We presented below some of them that consider preferential attachment rules according to the degree (Barabàsi-Albert model Barabási and Albert (1999)), or the fitness of the node to make connections Bianconi and Barabási (2001), or the homophiy between them de Almeida Maurício L. et. al. (2013), and finally, according to the euclidean distance between the nodes Soares et al. (2005).

II.1 Barabási-Albert Network

To explain in a simple way the behavior of technological networks, such as internet, Barabàsi and Albert Barabási and Albert (1999) proposed the following model:

•

The system starts with $m_{0}$ nodes connected to each other.
•

At each time step, a new node $j$ is entered on the network and it connects to a random node $i$ chosen at random with probability $\Pi(k_{i}|j)$ proportional to its degree ( $k_{i}$ ), which means

$\Pi(k_{i}|j)=\frac{k_{i}}{\sum_{n}k_{n}}$ (1)

where the normalization $\sum_{n}k_{n}$ is the sum over all degree $k_{n}$ of each node $n$ already connected on the network.

These rules define what is know by Barabàsi-Albert (BA) model, and generate a network with a distribution of connectivity, say $P(k)$ , that follows a power-law degree distribution of the form $P(k)\sim k^{-\gamma}$ , with $\gamma=3.0$ , in the thermodynamic limit, which is independent of the value of $m_{0}$ , as shown in figure 1.

Refer to caption — Figure 1: Distribution of the connectivity degree P(k) of the BA network. Dots are the average over $10^{3}$ networks of size $N=10^{5}$ and $m_{0}=3$ . The dashed line has a slope $P(k)\sim k^{-3}$ and serves as a guide for the eyes.

We can also calculate the clustering coefficient, say $\langle C\rangle$ , of the BA network Dorogovtsev and Mendes (2004). It is the tendency of the network to form fully connected sub-graphs in the neighborhood of a given vertex, and grows with the network size $N$ as:

\langle C\rangle\sim\frac{[\ln(N)]^{2}}{N}

(2)

We showed this behavior in figure 2. The simulation data follow the same bias as given by equation 2.

Other important measure of networks is called shortest path length. The distance between two any nodes $i$ and $j$ is defined as the number of links in the shortest path that connects them, named $d_{ij}$ . The measure that represents the average over all shortest paths that link all the possible pairs of vertices in the network is called the average shortest path length $\langle d\rangle$ Dorogovtsev and Mendes (2004). For BA network, it is given by $\langle d\rangle\sim\frac{\log N}{\log(\log N)}$ , confirming its small world property Bollobás and Riordan (2003).

Other feature that should be analysed is the degree correlation. The nodes of a network can present a tendency to connect with other nodes that have a similar or dissimilar degree. When the first case happens one says the network is assortative correlated and if the second case occurs, the network is categorized as a disassortative correlated Newman (2002).

The simplest and most used way to quantify the degree correlation is given by the average degree of the nearest neighbors (nn) of a vertex $i$ with degree $k_{i}$ Dorogovtsev and Mendes (2002),

k_{nn,i}=\frac{1}{k_{i}}\sum_{j\in\mathcal{N}(i)}k_{j},

(3)

where the sum runs over by the nearest neighbors vertices of $i$ , represented by the set $\mathcal{N}(i)$ . The degree correlation is obtained by the average degree of the nearest neighbors, $k_{nn}(k)$ , for vertices of degree $k$ Pastor-Satorras et al. (2001). That is,

k_{nn}(k)=\frac{1}{N_{k}}\sum_{i|k_{i}=k}k_{nn,i},

(4)

where $N_{k}$ is the number of nodes of degree $k$ and the sum runs over all vertices with the same degree $k$ . This quantity is related to the correlations between the degrees of connected nodes because in average it can be expressed as

k_{nn}(k)=\sum_{k^{\prime}}k^{\prime}P(k^{\prime}|k),

(5)

where $P(k^{\prime}|k)$ is the probability of a node with degree $k$ to have a neighbour node with degree $k^{\prime}$ . If degrees of neighboring vertices are uncorrelated, $P(k^{\prime}|k)$ is just a function of $k^{\prime}$ and $k_{nn}(k)$ is a constant. If $k_{nn}$ increases with $k$ then vertices with high degrees have a larger likelihood of being connected to each other. If $k_{nn}$ decreases with $k$ , high degree vertices have larger probabilities of have neighbors with low degrees Pastor-Satorras et al. (2001); Barrat et al. (2008).

Table 1: Table with all main informations of the networks that were investigated in this work. The mean clustering coefficient

\langle C\rangle

and the Pearson correlation coefficient

c_{P}

are obtained for a sample of 1000 networks with size

N=10^{4}

. The average shortest path length

\langle d\rangle

is obtained for a sample of at least 20 networks with the same size. For networks with Euclidean distance we consider always

\alpha_{A}=3

, since the topological phase transition occurs for

\alpha_{A}\approx 2.

The preferential attachment rule is shown in the column

\prod(k_{i}|j)

. P(k) is the degree distribution form of each model, and

\gamma

is the exponent related to a power law degree distribution that characterizes the first three networks that were investigated.

Network	$\Pi(k_{i}\|j)$	$P(k)$	$\gamma$	$\langle C\rangle$	$\langle d\rangle$	$c_{p}$
Barabási-Albert	$\frac{k_{i}}{\sum_{n}k_{n}}$	$\sim k^{-\gamma}$	3	0.0055(5)	4.3(1)	-0.037(4)
Fitness Model	$\frac{\eta_{i}k_{i}}{\sum_{n}k_{n}}$	$\sim k^{-\gamma}$	2.25	0.028(7)	4.1(2)	-0.09(1)
(Bianconi-Barabási)
Homophilic Model	$\frac{(1-{\cal A}_{ij})k_{i}}{\sum_{n}(1-{\cal A}_{in})k_{n}}$	$\sim k^{-\gamma}$	2.75	0.015(3)	4.2(2)	-0.038(4)
Euclidean Distance Model	$\frac{k_{i}r_{ij}^{-\alpha_{A}}}{\sum_{n}k_{i}r_{in}^{-\alpha_{A}}}$	$\sim e_{q}^{-k/\kappa}$	-	0.0019(2)	4.7(1)	0.034(7)
Fitness Model	$\frac{\eta_{i}k_{i}r_{ij}^{-\alpha_{A}}}{\sum_{n}\eta_{n}k_{i}r_{in}^{-\alpha_{A}}}$	$e_{q}^{-k/\kappa}$	-	0.0034(4)	4.6(1)	-0.046(8)
with euclidean distance
Homophilic Model	$\frac{(1-{\cal A}_{ij})k_{i}r_{ij}^{-\alpha_{A}}}{\sum_{n}(1-{\cal A}_{in})k_{i}r_{in}^{-\alpha_{A}}}$	$e_{q}^{-k/\kappa}$	-	0.0020(2)	4.7(1)	0.028(7)
with euclidean distance

The BA network is weakly disassortative as we showed in the figure 3. We observe that the preferential attachment interferes just in the connectivity of nodes recently added in the network. According to the rule of the model, these nodes connect preferably with hubs, creating a disassortative correlation for small values of $k$ . But, as long as the degree grows, the network becomes almost uncorrelated.

We also can use the Pearson coefficient, named $c_{P}$ , to quantify degree correlations, according to the expression Barrat et al. (2008):

c_{P}=\frac{\sum_{e}j_{e}k_{e}/E-[\sum_{e}(j_{e}+k_{e})/(2E)]^{2}}{[\sum_{e}(j_{e}^{2}+k_{e}^{2})/(2E)]-[\sum_{e}(j_{e}+k_{e})/(2E)]^{2}},

(6)

where $j_{e}$ and $k_{e}$ are the degrees of the nodes that are in the beginning and in the end of the edge $e$ , and $E$ is the total number of connections. This quantity ranges from $-1$ to $1$ meaning disassortative and assortative networks, respectively. It is a complementary information to the $k_{nn}(k)$ measure. While the latter provides how the degree correlation can vary with $k$ , the Pearson coefficient ( $c_{P}$ ) quantifies the degree correlation of the entire network according to a scale ranging from -1 to 1. This measure was also used to complement the characterization of a topological phase transition on growth and preferential attachment model that consider the euclidean distance between the nodes, as we will see in section II.4. In addition, it will be useful to compare the synthetic networks with real ones, in section III.

All the main information of the BA network is summarized in the table 1, as well the information about other networks that were also treated in this paper. In general, real networks present a power law degree distribution with $2<\gamma<3$ Radicchi and Claudio (2015); Newman (2002). So, the BA model is restricted to describe a large set of them because its degree exponent is fixed $\gamma\approx 3$ . Next, we show other features that can be added to the model to make it more realistic.

II.2 Fitness Model: Bianconi-Barabási Network

The original BA model produces a power-law network with the presence of sites that become privileged that is, with more connections over time. But this model does not taking into account the competitiveness, this means, the ability of younger nodes to acquire new neighbors. Facebook, for example, has become one of the most visited sites in a short period of time when compared to the Google, an older search website. Another example is the growth of corporations where some newer ones concentrate more services than older ones. We can simulate this situation including a intrinsic characteristic in each node, called fitness. In social networks, fitness would represent an individual’s attribute of becoming more popular due to some quality of him/her. In networks, it represent the probability of a node to become a hub quickly.

This characteristic was observed in real networks by Bianconi and Barabàsi, who later proposed an alternative model including the fitness factor $\eta_{i}$ of each node $i$ Bianconi and Barabási (2001). The algorithm is similar to the BA network, but each site connects to an existing node on the network with a probability that, in addition to depending on $k$ connectivity, is also proportional to the attractiveness $\eta_{i}$ . The choice of $\eta_{i}\in[0,1]$ is usually given by a uniform distribution $\rho(\eta_{i})$ Bianconi and Barabási (2001), and the connection probability is defined by:

\Pi(k_{i}|j)=\frac{\eta_{i}k_{i}}{\sum_{n}\eta_{n}k_{n}}.

(7)

When the fitness parameter is imposed, the network remains a power-law degree distribution but with an exponent less than 3 (see figure 4). According to the literature, $\gamma=2.25$ in the thermodynamic limit. There are more privileged sites and, consequently, more hubs than the BA network, which makes the $\gamma$ exponent smaller, that is, the network is more heterogeneous. In the inset of figure 4 we show the degree correlation measured through the nearest-neighbors degree for the Fitness model. The behavior is similar to that one we found for BA network. We also calculate the mean clustering coefficient $\langle C\rangle$ , the average shortest path length $\langle d\rangle$ , and the Pearson correlation coefficient $c_{P}$ . These informations are shown in table 1. We observed that, when compared to BA model, this network presents a higher cluster coefficient and a lower Pearson correlation coefficient, but the average shortest path length pretty does not change.

II.3 Homophilic Model

In a social network, people tend to relate to others who share common characteristics such as musical taste, football team, religion, and work. To take this tendency into account in social network models, we can include a connection parameter called homophily.

Almeida and colleagues de Almeida Maurício L. et. al. (2013) proposed a model introducing this parameter through an intrinsic property value of each node, called $\eta_{i}$ , similar to the previous model. The homophily between any two nodes $i$ and $j$ , say $\mathcal{A}_{ij}$ , is defined as the module of the difference between $\eta_{i}$ and $\eta_{j}$ , that is, $\mathcal{A}_{ij}=|\eta_{i}-\eta_{j}|$ . The lower is $\mathcal{A}_{ij}$ , the greater the affinity between both and, consequently, the greater the probability to connect with each other. The proposed algorithm is as follows:

•

It starts with $m_{0}$ sites connected to each other, in the same way as in the BA network, but introducing a characteristic $\eta_{i}$ for each node, chosen randomly in a uniform distribution $p(\eta)$ in the interval $[0,1]$ .

•

For each time step, add a node $j$ that links to other $m_{0}$ nodes already on the network. Each $j$ node connects preferably to a node $i$ according to the probability

\Pi(k_{i}|j)=\frac{(1-\mathcal{A}_{ij})k_{i}}{\sum_{n}(1-\mathcal{A}_{in})k_{n}}

(8)

•

The procedure of the second item is repeated until the network reaches a previously established size $N$ .

The competition between the degree of connectivity and the affinity between the nodes generates a network with a power law degree distribution, but with $\gamma\sim 2.75$ , as we shown in figure 5. This value is lesser than the exponent obtained in the BA model ( $\gamma=3.0$ ) but it is greater than the value obtained in the Fitness network ( $\gamma=2.25$ ). This difference is explained because in the BA network, only the degree of connectivity is considered to make links, which generates the phenomenon “rich gets richer”. In the Fitness model, nodes newly inserted in the network can become hubs more easily as long as they have high fitness. That is, there is a democratization because a node can become hub regardless of its age, as is the case of facebook and google network, for example. In the homophilic model, a node $j$ can assume a value of $\eta_{j}=0.5$ , for example. When this happens, if it tries to connect to a node $i$ that has $\eta_{i}=0.3$ or another node $k$ which has $\eta_{k}=0.7$ , the affinities between both pairs $ij$ and $jk$ are the same. So, in this case, according to the expression (8) who dictates the preference in the connection is the degree of the node de Almeida Maurício L. et. al. (2013).

In the inset of figure 5 we show the degree correlation measured through the nearest-neighbors degree for the Homophilic model. The behavior is similar to that one we found for the other networks. We also calculate the mean clustering coefficient $\langle C\rangle$ , the average shortest path length $\langle d\rangle$ , and the Pearson correlation coefficient $c_{P}$ . These informations are shown in table 1. We observed that, when compared to BA model, this network presents almost the same $\langle d\rangle$ and $c_{P}$ but a slightly higher clustering coefficient.

II.4 Networks with Euclidean distance

The models presented above do not take into account the spatial distance between the agents that compose the network. But in many real systems, that variable plays an important role. For example, in the city model proposed by Ribeiro and colaborators Ribeiro et al. (2017), the authors observed how the Euclidean distance influences the potential of cities and in scale’s law to measure socio-economic and infrastructure indicators. There are other works that showed the relation between social interaction and spatial properties Laniado and Kaltenbrunner (2017); Liu et al. (2019); Lengyel et al. (2015). For example, in the paper Laniado and Kaltenbrunner (2017), the authors analyzed online social networks and they found that spatial distance restricts who interacts with whom and denser connected groups tend to arise at shorter spatial distances. In the following subsections we reconstructed the standard models shown previously including euclidean distance between the nodes as an attachment ingredient.

II.4.1 Model proposed by Soares et. al

Soares and colleagues Soares et al. (2005) built a model in which the preferred connection dynamics happens according to the degree of connectivity but also considers the Euclidean distance between the nodes. To build the model, we consider that each site is inserted on a continuous plane. The first node is added at an arbitrary distance from the origin and the others are isotropically distributed with a probability $P_{G}(r)\propto r^{-(2+\alpha_{G})}$ , which depends on the distance $r$ from the center of mass, which is positioned at $r_{cm}$ from the origin and is re-calculate at each time step. The exponent $\alpha_{G}$ ( $G$ refers to the word growth) is responsible for the network growth, that is, defines how close or distant the nodes will be placed. The calculation of the position of the center of mass is given by $\mathbf{r}_{cm}=\frac{1}{M}\sum_{n=1}^{N}m_{n}\mathbf{r}_{n}$ where $m_{n}$ is the mass of the node $n$ , and $\mathbf{r}_{n}$ is the vector-distance of this node to the origin, and $M=\sum_{n=1}^{N}m_{n}$ is the total mass. The network has a total of $N$ nodes, and we can consider each node with mass $m_{n}=1$ , so we have $\mathbf{r}_{cm}=\frac{1}{N}\sum_{n=1}^{N}\mathbf{r}_{n}$ . Each new site $j$ connects to a pre-existing node $i$ following a rule of preferential connection that depends on the distance between them, $r_{ij}$ and the degree of connectivity of the node $i$ , that is,

\Pi(k_{i}|j)=\frac{k_{i}r_{ij}^{-\alpha_{A}}}{\sum_{n}k_{n}r_{in}^{-\alpha_{A}}}.

(9)

The $\alpha_{A}$ exponent ( $A$ refers to the word attachment) controls the influence of spatial distance between sites in the preferential attachment. If $\alpha_{A}=0$ , we recover the BA network that does not take into account the spatial distance between the nodes. This model preserves the preferential attachment according to the degree of the nodes, but also taking into account the geographical distance as a criterion to dispute for links. In figure 6 we show a plot of an example of a network generated according to this algorithm.

Numerical results show that the parameter $\alpha_{G}$ does not affect the behavior of the connectivity distribution $P(k)$ of the network (see Figure 7(b)). This parameter refers just to the distance distribution in relation to the center of mass, and acts only on size scale but not on the structure of the network, and consequently it does not impact on the preferential attachment rules. On the other hand, as $\alpha_{A}$ increases, the connectivity distribution changes (see figure 7(a)). Soares et. al. Soares et al. (2005) showed that the degree distributions of networks generated according to their model are very well fitted with the form

P(k)=P(0)e_{q}^{-k/k_{0}}

(10)

where $k_{0}>0$ is the characteristic number of connections, $P(0)$ is a constant to be normalized, $q$ is the entropic index and $e_{q}^{x}$ is the $q$ -exponential defined by

\displaystyle e_{q}^{x}\equiv\left[1+(1-q)x\right]^{1/(1-q)},

(11)

where the natural exponential function is a particular case: $e^{x}=e_{q=1}^{x}$ .

The authors Soares et al. (2005) showed that both $k_{0}$ and $q$ are functions of $\alpha_{A}$ . So, as $\alpha_{A}$ increases, a topological phase transition occurs in the connectivity distribution Nunes et al. (2017); Soares et al. (2005). The network changes from a completely heterogeneous network ( $\alpha_{A}=0$ ) to an increasingly homogeneous network as $\alpha_{A}$ tends to infinity. Such phase transition also appears in the degree correlation of the nodes, as we show in the calculation of $k_{nn}(k)$ for different values of $\alpha_{A}$ (see figure 8(a)). The transition is clearer in the graph 8(b) in which we show the calculation of Pearson’s coefficient as a function of $\alpha_{a}$ . Close to $\alpha_{a}=2$ , Pearson’s coefficient changes from a negative value, which characterizes a disassociative network, to a positive value, which characterizes an associative network.

Finally, other two more evidence that the topological phase transiton can be discussed. We can measure the level of heterogeneity of a network using the quantity $\kappa=\langle k^{2}\rangle/\langle k\rangle$ , where $\langle k^{p}\rangle$ is the $p-th$ moment of the degree distribution. If $\kappa/\langle k\rangle>1$ the network is considered heterogeneous because it means that the second moment of the degree distribution can diverge when $N\rightarrow\infty$ while for homogeneous networks $\kappa/\langle k\rangle\approx 1$ Barrat et al. (2008). As can be seen in figure 9, the network becomes more homogeneous as $\alpha_{A}$ increases because $\kappa/\langle k\rangle$ decreases and approaches to one. We also calculated the average shortest path length. When the network becomes more homogeneous, the average shortest path length increases because the number of hubs decreases and consequently the path between the nodes increases. This measure, also shown in figure 9, reinforces the topological phase transition.

II.4.2 Variations of the model proposed by Soares et. al.

It is possible to include euclidian distance in the homophilic and the fitness models, as investigated by Nunes and collaborators Nunes et al. (2017). For example, when we study the social interaciton in a city Ribeiro et al. (2017), the parameter $\eta_{i}$ can represent the influence of different places localized in the city. So it is possible to use the fitness model with Euclidean distance to try to explain, for example, why some places in a city is more attractiveness than others to open a store, coffee shop or gas station. We also can use the homophilic model including spatial distance to study the influence of the topology in a formation of neighborhoods, since people tend to cluster with people that have a similar social class, religion or workplace Bisgin et al. (2010); Vinicius M. Netto, Joao Meirelles and Ribeiro (2017); Netto et al. (2018).

The algorithms used to construct both models were already shown in previous sections. Now, we just have to include the metric, using the function $P_{G}\sim r^{-(\alpha_{G}+2)}$ to distribute the nodes in a continuous plane and change the preferential attachment rules that become,

\Pi(k_{i}|j)=\frac{\eta_{i}k_{i}r_{ij}^{-\alpha_{A}}}{\sum_{n}\eta_{n}k_{n}r_{in}^{-\alpha_{A}}}\leavevmode\nobreak\ \leavevmode\nobreak\ \leavevmode\nobreak\ \leavevmode\nobreak\ \leavevmode\nobreak\ \leavevmode\nobreak\ \leavevmode\nobreak\ \text{and}

(12)

\Pi(k_{i}|j)=\frac{(1-\mathcal{A}_{ij})k_{i}r_{ij}^{-\alpha_{A}}}{\sum_{n}(1-\mathcal{A}_{in})k_{n}r_{in}^{-\alpha_{A}}},

(13)

for fitness and homophilic models, respectively. Nunes Nunes et al. (2017) also shown a topological phase transition, as $\alpha_{A}$ increases for fitness model and no influence of the parameter $\alpha_{G}$ in the pattern of the connectivity distribution. We obtained the same results for homophilic networks. The data are not shown because they are very similar to the results shown in figure 7.

III Real Networks

Most of social networks are assortative while technological ones tend to be more disassociative Newman (2002); Newman and Park (2003). To support this evidence and show that the models studied in this article are successful in modeling real systems, we have chosen three real networks to investigated two distinct properties: degree distribution and assortativity. The networks are:

•

Phone Calls: nodes represent cell phone users and the edges exist if they have called each other at least once during the investigated period. Data are from Song et al. (2010).
•

Collaboration network: each node represents an author in a scientific collaboration and the edges between them represent a co-authored at least one paper in the period from January 1993 to April 2003. The data are obtained from arXiv preprint Condense Matter Physics Leskovec et al. (2007).
•

Email: nodes are email adress and a directed link from one node to another represent at least one email sent. The data are collected during 112 days in the University of Kiel (Germany) Ebel et al. (2002).

According to the table 2, the Pearson’s coefficient of phone calls and collaboration networks are positive while for email network this coefficient is negative. The first two are social networks and they are basically related to family/friendship and professional interactions, respectively. In reference Newman (2002), Newman found similar results for biology and mathematics coauthorship. However, the email network, although it also describes some social interaction, behaves more as a technological network. In the reference cited above, the author also found similar value for World-Wide-Web.

Table 2: Size

N

and Pearson’s correlation coefficient

r

for different real networks. We compared the values with the Pearson’s coefficient calculated for synthetic networks with the same size. For the phone calls network, we used the homophilic network including euclidean distance (

\alpha_{A}=5

). For the collaboration network, we used the BA network including euclidean distance (

\alpha_{A}=5

) and finally for the email network, we used the fitness network including euclidean distance (

\alpha_{A}=1

Real Network	$N$	$c_{P}$	$c_{P}$ (synthetic network)
Phone Calls	36594	0.282	0.120
Collaboration	23132	0.134	0.112
Email	57194	-0.075	- 0.078

Now, we can compare this real systems with our investigated models. In the case of phone calls network, we used the homophilic model and we investigated how the euclidean distance between the nodes of the system affects the network’s degree distribution. Homophilic model was chosen because it is reasonable to assume that telephone calls happen between people who have a certain affinity with each other, whether for personal, family or professional reasons. This hypothesis is corroborated in recent works Currarini et al. (2016); Bisgin et al. (2010).

The Pearson correlation coefficient of the investigated synthetic network is not very similar to the value obtained for the real network (see table 2). However we appreciated how accurate the degree distribution of this synthetic network is when compared to the real one. As shown in figure 10, when we used the attachment parameter $\alpha_{A}=5$ , the fits works extremely well, emphasizing the importance of considering geographic distance between the elements of the system when modeling real social networks. Indeed, a lot of work have followed this line Lengyel et al. (2015); Laniado and Kaltenbrunner (2017); Liu et al. (2019).

The same analysis can be done for the collaboration network. The Pearson correlation coefficient of the synthetic network is similar to the one calculated for the real system. In this study, the only change was the synthetic network investigated. We chosed the traditional Barabàsi-Albert model but also including the Euclidean distance and, as we showed in figure 11, the fit using $\alpha_{A}=5$ is accurate as well. In networks of scientific co-authorship more distinguished researchers, such as university professors, tend to publish works with less famous researchers, such as their graduate students. This support both assumptions: the BA preferential attachment rule according to the degree of the node and the influence of the distance between the elements of the system. For the collaboration network, the fitness and homophilic models also showed reasonable results. As long as we increased the value of $\alpha_{A}$ , the preferential connection rule involving the Euclidean distance prevails in relation to the others.

Finally, the email network presents a very similar Pearson correlation coefficient with compared to the fitness synthetic network considering the euclidean distante. But here the parameter $\alpha_{A}=1$ fits better the degree distribution of real data. It shows a smaller impact of the geographical distance of the nodes in technological networks than in social ones. This can also be related to the fact that this real network has directed links. As this email network was obtained from a university, the fitness model was chosen based on the fact that, in academia, students tend to send more emails to teachers than the otherwise. So the message sent depends on how influential (greater fitness) the reciever is.

IV Conclusions

In this work we have studied network models with growth and different rules of preferential attachment. We reviewed some important algorithms such as the Barabasi-Albert model and others that includes fitness, homophily and/or Euclidean distance as strategies to make connections between nodes. From an applicable perspective, these models are useful to model real-world networks because they present characteristics found in social sytems and also in technological ones, as we showed in the last section.

Our results corroborated with evidences that power-law degree structured is not very common in real systems. We evaluated two social and one technological network and we compared the degree distribution of these networks to degree distributions generated by growth and preferential attachment models. Our main conclusion is that the real networks analysed are better fitted with models which consider traits as fitness, homophily and euclidean distance between nodes. We observed that geographic distance between nodes seems to be an important factor to model specially real social systems. This feature changes the form of the degree distribution of a power law to a q-exponential according to the model proposed by Soares and collaborators Soares et al. (2005). Our results are in agreement with recent studies involving real networks Radicchi and Claudio (2015); Broido and Clauset (2019); Holme (2019); Song et al. (2010); Ebel et al. (2002); Leskovec et al. (2007); Ribeiro (2017); Laniado and Kaltenbrunner (2017); Liu et al. (2019); Lengyel et al. (2015).

We also supplemented the characterization of these synthetic networks investigating measures as clustering, average shortest path length, degree distribution and assortativity.

Finally, it is important to mention that many dynamical processes as epidemics, rumor propagation and synchronization were extensively investigated in scale-free to- pologies as the Barabàsi-Albert network. However the study of these dynamics in substrates where the distance between the elements of the system is taken into account needs to further advance, since this element has already been shown to be very important. Even on online social networks Lengyel et al. (2015); Currarini et al. (2016); Bisgin et al. (2010); Laniado and Kaltenbrunner (2017); Liu (2018); Liu et al. (2019), it seems to influence the connection between the nodes, as well as fitness and homophily.

Acknowledgments

This work was partially supported by the Brazilian agencies CAPES, CNPq and FAPEMIG. The authors acknowledges computational time at DFI-UFLA. Angélica S. Mata thanks the support from FAPEMIG (Grant No. APQ-02482-18) and CNPq (Grant No. 423185/2018-7). Fabiano L. Ribeiro thanks the support from CNPq (405921/2016-0) and CAPES (88881.119533/2016-01).

References

(1)
Barabási and Albert (1999) A.-L. Barabási and R. Albert, science 286, 509 (1999).
Bianconi and Barabási (2001) G. Bianconi and A.-L. Barabási, EPL (Europhysics Letters) 54, 436 (2001).
de Almeida Maurício L. et. al. (2013) de Almeida Maurício L. et. al., The European Physical Journal B-Condensed Matter and Complex Systems 86, 38 (2013).
Soares et al. (2005) D. J. Soares, C. Tsallis, A. M. Mariz, and L. R. da Silva, EPL (Europhysics Letters) 70, 70 (2005).
Radicchi (2015) F. Radicchi, Nature Physics 11, 597 (2015).
Newman and Park (2003) M. E. J. Newman and J. Park, Phys. Rev. E 68, 036122 (2003), URL https://link.aps.org/doi/10.1103/PhysRevE.68.036122.
Radicchi and Claudio (2015) C. Radicchi, Filippo and Claudio, Nature Communications 6, 10196 (2015), URL https://doi.org/10.1038/ncomms10196.
Caldarelli (2007) G. Caldarelli, Scale-free networks: complex webs in nature and technology (Oxford University Press, 2007).
Newman (2002) M. E. J. Newman, Phys. Rev. Lett. 89, 208701 (2002), URL https://link.aps.org/doi/10.1103/PhysRevLett.89.208701.
Barabási and Pãsfai (2016) A. Barabási and M. Pãsfai, Network Science (Cambridge University Press, 2016), ISBN 9781107076266, URL https://books.google.com.br/books?id=iLtGDQAAQBAJ.
Song et al. (2010) C. Song, Z. Qu, N. Blumm, and A.-L. Barabási, Science 327, 1018 (2010), ISSN 0036-8075, eprint https://science.sciencemag.org/content/327/5968/1018.full.pdf, URL https://science.sciencemag.org/content/327/5968/1018.
Ebel et al. (2002) H. Ebel, L.-I. Mielsch, and S. Bornholdt, Phys. Rev. E 66, 035103 (2002), URL https://link.aps.org/doi/10.1103/PhysRevE.66.035103.
Leskovec et al. (2007) J. Leskovec, J. Kleinberg, and C. Faloutsos, ACM Trans. Knowl. Discov. Data 1, 2–es (2007), ISSN 1556-4681, URL https://doi.org/10.1145/1217299.1217301.
Broido and Clauset (2019) A. D. Broido and A. Clauset, Nature communications 10, 1 (2019).
Holme (2019) P. Holme, Nature communications 10, 1 (2019).
Currarini et al. (2016) S. Currarini, J. Matheson, and F. Vega-Redondo, European Economic Review 90, 18 (2016).
Bisgin et al. (2010) H. Bisgin, N. Agarwal, and X. Xu, in Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM International Conference on (IEEE, 2010), vol. 1, pp. 533–536.
Rozenfeld et al. (2002) A. F. Rozenfeld, R. Cohen, D. ben Avraham, and S. Havlin, Phys. Rev. Lett. 89, 218701 (2002), URL https://link.aps.org/doi/10.1103/PhysRevLett.89.218701.
Liu (2018) B. A. C. H. L. W. Y. Q. X. L. X. Liu, L.; Chen, ISPRS Int. J. Geo-Inf 7 (2018).
Laniado and Kaltenbrunner (2017) V. Y. S. S. M. C. Laniado, D. and A. Kaltenbrunner, Information Systems Frontiers (2017), URL https://doi.org/10.1007/s10796-017-9784-9.
Lengyel et al. (2015) B. Lengyel, A. Varga, B. Ságvári, Á. Jakobi, and J. Kertész, PLoS ONE 10 (2015).
Bollobás and Riordan (2003) B. Bollobás and O. M. Riordan, Handbook of graphs and networks: from the genome to the internet pp. 1–34 (2003).
Dorogovtsev and Mendes (2002) S. N. Dorogovtsev and J. F. Mendes, Advances in physics 51, 1079 (2002).
Watts and Strogatz (1998) D. J. Watts and S. H. Strogatz, Nature 393, 440 (1998).
Dorogovtsev and Mendes (2004) S. N. Dorogovtsev and J. F. F. Mendes, CoRR cond-mat/0404593 (2004), URL http://arxiv.org/abs/cond-mat/0404593.
Pastor-Satorras et al. (2001) R. Pastor-Satorras, A. Vázquez, and A. Vespignani, Physical review letters 87, 258701 (2001).
Barrat et al. (2008) A. Barrat, M. Barthelemy, and A. Vespignani, Dynamical processes on complex networks (Cambridge university press, 2008).
Ribeiro et al. (2017) F. L. Ribeiro, J. Meirelles, F. F. Ferreira, and C. R. Neto, Royal Society open science 4, 160926 (2017).
Liu et al. (2019) X. Liu, Y. Xu, and X. Ye, Outlook and Next Steps: Integrating Social Network and Spatial Analyses for Urban Research in the New Data Environment (Springer International Publishing, Cham, 2019), pp. 227–238, ISBN 978-3-319-95351-9, URL https://doi.org/10.1007/978-3-319-95351-9_13.
Nunes et al. (2017) T. C. Nunes, S. Brito, L. R. da Silva, and C. Tsallis, Journal of Statistical Mechanics: Theory and Experiment 2017, 093402 (2017).
Vinicius M. Netto, Joao Meirelles and Ribeiro (2017) Vinicius M. Netto, Joao Meirelles and F. L. Ribeiro, Complexity 2017 (2017).
Netto et al. (2018) V. M. Netto, E. Brigatti, J. Meirelles, F. L. Ribeiro, B. Pace, C. Cacholas, and P. Sanches, Entropy 20, 1 (2018), ISSN 10994300.
Ribeiro (2017) F. F. L. Ribeiro, Revista Brasileira de Ensino de Fisica 39, 1 (2017), ISSN 01024744.