This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Percolation on complex networks: Theory and application

Ming Li Run-Ran Liu Linyuan Lü [email protected] Mao-Bin Hu Shuqi Xu Yi-Cheng Zhang Department of Thermal Science and Energy Engineering, University of Science and Technology of China, Hefei, 230026, P. R. China Alibaba Research Center for Complexity Sciences, Hangzhou Normal University, Hangzhou, 310036, P. R. China Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054, P. R. China Beijing Computational Science Research Center, Beijing, 100193, P. R. China Department of Physics, University of Fribourg, Fribourg CH-1700, Switzerland
Abstract

In the last two decades, network science has blossomed and influenced various fields, such as statistical physics, computer science, biology and sociology, from the perspective of the heterogeneous interaction patterns of components composing the complex systems. As a paradigm for random and semi-random connectivity, percolation model plays a key role in the development of network science and its applications. On the one hand, the concepts and analytical methods, such as the emergence of the giant cluster, the finite-size scaling, and the mean-field method, which are intimately related to the percolation theory, are employed to quantify and solve some core problems of networks. On the other hand, the insights into the percolation theory also facilitate the understanding of networked systems, such as robustness, epidemic spreading, vital node identification, and community detection. Meanwhile, network science also brings some new issues to the percolation theory itself, such as percolation of strong heterogeneous systems, topological transition of networks beyond pairwise interactions, and emergence of a giant cluster with mutual connections. So far, the percolation theory has already percolated into the researches of structure analysis and dynamic modeling in network science. Understanding the percolation theory should help the study of many fields in network science, including the still opening questions in the frontiers of networks, such as networks beyond pairwise interactions, temporal networks, and network of networks. The intention of this paper is to offer an overview of these applications, as well as the basic theory of percolation transition on network systems.

keywords:
Percolation , Complex Network , Network Structure , Network Dynamics , Phase Transition and Critical Phenomena
journal: Physics Reports

1 Introduction

In contrast to many other modern research fields, the network problem is often easy to define by abstracting from everyday life [1, 2]. For examples, how many people an epidemic can infect in a social contact network, whether a communication network can maintain its function after an intentional attack, which node has the largest impact in a social network, and so on [3, 4, 5, 6]. The key point of these problems with network involved can be summarized as a cluster forming process within a chosen fraction of nodes, those might be infected people, preserved nodes after an intentional attack, or individuals with the same opinion. In principle, these processes are easy to define, however, not so easy to solve.

Fortunately, in statistical physics, a profound theoretical system, called percolation theory, just touches this problem, i.e., the behaviors of a networked system when some of nodes or links are not available [7]. Indeed, when the network science was just a new rising topic, the percolation theory has already been widely used for explaining empirical results, and solving models [2]. Now, after more than twenty years’ development of network science, the percolation theory, including conceptions, analytical methods, and algorithms, can be found almost in all the research fields of network science.

It is known that the classical percolation in statistical physics only considers regular lattices, therefore, with these applications to complex networks, the percolation theory itself also has been enriched and developed. In recent hot areas of network science, such as higher-order networks [8, 9] and networks beyond pairwise interactions [10], models and methods of percolation have still been widely touched. This is obviously because the connection property must always be a key point to understand network structure and dynamics.

Due to importance of the percolation theory in the study of complex networks, almost all the review articles about networks have the relevant sections to introduce distinguishable developments and applications of percolation theory on complex networks, however, a systematic comparison and summary specifically from the perspective of percolation is still absent. This review article aims to fill this gap, and comb the scattered discussions on the network percolation and its applications, which can facilitate a wide area of sciences, ranging from physics and computer science to biology and sociology, as well as various branches of probability theory in mathematics.

1.1 Classical percolation

Percolation now usually refers to a class of models that describe geometric features of random media. In statistical physics, percolation theory is often accompanied by scaling law, fractal, self-organization criticality, and renormalization, which are all of immense importance theoretically in many diverse fields of physics [7]. Therefore, percolation has long served as a basic ideal model for demonstrating phase transition and critical phenomena. However, quite apart from the role percolation theory now plays, it originates from an honest applied problem in the study of gelation in the 1940s [11, 12, 13]. To be a mathematical subject, it first starts from Broadbent and Hammersley’s paper in 1957 [14], in which its name, and the geometrical and probabilistic concepts were introduced.

The study of percolation then becomes popularized in the physics community, and many of the open problems have been proposed and solved [15, 16, 17, 18, 19, 20, 21]. Now, the percolation theory has also been found to be of a broad range of applications to diverse problems. The applications in network science that this article focuses on are one of the typical representatives. This is also benefited from the development of computational technology, as the simulation plays a crucial role in the study of percolation [22].

In the following, we will firstly give a brief introduction to the percolation model, as well as some physical concepts and quantities involved.

1.1.1 Percolation process in random media

Imagine a large porous stone immersed in water. Does the water come into the core of the stone? If so, there must be some paths formed by the pores running through the stone. However, as a whole, the connection of any two adjacent pores is probabilistic, supposing the probability is pp. The problem thus reduces to whether there exist such paths for a given probability pp. This actually is a typical example of percolation problems. The connection of two adjacent pores, in the terminology of percolation, is called occupying the corresponding bond between the two pores, and hence pp is the occupied probability. Obviously, if pp is large enough, the core of the porous stone can be wetted. In that case, the connected pores are able to form a cluster that penetrates the stone. Accordingly, percolation theory is mainly concerned with the existence of such a cluster and its structure with respect to the occupied probability pp.

The above process is just a special case of the percolation process. The same question can also be proposed for many other systems constituted by random medium. Another typical example is the forest fire model. Suppose a burning tree can only ignite the trees in the adjacent sites, the destructiveness of the forest fire depends on the density of the trees, i.e., the probability pp of finding a tree on a site. This is also obviously a percolation process – when the adjacent trees form a cluster that can penetrate the entire forest, the fire could raze almost the entire forest; otherwise, the fire is constrained in a small area. Furthermore, similar regulations can also be applied to model the spreading of epidemics among individuals, where infected individuals can infect their neighbors probabilistically.

Although the soaking of porous stone, the forest fire and the spreading of epidemics seemingly belong to separated fields, the three are now converging in an intriguing manner. That is, is there a cluster of connected sites through the system? As is quite typical, it is actually easier to examine such an infinite cluster in an infinite system than just large ones. With the increasing of the probability pp, there must be a critical pcp_{c}, called percolation threshold or critical point, below which such a cluster cannot be found. For a more detailed understanding of this criticality, the percolation model needs to be defined explicitly. That is what we are going to talk about.

1.1.2 Percolation model on lattices

To facilitate presentation, we consider a two-dimensional lattice here as shown in Fig.1 (a). Mathematically, each bond is occupied with probability pp or unoccupied with probability 1p1-p. Then, the occupied bonds connect the sites into clusters. This model is called bond percolation, which can be used to model the process of the soaking of porous stone and the spreading of epidemics. For the forest fire model, one often uses a slightly different percolation model, the so-called site percolation model. For this model, we occupy each site with probability pp rather than bonds as shown in Fig.1 (b). In general, the bond percolation is considered less general than the site percolation due to the fact that the bond percolation can be reformulated as a site percolation on a different lattice, but not vice versa.

Refer to caption
Figure 1: (Color online) Schematic diagrams of the classical percolation on square lattice, where different colors denote different clusters. The size of the system used here is N=L×L=80×80N=L\times L=80\times 80. The values of pp labeled in the figures are the corresponding site/bond occupied probabilities. (a) Bond percolation. For ease of identification, the sites and the unoccupied bonds are not shown here. For p=0.51p=0.51, a giant cluster exists indicated by yellow. (b) Site percolation. For ease of identification, the bonds and the unoccupied sites are not shown here. For p=0.605p=0.605, a giant cluster exists indicated by blue.

The percolation theory mainly focuses on the emergence of the infinite cluster with the increasing of probability pp. To characterize this phenomenon, one often adopts the size of the giant cluster, which is defined as

S=limNs1N.S=\lim_{N\to\infty}\frac{s_{1}}{N}. (1)

Here, NN is the size of the system (site number), and s1s_{1} is the number of sites in the largest cluster. As shown in Fig.1, with the increasing of pp, there must be a critical point pcp_{c}, above which a non-zero SS can be found. This figures out the percolation transition of the system with respect to the control parameter pp, and SS is the corresponding order parameter.

In addition, there is another commonly used order parameter called wrapping probability, which is defined as the probability that a cluster wraps around the periodic boundary conditions on a regular lattice. For large systems, this probability is equal to the probability that the system percolates. This parameter is usually used to estimate the position of the percolation threshold, since it is a size-independent parameter at the critical point. Specifically, cluster wrapping can be defined in a number of different ways, such as wrapping around one direction, wrapping around one direction but not the other, and wrapping around both directions. This probability, however, cannot be well defined in network systems without spatial constraints, so that the size of the giant cluster is the preferred order parameter in network science.

To describe the features of finite clusters, the distribution of cluster sizes ps=ms/smsp_{s}=m_{s}/\sum_{s}m_{s} is also used, where msm_{s} is the number of the clusters with size ss. Sometimes, the normalized cluster number ns=ms/Nn_{s}=m_{s}/N is also used to feature the cluster size distribution. It is obvious that ps=nsN/sms=nssp_{s}=n_{s}N/\sum_{s}m_{s}=n_{s}\langle s\rangle, where s=N/sms=ssps\langle s\rangle=N/\sum_{s}m_{s}=\sum_{s}sp_{s} is the average cluster size.

It must be pointed out that the average size s\langle s\rangle is not the mean cluster size χ\chi commonly used in percolation theory, which is defined as χ=ss2ns\chi=\sum_{s}s^{2}n_{s}. It is not hard to know that snssn_{s} is the probability that a randomly chosen site belongs to a cluster with size ss. Thus, the mean cluster size χ\chi actually is the average size of the cluster that a randomly chosen site belongs to. Together with the characteristic length ξ\xi and the characteristic size sξs_{\xi}, above which the clusters are exponentially scarce, these statistics are often used to describe and characterize the percolation transition [7]. The critical behaviors of these parameters will be briefly introduced in the next section.

Besides, there exist several other variants of percolation. For example, one could drop the assumption of independence of occupation, so that the occupation of a site or a bond could depend on the state of other sites or bonds. This type of percolation is often called dependent percolation, which is widely used in network science to model the spreading dynamics and the cascading process [3, 5, 6, 8, 23]. Such models will also be discussed in subsequent parts of this paper.

1.1.3 Percolation transition

The prerequisite for the study of critical phenomena is determining the percolation threshold pcp_{c}. Figure 1 indicates that the site percolation and the bond percolation have different critical points. In fact, the threshold of bond percolation on square lattice can be solved exactly by the duality transformation or real-space renormalization [24, 7, 25, 26, 27], that is pc=1/2p_{c}=1/2. However, there is no exact solution for site percolation on square lattice. Through Monte Carlo simulation, a good estimate of the critical point can be obtained, such as pc=0.59274621(13)p_{c}=0.59274621(13) in Ref.[28], and pc=0.5927465(4)p_{c}=0.5927465(4) in Ref.[29]. In Tab.1, we list the percolation thresholds for some typical systems.

Table 1: Selected percolation thresholds for various networks. Here, zz is the coordination number of the Bethe lattice, mm is the minimum degree of SF network, ζ(s,a)\zeta(s,a) is the Hurwitz zeta function, G1(x)G_{1}(x) is the generating function of the excess-degree distribution, and kn\langle k^{n}\rangle is the nnth moment of the degree distribution.
Network Site Bond
square lattice 0.59274621(13)0.59274621(13) [28], 0.5927465(4)0.5927465(4) [29] 1/21/2
triangular lattice 1/21/2 2sin(π/18)2\sin(\pi/18) [30]
honeycomb lattice 0.697040230(5)0.697040230(5) [31], 0.6970402(1)0.6970402(1) [32], 0.6970413(10)0.6970413(10) [33] 12sin(π/18)1-2\sin(\pi/18) [30]
simple cubic 0.3116077(4)0.3116077(4) [29], 0.3116077(2)0.3116077(2) [34], 0.3116060(48)0.3116060(48) [35], 0.3116004(35)0.3116004(35) [36], 0.31160768(15)0.31160768(15) [37] 0.24881182(10)0.24881182(10) [34], 0.2488125(25)0.2488125(25) [38], 0.2488126(5)0.2488126(5) [39]
hypercubic d=4d=4 0.1968861(14)0.1968861(14) [40], 0.196889(3)0.196889(3) [41], 0.196901(5)0.196901(5) [42], 0.19680(23)0.19680(23) [43], 0.1968904(65)0.1968904(65) [35], 0.19688561(3)0.19688561(3) [44] 0.1601314(13)0.1601314(13) [40], 0.160130(3)0.160130(3) [41], 0.1601310(10)0.1601310(10) [38], 0.1601312(2)0.1601312(2) [45], 0.16013122(6)0.16013122(6) [44]
hypercubic d=5d=5 0.1407966(15)0.1407966(15) [40], 0.1407966(26)0.1407966(26) [35], 0.14079633(4)0.14079633(4) [44] 0.118172(1)0.118172(1) [40], 0.1181718(3)0.1181718(3) [38], 0.11817145(3)0.11817145(3) [44]
hypercubic d=6d=6 0.109017(2)0.109017(2) [40], 0.1090117(30)0.1090117(30) [35], 0.109016661(8)0.109016661(8) [44] 0.0942(1)0.0942(1) [46], 0.0942019(6)0.0942019(6) [40], 0.09420165(2)0.09420165(2) [44]
hypercubic d=7d=7 0.0889511(9)0.0889511(9) [40], 0.0889511(90)0.0889511(90) [35], 0.088951121(1)0.088951121(1) [44] 0.078685(30)0.078685(30) [46], 0.0786752(3)0.0786752(3) [40], 0.078675230(2)0.078675230(2) [44]
Bethe lattice 1/z1/z 1/z1/z
ER network 1/k1/\langle k\rangle [47, 48] 1/k1/\langle k\rangle [47, 48]
SF network ζ(λ1,m)/[ζ(λ2,m)ζ(λ1,m)]\zeta(\lambda-1,m)/[\zeta(\lambda-2,m)-\zeta(\lambda-1,m)] [48] ζ(λ1,m)/[ζ(λ2,m)ζ(λ1,m)]\zeta(\lambda-1,m)/[\zeta(\lambda-2,m)-\zeta(\lambda-1,m)] [48]
tree-like network 1/G1(1)=k/[k2k]1/G_{1}^{\prime}(1)=\langle k\rangle/[\langle k^{2}\rangle-\langle k\rangle] [48] 1/G1(1)=k/[k2k]1/G_{1}^{\prime}(1)=\langle k\rangle/[\langle k^{2}\rangle-\langle k\rangle] [48]

In the subcritical regime (p<pcp<p_{c}), all clusters are finite and the size distribution has a tail which decays exponentially; in the supercritical regime (p>pcp>p_{c}), an infinite cluster can be found in the system, and the size distribution of other finite clusters has a tail which decays slower than exponential. Near the critical point, some asymptotic behaviors can be found, referred to as critical phenomena. In the percolation theory, these behaviors are characterized by the critical exponents [7],

S\displaystyle S (ppc)β,\displaystyle\varpropto(p-p_{c})^{\beta}, (2)
χ\displaystyle\chi |ppc|γ,\displaystyle\varpropto|p-p_{c}|^{-\gamma}, (3)
ξ\displaystyle\xi |ppc|ν,\displaystyle\varpropto|p-p_{c}|^{-\nu}, (4)
sξ\displaystyle s_{\xi} |ppc|1/σ,\displaystyle\varpropto|p-p_{c}|^{-1/\sigma}, (5)
ps\displaystyle p_{s} sτ.\displaystyle\varpropto s^{-\tau}. (6)

with relations

β\displaystyle\beta =τ2σ,\displaystyle=\frac{\tau-2}{\sigma}, (7)
γ\displaystyle\gamma =3τσ.\displaystyle=\frac{3-\tau}{\sigma}. (8)

Here, β\beta, γ\gamma, ν\nu, σ\sigma, and τ\tau are the so-called critical exponents, which determine the universal class of the percolation transition. Besides, at the critical point the characteristic length ξ\xi and size sξs_{\xi} also have a relation

sξξdf.s_{\xi}\varpropto\xi^{d_{f}}. (9)

The exponent dfd_{f} is often called fractal dimension [7, 49], characterizing the structure of the infinite cluster at the critical point. Assuming the dimension of the system is dd, there is another relation between critical exponents, called hyperscaling,

df=dβν.d_{f}=d-\frac{\beta}{\nu}. (10)

Note that this relation is also universal, and independent of the topological structure of the system. The phase transition theory points out that there is an upper critical dimension dcd_{c} (for percolation dc=6d_{c}=6), above which the critical exponents become the same as that in mean-field theory [50, 51]. So this hyperscaling relation holds only for ddcd\leq d_{c}. It is also worth mentioning that the geometric structure of high-dimensional percolation clusters cannot be fully accounted for by the counterparts of random networks, though both of them have the mean-field nature [52].

In addition, the critical behavior below dcd_{c} is different from the mean-field approximation which is valid away from the phase transition. For these systems, the renormalization group theory has made remarkable predictions about the behavior of the percolation near and at the threshold [53, 54]. In the renormalization group theory, there is also a lower critical dimension (for percolation dl=2d_{l}=2), below which there is no phase transition.

Furthermore, the universality principle states that the value of pcp_{c} is determined by the local structure of the system, whereas the behavior of clusters that is observed near pcp_{c} is independent of the local structure (lattice type and percolation type). In this sense, percolation is believed to be a substrate-dependent but model-independent process, and therefore, the critical exponents of the transition are only determined by the geometry of the system, and identical for the bond and site percolation. The critical exponents are thus more natural to be considered than the threshold pcp_{c}, and there is no need to deal with the site and bond percolation, individually.

To end this subsection, it must be pointed out that the critical exponent β\beta is distinguishable for the site and bond percolation on scaling-free (SF) networks [55]. This is a special case derived from the vanished percolation threshold, which has no effect on the other properties discussed above.

1.2 Networks

This paper mainly focuses on the percolation theory on networks and its applications. So, in this section we will provide an overview of some simple and general properties and models of networks.

1.2.1 Network representation

In network science, the elements of a system and the connections between them are no longer known as site and bond. Instead, they are often called node and link, or vertex and edge, respectively. The number of links a node has is called degree, labeled kk.

To exactly represent the connection pattern of a network, one often uses a N×NN\times N matrix 𝐀\mathbf{A}, called adjacent matrix, where NN is the number of nodes in the network. In this way, the element aija_{ij} of 𝐀\mathbf{A} is one when there is a link from node ii to node jj, and zero when there is no link. For undirected networks, it must have aij=ajia_{ij}=a_{ji}, thus 𝐀\mathbf{A} is a symmetric matrix. For weighted networks, the element aija_{ij} can further be any non-zero value to represent the corresponding link weight.

1.2.2 Statistical properties

Instead of studying a special network with a given adjacent matrix, the network science is more on discovering the common nature of a class of networks, i.e., a network ensemble. For that, networks are often featured by the degree distribution pkp_{k}, which provides the probability that a randomly selected node in the network has degree kk.

A typical network ensemble is the one with Poisson degree distribution

pk=ekkkk!,p_{k}=\frac{e^{-\langle k\rangle}\langle k\rangle^{k}}{k!}, (11)

where \langle\rangle means the average over all the nodes. This just is the ensemble of the known Erdős–Rényi (ER) random networks/graphs [47], and can be simply realized by randomly connecting a given number of links among a set of nodes, or connecting each pairs of nodes with a given probability. Quite obviously, the generation of ER networks is actually a percolation process. Only when the system percolates (k>1\langle k\rangle>1), the giant cluster can be found. Specifically, when k<1\langle k\rangle<1 almost surely each cluster has size logN\log N, at k=1\langle k\rangle=1, the largest cluster has a size of order N2/3N^{2/3}, and at k>1\langle k\rangle>1, there exists a single giant cluster of size of order NN and all other clusters have a size of order logN\log N. Since such systems have no spatial constraint, the critical phenomena in this ensemble just recover the mean-field solution.

Besides, distinguishing from the degree distribution, there is another kind of commonly used networks, which represents a deeper organizing principle of real networks called the SF property [2]. In mathematical terms, the SF property translates into a power-law function of the form

pkkλ,km.p_{k}\propto k^{-\lambda},~{}~{}~{}~{}k\geq m. (12)

For a real network, there must also be an upper bound of degree, called degree cutoff KK.

The main difference between an ER and a SF network comes in the tail of the degree distribution. For ER networks, most nodes have comparable degrees and hence the degree cutoff is in the order of the average degree. In contrast, nodes with very large degrees are expected in SF networks, called hubs of the networks. Indeed, the degrees of hubs grow with the network size, and thus can grow quite large. This indicates that SF networks have strong heterogeneity, and hence the percolation transition bears a strong dependence upon the degree distribution. Beyond this, there are many other measurements to further subdivide the network ensembles, such as clustering, correlation, and community. In recent years, temporal networks, multiplex networks, and high-order networks have also received a lot of attention. Because of space limitation and the focus of this paper, the details will not be included here and brief discussions will be given in this article when necessary. Further details on these can also be found in Refs.[3, 5, 6, 1, 2].

1.2.3 Network models

After a review of the fundamental properties of complex networks, we explore the mathematical modeling of network ensembles, and mainly focus on the configuration model and the hidden parameter model [1, 2]. In principle, the two models can generate any network ensembles with a meaningful degree distribution. In fact, these two models are usually only used to realize SF property, since there exist easy-implemented models for Poisson degree distribution (ER networks) and small-world (SW) networks [1, 2]. Note that the SF network also has an easy-implemented model, Barabási–Albert (BA) model [56], nevertheless, the generated SF network has its own structural features and limitations rather than a network with programmable and tunable SF properties.

From random graph theory [47], there are two ways to generate a network with Poisson degree distribution. One is randomly connecting nodes until the excepted number of links is met, and the other is connecting each pair of nodes with a given probability. For a given average degree, networks implemented by the former method must be a subset of those by the later one. In spite of this, the two methods will become equivalent in the thermodynamic limitation (NN\to\infty).

The SW network grows out of a regular network, such as the triangle lattice and the square lattice, by randomly rewiring a small fraction of links [57, 58, 59, 60]. Since there is no spatial constraint for the rewired link, the average distance of nodes becomes much smaller than that of the underlying regular network. This is why it is called small-world. In turn, the regular backbone of the SW network usually leads to a high clustering, namely, the neighbors of a node are also connected. This is another characteristic of the SW network. For this type of networks, the degree distribution is often not the concern, and one often uses the backbone network and the rewiring probability to define a SW network ensemble.

The configuration model can help us build a network with a pre-defined degree distribution [61, 62, 1, 2]. The algorithm consists of two steps. First, assigning degrees to each node drawn randomly from the pre-defined degree distribution, represented as stubs. Then, two stubs are selected randomly and connected. Repeating this procedure until all stubs are paired up. If there is nothing in this procedure to forbid self- and multi-connections, the obtained network is probably not a simple graph as usually studied in network science. While rejecting such connections could add much more overhead to the program’s execution time.

Refer to caption
Figure 2: Schematic diagram of the rewiring procedure. Here, the rewiring means that swapping the ends of two links. There are two ways to rewire, labeled R1R_{1} and R2R_{2} respectively in the figure. Note that, to ensure the ergodicity and the detailed balance, the two ways must be chosen randomly.

Indeed, one of the effective methods to generate random networks by the configuration model without self- and multi-connections can be as follows. First, connecting the stubs in any fast and easy-to-implement ways (the details are dependent on the specific degree distribution), and recording the self- and multi-connections. Then, rewiring a fraction of links of the obtained network, and the self- and multi-connections must be included. Note that the rewiring leading to self- or multi-connections is forbidden.

A schematic diagram of the rewiring procedure is shown in Fig.2. For the degree distribution with the maximum degree KK larger than N\sqrt{N}, this method could be much more effective than the original one, which may not even complete the network. This is because there must be some degree correlation in the network with K>NK>\sqrt{N} [63, 64], randomly connecting stubs with self- and multi-connection forbidden is a very inefficient procedure. Of course, the efficiency of the algorithm also depends on the implementing way and the detailed property of the network, thus this just is a general discussion and not always the case.

By the way, the rewiring operation as shown in Fig.2 is often called degree preserving randomization, which can randomize the connections of a network without changing the degree distribution. With this operation, one can eliminate other properties from the network, for example, clustering, and assure that a certain network feature is predicted by its degree distribution alone. If the network is large enough and all the degrees are much smaller than the network size NN, the degree preserving randomization could turn the network to be locally tree-like.

Note that the correlation derived from the degree distribution cannot be undone by this randomization procedure. A typical example is a network with K>NK>\sqrt{N}. Consequently, when we study networks with hubs, i.e., SF networks, the configuration model and the degree preserving randomization must be carefully dealt with. In turn, with some constraints, this rewiring operation can also “randomize” the network into the one with a given high-order property [65, 66].

There is another model, hidden parameter model, to generate networks with a pre-defined degree distribution [67, 68, 69, 70, 71]. In this model, each node ii is assigned a hidden parameter ηi\eta_{i}, chosen from a pre-defined distribution pηp_{\eta}. Then, we connect each node pair with probability p(ηi,ηj)=ηiηj/Nηp(\eta_{i},\eta_{j})=\eta_{i}\eta_{j}/N\langle\eta\rangle. The expected number of links is thus L=i,jηiηj/2NηL=\sum_{i,j}\eta_{i}\eta_{j}/2N\langle\eta\rangle. For SF networks, we can set ηiiα\eta_{i}\propto i^{-\alpha}, and the obtained network has the degree distribution pkk(1+1/α)p_{k}\propto k^{-(1+1/\alpha)}.

The SF networks generated by the configuration model and the hidden parameter model also have their own properties, thus are not exactly equivalent, at least for small systems. First of all, the boundaries of degrees must be given for the configuration model to normalize the degree distribution. However, the boundaries of degrees are not needed for the hidden parameter model, in which a degree cutoff occurs naturally at N1/(λ1)N^{1/(\lambda-1)}, thus known as the natural cutoff of the SF network [72, 73, 74, 75, 76]. This cutoff can be also found in the configuration model, if the upper boundary MM used to normalize the degree distribution is larger than N1/(λ1)N^{1/(\lambda-1)}. Then, the hidden parameter model generates no self- and multi-connections, since only a single connection would occur between each node pair. Another is that the degree distribution of the SF network generated by the hidden parameter model has negative deviations for small degrees, while the one generated by the configuration model almost perfectly matches the pre-defined degree distribution. In addition, setting the total number of links is not allowed in the configuration model, since it is fixed by the pre-defined degree distribution. Rather, the hidden parameter model allows controlling the link number, accurately. Finally, what need to be pointed out is that both the networks generated by the configuration model and by the hidden parameter model are only a possible realization of the pre-defined degree distribution. To check the property induced by a certain degree distribution, the ensemble average must be done.

In recent years many works have also been devoted to the so-called multiplex networks (or multilayer networks, interdependent networks) [8, 9]. In this network ensemble, links are divided into different layers to represent different types of connections between nodes. To generate such a network ensemble, one can first generate several networks using the methods mentioned above, and then bundle nodes from different networks together to form the layered connections. It can also be interpreted as inserting some inter-connections between nodes in different networks (layers), which represent the dependence relations between them. The bundling of nodes (or inter-connection) could be one-to-one, one-to-many, or with some other rules, that depending on the inter-correlation between layers. Of course, there are many other network models with different properties, one can find them in special books and reviews about complex networks.

1.3 Outline of the report

This paper contains six sections. In the next section, we start by reviewing the basic properties of the percolation process on networks, including the model definition, the analytical methods, and the corresponding discussions. Section 3 reviews some typical percolation models on networks, mainly focusing on the phase transition and the critical phenomena. The percolation model is by nature featuring the cluster forming process in random media. Therefore, the percolation theory including conceptions, theoretical methods, and algorithms, is widely used in network structure analysis. This is the main content of Sec.4. Moreover, the percolation process is often used to model and analyze network dynamics, which will be reviewed in Sec.5. Finally, we summarize and prospect to this paper in Sec.6.

In addition, the main notations used in this paper are listed in Tab.2 for readability.

Table 2: The main notations used in this paper.
Notation Meaning
dd system dimension
dfd_{f} fractal dimension
G0(x)G_{0}(x) generating function of degree distribution
G1(x)G_{1}(x) generating function of excess-degree distribution
\mathcal{H} Hamiltonian of Potts model
H0(x)H_{0}(x) generating function of πs\pi_{s}
H1(x)H_{1}(x) generating function of ρs\rho_{s}
ik(t)i_{k}(t) fraction of infected individuals in all degree-kk nodes at time step tt
kk node degree
KK degree cutoff of a network
k\langle k\rangle average degree
kn\langle k^{n}\rangle nn-th moment of degree distribution
𝐌\mathbf{M} Hashimoto or non-backtracking matrix of graph
NN node number of a network
pp occupied probability
pcp_{c} percolation threshold
pkp_{k} degree distribution
psp_{s} distribution of the cluster sizes
qkq_{k} excess-degree distribution
rijr_{i\to j} probability that link iji\to j belongs to the giant cluster
rk(t)r_{k}(t) fraction of removed individuals in all degree-kk nodes at time step tt
RR probability that a link belongs to the giant cluster
SS probability that a node belongs to the giant cluster
ss cluster size
sis_{i} probability that node ii belongs to the giant cluster
sk(t)s_{k}(t) fraction of susceptible individuals in all degree-kk nodes at time step tt
sξs_{\xi} characteristic size
ZZ partition function of Potts model
𝒵\mathcal{Z} rescaled partition function of Potts model
ziz_{i} partition function of the branching from node ii of Potts model
β\beta critical exponent for the giant cluster; infection rate
γ\gamma critical exponent for the mean cluster size
δx,y\delta_{x,y} Kronecker delta function
ζ(x)\zeta(x) Riemann zeta function
ζ(x,a)\zeta(x,a) Hurwitz zeta function
Θk(t)\Theta_{k}(t) probability that a neighbor of a degree-kk node is infected
κ\kappa ratio of k2\langle k^{2}\rangle and k\langle k\rangle
λ\lambda exponent of SF degree distribution, pkkλp_{k}\propto k^{-\lambda}
μ\mu recovery rate
ν\nu critical exponent for the characteristic length
ξ\xi characteristic length
πs\pi_{s} size distribution of the cluster that a randomly chosen node belongs to
ρs\rho_{s} size distribution of the cluster at the end of a link
σ\sigma critical exponent for the characteristic size
τ\tau Fisher exponent
Φ(x,s,a)\Phi(x,s,a) Lerch’s transcendent
χ\chi mean cluster size

2 Classical percolation on networks

In statistical physics, the percolation model is often displayed on regular lattices. However, the regular topological structure is not always the case in reality. In this section, we will review the basic properties of the classical percolation on heterogeneous network systems, mainly focusing on analytic methods, critical phenomena, and Monte Carlo algorithms of the classical percolation. It should be pointed out that there are a lot of theoretical studies of percolation transition on networks from the perspective of random graphs, one can refer to Refs.[77, 78, 47, 79, 80, 81, 82] and the relevant references to learn more. Here we mainly focus on the network percolation from the perspective of statistical physics and network science.

2.1 Problem description

As a lattice model, the percolation model can be easily extended to networked systems, namely, nodes or links are randomly designated either occupied (with probability pp) or unoccupied (with probability 1p1-p). For convenience, in network science one usually removes each node or link with probability 1p1-p to realize the percolation model. The parameters and the corresponding problems can be defined on the preserved network as that on regular lattices. As a result, there is no need to repeat the arguments. Here we will concentrate on the relationships between the classical percolation model and some typical issues of network science, such as network robustness, and epidemics spreading on network.

For site percolation, the network science often translates the unoccupied nodes as failed/removed ones. In this way, the percolation process is just the performance of the network under random nodal failures. If the giant cluster remains after the percolation process, the resulted network can be thought of as a whole, and still functional. Thus the percolation threshold pcp_{c} can be used to evaluate the robustness of the network [83, 84, 85, 72, 86]. A large threshold pcp_{c} indicates that only a small amount of failed nodes can disconnect the network, so its robustness is poor, and vice versa. Furthermore, if the unoccupied nodes are chosen with preference, this model can further explain the robustness of networks under intentional attacks. A remarkable knowledge of this model is the finding of the strong robustness of SF networks under random failures, and extremely fragile for intentional attacks [85]. A specific discussion on this problem can be found in Sec.4.

In network science, bond percolation often refers to the spreading process [23]. The connection between the spreading of epidemic and percolation is in fact one of the original motivations for the percolation model itself [16, 7, 17, 25]. For this process, the occupation of a link means a successful infection between the two connected nodes. That is, the occupied probability reflects the infectiousness of the epidemic. However, different from the mapping between site percolation and models for network robustness, the link occupied probability is not simply equivalent to the infectious probability, but an integrated probability of transmission of the disease between two individuals [87]. In this way, the emergence of the giant cluster corresponds to the outbreak of an epidemic, so that the percolation threshold can also be used to evaluate the infectiousness of the epidemic. An epidemic with smaller percolation threshold pcp_{c} has a stronger infectious ability. Specific discussion on this problem can be found in Sec.5.

The basic mechanisms of the network percolation and their extensions can be further applied in various fields of network science from structure analysis to dynamics modeling, which this article explores next.

2.2 Analytic method based on branching process

Although the percolation model is easy to define, the exact solutions are absent for most systems. If the network topology is restricted to be tree-like, like that of Bethe lattice, there is a mean-field method based on the branching process for solving the percolation model. Here we introduce the framework of this method, including the solutions for the size of the giant cluster and the distribution of finite clusters [48, 87, 1].

2.2.1 Behaviors of the giant cluster

The percolation process is tantamount to dilute links and thus changes the degree distribution pkp_{k}. So, for convenience and generality, the occupation of nodes or links will not be considered in the following discussion. After obtaining the result, we can revise it by considering the diluting effect of the percolation on the degree distribution pkp_{k}.

When a network percolates, there must be an infinite cluster, in which the branching process is endless. Specifically, when arriving at a node by following a link, the node must have some other links (at least one) to ensure the continuity of branching. In terms of this, assuming a link belongs (connects) to the giant cluster with probability RR, we have a self-consistent equation

R\displaystyle R =k=1qk[1(1R)k1]\displaystyle=\sum_{k=1}q_{k}\left[1-(1-R)^{k-1}\right]
=1G1(1R),\displaystyle=1-G_{1}(1-R), (13)

where qk=pkk/kq_{k}=p_{k}k/\langle k\rangle is the probability that the node reached by following a link has degree kk, known as the excess-degree distribution, and G1(x)=kqkxk1G_{1}(x)=\sum_{k}q_{k}x^{k-1} is the corresponding generating function. The term 1(1R)k11-(1-R)^{k-1} in the square brackets just means that at least one excess link is needed to keep branching.

After obtaining RR from Eq.(13), the order parameter SS, i.e., the probability that a randomly chosen node belongs to the giant cluster (the infinite cluster), can be expressed as

S\displaystyle S =k=0pk[1(1R)k]\displaystyle=\sum_{k=0}p_{k}\left[1-(1-R)^{k}\right]
=1G0(1R),\displaystyle=1-G_{0}(1-R), (14)

where G0(x)=kpkxkG_{0}(x)=\sum_{k}p_{k}x^{k} is the generating function of the degree distribution. The right hand side of this equation means that at least one of the neighbors must belong to the giant cluster. From the perspective of branching, this equation can also be interpreted as the start point of the branching process, while Eq.(13) is the intermediate process. In the infinite cluster any nodes can be the start point, so SS is an average probability, as well as the parameter RR.

Furthermore, for the percolation with occupied probability pp, we only need to replace the generating functions in Eqs.(13) and (14) with those of the diluted network. Next, let us calculate the degree distribution of the diluted network. Regardless of the type of the percolation (site percolation or bond percolation), the diluting ratio of links is pp, i.e., each link is preserved with probability pp. So the generating functions g0(x)g_{0}(x) and g1(x)g_{1}(x) of the diluted networks can be written as

g0(x)\displaystyle g_{0}(x) =k=0pkxk\displaystyle=\sum_{k^{\prime}=0}^{\infty}p_{k^{\prime}}x^{k^{\prime}}
=k=0k=kpk(kk)pk(1p)kkxk\displaystyle=\sum_{k^{\prime}=0}^{\infty}\sum_{k=k^{\prime}}^{\infty}p_{k}\binom{k}{k^{\prime}}p^{k^{\prime}}(1-p)^{k-k^{\prime}}x^{k^{\prime}}
=G0(1p+px),\displaystyle=G_{0}(1-p+px), (15)
g1(x)\displaystyle g_{1}(x) =k=1pkkpkxk1\displaystyle=\sum_{k^{\prime}=1}^{\infty}\frac{p_{k^{\prime}}k^{\prime}}{p\langle k\rangle}x^{k^{\prime}-1}
=k=1k=kpk(kk)pk(1p)kkkpkxk1\displaystyle=\sum_{k^{\prime}=1}^{\infty}\sum_{k=k^{\prime}}^{\infty}p_{k}\binom{k}{k^{\prime}}p^{k^{\prime}}(1-p)^{k-k^{\prime}}\frac{k^{\prime}}{p\langle k\rangle}x^{k^{\prime}-1}
=G1(1p+px).\displaystyle=G_{1}(1-p+px). (16)

Then, inserting them into Eqs.(13) and (14), we have [48, 87]

R\displaystyle R =1g1(1R)\displaystyle=1-g_{1}(1-R)
=1G1(1pR),\displaystyle=1-G_{1}(1-pR), (17)
S\displaystyle S =1g0(1R)\displaystyle=1-g_{0}(1-R)
=1G0(1pR).\displaystyle=1-G_{0}(1-pR). (18)

Before continuing to consider the details of these two equations, we must point out that the order parameter SS here is the size respected to the diluted network. For bond percolation, the dilution is only for links, so the diluted network has the same size as the original network and SS is also the size respecting to the original network. However, for site percolation the diluted network consists of only occupied nodes (fraction pp respected to the original network). As a result, Eq.(18) should be revised for site percolation, that is

S=p[1G0(1pR)].S=p\left[1-G_{0}(1-pR)\right]. (19)
Refer to caption
Figure 3: Percolation on ER networks and random regular (RR) networks. The scatters are the simulation results on networks with size N=105N=10^{5}, the corresponding lines are the theoretical prediction of Eqs.(17)-(19). (a) ER networks with average degree k=4\langle k\rangle=4. For this case the generating functions can be simplified as G0(x)=G1(x)=ek(1x)G_{0}(x)=G_{1}(x)=e^{-\langle k\rangle(1-x)}. (b) RR networks with all the nodes having identical degree k0=4k_{0}=4. For this case the generating functions can be simplified as G0(x)=xk0G_{0}(x)=x^{k_{0}} and G1(x)=xk01G_{1}(x)=x^{k_{0}-1}.

In general, one can solve Eq.(17) first, and then insert RR into Eq.(18) or (19) to get the order parameter SS for the bond percolation and the site percolation, respectively. From Eqs.(17)-(19), we can also find that the two types of percolation models give the same probability RR, but different giant clusters. For mathematical tractability, sometimes one just uses RR to represent the meaning of pRpR for site percolation in Eqs.(17) and (19), that is

R\displaystyle R =p[1G1(1R)],\displaystyle=p[1-G_{1}(1-R)], (20)
S\displaystyle S =p[1G0(1R)].\displaystyle=p[1-G_{0}(1-R)]. (21)

These two equations are fully equivalent to Eqs.(17) and (19).

Although the mean-field approximation is used in the above discussion, i.e., all the nodes have the same probability SS and all the links have the same probability RR, Eqs.(17)-(21) can give an exact solution for the percolation on tree-like networks, see Fig.3 as an example. A comparison between the mean-field prediction and the numerical simulation on some real-world networks can also be found in Ref.[88].

Following the idea of branching, the microstructure of the giant cluster in random networks [89], such as degree distribution, and degree correlations, as well as that of temporal networks [90], can also be obtained. For networks with loops, different links could lead to the same nodes in the branching process. In other words, the branching processes starting from different links are no longer independent of each other, thus all the equations (13)-(21) are not valid. That is why the above method is only applicable to the tree-like structure.

2.2.2 Behaviors of finite clusters

Besides the giant cluster, the percolation theory also concerns the behaviors of finite clusters, such as mean cluster size, and cluster size distribution. Next, along with the idea of the branching process used above, we will further show how to obtain the information of finite clusters in percolation model.

As discussed in Ref.[48], it is more convenient to study the distribution πs=nss=pss/s\pi_{s}=n_{s}s=p_{s}s/\langle s\rangle rather than nsn_{s} or psp_{s}, which means the probability distribution of the size of the cluster that a randomly chosen node belongs to. One can further define ρs\rho_{s} as the size distribution of the cluster at the end of a link. Accordingly, the generating functions have the forms H0(x)=s=0πsxsH_{0}(x)=\sum_{s=0}\pi_{s}x^{s} and H1(x)=s=0ρsxsH_{1}(x)=\sum_{s=0}\rho_{s}x^{s}. For convenience, the occupation of links and nodes is not taken into account in the following. When necessary, one can use the revised generating functions Eqs.(15) and (16) to get the corresponding results.

Note that a zero size has no physical senses, so π0=0\pi_{0}=0, while ρ0\rho_{0} can be non-zero corresponding a node with degree one. Since H0(1)H_{0}(1) and H1(x)H_{1}(x) only contain the finite clusters, and the giant cluster is excluded (if there is one), we have H0(1)=sπs=1SH_{0}(1)=\sum_{s}\pi_{s}=1-S and H1(1)=sρs=1RH_{1}(1)=\sum_{s}\rho_{s}=1-R, where SS and RR are defined as that for Eqs.(13) and (14). Below the critical point, R=0R=0 and S=0S=0, so H0(1)=1H_{0}(1)=1 and H1(1)=1H_{1}(1)=1.

As the schematic in Fig.4, the cluster size distributions πs\pi_{s} and ρs\rho_{s} are dependent on both the degree/excess-degree distribution and the branching size at the end of a link. Thus, Fig.4 can be translated as the equations

H0(x)\displaystyle H_{0}(x) =xk=0pk[H1(x)]k\displaystyle=x\sum_{k=0}p_{k}[H_{1}(x)]^{k}
=xG0[H1(x)],\displaystyle=xG_{0}[H_{1}(x)], (22)
H1(x)\displaystyle H_{1}(x) =xk=1qk[H1(x)]k1\displaystyle=x\sum_{k=1}q_{k}[H_{1}(x)]^{k-1}
=xG1[H1(x)].\displaystyle=xG_{1}[H_{1}(x)]. (23)

Note that there is one more factor xx on the right hand sides of the two equations for the contribution of the root node. Using the relations H0(1)=1SH_{0}(1)=1-S and H1(1)=1RH_{1}(1)=1-R, these two equations just recover Eqs.(13) and (14) when x=1x=1.

Refer to caption
Figure 4: Schematic diagrams of the self-consistent equations for the cluster size distribution. The line, circle and square represent link, node and cluster, respectively. (a) The possible cases of the branchings at the end of a link, corresponding to Eq.(23). (b) The possible cases of the branchings starting from a node, corresponding to Eq.(22). Such schematic diagrams are originally shown in Ref.[48].

In principle, we can solve Eqs.(22) and (23) together to find H0(x)H_{0}(x), and then use

πs=ds1dxs1H0(x)x|x=0\pi_{s}=\left.\frac{d^{s-1}}{dx^{s-1}}\frac{H_{0}(x)}{x}\right|_{x=0} (24)

to find the cluster size distribution. However, usually no closed form can be found for H0(x)H_{0}(x). Instead, we can substitute Eqs.(22) and (23) into Eq.(24), then do a change of variables via Cauchy formula, and finally get

πs=k(s1)!ds2dxs2[G1(x)]s|x=0.\pi_{s}=\left.\frac{\langle k\rangle}{(s-1)!}\frac{d^{s-2}}{dx^{s-2}}[G_{1}(x)]^{s}\right|_{x=0}. (25)

With this, the cluster size can be obtained directly from G1(x)G_{1}(x), therefore avoiding solve Eqs.(22) and (23) [91, 92, 93]. As πssnssps\pi_{s}\propto sn_{s}\propto sp_{s}, the result must have the form πss1τ\pi_{s}\propto s^{1-\tau}, where τ\tau is the Fisher exponent.

Furthermore, the mean cluster size χ\chi can also be obtained from Eqs.(22) and (23). From the definition, the mean cluster size χ\chi can be represented as

χ\displaystyle\chi =ss2nsssns\displaystyle=\frac{\sum_{s}s^{2}n_{s}}{\sum_{s}sn_{s}}
=ssπssπs\displaystyle=\frac{\sum_{s}s\pi_{s}}{\sum_{s}\pi_{s}}
=H0(1)H0(1)\displaystyle=\frac{H_{0}^{\prime}(1)}{H_{0}(1)}
=1+G0(1)H1(1)H0(1).\displaystyle=\frac{1+G_{0}^{\prime}(1)H_{1}^{\prime}(1)}{H_{0}(1)}. (26)

In the last equation the differential of Eq.(22) is used. We can further employ Eq.(23) to replace the term H1(1)H_{1}^{\prime}(1), and yield

χ=1H0(1)[1+G0(1)1G1(1)].\chi=\frac{1}{H_{0}(1)}\left[1+\frac{G_{0}^{\prime}(1)}{1-G_{1}^{\prime}(1)}\right]. (27)

It is obvious that χ\chi is divergent when

G1(1)=1.G_{1}^{\prime}(1)=1. (28)

This is the Molloy–Reed criterion k(k2)=0\langle k(k-2)\rangle=0 found in random graph theory [77], above which (G1(1)>1G_{1}^{\prime}(1)>1) the giant cluster exists.

In addition, the discussion of finite clusters based on generating functions H0(x)H_{0}(x) and H1(x)H_{1}(x) can also work without the help of generating functions G0(x)G_{0}(x) and G1(x)G_{1}(x), so it is not limited to uncorrelated tree-like networks, see Sec.3.4 for an example of the percolation on growing networks.

2.3 Potts model formulation

The Potts model is a generalization of the Ising model with more than two components [94], and related to a number of other outstanding problems in lattice systems [24], including percolation model. Although the Potts model is also unsolved for an arbitrary network, the connection between the Potts model and the percolation model has made it possible to explore the network percolation from the known information on the Potts model. This section will provide the mapping between Potts model and percolation model, as well as some discussions.

2.3.1 Fortuin-Kasteleyn cluster representation

It is known that the q1q\to 1 limit of the qq-state Potts model without external field corresponds to the percolation problem, which is thus applied to the study of percolation on networks [95, 96, 16, 24]. To consider the percolation on a network 𝒢\mathcal{G}, we introduce a qq-state Potts model with the Hamiltonian

=Ji,jδαi,αjHiδαi,α,\mathcal{H}=-J\sum_{\langle i,j\rangle}\delta_{\alpha_{i},\alpha_{j}}-H\sum_{i}\delta_{\alpha_{i},\alpha}, (29)

where αi=1,2,3,,q\alpha_{i}=1,2,3,\ldots,q is the spin state of node ii, and the sums i,j\sum_{\langle i,j\rangle} and i\sum_{i} are over all the links and nodes of the network, respectively. δx,y\delta_{x,y} is the Kronecker delta function, i.e., δx,y=0,1\delta_{x,y}=0,1 if xyx\neq y and x=yx=y, respectively. Here, the ferromagnetic interaction is used J>0J>0, and the magnetic field H>0H>0 is applied to the spin αH\alpha_{H}. Then, the partition function can be written as

Z\displaystyle Z ={α}eβ\displaystyle=\sum_{\{\alpha\}}e^{-\beta\mathcal{H}}
={α}eKi,jδαi,αjeLiδαi,αH\displaystyle=\sum_{\{\alpha\}}e^{K\sum_{\langle i,j\rangle}\delta_{\alpha_{i},\alpha_{j}}}e^{L\sum_{i}\delta_{\alpha_{i},\alpha_{H}}}
={α}i,jeKδαi,αjieLδαi,αH\displaystyle=\sum_{\{\alpha\}}\prod_{\langle i,j\rangle}e^{K\delta_{\alpha_{i},\alpha_{j}}}\prod_{i}e^{L\delta_{\alpha_{i},\alpha_{H}}}
={α}i,j[1+(eK1)δαi,αj]i[1+(eL1)δαi,αH],\displaystyle=\sum_{\{\alpha\}}\prod_{\langle i,j\rangle}\left[1+(e^{K}-1)\delta_{\alpha_{i},\alpha_{j}}\right]\prod_{i}\left[1+(e^{L}-1)\delta_{\alpha_{i},\alpha_{H}}\right], (30)

where K=βJK=\beta J, L=βHL=\beta H, and the sum {α}\sum_{\{\alpha\}} is over all the spin configurations of the system. We can further expand the products and use the subnetworks of 𝒢\mathcal{G} to represent the terms in the expansion, which is often called Fortuin–Kasteleyn cluster representation [97].

Next, we give a brief derivation for the Fortuin–Kasteleyn cluster representation. For arguments aia_{i}, iIi\in I, the product iI(1+ai)\prod_{i\in I}(1+a_{i}) is equivalent to the sum of all the possible polynomials formed by aia_{i}, iIi\in I, so that iI(1+ai)=IIiIai\prod_{i\in I}(1+a_{i})=\sum_{I^{\prime}\subseteq I}\prod_{i\in I^{\prime}}a_{i}, where II^{\prime} is a subset of II, and the sum I\sum_{I^{\prime}} is over all the possible cases. By this formula, the product i,j\prod_{\langle i,j\rangle} in Eq.(30) can be represented by the sum of the subnetworks 𝒢\mathcal{G}^{\prime} of 𝒢\mathcal{G}, that is

Z\displaystyle Z ={α}𝒢𝒢i,j𝒢(eK1)δαi,αji[1+(eL1)δαi,αH]\displaystyle=\sum_{\{\alpha\}}\sum_{\mathcal{G}^{\prime}\subseteq\mathcal{G}}\prod_{\langle i,j\rangle\in\mathcal{G}^{\prime}}(e^{K}-1)\delta_{\alpha_{i},\alpha_{j}}\prod_{i}\left[1+(e^{L}-1)\delta_{\alpha_{i},\alpha_{H}}\right]
=𝒢𝒢(eK1)l(𝒢){α}i,j𝒢δαi,αji[1+(eL1)δαi,αH],\displaystyle=\sum_{\mathcal{G}^{\prime}\subseteq\mathcal{G}}(e^{K}-1)^{l(\mathcal{G}^{\prime})}\sum_{\{\alpha\}}\prod_{\langle i,j\rangle\in\mathcal{G}^{\prime}}\delta_{\alpha_{i},\alpha_{j}}\prod_{i}\left[1+(e^{L}-1)\delta_{\alpha_{i},\alpha_{H}}\right], (31)

where l(𝒢)l(\mathcal{G}^{\prime}) is the number of links in subnetwork 𝒢\mathcal{G}^{\prime}. Note that the term (i,j)𝒢δαi,αj\prod_{(i,j)\in\mathcal{G}^{\prime}}\delta_{\alpha_{i},\alpha_{j}} gives nonzero value only when the nodes in the same clusters are in the same states (any of the qq states α=1,2,,q\alpha=1,2,\ldots,q). This can further simplify the partition function to be

Z\displaystyle Z =𝒢𝒢(eK1)l(𝒢){αc}c[1+(eL1)δαc,αH]sc\displaystyle=\sum_{\mathcal{G}^{\prime}\subseteq\mathcal{G}}(e^{K}-1)^{l(\mathcal{G}^{\prime})}\sum_{\{\alpha_{c}\}}\prod_{c}\left[1+(e^{L}-1)\delta_{\alpha_{c},\alpha_{H}}\right]^{s_{c}}
=𝒢𝒢(eK1)l(𝒢)c(eLsc+q1).\displaystyle=\sum_{\mathcal{G}^{\prime}\subseteq\mathcal{G}}(e^{K}-1)^{l(\mathcal{G}^{\prime})}\prod_{c}\left(e^{Ls_{c}}+q-1\right). (32)

Here, the product c\prod_{c} is over all the clusters formed by the links in subnetwork 𝒢\mathcal{G}^{\prime}, and scs_{c} is the number of nodes in cluster cc. The sum over spin {α}\{\alpha\} in Eq.(30) has now been replaced by a sum over subnetwork {𝒢}\{\mathcal{G}^{\prime}\}, this is just the Fortuin–Kasteleyn cluster representation of the partition function of Potts model.

Let eK1=p/(1p)e^{K}-1=p/(1-p), the partition function Eq.(32) can be rescaled as

𝒵\displaystyle\mathcal{Z} (1p)l(𝒢)Z\displaystyle\equiv(1-p)^{l(\mathcal{G})}Z
=𝒢𝒢pl(𝒢)(1p)l(𝒢)l(𝒢)c(eLsc+q1)\displaystyle=\sum_{\mathcal{G}^{\prime}\subseteq\mathcal{G}}p^{l(\mathcal{G}^{\prime})}(1-p)^{l(\mathcal{G})-l(\mathcal{G}^{\prime})}\prod_{c}\left(e^{Ls_{c}}+q-1\right)
=𝒢𝒢P𝒢c(eLsc+q1),\displaystyle=\sum_{\mathcal{G}^{\prime}\subseteq\mathcal{G}}P_{\mathcal{G}^{\prime}}\prod_{c}\left(e^{Ls_{c}}+q-1\right), (33)

where l(𝒢)l(\mathcal{G}) is the total number of links in 𝒢\mathcal{G}. If pp is interpreted as the occupied probability of links, the term P𝒢=pl(𝒢)(1p)l(𝒢)l(𝒢)P_{\mathcal{G}^{\prime}}=p^{l(\mathcal{G}^{\prime})}(1-p)^{l(\mathcal{G})-l(\mathcal{G}^{\prime})} is just the probability of a percolation configuration with l(𝒢)l(\mathcal{G}^{\prime}) occupied links. Thus, Eq.(33) bridges the Potts model and the percolation model.

From Eq.(33), we can write the free energy per node as

f(K,L,q)=limNln𝒵N.f(K,L,q)=\lim_{N\to\infty}\frac{\ln\mathcal{Z}}{N}. (34)

For convenience, we further define

h(K,L)\displaystyle h(K,L) qf(K,L,q)|q=1\displaystyle\equiv\left.\frac{\partial}{\partial q}f(K,L,q)\right|_{q=1}
=𝒢𝒢P𝒢ceLscN𝒢𝒢P𝒢\displaystyle=\frac{\sum_{\mathcal{G}^{\prime}\subseteq\mathcal{G}}P_{\mathcal{G}^{\prime}}\sum_{c}e^{-Ls_{c}}}{N\sum_{\mathcal{G}^{\prime}\subseteq\mathcal{G}}P_{\mathcal{G}^{\prime}}}
=ceLscN.\displaystyle=\frac{\left\langle\sum_{c}e^{-Ls_{c}}\right\rangle}{N}. (35)

For L>0L>0 the sum in this equation is over all the finite clusters, then the size of the percolating cluster is

S\displaystyle S =1cscN\displaystyle=1-\frac{\left\langle\sum_{c}s_{c}\right\rangle}{N}
=1+Lh(K,L)|L=0,\displaystyle=1+\left.\frac{\partial}{\partial L}h(K,L)\right|_{L=0}, (36)

and the mean cluster size is

χ=2L2h(K,L)|L=0.\chi=\left.\frac{\partial^{2}}{\partial L^{2}}h(K,L)\right|_{L=0}. (37)

By the above two equations, the percolation property can then be extracted from the Potts model.

The product c\prod_{c} in Eq.(33) is over all the clusters. It thus can be rewritten in another form of

𝒵=𝒢𝒢P𝒢s(eLs+q1)ms,\mathcal{Z}=\sum_{\mathcal{G}^{\prime}\subseteq\mathcal{G}}P_{\mathcal{G}^{\prime}}\prod_{s}\left(e^{Ls}+q-1\right)^{m_{s}}, (38)

where the product s\prod_{s} is over the size of the clusters, and msm_{s} is the number of clusters with size ss. By this formula, function h(K,L)h(K,L) can also be expressed as

h(K,L)=smseLsN=snseLs.h(K,L)=\frac{\left\langle\sum_{s}m_{s}e^{-Ls}\right\rangle}{N}=\sum_{s}n_{s}e^{-Ls}. (39)

Here, ns=ms/Nn_{s}=\langle m_{s}/N\rangle is the cluster size distribution. This is just the generating function h(K,x)=snsxsh(K,x)=\sum_{s}n_{s}x^{s} with x=eLx=e^{-L}, which can be used to study the cluster size distribution.

Specifically, once the Hamiltonian (Eq.(29)) of a network is known, all the percolation parameters can be obtained by the above equations. However, for most network models, it is difficult to simplify the sum over links in Eq.(29). A solvable case is the hidden parameter model, for which the pre-defined connection probability is given for each pair of nodes. Thus, the sum over links can be translated as a weighed (connection probability) sum over all the node pairs. With this technique, the Potts model formulation has been used in solving the percolation on SF networks [98, 99], on correlated hypergraphs [100], and in generalized canonical random network ensembles [101].

2.3.2 Relation with the mean-field equations

Through the so-called recurrence relation, the mean-field equations for tree-like networks can be recovered by the Potts model formulation [102]. For this purpose, the partition function can be represented as an integration of a root, labeled node 0, and the branchings from it, that is

Z\displaystyle Z =α0=1qeLδα0,αHi=1zi(α0)\displaystyle=\sum_{\alpha_{0}=1}^{q}e^{L\delta_{\alpha_{0},\alpha_{H}}}\prod_{i=1}z_{i}(\alpha_{0})
=eLi=1zi(αH)+(q1)i=1zi(α).\displaystyle=e^{L}\prod_{i=1}z_{i}(\alpha_{H})+(q-1)\prod_{i=1}z_{i}(\alpha). (40)

Here, zi(α0)z_{i}(\alpha_{0}) is the partition function of the branching from node ii, and the product i\prod_{i} runs over all the root’s nearest neighbors (nodes in the first shell of node 0). Due to the symmetry, α\alpha in the second term of Eq.(40) can be any state excluding aHa_{H}. Obviously, this formulation is only for the tree-like networks, if not, the product i\prod_{i} must include some double counting.

Different from Eq.(30), the partition function equation (40) represents the effective state of the root node. In this way, the free energy per nodes is lnZ\ln Z, then we have

h(K,L)\displaystyle h(K,L) =qlnZ|q=1\displaystyle=\left.\frac{\partial}{\partial q}\ln Z\right|_{q=1}
=eLi=1zi(α)i=1zi(αH).\displaystyle=e^{-L}\frac{\prod_{i=1}z_{i}(\alpha)}{\prod_{i=1}z_{i}(\alpha_{H})}. (41)

To study the percolation model, we assume here that zi(α)z_{i}(\alpha) describes the state for q=1q=1 and L=0L=0, thus it is irrelevant to the differentials /q\partial/\partial q and /L\partial/\partial L. For convenience, we further rescale zi(α)z_{i}(\alpha) by zi(αH)z_{i}(\alpha_{H}), i.e., xizi(α)/zi(αH)x_{i}\equiv z_{i}(\alpha)/z_{i}(\alpha_{H}). Then, by Eq.(36), we can find the order parameter of the percolation transition

S\displaystyle S =1+Lh(K,L)|L=0\displaystyle=1+\left.\frac{\partial}{\partial L}h(K,L)\right|_{L=0}
=1i=1xi.\displaystyle=1-\prod_{i=1}x_{i}. (42)

From this equation, the physical meaning of xix_{i} becomes clear, that is the probability that the branching from node ii is not the giant cluster.

To obtain the order parameter, we need to further find xi(α)x_{i}(\alpha). Comparing Eqs.(40) and (30), we have

zi(α0)={α}eKj,kδαj,αk+LjδαH,αj+Kδα0,αi,z_{i}(\alpha_{0})=\sum_{\{\alpha\}}e^{K\sum_{\langle j,k\rangle}\delta_{\alpha_{j},\alpha_{k}}+L\sum_{j}\delta_{\alpha_{H},\alpha_{j}}+K\delta_{\alpha_{0},\alpha_{i}}}, (43)

where the sums j\sum_{j} and j,k\sum_{\langle j,k\rangle} run over the nodes and the links in the branching from node ii, respectively. Note that the interaction between node ii and the root Kδα0,αiK\delta_{\alpha_{0},\alpha_{i}} is also included in this partition function. Furthermore, it is not hard to find that zi(α0)z_{i}(\alpha_{0}) satisfies the following recurrence relation

zi(α0)=αi=1qeKδα0,αi+LδαH,αimzm(αi).z_{i}(\alpha_{0})=\sum_{\alpha_{i}=1}^{q}e^{K\delta_{\alpha_{0},\alpha_{i}}+L\delta_{\alpha_{H},\alpha_{i}}}\prod_{m}z_{m}(\alpha_{i}). (44)

Here, the product m\prod_{m} is over all the neighbors of node ii except the root (nodes in the second shell of node 0), and zm(αi)z_{m}(\alpha_{i}) is the partition function for the subnetwork branching from node mm.

For infinite system, the recursive relation Eq.(44) holds for any two adjacent shells. If α0αH\alpha_{0}\neq\alpha_{H}, it can be rewritten as

zi(α0)=eKmzm(α0)+eLmzm(αH)+(q2)mzm(α).z_{i}(\alpha_{0})=e^{K}\prod_{m}z_{m}(\alpha_{0})+e^{L}\prod_{m}z_{m}(\alpha_{H})+(q-2)\prod_{m}z_{m}(\alpha). (45)

While for α0=αH\alpha_{0}=\alpha_{H}, Eq.(44) reduces to

zi(αH)=eK+Lmzm(αH)+(q1)mzm(α).z_{i}(\alpha_{H})=e^{K+L}\prod_{m}z_{m}(\alpha_{H})+(q-1)\prod_{m}z_{m}(\alpha). (46)

Then, we can find the recursive equation for xix_{i},

xi(α0)\displaystyle x_{i}(\alpha_{0}) =zi(α0)zi(αH)\displaystyle=\frac{z_{i}(\alpha_{0})}{z_{i}(\alpha_{H})}
=eKmzm(α0)+eLmzm(αH)+(q2)mzm(α)eK+Lmzm(αH)+(q1)mzm(α)\displaystyle=\frac{e^{K}\prod_{m}z_{m}(\alpha_{0})+e^{L}\prod_{m}z_{m}(\alpha_{H})+(q-2)\prod_{m}z_{m}(\alpha)}{e^{K+L}\prod_{m}z_{m}(\alpha_{H})+(q-1)\prod_{m}z_{m}(\alpha)}
=eKmxm(α0)+eL+(q2)mxm(α)eK+L+(q1)mxm(α).\displaystyle=\frac{e^{K}\prod_{m}x_{m}(\alpha_{0})+e^{L}+(q-2)\prod_{m}x_{m}(\alpha)}{e^{K+L}+(q-1)\prod_{m}x_{m}(\alpha)}. (47)

For q=1q=1 and L=0L=0, it reduces to

xi\displaystyle x_{i} =eK+(1eK)mxm\displaystyle=e^{-K}+(1-e^{-K})\prod_{m}x_{m}
=1p+pmxm,\displaystyle=1-p+p\prod_{m}x_{m}, (48)

where p=1eKp=1-e^{-K}.

From the view of mean-field theory, the probabilities xix_{i} for different nodes can all be replaced by an average probability xx. Then, Eqs.(42) and (48) become

S\displaystyle S =1kpkxk=1G0(x),\displaystyle=1-\sum_{k}p_{k}x^{k}=1-G_{0}(x), (49)
x\displaystyle x =1p+pkpkkkxk1=1p+pG1(x).\displaystyle=1-p+p\sum_{k}\frac{p_{k}k}{\langle k\rangle}x^{k-1}=1-p+pG_{1}(x). (50)

These are just the mean-field equations (17) and (18) found in Sec.2.2 with x=1pRx=1-pR. By this equation, we bridge the Potts model formulation of the percolation model with that of the mean-field method based on the branching process [102]. Note that the tree-like structure is also required here.

2.4 Message passing method

In recent years, there has been a growing interest in the analysis of the percolation problem on real networks. These systems all have finite sizes and are featured by the adjacency matrix rather than the degree distribution. As a consequence, the above mean-field method does not work in this case. Instead, the most common method to estimate the percolation threshold on these networks is the so-called message passing method [103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113]. Note that the percolation threshold (transition) is theoretically defined in the thermodynamic limit (NN\to\infty), here we assume that the system is large enough to observe a percolation transition. Next, we will go to the details of this method.

Different from the above mean-field method, the message passing method allows each node ii has its own probability sis_{i} to express whether it is a part of the giant cluster. From this perspective, the size of the giant cluster can be written as

S=isiN.S=\frac{\sum_{i}s_{i}}{N}. (51)

Similar to Eqs.(20) and (21), we can also express sis_{i} as

si\displaystyle s_{i} =p[1j𝒩i(1rij)],\displaystyle=p\left[1-\prod_{j\in\mathcal{N}_{i}}(1-r_{i\to j})\right], (52)
rij\displaystyle r_{i\to j} =p[1k𝒩j,ki(1rjk)],\displaystyle=p\left[1-\prod_{k\in\mathcal{N}_{j},k\neq i}(1-r_{j\to k})\right], (53)

where rijr_{i\to j} is the probability that the link iji\to j leads to the giant cluster, and 𝒩i\mathcal{N}_{i} is the set of neighbors of node ii. Furthermore, by evaluating the logarithm of the above two equations, we can turn the products into sums,

ln(1sip)\displaystyle\ln\left(1-\frac{s_{i}}{p}\right) =j𝒩iln(1rij),\displaystyle=\sum_{j\in\mathcal{N}_{i}}\ln(1-r_{i\to j}), (54)
ln(1rijp)\displaystyle\ln\left(1-\frac{r_{i\to j}}{p}\right) =k𝒩j,kiln(1rjk)\displaystyle=\sum_{k\in\mathcal{N}_{j},k\neq i}\ln\left(1-r_{j\to k}\right)
=kAkjln(1rjk)Ajiln(1rji).\displaystyle=\sum_{k}A_{kj}\ln\left(1-r_{j\to k}\right)-A_{ji}\ln\left(1-r_{j\to i}\right). (55)

Here 𝐀\mathbf{A} is the adjacency matrix of the network, i.e., Aij=1A_{ij}=1 if there is a link iji\to j, otherwise Aij=0A_{ij}=0.

Let wij=ln(1rij/p)w_{i\to j}=\ln(1-r_{i\to j}/p) and vij=ln(1rij)v_{i\to j}=\ln(1-r_{i\to j}), Eq.(55) becomes equivalent to the vectorial equation

𝐰=𝐌𝐯,\mathbf{w}=\mathbf{M}\mathbf{v}, (56)

where 𝐌\mathbf{M} is a 2L×2L2L\times 2L matrix (LL is the number of links). From Eq.(55), we can find that only when j=kj=k and ili\neq l, the entry Mij,klM_{i\to j,k\to l} is non-zero. In other words, the two links must be head to tail, and excluding the case of backtracking. This matrix is known as the Hashimoto or non-backtracking matrix of graphs [114, 115]. Mathematically, Mij,kl=δj,k(1δi,l)M_{i\to j,k\to l}=\delta_{j,k}(1-\delta_{i,l}), where δx,y\delta_{x,y} is the Kronecker delta function δx,y=1\delta_{x,y}=1 if x=yx=y, and δx,y=0\delta_{x,y}=0, otherwise.

When ppcp\to p_{c}, all rir_{i} trend to zero. Then, expanding Eq.(55) around the critical point, we obtain an eigenvalue equation of matrix 𝐌\mathbf{M},

1pc𝐫=𝐌𝐫.\frac{1}{p_{c}}\mathbf{r}=\mathbf{M}\mathbf{r}. (57)

According to the Perron-Frobenius theorem, only the largest eigenvalue λmax\lambda_{max} of matrix 𝐌\mathbf{M} can give a meaningful eigenvector 𝐫\mathbf{r} (with all elements non-negative). Therefore, it gives a lower bound for the site percolation threshold on an infinite graph pc=1/λmaxp_{c}=1/\lambda_{max} [103]. Considering the effects of triangles, this framework can also be used to establish a tighter lower bound of the bond percolation threshold on clustered networks [116].

The above discussion is for site percolation, it can be easily extended to bond percolation, that is

si\displaystyle s_{i} =1j𝒩i(1prij),\displaystyle=1-\prod_{j\in\mathcal{N}_{i}}(1-pr_{i\to j}), (58)
rij\displaystyle r_{i\to j} =1k𝒩j,ki(1prjk).\displaystyle=1-\prod_{k\in\mathcal{N}_{j},k\neq i}(1-pr_{j\to k}). (59)

Further discussion of the two equations can be done like that of the site percolation.

In addition, as was done by the mean-field method, the message passing method can also be adopted to study the cluster size distribution. One can refer to Refs.[103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113] for details.

2.5 Phase transition and critical phenomena

Due to the heterogeneous structure, the classical percolation on networks demonstrates many interesting phenomena. In this subsection we will review the findings on tree-like network ensembles, for which the percolation problem can be solved exactly by the methods reviewed in the above subsections.

2.5.1 Percolation threshold

To discuss the critical phenomena, we first need to determine the critical point. For tree-like networks, it is easy to know from Eqs.(17)-(21) that only a non-zero RR can lead to a non-zero SS. Thus the percolation threshold pcp_{c} can be found by analyzing the non-trivial solution of Eq.(17), and the site and bond percolations have the same critical point. For this purpose, we can construct a function

w(R)=R1+G1(1pR).w(R)=R-1+G_{1}(1-pR). (60)

The solution of Eq.(17) corresponds to the crossing point of w(R)w(R) and RR-axis. Since w(R)w(R) is continuous with w(0)=0w(0)=0 and w(1)>0w(1)>0, we can draw a qualitative curve of w(R)w(R) as shown in Fig.5. It is easy to find that the critical point pcp_{c} corresponds to the tangency of w(R)w(R) and RR-axis with Rc=0R_{c}=0, so we have

dw(R)dR=1pcG1(1)=0.\frac{dw(R)}{dR}=1-p_{c}G_{1}^{\prime}(1)=0. (61)

Thus, the percolation threshold can be written as

pc=1G1(1).p_{c}=\frac{1}{G_{1}^{\prime}(1)}. (62)

This is a general form for any tree-like networks. Using the expression of the generating function G1(x)G_{1}(x), it can be rewritten as

pc=kk2k.p_{c}=\frac{\langle k\rangle}{\langle k^{2}\rangle-\langle k\rangle}. (63)

For p=1p=1, this equation reduces to Eq.(28), i.e., the Molloy–Reed criterion k(k2)=0\langle k(k-2)\rangle=0 [77]. Note that this is an approximate expression for tree-like infinite networks. A general case for finite systems is the one found by message passing method, see the last subsection.

Refer to caption
Figure 5: (Color online) Schematic of the solution of Eq.(17). Only when pp exceeds pcp_{c}, function w(R)w(R) has a non-trivial crossing point with RR-axis. The critical point corresponds to the tangency of the function w(R)w(R) and RR-axis, which gives a vanished RR. The inset shows the non-trivial solution RR vs. pp.

For ER networks, node degrees obey the Poisson distribution pk=ekkk/k!p_{k}=e^{-\langle k\rangle}\langle k\rangle^{k}/k!, so that

G0(x)\displaystyle G_{0}(x) =G1(x)=ek(x1),\displaystyle=G_{1}(x)=e^{\langle k\rangle(x-1)}, (64)
k2\displaystyle\langle k^{2}\rangle =k2+k.\displaystyle=\langle k\rangle^{2}+\langle k\rangle. (65)

This allows the percolation threshold has a simple form pc=1/kp_{c}=1/\langle k\rangle. Besides, due to the Poisson degree distribution, ER networks can have a closed solution of percolation threshold for many network percolation models. For comparison, we list the percolation thresholds of ER networks for different models in Tab.3.

Table 3: The percolation thresholds of ER networks for some solvable models. The order parameter of the first part of the table is the occupied probability pp, i.e., removing each node with probability 1p1-p. In the second part of the table, the order parameter is the link inserting probability pp.
Model Threshold
Classical percolation 1/k1/\langle k\rangle [48]
Biconnected cluster 1/k1/\langle k\rangle [117]
Core percolation 2.7182818/k2.7182818/\langle k\rangle [118]
Greedy articulation point removal 3.39807/k3.39807/\langle k\rangle [119]
Two interdependent networks 2.4554/k2.4554/\langle k\rangle [120, 121]
Single networks with dependence links 2.4554/k\sqrt{2.4554/\langle k\rangle} [122]
Classical percolation 1/N1/N [47]
Clique percolation {(kl)!/[(kl)1]}2/(kl)(k+l1)N2/(k+l1)\{(k−l)!/[\binom{k}{l}−1]\}^{2/(k−l)(k+l−1)}N^{−2/(k+l−1)} [123, 124]
Growing network 1/81/8 [125]

A more interesting case is the SF network with 2<λ<32<\lambda<3, for which k2\langle k^{2}\rangle is divergent, indicating a vanished percolation threshold. Employing the Hurwitz zeta function ζ(x,a)=n=anx\zeta(x,a)=\sum_{n=a}n^{-x}, the percolation threshold of SF networks can be represented as

pc\displaystyle p_{c} =k=mk1λk=mk2λk=mk1λ\displaystyle=\frac{\sum_{k=m}^{\infty}k^{1-\lambda}}{\sum_{k=m}^{\infty}k^{2-\lambda}-\sum_{k=m}^{\infty}k^{1-\lambda}}
=ζ(λ1,m)ζ(λ2,m)ζ(λ1,m).\displaystyle=\frac{\zeta(\lambda-1,m)}{\zeta(\lambda-2,m)-\zeta(\lambda-1,m)}. (66)

One can find that for m=1m=1, this formula gives a percolation threshold pcp_{c} larger than 11 when λ>3.47875\lambda>3.47875\ldots, indicating there is no percolation transition. This is due to the absence of the spanning cluster in such SF networks [78]. The typical value λ3.47875\lambda\approx 3.47875 can be obtained by 2ζ(λ1)=ζ(λ2)2\zeta(\lambda-1)=\zeta(\lambda-2), where ζ(x)\zeta(x) is the Riemann zeta function. This problem can be overcome by simply setting m2m\geq 2. Furthermore, the Hurwitz zeta function ζ(x,a)\zeta(x,a) is divergent for x1x\leq 1. Thus a vanished percolation threshold can be found for SF networks with λ3\lambda\leq 3.

The zero percolation threshold is not unique to SF networks, and a general class of self-similar and hierarchical networks can also have zero percolation threshold [126, 127, 128]. This property can be derived from the hierarchy of nested subgraphs in self-similar networks and real-space renormalization group technique, and does not require the assumption that networks are tree-like. In addition, by embedding SF networks into physical dimensions, a non-zero percolation threshold can also be reconstructed [129]. This is mainly because the spatial constraint induces a high clustering and dilutes the global connections.

2.5.2 Scaling behaviors

Varied percolation thresholds in different networks are excepted, since it is not universal, whereas things get interesting in that the heterogeneous structures could produce a different universal property of percolation transition, i.e., the critical exponent. Indeed, we can expand the analytical results (see Sec.2.2) around the critical point to find the critical exponents. It is also worth pointing out that the critical exponents derived from the results obtained in Sec.2.2 are still the sense of mean-field solutions, although they might be different from the regular mean-field nature.

For a few cases that have a closed form of the generating functions G0(x)G_{0}(x) and G1(x)G_{1}(x), such as ER networks and RR networks, they give the critical exponents above the critical dimension (the mean-field critical exponents), since these systems have no spatial constraint. Mathematically, for these networks, the degree distribution only affects the coefficients of the expansion series around the percolation threshold, then yields the same leading order. For the calculation of these cases, one can refer to Refs.[48, 87, 91, 92] for details.

A special case is SF networks, which have strong heterogeneity. Although there is also no spatial constraint, it shows a λ\lambda-dependent critical behavior (λ\lambda is the exponent of the degree distribution pk=ckλp_{k}=ck^{-\lambda}). Even for a vanished percolation threshold (2<λ<32<\lambda<3), the critical behavior is also well defined. The critical exponents can be also derived from the results obtained in Sec.2.2, but bear a strong heterogeneity of the degree distribution.

Although the generating functions G0(x)G_{0}(x) and G1(x)G_{1}(x) have no closed form for SF networks, they can be represented by the Lerch’s transcendent Φ(x,s,a)=n=0xn(n+a)s\Phi(x,s,a)=\sum_{n=0}^{\infty}x^{n}(n+a)^{-s},

G0(x)\displaystyle G_{0}(x) =k=mckλxk\displaystyle=\sum_{k=m}ck^{-\lambda}x^{k}
=xmΦ(x,λ,m)ζ(λ,m)\displaystyle=\frac{x^{m}\Phi(x,\lambda,m)}{\zeta(\lambda,m)}
=cxmΦ(x,λ,m),\displaystyle=cx^{m}\Phi(x,\lambda,m), (67)
G1(x)\displaystyle G_{1}(x) =k=mk1λxk1k=mk1λ\displaystyle=\frac{\sum_{k=m}k^{1-\lambda}x^{k-1}}{\sum_{k=m}k^{1-\lambda}}
=xm1Φ(x,λ1,m)ζ(λ1,m)\displaystyle=\frac{x^{m-1}\Phi(x,\lambda-1,m)}{\zeta(\lambda-1,m)}
=1ζ(λ1,m)x[xmΦ(x,λ,m)]\displaystyle=\frac{1}{\zeta(\lambda-1,m)}\frac{\partial}{\partial x}\left[x^{m}\Phi(x,\lambda,m)\right]
=ckx[xmΦ(x,λ,m)],\displaystyle=\frac{c}{\langle k\rangle}\frac{\partial}{\partial x}\left[x^{m}\Phi(x,\lambda,m)\right], (68)

where kn=ck=mknλ=cζ(λn,m)\langle k^{n}\rangle=c\sum_{k=m}k^{n-\lambda}=c\zeta(\lambda-n,m). To find the critical behaviors of the percolation on SF networks, we need further to know the series expansion of G0(x)G_{0}(x) and G1(x)G_{1}(x) at the critical point pcp_{c}, which corresponds to x1x\to 1, see Sec.2.5.1. However, Φ(x,s,a)\Phi(x,s,a) has a singularity at x=1x=1, which is dependent on ss. This is the mathematical origin of the specific critical behaviors of SF networks.

From Ref.[130], for |lnx|<2π|\ln x|<2\pi and a0,1,2,a\neq 0,-1,-2,\cdots, the Lerch’s transcendent Φ(x,s,a)\Phi(x,s,a) can be expanded as

Φ(x,s,a)=xa[Γ(1s)(lnz)s1+n=0ζ(sn,a)n!(lnx)n],s1,2,3,.\Phi(x,s,a)=x^{-a}\left[\Gamma(1-s)(-\ln z)^{s-1}+\sum_{n=0}\frac{\zeta(s-n,a)}{n!}(\ln x)^{n}\right],~{}s\neq 1,2,3,\cdots. (69)

Let x1ϵ1x\equiv 1-\epsilon\to 1^{-}, we also have

ln(1ϵ)=ϵ+12ϵ2+13ϵ3+.-\ln(1-\epsilon)=\epsilon+\frac{1}{2}\epsilon^{2}+\frac{1}{3}\epsilon^{3}+\cdots. (70)

Then, we can further write Φ(x,s,a)\Phi(x,s,a) as a series of ϵ\epsilon,

Φ(x,s,a)=xa[Γ(1s)(n=1ϵnn!)s1+n=0(ϵ)nn!l=0nS(n,l)ζ(sl,a)],s1,2,3,,\Phi(x,s,a)=x^{-a}\left[\Gamma(1-s)\left(\sum_{n=1}\frac{\epsilon^{n}}{n!}\right)^{s-1}+\sum_{n=0}\frac{(-\epsilon)^{n}}{n!}\sum_{l=0}^{n}S(n,l)\zeta(s-l,a)\right],~{}s\neq 1,2,3,\cdots, (71)

where S(n,l)S(n,l) is the signed Stirling numbers of the first kind. With this equation, we can find the leading terms of the generating functions G0(x)G_{0}(x) and G1(x)G_{1}(x) for x1x\to 1, i.e., the asymptotic series to the percolation threshold. Then, substituting into the corresponding equations in Sec.2.2, the λ\lambda-dependent critical exponents can be obtained, which are summarized in Tab.4. We can find that the case of 3<λ<43<\lambda<4 is just a mirror copy of the case of 2<λ<32<\lambda<3.

Table 4: The mean-field results of the critical exponents for the classical percolation transition on SF networks [131].
λ\lambda β\beta γ\gamma ν\nu σ\sigma τ\tau
(2,3)(2,3) 1/(3λ)1/(3-\lambda) 1-1 (λ1)/(3λ)(\lambda-1)/(3-\lambda) (3λ)/(λ2)(3-\lambda)/(\lambda-2) (2λ3)/(λ2)(2\lambda-3)/(\lambda-2)
(3,4)(3,4) 1/(λ3)1/(\lambda-3) 11 (λ1)/(λ3)(\lambda-1)/(\lambda-3) (λ3)/(λ2)(\lambda-3)/(\lambda-2) (2λ3)/(λ2)(2\lambda-3)/(\lambda-2)
(4,)(4,\infty) 1 1 33 1/21/2 5/25/2

In essence, a strong heterogeneity just means a broad degree distribution. Therefore, when the emergence of hubs in SF networks is suppressed (large λ\lambda), the regular mean-field exponents as that found in ER networks are excepted. Cohen et al. pointed out that the regular mean-field results can be found when λ>λc=4\lambda>\lambda_{c}=4 [131].

Moreover, Radicchi and Castellano pointed out that the site percolation on SF networks gives a different critical exponent β\beta for 2<λ<32<\lambda<3 [55]. This comes from the singularity of Eq.(19), which has the leading term Sp(1R)S\propto p(1-R). For λ>3\lambda>3, the system has a non-trivial pcp_{c}, and the leading term reduces to S1RS\propto 1-R, which gives the same critical exponent β\beta as that of the bond percolation. However, for 2<λ<32<\lambda<3, it can be rewritten as S(ppc)(1R)S\propto(p-p_{c})(1-R), where pc=0p_{c}=0. This leads to β=1/(3λ)+1=(4λ)/(3λ)\beta=1/(3-\lambda)+1=(4-\lambda)/(3-\lambda).

It is also worth noting that with the Potts model formulation, Lee et al. shows τ=λ\tau=\lambda for 2<λ<32<\lambda<3 [98, 99], which is contradict with that listed in Tab.4 [131]. By this, the validity of the treatments used in the two works requires further study. Besides, Cohen et al. pointed out that due to the vanished threshold for 2<λ<32<\lambda<3, the exponent τ=(2λ3)/(λ2)\tau=(2\lambda-3)/(\lambda-2) is calculated at a small but fixed occupied probability [131].

With the framework shown above, many other features about the percolation transition on SF networks were also studied, such as the fractal dimensions of percolating networks [132], the giant cluster in the large-network limit [133], branching trees [134], statistical ensemble [135, 136, 137], width of percolation transition [138], the upper critical dimension [139], and the cluster forming in SF networks with exponent less than two [140] or one [141].

Although the above discussions are concentrated on the degree distributions of networks, we must point out that the spatial constraint still plays a key role in the nature of percolation transition. By embedding networks in a physical dimension, the network percolation can also exhibit a range of phase behaviors, as well as reconstructing the universality class of physical dimensions, depending on the dominant structure properties, such as spatial constraint, small-world, fractal, and hierarchy [142, 143, 144, 145, 146]. In Refs.[58, 60, 147, 148], Newman and the coauthors also developed a method to study the percolation transition on SW networks, suggesting that the SW network resembles more a random network in infinite dimension.

2.6 On clustered and correlated networks

Real world networks are commonly clustered and correlated rather than a tree-like random structure [2]. Some works also contributed to the percolation of clustered and correlated networks. Note that the clustering and the correlation discussed here are local, i.e., they are the properties of adjacent nodes.

2.6.1 Networks with low clustering

As a starting point, we first review the method proposed by Berchenko et al. to find the approximate solution of the percolation model for networks with low clustering [149]. Network science often employs the clustering coefficient CC to feature how well connected the neighborhood of a node is [150, 57, 1, 2]. If the neighbors are fully connected, the clustering coefficient is 11, while a value close to 0 means that there are hardly any connections in the neighborhood. Indeed, as the definition of the clustering coefficient, it also represents the connection probability of a node’s two neighbors. In the branching process one can exclude the backward links from the excess degrees, see Fig.6 (a). Therefore, the effective excess-degree kk^{\prime} of a node with degree kk can be expressed as

k=(k1k)Ck1k(1C)k.k^{\prime}=\binom{k-1}{k^{\prime}}C^{k-1-k^{\prime}}(1-C)^{k^{\prime}}. (72)

For C=0C=0, this reduces to the tree-like structure k=k1k^{\prime}=k-1. This equation has no physical meaning for C=1C=1, since it is only an approximate treatment for low clustering. With Eq.(72), the generating function of the excess degrees can be revised as

𝒢1c(x)\displaystyle\mathcal{G}_{1}^{c}(x) =\displaystyle= k=1pkkkk=0(k1k)Ck1k(1C)kxk\displaystyle\sum_{k=1}\frac{p_{k}k}{\langle k\rangle}\sum_{k^{\prime}=0}\binom{k-1}{k^{\prime}}C^{k-1-k^{\prime}}(1-C)^{k^{\prime}}x^{k^{\prime}} (73)
=\displaystyle= G1(C+xCx),\displaystyle G_{1}(C+x-Cx),

where G1(x)G_{1}(x) is that with no thought of clusterings. With this revised generating function, we can solve the percolation model on networks with low clustering, which gives the critical point [149]

pc=1(1C)G1(1).p_{c}=\frac{1}{(1-C)G_{1}^{\prime}(1)}. (74)

This indicates that for a given degree distribution, the clustering leads to a larger percolation threshold. This mainly because for a fixed number of links, the clustering structure reinforces the core of the network with the price of diluting the global connections [151, 152, 153, 154]. However, the clustering cannot restore a finite percolation threshold for SF networks with 2<λ<32<\lambda<3 [155], since the divergence of G1(1)G_{1}^{\prime}(1) only depends on the degree distribution in this approximation.

It is important to notice that Eq.(74) works only for networks with low clusterings. Strong clustering could induce the core-periphery structure [156, 157], in which the core and periphery might percolate at different critical points [158, 159, 160], and the above approximate treatment is not applicable. In this perspective, the clustered networks are often characterized as a robust system [161, 162, 163].

Refer to caption
Figure 6: (Color online) Schematic of the branching process in clustered networks. (a) The branching process starts from node 11 takes the breadth-first strategy, i.e., searching all the neighbors of node 11 firstly. At the end of link l12l_{12} indicated by the arrow, node 22 has three excess links, one of which (l23l_{23}) leads to a branching (node 33) that has already been reached as a neighbor of node 11. Therefore, only links l24l_{24} and l25l_{25} can lead to new branchings, i.e., there are only two effective excess degrees. In general, a local backward link, such as l23l_{23}, corresponds to the formation of a clustering, thus occurs with probability CC on average. (b) Considering the non-tree-like motifs as special links/nodes indicated by different colors, the network can be seen as a tree-like one.

2.6.2 Networks with high clustering

For high clustering, the triangles formed by adjacent nodes could share links, which cannot be directly reflected by the clustering coefficient, Eq.(72) thus overcounts the excess links. Of course, one can further use the polynomials of the clustering coefficient to estimate these high-order clustering structures [65]. However, to completely eliminate the overcounting, the history of the branching process has to be considered, which is beyond the compass of the mean-field method. In addition, this problem in a sense relates to the minimum spanning tree of networks [164, 165, 166], which is also a problem without a unified solution.

If the triangles or other non-tree-like motifs do not share links with each other, there is another framework to solve the corresponding percolation problem [152, 153, 167, 168]. Beyond the degree distribution, this approach requires to know the distribution of all other motifs that a node has, see Fig.6 (b). Taking the triangle structure as an example, when the number of triangles that a node has is known, one can treat the triangle as a special link in the branching process. Then, the network can percolate by means of not only single links but also triangles. Under this premise, the clustered networks can also be regarded as tree-like, and the processing method provided above can thus be used. In principle, this method can be applied to networks with any motifs. However, it is a tedious work to classify diverse clustered structures and get the corresponding distribution. In addition, this approach can be well summarized as mapping the clustered network into a tree-like hyper-network [169].

The general thought in dealing with this problem, namely, recognizing a certain connection structure as an individual (link or node), is also used in the so-called clique percolation [170]. In this percolation problem, the kk-clique, a complete subnetwork with kk nodes, is treated as a node, and two kk-cliques sharing ll nodes are considered to be connected. This percolation model is usually used as an algorithm for the detection of communities with overlap [171]. For percolation transition, the main finding is the seemingly discontinuous transition, which essentially is a continuous one [124]. The details will be discussed in Sec.4.3.

Refer to caption
Figure 7: (Color online) Schematic of the percolation configuration of a SW network. (a) A SW network consisted of an underlying two-dimensional square lattice and several shortcuts indicated by red. (b) A possible configuration after removing some links from the SW network shown in (a). Since the shortcuts are sparse, it is nearly impossible to form loop structures with shortcuts near or below the percolation threshold. The loop structures only exist in the clusters extracted from the underlying lattice.

Another application of this framework is the solution of the percolation model on SW networks [147, 148], which typically show a highly clustering effect. In Refs.[147, 148] the authors treated the finite percolation clusters extracted from the underlying lattice of the SW network as effective nodes, then the system is just a set of effective nodes connected by the shortcuts, which are links added between randomly selected pairs of nodes in the underlying lattice [57, 59]. As the setting of SW networks, the shortcuts are sparse, so near or below the percolation threshold the reduced network demonstrates a tree-like structure composed of effective nodes and shortcuts, see Fig.7. Consequently, the formulation of the self-consistent equation as Eq.(23) can be used to solve the model, too. Based on this treatment, they found that the shortcuts not only modify the percolation threshold but also the universality class.

Since the clusterings are naturally excluded in the non-backtracking matrix, with some techniques the message passing method can also be used to find the percolation threshold of clustered networks, see the references listed in Sec.2.4 for details.

2.6.3 Correlated networks

Along with clusterings, real networks generally show a structure with degree correlation rather than random connections. For such networks, the branching process cannot be simply described by the excess-degree distribution. Instead, a joint degree-degree distribution is used to figure out the degree correlation of the two nodes at two ends of a link [172, 173].

Theoretical results show that a finite amount of random mixing of the connections in SF networks with 2<λ<32<\lambda<3 is sufficient to give a divergent k2\langle k^{2}\rangle, and thus leads to a vanished percolation threshold [172]. Moreover, the assortative correlation makes networks more robust, while the disassortative correlation makes networks fragile even with a divergent second moment of degree distribution. However, in the spatially constrained ER networks, degree correlations favor or do not favor percolation depending on the connectivity rules [174].

The Monte Carlo simulations on the exponential random network show that the disassortative correlation has no effect on the critical phenomena so that the percolation transition on disassortative networks belongs to the same universality class as on uncorrelated networks [175]. While assortative correlation is relevant, percolation transition shows continuously varying critical exponents [175]. Recently, Mizutaka et al. proposed a maximally disassortative network model, which realizes a maximally negative degree-degree correlation [176]. Both the analytical and simulation results suggest this maximally disassortative network can also give new critical exponents for SF networks with 2<γ<32<\gamma<3.

Goltsev et al. further summarized three conditions [173]: (i) The largest eigenvalue λ1\lambda_{1} of the branching matrix is finite if k2\langle k^{2}\rangle is finite, or λ1\lambda_{1}\to\infty if k2\langle k^{2}\rangle\to\infty; (ii) The second largest eigenvalues of the branching matrix is finite; (iii) The sequence of entries of the eigenvector associated with the largest eigenvalue converges to a nonzero value. When these conditions are fulfilled, the critical exponents are completely determined by the asymptotic behavior of the degree distribution at large degrees, thus the percolation transition on a correlated network belongs to the same universality class as the percolation on an uncorrelated network. If at least one of the three conditions is not fulfilled, the critical exponent becomes model-dependent and hence non-universal [177, 178, 179]. Furthermore, by the technique that dividing nodes into different types with hyper-links, the site and bond percolations on clustered and correlated networks can also be solved [180, 181, 108].

2.7 On directed networks

The percolation transition can be also defined on directed networks. The difference is that the giant cluster of directed networks can be further specified as [48]: (i) the giant strongly connected cluster (GSCC), in which each node is reachable from others; (ii) the giant in-cluster (GIC), from which GSCC are reachable but those are not reachable from GSCC; (iii) the giant out-cluster (GOC), from which GSCC are not reachable but those are reachable from GSCC; (iv) the giant weakly connected cluster (GWCC), in which each pair of nodes are reachable without regard to the direction of links. A schematic of these giant clusters is shown in Fig.8.

Refer to caption
Figure 8: (Color online) Schematic of the giant cluster of directed networks. GSCC is a set of nodes, in which each node is reachable from all others. The set of nodes, from which GSCC are reachable but those are not reachable from GSCC, form the GIC. In turn, GOC is the set of nodes that GSCC are not reachable but those are reachable from GSCC. Besides, GIN can also lead to some nodes which do not belong to GSCC, if these nodes can lead to GOC then called tube, otherwise called tendril. Moreover, the nodes lead to GOC but not belong to GSCC and GIC are also called tendril. These giant clusters and the attached smaller clusters, including tubes and tendrils but not just, form the GWCC, which is just the giant cluster by ignoring the direction of links. In addition to these giant clusters, there are some small clusters that are disconnected to these giant clusters, though not depicted here.

The mean-field method reviewed in Sec.2.2 can also be extended to directed networks [182, 183, 184, 185, 186, 187, 188, 189]. For this, we define RIR^{I} and ROR^{O} as the probabilities that a directed link leads to GSCC and comes from GSCC, respectively. Thus, similar to Eq.(13), we have two self-consistent equations,

RI\displaystyle R^{I} =1ijpijik(1RI)j\displaystyle=1-\sum_{ij}\frac{p_{ij}i}{\langle k\rangle}\left(1-R^{I}\right)^{j}
=1G1I(1,1RI),\displaystyle=1-G_{1}^{I}(1,1-R^{I}), (75)
RO\displaystyle R^{O} =1ijpijjk(1RO)i\displaystyle=1-\sum_{ij}\frac{p_{ij}j}{\langle k\rangle}\left(1-R^{O}\right)^{i}
=1G1O(1RO,1),\displaystyle=1-G_{1}^{O}(1-R^{O},1), (76)

where pijp_{ij} is the probability that a node has in-degree ii and out-degree jj, and

G1I(x,y)\displaystyle G_{1}^{I}(x,y) =ijpijikxi1yj,\displaystyle=\sum_{ij}\frac{p_{ij}i}{\langle k\rangle}x^{i-1}y^{j}, (77)
G1O(x,y)\displaystyle G_{1}^{O}(x,y) =ijpijjkxiyj1\displaystyle=\sum_{ij}\frac{p_{ij}j}{\langle k\rangle}x^{i}y^{j-1} (78)

are the generating functions of the excess-degree distribution following and against the link direction, respectively. Note that a directed network has an identical average in-degree and average out-degree k=ijpiji=ijpijj\langle k\rangle=\sum_{ij}p_{ij}i=\sum_{ij}p_{ij}j.

When Eqs.(75) and (76) only have the trivial solution RI=RO=0R^{I}=R^{O}=0, there is thus no GSCC in the system. Otherwise, the giant cluster can be expressed by RIR^{I} and ROR^{O} according to their definitions,

SGIC\displaystyle S_{GIC} =1G0(1,1RI),\displaystyle=1-G_{0}(1,1-R^{I}), (79)
SGOC\displaystyle S_{GOC} =1G0(1RO,1),\displaystyle=1-G_{0}(1-R^{O},1), (80)
SGSCC\displaystyle S_{GSCC} =1G0(1RO,1)G0(1,1RI)+G0(1RO,1RI),\displaystyle=1-G_{0}(1-R^{O},1)-G_{0}(1,1-R^{I})+G_{0}(1-R^{O},1-R^{I}), (81)

where

G0(x,y)\displaystyle G_{0}(x,y) =ijpijxiyj\displaystyle=\sum_{ij}p_{ij}x^{i}y^{j} (82)

is the generating function of the degree distribution. Similar as that for undirected networks, let RI(RO)0R^{I}(R^{O})\rightarrow 0 in Eqs.(75) and (76), we have the criterion of percolation transition,

G1I(1,y)y|y=1\displaystyle\frac{\partial G_{1}^{I}(1,y)}{\partial y}\bigg{|}_{y=1} =1,\displaystyle=1, (83)
G1O(x,1)x|x=1\displaystyle\frac{\partial G_{1}^{O}(x,1)}{\partial x}\bigg{|}_{x=1} =1.\displaystyle=1. (84)

The two equations are equivalent, that is

ijpij(2ijij)=0.\sum_{ij}p_{ij}(2ij-i-j)=0. (85)

This is the condition that the GSCC exists in a directed network. Considering the dilution of the initial node or link removal as that for undirected networks (see Sec.2.2), one can also find the percolation threshold from Eq.(85),

pc=kij.p_{c}=\frac{\langle k\rangle}{\langle ij\rangle}. (86)

where ij=ijpijij\langle ij\rangle=\sum_{ij}p_{ij}ij. If there is no correlation between in-degrees and out-degrees, i.e., ij=k2\langle ij\rangle=\langle k\rangle^{2}, Eq.(86) reduces to pc=1/kp_{c}=1/\langle k\rangle. This indicates that in contrast to undirected networks, a non-zero percolation threshold can exist in directed SF networks with λI>2\lambda^{I}>2 and λO>2\lambda^{O}>2 [183]. Here, λI\lambda^{I} and λO\lambda^{O} are the in-degree and out-degree distribution exponents, respectively.

From the asymptotic expansion of Eqs.(75)-(81), the critical exponents can also be obtained [183]. In general, the GIC and GOC can give different critical exponents β\beta, which are dependent on the in-degree and out-degree distributions. As the definition, GSCC is the intersection of GIC and GOC. Therefore, it behaves as the smaller one of GIC and GOC. For directed SF networks, the critical exponents can be also written in the form of undirected SF networks, see Tab.4, but with an effective λ\lambda^{*}, which is dependent on the existence of correlations and on the degree distribution exponents λI\lambda^{I} and λO\lambda^{O} [183].

The above discussion can also be generalized to the cases with the bidirectional links and the degree correlations [184, 190]. The result shows that the percolation threshold can be generally expressed as a function of the maximum eigenvalue of the connectivity matrices. In particular, for networks with no degree correlations, bidirectional links act as a catalyst for percolation, favoring the emergence of the GSCC, and for SF networks, only an infinitesimal fraction of bidirectional links is needed. The interface links, i.e., the ones joining GIC/GOC and GSCC, can also be investigated analytically in this framework [185]. In addition, the GWCC of a directed network can be obtained by mapping it into an undirected network. It should be noted that the GWCC could emerge at a smaller pcp_{c} than that of GSCC, which is dependent on the degree correlations of the in-degrees and the out-degrees. This indicates that the directed network may have the GWCC and, simultaneously, may not have the GSCC.

2.8 Algorithm for network percolation

For most network systems, the percolation model cannot be solved exactly. In consequence, the simulation algorithm becomes very important to estimate the critical point and to study the critical phenomena. However, due to the diversity of network connections, the algorithms for classical percolation on regular lattices cannot be transplanted into network percolation directly. One needs to store the connection information of the network, while this is not necessary for that on regular lattices [191]. Hence, the simulation of the network percolation requires more memory than of the classical percolation on physical dimensions.

In general, we can first construct the percolation configuration following the percolation rule, then use the classical algorithm of graph searching to identify the connections of the configuration, including the breadth-first searching and the depth-first searching. For a given configuration with MM links, these searching algorithms take time O(M)O(M) to find all the clusters. This is the minimal overhead for scanning an unknown structure. If the data structure of the network does not allow to directly scan the neighbors of nodes, such as the adjacency matrix, the time complexity will be higher. This two-step algorithm, i.e., constructing the configuration first and then identifying the clusters, does not have a specific requirement for percolation rules and network structures, and thus is universally applicable.

To be more effective, the percolation algorithm should be a creative blend of configuration constructing and cluster identifying, not a process with two standalone steps. In other words, while constructing the percolation configuration, we must simultaneously update the cluster information. The detailed techniques are dependent on the percolation model and the network model. Here, we only introduce the algorithm proposed by Newman and Ziff for the classical percolation on networks [28, 192], which is applicable to any given networks. Other algorithms will be reviewed when necessary.

The Newman-Ziff algorithm employs an exquisite data structure [28, 192], with which the nodes/links can be occupied one by one, and the cluster size information can be updated concurrently, which improves greatly the efficiency for checking the percolation properties of a system under a sequence of occupied probability pp. In this algorithm, each cluster is stored as a separate tree with a single root node. Each node is allocated a pointer either to the root of the cluster or to another node in the cluster, such that by following a pointer chain we can travel from any node to the root of the cluster. The root nodes can be identified by the fact that they have a null pointer. To be more efficient, we can also use the pointer of the root to store the size of the cluster.

Taking bond percolation as an example, this percolation algorithm can be summarized as follows [28, 192]:

  1. 1.

    Initially each node is its own root, and contains a record of its own size, which is 11.

  2. 2.

    Links of the network are occupied in random order. When a link is occupied, two nodes are joined together. Follow the pointer chains from the two nodes separately until we reach the root nodes of the clusters to which they belong.

    1. (a)

      If the two roots are the same node, we do nothing further.

    2. (b)

      If the two roots are different, we examine the cluster sizes stored in them, and add a pointer from the root of the smaller cluster to the root of the larger, thereby making the smaller tree a subtree of the larger one. If the two have the same size, we may choose whichever tree we like to be the subtree of the other. We also update the size of the larger cluster by adding the size of the smaller one to it.

Step 22 is repeated until the expected number of occupied links is reached. The cluster size can be obtained from the pointer of the root, thus it allows us to evaluate the observable quantities of interest. In addition, to improve the efficiency of the algorithm, we can compress the pointer chain as much as possible. For site percolation, the algorithm is similar, see Refs.[28, 192] for details.

3 Network-specific percolation models

In the previous section, we reviewed the studies of the classical percolation model on networked systems. As a framework to evaluate the system performance, the emergence of the giant cluster with respect to some control parameters is widely used in network science [3, 5, 6, 1, 86, 8, 2]. However, the cluster forming rule is diversiform rather than a probabilistic occupation as that in the classical percolation. They usually contain some recursive/iterative processes or with a broad sense of occupation in the cluster forming. Thus, these derivative models are often categorized into a family called dependent percolation. Typical examples include kk-core percolation, clique percolation, core percolation, explosive percolation, percolation on interdependent/multiplex networks, etc. In this section we will give a brief introduction of the theoretical findings of these models.

In Tab.5, we list the key rules and the types of the percolation transitions for different models.

Table 5: The key rules and the types of the percolation transitions for different models. In general, there are three kinds of models. The first one is a node removal process triggered by removing a fraction 1p1-p of nodes from a network (occupying a fraction pp of nodes). This removal process can differ from model to model, which could contain multiple steps from one single operation to infinite iteration. It is also worth pointing out that due to a similar mechanism, although the bootstrap percolation is a recovery process for removed nodes, we also classify it into this class. The second one is a link insertion process in a set of nodes, i.e., at each time step a pair of nodes will be connected with probability pp. The rule for choosing a node pair is the key of this type of models. For the last one, there could be a special definition of the connection of nodes, and the connection unit is not limited to node. Note that the probability pp is used as the order parameter for the first two types, however, the meaning is different. In the table these three types of models are separated by a horizontal line.
Model Rule Type
kk-core percolation [193] Removing all the nodes with degrees less than kk, iteratively. continuous for k=2k=2, hybrid for k3k\geq 3
Threshold model [194, 195] Removing all the nodes with ki/ki,0<αk_{i}/k_{i,0}<\alpha iteratively, where kik_{i} and ki,0k_{i,0} are the current and the initial degrees of node ii, respectively continuous for small α\alpha, discontinuous for large α\alpha [195]
Bootstrap percolation [196] Removed nodes will be iteratively recovered when they have at least kk neighbors. discontinuous, hybrid [196]
Core percolation [118] Removing nodes of degree 11 along with its neighbor, iteratively. continuous for undirected networks, discontinuous for directed networks [118]
Greedy articulation points removal [119] Removing all the articulation nodes iteratively, where the articulation node is the one whose removal disconnects the network. hybrid [118]
Percolation on interdependent networks [120, 121] Removing all the failed nodes iteratively, where the failed node is the one does not belong to the giant cluster, or has a failed dependent partner in other layers. hybrid [120, 121]
Network observability [197] The initial observable (occupied) node makes both the node and all of its neighbors observable (occupied). continuous [197]
ll-hop percolation [198] The nodes no farther than ll away from the initial removed nodes will also be removed. continuous [198]
History-dependent percolation [199] Removing all the links between different clusters of the last generation. continuous for finite generations, hybrid for infinite generation [199, 200]
Explosive percolation [201, 202, 203, 204, 205, 21] At each time step, more than one potential links are arbitrarily chosen. The one that suppresses the emergence of a giant cluster is inserted eventually, and other potential links are discarded. continuous with unusual finite size behavior [206]
Growing network [125] At each time step, a new node is inserted into the system, then two nodes are chosen randomly from all the existing nodes and joined by a link with probability pp. infinite order [125]
kk-component The nodes in the same cluster must connect to each other by at least kk independent paths. continuous for k=2k=2 [117]
Limit path percolation [207] After link/node removal, two nodes are considered as connected, if the new shortest path between them is shorter than al(a1)al(a\geq 1), where ll is the shortest path before removal. continuous [207]
Clique percolation [170, 124] This model considers the percolation of connected cliques in a network, and two cliques are regarded as adjacent, if they share some nodes. continuous [124, 208]
Color-avoiding percolation [209, 210] Two nodes are considered as color-avoiding connected, only if they are always connected for the removal of any colored links/nodes. continuous [209, 210]

3.1 k-core percolation

kk-core percolation is one of the typical representatives of the dependent percolation on networks[211, 212, 213, 193]. For a given network kk-core is the subnetwork in which nodes have at least kk links connecting with other nodes in this subnetwork. So, as a percolation model, it mainly studies the emergence of the giant kk-core when a fraction 1p1-p nodes are removed, i.e., occupying each node with probability pp [193]. The case k=1k=1 recovers the classical percolation. While k=2k=2 the formed cluster has a structure similar to the so-called bicomponent/biconnectivity, which is a meaningful concept of network robustness [117]. For k3k\geq 3, kk-core percolation transition becomes discontinuous. In this subsection we will review the theoretical researches of kk-core percolation, as well as some related percolation models. kk-core percolation and its variants are often used as algorithms to provide the structure information of networks [214], some of which will be included in Sec.4.

3.1.1 Model and phase transition characteristics

We can iteratively remove nodes with degrees smaller than kk to obtain the kk-core of a network. Note that even without the initial node removal, kk-core may be absent in networks, dependent on the network topology. An extreme case is the tree-like network, for which there cannot exist any finite kk-cores with k2k\geq 2. This is not difficult to be understood from the pruning process of kk-core percolation. In any finite and tree-like networks, there must be some nodes with degree k=1k=1, located at the periphery of the network. The removal of these nodes must introduce some new periphery nodes with degree k=1k=1. In this way the pruning process will destroy the whole network. However, this does nothing to the theoretical analysis of kk-core percolation transition on tree-like networks, since an infinite spanning tree does not have periphery nodes, and all the degrees are larger than or equal to 22. This also means that there are no finite kk-core clusters (k2k\geq 2) in tree-like networks, thus the kk-core, if exists, is just the giant cluster, and can be used to feature the percolation transition.

As the model setting, if a node is in the kk-core, it must have at least kk links connecting to other nodes in the kk-core. By this, similar to Eqs.(17) and (19), we can write the self-consistent equations for kk-core percolation on tree-like networks as [193],

Sk\displaystyle S_{k} =pn=kd=npd(dn)Rkn(1Rk)dn\displaystyle=p\sum_{n=k}^{\infty}\sum_{d=n}^{\infty}p_{d}{d\choose n}R_{k}^{n}(1-R_{k})^{d-n} (87)
=p[1n=0k1Rknn!dnG0(x)dxn|x=1Rk],\displaystyle=p\left[1-\sum_{n=0}^{k-1}\frac{R_{k}^{n}}{n!}\frac{d^{n}G_{0}(x)}{dx^{n}}\bigg{|}_{x=1-R_{k}}\right], (88)
Rk\displaystyle R_{k} =pn=k1d=n+1pddk(d1n)Rkn(1Rk)d1n\displaystyle=p\sum_{n=k-1}^{\infty}\sum_{d=n+1}^{\infty}\frac{p_{d}d}{\langle k\rangle}{d-1\choose n}R_{k}^{n}(1-R_{k})^{d-1-n} (89)
=p[1n=0k2Rknn!dnG1(x)dxn|x=1Rk],\displaystyle=p\left[1-\sum_{n=0}^{k-2}\frac{R_{k}^{n}}{n!}\frac{d^{n}G_{1}(x)}{dx^{n}}\bigg{|}_{x=1-R_{k}}\right], (90)

where SkS_{k} is the size of the kk-core, and RkR_{k} is the probability that a node reached by following a randomly chosen link belongs to the kk-core. Here, to avoid confusion we use pdp_{d} to represent the degree distribution. The sum n\sum_{n} is over the nodes with nn preserved neighbors, and the sum d\sum_{d} is for the degree distribution. Note that the sum n\sum_{n} in Eq.(89) begins with k1k-1, since the link used to reach the node has already provided a link for the kk-core. For both the two cases k=1k=1 and k=2k=2, Eq.(90) reduces to that of the classical percolation equation (20), so 11-core percolation and 22-core percolation have the same threshold equation (62). For k>3k>3 the threshold of the kk-core percolation becomes distinguishable for kk.

Refer to caption
Figure 9: (Color online) Schematic of the solution of Eq.(90). Only when pp exceeds pcp_{c}, function w(R)w(R) has non-trivial crossing points with RR-axis. The critical point corresponds to the tangency of the function w(R)w(R) and RR-axis, which gives a non-zero RR. The inset shows the non-trivial solution RR for different pp.

To get the critical point, one can also define a function w(Rk)w(R_{k}) as that for classical percolation, see Sec.2.5.1. The critical point thus corresponds to the tangency of w(Rk)w(R_{k}) and RkR_{k}-axis. However, since function w(Rk)w(R_{k}) has more than one extreme point for k3k\geq 3, the tangency point gives a non-zero RkR_{k}, labeled Rk,cR_{k,c}, which indicates that the percolation transition is discontinuous, see the schematic shown in Fig.9.

Although there is no closed form for the solution of Eqs.(88) and (90), the asymptotic expansion shows that the scaling behaviors can also be observed near the critical point, that is

SkSk,c(ppc)β.S_{k}-S_{k,c}\propto(p-p_{c})^{\beta}. (91)

In consequence, the emergence of a kk-core is a unique hybrid phase transition with a jump emergence of the kk-core as a first-order phase transition but also with a critical singularity as the second-order phase transition. This result is first found in Bethe lattice with β=1,2,1/2\beta=1,2,1/2 respectively for k=1,2k=1,2 and 3\geq 3 [215], where the model is called bootstrap percolation. Note that the bootstrap percolation now often refers to another percolation model related to kk-core percolation, and we will give the details later. It is worth noting that this result indicates that 22-core percolation belongs to a different universality class from that of 11-core percolation (the classical percolation), although they have the same percolation threshold [216]. This finding is also confirmed by the Monte Carlo simulation on ER networks [217].

For SF networks that bear a strong heterogeneous degree distribution, a ratio βk=2/βk=1=λ1\beta_{k=2}/\beta_{k=1}=\lambda-1 for 2<λ<32<\lambda<3 and 22 for λ>3\lambda>3 is found [218], where λ\lambda is the exponent of the degree distribution of SF networks. This suggests that the kk-core percolation could reconstitute the normal mean-field nature like that on ER networks when λ>3\lambda>3, while for the classical percolation the criterion is λ>4\lambda>4 [131, 98]. Numerical results also reveal that the fluctuations of the order parameter and mean avalanche size diverge in different ways for k3k\geq 3 [219]. The main focus of this review is network systems, however, we need to point out that the mean-field hybrid nature of the kk-core percolation can only survive in high dimensions [220, 221, 222, 223].

More generally, Dorogovtsev et al. showed that if the second moment of the degree distribution of a network is finite, kk-core percolation has a hybrid nature, and there is no principal difference between different tree-like networks [193, 224, 225]. A dramatic difference takes place if the second moment of the degree distribution diverges, i.e., the SF networks with exponent 2<λ<32<\lambda<3, for which an infinite order transition is observed. Besides, kk-core percolation was also analyzed for clustered networks [226, 227] and for biased initial node removal [228]. Moreover, kk-core percolation is also studied as models of the jamming transition [229, 230], granular gas [231], evolution [232], and nervous system [233]. Furthermore, the inducing mechanism has been found that can bridge the classical percolation and kk-core percolation for a general kk, which always causes a discontinuous percolation transition [234].

In network science, the pruning process of kk-core percolation has also attracted a lot of attention [235, 236, 237, 238, 239], which is used to analyze network structures [211, 240, 241, 242, 243, 244, 245]. It often refers to kk-core/kk-shell decomposition [214], and the nodes belong to kk-core, but not (k+1)(k+1)-core, are named kk-shell.

3.1.2 Variants and related models

Focusing on the pruning process, one can adjust the criterion to observe the emergence of other core structures of networks. Meanwhile, the corresponding percolation transitions can be also defined. Although some of them are not initially motivated by kk-core percolation, here we review these studies roughly attributed to variants and related models of the kk-core percolation.

Heterogeneous kk-core percolation

In the pruning process of kk-core percolation, if different thresholds are allowed for different nodes, the model becomes a sophisticated one known as heterogeneous kk-core percolation [246, 247, 248, 249, 250]. Specifically, after the initial node removal, some of nodes need kak_{a} subtrees to survive from the pruning process, some of nodes need kbk_{b} subtrees, and so forth.

A representative example of the heterogeneous kk-core is the binary mixture 𝐤=(ka,kb)\mathbf{k}=(k_{a},k_{b}) that nodes have a threshold of either kak_{a} or kbk_{b} distributed randomly through the network with probabilities ff and 1f1−f, respectively. With the framework figured by Eqs.(88) and (90), one can easily express the size of the heterogeneous kk-core as a linear combination of those for the two thresholds [246, 247, 248, 249, 250], each of which takes the form as the right hand side of Eq.(88) or Eq.(90).

Refer to caption
Figure 10: (Color online) Schematic of two different phase diagrams of the heterogeneous kk-core percolation. (a) The two lines indicate the continuous transition (blue solid line) and the hybrid transition (green dashed line) matching at a tricritical point (red scatter). (b) The line for the hybrid transition (green dashed line) ends at a point (orange scatter) that is out of the line for the continuous transition (blue solid line). Note that these two figures are conceptual, and do not represent the exact values.

The transition nature of the heterogeneous kk-core percolation obviously depends on the probabilities assigned to different thresholds, as well as the values of thresholds. Although the cases k=1k=1 and k=2k=2 behave similarly in kk-core percolation, the percolation transitions of 𝐤=(1,3)\mathbf{k}=(1,3) and 𝐤=(2,3)\mathbf{k}=(2,3) are much different. For 𝐤=(2,3)\mathbf{k}=(2,3), there is a crossover of the continuous and the hybrid percolation transitions [247], i.e., in the phase diagram the lines for the two types of phase transitions match at a tricritical point, see Fig.10 (a). In contrast, the case 𝐤=(1,3)\mathbf{k}=(1,3) demonstrates a complex phase diagram: the line for the hybrid transition ends at a point that is out of the line of the continuous transition, see Fig.10 (b). This means that a double percolation transition can be observed in the system for some appropriate probabilities ff [248], namely, first showing the continuous transition, and later the discontinuous hybrid transition. In general, the tricritical point never occurs for the case 𝐤=(1,k)\mathbf{k}=(1,k), whereas in the case 𝐤=(2,k)\mathbf{k}=(2,k) it is presented only for k=3k=3 [249]. It is also worth pointing out that the case 𝐤=(2,3)\mathbf{k}=(2,3) can also map onto a model of glasses on the Bethe lattice [251], which suggests the same universality class.

More generally, the ternary mixture [252] and the generalized case 𝐤=(2,3,4,)\mathbf{k}=(2,3,4,\ldots) [250] were also investigated to show new types of critical phenomena. Through the analytic calculations and the numerical simulations on ER networks, Chae et al. concluded that the heterogeneous kk-core percolation can be featured by the series of continuous transitions with order parameter exponents β=2/n,n=1,2,3,\beta=2/n,~{}n=1,2,3,\ldots, discontinuous hybrid transitions with β=1/2\beta=1/2 or 1/41/4, and three kinds of multiple transitions [250].

Cascading failure model

A special case of the heterogeneous kk-core percolation is the one with threshold αk\alpha k for nodes with degree kk, where α[0,1]\alpha\in[0,1] [194, 253, 254, 255, 256, 257, 258, 195, 259, 260]. In other words, if a node ii can be preserved from the pruning process, ki/ki,0αk_{i}/k_{i,0}\geq\alpha must be satisfied for any stages, where kik_{i} and ki,0k_{i,0} are the current and the initial degree of node ii, respectively. This node removal process is usually used to model the spreading of failures on networks, hence its name cascading failure model, sometimes also referred to as threshold model.

The simulation results suggest the existence of the crossover of the continuous percolation transition and the discontinuous percolation transition [195]. However, more detailed checking of the critical phenomena as that of heterogeneous kk-core percolation is still lacking. More often, the emergence of the giant cluster in this model is studied as a criterion for network robustness rather than a characterization for phase transition. We will review these works in the later sections.

In addition, there are two caveats to this model. First, many works on this cascading failure model focus on the clusters formed by failed nodes, not those of preserved nodes [194, 253, 256, 257, 258, 261, 259]. Second, there is another model in network science called cascading failure model which focuses on the overload failures, see Ref.[262] and subsequent references for details.

Bootstrap percolation

Along with the kk-core percolation, there is an important and well-known model, called bootstrap percolation. In the early literature the bootstrap percolation just refers to the model of kk-core percolation introduced above, such as Refs.[215, 263, 246]. Now, the bootstrap percolation is generally referring to an activation process, which starts with a fraction ff of active seeds (occupied nodes) and other nodes (unoccupied) will be activated (occupied) when they have at least kk activated neighbors. A theoretical analysis for this model like Eqs.(88) and (90) can be found in Refs.[196, 264, 265]. This model and its variants are often used to describe the spreading dynamics in network science. A typical example is the cascading failure model reviewed previously.

Although the bootstrap percolation is an activation process beginning from a sparsely activated network, while the kk-core percolation is a pruning process beginning from the whole network, a corresponding relation between them can also be established [248]. For comparison, we construct a heterogeneous kk-core percolation with 𝐤=(1,k)\mathbf{k}=(1,k). Threshold 11 is randomly assigned to a fraction ff of nodes corresponding to the seeds in bootstrap percolation, and other nodes have threshold kk. By this mapping, the heterogeneous kk-core percolation 𝐤=(1,k)\mathbf{k}=(1,k) can demonstrate very similar critical phenomena as the bootstrap percolation, i.e., a double percolation transition [196, 248]. However, this model cannot completely recover the bootstrap percolation [248]. This is because all the clusters in the bootstrap percolation must grow from the seeds, while for heterogeneous kk-core percolation nodes can also survive from the pruning process by forming a compact cluster. Moreover, Di Muro et al. also pointed out that the heterogeneous bootstrap percolation is the complement of the heterogeneous k-core percolation for complex networks with any degree distribution in the thermodynamic limit, as long as the thresholds of the nodes in both processes complement each other [264].

Due to the correlation of bootstrap percolation and kk-core percolation, many works for kk-core percolation also contributed to bootstrap percolation. For systems in physical dimensions and the Bethe lattice, one can also refer to Refs.[266, 267, 268, 269, 270, 271, 272]. In network science, bootstrap-like processes are often used to model the cascading dynamics [3, 6, 23, 273, 1, 86], such as the spreading of information, opinion, and disease. Instead of the critical phenomena, these researches are focusing on the dynamical characteristics and their relations with the problems in real life. For this reason, there is no need to lump them into a clear category (bootstrap percolation or heterogeneous kk-core percolation). Some of these studies will be reviewed in the later sections.

Core percolation

The core of a network is defined as the subnetwork after a greedy leaf removal (GLR) procedure. Here, the leaf of a network refers to the node of degree 11. The GLR procedure is just iteratively removing nodes of degree 11 along with its neighbor. Note that the nodes with degree 11 only have one neighbor. Apparently, the GLR procedure is more destructive than the pruning process of 22-core percolation, and finite clusters cannot exist in tree-like networks. As the kk-core percolation, the core percolation model is also used to identify the core structure of networks, which could dominate the dynamical properties of complex networks [274].

Refer to caption
Figure 11: (Color online) Removal categories of nodes in the core percolation. Red nodes are non-removable, i.e., they belong to the core indicated by shaded background. Green nodes are removable: nodes 11 and 22 are α\alpha removable; nodes 33 and 55 are β\beta removable; node 44 is γ\gamma removable.

Considering the branching process, the infinite cluster (core) of a tree-like network can also be obtained exactly [118]. For that, nodes are divided into four categories, see Fig.11: (i) α\alpha removable: nodes that can become isolated without directly removing themselves, i.e., all neighbors are β\beta removable; (ii) β\beta removable: nodes that can become a neighbor of a leaf, i.e., at least one neighbor is α\alpha removable; (iii) γ\gamma removable: nodes that can become leaves but are neither α\alpha nor β\beta removable; (iv) non-removable: nodes that cannot be removed and hence belong to the core. The rule for γ\gamma removable nodes can be ignored because it is not useful to determine the size of the core. Denoting α\alpha (or β\beta) as the probability that a random neighbor of a random node in a network is α\alpha removable (or β\beta removable), two self-consistent equations can be derived,

α\displaystyle\alpha =G1(β),\displaystyle=G_{1}(\beta), (92)
β\displaystyle\beta =1G1(1α),\displaystyle=1-G_{1}(1-\alpha), (93)

where G1(x)=kpkkxk1/kG_{1}(x)=\sum_{k}p_{k}kx^{k-1}/\langle k\rangle is the generating function of the excess-degree distribution. From the two equations, it can be found that α\alpha satisfies α=G1[1G1(α)]\alpha=G_{1}[1-G_{1}(\alpha)]. With this, we can find the solutions of α\alpha and β\beta. For tree-like networks, if the core exists, it must have infinite size (a giant cluster). Hence, the size of the giant cluster, i.e., the normalized core size, can be expressed as

Score\displaystyle S_{core} =k=0pks=2k(ks)βks(1βα)s\displaystyle=\sum^{\infty}_{k=0}p_{k}\sum^{k}_{s=2}\binom{k}{s}\beta^{k-s}(1-\beta-\alpha)^{s}
=G0(1α)G0(β)c(1βα)α,\displaystyle=G_{0}(1-\alpha)-G_{0}(\beta)-c(1-\beta-\alpha)\alpha, (94)

where G0(x)=kpkxkG_{0}(x)=\sum_{k}p_{k}x^{k} is the generating function of degree distribution. This approach can also be generalized for directed networks with given in- and out-degree distributions [118].

The theoretical results show that, for undirected networks, if the core percolation occurs, then it is always continuous, while for directed networks, it becomes discontinuous with a hyperscaling β=1/2\beta=1/2 when the in- and out-degree distributions are different [118]. For ER networks the percolation threshold can be solved theoretically kc=2.7182818\langle k\rangle_{c}=2.7182818\ldots [275], and the core of purely SF networks never percolates for any degree exponents larger than 22. While for the SF networks generated by the static model with pkkγp_{k}\propto k^{-\gamma} only for large kk [276], the core develops when the average degree is larger than a threshold value, which is actually similar to ER networks.

The leaf removal procedure of the core percolation can also be generalized as that nodes of degree smaller than kk, together with their nearest neighbors and all incident links are progressively removed from a random network [277]. With this pruning process, the percolation transition can also be well defined, which can be seen as either a generalized core percolation or a generalized kk-core percolation. Similar to the ordinary kk-core percolation, this model also shows a discontinuous phase transition for k3k\geq 3.

Greedy articulation points removal

The articulation points of a network are the nodes whose removal disconnects the network, such as nodes 33, 44, 55 and 66 in Fig.11. Those nodes play an important role in keeping the connectivity of real-world networks, such as infrastructure networks, protein interaction networks and terrorist communication networks. Tian et al. proposed a greedy articulation point removal (GAPR) process to study the organizational principles of complex networks [119].

In each iterative step of GAPR, all the articulation points are removed from the network and their sub-clusters disconnected from the giant cluster are also removed. After that new articulation points emerge. This removal process is repeated until there is no articulation point left in the network. Note that at each step all the articulation points are removed simultaneously and the size of the giant cluster in the final network is studied as a key quantity.

Refer to caption
Figure 12: (Color online) A comparative sketch of the 22-core, the biconnected cluster (BCC), the cluster obtained by the greedy leaf removal (GLR), and the one obtained by the greedy articulation points removal (GAPR). For a given network (shown left), the four figures on the right show the corresponding subnetworks (indicated by red) under the four pruning rules, respectively. (a) 22-core. The cluster is obtained by iteratively removing nodes with degree 11. (b) BCC. The nodes in a BCC must connect each other with more than one independent paths, which can be found by removing all the nodes and links that do not belong to any loop structure. Here, two separated BBCs are obtained. (c) GLR. The cluster is obtained by iteratively removing nodes of degree 11 along with their direct neighbors. (d) GAPR. The cluster is obtained by iteratively removing the articulation points, which are the nodes that can bridge two clusters. After these pruning/connection processes, there generally exist more than one clusters. The corresponding percolation theory considers the emergence of such a giant cluster.

It is obvious that if the network is sparse, the giant cluster will finally be destroyed with the proceeding of GAPR. For a dense network, a giant cluster, in which all the nodes connect to each other with at least two paths, can be obtained. Note that this cluster is not equivalent to the 22-core, or the biconnected cluster (BCC) [278, 279, 117, 218, 216], or the one obtained by GLR. A comparative sketch of the 22-core, the BBC, the cluster preserved after the GLR, and the one after the GAPR is shown in Fig.12.

Next, we briefly cover the relations between these clusters. For convenience, let S2coreS_{2-core}, SBBCS_{BBC}, SGLRS_{GLR}, and SGAPRS_{GAPR} be the subnetworks after applying the pruning rules of 22-core, BBC, GLR, and GAPR, respectively. For 22-core, one only needs to recursively remove the nodes with degree 11, thus some tree-like structures might exist in 22-core. The nodes in a BCC must connect each other with more than one independent paths. This further requires removing the nodes and links that do not belong to any loops from the 22-core, therefore, SBBCS2coreS_{BBC}\subseteq S_{2-core}. For GLR, when we remove the nodes with degree 11, their root nodes will also be removed, thus we also have SGLRS2coreS_{GLR}\subseteq S_{2-core}.

However, the relation of SBBCS_{BBC} and SGLRS_{GLR} is network dependent. The case shown in Fig.12 gives an example that SBBCSGLRS_{BBC}\supseteq S_{GLR}. Considering a network configuration that two triangles are connected by a degree-22 node, we can also have SBBCSGLRS_{BBC}\subseteq S_{GLR}. But both SBBCS_{BBC} and SGLRS_{GLR} are larger than SGAPRS_{GAPR}. This is because all the nodes and links that do not belong to SBBCS_{BBC} and SGLRS_{GLR} are certainly not in SGAPRS_{GAPR}, whereas the articulation points excluded from SGAPRS_{GAPR} might contribute to SBBCS_{BBC} or SGLRS_{GLR}, see Fig.12. In summary, for a given network, we have the relation SGAPRSBBC(SGLR)S2coreS_{GAPR}\subseteq S_{BBC}(S_{GLR})\subseteq S_{2-core}.

An additional consequence of the above relation is that the emergence of the giant cluster after GAPR needs a denser initial configuration than that after GLR. For ER networks, it has been found that the critical point of the percolation induced by GAPR is kc=3.39807\langle k\rangle_{c}=3.39807\ldots [119], which is obviously larger than the one induced by GLR (core percolation) [118], see Tab.3 for the percolation thresholds of some models on ER networks. This percolation is also proved to be a hybrid percolation transition with critical exponent β=1/2\beta=1/2. Note that the core percolation is always continuous for undirected networks. Besides, if one only does a finite iteration of the GAPR, the percolation transition can also be observed, and will have the same nature of the ordinary percolation. This finding has the same manner of the so-called history-dependent percolation on multiplex networks [199].

There is also a variation of GAPR: iteratively remove the most destructive articulation point that will cause the most nodes disconnected from the giant cluster of the current network. This is called articulation point targeted attack strategy [119]. Given a limited “budget” (that is, the number of nodes to be removed), this strategy is very efficient in reducing the giant cluster, compared with strategies based on other node centrality measures, such as degree [84, 85] and collective influence [280].

The statistical properties of articulation points, i.e., the probability that a random node in a network is an articulation point, has been also derived in Ref.[281]. It has been found that high-degree nodes are more likely to be articulation points than low-degree nodes. Articulation point is related to another concept of “bridge”, which is a link whose removal disconnects the network and increases the number of connected clusters [282]. Both articulation point and bridge can be used to qualify the importance of a bridge structure in damaging a network and provide useful ideas to destroy a network efficiently.

Color-avoiding percolation

In the color-avoiding percolation, the nodes in the percolating cluster also need more than one path that connect with each other. The details are as follows. First, each link is assigned a color chosen from 𝐂={c1,c2,,cn}\mathbf{C}=\{c_{1},c_{2},\ldots,c_{n}\}. If there exists a path between two nodes after the removal of all the links of color c𝐂c\in\mathbf{C}, the two nodes are called cc-avoiding connected. Then, if two nodes are cc-avoiding connected for all the color cc, we say that the two nodes are color-avoiding connected. Thus, the color-avoiding percolation studies the behaviors of the giant cluster, in which all the nodes are color-avoiding connected. The color avoiding can also be generalized as the case where arbitrary sets of colors are avoided. Furthermore, colors can also be assigned on nodes, that is the setting when the model was first proposed [283].

Different from the previous percolation models, the color-avoiding connected clusters cannot be simply acquired from a single pruning process. Instead, one can find cc-avoiding connected clusters for all the color c𝐂c\in\mathbf{C}, then the intersections of these clusters are just the color-avoiding connected clusters, see Fig.13 for a schematic of a three-color case. Due to this intersection process, the connectivity of a color-avoiding cluster could access through some nodes/links outside the color-avoiding cluster.

In Ref.[284], it is suggested that the complexity of finding the color-avoiding cluster highly depends on the exact definition, using a strong version the problem is NP-hard, while using a weaker notion makes it possible to find the components in polynomial time. In spite of this, there are some special structures that surely do not contribute to the color-avoiding connected clusters. The first is the bridge link, whose removal disconnects the network. The second is the nodes that all their links have identical colors. A typical example for these two structures is the nodes of degree 11 and their links, i.e., the leaves of the network.

Besides, the size of 𝐂\mathbf{C}, i.e., the number of colors nn, also has significant restrictions on the structural features of the color-avoiding connected clusters. First, a small nn generally results in a large frequency for each color, thus the emergence of color-avoiding clusters for small nn requires a dense connection. Second, the size of the color-avoiding connected cluster has a lower bound, which increases with the color number nn. All these suggest that the color-avoiding connected cluster must have a relatively dense and complex structure, containing many loops.

Refer to caption
Figure 13: (Color online) A schematic for color-avoiding percolation. Here, three colors, namely, red, yellow, and blue, are uniformly distributed among all the links, see (a). If there exists a path between two nodes after the removal of all the blue/red/yellow links, the two nodes are called blue/red/yellow-avoiding connected. Then, if two nodes meet this connection criterion for all the three colors, we say that the two nodes are color-avoiding connected. A cluster, whose nodes are all color-avoiding connected, is thus the color-avoiding connected cluster. (b)-(d) show the clusters after the removal of blue, red, and yellow links, respectively. The intersections of these clusters correspond to the color-avoiding connected clusters indicated by green shades in (a). Note that the color-avoiding connections of the nodes in a color-avoiding cluster could access through some links outside the color-avoiding cluster.

If there is only one color n=1n=1, the model reduces to a classical percolation with respect to the fraction of colored links/nodes. For n2n\geq 2, the critical behaviors have a rich, multifaceted nature, depending on the network topology, the number of avoided colors and the color frequencies [283, 209, 210, 284]. If all the colors have identical probabilities, the results on ER networks show that the critical exponent β\beta is dependent on the number of colors, i.e., β=n\beta=n. This is identical for both node and link coloring. Varying the color frequencies, fractal exponents can also be observed. However, topological constraints of SF networks are so strong that the corresponding critical behaviors are independent of the number of the colors (only dependent on the exponent λ\lambda), but still dependent on the existence of the colors and therefore different from standard percolation. Due to the vanished percolation threshold, the breaking of the universality class of the site percolation and the bond percolation can also be observed for 2<λ<32<\lambda<3. In Ref.[285] a generic analytic theory that describes how structure and sizes of all connected components in the network are affected by simple and color-dependent bond percolation was also established.

3.2 Percolation on interdependent/multiplex networks

The hybrid percolation transition is not unique to the kk-core percolation [20, 21, 286, 287, 288, 289, 290, 291]. In the last decade a major concern in network science, the interdependent networks or multilayer networks [8, 9], also involves a dependent percolation, for which both abrupt change and critical exponents can be found at the critical point [120, 121, 292]. For convenience, the following discussions are given by means of interdependent networks [120], see Fig.14 for an example of the interdependent networks.

This model considers the iterative percolation process between different network layers. If a node does not belong to the giant cluster of one layer, its dependent nodes in other layers are also no longer eligible to be considered in the percolation of their layers, and vice versa. Obviously, one must check the percolation process of each layer interactively to obtain a steady giant cluster, referred to as the mutual giant cluster. Thus, this model checks the emergence of the mutual giant cluster after an initial node removal.

Refer to caption
Figure 14: (Color online) Schematic of the interdependent/multiplex network. Here, the system has two network layers AA and BB indicated by different colors, both of which can have their own topology, and dependence links indicated by dashed lines are used to represent the interdependence between nodes from network layers AA and BB. More generally, there can be more than two layers, and the interdependence relation needs not to be restricted to one-to-one.

3.2.1 Model and phase transition characteristics

Refer to caption
Figure 15: (Color online) Schematic of the cascading process on interdependent networks. Here, we consider an interdependent network with two layers, represented by the two columns of circles with the corresponding solid lines, respectively. The dashed lines are the dependence links between the two network layers. With an initial node removal (indicated by red in (a)), the percolation process ((b) and (d)) and the dependence process ((c) and (e)) alternately pruning nodes from the system (indicated by gray) until no more nodes can be removed.

To provide a direct and precise description for this percolation model, the dependence link is used to connect dependent nodes in different network layers, see Fig.14. Specifically, a dependence link between node ii and node jj means that if node ii is removed, node jj will also be removed, and vice versa. The node removal process caused by dependence links is often called dependence process. Correspondingly, all the finite percolation clusters in each layer will also be removed from the system, which is called percolation process. Thus, after an initial node removal (fraction 1p1-p), the removal of nodes caused by the percolation process and the dependence process will occur alternately until no more nodes can be removed, see Fig.15. The remained nodes (if any) form the mutual giant cluster, in which nodes are reachable through each layer. Note that if we do not remove the finite clusters from the system, the finite mutual clusters can also be well defined in the steady state [293, 199].

The above pruning process to find the mutual giant cluster was first proposed in Ref.[120], in which the dependence links are assigned randomly between two network layers, labeled AA and BB (see Fig.14). Assuming both the two network layers have tree-like structures, we can express the size of the mutual giant cluster as

S=p[1G0A(1RA)][1G0B(1RB)].S=p\left[1-G_{0}^{A}(1-R^{A})\right]\left[1-G_{0}^{B}(1-R^{B})\right]. (95)

Here, RAR^{A}(RBR^{B}) represents the probability that a randomly chosen link in network layer AA(BB) leads to the mutual giant cluster. By analogy with that of classical percolation equations (15) and (16), we can know that the right hand side of Eq.(95) just describes the final state of the pruning process, i.e., all the preserved nodes must be reachable in both layers. Similarly, RAR^{A} and RBR^{B} must satisfy

RA\displaystyle R^{A} =p[1G1A(1RA)][1G0B(1RB)],\displaystyle=p\left[1-G_{1}^{A}(1-R^{A})\right]\left[1-G_{0}^{B}(1-R^{B})\right], (96)
RB\displaystyle R^{B} =p[1G0A(1RA)][1G1B(1RB)].\displaystyle=p\left[1-G_{0}^{A}(1-R^{A})\right]\left[1-G_{1}^{B}(1-R^{B})\right]. (97)

The above three equations (95)-(97) were first used to analyze the percolation on the networks with dependence links by Son et al. [294, 295]. Considering the iterative process of the pruning process, this model can also be solved. That is the analytical method when this model was proposed [120, 121], which is equivalent to the method listed above [296, 297].

From Eqs.(96) and (97), we can also find the critical point of the system as we do for the classical percolation. The solution of RR shows a similar behavior as kk-core percolation illustrated in Fig.9, indicating a discontinuous transition. The value of pcp_{c} is significantly larger than that of classical percolation. For example, two interdependent ER networks give pc2.4554/kp_{c}\approx 2.4554/\langle k\rangle, while pc=1/kp_{c}=1/\langle k\rangle for single ER networks. This also means that k2.4554\langle k\rangle\approx 2.4554 is the minimum degree to observe a mutual giant cluster in two interdependent ER networks. Even for SF network layers with 2<λ<32<\lambda<3, the mutual giant cluster can emerge at a non-vanished pcp_{c}. Further study have showed that this discontinuous transition is also a hybrid transition with exponent β=1/2\beta=1/2 [121, 292], which is the same as the kk-core percolation with k3k\geq 3.

Parshani et al. also pointed out that the number of iterations in the cascading process of interdependent networks diverges when ppcp\to p_{c} [121], and thus can be used to identify the transition point in numerical simulations. This is mainly because there is a plateau in the collapse of the system, and the plateau regime increases with the system size. Zhou et al. found that during the collapse there is also a random branching process at criticality, i.e., a continuous percolation transition. This simultaneously continuous phase transition is just the origin of the observed long plateau regime in the cascading failures and its dependence on system size [298]. This result coincides with the one found in Ref.[199], in which the percolation transition is defined in each iteration, called generation. Theoretical analysis indicates that for any finite generation, the system demonstrates a continuous percolation transition. Monte Carlo simulations on ER networks further suggest that all these continuous transitions belong to the same universality class. Specifically, SF networks with exponent 2<γ<32<\gamma<3 have a vanished critical point for any finite generations. For infinite generation, it just recovers the percolation transition on interdependent networks, i.e., it has a nature of hyper transition. Hu et al. further pointed that in two-dimensional systems the infinite generation still presents a continuous transition, but with a different universality class [200].

By decreasing the coupled strength qq (the fraction of nodes that have dependence links), this discontinuous percolation transition can also reduce to the continuous transition [121]. An exception is the coupled lattices, for which the networks abruptly collapse for any finite q>0q>0 [299]. But this does not always mean the continuous percolation cannot exist in interdependent spatially embedded networks. If the dependence links establishing the interdependence between two lattices are not random but have a typical length rr, the percolation transition will be continuous for r<rmax8r<r_{max}\approx 8 (lattice units), and discontinuous for larger rr [300]. For r<rmaxr<r_{max}, the percolation threshold increases linearly with rr from 0.5930.593 at r=0r=0 and reaches a maximum, 0.7380.738 for r=rmaxr=r_{max}, and then gradually decreases to 0.6830.683 for r=r=\infty. This is mainly because the spatial embedding induces a similar structure in different network layers, and thus breaks the cascading picture of interdependence [294]. This model has also been generalized to high dimensions [301]. The simulation results show that the value rmaxr_{max} decreases with dimension, and for lattices with dimension larger than or equal to 66, the continuous percolation transition disappears. This suggests that a high degree of freedom (high dimension) decreases the structural similarity of the spatially embedded networks under random node or link dilution, thus reconstructs the cascading picture.

It should be noted that the model, based on which the authors claimed that percolation transitions are not always sharpened by making networks interdependent [294], is a bond percolation model on the interdependent square lattice with typical length r=0r=0, and not the one discussed above. Thus their findings do not contradict that of Ref.[300], which considers the site percolation with different typical lengths. Grassberger also studied the bond percolation model with typical length r=0r=0 in high dimension lattices [293]. The simulation results have showed that there is an upper critical dimension d=4d=4, below which the percolation transition is always continuous, and at least for d3d\neq 3 – not in the universality class of ordinary percolation. This suggests that although the similar structure induced by spatial embedding breaks the cascading picture of interdependence, it indeed changes the nature of the branching process in the continuous percolation transition. The bond percolation has also been studied on multiplex networks, and reported a multiple percolation phase transition [302]. Here, the multiple percolation transition refers to more than one transition in a percolation process, also is reported in other related percolation models [303, 304, 305, 306, 307, 308].

In Ref.[293], an efficient and implementable algorithm has also been proposed, based on which some other critical behaviors of the percolation on interdependent networks have been rechecked, numerically. The result suggests that for the model of Ref.[300] there exists such a rmaxr_{max} above which the transition seems to be discontinuous, but a strong evidence that this is related to very large finite-size corrections was also found. Moreover, this work also provided some evidences that the cluster statistics of independent ER networks can be exactly described by mean-field theory, while the cascade process cannot. For interdependent lattices with dimension d=5d=5 a discontinuous transition was observed as ER networks, but a nontrivial distribution of finite clusters also exists at the transition point.

Besides, some other percolation model can also be generalized to multiplex networks, such as kk-core percolation [309], bootstrap percolation [310], and group percolation [311].

3.2.2 Algorithms for reducing time complexity

For facilitating the studies on non-tree-like networks, some efficient algorithms have been developed to find the mutual giant cluster. Tracing the largest cluster during the whole pruning process, the algorithm proposed by Schneider et al. can achieve time complexity O(NlogN)O(N\log N) [312]. Nevertheless, this processing is too loose to study the critical phenomena, since an initially large cluster could be taken over by a cluster that had started smaller after the pruning process, especially near the critical point. Apart from the critical phenomena or limited to tree-like networks, this algorithm might be a good choice to avoid brute-force searching.

Although the finite mutual clusters are ignored in the original definition of this percolation model [120, 121], all mutual clusters must be taken into count for providing a more reliable simulation. A viable way to define such clusters can be found in Refs.[293, 199], in which the percolation transition can be defined in each generation. Based on a fully dynamic graph algorithm [279], Hwang et al. proposed an efficient algorithm designed to proceed as links are deleted, whose time complexity is approximately O(N1.2)O(N^{1.2}) for random networks [313]. With this algorithm, large scale simulations for ER and 22-dimensional interdependent networks have been carried out, which verifies the exponents for the order parameter and its fluctuations numerically, as well as the finite avalanches [314]. Stippinger et al. later generalized this algorithm to efficiently simulate interdependent networks with healing [315].

Besides, Grassberger proposed an easily implemented algorithm to find all the mutual clusters by mapping the problem onto a solid-on-solid model [293]. This method is simply done by alternately performing Leath-like processes on different network layers [316], each is referred to as a “wave” from the starting node. Waves are confined to the nodes that have the same “height” as the starting node. All the nodes’ heights are initially set as 0, and after the spreading of a wave, the heights of the involved nodes increase by 11. Do the searching process for all the possible starting nodes, a landscape of node heights can be obtained, from which the mutual giant cluster and the distribution of mutual cluster sizes can be found. With some techniques this algorithm can also simulate the percolation process on interdependent networks, effectively. In Ref.[315], it is also pointed out that compared to the algorithm of Ref.[293] their algorithm is generally intended for fast updates at the cost of higher memory use.

3.2.3 Variants and related models

With the growing trend of multiplex networks, this percolation model has also been employed frequently to figure out the robustness of multiplex networks [8, 317, 296, 318, 9], which can be generally classified into three categories, single networks with dependence links, networks of networks, and multiplex networks with general and special dependence. Most of these studies demonstrate a hybrid transition as shown above, and a crossover between the continuous and discontinuous percolation transition can also be found in some of them. Besides, the correlation between different network layers can also be represented by interconnections, in which the percolation transition can be also observed [319].

More importantly, these works show the robustness of multiplex networks in various environments, and are meaningful for understanding the structures and the robustness of multiplex networks. More results will be reviewed later when we turn our thoughts to the applications of percolation model. Of course, one can find more information in the special papers and books on multiplex networks [8, 317, 296, 318, 9].

3.3 Explosive percolation

The explosive percolation was proposed by Achlioptas et al. in 2009 [201], hence this type of percolation process is named as Achlioptas process. In general, Achlioptas process is a competitive process that can be realized by many rules, and the key is suppressing the emergence of a giant cluster. There is a specialized review article for this issue [206], here we only give a brief introduction of the main findings.

3.3.1 Achlioptas process

Refer to caption
Figure 16: (Color online) (a) Illustrations for the classical percolation and the explosive percolation on ER networks. Here the explosive percolation is manufactured by the best-of-22 rule. (b) Schematic of the best-of-22 rule. At each time step, 22 potential links indicated by dashed lines are chosen randomly from all the possible ones. According to the sum/product rule, here the potential link l34l_{34} will be inserted, and link l12l_{12} is discarded.

As we know, ER networks can be generated by randomly inserting links into a set of nodes one by one. The Achlioptas process is just a variation of this process [201]. Assuming there are initially NN nodes and no links, at each time step, mm potential links are arbitrarily chosen, each of which is supposed to bridge two distinct clusters with size s1s_{1} and s2s_{2}, respectively. (The two nodes can belong to the same cluster with no impact on the following process). Then, according to a function f(s1,s2)f(s_{1},s_{2}), the potential link with the smallest f(s1,s2)f(s_{1},s_{2}) is inserted eventually, and other potential links are discarded, see Fig.16 (b) as an example. Repeating this process until the expected number of links is met. Such a selection mechanism is often called the best-of-mm rule or the min-cluster-mm rule [202], and the function f(s1,s2)f(s_{1},s_{2}) represents the competitive rule.

In order to suppress the emergence of the giant cluster, the competitive rule f(s1,s2)f(s_{1},s_{2}) must be of positive correlation with s1s_{1} and s2s_{2}. Two commonly used ones are the product rule f(s1,s2)=s1s2f(s_{1},s_{2})=s_{1}s_{2} and the sum rule f(s1,s2)=s1+s2f(s_{1},s_{2})=s_{1}+s_{2}. For m=1m=1, this model reduces to the ER network model, which percolates when N/2N/2 links have been inserted, i.e., k=1\langle k\rangle=1. For m2m\geq 2, the emergence of a giant cluster will be suppressed, resulting in an explosive percolation. A comparison of the two cases is shown in Fig.16 (a). One can find that the explosive percolation shows a sharper transition with a larger threshold.

Since Ref.[201], many other generalized Achlioptas processes have been proposed and investigated. A typical example is the ll-node (ll-vertex) model. In this model ll nodes are chosen randomly at each step, and two of them are connected according to a selection criteria, for example, the size difference [320]. The original Achlioptas process just corresponds to l=4l=4, and the case l=3l=3 with different selection criteria is used in many studies [204, 321, 322].

Besides, there are some other models can realize the Achlioptas process, such as BFW model [323, 324, 325, 326, 327, 328, 329, 330], dCDGM model [203, 331, 332, 333, 334, 335, 336, 337], Hamiltonian model [205], diffusion-limited cluster aggregation model [338, 339], spanning cluster-avoiding model [340, 341], and hybrid model [342, 343, 344]. One can find a contrast analysis of these models in Ref.[345]. Most of these models can be generalized to any networks by constraining the potential links in a given configuration. One can refer to Ref.[206] for more information.

3.3.2 Phase transition characteristics

For a deep understanding of the explosive percolation, one can consider a complete network with size NN [202, 321]. In each step tt, all the unconnected node pairs provide potential links, i.e., m=N(N1)/2t+1m=N(N-1)/2-t+1. In this case, up to N/2N/2 new links are added, only connect isolated nodes to result in clusters with size 22 such that the maximum cluster size remains 22 for the first N/2N/2 steps. Each of the subsequent N/4N/4 links connects clusters of size 22 to clusters of size 44 keeping the maximum cluster size 44 until step 3N/43N/4. As an analogy, in step N1N−1, the remaining two clusters of size N/2N/2 join. The jump of the largest cluster is thus clearer, indicating a discontinuous transition [324, 346]. In general, one can connect a link chosen from all the possible positions at each step, thus the jump of the largest cluster is excepted in any network configurations. By the way, such successive jumps of the largest cluster are named microtransition cascades [330], which is probably used to warn signals anticipating phase transitions in complex systems.

A small mm will relax the suppression for the emergence of the giant cluster, thus the jump of the largest cluster is blurred. Although a sharp transition might also be observed as shown in Fig.16, Riordan et al. pointed out that any rule based on picking a fixed number of random nodes gives a continuous transition [324]. They also suggested a lower end of the range: Whenever mm\to\infty as NN\to\infty, the Achlioptas process exhibits explosive percolation. The findings of da Costa et al. show that the Achlioptas process is irreversible, which also indirectly suggests a continuous transition [203]. However, these findings do not mean that the Achlioptas process behaves exactly like a continuous percolation. The discontinuities can also be observed in the Achlioptas process [320]. The double humped distribution of the sizes of the largest cluster and the hysteresis, which are both recognized as the sign of discontinuous transition, can be also found in finite systems [347, 348]. These works also reported a finite-size behavior with non-analytic scaling functions, and demonstrated that the explosive percolation transitions are indeed continuous but with some unusual scaling properties [203, 349, 350, 325, 347, 351, 352, 345]. Riordan et al. also pointed out that the fluctuation of the order parameter in explosive percolation does not disappear in the thermodynamic limit [322].

3.4 Percolation transition during the growth of networks

In contrast to the network models with a fixed size, such as the configuration model and ER network model, there is a family of network models with growing size featuring the growing property of real networks [3, 4, 353]. A typical example is the famous BA networks [56]. However, there is always only one single cluster in the growing process, thus the system always percolates. To observe a percolation transition in the growth of a network, it must be allowed that new nodes enter the system without connecting to the existing nodes. In this way, the network could contain isolated nodes and finite clusters.

Instead of looking at the network configuration, one can solve the growing network model by checking the evolution process via the master equation. The key point is featuring the increment and decrement of the parameters in one time step, which might vary from model to model [354, 125, 355].

3.4.1 Growing random network

A simple rule to achieve a percolation transition in network growing is the one proposed in Ref.[125], called growing random network. This model begins with no node, and at each time step a new node is inserted into the system, then two nodes are chosen randomly from all the existing nodes and joined by a link with probability pp.

At time step tt, there are N(t)=tN(t)=t nodes in the system. Let pk(t)p_{k}(t) be the degree distribution at time step tt, then a rate equation for the evolution of the degree distribution pk(t)p_{k}(t) can be established,

N(t+1)pk(t+1)N(t)pk(t)=2ppk1(t)2ppk(t).N(t+1)p_{k}(t+1)-N(t)p_{k}(t)=2pp_{k-1}(t)-2pp_{k}(t). (98)

The two terms on the right hand side of this equation correspond to the expected increment and decrement of nodes with degree kk in one time step, respectively. For tt\to\infty, pk(t+1)=pk(t)pkp_{k}(t+1)=p_{k}(t)\equiv p_{k}, and Eq.(98) reduces to a recursion formula pk/pk1=2p/(1+2p)p_{k}/p_{k-1}=2p/(1+2p). Note that Eq.(98) holds only for k>0k>0, and for k=0k=0 it is specialized as

N(t+1)p0(t+1)N(t)p0(t)=12pp0(t),N(t+1)p_{0}(t+1)-N(t)p_{0}(t)=1-2pp_{0}(t), (99)

which gives p0=1/(1+2p)p_{0}=1/(1+2p) for tt\to\infty. Consequently, the degree distribution has a closed form for tt\to\infty,

pk=11+2p(2p1+2p)k.p_{k}=\frac{1}{1+2p}\left(\frac{2p}{1+2p}\right)^{k}. (100)

Obviously, the degree distribution pkp_{k} decreases with the increase of kk. In fact, the power-law degree distribution can also be produced in this model by introducing preferential attachment [355, 356].

From Eq.(100), the generating functions are also available, those are G0(x)=1/(1+2p2px)G_{0}(x)=1/(1+2p-2px) and G1(x)=1/(1+2p2px)2G_{1}(x)=1/(1+2p-2px)^{2}. Substituting G1(x)G_{1}(x) into the Molloy-Reed criterion Eq.(28), we can find the percolation threshold pc=1/4p_{c}=1/4. This result is for an uncorrelated tree-like network with degree distribution Eq.(100), whereas the growing random network has degree correlation. Specifically, earlier nodes are sure to form a core, in which there is a higher average degree. This indicates that the giant cluster forms more readily than in a network whose links are uniformly distributed.

To find the percolation threshold of the growing random network, we need to examine the cluster size distribution [125]. Let ns(t)n_{s}(t) be the normalized cluster number at time step tt, i.e., the ratio of the number of clusters with size ss to the total number of nodes, then a rate equation for the evolution of ns(t)n_{s}(t) can be expressed as

N(t+1)n1(t+1)N(t)n1(t)\displaystyle N(t+1)n_{1}(t+1)-N(t)n_{1}(t) =12pn1(t),\displaystyle=1-2pn_{1}(t), (101)
N(t+1)ns(t+1)N(t)ns(t)\displaystyle N(t+1)n_{s}(t+1)-N(t)n_{s}(t) =pi=1s1ini(t)(si)nsi(t)2psns(t),s2.\displaystyle=p\sum_{i=1}^{s-1}in_{i}(t)(s-i)n_{s-i}(t)-2psn_{s}(t),~{}~{}~{}s\geq 2. (102)

Note that snssn_{s} can be interpreted as the size of the cluster that a randomly chosen node belongs to. Therefore, pi=1s1ini(t)(si)nsi(t)p\sum_{i=1}^{s-1}in_{i}(t)(s-i)n_{s-i}(t) is the increment of clusters with size s2s\geq 2 by combining two small clusters in one time step, while for s=1s=1 the increment just is the newly added node; 2psns(t)2psn_{s}(t) is the decrement of clusters with size ss in one time step by combining two clusters that at least one of them has size ss.

For tt\to\infty, ns(t+1)=ns(t)nsn_{s}(t+1)=n_{s}(t)\equiv n_{s}, then Eqs.(101) and (102) can be simplified to

n1\displaystyle n_{1} =12pn1,\displaystyle=1-2pn_{1}, (103)
ns\displaystyle n_{s} =pi=1s1i(si)ninsi2psns,s2.\displaystyle=p\sum_{i=1}^{s-1}i(s-i)n_{i}n_{s-i}-2psn_{s},~{}~{}~{}s\geq 2. (104)

Although the two equations cannot give a closed form of nsn_{s}, we can find a recursive function for the generating function H0(x)=sπsxs=ssnsxsH_{0}(x)=\sum_{s}\pi_{s}x^{s}=\sum_{s}sn_{s}x^{s} by multiplying both sides of Eqs.(103) and (104) by sxssx^{s} and then summing over ss. The result reads

H0(x)\displaystyle H_{0}(x) =s=1snsxs\displaystyle=\sum_{s=1}sn_{s}x^{s}
=x2pn1x+ps=1i=2s1i(si)ninsisxs2ps=2s2nsxs\displaystyle=x-2pn_{1}x+p\sum_{s=1}\sum_{i=2}^{s-1}i(s-i)n_{i}n_{s-i}sx^{s}-2p\sum_{s=2}s^{2}n_{s}x^{s}
=x+pxddx(s=1snsxs)22pxddxs=1snsxs\displaystyle=x+px\frac{d}{dx}\left(\sum_{s=1}sn_{s}x^{s}\right)^{2}-2px\frac{d}{dx}\sum_{s=1}sn_{s}x^{s}
=x+2pxH0(x)H0(x)2pxH0(x),\displaystyle=x+2pxH_{0}(x)H_{0}^{\prime}(x)-2pxH_{0}^{\prime}(x), (105)

where nanxnnbnxn=nmambnmxn\sum_{n}a_{n}x^{n}\sum_{n}b_{n}x^{n}=\sum_{n}\sum_{m}a_{m}b_{n-m}x^{n} is used. Now, we have

H0(x)=H0(x)x2px[H0(x)1].H_{0}^{\prime}(x)=\frac{H_{0}(x)-x}{2px\left[H_{0}(x)-1\right]}. (106)

Note that the mean cluster size χ=H0(1)/H0(1)H0(1)\chi=H_{0}^{\prime}(1)/H_{0}(1)\propto H_{0}^{\prime}(1), then the critical behavior of χ\chi can be derived from Eq.(106). In the supercritical phase, the giant cluster exists, thus H0(1)<1H_{0}(1)<1, Eq.(106) gives the mean cluster size χ=H0(1)/H0(1)=1/2pH0(1)\chi=H_{0}^{\prime}(1)/H_{0}(1)=1/2pH_{0}(1). For the subcritical phase, all the clusters have finite sizes, i.e., H0(1)=1H_{0}(1)=1, then H0(1)H_{0}^{\prime}(1) can be solved by letting x1x\to 1 in Eq.(106), that is

H0(1)=118p4p.H_{0}^{\prime}(1)=\frac{1-\sqrt{1-8p}}{4p}. (107)

For a physical meaning, p1/8p\leq 1/8 is required here, which gives the percolation threshold pc=1/8p_{c}=1/8. This threshold is smaller than that for the configuration network with the same degree distribution, confirming that the growing mechanism facilitates the forming of the giant cluster, theoretically [125].

As just described, the critical behavior of χ\chi can be summarized as

χH0(1)={118p4p,p18,12p,p>18.\chi\propto H_{0}^{\prime}(1)=\left\{\begin{array}[]{lr}\frac{1-\sqrt{1-8p}}{4p},&p\leq\frac{1}{8},\\ \frac{1}{2p},&p>\frac{1}{8}.\end{array}\right. (108)

This means that the mean cluster size jumps discontinuously at the percolation threshold, whereas diverges for the standard percolation.

Moreover, compared with the static network, the existence of the core makes the size of the giant cluster increases more slowly in the supercritical phase. The simulation results imply that the size of the giant cluster obeys Seα(ppc)βS\propto e^{\alpha(p-p_{c})^{-\beta}} with β=1/2\beta=1/2, and suggest an infinite order phase transition [125, 357, 358, 359], like the Berezinskii-Kosterlitz-Thouless phase transition in the condensed matter [360, 361]. In addition, such a phase transition was also found on the substrate formed by XY spin configurations [362]. Some simulation verification for these findings and the fractal geometry can be found in Refs.[363, 364]. In a growing network with a renormalization group treatment, Dorogovtsev also found that the critical behavior of percolation on growing networks differs from that in uncorrelated networks [365].

3.4.2 Variants and related models

Similar results can also be found in a general model with preferential attachment [355], i.e., the link between nodes ii and jj is inserted with a probability proportional to (ki+a)(kj+a)(k_{i}+a)(k_{j}+a), where kik_{i} (kjk_{j}) is the degree of node ii (jj), and aa is a positive constant which plays the role of additional attractiveness of node for new links [366]. The power-law form of the cluster size distribution in standard percolation has a decay for large sizes both above and below the percolation threshold. However, in the growing network the power-law form can be observed in the entire phase without the giant cluster. This indicates that the system is in the critical state for the entire phase.

Instead of connecting nodes randomly, other typical mechanisms for generating networks can also be incorporated into the growing network model, such as BA network model [367] and the explosive percolation model [368, 333, 369, 308]. In these models, when a new node enters the system, with probability pp links are inserted following the given rule, otherwise, do nothing. When the rule of BA network model is applied, the percolation transition is still of infinite order [367]. This mainly because the rule of BA network model does not break the picture of the forming of the giant cluster, which is the crucial factor for the nature of infinite order transition [355].

Conversely, the Achlioptas process induces a suppression effect against the growth of the large clusters. Thus, when the Achlioptas process is used, the percolation of growing network can revert to a nature of the second order transition with scaling behaviors depending on the specific rules [333, 369]. It should be pointed out that if there are not enough potential nodes/links in the Achlioptas process, the growth mechanism will still dominate the nature of the percolation transition, hence an infinite order transition [368]; rather, if the Achlioptas process is done with global information, the growth of large clusters can be completely suppressed, so that the continuous percolation transition will change to a discontinuous one [370, 308].

Besides, the infinite order transition can occur even in the hierarchical networks with small-world effects [142, 371, 144, 372]. For the growing network without percolating state [373], the preferentially growing network [374], and the growing directed network [375, 376], the power-law distribution of cluster sizes can be also observed. The percolation transition of growing networks has also been studied by diluting links or nodes in a generated network [377]. The dense structures were also found to undergo phase transitions in the growth of networks[378, 379].

4 Applications to network structural analysis

The previous section mainly reviews the typical percolation models on complex networks, focusing on the phase transition and the critical phenomena. In fact, the percolation theory, including conceptions, theoretical methods, and algorithms, has a broader range of applications in the study of network structures and dynamics. This is mainly because the percolation model is by nature for featuring the cluster forming process in random media, while network science is exploring the properties of various topological structures and the correlation with dynamics. In this section we will review the applications to network structural analysis. We should point out that some of these problems are not firstly motivated by percolation, however, a good interpretation can be given from the perspective of percolation.

4.1 Hierarchical structure of networks

Hierarchical structure is one of the important characteristics of real networks [3, 5, 6, 1, 2], which can be generally identified by either a branching process or a pruning process. For tree-like networks, the analytical method for network percolation is often employed to find theoretical results, while for non-tree-like networks percolation process can also be used as an algorithm to identify hierarchical structures.

4.1.1 Tree-like networks

As shown in Fig.17, the hierarchical structure of a tree-like network can be figured out by a spanning tree. For a large enough network with tree-like structures, any node can be seen as the root of the spanning tree. Then the direct neighbors of this node are the first shell of the network, and the neighbors’ neighbors belong to the second shell, and so on. This is just the branching process we discussed in Sec.2.2. Thus, the generating function can be also employed to study this shell structure [48].

Refer to caption
Figure 17: (color online) Schematic for the hierarchical structure of a tree-like network. The number of nodes in the (m+1)(m+1)-th shell is just the total excess degrees of the nodes in the mm-th shell.

Since the root node is arbitrarily chosen, the sub-branchings from it can be described by the generating function G0(x)G_{0}(x). Here, an xx refers to an outgoing link from the root node. Following these outgoing links, we can find the nodes in the first shell of the network. As the explanation for Fig.4, the generating function G1(x)G_{1}(x) can be used to represent the state of the node at the end of a randomly chosen link, where xx also refers to the outgoing link from this node. Therefore, replacing xx in G0(x)G_{0}(x) with G1(x)G_{1}(x), we will obtain the sub-branchings from the nodes in the first shell G(1)(x)=G0(G1(x))G^{(1)}(x)=G_{0}(G_{1}(x)). Similarly, the sub-branchings from the second shell can be represented by G(2)(x)=G0(G1(G1(x)))G^{(2)}(x)=G_{0}(G_{1}(G_{1}(x))). In general, the sub-branchings coming from the mm-th shell can be written as

G(m)(x)\displaystyle G^{(m)}(x)
=\displaystyle= G0(G1(G1(G1(x)))mgenerations)\displaystyle G_{0}(\underbrace{G_{1}(G_{1}(...G_{1}(x)...))}_{m~{}\text{generations}})
=\displaystyle= G(m1)(G1(x)),m>1.\displaystyle G^{(m-1)}(G_{1}(x)),~{}~{}~{}m>1. (109)

With this iteration relation, Shao et al. found that the distribution of node numbers in these shells follows a power law for a broad class of complex networks [380]. This branching process can also be used to realize the configuration model [381], i.e., connecting nodes with given degrees shell by shell.

Note that loops are forbidden here, so there must be a one-to-one correspondence between the parameter xx and the sub-branching. Consequently, the exponent for the power of xx in Eq.(109) is just the total number of sub-branchings from the mm-th shell. In other words, the derivative of Eq.(109) with respect to xx at x=1x=1 must equal to the average number of nodes in the (m+1)(m+1)-th shell,

km+1\displaystyle\langle k_{m+1}\rangle =dG(m)(x)dx|x=1\displaystyle=\frac{dG^{(m)}(x)}{dx}\bigg{|}_{x=1}
=dG(m1)(x)dx|x=1G1(1)\displaystyle=\frac{dG^{(m-1)}(x)}{dx}\bigg{|}_{x=1}G_{1}^{\prime}(1)
=kmG1(1),m1,\displaystyle=\langle k_{m}\rangle G_{1}^{\prime}(1),~{}~{}~{}m\geq 1, (110)

where km\langle k_{m}\rangle is the average number of nodes in the mm-th shell, and k1=k\langle k_{1}\rangle=\langle k\rangle is just the average degree of the network. From Eq.(110), we can also find

km+1km=G1(1),m1.\frac{\langle k_{m+1}\rangle}{\langle k_{m}\rangle}=G_{1}^{\prime}(1),~{}~{}~{}m\geq 1. (111)

This indicates that the growth rate of the node number with shells is a constant G1(1)G_{1}^{\prime}(1). For an infinite network, the shell must be endless, so the node number cannot decrease with the increasing of shells, i.e.,

G1(1)=k2kk1.G_{1}^{\prime}(1)=\frac{\langle k^{2}\rangle-\langle k\rangle}{\langle k\rangle}\geq 1. (112)

This is the existence condition of the giant cluster in a tree-like network, namely, percolation threshold, see Sec.2.2.

For a finite network, the number of shells ll must be finite. Next, we will show how to estimate ll. Using Eq.(111), the number of nodes in the mm-th shell can be written as

km=\displaystyle\langle k_{m}\rangle= k1(k2k1)m1\displaystyle\langle k_{1}\rangle\left(\frac{\langle k_{2}\rangle}{\langle k_{1}\rangle}\right)^{m-1}
=\displaystyle= k(k2kk)m1,m1.\displaystyle\langle k\rangle\left(\frac{\langle k^{2}\rangle-\langle k\rangle}{\langle k\rangle}\right)^{m-1},~{}~{}m\geq 1. (113)

Summing over all the shells, we obtain the network size

N=1+m=1lkm.N=1+\sum_{m=1}^{l}\langle k_{m}\rangle. (114)

This yields

l=ln[(N1)(k22k)+k2]2lnkln(k2k)lnk.l=\frac{\ln\left[(N-1)\left(\langle k^{2}\rangle-2\langle k\rangle\right)+\langle k\rangle^{2}\right]-2\ln\langle k\rangle}{\ln\left(\langle k^{2}\rangle-\langle k\rangle\right)-\ln\langle k\rangle}. (115)

Real networks usually have NkN\gg\langle k\rangle and k2k\langle k^{2}\rangle\gg\langle k\rangle, thus ll can be approximately represented as

l=lnNlnklnk2lnk+1.l=\frac{\ln N-\ln\langle k\rangle}{\ln\langle k^{2}\rangle-\ln\langle k\rangle}+1. (116)

This result can be seen as an approximation of the average distance or the diameter of real networks [382, 383], which indicates that llnNl\propto\ln N. This is just the small-world effect of random networks.

The above derivations depend on a finite k2\langle k^{2}\rangle (when NN\to\infty). However, SF networks with 2<λ<32<\lambda<3 give a divergent moment k2k0N1/(λ1)k2λ𝑑kN(3λ)/(λ1)\langle k^{2}\rangle\propto\int_{k_{0}}^{N^{1/(\lambda-1)}}k^{2-\lambda}dk\propto N^{(3-\lambda)/(\lambda-1)}, where k0k_{0} is the lower boundary of degrees and N1/(λ1)N^{1/(\lambda-1)} is the natural cutoff of degrees. Substituting k2N(3λ)/(λ1)\langle k^{2}\rangle\propto N^{(3-\lambda)/(\lambda-1)} into Eq.(116), we can find a size-independent ll, indicating that SF networks are ultrasmall.

Actually, the diameter ll of SF networks increases with the system size in the form of lnlnN\ln\ln N [70, 384, 385]. To find this, the degree distribution of the nodes in each shell must be checked, separately. In general, an outgoing link is more likely to lead to a node with large degree, which can be seen from the expression of the excess-degree distribution qkpkkq_{k}\propto p_{k}k. For finite networks, this means that earlier shells will bag almost all the nodes with large degrees, so that the small degree nodes are located in the periphery of the hierarchical structure. This effect is more notable when networks have strong heterogeneity like SF networks. This mainly indicates that the degree cutoff of nodes in each shell decreases with the growth of shells. With this, the form llnlnNl\propto\ln\ln N can be found in [70, 384, 385, 381, 386].

4.1.2 k-shell structure

As pointed above, the hierarchical structure shown in Fig.17 has an arbitrary root. In network science, there is another hierarchical structure with deterministic shells, i.e., kk-shell structure [193]. For a given network, the kk-shell refers to the nodes that belong to the kk-core but not belong to the (k+1)(k+1)-core, see Fig.18.

Refer to caption
Figure 18: (color online) Schematic for the kk-shell structure of networks. In a kk-core, nodes are connected to one another by at least kk paths. The nodes that belong to the kk-core but not belong to the (k+1)(k+1)-core are called as kk-shell of the network.

For tree-like networks, one can use Eqs.(88) and (90) to obtain the kk-core size SkS_{k} for any kk, the size of kk-shell is thus SkSk+1S_{k}-S_{k+1}. In general, the pruning process of kk-core percolation is just used as an algorithm to identify the kk-shell structure, called kk-core/kk-shell decomposition. The related discussions are far away from the percolation transition. So we will not go into details on this topic here, and one can refer to the specialized review on this topic for details [214].

4.2 Network robustness

In network science, percolation process has already become a paradigm for studying the network robustness. Specifically, one often removes a fraction 1p1-p of nodes from a given network, then checks the existence of the giant cluster. If the giant cluster remains, the resulted network is still considered functional. Thus the percolation threshold pcp_{c} can be used to evaluate the robustness of the network. A large threshold pcp_{c} indicates that only a small amount of failed nodes can disconnect the network, so its robustness is poor, and vice versa. With different node removal methods, the percolation model can further explain the robustness of networks under various failure mechanisms, some of which are also used to model network dynamics. Here, we review the main findings of the network robustness based on the percolation model from two aspects, single networks and multiplex networks.

4.2.1 Single networks

As mentioned above, the percolation threshold of SF networks is much smaller than that of ER networks with the same link density, especially for SF networks with 2<λ<32<\lambda<3 which have a vanished percolation threshold [83, 84, 85, 72, 86]. This suggests that SF networks are more robust than ER networks. More generally, for networks with the same average degree, the broader the degree distribution, the more robust the network is [387].

For a given link density, the clustering structure compacts node clusters with the price of diluting global connections. Therefore, the corresponding percolation threshold is smaller than that of tree-like networks, indicating that the networks with clustering are fragile [151, 152, 153, 154]. However, highly clustered networks are often characterized as a resilient system [161, 162, 163]. In fact, the two are reconcilable, since strong clustering could induce the so-called core-periphery structure [156, 157], for which the percolation model shows a double transition [158].

More generally, the loop structure, including clustering, is a meaningful concept for network robustness, captured by the connection between a pair of nodes through at least two disjointed paths, termed biconnectivity. Since a pair of biconnected nodes in networks can communicate under the removal of one route, biconnectivity can play a significant role in network robustness [388, 117, 218, 216], as well as other high order connectivity. Some of studies on this topic in terms of the percolation transition can be found in Sec.3.1. Moreover, the resilience of networks with community structure has also been studied by the percolation model, which shows a behavior similar to the spin system under an external field [389].

In addition, the degree correlation can also influence the percolation process on networks, see Sec.2.6. Generally speaking, increasing the assortativity of a network makes the network more robust against nodes removal [390, 172]. In fact, the clustering structure and the degree correlation are inseparable for real networks. Even for network models, it is impossible to adjust the clustering structure arbitrarily without inducing any degree correlation, and vice versa. Therefore, the robustness of real networks, even of some network models, cannot be evaluated by their clustering coefficients or assortative coefficients, individually.

If the hub nodes are preferred to be removed initially, i.e., intentional attack, the percolation model can also be used to study the robustness of networks under targeted attack. A simple implementation approach is to remove a fraction 1p1-p of the nodes with the highest degrees [72], which reports that SF networks, known to be resilient to random removal, are sensitive to intentional attack. More generally, the intentional attacks are commonly realized by removing nodes with probability proportional to kαk^{\alpha} [391], where kk refers to degree. When α>0\alpha>0, nodes with larger degrees are more vulnerable under the intentional attacks, while for α<0\alpha<0, nodes with larger degrees are defended and so have a lower probability to fail. The case α=0\alpha=0 just represents the random removal of nodes. This mechanism can also be developed to study the robustness of networks under intentional link attack [392].

For α>0\alpha>0, SF networks become extremely fragile [85], and the percolation threshold will be larger than that of ER networks with the same link density [85] or even equal to 11 [393]. At criticality, the topology of the network depends on the details of the removal strategy, implying that different strategies may lead to different kinds of percolation transitions [391, 392]. An optimal attack is also directly related to optimal graph partitioning and immunization of epidemics [394]. Different structures can also emerge from these pruning processes [395, 396, 397]. Moreover, by mapping the intentional attack to a normal percolation problem, this robustness model can also be solved [398]. Further research also shows that an onion-like structure could be a nearly optimal structure against simultaneous random and targeted high degree node attacks [399, 400, 401].

There is also a special percolation model for the case α\alpha\to-\infty, called degree-ordered percolation [402], where nodes are occupied in degree-descending order (or alternatively, removed in degree-ascending order). The mean-field results on SF networks suggest that the critical exponents depend on the heterogeneity of the network, and do not belong to the universality class of the standard percolation [403].

Besides, the localized attack, meaning that an attack on a node can destroy the node together with its neighbors within a range, has also been introduced into the percolation model to show the robustness of networks under attacks [404, 387, 405, 228, 406, 407, 408]. When the intentional attacks are dependent on spatial distances, the spatial networks are more fragile than expected [409]. Moreover, the finite-size scaling analysis suggests that for some of the intentional attack strategies, the critical exponents seem to deviate from the mean field, but the evidence is not conclusive [410].

The percolation model is a random process, and the corresponding theory only characterizes the results on average. Nevertheless, two different realizations with the same magnitude of removal could result in very different giant clusters. For the study of the robustness of real networks, the fluctuations of different percolation realizations become extremely important. Bianconi introduced a message-passing algorithm able to predict the fluctuations in a single network, and an analytic prediction of the expected fluctuations in ensembles of sparse networks [411, 412]. The following works also contributed to the stability of the giant cluster [413], the large deviation of percolation [414, 415], and the largest biconnected cluster for random graphs [216]. The results mainly showed that the large-deviation approach to percolation can provide a more accurate characterization of system robustness.

As shown in Sec.2.7, the percolation transition can also be well defined on directed networks, so that with similar consideration the robustness of directed networks can also be studied in the framework of percolation transition. It mainly demonstrates that the GWCC is more vulnerable, and the directed network may have the GWCC and, simultaneously, may not have the GSCC [182, 183, 184, 190, 185, 186, 187, 188, 189].

In addition, if the communication of nodes in the network is effective only if the shortest path between nodes ii and jj after removal is shorter than αlij\alpha l_{ij}, where lijl_{ij} is the shortest path before removal, a much smaller failure of the network can lead to an effective network breakdown. This problem was also studied in a percolation form called limited path percolation [207, 416]. Note that this mechanism was also used to feature the entanglement percolation in quantum complex networks [417].

4.2.2 Multiplex networks

In recent years, multiplex network has become a research hotspot. The consideration of multiplex network naturally introduces the interaction between different network layers, thus constructs a cascading picture, which changes the nature of percolation transition, see Sec.3.2. Consequently, the robustness of multiplex networks under the framework of percolation transition has its own characteristics, or even in a completely opposite manner as the single network.

Topological structure

As shown in Sec.3.2, multiplex networks are more fragile than single networks due to the cascading failures induced by the interdependence between nodes from different network layers. More interestingly, the effects of the topology structure on the robustness of the network are also very different from those of the single network.

In Refs.[122, 120], researchers found that a network with a broader degree distribution results in a larger pcp_{c}, which is opposite to ordinary networks. As we know, the hub nodes have a dominant role in the robustness of a network. However, when the node dependence is involved, a hub node could depend on a small degree node, which is quite vulnerable during the iterated removal process triggered by the initial node removal. Moreover, a broader degree distribution with the same average degree implies that there are more small degree nodes. Therefore, the advantage of a broad degree distribution for the ordinary network becomes a disadvantage for the network with dependence links. For this reason, the assortativity and high clustering structures will also make such networks fragile [418, 419]. These features are in consonance with the robustness of single networks under intentional attack [395, 391, 392, 396, 397, 404]. For directed networks, the in-degree and out-degree correlations could increase the robustness of interdependent networks with heterogeneous degree distributions, but decrease the robustness of interdependent networks with homogeneous degree distributions and with strong coupling strengths [420].

Inter similarity

In contrast to single networks, the multiplex networks are difficult to defend by strategies such as protecting the large degree nodes (intentional attack with α<0\alpha<0) that have been found useful to significantly improve the robustness of single networks [398, 421, 422]. This is mainly because the dependence partners could have very different degrees, and defend strategies based on degrees cannot protect the two nodes at the same time. By this consideration, the system will obviously become robust, when the dependence partners have identical degrees [423, 424]. Such correlations between dependent partners are called inter similarity of different network layers.

Parshani et al. introduced two parameters to evaluate the level of inter similarity between networks: the inter degree-degree correlation and the inter clustering coefficient. The simulation results demonstrated that the inter-similar multiplex networks are significantly more robust to random failures than a randomly interdependent system [425]. Theoretically, Hu et al. developed an approach to analyze the robustness of multiplex networks with inter similarity [426]. In their model, there are some probabilities that the dependence nodes of two adjacent nodes in one network are also connected. The studies showed that this inter similarity can improve the robustness of the interdependent networks and change the critical behaviors of the percolation transition. In addition, a similar work has also been done on this problem [427]. The inter-similarity of dependence partners is also studied in single networks. The results show that the local dependence, i.e., dependence partners are adjacent, can result in a robust network [428].

Dependence relations

The dependence links were proposed to reflect the dependence relationships in reality, but not all the nodes in a networked system have such dependence. In this way, the percolation on networks with weak dependence has been studied [121, 422, 429]. In this model, instead of each node having a dependence partner, only a fraction qq of nodes have dependence links. When q=0q=0, the multiplex networks reduce to some single networks, while q=1q=1, the fully dependence model discussed above is covered. Thus the system becomes robust with the decreasing of qq.

Besides, without reducing the number of dependence links, the dependence can also be adjusted by the so-called dependence/coupling strength α[0,1]\alpha\in[0,1] [430, 431, 432]. When a node loses its dependence partner in the pruning process, each of its links is removed from the network with a probability α\alpha. With this mechanism, heterogeneous dependence can be realized by assigning different dependence links with different dependence strength, and the results demonstrate that the heterogeneous dependence strength makes the system more robust [431]. In addition, Liu et al. proposed a model that integrates interdependent networks and single networks with dependence links [433]. They found that when most of the dependence links are inner- or inter-ones, the coupled network system is fragile and shows a discontinuous percolation transition. However, when the two types of dependence links have nearly the same number, the system is robust and shows a continuous percolation transition.

The one-to-one dependence in the original model is also generalized to multiplex dependence or group dependence. Obviously, the multiplex/group dependence makes the spreading of failures easier, so that the vulnerability of interdependent network under various dependence mechanisms is demonstrated [122, 434, 435, 436]. Wang et al. further showed that a larger dependence group did not always make the network fragile, which is also dependent on the grouping mode and the cascading rules in the group [437, 438]. Li et al. also pointed out that the asymmetric dependence can break the cascading picture between dependence partners, thus results in a robust network [439].

Network of networks

Considering the real situation that more than two networks are connected by the dependence links, Gao et al. extended the model of the interdependent networks into the model of a network of networks [440, 441, 442, 443, 444, 445, 446]. In this model, each network is composed of a set of nodes and connectivity links, then the dependence links connect them to form a larger network. The common factors that influence the network robustness, such as clustering [447], and targeted attack [448], have also been explored in the network of networks. Liu et al. proposed a “weak” interdependence mechanism capturing the differences of networks, where network layers at different positions within the multiplex system experience distinct percolation transitions [306].

In general, the robustness of network of networks decreases with the increasing of network layers. Due to the cascading failures caused by the dependence of networks, the percolation transition of a network of networks could be discontinuous and even a single node failure may lead to an abrupt collapse of the system [440, 441, 442]. Based on a network model composed of spatially embedded networks, Shekhtman et al. showed the extreme sensitivity of coupled spatial networks and emphasized the susceptibility of these networks to sudden collapse [449]. In real life, infrastructural networks are governed and operated separately, and interactions are only allowed within well-defined boundaries. Therefore, for different situations, the percolation on network of networks may be very different. For example, Radicchi et al. introduced a model of percolation where the condition that makes a node functional is that the node is functioning in at least two network layers of the system, and found that the addition of a new layer of interdependent nodes to a preexisting multiplex network will improve its robustness [450].

In addition, the network of networks is also studied as interacting networks [319, 451, 452], in which the connections between the network layers are ordinary links. A system of this kind is therefore equivalent to a single modular network, for which the system can be easily separated into isolated modules, and different modules might percolate at different thresholds [453, 454, 455, 389]. It should mention that there are some special reviews on the resilience of network of networks [456, 457]. One can refer to those for more details.

4.3 Community detection

Community structures are the subsets of nodes within a network that have a high density of within-group connections but a lower density of between-group connections. Traditional community detection methods are usually deterministic and used for finding separated communities, whereas most of the actual networks are made of highly overlapping cohesive groups of nodes. To identify the overlapping communities in large real networks, Derényi and co-workers have proposed a dependent percolation model, called clique percolation [170, 171].

A clique is a fully connected subset of nodes of a network; such a subset with kk nodes is often called a kk-clique. In their model, two kk-cliques are regarded as adjacent, if they share ll (=k1=k-1) nodes. With this connection rule, a node can belong to more than one clique cluster, see Fig.19. Then, by identifying these clique clusters as communities, the overlapping communities are thus detected.

This method also has a clear limitation. That is the network must have a large number of cliques, so it may fail to give meaningful community structures for networks with just a few cliques. For more discussions on this problem, one can refer to Ref.[458].

Refer to caption
Figure 19: (color online) Schematics of the clique percolation. Two kk-cliques are regarded as adjacent, if they share ll (k1\leq k-1) nodes. Different clusters are distinguished by different colors. (a) k=3k=3 and l=2l=2. (b) k=4k=4 and l=2l=2.

Aside from the analysis of network structure, clique percolation transition itself provides a set of very interesting problems. As classical percolation in ER networks, kk-clique percolation considers the behaviors of clusters formed by connected kk-cliques for a different connection probability pp of links. The simulation results in Ref.[170] indicate that the fraction of nodes in the largest kk-clique cluster makes a discontinuous percolation transition with increasing connection probability pp; however, the fraction of kk-cliques in the largest kk-clique cluster demonstrates a continuous percolation transition. Bollobás and Riordan gave a rigorous mathematical result of the critical point for a more general percolation model, in which two kk-cliques are regarded as adjacent if they share l(=1,2,,k1)l~{}(=1,2,...,k-1) or more nodes [123]. The Monte Carlo simulation indicates that for different kk and ll, this general clique percolation model could demonstrate rich critical phenomena [459].

In Ref.[124], Li et al. proved theoretically that the fraction of nodes in the giant clique cluster ϕ\phi for l>1l>1 makes a step-function-like phase transition in the thermodynamic limit and a continuous phase transition for l=1l=1. More interestingly, they showed that the order parameter at the critical point ϕc\phi_{c} is neither 0 nor 11, but a constant depending on kk and ll. In addition, the fraction of cliques in the giant clique cluster ψ\psi always makes a continuous phase transition as the classical percolation. Through analysis of the clique cluster number distribution, they found that there is no essential distinction between the two processes measured by ψ\psi and ϕ\phi, and the different behaviors are mainly caused by the quantitative relation between the total numbers of cliques and nodes. Besides, the clique percolation has also been studied in two-dimensional systems [208]. The finite-size scaling analysis shows that, in contrast to the clique percolation on an ER network where the critical exponents are parameter dependent, the two-dimensional clique percolation simply shares the same critical exponents with traditional site or bond percolation, independent of the clique percolation parameters [208].

4.4 Vital nodes identification

The heterogeneity of complex networks shows up in the different roles that nodes play in the structure resilience and the dynamical behaviors. The vital nodes identification classifies nodes based on their roles played in network structure or dynamics [245]. As a model of cluster forming, percolation model has also been employed to identify vital nodes.

In Ref.[460], Piraveenan et al. proposed a centrality measure based on the percolation state of networks, called percolation centrality. In this method, percolation state of node ii is denoted as xix_{i}. Specifically, xi=1x_{i}=1 indicates that node ii is in the fully percolated state, and xi=0x_{i}=0 for a non-percolated state, while a partially percolated state corresponds to 0<xi<10<x_{i}<1. Accordingly, the percolation centrality of a node ii in a network with size NN is defined as

PC(i)=1N2sirσs,r(i)σs,rxsjxjxi,PC(i)=\frac{1}{N-2}\sum_{s\neq i\neq r}\frac{\sigma_{s,r}(i)}{\sigma_{s,r}}\frac{x_{s}}{\sum_{j}x_{j}-x_{i}}, (117)

where σs,r\sigma_{s,r} is the number of shortest paths between nodes ss and rr, and σs,r(i)\sigma_{s,r}(i) is that pass through node ii. With this, nodes in large clusters are considered to be more important. If all nodes are in the same cluster, xs/(jxjxi)=1/(N1)x_{s}/(\sum_{j}x_{j}-x_{i})=1/(N-1), so that the percolation centrality reduces to the well-known betweenness centrality. From this perspective, percolation centrality is just a betweenness centrality with weighting based on percolated state.

In percolation process, if a node’s removal breaks down the network into many small clusters, the nodes must be very important for the network structure. From this basis, Morone and Makse proposed a method to find the superspreaders in network spreading, in which the influence of a node is the size of the largest cluster after removing this node [280]. Accordingly, the influence maximization problem is to find the minimal set of nodes which, if removed, would turn the network into the non-percolating state. Such an operation is often called optimal percolation. The method is also used to find the influential nodes in brain networks [461] and multiplex networks [462].

Due to the inherent relation with spreading dynamics, bond percolation is also used to identify the optimal combination of a given number of influential spreaders [463]. This method needs first to perform a bond percolation on the network, i.e., randomly removing some links. In the resulting network, the largest degree nodes in the top-nn largest clusters are selected and one score is assigned. With different realizations, the average score of each node can be obtained, and the ones with large scores are suggested to be the influential spreaders. Furthermore, Hu et al. offered an analogy of the correlation length in percolation transition to explain the local length scale in spreading dynamics, which could also be used to measure a node’s or a group of nodes’ global influence [464].

In addition, percolation process is also employed as a criterion of the node ranking algorithm [245]. In this criterion, the size of the giant cluster after a node’s removal is assumed to be the structural importance of the node, such that the smaller the obtained giant cluster is the more important the node is. For a node ranking, one can remove nodes from the network one by one in the order of the ranking, then the average size of the giant clusters for each stage can be expressed as

R=i=1NSiN.R=\frac{\sum_{i=1}^{N}S_{i}}{N}. (118)

Here, NN is the number of nodes in the network, and SiS_{i} is the size of the giant cluster when the ii-th node has been removed. Obviously, a good node ranking will give a small RR. From Eq.(118), if the network is large enough, RR is approximately the area surrounded by function SiS_{i} and the horizontal axis.

4.5 Network observability

In Ref.[197], a fundamental problem in network science, namely, the determination of the state of the system from limited measurements, was abstracted as a percolation model, called network observability transition. In this model the initial node occupation is understood as placing a sensor device on the corresponding node, making both the node and all of its neighbors observable. Then, network observability transition focuses on the emergence of the giant cluster formed by observable nodes.

Obviously, the threshold of such percolation must be smaller than that of the classical percolation. By generating function technique, this model can also be solved theoretically. The percolation threshold depends dominantly on the average degree and the variance of degree distribution. The transition occurs earlier in denser and more heterogeneous networks, meaning that only a small amount of sensor devices can observe the key structure of such networks. However, Allard et al. further found a nontrivial coexistence of an observable and of a nonobservable extensive cluster in this percolation model [465]. This suggests that monitoring a macroscopic portion of a network does not prevent a macroscopic event to occur or be unknown to the observer. Further studies also showed that both the negative degree correlation and the betweenness-based sensor placement can make networks more observable [466, 467], and were applied to many real networks [468, 469]. In Ref.[469], a discontinuous observability transition was found in synthetic multiplex networks.

There is another percolation model called ll-hop percolation [198], following a very similar percolation rule. In this model when a node is removed all the nodes in the ll-hop distance, i.e., the nodes no farther than ll away from this node, will also be removed, therefore it can cause more damage than the normal robustness model based on the classical percolation. For the study of networks robustness, such a removal rule is often named localized attack [404, 387, 405, 228, 406, 407, 408].

5 Applications to network dynamics

The origin and development of percolation are inextricably bound up with the spreading process, therefore it is completely natural that in network science the percolation mechanism has been widely used to embody the spreading dynamics. In this section we will first review some typical models of spreading dynamics and their relations with the percolation transition. Besides, the percolation theory has also been borrowed to analyzing the cluster forming of individuals in the dynamics that the node or individual group plays a key role in the evolution of systems, such as the cascading process, traffic jam, and cooperation evolution. These works will also be reviewed in this section.

5.1 Epidemic spreading on networks

The modeling of epidemics is a very active field of research across different disciplines [470], and complex networks provide a powerful framework to describe the contact structure among individuals [471]. As mentioned in Sec.2.1, the epidemic spreading on networks is closely related to percolation process. When the spanning cluster does not exist or the susceptible individuals are isolated into small clusters, infectious diseases will be constrained to some small areas and a global outbreak is impossible. Similarly, when some individuals are immunized to the disease, i.e., recovered from the disease, being quarantined or get vaccinated, the susceptible individuals can be also separated as small isolated clusters and thus the disease cannot spread across the network. From this perspective, regardless of the specifics, the outbreak of the epidemics corresponds to a percolation transition among connected individuals, and there is a direct correlation between the percolation threshold and the epidemic threshold.

In this section, we introduce the framework of epidemic modeling in complex networks from a percolation perspective, although most of them are not inspired by percolation theory initially. Due to the focus of this article, we will not deeply explore various models and the corresponding analytic methods. For details, one can refer to the special reviews [472, 473, 23].

It should be noted that the spreading of epidemics is a dynamic process, while the percolation process results in a static configuration. Because of this, one often employs the master equation to feature the system state during the spreading process, instead of considering the static configuration of the system. Nevertheless, we will find that the results obtained by the master equation directly reflect the nature of percolation transition. With some mathematical techniques, some spreading models can also be exactly mapped to a percolation process [87, 23].

Generally, the epidemic model assumes that the individuals can be categorized as different classes depending on the stage of the disease in the population, such as susceptible (SS, those who have not got infected), infectious (II, those who have been infected and are contagious), and recovered (RR, those who have recovered from the disease or death) [23]. Based on this classification, three epidemiological models, susceptible–infected (SI), susceptible–infected–susceptible (SIS) and susceptible–infected–recovered (SIR) are usually used to study the dynamics of disease evolution.

The simplest one is the SI model in which individuals can only exist in two discrete states, namely, SS and II. The probability that an SS node gets infected by any given neighbor in an infinitesimal time interval dtdt is βdt\beta dt, where β\beta is the infection rate of the epidemic. The SIS model considers that individuals change stochastically through a reversible process: SISS\to I\to S, where the infection rate is described as in the SISI model, but infected individuals may recover and become susceptible again with a recovery rate μ\mu. In the SIR model, infected individuals are removed permanently from the network with a rate μ\mu and labeled as the state RR corresponding to immune or death. There is a second version of SIS/SIR model, where infected individuals remain in state II for τ\tau time steps, and then become either RR in SIR model or SS again in SIS model [474].

5.1.1 SI model

As the setting of SI model, even for a very small infection rate β\beta, all the individuals will be finally infected. Therefore, the epidemic threshold is meaningless for SI model, and the interest of research is the effect of the heterogeneity of degree distributions on the spreading velocity [475].

Let ik(t)i_{k}(t) be the fraction of infections in nodes with degree kk at time step tt, then the growth of ik(t)i_{k}(t) per time step can be expressed as βksk(t)Θk(t)=βk[1ik(t)]Θk(t)\beta ks_{k}(t)\Theta_{k}(t)=\beta k[1-i_{k}(t)]\Theta_{k}(t), where sk(t)s_{k}(t) is the fraction of susceptible in nodes with degree kk, and Θk(t)\Theta_{k}(t) is the probability that a neighbor of a node of degree kk is infected. Note that ik(t)i_{k}(t) cannot decrease with time for this model, so the rate equation reads

dik(t)dt\displaystyle\frac{di_{k}(t)}{dt} =βk[1ik(t)]Θk(t)\displaystyle=\beta k[1-i_{k}(t)]\Theta_{k}(t)
=βkΘk(t)βkik(t)Θk(t).\displaystyle=\beta k\Theta_{k}(t)-\beta ki_{k}(t)\Theta_{k}(t). (119)

For uncorrelated networks, Θk(t)Θ(t)\Theta_{k}(t)\equiv\Theta(t) is independent of kk. Considering that an infected node at least has an infected neighbor, from which the disease has been transmitted, we have

Θ(t)=k(k1)pkik(t)k.\Theta(t)=\frac{\sum_{k^{\prime}}(k^{\prime}-1)p_{k^{\prime}}i_{k^{\prime}}(t)}{\langle k\rangle}. (120)

In the early epidemic stages, ik(t)i_{k}(t) is small, then one can neglect the second term of the right hand side of Eq.(119), which is in an order of O([i(t)]2)O([i(t)]^{2}). Together with Eq.(120), we have

dik(t)dt\displaystyle\frac{di_{k}(t)}{dt} =βkΘ(t),\displaystyle=\beta k\Theta(t), (121)
dΘ(t)dt\displaystyle\frac{d\Theta(t)}{dt} =β(κ1)Θ(t),\displaystyle=\beta(\kappa-1)\Theta(t), (122)

where κ=k2/k\kappa=\langle k^{2}\rangle/\langle k\rangle. For a uniform initial condition ik(t=0)i(0)i_{k}(t=0)\equiv i(0), the two differential equations have solution

ik(t)=i(0)[1+kkk1κ1(et/τ1)],i_{k}(t)=i(0)\left[1+\frac{k}{\langle k\rangle}\frac{\langle k\rangle-1}{\kappa-1}\left(e^{t/\tau}-1\right)\right], (123)

with the outbreak’s time scale

τ=1β(κ1).\tau=\frac{1}{\beta(\kappa-1)}. (124)

The average prevalence can also be obtained,

i(t)\displaystyle i(t) =kpkik(t)\displaystyle=\sum_{k}p_{k}i_{k}(t)
=i(0)[1+k1κ1(et/τ1)].\displaystyle=i(0)\left[1+\frac{\langle k\rangle-1}{\kappa-1}\left(e^{t/\tau}-1\right)\right]. (125)

These results indicate that the prevalence shows an exponential growth in the early stages, and larger degree nodes display larger prevalence levels. By the way, the spreading dynamics on the networks with small-world effects also reported a power-law growth with a tunable exponent [476, 477], which has already been observed in the epidemic data of COVID-19 [478, 479, 480, 481]. Such growth patterns are often referred to the sub-exponential growth dynamics in the phenomenological models [482].

Equation (124) implies that the outbreak’s time scale of an epidemic is related to the heterogeneity of the degree distribution as measured by κ\kappa. A strong heterogeneity corresponds to a large κ\kappa, and thus the time scale of outbreak τ\tau is very small, which signals a very fast spreading of the infection. The physical reason is that once the disease has occupied the hubs, it can spread very fast among the network following a “cascade” of decreasing degree classes [475, 483].

Note that the percolation threshold corresponds to pc=1/(κ1)p_{c}=1/(\kappa-1) [72]. When the infection rate β>pc\beta>p_{c}, the outbreak’s time scale becomes τ<1\tau<1. This means that an epidemic with an infection rate larger than the percolation threshold can quickly spread far and wide. Moreover, a special case is SF networks with an exponent 2<λ<32<\lambda<3, for which any infected rates β>0\beta>0 will lead to a rapid spreading due to the vanished percolation threshold.

5.1.2 SIR model

The above discussion can be easily extended to the SIR model. To describe the transition from II state to RR state, an extra term μik(t)−\mu i_{k}(t) defining the rate at which infected individuals of degree kk recover and permanently removed must be added into Eq.(119) [484],

dik(t)dt\displaystyle\frac{di_{k}(t)}{dt} =βksk(t)Θk(t)μik(t),\displaystyle=\beta ks_{k}(t)\Theta_{k}(t)−\mu i_{k}(t), (126)
dsk(t)dt\displaystyle\frac{ds_{k}(t)}{dt} =βksk(t)Θk(t),\displaystyle=-\beta ks_{k}(t)\Theta_{k}(t), (127)
drk(t)dt\displaystyle\frac{dr_{k}(t)}{dt} =μik(t),\displaystyle=\mu i_{k}(t), (128)

where rk(t)r_{k}(t) is defined as the density of removed individuals of degree kk and sk(t)+ik(t)+rk(t)=1s_{k}(t)+i_{k}(t)+r_{k}(t)=1. With the initial conditions sk(0)1s_{k}(0)\simeq 1, ik(0)0i_{k}(0)\simeq 0, and rk(0)=0r_{k}(0)=0, the solution of Eq.(127) can be written as

sk(t)=eβkϕ(t).s_{k}(t)=e^{-\beta k\phi(t)}. (129)

Here, considering the uncorrelated networks, i.e., Θk(t)Θ(t)\Theta_{k}(t)\equiv\Theta(t), ϕ(t)\phi(t) reads

ϕ(t)\displaystyle\phi(t) =0tΘ(t)𝑑t\displaystyle=\int_{0}^{t}\Theta(t^{\prime})dt^{\prime}
=1kkpk(k1)0tik(t)𝑑t\displaystyle=\frac{1}{\langle k\rangle}\sum_{k}p_{k}(k-1)\int_{0}^{t}i_{k}(t^{\prime})dt^{\prime}
=1μkkpk(k1)rk(t).\displaystyle=\frac{1}{\mu\langle k\rangle}\sum_{k}p_{k}(k-1)r_{k}(t). (130)

In the last equality we have utilized the integral of Eq.(128).

Different from SI model, for tt\to\infty, the infections vanish and the individuals are either in state SS or in state RR, depending on the infection rate β\beta and the recovery rate μ\mu. We can use dϕ(t)/dt=0d\phi(t)/dt=0 to find the steady state, that is

0=dϕ(t)dt|t\displaystyle 0=\left.\frac{d\phi(t)}{dt}\right|_{t\to\infty} =1kkpk(k1)ik(t)\displaystyle=\frac{1}{\langle k\rangle}\sum_{k}p_{k}(k-1)i_{k}(t)
=1kkpk(k1)[1rk(t)sk(t)]\displaystyle=\frac{1}{\langle k\rangle}\sum_{k}p_{k}(k-1)\left[1-r_{k}(t)-s_{k}(t)\right]
=k1kμϕ(t)1kkpk(k1)sk(t).\displaystyle=\frac{\langle k\rangle-1}{\langle k\rangle}-\mu\phi(t)-\frac{1}{\langle k\rangle}\sum_{k}p_{k}(k-1)s_{k}(t). (131)

Together with Eq.(129), we find a self-consistent equation for ϕ()\phi(\infty),

ϕ()=k1μk1μkkpk(k1)eβkϕ().\phi(\infty)=\frac{\langle k\rangle-1}{\mu\langle k\rangle}-\frac{1}{\mu\langle k\rangle}\sum_{k}p_{k}(k-1)e^{-\beta k\phi(\infty)}. (132)

Solving this equation, we can find ϕ()\phi(\infty), and thus all other parameters for the steady state can be found.

It is easy to find that if

ddϕ()[k1μk1μkkpk(k1)eβkϕ()]|ϕ()=0<1,\frac{d}{d\phi(\infty)}\left.\left[\frac{\langle k\rangle-1}{\mu\langle k\rangle}-\frac{1}{\mu\langle k\rangle}\sum_{k}p_{k}(k-1)e^{-\beta k\phi(\infty)}\right]\right|_{\phi(\infty)=0}<1, (133)

Eq.(132) only has a solution ϕ()=0\phi(\infty)=0, indicating the epidemic cannot break out. This yields the epidemic threshold

δc=βcμc=1κ1,\delta_{c}=\frac{\beta_{c}}{\mu_{c}}=\frac{1}{\kappa-1}, (134)

above which the epidemic incidence attains a finite value. Note that 1/(κ1)1/(\kappa-1) is just the percolation threshold of the network. This suggests that there is a tight correspondence between the emergence of the giant cluster in the bond percolation and the outbreak of epidemic in the system.

Similar to SI model, Eq.(134) also suggests that a high level of connectivity heterogeneity (large κ\kappa) facilitates the spreading of epidemics. When κ\kappa diverges (SF networks with 2<λ<32<\lambda<3), an epidemic with any non-zero infection rate β\beta can break out in the system [475, 483, 485]. This is analogous to those concerning the network resilience under random damages, which can be studied by the standard percolation on networks.

The above discussion focuses the steady state of spreading process. The time evolution of SIR model can be solved by other approaches, such as the effective degree-based method [486] and the pair-based mean-field method [487, 488]. The extended degree-based method is used to derive an expression for the basic reproduction ratio R0R_{0} [486], and also provides an excellent agreement with numerical simulations for both the temporal evolution and the final outbreak size [489]. The threshold condition derived from the effective degree-based method turns out to be equal to the one obtained by percolation theory. The pair-based method was proven to be an exact deterministic description of the infection probability time course for each individual in the tree-like network [490]. For networks with loops, a precise closures can be also derived according to the detailed loop structure [491]. At the same time, the pair-based approach can be easily extended to models with heterogeneous infection and recovery rates [487].

The correspondence between the steady properties of SIR model and the bond percolation is also studied in Refs.[1, 87, 492, 493]. Considering the second version of SIR model with a uniform infection time τ\tau, i.e., infected nodes will recover after being infected for τ\tau time steps. For continuous-time dynamics, the transmissibility TT, defining the probability that the disease will be transmitted from an infected node to a susceptible neighbor before recovery takes place, can be computed as

T=1limΔt0(1βΔt)τ/Δt=1eτβ.T=1-\lim_{\Delta t\rightarrow 0}(1-\beta\Delta t)^{\tau/\Delta t}=1-e^{-\tau\beta}. (135)

With this relation, we can exactly map the steady state of an SIR model to a bond percolation with occupied probability TT. Then, from the critical transmissibility

Tc=1G1(1)=kk2k=1κ1,T_{c}=\frac{1}{G_{1}^{\prime}(1)}=\frac{\langle k\rangle}{\langle k^{2}\rangle-\langle k\rangle}=\frac{1}{\kappa-1}, (136)

one can find the critical infection rate for a given τ\tau, that is

βc=1τlnκ1κ2.\beta_{c}=\frac{1}{\tau}\ln\frac{\kappa-1}{\kappa-2}. (137)

For discrete time, particularly computer simulations, the transmissibility T=1(1β)τT=1-(1-\beta)^{\tau}, thus

βc=1(κ2κ1)1/τ.\beta_{c}=1-\left(\frac{\kappa-2}{\kappa-1}\right)^{1/\tau}. (138)

These results also show that different combinations of β\beta and τ\tau could result in the same spreading result, related to the percolation cluster under occupied probability TT [87].

5.1.3 SIS model

Based on the degree-based mean-field theory [494], the complete evolution equation for the SIS model on a network with arbitrary degree distribution can be written as

dik(t)dt=β[1ik(t)]kΘk(t)μik(t).\frac{di_{k}(t)}{dt}=\beta[1-i_{k}(t)]k\Theta_{k}(t)−\mu i_{k}(t). (139)

The creation term considers the density 1ik(t)1−i_{k}(t) of susceptible nodes with degree kk that can get the infection from a neighboring individual [495]. For uncorrected networks, Θ(t)=kkpkik(t)/k\Theta(t)=\sum_{k^{\prime}}k^{\prime}p_{k^{\prime}}i_{k^{\prime}}(t)/\langle k\rangle is used in Ref.[494], which is different from Eq.(120). This is because the infected nodes can recover to be a susceptible, and all the excess-degree of an infected node at the end of a link could lead to a node in state SS. However, we think that around or above the epidemic threshold, on average an infected individual is very unlikely to recover before the individuals infected by it transmit the disease to others. Thus, we still use Eq.(120) to represent Θk(t)\Theta_{k}(t) here. It must be pointed out that both the two choices are an approximation for Θk(t)\Theta_{k}(t), and do not affect the basic conclusion.

With the tt\rightarrow\infty limit, the system can reach the stationarity state, i.e., dik(t)/dt=0di_{k}(t)/dt=0. This yields

ik()=kβΘ()μ+kβΘ().i_{k}(\infty)=\frac{k\beta\Theta(\infty)}{\mu+k\beta\Theta(\infty)}. (140)

This equation shows that the larger the degree of a node, the higher its probability to be infected. Substituting Eq.(120) into Eq.(140), a self-consistent equation can be found

Θ()=1kkpk(k1)kβΘ()μ+kβΘ().\Theta(\infty)=\frac{1}{\langle k\rangle}\sum_{k}p_{k}(k-1)\frac{k\beta\Theta(\infty)}{\mu+k\beta\Theta(\infty)}. (141)

This equation always has a trivial solution Θ()=0\Theta(\infty)=0, and the nontrivial solution exists only for

ddΘ()[1kkpk(k1)kβΘ()μ+kβΘ()]|Θ()=01.\frac{d}{d\Theta(\infty)}\left.\left[\frac{1}{\langle k\rangle}\sum_{k}p_{k}(k-1)\frac{k\beta\Theta(\infty)}{\mu+k\beta\Theta(\infty)}\right]\right|_{\Theta(\infty)=0}\geq 1. (142)

Thus, the epidemic threshold reads

δc=βcμc=1κ1.\delta_{c}=\frac{\beta_{c}}{\mu_{c}}=\frac{1}{\kappa-1}. (143)

This result is the same as that found in SIR model, the qualitative analysis is not repeated here. Note that for SF networks with 2<λ<32<\lambda<3, neither assortativity nor disassortativity can reconstruct a non-vanished epidemic threshold for SIS model [496].

For the version that infected nodes remain in state II for τ\tau time steps, and then become SS again, the epidemic threshold of SIS model can be also derived by the branching process in percolation theory [474]. In SIS model, it has been found that there exist dynamic correlations coming from that if one node is infected, its neighbors are likely to be infected, which has been neglected by the degree-based mean-field theory. To account for this, effective degree approach [489, 497] and the combination of degree-based mean-field theory and effective degree approach [498] have been proposed to obtain accurate expressions for the average prevalence and the epidemic threshold.

5.1.4 Immunization

Not only the epidemic model itself but also the immunization strategy for suppressing the spreading is related to percolation process. The simplest immunization strategy is the random one, which chooses immune individuals randomly from a population. After immunization, the infected individuals cannot transmit disease to the immune individuals. Thus it is equivalent to a spreading process on a diluted network. If there is no giant cluster in the diluted network, the epidemic will not break out, regardless of infection rate. The corresponding immunization ratio is called immunization threshold. A detailed discussion for this problem is the same as that for the network robustness, and thus immunization can be treated as a percolation process.

The introduction of a fraction gg of immune individuals chosen randomly is equivalent to dilute nodes with probability 1g1-g. According to Eqs.(15) and (16), the immunization process results in κ1(κ1)(1g)\kappa-1\to(\kappa-1)(1-g) [499]. Thus, the epidemic threshold is rescaled as

δc=βcμc=11g1κ1.\delta_{c}=\frac{\beta_{c}}{\mu_{c}}=\frac{1}{1-g}\frac{1}{\kappa-1}. (144)

This generally covers the common sense that the more immune individuals, the larger the epidemic threshold. However, for SF networks with 2<λ<32<\lambda<3, κ\kappa is divergent, implying that random immunization cannot bring the epidemic into healthy region except for the case g=1g=1. From Eq.(144), we can also find that if 1g<1/(κ1)pc1-g<1/(\kappa-1)\equiv p_{c}, even if β=1\beta=1 Eq.(144) does not make sense. This indicates that when the fraction of unimmunized individuals is below the percolation threshold 1gc=pc1-g_{c}=p_{c}, the epidemic cannot break out [499]. This is mainly because below this threshold the susceptible individuals are segregated in smaller regions by immune individuals.

Obviously, whoever the immune individuals are, they can suppress the infection locally. However, an apparent inadequacy of random immunization strategies is that it gives the same importance to both high-connected nodes (with the largest infection potential) and low-connected nodes, and thus is unable to find the critical individuals to make them immune for the eradication of infection. Targeted immunization is referred as the immunization of most highly connected individuals (potentially the largest spreaders), i.e., a fraction gg of the individuals with the highest degrees is immunized [470]. The equivalent diluted result depends on the details of the targeted immunization, which is very similar to the intentional attack in the study of network robustness (see Sec.4.2).

In Ref.[499], the authors showed that the immunization threshold takes the form gce2μ/mβg_{c}\sim e^{-2\mu/m\beta} for SF networks with λ=3\lambda=3, where mm is the minimum degree. This clearly highlights that the targeted immunization program is extremely convenient, with an immunization threshold that is exponentially small in a wide range of infection rates. Based on the global knowledge of network, the nodes with the highest recalculated degree or betweenness centrality are selected for immunization during the selection procedure [500]. Schneider et al. sought to minimize a properly defined size for the connected cluster of susceptible individuals, which is found to be more effective than the methods based on immunizing the highest-betweenness links or nodes [501]. Similarly, seeking to fragment the network into connected clusters of approximately the same size, “equal graph partitioning” strategy can achieve the same degree of immunization as targeted immunization by fewer immunization doses [502].

An efficient targeted immunization requires global information about the network, which makes it to be impractical in many real situations. Acquaintance immunization can make immune a small fraction of random acquaintances of randomly selected nodes [503]. In this way, the highly connected nodes are more likely to be chosen and immunized, and the process prevents epidemics with a small finite immunization threshold and without requiring global knowledge of the network. In acquaintance immunization, a random fraction qq of nodes is chosen and then selects their random neighbors, i.e., acquaintances. The acquaintances, rather than the originally chosen nodes, are the ones immunized. The fraction qq may be larger than 11, for a node might be queried more than once, on average, while the fraction of nodes immunized gg is always less than or equal to 11. The critical threshold qcq_{c} for a complete immunization can be solved analytically based on a branching process on a network as that in percolation model.

If more local information is available, the acquaintance immunization strategy can be further improved in efficiency. For example, allowing initial selected node to have knowledge of the degrees of its nearest neighbors, immunizing the neighboring nodes with the largest degree is smart for efficiency [504]. Similarly, if the information within a distance dd of initial selected node is available, one can consider the immunization of the nodes with the highest degrees within that scope.

5.2 Cascading process on networks

It has been widely recognized that there are interdependencies or hidden interactions among the nodes in a networked system, which allows the viruses, information, opinion or failures to propagate from node to node via a certain topology. Although the specific scenarios might be different, most models present an iterative process: each individual in a population must decide a state between two or more alternatives and their decisions are made based on the options of their neighbors rather than relying on their own information about the problem, thus after an initial state setting the state flipping of individuals will occur recursively until no node meets the flipping criterion. This type of processes is often called cascading process.

With a given amount of excitation sources, the emergence of a giant cluster formed by individuals with identical states is just a dependent percolation process. Most of the models reviewed in Sec.4 belong to this category, i.e., the system evolves in generations. Here we mainly concentrate on another model widely used for the cascading process, called threshold model.

Threshold model was first proposed by Watts for the study of cascading failures in networks [194]. In this model, there are two states for nodes, namely, functional and failed, to be more generic, we use state 0 and state 11 instead. At each time step, the binary decision rule with externalities is outlined as follows: each node in state 0 observes the current states (either 0 or 11) of all its kk neighbors, and adopts state 11 if at least a threshold fraction β\beta of its kk neighbors are in state 11, else remains unchanged. No transition from state 11 back to state 0 is possible. With a small fraction of seed nodes (state 11) in a population which is initially set as state 0, the system evolves at successive time steps with all nodes updating their states according to the threshold rule above.

In a sufficiently large random network with sparse connections and only a few seeds, it can be assumed that the network is locally tree-like, hence, no node has more than one neighbor in state 11. Under this condition, only the nodes with degrees k1/βk\leq 1/\beta have the potential to flip their states to be 11, called vulnerable nodes. If the vulnerable nodes percolate in the system, the global cascades can be triggered by the initial seeds; otherwise, the propagation of cascades will be limited by the stable nodes and global cascade is impossible.

That is to say, this dynamic problem can be mapped to a static percolation problem, which can be solved by the generating function approach, see Sec.2.2. According to percolation threshold of random networks G1(1)=1G_{1}^{\prime}(1)=1 and denoting the probability that a node with degree kk is vulnerable as θk\theta_{k}, the largest vulnerable cluster percolates when

1kkθkpkk(k1)=1.\frac{1}{\langle k\rangle}\sum_{k}\theta_{k}p_{k}k(k-1)=1. (145)

With the above analysis, we know that θk=1\theta_{k}=1 for k1/βk\leq 1/\beta, otherwise θk=0\theta_{k}=0. For large thresholds β\beta, Eq.(145) has no solution, meaning that the vulnerable nodes cannot percolate and there is no global cascade in the system. For small thresholds β\beta, Eq.(145) has two solutions resulting in two phase transitions. Below the smaller critical point, the network is composed of some small clusters and the propagation of cascades is thus constrained by these segregated clusters. Above the larger critical point, the network becomes dense enough, so that the propagation of cascades is blocked by the local stability of nodes. In summary, the global cascades can only occur in the network with a structure lying between the two phase transition points. Moreover, the transition of the cascade upon the average degree is continuous at the smaller critical point and discontinuous at the larger critical point.

The above discussion is based on that the seeds are scarce. Gleeson and Cahalane further studied the effects of seed size on the cascading process [253]. Their method is based on a level-to-level update process, the final cascade size ρ\rho (rescaled by the system size) of active nodes (state 11) is given by

ρ=ρ0+(1ρ0)k=1pkm=0k(km)qm(1q)kmF(mk),\rho=\rho_{0}+(1-\rho_{0})\sum_{k=1}^{\infty}p_{k}\sum_{m=0}^{k}\binom{k}{m}q^{m}(1-q)^{k-m}F\left(\frac{m}{k}\right), (146)

where ρ0\rho_{0} is seed size, and F(m/k)F(m/k) gives the probability that the node with degree kk and mm active neighbors meets the excitation condition. In addition, qq is the probability that a link leads to an active node, which satisfies the recursion relation

q=ρ0+(1ρ0)k=1pkkkm=0k1(k1l)(ln)qm(1q)km1F(mk).q=\rho_{0}+(1-\rho_{0})\sum_{k=1}^{\infty}\frac{p_{k}k}{\langle k\rangle}\sum_{m=0}^{k-1}\binom{k-1}{l}\binom{l}{n}q^{m}(1-q)^{k-m-1}F\left(\frac{m}{k}\right). (147)

With these equations, the cascade size can be solved. The results show that the cascade transition at low k\langle k\rangle may in fact be discontinuous in certain parameter regimes, and the seed size ρ0\rho_{0} as low as 0.1%0.1\% has dramatic effects on the location of cascade transition points, that is

1kk=1pkk(k1)[F(1k)F(0)]>11ρ0.\frac{1}{\langle k\rangle}\sum_{k=1}p_{k}k(k-1)\left[F\left(\frac{1}{k}\right)-F(0)\right]>\frac{1}{1-\rho_{0}}. (148)

Obviously, this result reduces to Eq.(145) found in Ref.[194] by letting ρ00\rho_{0}\to 0, for which F(0)=0F(0)=0 and θk=F(1/k)\theta_{k}=F(1/k).

This analysis method is also valid for large seed ρ0\rho_{0}, which has been also extended to analyze networks with community structures and degree-degree correlations [256, 505]. For networks with clusterings [258], it turns out that for large and small values of k\langle k\rangle clustering reduces the size of cascades, while the converse occurs for intermediate values of the average degree. In Ref.[195], Liu et al. showed that using the seed size ρ0\rho_{0} as the control parameter, the crossover of the continuous and discontinuous phase transitions can be found.

Watts’s threshold model can be seen as a particular instance of a more generalized model of contagion [506], and has been used to study the effects of network properties that may influence the spreading process, such as boolean networks [507], SW networks [254], multiple layers [261], temporal networks [508] where the associated bursty activity of individuals may either facilitate [509] or hinder [510] the spreading process.

5.3 Traffic and transportation

As travel demand increases in modern cities, traffic congestion becomes more frequent. As a result, the study on the dynamical transition from free flow to congestion in traffic flow has attracted much attention. To understand the traffic transition in a network scale (representing a city or an urban region), the percolation model has also been applied to the urban traffic network.

With percolation theory, the typical question of how the traffic in the network collapses from a global efficient state to isolated local flows in small clusters is clearly demonstrated, which might be useful for improving transportation efficiency, reducing traffic delay, and facilitating emergency evacuation. Percolation theory also provides useful tools for urban planning. Traffic network connectivity under disasters, such as earthquake or rain, has also been investigated by percolation theory.

5.3.1 Percolation in urban traffic networks

Li et al. considered the percolation process in Beijing road traffic network, and identified bottleneck roads that connect different functional clusters [511]. The velocity data of 55-min segment records measured in a central region of Beijing is used to construct the road network, where nodes represent the intersections and links represent the road segments between two intersections. For each road, the velocity varies with time during a day. A dynamic functional traffic network can be constructed by the links with velocity higher than a given threshold qq. With a small threshold qq, a giant cluster of functional network can extend to almost the full scale of the road network. As qq increases, the size of the giant component decreases, and the second-largest cluster becomes maximal at a threshold qcq_{c}, which signifies the percolation transition from the connected phase to the fragmented phase of the functional traffic network. The critical value qcq_{c} can be seen as an indicator of the efficiency of global traffic. Obviously, an efficient traffic system will have a higher qcq_{c}.

In a static network, bottleneck links are identified usually based on structural connectivity information. Traffic, as a dynamical non-equilibrium system, evolves with time and shows varying qcq_{c} during the day. The bottleneck links of global traffic will also evolve accordingly and appear in different locations in different hours. By investigating the functional cluster at percolation criticality, Li et al. identified some bottleneck links that play a critical role in bridging different functional clusters of traffic [511]. With the congestion of these links, the giant cluster will disintegrate and result in some small clusters. Conversely, improvement of these critical bottleneck roads can significantly increase the threshold qcq_{c} of the traffic network, thus benefit the global traffic. This kind of improvement of qcq_{c} is higher than that by increasing the velocity of a random link or the link with the highest path-based betweenness. This proves the uniqueness of the bottleneck links found by percolation theory.

Zeng et al. further identified two modes of percolation behaviors in Beijing and Shenzhen traffic network [512]. To investigate the differences of the dynamic traffic network in rush hours and nonrush hours, they performed a percolation analysis by calculating the size of the functional clusters. At criticality, a well-defined Fisher exponent τ\tau as that in percolation theory can be observed. With this, Zeng et al. found that the disintegration transition of urban traffic can be characterized by two sets of exponents. During rush hours on workdays, the critical exponent τ\tau is in general smaller than that during nonrush hours or days off. During workday rush hours, τ\tau is close to the theoretical result of the two dimensional lattice (τ=187/912.055\tau=187/91\approx 2.055), while during other periods, it is close to the mean-filed value for high dimensional systems (τ=5/2\tau=5/2).

This phenomenon can be understood as follows. The urban traffic network can be generally seen as a system embedded in two dimensions with some long-range connections corresponding to urban highways. During nonrush hours, all the roads are in the free flow state, the system is thus a two dimensional lattice with long-range connections. In contrast, during rush hours, the highways become congested and the urban traffic network reduces to a two dimensional system. With the aid of traffic management methods, the traffic system can be tuned to the desired class by adjusting the amount of effective long-range connections. Using the corresponding signal control or road pricing on urban highways, this mechanism can help to create a significantly better traffic system.

The above investigations are based on the analysis of the empirical traffic data. Obviously, the percolation properties vary with traffic parameters, such as the level of traffic load and the detailed behaviors of drivers. Such analysis can also be performed by modeling simulations. Wang et al. studied the traffic percolation properties on a traffic system based on an agent-based model [513]. Using the traffic model in Ref.[514], the level of traffic demand and routing preference can be tuned to verify their impacts on the percolation property. It has been widely proved that the traffic will undergo a phase transition from the free state to the congested state with the increase of traffic load in the network. In Ref.[513], the functional network is constructed by the free state nodes with load lower than a threshold. The emergence of the giant cluster indicates the percolation transition and the formation of global traffic flow. It is also found that the threshold of the traffic percolation qcq_{c} shows a sharp minimum value at the traffic transition threshold, indicating that the global flow can be maintained with minimal number of local flows. The value of qcq_{c} is also influenced by the routing choice. In the giant percolation cluster, optimal paths will accumulate to a high level. This can provide a possible routing choice to mitigate traffic congestion.

In the modern urban traffic system, different transportation modes have been widely constructed, including bus, metro, and road networks, which can be represented as a multiplex network. In such a system, a fraction of node failures in one layer can trigger a cascade of failures that propagate in the multiplex network, see Sec.3.2. Baggag et al. studied the resilience and robustness of multi-modal transportation networks using percolation theory [515]. In this study, percolation is used to investigate the impact of link failures in the multiplex transportation network. By calculating the size of the mutually connected giant cluster under random or targeted failures, the robustness of multiplex transportation networks can be evaluated. The four cities studied in Ref.[515] behave similarly in terms of coverage degradation under random failure, with Paris network being the most robust.

5.3.2 Percolation in connected vehicle networks

Connected Vehicles (CV) technology is the leading edge of intelligent traffic systems [516], and has great potential to mitigate traffic congestion through the creation of a CV network [517]. Once the vehicle cloud network is formed, travel information such as speed, density, and signal timing can be retrieved from the network. This information would help the driver’s decision-making and eventually enhance the mobility of CV in terms of travel time. In an urban traffic system, the benefits of the corresponding applications will depend on the CVs’ communication range and the market penetration, which can be featured by the percolation model.

The study of CV network was pioneered by the analysis of network connectivity for wireless sensor networks and ad-hoc networks. Ammari and Das studied the sensing-converge and network-connectivity in wireless sensor networks using percolation theory [518]. A probabilistic approach is created to compute the covered area fraction at percolation threshold. Under various scenarios, the critical percolation density and radius of the covered components are identified. Similarly, Khanjary et al. conducted percolation studies in the two-dimensional fixed-orientation directional sensor network [519]. The critical density of the nodes can be analytically computed from the percolation theory framework.

Jin et al. first assessed the connectivity of Vehicular Ad-hoc NETwork (VANET) through a percolation framework [520]. The relationship among network connectivity, vehicle density and communication range was analyzed. When vehicle density or communication range is big enough, VANET will show a jump of network connectivity, corresponding to the percolation transition. Given vehicle density, the minimum communication range can be calculated to achieve good network connectivity. As a large transmission range can cause serious collisions in wireless links, the percolation analysis can help to determine the tradeoff and guide the deployment of VANETs in real world.

Talebpour et al. investigated the effect of CV information availability on the stability of traffic flow through continuum percolation theory [521]. The impact of CV density and communication range on the connectivity of the network is determined by percolation properties. The results show that as the communication range increases, the traffic flow system becomes more stable.

Mostafizi et al. investigated the impacts of CV network on the mobility of traffic systems at varying levels of market penetration and communication range [522]. When more CVs are present, the network tends to be more connected. When the percentage of CV decreases, there exists a percolation threshold where the giant cluster breaks apart. It has been found that the percolation transition is a function of both market penetration and communication range. The emergence of mobility benefits is highly related to the percolation transition in CV network. Only when the system reaches the percolation threshold, the benefits of reducing mean travel time begin to appear. Below the threshold, although small clusters can form in the network, they will not create a significant impact on the mean travel time due to the scarce information available from the CV network.

5.3.3 Percolation in urban traffic planning

The analysis of the inter-relations between traffic flows and urban street networks can provide a meaningful tool for urban planning processes. From this perspective, Serok et al. identified functional spatio-temporal street clusters of London and Tel Aviv using network percolation analysis [523]. The dynamics of these clusters and their spatial stability over time are analyzed, which can help to develop new, adaptive, decision-making tools for urban and transportation planners. Traditionally, urban planning is a long-term and static process. The application of traffic percolation will enable planners to keep their role in the flexible and dynamic urban system.

Behnisch et al. studied the settlement connectivity in Germany using the percolation concept [524]. The percolation is introduced to quantify the connectivity of buildings. They found that at a critical distance of 830±10830\pm 10 m the buildings settlement network in Germany will undergo a transition from isolated clusters to a country-spanning building cluster. This rather short critical distance indicates that the landscape is already overbuilt in Germany. For urban planning, limiting urban sprawl and further land consumption is crucial. The application of percolation can provide a useful monitoring tool for settlement and landscape degradation.

Transport accessibility is an important issue for industrial relocation and the related sustainable development of economics. Jiang et al. studied the impact of transportation accessibility on industry relocation pattern of China’s Yangtze River Economic Belt using percolation theory [525]. In the network, the nodes are cities, and the links are defined according to the Yangtze River’s waterways and the highways in the Yangtze River Economic Belt. Their results proved the existence of percolation transitions during the process of industry relocation, and identified the bottlenecks for industry relocation as cities located in border regions.

Piovani et al. studied the relation of urban retail location distribution and road network properties in London from percolation theory. Road network and retail location are two interwoven factors in an urban ecosystem. The emergence of clustering in retail centers depends greatly on road network [526]. Piovani et al. compared the road network’s hierarchical percolation structure with the retail location distribution. To evaluate the road network, the subgraph is constructed by the road segments with length below a threshold, and the transition is defined at the threshold that generates the maximal entropy configuration. The results show that clusters of the road network and retail location are very similar in spatial characteristics.

5.3.4 Percolation in post-disaster traffic networks

Post-disaster traffic networks are very important since they offer access to affected areas, sustain evacuation and medical operations after a disaster. If a disaster is very severe and many roads are malfunctioned, the traffic network could collapse into isolated subnetworks, and there will be no route connecting some locations. Therefore, the connectivity of post-disaster traffic networks also provide an important index for the assessment of vulnerability, robustness, and resilience of the road system.

Zhou et al. studied the connectivity of post-earthquake road networks using percolation theory [527]. To assess the road network, they proposed the concepts of “global connectivity” and “local connectivity”. Global connectivity measures the extent to which the whole network is connected, and local connectivity measures the distances between each node to its neighbors. Specifically, the global connectivity is quantified by the percolation threshold, the size of the giant subnetwork, and average sizes of small subnetworks, while local connectivity is quantified by the number of neighbors. In particular, Zhou et al. proposed an integrated percolation model combining localized attacks and random failures to capture the feature of earthquake-impacted [527]. Namely, the impact of the earthquake on the inner network is modeled as a localized attack (where node disruption is dominant), while it is modeled as random link disruption in the outer network. The model can be used to assess the connectivity of road networks impacted by different magnitudes of earthquakes.

Extreme weather, such as torrential rain and hurricane, could greatly influence urban traffic system. Although the traffic flows under extreme weather have been widely studied, little work has been done to understand whether and how the destruction of local roads can degrade the global traffic. Guo et al. studied the impact of torrential rain on traffic percolation properties in Beijing and Wuhan [528]. In Beijing, whose street layout is more symmetrical, the dysfunctional roads are scattered uniformly. However, for Wuhan with a river valley, the dysfunctional roads appear in a cluster way and partially along the riverbank. Interestingly, the percolation threshold of traffic network is stable against weather perturbation, even if some roads are significantly influenced. This is mainly because extreme weather (torrential rain) will not only damage road conditions in the supply end, but also reduce the traffic demand correspondingly.

5.4 Evolutionary game

Evolutionary game theory offers a powerful mathematical framework to study the emergence of cooperation among selfish individuals. When the contact network of individuals is taken into account, i.e., the spacial evolutionary game, the focus of the research just translates to the emergence of the clusters formed by adjacent cooperators. Therefore, the percolation theory is also employed to explore the spacial evolutionary game.

In Ref.[529], Yang et al. studied the cooperation percolation in spatial prisoner’s dilemma game. Their model is defined on a two-dimensional square lattice. Initially, with probability ff a node is occupied by a cooperator and with probability 1f1−f a node is occupied by a defector. At each time step, each individual plays the prisoner’s dilemma game with its four nearest neighbors. Specifically, if both players cooperate, then both get the payoff 11. If one cooperates while the other defects, the defector gets bb while the cooperator gets nothing. If both defect, no gains for both of them. After the games, individuals update their strategies asynchronously in a random sequential order. A randomly selected player compares its payoff with its nearest neighbors and changes strategy by following the one (including itself) with the highest payoff.

They found that the percolation transition can be well-defined by the control parameter ff, the phase diagram can be divided into several regions based on whether the cooperator cluster can percolate. With the analysis of finite size scaling, they claimed that the region 1<b<4/31<b<4/3 belongs to the universality class of the classical percolation in two dimensions, whereas region 3/2<b<23/2<b<2 is subject to that of invasion percolation with trapping. However, an exhaustive simulation revealed that all these regions belong to the universality class of the classical percolation in two dimensions [530]. In Ref.[530], the authors also studied the model on the triangular lattice and the hexagonal lattice, and concluded that the observed percolation transitions on various two-dimensional lattices belonged to the universality class of the standard percolations.

In Ref.[531] the percolation transition was also observed in the Stag Hunt game and the Snowdrift game. More interestingly, they found a double transition in ER networks. With the increase of ff, the giant cooperator cluster increases continuously from zero above a critical point, then at a second critical point the giant cooperator cluster jumps from a low value to a very high value. These rich phenomena highlight the role of the social contact in the emergence of the cooperation.

In Ref.[532] Lin et al. also found a scaling feature in the evolutionary game on the interaction graph extracted from percolation process. The power-law distribution of cooperator clusters like that in percolation model was also observed in fractal hierarchical networks [533]. Besides, an interaction graph at the percolation threshold has also been proved to be an optimal population density for public cooperation [534, 535]. Peng and Li suggested that this is due to the fractal structure formed at the percolation threshold [536], which constructs an asymmetric barrier that the defection strategy is almost impossible to cross, but the cooperation strategy has a not too small chance. In the system with mobile individuals, the geometric percolation of the contact network (i.e., irrespective of the strategy) also enhances cooperation [537].

6 Discussion and outlook

Percolation describes the patterns of connections under a random or semi-random connection mechanism, while network science focuses on the non-trivial topological features inspired largely by the empirical study of real-world networks. In this sense, although the proposal of percolation model predates the rise of the research of complex networks by about fifty years, it inherently provides a framework for the studies of complex networks. So it is no surprise that the application of percolation theory in network science has achieved fruitful results.

First of all, the study of percolation process on a non-trivial topology, including strong heterogeneity, high clustering, assortativity or disassortativity, modular, hierarchical structures and so on, has brought new power to percolation theory, especially for the fractal and non-physical dimension systems. For examples, the heavy-tailed degree distribution of the SF network induces a special mean-field nature of percolation transition (see Sec.2.5.2), and the percolation in the growing network demonstrates an infinite order transition like that found in the Berezinskii-Kosterlitz-Thouless phase transition in condensed matter (see Sec.3.4). In recent years, we have also witnessed a wide range of discussion on the explosive percolation and the hybrid percolation transition on network systems (see Sec.3). Besides, quite a number of pruning rules used in the identification of special network structures are also defined in terms of percolation process, such as kk-core percolation (see Sec.3.1), clique percolation (see Sec.4.3), and core percolation (see Sec.3.1). These percolation models also attracted a lot of studies on the system with physical dimensions, and enriched the research of the percolation theory [20, 21].

In addition to providing theoretical supports for the findings from empirical researches, the ideas and conceptions of percolation theory furnish a quantitative analysis approach to the methodology of network science. A typical representative is the giant cluster, which has already been widely used to evaluate the state of a network. For the study of network resilience, if a giant cluster exists, the network is considered functional, otherwise the network is paralyzed. In this way, the percolation threshold can be used to quantify the robustness of networks (see Sec.4.2). Similarly, if the infections form a giant cluster, it can be concluded that the epidemic breaks out, so that an epidemic with a smaller percolation threshold is more infectious (see Sec.5). Even for the problems that seemed to be irrelative to percolation process, the percolation analysis can offer a good interpretation for their findings, such as the evolution of traffic state of an urban network, and the emergence of the cooperation clusters (see Sec.5). Although almost all of these studies are not originally motivated by the percolation process and the evolution rules might be far away from the standard percolation, the emergence of the giant cluster is employed as a picture of the inherent problem. Because of this, some of the modeling of network dynamics just take the form of percolation process, such as the cascading failures, and the spreading of information/opinion/epidemics (see Sec.5).

The classical percolation in physical dimensions often refers to regular systems, such that the percolation results are totally random, and the focus is more on the statistic characteristic of the clusters. However, the important characteristic of empirical networks, i.e., strong heterogeneity, can bring determinacy into the percolation configuration. For example, the hubs of a SF network almost always belong to the giant cluster of different realizations. Once the hubs are destroyed, the network will be disintegrated. From this perspective, the percolation results in part reflect the connective features of networks, and thus can be used as an algorithm for identifying the special subsets of networks. Some typical examples have been reviewed in Sec.4, including kk-core decomposition, community detection, vital node identification, and so on.

In summary, the percolation theory has already percolated into many research aspects of network science, ranging from structure analysis to dynamics modeling. Loosely speaking, one would hardly evade conceptions, analytic methods, and algorithms of percolation in the studies of complex networks. For this reason, almost all the review articles on the network science spill some ink on the corresponding percolation problems, such as network structures and dynamics [5, 3, 6, 538], multiplex networks [8, 445, 317, 457, 446], spatial networks [539], epidemic process [23, 473], explosive transition [206], community detection [458, 540], kk-core [214], and vital nodes identification [245]. Besides, the significant findings of network percolation are also included in the articles on recent advances of percolation transition [21, 20].

By tracing the progress of percolation on networks and its applications, in this paper we briefly review the percolation on complex networks from models, theoretical methods, to applications for network problems. Overall, the studies of percolation on networks yield a rich harvest from theory to applications. Meanwhile, many challenges still remain. First, real networks have a rich mixture of properties, such as degree correlation, clustering, modularity, heterogeneity, small world, and spatial constraint. For these complex structures, only some approximate theory can be applied to simplify the structural models, while whether these structures can separately or collectively change the nature of the percolation transition or not remains an open question. Moreover, the mixture pattern often refers to the high-order structure of networks. This is a new research hotspot of network science in recent years, for which the percolation theory also plays an important role [541, 542, 543, 544, 545, 546, 10].

Furthermore, as a random process, the percolation result of a given network can differ from realization to realization, while the network identification often requires a deterministic result. How to solve this conflict when employing percolation process to analyze network structure? Are there some ways to take advantage of the fluctuation of percolation results to give more information on network structures? Moreover, it is known that the percolation clusters in the subcritical phase are dominated by local connections, and the global connections for the supercritical phase. In this way, is there a unified algorithm based on percolation process for identifying network structures in different scales? To answer these questions, more researches should also be done to explore the percolation process on networks.

Acknowledgments

We would like to thank Pan Zhang, Manuel S. Mariani, and Xu Na for useful discussions and comments. The research was supported by the National Natural Science Foundation of China under Grant Nos.11622538, 61673150, 61773148, and 12072340. L.L. also acknowledges Zhejiang Provincial Natural Science Foundation of China under Grant No.LR16A050001, and the Science Strength Promotion Programme of UESTC.

References