
Learnability of Competitive Threshold Models

Anonymous submission
Abstract

Modeling the spread of social contagions is central to various applications in social computing. In this paper, we study the learnability of the competitive threshold model from a theoretical perspective. We demonstrate how competitive threshold models can be seamlessly simulated by artificial neural networks with finite VC dimensions, which enables analytical sample complexity and generalization bounds. Based on the proposed hypothesis space, we design efficient algorithms under the empirical risk minimization scheme. The theoretical insights are finally translated into practical and explainable modeling methods, whose effectiveness is verified through a sanity check over several synthetic and real datasets. The experimental results show that our method achieves strong performance without requiring excessive data, outperforming off-the-shelf methods.

1 Introduction

Social contagion phenomena, such as the propagation of news and behavior patterns or the adoption of opinions and new technologies, have attracted considerable research interest from fields such as sociology, psychology, epidemiology, and network science cozzo2013contact; Goldstone2005ComputationalMO; camacho2020four. Formal methods for modeling social contagion are crucial to many applications, for example, recommendation and advertising in viral marketing qiu2018deepinf and misinformation detection tong2020stratlearner. Given the operational diffusion models, a fundamental problem is to what extent we can learn such models from data, i.e., the learnability of the models. In this paper, we study the classic threshold models, in which the core hypothesis is that the total influence from active friends determines contagion relative to a threshold. We seek to derive their PAC learnability and sample complexity, by which efficient learning algorithms become possible.

Motivation.

Learning contagion models from data has been a central topic in social computing. Some existing works focus on estimating the parameters of diffusion models via local estimation techniques in order to predict the influence results goyal2010learning; du2012learning; de2014learning. Most of these works assume fully observed cascade data. In practice, datasets with local diffusion information are not always available, so we also consider datasets with incomplete observations. Moreover, our learnability results make no assumptions on the data type. Another line of work develops various graph neural network models that estimate the influence of each node using deep learning techniques li2017deepcas; qiu2018deepinf; leung2019personalized. Although these methods have shown significant improvements in influence estimation performance, they lack theoretical guarantees. In our work, we aim to establish the analytical sample complexity of influence learning.

Recently, some works have focused on learning, from training data, the influence function of different diffusion models, which maps a seed set to the diffusion result. The works of du2012learning; he2016learning learn an approximate influence function by parameterizing a group of random reachability functions. Our approach instead learns the influence function directly from a hypothesis space without such intermediate estimation. The most closely related works propose proper PAC learning algorithms for the influence function narasimhan2015learnability; adiga2019pac, but they assume that only one cascade diffuses in the network. However, it is common for multiple competitive cascades to disseminate simultaneously, for example, in network marketing campaigns qiu2018deepinf. From this standpoint, we take a step towards the influence estimation problem without limiting the number of cascades.

Contribution.

Our contribution can be summarized as follows.

  • An influence function class under the competitive threshold model with finite VC dimension: we simulate the diffusion process as information propagation through a neural network composed entirely of piecewise polynomial units.

  • Proper PAC learning algorithms: we propose learning algorithms for two data types following the empirical risk minimization strategy. In particular, we develop a polynomial-time approach via linear programming under full observation.

  • Superior performance: experimental results on different datasets demonstrate the strong prediction performance of our proposed approaches.

2 Preliminaries

We first describe the diffusion model and then present the learning settings.

2.1 Competitive linear threshold model

We follow the standard competitive linear threshold (CLT) model ?, which has been widely adopted in the formal study of social contagions ?. A social network is represented as a directed graph $G=(V,E)$, where $V=\{v_{1},...,v_{N}\}$ is the set of nodes and $E$ is the set of edges, with $N=|V|$ and $M=|E|$ being the numbers of nodes and edges, respectively. For each node $v\in V$, we denote by $N(v)\subseteq V$ the set of its in-neighbors.

We consider the scenario where there are $S\in\mathbb{Z}^{+}$ information cascades, each of which starts to spread from its seed set $\psi_{0}^{s}\subseteq V$ for $s\in[S]$. Associated with each node $v_{i}$ and each cascade $s$, there is a threshold $\theta_{i}^{s}\in[0,1]$; associated with each edge $(v_{i},v_{j})\in E$ and each cascade $s$, there is a weight $w_{i,j}^{s}\in[0,1]$. Without loss of generality, we assume that the seed sets are disjoint and the weights are normalized such that $\sum_{v_{j}\in N(v_{i})}w_{(j,i)}^{s}\leq 1$ for each node $v_{i}\in V$ and cascade $s$. The nodes are initially inactive and can be $s$-active if activated by cascade $s$. In particular, the diffusion process unfolds step by step as follows:

  • Step $\mathbf{0}$: For each cascade $s\in[S]$, nodes in $\psi_{0}^{s}$ become $s$-active.

  • Step $\mathbf{t>0}$: Let $\psi_{t-1}^{s}\subseteq V$ be the set of $s$-active nodes after step $t-1$. There are two phases in step $t$.

    • Phase 1: For an inactive node $v_{i}\in V$, let $\zeta^{s}_{i}$ be the summation of the weights from its $s$-active in-neighbors:

      $\displaystyle\zeta^{s}_{i}=\sum_{v_{j}\in N(v_{i})\cap\psi^{s}_{t-1}}w_{(j,i)}^{s}.$ (1)

      The node $v_{i}$ is activated by cascade $s$ if and only if $\zeta_{i}^{s}\geq\theta_{i}^{s}$. After phase 1, it is possible that a node is activated by more than one cascade.

    • Phase 2: For each node $v_{i}$, it is finally activated by the cascade $s^{*}$ with the largest weight summation:

      $\displaystyle s^{*}=\operatorname*{arg\,max}_{s:\,\zeta_{i}^{s}\geq\theta_{i}^{s}}\zeta_{i}^{s}.$ (2)

      For tie-breaking, cascades with smaller indices are preferred.

Figure 1: Diffusion process example. In this example, there are two cascades with seed sets $\{v_{2}\}$ and $\{v_{1}\}$. The weights and thresholds are given as the graph shows. According to the CLT model, $v_{3}$ becomes $1$-active after one time step, and $v_{4}$ becomes $1$-active at the end of the diffusion process.
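To make the two-phase update concrete, the following is a minimal Python sketch of the CLT dynamics described above. The dense arrays W and theta and the helper names are illustrative choices rather than part of the model definition; the toy example of Figure 1 can be reproduced by filling them with the corresponding weights and thresholds.

```python
import numpy as np

def clt_step(active, W, theta):
    """One CLT diffusion step.

    active: (N, S) binary matrix, active[i, s] = 1 iff v_i is s-active.
    W:      (S, N, N) weights, W[s, j, i] = w_{(j,i)}^s for edge (v_j, v_i).
    theta:  (N, S) thresholds.
    Returns the updated (N, S) status after phases 1 and 2.
    """
    N, S = active.shape
    new = active.copy()
    for i in range(N):
        if active[i].any():          # already activated nodes keep their cascade
            continue
        # Phase 1: influence summation from s-active in-neighbors, per cascade.
        zeta = np.array([W[s, :, i] @ active[:, s] for s in range(S)])
        winners = np.where(zeta >= theta[i])[0]
        if winners.size:
            # Phase 2: the cascade with the largest summation wins;
            # np.argmax breaks ties in favor of the smaller index.
            s_star = winners[np.argmax(zeta[winners])]
            new[i, s_star] = 1
    return new

def clt_diffusion(seeds, W, theta):
    """Run the diffusion until no status changes (at most N steps)."""
    status = seeds.copy()
    for _ in range(seeds.shape[0]):
        nxt = clt_step(status, W, theta)
        if np.array_equal(nxt, status):
            break
        status = nxt
    return status
```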

Clearly, there are at most $N$ diffusion steps, and without loss of generality, we may assume that there are always $N$ diffusion steps. An illustration is given in Figure 1. The following notations are useful for formal discussions.

Definition 1.

We use a binary matrix $\mathbf{I}_{0}\in\{0,1\}^{N\times S}$ to denote the initial status, where $\mathbf{I}_{0,i,s}=1$ if and only if node $v_{i}$ is in the seed set $\psi_{0}^{s}$ of cascade $s$. For a time step $t>0$, the diffusion status is denoted by a binary tensor $\mathbf{I}_{t}\in\{0,1\}^{N\times S\times 2}$, where, for a cascade $s$, $\mathbf{I}_{t,i,s,p}=1$ if and only if node $v_{i}$ is $s$-active after phase $p\in\{1,2\}$ in time step $t$.

We are interested in the learnability of the influence function of CLT models.

Definition 2.

Given a CLT model, the influence function $F:\{0,1\}^{N\times S}\mapsto\{0,1\}^{N\times S}$ maps the initial status $\mathbf{I}_{0}$ to the final status $\mathbf{I}_{N,:,:,2}$.

2.2 Learning settings

Assuming that the social graph $G$ is unknown to us, we seek to learn the influence function using a collection $D$ of $K\in\mathbb{Z}^{+}$ pairs of initial statuses and the resulting statuses:

$\displaystyle D=\Big\{(\mathbf{I}_{0}^{1},\mathbf{I}_{1:N,:,:,1:2}^{1}),...,(\mathbf{I}_{0}^{K},\mathbf{I}_{1:N,:,:,1:2}^{K})\Big\}$ (3)

where $\mathbf{I}_{1:N,:,:,1:2}^{k}$ denotes the diffusion status after each phase in each diffusion step:

$\mathbf{I}_{1:N,:,:,1:2}^{k}=\Big[\mathbf{I}_{1,:,:,1}^{k},\mathbf{I}_{1,:,:,2}^{k},\mathbf{I}_{2,:,:,1}^{k},\mathbf{I}_{2,:,:,2}^{k},...,\mathbf{I}_{N,:,:,1}^{k},\mathbf{I}_{N,:,:,2}^{k}\Big].$

We use the zero-one loss $\ell_{0-1}$ to measure the difference between the prediction $\hat{\mathbf{I}}_{N,:,:,2}$ and the ground truth $\mathbf{I}_{N,:,:,2}$, which is defined as

$\displaystyle\ell_{0-1}(\mathbf{I}_{N,:,:,2},\hat{\mathbf{I}}_{N,:,:,2})\coloneqq\frac{1}{2NS}\sum_{i\in[N]}\sum_{s\in[S]}\mathds{1}\big[\mathbf{I}_{N,i,s,2}\neq\hat{\mathbf{I}}_{N,i,s,2}\big]$

For a distribution $\mathcal{D}$ over the input $\mathbf{I}_{0}$ and a CLT model generating the training data $D$, we wish to leverage $D$ to learn a mapping $\hat{F}:\{0,1\}^{N\times S}\mapsto\{0,1\}^{N\times S}$ such that the generalization loss can be minimized:

$L(\hat{F})\coloneqq\mathbb{E}_{\mathbf{I}_{0}\sim\mathcal{D}}\Big[\ell_{0-1}\big(F(\mathbf{I}_{0}),\hat{F}(\mathbf{I}_{0})\big)\Big].$ (4)
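For reference, a small sketch of the loss defined above; the function names are ours, and the samples are assumed to be (initial status, final status) pairs as in Eq. (3).

```python
import numpy as np

def zero_one_loss(I_true, I_pred):
    """Zero-one loss between two (N, S) binary final-status matrices,
    following the 1/(2NS)-normalized definition above."""
    N, S = I_true.shape
    return np.sum(I_true != I_pred) / (2.0 * N * S)

def empirical_loss(F_hat, samples):
    """Average zero-one loss of a learned map F_hat over (I0, I_final) pairs."""
    return np.mean([zero_one_loss(I_final, F_hat(I0)) for I0, I_final in samples])
```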

3 Learnability analysis

In this section, we present a PAC learnability analysis for the influence function. The overall idea is to design a realizable hypothesis space with a finite VC dimension, by which PAC learnability can be established. The following assumptions are made throughout the analysis.

Assumption 1.

We assume that the diffusion model is a decimal system of finite precision: for the involved parameters (i.e., $w_{i,j}^{s}$ and $\theta_{i}^{s}$), the number of decimal places is upper bounded by a constant $Q\in\mathbb{Z}^{+}$.

Assumption 2.

The cascade number $S=2^{Z}$ is a power of $2$ for some constant $Z\in\mathbb{Z}^{+}$.

3.1 Realizable hypothesis space

Figure 2: One-step simulation. The left part depicts the scenario where the node $v_{3}$ is activated by three cascades, among which cascade $a$ finally wins the competition. The right part demonstrates how to simulate such a scenario through our design; the computation path for node $v_{3}$ is highlighted.

In order to obtain a realizable hypothesis space, we seek to explicitly simulate the diffusion function through neural networks. While the universal approximation theory ensures that such neural networks exist, we wish for explicit designs of polynomial size so as to admit efficient learning algorithms. To this end, we first demonstrate how to simulate one-step diffusion under the CLT model, which can be used repeatedly to simulate an arbitrary number of diffusion steps.

For the one-step diffusion, the simulation is done through two groups of layers that are used in turn to simulate phase 1 and phase 2. As illustrated in Figure 2, the simulation starts by receiving the diffusion status $\mathbf{H}^{1}\in\{0,1\}^{N\cdot S}$ after the last step, where for $i\in[N]$ and $s\in[S]$, $\mathbf{H}^{1}_{(i-1)\cdot S+s}=1$ if and only if node $v_{i}$ is $s$-active. After $2Z+4$ layers of transformations, it ends by outputting a vector $\mathbf{H}^{2Z+5}\in\{0,1\}^{N\cdot S}$, which corresponds exactly to the diffusion status after one diffusion step under the CLT model. The layer transformations all follow the canonical form:

$\mathbf{H}^{l+1}=\sigma^{l}(\mathbf{W}^{l}\mathbf{H}^{l})$ (5)

with $\mathbf{W}^{l}$ being the weight matrix and $\sigma^{l}$ being a collection of activation functions over the elements of $\mathbf{W}^{l}\mathbf{H}^{l}$. In the rest of this subsection, we will present our designs of the weight matrices and the activation functions that can fulfill the goal of simulation.
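A minimal sketch of the canonical transformation in Eq. (5), assuming the per-coordinate activations are supplied as Python callables; the threshold_unit helper is only an illustration in the spirit of the threshold-keeping unit defined in Eq. (7) below.

```python
import numpy as np

def apply_layer(H, W, sigmas):
    """One canonical transformation H^{l+1} = sigma^l(W^l H^l):
    a linear map followed by per-coordinate activation functions."""
    Z = W @ H
    return np.array([sigmas[i](z) for i, z in enumerate(Z)])

# Keep the incoming summation if it reaches a (hypothetical) threshold theta,
# and zero it out otherwise.
def threshold_unit(theta):
    return lambda x: x if x >= theta else 0.0
```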

3.1.1 Phase 1

The single-cascade activation can be simulated by

$\displaystyle\mathbf{H}^{2}=\sigma^{1}(\mathbf{W}^{1}\mathbf{H}^{1}),$ (6)

where for each $i,j\in[N]$ and $s_{1},s_{2}\in[S]$, we have

$\mathbf{W}^{1}_{(i-1)\cdot S+s_{1},(j-1)\cdot S+s_{2}}\coloneqq\begin{cases}w_{i,j}^{s_{1}}&\text{if $(v_{i},v_{j})\in E$ and $s_{1}=s_{2}$}\\1&\text{if $i=j$ and $s_{1}=s_{2}$}\\0&\text{otherwise}\end{cases},$

and for each $i\in[N]$ and $s\in[S]$, we have

$\sigma^{1}_{(i-1)\cdot S+s}(x)\coloneqq\begin{cases}x&\text{if $x\geq\theta_{i}^{s}$}\\0&\text{otherwise}\end{cases}.$ (7)

For each $i\in[N]$ and $s\in[S]$, the resulting $\mathbf{H}^{2}_{(i-1)\cdot S+s}$ equals the influence summation of cascade $s$ at node $v_{i}$ if $v_{i}$ is activated by cascade $s$, and zero otherwise.

3.1.2 Phase 2

In simulating phase 2, the general goal is to figure out which of the candidate cascades wins the competition. Intuitively, this can be implemented through a hierarchy of pairwise comparisons. One technical challenge is that a comparison made directly by linear functions tells only the difference between the summations, while we need to keep the summation itself for future comparisons; we observe that this problem can be solved by using two consecutive layers of piecewise linear functions. A more serious problem is that simply passing the largest influence summation to the following layers loses the information about the cascade identity (i.e., its index), making it impossible to identify the node status, which is required as the input for simulating the next diffusion step. We propose to address this challenge by appending the cascade identity to the weight summation as its lowest digits. The number of shifted digits is determined by the system precision (Assumption 1). Doing so does not affect the comparison result while making it possible to retrieve the cascade identity by scale and modulo operations. In particular, two groups of transformations are employed to achieve the pairwise comparison and status recovery.
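The identity-encoding trick can be illustrated in plain Python, independently of the network itself; the function names and the concrete numbers are hypothetical, and len(str(S)) plays the role of $\lfloor\log_{10}S\rfloor+1$ in Eqs. (9)-(12).

```python
def encode(summation, cascade_index, Q, S):
    """Append the cascade index to a weight summation as the lowest decimal
    digits: shift left so the Q decimal places become integer digits and
    enough room is left for the index, then add the index."""
    width = len(str(S))                      # digits needed for an index in [1, S]
    return summation * 10 ** (Q + width) + cascade_index

def decode(encoded, S):
    """Recover the cascade index from the lowest digits (modulo operation)."""
    width = len(str(S))
    return int(round(encoded)) % 10 ** width

# Example with Q = 2 decimal places and S = 3 cascades:
# cascade 2 with summation 0.75 beats cascade 1 with summation 0.40,
# and the winner's identity survives the max operation.
a = encode(0.40, 1, Q=2, S=3)   # roughly 401.0
b = encode(0.75, 2, Q=2, S=3)   # 752.0
winner = max(a, b)
assert decode(winner, S=3) == 2
```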

Pairwise comparison. With the input $\mathbf{H}^{2}$ from phase 1, we encode the cascade identity through

$\displaystyle\mathbf{H}^{3}=\sigma^{2}(\mathbf{W}^{2}\mathbf{H}^{2}),$ (8)

where for each $i\in[N\cdot S]$ and $j\in[N\cdot S]$, we have

$\mathbf{W}^{2}_{i,j}\coloneqq\begin{cases}10^{(Q+\lfloor\log_{10}S\rfloor+1)}&\text{if $i=j$}\\0&\text{otherwise}\end{cases},$ (9)

and for each $i\in[N]$ and $s\in[S]$, we have

$\sigma^{2}_{(i-1)\cdot S+s}(x)=x+s.$ (10)

The design of $\mathbf{W}^{2}$ ensures that sufficient digits are available at the low end, and the activation function $\sigma^{2}$ then writes the cascade identity into the empty digits (Remark 1 in Figure 2).

Using the information in $\mathbf{H}^{3}$, for a node $v_{i}\in V$, the largest influence summation can be determined from the sub-array

$\Big[\mathbf{H}^{3}_{(i-1)\cdot S+1},\mathbf{H}^{3}_{(i-1)\cdot S+2},...,\mathbf{H}^{3}_{(i-1)\cdot S+S}\Big]$

through pairwise comparisons. For the comparison between two cascades $s_{1}$ and $s_{2}$ at node $v_{i}$ ($s_{1}<s_{2}$), as illustrated in Remark 2 in Figure 2, the difference

$\mathbf{H}^{3}_{(i-1)\cdot S+s_{1}}-\mathbf{H}^{3}_{(i-1)\cdot S+s_{2}}$

is first fed into a ReLU function, and the result is then added to $\mathbf{H}^{3}_{(i-1)\cdot S+s_{2}}$. We can verify that this returns exactly

$\max(\mathbf{H}^{3}_{(i-1)\cdot S+s_{1}},\mathbf{H}^{3}_{(i-1)\cdot S+s_{2}}).$
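The two-layer comparison can be checked directly; this is just the standard identity max(a, b) = ReLU(a - b) + b written out, with hypothetical values.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def pairwise_max(a, b):
    """max(a, b) = ReLU(a - b) + b, the two-layer comparison used above."""
    return relu(a - b) + b

assert pairwise_max(752.0, 401.0) == 752.0
assert pairwise_max(401.0, 752.0) == 752.0
```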

Therefore, every two layers eliminate half of the candidate cascades, resulting in $2Z$ layers in total. The output of this part is $\mathbf{H}^{2Z+3}\in\mathbb{R}^{N}$, in which we have

$\displaystyle\mathbf{H}^{2Z+3}_{i}=\max\Big(\mathbf{H}^{3}_{(i-1)\cdot S+1},\mathbf{H}^{3}_{(i-1)\cdot S+2},...,\mathbf{H}^{3}_{(i-1)\cdot S+S}\Big)$

for each $i\in[N]$. Notably, the lowest $\lfloor\log_{10}S\rfloor+1$ digits of $\mathbf{H}^{2Z+3}$ store the cascade index, which can be recovered by

$\displaystyle\mathbf{H}^{2Z+4}=\sigma^{2Z+3}(\mathbf{W}^{2Z+3}\mathbf{H}^{2Z+3}),$ (11)

where $\mathbf{W}^{2Z+3}$ is the identity matrix and we have

$\sigma^{2Z+3}(x)\coloneqq x\bmod 10^{\lfloor\log_{10}S\rfloor+1}.$ (12)

Finally, the cascade index is converted to binary indicators through two layers. The first layer broadcasts each node's recovered cascade index to its $S$ output slots and compares it against every cascade index, which can be achieved by

$\displaystyle\mathbf{H}^{2Z+5}=\sigma^{2Z+4}(\mathbf{W}^{2Z+4}\mathbf{H}^{2Z+4}),$ (13)

where for each $i,j\in[N]$ and $s\in[S]$, we have

$\mathbf{W}^{2Z+4}_{(i-1)\cdot S+s,\,j}\coloneqq\begin{cases}1&\text{if $i=j$}\\0&\text{otherwise}\end{cases},$ (14)

and

$\sigma^{2Z+4}_{(i-1)\cdot S+s}(x)\coloneqq\begin{cases}1&\text{if $x\geq s$}\\0&\text{otherwise}\end{cases}.$ (15)

The second layer turns these indicators into the final activation status and can be implemented through

$\displaystyle\mathbf{H}^{2Z+6}=\sigma^{2Z+5}(\mathbf{W}^{2Z+5}\mathbf{H}^{2Z+5}),$ (16)

where for each $i,j\in[N]$ and $s_{1},s_{2}\in[S]$, we have

$\mathbf{W}^{2Z+5}_{(i-1)\cdot S+s_{1},(j-1)\cdot S+s_{2}}\coloneqq\begin{cases}1&\text{if $i=j$ and $s_{1}\leq s_{2}$}\\-1&\text{if $i=j$ and $s_{1}=s_{2}$}\\0&\text{otherwise}\end{cases},$ (17)

and

$\sigma^{2Z+5}_{(i-1)\cdot S+s}(x)\coloneqq\begin{cases}1&\text{if $x\geq s$}\\0&\text{otherwise}\end{cases}.$ (18)

$\mathbf{H}^{2Z+6}$ is exactly the one-step diffusion result following $\mathbf{H}^{1}$. Repeating such one-step simulations for $N$ steps, the CLT model can be explicitly simulated. One important property is that the entire neural network is composed of piecewise polynomial activation functions, which will be used in deriving its sample complexity. By scrutiny, the following result summarizes the network structure.

Lemma 1.

A CLT model can be simulated by a feed-forward neural network composed of $O(S\cdot(N+M))$ adjustable weights, $O(N\cdot Z)$ layers, and $O(N^{2}\cdot S)$ piecewise linear computation units, each with $O(S\cdot 2^{Q})$ pieces.

3.2 Efficient ERM

Taking the weights $w_{i,j}^{s}$ and the thresholds $\theta_{i}^{s}$ as parameters, the neural networks designed in the last subsection form a hypothesis space denoted by $\mathcal{F}$. Theorem 1 suggests that for any sample set $D$, there always exists a perfect hypothesis in $\mathcal{F}$. In what follows, we show that such an empirical risk minimization solution can be computed efficiently. It suffices to find parameters that ensure the output of each activation function coincides with the observed diffusion result. Since the activation functions are all piecewise linear, the search can be carried out by a linear program with a polynomial number of constraints. Formally, we have the following statement.

Lemma 2.

For a CLT model and any sample set $D$, there exists a hypothesis in $\mathcal{F}$ with zero training error, and it can be computed in time polynomial in $N,M,S$, and $|D|$ by solving the following linear program:

$\min_{w_{u_{i}}\in\mathbb{R}^{2|N(u_{i})|},\,\xi\geq 0}\ \sum_{k=1}^{K}\sum_{i=1}^{N}\sum_{t=1}^{N}\sum_{s=1}^{S}\big(\xi^{k}_{t,i,1,s}+\xi^{k}_{t,i,2,s}\big)$ (19)

Subject to:

$\displaystyle(2\mathbf{I}_{t,i,s,1}^{k}-1)\Big(\sum_{u_{j}\in N(u_{i})}w_{(j,i)}^{s}\,\mathbf{I}_{t-1,j,s,2}^{k}-\theta_{i}^{s}\Big)\geq 1-\xi^{k}_{t,i,1,s}$
$\displaystyle\sum_{u_{j}\in N(u_{i})}w_{(j,i)}^{s^{*}}\,\mathbf{I}_{t-1,j,s^{*},2}^{k}-\sum_{u_{j}\in N(u_{i})}w_{(j,i)}^{s}\,\mathbf{I}_{t-1,j,s,2}^{k}\geq-\xi^{k}_{t,i,2,s}$
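Below is a minimal per-node, per-cascade sketch of how the phase-1 constraints of this linear program could be assembled with scipy; the phase-2 (winner) constraints would be added analogously. The function name, variable layout, and use of the HiGHS solver are illustrative choices rather than the paper's implementation.

```python
import numpy as np
from scipy.optimize import linprog

def fit_node_cascade(X, y):
    """Sketch of the per-node, per-cascade phase-1 ERM linear program.

    X: (m, d) binary matrix; row r holds the s-active indicators of the d
       in-neighbors of the node at the step preceding observation r.
    y: (m,) binary labels; y[r] = 1 iff the node became s-active in phase 1.
    Variables: d edge weights, one threshold, and m slack variables.
    """
    m, d = X.shape
    n_var = d + 1 + m
    c = np.zeros(n_var)
    c[d + 1:] = 1.0                       # minimize the sum of slacks

    sign = 2.0 * y - 1.0                  # +1 for activated, -1 otherwise
    A_ub = np.zeros((m, n_var))
    A_ub[:, :d] = -sign[:, None] * X      # -(2y-1) * (w . x)
    A_ub[:, d] = sign                     # +(2y-1) * theta
    A_ub[np.arange(m), d + 1 + np.arange(m)] = -1.0   # - xi_r
    b_ub = -np.ones(m)                    # encodes (2y-1)(w.x - theta) >= 1 - xi

    bounds = [(0, 1)] * (d + 1) + [(0, None)] * m
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    w, theta = res.x[:d], res.x[d]
    return w, theta
```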

3.3 Generalization Performance

The learnability of the class $\mathcal{F}$ is enabled by the analysis of its VC dimension.

Lemma 3.

The VC dimension of the class $\mathcal{F}$ is $\operatorname{VC}(\mathcal{F})=\tilde{O}(S(N+M)Q)$.

proof sketch.

The proof is obtained by first showing that the layers for the one-step simulation have a manageable VC dimension. Given that repeating such one-step simulations does not increase the model capacity in terms of shattered points, the VC dimension of the entire neural network follows immediately. ∎

Together with Lemma 2, the following result summarizes our theoretical findings.

Theorem 1.

The influence function class $\mathcal{F}$ is PAC learnable: given parameters $\epsilon,\delta\in(0,1)$, with probability at least $1-\delta$ the learned hypothesis has generalization loss at most $\epsilon$, provided that the number of samples is $O\big(\frac{\operatorname{VC}(\mathcal{F})\log(1/\epsilon)+\log(1/\delta)}{\epsilon}\big)$.

proof sketch.

Following the empirical risk minimization strategy, we can always find a group of parameters for which the empirical risk equals $0$, which guarantees the realizable case. Based on the fundamental theorem of statistical learning ? and the finite VC dimension of $\mathcal{F}_{u_{i}^{s}}$, we obtain that $\mathcal{F}$ is PAC learnable with sample complexity $O\big(\frac{\operatorname{VC}(\mathcal{F})\log(1/\epsilon)+\log(1/\delta)}{\epsilon}\big)$; the sample complexity of $\mathcal{F}$ follows from a union bound over all $NS$ nodes. ∎
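As a quick illustration of the bound, a small helper that evaluates the right-hand side for given $\epsilon$ and $\delta$; the absolute constant hidden in the $O(\cdot)$ is not specified by the theorem, so the C argument below is a placeholder assumption.

```python
import math

def sample_complexity_upper_bound(vc_dim, epsilon, delta, C=1.0):
    """Evaluate C * (d*log(1/eps) + log(1/delta)) / eps from Theorem 1;
    C stands in for the unspecified absolute constant."""
    return math.ceil(C * (vc_dim * math.log(1 / epsilon)
                          + math.log(1 / delta)) / epsilon)

# e.g., sample_complexity_upper_bound(vc_dim=1000, epsilon=0.1, delta=0.05)
```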

Remark 1.


Given two competitive cascades, we obtain the same VC-dimension bound as for the influence function under the LT model with a single cascade narasimhan2015learnability. The structure of the corresponding neural network is slightly different from the prescribed one: for each diffusion step, it contains three layers, each with $2N$ nodes, as illustrated in Figure 3.

Figure 3: An example of the designed neural network with 2 cascades.

The architecture of phase 1 is the same as in the prescribed neural network. For phase 2,

$\displaystyle\mathbf{H}^{3}=\sigma^{2}(\mathbf{W}^{2}\mathbf{H}^{2}),$ (20)

where for each $i,j\in[N]$ and $s_{1},s_{2}\in[2]$, we have

$\mathbf{W}^{2}_{(i-1)\cdot S+s_{1},(j-1)\cdot S+s_{2}}\coloneqq\begin{cases}1&\text{if $i=j$ and $s_{1}=s_{2}$}\\-1&\text{if $i=j$ and $|s_{1}-s_{2}|=1$}\\0&\text{otherwise}\end{cases},$ (21)

and $\sigma^{2}$ is a linear threshold function.

$\mathbf{H}^{3}$ is exactly the graph status after phase 2. Therefore, one step of the CLT model under two competitive cascades can also be implemented by a neural network with $3$ layers, $2(N+M)$ adjustable parameters, and $4N$ computation units. Every output node is a linear threshold unit, and the computational units are piecewise polynomial with $2$ pieces and degree $1$ in each piece. Based on Theorem 2, with fixed numbers of pieces and degree, we obtain $\operatorname{VC}(\mathcal{F}_{i,1})=O((M+N)\log(M+N))$ for node $u_{i}$ for one step. With shared parameters, when $t>3$, following the same analysis as in Lemma 3, we have $\operatorname{VC}(\mathcal{F}_{i,t})=\tilde{O}(M+N)$.


4 Experiment

In this section, we present empirical studies on a collection of datasets, with the goal of evaluating our algorithms against different baselines under two sample types. We also examine the number of samples needed by the proposed learning algorithms to achieve reasonable performance.

4.1 Data

Synthetic data

We use two groups of synthetic graph data: a Kronecker graph with $1024$ nodes and $2655$ edges and a power-law graph with $768$ nodes and $1532$ edges. Following the CLT model, for each cascade $s$, we set the weight on each edge $(u,v)$ as $w_{u,v}^{s}=\frac{1}{d_{v}+s/10}$, where $d_{v}$ denotes the in-degree of node $v$. For each node $u$, the threshold $\theta_{u}^{s}$ for each cascade is generated uniformly at random from $[0,1]$.
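A sketch of this synthetic setup, assuming a networkx-style directed graph; the random graph in the usage comment is only a stand-in, since the Kronecker and power-law generators used in the paper are not reproduced here.

```python
import numpy as np
import networkx as nx

def synthetic_clt_parameters(G, S, rng=None):
    """For a directed graph G and S cascades, set w_{u,v}^s = 1 / (d_v + s/10)
    with d_v the in-degree of v, and draw each threshold uniformly from [0, 1]."""
    rng = np.random.default_rng(rng)
    nodes = list(G.nodes())
    idx = {v: i for i, v in enumerate(nodes)}
    N = len(nodes)
    W = np.zeros((S, N, N))
    for u, v in G.edges():
        d_v = G.in_degree(v)
        for s in range(1, S + 1):
            W[s - 1, idx[u], idx[v]] = 1.0 / (d_v + s / 10.0)
    theta = rng.uniform(0.0, 1.0, size=(N, S))
    return W, theta

# e.g., G = nx.gnp_random_graph(768, 0.005, directed=True)
```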

Real data

We apply our approaches to the hashtag cascade data from the Gab network, collected from August 2016 to October 2018. The original network includes more than $50{,}000$ users and $11{,}467{,}918$ edges, with $861{,}636$ different hashtags recorded. Each hashtag can be seen as a cascade. To obtain cascades with competitive relationships, we choose hashtags with a relatively high overlap of forwarding users during a fixed period. In our experiments, we select the hashtags 'trump' and 'obama', since more than 30 percent of users forwarded both of them from 2017 to 2018. We then extract the corresponding social graph based on the users involved in these selected hashtags. The seed sets are the users who forwarded the cascades at the beginning of our collection. We set the time interval between steps as one day.

4.2 Experiment settings

Sample generating and CLT model

For each graph, we set $3$ cascades to propagate simultaneously. In each sample, the total number of seeds satisfies $0.1|V|\leq|\bigcup_{s}\psi_{0}^{s}|\leq 0.5|V|$, and each seed set $\psi_{0}^{s}$ is selected uniformly at random over the subsets of $V$. For each sample, the graph status during each diffusion step and the number of diffusion steps are recorded. For each graph, we collect $2000$ samples.
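A possible sketch of the sample generation described above; the paper does not specify how the total seed budget is split among cascades, so the uniform assignment below is an assumption.

```python
import numpy as np

def sample_seed_sets(N, S, rng=None):
    """Draw a total seed count between 0.1*N and 0.5*N, split it uniformly at
    random into S disjoint seed sets, and return the (N, S) initial status."""
    rng = np.random.default_rng(rng)
    total = rng.integers(int(0.1 * N), int(0.5 * N) + 1)
    chosen = rng.choice(N, size=total, replace=False)
    assignment = rng.integers(0, S, size=total)   # which cascade each seed joins
    I0 = np.zeros((N, S), dtype=int)
    I0[chosen, assignment] = 1
    return I0
```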

Evaluation metrics

We use four metrics to evaluate each learning approach: precision, recall, $F_{1}$ score, and accuracy.

The experiments contain two groups.

The training samples are randomly selected in different sizes: $50,100,500,1000$. The test data size is fixed at $500$.

  • Proposed method: We use the linear programming approach.

  • Baseline methods: We use multi-class classification supervised learning algorithms. For each node, we train a model to predict the node status after one time step. For all training samples, the training pair is defined as $(\mathbf{I}_{t-1,:,:,2},\mathbf{I}_{t,:,:,2})$. In testing, we vary the number of diffusion steps over $\{1,2,3,4,5\}$ to evaluate the performance of the learned model; a sketch of this baseline setup is given after this list.

    • Logistic regression model: The probability of a node being in status $i$ is $\frac{\exp(w_{(u,v)}^{i}\mathbb{I}_{t-1,i,2,s}-\theta_{u}^{i})}{1+\exp(w_{(u,v)}^{i}\mathbb{I}_{t-1,i,2,s}-\theta_{u}^{i})}$. The predicted status is the cascade $i$ with the maximum probability.

    • SVM model: This method implements the "one-versus-rest" multi-class strategy.

    • MLP model: With ReLU activation functions, we set the number of hidden layers to 100 and the learning rate to $0.001$.

  • We also include a random-guess baseline.
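One way such baselines could be set up with scikit-learn is sketched below; the paper does not name a specific library, and any hyperparameters other than the stated 100 and learning rate 0.001 (here interpreted as the MLP hidden-layer width and initial learning rate, which is an assumption) are our own.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import LinearSVC

def make_baselines():
    """One multi-class classifier per node, predicting its status after one
    step from the previous graph status (flattened to a feature vector)."""
    return {
        "LR": LogisticRegression(max_iter=1000),
        "SVM": OneVsRestClassifier(LinearSVC()),          # one-versus-rest strategy
        "MLP": MLPClassifier(hidden_layer_sizes=(100,), learning_rate_init=0.001),
    }

# For node i: fit on pairs (flattened I_{t-1}, label of node i at step t),
# then roll the learned one-step predictor forward for 1-5 test steps.
```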

4.3 Observation

In this section, we list our main results in three tables. Additional experimental results on the power-law graph are given in ?.

Overall observations

We compare the prediction performance of LP with other standard supervised machine learning algorithms. The main outcomes on the Kronecker graph and the Gab cascade data are shown in Table 1, where each cell shows the corresponding metric together with its standard deviation. The results show that LP outperforms the other methods as more training samples are given. With sample size $500$, LP already achieves an accuracy of $0.99$.

train_size 100 500
algorithm iteration f_1 score precision recall accuracy f_1 score precision recall accuracy
LP - 0.994/0.001 0.995/0.000 0.995/0.001 0.993/0.000 1.000/0.000 0.999/0.000 1.000/0.000 0.999/0.000
LR 1 0.499/0.001 0.676/0.002 0.452/0.001 0.709/0.002 0.509/0.001 0.732/0.001 0.456/0.001 0.719/0.004
2 0.474/0.001 0.752/0.012 0.425/0.002 0.705/0.004 0.473/0.002 0.811/0.004 0.421/0.002 0.708/0.003
3 0.460/0.003 0.802/0.010 0.412/0.002 0.705/0.006 0.454/0.001 0.852/0.003 0.406/0.001 0.703/0.005
4 0.452/0.003 0.827/0.005 0.405/0.002 0.701/0.004 0.449/0.001 0.867/0.004 0.401/0.001 0.703/0.005
5 0.451/0.003 0.832/0.006 0.404/0.002 0.701/0.004 0.446/0.002 0.876/0.003 0.400/0.001 0.703/0.003
SVM 1 0.556/0.002 0.716/0.002 0.502/0.002 0.727/0.003 0.583/0.001 0.797/0.002 0.517/0.001 0.747/0.003
2 0.496/0.001 0.731/0.003 0.446/0.001 0.705/0.004 0.504/0.002 0.805/0.004 0.447/0.001 0.713/0.006
3 0.480/0.003 0.770/0.009 0.430/0.002 0.702/0.005 0.483/0.002 0.834/0.001 0.429/0.001 0.706/0.004
4 0.470/0.004 0.798/0.009 0.421/0.003 0.697/0.003 0.473/0.001 0.855/0.002 0.421/0.001 0.701/0.004
5 0.465/0.003 0.818/0.006 0.417/0.003 0.698/0.006 0.468/0.003 0.863/0.002 0.417/0.001 0.700/0.006
MLP 1 0.909/0.001 0.975/0.000 0.860/0.001 0.935/0.000 0.909/0.001 0.975/0.000 0.860/0.001 0.935/0.000
2 0.909/0.001 0.975/0.000 0.861/0.001 0.934/0.001 0.908/0.001 0.975/0.000 0.859/0.001 0.935/0.001
3 0.909/0.001 0.975/0.000 0.861/0.001 0.935/0.001 0.909/0.000 0.975/0.000 0.860/0.001 0.934/0.001
4 0.909/0.001 0.975/0.001 0.861/0.002 0.935/0.001 0.909/0.001 0.975/0.000 0.860/0.001 0.934/0.001
5 0.909/0.001 0.975/0.000 0.860/0.001 0.934/0.000 0.909/0.000 0.975/0.000 0.860/0.001 0.935/0.001
random guess - 0.209/0.002 0.250/0.002 0.250/0.004 0.250/0.002 0.210/0.001 0.250/0.001 0.250/0.001 0.250/0.001
Table 1: Main results under the full observation of Kronecker graph
train_size 100 500
algorithm iteration f_1 score precision recall accuracy f_1 score precision recall accuracy
LP - 0.994/0.001 0.995/0.000 0.995/0.001 0.994/0.000 1.000/0.000 0.999/0.000 1.000/0.000 0.999/0.000
LR 1 0.207/0.002 0.292/0.009 0.259/0.000 0.618/0.012 0.207/0.002 0.288/0.008 0.260/0.001 0.619/0.003
2 0.208/0.005 0.283/0.005 0.261/0.002 0.609/0.002 0.205/0.001 0.293/0.004 0.259/0.001 0.609/0.010
3 0.209/0.001 0.293/0.011 0.261/0.001 0.610/0.006 0.206/0.002 0.291/0.016 0.260/0.001 0.613/0.002
4 0.205/0.004 0.294/0.009 0.259/0.001 0.611/0.012 0.206/0.004 0.293/0.003 0.260/0.002 0.609/0.009
5 0.206/0.000 0.284/0.009 0.259/0.000 0.608/0.001 0.206/0.000 0.285/0.007 0.259/0.001 0.611/0.006
SVM 1 0.230/0.002 0.281/0.017 0.274/0.001 0.622/0.002 0.228/0.002 0.257/0.002 0.274/0.001 0.626/0.007
2 0.231/0.002 0.286/0.005 0.275/0.001 0.618/0.001 0.229/0.001 0.258/0.002 0.275/0.001 0.624/0.001
3 0.226/0.002 0.272/0.003 0.273/0.001 0.617/0.006 0.229/0.003 0.263/0.001 0.275/0.000 0.620/0.012
4 0.230/0.000 0.278/0.004 0.275/0.001 0.617/0.001 0.231/0.000 0.268/0.004 0.277/0.001 0.623/0.005
5 0.231/0.001 0.280/0.001 0.276/0.002 0.620/0.012 0.230/0.001 0.265/0.004 0.277/0.001 0.619/0.003
MLP 1 0.230/0.001 0.272/0.006 0.274/0.001 0.621/0.000 0.231/0.004 0.268/0.010 0.277/0.002 0.621/0.003
2 0.233/0.002 0.286/0.002 0.275/0.001 0.628/0.012 0.231/0.004 0.269/0.001 0.276/0.002 0.629/0.003
3 0.230/0.003 0.272/0.012 0.274/0.002 0.622/0.007 0.230/0.002 0.273/0.000 0.276/0.002 0.623/0.004
4 0.227/0.005 0.274/0.004 0.272/0.002 0.617/0.002 0.229/0.002 0.284/0.016 0.274/0.001 0.621/0.001
5 0.230/0.001 0.280/0.013 0.274/0.000 0.619/0.004 0.228/0.001 0.261/0.002 0.275/0.001 0.620/0.004
random guess - 0.214/0.002 0.251/0.002 0.251/0.003 0.251/0.001 0.213/0.000 0.250/0.000 0.250/0.001 0.250/0.000
Table 2: Main results of Power-law graph
Impact of the number of cascades

As our work focuses on the CLT model, we are interested in the performance of the proposed methods under different numbers of cascades. Table 5 reports the experimental results under the two data types.

5 Conclusion

Appendix A Proofs

A.1 Proof of Theorem 1


We design a neural network to simulate the diffusion process under the CLT model. For each diffusion step, we use two groups of layers that simulate phase 1 and phase 2, respectively, as constructed in Section 3.1.

Phase 1
Phase 2

A.2 Proof of Lemma 1


The linear programming approach shows that the ERM solution is efficiently computable from the sample data. Given the sample set $D$, the observed activation status in sample $k\in[K]$ after time step $t$ is $\mathbf{I}_{t,:,:,2}^{k}$. The loss of an influence function $\mathbf{F}$ on sample $k$ is given by

$\displaystyle\sum_{t=1}^{N}\ell_{0-1}\big(\mathbf{I}_{t,:,:,2}^{k},\mathbf{F}(\mathbf{I}_{t-1,:,:,2}^{k})\big)$ (22)

In the prescribed neural network for one diffusion step $t$, the outputs after phase 1 and phase 2 can each be seen as a binary vector of size $NS$. We first focus on each output node $u_{i}^{s}$.

Phase 1

Given the graph status $\mathbf{I}_{t-1,:,:,2}$ after step $t-1$, let $g_{t,i,1,s}(\mathbf{I}_{t-1,N(u_{i}),s,2},\mathbf{W}_{i}^{s}):\{0,1\}^{2|N(u_{i})|}\to\{0,1\}$ denote the computation of the output node $u_{i}^{s}$ after phase 1, with $g_{t,i,1,s}(\mathbf{I}_{t-1,N(u_{i}),s,2},\mathbf{W}_{i}^{s})=\mathds{1}(\zeta_{i}^{s}\geq\theta_{i}^{s})$, where $\mathbf{W}_{i}^{s}$ denotes the parameters related to node $u_{i}^{s}$. Then, the local prediction error can be defined as

$\displaystyle\hat{\mathbf{L}}_{t,i,1,s}(\mathbf{W}_{i}^{s})=\frac{1}{K}\sum_{k=1}^{K}\mathds{1}\big[\mathbf{I}_{t,i,s,2}^{k}\neq g_{t,i,1,s}(\mathbf{I}_{t-1,N(u_{i}),s,2}^{k},\mathbf{W}_{i}^{s})\big]$ (23)

During phase 1 of each step, the propagation can be seen as $S$ cascades diffusing independently according to the linear threshold model (and the phase-1 part of the neural network is a linear threshold network). Therefore, there always exist parameters such that $\hat{\mathbf{L}}_{t,i,1,s}(\mathbf{W}_{i}^{s})=0$ after phase 1.

Phase 2

In phase 2, the comparison process involves no unknown parameters. Let $g_{t,i,2,s}(\mathbf{I}_{t,N(u_{i}),s,1})\in\{0,1\}$ denote the computation of the output node $u_{i}^{s}$ after phase 2. Then, the local prediction error can be defined as

$\displaystyle\hat{\mathbf{L}}_{t,i,2,s}=\frac{1}{K}\sum_{k=1}^{K}\mathds{1}\big[\mathbf{I}_{t,i,s,2}^{k}\neq g_{t,i,2,s}(\mathbf{I}_{t,N(u_{i}),s,1}^{k})\big]$ (24)
Formulating the LP

Our learning problem can be decoupled into a group of linear programs, one for each time step in each sample. The optimization is over the variables $\mathbf{W}\in\mathbb{R}^{S(N+M)}$ and a group of slack variables $\xi_{t,i,p,s}\in\mathbb{R}$ for $t\in[N],i\in[N],p\in\{1,2\}$, and $s\in[S]$, which are introduced to formulate the objective function. Therefore,

$\min_{w_{u_{i}}\in\mathbb{R}^{2|N(u_{i})|},\,\xi\geq 0}\ \sum_{k=1}^{K}\sum_{i=1}^{N}\sum_{t=1}^{N}\sum_{s=1}^{S}\big(\xi^{k}_{t,i,1,s}+\xi^{k}_{t,i,2,s}\big)$ (25)

Subject to:

$\displaystyle(2\mathbf{I}_{t,i,s,1}^{k}-1)\Big(\sum_{u_{j}\in N(u_{i})}w_{(j,i)}^{s}\,\mathbf{I}_{t-1,j,s,2}^{k}-\theta_{i}^{s}\Big)\geq 1-\xi^{k}_{t,i,1,s}$
$\displaystyle\sum_{u_{j}\in N(u_{i})}w_{(j,i)}^{s^{*}}\,\mathbf{I}_{t-1,j,s^{*},2}^{k}-\sum_{u_{j}\in N(u_{i})}w_{(j,i)}^{s}\,\mathbf{I}_{t-1,j,s,2}^{k}\geq-\xi^{k}_{t,i,2,s}$

where $s^{*}$ denotes the cascade with $\mathbf{I}_{t,i,s^{*},2}^{k}=1$.

A.3 Proof of Lemma 1

The realizability assumption on a hypothesis class $\mathcal{H}$ is that there is an $h\in\mathcal{H}$ such that, given a sample set $S=\{(x_{1},y_{1}),...,(x_{m},y_{m})\}$, $h(x_{i})=y_{i}$ for all $i$. For the influence function class $\mathcal{F}$, the samples are generated from the distribution of $\mathbf{I}_{0}$ and a target function $f$, and it is this function $f$ that we care about. Therefore, given our designed neural network model with a finite VC dimension and following the ERM strategy, we can obtain a probably approximately correct model, also called a restricted model. Thus, there is always a function $f\in\mathcal{F}$ consistent with the samples bartlett1999hardness. Therefore, supposing that $W^{*}$ are the underlying parameters of the CLT model, the following equation is always satisfied:

$\inf_{\mathbf{W}\in\mathbb{R}_{+}^{S(N+M)}}\ell_{0-1}(\mathcal{F}^{\mathbf{W}})=\ell_{0-1}(\mathcal{F}^{W^{*}})=0$ (26)

This completes the proof.

A.4 Proof of Lemma 2

The following theorem gives the VC dimension of the class of neural networks with all piecewise-polynomial units.

Theorem 2.

? Suppose $N$ is a feed-forward network with a total of $W$ weights and $k$ computational units, in which the output unit is a linear threshold unit and every other computation unit has a piecewise-polynomial activation function with $p$ pieces and degree no more than $l$. Suppose in addition that the computation units in the network are arranged in $L$ layers, so that each unit has connections only from units in earlier layers. Then if $H$ is the class of functions computed by $N$,

$\operatorname{VC}(H)\leq 2WL\log_{2}(4WLpk/\ln 2)+2WL^{2}\log_{2}(l+1)+2L$ (27)

For fixed $p$ and $l$, $\operatorname{VC}(H)=O(WL\log_{2}W+WL^{2})$.

Note that the influence function class $\mathcal{F}$ is the set of all possible neural networks designed in Section 3. We first consider the neural networks with a single output $u_{i}^{s}$. We use $\mathbf{F}_{u^{s}_{i},t}^{w}(\mathbf{I}_{0}):\{0,1\}^{SN}\to\{0,1\}$ to denote the computation at the output node $u_{i}^{s}$ after time step $t$, where $w\in\mathbb{R}^{S(N+M)}$. Based on the architecture of the neural network, for any $t$, the attributes of the corresponding neural network are summarized in Table 3.

parameters symbol diffusion step $t$
layers $l$ $t(2Z+5)$
maximum pieces of all units $p$ $S(9^{Q}-1)$
maximum degree of all pieces $d$ $1$
computational units $k$ $(8S-3)Nt$
adjustable parameters $w$ $S(N+M)$
Table 3: The parameters of the neural network with one output $u_{i}^{s}$

Therefore, we can obtain the upper bound on the VC dimension when $t=1$ with one output $u_{i}^{s}$, denoted $\operatorname{VC}(\mathbf{F}_{u^{s}_{i},1}^{w})=\tilde{O}(S(N+M)Q)$. Following this idea, if we ignored the fact that the parameters are shared across time steps, the total number of unknown parameters in the global neural network would be $N(S(N+M))$, and according to Theorem 2 we would obtain a VC dimension bound of $\tilde{O}(NS(N+M)Q)$. However, all the parameters we want to learn are shared during the global diffusion process, which lowers the network's ability to shatter a set of points. Therefore we can obtain a tighter bound on the VC dimension of the influence function class $\mathbf{F}_{u^{s}_{i},t}$ when $t>2$.

To see this, we first calculate $\operatorname{VC}(\mathbf{F}_{u_{i}^{s},2}^{w})$, which is $\tilde{O}(S(N+M)Q)$. We now prove that $\operatorname{VC}(\mathbf{F}_{u^{s}_{i},t}^{w})\leq\operatorname{VC}(\mathbf{F}_{u^{s}_{i},2}^{w})$ when $t>2$. Consider a set of $NS$ points $\{X_{1},...,X_{NS}\}\subseteq\{0,1\}^{NS}$ shattered by $\mathcal{F}_{u^{s}_{i},t}^{w}$, where $t>2$. From the definition of a shattered set, we have:

$\begin{split}2^{NS}&=\big|\big\{\big(F^{w}_{u^{s}_{i},t}(X_{1}),...,F^{w}_{u_{i}^{s},t}(X_{NS})\big)\big\}\big|\\&=\big|\big\{\big(F^{w}_{u_{i}^{s},t-1}(F^{w}_{1,1}(X_{1}),...,F^{w}_{1,N}(X_{1})),...,F^{w}_{u_{i}^{s},t-1}(F^{w}_{1,1}(X_{NS}),...,F^{w}_{1,N}(X_{NS}))\big)\big\}\big|\\&=\big|\big\{\big(F^{w}_{u^{s}_{i},t-1}(\mathbf{Z}_{1}),...,F^{w}_{u^{s}_{i},t-1}(\mathbf{Z}_{NS})\big)\big\}\big|\end{split}$ (28)

where $\mathbf{Z}_{1}=[F^{w}_{1,1}(X_{1}),...,F^{w}_{1,N}(X_{1})],\ ...,\ \mathbf{Z}_{NS}=[F^{w}_{1,1}(X_{NS}),...,F^{w}_{1,N}(X_{NS})]\in\{0,1\}^{SN}$.

From the above equation, we have

$\displaystyle\big|\big\{\big(F^{w}_{u^{s}_{i},t-1}(\mathbf{Z}_{1}),...,F^{w}_{u^{s}_{i},t-1}(\mathbf{Z}_{NS})\big)\big\}\big|=2^{NS}$ (29)

We can see that $\mathbf{Z}_{1},...,\mathbf{Z}_{NS}$ have to be distinct so that all the patterns of the $NS$ points can be realized. This implies that the set $\mathbf{Z}_{1},...,\mathbf{Z}_{NS}$ is shattered by $\mathcal{F}^{w}_{u^{s}_{i}}$ at time step $t-1$. Thus, for any set of points of a given size shattered by $\mathcal{F}_{u^{s}_{i}}$ when $t>2$, there exists a set of points of the same size shattered by $\mathcal{F}_{u^{s}_{i}}$ at time step $t-1$. Therefore:

$\operatorname{VC}(\mathcal{F}_{u^{s}_{i},t})\leq\operatorname{VC}(\mathcal{F}_{u^{s}_{i},t-1})$ (30)

Hence, $\operatorname{VC}(\mathcal{F}_{u_{i}^{s}})=\tilde{O}(S(N+M)Q)$. The result is summarized in Lemma 3.

A.5 Proof of corollary 1

Theorem 3.

? Let $\mathcal{H}$ be a hypothesis class of functions from a domain $\mathcal{X}$ to $\{0,1\}$ and let the loss function be the $0$-$1$ loss. Then, the following are equivalent:

  • \mathcal{H} has the uniform convergence property.

  • Any ERM rule is a successful agnostic PAC learner for \mathcal{H}.

  • \mathcal{H} is agnostic PAC learnable.

  • \mathcal{H} is PAC learnable.

  • Any ERM rule is a successful PAC learner for \mathcal{H}

  • \mathcal{H} has a finite VC dimension

Theorem 4.

Let $\mathcal{H}$ be a hypothesis class of functions from a domain $\mathcal{X}$ to $\{0,1\}$ and let the loss function be the $0$-$1$ loss. Assume that $\operatorname{VC}(\mathcal{H})=d<\infty$. Then, there are absolute constants $C_{1},C_{2}$ such that $\mathcal{H}$ is PAC learnable with sample complexity:

$\displaystyle C_{1}\frac{d+\log(1/\delta)}{\epsilon}\leq m_{\mathcal{H}}(\epsilon,\delta)\leq C_{2}\frac{d\log(1/\epsilon)+\log(1/\delta)}{\epsilon}$ (31)

Therefore, based on the Fundamental Theorem of Statistical Learning, the influence function class $\mathcal{F}_{u_{i}^{s}}$ is PAC learnable with sample complexity $m_{\mathcal{F}_{u_{i}^{s}}}$ such that, for any $\delta,\epsilon\in(0,1)$, with confidence $1-\delta$ the generalization error satisfies $L(\hat{F})\leq\epsilon$, where

$m_{\mathcal{F}_{u_{i}^{s}}}=O\Big(\frac{\operatorname{VC}(\mathcal{F})\log(1/\epsilon)+\log(N/\delta)}{\epsilon}\Big)$ (32)

By taking a union bound over all $NS$ nodes, we get $m_{\mathcal{F}}=O\Big(\frac{\operatorname{VC}(\mathcal{F})\log(1/\epsilon)+\log(N/\delta)}{\epsilon}\Big)$.

Appendix B Additional materials for experiments

train_size 50 100 500 1000
algorithm iteration f_1 score precision recall accuracy f_1 score precision recall accuracy f_1 score precision recall accuracy f_1 score precision recall accuracy
LP - 0.991/0.001 0.993/0.000 0.989/0.001 0.902/0.001 0.994/0.001 0.995/0.000 0.995/0.001 0.993/0.000 1.000/0.000 0.999/0.000 1.000/0.000 0.999/0.000
LR 1 0.488/0.002 0.622/0.002 0.448/0.002 0.695/0.002 0.499/0.001 0.676/0.002 0.452/0.001 0.709/0.002 0.509/0.001 0.732/0.001 0.456/0.001 0.719/0.004 0.508/0.001 0.732/0.002 0.455/0.001 0.718/0.005
2 0.471/0.002 0.668/0.011 0.427/0.002 0.697/0.004 0.474/0.001 0.752/0.012 0.425/0.002 0.705/0.004 0.473/0.002 0.811/0.004 0.421/0.002 0.708/0.003 0.472/0.001 0.816/0.004 0.421/0.001 0.708/0.005
3 0.460/0.001 0.713/0.022 0.416/0.001 0.697/0.004 0.460/0.003 0.802/0.010 0.412/0.002 0.705/0.006 0.454/0.001 0.852/0.003 0.406/0.001 0.703/0.005 0.454/0.001 0.853/0.003 0.406/0.001 0.702/0.002
4 0.458/0.004 0.743/0.019 0.413/0.004 0.698/0.006 0.452/0.003 0.827/0.005 0.405/0.002 0.701/0.004 0.449/0.001 0.867/0.004 0.401/0.001 0.703/0.005 0.448/0.001 0.874/0.003 0.401/0.001 0.700/0.006
5 0.454/0.005 0.772/0.017 0.408/0.005 0.699/0.004 0.451/0.003 0.832/0.006 0.404/0.002 0.701/0.004 0.446/0.002 0.876/0.003 0.400/0.001 0.703/0.003 0.446/0.002 0.879/0.002 0.399/0.001 0.703/0.005
SVM 1 0.537/0.001 0.663/0.008 0.491/0.001 0.716/0.004 0.556/0.002 0.716/0.002 0.502/0.002 0.727/0.003 0.583/0.001 0.797/0.002 0.517/0.001 0.747/0.003 0.904/0.001 0.974/0.000 0.854/0.001 0.932/0.001
2 0.488/0.001 0.674/0.005 0.443/0.001 0.695/0.002 0.496/0.001 0.731/0.003 0.446/0.001 0.705/0.004 0.504/0.002 0.805/0.004 0.447/0.001 0.713/0.006 0.904/0.001 0.975/0.000 0.854/0.001 0.933/0.001
3 0.476/0.003 0.705/0.011 0.431/0.002 0.693/0.002 0.480/0.003 0.770/0.009 0.430/0.002 0.702/0.005 0.483/0.002 0.834/0.001 0.429/0.001 0.706/0.004 0.904/0.000 0.974/0.000 0.854/0.001 0.931/0.000
4 0.471/0.004 0.734/0.017 0.425/0.004 0.696/0.002 0.470/0.004 0.798/0.009 0.421/0.003 0.697/0.003 0.473/0.001 0.855/0.002 0.421/0.001 0.701/0.004 0.905/0.001 0.974/0.000 0.855/0.001 0.932/0.000
5 0.465/0.004 0.757/0.018 0.420/0.004 0.692/0.005 0.465/0.003 0.818/0.006 0.417/0.003 0.698/0.006 0.468/0.003 0.863/0.002 0.417/0.001 0.700/0.006 0.904/0.001 0.974/0.001 0.854/0.002 0.932/0.001
MLP 1 0.909/0.001 0.975/0.000 0.860/0.001 0.935/0.001 0.909/0.001 0.975/0.000 0.860/0.001 0.935/0.000 0.909/0.001 0.975/0.000 0.860/0.001 0.935/0.000 0.910/0.001 0.975/0.000 0.862/0.001 0.935/0.000
2 0.909/0.001 0.975/0.001 0.862/0.001 0.934/0.001 0.909/0.001 0.975/0.000 0.861/0.001 0.934/0.001 0.908/0.001 0.975/0.000 0.859/0.001 0.935/0.001 0.908/0.001 0.975/0.000 0.859/0.001 0.935/0.000
3 0.909/0.001 0.975/0.000 0.860/0.001 0.934/0.001 0.909/0.001 0.975/0.000 0.861/0.001 0.935/0.001 0.909/0.000 0.975/0.000 0.860/0.001 0.934/0.001 0.909/0.000 0.975/0.000 0.860/0.001 0.935/0.001
4 0.909/0.000 0.975/0.000 0.860/0.000 0.935/0.001 0.909/0.001 0.975/0.001 0.861/0.002 0.935/0.001 0.909/0.001 0.975/0.000 0.860/0.001 0.934/0.001 0.909/0.000 0.975/0.000 0.861/0.000 0.935/0.001
5 0.909/0.000 0.975/0.000 0.860/0.001 0.934/0.001 0.909/0.001 0.975/0.000 0.860/0.001 0.934/0.000 0.909/0.000 0.975/0.000 0.860/0.001 0.935/0.001 0.909/0.000 0.975/0.000 0.861/0.001 0.935/0.000
random guess - 0.208/0.002 0.250/0.002 0.250/0.003 0.250/0.002 0.209/0.002 0.250/0.002 0.250/0.004 0.250/0.002 0.210/0.001 0.250/0.001 0.250/0.001 0.250/0.001 0.208/0.000 0.250/0.001 0.250/0.001 0.250/0.001
Table 4: Main results of Kronecker graph
train size 50 100 500
algorithm number of cascades f1_score precision recall accuracy f1_score precision recall accuracy f1_score precision recall accuracy
LP 2
4
8
LR 2
4
8
SVM 2
4
8
NN 2
4
8
Table 5: Evaluate the performance on different number of cascades

parameters:

  • $V=\{u_{1},...,u_{N}\}$, $u_{i}$ for $i\in[N]=\{1,...,N\}$

  • $[S]=\{1,..,S\}$, $s$-active for $s\in[S]$; $S=2^{n}$ for $n\in\mathbb{Z}^{+}$.

  • sample: $k\in[K]$

B.0.1 Phase 1

  • $\mathbf{H}^{1}$ size: $NS$

  • $\mathbf{W}^{1}\in\mathbb{R}^{|H_{2}|\times|H_{1}|}$ (for $W^{l}H^{l}$), $\mathbf{W}^{1}\in\mathbb{R}^{NS\times NS}$

    $W^{1}_{i,j}\coloneqq\begin{cases}w_{(\frac{i-1}{S}+1,\frac{j-1}{S}+1)}^{(i\bmod S)}&\text{If $(u_{\frac{i-1}{S}+1},u_{\frac{j-1}{S}+1})\in E$}\\1&\text{If $\frac{i-1}{S}+1=\frac{j-1}{S}+1$}\\0&\text{Otherwise}\end{cases}$ (33)

    Equivalently, using $(i^{s},j^{s})$ to index the nodes in the first and second layers:

    $W^{1}_{i^{s},j^{s}}\coloneqq\begin{cases}w_{(i,j)}^{s}&\text{If $(u_{i},u_{j})\in E$}\\1&\text{If $i=j$}\\0&\text{Otherwise}\end{cases}$ (34)
  • $\mathbf{H}^{2}$ size: $NS$

  • $\sigma^{2}$:

    $\sigma^{2}_{i}(x)\coloneqq\begin{cases}x&\text{If $x\geq\theta_{\frac{i-1}{S}+1}^{(i\bmod S)}$}\\0&\text{Otherwise}\end{cases}$ (35)

    Or, using $i^{s}$ to index the nodes in the second layer:

    $\sigma^{2}_{i^{s}}(x)\coloneqq\begin{cases}x&\text{If $x\geq\theta_{u_{i}}^{s}$}\\0&\text{Otherwise}\end{cases}$ (36)

B.0.2 Phase 2

Pairwise comparison
  • Add cascade index

  • $\mathbf{W}^{2}\in\mathbb{R}^{NS\times NS}$

    $W^{2}_{i,j}\coloneqq\begin{cases}10^{(q+\lceil\frac{S}{10}\rceil)}&\text{If $i=j$}\\0&\text{Otherwise}\end{cases}$ (37)
  • $\mathbf{H}^{3}$ size: $NS$

  • $\sigma^{3}$:

    $\sigma^{3}_{i}(x)=x+(i\bmod S)$ (38)
  • Begin comparison: the composition of these comparison layers can be seen as $N$ identical structures (each containing $S$ nodes) from the longitudinal view. Within each structure, the idea is to compare the values of every two adjacent nodes and pass the larger value forward until only one node is left, so the number of nodes is reduced by half every two layers. Therefore, to complete the pairwise comparison process, $2n$ layers are needed. Furthermore, within each pairwise comparison (every two of the $2n$ layers), the architecture is the same.

  • $\mathbf{W}^{3}\in\mathbb{R}^{NS\times NS}$

    $W^{3}_{i,j}=\begin{cases}1&\text{If $i=j$}\\-1&\text{If $i\bmod 2=0$ and $j=i-1$}\\0&\text{Otherwise}\end{cases}$ (39)
  • $\mathbf{H}^{4}$ size: $NS$

  • $\sigma^{4}$: ReLU

    $\sigma^{4}(x)=\max\{0,x\}$ (40)
  • $\mathbf{W}^{4}\in\mathbb{R}^{NS/2\times NS}$

    $W^{4}_{i,j}=\begin{cases}1&\text{If $j=2i$ or $j=2i-1$}\\0&\text{Otherwise}\end{cases}$ (41)
  • $\mathbf{H}^{5}$ size: $NS/2$

  • $\sigma^{5}$: Identity

    $\sigma^{5}(x)=x$ (42)
  • $\mathbf{H}^{6}$ size: $NS/2$

  • $\sigma^{6}$: ReLU

  • $\mathbf{H}^{7}$ size: $NS/4$

  • $\sigma^{7}$: Identity

  • $\mathbf{H}^{2n+3}$ size: $N$

  • $\sigma^{2n+3}$: Identity

Status recovering
  • $\mathbf{W}^{2n+3}\in\mathbb{R}^{N\times N}$

    $W^{2n+3}_{i,j}=\begin{cases}1&\text{If $i=j$}\\0&\text{Otherwise}\end{cases}$ (43)
  • $\mathbf{H}^{2n+4}$ size: $N$

  • $\sigma^{2n+4}$:

    $\sigma^{2n+4}(x)\coloneqq x-\left\lfloor\frac{x}{\lceil\frac{S}{10}\rceil}\right\rfloor\times\left\lceil\frac{S}{10}\right\rceil$ (44)
  • $\mathbf{W}^{2n+4}\in\mathbb{R}^{NS\times N}$

    $W^{2n+4}_{i,j}=\begin{cases}1&\text{If $i=2j$ or $i=2j-1$}\\0&\text{Otherwise}\end{cases}$ (45)
  • $\mathbf{H}^{2n+5}$ size: $NS$

  • $\sigma^{2n+5}$:

    $\sigma^{2n+5}_{i}(x)=\begin{cases}1&\text{If $x\in((i\bmod S)-1,\,i\bmod S]$}\\0&\text{Otherwise}\end{cases}$ (46)
$\mathbf{1}_{st}$ layer

$N_{1}$ takes $\mathbf{I}_{0}$ as the input. For further illustration, we reshape the matrix $\mathbf{I}_{0}\in\{0,1\}^{n\times s}$ into a vector $\mathbf{I}_{0}\in\{0,1\}^{ns}$. Let $H^{0}\in\{0,1\}^{ns}$ denote the input layer. The matrix $W^{0}\in[0,1]^{ns\times ns}$ holds the weights associated with the edges of the graph. We use $u^{i}$ to index the nodes in the input layer and the $\mathbf{1}_{st}$ layer, where $u\in V$ and $i\in\{1,...,s\}$. Furthermore, to keep activated nodes active for the following diffusion process, an additional link with a fixed weight $+1$ is added between each node in these two layers. The weight matrix $W^{0}$ is defined as follows.

$W^{0}_{u^{i},v^{i}}\coloneqq\begin{cases}w_{(u,v)}^{i}&\text{If $(u,v)\in E$}\\1&\text{If $u=v$}\\0&\text{Otherwise}\end{cases}$ (47)

The activation function $\sigma^{1}:\{0,1\}^{sn}\to[0,1]^{sn}$ is designed to execute the linear threshold function with the unknown thresholds. In order to simulate the phase 2 diffusion process, it is crucial to record the weight summation $\zeta^{i}_{u}$ for each cascade, which is exactly $(W^{0}H^{0})_{u^{i}}$. Therefore, the activation function $\sigma^{1}$ is defined as follows.

$\sigma^{1}(W^{0}H^{0})_{u^{i}}\coloneqq\begin{cases}(W^{0}H^{0})_{u^{i}}&\text{If $(W^{0}H^{0})_{u^{i}}\geq\theta_{u}^{i}$}\\0&\text{Otherwise}\end{cases}$ (48)

Figure 4 gives an example of the structure for this group with $2$ cascades. Obviously, the value of the nodes in the $\mathbf{1}_{st}$ layer indicates the graph status after phase $1$.

Figure 4: An example of the structure for the linear threshold layers. Suppose that there are $2$ competitive cascades. To simplify, we only draw the links with corresponding weights related to node $u$.
Lemma 4.

For $i\in\{1,...,s\}$, if $H^{1}_{u^{i}}>0$, then node $u$ in the graph is $i$-active after phase $1$; otherwise it is inactive.

$\mathbf{2}_{nd}$ layer

This layer is designed to preserve the pre-activation cascade index $i$. An additional binary node is added for each node in the $\mathbf{1}_{st}$ layer. The matrix $W^{2}\in\mathbb{R}^{2ns\times ns}$ is fixed and defined as follows.

$W^{2}_{j,k}\coloneqq\begin{cases}10^{(q+\lceil\frac{s}{10}\rceil)}&\text{If $j=2k$}\\(k\bmod s)+1&\text{If $j=2k+1$}\\0&\text{Otherwise}\end{cases}$ (49)

The activation function $\sigma^{2}$ is the identity function and is defined as follows.

$\sigma^{2}(X^{1})=X^{1}$ (50)
Figure 5: An example of the substructure of the comparison layers, assuming that there are $2$ competitive cascades.

B.0.3 Comparison layers

This group simulates the phase 2 diffusion process. Since the comparison among the weight summations $\zeta_{u}^{i}$ of the cascades happens within every $s$ nodes, the composition of these comparison layers can be seen as $n$ identical structures from the longitudinal view. An example for one user with $2$ cascades is given in Figure 5. For every $s$ nodes, we construct a $2$-$2$ comparison structure: we compare the values of every $2$ adjacent nodes and pass the larger value forward until only one node is left, so the number of nodes is reduced by half every $2$ layers. Therefore, to complete the $2$-$2$ comparison process, $2x$ layers are needed.

Across the comparison layers, the weight matrix for every $s$ nodes is identical and fixed. Since the number of nodes keeps reducing by half every $2$ layers, to simplify, we set the input size $z^{l}\in\{2^{x},2^{x-1},...,1\}$ for $l\in\{3,...,2x+2\}$. If $l\bmod 2\neq 0$, $W^{l}\in\mathbb{R}^{z^{l}\times z^{l}}$ is defined in Equation 51; otherwise, $W^{l}\in\mathbb{R}^{z^{l}\times z^{l}/2}$ is defined in Equation 52.

$W^{l}_{j,k}=\begin{cases}1&\text{If $j=k$}\\-1&\text{If $j=k+1$ and $j=2y$ for $y\in\mathbb{Z}^{+}$}\\0&\text{Otherwise}\end{cases}$ (51)
$W^{l}_{j,k}=\begin{cases}1&\text{If $j=2k$ or $j=2k-1$}\\0&\text{Otherwise}\end{cases}$ (52)

ReLU and identity functions are used in turn within this group: when $l\bmod 2\neq 0$, we use the ReLU function; otherwise, we use the identity function. For $l\in\{3,...,2x\}$, the activation function is defined as follows.

$\sigma^{l}=\begin{cases}\max\{Z^{l-1},0\}&\text{If $l\bmod 2\neq 0$}\\Z^{l-1}&\text{Otherwise}\end{cases}$ (53)

Returning to the whole structure of these $2x$ layers, by constructing the $2$-$2$ comparison for every $s$ nodes, the output $H^{2x+2}\in\mathbb{R}_{+}^{n}$ is a vector of size $n$. Each node in this layer corresponds to a node in the network, and its value indicates the activation status after phase 2 along with the corresponding weight summation.

B.0.4 Recover layers

This group contains $2$ layers and is designed to recover the format of the output. Figure 6 shows an example of the structure of this group.

$\mathbf{(2x+3)}_{th}$ layer

This layer is designed to extract the cascade index from the lowest digits of $H^{2x+2}$. We use $u\in V$ to index each node. The weight matrix $W^{2x+3}\in\{0,1\}^{n\times n}$ is defined as follows.

$W^{2x+3}_{u,v}=\begin{cases}1&\text{If $u=v$}\\0&\text{Otherwise}\end{cases}$ (54)

We use a step function (a constant polynomial with degree equal to zero) with $10^{q}s$ pieces to return the final status of each node in the graph, and it is defined as follows.

$\sigma^{2x+3}\coloneqq X^{2x+2}-\left\lfloor\frac{X^{2x+2}}{\lceil\frac{s}{10}\rceil}\right\rfloor\times\left\lceil\frac{s}{10}\right\rceil$ (55)
$\mathbf{(2x+4)}_{th}$ layer

This layer is used to convert the output to the same format as the $\mathbf{1}_{st}$ layer, and the output is $H^{2x+4}\in\{0,1\}^{sn}$. The weight matrix $W^{2x+4}\in\mathbb{R}^{n\times ns}$ is defined as follows.

$W^{2x+4}_{u,u^{i}}=\begin{cases}1&\text{for $i\in\{1,...,s\}$}\\0&\text{Otherwise}\end{cases}$ (56)

For each node $u^{i}$ in the output layer, we introduce a fixed threshold interval $\rho_{u^{i}}=(i-1,i]$. The activation function $\sigma^{2x+4}$ is defined as follows.

$\sigma^{2x+4}_{u^{i}}=\begin{cases}1&\text{If $H^{10}_{u^{i}}\in\rho_{u^{i}}$}\\0&\text{Otherwise}\end{cases}$ (57)
Figure 6: An example of the substructure of the recovery layers, assuming that there are $2$ competitive cascades.

Therefore, the output $H^{2x+4}$ of $N_{t}$ is the graph status after phase 2.

Lemma 5.

$H^{2x+4}=\mathbf{I}_{n,:,:,2}$
