
¹ IIIS, Tsinghua University, Beijing, China
  [email protected]
² Microsoft Research, Beijing, China
  [email protected]

Causal Inference for Influence Propagation — Identifiability of the Independent Cascade Model

Shi Feng¹ (ORCID 0000-0001-5517-0419)    Wei Chen²
Abstract

The independent cascade (IC) model is a widely used influence propagation model for social networks. In this paper, we incorporate concepts and techniques from causal inference to study the identifiability of parameters from observational data in an extended IC model with unobserved confounding factors, which models more realistic propagation scenarios but has rarely been studied in influence propagation modeling. We provide conditions for the identifiability or unidentifiability of parameters for several special structures, including the Markovian IC model, the semi-Markovian IC model, and the IC model with a global unobserved variable. Parameter identifiability is important for other tasks, such as influence maximization, under diffusion networks with unobserved confounding factors.

Keywords:
influence propagation · independent cascade model · identifiability · causal inference.

1 Introduction

Extensive research has been conducted on the information and influence propagation behavior in social networks, with numerous propagation models and optimization algorithms proposed (cf. [15, 2]). Social influence among individuals in a social network is intrinsically a causal behavior — one's action or behavior causes the change of the behavior of his or her friends in the network. Therefore, it is helpful to view influence propagation as a causal phenomenon and apply the tools of causal inference to this domain.

In causal inference, one key consideration is the confounding factors caused by unobserved variables that affect the observed behaviors of individuals in the network. For example, we may observe that user A adopts a new product and a while later her friend B adopts the same new product. This situation could be because A influences B and causes B’s adoption, but it could also be caused by an unobserved factor (e.g. an unknown information source) that affects both A and B. Confounding factors are important in understanding the propagation behavior in networks, but so far the vast majority of influence propagation research does not consider confounders in network propagation modeling. In this paper, we intend to fill this gap by explicitly including unobserved confounders into the model, and we borrow the research methodology from causal inference to carry out our research.

Causal inference research has developed many tools and methodologies to deal with such unobserved confounders, and one important problem in causal inference is to study the identifiability of the causal model, that is, whether we can identify the effect of an intervention, or identify causal model parameters, from observational data. In this paper, we introduce the concept of identifiability from causal inference research into influence propagation research and study whether propagation models can be identified from observational data when there are unobserved factors in the causal propagation model. We propose to extend the classical independent cascade (IC) model to include unobserved causal factors, and consider the parameter identifiability problem for several common causal graph structures. Our main results are as follows. First, for the Markovian IC model, in which each unobserved variable may affect only one observed node in the network, we show that the model is fully identifiable. Second, for the semi-Markovian IC model, in which each unobserved variable may affect exactly two observed nodes in the network, we show that as long as a certain local graph structure exists in the network, the model is not parameter identifiable. For the special case of a chain graph, where all observed nodes form a chain and every unobserved variable affects two neighbors on the chain, the above result implies that we need to know at least $n/2$ parameters to make the remaining parameters identifiable, where $n$ is the number of observed nodes in the chain. We then show a positive result that when we know $n$ parameters on the chain, the remaining parameters are identifiable. Third, for the global hidden factor model, where we have an unobserved variable that affects all observed nodes in the graph, we provide reasonable sufficient conditions under which the parameters are identifiable.

Overall, we view our work as starting a new direction that integrates the rich research results from network propagation modeling and causal inference, so that we can view influence propagation through the lens of causal inference and obtain more realistic modeling and algorithmic results in this area. For example, through the causal inference lens, the classical influence maximization problem [15] of finding a set of $k$ nodes to maximize the total influence spread is really a causal intervention problem: forcing an intervention on $k$ nodes for their adoptions and trying to maximize the causal effect of this intervention. Our study could give a new way of studying influence maximization that works under more realistic network scenarios encompassing unobserved confounders.

2 Related Work

Influence Propagation Modeling.  As described in [2], the two main models used to describe influence propagation are the independent cascade model and the linear threshold model. Past research on influence propagation has mostly focused on influence maximization problems, such as [15, 22]. These works select seed nodes online, observe the propagation in the network, and maximize the number of activated nodes after propagation by selecting optimal seed nodes. Some works also study the seed-set minimization problem, such as [11]. In contrast, in our work we mainly consider recovering the parameters of the independent cascade model by observing the network propagation. After obtaining the network parameters, we can then build on them to accomplish downstream tasks, including influence maximization and seed-set minimization.

Causal Inference and Identifiability.  For general semi-Markovian Bayesian causal graphs, [13] and [21] give two different algorithms to determine whether a do-effect is identifiable, and both algorithms are sound and complete. [14] also proves that the ID algorithm and the repeated use of the do-calculus are equivalent, so for semi-Markovian Bayesian causal graphs, the do-calculus can be used to compute all identifiable do-effects.

In addition, for a special type of causal model, the linear causal model, [4] and [8] give necessary conditions and sufficient conditions, in terms of the structure of the causal graph, for the parameters in the graph to be identifiable. However, a necessary and sufficient condition for the parameter identifiability problem has not been found, and it remains an open question. In this paper, we study another special causal model derived from the IC model. Since the IC model on a directed acyclic graph can be viewed as a Bayesian causal model with some special properties, we aim to give necessary conditions and sufficient conditions for the parameters to be identifiable under some special graph structures.

3 Model and Problem Definitions

Following the convention in the causal inference literature (e.g. [19]), we use capital letters ($U, V, X, \ldots$) to represent variables or sets of variables, and their corresponding lower-case letters to represent their values. For a directed graph, we use $U$'s and $V$'s to represent nodes, since each node will also be treated as a random variable in causal inference. For a node $V_i$, we use $N^+(V_i)$ and $N^-(V_i)$ to represent the set of its out-neighbors and in-neighbors, respectively. When the graph is a directed acyclic graph (DAG), we refer to a node's in-neighbors as its parents and denote the set as ${\it Pa}(V_i) = N^-(V_i)$. When we refer to the actual values of the parent nodes of $V_i$, we use ${\it pa}(V_i)$. For a positive integer $k$, we use $[k]$ to denote $\{1, 2, \ldots, k\}$. We use boldface letters to represent vectors, such as $\boldsymbol{r} = (r_1, r_2, \ldots, r_n) = (r_i)_{i \in [n]}$.

The classical independent cascade (IC) model [15] of influence diffusion in a social network is defined as follows. The social network is modeled as a directed graph $G = (V, E)$, where $V = \{V_1, V_2, \ldots, V_n\}$ is the set of nodes representing individuals in the social network, and $E \subseteq V \times V$ is the set of directed edges representing the influence relationships between individuals. Each edge $(V_i, V_j) \in E$ is associated with an influence probability $p(i,j) \in (0, 1]$ (we assume that $p(i,j) = 0$ if $(V_i, V_j) \notin E$). Each node is either in state 0 or state 1, representing the idle state and the active state, respectively. At time step 0, a seed set $S_0 \subseteq V$ of nodes is selected and activated (i.e. their states are set to 1), and all other nodes are in state 0. The propagation proceeds in discrete time steps $t = 1, 2, \ldots$. Let $S_t$ denote the set of nodes that are active by time $t$, and let $S_{-1} = \emptyset$. At any time $t = 1, 2, \ldots$, each newly activated node $V_i \in S_{t-1} \setminus S_{t-2}$ tries to activate each of its inactive out-neighbors $V_j \in N^+(V_i)$, and the activation is successful with probability $p(i,j)$. If successful, $V_j$ is activated at time $t$ and thus $V_j \in S_t$. The activation trial of $V_i$ on its out-neighbor $V_j$ is independent of all other activation trials. Once activated, nodes stay active, that is, $S_{t-1} \subseteq S_t$. The propagation process ends at a step when no new nodes are activated. It is easy to see that the propagation ends in at most $n-1$ steps, so we use $S_{n-1}$ to denote the final set of active nodes after the propagation.
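To make the process concrete, the cascade above can be simulated as follows; this is a minimal sketch under an assumed adjacency-dictionary representation, with function and variable names of our own choosing:

```python
import random

def ic_propagate(edges, seed_set, rng=None):
    """Simulate one IC cascade.

    edges: dict mapping node i to {out-neighbor j: p(i, j)}.
    seed_set: the initially activated set S_0.
    Returns the final active set S_{n-1}.
    """
    rng = rng or random.Random(0)
    active = set(seed_set)        # nodes active so far
    frontier = set(seed_set)      # S_{t-1} \ S_{t-2}: newly activated nodes
    while frontier:               # stops when no new node is activated
        new = set()
        for i in frontier:
            for j, p in edges.get(i, {}).items():
                # each activation trial is independent of all others
                if j not in active and j not in new and rng.random() < p:
                    new.add(j)
        active |= new
        frontier = new
    return active

# chain 1 -> 2 -> 3 with p = 1 on both edges: the whole chain activates
print(sorted(ic_propagate({1: {2: 1.0}, 2: {3: 1.0}}, {1})))  # [1, 2, 3]
```

Skipping a node already activated in the same step is equivalent in distribution to running every trial, since the node's overall activation probability $1 - \prod (1 - p)$ is unchanged.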

Influence propagation is naturally a result of causal effects — one node's activation causes the activation of its out-neighbors. If the graph is directed and acyclic, then the IC model on this graph is equivalent to a Bayesian causal model. Indeed, we can consider each node in the IC model as a variable, and for a node $V_i$, its value is determined by $P(V_i = 1 \mid {\it pa}(V_i)) = 1 - \prod_{j: V_j \in {\it Pa}(V_i),\, v_j = 1 \text{ in } {\it pa}(V_i)} (1 - p_{j,i})$. Obviously, this is equivalent to our definition of the IC model. The IC model was introduced in [15] to model influence propagation in social networks, but in general, it can model causal effects among binary random variables. In this paper, we mainly consider the directed acyclic graph (DAG) setting, which is in line with the causal graph setting in the causal inference literature [19]. We discuss the extension to general cyclic graphs or networks in the appendix.
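The conditional probability above can be written as a one-line computation; this small sketch (with illustrative names of our own) evaluates $P(V_i = 1 \mid {\it pa}(V_i))$ from the parents' states:

```python
def activation_prob(parent_probs, parent_states):
    """P(V_i = 1 | pa(V_i)) = 1 - prod over active parents of (1 - p_{j,i})."""
    escape = 1.0                       # prob. that no active parent succeeds
    for p, s in zip(parent_probs, parent_states):
        if s == 1:
            escape *= 1.0 - p
    return 1.0 - escape

print(round(activation_prob([0.5, 0.3], [1, 1]), 6))  # 1 - 0.5*0.7 = 0.65
```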

All variables $V_1, V_2, \ldots, V_n$ are observable, and we call them observed variables. They correspond to observed behaviors of individuals in the social network. There are also potentially many unobserved (or hidden) variables that affect individuals' behaviors. We use $U = \{U_1, U_2, \ldots\}$ to represent the set of unobserved variables. In the IC model, we assume each $U_i$ is a binary random variable that is 1 with probability $r_i$ and 0 with probability $1 - r_i$, and all unobserved variables are mutually independent. We allow unobserved variables $U_i$ to have directed edges pointing to the observed variables $V_j$, but we do not consider directed edges among the unobserved variables in this paper. If $U_i$ has a directed edge pointing to $V_j$, we usually use $q_{i,j}$ to represent the parameter on this edge. It has the same semantics as the $p_{i,j}$'s in the classical IC model: if $U_i = 1$, then with probability $q_{i,j}$, $U_i$ successfully influences $V_j$ by setting its state to 1, and with probability $1 - q_{i,j}$, $V_j$'s state is not affected by $U_i$; this influence or activation effect is independent of all other activation attempts on other edges. Thus, overall, in a network with unobserved or hidden variables, we use $G = (U, V, E)$ to represent the corresponding causal graph, where $U$ is the set of unobserved variables, $V$ is the set of observed variables, and $E \subseteq (V \times V) \cup (U \times V)$ is the set of directed edges. We assume that $G$ is a DAG; the state of every unobserved variable $U_i$ is sampled from $\{0, 1\}$ with parameter $r_i$, while the state of every observed variable $V_j$ is determined by the states of its parents and the parameters on the incoming edges of $V_j$, following the IC model semantics. In the DAG $G$, we refer to an observed node $V_i$ as a root if it has no observed parents in the graph.
Every root $V_i$ has at least one unobserved parent. We use vectors $\boldsymbol{p}, \boldsymbol{q}, \boldsymbol{r}$ to represent the parameter vectors associated with edges among observed variables, edges from unobserved to observed variables, and unobserved nodes, respectively. We refer to the model $M = (G = (U, V, E), \boldsymbol{p}, \boldsymbol{q}, \boldsymbol{r})$ as the causal IC model. When the distinction is needed, we use capital letters $P, Q, R$ to represent the parameter names, and lower-case boldface letters $\boldsymbol{p}, \boldsymbol{q}, \boldsymbol{r}$ to represent the parameter values.
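For small graphs, the observed distribution $P(V = v)$ of a causal IC model can be computed exactly by summing over hidden-variable assignments and multiplying the IC conditional probabilities; the sketch below (representation and names are our own illustration) does this by brute-force enumeration:

```python
from itertools import product

def observed_dist(n, r, obs_edges, hid_edges):
    """Exact observed distribution P(V = v) of a causal IC model on a DAG.

    n: number of observed nodes V_0 .. V_{n-1} (0-indexed here).
    r: list with r[k] = P(U_k = 1) for each unobserved node U_k.
    obs_edges: dict {(i, j): p_ij} for observed edges V_i -> V_j.
    hid_edges: dict {(k, j): q_kj} for edges U_k -> V_j.
    """
    dist = {}
    for v in product((0, 1), repeat=n):
        total = 0.0
        for u in product((0, 1), repeat=len(r)):
            pr = 1.0
            for k, rk in enumerate(r):          # P(U = u)
                pr *= rk if u[k] else 1.0 - rk
            for j in range(n):                  # P(v_j | parents), IC semantics
                escape = 1.0                    # prob. no parent activates V_j
                for (i, jj), p in obs_edges.items():
                    if jj == j and v[i] == 1:
                        escape *= 1.0 - p
                for (k, jj), q in hid_edges.items():
                    if jj == j and u[k] == 1:
                        escape *= 1.0 - q
                pr *= escape if v[j] == 0 else 1.0 - escape
            total += pr
        dist[v] = total
    return dist

# Markovian-style example: U_0 -> V_0 -> V_1 <- U_1, with r_k = 1
d = observed_dist(2, [1.0, 1.0], {(0, 1): 0.6}, {(0, 0): 0.4, (1, 1): 0.25})
print(round(d[(0, 0)], 6))  # (1 - 0.4) * (1 - 0.25) = 0.45
```

The correctness rests on the standard Bayesian network factorization $P(v) = \sum_u P(u) \prod_j P(v_j \mid {\it pa}(V_j))$, which holds since $G$ is a DAG.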

In this paper, we focus on the parameter identifiability problem following the causal inference literature. In the context of the IC model, the states of nodes $V = \{V_1, V_2, \ldots, V_n\}$ are observable while the states of $U = \{U_1, U_2, \ldots\}$ are unobservable. We define parameter identifiability as follows.

Definition 1 (Parameter Identifiability)

Given a graph $G = (U, V, E)$, we say that a set of IC model parameters $\Theta \subseteq P \cup Q \cup R$ on $G$ is identifiable if, after fixing the values of the parameters outside $\Theta$ and fixing the observed probability distributions $P(V' = v')$ for all $V' \subseteq V$ and all $v' \in \{0, 1\}^{|V'|}$, the values of the parameters in $\Theta$ are uniquely determined. We say that the graph $G$ is parameter identifiable if $\Theta = P \cup Q \cup R$ is identifiable. Accordingly, the algorithmic problem of parameter identifiability is to derive the unique values of the parameters in $\Theta$ given the graph $G = (U, V, E)$, the values of the parameters outside $\Theta$, and the observed probability distributions $P(V' = v')$ for all $V' \subseteq V$ and all $v' \in \{0, 1\}^{|V'|}$. Finally, if the algorithm only uses a polynomial number of observed probability values $P(V' = v')$ and runs in polynomial time, where both polynomials are with respect to the graph size, we say that the parameters in $\Theta$ are efficiently identifiable.

Note that when there are no unobserved variables (except the unique unobserved variable for each root of the graph), the problem is mainly to derive the parameters $p_{i,j}$ from all observed $P(V' = v')$'s. In this case, the parameter identifiability problem bears similarity to the well-studied network inference problem [10, 16, 9, 7, 18, 1, 3, 6, 5, 17, 20, 12]. The network inference problem focuses on using observed cascade data to derive the network structure and propagation parameters, and it emphasizes the sample complexity of inferring parameters. Hence, when there are no unobserved variables in the model, we could use network inference methods to help solve the parameter identifiability problem. However, in real social influence and network propagation, there are other hidden factors that affect the propagation and the resulting distribution. Such hidden factors are not addressed in the network inference literature. In contrast, our study in this paper focuses on addressing these hidden factors, and thus we borrow ideas from causal inference to study the identifiability problem under the IC model.

In this paper, we study three types of unobserved variables that could commonly occur in network influence propagation. They correspond to three types of IC models with unobserved variables, as summarized below.

Markovian IC Model.  In the Markovian IC model, each observed variable $V_i$ is associated with a unique unobserved variable $U_i$, and there is a directed edge from $U_i$ to $V_i$. This models the scenario where each individual in the social network has some latent and unknown factor that affects its observed behavior. We use $q_i$ to denote the parameter on the edge $(U_i, V_i)$. Note that the effect of $U_i$ on the activation of $V_i$ is determined by the probability $r_i \cdot q_i$, so we treat $r_i = 1$ for all $i \in [n]$ and focus on identifying the parameters $q_i$. Thus the graph $G = (U, V, E)$ has parameters $\boldsymbol{q} = (q_i)_{i \in [n]}$ and $\boldsymbol{p} = (p_{i,j})_{(V_i, V_j) \in E}$. Figure 1 shows an example of a Markovian IC model. If some $q_i = 0$, it means that the observed variable $V_i$ has no latent variable influencing it, and it only receives influence from other observed variables.

Figure 1: A Markovian IC model with five nodes.
Figure 2: A Markovian IC model with five nodes and a global unobserved variable.

Semi-Markovian IC Model.  The second type of unobserved variable is a hidden variable connected to exactly two observed variables in the graph. In particular, for every pair of nodes $V_i, V_j \in V$, we allow one unobserved variable $U_{i,j}$ that has two edges, one pointing to $V_i$ and the other pointing to $V_j$. This models the scenario where two individuals in the social network have a common unobserved confounder that may affect both of their behaviors. We call this type of model the semi-Markovian IC model, following the common terminology of the semi-Markovian model in the literature [19]. In this model, each $U_{i,j}$ has a parameter $r_{i,j}$, and the edges $(U_{i,j}, V_i)$ and $(U_{i,j}, V_j)$ have parameters $q_{i,j,1}$ and $q_{i,j,2}$, respectively. Therefore, the graph has parameters $\boldsymbol{r} = (r_{i,j})_{(V_i, V_j) \in E}$, $\boldsymbol{q} = (q_{i,j,1}, q_{i,j,2})_{(V_i, V_j) \in E}$, and $\boldsymbol{p} = (p_{i,j})_{(V_i, V_j) \in E}$.

Within this model, we pay special attention to a special type of graph where the observed variables form a chain, i.e. $V_1 \rightarrow V_2 \rightarrow \cdots \rightarrow V_n$, and the unobserved variables always point to two neighbors on the chain. In this case, we use $U_i$ to denote the unobserved variable associated with the edge $(V_i, V_{i+1})$, and the parameters on the edges $(U_i, V_i)$ and $(U_i, V_{i+1})$ are denoted $q_{i,1}$ and $q_{i,2}$, respectively. Figure 3 depicts this chain model.

Figure 3: The semi-Markovian IC chain model.

IC Model with a Global Unobserved Variable.  The third type of hidden variable is a global unobserved variable $U_0$ that points to all observed variables in the network. This naturally models the global causal effect where some common factor affects all or most individuals in the network. For every edge $(U_0, V_i)$, we use $q_{0,i}$ to represent its parameter.

Moreover, we can combine this model with the Markovian IC model, where we allow both an unobserved variable $U_i$ for each individual and a global unobserved variable $U_0$. Figure 2 depicts this model.

4 Parameter Identifiability of the Markovian IC Model

For the Markovian IC model in which every observed variable has its own unobserved variable, we can fully identify the model parameters in most cases, as given by the following theorem.

Theorem 4.1 (Identifiability of the Markovian IC Model)

For an arbitrary Markovian IC model $G = (U, V, E)$ with parameters $\boldsymbol{q} = (q_i)_{i \in [n]}$ and $\boldsymbol{p} = (p_{i,j})_{(V_i, V_j) \in E}$, all the $q_i$ parameters are efficiently identifiable, and for every $i \in [n]$, if $q_i \neq 1$, then all parameters $p_{j,i}$ with $(V_j, V_i) \in E$ are efficiently identifiable.

Proof

For an observed variable (node) $V_i$, suppose that its observed parents are $V_{i_1}, V_{i_2}, \ldots, V_{i_t}$. Then we have

$P(V_i = 0 \mid V_{i_1} = 0, \ldots, V_{i_t} = 0) = 1 - q_i,$ (1)
$P(V_i = 0 \mid V_{i_j} = 1, V_{i_1} = 0, \ldots, V_{i_{j-1}} = 0, V_{i_{j+1}} = 0, \ldots, V_{i_t} = 0) = (1 - q_i)(1 - p_{i_j, i}).$ (2)

From Eq. (1), we can obtain the value of $q_i$. Then, if $q_i \neq 1$, from Eq. (2) we can derive the value of $p_{i_j, i}$. Moreover, for each root node $V_i$, we can get $q_i$ by computing $q_i = P(V_i = 1)$. The computational efficiency is obvious. ∎
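As a numeric sanity check, the two equations can be inverted directly from the exact joint distribution; the two-node instance $V_1 \to V_2$ below is our own hypothetical example (with $r_1 = r_2 = 1$ as in the model setup):

```python
q1, q2, p12 = 0.4, 0.25, 0.6    # assumed ground-truth parameters (q2 != 1)

# exact joint distribution of (V1, V2) in the two-node Markovian model
P = {
    (0, 0): (1 - q1) * (1 - q2),
    (0, 1): (1 - q1) * q2,
    (1, 0): q1 * (1 - q2) * (1 - p12),
    (1, 1): q1 * (1 - (1 - q2) * (1 - p12)),
}

# Eq. (1): P(V2=0 | V1=0) = 1 - q2
q2_hat = 1 - P[(0, 0)] / (P[(0, 0)] + P[(0, 1)])
# Eq. (2): P(V2=0 | V1=1) = (1 - q2)(1 - p12)
p12_hat = 1 - (P[(1, 0)] / (P[(1, 0)] + P[(1, 1)])) / (1 - q2_hat)
# root node: q1 = P(V1 = 1)
q1_hat = P[(1, 0)] + P[(1, 1)]

print(q1_hat, q2_hat, p12_hat)  # recovers 0.4, 0.25, 0.6 (up to float error)
```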

The theorem essentially says that all parameters are identifiable under the Markovian IC model, except in the corner case where some $q_i = 1$. In this case, the observed variable $V_i$ is fully determined by its unobserved parent $U_i$, so we cannot determine the influence from the observed parents of $V_i$ on $V_i$. But this influence is not useful anyway in this case, so the edges from the observed parents of $V_i$ to $V_i$ do not affect causal inference in the graph and can be removed.

5 Parameter Identifiability of the Semi-Markovian IC Model

Following the definitions in the model section, we now consider the identifiability problem for semi-Markovian models. We will demonstrate that in most cases, this model is not parameter identifiable. From [21] we know that the semi-Markovian Bayesian causal model is also not identifiable in general; however, our conclusion does not follow from their result. On the other hand, we will show that with some parameters known in advance, the semi-Markovian IC chain model becomes identifiable.

5.1 Condition on Unidentifiability of the Semi-Markovian IC Model

More specifically, the following theorem shows the unidentifiability of the semi-Markovian IC model when a special structure is present in it.

Theorem 5.1 (Unidentifiability of the Semi-Markovian IC Model)

Suppose that in a general graph $G$ we can find the following structure: three observed nodes $V_1, V_2, V_3$ such that $(V_1, V_2) \in E$ and $(V_2, V_3) \in E$, and two unobserved nodes $U_1, U_2$ with $(U_1, V_1), (U_1, V_2), (U_2, V_2), (U_2, V_3) \in E$. Suppose that each of $U_1, U_2$ has only these two edges associated with it, and that the three nodes $V_1, V_2, V_3$ can be written adjacently in a topological order of the nodes in $U \cup V$. Then the graph $G$ is not parameter identifiable.

Figure 4 is an example of the structure described in the above theorem.

Figure 4: An example of the structure in Theorem 5.1.
Proof (Outline)

To prove that the parameters in a model with this structure are not identifiable, we directly give two different sets of parameters. We show that these two sets of parameters produce the same distribution over the nodes in $V$, and thus the parameters cannot be identified by observing only the distribution of $V$. The details of these two parameter sets and the distributions they produce are included in Appendix 0.A. ∎

5.2 Identifiability of the Chain Model

We now consider the chain model as described in Section 3 and depicted in Figure 3. For this structure, we present an identifiability result under the assumption that the values of some parameters are known as prior knowledge.

We divide the parameters of the graph into four vectors

$\boldsymbol{q}_1 = (q_{1,1}, q_{2,1}, \ldots, q_{n-1,1}), \quad \boldsymbol{q}_2 = (q_{1,2}, q_{2,2}, \ldots, q_{n-1,2}),$ (3)
$\boldsymbol{p} = (p_1, p_2, \ldots, p_{n-1}), \quad \boldsymbol{r} = (r_1, r_2, \ldots, r_{n-1}).$ (4)

For the chain model, the theorem below shows that once the parameter $p_1$ is known and either $\boldsymbol{q}_2$ or $\boldsymbol{r}$ is known, the set consisting of the remaining parameters in the chain is efficiently identifiable.

Theorem 5.2 (Identifiability of the Semi-Markovian IC Chain Model)

Suppose that we have a semi-Markovian IC chain model with graph $G = (U, V, E)$ and IC parameters $\boldsymbol{p} = (p_i)_{i \in [n-1]}$, $\boldsymbol{q}_1 = (q_{i,1})_{i \in [n-1]}$, $\boldsymbol{q}_2 = (q_{i,2})_{i \in [n-1]}$, and $\boldsymbol{r} = (r_i)_{i \in [n-1]}$, and suppose that all parameters are in the range $(0, 1)$. If the value of the parameter $p_1$ is known, and either $\boldsymbol{q}_2$ or $\boldsymbol{r}$ is known, then the remaining parameters are efficiently identifiable.

Proof (Outline)

We prove this theorem by induction. Under the assumption that $p_1$ is known and either $\boldsymbol{q}_2$ or $\boldsymbol{r}$ is known, suppose that $p_1, p_2, \ldots, p_{t-2}$, $r_1, r_2, \ldots, r_{t-2}$, $q_{1,1}, q_{2,1}, \ldots, q_{t-2,1}$, $q_{1,2}, q_{2,2}, \ldots, q_{t-2,2}$, and $r_{t-1} q_{t-1,1}$ have been determined; we then prove that $q_{t-1,1}, r_{t-1}, p_{t-1}, q_{t-1,2}$, and $r_t q_{t,1}$ can also be determined. In fact, from the distribution of the first $t$ nodes on the chain we can obtain three different equations, and after substituting the known parameters, the inductive step can be completed. It is worth noting that this inductive process can also be used to compute the unknown parameters efficiently.

The proof is lengthy because of the many corner cases considered and the need to discuss the cases $t = 2$, $2 < t < n$, and $t = n$ separately. The details of this proof are included in Appendix 0.B. ∎

According to Theorem 5.2, the semi-Markovian chain is parameter identifiable when $n$ particular parameters are known. At the same time, by Theorem 5.1, we can show that if fewer than $\lfloor\frac{n+1}{2}\rfloor$ parameters are known, then the semi-Markovian chain is not parameter identifiable. Indeed, if the chain model is parameter identifiable, then by Theorem 5.1, for each $2 \leq t \leq n-1$, at least one of the parameters $p_{t-1}, p_t, r_{t-1}, r_t, q_{t-1,1}, q_{t-1,2}, q_{t,1}$, and $q_{t,2}$ must be known. Therefore, letting $t = 2, 4, \ldots, 2\lfloor\frac{n-1}{2}\rfloor$, we can deduce that at least $\lfloor\frac{n-1}{2}\rfloor$ parameters must be known. Formally, we have the following corollary of Theorem 5.1 and Theorem 5.2.

Corollary 1

For a semi-Markovian IC chain model, if no more than $\lfloor\frac{n-1}{2}\rfloor$ parameters are known in advance, the remaining parameters are unidentifiable; if we are allowed to know $n$ parameters in advance, we can choose $p_1, \boldsymbol{q}_2$ or $p_1, \boldsymbol{r}$ to be known, and then the remaining parameters are identifiable.

6 Parameter Identifiability of Model with a Global Hidden Variable

Next, we consider the case where there is a global hidden variable in the causal IC model, as defined in Section 3. If the only hidden variable in the whole model is $U_0$, we prove that the parameters are identifiable in general. If, besides $U_0$, the model is also Markovian, that is, there are also $n$ hidden variables $U_1, \ldots, U_n$ corresponding to $V_1, V_2, \ldots, V_n$, then the parameters are identifiable if certain conditions are satisfied.

6.1 Observable IC Model with Only a Global Hidden Variable

Suppose the observed variables in the connected DAG $G = (U, V, E)$ are $V_1, V_2, \ldots, V_n$ in a topological order, and there is a global hidden variable $U_0$ with an edge from $U_0$ to each observed variable $V_i$. Suppose the activation probability of $U_0$ is $r$ and the activation probability from $U_0$ to $V_i$ is $q_i \in [0, 1)$ (naturally, $q_1 \neq 0$, and there are at least 3 nonzero $q_i$'s). We now state a theorem for this setting.

Theorem 6.1 (Identifiability of the IC Model with a Global Hidden Variable)

For an arbitrary IC model with a global hidden variable, $G = (U, V, E)$ with parameters $\boldsymbol{q} = (q_i)_{i \in [n]}$, $r$, and $\boldsymbol{p} = (p_{i,j})_{(V_i, V_j) \in E}$ such that $q_i \neq 1$, $p_{i,j} \neq 1$, and $r \neq 1$ for all $i, j \in [n]$, all the parameters in $\boldsymbol{p}$, $r$, and $\boldsymbol{q}$ are identifiable.

Proof (Outline)

We discuss this problem in two cases. The first case is that there exist two disconnected nodes $V_i, V_j$ with $i < j$ in $V$ and $q_i, q_j \neq 0$. In this case we can use $1 - q_j = \frac{P(V_1=0, \ldots, V_{i-1}=0, V_i=1, V_{i+1}=0, \ldots, V_j=0)}{P(V_1=0, \ldots, V_{i-1}=0, V_i=1, V_{i+1}=0, \ldots, V_{j-1}=0)}$ to solve for $q_j$, and then use $P(V_1=0, V_2=0, \ldots, V_j=0)$ and $P(V_1=0, V_2=0, \ldots, V_{j-1}=0)$ to solve for $r$.

After getting $r$, by taking quotients of the probabilities of propagation results, we can obtain all the parameters.

The other case is that no such $V_i, V_j$ exist. Then there must exist three nodes $V_i, V_j, V_k$ that are pairwise connected with $q_i, q_j, q_k \neq 0$. We consider the probabilities of the different possible propagation results over these three nodes when all other nodes are 0 after the propagation. From these, we can solve for $q_i, q_j, q_k$, and then solve for all the parameters by the same method as in the first case. ∎
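The quotient argument in the first case can be checked numerically on a hypothetical extreme instance where the three observed nodes have no edges among them at all (so every pair is disconnected); all parameter values below are our own assumptions:

```python
from itertools import product

r, q = 0.7, [0.5, 0.3, 0.2]      # assumed ground truth, all in (0, 1)

def joint(v):
    """Exact P(V = v) for three observed nodes with no observed edges."""
    on, off = r, 1 - r           # U_0 = 1 branch, U_0 = 0 branch
    for i, vi in enumerate(v):
        on *= q[i] if vi else 1 - q[i]
        off *= 0.0 if vi else 1.0    # with U_0 = 0, every node stays 0
    return on + off

def marg(fixed):
    """P(V_i = b for (i, b) in fixed), other nodes marginalized out."""
    return sum(joint(v) for v in product((0, 1), repeat=3)
               if all(v[i] == b for i, b in fixed.items()))

# seeing some V_i = 1 forces U_0 = 1, so quotients isolate 1 - q_j:
q3_hat = 1 - marg({1: 1, 2: 0}) / marg({1: 1})
q2_hat = 1 - marg({0: 1, 1: 0}) / marg({0: 1})
# P(V2=0, V3=0) = r(1-q2)(1-q3) + (1-r) then yields r, and finally q1:
r_hat = (1 - marg({1: 0, 2: 0})) / (1 - (1 - q2_hat) * (1 - q3_hat))
q1_hat = 1 - (marg({0: 0}) - (1 - r_hat)) / r_hat

print(q1_hat, q2_hat, q3_hat, r_hat)  # recovers 0.5, 0.3, 0.2, 0.7 (approx.)
```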

6.2 Markovian IC Model with a Global Hidden Variable (Mixed Model)

Suppose the model is $G = (U, V, E)$, where $U = \{U_0, U_1, U_2, \ldots, U_n\}$ and $V = \{V_1, V_2, \ldots, V_n\}$. Here, $V_1, V_2, \ldots, V_n$ are in a topological order. The parameters are $r_0$, $\boldsymbol{q}_0 = (q_{0,i})_{i \in [n]}$, $\boldsymbol{q} = (q_i)_{i \in [n]}$, and $\boldsymbol{p} = (p_{i,j})_{(V_i, V_j) \in E}$.

Theorem 6.2 (Identifiability of Markovian IC Model with a Global Hidden Variable (Mixed Model))

For an arbitrary Markovian IC model with a global hidden variable, $G = (U, V, E)$ with parameters $r_0$, $\boldsymbol{q}_0 = (q_{0,i})_{i \in [n]}$, $\boldsymbol{q} = (q_i)_{i \in [n]}$, and $\boldsymbol{p} = (p_{i,j})_{(V_i, V_j) \in E}$, suppose that none of the parameters equals 1. If there exist $i, j, k \in [n]$ with $i < j < k$ such that each pair among $V_i, V_j, V_k$ is disconnected and $q_{0,i}, q_{0,j}, q_{0,k} \neq 0$, then the parameters $q_{0,t}, q_t$, and $p_{t,l}$ with $l > t > k$ are identifiable. Moreover, if $V_i, V_j, V_k$ can be adjacent and consecutive in some topological order, i.e. $j = i+1$ and $k = i+2$ without loss of generality, then all the parameters are identifiable.

Proof (Outline)

Assuming that there exist $V_i, V_j, V_k$ satisfying the requirements of the theorem, we can write expressions for the joint distribution of these three variables when all other nodes with subscripts not greater than $l$ are equal to 0. In fact, from these 8 expressions, we can solve for $P(V_1=0, \ldots, V_l=0, U_0=1)$ and $P(V_1=0, \ldots, V_l=0, U_0=0)$.

Since we have P(V1=0,,Vl=0,U0=1)=rt=1l(1qt)(1q0,t)P(V_{1}=0,\cdots,V_{l}=0,U_{0}=1)=r\prod_{t=1}^{l}(1-q_{t})(1-q_{0,t}) and P(V1=0,,Vl=0,U0=0)=(1r)t=1l(1qt)P(V_{1}=0,\cdots,V_{l}=0,U_{0}=0)=(1-r)\prod_{t=1}^{l}(1-q_{t}), we will be able to obtain all the parameters very easily by dividing these equations two by two. This proof has some trivial discussion to show that this computational method does not fail due to corner cases. ∎
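The pairwise-division step can be sketched numerically. The following Python fragment uses made-up parameter values (the propagation itself is not simulated), assumes the two joint probabilities above have already been solved for every l as the outline claims, and recovers each q_l and q_{0,l} from consecutive ratios:

```python
import random

# Made-up parameters for a hypothetical instance; r is the probability
# that the global hidden variable U_0 is active.
random.seed(0)
n = 5
r = 0.6
q = [random.uniform(0.1, 0.8) for _ in range(n)]     # direct activation q_t
q0 = [random.uniform(0.1, 0.8) for _ in range(n)]    # global-variable q_{0,t}

def A(l):
    """P(V_1=0,...,V_l=0, U_0=1) = r * prod_t (1-q_t)(1-q_{0,t})."""
    out = r
    for t in range(l):
        out *= (1 - q[t]) * (1 - q0[t])
    return out

def B(l):
    """P(V_1=0,...,V_l=0, U_0=0) = (1-r) * prod_t (1-q_t)."""
    out = 1 - r
    for t in range(l):
        out *= 1 - q[t]
    return out

# Pairwise division: B(l)/B(l-1) = 1-q_l and A(l)/A(l-1) = (1-q_l)(1-q_{0,l}).
for l in range(1, n + 1):
    q_l = 1 - B(l) / B(l - 1)
    q0_l = 1 - (A(l) / A(l - 1)) / (1 - q_l)
    assert abs(q_l - q[l - 1]) < 1e-9 and abs(q0_l - q0[l - 1]) < 1e-9
```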

Notice that the parameters in this model are identifiable if and only if a special three-node structure appears in it. Intuitively, this is because such a structure lets us extract information about the parameters more easily, which is consistent with the intuition behind Theorem 5.1.

7 Conclusion

In this paper, we study the parameter identifiability of the independent cascade model in influence propagation and show conditions for identifiability or unidentifiability for several classes of causal IC model structures. We believe that incorporating unobserved confounding factors and causal inference techniques is an important next step in influence propagation research, and the identifiability of the IC model is our first step towards this goal. There are many open problems and directions in combining causal inference and propagation research. For example, seed selection and influence maximization correspond to intervention (the do-operator) in causal inference; how to compute such intervention effects in networks with unobserved confounders, and how to maximize influence under them, are very interesting research questions. In terms of identifiability, one can also investigate the identifiability of intervention effects, or whether some given intervention effects allow one to identify more such effects. One can also look into identifiability in general cyclic IC models, for which we provide some initial discussions in Appendix 0.E, but more investigation is needed.

References

  • [1] Abrahao, B., Chierichetti, F., Kleinberg, R., Panconesi, A.: Trace complexity of network inference. arXiv e-prints pp. arXiv–1308 (2013)
  • [2] Chen, W., Lakshmanan, L.V., Castillo, C.: Information and Influence Propagation in Social Networks. Morgan & Claypool Publishers (2013)
  • [3] Daneshmand, H., Gomez-Rodriguez, M., Song, L., Schölkopf, B.: Estimating diffusion network structures: Recovery conditions, sample complexity & soft-thresholding algorithm. In: ICML (2014)
  • [4] Drton, M., Foygel, R., Sullivant, S.: Global identifiability of linear structural equation models. The Annals of Statistics 39(2), 865–886 (2011)
  • [5] Du, N., Liang, Y., Balcan, M., Song, L.: Influence function learning in information diffusion networks. In: ICML 2014, Beijing, China, 21-26 June 2014 (2014)
  • [6] Du, N., Song, L., Gomez-Rodriguez, M., Zha, H.: Scalable influence estimation in continuous-time diffusion networks. In: NIPS 2013, December 5-8, 2013, Lake Tahoe, Nevada, United States. pp. 3147–3155 (2013)
  • [7] Du, N., Song, L., Smola, A.J., Yuan, M.: Learning networks of heterogeneous influence. In: NIPS 2012, December 3-6, 2012, Lake Tahoe, Nevada, United States. pp. 2789–2797 (2012)
  • [8] Foygel, R., Draisma, J., Drton, M.: Half-trek criterion for generic identifiability of linear structural equation models. The Annals of Statistics pp. 1682–1713 (2012)
  • [9] Gomez-Rodriguez, M., Balduzzi, D., Schölkopf, B.: Uncovering the temporal dynamics of diffusion networks. arXiv preprint arXiv:1105.0697 (2011)
  • [10] Gomez-Rodriguez, M., Leskovec, J., Krause, A.: Inferring networks of diffusion and influence. In: KDD’2010 (2010)
  • [11] Goyal, A., Bonchi, F., Lakshmanan, L.V., Venkatasubramanian, S.: On minimizing budget and time in influence propagation over social networks. Social network analysis and mining 3(2), 179–192 (2013)
  • [12] He, X., Xu, K., Kempe, D., Liu, Y.: Learning influence functions from incomplete observations. arXiv e-prints pp. arXiv–1611 (2016)
  • [13] Huang, Y., Valtorta, M.: Identifiability in causal bayesian networks: A sound and complete algorithm. In: AAAI. pp. 1149–1154 (2006)
  • [14] Huang, Y., Valtorta, M.: Pearl’s calculus of intervention is complete. arXiv preprint arXiv:1206.6831 (2012)
  • [15] Kempe, D., Kleinberg, J.M., Tardos, É.: Maximizing the spread of influence through a social network. In: KDD (2003)
  • [16] Myers, S.A., Leskovec, J.: On the convexity of latent social network inference. arXiv e-prints pp. arXiv–1010 (2010)
  • [17] Narasimhan, H., Parkes, D.C., Singer, Y.: Learnability of influence in networks. In: Proceedings of the 29th Annual Conference on Neural Information Processing Systems (2015)
  • [18] Netrapalli, P., Sanghavi, S.: Finding the graph of epidemic cascades. arXiv preprint arXiv:1202.1779 (2012)
  • [19] Pearl, J.: Causality. Cambridge University Press (2009), 2nd Edition
  • [20] Pouget-Abadie, J., Horel, T.: Inferring graphs from cascades: A sparse recovery framework. arXiv e-prints pp. arXiv–1505 (2015)
  • [21] Shpitser, I., Pearl, J.: Identification of joint interventional distributions in recursive semi-markovian causal models. In: Proceedings of the 21st National Conference on Artificial Intelligence, 2006. pp. 1219–1226 (2006)
  • [22] Tang, Y., Shi, Y., Xiao, X.: Influence maximization in near-linear time: A martingale approach. In: Proceedings of the 2015 ACM SIGMOD international conference on management of data. pp. 1539–1554 (2015)

Appendix

Appendix 0.A Proof for Unidentifiability of the Semi-Markovian IC Model (Theorem 5.1)

See 5.1

Proof

To prove that the parameters are unidentifiable, we construct two different parameter valuations that induce the same distribution over the values taken by the nodes in V. We assume that the parent nodes of V_{i} are V_{i,1},V_{i,2},\cdots,V_{i,t_{i}} and U_{i,1},\cdots,U_{i,s_{i}} for i=1,2,3, and the parameters are set as shown in Figure 4.

One valuation sets all the parameters to 0.5. The other sets all the parameters to 0.5 except r_{1}, r_{2}, q_{1,1}, q_{1,2}, q_{2,1} and q_{2,2}. The exceptions are set to be

r1=10r2712r210,q1,1=14r1,\displaystyle r_{1}=\frac{10r_{2}-7}{12r_{2}-10},q_{1,1}=\frac{1}{4r_{1}}, (5)
q1,2=6r258r28,q2,1=132r2,q2,2=14r2\displaystyle q_{1,2}=\frac{6r_{2}-5}{8r_{2}-8},q_{2,1}=\frac{1}{3-2r_{2}},q_{2,2}=\frac{1}{4r_{2}} (6)

where r_{2} is an arbitrary number between \frac{1}{4} and \frac{9}{14}. It then suffices to prove that, for an arbitrary distribution of the ancestors of V_{1},V_{2},V_{3} other than U_{1},U_{2}, the joint distributions of V_{1},V_{2},V_{3} are the same under the two parameter settings. This suffices because the children of V_{1},V_{2},V_{3} are then unaffected by the difference between the valuations, since the parameters determining them are identical (only the distribution of U_{1},U_{2} changes, and each of them only has two children, both in \{V_{1},V_{2},V_{3}\}). In fact, we have

P(V_{1}=1,V_{2}=1,V_{3}=1|pa(V_{1}),pa(V_{2}),pa(V_{3}))=\frac{13+3P(V_{2}=1|pa(V_{2}))+3P(V_{1}=1|pa(V_{1}))(11+5P(V_{2}=1|pa(V_{2})))}{128}, (7)
P(V_{1}=1,V_{2}=1,V_{3}=0|pa(V_{1}),pa(V_{2}),pa(V_{3}))=\frac{23+9P(V_{2}=1|pa(V_{2}))}{64}, (8)
P(V_{1}=1,V_{2}=0,V_{3}=1|pa(V_{1}),pa(V_{2}),pa(V_{3}))=\frac{3(1+5P(V_{1}=1|pa(V_{1})))(1-P(V_{2}=1|pa(V_{2})))}{128}, (9)
P(V_{1}=0,V_{2}=1,V_{3}=1|pa(V_{1}),pa(V_{2}),pa(V_{3}))=\frac{3(1-P(V_{1}=1|pa(V_{1})))(5P(V_{2}=1|pa(V_{2}))+3)}{64}, (10)
P(V_{1}=1,V_{2}=0,V_{3}=0|pa(V_{1}),pa(V_{2}),pa(V_{3}))=\frac{3(1+5P(V_{1}=1|pa(V_{1})))(1-P(V_{2}=1|pa(V_{2})))}{128}, (11)
P(V_{1}=0,V_{2}=1,V_{3}=0|pa(V_{1}),pa(V_{2}),pa(V_{3}))=\frac{3(1-P(V_{1}=1|pa(V_{1})))(3+5P(V_{2}=1|pa(V_{2})))}{64}, (12)
P(V_{1}=0,V_{2}=0,V_{3}=1|pa(V_{1}),pa(V_{2}),pa(V_{3}))=\frac{15(1-P(V_{1}=1|pa(V_{1})))(1-P(V_{2}=1|pa(V_{2})))}{64}, (13)
P(V_{1}=0,V_{2}=0,V_{3}=0|pa(V_{1}),pa(V_{2}),pa(V_{3}))=\frac{15(1-P(V_{1}=1|pa(V_{1})))(1-P(V_{2}=1|pa(V_{2})))}{64}. (14)

Notice that these equations depend only on P(V_{1}=1|pa(V_{1})) and P(V_{2}=1|pa(V_{2})), so we have proved that the two parameter settings produce the same distribution of the observed variables, and therefore G is parameter unidentifiable in this case.∎
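As a quick sanity check on the construction, the following Python fragment evaluates the alternative valuation of Equations (5)-(6) with exact rational arithmetic for a few admissible choices of r_2, and verifies that every value lies strictly between 0 and 1, so the alternative valuation is a legal parameter setting:

```python
from fractions import Fraction

# Exact-arithmetic check of Equations (5)-(6); note that r_2 = 1/2 would
# reproduce the all-0.5 setting, so distinct r_2 values are chosen below.
def alt_params(r2):
    r1 = (10 * r2 - 7) / (12 * r2 - 10)
    q11 = 1 / (4 * r1)
    q12 = (6 * r2 - 5) / (8 * r2 - 8)
    q21 = 1 / (3 - 2 * r2)
    q22 = 1 / (4 * r2)
    return [r1, q11, q12, q21, q22]

for r2 in [Fraction(3, 10), Fraction(2, 5), Fraction(3, 5)]:
    vals = alt_params(r2)
    assert all(0 < v < 1 for v in vals), (r2, vals)
```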

Appendix 0.B Proof for the Chain Structure (Theorem 5.2)

See 5.2

Proof

In the analysis, we use the following notations. For observed nodes V1,V2,V_{1},V_{2}, ,Vt\ldots,V_{t}, we use V1Vt¯\overline{V_{1}\cdots V_{t}} to represent their collective states as a bit string. For a bit string γ\gamma of length tt, we write

aγ=P(V1Vt¯=γ),bγ=P(V1Vt¯=γ,Ut=0),\displaystyle a^{\gamma}=P(\overline{V_{1}\cdots V_{t}}=\gamma),b^{\gamma}=P(\overline{V_{1}\cdots V_{t}}=\gamma,U_{t}=0), (24)
cγ=P(V1Vt¯=γ,Ut=1).\displaystyle c^{\gamma}=P(\overline{V_{1}\cdots V_{t}}=\gamma,U_{t}=1).

Note that aγa^{\gamma} is observable, but bγb^{\gamma} and cγc^{\gamma} are not observable.

We prove this result by induction. In the base step, we show that p_{1}, r_{1}, q_{1,1}, q_{1,2} and q_{2,1}r_{2} can be determined. In the induction step, we show that if p_{1},p_{2},\cdots,p_{t-2}, r_{1},r_{2},\cdots,r_{t-2}, q_{1,1},q_{2,1},\cdots,q_{t-2,1} and q_{1,2},q_{2,2},\cdots,q_{t-2,2}, r_{t-1}q_{t-1,1} are already known, then we can compute q_{t-1,1},r_{t-1},p_{t-1},q_{t-1,2} and r_{t}q_{t,1} from the distributions of the observed variables.

To carry out the induction step, we divide the problem into two cases: t=n and 3\leq t\leq n-1.

First, if t=n, we only need to compute q_{t-1,1},r_{t-1},p_{t-1},q_{t-1,2}. We have the following relations between a^{\overline{\gamma 0}} and other known quantities. If \gamma[-1]=0, we have

aγ0¯\displaystyle a^{\overline{\gamma 0}} =bγ+(1qt1,2)cγ\displaystyle=b^{\gamma}+(1-q_{t-1,2})c^{\gamma} (25)
=1rt11qt1,1rt1aγ+rt1(1qt1,1)1qt1,1rt1(1qt1,2)aγ\displaystyle=\frac{1-r_{t-1}}{1-q_{t-1,1}r_{t-1}}a^{\gamma}+\frac{r_{t-1}(1-q_{t-1,1})}{1-q_{t-1,1}r_{t-1}}(1-q_{t-1,2})a^{\gamma}
=aγqt1,2rt1(1qt1,1)1qt1,1rt1aγ\displaystyle=a^{\gamma}-\frac{q_{t-1,2}r_{t-1}(1-q_{t-1,1})}{1-q_{t-1,1}r_{t-1}}a^{\gamma}

If γ[1]=1\gamma[-1]=1 and γ=β1¯,β[1]=0\gamma=\overline{\beta 1},\beta[-1]=0, we have

aγ0¯\displaystyle a^{\overline{\gamma 0}} =(1pt1)(aγqt1,2cγ)\displaystyle=(1-p_{t-1})(a^{\gamma}-q_{t-1,2}c^{\gamma}) (26)
=(1pt1)(aγqt1,2rt1(aβ(1qt1,1)(aβqt2,2cβ)))\displaystyle=(1-p_{t-1})(a^{\gamma}-q_{t-1,2}r_{t-1}(a^{\beta}-(1-q_{t-1,1})(a^{\beta}-q_{t-2,2}c^{\beta})))

If γ[1]=1\gamma[-1]=1 and β[1]=1\beta[-1]=1, we have

aγ0¯\displaystyle a^{\overline{\gamma 0}} =(1pt1)(aγqt1,2cγ)\displaystyle=(1-p_{t-1})(a^{\gamma}-q_{t-1,2}c^{\gamma}) (27)
=(1pt1)\displaystyle=(1-p_{t-1})
(aγqt1,2rt1(aβ(1pt2)(1qt1,1)(aβqt2,2cβ)))\displaystyle(a^{\gamma}-q_{t-1,2}r_{t-1}(a^{\beta}-(1-p_{t-2})(1-q_{t-1,1})(a^{\beta}-q_{t-2,2}c^{\beta})))

Note that b^{\beta} and c^{\beta} are both determined by parameters we already know; hence, the three types of recursions can be viewed as functions of q_{t-1,2}, q_{t-1,2}r_{t-1} and p_{t-1}. We only need to show that these three parameters can be solved from these functions. To simplify the equations, we define the following notations, where |\beta|=t-2.

Co1(β)={aβ+(1pt2)(aβqt2,2cβ)β[1]=1aβ+(aβqt2,2cβ)β[1]=0\displaystyle\text{Co}_{1}(\beta)=\begin{cases}-a^{\beta}+(1-p_{t-2})(a^{\beta}-q_{t-2,2}c^{\beta})&\beta[-1]=1\\ -a^{\beta}+(a^{\beta}-q_{t-2,2}c^{\beta})&\beta[-1]=0\end{cases} (28)
Co2(β)={rt1qt1,1(1pt2)(aβqt2,2cβ)β[1]=1rt1qt1,1(aβqt2,2cβ)β[1]=0\displaystyle\text{Co}_{2}(\beta)=\begin{cases}-r_{t-1}q_{t-1,1}(1-p_{t-2})(a^{\beta}-q_{t-2,2}c^{\beta})&\beta[-1]=1\\ -r_{t-1}q_{t-1,1}(a^{\beta}-q_{t-2,2}c^{\beta})&\beta[-1]=0\end{cases}

It is easy to verify that Co1(β)+Co2(β)=aβ1¯\text{Co}_{1}(\beta)+\text{Co}_{2}(\beta)=-a^{\overline{\beta 1}}. According to these, we have

aβ110¯aβ210¯\displaystyle\frac{a^{\overline{\beta_{1}10}}}{a^{\overline{\beta_{2}10}}} =Co1(β1)qt1,2rt1+Co2(β1)qt1,2Co1(β1)Co2(β1)Co1(β2)qt1,2rt1+Co2(β2)qt1,2Co1(β2)Co2(β2)\displaystyle=\frac{\text{Co}_{1}(\beta_{1})q_{t-1,2}r_{t-1}+\text{Co}_{2}(\beta_{1})q_{t-1,2}-\text{Co}_{1}(\beta_{1})-\text{Co}_{2}(\beta_{1})}{\text{Co}_{1}(\beta_{2})q_{t-1,2}r_{t-1}+\text{Co}_{2}(\beta_{2})q_{t-1,2}-\text{Co}_{1}(\beta_{2})-\text{Co}_{2}(\beta_{2})} (29)

which is equivalent to

(Co1(β1)aβ110¯Co1(β2)aβ210¯)qt1,2rt1+(Co2(β1)aβ110¯Co2(β2)aβ210¯)qt1,2\displaystyle(\frac{\text{Co}_{1}(\beta_{1})}{a^{\overline{\beta_{1}10}}}-\frac{\text{Co}_{1}(\beta_{2})}{a^{\overline{\beta_{2}10}}})q_{t-1,2}r_{t-1}+(\frac{\text{Co}_{2}(\beta_{1})}{a^{\overline{\beta_{1}10}}}-\frac{\text{Co}_{2}(\beta_{2})}{a^{\overline{\beta_{2}10}}})q_{t-1,2} (30)
(Co1(β1)+Co2(β1)aβ110¯Co1(β2)+Co2(β2)aβ210¯)=0\displaystyle-(\frac{\text{Co}_{1}(\beta_{1})+\text{Co}_{2}(\beta_{1})}{a^{\overline{\beta_{1}10}}}-\frac{\text{Co}_{1}(\beta_{2})+\text{Co}_{2}(\beta_{2})}{a^{\overline{\beta_{2}10}}})=0

To show the relation between the coefficients of q_{t-1,2}r_{t-1} and q_{t-1,2}, we first prove two lemmas.

Lemma 1

For two arbitrary 0\text{-}1 strings \beta_{1},\beta_{2} of the same length, the equation \frac{a^{\overline{\beta_{2}0}}}{c^{\overline{\beta_{2}0}}}=\frac{a^{\overline{\beta_{1}1}}}{c^{\overline{\beta_{1}1}}} cannot hold.

Proof

Actually, if aβ20¯cβ20¯=aβ11¯cβ11¯\frac{a^{\overline{\beta_{2}0}}}{c^{\overline{\beta_{2}0}}}=\frac{a^{\overline{\beta_{1}1}}}{c^{\overline{\beta_{1}1}}}, we have bβ11¯cβ11¯=bβ20¯cβ20¯\frac{b^{\overline{\beta_{1}1}}}{c^{\overline{\beta_{1}1}}}=\frac{b^{\overline{\beta_{2}0}}}{c^{\overline{\beta_{2}0}}}. We can verify that

bβ20¯cβ20¯\displaystyle\frac{b^{\overline{\beta_{2}0}}}{c^{\overline{\beta_{2}0}}} =1rt1rt1(1qt1,1)\displaystyle=\frac{1-r_{t-1}}{r_{t-1}(1-q_{t-1,1})} (31)
=bβ11¯cβ11¯\displaystyle=\frac{b^{\overline{\beta_{1}1}}}{c^{\overline{\beta_{1}1}}}
={1rt1rt1bβ1+cβ1((1qt2,2)cβ1+bβ1)bβ1+cβ1(1qt1,1)(bβ1+(1qt2,2)cβ1)β1[1]=01rt1rt1bβ1+cβ1(1pt2)((1qt2,2)cβ1+bβ1)bβ1+cβ1(1pt2)(1qt1,1)(bβ1+(1qt2,2)cβ1)β1[1]=1\displaystyle=\begin{cases}\frac{1-r_{t-1}}{r_{t-1}}\frac{b^{\beta_{1}}+c^{\beta_{1}}-((1-q_{t-2,2})c^{\beta_{1}}+b^{\beta_{1}})}{b^{\beta_{1}}+c^{\beta_{1}}-(1-q_{t-1,1})(b^{\beta_{1}}+(1-q_{t-2,2})c^{\beta_{1}})}&\beta_{1}[-1]=0\\ \frac{1-r_{t-1}}{r_{t-1}}\frac{b^{\beta_{1}}+c^{\beta_{1}}-(1-p_{t-2})((1-q_{t-2,2})c^{\beta_{1}}+b^{\beta_{1}})}{b^{\beta_{1}}+c^{\beta_{1}}-(1-p_{t-2})(1-q_{t-1,1})(b^{\beta_{1}}+(1-q_{t-2,2})c^{\beta_{1}})}&\beta_{1}[-1]=1\end{cases}

which is impossible when all the terms are nonzero.∎

Lemma 2

As we defined above, we have the following equation if the denominators are not zero:

Co1(β1)aβ110¯Co1(β2)aβ210¯Co2(β1)aβ110¯Co2(β2)aβ210¯=qt1,21qt1,2rt11\displaystyle\frac{\frac{\text{Co}_{1}(\beta_{1})}{a^{\overline{\beta_{1}10}}}-\frac{\text{Co}_{1}(\beta_{2})}{a^{\overline{\beta_{2}10}}}}{\frac{\text{Co}_{2}(\beta_{1})}{a^{\overline{\beta_{1}10}}}-\frac{\text{Co}_{2}(\beta_{2})}{a^{\overline{\beta_{2}10}}}}=-\frac{q_{t-1,2}-1}{q_{t-1,2}r_{t-1}-1} (32)
Proof

Actually, we can compute the left hand side of Equation 32 as below:

Co1(β1)aβ110¯Co1(β2)aβ210¯Co2(β1)aβ110¯Co2(β2)aβ210¯\displaystyle\frac{\frac{\text{Co}_{1}(\beta_{1})}{a^{\overline{\beta_{1}10}}}-\frac{\text{Co}_{1}(\beta_{2})}{a^{\overline{\beta_{2}10}}}}{\frac{\text{Co}_{2}(\beta_{1})}{a^{\overline{\beta_{1}10}}}-\frac{\text{Co}_{2}(\beta_{2})}{a^{\overline{\beta_{2}10}}}} (33)
=Co1(β1)Co1(β1)(qt1,2rt11)+Co2(β1)(qt1,21)Co1(β2)Co1(β2)(qt1,2rt11)+Co2(β2)(qt1,21)Co2(β1)Co1(β1)(qt1,2rt11)+Co2(β1)(qt1,21)Co2(β2)Co1(β2)(qt1,2rt11)+Co2(β2)(qt1,21)\displaystyle=\frac{\frac{\text{Co}_{1}(\beta_{1})}{\text{Co}_{1}(\beta_{1})(q_{t-1,2}r_{t-1}-1)+\text{Co}_{2}(\beta_{1})(q_{t-1,2}-1)}-\frac{\text{Co}_{1}(\beta_{2})}{\text{Co}_{1}(\beta_{2})(q_{t-1,2}r_{t-1}-1)+\text{Co}_{2}(\beta_{2})(q_{t-1,2}-1)}}{\frac{\text{Co}_{2}(\beta_{1})}{\text{Co}_{1}(\beta_{1})(q_{t-1,2}r_{t-1}-1)+\text{Co}_{2}(\beta_{1})(q_{t-1,2}-1)}-\frac{\text{Co}_{2}(\beta_{2})}{\text{Co}_{1}(\beta_{2})(q_{t-1,2}r_{t-1}-1)+\text{Co}_{2}(\beta_{2})(q_{t-1,2}-1)}}
=(Co1(β1)Co2(β2)Co1(β2)Co2(β1))(qt1,21)(Co1(β2)Co2(β1)Co1(β1)Co2(β2))(qt1,2rt11)\displaystyle=\frac{(\text{Co}_{1}(\beta_{1})\text{Co}_{2}(\beta_{2})-\text{Co}_{1}(\beta_{2})\text{Co}_{2}(\beta_{1}))(q_{t-1,2}-1)}{(\text{Co}_{1}(\beta_{2})\text{Co}_{2}(\beta_{1})-\text{Co}_{1}(\beta_{1})\text{Co}_{2}(\beta_{2}))(q_{t-1,2}r_{t-1}-1)}
=qt1,21qt1,2rt11\displaystyle=-\frac{q_{t-1,2}-1}{q_{t-1,2}r_{t-1}-1}

So the lemma is proved. Note that this lemma holds not only for t=n but also for t=2,3,\cdots,n-1, since \text{Co}_{1} and \text{Co}_{2} are defined exactly as in Equation 28.∎
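The collapse in Lemma 2 can be spot-checked with exact arithmetic. In the following sketch the Co values are made up, and q and r play the roles of q_{t-1,2} and r_{t-1}:

```python
from fractions import Fraction as F

# Made-up (Co_1, Co_2) pairs for beta_1 and beta_2; they must not be
# proportional, which is exactly the nondegeneracy needed in the proof.
q, r = F(1, 3), F(2, 5)
co = [(F(3), F(5)), (F(2), F(7))]
# From Equation (29): a^{beta 1 0} = Co_1 (q r - 1) + Co_2 (q - 1).
a = [c1 * (q * r - 1) + c2 * (q - 1) for c1, c2 in co]
lhs = (co[0][0] / a[0] - co[1][0] / a[1]) / (co[0][1] / a[0] - co[1][1] / a[1])
# The quotient of coefficient differences collapses as Lemma 2 claims.
assert lhs == -(q - 1) / (q * r - 1)
```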

Now we prove that Equation 25 and Equation 30 are two linearly independent equations in the unknown variables q_{t-1,2}r_{t-1} and q_{t-1,2}. It suffices to show that the ratios of the coefficients of q_{t-1,2}r_{t-1} and q_{t-1,2} differ between the two equations, for otherwise we would have

qt1,21qt1,2rt11=1qt1,1rt1\displaystyle-\frac{q_{t-1,2}-1}{q_{t-1,2}r_{t-1}-1}=-\frac{1}{q_{t-1,1}r_{t-1}} (34)

which is impossible because q_{t-1,1}r_{t-1}(q_{t-1,2}-1)-(q_{t-1,2}r_{t-1}-1)=(1-q_{t-1,2}r_{t-1})(1-q_{t-1,1}r_{t-1})+(1-r_{t-1})q_{t-1,1}q_{t-1,2}r_{t-1}>0. Thus, we have proved that when t=n, we can compute q_{t-1,1},r_{t-1},p_{t-1},q_{t-1,2} once the other parameters are known.
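The argument for the t=n step can be summarized as solving a 2x2 linear system. The sketch below uses made-up parameter values, synthesizes the coefficients of Equations (25) and (30) directly from the model quantities, and recovers the unknowns by Cramer's rule:

```python
from fractions import Fraction as F

# Writing x = q_{t-1,2} r_{t-1}, y = q_{t-1,2} and m = q_{t-1,1} r_{t-1}
# (known), Equation (25) becomes x/(1-m) - m y/(1-m) = 1 - a^{gamma 0}/a^gamma,
# since q_{t-1,2} r_{t-1} (1 - q_{t-1,1}) = x - m y.
q1, r, q2 = F(2, 5), F(1, 2), F(3, 7)        # q_{t-1,1}, r_{t-1}, q_{t-1,2}
m, x, y = q1 * r, q2 * r, q2

c11, c12 = F(1) / (1 - m), -m / (1 - m)
d1 = c11 * x + c12 * y                        # plays the observed quantity

# Equation (30), from two made-up (Co_1, Co_2) pairs; by Equation (29),
# a^{beta 1 0} is proportional to Co_1 (x - 1) + Co_2 (y - 1).
co = [(F(3), F(5)), (F(2), F(7))]
a = [c1 * (x - 1) + c2 * (y - 1) for c1, c2 in co]
c21 = co[0][0] / a[0] - co[1][0] / a[1]
c22 = co[0][1] / a[0] - co[1][1] / a[1]
d2 = (co[0][0] + co[0][1]) / a[0] - (co[1][0] + co[1][1]) / a[1]

# Cramer's rule on the 2x2 system; then r_{t-1} = x/y, q_{t-1,1} = m/r_{t-1}.
det = c11 * c22 - c12 * c21
x_hat = (d1 * c22 - c12 * d2) / det
y_hat = (c11 * d2 - c21 * d1) / det
assert (x_hat, y_hat) == (x, y) and x_hat / y_hat == r
```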

Now we consider the case 3\leq t\leq n-1. We need to prove that if t<n, we can use p_{1},p_{2},\cdots,p_{t-2}, r_{1},r_{2},\cdots,r_{t-2}, q_{1,1},q_{2,1},\cdots,q_{t-2,1} and q_{1,2},q_{2,2},\cdots,q_{t-2,2}, r_{t-1}q_{t-1,1} to compute q_{t-1,1},r_{t-1},p_{t-1},q_{t-1,2} and r_{t}q_{t,1}. First, we state a lemma describing the recurrence relations of the distributions.

Lemma 3

Suppose γ\gamma is a 010-1 string with t1t-1 items, then we have the following relations:

bγ1¯={(1rt)(aγ(1pt1)(bγ+(1qt1,2)cγ))γ[1]=1(1rt)(aγ(bγ+(1qt1,2)cγ))γ[1]=0\displaystyle b^{\overline{\gamma 1}}=\begin{cases}(1-r_{t})(a^{\gamma}-(1-p_{t-1})(b^{\gamma}+(1-q_{t-1,2})c^{\gamma}))&\gamma[-1]=1\\ (1-r_{t})(a^{\gamma}-(b^{\gamma}+(1-q_{t-1,2})c^{\gamma}))&\gamma[-1]=0\end{cases} (35)
bγ0¯={(1rt)(1pt1)(bγ+(1qt1,2)cγ)γ[1]=1(1rt)(bγ+(1qt1,2)cγ)γ[1]=0\displaystyle b^{\overline{\gamma 0}}=\begin{cases}(1-r_{t})(1-p_{t-1})(b^{\gamma}+(1-q_{t-1,2})c^{\gamma})&\gamma[-1]=1\\ (1-r_{t})(b^{\gamma}+(1-q_{t-1,2})c^{\gamma})&\gamma[-1]=0\end{cases} (36)
cγ1¯={rt(aγ(1pt1)(1qt,1)(bγ+(1qt1,2)cγ))γ[1]=1rt(aγ(1qt,1)(bγ+(1qt1,2)cγ))γ[1]=0\displaystyle c^{\overline{\gamma 1}}=\begin{cases}r_{t}(a^{\gamma}-(1-p_{t-1})(1-q_{t,1})(b^{\gamma}+(1-q_{t-1,2})c^{\gamma}))&\gamma[-1]=1\\ r_{t}(a^{\gamma}-(1-q_{t,1})(b^{\gamma}+(1-q_{t-1,2})c^{\gamma}))&\gamma[-1]=0\end{cases} (37)
cγ0¯={rt(1pt1)(1qt,1)(bγ+(1qt1,2)cγ)γ[1]=1rt(1qt,1)(bγ+(1qt1,2)cγ)γ[1]=0\displaystyle c^{\overline{\gamma 0}}=\begin{cases}r_{t}(1-p_{t-1})(1-q_{t,1})(b^{\gamma}+(1-q_{t-1,2})c^{\gamma})&\gamma[-1]=1\\ r_{t}(1-q_{t,1})(b^{\gamma}+(1-q_{t-1,2})c^{\gamma})&\gamma[-1]=0\end{cases} (38)

Moreover, we have the following relations between a^{\overline{\gamma 0}} and other known quantities. If \gamma[-1]=0, we have

aγ0¯\displaystyle a^{\overline{\gamma 0}} =(1rtqt,1)(aγqt1,2rt1(1qt1,1)1qt1,1rt1aγ)\displaystyle=(1-r_{t}q_{t,1})(a^{\gamma}-\frac{q_{t-1,2}r_{t-1}(1-q_{t-1,1})}{1-q_{t-1,1}r_{t-1}}a^{\gamma}) (39)

If γ[1]=1\gamma[-1]=1 and γ=β1¯,β[1]=0\gamma=\overline{\beta 1},\beta[-1]=0, we have

aγ0¯\displaystyle a^{\overline{\gamma 0}} =(1rtqt,1)(1pt1)(aγqt1,2cγ)\displaystyle=(1-r_{t}q_{t,1})(1-p_{t-1})(a^{\gamma}-q_{t-1,2}c^{\gamma}) (40)
=(1rtqt,1)(1pt1)\displaystyle=(1-r_{t}q_{t,1})(1-p_{t-1})
(aγqt1,2rt1(aβ(1qt1,1)(aβqt2,2cβ)))\displaystyle(a^{\gamma}-q_{t-1,2}r_{t-1}(a^{\beta}-(1-q_{t-1,1})(a^{\beta}-q_{t-2,2}c^{\beta})))

If γ[1]=1\gamma[-1]=1 and β[1]=1\beta[-1]=1, we have

aγ0¯\displaystyle a^{\overline{\gamma 0}} =(1rtqt,1)(1pt1)(aγqt1,2cγ)\displaystyle=(1-r_{t}q_{t,1})(1-p_{t-1})(a^{\gamma}-q_{t-1,2}c^{\gamma}) (41)
=(1rtqt,1)(1pt1)\displaystyle=(1-r_{t}q_{t,1})(1-p_{t-1})
(aγqt1,2rt1(aβ(1pt2)(1qt1,1)(aβqt2,2cβ)))\displaystyle(a^{\gamma}-q_{t-1,2}r_{t-1}(a^{\beta}-(1-p_{t-2})(1-q_{t-1,1})(a^{\beta}-q_{t-2,2}c^{\beta})))

Each of these equations contains only one extra factor compared to the t=n case, so we can still obtain the analogous equation

(Co1(β1)aβ110¯Co1(β2)aβ210¯)qt1,2rt1+(Co2(β1)aβ110¯Co2(β2)aβ210¯)qt1,2\displaystyle(\frac{\text{Co}_{1}(\beta_{1})}{a^{\overline{\beta_{1}10}}}-\frac{\text{Co}_{1}(\beta_{2})}{a^{\overline{\beta_{2}10}}})q_{t-1,2}r_{t-1}+(\frac{\text{Co}_{2}(\beta_{1})}{a^{\overline{\beta_{1}10}}}-\frac{\text{Co}_{2}(\beta_{2})}{a^{\overline{\beta_{2}10}}})q_{t-1,2} (42)
(Co1(β1)+Co2(β1)aβ110¯Co1(β2)+Co2(β2)aβ210¯)=0\displaystyle-(\frac{\text{Co}_{1}(\beta_{1})+\text{Co}_{2}(\beta_{1})}{a^{\overline{\beta_{1}10}}}-\frac{\text{Co}_{1}(\beta_{2})+\text{Co}_{2}(\beta_{2})}{a^{\overline{\beta_{2}10}}})=0

which is an equation in q_{t-1,2}r_{t-1} and q_{t-1,2}. Since we assume that r_{t-1} or q_{t-1,2} is known, we can obtain both r_{t-1} and q_{t-1,2} from this equation. We can then obtain r_{t}q_{t,1} from Equation 39 via r_{t}q_{t,1}=1-\frac{a^{\overline{\gamma 0}}}{a^{\gamma}-\frac{q_{t-1,2}r_{t-1}(1-q_{t-1,1})}{1-q_{t-1,1}r_{t-1}}a^{\gamma}}. Then all the terms in Equation 40 except p_{t-1} are known and nonzero, so we can obtain p_{t-1}. This completes the induction step.

Finally, we consider the base case. We only need to prove that if p_{1} and one of r_{1} and q_{1,2} are known, then q_{1,1}, r_{2}q_{2,1} and the remaining one of r_{1} and q_{1,2} can be computed using a^{\overline{00}}, a^{\overline{01}} and a^{\overline{10}}. We have the following equations

a^{\overline{00}}=(1-r_{1})(1-r_{2}q_{2,1})+r_{1}(1-q_{1,1})(1-q_{1,2})(1-r_{2}q_{2,1}), (43)
a^{\overline{10}}=r_{1}q_{1,1}(1-q_{1,2})(1-r_{2}q_{2,1})(1-p_{1}), (44)
a^{\overline{01}}=(1-r_{1})r_{2}q_{2,1}+r_{1}(1-q_{1,1})(1-(1-q_{1,2})(1-r_{2}q_{2,1})). (45)

When we know p_{1} and r_{1}, we can get the other parameters as below (first q_{1,1}, then r_{2}q_{2,1}, and finally q_{1,2}):

q_{1,1}=\frac{1-a^{\overline{00}}-a^{\overline{01}}}{r_{1}}, (46)
r_{2}q_{2,1}=((a^{\overline{00}})^{2}(p_{1}-1)+(1-r_{1})(p_{1}+a^{\overline{10}}-1)-a^{\overline{01}}(a^{\overline{10}}-1+p_{1}+r_{1}-p_{1}r_{1}) (47)
-a^{\overline{00}}(-2+a^{\overline{01}}+a^{\overline{10}}+2p_{1}-a^{\overline{01}}p_{1}+r_{1}-p_{1}r_{1})) (48)
/((a^{\overline{00}}+a^{\overline{01}}-1)(p_{1}-1)(r_{1}-1)), (49)
q_{1,2}=1-\frac{a^{\overline{10}}}{(1-p_{1})q_{1,1}r_{1}(1-q_{2,1}r_{2})}. (50)
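These closed forms can be checked by simulating a^{00}, a^{10}, a^{01} from Equations (43)-(45). The sketch below uses made-up parameter values (s stands for r_2 q_{2,1}) and recovers q_{1,1}, r_2 q_{2,1} and q_{1,2} when p_1 and r_1 are known, using the same algebra in unexpanded form:

```python
from fractions import Fraction as F

# Made-up parameters; s stands for r_2 q_{2,1}.
r1, q11, q12, s, p1 = F(1, 2), F(1, 4), F(1, 2), F(3, 10), F(2, 5)

# Simulate the observed quantities from Equations (43)-(45).
a00 = (1 - r1) * (1 - s) + r1 * (1 - q11) * (1 - q12) * (1 - s)
a10 = r1 * q11 * (1 - q12) * (1 - s) * (1 - p1)
a01 = (1 - r1) * s + r1 * (1 - q11) * (1 - (1 - q12) * (1 - s))

# Recovery: a^{00} + a^{01} = 1 - r_1 q_{1,1} gives q_{1,1}; Equation (44)
# gives the product (1-q_{1,2})(1-s); Equation (43) separates the factors.
q11_hat = (1 - a00 - a01) / r1
pair = a10 / (r1 * q11_hat * (1 - p1))                     # (1-q_{1,2})(1-s)
one_minus_s = (a00 - (1 - q11_hat) / q11_hat * a10 / (1 - p1)) / (1 - r1)
s_hat = 1 - one_minus_s
q12_hat = 1 - pair / one_minus_s
assert (q11_hat, s_hat, q12_hat) == (q11, s, q12)
```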

When we know p_{1} and q_{1,2}, we can get the other parameters as below:

r_{1}=\frac{1}{a^{\overline{10}}q_{1,2}}(-a^{\overline{00}}+(a^{\overline{00}})^{2}+a^{\overline{00}}a^{\overline{01}}+a^{\overline{00}}a^{\overline{10}}+a^{\overline{01}}a^{\overline{10}}+a^{\overline{00}}p_{1}-(a^{\overline{00}})^{2}p_{1} (51)
-a^{\overline{00}}a^{\overline{01}}p_{1}+a^{\overline{00}}q_{1,2}-(a^{\overline{00}})^{2}q_{1,2}-a^{\overline{00}}a^{\overline{01}}q_{1,2}-a^{\overline{00}}p_{1}q_{1,2}+a^{\overline{10}}q_{1,2}-a^{\overline{00}}a^{\overline{10}}q_{1,2} (52)
-a^{\overline{01}}a^{\overline{10}}q_{1,2}+(a^{\overline{00}})^{2}p_{1}q_{1,2}+a^{\overline{00}}a^{\overline{01}}p_{1}q_{1,2}), (53)
q_{1,1}=\frac{1-a^{\overline{00}}-a^{\overline{01}}}{r_{1}}, (54)
r_{2}q_{2,1}=\frac{-1+a^{\overline{10}}+p_{1}+(a^{\overline{00}}+a^{\overline{01}})(-1+p_{1})(-1+q_{1,2})+q_{1,2}-p_{1}q_{1,2}}{(-1+a^{\overline{00}}+a^{\overline{01}})(-1+p_{1})(-1+q_{1,2})}. (55)
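The symmetric case admits the same kind of check. With m = r_1 q_{1,1} = 1 - a^{00} - a^{01}, Equation (44) yields 1 - r_2 q_{2,1} and Equation (43) then isolates r_1; the sketch below verifies this with made-up parameter values:

```python
from fractions import Fraction as F

# Made-up parameters; s stands for r_2 q_{2,1}.
r1, q11, q12, s, p1 = F(1, 2), F(1, 4), F(1, 2), F(3, 10), F(2, 5)

# Simulate the observed quantities from Equations (43)-(45).
a00 = (1 - r1) * (1 - s) + r1 * (1 - q11) * (1 - q12) * (1 - s)
a10 = r1 * q11 * (1 - q12) * (1 - s) * (1 - p1)
a01 = (1 - r1) * s + r1 * (1 - q11) * (1 - (1 - q12) * (1 - s))

# Recovery when p_1 and q_{1,2} are known.
m = 1 - a00 - a01                                # equals r_1 q_{1,1}
one_minus_s = a10 / (m * (1 - q12) * (1 - p1))   # from Equation (44)
r1_hat = (1 - m * (1 - q12) - a00 / one_minus_s) / q12   # from Equation (43)
assert (r1_hat, m / r1_hat, 1 - one_minus_s) == (r1, q11, s)
```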

Notice that all the denominators in these solutions are nonzero; therefore, we have proved the base case. This completes the proof of the whole theorem.∎

Appendix 0.C Proof for the Model with a Global Hidden Variable (Theorem 6.1)

See 6.1

Proof

We have the following equations

P(V1=0,V2=0,,Vt=0)=(1q1)(1q2)(1qt)r+(1r),\displaystyle P(V_{1}=0,V_{2}=0,\cdots,V_{t}=0)=(1-q_{1})(1-q_{2})\cdots(1-q_{t})r+(1-r), (56)
P(V1=0,V2=0,,Vt=1,Vt+1=0,,Vt01=0)\displaystyle P(V_{1}=0,V_{2}=0,\cdots,V_{t}=1,V_{t+1}=0,\cdots,V_{t_{0}-1}=0)
=(1q1)(1q2)qt(1qt+1)(1qt01)r(Vt,Vi)E,t+1i<t0(1pt,i),\displaystyle=(1-q_{1})(1-q_{2})\cdots q_{t}(1-q_{t+1})\cdots(1-q_{t_{0}-1})r\prod_{(V_{t},V_{i})\in E,t+1\leq i<t_{0}}(1-p_{t,i}),
P(V1=0,V2=0,,Vt=1,Vt+1=0,,Vt0=0)\displaystyle P(V_{1}=0,V_{2}=0,\cdots,V_{t}=1,V_{t+1}=0,\cdots,V_{t_{0}}=0)
=(1-q_{1})(1-q_{2})\cdots q_{t}(1-q_{t+1})\cdots(1-q_{t_{0}})r\prod_{(V_{t},V_{i})\in E,t+1\leq i<t_{0}}(1-p_{t,i})

where V_{t_{0}} is not a child of V_{t}. Therefore, if there is a pair of unconnected nodes V_{i},V_{j} (i<j) with q_{i},q_{j}\neq 0, we can get 1-q_{j}=\frac{P(V_{1}=0,V_{2}=0,\cdots,V_{i}=1,V_{i+1}=0,\cdots,V_{j}=0)}{P(V_{1}=0,V_{2}=0,\cdots,V_{i}=1,V_{i+1}=0,\cdots,V_{j-1}=0)}. Then we can use the following two equations to solve for r:

(1qj)(r(1q1)(1q2)(1qj1))+(1r)\displaystyle(1-q_{j})(r(1-q_{1})(1-q_{2})\cdots(1-q_{j-1}))+(1-r) (57)
=P(V1=0,V2=0,,Vj=0),\displaystyle=P(V_{1}=0,V_{2}=0,\cdots,V_{j}=0),
(r(1q1)(1q2)(1qj1))+(1r)\displaystyle(r(1-q_{1})(1-q_{2})\cdots(1-q_{j-1}))+(1-r)
=P(V1=0,V2=0,,Vj1=0).\displaystyle=P(V_{1}=0,V_{2}=0,\cdots,V_{j-1}=0).

If we treat 1-r and r(1-q_{1})(1-q_{2})\cdots(1-q_{j-1}) as two unknown variables, they can be solved from this system of linear equations. With the value of r, we can solve for q_{1},q_{2},\cdots,q_{n} by using P(V_{1}=0,V_{2}=0,\cdots,V_{t-1}=0,V_{t}=1)=(1-q_{1})(1-q_{2})\cdots(1-q_{t-1})q_{t}r for t=1,2,\cdots,n. For p_{i,j}, it can be computed from 1-p_{i,j}=\frac{P(V_{1}=0,V_{2}=0,\cdots,V_{i}=1,V_{i+1}=0,\cdots,V_{j}=0)}{P(V_{1}=0,V_{2}=0,\cdots,V_{i}=1,V_{i+1}=0,\cdots,V_{j-1}=0)(1-q_{j})} if q_{i}\neq 0. We suppose the parents of V_{j} are V_{i_{1}},V_{i_{2}},\cdots,V_{i_{t}} with i_{1}<i_{2}<\cdots<i_{t}<j. Therefore, we have

\frac{P(V_{i_{1}}=0,\cdots,V_{i_{k-1}}=0,V_{i_{k}}=1,V_{i_{k+1}}=0,\cdots,V_{i_{t}}=0,V_{j}=0)}{P(V_{i_{1}}=0,\cdots,V_{i_{k-1}}=0,V_{i_{k}}=1,V_{i_{k+1}}=0,\cdots,V_{i_{t}}=0)} (58)
=(1-q_{j})(1-p_{i_{k},j}),\quad k=1,2,\cdots,t

if P(V_{i_{k}}=1)>0. Hence, we can obtain all the meaningful p_{i_{k},j}. Thus, all the parameters are solved in this case.
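The linear-system step for recovering r can be sketched as follows, with made-up parameters and q_j assumed already computed from the ratio above; u = 1-r and v = r(1-q_1)\cdots(1-q_{j-1}) are the two unknowns of Equation (57):

```python
from fractions import Fraction as F

# Made-up parameters; q_j is assumed already known.
r = F(3, 5)
q = [F(1, 3), F(1, 4), F(2, 5)]                  # q_1, q_2, q_3 with j = 3
qj = q[-1]

prod = F(1)
for qt in q[:-1]:
    prod *= 1 - qt                               # (1-q_1)...(1-q_{j-1})

# The two observed probabilities appearing in Equation (57).
P_j = (1 - qj) * r * prod + (1 - r)              # P(V_1=0,...,V_j=0)
P_jm1 = r * prod + (1 - r)                       # P(V_1=0,...,V_{j-1}=0)

# Solve (1-q_j) v + u = P_j and v + u = P_{j-1} for u and v.
v = (P_jm1 - P_j) / qj
u = P_jm1 - v
assert u == 1 - r and v == r * prod
```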

Now we consider the case that there exist no 1\leq i<j\leq n such that (V_{i},V_{j})\not\in E and q_{i},q_{j}\neq 0. This is equivalent to saying that the nodes V_{i} with q_{i}\neq 0 form a complete graph once edge directions are removed. Suppose three of them are V_{i},V_{j},V_{k}, 1\leq i<j<k\leq n. Then we can get the following equations:

P(V1=0,,Vi1=0,Vi=1,Vi+1=0,,Vj=0)P(V1=0,,Vi1=0,Vi=1,Vi+1=0,,Vj1=0)=(1qj)(1pi,j),\displaystyle\frac{P(V_{1}=0,\cdots,V_{i-1}=0,V_{i}=1,V_{i+1}=0,\cdots,V_{j}=0)}{P(V_{1}=0,\cdots,V_{i-1}=0,V_{i}=1,V_{i+1}=0,\cdots,V_{j-1}=0)}=(1-q_{j})(1-p_{i,j}), (59)
P(V1=0,,Vi1=0,Vi=1,Vi+1=0,,Vk=0)P(V1=0,,Vi1=0,Vi=1,Vi+1=0,,Vk1=0)=(1qk)(1pi,k),\displaystyle\frac{P(V_{1}=0,\cdots,V_{i-1}=0,V_{i}=1,V_{i+1}=0,\cdots,V_{k}=0)}{P(V_{1}=0,\cdots,V_{i-1}=0,V_{i}=1,V_{i+1}=0,\cdots,V_{k-1}=0)}=(1-q_{k})(1-p_{i,k}),
P(V1=0,,Vj1=0,Vj=1,Vj+1=0,,Vk=0)P(V1=0,,Vj1=0,Vj=1,Vj+1=0,,Vk1=0)=(1qk)(1pj,k),\displaystyle\frac{P(V_{1}=0,\cdots,V_{j-1}=0,V_{j}=1,V_{j+1}=0,\cdots,V_{k}=0)}{P(V_{1}=0,\cdots,V_{j-1}=0,V_{j}=1,V_{j+1}=0,\cdots,V_{k-1}=0)}=(1-q_{k})(1-p_{j,k}),
P(V1=0,,Vi1=0,Vi=1,Vi+1=0,,Vj1=0,Vj=1,Vj+1=0,,Vk=0)P(V1=0,,Vi1=0,Vi=1,Vi+1=0,,Vk1=0)\displaystyle\frac{P(V_{1}=0,\cdots,V_{i-1}=0,V_{i}=1,V_{i+1}=0,\cdots,V_{j-1}=0,V_{j}=1,V_{j+1}=0,\cdots,V_{k}=0)}{P(V_{1}=0,\cdots,V_{i-1}=0,V_{i}=1,V_{i+1}=0,\cdots,V_{k-1}=0)}
=(1qk)(1pi,k)(1pj,k)(1(1qj)(1pi,j))(1qj)(1pi,j).\displaystyle=\frac{(1-q_{k})(1-p_{i,k})(1-p_{j,k})(1-(1-q_{j})(1-p_{i,j}))}{(1-q_{j})(1-p_{i,j})}.

Therefore, q_{i},q_{j},q_{k},p_{i,j},p_{i,k},p_{j,k} can all be solved. We can then use the same procedure to obtain all of q_{1},q_{2},\cdots,q_{n} and then all the parameters.

In conclusion, we have solved the identifiability problem of the model with a global hidden variable.∎

Appendix 0.D Proof for the Mixed Model (Theorem 6.2)

See 6.2

Proof

To simplify our proof, we introduce some new notations. Suppose we already have V_{i},V_{j},V_{k} as required in the statement of the theorem.

a_{l}=r\prod_{t=1}^{l}(1-q_{t})(1-q_{0,t})=P(V_{1}=0,V_{2}=0,\cdots,V_{l}=0,U_{0}=1), (60)
b_{l}=(1-r)\prod_{t=1}^{l}(1-q_{t})=P(V_{1}=0,V_{2}=0,\cdots,V_{l}=0,U_{0}=0),
x_{i,l}=\frac{1-(1-q_{i})(1-q_{0,i})}{(1-q_{i})(1-q_{0,i})}\prod_{(V_{i},V_{t})\in E,t\leq l}(1-p_{i,t}) (61)
=\frac{P(V_{1}=0,\cdots,V_{i-1}=0,V_{i}=1,V_{i+1}=0,\cdots,V_{l}=0,U_{0}=1)}{P(V_{1}=0,V_{2}=0,\cdots,V_{l}=0,U_{0}=1)},
x_{j,l}=\frac{1-(1-q_{j})(1-q_{0,j})}{(1-q_{j})(1-q_{0,j})}\prod_{(V_{j},V_{t})\in E,t\leq l}(1-p_{j,t})
=\frac{P(V_{1}=0,\cdots,V_{j-1}=0,V_{j}=1,V_{j+1}=0,\cdots,V_{l}=0,U_{0}=1)}{P(V_{1}=0,V_{2}=0,\cdots,V_{l}=0,U_{0}=1)},
x_{k,l}=\frac{1-(1-q_{k})(1-q_{0,k})}{(1-q_{k})(1-q_{0,k})}\prod_{(V_{k},V_{t})\in E,t\leq l}(1-p_{k,t})
=\frac{P(V_{1}=0,\cdots,V_{k-1}=0,V_{k}=1,V_{k+1}=0,\cdots,V_{l}=0,U_{0}=1)}{P(V_{1}=0,V_{2}=0,\cdots,V_{l}=0,U_{0}=1)},
y_{i,l}=\frac{q_{i}}{1-q_{i}}\prod_{(V_{i},V_{t})\in E,t\leq l}(1-p_{i,t}) (62)
=\frac{P(V_{1}=0,\cdots,V_{i-1}=0,V_{i}=1,V_{i+1}=0,\cdots,V_{l}=0,U_{0}=0)}{P(V_{1}=0,V_{2}=0,\cdots,V_{l}=0,U_{0}=0)},
y_{j,l}=\frac{q_{j}}{1-q_{j}}\prod_{(V_{j},V_{t})\in E,t\leq l}(1-p_{j,t})
=\frac{P(V_{1}=0,\cdots,V_{j-1}=0,V_{j}=1,V_{j+1}=0,\cdots,V_{l}=0,U_{0}=0)}{P(V_{1}=0,V_{2}=0,\cdots,V_{l}=0,U_{0}=0)},
y_{k,l}=\frac{q_{k}}{1-q_{k}}\prod_{(V_{k},V_{t})\in E,t\leq l}(1-p_{k,t})
=\frac{P(V_{1}=0,\cdots,V_{k-1}=0,V_{k}=1,V_{k+1}=0,\cdots,V_{l}=0,U_{0}=0)}{P(V_{1}=0,V_{2}=0,\cdots,V_{l}=0,U_{0}=0)}.

It is easy to verify that $a_{l}+b_{l}=P(V_{1}=0,V_{2}=0,\cdots,V_{l}=0)\equiv p_{1}$. Moreover, because $V_{i},V_{j},V_{k}$ are pairwise disconnected, we have the following equations.

\displaystyle a_{l}x_{i,l}+b_{l}y_{i,l}=P(V_{1}=0,\cdots,V_{i-1}=0,V_{i}=1,V_{i+1}=0,\cdots,V_{l}=0)\equiv p_{2}, (63)
\displaystyle a_{l}x_{j,l}+b_{l}y_{j,l}=P(V_{1}=0,\cdots,V_{j-1}=0,V_{j}=1,V_{j+1}=0,\cdots,V_{l}=0)\equiv p_{3},
\displaystyle a_{l}x_{k,l}+b_{l}y_{k,l}=P(V_{1}=0,\cdots,V_{k-1}=0,V_{k}=1,V_{k+1}=0,\cdots,V_{l}=0)\equiv p_{4},
\displaystyle a_{l}x_{i,l}x_{j,l}+b_{l}y_{i,l}y_{j,l}=P(\overline{V_{1}V_{2}\cdots V_{l}}=\overline{0^{i-1}10^{j-i-1}10^{l-j}})\equiv p_{5},
\displaystyle a_{l}x_{i,l}x_{k,l}+b_{l}y_{i,l}y_{k,l}=P(\overline{V_{1}V_{2}\cdots V_{l}}=\overline{0^{i-1}10^{k-i-1}10^{l-k}})\equiv p_{6},
\displaystyle a_{l}x_{j,l}x_{k,l}+b_{l}y_{j,l}y_{k,l}=P(\overline{V_{1}V_{2}\cdots V_{l}}=\overline{0^{j-1}10^{k-j-1}10^{l-k}})\equiv p_{7},
\displaystyle a_{l}x_{i,l}x_{j,l}x_{k,l}+b_{l}y_{i,l}y_{j,l}y_{k,l}=P(\overline{V_{1}V_{2}\cdots V_{l}}=\overline{0^{i-1}10^{j-i-1}10^{k-j-1}10^{l-k}})\equiv p_{8}.

Here, $p_{t},t\in[8]$ are known because $V_{1},V_{2},\cdots,V_{l}$ are observable. Therefore, we can solve for $a_{l}$ and $b_{l}$ using these $8$ equations. The result is shown below.

\displaystyle a_{l}=\frac{\text{numerator}_{1}\pm\text{numerator}_{2}}{\text{denominator}}, (64)
\displaystyle\text{denominator}=2(-p_{1}^{2}p_{8}^{2}+2p_{1}p_{2}p_{7}p_{8}+2p_{1}p_{3}p_{6}p_{8}+2p_{1}p_{4}p_{5}p_{8}
\displaystyle-4p_{1}p_{5}p_{6}p_{7}-p_{2}^{2}p_{7}^{2}-4p_{2}p_{3}p_{4}p_{8}+2p_{2}p_{3}p_{6}p_{7}+2p_{2}p_{4}p_{5}p_{7}-p_{3}^{2}p_{6}^{2}
\displaystyle+2p_{3}p_{4}p_{5}p_{6}-p_{4}^{2}p_{5}^{2}),
\displaystyle\text{numerator}_{1}=2p_{1}^{2}p_{2}p_{7}p_{8}+2p_{1}^{2}p_{3}p_{6}p_{8}+2p_{1}^{2}p_{4}p_{5}p_{8}-4p_{1}^{2}p_{5}p_{6}p_{7}
\displaystyle-p_{1}p_{2}^{2}p_{7}^{2}-4p_{1}p_{2}p_{3}p_{4}p_{8}+2p_{1}p_{2}p_{3}p_{6}p_{7}+2p_{1}p_{2}p_{4}p_{5}p_{7}-p_{1}p_{3}^{2}p_{6}^{2}
\displaystyle+2p_{1}p_{3}p_{4}p_{5}p_{6}-p_{1}p_{4}^{2}p_{5}^{2}-p_{1}^{3}p_{8}^{2},
\displaystyle\text{numerator}_{2}=\left(p_{1}^{2}p_{8}-p_{1}p_{2}p_{7}-p_{1}p_{3}p_{6}-p_{1}p_{4}p_{5}+2p_{2}p_{3}p_{4}\right)
\displaystyle(p_{1}^{2}p_{8}^{2}-2p_{1}p_{2}p_{7}p_{8}-2p_{1}p_{3}p_{6}p_{8}-2p_{1}p_{4}p_{5}p_{8}+4p_{1}p_{5}p_{6}p_{7}+p_{2}^{2}p_{7}^{2}
\displaystyle+4p_{2}p_{3}p_{4}p_{8}-2p_{2}p_{3}p_{6}p_{7}-2p_{2}p_{4}p_{5}p_{7}+p_{3}^{2}p_{6}^{2}-2p_{3}p_{4}p_{5}p_{6}+p_{4}^{2}p_{5}^{2})^{\frac{1}{2}}.

We can verify that

\displaystyle(p_{2}p_{3}-p_{1}p_{5})(p_{2}p_{4}-p_{1}p_{6})(p_{3}p_{4}-p_{1}p_{7}) (65)
\displaystyle=-a_{l}^{3}b_{l}^{3}(x_{i,l}-y_{i,l})^{2}(x_{j,l}-y_{j,l})^{2}(x_{k,l}-y_{k,l})^{2}<0. (66)

Therefore, there is only one positive solution for $a_{l}$, because

\displaystyle\text{numerator}_{1}^{2}-\text{numerator}_{2}^{2} (67)
\displaystyle=-4(p_{2}p_{3}-p_{1}p_{5})(p_{2}p_{4}-p_{1}p_{6})(p_{3}p_{4}-p_{1}p_{7}) (68)
\displaystyle\cdot(p_{4}^{2}p_{5}^{2}-2p_{3}p_{4}p_{5}p_{6}+p_{3}^{2}p_{6}^{2}-2p_{2}p_{4}p_{5}p_{7}-2p_{2}p_{3}p_{6}p_{7}+4p_{1}p_{5}p_{6}p_{7}+p_{2}^{2}p_{7}^{2} (69)
\displaystyle+4p_{2}p_{3}p_{4}p_{8}-2p_{1}p_{4}p_{5}p_{8}-2p_{1}p_{3}p_{6}p_{8}-2p_{1}p_{2}p_{7}p_{8}+p_{1}^{2}p_{8}^{2}) (70)
\displaystyle=2(p_{2}p_{3}-p_{1}p_{5})(p_{2}p_{4}-p_{1}p_{6})(p_{3}p_{4}-p_{1}p_{7})\cdot\text{denominator}, (71)
\displaystyle\text{numerator}_{1}=\frac{1}{2}p_{1}\cdot\text{denominator}. (72)

Since $a_{l}$ is now fixed, we can obtain $b_{l}$ from $a_{l}+b_{l}=p_{1}$. This process works if and only if the denominator is nonzero, which is equivalent to $a_{l}^{2}b_{l}^{2}(x_{i,l}-y_{i,l})^{2}(x_{j,l}-y_{j,l})^{2}(x_{k,l}-y_{k,l})^{2}\neq 0$. Therefore, we only need to find $V_{i},V_{j},V_{k}$ satisfying the conditions in the theorem with $q_{0,i},q_{0,j},q_{0,k}\neq 0$.
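As a sanity check (not part of the proof), the following sketch builds $p_{1},\dots,p_{8}$ from Equation (63) using hypothetical values of $a_{l},b_{l}$ and $x_{\cdot,l},y_{\cdot,l}$, and confirms that the two roots of the closed form (64) are exactly $a_{l}$ and $b_{l}$:

```python
import math

# Hypothetical mixture parameters (a = a_l, b = b_l, x/y as in the proof).
a, b = 0.2, 0.5
xi, yi = 0.3, 0.8
xj, yj = 0.4, 0.9
xk, yk = 0.6, 0.1

# The 8 observable quantities of Equation (63).
p1 = a + b
p2 = a * xi + b * yi
p3 = a * xj + b * yj
p4 = a * xk + b * yk
p5 = a * xi * xj + b * yi * yj
p6 = a * xi * xk + b * yi * yk
p7 = a * xj * xk + b * yj * yk
p8 = a * xi * xj * xk + b * yi * yj * yk

den = 2 * (-p1**2 * p8**2 + 2*p1*p2*p7*p8 + 2*p1*p3*p6*p8 + 2*p1*p4*p5*p8
           - 4*p1*p5*p6*p7 - p2**2 * p7**2 - 4*p2*p3*p4*p8 + 2*p2*p3*p6*p7
           + 2*p2*p4*p5*p7 - p3**2 * p6**2 + 2*p3*p4*p5*p6 - p4**2 * p5**2)
num1 = 0.5 * p1 * den  # identity (72)
num2 = ((p1**2 * p8 - p1*p2*p7 - p1*p3*p6 - p1*p4*p5 + 2*p2*p3*p4)
        * math.sqrt(-den / 2))  # the radicand in (64) equals -denominator/2

roots = sorted([(num1 + num2) / den, (num1 - num2) / den])
assert all(abs(r - t) < 1e-9 for r, t in zip(roots, sorted([a, b])))
```

The two roots sum to $p_{1}$ by (72), so the $\pm$ ambiguity only swaps $a_{l}$ and $b_{l}$.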

Notice that $l$ is an arbitrary index not less than $k$, so for $k+1\leq t\leq n$, $q_{t}$ and $q_{0,t}$ can be computed using $\frac{a_{t}}{a_{t-1}}=(1-q_{t})(1-q_{0,t})$ and $\frac{b_{t}}{b_{t-1}}=1-q_{t}$. The parameter $p_{t,l}$, $l>t>k$, can be computed using

\displaystyle\frac{P(V_{1}=0,\cdots,V_{t-1}=0,V_{t}=1,V_{t+1}=0,\cdots,V_{l}=0)}{P(V_{1}=0,\cdots,V_{l}=0)} (73)
\displaystyle=\frac{a_{l}\frac{1-(1-q_{t})(1-q_{0,t})}{(1-q_{t})(1-q_{0,t})}+b_{l}\frac{q_{t}}{1-q_{t}}}{a_{l}+b_{l}}\prod_{(V_{t},V_{i})\in E,i\leq l}(1-p_{t,i}), (74)
\displaystyle\frac{P(V_{1}=0,\cdots,V_{t-1}=0,V_{t}=1,V_{t+1}=0,\cdots,V_{l-1}=0)}{P(V_{1}=0,\cdots,V_{l-1}=0)} (75)
\displaystyle=\frac{a_{l-1}\frac{1-(1-q_{t})(1-q_{0,t})}{(1-q_{t})(1-q_{0,t})}+b_{l-1}\frac{q_{t}}{1-q_{t}}}{a_{l-1}+b_{l-1}}\prod_{(V_{t},V_{i})\in E,i\leq l-1}(1-p_{t,i}). (76)

Then we can obtain $\prod_{(V_{t},V_{i})\in E,i\leq l-1}(1-p_{t,i})$ and $\prod_{(V_{t},V_{i})\in E,i\leq l}(1-p_{t,i})$, and their quotient is the desired factor $1-p_{t,l}$. This proves the first part of the theorem. Now suppose $i=j-1=k-2$. Then for an arbitrary $0$-$1$ string $\gamma$ of $i-1$ bits, we still have the following facts.

\displaystyle\frac{P(\overline{V_{1}\cdots V_{i-1}}=\gamma,V_{i}=1,V_{j}=1,V_{k}=1,U_{0}=t)}{P(\overline{V_{1}\cdots V_{i-1}}=\gamma,V_{i}=0,V_{j}=0,V_{k}=0,U_{0}=t)}=\frac{P(\overline{V_{1}\cdots V_{i-1}}=\gamma,V_{i}=1,V_{j}=0,V_{k}=0,U_{0}=t)}{P(\overline{V_{1}\cdots V_{i-1}}=\gamma,V_{i}=0,V_{j}=0,V_{k}=0,U_{0}=t)} (77)
\displaystyle\cdot\frac{P(\overline{V_{1}\cdots V_{i-1}}=\gamma,V_{i}=0,V_{j}=1,V_{k}=0,U_{0}=t)}{P(\overline{V_{1}\cdots V_{i-1}}=\gamma,V_{i}=0,V_{j}=0,V_{k}=0,U_{0}=t)}\cdot\frac{P(\overline{V_{1}\cdots V_{i-1}}=\gamma,V_{i}=0,V_{j}=0,V_{k}=1,U_{0}=t)}{P(\overline{V_{1}\cdots V_{i-1}}=\gamma,V_{i}=0,V_{j}=0,V_{k}=0,U_{0}=t)},
\displaystyle\frac{P(\overline{V_{1}\cdots V_{i-1}}=\gamma,V_{i}=1,V_{j}=1,V_{k}=0,U_{0}=t)}{P(\overline{V_{1}\cdots V_{i-1}}=\gamma,V_{i}=0,V_{j}=0,V_{k}=0,U_{0}=t)}=\frac{P(\overline{V_{1}\cdots V_{i-1}}=\gamma,V_{i}=1,V_{j}=0,V_{k}=0,U_{0}=t)}{P(\overline{V_{1}\cdots V_{i-1}}=\gamma,V_{i}=0,V_{j}=0,V_{k}=0,U_{0}=t)}
\displaystyle\cdot\frac{P(\overline{V_{1}\cdots V_{i-1}}=\gamma,V_{i}=0,V_{j}=1,V_{k}=0,U_{0}=t)}{P(\overline{V_{1}\cdots V_{i-1}}=\gamma,V_{i}=0,V_{j}=0,V_{k}=0,U_{0}=t)},
\displaystyle\frac{P(\overline{V_{1}\cdots V_{i-1}}=\gamma,V_{i}=1,V_{j}=0,V_{k}=1,U_{0}=t)}{P(\overline{V_{1}\cdots V_{i-1}}=\gamma,V_{i}=0,V_{j}=0,V_{k}=0,U_{0}=t)}=\frac{P(\overline{V_{1}\cdots V_{i-1}}=\gamma,V_{i}=1,V_{j}=0,V_{k}=0,U_{0}=t)}{P(\overline{V_{1}\cdots V_{i-1}}=\gamma,V_{i}=0,V_{j}=0,V_{k}=0,U_{0}=t)}
\displaystyle\cdot\frac{P(\overline{V_{1}\cdots V_{i-1}}=\gamma,V_{i}=0,V_{j}=0,V_{k}=1,U_{0}=t)}{P(\overline{V_{1}\cdots V_{i-1}}=\gamma,V_{i}=0,V_{j}=0,V_{k}=0,U_{0}=t)},
\displaystyle\frac{P(\overline{V_{1}\cdots V_{i-1}}=\gamma,V_{i}=0,V_{j}=1,V_{k}=1,U_{0}=t)}{P(\overline{V_{1}\cdots V_{i-1}}=\gamma,V_{i}=0,V_{j}=0,V_{k}=0,U_{0}=t)}=\frac{P(\overline{V_{1}\cdots V_{i-1}}=\gamma,V_{i}=0,V_{j}=0,V_{k}=1,U_{0}=t)}{P(\overline{V_{1}\cdots V_{i-1}}=\gamma,V_{i}=0,V_{j}=0,V_{k}=0,U_{0}=t)}
\displaystyle\cdot\frac{P(\overline{V_{1}\cdots V_{i-1}}=\gamma,V_{i}=0,V_{j}=1,V_{k}=0,U_{0}=t)}{P(\overline{V_{1}\cdots V_{i-1}}=\gamma,V_{i}=0,V_{j}=0,V_{k}=0,U_{0}=t)},

where $t=0$ or $t=1$. Therefore, we still have $8$ equations of the same form as Equation (63), and from them we can solve for all of the probabilities in the facts above. Using these probabilities, we can directly obtain all the parameters with indices not larger than $k$. Together with the conclusion of the first part of the theorem, all the parameters are determined.∎
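The ratio-based recovery of $q_{t}$ and $q_{0,t}$ used in the first part of the proof can be sketched numerically as follows; all parameter values are illustrative, and the sequences $a_{t},b_{t}$ stand in for the quantities solved above.

```python
# Hypothetical model parameters: r = P(U_0 = 1), per-node self-activation
# probabilities q_t, and hidden-variable activation probabilities q_{0,t}.
r = 0.6
q = [0.1, 0.2, 0.3]
q0 = [0.4, 0.25, 0.15]

# Build the sequences a_t and b_t from their definitions (60).
a = [r]
b = [1 - r]
for qt, q0t in zip(q, q0):
    a.append(a[-1] * (1 - qt) * (1 - q0t))
    b.append(b[-1] * (1 - qt))

# Recover the parameters from consecutive ratios:
#   b_t / b_{t-1} = 1 - q_t,   a_t / a_{t-1} = (1 - q_t)(1 - q_{0,t}).
q_hat = [1 - b[t] / b[t - 1] for t in range(1, len(b))]
q0_hat = [1 - (a[t] / a[t - 1]) / (1 - q_hat[t - 1]) for t in range(1, len(a))]

assert all(abs(x - y) < 1e-9 for x, y in zip(q_hat, q))
assert all(abs(x - y) < 1e-9 for x, y in zip(q0_hat, q0))
```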

Appendix 0.E Discussion on the Cyclic Models

In this last section of the appendix, we show how to transform a general IC model into a causal model so that the do-calculus can be used to identify do effects. Since the propagation of the IC model takes place for at most $n$ rounds [2], we let $V_{i,t}$ denote the state of $V_{i}$ in round $t$, where $V_{i,t}$ takes one of three values $0,1,2$: state $0$ means that the node is not activated, state $1$ means that the node was newly activated in the previous round, and state $2$ means that the node is activated and has already tried to activate its out-neighbors. We construct a causal graph $G^{\prime}=(V^{\prime},E^{\prime})$ on these time-indexed nodes, with edges $(V_{i,t},V_{i,t+1})$ for $1\leq i,t\leq n$, and $(V_{i,t},V_{j,t+1})$ for $1\leq i,j,t\leq n$ with $(V_{i},V_{j})\in E$. Furthermore, we define the following transition probabilities.

\displaystyle P(V_{i,t}=2|V_{i,t-1}=2,V_{j,t-1}=v_{j,t-1},\forall j,(V_{j},V_{i})\in E)=1, (78)
\displaystyle P(V_{i,t}=2|V_{i,t-1}=1,V_{j,t-1}=v_{j,t-1},\forall j,(V_{j},V_{i})\in E)=1, (79)
\displaystyle P(V_{i,t}=2|V_{i,t-1}=0,V_{j,t-1}=v_{j,t-1},\forall j,(V_{j},V_{i})\in E)=0, (80)
\displaystyle P(V_{i,t}=1|V_{i,t-1}=0,V_{j,t-1}=v_{j,t-1},\forall j,(V_{j},V_{i})\in E) (81)
\displaystyle=1-\prod_{1\leq j\leq n,(V_{j},V_{i})\in E}(1-p_{j,i}v_{j,t-1}I[v_{j,t-1}\neq 2]), (82)
\displaystyle P(V_{i,t}=0|V_{i,t-1}=0,V_{j,t-1}=v_{j,t-1},\forall j,(V_{j},V_{i})\in E) (83)
\displaystyle=\prod_{1\leq j\leq n,(V_{j},V_{i})\in E}(1-p_{j,i}v_{j,t-1}I[v_{j,t-1}\neq 2]). (84)
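The transition probabilities above can be sketched as a one-round simulation step; the graph, probabilities, and function name below are illustrative, not from the paper.

```python
import random

# One round t-1 -> t of the time-expanded chain. States: 0 = inactive,
# 1 = newly activated in the previous round, 2 = activated and already
# tried its out-neighbors. p[j][i] = p_{j,i} (0 when (V_j, V_i) is not an edge).
def step(state, p, rng):
    n = len(state)
    nxt = []
    for i in range(n):
        if state[i] in (1, 2):   # Eqs. (78)-(79): active nodes move to state 2
            nxt.append(2)
            continue
        # Eqs. (81)-(84): only in-neighbors in state 1 can activate node i
        stay = 1.0
        for j in range(n):
            if state[j] == 1:
                stay *= 1 - p[j][i]
        nxt.append(1 if rng.random() < 1 - stay else 0)
    return nxt

# Deterministic example: a path V1 -> V2 -> V3 with probability-1 edges.
rng = random.Random(0)
p = [[0, 1.0, 0], [0, 0, 1.0], [0, 0, 0]]
s = [1, 0, 0]
s = step(s, p, rng)   # [2, 1, 0]
s = step(s, p, rng)   # [2, 2, 1]
assert s == [2, 2, 1]
```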

By definition, $G^{\prime}$ is a Bayesian causal model, so we can use the do-calculus [19] to determine the identifiable do effects. For example, suppose $G$ has $V_{1},V_{2},V_{3}$ as observed nodes and $U_{1}$ as an unobserved node, with edges $(U_{1},V_{1}),(U_{1},V_{2}),(V_{1},V_{2}),(V_{2},V_{3})$, and $(V_{3},V_{1})$ in $E$. There is a cycle in $G$, so $G$ cannot be viewed as a DAG. However, applying our transformation yields the graph $G^{\prime}$ shown in Figure 5, which is a Bayesian causal model.

Figure 5: An example of transformation from IC model to Bayesian causal graph.