
¹ IIIS, Tsinghua University, Beijing, China
  [email protected]
² Microsoft Research, Beijing, China
  [email protected]

Causal Inference for Influence Propagation — Identifiability of the Independent Cascade Model

Shi Feng¹ (ORCID 0000-0001-5517-0419)    Wei Chen²
Abstract

The independent cascade (IC) model is a widely used influence propagation model for social networks. In this paper, we incorporate concepts and techniques from causal inference to study the identifiability of parameters from observational data in an extended IC model with unobserved confounding factors, which models more realistic propagation scenarios but has rarely been studied in influence propagation modeling. We provide conditions for the identifiability or unidentifiability of parameters for several special structures, including the Markovian IC model, the semi-Markovian IC model, and the IC model with a global unobserved variable. Parameter identifiability is important for other tasks, such as influence maximization, under diffusion networks with unobserved confounding factors.

Keywords:
influence propagation · independent cascade model · identifiability · causal inference.

1 Introduction

Extensive research has been conducted on the information and influence propagation behavior in social networks, with numerous propagation models and optimization algorithms proposed (cf. [15, 2]). Social influence among individuals in a social network is intrinsically a causal behavior — one's action or behavior causes the change of the behavior of his or her friends in the network. Therefore, it is helpful to view influence propagation as a causal phenomenon and apply the tools of causal inference to this domain.

In causal inference, one key consideration is the confounding factors caused by unobserved variables that affect the observed behaviors of individuals in the network. For example, we may observe that user A adopts a new product and a while later her friend B adopts the same new product. This situation could be because A influences B and causes B’s adoption, but it could also be caused by an unobserved factor (e.g. an unknown information source) that affects both A and B. Confounding factors are important in understanding the propagation behavior in networks, but so far the vast majority of influence propagation research does not consider confounders in network propagation modeling. In this paper, we intend to fill this gap by explicitly including unobserved confounders into the model, and we borrow the research methodology from causal inference to carry out our research.

Causal inference research has developed many tools and methodologies to deal with such unobserved confounders, and one important problem in causal inference is to study the identifiability of the causal model, that is, whether we can identify the effect of an intervention, or identify causal model parameters, from observational data. In this paper, we introduce the concept of identifiability from causal inference research into influence propagation research and study whether propagation models can be identified from observational data when there are unobserved factors in the causal propagation model. We propose to extend the classical independent cascade (IC) model to include unobserved causal factors, and consider the parameter identifiability problem for several common causal graph structures. Our main results are as follows. First, for the Markovian IC model, in which each unobserved variable may affect only one observed node in the network, we show that the model is fully identifiable. Second, for the semi-Markovian IC model, in which each unobserved variable may affect exactly two observed nodes in the network, we show that as long as a certain local graph structure exists in the network, the model is not parameter identifiable. For the special case of a chain graph, where all observed nodes form a chain and every unobserved variable affects two neighbors on the chain, the above result implies that we need to know at least $n/2$ parameters to make the remaining parameters identifiable, where $n$ is the number of observed nodes in the chain. We then show a positive result that when we know $n$ parameters on the chain, the remaining parameters are identifiable. Third, for the global hidden factor model, where we have an unobserved variable that affects all observed nodes in the graph, we provide reasonable sufficient conditions under which the parameters are identifiable.

Overall, we view our work as starting a new direction that integrates the rich research results from network propagation modeling and causal inference, so that we can view influence propagation through the lens of causal inference and obtain more realistic modeling and algorithmic results in this area. For example, through the causal inference lens, the classical influence maximization problem [15] of finding a set of $k$ nodes to maximize the total influence spread is really a causal intervention problem: forcing an intervention on $k$ nodes for their adoptions and trying to maximize the causal effect of this intervention. Our study could give a new way of studying influence maximization that works under more realistic network scenarios encompassing unobserved confounders.

2 Related Work

Influence Propagation Modeling.  As described in [2], the two main models used to describe influence propagation are the independent cascade model and the linear threshold model. Past research on influence propagation has mostly focused on influence maximization problems, such as [15, 22]. These works select seed nodes online, observe the propagation in the network, and maximize the number of activated nodes after propagation by selecting optimal seed nodes. Some works also study the seed-set minimization problem, such as [11]. In contrast, in our work we mainly consider recovering the parameters of the independent cascade model by observing the network propagation. After obtaining the network parameters, we can then build on them to accomplish downstream tasks, including influence maximization and seed-set minimization.

Causal Inference and Identifiability.  For general semi-Markovian Bayesian causal graphs, [13] and [21] give two different algorithms to determine whether a do-effect is identifiable, and both algorithms are sound and complete. [14] also proves that the ID algorithm and the repeated use of the do-calculus are equivalent, so for semi-Markovian Bayesian causal graphs, the do-calculus can be used to compute all identifiable do-effects.

In addition, for a special type of causal model, the linear causal model, [4] and [8] give necessary conditions and sufficient conditions, in terms of the structure of the causal graph, for the parameters in the graph to be identifiable. However, a necessary and sufficient condition for the parameter identifiability problem has not been found, and it remains an open question. In this paper, we study another special causal model derived from the IC model. Since the IC model on a directed acyclic graph can be viewed as a Bayesian causal model with some special properties, we aim to give necessary conditions and sufficient conditions for the parameters to be identifiable under some special graph structures.

3 Model and Problem Definitions

Following the convention in the causal inference literature (e.g. [19]), we use capital letters ($U, V, X, \ldots$) to represent variables or sets of variables, and their corresponding lower-case letters to represent their values. For a directed graph, we use $U$'s and $V$'s to represent nodes, since each node will also be treated as a random variable in causal inference. For a node $V_i$, we use $N^+(V_i)$ and $N^-(V_i)$ to represent the set of its out-neighbors and in-neighbors, respectively. When the graph is a directed acyclic graph (DAG), we refer to a node's in-neighbors as its parents and denote the set as ${\it Pa}(V_i) = N^-(V_i)$. When we refer to the actual values of the parent nodes of $V_i$, we use ${\it pa}(V_i)$. For a positive integer $k$, we use $[k]$ to denote $\{1, 2, \ldots, k\}$. We use boldface letters to represent vectors, such as $\boldsymbol{r} = (r_1, r_2, \ldots, r_n) = (r_i)_{i \in [n]}$.

The classical independent cascade (IC) model [15] of influence diffusion in a social network is defined as follows. The social network is modeled as a directed graph $G = (V, E)$, where $V = \{V_1, V_2, \ldots, V_n\}$ is the set of nodes representing individuals in the social network, and $E \subseteq V \times V$ is the set of directed edges representing the influence relationships between individuals. Each edge $(V_i, V_j) \in E$ is associated with an influence probability $p(i,j) \in (0, 1]$ (we assume that $p(i,j) = 0$ if $(V_i, V_j) \notin E$). Each node is either in state 0 or state 1, representing the idle state and the active state, respectively. At time step 0, a seed set $S_0 \subseteq V$ of nodes is selected and activated (i.e. their states are set to 1), and all other nodes are in state 0. The propagation proceeds in discrete time steps $t = 1, 2, \ldots$. Let $S_t$ denote the set of nodes that are active by time $t$, and let $S_{-1} = \emptyset$. At any time $t = 1, 2, \ldots$, each newly activated node $V_i \in S_{t-1} \setminus S_{t-2}$ tries to activate each of its inactive out-neighbors $V_j \in N^+(V_i)$, and the activation is successful with probability $p(i,j)$. If successful, $V_j$ is activated at time $t$ and thus $V_j \in S_t$. The activation trial of $V_i$ on its out-neighbor $V_j$ is independent of all other activation trials. Once activated, nodes stay active, that is, $S_{t-1} \subseteq S_t$. The propagation process ends at a step when no new nodes are activated. It is easy to see that the propagation ends in at most $n-1$ steps, so we use $S_{n-1}$ to denote the final set of active nodes after the propagation.
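To make the process concrete, the cascade above can be simulated as follows; this is a minimal sketch under an assumed adjacency-dictionary representation, with function and variable names of our own choosing:

```python
import random

def ic_propagate(edges, seed_set, rng=None):
    """Simulate one IC cascade.

    edges: dict mapping node i to {out-neighbor j: p(i, j)}.
    seed_set: the initially activated set S_0.
    Returns the final active set S_{n-1}.
    """
    rng = rng or random.Random(0)
    active = set(seed_set)        # nodes active so far
    frontier = set(seed_set)      # S_{t-1} \ S_{t-2}: newly activated nodes
    while frontier:               # stops when no new node is activated
        new = set()
        for i in frontier:
            for j, p in edges.get(i, {}).items():
                # each activation trial is independent of all others
                if j not in active and j not in new and rng.random() < p:
                    new.add(j)
        active |= new
        frontier = new
    return active

# chain 1 -> 2 -> 3 with p = 1 on both edges: the whole chain activates
print(sorted(ic_propagate({1: {2: 1.0}, 2: {3: 1.0}}, {1})))  # [1, 2, 3]
```

Skipping a node already activated in the same step is equivalent in distribution to running every trial, since the node's overall activation probability $1 - \prod (1 - p)$ is unchanged.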

Influence propagation is naturally a result of causal effects — one node's activation causes the activation of its out-neighbors. If the graph is directed and acyclic, then the IC model on this graph is equivalent to a Bayesian causal model. Indeed, we can consider each node in the IC model as a variable, and for a node $V_i$, its value is determined by $P(V_i = 1 \mid {\it pa}(V_i)) = 1 - \prod_{j: V_j \in {\it Pa}(V_i),\, v_j = 1 \text{ in } {\it pa}(V_i)} (1 - p_{j,i})$. Obviously, this is equivalent to our definition of the IC model. The IC model was introduced in [15] to model influence propagation in social networks, but in general, it can model causal effects among binary random variables. In this paper, we mainly consider the directed acyclic graph (DAG) setting, which is in line with the causal graph setting in the causal inference literature [19]. We discuss the extension to general cyclic graphs or networks in the appendix.
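The conditional probability above can be written as a one-line computation; this small sketch (with illustrative names of our own) evaluates $P(V_i = 1 \mid {\it pa}(V_i))$ from the parents' states:

```python
def activation_prob(parent_probs, parent_states):
    """P(V_i = 1 | pa(V_i)) = 1 - prod over active parents of (1 - p_{j,i})."""
    escape = 1.0                       # prob. that no active parent succeeds
    for p, s in zip(parent_probs, parent_states):
        if s == 1:
            escape *= 1.0 - p
    return 1.0 - escape

print(round(activation_prob([0.5, 0.3], [1, 1]), 6))  # 1 - 0.5*0.7 = 0.65
```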

All variables $V_1, V_2, \ldots, V_n$ are observable, and we call them observed variables. They correspond to observed behaviors of individuals in the social network. There are also potentially many unobserved (or hidden) variables that affect individuals' behaviors. We use $U = \{U_1, U_2, \ldots\}$ to represent the set of unobserved variables. In the IC model, we assume each $U_i$ is a binary random variable that is 1 with probability $r_i$ and 0 with probability $1 - r_i$, and all unobserved variables are mutually independent. We allow unobserved variables $U_i$ to have directed edges pointing to the observed variables $V_j$, but we do not consider directed edges among the unobserved variables in this paper. If $U_i$ has a directed edge pointing to $V_j$, we usually use $q_{i,j}$ to represent the parameter on this edge. It has the same semantics as the $p_{i,j}$'s in the classical IC model: if $U_i = 1$, then with probability $q_{i,j}$, $U_i$ successfully influences $V_j$ by setting its state to 1, and with probability $1 - q_{i,j}$, $V_j$'s state is not affected by $U_i$; this influence or activation effect is independent of all other activation attempts on other edges. Thus, overall, in a network with unobserved or hidden variables, we use $G = (U, V, E)$ to represent the corresponding causal graph, where $U$ is the set of unobserved variables, $V$ is the set of observed variables, and $E \subseteq (V \times V) \cup (U \times V)$ is the set of directed edges. We assume that $G$ is a DAG; the state of every unobserved variable $U_i$ is sampled from $\{0, 1\}$ with parameter $r_i$, while the state of every observed variable $V_j$ is determined by the states of its parents and the parameters on the incoming edges of $V_j$, following the IC model semantics. In the DAG $G$, we refer to an observed node $V_i$ as a root if it has no observed parents in the graph.
Every root $V_i$ has at least one unobserved parent. We use vectors $\boldsymbol{p}, \boldsymbol{q}, \boldsymbol{r}$ to represent the parameter vectors associated with edges among observed variables, edges from unobserved to observed variables, and unobserved nodes, respectively. We refer to the model $M = (G = (U, V, E), \boldsymbol{p}, \boldsymbol{q}, \boldsymbol{r})$ as the causal IC model. When the distinction is needed, we use capital letters $P, Q, R$ to represent the parameter names, and lower-case boldface letters $\boldsymbol{p}, \boldsymbol{q}, \boldsymbol{r}$ to represent the parameter values.
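For small graphs, the observed distribution $P(V = v)$ of a causal IC model can be computed exactly by summing over hidden-variable assignments and multiplying the IC conditional probabilities; the sketch below (representation and names are our own illustration) does this by brute-force enumeration:

```python
from itertools import product

def observed_dist(n, r, obs_edges, hid_edges):
    """Exact observed distribution P(V = v) of a causal IC model on a DAG.

    n: number of observed nodes V_0 .. V_{n-1} (0-indexed here).
    r: list with r[k] = P(U_k = 1) for each unobserved node U_k.
    obs_edges: dict {(i, j): p_ij} for observed edges V_i -> V_j.
    hid_edges: dict {(k, j): q_kj} for edges U_k -> V_j.
    """
    dist = {}
    for v in product((0, 1), repeat=n):
        total = 0.0
        for u in product((0, 1), repeat=len(r)):
            pr = 1.0
            for k, rk in enumerate(r):          # P(U = u)
                pr *= rk if u[k] else 1.0 - rk
            for j in range(n):                  # P(v_j | parents), IC semantics
                escape = 1.0                    # prob. no parent activates V_j
                for (i, jj), p in obs_edges.items():
                    if jj == j and v[i] == 1:
                        escape *= 1.0 - p
                for (k, jj), q in hid_edges.items():
                    if jj == j and u[k] == 1:
                        escape *= 1.0 - q
                pr *= escape if v[j] == 0 else 1.0 - escape
            total += pr
        dist[v] = total
    return dist

# Markovian-style example: U_0 -> V_0 -> V_1 <- U_1, with r_k = 1
d = observed_dist(2, [1.0, 1.0], {(0, 1): 0.6}, {(0, 0): 0.4, (1, 1): 0.25})
print(round(d[(0, 0)], 6))  # (1 - 0.4) * (1 - 0.25) = 0.45
```

The correctness rests on the standard Bayesian network factorization $P(v) = \sum_u P(u) \prod_j P(v_j \mid {\it pa}(V_j))$, which holds since $G$ is a DAG.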

In this paper, we focus on the parameter identifiability problem following the causal inference literature. In the context of the IC model, the states of nodes $V = \{V_1, V_2, \ldots, V_n\}$ are observable while the states of $U = \{U_1, U_2, \ldots\}$ are unobservable. We define parameter identifiability as follows.

Definition 1 (Parameter Identifiability)

Given a graph $G = (U, V, E)$, we say that a set of IC model parameters $\Theta \subseteq P \cup Q \cup R$ on $G$ is identifiable if, after fixing the values of the parameters outside $\Theta$ and fixing the observed probability distributions $P(V' = v')$ for all $V' \subseteq V$ and all $v' \in \{0, 1\}^{|V'|}$, the values of the parameters in $\Theta$ are uniquely determined. We say that the graph $G$ is parameter identifiable if $\Theta = P \cup Q \cup R$ is identifiable. Accordingly, the algorithmic problem of parameter identifiability is to derive the unique values of the parameters in $\Theta$ given the graph $G = (U, V, E)$, the values of the parameters outside $\Theta$, and the observed probability distributions $P(V' = v')$ for all $V' \subseteq V$ and all $v' \in \{0, 1\}^{|V'|}$. Finally, if the algorithm only uses a polynomial number of observed probability values $P(V' = v')$ and runs in polynomial time, where both polynomials are with respect to the graph size, we say that the parameters in $\Theta$ are efficiently identifiable.

Note that when there are no unobserved variables (except the unique unobserved variable for each root of the graph), the problem is mainly to derive the parameters $p_{i,j}$ from all observed $P(V' = v')$'s. In this case, the parameter identifiability problem bears similarity to the well-studied network inference problem [10, 16, 9, 7, 18, 1, 3, 6, 5, 17, 20, 12]. The network inference problem focuses on using observed cascade data to derive the network structure and propagation parameters, and it emphasizes the sample complexity of inferring parameters. Hence, when there are no unobserved variables in the model, we could use network inference methods to help solve the parameter identifiability problem. However, in real social influence and network propagation, there are other hidden factors that affect the propagation and the resulting distribution. Such hidden factors are not addressed in the network inference literature. In contrast, our study in this paper focuses on addressing these hidden factors, and thus we borrow ideas from causal inference to study the identifiability problem under the IC model.

In this paper, we study three types of unobserved variables that could commonly occur in network influence propagation. They correspond to three types of IC models with unobserved variables, as summarized below.

Markovian IC Model.  In the Markovian IC model, each observed variable $V_i$ is associated with a unique unobserved variable $U_i$, and there is a directed edge from $U_i$ to $V_i$. This models the scenario where each individual in the social network has some latent and unknown factor that affects its observed behavior. We use $q_i$ to denote the parameter on the edge $(U_i, V_i)$. Note that the effect of $U_i$ on the activation of $V_i$ is determined by the probability $r_i \cdot q_i$, so we treat $r_i = 1$ for all $i \in [n]$ and focus on identifying the parameters $q_i$. Thus the graph $G = (U, V, E)$ has parameters $\boldsymbol{q} = (q_i)_{i \in [n]}$ and $\boldsymbol{p} = (p_{i,j})_{(V_i, V_j) \in E}$. Figure 1 shows an example of a Markovian IC model. If some $q_i = 0$, it means that the observed variable $V_i$ has no latent variable influencing it, and it only receives influence from other observed variables.

Figure 1: A Markovian IC model with five nodes.
Figure 2: A Markovian IC model with five nodes and a global unobserved variable.

Semi-Markovian IC Model.  The second type of unobserved variable is a hidden variable connected to exactly two observed variables in the graph. In particular, for every pair of nodes $V_i, V_j \in V$, we allow one unobserved variable $U_{i,j}$ that has two edges, one pointing to $V_i$ and the other pointing to $V_j$. This models the scenario where two individuals in the social network have a common unobserved confounder that may affect both of their behaviors. We call this type of model the semi-Markovian IC model, following the common terminology of the semi-Markovian model in the literature [19]. In this model, each $U_{i,j}$ has a parameter $r_{i,j}$, and the edges $(U_{i,j}, V_i)$ and $(U_{i,j}, V_j)$ have parameters $q_{i,j,1}$ and $q_{i,j,2}$, respectively. Therefore, the graph has parameters $\boldsymbol{r} = (r_{i,j})_{(V_i, V_j) \in E}$, $\boldsymbol{q} = (q_{i,j,1}, q_{i,j,2})_{(V_i, V_j) \in E}$, and $\boldsymbol{p} = (p_{i,j})_{(V_i, V_j) \in E}$.

Within this model, we pay special attention to a special type of graph where the observed variables form a chain, i.e. $V_1 \rightarrow V_2 \rightarrow \cdots \rightarrow V_n$, and the unobserved variables always point to two neighbors on the chain. In this case, we use $U_i$ to denote the unobserved variable associated with the edge $(V_i, V_{i+1})$, and the parameters on the edges $(U_i, V_i)$ and $(U_i, V_{i+1})$ are denoted $q_{i,1}$ and $q_{i,2}$, respectively. Figure 3 depicts this chain model.

Figure 3: The semi-Markovian IC chain model.

IC Model with a Global Unobserved Variable.  The third type of hidden variable is a global unobserved variable $U_0$ that points to all observed variables in the network. This naturally models the global causal effect where some common factor affects all or most individuals in the network. For every edge $(U_0, V_i)$, we use $q_{0,i}$ to represent its parameter.

Moreover, we can combine this model with the Markovian IC model, where we allow both an unobserved variable $U_i$ for each individual and a global unobserved variable $U_0$. Figure 2 depicts this model.

4 Parameter Identifiability of the Markovian IC Model

For the Markovian IC model in which every observed variable has its own unobserved variable, we can fully identify the model parameters in most cases, as given by the following theorem.

Theorem 4.1 (Identifiability of the Markovian IC Model)

For an arbitrary Markovian IC model $G = (U, V, E)$ with parameters $\boldsymbol{q} = (q_i)_{i \in [n]}$ and $\boldsymbol{p} = (p_{i,j})_{(V_i, V_j) \in E}$, all the $q_i$ parameters are efficiently identifiable, and for every $i \in [n]$, if $q_i \neq 1$, then all parameters $p_{j,i}$ with $(V_j, V_i) \in E$ are efficiently identifiable.

Proof

For an observed variable (node) $V_i$, suppose that its observed parents are $V_{i_1}, V_{i_2}, \ldots, V_{i_t}$. Then we have

$P(V_i = 0 \mid V_{i_1} = 0, \ldots, V_{i_t} = 0) = 1 - q_i,$ (1)
$P(V_i = 0 \mid V_{i_j} = 1, V_{i_1} = 0, \ldots, V_{i_{j-1}} = 0, V_{i_{j+1}} = 0, \ldots, V_{i_t} = 0) = (1 - q_i)(1 - p_{i_j, i}).$ (2)

From Eq. (1), we can obtain the value of $q_i$. Then, if $q_i \neq 1$, from Eq. (2) we can derive the value of $p_{i_j, i}$. Moreover, for each root node $V_i$, we can get $q_i$ by computing $q_i = P(V_i = 1)$. The computational efficiency is obvious. ∎
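As a numeric sanity check, the two equations can be inverted directly from the exact joint distribution; the two-node instance $V_1 \to V_2$ below is our own hypothetical example (with $r_1 = r_2 = 1$ as in the model setup):

```python
q1, q2, p12 = 0.4, 0.25, 0.6    # assumed ground-truth parameters (q2 != 1)

# exact joint distribution of (V1, V2) in the two-node Markovian model
P = {
    (0, 0): (1 - q1) * (1 - q2),
    (0, 1): (1 - q1) * q2,
    (1, 0): q1 * (1 - q2) * (1 - p12),
    (1, 1): q1 * (1 - (1 - q2) * (1 - p12)),
}

# Eq. (1): P(V2=0 | V1=0) = 1 - q2
q2_hat = 1 - P[(0, 0)] / (P[(0, 0)] + P[(0, 1)])
# Eq. (2): P(V2=0 | V1=1) = (1 - q2)(1 - p12)
p12_hat = 1 - (P[(1, 0)] / (P[(1, 0)] + P[(1, 1)])) / (1 - q2_hat)
# root node: q1 = P(V1 = 1)
q1_hat = P[(1, 0)] + P[(1, 1)]

print(q1_hat, q2_hat, p12_hat)  # recovers 0.4, 0.25, 0.6 (up to float error)
```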

The theorem essentially says that all parameters are identifiable under the Markovian IC model, except in the corner case where some $q_i = 1$. In this case, the observed variable $V_i$ is fully determined by its unobserved parent $U_i$, so we cannot determine the influence from the observed parents of $V_i$ on $V_i$. But this influence is not useful anyway in this case, so the edges from the observed parents of $V_i$ to $V_i$ do not affect causal inference in the graph and can be removed.

5 Parameter Identifiability of the Semi-Markovian IC Model

Following the definitions in the model section, we now consider the identifiability problem for semi-Markovian models. We will demonstrate that in most cases, this model is not parameter identifiable. From [21] we know that the semi-Markovian Bayesian causal model is also not identifiable in general; however, our conclusion does not follow from their result. On the other hand, we will show that with some parameters known in advance, the semi-Markovian IC chain model becomes identifiable.

5.1 Condition on Unidentifiability of the Semi-Markovian IC Model

More specifically, the following theorem shows the unidentifiability of the semi-Markovian IC model when a special structure is present in it.

Theorem 5.1 (Unidentifiability of the Semi-Markovian IC Model)

Suppose that in a general graph $G$ we can find the following structure: three observed nodes $V_1, V_2, V_3$ such that $(V_1, V_2) \in E$ and $(V_2, V_3) \in E$, and two unobserved nodes $U_1, U_2$ with $(U_1, V_1), (U_1, V_2), (U_2, V_2), (U_2, V_3) \in E$. Suppose that each of $U_1, U_2$ has only these two edges associated with it, and that the three nodes $V_1, V_2, V_3$ can be written adjacently in a topological order of the nodes in $U \cup V$. Then the graph $G$ is not parameter identifiable.

Figure 4 is an example of the structure described in the above theorem.

Figure 4: An example of the structure in Theorem 5.1.
Proof (Outline)

To prove that the parameters in a model with this structure are not identifiable, we directly give two different sets of parameters. We show that these two sets of parameters produce the same distribution over the nodes in $V$, and thus the parameters cannot be identified by observing only the distribution of $V$. The details of these two parameter sets and the distributions they produce are included in Appendix 0.A. ∎

5.2 Identifiability of the Chain Model

We now consider the chain model as described in Section 3 and depicted in Figure 3. For this structure, we present an identifiability result under the assumption that the values of some parameters are known as prior knowledge.

We divide the parameters of the graph into four vectors

$\boldsymbol{q}_1 = (q_{1,1}, q_{2,1}, \ldots, q_{n-1,1}), \quad \boldsymbol{q}_2 = (q_{1,2}, q_{2,2}, \ldots, q_{n-1,2}),$ (3)
$\boldsymbol{p} = (p_1, p_2, \ldots, p_{n-1}), \quad \boldsymbol{r} = (r_1, r_2, \ldots, r_{n-1}).$ (4)

For the chain model, the theorem below shows that once the parameter $p_1$ is known and either $\boldsymbol{q}_2$ or $\boldsymbol{r}$ is known, the set consisting of the remaining parameters in the chain is efficiently identifiable.

Theorem 5.2 (Identifiability of the Semi-Markovian IC Chain Model)

Suppose that we have a semi-Markovian IC chain model with graph $G = (U, V, E)$ and IC parameters $\boldsymbol{p} = (p_i)_{i \in [n-1]}$, $\boldsymbol{q}_1 = (q_{i,1})_{i \in [n-1]}$, $\boldsymbol{q}_2 = (q_{i,2})_{i \in [n-1]}$, and $\boldsymbol{r} = (r_i)_{i \in [n-1]}$, and suppose that all parameters are in the range $(0, 1)$. If the value of the parameter $p_1$ is known, and either $\boldsymbol{q}_2$ or $\boldsymbol{r}$ is known, then the remaining parameters are efficiently identifiable.

Proof (Outline)

We prove this theorem by induction. Under the assumption that $p_1$ is known and either $\boldsymbol{q}_2$ or $\boldsymbol{r}$ is known, suppose that $p_1, p_2, \ldots, p_{t-2}$, $r_1, r_2, \ldots, r_{t-2}$, $q_{1,1}, q_{2,1}, \ldots, q_{t-2,1}$, $q_{1,2}, q_{2,2}, \ldots, q_{t-2,2}$, and $r_{t-1} q_{t-1,1}$ have been determined; we then prove that $q_{t-1,1}, r_{t-1}, p_{t-1}, q_{t-1,2}$, and $r_t q_{t,1}$ can also be determined. In fact, from the distribution of the first $t$ nodes on the chain we can obtain three different equations, and after substituting the known parameters, the inductive step can be completed. It is worth noting that this inductive process can also be used to compute the unknown parameters efficiently.

The proof is lengthy because of the many corner cases considered and the need to discuss the cases $t = 2$, $2 < t < n$, and $t = n$ separately. The details of this proof are included in Appendix 0.B. ∎

According to Theorem 5.2, the semi-Markovian chain is parameter identifiable when $n$ particular parameters are known. At the same time, by Theorem 5.1, we can show that if fewer than $\lfloor\frac{n+1}{2}\rfloor$ parameters are known, then the semi-Markovian chain is not parameter identifiable. Indeed, if the chain model is parameter identifiable, then by Theorem 5.1, for each $2 \leq t \leq n-1$, at least one of the parameters $p_{t-1}, p_t, r_{t-1}, r_t, q_{t-1,1}, q_{t-1,2}, q_{t,1}$, and $q_{t,2}$ must be known. Therefore, letting $t = 2, 4, \ldots, 2\lfloor\frac{n-1}{2}\rfloor$, we can deduce that at least $\lfloor\frac{n-1}{2}\rfloor$ parameters must be known. Formally, we have the following corollary of Theorem 5.1 and Theorem 5.2.

Corollary 1

For a semi-Markovian IC chain model, if no more than $\lfloor\frac{n-1}{2}\rfloor$ parameters are known in advance, the remaining parameters are unidentifiable; if we are allowed to know $n$ parameters in advance, we can choose $p_1, \boldsymbol{q}_2$ or $p_1, \boldsymbol{r}$ to be known, and then the remaining parameters are identifiable.

6 Parameter Identifiability of Model with a Global Hidden Variable

Next, we consider the case where there is a global hidden variable in the causal IC model, as defined in Section 3. If the only hidden variable in the whole model is $U_0$, we prove that the parameters are identifiable in general. If, besides $U_0$, the model is also Markovian, that is, there are also $n$ hidden variables $U_1, \ldots, U_n$ corresponding to $V_1, V_2, \ldots, V_n$, then the parameters are identifiable if certain conditions are satisfied.

6.1 Observable IC Model with Only a Global Hidden Variable

Suppose the observed variables in the connected DAG $G = (U, V, E)$ are $V_1, V_2, \ldots, V_n$ in a topological order, and there is a global hidden variable $U_0$ with an edge from $U_0$ to each observed variable $V_i$. Suppose the activation probability of $U_0$ is $r$ and the activation probability from $U_0$ to $V_i$ is $q_i \in [0, 1)$ (naturally, $q_1 \neq 0$, and there are at least 3 nonzero $q_i$'s). We now state a theorem for this setting.

Theorem 6.1 (Identifiability of the IC Model with a Global Hidden Variable)

For an arbitrary IC model with a global hidden variable, $G = (U, V, E)$ with parameters $\boldsymbol{q} = (q_i)_{i \in [n]}$, $r$, and $\boldsymbol{p} = (p_{i,j})_{(V_i, V_j) \in E}$ such that $q_i \neq 1$, $p_{i,j} \neq 1$, and $r \neq 1$ for all $i, j \in [n]$, all the parameters in $\boldsymbol{p}$, $r$, and $\boldsymbol{q}$ are identifiable.

Proof (Outline)

We discuss this problem in two cases. The first case is that there exist two disconnected nodes $V_i, V_j$ with $i < j$ in $V$ and $q_i, q_j \neq 0$. In this case we can use $1 - q_j = \frac{P(V_1=0, \ldots, V_{i-1}=0, V_i=1, V_{i+1}=0, \ldots, V_j=0)}{P(V_1=0, \ldots, V_{i-1}=0, V_i=1, V_{i+1}=0, \ldots, V_{j-1}=0)}$ to solve for $q_j$, and then use $P(V_1=0, V_2=0, \ldots, V_j=0)$ and $P(V_1=0, V_2=0, \ldots, V_{j-1}=0)$ to solve for $r$.

After getting $r$, by taking quotients of the probabilities of propagation results, we can obtain all the parameters.

The other case is that no such $V_i, V_j$ exist. Then there must exist three nodes $V_i, V_j, V_k$ that are pairwise connected with $q_i, q_j, q_k \neq 0$. We consider the probabilities of the different possible propagation results over these three nodes when all other nodes are 0 after the propagation. From these, we can solve for $q_i, q_j, q_k$, and then solve for all the parameters by the same method as in the first case. ∎
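The quotient argument in the first case can be checked numerically on a hypothetical extreme instance where the three observed nodes have no edges among them at all (so every pair is disconnected); all parameter values below are our own assumptions:

```python
from itertools import product

r, q = 0.7, [0.5, 0.3, 0.2]      # assumed ground truth, all in (0, 1)

def joint(v):
    """Exact P(V = v) for three observed nodes with no observed edges."""
    on, off = r, 1 - r           # U_0 = 1 branch, U_0 = 0 branch
    for i, vi in enumerate(v):
        on *= q[i] if vi else 1 - q[i]
        off *= 0.0 if vi else 1.0    # with U_0 = 0, every node stays 0
    return on + off

def marg(fixed):
    """P(V_i = b for (i, b) in fixed), other nodes marginalized out."""
    return sum(joint(v) for v in product((0, 1), repeat=3)
               if all(v[i] == b for i, b in fixed.items()))

# seeing some V_i = 1 forces U_0 = 1, so quotients isolate 1 - q_j:
q3_hat = 1 - marg({1: 1, 2: 0}) / marg({1: 1})
q2_hat = 1 - marg({0: 1, 1: 0}) / marg({0: 1})
# P(V2=0, V3=0) = r(1-q2)(1-q3) + (1-r) then yields r, and finally q1:
r_hat = (1 - marg({1: 0, 2: 0})) / (1 - (1 - q2_hat) * (1 - q3_hat))
q1_hat = 1 - (marg({0: 0}) - (1 - r_hat)) / r_hat

print(q1_hat, q2_hat, q3_hat, r_hat)  # recovers 0.5, 0.3, 0.2, 0.7 (approx.)
```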

6.2 Markovian IC Model with a Global Hidden Variable (Mixed Model)

Suppose the model is $G = (U, V, E)$, where $U = \{U_0, U_1, U_2, \ldots, U_n\}$ and $V = \{V_1, V_2, \ldots, V_n\}$. Here, $V_1, V_2, \ldots, V_n$ are in a topological order. The parameters are $r_0$, $\boldsymbol{q}_0 = (q_{0,i})_{i \in [n]}$, $\boldsymbol{q} = (q_i)_{i \in [n]}$, and $\boldsymbol{p} = (p_{i,j})_{(V_i, V_j) \in E}$.

Theorem 6.2 (Identifiability of Markovian IC Model with a Global Hidden Variable (Mixed Model))

For an arbitrary Markovian IC model with a global hidden variable, $G = (U, V, E)$ with parameters $r_0$, $\boldsymbol{q}_0 = (q_{0,i})_{i \in [n]}$, $\boldsymbol{q} = (q_i)_{i \in [n]}$, and $\boldsymbol{p} = (p_{i,j})_{(V_i, V_j) \in E}$, suppose that none of the parameters equals 1. If there exist $i, j, k \in [n]$ with $i < j < k$ such that each pair among $V_i, V_j, V_k$ is disconnected and $q_{0,i}, q_{0,j}, q_{0,k} \neq 0$, then the parameters $q_{0,t}, q_t$, and $p_{t,l}$ with $l > t > k$ are identifiable. Moreover, if $V_i, V_j, V_k$ can be adjacent and consecutive in some topological order, i.e. $j = i+1$ and $k = i+2$ without loss of generality, then all the parameters are identifiable.

Proof (Outline)

Assuming that there exist $V_i, V_j, V_k$ satisfying the requirements of the theorem, we can write expressions for the joint distribution of these three variables when all other nodes with subscripts not greater than $l$ are equal to 0. In fact, from these 8 expressions, we can solve for $P(V_1=0, \ldots, V_l=0, U_0=1)$ and $P(V_1=0, \ldots, V_l=0, U_0=0)$.

Since we have P(V1=0,,Vl=0,U0=1)=rt=1l(1qt)(1q0,t)P(V_{1}=0,\cdots,V_{l}=0,U_{0}=1)=r\prod_{t=1}^{l}(1-q_{t})(1-q_{0,t}) and P(V1=0,,Vl=0,U0=0)=(1r)t=1l(1qt)P(V_{1}=0,\cdots,V_{l}=0,U_{0}=0)=(1-r)\prod_{t=1}^{l}(1-q_{t}), we will be able to obtain all the parameters very easily by dividing these equations two by two. This proof has some trivial discussion to show that this computational method does not fail due to corner cases. ∎
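The pairwise-division step can be sketched numerically. The following Python fragment uses made-up parameter values (the propagation itself is not simulated), assumes the two joint probabilities above have already been solved for every l as the outline claims, and recovers each q_l and q_{0,l} from consecutive ratios:

```python
import random

# Made-up parameters for a hypothetical instance; r is the probability
# that the global hidden variable U_0 is active.
random.seed(0)
n = 5
r = 0.6
q = [random.uniform(0.1, 0.8) for _ in range(n)]     # direct activation q_t
q0 = [random.uniform(0.1, 0.8) for _ in range(n)]    # global-variable q_{0,t}

def A(l):
    """P(V_1=0,...,V_l=0, U_0=1) = r * prod_t (1-q_t)(1-q_{0,t})."""
    out = r
    for t in range(l):
        out *= (1 - q[t]) * (1 - q0[t])
    return out

def B(l):
    """P(V_1=0,...,V_l=0, U_0=0) = (1-r) * prod_t (1-q_t)."""
    out = 1 - r
    for t in range(l):
        out *= 1 - q[t]
    return out

# Pairwise division: B(l)/B(l-1) = 1-q_l and A(l)/A(l-1) = (1-q_l)(1-q_{0,l}).
for l in range(1, n + 1):
    q_l = 1 - B(l) / B(l - 1)
    q0_l = 1 - (A(l) / A(l - 1)) / (1 - q_l)
    assert abs(q_l - q[l - 1]) < 1e-9 and abs(q0_l - q0[l - 1]) < 1e-9
```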

Notice that the parameters in this model are identifiable if and only if a special three-node structure appears in it. Intuitively, this is because such a structure lets us extract information about the parameters more easily, which is consistent with the intuition behind Theorem 5.1.

7 Conclusion

In this paper, we study the parameter identifiability of the independent cascade model in influence propagation and show conditions for identifiability or unidentifiability for several classes of causal IC model structures. We believe that incorporating unobserved confounding factors and causal inference techniques is an important next step in influence propagation research, and the identifiability of the IC model is our first step towards this goal. There are many open problems and directions in combining causal inference and propagation research. For example, seed selection and influence maximization correspond to intervention (the do-operator) in causal inference; how to compute such intervention effects in networks with unobserved confounders, and how to maximize influence under them, are very interesting research questions. In terms of identifiability, one can also investigate the identifiability of intervention effects, or whether some given intervention effects allow one to identify more such effects. One can also look into identifiability in general cyclic IC models, for which we provide some initial discussions in Appendix 0.E, but more investigation is needed.

References

  • [1] Abrahao, B., Chierichetti, F., Kleinberg, R., Panconesi, A.: Trace complexity of network inference. arXiv e-prints pp. arXiv–1308 (2013)
  • [2] Chen, W., Lakshmanan, L.V., Castillo, C.: Information and Influence Propagation in Social Networks. Morgan & Claypool Publishers (2013)
  • [3] Daneshmand, H., Gomez-Rodriguez, M., Song, L., Schölkopf, B.: Estimating diffusion network structures: Recovery conditions, sample complexity & soft-thresholding algorithm. In: ICML (2014)
  • [4] Drton, M., Foygel, R., Sullivant, S.: Global identifiability of linear structural equation models. The Annals of Statistics 39(2), 865–886 (2011)
  • [5] Du, N., Liang, Y., Balcan, M., Song, L.: Influence function learning in information diffusion networks. In: ICML 2014, Beijing, China, 21-26 June 2014 (2014)
  • [6] Du, N., Song, L., Gomez-Rodriguez, M., Zha, H.: Scalable influence estimation in continuous-time diffusion networks. In: NIPS 2013, December 5-8, 2013, Lake Tahoe, Nevada, United States. pp. 3147–3155 (2013)
  • [7] Du, N., Song, L., Smola, A.J., Yuan, M.: Learning networks of heterogeneous influence. In: NIPS 2012, December 3-6, 2012, Lake Tahoe, Nevada, United States. pp. 2789–2797 (2012)
  • [8] Foygel, R., Draisma, J., Drton, M.: Half-trek criterion for generic identifiability of linear structural equation models. The Annals of Statistics pp. 1682–1713 (2012)
  • [9] Gomez-Rodriguez, M., Balduzzi, D., Schölkopf, B.: Uncovering the temporal dynamics of diffusion networks. arXiv preprint arXiv:1105.0697 (2011)
  • [10] Gomez-Rodriguez, M., Leskovec, J., Krause, A.: Inferring networks of diffusion and influence. In: KDD’2010 (2010)
  • [11] Goyal, A., Bonchi, F., Lakshmanan, L.V., Venkatasubramanian, S.: On minimizing budget and time in influence propagation over social networks. Social network analysis and mining 3(2), 179–192 (2013)
  • [12] He, X., Xu, K., Kempe, D., Liu, Y.: Learning influence functions from incomplete observations. arXiv e-prints pp. arXiv–1611 (2016)
  • [13] Huang, Y., Valtorta, M.: Identifiability in causal bayesian networks: A sound and complete algorithm. In: AAAI. pp. 1149–1154 (2006)
  • [14] Huang, Y., Valtorta, M.: Pearl’s calculus of intervention is complete. arXiv preprint arXiv:1206.6831 (2012)
  • [15] Kempe, D., Kleinberg, J.M., Tardos, É.: Maximizing the spread of influence through a social network. In: KDD (2003)
  • [16] Myers, S.A., Leskovec, J.: On the convexity of latent social network inference. arXiv e-prints pp. arXiv–1010 (2010)
  • [17] Narasimhan, H., Parkes, D.C., Singer, Y.: Learnability of influence in networks. In: Proceedings of the 29th Annual Conference on Neural Information Processing Systems (2015)
  • [18] Netrapalli, P., Sanghavi, S.: Finding the graph of epidemic cascades. arXiv preprint arXiv:1202.1779 (2012)
  • [19] Pearl, J.: Causality. Cambridge University Press (2009), 2nd Edition
  • [20] Pouget-Abadie, J., Horel, T.: Inferring graphs from cascades: A sparse recovery framework. arXiv e-prints pp. arXiv–1505 (2015)
  • [21] Shpitser, I., Pearl, J.: Identification of joint interventional distributions in recursive semi-markovian causal models. In: Proceedings of the 21st National Conference on Artificial Intelligence, 2006. pp. 1219–1226 (2006)
  • [22] Tang, Y., Shi, Y., Xiao, X.: Influence maximization in near-linear time: A martingale approach. In: Proceedings of the 2015 ACM SIGMOD international conference on management of data. pp. 1539–1554 (2015)

Appendix

Appendix 0.A Proof for Unidentifiability of the Semi-Markovian IC Model (Theorem 5.1)

See 5.1

Proof

To prove that the parameters are unidentifiable, we construct two different parameter valuations that induce the same distribution over the values taken by the nodes in V. We assume that the parent nodes of V_{i} are V_{i,1},V_{i,2},\cdots,V_{i,t_{i}} and U_{i,1},\cdots,U_{i,s_{i}} for i=1,2,3, and the parameters are set as shown in Figure 4.

One valuation sets all the parameters to 0.5. The other sets all the parameters to 0.5 except r_{1}, r_{2}, q_{1,1}, q_{1,2}, q_{2,1} and q_{2,2}. The exceptions are set to be

r1=10r2712r210,q1,1=14r1,\displaystyle r_{1}=\frac{10r_{2}-7}{12r_{2}-10},q_{1,1}=\frac{1}{4r_{1}}, (5)
q1,2=6r258r28,q2,1=132r2,q2,2=14r2\displaystyle q_{1,2}=\frac{6r_{2}-5}{8r_{2}-8},q_{2,1}=\frac{1}{3-2r_{2}},q_{2,2}=\frac{1}{4r_{2}} (6)

where r_{2} is an arbitrary number between \frac{1}{4} and \frac{9}{14}. It then suffices to prove that, for an arbitrary distribution of the ancestors of V_{1},V_{2},V_{3} other than U_{1},U_{2}, the joint distributions of V_{1},V_{2},V_{3} are the same under the two parameter settings. This suffices because the children of V_{1},V_{2},V_{3} are then unaffected by the difference between the valuations, since the parameters determining them are identical (only the distribution of U_{1},U_{2} changes, and each of them only has two children, both in \{V_{1},V_{2},V_{3}\}). In fact, we have

P(V_{1}=1,V_{2}=1,V_{3}=1|pa(V_{1}),pa(V_{2}),pa(V_{3}))=\frac{13+3P(V_{2}=1|pa(V_{2}))+3P(V_{1}=1|pa(V_{1}))(11+5P(V_{2}=1|pa(V_{2})))}{128}, (7)
P(V_{1}=1,V_{2}=1,V_{3}=0|pa(V_{1}),pa(V_{2}),pa(V_{3}))=\frac{23+9P(V_{2}=1|pa(V_{2}))}{64}, (8)
P(V_{1}=1,V_{2}=0,V_{3}=1|pa(V_{1}),pa(V_{2}),pa(V_{3}))=\frac{3(1+5P(V_{1}=1|pa(V_{1})))(1-P(V_{2}=1|pa(V_{2})))}{128}, (9)
P(V_{1}=0,V_{2}=1,V_{3}=1|pa(V_{1}),pa(V_{2}),pa(V_{3}))=\frac{3(1-P(V_{1}=1|pa(V_{1})))(5P(V_{2}=1|pa(V_{2}))+3)}{64}, (10)
P(V_{1}=1,V_{2}=0,V_{3}=0|pa(V_{1}),pa(V_{2}),pa(V_{3}))=\frac{3(1+5P(V_{1}=1|pa(V_{1})))(1-P(V_{2}=1|pa(V_{2})))}{128}, (11)
P(V_{1}=0,V_{2}=1,V_{3}=0|pa(V_{1}),pa(V_{2}),pa(V_{3}))=\frac{3(1-P(V_{1}=1|pa(V_{1})))(3+5P(V_{2}=1|pa(V_{2})))}{64}, (12)
P(V_{1}=0,V_{2}=0,V_{3}=1|pa(V_{1}),pa(V_{2}),pa(V_{3}))=\frac{15(1-P(V_{1}=1|pa(V_{1})))(1-P(V_{2}=1|pa(V_{2})))}{64}, (13)
P(V_{1}=0,V_{2}=0,V_{3}=0|pa(V_{1}),pa(V_{2}),pa(V_{3}))=\frac{15(1-P(V_{1}=1|pa(V_{1})))(1-P(V_{2}=1|pa(V_{2})))}{64}. (14)

Notice that these equations depend only on P(V_{1}=1|pa(V_{1})) and P(V_{2}=1|pa(V_{2})), so we have proved that the two parameter settings produce the same distribution of the observed variables, and therefore G is parameter unidentifiable in this case.∎
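As a quick sanity check on the construction, the following Python fragment evaluates the alternative valuation of Equations (5)-(6) with exact rational arithmetic for a few admissible choices of r_2, and verifies that every value lies strictly between 0 and 1, so the alternative valuation is a legal parameter setting:

```python
from fractions import Fraction

# Exact-arithmetic check of Equations (5)-(6); note that r_2 = 1/2 would
# reproduce the all-0.5 setting, so distinct r_2 values are chosen below.
def alt_params(r2):
    r1 = (10 * r2 - 7) / (12 * r2 - 10)
    q11 = 1 / (4 * r1)
    q12 = (6 * r2 - 5) / (8 * r2 - 8)
    q21 = 1 / (3 - 2 * r2)
    q22 = 1 / (4 * r2)
    return [r1, q11, q12, q21, q22]

for r2 in [Fraction(3, 10), Fraction(2, 5), Fraction(3, 5)]:
    vals = alt_params(r2)
    assert all(0 < v < 1 for v in vals), (r2, vals)
```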

Appendix 0.B Proof for the Chain Structure (Theorem 5.2)

See 5.2

Proof

In the analysis, we use the following notations. For observed nodes V1,V2,V_{1},V_{2}, ,Vt\ldots,V_{t}, we use V1Vt¯\overline{V_{1}\cdots V_{t}} to represent their collective states as a bit string. For a bit string γ\gamma of length tt, we write

aγ=P(V1Vt¯=γ),bγ=P(V1Vt¯=γ,Ut=0),\displaystyle a^{\gamma}=P(\overline{V_{1}\cdots V_{t}}=\gamma),b^{\gamma}=P(\overline{V_{1}\cdots V_{t}}=\gamma,U_{t}=0), (24)
cγ=P(V1Vt¯=γ,Ut=1).\displaystyle c^{\gamma}=P(\overline{V_{1}\cdots V_{t}}=\gamma,U_{t}=1).

Note that aγa^{\gamma} is observable, but bγb^{\gamma} and cγc^{\gamma} are not observable.

We prove this result by induction. In the base step, we show that p_{1}, r_{1}, q_{1,1}, q_{1,2} and q_{2,1}r_{2} can be determined. In the induction step, we show that if p_{1},p_{2},\cdots,p_{t-2}, r_{1},r_{2},\cdots,r_{t-2}, q_{1,1},q_{2,1},\cdots,q_{t-2,1} and q_{1,2},q_{2,2},\cdots,q_{t-2,2}, r_{t-1}q_{t-1,1} are already known, then we can compute q_{t-1,1},r_{t-1},p_{t-1},q_{t-1,2} and r_{t}q_{t,1} from the distributions of the observed variables.

To carry out the induction step, we divide the problem into two cases: t=n and 3\leq t\leq n-1.

First, if t=n, we only need to compute q_{t-1,1},r_{t-1},p_{t-1},q_{t-1,2}. We have the following relations between a^{\overline{\gamma 0}} and other known quantities. If \gamma[-1]=0, we have

aγ0¯\displaystyle a^{\overline{\gamma 0}} =bγ+(1qt1,2)cγ\displaystyle=b^{\gamma}+(1-q_{t-1,2})c^{\gamma} (25)
=1rt11qt1,1rt1aγ+rt1(1qt1,1)1qt1,1rt1(1qt1,2)aγ\displaystyle=\frac{1-r_{t-1}}{1-q_{t-1,1}r_{t-1}}a^{\gamma}+\frac{r_{t-1}(1-q_{t-1,1})}{1-q_{t-1,1}r_{t-1}}(1-q_{t-1,2})a^{\gamma}
=aγqt1,2rt1(1qt1,1)1qt1,1rt1aγ\displaystyle=a^{\gamma}-\frac{q_{t-1,2}r_{t-1}(1-q_{t-1,1})}{1-q_{t-1,1}r_{t-1}}a^{\gamma}

If γ[1]=1\gamma[-1]=1 and γ=β1¯,β[1]=0\gamma=\overline{\beta 1},\beta[-1]=0, we have

aγ0¯\displaystyle a^{\overline{\gamma 0}} =(1pt1)(aγqt1,2cγ)\displaystyle=(1-p_{t-1})(a^{\gamma}-q_{t-1,2}c^{\gamma}) (26)
=(1pt1)(aγqt1,2rt1(aβ(1qt1,1)(aβqt2,2cβ)))\displaystyle=(1-p_{t-1})(a^{\gamma}-q_{t-1,2}r_{t-1}(a^{\beta}-(1-q_{t-1,1})(a^{\beta}-q_{t-2,2}c^{\beta})))

If γ[1]=1\gamma[-1]=1 and β[1]=1\beta[-1]=1, we have

aγ0¯\displaystyle a^{\overline{\gamma 0}} =(1pt1)(aγqt1,2cγ)\displaystyle=(1-p_{t-1})(a^{\gamma}-q_{t-1,2}c^{\gamma}) (27)
=(1pt1)\displaystyle=(1-p_{t-1})
(aγqt1,2rt1(aβ(1pt2)(1qt1,1)(aβqt2,2cβ)))\displaystyle(a^{\gamma}-q_{t-1,2}r_{t-1}(a^{\beta}-(1-p_{t-2})(1-q_{t-1,1})(a^{\beta}-q_{t-2,2}c^{\beta})))

Note that b^{\beta} and c^{\beta} are both determined by parameters we already know; hence, the three types of recursions can be viewed as functions of q_{t-1,2}, q_{t-1,2}r_{t-1} and p_{t-1}. We only need to show that these three parameters can be solved from these functions. To simplify the equations, we define the following notations, where |\beta|=t-2.

Co1(β)={aβ+(1pt2)(aβqt2,2cβ)β[1]=1aβ+(aβqt2,2cβ)β[1]=0\displaystyle\text{Co}_{1}(\beta)=\begin{cases}-a^{\beta}+(1-p_{t-2})(a^{\beta}-q_{t-2,2}c^{\beta})&\beta[-1]=1\\ -a^{\beta}+(a^{\beta}-q_{t-2,2}c^{\beta})&\beta[-1]=0\end{cases} (28)
Co2(β)={rt1qt1,1(1pt2)(aβqt2,2cβ)β[1]=1rt1qt1,1(aβqt2,2cβ)β[1]=0\displaystyle\text{Co}_{2}(\beta)=\begin{cases}-r_{t-1}q_{t-1,1}(1-p_{t-2})(a^{\beta}-q_{t-2,2}c^{\beta})&\beta[-1]=1\\ -r_{t-1}q_{t-1,1}(a^{\beta}-q_{t-2,2}c^{\beta})&\beta[-1]=0\end{cases}

It is easy to verify that Co1(β)+Co2(β)=aβ1¯\text{Co}_{1}(\beta)+\text{Co}_{2}(\beta)=-a^{\overline{\beta 1}}. According to these, we have

aβ110¯aβ210¯\displaystyle\frac{a^{\overline{\beta_{1}10}}}{a^{\overline{\beta_{2}10}}} =Co1(β1)qt1,2rt1+Co2(β1)qt1,2Co1(β1)Co2(β1)Co1(β2)qt1,2rt1+Co2(β2)qt1,2Co1(β2)Co2(β2)\displaystyle=\frac{\text{Co}_{1}(\beta_{1})q_{t-1,2}r_{t-1}+\text{Co}_{2}(\beta_{1})q_{t-1,2}-\text{Co}_{1}(\beta_{1})-\text{Co}_{2}(\beta_{1})}{\text{Co}_{1}(\beta_{2})q_{t-1,2}r_{t-1}+\text{Co}_{2}(\beta_{2})q_{t-1,2}-\text{Co}_{1}(\beta_{2})-\text{Co}_{2}(\beta_{2})} (29)

which is equivalent to

(Co1(β1)aβ110¯Co1(β2)aβ210¯)qt1,2rt1+(Co2(β1)aβ110¯Co2(β2)aβ210¯)qt1,2\displaystyle(\frac{\text{Co}_{1}(\beta_{1})}{a^{\overline{\beta_{1}10}}}-\frac{\text{Co}_{1}(\beta_{2})}{a^{\overline{\beta_{2}10}}})q_{t-1,2}r_{t-1}+(\frac{\text{Co}_{2}(\beta_{1})}{a^{\overline{\beta_{1}10}}}-\frac{\text{Co}_{2}(\beta_{2})}{a^{\overline{\beta_{2}10}}})q_{t-1,2} (30)
(Co1(β1)+Co2(β1)aβ110¯Co1(β2)+Co2(β2)aβ210¯)=0\displaystyle-(\frac{\text{Co}_{1}(\beta_{1})+\text{Co}_{2}(\beta_{1})}{a^{\overline{\beta_{1}10}}}-\frac{\text{Co}_{1}(\beta_{2})+\text{Co}_{2}(\beta_{2})}{a^{\overline{\beta_{2}10}}})=0

To show the relation between the coefficients of q_{t-1,2}r_{t-1} and q_{t-1,2}, we first prove two lemmas.

Lemma 1

For two arbitrary 0\text{-}1 strings \beta_{1},\beta_{2} of the same length, the equation \frac{a^{\overline{\beta_{2}0}}}{c^{\overline{\beta_{2}0}}}=\frac{a^{\overline{\beta_{1}1}}}{c^{\overline{\beta_{1}1}}} cannot hold.

Proof

Actually, if aβ20¯cβ20¯=aβ11¯cβ11¯\frac{a^{\overline{\beta_{2}0}}}{c^{\overline{\beta_{2}0}}}=\frac{a^{\overline{\beta_{1}1}}}{c^{\overline{\beta_{1}1}}}, we have bβ11¯cβ11¯=bβ20¯cβ20¯\frac{b^{\overline{\beta_{1}1}}}{c^{\overline{\beta_{1}1}}}=\frac{b^{\overline{\beta_{2}0}}}{c^{\overline{\beta_{2}0}}}. We can verify that

bβ20¯cβ20¯\displaystyle\frac{b^{\overline{\beta_{2}0}}}{c^{\overline{\beta_{2}0}}} =1rt1rt1(1qt1,1)\displaystyle=\frac{1-r_{t-1}}{r_{t-1}(1-q_{t-1,1})} (31)
=bβ11¯cβ11¯\displaystyle=\frac{b^{\overline{\beta_{1}1}}}{c^{\overline{\beta_{1}1}}}
={1rt1rt1bβ1+cβ1((1qt2,2)cβ1+bβ1)bβ1+cβ1(1qt1,1)(bβ1+(1qt2,2)cβ1)β1[1]=01rt1rt1bβ1+cβ1(1pt2)((1qt2,2)cβ1+bβ1)bβ1+cβ1(1pt2)(1qt1,1)(bβ1+(1qt2,2)cβ1)β1[1]=1\displaystyle=\begin{cases}\frac{1-r_{t-1}}{r_{t-1}}\frac{b^{\beta_{1}}+c^{\beta_{1}}-((1-q_{t-2,2})c^{\beta_{1}}+b^{\beta_{1}})}{b^{\beta_{1}}+c^{\beta_{1}}-(1-q_{t-1,1})(b^{\beta_{1}}+(1-q_{t-2,2})c^{\beta_{1}})}&\beta_{1}[-1]=0\\ \frac{1-r_{t-1}}{r_{t-1}}\frac{b^{\beta_{1}}+c^{\beta_{1}}-(1-p_{t-2})((1-q_{t-2,2})c^{\beta_{1}}+b^{\beta_{1}})}{b^{\beta_{1}}+c^{\beta_{1}}-(1-p_{t-2})(1-q_{t-1,1})(b^{\beta_{1}}+(1-q_{t-2,2})c^{\beta_{1}})}&\beta_{1}[-1]=1\end{cases}

which is impossible when all the terms are nonzero.∎

Lemma 2

As we defined above, we have the following equation if the denominators are not zero:

Co1(β1)aβ110¯Co1(β2)aβ210¯Co2(β1)aβ110¯Co2(β2)aβ210¯=qt1,21qt1,2rt11\displaystyle\frac{\frac{\text{Co}_{1}(\beta_{1})}{a^{\overline{\beta_{1}10}}}-\frac{\text{Co}_{1}(\beta_{2})}{a^{\overline{\beta_{2}10}}}}{\frac{\text{Co}_{2}(\beta_{1})}{a^{\overline{\beta_{1}10}}}-\frac{\text{Co}_{2}(\beta_{2})}{a^{\overline{\beta_{2}10}}}}=-\frac{q_{t-1,2}-1}{q_{t-1,2}r_{t-1}-1} (32)
Proof

Actually, we can compute the left hand side of Equation 32 as below:

Co1(β1)aβ110¯Co1(β2)aβ210¯Co2(β1)aβ110¯Co2(β2)aβ210¯\displaystyle\frac{\frac{\text{Co}_{1}(\beta_{1})}{a^{\overline{\beta_{1}10}}}-\frac{\text{Co}_{1}(\beta_{2})}{a^{\overline{\beta_{2}10}}}}{\frac{\text{Co}_{2}(\beta_{1})}{a^{\overline{\beta_{1}10}}}-\frac{\text{Co}_{2}(\beta_{2})}{a^{\overline{\beta_{2}10}}}} (33)
=Co1(β1)Co1(β1)(qt1,2rt11)+Co2(β1)(qt1,21)Co1(β2)Co1(β2)(qt1,2rt11)+Co2(β2)(qt1,21)Co2(β1)Co1(β1)(qt1,2rt11)+Co2(β1)(qt1,21)Co2(β2)Co1(β2)(qt1,2rt11)+Co2(β2)(qt1,21)\displaystyle=\frac{\frac{\text{Co}_{1}(\beta_{1})}{\text{Co}_{1}(\beta_{1})(q_{t-1,2}r_{t-1}-1)+\text{Co}_{2}(\beta_{1})(q_{t-1,2}-1)}-\frac{\text{Co}_{1}(\beta_{2})}{\text{Co}_{1}(\beta_{2})(q_{t-1,2}r_{t-1}-1)+\text{Co}_{2}(\beta_{2})(q_{t-1,2}-1)}}{\frac{\text{Co}_{2}(\beta_{1})}{\text{Co}_{1}(\beta_{1})(q_{t-1,2}r_{t-1}-1)+\text{Co}_{2}(\beta_{1})(q_{t-1,2}-1)}-\frac{\text{Co}_{2}(\beta_{2})}{\text{Co}_{1}(\beta_{2})(q_{t-1,2}r_{t-1}-1)+\text{Co}_{2}(\beta_{2})(q_{t-1,2}-1)}}
=(Co1(β1)Co2(β2)Co1(β2)Co2(β1))(qt1,21)(Co1(β2)Co2(β1)Co1(β1)Co2(β2))(qt1,2rt11)\displaystyle=\frac{(\text{Co}_{1}(\beta_{1})\text{Co}_{2}(\beta_{2})-\text{Co}_{1}(\beta_{2})\text{Co}_{2}(\beta_{1}))(q_{t-1,2}-1)}{(\text{Co}_{1}(\beta_{2})\text{Co}_{2}(\beta_{1})-\text{Co}_{1}(\beta_{1})\text{Co}_{2}(\beta_{2}))(q_{t-1,2}r_{t-1}-1)}
=qt1,21qt1,2rt11\displaystyle=-\frac{q_{t-1,2}-1}{q_{t-1,2}r_{t-1}-1}

So the lemma is proved. Note that this lemma holds not only for t=n but also for t=2,3,\cdots,n-1, since \text{Co}_{1} and \text{Co}_{2} are defined exactly as in Equation 28.∎
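The collapse in Lemma 2 can be spot-checked with exact arithmetic. In the following sketch the Co values are made up, and q and r play the roles of q_{t-1,2} and r_{t-1}:

```python
from fractions import Fraction as F

# Made-up (Co_1, Co_2) pairs for beta_1 and beta_2; they must not be
# proportional, which is exactly the nondegeneracy needed in the proof.
q, r = F(1, 3), F(2, 5)
co = [(F(3), F(5)), (F(2), F(7))]
# From Equation (29): a^{beta 1 0} = Co_1 (q r - 1) + Co_2 (q - 1).
a = [c1 * (q * r - 1) + c2 * (q - 1) for c1, c2 in co]
lhs = (co[0][0] / a[0] - co[1][0] / a[1]) / (co[0][1] / a[0] - co[1][1] / a[1])
# The quotient of coefficient differences collapses as Lemma 2 claims.
assert lhs == -(q - 1) / (q * r - 1)
```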

Now we prove that Equation 25 and Equation 30 are two linearly independent equations in the unknown variables q_{t-1,2}r_{t-1} and q_{t-1,2}. It suffices to show that the ratios of the coefficients of q_{t-1,2}r_{t-1} and q_{t-1,2} differ between the two equations, for otherwise we would have

qt1,21qt1,2rt11=1qt1,1rt1\displaystyle-\frac{q_{t-1,2}-1}{q_{t-1,2}r_{t-1}-1}=-\frac{1}{q_{t-1,1}r_{t-1}} (34)

which is impossible because q_{t-1,1}r_{t-1}(q_{t-1,2}-1)-(q_{t-1,2}r_{t-1}-1)=(1-q_{t-1,2}r_{t-1})(1-q_{t-1,1}r_{t-1})+(1-r_{t-1})q_{t-1,1}q_{t-1,2}r_{t-1}>0. Thus, we have proved that when t=n, we can compute q_{t-1,1},r_{t-1},p_{t-1},q_{t-1,2} once the other parameters are known.
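The argument for the t=n step can be summarized as solving a 2x2 linear system. The sketch below uses made-up parameter values, synthesizes the coefficients of Equations (25) and (30) directly from the model quantities, and recovers the unknowns by Cramer's rule:

```python
from fractions import Fraction as F

# Writing x = q_{t-1,2} r_{t-1}, y = q_{t-1,2} and m = q_{t-1,1} r_{t-1}
# (known), Equation (25) becomes x/(1-m) - m y/(1-m) = 1 - a^{gamma 0}/a^gamma,
# since q_{t-1,2} r_{t-1} (1 - q_{t-1,1}) = x - m y.
q1, r, q2 = F(2, 5), F(1, 2), F(3, 7)        # q_{t-1,1}, r_{t-1}, q_{t-1,2}
m, x, y = q1 * r, q2 * r, q2

c11, c12 = F(1) / (1 - m), -m / (1 - m)
d1 = c11 * x + c12 * y                        # plays the observed quantity

# Equation (30), from two made-up (Co_1, Co_2) pairs; by Equation (29),
# a^{beta 1 0} is proportional to Co_1 (x - 1) + Co_2 (y - 1).
co = [(F(3), F(5)), (F(2), F(7))]
a = [c1 * (x - 1) + c2 * (y - 1) for c1, c2 in co]
c21 = co[0][0] / a[0] - co[1][0] / a[1]
c22 = co[0][1] / a[0] - co[1][1] / a[1]
d2 = (co[0][0] + co[0][1]) / a[0] - (co[1][0] + co[1][1]) / a[1]

# Cramer's rule on the 2x2 system; then r_{t-1} = x/y, q_{t-1,1} = m/r_{t-1}.
det = c11 * c22 - c12 * c21
x_hat = (d1 * c22 - c12 * d2) / det
y_hat = (c11 * d2 - c21 * d1) / det
assert (x_hat, y_hat) == (x, y) and x_hat / y_hat == r
```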

Now we consider the case 3\leq t\leq n-1. We need to prove that if t<n, we can use p_{1},p_{2},\cdots,p_{t-2}, r_{1},r_{2},\cdots,r_{t-2}, q_{1,1},q_{2,1},\cdots,q_{t-2,1} and q_{1,2},q_{2,2},\cdots,q_{t-2,2}, r_{t-1}q_{t-1,1} to compute q_{t-1,1},r_{t-1},p_{t-1},q_{t-1,2} and r_{t}q_{t,1}. First, we state a lemma describing the recurrence relations of the distributions.

Lemma 3

Suppose γ\gamma is a 010-1 string with t1t-1 items, then we have the following relations:

bγ1¯={(1rt)(aγ(1pt1)(bγ+(1qt1,2)cγ))γ[1]=1(1rt)(aγ(bγ+(1qt1,2)cγ))γ[1]=0\displaystyle b^{\overline{\gamma 1}}=\begin{cases}(1-r_{t})(a^{\gamma}-(1-p_{t-1})(b^{\gamma}+(1-q_{t-1,2})c^{\gamma}))&\gamma[-1]=1\\ (1-r_{t})(a^{\gamma}-(b^{\gamma}+(1-q_{t-1,2})c^{\gamma}))&\gamma[-1]=0\end{cases} (35)
bγ0¯={(1rt)(1pt1)(bγ+(1qt1,2)cγ)γ[1]=1(1rt)(bγ+(1qt1,2)cγ)γ[1]=0\displaystyle b^{\overline{\gamma 0}}=\begin{cases}(1-r_{t})(1-p_{t-1})(b^{\gamma}+(1-q_{t-1,2})c^{\gamma})&\gamma[-1]=1\\ (1-r_{t})(b^{\gamma}+(1-q_{t-1,2})c^{\gamma})&\gamma[-1]=0\end{cases} (36)
cγ1¯={rt(aγ(1pt1)(1qt,1)(bγ+(1qt1,2)cγ))γ[1]=1rt(aγ(1qt,1)(bγ+(1qt1,2)cγ))γ[1]=0\displaystyle c^{\overline{\gamma 1}}=\begin{cases}r_{t}(a^{\gamma}-(1-p_{t-1})(1-q_{t,1})(b^{\gamma}+(1-q_{t-1,2})c^{\gamma}))&\gamma[-1]=1\\ r_{t}(a^{\gamma}-(1-q_{t,1})(b^{\gamma}+(1-q_{t-1,2})c^{\gamma}))&\gamma[-1]=0\end{cases} (37)
cγ0¯={rt(1pt1)(1qt,1)(bγ+(1qt1,2)cγ)γ[1]=1rt(1qt,1)(bγ+(1qt1,2)cγ)γ[1]=0\displaystyle c^{\overline{\gamma 0}}=\begin{cases}r_{t}(1-p_{t-1})(1-q_{t,1})(b^{\gamma}+(1-q_{t-1,2})c^{\gamma})&\gamma[-1]=1\\ r_{t}(1-q_{t,1})(b^{\gamma}+(1-q_{t-1,2})c^{\gamma})&\gamma[-1]=0\end{cases} (38)

Moreover, we have the following relations between a^{\overline{\gamma 0}} and other known quantities. If \gamma[-1]=0, we have

aγ0¯\displaystyle a^{\overline{\gamma 0}} =(1rtqt,1)(aγqt1,2rt1(1qt1,1)1qt1,1rt1aγ)\displaystyle=(1-r_{t}q_{t,1})(a^{\gamma}-\frac{q_{t-1,2}r_{t-1}(1-q_{t-1,1})}{1-q_{t-1,1}r_{t-1}}a^{\gamma}) (39)

If γ[1]=1\gamma[-1]=1 and γ=β1¯,β[1]=0\gamma=\overline{\beta 1},\beta[-1]=0, we have

aγ0¯\displaystyle a^{\overline{\gamma 0}} =(1rtqt,1)(1pt1)(aγqt1,2cγ)\displaystyle=(1-r_{t}q_{t,1})(1-p_{t-1})(a^{\gamma}-q_{t-1,2}c^{\gamma}) (40)
=(1rtqt,1)(1pt1)\displaystyle=(1-r_{t}q_{t,1})(1-p_{t-1})
(aγqt1,2rt1(aβ(1qt1,1)(aβqt2,2cβ)))\displaystyle(a^{\gamma}-q_{t-1,2}r_{t-1}(a^{\beta}-(1-q_{t-1,1})(a^{\beta}-q_{t-2,2}c^{\beta})))

If γ[1]=1\gamma[-1]=1 and β[1]=1\beta[-1]=1, we have

aγ0¯\displaystyle a^{\overline{\gamma 0}} =(1rtqt,1)(1pt1)(aγqt1,2cγ)\displaystyle=(1-r_{t}q_{t,1})(1-p_{t-1})(a^{\gamma}-q_{t-1,2}c^{\gamma}) (41)
=(1rtqt,1)(1pt1)\displaystyle=(1-r_{t}q_{t,1})(1-p_{t-1})
(aγqt1,2rt1(aβ(1pt2)(1qt1,1)(aβqt2,2cβ)))\displaystyle(a^{\gamma}-q_{t-1,2}r_{t-1}(a^{\beta}-(1-p_{t-2})(1-q_{t-1,1})(a^{\beta}-q_{t-2,2}c^{\beta})))

Each of these equations contains only one extra factor compared to the t=n case, so we can still obtain the analogous equation

(Co1(β1)aβ110¯Co1(β2)aβ210¯)qt1,2rt1+(Co2(β1)aβ110¯Co2(β2)aβ210¯)qt1,2\displaystyle(\frac{\text{Co}_{1}(\beta_{1})}{a^{\overline{\beta_{1}10}}}-\frac{\text{Co}_{1}(\beta_{2})}{a^{\overline{\beta_{2}10}}})q_{t-1,2}r_{t-1}+(\frac{\text{Co}_{2}(\beta_{1})}{a^{\overline{\beta_{1}10}}}-\frac{\text{Co}_{2}(\beta_{2})}{a^{\overline{\beta_{2}10}}})q_{t-1,2} (42)
(Co1(β1)+Co2(β1)aβ110¯Co1(β2)+Co2(β2)aβ210¯)=0\displaystyle-(\frac{\text{Co}_{1}(\beta_{1})+\text{Co}_{2}(\beta_{1})}{a^{\overline{\beta_{1}10}}}-\frac{\text{Co}_{1}(\beta_{2})+\text{Co}_{2}(\beta_{2})}{a^{\overline{\beta_{2}10}}})=0

which is an equation in q_{t-1,2}r_{t-1} and q_{t-1,2}. Since we assume that r_{t-1} or q_{t-1,2} is known, we can obtain both r_{t-1} and q_{t-1,2} from this equation. We can then obtain r_{t}q_{t,1} from Equation 39 via r_{t}q_{t,1}=1-\frac{a^{\overline{\gamma 0}}}{a^{\gamma}-\frac{q_{t-1,2}r_{t-1}(1-q_{t-1,1})}{1-q_{t-1,1}r_{t-1}}a^{\gamma}}. Then all the terms in Equation 40 except p_{t-1} are known and nonzero, so we can obtain p_{t-1}. This completes the induction step.

Finally, we consider the base case. We only need to prove that if p_{1} and one of r_{1} and q_{1,2} are known, then q_{1,1}, r_{2}q_{2,1} and the remaining one of r_{1} and q_{1,2} can be computed using a^{\overline{00}}, a^{\overline{01}} and a^{\overline{10}}. We have the following equations

a^{\overline{00}}=(1-r_{1})(1-r_{2}q_{2,1})+r_{1}(1-q_{1,1})(1-q_{1,2})(1-r_{2}q_{2,1}), (43)
a^{\overline{10}}=r_{1}q_{1,1}(1-q_{1,2})(1-r_{2}q_{2,1})(1-p_{1}), (44)
a^{\overline{01}}=(1-r_{1})r_{2}q_{2,1}+r_{1}(1-q_{1,1})(1-(1-q_{1,2})(1-r_{2}q_{2,1})). (45)

When we know p_{1} and r_{1}, we can get the other parameters as below (first q_{1,1}, then r_{2}q_{2,1}, and finally q_{1,2}):

q_{1,1}=\frac{1-a^{\overline{00}}-a^{\overline{01}}}{r_{1}}, (46)
r_{2}q_{2,1}=((a^{\overline{00}})^{2}(p_{1}-1)+(1-r_{1})(p_{1}+a^{\overline{10}}-1)-a^{\overline{01}}(a^{\overline{10}}-1+p_{1}+r_{1}-p_{1}r_{1}) (47)
-a^{\overline{00}}(-2+a^{\overline{01}}+a^{\overline{10}}+2p_{1}-a^{\overline{01}}p_{1}+r_{1}-p_{1}r_{1})) (48)
/((a^{\overline{00}}+a^{\overline{01}}-1)(p_{1}-1)(r_{1}-1)), (49)
q_{1,2}=1-\frac{a^{\overline{10}}}{(1-p_{1})q_{1,1}r_{1}(1-q_{2,1}r_{2})}. (50)
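These closed forms can be checked by simulating a^{00}, a^{10}, a^{01} from Equations (43)-(45). The sketch below uses made-up parameter values (s stands for r_2 q_{2,1}) and recovers q_{1,1}, r_2 q_{2,1} and q_{1,2} when p_1 and r_1 are known, using the same algebra in unexpanded form:

```python
from fractions import Fraction as F

# Made-up parameters; s stands for r_2 q_{2,1}.
r1, q11, q12, s, p1 = F(1, 2), F(1, 4), F(1, 2), F(3, 10), F(2, 5)

# Simulate the observed quantities from Equations (43)-(45).
a00 = (1 - r1) * (1 - s) + r1 * (1 - q11) * (1 - q12) * (1 - s)
a10 = r1 * q11 * (1 - q12) * (1 - s) * (1 - p1)
a01 = (1 - r1) * s + r1 * (1 - q11) * (1 - (1 - q12) * (1 - s))

# Recovery: a^{00} + a^{01} = 1 - r_1 q_{1,1} gives q_{1,1}; Equation (44)
# gives the product (1-q_{1,2})(1-s); Equation (43) separates the factors.
q11_hat = (1 - a00 - a01) / r1
pair = a10 / (r1 * q11_hat * (1 - p1))                     # (1-q_{1,2})(1-s)
one_minus_s = (a00 - (1 - q11_hat) / q11_hat * a10 / (1 - p1)) / (1 - r1)
s_hat = 1 - one_minus_s
q12_hat = 1 - pair / one_minus_s
assert (q11_hat, s_hat, q12_hat) == (q11, s, q12)
```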

When we know p_{1} and q_{1,2}, we can get the other parameters as below:

r_{1}=\frac{1}{a^{\overline{10}}q_{1,2}}(-a^{\overline{00}}+(a^{\overline{00}})^{2}+a^{\overline{00}}a^{\overline{01}}+a^{\overline{00}}a^{\overline{10}}+a^{\overline{01}}a^{\overline{10}}+a^{\overline{00}}p_{1}-(a^{\overline{00}})^{2}p_{1} (51)
-a^{\overline{00}}a^{\overline{01}}p_{1}+a^{\overline{00}}q_{1,2}-(a^{\overline{00}})^{2}q_{1,2}-a^{\overline{00}}a^{\overline{01}}q_{1,2}-a^{\overline{00}}p_{1}q_{1,2}+a^{\overline{10}}q_{1,2}-a^{\overline{00}}a^{\overline{10}}q_{1,2} (52)
-a^{\overline{01}}a^{\overline{10}}q_{1,2}+(a^{\overline{00}})^{2}p_{1}q_{1,2}+a^{\overline{00}}a^{\overline{01}}p_{1}q_{1,2}), (53)
q_{1,1}=\frac{1-a^{\overline{00}}-a^{\overline{01}}}{r_{1}}, (54)
r_{2}q_{2,1}=\frac{-1+a^{\overline{10}}+p_{1}+(a^{\overline{00}}+a^{\overline{01}})(-1+p_{1})(-1+q_{1,2})+q_{1,2}-p_{1}q_{1,2}}{(-1+a^{\overline{00}}+a^{\overline{01}})(-1+p_{1})(-1+q_{1,2})}. (55)
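The symmetric case admits the same kind of check. With m = r_1 q_{1,1} = 1 - a^{00} - a^{01}, Equation (44) yields 1 - r_2 q_{2,1} and Equation (43) then isolates r_1; the sketch below verifies this with made-up parameter values:

```python
from fractions import Fraction as F

# Made-up parameters; s stands for r_2 q_{2,1}.
r1, q11, q12, s, p1 = F(1, 2), F(1, 4), F(1, 2), F(3, 10), F(2, 5)

# Simulate the observed quantities from Equations (43)-(45).
a00 = (1 - r1) * (1 - s) + r1 * (1 - q11) * (1 - q12) * (1 - s)
a10 = r1 * q11 * (1 - q12) * (1 - s) * (1 - p1)
a01 = (1 - r1) * s + r1 * (1 - q11) * (1 - (1 - q12) * (1 - s))

# Recovery when p_1 and q_{1,2} are known.
m = 1 - a00 - a01                                # equals r_1 q_{1,1}
one_minus_s = a10 / (m * (1 - q12) * (1 - p1))   # from Equation (44)
r1_hat = (1 - m * (1 - q12) - a00 / one_minus_s) / q12   # from Equation (43)
assert (r1_hat, m / r1_hat, 1 - one_minus_s) == (r1, q11, s)
```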

Notice that all the denominators in these solutions are nonzero; therefore, we have proved the base case. This completes the proof of the whole theorem.∎

Appendix 0.C Proof for the Model with a Global Hidden Variable (Theorem 6.1)

See 6.1

Proof

We have the following equations

P(V1=0,V2=0,,Vt=0)=(1q1)(1q2)(1qt)r+(1r),\displaystyle P(V_{1}=0,V_{2}=0,\cdots,V_{t}=0)=(1-q_{1})(1-q_{2})\cdots(1-q_{t})r+(1-r), (56)
P(V1=0,V2=0,,Vt=1,Vt+1=0,,Vt01=0)\displaystyle P(V_{1}=0,V_{2}=0,\cdots,V_{t}=1,V_{t+1}=0,\cdots,V_{t_{0}-1}=0)
=(1q1)(1q2)qt(1qt+1)(1qt01)r(Vt,Vi)E,t+1i<t0(1pt,i),\displaystyle=(1-q_{1})(1-q_{2})\cdots q_{t}(1-q_{t+1})\cdots(1-q_{t_{0}-1})r\prod_{(V_{t},V_{i})\in E,t+1\leq i<t_{0}}(1-p_{t,i}),
P(V1=0,V2=0,,Vt=1,Vt+1=0,,Vt0=0)\displaystyle P(V_{1}=0,V_{2}=0,\cdots,V_{t}=1,V_{t+1}=0,\cdots,V_{t_{0}}=0)
=(1-q_{1})(1-q_{2})\cdots q_{t}(1-q_{t+1})\cdots(1-q_{t_{0}})r\prod_{(V_{t},V_{i})\in E,t+1\leq i<t_{0}}(1-p_{t,i})

where V_{t_{0}} is not a child of V_{t}. Therefore, if there is a pair of unconnected nodes V_{i},V_{j} (i<j) with q_{i},q_{j}\neq 0, we can get 1-q_{j}=\frac{P(V_{1}=0,V_{2}=0,\cdots,V_{i}=1,V_{i+1}=0,\cdots,V_{j}=0)}{P(V_{1}=0,V_{2}=0,\cdots,V_{i}=1,V_{i+1}=0,\cdots,V_{j-1}=0)}. Then we can use the following two equations to solve for r:

(1qj)(r(1q1)(1q2)(1qj1))+(1r)\displaystyle(1-q_{j})(r(1-q_{1})(1-q_{2})\cdots(1-q_{j-1}))+(1-r) (57)
=P(V1=0,V2=0,,Vj=0),\displaystyle=P(V_{1}=0,V_{2}=0,\cdots,V_{j}=0),
(r(1q1)(1q2)(1qj1))+(1r)\displaystyle(r(1-q_{1})(1-q_{2})\cdots(1-q_{j-1}))+(1-r)
=P(V1=0,V2=0,,Vj1=0).\displaystyle=P(V_{1}=0,V_{2}=0,\cdots,V_{j-1}=0).

If we treat 1-r and r(1-q_{1})(1-q_{2})\cdots(1-q_{j-1}) as two unknown variables, they can be solved from this system of linear equations. With the value of r, we can solve for q_{1},q_{2},\cdots,q_{n} by using P(V_{1}=0,V_{2}=0,\cdots,V_{t-1}=0,V_{t}=1)=(1-q_{1})(1-q_{2})\cdots(1-q_{t-1})q_{t}r for t=1,2,\cdots,n. For p_{i,j}, it can be computed from 1-p_{i,j}=\frac{P(V_{1}=0,V_{2}=0,\cdots,V_{i}=1,V_{i+1}=0,\cdots,V_{j}=0)}{P(V_{1}=0,V_{2}=0,\cdots,V_{i}=1,V_{i+1}=0,\cdots,V_{j-1}=0)(1-q_{j})} if q_{i}\neq 0. We suppose the parents of V_{j} are V_{i_{1}},V_{i_{2}},\cdots,V_{i_{t}} with i_{1}<i_{2}<\cdots<i_{t}<j. Therefore, we have

\frac{P(V_{i_{1}}=0,\cdots,V_{i_{k-1}}=0,V_{i_{k}}=1,V_{i_{k+1}}=0,\cdots,V_{i_{t}}=0,V_{j}=0)}{P(V_{i_{1}}=0,\cdots,V_{i_{k-1}}=0,V_{i_{k}}=1,V_{i_{k+1}}=0,\cdots,V_{i_{t}}=0)} (58)
=(1-q_{j})(1-p_{i_{k},j}),\quad k=1,2,\cdots,t

if P(V_{i_{k}}=1)>0. Hence, we can obtain all the meaningful p_{i_{k},j}. Thus, all the parameters are solved in this case.
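The linear-system step for recovering r can be sketched as follows, with made-up parameters and q_j assumed already computed from the ratio above; u = 1-r and v = r(1-q_1)\cdots(1-q_{j-1}) are the two unknowns of Equation (57):

```python
from fractions import Fraction as F

# Made-up parameters; q_j is assumed already known.
r = F(3, 5)
q = [F(1, 3), F(1, 4), F(2, 5)]                  # q_1, q_2, q_3 with j = 3
qj = q[-1]

prod = F(1)
for qt in q[:-1]:
    prod *= 1 - qt                               # (1-q_1)...(1-q_{j-1})

# The two observed probabilities appearing in Equation (57).
P_j = (1 - qj) * r * prod + (1 - r)              # P(V_1=0,...,V_j=0)
P_jm1 = r * prod + (1 - r)                       # P(V_1=0,...,V_{j-1}=0)

# Solve (1-q_j) v + u = P_j and v + u = P_{j-1} for u and v.
v = (P_jm1 - P_j) / qj
u = P_jm1 - v
assert u == 1 - r and v == r * prod
```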

Now we consider the case that there exist no 1\leq i<j\leq n such that (V_{i},V_{j})\not\in E and q_{i},q_{j}\neq 0. This is equivalent to saying that the nodes V_{i} with q_{i}\neq 0 form a complete graph once edge directions are removed. Suppose three of them are V_{i},V_{j},V_{k}, 1\leq i<j<k\leq n. Then we can get the following equations:

P(V1=0,,Vi1=0,Vi=1,Vi+1=0,,Vj=0)P(V1=0,,Vi1=0,Vi=1,Vi+1=0,,Vj1=0)=(1qj)(1pi,j),\displaystyle\frac{P(V_{1}=0,\cdots,V_{i-1}=0,V_{i}=1,V_{i+1}=0,\cdots,V_{j}=0)}{P(V_{1}=0,\cdots,V_{i-1}=0,V_{i}=1,V_{i+1}=0,\cdots,V_{j-1}=0)}=(1-q_{j})(1-p_{i,j}), (59)
P(V1=0,,Vi1=0,Vi=1,Vi+1=0,,Vk=0)P(V1=0,,Vi1=0,Vi=1,Vi+1=0,,Vk1=0)=(1qk)(1pi,k),\displaystyle\frac{P(V_{1}=0,\cdots,V_{i-1}=0,V_{i}=1,V_{i+1}=0,\cdots,V_{k}=0)}{P(V_{1}=0,\cdots,V_{i-1}=0,V_{i}=1,V_{i+1}=0,\cdots,V_{k-1}=0)}=(1-q_{k})(1-p_{i,k}),
P(V1=0,,Vj1=0,Vj=1,Vj+1=0,,Vk=0)P(V1=0,,Vj1=0,Vj=1,Vj+1=0,,Vk1=0)=(1qk)(1pj,k),\displaystyle\frac{P(V_{1}=0,\cdots,V_{j-1}=0,V_{j}=1,V_{j+1}=0,\cdots,V_{k}=0)}{P(V_{1}=0,\cdots,V_{j-1}=0,V_{j}=1,V_{j+1}=0,\cdots,V_{k-1}=0)}=(1-q_{k})(1-p_{j,k}),
P(V1=0,,Vi1=0,Vi=1,Vi+1=0,,Vj1=0,Vj=1,Vj+1=0,,Vk=0)P(V1=0,,Vi1=0,Vi=1,Vi+1=0,,Vk1=0)\displaystyle\frac{P(V_{1}=0,\cdots,V_{i-1}=0,V_{i}=1,V_{i+1}=0,\cdots,V_{j-1}=0,V_{j}=1,V_{j+1}=0,\cdots,V_{k}=0)}{P(V_{1}=0,\cdots,V_{i-1}=0,V_{i}=1,V_{i+1}=0,\cdots,V_{k-1}=0)}
=(1qk)(1pi,k)(1pj,k)(1(1qj)(1pi,j))(1qj)(1pi,j).\displaystyle=\frac{(1-q_{k})(1-p_{i,k})(1-p_{j,k})(1-(1-q_{j})(1-p_{i,j}))}{(1-q_{j})(1-p_{i,j})}.

Therefore, q_{i},q_{j},q_{k},p_{i,j},p_{i,k},p_{j,k} can all be solved. We can then use the same procedure to obtain all of q_{1},q_{2},\cdots,q_{n} and then all the parameters.

In conclusion, we have solved the identifiability problem of the model with a global hidden variable.∎

Appendix 0.D Proof for the Mixed Model (Theorem 6.2)

See 6.2

Proof

To simplify our proof, we introduce some new notations. Suppose we already have V_{i},V_{j},V_{k} as required in the statement of the theorem.

a_{l}=r\prod_{t=1}^{l}(1-q_{t})(1-q_{0,t})=P(V_{1}=0,V_{2}=0,\cdots,V_{l}=0,U_{0}=1), (60)
b_{l}=(1-r)\prod_{t=1}^{l}(1-q_{t})=P(V_{1}=0,V_{2}=0,\cdots,V_{l}=0,U_{0}=0),
x_{i,l}=\frac{1-(1-q_{i})(1-q_{0,i})}{(1-q_{i})(1-q_{0,i})}\prod_{(V_{i},V_{t})\in E,t\leq l}(1-p_{i,t}) (61)
=\frac{P(V_{1}=0,\cdots,V_{i-1}=0,V_{i}=1,V_{i+1}=0,\cdots,V_{l}=0,U_{0}=1)}{P(V_{1}=0,V_{2}=0,\cdots,V_{l}=0,U_{0}=1)},
x_{j,l}=\frac{1-(1-q_{j})(1-q_{0,j})}{(1-q_{j})(1-q_{0,j})}\prod_{(V_{j},V_{t})\in E,t\leq l}(1-p_{j,t})
=\frac{P(V_{1}=0,\cdots,V_{j-1}=0,V_{j}=1,V_{j+1}=0,\cdots,V_{l}=0,U_{0}=1)}{P(V_{1}=0,V_{2}=0,\cdots,V_{l}=0,U_{0}=1)},
x_{k,l}=\frac{1-(1-q_{k})(1-q_{0,k})}{(1-q_{k})(1-q_{0,k})}\prod_{(V_{k},V_{t})\in E,t\leq l}(1-p_{k,t})
=\frac{P(V_{1}=0,\cdots,V_{k-1}=0,V_{k}=1,V_{k+1}=0,\cdots,V_{l}=0,U_{0}=1)}{P(V_{1}=0,V_{2}=0,\cdots,V_{l}=0,U_{0}=1)},
y_{i,l}=\frac{q_{i}}{1-q_{i}}\prod_{(V_{i},V_{t})\in E,t\leq l}(1-p_{i,t}) (62)
=\frac{P(V_{1}=0,\cdots,V_{i-1}=0,V_{i}=1,V_{i+1}=0,\cdots,V_{l}=0,U_{0}=0)}{P(V_{1}=0,V_{2}=0,\cdots,V_{l}=0,U_{0}=0)},
y_{j,l}=\frac{q_{j}}{1-q_{j}}\prod_{(V_{j},V_{t})\in E,t\leq l}(1-p_{j,t})
=\frac{P(V_{1}=0,\cdots,V_{j-1}=0,V_{j}=1,V_{j+1}=0,\cdots,V_{l}=0,U_{0}=0)}{P(V_{1}=0,V_{2}=0,\cdots,V_{l}=0,U_{0}=0)},
y_{k,l}=\frac{q_{k}}{1-q_{k}}\prod_{(V_{k},V_{t})\in E,t\leq l}(1-p_{k,t})
=\frac{P(V_{1}=0,\cdots,V_{k-1}=0,V_{k}=1,V_{k+1}=0,\cdots,V_{l}=0,U_{0}=0)}{P(V_{1}=0,V_{2}=0,\cdots,V_{l}=0,U_{0}=0)}.

It is easy to verify that $a_{l}+b_{l}=P(V_{1}=0,V_{2}=0,\cdots,V_{l}=0)\equiv p_{1}$. Moreover, because $V_{i},V_{j},V_{k}$ are pairwise disconnected, we have the following equations.

\displaystyle a_{l}x_{i,l}+b_{l}y_{i,l}=P(V_{1}=0,\cdots,V_{i-1}=0,V_{i}=1,V_{i+1}=0,\cdots,V_{l}=0)\equiv p_{2}, (63)
\displaystyle a_{l}x_{j,l}+b_{l}y_{j,l}=P(V_{1}=0,\cdots,V_{j-1}=0,V_{j}=1,V_{j+1}=0,\cdots,V_{l}=0)\equiv p_{3},
\displaystyle a_{l}x_{k,l}+b_{l}y_{k,l}=P(V_{1}=0,\cdots,V_{k-1}=0,V_{k}=1,V_{k+1}=0,\cdots,V_{l}=0)\equiv p_{4},
\displaystyle a_{l}x_{i,l}x_{j,l}+b_{l}y_{i,l}y_{j,l}=P(\overline{V_{1}V_{2}\cdots V_{l}}=\overline{0^{i-1}10^{j-i-1}10^{l-j}})\equiv p_{5},
\displaystyle a_{l}x_{i,l}x_{k,l}+b_{l}y_{i,l}y_{k,l}=P(\overline{V_{1}V_{2}\cdots V_{l}}=\overline{0^{i-1}10^{k-i-1}10^{l-k}})\equiv p_{6},
\displaystyle a_{l}x_{j,l}x_{k,l}+b_{l}y_{j,l}y_{k,l}=P(\overline{V_{1}V_{2}\cdots V_{l}}=\overline{0^{j-1}10^{k-j-1}10^{l-k}})\equiv p_{7},
\displaystyle a_{l}x_{i,l}x_{j,l}x_{k,l}+b_{l}y_{i,l}y_{j,l}y_{k,l}=P(\overline{V_{1}V_{2}\cdots V_{l}}=\overline{0^{i-1}10^{j-i-1}10^{k-j-1}10^{l-k}})\equiv p_{8}.

Here, $p_{t},t\in[8]$ are known because $V_{1},V_{2},\cdots,V_{l}$ are observable. Therefore, we can solve for $a_{l}$ and $b_{l}$ using these $8$ equations. The result is shown below.

\displaystyle a_{l}=\frac{\text{numerator}_{1}\pm\text{numerator}_{2}}{\text{denominator}}, (64)
\displaystyle\text{denominator}=2(-p_{1}^{2}p_{8}^{2}+2p_{1}p_{2}p_{7}p_{8}+2p_{1}p_{3}p_{6}p_{8}+2p_{1}p_{4}p_{5}p_{8}
\displaystyle-4p_{1}p_{5}p_{6}p_{7}-p_{2}^{2}p_{7}^{2}-4p_{2}p_{3}p_{4}p_{8}+2p_{2}p_{3}p_{6}p_{7}+2p_{2}p_{4}p_{5}p_{7}-p_{3}^{2}p_{6}^{2}
\displaystyle+2p_{3}p_{4}p_{5}p_{6}-p_{4}^{2}p_{5}^{2}),
\displaystyle\text{numerator}_{1}=2p_{1}^{2}p_{2}p_{7}p_{8}+2p_{1}^{2}p_{3}p_{6}p_{8}+2p_{1}^{2}p_{4}p_{5}p_{8}-4p_{1}^{2}p_{5}p_{6}p_{7}
\displaystyle-p_{1}p_{2}^{2}p_{7}^{2}-4p_{1}p_{2}p_{3}p_{4}p_{8}+2p_{1}p_{2}p_{3}p_{6}p_{7}+2p_{1}p_{2}p_{4}p_{5}p_{7}-p_{1}p_{3}^{2}p_{6}^{2}
\displaystyle+2p_{1}p_{3}p_{4}p_{5}p_{6}-p_{1}p_{4}^{2}p_{5}^{2}-p_{1}^{3}p_{8}^{2},
\displaystyle\text{numerator}_{2}=\left(p_{1}^{2}p_{8}-p_{1}p_{2}p_{7}-p_{1}p_{3}p_{6}-p_{1}p_{4}p_{5}+2p_{2}p_{3}p_{4}\right)
\displaystyle(p_{1}^{2}p_{8}^{2}-2p_{1}p_{2}p_{7}p_{8}-2p_{1}p_{3}p_{6}p_{8}-2p_{1}p_{4}p_{5}p_{8}+4p_{1}p_{5}p_{6}p_{7}+p_{2}^{2}p_{7}^{2}
\displaystyle+4p_{2}p_{3}p_{4}p_{8}-2p_{2}p_{3}p_{6}p_{7}-2p_{2}p_{4}p_{5}p_{7}+p_{3}^{2}p_{6}^{2}-2p_{3}p_{4}p_{5}p_{6}+p_{4}^{2}p_{5}^{2})^{\frac{1}{2}}.

We can verify that

\displaystyle(p_{2}p_{3}-p_{1}p_{5})(p_{2}p_{4}-p_{1}p_{6})(p_{3}p_{4}-p_{1}p_{7}) (65)
\displaystyle=-a_{l}^{3}b_{l}^{3}(x_{i,l}-y_{i,l})^{2}(x_{j,l}-y_{j,l})^{2}(x_{k,l}-y_{k,l})^{2}<0. (66)

Therefore, there is only one positive solution for $a_{l}$, because

\displaystyle\text{numerator}_{1}^{2}-\text{numerator}_{2}^{2} (67)
\displaystyle=-4(p_{2}p_{3}-p_{1}p_{5})(p_{2}p_{4}-p_{1}p_{6})(p_{3}p_{4}-p_{1}p_{7}) (68)
\displaystyle\cdot(p_{4}^{2}p_{5}^{2}-2p_{3}p_{4}p_{5}p_{6}+p_{3}^{2}p_{6}^{2}-2p_{2}p_{4}p_{5}p_{7}-2p_{2}p_{3}p_{6}p_{7}+4p_{1}p_{5}p_{6}p_{7}+p_{2}^{2}p_{7}^{2} (69)
\displaystyle+4p_{2}p_{3}p_{4}p_{8}-2p_{1}p_{4}p_{5}p_{8}-2p_{1}p_{3}p_{6}p_{8}-2p_{1}p_{2}p_{7}p_{8}+p_{1}^{2}p_{8}^{2}) (70)
\displaystyle=2(p_{2}p_{3}-p_{1}p_{5})(p_{2}p_{4}-p_{1}p_{6})(p_{3}p_{4}-p_{1}p_{7})\cdot\text{denominator}, (71)
\displaystyle\text{numerator}_{1}=\frac{1}{2}p_{1}\cdot\text{denominator}. (72)

Since $a_{l}$ is now fixed, we can obtain $b_{l}$ from $a_{l}+b_{l}=p_{1}$. This process works if and only if the denominator is nonzero, which is equivalent to $a_{l}^{2}b_{l}^{2}(x_{i,l}-y_{i,l})^{2}(x_{j,l}-y_{j,l})^{2}(x_{k,l}-y_{k,l})^{2}\neq 0$. Therefore, we only need to find $V_{i},V_{j},V_{k}$ satisfying the conditions in the theorem with $q_{0,i},q_{0,j},q_{0,k}\neq 0$.
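As a sanity check (not part of the proof), the following sketch builds $p_{1},\dots,p_{8}$ from Equation (63) using hypothetical values of $a_{l},b_{l}$ and $x_{\cdot,l},y_{\cdot,l}$, and confirms that the two roots of the closed form (64) are exactly $a_{l}$ and $b_{l}$:

```python
import math

# Hypothetical mixture parameters (a = a_l, b = b_l, x/y as in the proof).
a, b = 0.2, 0.5
xi, yi = 0.3, 0.8
xj, yj = 0.4, 0.9
xk, yk = 0.6, 0.1

# The 8 observable quantities of Equation (63).
p1 = a + b
p2 = a * xi + b * yi
p3 = a * xj + b * yj
p4 = a * xk + b * yk
p5 = a * xi * xj + b * yi * yj
p6 = a * xi * xk + b * yi * yk
p7 = a * xj * xk + b * yj * yk
p8 = a * xi * xj * xk + b * yi * yj * yk

den = 2 * (-p1**2 * p8**2 + 2*p1*p2*p7*p8 + 2*p1*p3*p6*p8 + 2*p1*p4*p5*p8
           - 4*p1*p5*p6*p7 - p2**2 * p7**2 - 4*p2*p3*p4*p8 + 2*p2*p3*p6*p7
           + 2*p2*p4*p5*p7 - p3**2 * p6**2 + 2*p3*p4*p5*p6 - p4**2 * p5**2)
num1 = 0.5 * p1 * den  # identity (72)
num2 = ((p1**2 * p8 - p1*p2*p7 - p1*p3*p6 - p1*p4*p5 + 2*p2*p3*p4)
        * math.sqrt(-den / 2))  # the radicand in (64) equals -denominator/2

roots = sorted([(num1 + num2) / den, (num1 - num2) / den])
assert all(abs(r - t) < 1e-9 for r, t in zip(roots, sorted([a, b])))
```

The two roots sum to $p_{1}$ by (72), so the $\pm$ ambiguity only swaps $a_{l}$ and $b_{l}$.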

Notice that $l$ is an arbitrary index not less than $k$, so for $k+1\leq t\leq n$, $q_{t}$ and $q_{0,t}$ can be computed using $\frac{a_{t}}{a_{t-1}}=(1-q_{t})(1-q_{0,t})$ and $\frac{b_{t}}{b_{t-1}}=1-q_{t}$. The parameter $p_{t,l}$, $l>t>k$, can be computed using

\displaystyle\frac{P(V_{1}=0,\cdots,V_{t-1}=0,V_{t}=1,V_{t+1}=0,\cdots,V_{l}=0)}{P(V_{1}=0,\cdots,V_{l}=0)} (73)
\displaystyle=\frac{a_{l}\frac{1-(1-q_{t})(1-q_{0,t})}{(1-q_{t})(1-q_{0,t})}+b_{l}\frac{q_{t}}{1-q_{t}}}{a_{l}+b_{l}}\prod_{(V_{t},V_{i})\in E,i\leq l}(1-p_{t,i}), (74)
\displaystyle\frac{P(V_{1}=0,\cdots,V_{t-1}=0,V_{t}=1,V_{t+1}=0,\cdots,V_{l-1}=0)}{P(V_{1}=0,\cdots,V_{l-1}=0)} (75)
\displaystyle=\frac{a_{l-1}\frac{1-(1-q_{t})(1-q_{0,t})}{(1-q_{t})(1-q_{0,t})}+b_{l-1}\frac{q_{t}}{1-q_{t}}}{a_{l-1}+b_{l-1}}\prod_{(V_{t},V_{i})\in E,i\leq l-1}(1-p_{t,i}). (76)

Then we can obtain $\prod_{(V_{t},V_{i})\in E,i\leq l-1}(1-p_{t,i})$ and $\prod_{(V_{t},V_{i})\in E,i\leq l}(1-p_{t,i})$, and their quotient is the desired factor $1-p_{t,l}$. This proves the first part of the theorem. Now suppose $i=j-1=k-2$. Then for an arbitrary $0$-$1$ string $\gamma$ of $i-1$ bits, we still have the following facts.

\displaystyle\frac{P(\overline{V_{1}\cdots V_{i-1}}=\gamma,V_{i}=1,V_{j}=1,V_{k}=1,U_{0}=t)}{P(\overline{V_{1}\cdots V_{i-1}}=\gamma,V_{i}=0,V_{j}=0,V_{k}=0,U_{0}=t)}=\frac{P(\overline{V_{1}\cdots V_{i-1}}=\gamma,V_{i}=1,V_{j}=0,V_{k}=0,U_{0}=t)}{P(\overline{V_{1}\cdots V_{i-1}}=\gamma,V_{i}=0,V_{j}=0,V_{k}=0,U_{0}=t)} (77)
\displaystyle\cdot\frac{P(\overline{V_{1}\cdots V_{i-1}}=\gamma,V_{i}=0,V_{j}=1,V_{k}=0,U_{0}=t)}{P(\overline{V_{1}\cdots V_{i-1}}=\gamma,V_{i}=0,V_{j}=0,V_{k}=0,U_{0}=t)}\cdot\frac{P(\overline{V_{1}\cdots V_{i-1}}=\gamma,V_{i}=0,V_{j}=0,V_{k}=1,U_{0}=t)}{P(\overline{V_{1}\cdots V_{i-1}}=\gamma,V_{i}=0,V_{j}=0,V_{k}=0,U_{0}=t)},
\displaystyle\frac{P(\overline{V_{1}\cdots V_{i-1}}=\gamma,V_{i}=1,V_{j}=1,V_{k}=0,U_{0}=t)}{P(\overline{V_{1}\cdots V_{i-1}}=\gamma,V_{i}=0,V_{j}=0,V_{k}=0,U_{0}=t)}=\frac{P(\overline{V_{1}\cdots V_{i-1}}=\gamma,V_{i}=1,V_{j}=0,V_{k}=0,U_{0}=t)}{P(\overline{V_{1}\cdots V_{i-1}}=\gamma,V_{i}=0,V_{j}=0,V_{k}=0,U_{0}=t)}
\displaystyle\cdot\frac{P(\overline{V_{1}\cdots V_{i-1}}=\gamma,V_{i}=0,V_{j}=1,V_{k}=0,U_{0}=t)}{P(\overline{V_{1}\cdots V_{i-1}}=\gamma,V_{i}=0,V_{j}=0,V_{k}=0,U_{0}=t)},
\displaystyle\frac{P(\overline{V_{1}\cdots V_{i-1}}=\gamma,V_{i}=1,V_{j}=0,V_{k}=1,U_{0}=t)}{P(\overline{V_{1}\cdots V_{i-1}}=\gamma,V_{i}=0,V_{j}=0,V_{k}=0,U_{0}=t)}=\frac{P(\overline{V_{1}\cdots V_{i-1}}=\gamma,V_{i}=1,V_{j}=0,V_{k}=0,U_{0}=t)}{P(\overline{V_{1}\cdots V_{i-1}}=\gamma,V_{i}=0,V_{j}=0,V_{k}=0,U_{0}=t)}
\displaystyle\cdot\frac{P(\overline{V_{1}\cdots V_{i-1}}=\gamma,V_{i}=0,V_{j}=0,V_{k}=1,U_{0}=t)}{P(\overline{V_{1}\cdots V_{i-1}}=\gamma,V_{i}=0,V_{j}=0,V_{k}=0,U_{0}=t)},
\displaystyle\frac{P(\overline{V_{1}\cdots V_{i-1}}=\gamma,V_{i}=0,V_{j}=1,V_{k}=1,U_{0}=t)}{P(\overline{V_{1}\cdots V_{i-1}}=\gamma,V_{i}=0,V_{j}=0,V_{k}=0,U_{0}=t)}=\frac{P(\overline{V_{1}\cdots V_{i-1}}=\gamma,V_{i}=0,V_{j}=0,V_{k}=1,U_{0}=t)}{P(\overline{V_{1}\cdots V_{i-1}}=\gamma,V_{i}=0,V_{j}=0,V_{k}=0,U_{0}=t)}
\displaystyle\cdot\frac{P(\overline{V_{1}\cdots V_{i-1}}=\gamma,V_{i}=0,V_{j}=1,V_{k}=0,U_{0}=t)}{P(\overline{V_{1}\cdots V_{i-1}}=\gamma,V_{i}=0,V_{j}=0,V_{k}=0,U_{0}=t)},

where $t=0$ or $t=1$. Therefore, we still have $8$ equations of the same form as Equation (63), and from them we can solve for all of the probabilities in the facts above. Using these probabilities, we can directly obtain all the parameters with indices not larger than $k$. Together with the conclusion of the first part of the theorem, all the parameters are determined.∎
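The ratio-based recovery of $q_{t}$ and $q_{0,t}$ used in the first part of the proof can be sketched numerically as follows; all parameter values are illustrative, and the sequences $a_{t},b_{t}$ stand in for the quantities solved above.

```python
# Hypothetical model parameters: r = P(U_0 = 1), per-node self-activation
# probabilities q_t, and hidden-variable activation probabilities q_{0,t}.
r = 0.6
q = [0.1, 0.2, 0.3]
q0 = [0.4, 0.25, 0.15]

# Build the sequences a_t and b_t from their definitions (60).
a = [r]
b = [1 - r]
for qt, q0t in zip(q, q0):
    a.append(a[-1] * (1 - qt) * (1 - q0t))
    b.append(b[-1] * (1 - qt))

# Recover the parameters from consecutive ratios:
#   b_t / b_{t-1} = 1 - q_t,   a_t / a_{t-1} = (1 - q_t)(1 - q_{0,t}).
q_hat = [1 - b[t] / b[t - 1] for t in range(1, len(b))]
q0_hat = [1 - (a[t] / a[t - 1]) / (1 - q_hat[t - 1]) for t in range(1, len(a))]

assert all(abs(x - y) < 1e-9 for x, y in zip(q_hat, q))
assert all(abs(x - y) < 1e-9 for x, y in zip(q0_hat, q0))
```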

Appendix 0.E Discussion on the Cyclic Models

In this last section of the appendix, we show how to transform a general IC model into a causal model so that the do-calculus can be used to identify do effects. Since the propagation of the IC model takes place for at most $n$ rounds [2], we let $V_{i,t}$ denote the state of $V_{i}$ in round $t$, where $V_{i,t}$ takes one of three values $0,1,2$: state $0$ means that the node is not activated, state $1$ means that the node was newly activated in the previous round, and state $2$ means that the node is activated and has already tried to activate its out-neighbors. We construct a causal graph $G^{\prime}=(V^{\prime},E^{\prime})$ on these time-indexed nodes, with edges $(V_{i,t},V_{i,t+1})$ for $1\leq i,t\leq n$, and $(V_{i,t},V_{j,t+1})$ for $1\leq i,j,t\leq n$ with $(V_{i},V_{j})\in E$. Furthermore, we define the following transition probabilities.

\displaystyle P(V_{i,t}=2|V_{i,t-1}=2,V_{j,t-1}=v_{j,t-1},\forall j,(V_{j},V_{i})\in E)=1, (78)
\displaystyle P(V_{i,t}=2|V_{i,t-1}=1,V_{j,t-1}=v_{j,t-1},\forall j,(V_{j},V_{i})\in E)=1, (79)
\displaystyle P(V_{i,t}=2|V_{i,t-1}=0,V_{j,t-1}=v_{j,t-1},\forall j,(V_{j},V_{i})\in E)=0, (80)
\displaystyle P(V_{i,t}=1|V_{i,t-1}=0,V_{j,t-1}=v_{j,t-1},\forall j,(V_{j},V_{i})\in E) (81)
\displaystyle=1-\prod_{1\leq j\leq n,(V_{j},V_{i})\in E}(1-p_{j,i}v_{j,t-1}I[v_{j,t-1}\neq 2]), (82)
\displaystyle P(V_{i,t}=0|V_{i,t-1}=0,V_{j,t-1}=v_{j,t-1},\forall j,(V_{j},V_{i})\in E) (83)
\displaystyle=\prod_{1\leq j\leq n,(V_{j},V_{i})\in E}(1-p_{j,i}v_{j,t-1}I[v_{j,t-1}\neq 2]). (84)
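The transition probabilities above can be sketched as a one-round simulation step; the graph, probabilities, and function name below are illustrative, not from the paper.

```python
import random

# One round t-1 -> t of the time-expanded chain. States: 0 = inactive,
# 1 = newly activated in the previous round, 2 = activated and already
# tried its out-neighbors. p[j][i] = p_{j,i} (0 when (V_j, V_i) is not an edge).
def step(state, p, rng):
    n = len(state)
    nxt = []
    for i in range(n):
        if state[i] in (1, 2):   # Eqs. (78)-(79): active nodes move to state 2
            nxt.append(2)
            continue
        # Eqs. (81)-(84): only in-neighbors in state 1 can activate node i
        stay = 1.0
        for j in range(n):
            if state[j] == 1:
                stay *= 1 - p[j][i]
        nxt.append(1 if rng.random() < 1 - stay else 0)
    return nxt

# Deterministic example: a path V1 -> V2 -> V3 with probability-1 edges.
rng = random.Random(0)
p = [[0, 1.0, 0], [0, 0, 1.0], [0, 0, 0]]
s = [1, 0, 0]
s = step(s, p, rng)   # [2, 1, 0]
s = step(s, p, rng)   # [2, 2, 1]
assert s == [2, 2, 1]
```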

By definition, $G^{\prime}$ is a Bayesian causal model, so we can use the do-calculus [19] to determine the identifiable do effects. For example, suppose $G$ has $V_{1},V_{2},V_{3}$ as observed nodes and $U_{1}$ as an unobserved node, with edges $(U_{1},V_{1}),(U_{1},V_{2}),(V_{1},V_{2}),(V_{2},V_{3})$, and $(V_{3},V_{1})$ in $E$. There is a cycle in $G$, so $G$ cannot be viewed as a DAG. However, applying our transformation yields the graph $G^{\prime}$ shown in Figure 5, which is a Bayesian causal model.

Figure 5: An example of transformation from IC model to Bayesian causal graph.