Inference on autoregulation in gene expression

Yue Wang
Department of Computational Medicine, University of California, Los Angeles, California, United States of America
Institut des Hautes Études Scientifiques (IHÉS), Bures-sur-Yvette, Essonne, France
E-mail address: [email protected] (Y. W.). ORCID: 0000-0001-5918-7525

Siqi He
Simons Center for Geometry and Physics, Stony Brook University, Stony Brook, New York, United States of America

Abstract

Some genes can promote or repress their own expressions, which is called autoregulation. Although gene regulation is a central topic in biology, autoregulation is much less studied. In general, it is extremely difficult to determine the existence of autoregulation with direct biochemical approaches. Nevertheless, some papers have observed that certain types of autoregulations are linked to noise levels in gene expression. We generalize these results by two propositions on discrete-state continuous-time Markov chains. These two propositions form a simple but robust method to infer the existence of autoregulation from gene expression data. This method only needs to compare the mean and variance of the gene expression level. Compared to other methods for inferring autoregulation, our method only requires non-interventional one-time data, and does not need to estimate parameters. Besides, our method has few restrictions on the model. We apply this method to four groups of experimental data and find some genes that might have autoregulation. Some inferred autoregulations have been verified by experiments or other theoretical works.

Keywords.

inference; gene expression; autoregulation; Markov chain.

Frequently used abbreviations:

GRN: gene regulatory network.

VMR: variance-to-mean ratio.

1 Introduction

In general, genes are transcribed to mRNAs and then translated to proteins. We can use the abundance of mRNA or protein to represent the expression levels of genes. Both the synthesis and degradation of mRNAs/proteins are affected (activated or inhibited) by the expression levels of other genes [1], which is called (mutual) gene regulation. Genes and their regulatory relations form a gene regulatory network (GRN) [2], generally represented as a directed graph: each vertex is a gene, and each directed edge is a regulatory relationship. See Fig. 1 for an example of GRN.

[Diagram: directed graph with vertices PLCG, PIP3, AKT, PIP2, RAF, MEK, PKC, PKA, ERK, JNK, and P38.]

Figure 1: An example of GRN in human T cells [3]. Each vertex is a gene. Each arrow is a regulatory relationship. Notice that it has no directed cycle.

The expression of one gene could promote/repress its own expression, which is called positive/negative autoregulation [4]. Autoregulation is very common in E. coli [5]. Positive autoregulation is also called autocatalysis or autoactivation, and negative autoregulation is also called autorepression [6, 7]. For instance, HOX proteins form and maintain spatially inhomogeneous expression of HOX genes [8]. For genes with position-specific expressions during development, it is common that the increase of one gene can further increase or decrease its level [9].

While countless works infer the regulatory relationships between different genes (GRN structure) [10], determining the existence of autoregulation is an equally important yet less-studied problem. Due to technical limitations, it is difficult and sometimes impossible to directly detect autoregulation in experiments. Instead, we can measure gene expression profiles and infer the existence of autoregulation. In this paper, we consider a specific data type: measure the expression levels of certain genes without intervention in a single cell (which has reached stationarity) at a single time point, and repeat for many different cells to obtain a probability distribution of expression levels. Such single-cell non-interventional one-time gene expression data can be obtained at a relatively low cost [11].

With such single-cell level data for one gene $V$, we can calculate the ratio of variance and mean of the expression level (mRNA or protein count). This quantity is called the variance-to-mean ratio (VMR) or the Fano factor. Many papers that study gene expression systems with autoregulations have found that negative autoregulation can decrease noise (smaller VMR), and positive autoregulation can increase noise (larger VMR) [12, 13, 14, 15, 16, 17, 18]. This means VMR can be used to infer the existence of autoregulation.
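As a minimal sketch of how the VMR is obtained from one-time single-cell data, the following computes variance divided by mean for a list of per-cell counts. The function name `variance_to_mean_ratio` and the sample counts are hypothetical and used only for illustration; the population variance is assumed here, which differs negligibly from the sample variance when many cells are measured.

```python
from statistics import fmean, pvariance


def variance_to_mean_ratio(counts):
    """Return variance / mean (the Fano factor) of per-cell molecule counts."""
    mean = fmean(counts)
    if mean <= 0:
        raise ValueError("VMR is undefined when the mean count is zero")
    # Population variance (divides by the number of cells, not by n - 1).
    return pvariance(counts) / mean


# Hypothetical mRNA counts of one gene in eight cells (illustrative only).
example_counts = [3, 5, 4, 6, 2, 4, 5, 3]
example_vmr = variance_to_mean_ratio(example_counts)
```

For these hypothetical counts the mean is 4 and the population variance is 1.5, so the ratio is 0.375.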

We generalize the above observation and develop two mathematical results that use VMR to determine the existence of autoregulation. They apply to some genes that have autoregulation. For genes without autoregulation, these results cannot determine that autoregulation does not exist. We apply these results to four experimental gene expression data sets and detect some genes that might have autoregulation.

We start with some setup and introduce our main results (Section 2). Then we cite some previous works on this topic and compare them with our results (Section 3). For a single gene that is not regulated by other genes (Section 4) and multiple genes that regulate each other (Section 5), we develop mathematical results to identify the existence of autoregulation. These two mathematical sections can be skipped. We summarize the procedure of our method and apply it to experimental data (Section 6). We finish with some conclusions and discussions (Section 7).

2 Setup and main results

One possible mechanism of “the increase of one gene’s expression level further increases its expression level” is a positive feedback loop between two genes [19]. Here $V_1$ and $V_2$ promote each other, so that the increase of $V_1$ increases $V_2$, which in return further increases $V_1$. We should not regard this feedback loop as autoregulation. When we define autoregulation for a gene $V$, we should fix environmental factors and other genes that regulate $V$, and observe whether the expression level of $V$ can affect itself. If $V$ is in a feedback loop that contains other genes, then those genes (which regulate $V$ and are regulated by $V$) cannot be fixed when we change $V$. Therefore, it is essentially difficult to determine whether $V$ has autoregulation in this scenario. In the following, we need to assume that $V$ is not contained in a feedback loop that involves other genes.

The actual gene expression mechanism might be complicated. Besides other genes/factors that can regulate a gene, for a gene $V$ itself, it might switch between inactivated (off) and activated (on) states [20]. These states correspond to different transcription rates to produce mRNAs. When mRNAs are translated into proteins, those proteins might affect the transition of gene activation states, which forms autoregulation [21]. See Fig. 2 for an illustration. Therefore, for a gene $V$, we should regard the gene activation state, mRNA count, and protein count as a triplet of random variables $G,M,P$, which depend on each other.

[Diagram: inactivated gene, activated gene, mRNA, protein.]

Figure 2: The mechanism of gene expression. A gene might switch between inactivated state and activated state. Gene is transcribed into mRNAs, which are translated into proteins. Proteins might (auto)regulate the state transition of the corresponding gene.

When we fix environmental factors and other genes that affect $V$, the triplet $G,M,P$ should follow a continuous-time Markov chain. The state space is on/off (for $G$) or the mRNA/protein count on $\mathbb{Z}$ (for $M,P$). When we consider the expression level $M$ or $P$ (but do not control $G$), sometimes it alone still follows a Markov chain, and we call this scenario “autonomous”. In other cases, $M$ or $P$ alone is no longer Markovian, and we call this scenario “non-autonomous”. We need to consider the triplet $G,M,P$ in the non-autonomous scenario.

For the autonomous scenario, we can fully classify autoregulation for a gene $V$. Assume environmental factors and other genes that affect the expression of $V$ are kept at constants. Define the expression level (mRNA count for example) of one cell to be $X=n$, the mRNA synthesis rate at $X=n-1$ to be $f_n$, and the degradation rate for each mRNA molecule at $X=n$ to be $g_n$. This is a standard continuous-time Markov chain on $\mathbb{Z}$ with transition rates (in the limit $\Delta t\to 0$)

\[
\frac{1}{\Delta t}\mathbb{P}[X(t+\Delta t)=n\mid X(t)=n-1]=f_{n},
\]
\[
\frac{1}{\Delta t}\mathbb{P}[X(t+\Delta t)=n-1\mid X(t)=n]=ng_{n}.
\]

Define the relative growth rate $h_n=f_n/g_n$. If there is no autoregulation, then $h_n$ is a constant. Positive autoregulation means $h_n>h_{n-1}$ for some $n$, so that $f_n>f_{n-1}$ and/or $g_n<g_{n-1}$; negative autoregulation means $h_n<h_{n-1}$ for some $n$, so that $f_n<f_{n-1}$ and/or $g_n>g_{n-1}$. Notice that we can have $h_n>h_{n-1}$ for some $n$ and $h_{n'}<h_{n'-1}$ for some other $n'$, meaning that positive autoregulation and negative autoregulation can both exist for the same gene, but occur at different expression levels.

For the non-autonomous scenario, we can still define autoregulation. Consider the expression level $X$ of $V$ (mRNA count or protein count) and its interior factor $I$. If $X$ is the mRNA count, then $I$ is the gene state; if $X$ is the protein count, then $I$ is the gene state and the mRNA count. If there is no autoregulation, then $X$ cannot affect $I$, and for each value of $I$, the relative growth rate $h_n$ of $X$ is a constant. If $X$ can affect $I$, or $h_n$ is not a constant, then there is autoregulation. When $X$ can affect $I$, it is not always easy to distinguish between positive autoregulation and negative autoregulation.

Quantitatively, for the autonomous scenario, when we fix other factors that might regulate this gene $V$, if $V$ has no autoregulation, then $h_n=f_n/g_n$ is a constant $h$ for all $n$. In this case, the stationary distribution of $V$ satisfies $\mathbb{P}(X=n)/\mathbb{P}(X=n-1)=h/n$, meaning that the distribution is Poisson with parameter $h$, $\mathbb{P}(X=n)=h^{n}e^{-h}/n!$, and $\text{VMR}=1$. If there exists positive autoregulation of certain forms, $\text{VMR}>1$; if there exists negative autoregulation of certain forms, $\text{VMR}<1$. However, such results are derived by assuming that $f_n,g_n$ take certain functional forms, such as linear functions [22, 23], quadratic functions [24], or Hill functions [25]. There are other papers that consider Markov chain models in gene expression/regulation [26, 27, 28, 29, 30, 31], but the role of VMR is not thoroughly studied.

In this paper, we generalize the above result of inferring autoregulation with VMR by dropping the restrictions on parameters. Consider a gene $V$ in a known GRN, and assume it is not regulated by other genes, or assume other factors that regulate $V$ are fixed. Assume we have the autonomous scenario, meaning that its expression level $X=n$ satisfies a general Markov chain with synthesis rate $f_n$ and per molecule degradation rate $g_n$. We do not add any restrictions on $f_n$ and $g_n$. Use the single-cell non-interventional one-time gene expression data to calculate the VMR of $V$. Proposition 1 states that $\text{VMR}>1$ or $\text{VMR}<1$ means the existence of positive/negative autoregulation.

Nevertheless, the autonomous condition requires some assumptions, and often does not hold in reality [32, 33, 34, 26]. Consider a gene $V$ that is not regulated by other genes, and has no autoregulation. The mRNA count or the protein count is regulated by the gene activation state, which cannot be fixed. Due to this non-controllable factor, there might be transcriptional bursting [35, 36] or translational bursting [37], where transcription or translation can occur in bursts, and we have $\text{VMR}>1$. This does not mean that Proposition 1 is wrong. Instead, it means that the expression level itself is not Markovian, and the scenario is non-autonomous. In this scenario, we should apply Proposition 2, described below, which states that no autoregulation means $\text{VMR}\geq 1$.
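To make the effect of bursting concrete, one can look at the standard two-state (telegraph) model of a gene without autoregulation. This model and the rate symbols $k_{\mathrm{on}}$, $k_{\mathrm{off}}$, $s$, $d$ are introduced here only for illustration and are not notation used elsewhere in this paper. Solving the stationary first- and second-moment equations of the pair (gene state, mRNA count $M$) gives:

```latex
% Assumed model: the gene switches off -> on at rate k_on and on -> off at rate k_off,
% independently of M; mRNA is synthesized at rate s only in the on state;
% each mRNA molecule is degraded at rate d.
\mathbb{E}(M)=\frac{s}{d}\cdot\frac{k_{\mathrm{on}}}{k_{\mathrm{on}}+k_{\mathrm{off}}},
\qquad
\text{VMR}(M)=1+\frac{s\,k_{\mathrm{off}}}{(k_{\mathrm{on}}+k_{\mathrm{off}})(k_{\mathrm{on}}+k_{\mathrm{off}}+d)}>1 .
```

In this sketch the excess over 1 comes entirely from the switching of the gene state, not from any dependence of the rates on $M$, which agrees with the bound $\text{VMR}\geq 1$ for genes without autoregulation.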

We extend the idea of inferring autoregulation with VMR to a gene that is regulated by other genes, or with non-autonomous expression. Consider a gene $V'$ in a known GRN. Assume $V'$ is not contained in a feedback loop, and assume $g_n$, the per molecule degradation rate of $V'$, is not regulated by other genes or its interior factors. We do not add any restrictions on the synthesis rate $f_n$. Proposition 2 states that if $V'$ has no autoregulation, then $\text{VMR}\geq 1$. Therefore, $\text{VMR}<1$ means autoregulation for $V'$. The conclusion “$\text{VMR}<1$ means autoregulation” has been observed by Munsky et al. [15] for a single gene that is non-autonomous.

In the scenario where Proposition 2 may apply, if $\text{VMR}\geq 1$, Proposition 2 cannot determine whether autoregulation exists. In fact, with VMR, or even the full probability distribution, we might not be able to distinguish a non-autonomous system with autoregulation from a non-autonomous system without autoregulation, which both have $\text{VMR}\geq 1$ [38]. In the non-autonomous scenario, we only focus on the less complicated case of $\text{VMR}<1$, and derive Proposition 2 that firmly links VMR and autoregulation.

In reality, Proposition 1 and Proposition 2 can only apply to a few genes (which are not regulated by other genes or have $\text{VMR}<1$), and they cannot determine negative results. Thus the inference results about autoregulation are a few “yes” and many “we do not know”. Besides, for the results inferred by Proposition 1, especially those with $\text{VMR}>1$ (positive autoregulation), we cannot verify whether their expression is autonomous, and the inference results are less reliable.
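The way the two propositions combine into “a few yes and many we do not know” can be sketched as a small decision function. This is only an illustration under stated assumptions: the function name, argument names, and returned strings are hypothetical, the VMR point estimate is compared with 1 without any treatment of sampling error, and the conditions of the propositions (known GRN, degradation rate not regulated by other factors) are taken for granted.

```python
def infer_autoregulation(vmr, regulated_by_other_genes, in_feedback_loop):
    """Return a hedged verdict for one gene from its VMR and its position in the GRN."""
    if in_feedback_loop:
        # Neither proposition covers a gene inside a feedback loop.
        return "unknown"
    if vmr < 1:
        # Proposition 2: without autoregulation we would have VMR >= 1.
        return "autoregulation"
    if vmr > 1 and not regulated_by_other_genes:
        # Proposition 1, valid only if the expression is autonomous (less reliable).
        return "positive autoregulation, if autonomous"
    # The propositions never certify the absence of autoregulation.
    return "unknown"


# Hypothetical usage with made-up VMR values.
verdicts = [
    infer_autoregulation(0.6, regulated_by_other_genes=True, in_feedback_loop=False),
    infer_autoregulation(3.2, regulated_by_other_genes=False, in_feedback_loop=False),
    infer_autoregulation(3.2, regulated_by_other_genes=True, in_feedback_loop=False),
]
```

Note that the function has no branch that returns “no autoregulation”, mirroring the fact that the propositions cannot give negative results.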

Current experimental methods can hardly determine the existence of autoregulation, and to determine that a gene does not have autoregulation is even more difficult. Therefore, about whether genes in a GRN have autoregulation, experimentally, we do not have “yes” or “no”, but a few “yes” and many “we do not know”. Thus there is no gold standard to thoroughly evaluate the performance of our inference results. We can only report that some genes inferred by our method to have autoregulation are also verified by experiments or other inference methods to have autoregulation.

3 Related works

There are other mathematical approaches to infer the existence of autoregulation in gene expression [39, 40, 41, 42, 43, 44, 45, 46]. We introduce some works and compare them with our method. (A) Sanchez-Castillo et al. [39] considered an autoregressive model for multiple genes. This method (1) needs time series data; (2) requires the dynamics to be linear; (3) estimates a group of parameters. (B) Xing et al. [40] applied causal inference to a complicated gene expression model. This method (1) needs promoter sequences and information on transcription factor binding sites; (2) requires linearity for certain steps; (3) estimates a group of parameters. (C) Feigelman et al. [41] applied a Bayesian method for model selection. This method (1) needs time series data; (2) estimates a group of parameters. (D) Veerman et al. [42] considered the probability-generating function of a propagator model. This method (1) needs time series data; (2) estimates a group of parameters; (3) needs to approximate a Cauchy integral. (E) Jia et al. [43] compared the relaxation rate with degradation rate. This method (1) needs interventional data; (2) only works for a single gene that is not regulated by other genes; (3) requires that the per molecule degradation rate is a constant.

Compared to other methods, our method has some advantages: (1) Our method uses non-interventional one-time data. Time series data require measuring the same cell multiple times without killing it, and interventional data require some techniques to interfere with gene expression, such as gene knockdown. Therefore, non-interventional one-time data used in our method are much easier and cheaper to obtain. (2) Our method does not estimate parameters, and only calculates the mean and variance of the expression level. Some other methods need to estimate many parameters or approximate some complicated quantities, meaning that they need large data size and high data accuracy. Therefore, our method is easy to calculate, and needs lower data accuracy and smaller data size. (3) Our method has few restrictions on the model, making it applicable to various scenarios with different dynamics. In sum, our method is simple and universal, and has lower requirements on data quality.

Compared to other methods, our method has some disadvantages: (1) The GRN structure needs to be known. (2) Our method does not work for certain genes, depending on regulatory relationships. Proposition 1 only works for a gene that is not regulated by other genes, and we require its expression to be autonomous; Proposition 2 only works for a gene that is not in a feedback loop. (3) Proposition 2 requires the per molecule degradation rate to be a constant, and it cannot provide information about autoregulation if $\text{VMR}\geq 1$. (4) Our method only works for cells at equilibrium. Thus time series data that contain time-specific information can only be used by treating them as one-time data. With only the stationary distribution, sometimes it is impossible to build the causal relationship (including autoregulation) [47]. Thus with this data type, some disadvantages are inevitable.

4 Scenario of a single isolated gene

4.1 Setup

We first consider the expression level (e.g., mRNA count) of one gene $V$ in a single cell. At the single-cell level, gene expression is essentially stochastic, and we use a random variable $X$ to represent the mRNA count of $V$. We assume $V$ is not in a feedback loop. We also assume all environmental factors and other genes that can affect $X$ are kept at constant levels, so that we can focus on $V$ alone. This can be achieved if no other genes point to gene $V$ in the GRN, such as PIP3 in Fig. 1. Then we assume that the expression of $V$ is autonomous, thus $X$ satisfies a time-homogeneous Markov chain defined on $\mathbb{Z}^{*}$.

Assume that the mRNA synthesis rate at $X(t)=n-1$, namely the transition rate from $X=n-1$ to $X=n$, is $f_n>0$. Assume that with $n$ mRNA molecules, the degradation rate for each mRNA molecule is $g_n>0$. Then the overall degradation rate at $X(t)=n$, namely the transition rate from $X=n$ to $X=n-1$, is $g_n n$. The associated master equation is

\[
\begin{split}
\mathrm{d}\mathbb{P}[X(t)=n]/\mathrm{d}t=&\ \mathbb{P}[X(t)=n+1]\,g_{n+1}(n+1)+\mathbb{P}[X(t)=n-1]\,f_{n}\\
&-\mathbb{P}[X(t)=n]\,(f_{n+1}+g_{n}n).
\end{split}
\tag{1}
\]

Define the relative growth rate $h_n=f_n/g_n$. We assume that $h_n$ has a finite upper bound. This means that as time tends to infinity, the process reaches equilibrium. Thus at equilibrium, (1) the stationary probability distribution $P_n=\lim_{t\to\infty}\mathbb{P}[X(t)=n]$ exists, and $P_n=P_{n-1}h_n/n$; (2) the mean $\mathbb{E}(X)$ and the variance $\sigma^2(X)$ are finite [48].
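The stationary recurrence $P_n=P_{n-1}h_n/n$ can be evaluated numerically. The sketch below is an illustration under assumptions not in the text: the state space is truncated at a hypothetical cutoff `n_max` and the weights are renormalized, and the function names are made up. With a constant $h_n$ it reproduces the Poisson distribution mentioned for the case without autoregulation.

```python
import math


def stationary_distribution(h, n_max):
    """Return [P_0, ..., P_{n_max}] from P_n = P_{n-1} * h(n) / n, truncated and renormalized."""
    weights = [1.0]  # unnormalized P_0
    for n in range(1, n_max + 1):
        weights.append(weights[-1] * h(n) / n)
    total = sum(weights)
    return [w / total for w in weights]


def mean_and_vmr(p):
    """Return the mean and the variance-to-mean ratio of a distribution on 0, 1, 2, ..."""
    mean = sum(n * pn for n, pn in enumerate(p))
    second_moment = sum(n * n * pn for n, pn in enumerate(p))
    return mean, (second_moment - mean * mean) / mean


# Constant relative growth rate h_n = 3 (no autoregulation); the cutoff 80 is arbitrary
# but far in the tail, so the truncation error is negligible.
p = stationary_distribution(lambda n: 3.0, 80)
mean, vmr = mean_and_vmr(p)
poisson_p2 = 3.0 ** 2 * math.exp(-3.0) / math.factorial(2)
```

Here `mean` is numerically 3, `vmr` is numerically 1, and `p[2]` matches the Poisson probability `poisson_p2`.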

If $h_n>h_{n-1}$ for some $n$, then there exists positive autoregulation. If $h_n<h_{n-1}$ for some $n$, then there exists negative autoregulation. If there is no autoregulation, then $h_n$ is a constant $h$, and the stationary distribution is Poisson with parameter $h$. In this setting, positive autoregulation and negative autoregulation might coexist, meaning that $h_{n+1}<h_n$ for some $n$ and $h_{n'+1}>h_{n'}$ for some $n'$.

4.2 Theoretical results

With single-cell non-interventional one-time gene expression data for one gene, we have the stationary distribution of the Markov chain $X$. We can infer the existence of autoregulation with the VMR of $X$, defined as $\text{VMR}(X)=\sigma^2(X)/\mathbb{E}(X)$. The idea is that if we let $f_n$ increase/decrease with $n$, and control $g_n$ to make $\mathbb{E}(X)$ invariant, then the variance $\sigma^2(X)$ increases/decreases [49]. We shall prove that $\text{VMR}>1$ means positive autoregulation, and $\text{VMR}<1$ means negative autoregulation. Notice that $\text{VMR}>1$ does not exclude the possibility that negative autoregulation exists for some expression level. This also applies to $\text{VMR}<1$ and positive autoregulation.

We can illustrate this result with a linear model: set $f_n=k+b(n-1)$, $g_n=c$. Here $b$ (can be positive or negative) is the strength of autoregulation, and $c$ satisfies $c>0$ and $c-b>0$. Multiply Eq. 1 by $n$ and $n(n-1)$ and take summation, then we can calculate that $\text{VMR}=1+b/(c-b)$. Therefore, $\text{VMR}>1$ means positive autoregulation, $b>0$; $\text{VMR}<1$ means negative autoregulation, $b<0$; $\text{VMR}=1$ means no autoregulation, $b=0$.
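The moment calculation behind the formula for the linear model can be spelled out as follows. This is a worked sketch that uses only Eq. 1 at stationarity with $f_n=k+b(n-1)$ and $g_n=c$; the intermediate steps are our reconstruction of the summation described above.

```latex
% Synthesis rate out of state n is f_{n+1} = k + bn; total degradation rate at state n is cn.
% Multiply Eq. 1 by n, sum over n, and set the time derivative to zero:
0 = k + b\,\mathbb{E}(X) - c\,\mathbb{E}(X)
\quad\Longrightarrow\quad
\mathbb{E}(X) = \frac{k}{c-b}.
% Multiply Eq. 1 by n(n-1), sum over n, and set the time derivative to zero:
0 = 2k\,\mathbb{E}(X) + 2b\,\mathbb{E}(X^2) - 2c\,\mathbb{E}[X(X-1)].
% Substitute E(X^2) = E[X(X-1)] + E(X) and k = (c-b) E(X):
\mathbb{E}[X(X-1)] = \frac{(k+b)\,\mathbb{E}(X)}{c-b}
= [\mathbb{E}(X)]^2 + \frac{b}{c-b}\,\mathbb{E}(X).
% Hence
\sigma^2(X) = \mathbb{E}[X(X-1)] + \mathbb{E}(X) - [\mathbb{E}(X)]^2
= \left(1 + \frac{b}{c-b}\right)\mathbb{E}(X),
\qquad
\text{VMR} = 1 + \frac{b}{c-b}.
```

The condition $c-b>0$ is what keeps the mean $k/(c-b)$ finite and positive.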

Lemma 1.

Consider the Markov chain model for one gene with general transition coefficients $f_n,g_n$, described by Eq. 1. Calculate $\text{VMR}(X)$ at stationarity. (1) Assume $h_{n+1}\geq h_n$ for all $n$. We have $\text{VMR}(X)\geq 1$; moreover, $\text{VMR}(X)=1$ if and only if $h_{n+1}=h_n$ for all $n$. (2) Assume $h_{n+1}\leq h_n$ for all $n$. We have $\text{VMR}(X)\leq 1$; moreover, $\text{VMR}(X)=1$ if and only if $h_{n+1}=h_n$ for all $n$.

From Lemma 1, we can directly obtain the following proposition.

Proposition 1.

In the setting of Lemma 1, (1) If $\text{VMR}(X)>1$, then there exist values of $n$ for which $h_{n+1}>h_n$; thus this gene has positive autoregulation. (2) If $\text{VMR}(X)<1$, then there exist values of $n$ for which $h_{n+1}<h_n$; thus this gene has negative autoregulation. (3) If $\text{VMR}(X)=1$, then either (A) $h_{n+1}=h_n$ for all $n$, meaning that this gene has no autoregulation; or (B) $h_{n+1}<h_n$ for some $n$ and $h_{n'+1}>h_{n'}$ for some $n'$, meaning that this gene has both positive and negative autoregulation (at different expression levels).
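Lemma 1 and Proposition 1 can be checked numerically on relative growth rates that are monotone but not linear. The sketch below is only an illustration: the saturating shapes of $h_n$, the cutoff `n_max`, and the function name `vmr_from_h` are hypothetical choices, and the state space is truncated where the stationary weights are already negligible.

```python
def vmr_from_h(h, n_max=200):
    """Return the stationary VMR for relative growth rate h(n), using P_n = P_{n-1} * h(n) / n."""
    weights = [1.0]  # unnormalized P_0
    for n in range(1, n_max + 1):
        weights.append(weights[-1] * h(n) / n)
    total = sum(weights)
    mean = sum(n * w for n, w in enumerate(weights)) / total
    second_moment = sum(n * n * w for n, w in enumerate(weights)) / total
    return (second_moment - mean * mean) / mean


# h_n increasing in n and bounded by 8: positive autoregulation only.
vmr_increasing = vmr_from_h(lambda n: 2.0 + 6.0 * n / (n + 5.0))
# h_n decreasing in n: negative autoregulation only.
vmr_decreasing = vmr_from_h(lambda n: 8.0 / (1.0 + 0.2 * n))
# h_n constant: no autoregulation.
vmr_constant = vmr_from_h(lambda n: 5.0)
```

In line with Lemma 1, `vmr_increasing` comes out above 1, `vmr_decreasing` below 1, and `vmr_constant` equal to 1 up to rounding.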

Remark 1.

Proposition 1 requires that the gene expression is autonomous. In reality, many genes are non-autonomous, and transcriptional/translational bursting can make the VMR larger than $100$ [22].

Remark 2.

Results similar to Proposition 1 have been proven in a non-autonomous model of gene expression [50].

Proof of Lemma 1.

Define $\lambda=-\log P_0$, so that $P_0=\exp(-\lambda)$. Define $d_n=\prod_{i=1}^{n}h_i>0$ and stipulate that $d_0=1$. We can see that

\[
\frac{d_{n}d_{n+2}}{d_{n+1}^{2}}=\frac{h_{n+2}}{h_{n+1}}.
\]

Also,

\[
P_{n}=P_{n-1}f_{n}/(g_{n}n)=P_{n-1}h_{n}/n=\cdots=P_{0}\Big(\prod_{i=1}^{n}h_{i}\Big)/n!=e^{-\lambda}\frac{d_{n}}{n!}.
\]

Then

\[
\begin{split}
\mathbb{E}(X^{2})-\mathbb{E}(X)&=\sum_{n=1}^{\infty}(n^{2}-n)P_{n}=e^{-\lambda}\sum_{n=1}^{\infty}(n^{2}-n)\frac{d_{n}}{n!}\\
&=e^{-\lambda}\sum_{n=2}^{\infty}\frac{d_{n}}{(n-2)!}=e^{-\lambda}\sum_{n=0}^{\infty}\frac{d_{n+2}}{n!},
\end{split}
\]
\[
[\mathbb{E}(X)]^{2}=\left(\sum_{n=1}^{\infty}nP_{n}\right)^{2}=e^{-2\lambda}\left(\sum_{n=1}^{\infty}n\frac{d_{n}}{n!}\right)^{2}=e^{-2\lambda}\left(\sum_{n=0}^{\infty}\frac{d_{n+1}}{n!}\right)^{2}.
\]

Besides,

\[
1=\sum_{n=0}^{\infty}P_{n}=e^{-\lambda}\sum_{n=0}^{\infty}\frac{d_{n}}{n!}.
\]

Now we have

\[
\mathbb{E}(X^{2})-\mathbb{E}(X)-[\mathbb{E}(X)]^{2}=e^{-2\lambda}\left(\sum_{n=0}^{\infty}\frac{d_{n}}{n!}\right)\left(\sum_{n=0}^{\infty}\frac{d_{n+2}}{n!}\right)-e^{-2\lambda}\left(\sum_{n=0}^{\infty}\frac{d_{n+1}}{n!}\right)^{2}.
\]

(1) Assume $h_{n+1}\geq h_n$ for all $n$. Then

\[
\begin{split}
&\mathbb{E}(X^{2})-\mathbb{E}(X)-[\mathbb{E}(X)]^{2}\\
\geq\ &e^{-2\lambda}\left(\sum_{n=0}^{\infty}\frac{\sqrt{d_{n}d_{n+2}}}{n!}\right)^{2}-e^{-2\lambda}\left(\sum_{n=0}^{\infty}\frac{d_{n+1}}{n!}\right)^{2}\geq 0.
\end{split}
\tag{2}
\]

Here the first inequality is from the Cauchy inequality, and the second inequality is from $d_n d_{n+2}\geq d_{n+1}^2$ for all $n$. Then $\text{VMR}(X)=\{\mathbb{E}(X^2)-[\mathbb{E}(X)]^2\}/\mathbb{E}(X)\geq 1$. Equality holds if and only if $d_n/d_{n+2}=d_{n+1}/d_{n+3}$ for all $n$ (the first inequality of Eq. 2) and $d_n d_{n+2}=d_{n+1}^2$ for all $n$ (the second inequality of Eq. 2). The equality condition is equivalent to $h_{n+1}=h_n$ for all $n$.
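The two inequalities used in part (1) can be written out explicitly. In the sketch below, the sequences $a_n$ and $b_n$ are auxiliary labels introduced here for illustration; they do not appear elsewhere in the paper.

```latex
% Cauchy inequality with a_n = \sqrt{d_n/n!} and b_n = \sqrt{d_{n+2}/n!}:
\left(\sum_{n=0}^{\infty}\frac{d_n}{n!}\right)\left(\sum_{n=0}^{\infty}\frac{d_{n+2}}{n!}\right)
=\left(\sum_{n=0}^{\infty}a_n^2\right)\left(\sum_{n=0}^{\infty}b_n^2\right)
\geq\left(\sum_{n=0}^{\infty}a_n b_n\right)^2
=\left(\sum_{n=0}^{\infty}\frac{\sqrt{d_n d_{n+2}}}{n!}\right)^2 .
% Monotonicity: h_{n+2} \geq h_{n+1} gives d_n d_{n+2}/d_{n+1}^2 = h_{n+2}/h_{n+1} \geq 1, so
\sqrt{d_n d_{n+2}}\geq d_{n+1}\quad\text{for all } n .
% Equality in the Cauchy inequality requires a_n/b_n to be independent of n,
% that is, d_n/d_{n+2} = d_{n+1}/d_{n+3} for all n.
```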

(2) Assume $h_{n+1}\leq h_n$ for all $n$. Then $d_{n+2}/d_{n+1}\leq d_{n+1}/d_n$, and $d_n\leq h_1^n$ for all $n$. Define

\[
H(t)=\sum_{n=0}^{\infty}\frac{d_{n}}{n!}t^{n}.
\]

Since $0<d_n\leq h_1^n$, this series converges for all $t\in\mathbb{C}$, so that $H(t)$ is a well-defined analytic function on $\mathbb{C}$, and

\[
H^{\prime}(t)=\sum_{n=0}^{\infty}\frac{d_{n+1}}{n!}t^{n},\ \text{ and }\ H^{\prime\prime}(t)=\sum_{n=0}^{\infty}\frac{d_{n+2}}{n!}t^{n}.
\]

In the following, we only consider $H(t),H'(t),H''(t)$ as real functions for $t\in\mathbb{R}$.

To prove $\text{VMR}(X)\leq 1$, we just need to prove $\mathbb{E}(X^2)-\mathbb{E}(X)-[\mathbb{E}(X)]^2=e^{-2\lambda}\{H(1)H''(1)-[H'(1)]^2\}\leq 0$. We shall in fact prove the stronger statement that $H''(t)H(t)\leq[H'(t)]^2$ for all $t\in\mathfrak{I}$, where $\mathfrak{I}=(a,b)$ is a fixed interval in $\mathbb{R}$ with $0<a<1$ and $1<b<\infty$. Thus $t=1$ is an interior point of $\mathfrak{I}$. Since $H(t),H'(t),H''(t)$ have positive lower bounds on $\mathfrak{I}$, the following statements are equivalent: (i) $H''(t)H(t)\leq[H'(t)]^2$ for all $t\in\mathfrak{I}$; (ii) $\{\log[H'(t)/H(t)]\}'\leq 0$ for all $t\in\mathfrak{I}$; (iii) $\log[H'(t)/H(t)]$ is non-increasing on $\mathfrak{I}$; (iv) $H'(t)/H(t)$ is non-increasing on $\mathfrak{I}$. To prove (i), we just need to prove (iv).

Consider any $t_1,t_2\in\mathfrak{I}$ with $t_1\leq t_2$ and any $p,q\in\mathbb{N}$ with $p\geq q$. Since $d_{p+1}/d_p\leq d_{q+1}/d_q$, and $t_1^{p-q}\leq t_2^{p-q}$, we have

\[
d_{p}d_{q}t_{1}^{q}t_{2}^{q}\left(\frac{d_{p+1}}{d_{p}}-\frac{d_{q+1}}{d_{q}}\right)\left(t_{1}^{p-q}-t_{2}^{p-q}\right)\geq 0,
\]

which means

\[
d_{p+1}d_{q}t_{1}^{p}t_{2}^{q}+d_{q+1}d_{p}t_{1}^{q}t_{2}^{p}\geq d_{p+1}d_{q}t_{2}^{p}t_{1}^{q}+d_{q+1}d_{p}t_{2}^{q}t_{1}^{p}.
\]

Divide both sides by $p!\,q!$ and sum over all $p,q\in\mathbb{N}$ with $p>q$; adding the diagonal terms $p=q$, which are identical on the two sides, we obtain

\[
\begin{split}
H^{\prime}(t_{1})H(t_{2})&=\left(\sum_{n=0}^{\infty}\frac{d_{n+1}}{n!}t_{1}^{n}\right)\left(\sum_{n=0}^{\infty}\frac{d_{n}}{n!}t_{2}^{n}\right)\\
&\geq\left(\sum_{n=0}^{\infty}\frac{d_{n+1}}{n!}t_{2}^{n}\right)\left(\sum_{n=0}^{\infty}\frac{d_{n}}{n!}t_{1}^{n}\right)=H^{\prime}(t_{2})H(t_{1}).
\end{split}
\]

Thus $H'(t_1)/H(t_1)\geq H'(t_2)/H(t_2)$ for all $t_1,t_2\in\mathfrak{I}$ with $t_1\leq t_2$. This means $H''(t)H(t)\leq[H'(t)]^2$ for all $t\in\mathfrak{I}$, and $\text{VMR}(X)\leq 1$.

About the condition for the equality to hold, assume $h_{n'+1}<h_{n'}$ for a given $n'$. Consider the term with $p=n'$ and $q=n'-1$ in the sum above, including its weight $1/(p!\,q!)$. Since $t_1,t_2>a>0$ on $\mathfrak{I}$, we have

\[
\frac{d_{n'}d_{n'-1}t_{1}^{n'-1}t_{2}^{n'-1}}{n'!\,(n'-1)!}\left(\frac{d_{n'+1}}{d_{n'}}-\frac{d_{n'}}{d_{n'-1}}\right)(t_{1}-t_{2})\geq C(t_{2}-t_{1})
\]

for all $t_1,t_2\in\mathfrak{I}$ with $t_1\leq t_2$ and a positive constant $C$ that does not depend on $t_1,t_2$. Therefore, since all other terms of the sum are non-negative,

\[
\begin{split}
&[H^{\prime}(t_{1})/H(t_{1})-H^{\prime}(t_{2})/H(t_{2})]\cdot[H(t_{1})H(t_{2})]\\
=\ &\left(\sum_{n=0}^{\infty}\frac{d_{n+1}}{n!}t_{1}^{n}\right)\left(\sum_{n=0}^{\infty}\frac{d_{n}}{n!}t_{2}^{n}\right)-\left(\sum_{n=0}^{\infty}\frac{d_{n+1}}{n!}t_{2}^{n}\right)\left(\sum_{n=0}^{\infty}\frac{d_{n}}{n!}t_{1}^{n}\right)\\
\geq\ &\frac{d_{n'}d_{n'-1}t_{1}^{n'-1}t_{2}^{n'-1}}{n'!\,(n'-1)!}\left(\frac{d_{n'+1}}{d_{n'}}-\frac{d_{n'}}{d_{n'-1}}\right)(t_{1}-t_{2})\\
\geq\ &C(t_{2}-t_{1}).
\end{split}
\]

Since $H(t)$ has a finite positive upper bound $A$ and a positive lower bound $B$ on $\mathfrak{I}$, we have

\[
H^{\prime}(t_{1})/H(t_{1})-H^{\prime}(t_{2})/H(t_{2})\geq C(t_{2}-t_{1})/A^{2},
\]

meaning that

\[
\forall t\in\mathfrak{I},\quad[H^{\prime}(t)/H(t)]^{\prime}=\{H(t)H^{\prime\prime}(t)-[H^{\prime}(t)]^{2}\}/[H(t)]^{2}\leq-C/A^{2},
\]

and thus

\[
\forall t\in\mathfrak{I},\quad H(t)H^{\prime\prime}(t)-[H^{\prime}(t)]^{2}\leq-CB^{2}/A^{2}<0.
\]

Therefore, $\mathbb{E}(X^2)-\mathbb{E}(X)-[\mathbb{E}(X)]^2=e^{-2\lambda}\{H(1)H''(1)-[H'(1)]^2\}<0$, and $\text{VMR}(X)<1$.

We have proved in (1) that if $h_{n+1}=h_n$ for all $n$, then $\text{VMR}(X)=1$. Thus when $h_{n+1}\leq h_n$ for all $n$, $\text{VMR}(X)=1$ if and only if $h_{n+1}=h_n$ for all $n$. ∎

In sum, for the Markov chain model of one gene (by assuming the expression to be autonomous), when we have the stationary distribution from single-cell non-interventional one-time gene expression data, we can calculate the VMR of $X$. $\text{VMR}(X)>1$ means the existence of positive autoregulation, and $\text{VMR}(X)<1$ means the existence of negative autoregulation. $\text{VMR}(X)=1$ means either (1) no autoregulation exists; or (2) both positive autoregulation and negative autoregulation exist (at different expression levels).

5 Scenario of multiple entangled genes

5.1 Setup

We consider $m$ genes $V_1,\ldots,V_m$ for a single cell. Denote their expression levels by random variables $X_1,\ldots,X_m$. The change of $X_i$ can depend on $X_j$ (mutual regulation) and $X_i$ itself (autoregulation). Since these genes regulate each other, and their expression levels are not fixed, we cannot consider them separately. If the expression of gene $V_k$ is non-autonomous, we also need to add its interior factors (gene activation state, etc.) into $X_1,\ldots,X_m$.

We can use a continuous-time Markov chain on $(\mathbb{Z}^{*})^{m}$ to describe the dynamics. Each state of this Markov chain, $(X_1=n_1,\ldots,X_i=n_i,\ldots,X_m=n_m)$, can be abbreviated as $\boldsymbol{n}=(n_1,\ldots,n_i,\ldots,n_m)$. For gene $V_i$, the transition rate of $n_i-1\to n_i$ is $f_i(\boldsymbol{n})$, and the transition rate of $n_i\to n_i-1$ is $g_i(\boldsymbol{n})n_i$. The master equation of this process is

\[
\begin{split}
\mathrm{d}\mathbb{P}(\boldsymbol{n})/\mathrm{d}t=&\sum_{i}\mathbb{P}(n_{1},\ldots,n_{i}+1,\ldots,n_{m})\,g_{i}(n_{1},\ldots,n_{i}+1,\ldots,n_{m})(n_{i}+1)\\
&+\sum_{i}\mathbb{P}(n_{1},\ldots,n_{i}-1,\ldots,n_{m})\,f_{i}(\boldsymbol{n})\\
&-\mathbb{P}(\boldsymbol{n})\sum_{i}[f_{i}(n_{1},\ldots,n_{i}+1,\ldots,n_{m})+g_{i}(\boldsymbol{n})n_{i}].
\end{split}
\tag{3}
\]

Define $\boldsymbol{n}_{\bar{i}}=(n_{1},\ldots,n_{i-1},n_{i+1},\ldots,n_{m})$. Define $h_{i}(\boldsymbol{n})=f_{i}(\boldsymbol{n})/g_{i}(\boldsymbol{n})$ to be the relative growth rate of gene $V_{i}$. Autoregulation means that for some fixed $\boldsymbol{n}_{\bar{i}}$, $h_{i}(\boldsymbol{n})$ is (locally) increasing/decreasing with $n_{i}$, thus $f_{i}(\boldsymbol{n})$ increases/decreases and/or $g_{i}(\boldsymbol{n})$ decreases/increases with $n_{i}$. For the non-autonomous scenario, another possibility for autoregulation is that $V_{i}$ can affect its interior factors.
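The dynamics described by Eq. 3 can be sampled with a standard Gillespie (stochastic simulation) scheme. The sketch below is only an illustration under stated assumptions: the function name `simulate`, the time-averaging of moments, and the example rates are hypothetical and are not taken from the paper. Rates follow the convention above: synthesis of gene $V_i$ uses $f_i$ evaluated at the destination state, and degradation uses $g_i(\boldsymbol{n})n_i$.

```python
import random


def simulate(f, g, n0, t_end, seed=0):
    """Gillespie simulation of the chain in Eq. 3.

    f[i] and g[i] are callables on a state tuple n. Returns a list of
    time-averaged (mean, VMR) pairs, one per gene.
    """
    rng = random.Random(seed)
    n, m, t = list(n0), len(n0), 0.0
    s1, s2 = [0.0] * m, [0.0] * m  # time integrals of n_i and n_i^2
    while t < t_end:
        rates = []
        for i in range(m):
            up = tuple(n[:i] + [n[i] + 1] + n[i + 1:])
            rates.append(f[i](up))               # n_i -> n_i + 1
            rates.append(g[i](tuple(n)) * n[i])  # n_i -> n_i - 1
        total = sum(rates)
        # Holding time; an absorbing state (total == 0) lasts until t_end.
        dt = t_end - t if total == 0 else min(rng.expovariate(total), t_end - t)
        for i in range(m):
            s1[i] += n[i] * dt
            s2[i] += n[i] ** 2 * dt
        t += dt
        if total == 0 or t >= t_end:
            break
        # Pick one reaction with probability proportional to its rate.
        u, acc = rng.random() * total, 0.0
        for idx, r in enumerate(rates):
            acc += r
            if u < acc:
                break
        i, kind = divmod(idx, 2)
        n[i] += 1 if kind == 0 else -1
    means = [a / t_end for a in s1]
    return [(mu, (b / t_end - mu ** 2) / mu) for mu, b in zip(means, s2)]


# Hypothetical example: gene 1 is constitutive and activates gene 2.
f = [lambda n: 10.0, lambda n: 2.0 * n[0]]
g = [lambda n: 1.0, lambda n: 1.0]
print(simulate(f, g, n0=(10, 20), t_end=200.0))
```

Time averages are used here instead of independent single-cell snapshots, which is a simplification; both estimate stationary moments when the chain is ergodic.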

5.2 Theoretical results

With expression data for multiple genes, there are various methods to infer the regulatory relationships between different genes, so that the GRN can be reconstructed [10]. In the GRN, if there is a directed path from gene $V_{i}$ to gene $V_{j}$, meaning that $V_{i}$ can directly or indirectly regulate $V_{j}$, then $V_{i}$ is an ancestor of $V_{j}$, and $V_{j}$ is a descendant of $V_{i}$.

Fix a gene $V_{k}$ in a GRN. We first consider the simple case that $V_{k}$ is not contained in any directed cycle (feedback loop), which means no gene is both an ancestor and a descendant of $V_{k}$, such as PIP2 in Fig. 1. This means $V_{k}$ itself is a strongly connected component of the GRN. This condition is automatically satisfied if the GRN has no directed cycle. If the expression of $V_{k}$ is non-autonomous, we need to add the interior factors of $V_{k}$ into $V_{1},\ldots,V_{m}$, and it is acceptable that $V_{k}$ regulates its interior factors. In this case, we can prove that if $V_{k}$ does not regulate itself, meaning that $h_{k}(\boldsymbol{n})$ is a constant for fixed $\boldsymbol{n}_{\bar{k}}$ and different $n_{k}$, and $X_{k}$ does not affect its interior factors (if non-autonomous), then $\text{VMR}(X_{k})\geq 1$. The reason is that $\text{VMR}<1$ requires either a feedback loop or autoregulation. We need to assume that the per molecule degradation rate $g_{k}(\cdot)$ for $V_{k}$ is not affected by $V_{1},\ldots,V_{m}$, which is not always true in reality [1]. With this result, when $\text{VMR}<1$, there might be autoregulation.

Proposition 2.

Consider the Markov chain model for multiple genes, described by Eq. 3. Assume the GRN has no directed cycle, or at least there is no directed cycle that contains gene $V_{k}$. Assume $g_{k}(\cdot)$ is a constant for all $\boldsymbol{n}$. If $V_{k}$ has no autoregulation, meaning that $h_{k}(\cdot)$ and $f_{k}(\cdot)$ do not depend on $n_{k}$, and $V_{k}$ does not regulate its interior factors, then $V_{k}$ has $\text{VMR}\geq 1$. Therefore, if $V_{k}$ has $\text{VMR}<1$, then $V_{k}$ has autoregulation.

This Proposition is in an unpublished work by Paulsson et al., who study a similar problem [51, 52]. It also appears in a preprint by Mahajan et al. [53], but the proof is based on a linear approximation. We propose a rigorous proof independently.

Proof.

Denote the expression level of $V_{k}$ by $W$. Assume the ancestors of $V_{k}$ are $V_{1},\ldots,V_{l}$. For simplicity, denote the expression levels of $V_{1},\ldots,V_{l}$ by a (high-dimensional) random variable $Y$. Assume $V_{k}$ has no autoregulation. Since $V_{k}$ does not regulate $V_{1},\ldots,V_{l}$, $W$ does not affect $Y$. Denote the transition rate from $Y=i$ to $Y=j$ by $q_{ij}\geq 0$. Stipulate that $q_{ii}=-\sum_{j\neq i}q_{ij}$. When $Y=i$, the transition rate from $W=n$ to $W=n+1$ is $F_{i}$ (does not depend on $n$), and the transition rate from $W=n$ to $W=n-1$ is $Gn$, where $G$ is the constant per molecule degradation rate.

The master equation of this process is

\[
\begin{split}
&\mathrm{d}\mathbb{P}[W(t)=n,Y(t)=i]/\mathrm{d}t\\
=&\mathbb{P}[W(t)=n-1,Y(t)=i]F_{i}+\mathbb{P}[W(t)=n+1,Y(t)=i]G(n+1)\\
&+\sum_{j\neq i}\mathbb{P}[W(t)=n,Y(t)=j]q_{ji}-\mathbb{P}[W(t)=n,Y(t)=i]\Big(F_{i}+Gn+\sum_{j\neq i}q_{ij}\Big).
\end{split}
\]

Assume there is a unique stationary probability distribution $P_{n,i}=\lim_{t\to\infty}\mathbb{P}[W(t)=n,Y(t)=i]$. Then we have

\[
P_{n,i}\Big[F_{i}+Gn+\sum_{j}q_{ij}\Big]=P_{n-1,i}F_{i}+P_{n+1,i}G(n+1)+\sum_{j}P_{n,j}q_{ji}. \tag{4}
\]

Define $P_{i}=\sum_{n}P_{n,i}$. Sum over $n$ for Eq. 4 to obtain

\[
P_{i}\sum_{j}q_{ij}=\sum_{j}P_{j}q_{ji}, \tag{5}
\]

meaning that $P_{i}$ is the stationary probability distribution of $Y$.

Define $W_{i}$ to be $W$ conditioned on $Y=i$ at stationarity. Then $\mathbb{P}(W_{i}=n)=\mathbb{P}(W=n\mid Y=i)=P_{n,i}/P_{i}$, and $\mathbb{E}(W_{i})=\sum_{n}nP_{n,i}/P_{i}$. Multiply Eq. 4 by $n$ and sum over $n$ to obtain

\[
\Big(G+\sum_{j}q_{ij}\Big)P_{i}\mathbb{E}(W_{i})=F_{i}P_{i}+\sum_{j}q_{ji}P_{j}\mathbb{E}(W_{j}). \tag{6}
\]

Sum over $i$ for Eq. 6 to obtain

\[
G\sum_{i}P_{i}\mathbb{E}(W_{i})=\sum_{i}F_{i}P_{i}. \tag{7}
\]

Multiply Eq. 4 by $n^{2}$ and sum over $n$ to obtain

\[
\Big(2G+\sum_{j}q_{ij}\Big)P_{i}\mathbb{E}(W_{i}^{2})=F_{i}P_{i}+(2F_{i}+G)P_{i}\mathbb{E}(W_{i})+\sum_{j}q_{ji}P_{j}\mathbb{E}(W_{j}^{2}). \tag{8}
\]

Sum over $i$ for Eq. 8 to obtain

\[
2G\sum_{i}P_{i}\mathbb{E}(W_{i}^{2})=\sum_{i}F_{i}P_{i}+2\sum_{i}F_{i}P_{i}\mathbb{E}(W_{i})+G\sum_{i}P_{i}\mathbb{E}(W_{i}). \tag{9}
\]

Multiply Eq. 6 by $\mathbb{E}(W_{i})$ and sum over $i$ to obtain

\[
\begin{split}
&G\sum_{i}P_{i}[\mathbb{E}(W_{i})]^{2}+\sum_{i,j}P_{i}q_{ij}[\mathbb{E}(W_{i})]^{2}\\
=&\sum_{i}F_{i}P_{i}\mathbb{E}(W_{i})+\sum_{i,j}P_{j}q_{ji}\mathbb{E}(W_{i})\mathbb{E}(W_{j}).
\end{split}
\tag{10}
\]

Then we have

\[
\begin{split}
&\sum_{i}F_{i}P_{i}\mathbb{E}(W_{i})-G\sum_{i}P_{i}[\mathbb{E}(W_{i})]^{2}\\
=&\sum_{i,j}P_{i}q_{ij}[\mathbb{E}(W_{i})]^{2}-\sum_{i,j}P_{j}q_{ji}\mathbb{E}(W_{i})\mathbb{E}(W_{j})\\
=&\frac{1}{2}\Big\{\sum_{i,j}P_{i}q_{ij}[\mathbb{E}(W_{i})]^{2}+\sum_{i}[\mathbb{E}(W_{i})]^{2}\sum_{j}P_{i}q_{ij}-2\sum_{i,j}P_{i}q_{ij}\mathbb{E}(W_{i})\mathbb{E}(W_{j})\Big\}\\
=&\frac{1}{2}\Big\{\sum_{i,j}P_{i}q_{ij}[\mathbb{E}(W_{i})]^{2}+\sum_{i}[\mathbb{E}(W_{i})]^{2}\sum_{j}P_{j}q_{ji}-2\sum_{i,j}P_{i}q_{ij}\mathbb{E}(W_{i})\mathbb{E}(W_{j})\Big\}\\
=&\frac{1}{2}\Big\{\sum_{i,j}P_{i}q_{ij}[\mathbb{E}(W_{i})]^{2}+\sum_{i,j}P_{i}q_{ij}[\mathbb{E}(W_{j})]^{2}-2\sum_{i,j}P_{i}q_{ij}\mathbb{E}(W_{i})\mathbb{E}(W_{j})\Big\}\\
=&\frac{1}{2}\sum_{i,j}P_{i}q_{ij}[\mathbb{E}(W_{i})-\mathbb{E}(W_{j})]^{2}\geq 0.
\end{split}
\tag{11}
\]

Here the first equality is from Eq. 10, the third equality is from Eq. 5, and other equalities are equivalent transformations.

Now we have

\[
\begin{split}
&\mathbb{E}(W^{2})-\mathbb{E}(W)-[\mathbb{E}(W)]^{2}\\
=&\sum_{i}P_{i}\mathbb{E}(W_{i}^{2})-\sum_{i}P_{i}\mathbb{E}(W_{i})-\Big[\sum_{i}P_{i}\mathbb{E}(W_{i})\Big]^{2}\\
=&\frac{1}{G}\sum_{i}F_{i}P_{i}\mathbb{E}(W_{i})+\sum_{i}P_{i}\mathbb{E}(W_{i})-\sum_{i}P_{i}\mathbb{E}(W_{i})-\Big[\sum_{i}P_{i}\mathbb{E}(W_{i})\Big]^{2}\\
\geq&\sum_{i}P_{i}[\mathbb{E}(W_{i})]^{2}-\Big[\sum_{i}P_{i}\mathbb{E}(W_{i})\Big]^{2}\\
=&\Big(\sum_{i}P_{i}\Big)\sum_{i}P_{i}[\mathbb{E}(W_{i})]^{2}-\Big[\sum_{i}P_{i}\mathbb{E}(W_{i})\Big]^{2}\geq 0,
\end{split}
\]

where the first equality is by definition, the second equality is from Eqs. 7, 9, the first inequality is from Eq. 11, the third equality is from $\sum_{i}P_{i}=1$, and the second inequality is the Cauchy inequality.

Since $\mathbb{E}(W^{2})-[\mathbb{E}(W)]^{2}\geq\mathbb{E}(W)$, $\text{VMR}(W)=\{\mathbb{E}(W^{2})-[\mathbb{E}(W)]^{2}\}/\mathbb{E}(W)\geq 1$. ∎
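As a worked instance of the moment equations in the proof, consider the simplest upstream process: $Y$ switches between two states. The sketch below is an illustration under assumptions, not part of the original proof. It solves Eq. 6 for $m_{i}=P_{i}\mathbb{E}(W_{i})$ as a $2\times 2$ linear system and then uses Eqs. 7 and 9 to get the VMR. The function name and the parameter names `k01`, `k10` (switching rates $q_{01}$, $q_{10}$), `F0`, `F1`, `G` are hypothetical.

```python
def vmr_two_state_upstream(k01, k10, F0, F1, G):
    """VMR of W when Y switches 0 <-> 1 and W has no autoregulation.

    Synthesis rate of W is F0 or F1 depending on Y; each molecule of W
    degrades at the constant rate G.
    """
    # Stationary distribution of Y (Eq. 5).
    P0 = k10 / (k01 + k10)
    P1 = k01 / (k01 + k10)
    # Eq. 6 for m_i = P_i * E(W_i), with sums taken over j != i:
    #   (G + k01) m0 - k10 m1 = F0 P0
    #  -k01 m0 + (G + k10) m1 = F1 P1
    a, b, c, d = G + k01, -k10, -k01, G + k10
    r0, r1 = F0 * P0, F1 * P1
    det = a * d - b * c  # equals G^2 + G (k01 + k10) > 0
    m0 = (r0 * d - b * r1) / det
    m1 = (a * r1 - c * r0) / det
    mean = m0 + m1  # E(W); consistent with Eq. 7
    # Eq. 9: 2G E(W^2) = sum_i F_i P_i + 2 sum_i F_i m_i + G E(W).
    second = (F0 * P0 + F1 * P1 + 2 * (F0 * m0 + F1 * m1) + G * mean) / (2 * G)
    return (second - mean ** 2) / mean


print(vmr_two_state_upstream(k01=1.0, k10=2.0, F0=0.0, F1=10.0, G=1.0))
print(vmr_two_state_upstream(k01=1.0, k10=2.0, F0=5.0, F1=5.0, G=1.0))
```

For $F_{0}=0$, the same equations reduce to $\text{VMR}=1+F_{1}q_{10}/[(q_{01}+q_{10})(q_{01}+q_{10}+G)]$, which is at least $1$ as Proposition 2 states; with $F_{0}=F_{1}$ the upstream state is irrelevant and the VMR is exactly $1$.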

We hypothesize that the requirement for $g_{k}(\cdot)$ in Proposition 2 can be dropped.

Conjecture 1.

Assume $V_{k}$ is not contained in a directed cycle in the GRN, and $V_{k}$ does not regulate its interior factors. If $V_{k}$ has no autoregulation, meaning that $h_{k}(\cdot)$ does not depend on $n_{k}$ (but might depend on $\boldsymbol{n}_{\bar{k}}$), then $V_{k}$ has $\text{VMR}\geq 1$.

If the GRN has directed cycles, there is a conjecture by Paulsson et al. [51, 52], which has been numerically verified but not proved yet.

Conjecture 2.

Assume for each $V_{i}$, $g_{i}(\cdot)$ does not depend on $\boldsymbol{n}$, and $f_{i}(\cdot)$ does not depend on $n_{i}$ (no autoregulation). Then for at least one gene $V_{j}$, we have $\text{VMR}\geq 1$.

Notice that Conjecture 2 does not hold if $g_{i}$ depends on $\boldsymbol{n}_{\bar{i}}$. One counterexample is $m=2$, $f_{1}(n_{2})=g_{1}(n_{2})=1$ for $n_{2}=2$, $f_{1}(n_{2})=g_{1}(n_{2})=0$ for $n_{2}\neq 2$, and $f_{2}(n_{1})=g_{2}(n_{1})=1$ for $n_{1}=2$, $f_{2}(n_{1})=g_{2}(n_{1})=0$ for $n_{1}\neq 2$. Then $\text{VMR}=2e/(4e-1)\approx 0.55$ for both genes.
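The value $2e/(4e-1)$ can be checked numerically. The sketch below rests on an assumption that is not spelled out in the text: the chain is restricted to the states reachable from $(2,2)$, which form a cross $\{(k,2)\}\cup\{(2,k)\}$ (states off the cross are frozen under these rates). On each arm the moving gene has birth rate $1$ and death rate $k$, so detailed balance gives stationary weights proportional to $1/k!$. The function name and the truncation `n_max` are hypothetical.

```python
import math


def counterexample_vmr(n_max=60):
    """VMR of gene 1 in the two-gene counterexample, on the cross through (2, 2)."""
    weights = {}
    for k in range(n_max + 1):
        w = 1.0 / math.factorial(k)
        weights[(k, 2)] = w  # gene 1 fluctuates while n2 == 2
        weights[(2, k)] = w  # gene 2 fluctuates while n1 == 2
    total = sum(weights.values())
    mean = sum(w * n1 for (n1, _), w in weights.items()) / total
    second = sum(w * n1 * n1 for (n1, _), w in weights.items()) / total
    return (second - mean ** 2) / mean


print(counterexample_vmr(), 2 * math.e / (4 * math.e - 1))
```

By symmetry the same value holds for gene 2, so both genes have VMR below $1$ although neither regulates itself.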

Assume Conjecture 2 is correct. For $m$ genes, if we find that the VMR for each gene is less than $1$, then we can infer that autoregulation exists, although we do not know which gene has autoregulation.

6 Applying theoretical results to experimental data

 
  1. Input

     Single-cell non-interventional one-time expression data for genes $V_{1},\ldots,V_{m}$

     The structure of the GRN that contains $V_{1},\ldots,V_{m}$

  2. Calculate the VMR of each $V_{k}$

  3. If $V_{k}$ is not in a directed cycle (like PIP2 in Fig. 1) and $\text{VMR}<1$

         Output: $V_{k}$ has autoregulation

         // Assume the degradation of $V_{k}$ is not regulated by $V_{1},\ldots,V_{m}$

     Else

         If $V_{k}$ has no ancestor in the GRN (like PIP3 in Fig. 1) and $\text{VMR}>1$

             Output: $V_{k}$ has autoregulation

             // Assume the expression of $V_{k}$ is autonomous

         Else

             Output: We cannot determine whether $V_{k}$ has autoregulation

         End of if

     End of if

Algorithm 1 Detailed workflow of inferring autoregulation with gene expression data.
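The workflow of Algorithm 1 can be sketched in a few lines of code. This is an illustrative sketch, not the authors' implementation. It assumes the GRN is given as a dictionary mapping each regulator to its targets (without autoregulation edges), and that expression data are lists of counts per gene; the names `reachable`, `infer_autoregulation`, and the output strings are hypothetical. It compares the sample VMR with $1$ directly and does not include a significance test.

```python
from statistics import mean, variance


def reachable(grn, start):
    """Genes reachable from `start` along directed edges (self-loops ignored)."""
    seen = set()
    stack = [t for t in grn.get(start, ()) if t != start]
    while stack:
        gene = stack.pop()
        if gene not in seen:
            seen.add(gene)
            stack.extend(t for t in grn.get(gene, ()) if t != gene)
    return seen


def infer_autoregulation(samples, grn):
    """samples: gene -> list of counts; grn: regulator -> list of targets."""
    genes = set(samples) | set(grn) | {t for ts in grn.values() for t in ts}
    result = {}
    for gene, counts in samples.items():
        vmr = variance(counts) / mean(counts)  # sample variance / sample mean
        in_cycle = gene in reachable(grn, gene)
        has_ancestor = any(
            gene in reachable(grn, other) for other in genes if other != gene
        )
        if not in_cycle and vmr < 1:
            result[gene] = "autoregulation (Proposition 2)"
        elif not has_ancestor and vmr > 1:
            result[gene] = "autoregulation (Proposition 1)"
        else:
            result[gene] = "undetermined"
    return result


# Hypothetical toy network: A -> B, A -> D, and a feedback loop B <-> C.
grn = {"A": ["B", "D"], "B": ["C"], "C": ["B"]}
samples = {
    "A": [0, 10, 0, 10, 5],
    "B": [5, 5, 6, 5, 4],
    "C": [1, 9, 2, 8, 5],
    "D": [5, 5, 6, 5, 4],
}
print(infer_autoregulation(samples, grn))
```

In this toy input, A has no ancestor and a large VMR, D lies outside every directed cycle and has a small VMR, and B and C sit in a feedback loop, so they remain undetermined.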

We summarize our theoretical results in Algorithm 1. Proposition 1 applies to a gene that has no ancestor in the GRN. However, it requires that the gene has autonomous expression, which is difficult to validate and often does not hold in reality. Thus the inference result by Proposition 1 for $\text{VMR}>1$ (positive autoregulation) is not very reliable. When $\text{VMR}<1$ and Proposition 1 could apply, we should instead apply Proposition 2 to determine the existence of autoregulation, since Proposition 2 does not require the expression to be autonomous and is therefore much more reliable. Proposition 2 applies when the gene is not in a feedback loop and has $\text{VMR}<1$. Notice that our results cannot establish that a gene has no autoregulation.

For a given gene without autoregulation, its expression level follows a Poisson distribution, and the VMR is $1$. If we have $n$ samples of its expression level, then the sample VMR (sample variance divided by sample mean) asymptotically follows a Gamma distribution $\Gamma[(n-1)/2,2/(n-1)]$, and we can determine the confidence interval of the sample VMR [54]. If the sample VMR is outside this confidence interval, then we know that the VMR is significantly different from $1$, and Propositions 1, 2 might apply.
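The confidence interval can be approximated without any external library. The sketch below is an assumption-laden illustration: $\Gamma[(n-1)/2,2/(n-1)]$ is a chi-square distribution with $n-1$ degrees of freedom divided by $n-1$, and its quantiles are computed here with the Wilson-Hilferty cube-root approximation rather than an exact inverse distribution function. The function name `vmr_confidence_interval` is hypothetical.

```python
from statistics import NormalDist


def vmr_confidence_interval(n, level=0.95):
    """Approximate interval for the sample VMR of n Poisson samples.

    Uses VMR ~ chi2(n-1) / (n-1) and the Wilson-Hilferty approximation
    chi2_p(k) / k ~ (1 - 2/(9k) + z_p * sqrt(2/(9k)))**3.
    """
    k = n - 1
    z = NormalDist().inv_cdf(0.5 + level / 2)
    c = 2.0 / (9.0 * k)
    lower = (1 - c - z * c ** 0.5) ** 3
    upper = (1 - c + z * c ** 0.5) ** 3
    return lower, upper


print(vmr_confidence_interval(75))
```

For $n=75$ this gives an interval close to the $[0.7041,1.3470]$ reported in Appendix A; for very small $n$ the approximation is less accurate and an exact Gamma quantile would be preferable.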

We apply our method to four groups of single-cell non-interventional one-time gene expression data from experiments, where the corresponding GRNs are known. Notice that we need to convert indirect measurements into protein/mRNA counts. See Table 1 for our inference results and theoretical/experimental evidence that partially validates our results. See Appendix A for details. There are 186 genes in these four data sets, and we can only determine that 12 genes have autoregulation (7 genes determined by Proposition 1, and 5 genes determined by Proposition 2). Not every VMR is less than $1$, so Conjecture 2 does not apply. For the other 174 genes, Proposition 1 and Proposition 2 do not apply, and we do not know whether they have autoregulation.

In some cases, we have experimental evidence that some genes have autoregulation, so that we can partially validate our inference results. Nevertheless, as discussed in the Introduction, there is no gold standard to evaluate our inference results.

In the data set by Guo et al. [55], Sanchez-Castillo et al. [39] inferred that 17 of 39 genes have autoregulation, and 22 genes do not have autoregulation. We infer that 5 genes have autoregulation, and 34 genes cannot be determined. Here 3 genes are inferred to have autoregulation by both methods. Consider a random classifier that randomly picks 5 genes and claims they have autoregulation. Using Sanchez-Castillo et al. as the standard, this random classifier has probability $62.55\%$ of being worse than our result, and $10.17\%$ of being better than our result. Thus our inference result is better than a random classifier, but the advantage is not significant.
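The two probabilities follow from a hypergeometric distribution: the number of correct picks when 5 of 39 genes are chosen at random and 17 of them are positive according to the reference. A minimal sketch of this arithmetic is given below; the function name and argument names are hypothetical.

```python
from math import comb


def random_classifier_comparison(total=39, positives=17, picks=5, our_hits=3):
    """Probability that a random pick has fewer / more hits than `our_hits`."""
    denom = comb(total, picks)
    pmf = [
        comb(positives, k) * comb(total - positives, picks - k) / denom
        for k in range(picks + 1)
    ]
    worse = sum(pmf[:our_hits])        # fewer than our_hits correct picks
    better = sum(pmf[our_hits + 1:])   # more than our_hits correct picks
    return worse, better


worse, better = random_classifier_comparison()
print(f"worse: {worse:.4f}, better: {better:.4f}")
```

The remaining probability mass corresponds to a random classifier that ties with our result by also hitting exactly 3 genes.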

Source | Proposition 1 | Proposition 2 | Theory | Experiment
--- | --- | --- | --- | ---
Guo et al. [55] | FN1, HNF4A | TCFAP2C, BMP4, CREB312 | BMP4 [39], HNF4A [39], TCFAP2C [39] | BMP4 [56], HNF4A [57], TCFAP2C [58]
Psaila et al. [59] | BIM, CCND1, ECT2, PFKP | - | - | ECT2 [60]
Moignard et al. [61] | - | EIF2B1, HOXD8 | - | -
Sachs et al. [62] | PIP3 | - | - | -
Table 1: The autoregulation inference results by our method on four data sets. Source column is the paper that contains this data set. Proposition 1 column is the genes that can be only inferred by Proposition 1 to have autoregulation. Proposition 2 column is the genes that can be inferred by Proposition 2 to have autoregulation. Theory column is the genes inferred by both our method and other theoretical works to have autoregulation. Experiment column is the genes inferred by both our method and other experimental works to have autoregulation. Bold font means the inferred gene with autoregulation is validated by other results. Details can be found in Appendix A.

7 Conclusions

For a single gene that is not affected by other genes, or a group of genes that form a connected GRN, we develop theoretical results to determine the existence of autoregulation. These results generalize known relationships between autoregulation and VMR by dropping restrictions on parameters. Our results only depend on VMR, which is easy to compute and more robust than other complicated statistics. We also apply our method to experimental data and detect some genes that might have autoregulation. We prove two propositions for Markov chains, which might be of independent theoretical value.

We introduce two conjectures that have been numerically verified but not yet proved. They are of theoretical interest and worth further consideration. The Markov chain models in this paper can be studied by lifting them into a higher-dimensional space [63], or by treating them as a random dynamical system [64] or as a branching process [65]. With the expression profiles of different genes, we can construct a similarity graph [66]. If we know the existence of autoregulation for some genes, we can use the similarity graph to infer autoregulation for other genes.

Our method requires independent and identically distributed samples from the exact stationary distribution of a fully observed Markov chain, plus a known GRN. Proposition 1 requires that the expression is autonomous. Proposition 2 requires that the gene is not contained in a directed cycle of the GRN, and that its degradation is not regulated. If our inference fails, then some requirements are not met: (1) cells might affect each other, making the samples dependent; (2) cells are heterogeneous; (3) the measurements have extra errors; (4) the cells are not at stationarity; (5) there exist unobserved variables that affect gene expression; (6) the GRN is inferred by a theoretical method, which can be disturbed by the existence of autoregulation; (7) the expression is non-autonomous; (8) the GRN has unknown directed cycles; (9) the degradation rate is regulated by other genes. Such situations, especially the unobserved variables, are unavoidable. Therefore, current data might not satisfy these requirements, and our inference results should be interpreted as informative findings, not ground truths. In fact, other theoretical works that determine gene autoregulation, or general gene regulation, also need similar assumptions and might fail. Nevertheless, with the development of experimental technologies, there will be more data of higher quality that fit the requirements of our method. Thus we believe that our method will be more applicable in the future.

Cells keep growing and dividing, and the gene expression fluctuates along the cell cycle. Discussions on such non-stationary situations can be found in other papers [20, 67, 68, 69].

Regarding cell heterogeneity, we prove a result in Appendix B: if several cell types have $\text{VMR}\geq 1$, then a mixed population of such cell types still has $\text{VMR}\geq 1$. Therefore, cell heterogeneity does not invalidate Proposition 2, since $\text{VMR}<1$ for the mixture of several cell types means $\text{VMR}<1$ for at least one cell type.

Appendix A Details of applications on experimental data

In experiments, the expression levels of genes are not directly measured as mRNA or protein counts. Rather, they are measured as cycle threshold (Ct) values or fluorescence intensity values. Such indirect measurements need to be converted. Related details can be found in other papers [50].

Guo et al. [55] measured the expression (mRNA) levels of 48 genes for mouse embryo cells at different developmental stages. We consider three groups (16-cell stage, 32-cell stage, 64-cell stage) that have more than 50 samples. Sanchez-Castillo et al. [39] used such data to infer the GRN structure, including autoregulation, but the GRN only contains 39 genes. Thus we ignore the other 9 genes. In the inferred GRN, genes BMP4, CREB312, and TCFAP2C are not contained in directed cycles. In the 16-cell stage group with 75 samples, if there is no autoregulation, then the $95\%$ confidence interval of VMR is $[0.7041,1.3470]$. BMP4 ($\text{VMR}=0.2139$), CREB312 ($\text{VMR}=0.1971$), and TCFAP2C ($\text{VMR}=0.3468$) have significantly small VMR, and we can apply Proposition 2 to infer that BMP4, CREB312, and TCFAP2C might have autoregulation. In the other two groups, these genes do not have $\text{VMR}<1$, and the results are relatively weak. Besides, in the inferred GRN, genes FN1 and HNF4A have no ancestors. For the 16-cell stage with 75 samples, the VMR of FN1 and HNF4A are $3.4522$ and $1.3599$, outside of the $95\%$ confidence interval $[0.7041,1.3470]$; for the 32-cell stage with 113 samples, the VMR of FN1 and HNF4A are $93.1070$ and $46.7688$, outside of the $95\%$ confidence interval $[0.7554,1.2784]$; for the 64-cell stage with 159 samples, the VMR of FN1 and HNF4A are $117.3059$ and $93.9589$, outside of the $95\%$ confidence interval $[0.7917,1.2322]$. Thus we can apply Proposition 1 to infer that FN1 and HNF4A ($\text{VMR}>1$ for all three cell groups) might have positive autoregulation. Nevertheless, it is more likely that the expressions of FN1 and HNF4A are non-autonomous, and there is no autoregulation. Sanchez-Castillo et al. [39] inferred that BMP4, HNF4A, TCFAP2C have autoregulation. Besides, there is experimental evidence that BMP4 [56], HNF4A [57], TCFAP2C [58] have autoregulation. Therefore, our inference results are partially validated.

Psaila et al. [59] measured the expression (mRNA) levels of 90 genes for human megakaryocyte-erythroid progenitor cells. Chan et al. [70] inferred the GRN structure (autoregulation not included). In the inferred GRN, genes BIM, CCND1, ECT2, PFKP have no ancestors. BIM has 214 effective samples, and VMR is $187.7$, outside of the $95\%$ confidence interval $[0.8191,1.1987]$. CCND1 has 68 effective samples, and VMR is $111.3$, outside of the $95\%$ confidence interval $[0.6905,1.3660]$. ECT2 has 56 effective samples, and VMR is $8.2$, outside of the $95\%$ confidence interval $[0.6618,1.4069]$. PFKP has 134 effective samples, and VMR is $82.1$, outside of the $95\%$ confidence interval $[0.7742,1.2543]$. Thus we can apply Proposition 1 to infer that BIM, CCND1, ECT2, PFKP might have positive autoregulation. Nevertheless, it is more likely that the expressions of these four genes are non-autonomous, and there is no autoregulation. There is experimental evidence that ECT2 has autoregulation [60], which partially validates our inference results. No other gene fits the requirement of Proposition 2.

Moignard et al. [61] measured the expression (mRNA) levels of 46 genes for mouse embryo cells. Chan et al. [70] inferred the GRN structure (autoregulation not included). Gene EIF2B1 has 3934 effective samples, and VMR is $0.66$, outside of the $95\%$ confidence interval $[0.9563,1.0447]$. Gene HOXD8 has 12 effective samples, and VMR is $0.24$, outside of the $95\%$ confidence interval $[0.3469,1.9927]$. We can apply Proposition 2 to infer that EIF2B1 and HOXD8 might have autoregulation. No other gene fits the requirement of Proposition 1.

Sachs et al. [62] measured the expression (protein) levels of 11 genes in the RAF signaling pathway for human T cells. The measurements were repeated for 14 groups of cells under different interventions. Werhli et al. [3] inferred the GRN structure (autoregulation not included). In the inferred GRN (Fig. 1), the PIP3 gene has no ancestor, and its VMRs in all 14 groups are larger than $5$, while the $95\%$ confidence intervals for all 14 groups are contained in $[0.8,1.2]$. Therefore, we can apply Proposition 1 and infer that PIP3 might have positive autoregulation. Nevertheless, it is more likely that the expression of PIP3 is non-autonomous, and there is no autoregulation. No other gene fits the requirement of Proposition 2.

Appendix B Heterogeneity and VMR

Proposition 3.

Consider $n$ independent random variables $X_{1},\ldots,X_{n}$ and probabilities $p_{1},\ldots,p_{n}$ with $\sum p_{i}=1$. Consider an independent random variable $R$ that equals $i$ with probability $p_{i}$. Construct a random variable $Z$ that equals $X_{i}$ when $R=i$. If each $X_{i}$ has $\text{VMR}\geq 1$, then $Z$ has $\text{VMR}\geq 1$.

Proof.

We only need to prove this for $n=2$. The case for general $n$ can be proved by mixing two variables iteratively.

Consider random variables $X,Y$ and construct $Z$ that equals $X$ or $Y$ with probability $p$ or $1-p$. Since $\text{VMR}(X)\geq 1$, $\text{VMR}(Y)\geq 1$, we have $\mathbb{E}(X^{2})-[\mathbb{E}(X)]^{2}\geq\mathbb{E}(X)$ and $\mathbb{E}(Y^{2})-[\mathbb{E}(Y)]^{2}\geq\mathbb{E}(Y)$. Then

\[
\begin{split}
&\text{VMR}(Z)\\
=&\frac{p\mathbb{E}(X^{2})+(1-p)\mathbb{E}(Y^{2})}{p\mathbb{E}(X)+(1-p)\mathbb{E}(Y)}\\
&+\frac{-p^{2}[\mathbb{E}(X)]^{2}-2p(1-p)\mathbb{E}(X)\mathbb{E}(Y)-(1-p)^{2}[\mathbb{E}(Y)]^{2}}{p\mathbb{E}(X)+(1-p)\mathbb{E}(Y)}\\
=&\frac{p\mathbb{E}(X^{2})-p[\mathbb{E}(X)]^{2}+(1-p)\mathbb{E}(Y^{2})-(1-p)[\mathbb{E}(Y)]^{2}}{p\mathbb{E}(X)+(1-p)\mathbb{E}(Y)}\\
&+\frac{p(1-p)[\mathbb{E}(X)]^{2}-2p(1-p)\mathbb{E}(X)\mathbb{E}(Y)+p(1-p)[\mathbb{E}(Y)]^{2}}{p\mathbb{E}(X)+(1-p)\mathbb{E}(Y)}\\
\geq&\frac{p\mathbb{E}(X)+(1-p)\mathbb{E}(Y)}{p\mathbb{E}(X)+(1-p)\mathbb{E}(Y)}+\frac{p(1-p)[\mathbb{E}(X)-\mathbb{E}(Y)]^{2}}{p\mathbb{E}(X)+(1-p)\mathbb{E}(Y)}\\
\geq&1.
\end{split}
\]

∎
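The decomposition used in this proof, $\text{Var}(Z)=p\,\text{Var}(X)+(1-p)\,\text{Var}(Y)+p(1-p)[\mathbb{E}(X)-\mathbb{E}(Y)]^{2}$, is easy to evaluate for concrete numbers. The following minimal sketch uses a hypothetical function name and takes the component means and variances as inputs.

```python
def mixture_vmr(p, mean_x, var_x, mean_y, var_y):
    """VMR of Z, which equals X with probability p and Y with probability 1 - p."""
    mean_z = p * mean_x + (1 - p) * mean_y
    # Within-component variance plus the variance of the component means.
    var_z = p * var_x + (1 - p) * var_y + p * (1 - p) * (mean_x - mean_y) ** 2
    return var_z / mean_z


# Equal mixture of Poisson(2) and Poisson(8): each component has VMR 1.
print(mixture_vmr(0.5, 2.0, 2.0, 8.0, 8.0))
```

In this example the mixture has mean $5$ and variance $14$, so its VMR is $2.8$: mixing cell types with different means can only raise the VMR above the component values, never push it below $1$.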

Acknowledgments

This research was partially supported by NIH grant R01HL146552 (Y.W.). Y.W. would like to thank Jiawei Yan for fruitful discussions, and Xiangting Li, Zikun Wang, Mingtao Xia for helpful comments. The authors would like to thank some anonymous reviewers for their wise suggestions.

Declaration of interests

The authors declare that there is no conflict of interest.

References

  • [1] Karamyshev AL, Karamysheva ZN. Lost in translation: ribosome-associated mRNA and protein quality controls. Front Genet. 2018;9:431.
  • [2] Cunningham TJ, Duester G. Mechanisms of retinoic acid signalling and its roles in organ and limb development. Nat Rev Mol Cell Biol. 2015;16(2):110–123.
  • [3] Werhli AV, Grzegorczyk M, Husmeier D. Comparative evaluation of reverse engineering gene regulatory networks with relevance networks, graphical Gaussian models and Bayesian networks. Bioinformatics. 2006;22(20):2523–2531.
  • [4] Carrier TA, Keasling JD. Investigating autocatalytic gene expression systems through mechanistic modeling. J Theor Biol. 1999;201(1):25–36.
  • [5] Shen-Orr SS, Milo R, Mangan S, Alon U. Network motifs in the transcriptional regulation network of Escherichia coli. Nat Genet. 2002;31(1):64–68.
  • [6] Baumdick M, Gelléri M, Uttamapinant C, Beránek V, Chin JW, Bastiaens PI. A conformational sensor based on genetic code expansion reveals an autocatalytic component in EGFR activation. Nat Commun. 2018;9(1):1–13.
  • [7] Fang J, Ianni A, Smolka C, Vakhrusheva O, Nolte H, Krüger M, et al. Sirt7 promotes adipogenesis in the mouse by inhibiting autocatalytic activation of Sirt1. Proc Natl Acad Sci USA. 2017;114(40):E8352–E8361.
  • [8] Sheth R, Bastida MF, Kmita M, Ros M. “Self-regulation,” a new facet of Hox genes’ function. Dev Dyn. 2014;243(1):182–191.
  • [9] Wang Y, Kropp J, Morozova N. Biological notion of positional information/value in morphogenesis theory. Int J Dev Biol. 2020;64(10-11-12):453–463.
  • [10] Wang Y, Wang Z. Inference on the structure of gene regulatory networks. J Theor Biol. 2022;539:111055.
  • [11] Luecken MD, Theis FJ. Current best practices in single-cell RNA-seq analysis: a tutorial. Mol Syst Biol. 2019;15(6):e8746.
  • [12] Thattai M, Van Oudenaarden A. Intrinsic noise in gene regulatory networks. Proc Natl Acad Sci USA. 2001;98(15):8614–8619.
  • [13] Swain PS. Efficient attenuation of stochasticity in gene expression through post-transcriptional control. J Mol Biol. 2004;344(4):965–976.
  • [14] Hornos JE, Schultz D, Innocentini GC, Wang J, Walczak AM, Onuchic JN, et al. Self-regulating gene: an exact solution. Phys Rev E. 2005;72(5):051907.
  • [15] Munsky B, Neuert G, Van Oudenaarden A. Using gene expression noise to understand gene regulation. Science. 2012;336(6078):183–187.
  • [16] Grönlund A, Lötstedt P, Elf J. Transcription factor binding kinetics constrain noise suppression via negative feedback. Nat Commun. 2013;4(1):1–5.
  • [17] Dessalles R, Fromion V, Robert P. A stochastic analysis of autoregulation of gene expression. J Math Biol. 2017;75(5):1253–1283.
  • [18] Czuppon P, Pfaffelhuber P. Limits of noise for autoregulated gene expression. J Math Biol. 2018;77(4):1153–1191.
  • [19] Hui Z, Jiang Z, Qiao D, Bo Z, Qiyuan K, Shaohua B, et al. Increased expression of LCN2 formed a positive feedback loop with activation of the ERK pathway in human kidney cells during kidney stone formation. Sci Rep. 2020;10(1):1–12.
  • [20] Cao Z, Grima R. Analytical distributions for detailed models of stochastic gene expression in eukaryotic cells. Proc Natl Acad Sci USA. 2020;117(9):4682–4692.
  • [21] Firman T, Wedekind S, McMorrow T, Ghosh K. Maximum caliber can characterize genetic switches with multiple hidden species. J Phys Chem B. 2018;122(21):5666–5677.
  • [22] Paulsson J. Models of stochastic gene expression. Phys Life Rev. 2005;2(2):157–175.
  • [23] Ramos AF, Hornos JEM, Reinitz J. Gene regulation and noise reduction by coupling of stochastic processes. Phys Rev E. 2015;91(2):020701.
  • [24] Giovanini G, Sabino AU, Barros LR, Ramos AF. A comparative analysis of noise properties of stochastic binary models for a self-repressing and for an externally regulating gene. Math Biosci Eng. 2020;17(5):5477–5503.
  • [25] Stewart AJ, Seymour RM, Pomiankowski A, Reuter M. Under-dominance constrains the evolution of negative autoregulation in diploids. PLOS Comput Biol. 2013;9(3):e1002992.
  • [26] Jia C. Simplification of Markov chains with infinite state space and the mathematical theory of random gene expression bursts. Phys Rev E. 2017;96(3):032402.
  • [27] Sharma A, Adlakha N. Markov chain model to study the gene expression. Adv Appl Sci Res. 2014;5(2):387–393.
  • [28] Shmulevich I, Gluhovsky I, Hashimoto RF, Dougherty ER, Zhang W. Steady-state analysis of genetic regulatory networks modelled by probabilistic Boolean networks. Comp Funct Genomics. 2003;4(6):601–608.
  • [29] Chen X, Jia C. Limit theorems for generalized density-dependent Markov chains and bursty stochastic gene regulatory networks. J Math Biol. 2020;80(4):959–994.
  • [30] Shen H, Huo S, Yan H, Park JH, Sreeram V. Distributed dissipative state estimation for Markov jump genetic regulatory networks subject to round-robin scheduling. IEEE Trans Neural Netw Learn Syst. 2019;31(3):762–771.
  • [31] Ko Y, Kim J, Rodriguez-Zas SL. Markov chain Monte Carlo simulation of a Bayesian mixture model for gene network inference. Genes Genom. 2019;41(5):547–555.
  • [32] Bokes P, King JR, Wood AT, Loose M. Multiscale stochastic modelling of gene expression. J Math Biol. 2012;65(3):493–520.
  • [33] Jia C, Zhang MQ, Qian H. Emergent Lévy behavior in single-cell stochastic gene expression. Phys Rev E. 2017;96(4):040402.
  • [34] Jia C. Kinetic foundation of the zero-inflated negative binomial model for single-cell RNA sequencing data. SIAM J Appl Math. 2020;80(3):1336–1355.
  • [35] Shahrezaei V, Swain PS. Analytical distributions for stochastic gene expression. Proc Natl Acad Sci USA. 2008;105(45):17256–17261.
  • [36] Dobrinić P, Szczurek AT, Klose RJ. PRC1 drives Polycomb-mediated gene repression by controlling transcription initiation and burst frequency. Nat Struct Mol Biol. 2021;28(10):811–824.
  • [37] Cagnetta R, Wong HHW, Frese CK, Mallucci GR, Krijgsveld J, Holt CE. Noncanonical modulation of the eIF2 pathway controls an increase in local translation during neural wiring. Mol Cell. 2019;73(3):474–489.
  • [38] Cao Z, Grima R. Linear mapping approximation of gene regulatory networks with stochastic dynamics. Nat Commun. 2018;9(1):1–15.
  • [39] Sanchez-Castillo M, Blanco D, Tienda-Luna IM, Carrion M, Huang Y. A Bayesian framework for the inference of gene regulatory networks from time and pseudo-time series data. Bioinformatics. 2018;34(6):964–970.
  • [40] Xing B, Van Der Laan MJ. A causal inference approach for constructing transcriptional regulatory networks. Bioinformatics. 2005;21(21):4007–4013.
  • [41] Feigelman J, Ganscha S, Hastreiter S, Schwarzfischer M, Filipczyk A, Schroeder T, et al. Analysis of cell lineage trees by exact Bayesian inference identifies negative autoregulation of Nanog in mouse embryonic stem cells. Cell Syst. 2016;3(5):480–490.
  • [42] Veerman F, Popović N, Marr C. Parameter inference with analytical propagators for stochastic models of autoregulated gene expression. Int J Nonlinear Sci Numer Simul. 2021.
  • [43] Jia C, Qian H, Chen M, Zhang MQ. Relaxation rates of gene expression kinetics reveal the feedback signs of autoregulatory gene networks. J Chem Phys. 2018;148(9):095102.
  • [44] Zhou T, Zhang J. Analytical results for a multistate gene model. SIAM J Appl Math. 2012;72(3):789–818.
  • [45] Jia C, Grima R. Small protein number effects in stochastic models of autoregulated bursty gene expression. J Chem Phys. 2020;152(8):084115.
  • [46] Jia C, Grima R. Dynamical phase diagram of an auto-regulating gene in fast switching conditions. J Chem Phys. 2020;152(17):174110.
  • [47] Wang Y, Wang L. Causal inference in degenerate systems: An impossibility result. In: International Conference on Artificial Intelligence and Statistics. PMLR; 2020. p. 3383–3392.
  • [48] Wang Y, Mistry BA, Chou T. Discrete stochastic models of SELEX: Aptamer capture probabilities and protocol optimization. J Chem Phys. 2022;156(24):244103.
  • [49] Wang Y. Some Problems in Stochastic Dynamics and Statistical Analysis of Single-Cell Biology of Cancer [Ph.D. thesis]. University of Washington; 2018.
  • [50] Jia C, Xie P, Chen M, Zhang MQ. Stochastic fluctuations can reveal the feedback signs of gene regulatory networks at the single-molecule level. Sci Rep. 2017;7(1):1–9.
  • [51] Hilfinger A, Norman TM, Vinnicombe G, Paulsson J. Constraints on fluctuations in sparsely characterized biological systems. Phys Rev Lett. 2016;116(5):058101.
  • [52] Yan J, Hilfinger A, Vinnicombe G, Paulsson J. Kinetic uncertainty relations for the control of stochastic reaction networks. Phys Rev Lett. 2019;123(10):108101.
  • [53] Mahajan T, Singh A, Dar R. Topological constraints on noise propagation in gene regulatory networks. bioRxiv. 2021.
  • [54] Eden UT, Kramer MA. Drawing inferences from Fano factor calculations. J Neurosci Methods. 2010;190(1):149–152.
  • [55] Guo G, Huss M, Tong GQ, Wang C, Sun LL, Clarke ND, et al. Resolution of cell fate decisions revealed by single-cell gene expression analysis from zygote to blastocyst. Dev Cell. 2010;18(4):675–685.
  • [56] Pramono A, Zahabi A, Morishima T, Lan D, Welte K, Skokowa J. Thrombopoietin induces hematopoiesis from mouse ES cells via HIF-1α–dependent activation of a BMP4 autoregulatory loop. Ann N Y Acad Sci. 2016;1375(1):38–51.
  • [57] Chahar S, Gandhi V, Yu S, Desai K, Cowper-Sal·lari R, Kim Y, et al. Chromatin profiling reveals regulatory network shifts and a protective role for hepatocyte nuclear factor 4α during colitis. Mol Cell Biol. 2014;34(17):3291–3304.
  • [58] Kidder BL, Palmer S. Examination of transcriptional networks reveals an important role for TCFAP2C, SMARCA4, and EOMES in trophoblast stem cell maintenance. Genome Res. 2010;20(4):458–472.
  • [59] Psaila B, Barkas N, Iskander D, Roy A, Anderson S, Ashley N, et al. Single-cell profiling of human megakaryocyte-erythroid progenitors identifies distinct megakaryocyte and erythroid differentiation pathways. Genome Biol. 2016;17(1):1–19.
  • [60] Hara T, Abe M, Inoue H, Yu L, Veenstra TD, Kang Y, et al. Cytokinesis regulator ECT2 changes its conformation through phosphorylation at Thr-341 in G2/M phase. Oncogene. 2006;25(4):566–578.
  • [61] Moignard V, Woodhouse S, Haghverdi L, Lilly AJ, Tanaka Y, Wilkinson AC, et al. Decoding the regulatory network of early blood development from single-cell gene expression measurements. Nat Biotechnol. 2015;33(3):269–276.
  • [62] Sachs K, Perez O, Pe’er D, Lauffenburger DA, Nolan GP. Causal protein-signaling networks derived from multiparameter single-cell data. Science. 2005;308(5721):523–529.
  • [63] Wang Y, Qian H. Mathematical representation of Clausius' and Kelvin's statements of the second law and irreversibility. J Stat Phys. 2020;179(3):808–837.
  • [64] Ye FXF, Wang Y, Qian H. Stochastic dynamics: Markov chains and random transformations. Discrete Contin Dyn Syst - B. 2016;21(7):2337.
  • [65] Jiang DQ, Wang Y, Zhou D. Phenotypic equilibrium as probabilistic convergence in multi-phenotype cell population dynamics. PLOS ONE. 2017;12(2):e0170916.
  • [66] Wang Y, Zhang B, Kropp J, Morozova N. Inference on tissue transplantation experiments. J Theor Biol. 2021;520:110645.
  • [67] Swain PS, Elowitz MB, Siggia ED. Intrinsic and extrinsic contributions to stochasticity in gene expression. Proc Natl Acad Sci USA. 2002;99(20):12795–12800.
  • [68] Skinner SO, Xu H, Nagarkar-Jaiswal S, Freire PR, Zwaka TP, Golding I. Single-cell analysis of transcription kinetics across the cell cycle. Elife. 2016;5:e12175.
  • [69] Jia C, Grima R. Frequency domain analysis of fluctuations of mRNA and protein copy numbers within a cell lineage: theory and experimental validation. Phys Rev X. 2021;11(2):021032.
  • [70] Chan TE, Stumpf MP, Babtie AC. Gene regulatory network inference from single-cell data using multivariate information measures. Cell Syst. 2017;5(3):251–267.