The Cumulative Distribution Function Based Method for Random Drift Model

Chenghua Duan ¹¹1Department of Mathematics, Shanghai University, Shanghai, China. Email: [email protected]. ²²2Newtouch Center for Mathematics of Shanghai University, Shanghai, China, 20044. Chun Liu³³3Department of Applied Mathematics, Illinois Institute of Technology, Chicago, IL 60616, USA. Email: [email protected]. Xingye Yue ⁴⁴4School of Mathematical Sciences, Soochow University, Suzhou 215006, Jiangsu, China. Email: [email protected].

Abstract

In this paper, we propose a numerical method to uniformly handle the random genetic drift model for pure drift with or without natural selection and mutation. For pure drift and natural selection case, the Dirac $\delta$ singularity will develop at two boundary ends and the mass lumped at the two ends stands for the fixation probability. For the one-way mutation case, known as Muller’s ratchet, the accumulation of deleterious mutations leads to the loss of the fittest gene, the Dirac $\delta$ singularity will spike only at one boundary end, which stands for the fixation of the deleterious gene and loss of the fittest one. For two-way mutation case, the singularity with negative power law may emerge near boundary points. We first rewrite the original model on the probability density function (PDF) to one with respect to the cumulative distribution function (CDF). Dirac $\delta$ singularity of the PDF becomes the discontinuity of the CDF. Then we establish a upwind scheme, which keeps the total probability, is positivity preserving and unconditionally stable. For pure drift, the scheme also keeps the conservation of expectation. It can catch the discontinuous jump of the CDF, then predicts accurately the fixation probability for pure drift with or without natural selection and one-way mutation. For two-way mutation case, it can catch the power law of the singularity. The numerical results show the effectiveness of the scheme.

Keywords: Random Genetic Drift Model, Cumulative Distribution Function, Pure Drift, Natural Selection, Mutation, Muller’s ratchet.

1 Introduction

In the population genetics, the random genetic drift model describes that the number of gene variants (alleles) fluctuates randomly over time due to random sampling. The fraction of an allele in the population can be used to measure the intensity of the random genetic drift. In other words, the value of the fraction equals to zero or one, which means the allele disappears or is completely chosen in the system. Population genetics models, aiming at modeling genetic variability, had a natural start with discrete stochastic models at the individual level. The earlier mathematical model of random genetic drift, known as the Wright-Fisher model, was constructed by Ronald Fisher [13] and Sewall Wright [28, 29, 30]. The Wright-Fisher model is regarded as a discrete-time Markov chain under the assumption that the generations do not overlap and that each copy of the gene of the new generation is selected independently and randomly from the whole gene pool of the previous generation. Then Moran [21, 22] and Kimura [16, 17, 19, 20] derived the diffusion limit of random Wright-Fisher model. Chalub et. al spreaded the large population limit of the Moran process, and obtained a continuous model that may highlight the genetic-drift (neutral evolution) or natural selection [20, 4, 26].

We consider a population of constant size $N_{e}$ with a pair of two types A and B. Time $t$ is increased by the time step $\Delta t$ , and the process is repeated. $x$ is the gene $A$ frequency at $t$ generation. Let $P_{N_{e},\Delta t}(t,x)$ denote the probability of finding a fraction $x$ of type $A$ individuals at time $t$ in population of fixed size $N_{e}$ evolving in discrete time, according to the Moran process. Then, in the limit of large population and small time steps, we postulate the existence of a probability density of states, that will depend on the precise way the limits are taken [4]. Namely:

f(t,x)=\lim\limits_{N_{e}\rightarrow\infty,\\ \Delta t\rightarrow 0}\frac{P_{N_{e},\Delta t}(t,x)}{1/N_{e}}.

In this paper, we study the following initial-boundary problem of $f(t,x)$ :

\left\{\begin{aligned} &\partial_{t}f-\partial_{xx}[x(1-x)f]+\partial_{x}[M(x)f]=0,\ x\in\Omega=(0,1),\ t>0,\\ &\mathcal{J}=-\partial_{x}(x(1-x)f)+M(x)f,\ \mathcal{J}\cdot{\bf n}|_{\partial\Omega}=0,\\ &f(0,x)=f_{0}(x),x\in\Omega,\end{aligned}\right.\

(1.1)

where the function $M(x)$ , which is typically a polynomial in $x$ , incorporates the forces of migration, mutation, and selection, acting at time t[35, 20, 16].

Eq.(1.1) can be supplemented by the following conservation laws:

(a)

Mass conservation law:

$\int_{0}^{1}f(0,x)dt=\int_{0}^{1}f(t,x)dt=1.$ (1.2)
(b)

Moment conservation law:

$\frac{d}{dt}\int_{0}^{1}\theta(x)f(t,x)dx=0,$ (1.3)

where $\theta(x)$ is the fixation probability function that satisfies

$x(1-x)\theta^{\prime\prime}+M(x)\theta^{\prime}=0,\ \theta(0)=0,\ \theta(1)=1.$ (1.4)

One notes that (1.3) recovers the conservation of expectation for pure drift ( $M(x)=0$ ) with $\theta(x)=x$ .

Based on different $M(x)$ , the behavior of the solution has three cases:

Case 1. Pure drift and natural selection. For pure drift, we have $M(x)=0$ . When natural selection is involved, we have [5, 3],

M(x)=x(1-x)(\eta x+\beta).

(1.5)

The important theoretically relevant results were shown in [4, 5, 3]. Let $\mathcal{BM}^{+}([0,1])$ denote the space of (positive) Radon measures on $[0,1]$ .

Definition 1.1.

A weak solution to (1.1) is a function in $L^{\infty}([0,\infty),\mathcal{BM}^{+}([0,1]))$ satisfying

\begin{split}-\int_{0}^{\infty}\int_{0}^{1}f(t,x)\partial_{t}\phi(t,x)dxdt&=\int_{0}^{\infty}\int_{0}^{1}f(t,x)x(1-x)\left(\partial_{xx}\phi(t,x)+\frac{M(x)}{x(x-1)}\partial_{x}\phi(t,x)\right)dxdt\\ &+\int_{0}^{1}f_{0}(x)\phi(0,x)dx,\ \mbox{for\ any\ }\phi(t,x)\in C_{c}^{\infty}([0,+\infty)\times[0,1]).\end{split}

The following theorem can be found in [5].

Theorem 1.2.

If $f_{0}\in\mathcal{BM}^{+}([0,1])$ and $M(x)=0$ or given by (1.5), then there exists a unique weak solution $f\in L^{\infty}([0,\infty);\mathcal{BM}^{+}([0,1]))\cap C^{\infty}(\mathbb{R}^{+},C^{\infty}((0,1)))$ to equation (1.1), in a sense of Definition (1.1), such that the conservation laws (1.2)-(1.3) are valid. The solution has a form as

f(t,x)=r(t,x)+a(t)\delta_{0}+b(t)\delta_{1},

where $\delta_{s}$ denotes the singular point measure supported at $s$ , $r(t,x)\in C^{\infty}(\mathbb{R}^{+};C^{\infty}([0,1]))$ is a classical solution to (1.1) without any boundary condition, $a(t)$ and $b(t)$ $\in C([0,\infty))\cap C^{\infty}(\mathbb{R}^{+})$ . Furthermore, $\lim\limits_{t\rightarrow\infty}r(t,x)=0$ uniformly, and $a(t)$ and $b(t)$ are monotonically increasing functions such that

a^{\infty}:=\lim\limits_{t\rightarrow\infty}a(t)=\left(1-\int_{0}^{1}f_{0}(x)\theta(x)dx\right),

and

b^{\infty}:=\lim\limits_{t\rightarrow\infty}b(t)=\int_{0}^{1}f_{0}(x)\theta(x)dx,

where $\theta(x)$ is given by (1.4). Finally, the convergence rate is exponential.

The appearance of the point measure $\delta_{0}$ ( $\delta_{1}$ ) stands for that the fixation at gene $B$ ( $A$ ) happens with a probability $a(t)$ ( $b(t)$ ). The spatial temporal dynamics of the Kimura equation are well understood in the purely diffusive case and in only a relatively small number of population [18, 8, 7].

Case 2. One-way mutation: Muller’s ratchet. Considering the one-way mutation from gene $B$ to gene $A$ ,

M(x)=\gamma(1-x),

(1.6)

where the constant $\gamma>0$ stands for the mutation rate [16, 11, 31, 15].

A well-known model is Muller’s ratchet, i.e, the fittest gene $B$ of individuals is eventually lost from the population and deleterious mutations ( $B\rightarrow A$ ) slowly but irreversibly accumulate through rare stochastic fluctuations [24, 12]. In a finite asexual population, offspring inherit all the deleterious mutations their parents possess. Since these offspring also occasionally acquire new deleterious mutations, populations will tend to accumulate deleterious mutations over time. This effect is known as Muller’s ratchet.

There exists a unique steady solution $f_{\infty}$ ,

f_{\infty}=\delta_{1},\ i.e.,\ \mbox{the\ fittest\ gene\ is\ complitely\ lost with probability $1$}.

(1.7)

Case 3. Two-way mutation.

M(x)=\gamma(1-x)-\mu x,\ \gamma,\mu\in(0,1),

(1.8)

where the constant $\mu>0$ stands for the mutation rate of gene $A$ to gene $B$ and $\gamma>0$ is the rate in opposite direction. In the long term one might expect to exist an equilibrium state due to the two direction mutation. Actually, there exists a unique steady state solution $f_{\infty}(x)$ to (1.1) and (1.8),

f_{\infty}(x)=\frac{C}{x^{1-\gamma}(1-x)^{1-\mu}},\ x\in(0,1),

(1.9)

where $C$ is the constant such that $\int_{0}^{1}f_{\infty}(x)dx=1$ . Only singularity of negative power law develops at the ends, rather than Dirac $\delta$ appears for the cases of pure drift, natural selection and one-way mutation.

For the numerical simulation, the crucial features to solve (1.1) are that the numerical solution can keep the total probability conservation law and accurately capture the concentration phenomena at the discrete level. The total probability fails to keep the total probability by some classical numerical schemes [1, 27, 16]. A complete solution, whose total probability is unity, obtained by finite volume method (FVM) schemes in [35]. Xu et al. [33] compared a serial of finite volume and finite element schemes for the pure diffusion equation. Their critical comparison of the long-time asymptotic performance urges carefulness in choosing a numerical method for this type of problem, otherwise the main properties of the model might be not kept. In recent years, a variational particle method was proposed based on an energetic variational approach, by which a complete numerical solution can be obtained and the positivity of the solution can be kept [10]. However, some artificial criteria must be introduced to handle the concentrations near the boundary ends, even though it is designed based on the biological background. Recently, an optimal mass transport method based on pseudo-inverse of CDF is used to solve the model [3]. In this method, the feasible solution is strictly monotonous. However, for the cases of pure drift, natural selection and one-way mutation, the Dirac- $\delta$ concentrations must be developed, then the corresponding CDF must be discontinuous at the concentration points. This leads to a fact that its pseudo-inverse can not be strictly monotonous. However, numerical results were presented for the cases of pure drift and selection. So some manual intervention must be introduced in the numerical implements. The pure drift problem with multi-alleles, corresponding to a multidimensional PDE, is investigated in [34] by finite-difference methods, where the authors proposed a numerical scheme with absolute stability and several biologically/physically relevant quantities conserved, such as positivity, total probability, and expectation. So far, although quite a few numerical methods have been established for the pure drift and selection cases, efficient numerical works on the mutation case are not reported yet.

In this paper, we rewrite the system (1.1) to the following new one on the cumulative distribution function (CDF) $F(t,x)=\int_{0}^{x}f(t,s)ds$ as

\left\{\begin{aligned} &\partial_{t}F-\partial_{x}[x(1-x)\partial_{x}F]+M(x)\partial_{x}F=0,\ x\in\Omega=(0,1),\ t>0,\\ &F(t,0^{-})=0,\ F(t,1^{+})=1,\\ &F(0,x)=\int_{0}^{x}f_{0}(s)ds,x\in\Omega.\end{aligned}\right.\

(1.10)

Taking the CDF into account, the Dirac $\delta$ singularity at the boundary points of original PDF $f$ will change to the discontinuity of the CDF $F$ at the boundary points, and the fixation probability (lumped mass) will change to the height of the discontinuous jump. The singularity of negative power law will change to a bounded positive power law.

We will propose a upwind numerical scheme for (1.10) with a key revision near the boundary, which can handle the pure drift with or without selection and mutation, is unconditional stable, and keeps the total probability and positivity. It also keeps the conservation of expectation for pure drift. The numerical results show that the scheme can catch the height of discontinuity at the ends and predict accurately the fixation probability for the cases of pure drift, natural selection and one-way mutation. It also predict accurately the negative power of the power law for two-way mutation case.

The rest of this paper is organized as follows. In Section 2, we construct the numerical scheme for (1.10). Some numerical analysis is presented in Section 3. In Section 4, several numerical examples are presented to validate the theoretical results and to demonstrate the ability to trace the long-time dynamics of random genetic drift. Some discussions will be presented in Section 5 about the relations between the revised scheme and the standard finite difference method.

2 Numerical Scheme

In this section, we introduce the revised finite difference method (rFDM) for (1.10) with the central difference method for diffusion term and the upwind scheme for convection term. Let $h=1/K$ be the spatial step size, and $x_{i}=ih$ , $i=0,\cdots,{K}$ , be the spatial grid points. Let $\tau$ be the temporal step size, and $t_{n}=n\tau$ , $n=1,2,\cdots,N$ , be the temporal grid points.

Define the first order difference as

D_{h}F_{i}=\frac{F_{i}-F_{i-1}}{h},\ \ D_{h}^{up}F_{i}=\left\{\begin{aligned} &D_{h}F_{i},\ M(x_{i})>0,\\ &D_{h}F_{i+1},\ M(x_{i})<0,\end{aligned}\right.\,

(2.1)

and denote the diffusion coefficient by $a(x)=x(1-x)$ . The upwind numerical scheme, referred as rFDM, reads as: Given $F^{n-1}$ , $F^{n}$ solves the following linear algebra system,

\frac{F^{n}_{i}-F^{n-1}_{i}}{\tau}-\frac{a_{i+\frac{1}{2}}D_{h}F^{n}_{i+1}-a_{i-\frac{1}{2}}D_{h}F^{n}_{i}}{h}+M(x_{i})D^{up}_{h}F^{n}_{i}=0,\ \ i=1,\cdots,{K}-1,

(2.2)

with $F^{n}_{0}=0$ and $F^{n}_{K}=1$ , for $n=1,\cdots,N$ , and the key revision on the diffusion cofficient

a_{i-\frac{1}{2}}=a(x_{i-\frac{1}{2}}),\ i=2,\cdots,{K}-1,\ \mbox{\bf but}\ a_{\frac{1}{2}}=a_{{K}-\frac{1}{2}}=0.

(2.3)

Finally, the solution of the original equation (1.1) can be recovered from $F$ by, for $n=1,\cdots,N$ ,

f_{i}^{n}=\left\{\begin{aligned} &\frac{F_{i+1}^{n}-F_{i-1}^{n}}{2h},\ i=2,\cdots,K-2,\\ &\frac{F_{i+1}^{n}-F_{i}^{n}}{h},\ i=0,1,\\ &\frac{F_{i}^{n}-F_{i-1}^{n}}{h},\ i=K-1,K.\end{aligned}\right.\

(2.4)

In the above formula, central difference does not applied at near boundary points $i=1$ or $K-1$ , thanks to the fact that the discontinuous jump may occur at the boundary points.

Remark 2.1.

For problem (1.10) in continuous sense, the diffusion coefficient $a(0)=a(1)=0$ degenerates at the boundary points. This means that the information at the boundary points can never be transferred into the domain $\Omega=(0,1)$ by diffusion. In our revised treatment (2.3) for numerical scheme, we set $a_{\frac{1}{2}}=a_{{K}-\frac{1}{2}}=0$ , where a term of $O(h)$ order is omitted, since the exact value $a(x_{\frac{1}{2}})=a(x_{K-\frac{1}{2}})=\frac{h}{2}(1-\frac{h}{2})$ . With this revised treatment, the boundary value $F_{0}^{n}$ and $F_{K}^{n}$ is not involved in the discrete system (2.2) if $M\equiv 0$ , i.e., the boundary value can never be transferred into the inner points by discrete diffusion. In Section 5, we will discuss the standard scheme without this revision.

3 Analysis Results

In this section, we will focus on the numerical analysis for rFDM (2.2), including the unconditional stability, positivity preserving, and conservation law of the total probability and expectation.

Theorem 3.1.

The upwind scheme rFDM (2.2) is unconditionally stable and positivity preserving.

Proof: Without loss of generality, assume there exists $i^{*}$ such that $M(i)>0$ , $i=1,\cdots,i^{*}$ and $M(i)<0$ , $i=i^{*}+1,\cdots,K-1$ . (2.2) can be written as:

\begin{split}&\frac{F^{n}_{i}-F^{n-1}_{i}}{\tau}-\frac{a_{i+\frac{1}{2}}F^{n}_{i+1}-(a_{i+\frac{1}{2}}+a_{i-\frac{1}{2}})F^{n}_{i}+a_{i-\frac{1}{2}}F^{n}_{i-1}}{h^{2}}+M(x_{i})\frac{F^{n}_{i}-F^{n}_{i-1}}{h}=0,\\ &\ i=1,\cdots i^{*},\\ &\frac{F^{n}_{i}-F^{n-1}_{i}}{\tau}-\frac{a_{i+\frac{1}{2}}F^{n}_{i+1}-(a_{i+\frac{1}{2}}+a_{i-\frac{1}{2}})F^{n}_{i}+a_{i-\frac{1}{2}}F^{n}_{i-1}}{h^{2}}+M(x_{i})\frac{F^{n}_{i+1}-F^{n}_{i}}{h}=0,\\ &\ i=i^{*}+1,\cdots,K-1.\end{split}

(3.1)

with $F^{n}_{0}=0$ , $F^{n}_{K}=1$ at time $t^{n}$ , $n=1,\cdots,N$ .

Let ${\bf B}=(b_{ij})$ be the matrix of the linear system. Then ${\bf B}$ is a tri-diagonal matrix.

For $i=1,\cdots,i^{*}$ ,

b_{ii}=1+\frac{\tau}{h^{2}}(a_{i+\frac{1}{2}}+a_{i-\frac{1}{2}})+\frac{\tau}{h}M(x_{i}),

b_{i,i-1}=-\frac{\tau}{h^{2}}a_{i-\frac{1}{2}}-\frac{\tau}{h}M(x_{i}),

b_{i,i+1}=-\frac{\tau}{h^{2}}a_{i+\frac{1}{2}}.

For $i=i^{*}+1,\cdots,K-1$ ,

b_{ii}=1+\frac{\tau}{h^{2}}(a_{i+\frac{1}{2}}+a_{i-\frac{1}{2}})-\frac{\tau}{h}M(x_{i}),

b_{i,i-1}=-\frac{\tau}{h^{2}}a_{i-\frac{1}{2}},

b_{i,i+1}=-\frac{\tau}{h^{2}}a_{i+\frac{1}{2}}+\frac{\tau}{h}M(x_{i}).

For $i=0$ , $b_{ii}=1,\ b_{i,i+1}=0.$ For $i=K$ , $b_{ii}=1,\ b_{i,i-1}=0.$

Note that

b_{ii}>0,\ i=0,\cdots,K

b_{i,i-1}\leq 0,\ i=1,\cdots,K,

b_{i,i+1}\leq 0,\ i=0,\cdots,K-1,

and

b_{ii}-\Big{|}|b_{i,i-1}|+|b_{i,i+1}|\Big{|}=1>0,\ i=1,\cdots,K-1,

Then ${\bf B}$ is a M-matrix, i.e., any entry of the inverse matrix ${\bf B}^{-1}$ is positive. So rFDM (2.2) is unconditionally stable and positivity preserving.

Theorem 3.2.

The numerical scheme rFDM (2.2) keeps the conservation of total probability. For pure drift case, the conservation of expectation also holds.

Proof: Define $p_{i}^{n}:=F_{i}^{n}-F_{i-1}^{n}$ be the probability for the fraction of gene $A$ belongs to the interval $I_{i}=[x_{i-1},x_{i}]$ . Then the total probability is

P_{total}^{n}=\sum\limits_{i=1}^{K}p^{n}_{i}=F^{n}_{K}-F^{n}_{0}=1,\ \ n=0,1,\cdots,N,

i.e., the discrete system (2.2) keeps the mass conservation naturally.

By the integral by parts, the expectation $\mathcal{E}(t)$ satisfies that

\mathcal{E}(t):=\int_{0}^{1}xf(t,x)dx=\int_{0}^{1}x\partial_{x}F(t,x)dx=1-\int_{0}^{1}F(t,x)dx.

So we define a discrete expectation as

\mathcal{E}^{n}_{h}:=1-\sum\limits_{i=1}^{K-1}F^{n}_{i}h-\frac{h}{2}F^{n}_{0}-\frac{h}{2}F^{n}_{K}.

(3.2)

For pure drift case, $M(x)\equiv 0$ , then we have, by (2.2) and (2.3), that

	$\displaystyle\mathcal{E}^{n}_{h}-\mathcal{E}^{n-1}_{h}$	$\displaystyle=-\sum\limits_{i=1}^{K-1}h(F_{i}^{n}-F_{i}^{n-1})=-\sum\limits_{i=1}^{K-1}\tau\left(a_{i+\frac{1}{2}}D_{h}F^{n}_{i+1}-a_{i-\frac{1}{2}}D_{h}F^{n}_{i}\right)$
		$\displaystyle=-\tau(a_{K-\frac{1}{2}}D_{h}F^{n}_{K}-a_{\frac{1}{2}}D_{h}F^{n}_{1})=0.$

So the discrete system (2.2) keeps the conservation of the discrete expectation for pure drift case. Please note that, without the revised treatment (2.3), the conservation of the discrete expectation is invalid.

4 Numerical results

In this section, we will show the effectiveness of this algorithm by different numerical examples. In Example 1, we verify the local convergence. In Section 4.1, the pure drift $M=0$ and natural selection $M(x)=x(1-x)(\eta x+\beta)$ case are studied in Example 2 and Example 3, respectively. In Section 4.2, we consider the mutation case $M(x)=\gamma(1-x)-\mu x$ , including one-way mutation model, such as the Muller’s ratchet model, in Example 4 and two-way mutation model in Example 5.

Example 1. Local convergence

Although the singularity may develop at boundary points, the solution in interior area is sufficiently smooth. To verify the correctness of the numerical scheme, we check the local convergence.

Let $\Omega^{\prime}\subset\Omega$ be the interior area and

k_{1}:=\inf\limits_{0\leq i\leq K}\{i|x_{i}\in\Omega^{\prime}\},

k_{2}:=\max\limits_{0\leq i\leq K}\{i|x_{i}\in\Omega^{\prime}\}.

Define the error $e$ of $F(t,x)$ in $\mathcal{L}^{2}(\Omega^{\prime})$ and $\mathcal{L}^{\infty}(\Omega^{\prime})$ mode as

\|e\|_{2}:=\left(\sum\limits_{i=k_{1}}^{k_{2}}(F_{i}^{n}-F^{n}_{e,i})^{2}h\right)^{\frac{1}{2}},

(4.1)

and

\|e\|_{\infty}:=\max\limits_{k_{1}\leq i\leq k_{2}}\{|F^{n}_{i}-F^{n}_{e,i}|\},

(4.2)

where $\{F_{i}^{n}\}_{i=0}^{K}$ is the numerical solution of CDF model (1.10), and $\{F_{e,i}^{n}\}_{i=0}^{K}$ is the corresponding exact solution at time $t^{n}$ , $n=1,\cdots,N$ .

In this example, we take the initial probability density as

f_{0}(x)=1,\ x\in[0,1].

Table 1 shows the error and local convergence order of $F(t,x)$ for $M(x)=x(1-x)(-4x+2)$ and $M(x)=0.2(1-x)+0.4x$ in $\mathcal{L}^{2}([0.3,0.7])$ and $\mathcal{L}^{\infty}([0.3,0.7])$ mode with different space and time grid sizes at time $t=0.1$ . The reference ”exact” solution is obtained numerically on a fine mesh with $h=1/100000$ and $\tau=1/100000$ . The results show that the local convergence of the numerical scheme is 2nd order in space and 1st order in time for different $M(x)$ in the inner region $[0.3,0.7]$ .

Table 1: Error of

F

\Omega^{\prime}=[0.3,0.7]

t=0.1

in Example 1.

$h$	$\tau$	$\\|e\\|_{2}$	order	$\\|e\\|_{\infty}$	order
		$M(x)=x(1-x)(-4x+2)$
1/100	1/100	9.78093e-04		2.61509e-03
1/200	1/400	2.41752e-04	2.01644	6.62837e-04	1.98013
1/400	1/1600	6.03394e-05	2.00236	1.69222e-04	1.96973
		$M(x)=0.2(1-x)+0.4x$
$h$	$\tau$	$\\|e\\|_{2}$	order	$\\|e\\|_{\infty}$	order
1/100	1/100	9.90565e-04		3.05562e-03
1/200	1/400	2.47424e-04	2.00127	7.76938e-04	1.97559
1/400	1/1600	6.16308e-05	2.00526	1.96225e-04	1.98529

4.1 Pure drift and natural selection

In this section, we study pure drift and natural selection case which keeps the conservation law (1.2) and (1.3).

Firstly, we define the error of fixation probability at the left and right boundary points at large enough time (near the steady state) $T=\tau N$ as

e_{left}:=|F_{1}^{N}-F_{0}^{N}-a^{\infty}|,

(4.3)

and

e_{right}:=|F_{K}^{N}-F_{K-1}^{N}-b^{\infty}|,

(4.4)

where $a^{\infty}$ and $b^{\infty}$ are the exact fixation probability at boundary points given in Theorem 1.2.

Example 2. Pure drift

In this example, we consider pure drift case, i.e., $M(x)=0$ , with a Gaussian distribution initial function as

f(0,x)=\frac{1}{\sqrt{2\pi}\sigma}\exp^{-\frac{(x-x_{0})^{2}}{2\sigma^{2}}},

(4.5)

with $\sigma=0.01$ and $x_{0}=0.7$ .

In Fig. 1, the evolution of CDF $F(t,x)$ are shown with partition $h=1/1000$ , $\tau=1/1000$ . The discontinuity seems to develop at the boundary as time evolves. That means Dirac $\delta$ singularities develop at the boundary points for the original PDF $f$ . To verify that the scheme can catch the height of the discontinuous jump, i.e., the fixation probability, we present the results in Table 2 on different spacial grid sizes ( $h=1/100,1/200,1/400,1/800)$ with a very fine time step $\tau=1/10000$ at sufficiently large time $T=36$ . It can be found that the discontinuity occurs at boundary points and the height of the jump on the two ends approach to the fixation probability given in Theorem (1.2) with a rate of $1$ st order. In Fig. 2, the discrete expectation in (3) keeps conservation as time evolves with $h=1/1000$ , $\tau=1/1000$ .

Refer to caption — Figure 1: Evolution of $F(t,x)$ in Example 2 with $h=1/1000$ , $\tau=1/1000$ .

Table 2: Discontinuity at the boundary points at

T=36

with

\tau=1/10000

in Example 2.

$h$	$F_{1}^{N}-F_{0}^{N}$	$e_{left}$	Order	$F_{K}^{N}-F_{K-1}^{N}$	$e_{right}$	Rate
1/100	0.303030	3.03030e-03		0.696970	3.03030e-03
1/200	0.301508	1.50754e-03	1.00727	0.698492	1.50753e-03	1.00727
1/400	0.300752	7.51880e-04	1.00362	0.699248	7.51880e-04	1.00362
1/800	0.300375	3.75469e-04	1.00181	0.699624	3.75469e-04	1.00180

1

$a^{\infty}=1-x_{0}=0.3$ , $b^{\infty}=x_{0}=0.7$ in (4.3)-(4.4).

Example 3. Natural selection

In this example, we consider the natural selection case:

M(x)=x(1-x)(\eta x+\beta).

Let

\mathcal{E}^{\theta}(t):=\int_{0}^{1}f(t,x)\theta(x)dx.

The conserved quantity (1.3) is $\mathcal{E}^{\theta}(t)=\mathcal{E}^{\theta}(0)$ with

\theta(x)=\frac{\int_{0}^{x}e^{-\frac{\eta}{2}\bar{x}^{2}-\beta\bar{x}}d\bar{x}}{\int_{0}^{1}e^{-\frac{\eta}{2}x^{2}-\beta x}dx},

being the solution of (1.4).

By the integration by parts, we have

	$\displaystyle\mathcal{E}^{\theta}(t)$	$\displaystyle=\int_{0}^{1}\partial_{x}F(t,x)\theta(x)dx=F(t,1)\theta(1)-F(t,0)\theta(0)-\int_{0}^{1}F(t,x)\theta^{\prime}dx$		(4.6)
		$\displaystyle=1-\int_{0}^{1}F(t,x)\frac{e^{-\frac{\eta}{2}x^{2}-\beta x}}{\int_{0}^{1}e^{-\frac{\eta}{2}x^{2}-\beta x}dx}dx.$		(4.6)

Then a discrete conserved quantity can be defined as

\mathcal{E}^{\theta,n}_{h}=1-\frac{1}{A}\left(h\sum\limits_{i=1}^{N-1}F_{i}^{n}e^{-\frac{\eta}{2}x_{i}^{2}-\beta x_{i}}+\frac{h}{2}F_{0}^{n}e^{-\frac{\eta}{2}x_{0}^{2}-\beta x_{0}}+\frac{h}{2}F_{N}^{n}e^{-\frac{\eta}{2}x_{N}^{2}-\beta x_{N}}\right),

where $A$ is an approximation of $\int_{0}^{1}e^{-\frac{\eta}{2}x^{2}-\beta x}dx$ ,

A:=h\sum\limits_{i=1}^{N-1}e^{-\frac{\eta}{2}x_{i}^{2}-\beta x_{i}}+\frac{h}{2}e^{-\frac{\eta}{2}x_{0}^{2}-\beta x_{0}}+\frac{h}{2}e^{-\frac{\eta}{2}x_{N}^{2}-\beta x_{N}}.

In this example, $\eta=-4$ and $\beta=2$ and the initial state is given in (4.5) with $x_{0}=0.7$ and $\sigma=0.01$ . Fig. 3 shows that the discontinuous points emerge at the boundary points. As time evolves, the discrete expectation $\mathcal{E}_{h}^{n}$ does not keep the conservation and tends to a certain value ( $\approx 0.671595$ ), but $\mathcal{E}^{\theta,n}_{h}$ keeps the conservation and its value is about $0.671529$ shown in Fig. 5. Table 3 shows the ability to catch the jump of the discontinuity at boundary points under different spacial grid sizes ( $h=1/100,1/200,1/400,1/800$ ), and a very fine time step $\tau=1/10000$ at $T=15$ when the steady state is approaching. The fixation probability is predicted in a $1$ -order accuracy at left and right boundary point.

Table 3: Discontinuity at the boundary points at

T=15

with

\tau=1/10000

in Example 3.

$h$	$F_{1}^{N}-F_{0}^{N}$	$e_{left}$	Rate	$F_{K}^{N}-F_{K-1}^{N}$	$e_{right}$	Rate
1/100	0.330230	2.16298e-03		0.669770	2.16298e-03
1/200	0.329103	1.03683e-03	1.06085	0.670896	1.03683e-03	1.06085
1/400	0.328556	4.89322e-04	1.08332	0.671444	4.89322e-04	1.08332
1/800	0.328287	2.20144e-04	1.15234	0.671713	2.20144e-04	1.15234

1

$e_{left}$ and $e_{right}$ are defined in (4.3) and (4.4), with $b^{\infty}=\mathcal{E}^{\theta,0}_{h^{*}}$ and $a^{\infty}=1-b^{\infty}$ , where the conserved quantity $\mathcal{E}^{\theta,0}_{h^{*}}=0.671933$ under a very fine space step $h=1/10000$ .

4.2 Mutation case

In this section, we discuss the mutation case $M(X)=\gamma(1-x)-\mu x$ , $\mu,\gamma\geq 0$ , including one-way mutation and two-way mutation.

Example 4. One-way mutation: Muller’s ratchet

One-way mutation $M(x)=\gamma(1-x)$ with $\gamma=0.2$ is considered in this example. The initial state is taken as

f(0,x)=\left\{\begin{aligned} &\delta_{0},\ x=0,\\ &0,\ x\in(0,1],\end{aligned}\right.

(4.7)

i.e., there is a point measure with the whole probability $1$ at $x=0$ at the initial time. That means only the fittest gene $B$ exists in the system. Fig. 6 shows the evolution of CDF $F(t,x)$ . The discontinuity develops at $x=1$ and the height of the discontinuous jump rises up to $1$ eventually. Table 4 shows the numerical results with different spatial step sizes ( $h=1/100,\ 1/200,\ 1/400,\ 1/800$ ) and a very fine time step $\tau=1/10000$ at $T=50$ , when the steady state is approaching. It can be found that the discontinuity only emerges at $x=1$ with height of $1$ and no discontinuity happens at $x=0$ . This fact accords with the Muller’s ratchet theory: all fittest gene $B$ will mutate irreversibly to the deleterious gene $A$ .

Table 4: Discontinuity at the right boundary point at

T=50

in Example 4.

$h$	$F_{1}^{N}-F_{0}^{N}$	$F_{K}^{N}-F_{K-1}^{N}$
1/100	2.22716e-05	0.999946
1/200	1.93788e-05	0.999946
1/400	1.68642e-05	0.999946
1/800	1.46777e-05	0.999946

Example 5. Two-way mutation

In this example, we consider a two-way mutation $M(X)=\gamma(1-x)-\mu x$ with $\mu=0.2,\ \gamma=0.4$ . Taking into account that $F(t,x)$ may be smooth on $x\in[0,1]$ , the probability density $f(t,x)$ can be recovered by (2.4) except $i=1,K-1$ . Central difference is available now,

f_{i}^{n}=\left\{\begin{aligned} &\frac{F_{i+1}^{n}-F_{i-1}^{n}}{2h},\ i=1,\\ &\frac{F_{i+1}^{n}-F_{i-1}^{n}}{2h},\ i=K-1.\\ \end{aligned}\right.\

(4.8)

The initial function is chosen as (4.5) with $\sigma=0.01$ . Fig. 7 shows that the CDF may be continuous as time evolves with $x_{0}=0.7$ under $h=1/1000$ , $\tau=1/1000$ . Fig. 8 shows the expectation is not conserved as time evolves and towards to the same value $0.66677$ with different initial $x_{0}=0.7,0.2$ . The left figure of Fig. 9 shows the relationship between $\ln(F(t,x))$ and $\ln(x)$ is approximately linear at $T=36$ with $h=1/3200$ and $\tau=1/10000$ . $\ln(1-F(t,x))$ and $\ln(1-x)$ are also approximately linear in the right figure of Fig. 9. The results show that $F(t,x)$ can be approached by polynomial with $x^{\xi}$ near $x=0$ and $(1-x)^{\eta}$ near $x=1$ , where $\xi,\eta$ are positive constant, at $T=36$ near the steady state.

Table 5 shows the behavior of the power law near the boundary points $x=0,1$ with different initial states ( $x_{0}=0.7$ and $x_{0}=0.2$ ) under different spatial mesh steps and the refine time step $\tau=1/10000$ at time $T=36$ . The value of $F_{1}^{N}-F_{0}^{N}$ means the point measure does not emerge at the boundary point. The results also show that the numerical $F(t,x)$ can be approached by polynomial $x^{\gamma}$ with $\gamma\approx 0.4$ near $x=0$ and $(1-x)^{\mu}$ with $\mu\approx 0.2$ near $x=1$ , respectively. That means the corresponding probability density $f(t,x)$ is close to $x^{-0.6}$ at $x=0$ and $(1-x)^{-0.8}$ at $x=1$ . It suggests that the numerical results are consistent with theoretical results (1.9). In addition, numerical results also show the fact that the steady state has nothing with the initial states.

Table 5: Behavior of power law at boundary points at

T=36

with

\tau=1/10000

in Example 5.

$h$	$F_{0}^{N}$	$F_{1}^{N}$	$\gamma$	$F_{K-1}^{N}$	$F_{K}^{N}$	$\mu$
	$x_{0}=0.7$
1/200	0.00000	4.78993e-02		7.37404e-01	1.00000
1/400	0.00000	3.62271e-02	0.402935	7.72496e-01	1.00000	0.206953
1/800	0.00000	2.74198e-02	0.401851	8.02473e-01	1.00000	0.203842
1/1600	0.00000	2.07643e-02	0.401110	8.28292e-01	1.00000	0.202093
1/3200	0.00000	1.57294e-02	0.400639	8.50637e-01	1.00000	0.201136
	$x_{0}=0.2$
$h$	$F_{0}^{N}$	$F_{1}^{N}$	$\gamma$	$F_{K-1}^{N}$	$F_{K}^{N}$	$\mu$
1/200	0.00000	4.78993e-02		7.37404e-01	1.00000
1/400	0.00000	3.62271e-02	0.402935	7.72496e-01	1.00000	0.206953
1/800	0.00000	2.74198e-02	0.401851	8.02473e-01	1.00000	0.203842
1/1600	0.00000	2.07643e-02	0.401110	8.28292e-01	1.00000	0.202093
1/3200	0.00000	1.57294e-02	0.400639	8.50637e-01	1.00000	0.201136

1

$\gamma$ and $\mu$ are obtained by

$\gamma=\ln(F_{1}^{N}/F_{1}^{2N})/\ln(2),$

$\mu=\ln\left(\frac{F_{K}^{N}-F_{1}^{N}}{F_{K}^{2N}-F_{1}^{2N}}\right)/\ln(2).$

5 Some discussions about the revised FDM and the standard FDM

In this section, we discuss what happens if the revised treatment (2.3) is not introduced.

Recalling that $a(x)=x(1-x)$ , the standard finite difference scheme, referred as sFDM, is as follows. Given $F^{n-1}$ , $F^{n}=(F^{n}_{0},\cdots,F^{n}_{K})$ such that

\frac{F^{n}_{i}-F^{n-1}_{i}}{\tau}-\frac{a(x_{i+\frac{1}{2}})D_{h}F_{i+1}^{n}-a(x_{i-\frac{1}{2}})D_{h}F_{i}^{n}}{h}+M(x_{i})D^{up}_{h}F^{n}_{i}=0,\ i=1,\cdots,K-1,

(5.1)

subject to $F^{n}_{0}=0$ , $F^{n}_{K}=1$ , $n=1,\cdots,N$ .

Firstly, we compare the numerical behavior of the two schemes. Without loss of generality, we consider the pure drift case $M(x)=0$ and take the initial state as (4.5) with $x_{0}=0.7$ and $\sigma=0.01$ . Numerical results are presented in Figs 10 and 11.

The evolution of CDF $F(t,x)$ by rFDM (2.2)-(2.3) and sFDM (5.1) are shown in Fig. 10. As time evolves, the discontinuity emerges at the ends $x=0,1$ by rFDM and the fixation phenomenon is correctly predicted. For sFDM, no evidence implies the development of the discontinuity, i.e., sFDM fails to predict the fixation phenomenon. To make it more clear, more results by sFDM are presented in Table 6 with different spatial grid size ( $h=1/100,\ 1/200,\ 1/400,\ 1/800$ ) and the fixed time step size $\tau=1/10000$ . It is obvious that sFDM fails to catch the discontinuity that should develop at the ends.

The evolution of expectation in Fig. 11 shows that rFDM keeps the conservation of expectation, while sFDM fails.

Table 6: Behavior at boundary points for sFDM (5.1) at

T=36

$h$	$F_{1}^{N}-F_{0}^{N}$	$e_{left}$	$F_{K}^{N}-F_{K-1}^{N}$	$e_{right}$
1/100	0.153003	0.146998	0.153003	0.546997
1/200	0.138051	0.161949	0.138052	0.561948
1/400	0.125864	0.174135	0.125865	0.574135
1/800	0.115703	0.184297	0.115707	0.584294

1

$a^{\infty}=1-x_{0}=0.3$ , $b^{\infty}=x_{0}=0.7$ in (4.3)-(4.4).

The reason why sFDM does not work is that sFDM destroys the rule that the information at boundary points should not be transferred into the domain by diffusion due to the degeneration of the diffusion coefficient $a(0)=a(1)=0$ at the boundary points.

Next, the result of sFDM in Fig. 10 looks like the one in Fig. 7 for two-way mutation case. But what we treat now is the pure drift case $M(x)=0$ . To make this clear, we take the difference between rFDM (2.2)-(2.3) and sFDM (5.1). The only difference takes place at two points $x_{1}$ and $x_{K-1}$ . At $x_{1}$ , sFDM is rFDM plus a term at the left hand side as

\Lambda_{1}=a(x_{\frac{1}{2}})D_{h}F^{n}_{1}/h=\frac{1}{2}(1-\frac{h}{2})D_{h}F^{n}_{1}.

This means that at $x_{1}$ , a mutation from gene $B$ to $A$ is numerically introduced to a pure drift case, here $M(x)=\gamma(1-x)$ with a mutation ratio $\gamma=\frac{1}{2}$ . Similarly, at $x_{K-1}$ , sFDM is rFDM plus a term at the right hand side as

\Lambda_{K-1}=-a(x_{K-\frac{1}{2}})D_{h}F^{n}_{K}/h=-\frac{1}{2}x_{K-\frac{1}{2}}D_{h}F^{n}_{K}.

That implies that at $x_{K-1}$ , a mutation from gene $A$ to $B$ is numerically introduced, here $M(x)=-\mu x$ with a mutation rate $\mu=\frac{1}{2}$ . With these artificial mutations, fixation can never happen. That’s the reason why sFDM fails to predict the fixation that should happen for pure drift and its numerical results behavior like a problem with two-way mutation.

6 Conlusions

We re-model the random genetic drift problem on PDF to a new one on CDF. The possible Dirac $\delta$ singularity on PDF then changes to a discontinuous jump on CDF and the height of the jump is just the fixation probability. The possible singularity of negative power law changes to a bounded positive power law. A revised finite difference method (rFDM) is proposed to uniformly and effectively handle the pure drift with or without natural selection, one-way mutation and two-way mutation.

The idea working on CDF is a potential way to treat multi-alleles genetic drift problems, where multi-dimensional partial differential equations is involved. It is quite direct to change the equation on PDF to one on CDF. But it is a challenge now to settle down the boundary condition, which is corresponding to the margin distribution.

Acknowledgments

The authors would like to thank Prof X.F. Chen for very helpful discussions on this topic. C. Duan was supported in part by NSFC 11901109. C. Liu is partially supported by NSF grants DMS-1950868 and DMS2118181. X. Yue was supported by NSFC 11971342, 12071190 and 12371401.

Reference

[1] R. Barakat and D. Wagener, Solutions of the forward diallelic diffusion equation in population genetics, Math. Biosci. 41 (1978), pp. 65-79.
[2] C. Br $\ddot{a}$ utigam and Matteo Smerlak, Diffusion approximations in population genetics and the rate of Muller’s ratchet, J. Theor. Biol. 550 (2022), pp. 111236.
[3] J. A. Carrillo, L. Chen and Q. Wang, An optimal mass transport method for random genetic drift, SIAM J. Numer. Anal., 60(3) (2022), pp. 940-969.
[4] F. A. Chalub and M. O. Souza, From discrete to continuous evolution models: A unifying approach to drift-diffusion and replicator dynamics, Theoret. Popul. Biol., 76 (2009), pp. 268-277.
[5] F. A. Chalub and M. O. Souza, A non-standard evolution problem arising in population genetics, Commun. Math. Sci., 7 (2009), pp. 489-502.
[6] F. A. Chalub, L. Monsaingeon, A. M. Ribeiro and M. O. Souza, Gradient flow formulations of discrete and continuous evolutionary models: A unifying perspective, Acta Appl. Math., 171 (2021), pp. 24.
[7] J. F. Crow and M. Kimura, Some genetic problems in natural populations, Proc. Third Berkeley Symp. Math. Stat. and Prob., 4 (1956), pp. 1-22.
[8] J. F. Crow and M. Kimura, An Introduction to Population Genetics Theory, Harper & Row, New York, (1970).
[9] A.Devi and K. Jain, The impact of dominance on adaptation in changing environments, Genetics 216 (2020), pp. 227-240.
[10] C. Duan, C. Liu, C. Wang and X. Yue, Numerical complete solution for random genetic drift by energetic variational approach, ESAIM Math. Model. Numer. Anal., 53 (2019), pp. 615-634.
[11] W. J. Ewens, Mathematical Population Genetics. I. Theoretical Introduction, Springer-Verlag, New York, (2004).
[12] J.Felsenstein, The evolutionary advantage of recombination, Genetics 78 (1974), pp. 737-756.
[13] R. A. Fisher, On the dominance ratio, Proc. Royal Soc. of Edinburgh, 42 (1923), pp. 321-341.
[14] J. Haigh, The accumulation of deleterious genes in a population-Muller’s ratchet, Theoret. Popul. Biol., 14(2) (1978), pp. 251-267.
[15] S. Kaushik, Effect of beneficial sweeps and background selection on genetic diversity in changing environments, J. Theor. Biol., 562 (2023), pp. 111431.
[16] M. Kimura, Stochastic processes and distribution of gene frequencies under natural selection, Cold Spring Harb. Symp. Quant. Biol., 20 (1955), pp. 33-53.
[17] M. Kimura, Random genetic drift in multi-allelic locus, Evolution, 9 (1955), pp. 419-435.
[18] M. Kimura, Solution of a process of random genetic drift with a continuous model, Proc. Natl. Acad. Sci., 41 (1955), pp. 141-150.
[19] M. Kimura, On the probability of fixation of mutant genes in a population, Genetics, 47 (1962), pp. 713-719.
[20] M. Kimura, Diffusion models in population genetics, J. Appl. Probability, 1 (1964), pp. 177-232.
[21] P. A. P. Moran, Random processes in genetics, Proc. Cambridge Philos. Soc., 54 (1958), pp. 60-72.
[22] P. A. P. Moran, The Statistical Processes of Evolutionary Theory, Clarendon Press, Oxford University Press, Oxford, (1962).
[23] H. J. Muller, Some genetic aspects of sex, Am. Nat., 66 (703) (1932), pp.118-138.
[24] H. J. Muller, The relation of recombination to mutational advance, Mutat. Res., 1 (1964), pp. 2-9.
[25] T. D. Tran, J. Hofrichter and J. Jost, An introduction to the mathematical structure of the Wright-Fisher model of population genetics, Theoret. Biosci., 132 (2013), pp. 73-82.
[26] A. Traulsen, J. C. Claussen and C. Hauert, Coevolutionary dynamics: From finite to infinite populations, Phys. Rev. Lett., 95 (2005), 238701.
[27] Y. Wang and B. Rannala, A novel solution for the time-dependent probability of gene fixation or loss under natural selection, Genetics, 168 (2004), pp.1081-1084.
[28] S. Wright, The evolution of dominance, Amer. Nat., 63 (1929), pp. 556-561.
[29] S. Wright, The distribution of gene frequencies in populations, Proc. Natl. Acad. Sci. USA, 23 (1937), pp. 307-320.
[30] S. Wright, The differential equation of the distribution of gene frequencies, Proc. Natl. Acad. Sci. USA, 31 (1945), pp. 382-389.
[31] H.Uecker and J. Hermisson, On the fixation process of a beneficial mutation in a variable environment, Genetics, 188 (2011), pp. 915-930.
[32] C. S. Wylie, C.-M. Ghim, D. Kessler, and H. Levine, The fixation probability of rare mutators in finite asexual populations, Genetics, 181 (2009), pp. 1595-1612.
[33] S. Xu, M. Chen, C. Liu, R. Zhang, and X. Yue, Behavior of different numerical schemes for random genetic drift, BIT Numer. Math., 59 (2019), pp. 797-821.
[34] S. Xu, X. Chen, C. Liu, and X. Yue, Numerical method for multi-alleles genetic drift problem, SIAM J. Numer. Anal., 57 (2019), pp. 1770-1788.
[35] L. Zhao, X. Yue and D. Waxman, Complete numerical solution of the diffusion equation of random genetic drift, Genetics, 194 (2013), pp. 973-985.