
Cutoff for the logistic SIS epidemic model with self-infection

Roxanne He School of Mathematics and Statistics, The University of Melbourne, Parkville, VIC, 3010, Australia [email protected] Malwina Luczak Department of Mathematics, University of Manchester [email protected]  and  Nathan Ross School of Mathematics and Statistics, The University of Melbourne, Parkville, VIC, 3010, Australia [email protected]
Abstract.

We study a variant of the classical Markovian logistic SIS epidemic model on a complete graph, which has the additional feature that healthy individuals can become infected without contacting an infected member of the population. This additional “self-infection” is used to model situations where there is an unknown source of infection or an external disease reservoir, such as an animal carrier population. In contrast to the classical logistic SIS epidemic model, the version with self-infection has a non-degenerate stationary distribution, and we derive precise asymptotics for the time to converge to stationarity (mixing time) as the population size becomes large. It turns out that the chain exhibits the cutoff phenomenon, which is a sharp transition in time of the total variation distance to stationarity from one to zero. We obtain the exact leading constant for the cutoff time, and show that the window size is of constant (optimal) order. While this result is interesting in its own right, an additional contribution of our work is that the proof illustrates a recently formalised methodology of Barbour, Brightwell and Luczak [6], which can be used to show cutoff via a combination of concentration of measure inequalities for the trajectory of the chain, and coupling techniques.

The research of Malwina Luczak is currently supported by the Leverhulme Trust, grant reference LIP-2022-005, and was previously supported by an Australian Research Council Future Fellowship (FT170104009).

1. Introduction

Mathematical models of disease spread are widely used in epidemiology, to assist in preventing and managing epidemic outbreaks. One well-studied epidemic model is the logistic Susceptible-Infected-Susceptible model (SIS model) on the complete graph with $N$ vertices, which is a continuous-time birth-and-death Markov chain $X^{*}_{N}=(X^{*}_{N}(t))_{t\geq 0}$ with state space $\{0,1,\ldots,N\}$, and transitions given by

$$x^{*}\mapsto x^{*}+1\text{ at rate }\lambda x^{*}(1-x^{*}/N),$$
$$x^{*}\mapsto x^{*}-1\text{ at rate }\mu x^{*},$$

where $x^{*}\in\{0,1,\ldots,N\}$ and $\lambda,\mu>0$. The model describes an epidemic spreading in a closed population with $N$ individuals, with the number of infected people at time $t$ represented by $X^{*}_{N}(t)$. To explain the dynamics, each infected person encounters a random member of the population at rate $\lambda$, and if the other individual is currently susceptible to infection, that individual then becomes infected. Each infected individual recovers at rate $\mu$, and once they are recovered, they immediately become susceptible again. If, at any time, the number of infected people becomes zero, we say that the epidemic outbreak is extinct. This model was first formulated and studied by Feller [18] in the 1930s, but was not heavily studied until the 1970s, when it was popularised by Weiss and Dishon [33].

We are interested in asymptotics as $N\to\infty$, with $\lambda$, $\mu$ fixed positive constants. We use the phrase “with high probability” to mean “with probability tending to $1$ as $N\to\infty$”. A key quantity in the SIS model is the so-called basic reproduction ratio $\mathcal{R}_{0}:=\lambda/\mu$. The epidemic is called supercritical if $\mathcal{R}_{0}>1$, and subcritical if $\mathcal{R}_{0}<1$. It is well known that for the supercritical SIS model, if there are a large number of infected individuals at time $0$, then with high probability, $X_{N}^{*}(t)/N$ heads rapidly towards the stable fixed point of a corresponding deterministic model given by a (logistic) differential equation, and then spends most of its time before extinction in the neighbourhood of that fixed point. The time to extinction is exponential in $N$ (in mean and with high probability) as $N\to\infty$; see, for example, Nåsell [25]. For more detailed aspects of the supercritical logistic SIS model such as quasi-stationarity, see, for example, Andersson and Djehiche [4], Barbour [5] and Nåsell [26].

In the subcritical case where $\lambda<\mu$, Doering et al. [16] showed in 2005 that, if the initial state $X_{N}^{*}(0)$ is of order $N$, then the expected extinction time is $\frac{1}{\mu-\lambda}(\log N+O(1))$.

Brightwell et al. [10] obtained a formula for the asymptotic distribution of the extinction time in the subcritical case, and, in particular, proved that, if the starting state is of order $N$, then for any constant $c>0$ large enough, the probability of extinction by time $\frac{1}{\mu-\lambda}\log(N)-c$ is nearly zero, while by time $\frac{1}{\mu-\lambda}\log(N)+c$, it is nearly one. To be precise, an asymptotic formula for the extinction time, where the randomness has a Gumbel distribution, is given in [10] in the case where $\mu>\lambda$ and both are fixed constants. The authors also consider the case where $\lambda,\mu$ are allowed to vary with $N$, and $\mu(N)-\lambda(N)\to 0^{+}$ suitably slowly (the “barely subcritical” case), and show that the same formula for the extinction time holds in this regime.

Recent work of Foxall [19] summarises and derives further results for the asymptotic behaviour of the extinction time under different asymptotic regimes for the initial infection size and reproduction ratio (also allowing $\lambda$ and $\mu$ to vary with $N$).

The model studied in this paper is a variant of the logistic SIS epidemic model known as the logistic SIS epidemic model with self-infection ($\varepsilon$-SIS model), which is a continuous-time birth-and-death Markov chain $X_{N}=(X_{N}(t))_{t\geq 0}$ with state space $\{0,1,\ldots,N\}$, and transitions given by

$$x\mapsto x+1\text{ at rate }\lambda x(1-x/N)+\varepsilon(N-x),$$
$$x\mapsto x-1\text{ at rate }\mu x,$$

where $x\in\{0,1,\ldots,N\}$ and $\lambda,\mu,\varepsilon>0$. A literature review for this lesser-studied model is given later in the present section.
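These transition rates are straightforward to simulate directly; below is a minimal Gillespie-style sketch (the function name and all parameter values are ours, purely for illustration):

```python
import random

def simulate_eps_sis(N, lam, mu, eps, x0, t_max, rng=None):
    """Gillespie simulation of the epsilon-SIS chain; returns (times, states)."""
    rng = rng or random.Random(0)
    t, x = 0.0, x0
    times, states = [0.0], [x0]
    while t < t_max:
        up = lam * x * (1 - x / N) + eps * (N - x)   # infection rate (incl. self-infection)
        down = mu * x                                 # recovery rate
        total = up + down                             # always positive since eps > 0
        t += rng.expovariate(total)                   # exponential holding time
        if t >= t_max:
            break
        x += 1 if rng.random() < up / total else -1
        times.append(t)
        states.append(x)
    return times, states

times, states = simulate_eps_sis(N=1000, lam=2.0, mu=1.0, eps=0.1, x0=0, t_max=20.0)
```

Note that, unlike the classical SIS chain, the simulated process restarts from state $0$ thanks to the $\varepsilon(N-x)$ term, so it never gets absorbed.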

Compared to the standard SIS model, the $\varepsilon$-SIS model has an additional source of infection, which can be thought of as an infection from an unknown source or external disease reservoir, such as an animal carrier of the disease. Thus, any susceptible person can be infected (at rate $\varepsilon$) without having contact with any infected member of the population. Excluding the case $\varepsilon=0$ (done throughout this paper), which is the classical well-studied SIS model, the $\varepsilon$-SIS model admits a non-degenerate stationary distribution, and so the relevant quantity of interest is the time to stationarity, which is analogous to the extinction time in the classical SIS model. We are interested in asymptotics as the population size $N$ tends to infinity. Assuming the parameters of the model, $\lambda$, $\mu$ and $\varepsilon$, are absolute constants not depending on $N$, we show that the sequence of $\varepsilon$-SIS Markov chains $(X_{N})_{N\geq 1}$ exhibits a cutoff, as per the following definition.

Definition 1.1.

The sequence of Markov chains $\{X_{N}(t):t\geq 0\}_{N\geq 1}$ with finite state space $E_{N}$, transition function $P_{N}^{t}(x,\,\cdot\,)$, and stationary distribution $\pi_{N}$ is said to exhibit a cutoff at time $t_{N}$ with window size $\omega_{N}$, if $\omega_{N}=o(t_{N})$ and

$$\lim_{s\to\infty}\liminf_{N\to\infty}\rho_{N}(t_{N}-s\omega_{N})=1,\quad\lim_{s\to\infty}\limsup_{N\to\infty}\rho_{N}(t_{N}+s\omega_{N})=0,$$

where

$$\rho_{N}(t):=\max_{x\in E_{N}}\lVert P_{N}^{t}(x,\,\cdot\,)-\pi_{N}\rVert_{\text{TV}}$$

is the worst-case total variation distance between the distribution of $X_{N}(t)$ and the chain’s stationary distribution $\pi_{N}$.

Informally, cutoff means that, for any starting state, the distribution of the chain goes from being about as far as possible in total variation distance from the stationary distribution to about as close as possible, over a window of time of length $\omega_{N}$ centred around time $t_{N}$.
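For concreteness, the total variation distance appearing in the definition of $\rho_{N}(t)$ reduces, for distributions on a finite set, to half the $\ell^{1}$ distance between the probability vectors; a small illustrative helper (ours, not from the paper):

```python
def tv_distance(p, q):
    """Total variation distance between two probability vectors on the same finite set."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

# A point mass versus the uniform distribution on 4 states:
print(tv_distance([1.0, 0.0, 0.0, 0.0], [0.25, 0.25, 0.25, 0.25]))  # 0.75
```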

The cutoff phenomenon for Markov chains was first identified in the 1980s for random transpositions on the symmetric group by Diaconis and Shahshahani [14], and the name was coined by Aldous and Diaconis [2]. Establishing cutoff for specific models of Markov chains is to this day an active area of research and in general a difficult problem. For the model in question, we have the following main result of the paper, which gives the precise cutoff time of the chain, with an optimal window.

Theorem 1.2.

The sequence of $\varepsilon$-SIS Markov chains $(X_{N})_{N\geq 1}$ has a cutoff at $t_{N}:=\frac{1}{2J}\log N$ with constant window size, where $J:=\sqrt{(\lambda-\mu-\varepsilon)^{2}+4\lambda\varepsilon}$.

Before going further, we highlight a few points about the theorem. First, Theorem 1.2 implies that for a very small $\varepsilon$, the cutoff time of the $\varepsilon$-SIS model is about $\frac{1}{2(\mu-\lambda)}\log(N)$. This is about twice as fast as the subcritical classical SIS model, which has a cutoff at $\frac{1}{\mu-\lambda}\log(N)$. This finding does not appear to be at all a priori obvious: we will give an intuitive explanation for it after describing the proof in more detail at the end of the introduction.
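The first remark is easy to check numerically: for $\mu>\lambda$ fixed and $\varepsilon\to 0$, $J\to\mu-\lambda$, so the prefactor $\frac{1}{2J}$ approaches $\frac{1}{2(\mu-\lambda)}$. A quick sketch (parameter values are ours):

```python
import math

def cutoff_prefactor(lam, mu, eps):
    """Prefactor 1/(2J) with J = sqrt((lam - mu - eps)^2 + 4*lam*eps), as in Theorem 1.2."""
    J = math.sqrt((lam - mu - eps) ** 2 + 4 * lam * eps)
    return 1 / (2 * J)

# For tiny eps and mu > lam, the prefactor approaches 1/(2(mu - lam)),
# half the classical subcritical extinction-time prefactor 1/(mu - lam).
print(cutoff_prefactor(0.5, 1.0, 1e-8))  # close to 1/(2 * 0.5) = 1.0
```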

Second, general conditions that guarantee that a sequence of birth-and-death chains exhibits cutoff are given in Ding et al. [15]. They show that, if the sequence satisfies the so-called product condition, namely that the relaxation time $t_{\rm{REL}}(N)$ (defined to be the reciprocal of the spectral gap of the chain) is of smaller order, as $N$ goes to infinity, than the mixing time $t_{\rm{MIX}}(N)$ (defined to be the first time the total variation distance to stationarity is less than $1/4$), then the chain exhibits cutoff with window of order at most $\sqrt{t_{\rm{REL}}(N)t_{\rm{MIX}}(N)}$. While it is likely possible in our setting to verify the product condition, or related conditions (see, for example, the follow-up works [11, 12, 13]), we would still need to derive bounds on the mixing time, and the window size $\sqrt{t_{\rm{REL}}(N)t_{\rm{MIX}}(N)}$ is at best of order $\sqrt{\log(N)}$, because the relaxation time is at least one, and the mixing time is of order $\log(N)$. Thus our Theorem 1.2 is sharper than what would be obtained from these general results, by giving the constant for the cutoff time, and showing the window is of constant order. Ding et al. [15] also show that the window size given in their results is unimprovable for general birth-and-death chains, so our result is also interesting as a non-trivial example where the window is smaller than predicted by the general result.

We also mention in this vein of research the work of Basu et al. [8], which gives analogous conditions for cutoff of reversible Markov chains on finite trees, and semi birth-and-death chains, and the very recent work of Salez [28], which gives a “product-like” condition for cutoff assuming that the Markov chain has “non-negative curvature”. These works also give a cutoff window of order at most $\sqrt{t_{\rm{MIX}}(N)}\cdot t_{\rm{REL}}(N)^{\alpha}$ for some $\alpha>0$, which, as in the birth-and-death case, may not be sharp. Thus, in many applications precise information about the exact cutoff time and window size needs to be obtained by “bare hands” methods.

The technique we use for proving Theorem 1.2 is adapted from Barbour et al. [6], where the authors developed a general approach to establishing the cutoff phenomenon for suitable chains by using couplings and concentration of measure inequalities; see also that paper for a comprehensive literature review of the cutoff phenomenon. The approach has been used in other places, such as by Lopes and Luczak [24], who derived a formula for the asymptotic distribution of the extinction time for the weaker of two competing SIS epidemics (while that result is not an example of the cutoff phenomenon, it has a similar flavour), and by Eskenazis and Nestoridi [17], who established the cutoff phenomenon for the Bernoulli–Laplace urn model with $o(N)$ swaps. Thus, an additional contribution of our work is to provide another example of this approach, which we hope will serve to standardise it as a tool for establishing cutoff.

A final contribution of this article is to bring the $\varepsilon$-SIS model to the attention of researchers in applied probability, and so we next discuss the existing literature. Afterwards, the end of the introduction gives an overview of the proof of Theorem 1.2, and simultaneously the organisation of the paper.

Related literature. The $\varepsilon$-SIS model has appeared sporadically in the applied literature. The theoretical primer (aimed at applied researchers) of Keeling and Ross [21] includes the additional $\varepsilon$ term in their definition of the (vanilla) SIS model. Stone et al. [29] apply the $\varepsilon$-SIS model to head lice epidemic data, after deriving closed-form analytic formulas for some stationary quantities, and Hill et al. [20] use it to model obesity spread. Nieddu et al. [27] derive analytic quantities of the model to understand the parameter space where the disease is endemic.

The most systematic study of the model appears in a series of papers by Van Mieghem, alone and with a number of co-authors [1, 30, 31, 32]. Van Mieghem [30] uses an exact expression for the mean prevalence (i.e., the expectation of $X_{N}$) in equilibrium to identify a sharp phase transition in this mean as $\lambda/\mu$ varies, for fixed $N$ and a fixed very small positive value of $\varepsilon$. A follow-up paper of Van Mieghem and Wang [32] studies time-dependent behaviour in this regime. We remark here that known results about the supercritical classical logistic SIS model ($\lambda/\mu>1$) help to explain this phase transition phenomenon. When $\lambda/\mu>1$ and $\varepsilon$ is much smaller than $1/N$, the behaviour of the $\varepsilon$-SIS model is very similar to that of the usual logistic SIS model in the supercritical regime, except that, after reaching state $0$, the process restarts after the next self-infection, so that the mean waiting time in state $0$ is $(N\varepsilon)^{-1}$. It is known (see, for instance, [4]) that the duration of the supercritical logistic epidemic is asymptotically exponential with a mean whose leading term is $e^{\gamma N}$, with $\gamma=\log(\lambda/\mu)-1+\mu/\lambda$. If $(N\varepsilon)^{-1}$ is much smaller than $e^{\gamma N}$, then the mean prevalence is of order $N$, as the $\varepsilon$-SIS model will be spending most of its time near the stable fixed point of the logistic equation at $1-\mu/\lambda$. On the other hand, if $(N\varepsilon)^{-1}$ is much larger than $e^{\gamma N}$, then the mean prevalence is near zero, since then most of the time the epidemic will be extinct, waiting to restart.

While papers [30, 32] assume that the population size $N$ is fixed, the regime considered there is qualitatively very similar to the case where $N$ is very large, $\varepsilon$ is exponentially small in $N$, and $\lambda,\mu$ are fixed constants, and therefore very different to the regime considered in the present paper.
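Because the chain is a one-dimensional birth-and-death chain, stationary quantities such as the mean prevalence discussed above can be computed exactly from the detailed balance equations; a sketch (helper names and parameter values are ours, not from the papers cited):

```python
def stationary_dist(N, lam, mu, eps):
    """Stationary distribution of the (birth-and-death) eps-SIS chain via detailed balance.

    pi(x) is proportional to the product of birth(j-1)/death(j) for j = 1..x.
    """
    birth = lambda x: lam * x * (1 - x / N) + eps * (N - x)  # upward rate at x
    death = lambda x: mu * x                                  # downward rate at x
    w = [1.0]
    for x in range(1, N + 1):
        w.append(w[-1] * birth(x - 1) / death(x))
    total = sum(w)
    return [v / total for v in w]

pi = stationary_dist(N=200, lam=2.0, mu=1.0, eps=0.05)
mean_prevalence = sum(x * p for x, p in enumerate(pi))
```

For these (supercritical, small-$\varepsilon$) values, the mass concentrates near $x^{\star}N$, so the mean prevalence is of order $N$, in line with the discussion above.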

Proof and paper overview. In Section 2 we provide a brief discussion of the deterministic $\varepsilon$-SIS model, represented by the differential equation given in (2.1). The solution to this differential equation provides an approximation for the scaled process $(X_{N}(t)/N)_{t\geq 0}$ over an appropriate time scale, for large $N$.

In Section 3, we show in Lemma 3.1 that, for a (deterministic) period of time that is stretched exponential in $N$, with high probability, the scaled process $(X_{N}(t)/N)$ is at most $O(N^{-\frac{1-h}{2}})$, for some fixed $h\in(0,1)$, away from the solution of equation (2.1). In addition, we show in Lemma 3.2 that for large enough times $t$, the mean of $X_{N}(t)/N$ is within $O(N^{-1/2})$ of the stable fixed point $x^{\star}$ of (2.1), given explicitly at (2.2).

In Section 4, we obtain an upper bound on the mixing time of the $\varepsilon$-SIS model. We show that by time $t_{N}+\xi$, where $\xi$ is a constant that is independent of the parameters of the model and the population size $N$, the distribution of the chain $X_{N}(t)$ is sufficiently close to its stationary distribution $\pi_{N}$. This is done by using the Markov chain coupling method, where the coupling used consists of running two copies of the chain independently, over three phases. During the burn-in phase, the concentration results of Section 3 imply that by time $\frac{1-h}{2J}\log(N)$, both copies are within $O(N^{\frac{1+h}{2}})$ of $x^{\star}N$ with high probability; see Corollary 4.4. Given that this event has occurred, during the intermediate phase, the copies come within $O(\sqrt{N})$ of each other after an additional $\frac{h}{2J}\log(N)+\xi/2$ time with high probability; see Lemma 4.5. In the final phase, given that the copies are within $\sqrt{N}$ of each other, their distance may be compared to an unbiased random walk, which hits $0$ from a state of order $\sqrt{N}$ within a constant time, given that the transition rate is of order $N$; see Lemma 4.6.

In Section 5, we obtain a lower bound on the mixing time. We first show an improved version of the concentration inequality given in Section 3, which only holds if the starting state of the chain $X_{N}(t)$ is in a “good set”. We then break the proof of the lower bound into two parts. First, we combine the rapid mixing result from the previous section and the improved concentration inequality to show that the stationary distribution $\pi_{N}$ is concentrated within a ball of order $\sqrt{N}$ around $x^{\star}N$. Then we use concentration around the deterministic solution to show that most of the mass of $X_{N}(t_{N}-\xi)$ is located outside that ball, and thus by the time $t_{N}-\xi$ the distribution of the chain cannot be close to stationarity.

Finally, as mentioned already, for very small $\varepsilon$ the $\varepsilon$-SIS model mixes twice as fast as the classical subcritical SIS model. To see why this is the case, note that for the $\varepsilon$-SIS model, the (non-normalised) process arrives within distance $\sqrt{N}$ of the fixed point $x^{\star}N$, where most of the mass of the stationary distribution lies, after time of order about $\frac{1}{2(\mu-\lambda)}\log(N)$. After that, the stochastic model can be shown to do better than the deterministic one, as we can compare its behaviour to that of an unbiased random walk in continuous time taking steps at rate of order $N$, since the transition rates remain of order $N$ throughout. Such a random walk will hit $0$ in constant time with high probability.

For the classical subcritical SIS model, the process also arrives within $\sqrt{N}$ of $0$ after about $\frac{1}{2(\mu-\lambda)}\log(N)$ time. However, at that point events happen at rate of order only $\sqrt{N}$, and the rate decreases as the process gets closer to $0$, so any comparison with an unbiased random walk would be useless. Instead, we need to keep taking advantage of the negative drift right until the end, first following the differential equation closely, and then approximating by a linear birth-and-death process.

2. Deterministic version of the $\varepsilon$-SIS model

In this section, we consider the deterministic version of the logistic SIS model with self-infection. The model represents a spreading epidemic governed by the autonomous ordinary differential equation (ODE)

$$\frac{dx}{dt}=\lambda x(1-x)+\varepsilon(1-x)-\mu x,\quad t\geq 0,$$ (2.1)

where $\lambda,\mu,\varepsilon>0$. The proportion of infected people at time $t$ is modelled by $x(t)$. When $\varepsilon=0$, we recover the classical deterministic logistic SIS model.

We solve the equation $f(x)=\lambda x(1-x)+\varepsilon(1-x)-\mu x=0$ to identify the fixed points. There are two solutions to the equation; however, since only non-negative solutions have a biological meaning, we require $x(t)\geq 0$ for all $t\geq 0$ in (2.1). The only fixed point that falls in this range is

$$x^{\star}=\frac{1}{2\lambda}\left((\lambda-\mu-\varepsilon)+\sqrt{(\lambda-\mu-\varepsilon)^{2}+4\lambda\varepsilon}\right):=\frac{\lambda-\mu-\varepsilon+J}{2\lambda},$$ (2.2)

where we recall from the statement of Theorem 1.2 that $J=\sqrt{(\lambda-\mu-\varepsilon)^{2}+4\lambda\varepsilon}$. Since $x(t)$ models the proportion of infected people, $x^{\star}=(\lambda-\mu-\varepsilon+J)/2\lambda\leq 1$, as one would expect. When $\varepsilon=0$, the solution simplifies to $\frac{\lambda-\mu}{\lambda}$ if $\lambda>\mu$, whereas if $\lambda<\mu$ the solution degenerates to zero. The other solution is

$$x_{1}^{\star}=\frac{1}{2\lambda}\left((\lambda-\mu-\varepsilon)-\sqrt{(\lambda-\mu-\varepsilon)^{2}+4\lambda\varepsilon}\right):=\frac{\lambda-\mu-\varepsilon-J}{2\lambda},$$

which is non-positive, and degenerates to zero when $\varepsilon=0$, provided $\lambda>\mu$. The following proposition is easily verified by simple calculations and standard results; see, for example, Theorem 2.4.2 in [9].

Proposition 2.1.

The differential equation (2.1) subject to the initial condition $x(0)=\alpha\in[0,1]$ has an explicit solution

$$x(t)=x^{\star}+\frac{\frac{J}{\lambda}(\alpha-x^{\star})}{(\alpha-x_{1}^{\star})e^{tJ}-(\alpha-x^{\star})},\quad\text{where}\quad J=\sqrt{(\lambda-\mu-\varepsilon)^{2}+4\lambda\varepsilon}.$$ (2.3)

It is easy to see from (2.3) that, as $t\to\infty$, the solution monotonically approaches $x^{\star}$ from below when $0\leq\alpha<x^{\star}$, whereas it monotonically approaches $x^{\star}$ from above if $x^{\star}<\alpha\leq 1$. This can also be seen from the phase diagram. The graph of $f(x)$ versus $x$ is a downward parabola, so $dx/dt>0$ when $0\leq x<x^{\star}$, and $dx/dt<0$ if $x^{\star}<x\leq 1$. This implies that, regardless of the starting point, any solution of (2.1) approaches $x^{\star}$ monotonically as $t\to\infty$, so the fixed point $x^{\star}$ is globally attractive over the set $[0,1]$.
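Both the fixed points (2.2) and the explicit solution (2.3) are easy to verify numerically; a quick sanity check with illustrative parameter values (ours, not from the paper):

```python
import math

# Illustrative parameter values (ours).
lam, mu, eps = 2.0, 1.0, 0.3
J = math.sqrt((lam - mu - eps) ** 2 + 4 * lam * eps)
x_star = (lam - mu - eps + J) / (2 * lam)    # stable fixed point (2.2)
x1_star = (lam - mu - eps - J) / (2 * lam)   # non-positive root

f = lambda x: lam * x * (1 - x) + eps * (1 - x) - mu * x

# Both roots of f vanish, and only x_star lies in [0, 1].
assert abs(f(x_star)) < 1e-12 and abs(f(x1_star)) < 1e-12
assert 0 < x_star <= 1 and x1_star <= 0

def x_sol(t, alpha):
    """Explicit solution (2.3) started from x(0) = alpha."""
    return x_star + (J / lam) * (alpha - x_star) / (
        (alpha - x1_star) * math.exp(t * J) - (alpha - x_star))

# A central finite difference of the explicit solution matches the vector field f,
# and the solution approaches x_star monotonically from below when alpha < x_star.
alpha, h = 0.1, 1e-6
for t in [0.0, 0.5, 1.0, 3.0]:
    deriv = (x_sol(t + h, alpha) - x_sol(t - h, alpha)) / (2 * h)
    assert abs(deriv - f(x_sol(t, alpha))) < 1e-6
assert x_sol(1.0, alpha) < x_sol(2.0, alpha) < x_star
```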

3. Concentration around the deterministic process

For the rest of this paper, we use $x(t)$ to denote the solution to the governing equation (2.1) with initial state $x(0)=X_{N}(0)/N$. There are two main results in this section. We first show that the scaled process $X_{N}(t)/N$ closely follows the solution $x(t)$ defined in (2.3). To be precise, we prove the following result.

Lemma 3.1.

There exist positive constants $C_{1},C_{2}$ depending only on $\lambda,\mu$ and $\varepsilon$ such that for all $h\in(0,1)$, the following relation holds for all $N$ large enough (depending on $\lambda,\mu,\varepsilon$ and $h$; see (3.11)):

$$\mathbb{P}\left(\sup_{t\leq t_{\rm{follow}}}\left\lvert\frac{X_{N}(t)}{N}-x(t)\right\rvert>C_{2}N^{-\frac{1-h}{2}}\right)\leq 4e^{-C_{1}N^{h}},$$ (3.1)

where

$$t_{\rm{follow}}\equiv t_{\rm{follow}}(N,h):=\frac{1}{J}\lceil e^{C_{1}N^{h}}\rceil.$$ (3.2)

As will be seen in the proof, it is enough to consider $h$ fixed (say equal to $1/2$), so the dependence of $N$ and $t_{\rm{follow}}$ on $h$ is ultimately not important.

Our other result in this section controls the mean of $X_{N}(t)/N$, and is used in the proof of the lower bound on the mixing time in Section 5.

Lemma 3.2.

Let $\mathbb{E}_{x_{0}}$ be the expectation given the starting state $X_{N}(0)=x_{0}$. The following two statements hold:

  1. There exists a positive constant $K_{1}$, depending only on $\lambda$, $\mu$ and $\varepsilon$, such that, for any $c\in\mathbb{R}$ and $N$ large enough (depending on $\lambda,\mu,\varepsilon$, and $c$),

     $$\left\lvert\mathbb{E}_{x_{0}}\left(X_{N}\left(\frac{1}{2J}\log(N)+c\right)\right)-x^{\star}N\right\rvert\leq K_{1}e^{-Jc}\sqrt{N}.$$

  2. Suppose that $x_{0}/N-x^{\star}>\alpha$, for some $\alpha>0$. Then there exists a positive constant $K_{2}$, depending on $\lambda,\mu,\varepsilon$ and $\alpha$, such that, for any $c\in\mathbb{R}$ and $N$ large enough (depending on $\lambda,\mu,\varepsilon,\alpha$, and $c$),

     $$K_{2}e^{-Jc}\sqrt{N}\leq\mathbb{E}_{x_{0}}\left(X_{N}\left(\frac{1}{2J}\log(N)+c\right)\right)-x^{\star}N.$$

3.1. Proof of Lemma 3.1

We study the centred processes $Y_{N}(t):=X_{N}(t)/N-x^{\star}$ and $y(t):=x(t)-x^{\star}$, with $x(0)=X_{N}(0)/N$. The overall strategy of the proof is to give an integral representation of the deterministic process $y(t)$ (Lemma 3.3), which can be compared to a similar representation of $Y_{N}(t)$, but with the addition of a martingale term (Lemma 3.4). From here, the result essentially follows by controlling the martingale term (Lemma 3.6), using a simplified version of [24, Lemma 7] (stated here as Lemma 3.5), and then applying Grönwall’s lemma.

We first give a representation of $y(t)$.

Lemma 3.3.

In the notation just defined, we have

$$y(t)=y(0)-\int_{0}^{t}\lambda y^{2}(s)+Jy(s)\,ds,$$ (3.3)

and

$$y(t)=e^{-Jt}y(0)-\int_{0}^{t}\lambda e^{-J(t-s)}y^{2}(s)\,ds.$$ (3.4)
Proof.

Using the definition of $x^{\star}$, the governing equation (2.1) can be rewritten as

$$\frac{dx}{dt}=-\lambda(x-x^{\star})\left(x-x^{\star}+\frac{J}{\lambda}\right),$$

which is the same as

$$\frac{dy}{dt}=-\lambda y^{2}-Jy.$$ (3.5)

From here, (3.3) follows by integration. Moreover, a simple calculation using (3.5) shows that

$$\frac{d}{ds}e^{-J(t-s)}y(s)=-\lambda e^{-J(t-s)}y^{2}(s),$$

which is the same as (3.4) after integrating. ∎

We compare $Y_{N}(t)=X_{N}(t)/N-x^{\star}$ to $y(t)$ via a representation that is similar to (3.4).

Lemma 3.4.

In the notation just defined, we have

$$Y_{N}(t)=e^{-Jt}Y_{N}(0)-\int_{0}^{t}\lambda e^{-J(t-s)}Y^{2}_{N}(s)\,ds+\int_{0}^{t}e^{-J(t-s)}\,dM_{N}(s),$$ (3.6)

where

$$M_{N}(t):=Y_{N}(t)-Y_{N}(0)+\int_{0}^{t}\lambda Y^{2}_{N}(s)+JY_{N}(s)\,ds,$$ (3.7)

is a zero-mean martingale.

Proof.

To show that the process $M_{N}(t)$ defined in (3.7) is a martingale, note that the family of Markov chains $(X_{N}(t))_{N\geq 1}$ is a density dependent family (see Section 4 in [22]) with transitions of the form

$$x\mapsto x+1\quad\text{at rate}\quad Nf\left(\frac{x}{N},+1\right):=N\cdot\left(\lambda\frac{x}{N}\left(1-\frac{x}{N}\right)+\varepsilon\left(1-\frac{x}{N}\right)\right),$$
$$x\mapsto x-1\quad\text{at rate}\quad Nf\left(\frac{x}{N},-1\right):=N\cdot\mu\frac{x}{N},$$

and the state space $\{0,1,\ldots,N\}$ is compact. It is easy to check that the scaled process $(X_{N}(t)/N)_{t\geq 0}$ satisfies the conditions of [22, Proposition 2.1], and thus

$$M_{N}(t):=\frac{X_{N}(t)}{N}-\frac{X_{N}(0)}{N}-\int_{0}^{t}F\left(\frac{X_{N}(s)}{N}\right)\,ds,$$

is a zero-mean martingale, where $F(x)=f\left(x,+1\right)-f\left(x,-1\right)$. Then formula (3.6) holds, by [7, Lemma 4.1]. ∎

Comparing expressions (3.4) and (3.6), it is clear that to be able to approximate $Y_{N}(t)$ by $y(t)$ we must control the size of $\big\lvert\int_{0}^{t}e^{-J(t-s)}\,dM_{N}(s)\big\rvert$. To do this, we use the following simplified version of [24, Lemma 7], which is stated there for multi-dimensional Markov jump chains.

Lemma 3.5.

[24, Lemma 7] Let $(X(t))_{t\geq 0}$ be a pure jump Markov chain with state space $E\subseteq\mathbb{R}$. For $x,l\in\mathbb{R}$ such that $x,x+l\in E$, let $q(x,x+l)$ denote the rate at which the chain jumps from $x$ to $x+l$, and assume that there is a deterministic $B>0$ such that $q(x,x+l)=0$ for $\lvert l\rvert>B$. Suppose further that, for each $x\in E$, the drift $F(x):=\sum_{l}lq(x,x+l)$ at $x$ can be written in the form

$$F(x)=\widetilde{A}x+\widetilde{F}(x),\quad\text{for some }\widetilde{A}\leq 0.$$

Let $(M(t))_{t\geq 0}$ be the corresponding Dynkin martingale given by

$$M(t):=X(t)-X(0)-\int_{0}^{t}F(X(s))\,ds,$$

and let $\widetilde{M}(t)=\int_{0}^{t}e^{\widetilde{A}(t-s)}\,dM(s)$. For $\delta>0$, define $\widetilde{T}(\delta)=\inf\{t\geq 0:\lvert\widetilde{M}(t)\rvert>\delta\}$ to be the first time $t$ at which $\widetilde{M}(t)$ exceeds $\delta$ in absolute value. Further, given $u\in\mathbb{R}_{+}$, let $\nu(z,u):=\sum_{w}q(z,z+w)(e^{\widetilde{A}u}w)^{2}$, and suppose that for some $K>0$, $\int_{0}^{t}\nu(X(s),t-s)\,ds\leq K$. Then for any $\sigma>0$ and $0<\omega\leq 4(\log 2)^{2}K/B^{2}$, we have

$$\mathbb{P}\left(\widetilde{T}\left(e^{-\widetilde{A}\sigma}\sqrt{\omega K}\right)\leq\sigma\lceil e^{\omega/8}\rceil\right)\leq 4e^{-\omega/8}.$$

Specialising the lemma to our setting, we obtain the following result, which controls the size of the term $\big\lvert\int_{0}^{t}e^{-J(t-s)}\,dM_{N}(s)\big\rvert$.

Lemma 3.6.

Fix $h\in(0,1)$, set $\omega:=4(\log 2)^{2}kN^{h}$, $k:=(\lambda+\mu+\varepsilon)/2J$, and

$$T_{1}:=\inf\left\{t\geq 0:\left\lvert\int_{0}^{t}e^{-J(t-s)}\,dM_{N}(s)\right\rvert>e\sqrt{\frac{\omega k}{N}}\right\}.$$

Then

$$\mathbb{P}\left(T_{1}\leq\frac{1}{J}\left\lceil e^{\omega/8}\right\rceil\right)\leq 4e^{-\omega/8}.$$
Proof.

We apply Lemma 3.5 to our process $Y_{N}(t)$. Clearly, the only possible jumps for $Y_{N}(t)$ are $\pm 1/N$, so we can take $B:=1/N$. The drift $F_{X}$ of $X_{N}$ is given by

$$F_{X}(x)=1\cdot\left(\lambda x\left(1-\frac{x}{N}\right)+\varepsilon(N-x)\right)+(-1)\cdot\mu x=-\lambda N\left(\frac{x}{N}-x^{\star}\right)\left(\frac{x}{N}-x^{\star}+\frac{J}{\lambda}\right).$$

Hence it is easily seen that the drift of $Y_{N}$ is given by $F_{Y}(y)=\widetilde{A}y+\widetilde{F}(y)$, where $\widetilde{A}=-J$ and $\widetilde{F}(y)=-\lambda y^{2}$.

Moreover, there is a uniform bound on $\int_{0}^{t}\nu(Y_{N}(s),t-s)\,ds$, where $\nu(z,u):=\sum_{w}q(z,z+w)(e^{-Ju}w)^{2}$ and $q(y,y+w)$ is the rate of the jump from $y$ to $y+w$, for $y\in\{0,1,\ldots,N\}/N$ and $w=\pm 1/N$:

$$\begin{aligned}
\int_{0}^{t}\nu(Y_{N}(s),t-s)\,ds
&=\int_{0}^{t}\left(\lambda X_{N}(s)\left(1-\frac{X_{N}(s)}{N}\right)+\varepsilon\left(N-X_{N}(s)\right)+\mu X_{N}(s)\right)e^{-2J(t-s)}N^{-2}\,ds\\
&=\frac{1}{N}\int_{0}^{t}\left(\lambda\frac{X_{N}(s)}{N}\left(1-\frac{X_{N}(s)}{N}\right)+\varepsilon\left(1-\frac{X_{N}(s)}{N}\right)+\mu\frac{X_{N}(s)}{N}\right)e^{-2J(t-s)}\,ds\\
&\leq\frac{1}{N}\int_{0}^{t}\left((\lambda+\mu)\frac{X_{N}(s)}{N}+\varepsilon\right)e^{-2J(t-s)}\,ds\\
&\leq\frac{\lambda+\mu+\varepsilon}{N}\int_{0}^{t}e^{-2J(t-s)}\,ds\leq\frac{k}{N}.
\end{aligned}$$

The result then follows directly from Lemma 3.5 by setting σ=1J\sigma=\frac{1}{J} and K=k/NK=k/N. ∎
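As a quick numerical sanity check of the drift factorisation used in the proof above, the following sketch compares the raw form of FX(x)F_{X}(x), written from the jump rates, with the factored form, over all states, and also checks the quadratic-variation bound k/Nk/N. The parameter values are illustrative only and are not taken from the paper.

```python
import math

# Illustrative parameters (not from the paper): lambda, mu, epsilon.
lam, mu, eps = 2.0, 1.0, 0.5
J = math.sqrt((lam - mu - eps) ** 2 + 4 * lam * eps)
x_star = (lam - mu - eps + J) / (2 * lam)
N = 1000

def drift_raw(x):
    """F_X(x) written directly from the jump rates of X_N."""
    return lam * x * (1 - x / N) + eps * (N - x) - mu * x

def drift_factored(x):
    """The factored form -lam*N*(x/N - x_star)*(x/N - x_star + J/lam)."""
    y = x / N
    return -lam * N * (y - x_star) * (y - x_star + J / lam)

# The two expressions agree at every state x in {0, ..., N}.
max_gap = max(abs(drift_raw(x) - drift_factored(x)) for x in range(N + 1))
assert max_gap < 1e-8

# Quadratic-variation bound: (lam+mu+eps)/N * int_0^t e^{-2J(t-s)} ds <= k/N.
k = (lam + mu + eps) / (2 * J)
t = 5.0
integral = (lam + mu + eps) / N * (1 - math.exp(-2 * J * t)) / (2 * J)
assert integral <= k / N
```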

The last missing ingredient for the proof of Lemma 3.1 is a quantitative upper bound on the speed of convergence of the solution to the governing equation (2.1) with initial condition x(0)=XN(0)/Nx(0)=X_{N}(0)/N to the fixed point xx^{\star}. This will then be used to bound |YN(t)y(t)|=|XN(t)/Nx(t)|\lvert Y_{N}(t)-y(t)\rvert=\lvert X_{N}(t)/N-x(t)\rvert. The following lemma, which has an elementary calculus proof, gives such a bound.

Lemma 3.7.

Using the notation defined above, the following inequality holds,

|y(t)|2JJ(λμε)|y(0)|etJ,t0.\lvert y(t)\rvert\leq\frac{2J}{J-(\lambda-\mu-\varepsilon)}\lvert y(0)\rvert e^{-tJ},\quad t\geq 0. (3.8)
Proof.

Recall that x=λμε+J2λ0,x1=λμεJ2λ0x^{\star}=\frac{\lambda-\mu-\varepsilon+J}{2\lambda}\geq 0,x_{1}^{\star}=\frac{\lambda-\mu-\varepsilon-J}{2\lambda}\leq 0, where J=(λμε)2+4λεJ=\sqrt{(\lambda-\mu-\varepsilon)^{2}+4\lambda\varepsilon}. We have xy(0)1x-x^{\star}\leq y(0)\leq 1-x^{\star}. Also, from (2.3) and the fact J/λxJ/\lambda\geq x^{\star}, we obtain

|y(t)|\displaystyle\lvert y(t)\rvert =|J2λ2y(0)(xx1+y(0))etJy(0)|=Jλ|y(0)|(Jλ+y(0))etJy(0)\displaystyle=\left\lvert\frac{J}{2\lambda}\frac{2y(0)}{(x^{\star}-x_{1}^{\star}+y(0))e^{tJ}-y(0)}\right\rvert=\frac{\frac{J}{\lambda}\lvert y(0)\rvert}{\left(\frac{J}{\lambda}+y(0)\right)e^{tJ}-y(0)}
=Jλ|y(0)|Jλ+(1etJ)y(0)etJ.\displaystyle=\frac{\frac{J}{\lambda}\lvert y(0)\rvert}{\frac{J}{\lambda}+(1-e^{-tJ})y(0)}e^{-tJ}.

It then follows by easy manipulation that

|y(t)|Jλ|y(0)|Jλ(1etJ)xetJJλ|y(0)|Jλλμε+J2λetJ=2JJ(λμε)|y(0)|etJ.\lvert y(t)\rvert\leq\frac{\frac{J}{\lambda}\lvert y(0)\rvert}{\frac{J}{\lambda}-(1-e^{-tJ})x^{\star}}e^{-tJ}\leq\frac{\frac{J}{\lambda}\lvert y(0)\rvert}{\frac{J}{\lambda}-\frac{\lambda-\mu-\varepsilon+J}{2\lambda}}e^{-tJ}=\frac{2J}{J-(\lambda-\mu-\varepsilon)}\lvert y(0)\rvert e^{-tJ}.\qed
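The bound (3.8) can also be checked numerically from the explicit solution appearing in the proof. The sketch below does this over a grid of admissible initial values y(0)(x,1x]y(0)\in(-x^{\star},1-x^{\star}] and times; the parameter values are illustrative only, not from the paper.

```python
import math

# Illustrative parameters (not from the paper).
lam, mu, eps = 2.0, 1.0, 0.5
J = math.sqrt((lam - mu - eps) ** 2 + 4 * lam * eps)
x_star = (lam - mu - eps + J) / (2 * lam)

def y_exact(y0, t):
    """Explicit solution from the proof: (J/lam)*y0*e^{-tJ} / (J/lam + (1-e^{-tJ})*y0)."""
    return (J / lam) * y0 * math.exp(-t * J) / (J / lam + (1 - math.exp(-t * J)) * y0)

C = 2 * J / (J - (lam - mu - eps))  # the constant in (3.8)

# Check |y(t)| <= C*|y(0)|*e^{-tJ} on a grid of admissible y(0) and t.
ok = True
for i in range(1, 100):
    y0 = -x_star + 0.01 * i  # spans the admissible range (-x_star, 1-x_star)
    for j in range(51):
        t = 0.1 * j
        if abs(y_exact(y0, t)) > C * abs(y0) * math.exp(-t * J) + 1e-12:
            ok = False
assert ok
```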

We can now prove Lemma 3.1. As previously discussed, we use representations (3.4) and (3.6) to bound the difference between XN(t)/NX_{N}(t)/N and x(t)x(t), and use Lemma 3.6 to control the martingale term in (3.6).

Proof of Lemma 3.1.

Using (3.4) and (3.6), we have

e~N(t)\displaystyle\widetilde{e}_{N}(t) :=|YN(t)y(t)|\displaystyle:=\lvert Y_{N}(t)-y(t)\rvert
λ0teJ(ts)|YN(s)2y(s)2|𝑑s+|0teJ(ts)𝑑MN(s)|\displaystyle\leq\lambda\int_{0}^{t}e^{-J(t-s)}\left\lvert Y_{N}(s)^{2}-y(s)^{2}\right\rvert\,ds+\left\lvert\int_{0}^{t}e^{-J(t-s)}\,dM_{N}(s)\right\rvert
λ0teJ(ts)e~N(s)(e~N(s)+2|y(s)|)𝑑s+|0teJ(ts)𝑑MN(s)|.\displaystyle\leq\lambda\int_{0}^{t}e^{-J(t-s)}\widetilde{e}_{N}(s)(\widetilde{e}_{N}(s)+2\lvert y(s)\rvert)\,ds+\left\lvert\int_{0}^{t}e^{-J(t-s)}\,dM_{N}(s)\right\rvert.

Using (3.8) to bound |y(s)|\lvert y(s)\rvert, we obtain

e~N(t)4λJJ(λμε)|y(0)|eJt0te~N(s)𝑑s+λ0teJ(ts)e~N(s)2ds+|0teJ(ts)𝑑MN(s)|.\displaystyle\begin{split}\widetilde{e}_{N}(t)\leq\frac{4\lambda J}{J-(\lambda-\mu-\varepsilon)}\lvert y(0)\rvert e^{-Jt}\int_{0}^{t}\widetilde{e}_{N}(s)\,ds+\lambda\int_{0}^{t}&e^{-J(t-s)}\widetilde{e}_{N}(s)^{2}\,ds\\ &+\left\lvert\int_{0}^{t}e^{-J(t-s)}\,dM_{N}(s)\right\rvert.\end{split} (3.9)

To bound the first two terms on the right-hand side of (3.9), we define

T2:=inf{t:e~N(t)>3C~ωkN},T_{2}:=\inf\left\{t:\widetilde{e}_{N}(t)>3\widetilde{C}\sqrt{\frac{\omega k}{N}}\right\},

where C~:=4e4λJ(λμε)(x(1x))\widetilde{C}:=4e^{\frac{4\lambda}{J-(\lambda-\mu-\varepsilon)}(x^{\star}\vee(1-x^{\star}))} (the exact form is not so important). Recall the definition of T1T_{1}, ω\omega and kk from Lemma 3.6. We will show that on the event tT1T2t\leq T_{1}\wedge T_{2}, one has

e~N(t)C~ωkN.\widetilde{e}_{N}(t)\leq\widetilde{C}\sqrt{\frac{\omega k}{N}}. (3.10)

For now, suppose this is true. Define

T3:=inf{t:e~N(t)>2C~ωkN}.T_{3}:=\inf\left\{t:\widetilde{e}_{N}(t)>2\widetilde{C}\sqrt{\frac{\omega k}{N}}\right\}.

By definition, (T3T2)=1\mathbb{P}(T_{3}\leq T_{2})=1. Moreover, since we have assumed that (3.10) holds, we also have (T3T1T2)=0\mathbb{P}(T_{3}\leq T_{1}\wedge T_{2})=0, which implies that 0=(T3T1T2)=(T3T1)0=\mathbb{P}(T_{3}\leq T_{1}\wedge T_{2})=\mathbb{P}(T_{3}\leq T_{1}). Applying Lemma 3.6, we obtain

(T31Jeω/8)(T11Jeω/8)4eω/8,\mathbb{P}\left(T_{3}\leq\mbox{$\frac{1}{J}$}\lceil e^{\omega/8}\rceil\right)\leq\mathbb{P}\left(T_{1}\leq\mbox{$\frac{1}{J}$}\lceil e^{\omega/8}\rceil\right)\leq 4e^{-\omega/8},

which completes the proof. The only thing left to check is (3.10). On the event tT1t\leq T_{1}, it follows from (3.9) that

e~N(t)4λJJ(λμε)|y(0)|eJt0te~N(s)𝑑s+λ0teJ(ts)e~N(s)2𝑑s+eωkN.\widetilde{e}_{N}(t)\leq\frac{4\lambda J}{J-(\lambda-\mu-\varepsilon)}\lvert y(0)\rvert e^{-Jt}\int_{0}^{t}\widetilde{e}_{N}(s)\,ds+\lambda\int_{0}^{t}e^{-J(t-s)}\widetilde{e}_{N}(s)^{2}\,ds+e\sqrt{\frac{\omega k}{N}}.

To further bound e~N(t)\widetilde{e}_{N}(t), a standard method is to use Grönwall’s lemma (see for example Theorem 1.3.1 in [3]). However, to apply the inequality we first need to control the e~N2\widetilde{e}_{N}^{2} term. On the event tT1T2t\leq T_{1}\wedge T_{2}, one has

e~N(t)\displaystyle\widetilde{e}_{N}(t) 4λJJ(λμε)|y(0)|etJ0te~N(s)𝑑s+9λC~2ωkN0teJ(ts)𝑑s+eωkN\displaystyle\leq\frac{4\lambda J}{J-(\lambda-\mu-\varepsilon)}\lvert y(0)\rvert e^{-tJ}\int_{0}^{t}\widetilde{e}_{N}(s)\,ds+9\lambda\widetilde{C}^{2}\frac{\omega k}{N}\int_{0}^{t}e^{-J(t-s)}\,ds+e\sqrt{\frac{\omega k}{N}}
4λJJ(λμε)|y(0)|etJ0te~N(s)𝑑s+9λC~2JωkN+eωkN.\displaystyle\leq\frac{4\lambda J}{J-(\lambda-\mu-\varepsilon)}\lvert y(0)\rvert e^{-tJ}\int_{0}^{t}\widetilde{e}_{N}(s)\,ds+\frac{9\lambda\widetilde{C}^{2}}{J}\frac{\omega k}{N}+e\sqrt{\frac{\omega k}{N}}.

Choose N>(18kλC~2J1log2)21hN>(18k\lambda\widetilde{C}^{2}J^{-1}\log{2})^{\frac{2}{1-h}}, so that

9λC~2JωkN=18kλC~2log2JN1h2ωkN<ωkN,\frac{9\lambda\widetilde{C}^{2}}{J}\frac{\omega k}{N}=\frac{18k\lambda\widetilde{C}^{2}\log{2}}{J}N^{-\frac{1-h}{2}}\sqrt{\frac{\omega k}{N}}<\sqrt{\frac{\omega k}{N}}, (3.11)

and hence

e~N(t)4λJJ(λμε)|y(0)|eJt0te~N(s)𝑑s+4ωkN.\widetilde{e}_{N}(t)\leq\frac{4\lambda J}{J-(\lambda-\mu-\varepsilon)}\lvert y(0)\rvert e^{-Jt}\int_{0}^{t}\widetilde{e}_{N}(s)\,ds+4\sqrt{\frac{\omega k}{N}}.

Let e^N(t)=etJe~N(t)\widehat{e}_{N}(t)=e^{tJ}\widetilde{e}_{N}(t), then on the event tT1T2t\leq T_{1}\wedge T_{2}, we have

e^N(t)4λJJ(λμε)|y(0)|0tesJe^N(s)𝑑s+4etJωkN,\widehat{e}_{N}(t)\leq\frac{4\lambda J}{J-(\lambda-\mu-\varepsilon)}\lvert y(0)\rvert\int_{0}^{t}e^{-sJ}\widehat{e}_{N}(s)\,ds+4e^{tJ}\sqrt{\frac{\omega k}{N}},

for NN large enough. Applying Grönwall’s lemma to e^N(t)\widehat{e}_{N}(t), we obtain for tT1T2t\leq T_{1}\wedge T_{2} that

e^N(t)4e4λJJ(λμε)|y(0)|1etJJωkNetJ\displaystyle\widehat{e}_{N}(t)\leq 4e^{\frac{4\lambda J}{J-(\lambda-\mu-\varepsilon)}\lvert y(0)\rvert\frac{1-e^{-tJ}}{J}}\sqrt{\frac{\omega k}{N}}e^{tJ} 4e4λJ(λμε)|y(0)|ωkNetJ.\displaystyle\leq 4e^{\frac{4\lambda}{J-(\lambda-\mu-\varepsilon)}\lvert y(0)\rvert}\sqrt{\frac{\omega k}{N}}e^{tJ}.

Dividing both sides of the inequality above by etJe^{tJ} then establishes (3.10), since |y(0)|x(1x)\lvert y(0)\rvert\leq x^{\star}\vee(1-x^{\star}). ∎

3.2. Proof of Lemma 3.2

To explain the proof of the other main result of this section, write y0=x0/Nxy_{0}=x_{0}/N-x^{\star}, which is the corresponding starting state of the chain YN(t)Y_{N}(t). Recall the definition of MN(t)M_{N}(t) in (3.7). Since 𝔼(MN(t))=0\mathbb{E}(M_{N}(t))=0 for all t0t\geq 0, we have

𝔼x0(YN(t))\displaystyle\mathbb{E}_{x_{0}}(Y_{N}(t)) =y0𝔼x0(0tJYN(s)𝑑s)𝔼x0(0tλYN2(s)𝑑s)\displaystyle=y_{0}-\mathbb{E}_{x_{0}}\left(\int_{0}^{t}JY_{N}(s)\,ds\right)-\mathbb{E}_{x_{0}}\left(\int_{0}^{t}\lambda Y^{2}_{N}(s)\,ds\right)
=y00tJ𝔼x0(YN(s))𝑑s0tλ𝔼x0(YN2(s))𝑑s,\displaystyle=y_{0}-\int_{0}^{t}J\mathbb{E}_{x_{0}}\left(Y_{N}(s)\right)\,ds-\int_{0}^{t}\lambda\mathbb{E}_{x_{0}}\left(Y^{2}_{N}(s)\right)\,ds, (3.12)

where we have used Fubini in the last equality to change the order of expectation and the integral. If we could replace 𝔼x0(YN2(s))\mathbb{E}_{x_{0}}\left(Y^{2}_{N}(s)\right) by (𝔼x0YN(s))2\left(\mathbb{E}_{x_{0}}Y_{N}(s)\right)^{2}, then 𝔼x0(YN(s))\mathbb{E}_{x_{0}}\left(Y_{N}(s)\right) would exactly solve the integral equation (3.4), and the claim of the lemma would be trivial. Thus we prove the result by showing the difference between 𝔼x0(YN2(t))\mathbb{E}_{x_{0}}\left(Y^{2}_{N}(t)\right) and (𝔼x0YN(t))2\left(\mathbb{E}_{x_{0}}Y_{N}(t)\right)^{2} is small for all ttfollowt\leq t_{\rm{follow}}, and so the difference between 0tλ𝔼x0(YN2(s))𝑑s\int_{0}^{t}\lambda\mathbb{E}_{x_{0}}\left(Y^{2}_{N}(s)\right)ds and 0tλ(𝔼x0YN(s))2𝑑s\int_{0}^{t}\lambda\left(\mathbb{E}_{x_{0}}Y_{N}(s)\right)^{2}ds is also small for not too large tt.

Proof of Lemma 3.2.

Let y(t)y(t) solve (3.5) and At:={z:|zy(t)|C2N1h2}A_{t}:=\{z:\lvert z-y(t)\rvert\leq C_{2}N^{-\frac{1-h}{2}}\} for fixed h<1/2h<1/2 and C2C_{2} from Lemma 3.1. Applying that lemma with this hh, we conclude that x0(YN(t)Atc)\mathbb{P}_{x_{0}}(Y_{N}(t)\in A_{t}^{c}) is exponentially small in NN for ttfollowt\leq t_{\rm{follow}}. Together with the fact that YN(t)Y_{N}(t) is bounded for all t0t\geq 0, we obtain

𝔼x0(YN(t))=𝔼x0(YN(t)𝟙YN(t)At)+𝔼x0(YN(t)𝟙YN(t)Atc)=y(t)+O(N1h2),\mathbb{E}_{x_{0}}(Y_{N}(t))=\mathbb{E}_{x_{0}}(Y_{N}(t)\mathbbm{1}_{Y_{N}(t)\in A_{t}})+\mathbb{E}_{x_{0}}(Y_{N}(t)\mathbbm{1}_{Y_{N}(t)\in A_{t}^{c}})=y(t)+O(N^{-\frac{1-h}{2}}),

and using this and a similar argument implies

Varx0(YN(t))\displaystyle\mathrm{Var}_{x_{0}}\left(Y_{N}(t)\right) =𝔼x0((YN(t)𝔼x0(YN(t)))2𝟙YN(t)At)\displaystyle=\mathbb{E}_{x_{0}}\left(\left(Y_{N}(t)-\mathbb{E}_{x_{0}}(Y_{N}(t))\right)^{2}\mathbbm{1}_{Y_{N}(t)\in A_{t}}\right)
+𝔼x0((YN(t)𝔼x0(YN(t)))2𝟙YN(t)Atc)=O(N(1h)),\displaystyle\qquad\qquad+\mathbb{E}_{x_{0}}\left(\left(Y_{N}(t)-\mathbb{E}_{x_{0}}(Y_{N}(t))\right)^{2}\mathbbm{1}_{Y_{N}(t)\in A_{t}^{c}}\right)=O(N^{-(1-h)}), (3.13)

for ttfollowt\leq t_{\rm{follow}}. Rewriting (3.12) to take advantage of these bounds, we have

𝔼x0(YN(t))=y00tJ𝔼x0(YN(s))𝑑s0tλ(𝔼x0YN(s))2𝑑s0tλVarx0(YN(s))𝑑s.\mathbb{E}_{x_{0}}(Y_{N}(t))=y_{0}-\int_{0}^{t}J\mathbb{E}_{x_{0}}\left(Y_{N}(s)\right)\,ds-\int_{0}^{t}\lambda\left(\mathbb{E}_{x_{0}}Y_{N}(s)\right)^{2}\,ds-\int_{0}^{t}\lambda\,\mathrm{Var}_{x_{0}}\left(Y_{N}(s)\right)\,ds. (3.14)

With this equation and the bound (3.13), we can use standard differential equation comparison theorems to bound 𝔼x0(YN(t))\mathbb{E}_{x_{0}}(Y_{N}(t)). Let u(t):=𝔼x0(YN(t))u(t):=\mathbb{E}_{x_{0}}(Y_{N}(t)) and differentiate the integral equation (3.14) to get the initial value problem

ddtu(t)=λu2(t)Ju(t)λVarx0(YN(t)),u(0)=y0,ttfollow.\frac{d}{dt}u(t)=-\lambda u^{2}(t)-Ju(t)-\lambda\,\mathrm{Var}_{x_{0}}\left(Y_{N}(t)\right),\quad u(0)=y_{0},\quad t\leq t_{\rm{follow}}. (3.15)

It follows from (3.13) that there exists a suitable constant C>0C^{*}>0 such that λVarx0(YN(s))CN(1h)\lambda\,\mathrm{Var}_{x_{0}}\left(Y_{N}(s)\right)\leq C^{*}N^{-(1-h)} and so we obtain

ddtu(t)λu2(t)Ju(t)CN(1h),u(0)=y0,ttfollow.\frac{d}{dt}u(t)\geq-\lambda u^{2}(t)-Ju(t)-C^{*}N^{-(1-h)},\quad u(0)=y_{0},\quad t\leq t_{\rm{follow}}.

From standard comparison theory of differential equations (see for example Theorem 2.2.2 in [3]), we have u(t)z(t)u(t)\geq z(t) for all t0t\geq 0, where z(t)z(t) is the minimal solution to the initial value problem

ddtz(t)=λz2(t)Jz(t)CN(1h),z(0)=y0,ttfollow.\frac{\,d}{dt}z(t)=-\lambda z^{2}(t)-Jz(t)-C^{*}N^{-(1-h)},\quad z(0)=y_{0},\quad t\leq t_{\rm{follow}}.

This differential equation is of the same form as the governing equation (2.1), so it is not hard to see that for sufficiently large NN such that J24λCN(1h)>0J^{2}-4\lambda C^{*}N^{-(1-h)}>0, the solution is uniquely defined for all t0t\geq 0 and thus

u(t)z(t)=c2+c1λ(y0c2)(y0c3)(y0c2)ec1tec1t,u(t)\geq z(t)=c_{2}+\frac{\frac{c_{1}}{\lambda}(y_{0}-c_{2})}{(y_{0}-c_{3})-(y_{0}-c_{2})e^{-c_{1}t}}e^{-c_{1}t},

where

c1:=J24λCN(1h)(0,J),c2:=J+c12λ<0,c3:=Jc12λ<0.c_{1}:=\sqrt{J^{2}-4\lambda C^{*}N^{-(1-h)}}\in(0,J),\quad c_{2}:=\frac{-J+c_{1}}{2\lambda}<0,\quad c_{3}:=\frac{-J-c_{1}}{2\lambda}<0.

For any h<1/2h<1/2, we can write c1=Jδc_{1}=J-\delta, c2=δ2λc_{2}=-\frac{\delta}{2\lambda}, and c3=Jλ+δ2λc_{3}=-\frac{J}{\lambda}+\frac{\delta}{2\lambda}, where δ=o(N12)>0\delta=o(N^{-\frac{1}{2}})>0. Using this notation, the bound above then becomes

u(t)z(t)=δ2λ+Jδλ(y0+δ2λ)(y0+Jλδ2λ)eδt(y0+δ2λ)eJteJt.u(t)\geq z(t)=-\frac{\delta}{2\lambda}+\frac{\frac{J-\delta}{\lambda}(y_{0}+\frac{\delta}{2\lambda})}{(y_{0}+\frac{J}{\lambda}-\frac{\delta}{2\lambda})e^{-\delta t}-(y_{0}+\frac{\delta}{2\lambda})e^{-Jt}}e^{-Jt}. (3.16)

Note that if z(0)=y0>δ2λz(0)=y_{0}>-\frac{\delta}{2\lambda} then, as tt increases, z(t)z(t) monotonically approaches δ2λ-\frac{\delta}{2\lambda} from above, whereas z(t)z(t) monotonically approaches δ2λ-\frac{\delta}{2\lambda} from below if z(0)=y0(2Jδ2λ,δ2λ)z(0)=y_{0}\in(-\frac{2J-\delta}{2\lambda},-\frac{\delta}{2\lambda}). (Note that y02Jδ2λy_{0}\leq-\frac{2J-\delta}{2\lambda} is not biologically viable.) Moreover, for any fixed tt, z(t)z(t) is monotonically increasing as y0y_{0} increases, which can be verified by differentiating (3.16) with respect to y0y_{0}.

On the other hand, since Varx0(YN(t))>0\mathrm{Var}_{x_{0}}\left(Y_{N}(t)\right)>0, it follows from (3.15) that

ddtu(t)λu2(t)Ju(t),u(0)=y0,t0.\frac{\,d}{dt}u(t)\leq-\lambda u^{2}(t)-Ju(t),\quad u(0)=y_{0},\quad t\geq 0.

Again, by the comparison theorem and using (3.5), we have

u(t)y(t)=Jλy0(y0+Jλ)y0eJteJt.u(t)\leq y(t)=\frac{\frac{J}{\lambda}y_{0}}{(y_{0}+\frac{J}{\lambda})-y_{0}e^{-Jt}}e^{-Jt}. (3.17)

Similar to the behaviour of z(t)z(t), if y(0)=y0>0y(0)=y_{0}>0 then y(t)y(t) monotonically approaches 0 from above, whereas y(t)y(t) monotonically approaches 0 from below if y(0)(Jλ,0)y(0)\in(-\frac{J}{\lambda},0). (Note that y0Jλy_{0}\leq-\frac{J}{\lambda} is not biologically viable.) Also, by differentiating (3.17) with respect to y0y_{0}, one can see that for any fixed tt, y(t)y(t) is monotonically increasing as y0y_{0} increases.
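The monotonicity claims for y(t)y(t) can be confirmed numerically from (3.17). The sketch below evaluates the closed form on a grid and checks both that y(t)y(t) is increasing in the initial value y0y_{0} for each fixed tt, and that |y(t)|\lvert y(t)\rvert decreases monotonically towards 0; parameter values are illustrative only, not from the paper.

```python
import math

# Illustrative parameters (not from the paper).
lam, mu, eps = 2.0, 1.0, 0.5
J = math.sqrt((lam - mu - eps) ** 2 + 4 * lam * eps)

def y_exact(y0, t):
    """Solution (3.17) of dy/dt = -lam*y^2 - J*y with y(0) = y0."""
    e = math.exp(-J * t)
    return (J / lam) * y0 * e / ((y0 + J / lam) - y0 * e)

ts = [0.0, 0.1, 0.5, 1.0, 2.0, 5.0]
y0s = [-0.3 + 0.01 * i for i in range(70)]  # a grid inside (-J/lam, 1)

for t in ts:
    vals = [y_exact(y0, t) for y0 in y0s]
    # For each fixed t, y(t) is strictly increasing in the initial value y0.
    assert all(a < b for a, b in zip(vals, vals[1:]))

# Monotone approach to 0: |y(t)| is non-increasing in t for fixed y0.
for y0 in (-0.3, 0.25):
    vals = [abs(y_exact(y0, t)) for t in ts]
    assert all(a >= b for a, b in zip(vals, vals[1:]))
```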

For any cc\in\mathbb{R}, we take t=12Jlog(N)+ct=\frac{1}{2J}\log(N)+c, which is then non-negative for NN sufficiently large. If in the right-hand side of the relation (3.16), one has y0=x0/Nx>α>0y_{0}=x_{0}/N-x^{\star}>\alpha>0, then, since eδt1e^{-\delta t}\to 1 and eJt0e^{-Jt}\to 0 as NN\to\infty, we obtain

u(t)z(t)(o(1)+(αJλα+Jλ+o(1)))eJcN.\displaystyle u(t)\geq z(t)\geq\left(o(1)+\left(\frac{\alpha\frac{J}{\lambda}}{\alpha+\frac{J}{\lambda}}+o(1)\right)\right)\frac{e^{-Jc}}{\sqrt{N}}.

Therefore, there exists a constant K2K_{2} depending on λ,μ,ε\lambda,\mu,\varepsilon and α\alpha such that for NN large enough,

u(t)z(t)eJcK2N,u(t)\geq z(t)\geq\frac{e^{-Jc}K_{2}}{\sqrt{N}}, (3.18)

which finishes the proof of the second statement in Lemma 3.2. For the first statement, in view of (3.16) and (3.17), and the monotonicity of y(t),z(t)y(t),z(t) with respect to the initial value y0y_{0}, one has, for any 0<y01x0<y_{0}\leq 1-x^{\star},

δ2λ<z(t),0<y(t)y¯(t),-\frac{\delta}{2\lambda}<z(t),\qquad 0<y(t)\leq\bar{y}(t), (3.19)

where

y¯(t):=Jλ(1x)(1x+Jλ)(1x)eJteJt.\bar{y}(t):=\frac{\frac{J}{\lambda}(1-x^{\star})}{(1-x^{\star}+\frac{J}{\lambda})-(1-x^{\star})e^{-Jt}}e^{-Jt}.

Similarly, for any xy0<δ2λ-x^{\star}\leq y_{0}<-\frac{\delta}{2\lambda},

z¯(t)z(t)y(t)<0,\underline{z}(t)\leq z(t)\leq y(t)<0, (3.20)

where

z¯(t):=δ2λ+Jδλ(x+δ2λ)(x+Jλδ2λ)eδt(x+δ2λ)eJteJt.\underline{z}(t):=-\frac{\delta}{2\lambda}+\frac{\frac{J-\delta}{\lambda}(-x^{\star}+\frac{\delta}{2\lambda})}{(-x^{\star}+\frac{J}{\lambda}-\frac{\delta}{2\lambda})e^{-\delta t}-(-x^{\star}+\frac{\delta}{2\lambda})e^{-Jt}}e^{-Jt}.

Also, observe that when y0[δ2λ,0]y_{0}\in[-\frac{\delta}{2\lambda},0], we have z(t),y(t)[δ2λ,0]z(t),y(t)\in[-\frac{\delta}{2\lambda},0] for all t0t\geq 0. Since |u(t)||z(t)||y(t)|\lvert u(t)\rvert\leq\lvert z(t)\rvert\vee\lvert y(t)\rvert, it follows from (3.19) and (3.20) and the previous observation that

|u(t)|{z¯(t),y0[x,δ2λ),δ2λ,y0[δ2λ,0],δ2λy¯(t),y0(0,1x].\lvert u(t)\rvert\leq\begin{cases}-\underline{z}(t),&y_{0}\in[-x^{\star},-\frac{\delta}{2\lambda}),\\ \frac{\delta}{2\lambda},&y_{0}\in[-\frac{\delta}{2\lambda},0],\\ \frac{\delta}{2\lambda}\vee\bar{y}(t),&y_{0}\in(0,1-x^{\star}].\end{cases}

Now, fix any h<1/2h<1/2, so that δ=o(N12)>0\delta=o(N^{-\frac{1}{2}})>0. The proof is completed by noticing that for some positive constants K1,K1K_{1}^{*},K_{1}^{\prime} depending only on λ,μ\lambda,\mu and ε,\varepsilon, one has

y¯(t)eJcK1N,z¯(t)eJcK1N,\bar{y}(t)\leq\frac{e^{-Jc}K_{1}^{*}}{\sqrt{N}},\quad-\underline{z}(t)\leq\frac{e^{-Jc}K_{1}^{\prime}}{\sqrt{N}},

for NN large enough. ∎
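As a sanity check on the closed form used for z(t)z(t) in the proof, the following sketch integrates the initial value problem with a standard RK4 scheme and compares against the formula. The parameters are illustrative only, and qq below stands in for CN(1h)C^{*}N^{-(1-h)}.

```python
import math

# Illustrative parameters (not from the paper); q plays the role of C* N^{-(1-h)}.
lam, mu, eps, q = 2.0, 1.0, 0.5, 1e-3
J = math.sqrt((lam - mu - eps) ** 2 + 4 * lam * eps)
c1 = math.sqrt(J ** 2 - 4 * lam * q)
c2 = (-J + c1) / (2 * lam)
c3 = (-J - c1) / (2 * lam)

def z_closed(y0, t):
    """Closed-form solution of dz/dt = -lam*z^2 - J*z - q with z(0) = y0."""
    e = math.exp(-c1 * t)
    return c2 + (c1 / lam) * (y0 - c2) * e / ((y0 - c3) - (y0 - c2) * e)

def rhs(z):
    return -lam * z ** 2 - J * z - q

# RK4 integration of the ODE, compared with the closed form at each step.
y0, dt, T = 0.3, 1e-3, 3.0
z, t, max_err = y0, 0.0, 0.0
while t < T - 1e-12:
    k1 = rhs(z)
    k2 = rhs(z + 0.5 * dt * k1)
    k3 = rhs(z + 0.5 * dt * k2)
    k4 = rhs(z + dt * k3)
    z += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    t += dt
    max_err = max(max_err, abs(z - z_closed(y0, t)))
assert max_err < 1e-8
```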

4. Proof of the upper bound for the mixing time

In this section, we prove the upper bound in Theorem 1.2, which in particular shows that, for a population of size NN, the mixing time of the Markov chain XNX_{N} is asymptotically at most 12JlogN\frac{1}{2J}\log N.

Recall ρN(t)=maxx{0,1,,N}PNt(x,)πNTV\rho_{N}(t)=\max_{x\in\{0,1,\ldots,N\}}\lVert P^{t}_{N}(x,\cdot)-\pi_{N}\rVert_{TV} and tN=12Jlog(N)t_{N}=\frac{1}{2J}\log(N) from Definition 1.1 and Theorem 1.2.

Lemma 4.1 (Cutoff upper bound, rapid mixing).

For all ξ0\xi\geq 0, we have

lim supNρN(tN+ξ)ψ1(ξ),\limsup_{N\to\infty}\rho_{N}\left(t_{N}+\xi\right)\leq\psi_{1}(\xi),

where ψ1()\psi_{1}(\cdot) is a non-negative real valued function depending only on λ,μ\lambda,\mu and ε\varepsilon and ψ1(ξ)=O(1/ξ)\psi_{1}(\xi)=O(1/\sqrt{\xi}) as ξ\xi\to\infty.

Let ZN(t)Z_{N}(t) and WN(t)W_{N}(t) be two copies of the process XN(t)X_{N}(t), with initial states ZN(0)WN(0)Z_{N}(0)\geq W_{N}(0), moving independently until the coalescence time, which is defined as

τcouple:=inf{t:WN(t)=ZN(t)}.\tau_{\text{couple}}:=\inf\{t:W_{N}(t)=Z_{N}(t)\}.

After the coalescence time, the two copies coincide and move together. Note that this defines a Markovian coupling.

By standard results (see Corollary 5.5 in [23]), to prove Lemma 4.1, it is sufficient to show the following. Write w0,z0\mathbb{P}_{w_{0},z_{0}} for the probability measure of the chain given WN(0)=w0W_{N}(0)=w_{0} and ZN(0)=z0Z_{N}(0)=z_{0}.

Lemma 4.2.

For all w0,z0{0,,N}w_{0},z_{0}\in\{0,\ldots,N\} with w0z0w_{0}\leq z_{0}, and all ξ0\xi\geq 0, we have

lim supNw0,z0(τcouple>12Jlog(N)+ξ)ψ1(ξ),\limsup_{N\to\infty}\mathbb{P}_{w_{0},z_{0}}\left(\tau_{\rm{couple}}>\mbox{$\frac{1}{2J}$}\log(N)+\xi\right)\leq\psi_{1}(\xi),

where ψ1()\psi_{1}(\cdot) is a non-negative real valued function depending only on λ,μ\lambda,\mu and ε\varepsilon and ψ1(ξ)=O(1/ξ)\psi_{1}(\xi)=O(1/\sqrt{\xi}) as ξ\xi\to\infty.

Before we dive into the details, we give an outline of the proof. As mentioned in the introduction, we will break up the analysis into the following phases.

  • Initial phase: We use the concentration result from Section 3 to show that with high probability, by the time 1h2Jlog(N)\frac{1-h}{2J}\log(N), both copies WN(t),ZN(t)W_{N}(t),Z_{N}(t) will be in the interior of a good set consisting of a ball of size O(N1+h2)O(N^{\frac{1+h}{2}}) around xNx^{\star}N. We also prove that if the copies start anywhere in the interior of the good set, then with high probability they remain in the good set for a time period that is exponential in NN.

  • Intermediate phase: After both copies reach the interior of the good set, the independent coupling of the chains is contractive while the copies stay in the good set. As a consequence, after another h2Jlog(N)\frac{h}{2J}\log(N) time (plus a constant time), the distance between them will drop to N\sqrt{N} with high probability.

  • Final phase: Once the copies are within N\sqrt{N} of each other and given that they are still in the good set, we show the coalescence time of the copies is no greater than the time it takes for an unbiased random walk (with transition rate of order NN, so on average, there are NN events happening in a unit time period) to hit 0, which is at most a time of constant order with high probability.

Throughout the rest of this section, we use x0()\mathbb{P}_{x_{0}}(\cdot) to denote the underlying probability measure of the process XN(t)X_{N}(t) with starting state XN(0)=x0X_{N}(0)=x_{0}.

4.1. Initial phase

We first show that with high probability uniformly over all starting states, by the time 1h2Jlog(N)\frac{1-h}{2J}\log(N), the process XN(t)X_{N}(t) will be within O(N1+h2)O(N^{\frac{1+h}{2}}) of the fixed point xNx^{\star}N. This will be done by a simple application of Lemma 3.1, on account of Proposition 2.1.

To state the result, we first need to introduce some notation, which will be used frequently later. Consider intervals of the form

I(r):=[xr,x+r],r>0.I(r):=\left[x^{\star}-r,x^{\star}+r\right],\quad r>0. (4.1)

We denote the first time that the scaled chain leaves an interval II by

τexitX(I):=inf{t0:XN(t)/NI}.\tau^{X}_{\rm{exit}}(I):=\inf\{t\geq 0:X_{N}(t)/N\notin I\}. (4.2)

To prove the upper bound for the mixing time, we fix an h(0,1)h\in(0,1) and consider the following set to be the good set,

S^S^(N):=I(η),ηη(N):=2(C2+C3)N1h2,\widehat{S}\equiv\widehat{S}(N):=I\left(\eta\right),\quad\eta\equiv\eta(N):=2(C_{2}+C_{3})N^{-\frac{1-h}{2}}, (4.3)

where C2C_{2} was defined in Lemma 3.1 and C3:=(Jλx|x1|)(1x)C_{3}:=\left(\frac{J}{\lambda}\frac{x^{\star}}{\lvert x_{1}^{\star}\rvert}\right)\vee(1-x^{\star}). We also define the interior of the good set by S~S~(N):=I(η2)\widetilde{S}\equiv\widetilde{S}(N):=I\left(\frac{\eta}{2}\right). Moreover, to simplify the notation, we write τexitX\tau^{X}_{\rm{exit}} for τexitX(S^)\tau^{X}_{\rm{exit}}(\widehat{S}), i.e. the first exit time of S^\widehat{S}.

Recall that tfollow=1JeC1Nht_{\rm{follow}}=\mbox{$\frac{1}{J}$}\lceil e^{C_{1}N^{h}}\rceil from (3.2), where C1C_{1} was defined in Lemma 3.1. We prove the following lemma, whose first part serves as the initial phase and whose second part controls the exit time from the interval I(r)I(r).

Lemma 4.3.
  1. (1)

    For any h(0,1)h\in(0,1) and x0{0,,N}x_{0}\in\{0,\ldots,N\}, one has

    x0(N1XN(1h2Jlog(N))S~)14eC1Nh,\mathbb{P}_{x_{0}}\left(N^{-1}X_{N}\left(\frac{1-h}{2J}\log(N)\right)\in\widetilde{S}\right)\geq 1-4e^{-C_{1}N^{h}},

    for NN sufficiently large that 1h2Jlog(N)tfollow\frac{1-h}{2J}\log(N)\leq t_{\rm{follow}} and that NN is large enough for Lemma 3.1 to apply.

  2. (2)

    For any h(0,1)h\in(0,1), suppose r=r(N)r=r(N) satisfies the following condition

    r2C2N1h21 for all N sufficiently large,\frac{r}{2C_{2}}N^{\frac{1-h}{2}}\geq 1\quad\text{for all }N\text{ sufficiently large}, (4.4)

    and x0/NI(r2)x_{0}/N\in I(\frac{r}{2}). Then we have

    x0(τexitX(I(r))>tfollow)14eC1Nh.\mathbb{P}_{x_{0}}(\tau^{X}_{\rm{exit}}(I(r))>t_{\rm{follow}})\geq 1-4e^{-C_{1}N^{h}}.

    In particular, r=η(N)r=\eta(N) satisfies (4.4) and so we have x0(τexitX>tfollow)14eC1Nh\mathbb{P}_{x_{0}}(\tau^{X}_{\rm{exit}}>t_{\rm{follow}})\geq 1-4e^{-C_{1}N^{h}} for all x0/NS~x_{0}/N\in\widetilde{S}.

Proof.

Recall the solution to the governing equation of the deterministic ε\varepsilon-SIS model given in (2.3). A simple calculation using Lemma 3.7 shows that

|x(1h2Jlog(N))x|C3N1h2,\left\lvert x\left(\mbox{$\frac{1-h}{2J}$}\log(N)\right)-x^{\star}\right\rvert\leq C_{3}N^{-\frac{1-h}{2}}, (4.5)

uniformly for x(0){0,,N}x(0)\in\{0,\ldots,N\}. It follows from a union bound and Lemma 3.1 that, for large enough NN such that 1h2Jlog(N)tfollow\frac{1-h}{2J}\log(N)\leq t_{\rm{follow}},

x0\displaystyle\mathbb{P}_{x_{0}} (|XN(1h2Jlog(N))xN|>(C2+C3)N1+h2)\displaystyle\left(\left\lvert X_{N}\left(\mbox{$\frac{1-h}{2J}$}\log(N)\right)-x^{\star}N\right\rvert>(C_{2}+C_{3})N^{\frac{1+h}{2}}\right)
x0(|XN(1h2Jlog(N))x(1h2Jlog(N))N|>C2N1+h2)\displaystyle\qquad\leq\mathbb{P}_{x_{0}}\left(\left\lvert X_{N}\left(\mbox{$\frac{1-h}{2J}$}\log(N)\right)-x\left(\mbox{$\frac{1-h}{2J}$}\log(N)\right)N\right\rvert>C_{2}N^{\frac{1+h}{2}}\right)
x0(supttfollow|XN(t)Nx(t)|>C2N1h2)4eC1Nh,\displaystyle\qquad\leq\mathbb{P}_{x_{0}}\left(\sup_{t\leq t_{\rm{follow}}}\left\lvert\frac{X_{N}(t)}{N}-x(t)\right\rvert>C_{2}N^{-\frac{1-h}{2}}\right)\leq 4e^{-C_{1}N^{h}},

which completes the proof of the first statement. For the second statement, note that, given x(0)I(r2)x(0)\in I(\frac{r}{2}), by the monotonicity of x(t)x(t) in tt, we have x(t)I(r2)x(t)\in I(\frac{r}{2}) for all t0t\geq 0. Condition (4.4) implies that I(r2+C2N1h2)I(r)I(\frac{r}{2}+C_{2}N^{-\frac{1-h}{2}})\subset I(r) for sufficiently large NN. Moreover, it follows from Lemma 3.1 that the probability that XN(t)/NX_{N}(t)/N is within C2N1h2C_{2}N^{-\frac{1-h}{2}} of x(t)x(t) for all ttfollowt\leq t_{\rm{follow}} is at least 14eC1Nh1-4e^{-C_{1}N^{h}}. Together with the previous observation, this implies that the probability that XN(t)/NI(r2+C2N1h2)I(r)X_{N}(t)/N\in I(\frac{r}{2}+C_{2}N^{-\frac{1-h}{2}})\subset I(r) for all ttfollowt\leq t_{\rm{follow}} is at least 14eC1Nh1-4e^{-C_{1}N^{h}}. ∎

The next corollary states the result of Lemma 4.3 in terms of the two coupled copies WN(t)W_{N}(t) and ZN(t)Z_{N}(t). That is, by the time 1h2Jlog(N)\frac{1-h}{2J}\log(N), both WN(t),ZN(t)W_{N}(t),Z_{N}(t) have entered the interior of the good set with high probability. Moreover, once they have reached the interior, with high probability they remain in the good set for a further time tfollowt_{\rm{follow}}, which is exponentially large in NN. Let

τexit:=inf{t0:WN(t)/NS^ or ZN(t)/NS^},\displaystyle\tau_{\rm{exit}}:=\inf\left\{t\geq 0:W_{N}(t)/N\notin\widehat{S}\text{\, or \,}Z_{N}(t)/N\notin\widehat{S}\right\}, (4.6)

be the first time either copy leaves the good set. The corollary follows directly from Lemma 4.3 by applying a union bound.

Corollary 4.4.

For any h(0,1)h\in(0,1), the following holds for NN sufficiently large that 1h2Jlog(N)tfollow\frac{1-h}{2J}\log(N)\leq t_{\rm{follow}} and that NN is large enough for Lemma 3.1 to apply:

  1. (1)

    For any w0,z0{0,,N}w_{0},z_{0}\in\{0,\ldots,N\},

    w0,z0(N1WN(1h2Jlog(N))S~,N1ZN(1h2Jlog(N))S~)18eC1Nh.\mathbb{P}_{w_{0},z_{0}}\left(N^{-1}W_{N}\left(\mbox{$\frac{1-h}{2J}$}\log(N)\right)\in\widetilde{S},N^{-1}Z_{N}\left(\mbox{$\frac{1-h}{2J}$}\log(N)\right)\in\widetilde{S}\right)\geq 1-8e^{-C_{1}N^{h}}.
  2. (2)

    If w0/NS~w_{0}/N\in\widetilde{S} and z0/NS~z_{0}/N\in\widetilde{S}, then we have w0,z0(τexit>tfollow)18eC1Nh\mathbb{P}_{w_{0},z_{0}}(\tau_{\rm{exit}}>t_{\rm{follow}})\geq 1-8e^{-C_{1}N^{h}}.

4.2. Intermediate phase

For this phase, we prove that, after the burn-in period of the first phase, the distance between the two copies drops below N\sqrt{N} with high probability after another h2JlogN\frac{h}{2J}\log N time (plus a constant time). We do so by showing that the coupling is contracting whenever both WN(t)/N,ZN(t)/NW_{N}(t)/N,Z_{N}(t)/N are in the good set S^\widehat{S}, provided that NN is large enough so that J/2λ>η(N)J/2\lambda>\eta(N), where η\eta is as defined in (4.3). To be precise, recall from (4.6) that τexit\tau_{\rm{exit}} is the first time either copy exits the good set S^\widehat{S}. Also, let DN(t):=ZN(t)WN(t)D_{N}(t):=Z_{N}(t)-W_{N}(t) be the difference between the two copies of the chain at time tt. The lemma below states that, for any ξ>0\xi>0, given that DN(0)=d0D_{N}(0)=d_{0} is of order N1+h2N^{\frac{1+h}{2}}, with high probability, on the event {τexit>h2Jlog(N)+ξ2}\{\tau_{\rm{exit}}>\frac{h}{2J}\log(N)+\frac{\xi}{2}\}, DN(t)D_{N}(t) drops below N\sqrt{N} by the time h2Jlog(N)+ξ2\frac{h}{2J}\log(N)+\frac{\xi}{2}.

For use in the proof, the jump rates of the coupling (WN(t),ZN(t))(W_{N}(t),Z_{N}(t)) for tτcouplet\leq\tau_{\rm{couple}} are given by

(w,z)(w+1,z)\displaystyle(w,z)\mapsto(w+1,z) at rate w,\displaystyle\text{ at rate\quad}\mathcal{I}_{w}, (4.7)
(w,z)(w,z+1)\displaystyle(w,z)\mapsto(w,z+1) at rate z,\displaystyle\text{ at rate\quad}\mathcal{I}_{z},
(w,z)(w1,z)\displaystyle(w,z)\mapsto(w-1,z) at rate 𝒞w,\displaystyle\text{ at rate\quad}\mathcal{C}_{w},
(w,z)(w,z1)\displaystyle(w,z)\mapsto(w,z-1) at rate 𝒞z,\displaystyle\text{ at rate\quad}\mathcal{C}_{z},

where

x(x):=λx(1xN)+ε(Nx)and𝒞x𝒞(x):=μx,x{0,,N}.\mathcal{I}_{x}\equiv\mathcal{I}(x):=\lambda x\left(1-\frac{x}{N}\right)+\varepsilon(N-x)\quad\text{and}\quad\mathcal{C}_{x}\equiv\mathcal{C}(x):=\mu x,\quad x\in\{0,\ldots,N\}.

Note that before colliding the two copies almost surely do not jump simultaneously, and thus they do not cross without colliding. Since the chains move together after colliding and we assume (without loss of generality) that WN(0)ZN(0)W_{N}(0)\leq Z_{N}(0), the coupling is monotonic, meaning WN(t)ZN(t)W_{N}(t)\leq Z_{N}(t) for all t0t\geq 0.
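The coupling dynamics (4.7) can be simulated directly. The following seeded Gillespie-style sketch (illustrative parameters, not from the paper) runs the two copies independently until they meet, checking along the way that they never cross, as discussed above.

```python
import math
import random

# Illustrative parameters (not from the paper).
lam, mu, eps, N = 2.0, 1.0, 0.5, 200

def inf_rate(x):   # the rate I_x in (4.7)
    return lam * x * (1 - x / N) + eps * (N - x)

def cure_rate(x):  # the rate C_x in (4.7)
    return mu * x

def coalescence_time(w, z, rng, t_max=200.0):
    """Run the two copies independently (Gillespie) until they meet."""
    t = 0.0
    while t < t_max and w != z:
        rates = [inf_rate(w), cure_rate(w), inf_rate(z), cure_rate(z)]
        total = sum(rates)
        t += rng.expovariate(total)
        u = rng.random() * total
        if u < rates[0]:
            w += 1
        elif u < rates[0] + rates[1]:
            w -= 1
        elif u < rates[0] + rates[1] + rates[2]:
            z += 1
        else:
            z -= 1
        # Jumps are +-1 and never simultaneous, so the copies cannot cross.
        assert w <= z
    return t if w == z else None

rng = random.Random(0)
tau = coalescence_time(0, N, rng)
assert tau is not None  # even the extreme starting states coalesce quickly
```

Starting the copies at the extreme states 0 and NN, both drift towards xNx^{\star}N and meet after a short time, in line with the three-phase outline.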

Lemma 4.5.

For h(0,1)h\in(0,1), let d0:=z0w0d_{0}:=z_{0}-w_{0}. Suppose w0/N,z0/NS^w_{0}/N,z_{0}/N\in\widehat{S}, so that d02ηN=4(C2+C3)N1+h2d_{0}\leq 2\eta N=4(C_{2}+C_{3})N^{\frac{1+h}{2}}. There exists a positive constant C4C_{4} depending only on λ,μ\lambda,\mu and ε\varepsilon such that the following relation holds for NN sufficiently large and any ξ>0\xi>0,

w0,z0(DN(h2Jlog(N)+ξ2)N,τexit>h2Jlog(N)+ξ2)C4eJ2ξ.\mathbb{P}_{w_{0},z_{0}}\left(D_{N}\left(\mbox{$\frac{h}{2J}$}\log(N)+\mbox{$\frac{\xi}{2}$}\right)\geq\sqrt{N},\,\tau_{\rm{exit}}>\mbox{$\frac{h}{2J}$}\log(N)+\mbox{$\frac{\xi}{2}$}\right)\leq C_{4}e^{-\mbox{$\frac{J}{2}$}\xi}. (4.8)
Proof.

Recall the possible transitions for this phase defined in (4.7). Since the coupling is monotonic, DN(t)D_{N}(t) is non-negative for all t0t\geq 0. Also, τcouple=inf{t0:DN(t)=0}\tau_{\text{couple}}=\inf\{t\geq 0:D_{N}(t)=0\}. Note that the process DN(t)D_{N}(t) is not a Markov process by itself, but DN(t)D_{N}(t) is uniquely determined by the two-dimensional process (WN(t),ZN(t))(W_{N}(t),Z_{N}(t)), which is Markov with respect to its natural filtration.

For each NN, Proposition 2.1 in [22] implies the process

(WN(t),ZN(t))(WN(0),ZN(0))0tF((WN(s),ZN(s)))𝑑s,t0,\left(W_{N}(t),Z_{N}(t)\right)^{\top}-(W_{N}(0),Z_{N}(0))^{\top}-\int_{0}^{t}F\left((W_{N}(s),Z_{N}(s))^{\top}\right)\,ds,\quad t\geq 0, (4.9)

is a zero-mean martingale, where

F((w,z))\displaystyle F((w,z)^{\top}) :={(0,1)z+(1,0)w+(0,1)𝒞z+(1,0)𝒞w,wz(1,1)w+(1,1)𝒞w,w=z.\displaystyle:=\begin{cases}(0,1)^{\top}\mathcal{I}_{z}+(1,0)^{\top}\mathcal{I}_{w}+(0,-1)^{\top}\mathcal{C}_{z}+(-1,0)^{\top}\mathcal{C}_{w},&w\not=z\\ (1,1)^{\top}\mathcal{I}_{w}+(-1,-1)^{\top}\mathcal{C}_{w},&w=z.\end{cases}

Taking expectations in (4.9) and subtracting the coordinates, we obtain

𝔼w0,z0DN(t)=(z0w0)\displaystyle\mathbb{E}_{w_{0},z_{0}}D_{N}(t)=(z_{0}-w_{0}) +𝔼w0,z00t((ZN(s))(WN(s)))𝑑s\displaystyle+\mathbb{E}_{w_{0},z_{0}}\int_{0}^{t}\left(\mathcal{I}(Z_{N}(s))-\mathcal{I}(W_{N}(s))\right)\,ds
𝔼w0,z00t(𝒞(ZN(s))𝒞(WN(s)))𝑑s.\displaystyle-\mathbb{E}_{w_{0},z_{0}}\int_{0}^{t}(\mathcal{C}(Z_{N}(s))-\mathcal{C}(W_{N}(s)))\,ds.

Note that, for all tτexitt\leq\tau_{\rm{exit}}, one has (xη)NWN(t)ZN(t)\left(x^{\star}-\eta\right)N\leq W_{N}(t)\leq Z_{N}(t). Moreover, for any states w,zS^w,z\in\widehat{S}, we have

zw\displaystyle\mathcal{I}_{z}-\mathcal{I}_{w} =(λε)(zw)λN(zw)(z+w)\displaystyle=(\lambda-\varepsilon)(z-w)-\frac{\lambda}{N}(z-w)(z+w)
(λε)(zw)2λN(zw)(xη)N\displaystyle\leq(\lambda-\varepsilon)(z-w)-\frac{2\lambda}{N}(z-w)\left(x^{\star}-\eta\right)N
=(μJ+2λη)(zw).\displaystyle=\left(\mu-J+2\lambda\eta\right)(z-w).

Choose NN large enough so that J/2λη(N)J/2\lambda\geq\eta(N). Applying the optional stopping theorem to the Dynkin martingale for DN(t)D_{N}(t) (which is a function of the two-dimensional Markov chain (WN(t),ZN(t))(W_{N}(t),Z_{N}(t))) and bounded stopping time tτexitt\wedge\tau_{\rm{exit}}, as well as Fubini’s theorem, and the fact that DN(t)D_{N}(t) is non-negative, we obtain

𝔼w0,z0(DN(t)𝟙t<τexit)\displaystyle\mathbb{E}_{w_{0},z_{0}}\left(D_{N}(t)\mathbbm{1}_{t<\tau_{\rm{exit}}}\right) 𝔼w0,z0DN(tτexit)\displaystyle\leq\mathbb{E}_{w_{0},z_{0}}D_{N}(t\wedge\tau_{\rm{exit}})
(z0w0)(J2λη)𝔼w0,z0(0tτexitDN(s)𝑑s)\displaystyle\leq(z_{0}-w_{0})-(J-2\lambda\eta)\mathbb{E}_{w_{0},z_{0}}\left(\int_{0}^{t\wedge\tau_{\rm{exit}}}D_{N}(s)\,ds\right)
=(z0w0)(J2λη)𝔼w0,z0(0tDN(s)𝟙s<τexit𝑑s)\displaystyle=(z_{0}-w_{0})-(J-2\lambda\eta)\mathbb{E}_{w_{0},z_{0}}\left(\int_{0}^{t}D_{N}(s)\mathbbm{1}_{s<\tau_{\rm{exit}}}\,ds\right)
=(z0w0)(J2λη)0t𝔼w0,z0(DN(s)𝟙s<τexit)𝑑s.\displaystyle=(z_{0}-w_{0})-(J-2\lambda\eta)\int_{0}^{t}\mathbb{E}_{w_{0},z_{0}}\left(D_{N}(s)\mathbbm{1}_{s<\tau_{\rm{exit}}}\right)\,ds.

Letting d(t):=𝔼w0,z0(DN(t)𝟙t<τexit)d(t):=\mathbb{E}_{w_{0},z_{0}}\left(D_{N}(t)\mathbbm{1}_{t<\tau_{\rm{exit}}}\right), we see that d(t)d(t) satisfies the following integral inequality:

d(t)d0(J2λη)0td(s)𝑑s.d(t)\leq d_{0}-(J-2\lambda\eta)\int_{0}^{t}d(s)\,ds.

Then, for all N large enough so that J/2\lambda\geq\eta(N), adapting Bellman's proof of Grönwall's inequality (see Theorem 1.2.2 in [3]) to this setting, we obtain that for all t\geq 0,

d(t)d0e(J2λη)t.d(t)\leq d_{0}e^{-(J-2\lambda\eta)t}.
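The adaptation of Bellman's argument is short enough to sketch here. Writing c:=J-2\lambda\eta and R(t):=\int_{0}^{t}d(s)\,ds, the integral inequality reads R'(t)\leq d_{0}-cR(t), and hence:

```latex
% Sketch of the Gronwall step, with c := J - 2\lambda\eta and R(t) := \int_0^t d(s)\,ds.
\begin{align*}
\frac{d}{dt}\bigl(e^{ct}R(t)\bigr)
  &= e^{ct}\bigl(R'(t)+cR(t)\bigr) \leq d_{0}e^{ct},\\
\intertext{so integrating over $[0,t]$ gives $e^{ct}R(t)\leq (d_{0}/c)(e^{ct}-1)$,
i.e.\ $cR(t)\leq d_{0}(1-e^{-ct})$, and therefore}
d(t) &\leq d_{0}-cR(t) \leq d_{0}e^{-ct}.
\end{align*}
```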

Since we have assumed that d04(C2+C3)N1+h2d_{0}\leq 4(C_{2}+C_{3})N^{\frac{1+h}{2}}, we have

𝔼w0,z0(DN(h2Jlog(N)+ξ2)𝟙τexit>h2Jlog(N)+ξ2)\displaystyle\mathbb{E}_{w_{0},z_{0}}\left(D_{N}\left(\mbox{$\frac{h}{2J}$}\log(N)+\mbox{$\frac{\xi}{2}$}\right)\mathbbm{1}_{\tau_{\rm{exit}}>\mbox{$\frac{h}{2J}$}\log(N)+\mbox{$\frac{\xi}{2}$}}\right) d0e(J2λη)(h2Jlog(N)+ξ2)\displaystyle\leq d_{0}e^{-(J-2\lambda\eta)\left(\mbox{$\frac{h}{2J}$}\log(N)+\mbox{$\frac{\xi}{2}$}\right)}
4(C2+C3)N12NλhηJeJ2λη2ξ.\displaystyle\leq 4(C_{2}+C_{3})N^{\frac{1}{2}}N^{\frac{\lambda h\eta}{J}}e^{-\mbox{$\frac{J-2\lambda\eta}{2}$}\xi}.

Now, using the fact that 𝟙DN(t)NDN(t)/N\mathbbm{1}_{D_{N}\left(t\right)\geq\sqrt{N}}\leq D_{N}(t)/\sqrt{N} and the previous display, we have

w0,z0\displaystyle\mathbb{P}_{w_{0},z_{0}} (DN(h2Jlog(N)+ξ2)N,τexit>h2Jlog(N)+ξ2)\displaystyle\left(D_{N}\left(\mbox{$\frac{h}{2J}$}\log(N)+\mbox{$\frac{\xi}{2}$}\right)\geq\sqrt{N},\,\tau_{\rm{exit}}>\mbox{$\frac{h}{2J}$}\log(N)+\mbox{$\frac{\xi}{2}$}\right)
N1/2𝔼w0,z0(DN(h2Jlog(N)+ξ2)𝟙τexit>h2Jlog(N)+ξ2)\displaystyle\leq N^{-1/2}\mathbb{E}_{w_{0},z_{0}}\left(D_{N}\left(\mbox{$\frac{h}{2J}$}\log(N)+\mbox{$\frac{\xi}{2}$}\right)\mathbbm{1}_{\tau_{\rm{exit}}>\mbox{$\frac{h}{2J}$}\log(N)+\mbox{$\frac{\xi}{2}$}}\right)
4(C2+C3)NλhηJeJ2λη2ξ.\displaystyle\leq 4(C_{2}+C_{3})N^{\frac{\lambda h\eta}{J}}e^{-\mbox{$\frac{J-2\lambda\eta}{2}$}\xi}.

The proof is complete after noting that, as η(N)=O(N1h2)\eta(N)=O(N^{-\frac{1-h}{2}}), we have limNNλhJη=1\lim_{N\to\infty}N^{\frac{\lambda h}{J}\eta}=1. ∎

4.3. Final phase

We now show that, once the two copies are within distance \sqrt{N} of each other, on the event that both copies stay in the good set \widehat{S} for time at least \xi/2, the additional time to coalescence exceeds \xi/2 (an absolute constant) with probability at most of order \xi^{-1/2}.

Lemma 4.6.

Let \tau_{0}:=\inf\{t\geq 0:D_{N}(t)=0\}. Suppose that w_{0}/N,z_{0}/N\in\widehat{S} and d_{0}=z_{0}-w_{0}\leq\sqrt{N}. Then, for all N sufficiently large that J/2\lambda\geq\eta(N), the following holds for all \xi>0:

w0,z0(τ0>ξ2,τexit>ξ2)4μεξ12.\mathbb{P}_{w_{0},z_{0}}\left(\tau_{0}>\mbox{$\frac{\xi}{2}$},\tau_{\rm{exit}}>\mbox{$\frac{\xi}{2}$}\right)\leq\frac{4}{\sqrt{\mu\wedge\varepsilon}}\xi^{-\frac{1}{2}}.

To prove this lemma, we require [6, Proposition 4.1] (stated below as Proposition 4.7), which is a continuous-time analogue of [23, Proposition 17.19] and controls the tails of certain hitting times. With the right setup, the proof of Lemma 4.6 is a straightforward application of the proposition.

Proposition 4.7.

[6, Proposition 4.1] Let X(t) be a continuous-time Markov jump chain with state space S and rate matrix Q, which is stable, conservative and non-explosive. Let B and \sigma^{2} be positive constants, and let f:S\to\mathbb{R}_{+} be a function. Set S_{0}:=\{x\in S:f(x)=0\}, and assume that

  (i) the drift \sum_{y}Q(x,y)(f(y)-f(x)) of f is non-positive for all x\in S\backslash S_{0};

  (ii) f(X) makes jumps of magnitude at most B;

  (iii) \sum_{y}Q(x,y)(f(y)-f(x))^{2}\geq\sigma^{2} for all x\in S\backslash S_{0}.

Define T_{*}:=\inf\{t:f(X(t))=0\}, the hitting time of S_{0}. Then, for any t_{0}\geq 2B^{2}/\sigma^{2},

(Tt0)22f(X0)σt0.\mathbb{P}(T_{*}\geq t_{0})\leq\frac{2\sqrt{2}f(X_{0})}{\sigma\sqrt{t_{0}}}.
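As a sanity check, this bound can be compared with an exact tail in the simplest example to which the proposition applies: a pure-death chain on the non-negative integers that jumps down by one at rate 1, with f the identity. Then the drift is -1\leq 0, B=\sigma^{2}=1, and T_{*} started from X_{0}=k is a sum of k independent Exp(1) holding times, i.e. a Gamma(k,1) random variable. The following numerical sketch is purely illustrative (the example chain is ours, not the paper's):

```python
import math

def gamma_tail(k, t):
    """P(Gamma(k, 1) >= t) = e^{-t} * sum_{j<k} t^j / j!  (k a positive integer)."""
    return math.exp(-t) * sum(t**j / math.factorial(j) for j in range(k))

def proposition_bound(x0, t0, sigma2=1.0):
    """Tail bound 2*sqrt(2)*f(X_0) / (sigma*sqrt(t0)), valid for t0 >= 2B^2/sigma^2."""
    return 2 * math.sqrt(2) * x0 / (math.sqrt(sigma2) * math.sqrt(t0))

# For the pure-death chain started at x0, T_* is a sum of x0 independent
# Exp(1) holding times, i.e. Gamma(x0, 1); compare the exact tail with the
# bound over a range of admissible t0 (here t0 >= 2B^2/sigma^2 = 2).
for x0 in (1, 2, 5):
    for t0 in range(2, 200):
        assert gamma_tail(x0, t0) <= proposition_bound(x0, t0), (x0, t0)
print("tail bound verified on the pure-death example")
```

As expected, the bound is far from tight here: it is designed for chains with merely non-positive drift, while the pure-death chain has strictly negative drift.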

The drift condition of the proposition does not hold globally for X_{N}, but it does hold in the good set \widehat{S}. Since we are only interested in the behaviour of the chain before it leaves the good set, conceptually the proposition still applies. To apply it directly, we introduce a modified version of the chain that reflects at the “boundary” of the good set. Let \ell:=\lceil(x^{\star}-\eta)N\rceil-1 and u:=\lfloor(x^{\star}+\eta)N\rfloor+1 be the states corresponding to leaving the good set \widehat{S}, and define the state space S:=N\widehat{S}\cup\{\ell,u\}. Let \overline{X}_{N}(t) be a Markov chain on S which evolves as follows. If \overline{X}_{N}(t)/N\in\widehat{S}, then it makes jumps according to the transition rates of the original process X_{N}(t). Otherwise, we have

uu1 at rate 𝒞u,uu+1 at rate 0,1 at rate 0,+1 at rate .\begin{split}\openup 3.0pt\begin{aligned} &u\to u-1\text{\quad at rate \quad}\mathcal{C}_{u},\\ &u\to u+1\text{\quad at rate \quad}0,\end{aligned}\qquad\qquad\begin{aligned} &\ell\to\ell-1\text{\quad at rate \quad}0,\\ &\ell\to\ell+1\text{\quad at rate \quad}\mathcal{I}_{\ell}.\end{aligned}\end{split} (4.10)
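To make the reflected dynamics concrete, here is a minimal Gillespie-style simulation sketch. The rates \mathcal{I}_{x}=\lambda x(1-x/N)+\varepsilon(N-x) and \mathcal{C}_{x}=\mu x are consistent with the total jump rate g(x)=\mathcal{I}_{x}+\mathcal{C}_{x} appearing later in this section; the parameter values, and the computation of x^{\star} as the positive root of the limiting drift \lambda x(1-x)+\varepsilon(1-x)-\mu x, are illustrative assumptions rather than anything fixed by the paper:

```python
import math
import random

# Illustrative parameters (not from the paper).
lam, mu, eps = 2.0, 1.0, 0.1
N = 1000

# x* is the positive root of the limiting drift lam*x*(1-x) + eps*(1-x) - mu*x = 0,
# i.e. of -lam*x^2 + (lam - mu - eps)*x + eps = 0.
b = lam - mu - eps
x_star = (b + math.sqrt(b * b + 4 * lam * eps)) / (2 * lam)

def infection_rate(x):   # I_x: contact infections plus self-infections
    return lam * x * (1 - x / N) + eps * (N - x)

def cure_rate(x):        # C_x: recoveries
    return mu * x

def simulate_reflected(x0, t_max, r=0.1, seed=0):
    """Gillespie simulation of the chain reflected at the boundary of N*I(r)."""
    rng = random.Random(seed)
    lower = math.ceil((x_star - r) * N) - 1   # state "l": can only jump up
    upper = math.floor((x_star + r) * N) + 1  # state "u": can only jump down
    x, t, path = x0, 0.0, [x0]
    while t < t_max:
        up = infection_rate(x) if x != upper else 0.0
        down = cure_rate(x) if x != lower else 0.0
        total = up + down
        t += rng.expovariate(total)           # exponential holding time
        x += 1 if rng.random() < up / total else -1
        path.append(x)
    return lower, upper, path

lower, upper, path = simulate_reflected(x0=round(x_star * N), t_max=5.0)
assert all(lower <= x <= upper for x in path)  # reflection keeps the chain in place
```

Zeroing the forbidden jump at \ell and u mirrors the reflection in (4.10), so by construction the sampled path can never leave [\ell,u].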

The processes XN(t)X_{N}(t) and X¯N(t)\overline{X}_{N}(t) admit a natural coupling, under which the two processes start from the same initial state x0/NS^x_{0}/N\in\widehat{S} and evolve together until time τexitX=inf{t0:XN(t)/NS^}\tau^{X}_{\rm{exit}}=\inf\{t\geq 0:X_{N}(t)/N\notin\widehat{S}\} defined in (4.2). This leads to the following immediate observation: for any event AA in the stopped sigma-algebra of τexitX\tau_{\rm{exit}}^{X}, one has (A)=¯(A)\mathbb{P}(A)=\overline{\mathbb{P}}(A), where we use \mathbb{P} to denote the underlying probability measure of XN(t)X_{N}(t) and ¯\overline{\mathbb{P}} to denote the underlying probability measure of X¯N(t)\overline{X}_{N}(t). That is, the two processes XN(t)X_{N}(t) and X¯N(t)\overline{X}_{N}(t) are path-wise indistinguishable in probability up to time τexitX\tau^{X}_{\rm{exit}}.

As before, let (W_{N}(t),Z_{N}(t)) be two independent copies of X_{N}(t) with transitions specified by (4.7) before coalescence, moving together after coalescence. Also, let (\overline{W}_{N}(t),\overline{Z}_{N}(t)) evolve as two independent copies of \overline{X}_{N}(t) with \overline{W}_{N}(0)\leq\overline{Z}_{N}(0) until coalescence, and move together afterwards. Analogously to the relationship between X_{N}(t) and \overline{X}_{N}(t), the two-dimensional processes (W_{N}(t),Z_{N}(t)) and (\overline{W}_{N}(t),\overline{Z}_{N}(t)) are path-wise indistinguishable in probability up to the first time (W_{N}(t),Z_{N}(t)) exits N\widehat{S}\times N\widehat{S}, that is, until the time \tau_{\rm{exit}} defined in (4.6).

Proof of Lemma 4.6.

Set

τ0W¯,Z¯\displaystyle\tau^{\overline{W},\overline{Z}}_{0} :=inf{t0:Z¯N(t)W¯N(t)=0},\displaystyle:=\inf\{t\geq 0:\overline{Z}_{N}(t)-\overline{W}_{N}(t)=0\},
τexitW¯,Z¯\displaystyle\tau^{\overline{W},\overline{Z}}_{\rm{exit}} :=inf{t0:W¯N(t)= or Z¯N(t)=u},\displaystyle:=\inf\{t\geq 0:\overline{W}_{N}(t)=\ell\text{ or }\overline{Z}_{N}(t)=u\},

and write \mathbb{P} for the underlying probability measure of (W_{N}(t),Z_{N}(t)) and \overline{\mathbb{P}} for the underlying probability measure of (\overline{W}_{N}(t),\overline{Z}_{N}(t)). Since \overline{W}_{N}(t)\leq\overline{Z}_{N}(t) for all t\geq 0, the process (\overline{W}_{N}(t),\overline{Z}_{N}(t)) leaves N\widehat{S}\times N\widehat{S} either when the lower copy jumps down to \ell or when the upper copy jumps up to u. Then, as in the previous discussion of the relationship between X_{N}(t) and \overline{X}_{N}(t), we have the following two immediate observations:

w0,z0(τexitξ2)\displaystyle\mathbb{P}_{w_{0},z_{0}}\left(\tau_{\rm{exit}}\leq\mbox{$\frac{\xi}{2}$}\right) =¯w0,z0(τexitW¯,Z¯ξ2);\displaystyle=\overline{\mathbb{P}}_{w_{0},z_{0}}\left(\tau^{\overline{W},\overline{Z}}_{\rm{exit}}\leq\mbox{$\frac{\xi}{2}$}\right); (4.11)
w0,z0(τ0ξ2,τexit>ξ2)\displaystyle\mathbb{P}_{w_{0},z_{0}}\left(\tau_{0}\leq\mbox{$\frac{\xi}{2}$},\,\tau_{\rm{exit}}>\mbox{$\frac{\xi}{2}$}\right) =¯w0,z0(τ0W¯,Z¯ξ2,τexitW¯,Z¯>ξ2).\displaystyle=\overline{\mathbb{P}}_{w_{0},z_{0}}\left(\tau^{\overline{W},\overline{Z}}_{0}\leq\mbox{$\frac{\xi}{2}$},\,\tau^{\overline{W},\overline{Z}}_{\rm{exit}}>\mbox{$\frac{\xi}{2}$}\right). (4.12)

Therefore, it suffices to prove the result for the process (W¯N(t),Z¯N(t))(\overline{W}_{N}(t),\overline{Z}_{N}(t)). To apply [6, Proposition 4.1] (Proposition 4.7) to (W¯N(t),Z¯N(t))(\overline{W}_{N}(t),\overline{Z}_{N}(t)), first set

f((w,z))=(zw)𝟙wz.f((w,z))=(z-w)\mathbbm{1}_{w\leq z}.

Clearly, f is non-negative and integer-valued, so f(\textbf{x})\geq 1 for all \textbf{x}=(w,z) such that f(\textbf{x})\neq 0. Also, Condition (ii) of Proposition 4.7 is satisfied by taking B=1.

We now verify Condition (i) of Proposition 4.7, i.e. that the drift \sum_{\textbf{y}}Q(\textbf{x},\textbf{y})(f(\textbf{y})-f(\textbf{x})) is non-positive for all \textbf{x} such that f(\textbf{x})\neq 0. First note that, in our case, there are four terms in the sum since, before coalescence, the coupled process (\overline{W}_{N}(t),\overline{Z}_{N}(t)) has four possible ways of jumping. The movements

x=(w¯,z¯)y=(w¯+1,z¯) with Q(x,y)=w¯,\displaystyle\textbf{x}=(\bar{w},\bar{z})\mapsto\textbf{y}=(\bar{w}+1,\bar{z})\text{\quad with \quad}Q(\textbf{x},\textbf{y})=\mathcal{I}_{\bar{w}},
x=(w¯,z¯)y=(w¯,z¯1) with Q(x,y)=𝒞z¯,\displaystyle\textbf{x}=(\bar{w},\bar{z})\mapsto\textbf{y}=(\bar{w},\bar{z}-1)\text{\quad with \quad}Q(\textbf{x},\textbf{y})=\mathcal{C}_{\bar{z}},

are always allowed and lead to f(y)f(x)=1f(\textbf{y})-f(\textbf{x})=-1. The other two possible jumps,

x=(w¯,z¯)y=(w¯,z¯+1) with Q(x,y)=z¯𝟙{z¯u},\displaystyle\textbf{x}=(\bar{w},\bar{z})\mapsto\textbf{y}=(\bar{w},\bar{z}+1)\text{\quad with \quad}Q(\textbf{x},\textbf{y})=\mathcal{I}_{\bar{z}}\cdot\mathbbm{1}\{\bar{z}\neq u\},
x=(w¯,z¯)y=(w¯1,z¯) with Q(x,y)=𝒞w¯𝟙{w¯},\displaystyle\textbf{x}=(\bar{w},\bar{z})\mapsto\textbf{y}=(\bar{w}-1,\bar{z})\text{\quad with \quad}Q(\textbf{x},\textbf{y})=\mathcal{C}_{\bar{w}}\cdot\mathbbm{1}\{\bar{w}\neq\ell\},

always lead to f(\textbf{y})-f(\textbf{x})=1, and the reflection at the boundary is encoded by the indicators in the transition rates. Also, note that for all x\in S,

\lvert x/N-x^{\star}\rvert\leq\eta+N^{-1}=O(N^{-\frac{1-h}{2}}),

with ηη(N)\eta\equiv\eta(N) defined in (4.3). It then follows that

yQ(x,y)(f(y)f(x))\displaystyle\smash{\sum_{\textbf{y}}}Q(\textbf{x},\textbf{y})(f(\textbf{y})-f(\textbf{x})) =(z¯𝟙{z¯u}+𝒞w¯𝟙{w¯})(w¯+𝒞z¯)\displaystyle=(\mathcal{I}_{\bar{z}}\cdot\mathbbm{1}\{\bar{z}\neq u\}+\mathcal{C}_{\bar{w}}\cdot\mathbbm{1}\{\bar{w}\neq\ell\})-(\mathcal{I}_{\bar{w}}+\mathcal{C}_{\bar{z}})
(z¯w¯)(𝒞z¯𝒞w¯)\displaystyle\leq(\mathcal{I}_{\bar{z}}-\mathcal{I}_{\bar{w}})-(\mathcal{C}_{\bar{z}}-\mathcal{C}_{\bar{w}})
=(λμε)(z¯w¯)λN(z¯w¯)(z¯+w¯)\displaystyle=(\lambda-\mu-\varepsilon)(\bar{z}-\bar{w})-\mbox{$\frac{\lambda}{N}$}(\bar{z}-\bar{w})(\bar{z}+\bar{w})
((λμε)2λ(xηN1))(z¯w¯)\displaystyle\leq\left((\lambda-\mu-\varepsilon)-2\lambda\left(x^{\star}-\eta-N^{-1}\right)\right)(\bar{z}-\bar{w})
=(J2λη2λN1)|z¯w¯|,\displaystyle=-(J-2\lambda\eta-2\lambda N^{-1})\lvert\bar{z}-\bar{w}\rvert, (4.13)

which is clearly negative for \bar{z}\neq\bar{w} when N is sufficiently large. To obtain a \sigma^{2} satisfying Condition (iii), note that

yQ(x,y)(f(y)f(x))2\displaystyle\sum_{\textbf{y}}Q(\textbf{x},\textbf{y})(f(\textbf{y})-f(\textbf{x}))^{2} =w¯+z¯𝟙{z¯u}+𝒞z¯+𝒞w¯𝟙{w¯}\displaystyle=\mathcal{I}_{\bar{w}}+\mathcal{I}_{\bar{z}}\cdot\mathbbm{1}\{\bar{z}\neq u\}+\mathcal{C}_{\bar{z}}+\mathcal{C}_{\bar{w}}\cdot\mathbbm{1}\{\bar{w}\neq\ell\}
w¯+𝒞z¯w¯+𝒞w¯(με)N:=σ2,\displaystyle\geq\mathcal{I}_{\bar{w}}+\mathcal{C}_{\bar{z}}\geq\mathcal{I}_{\bar{w}}+\mathcal{C}_{\bar{w}}\geq(\mu\wedge\varepsilon)N:=\sigma^{2},

since the minimum of

g(x):=x+𝒞x=λNx2+(λ+με)x+εNg(x):=\mathcal{I}_{x}+\mathcal{C}_{x}=-\frac{\lambda}{N}x^{2}+(\lambda+\mu-\varepsilon)x+\varepsilon N

is achieved, by concavity of g, at one of the endpoints: g(0)=\varepsilon N or g(N)=\mu N. An application of Proposition 4.7 gives

¯w0,z0(τ0W¯,Z¯τexitW¯,Z¯>ξ2)¯w0,z0(τ0W¯,Z¯>ξ2)22d0(με)Nξ/24μεξ12.\overline{\mathbb{P}}_{w_{0},z_{0}}\left(\tau^{\overline{W},\overline{Z}}_{0}\wedge\tau^{\overline{W},\overline{Z}}_{\rm{exit}}>\mbox{$\frac{\xi}{2}$}\right)\leq\overline{\mathbb{P}}_{w_{0},z_{0}}\left(\tau^{\overline{W},\overline{Z}}_{0}>\mbox{$\frac{\xi}{2}$}\right)\leq\frac{2\sqrt{2}d_{0}}{\sqrt{(\mu\wedge\varepsilon)N}\sqrt{\xi/2}}\leq\frac{4}{\sqrt{\mu\wedge\varepsilon}}\xi^{-\frac{1}{2}}. (4.14)

Then, in view of observations (4.11) and (4.12), we conclude that

w0,z0(τ0ξ/2,τexit>ξ/2)\displaystyle\mathbb{P}_{w_{0},z_{0}}\left(\tau_{0}\leq\xi/2,\tau_{\rm{exit}}>\xi/2\right) =¯w0,z0(τ0W¯,Z¯ξ/2,τexitW¯,Z¯>ξ/2)\displaystyle=\overline{\mathbb{P}}_{w_{0},z_{0}}\left(\tau^{\overline{W},\overline{Z}}_{0}\leq\xi/2,\,\tau^{\overline{W},\overline{Z}}_{\rm{exit}}>\xi/2\right)
=¯w0,z0(τexitW¯,Z¯>ξ/2)¯w0,z0(τ0W¯,Z¯τexitW¯,Z¯>ξ/2)\displaystyle=\overline{\mathbb{P}}_{w_{0},z_{0}}\left(\tau^{\overline{W},\overline{Z}}_{\rm{exit}}>\xi/2\right)-\overline{\mathbb{P}}_{w_{0},z_{0}}\left(\tau^{\overline{W},\overline{Z}}_{0}\wedge\tau^{\overline{W},\overline{Z}}_{\rm{exit}}>\xi/2\right)
w0,z0(τexit>ξ/2)4μεξ12,\displaystyle\geq\mathbb{P}_{w_{0},z_{0}}\left(\tau_{\rm{exit}}>\xi/2\right)-\frac{4}{\sqrt{\mu\wedge\varepsilon}}\xi^{-\frac{1}{2}},

and hence

w0,z0(τ0>ξ/2,τexit>ξ/2)\displaystyle\mathbb{P}_{w_{0},z_{0}}\left(\tau_{0}>\xi/2,\tau_{\rm{exit}}>\xi/2\right) =w0,z0(τexit>ξ/2)w0,z0(τ0ξ/2,τexit>ξ/2)\displaystyle=\mathbb{P}_{w_{0},z_{0}}\left(\tau_{\rm{exit}}>\xi/2\right)-\mathbb{P}_{w_{0},z_{0}}\left(\tau_{0}\leq\xi/2,\tau_{\rm{exit}}>\xi/2\right)
4μεξ12,\displaystyle\leq\frac{4}{\sqrt{\mu\wedge\varepsilon}}\xi^{-\frac{1}{2}},

as desired. ∎

4.4. Bounding the time to coalescence

We now combine the previous results of this section to prove Lemma 4.2.

Proof of Lemma 4.2.

Fix \xi>0. Denote the good event that the coupled processes W_{N}(t) and Z_{N}(t) are both in the interior good set \widetilde{S} at time \frac{1-h}{2J}\log(N) by

BN:={N1WN(1h2Jlog(N))S~,N1ZN(1h2Jlog(N))S~}.B_{N}:=\left\{N^{-1}W_{N}\left(\mbox{$\frac{1-h}{2J}$}\log(N)\right)\in\widetilde{S},N^{-1}Z_{N}\left(\mbox{$\frac{1-h}{2J}$}\log(N)\right)\in\widetilde{S}\right\}.

Then the first part of Corollary 4.4 implies

w0,z0(τcouple>1h2Jlog(N)+h2Jlog(N)+ξ)w0,z0(τcouple>1h2Jlog(N)+h2Jlog(N)+ξ,BN)+8eC1Nh\begin{split}\mathbb{P}_{w_{0},z_{0}}&\left(\tau_{\rm{couple}}>\mbox{$\frac{1-h}{2J}$}\log(N)+\mbox{$\frac{h}{2J}$}\log(N)+\xi\right)\\ &\leq\mathbb{P}_{w_{0},z_{0}}\left(\tau_{\rm{couple}}>\mbox{$\frac{1-h}{2J}$}\log(N)+\mbox{$\frac{h}{2J}$}\log(N)+\xi,B_{N}\right)+8e^{-C_{1}N^{h}}\end{split} (4.15)

By the Markov property, to bound the first term on the right hand side of (4.15), it is enough to bound, uniformly in w0/N,z0/NS~S^w_{0}/N,z_{0}/N\in\widetilde{S}\subset\widehat{S},

w0,z0\displaystyle\mathbb{P}_{w_{0},z_{0}} (τcouple>h2Jlog(N)+ξ)\displaystyle\left(\tau_{\rm{couple}}>\mbox{$\frac{h}{2J}$}\log(N)+\xi\right)
w0,z0(τcouple>h2Jlog(N)+ξ,τexit>tfollow)+8eC1Nh\displaystyle\leq\mathbb{P}_{w_{0},z_{0}}\left(\tau_{\rm{couple}}>\mbox{$\frac{h}{2J}$}\log(N)+\xi,\tau_{\rm{exit}}>t_{\rm{follow}}\right)+8e^{-C_{1}N^{h}}
w0,z0(τcouple>h2Jlog(N)+ξ,τexit>h2Jlog(N)+ξ)+8eC1Nh\displaystyle\leq\mathbb{P}_{w_{0},z_{0}}\left(\tau_{\rm{couple}}>\mbox{$\frac{h}{2J}$}\log(N)+\xi,\tau_{\rm{exit}}>\mbox{$\frac{h}{2J}$}\log(N)+\xi\right)+8e^{-C_{1}N^{h}}

where the first inequality follows from the second part of Corollary 4.4, and for the second inequality we assume NN is sufficiently large so that h2Jlog(N)+ξtfollow\mbox{$\frac{h}{2J}$}\log(N)+\xi\leq t_{\rm{follow}}.

Lemma 4.5 and Lemma 4.6, combined with a straightforward application of the Markov property, imply that for all N large enough,

w0,z0(τcouple>h2Jlog(N)+ξ,τexit>h2Jlog(N)+ξ)4μεξ12+C4eJ2ξ.\mathbb{P}_{w_{0},z_{0}}\left(\tau_{\rm{couple}}>\mbox{$\frac{h}{2J}$}\log(N)+\xi,\,\tau_{\rm{exit}}>\mbox{$\frac{h}{2J}$}\log(N)+\xi\right)\leq\frac{4}{\sqrt{\mu\wedge\varepsilon}}\xi^{-\frac{1}{2}}+C_{4}e^{-\mbox{$\frac{J}{2}$}\xi}.

Combining the displays above gives

w0,z0(τcouple>1h2Jlog(N)+h2Jlog(N)+ξ)4μεξ12+C4eJ2ξ+16eC1Nh,\mathbb{P}_{w_{0},z_{0}}\left(\tau_{\rm{couple}}>\mbox{$\frac{1-h}{2J}$}\log(N)+\mbox{$\frac{h}{2J}$}\log(N)+\xi\right)\leq\frac{4}{\sqrt{\mu\wedge\varepsilon}}\xi^{-\frac{1}{2}}+C_{4}e^{-\mbox{$\frac{J}{2}$}\xi}+16e^{-C_{1}N^{h}},

and taking the limit superior as NN\to\infty on both sides finishes the proof. ∎

5. Proof of the lower bound for the mixing time

In this section, we prove the following lower bound on the mixing time, which together with Lemma 4.1 proves Theorem 1.2. Recall ρN(t)=maxx{0,1,,N}PNt(x,)πNTV\rho_{N}(t)=\max_{x\in\{0,1,\ldots,N\}}\lVert P^{t}_{N}(x,\cdot)-\pi_{N}\rVert_{TV} and tN=12Jlog(N)t_{N}=\frac{1}{2J}\log(N) from Definition 1.1 and Theorem 1.2.

Lemma 5.1.

For sufficiently large ξ0\xi\geq 0, we have

lim infNρN(tNξ)1ψ2(ξ),\liminf_{N\to\infty}\rho_{N}\left(t_{N}-\xi\right)\geq 1-\psi_{2}(\xi),

where ψ2()\psi_{2}(\cdot) is a non-negative real valued function depending only on λ,μ\lambda,\mu and ε\varepsilon with ψ2(ξ)=O(1/ξ)\psi_{2}(\xi)=O(1/\sqrt{\xi}) as ξ\xi\to\infty.

Before outlining the strategy of the proof, we first state an improved version of the concentration of measure inequality from Lemma 3.1 for the scaled process X_{N}(t)/N, which holds only if the starting state of the chain X_{N}(t) is close enough to x^{\star}N. Recall the interval of the form I(r)=[x^{\star}-r,x^{\star}+r] defined in (4.1). In the previous section we took r=\eta(N) in defining our good set \widehat{S}, so that the size of the good set shrinks as N becomes large. To prove the lower bound, we instead take r<J/6\lambda to be a positive constant, i.e. independent of N.

Lemma 5.2.

Fix h(0,1)h\in(0,1), r<J/6λr<J/6\lambda and ξ>0\xi>0. Let x0/NI(r2)x_{0}/N\in I(\frac{r}{2}) and let ttfollow=1JeC1Nht\leq t_{\rm{follow}}=\mbox{$\frac{1}{J}$}\lceil e^{C_{1}N^{h}}\rceil. Then the following holds for all NN large enough (depending on h,ξ,λ,μh,\xi,\lambda,\mu and ε\varepsilon):

x0(|XN(t)𝔼x0(XN(t))|2ξN)3eJξ23(λ+μ+ε).\mathbb{P}_{x_{0}}\left(\left\lvert X_{N}(t)-\mathbb{E}_{x_{0}}(X_{N}(t))\right\rvert\geq 2\xi\sqrt{N}\right)\leq 3e^{-\frac{J\xi^{2}}{3(\lambda+\mu+\varepsilon)}}.

The lemma essentially follows from an application of [6, Theorem 3.3], but the details are technical, so we postpone the proof until the end of this section. Assuming that Lemma 5.2 holds, the proof of Lemma 5.1 is then broken down into two steps.

The first step is taken in Lemma 5.3, where we show that the stationary distribution \pi_{N} places most of its mass in a ball of order \sqrt{N} around the fixed point x^{\star}N. The explicit expression for the stationary distribution can be obtained by solving the detailed balance equations (see, for example, displays (A1) and (A2) in [30]), but it is not easy to read off the concentration from this expression. The outline of our strategy is to first fix \xi>0 and choose N large enough that t_{N}+\xi<t_{\rm{follow}}. We use the improved concentration inequality of Lemma 5.2 to show that the distribution of X_{N}(t_{N}+\xi) is concentrated in a 2\xi\sqrt{N}-ball around \mathbb{E}_{x_{0}}(X_{N}(t_{N}+\xi)) given x_{0}/N\in I(\frac{r}{2}). Next, the first part of Lemma 3.2 implies that \mathbb{E}_{x_{0}}(X_{N}(t_{N}+\xi)) is within e^{-J\xi}K_{1}\sqrt{N} of x^{\star}N. Together, these imply that X_{N}(t_{N}+\xi) lies in a (e^{-J\xi}K_{1}+2\xi)\sqrt{N}-ball around x^{\star}N with high probability. Finally, since the distribution of X_{N} is close to stationarity by time t_{N}+\xi, as stated in Lemma 4.1, \pi_{N} also assigns most of its mass to a (e^{-J\xi}K_{1}+2\xi)\sqrt{N}-ball around x^{\star}N.

To prove the lower bound, it is sufficient to consider a single initial state. For convenience, we take the initial state to be \bar{x}:=\lfloor(x^{\star}+\frac{r}{2})N\rfloor. The second step of the proof of Lemma 5.1 is then to show that the distribution of X_{N}(t_{N}-\xi) given X_{N}(0)=\bar{x} is concentrated in a region sufficiently far from x^{\star}N; combined with the concentration of \pi_{N} around x^{\star}N, this gives a lower bound on the mixing time. The outline of the proof of this result is as follows. First fix \xi large enough that e^{J\xi}K_{2}-2\xi>e^{-J\xi}K_{1}+2\xi, where K_{1},K_{2} are as in Lemma 3.2. By the improved concentration inequality, the distribution of X_{N}(t_{N}-\xi) is concentrated in a 2\xi\sqrt{N}-ball around \mathbb{E}_{\bar{x}}(X_{N}(t_{N}-\xi)). The second part of Lemma 3.2 then implies that \mathbb{E}_{\bar{x}}(X_{N}(t_{N}-\xi)) is at least e^{J\xi}K_{2}\sqrt{N} above the fixed point x^{\star}N. Together, these two results indicate that, with high probability,

Xx¯(tNξ)xN(eJξK22ξ)N>(eJξK1+2ξ)N,X_{\bar{x}}(t_{N}-\xi)-x^{\star}N\geq(e^{J\xi}K_{2}-2\xi)\sqrt{N}>(e^{-J\xi}K_{1}+2\xi)\sqrt{N},

but by Lemma 5.3, we know that πN\pi_{N} assigns most of its mass within (eJξK1+2ξ)N(e^{-J\xi}K_{1}+2\xi)\sqrt{N} of xNx^{\star}N.

Now, with the outline in mind, we state and prove concentration for the stationary distribution πN\pi_{N}.

Lemma 5.3.

For any ξ>0\xi>0, define the set

𝒮N:={x{0,1,,N}:|xxN|(eJξK1+2ξ)N},\mathcal{S}_{N}:=\left\{x\in\{0,1,\ldots,N\}:\left\lvert x-x^{\star}N\right\rvert\leq(e^{-J\xi}K_{1}+2\xi)\sqrt{N}\right\},

where K1K_{1} is the constant defined in Lemma 3.2. One has

lim supNπN(𝒮Nc)ψ1(ξ)+3eJξ23(λ+μ+ε),\limsup_{N\to\infty}\pi_{N}(\mathcal{S}_{N}^{c})\leq\psi_{1}(\xi)+3e^{-\frac{J\xi^{2}}{3(\lambda+\mu+\varepsilon)}},

where ψ1(ξ)=O(1/ξ)\psi_{1}(\xi)=O(1/\sqrt{\xi}) is as defined in Lemma 4.1.

Proof.

Let x0/NI(r2)x_{0}/N\in I(\frac{r}{2}), let h(0,1)h\in(0,1) and let ξ>0\xi>0. Lemma 5.2 implies that, as long as NN is sufficiently large,

x0(|XN(tN+ξ)𝔼x0(XN(tN+ξ))|2ξN)13eJξ23(λ+μ+ε).\displaystyle\mathbb{P}_{x_{0}}\left(\left\lvert X_{N}\left(t_{N}+\xi\right)-\mathbb{E}_{x_{0}}\left(X_{N}(t_{N}+\xi)\right)\right\rvert\leq 2\xi\sqrt{N}\right)\geq 1-3e^{-\frac{J\xi^{2}}{3(\lambda+\mu+\varepsilon)}}.

Moreover, applying the first part of Lemma 3.2 with c=ξc=\xi, we obtain

|𝔼x0(XN(tN+ξ))xN|eJξK1N,\lvert\mathbb{E}_{x_{0}}(X_{N}(t_{N}+\xi))-x^{\star}N\rvert\leq e^{-J\xi}K_{1}\sqrt{N},

for NN large enough. It then follows that, for NN large enough,

x0\displaystyle\mathbb{P}_{x_{0}} (|XN(tN+ξ)xN|(2ξ+eJξK1)N)\displaystyle\left(\left\lvert X_{N}(t_{N}+\xi)-x^{\star}N\right\rvert\leq(2\xi+e^{-J\xi}K_{1})\sqrt{N}\right)
\displaystyle\geq\mathbb{P}_{x_{0}}\left(\left\lvert X_{N}(t_{N}+\xi)-\mathbb{E}_{x_{0}}\left(X_{N}(t_{N}+\xi)\right)\right\rvert\leq 2\xi\sqrt{N}\right)
13eJξ23(λ+μ+ε).\displaystyle\geq 1-3e^{-\frac{J\xi^{2}}{3(\lambda+\mu+\varepsilon)}}.

Thus, for any \xi>0 and N large enough, the above implies that

πN(𝒮Nc)\displaystyle\pi_{N}(\mathcal{S}_{N}^{c}) PNtN+ξ(x0,𝒮Nc)+ρN(tN+ξ)\displaystyle\leq P_{N}^{t_{N}+\xi}(x_{0},\,\mathcal{S}_{N}^{c})+\rho_{N}(t_{N}+\xi)
3eJξ23(λ+μ+ε)+ρN(tN+ξ),\displaystyle\leq 3e^{-\frac{J\xi^{2}}{3(\lambda+\mu+\varepsilon)}}+\rho_{N}(t_{N}+\xi),

where PNt(x0,)P_{N}^{t}(x_{0},\,\cdot) is the law of XN(t)X_{N}(t) given the initial state x0x_{0}. The result now follows by taking the limsup and applying Lemma 4.1. ∎

We can now prove our lower bound on the mixing time.

Proof of Lemma 5.1.

Fix a large enough ξ>0\xi>0 such that eJξK22ξ>eJξK1+2ξe^{J\xi}K_{2}-2\xi>e^{-J\xi}K_{1}+2\xi and let x¯=(x+r2)N\bar{x}=\lfloor(x^{\star}+\frac{r}{2})N\rfloor. It follows from the second part of Lemma 3.2 by taking c=ξc=-\xi that

𝔼x¯(XN(tNξ))xNeJξK2N,\mathbb{E}_{\bar{x}}(X_{N}(t_{N}-\xi))-x^{\star}N\geq e^{J\xi}K_{2}\sqrt{N},

for all NN large enough. Moreover, note that x¯/NI(r2)\bar{x}/N\in I(\frac{r}{2}) and so for any NN such that tNξtfollowt_{N}-\xi\leq t_{\rm{follow}}, Lemma 5.2 implies that

\displaystyle\mathbb{P}_{\bar{x}}\left(\left\lvert X_{N}(t_{N}-\xi)-\mathbb{E}_{\bar{x}}\left(X_{N}(t_{N}-\xi)\right)\right\rvert\leq 2\xi\sqrt{N}\right)\geq 1-3e^{-\frac{J\xi^{2}}{3(\lambda+\mu+\varepsilon)}},

for all NN large enough. Combining the previous two relations gives

PNtNξ(x¯,𝒮N)3eJξ23(λ+μ+ε),P_{N}^{t_{N}-\xi}(\bar{x},\mathcal{S}_{N})\leq 3e^{-\frac{J\xi^{2}}{3(\lambda+\mu+\varepsilon)}}, (5.1)

where 𝒮N\mathcal{S}_{N} is the set defined in Lemma 5.3. Now,

\displaystyle\rho_{N}(t_{N}-\xi)\geq\lVert P_{N}^{t_{N}-\xi}(\bar{x},\cdot)-\pi_{N}\rVert_{TV}\geq\lvert\pi_{N}(\mathcal{S}_{N})-P_{N}^{t_{N}-\xi}(\bar{x},\mathcal{S}_{N})\rvert,

so that (5.1) and Lemma 5.3 imply that for all sufficiently large ξ\xi,

lim infNρN(tNξ)1ψ1(ξ)6eJξ23(λ+μ+ε).\liminf_{N\to\infty}\rho_{N}(t_{N}-\xi)\geq 1-\psi_{1}(\xi)-6e^{-\frac{J\xi^{2}}{3(\lambda+\mu+\varepsilon)}}.\qed

It remains to prove Lemma 5.2. To do this, we make use of [6, Theorem 3.3], which provides a concentration inequality for contracting Markov chains.

Theorem 5.4.

[6, Theorem 3.3] Let X(t)X(t) be a stable, conservative, non-explosive continuous-time Markov chain on a discrete state space EE, with generator matrix Q:=(Q(x,y):x,yE)Q:=(Q(x,y):x,y\in E). Suppose that d(,)d(\cdot,\,\cdot) is a metric on EE, and let f:Ef:E\to\mathbb{R} be a function such that, for some constant LL, |f(x)f(y)|Ld(x,y)\lvert f(x)-f(y)\rvert\leq Ld(x,y) for all x,yEx,y\in E. Let E^\widehat{E} be a subset of EE, and let qq and DD be constants such that Q(x,x)q-Q(x,x)\leq q for all xE^x\in\widehat{E} and d(x,y)Dd(x,\,y)\leq D whenever xE^x\in\widehat{E} and yy is such that Q(x,y)>0Q(x,y)>0. For t>0t>0, let At={X(s)E^ for 0s<t}A_{t}=\{X(s)\in\widehat{E}\text{ for }0\leq s<t\}. Suppose there is a Markovian coupling of two copies of X(t)X(t) with generator 𝒜\mathcal{A} and constant >0\ell>0, such that for all x,yEx,y\in E,

𝒜d(x,y)d(x,y).\mathcal{A}d(x,y)\leq-\ell d(x,y). (5.2)

Then, for all xE^,t>0x\in\widehat{E},t>0 and ξ0\xi\geq 0,

x({|f(X(t))𝔼x(f(X(t)))|ξ}At)2exp(ξ2qL2D2/+2LDξ/3).\mathbb{P}_{x}\left(\left\{\left\lvert f(X(t))-\mathbb{E}_{x}(f(X(t)))\right\rvert\geq\xi\right\}\cap A_{t}\right)\leq 2\exp\left(-\frac{\xi^{2}}{qL^{2}D^{2}/\ell+2LD\xi/3}\right).

We are not able to apply this theorem directly to our chain X_{N}(t), since it is not contracting on the entire state space. One could modify the theorem to obtain a version that requires the contraction property to hold only on a “good” subset of the state space, as is the case for X_{N}(t) near x^{\star}N. (Indeed, [6, Theorem 2.3] is stated in such a form, but for discrete time.) Instead, we avoid this issue by making use of the reflected chain \overline{X}_{N}(t) defined for the final phase immediately after Proposition 4.7, which behaves as X_{N}(t) in a good set and reflects at the boundary. Here, with a slight abuse of notation, we define \overline{X}_{N}(t) to be the reflected chain on the state space E:=NI(r)\cup\{\lfloor(x^{\star}+r)N\rfloor+1\}\cup\{\lceil(x^{\star}-r)N\rceil-1\}, where \lfloor(x^{\star}+r)N\rfloor+1 and \lceil(x^{\star}-r)N\rceil-1 are the upper and lower boundary states respectively, and r<J/6\lambda is a fixed positive constant.

Proof of Lemma 5.2.

Recall the definition of \tau^{X}_{\rm{exit}}(I(r)) from (4.2). Just as we argued in the passage following (4.10), it is sufficient to prove the concentration for \overline{X}_{N}(t) on A_{t}=\{\tau^{\overline{X}}_{\rm{exit}}(I(r))\geq t\}, where \tau^{\overline{X}}_{\rm{exit}}(I(r)):=\inf\{t\geq 0:\overline{X}_{N}(t)\in\{\lfloor(x^{\star}+r)N\rfloor+1\}\cup\{\lceil(x^{\star}-r)N\rceil-1\}\}. To apply Theorem 5.4, we take f to be the identity function and the metric d(x,y)=\lvert x-y\rvert, so that L=1. The jump size of \overline{X}_{N}(t) is clearly bounded by D=1, and we have, for all x\in\{0,\ldots,N\},

Q(x,x)λx(1x/N)+ε(Nx)+μx(λ+μ+ε)N:=q.-Q(x,x)\leq\lambda x(1-x/N)+\varepsilon(N-x)+\mu x\leq(\lambda+\mu+\varepsilon)N:=q.

Finally, we show that there is a coupling (\overline{W}_{N}(t),\overline{Z}_{N}(t)) of two copies of \overline{X}_{N}(t) which satisfies (5.2) for all (\overline{W}_{N}(0),\overline{Z}_{N}(0))=(w,z)\in E\times E, and hence \overline{X}_{N}(t) is contracting in the Wasserstein distance. The coupling is the one used earlier in Section 4.3: the copies move independently until they collide, and then they move in unison. Verifying (5.2) for this coupling is very similar to verifying Condition (i) in Proposition 4.7.
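The contraction computation can also be checked numerically. Before collision, the independent coupling moves the distance z-w up at rate \mathcal{I}_{z}+\mathcal{C}_{w} and down at rate \mathcal{I}_{w}+\mathcal{C}_{z}, so the drift of the distance is (\mathcal{I}_{z}-\mathcal{I}_{w})-(\mathcal{C}_{z}-\mathcal{C}_{w}), which should be at most -(J-2\lambda r)(z-w) on NI(r). The sketch below verifies this on a grid, with illustrative parameter values and with J=2\lambda x^{\star}-(\lambda-\mu-\varepsilon) taken to be the drift linearisation at x^{\star} (assumptions for illustration only):

```python
import math

# Illustrative parameters (not from the paper).
lam, mu, eps = 2.0, 1.0, 0.1
N = 1000

b = lam - mu - eps
x_star = (b + math.sqrt(b * b + 4 * lam * eps)) / (2 * lam)  # stable fixed point
J = 2 * lam * x_star - b            # negative of the drift derivative at x_star
r = 0.9 * J / (6 * lam)             # a fixed constant r < J/(6*lam)

def I(x):  # infection rate, including self-infection
    return lam * x * (1 - x / N) + eps * (N - x)

def C(x):  # cure rate
    return mu * x

lo = math.ceil((x_star - r) * N)
hi = math.floor((x_star + r) * N)

# Generator of the independent coupling applied to the distance z - w
# (before collision): the distance increases at rate I_z + C_w and decreases
# at rate I_w + C_z, so the drift is (I_z - I_w) - (C_z - C_w).
for w in range(lo, hi + 1):
    for z in range(w + 1, hi + 1):
        drift = (I(z) - I(w)) - (C(z) - C(w))
        assert drift <= -(J - 2 * lam * r) * (z - w) + 1e-9
print("Wasserstein contraction verified on N*I(r)")
```

This is exactly the inequality behind (5.2) with \ell=J/2: since 2\lambda r<J/3, the contraction rate J-2\lambda r stays bounded away from zero.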

To compute the left-hand side of (5.2), first note that if the two copies have already collided, then the left-hand side is 0. Otherwise, it is the same as in (4.13), with \eta replaced by r. Since we have chosen r to be less than J/6\lambda, for sufficiently large N we have J-2\lambda r-2\lambda N^{-1}\geq J/2. Therefore, (5.2) is satisfied with \ell=J/2. Then, Theorem 5.4 implies that for all x_{0}\in\widehat{E}:=NI(r), t\geq 0, and \xi>0,

¯x0({|X¯N(t)𝔼¯x0(X¯N(t))|ξN}{τexitX¯(I(r))t})2eξ22(λ+μ+ε)/J+2ξ/(3N).\displaystyle\overline{\mathbb{P}}_{x_{0}}\left(\left\{\left\lvert\overline{X}_{N}(t)-\overline{\mathbb{E}}_{x_{0}}(\overline{X}_{N}(t))\right\rvert\geq\xi\sqrt{N}\right\}\cap\{\tau^{\overline{X}}_{\rm{exit}}(I(r))\geq t\}\right)\leq 2e^{-\frac{\xi^{2}}{2(\lambda+\mu+\varepsilon)/J+2\xi/(3\sqrt{N})}}.

Moreover, since XN(t)X_{N}(t) and X¯N(t)\overline{X}_{N}(t) are path-wise indistinguishable in probability before leaving NI(r)NI(r), we have

|𝔼x0(XN(t))𝔼¯x0(X¯N(t))|\displaystyle\big{\lvert}\mathbb{E}_{x_{0}}(X_{N}(t))-\overline{\mathbb{E}}_{x_{0}}(\overline{X}_{N}(t))\big{\rvert} =|𝔼x0(XN(t)𝟙τexitX(I(r))<t)𝔼¯x0(X¯N(t)𝟙τexitX¯(I(r))<t)|\displaystyle=\left\lvert\mathbb{E}_{x_{0}}\left(X_{N}(t)\mathbbm{1}_{\tau^{X}_{\rm{exit}}(I(r))<t}\right)-\overline{\mathbb{E}}_{x_{0}}\left(\overline{X}_{N}(t)\mathbbm{1}_{\tau^{\overline{X}}_{\rm{exit}}(I(r))<t}\right)\right\rvert
Nx0(τexitX(I(r))<t).\displaystyle\leq N\mathbb{P}_{x_{0}}(\tau^{X}_{\rm{exit}}(I(r))<t).

The second part of Lemma 4.3 implies that for all ttfollowt\leq t_{\rm{follow}}, x0/NI(r2)I(r)x_{0}/N\in I(\frac{r}{2})\subset I(r) and all sufficiently large NN, we have

x0(τexitX(I(r))<t)x0(τexitX(I(r))tfollow)4eC1Nh,\mathbb{P}_{x_{0}}(\tau^{X}_{\rm{exit}}(I(r))<t)\leq\mathbb{P}_{x_{0}}(\tau^{X}_{\rm{exit}}(I(r))\leq t_{\rm{follow}})\leq 4e^{-C_{1}N^{h}}, (5.3)

and hence

|𝔼x0(XN(t))𝔼¯x0(X¯N(t))|=o(1).\left\lvert\mathbb{E}_{x_{0}}(X_{N}(t))-\overline{\mathbb{E}}_{x_{0}}(\overline{X}_{N}(t))\right\rvert=o(1).

Now, for any fixed ξ>0\xi>0, choose NN large enough such that |𝔼x0(XN(t))𝔼¯x0(X¯N(t))|<ξN\left\lvert\mathbb{E}_{x_{0}}(X_{N}(t))-\overline{\mathbb{E}}_{x_{0}}(\overline{X}_{N}(t))\right\rvert<\xi\sqrt{N}. It follows that

x0\displaystyle\mathbb{P}_{x_{0}} ({|XN(t)𝔼x0(XN(t))|2ξN}{τexitX(I(r))t})\displaystyle\left(\left\{\left\lvert X_{N}(t)-\mathbb{E}_{x_{0}}(X_{N}(t))\right\rvert\geq 2\xi\sqrt{N}\right\}\cap\{\tau^{X}_{\rm{exit}}(I(r))\geq t\}\right)
¯x0({|X¯N(t)𝔼¯x0(X¯N(t))|ξN}{τexitX¯(I(r))t})2eξ22(λ+μ+ε)/J+2ξ/(3N).\displaystyle\leq\overline{\mathbb{P}}_{x_{0}}\left(\left\{\left\lvert\overline{X}_{N}(t)-\overline{\mathbb{E}}_{x_{0}}(\overline{X}_{N}(t))\right\rvert\geq\xi\sqrt{N}\right\}\cap\{\tau^{\overline{X}}_{\rm{exit}}(I(r))\geq t\}\right)\leq 2e^{-\frac{\xi^{2}}{2(\lambda+\mu+\varepsilon)/J+2\xi/(3\sqrt{N})}}.

Combining this with (5.3), we obtain

x0(|XN(t)𝔼x0(XN(t))|2ξN)\displaystyle\mathbb{P}_{x_{0}}\left(\left\lvert X_{N}(t)-\mathbb{E}_{x_{0}}(X_{N}(t))\right\rvert\geq 2\xi\sqrt{N}\right) 2eξ22(λ+μ+ε)/J+2ξ/(3N)+4eC1Nh\displaystyle\leq 2e^{-\frac{\xi^{2}}{2(\lambda+\mu+\varepsilon)/J+2\xi/(3\sqrt{N})}}+4e^{-C_{1}N^{h}}
3eJξ23(λ+μ+ε),\displaystyle\leq 3e^{-\frac{J\xi^{2}}{3(\lambda+\mu+\varepsilon)}},

for NN large enough, as required; indeed, the exponent in the first term converges to Jξ2/(2(λ+μ+ε))J\xi^{2}/(2(\lambda+\mu+\varepsilon)) as N→∞N\to\infty, which exceeds Jξ2/(3(λ+μ+ε))J\xi^{2}/(3(\lambda+\mu+\varepsilon)), while the second term decays faster than any fixed exponential in NN. ∎
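As an illustrative sanity check (not part of the proof), the following sketch simulates the logistic SIS chain with self-infection via a Gillespie scheme and examines the fluctuations of the infected count at scale √N, in line with the concentration bound above. The rate parametrisation (from state kk: infection at rate (Nk)(λk/N+ε)(N-k)(\lambda k/N+\varepsilon), recovery at rate μk\mu k) is our reading of the standard ε-SIS model on the complete graph, and all parameter values below are illustrative choices, not ones taken from the paper.

```python
import math
import random

def simulate_sis(N, lam, mu, eps, k0, t_end, rng):
    """Gillespie simulation of a logistic SIS chain with self-infection.

    Assumed rates from state k (number infected, 0 <= k <= N):
      k -> k + 1 at rate (N - k) * (lam * k / N + eps)  # contact + self-infection
      k -> k - 1 at rate mu * k                         # recovery
    Returns the state at time t_end.
    """
    k, t = k0, 0.0
    while True:
        up = (N - k) * (lam * k / N + eps)
        down = mu * k
        total = up + down
        if total == 0.0:
            return k  # absorbing (cannot occur when eps > 0)
        t += rng.expovariate(total)  # exponential holding time
        if t > t_end:
            return k
        if rng.random() * total < up:
            k += 1
        else:
            k -= 1

if __name__ == "__main__":
    # Illustrative parameters: supercritical (lam > mu), small self-infection eps.
    N, lam, mu, eps = 400, 2.0, 1.0, 0.1
    rng = random.Random(2024)
    samples = [simulate_sis(N, lam, mu, eps, N // 2, 10.0, rng) for _ in range(200)]
    mean = sum(samples) / len(samples)
    sd = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))
    # Empirically, fluctuations around the mean are of order sqrt(N),
    # consistent with the concentration estimate.
    print(f"mean = {mean:.1f}, sd / sqrt(N) = {sd / math.sqrt(N):.2f}")
```

With these parameters the empirical mean sits near the stable equilibrium of the deterministic drift, i.e. near the root of (1x)(λx+ε)=μx(1-x)(\lambda x+\varepsilon)=\mu x, while the sample standard deviation stays bounded relative to √N.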

References

  • [1] Massimo A Achterberg, Bastian Prasse and Piet Van Mieghem “Analysis of continuous-time Markovian ε\varepsilon-SIS epidemics on networks” In Physical Review E 105.5, 2022, pp. 054305
  • [2] David Aldous and Persi Diaconis “Shuffling cards and stopping times” In The American Mathematical Monthly 93.5 Taylor & Francis, 1986, pp. 333–348
  • [3] William F Ames and BG Pachpatte “Inequalities for differential and integral equations” Elsevier, 1997
  • [4] Håkan Andersson and Boualem Djehiche “A threshold limit theorem for the stochastic logistic epidemic” In Journal of Applied Probability 35.03, 1998, pp. 662–670
  • [5] A. D. Barbour “Quasi–stationary distributions in Markov population processes” In Advances in Applied Probability 8.2 Cambridge University Press, 1976, pp. 296–314
  • [6] A. D. Barbour, Graham Brightwell and Malwina Luczak “Long-term concentration of measure and cut-off” In Stochastic Processes and their Applications 152 Elsevier, 2022, pp. 378–423
  • [7] A. D. Barbour and MJ Luczak “A law of large numbers approximation for Markov population processes with countably many types” In Probability Theory and Related Fields 153.3 Springer, 2012, pp. 727–757
  • [8] Riddhipratim Basu, Jonathan Hermon and Yuval Peres “Characterization of cutoff for reversible Markov chains” In Proceedings of the twenty-sixth annual ACM-SIAM symposium on Discrete algorithms, 2014, pp. 1774–1791 SIAM
  • [9] William E Boyce, Richard C DiPrima and Hugo Villagómez Velázquez “Elementary differential equations and boundary value problems”, 2004
  • [10] Graham Brightwell, Thomas House and Malwina Luczak “Extinction times in the subcritical stochastic SIS logistic epidemic” In Journal of Mathematical Biology 77.2 Springer, 2018, pp. 455–493
  • [11] Guan-Yu Chen and Laurent Saloff-Coste “On the mixing time and spectral gap for birth and death chains” In ALEA Latin American Journal of Probability and Mathematical Statistics 10.1, 2013, pp. 293–321
  • [12] Guan-Yu Chen and Laurent Saloff-Coste “Spectral computations for birth and death chains” In Stochastic Processes and their Applications 124.1, 2014, pp. 848–882
  • [13] Guan-Yu Chen and Laurent Saloff-Coste “Computing cutoff times of birth and death chains” In Electronic Journal of Probability 20, 2015, paper no. 76, 47 pp.
  • [14] Persi Diaconis and Mehrdad Shahshahani “Generating a random permutation with random transpositions” In Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 57.2 Springer, 1981, pp. 159–179
  • [15] Jian Ding, Eyal Lubetzky and Yuval Peres “Total variation cutoff in birth-and-death chains” In Probability Theory and Related Fields 146.1-2, 2010, pp. 61–85
  • [16] Charles R. Doering, Khachik V. Sargsyan and Leonard M. Sander “Extinction Times for Birth-Death Processes: Exact Results, Continuum Asymptotics, and the Failure of the Fokker–Planck Approximation” In Multiscale Modeling & Simulation 3.2, 2005, pp. 283–299
  • [17] Alexandros Eskenazis and Evita Nestoridi “Cutoff for the Bernoulli–Laplace urn model with o(n)o(n) swaps” In Annales de l’Institut Henri Poincaré, Probabilités et Statistiques 56.4, 2020, pp. 2621–2639 Institut Henri Poincaré
  • [18] Willy Feller “Die Grundlagen der Volterraschen Theorie des Kampfes ums Dasein in wahrscheinlichkeitstheoretischer Behandlung” In Acta Biotheoretica 5.1, 1939, pp. 11–40
  • [19] Eric Foxall “Extinction time of the logistic process” In Journal of Applied Probability 58.3 Cambridge University Press, 2021, pp. 637–676
  • [20] Alison L. Hill, David G. Rand, Martin A. Nowak and Nicholas A. Christakis “Infectious Disease Modeling of Social Contagion in Networks” In PLOS Computational Biology 6.11 Public Library of Science, 2010, pp. 1–15
  • [21] Matthew James Keeling and Joshua V Ross “On methods for studying stochastic disease dynamics” In Journal of the Royal Society Interface 5.19 The Royal Society London, 2008, pp. 171–181
  • [22] Thomas G Kurtz “Limit theorems for sequences of jump Markov processes approximating ordinary differential processes” In Journal of Applied Probability 8.2 Cambridge University Press, 1971, pp. 344–356
  • [23] David A Levin and Yuval Peres “Markov chains and mixing times” American Mathematical Soc., 2017
  • [24] Fabio Lopes and Malwina Luczak “Extinction time for the weaker of two competing SIS epidemics” In The Annals of Applied Probability 30.6 Institute of Mathematical Statistics, 2020, pp. 2880–2922
  • [25] Ingemar Nåsell “The quasi-stationary distribution of the closed endemic SIS model” In Advances in Applied Probability 28.3 Cambridge University Press, 1996, pp. 895–932
  • [26] Ingemar Nåsell “On the quasi-stationary distribution of the stochastic logistic epidemic” In Mathematical Biosciences 156.1-2, 1999, pp. 21–40
  • [27] Garrett T Nieddu, Eric Forgoston and Lora Billings “Characterizing outbreak vulnerability in a stochastic SIS model with an external disease reservoir” In Journal of the Royal Society Interface 19.192 The Royal Society, 2022, pp. 20220253
  • [28] Justin Salez “Cutoff for non-negatively curved Markov chains” In Journal of the European Mathematical Society 26.11, 2024, pp. 4375–4392
  • [29] Patricia Stone, Hilde Wilkinson-Herbots and Valerie Isham “A stochastic model for head lice infections” In Journal of Mathematical Biology 56 Springer, 2008, pp. 743–763
  • [30] Piet Van Mieghem “Explosive phase transition in susceptible-infected-susceptible epidemics with arbitrary small but nonzero self-infection rate” In Physical Review E 101.3 APS, 2020, pp. 032303
  • [31] Piet Van Mieghem and Eric Cator “Epidemics in networks with nodal self-infection and the epidemic threshold” In Physical Review E 86.1 APS, 2012, pp. 016116
  • [32] Piet Van Mieghem and Fenghua Wang “Time dependence of susceptible-infected-susceptible epidemics on networks with nodal self-infections” In Physical Review E 101.5 APS, 2020, pp. 052310
  • [33] George H. Weiss and Menachem Dishon “On the asymptotic behavior of the stochastic and deterministic models of an epidemic” In Mathematical Biosciences 11.3-4, 1971, pp. 261–265