Usefulness of the Age-Structured SIR Dynamics in Modelling COVID-19

Rohit Parasnis, Ryosuke Kato, Amol Sakhale, Massimo Franceschetti, Behrouz Touri
Department of Electrical and Computer Engineering, University of California San Diego
Email: rparasni,rkato,asakhale,mfranceschetti,[email protected]
Abstract

We examine the age-structured SIR model, a variant of the classical Susceptible-Infected-Recovered (SIR) model of epidemic propagation, in the context of COVID-19. In doing so, we provide a theoretical basis for the model, perform an empirical validation, and discover the limitations of the model in approximating arbitrary epidemics. We first establish the differential equations defining the age-structured SIR model as the mean-field limits of a continuous-time Markov process that models epidemic spreading on a social network involving random, asynchronous interactions. We then show that, as the population size grows, the infection rate for any pair of age groups converges to its mean-field limit if and only if the edge update rate of the network approaches infinity, and we show how the rate of mean-field convergence depends on the edge update rate. We then propose a system identification method for parameter estimation of the bilinear ODEs of our model, and we test the model performance on a Japanese COVID-19 dataset by generating the trajectories of the age-wise numbers of infected individuals in the prefecture of Tokyo for a period of over 365 days. In the process, we also develop an algorithm to identify the different phases of the pandemic, each phase being associated with a unique set of contact rates. Our results show a good agreement between the generated trajectories and the observed ones.

1 INTRODUCTION

The global COVID-19 death toll has crossed 6 million [1], and it is no surprise that researchers all over the world have been forecasting the evolution of this pandemic to propose control policies aimed at minimizing its medical and economic impacts [2, 3, acemoglu2020optimal, 4, 5, 6]. Their efforts have typically relied on classical epidemiological models or their variants (for an overview see [7] and the references therein). One such classical epidemic model is the Susceptible-Infected-Recovered (SIR) model. Proposed in [8], the SIR model is a compartmental model in which every individual belongs to one of three possible states at any given time instant: the susceptible state, the infected state, and the recovered state. The continuous-time SIR dynamics models the time-evolution of the fraction of individuals in any of these states using a set of ordinary differential equations (ODEs) parameterized by two quantities: the infection rate (the rate at which a given infected individual infects a given susceptible individual) and the recovery rate of infected individuals.

Even though the continuous-time SIR model is a deterministic model, it models an inherently random phenomenon in a large (but discrete) population. To bridge between the deterministic continuous-time SIR model and the underlying random processes over a finite population, researchers have shown that the associated (continuous-time) ODEs are the mean-field limits of continuous-time Markovian epidemic processes over a finite population [9, 10]. Similar results have been obtained for variants of the original model, such as for the SIR dynamics on a configuration model network [11, 12]. These results theoretically justify the SIR model ODEs.

Classical SIR models (both continuous-time and discrete-time), however, are homogeneous: the same infection and recovery rates apply to the whole social network despite differences in the individuals’ age, gender, race, immunity level, and pre-existing medical conditions. For COVID-19, this assumption is inconsistent with studies showing that the contact rates between individuals and the recovery rates of infected individuals depend on factors such as age and location [13, 14, 15, 16]. In addition, [17] argues that homogeneous models can introduce significant biases in forecasting the epidemic, including overestimation of the number of infections required to achieve herd immunity, overestimation of the strictness of optimal control policies, overestimation of the impact of policy relaxations, and incorrect estimation of the time of onset of the pandemic.

We therefore need to shift our focus to variants of the classical SIR model with heterogeneous contact rates. Examples include the multi-risk SIR model [4] and the age-stratified SIR models considered in [18, 19, 20], in which the population is partitioned into multiple groups and the rates of infection and recovery vary across groups. See [17] for a survey of these papers.

However, the models considered in the above works have two main shortcomings. On the one hand, barring exceptions such as [21], they are typically not validated using real data. On the other hand, they do not have a strong theoretical foundation because the dynamical processes studied in these works have not been established as the mean field limits of stochastic epidemic processes evolving on time-varying random graphs. We emphasize that even the convergence results obtained for homogeneous SIR models [9, 10, 11, 12] make the unrealistic assumption that the network of physical contacts (in-person interactions) existing in the population is time-invariant. As such, we cannot justify the use of these models in designing optimal control policies aimed at minimizing the impact of any epidemic. We therefore address the aforementioned shortcomings using the age-structured SIR model, a multi-group SIR model that partitions the population of a given region into different age groups and assigns different infection rates and recovery rates to the age groups. We note that, although we adopt the term age-structured in our paper, our analysis also applies to populations partitioned on the basis of differences in geographical location, sex, immunity level, etc. Moreover, among existing heterogeneous models [17], the age-structured SIR model is the simplest and hence more mathematically and computationally tractable than other models.

The contributions of this paper are as follows:

  1. Modeling: We extend our previously proposed stochastic epidemic model [22] to a more general model that incorporates (a) a random and time-varying network of physical contacts (in-person interactions between pairs of individuals) that are updated asynchronously and at random times, (b) random transmissions of disease-causing pathogens from infected individuals to their susceptible neighbors, and (c) recoveries of infected individuals that occur at random times. We analyze the resulting dynamics and show that under certain independence assumptions, the expected trajectories of the fractions of susceptible/infected/recovered individuals in any age group converge in mean-square to the solutions of the age-structured SIR ODEs as the population size goes to infinity.

  2. Convergence Rate Analysis: We derive a lower bound on the effective infection rate for a given pair of age groups in the stochastic model. This bound, as we show, is approximately linear in the reciprocal of the network update rate, which leads to the infection rate converging to its limit (specified by the ODEs) as fast as the reciprocal of the network update rate vanishes.

  3. Validation: We validate our age-structured model empirically by estimating the parameters of our model using a Japanese COVID-19 dataset and, subsequently, by generating the age-wise numbers of infected individuals as functions of time. In this process, we leverage the crucial fact that the ODEs defining our model are linear in the model parameters (transmission and recovery rates), which enables us to use a least-squares method for the system identification.

  4. A Method to Detect Changes in Social Behavior: We design a simple algorithm that can be used to detect changes in social behavior throughout the duration of the pandemic. Given the age-wise daily infection counts, the algorithm estimates the dates around which the inter-age-group contact rates change significantly.

  5. Insights into Epidemic Spreading: We interpret the results of our phase detection algorithm to identify the least and the most infectious age groups and the least and the most vulnerable age groups. Additionally, we analyze the data for the entire period from March 2020 to April 2021 to explain how certain social events influenced the propagation of COVID-19 in the prefecture of Tokyo.

The structure of our paper is as follows: We introduce the age-structured SIR model and our stochastic epidemic model in Section 2. We establish the age-structured SIR ODEs as the mean-field limits of our stochastic model in Section 3. We also discuss the limitations of our model, in the form of a converse result, in Section 3. Next, we describe the empirical validation of our model (in the context of the COVID-19 outbreak in Tokyo) in Section 5. We conclude with a brief summary and future directions in Section 6.

Related Works: [23] proposes a heterogeneous epidemic model with time-varying parameters to show that heterogeneous susceptibility to infection results in a temporary weakening of the COVID-19 pandemic but not in herd immunity. The model is validated using the death tolls (and not the case numbers) reported for New York and Chicago for a period of about 80 days. [19] uses the age-structured SEIQRD model to predict the number of deaths with a reasonable accuracy, but unlike our work, it does not use the proposed model to generate the number of new cases as a function of time. [24] uses heterogeneous variants of the SEIR model to study the impact of the lockdown policy implemented in France, but it does not validate these models empirically. [15] reports contact rate matrices for the population of the UK based on the self-reported data of 36,000 volunteers. However, the study ignores the time-varying nature of these contact rates, which we capture in our phase detection algorithm (Section 5). Another study that uses time-invariant model parameters is [25], which proposes the age-structured SEIRA model and uses it to simulate the number of new infections in different social groups of Chile.

[26] uses a heterogeneous SEIRD model to predict the effects of various relaxation policies on infection counts in certain regions of Italy. The model therein is empirically validated only using the data obtained during the first 60 days of the pandemic. In [20], the authors propose an age-structured SIRD model and calibrate it with the data obtained from [2]. Unlike our paper, however, [20] divides the population into only two age groups, and does not compare the model-generated values of the number of infections with the official case counts. Two other studies that use two-age-group SIR models are [27] and [28]. While [27] argues that in Florida, old and socially inert adults have been possibly infected by the young, [28] argues that age-group-targeted policies are more effective than uniform policies in reducing the economic impact of COVID-19. [29] proposes a heterogeneous SIR model with feedback and forecasts the economic and medical impacts of various policies aimed at controlling the pandemic in Chile. Unlike our study, however, [29] ignores the time-varying nature of contact rates. [21] proposes the SEIR-HC-SEC-AGE model, a heterogeneous SEIR model that sub-divides each age-group further into risk sectors with different vulnerabilities to the SARS-CoV-2 virus. The model therein, which is calibrated to predict the effects of different lockdown policies in certain regions of Italy, simulates the time-evolution of the observed death toll with a high accuracy. By contrast, we pick a much simpler heterogeneous model and examine whether it fits the observed case numbers well. [18] and [4] use an age-structured SIR model to show that control policies that target different age groups differently perform better than uniform policies. However, these results assume that inter-age-group contact rates are the same for all pairs of age groups, an assumption that is inconsistent with our empirical results (Section 5). 
Hence, deriving optimal policies in the framework of the age-structured SIR model under more general assumptions is an important open problem.

Notation: We let $\mathbb{N}$ denote the set of natural numbers and $\mathbb{N}_{0}:=\mathbb{N}\cup\{0\}$. We let $[\ell]:=\{1,2,\ldots,\ell\}$ for $\ell\in\mathbb{N}$. We denote the sets of real and positive real numbers by $\mathbb{R}$ and $\mathbb{R}_{+}$, respectively. For $x\in\mathbb{R}$, we let $x_{+}:=\max\{x,0\}$ denote the positive part of $x$.

The symbols $t$ and $k$ are used as continuous-time and discrete-time indices, respectively. We use the notation $z(t)$ for functions $z:\mathbb{R}_{+}\cup\{0\}\to\mathbb{R}$ and $z[k]$ for functions $z:\mathbb{N}\to\mathbb{R}$. We occasionally omit the time index $(t)$ when the value of $t$ is clear from the context.

We use the Bachmann-Landau asymptotic notation $O(f(n))$ for a given function $f:\mathbb{N}\rightarrow\mathbb{R}$ in the context of $n\rightarrow\infty$. We use $o(\Delta t)$ in the context of $\Delta t\rightarrow 0$. In addition, for a given function $g:[0,\infty)\rightarrow\mathbb{R}$, we use the notation $g^{\prime}=g^{\prime}(t)$ to denote $\frac{dg}{dt}$, the first derivative of $g$ with respect to time.

For a set $\mathcal{S}$, we let $|\mathcal{S}|$ denote the cardinality of $\mathcal{S}$. In this paper, all random events and random variables are with respect to a probability space $(\Omega,\mathcal{F},\text{Pr})$, where $\Omega$ is the sample space, $\mathcal{F}$ is the set of events, and $\text{Pr}(\cdot)$ is the probability measure on this space. We denote random variables and random events using capital letters, and for a random event $C$, we define $1_{C}$ to be the indicator random variable associated with $C$, i.e., $1_{C}:\Omega\to\mathbb{R}$ is the random variable with $1_{C}(\omega)=1$ if $\omega\in C$ and $1_{C}(\omega)=0$ otherwise. For an event $C\in\mathcal{F}$, $\bar{C}$ represents the complement of $C$. For a random variable $X$, $\mathbb{E}[X]$ denotes the expected value of $X$ and $\mathbb{E}[X\mid C]$ denotes the conditional expectation of $X$ given the event $C$. For random variables $X$ and $Y$ and a random event $C$, we define

\[
\mathbb{E}[X\mid Y,C]=\frac{\mathbb{E}[X1_{C}\mid Y]}{\mathbb{E}[1_{C}\mid Y]}.
\]

Therefore, for an event $F\in\mathcal{F}$,

\[
\text{Pr}(F\mid Y,C)=\mathbb{E}[1_{F}\mid Y,C]=\frac{\mathbb{E}[1_{F\cap C}\mid Y]}{\mathbb{E}[1_{C}\mid Y]}.
\]

We denote tuples of length $r>1$ using bold-face letters and random tuples using bold-face capital letters. For a tuple $\mathbf{x}$ of length $r\in\mathbb{N}$ and an index $\ell\in[r]$, we let $x_{\ell}=(\mathbf{x})_{\ell}$ denote the $\ell$-th entry of $\mathbf{x}$.

For $n\in\mathbb{N}$ and $E\subset[n]\times[n]$, we use $G=([n],E)$ to denote the directed graph (digraph) with vertex set $[n]$ and edge set $E$. Finally, for a graph $G=([n],E)$, given two distinct nodes $a,b\in[n]$, we let $\langle a,b\rangle:=(a-1)(n-1)+b-\chi_{b-a}$, where $\chi_{\alpha}=1$ if $\alpha>0$ and $\chi_{\alpha}=0$ otherwise. Note that $\langle\cdot,\cdot\rangle$ maps the edges between (distinct) nodes of the graph to the numbers $1,\ldots,n^{2}-n$ in lexicographic order.
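The edge-indexing map can be checked numerically. The following is a minimal sketch in Python; the function name `edge_index` and the choice $n=4$ are illustrative assumptions and do not come from the paper.

```python
def edge_index(a: int, b: int, n: int) -> int:
    """Return <a,b> = (a-1)(n-1) + b - chi_{b-a} for distinct nodes a, b in 1..n."""
    if a == b or not (1 <= a <= n and 1 <= b <= n):
        raise ValueError("a and b must be distinct nodes in 1..n")
    chi = 1 if b - a > 0 else 0  # chi_alpha = 1 iff alpha > 0
    return (a - 1) * (n - 1) + b - chi


# Hypothetical example with n = 4 nodes: enumerate ordered pairs of distinct
# nodes in lexicographic order and compute their indices.
n = 4
pairs = [(a, b) for a in range(1, n + 1) for b in range(1, n + 1) if a != b]
indices = [edge_index(a, b, n) for a, b in pairs]
```

For this example, the pairs (1,2), (1,3), ..., (4,3) receive the consecutive indices 1 through 12, which matches the range $1,\ldots,n^{2}-n$ stated above.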

2 PROBLEM FORMULATION

We now introduce two epidemic models, of which the first describes a deterministic dynamical system and the second describes a stochastic process on a finite population. One of the main objectives of this work is to relate these models, which is achieved in Section 3.

2.1 The Age-Structured SIR Model

Consider a population of individuals spanning $m$ age groups. (As mentioned before, throughout this paper, we could generalize the discussions involving age groups to subpopulations distinguished by geographical locations, pre-existing health conditions, sex, etc.) Suppose a part of this population contracts a communicable disease at time $t=0$. Let $s_{i}(t)$, $\beta_{i}(t)$, and $r_{i}(t)$ denote, respectively, the fractions of susceptible, infected, and recovered individuals in the $i$-th age group at (a continuous) time $t\geq 0$, so that $s_{i}(t)+\beta_{i}(t)+r_{i}(t)$ equals the fraction of individuals in the $i$-th age group for all $t\geq 0$. As the disease spreads across the population, susceptible individuals get infected, and infected individuals recover in accordance with the system of ODEs given by

\begin{align}
\dot{s}_{i}(t) &= -s_{i}(t)\sum_{j=1}^{m}A_{ij}\beta_{j}(t), \tag{1}\\
\dot{\beta}_{i}(t) &= s_{i}(t)\sum_{j=1}^{m}A_{ij}\beta_{j}(t)-\gamma_{i}\beta_{i}(t), \tag{2}\\
\dot{r}_{i}(t) &= \gamma_{i}\beta_{i}(t),
\end{align}

where for each $i,j\in[m]$, the constant $A_{ij}$ represents the rate of infection transmission from an individual in age group $j$ to an individual in age group $i$, and $\gamma_{i}$ denotes the recovery rate of an infected individual in age group $i$. Hereafter, we refer to $A_{ij}$ as the contact rate of age group $j$ with age group $i$. Note that the third equation in (1) can be obtained from the first two equations simply by using the fact that $\dot{s}_{i}(t)+\dot{\beta}_{i}(t)+\dot{r}_{i}(t)=0$ for all $t\geq 0$. Also, if $m=1$, the above model reduces to the classical (homogeneous and continuous-time) SIR model.
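To illustrate how these ODEs behave, the following minimal sketch integrates them with the forward Euler method in plain Python. The step size, the two-group contact matrix, the recovery rates, and the initial fractions are hypothetical values chosen for illustration only; they are not estimates from the paper.

```python
def simulate_age_sir(s0, beta0, A, gamma, dt=0.01, steps=5000):
    """Forward-Euler integration of the age-structured SIR ODEs.

    s0, beta0: initial susceptible/infected fractions per age group.
    A[i][j]:   contact rate of age group j with age group i.
    gamma[i]:  recovery rate of age group i.
    Returns a list of (s, beta, r) snapshots, one per time step.
    """
    m = len(s0)
    s, beta, r = list(s0), list(beta0), [0.0] * m
    trajectory = [(list(s), list(beta), list(r))]
    for _ in range(steps):
        # Force of infection on group i: sum_j A_ij * beta_j.
        force = [sum(A[i][j] * beta[j] for j in range(m)) for i in range(m)]
        new_inf = [s[i] * force[i] for i in range(m)]
        new_rec = [gamma[i] * beta[i] for i in range(m)]
        s = [s[i] - dt * new_inf[i] for i in range(m)]
        beta = [beta[i] + dt * (new_inf[i] - new_rec[i]) for i in range(m)]
        r = [r[i] + dt * new_rec[i] for i in range(m)]
        trajectory.append((list(s), list(beta), list(r)))
    return trajectory


# Hypothetical two-group example (all numbers are illustrative assumptions).
A = [[1.2, 0.4], [0.3, 0.8]]
gamma = [0.2, 0.1]
trajectory = simulate_age_sir([0.49, 0.49], [0.01, 0.01], A, gamma)
s_end, beta_end, r_end = trajectory[-1]
```

Because the three Euler increments for each group sum to zero, the per-group total $s_{i}+\beta_{i}+r_{i}$ stays constant along the computed trajectory up to floating-point error, mirroring the conservation property noted above.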

2.2 A Stochastic Epidemic Model

Let us now define a continuous-time Markov chain that describes an age-structured process of epidemic spreading occurring over a finite (atomic) population composed of individuals that are connected through a random, time-varying network $G(t)$.

2.2.1 Age Groups

Let $n\in\mathbb{N}$ denote the total population size, and let $[n]$ be the vertex set of the time-varying graph $G(t)$, so that the vertex set indexes all the individuals/nodes in the network. We assume that $[n]$ is partitioned into $m$ age groups $\{\mathcal{A}_{i}\}_{i=1}^{m}$ and that $|\mathcal{A}_{i}|$ (the number of individuals in the $i$-th age group) scales linearly with $n$ for all $i\in[m]$. In the following, $i,j\in[m]$ are generic age group indices.

2.2.2 State Space

The state space of our random process is the space $\mathbb{S}=\{-1,0,1\}^{n}\times\{0,1\}^{2n(n-1)}$. The network state is a tuple $\mathbf{x}=(x_{1},x_{2},\ldots,x_{2n^{2}-n})\in\mathbb{S}$, where

  (i) $\{x_{\ell}\}_{\ell\in[n]}$ denotes the disease states of the nodes in the network, i.e., for $\ell\in[n]$, we set $x_{\ell}=0$, $1$, or $-1$ according to whether node $\ell$ is susceptible, infected, or recovered, respectively.

  (ii) For $\ell\in\{n+1,n+2,\ldots,n^{2}\}$, we let $x_{\ell}$ denote the edge state of the $\ell$-th pair in the following lexicographic order of pairs of distinct nodes: $(1,2),\ldots,(1,n),(2,1),\ldots,(2,n),\ldots,(n,1),\ldots,(n,n-1)$. In other words, for any node pair $(a,b)\in[n]\times[n]$ such that $a\neq b$, we set $x_{\langle a,b\rangle}=1$ if there is a directed edge from $b$ to $a$ in the network $G$, and $x_{\langle a,b\rangle}=0$ otherwise. For notational convenience, we let $1_{(a,b)}(\mathbf{x}):=x_{\langle a,b\rangle}$.

  (iii) For $\ell\in\{n^{2}+1,\ldots,2n^{2}-n\}$, we let $x_{\ell}$ be a binary variable whose value flips (becomes $1-x_{\ell}$) whenever the $(\ell-n^{2})$-th edge state gets updated (re-initialized). However, the direction of this flip (whether $x_{\ell}$ changes from 0 to 1 or from 1 to 0) carries no significance.

2.2.3 State Attributes

For all $\mathbf{x}\in\mathbb{S}$, we let $\mathcal{S}_{i}(\mathbf{x}):=\{a\in\mathcal{A}_{i}:x_{a}=0\}$, $\mathcal{I}_{i}(\mathbf{x}):=\{a\in\mathcal{A}_{i}:x_{a}=1\}$, and $\mathcal{R}_{i}(\mathbf{x}):=\{a\in\mathcal{A}_{i}:x_{a}=-1\}$ denote, respectively, the set of susceptible individuals, the set of infected individuals, and the set of recovered individuals in $\mathcal{A}_{i}$ given that the network state is $\mathbf{x}$. We let $\mathcal{S}(\mathbf{x}):=\cup_{i=1}^{m}\mathcal{S}_{i}(\mathbf{x})$ and $\mathcal{I}(\mathbf{x}):=\cup_{i=1}^{m}\mathcal{I}_{i}(\mathbf{x})$. Additionally, for every node $a\in[n]$, we let $E_{j}^{(a)}(\mathbf{x}):=\sum_{c\in\mathcal{I}_{j}(\mathbf{x})}1_{(a,c)}(\mathbf{x})$ be the number of arcs from $\mathcal{I}_{j}(\mathbf{x})$ to $a$.

2.2.4 The Markov Process

Let $\mathbf{X}(t)\in\mathbb{S}$ denote the state of the network at any time $t\geq 0$. Then we assume that $\{\mathbf{X}(t):t\geq 0\}$ is a right-continuous time-homogeneous Markov process in which every transition from a state $\mathbf{x}\in\mathbb{S}$ to a state $\mathbf{y}\in\mathbb{S}\setminus\{\mathbf{x}\}$ belongs to one of the following categories:

  1. Infection transition: This occurs when a node $a\in\mathcal{S}_{i}(\mathbf{x})$ gets infected by a node in $\cup_{k=1}^{m}\mathcal{I}_{k}(\mathbf{x})$, while the disease states of all other nodes and the edge states of all the node pairs remain the same. In other words, $x_{a}=0$, $y_{a}=1$, and $x_{\ell}=y_{\ell}$ for all $\ell\neq a$. Denoting the state-independent rate of pathogen transmission from a node in $\mathcal{I}_{k}(\mathbf{x})$ to an adjacent node in $\mathcal{S}_{i}(\mathbf{x})$ by $B_{ik}$, we note that the rate of infection transmission from any node $c\in\mathcal{I}_{k}(\mathbf{x})$ to $a$ is $B_{ik}1_{(a,c)}(\mathbf{x})$. Hence, the total rate at which $a$ receives pathogens from $\mathcal{I}_{k}$ is $\sum_{c\in\mathcal{I}_{k}(\mathbf{x})}B_{ik}1_{(a,c)}(\mathbf{x})=B_{ik}E_{k}^{(a)}(\mathbf{x})$, assuming that different edges transmit the infection independently of each other during vanishingly small time intervals. As a result, the effective rate at which $a$ gets infected is $\sum_{k=1}^{m}B_{ik}E_{k}^{(a)}(\mathbf{x})$. We denote the successor state $\mathbf{y}$ of $\mathbf{x}$, where the node $a$ turns from susceptible to infected, by $\mathbf{x}_{\uparrow a}$.

  2. Recovery transition: This occurs when a node $a\in\mathcal{I}_{i}(\mathbf{x})$ recovers, i.e., $x_{a}=1$, $y_{a}=-1$, and $x_{\ell}=y_{\ell}$ for all $\ell\neq a$. We let $\gamma_{i}$ denote the rate at which an infected node in $\mathcal{A}_{i}$ (such as $a$) recovers. For such a transition, we denote $\mathbf{y}=\mathbf{x}_{\downarrow a}$.

  3. Edge update transition: This occurs when $x_{\langle a,b\rangle}$, the edge state of a node pair $(a,b)\in\mathcal{A}_{i}\times\mathcal{A}_{j}$, is updated or re-initialized, i.e., $y_{n^{2}+\langle a,b\rangle}=1-x_{n^{2}+\langle a,b\rangle}$, and $y_{\ell}=x_{\ell}$ for all $\ell\notin\{\langle a,b\rangle,n^{2}+\langle a,b\rangle\}$. We let $\lambda$ denote the edge update rate, i.e., the rate at which an edge state is updated. In addition, given that the edge state of $(a,b)$ is updated at time $t\geq 0$, the probability that $1_{(a,b)}(t)=1$ (i.e., the edge $(a,b)$ exists after the re-initialization) equals $\frac{\rho_{ij}}{n}$, where $\rho_{ij}>0$ is constant in time. Therefore, if $y_{\langle a,b\rangle}=1$ (meaning that $(a,b)$ exists as an arc in $G$ in the network state $\mathbf{y}$), then the rate of transition from $\mathbf{x}$ to $\mathbf{y}$ equals $\lambda\frac{\rho_{ij}}{n}$, whereas if $y_{\langle a,b\rangle}=0$, then the rate of transition from $\mathbf{x}$ to $\mathbf{y}$ equals $\lambda\left(1-\frac{\rho_{ij}}{n}\right)$. In the former case, we write $\mathbf{y}=\mathbf{x}_{\uparrow(a,b)}$, while in the latter case, we write $\mathbf{y}=\mathbf{x}_{\downarrow(a,b)}$. Note that the rate of transition from $\mathbf{x}$ to $\mathbf{x}_{\downarrow(a,b)}$ or $\mathbf{x}_{\uparrow(a,b)}$ does not depend on $\mathbf{x}$.

    The edge update transition of $(a,b)$ can be described informally as follows. Throughout the evolution of the pandemic, $a$ and $b$ decide whether or not to meet each other at a constant rate $\lambda>0$, i.e., their decision times $\{T_{\ell}^{(a,b)}\}_{\ell=1}^{\infty}$ form a Poisson process with rate $\lambda$. Each time they make such a decision, they decide to interact with probability $\frac{\rho_{ij}}{n}$, and they decide not to interact with probability $1-\frac{\rho_{ij}}{n}$, independently of their past decisions. The probability of interaction is assumed to scale inversely with $n$ so that the mean degree of every node is constant with respect to $n$.

To summarize, the rate of transition from any state $\mathbf{x}\in\mathbb{S}$ to any state $\mathbf{y}\in\mathbb{S}\setminus\{\mathbf{x}\}$ is given by $\mathbf{Q}$, the infinitesimal generator of the Markov chain $\{\mathbf{X}(t):t\geq 0\}$, where for $\mathbf{x}\neq\mathbf{y}$

\[
\mathbf{Q}(\mathbf{x},\mathbf{y}):=\begin{cases}
\sum_{k=1}^{m}B_{ik}E_{k}^{(a)}(\mathbf{x}) & \text{if }\mathbf{y}=\mathbf{x}_{\uparrow a}\text{ for some }a\in\mathcal{S}_{i}(\mathbf{x}),\ i\in[m],\\
\gamma_{i} & \text{if }\mathbf{y}=\mathbf{x}_{\downarrow a}\text{ for some }a\in\mathcal{I}_{i}(\mathbf{x}),\ i\in[m],\\
\lambda\frac{\rho_{ij}}{n} & \text{if }\mathbf{y}=\mathbf{x}_{\uparrow(a,b)}\text{ for some }(a,b)\in\mathcal{A}_{i}\times\mathcal{A}_{j},\ i,j\in[m],\\
\lambda\left(1-\frac{\rho_{ij}}{n}\right) & \text{if }\mathbf{y}=\mathbf{x}_{\downarrow(a,b)}\text{ for some }(a,b)\in\mathcal{A}_{i}\times\mathcal{A}_{j},\ i,j\in[m],\\
0 & \text{otherwise},
\end{cases}
\]

and $\mathbf{Q}(\mathbf{x},\mathbf{x}):=-\sum_{\mathbf{y}\in\mathbb{S}\setminus\{\mathbf{x}\}}\mathbf{Q}(\mathbf{x},\mathbf{y})$. In addition, we say that $\mathbf{y}$ succeeds $\mathbf{x}$ potentially iff $\mathbf{Q}(\mathbf{x},\mathbf{y})>0$.
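The three transition types can be simulated with a standard Gillespie-type scheme. The sketch below is one possible implementation in plain Python, not the authors' code. It uses 0-based node and group indices, omits the auxiliary flip bits (they do not affect the disease dynamics), and assumes that each edge is initially present with probability $\rho_{ij}/n$; the function name and all numerical values are hypothetical.

```python
import random


def simulate_epidemic(n, group, B, gamma, rho, lam, infected0, t_end, seed=0):
    """Gillespie-style simulation of the stochastic model (0-based indices).

    group[a] is the age group of node a; edge[a][b] == 1 encodes an arc from b to a.
    Returns (time, #susceptible, #infected, #recovered) records, one per
    infection or recovery event, preceded by the initial record.
    """
    rng = random.Random(seed)
    state = [0] * n  # 0: susceptible, 1: infected, -1: recovered
    for a in infected0:
        state[a] = 1
    # Assumption: each edge starts present with probability rho_ij / n.
    edge = [[0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            if a != b and rng.random() < rho[group[a]][group[b]] / n:
                edge[a][b] = 1

    t = 0.0
    history = [(t, state.count(0), state.count(1), state.count(-1))]
    edge_total = lam * n * (n - 1)  # every ordered pair is updated at rate lam
    while True:
        # Infection rate of a susceptible node a: sum_k B_ik * E_k^(a)(x).
        inf = [
            sum(B[group[a]][group[c]] * edge[a][c] for c in range(n) if state[c] == 1)
            if state[a] == 0 else 0.0
            for a in range(n)
        ]
        # Recovery rate of an infected node in group i: gamma_i.
        rec = [gamma[group[a]] if state[a] == 1 else 0.0 for a in range(n)]
        inf_total, rec_total = sum(inf), sum(rec)
        total = inf_total + rec_total + edge_total
        t += rng.expovariate(total)  # exponential holding time
        if t >= t_end:
            break
        u = rng.random() * total
        if u < inf_total:
            a = rng.choices(range(n), weights=inf)[0]
            state[a] = 1
        elif u < inf_total + rec_total:
            a = rng.choices(range(n), weights=rec)[0]
            state[a] = -1
        else:
            # Edge update: re-initialize a uniformly chosen ordered pair (a, b).
            a, b = rng.sample(range(n), 2)
            edge[a][b] = 1 if rng.random() < rho[group[a]][group[b]] / n else 0
            continue  # disease states unchanged, nothing to record
        history.append((t, state.count(0), state.count(1), state.count(-1)))
    return history


# Hypothetical example: 20 nodes, two age groups of 10 nodes each.
n = 20
group = [0] * 10 + [1] * 10
history = simulate_epidemic(
    n, group,
    B=[[2.0, 1.0], [1.0, 1.5]],
    gamma=[0.3, 0.2],
    rho=[[4.0, 2.0], [2.0, 3.0]],
    lam=2.0,
    infected0=[0, 10],
    t_end=3.0,
)
```

Recomputing all rates after every event costs $O(n^{2})$ per step, which keeps the sketch short but limits it to small populations; an incremental update of the rates would be the natural refinement for larger $n$.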

3 MAIN RESULT

To provide a rigorous mean-field derivation of the dynamics (1), we now consider a sequence of social networks with increasing population sizes such that each network obeys the theoretical framework described in Section 2. Given a network from this sequence with population size $n\in\mathbb{N}$, we let $\mathcal{S}^{(n)}_{j}(t):=\mathcal{S}_{j}(\mathbf{X}(t))$, $\mathcal{I}^{(n)}_{j}(t):=\mathcal{I}_{j}(\mathbf{X}(t))$, and $\mathcal{R}^{(n)}_{j}(t):=\mathcal{R}_{j}(\mathbf{X}(t))$ denote the (random) sets of susceptible, infected, and recovered individuals in the $j$-th age group, respectively, and we let $s_{j}^{(n)}(t):=\frac{1}{n}|\mathcal{S}_{j}^{(n)}(t)|$, $\beta_{j}^{(n)}(t):=\frac{1}{n}|\mathcal{I}_{j}^{(n)}(t)|$, and $r_{j}^{(n)}(t):=\frac{1}{n}|\mathcal{R}_{j}^{(n)}(t)|$ denote the fractions of susceptible, infected, and recovered individuals in the $j$-th age group, respectively. As for the absolute numbers, we let $S_{j}^{(n)}(t):=|\mathcal{S}_{j}^{(n)}(t)|$, $I_{j}^{(n)}(t):=|\mathcal{I}_{j}^{(n)}(t)|$, and $R_{j}^{(n)}(t):=|\mathcal{R}_{j}^{(n)}(t)|$. Additionally, we let $E^{(n)}(t)$ denote the edge set of the network at time $t$, and we drop the superscript $(n)$ when the context makes our reference to the $n$-th network clear.

Another quantity that varies with $n$ is $\lambda^{(n)}$, the edge update rate. To obtain the desired mean-field limit in Theorem 1, we assume that $\lambda^{(n)}\rightarrow\infty$ as $n\rightarrow\infty$. To interpret this assumption, consider any pair of individuals $(a,b)\in\mathcal{A}_{i}\times\mathcal{A}_{j}$ that are in contact with each other at time $t\geq 0$ during the epidemic. Since the edge state of $(a,b)$ is updated to 0 (the state of non-existence) at a time-invariant rate of $\lambda^{(n)}\left(1-\frac{\rho_{ij}}{n}\right)$, the assumption implies that the mean interaction time of $b$ with $a$, which is $\frac{1}{\lambda^{(n)}}+O\left(\frac{1}{n\lambda^{(n)}}\right)$, vanishes as the population size increases. This is a possible real-world scenario, because as $n$ increases, the population density of the given geographical region increases, which could result in overcrowding and rapidly changing interaction patterns in the network. This may be especially true in the case of public places such as supermarkets and subway stations at a time when the society is already aware of an evolving epidemic. Another implication of $\lim_{n\rightarrow\infty}\lambda^{(n)}=\infty$ is that the rate at which a given infected node contacts and transmits pathogens to a given susceptible node vanishes as the population size goes to $\infty$ (see Remark 1 for an explanation). This implication is weaker than the often-assumed condition that the rate of pathogen transmission is proportional to the reciprocal of the population size [30, 31].
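The expression for the mean interaction time can be recovered with a short expansion. The following sketch assumes $\rho_{ij}<n$ (so that the geometric series converges) and uses the fact that the time until an existing edge is reset to 0 is exponentially distributed with rate $\lambda^{(n)}\left(1-\frac{\rho_{ij}}{n}\right)$, so its mean is the reciprocal of that rate:

\[
\frac{1}{\lambda^{(n)}\left(1-\frac{\rho_{ij}}{n}\right)}
=\frac{1}{\lambda^{(n)}}\sum_{k=0}^{\infty}\left(\frac{\rho_{ij}}{n}\right)^{k}
=\frac{1}{\lambda^{(n)}}+\frac{\rho_{ij}}{n\lambda^{(n)}}+O\left(\frac{1}{n^{2}\lambda^{(n)}}\right)
=\frac{1}{\lambda^{(n)}}+O\left(\frac{1}{n\lambda^{(n)}}\right).
\]

Both terms on the right-hand side vanish as $n\rightarrow\infty$ whenever $\lambda^{(n)}\rightarrow\infty$.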

We are now ready to state our main result. Its proof is based on the theory of continuous-time Markov chains and an analysis of how the disease propagation process is affected by random updates occurring in the network structure at random times (which results in Propositions 1 and 2) in addition to the proof techniques used in [30]. The proofs of all these results are available in the appendix.

Theorem 1.

Suppose that $\lim_{n\rightarrow\infty}\lambda^{(n)}=\infty$ and that for every $i\in[m]$, there exist $s_{i,0},\beta_{i,0}\in[0,1]$ such that $\lim_{n\to\infty}s_{i}^{(n)}(0)=s_{i,0}$ and $\lim_{n\rightarrow\infty}\beta_{i}^{(n)}(0)=\beta_{i,0}$. Then for each $i\in[m]$,

\[
\lim_{n\to\infty}\mathbb{E}\left[\left\|\left(s_{i}^{(n)}(t),\beta_{i}^{(n)}(t)\right)-\left(y_{i}(t),w_{i}(t)\right)\right\|_{2}^{2}\right]=0
\]

on any finite time interval $[0,T_{0}]$, where $(y_{i}(t),w_{i}(t))$ is the solution to the ODE system given by the first two equations in (1), i.e., $(y_{i}(t),w_{i}(t))$ satisfies

  (I) $\dot{y}_{i}=-y_{i}\sum_{j=1}^{m}A_{ij}w_{j},\quad y_{i}(0)=s_{i,0}$,

  (II) $\dot{w}_{i}=y_{i}\sum_{j=1}^{m}A_{ij}w_{j}-\gamma_{i}w_{i},\quad w_{i}(0)=\beta_{i,0}$,

and $A\in\mathbb{R}^{m\times m}$ is defined by $A_{ij}:=\rho_{ij}B_{ij}$.

Theorem 1 relies on the following proposition.

Proposition 1.

For each $i,j\in[m]$, let

\[
\chi_{ij}=\chi_{ij}(t,\mathcal{S},\mathcal{I}):=\mathbb{E}[1_{(a,b)}(t)\mid\mathcal{S}(t),\mathcal{I}(t)]=\text{Pr}((a,b)\in E(t)\mid\mathcal{S}(t),\mathcal{I}(t))
\]

be the random variable that denotes the conditional probability that a pair of nodes $(a,b)\in\mathcal{S}_{i}(t)\times\mathcal{I}_{j}(t)$ are in physical contact at time $t$ given the state of the network at time $t$. Then the following equations hold for all $t\geq 0$:

  (i) $\mathbb{E}[s_{i}]^{\prime}=-\sum_{j=1}^{m}B_{ij}\mathbb{E}[n\chi_{ij}s_{i}\beta_{j}]$,

  (ii) $\mathbb{E}[\beta_{i}]^{\prime}=\sum_{j=1}^{m}B_{ij}\mathbb{E}[n\chi_{ij}s_{i}\beta_{j}]-\gamma_{i}\mathbb{E}[\beta_{i}]$,

  (iii) $\mathbb{E}[s_{i}^{2}]^{\prime}=-\sum_{j=1}^{m}\big(2B_{ij}\mathbb{E}[n\chi_{ij}s_{i}^{2}\beta_{j}]-B_{ij}\mathbb{E}[n\chi_{ij}s_{i}\beta_{j}]/n\big)$,

  (iv) $\mathbb{E}[\beta_{i}^{2}]^{\prime}=\sum_{j=1}^{m}B_{ij}\left(2\mathbb{E}[n\chi_{ij}s_{i}\beta_{j}\beta_{i}]+\mathbb{E}[n\chi_{ij}s_{i}\beta_{j}]/n\right)-\gamma_{i}\left(2\mathbb{E}[\beta_{i}^{2}]-\mathbb{E}[\beta_{i}]/n\right)$.

We point out that if $\chi_{ij}=\frac{\rho_{ij}}{n}$, then Equations (i) and (ii) have the same coefficients as (I) and (II). It is then natural to ask: how does the conditional edge probability $\chi_{ij}$ compare to the unconditional edge probability $\frac{\rho_{ij}}{n}$? The following proposition provides an answer. As we show in Remark 1, our answer helps characterize the rate at which the infection transmission rates converge to their respective limits, an analysis missing from other works such as [30] and [31].

Proposition 2.

For all $t\geq 0$, $n\in\mathbb{N}$ and $i,j\in[m]$,

$$\frac{\rho_{ij}}{n}\left(1-\frac{B_{ij}}{\lambda^{(n)}}\left(1-e^{-\lambda^{(n)}t}\right)\right)\leq\chi_{ij}\leq\frac{\rho_{ij}}{n}.$$

Remark 1.

Given $(\mathcal{S}(t),\mathcal{I}(t))$, note that the conditional probability that a given infected node in $\mathcal{A}_{j}$ infects a given susceptible node in $\mathcal{A}_{i}$ during a time interval $[t,t+\Delta t)$ is $B_{ij}\chi_{ij}(t,\mathcal{S},\mathcal{I})\Delta t+o(\Delta t)$. In light of Proposition 2, this means that the associated conditional infection rate $B_{ij}\chi_{ij}(t,\mathcal{S},\mathcal{I})$ belongs to the interval

$$\left[\frac{1}{n}A_{ij}\left(1-\frac{B_{ij}}{\lambda^{(n)}}\left(1-e^{-\lambda^{(n)}t}\right)\right),\ \frac{1}{n}A_{ij}\right].$$

On taking expectations, we see that the same applies to the associated unconditional infection rate as well. Hence, the total rate of infection transmission from all of $\mathcal{A}_{j}$ to any given node of $\mathcal{A}_{i}$ is at least

$$I_{j}^{(n)}(t)\times\frac{1}{n}A_{ij}\left(1-\frac{B_{ij}}{\lambda^{(n)}}\left(1-e^{-\lambda^{(n)}t}\right)\right)=A_{ij}\beta_{j}^{(n)}(t)\left(1-\frac{B_{ij}}{\lambda^{(n)}}\left(1-e^{-\lambda^{(n)}t}\right)\right)$$

and at most $A_{ij}\beta_{j}^{(n)}(t)$. Since we assume $\lim_{n\rightarrow\infty}\lambda^{(n)}=\infty$, this further implies that the concerned rate is approximately $A_{ij}\beta_{j}^{(n)}(t)$ for large $n$, thereby giving us an interpretation of the 'contact rate' $A_{ij}$ as a normalized infection rate. That is, in the limit as $n\rightarrow\infty$, the matrix $A$ quantifies the infection transmission rates between any two age groups relative to the level of infectedness (fraction of infected persons) of the transmitting age group. Moreover, Proposition 2 also implies that the difference between the age-wise infection transmission rates and their respective mean-field limits (which exist as per Theorem 1) is $O\left(\frac{1}{\lambda^{(n)}}\right)$.

4 A CONVERSE RESULT

The purpose of this section is to argue that the age-structured SIR dynamics does not model an epidemic well if the infection rates $B_{ij}$ are high enough to be comparable to the edge update rate of the network.

Theorem 2.

Suppose $\lambda_{\infty}:=\lim_{n\rightarrow\infty}\lambda^{(n)}<\infty$ and that for every $p\in[m]$, there exist $s_{p,0},\beta_{p,0}\in[0,1]$ such that $s_{p}^{(n)}(0)\rightarrow s_{p,0}$ and $\beta_{p}^{(n)}(0)\rightarrow\beta_{p,0}$ as $n\rightarrow\infty$. In addition, let $\{(y_{q}(t),w_{q}(t))\}_{q\in[m]}$ be the solutions of the ODEs (I) and (II). Then, there exists no interval $[t_{1},t_{2}]\subset[0,\infty)$ for which $\min_{p,q\in[m]}\min_{t\in[t_{1},t_{2}]}y_{p}(t)w_{q}(t)>0$ and on which the pairs $\left\{\left(s_{q}^{(n)}(t),\beta_{q}^{(n)}(t)\right)\right\}_{q=1}^{m}$ uniformly converge in probability to the corresponding pairs in $\{(y_{q}(t),w_{q}(t))\}_{q=1}^{m}$. More precisely, for every interval $[t_{1},t_{2}]\subset\mathbb{R}$ such that $y_{p}(t)>0$ and $w_{p}(t)>0$ for all $p\in[m]$ and $t\in[t_{1},t_{2}]$, there exist a $q\in[m]$ and an $\varepsilon_{q}>0$ such that

$$\liminf_{n\rightarrow\infty}\sup_{t\in[t_{1},t_{2}]}\text{Pr}\left(\left\|\left(s_{q}^{(n)}(t),\beta_{q}^{(n)}(t)\right)-\left(y_{q}(t),w_{q}(t)\right)\right\|_{2}>\varepsilon_{q}\right)>0.$$

Proof.

Suppose, on the contrary, that there exists a time interval $[t_{1},t_{2}]\subset[0,\infty)$ such that $y_{p}(t)>0$ and $w_{p}(t)>0$ for all $p\in[m]$ and $t\in[t_{1},t_{2}]$, and the following holds for all $q\in[m]$ and all $\varepsilon_{q}>0$:

$$\liminf_{n\rightarrow\infty}\sup_{t\in[t_{1},t_{2}]}\text{Pr}\left(\left\|\left(s_{q}^{(n)}(t),\beta_{q}^{(n)}(t)\right)-\left(y_{q}(t),w_{q}(t)\right)\right\|_{2}>\varepsilon_{q}\right)=0,$$

i.e., for a fixed $\varepsilon>0$, there exists a sequence $\{\pi(n)\}_{n=1}^{\infty}\subset\mathbb{N}$ such that

$$\lim_{n\rightarrow\infty}\sup_{t\in[t_{1},t_{2}]}\text{Pr}\left(\left\|\left(s_{q}^{(\pi(n))}(t),\beta_{q}^{(\pi(n))}(t)\right)-\left(y_{q}(t),w_{q}(t)\right)\right\|_{2}>\varepsilon\right)=0.$$

We then arrive at a contradiction, as shown below.

We first choose an $\eta>2\left(1+\frac{A_{\max}}{\lambda_{\infty}}\right)$ (where $A_{\max}:=\max\{A_{pq}:p,q\in[m]\}$) and a $t\in(t_{1},t_{2})$. By our hypothesis and norm equivalence, for $\kappa_{0}:=\frac{1}{\eta\lambda_{\infty}}$ and for every $\delta>0$, there exists an $N_{\varepsilon,\delta}\in\mathbb{N}$ such that

$$\text{Pr}\left(\left\|\left(s_{q}^{(\pi(n))}(\tau),\beta_{q}^{(\pi(n))}(\tau)\right)-\left(y_{q}(\tau),w_{q}(\tau)\right)\right\|_{1}\leq\varepsilon\right)\geq 1-\delta$$

for all $n\geq N_{\varepsilon,\delta}$ and all $\tau\in[t-\kappa_{0},t]$.

We now define $\alpha_{0}:=\min_{p,q\in[m]}\min_{t\in[t_{1},t_{2}]}y_{p}(t)w_{q}(t)$ and we let $a$ and $b$ be any two nodes such that $(a,b)\in\mathcal{S}_{i}(t)\times\mathcal{I}_{j}(t)$ for arbitrary $i,j\in[m]$. Additionally, we let $K:=t-\inf\{\tau\geq 0:b\in\mathcal{I}(\tau)\}$ denote the (random) time elapsed between the time at which $b$ gets infected and time $t$. We then have

\begin{align}
\text{Pr}\left(K\leq\kappa_{0}\mid\mathcal{S}(t),\mathcal{I}(t)\right)&=\text{Pr}\left(b\text{ is infected during }[t-\kappa_{0},t]\mid\mathcal{S}(t),\mathcal{I}(t)\right) \tag{3}\\
&\stackrel{(a)}{\leq}1-e^{-A_{\max}\kappa_{0}} \tag{4}\\
&\stackrel{(b)}{\leq}A_{\max}\kappa_{0} \tag{5}\\
&=\frac{A_{\max}}{\eta\lambda_{\infty}}, \tag{6}
\end{align}

where $(b)$ holds because $1-e^{-u}\leq u$ for all $u\geq 0$, and $(a)$ can be explained as follows: given $(\mathcal{S}(\tau),\mathcal{I}(\tau))$ and an infected node $c\in\mathcal{I}_{q}(\tau)$ for some time $\tau\in[t-\kappa_{0},t)$, and given that $b\in\mathcal{S}_{j}(\tau)$, we know from Proposition 2 that the conditional probability of the edge $(b,c)$ existing in the network at time $\tau$ is at most $\frac{\rho_{jq}}{\pi(n)}$. Also, as per the definition of our stochastic epidemic model, given that $(b,c)\in E(\tau)$ and given $(\mathcal{S}(\tau),\mathcal{I}(\tau))$ (and hence, also that $(b,c)\in\mathcal{S}_{j}(\tau)\times\mathcal{I}_{q}(\tau)$), the conditional rate of infection transmission from $c$ to $b$ at time $\tau$ is $B_{jq}$. Hence, given $(\mathcal{S}(\tau),\mathcal{I}(\tau))$ (and hence, that $(b,c)\in\mathcal{S}_{j}(\tau)\times\mathcal{I}_{q}(\tau)$), the conditional rate of infection transmission from $c$ to $b$ is at most $B_{jq}\frac{\rho_{jq}}{\pi(n)}=\frac{A_{jq}}{\pi(n)}$. Under our modelling assumption that distinct edges transmit the infection independently of each other during vanishingly small time intervals, this means that, conditional on $\mathcal{S}(\tau)$ and $\mathcal{I}(\tau)$, the total rate at which $b$ receives infection is at most

$$\sum_{q\in[m]}\sum_{c\in\mathcal{I}_{q}(\tau)}\frac{A_{jq}}{\pi(n)}=\sum_{q\in[m]}|\mathcal{I}_{q}(\tau)|\frac{A_{jq}}{\pi(n)}=\sum_{q\in[m]}\beta_{q}^{(\pi(n))}(\tau)A_{jq}\leq A_{\max}\sum_{q\in[m]}\beta_{q}^{(\pi(n))}(\tau)\leq A_{\max}\cdot 1.$$

Note that this upper bound is time-invariant and does not depend on $\mathcal{S}(\tau)$ or $\mathcal{I}(\tau)$ for any time $\tau$. It thus follows that, conditional on $(\mathcal{S}(t),\mathcal{I}(t))$, the rate at which $b$ gets infected is at most $A_{\max}$ throughout the interval $[t-\kappa_{0},t)$ and hence, the probability that $b$ does not get infected during an interval of length $\kappa_{0}$ is at least $e^{-A_{\max}\kappa_{0}}$. This implies $(a)$.

We now infer from (3) that

$$\text{Pr}\left(K\geq\kappa_{0}\mid\mathcal{S}(t),\mathcal{I}(t)\right)\geq 1-\frac{A_{\max}}{\eta\lambda_{\infty}}. \tag{7}$$

Next, we lower-bound $\text{Pr}(T\geq\kappa_{0}\mid\mathcal{S}(t),\mathcal{I}(t))$. To this end, note from Proposition 2 that $\text{Pr}((a,b)\in E(t)\mid\mathcal{S}(t),\mathcal{I}(t))\leq\frac{\rho_{ij}}{\pi(n)}$. As a result, we have

\begin{align*}
&\big|\text{Pr}(T\geq\kappa_{0}\mid\mathcal{S}(t),\mathcal{I}(t))-\text{Pr}(T\geq\kappa_{0}\mid(a,b)\notin E(t),\mathcal{S}(t),\mathcal{I}(t))\big|\\
&\quad=\big|\text{Pr}(T\geq\kappa_{0}\mid(a,b)\notin E(t),\mathcal{S}(t),\mathcal{I}(t))\,(1-\text{Pr}((a,b)\in E(t)\mid\mathcal{S}(t),\mathcal{I}(t)))\\
&\qquad+\text{Pr}(T\geq\kappa_{0}\mid(a,b)\in E(t),\mathcal{S}(t),\mathcal{I}(t))\cdot\text{Pr}((a,b)\in E(t)\mid\mathcal{S}(t),\mathcal{I}(t))\\
&\qquad-\text{Pr}(T\geq\kappa_{0}\mid(a,b)\notin E(t),\mathcal{S}(t),\mathcal{I}(t))\big|\\
&\quad\leq\frac{\rho_{ij}}{\pi(n)},
\end{align*}

which also means that

$$\big|\text{Pr}(T<\kappa_{0}\mid(\mathcal{S}(t),\mathcal{I}(t)))-\text{Pr}(T<\kappa_{0}\mid(a,b)\notin E(t),(\mathcal{S}(t),\mathcal{I}(t)))\big|\leq\frac{\rho_{ij}}{\pi(n)}. \tag{8}$$

Moreover, for any realization $(\mathcal{S}_{0},\mathcal{I}_{0})$ of $(\mathcal{S}(t),\mathcal{I}(t))$, Remark 5 asserts that

$$\text{Pr}(T\leq\kappa_{0}\mid K=\kappa,(\mathcal{S}(t),\mathcal{I}(t))=(\mathcal{S}_{0},\mathcal{I}_{0}),(a,b)\notin E(t))\leq 1-e^{-\lambda\kappa_{0}}$$

for all $0\leq\kappa\leq t$. Since the right-hand side above is independent of both $\kappa$ and $(\mathcal{S}_{0},\mathcal{I}_{0})$, it follows that

$$\text{Pr}(T\leq\kappa_{0}\mid(\mathcal{S}(t),\mathcal{I}(t)),(a,b)\notin E(t))\leq 1-e^{-\lambda\kappa_{0}}\leq\lambda\kappa_{0}. \tag{9}$$

Therefore, as a consequence of (7), (8), (9), and the union bound, we have

\begin{align}
\text{Pr}(T\geq\kappa_{0},K\geq\kappa_{0}\mid\mathcal{S}(t),\mathcal{I}(t))&=1-\text{Pr}(\{T<\kappa_{0}\}\cup\{K<\kappa_{0}\}\mid\mathcal{S}(t),\mathcal{I}(t)) \tag{10}\\
&\geq 1-\text{Pr}(T<\kappa_{0}\mid\mathcal{S}(t),\mathcal{I}(t))-\text{Pr}(K<\kappa_{0}\mid\mathcal{S}(t),\mathcal{I}(t)) \tag{11}\\
&\geq 1-\frac{A_{\max}}{\eta\lambda_{\infty}}-\frac{\lambda^{(\pi(n))}}{\eta\lambda_{\infty}}-\frac{\rho_{ij}}{\pi(n)}. \tag{12}
\end{align}

This further yields

\begin{align}
\chi_{ij}(t,\mathcal{S},\mathcal{I})&=\text{Pr}((a,b)\in E(t)\mid\mathcal{S}(t),\mathcal{I}(t)) \tag{13}\\
&=\text{Pr}((a,b)\in E(t)\mid T\geq\kappa_{0},K\geq\kappa_{0},\mathcal{S}(t),\mathcal{I}(t))\cdot\text{Pr}(T\geq\kappa_{0},K\geq\kappa_{0}\mid\mathcal{S}(t),\mathcal{I}(t)) \tag{14}\\
&\quad+\text{Pr}((a,b)\in E(t)\mid\{T<\kappa_{0}\}\cup\{K<\kappa_{0}\},\mathcal{S}(t),\mathcal{I}(t))\cdot\text{Pr}(\{T<\kappa_{0}\}\cup\{K<\kappa_{0}\}\mid\mathcal{S}(t),\mathcal{I}(t)) \tag{15}\\
&\leq\frac{\rho_{ij}}{\pi(n)}e^{-B_{ij}\kappa_{0}}\left(1-\frac{A_{\max}}{\eta\lambda_{\infty}}-\frac{\lambda^{(\pi(n))}}{\eta\lambda_{\infty}}-\frac{\rho_{ij}}{\pi(n)}\right)+\left(\frac{A_{\max}}{\eta\lambda_{\infty}}+\frac{\lambda^{(\pi(n))}}{\eta\lambda_{\infty}}+\frac{\rho_{ij}}{\pi(n)}\right)\frac{\rho_{ij}}{\pi(n)} \tag{16}\\
&=\frac{\rho_{ij}}{\pi(n)}e^{-\frac{B_{ij}}{\eta\lambda_{\infty}}}\left(1-\frac{A_{\max}}{\eta\lambda_{\infty}}-\frac{\lambda^{(\pi(n))}}{\eta\lambda_{\infty}}-\frac{\rho_{ij}}{\pi(n)}\right)+\left(\frac{A_{\max}}{\eta\lambda_{\infty}}+\frac{\lambda^{(\pi(n))}}{\eta\lambda_{\infty}}+\frac{\rho_{ij}}{\pi(n)}\right)\frac{\rho_{ij}}{\pi(n)}, \tag{17}
\end{align}

where the inequality is a consequence of (10) and Remark 4. Recall that $\lim_{n\rightarrow\infty}\lambda^{(n)}=\lambda_{\infty}$, which means that the right-hand side can be made smaller than $(1+\varepsilon)\frac{\rho_{ij}}{\pi(n)}e^{-\frac{B_{ij}}{\eta\lambda_{\infty}}}$ by choosing $n$ large enough. Moreover, (13) holds for all $t\in[t_{1},t_{2}]$.

Proposition 1 now implies that for all $t\in[t_{1},t_{2}]$ and large enough $n$,

$$\mathbb{E}[s_{i}]^{\prime}=-\sum_{j=1}^{m}B_{ij}\mathbb{E}[n\chi_{ij}s_{i}\beta_{j}]>-\sum_{j=1}^{m}A_{ij}(1+\varepsilon)e^{-\frac{B_{ij}}{\eta\lambda_{\infty}}}\mathbb{E}[s_{i}\beta_{j}]. \tag{18}$$

Now, observe that for any $t\in[t_{1},t_{2}]$, we have

\begin{align}
&\mathbb{E}[s_{i}(t)\beta_{j}(t)] \tag{19}\\
&\leq 1\cdot\text{Pr}\left(\left\|\left(s_{q}^{(n)}(\tau),\beta_{q}^{(n)}(\tau)\right)-\left(y_{q}(\tau),w_{q}(\tau)\right)\right\|_{1}>\varepsilon\right) \tag{20}\\
&\quad+(y_{i}(t)+\varepsilon)(w_{j}(t)+\varepsilon)\cdot\text{Pr}\left(\left\|\left(s_{q}^{(n)}(\tau),\beta_{q}^{(n)}(\tau)\right)-\left(y_{q}(\tau),w_{q}(\tau)\right)\right\|_{1}\leq\varepsilon\right) \tag{21}\\
&\leq\delta+y_{i}(t)w_{j}(t)+2\varepsilon+\varepsilon^{2}. \tag{22}
\end{align}

Therefore, assuming that $\delta$ and $\varepsilon$ are small enough to satisfy $\delta+2\varepsilon+\varepsilon^{2}<\alpha_{0}\left(\frac{e^{\frac{B_{ij}}{\eta\lambda_{\infty}}}}{1+\varepsilon}-1\right)$, (18) implies the existence of a constant $\varepsilon^{\prime}>0$ such that

$$\mathbb{E}[s_{i}]^{\prime}=-\sum_{j=1}^{m}B_{ij}\mathbb{E}[n\chi_{ij}s_{i}\beta_{j}]\geq-\sum_{j=1}^{m}A_{ij}y_{i}(t)w_{j}(t)+\varepsilon^{\prime}=y_{i}^{\prime}(t)+\varepsilon^{\prime}.$$

Since this holds for all $t\in[t_{1},t_{2}]$, we have

$$\mathbb{E}[s_{i}^{(\pi(n))}(t_{2})]-y_{i}(t_{2})\geq\mathbb{E}[s_{i}^{(\pi(n))}(t_{1})]-y_{i}(t_{1})+(t_{2}-t_{1})\varepsilon^{\prime}$$

for all sufficiently large $n$. Here, we observe that $\{s_{i}^{(n)}(t_{1}),\beta_{j}^{(n)}(t_{1}):i,j\in[m],n\in\mathbb{N}\}$ are bounded by the constant function $1$, which is integrable with respect to probability measures. Therefore, $\{s_{i}^{(n)}(t_{1}):i\in[m],n\in\mathbb{N}\}$ are uniformly integrable. Since they converge in probability to $\{y_{i}(t_{1}):i\in[m]\}$ (by hypothesis), it follows by Vitali's Convergence Theorem that they also converge in $L^{1}$-norm. Thus, $\mathbb{E}[s_{i}^{(n)}(t_{1})]-y_{i}(t_{1})\rightarrow 0$ as $n\rightarrow\infty$, thereby implying that

$$\liminf_{n\rightarrow\infty}\left(\mathbb{E}[s_{i}^{(\pi(n))}(t_{2})]-y_{i}(t_{2})\right)\geq(t_{2}-t_{1})\varepsilon^{\prime}. \tag{23}$$

On the other hand, Vitali's Convergence Theorem and our hypothesis also imply that $\mathbb{E}[s_{i}^{(n)}(t_{2})]\rightarrow y_{i}(t_{2})$ as $n\rightarrow\infty$, which contradicts (23). Hence, our hypothesis that $\{s_{q}^{(n)}(t),\beta_{q}^{(n)}(t)\}_{q\in[m]}$ converge in probability to the solutions of (I) and (II) uniformly on the interval $[t_{1},t_{2}]$ is false. This completes the proof.



Before interpreting Theorem 2, we first note that the result only applies to time intervals on which the products $\{y_{i}(t)w_{j}(t):i,j\in[m]\}$ are positive throughout. Although this condition appears stringent, it is mild from the viewpoint of epidemic spreading in the real world. This is because, in practice, we are only interested in time periods during which every age group has infected cases (which ensures that $\beta_{j}(t)>0$ for all $j\in[m]$), and most epidemics leave behind uninfected individuals (thereby ensuring that $s_{i}(t)>0$ for all $i\in[m]$). Therefore, Theorem 2 applies to all time intervals of practical interest.

Restricting our focus to such intervals, Theorem 2 asserts that, if the edge update rate does not go to $\infty$ with the population size, then there exists a positive lower bound on the probability of the age-wise infected and susceptible fractions differing significantly from the corresponding solutions of the age-structured SIR ODEs at one or more points of time in the considered time interval. At this point, we remark that for large populations, the edge update rate $\lambda$ is approximately the reciprocal of the mean duration of every interaction in the network. This means that the greater the value of $\lambda$, the faster the social interaction patterns of the network change. Therefore, in conjunction with Theorem 1, Theorem 2 enables us to draw the following inference: the age-structured SIR model can be expected to approximate a real-world epidemic spreading in a large population accurately if and only if the social interaction patterns of the network change rapidly with time. This is more likely to be the case in crowded public places such as supermarkets and airports.

There is another way to interpret Theorems 1 and 2. Note that we have assumed that the sequences of edge states realized during the timeline of the epidemic are independent across pairs of nodes in the network. Therefore, for greater values of $\lambda$, the network structure becomes more unrecognizable from its past realizations. Thus, the age-structured SIR model can be expected to approximate epidemic spreading well if and only if the network is highly memoryless, i.e., if and only if the network continually "forgets" its past interaction patterns throughout the timeline of the epidemic under study.

Remark 2.

Observe from the proof of Theorem 2 that the difference between $\mathbb{E}[s_{i}]^{\prime}$, the first derivative of the expected fraction of susceptible nodes in $\mathcal{A}_{i}$, and $y_{i}^{\prime}$, the first derivative of the corresponding ODE solution $y_{i}(t)$, is small only if $e^{-\frac{B_{ij}}{\eta\lambda_{\infty}}}$ is close to 1, which happens when $\lambda_{\infty}\gg B_{ij}$. Moreover, this observation is consistent with Remark 1, according to which the total infection rate from $[n]=\cup_{j=1}^{m}\mathcal{A}_{j}$ to any given susceptible node in $\mathcal{A}_{i}$ is close to $\sum_{j=1}^{m}A_{ij}\beta_{j}(t)$ (and hence, in close agreement with the ODEs (1)) when $\frac{B_{ij}}{\lambda^{(n)}}\ll 1$. Along with Theorems 1 and 2, this means that the age-structured SIR model is likely to approximate real-world epidemic spreading well if and only if the infection transmission rates are negligible when compared to the social mixing rate $\lambda$.

Intuitively, when $\frac{B_{ij}}{\lambda}\ll 1$, the time scales (the mean durations of time) over which the concerned disease spreads from any age group to any other age group are orders of magnitude greater than the time scale over which the network is updated. As a result, the independence of the sequences of edge state updates ensures that most of the possible realizations of the network structure are attained over the time scale of infection transmission. Equivalently, from the viewpoint of the pathogens causing the disease, the effective network structure (the network topology averaged over any of the age-wise infection timescales) is close to being a complete graph. Hence, by extrapolating the existing results on mean-field limits of epidemic processes on complete graphs (such as [30]) to heterogeneous epidemic models, we can assert that the age-structured SIR ODEs are able to approximate the epidemic propagation with high accuracy.

On the other hand, if the infection rates $B_{ij}$ are too high (and hence, comparable to the social mixing rate $\lambda$, which is always finite in reality), the pathogens perceive a randomly generated network even on the time scale of infection transmission. Since this random network is sparse (because we assume the expected node degrees to be constant, which results in the edge probability scaling inversely with the population size), it follows that the number of transmissions occurring in any given time period is likely to be smaller than in the case of a complete graph. Thus, the age-structured SIR ODEs overestimate the rate of growth of age-wise infected fractions. This is further confirmed by the sign of the inequality in (23).

5 EMPIRICAL VALIDATION

We now validate the age-structured SIR model in the context of the COVID-19 pandemic in Japan as follows: we first estimate the model parameters using the data provided by the Government of Japan, and we then compare the trajectories generated by the model with the reference data.

5.1 Dataset

We use a dataset provided by the Government of Japan at [32]. This dataset partitions the population of the prefecture of Tokyo into $m=5$ age groups: individuals aged 0-19, 20-39, 40-59, 60-79, and 80+ years. For each age group $i\in[m]$ and each day $k$ in the roughly 13-month timeline $\Gamma=\{\text{March 10, 2020},\ldots,\text{April 9, 2021}\}$, the dataset lists the cumulative number of people in the age group who were infected up to day $k$. We denote this number by $I_{i}^{T}[k]$.

5.2 Preprocessing

Due to several factors, such as a lack of reporting and testing on weekends, the raw data has missing information and is contaminated with noise. Therefore, using a moving average filter with a window size of 15 days, we de-noise the raw data to obtain the estimated total number of infected individuals by day $k$ in age group $i$, which we continue to denote by $I_{i}^{T}[k]$. We then estimate from the smoothed data the numbers of susceptible, infected, and recovered individuals in age group $i\in[m]$ on day $k$, denoted by $S_{i}[k]$, $I_{i}[k]$, and $R_{i}[k]$, respectively. We do this as follows: for any age group $i\in[m]$ and day $k\in\Gamma$, we have $I_{i}^{T}[k]=I_{i}[k]+R_{i}[k]$, because the cumulative number of infections $I_{i}^{T}[k]$ includes both active COVID-19 cases and closed cases (cases of individuals who were infected in the past but recovered or succumbed by day $k$). Therefore, to estimate $I_{i}[k]$ and $R_{i}[k]$ from $I_{i}^{T}[k]$, we assume that every infected individual takes exactly $T_{R}=14$ days to recover. This assumption is consistent with WHO's criteria for discharging patients from isolation (i.e., discontinuing transmission-based precautions) [33] after a period involving the first 10 days from the onset of symptoms and 3 additional symptom-free days (if the patient is originally symptomatic) or after 10 days from being tested positive for SARS-CoV-2 (if the patient is asymptomatic). After the required period, the patients were not required to re-test. Under such an assumption on the recovery time, we have $R_{i}[k]=I_{i}^{T}[k-T_{R}]$ and $I_{i}[k]=I_{i}^{T}[k]-I_{i}^{T}[k-T_{R}]$. Next, we obtain $S_{i}[k]$ by subtracting $I_{i}^{T}[k]$ from the total population of $\mathcal{A}_{i}$, which is obtained from the age distribution and the total population of Tokyo.
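The preprocessing steps above can be sketched as follows for a single age group. This is only an illustration: the text does not say whether the 15-day window is centered or trailing, nor how the first $T_{R}$ days are handled, so the centered window, the shrinking window at the edges, and the choice $R_{i}[k]=0$ for $k<T_{R}$ are assumptions, as are the function names.

```python
def moving_average(x, window=15):
    """Centered moving average (assumed); the window shrinks near the edges."""
    half = window // 2
    out = []
    for k in range(len(x)):
        lo, hi = max(0, k - half), min(len(x), k + half + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out


def split_compartments(cum_infected, population, t_rec=14):
    """Split cumulative infections I^T[k] into (S[k], I[k], R[k]).

    Uses R[k] = I^T[k - T_R], I[k] = I^T[k] - I^T[k - T_R] and
    S[k] = population - I^T[k]; R[k] = 0 for k < T_R is an assumption.
    """
    S, I, R = [], [], []
    for k, total in enumerate(cum_infected):
        recovered = cum_infected[k - t_rec] if k >= t_rec else 0.0
        R.append(recovered)
        I.append(total - recovered)
        S.append(population - total)
    return S, I, R


# Hypothetical cumulative counts for one age group (10 new cases per day).
raw = [10.0 * k for k in range(30)]
smoothed = moving_average(raw, window=15)
S_demo, I_demo, R_demo = split_compartments(raw, population=1000.0, t_rec=14)
print(I_demo[20], R_demo[20], S_demo[20])
```

Dividing the resulting counts by the relevant population size yields the fractions used in the parameter estimation step.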

We must mention that in the subsequent analysis, all infected individuals are considered infectious, i.e., they can potentially transmit the SARS-CoV-2 virus to their susceptible contacts. This assumption, on which the classical SIR model and all its variants are based, is consistent with the CDC’s understanding of the first wave of SARS-CoV-2 infection, which claims that every infected individual remains infectious for up to about 10 days from the onset of symptoms, though the exact duration of the period of infectiousness remains uncertain [34].

5.3 Parameter Estimation Algorithm

Before estimating the parameters of our model, we discretize the ODEs (1) with a step size of 1 day and obtain the following:

\begin{align}
s_{i}[k+1]-s_{i}[k]&=-s_{i}[k]\sum^{m}_{j=1}A_{ij}\beta_{j}[k], \tag{24}\\
\beta_{i}[k+1]-\beta_{i}[k]&=s_{i}[k]\sum^{m}_{j=1}A_{ij}\beta_{j}[k]-\gamma_{i}\beta_{i}[k], \tag{25}\\
r_{i}[k+1]-r_{i}[k]&=\gamma_{i}\beta_{i}[k].
\end{align}

A key observation here is that these equations are linear in the model parameters. Therefore, given the sets of fractions $\{s_{i}[k]:i\in[m],k\in\Gamma\}$, $\{\beta_{i}[k]:i\in[m],k\in\Gamma\}$, and $\{r_{i}[k]:i\in[m],k\in\Gamma\}$ (which we obtain by implementing the data processing steps described above), we can express the difference equations (24), (25), and the update for $r_{i}$ in the form of a matrix equation $Cx=d$, where the column vector $x\in\mathbb{R}^{m^{2}+m}$ is a stack of the parameters $\{A_{ij}:1\leq i,j\leq m\}$ and $\{\gamma_{i}:1\leq i\leq m\}$, the column vector $d$ is a stack of the increments $\{s_{i}[k+1]-s_{i}[k]:i\in[m],k\in\Gamma\}$, $\{\beta_{i}[k+1]-\beta_{i}[k]:i\in[m],k\in\Gamma\}$, and $\{r_{i}[k+1]-r_{i}[k]:i\in[m],k\in\Gamma\}$, and $C$ is a matrix of coefficients. Thus, solving the least-squares problem (26) gives us the best estimates of the model parameters $\{A_{ij}:i,j\in[m]\}\cup\{\gamma_{i}:i\in[m]\}$ in the mean-square sense.

$$\hat{x}=\underset{x\geq 0}{\text{argmin}}\|Cx-d\|_{2}. \tag{26}$$
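As an illustration of how the difference equations stack into $Cx=d$, the sketch below builds $C$ and $d$ from daily fractions and checks the construction on synthetic data generated by the same discrete model. The ordering of $x$ (the entries $A_{ij}$ row by row, followed by the $\gamma_{i}$), the function names, and the parameter values are assumptions made for this example.

```python
def build_system(s, beta, r):
    """Stack the discretized equations into C x = d.

    s, beta, r: lists over days of lists over age groups (fractions).
    Assumed layout: x = [A_11, ..., A_1m, ..., A_mm, gamma_1, ..., gamma_m].
    """
    m = len(s[0])
    n_params = m * m + m
    C, d = [], []
    for k in range(len(s) - 1):
        for i in range(m):
            row_s = [0.0] * n_params   # susceptible increment, eq. (24)
            row_b = [0.0] * n_params   # infected increment, eq. (25)
            row_r = [0.0] * n_params   # recovered increment
            for j in range(m):
                row_s[i * m + j] = -s[k][i] * beta[k][j]
                row_b[i * m + j] = s[k][i] * beta[k][j]
            row_b[m * m + i] = -beta[k][i]
            row_r[m * m + i] = beta[k][i]
            C += [row_s, row_b, row_r]
            d += [s[k + 1][i] - s[k][i],
                  beta[k + 1][i] - beta[k][i],
                  r[k + 1][i] - r[k][i]]
    return C, d


def simulate_discrete(A, gamma, s0, b0, days):
    """Generate synthetic fractions by iterating the discretized model."""
    m = len(s0)
    s, beta, r = [list(s0)], [list(b0)], [[0.0] * m]
    for _ in range(days):
        sk, bk, rk = s[-1], beta[-1], r[-1]
        force = [sum(A[i][j] * bk[j] for j in range(m)) for i in range(m)]
        s.append([sk[i] - sk[i] * force[i] for i in range(m)])
        beta.append([bk[i] + sk[i] * force[i] - gamma[i] * bk[i] for i in range(m)])
        r.append([rk[i] + gamma[i] * bk[i] for i in range(m)])
    return s, beta, r


# Hypothetical parameters for two age groups.
A_true = [[0.3, 0.1], [0.05, 0.2]]
gamma_true = [0.1, 0.07]
x_true = [0.3, 0.1, 0.05, 0.2, 0.1, 0.07]
s_d, b_d, r_d = simulate_discrete(A_true, gamma_true, [0.49, 0.49], [0.01, 0.01], days=20)
C_demo, d_demo = build_system(s_d, b_d, r_d)
residual = max(abs(sum(c * x for c, x in zip(row, x_true)) - dk)
               for row, dk in zip(C_demo, d_demo))
print("rows:", len(C_demo), "max residual at x_true:", residual)
```

On noise-free synthetic data the true parameter vector satisfies $Cx=d$ up to rounding, which is a convenient sanity check before fitting real data.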

However, the values of the contact rates $A_{ij}$ change whenever the patterns of social interaction in the network change during the course of the pandemic. For this reason, we assume that the pandemic timeline splits up into multiple phases, say $\Gamma_{1},\ldots,\Gamma_{s}$, with the contact rates varying across phases, and we perform the required optimization separately for each phase. At the same time, we do not expect the contact rates to jump (or fall) abruptly from one phase to the next. Therefore, for every $\ell\geq 2$, in the objective function corresponding to Phase $\ell$ we introduce a regularization term that penalizes any deviation of the optimization variables from the model parameters estimated for the previous phase (Phase $\ell-1$). Adding this term also ensures that our parameter estimation algorithm does not overfit the data associated with any one phase. Our optimization problem for Phase $\ell$ thus becomes

$$\hat{x}^{(\ell)}=\underset{x\geq 0}{\text{argmin}}\left(\|Cx-d\|_{2}+\lambda\|x-\hat{x}^{(\ell-1)}\|_{2}\right), \tag{27}$$

where $\hat{x}^{(\ell)}$ is the parameter vector estimated for Phase $\ell$ and $\lambda$ here denotes the regularization weight (not the edge update rate used earlier).

We now summarize this parameter estimation algorithm for Phase $\ell\in[s]$.

Algorithm 1 Parameter Estimation Algorithm for Phase $\ell$

Input: $(s_{i}[k],\beta_{i}[k],r_{i}[k])$ for all $i\in[m]$ and $k\in\Gamma_{\ell}$
Output: $\hat{x}$

function Get_Parameters($(s_{i}[k],\beta_{i}[k],r_{i}[k])$)
    for each day, each age group do
        Stack the difference equations (24) vertically
    Obtain the matrix equation $Cx=d$
    Solve Least Squares Problem (27)
    return $\hat{x}$
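The last step of Algorithm 1 can be illustrated with a small solver. The sketch below is not the formulation (27) itself: it swaps in squared norms, $\|Cx-d\|_{2}^{2}+\mathrm{reg}\,\|x-x_{\text{prev}}\|_{2}^{2}$, and solves the resulting problem by projected gradient descent so that it runs without external packages. The function name, the weight `reg`, the iteration count, and the test matrices are all hypothetical.

```python
def solve_regularized_nnls(C, d, x_prev=None, reg=0.0, iters=5000):
    """Minimize ||Cx - d||^2 + reg * ||x - x_prev||^2 subject to x >= 0.

    Projected gradient descent with a fixed step derived from a crude
    Lipschitz bound (squared Frobenius norm of C plus reg).
    """
    n = len(C[0])
    if x_prev is None:
        x_prev, reg = [0.0] * n, 0.0
    frob2 = sum(c * c for row in C for c in row)
    step = 1.0 / (2.0 * (frob2 + reg) + 1e-12)
    x = list(x_prev)
    for _ in range(iters):
        # Residual of the data-fit term.
        res = [sum(row[j] * x[j] for j in range(n)) - dk for row, dk in zip(C, d)]
        grad = [2.0 * sum(C[k][j] * res[k] for k in range(len(C)))
                + 2.0 * reg * (x[j] - x_prev[j]) for j in range(n)]
        # Gradient step followed by projection onto the nonnegative orthant.
        x = [max(0.0, x[j] - step * grad[j]) for j in range(n)]
    return x


C_small = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
x_fit = solve_regularized_nnls(C_small, [1.0, 2.0, 3.0])           # consistent data
x_clip = solve_regularized_nnls(C_small, [-1.0, 2.0, 1.0])         # constraint active
x_reg = solve_regularized_nnls(C_small, [1.0, 2.0, 3.0], x_prev=[5.0, 5.0], reg=1e6)
print(x_fit, x_clip, x_reg)
```

With a large `reg`, the estimate stays close to the previous phase's parameters, which is the smoothing effect the regularization term is meant to provide. For real data, a dedicated convex solver would be the more robust choice.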

5.4 Phase Detection Algorithm

We now provide an algorithm that divides the timeline of the pandemic into multiple phases in such a way that the beginning of each new phase indicates a significant change in one or more of the contact rates $\{A_{ij}:i,j\in[m]\}$.

Given the pandemic timeline $\{p_{0},\ldots,p_{s}\}$ (where $p_{0}$ denotes March 10, 2020 and $p_{s}$ denotes April 9, 2021), our phase detection algorithm outputs $s-1$ phase boundaries $p_{1}\leq p_{2}\leq\cdots\leq p_{s-1}$ that divide $[p_{0},p_{s})$ into $s$ phases, namely $\Gamma_{1}=[p_{0},p_{1}),\Gamma_{2}=[p_{1},p_{2}),\ldots,\Gamma_{s}=[p_{s-1},p_{s})$. Central to the algorithm are the following optimization problems:

Problem (a): Unconstrained Optimization

$$\underset{x\geq 0}{\text{minimize}}\ \ \ \|C_{[p,p+w)}x-d_{[p,p+w)}\|_{2}. \tag{28}$$

Problem (b): Constrained Optimization

\begin{align}
\underset{x\geq 0}{\text{minimize}}\ \ \ &\|C_{[p,p+w)}x-d_{[p,p+w)}\|_{2}, \tag{29}\\
\text{subject to}\ \ \ &\|x-\bar{x}_{[p-\Delta p,p-\Delta p+w)}\|_{2}\leq\varepsilon\|\bar{x}_{[p-\Delta p,p-\Delta p+w)}\|_{2}. \tag{30}
\end{align}

In these problems, $p\in\Gamma$ denotes the start date (chosen recursively as described in Algorithm 2), $w\in\mathbb{N}$ is the optimization window, $\Delta p<w$ is the algorithm step size, $[p,p+w)$ denotes a $w$-day period from day $p$, $C_{[p,p+w)}$ and $d_{[p,p+w)}$ are obtained from $\{(s_{i}[k],\beta_{i}[k],r_{i}[k]):i\in[m],k\in\{p,\ldots,p+w\}\}$ by using the procedure described in Section 5.3, and $\bar{x}_{[p,p+w)}:=\operatorname{argmin}_{x\geq 0}\|C_{[p,p+w)}x-d_{[p,p+w)}\|_{2}$ is the parameter vector estimated by Problem (a) for the window $[p,p+w)$. We set $w=30$ (days), and the quantities $\Delta p$ and $\varepsilon$ are pre-determined algorithm parameters whose choice is discussed in the next subsection.

Observe that both Problem and Problem (b) result in the minimization of the mean-square error (31), where {(s^i[k],β^i[k],r^i[k]):i[m],k{p,,p+w}}{\{(\hat{s}_{i}[k],\hat{\beta}_{i}[k],\hat{r}_{i}[k]):i\in[m],k\in\{p,\ldots,p+w\}\}} are the model-generated values (estimates) of the susceptible, infected, and recovered fractions {(si[k],βi[k],ri[k]):i[m],k{p,,p+w}}\{(s_{i}[k],\beta_{i}[k],r_{i}[k]):i\in[m],k\in\{p,\ldots,p+w\}\}. Also note that Problem (a) performs this minimization while ignoring all the previously estimated model parameters, whereas Problem (b) performs the same minimization while constraining xx to remain close to the parameter vector estimated for the period [pΔp,pΔp+w)[p-\Delta p,p-\Delta p+w). However, if the contact rates do not change significantly around day pp, then the additional constraint imposed in Problem (b) should be satisfied automatically (without imposition) in Problem (a), which should in turn result in the same mean-square error for both the problems.

\begin{align}
\mathcal{E}=\frac{1}{3m(w+1)}\sum_{i\in[m]}\sum_{k\in\{p,\ldots,p+w\}}\Big((s_{i}[k]-\hat{s}_{i}[k])^{2}+(\beta_{i}[k]-\hat{\beta}_{i}[k])^{2}+(r_{i}[k]-\hat{r}_{i}[k])^{2}\Big). \tag{31}
\end{align}
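As an illustration, the error (31) can be computed directly from the observed and model-generated fractions. The following is a minimal sketch; the function name and the nested-list data layout are assumptions made here for illustration and are not taken from the authors' implementation.

```python
def mean_square_error(observed, generated):
    """Mean-square error in the spirit of (31).

    Both arguments are lists of length m (one entry per age group); each
    entry is a list of (s, beta, r) tuples, one per day k in {p, ..., p+w}.
    This data layout is an assumption made for illustration.
    """
    m = len(observed)
    num_days = len(observed[0])  # equals w + 1 in the notation of (31)
    total = 0.0
    for obs_group, gen_group in zip(observed, generated):
        for (s, b, r), (s_hat, b_hat, r_hat) in zip(obs_group, gen_group):
            total += (s - s_hat) ** 2 + (b - b_hat) ** 2 + (r - r_hat) ** 2
    return total / (3 * m * num_days)
```

The normalization by $3m(w+1)$ makes errors comparable across windows of the same length, which is what the relative comparison between Problems (a) and (b) relies on.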

Therefore, after solving Problems (a) and (b), our phase detection algorithm uses (31) to compute $\mathcal{E}_{(a)p}$ (the mean-square error for Problem (a)) and $\mathcal{E}_{(b)p}$ (the mean-square error for Problem (b)). It then compares the relative difference $\frac{|\mathcal{E}_{(b)p}-\mathcal{E}_{(a)p}|}{\mathcal{E}_{(a)p}}$ with $\delta$, a positive threshold whose choice is discussed in the next subsection. If

\begin{align}
\frac{|\mathcal{E}_{(b)p}-\mathcal{E}_{(a)p}|}{\mathcal{E}_{(a)p}}>\delta, \tag{32}
\end{align}

then $p$ is identified as a phase boundary. In either case, the algorithm then increments the value of $p$ by $\Delta p$, checks whether the interval $[p,p+w)$ is part of the timeline $\Gamma$, and repeats the entire procedure described above.
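The scanning procedure described above can be sketched as follows. This is a simplified illustration rather than the authors' code: `error_a` and `error_b` are hypothetical callables that return the mean-square errors of Problems (a) and (b) for the window starting on day `p`, and the timeline is assumed to consist of days `0, ..., timeline_length - 1`.

```python
def detect_boundaries(timeline_length, w, dp, delta, error_a, error_b):
    """Scan windows [p, p+w) in steps of dp and flag p when test (32) fires.

    error_a(p) and error_b(p) are hypothetical callables standing in for the
    solutions of Problems (a) and (b); they are not part of the paper.
    """
    boundaries = []
    p = dp
    while p + w <= timeline_length:  # keep the whole window inside the timeline
        e_a = error_a(p)
        e_b = error_b(p)
        # Relative increase in error caused by the constraint of Problem (b)
        if e_a > 0 and abs(e_b - e_a) / e_a > delta:
            boundaries.append(p)
        p += dp
    return boundaries
```

The guard `e_a > 0` is an added safeguard against division by zero; the paper does not discuss the case of a perfect fit.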

Finally, the algorithm merges every short phase (of length $\leq 20$ days) with its predecessor by deleting the appropriate phase boundaries. There are two reasons for this step. First, the contact rates are believed to change not instantly but over a transition period of positive duration. Second, since the data used is noisy, avoiding overfitting requires the number of data points per phase (given by $2m$ times the number of days per phase) to significantly exceed $m^{2}+m$, the number of model parameters to be estimated per phase.
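To make the second reason concrete, the ratio of data points to parameters can be tabulated for a few phase lengths. The sketch below assumes $m=5$ age groups, as implied by the five age groups and 25 contact rates discussed in the results; the function name is illustrative.

```python
def data_to_parameter_ratio(m, days):
    """Ratio of data points (2m per day) to model parameters (m^2 + m)."""
    data_points = 2 * m * days
    parameters = m * m + m
    return data_points / parameters

# Assumed m = 5 age groups; phase lengths chosen for illustration only.
for days in (10, 20, 30):
    print(days, data_to_parameter_ratio(5, days))
```

Since the ratio simplifies to $2\cdot\text{days}/(m+1)$, it grows linearly with the phase length, which is why very short phases leave little room for averaging out noise.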

We now provide the pseudocode for the entire algorithm. Observe that Problems (a) and (b) are both convex optimization problems. This enables us to use the Embedded Conic Solver (ECOS) [35] of CVXPY [36, 37] to implement our algorithm.

Algorithm 2 Phase Detection Algorithm

Input: $(s_{i}[k],\beta_{i}[k],r_{i}[k])$ for all $i\in[m]$ and $k\in\Gamma$
Output: Set of phase boundaries $\mathcal{B}$

function Detect_Phases($(s_{i}[k],\beta_{i}[k],r_{i}[k])$)
    Initialize set of phase boundaries $\mathcal{B}\leftarrow\emptyset$
    Initialize start date $p\leftarrow\Delta p$
    while $p\in\Gamma$ do
        Solve Problem (a) for window $[p,p+w)$
        Solve Problem (b) for window $[p,p+w)$
        if condition (32) holds then
            $\mathcal{B}\leftarrow\mathcal{B}\cup\{p\}$
        $p\leftarrow p+\Delta p$
    Initialize $p_{\text{start}}\leftarrow 0$
    Initialize $\mathbf{b}\leftarrow\text{list}(\mathcal{B})$
    Sort $\mathbf{b}$ in ascending order
    for $p\in\mathbf{b}$ do
        if $p-p_{\text{start}}\leq 20$ then
            $\mathcal{B}\leftarrow\mathcal{B}\setminus\{p\}$
        else
            $p_{\text{start}}\leftarrow p$
    return $\mathcal{B}$
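The merging step performed by the second loop of Algorithm 2 can be written compactly in Python. This is a minimal sketch that mirrors the pseudocode; the function name and the `min_length` parameter are introduced here for illustration.

```python
def merge_short_phases(boundaries, min_length=20):
    """Drop every boundary that would close a phase of at most min_length days.

    Boundaries are visited in ascending order, and p_start tracks the most
    recently retained boundary (initially day 0), as in Algorithm 2.
    """
    kept = []
    p_start = 0
    for p in sorted(set(boundaries)):
        if p - p_start <= min_length:
            continue  # phase [p_start, p) is too short: delete boundary p
        kept.append(p)
        p_start = p
    return kept
```

For example, `merge_short_phases([5, 18, 40, 55, 90])` returns `[40, 90]`: the boundaries at days 5, 18, and 55 each lie within 20 days of the last retained boundary and are therefore removed.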

5.5 Selection of Algorithm Parameters

We now explain our parameter choices for the algorithms described above.

5.5.1 Phase Detection Algorithm

As mentioned earlier, for Algorithm 2, we set $\Delta p=5$ days and the optimization window $w=30$ days. This ensures that the optimization window is large enough for the number of model parameters to be significantly smaller than the number of data points used to estimate these parameters in Problems (a) and (b). In addition, we set $\delta=3$ and $\varepsilon=10^{-4}$ for the following reasons:

  1. $\varepsilon=10^{-4}$: If both $[p,p+w)$ and $[p-\Delta p,p-\Delta p+w)$ are sub-intervals of the same phase, then the same set of contact rates (and hence the same parameter vector $x$) should apply to the network during both the time intervals. A small value of $\varepsilon$ encodes this requirement in the constraint (30).

  2. $\delta=3$: If day $p$ marks the beginning of a new phase (i.e., a new set of contact rates), we expect the mean-square error (31) to increase significantly upon the imposition of the constraint introduced in (30).

5.5.2 Parameter Estimation Algorithm

We set $\lambda=10^{-5}$ in (27). This small but non-zero value is consistent with our belief that around every phase boundary, contact rates change gradually but significantly during a transition period involving the phase boundary.

5.6 Results

We now present the results of implementing both the algorithms on our chosen dataset.

5.6.1 Phase Detection

Algorithm 2 detects the following phases.

Phase  From         To           Corresponding Events
1      Mar 10 2020  Mar 28 2020  Closure of Schools
2      Mar 28 2020  Apr 23 2020  Issuance of State of Emergency
3      Apr 23 2020  May 20 2020
4      May 20 2020  Jun 22 2020
5      Jun 22 2020  Jul 24 2020  Summer Vacation
6      Jul 24 2020  Aug 25 2020  Obon, Summer Vacation
7      Aug 25 2020  Sep 23 2020  Summer Vacation
8      Sep 23 2020  Oct 20 2020  “Go to Travel” Campaign; Relaxation of Immigration Policy
9      Oct 20 2020  Nov 14 2020  “Go to Eat” Campaign; “Go to Travel” Campaign
10     Nov 14 2020  Dec 19 2020
11     Dec 19 2020  Jan 12 2021  Issuance of State of Emergency; Winter Vacation
12     Jan 12 2021  Feb 07 2021
13     Feb 07 2021  Apr 09 2021
Table 1: Phases Detected by Algorithm 2 [38, 39]

Although some of the detected phases can be accounted for by identifying changes in governmental policies and major social events, many of them seem to result from changes in social interaction patterns that cannot be explained using public information sources (such as news websites). However, this is consistent with our intuition that social behavior is inherently dynamic: it displays significant changes even in the absence of government diktats and important calendar events. Moreover, except for the first phase, the length of every phase is at least 25 days, which points to the likely scenario that it takes at least 3 to 4 weeks for the contact rates to change significantly. This could be true because social behavior is often unorganized. In particular, the interaction patterns of any one individual are often not synchronized with those of others.

Another noteworthy inference to be drawn from Table 1 and Figure 1 is that policy changes initiated by governments have a delayed effect at times. For example, the “Go to Travel” and the “Go to Eat” campaigns, launched between mid-September and mid-November (Phases 8 and 9), seem to have caused a spike in daily case counts in the subsequent phases (Phases 10 and 11). Likewise, the State of Emergency issued in Phase 11 seems to have come to fruition in Phase 12 and its effects appear to have remained until the last phase (Phase 13).

5.6.2 Parameter Estimation and Its Implications

Figure 1 below plots the original and the model-generated fractions of infected individuals in each age group as functions of time.

Figure 1: Age-wise Daily Fractions of Infected Individuals in Tokyo, Japan: Original and Generated Trajectories
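For readers who wish to reproduce trajectories of this kind, the following sketch integrates an age-structured SIR system with the forward Euler method. It assumes the standard bilinear form $\dot{s}_i=-s_i\sum_j A_{ij}\beta_j$, $\dot{\beta}_i=s_i\sum_j A_{ij}\beta_j-\gamma_i\beta_i$, $\dot{r}_i=\gamma_i\beta_i$, with $m^2$ contact rates and $m$ recovery rates; the exact equations and the discretization used by the authors are given earlier in the paper and may differ. All names and numerical values below are illustrative.

```python
def simulate_age_structured_sir(s0, beta0, r0, A, gamma, days, steps_per_day=10):
    """Forward-Euler integration of an assumed age-structured SIR system.

    A[i][j] is the (assumed) contact rate from group j to group i, and
    gamma[i] is the (assumed) recovery rate of group i.
    Returns one (s, beta, r) snapshot per day, including day 0.
    """
    m = len(s0)
    s, beta, r = list(s0), list(beta0), list(r0)
    h = 1.0 / steps_per_day  # Euler step size in days
    trajectory = [(tuple(s), tuple(beta), tuple(r))]
    for _ in range(days):
        for _ in range(steps_per_day):
            # New infections and recoveries per unit time, for each group
            new_inf = [s[i] * sum(A[i][j] * beta[j] for j in range(m)) for i in range(m)]
            rec = [gamma[i] * beta[i] for i in range(m)]
            for i in range(m):
                s[i] -= h * new_inf[i]
                beta[i] += h * (new_inf[i] - rec[i])
                r[i] += h * rec[i]
        trajectory.append((tuple(s), tuple(beta), tuple(r)))
    return trajectory
```

Because every Euler step moves mass only between compartments, the total $\sum_i (s_i+\beta_i+r_i)$ is preserved up to rounding, which is a convenient sanity check on any implementation.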

Figure 2 plots the estimated contact rates and labels the 10 most significant ones among the 25 rates.

As seen in Figure 1, three COVID-19 surges or “waves” occur during the considered timeline. For each wave, we explain below the corresponding contact rate variations and their implications with the help of the mobility data of Tokyo (Figure 3) collected by Google [40].

Figure 2: Estimated Contact Rate Between Groups
Figure 3: Mobility for Each Type of Place by Google

The period in which a state of emergency was issued is highlighted in red.

The First Wave (March 2020 - June 2020, Phases 1 - 3)

This wave corresponds to a rapid surge in daily cases across the world followed by various governmental measures such as issuance of national emergencies, tightening of immigration policies, home quarantines, and school closures. In Japan, the national emergency consisted of various measures such as restrictions on service times in restaurants and bars, enforcement of work from home, and a limit on the number of people attending public events. As a result of these measures, the mobility of workplaces, retail and recreation, and transit stations dropped dramatically in April 2020 and remained low for over a month (Figure 3).

This drop is reflected in our simulation results (Figure 2), which show that the three greatest contact rates decreased steadily from April to June. However, Figure 2 also shows that contact rates from the age group 60-79 to most other age groups (shown in blue) remained remarkably high throughout the timeline $\Gamma$. This may be because most people admitted to nursing homes are aged above 60 and frequently come in contact with the relatively younger care-taking staff. More strikingly, the contact rate from age group 60-79 to age group 80+ is consistently high. This could be because there is a significant number of married couples with members from both these age groups (thereby resulting in a high value of $\rho_{ij}$ for $(i,j)=(m,m-1)$) and because the age group 80+ has the lowest immunity levels, which leads to a large effective $B_{ij}$ (infection rate) for $(i,j)=(m,m-1)$.

The Second Wave (July 2020 - September 2020, Phases 4 - 7)

The most intriguing aspect of the second wave is that the wave subsided without any significant governmental interventions (such as the issuance of a nationwide emergency). To explain this phenomenon, some researchers point out that (i) the rate of PCR testing increased in July and thus more infections were detected in the first few weeks of the second wave, and (ii) people’s mobility decreased in August during the Japanese summer vacation period called “Obon” [39]. As we can infer from Figure 3, this decrease in mobility occurs primarily at workplaces and transit stations [39].

Besides Figure 3, our simulation results provide some insight into the second wave. Figure 2 shows that the contact rates from age group 60-79 to other age groups do not show any increase during the first few weeks of the wave. However, the intra-group contact rate of the age group 20-39 increases rapidly before this period and drops significantly in August, corresponding to a decrease in daily cases. This strongly suggests that the social activities of those aged 20-39 played a key role in the second wave. Meanwhile, contact rates from the age group 60-79 decreased after the first wave, possibly because of an increase in the proportion of quarantined individuals among the elderly, which in turn could have resulted from an increased public awareness of older age groups’ higher susceptibility to the virus.

The Third Wave (October 2020 - January 2021, Phases 8 - 11)

This wave was the most severe of the three because in October, a policy promoting domestic travel (the “Go to Travel” campaign) was implemented in the Tokyo prefecture and eating out was promoted as well (as part of the “Go to Eat” campaign). In addition, Japan started relaxing its immigration policy in October. [39] points out that the “major factors for this rise include the government’s implementation of further policies to encourage certain activities, relaxed immigration restrictions, and people not reducing their level of activity”. This observation is supported by Figure 3, which shows that there is no drop in mobility in any category during the third wave. As a result, daily infection counts dropped only after the second state of emergency was issued by the government on January 7, 2021.

In agreement with these observations are our simulation results (Figure 2), which show that the age group 60-79 remained the most infectious throughout the third wave, and that the contact rates from the age group 20-39 gradually increased in the early weeks of the wave. This was followed by a remarkable decrease in the intra-group contact rate of the age group 60-79 from mid-January onwards.

Comparing the Age Groups on the Basis of Infectiousness and Susceptibility

It is evident from Figure 2 that among all the five age groups, members of the youngest age group (0-19) are the least likely to contract COVID-19. This validates the current understanding of the scientific community that children and teenagers are more immune to the disease than adults. At the opposite extreme, the age groups 80+ and 20-39 appear to be the most vulnerable, possibly because members of the former group have the lowest immunity levels and the latter group exhibits the highest levels of mobility and social activity.

Besides showing how the likelihood of contracting the infection varies across age groups, Figure 2 also shows how the likelihood of transmitting it varies across age groups. From the figure, the two most infectious age groups are clearly 60-79 and 20-39. Surprisingly, the age group 80+ is found to be less infectious than the group 60-79, perhaps because of the lower social mobility of the former. The figure also shows that the age groups 0-19, 40-59 and 80+ are remarkably less infectious than the other two age groups. However, we need additional empirical evidence to validate these findings, and it would be interesting to see whether our inferences are echoed by future empirical studies.

6 CONCLUSION AND FUTURE DIRECTIONS

We have analyzed the age-structured SIR model of epidemic spreading from both theoretical and empirical viewpoints. Starting from a stochastic epidemic model, we have shown that the ODEs defining the age-structured SIR model are the mean-field limits of a continuous-time Markov process evolving over a time-varying network that involves random, asynchronous interactions if and only if the social mixing rate grows unboundedly with the population size. We have also provided a lower bound on the associated convergence rates in terms of the social mixing rate. As for empirical validation, we have proposed two algorithms: a least-squares method to estimate the model parameters based on real data and a phase detection algorithm to detect changes in contact rates and hence also the most significant social behavioral changes that possibly occurred during the observed pandemic timeline. We have validated our model empirically by using it to approximate the trajectories of the numbers of susceptible, infected, and recovered individuals in the prefecture of Tokyo, Japan, over a period of more than 12 months. Our results show that for the purpose of forecasting the future of the COVID-19 pandemic and designing appropriate control policies, the age-structured SIR model is likely to be a strong contender among compartmental epidemic models.

Our analysis, however, has a few limitations. First, it is not clear whether the large number of phases detected by Algorithm 2 indicates rapidly changing social interaction patterns or simply that our model is unable to approximate the pandemic over timescales significantly longer than a month. Second, the outputs of our algorithms have a few surprising implications that are as yet unconfirmed by independent empirical studies. For example, the estimated contact rates indicate that the age group 60-79 is consistently more infectious than the age group 20-39, a finding that is inconsistent with the widely held belief that younger age groups are significantly more mobile than the older ones. Such apparent anomalies highlight the need for age-stratified mobility datasets that would enable further investigation into the dynamic interplay between social behavior and epidemic spreading.

References

  • [1] Worldometer, “COVID-19 Coronavirus Pandemic,” World Health Organization, www.worldometers.info/coronavirus, last accessed 2021, November 11.
  • [2] N. Ferguson, D. Laydon, G. Nedjati-Gilani, N. Imai, K. Ainslie, M. Baguelin, S. Bhatia, A. Boonyasiri, Z. Cucunubá, G. Cuomo-Dannenburg, et al., “Report 9: Impact of non-pharmaceutical interventions (NPIs) to reduce COVID-19 mortality and healthcare demand,” Imperial College London, vol. 10, no. 77482, pp. 491–497, 2020.
  • [3] F. E. Alvarez, D. Argente, and F. Lippi, “A simple planning problem for COVID-19 lockdown,” tech. rep., National Bureau of Economic Research, 2020.
  • [4] D. Acemoglu, V. Chernozhukov, I. Werning, and M. D. Whinston, “A multi-risk SIR model with optimally targeted lockdown,” NBER working paper, no. w27102, 2020.
  • [5] K. Chatterjee, K. Chatterjee, A. Kumar, and S. Shankar, “Healthcare impact of COVID-19 epidemic in India: A stochastic mathematical model,” Medical Journal, Armed Forces, India, vol. 76, no. 2, pp. 147–155, 2020.
  • [6] P. E. Paré, J. Liu, C. L. Beck, A. Nedić, and T. Başar, “Multi-competitive viruses over time-varying networks with mutations and human awareness,” Automatica, vol. 123, p. 109330, 2021.
  • [7] F. Bullo, “Current epidemiological models: Scientific basis and evaluation,” Health, vol. 4, p. 14, 2020.
  • [8] W. O. Kermack and A. G. McKendrick, “A contribution to the mathematical theory of epidemics,” Proceedings of the Royal Society of London, Series A, vol. 115, no. 772, pp. 700–721, 1927.
  • [9] T. G. Kurtz, “Solutions of ordinary differential equations as limits of pure jump Markov processes,” Journal of Applied Probability, vol. 7, no. 1, pp. 49–58, 1970.
  • [10] M. Benaim and J.-Y. Le Boudec, “A class of mean field interaction models for computer and communication systems,” Performance Evaluation, vol. 65, no. 11-12, pp. 823–838, 2008.
  • [11] E. Volz, “SIR dynamics in random networks with heterogeneous connectivity,” Journal of Mathematical Biology, vol. 56, no. 3, pp. 293–310, 2008.
  • [12] L. Decreusefond, J.-S. Dhersin, P. Moyal, V. C. Tran, et al., “Large graph limit for an SIR process in random network with heterogeneous connectivity,” The Annals of Applied Probability, vol. 22, no. 2, pp. 541–575, 2012.
  • [13] J. Mossong, N. Hens, M. Jit, P. Beutels, K. Auranen, R. Mikolajczyk, M. Massari, S. Salmaso, G. S. Tomba, J. Wallinga, et al., “Social contacts and mixing patterns relevant to the spread of infectious diseases,” PLoS Medicine, vol. 5, no. 3, p. e74, 2008.
  • [14] K. Prem, A. R. Cook, and M. Jit, “Projecting social contact matrices in 152 countries using contact surveys and demographic data,” PLoS Computational Biology, vol. 13, no. 9, p. e1005697, 2017.
  • [15] P. Klepac, A. J. Kucharski, A. J. Conlan, S. Kissler, M. L. Tang, H. Fry, and J. R. Gog, “Contacts in context: large-scale setting-specific social mixing matrices from the BBC Pandemic Project,” MedRxiv, 2020.
  • [16] I. Voinsky, G. Baristaite, and D. Gurwitz, “Effects of age and sex on recovery from COVID-19: Analysis of 5769 Israeli patients,” Journal of Infection, vol. 81, no. 2, pp. e102–e103, 2020.
  • [17] G. Ellison, “Implications of heterogeneous SIR models for analyses of COVID-19,” tech. rep., National Bureau of Economic Research, 2020.
  • [18] D. Acemoglu, V. Chernozhukov, I. Werning, and M. D. Whinston, “Optimal targeted lockdowns in a multigroup SIR model,” American Economic Review: Insights, vol. 3, no. 4, pp. 487–502, 2021.
  • [19] D. Baqaee, E. Farhi, M. J. Mina, and J. H. Stock, “Reopening scenarios,” tech. rep., National Bureau of Economic Research, 2020.
  • [20] A. A. Rampini, “Sequential lifting of COVID-19 interventions with population heterogeneity,” tech. rep., National Bureau of Economic Research, 2020.
  • [21] C. A. Favero, A. Ichino, and A. Rustichini, “Restarting the economy while saving lives under COVID-19,” 2020. CEPR Discussion Paper No. DP14664.
  • [22] R. Parasnis, A. Sakhale, R. Kato, M. Franceschetti, and B. Touri, “A Case for the Age-Structured SIR Dynamics for Modelling COVID-19,” in 2021 60th IEEE Conference on Decision and Control (CDC), pp. 5508–5513, IEEE, 2021.
  • [23] A. V. Tkachenko, S. Maslov, A. Elbanna, G. N. Wong, Z. J. Weiner, and N. Goldenfeld, “Time-dependent heterogeneity leads to transient suppression of the COVID-19 epidemic, not herd immunity,” Proceedings of the National Academy of Sciences, vol. 118, no. 17, 2021.
  • [24] J. Dolbeault and G. Turinici, “Heterogeneous social interactions and the COVID-19 lockdown outcome in a multi-group SEIR model,” Mathematical Modelling of Natural Phenomena, vol. 15, p. 36, 2020.
  • [25] S. Contreras, H. A. Villavicencio, D. Medina-Ortiz, J. P. Biron-Lattes, and Á. Olivera-Nappa, “A multi-group SEIRA model for the spread of COVID-19 among heterogeneous populations,” Chaos, Solitons & Fractals, vol. 136, p. 109925, 2020.
  • [26] A. Viguerie, G. Lorenzo, F. Auricchio, D. Baroli, T. J. Hughes, A. Patton, A. Reali, T. E. Yankeelov, and A. Veneziani, “Simulating the spread of COVID-19 via a spatially-resolved Susceptible–Exposed–Infected–Recovered–Deceased (SEIRD) model with heterogeneous diffusion,” Applied Mathematics Letters, vol. 111, p. 106617, 2021.
  • [27] J. E. Harris, “Data from the COVID-19 epidemic in Florida suggest that younger cohorts have been transmitting their infections to less socially mobile older adults,” Review of Economics of the Household, vol. 18, no. 4, pp. 1019–1037, 2020.
  • [28] M. Giagheddu and A. Papetti, “The macroeconomics of age-varying epidemics,” Available at SSRN 3651251, 2020.
  • [29] A. Janiak, C. Machado, and J. Turén, “COVID-19 contagion, economic activity and business reopening protocols,” Journal of Economic Behavior & Organization, vol. 182, pp. 264–284, 2021.
  • [30] B. Armbruster and E. Beck, “Elementary proof of convergence to the mean-field model for the SIR process,” Journal of Mathematical Biology, vol. 75, no. 2, pp. 327–339, 2017.
  • [31] P. L. Simon and I. Z. Kiss, “From exact stochastic to mean-field ODE models: a new approach to prove convergence results,” The IMA Journal of Applied Mathematics, vol. 78, no. 5, pp. 945–964, 2013.
  • [32] Tokyo Metropolitan Government, “COVID-19 Information Website,” https://stopcovid19.metro.tokyo.lg.jp/en, last accessed 2021, September 28.
  • [33] World Health Organization, “Criteria for releasing COVID-19 patients from isolation,” https://www.who.int/news-room/commentaries/detail/criteria-for-releasing-covid-19-patients-from-isolation, last accessed 2021, September 28.
  • [34] Centers for Disease Control and Prevention, “Clinical Questions about COVID-19: Questions and Answers / When is someone infectious?,” https://www.cdc.gov/coronavirus/2019-ncov/hcp/faq.html#Transmission, last accessed 2021, September 28.
  • [35] A. Domahidi, E. Chu, and S. Boyd, “ECOS: An SOCP solver for embedded systems,” in 2013 European Control Conference (ECC), pp. 3071–3076, IEEE, 2013.
  • [36] S. Diamond and S. Boyd, “CVXPY: A Python-embedded modeling language for convex optimization,” Journal of Machine Learning Research, vol. 17, no. 83, pp. 1–5, 2016.
  • [37] A. Agrawal, R. Verschueren, S. Diamond, and S. Boyd, “A rewriting system for convex optimization problems,” Journal of Control and Decision, vol. 5, no. 1, pp. 42–60, 2018.
  • [38] Wikipedia, “Timeline of the COVID-19 pandemic in Japan,” last accessed 2021, September 28.
  • [39] K. Karako, P. Song, Y. Chen, W. Tang, and N. Kokudo, “Overview of the characteristics of and responses to the three waves of COVID-19 in Japan during 2020-2021,” BioScience Trends, 2021.
  • [40] Google, “COVID-19 Community Mobility Reports,” https://www.google.com/covid19/mobility/, last accessed 2021, September 28.
  • [41] S. P. Lalley, “Continuous Time Markov Chains,” Lecture Notes on Stochastic Processes II, 2012. galton.uchicago.edu/~lalley/Courses/313/ContinuousTime.pdf.

Appendix

Our first aim is to prove Proposition 1, which is based on Lemma 1. This lemma describes a known property of continuous-time Markov chains, but we prove it nevertheless. The proof is based on the concept of jump times, defined below.

Definition 1 (Jump Times).

The jump times of the Markov chain $\{\mathbf{X}(\tau):\tau\geq 0\}$ are the random times defined by $J_{0}:=0$ and $J_{\ell}:=\inf\{\tau\geq J_{\ell-1}:\mathbf{X}(\tau)\neq\mathbf{X}(J_{\ell-1})\}$ for all $\ell\in\mathbb{N}$.

Note that jump times are simply the times at which the Markov chain jumps or transitions to a new state.

Lemma 1.

Let $\mathbf{x}\in\mathbb{S}$ and let $[t,t+\Delta t)\subset[0,\infty)$. Given that $\mathbf{X}(t)=\mathbf{x}$, the conditional probability that more than one state transition occurs during $[t,t+\Delta t)$ is $o(\Delta t)$.

Proof.

Let $\mathbf{y},\mathbf{z}\in\mathbb{S}$ be any two states such that $\mathbf{y}$ and $\mathbf{z}$ potentially succeed $\mathbf{x}$ and $\mathbf{y}$, respectively. Also, let $\{\mathbf{X}(J_{i})\}_{i=0}^{\infty}$ be the embedded jump chain of $\{\mathbf{X}(\tau)\}_{\tau\geq 0}$ (where $J_{0}:=0$). Then, given that $\mathbf{X}(0)=\mathbf{x}$, $\mathbf{X}(J_{1})=\mathbf{y}$, and $\mathbf{X}(J_{2})=\mathbf{z}$, the holding times $J_{1}$ and $J_{2}-J_{1}$ are conditionally independent exponential random variables with parameters $q_{x}:=|\mathbf{Q}(\mathbf{x},\mathbf{x})|$ and $q_{y}:=|\mathbf{Q}(\mathbf{y},\mathbf{y})|$, respectively. Therefore, given that the original Markov chain makes its first and second transitions from $\mathbf{x}$ to $\mathbf{y}$ and from $\mathbf{y}$ to $\mathbf{z}$ respectively, the conditional probability that both of these transitions occur during $[0,\Delta t)$ satisfies

\begin{align}
&\text{Pr}(J_{2}<\Delta t\mid(\mathbf{X}(0),\mathbf{X}(J_{1}),\mathbf{X}(J_{2}))=(\mathbf{x},\mathbf{y},\mathbf{z})) \tag{33}\\
&\leq\text{Pr}(J_{2}-J_{1}<\Delta t,\,J_{1}<\Delta t\mid(\mathbf{X}(0),\mathbf{X}(J_{1}),\mathbf{X}(J_{2}))=(\mathbf{x},\mathbf{y},\mathbf{z})) \tag{34}\\
&=\text{Pr}(J_{2}-J_{1}<\Delta t\mid(\mathbf{X}(0),\mathbf{X}(J_{1}),\mathbf{X}(J_{2}))=(\mathbf{x},\mathbf{y},\mathbf{z}))\cdot\text{Pr}(J_{1}<\Delta t\mid(\mathbf{X}(0),\mathbf{X}(J_{1}),\mathbf{X}(J_{2}))=(\mathbf{x},\mathbf{y},\mathbf{z})) \tag{35}\\
&=(1-e^{-q_{y}\Delta t})(1-e^{-q_{x}\Delta t}) \tag{36}\\
&=o(\Delta t). \tag{37}
\end{align}

Therefore, $\text{Pr}(J_{2}<\Delta t\mid\mathbf{X}(0)=\mathbf{x})$, which is at most $\max_{\mathbf{y},\mathbf{z}\in\mathbb{S}}\text{Pr}(J_{2}<\Delta t\mid(\mathbf{X}(0),\mathbf{X}(J_{1}),\mathbf{X}(J_{2}))=(\mathbf{x},\mathbf{y},\mathbf{z}))$, is $o(\Delta t)$. Hence, given that $\mathbf{X}(0)=\mathbf{x}$, the conditional probability that at least two state transitions occur during $[0,\Delta t)$ is $o(\Delta t)$. By time-homogeneity, this means the following: given that $\mathbf{X}(t)=\mathbf{x}$, the conditional probability that at least two state transitions occur during $[t,t+\Delta t)$ is $o(\Delta t)$.
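The step from (36) to (37) can be checked numerically. Since $1-e^{-u}\leq u$ for $u\geq 0$, the product in (36) is at most $q_{x}q_{y}(\Delta t)^{2}$, so dividing it by $\Delta t$ should yield a quantity that vanishes as $\Delta t\to 0$. The sketch below illustrates this; the rates $q_{x}=2$ and $q_{y}=3$ are arbitrary values chosen for illustration and are not taken from the paper.

```python
import math

def two_jump_bound(q_x, q_y, dt):
    """Right-hand side of (36): bound on the probability of two jumps within dt."""
    return (1.0 - math.exp(-q_y * dt)) * (1.0 - math.exp(-q_x * dt))

q_x, q_y = 2.0, 3.0  # illustrative rates, not taken from the paper
for dt in (1e-1, 1e-2, 1e-3, 1e-4):
    ratio = two_jump_bound(q_x, q_y, dt) / dt
    # The ratio is bounded above by q_x * q_y * dt, which tends to zero with dt
    print(f"dt={dt:g}  bound/dt={ratio:.6g}  q_x*q_y*dt={q_x * q_y * dt:.6g}")
```

The printed ratio shrinks roughly in proportion to `dt`, which is the behavior that the notation $o(\Delta t)$ summarizes.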

Proof of Proposition 1

Proof.

We derive the equations one by one.

Proof of (i):

Consider any state $\mathbf{x}\in\mathbb{S}$. Then, by the definition of $\mathbf{Q}$, for any $i\in[m]$ and $a\in\mathcal{S}_{i}(\mathbf{x})$, we have

\begin{align}
\text{Pr}(\mathbf{X}(t+\Delta t)=\mathbf{x}_{\uparrow a}\mid\mathbf{X}(t)=\mathbf{x})=\mathbf{Q}(\mathbf{x},\mathbf{x}_{\uparrow a})\Delta t+o(\Delta t)=\left(\sum_{k=1}^{m}B_{ik}E_{k}^{(a)}(\mathbf{x})\right)\Delta t+o(\Delta t). \tag{38}
\end{align}

We now use (38) to evaluate the probability of the event $\{S_{i}(t+\Delta t)=S_{i}(t)-1\}$. To this end, let $D_{\ell}(U,t,\Delta t)$ denote the event that exactly $\ell$ nodes in a given set $U\subset[n]$ recover during $[t,t+\Delta t)$ (i.e., there exist exactly $\ell$ indices $r_{1},\ldots,r_{\ell}$ in $U$ such that $X_{r_{k}}(t)=1$ and $X_{r_{k}}(t+\Delta t)=-1$). Similarly, let $\mathscr{I}_{\ell}(U,t,\Delta t)$ denote the event that exactly $\ell$ nodes in $U$ get infected during $[t,t+\Delta t)$. Then,

\begin{align}
&\text{Pr}(S_{i}(t+\Delta t)-S_{i}(t)=-1\mid\mathbf{X}(t)=\mathbf{x}) \tag{39}\\
&=\text{Pr}(\mathscr{I}_{1}(\mathcal{A}_{i},t,\Delta t)\mid\mathbf{X}(t)=\mathbf{x}) \tag{40}\\
&\stackrel{(a)}{=}\text{Pr}(D_{0}([n],t,\Delta t)\cap\mathscr{I}_{0}([n]\setminus\mathcal{A}_{i},t,\Delta t)\cap\mathscr{I}_{1}(\mathcal{A}_{i},t,\Delta t)\mid\mathbf{X}(t)=\mathbf{x})+o(\Delta t) \tag{41}\\
&=\text{Pr}\left(\cup_{a\in\mathcal{S}_{i}(\mathbf{x})}\{\mathbf{X}(t+\Delta t)=\mathbf{x}_{\uparrow a}\}\mid\mathbf{X}(t)=\mathbf{x}\right)+o(\Delta t) \tag{42}\\
&=\sum_{a\in\mathcal{S}_{i}(\mathbf{x})}\text{Pr}(\mathbf{X}(t+\Delta t)=\mathbf{x}_{\uparrow a}\mid\mathbf{X}(t)=\mathbf{x})+o(\Delta t) \tag{43}\\
&\stackrel{(b)}{=}\left(\sum_{a\in\mathcal{S}_{i}(\mathbf{x})}\sum_{j=1}^{m}B_{ij}E_{j}^{(a)}(\mathbf{x})\right)\Delta t+o(\Delta t) \tag{44}\\
&=\left(\sum_{j=1}^{m}\sum_{a\in\mathcal{S}_{i}(\mathbf{x})}\sum_{b\in\mathcal{I}_{j}(\mathbf{x})}B_{ij}1_{(a,b)}(\mathbf{x})\right)\Delta t+o(\Delta t), \tag{45}
\end{align}

where $(a)$ is a straightforward consequence of Lemma 1, and $(b)$ follows from (38). Since this holds for all $\mathbf{x}\in\mathbb{S}$, we have

\begin{align}
\text{Pr}(S_{i}(t+\Delta t)-S_{i}(t)=-1\mid\mathbf{X}(t))=\left(\sum_{j=1}^{m}\sum_{a\in\mathcal{S}_{i}(t)}\sum_{b\in\mathcal{I}_{j}(t)}B_{ij}1_{(a,b)}(t)\right)\Delta t+o(\Delta t), \tag{46}
\end{align}

where $\mathcal{S}_{i}(t)$, $\mathcal{I}_{i}(t)$, and $1_{(a,b)}(t)$ stand for $\mathcal{S}_{i}(\mathbf{X}(t))$, $\mathcal{I}_{i}(\mathbf{X}(t))$, and $1_{(a,b)}(\mathbf{X}(t))$, respectively. Since $\mathcal{S}(t)$ and $\mathcal{I}(t)$ are determined by $\mathbf{X}(t)$, we may express (46) as

\begin{align}
\text{Pr}(S_{i}(t+\Delta t)-S_{i}(t)=-1\mid\mathcal{S}(t),\mathcal{I}(t),\mathbf{X}(t))=\left(\sum_{j=1}^{m}\sum_{a\in\mathcal{S}_{i}(t)}\sum_{b\in\mathcal{I}_{j}(t)}B_{ij}1_{(a,b)}(t)\right)\Delta t+o(\Delta t). \tag{47}
\end{align}

As a result, we have

\begin{align}
\text{Pr}(S_{i}(t+\Delta t)-S_{i}(t)=-1\mid\mathcal{S}(t),\mathcal{I}(t))=\left(\sum_{j=1}^{m}\sum_{a\in\mathcal{S}_{i}(t)}\sum_{b\in\mathcal{I}_{j}(t)}B_{ij}\mathbb{E}[1_{(a,b)}(t)\mid\mathcal{S}(t),\mathcal{I}(t)]\right)\Delta t+o(\Delta t). \tag{48}
\end{align}

At this point we note that

\[
\mathbb{E}[1_{(a,b)}(t)\mid\mathcal{S}(t),\mathcal{I}(t)]=\text{Pr}((a,b)\in E(t)\mid\mathcal{S}(t),\mathcal{I}(t))=\chi_{ij}(t,\mathcal{S},\mathcal{I}).
\]

We thus have the following for $\Delta_{t}S_{i}:=S_{i}(t+\Delta t)-S_{i}(t)$:

\begin{align}
\mathbb{E}[\Delta_{t}S_{i}\mid\mathcal{S}(t),\mathcal{I}(t)]&=-\text{Pr}(\Delta_{t}S_{i}=-1\mid\mathcal{S}(t),\mathcal{I}(t))-\sum_{\ell=2}^{n}\ell\cdot\text{Pr}(\Delta_{t}S_{i}=-\ell\mid\mathcal{S}(t),\mathcal{I}(t)) \tag{49}\\
&\stackrel{(a)}{=}-\text{Pr}(\Delta_{t}S_{i}=-1\mid\mathcal{S}(t),\mathcal{I}(t))+o(\Delta t) \tag{50}\\
&=-\left(\sum_{j=1}^{m}\sum_{a\in\mathcal{S}_{i}(t)}\sum_{b\in\mathcal{I}_{j}(t)}B_{ij}\chi_{ij}(t)\right)\Delta t+o(\Delta t) \tag{51}\\
&=-\left(\sum_{j=1}^{m}B_{ij}\chi_{ij}(t)S_{i}(t)I_{j}(t)\right)\Delta t+o(\Delta t), \tag{52}
\end{align}

where $(a)$ follows from Lemma 1. Taking expectations on both sides of (49) and dividing the resulting relation by $\Delta t$ now yields

\begin{align}
\mathbb{E}\left[\frac{S_{i}(t+\Delta t)-S_{i}(t)}{\Delta t}\right]=-\sum_{j=1}^{m}B_{ij}\mathbb{E}[n\chi_{ij}(t)s_{i}(t)I_{j}(t)]+\frac{o(\Delta t)}{\Delta t}, \tag{53}
\end{align}

where we used that $S_{i}(t)=ns_{i}(t)$. On letting $\Delta t\rightarrow 0$ and then dividing both sides of (53) by $n$, we obtain (i).

Proof of (ii):

Observe that for any $\mathbf{x}\in\mathbb{S}$, we have

\begin{align}
&\text{Pr}(I_{i}(t+\Delta t)-I_{i}(t)=1\mid\mathbf{X}(t)=\mathbf{x}) \tag{54}\\
&=\text{Pr}(|\mathcal{I}_{i}(\mathbf{X}(t+\Delta t))|-|\mathcal{I}_{i}(\mathbf{X}(t))|=1\mid\mathbf{X}(t)=\mathbf{x}) \tag{55}\\
&=\text{Pr}(\cup_{\ell=0}^{n}(D_{\ell}(\mathcal{A}_{i},t,\Delta t)\cap\mathscr{I}_{\ell+1}(\mathcal{A}_{i},t,\Delta t))\mid\mathbf{X}(t)=\mathbf{x}) \tag{56}\\
&\stackrel{(a)}{=}\text{Pr}(D_{0}(\mathcal{A}_{i},t,\Delta t)\cap\mathscr{I}_{1}(\mathcal{A}_{i},t,\Delta t)\mid\mathbf{X}(t)=\mathbf{x})+o(\Delta t) \tag{57}\\
&\stackrel{(b)}{=}\text{Pr}(D_{0}([n],t,\Delta t)\cap\mathscr{I}_{0}([n]\setminus\mathcal{A}_{i},t,\Delta t)\cap\mathscr{I}_{1}(\mathcal{A}_{i},t,\Delta t)\mid\mathbf{X}(t)=\mathbf{x})+o(\Delta t) \tag{58}\\
&=\text{Pr}\left(\cup_{c\in\mathcal{S}_{i}(\mathbf{x})}\{\mathbf{X}(t+\Delta t)=\mathbf{x}_{\uparrow c}\}\mid\mathbf{X}(t)=\mathbf{x}\right)+o(\Delta t) \tag{59}\\
&=\sum_{c\in\mathcal{S}_{i}(\mathbf{x})}\text{Pr}(\mathbf{X}(t+\Delta t)=\mathbf{x}_{\uparrow c}\mid\mathbf{X}(t)=\mathbf{x})+o(\Delta t) \tag{60}\\
&\stackrel{(c)}{=}\left(\sum_{c\in\mathcal{S}_{i}(\mathbf{x})}\sum_{j=1}^{m}B_{ij}E_{j}^{(c)}(\mathbf{x})\right)\Delta t+o(\Delta t), \tag{61}
\end{align}

where $(a)$ and $(b)$ follow from Lemma 1 and $(c)$ follows from (38).

On the other hand,

\begin{align}
&\text{Pr}(I_{i}(t+\Delta t)-I_{i}(t)=-1\mid\mathbf{X}(t)=\mathbf{x}) \tag{62}\\
&=\text{Pr}(\cup_{\ell=0}^{n}(D_{\ell+1}(\mathcal{A}_{i},t,\Delta t)\cap\mathscr{I}_{\ell}(\mathcal{A}_{i},t,\Delta t))\mid\mathbf{X}(t)=\mathbf{x}) \tag{63}\\
&\stackrel{(a)}{=}\text{Pr}(D_{1}(\mathcal{A}_{i},t,\Delta t)\cap\mathscr{I}_{0}(\mathcal{A}_{i},t,\Delta t)\mid\mathbf{X}(t)=\mathbf{x})+o(\Delta t) \tag{64}\\
&\stackrel{(b)}{=}\text{Pr}(D_{1}(\mathcal{A}_{i},t,\Delta t)\cap D_{0}([n]\setminus\mathcal{A}_{i},t,\Delta t)\cap\mathscr{I}_{0}([n],t,\Delta t)\mid\mathbf{X}(t)=\mathbf{x})+o(\Delta t) \tag{65}\\
&=\sum_{c\in\mathcal{I}_{i}(\mathbf{x})}\text{Pr}(\mathbf{X}(t+\Delta t)=\mathbf{x}_{\downarrow c}\mid\mathbf{X}(t)=\mathbf{x})+o(\Delta t) \tag{66}\\
&=\sum_{c\in\mathcal{I}_{i}(\mathbf{x})}(\mathbf{Q}(\mathbf{x},\mathbf{x}_{\downarrow c})\Delta t+o(\Delta t))+o(\Delta t) \tag{67}\\
&=\sum_{c\in\mathcal{I}_{i}(\mathbf{x})}\gamma_{i}\Delta t+o(\Delta t) \tag{68}\\
&=\gamma_{i}|\mathcal{I}_{i}(\mathbf{x})|\Delta t+o(\Delta t). \tag{69}
\end{align}

As a result of (54), (62), and Lemma 1, we have

\[
\mathbb{E}[I_{i}(t+\Delta t)-I_{i}(t)\mid\mathbf{X}(t)]=\left(\sum_{c\in\mathcal{S}_{i}(\mathbf{X}(t))}\sum_{j=1}^{m}B_{ij}E_{j}^{(c)}(\mathbf{X}(t))-\gamma_{i}|\mathcal{I}_{i}(\mathbf{X}(t))|\right)\Delta t+o(\Delta t).
\]

By repeating the arguments used in the proof of (i), we can use the above to prove that

\[
\mathbb{E}[I_{i}(t+\Delta t)-I_{i}(t)\mid\mathcal{S}(t),\mathcal{I}(t)]=\left(\sum_{j=1}^{m}B_{ij}\chi_{ij}(t)S_{i}(t)I_{j}(t)-\gamma_{i}I_{i}(t)\right)\Delta t+o(\Delta t),
\]

which implies that

\[
\mathbb{E}\left[\frac{I_{i}(t+\Delta t)-I_{i}(t)}{\Delta t}\right]=\sum_{j=1}^{m}B_{ij}\mathbb{E}[n\chi_{ij}(t)s_{i}(t)I_{j}(t)]-\gamma_{i}\mathbb{E}[I_{i}(t)]+\frac{o(\Delta t)}{\Delta t}.
\]

On dividing both sides by $n$ and then letting $\Delta t\rightarrow 0$, we obtain (ii).

Proof of (iii):

Observe that when $\Delta_{t}S_{i}=-1$, we have $S_{i}^{2}(t+\Delta t)-S_{i}^{2}(t)=1-2S_{i}(t)$. Therefore,

\begin{align*}
\mathbb{E}[S_{i}^{2}(t+\Delta t)-S_{i}^{2}(t)\mid\mathcal{S}(t),\mathcal{I}(t)] &=(1-2S_{i}(t))\cdot\text{Pr}(\Delta_{t}S_{i}=-1\mid\mathcal{S}(t),\mathcal{I}(t))+o(\Delta t)\\
&=\left(\sum_{j=1}^{m}B_{ij}\chi_{ij}S_{i}I_{j}\Delta t-2\sum_{j=1}^{m}B_{ij}\chi_{ij}S_{i}^{2}I_{j}\Delta t\right)+o(\Delta t).
\end{align*}

Taking expectations on both sides, dividing by $\Delta t$, letting $\Delta t\rightarrow 0$, and dividing both sides by $n^{2}$ yields (iii).

Proof of (iv):

Observe that if $\Delta_{t}I_{i}:=I_{i}(t+\Delta t)-I_{i}(t)=-1$, we have $I_{i}^{2}(t+\Delta t)-I_{i}^{2}(t)=1-2I_{i}(t)$, and if $\Delta_{t}I_{i}=1$, we have $I_{i}^{2}(t+\Delta t)-I_{i}^{2}(t)=1+2I_{i}(t)$.

Thus,

\begin{align*}
&\mathbb{E}[I_{i}^{2}(t+\Delta t)-I_{i}^{2}(t)\mid\mathcal{S}(t),\mathcal{I}(t)]\\
&=(1-2I_{i}(t))\cdot\text{Pr}(\Delta_{t}I_{i}=-1\mid\mathcal{S}(t),\mathcal{I}(t))+(1+2I_{i}(t))\cdot\text{Pr}(\Delta_{t}I_{i}=1\mid\mathcal{S}(t),\mathcal{I}(t))+o(\Delta t).
\end{align*}

On substituting the probabilities above with the expressions derived earlier, taking expectations on both sides, dividing by $n^{2}\Delta t$, and letting $\Delta t\rightarrow 0$, we obtain (iv).
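As a sketch of the substitution step, assume (as the proofs of (i) and (ii) suggest, but as is not restated in this part of the proof) that $\text{Pr}(\Delta_{t}I_{i}=1\mid\mathcal{S}(t),\mathcal{I}(t))=\left(\sum_{j=1}^{m}B_{ij}\chi_{ij}S_{i}I_{j}\right)\Delta t+o(\Delta t)$ and $\text{Pr}(\Delta_{t}I_{i}=-1\mid\mathcal{S}(t),\mathcal{I}(t))=\gamma_{i}I_{i}\Delta t+o(\Delta t)$. With the time argument suppressed, the conditional expectation then expands as

\begin{align*}
\mathbb{E}[I_{i}^{2}(t+\Delta t)-I_{i}^{2}(t)\mid\mathcal{S}(t),\mathcal{I}(t)]
&=(1-2I_{i})\gamma_{i}I_{i}\Delta t+(1+2I_{i})\sum_{j=1}^{m}B_{ij}\chi_{ij}S_{i}I_{j}\Delta t+o(\Delta t)\\
&=\left(\gamma_{i}I_{i}-2\gamma_{i}I_{i}^{2}+\sum_{j=1}^{m}B_{ij}\chi_{ij}S_{i}I_{j}+2\sum_{j=1}^{m}B_{ij}\chi_{ij}S_{i}I_{i}I_{j}\right)\Delta t+o(\Delta t).
\end{align*}

The factors $1\pm 2I_{i}$ are bounded by $2n+1$ for fixed $n$, so multiplying them by an $o(\Delta t)$ term leaves an $o(\Delta t)$ term.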

Lemma 2.

Let $a\in\mathcal{A}_{i}$ and $b\in\mathcal{A}_{j}$ be any two nodes, let $t\in[0,\infty)$ be any time, and let $T\in[0,t]$ be the random variable such that $t-T$ is the time at which $1_{(a,b)}$ is updated for the last time during the interval $[0,t]$. Then the random variables $T$ and $1_{(a,b)}(t)$ are independent.

Proof.

For $\tau\in[0,t]$, let $N_{\tau}$ denote the number of times $1_{(a,b)}$ is updated in the open interval $(t-\tau,t)$, and let $U_{\tau}$ denote the zero-probability event that $1_{(a,b)}$ is updated at time $t-\tau$. Note that $\mathbf{Q}(\mathbf{x},\mathbf{x}_{\uparrow(a,b)})+\mathbf{Q}(\mathbf{x},\mathbf{x}_{\downarrow(a,b)})=\lambda$ for all $\mathbf{x}\in\mathbb{S}$, which means that the rate at which $1_{(a,b)}$ is updated is time-invariant and independent of the network state. Hence, the sequence of times at which $1_{(a,b)}$ is updated is a Poisson process, so the updates of $1_{(a,b)}$ occurring in disjoint time intervals are independent. It follows that $N_{\tau}$ is a Poisson random variable (with mean $\lambda\tau$) that is independent of $U_{\tau}$ and $1_{(a,b)}(t-\tau)$. As a result,

\begin{align*}
\text{Pr}((a,b)\in E(t)\mid T=\tau) &\stackrel{(a)}{=}\text{Pr}((a,b)\in E(t-\tau)\mid U_{\tau},N_{\tau}=0)\\
&\stackrel{(b)}{=}\frac{\text{Pr}(N_{\tau}=0\mid(a,b)\in E(t-\tau),U_{\tau})}{\text{Pr}(N_{\tau}=0\mid U_{\tau})}\cdot\text{Pr}((a,b)\in E(t-\tau)\mid U_{\tau})\\
&=\frac{\text{Pr}(N_{\tau}=0\mid 1_{(a,b)}(t-\tau)=1,U_{\tau})}{\text{Pr}(N_{\tau}=0\mid U_{\tau})}\cdot\text{Pr}((a,b)\in E(t-\tau)\mid U_{\tau})\\
&=\frac{\text{Pr}(N_{\tau}=0)}{\text{Pr}(N_{\tau}=0)}\cdot\text{Pr}((a,b)\in E(t-\tau)\mid U_{\tau})\\
&\stackrel{(c)}{=}\frac{\rho_{ij}}{n},
\end{align*}

where $(a)$ follows from the definition of $T$, $(b)$ follows from Bayes' rule, and $(c)$ follows from the model definition (Section 2). Thus, $\text{Pr}(1_{(a,b)}(t)=1\mid T=\tau)$ and $\text{Pr}(1_{(a,b)}(t)=0\mid T=\tau)$ do not depend on $\tau$, which means that $T$ and $1_{(a,b)}(t)$ are independent.
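The independence asserted by Lemma 2 can be probed numerically. The following is a minimal simulation sketch under simplifying assumptions that are ours rather than the paper's: a single edge is tracked in isolation, its initial state is Bernoulli(`p`) with `p` standing in for $\rho_{ij}/n$, updates arrive as a rate-`lam` Poisson process, and every update redraws the state as an independent Bernoulli(`p`). The function names are hypothetical.

```python
import math
import random


def sample_last_update(lam, p, t, rng):
    """Simulate one edge on [0, t] and return (T, edge state at time t).

    T is the time elapsed since the last update, with T = t when the edge
    is never updated during [0, t].
    """
    state = 1 if rng.random() < p else 0      # initial edge state
    last_update = 0.0
    clock = rng.expovariate(lam)              # first update time
    while clock <= t:
        state = 1 if rng.random() < p else 0  # each update redraws the state
        last_update = clock
        clock += rng.expovariate(lam)
    return t - last_update, state


def conditional_edge_frequencies(lam, p, t, trials, seed=0):
    """Empirical Pr(edge = 1 | T < c) and Pr(edge = 1 | T >= c), c = ln(2)/lam."""
    rng = random.Random(seed)
    samples = [sample_last_update(lam, p, t, rng) for _ in range(trials)]
    cutoff = math.log(2) / lam                # median of an Exp(lam) variable
    recent = [s for T, s in samples if T < cutoff]
    stale = [s for T, s in samples if T >= cutoff]
    return sum(recent) / len(recent), sum(stale) / len(stale)


if __name__ == "__main__":
    print(conditional_edge_frequencies(lam=1.0, p=0.3, t=3.0, trials=100000))
```

If $T$ and $1_{(a,b)}(t)$ are independent, both returned frequencies should be close to `p` regardless of how the samples are split by $T$.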

The proof of Proposition 2 is based on the concepts of transition sequences and agnostic transition sequences, which we define below.

Definition 2 (Transition Sequence).

Consider any time $t\geq 0$, integer $r\in\mathbb{N}_{0}$, tuples $\mathbf{x}^{(1)},\mathbf{x}^{(2)},\ldots,\mathbf{x}^{(r)}\in\mathbb{S}$, and times $0\leq t_{1}<t_{2}<\cdots<t_{r}\leq t$. Let $F=\{\mathbf{x}^{(0)}\stackrel{t_{1}}{\rightarrow}\mathbf{x}^{(1)}\stackrel{t_{2}}{\rightarrow}\cdots\stackrel{t_{r}}{\rightarrow}\mathbf{x}^{(r)}\stackrel{t}{\rightarrow}\mathbf{x}^{(r)}\}$ denote the event that the embedded jump chain $\{\mathbf{X}(J_{\ell}):\ell\in\mathbb{N}_{0}\}$ satisfies $\mathbf{X}(J_{\ell})=\mathbf{x}^{(\ell)}$ and $J_{\ell}=t_{\ell}$ for all $\ell\in[r]$, and $J_{r+1}>t$. Then $F$ is said to be a transition sequence for the time interval $[0,t]$.

Note that if $F$ is a transition sequence for $[0,t]$, then for every tuple $\mathbf{x}\in\mathbb{S}$, we either have $F\subset\{\mathbf{X}(t)=\mathbf{x}\}$ or $F\subset\{\mathbf{X}(t)\neq\mathbf{x}\}$.

Definition 3 ($(a,b)$-Complement).

Let $\mathbf{x}\in\mathbb{S}$. Then the $(a,b)$-complement of $\mathbf{x}$, denoted by $\mathbf{x}_{(\overline{a,b})}$, is defined by

\[
(\mathbf{x}_{(\overline{a,b})})_{\ell}=\begin{cases}x_{\ell}\quad&\text{if }\ell\in[2n^{2}-n]\setminus\{\langle a,b\rangle\},\\ 1-x_{\ell}\quad&\text{if }\ell=\langle a,b\rangle.\end{cases}
\]
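To make the definition concrete, here is a small illustrative sketch. It assumes the state is stored as a Python tuple and that the coordinate $\langle a,b\rangle$ has already been mapped to a 0-based position `edge_index`; both the function name and this indexing are hypothetical and not taken from the paper.

```python
def edge_complement(x, edge_index):
    """Return a copy of state tuple x with the binary entry at edge_index flipped.

    All other coordinates (disease states and remaining edge states) are
    left unchanged, mirroring the two cases of the definition.
    """
    return tuple(1 - v if k == edge_index else v for k, v in enumerate(x))


if __name__ == "__main__":
    x = (0, 1, 1, 0)
    y = edge_complement(x, 2)
    print(y)                      # coordinate 2 flipped, others unchanged
    print(edge_complement(y, 2))  # applying the complement twice recovers x
```

Taking the complement twice returns the original state, which is why an event and its $(a,b)$-complement differ only in the edge state of $(a,b)$.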

Definition 4 ($(a,b)$-Agnostic Transition Sequence).

Let $a,b\in[n]$, and let $F=\{\mathbf{x}^{(0)}\stackrel{t_{1}}{\rightarrow}\mathbf{x}^{(1)}\stackrel{t_{2}}{\rightarrow}\cdots\stackrel{t_{r}}{\rightarrow}\mathbf{x}^{(r)}\stackrel{t}{\rightarrow}\mathbf{x}^{(r)}\}$ be a transition sequence for a time interval $[0,t]$. Further, define

\[
\Lambda_{(a,b)}(F)=\begin{cases}\max\left\{\ell\in[r]:\mathbf{x}^{(\ell)}\in\{\mathbf{x}^{(\ell-1)}_{\uparrow(a,b)},\mathbf{x}^{(\ell-1)}_{\downarrow(a,b)}\}\right\}\quad&\text{if }\left\{\ell\in[r]:\mathbf{x}^{(\ell)}\in\{\mathbf{x}^{(\ell-1)}_{\uparrow(a,b)},\mathbf{x}^{(\ell-1)}_{\downarrow(a,b)}\}\right\}\neq\emptyset,\\ 0\quad&\text{otherwise.}\end{cases}
\]

Then the $(a,b)$-agnostic transition sequence for $F$ is the event $F_{?(a,b)}=F\cup F_{(\overline{a,b})}$, where $F_{(\overline{a,b})}$, defined by

\begin{align*}
F_{(\overline{a,b})}:={}&\Big\{\mathbf{x}^{(0)}\stackrel{t_{1}}{\rightarrow}\cdots\stackrel{t_{\Lambda_{(a,b)}(F)-1}}{\rightarrow}\mathbf{x}^{(\Lambda_{(a,b)}(F)-1)}\stackrel{t_{\Lambda_{(a,b)}(F)}}{\rightarrow}\mathbf{x}^{(\Lambda_{(a,b)}(F))}_{(\overline{a,b})}\\
&\quad\stackrel{t_{\Lambda_{(a,b)}(F)+1}}{\rightarrow}\mathbf{x}^{(\Lambda_{(a,b)}(F)+1)}_{(\overline{a,b})}\stackrel{t_{\Lambda_{(a,b)}(F)+2}}{\rightarrow}\cdots\stackrel{t_{r}}{\rightarrow}\mathbf{x}^{(r)}_{(\overline{a,b})}\stackrel{t}{\rightarrow}\mathbf{x}^{(r)}_{(\overline{a,b})}\Big\},
\end{align*}

is called the $(a,b)$-complement of $F$.

Given that $F$ occurs, $t_{\Lambda_{(a,b)}(F)}$ denotes the time at which the edge state of $(a,b)$ is updated for the last time during the time interval $[0,t]$ (note that, if the edge state of $(a,b)$ is not updated during $[0,t]$, then $t_{\Lambda_{(a,b)}(F)}=t_{0}:=0$). Therefore, the only difference between $F$ and $F_{(\overline{a,b})}$ is that the last edge state of $(a,b)$ to be realized during the interval $[0,t]$ is different for $F$ and $F_{(\overline{a,b})}$. Stated differently, if $F\subset\{(a,b)\in E(t)\}$, then $F_{(\overline{a,b})}\subset\{(a,b)\notin E(t)\}$, and vice versa. As a result, the event $F_{?(a,b)}$ is $(a,b)$-agnostic in that the occurrence of this event does not provide any information about the edge state of $(a,b)$ at time $t$.

Note that if $F$ is a transition sequence, then $F$, $F_{(\overline{a,b})}$, and $F_{?(a,b)}$ are all zero-probability events. We now approximate these events with the help of suitable positive-probability events.

Definition 5 ($\delta$-Approximation).

Let $F=\{\mathbf{x}^{(0)}\stackrel{t_{1}}{\rightarrow}\mathbf{x}^{(1)}\stackrel{t_{2}}{\rightarrow}\cdots\stackrel{t_{r}}{\rightarrow}\mathbf{x}^{(r)}\stackrel{t}{\rightarrow}\mathbf{x}^{(r)}\}$ be a transition sequence. Then, for a given $\delta>0$, the $\delta$-approximation of $F$ is the event

\[
F^{\delta}:=\{\mathbf{X}(0)=\mathbf{x}^{(0)},\ldots,\mathbf{X}(J_{r})=\mathbf{x}^{(r)},\Delta_{1}J\in[\Delta_{1}t,\Delta_{1}t+\delta),\ldots,\Delta_{r}J\in[\Delta_{r}t,\Delta_{r}t+\delta),\Delta_{r+1}J>t-t_{r}\},
\]

where $\Delta_{\ell}J:=J_{\ell}-J_{\ell-1}$, $\Delta_{\ell}t:=t_{\ell}-t_{\ell-1}$, and $t_{0}:=0$. Also, the $\delta$-approximation of $F_{?(a,b)}$ is the event $F_{?(a,b)}^{\delta}:=F^{\delta}\cup F_{(\overline{a,b})}^{\delta}$ (where $F^{\delta}_{(\overline{a,b})}$ is the $\delta$-approximation of $F_{(\overline{a,b})}$).

The following lemma evaluates the probability of occurrence of a $\delta$-approximation event.

Lemma 3.

Let $F=\{\mathbf{x}^{(0)}\stackrel{t_{1}}{\rightarrow}\mathbf{x}^{(1)}\stackrel{t_{2}}{\rightarrow}\cdots\stackrel{t_{r}}{\rightarrow}\mathbf{x}^{(r)}\stackrel{t}{\rightarrow}\mathbf{x}^{(r)}\}$ be a transition sequence. Then for all sufficiently small $\delta>0$, the ratio $\frac{\text{Pr}(F^{\delta})}{\text{Pr}(\mathbf{X}(0)=\mathbf{x}^{(0)})}$ equals

\[
e^{-q_{r}(t-t_{r})}\prod_{\ell=1}^{r}q_{\ell-1,\ell}\left(e^{-q_{\ell-1}(t_{\ell}-t_{\ell-1})}\delta+o(\delta)\right),
\]

where $t_{0}:=0$, $q_{\ell}:=-\mathbf{Q}(\mathbf{x}^{(\ell)},\mathbf{x}^{(\ell)})$, and $q_{\ell,\ell+1}:=\mathbf{Q}(\mathbf{x}^{(\ell)},\mathbf{x}^{(\ell+1)})$ for each $\ell\in[r]$.

Proof.

Observe that

\begin{align*}
\frac{\text{Pr}(F^{\delta})}{\text{Pr}(\mathbf{X}(0)=\mathbf{x}^{(0)})} &=\text{Pr}(\cap_{\ell\in[r]}\{\mathbf{X}(J_{\ell})=\mathbf{x}^{(\ell)},\Delta_{\ell}J\in[\Delta_{\ell}t,\Delta_{\ell}t+\delta)\}\mid\mathbf{X}(0)=\mathbf{x}^{(0)})\\
&\quad\times\text{Pr}(\Delta_{r+1}J>t-t_{r}\mid\cap_{\ell\in[r]}\{\mathbf{X}(J_{\ell})=\mathbf{x}^{(\ell)},\Delta_{\ell}J\in[\Delta_{\ell}t,\Delta_{\ell}t+\delta)\})\\
&\stackrel{(a)}{=}\prod_{\ell=1}^{r}\text{Pr}(\Delta_{\ell}J\in[\Delta_{\ell}t,\Delta_{\ell}t+\delta),\mathbf{X}(J_{\ell})=\mathbf{x}^{(\ell)}\mid\mathbf{X}(J_{\ell-1})=\mathbf{x}^{(\ell-1)})\\
&\quad\times\text{Pr}(\Delta_{r+1}J>t-t_{r}\mid\mathbf{X}(J_{r})=\mathbf{x}^{(r)})\\
&\stackrel{(b)}{=}\prod_{\ell=1}^{r}\text{Pr}(\Delta_{1}J\in[\Delta_{\ell}t,\Delta_{\ell}t+\delta),\mathbf{X}(J_{1})=\mathbf{x}^{(\ell)}\mid\mathbf{X}(0)=\mathbf{x}^{(\ell-1)})\\
&\quad\times\text{Pr}(\Delta_{1}J>t-t_{r}\mid\mathbf{X}(0)=\mathbf{x}^{(r)})\\
&\stackrel{(c)}{=}\prod_{\ell=1}^{r}\Big(\text{Pr}(\Delta_{1}J\in[\Delta_{\ell}t,\Delta_{\ell}t+\delta)\mid\mathbf{X}(0)=\mathbf{x}^{(\ell-1)})\cdot\text{Pr}(\mathbf{X}(J_{1})=\mathbf{x}^{(\ell)}\mid\mathbf{X}(0)=\mathbf{x}^{(\ell-1)})\Big)\\
&\quad\times\text{Pr}(\Delta_{1}J>t-t_{r}\mid\mathbf{X}(0)=\mathbf{x}^{(r)})\\
&\stackrel{(d)}{=}\prod_{\ell=1}^{r}\left(\left(q_{\ell-1}e^{-q_{\ell-1}\Delta_{\ell}t}\delta+o(\delta)\right)\frac{q_{\ell-1,\ell}}{q_{\ell-1}}\right)\times e^{-q_{r}(t-t_{r})}\\
&=e^{-q_{r}(t-t_{r})}\prod_{\ell=1}^{r}q_{\ell-1,\ell}\left(e^{-q_{\ell-1}(t_{\ell}-t_{\ell-1})}\delta+o(\delta)\right),
\end{align*}

where $(a)$ follows from the strong Markov property and the fact that jump times are stopping times, $(b)$ follows from Proposition 3.2 of [41], $(c)$ follows from the fact that $\mathbf{X}(J_{1})$ and $\Delta_{1}J$ (which equals $J_{1}$) are conditionally independent given $\mathbf{X}(0)$ (see Proposition 3.1 of [41]), and $(d)$ follows from the following two facts:

  1. $\Delta_{1}J$ is conditionally exponentially distributed with mean $q_{\ell-1}^{-1}$ given that $\mathbf{X}(0)=\mathbf{x}^{(\ell-1)}$.

  2. For the embedded jump chain, the probability of transitioning from $\mathbf{x}\in\mathbb{S}$ to $\mathbf{y}\in\mathbb{S}$ is $\frac{\mathbf{Q}(\mathbf{x},\mathbf{y})}{|\mathbf{Q}(\mathbf{x},\mathbf{x})|}$.
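The two facts above are exactly what a standard jump-chain simulation of a continuous-time Markov chain relies on. The sketch below is illustrative only: the dictionary-based generator format, the function name `simulate_ctmc`, and the three-state example are assumptions made for this example, not the paper's model or code.

```python
import random


def simulate_ctmc(Q, x0, t_end, rng):
    """Simulate a CTMC path on [0, t_end] via its embedded jump chain.

    Q maps each state to a dict {next_state: rate} of off-diagonal rates,
    so the total exit rate q = -Q(x, x) is the sum of that dict's values.
    Returns a list of (jump_time, state) pairs starting with (0.0, x0).
    """
    path = [(0.0, x0)]
    clock, state = 0.0, x0
    while True:
        rates = Q.get(state, {})
        total = sum(rates.values())
        if total <= 0:
            break                              # absorbing state: no more jumps
        clock += rng.expovariate(total)        # fact 1: Exp holding time, mean 1/q
        if clock > t_end:
            break
        # fact 2: next state chosen with probability Q(x, y) / q
        state = rng.choices(list(rates), weights=list(rates.values()))[0]
        path.append((clock, state))
    return path


if __name__ == "__main__":
    # Hypothetical single-node example: S -> I at rate 2, I -> R at rate 1.
    Q = {"S": {"I": 2.0}, "I": {"R": 1.0}, "R": {}}
    print(simulate_ctmc(Q, "S", 50.0, random.Random(0)))
```

Each returned path is one realization of a transition sequence in the sense of Definition 2, with the recorded jump times playing the role of $t_{1},\ldots,t_{r}$.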

For the rest of the appendix, let $a\in\mathcal{A}_{i}$ and $b\in\mathcal{A}_{j}$ be any two nodes, let $t\in[0,\infty)$ be any time instant, let $T\in[0,t]$ be the random variable such that $t-T$ is the time at which $1_{(a,b)}$ is updated for the last time during the interval $[0,t)$, and let $K:=t-\min\{t,\inf\{\tau:b\in\mathcal{I}(\tau)\}\}$, where $\inf\{\tau:b\in\mathcal{I}(\tau)\}$ is the time at which $b$ gets infected.

Lemma 4.

The PDF of $T$ has $[0,t]$ as its support and is given by

\[
f_{T}(\tau)=\lambda e^{-\lambda\tau}+e^{-\lambda t}\delta_{D}(\tau-t),
\]

where $\delta_{D}(\cdot)$ is the Dirac-delta function.

Proof.

The definition of $T$ implies that the support of its PDF is $[0,t]$. To derive the required closed-form expression for this PDF, recall that $\mathbf{Q}(\mathbf{x},\mathbf{x}_{\uparrow(a,b)})+\mathbf{Q}(\mathbf{x},\mathbf{x}_{\downarrow(a,b)})=\lambda$ for all $\mathbf{x}\in\mathbb{S}$, which means that the edge state of $(a,b)$ is updated at a constant rate of $\lambda$ at all times. Therefore, for any $\tau\in[0,t)$, the quantity $\text{Pr}(T>\tau)$ (the probability that $1_{(a,b)}$ is not updated during $[t-\tau,t]$) is given by $e^{-\lambda\tau}$. However, $\text{Pr}(T>t)=0$, implying that $\text{Pr}(T=t)=\text{Pr}(T\geq t)=\lim_{\tau\rightarrow t^{-}}\text{Pr}(T>\tau)=e^{-\lambda t}$. Hence, the CDF of $T$ is $F(\tau)=1-e^{-\lambda\tau}$ for all $\tau\in[0,t)$, and $F(t)=1$. Taking the first derivative of this CDF now yields the required expression for $f_{T}$.
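As a quick consistency check (a sketch that is not part of the original argument, and that counts the atom at $\tau=t$ in full), the density integrates to one over its support:

\[
\int_{0}^{t}\lambda e^{-\lambda\tau}\,d\tau+e^{-\lambda t}=\left(1-e^{-\lambda t}\right)+e^{-\lambda t}=1.
\]

The continuous part accounts for the case in which at least one update occurs during $[0,t]$, and the atom at $\tau=t$ carries the remaining probability $e^{-\lambda t}$ that no update occurs at all.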

To prove the next lemma, we need the notion of agnostic superstates, which is defined below.

Definition 6 ($(a,b)$-Agnostic Superstate).

Given a node pair $(a,b)\in[n]\times[n]$, a collection of states $\mathbb{X}\subset\mathbb{S}$ is an $(a,b)$-agnostic superstate if $\mathbb{X}$ can be expressed as $\mathbb{X}=\left\{\mathbf{x},\mathbf{x}_{(\overline{a,b})},\mathbf{y},\mathbf{y}_{(\overline{a,b})}\right\}$ for a pair of states $\mathbf{x},\mathbf{y}\in\mathbb{S}$ satisfying $y_{n^{2}+\langle a,b\rangle}=1-x_{n^{2}+\langle a,b\rangle}$ and $x_{\ell}=y_{\ell}$ for all $\ell\in[2n^{2}-n]\setminus\{n^{2}+\langle a,b\rangle\}$.

Note that an $(a,b)$-agnostic superstate specifies the disease states of all the nodes and the edge states of all the node pairs except $(a,b)$.

Definition 7 ($(a,b)$-Agnostic Jump Times).

Given $(a,b)\in[n]\times[n]$, the $(a,b)$-agnostic jump times of the chain $\{\mathbf{X}(\tau):\tau\geq 0\}$, denoted by $\{L_{k}\}_{k=0}^{\infty}$, are defined by $L_{0}:=0$ and

\[
L_{k}:=\inf\{J_{\ell}:\ell\in\mathbb{N},J_{\ell}>L_{k-1},\mathbf{X}(J_{\ell})\notin\{(\mathbf{X}(J_{\ell-1}))_{\uparrow(a,b)},(\mathbf{X}(J_{\ell-1}))_{\downarrow(a,b)}\}\}
\]

for all $k\in\mathbb{N}$.

Note that $\{L_{k}\}_{k=0}^{\infty}\subset\{J_{\ell}\}_{\ell=0}^{\infty}$ and that the $(a,b)$-agnostic jump times of $\{\mathbf{X}(\tau)\}$ are the jump times of the chain at which the edge state of $(a,b)$ is not updated.

Lemma 5.

$K$ is independent of $(T,1_{(a,b)}(t))$.

Proof.

Note that $K$ is a function of $\tilde{K}:=\inf\{\tau\geq 0:b\in\mathcal{I}(\tau)\}$, the time at which $b$ gets infected. Hence, it suffices to prove that $\tilde{K}$ is independent of $(T,1_{(a,b)}(t))$.

Consider now any $\kappa\geq 0$ and note that $\{\tilde{K}\geq\kappa\}=\cup_{N=0}^{\infty}\left(\{\tilde{K}=L_{N}\}\cap\{L_{N}\geq\kappa\}\right)$. To examine the probability of $\{\tilde{K}=L_{N}\}$, we let $\mathcal{F}_{N}(\kappa)$ denote the set of all the events of the form $F=\{\mathbf{X}(0)\in\mathbb{X}^{(0)},\mathbf{X}(L_{1})\in\mathbb{X}^{(1)},\ldots,\mathbf{X}(L_{N})\in\mathbb{X}^{(N)}\}$ (where $\mathbb{X}^{(0)},\ldots,\mathbb{X}^{(N)}$ are $(a,b)$-agnostic superstates satisfying $\mathbb{X}^{(k)}\neq\mathbb{X}^{(k-1)}$ for all $k\in[N]$) that satisfy $F\subset\{\tilde{K}=L_{N}\}$, and we observe that $\{\tilde{K}=L_{N}\}=\cup_{F\in\mathcal{F}_{N}(\kappa)}F$.

We now examine $\text{Pr}(F)$ for an arbitrary $F=\{\mathbf{X}(0)\in\mathbb{X}^{(0)},\mathbf{X}(L_{1})\in\mathbb{X}^{(1)},\ldots,\mathbf{X}(L_{N})\in\mathbb{X}^{(N)}\}\in\mathcal{F}_{N}(\kappa)$. Pick any $k\in[N]$ and $\mathbf{x}\in\mathbb{X}^{(k-1)}$. Note that $F\subset\{\tilde{K}=L_{N}\}$ implies that $b\in\mathcal{S}(\mathbf{x})$. In view of our definition of $\mathbf{Q}$, this means that $\mathbf{Q}(\mathbf{x},\mathbf{x})=-\sum_{\mathbf{z}\in\mathbb{S}\setminus\{\mathbf{x}\}}\mathbf{Q}(\mathbf{x},\mathbf{z})$, which possibly depends on the disease states $\{x_{1},\ldots,x_{n}\}$ and on the edge states $\{1_{(c,d)}(\mathbf{x}):(c,d)\in[n]\times[n],\,d\in\mathcal{I}(\mathbf{x})\}$, does not depend on $1_{(a,b)}(\mathbf{x})$. We next observe that, by the definitions of $(a,b)$-agnostic superstates and jump times, none of the possible transitions from $\mathbb{X}^{(k-1)}$ to $\mathbb{X}^{(k)}$ involves an edge state update for $(a,b)$, which means that the values of both $X_{\langle a,b\rangle}=1_{(a,b)}(\mathbf{X})$ and $X_{n^{2}+\langle a,b\rangle}$ are preserved in such transitions. Therefore, for every $\mathbf{x}\in\mathbb{X}^{(k-1)}$, there exists at most one state $\mathbf{y}\in\mathbb{X}^{(k)}$ that potentially succeeds $\mathbf{x}$. For such a state $\mathbf{y}$, the transition rate $\mathbf{Q}(\mathbf{x},\mathbf{y})$, given by

\[
\mathbf{Q}(\mathbf{x},\mathbf{y})=\begin{cases}\sum_{q=1}^{m}\sum_{d\in\mathcal{I}_{q}(\mathbf{x})}B_{pq}1_{(c,d)}(\mathbf{x})\quad&\text{if }\exists\,c\in\mathcal{S}_{p}(\mathbf{x})\text{ such that }\mathbf{y}=\mathbf{x}_{\uparrow c},\\ \gamma_{p}\quad&\text{if }\exists\,c\in\mathcal{I}_{p}(\mathbf{x})\text{ such that }\mathbf{y}=\mathbf{x}_{\downarrow c},\\ \lambda\frac{\rho_{pq}}{n}\quad&\text{if }\exists\,(c,d)\in\mathcal{A}_{p}\times\mathcal{A}_{q}\setminus\{(a,b)\}\text{ such that }\mathbf{y}=\mathbf{x}_{\uparrow(c,d)},\\ \lambda\left(1-\frac{\rho_{pq}}{n}\right)\quad&\text{if }\exists\,(c,d)\in\mathcal{A}_{p}\times\mathcal{A}_{q}\setminus\{(a,b)\}\text{ such that }\mathbf{y}=\mathbf{x}_{\downarrow(c,d)},\end{cases}
\]

does not depend on $x_{\langle a,b\rangle}=y_{\langle a,b\rangle}$ or on $x_{n^{2}+\langle a,b\rangle}=y_{n^{2}+\langle a,b\rangle}$, because $b\notin\mathcal{I}(\mathbf{x})$ implies that $(a,b)\notin\cup_{p=1}^{m}\cup_{c\in\mathcal{S}_{p}(\mathbf{x})}\cup_{q=1}^{m}\{(c,d):d\in\mathcal{I}_{q}(\mathbf{x})\}$. It follows that the rate at which the Markov chain $\{\mathbf{X}(\tau)\}$ transitions from $\mathbf{x}$ to a state in $\mathbb{X}^{(k)}$, given by $\sum_{\mathbf{z}\in\mathbb{X}^{(k)}}\mathbf{Q}(\mathbf{x},\mathbf{z})=\mathbf{Q}(\mathbf{x},\mathbf{y})$, is time-invariant and takes the same value for every $\mathbf{x}\in\mathbb{X}^{(k-1)}$. This means that, as long as the Markov chain $\{\mathbf{X}(\tau):\tau\geq 0\}$ does not leave the $(a,b)$-agnostic superstate $\mathbb{X}^{(k-1)}$, the rate at which the chain transitions to $\mathbb{X}^{(k)}$ remains the same regardless of transitions within $\mathbb{X}^{(k-1)}$. We can express this formally as

\[
\lim_{\Delta\tau\rightarrow 0}\frac{\text{Pr}(\mathbf{X}(\tau+\Delta\tau)\in\mathbb{X}^{(k)}\mid\mathbf{X}(\tau)\in\mathbb{X}^{(k-1)},\mathbf{X}(\tau)=\mathbf{x})}{\Delta\tau}=\mathbf{Q}(\mathbb{X}^{(k-1)},\mathbb{X}^{(k)})
\]

for all $\tau\geq 0$ and $\mathbf{x}\in\mathbb{X}^{(k-1)}$, where

\[
\mathbf{Q}(\mathbb{X}^{(k-1)},\mathbb{X}^{(k)}):=\lim_{\Delta\tau\rightarrow 0}\frac{\text{Pr}(\mathbf{X}(\tau+\Delta\tau)\in\mathbb{X}^{(k)}\mid\mathbf{X}(\tau)\in\mathbb{X}^{(k-1)})}{\Delta\tau}
\]

denotes the time-invariant rate of transitioning from $\mathbb{X}^{(k-1)}$ to $\mathbb{X}^{(k)}$.

By Markovity, this implies that\footnote{In this paper, conditioning an event $H$ on $\{\mathbf{X}(\tau^{\prime}):0\leq\tau^{\prime}\leq\tau\}$ means conditioning $H$ on every $\mathbf{X}(\tau^{\prime})$ for $0\leq\tau^{\prime}\leq\tau$, i.e., conditioning $H$ on the trajectory traced by the Markov chain during the interval $[0,\tau]$ and not just on the random set of tuples $\{\mathbf{X}(\tau^{\prime}):0\leq\tau^{\prime}\leq\tau\}$. Conditioning on the set $\{\mathbf{X}(\tau^{\prime}):0\leq\tau^{\prime}\leq\tau\}$ is not sufficient because sets, by definition, are unordered.}

\[
\lim_{\Delta\tau\rightarrow 0}\frac{\text{Pr}(\mathbf{X}(\tau+\Delta\tau)\in\mathbb{X}^{(k)}\mid\mathbf{X}(\tau)=\mathbf{x},\{\mathbf{X}(\tau^{\prime}):0\leq\tau^{\prime}<\tau\})}{\Delta\tau}=\mathbf{Q}(\mathbb{X}^{(k-1)},\mathbb{X}^{(k)}),
\]

for all $\tau\geq 0$ and $\mathbf{x}\in\mathbb{X}^{(k-1)}$, which means that the conditional rate at which the chain leaves $\mathbb{X}^{(k-1)}$ at time $\tau$ is independent of the history $\{\mathbf{X}(\tau^{\prime}):\tau^{\prime}\in[0,\tau]\}$. Since $\{1_{(a,b)}(\tau^{\prime}):0\leq\tau^{\prime}\leq\tau\}$ are determined by $\{\mathbf{X}(\tau^{\prime}):0\leq\tau^{\prime}\leq\tau\}$, it follows that

\[
\lim_{\Delta\tau\rightarrow 0}\frac{\text{Pr}(\mathbf{X}(\tau+\Delta\tau)\in\mathbb{X}^{(k)}\mid\mathbf{X}(\tau)=\mathbf{x},\{1_{(a,b)}(\tau^{\prime}):0\leq\tau^{\prime}\leq\tau\})}{\Delta\tau}=\mathbf{Q}(\mathbb{X}^{(k-1)},\mathbb{X}^{(k)})
\]

for all $\tau\geq 0$ and $\mathbf{x}\in\mathbb{X}^{(k-1)}$. Now, let $T_{\uparrow}(\tau):=\inf\{\tau^{\prime}\geq 0:\mathbf{X}(\tau+\tau^{\prime})=(\mathbf{X}((\tau+\tau^{\prime})^{-}))_{\uparrow(a,b)}\}$ be the (random) time elapsed between time $\tau$ and the first of the updates of $1_{(a,b)}$ that occur after time $\tau$ and result in $1_{(a,b)}=1$. Then, the following holds for all $\mathbf{x}\in\mathbb{X}^{(k-1)}$, $\tau\geq 0$, $\sigma>0$ and sufficiently small $\Delta\tau>0$:

\begin{align*}
&\text{Pr}(T_{\uparrow}(\tau)\geq\sigma\mid\mathbf{X}(\tau)=\mathbf{x},\{1_{(a,b)}(\tau^{\prime}):0\leq\tau^{\prime}\leq\tau\},\mathbf{X}(\tau+\Delta\tau)\in\mathbb{X}^{(k)})\\
&=\text{Pr}(T_{\uparrow}(\tau+\Delta\tau)\geq\sigma-\Delta\tau\mid\mathbf{X}(\tau)=\mathbf{x},\{1_{(a,b)}(\tau^{\prime}):0\leq\tau^{\prime}\leq\tau\},\mathbf{X}(\tau+\Delta\tau)\in\mathbb{X}^{(k)},T_{\uparrow}(\tau)\geq\Delta\tau)\\
&\quad\cdot\text{Pr}(T_{\uparrow}(\tau)\geq\Delta\tau\mid\mathbf{X}(\tau)=\mathbf{x},\{1_{(a,b)}(\tau^{\prime}):0\leq\tau^{\prime}\leq\tau\},\mathbf{X}(\tau+\Delta\tau)\in\mathbb{X}^{(k)})\\
&=\text{Pr}(T_{\uparrow}(\tau+\Delta\tau)\geq\sigma-\Delta\tau\mid\mathbf{X}(\tau)=\mathbf{x},\{1_{(a,b)}(\tau^{\prime}):0\leq\tau^{\prime}\leq\tau\},\mathbf{X}(\tau+\Delta\tau)\in\mathbb{X}^{(k)},\\
&\qquad\qquad X_{n^{2}+\langle a,b\rangle}(\tau^{\prime})=X_{n^{2}+\langle a,b\rangle}(\tau)\,\forall\,\tau^{\prime}\in[\tau,\tau+\Delta\tau))\\
&\quad\cdot\text{Pr}(T_{\uparrow}(\tau)\geq\Delta\tau\mid\mathbf{X}(\tau)=\mathbf{x},\{1_{(a,b)}(\tau^{\prime}):0\leq\tau^{\prime}\leq\tau\},\mathbf{X}(\tau+\Delta\tau)\in\mathbb{X}^{(k)})\\
&\stackrel{(a)}{=}e^{-\lambda\frac{\rho_{ij}}{n}(\sigma-\Delta\tau)}\cdot\text{Pr}(T_{\uparrow}(\tau)\geq\Delta\tau\mid\mathbf{X}(\tau)=\mathbf{x},\{1_{(a,b)}(\tau^{\prime}):0\leq\tau^{\prime}\leq\tau\},\mathbf{X}(\tau+\Delta\tau)\in\mathbb{X}^{(k)})\\
&\stackrel{\Delta\tau\rightarrow 0}{\longrightarrow}e^{-\lambda\frac{\rho_{ij}}{n}\sigma}\cdot 1\\
&\stackrel{(b)}{=}\text{Pr}(T_{\uparrow}(\tau)\geq\sigma\mid\mathbf{X}(\tau)=\mathbf{x},\{1_{(a,b)}(\tau^{\prime}):0\leq\tau^{\prime}\leq\tau\}),
\end{align*}

where $(a)$ and $(b)$ follow from Markovity and the fact that $\mathbf{Q}(\mathbf{z},\mathbf{z}_{\uparrow(a,b)})=\lambda\frac{\rho_{ij}}{n}$ for all $\mathbf{z}\in\mathbb{S}$. Now, let $T^{(1)}_{\uparrow}(\tau):=T_{\uparrow}(\tau)$ and $T^{(\ell)}_{\uparrow}(\tau):=T_{\uparrow}(T^{(\ell-1)}_{\uparrow}(\tau))$ for all integers $\ell\geq 2$. Then, since $\{T_{\uparrow}^{(\ell)}\}_{\ell=1}^{\infty}$ are stopping times, similar arguments can be used to show the following for all $\sigma_{1},\sigma_{2},\ldots,\sigma_{\ell}\geq 0$ and all $\ell\in\mathbb{N}$:

limΔτ0\displaystyle\lim_{\Delta\tau\rightarrow 0} Pr(T(1)(τ)σ1,,T()(τ)σ𝐗(τ)=𝐱,{1(a,b)(τ):0ττ},𝐗(τ+Δτ)𝕏(k))\displaystyle\text{Pr}\left(T_{\uparrow}^{(1)}(\tau)\geq\sigma_{1},\ldots,T_{\uparrow}^{(\ell)}(\tau)\geq\sigma_{\ell}\mid\mathbf{X}(\tau)=\mathbf{x},\{1_{(a,b)}(\tau^{\prime}):0\leq\tau^{\prime}\leq\tau\},\mathbf{X}(\tau+\Delta\tau)\in\mathbb{X}^{(k)}\right)
=Pr(T(1)(τ)σ1,,T()(τ)σ𝐗(τ)=𝐱,{1(a,b)(τ):0ττ}).\displaystyle=\text{Pr}\left(T_{\uparrow}^{(1)}(\tau)\geq\sigma_{1},\ldots,T_{\uparrow}^{(\ell)}(\tau)\geq\sigma_{\ell}\mid\mathbf{X}(\tau)=\mathbf{x},\{1_{(a,b)}(\tau^{\prime}):0\leq\tau^{\prime}\leq\tau\}\right).

Similarly, if we let $T_{\downarrow}^{(1)}(\tau):=\inf\{\tau^{\prime}\geq 0:\mathbf{X}(\tau+\tau^{\prime})=(\mathbf{X}((\tau+\tau^{\prime})^{-}))_{\downarrow(a,b)}\}$ and $T^{(\ell)}_{\downarrow}(\tau):=T_{\downarrow}(T^{(\ell-1)}_{\downarrow}(\tau))$ for all integers $\ell\geq 2$, then we can show that for all $\sigma_{\uparrow 1},\ldots,\sigma_{\uparrow\ell},\sigma_{\downarrow 1},\ldots,\sigma_{\downarrow\ell^{\prime}}\geq 0$ and all $\ell,\ell^{\prime}\in\mathbb{N}$,

\displaystyle\lim_{\Delta\tau\rightarrow 0}\text{Pr}\Big(T_{\uparrow}^{(1)}(\tau)\geq\sigma_{\uparrow 1},\ldots,T_{\uparrow}^{(\ell)}(\tau)\geq\sigma_{\uparrow\ell},T_{\downarrow}^{(1)}(\tau)\geq\sigma_{\downarrow 1},\ldots,T_{\downarrow}^{(\ell^{\prime})}(\tau)\geq\sigma_{\downarrow\ell^{\prime}}
\displaystyle\quad\quad\quad\quad\mid\mathbf{X}(\tau)=\mathbf{x},\{1_{(a,b)}(\tau^{\prime}):0\leq\tau^{\prime}\leq\tau\},\mathbf{X}(\tau+\Delta\tau)\in\mathbb{X}^{(k)}\Big)
\displaystyle=\text{Pr}\Big(T_{\uparrow}^{(1)}(\tau)\geq\sigma_{\uparrow 1},\ldots,T_{\uparrow}^{(\ell)}(\tau)\geq\sigma_{\uparrow\ell},T_{\downarrow}^{(1)}(\tau)\geq\sigma_{\downarrow 1},\ldots,T_{\downarrow}^{(\ell^{\prime})}(\tau)\geq\sigma_{\downarrow\ell^{\prime}}
\displaystyle\quad\quad\quad\mid\mathbf{X}(\tau)=\mathbf{x},\{1_{(a,b)}(\tau^{\prime}):0\leq\tau^{\prime}\leq\tau\}\Big).

As a result, we have the following for all $\sigma_{\uparrow 1},\ldots,\sigma_{\uparrow\ell},\sigma_{\downarrow 1},\ldots,\sigma_{\downarrow\ell^{\prime}}\geq 0$ and all $\ell,\ell^{\prime}\in\mathbb{N}$:

\displaystyle\frac{\text{Pr}\Big(\mathbf{X}(\tau+\Delta\tau)\in\mathbb{X}^{(k)}\mid\mathbf{X}(\tau)=\mathbf{x},\{1_{(a,b)}(\tau^{\prime})\}_{\tau^{\prime}\in[0,\tau]},\{T_{\uparrow}^{(\xi)}(\tau)\geq\sigma_{\uparrow\xi}\}_{\xi=1}^{\ell},\{T_{\downarrow}^{(\xi)}(\tau)\geq\sigma_{\downarrow\xi}\}_{\xi=1}^{\ell^{\prime}}\Big)}{\Delta\tau}
\displaystyle\stackrel{(a)}{=}\frac{\text{Pr}\Big(\{T_{\uparrow}^{(\xi)}(\tau)\geq\sigma_{\uparrow\xi}\}_{\xi=1}^{\ell},\{T_{\downarrow}^{(\xi)}(\tau)\geq\sigma_{\downarrow\xi}\}_{\xi=1}^{\ell^{\prime}}\mid\mathbf{X}(\tau)=\mathbf{x},\{1_{(a,b)}(\tau^{\prime})\}_{\tau^{\prime}\in[0,\tau]},\mathbf{X}(\tau+\Delta\tau)\in\mathbb{X}^{(k)}\Big)}{\text{Pr}\Big(\{T_{\uparrow}^{(\xi)}(\tau)\geq\sigma_{\uparrow\xi}\}_{\xi=1}^{\ell},\{T_{\downarrow}^{(\xi)}(\tau)\geq\sigma_{\downarrow\xi}\}_{\xi=1}^{\ell^{\prime}}\mid\mathbf{X}(\tau)=\mathbf{x},\{1_{(a,b)}(\tau^{\prime})\}_{\tau^{\prime}\in[0,\tau]}\Big)}
\displaystyle\quad\times\frac{\text{Pr}\Big(\mathbf{X}(\tau+\Delta\tau)\in\mathbb{X}^{(k)}\mid\mathbf{X}(\tau)=\mathbf{x},\{1_{(a,b)}(\tau^{\prime})\}_{\tau^{\prime}\in[0,\tau]}\Big)}{\Delta\tau}
\displaystyle\stackrel{\Delta\tau\rightarrow 0}{\longrightarrow}1\times\mathbf{Q}(\mathbb{X}^{(k-1)},\mathbb{X}^{(k)}),

i.e.,

\displaystyle\lim_{\Delta\tau\rightarrow 0}\frac{\text{Pr}\Big(\mathbf{X}(\tau+\Delta\tau)\in\mathbb{X}^{(k)}\mid\mathbf{X}(\tau)=\mathbf{x},\{1_{(a,b)}(\tau^{\prime})\}_{\tau^{\prime}\in[0,\tau]},\{T_{\uparrow}^{(\xi)}(\tau)\}_{\xi=1}^{\ell},\{T_{\downarrow}^{(\xi)}(\tau)\}_{\xi=1}^{\ell^{\prime}}\Big)}{\Delta\tau}
\displaystyle=\mathbf{Q}(\mathbb{X}^{(k-1)},\mathbb{X}^{(k)})

for all $\ell,\ell^{\prime}\in\mathbb{N}$. Now, observe that if we are given $\{1_{(a,b)}(\tau^{\prime}):0\leq\tau^{\prime}\leq\tau\}$, then $\{1_{(a,b)}(\tau^{\prime}):\tau\leq\tau^{\prime}\leq t\}$ is determined by a subset of the random variables $\{T_{\uparrow}^{(\ell)}\}_{\ell=1}^{\infty}\cup\{T_{\downarrow}^{(\ell)}\}_{\ell=1}^{\infty}$, and this subset is random but almost surely finite. Hence, the above limit implies the following for all $\mathbf{x}\in\mathbb{X}^{(k-1)}$ and $\tau\geq 0$:

\displaystyle\lim_{\Delta\tau\rightarrow 0}\frac{\text{Pr}\left(\mathbf{X}(\tau+\Delta\tau)\in\mathbb{X}^{(k)}\mid\mathbf{X}(\tau)=\mathbf{x},\{1_{(a,b)}(\tau^{\prime}):0\leq\tau^{\prime}\leq t\}\right)}{\Delta\tau}=\mathbf{Q}(\mathbb{X}^{(k-1)},\mathbb{X}^{(k)}).

Moreover, since the above arguments remain valid if we replace $\mathbb{X}^{(k)}$ with an arbitrary $(a,b)$-agnostic superstate $\mathbb{Y}\neq\mathbb{X}^{(k-1)}$, we can generalize the above to

\displaystyle\lim_{\Delta\tau\rightarrow 0}\frac{\text{Pr}\left(\mathbf{X}(\tau+\Delta\tau)\in\mathbb{Y}\mid\mathbf{X}(\tau)=\mathbf{x},\{1_{(a,b)}(\tau^{\prime}):0\leq\tau^{\prime}\leq t\}\right)}{\Delta\tau}=\mathbf{Q}(\mathbb{X}^{(k-1)},\mathbb{Y})

for all $(a,b)$-agnostic superstates $\mathbb{Y}\neq\mathbb{X}^{(k-1)}$. It follows that

\displaystyle\lim_{\Delta\tau\rightarrow 0}\frac{\text{Pr}\left(\mathbf{X}(\tau+\Delta\tau)\notin\mathbb{X}^{(k-1)}\mid\mathbf{X}(\tau)=\mathbf{x},\{1_{(a,b)}(\tau^{\prime}):0\leq\tau^{\prime}\leq t\}\right)}{\Delta\tau}=\sum_{\mathbb{Y}\neq\mathbb{X}^{(k-1)}}\mathbf{Q}(\mathbb{X}^{(k-1)},\mathbb{Y}) (70)

for all $\mathbf{x}\in\mathbb{X}^{(k-1)}$ and all $\tau\geq 0$. This means that, given $\{\mathbf{X}(L_{k-1})=\mathbf{x}\}$ for some $\mathbf{x}\in\mathbb{X}^{(k-1)}$, the random quantity $L_{k}-L_{k-1}$, which is the duration of time spent by the Markov chain in $\mathbb{X}^{(k-1)}$, is conditionally exponentially distributed with rate $\sum_{\mathbb{Y}\neq\mathbb{X}^{(k-1)}}\mathbf{Q}(\mathbb{X}^{(k-1)},\mathbb{Y})$ and is conditionally independent of $\{1_{(a,b)}(\tau^{\prime}):0\leq\tau^{\prime}\leq t\}$. Besides, the above deductions also imply the following: given $\{1_{(a,b)}(\tau^{\prime}):0\leq\tau^{\prime}\leq t\}$ and given that the chain exits $\mathbb{X}^{(k-1)}$ from state $\mathbf{x}$ at time $\tau\geq 0$, the conditional probability that it enters $\mathbb{X}^{(k)}$ at time $\tau$ is

\displaystyle\text{Pr}(\mathbf{X}(\tau)\in\mathbb{X}^{(k)}\mid\mathbf{X}(\tau^{-})=\mathbf{x},\{1_{(a,b)}(\tau^{\prime}):0\leq\tau^{\prime}\leq t\},\mathbf{X}(\tau)\notin\mathbb{X}^{(k-1)})
\displaystyle=\lim_{\Delta\tau\rightarrow 0}\text{Pr}(\mathbf{X}(\tau)\in\mathbb{X}^{(k)}\mid\mathbf{X}(\tau-\Delta\tau)=\mathbf{x},\{1_{(a,b)}(\tau^{\prime}):0\leq\tau^{\prime}\leq t\},\mathbf{X}(\tau)\notin\mathbb{X}^{(k-1)})
\displaystyle=\lim_{\Delta\tau\rightarrow 0}\frac{\text{Pr}(\mathbf{X}(\tau)\in\mathbb{X}^{(k)}\mid\mathbf{X}(\tau-\Delta\tau)=\mathbf{x},\{1_{(a,b)}(\tau^{\prime}):0\leq\tau^{\prime}\leq t\})}{\text{Pr}(\mathbf{X}(\tau)\notin\mathbb{X}^{(k-1)}\mid\mathbf{X}(\tau-\Delta\tau)=\mathbf{x},\{1_{(a,b)}(\tau^{\prime}):0\leq\tau^{\prime}\leq t\})}
\displaystyle=\lim_{\Delta\tau\rightarrow 0}\frac{\text{Pr}(\mathbf{X}(\tau)\in\mathbb{X}^{(k)}\mid\mathbf{X}(\tau-\Delta\tau)=\mathbf{x},\{1_{(a,b)}(\tau^{\prime}):0\leq\tau^{\prime}\leq t\})}{\Delta\tau}
\displaystyle\quad\times\lim_{\Delta\tau\rightarrow 0}\left(\frac{\text{Pr}(\mathbf{X}(\tau)\notin\mathbb{X}^{(k-1)}\mid\mathbf{X}(\tau-\Delta\tau)=\mathbf{x},\{1_{(a,b)}(\tau^{\prime}):0\leq\tau^{\prime}\leq t\})}{\Delta\tau}\right)^{-1}
\displaystyle=\frac{\mathbf{Q}(\mathbb{X}^{(k-1)},\mathbb{X}^{(k)})}{\sum_{\mathbb{Y}\neq\mathbb{X}^{(k-1)}}\mathbf{Q}(\mathbb{X}^{(k-1)},\mathbb{Y})}.

By invoking Markovity in the preceding arguments, the above can be generalized to

\displaystyle\text{Pr}(\mathbf{X}(\tau)\in\mathbb{X}^{(k)}\mid\mathbf{X}(\tau^{-})=\mathbf{x},\{\mathbf{X}(\tau^{\prime})\}_{\tau^{\prime}\in[0,\tau)},\{1_{(a,b)}(\tau^{\prime})\}_{\tau^{\prime}\in[0,t]},\mathbf{X}(\tau)\notin\mathbb{X}^{(k-1)})=\frac{\mathbf{Q}(\mathbb{X}^{(k-1)},\mathbb{X}^{(k)})}{\sum_{\mathbb{Y}\neq\mathbb{X}^{(k-1)}}\mathbf{Q}(\mathbb{X}^{(k-1)},\mathbb{Y})}, (71)

which implies that

\displaystyle\text{Pr}(\mathbf{X}(\tau)\in\mathbb{X}^{(k)}\mid\mathbf{X}(\tau^{\prime})=\mathbf{x}\,\forall\,\tau^{\prime}\in[L_{k-1},\tau),\{\mathbf{X}(\tau^{\prime})\}_{\tau^{\prime}\in[0,L_{k-1})},\{1_{(a,b)}(\tau^{\prime})\}_{\tau^{\prime}\in[0,t]},\mathbf{X}(\tau)\notin\mathbb{X}^{(k-1)})
\displaystyle=\frac{\mathbf{Q}(\mathbb{X}^{(k-1)},\mathbb{X}^{(k)})}{\sum_{\mathbb{Y}\neq\mathbb{X}^{(k-1)}}\mathbf{Q}(\mathbb{X}^{(k-1)},\mathbb{Y})}.

Equivalently,

\displaystyle\text{Pr}(\mathbf{X}(L_{k})\in\mathbb{X}^{(k)}\mid\mathbf{X}(L_{k-1})=\mathbf{x},\{\mathbf{X}(\tau^{\prime})\}_{\tau^{\prime}\in[0,L_{k-1})},\{1_{(a,b)}(\tau^{\prime})\}_{\tau^{\prime}\in[0,t]},L_{k}=\tau)=\frac{\mathbf{Q}(\mathbb{X}^{(k-1)},\mathbb{X}^{(k)})}{\sum_{\mathbb{Y}\neq\mathbb{X}^{(k-1)}}\mathbf{Q}(\mathbb{X}^{(k-1)},\mathbb{Y})}

for all $\tau>0$ and $\mathbf{x}\in\mathbb{X}^{(k-1)}$. Hence,

\displaystyle\text{Pr}(\mathbf{X}(L_{k})\in\mathbb{X}^{(k)}\mid\mathbf{X}(L_{k-1})\in\mathbb{X}^{(k-1)},\{\mathbf{X}(\tau^{\prime})\}_{\tau^{\prime}\in[0,L_{k-1})},\{1_{(a,b)}(\tau^{\prime})\}_{\tau^{\prime}\in[0,t]},L_{k})=\frac{\mathbf{Q}(\mathbb{X}^{(k-1)},\mathbb{X}^{(k)})}{\sum_{\mathbb{Y}\neq\mathbb{X}^{(k-1)}}\mathbf{Q}(\mathbb{X}^{(k-1)},\mathbb{Y})}. (72)

Since the entire analysis above holds for all $k\in[N]$, we have the following for all $\sigma_{1},\sigma_{2},\ldots,\sigma_{N}\geq 0$:

\displaystyle\text{Pr}\left(\left(\cap_{k=1}^{N}\{L_{k}-L_{k-1}\geq\sigma_{k}\}\right)\cap\left(\cap_{k=0}^{N}\{\mathbf{X}(L_{k})\in\mathbb{X}^{(k)}\}\right)\mid\mathbf{X}(0)\in\mathbb{X}^{(0)},\{1_{(a,b)}(\tau^{\prime})\}_{\tau^{\prime}\in[0,t]}\right) (73)
\displaystyle=\prod_{k=1}^{N}\text{Pr}(\mathbf{X}(L_{k})\in\mathbb{X}^{(k)},L_{k}-L_{k-1}\geq\sigma_{k}\mid\{\mathbf{X}(L_{\xi})\in\mathbb{X}^{(\xi)},L_{\xi}-L_{\xi-1}\geq\sigma_{\xi}\}_{\xi=0}^{k-1},\{1_{(a,b)}(\tau^{\prime})\}_{\tau^{\prime}\in[0,t]}) (74)
\displaystyle=\prod_{k=1}^{N}\text{Pr}(L_{k}-L_{k-1}\geq\sigma_{k}\mid\{\mathbf{X}(L_{\xi})\in\mathbb{X}^{(\xi)},L_{\xi}-L_{\xi-1}\geq\sigma_{\xi}\}_{\xi=0}^{k-1},\{1_{(a,b)}(\tau^{\prime})\}_{\tau^{\prime}\in[0,t]}) (75)
\displaystyle\quad\times\prod_{k=1}^{N}\text{Pr}(\mathbf{X}(L_{k})\in\mathbb{X}^{(k)}\mid L_{k}-L_{k-1}\geq\sigma_{k},\{\mathbf{X}(L_{\xi})\in\mathbb{X}^{(\xi)},L_{\xi}-L_{\xi-1}\geq\sigma_{\xi}\}_{\xi=0}^{k-1},\{1_{(a,b)}(\tau^{\prime})\}_{\tau^{\prime}\in[0,t]}) (76)
\displaystyle\stackrel{(a)}{=}\prod_{k=1}^{N}\text{Pr}(L_{k}-L_{k-1}\geq\sigma_{k}\mid\mathbf{X}(L_{k-1})\in\mathbb{X}^{(k-1)},\{1_{(a,b)}(\tau^{\prime})\}_{\tau^{\prime}\in[0,t]}) (77)
\displaystyle\quad\times\prod_{k=1}^{N}\text{Pr}(\mathbf{X}(L_{k})\in\mathbb{X}^{(k)}\mid L_{k}-L_{k-1}\geq\sigma_{k},\{\mathbf{X}(L_{\xi})\in\mathbb{X}^{(\xi)},L_{\xi}-L_{\xi-1}\geq\sigma_{\xi}\}_{\xi=0}^{k-1},\{1_{(a,b)}(\tau^{\prime})\}_{\tau^{\prime}\in[0,t]}) (78)
\displaystyle\stackrel{(b)}{=}\prod_{k=1}^{N}\exp\left(-\sigma_{k}\sum_{\mathbb{Y}\neq\mathbb{X}^{(k-1)}}\mathbf{Q}(\mathbb{X}^{(k-1)},\mathbb{Y})\right)\times\prod_{k=1}^{N}\frac{\mathbf{Q}(\mathbb{X}^{(k-1)},\mathbb{X}^{(k)})}{\sum_{\mathbb{Y}\neq\mathbb{X}^{(k-1)}}\mathbf{Q}(\mathbb{X}^{(k-1)},\mathbb{Y})}, (79)

where $(a)$ is a consequence of the strong Markov property and the fact that $\{L_{k}\}_{k=1}^{N}$ are stopping times, and $(b)$ follows from (70) and (72). Since $\sigma_{1},\ldots,\sigma_{N}$ are arbitrary and since the above expression is independent of $\{1_{(a,b)}(\tau):0\leq\tau\leq t\}$, we have shown that for the event

\displaystyle F\cap\{L_{N}\geq\kappa\}=\left\{\mathbf{X}(0)\in\mathbb{X}^{(0)},\ldots,\mathbf{X}(L_{N})\in\mathbb{X}^{(N)},\sum_{k=1}^{N}(L_{k}-L_{k-1})\geq\kappa\right\},

we have $\text{Pr}(F\cap\{L_{N}\geq\kappa\}\mid\{1_{(a,b)}(\tau):0\leq\tau\leq t\})=\text{Pr}(F\cap\{L_{N}\geq\kappa\})$. As a result,

\displaystyle\text{Pr}(\{\tilde{K}=L_{N}\}\cap\{L_{N}\geq\kappa\}\mid\{1_{(a,b)}(\tau):0\leq\tau\leq t\})
\displaystyle=\text{Pr}\left(\cup_{F\in\mathcal{F}_{N}(\kappa)}(F\cap\{L_{N}\geq\kappa\})\mid\{1_{(a,b)}(\tau):0\leq\tau\leq t\}\right)
\displaystyle\stackrel{(a)}{=}\sum_{F\in\mathcal{F}_{N}(\kappa)}\text{Pr}\left(F\cap\{L_{N}\geq\kappa\}\mid\{1_{(a,b)}(\tau):0\leq\tau\leq t\}\right)
\displaystyle=\sum_{F\in\mathcal{F}_{N}(\kappa)}\text{Pr}(F\cap\{L_{N}\geq\kappa\})
\displaystyle\stackrel{(b)}{=}\text{Pr}\left(\cup_{F\in\mathcal{F}_{N}(\kappa)}(F\cap\{L_{N}\geq\kappa\})\right)
\displaystyle=\text{Pr}(\{\tilde{K}=L_{N}\}\cap\{L_{N}\geq\kappa\}),

where $(a)$ and $(b)$ hold because the definition of $\mathcal{F}_{N}(\kappa)$ implies that $\mathcal{F}_{N}(\kappa)$ is a collection of disjoint events. Since $\{\tilde{K}\geq\kappa\}=\cup_{N=0}^{\infty}\left(\{\tilde{K}=L_{N}\}\cap\{L_{N}\geq\kappa\}\right)$ and since $\{\tilde{K}=L_{1}\}\cap\{L_{1}\geq\kappa\},\{\tilde{K}=L_{2}\}\cap\{L_{2}\geq\kappa\},\ldots$ are disjoint events, it follows that $\text{Pr}(\tilde{K}\geq\kappa\mid\{1_{(a,b)}(\tau):0\leq\tau\leq t\})=\text{Pr}(\tilde{K}\geq\kappa)$. Moreover, since $\kappa\geq 0$ is arbitrary, this means that $\tilde{K}$ is independent of $\{1_{(a,b)}(\tau):0\leq\tau\leq t\}$. Finally, since $K$ and $(T,1_{(a,b)}(t))$ are functions of $\tilde{K}$ and $\{1_{(a,b)}(\tau):0\leq\tau\leq t\}$, respectively, it follows that $K$ and $(T,1_{(a,b)}(t))$ are independent.
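
The factorization in (79) can be illustrated numerically. The following Python sketch evaluates the right-hand side of (79) for a given sequence of superstates. The dictionary encoding of $\mathbf{Q}$, the function name `path_probability`, and the three-superstate example with its rates are illustrative assumptions and do not come from the paper.

```python
import math

def path_probability(Q, path, sigmas):
    """Evaluate the right-hand side of (79) for one sequence of superstates.

    Q      : dict mapping (source, target) superstate labels to transition rates
             (hypothetical encoding of the superstate rate matrix)
    path   : list of superstate labels [X0, X1, ..., XN]
    sigmas : list of N non-negative lower bounds on the holding times
    """
    prob = 1.0
    for k in range(1, len(path)):
        src, dst = path[k - 1], path[k]
        # Total exit rate of the superstate occupied before the k-th jump.
        exit_rate = sum(rate for (s, d), rate in Q.items() if s == src and d != src)
        # Exponential tail of the holding time, times the jump probability.
        prob *= math.exp(-sigmas[k - 1] * exit_rate) * Q[(src, dst)] / exit_rate
    return prob

# Hypothetical example: three superstates A, B, C with made-up rates.
Q_demo = {("A", "B"): 1.0, ("A", "C"): 3.0, ("B", "C"): 2.0}
p_demo = path_probability(Q_demo, ["A", "B", "C"], [0.5, 0.25])
print(p_demo)
```

The function takes no edge-state history as an argument, which mirrors the observation that the expression in (79) does not depend on $\{1_{(a,b)}(\tau):0\leq\tau\leq t\}$.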

Remark 3.

Observe that in the proof of Lemma 5, (73) implies that the event $\{L_{N}\geq\kappa\}\cap\left(\cap_{k=0}^{N}\{\mathbf{X}(L_{k})\in\mathbb{X}^{(k)}\}\right)$ is independent of $\{1_{(a,b)}(\tau):0\leq\tau\leq t\}$ (since $L_{N}=\sum_{k=1}^{N}(L_{k}-L_{k-1})$ and since the initial state $\mathbf{X}(0)$ is assumed to be non-random). Note that this is true for all the choices of $(a,b)$-agnostic superstates $\{\mathbb{X}^{(k)}\}_{k=0}^{N}$ that satisfy $\cap_{k=0}^{N}\{\mathbf{X}(L_{k})\in\mathbb{X}^{(k)}\}\subset\{\tilde{K}=L_{N}\}$ and hence also for all $\{\mathbb{X}^{(k)}\}_{k=0}^{N}$ that satisfy $\cap_{k=0}^{N}\{\mathbf{X}(L_{k})\in\mathbb{X}^{(k)}\}\subset\{\tilde{K}=L_{N}\}\cap\{\mathbf{X}(\tilde{K})\in\mathbb{Y}\}$, where $\mathbb{Y}$ is an arbitrary $(a,b)$-agnostic superstate. Now, if we denote by $\mathcal{X}$ the set of all $\{\mathbb{X}^{(k)}\}_{k=0}^{N}$ satisfying $\cap_{k=0}^{N}\{\mathbf{X}(L_{k})\in\mathbb{X}^{(k)}\}\subset\{\tilde{K}=L_{N}\}\cap\{\mathbf{X}(\tilde{K})\in\mathbb{Y}\}$, then we have

\displaystyle\cup_{\{\mathbb{X}^{(k)}\}_{k=0}^{N}\in\mathcal{X}}\left(\cap_{k=0}^{N}\{\mathbf{X}(L_{k})\in\mathbb{X}^{(k)}\}\right)=\{\tilde{K}=L_{N}\}\cap\{\mathbf{X}(\tilde{K})\in\mathbb{Y}\}.

Then, by the preceding arguments we have

\displaystyle\text{Pr}(\{\tilde{K}=L_{N}\}\cap\{\mathbf{X}(\tilde{K})\in\mathbb{Y}\}\cap\{L_{N}\geq\kappa\}\mid\{1_{(a,b)}(\tau):0\leq\tau\leq t\}) (80)
\displaystyle=\text{Pr}\left(\cup_{\{\mathbb{X}^{(k)}\}_{k=0}^{N}\in\mathcal{X}}\left(\cap_{k=0}^{N}\{\mathbf{X}(L_{k})\in\mathbb{X}^{(k)}\}\right)\cap\{L_{N}\geq\kappa\}\mid\{1_{(a,b)}(\tau):0\leq\tau\leq t\}\right) (81)
\displaystyle=\sum_{\{\mathbb{X}^{(k)}\}_{k=0}^{N}\in\mathcal{X}}\text{Pr}\left(\cap_{k=0}^{N}\{\mathbf{X}(L_{k})\in\mathbb{X}^{(k)}\}\cap\{L_{N}\geq\kappa\}\mid\{1_{(a,b)}(\tau):0\leq\tau\leq t\}\right) (82)
\displaystyle=\sum_{\{\mathbb{X}^{(k)}\}_{k=0}^{N}\in\mathcal{X}}\text{Pr}\left(\cap_{k=0}^{N}\{\mathbf{X}(L_{k})\in\mathbb{X}^{(k)}\}\cap\{L_{N}\geq\kappa\}\right) (83)
\displaystyle=\text{Pr}\left(\cup_{\{\mathbb{X}^{(k)}\}_{k=0}^{N}\in\mathcal{X}}\left(\cap_{k=0}^{N}\{\mathbf{X}(L_{k})\in\mathbb{X}^{(k)}\}\cap\{L_{N}\geq\kappa\}\right)\right) (84)
\displaystyle=\text{Pr}(\{\tilde{K}=L_{N}\}\cap\{\mathbf{X}(\tilde{K})\in\mathbb{Y}\}\cap\{L_{N}\geq\kappa\}), (85)

which shows that $\{\tilde{K}=L_{N}\}\cap\{\mathbf{X}(\tilde{K})\in\mathbb{Y}\}\cap\{L_{N}\geq\kappa\}$ is independent of $\{1_{(a,b)}(\tau):0\leq\tau\leq t\}$. Since $\{\tilde{K}\geq\kappa\}\cap\{\mathbf{X}(\tilde{K})\in\mathbb{Y}\}=\cup_{N=0}^{\infty}\left(\{\tilde{K}=L_{N}\}\cap\{\mathbf{X}(\tilde{K})\in\mathbb{Y}\}\cap\{L_{N}\geq\kappa\}\right)$, it follows that $\{\tilde{K}\geq\kappa\}\cap\{\mathbf{X}(\tilde{K})\in\mathbb{Y}\}$ is independent of $\{1_{(a,b)}(\tau):0\leq\tau\leq t\}$. As a consequence of this observation, the fact that $\mathbb{Y}$ is an arbitrary $(a,b)$-agnostic superstate, and the fact that $\kappa$ is an arbitrary non-negative number, we have that $(\mathbb{X}(\tilde{K}),\tilde{K})$ is independent of $\{1_{(a,b)}(\tau):0\leq\tau\leq t\}$, where $\mathbb{X}(\tilde{K})$ denotes the $(a,b)$-agnostic superstate of the chain at time $\tilde{K}$.

In order to state the remaining lemmas, we need to introduce some additional notation. For two nodes $(a,b)\in[n]\times[n]$, we let $b\stackrel{t}{\rightsquigarrow}a$ denote the event that $b$ transmits pathogens to $a$ at time $t$. For a given time interval $[t,t+\Delta t)\subset[0,\infty)$, we let $\left\{b\stackrel{t,\Delta t}{\rightsquigarrow}a\right\}:=\cup_{\tau\in[t,t+\Delta t)}\{b\stackrel{\tau}{\rightsquigarrow}a\}$. The complement of this event is denoted by $\left\{b\stackrel{t,\Delta t}{\centernot\rightsquigarrow}a\right\}$. For two given node sets $A,B\subset[n]$, we use $\{B\stackrel{t}{\rightsquigarrow}A\}$ to denote the event that some node(s) of $B$ infect(s) one or more nodes in $A$ at time $t$.

We now provide a sequence of lemmas that we later use to prove Proposition 2.

Lemma 6.

Suppose $a\in\mathcal{A}_{i}$, $b\in\mathcal{A}_{j}$, $y\in\{0,1\}$, and $t_{1},t_{2}\in[0,\infty)$ are such that $t_{1}<t_{2}$. Given that $b\in\mathcal{I}_{j}(t_{1}):=\mathcal{I}_{j}(\mathbf{X}(t_{1}))$ and that $1_{(a,b)}(\tau):=1_{(a,b)}(\mathbf{X}(\tau))=y$ for all $\tau\in[t_{1},t_{2})$, the conditional probability that $b$ neither recovers nor infects $a$ during the interval $[t_{1},t_{2})$ is $e^{-(B_{ij}\delta_{1y}+\gamma_{j})(t_{2}-t_{1})}$, where $\delta_{1y}$ is the Kronecker delta (equal to $1$ if $y=1$ and $0$ otherwise).

Proof.

Let $\mathbb{X}:=\{\mathbf{x}\in\mathbb{S}:b\in\mathcal{I}_{j}(\mathbf{x}),1_{(a,b)}(\mathbf{x})=y\}$. Also, let $\Delta t>0$. Since the rate of infection transmission from $b$ to $a$ at time $t_{1}$ is $B_{ij}1_{(a,b)}(\mathbf{X}(t_{1}))$, we have the following for all $\mathbf{x}\in\mathbb{X}$:

\displaystyle\text{Pr}\left(b\stackrel{t_{1},\Delta t}{\rightsquigarrow}a\Bigm|\mathbf{X}(t_{1})=\mathbf{x}\right)=B_{ij}\delta_{1y}\Delta t+o(\Delta t).

On the other hand, denoting the event that $b$ recovers during $[t_{1},t_{1}+\Delta t)$ by $D_{b}$, we have

\displaystyle\text{Pr}(D_{b}\mid\mathbf{X}(t_{1})=\mathbf{x})=\gamma_{j}\Delta t+o(\Delta t).

Similarly, if we let $F_{(a,b)}$ denote the event that the edge state $1_{(a,b)}$ flips (i.e., changes from $y$ to $1-y$) during $[t_{1},t_{1}+\Delta t)$, we have $\text{Pr}(F_{(a,b)}\mid\mathbf{X}(t_{1})=\mathbf{x})=\lambda\left(y\left(1-\frac{\rho_{ij}}{n}\right)+(1-y)\frac{\rho_{ij}}{n}\right)\Delta t+o(\Delta t)$. As a result, we have

\displaystyle\text{Pr}\left(\left\{b\stackrel{t_{1},\Delta t}{\centernot\rightsquigarrow}a\right\}\cap\bar{D}_{b}\cap\bar{F}_{(a,b)}\Bigm|\mathbf{X}(t_{1})=\mathbf{x}\right)
\displaystyle=1-\text{Pr}\left(\left\{b\stackrel{t_{1},\Delta t}{\rightsquigarrow}a\right\}\cup D_{b}\cup F_{(a,b)}\Bigm|\mathbf{X}(t_{1})=\mathbf{x}\right)
\displaystyle\stackrel{(a)}{=}1-\text{Pr}\left(b\stackrel{t_{1},\Delta t}{\rightsquigarrow}a\Bigm|\mathbf{X}(t_{1})=\mathbf{x}\right)-\text{Pr}(D_{b}\mid\mathbf{X}(t_{1})=\mathbf{x})-\text{Pr}(F_{(a,b)}\mid\mathbf{X}(t_{1})=\mathbf{x})+o(\Delta t)
\displaystyle=1-(B_{ij}\delta_{1y}\Delta t+o(\Delta t))-(\gamma_{j}\Delta t+o(\Delta t))-\left(\lambda\left(y\left(1-\frac{\rho_{ij}}{n}\right)+(1-y)\frac{\rho_{ij}}{n}\right)\Delta t+o(\Delta t)\right)+o(\Delta t)
\displaystyle=1-\left(B_{ij}\delta_{1y}+\gamma_{j}+\lambda\left(y\left(1-\frac{\rho_{ij}}{n}\right)+(1-y)\frac{\rho_{ij}}{n}\right)\right)\Delta t+o(\Delta t),

where $(a)$ follows from Lemma 1 and the inclusion-exclusion principle. Since this holds for all $\mathbf{x}\in\mathbb{X}$, the above implies that

\displaystyle\text{Pr}\left(\left\{b\stackrel{t_{1},\Delta t}{\centernot\rightsquigarrow}a\right\}\cap\bar{D}_{b}\cap\bar{F}_{(a,b)}\Bigm|\mathbf{X}(t_{1})\in\mathbb{X}\right)
\displaystyle=1-\left(B_{ij}\delta_{1y}+\gamma_{j}+\lambda\left(y\left(1-\frac{\rho_{ij}}{n}\right)+(1-y)\frac{\rho_{ij}}{n}\right)\right)\Delta t+o(\Delta t).

Now, consider any $\ell\in\mathbb{N}_{0}$. By replacing $t_{1}$ with $t_{1}+\ell\Delta t$ in the above relation, we obtain

\displaystyle\text{Pr}\left(\left\{b\stackrel{t_{1}+\ell\Delta t,\Delta t}{\centernot\rightsquigarrow}a\right\}\cap\bar{D}_{b}^{(\ell)}\cap\bar{F}^{(\ell)}_{(a,b)}\Bigm|\mathbf{X}(t_{1}+\ell\Delta t)\in\mathbb{X}\right)
\displaystyle=1-\left(B_{ij}\delta_{1y}+\gamma_{j}+\lambda\left(y\left(1-\frac{\rho_{ij}}{n}\right)+(1-y)\frac{\rho_{ij}}{n}\right)\right)\Delta t+o(\Delta t),

where $D_{b}^{(\ell)}$ is the event that $b$ recovers during $[t_{1}+\ell\Delta t,t_{1}+(\ell+1)\Delta t)$ and $F_{(a,b)}^{(\ell)}$ is the event that $1_{(a,b)}$ flips during $[t_{1}+\ell\Delta t,t_{1}+(\ell+1)\Delta t)$. Therefore, on setting $\Delta t=\frac{t_{2}-t_{1}}{N}$ for an arbitrary $N\in\mathbb{N}$, it follows that

\displaystyle\text{Pr}\left(\{b\in\mathcal{I}(t)\}\cap\left\{b\stackrel{t_{1},t_{2}-t_{1}}{\centernot\rightsquigarrow}a\right\}\cap\{1_{(a,b)}(\tau)=y\,\forall\,\tau\in[t_{1},t_{2})\}\Bigm|\mathbf{X}(t_{1})=\mathbf{x}\right) (86)
\displaystyle=\prod_{\ell=1}^{N-1}\text{Pr}\left(\left\{b\stackrel{t_{1}+\ell\Delta t,\Delta t}{\centernot\rightsquigarrow}a\right\}\cap\bar{D}_{b}^{(\ell)}\cap\bar{F}_{(a,b)}^{(\ell)}\Bigm|b\stackrel{t_{1},\ell\Delta t}{\centernot\rightsquigarrow}a,\bar{D}_{b},\bar{D}_{b}^{(1)},\ldots,\bar{D}_{b}^{(\ell-1)},\bar{F}_{(a,b)},\ldots,\bar{F}_{(a,b)}^{(\ell-1)},\mathbf{X}(t_{1})=\mathbf{x}\right) (87)
\displaystyle\quad\quad\times\text{Pr}\left(\left\{b\stackrel{t_{1},\Delta t}{\centernot\rightsquigarrow}a\right\}\cap\bar{D}_{b}\cap\bar{F}_{(a,b)}\Bigm|\mathbf{X}(t_{1})=\mathbf{x}\right) (88)
\displaystyle=\prod_{\ell=1}^{N-1}\text{Pr}\left(\left\{b\stackrel{t_{1}+\ell\Delta t,\Delta t}{\centernot\rightsquigarrow}a\right\}\cap\bar{D}_{b}^{(\ell)}\cap\bar{F}_{(a,b)}^{(\ell)}\Bigm|\mathbf{X}(t_{1}+\ell\Delta t)\in\mathbb{X},b\stackrel{t_{1},\ell\Delta t}{\centernot\rightsquigarrow}a,\{\bar{D}_{b}^{(\sigma)}\}_{\sigma=0}^{\ell-1},\{\bar{F}_{(a,b)}^{(\sigma)}\}_{\sigma=0}^{\ell-1},\mathbf{X}(t_{1})=\mathbf{x}\right) (89)
\displaystyle\quad\quad\times\left(1-\left(B_{ij}\delta_{1y}+\gamma_{j}+\lambda\left(y\left(1-\frac{\rho_{ij}}{n}\right)+(1-y)\frac{\rho_{ij}}{n}\right)\right)\Delta t+o(\Delta t)\right) (90)
\displaystyle\stackrel{(a)}{=}\prod_{\ell=1}^{N-1}\left(1-\left(B_{ij}\delta_{1y}+\gamma_{j}+\lambda\left(y\left(1-\frac{\rho_{ij}}{n}\right)+(1-y)\frac{\rho_{ij}}{n}\right)\right)\Delta t+o(\Delta t)\right) (91)
\displaystyle\quad\times\left(1-\left(B_{ij}\delta_{1y}+\gamma_{j}+\lambda\left(y\left(1-\frac{\rho_{ij}}{n}\right)+(1-y)\frac{\rho_{ij}}{n}\right)\right)\Delta t+o(\Delta t)\right) (92)
\displaystyle=\left(1-\left(B_{ij}\delta_{1y}+\gamma_{j}+\lambda\left(y\left(1-\frac{\rho_{ij}}{n}\right)+(1-y)\frac{\rho_{ij}}{n}\right)\right)\left(\frac{t_{2}-t_{1}}{N}\right)\right)^{N}+o\left(\frac{1}{N}\right), (93)

where $(a)$ follows from the following observation: for any $\mathbf{y}\in\mathbb{X}$, Markovity implies that

\displaystyle\text{Pr}\left(\left\{b\stackrel{t_{1}+\ell\Delta t,\Delta t}{\centernot\rightsquigarrow}a\right\}\cap\bar{D}_{b}^{(\ell)}\cap\bar{F}_{(a,b)}^{(\ell)}\Bigm|\mathbf{X}(t_{1}+\ell\Delta t)=\mathbf{y},b\stackrel{t_{1},\ell\Delta t}{\centernot\rightsquigarrow}a,\{\bar{D}_{b}^{(\sigma)}\}_{\sigma=0}^{\ell-1},\{\bar{F}_{(a,b)}^{(\sigma)}\}_{\sigma=0}^{\ell-1},\mathbf{X}(t_{1})=\mathbf{x}\right)
\displaystyle=\text{Pr}\left(\left\{b\stackrel{t_{1}+\Delta t,\Delta t}{\centernot\rightsquigarrow}a\right\}\cap\bar{D}_{b}^{(1)}\cap\bar{F}_{(a,b)}^{(1)}\Bigm|\mathbf{X}(t_{1}+\Delta t)=\mathbf{y}\right)
\displaystyle=1-\left(B_{ij}\delta_{1y}+\gamma_{j}+\lambda\left(y\left(1-\frac{\rho_{ij}}{n}\right)+(1-y)\frac{\rho_{ij}}{n}\right)\right)\Delta t+o(\Delta t),

which further implies that

\displaystyle\text{Pr}\left(\left\{b\stackrel{t_{1}+\ell\Delta t,\Delta t}{\centernot\rightsquigarrow}a\right\}\cap\bar{D}_{b}^{(\ell)}\cap\bar{F}_{(a,b)}^{(\ell)}\Bigm|\mathbf{X}(t_{1}+\ell\Delta t)\in\mathbb{X},b\stackrel{t_{1},\ell\Delta t}{\centernot\rightsquigarrow}a,\{\bar{D}_{b}^{(\sigma)}\}_{\sigma=0}^{\ell-1},\{\bar{F}_{(a,b)}^{(\sigma)}\}_{\sigma=0}^{\ell-1},\mathbf{X}(t_{1})=\mathbf{x}\right)
\displaystyle=1-\left(B_{ij}\delta_{1y}+\gamma_{j}+\lambda\left(y\left(1-\frac{\rho_{ij}}{n}\right)+(1-y)\frac{\rho_{ij}}{n}\right)\right)\Delta t+o(\Delta t).

Now, since (86) holds for all $N\in\mathbb{N}$, it follows that

\displaystyle\text{Pr}\left(\{b\in\mathcal{I}(t)\}\cap\left\{b\stackrel{t_{1},t_{2}-t_{1}}{\centernot\rightsquigarrow}a\right\}\cap\{1_{(a,b)}(\tau)=y\,\forall\,\tau\in[t_{1},t_{2})\}\Bigm|\mathbf{X}(t_{1})=\mathbf{x}\right) (94)
\displaystyle=\lim_{N\rightarrow\infty}\left(\left(1-\left(B_{ij}\delta_{1y}+\gamma_{j}+\lambda y\left(1-\frac{\rho_{ij}}{n}\right)+\lambda\left(1-y\right)\frac{\rho_{ij}}{n}\right)\left(\frac{t_{2}-t_{1}}{N}\right)\right)^{N}+o\left(\frac{1}{N}\right)\right) (95)
\displaystyle=e^{-\left(B_{ij}\delta_{1y}+\gamma_{j}+\lambda\left(y\left(1-\frac{\rho_{ij}}{n}\right)+(1-y)\frac{\rho_{ij}}{n}\right)\right)\left(t_{2}-t_{1}\right)}. (96)

Similarly, we can show that

\displaystyle\text{Pr}\left(1_{(a,b)}(\tau)=y\,\forall\,\tau\in[t_{1},t_{2})\Bigm|\mathbf{X}(t_{1})=\mathbf{x}\right)=e^{-\lambda\left(y\left(1-\frac{\rho_{ij}}{n}\right)+(1-y)\frac{\rho_{ij}}{n}\right)\left(t_{2}-t_{1}\right)}. (97)

As a result of (94) and (97),

\displaystyle\text{Pr}\left(\{b\in\mathcal{I}(t)\}\cap\left\{b\stackrel{t_{1},t_{2}-t_{1}}{\centernot\rightsquigarrow}a\right\}\Bigm|\{1_{(a,b)}(\tau)=y\,\forall\,\tau\in[t_{1},t_{2})\},\mathbf{X}(t_{1})=\mathbf{x}\right)
\displaystyle=\frac{\text{Pr}\left(\{b\in\mathcal{I}(t)\}\cap\left\{b\stackrel{t_{1},t_{2}-t_{1}}{\centernot\rightsquigarrow}a\right\}\cap\{1_{(a,b)}(\tau)=y\,\forall\,\tau\in[t_{1},t_{2})\}\Bigm|\mathbf{X}(t_{1})=\mathbf{x}\right)}{\text{Pr}\left(1_{(a,b)}(\tau)=y\,\forall\,\tau\in[t_{1},t_{2})\Bigm|\mathbf{X}(t_{1})=\mathbf{x}\right)}
\displaystyle=e^{-(B_{ij}\delta_{1y}+\gamma_{j})(t_{2}-t_{1})}.

Since the above holds for all $\mathbf{x}\in\mathbb{X}$, it follows that

\displaystyle\text{Pr}\left(\{b\in\mathcal{I}(t)\}\cap\left\{b\stackrel{t_{1},t_{2}-t_{1}}{\centernot\rightsquigarrow}a\right\}\Bigm|\{1_{(a,b)}(\tau)=y\,\forall\,\tau\in[t_{1},t_{2})\},\mathbf{X}(t_{1})\in\mathbb{X}\right)=e^{-(B_{ij}\delta_{1y}+\gamma_{j})(t_{2}-t_{1})},

which proves the lemma.
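
The two limiting steps in the proof of Lemma 6 can be checked numerically. The Python sketch below first compares the discretized product in (93) (ignoring the vanishing remainder term) with the exponential in (96), and then forms the ratio of (96) and (97) to recover the expression claimed in the lemma. All function names and parameter values are hypothetical choices made for illustration, and `rho_over_n` stands for $\rho_{ij}/n$.

```python
import math

def edge_flip_rate(lam, rho_over_n, y):
    """Rate at which the edge state flips from y to 1 - y."""
    return lam * (y * (1.0 - rho_over_n) + (1 - y) * rho_over_n)

def discretized_survival(total_rate, duration, num_steps):
    """Product of num_steps per-step survival factors (1 - rate * dt)."""
    return (1.0 - total_rate * duration / num_steps) ** num_steps

def lemma6_probability(B_ij, gamma_j, lam, rho_over_n, y, duration):
    """Ratio of (96) to (97): no infection and no recovery, given no edge flip."""
    delta_1y = 1 if y == 1 else 0  # Kronecker delta
    flip = edge_flip_rate(lam, rho_over_n, y)
    joint = math.exp(-(B_ij * delta_1y + gamma_j + flip) * duration)  # as in (96)
    no_flip = math.exp(-flip * duration)                              # as in (97)
    return joint / no_flip

# Hypothetical parameter values, chosen only for illustration.
B_ij, gamma_j, lam, rho_over_n, y, duration = 0.3, 0.1, 2.0, 0.05, 1, 1.5
total_rate = B_ij + gamma_j + edge_flip_rate(lam, rho_over_n, y)

approx = discretized_survival(total_rate, duration, 10**6)
exact = math.exp(-total_rate * duration)
conditional = lemma6_probability(B_ij, gamma_j, lam, rho_over_n, y, duration)
print(approx, exact, conditional)
```

In this sketch the edge update rate $\lambda$ cancels in the ratio, which is consistent with the statement of the lemma: the conditional probability depends only on $B_{ij}\delta_{1y}+\gamma_{j}$ and the length of the interval.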

Lemma 7.

Let Ton:=tKtT1(a,b)(σ)𝑑σT_{\text{on}}:=\int_{t-K}^{t-T}1_{(a,b)}(\sigma)d\sigma denote the total duration of time for which the edge (a,b)(a,b) exists in the network during [tK,tT][t-K,t-T]. Then, for all κ,τ[0,t]\kappa,\tau\in[0,t] and all τon[0,(κτ)+]\tau_{\text{on}}\in\left[0,(\kappa-\tau)_{+}\right], we have

$$\text{Pr}\left(b\stackrel{0,t}{\centernot\rightsquigarrow}a\,\bigg\lvert\,(K,T,T_{\text{on}})=(\kappa,\tau,\tau_{\text{on}}),\ b\in\mathcal{I}(t^{-}),\ (a,b)\notin E(t)\right)=e^{-B_{ij}\tau_{\text{on}}},$$

where we define (σ):=ε>0τ[σε,σ)(τ)\mathcal{I}(\sigma^{-}):=\cup_{\varepsilon>0}\cap_{\tau^{\prime}\in[\sigma-\varepsilon,\sigma)}\mathcal{I}(\tau^{\prime}) for all σ0\sigma\geq 0. In other words, c(σ)c\in\mathcal{I}(\sigma^{-}) iff there exists an ε>0\varepsilon>0 such that c(τ)c\in\mathcal{I}(\tau^{\prime}) for all τ[σε,σ)\tau^{\prime}\in[\sigma-\varepsilon,\sigma).

Proof.

We first show that {b\centernottκ,κτa}\left\{b\stackrel{{\scriptstyle t-\kappa,\kappa-\tau}}{{\centernot\rightsquigarrow}}a\right\} is conditionally independent of {(a,b)E(t)}\{(a,b)\notin E(t)\} given (K,T,Ton)=(κ,τ,τon)(K,T,T_{\text{on}})=(\kappa,\tau,\tau_{\text{on}}) and b(tτ)b\in\mathcal{I}(t-\tau):

\begin{align}
&\text{Pr}\left((a,b)\in E(t)\,\bigg\lvert\,(K,T,T_{\text{on}})=(\kappa,\tau,\tau_{\text{on}}),\ b\in\mathcal{I}(t-\tau),\ b\stackrel{t-\kappa,\kappa-\tau}{\centernot\rightsquigarrow}a\right) \tag{98}\\
&\stackrel{(a)}{=}\text{Pr}\left((a,b)\in E(t-\tau)\,\bigg\lvert\,(K,T,T_{\text{on}})=(\kappa,\tau,\tau_{\text{on}}),\ b\in\mathcal{I}(t-\tau),\ b\stackrel{t-\kappa,\kappa-\tau}{\centernot\rightsquigarrow}a\right) \tag{99}\\
&\stackrel{(b)}{=}\frac{\rho_{ij}}{n} \tag{100}\\
&=\text{Pr}\left((a,b)\in E(t-\tau)\mid T=\tau\right) \tag{101}\\
&\stackrel{(c)}{=}\text{Pr}((a,b)\in E(t)\mid T=\tau), \tag{102}
\end{align}

where (a)(a) and (c)(c) hold because 1(a,b)1_{(a,b)} is not updated during the interval [tτ,t)[t-\tau,t), and (b) follows from the modeling assumption that the probability of the edge (a,b)(a,b) existing in the network following an edge state update is ρijn\frac{\rho_{ij}}{n} (independent of the past states {𝐗(τ):0τ<tτ}\{\mathbf{X}(\tau^{\prime}):0\leq\tau^{\prime}<t-\tau\}), the fact that {b(tτ)}={b((tτ))}\{b\in\mathcal{I}(t-\tau)\}=\{b\in\mathcal{I}((t-\tau)^{-})\} almost surely, and from the observation that tτt-\tau is an update time for 1(a,b)1_{(a,b)} given T=τT=\tau.

In view of (98), the definitions of KK, TT, and TonT_{\text{on}} imply that

Pr(b\centernot0,ta|(K,T,Ton)=(κ,τ,τon),b(t),(a,b)E(t))\displaystyle\text{Pr}\left(b\stackrel{{\scriptstyle 0,t}}{{\centernot\rightsquigarrow}}a\,\bigg{\lvert}\,(K,T,T_{\text{on}})=(\kappa,\tau,\tau_{\text{on}}),b\in\mathcal{I}(t^{-}),(a,b)\notin E(t)\right) (103)
=Pr(b\centernottκ,κτa|(K,T,Ton)=(κ,τ,τon),b(t),(a,b)E(t))\displaystyle=\text{Pr}\left(b\stackrel{{\scriptstyle t-\kappa,\kappa-\tau}}{{\centernot\rightsquigarrow}}a\,\bigg{\lvert}\,(K,T,T_{\text{on}})=(\kappa,\tau,\tau_{\text{on}}),b\in\mathcal{I}(t^{-}),(a,b)\notin E(t)\right) (104)
=Pr(b\centernottκ,κτa|(K,T,Ton)=(κ,τ,τon),b(t))\displaystyle=\text{Pr}\left(b\stackrel{{\scriptstyle t-\kappa,\kappa-\tau}}{{\centernot\rightsquigarrow}}a\,\bigg{\lvert}\,(K,T,T_{\text{on}})=(\kappa,\tau,\tau_{\text{on}}),b\in\mathcal{I}(t^{-})\right) (105)
=Pr(b\centernottκ,κτa|(K,T,Ton)=(κ,τ,τon),b(tτ),bτ(tτ,t)(τ))\displaystyle=\text{Pr}\left(b\stackrel{{\scriptstyle t-\kappa,\kappa-\tau}}{{\centernot\rightsquigarrow}}a\,\bigg{\lvert}\,(K,T,T_{\text{on}})=(\kappa,\tau,\tau_{\text{on}}),b\in\mathcal{I}(t-\tau),b\notin\cup_{\tau^{\prime}\in(t-\tau,t)}\mathcal{R}(\tau^{\prime})\right) (106)
=Pr(b\centernottκ,κτa|(K,T,Ton)=(κ,τ,τon),b(tτ)),\displaystyle=\text{Pr}\left(b\stackrel{{\scriptstyle t-\kappa,\kappa-\tau}}{{\centernot\rightsquigarrow}}a\,\bigg{\lvert}\,(K,T,T_{\text{on}})=(\kappa,\tau,\tau_{\text{on}}),b\in\mathcal{I}(t-\tau)\right), (107)

where the last step holds because 𝐐(𝐱,𝐱b)=γj\mathbf{Q}(\mathbf{x},\mathbf{x}_{\downarrow b})=\gamma_{j} for all 𝐱𝕊\mathbf{x}\in\mathbb{S} satisfying xb=1x_{b}=1, which implies that, given b(tτ)b\in\mathcal{I}(t-\tau) and any other conditioning event, node bb recovers during (tτ,t)(t-\tau,t) at a constant rate of γj\gamma_{j} independently of all past edge states and past disease states (and therefore independently of past transmissions as well). Hence, {b\centernottκ,κτa}\left\{b\stackrel{{\scriptstyle t-\kappa,\kappa-\tau}}{{\centernot\rightsquigarrow}}a\right\} and {bτ(tτ,t)(τ)}\{b\notin\cup_{\tau^{\prime}\in(t-\tau,t)}\mathcal{R}(\tau^{\prime})\} are conditionally independent given (K,T,Ton)=(κ,τ,τon)(K,T,T_{\text{on}})=(\kappa,\tau,\tau_{\text{on}}) and b(tτ)b\in\mathcal{I}(t-\tau).

We now evaluate the right-hand side of (103) as follows. Let CC denote the (random) number of times 1(a,b)1_{(a,b)} flips (changes) during [tK,tT][t-K,t-T], and let the times of these changes be T1<<TCT_{1}<\cdots<T_{C}. We assume that CC is even (as the case of CC being odd is handled similarly) and that 1(a,b)(τ)=01_{(a,b)}(\tau^{\prime})=0 for τ[tK,T1]\tau^{\prime}\in[t-K,T_{1}] (the case 1(a,b)(τ)=11_{(a,b)}(\tau^{\prime})=1 for τ[tK,T1]\tau^{\prime}\in[t-K,T_{1}] is handled similarly). Then, for a given cc\in\mathbb{N} and a collection of times t1,,tc,τont_{1},\ldots,t_{c},\tau_{\text{on}}, we have {C=c,T1=t1,,Tc=tc}{Ton=τon}\{C=c,T_{1}=t_{1},\ldots,T_{c}=t_{c}\}\subset\{T_{\text{on}}=\tau_{\text{on}}\} iff k=1c/2(t2kt2k1)=τon\sum_{k=1}^{c/2}(t_{2k}-t_{2k-1})=\tau_{\text{on}}. Suppose this condition holds. Then, observe that

\begin{align}
&\text{Pr}\left(b\stackrel{t-\kappa,\kappa-\tau}{\centernot\rightsquigarrow}a,\ b\in\mathcal{I}(t-\tau)\,\bigg\lvert\,(K,T,T_{\text{on}})=(\kappa,\tau,\tau_{\text{on}}),\ C=c,\ (T_{1},\ldots,T_{c})=(t_{1},\ldots,t_{c})\right) \tag{108}\\
&=\text{Pr}\left(b\stackrel{t-\kappa,\kappa-\tau}{\centernot\rightsquigarrow}a,\ b\in\mathcal{I}(t-\tau)\,\bigg\lvert\,(K,T,T_{\text{on}})=(\kappa,\tau,\tau_{\text{on}}),\ 1_{(a,b)}(\tau^{\prime})=1\text{ iff }\tau^{\prime}\in\cup_{k=1}^{c/2}[t_{2k-1},t_{2k}]\right) \tag{109}\\
&\stackrel{(a)}{=}\prod_{k=1}^{c/2}\text{Pr}\left(b\stackrel{t_{2k-1},t_{2k}-t_{2k-1}}{\centernot\rightsquigarrow}a,\ b\in\mathcal{I}(t_{2k})\,\bigg\lvert\,1_{(a,b)}(\tau^{\prime})=1\,\forall\,\tau^{\prime}\in[t_{2k-1},t_{2k}],\ b\in\mathcal{I}(t_{2k-1})\right) \tag{110}\\
&\quad\times\prod_{k=1}^{c/2+1}\text{Pr}\left(b\stackrel{t_{2k-2},t_{2k-1}-t_{2k-2}}{\centernot\rightsquigarrow}a,\ b\in\mathcal{I}(t_{2k-1})\,\bigg\lvert\,1_{(a,b)}(\tau^{\prime})=0\,\forall\,\tau^{\prime}\in[t_{2k-2},t_{2k-1}],\ b\in\mathcal{I}(t_{2k-2})\right) \tag{111}\\
&\stackrel{(b)}{=}\prod_{k=1}^{c/2}e^{-(B_{ij}+\gamma_{j})(t_{2k}-t_{2k-1})}\times\prod_{k=1}^{c/2+1}e^{-\gamma_{j}(t_{2k-1}-t_{2k-2})} \tag{112}\\
&=e^{-B_{ij}\sum_{k=1}^{c/2}(t_{2k}-t_{2k-1})}\cdot e^{-\gamma_{j}\sum_{k=1}^{c+1}(t_{k}-t_{k-1})} \tag{113}\\
&=e^{-B_{ij}\tau_{\text{on}}}e^{-\gamma_{j}(\kappa-\tau)}, \tag{114}
\end{align}

where (b)(b) follows from Lemma 6, and (a)(a) follows from the following fact: since the definition of our epidemic model implies that the rate of pathogen transmission from bb to aa at any time instant tt^{\prime} depends only on 1(a,b)(t)1_{(a,b)}(t^{\prime}) and the disease state of bb at time tt^{\prime}, transmission events corresponding to disjoint time intervals are conditionally independent if we are given 1(a,b)1_{(a,b)} and the disease state of bb as functions of time.
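
The factorization above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it multiplies one survival factor per on/off interval, as in (112), and returns the product together with the accumulated on-duration, so that the product can be compared with $e^{-B_{ij}\tau_{\text{on}}}e^{-\gamma_{j}(\kappa-\tau)}$ from (114). The function name and arguments are hypothetical; `start` and `end` play the roles of $t-\kappa$ and $t-\tau$, and the edge is assumed absent on the first interval, as in the proof.

```python
import math


def survival_from_flips(B_ij, gamma_j, start, end, flip_times):
    """Multiply per-interval factors exp(-rate * length) over [start, end].

    Assumptions (illustrative): the edge is absent on [start, flip_times[0]]
    and toggles at each flip time; flip_times is sorted and lies in
    [start, end].
    """
    boundaries = [start, *flip_times, end]
    prob = 1.0
    t_on = 0.0
    for k in range(len(boundaries) - 1):
        length = boundaries[k + 1] - boundaries[k]
        edge_on = (k % 2 == 1)  # intervals alternate: off, on, off, ...
        rate = gamma_j + (B_ij if edge_on else 0.0)
        prob *= math.exp(-rate * length)
        if edge_on:
            t_on += length
    return prob, t_on
```

For example, with flips at 1.0, 1.5, 2.0, and 3.0 on the window [0, 4], the edge is present for a total of 1.5 time units, and the returned product coincides with `math.exp(-B_ij * 1.5 - gamma_j * 4.0)` up to floating-point error.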

On the other hand, we have

\begin{align}
&\text{Pr}(b\in\mathcal{I}(t-\tau)\mid(K,T,T_{\text{on}})=(\kappa,\tau,\tau_{\text{on}}),\ C=c,\ (T_{1},\ldots,T_{c})=(t_{1},\ldots,t_{c})) \tag{115}\\
&=\text{Pr}(b\notin\mathcal{R}(t-\tau)\mid(K,T,T_{\text{on}})=(\kappa,\tau,\tau_{\text{on}}),\ C=c,\ (T_{1},\ldots,T_{c})=(t_{1},\ldots,t_{c})) \tag{116}\\
&=e^{-\gamma_{j}((t-\tau)-(t-\kappa))} \tag{117}\\
&=e^{-\gamma_{j}(\kappa-\tau)}, \tag{118}
\end{align}

where the second equality holds because our model assumes that the rate of recovery of an infected node is time-invariant and independent of all the edge states and the disease states of other nodes (precisely, 𝐐(𝐱,𝐱b)=γj\mathbf{Q}(\mathbf{x},\mathbf{x}_{\downarrow b})=\gamma_{j} for all 𝐱𝕊\mathbf{x}\in\mathbb{S} such that b(𝐱)b\in\mathcal{I}(\mathbf{x})).

As a result of (108) and (115), we have

Pr(b\centernottκ,κτa|(K,T,Ton)=(κ,τ,τon),b(tτ),(C,T1,,TC)=(c,t1,,tc))\displaystyle\text{Pr}\left(b\stackrel{{\scriptstyle t-\kappa,\kappa-\tau}}{{\centernot\rightsquigarrow}}a\,\bigg{\lvert}\,(K,T,T_{\text{on}})=(\kappa,\tau,\tau_{\text{on}}),b\in\mathcal{I}(t-\tau),(C,T_{1},\ldots,T_{C})=(c,t_{1},\ldots,t_{c})\right)
=Pr(b\centernottκ,κτa,b(tτ)|(K,T,Ton)=(κ,τ,τon),(C,T1,,TC)=(c,t1,,tc))Pr(b(tτ)|(K,T,Ton)=(κ,τ,τon),(C,T1,,TC)=(c,t1,,tc))\displaystyle=\frac{\text{Pr}\left(b\stackrel{{\scriptstyle t-\kappa,\kappa-\tau}}{{\centernot\rightsquigarrow}}a,b\in\mathcal{I}(t-\tau)\,\bigg{\lvert}\,(K,T,T_{\text{on}})=(\kappa,\tau,\tau_{\text{on}}),(C,T_{1},\ldots,T_{C})=(c,t_{1},\ldots,t_{c})\right)}{\text{Pr}\left(b\in\mathcal{I}(t-\tau)\,\bigg{\lvert}\,(K,T,T_{\text{on}})=(\kappa,\tau,\tau_{\text{on}}),(C,T_{1},\ldots,T_{C})=(c,t_{1},\ldots,t_{c})\right)}
=eBijτon.\displaystyle=e^{-B_{ij}\tau_{\text{on}}}.

Since (c,t1,,tc)(c,t_{1},\ldots,t_{c}) was an arbitrary tuple satisfying {(C,T1,,TC)=(c,t1,,tc)}{Ton=τon}\{(C,T_{1},\ldots,T_{C})=(c,t_{1},\ldots,t_{c})\}\subset\{T_{\text{on}}=\tau_{\text{on}}\}, it follows that

Pr(b\centernottκ,κτa|(K,T,Ton)=(κ,τ,τon),b(tτ))=eBijτon.\text{Pr}\left(b\stackrel{{\scriptstyle t-\kappa,\kappa-\tau}}{{\centernot\rightsquigarrow}}a\,\bigg{\lvert}\,(K,T,T_{\text{on}})=(\kappa,\tau,\tau_{\text{on}}),b\in\mathcal{I}(t-\tau)\right)=e^{-B_{ij}\tau_{\text{on}}}.

Invoking (103) now completes the proof.

Observe that in the above proof, given that K=κK=\kappa and that (a,b)E(t)(a,b)\notin E(t), (C,T1,,TC)(C,T_{1},\ldots,T_{C}) uniquely determines {1(a,b)(τ):tKτt}\{1_{(a,b)}(\tau):t-K\leq\tau\leq t\}. Therefore, as an implication of the above proof, we have

Pr(b\centernottK,taK=κ,{1(a,b)(τ):tKτt},b(t),(a,b)E(t))=eBijTon.\text{Pr}\left(b\stackrel{{\scriptstyle t-K,t}}{{\centernot\rightsquigarrow}}a\mid K=\kappa,\{1_{(a,b)}(\tau):t-K\leq\tau\leq t\},b\in\mathcal{I}(t^{-}),(a,b)\notin E(t)\right)=e^{-B_{ij}T_{\text{on}}}.

The right-hand side depends on the random variable $T_{\text{on}}$ because $T_{\text{on}}$ is a function of $\{1_{(a,b)}(\tau):t-K\leq\tau\leq t\}$. By the Markov property, this result extends to

Pr(b\centernottK,taK=κ,{1(a,b)(τ):0τt},b(t),(a,b)E(t))=eBijTon,\text{Pr}\left(b\stackrel{{\scriptstyle t-K,t}}{{\centernot\rightsquigarrow}}a\mid K=\kappa,\{1_{(a,b)}(\tau):0\leq\tau\leq t\},b\in\mathcal{I}(t^{-}),(a,b)\notin E(t)\right)=e^{-B_{ij}T_{\text{on}}},

which is equivalent to the following lemma.

Lemma 8.

Let Ton:=tKtT1(a,b)(σ)𝑑σT_{\text{on}}:=\int_{t-K}^{t-T}1_{(a,b)}(\sigma)d\sigma denote the total duration of time for which the edge (a,b)(a,b) exists in the network during [tK,tT][t-K,t-T]. Then, for all κ[0,t]\kappa\in[0,t], we have

Pr(b\centernot0,ta|K=κ,{1(a,b)(τ):0τt},b(t),(a,b)E(t))=eBijTon.\text{Pr}\left(b\stackrel{{\scriptstyle 0,t}}{{\centernot\rightsquigarrow}}a\,\bigg{\lvert}\,K=\kappa,\{1_{(a,b)}(\tau):0\leq\tau\leq t\},b\in\mathcal{I}(t^{-}),(a,b)\notin E(t)\right)=e^{-B_{ij}T_{\text{on}}}.

Lemma 9.

Recall from Lemma 8 that Ton=tKtT1(a,b)(σ)𝑑σT_{\text{on}}=\int_{t-K}^{t-T}1_{(a,b)}(\sigma)d\sigma. Then for all κ,τ[0,t]\kappa,\tau\in[0,t], we have

Pr((𝒮(t),(t))=(𝒮0,0)|(K,T,Ton)=(κ,τ,τon),b\centernot0,ta,(a,b)E(t),b(t))\displaystyle\text{Pr}\left((\mathcal{S}(t),\mathcal{I}(t))=(\mathcal{S}_{0},\mathcal{I}_{0})\,\Big{\lvert}\,(K,T,T_{\text{on}})=(\kappa,\tau,\tau_{\text{on}}),b\stackrel{{\scriptstyle 0,t}}{{\centernot\rightsquigarrow}}a,(a,b)\notin E(t),b\in\mathcal{I}(t^{-})\right)
=Pr((𝒮(t),(t))=(𝒮0,0)|K=κ,b\centernot0,ta,(a,b)E(t),b(t)).\displaystyle=\text{Pr}\left((\mathcal{S}(t),\mathcal{I}(t))=(\mathcal{S}_{0},\mathcal{I}_{0})\,\Big{\lvert}\,K=\kappa,b\stackrel{{\scriptstyle 0,t}}{{\centernot\rightsquigarrow}}a,(a,b)\notin E(t),b\in\mathcal{I}(t^{-})\right).

Proof.

We first examine the following conditional probability for an arbitrary (a,b)(a,b)-agnostic superstate 𝕐\mathbb{Y}:

Pr((𝒮(t),(t))=(𝒮0,0),𝐗(K~)𝕐,K=κ,b\centernot0,ta,b(t){1(a,b)(τ):0τt},(a,b)E(t)).\text{Pr}\left((\mathcal{S}(t),\mathcal{I}(t))=(\mathcal{S}_{0},\mathcal{I}_{0}),\mathbf{X}(\tilde{K})\in\mathbb{Y},K=\kappa,b\stackrel{{\scriptstyle 0,t}}{{\centernot\rightsquigarrow}}a,b\in\mathcal{I}(t^{-})\mid\{1_{(a,b)}(\tau):0\leq\tau\leq t\},(a,b)\notin E(t)\right).

To begin, note that the proof of Lemma 5, Remark 3, and the fact that KK is a function of K~\tilde{K} together imply that

fK{1(a,b)(τ):0τt},(a,b)E(t)(κ)=fK(κ)\displaystyle f_{K\mid\{1_{(a,b)}(\tau):0\leq\tau\leq t\},(a,b)\notin E(t)}(\kappa)=f_{K}(\kappa)

and that

Pr(𝐗(K~)𝕐K=κ,{1(a,b)(τ):0τt},(a,b)E(t))=Pr(𝐗(K~)𝕐K=κ).\displaystyle\text{Pr}(\mathbf{X}(\tilde{K})\in\mathbb{Y}\mid K=\kappa,\{1_{(a,b)}(\tau):0\leq\tau\leq t\},(a,b)\notin E(t))=\text{Pr}(\mathbf{X}(\tilde{K})\in\mathbb{Y}\mid K=\kappa). (119)

Next, for the event {b(t)}\{b\in\mathcal{I}(t^{-})\}, we have

Pr(b(t)K=κ,𝐗(K~)𝕐,{1(a,b)(τ):0τt},(a,b)E(t))\displaystyle\text{Pr}\left(b\in\mathcal{I}(t^{-})\mid K=\kappa,\mathbf{X}(\tilde{K})\in\mathbb{Y},\{1_{(a,b)}(\tau):0\leq\tau\leq t\},(a,b)\notin E(t)\right) (120)
=(a)eγj(tκ)\displaystyle\stackrel{{\scriptstyle(a)}}{{=}}e^{-\gamma_{j}(t-\kappa)} (121)
=(b)Pr(b(t)K=κ)\displaystyle\stackrel{{\scriptstyle(b)}}{{=}}\text{Pr}\left(b\in\mathcal{I}(t^{-})\mid K=\kappa\right) (122)

where (a)(a) and (b)(b) follow from our modelling assumption that 𝐐(𝐱,𝐱b)=γj\mathbf{Q}(\mathbf{x},\mathbf{x}_{\downarrow b})=\gamma_{j} for all 𝐱\mathbf{x} satisfying xb=1x_{b}=1, which means that the recovery time of bb depends only on the time of infection of bb and is conditionally independent of all other disease states and all the edge states. Similarly, we have

Pr(b\centernot0,ta𝐗(K~)𝕐,K=κ,{1(a,b)(τ):0τt},(a,b)E(t),b(t))\displaystyle\text{Pr}\left(b\stackrel{{\scriptstyle 0,t}}{{\centernot\rightsquigarrow}}a\mid\mathbf{X}(\tilde{K})\in\mathbb{Y},K=\kappa,\{1_{(a,b)}(\tau):0\leq\tau\leq t\},(a,b)\notin E(t),b\in\mathcal{I}(t^{-})\right) (123)
=(a)Pr(b\centernot0,taK=κ,{1(a,b)(τ):0τt},(a,b)E(t),b(t))\displaystyle\stackrel{{\scriptstyle(a)}}{{=}}\text{Pr}\left(b\stackrel{{\scriptstyle 0,t}}{{\centernot\rightsquigarrow}}a\mid K=\kappa,\{1_{(a,b)}(\tau):0\leq\tau\leq t\},(a,b)\notin E(t),b\in\mathcal{I}(t^{-})\right) (124)
=(b)eBijTon\displaystyle\stackrel{{\scriptstyle(b)}}{{=}}e^{-B_{ij}T_{\text{on}}} (125)

where (a)(a) follows from our modelling assumptions, which imply that the rate of infection transmission along an edge depends only on the edge state of the transmitting edge and the disease state of the transmitting node and is conditionally independent of other disease states and edge states (which are captured by the (a,b)(a,b)-agnostic superstate of the chain) and (b)(b) follows from Lemma 8. Note that TonT_{\text{on}} is a function of TT and hence also of {1(a,b)(τ):0τt}\{1_{(a,b)}(\tau):0\leq\tau\leq t\}.

It remains for us to analyze

Pr((𝒮(t),(t))=(𝒮0,0)𝐗(K~)𝕐,K=κ,b\centernot0,ta,b(t),{1(a,b)(τ):0τt},(a,b)E(t)).\text{Pr}\left((\mathcal{S}(t),\mathcal{I}(t))=(\mathcal{S}_{0},\mathcal{I}_{0})\mid\mathbf{X}(\tilde{K})\in\mathbb{Y},K=\kappa,b\stackrel{{\scriptstyle 0,t}}{{\centernot\rightsquigarrow}}a,b\in\mathcal{I}(t^{-}),\{1_{(a,b)}(\tau):0\leq\tau\leq t\},(a,b)\notin E(t)\right).

To do so, we first let LNL_{N} denote the time of the first (a,b)(a,b)-agnostic jump to occur after bb gets infected, i.e., N:=inf{:LK~}N:=\inf\{\ell\in\mathbb{N}:L_{\ell}\geq\tilde{K}\}, and we note the following: given the conditioning events and variables above (including the event that bb does not infect aa during [0,t][0,t]), the total conditional rate at which aa receives pathogens at any time τLN\tau\leq L_{N} is

$$\sum_{q=1}^{m}\sum_{d\in\mathcal{I}_{q}(\mathbf{X}(\tilde{K}))\setminus\{b\}}B_{iq}1_{(a,d)}(\mathbf{X}(\tilde{K})),$$

which is determined uniquely by 𝕐\mathbb{Y}, the (a,b)(a,b)-agnostic superstate of the chain at time K~\tilde{K}. Therefore, this rate is conditionally independent of 1(a,b)(τ)1_{(a,b)}(\tau) for any τ\tau. Similarly, for all age groups [m]\ell\in[m], given the conditioning events and variables above, the conditional rate at which a node d(𝐗(K~))d\in\mathcal{I}_{\ell}(\mathbf{X}(\tilde{K})) recovers, which equals γ\gamma_{\ell}, and the total conditional rate at which a node c𝒜{a}c\in\mathcal{A}_{\ell}\setminus\{a\} receives pathogens, which equals q=1mdq(𝐗(K~))Bq1(c,d)(𝐗(K~))\sum_{q=1}^{m}\sum_{d\in\mathcal{I}_{q}(\mathbf{X}(\tilde{K}))}B_{\ell q}1_{(c,d)}(\mathbf{X}(\tilde{K})), are both conditionally independent of 1(a,b)(τ)1_{(a,b)}(\tau) given that 𝐗(K~)𝕐\mathbf{X}(\tilde{K})\in\mathbb{Y}. Therefore, by using arguments similar to those made in the proof of Lemma 5, we can show that (𝒮(LN),(LN))(\mathcal{S}(L_{N}),\mathcal{I}(L_{N})) is conditionally independent of {1(a,b)(τ):0τt}\{1_{(a,b)}(\tau):0\leq\tau\leq t\} given the rest of the conditioning events and variables. Moreover, by repeating the above for subsequent (a,b)(a,b)-agnostic jumps, we can generalize this conditional independence assertion to (𝒮(t),(t))(\mathcal{S}(t),\mathcal{I}(t)), which means that

\begin{align}
&\text{Pr}\left((\mathcal{S}(t),\mathcal{I}(t))=(\mathcal{S}_{0},\mathcal{I}_{0})\mid\mathbf{X}(\tilde{K})\in\mathbb{Y},\ K=\kappa,\ b\stackrel{0,t}{\centernot\rightsquigarrow}a,\ b\in\mathcal{I}(t^{-}),\ \{1_{(a,b)}(\tau):0\leq\tau\leq t\},\ (a,b)\notin E(t)\right) \notag\\
&=\text{Pr}\left((\mathcal{S}(t),\mathcal{I}(t))=(\mathcal{S}_{0},\mathcal{I}_{0})\mid\mathbf{X}(\tilde{K})\in\mathbb{Y},\ K=\kappa,\ b\stackrel{0,t}{\centernot\rightsquigarrow}a,\ b\in\mathcal{I}(t^{-}),\ (a,b)\notin E(t)\right). \tag{127}
\end{align}

Combining (119), (120), (123) and (127) now yields

Pr((𝒮(t),(t))=(𝒮0,0),𝐗(K~)𝕐,b\centernot0,ta,b(t)K=κ,{1(a,b)(τ):0τt},(a,b)E(t))\displaystyle\text{Pr}\left((\mathcal{S}(t),\mathcal{I}(t))=(\mathcal{S}_{0},\mathcal{I}_{0}),\mathbf{X}(\tilde{K})\in\mathbb{Y},b\stackrel{{\scriptstyle 0,t}}{{\centernot\rightsquigarrow}}a,b\in\mathcal{I}(t^{-})\mid K=\kappa,\{1_{(a,b)}(\tau):0\leq\tau\leq t\},(a,b)\notin E(t)\right)
=Pr((𝒮(t),(t))=(𝒮0,0)𝐗(K~)𝕐,K=κ,b\centernot0,ta,b(t),(a,b)E(t))\displaystyle=\text{Pr}\left((\mathcal{S}(t),\mathcal{I}(t))=(\mathcal{S}_{0},\mathcal{I}_{0})\mid\mathbf{X}(\tilde{K})\in\mathbb{Y},K=\kappa,b\stackrel{{\scriptstyle 0,t}}{{\centernot\rightsquigarrow}}a,b\in\mathcal{I}(t^{-}),(a,b)\notin E(t)\right)
×eBijTon×Pr(b(t)K=κ)×Pr(𝐗(K~)𝕐K=κ)\displaystyle\quad\times e^{-B_{ij}T_{\text{on}}}\times\text{Pr}(b\in\mathcal{I}(t^{-})\mid K=\kappa)\times\text{Pr}(\mathbf{X}(\tilde{K})\in\mathbb{Y}\mid K=\kappa)

Summing both sides of the above equation over all $(a,b)$-agnostic superstates $\mathbb{Y}$ gives

Pr((𝒮(t),(t))=(𝒮0,0),b\centernot0,ta,b(t)K=κ,{1(a,b)(τ):0τt},(a,b)E(t))\displaystyle\text{Pr}\left((\mathcal{S}(t),\mathcal{I}(t))=(\mathcal{S}_{0},\mathcal{I}_{0}),b\stackrel{{\scriptstyle 0,t}}{{\centernot\rightsquigarrow}}a,b\in\mathcal{I}(t^{-})\mid K=\kappa,\{1_{(a,b)}(\tau):0\leq\tau\leq t\},(a,b)\notin E(t)\right)
=eBijTon×Pr(b(t)K=κ)\displaystyle=e^{-B_{ij}T_{\text{on}}}\times\text{Pr}(b\in\mathcal{I}(t^{-})\mid K=\kappa) (128)
×𝕐(Pr((𝒮(t),(t))=(𝒮0,0)𝐗(K~)𝕐,K=κ,b\centernot0,ta,b(t),(a,b)E(t))\displaystyle\quad\times\sum_{\mathbb{Y}}\Bigg{(}\text{Pr}\left((\mathcal{S}(t),\mathcal{I}(t))=(\mathcal{S}_{0},\mathcal{I}_{0})\mid\mathbf{X}(\tilde{K})\in\mathbb{Y},K=\kappa,b\stackrel{{\scriptstyle 0,t}}{{\centernot\rightsquigarrow}}a,b\in\mathcal{I}(t^{-}),(a,b)\notin E(t)\right) (129)
Pr(𝐗(K~)𝕐K=κ)).\displaystyle\quad\quad\quad\quad\quad\cdot\text{Pr}(\mathbf{X}(\tilde{K})\in\mathbb{Y}\mid K=\kappa)\Bigg{)}. (130)

Here, we recall from our earlier arguments that

\begin{align*}
&e^{-B_{ij}T_{\text{on}}}\times\text{Pr}(b\in\mathcal{I}(t^{-})\mid K=\kappa)\\
&=\text{Pr}\left(b\stackrel{0,t}{\centernot\rightsquigarrow}a\mid K=\kappa,\ \{1_{(a,b)}(\tau):0\leq\tau\leq t\},\ (a,b)\notin E(t),\ b\in\mathcal{I}(t^{-})\right)\\
&\quad\times\text{Pr}\left(b\in\mathcal{I}(t^{-})\mid K=\kappa,\ \{1_{(a,b)}(\tau):0\leq\tau\leq t\},\ (a,b)\notin E(t)\right)\\
&=\text{Pr}\left(b\stackrel{0,t}{\centernot\rightsquigarrow}a,\ b\in\mathcal{I}(t^{-})\mid K=\kappa,\ \{1_{(a,b)}(\tau):0\leq\tau\leq t\},\ (a,b)\notin E(t)\right).
\end{align*}

In light of (128), this means that

\begin{align*}
&\text{Pr}\left((\mathcal{S}(t),\mathcal{I}(t))=(\mathcal{S}_{0},\mathcal{I}_{0}),\ b\stackrel{0,t}{\centernot\rightsquigarrow}a,\ b\in\mathcal{I}(t^{-})\mid K=\kappa,\ \{1_{(a,b)}(\tau):0\leq\tau\leq t\},\ (a,b)\notin E(t)\right)\\
&=\text{Pr}\left(b\stackrel{0,t}{\centernot\rightsquigarrow}a,\ b\in\mathcal{I}(t^{-})\mid K=\kappa,\ \{1_{(a,b)}(\tau):0\leq\tau\leq t\},\ (a,b)\notin E(t)\right)\\
&\quad\times\sum_{\mathbb{Y}}\Bigg(\text{Pr}\left((\mathcal{S}(t),\mathcal{I}(t))=(\mathcal{S}_{0},\mathcal{I}_{0})\mid\mathbf{X}(\tilde{K})\in\mathbb{Y},\ K=\kappa,\ b\stackrel{0,t}{\centernot\rightsquigarrow}a,\ b\in\mathcal{I}(t^{-}),\ (a,b)\notin E(t)\right)\\
&\qquad\qquad\cdot\text{Pr}(\mathbf{X}(\tilde{K})\in\mathbb{Y}\mid K=\kappa)\Bigg).
\end{align*}

Dividing both sides of this equation by $\text{Pr}\left(b\stackrel{0,t}{\centernot\rightsquigarrow}a,\ b\in\mathcal{I}(t^{-})\mid K=\kappa,\ \{1_{(a,b)}(\tau):0\leq\tau\leq t\},\ (a,b)\notin E(t)\right)$ gives

Pr((𝒮(t),(t))=(𝒮0,0)K=κ,b\centernot0,ta,b(t),{1(a,b)(τ):0τt},(a,b)E(t))\displaystyle\text{Pr}\left((\mathcal{S}(t),\mathcal{I}(t))=(\mathcal{S}_{0},\mathcal{I}_{0})\mid K=\kappa,b\stackrel{{\scriptstyle 0,t}}{{\centernot\rightsquigarrow}}a,b\in\mathcal{I}(t^{-}),\{1_{(a,b)}(\tau):0\leq\tau\leq t\},(a,b)\notin E(t)\right)
=𝕐(Pr((𝒮(t),(t))=(𝒮0,0)𝐗(K~)𝕐,K=κ,b\centernot0,ta,b(t),(a,b)E(t))\displaystyle=\sum_{\mathbb{Y}}\Bigg{(}\text{Pr}\left((\mathcal{S}(t),\mathcal{I}(t))=(\mathcal{S}_{0},\mathcal{I}_{0})\mid\mathbf{X}(\tilde{K})\in\mathbb{Y},K=\kappa,b\stackrel{{\scriptstyle 0,t}}{{\centernot\rightsquigarrow}}a,b\in\mathcal{I}(t^{-}),(a,b)\notin E(t)\right)
Pr(𝐗(K~)𝕐K=κ))\displaystyle\quad\quad\quad\quad\cdot\text{Pr}(\mathbf{X}(\tilde{K})\in\mathbb{Y}\mid K=\kappa)\Bigg{)}
=Pr((𝒮(t),(t))=(𝒮0,0)K=κ,b\centernot0,ta,b(t),(a,b)E(t)),\displaystyle=\text{Pr}\left((\mathcal{S}(t),\mathcal{I}(t))=(\mathcal{S}_{0},\mathcal{I}_{0})\mid K=\kappa,b\stackrel{{\scriptstyle 0,t}}{{\centernot\rightsquigarrow}}a,b\in\mathcal{I}(t^{-}),(a,b)\notin E(t)\right),

where the last step holds because the summation is independent of {1(a,b)(τ):0τt}\{1_{(a,b)}(\tau):0\leq\tau\leq t\} given that (a,b)E(t)(a,b)\notin E(t). We have thus shown the following: given K=κ,b\centernot0,taK=\kappa,b\stackrel{{\scriptstyle 0,t}}{{\centernot\rightsquigarrow}}a, and b(t)b\in\mathcal{I}(t^{-}), the event {(𝒮(t),(t))=(𝒮0,0)}\{(\mathcal{S}(t),\mathcal{I}(t))=(\mathcal{S}_{0},\mathcal{I}_{0})\} is conditionally independent of {1(a,b)(τ):0τt}\{1_{(a,b)}(\tau):0\leq\tau\leq t\}. Since TT and TonT_{\text{on}} are functions of {1(a,b)(τ):0τt}\{1_{(a,b)}(\tau):0\leq\tau\leq t\}, the assertion of the lemma follows.

Proof of Proposition 2

Before we prove Proposition 2, we recall that for any transition sequence $F=\{\mathbf{x}^{(0)}\stackrel{t_{1}}{\rightarrow}\mathbf{x}^{(1)}\stackrel{t_{2}}{\rightarrow}\cdots\stackrel{t_{r}}{\rightarrow}\mathbf{x}^{(r)}\stackrel{t}{\rightarrow}\mathbf{x}^{(r)}\}$ on a time interval $[0,t]$, $\Lambda_{(a,b)}(F)$ is the index of the transition in which $(a,b)$ is updated for the last time during $[0,t]$ given that $F$ occurs. We now define a similar index:

$$\Gamma_{b}(F)=\begin{cases}\min\left\{\ell\in[r]:\mathbf{x}^{(\ell)}=\mathbf{x}^{(\ell-1)}_{\uparrow b}\right\}\quad&\text{if }\left\{\ell\in[r]:\mathbf{x}^{(\ell)}=\mathbf{x}^{(\ell-1)}_{\uparrow b}\right\}\neq\emptyset,\\ 0\quad&\text{otherwise.}\end{cases}$$

Observe that Γb(F)\Gamma_{b}(F) indexes the transition in which bb gets infected given that FF occurs.
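
As an illustration of this definition, the index of the transition in which $b$ gets infected can be computed by a single scan over the states of a transition sequence. The sketch below is hypothetical in its data representation: each state is a dictionary mapping a node to one of the labels `"S"`, `"I"`, `"R"`, which is an encoding chosen here for readability and not the state vector used in the paper. Because each transition of the chain changes a single coordinate, checking the coordinate of $b$ alone is enough to detect the transition $\mathbf{x}^{(\ell)}=\mathbf{x}^{(\ell-1)}_{\uparrow b}$.

```python
def first_infection_index(states, b):
    """Return the smallest l >= 1 at which node b switches from S to I.

    `states` is the list [x^(0), x^(1), ..., x^(r)] of disease-state
    dictionaries (hypothetical encoding). Returns 0 if no such transition
    exists, matching the 'otherwise' branch of the definition.
    """
    for ell in range(1, len(states)):
        if states[ell - 1][b] == "S" and states[ell][b] == "I":
            return ell
    return 0
```

With this encoding, a node that is already infected in the initial state yields the value 0, since no transition in the sequence infects it.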

Proof.

Consider any realization (𝒮0,0)(\mathcal{S}_{0},\mathcal{I}_{0}) of (𝒮(t),(t))(\mathcal{S}(t),\mathcal{I}(t)), and let \mathcal{F} be the set of all the transition sequences for [0,t][0,t] that result in the occurrence of {(𝒮(t),(t))=(𝒮0,0)}\{(\mathcal{S}(t),\mathcal{I}(t))=(\mathcal{S}_{0},\mathcal{I}_{0})\}, so that {(𝒮(t),(t))=(𝒮0,0)}=FF\{(\mathcal{S}(t),\mathcal{I}(t))=(\mathcal{S}_{0},\mathcal{I}_{0})\}=\cup_{F\in\mathcal{F}}F.

Consider now any pair of nodes $(a,b)\in(\mathcal{A}_{i}\cap\mathcal{S}_{0})\times(\mathcal{A}_{j}\cap\mathcal{I}_{0})$ (so that we have $a\in\mathcal{S}_{i}(t)$ and $b\in\mathcal{I}_{j}(t)$ in the event that $(\mathcal{S}(t),\mathcal{I}(t))=(\mathcal{S}_{0},\mathcal{I}_{0})$), and note that for any transition sequence $F\in\mathcal{F}$, we have $F_{(\overline{a,b})}\in\mathcal{F}$, because $F_{(\overline{a,b})}$ and $F$ involve the same node recoveries and disease transmissions (all of which occur along edges other than $(a,b)$). Therefore, $F_{?(a,b)}\subset\mathcal{F}$ for each $F\in\mathcal{F}$, and it follows that $\{(\mathcal{S}(t),\mathcal{I}(t))=(\mathcal{S}_{0},\mathcal{I}_{0})\}=\cup_{F\in\mathcal{F}}F_{?(a,b)}$.

Hence, we can derive bounds on χij(t)\chi_{ij}(t) (defined to be Pr((a,b)E(t)𝒮(t),(t))\text{Pr}((a,b)\in E(t)\mid\mathcal{S}(t),\mathcal{I}(t))) by bounding Pr((a,b)E(t)(𝒮(t),(t))=(𝒮0,0))=Pr((a,b)E(t)FF?(a,b))\text{Pr}((a,b)\in E(t)\mid(\mathcal{S}(t),\mathcal{I}(t))=(\mathcal{S}_{0},\mathcal{I}_{0}))=\text{Pr}((a,b)\in E(t)\mid\cup_{F\in\mathcal{F}}F_{?(a,b)}). To this end, we pick FF\in\mathcal{F} and δ>0\delta>0, and apply Bayes’ rule to Pr((a,b)E(t)F?(a,b)δ)\text{Pr}((a,b)\in E(t)\mid F_{?(a,b)}^{\delta}) as follows:

Pr((a,b)E(t)F?(a,b)δ)\displaystyle\text{Pr}((a,b)\in E(t)\mid F_{?(a,b)}^{\delta}) =(Pr(F?(a,b)δ)Pr(F?(a,b)δ(a,b)E(t))Pr((a,b)E(t)))1\displaystyle=\left(\frac{\text{Pr}(F_{?(a,b)}^{\delta})}{\text{Pr}(F_{?(a,b)}^{\delta}\mid(a,b)\in E(t))\cdot\text{Pr}((a,b)\in E(t))}\right)^{-1} (131)
=(1+Pr(F?(a,b)δ(a,b)E(t))Pr(F?(a,b)δ(a,b)E(t))Pr((a,b)E(t))Pr((a,b)E(t)))1\displaystyle=\left(1+\frac{\text{Pr}(F_{?(a,b)}^{\delta}\mid(a,b)\notin E(t))}{\text{Pr}(F_{?(a,b)}^{\delta}\mid(a,b)\in E(t))}\cdot\frac{\text{Pr}((a,b)\notin E(t))}{\text{Pr}((a,b)\in E(t))}\right)^{-1} (132)
=(a)(1+Pr(F?(a,b)δ(a,b)E(t))Pr(F?(a,b)δ(a,b)E(t))1ρij/nρij/n)1,\displaystyle\stackrel{{\scriptstyle(a)}}{{=}}\left(1+\frac{\text{Pr}(F_{?(a,b)}^{\delta}\mid(a,b)\notin E(t))}{\text{Pr}(F_{?(a,b)}^{\delta}\mid(a,b)\in E(t))}\cdot\frac{1-\rho_{ij}/n}{\rho_{ij}/n}\right)^{-1}, (133)

where (a)(a) holds because Pr((a,b)E(t))=ρijn\text{Pr}((a,b)\in E(t))=\frac{\rho_{ij}}{n}, which is the probability that the edge (a,b)(a,b) exists in the network after the last of the updates of 1(a,b)1_{(a,b)} to occur during [0,t][0,t].
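
The rearrangement in (131) to (133) reduces the posterior edge probability to a function of a single likelihood ratio and the prior $\rho_{ij}/n$. A minimal numerical sketch of (133) follows; the function and argument names are hypothetical, and `likelihood_ratio` stands for the ratio of the probability of the observed sequence given that the edge is absent to its probability given that the edge is present.

```python
def posterior_edge_probability(likelihood_ratio, rho_ij, n):
    """Evaluate (133): (1 + ratio * (1 - rho_ij/n) / (rho_ij/n))^(-1).

    likelihood_ratio = Pr(F | edge absent) / Pr(F | edge present),
    a nonnegative number supplied by the caller (illustrative input).
    """
    prior = rho_ij / n
    if not 0.0 < prior < 1.0:
        raise ValueError("rho_ij / n must lie strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood_ratio must be nonnegative")
    return 1.0 / (1.0 + likelihood_ratio * (1.0 - prior) / prior)
```

When the ratio equals 1, the observation is uninformative and the function returns the prior $\rho_{ij}/n$; ratios above 1 push the posterior below the prior, and ratios below 1 push it above.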

We now estimate Pr(F?(a,b)δ(a,b)E(t))Pr(F?(a,b)δ(a,b)E(t))\frac{\text{Pr}(F_{?(a,b)}^{\delta}\mid(a,b)\notin E(t))}{\text{Pr}(F_{?(a,b)}^{\delta}\mid(a,b)\in E(t))}. Note that if δ\delta is small enough, either Fδ{(a,b)E(t)}F^{\delta}\subset\{(a,b)\in E(t)\} or Fδ{(a,b)E(t)}F^{\delta}\subset\{(a,b)\notin E(t)\}. Assume w.l.o.g. that Fδ{(a,b)E(t)}F^{\delta}\subset\{(a,b)\notin E(t)\} (equivalently, Fδ(a,b¯){(a,b)E(t)}F^{\delta}_{(\overline{a,b})}\subset\{(a,b)\in E(t)\}), and observe that

\begin{align}
\frac{\text{Pr}(F_{?(a,b)}^{\delta}\mid(a,b)\notin E(t))}{\text{Pr}(F_{?(a,b)}^{\delta}\mid(a,b)\in E(t))} &=\frac{\text{Pr}(F_{?(a,b)}^{\delta}\cap\{(a,b)\notin E(t)\})}{\text{Pr}(F_{?(a,b)}^{\delta}\cap\{(a,b)\in E(t)\})}\cdot\frac{\text{Pr}((a,b)\in E(t))}{\text{Pr}((a,b)\notin E(t))} \tag{134}\\
&=\frac{\text{Pr}(F^{\delta})}{\text{Pr}(F^{\delta}_{(\overline{a,b})})}\left(\frac{\rho_{ij}/n}{1-\rho_{ij}/n}\right). \tag{135}
\end{align}

Thus, the next step is to evaluate Pr(Fδ)Pr(F(a,b¯)δ)\frac{\text{Pr}(F^{\delta})}{\text{Pr}(F_{(\overline{a,b})}^{\delta})}. To do so, suppose F={𝐱(0)t1𝐱(1)t2tr𝐱(r)tr+1x(r)}F=\{\mathbf{x}^{(0)}\stackrel{{\scriptstyle t_{1}}}{{\rightarrow}}\mathbf{x}^{(1)}\stackrel{{\scriptstyle t_{2}}}{{\rightarrow}}\cdots\stackrel{{\scriptstyle t_{r}}}{{\rightarrow}}\mathbf{x}^{(r)}\stackrel{{\scriptstyle t_{r+1}}}{{\rightarrow}}x^{(r)}\} with tr+1:=tt_{r+1}:=t, Λ(a,b)(F)=ζ{0,1,,r}\Lambda_{(a,b)}(F)=\zeta\in\{0,1,\ldots,r\}, Γb(F)=ξ{0,1,,r}\Gamma_{b}(F)=\xi\in\{0,1,\ldots,r\}, tΛ(a,b)(F)=tζ=tτt_{\Lambda_{(a,b)}(F)}=t_{\zeta}=t-\tau for some τ[0,t]\tau\in[0,t], and tΓb(F)=tξ=tκt_{\Gamma_{b}(F)}=t_{\xi}=t-\kappa for some κ[0,τ]\kappa\in[0,\tau]. Then the assumption Fδ{(a,b)E(t)}F^{\delta}\subset\{(a,b)\notin E(t)\} implies that 𝐱(ζ)=𝐱(ζ1)(a,b)\mathbf{x}^{(\zeta)}=\mathbf{x}^{(\zeta-1)}_{\downarrow(a,b)} and hence also that 𝐱(ζ)(a,b¯)=𝐱(ζ1)(a,b)\mathbf{x}^{(\zeta)}_{(\overline{a,b})}=\mathbf{x}^{(\zeta-1)}_{\uparrow(a,b)}. As a result,

F(a,b¯)\displaystyle F_{(\overline{a,b})} ={𝐱(0)t1tζ1𝐱(ζ1)tτ𝐱(a,b)(ζ1)tζ+1𝐱(a,b¯)(ζ+1)tζ+2tr𝐱(a,b¯)(r)t𝐱(a,b¯)(r)}.\displaystyle=\Big{\{}\mathbf{x}^{(0)}\stackrel{{\scriptstyle t_{1}}}{{\rightarrow}}\cdots\stackrel{{\scriptstyle t_{\zeta-1}}}{{\rightarrow}}\mathbf{x}^{(\zeta-1)}\stackrel{{\scriptstyle t-\tau}}{{\rightarrow}}\mathbf{x}_{\uparrow(a,b)}^{(\zeta-1)}\stackrel{{\scriptstyle t_{\zeta+1}}}{{\rightarrow}}\mathbf{x}_{(\overline{a,b})}^{(\zeta+1)}\stackrel{{\scriptstyle t_{\zeta+2}}}{{\rightarrow}}\cdots\stackrel{{\scriptstyle t_{r}}}{{\rightarrow}}\mathbf{x}_{(\overline{a,b})}^{(r)}\stackrel{{\scriptstyle t}}{{\rightarrow}}\mathbf{x}_{(\overline{a,b})}^{(r)}\Big{\}}.

It now follows from Lemma 3 that

Pr(Fδ)Pr(Fδ(a,b¯))=eqr(ttr)=ζrq1,(eq1(tt1)δ+o(δ))eq¯r(ttr)=ζrq¯1,(eq¯1(tt1)δ+o(δ)),\displaystyle\frac{\text{Pr}(F^{\delta})}{\text{Pr}(F^{\delta}_{(\overline{a,b})})}=\frac{e^{-q_{r}(t-t_{r})}\prod_{\ell=\zeta}^{r}q_{\ell-1,\ell}(e^{-q_{\ell-1}(t_{\ell}-t_{\ell-1})}\delta+o(\delta))}{e^{-\bar{q}_{r}(t-t_{r})}\prod_{\ell=\zeta}^{r}\bar{q}_{\ell-1,\ell}(e^{-\bar{q}_{\ell-1}(t_{\ell}-t_{\ell-1})}\delta+o(\delta))}, (136)

where q1,:=𝐐(𝐱(1),𝐱())q_{\ell-1,\ell}:=\mathbf{Q}(\mathbf{x}^{(\ell-1)},\mathbf{x}^{(\ell)}) and q¯1,:=𝐐(𝐱(1)(a,b¯),𝐱()(a,b¯))\bar{q}_{\ell-1,\ell}:=\mathbf{Q}(\mathbf{x}^{(\ell-1)}_{(\overline{a,b})},\mathbf{x}^{(\ell)}_{(\overline{a,b})}) for all {ζ+1,,r}\ell\in\{\zeta+1,\ldots,r\}, q:=𝐐(𝐱(),𝐱())q_{\ell}:=-\mathbf{Q}(\mathbf{x}^{(\ell)},\mathbf{x}^{(\ell)}) and q¯:=𝐐(𝐱()(a,b¯),𝐱()(a,b¯))\bar{q}_{\ell}:=-\mathbf{Q}(\mathbf{x}^{(\ell)}_{(\overline{a,b})},\mathbf{x}^{(\ell)}_{(\overline{a,b})}) for all {ζ,,r}\ell\in\{\zeta,\ldots,r\}, qζ1,ζ:=𝐐(𝐱(ζ1),𝐱(ζ1)(a,b))=λ(1ρijn)q_{\zeta-1,\zeta}:=\mathbf{Q}(\mathbf{x}^{(\zeta-1)},\mathbf{x}^{(\zeta-1)}_{\downarrow(a,b)})=\lambda\left(1-\frac{\rho_{ij}}{n}\right), q¯ζ1,ζ:=𝐐(𝐱(ζ1),𝐱(ζ1)(a,b))=λρijn\bar{q}_{\zeta-1,\zeta}:=\mathbf{Q}(\mathbf{x}^{(\zeta-1)},\mathbf{x}^{(\zeta-1)}_{\uparrow(a,b)})=\lambda\frac{\rho_{ij}}{n}, and q¯ζ1=qζ1:=𝐐(𝐱(ζ1),𝐱(ζ1))\bar{q}_{\zeta-1}=q_{\zeta-1}:=-\mathbf{Q}(\mathbf{x}^{(\zeta-1)},\mathbf{x}^{(\zeta-1)}).

The above definitions imply that qζ1,ζq¯ζ1,ζ=1ρij/nρij/n\frac{q_{\zeta-1,\zeta}}{\bar{q}_{\zeta-1,\zeta}}=\frac{1-\rho_{ij}/n}{\rho_{ij}/n}. To evaluate q1,q¯1,\frac{q_{\ell-1,\ell}}{\bar{q}_{\ell-1,\ell}} for {ζ+1,,r}\ell\in\{\zeta+1,\ldots,r\}, observe that by the definition of Λ(a,b)(F)\Lambda_{(a,b)}(F), we have 𝐱(){𝐱(1)(a,b),𝐱(1)(a,b)}\mathbf{x}^{(\ell)}\notin\{\mathbf{x}^{(\ell-1)}_{\uparrow(a,b)},\mathbf{x}^{(\ell-1)}_{\downarrow(a,b)}\} for >ζ=Λ(a,b)(F)\ell>\zeta=\Lambda_{(a,b)}(F). Moreover, the facts F{𝒮(t)=𝒮0}F\subset\{\mathcal{S}(t)=\mathcal{S}_{0}\} and a𝒮0a\in\mathcal{S}_{0} together imply that 𝐱()𝐱(1)a\mathbf{x}^{(\ell)}\neq\mathbf{x}^{(\ell-1)}_{\uparrow a} for all [r]\ell\in[r]. Hence, 𝐱(){𝐱(1)(a,b),𝐱(1)(a,b),𝐱(1)a}\mathbf{x}^{(\ell)}\notin\{\mathbf{x}^{(\ell-1)}_{\uparrow(a,b)},\mathbf{x}^{(\ell-1)}_{\downarrow(a,b)},\mathbf{x}^{(\ell-1)}_{\uparrow a}\} for all >ζ\ell>\zeta. It now follows from the definition of 𝐐\mathbf{Q} (Section 2) that 𝐐(𝐱(1),𝐱())=𝐐(𝐱(1)(a,b¯),𝐱()(a,b¯))\mathbf{Q}(\mathbf{x}^{(\ell-1)},\mathbf{x}^{(\ell)})=\mathbf{Q}(\mathbf{x}^{(\ell-1)}_{(\overline{a,b})},\mathbf{x}^{(\ell)}_{(\overline{a,b})}) for all {ζ+1,,r}\ell\in\{\zeta+1,\ldots,r\}. Thus, =ζrq1,=ζrq¯1,=1ρij/nρij/n\frac{\prod_{\ell=\zeta}^{r}q_{\ell-1,\ell}}{\prod_{\ell=\zeta}^{r}\bar{q}_{\ell-1,\ell}}=\frac{1-\rho_{ij}/n}{\rho_{ij}/n}.

To relate q¯\bar{q}_{\ell} to qq_{\ell}, note that F{(a,b)E(t)}F\subset\{(a,b)\notin E(t)\} implies that 𝐱()a,b=0\mathbf{x}^{(\ell)}_{\langle a,b\rangle}=0 and hence also that (𝐱()(a,b¯))a,b=1\left(\mathbf{x}^{(\ell)}_{(\overline{a,b})}\right)_{\langle a,b\rangle}=1 for ζ\ell\geq\zeta. Since 𝐱()u=(𝐱()(a,b¯))u\mathbf{x}^{(\ell)}_{u}=\left(\mathbf{x}^{(\ell)}_{(\overline{a,b})}\right)_{u} for all ua,bu\neq\langle a,b\rangle, we have

𝐐(𝐱()(a,b¯),(𝐱()(a,b¯))a)\displaystyle\mathbf{Q}\left(\mathbf{x}^{(\ell)}_{(\overline{a,b})},\left(\mathbf{x}^{(\ell)}_{(\overline{a,b})}\right)_{\uparrow a}\right) =k=1mBikEk(a)(𝐱()(a,b¯))\displaystyle=\sum_{k=1}^{m}B_{ik}E_{k}^{(a)}\left(\mathbf{x}^{(\ell)}_{(\overline{a,b})}\right)
=k=1mck(𝐱()(a,b¯))Bik1(a,c)(𝐱()(a,b¯))\displaystyle=\sum_{k=1}^{m}\sum_{c\in\mathcal{I}_{k}\left(\mathbf{x}^{(\ell)}_{(\overline{a,b})}\right)}B_{ik}1_{(a,c)}\left(\mathbf{x}^{(\ell)}_{(\overline{a,b})}\right)
=k=1mck(𝐱())Bik1(a,c)(𝐱())+Bij1(a,b)(𝐱()(a,b¯))\displaystyle=\sum_{k=1}^{m}\sum_{c\in\mathcal{I}_{k}\left(\mathbf{x}^{(\ell)}\right)}B_{ik}1_{(a,c)}\left(\mathbf{x}^{(\ell)}\right)+B_{ij}1_{(a,b)}\left(\mathbf{x}^{(\ell)}_{(\overline{a,b})}\right)
=𝐐(𝐱(),𝐱()a)+Bij\displaystyle=\mathbf{Q}\left(\mathbf{x}^{(\ell)},\mathbf{x}^{(\ell)}_{\uparrow a}\right)+B_{ij}

for all {ζ,,r}\ell\in\{\zeta,\;\ldots,r\} such that bj(𝐱())b\in\mathcal{I}_{j}(\mathbf{x}^{(\ell)}), and

𝐐(𝐱()(a,b¯),(𝐱()(a,b¯))a)=𝐐(𝐱(),𝐱()a)\displaystyle\mathbf{Q}\left(\mathbf{x}^{(\ell)}_{(\overline{a,b})},\left(\mathbf{x}^{(\ell)}_{(\overline{a,b})}\right)_{\uparrow a}\right)=\mathbf{Q}\left(\mathbf{x}^{(\ell)},\mathbf{x}^{(\ell)}_{\uparrow a}\right)

for all {ζ,,r}\ell\in\{\zeta,\ldots,r\} such that bj(𝐱())b\notin\mathcal{I}_{j}(\mathbf{x}^{(\ell)}). As a result,

𝐐(𝐱()(a,b¯),(𝐱()(a,b¯))a)={𝐐(𝐱(),𝐱()a)if ζ<ξ𝐐(𝐱(),𝐱()a)+Bijif max{ζ,ξ}r.\displaystyle\mathbf{Q}\left(\mathbf{x}^{(\ell)}_{(\overline{a,b})},\left(\mathbf{x}^{(\ell)}_{(\overline{a,b})}\right)_{\uparrow a}\right)=\begin{cases}\mathbf{Q}\left(\mathbf{x}^{(\ell)},\mathbf{x}^{(\ell)}_{\uparrow a}\right)\quad&\text{if }\zeta\leq\ell<\xi\\ \mathbf{Q}\left(\mathbf{x}^{(\ell)},\mathbf{x}^{(\ell)}_{\uparrow a}\right)+B_{ij}\quad&\text{if }\max\{\zeta,\xi\}\leq\ell\leq r.\end{cases} (137)

Moreover, using the definition of $\mathbf{Q}$, one can easily verify that regardless of whether the network is in state $\mathbf{x}^{(\ell)}$ or in state $\mathbf{x}^{(\ell)}_{(\overline{a,b})}$, the rates of infection of nodes in $\mathcal{S}(\mathbf{x}^{(\ell)})\setminus\{a\}$, the recovery rates of nodes in $\mathcal{I}(\mathbf{x}^{(\ell)})$, and the edge update rate (which is $\lambda$) are the same. Given that $\mathbf{Q}(\mathbf{x},\mathbf{x})=-\sum_{\mathbf{y}\in\mathbb{S}\setminus\{\mathbf{x}\}}\mathbf{Q}(\mathbf{x},\mathbf{y})$ for all $\mathbf{x}\in\mathbb{S}$, it now follows from (137) that

q¯q={0if ζ<ξBijif max{ζ,ξ}r.\displaystyle\bar{q}_{\ell}-q_{\ell}=\begin{cases}0\quad&\text{if }\zeta\leq\ell<\xi\\ B_{ij}\quad&\text{if }\max\{\zeta,\xi\}\leq\ell\leq r.\end{cases} (138)

Combining the above observations with (136) yields

Pr(Fδ)Pr(Fδ(a,b¯))\displaystyle\frac{\text{Pr}(F^{\delta})}{\text{Pr}(F^{\delta}_{(\overline{a,b})})} =(1ρij/nρij/n)e(q¯rqr)(ttr)=ζ1r1(e(q¯q)(t+1t)δ+o(δ))\displaystyle=\left(\frac{1-{\rho_{ij}}/{n}}{{\rho_{ij}}/{n}}\right)e^{(\bar{q}_{r}-q_{r})(t-t_{r})}\prod_{\ell=\zeta-1}^{r-1}\left(e^{(\bar{q}_{\ell}-q_{\ell})(t_{\ell+1}-t_{\ell})}\delta+o(\delta)\right)
=(a)(1ρij/nρij/n)=ζ1r(e(q¯q)(t+1t)δ+o(δ))\displaystyle\stackrel{{\scriptstyle(a)}}{{=}}\left(\frac{1-{\rho_{ij}}/{n}}{{\rho_{ij}}/{n}}\right)\prod_{\ell=\zeta-1}^{r}\left(e^{(\bar{q}_{\ell}-q_{\ell})(t_{\ell+1}-t_{\ell})}\delta+o(\delta)\right)
=(1ρij/nρij/n)=ζ1max{ζ,ξ}1(1+o(δ))=max{ζ,ξ}r(eBij(t+1t)δ+o(δ))\displaystyle=\left(\frac{1-{\rho_{ij}}/{n}}{{\rho_{ij}}/{n}}\right)\prod_{\ell=\zeta-1}^{\max\{\zeta,\xi\}-1}(1+o(\delta))\prod_{\ell=\max\{\zeta,\xi\}}^{r}\left(e^{B_{ij}(t_{\ell+1}-t_{\ell})}\delta+o(\delta)\right)
=(1ρij/nρij/n)eBij(ttmax{ζ,ξ})+o(δ),\displaystyle=\left(\frac{1-{\rho_{ij}}/{n}}{{\rho_{ij}}/{n}}\right)e^{B_{ij}\left(t-t_{\max\{\zeta,\xi\}}\right)}+o(\delta),

which, since $t-t_{\max\{\zeta,\xi\}}=\min\{t-t_{\zeta},\,t-t_{\xi}\}=\min\{\tau,\kappa\}$, means that

Pr(Fδ)Pr(Fδ(a,b¯))=(1ρij/nρij/n)eBijmin{τ,κ}+o(δ).\displaystyle\frac{\text{Pr}(F^{\delta})}{\text{Pr}(F^{\delta}_{(\overline{a,b})})}=\left(\frac{1-{\rho_{ij}}/{n}}{{\rho_{ij}}/{n}}\right)e^{B_{ij}\min\{\tau,\kappa\}}+o(\delta). (139)

We now use (131) and (134) along with (139) to show that

Pr((a,b)E(t)Fδ?(a,b))\displaystyle\text{Pr}((a,b)\in E(t)\mid F^{\delta}_{?(a,b)}) =(1+(1ρij/nρij/n)eBijmin{τ,κ}+o(δ))1\displaystyle=\left(1+\left(\frac{1-\rho_{ij}/n}{\rho_{ij}/n}\right)e^{B_{ij}\min\{\tau,\kappa\}}+o(\delta)\right)^{-1}
=ρijn1ρijn1+(1ρijn)eBijmin{τ,κ}+o(δ).\displaystyle=\frac{\rho_{ij}}{n}\cdot\frac{1}{\frac{\rho_{ij}}{n}\cdot 1+\left(1-\frac{\rho_{ij}}{n}\right)e^{B_{ij}\min\{\tau,\kappa\}}}+o(\delta).

In the limit as δ0\delta\rightarrow 0, this yields

Pr((a,b)E(t)F?(a,b))\displaystyle\text{Pr}((a,b)\in E(t)\mid F_{?(a,b)}) =ρijn1ρijn1+(1ρijn)eBijmin{τ,κ}\displaystyle=\frac{\rho_{ij}}{n}\cdot\frac{1}{\frac{\rho_{ij}}{n}\cdot 1+\left(1-\frac{\rho_{ij}}{n}\right)e^{B_{ij}\min\{\tau,\kappa\}}} (140)

Note that (140) holds for all $F\in\mathcal{F}$ satisfying $t-t_{\Lambda_{(a,b)}(F)}=\tau$ and $t-t_{\Gamma_{b}(F)}=\kappa$. We now recall that $T$ is the difference between $t$ and the time at which $1_{(a,b)}$ is updated for the last time during $[0,t]$, so that we have $T=t-t_{\Lambda_{(a,b)}(F)}$ whenever $F_{?(a,b)}$ occurs. Likewise, $K=t-t_{\Gamma_{b}(F)}$ on $F_{?(a,b)}$. Therefore, (140) holds for all $F\in\mathcal{F}$ satisfying $F\subset\{T=\tau\}\cap\{K=\kappa\}$. As a result, we have

Pr((a,b)E(t)FF?(a,b),T=τ,K=κ)=ρijn1ρijn1+(1ρijn)eBijmin{τ,κ}.\text{Pr}((a,b)\in E(t)\mid\cup_{F\in\mathcal{F}}F_{?(a,b)},T=\tau,K=\kappa)=\frac{\rho_{ij}}{n}\cdot\frac{1}{\frac{\rho_{ij}}{n}\cdot 1+\left(1-\frac{\rho_{ij}}{n}\right)e^{B_{ij}\min\{\tau,\kappa\}}}.

Since FF?(a,b)={(𝒮(t),(t))=(𝒮0,0)}\cup_{F\in\mathcal{F}}F_{?(a,b)}=\{(\mathcal{S}(t),\mathcal{I}(t))=(\mathcal{S}_{0},\mathcal{I}_{0})\} as argued earlier, it follows that

Pr((a,b)E(t)(𝒮(t),(t))=(𝒮0,0),T=τ,K=κ)=ρijn1ρijn1+(1ρijn)eBijmin{τ,κ}.\displaystyle\text{Pr}((a,b)\in E(t)\mid(\mathcal{S}(t),\mathcal{I}(t))=(\mathcal{S}_{0},\mathcal{I}_{0}),T=\tau,K=\kappa)=\frac{\rho_{ij}}{n}\cdot\frac{1}{\frac{\rho_{ij}}{n}\cdot 1+\left(1-\frac{\rho_{ij}}{n}\right)e^{B_{ij}\min\{\tau,\kappa\}}}. (141)

Observe that 0min{κ,τ}τ0\leq\min\{\kappa,\tau\}\leq\tau, which means that

ρijn\displaystyle\frac{\rho_{ij}}{n} Pr((a,b)E(t)(𝒮(t),(t))=(𝒮0,0),T=τ,K=κ)\displaystyle\geq\text{Pr}((a,b)\in E(t)\mid(\mathcal{S}(t),\mathcal{I}(t))=(\mathcal{S}_{0},\mathcal{I}_{0}),T=\tau,K=\kappa) (142)
ρijn1ρijn1+(1ρijn)eBijτ\displaystyle\geq\frac{\rho_{ij}}{n}\cdot\frac{1}{\frac{\rho_{ij}}{n}\cdot 1+\left(1-\frac{\rho_{ij}}{n}\right)e^{B_{ij}\tau}} (143)
ρijneBijτ.\displaystyle\geq\frac{\rho_{ij}}{n}e^{-B_{ij}\tau}. (144)

Furthermore, since the above bounds do not depend on κ\kappa, we can remove the conditioning on KK to obtain

ρijn\displaystyle\frac{\rho_{ij}}{n} Pr((a,b)E(t)(𝒮(t),(t))=(𝒮0,0),T=τ)ρijneBijτ.\displaystyle\geq\text{Pr}((a,b)\in E(t)\mid(\mathcal{S}(t),\mathcal{I}(t))=(\mathcal{S}_{0},\mathcal{I}_{0}),T=\tau)\geq\frac{\rho_{ij}}{n}e^{-B_{ij}\tau}. (145)

Consequently,

ρijn\displaystyle\frac{\rho_{ij}}{n} Pr((a,b)E(t)(𝒮(t),(t))=(𝒮0,0))\displaystyle\geq\text{Pr}((a,b)\in E(t)\mid(\mathcal{S}(t),\mathcal{I}(t))=(\mathcal{S}_{0},\mathcal{I}_{0}))
ρijn0teBijτfT(𝒮(t),(t))=(𝒮0,0)(τ)dτ\displaystyle\geq\frac{\rho_{ij}}{n}\int_{0}^{t}e^{-B_{ij}\tau}f_{T\mid(\mathcal{S}(t),\mathcal{I}(t))=(\mathcal{S}_{0},\mathcal{I}_{0})}(\tau)d\tau
ρijn(1Bijλ(1eλt)),\displaystyle\geq\frac{\rho_{ij}}{n}\left(1-\frac{B_{ij}}{\lambda}(1-e^{-\lambda t})\right),

where the last inequality follows from Lemmas 11 and 12. Since $(\mathcal{S}_{0},\mathcal{I}_{0})$ is an arbitrary realization of $(\mathcal{S}(t),\mathcal{I}(t))$, the assertion of the proposition follows.
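As a numerical sanity check, the following Python sketch evaluates the closed-form conditional edge probability on the right-hand side of (141) and compares it with the bounds (142) to (144). The function names and the parameter grid are illustrative assumptions and are not part of the original model; `rho_over_n` stands for $\rho_{ij}/n$ and `B` for $B_{ij}$.

```python
import math

def edge_prob(rho_over_n, B, tau, kappa):
    """Right-hand side of (141): conditional probability that edge (a,b) exists."""
    p = rho_over_n
    return p / (p + (1.0 - p) * math.exp(B * min(tau, kappa)))

def check_bounds(rho_over_n, B, tau, kappa, eps=1e-12):
    """Return True if rho/n >= (141) >= (143) >= (144) holds numerically."""
    p = rho_over_n
    exact = edge_prob(p, B, tau, kappa)
    middle = p / (p + (1.0 - p) * math.exp(B * tau))  # expression in (143)
    lower = p * math.exp(-B * tau)                    # expression in (144)
    return (p + eps >= exact) and (exact + eps >= middle) and (middle + eps >= lower)

# Hypothetical parameter grid used only for illustration.
all_ok = all(
    check_bounds(p, B, tau, kappa)
    for p in (0.001, 0.1, 0.5, 0.9)
    for B in (0.1, 1.0, 5.0)
    for tau in (0.0, 0.5, 3.0)
    for kappa in (0.0, 1.0, 10.0)
)
print(all_ok)
```

When $\min\{\tau,\kappa\}=0$ the expression reduces to $\rho_{ij}/n$, which is the upper bound in (142); the lower bounds become loose as $B_{ij}\tau$ grows.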

Remark 4.

The proof of Proposition 2 enables us to make a stronger statement about the conditional probability of bb being in contact with aa at time tt. Indeed, consider (141) and observe that it holds for all realizations (𝒮0,0)(\mathcal{S}_{0},\mathcal{I}_{0}) of (𝒮(t),(t))(\mathcal{S}(t),\mathcal{I}(t)) that satisfy a𝒮0𝒜ia\in\mathcal{S}_{0}\cap\mathcal{A}_{i} and b0𝒜jb\in\mathcal{I}_{0}\cap\mathcal{A}_{j}. It follows that

Pr((a,b)E(t)𝒮(t),(t),T=τ,K=κ)=ρijn1ρijn1+(1ρijn)eBijmin{τ,κ}\text{Pr}((a,b)\in E(t)\mid\mathcal{S}(t),\mathcal{I}(t),T=\tau,K=\kappa)=\frac{\rho_{ij}}{n}\cdot\frac{1}{\frac{\rho_{ij}}{n}\cdot 1+\left(1-\frac{\rho_{ij}}{n}\right)e^{B_{ij}\min\{\tau,\kappa\}}}

holds for all node pairs (a,b)𝒮i(t)×j(t)(a,b)\in\mathcal{S}_{i}(t)\times\mathcal{I}_{j}(t). Equivalently, the following holds for all (a,b)𝒮i(t)×j(t)(a,b)\in\mathcal{S}_{i}(t)\times\mathcal{I}_{j}(t):

Pr((a,b)E(t)𝒮(t),(t),T,K)=ρijn1ρijn1+(1ρijn)eBijmin{T,K}.\text{Pr}((a,b)\in E(t)\mid\mathcal{S}(t),\mathcal{I}(t),T,K)=\frac{\rho_{ij}}{n}\cdot\frac{1}{\frac{\rho_{ij}}{n}\cdot 1+\left(1-\frac{\rho_{ij}}{n}\right)e^{B_{ij}\min\{T,K\}}}.

Lemma 10.

Let $\tau_{1},\tau_{2}\in[0,t]$ satisfy $\tau_{1}\leq\tau_{2}$. Then

1Pr(b\centernot0,ta|b(t),K=κ,T=τ2,(a,b)E(t))Pr(b\centernot0,ta|b(t),K=κ,T=τ1,(a,b)E(t))eBij(τ2τ1).1\leq\frac{\text{Pr}\bigg{(}b\stackrel{{\scriptstyle 0,t}}{{\centernot\rightsquigarrow}}a\,\Big{\lvert}\,b\in\mathcal{I}(t^{-}),K=\kappa,T=\tau_{2},(a,b)\notin E(t)\bigg{)}}{\text{Pr}\bigg{(}b\stackrel{{\scriptstyle 0,t}}{{\centernot\rightsquigarrow}}a\,\Big{\lvert}\,b\in\mathcal{I}(t^{-}),K=\kappa,T=\tau_{1},(a,b)\notin E(t)\bigg{)}}\leq e^{B_{ij}(\tau_{2}-\tau_{1})}.

Proof.

Consider Ton(τ2):=(tκtτ21(a,b)(τ)dτ)+T_{\text{on}}^{(\tau_{2})}:=\left(\int_{t-\kappa}^{t-\tau_{2}}1_{(a,b)}(\tau^{\prime})d\tau^{\prime}\right)_{+}, which denotes the duration of time for which bb is in contact with aa during the interval [tκ,tτ2)[t-\kappa,t-\tau_{2}). Then, for any σ[0,(κτ2)+]\sigma\in[0,(\kappa-\tau_{2})_{+}], we have

Pr(b\centernot0,ta|Ton(τ2)=σ,b(t),K=κ,T=τ1,(a,b)E(t))\displaystyle\text{Pr}\bigg{(}b\stackrel{{\scriptstyle 0,t}}{{\centernot\rightsquigarrow}}a\,\Big{\lvert}\,T_{\text{on}}^{(\tau_{2})}=\sigma,b\in\mathcal{I}(t^{-}),K=\kappa,T=\tau_{1},(a,b)\notin E(t)\bigg{)} (146)
=Pr(b\centernottκ,κτ1a|Ton(τ2)=σ,b(t),K=κ,T=τ1,(a,b)E(t))\displaystyle=\text{Pr}\bigg{(}b\stackrel{{\scriptstyle t-\kappa,\kappa-\tau_{1}}}{{\centernot\rightsquigarrow}}a\,\Big{\lvert}\,T_{\text{on}}^{(\tau_{2})}=\sigma,b\in\mathcal{I}(t^{-}),K=\kappa,T=\tau_{1},(a,b)\notin E(t)\bigg{)} (147)
=Pr(b\centernottκ,κτ2a|Ton(τ2)=σ,b(t),K=κ,T=τ1,(a,b)E(t))\displaystyle=\text{Pr}\bigg{(}b\stackrel{{\scriptstyle t-\kappa,\kappa-\tau_{2}}}{{\centernot\rightsquigarrow}}a\,\Big{\lvert}\,T_{\text{on}}^{(\tau_{2})}=\sigma,b\in\mathcal{I}(t^{-}),K=\kappa,T=\tau_{1},(a,b)\notin E(t)\bigg{)} (148)
×Pr(b\centernottτ2,τ2τ1a|Ton(τ2)=σ,b(t),K=κ,T=τ1,(a,b)E(t))\displaystyle\quad\times\text{Pr}\bigg{(}b\stackrel{{\scriptstyle t-\tau_{2},\tau_{2}-\tau_{1}}}{{\centernot\rightsquigarrow}}a\,\Big{\lvert}\,T_{\text{on}}^{(\tau_{2})}=\sigma,b\in\mathcal{I}(t^{-}),K=\kappa,T=\tau_{1},(a,b)\notin E(t)\bigg{)} (149)
(a)Pr(b\centernottκ,κτ2a|Ton(τ2)=σ,b(t),K=κ,T=τ1,(a,b)E(t))eBij(τ2τ1)\displaystyle\stackrel{{\scriptstyle(a)}}{{\geq}}\text{Pr}\bigg{(}b\stackrel{{\scriptstyle t-\kappa,\kappa-\tau_{2}}}{{\centernot\rightsquigarrow}}a\,\Big{\lvert}\,T_{\text{on}}^{(\tau_{2})}=\sigma,b\in\mathcal{I}(t^{-}),K=\kappa,T=\tau_{1},(a,b)\notin E(t)\bigg{)}e^{-B_{ij}(\tau_{2}-\tau_{1})} (150)
(b)eBijσeBij(τ2τ1)\displaystyle\stackrel{{\scriptstyle(b)}}{{\geq}}e^{-B_{ij}\sigma}\cdot e^{-B_{ij}(\tau_{2}-\tau_{1})} (151)
=(c)Pr(b\centernottκ,κτ2a|Ton(τ2)=σ,b(t),K=κ,T=τ2,(a,b)E(t))eBij(τ2τ1)\displaystyle\stackrel{{\scriptstyle(c)}}{{=}}\text{Pr}\bigg{(}b\stackrel{{\scriptstyle t-\kappa,\kappa-\tau_{2}}}{{\centernot\rightsquigarrow}}a\,\Big{\lvert}\,T_{\text{on}}^{(\tau_{2})}=\sigma,b\in\mathcal{I}(t^{-}),K=\kappa,T=\tau_{2},(a,b)\notin E(t)\bigg{)}e^{-B_{ij}(\tau_{2}-\tau_{1})} (152)
=Pr(b\centernot0,ta|Ton(τ2)=σ,b(t),K=κ,T=τ2,(a,b)E(t))eBij(τ2τ1),\displaystyle=\text{Pr}\bigg{(}b\stackrel{{\scriptstyle 0,t}}{{\centernot\rightsquigarrow}}a\,\Big{\lvert}\,T_{\text{on}}^{(\tau_{2})}=\sigma,b\in\mathcal{I}(t^{-}),K=\kappa,T=\tau_{2},(a,b)\notin E(t)\bigg{)}e^{-B_{ij}(\tau_{2}-\tau_{1})}, (153)

where (a) follows from the fact that tτ2tτ11(a,b)(τ)dττ2τ1\int_{t-\tau_{2}}^{t-\tau_{1}}1_{(a,b)}(\tau^{\prime})d\tau^{\prime}\leq\tau_{2}-\tau_{1} and from the observation that

Pr(b\centernottτ2,τ2τ1a|Ton(τ2)=σ,b(t),K=κ,T=τ1,(a,b)E(t),tτ2tτ11(a,b)(τ)dτ)\displaystyle\text{Pr}\bigg{(}b\stackrel{{\scriptstyle t-\tau_{2},\tau_{2}-\tau_{1}}}{{\centernot\rightsquigarrow}}a\,\Big{\lvert}\,T_{\text{on}}^{(\tau_{2})}=\sigma,b\in\mathcal{I}(t^{-}),K=\kappa,T=\tau_{1},(a,b)\notin E(t),\int_{t-\tau_{2}}^{t-\tau_{1}}1_{(a,b)}(\tau^{\prime})d\tau^{\prime}\bigg{)}
=eBij(tτ2tτ11(a,b)(τ)dτ),\displaystyle=e^{-B_{ij}\left(\int_{t-\tau_{2}}^{t-\tau_{1}}1_{(a,b)}(\tau^{\prime})d\tau^{\prime}\right)},

the proof of which parallels the proof of Lemma 7, and (b)(b) and (c)(c) follow from the fact that

Pr(b\centernottκ,κτ2a|Ton(τ2)=σ,b(t),K=κ,T=τ,(a,b)E(t))=eBijσ\displaystyle\text{Pr}\bigg{(}b\stackrel{{\scriptstyle t-\kappa,\kappa-\tau_{2}}}{{\centernot\rightsquigarrow}}a\,\Big{\lvert}\,T_{\text{on}}^{(\tau_{2})}=\sigma,b\in\mathcal{I}(t^{-}),K=\kappa,T=\tau^{\prime},(a,b)\notin E(t)\bigg{)}=e^{-B_{ij}\sigma}

holds for all τ[0,t]\tau^{\prime}\in[0,t], the proof of which also parallels the proof of Lemma 7.

We now eliminate Ton(τ2)T_{\text{on}}^{(\tau_{2})} from (146). To do so, we first note the following: given that tTtτ2t-T\geq t-\tau_{2}, the random variable Ton(τ2)T_{\text{on}}^{(\tau_{2})} is by definition conditionally independent of tTt-T (the time of the last edge update of (a,b)(a,b) during [tτ2,t][t-\tau_{2},t]) because the edge update process for (a,b)(a,b) is a Poisson process and hence, for any collection of disjoint time intervals, the times at which 1(a,b)1_{(a,b)} is updated during the intervals are independent of each other. Since {T=τ2},{T=τ1}{tTtτ2}\{T=\tau_{2}\},\{T=\tau_{1}\}\subset\{t-T\geq t-\tau_{2}\} and since {tTtτ2}={Tτ2}\{t-T\geq t-\tau_{2}\}=\{T\leq\tau_{2}\}, it follows that

Pr(b\centernot0,ta|b(t),K=κ,T=τ1,(a,b)E(t))\displaystyle\text{Pr}\bigg{(}b\stackrel{{\scriptstyle 0,t}}{{\centernot\rightsquigarrow}}a\,\Big{\lvert}\,b\in\mathcal{I}(t^{-}),K=\kappa,T=\tau_{1},(a,b)\notin E(t)\bigg{)} (154)
=0(κτ2)+Pr(b\centernot0,ta|Ton(τ2)=σ,b(t),K=κ,T=τ1,(a,b)E(t))\displaystyle=\int_{0}^{(\kappa-\tau_{2})_{+}}\text{Pr}\bigg{(}b\stackrel{{\scriptstyle 0,t}}{{\centernot\rightsquigarrow}}a\,\Big{\lvert}\,T_{\text{on}}^{(\tau_{2})}=\sigma,b\in\mathcal{I}(t^{-}),K=\kappa,T=\tau_{1},(a,b)\notin E(t)\bigg{)} (155)
fTon(τ2)b(t),K=κ,T=τ1,(a,b)E(t)(σ)dσ\displaystyle\quad\quad\quad\quad\quad\quad\cdot f_{T_{\text{on}}^{(\tau_{2})}\mid b\in\mathcal{I}(t^{-}),K=\kappa,T=\tau_{1},(a,b)\notin E(t)}(\sigma)d\sigma (156)
=0(κτ2)+Pr(b\centernot0,ta|Ton(τ2)=σ,b(t),K=κ,T=τ1,(a,b)E(t))\displaystyle=\int_{0}^{(\kappa-\tau_{2})_{+}}\text{Pr}\bigg{(}b\stackrel{{\scriptstyle 0,t}}{{\centernot\rightsquigarrow}}a\,\Big{\lvert}\,T_{\text{on}}^{(\tau_{2})}=\sigma,b\in\mathcal{I}(t^{-}),K=\kappa,T=\tau_{1},(a,b)\notin E(t)\bigg{)} (157)
fTon(τ2)b(t),K=κ,Tτ2,(a,b)E(t)(σ)dσ,\displaystyle\quad\quad\quad\quad\quad\quad\cdot f_{T_{\text{on}}^{(\tau_{2})}\mid b\in\mathcal{I}(t^{-}),K=\kappa,T\leq\tau_{2},(a,b)\notin E(t)}(\sigma)d\sigma, (158)

and likewise,

Pr(b\centernot0,ta|b(t),K=κ,T=τ2,(a,b)E(t))\displaystyle\text{Pr}\bigg{(}b\stackrel{{\scriptstyle 0,t}}{{\centernot\rightsquigarrow}}a\,\Big{\lvert}\,b\in\mathcal{I}(t^{-}),K=\kappa,T=\tau_{2},(a,b)\notin E(t)\bigg{)}
=0(κτ2)+Pr(b\centernot0,ta|Ton(τ2)=σ,b(t),K=κ,T=τ2,(a,b)E(t))\displaystyle=\int_{0}^{(\kappa-\tau_{2})_{+}}\text{Pr}\bigg{(}b\stackrel{{\scriptstyle 0,t}}{{\centernot\rightsquigarrow}}a\,\Big{\lvert}\,T_{\text{on}}^{(\tau_{2})}=\sigma,b\in\mathcal{I}(t^{-}),K=\kappa,T=\tau_{2},(a,b)\notin E(t)\bigg{)}
fTon(τ2)b(t),K=κ,Tτ2,(a,b)E(t)(σ)dσ.\displaystyle\quad\quad\quad\quad\quad\quad\cdot f_{T_{\text{on}}^{(\tau_{2})}\mid b\in\mathcal{I}(t^{-}),K=\kappa,T\leq\tau_{2},(a,b)\notin E(t)}(\sigma)d\sigma.

Therefore, taking conditional expectations on both sides of (146) (w.r.t. the PDF fTon(τ2)b(t),K=κ,Tτ2,(a,b)E(t)f_{T_{\text{on}}^{(\tau_{2})}\mid b\in\mathcal{I}(t^{-}),K=\kappa,T\leq\tau_{2},(a,b)\notin E(t)}) yields

Pr(b\centernot0,ta|b(t),K=κ,T=τ1,(a,b)E(t))\displaystyle\text{Pr}\bigg{(}b\stackrel{{\scriptstyle 0,t}}{{\centernot\rightsquigarrow}}a\,\Big{\lvert}\,b\in\mathcal{I}(t^{-}),K=\kappa,T=\tau_{1},(a,b)\notin E(t)\bigg{)}
Pr(b\centernot0,ta|b(t),K=κ,T=τ2,(a,b)E(t))eBij(τ2τ1),\displaystyle\geq\text{Pr}\bigg{(}b\stackrel{{\scriptstyle 0,t}}{{\centernot\rightsquigarrow}}a\,\Big{\lvert}\,b\in\mathcal{I}(t^{-}),K=\kappa,T=\tau_{2},(a,b)\notin E(t)\bigg{)}e^{-B_{ij}(\tau_{2}-\tau_{1})},

which proves the upper bound. For the lower bound, we again proceed as in (146), but reverse the inequality signs:

Pr(b\centernot0,ta|Ton(τ2)=σ,b(t),K=κ,T=τ1,(a,b)E(t))\displaystyle\text{Pr}\bigg{(}b\stackrel{{\scriptstyle 0,t}}{{\centernot\rightsquigarrow}}a\,\Big{\lvert}\,T_{\text{on}}^{(\tau_{2})}=\sigma,b\in\mathcal{I}(t^{-}),K=\kappa,T=\tau_{1},(a,b)\notin E(t)\bigg{)}
=Pr(b\centernottκ,κτ2a|Ton(τ2)=σ,b(t),K=κ,T=τ1,(a,b)E(t))\displaystyle=\text{Pr}\bigg{(}b\stackrel{{\scriptstyle t-\kappa,\kappa-\tau_{2}}}{{\centernot\rightsquigarrow}}a\,\Big{\lvert}\,T_{\text{on}}^{(\tau_{2})}=\sigma,b\in\mathcal{I}(t^{-}),K=\kappa,T=\tau_{1},(a,b)\notin E(t)\bigg{)}
×Pr(b\centernottτ2,τ2τ1a|Ton(τ2)=σ,b(t),K=κ,T=τ1,(a,b)E(t))\displaystyle\quad\times\text{Pr}\bigg{(}b\stackrel{{\scriptstyle t-\tau_{2},\tau_{2}-\tau_{1}}}{{\centernot\rightsquigarrow}}a\,\Big{\lvert}\,T_{\text{on}}^{(\tau_{2})}=\sigma,b\in\mathcal{I}(t^{-}),K=\kappa,T=\tau_{1},(a,b)\notin E(t)\bigg{)}
Pr(b\centernottκ,κτ2a|Ton(τ2)=σ,b(t),K=κ,T=τ1,(a,b)E(t))\displaystyle\stackrel{{\scriptstyle}}{{\leq}}\text{Pr}\bigg{(}b\stackrel{{\scriptstyle t-\kappa,\kappa-\tau_{2}}}{{\centernot\rightsquigarrow}}a\,\Big{\lvert}\,T_{\text{on}}^{(\tau_{2})}=\sigma,b\in\mathcal{I}(t^{-}),K=\kappa,T=\tau_{1},(a,b)\notin E(t)\bigg{)}
=eBijσ\displaystyle\stackrel{{\scriptstyle}}{{=}}e^{-B_{ij}\sigma}
=Pr(b\centernottκ,κτ2a|Ton(τ2)=σ,b(t),K=κ,T=τ2,(a,b)E(t))\displaystyle\stackrel{{\scriptstyle}}{{=}}\text{Pr}\bigg{(}b\stackrel{{\scriptstyle t-\kappa,\kappa-\tau_{2}}}{{\centernot\rightsquigarrow}}a\,\Big{\lvert}\,T_{\text{on}}^{(\tau_{2})}=\sigma,b\in\mathcal{I}(t^{-}),K=\kappa,T=\tau_{2},(a,b)\notin E(t)\bigg{)}
=Pr(b\centernot0,ta|Ton(τ2)=σ,b(t),K=κ,T=τ2,(a,b)E(t))\displaystyle=\text{Pr}\bigg{(}b\stackrel{{\scriptstyle 0,t}}{{\centernot\rightsquigarrow}}a\,\Big{\lvert}\,T_{\text{on}}^{(\tau_{2})}=\sigma,b\in\mathcal{I}(t^{-}),K=\kappa,T=\tau_{2},(a,b)\notin E(t)\bigg{)}

In light of (154) and the analogous identity for $T=\tau_{2}$ displayed after it, the above inequality implies the required lower bound.

Lemma 11.

Let TT denote the random time defined earlier. Then

0teBijτfT(𝒮(t),(t))=(𝒮0,0),(a,b)E(t)(τ)dτ1Bijλ(1eλt).\displaystyle\int_{0}^{t}e^{-B_{ij}\tau}f_{T\mid(\mathcal{S}(t),\mathcal{I}(t))=(\mathcal{S}_{0},\mathcal{I}_{0}),(a,b)\notin E(t)}(\tau)d\tau\geq 1-\frac{B_{ij}}{\lambda}(1-e^{-\lambda t}).

Proof.

We first use Bayes’ rule to note that for any κ[0,t]\kappa\in[0,t],

fTK=κ,(𝒮(t),(t))=(𝒮0,0),(a,b)E(t)(τ)\displaystyle f_{T\mid K=\kappa,(\mathcal{S}(t),\mathcal{I}(t))=(\mathcal{S}_{0},\mathcal{I}_{0}),(a,b)\notin E(t)}(\tau) (159)
=Pr((𝒮(t),(t))=(𝒮0,0)T=τ,K=κ,(a,b)E(t))fKT=τ,(a,b)E(t)(κ)fT(a,b)E(t)(τ)\displaystyle=\text{Pr}((\mathcal{S}(t),\mathcal{I}(t))=(\mathcal{S}_{0},\mathcal{I}_{0})\mid T=\tau,K=\kappa,(a,b)\notin E(t))\cdot f_{K\mid T=\tau,(a,b)\notin E(t)}(\kappa)\cdot f_{T\mid(a,b)\notin E(t)}(\tau) (160)
(0tPr((𝒮(t),(t))=(𝒮0,0)T=τ,K=κ,(a,b)E(t))fKT=τ,(a,b)E(t)(κ)fT(a,b)E(t)(τ)dτ)1.\displaystyle\quad\cdot\bigg{(}\int_{0}^{t}\text{Pr}((\mathcal{S}(t),\mathcal{I}(t))=(\mathcal{S}_{0},\mathcal{I}_{0})\mid T=\tau^{\prime},K=\kappa,(a,b)\notin E(t))\cdot f_{K\mid T=\tau^{\prime},(a,b)\notin E(t)}(\kappa)\cdot f_{T\mid(a,b)\notin E(t)}(\tau^{\prime})d\tau^{\prime}\bigg{)}^{-1}.

We consider each multiplicand one by one. First, we use Lemmas 2 and 5 to note that fKT=τ,(a,b)E(t)(κ)fT(a,b)E(t)(τ)=fK(κ)fT(τ)f_{K\mid T=\tau,(a,b)\notin E(t)}(\kappa)\cdot f_{T\mid(a,b)\notin E(t)}(\tau)=f_{K}(\kappa)f_{T}(\tau). To deal effectively with the other multiplicands, we let Ton:=(tKtT1(a,b)(τ)dτ)+T_{\text{on}}:=\left(\int_{t-K}^{t-T}1_{(a,b)}(\tau^{\prime})d\tau^{\prime}\right)_{+} denote the total duration of time for which bb is in contact with aa during the time interval [tK,tT][t-K,t-T], and we observe that

Pr((𝒮(t),(t))=(𝒮0,0)T=τ,K=κ,(a,b)E(t))\displaystyle\text{Pr}((\mathcal{S}(t),\mathcal{I}(t))=(\mathcal{S}_{0},\mathcal{I}_{0})\mid T=\tau,K=\kappa,(a,b)\notin E(t))
=0(κτ)+(Pr((𝒮(t),(t))=(𝒮0,0)T=τ,K=κ,Ton=τon,(a,b)E(t))\displaystyle=\int_{0}^{(\kappa-\tau)_{+}}\Big{(}\text{Pr}((\mathcal{S}(t),\mathcal{I}(t))=(\mathcal{S}_{0},\mathcal{I}_{0})\mid T=\tau,K=\kappa,T_{\text{on}}=\tau_{\text{on}},(a,b)\notin E(t))
fTonK=κ,T=τ,(a,b)E(t)(τon))dτon\displaystyle\quad\quad\quad\quad\quad\quad\cdot f_{T_{\text{on}}\mid K=\kappa,T=\tau,(a,b)\notin E(t)}(\tau_{\text{on}})\Big{)}d\tau_{\text{on}}
=0(κτ)+(Pr((𝒮(t),(t))=(𝒮0,0)T=τ,K=κ,Ton=τon,(a,b)E(t),b\centernot0,ta,b(t))\displaystyle=\int_{0}^{(\kappa-\tau)_{+}}\Big{(}\text{Pr}((\mathcal{S}(t),\mathcal{I}(t))=(\mathcal{S}_{0},\mathcal{I}_{0})\mid T=\tau,K=\kappa,T_{\text{on}}=\tau_{\text{on}},(a,b)\notin E(t),b\stackrel{{\scriptstyle 0,t}}{{\centernot\rightsquigarrow}}a,b\in\mathcal{I}(t^{-}))
Pr(b\centernot0,tab(t),T=τ,K=κ,Ton=τon,(a,b)E(t))\displaystyle\quad\quad\quad\quad\quad\quad\cdot\text{Pr}(b\stackrel{{\scriptstyle 0,t}}{{\centernot\rightsquigarrow}}a\mid b\in\mathcal{I}(t^{-}),T=\tau,K=\kappa,T_{\text{on}}=\tau_{\text{on}},(a,b)\notin E(t))
Pr(b(t)T=τ,K=κ,Ton=τon,(a,b)E(t))\displaystyle\quad\quad\quad\quad\quad\quad\cdot\text{Pr}(b\in\mathcal{I}(t^{-})\mid T=\tau,K=\kappa,T_{\text{on}}=\tau_{\text{on}},(a,b)\notin E(t))
fTonK=κ,T=τ,(a,b)E(t)(τon))dτon\displaystyle\quad\quad\quad\quad\quad\quad\cdot f_{T_{\text{on}}\mid K=\kappa,T=\tau,(a,b)\notin E(t)}(\tau_{\text{on}})\Big{)}d\tau_{\text{on}}
=(a)0(κτ)+(Pr((𝒮(t),(t))=(𝒮0,0)K=κ,(a,b)E(t),b\centernot0,ta,b(t))\displaystyle\stackrel{{\scriptstyle(a)}}{{=}}\int_{0}^{(\kappa-\tau)_{+}}\Big{(}\text{Pr}((\mathcal{S}(t),\mathcal{I}(t))=(\mathcal{S}_{0},\mathcal{I}_{0})\mid K=\kappa,(a,b)\notin E(t),b\stackrel{{\scriptstyle 0,t}}{{\centernot\rightsquigarrow}}a,b\in\mathcal{I}(t^{-}))
eBijτoneγjκfTonK=κ,T=τ,(a,b)E(t)(τon))dτon,\displaystyle\quad\quad\quad\quad\quad\quad\cdot e^{-B_{ij}\tau_{\text{on}}}\cdot e^{-\gamma_{j}\kappa}\cdot f_{T_{\text{on}}\mid K=\kappa,T=\tau,(a,b)\notin E(t)}(\tau_{\text{on}})\Big{)}d\tau_{\text{on}},

where (a)(a) follows from Lemmas 7 and 9 and from the modelling assumption that bb recovers at rate γj\gamma_{j} independently of any edge state. On substituting the above expression into (159), we obtain

fTK=κ,(𝒮(t),(t))=(𝒮0,0),(a,b)E(t)(τ)=(0(κτ)+eBijτonψκ,τ(τon)dτon)fT(τ)0t(0(κτ)+eBijτonψκ,τ(τon)dτon)fT(τ)dτ,\displaystyle f_{T\mid K=\kappa,(\mathcal{S}(t),\mathcal{I}(t))=(\mathcal{S}_{0},\mathcal{I}_{0}),(a,b)\notin E(t)}(\tau)=\frac{\left(\int_{0}^{(\kappa-\tau)_{+}}e^{-B_{ij}\tau_{\text{on}}}\psi_{\kappa,\tau}(\tau_{\text{on}})d\tau_{\text{on}}\right)f_{T}(\tau)}{\int_{0}^{t}\left(\int_{0}^{(\kappa-\tau^{\prime})_{+}}e^{-B_{ij}\tau_{\text{on}}}\psi_{\kappa,\tau^{\prime}}(\tau_{\text{on}})d\tau_{\text{on}}\right)f_{T}(\tau^{\prime})d\tau^{\prime}}, (162)

where ψκ,τ():=fTonK=κ,T=τ,(a,b)E(t)()\psi_{\kappa,\tau}(\cdot):=f_{T_{\text{on}}\mid K=\kappa,T=\tau,(a,b)\notin E(t)}(\cdot). Now, Lemma 7 implies that

0(κτ)+eBijτonψκ,τ(τon)dτon\displaystyle\int_{0}^{(\kappa-\tau)_{+}}e^{-B_{ij}\tau_{\text{on}}}\psi_{\kappa,\tau}(\tau_{\text{on}})d\tau_{\text{on}}
=0(κτ)+Pr(b\centernot0,ta|Ton=τon,b(t),K=κ,T=τ,(a,b)E(t))fTonK=κ,T=τ,(a,b)E(t)(τon)dτon\displaystyle=\int_{0}^{(\kappa-\tau)_{+}}\text{Pr}\bigg{(}b\stackrel{{\scriptstyle 0,t}}{{\centernot\rightsquigarrow}}a\,\Big{\lvert}\,T_{\text{on}}=\tau_{\text{on}},b\in\mathcal{I}(t^{-}),K=\kappa,T=\tau,(a,b)\notin E(t)\bigg{)}\cdot f_{T_{\text{on}}\mid K=\kappa,T=\tau,(a,b)\notin E(t)}(\tau_{\text{on}})d\tau_{\text{on}}
=(a)0(κτ)+Pr(b\centernot0,ta|Ton=τon,b(t),K=κ,T=τ,(a,b)E(t))\displaystyle\stackrel{{\scriptstyle(a)}}{{=}}\int_{0}^{(\kappa-\tau)_{+}}\text{Pr}\bigg{(}b\stackrel{{\scriptstyle 0,t}}{{\centernot\rightsquigarrow}}a\,\Big{\lvert}\,T_{\text{on}}=\tau_{\text{on}},b\in\mathcal{I}(t^{-}),K=\kappa,T=\tau,(a,b)\notin E(t)\bigg{)}
fTonb(t),K=κ,T=τ,(a,b)E(t)(τon)dτon\displaystyle\quad\quad\quad\quad\quad\,\,\cdot f_{T_{\text{on}}\mid b\in\mathcal{I}(t^{-}),K=\kappa,T=\tau,(a,b)\notin E(t)}(\tau_{\text{on}})d\tau_{\text{on}}
=Pr(b\centernot0,ta|b(t),K=κ,T=τ,(a,b)E(t)),\displaystyle=\text{Pr}\bigg{(}b\stackrel{{\scriptstyle 0,t}}{{\centernot\rightsquigarrow}}a\,\Big{\lvert}\,b\in\mathcal{I}(t^{-}),K=\kappa,T=\tau,(a,b)\notin E(t)\bigg{)},

where (a)(a) holds because the recovery time of bb is conditionally independent of TonT_{\text{on}} given K=κK=\kappa (recall that bb recovers at rate γj\gamma_{j} independently of any edge state (and hence independently of TonT_{\text{on}}), and {b(t)}\{b\in\mathcal{I}(t^{-})\} is precisely the event that the recovery time of bb is at least KK). Hence, (162) implies that for any τ1,τ2[0,t]\tau_{1},\tau_{2}\in[0,t] satisfying τ1τ2\tau_{1}\leq\tau_{2}, we have

g(τ2)g(τ1)=Pr(b\centernot0,ta|b(t),K=κ,T=τ2,(a,b)E(t))Pr(b\centernot0,ta|b(t),K=κ,T=τ1,(a,b)E(t))fT(τ2)fT(τ1),\displaystyle\frac{g(\tau_{2})}{g(\tau_{1})}=\frac{\text{Pr}\bigg{(}b\stackrel{{\scriptstyle 0,t}}{{\centernot\rightsquigarrow}}a\,\Big{\lvert}\,b\in\mathcal{I}(t^{-}),K=\kappa,T=\tau_{2},(a,b)\notin E(t)\bigg{)}}{\text{Pr}\bigg{(}b\stackrel{{\scriptstyle 0,t}}{{\centernot\rightsquigarrow}}a\,\Big{\lvert}\,b\in\mathcal{I}(t^{-}),K=\kappa,T=\tau_{1},(a,b)\notin E(t)\bigg{)}}\cdot\frac{f_{T}(\tau_{2})}{f_{T}(\tau_{1})}, (163)

where $g(\cdot):=f_{T\mid K=\kappa,(\mathcal{S}(t),\mathcal{I}(t))=(\mathcal{S}_{0},\mathcal{I}_{0}),(a,b)\notin E(t)}(\cdot)$.

As a consequence of (163) and Lemma 10, we have

g(τ2)g(τ1)eBij(τ2τ1)fT(τ2)fT(τ1).\displaystyle\frac{g(\tau_{2})}{g(\tau_{1})}\leq e^{B_{ij}(\tau_{2}-\tau_{1})}\frac{f_{T}(\tau_{2})}{f_{T}(\tau_{1})}.

Since fT(τ)=λeλτ+eλtδ(τt)f_{T}(\tau)=\lambda e^{-\lambda\tau}+e^{-\lambda t}\delta(\tau-t) for τ[0,t]\tau\in[0,t] (see Lemma 4), we have the following for all 0τ1τ2<t0\leq\tau_{1}\leq\tau_{2}<t:

g(τ2)e(λBij)(τ2τ1)g(τ1),\displaystyle g(\tau_{2})\leq e^{-(\lambda-B_{ij})(\tau_{2}-\tau_{1})}g(\tau_{1}), (164)

and for all 0τ<t0\leq\tau<t, we have

g~(t)e(λBij)(tτ)λg(τ),\displaystyle\tilde{g}(t)\leq\frac{e^{-(\lambda-B_{ij})(t-\tau)}}{\lambda}g(\tau), (165)

where $\tilde{g}(t)$ is the coefficient of the point mass of $g$ at $\tau=t$, that is, $g(t)=\tilde{g}(t)\delta(0)$. Since $\delta$ is the Dirac delta function, (165) simply means that

Pr(T=tK=κ,(𝒮(t),(t))=(𝒮0,0),(a,b)E(t))e(λBij)(tτ)λg(τ).\displaystyle\text{Pr}(T=t\mid K=\kappa,(\mathcal{S}(t),\mathcal{I}(t))=(\mathcal{S}_{0},\mathcal{I}_{0}),(a,b)\notin E(t))\leq\frac{e^{-(\lambda-B_{ij})(t-\tau)}}{\lambda}g(\tau).

Our next goal is to use (164) and (165) to show that

0teBijτg(τ)dτ0teBijτφ(τ)dτ,\displaystyle\int_{0}^{t}e^{-B_{ij}\tau}g(\tau)d\tau\geq\int_{0}^{t}e^{-B_{ij}\tau}\varphi(\tau)d\tau, (166)

where φ\varphi is the probability density function defined by φ(τ):=(λBij)e(λBij)τ+e(λBij)tδ(τt)\varphi(\tau):=(\lambda-B_{ij})e^{-(\lambda-B_{ij})\tau}+e^{-(\lambda-B_{ij})t}\delta(\tau-t) for all τ[0,t]\tau\in[0,t] and φ(τ)=0\varphi(\tau)=0 for τ>t\tau>t. To this end, we compare gg with φ\varphi under the following two cases.

Case 1: There exists a time τ0[0,t)\tau_{0}\in[0,t) such that g(τ0)<φ(τ0)g(\tau_{0})<\varphi(\tau_{0}). In this case, (164) implies that for all τ[τ0,t)\tau\in[\tau_{0},t),

g(τ)\displaystyle g(\tau) e(λBij)(ττ0)g(τ0)\displaystyle\leq e^{-(\lambda-B_{ij})(\tau-\tau_{0})}g(\tau_{0})
<e(λBij)(ττ0)(λBij)e(λBij)(τ0)\displaystyle<e^{-(\lambda-B_{ij})(\tau-\tau_{0})}(\lambda-B_{ij})e^{-(\lambda-B_{ij})(\tau_{0})}
=φ(τ),\displaystyle=\varphi(\tau),

which means that the set {τ[0,t):g(τ)<φ(τ)}\{\tau\in[0,t):g(\tau)<\varphi(\tau)\} is either [τ,t)[\tau^{*},t) or (τ,t)(\tau^{*},t), where τ:=inf{τ:g(τ)<φ(τ)}\tau^{*}:=\inf\{\tau:g(\tau)<\varphi(\tau)\}. Also, by the definition of τ\tau^{*}, we have g(τ)φ(τ)g(\tau)\geq\varphi(\tau) for all τ[0,τ)\tau\in[0,\tau^{*}). Next, to compare gg and φ\varphi at τ=t\tau=t, we use (165) to note that

g(t)\displaystyle g(t) e(λBij)(tτ0)λg(τ0)δ(0)\displaystyle\leq\frac{e^{-(\lambda-B_{ij})(t-\tau_{0})}}{\lambda}g(\tau_{0})\delta(0) (167)
e(λBij)(tτ0)λ(λBij)e(λBij)τ0δ(0)\displaystyle\leq\frac{e^{-(\lambda-B_{ij})(t-\tau_{0})}}{\lambda}(\lambda-B_{ij})e^{-(\lambda-B_{ij})\tau_{0}}\delta(0) (168)
=(1Bijλ)e(λBij)tδ(0)\displaystyle=\left(1-\frac{B_{ij}}{\lambda}\right)e^{-(\lambda-B_{ij})t}\delta(0) (169)
φ(t).\displaystyle\leq\varphi(t). (170)

Thus, g(τ)φ(τ)0g(\tau)-\varphi(\tau)\geq 0 for all τ[0,τ)\tau\in[0,\tau^{*}) and g(τ)φ(τ)0g(\tau)-\varphi(\tau)\leq 0 for all τ(τ,t]\tau\in(\tau^{*},t]. Now, since gg and φ\varphi are both PDFs, we must have 0(g(τ)φ(τ))dτ=0\int_{0}^{\infty}(g(\tau)-\varphi(\tau))d\tau=0 or equivalently,

0τ(g(τ)φ(τ))dτ=τ(φ(τ)g(τ))dτ.\int_{0}^{\tau^{*}}(g(\tau)-\varphi(\tau))d\tau=\int_{\tau^{*}}^{\infty}(\varphi(\tau)-g(\tau))d\tau.

Since both the integrands above are non-negative, we have

0τeBijτ(g(τ)φ(τ))dτ\displaystyle\int_{0}^{\tau^{*}}e^{-B_{ij}\tau}(g(\tau)-\varphi(\tau))d\tau
eBijτ0τ(g(τ)φ(τ))dτ\displaystyle\geq e^{-B_{ij}\tau^{*}}\int_{0}^{\tau^{*}}(g(\tau)-\varphi(\tau))d\tau
=eBijττ(φ(τ)g(τ))dτ\displaystyle=e^{-B_{ij}\tau^{*}}\int_{\tau^{*}}^{\infty}(\varphi(\tau)-g(\tau))d\tau
τeBijτ(φ(τ)g(τ))dτ.\displaystyle\geq\int_{\tau^{*}}^{\infty}e^{-B_{ij}\tau}(\varphi(\tau)-g(\tau))d\tau.

Adding τeBijτg(τ)dτ+0τeBijτφ(τ)dτ\int_{\tau^{*}}^{\infty}e^{-B_{ij}\tau}g(\tau)d\tau+\int_{0}^{\tau^{*}}e^{-B_{ij}\tau}\varphi(\tau)d\tau to both sides now yields (166).
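The single-crossing argument of Case 1 can be illustrated on a discrete grid. The sketch below is only an analogy under stated assumptions: `g` and `phi` are hypothetical probability mass functions (not the densities of the paper) chosen so that `g >= phi` before a crossing index and `g <= phi` after it, and the weights $e^{-B k}$ are decreasing in the grid index $k$.

```python
import math

def weighted_sum(weights, pmf):
    """Discrete analogue of the weighted integral in (166)."""
    return sum(w * p for w, p in zip(weights, pmf))

def single_crossing(g, phi):
    """True if g - phi is never positive after it has first been negative."""
    seen_negative = False
    for a, b in zip(g, phi):
        d = a - b
        if d < 0:
            seen_negative = True
        elif d > 0 and seen_negative:
            return False
    return True

# Hypothetical pmfs on a 5-point grid; both sum to 1.
g = [0.40, 0.30, 0.15, 0.10, 0.05]
phi = [0.30, 0.25, 0.20, 0.15, 0.10]

B = 0.7  # illustrative value standing in for B_ij
weights = [math.exp(-B * k) for k in range(len(g))]  # decreasing in k

lhs = weighted_sum(weights, g)
rhs = weighted_sum(weights, phi)
print(single_crossing(g, phi), lhs >= rhs)
```

Because `g` places more mass where the weight is large and less where it is small, its weighted sum is at least that of `phi`, which mirrors the step from the two displayed inequalities to (166).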

Case 2: $g(\tau)\geq\varphi(\tau)$ for all $\tau\in[0,t)$. In this case, we can simply set $\tau^{*}=t$ and repeat the arguments following (167) in Case 1 to show that (166) holds.

Next, we use the definition of $\varphi$ to evaluate $\int_{0}^{t}e^{-B_{ij}\tau}\varphi(\tau)\,d\tau$, and we then restate (166) as follows:

\[
\int_{0}^{t}e^{-B_{ij}\tau}f_{T\mid K=\kappa,(\mathcal{S}(t),\mathcal{I}(t))=(\mathcal{S}_{0},\mathcal{I}_{0}),(a,b)\notin E(t)}(\tau)\,d\tau\geq 1-\frac{B_{ij}}{\lambda}(1-e^{-\lambda t}). \tag{171}
\]

Since this holds for all $\kappa\in[0,t)$, the assertion of the lemma follows.

Using arguments very similar to the proof above, we can prove the following result.

Lemma 12.

Let $T$ denote the random time defined earlier. Then

\[
\int_{0}^{t}e^{-B_{ij}\tau}f_{T\mid(\mathcal{S}(t),\mathcal{I}(t))=(\mathcal{S}_{0},\mathcal{I}_{0}),(a,b)\in E(t)}(\tau)\,d\tau\geq 1-\frac{B_{ij}}{\lambda}(1-e^{-\lambda t}).
\]

Remark 5.

In the proof of Lemma 11, if, instead of using (163) along with the upper bound in Lemma 10, we had used (163) along with the lower bound in Lemma 10, we would have obtained

\[
\frac{g(\tau_{2})}{g(\tau_{1})}\geq\frac{f_{T}(\tau_{2})}{f_{T}(\tau_{1})}=e^{-\lambda(\tau_{2}-\tau_{1})}.
\]

In addition, if we had subsequently replaced $t$ with a generic $\tau\in[0,t)$ and the weighting function $[0,\infty)\ni\tau\mapsto e^{-B_{ij}\tau}\in(0,\infty)$ by the constant function $1$, and if we had defined $\varphi$ by $\varphi(\tau):=\lambda e^{-\lambda\tau}+e^{-\lambda t}\delta(\tau-t)$, then using the same arguments but with reversed inequality signs, we would have been able to prove that

\[
\int_{0}^{\tau}1\cdot g(\tau^{\prime})\,d\tau^{\prime}\leq\int_{0}^{\tau}1\cdot\varphi(\tau^{\prime})\,d\tau^{\prime}.
\]

Since the integral on the left-hand side is $\text{Pr}(T\leq\tau\mid K=\kappa,(\mathcal{S}(t),\mathcal{I}(t))=(\mathcal{S}_{0},\mathcal{I}_{0}),(a,b)\notin E(t))$ and since the right-hand side evaluates to $1-e^{-\lambda\tau}$, we conclude that

\[
\text{Pr}(T\leq\tau\mid K=\kappa,(\mathcal{S}(t),\mathcal{I}(t))=(\mathcal{S}_{0},\mathcal{I}_{0}),(a,b)\notin E(t))\leq 1-e^{-\lambda\tau}
\]

for all $\tau\in[0,t)$ and all $\kappa\in[0,t]$.

Some Auxiliary Lemmas

In addition to the above results, the proof of Theorem 1 relies on the following lemmas, which we reproduce from [30].

Lemma 13.

For random variables $Y$ and $Z$, we have $\text{Var}[Y+Z]\leq 2(\text{Var}[Y]+\text{Var}[Z])$.

Lemma 14.

For a random variable $Y\in[0,1]$, we have $\text{Var}[Y^{2}]\leq 4\text{Var}[Y]$.

Lemma 15.

For random variables $Y$ and $Z$ in $[0,1]$,

\begin{align*}
|\mathrm{E}[YZ]-\mathrm{E}[Y]\mathrm{E}[Z]| &\leq(\operatorname{Var}[Y]+\operatorname{Var}[Z])/2,\\
\left|\mathrm{E}\left[Y^{2}Z\right]-\mathrm{E}[Y]^{2}\mathrm{E}[Z]\right| &\leq 2(\operatorname{Var}[Y]+\operatorname{Var}[Z]).
\end{align*}

The following result is a straightforward consequence of the above lemmas.

Corollary 1.

For non-negative random variables $Y$ and $Z$ satisfying $0\leq Y+Z\leq 1$, we have

\[
\text{Var}[YZ]\leq 8(\text{Var}[Y]+\text{Var}[Z]).
\]

Proof.

Note that

\begin{align*}
4\text{Var}[YZ]=\text{Var}[2YZ] &=\text{Var}[(Y+Z)^{2}+(-1)(Y^{2}+Z^{2})]\\
&\stackrel{(a)}{\leq}2(\text{Var}[(Y+Z)^{2}]+(-1)^{2}\text{Var}[Y^{2}+Z^{2}])\\
&\stackrel{(b)}{\leq}2(4\text{Var}[Y+Z]+2(\text{Var}[Y^{2}]+\text{Var}[Z^{2}]))\\
&\stackrel{(c)}{\leq}2(4\text{Var}[Y+Z]+2(4\text{Var}[Y]+4\text{Var}[Z])),
\end{align*}

where (a) follows from Lemma 13, (b) from both Lemma 13 and Lemma 14, and (c) from Lemma 14 alone. Thus,

\[
\text{Var}[YZ]\leq 2(\text{Var}[Y+Z]+2(\text{Var}[Y]+\text{Var}[Z]))\leq 8(\text{Var}[Y]+\text{Var}[Z]),
\]

where the last inequality follows from Lemma 13.
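As a quick numerical sanity check (not part of the proof), the bounds in Lemmas 13 to 15 and Corollary 1 can be tested on randomly generated finite distributions. The sketch below is a minimal illustration in plain Python; the helper names (`expectation`, `variance`, `check_bounds`, `random_instance`), the support size, and the tolerance `tol` are assumptions introduced here, and the random instances are constructed so that $0\leq Y+Z\leq 1$.

```python
import random

def expectation(values, probs):
    """Expectation of a finite random variable given its values and probabilities."""
    return sum(v * p for v, p in zip(values, probs))

def variance(values, probs):
    """Variance of a finite random variable."""
    mean = expectation(values, probs)
    return expectation([(v - mean) ** 2 for v in values], probs)

def check_bounds(y, z, p, tol=1e-12):
    """Evaluate the inequalities of Lemmas 13-15 and Corollary 1 for (Y, Z)."""
    var_y, var_z = variance(y, p), variance(z, p)
    total = var_y + var_z
    e_y, e_z = expectation(y, p), expectation(z, p)
    y_plus_z = [a + b for a, b in zip(y, z)]
    yz = [a * b for a, b in zip(y, z)]
    y2 = [a * a for a in y]
    y2z = [a * a * b for a, b in zip(y, z)]
    return {
        "lemma_13": variance(y_plus_z, p) <= 2 * total + tol,
        "lemma_14": variance(y2, p) <= 4 * var_y + tol,
        "lemma_15_first": abs(expectation(yz, p) - e_y * e_z) <= total / 2 + tol,
        "lemma_15_second": abs(expectation(y2z, p) - e_y ** 2 * e_z) <= 2 * total + tol,
        "corollary_1": variance(yz, p) <= 8 * total + tol,
    }

def random_instance(rng, k=6):
    """Hypothetical test case: k joint outcomes with Y, Z >= 0 and Y + Z <= 1."""
    y = [rng.random() for _ in range(k)]
    z = [rng.random() * (1 - a) for a in y]
    weights = [rng.random() + 1e-9 for _ in range(k)]
    total_weight = sum(weights)
    p = [w / total_weight for w in weights]
    return y, z, p

rng = random.Random(0)
all_hold = all(
    all(check_bounds(*random_instance(rng)).values()) for _ in range(1000)
)
```

A check of this kind cannot replace the proofs, but it is a cheap way to catch a mistyped constant when the bounds are reused elsewhere.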

Proof of Theorem 1

Proof.

The proof is based on Proposition 2 and it follows the approach used in [30]. We first modify Equations (i) - (iv) (Proposition 1) by expressing the expectations of cross-terms such as $\mathbb{E}[s_{i}\beta_{j}]$ in terms of expectations of individual terms such as $\mathbb{E}[s_{i}^{2}]$ and $\mathbb{E}[\beta_{j}]$. To begin, we apply Lemma 15 to $\mathbb{E}[s_{i}(t)\beta_{j}(t)]$ and obtain

\[
|\mathbb{E}[s_{i}(t)\beta_{j}(t)]-\mathbb{E}[s_{i}(t)]\mathbb{E}[\beta_{j}(t)]|\leq\frac{1}{2}(\text{Var}[s_{i}(t)]+\text{Var}[\beta_{j}(t)]).
\]

Therefore, there exists a function $h_{i,j,1,n}:[0,\infty)\rightarrow[-1,1]$ such that

\[
\mathbb{E}[s_{i}(t)\beta_{j}(t)]=\mathbb{E}[s_{i}(t)]\mathbb{E}[\beta_{j}(t)]+\frac{h_{i,j,1,n}(t)}{2}(\text{Var}[s_{i}(t)]+\text{Var}[\beta_{j}(t)]).
\]

Similarly, we can use Lemma 15 to show that there exists a function $h_{i,j,2,n}:[0,\infty)\rightarrow[-1,1]$ such that

\[
\mathbb{E}[s_{i}^{2}(t)\beta_{j}(t)]=\mathbb{E}[s_{i}(t)]^{2}\mathbb{E}[\beta_{j}(t)]+2h_{i,j,2,n}(t)(\text{Var}[s_{i}(t)]+\text{Var}[\beta_{j}(t)]).
\]

Next, we apply Lemma 15 to the pair $(s_{i}(t)\beta_{j}(t),\beta_{i}(t))$ and then use Corollary 1 to bound $\text{Var}[s_{i}(t)\beta_{j}(t)]$, which lets us express $\mathbb{E}[s_{i}(t)\beta_{j}(t)\beta_{i}(t)]$ as

\begin{align*}
\mathbb{E}[s_{i}(t)\beta_{j}(t)\beta_{i}(t)] &=\mathbb{E}[s_{i}(t)\beta_{j}(t)]\mathbb{E}[\beta_{i}(t)]+\frac{h_{i,j,5,n}(t)}{2}(\text{Var}[s_{i}(t)\beta_{j}(t)]+\text{Var}[\beta_{i}(t)])\\
&=\Big(\mathbb{E}[s_{i}(t)]\mathbb{E}[\beta_{j}(t)]+\frac{h_{i,j,1,n}(t)}{2}(\text{Var}[s_{i}(t)]+\text{Var}[\beta_{j}(t)])\Big)\mathbb{E}[\beta_{i}(t)]\\
&\quad+\frac{h_{i,j,5,n}(t)}{2}\big(h_{i,j,6,n}(t)(8\text{Var}[s_{i}(t)]+8\text{Var}[\beta_{j}(t)])+\text{Var}[\beta_{i}(t)]\big),
\end{align*}

where $h_{i,j,5,n}(t)\in[-1,1]$ and $h_{i,j,6,n}(t)\in[0,1]$.

We thus obtain the following relations:

  1. (I)

    $\mathbb{E}[s_{i}\beta_{j}]=\mathbb{E}[s_{i}]\mathbb{E}[\beta_{j}]+\frac{h_{i,j,1,n}}{2}(\text{Var}[s_{i}]+\text{Var}[\beta_{j}])$,

  2. (II)

    $\mathbb{E}[s_{i}^{2}\beta_{j}]=\mathbb{E}[s_{i}]^{2}\mathbb{E}[\beta_{j}]+2h_{i,j,2,n}(\text{Var}[s_{i}]+\text{Var}[\beta_{j}])$,

  3. (III)

    $\mathbb{E}[s_{i}\beta_{j}\beta_{i}]=\Big(\mathbb{E}[s_{i}]\mathbb{E}[\beta_{j}]+\frac{h_{i,j,1,n}}{2}(\text{Var}[s_{i}]+\text{Var}[\beta_{j}])\Big)\mathbb{E}[\beta_{i}]+\frac{h_{i,j,5,n}}{2}\big(h_{i,j,6,n}(8\text{Var}[s_{i}]+8\text{Var}[\beta_{j}])+\text{Var}[\beta_{i}]\big).$

To handle terms of the form $B_{ij}\mathbb{E}[n\cdot\chi_{ij}(t,\mathcal{S},\mathcal{I})\cdot U]$, where $U$ is some random variable, we use Proposition 2 to obtain

\[
A_{ij}\left(1-\frac{B_{ij}}{\lambda^{(n)}}(1-e^{-\lambda^{(n)}t})\right)\mathbb{E}[U]\leq B_{ij}\mathbb{E}[n\chi_{ij}(t,\mathcal{S},\mathcal{I})U]\leq A_{ij}\mathbb{E}[U].
\]

As a result, if $\text{Pr}(|U|\leq 1)=1$, then there exists a function $h_{i,j,U,n}:[0,\infty)\rightarrow[0,B_{ij}A_{ij}]$ such that

\[
B_{ij}\mathbb{E}[n\chi_{ij}(t,\mathcal{S},\mathcal{I})U]=A_{ij}\mathbb{E}[U]-\frac{h_{i,j,U,n}(t)}{\lambda^{(n)}}. \tag{172}
\]

By making the above substitutions in (i) - (iv), and by using the identity $\text{Var}[Y]^{\prime}=\mathbb{E}[Y^{2}]^{\prime}-2\mathbb{E}[Y]\mathbb{E}[Y]^{\prime}$, we obtain the following differential equations:

  1. (I)

    $\mathbb{E}[s_{i}]^{\prime}=\sum_{j=1}^{m}\frac{h_{i,j,7,n}}{\lambda^{(n)}}-\sum_{j=1}^{m}A_{ij}\left(\mathbb{E}[s_{i}]\mathbb{E}[\beta_{j}]+\frac{h_{i,j,1,n}}{2}(\text{Var}[s_{i}]+\text{Var}[\beta_{j}])\right)$,

  2. (II)

    $\mathbb{E}[\beta_{i}]^{\prime}=\sum_{j=1}^{m}A_{ij}\left(\mathbb{E}[s_{i}]\mathbb{E}[\beta_{j}]+\frac{h_{i,j,1,n}}{2}(\text{Var}[s_{i}]+\text{Var}[\beta_{j}])\right)-\sum_{j=1}^{m}\frac{h_{i,j,7,n}}{\lambda^{(n)}}-\gamma_{i}\mathbb{E}[\beta_{i}]$,

  3. (III)
    \begin{align*}
    \text{Var}[s_{i}]^{\prime} &=-2\sum_{j=1}^{m}A_{ij}\big(\mathbb{E}[s_{i}]^{2}\mathbb{E}[\beta_{j}]+2h_{i,j,2,n}(\text{Var}[s_{i}]+\text{Var}[\beta_{j}])\big)\\
    &\quad+\sum_{j=1}^{m}A_{ij}\left(\mathbb{E}[s_{i}]\mathbb{E}[\beta_{j}]+\frac{h_{i,j,1,n}}{2}(\text{Var}[s_{i}]+\text{Var}[\beta_{j}])\right)\left(2\mathbb{E}[s_{i}]+\frac{1}{n}\right)\\
    &\quad+\sum_{j=1}^{m}\left(\frac{2h_{i,j,8,n}}{\lambda^{(n)}}-\frac{h_{i,j,7,n}}{n\lambda^{(n)}}-2\mathbb{E}[s_{i}]\frac{h_{i,j,7,n}}{\lambda^{(n)}}\right),
    \end{align*}
  4. (IV)
    \begin{align*}
    \text{Var}[\beta_{i}]^{\prime} &=2\sum_{j=1}^{m}A_{ij}\bigg(\mathbb{E}[s_{i}]\mathbb{E}[\beta_{j}]\mathbb{E}[\beta_{i}]+\frac{h_{i,j,1,n}}{2}(\text{Var}[s_{i}]+\text{Var}[\beta_{j}])\mathbb{E}[\beta_{i}]\\
    &\qquad\qquad\qquad+\frac{h_{i,j,5,n}}{2}\left(h_{i,j,6,n}(8\text{Var}[s_{i}]+8\text{Var}[\beta_{j}])+\text{Var}[\beta_{i}]\right)\bigg)\\
    &\quad+\sum_{j=1}^{m}A_{ij}\left(\frac{1}{n}-2\mathbb{E}[\beta_{i}]\right)\Big(\mathbb{E}[s_{i}]\mathbb{E}[\beta_{j}]+\frac{h_{i,j,1,n}}{2}(\text{Var}[s_{i}]+\text{Var}[\beta_{j}])\Big)-2\gamma_{i}\text{Var}[\beta_{i}]+\gamma_{i}\frac{\mathbb{E}[\beta_{i}]}{n}\\
    &\quad-\sum_{j=1}^{m}\left(\frac{2h_{i,j,9,n}}{\lambda^{(n)}}+\frac{h_{i,j,7,n}}{n\lambda^{(n)}}\right),
    \end{align*}

where $h_{i,j,7,n}$, $h_{i,j,8,n}$, and $h_{i,j,9,n}$ are functions from $[0,\infty)$ to $[0,B_{ij}A_{ij}]$ and are defined on the basis of (172).
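For completeness, the variance identity invoked in this step follows from differentiating the definition of the variance with respect to $t$. The short derivation below is a sketch that assumes $\mathbb{E}[Y]$ and $\mathbb{E}[Y^{2}]$ are differentiable in $t$, where $Y$ stands for any of the quantities $s_{i}(t)$ or $\beta_{i}(t)$.

```latex
\begin{align*}
\text{Var}[Y] &= \mathbb{E}[Y^{2}] - \big(\mathbb{E}[Y]\big)^{2},\\
\text{Var}[Y]^{\prime} &= \mathbb{E}[Y^{2}]^{\prime} - 2\,\mathbb{E}[Y]\,\mathbb{E}[Y]^{\prime}.
\end{align*}
```

The second line is the chain rule applied to $(\mathbb{E}[Y])^{2}$; it is what allows the equations for the second moments to be rewritten as equations for the variances.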

The above equations constitute a proper system of differential equations with the same variables $\{\mathbb{E}[s_{i}]\}_{i=1}^{m}$, $\{\text{Var}[s_{i}]\}_{i=1}^{m}$, $\{\mathbb{E}[\beta_{i}]\}_{i=1}^{m}$, and $\{\text{Var}[\beta_{i}]\}_{i=1}^{m}$ appearing on both sides. To express these equations compactly, we define $z^{(n)}\in[0,1]^{4m}$ as the vector whose entries are given by $z_{i,1}^{(n)}:=z_{4(i-1)+1}^{(n)}:=\mathbb{E}[s_{i}^{(n)}]$, $z_{i,2}^{(n)}:=z_{4(i-1)+2}^{(n)}:=\mathbb{E}[\beta_{i}^{(n)}]$, $z_{i,3}^{(n)}:=z_{4(i-1)+3}^{(n)}:=\text{Var}[s_{i}^{(n)}]$, and $z_{i,4}^{(n)}:=z_{4i}^{(n)}:=\text{Var}[\beta_{i}^{(n)}]$. Then $z^{(n)}(t)$ is a solution to the initial value problem $(z^{(n)})^{\prime}=g_{n}(t,z^{(n)};1/n,1/\lambda^{(n)})$ and $z^{(n)}(0)=z_{0}^{(n)}$, where

  1. (I)

    $g_{i,n}^{(1)}(t,z;\varepsilon_{1},\varepsilon_{2}):=-\sum_{j=1}^{m}A_{ij}\Big(z_{i,1}z_{j,2}+\frac{h_{i,j,1,n}}{2}(z_{i,3}+z_{j,4})\Big)+\varepsilon_{2}\sum_{j=1}^{m}h_{i,j,7,n}$,

  2. (II)

    $g_{i,n}^{(2)}(t,z;\varepsilon_{1},\varepsilon_{2}):=\sum_{j=1}^{m}A_{ij}\left(z_{i,1}z_{j,2}+\frac{h_{i,j,1,n}}{2}(z_{i,3}+z_{j,4})\right)-\gamma_{i}z_{i,2}-\varepsilon_{2}\sum_{j=1}^{m}h_{i,j,7,n}$,

  3. (III)
    \begin{align*}
    g_{i,n}^{(3)}(t,z;\varepsilon_{1},\varepsilon_{2}) &:=-2\sum_{j=1}^{m}A_{ij}\big((z_{i,1})^{2}z_{j,2}+2h_{i,j,2,n}(z_{i,3}+z_{j,4})\big)\\
    &\quad+\sum_{j=1}^{m}A_{ij}\Big(z_{i,1}z_{j,2}+\frac{h_{i,j,1,n}}{2}(z_{i,3}+z_{j,4})\Big)\left(2z_{i,1}+\varepsilon_{1}\right)\\
    &\quad+\sum_{j=1}^{m}\left(2h_{i,j,8,n}\varepsilon_{2}-h_{i,j,7,n}\varepsilon_{1}\varepsilon_{2}-2z_{i,1}h_{i,j,7,n}\varepsilon_{2}\right),
    \end{align*}
  4. (IV)
    \begin{align*}
    g_{i,n}^{(4)}(t,z;\varepsilon_{1},\varepsilon_{2}) &:=2\sum_{j=1}^{m}A_{ij}\bigg(z_{i,1}z_{j,2}z_{i,2}+\frac{h_{i,j,1,n}}{2}(z_{i,3}+z_{j,4})z_{i,2}\\
    &\qquad\qquad\qquad+\frac{h_{i,j,5,n}}{2}\left(h_{i,j,6,n}(8z_{i,3}+8z_{j,4})+z_{i,4}\right)\bigg)\\
    &\quad+\sum_{j=1}^{m}A_{ij}\left(\varepsilon_{1}-2z_{i,2}\right)\left(z_{i,1}z_{j,2}+\frac{h_{i,j,1,n}}{2}(z_{i,3}+z_{j,4})\right)\\
    &\quad-2\gamma_{i}z_{i,4}+\varepsilon_{1}\gamma_{i}z_{i,2}-\sum_{j=1}^{m}\left(2h_{i,j,9,n}\varepsilon_{2}+h_{i,j,7,n}\varepsilon_{1}\varepsilon_{2}\right),
    \end{align*}
  5. (V)

    $z_{0}^{(n)}=(s_{1}^{(n)}(0),\beta_{1}^{(n)}(0),0,0,s_{2}^{(n)}(0),\beta_{2}^{(n)}(0),0,0,\ldots,s_{m}^{(n)}(0),\beta_{m}^{(n)}(0),0,0)$.

Observe that, irrespective of the value of $n$, the function $(\bar{z}_{i,1}(t),\bar{z}_{i,2}(t),\bar{z}_{i,3}(t),\bar{z}_{i,4}(t)):=(y_{i}(t),w_{i}(t),0,0)$ solves the initial value problem $z^{\prime}=g_{n}(t,z;0,0)$ and $z(0)=z_{0}$, where

\[
z_{0}:=(s_{1,0},\beta_{1,0},0,0,s_{2,0},\beta_{2,0},0,0,\ldots,s_{m,0},\beta_{m,0},0,0).
\]
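To make the limiting system concrete: with the variance components set to zero, $z^{\prime}=g_{n}(t,z;0,0)$ reduces to $y_{i}^{\prime}=-\sum_{j}A_{ij}y_{i}w_{j}$ and $w_{i}^{\prime}=\sum_{j}A_{ij}y_{i}w_{j}-\gamma_{i}w_{i}$. The following is a minimal sketch of how these equations could be integrated numerically. It uses a forward Euler scheme chosen here for brevity (not a method taken from the paper), and the two-group contact matrix `A`, recovery rates `gamma`, initial fractions, and step size are hypothetical values picked only for illustration.

```python
def age_structured_sir_step(y, w, A, gamma, dt):
    """One forward Euler step of y_i' = -sum_j A_ij y_i w_j,
    w_i' = sum_j A_ij y_i w_j - gamma_i w_i."""
    m = len(y)
    new_y, new_w = [], []
    for i in range(m):
        infection = sum(A[i][j] * y[i] * w[j] for j in range(m))
        new_y.append(y[i] - dt * infection)
        new_w.append(w[i] + dt * (infection - gamma[i] * w[i]))
    return new_y, new_w

def simulate(y0, w0, A, gamma, t_end, dt=1e-3):
    """Return a list of (time, susceptible fractions, infected fractions)."""
    y, w = list(y0), list(w0)
    trajectory = [(0.0, y, w)]
    steps = int(round(t_end / dt))
    for k in range(1, steps + 1):
        y, w = age_structured_sir_step(y, w, A, gamma, dt)
        trajectory.append((k * dt, y, w))
    return trajectory

# Hypothetical two-group example (values are illustrative, not from the dataset).
A = [[0.6, 0.2],
     [0.2, 0.4]]
gamma = [0.1, 0.1]
traj = simulate([0.49, 0.49], [0.01, 0.01], A, gamma, t_end=50.0)
```

A higher-order integrator would be preferable for quantitative work; forward Euler is used only because it keeps the correspondence with the equations transparent.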

Next, we need to bound $\left\|z^{(n)}(t)-\bar{z}(t)\right\|$ (where $\bar{z}(t)\in[0,1]^{4m}$ is the unique vector satisfying $\bar{z}_{4(i-1)+\ell}(t)=\bar{z}_{i,\ell}(t)$ for all $i\in[m]$ and $\ell\in[4]$). For this purpose, we will need the following lemma, which we borrow from [30].

Lemma 16.

Consider the initial value problems $x^{\prime}=f_{1}(t,x)$, $x(0)=x_{1}$ and $x^{\prime}=f_{2}(t,x)$, $x(0)=x_{2}$ with solutions $\varphi_{1}(t)$ and $\varphi_{2}(t)$, respectively. If $f_{1}$ is Lipschitz in $x$ with constant $L$ and $\|f_{1}(t,x)-f_{2}(t,x)\|\leq M$, then $\left\|\varphi_{1}(t)-\varphi_{2}(t)\right\|\leq\left(\left\|x_{1}-x_{2}\right\|+M/L\right)e^{Lt}-M/L$.
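The bound in Lemma 16 can be illustrated on a scalar example with closed-form solutions. The sketch below is an assumption-laden toy case, not part of the paper's argument: it takes $f_{1}(t,x)=x$ (so $L=1$) and $f_{2}(t,x)=x+M$, for which $\varphi_{1}(t)=x_{1}e^{t}$ and $\varphi_{2}(t)=(x_{2}+M)e^{t}-M$. The names `lemma16_bound`, `phi1`, `phi2` and the numerical values are hypothetical.

```python
import math

def lemma16_bound(x1, x2, M, L, t):
    """Right-hand side of Lemma 16: (|x1 - x2| + M/L) e^{Lt} - M/L."""
    return (abs(x1 - x2) + M / L) * math.exp(L * t) - M / L

# Hypothetical constants for the toy example.
L_const = 1.0   # Lipschitz constant of f1(t, x) = x
M = 0.05        # uniform bound on |f1 - f2|
x1, x2 = 1.0, 1.02

def phi1(t):
    """Solution of x' = x with x(0) = x1."""
    return x1 * math.exp(t)

def phi2(t):
    """Solution of x' = x + M with x(0) = x2."""
    return (x2 + M) * math.exp(t) - M

# Compare the actual gap with the bound at a few time points.
gaps = [
    (t, abs(phi1(t) - phi2(t)), lemma16_bound(x1, x2, M, L_const, t))
    for t in [0.0, 0.5, 1.0, 2.0]
]
bound_holds = all(gap <= bound + 1e-12 for _, gap, bound in gaps)
```

In this particular example the gap coincides with the bound (up to rounding), which suggests that the exponential factor $e^{Lt}$ cannot be improved in general.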

Now, note that the domain of $z$ for $g_{n}(t,z;\varepsilon_{1},\varepsilon_{2})$ can be chosen to be bounded because $0\leq\mathbb{E}[s_{i}],\mathbb{E}[\beta_{i}]\leq 1$ and $\text{Var}[s_{i}]\leq\mathbb{E}[s_{i}^{2}]\leq 1$. Similarly, $\text{Var}[\beta_{i}]\leq 1$. Also, we let $\varepsilon_{1},\varepsilon_{2}\in(0,1)$ and define $\varepsilon:=\max\{\varepsilon_{1},\varepsilon_{2}\}$. Since $g_{n}(t,z;0,0)$ is a polynomial in $z$, it is Lipschitz-continuous on this bounded domain with some Lipschitz constant $L\in(0,\infty)$. In addition, we use the bounds on $z$ and the functions $\{h_{i,j,\ell,n}:1\leq\ell\leq 9\}$ as follows:

\begin{align*}
\|g_{n}(t,z;\varepsilon_{1},\varepsilon_{2})-g_{n}(t,z;0,0)\| &\leq 2\sum_{i=1}^{m}\sum_{j=1}^{m}A_{ij}\varepsilon\left|z_{i,1}z_{j,2}+\frac{h_{i,j,1,n}}{2}(z_{i,3}+z_{j,4})\right|+\sum_{i=1}^{m}\gamma_{i}\varepsilon+10\sum_{i=1}^{m}\sum_{j=1}^{m}A_{ij}B_{ij}\varepsilon\\
&\leq\left(\sum_{i=1}^{m}\sum_{j=1}^{m}A_{ij}(4+10B_{ij})+\sum_{i=1}^{m}\gamma_{i}\right)\varepsilon,
\end{align*}

i.e.,

\[
\|g_{n}(t,z;\varepsilon_{1},\varepsilon_{2})-g_{n}(t,z;0,0)\|\leq M(\varepsilon),
\]

where $M(\varepsilon):=\left(\sum_{i=1}^{m}\sum_{j=1}^{m}A_{ij}(4+10B_{ij})+\sum_{i=1}^{m}\gamma_{i}\right)\varepsilon$.

We now apply Lemma 16 after setting

\[
f_{1}(t,x)=g_{n}(t,x;0,0),\quad f_{2}(t,x)=g_{n}(t,x;1/n,1/\lambda^{(n)}),\quad x_{1}=z_{0},\quad x_{2}=z_{0}^{(n)}.
\]

Also, we let $\varphi_{1}=\bar{z}$ and $\varphi_{2}=z^{(n)}$. Then we have

\[
\left\|z^{(n)}(t)-\bar{z}(t)\right\|\leq\left(\left\|z_{0}-z_{0}^{(n)}\right\|+\frac{M(\alpha_{n})}{L}\right)e^{Lt}-\frac{M(\alpha_{n})}{L},
\]

where $\alpha_{n}:=\max\left\{\frac{1}{n},\frac{1}{\lambda^{(n)}}\right\}$. Thus, for all $t\leq T$,

\[
\left\|z^{(n)}(t)-\bar{z}(t)\right\|\leq\left(\left\|z_{0}-z_{0}^{(n)}\right\|+\frac{M(\alpha_{n})}{L}\right)e^{LT}-\frac{M(\alpha_{n})}{L}. \tag{173}
\]

Since $\lim_{\varepsilon\rightarrow 0}M(\varepsilon)=0$, $\lim_{n\rightarrow\infty}z_{0}^{(n)}=z_{0}$, and $\lim_{n\rightarrow\infty}\alpha_{n}=\lim_{n\rightarrow\infty}\max\left\{\frac{1}{n},\frac{1}{\lambda^{(n)}}\right\}=0$, the right-hand side of (173) goes to zero as $n\rightarrow\infty$. Hence we have the uniform convergence $z^{(n)}\rightarrow\bar{z}$ over any finite time interval $[0,T]$. The last step is to show that $z^{(n)}\rightarrow\bar{z}$ implies $L^{2}$-convergence, i.e., $\mathbb{E}[\|(s_{i}^{(n)}-y_{i},\beta_{i}^{(n)}-w_{i})\|_{2}^{2}]\rightarrow 0$ as $n\rightarrow\infty$. To this end, we have

\begin{align*}
\mathbb{E}[\|(s_{i}^{(n)}-y_{i},\beta_{i}^{(n)}-w_{i})\|_{2}^{2}] &=\mathbb{E}[(s_{i}^{(n)}-y_{i})^{2}]+\mathbb{E}[(\beta_{i}^{(n)}-w_{i})^{2}]\\
&=(\mathbb{E}[s_{i}^{(n)}]-y_{i})^{2}+(\mathbb{E}[\beta_{i}^{(n)}]-w_{i})^{2}+\text{Var}[s_{i}^{(n)}]+\text{Var}[\beta_{i}^{(n)}]\\
&\leq|\mathbb{E}[s_{i}^{(n)}]-y_{i}|+|\mathbb{E}[\beta_{i}^{(n)}]-w_{i}|+\text{Var}[s_{i}^{(n)}]+\text{Var}[\beta_{i}^{(n)}]\\
&=|z_{i,1}^{(n)}-\bar{z}_{i,1}|+|z_{i,2}^{(n)}-\bar{z}_{i,2}|+|z_{i,3}^{(n)}-\bar{z}_{i,3}|+|z_{i,4}^{(n)}-\bar{z}_{i,4}|,
\end{align*}

where we used that $\bar{z}_{i,3}=\bar{z}_{i,4}=0$, and the inequality holds because $y_{i},\mathbb{E}[s_{i}^{(n)}],w_{i},\mathbb{E}[\beta_{i}^{(n)}]\in[0,1]$, so that each squared difference is at most its absolute value. Thus, the uniform convergence of $z^{(n)}$ to $\bar{z}$ over $[0,T]$ proves that $\mathbb{E}[\|(s_{i}^{(n)}-y_{i},\beta_{i}^{(n)}-w_{i})\|_{2}^{2}]\rightarrow 0$ as $n\rightarrow\infty$.
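The two facts used in this last step, namely the bias-variance decomposition $\mathbb{E}[(X-y)^{2}]=(\mathbb{E}[X]-y)^{2}+\text{Var}[X]$ and the bound $a^{2}\leq|a|$ for $|a|\leq 1$, can be checked on a small example. The sketch below uses a hypothetical three-point distribution standing in for $s_{i}^{(n)}(t)$ and a hypothetical value `y_limit` standing in for $y_{i}(t)$; none of these numbers come from the paper.

```python
def mean(values, probs):
    """Expectation of a finite random variable."""
    return sum(v * p for v, p in zip(values, probs))

def variance(values, probs):
    """Variance of a finite random variable."""
    mu = mean(values, probs)
    return sum(p * (v - mu) ** 2 for v, p in zip(values, probs))

def mean_square_error(values, probs, target):
    """E[(X - target)^2] for a finite random variable X."""
    return sum(p * (v - target) ** 2 for v, p in zip(values, probs))

# Hypothetical distribution on [0, 1] and hypothetical deterministic limit.
values = [0.1, 0.4, 0.7]
probs = [0.2, 0.5, 0.3]
y_limit = 0.35

mse = mean_square_error(values, probs, y_limit)
bias = mean(values, probs) - y_limit
decomposition = bias ** 2 + variance(values, probs)   # equals mse
upper_bound = abs(bias) + variance(values, probs)     # bound used in the proof
```

Here `mse` and `decomposition` agree up to rounding, and `upper_bound` dominates both because the bias lies in $[-1,1]$.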