Usefulness of the Age-Structured SIR Dynamics in Modelling COVID-19
Abstract
We examine the age-structured SIR model, a variant of the classical Susceptible-Infected-Recovered (SIR) model of epidemic propagation, in the context of COVID-19. In doing so, we provide a theoretical basis for the model, perform an empirical validation, and discover the limitations of the model in approximating arbitrary epidemics. We first establish the differential equations defining the age-structured SIR model as the mean-field limits of a continuous-time Markov process that models epidemic spreading on a social network involving random, asynchronous interactions. We then show that, as the population size grows, the infection rate for any pair of age groups converges to its mean-field limit if and only if the edge update rate of the network approaches infinity, and we show how the rate of mean-field convergence depends on the edge update rate. We then propose a system identification method for parameter estimation of the bilinear ODEs of our model, and we test the model performance on a Japanese COVID-19 dataset by generating the trajectories of the age-wise numbers of infected individuals in the prefecture of Tokyo for a period of over 365 days. In the process, we also develop an algorithm to identify the different phases of the pandemic, each phase being associated with a unique set of contact rates. Our results show a good agreement between the generated trajectories and the observed ones.
1 INTRODUCTION
The global COVID-19 death toll has crossed 6 million [1], and it is no surprise that researchers all over the world have been forecasting the evolution of this pandemic to propose control policies aimed at minimizing its medical and economic impacts [2, 3, acemoglu2020optimal, 4, 5, 6]. Their efforts have typically relied on classical epidemiological models or their variants (for an overview see [7] and the references therein). One such classical epidemic model is the Susceptible-Infected-Recovered (SIR) model. Proposed in [8], the SIR model is a compartmental model in which every individual belongs to one of three possible states at any given time instant: the susceptible state, the infected state, and the recovered state. The continuous-time SIR dynamics models the time-evolution of the fraction of individuals in any of these states using a set of ordinary differential equations (ODEs) parameterized by two quantities: the infection rate (the rate at which a given infected individual infects a given susceptible individual) and the recovery rate of infected individuals.
Even though the continuous-time SIR model is a deterministic model, it models an inherently random phenomenon in a large (but discrete) population. To bridge between the deterministic continuous-time SIR model and the underlying random processes over a finite population, researchers have shown that the associated (continuous-time) ODEs are the mean-field limits of continuous-time Markovian epidemic processes over a finite population [9, 10]. Similar results have been obtained for variants of the original model, such as for the SIR dynamics on a configuration model network [11, 12]. These results theoretically justify the SIR model ODEs.
Classical SIR models, however, (continuous and discrete-time) are homogeneous – the same infection and recovery rates apply to the whole social network despite differences in the individuals’ age, gender, race, immunity level, and pre-existing medical conditions. For COVID-19, this assumption is inconsistent with studies showing that the contact rates between individuals and the recovery rates of infected individuals depend on factors such as age and location [13, 14, 15, 16]. In addition, [17] argues that homogeneous models can introduce significant biases in forecasting the epidemic, including overestimation of the number of infections required to achieve herd immunity, overestimation of the strictness of optimal control policies, overestimation of the impact of policy relaxations, and incorrect estimation of the time of onset of the pandemic.
We therefore need to shift our focus to variants of the classical SIR model with heterogeneous contact rates. Examples include the multi-risk SIR model [4] and the age-stratified SIR models considered in [18, 19, 20], in which the population is partitioned into multiple groups and the rates of infection and recovery vary across groups. See [17] for a survey of these papers.
However, the models considered in the above works have two main shortcomings. On the one hand, barring exceptions such as [21], they are typically not validated using real data. On the other hand, they do not have a strong theoretical foundation because the dynamical processes studied in these works have not been established as the mean field limits of stochastic epidemic processes evolving on time-varying random graphs. We emphasize that even the convergence results obtained for homogeneous SIR models [9, 10, 11, 12] make the unrealistic assumption that the network of physical contacts (in-person interactions) existing in the population is time-invariant. As such, we cannot justify the use of these models in designing optimal control policies aimed at minimizing the impact of any epidemic. We therefore address the aforementioned shortcomings using the age-structured SIR model, a multi-group SIR model that partitions the population of a given region into different age groups and assigns different infection rates and recovery rates to the age groups. We note that, although we adopt the term age-structured in our paper, our analysis also applies to populations partitioned on the basis of differences in geographical location, sex, immunity level, etc. Moreover, among existing heterogeneous models [17], the age-structured SIR model is the simplest and hence more mathematically and computationally tractable than other models.
The contributions of this paper are as follows:
-
1.
Modeling: We extend our previously proposed stochastic epidemic model [22] to a more general model that incorporates (a) a random and time-varying network of physical contacts (in-person interactions between pairs of individuals) that are updated asynchronously and at random times, (b) random transmissions of disease-causing pathogens from infected individuals to their susceptible neighbors, and (c) recoveries of infected individuals that occur at random times. We analyze the resulting dynamics and show that under certain independence assumptions, the expected trajectories of the fractions of susceptible/infected/recovered individuals in any age group converge in mean-square to the solutions of the age-structured SIR ODEs as the population size goes to .
-
2.
Convergence Rate Analysis: We derive a lower bound on the effective infection rate for a given pair of age groups in the stochastic model. This bound, as we show, is approximately linear in the reciprocal of the network update rate, which leads to the infection rate converging to its limit (specified by the ODEs) as fast as the reciprocal of the network update rate vanishes.
-
3.
Validation: We validate our age-structured model empirically by estimating the parameters of our model using a Japanese COVID-19 dataset and, subsequently, by generating the age-wise numbers of infected individuals as functions of time. In this process, we leverage the crucial fact that the ODEs defining our model are linear in the model parameters (transmission and recovery rates), which enables us to use a least-squares method for the system identification.
-
4.
A Method to Detect Changes in Social Behavior: We design a simple algorithm that can be used to detect changes in social behavior throughout the duration of the pandemic. Given the age-wise daily infection counts, the algorithm estimates the dates around which the inter-age-group contact rates change significantly.
-
5.
Insights into Epidemic Spreading: We interpret the results of our phase detection algorithm to identify the least and the most infectious age groups and the least and the most vulnerable age groups. Additionally, we analyze the data for the entire period from March 2020 to April 2021 to explain how certain social events influenced the propagation of COVID-19 in the prefecture of Tokyo.
The structure of our paper is as follows: We introduce the age-structured SIR model and our stochastic epidemic model in Section 2. We establish the age-structured SIR ODEs as the mean-field limits of our stochastic model in Section 3. We also discuss the limitations of (converse result for) our model in Section 3. Next, we describe the empirical validation of our model (in the context of the COVID-19 outbreak in Tokyo) in Section 5. We conclude with a brief summary and future directions in Section 6.
Related Works: [23] proposes a heterogeneous epidemic model with time-varying parameters to show that heterogeneous susceptibility to infection results in a temporary weakening of the COVID-19 pandemic but not in herd immunity. The model is validated using the death tolls (and not the case numbers) reported for New York and Chicago for a period of about 80 days. [19] uses the age-structured SEIQRD model to predict the number of deaths with a reasonable accuracy, but unlike our work, it does not use the proposed model to generate the number of new cases as a function of time. [24] uses heterogeneous variants of the SEIR model to study the impact of the lockdown policy implemented in France, but it does not validate these models empirically. [15] reports contact rate matrices for the population of the UK based on the self-reported data of 36,000 volunteers. However, the study ignores the time-varying nature of these contact rates, which we capture in our phase detection algorithm (Section 5). Another study that uses time-invariant model parameters is [25], which proposes the age-structured SEIRA model and uses it to simulate the number of new infections in different social groups of Chile.
[26] uses a heterogeneous SEIRD model to predict the effects of various relaxation policies on infection counts in certain regions of Italy. The model therein is empirically validated only using the data obtained during the first 60 days of the pandemic. In [20], the authors propose an age-structured SIRD model and calibrate it with the data obtained from [2]. Unlike our paper, however, [20] divides the population into only two age groups, and does not compare the model-generated values of the number of infections with the official case counts. Two other studies that use two-age-group SIR models are [27] and [28]. While [27] argues that in Florida, old and socially inert adults have been possibly infected by the young, [28] argues that age-group-targeted policies are more effective than uniform policies in reducing the economic impact of COVID-19. [29] proposes a heterogeneous SIR model with feedback and forecasts the economic and medical impacts of various policies aimed at controlling the pandemic in Chile. Unlike our study, however, [29] ignores the time-varying nature of contact rates. [21] proposes the SEIR-HC-SEC-AGE model, a heterogeneous SEIR model that sub-divides each age-group further into risk sectors with different vulnerabilities to the SARS-CoV-2 virus. The model therein, which is calibrated to predict the effects of different lockdown policies in certain regions of Italy, simulates the time-evolution of the observed death toll with a high accuracy. By contrast, we pick a much simpler heterogeneous model and examine whether it fits the observed case numbers well. [18] and [4] use an age-structured SIR model to show that control policies that target different age groups differently perform better than uniform policies. However, these results assume that inter-age-group contact rates are the same for all pairs of age groups, an assumption that is inconsistent with our empirical results (Section 5). Hence, deriving optimal policies in the framework of the age-structured SIR model under more general assumptions is an important open problem.
Notation: We let denote the set of natural numbers and . We let for . We denote the set of real and positive real numbers by and , respectively. For , we let denote the positive part of .
The symbols and are used as a continuous-time and discrete-time indices, respectively. We use the notation for functions and for functions . We occasionally omit the time index when the value of is clear from the context.
We use the Bachmann-Landau asymptotic notation for a given function in the context of . We use in the context of . In addition, for a given function , we use the notation to denote , the first derivative of with respect to time.
For a set , we let denote the cardinality of . In this paper, all random events and random variables are with respect to a probability space , where is the sample space, is the set of events, and is the probability measure on this space. We denote random variables and random events using capital letters, and for a random event , we define to be the indicator random variable associated with , i.e, is the random variable with if and , otherwise. For an event , represents the complement of . For a random variable , denotes the expected value of and denotes the conditional expectation of given the event . For random variables and and a random event , we define
Therefore, for an event
We denote tuples of length using bold-face letters and random tuples using bold-face capital letters. For a tuple of length and an index , we let denote the -th entry of .
For and , we use to denote the directed graph (digraph) with vertex set and edge set . Finally, for a graph , given two distinct nodes , we let , where if and , otherwise. Note that maps the edges between (distinct) nodes of the graph to the numbers in lexicographic order.
2 PROBLEM FORMULATION
We now introduce two epidemic models, of which the first describes a deterministic dynamical system and the second describes a stochastic process on a finite population. One of the main objectives of this work is to relate these models, which is achieved in Section 3.
2.1 The Age-Structured SIR Model
Consider a population of individuals spanning age groups111As mentioned before, throughout this paper, we could generalize the discussions involving age groups to subpopulations distinguished by geographical locations, pre-existing health conditions, sex, etc.. Suppose a part of this population contracts a communicable disease at time . Let , and denote, respectively, the fractions of susceptible, infected, and recovered individuals in the -th age group at (a continuous) time , so that equals the fraction of individuals in the -th age group for all . As the disease spreads across the population, susceptible individuals get infected, and infected individuals recover in accordance with the system of ODEs given by
(1) | ||||
(2) | ||||
where for each , the constant represents the rate of infection transmission from an individual in age group to an individual in age group , and denotes the recovery rate of an infected individual in age group . Hereafter, we refer to as the contact rate of age group with age group . Note that the third equation in (1) can be obtained from the first two equations simply by using the fact that for all . Also, if , the above model reduces to the classical (homogeneous and continuous-time) SIR model.
2.2 A Stochastic Epidemic Model
Let us now define a continuous-time Markov chain that describes an age-structured process of epidemic spreading occurring over a finite (atomic) population composed of individuals that are connected through a random, time-varying network .
2.2.1 Age Groups
Let denote the total population size, and let be the vertex set of the time-varying graph , so that the vertex set indexes all the individuals/nodes in the network. We assume that is partitioned into age groups and that (the number of individuals in the -th age group) scales linearly with for all . In the following, are generic age group indices.
2.2.2 State Space
The state space of our random process is the space . The network state is a tuple , where
-
(i)
denotes the disease states of the nodes in the network, i.e., for , we set , , or accordingly as node is susceptible, infected, or recovered, respectively.
-
(ii)
For , we let denote the edge state of the -th pair in the following lexicographic order of pairs of distinct nodes: . In other words, for any node pair such that , we set if there is a directed edge from to in the network , and , otherwise. For notational convenience, we let .
-
(iii)
For , we let be a binary variable whose value flips (becomes ) whenever the -th edge state gets updated (re-initialized). However, the direction of this flip (whether changes from 0 to 1 or from 1 to 0) carries no significance.
2.2.3 State Attributes
For all , we let , , and denote, respectively, the set of susceptible individuals, the set of infected individuals, and the set of recovered individuals in given that the network state is . We let and . Additionally, for every node , we let be the number of arcs from to .
2.2.4 The Markov Process
Let denote the state of the network at any time . Then we assume that is a right-continuous time-homogeneous Markov process in which every transition from a state to a state belongs to one of the following categories:
-
1.
Infection transition: This occurs when a node gets infected by a node in , while the disease states of all other nodes and the edge states of all the node pairs remain the same. In other words, , , and for all . Denoting the state-independent rate of pathogen transmission from a node in to an adjacent node in by , we note that the rate of infection transmission from any node to is . Hence, the total rate at which receives pathogens from is , assuming that different edges transmit the infection independently of each other during vanishingly small time intervals. As a result, the effective rate at which gets infected is . We denote the successor state of , where the node turns from susceptible to infected, by .
-
2.
Recovery transition: This occurs when a node recovers, i.e., , , and for all . We let denote the rate at which an infected node in (such as ) recovers. For such a transition, we denote .
-
3.
Edge update transition: This occurs when , the edge state of a node pair , is updated or re-initialized, i.e., , and for all . We let denote the edge update rate or the rate at which an edge state is updated. In addition, given that the edge state of is updated at time , the probability that (i.e., the edge exists after the re-initialization) equals , where is constant in time. Therefore, if (meaning that exists as an arc in in the network state ), then the rate of transition from to equals , whereas if , then the rate of transition from to equals . In the former case, we write , while in the latter case, we write . Note that the rate of transition from to or does not depend on .
The edge update transition of can be described informally as follows. Throughout the evolution of the pandemic, and decide whether or not to meet each other at a constant rate , i.e., their decision times form a Poisson process with rate . Each time they make such a decision, they decide to interact with probability , and they decide not to interact with probability , independently of their past decisions. The probability of interaction is assumed to scale inversely with so that the mean degree of every node is constant with respect to .
To summarize, the rate of transition from any state to any state is given by , the infinitesimal generator of the Markov chain , where for
and . In addition, we say that succeeds potentially iff .
3 MAIN RESULT
To provide a rigorous mean-field derivation of the dynamics (1), we now consider a sequence of social networks with increasing population sizes such that each network obeys the theoretical framework described in Section 2. Given a network from this sequence with population size , we let , , and denote the (random) sets of infected, susceptible, and infected individuals in the -th age group, respectively, and we let , and denote the fractions of susceptible, infected, and recovered individuals in the -th age group, respectively. As for the absolute numbers, we let , , and . Additionally, we let denote the edge set of the network at time , and we drop the superscript (n) when the context makes our reference to the -th network clear.
Another quantity that varies with is , the edge update rate. To obtain the desired mean-field limit in Theorem 1, we assume that as . To interpret this assumption, consider any pair of individuals that are in contact with each other at time during the epidemic. Since the edge state of is updated to (the state of non-existence) at a time-invariant rate of , the assumption implies that the mean interaction time of with , which is , vanishes as the population size increases. This is a possible real-world scenario, because as increases, the population density of the given geographical region increases, which could result in overcrowding and rapidly changing interaction patterns in the network. This may be especially true in the case of public places such as supermarkets and subway stations at a time when the society is already aware of an evolving epidemic. Another implication of is that the rate at which a given infected node contacts and transmits pathogens to a given susceptible node vanishes as the population size goes to (see Remark 1 for an explanation). This implication is weaker than the often-assumed condition that the rate of pathogen transmission is proportional to the reciprocal of the population size [30, 31].
We are now ready to state our main result. Its proof is based on the theory of continuous-time Markov chains and an analysis of how the disease propagation process is affected by random updates occurring in the network structure at random times (which results in Propositions 1 and 2) in addition to the proof techniques used in [30]. The proofs of all these results are available in the appendix.
Theorem 1.
Suppose that and that for every , there exist such that and . Then for each ,
on any finite time interval , where is the solution to the ODE system given by the first two equations in (1), i.e., satisfies
-
(I).
,
-
(II).
,
and is defined by .
Theorem 1 relies on the following proposition.
Proposition 1.
For each , let
be the random variable that denotes the conditional probability that a pair of nodes are in physical contact at time given the state of the network at time . Then the following equations hold for all :
-
(i).
,
-
(ii).
,
-
(iii).
,
-
(iv).
.
We point out that if then Equations (i) and (ii) have the same coefficients as (I) and (II). It is then natural to ask: how does the conditional edge probability compare to the unconditional edge probability ? The following proposition provides an answer. As we show in Remark 1, our answer helps characterize the rate at which the infection transmission rates converge to their respective limits, an analysis missing from other works such as [30] and [31].
Proposition 2.
For all , and ,
Remark 1.
Given , note that the conditional probability that a given infected node in infects a given susceptible node in during a time interval is In light of Proposition 2, this means that the associated conditional infection rate belongs to the interval
On taking expectations, we realize that the same applies to the associated unconditional infection rate as well. Hence, the total rate of infection transmission from all of to any given node of is at least and at most . Since we assume , this further implies that the concerned rate is approximately for large , thereby giving us an interpretation of the ‘contact rate’ as a normalized infection rate. That is, in the limit as , the matrix quantifies the infection transmission rates between any two age groups relative to the level of infectedness (fraction of infected persons) of the transmitting age group. Moreover, Proposition 2 also implies that the difference between the age-wise infection transmission rates and their respective mean-field limits (which exist as per Theorem 1) is .
4 A CONVERSE RESULT
The purpose of this section is to argue that the age-structured SIR dynamics does not model an epidemic well if the infection rates are high enough to be comparable to the edge update rate of the network.
Theorem 2.
Suppose and that for every , there exist such that and as . In addition, let be the solutions of the ODEs (I) and (II). Then, there exists no interval for which and on which the pairs uniformly converge in probability to the corresponding pairs in . More precisely, for every interval such that and for all and , there exists a and an such that
Proof.
Suppose, on the contrary, that there exists a time interval such that and for all and , and the following holds for all and all :
i.e., for a fixed , there exists a sequence such that
We then arrive at a contradiction, as shown below.
We first choose an (where ) and a . By our hypothesis and norm equivalence, for and for every , there exists an such that
for all and all .
We now define and we let and be any two nodes such that for arbitrary . Additionally, we let denote the (random) time elapsed between the time at which gets infected and time . We then have
(3) | ||||
(4) | ||||
(5) | ||||
(6) |
where holds because for all , and can be explained as follows: given and an infected node for some time , and given that , we know from Proposition 2 that the conditional probability of the edge existing in the network at time is at most . Also, as per the definition of our stochastic epidemic model, given that and given (and hence, also that ), the conditional rate of infection transmission from to at time is . Hence, given (and hence, that ), the conditional rate of infection transmission from to is at most . Under our modelling assumption that distinct edges transmit the infection independently of each other during vanishingly small time intervals, this means that, conditional on and , the conditional total rate at which receives infection is at most
Note that this upper bound is time-invariant and does not depend on or for any time . It thus follows that, conditional on , the rate at which gets infected is at most throughout the interval and hence, the probability that does not get infected during an interval of length is at least . This implies .
We now infer from (3) that
(7) |
Next, we lower-bound . To this end, note from Proposition 2 that . As a result, we have
which also means that
(8) |
Moreover, for any realization of , Remark 5 asserts that
for all . Since the right-hand-side above is independent of both and , it follows that
(9) |
Therefore, as a consequence of (7), (8), (9), and the union bound, we have
(10) | ||||
(11) | ||||
(12) |
This further yields,
(13) | ||||
(14) | ||||
(15) | ||||
(16) | ||||
(17) |
where the inequality is a consequence of (10) and Remark 4. Recall that , which means that the right hand side can be made smaller than by choosing large enough. Moreover (13) holds for all .
Proposition 1 now implies that for all and large enough ,
(18) |
Now, observe that for any , we have
(19) | |||
(20) | |||
(21) | |||
(22) |
Therefore, assuming that and are small enough to satisfy , (18) implies the existence of a constant such that :
Since this holds for all , we have
for all sufficiently large . Here, we observe that are bounded by the constant function , which is integrable with respect to probability measures. Therefore, are uniformly integrable. Since they converge in probability to (by hypothesis), it follows by Vitali’s Convergence Theorem that they also converge in -norm. Thus, as , thereby implying that
(23) |
Before interpreting Theorem 2, we first note that the result only applies to the time intervals on which are positive throughout the interval. Although this condition appears stringent, it is mild from the viewpoint of epidemic spreading in the real world. This is because, in practice we are only interested in time periods during which every age group has infected cases (which ensures that for all ), and most epidemics leave behind uninfected individuals (thereby ensuring that for all ). Therefore, Theorem 2 applies to all time intervals of practical interest.
Restricting our focus to such intervals, Theorem 2 asserts that, if the edge update rate does not go to with the population size, then there exists a positive lower bound on the probability of the age-wise infected and susceptible fractions differing significantly from the corresponding solutions of the age-structured SIR ODEs at one or more points of time in the considered time interval. At this point, we remark that for large populations, the edge update rate is approximately the reciprocal of the mean duration of every interaction in the network. This means that the greater the value of , the faster will be the changes that occur in the social interaction patterns of the network. Therefore, in conjunction with Theorem 1, Theorem 2 enables us to draw the following inference: the age-structured SIR model can be expected to approximate a real-world epidemic spreading in a large population accurately if and only if the social interaction patterns of the network change rapidly with time. This is more likely to be the case in crowded public places such as supermarkets and airports.
There is another way to interpret Theorems 1 and 2. Note that we have assumed that the sequence of edge states realized during the timeline of the epidemic are independent for every pair of nodes in the network. Therefore, for greater values of , the network structure becomes more unrecognizable from its past realizations. Thus, the age-structured SIR model can be expected to approximate epidemic spreading well if and only if the network is highly memoryless, i.e., if and only if the network continually “forgets” its past interaction patterns throughout the timeline of the epidemic under study.
Remark 2.
Observe from the proof of Theorem 2 that the difference between , the first derivative of the expected fraction of infected nodes in , and , the first derivative of the corresponding ODE solution , is small only if is close to 1, which happens when . Moreover, this observation is consistent with Remark 1, according to which the total infection rate from to any given susceptible node in is close to (and hence, in close agreement with the ODEs (1)) when . Along with Theorems 1 and 2, this means that the age-structured SIR model is likely to approximate real-world epidemic spreading well if and only if the infection transmission rates are negligible when compared to the social mixing rate .
Intuitively, when , the time scales (the mean duration of time) over which the concerned disease spreads from any age group to any other age group are orders of magnitude greater than the time scale over which the network is updated. As a result, the independence of the sequences of edge state updates ensures that most of the possible realizations of the network structure are attained over the time scale of infection transmission. Equivalently, from the viewpoint of the pathogens causing the disease, the effective network structure (the network topology averaged over any of the age-wise infection timescales) is close to being a complete graph. Hence, by extrapolating the existing results on mean-field limits of epidemic processes on complete graphs (such as [30]) to heterogeneous epidemic models, we can assert that the age-structured SIR ODEs are able to approximate the epidemic propagation with a high accuracy.
On the other hand, if the infection rates are too high (and hence, comparable to the social mixing rate , which is always finite in reality), the pathogens perceive a randomly generated network even on the time scale of infection transmission. Since this random network is sparse (because we assume the expected node degrees to be constant, which results in the edge probability scaling inversely with the population size), it follows that the number of transmissions occurring in any given time period is likely to be smaller than in the case of a complete graph. Thus, the age-structued SIR ODEs overestimate the rate of growth of age-wise infected fractions. This is further confirmed by the sign of the inequality in (23).
5 EMPIRICAL VALIDATION
We now validate the age-structured SIR model in the context of the COVID-19 pandemic in Japan as follows: we first estimate the model parameters using the data provided by the Government of Japan, and we then compare the trajectories generated by the model with the reference data.
5.1 Dataset
We use a dataset provided by the Government of Japan at [32]. This dataset partitions the population of the prefecture of Tokyo into age groups: 0 - 19, 20 - 39, 40 - 59, 60 - 79, and 80+ years old individuals. For each age group and each day in the year-long timeline , the dataset lists the total number of people infected in the age group until date . We denote this number by .
5.2 Preprocessing
Due to several factors, such as lack of reporting/testing on the weekends, the raw data has missing information and is contaminated with noise. Therefore, using a moving average filter with a window size of 15 days, we de-noise the raw data to obtain the estimated total number of infected individuals by day in age group , denoted by . We then estimate from the smoothed data the number of susceptible, infected, and recovered individuals in age group on day , denoted by , , and , respectively. We do this as follows: for any age group and day , we have , because the cumulative number of infections includes both active COVID-19 cases and closed cases (cases of individuals who were infected in the past but recovered/succumbed by day ). Therefore, to estimate and from , we assume that every infected individual takes exactly days to recover. This assumption is consistent with WHO’s criteria for discharging patients from isolation (i.e., discontinuing transmission-based precautions) [33] after a period involving the first 10 days from the onset of symptoms and 3 additional symptom-free days (if the patient is originally symptomatic) or after 10 days from being tested positive for SARS-CoV-2 (if the patient is asymptomatic). After the required period, the patients were not required to re-test. Under such an assumption on the recovery time, we have and . Next, we obtain by subtracting from the total population of , which is obtained from the age distribution and the total population of Tokyo.
We must mention that in the subsequent analysis, all infected individuals are considered infectious, i.e., they can potentially transmit the SARS-CoV-2 virus to their susceptible contacts. This assumption, on which the classical SIR model and all its variants are based, is consistent with the CDC’s understanding of the first wave of SARS-CoV-2 infection, which claims that every infected individual remains infectious for up to about 10 days from the onset of symptoms, though the exact duration of the period of infectiousness remains uncertain [34].
5.3 Parameter Estimation Algorithm
Before estimating the parameters of our model, we discretize the ODEs (1) with a step size of 1 day and obtain the following:
(24) | ||||
(25) | ||||
A key observation here is that these equations are linear in the model parameters. Therefore, given the sets of fractions , , and (which we obtain by implementing the data processing steps described above) for all , we can express (24) in the form of a matrix equation , where the column vector is a stack of the parameters and , the column vector is a stack of the increments , , and , and is a matrix of coefficients. Thus, solving the least-squares problem (26) gives us the best estimates of the model parameters in the mean-square sense.
(26) |
However, the values of the contact rates change as and when the patterns of social interaction in the network change during the course of the pandemic. For this reason, we assume that the pandemic timeline splits up into multiple phases, say , with the contact rates varying across phases, and we perform the required optimization separately for each phase. At the same time, we do not expect the contact rates to make quantum leaps (or falls) from one phase to the next. Therefore, for every , in the objective function corresponding to Phase we introduce a regularization term that penalizes any deviation of the optimization variables from the model parameters estimated for the previous phase (Phase ). Adding this term also ensures that our parameter estimation algorithm does not overfit the data associated with any one phase. Our optimization problem for Phase thus becomes
(27) |
where is the parameter vector estimated for Phase .
We now summarize this parameter estimation algorithm for Phase .
5.4 Phase Detection Algorithm
We now provide an algorithm that divides the timeline of the pandemic into multiple phases in such a way that the beginning of each new phase indicates a significant change in one or more of the contact rates .
Given the pandemic timeline (where denotes March 10, 2020 and denotes April 9, 2021), our phase detection algorithm outputs phase boundaries that divide into phases, namely . Central to the algorithm are the following optimization problems:
Problem (a): Unconstrained Optimization
(28) |
Problem (b): Constrained Optimization
(29) | ||||
subject to | (30) |
In these problems, denotes the start date (chosen recursively as described in Algorithm 2), is the optimization window, is the algorithm step size, denotes a -day period from day , and are obtained from by using the procedure described in Section 5.3, and is the parameter vector estimated by Problem (a). We set (days), and the quantities and are pre-determined algorithm parameters whose choice is discussed in the next subsection.
Observe that both Problem and Problem (b) result in the minimization of the mean-square error (31), where are the model-generated values (estimates) of the susceptible, infected, and recovered fractions . Also note that Problem (a) performs this minimization while ignoring all the previously estimated model parameters, whereas Problem (b) performs the same minimization while constraining to remain close to the parameter vector estimated for the period . However, if the contact rates do not change significantly around day , then the additional constraint imposed in Problem (b) should be satisfied automatically (without imposition) in Problem (a), which should in turn result in the same mean-square error for both the problems.
(31) |
Therefore, after solving Problems (a) and (b), our phase detection algorithm compares (the mean-square error for Problem (a)) with (the mean-square error for Problem (b)) as follows: using (31), the algorithm first computes and . It then compares with , a positive threshold whose choice is discussed in the next subsection. If
(32) |
then is identified as a phase boundary. Otherwise, the algorithm increments the value of by , checks whether the interval is part of the timeline , and repeats the entire procedure described above.
Finally, the algorithm merges every short phase (length days) with its predecessor by deleting the appropriate phase boundary(s). There are two reasons for this step. First, the contact rates are believed to change not instantly but with a transition period of positive duration. Second, since the data used is noisy, to avoid overfitting the data it is necessary for the number of data points per phase (given by times the number of days per phase) to significantly exceed , the number of model parameters to be estimated per phase.
We now provide the pseudocode for the entire algorithm. Observe that Problems (a) and (b) are both convex optimization problems. This enables us to use the Embedded Conic Solver (ECOS) [35] of CVXPY [36, 37] to implement our algorithm.
Input: for all and
Output: Set of phase boundaries
5.5 Selection of Algorithm Parameters
We now explain our parameter choices for the algorithms described above.
5.5.1 Phase Detection Algorithm
As mentioned earlier, for Algorithm 2, we set days and the optimization window days. This ensures that the optimization window is large enough for the number of model parameters to be significantly smaller than the number of data points used to estimate these parameters in Problems (a) and (b). In addition, we set , and for the following reasons:
-
1.
: If both and are sub-intervals of the same phase, then the same set of contact rates (and hence the same parameter vector ) should apply to the network during both the time intervals.
- 2.
5.5.2 Parameter Estimation Algorithm
We set in (27). This small but non-zero value is consistent with our belief that around every phase boundary, contact rates change gradually but significantly during a transition period involving the phase boundary.
5.6 Results
We now present the results of implementing both the algorithms on our chosen dataset.
5.6.1 Phase Detection
Algorithm 2 detects the following phases.
Phase | From | To | Corresponding Events |
---|---|---|---|
1 | Mar 10 2020 | Mar 28 2020 | Closure of Schools |
2 | Mar 28 2020 | April 23 2020 | Issuance of State of Emergency |
3 | April 23 2020 | May 20 2020 | |
4 | May 20 2020 | Jun 22 2020 | |
5 | Jun 22 2020 | Jul 24 2020 | Summer Vacation |
6 | Jul 24 2020 | Aug 25 2020 | Obon, Summer Vacation |
7 | Aug 25 2020 | Sep 23 2020 | Summer Vacation |
8 | Sep 23 2020 | Oct 20 2020 | “Go to Travel” Campaign |
Relaxation of Immigration Policy | |||
9 | Oct 20 2020 | Nov 14 2020 | “Go to Eat” Campaign |
“Go to Travel” Campaign | |||
10 | Nov 14 2020 | Dec 19 2020 | |
11 | Dec 19 2020 | Jan 12 2021 | Issuance of State of Emergency |
Winter Vacation | |||
12 | Jan 12 2021 | Feb 07 2021 | |
13 | Feb 07 2021 | Apr 09 2021 |
Although some of the detected phases can be accounted for by identifying changes in governmental policies and major social events, many of them seem to result from changes in social interaction patterns that cannot be explained using public information sources (such as news websites). However, this is consistent with out intuition that social behavior is inherently dynamic – it displays significant changes even in the absence of government diktats and important calendar events. Moreover, except for the first phase, the length of every phase is at least 25 days, which points to the likely scenario that it takes at least 3 to 4 weeks for the contact rates to change significantly. This could be true because social behavior is often unorganized. In particular, the interaction patterns of any one individual are often not in synchronization with those of others.
Another noteworthy inference to be drawn from Table 1 and Figure 1 is that policy changes initiated by governments have a delayed effect at times. For example, the “Go to Travel” and the “Go to Eat” campaigns, launched between mid-September and mid-November (Phases 8 and 9), seem to have caused a spike in daily case counts in the subsequent phases (Phases 10 and 11). Likewise, the State of Emergency issued in Phase 11 seems to have come to fruition in Phase 12 and its effects appear to have remained until the last phase (Phase 13).
5.6.2 Parameter Estimation and Its Implications
Figure 1 below plots the original and the model-generated fractions of infected individuals in each age group as functions of time.

Figure 2 plots the estimated contact rates and labels the 10 most significant ones among the 25 rates.
As seen in Figure 1, three COVID-19 surges or “waves” occur during the considered timeline. For each wave, we explain below the corresponding contact rate variations and their implications with the help of the mobility data of Tokyo (Figure 3) collected by Google [40].


The period in which state of emergency is issued is highlighted in red.
The First Wave (March 2020 - June 2020, Phases 1 - 3)
This wave corresponds to a rapid surge in daily cases across the world followed by various governmental measures such as issuance of national emergencies, tightening of immigration policies, home quarantines, and school closures. In Japan, the national emergency consisted of various measures such as restrictions on service times in restaurants and bars, enforcement of work from home, and a limit on the number of people attending public events. As a result of these measures, the mobility of workplaces, retail and recreation, and transit stations dropped dramatically in April 2020 and remained low for over a month (Figure 3).
This drop is reflected in our simulation results (Figure 2), which show that the three greatest contact rates decreased steadily from April to June. However, Figure 2 also shows that contact rates from the age group 60-79 to most other age groups (shown in blue) remained remarkably high throughout the timeline . This may be because most people admitted to nursing homes are aged above 60 and frequently come in contact with the relatively younger care-taking staff. More strikingly, the contact rate from age group 60-79 to age group 80+ is consistently high. This could be because there is a significant number of married couples with members from both these age groups (thereby resulting in a high value of for ) and because the age group 80+ has the lowest immunity levels, which leads to a large effective (infection rate) for .
The Second Wave (July 2020 - September 2020, Phases 4 - 7)
The most intriguing aspect of the second wave is that the wave subsided without any significant governmental interventions (such as the issuance of a nationwide emergency). To explain this phenomenon, some researchers point out that (i) the rate of PCR testing increased in July and thus more infections were detected in the first few weeks of the second wave, and (ii) people’s mobility decreased in August during the Japanese summer vacation period called “Obon” [39]. As we can infer from Figure 3, this decrease in mobility occurs primarily at workplaces and transit stations [39].
Besides Figure 3, our simulation results provide some insight into the second wave. Figure 1 shows that the contact rates from age group 60-79 to other age groups do not show any increase during the first few weeks of the wave. However, the intra-group contact rate of the age group 20-39 increases rapidly before this period and drops significantly in August, corresponding to a decrease in daily cases. This strongly suggests that the social activities of those aged between 20-39 played a key role in the second wave. Meanwhile, contact rates from the age group 60-79 decreased after the first wave, possibly because of an increase in the proportion of quarantined individuals among the elderly, which in turn could have resulted from an increased public awareness of older age groups’ higher susceptibility to the virus.
The Third Wave (October 2020 - January 2021, Phases 8 - 11)
This wave was the most severe of the three because in October, a policy promoting domestic travel (the “Go to Travel” campaign) was implemented in the Tokyo prefecture and eating out was promoted as well (as part of the “Go to Eat” campaign). In addition, Japan started relaxing its immigration policy in October. [39] points out that the “major factors for this rise include the government’s implementation of further policies to encourage certain activities, relaxed immigration restrictions, and people not reducing their level of activity”. This observation is supported by Figure 3, which shows that there is no drop in mobility in any category during the third wave. As a result, daily infection counts dropped only after the second state of emergency was issued by the government on January 7, 2021.
In agreement with these observations are our simulation results (Figure 1), which show that the age group 60-79 remained the most infectious throughout the third wave, and that the contact rates from the age group 20-39 gradually increased in the early weeks of the wave. This was followed by a remarkable decrease in the intra-group contact rate of the age group 60-79 from mid-January onwards.
Comparing the Age Groups on the Basis of Infectiousness and Susceptibility
It is evident from Figure 2 that among all the five age groups, members of the youngest age group (0-19) are the least likely to contract COVID-19. This validates the current understanding of the scientific community that children and teenagers are more immune to the disease than adults. At the opposite extreme, the age groups 80+ and 20-39 appear to be the most vulnerable, possibly because members of the former group have the lowest immunity levels and the latter group exhibits the highest levels of mobility and social activity.
Besides throwing light on how the likelihood of receiving infection varies across age groups, Figure 2 also throws light on how the likelihood of transmitting the infection varies across age groups. From the figure, the two most infectious age groups are clearly 60-79 and 20-39. Surprisingly, the age group 80+ is found to be less infectious than the group 60-79, perhaps because of the lower social mobility of the former. The figure also shows that the age groups 0-19, 40-59 and 80+ are remarkably less infectious than the other two age groups. However, we need additional empirical evidence to validate these findings, and it would be interesting to see whether our inferences are echoed by future empirical studies.
6 CONCLUSION AND FUTURE DIRECTIONS
We have analyzed the age-structured SIR model of epidemic spreading from both theoretical and empirical viewpoints. Starting from a stochastic epidemic model, we have shown that the ODEs defining the age-structured SIR model are the mean-field limits of a continuous-time Markov process evolving over a time-varying network that involves random, asynchronous interactions if and only if the social mixing rate grows unboundedly with the population size. We have also provided a lower-bound on the associated convergence rates in terms of the social mixing rate. As for empirical validation, we have proposed two algorithms: a least-square method to estimate the model parameters based on real data and a phase detection algorithm to detect changes in contact rates and hence also the most significant social behavioral changes that possibly occurred during the observed pandemic timeline. We have validated our model empirically by using it to approximate the trajectories of the numbers of susceptible, infected, and recovered individuals in the prefecture of Tokyo, Japan, over a period of more than 12 months. Our results show that for the purpose of forecasting the future of the COVID-19 pandemic and designing appropriate control policies, the age-structured SIR model is likely to be a strong contender among compartmental epidemic models.
Our analysis, however, has a few limitations. First, it is not clear whether the large number of phases detected by Algorithm 2 indicates rapidly changing social interaction patterns or simply that our model is unable to approximate the pandemic over timescales significantly longer than a month. Second, the outputs of our algorithms have a few surprising implications that are as yet unconfirmed by independent empirical studies. For example, the estimated contact rates indicate that the age group 60-79 is consistently more infectious than the age group 20-39, a finding that is inconsistent with the widely held belief that younger age groups are significantly more mobile than the older ones. Such apparent anomalies highlight the need for age-stratified mobility datasets that would enable further investigation into the dynamic interplay between social behavior and epidemic spreading.
References
- [1] Worldometer, “COVID-19 Coronavirus Pandemic,” World Health Organization, www.worldometers.info/coronavirus, last accessed 2021, November 11.
- [2] N. Ferguson, D. Laydon, G. Nedjati-Gilani, N. Imai, K. Ainslie, M. Baguelin, S. Bhatia, A. Boonyasiri, Z. Cucunubá, G. Cuomo-Dannenburg, et al., “Report 9: Impact of non-pharmaceutical interventions (NPIs) to reduce COVID-19 mortality and healthcare demand,” Imperial College London, vol. 10, no. 77482, pp. 491–497, 2020.
- [3] F. E. Alvarez, D. Argente, and F. Lippi, “A simple planning problem for COVID-19 lockdown,” tech. rep., National Bureau of Economic Research, 2020.
- [4] D. Acemoglu, V. Chernozhukov, I. Werning, and M. D. Whinston, “A multi-risk SIR model with optimally targeted lockdown,” NBER working paper, no. w27102, 2020.
- [5] K. Chatterjee, K. Chatterjee, A. Kumar, and S. Shankar, “Healthcare impact of COVID-19 epidemic in India: A stochastic mathematical model,” Medical Journal, Armed Forces, India, vol. 76, no. 2, pp. 147–155, 2020.
- [6] P. E. Paré, J. Liu, C. L. Beck, A. Nedić, and T. Başar, “Multi-competitive viruses over time-varying networks with mutations and human awareness,” Automatica, vol. 123, p. 109330, 2021.
- [7] F. Bullo, “Current epidemiological models: Scientific basis and evaluation,” Health, vol. 4, p. 14, 2020.
- [8] W. O. Kermack and A. G. McKendrick, “A contribution to the mathematical theory of epidemics,” Proceedings of the Royal Society of London, Series A, vol. 115, no. 772, pp. 700–721, 1927.
- [9] T. G. Kurtz, “Solutions of ordinary differential equations as limits of pure jump Markov processes,” Journal of Applied Probability, vol. 7, no. 1, pp. 49–58, 1970.
- [10] M. Benaim and J.-Y. Le Boudec, “A class of mean field interaction models for computer and communication systems,” Performance Evaluation, vol. 65, no. 11-12, pp. 823–838, 2008.
- [11] E. Volz, “SIR dynamics in random networks with heterogeneous connectivity,” Journal of Mathematical Biology, vol. 56, no. 3, pp. 293–310, 2008.
- [12] L. Decreusefond, J.-S. Dhersin, P. Moyal, V. C. Tran, et al., “Large graph limit for an SIR process in random network with heterogeneous connectivity,” The Annals of Applied Probability, vol. 22, no. 2, pp. 541–575, 2012.
- [13] J. Mossong, N. Hens, M. Jit, P. Beutels, K. Auranen, R. Mikolajczyk, M. Massari, S. Salmaso, G. S. Tomba, J. Wallinga, et al., “Social contacts and mixing patterns relevant to the spread of infectious diseases,” PLoS Medicine, vol. 5, no. 3, p. e74, 2008.
- [14] K. Prem, A. R. Cook, and M. Jit, “Projecting social contact matrices in 152 countries using contact surveys and demographic data,” PLoS Computational Biology, vol. 13, no. 9, p. e1005697, 2017.
- [15] P. Klepac, A. J. Kucharski, A. J. Conlan, S. Kissler, M. L. Tang, H. Fry, and J. R. Gog, “Contacts in context: large-scale setting-specific social mixing matrices from the BBC Pandemic Project,” MedRxiv, 2020.
- [16] I. Voinsky, G. Baristaite, and D. Gurwitz, “Effects of age and sex on recovery from COVID-19: Analysis of 5769 Israeli patients,” Journal of Infection, vol. 81, no. 2, pp. e102–e103, 2020.
- [17] G. Ellison, “Implications of heterogeneous SIR models for analyses of COVID-19,” tech. rep., National Bureau of Economic Research, 2020.
- [18] D. Acemoglu, V. Chernozhukov, I. Werning, and M. D. Whinston, “Optimal targeted lockdowns in a multigroup SIR model,” American Economic Review: Insights, vol. 3, no. 4, pp. 487–502, 2021.
- [19] D. Baqaee, E. Farhi, M. J. Mina, and J. H. Stock, “Reopening scenarios,” tech. rep., National Bureau of Economic Research, 2020.
- [20] A. A. Rampini, “Sequential lifting of COVID-19 interventions with population heterogeneity,” tech. rep., National Bureau of Economic Research, 2020.
- [21] C. A. Favero, A. Ichino, and A. Rustichini, “Restarting the economy while saving lives under COVID-19,” 2020. CEPR Discussion Paper No. DP14664.
- [22] R. Parasnis, A. Sakhale, R. Kato, M. Franceschetti, and B. Touri, “A Case for the Age-Structured SIR Dynamics for Modelling COVID-19,” in 2021 60th IEEE Conference on Decision and Control (CDC), pp. 5508–5513, IEEE, 2021.
- [23] A. V. Tkachenko, S. Maslov, A. Elbanna, G. N. Wong, Z. J. Weiner, and N. Goldenfeld, “Time-dependent heterogeneity leads to transient suppression of the COVID-19 epidemic, not herd immunity,” Proceedings of the National Academy of Sciences, vol. 118, no. 17, 2021.
- [24] J. Dolbeault and G. Turinici, “Heterogeneous social interactions and the COVID-19 lockdown outcome in a multi-group SEIR model,” Mathematical Modelling of Natural Phenomena, vol. 15, p. 36, 2020.
- [25] S. Contreras, H. A. Villavicencio, D. Medina-Ortiz, J. P. Biron-Lattes, and Á. Olivera-Nappa, “A multi-group SEIRA model for the spread of COVID-19 among heterogeneous populations,” Chaos, Solitons & Fractals, vol. 136, p. 109925, 2020.
- [26] A. Viguerie, G. Lorenzo, F. Auricchio, D. Baroli, T. J. Hughes, A. Patton, A. Reali, T. E. Yankeelov, and A. Veneziani, “Simulating the spread of COVID-19 via a spatially-resolved Susceptible–Exposed–Infected–Recovered–Deceased (SEIRD) model with heterogeneous diffusion,” Applied Mathematics Letters, vol. 111, p. 106617, 2021.
- [27] J. E. Harris, “Data from the COVID-19 epidemic in Florida suggest that younger cohorts have been transmitting their infections to less socially mobile older adults,” Review of Economics of the Household, vol. 18, no. 4, pp. 1019–1037, 2020.
- [28] M. Giagheddu and A. Papetti, “The macroeconomics of age-varying epidemics,” Available at SSRN 3651251, 2020.
- [29] A. Janiak, C. Machado, and J. Turén, “COVID-19 contagion, economic activity and business reopening protocols,” Journal of Economic behavior & Organization, vol. 182, pp. 264–284, 2021.
- [30] B. Armbruster and E. Beck, “Elementary proof of convergence to the mean-field model for the SIR process,” Journal of Mathematical Biology, vol. 75, no. 2, pp. 327–339, 2017.
- [31] P. L. Simon and I. Z. Kiss, “From exact stochastic to mean-field ODE models: a new approach to prove convergence results,” The IMA Journal of Applied Mathematics, vol. 78, no. 5, pp. 945–964, 2013.
- [32] Tokyo Metropolitan Government, “COVID-19 Information Website,” https://stopcovid19.metro.tokyo.lg.jp/en, last accessed 2021, September 28.
- [33] W. H. Organization, “Criteria for releasing COVID-19 patients from isolation,” https://www.who.int/news-room/commentaries/detail/criteria-for-releasing-covid-19-patients-from-isolation, last accessed 2021, September 28.
- [34] C. for Disease Control and Prevention, “Clinical Questions about COVID-19: Questions and Answers / When is someone infectious?,” https://www.cdc.gov/coronavirus/2019-ncov/hcp/faq.html#Transmission, last accessed 2021, September 28.
- [35] A. Domahidi, E. Chu, and S. Boyd, “ECOS: An SOCP solver for embedded systems,” in 2013 European Control Conference (ECC), pp. 3071–3076, IEEE, 2013.
- [36] S. Diamond and S. Boyd, “CVXPY: A Python-embedded modeling language for convex optimization,” Journal of Machine Learning Research, vol. 17, no. 83, pp. 1–5, 2016.
- [37] A. Agrawal, R. Verschueren, S. Diamond, and S. Boyd, “A rewriting system for convex optimization problems,” Journal of Control and Decision, vol. 5, no. 1, pp. 42–60, 2018.
- [38] Wikipedia, “Timeline of the COVID-19 pandemic in Japan,” last accessed 2021, September 28.
- [39] K. Karako, P. Song, Y. Chen, W. Tang, and N. Kokudo, “Overview of the characteristics of and responses to the three waves of COVID-19 in japan during 2020-2021,” BioScience Trends, 2021.
- [40] Google, “COVID-19 Community Mobility Reports,” https://www.google.com/covid19/mobility/, last accessed 2021, September 28.
- [41] S. P. Lalley, “Continuous Time Markov Chains,” Lecture Notes on Stochastic Processes II, 2012. galton.uchicago.edu/~lalley/Courses/313/ContinuousTime.pdf.
Appendix
Our first aim is to prove Proposition 1, which is based on the Lemma 1. This lemma describes a known property of continuous-time Markov chains, but we prove it nevertheless. The proof is based on the concept of jump times, defined below.
Definition 1 (Jump Times).
The jump times of the Markov chain are the random times defined by and for all .
Note that jump times are simply the times at which the Markov chain jumps or transitions to a new state.
Lemma 1.
Let and let . Given that , the conditional probability that more than one state transitions occur during is .
Proof.
Let be any two states such that and potentially succeed and , respectively. Also, let be the embedded jump chain of (where ). Then, given that , , and , the holding times and are conditionally independent exponential random variables with parameters and , respectively. Therefore, given that the original Markov chain makes its first and second transitions from to and from to respectively, the conditional probability that both of these transitions occur during is given by
(33) | |||
(34) | |||
(35) | |||
(36) | |||
(37) |
Therefore, , which is at most , is . Hence, given that , the conditional probability that at least two state transitions occur during is . By time-homogeneity, this means the following: given that , the conditional probability that at least two state transitions occur during is .
Proof of Proposition 1
Proof.
We derive the equations one by one.
Proof of (i):
Consider any state . Then, by the definition of , for any and , we have
(38) |
We now use (38) to evaluate the probability of the event . To this end, let denote the event that exactly nodes in a given set recover during (i.e., there exist exactly indices in such that and ). Similarly, let denote the event that exactly nodes in get infected during . Then,
(39) | |||
(40) | |||
(41) | |||
(42) | |||
(43) | |||
(44) | |||
(45) |
where is a straightforward consequence of Lemma 1, and follows from (38). Since this holds for all , we have
(46) |
where , , and stand for , , and , respectively. Since and are determined by , we may express (46) as
(47) |
As a result, we have
(48) |
At this point we note that
We thus have the following for :
(49) | ||||
(50) | ||||
(51) | ||||
(52) |
where (a) follows from Lemma 1. Taking expectations on both sides of (49) and dividing the resulting relation by now yields
(53) |
where we used that . On letting and then dividing both the sides of (53) by , we obtain (i).
Proof of (ii):
Observe that for any , we have
(54) | ||||
(55) | ||||
(56) | ||||
(57) | ||||
(58) | ||||
(59) | ||||
(60) | ||||
(61) |
Proof of (iii):
Observe that when , we have . Therefore,
Taking expectations on both sides, dividing by , letting , and dividing both sides by yields (iii).
Proof of (iv):
Observe that if , we have , and if , we have .
Thus,
On substituting the probabilities above with the expressions derived earlier, taking expectations on both sides, dividing by and letting , we obtain (iv).
Lemma 2.
Let and be any two nodes, let be any time, and let be the random variable such that is the time at which is updated for the last time during the interval . Then the random variables and are independent.
Proof.
For , let denote the number of times is updated in the open interval , and let denote the zero-probability event that is updated at time . Note that for all , which means that the rate at which is updated is time-invariant and independent of the network state. This means that the sequence of times at which is updated is a Poisson process, which further means that the updates of occurring in disjoint time intervals are independent. It follows that is a Poisson random variable (with mean ) that is independent of and . As a result,
where (a) follows from the definition of , follows from Bayes’ rule, and follows from the model definition (Section 2). Thus, and do not depend on , which means that and are independent.
The proof of Proposition 2 is based on the concepts of transition sequences and agnostic transition sequences, which we define below.
Definition 2 (Transition Sequence).
Consider any time , integer , tuples , and times . Let denote the event that the embedded jump chain satisfies and for all , and . Then is said to be a transition sequence for the time interval .
Note that if is a transition sequence for , then for every tuple , we either have or .
Definition 3 (-Complement).
Let . Then the -complement of , denoted by , is defined by
Definition 4 (-Agnostic Transition Sequence).
Let , and let be a transition sequence for a time interval . Further, define
Then the -agnostic transition sequence for is the event , where , defined by
is called the -complement of .
Given that occurs, denotes the time at which the edge state of is updated for the last time during the time interval (note that, if the edge state of is not updated during , then ). Therefore, the only difference between and is that the last edge state of to be realized during the interval is different for and . Stated differently, if , then , and vice-versa. As a result, the event is -agnostic in that the occurrence of this event does not provide any information about the edge state of at time .
Note that if is a transition sequence, then , , and are all zero-probability events. We now approximate these events with the help of suitable positive-probability events.
Definition 5 (-Approximation).
Let be a transition sequence. Then, for a given , the -approximation of is the event , where , , and . Also, the -approximation of is the event (where is the -approximation of ).
The following lemma evaluates the probability of occurrence of a -approximation event.
Lemma 3.
Let be a transition sequence. Then for all sufficiently small , the ratio equals
where , , and for each .
Proof.
Observe that
where follows from the strong Markov property and the fact that jump times are stopping times, follows from Proposition 3.2 of [41], follows from the fact that and (which equals ) are conditionally independent given (see Proposition 3.1 of [41]), and follows from the following two facts:
-
1.
is conditionally exponentially distributed with mean given that .
-
2.
For the embedded jump chain, the probability of transitioning from to is .
For the rest of the appendix, let and be any two nodes, let be any time instant, let be the random variable such that is the time at which is updated for the last time during the interval , and let , where is the time at which gets infected.
Lemma 4.
The PDF of has as its support and is given by
where is the Dirac-delta function.
Proof.
The definition of implies that the support of its PDF is . To derive the required closed-form expression for this PDF, recall that for all , which means that the edge state of is updated at a constant rate of at all times. Therefore, for any , the quantity (the probability that is not updated during ) is given by . However, , implying that . Hence, the CDF of is for all , and . Taking the first derivative of this CDF now yields the required expression for .
To prove the next lemma, we need the notion of agnostic superstates, which is defined below.
Definition 6 (-Agnostic Superstate).
Given a node pair , a collection of states is an -agnostic superstate if can be expressed as for a pair of states satisfying and for all .
Note that an -agnostic superstate specifies the disease states of all the nodes and the edge states of all the node pairs except .
Definition 7 (-Agnostic Jump Times).
Given , the -agnostic jump times of the chain , denoted by , are defined by and for all .
Note that and that the -agnostic jump times of are the jump times of the chain at which the edge state of is not updated.
Lemma 5.
is independent of .
Proof.
Note that is a function of , the time at which gets infected. Hence, it suffices to prove that is independent of .
Consider now any and note that . To examine the probability of , we let denote the set of all the events of the form (where are -agnostic superstates satisfying for all ) that satisfy , and we observe that .
We now examine for an arbitrary . Pick any and . Note that implies that . In view of our definition of , this means that , which possibly depends on the disease states and on the edge states , does not depend on . We next observe that, by the definitions of -agnostic states and jump times, none of the possible transitions from to involves an edge state update for , which means that the values of both and are preserved in such transitions. Therefore, for every , there exists at most one state that potentially succeeds . For such a state , the transition rate , given by
does not depend on or on , because implies that . It follows that the rate at which the Markov chain transitions from to a state in , given by , is time-invariant and takes the same value for every . This means that, as long as the Markov chain does not leave the -agnostic superstate , the rate at which the chain transitions to remains the same regardless of transitions within . We can express this formally as
for all and , where
denotes the time-invariant rate of transitioning from to .
By Markovity, this implies that222In this paper, conditioning an event on means conditioning on every for , i.e., conditioning H on the trajectory traced by the Markov chain during the interval and not just on the random set of tuples . Conditioning on the set is not sufficient because sets, by definition, are unordered.
for all and , which means that the conditional rate at which the chain leaves time is independent of the history . Since are determined by , it follows that
for all and . Now, let be the (random) time elapsed between time and the first of the updates of that occur after time and result in . Then, the following holds for all , , and sufficiently small :
where and follow from Markovity and the fact that for all . Now, let and for all . Then, since are stopping times, similar arguments can be used to show the following for all and all :
Similarly, if we let and for all , then we can show that for all and all ,
As a result, we have the following for all and all :
i.e.,
for all . Now, observe that if we are given , then are determined by a subset of the random variables and this subset is random but almost surely finite. Hence, the above limit implies the following for all and :
Moreover, since the above arguments remain valid if we replace with an arbitrary -agnostic superstate , we can generalize the above to
for all -agnostic superstates . It follows that
(70) |
for all and all . This means that, given for some , the random quantity , which is the duration of time spent by the Markov chain in , is conditionally exponentially distributed with rate and it is conditionally independent of . Besides, the above deductions also imply the following: given and given that the chain exits from state at time , the conditional probability that it enters at time is
By invoking Markovity in the preceding arguments, the above can be generalized to
(71) |
which implies that
Equivalently,
for all and . Hence,
(72) |
Since the entire analysis above holds for all , we have the following for all :
(73) | ||||
(74) | ||||
(75) | ||||
(76) | ||||
(77) | ||||
(78) | ||||
(79) |
where is a consequence of the strong Markov property and the fact that are stopping times, and (b) follows from (70) and (72). Since are arbitrary and since the above expression is independent of , we have shown that for the event
we have . As a result,
where and hold because the definition of implies that is a collection of disjoint events. Since and since are disjoint events, it follows that . Moreover, since is arbitrary, this means that is independent of . Finally, since and are functions of and , respectively, it follows that and are independent.
Remark 3.
Observe that in the proof of Lemma 5, implies that the event is independent of (since and since the initial state is assumed to be non-random). Note that this is true for all the choices of -agnostic superstates that satisfy and hence also for all that satisfy , where is an arbitrary -agnostic superstate. Now, let us by the set of all satisfying , we have
Then, by the preceding arguments we have
(80) | |||
(81) | |||
(82) | |||
(83) | |||
(84) | |||
(85) |
which shows that is independent of . Since , it follows that is independent of . As a consequence of this observation, the fact that is an arbitrary -agnostic superstate and the fact that is an arbitrary non-negative number, we have that are independent of , where denotes the -agnostic superstate of the chain at time .
In order to state the remaining lemmas, we need to introduce some additional notation. For two nodes , we let denote the event that transmits pathogens to at time . For a given time interval , we let . The complement of this event is denoted by . For two given node sets , we use to denote the event that some node(s) of infect(s) one or more nodes in at time .
We now provide a sequence of lemmas that we later use to prove Proposition 2.
Lemma 6.
Suppose , , , and such that . Given that and that for all , the conditional probability that neither recovers nor infects during the interval is , where is the Kronecker delta.
Proof.
Let . Also, let . Since the rate of infection transmission from to at time is , we have the following for all :
On the other hand, denoting the event that recovers during by , we have
Similarly, if we let denote the event that the edge state flips (i.e., changes from to ) during , we have . As a result, we have
where (a) follows from Lemma 1 and the Inclusion-Exclusion principle. Since this holds for all , the above implies that
Now, consider any . By replacing with in the above relation, we obtain
where is the event that recovers during and is the event that flips during . Therefore, on setting for an arbitrary , it follows that
(86) | ||||
(87) | ||||
(88) | ||||
(89) | ||||
(90) | ||||
(91) | ||||
(92) | ||||
(93) |
where (a) follows from the following observation: for any , Markovity implies that
which further implies that
Now, since (86) holds for all , it follows that
(94) | ||||
(95) | ||||
(96) |
Similarly, we can show that
(97) |
Since the above holds for all , it follows that
which proves the lemma.
Lemma 7.
Let denote the total duration of time for which the edge exists in the network during . Then, for all and all , we have
where we define for all . In other words, iff there exists an such that for all .
Proof.
We first show that is conditionally independent of given and :
(98) | ||||
(99) | ||||
(100) | ||||
(101) | ||||
(102) |
where and hold because is not updated during the interval , and (b) follows from the modeling assumption that the probability of the edge existing in the network following an edge state update is (independent of the past states ), the fact that almost surely, and from the observation that is an update time for given .
In view of (98), the definitions of , , and imply that
(103) | ||||
(104) | ||||
(105) | ||||
(106) | ||||
(107) |
where the last step holds because for all satisfying , which implies that, given and any other conditioning event, node recovers during at a constant rate of independently of all past edge states and past disease states (and therefore independently of past transmissions as well). Hence, and are conditionally independent given and .
We now evaluate the right-hand side of (103) as follows. Let denote the (random) number of times flips (changes) during , and let the times of these changes be . We assume that is even (as the case of being odd is handled similarly) and that for (the case for is handled similarly). Then, for a given and a collection of times , we have iff . Suppose this condition holds. Then, observe that
(108) | ||||
(109) | ||||
(110) | ||||
(111) | ||||
(112) | ||||
(113) | ||||
(114) |
where follows from Lemma 6, and follows from the following fact: since the definition of our epidemic model implies that the rate of pathogen transmission from to at any time instant depends only on and the disease state of at time , transmission events corresponding to disjoint time intervals are conditionally independent if we are given and the disease state of as functions of time.
On the other hand, we have
(115) | ||||
(116) | ||||
(117) | ||||
(118) |
where the second equality holds because our model assumes that the rate of recovery of an infected node is time-invariant and independent of all the edge states and the disease states of other nodes (precisely, for all such that ).
Observe that in the above proof, given that and that , uniquely determines . Therefore, as an implication of the above proof, we have
The dependence on the random variable holds because is a function of . By invoking Markovity, this result can be extended to
which is equivalent to the following lemma.
Lemma 8.
Let denote the total duration of time for which the edge exists in the network during . Then, for all , we have
Lemma 9.
Proof.
We first examine the following conditional probability for an arbitrary -agnostic superstate :
To begin, note that the proof of Lemma 5, Remark 3, and the fact that is a function of together imply that
and that
(119) |
Next, for the event , we have
(120) | ||||
(121) | ||||
(122) |
where and follow from our modelling assumption that for all satisfying , which means that the recovery time of depends only on the time of infection of and is conditionally independent of all other disease states and all the edge states. Similarly, we have
(123) | ||||
(124) | ||||
(125) |
where follows from our modelling assumptions, which imply that the rate of infection transmission along an edge depends only on the edge state of the transmitting edge and the disease state of the transmitting node and is conditionally independent of other disease states and edge states (which are captured by the -agnostic superstate of the chain) and follows from Lemma 8. Note that is a function of and hence also of .
It remains for us to analyze
To do so, we first let denote the time of the first -agnostic jump to occur after gets infected, i.e., , and we note the following: given the conditioning events and variables above (including the event that does not infect during ), the total conditional rate at which receives pathogens at any time is
which is determined uniquely by , the -agnostic superstate of the chain at time . Therefore, this rate is conditionally independent of for any . Similarly, for all age groups , given the conditioning events and variables above, the conditional rate at which a node recovers, which equals , and the total conditional rate at which a node receives pathogens, which equals , are both conditionally independent of given that . Therefore, by using arguments similar to those made in the proof of Lemma 5, we can show that is conditionally independent of given the rest of the conditioning events and variables. Moreover, by repeating the above for subsequent -agnostic jumps, we can generalize this conditional independence assertion to , which means that
(127) |
Combining (119), (120), (123) and (Proof.) now yields
Summing both the sides of the above equation over the space of all -agnostic superstates gives
(128) | ||||
(129) | ||||
(130) |
Here, we recall from our earlier arguments that
In light of (Proof.), this means that
Dividing both the sides of this equation by gives
where the last step holds because the summation is independent of given that . We have thus shown the following: given , and , the event is conditionally independent of . Since and are functions of , the assertion of the lemma follows.
Proof of Proposition 2
Before we prove Proposition 2, we recall that for any transition sequence on a time interval , the index indexes the transition in which is updated for the last time during given that occurs. We now define another similar index below:
Observe that indexes the transition in which gets infected given that occurs.
Proof.
Consider any realization of , and let be the set of all the transition sequences for that result in the occurrence of , so that .
Consider now any pair of nodes (so that we have and in the event that ), and note that for any transition sequence , we have , because both and involve the same node recoveries and disease transmissions (all of which occur along edges other than ). Therefore, for each , and it follows that .
Hence, we can derive bounds on (defined to be ) by bounding . To this end, we pick and , and apply Bayes’ rule to as follows:
(131) | ||||
(132) | ||||
(133) |
where holds because , which is the probability that the edge exists in the network after the last of the updates of to occur during .
We now estimate . Note that if is small enough, either or . Assume w.l.o.g. that (equivalently, ), and observe that
(134) | ||||
(135) |
Thus, the next step is to evaluate . To do so, suppose with , , , for some , and for some . Then the assumption implies that and hence also that . As a result,
It now follows from Lemma 3 that
(136) |
where and for all , and for all , , , and .
The above definitions imply that . To evaluate for , observe that by the definition of , we have for . Moreover, the facts and together imply that for all . Hence, for all . It now follows from the definition of (Section 2) that for all . Thus, .
To relate to , note that implies that and hence also that for . Since for all , we have
for all such that , and
for all such that . As a result,
(137) |
Moreover, using the definition of one can easily verify that regardless of whether the network is in state or in state , the rates of infection of nodes in , the recovery rates of nodes in , and the edge update rate (which is ) are the same. Given that for all , it now follows from (137) that
(138) |
Note that these bounds hold for all satisfying and . We now recall that is the difference between and the time at which is updated for the last time during , so that we have whenever occurs. Likewise, on . Therefore, (140) holds for all satisfying . As a result, we have
Since as argued earlier, it follows that
(141) |
Observe that , which means that
(142) | ||||
(143) | ||||
(144) |
Furthermore, since the above bounds do not depend on , we can remove the conditioning on to obtain
(145) |
Consequently,
where the second inequality follows from Lemmas 11 and 12. Since is an arbitrary realization of , the assertion of the proposition follows.
Remark 4.
The proof of Proposition 2 enables us to make a stronger statement about the conditional probability of being in contact with at time . Indeed, consider (141) and observe that it holds for all realizations of that satisfy and . It follows that
holds for all node pairs . Equivalently, the following holds for all :
Lemma 10.
Let . Then
Proof.
Consider , which denotes the duration of time for which is in contact with during the interval . Then, for any , we have
(146) | ||||
(147) | ||||
(148) | ||||
(149) | ||||
(150) | ||||
(151) | ||||
(152) | ||||
(153) |
where (a) follows from the fact that and from the observation that
the proof of which parallels the proof of Lemma 7, and and follow from the fact that
holds for all , the proof of which also parallels the proof of Lemma 7.
We now eliminate from (146). To do so, we first note the following: given that , the random variable is by definition conditionally independent of (the time of the last edge update of during ) because the edge update process for is a Poisson process and hence, for any collection of disjoint time intervals, the times at which is updated during the intervals are independent of each other. Since and since , it follows that
(154) | ||||
(155) | ||||
(156) | ||||
(157) | ||||
(158) |
and likewise,
Therefore, taking conditional expectations on both sides of (146) (w.r.t. the PDF ) yields
which proves the upper bound. For the lower bound, we again proceed as in (146), but reverse the inequality signs:
In light of (154) and (Proof.), the above inequality implies the required lower bound.
Lemma 11.
Let denote the random time defined earlier. Then
Proof.
We first use Bayes’ rule to note that for any ,
(159) | ||||
(160) | ||||
We consider each multiplicand one by one. First, we use Lemmas 2 and 5 to note that . To deal effectively with the other multiplicands, we let denote the total duration of time for which is in contact with during the time interval , and we observe that
where follows from Lemmas 7 and 9 and from the modelling assumption that recovers at rate independently of any edge state. On substituting the above expression into (159), we obtain
(162) |
where . Now, Lemma 7 implies that
where holds because the recovery time of is conditionally independent of given (recall that recovers at rate independently of any edge state (and hence independently of ), and is precisely the event that the recovery time of is at least ). Hence, (162) implies that for any satisfying , we have
(163) |
where .
As a consequence of (163) and Lemma 10, we have
Since for (see Lemma 4), we have the following for all :
(164) |
and for all , we have
(165) |
where scales so that . Since is the Dirac-delta function, (165) simply means that
Our next goal is to use (164) and (165) to show that
(166) |
where is the probability density function defined by for all and for . To this end, we compare with under the following two cases.
Case 1: There exists a time such that . In this case, (164) implies that for all ,
which means that the set is either or , where . Also, by the definition of , we have for all . Next, to compare and at , we use (165) to note that
(167) | ||||
(168) | ||||
(169) | ||||
(170) |
Thus, for all and for all . Now, since and are both PDFs, we must have or equivalently,
Since both the integrands above are non-negative, we have
Adding to both sides now yields (166).
Case 2: for all . In this case, we can simply set and repeat the arguments following (167) in Case 1 to show that (166) holds.
Next, we use the definition of to evaluate , and we then restate (166) as follows:
(171) |
Since this holds for all , the assertion of the lemma follows.
Using arguments very similar to the proof above, we can prove the following result.
Lemma 12.
Let denote the random time defined earlier. Then
Remark 5.
In the proof of Lemma 11, if, instead of using (163) along with the upper bound in Lemma 10, we had used (163) along with the lower bound in Lemma 10, we would have obtained
In addition, if we had subsequently replaced with a generic and the weighting function by the constant function , and if we had defined by , then using the same arguments but with reversed inequality signs, we would have been able to prove that
Since the integral on the left-hand-side is and since the right-hand-side evaluates to , we conclude that
for all and all .
Some Auxiliary Lemmas
In addition to the above results, the proof of Theorem 1 relies on the following lemmas, which we reproduce from [30].
Lemma 13.
For random variables and , we have .
Lemma 14.
For a random variable , we have .
Lemma 15.
For random variables and in ,
The following result is a straightforward consequence of the above lemmas.
Corollary 1.
For non-negative random variables and satisfying , we have
Proof of Theorem 1
Proof.
The proof is based on Proposition 2 and it follows the approach used in [30]. We first modify Equations (i) - (iv) (Proposition 1) by expressing the expectations of cross-terms such as in terms of expectations of individual terms such as and . To begin, we apply Lemma 15 to and obtain
Therefore, there exists a function such that
Similarly, we can use Lemma 15 to show that there exists a function such that
Next, we use Corollary 1 to express as
where and .
We thus obtain the following relations:
-
(I)
,
-
(II)
,
-
(III)
To handle terms of the form where is some random variable, we use Proposition 2 to obtain
As a result, if , then there exists a function such that
(172) |
By making the above substitutions in (i) - (iv), and by using the identity , we obtain the following differential equations:
-
(I)
,
-
(II)
,
-
(III)
-
(IV)
where , , and are functions from to and are defined on the basis of (172).
The above equations constitute a proper system of differential equations with the same variables , , , and appearing on both the sides. To express these equations compactly, we define as the vector whose entries are given by , , , and Then is a solution to the initial value problem and , where
-
(I)
,
-
(II)
,
-
(III)
-
(IV)
-
(V)
.
Observe that irrespective of the value of , the solution solves the initial value problem and , where
Next, we need to bound (where is the unique vector satisfying for all and ). For this purpose, we will need the following lemma, which we borrow from [30].
Lemma 16.
Consider the initial value problems , and , with solutions and respectively. If is Lipschitz in with constant and then .
Now, note that the domain of for can be chosen to be bounded because and . Similarly, . Also, we let and define . Since is a polynomial in , it is Lipschitz-continuous with some Lipschitz constant . In addition, we use the bounds on and the functions as follows:
i.e.,
where .
We now apply Lemma 16 after setting
Also, we let and . Then we have
where . Thus, for all ,
(173) |
Since , , and , the right hand side of (173) goes to zero as Hence we have the uniform convergence over any finite time interval . The last step is to show that implies -convergence, i.e., as . To this end, we have
where we used that , and the inequality holds because . Thus, the uniform convergence of to over proves that as .