This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Analysis of COVID-19 in Japan with Extended SEIR model and ensemble Kalman filter

Q. Suna,b🖂, T. Miyoshia,c,d, S. Richarda,b
Abstract

We introduce an extended SEIR infectious disease model with data assimilation for the study of the spread of COVID-19. In this framework, undetected asymptomatic and pre-symptomatic cases are taken into account, and the impact of their uncertain proportion is fully investigated. The standard SEIR model does not consider these populations, while their role in the propagation of the disease is acknowledged. An ensemble Kalman filter is implemented to assimilate reliable observations of three compartments in the model. The system tracks the evolution of the effective reproduction number and estimates the unobservable subpopulations. The analysis is carried out for three main prefectures of Japan and for the entire population of Japan. For these four populations, our estimated effective reproduction numbers are more stable than the corresponding ones estimated by a different method (Toyokeizai). We also perform sensitivity tests for different values of some uncertain medical parameters, like the relative infectivity of symptomatic / asymptomatic cases. The regional analysis results suggest the decreasing efficiency of the states of emergency.

  1. a)

    Data Assimilation Research Team, RIKEN Center for Computational Science (R-CCS), Kobe, Japan

  2. b)

    Graduate School of Mathematics, Nagoya University, Nagoya, Japan

  3. c)

    Prediction Science Laboratory, RIKEN Cluster for Pioneering Research, Kobe, Japan

  4. d)

    RIKEN interdisciplinary Theoretical and Mathematical Sciences Program (iTHEMS), Kobe, Japan

keywords: COVID-19, data assimilation, ensemble Kalman filter, extended SEIR model

Mathematics Subject Classification: 92-08, 92C60

1 Introduction

The outbreak of coronavirus 2019 (COVID-19) had a huge impact on human society, and its devastating effects are still present more than 18 months after its official acknowledgment. One of the specific characteristics of this disease increases the difficulties for any scientific research based on infectious disease models, namely the existence of asymptomatic cases. Their existence, but also their potential infectiousness, blur the full picture of the disease spreading inside the society [41]. The simultaneous existence of mild symptomatic cases, which often remain undocumented, also generates an additional challenge for any epidemiological investigations. For example, one study discussing the potential fraction of undocumented cases at the beginning of the outbreak ends up with a ratio of 86%86\% undocumented infections [23].

There exists now a huge literature related to COVID-19, but only a small number of papers are using techniques of data assimilation, as we shall do. For that reason, we present only a few references closely related to our investigations. To get a better understanding of the spread of COVID-19, some studies start with a specially designed infectious disease model with appropriate structures. Those elaborated models may accurately describe the process in reality but they also bring more uncertainties and unknown parameters. For example in [1], a 2222 variables model is adopted with each variable representing a subpopulation. The structure related to asymptomatic, pre-symptomatic, detected, and undetected cases are all properly described by the structure. However, the inference is done with only observations of 44 compartments among all variables. Another example is [9], where an extended SEIR model with 1111 age-classes is used to estimate the posterior distribution of parameters and make short period predictions. The distribution of mild, severe and fatal cases for each age class is introduced as pre-defined parameters. Also, a matrix which describes the relative transmissions between each pair of age groups is predefined.

Despite these elaborated models, most of the investigations are still performed on the common infectious disease models, namely the SIR and SEIR models. Such models are easier to implement and contain less parameters, but they also sacrifice some details of the process. For example the SEIR model is used to study the dynamical behavior of COVID-19 outbreaks at the regional level in [13], and the SIR model was used to track the effective reproduction number for 124124 countries in [2]. A slightly more elaborated model is also introduced in [27], namely the SITR model with T standing for under treatment, for inferring infection numbers and parameters values.

For dealing with real data, one of the recent tools employed for the study of infectious diseases is data assimilation. The techniques developed for data assimilation have the ability to optimally meld dynamical systems with noisy and imperfect observations. They also provide forecasts and estimations of parameters and variables [34]. As examples of investigations studying the details of the implementation of these techniques to epidemic models, let us mention [25] and [33]. The previously cited work [1] also uses statistical data assimilation for identifying the measurements required for getting accurate states and parameters estimations. Finally, in [14] an ensemble Kalman filter is adopted to forecast the COVID-19 pandemic in Saudi Arabia with an extended SEIR model including vaccination.

Let us now present our research. First, we construct an extended SEIR model which takes into account one of the main specificities of COVID-19: the simultaneous existence of an asymptomatic ​/ ​pre-symptomatic population and of a symptomatic population. These two infected populations have different characteristics which are encoded with different parameters. The model is also constructed such that it can be fed with the data of only three compartments, namely the Hospitalized (or treated) population, the Recovered population, and the Deceased population, naturally identified by the letters HH, RR, and DD, respectively. Note that the population RR or DD are coming from the population HH, and these three populations are considered as fully recorded by health services. Except for the first couple of weeks at the beginning of the epidemic, these data are also considered as the most reliable ones. With this model and these data, our main aim is to provide information about the effective reproduction number and about the population of asymptomatic ​/ ​pre-symptomatic or undetected symptomatic, with uncertainty ranges. For these investigations, we shall also mainly concentrate on the population of Tokyo, but provide additional analysis with two additional prefectures (Osaka and Kanagawa) and with the entire population of Japan. Let us mention that targeting a specific population gives the opportunity to rely on local parameters and to use information provided by local medical research or health services. We also emphasize that for data assimilation, we use the ensemble transform Kalman filter [17] which is efficient compared with the standard extended Kalman filter.

As a result of our investigations we get an effective reproduction number for the four populations mentioned before, as well as a time dependent estimate for the number of asymptomatic ​/ ​pre-symptomatic in these four populations. A comparison between our estimated effective reproduction number and the same statistic provided by Toyokeizai [39] shows the reliability of our strategy. As an asset of our approach, we test the sensitivity of our model and its outcomes to the values of uncertain parameters borrowed from the literature. These experiment’s results show a robust performance of the strategy used in our investigations. Another clear result from our computation of the effective reproduction number is the decay of the effectiveness of the states of emergency as their number increases. This effect is clearly visible independently for the three prefectures and for the entire country. Note that a similar effect is also visible in the use of public transportation, see for example [26]. Other outcomes of our investigation are gathered in Sections 4 and 5.

Let us finally describe the structure of this paper. In Section 2, we recall the standard SIR and SEIR models, and introduce the extended SEIR model. We also provide information for the computation of the effective reproduction number, and discuss the constant parameter values and the observations. In Section 3, we explain the technical details of ensemble transform Kalman filter. Readers familiar with data assimilation and Kalman filters can skip this section without any loss for the application to infectious diseases. In Section 4, background information about the experiments, and subsequently technical information, are provided. The main results of our investigations are also presented in this section. The discussion and the comparisons between the different populations are provided in Section 5. We finally prepare the ground for future projects.

2 Extended SEIR model

In this section, we recall a few information on the SIR and SEIR models, and introduce the extended SEIR model. We also discuss medical parameters, and provide some explanations about the observations.

2.1 The SIR and SEIR models

The SIR model is a deterministic epidemic model expressed by a system of differential equations. Its construction is based on Kermack–McKendrick theory, and it describes the transmission process of an infectious disease. A given population of size NN is divided into three mutually exclusive sub-populations call compartments: the susceptible hosts SS, the infectious hosts II, and the recovered hosts RR. For any given time tt, S(t)S(t), I(t)I(t), and R(t)R(t) describe the number of individuals in each respective compartment, and they satisfy S(t)+I(t)+R(t)=NS(t)+I(t)+R(t)=N. Individuals in compartment SS can be infected by individuals in compartment II through direct contact. Once a successful contact happens, the infected individual leaves compartment SS and becomes a member of compartment II. The number of newly infected individuals per unit time is given by βI(t)S(t)/N\beta I(t)S(t)/N, where the parameter β\beta is called the transmission coefficient. Similarly, once a patient recovers, this person leaves compartment II and goes to compartment RR immediately. The transfer between II and RR is controlled by the recovery rate γ\gamma. The transfer rate, namely the number of transfer individuals per unit time, is given by γI(t)\gamma I(t).

SS II RR βIS/N\beta IS/NγI\gamma I
Figure 1: Transfer diagram for the SIR model

Figure 2.1 shows the process of the SIR model, and the following differential system describes its dynamic:

dSdt\displaystyle\frac{dS}{dt} =βIS/N,\displaystyle=-\beta IS/N,
dIdt\displaystyle\frac{dI}{dt} =βIS/NγI,\displaystyle=\beta IS/N-\gamma I, (1)
dRdt\displaystyle\frac{dR}{dt} =γI.\displaystyle=\gamma I.

Note that the system (1) assumes a permanent immunity once recovered. In other words, no individual can be infected a second time. The model also assumes a constant total population: there is no inflow to the system, or outflow from the system.

The SEIR model is very similar to the previous model, but with an additional compartment EE between the compartments SS and II. The individuals in EE are exposed, namely they have been contaminated, but they are not infectious yet. The time spent in EE corresponds to the incubation period, before becoming infectious. The transfer rate from EE to II is given by εE(t)\varepsilon E(t).

SS EE II RR βIS/N\beta IS/NεE\varepsilon EγI\gamma I
Figure 2: Transfer diagram for the SEIR model

Figure 2.1 shows the process of the SEIR model, and the following differential system describes its dynamic:

dSdt\displaystyle\frac{dS}{dt} =βIS/N,\displaystyle=-\beta IS/N,
dEdt\displaystyle\frac{dE}{dt} =βIS/NεE,\displaystyle=\beta IS/N-\varepsilon E,
dIdt\displaystyle\frac{dI}{dt} =εEγI,\displaystyle=\varepsilon E-\gamma I,
dRdt\displaystyle\frac{dR}{dt} =γI.\displaystyle=\gamma I.

2.2 The extended SEIR model

The extended SEIR model has been developed based on some specific features of the COVID-19 epidemic, as described now. In early 2020, health services already noticed that some infected individuals, who did not show any symptom, could spread the disease. These persons correspond either to asymptomatic hosts or to pre-symptomatic hosts. In the first cohort, people will never show any symptom, while the second cohort corresponds to individuals who will show some symptoms subsequently. Clearly, asymptomatic hosts and pre-symptomatic hosts are difficult to be identified by health services, even though they play an important role in the spread of the disease. Symptomatic individuals are also infectious, but they can be more easily identified precisely because of their symptoms. Thus, if symptomatic individuals correspond to the compartment II of the above SIR or SEIR models, then there is no compartment left for the asymptomatic or the pre-symptomatic individuals.

The existence of COVID-19 infectious spreaders who do not show (yet or at all) any symptom already brings a lot of uncertainties to health services. In addition, whenever the symptoms are relatively mild, some symptomatic individuals do hesitate to report the health service [31, p. 11]. As a consequence, daily new cases are under-reported. Combining this effect with some delayed information, some inaccurate tests (especially in the first phase of the epidemic), and other reasons that we are not aware of, it turns out that that the reported data are not very accurate. In such a situation one needs to use proper strategies to analyse the observation data.

SS EE IaI_{a} IsI_{s} HH RR DD RaR_{a} RsR_{s} βaIaS/N\beta_{a}I_{a}S/NβsIsS/N\beta_{s}I_{s}S/NεE\varepsilon EδIa\delta I_{a}τHIs\tau_{H}I_{s}γHH\gamma_{H}HγDH\gamma_{D}HγaIa\gamma_{a}I_{a}γsIs\gamma_{s}I_{s}
Figure 3: Transfer diagram for the extended SEIR model. Compartment II of the SEIR model is divided into two compartments IaI_{a} (asymptomatic ​/ ​pre-symptomatic) and IsI_{s} (symptomatic)

In order to meet the special features of COVID-19, we separate the compartment II of the SEIR model into two compartments IaI_{a} and IsI_{s}. These compartments correspond to asymptomatic ​/ ​pre-symptomatic and to symptomatic states, respectively. Both IaI_{a} and IsI_{s} can infect SS. As shown in Figure 2.2, the transmission processes related to IaI_{a} and to IsI_{s} have transmission coefficients βa\beta_{a} and βs\beta_{s} respectively. Once an individual in SS gets infected, this person becomes a member of EE, and then moves to IaI_{a} when it becomes infectious. In IaI_{a}, part of these individuals (the asymptomatic) will never generate any symptoms. They will thus spend some time in IaI_{a}, and then recover. The corresponding compartment is denoted by RaR_{a}. In contrast, the other part of individuals in IaI_{a} will develop symptoms, and then move to IsI_{s}. In IsI_{s}, a majority of individuals will be identified by health services, but as mentioned above a minority will recover without being identified by any health services. Compartment RsR_{s} corresponds to those recovered individuals who have not been registered. The identified persons in IsI_{s} will then move to the compartment HH, which corresponds to hospitalized or treated patients. The ones staying at home but under medical control, or the ones isolated in a hotel, are all included in the compartment HH. Finally, the patients in HH will recover, and thus move to the compartment RR, or will decease and end up in the compartment DD. As for the SIR or SEIR models, we assume a permanent immunity, which means that there is no flow from RaR_{a}, RsR_{s}, or RR to SS. Also, the total number NN of individual is constant, namely at all time one has

N=S+E+Ia+Is+H+R+D+Ra+Rs.N=S+E+I_{a}+I_{s}+H+R+D+R_{a}+R_{s}. (2)

Based on the Figure 2.2 and on the explanations provided above, the differential system of the extended SEIR model reads:

dSdt\displaystyle\frac{dS}{dt} =βaIaS/NβsIsS/N\displaystyle=-\beta_{a}I_{a}S/N-\beta_{s}I_{s}S/N dRdt\displaystyle\frac{dR}{dt} =γHH\displaystyle=\gamma_{H}H
dEdt\displaystyle\frac{dE}{dt} =βaIaS/N+βsIsS/NεE\displaystyle=\beta_{a}I_{a}S/N+\beta_{s}I_{s}S/N-\varepsilon E dDdt\displaystyle\frac{dD}{dt} =γDH\displaystyle=\gamma_{D}H
dIadt\displaystyle\frac{dI_{a}}{dt} =εEδIaγaIa\displaystyle=\varepsilon E-\delta I_{a}-\gamma_{a}I_{a} dRadt\displaystyle\frac{dR_{a}}{dt} =γaIa\displaystyle=\gamma_{a}I_{a} (3)
dIsdt\displaystyle\frac{dI_{s}}{dt} =δIaτHIsγsIs\displaystyle=\delta I_{a}-\tau_{H}I_{s}-\gamma_{s}I_{s} dRsdt\displaystyle\frac{dR_{s}}{dt} =γsIs\displaystyle=\gamma_{s}I_{s}
dHdt\displaystyle\frac{dH}{dt} =τHIsγHHγDH\displaystyle=\tau_{H}I_{s}-\gamma_{H}H-\gamma_{D}H

where βa,βs,ε,δ,γa,γs,γD\beta_{a},\beta_{s},\varepsilon,\delta,\gamma_{a},\gamma_{s},\gamma_{D} and γH\gamma_{H} are some medical parameters. Note that some of them are time dependent.

2.3 The reproduction number

The basic reproduction number, denoted by R0R_{0}, can be interpreted as the average number of secondary cases generated by on primary case in a susceptible population. It is commonly admitted that R0=1R_{0}=1 is a threshold value, as explained below. We also refer to [8, 16] for more explanations and more precise statements.

To study R0R_{0} for different models, a general definition of the basic reproduction number is introduced based on the notion of disease free equilibrium (DFE). In a DFE the population remains in the absence of disease. For example, in the SIR or SEIR models, it means that I=0I=0 while in the extended SEIR model it means that Ia=Is=0I_{a}=I_{s}=0. In this context, the basic reproduction number can be defined as the number of new infections produced by a typical infectious individual in a population at a DFE. The main feature of R0R_{0} is that it corresponds to a threshold parameter, namely if R0<1R_{0}<1, the DFE is locally asymptotically stable, while if R0>1R_{0}>1, the DFE is unstable and an outbreak is possible.

The precise expression for R0R_{0} is clearly model dependent, but numerous examples are available in the literature. For example, let us consider a general compartmental disease transmission model. Such models are represented by a system of ordinary differential equations, and under natural assumptions it can be shown that these models have a DFE, see [8]. In this reference, the precise expression for R0R_{0} is then provided for some classes of models, and the extended SEIR model meets the assumptions of the staged progression model, as presented in [8, Sec. 4.3]. For the extended SEIR model, the expression for R0R_{0} then reads:

R0=βaδ+γa+βsδ(δ+γa)(γs+τH),\displaystyle R_{0}=\frac{\beta_{a}}{\delta+\gamma_{a}}+\frac{\beta_{s}\delta}{(\delta+\gamma_{a})(\gamma_{s}+\tau_{H})}, (4)

where βa\beta_{a} and βs\beta_{s} are the initial transmission coefficients at time 0.

To understand the above expression, observe that δ/(δ+γa)\delta/(\delta+\gamma_{a}) corresponds to the fraction of individuals of IaI_{a} going to the compartment IsI_{s}, while 1/(δ+γa)1/(\delta+\gamma_{a}) and 1/(γs+τH)1/(\gamma_{s}+\tau_{H}) define the average times an infected individual spends in compartments IaI_{a} and IsI_{s} respectively. Thus, each term on the R.H.S. of (4) represents the number of new infections generated by an infectious individual during the time spent in the compartments IaI_{a} and IsI_{s}.

In contrast, the definition of the effective reproduction number RtR_{t} at time tt is much more delicate. The various challenges and possible errors have recently been discussed in [15], to which we refer for a thorough discussion. In our setting, we shall keep the interpretation of RtR_{t} as the average number of secondary cases generated by one primary case. This approach is possible because the transmission coefficients at time tt are available in our approach, and therefore one can compute RtR_{t} with (4) and the corresponding βa\beta_{a} and βs\beta_{s} at time tt. Additional information on RtR_{t} will be provided in Section 5.

2.4 The medical parameters

It clearly appears in Figure 2.2 and in the corresponding system (2.2) that several parameters control the flows between the compartments. The values of these parameters may result in very different behaviors of the model. We refer for example to [22, Sec. 1.4] and to [4, Chap. 2] for general discussions of model behaviors and the role of parameters. In our setting, some parameters are easy to evaluate, as for example the recovery rate γH\gamma_{H} or the death rate γD\gamma_{D}, but others are difficult to estimate, as for example the transmission coefficients βa\beta_{a} and βs\beta_{s}. Let us also stress that some parameters depend on location. Since our experiment is based on data from Japan, see Section 4, the medical parameters are chosen accordingly. For that reason, we use research or survey results specific to Japan, or even more precisely to specific prefectures in Japan.

In Table 1 we list the estimated values of some parameters for the extended SEIR model, and provide the sources of information. Several parameters in the table have the form of the product of a percentage and the inverse of a time length. A similar setting for the construction of the parameters can be found for example in [9, 21]. For δ\delta and γa\gamma_{a}, the percentage parts should sum up to 11. For the parameters τH\tau_{H} and γs\gamma_{s}, some information can be found in the surveys [31, 37] and the health services website [38]. For parameter γH\gamma_{H} and γD\gamma_{D}, instead of using constant value estimated by health services, we shall use the observation data to estimate their values on a daily basis. The details will be discussed in Section 4.

Let us stress that the value assigned to some of these parameters has evolved during the last 12 months. For example, the ratio of asymptomatic individuals was thought to be very high at the beginning of the epidemic (up to 80%80\%), but some recent research [3, 5, 32] have revised this ratio to 17%17\% to 20%20\%. Our knowledge about the length of infectious periods has also evolved, and the possible values are spread over a rather long interval. In Table 1 we list some mean values, but in the simulations additional uncertainties are implemented. Since pre-symptomatic patients become infected before the appearance of symptoms, the incubation period (encoded in ε\varepsilon) has been shorten a little bit, and the last 2 days of this incubation period have been moved to IaI_{a}.

One very delicate question is the relation between βa\beta_{a} and βs\beta_{s}, namely between the transmission coefficient for asymptomatic ​/ ​pre-symptomatic and the transmission coefficient for symptomatic. For our investigations, we shall rely on the result of the systematic review [3] which asserts that the relative risk of asymptomatic transmission is 42%42\% lower than that for symptomatic transmission. As a consequence, we shall fix

βa=0.58βs or equivalently βs=1.72βa.\beta_{a}=0.58\beta_{s}\quad\hbox{ or equivalently }\quad\beta_{s}=1.72\beta_{a}. (5)

This factor 0.580.58 is slightly smaller but of a comparable scale compared to earlier investigations, see for example [5].

{TAB}

(r,1cm,1.2cm)[2pt]c—c—c—l—c—c—c—c—c—c—c—c— parameter & estimation source remark
ε\varepsilon (3 days)1(3\textrm{ days})^{-1} [7] incubation period, not contagious
δ\delta \pbox6cm83%×(2 days)183\%\times(2\textrm{ days})^{-1}
(95% CI 80% to 86%)(95\%\hbox{ CI }80\%\hbox{ to }86\%) [3, 7] \pbox15cmproportion of pre-symptomatic
×\times (duration of pre-symptomatic)1(\textrm{duration of pre-symptomatic})^{-1}
τH\tau_{H} \pbox6cm78%×(8.3 days)178\%\times(8.3\textrm{ days})^{-1} (\sim 2020/5/31)
78%×(5.2 days)178\%\times(5.2\textrm{ days})^{-1} (2020/6/1 \sim) [31, 24] \pbox15cmproportion of detected symptomatic
×\times (days of symptoms onset)1(\textrm{days of symptoms onset})^{-1}
γa\gamma_{a} \pbox6cm17%×(9 days)117\%\times(9\textrm{ days})^{-1}
(95% CI 14% to 20%)(95\%\hbox{ CI }14\%\hbox{ to }20\%) [3, 32] \pbox15cmproportion of asymptomatic
×\times (duration of asymptomatic)1(\textrm{duration of asymptomatic})^{-1}
γs\gamma_{s} 22%×(7 days)122\%\times(7\textrm{ days})^{-1} [31, 37] \pbox15cmproportion of undetected symptomatic
×\times (duration of symptomatic)1(\textrm{duration of symptomatic})^{-1}
γH\gamma_{H} \pbox15cmcomputed by
observations [39] discussed in Section 4
γD\gamma_{D} \pbox15cmcomputed by
observations [39] discussed in Section 4

Table 1: Medical parameters

2.5 The observations

As introduced in Section 1, one of the aims of this study is to estimate the unknown parameters βa\beta_{a} and βs\beta_{s}, and the unobservable compartments IaI_{a} and IsI_{s}. Because of relation (5), it is clear that for the parameters only βs\beta_{s} has to be evaluated. The method that we are going to introduce in Section 3 requires observations of some compartments in the extended SEIR model. In Figure 2.2, we highlight the three compartments with observations in yellow, namely HH, RR and DD. The data corresponding to these compartments may not be perfect, and for the analysis based on these data we shall take some uncertainties into consideration. More precisely, some random noise will be added to the observations.

For our experiment, we shall firstly and mainly use the data about Tokyo, with a total population N=13955000N=13^{\prime}955^{\prime}000 (approximate mean of the population of 2020 and 2021). Subsequently, other prefectures in Japan are also considered. As already mentioned, the compartment HH corresponds to the identified individuals either hospitalized or treated at home or in a hotel. On the other hand, RR and DD describe the accumulated number of recovered and deceased individuals. The information about these three compartments are very easy to find for Tokyo, but also for any region in Japan. One can check for example the website of Ministry of Health, Labour and Welfare, prefecture’s websites or some mass communication companies’ websites to get more details. The information is provided on a daily basis. Note however, that these records were not very accurate at the beginning of the outbreak. This was caused by the delay of reports, but also by some changes in the policy for the collect of information. Our analysis will take this source of uncertainties into consideration.

3 Ensemble Transform Kalman filter

In this section, we introduce the main tool employed in the study, namely the ensemble Kalman filter. The analysis of parameter values and unobservable compartments is performed based on it.

The method Ensemble Transform Kalman Filter (ETKF) used in this paper has been introduced and developed in [6, 17]. It is based on the Ensemble Kalman Filter [10, 11, 12], but it provides analyses in a more efficient way.

Let us consider a discrete time state space model given for any time t:={1,2,}t\in\mathbb{N}:=\{1,2,\ldots\} by

xt\displaystyle x_{t} =Mt(xt1)\displaystyle=M_{t}(x_{t-1}) (6)
yt\displaystyle y_{t} =Ht(xt)+ϵt,\displaystyle=H_{t}(x_{t})+\epsilon_{t}, (7)

where xtx_{t} is an ll-dimensional state vector and yty_{t} an mm-dimensional observational vector, MtM_{t} is the operator which defines the dynamics of the state, HtH_{t} is an operator corresponding to the observation model, and ϵt\epsilon_{t} provides the observation error. All these vectors or operators are explicitly tt dependent. The vector xtx_{t} represents the state of the dynamical system, while yty_{t} is called the noisy observation, both at time tt. The observation error ϵt\epsilon_{t} follows a normal distribution N(0,Rt)N(0,\textbf{R}_{t}), where Rt\textbf{R}_{t} is an m×mm\times m observation error covariance matrix. In this framework, the general question is: given a list of noisy, unbiased observations (yt)t(y_{t})_{t}, how can one find the best estimates for (xt)t(x_{t})_{t} ​?

Under the assumption of unbiased Gaussian observation error and of Gaussian initial distributions, one can consider the maximum likelihood approach. For time t=1,,Nt=1,\dots,N, the likelihood of (xt)t(x_{t})_{t} is proportional to

t=1Nexp(12[ytHt(xt)]TRt1[ytHt(xt)]),\displaystyle\prod_{t=1}^{N}\exp\Big{(}-\frac{1}{2}[y_{t}-H_{t}(x_{t})]^{T}\textbf{R}_{t}^{-1}[y_{t}-H_{t}(x_{t})]\Big{)}, (8)

where the superscript TT denotes the transpose of a vector or of a matrix. Clearly, maximizing (8) can be transformed into a minimization problem. More precisely, by using equation (6), one first defines the cost function

t=1N[ytHt(Mt(xt1))]TRt1[ytHt(Mt(xt1))].\displaystyle\sum_{t=1}^{N}\big{[}y_{t}-H_{t}(M_{t}(x_{t-1}))]^{T}\textbf{R}_{t}^{-1}[y_{t}-H_{t}(M_{t}(x_{t-1}))\big{]}. (9)

Then, one ends up in looking for a list of estimates (xta)t(x_{t}^{a})_{t} of (xt)t(x_{t})_{t} which minimize (9).

When the operators MtM_{t} and HtH_{t} are linear, they can be represented by matrices, denoted also by MtM_{t} and HtH_{t}. In such a situation, and with our assumption of Gaussian observation error, the linear model leads to the propagation of Gaussian distributions for the states. More precisely, assume that at some time tt, there is a prior estimate xtbx^{b}_{t} of the state xtx_{t} with its covariance estimate PtbP_{t}^{b} (also called the background error covariance). Then the Kalman filter [19, 20] provides an optimal solution to minimize the variance of its uncertainty. For this reason, it is known as the minimum variance estimator. Formally, the prior estimate is updated by using the information of the observation yty_{t} at time tt as follows [18, Eq. (1), (2), (8) & (10)] ​:

xta\displaystyle x_{t}^{a} =xtb+Kt[ytHtxtb],\displaystyle=x^{b}_{t}+\textbf{K}_{t}[y_{t}-H_{t}x^{b}_{t}], (10)
Kt\displaystyle\textbf{K}_{t} =PtbHtT(HtPtbHtT+Rt)1,\displaystyle=P_{t}^{b}H_{t}^{T}(H_{t}P_{t}^{b}H_{t}^{T}+\textbf{R}_{t})^{-1}, (11)
Pta\displaystyle P_{t}^{a} =(IKtHt)Ptb,\displaystyle=(I-\textbf{K}_{t}H_{t})P_{t}^{b}, (12)
Pt+1b\displaystyle P_{t+1}^{b} =Mt+1PtaMt+1T,\displaystyle=M_{t+1}P_{t}^{a}M_{t+1}^{T}, (13)

where Kt\textbf{K}_{t} is called the Kalman gain, xtax_{t}^{a} the analysis, and PtaP_{t}^{a} its estimated covariance. With the analysis, one can use the forecast model (6) to evolve the analysis and get the prior estimate and the covariance at the next time step.

When the operators MtM_{t} and HtH_{t} are nonlinear, the Kalman filter is not applicable in its original form, but a simple extension with linear approximation is known to be effective. The straightforward nonlinear extension of the Kalman filter is known as the extended Kalman filter (EKF). In this case, Mt+1M_{t+1} and HtH_{t} in equation (10) to (13) are replaced by their linear approximations respectively. For low dimensional models, the cost of EKF is low. However, when the model complexity and the state dimension increases, one may consider the ensemble Kalman filter. In this case, an ensemble of model states is used to estimate the covariance matrices. Namely, let {xt1a(i),i=1,,k}\{x_{t-1}^{a(i)},i=1,\dots,k\} be an ensemble of estimates of the model states at time t1t-1. By letting this ensemble evolve according to the model (6), one gets the forecast or background ensemble {xtb(i),i=1,,k}\{x_{t}^{b(i)},i=1,\dots,k\} at time tt, where

xtb(i)=Mt(xt1a(i)).\displaystyle x_{t}^{b(i)}=M_{t}(x_{t-1}^{a(i)}).

The mean of the ensemble can be used as the most probable estimate of the state, with its covariance estimated by the sample covariance of the ensemble:

x¯tb=1ki=1kxtb(i) and Ptb=1k1(Xtb)(Xtb)T,\bar{x}_{t}^{b}=\frac{1}{k}\sum_{i=1}^{k}x_{t}^{b(i)}\qquad\hbox{ and }\qquad P_{t}^{b}=\frac{1}{k-1}(\textbf{X}_{t}^{b})(\textbf{X}_{t}^{b})^{T},

where Xtb\textbf{X}_{t}^{b} is an l×kl\times k matrix with the ithi^{\rm th} column xtb(i)x¯tbx_{t}^{b(i)}-\bar{x}_{t}^{b}.

The next step is to update the forecast ensemble mean and its covariance by using the information of the observation yty_{t}. The updated ensemble mean is denoted by x¯ta\bar{x}_{t}^{a} and its covariance by PtaP^{a}_{t}. To get them, one first needs to determine an analysis ensemble {xta(i),i=1,,k}\{x_{t}^{a(i)},i=1,\dots,k\}. Similar to the background ensemble, the mean and covariance of the analysis ensemble are given by

x¯ta\displaystyle\bar{x}_{t}^{a} =1ki=1kxta(i),\displaystyle=\frac{1}{k}\sum_{i=1}^{k}x_{t}^{a(i)}, (14)
Pta\displaystyle P_{t}^{a} =1k1(Xta)(Xta)T.\displaystyle=\frac{1}{k-1}(\textbf{X}_{t}^{a})(\textbf{X}_{t}^{a})^{T}. (15)

In our application the operator HtH_{t} is linear, so that HtH_{t} is given by a matrix. In this case and for ETKF, the computations can be written as follows [18, Eq. (15)\sim(17)] ​:

x¯ta\displaystyle\bar{x}_{t}^{a} =x¯tb+XtbP~ta(HtXtb)TRt1(ytHtxtb¯),\displaystyle=\bar{x}_{t}^{b}+\textbf{X}_{t}^{b}\tilde{P}_{t}^{a}(H_{t}\textbf{X}_{t}^{b})^{T}\textbf{R}_{t}^{-1}\big{(}y_{t}-\overline{H_{t}x^{b}_{t}}\big{)}, (16)
P~ta\displaystyle\tilde{P}_{t}^{a} =[(k1)I+(HtXtb)TRt1HtXtb]1,\displaystyle=\big{[}(k-1)I+\big{(}H_{t}\textbf{X}_{t}^{b}\big{)}^{T}\textbf{R}_{t}^{-1}H_{t}\textbf{X}_{t}^{b}\big{]}^{-1}, (17)
Xta\displaystyle\textbf{X}_{t}^{a} =Xtb[(k1)P~ta]12,\displaystyle=\textbf{X}_{t}^{b}\big{[}(k-1)\tilde{P}_{t}^{a}\big{]}^{\frac{1}{2}}, (18)

where the k×kk\times k matrix P~ta\tilde{P}_{t}^{a} is called the analysis error covariance in ensemble space. For (18), the background perturbation Xtb\textbf{X}_{t}^{b} is transformed into the analysis perturbation Xta\textbf{X}_{t}^{a} with the weight [(k1)P~ta]12[(k-1)\tilde{P}_{t}^{a}]^{\frac{1}{2}}. By adding x¯ta\bar{x}_{t}^{a} to each column of Xta\textbf{X}_{t}^{a}, one gets the analysis ensemble {xta(i),i=1,,k}\{x_{t}^{a(i)},i=1,\dots,k\} that satisfies (14). Note that the ETKF is a deterministic filter since no randomly perturbed observations are used in the computation [40], and it is also a square-root filter because its takes the power 12\frac{1}{2} of the matrix P~ta\tilde{P}_{t}^{a} [36].

To avoid a variance underestimation, caused for example by the limited ensemble size, an artificial inflation of the ensemble spread is usually applied. In [18], several ways to perform the variance inflation are discussed. In our study, we apply multiplicative inflation and additive inflation. For multiplicative inflation, the background error covariance PtbP_{t}^{b} is multiplied by a tunable factor ρ>1\rho>1 before the analysis. The multiplicative inflation can be thought as a procedure to increase the influence of the current observations on the analysis. For additive inflation, since the extended SEIR model is a conserved closed system, it is then applied by adding a random vector to the background ensemble. The random vector can be sampled from the normal distribution with mean 0 and some covariance matrix. This covariance matrix is assumed to be proportional to the background error covariance PtbP_{t}^{b} by a tunable parameter α(0,1)\alpha\in(0,1) [18].

For the current study with a low dimensional model, we could use EKF or EnKF. However, by considering the possibility of increasing the model complexity in future, we opted for ETKF with 50 ensemble members.

4 Experiments

In this section, we explain the design of the experiments and illustrate the outcomes by using observations from Tokyo Metropolis. These observations (HH, RR, and DD) are obtained from the website [39]. Discussions and comparisons with other prefectures are provided in the subsequent section. As mentioned in Section 2.5, health officials started providing data from February 2020 but they included some uncertainties caused by delay or by policy changes. Our experiments start with the observations from March 6th6^{\mathrm{th}}, 2020, when the record of data became more systematic.

4.1 The experiments, using data from Tokyo

For the state space model mentioned in Section 3, the dynamical system (6) is described by the differential system (2.2). To estimate the parameter βs\beta_{s}, we use the augmented state by adding one more equation dβsdt=0\frac{d\beta_{s}}{dt}=0 to the system (2.2), assuming persistence for βs\beta_{s}. Namely, βs\beta_{s} will stay at the same value during the time integration process, and will be updated during the analysis step of the data assimilation. In other words, if βs\beta_{s} has a correlation with the observations, βs\beta_{s} will be updated together with the other states. In system (2.2), the unit of time t=1t=1 represents one day. Thus, the one day forecast from day nn is obtained by integrating the system (2.2) on [n,n+1][n,n+1] with the initial values equal to the analysis on day nn. To avoid negative values in the analysis step, all the compartments are transformed to log\log scale with base e{\mathrm{e}}. The lower case will be used for these new variables, as for example s(t):=logS(t)s(t):=\log S(t), and the corresponding equation becomes ds(t)dt=1S(t)dS(t)dt\frac{ds(t)}{dt}=\frac{1}{S(t)}\frac{dS(t)}{dt}. The DA analysis update directly applies to the log\log-transformed values.

Now the forecast value of xtx_{t} is a 99-dimensional vector

(e(t),ia(t),is(t),h(t),r(t),d(t),ra(t),rs(t),logβs)T.\big{(}e(t),i_{a}(t),i_{s}(t),h(t),r(t),d(t),r_{a}(t),r_{s}(t),\log\beta_{s}\big{)}^{T}. (19)

Note that the compartment s(t)s(t) is not explicitly included since it can be deduced from the conservation relation (2). Before performing the DA analysis update, the above vector has to be projected to the observation space by the observation system (7). In our setting, the operator HtH_{t} introduced in (7) is defined by the projection which sends the forecast vector (19) on (h(t),r(t),d(t))T\big{(}h(t),r(t),d(t)\big{)}^{T}, namely on the three compartments with observations.

To initiate the experiments, initial values for all compartments and parameters have to be provided. When these initial values are far away from the observations, the system goes through an unreasonable transition period. To avoid or reduce the unreasonable transition, preliminary experiments were performed for determining suitable initial values. Namely, we run the system for a few days, starting with a few individuals in some compartments and a presumed value for βs\beta_{s}. Different values of βs\beta_{s} were tested, until values comparable to the data of the compartments H,D,RH,D,R on day zero (March 6th6^{\mathrm{th}}, 2020) were obtained. The corresponding values for the different compartments were then chosen as initial conditions. However, independent normal distributed random errors are also added to create 50 ensemble members {x0a(i),i=1,,50}\{x^{a(i)}_{0},i=1,\dots,50\}. Note that each member x0a(i)x^{a(i)}_{0} is a 99-dimensional vector containing the initial conditions for the compartments and for the parameter βs\beta_{s}.

The ensemble on day zero are then integrated by the model (2.2) for one day and the ETKF will update the one day forecast by using the observations. The procedure of the experiments is shown in Figure 4.1.

{xta(i)}\big{\{}x^{a(i)}_{t}\big{\}} extended SEIR model {xt+1b(i)}\big{\{}x^{b(i)}_{t+1}\big{\}} analysis update observations yt+1y_{t+1} {xt+1a(i)}\big{\{}x^{a(i)}_{t+1}\big{\}} 1-day forecast
Figure 4: Data assimilation flow-chart

As mentioned in Section 2.4, parameters γH\gamma_{H} and γD\gamma_{D} are estimated by assimilating the observations. Consider equation dRdt=γHH\frac{dR}{dt}=\gamma_{H}H in model (2.2). Since the time is discrete, we can rewrite this relation as R(t+1)R(t)=γH(t)H(t)R(t+1)-R(t)=\gamma_{H}(t)H(t) which leads to

γH(t)=R(t+1)R(t)H(t).\displaystyle\gamma_{H}(t)=\frac{R(t+1)-R(t)}{H(t)}. (20)

Similar computation can be generated for γD\gamma_{D} also, namely,

γD(t)=D(t+1)D(t)H(t).\displaystyle\gamma_{D}(t)=\frac{D(t+1)-D(t)}{H(t)}. (21)

The daily values of both parameters are shown in Figure 5. For the implementation of these parameters for the computation of the 11-day forecast, we have used a slightly smoothed version obtained by a 77-day convolution with the symmetric weights 164(1,6,15,20,15,6,1)\frac{1}{64}\big{(}1,6,15,20,15,6,1). The effect is to decrease the amplitude of the weekly oscillations.

Refer to caption
Refer to caption
Figure 5: Daily estimation of the parameters γH\gamma_{H} and γD\gamma_{D}

For the integration process, we have also included some uncertainties to all parameters: to the ones presented in Table 1, but also to the values of the parameters γH\gamma_{H}, γD\gamma_{D}. Thus, the 11-day forecast is obtained with the parameters of the previous day, each of them perturbed by a normal distribution N(0,(M/10)2)N\big{(}0,(M/10)^{2}\big{)}, where MM corresponds to the value of this parameter. These perturbations are independent and randomly generated for each 11-day forecast.

The last initial setting for data assimilation is the observation error covariance matrix Rt\textbf{R}_{t}. We assume that the covariance of the errors between different observations is zero, namely, the matrix Rt\textbf{R}_{t} is diagonal. Since we observe three states, Rt\textbf{R}_{t} is a 3×33\times 3 diagonal matrix. Clearly, the bigger the variance, the weaker contribution the corresponding observation will make to the update.

For simplicity, we assume that the observation error covariance is independent of time. The diagonal elements are chosen as (log(1.3))2\big{(}\log(1.3)\big{)}^{2} for any time tt, representing the observation error variance: Namely, for x{h,r,d}x\in\{h,r,d\} one has [xlog(1.3),x+log(1.3)]\big{[}x-\log(1.3),x+\log(1.3)\big{]} as 68%68\% CI and [x2log(1.3),x+2log(1.3)]\big{[}x-2\log(1.3),x+2\log(1.3)\big{]} as 95%95\% CI. Considering that all the observations have been transformed into log\log scale, that is, the observations in original scale are distributed in [X/(1.3),(1.3)X]\big{[}X/(1.3),(1.3)X\big{]} as 68%68\% CI and [X/(1.69),(1.69)X]\big{[}X/(1.69),(1.69)X\big{]} as 95%95\% CI.

Before showing the analysis result, a scatter plot of the values of hh and of log(βs)\log(\beta_{s}) for the different ensemble members on a certain day is provided in Figure 6. Indeed, for the parameter estimation, one assumption is the linear relation between the ensemble of parameters and the ensemble of compartments with observations. As shown in Figure 6, a positive correlation is observed between hh and log(βs)\log(\beta_{s}). Thus we can use the observation of HH for the estimation of βs\beta_{s}.

Refer to caption
Figure 6: Scatter plot of hh and log(βs)\log(\beta_{s}) on November 17, 2020

In Figure 7 we finally provide the analysis value for RtR_{t} from March 6th6^{\mathrm{th}} 2020 to the middle of October 2021. The black curve represents the mean value of the ensemble. The two shaded regions represent the 68%68\% CI (dark orange) and 95%95\% CI (light orange). The analysis value of RtR_{t} is computed by using βs\beta_{s} together with the information contained in (4) and in (5). The red curve shows the RtR_{t} provided by Toyokeizai.net for reference. The grey shaded regions represent the periods of state of emergency in Tokyo. Note that the jump at the end of May 2020 is due to an abrupt change in the value of τH\tau_{H} as indicated in Table 1. Additional discussions about this graph will be provided in the subsequent section.

Refer to caption
Figure 7: Analysis value for RtR_{t}, and comparison with the value of Toyokeizai.net

The analysis results for the compartments EE, IaI_{a}, IsI_{s}, HH, RR, DD with confidence intervals are also presented in Figure 8. For compartments HH, RR, DD, the red dots represent the observations of Tokyo. For all pictures, the yy-axis is in log10\log_{10} scale.

Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Figure 8: The 6 compartments for Tokyo

4.2 Technical discussion, using data from Tokyo

In Figure 7, one first notices that the similarity between the analysis RtR_{t} and the reference RtR_{t} provided by Toyokeizai.net. The Toyokeizai RtR_{t} on day tt is computed by the following formula:

(i=t6tnew confirmed cases on day ii=t13t7new confirmed cases on day i)average generation timereporting interval,\displaystyle\bigg{(}\frac{\sum_{i=t-6}^{t}\hbox{new confirmed cases on day $i$}}{\sum_{i=t-13}^{t-7}\hbox{new confirmed cases on day $i$}}\bigg{)}^{\frac{\hbox{average generation time}}{\hbox{reporting interval}}},

where the average generation time is assumed to be 5 days and the reporting interval is assumed to be 7 days. The formula is a simplified version of a maximum likelihood estimation for the effective reproduction number provided by H. Nishiura [28]. Note that this formula is based on the daily confirmed cases, which experiences quick changes everyday. On the other hand, the analysis RtR_{t} and its confidence interval is computed by formula (4) which involve the analysis ensemble of βs\beta_{s} and βa\beta_{a}. The average over the ensemble member provides a smoother evolution of RtR_{t}, which is certainly closer to a true statistic’s evolution. Note that a third approach for the computation of RtR_{t} is available in [35]: it involves an agent-based model together with a particle filter.

In Figure 8, we show the analysis mean of compartments EE, IaI_{a}, IsI_{s} with the 68%68\% CI (dark orange) and 95%95\% CI (light orange), and the same results for compartments HH, RR and DD with the daily observations represented by red dots. One notes that the analyses of HH, RR, DD fit the observations appropriately except at the beginning of the observation period. This may be related to the initial guess and the model imperfections.

Refer to caption
Figure 9: Comparison of analysis results with different ratio kk between βa\beta_{a} and βs\beta_{s}

As mentioned in Section 2.4, the ratio between βa\beta_{a} and βs\beta_{s} is fixed at 0.580.58 based on the information provided by the literature. Since this ratio is quite uncertain, we performed similar investigations with different ratios varying from 0.10.1 to 1.01.0. For k=βaβsk=\frac{\beta_{a}}{\beta_{s}} with k=0.1,0.3,0.58,0.8k=0.1,0.3,0.58,0.8 and 1.01.0, we run the experiments independently (always with the data from Tokyo) and the analysis mean of RtR_{t} are shown in Figure 9. The patterns of each RtR_{t} are similar, especially after May 31st31^{\mathrm{st}}, 2020. However, at the beginning of the outbreak and during the first state of emergency, all the analysis value of RtR_{t} experience quick changes but with different speed. In Figure 10 we also provide the analysis mean of the different compartments, since they also depend on the ratio between βa\beta_{a} and βs\beta_{s}. The same conclusion is obtained: except for the first three months, the possible ratio do not generate any noticeable difference.

Refer to caption
Figure 10: Analysis mean of compartments with different ratio kk
Refer to caption
Refer to caption
Figure 11: Analysis mean and spread of RtR_{t} with different observation covariance

In Section 4.1, the error covariance matrix Rt\textbf{R}_{t} is defined as a diagonal matrix ((log(1.3))2I\big{(}(\log(1.3)\big{)}^{2}I where II is the identity 3×33\times 3 matrix. We also run a sensitivity test for different choices of the covariance matrix, and checked if it has an effect on the analysis results. We choose Rt\textbf{R}_{t} equals to sd2I{\rm sd}^{2}I with sd=log(1.1){\rm sd}=\log(1.1), log(1.3)\log(1.3), log(1.5)\log(1.5), log(2.0)\log(2.0), and log(2.5)\log(2.5). For these investigations the ratio between βa\beta_{a} and βs\beta_{s} is fixed as 0.580.58. In Figure 11(a), the analysis means are provided, and they are clearly close to each other except at the beginning stage of the outbreak. The different spreads are shown in Figure 11(b), and one notices that the spread increases with the sd{\rm sd}, but for sd{\rm sd} larger than log(1.5)\log(1.5), the increase is less clear. We also show the analysis mean of compartments EE, IaI_{a}, IsI_{s}, HH, RR and DD in Figure 12(a). Again, the analysis results of these compartments with different observation error covariance are quite close to each other. In Figure 12(b), the spread is computed by the standard deviation of ensemble members of the log\log-transformed compartments. The relations of spread are quite clear for the three compartments with observations, and there are some overlaps for spread of ee, iai_{a} and isi_{s}, but in general the spread increases with sd{\rm sd}.

Refer to caption
Refer to caption
Figure 12: Analysis mean (a) and spread (b) of the compartments with different observation covariance

As shown in Table 1, the proportion of symptomatic and asymptomatic used in our investigation is 83%83\% and 17%17\%, respectively. These values were determined based on sources mentioned in Section 2.4. However, since asymptomatic cases are very difficult to detect, and since we can not be fully confident in the above ratio, a sensitivity test is necessary. To do this, we choose two different combinations which are 70%70\% and 30%30\%, and 50%50\% and 50%50\%, and keep the observation error covariance matrix at ((log(1.3))2I\big{(}(\log(1.3)\big{)}^{2}I and k=0.58k=0.58. Compared to the previous settings, in these two scenarios more people become asymptomatic and recover without showing any symptoms, as illustrated in Figure 13(a). On the other hand, the three curves for RtR_{t} are close to each others, but the one corresponding to only 50%50\% of symptomatic is slightly higher than the other two, see Figure 13(b). In order to understand this, let us recall that the relation between the transmission coefficients βa\beta_{a} and βs\beta_{s} is set to be 0.580.58, it means that the asymptomatic cases are less infectious than the symptomatic and symptomatic cases. Thus, when more people do not show any symptoms, the transmission coefficient has to be bigger to create enough cases to fit the observations. As a consequence, RtR_{t} will also be bigger.

Refer to caption
Refer to caption
Figure 13: Analysis mean of state IaI_{a} (a) and RtR_{t} (b) with different ratios between symptomatic and asymptomatic

5 Discussion, comparisons, and future projects

Let us start by a few easy observations about the results obtained in the previous section.

The analysis values of RtR_{t} quickly decrease during the first state of emergency, and this parameter is lower than 11 around the middle of May 2020. This indicates that the disease has started to die out. During the second state of emergency, analysis values of RtR_{t} had a much slower decay, but nevertheless in early February 2021, RtR_{t} successfully drops below 11. However it starts to increase from March while the state of emergency had not been released. In the third state of emergency, the analysis RtR_{t} had the slowest decay, compared to the previous two states of emergency. During the first several weeks of the fourth state of emergency, the effective reproduction number continued increasing, even at a faster speed. It is only after the end of the first week of August that a decay finally took place.

For EE, IaI_{a} and IsI_{s}, one finds that the three values increase at the beginning of the first state of emergency, then they slow down and change the direction to decrease. Similar behaviors can be observed in the second state of emergency, and the values start to increase around the middle of March 2021 while the RtR_{t} starts yo increase at the beginning of March. There is clearly a delay between the changes of RtR_{t} and their effect on the three compartments EE, IaI_{a} and IsI_{s}.

Let us now provide a comparison with two different regions from Japan, namely Osaka and Kanagawa, and also provide the analysis for Japan as a whole. The experiments for Osaka, Kanagawa, and Japan start from March 26th26^{\mathrm{th}} 2020, March 18th18^{\mathrm{th}} 2020, March 1st1^{\mathrm{st}} 2020 respectively. The observed compartments of these regions are the same as those for Tokyo, namely HH, DD and RR. The way to determine the initial setting is the same as that for Tokyo and we use the same setting for the parameters which have been presented in Table 1. The daily values for the parameter γH\gamma_{H} and γD\gamma_{D} for each region are also computed by using (20) and (21). The ratio between βa\beta_{a} and βs\beta_{s} is 0.580.58, and the observation error covariance matrix is assumed to be (log(1.3))2I\big{(}\log(1.3)\big{)}^{2}I.

Refer to caption
(a) Osaka
Refer to caption
(b) Kanagawa
Refer to caption
(c) Japan
Figure 14: Analysis value of RtR_{t} for Osaka, Kanagawa and Japan

In Figure 14 we provide the analysis values of RtR_{t} with 68%68\% CI and 95%95\% CI for each region. The additional red curves are again RtR_{t} given by Toyokeizai.net. The grey shaded regions mark the respective periods of state of emergency. Though each region declared some states of emergency at different periods of time, they all have the common feature that the recent declarations are less efficient than the first ones. In Figure 15, 16, and 17, we provide the analysis results for compartments EE, IsI_{s}, IsI_{s}, HH, RR and DD for each region.

In Figure 18, the three analysis RtR_{t} curves for Tokyo, Osaka and Kanagawa are drawn simultaneously. The analysis RtR_{t} for Japan is represented by red dots. One observes that the general trends for each region are similar, but some delays are also visible. For example in early April 2020, RtR_{t} in Tokyo reached the first peak then started to decline, while RtR_{t} in Kanagawa reached the first peak in the middle of April, and then start to decline. Subsequently, these two regions are quite correlated, but Osaka has a slightly independent behavior, both in the summer 2020 and in April 2021.

Based on these observations, one can suspect that the movement of populations might be related to the evolution of RtR_{t} for regions which are close to each other. The study of the effect of movements can be carried out by constructing a proper model with multi-populations connected to each others. Some parameters may be tuned, for example for simulating movement restriction’s policies. Such simulations could be useful for public health officials or for local governments to understand and estimate different disease-control strategies. We plan to work in this direction in the future.

Refer to caption
Figure 15: Analysis results of Osaka
Refer to caption
Figure 16: Analysis results of Kanagawa
Refer to caption
Figure 17: Analysis results of Japan
Refer to caption
Figure 18: RtR_{t} for each region

Acknowledgement

S. R. is supported by the grant Topological invariants through scattering theory and noncommutative geometry from Nagoya University, and by JSPS Grant-in-Aid for scientific research C no 18K03328 & 21K03292, and on leave of absence from Univ. Lyon, Université Claude Bernard Lyon 1, CNRS UMR 5208, Institut Camille Jordan, 43 blvd. du 11 novembre 1918, F-69622 Villeurbanne cedex, France.

References

  • [1] E. Armstrong, M. Runge, J. Gerardin, Identifying the measurements required to estimate rates of COVID-19 transmission, infection, and detection, using variational data assimilation, Infectious Disease Modelling 6, 133–147, 2021.
  • [2] F. Arroyo-Marioli, F. Bullano, S. Kucinskas, C. Rondón-Moreno, Tracking R of COVID-19: A new real-time estimation using the Kalman filter, PLoS ONE 16(1): e0244474, 2021.
  • [3] O. Byambasuren, M. Cardona, K. Bell, J. Clark, M.-L. McLaws, P. Glasziou, Estimating the extent of asymptomatic COVID-19 and its potential for community transmission: systematic review and meta-analysis, J. Association of Medical Microbiology and Infectious Disease Canada 5 Issue 4, 223–234, 2020.
  • [4] F. Brauer, P.v.d. Driessche, J. Wu, Mathematical epidemiology, Lecture Notes in Mathematics 1945, Springer, 2008.
  • [5] D. Buitrago-Garcia, D. Egli-Gany, M.J. Counotte, S. Hossmann, H. Imeri, A.M. Ipekci, et al., Occurrence and transmission potential of asymptomatic and presymptomatic SARS-CoV-2 infections: A living systematic review and meta-analysis, PLoS Med 17(9): e1003346, 2020.
  • [6] C.H. Bishop, B.J. Etherton, S.J. Majumdar, Adaptive sampling with the ensemble transform Kalman filter. Part I: Theoretical aspects, Mon. Wea. Rev. 129, 420–436, 2001.
  • [7] C. McAloon, A. Collins, K. Hunt, et al., Incubation period of COVID-19: a rapid systematic review and meta-analysis of observational research, BMJ Open 10:e039652, 2020.
  • [8] P.v.d. Driessche, J. Watmough, Reproduction numbers and sub-threshold endemic equilibria for compartmental models of disease transmission, Mathematical Biosciences 180, 29–48, 2002.
  • [9] G. Evensen, J. Amezcua, M. Bocquet, A. Carrassi, A. Farchi, A. Fowler, P. L. Houtekamer, C. K. Jones, R. J. de Moraes, M. Pulido, C. Sampson, F. C. Vossepoel, An international initiative of predicting the SARS-CoV-2 pandemic using ensemble data assimilation, Foundations of Data Science, American Institute of Mathematical Sciences, 2020.
  • [10] G. Evensen, Sequential data assimilation with a nonlinear quasigeostrophic model using Monte Carlo methods to forecast error statistics, J. Geophys. Res. 99, 10143–10162, 1994.
  • [11] G. Evensen, The ensemble Kalman filter: Theoretical formulation and practical implementation, Ocean Dynam. 53, 343–367, 2003.
  • [12] G. Evensen, Data Assimilation: The Ensemble Kalman Filter, Springer, 2006.
  • [13] R. Engbert, M.M. Rabe, R. Kliegl, S. Reich, Sequential Data Assimilation of the Stochastic SEIR Epidemic Model for Regional COVID-19 Dynamics, Bulletin of Mathematical Biology 83:1, 2021.
  • [14] R. Ghostine, M. Gharamti, S. Hassrouny, I. Hoteit, An Extended SEIR Model with Vaccination for Forecasting the COVID-19 Pandemic in Saudi Arabia Using an Ensemble Kalman Filter, Mathematics 9, 636. 2021.
  • [15] K.M. Gostic, L. McGough , E.B. Baskerville, S. Abbott, K. Joshi, C. Tedijanto, et al., Practical considerations for measuring the effective reproductive number, RtR_{t}, PLoS Comput Biol 16 No. 12, e1008409, 21 pages, 2020.
  • [16] H.W. Hethcote, The mathematics of infectious diseases, SIAM Rev. 42 No. 4, 599–653, 2000.
  • [17] B.R. Hunt, E.J. Kostelich, I. Szunyogh, Efficient data assimilation for spatiotemporal chaos: a local ensemble transform Kalman filter, Physica D. 230, 112–126, 2007.
  • [18] P.L. Houtekamer, F. Zhang, Review of the Ensemble Kalman filter for atmospheric data assimilation, Mon. Wea. Rev. 144, 4489–4532, 2016.
  • [19] R.E. Kalman, A new approach to linear filtering and prediction problems, Trans. ASME Ser. D: J. Basic Eng. 82, 35–45, 1960.
  • [20] R.E. Kalman, R.S. Bucy, New results in linear filtering and prediction theory, Trans. ASME Ser. D: J. Basic Eng. 83, 95–108, 1961.
  • [21] T. Kuniya, H. Inaba, Possible effects of mixed prevention strategy for COVID-19 epidemic: massive testing, quarantine and social distancing, AIMS Public Health 7 No 3, 490–503, 2020.
  • [22] M. Li, An Introduction to Mathematical Modeling of Infectious Diseases, Mathematics of Planet Earth 2, Springer 2018.
  • [23] R. Li, et al., Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2), Science 368, 489–493, 2020.
  • [24] m3: https://www.m3.com/open/iryoIshin/article/849820/
  • [25] L. Mitchell, A. Arnold, Analyzing the effects of observation function selection in ensemble Kalman filtering for epidemic models, Mathematical Biosciences 339, 2021.
  • [26] MLIT: https://www.mlit.go.jp/tetudo/tetudo_fr1_000062.html
  • [27] P. Nadler, S. Wang, R. Arcucci, X. Yang, Y. Guo, An epidemiological modelling approach for COVID-19 via data assimilation, European Journal of Epidemiology 35, 749–761, 2020.
  • [28] H. Nishiura: https://github.com/contactmodel/COVID19-Japan-Reff
  • [29] E. Ott, B.R. Hunt, I. Szunyogh, M. Corazza, E. Kalnay, D.J. Patil, J.A. Yorke, A.V. Zimin, E.J. Kostelich, Exploiting local low dimensionality of the atmospheric dynamics for efficient ensemble Kalman filtering, Preprint: https://arxiv.org/abs/physics/0203058v3.
  • [30] E. Ott, B.R. Hunt, I. Szunyogh, A.V. Zimin, E.J. Kostelich, M. Corazza, E. Kalnay, D.J. Patil, J.A. Yorke, A local ensemble Kalman filter for atmospheric data assimilation, Tellus A 56, 415–428, 2004.
  • [31] Osaka prefecture government, Citizens awareness and behavior change of measures against COVID-19, http://www.pref.osaka.lg.jp/hodo/attach/hodo-40479_4.pdf
  • [32] A.M. Pollock, J. Lancaster, Asymptomatic transmission of covid-19, BMJ 371:m4851, 2020.
  • [33] T.C. Rebollo, D. Coronil, Predictive data assimilation through Reduced Order Modeling for epidemics with data uncertainty, Preprint: https://arxiv.org/abs/2004.12341.
  • [34] C. J. Rhodes, T. D. Hollingsworth, Variational data assimilation with epidemic models, Journal of Theoretical Biology 258, 591–602, 2009.
  • [35] C. Sun, S. Richard, T. Miyoshi, Agent-based model and data assimilation: Analysis of COVID-19 in Tokyo, Preprint: https://arxiv.org/abs/2109.00258.
  • [36] M.K. Tippett, J.L. Anderson, C.H. Bishop, T.M. Hamill, J.S. Whitaker, Ensemble square-root filters, Mon. Wea. Rev. 131, 1485–1490, 2003.
  • [37] Bureau of social welfare and public health, About death cases due to COVID-19 in Tokyo, https://www.fukushihoken.metro.tokyo.lg.jp
  • [38] Tokyo metropolitan government, COVID-19 The information website, https://stopcovid19.metro.tokyo.lg.jp.
  • [39] Toyokeizai, Coronavirus disease (COVID-19) situation report in Japan, https://toyokeizai.net/sp/visual/tko/covid19/en.html.
  • [40] J.S. Whitaker, T.M. Hamill, Ensemble data assimilation without perturbed observations, Mon. Wea. Rev. 130, 1913–1924, 2002.
  • [41] World Health Organization: https://www.who.int/.