This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Early-Phase Local-Area Model for Pandemics Using Limited Data: A SARS-CoV-2 Application

Jiasheng Shilabel=e1][email protected] [    Jeffrey Morrislabel=e2][email protected] [    David Rubinlabel=e3][email protected] [    Jing Huanglabel=e4][email protected] [ School of Data Science, The Chinese University of Hong Kong, Shenzhen, China.presep=, ]e1 Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvaniapresep=, ]e2,e4 University of California presep=, ]e3
Abstract

The emergence of novel infectious agents presents challenges to statistical models of disease transmission. These challenges arise from limited, poor-quality data and an incomplete understanding of the agent. Moreover, outbreaks manifest differently across regions due to various factors, making it imperative for models to factor in regional specifics. In this work, we offer a model that effectively utilizes constrained data resources to estimate disease transmission rates at the local level, especially during the early outbreak phase when primarily infection counts and aggregated local characteristics are accessible. This model merges a pathogen transmission methodology based on daily infection numbers with regression techniques, drawing correlations between disease transmission and local-area factors, such as demographics, health policies, behavior, and even climate, to estimate and forecast daily infections. We incorporate the quasi-score method and an error term to navigate potential data concerns and mistaken assumptions. Additionally, we introduce an online estimator that facilitates real-time data updates, complemented by an iterative algorithm for parameter estimation. This approach facilitates real-time analysis of disease transmission when data quality is suboptimal and knowledge of the infectious pathogen is limited. It is particularly useful in the early stages of outbreaks, providing support for local decision-making.

Instantaneous reproduction number,
Online estimator,
Quasi-likelihood,
Time-Since-Infection model,
keywords:
\startlocaldefs\endlocaldefs

, and

1 Introduction

When a novel infectious agent emerges in a population lacking prior immunity, it can spread quickly, posing a threat of national or global outbreaks. An immediate response is crucial, but data infrastructures often fail to capture all the necessary data during the initial phase of an outbreak. For example, during the COVID-19 pandemic, although the CDC began collaborating with state, tribal, local, and territorial health departments in January 2020 to collect and validate COVID-19 case and death data, an automated data transfer system using an application programming interface was not fully established until July to improve data collection and quality at the jurisdiction level. It was only in December 2020 that the CDC set up business rules and curated official online data sources for most US counties (Khan et al., 2022, 2023). Furthermore, the first data collection on SARS-CoV-2 seroprevalence in the US did not begin until July 2020, and the estimates were not available until September 2021, with estimates only at the national level (Jones et al., 2021, 2023). Consequently, policymakers faced challenges in making timely, informed decisions with limited and potentially unreliable data in the pandemic’s early stages.

On the other hand, it’s important to note that while only a limited spectrum and quality of disease data are available, local factors may be harnessed to model an outbreak’s emergence and predict its trajectory (Pica and Bouvier, 2012; Stewart-Ibarra et al., 2014; Rubin et al., 2020; Alam and Sultana, 2021). Moreover, outbreaks are often driven by local transmission events, which can vary significantly from region to region. Local-area disease modeling is likely to be more useful than national modeling for policymakers to design mitigation strategies that consider regional-specific characteristics and needs.

A specific example is the policy questions posed by local governors during the initial stages of the COVID-19 pandemic in March 2020. At that time, governments deliberated over whether to enforce a lockdown and, if so, for how long. Some proposed a stringent short-term strategy, advocating for a strict lockdown until summer to entirely contain the virus, assuming that the virus would not survive in high temperatures. In contrast, others recommended a milder but more extended approach to social distancing, suggesting that temperature might not inhibit the virus’s spread, warranting a longer-term strategy (Roach, 2020; Aubrey, 2020; Gorman, 2020). To make an informed decision, it was essential to understand the impact of temperature on the virus’s transmission rate and to compare the effects of temperature and social distancing on the spread of this new pathogen. At that time, the pandemic had only begun a few weeks prior, so there wasn’t sufficient data within a single location to estimate the effect of temperature. A national model would also not be able to answer such a local question. We proposed utilizing data from multiple counties, spanning different latitudes with varying temperatures experienced during those initial weeks, to help gauge the impact of temperature, while adjusting for county-level covariates.

Regression models are often well-suited statistical tools for such an analysis. The challenge lies in creating or adapting an epidemiological model that captures the dynamics of disease transmission using limited and potentially unreliable data. This model must also be harmoniously integrated with a regression model to gauge the impact of local factors on the transmission process. Not all models are suitable for this task. For example, classical compartment models, which segregate individuals within a population into distinct compartments over varying time intervals, present challenges when data for all compartments are not available. The more compartments assumed, the more data variables are needed to fit the model. Additionally, they also face challenges when attempting to integrate with regression models. These models often treat the disease transmission parameter as a piecewise constant function over time, and they rely on solving a complex set of differential equations, either deterministic or stochastic, to determine disease transmission rates. In contrast, the time-since-infection (TSI) models, rooted in assumptions about the renewal process of case incidence numbers (which are treated as random variables and might be the sole disease data variables required for model fitting), can be tailored to regression models by applying specific distributional assumptions to the random variables representing case incidence numbers (Quick, Dey and Lin, 2021).

The TSI models are designed around the premise that the number of new infections at a particular moment, such as day tt, is influenced by three primary factors: the count of recent infections, the reproduction number at that moment, and an infectiousness function that measures the contagiousness of an infected person at each day since infection. For illustration, let IitI_{it} represent the new infections on day tt in location ii. If we know the infections before day tt, denoted by Ii0,,Ii,t1I_{i0},...,I_{i,t-1}, the anticipated new infections on day tt in location ii can be deduced as the multiplication of the instantaneous reproduction number and the infection potential on that particular day at the given location. This relationship can be expressed as:

E(Iit|Ii0,,Ii,t1)=RitΛit,\displaystyle E(I_{it}|I_{i0},\cdots,I_{i,t-1})=R_{it}\Lambda_{it}, (1.1)

where RitR_{it} represents the time-varying reproduction number and Λits=1tIi,tsωs\Lambda_{it}\triangleq\sum_{s=1}^{t}I_{i,t-s}\omega_{s} stands for the infection potential on day tt in location ii. The infection potential is shaped by the current number of infectious individuals and the infectiousness function ωs\omega_{s}, which quantifies the infectiousness of the existing infectious individuals on the ss-th day after infection, with s=0ωs=1\sum_{s=0}^{\infty}\omega_{s}=1. This infectiousness function can be approximated using the probability distribution of the serial interval or generation time (Svensson, 2007). To estimate RitR_{it}, one may assume distribution assumptions for IitI_{it} based on equation (1.1) and use methods like maximum likelihood or Bayesian approaches (Cori et al., 2013; WHO Ebola Response Team, 2014; Quick, Dey and Lin, 2021) with a predetermined ωs\omega_{s}. Due to the empirical nature of this model, it can be readily adapted to regression models, broadening its applicability (Quick, Dey and Lin, 2021).

TSI models were extensively used to analyze the COVID-19 pandemic (Pan et al., 2020; Nouvellet et al., 2021; Amman et al., 2022; Nash, Nouvellet and Cori, 2022; Ge et al., 2023), particularly in the early stages, likely because data on reported daily cases were mainly available during that time period. The models demonstrated superior accuracy in estimating disease transmission compared to many other methods (Gostic et al., 2020; Nash, Nouvellet and Cori, 2022). However, estimates of RitR_{it} can be biased due to data errors or incompleteness in IitI_{it}, which can stem from batch reporting, delayed reporting, under-ascertainment, manual errors, and other factors.. There are different ways to address these data challenges, one of which is to leverage additional data sources. For example, recent work have used serological studies and symptom surveys to address the under-ascertainment and delayed reporting of case incidence (Lison et al., 2023; Quick, Dey and Lin, 2021; Noh and Danuser, 2021; Dempsey, 2020). Relevant to our proposed work, the model introduced by Quick, Dey and Lin (2021) integrates testing data and population-based serological surveys to account for under-ascertainment and delayed reporting. However, these data are often unavailable at the local-area level and in the early stages of outbreaks.

In our study, we address these challenges in a distinct context, focusing on the early phase of an outbreak when additional data resources might be unavailable at the local-level. Considering the scarce of disease data and limited confidence about the pandemic models, our model introduces a general measurement error term, instead of relying on additional data and parametric assumptions, to model the mechanism of errors. This term accounts for deviations in RitR_{it} that may arise from data inaccuracies or inaccurate assumptions about the model and cannot be explained by observed local-area factors. In essence, the proposed model is a local-area disease transmission model robust to data errors, merging a TSI model of transmission dynamics with regression models. This framework identifies key local-area factors influencing disease transmission variability across regions. We utilize the quasi-score method to ease distributional assumptions about the data and make minimal parametric assumptions on the measurement error component, bolstering model robustness. Our model offers flexibility in estimating the RitR_{it} values even when relying on imperfect data or potentially mis-specified model assumptions. This is especially valuable during an outbreak’s early stages when additional data resources like testing data and serological surveys aren’t available and knowledge about the infectious agent is sparse. Furthermore, we introduce an online estimator coupled with an efficient iterative algorithm to estimate model parameters. This methodology supports continuous monitoring and dynamic forecasting of disease transmission under various scenarios, offering critical insights for local decision-making.

2 Method

In this section, we introduce the proposed model and estimation procedure, as well as the approach to construct confidence intervals for the estimates.

2.1 An early phase local-area model for modeling disease transmission

Building on Equation (1.1), for counties i=1,,ni=1,\cdots,n, and time points t=1,,Nt=1,\cdots,N, we posit moment constraints on the number of new infections on day tt in location ii given previously reported existing infections prior to day tt, as:

μit\displaystyle\mu_{it} 𝔼(Iit|i,t1)=𝔼(Rit|i,t1)Λit,νitVar(Iit|i,t1)=g(μit)ϕi,\displaystyle\triangleq\mathbbm{E}\big{(}I_{it}|\mathcal{F}_{i,t-1}\big{)}=\mathbbm{E}\big{(}R_{it}|\mathcal{F}_{i,t-1}\big{)}\Lambda_{it},\qquad\nu_{it}\triangleq\mbox{Var}\big{(}I_{it}|\mathcal{F}_{i,t-1}\big{)}=g(\mu_{it})\cdot\phi_{i}, (2.1)

where i,t\mathcal{F}_{i,t} is the filtration of past incident cases information, ϕi\phi_{i} is a dispersion parameter, g()g(\cdot) is a known variance function, and μit=RitΛit\mu_{it}=R_{it}\Lambda_{it} when Riti,t1R_{it}\in\mathcal{F}_{i,t-1}. When g()g(\cdot) is unknown, smoothing techniques, e.g. local polynomial fitting, can be used to estimate the variance function (Chiou and Müller, 1998). More discussions on unknown variance functions can be found in Section A of the Supplementary Materials.

As previously noted, Λits=1tIi,tsωs\Lambda_{it}\triangleq\sum_{s=1}^{t}I_{i,t-s}\omega_{s} represents the infection potential, determined by the current number of infectious individuals and the infectiousness function ωs\omega_{s}. We assume that the ωs\omega_{s}’s represent the probability distribution of the serial interval or generation time, and set ωs=0\omega_{s}=0 when s=0s=0 or s>ηs>\eta, with η\eta denoting the duration from infection to either recovery or mandatory quarantine. We use ω={ωs,1sη}\omega=\{\omega_{s},1\leq s\leq\eta\} to denote the vector of ωs\omega_{s}. In practice, ω\omega is a vector of positive values, often obtained from epidemiological studies of infectious pathogens (He et al., 2020). In cases where multiple studies on serial intervals exist, a meta-analysis can be conducted to obtain a pooled estimate.

To estimate RitR_{it} and investigate the impact of local area factors on disease transmission, we further assume a time series model as follows:

h(Rit)\displaystyle h(R_{it}) =WitTβˇi+XitTβˇ0+m=1qθmfm(D1,it,D2,it,D3,it)+ϵit,\displaystyle=W^{T}_{it}\check{\beta}_{i}+X_{it}^{T}\check{\beta}_{0}+\sum_{m=1}^{q}\theta_{m}f_{m}(D_{1,it},D_{2,it},D_{3,it})+\epsilon_{it}, (2.2)

where D1,it={Xij}0jtD_{1,it}=\big{\{}X_{ij}\big{\}}_{0\leq j\leq t}, D2,it={Iij}0jt1D_{2,it}=\big{\{}I_{ij}\big{\}}_{0\leq j\leq t-1}, D3,it={Rij}0jt1D_{3,it}=\big{\{}R_{ij}\big{\}}_{0\leq j\leq t-1}, and i,t1=σ(D1,itD2,it)\mathcal{F}_{i,t-1}=\sigma(D_{1,it}\cup D_{2,it}). The term m=1qθmfm(D1,it,D2,it,D3,it)\sum_{m=1}^{q}\theta_{m}f_{m}(D_{1,it},D_{2,it},D_{3,it}) describes the time-series dependency of the disease transmission rate. Specifically, fm()f_{m}(\cdot) is a function of past data and θm0\theta_{m}\geq 0 for 1m<q1\leq m<q. fq1f_{q}\equiv 1 is set as the constant function and θq\theta_{q} is the intercept term. The h()h(\cdot) is a known link function. Moreover, Witp1W_{it}\in\mathbb{R}^{p_{1}} and Xitpp1X_{it}\in\mathbb{R}^{p-p_{1}} are vectors of local-area factors for location ii on day tt associated with disease transmission. The factors in WitTW_{it}^{T} have fixed location-specific effects, while the factors in XitTX_{it}^{T} have common effects across locations. To simplify notion, we consolidate the presentation of all local-area factors as Zit=(XitT,0,,0p(i1),WitT,0,,0p(ni))TZ_{it}=(X_{it}^{T},\overbrace{0,\cdots,0}^{p(i-1)},W_{it}^{T},\overbrace{0,\cdots,0}^{p(n-i)})^{T}, present the vector of all regression coefficients of these factor as β=(βˇ0T,βˇ1T,,βˇnT)Tp\beta=(\check{\beta}_{0}^{T},\check{\beta}_{1}^{T},\cdots,\check{\beta}_{n}^{T})^{T}\in\mathbb{R}^{p}, and denote the vector of all other regression coefficients as θ=(θ1,,θq)Tq\theta=(\theta_{1},\cdots,\theta_{q})^{T}\in\mathbb{R}^{q}.

In practice, a choice for Equation (2.2) can be a non-stationary log-transmission model with an autoregressive process as follows:

log(Rit)=ZitTβ+m=1q1θmlog(Ri,tm)+θq+ϵit,θ1<1andμit=νit,fortq,θm0.\log(R_{it})=Z_{it}^{T}\beta+\sum_{m=1}^{q-1}\theta_{m}\log(R_{i,t-m})+\theta_{q}+\epsilon_{it},\;\|\theta\|_{1}<1\;\text{and}\;\mu_{it}=\nu_{it},\;{\rm for}\;t\geq q,\;\theta_{m}\geq 0. (2.3)

In the above models, to counter potential biases arising from data errors and incorrectly specified model assumptions, we incorporate ϵit\epsilon_{it} as a measurement error term for RitR_{it}. This term can also be viewed as a random effect term of RitR_{it}, capturing unexplained deviations in RitR_{it} that could result from inaccuracies in reported infection data or model mis-specifications. At this point, our proposed model breaks down the serial dependency of incidence cases into two components: Equation (2.1) encapsulates the serial dependency inherent in the generative nature of disease transmission given RitR_{it}, and Equation (2.2) portrays the serial dependency arising from the RitR_{it} interdependence between adjacent time points, elucidated by covariates and time series structures.

2.2 Estimation

With the measurement error term in Equation (2.2), the estimation of RitR_{it} and parameters in the proposed model becomes difficult. In a simple scenario when ϵit\epsilon_{it} is assumed to be 0, Equation (2.2) reduces to

h(Rit)\displaystyle h(R_{it}) =ZitTβ+m=1qθmfm(D1,it,D2,it,D3,it).\displaystyle=Z_{it}^{T}{\beta}+\sum_{m=1}^{q}\theta_{m}f_{m}(D_{1,{it}},D_{2,it},D_{3,it}). (2.4)

In this situation, an estimator of the parameters γ=(βT,θT)T\gamma=(\beta^{T},\theta^{T})^{T}, denoted as γ^\hat{\gamma}, can be obtained by solving UN(γ)=0U_{N}({\gamma})=0, where UN(γ)U_{N}(\gamma) is the quasi-score function

UN(γ)=i=1nt=qN(μitγ)Tνit1(Iitμit)i=1nt=qNξit(γ).U_{N}(\gamma)=\sum_{i=1}^{n}\sum_{t=q}^{N}\big{(}\frac{\partial\mu_{it}}{\partial\gamma}\big{)}^{T}\nu_{it}^{-1}(I_{it}-\mu_{it})\triangleq\sum_{i=1}^{n}\sum_{t=q}^{N}\xi_{it}(\gamma). (2.5)

However, when ϵit\epsilon_{it} is a non-degenerate random variable, calculating the expectation in μit=𝔼[Rit|i,t1]Λit\mu_{it}=\mathbbm{E}[R_{it}|\mathcal{F}_{i,t-1}]\Lambda_{it} and solving UN(γ)=0U_{N}(\gamma)=0 are generally difficult. Similar models were studied by Davis, Dunsmuir and Streett (2003) to model count data using the log link function, where the measurement error was assumed to be a stationary process and calculated using an autoregressive moving average recursion with a distributed lag structure. Here, given that {Rit}1tN1in\{R_{it}\}_{1\leq t\leq N}^{1\leq i\leq n} are not directly observed, we describe an iterative algorithm for estimating parameters in (2.2) without requiring any stationary structure and distribution assumption on the measurement error term. Briefly, the proposed iterative algorithm consists of two steps. The first step uses the second layer of the model, the time series model, to construct a semi-parametric, locally efficient estimator of β\beta based on a location-shift regression model. In this step, β\beta and RitR_{it} can be written as functions of θ\theta using an initial estimate of θ\theta or an estimate from the last iteration. In the second step, the parameters were updated using the first layer of the model, the quasi-score function. Specifically, for t=q,,Nt=q,\cdots,N, i=1,,ni=1,\cdots,n, we first rearrange Equation (2.2) to

h(Rit)m=1qθmfm(D1,it,D2,it,D3,it)=ZitTβ+ϵit.h(R_{it})-\sum_{m=1}^{q}\theta_{m}f_{m}(D_{1,it},D_{2,it},D_{3,it})=Z_{it}^{T}\beta+\epsilon_{it}.

Assuming the Gauss-Markov assumption (Wooldridge, 2016), i.e., the conditional independence of {ϵit|Zit,t1}\{\epsilon_{it}|Z_{it},t\geq 1\}, 𝔼[ϵit|Zit]=0\mathbbm{E}[\epsilon_{it}|Z_{it}]=0 and homoscedasticity, the above equation forms a location-shift regression model. Thus, assuming θ\theta is known, a semi-parametric locally efficient estimator (Tsiatis, 2007) for β\beta is obtained by

β^\displaystyle\hat{\beta} ={i=1nt=1N(ZitZ¯)(ZitZ¯)T}1\displaystyle=\left\{\sum_{i=1}^{n}\sum_{t=1}^{N}(Z_{it}-\bar{Z})(Z_{it}-\bar{Z})^{T}\right\}^{-1} (2.6)
i=1nt=1N(ZitZ¯)(h(Rit)m=1qθmfm(D1,it,D2,it,D3,it)),\displaystyle\qquad\qquad\sum_{i=1}^{n}\sum_{t=1}^{N}(Z_{it}-\bar{Z})\left(h(R_{it})-\sum_{m=1}^{q}\theta_{m}f_{m}(D_{1,it},D_{2,it},D_{3,it})\right),

where Z¯=(i=1nt=1TZit)/Nn\bar{Z}=\big{(}\sum_{i=1}^{n}\sum_{t=1}^{T}Z_{it}\big{)}/Nn, NN is the total number of observed time points and nn is the number of counties. Based on this formulation, we develop an iterative estimation method, namely the local-area disease transmission model using the quasi-score estimation (LOCAL-QUEST), to estimate γ\gamma and RitR_{it} by iteratively updating the locally-efficient semi-parametric estimator of β\beta and the quasi-score estimator of θ\theta as shown in Algorithm 1. Intuitively, with an initial estimate of θ\theta and β\beta, intermediate semi-parametric locally efficient estimator of β\beta can be written as a function of θ\theta and these initial estimates using equations (2.6) and (2.4). Then θ^\hat{\theta} can be updated by solving equation (2.5) and β^\hat{\beta} is updated by the updated θ^\hat{\theta} and equation (2.6). The algorithm is then proceeded iteratively until it converges. Due to its iterative nature, this procedure provides online estimators (Bottou, 1998; Farhang-Boroujeny, 2013) for {Rit}1tN1in\{R_{it}\}_{1\leq t\leq N}^{1\leq i\leq n} and γ\gamma, allowing daily updates or update when new data become available. This can be particularly useful for real-time monitoring of outbreaks, as disease incidence and local-area factors are changing over time, and the data stream comes in continuously. The parameters are updated incrementally as new data arrives, which also reduces the requirement of computational memory.

Algorithm 1 : The LOCAL-QUEST algorithm.
Estimation of the instantaneous reproduction number and local-area effects
1:procedure  Initialize the parameters using data from a small initial period of the pandemic τ0\tau_{0} and the model (2.4) that ignores measurement error. Denote the initial values as θ^(τ0)\hat{\theta}^{(\tau_{0})}, β^(τ0)\hat{\beta}^{(\tau_{0})} and {R^it(τ0)}0<tτ01in\{\hat{R}_{it}^{(\tau_{0})}\}_{0<t\leq\tau_{0}}^{1\leq i\leq n}.
2:    loop
3:         for k>τ0k>\tau_{0}, and kNk\leq N, with estimates θ^(k1),β^(k1)\hat{\theta}^{(k-1)},\hat{\beta}^{(k-1)}, and {R^it(k1)}0<tk11in\{\hat{R}_{it}^{(k-1)}\}_{0<t\leq k-1}^{1\leq i\leq n} obtained in the last iteration, do
4:             write the approximations of β^\hat{\beta} and {R^it}τ0+1tk1in\{\hat{R}_{it}\}_{\tau_{0}+1\leq t\leq k}^{1\leq i\leq n} as a function of θ\theta using equations (2.6)(\ref{sudoestimator}) and (2.4)(\ref{model_assumption}), that is
β~(k)\displaystyle\tilde{\beta}^{(k)} {i=1nt=1k1(ZitZ¯k1)(ZitZ¯k1)T}1\displaystyle\triangleq\left\{\sum_{i=1}^{n}\sum_{t=1}^{k-1}(Z_{it}-\bar{Z}_{k-1})(Z_{it}-\bar{Z}_{k-1})^{T}\right\}^{-1} (2.7)
×i=1nt=1(k1)(ZitZ¯k1)[h(R^it(k1))m=1qθmfm(D1,it,D2,it,D^3,it(k1))]\displaystyle\qquad\times\sum_{i=1}^{n}\sum_{t=1}^{(k-1)}(Z_{it}-\bar{Z}_{k-1})\Big{[}h(\hat{R}_{it}^{(k-1)})-\sum_{m=1}^{q}\theta_{m}f_{m}\big{(}D_{1,it},D_{2,it},\hat{D}_{3,it}^{(k-1)}\big{)}\Big{]}
and R~it(k)h1(m=1qθmfm(D1,it,D2,it,D^3,it(k1))+ZitTβ~(k)),\displaystyle\tilde{R}_{it}^{(k)}\triangleq h^{-1}\left(\sum_{m=1}^{q}\theta_{m}f_{m}(D_{1,it},D_{2,it},\hat{D}_{3,it}^{(k-1)})+Z_{it}^{T}\tilde{\beta}^{(k)}\right), (2.8)
5:             Obtain θ^(k)\hat{\theta}^{(k)} by solving the quasi-score equation
Uk(θ)=i=1nt=τ0+1k(μ~it(k)θ)T(ν~it(k))1(Iitμ~it(k))U_{k}(\theta)=\sum_{i=1}^{n}\sum_{t=\tau_{0}+1}^{k}\big{(}\frac{\partial\tilde{\mu}_{it}^{(k)}}{\partial\theta}\big{)}^{T}\big{(}\tilde{\nu}_{it}^{(k)}\big{)}^{-1}(I_{it}-\tilde{\mu}_{it}^{(k)}) (2.9)
using iterative methods, e.g. Newton-Raphson and iteratively reweighted least squares, where μ~it(k)=ν~it(k)R~it(k)Λit\tilde{\mu}_{it}^{(k)}=\tilde{\nu}_{it}^{(k)}\triangleq\tilde{R}_{it}^{(k)}\Lambda_{it}.
6:             Obtain β^(k)\hat{\beta}^{(k)} and {R^it(k)}1tk1in\{\hat{R}_{it}^{(k)}\}_{1\leq t\leq k}^{1\leq i\leq n} by plugging θ^(k)\hat{\theta}^{(k)} into (2.7) and (2.8), i.e.,
β^(k)\displaystyle\hat{\beta}^{(k)} {i=1nt=1k1(ZitZ¯k1)(ZitZ¯k1)T}1\displaystyle\triangleq\left\{\sum_{i=1}^{n}\sum_{t=1}^{k-1}(Z_{it}-\bar{Z}_{k-1})(Z_{it}-\bar{Z}_{k-1})^{T}\right\}^{-1} (2.10)
×i=1nt=1(k1)(ZitZ¯k1)[h(R^it(k1))m=1qθ^m(k)fm(D1,it,D2,it,D^3,it(k1))],\displaystyle\qquad\times\sum_{i=1}^{n}\sum_{t=1}^{(k-1)}(Z_{it}-\bar{Z}_{k-1})\Big{[}h(\hat{R}_{it}^{(k-1)})-\sum_{m=1}^{q}\hat{\theta}_{m}^{(k)}f_{m}\big{(}D_{1,it},D_{2,it},\hat{D}_{3,it}^{(k-1)}\big{)}\Big{]},
and R^it(k)h1(m=1qθ^m(k)fm(D1,it,D2,it,D^3,it(k))+ZitTβ^(k)).\displaystyle\hat{R}_{it}^{(k)}\triangleq h^{-1}\left(\sum_{m=1}^{q}\hat{\theta}_{m}^{(k)}f_{m}(D_{1,it},D_{2,it},\hat{D}_{3,it}^{(k)})+Z_{it}^{T}\hat{\beta}^{(k)}\right). (2.11)
7:         end for
8:    end loop
9:end procedure
10:

2.3 Uncertainty quantification

We use the block bootstrap method introduced by Hall (1985) and Künsch (1989) to quantify the uncertainty of the proposed online estimator. We will illustrate this calculation for β^\hat{\beta}. The confidence band of {R^it}1tN1in\{\hat{R}_{it}\}_{1\leq t\leq N}^{1\leq i\leq n} can be obtained similarly. Given the time series data of daily case incidence and local area factors, denoted as{(Iit,Zit)}1tN,1in\{(I_{it},Z_{it})\}_{1\leq t\leq N,1\leq i\leq n}, where ZitZ_{it} is a pp-dimensional vector of covariates at time tt in location ii, we assume that β\beta can be estimated using a block of the time series data as long as the length of the block exceeds \ell. Specifically, we require that the expression (it𝒮i(ZitZ¯𝒮)(ZitZ¯𝒮)T)1\left(\sum_{i}\sum_{t\in\mathcal{S}_{i}}(Z_{it}-\bar{Z}_{\mathcal{S}})(Z_{it}-\bar{Z}_{\mathcal{S}})^{T}\right)^{-1} be well defined for any arbitrary set 𝒮i{1,,N}\mathcal{S}_{i}\subset\{1,\cdots,N\} with |𝒮i||\mathcal{S}_{i}|\geq\ell, i=1,,ni=1,\cdots,n and Z¯𝒮\bar{Z}_{\mathcal{S}} is defined as an analogy to Z¯\bar{Z}.

For BB bootstrapping samples, we assume the sampled block 𝒮b\mathcal{S}_{b} is the same for all counties in one bootstrapping observation for simplicity, b=1,,Bb=1,\cdots,B. That is, we denote a block of data up to time tt with length \ell as

ξt={(Ii,t+1,Zi,t+1),,(Iit,Zit),i=1,,n}\xi_{t}=\Big{\{}\big{(}I_{i,t-\ell+1},Z_{i,t-\ell+1}\big{)},\cdots,\big{(}I_{it},Z_{it}\big{)},i=1,\cdots,n\Big{\}}

for t=,,Nt=\ell,\cdots,N. We adopt the suggested =O(N1/3)\ell=O(N^{1/3}) from Bühlmann and Künsch (1999) and independently resample BB blocks of data with replacement, for B=O((Npτ0)/)B=O(\lfloor(N-p-\tau_{0})/\ell\rfloor) (Bühlmann, 2002) to obtain BB bootstrap samples ξt1,ξt2,,ξtB\xi_{t_{1}},\xi_{t_{2}},\ldots,\xi_{t_{B}}.

For the effect of jj-th covariate, βj\beta_{j}, where j=1,,pj=1,\cdots,p, let’s denote the estimator based on the bb-th sample as β^tb,j\hat{\beta}^{*}_{t_{b},j} and the estimator using the whole data as βj^\hat{\beta_{j}}. One may use {β^tb,jβ^j:b=1,,B}\{\hat{\beta}^{*}_{t_{b},j}-\hat{\beta}_{j}:b=1,\cdots,B\} to approximate the empirical distribution of (β^jβj)(\hat{\beta}_{j}-\beta_{j}) or use {β^tb,j:b=1,,B}\{\hat{\beta}^{*}_{t_{b},j}:b=1,\cdots,B\} as an approximation of the empirical distribution of β^j\hat{\beta}_{j}. Thus, the level 1α1-\alpha bootstrap confidence interval of βj\beta_{j} based on the BB samples, denoted as (Lj,α/2,B,Uj,α/2,B)(L_{j,\alpha/2,B}^{*},U_{j,\alpha/2,B}^{*}), can be constructed by

Lj,α/2,B=2β^jsup{t:1Bb=1B𝟙(β^tb,jt)1α2},\displaystyle L_{j,\alpha/2,B}^{*}=2\hat{\beta}_{j}-\sup\left\{t:\frac{1}{B}\sum_{b=1}^{B}\mathbbm{1}(\hat{\beta}^{*}_{t_{b},j}\leq t)\leq 1-\frac{\alpha}{2}\right\},
Uj,α/2,B=2β^jinf{t:1Bb=1B𝟙(β^tb,jt)α2},or\displaystyle U_{j,\alpha/2,B}^{*}=2\hat{\beta}_{j}-\inf\left\{t:\frac{1}{B}\sum_{b=1}^{B}\mathbbm{1}(\hat{\beta}^{*}_{t_{b},j}\leq t)\geq\frac{\alpha}{2}\right\},\;\;\text{or} (2.12)
Lj,α/2,B=inf{t:1Bb=1B𝟙(β^tb,jt)α2},Uj,α/2,B=sup{t:1Bb=1B𝟙(β^tb,jt)1α2}.\displaystyle L_{j,\alpha/2,B}^{*}=\inf\left\{t:\frac{1}{B}\sum_{b=1}^{B}\mathbbm{1}(\hat{\beta}^{*}_{t_{b},j}\leq t)\geq\frac{\alpha}{2}\right\},\;U_{j,\alpha/2,B}^{*}=\sup\left\{t:\frac{1}{B}\sum_{b=1}^{B}\mathbbm{1}(\hat{\beta}^{*}_{t_{b},j}\leq t)\leq 1-\frac{\alpha}{2}\right\}.

3 Properties of the Estimators and Algorithm

In this section, we show the asymptotic results of the estimators and property of the estimating procedure.

3.1 Asymptotic properties

The proposed method is a general approach for various counting processes and is robust to mis-specification of distribution assumptions. When the measurement error can be ignored, i.e., ϵit=0\epsilon_{it}=0, the Poisson process used in (Cori et al., 2013) becomes a special case of Equation (2.1), with νit=μit\nu_{it}=\mu_{it}. Since Ut(γ)tσ(i=1ni,t)U_{t}(\gamma)\in\mathcal{F}_{t}\triangleq\sigma(\cup_{i=1}^{n}\mathcal{F}_{i,t}) and μit/γ,μit,νiti,t1\partial\mu_{it}/\partial\gamma,\mu_{it},\nu_{it}\in\mathcal{F}_{i,t-1}, then (Us(γ),s),sq{(U_{s}(\gamma),\mathcal{F}_{s}),s\geq q} forms a mean zero martingale sequence, indicating that UN(γ)U_{N}(\gamma) is an unbiased estimating function. For asymptotic properties of the estimators, we consider the least favorable case where we only have one county’s data and discard subscript ii in all notations. Denote the true parameter value as γ0\gamma_{0}, and according to Theorem 3 of Kaufmann (1987), we have:

Theorem 3.1.

Under condition 1 described in Section B of the Supplementary Material and on the non-extinction set defined in (B.1),

[t=qNCov(ξt(γ0)|t1)]1/2(γ^γ0)𝑑𝒩(0,I).\Big{[}\sum_{t=q}^{N}\mbox{Cov}\big{(}\xi_{t}(\gamma_{0})|\mathcal{F}_{t-1}\big{)}\Big{]}^{1/2}(\hat{\gamma}-\gamma_{0})\overset{d}{\longrightarrow}\mathcal{N}(0,I). (3.1)

In the Supplementary Materials, we also show a special case of (2.4), given by

log(Rit)=ZitTβ+θq+m=1q1θm[log(Ri,tm)Zi,tmTβ],fort>q>0,\log(R_{it})=Z_{it}^{T}\beta+\theta_{q}+\sum_{m=1}^{q-1}\theta_{m}\left[\log(R_{i,t-m})-Z_{i,t-m}^{T}\beta\right],\;{\rm for}\;t>q>0, (3.2)

where θ1<1\|\theta\|_{1}<1 and θm0\theta_{m}\geq 0 for m<qm<q satisfies (3.1) under a more traceable condition 2.

3.2 Bias correction and property of the estimation procedure

When the measurement error cannot be ignored, i.e., ϵit0\epsilon_{it}\neq 0, the proposed iterative algorithm consists of two major steps: one that utilizes a semi-parametric locally efficient estimator to express β\beta and RitR_{it} as functions of θ\theta, and another that updates the estimates by solving the score equation. However, the latter step is subject to estimation bias due to the non-zero measurement error term in equation (2.2), which can cause bias in the score equation (Cameron and Trivedi, 2013). Correcting this bias in the general case represented by (2.2) is challenging, but it can be tackled in the special case described by (2.3) by requiring 𝔼(eϵt|t1)=c0\mathbbm{E}\big{(}e^{\epsilon_{t}}|\mathcal{F}_{t-1}\big{)}=c_{0} for some constant c0c_{0} (Zeger, 1988). Therefore, we can correct the bias by modifying the quasi-score equation (2.9) into an unbiased estimating equation as follows:

Uk(θ)=t=τ0+1k(μ~t(k)θ)T(ν~t(k))1(It/c0μ~t(k)).U_{k}^{*}(\theta)=\sum_{t=\tau_{0}+1}^{k}\big{(}\frac{\partial\tilde{\mu}_{t}^{(k)}}{\partial\theta}\big{)}^{T}\big{(}\tilde{\nu}_{t}^{(k)}\big{)}^{-1}(I_{t}/c_{0}-\tilde{\mu}_{t}^{(k)}).

In practice, c0c_{0} can be treated as a nuisance parameter and estimated by solving the quasi-score equation. Moreover, under Equation (2.3), solving the modified quasi-score Equation (2.9) is equivalent to obtaining

θ^(k)=argmaxθ1<1t=τ0+1k((It/c0)logR~t(k)R~t(k)Λt),\hat{\theta}^{(k)}=\arg\max_{\|\theta\|_{1}<1}\sum_{t=\tau_{0}+1}^{k}\Big{(}(I_{t}/c_{0})\log\tilde{R}_{t}^{(k)}-\tilde{R}_{t}^{(k)}\Lambda_{t}\Big{)}, (3.3)

where θ^(k)\hat{\theta}^{(k)} is the MLE based on conditional profile likelihood of Poisson counts. Therefore, the proposed online estimation procedure guarantees the following properties. The proof of the theorems is presented in Section C of the Supplementary Materials.

Theorem 3.2 (Concavity).

When k>τ0k>\tau_{0}, updating the estimates in each iteration, which is equivalently to solve equation (3.3), is a globally concave maximization problem.

Theorem 3.3 (Iteration Difference).

Given regularity condition 3 in Section C of the Supplementary Materials, using data of observed case incidence and local-area factors up to time NN, {(It,Zt)}1tN\{(I_{t},Z_{t})\}_{1\leq t\leq N}, for each 1mq11\leq m\leq q-1,

|θ^m(k)θ^m(k1)|=O(1k1+Ik(t=τ0+1k1It)),\big{|}\hat{\theta}_{m}^{(k)}-\hat{\theta}_{m}^{(k-1)}\big{|}=O\Big{(}\frac{1}{k-1}+\frac{I_{k}}{\big{(}\sum_{t=\tau_{0}+1}^{k-1}I_{t}\big{)}}\Big{)},

where kNk\leq N is the indicator of iteration.

Theorem 3.3 shows that the bound of the step-wise difference of the online estimator between iterations decreases as the number of observation time increases. The consistency of the estimator remains an open question as discussed in Davis, Dunsmuir and Streett (2003). The difficulty arises from the measurement error, ϵit\epsilon_{it}, and the series dependency of disease incidence, Λit\Lambda_{it}, in the model. Both elements are essential in modeling the dynamics of disease transmission. Similar but simplified models, which do not model the measurement errors and series dependency of disease incidence were studied for different contexts, e.g. studies of economics or disease without series dependency of disease incidence (Davis, Dunsmuir and Wang, 1999, 2000; Davis, Dunsmuir and Streett, 2003; Fokianos, Rahbek and Tjøstheim, 2009; Neumann et al., 2011; Doukhan, Fokianos and Li, 2012; Doukhan, Fokianos and Tjøstheim, 2012). Inspired by the derivation and discussion in Davis, Dunsmuir and Streett (2003), we conjecture that conditions requiring stationary and uniformly ergodic of log(RtΛt)\log(R_{t}\Lambda_{t}) are needed to establish the consistency of estimators of model (2.3), although these conditions may be difficult to verify in practice.

4 Simulation Studies

This section describes the simulation studies we conducted to assess the performance of the proposed method. We focused on the estimation of the disease transmission rate and the impacts of time-varying factors . We calculated the relative bias, the coefficient of variation of the estimates, and the Bootstrap confidence interval coverage probability under various scenarios. Additionally, we compared the performance of the proposed model and the basic SIR model, the details of which are presented in Section A of the Supplementary Material.

4.1 Simulation settings

We generated data on the daily numbers of SARS-CoV-2 infections, along with associated local-area factors influencing the disease transmission rate. We generated data in a single location with time-varying factors and suppressed the location subscript ii in this section. We fit the proposed model, as described by Equation (2.3), assuming that the RtR_{t} values exhibited an AR(1) dependency over time and were linked with an intercept term and two time-varying factors, i.e. q=2q=2 and p=2p=2. Each scenario was replicated 1,000 times to calculate evaluation statistics. The infectiousness function was assumed to be known and was modeled using the probability density function of a gamma distribution, in a manner mirroring the results of a previous epidemiological study of SARS-CoV-2 cases in Wuhan, China (Li et al., 2020). The estimation algorithm was applied with a pre-estimation period of 5 days, i.e., τ0=5\tau_{0}=5.

Table 1: Parameter values used to generate data in simulation scenarios. In all scenarios, we set β1=0.02\beta_{1}=-0.02 and β2=0.125\beta_{2}=-0.125. Except for scenario 6, the standard deviation of the error terms in each model is determined by (Varϵ)1/2𝔼|ZT1|β1(\mbox{Var}\epsilon)^{1/2}\asymp\mathbbm{E}|Z_{T1}|\beta_{1}. Zt1Z_{t1} was generated from 4.5+(tT2)/6+𝒩(0,9)4.5+(t-\frac{T}{2})/6+\mathcal{N}(0,9) in scenario 1 and generated from 9+(tT2)/16+𝒩(0,9)9+(t-\frac{T}{2})/16+\mathcal{N}(0,9) in scenario 3 to avoid unrealistically large values of RtR_{t}.
Scenario 1: T=90,T=90, I0=500,I_{0}=500, ϵN(0,103),\epsilon\sim N(0,10^{-3}), θ2=0.5,\theta_{2}=0.5, θ1=0.7,\theta_{1}=0.7,
Scenario 2: T=120,T=120, I0=500,I_{0}=500, ϵN(0,103),\epsilon\sim N(0,10^{-3}), θ2=0.5,\theta_{2}=0.5, θ1=0.7,\theta_{1}=0.7,
Scenario 3: T=200,T=200, I0=500,I_{0}=500, ϵN(0,103),\epsilon\sim N(0,10^{-3}), θ2=0.5,\theta_{2}=0.5, θ1=0.7,\theta_{1}=0.7,
Scenario 4: T=120,T=120, I0=125,I_{0}=125, ϵN(0,103),\epsilon\sim N(0,10^{-3}), θ2=0.5,\theta_{2}=0.5, θ1=0.7,\theta_{1}=0.7,
Scenario 5: T=120,T=120, I0=250,I_{0}=250, ϵN(0,103),\epsilon\sim N(0,10^{-3}), θ2=0.5,\theta_{2}=0.5, θ1=0.7,\theta_{1}=0.7,
Scenario 6: T=120,T=120, I0=500,I_{0}=500, ϵN(0,102),\epsilon\sim N(0,10^{-2}), θ2=0.5,\theta_{2}=0.5, θ1=0.7,\theta_{1}=0.7,
Scenario 7: T=120,T=120, I0=500,I_{0}=500, ϵt3/103/2,\epsilon\sim t_{3}/10^{3/2}, θ2=0.5,\theta_{2}=0.5, θ1=0.7,\theta_{1}=0.7,
Scenario 8: T=120,T=120, I0=500,I_{0}=500, ϵN(0,103),\epsilon\sim N(0,10^{-3}), θ2=0.5,\theta_{2}=0.5, θ1=0.7,\theta_{1}=0.7, β3=0.03,\beta_{3}=-0.03,
Scenario 9: T=120,T=120, I0=500,I_{0}=500, ϵN(0,103),\epsilon\sim N(0,10^{-3}), θ3=0.45,\theta_{3}=0.45, θ1=0.5,\theta_{1}=0.5, θ2=0.3,\theta_{2}=0.3,

Data were simulated under various scenarios, including settings where the fitted models were correctly specified or misspecified, as shown in Table 1. For the scenarios with correctly specified models (scenarios 1-7), data were generated using the parameters (θ1,θ2,β1,β2)=(.7,.5,.02,.125)(\theta_{1},\theta_{2},\beta_{1},\beta_{2})=(.7,.5,-.02,-.125). Here, the simulated values {Rt,t1}\{R_{t},t\geq 1\} were designed to mimic the trend of the RitR_{it} reported in Pan et al. (2020). We varied factors such as the days of observation (TT), initial incident cases (I0I_{0}), and utilized different distributions of error terms ϵti.i.dϵ,t=1,,T{\epsilon_{t}\sim_{i.i.d}\epsilon,t=1,\ldots,T} for data generation. In the settings where the model was mis-specified (scenarios 8-9), data were simulated with the assumption that the RtR_{t} values followed an AR(2) dependency (i.e., q=3q=3) or were associated with an intercept term and three covariates (i.e., p=3p=3). The intercept term is θq\theta_{q}, while the two time-varying covariates (Zt,1,Zt,2Z_{t,1},Z_{t,2}) were simulated independently to mimic the real data of temperature in Philadelphia and data of social distancing from daily cellular telephone movement, provided by Unacast (Unacast, 2021), which measured the percent change in visits to nonessential businesses, e.g., restaurants and hair salons during March 1st and June 30th, 2020. Specifically, daily temperatures were generated from a shifted normal distribution, i.e., Zt1=𝑑5+(tT2)/8+𝒩(0,9)Z_{t1}\overset{d}{=}5+\Big{(}t-\frac{T}{2}\Big{)}\big{/}8+\mathcal{N}(0,9), for t=1,,Tt=1,\cdots,T. The social distancing data were generated from a uniform distribution and applied a logit transformation afterwards, i.e., Zt2i.i.d2+logit(Uniform(0,1)),Z_{t2}\overset{i.i.d}{\sim}2+logit(Uniform(0,1)), for t=1,,Tt=1,\cdots,T. Additionally, in the misspecified scenarios with three covariates, the third covariate was generated from Zt3=𝑑(tT2)/9+𝒩(0,5)Z_{t3}\overset{d}{=}\Big{(}t-\frac{T}{2}\Big{)}\big{/}9+\mathcal{N}(0,5), for t=1,,Tt=1,\cdots,T. The variations of error terms were assumed to be small as the variation of log(R)\log(R) was small with RR ranging from 0.5 to 3 and has a comparable size of the variation of covariates times their effects.

Table 2: Bias and coefficient of variation for the estimators in each scenario. The relative bias, bias, and coefficient of variation of the estimators were calculated using the following formulas. Relative bias:i=1Ns(a^ia)/aNs\sum_{i=1}^{N_{s}}(\hat{a}_{i}-a)/aN_{s}. Bias: i=1Ns(a^ia)/Ns\sum_{i=1}^{N_{s}}(\hat{a}_{i}-a)/N_{s},Coefficient of variation: {i=1Ns[a^i(i=1Nsa^i/Ns)]2}1/2/(i=1Nsa^i/Ns)\{\sum_{i=1}^{N_{s}}[\hat{a}_{i}-(\sum_{i=1}^{N_{s}}\hat{a}_{i}/N_{s})]^{2}\}^{1/2}/(\sum_{i=1}^{N_{s}}\hat{a}_{i}/N_{s}). Here Ns=1000N_{s}=1000 represents the number of replicates, aa denotes the true value of a parameter, and a^\hat{a} indicates the estimated value of that parameter.
relative bias (%) empirical bias (×102\times 10^{-2}) coefficient of variation σ^/μ^\hat{\sigma}/\hat{\mu}
Scenario θ1\theta_{1} θ2\theta_{2} β1\beta_{1} β2\beta_{2} θ1\theta_{1} θ2\theta_{2} β1\beta_{1} β2\beta_{2} θ1\theta_{1} θ2\theta_{2} β1\beta_{1} β2\beta_{2}
1 -1.51 -2.94 -1.19 6.26 -1.05 -1.47 -0.02 0.78 0.10 0.37 0.55 0.38
2 -1.42 -2.41 -0.34 5.03 -0.99 -1.20 -0.01 0.63 0.11 0.27 0.36 0.36
3 -0.78 -1.15 -0.47 2.19 -0.55 -0.57 -0.01 0.27 0.10 0.24 0.27 0.29
4 -3.57 -2.08 -0.04 7.18 -2.50 -1.04 -0.00 0.90 0.14 0.37 0.69 0.64
5 -1.85 -4.87 -5.67 5.00 -1.29 -2.44 -0.11 0.63 0.13 0.39 0.98 0.60
6 -4.16 -1.10 -4.47 6.90 -2.91 -0.55 -0.09 0.86 0.13 0.39 0.43 0.48
7 -1.43 -4.20 -0.53 8.22 -1.00 -2.10 -0.01 1.03 0.11 0.28 0.37 0.36
8 -0.72 -17.83 4.06 9.64 -0.51 -8.92 0.14 1.20 0.12 0.31 0.31 0.42
9 \diagup \diagup -17.67 9.95 \diagup \diagup -3.53 1.24 \diagup \diagup 0.27 0.27

4.2 Simulation results

Table 2 presents the bias and coefficient of variation of the estimators for each scenario. In scenarios where the models were correctly specified, both the bias and the coefficient of variation decreased with an increase in either the days of observation (scenarios 1-3) or the initial case count (scenarios 4-5). When the variance of the error term increased to a magnitude smaller than the product of the time-varying local-area factors and their corresponding effect sizes (comparing scenarios 6 and 2), the proposed estimator still demonstrated robust performance, with less than a 5% increase in relative bias and a 0.2 increase in the coefficient of variation. Even when the error term exhibited a heavier tail (comparing scenarios 7 and 2), the relative bias increased by less than 4%, and the coefficient of variation increased by less than 0.01.

Two types of mis-specification were considered. In scenario 8, an important time-varying covariate was omitted from the model. Due to the correlation between the generated local-area factors over time, this scenario differs from merely increasing the error term’s variance. In both mis-specified cases, the estimator θ^\hat{\theta} is closer to the corresponding partial correlation coefficients rather than to the true values. Although the mis-specification led to an increase in estimation bias, the algorithm still provided a relatively accurate estimator for the correctly specified parts, showing a small bias and coefficient of variation (comparing scenarios 2, 8, and 9). Most importantly, as illustrated in Figure 1 from a random replication of the simulation, the estimated RtR_{t} values still captured the tendency of the true {Rt}t0\{R_{t}\}_{t\geq 0}, especially when the time-series sample size increased.

Two types of mis-specification were considered. In scenario 8, an important time-varying covariate was omitted from the model. Owing to the correlation between the generated local-area factors over time, this scenario represents more than just an increase in the error term’s variance. In both cases of mis-specification, the estimator θ^\hat{\theta} was closer to the corresponding partial correlation coefficients than to the true values. While the mis-specification led to increased estimation bias, the algorithm still provided a relatively accurate estimator for the correctly specified parts, exhibiting a small bias and coefficient of variation (as seen when comparing scenarios 2, 8, and 9). Most importantly, as illustrated in Figure 1 from a random replication of the simulation, the estimated RtR_{t} values still accurately captured the trend of the true {Rt}t0\{R_{t}\}_{t\geq 0}, particularly as the time-series sample size increased.

Refer to caption

Figure 1: Estimated RtR_{t}’s and 90% bootstrap confidence bands from random replications of simulation, with block length =45\ell=45 and bootstrap replication B=200B=200.
Table 3: Coverage Probability of 1α1-\alpha Empirical Bootstrap Confidence Interval for θ2\theta_{2}.
T=300 T=300 T=300 T=450 T=450
=30\ell=30 =45\ell=45 =60\ell=60 =30\ell=30 =45\ell=45
α=5%\alpha=5\% 96.4% 94.8% 93.0% 99.8% 99.2%
α=10%\alpha=10\% 91.2% 87.8% 86.2% 98.6% 97.2%
α=20%\alpha=20\% 83.4% 79.8% 77.2% 96.8% 93.2%

We also constructed the bootstrap confidence interval according to (2.12), and the coverage probability of the bootstrap confidence interval for θ2\theta_{2} is shown in Table 3 as an illustration. Table 3 confirms that the proposed bootstrap confidence interval provides the expected coverage probability when choosing =45=τ0+O(T1/3)\ell=45=\tau_{0}+O(T^{1/3}) for T=300T=300, following the recommendations in Bühlmann and Künsch (1999), and setting the number of replications to B=200=O(T2/3)B=200=O(T^{2/3}), as per Bühlmann (2002). When a smaller block length =30\ell=30 is chosen for T=300T=300, the accuracy of the bootstrap estimator is reduced, leading to a conservative confidence interval. In contrast, when a larger block length =60\ell=60 is chosen, the bootstrap estimator’s accuracy may improve, but the coverage probability becomes less than ideal due to the high correlation between bootstrap subsamples. Furthermore, the optimal block length is also related to TT, as =30\ell=30 and =45\ell=45 would only result in conservative confidence intervals for T=450T=450. Practically, we suggest choosing the smaller value of \ell that satisfies =O(N1/3)\ell=O(N^{1/3}) as recommended in Bühlmann and Künsch (1999).

5 Application to COVID-19 Data

We utilized the proposed method to analyze a dataset that includes data from the early stages of the COVID-19 pandemic. The dataset contains daily infection data and county-level risk factors from 808 US counties across 47 states and the District of Columbia, spanning from March 2020 to May 2021. We examined the impact of county demographics, social behavior, and vaccination coverage on disease transmission and provided projections of daily COVID-19 case counts for three distinct pandemic waves, including the first one. For each period, the model was trained using observed daily infection and covariate data for 4 months (training window) and made projections for the subsequent four weeks (projection window). The first two periods, from March to July 2020 and August to December 2020, covered the first wave and the first holiday season of the pandemic, respectively. During these periods, we used county-level population density, daily wet-bulb temperature, and social distancing measures to explain the variation of RitR_{it} and project future cases. These decisions were informed by studies identifying factors that affect disease transmission and the predominant use of social distancing measures at the time (Rubin et al., 2020; Talic et al., 2021; Sera et al., 2021; Hou et al., 2021; Weaver et al., 2022). The third period, from January to May 2021, coincided with when COVID-19 vaccines became publicly available. In this phase, we incorporated vaccination coverage data along with the three existing covariates to model variation of RitR_{it} and project future cases.

To be included in this analysis, a county had to have 5 incident cases for more than two out of six consecutive days for at least 20 times, as of June 1st, 2020 and meet at least one of the following criteria: 1)contain a city with a population exceeding 100,000; 2)contain a state capital; 3)be the most populated county in the state or 4) have an average daily case incidence exceeding 10 during June 1st to June 30th, 2020. Counties were also required to have social distancing data for each of the study period and vaccination data from the CDC for the third study period. The 808 counties represent 79.5% of the US population, as shown in Figure 2. Population density data was sourced from the US Census data and is expressed as number of people per square mile. Log transformation was performed followed by standardization due to the large values of this variable and substantial skewness in density for the largest cities. Social distancing, obtained from Unacast, was measured by the percent change in visits to nonessential businesses (e.g., restaurants, hair salons), revealed by daily cell phone movement within each county, compared with visits in a four-week baseline period between February 10th and March 8th, 2020. We employed a rolling average of the percentage of visits from 7 to 14 days prior to the time of interest to account for the potential lag between changes in social distancing and alterations in disease transmission. Vaccine coverage data, obtained from the CDC, was defined as the percentage of the population that has received at least one vaccine dose. We applied a 14-day lag to allow time for any potential effects of the vaccine to manifest. The percentage of the non-vaccinated population, calculated by subtracting the 14-day-lagged vaccination percentage from 100%, was then entered into the regression model. When projecting future case numbers, we used county population density, historical average wet-bulb temperature from the last 10 years before the analysis year, and the most recent weekly average values for social distancing and vaccination coverage to calculate the assumed covariate values for future days.

Figure 2: Locations of the counties included in our analysis. Counties were categorized as small, midsize, and large using population cutoffs of 100,000 and 250,000 people.
Refer to caption

5.1 Influential local-area covariates for disease transmission

The estimated covariate effects on RitR_{it} with bootstrap intervals for the three time periods are shown in Table 4. Among the four local-area covariates we examined, social distancing appeared to be the most influential factor related to disease transmission, particularly during the first wave of the pandemic. It exhibited a significant positive association with disease transmission. During the first wave, a 50% reduction in the frequency of visiting non-essential businesses was estimated to reduce RitR_{it} by an average of 11%. This finding aligns with the effects of social distancing reported in other studies from the first wave of the pandemic, but our study encompasses more counties across the US (Rubin et al., 2020; Courtemanche et al., 2020). Over time, the impact of social distancing diminished. This change could be attributed to multiple factors, including possible changes in the virus and variations in population behavior, which might have prevented our social distancing measure from capturing all aspects of population mobility. Furthermore, we observed a gradual decrease in social distancing over time, from a median of 31% reduction in the first wave to 11% reduction in the third wave. It’s also possible that the effect of social distancing is nonlinear, and its impact was minimal when only a small amount of social distancing was implemented.

We also observed a significant heterogeneity in disease transmission across counties with varying population densities during the first wave. The reproduction number was higher in more populated areas compared to rural areas. Simultaneously, we found that wet-bulb temperature was negatively associated with RitR_{it}, suggesting that transmission was generally higher in colder weather compared to warmer days. Although the regression coefficient of temperature was small, its impact on policy-making could be substantial due to the wide range of temperatures a county might experience within a year. During the first study period, the largest increment in wet-bulb temperatures observed within a county was 21C, which could reduce COVID-19 transmission by 15%, with other covariates held constant. However, temperature could also affect social distancing values, as outdoor activities tend to increase during spring and summer, potentially offsetting the reduction in transmission due to increased temperatures. Both the heterogeneity associated with varying population densities and the impact of temperature decreased over time, indicating that the entire US began to experience a more uniform pandemic situation with less seasonality as we moved into the middle or later stages of the pandemic. Additionally, the effect of vaccination coverage on disease transmission during the first half of 2021 was not found to be significant. This lack of significance could be due to factors such as relatively lower vaccine coverage during early-stage vaccination rollout, behavioral changes post-vaccination, and challenges in achieving herd immunity.

\newcolumntype

P[1]¿\arraybackslashp#1 study period 1 2 3 training window 03/01/2020-06/30/2020 08/01/2020-11/30/2020 01/01/2021-05/31/2021 projection window 07/01/2020-07/28/2020 12/01/2020-12/28/2020 06/01/2021-06/28/2021 covariates population density
temperature
social distancing
population density
temperature
social distancing
population density
temperature
social distancing
vaccination coverage
covariates effects population density(×102\times 10^{-2}) 7.32 (1.89, 3.19) 2.37 (1.14, 4.74) 0.29 (-0.04, 0.59) temperature (×102\times 10^{-2}) -0.76 (-1.69, -0.16) -0.36 (-0.83, -0.09) -0.16 (-0.82, 0.05) social distancing 0.24 (0.10, 0.85) 0.15 (0.07, 0.24) 0.03 (-0.31, 0.15) proportion of non-vaccinated 0.02 (-0.20, 0.03) mean PAE (%) week 1 27.9 20.8 24.1 week 2 25.5 20.2 27.4 week 3 22.5 23.9 34.2 week 4 24.6 40.5 39.1 median PAE (%) week 1 22.5 17.8 16.4 week 2 19.4 16.6 17.9 week 3 17.7 16.7 19.9 week 4 20.1 27.3 25.1

Table 4: Model estimates and projections for the COVID-19 data during three study periods. Bootstrap intervals were based on replication k=200k=200 and block length =45\ell=45.

5.2 Forecasting future transmission

We compared the projected daily case counts with the actual observed numbers for the three study periods, and the accuracy of the projection was assessed using the weekly Percentage Absolute Error (PAE). Mean and median PAE are shown in Table 4 by week and study period. The mean PAE ranges from 20.2% to 27.9% for the first two weeks, and 22.5% to 40.5% for the second two weeks, respectively. We also compared our projection results with the CDC COVID-19 Cases Ensemble Forecast, which combines forecasts submitted by a large and variable number of contributing teams using different modeling techniques and data sources (Ray et al., 2020; Cramer et al., 2022). During the three study periods, our model showed similar performance to the CDC model in the 1-week projection and was superior in the 3-4 week projections. The CDC model’s average PAE across all states and weeks was 22%, 32%, 44%, and 57% for the 1 to 4-week forecast windows, respectively (Du et al., 2022). It’s worth noting that the CDC ensemble model is designed for state-level projection while our model focuses on local-level, specifically county-level, projections. Projecting at the county level is more challenging due to the smaller case incidence numbers and higher noise-to-signal ratio in county-level data.

6 Discussion

In this paper, we introduced a regression model with a measurement error term to monitor disease transmission and evaluate the impact of local-area factors in the early phases of outbreaks when data quantity and quality are limited. The time series dependency among the incidence case counts was decomposed into an epidemiological component, captured by the TSI model, and a statistical component, characterized using a time-series regression model. We employed the quasi-score method, complemented by a measurement error term, to tackle challenges related to poor data quality and potential model mispecification. We developed the iterative LOCAL-QUEST method to calculate online estimators of disease transmission and the impact of local-area factors on it.

Our method provides another novel tool to a much larger toolbox for combating future outbreaks, particularly useful during the early stages or for small regional-level analysis when supplemental data resources are scarce. However, the challenges of modeling transmission during outbreaks are still not fully addressed by existing methods. While the measurement error term in our model partially accounts for the data errors and reduces bias, as demonstrated in our simulation analysis, our disease transmission estimates might still be biased. When additional data, such as seroprevalence and symptom onset data, become available, they should be incorporated to further correct biases caused by underreporting or reporting delays (Lison et al., 2023; Quick, Dey and Lin, 2021). Furthermore, identifying causal risk factors for disease transmission using our method remains challenging, necessitating specialized study designs for accurate inference.

While acknowledging the challenges and limitations of our approach, the proposed method offers flexibility for modifications and further extensions. For instance, one can introduce a lag time between the instantaneous reproduction number and local-area factors, e.g., by regressing RitR_{it} on Xi(tτt)X_{i(t-\tau_{t})} with τt\tau_{t} being a certain number of days in the regression equation, to account for delayed effects between changes in local-area factors and mean RitR_{it} estimates across locations. It is also possible to incorporate a time series of under-reporting rates into the TSI model to adjust for varying under-reporting rates over time. When the under-reporting rate remains constant over time, using the reported cases counts to estimate RitR_{it} is still valid (Cori et al., 2013) as the constant under-reporting rate is canceled out in the TSI renewal process. Furthermore, the model can be extended into a nonparametric model by relaxing assumptions on the link function to avoid mis-specification. Further details are provided in Supplementary Material A.6. The estimation is straightforward if the dependency of {Rit}t0\{R_{it}\}_{t\geq 0} is ignored for each ii. When time series dependency of {Rit}t0\{R_{it}\}_{t\geq 0} is considered, e.g., 𝔼[Rit|Zit,D3,it]=g(Zit)+m=1qθmfm(Ri,tm)\mathbbm{E}\big{[}R_{it}|Z_{it},D_{3,it}\big{]}=g(Z_{it})+\sum_{m=1}^{q}\theta_{m}f_{m}(R_{i,t-m}) with a known Ri0R_{i0} and known functions fmf_{m}, 1mq1\leq m\leq q and unknown g()g(\cdot), we can adopt local linear kernel estimators (Fan and Gijbels, 1996) to estimate the parameters in the LOCAL-QUEST algorithm.

References

  • Alam and Sultana (2021) {barticle}[author] \bauthor\bsnmAlam, \bfnmMd Shafiul\binitsM. S. and \bauthor\bsnmSultana, \bfnmRumana\binitsR. (\byear2021). \btitleInfluences of climatic and non-climatic factors on COVID-19 outbreak: a review of existing literature. \bjournalEnvironmental Challenges \bvolume5 \bpages100255. \endbibitem
  • Amman et al. (2022) {barticle}[author] \bauthor\bsnmAmman, \bfnmFabian\binitsF., \bauthor\bsnmMarkt, \bfnmRudolf\binitsR., \bauthor\bsnmEndler, \bfnmLukas\binitsL., \bauthor\bsnmHupfauf, \bfnmSebastian\binitsS., \bauthor\bsnmAgerer, \bfnmBenedikt\binitsB., \bauthor\bsnmSchedl, \bfnmAnna\binitsA., \bauthor\bsnmRichter, \bfnmLukas\binitsL., \bauthor\bsnmZechmeister, \bfnmMelanie\binitsM., \bauthor\bsnmBicher, \bfnmMartin\binitsM., \bauthor\bsnmHeiler, \bfnmGeorg\binitsG. \betalet al. (\byear2022). \btitleViral variant-resolved wastewater surveillance of SARS-CoV-2 at national scale. \bjournalNature biotechnology \bvolume40 \bpages1814–1822. \endbibitem
  • Aubrey (2020) {barticle}[author] \bauthor\bsnmAubrey, \bfnmAllison\binitsA. (\byear2020). \btitleCan Coronavirus Be Crushed By Warmer Weather? \bjournalNPR. \endbibitem
  • Bottou (1998) {barticle}[author] \bauthor\bsnmBottou, \bfnmLéon\binitsL. (\byear1998). \btitleOnline algorithms and stochastic approximations. \bjournalOnline learning in neural networks. \endbibitem
  • Bühlmann (2002) {barticle}[author] \bauthor\bsnmBühlmann, \bfnmPeter\binitsP. (\byear2002). \btitleBootstraps for time series. \bjournalStatistical science \bpages52–72. \endbibitem
  • Bühlmann and Künsch (1999) {barticle}[author] \bauthor\bsnmBühlmann, \bfnmPeter\binitsP. and \bauthor\bsnmKünsch, \bfnmHans R\binitsH. R. (\byear1999). \btitleBlock length selection in the bootstrap for time series. \bjournalComputational Statistics & Data Analysis \bvolume31 \bpages295–310. \endbibitem
  • Cameron and Trivedi (2013) {bbook}[author] \bauthor\bsnmCameron, \bfnmA Colin\binitsA. C. and \bauthor\bsnmTrivedi, \bfnmPravin K\binitsP. K. (\byear2013). \btitleRegression analysis of count data \bvolume53. \bpublisherCambridge university press. \endbibitem
  • Chiou and Müller (1998) {barticle}[author] \bauthor\bsnmChiou, \bfnmJeng-Min\binitsJ.-M. and \bauthor\bsnmMüller, \bfnmHans-Georg\binitsH.-G. (\byear1998). \btitleQuasi-likelihood regression with unknown link and variance functions. \bjournalJournal of the American Statistical Association \bvolume93 \bpages1376–1387. \endbibitem
  • Cori et al. (2013) {barticle}[author] \bauthor\bsnmCori, \bfnmAnne\binitsA., \bauthor\bsnmFerguson, \bfnmNeil M\binitsN. M., \bauthor\bsnmFraser, \bfnmChristophe\binitsC. and \bauthor\bsnmCauchemez, \bfnmSimon\binitsS. (\byear2013). \btitleA new framework and software to estimate time-varying reproduction numbers during epidemics. \bjournalAmerican journal of epidemiology \bvolume178 \bpages1505–1512. \endbibitem
  • Courtemanche et al. (2020) {barticle}[author] \bauthor\bsnmCourtemanche, \bfnmCharles\binitsC., \bauthor\bsnmGaruccio, \bfnmJoseph\binitsJ., \bauthor\bsnmLe, \bfnmAnh\binitsA., \bauthor\bsnmPinkston, \bfnmJoshua\binitsJ. and \bauthor\bsnmYelowitz, \bfnmAaron\binitsA. (\byear2020). \btitleStrong Social Distancing Measures In The United States Reduced The COVID-19 Growth Rate: Study evaluates the impact of social distancing measures on the growth rate of confirmed COVID-19 cases across the United States. \bjournalHealth affairs \bvolume39 \bpages1237–1246. \endbibitem
  • Cramer et al. (2022) {barticle}[author] \bauthor\bsnmCramer, \bfnmEstee Y\binitsE. Y., \bauthor\bsnmHuang, \bfnmYuxin\binitsY., \bauthor\bsnmWang, \bfnmYijin\binitsY., \bauthor\bsnmRay, \bfnmEvan L\binitsE. L., \bauthor\bsnmCornell, \bfnmMatthew\binitsM., \bauthor\bsnmBracher, \bfnmJohannes\binitsJ., \bauthor\bsnmBrennen, \bfnmAndrea\binitsA., \bauthor\bsnmRivadeneira, \bfnmAlvaro J Castro\binitsA. J. C., \bauthor\bsnmGerding, \bfnmAaron\binitsA., \bauthor\bsnmHouse, \bfnmKatie\binitsK. \betalet al. (\byear2022). \btitleThe united states covid-19 forecast hub dataset. \bjournalScientific data \bvolume9 \bpages462. \endbibitem
  • Davis, Dunsmuir and Wang (1999) {barticle}[author] \bauthor\bsnmDavis, \bfnmRichard A\binitsR. A., \bauthor\bsnmDunsmuir, \bfnmWilliam TM\binitsW. T. and \bauthor\bsnmWang, \bfnmYing\binitsY. (\byear1999). \btitleModeling time series of count data. \bjournalStatistics Textbooks and Monographs \bvolume158 \bpages63–114. \endbibitem
  • Davis, Dunsmuir and Wang (2000) {barticle}[author] \bauthor\bsnmDavis, \bfnmRichard A\binitsR. A., \bauthor\bsnmDunsmuir, \bfnmWilliam TM\binitsW. T. and \bauthor\bsnmWang, \bfnmYin\binitsY. (\byear2000). \btitleOn autocorrelation in a Poisson regression model. \bjournalBiometrika \bvolume87 \bpages491–505. \endbibitem
  • Davis, Dunsmuir and Streett (2003) {barticle}[author] \bauthor\bsnmDavis, \bfnmRichard A\binitsR. A., \bauthor\bsnmDunsmuir, \bfnmWilliam TM\binitsW. T. and \bauthor\bsnmStreett, \bfnmSarah B\binitsS. B. (\byear2003). \btitleObservation-driven models for Poisson counts. \bjournalBiometrika \bvolume90 \bpages777–790. \endbibitem
  • Dempsey (2020) {barticle}[author] \bauthor\bsnmDempsey, \bfnmWalter\binitsW. (\byear2020). \btitleAddressing selection bias and measurement error in COVID-19 case count data using auxiliary information. \bjournalarXiv preprint arXiv:2005.10425. \endbibitem
  • Doukhan, Fokianos and Li (2012) {barticle}[author] \bauthor\bsnmDoukhan, \bfnmPaul\binitsP., \bauthor\bsnmFokianos, \bfnmKonstantinos\binitsK. and \bauthor\bsnmLi, \bfnmXiaoyin\binitsX. (\byear2012). \btitleOn weak dependence conditions: the case of discrete valued processes. \bjournalStatistics & Probability Letters \bvolume82 \bpages1941–1948. \endbibitem
  • Doukhan, Fokianos and Tjøstheim (2012) {barticle}[author] \bauthor\bsnmDoukhan, \bfnmPaul\binitsP., \bauthor\bsnmFokianos, \bfnmKonstantinos\binitsK. and \bauthor\bsnmTjøstheim, \bfnmDag\binitsD. (\byear2012). \btitleOn weak dependence conditions for Poisson autoregressions. \bjournalStatistics & Probability Letters \bvolume82 \bpages942–948. \endbibitem
  • Du et al. (2022) {barticle}[author] \bauthor\bsnmDu, \bfnmHongru\binitsH., \bauthor\bsnmDong, \bfnmEnsheng\binitsE., \bauthor\bsnmBadr, \bfnmHamada S\binitsH. S., \bauthor\bsnmPetrone, \bfnmMary E\binitsM. E., \bauthor\bsnmGrubaugh, \bfnmNathan D\binitsN. D. and \bauthor\bsnmGardner, \bfnmLauren M\binitsL. M. (\byear2022). \btitleA Deep Learning Approach to Forecast Short-Term COVID-19 Cases and Deaths in the US. \bjournalmedRxiv \bpages2022–08. \endbibitem
  • Fan and Gijbels (1996) {bbook}[author] \bauthor\bsnmFan, \bfnmJianqing\binitsJ. and \bauthor\bsnmGijbels, \bfnmIrene\binitsI. (\byear1996). \btitleLocal polynomial modelling and its applications: monographs on statistics and applied probability 66 \bvolume66. \bpublisherCRC Press. \endbibitem
  • Farhang-Boroujeny (2013) {bbook}[author] \bauthor\bsnmFarhang-Boroujeny, \bfnmBehrouz\binitsB. (\byear2013). \btitleAdaptive filters: theory and applications. \bpublisherJohn Wiley & Sons. \endbibitem
  • Fokianos, Rahbek and Tjøstheim (2009) {barticle}[author] \bauthor\bsnmFokianos, \bfnmKonstantinos\binitsK., \bauthor\bsnmRahbek, \bfnmAnders\binitsA. and \bauthor\bsnmTjøstheim, \bfnmDag\binitsD. (\byear2009). \btitlePoisson autoregression. \bjournalJournal of the American Statistical Association \bvolume104 \bpages1430–1439. \endbibitem
  • Ge et al. (2023) {barticle}[author] \bauthor\bsnmGe, \bfnmYong\binitsY., \bauthor\bsnmWu, \bfnmXilin\binitsX., \bauthor\bsnmZhang, \bfnmWenbin\binitsW., \bauthor\bsnmWang, \bfnmXiaoli\binitsX., \bauthor\bsnmZhang, \bfnmDie\binitsD., \bauthor\bsnmWang, \bfnmJianghao\binitsJ., \bauthor\bsnmLiu, \bfnmHaiyan\binitsH., \bauthor\bsnmRen, \bfnmZhoupeng\binitsZ., \bauthor\bsnmRuktanonchai, \bfnmNick W\binitsN. W., \bauthor\bsnmRuktanonchai, \bfnmCorrine W\binitsC. W. \betalet al. (\byear2023). \btitleEffects of public-health measures for zeroing out different SARS-CoV-2 variants. \bjournalNature Communications \bvolume14 \bpages5270. \endbibitem
  • Gorman (2020) {barticle}[author] \bauthor\bsnmGorman, \bfnmJames\binitsJ. (\byear2020). \btitleSummer heat may not diminish coronavirus strength. \bjournalNew York Times. \endbibitem
  • Gostic et al. (2020) {barticle}[author] \bauthor\bsnmGostic, \bfnmKatelyn M\binitsK. M., \bauthor\bsnmMcGough, \bfnmLauren\binitsL., \bauthor\bsnmBaskerville, \bfnmEdward B\binitsE. B., \bauthor\bsnmAbbott, \bfnmSam\binitsS., \bauthor\bsnmJoshi, \bfnmKeya\binitsK., \bauthor\bsnmTedijanto, \bfnmChristine\binitsC., \bauthor\bsnmKahn, \bfnmRebecca\binitsR., \bauthor\bsnmNiehus, \bfnmRene\binitsR., \bauthor\bsnmHay, \bfnmJames A\binitsJ. A., \bauthor\bsnmDe Salazar, \bfnmPablo M\binitsP. M. \betalet al. (\byear2020). \btitlePractical considerations for measuring the effective reproductive number, R t. \bjournalPLoS computational biology \bvolume16 \bpagese1008409. \endbibitem
  • Hall (1985) {barticle}[author] \bauthor\bsnmHall, \bfnmPeter\binitsP. (\byear1985). \btitleResampling a coverage pattern. \bjournalStochastic processes and their applications \bvolume20 \bpages231–246. \endbibitem
  • He et al. (2020) {barticle}[author] \bauthor\bsnmHe, \bfnmXi\binitsX., \bauthor\bsnmLau, \bfnmEric HY\binitsE. H., \bauthor\bsnmWu, \bfnmPeng\binitsP., \bauthor\bsnmDeng, \bfnmXilong\binitsX., \bauthor\bsnmWang, \bfnmJian\binitsJ., \bauthor\bsnmHao, \bfnmXinxin\binitsX., \bauthor\bsnmLau, \bfnmYiu Chung\binitsY. C., \bauthor\bsnmWong, \bfnmJessica Y\binitsJ. Y., \bauthor\bsnmGuan, \bfnmYujuan\binitsY., \bauthor\bsnmTan, \bfnmXinghua\binitsX. \betalet al. (\byear2020). \btitleTemporal dynamics in viral shedding and transmissibility of COVID-19. \bjournalNature medicine \bvolume26 \bpages672–675. \endbibitem
  • Hou et al. (2021) {barticle}[author] \bauthor\bsnmHou, \bfnmXiao\binitsX., \bauthor\bsnmGao, \bfnmSong\binitsS., \bauthor\bsnmLi, \bfnmQin\binitsQ., \bauthor\bsnmKang, \bfnmYuhao\binitsY., \bauthor\bsnmChen, \bfnmNan\binitsN., \bauthor\bsnmChen, \bfnmKaiping\binitsK., \bauthor\bsnmRao, \bfnmJinmeng\binitsJ., \bauthor\bsnmEllenberg, \bfnmJordan S\binitsJ. S. and \bauthor\bsnmPatz, \bfnmJonathan A\binitsJ. A. (\byear2021). \btitleIntracounty modeling of COVID-19 infection with human mobility: Assessing spatial heterogeneity with business traffic, age, and race. \bjournalProceedings of the National Academy of Sciences \bvolume118 \bpagese2020524118. \endbibitem
  • Jones et al. (2021) {barticle}[author] \bauthor\bsnmJones, \bfnmJefferson M\binitsJ. M., \bauthor\bsnmStone, \bfnmMars\binitsM., \bauthor\bsnmSulaeman, \bfnmHasan\binitsH., \bauthor\bsnmFink, \bfnmRebecca V\binitsR. V., \bauthor\bsnmDave, \bfnmHoney\binitsH., \bauthor\bsnmLevy, \bfnmMatthew E\binitsM. E., \bauthor\bsnmDi Germanio, \bfnmClara\binitsC., \bauthor\bsnmGreen, \bfnmValerie\binitsV., \bauthor\bsnmNotari, \bfnmEdward\binitsE., \bauthor\bsnmSaa, \bfnmPaula\binitsP. \betalet al. (\byear2021). \btitleEstimated US infection-and vaccine-induced SARS-CoV-2 seroprevalence based on blood donations, July 2020-May 2021. \bjournalJama \bvolume326 \bpages1400–1409. \endbibitem
  • Jones et al. (2023) {barticle}[author] \bauthor\bsnmJones, \bfnmJefferson M\binitsJ. M., \bauthor\bsnmManrique, \bfnmIrene Molina\binitsI. M., \bauthor\bsnmStone, \bfnmMars S\binitsM. S., \bauthor\bsnmGrebe, \bfnmEduard\binitsE., \bauthor\bsnmSaa, \bfnmPaula\binitsP., \bauthor\bsnmGermanio, \bfnmClara D\binitsC. D., \bauthor\bsnmSpencer, \bfnmBryan R\binitsB. R., \bauthor\bsnmNotari, \bfnmEdward\binitsE., \bauthor\bsnmBravo, \bfnmMarjorie\binitsM., \bauthor\bsnmLanteri, \bfnmMarion C\binitsM. C. \betalet al. (\byear2023). \btitleEstimates of SARS-CoV-2 seroprevalence and incidence of primary SARS-CoV-2 infections among blood donors, by COVID-19 vaccination status—United States, April 2021–September 2022. \bjournalMorbidity and Mortality Weekly Report \bvolume72 \bpages601. \endbibitem
  • Kaufmann (1987) {barticle}[author] \bauthor\bsnmKaufmann, \bfnmHeinz\binitsH. (\byear1987). \btitleRegression models for nonstationary categorical time series: asymptotic estimation theory. \bjournalThe Annals of Statistics \bpages79–98. \endbibitem
  • Khan et al. (2022) {barticle}[author] \bauthor\bsnmKhan, \bfnmDiba\binitsD., \bauthor\bsnmPark, \bfnmMeeyoung\binitsM., \bauthor\bsnmLerma, \bfnmSamuel\binitsS., \bauthor\bsnmSoroka, \bfnmStephen\binitsS., \bauthor\bsnmGaughan, \bfnmDenise\binitsD., \bauthor\bsnmBottichio, \bfnmLyndsay\binitsL., \bauthor\bsnmBray, \bfnmMonika\binitsM., \bauthor\bsnmFukushima, \bfnmMary\binitsM., \bauthor\bsnmBregman, \bfnmBrooke\binitsB., \bauthor\bsnmWiedeman, \bfnmCaleb\binitsC. \betalet al. (\byear2022). \btitleImproving efficiency of COVID-19 aggregate case and death surveillance data transmission for jurisdictions: current and future role of application programming interfaces (APIs). \bjournalJournal of the American Medical Informatics Association \bvolume29 \bpages1807–1809. \endbibitem
  • Khan et al. (2023) {barticle}[author] \bauthor\bsnmKhan, \bfnmDiba\binitsD., \bauthor\bsnmPark, \bfnmMeeyoung\binitsM., \bauthor\bsnmBurkholder, \bfnmJacqueline\binitsJ., \bauthor\bsnmDumbuya, \bfnmSorie\binitsS., \bauthor\bsnmRitchey, \bfnmMatthew D\binitsM. D., \bauthor\bsnmYoon, \bfnmPaula\binitsP., \bauthor\bsnmGalante, \bfnmAmanda\binitsA., \bauthor\bsnmDuva, \bfnmJoseph L\binitsJ. L., \bauthor\bsnmFreeman, \bfnmJeffrey\binitsJ., \bauthor\bsnmDuck, \bfnmWilliam\binitsW. \betalet al. (\byear2023). \btitleTracking COVID-19 in the United States with surveillance of aggregate cases and deaths. \bjournalPublic Health Reports \bvolume138 \bpages428–437. \endbibitem
  • Künsch (1989) {barticle}[author] \bauthor\bsnmKünsch, \bfnmHans R\binitsH. R. (\byear1989). \btitleThe jackknife and the bootstrap for general stationary observations. \bjournalAnnals of Statistics \bvolume17 \bpages1217–1241. \endbibitem
  • Li et al. (2020) {barticle}[author] \bauthor\bsnmLi, \bfnmQun\binitsQ., \bauthor\bsnmGuan, \bfnmXuhua\binitsX., \bauthor\bsnmWu, \bfnmPeng\binitsP., \bauthor\bsnmWang, \bfnmXiaoye\binitsX., \bauthor\bsnmZhou, \bfnmLei\binitsL., \bauthor\bsnmTong, \bfnmYeqing\binitsY., \bauthor\bsnmRen, \bfnmRuiqi\binitsR., \bauthor\bsnmLeung, \bfnmKathy SM\binitsK. S., \bauthor\bsnmLau, \bfnmEric HY\binitsE. H., \bauthor\bsnmWong, \bfnmJessica Y\binitsJ. Y. \betalet al. (\byear2020). \btitleEarly transmission dynamics in Wuhan, China, of novel coronavirus–infected pneumonia. \bjournalNew England journal of medicine. \endbibitem
  • Lison et al. (2023) {barticle}[author] \bauthor\bsnmLison, \bfnmAdrian\binitsA., \bauthor\bsnmAbbott, \bfnmSam\binitsS., \bauthor\bsnmHuisman, \bfnmJana\binitsJ. and \bauthor\bsnmStadler, \bfnmTanja\binitsT. (\byear2023). \btitleGenerative Bayesian modeling to nowcast the effective reproduction number from line list data with missing symptom onset dates. \bjournalarXiv preprint arXiv:2308.13262. \endbibitem
  • Nash, Nouvellet and Cori (2022) {barticle}[author] \bauthor\bsnmNash, \bfnmRebecca K\binitsR. K., \bauthor\bsnmNouvellet, \bfnmPierre\binitsP. and \bauthor\bsnmCori, \bfnmAnne\binitsA. (\byear2022). \btitleReal-time estimation of the epidemic reproduction number: Scoping review of the applications and challenges. \bjournalPLOS Digital Health \bvolume1 \bpagese0000052. \endbibitem
  • Neumann et al. (2011) {barticle}[author] \bauthor\bsnmNeumann, \bfnmMichael H\binitsM. H. \betalet al. (\byear2011). \btitleAbsolute regularity and ergodicity of Poisson count processes. \bjournalBernoulli \bvolume17 \bpages1268–1284. \endbibitem
  • Noh and Danuser (2021) {barticle}[author] \bauthor\bsnmNoh, \bfnmJungsik\binitsJ. and \bauthor\bsnmDanuser, \bfnmGaudenz\binitsG. (\byear2021). \btitleEstimation of the fraction of COVID-19 infected people in US states and countries worldwide. \bjournalPloS one \bvolume16 \bpagese0246772. \endbibitem
  • Nouvellet et al. (2021) {barticle}[author] \bauthor\bsnmNouvellet, \bfnmPierre\binitsP., \bauthor\bsnmBhatia, \bfnmSangeeta\binitsS., \bauthor\bsnmCori, \bfnmAnne\binitsA., \bauthor\bsnmAinslie, \bfnmKylie EC\binitsK. E., \bauthor\bsnmBaguelin, \bfnmMarc\binitsM., \bauthor\bsnmBhatt, \bfnmSamir\binitsS., \bauthor\bsnmBoonyasiri, \bfnmAdhiratha\binitsA., \bauthor\bsnmBrazeau, \bfnmNicholas F\binitsN. F., \bauthor\bsnmCattarino, \bfnmLorenzo\binitsL., \bauthor\bsnmCooper, \bfnmLaura V\binitsL. V. \betalet al. (\byear2021). \btitleReduction in mobility and COVID-19 transmission. \bjournalNature communications \bvolume12 \bpages1090. \endbibitem
  • Pan et al. (2020) {barticle}[author] \bauthor\bsnmPan, \bfnmAn\binitsA., \bauthor\bsnmLiu, \bfnmLi\binitsL., \bauthor\bsnmWang, \bfnmChaolong\binitsC., \bauthor\bsnmGuo, \bfnmHuan\binitsH., \bauthor\bsnmHao, \bfnmXingjie\binitsX., \bauthor\bsnmWang, \bfnmQi\binitsQ., \bauthor\bsnmHuang, \bfnmJiao\binitsJ., \bauthor\bsnmHe, \bfnmNa\binitsN., \bauthor\bsnmYu, \bfnmHongjie\binitsH., \bauthor\bsnmLin, \bfnmXihong\binitsX. \betalet al. (\byear2020). \btitleAssociation of public health interventions with the epidemiology of the COVID-19 outbreak in Wuhan, China. \bjournalJama \bvolume323 \bpages1915–1923. \endbibitem
  • Pica and Bouvier (2012) {barticle}[author] \bauthor\bsnmPica, \bfnmNatalie\binitsN. and \bauthor\bsnmBouvier, \bfnmNicole M\binitsN. M. (\byear2012). \btitleEnvironmental factors affecting the transmission of respiratory viruses. \bjournalCurrent opinion in virology \bvolume2 \bpages90–95. \endbibitem
  • Quick, Dey and Lin (2021) {barticle}[author] \bauthor\bsnmQuick, \bfnmCorbin\binitsC., \bauthor\bsnmDey, \bfnmRounak\binitsR. and \bauthor\bsnmLin, \bfnmXihong\binitsX. (\byear2021). \btitleRegression Models for Understanding COVID-19 Epidemic Dynamics With Incomplete Data. \bjournalJournal of the American Statistical Association \bvolume116 \bpages1561–1577. \endbibitem
  • Ray et al. (2020) {barticle}[author] \bauthor\bsnmRay, \bfnmEvan L\binitsE. L., \bauthor\bsnmWattanachit, \bfnmNutcha\binitsN., \bauthor\bsnmNiemi, \bfnmJarad\binitsJ., \bauthor\bsnmKanji, \bfnmAbdul Hannan\binitsA. H., \bauthor\bsnmHouse, \bfnmKatie\binitsK., \bauthor\bsnmCramer, \bfnmEstee Y\binitsE. Y., \bauthor\bsnmBracher, \bfnmJohannes\binitsJ., \bauthor\bsnmZheng, \bfnmAndrew\binitsA., \bauthor\bsnmYamana, \bfnmTeresa K\binitsT. K., \bauthor\bsnmXiong, \bfnmXinyue\binitsX. \betalet al. (\byear2020). \btitleEnsemble forecasts of coronavirus disease 2019 (COVID-19) in the US. \bjournalMedRXiv \bpages2020–08. \endbibitem
  • Roach (2020) {barticle}[author] \bauthor\bsnmRoach, \bfnmJohn\binitsJ. (\byear2020). \btitleSpring may impact the spread of the coronavirus. \bjournalAccuweather. \endbibitem
  • Rubin et al. (2020) {barticle}[author] \bauthor\bsnmRubin, \bfnmDavid\binitsD., \bauthor\bsnmHuang, \bfnmJing\binitsJ., \bauthor\bsnmFisher, \bfnmBrian T\binitsB. T., \bauthor\bsnmGasparrini, \bfnmAntonio\binitsA., \bauthor\bsnmTam, \bfnmVicky\binitsV., \bauthor\bsnmSong, \bfnmLihai\binitsL., \bauthor\bsnmWang, \bfnmXi\binitsX., \bauthor\bsnmKaufman, \bfnmJason\binitsJ., \bauthor\bsnmFitzpatrick, \bfnmKate\binitsK., \bauthor\bsnmJain, \bfnmArushi\binitsA. \betalet al. (\byear2020). \btitleAssociation of social distancing, population density, and temperature with the instantaneous reproduction number of SARS-CoV-2 in counties across the United States. \bjournalJAMA network open \bvolume3 \bpagese2016099–e2016099. \endbibitem
  • Sera et al. (2021) {barticle}[author] \bauthor\bsnmSera, \bfnmFrancesco\binitsF., \bauthor\bsnmArmstrong, \bfnmBen\binitsB., \bauthor\bsnmAbbott, \bfnmSam\binitsS., \bauthor\bsnmMeakin, \bfnmSophie\binitsS., \bauthor\bsnmO’Reilly, \bfnmKathleen\binitsK., \bauthor\bparticlevon \bsnmBorries, \bfnmRosa\binitsR., \bauthor\bsnmSchneider, \bfnmRochelle\binitsR., \bauthor\bsnmRoyé, \bfnmDominic\binitsD., \bauthor\bsnmHashizume, \bfnmMasahiro\binitsM., \bauthor\bsnmPascal, \bfnmMathilde\binitsM. \betalet al. (\byear2021). \btitleA cross-sectional analysis of meteorological factors and SARS-CoV-2 transmission in 409 cities across 26 countries. \bjournalNature communications \bvolume12 \bpages5968. \endbibitem
  • Stewart-Ibarra et al. (2014) {barticle}[author] \bauthor\bsnmStewart-Ibarra, \bfnmAnna M\binitsA. M., \bauthor\bsnmMuñoz, \bfnmÁngel G\binitsÁ. G., \bauthor\bsnmRyan, \bfnmSadie J\binitsS. J., \bauthor\bsnmAyala, \bfnmEfraín Beltrán\binitsE. B., \bauthor\bsnmBorbor-Cordova, \bfnmMercy J\binitsM. J., \bauthor\bsnmFinkelstein, \bfnmJulia L\binitsJ. L., \bauthor\bsnmMejía, \bfnmRaúl\binitsR., \bauthor\bsnmOrdoñez, \bfnmTania\binitsT., \bauthor\bsnmRecalde-Coronel, \bfnmG Cristina\binitsG. C. and \bauthor\bsnmRivero, \bfnmKeytia\binitsK. (\byear2014). \btitleSpatiotemporal clustering, climate periodicity, and social-ecological risk factors for dengue during an outbreak in Machala, Ecuador, in 2010. \bjournalBMC infectious diseases \bvolume14 \bpages1–16. \endbibitem
  • Svensson (2007) {barticle}[author] \bauthor\bsnmSvensson, \bfnmÅke\binitsÅ. (\byear2007). \btitleA note on generation times in epidemic models. \bjournalMathematical biosciences \bvolume208 \bpages300–311. \endbibitem
  • Talic et al. (2021) {barticle}[author] \bauthor\bsnmTalic, \bfnmStella\binitsS., \bauthor\bsnmShah, \bfnmShivangi\binitsS., \bauthor\bsnmWild, \bfnmHolly\binitsH., \bauthor\bsnmGasevic, \bfnmDanijela\binitsD., \bauthor\bsnmMaharaj, \bfnmAshika\binitsA., \bauthor\bsnmAdemi, \bfnmZanfina\binitsZ., \bauthor\bsnmLi, \bfnmXue\binitsX., \bauthor\bsnmXu, \bfnmWei\binitsW., \bauthor\bsnmMesa-Eguiagaray, \bfnmInes\binitsI., \bauthor\bsnmRostron, \bfnmJasmin\binitsJ. \betalet al. (\byear2021). \btitleEffectiveness of public health measures in reducing the incidence of covid-19, SARS-CoV-2 transmission, and covid-19 mortality: systematic review and meta-analysis. \bjournalbmj \bvolume375. \endbibitem
  • WHO Ebola Response Team (2014) {barticle}[author] \bauthor\bsnmWHO Ebola Response Team (\byear2014). \btitleEbola virus disease in West Africa—the first 9 months of the epidemic and forward projections. \bjournalNew England Journal of Medicine \bvolume371 \bpages1481–1495. \endbibitem
  • Tsiatis (2007) {bbook}[author] \bauthor\bsnmTsiatis, \bfnmAnastasios\binitsA. (\byear2007). \btitleSemiparametric theory and missing data. \bpublisherSpringer Science & Business Media. \endbibitem
  • Unacast (2021) {barticle}[author] \bauthor\bsnmUnacast (\byear2021). \btitleSocial distancing scoreboard. \bjournalUnacast. \endbibitem
  • Weaver et al. (2022) {barticle}[author] \bauthor\bsnmWeaver, \bfnmAmanda K\binitsA. K., \bauthor\bsnmHead, \bfnmJennifer R\binitsJ. R., \bauthor\bsnmGould, \bfnmCarlos F\binitsC. F., \bauthor\bsnmCarlton, \bfnmElizabeth J\binitsE. J. and \bauthor\bsnmRemais, \bfnmJustin V\binitsJ. V. (\byear2022). \btitleEnvironmental factors influencing COVID-19 incidence and severity. \bjournalAnnual review of public health \bvolume43 \bpages271–291. \endbibitem
  • Wooldridge (2016) {bbook}[author] \bauthor\bsnmWooldridge, \bfnmJeffrey M\binitsJ. M. (\byear2016). \btitleIntroductory econometrics: A modern approach. \bpublisherNelson Education. \endbibitem
  • Zeger (1988) {barticle}[author] \bauthor\bsnmZeger, \bfnmScott L\binitsS. L. (\byear1988). \btitleA regression model for time series of counts. \bjournalBiometrika \bvolume75 \bpages621–629. \endbibitem