Early-Phase Local-Area Model for Pandemics Using Limited Data: A SARS-CoV-2 Application

Jiasheng Shilabel=e1][email protected] [ Jeffrey Morrislabel=e2][email protected] [ David Rubinlabel=e3][email protected] [ Jing Huanglabel=e4][email protected] [ School of Data Science, The Chinese University of Hong Kong, Shenzhen, China.presep=, ]e1 Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvaniapresep=, ]e2,e4 University of California presep=, ]e3

Abstract

The emergence of novel infectious agents presents challenges to statistical models of disease transmission. These challenges arise from limited, poor-quality data and an incomplete understanding of the agent. Moreover, outbreaks manifest differently across regions due to various factors, making it imperative for models to factor in regional specifics. In this work, we offer a model that effectively utilizes constrained data resources to estimate disease transmission rates at the local level, especially during the early outbreak phase when primarily infection counts and aggregated local characteristics are accessible. This model merges a pathogen transmission methodology based on daily infection numbers with regression techniques, drawing correlations between disease transmission and local-area factors, such as demographics, health policies, behavior, and even climate, to estimate and forecast daily infections. We incorporate the quasi-score method and an error term to navigate potential data concerns and mistaken assumptions. Additionally, we introduce an online estimator that facilitates real-time data updates, complemented by an iterative algorithm for parameter estimation. This approach facilitates real-time analysis of disease transmission when data quality is suboptimal and knowledge of the infectious pathogen is limited. It is particularly useful in the early stages of outbreaks, providing support for local decision-making.

Instantaneous reproduction number,

Online estimator,

Quasi-likelihood,

Time-Since-Infection model,

keywords:

\startlocaldefs\endlocaldefs

, and

1 Introduction

When a novel infectious agent emerges in a population lacking prior immunity, it can spread quickly, posing a threat of national or global outbreaks. An immediate response is crucial, but data infrastructures often fail to capture all the necessary data during the initial phase of an outbreak. For example, during the COVID-19 pandemic, although the CDC began collaborating with state, tribal, local, and territorial health departments in January 2020 to collect and validate COVID-19 case and death data, an automated data transfer system using an application programming interface was not fully established until July to improve data collection and quality at the jurisdiction level. It was only in December 2020 that the CDC set up business rules and curated official online data sources for most US counties (Khan et al., 2022, 2023). Furthermore, the first data collection on SARS-CoV-2 seroprevalence in the US did not begin until July 2020, and the estimates were not available until September 2021, with estimates only at the national level (Jones et al., 2021, 2023). Consequently, policymakers faced challenges in making timely, informed decisions with limited and potentially unreliable data in the pandemic’s early stages.

On the other hand, it’s important to note that while only a limited spectrum and quality of disease data are available, local factors may be harnessed to model an outbreak’s emergence and predict its trajectory (Pica and Bouvier, 2012; Stewart-Ibarra et al., 2014; Rubin et al., 2020; Alam and Sultana, 2021). Moreover, outbreaks are often driven by local transmission events, which can vary significantly from region to region. Local-area disease modeling is likely to be more useful than national modeling for policymakers to design mitigation strategies that consider regional-specific characteristics and needs.

A specific example is the policy questions posed by local governors during the initial stages of the COVID-19 pandemic in March 2020. At that time, governments deliberated over whether to enforce a lockdown and, if so, for how long. Some proposed a stringent short-term strategy, advocating for a strict lockdown until summer to entirely contain the virus, assuming that the virus would not survive in high temperatures. In contrast, others recommended a milder but more extended approach to social distancing, suggesting that temperature might not inhibit the virus’s spread, warranting a longer-term strategy (Roach, 2020; Aubrey, 2020; Gorman, 2020). To make an informed decision, it was essential to understand the impact of temperature on the virus’s transmission rate and to compare the effects of temperature and social distancing on the spread of this new pathogen. At that time, the pandemic had only begun a few weeks prior, so there wasn’t sufficient data within a single location to estimate the effect of temperature. A national model would also not be able to answer such a local question. We proposed utilizing data from multiple counties, spanning different latitudes with varying temperatures experienced during those initial weeks, to help gauge the impact of temperature, while adjusting for county-level covariates.

Regression models are often well-suited statistical tools for such an analysis. The challenge lies in creating or adapting an epidemiological model that captures the dynamics of disease transmission using limited and potentially unreliable data. This model must also be harmoniously integrated with a regression model to gauge the impact of local factors on the transmission process. Not all models are suitable for this task. For example, classical compartment models, which segregate individuals within a population into distinct compartments over varying time intervals, present challenges when data for all compartments are not available. The more compartments assumed, the more data variables are needed to fit the model. Additionally, they also face challenges when attempting to integrate with regression models. These models often treat the disease transmission parameter as a piecewise constant function over time, and they rely on solving a complex set of differential equations, either deterministic or stochastic, to determine disease transmission rates. In contrast, the time-since-infection (TSI) models, rooted in assumptions about the renewal process of case incidence numbers (which are treated as random variables and might be the sole disease data variables required for model fitting), can be tailored to regression models by applying specific distributional assumptions to the random variables representing case incidence numbers (Quick, Dey and Lin, 2021).

The TSI models are designed around the premise that the number of new infections at a particular moment, such as day $t$ , is influenced by three primary factors: the count of recent infections, the reproduction number at that moment, and an infectiousness function that measures the contagiousness of an infected person at each day since infection. For illustration, let $I_{it}$ represent the new infections on day $t$ in location $i$ . If we know the infections before day $t$ , denoted by $I_{i0},...,I_{i,t-1}$ , the anticipated new infections on day $t$ in location $i$ can be deduced as the multiplication of the instantaneous reproduction number and the infection potential on that particular day at the given location. This relationship can be expressed as:

\displaystyle E(I_{it}|I_{i0},\cdots,I_{i,t-1})=R_{it}\Lambda_{it},

(1.1)

where $R_{it}$ represents the time-varying reproduction number and $\Lambda_{it}\triangleq\sum_{s=1}^{t}I_{i,t-s}\omega_{s}$ stands for the infection potential on day $t$ in location $i$ . The infection potential is shaped by the current number of infectious individuals and the infectiousness function $\omega_{s}$ , which quantifies the infectiousness of the existing infectious individuals on the $s$ -th day after infection, with $\sum_{s=0}^{\infty}\omega_{s}=1$ . This infectiousness function can be approximated using the probability distribution of the serial interval or generation time (Svensson, 2007). To estimate $R_{it}$ , one may assume distribution assumptions for $I_{it}$ based on equation (1.1) and use methods like maximum likelihood or Bayesian approaches (Cori et al., 2013; WHO Ebola Response Team, 2014; Quick, Dey and Lin, 2021) with a predetermined $\omega_{s}$ . Due to the empirical nature of this model, it can be readily adapted to regression models, broadening its applicability (Quick, Dey and Lin, 2021).

TSI models were extensively used to analyze the COVID-19 pandemic (Pan et al., 2020; Nouvellet et al., 2021; Amman et al., 2022; Nash, Nouvellet and Cori, 2022; Ge et al., 2023), particularly in the early stages, likely because data on reported daily cases were mainly available during that time period. The models demonstrated superior accuracy in estimating disease transmission compared to many other methods (Gostic et al., 2020; Nash, Nouvellet and Cori, 2022). However, estimates of $R_{it}$ can be biased due to data errors or incompleteness in $I_{it}$ , which can stem from batch reporting, delayed reporting, under-ascertainment, manual errors, and other factors.. There are different ways to address these data challenges, one of which is to leverage additional data sources. For example, recent work have used serological studies and symptom surveys to address the under-ascertainment and delayed reporting of case incidence (Lison et al., 2023; Quick, Dey and Lin, 2021; Noh and Danuser, 2021; Dempsey, 2020). Relevant to our proposed work, the model introduced by Quick, Dey and Lin (2021) integrates testing data and population-based serological surveys to account for under-ascertainment and delayed reporting. However, these data are often unavailable at the local-area level and in the early stages of outbreaks.

In our study, we address these challenges in a distinct context, focusing on the early phase of an outbreak when additional data resources might be unavailable at the local-level. Considering the scarce of disease data and limited confidence about the pandemic models, our model introduces a general measurement error term, instead of relying on additional data and parametric assumptions, to model the mechanism of errors. This term accounts for deviations in $R_{it}$ that may arise from data inaccuracies or inaccurate assumptions about the model and cannot be explained by observed local-area factors. In essence, the proposed model is a local-area disease transmission model robust to data errors, merging a TSI model of transmission dynamics with regression models. This framework identifies key local-area factors influencing disease transmission variability across regions. We utilize the quasi-score method to ease distributional assumptions about the data and make minimal parametric assumptions on the measurement error component, bolstering model robustness. Our model offers flexibility in estimating the $R_{it}$ values even when relying on imperfect data or potentially mis-specified model assumptions. This is especially valuable during an outbreak’s early stages when additional data resources like testing data and serological surveys aren’t available and knowledge about the infectious agent is sparse. Furthermore, we introduce an online estimator coupled with an efficient iterative algorithm to estimate model parameters. This methodology supports continuous monitoring and dynamic forecasting of disease transmission under various scenarios, offering critical insights for local decision-making.

2 Method

In this section, we introduce the proposed model and estimation procedure, as well as the approach to construct confidence intervals for the estimates.

2.1 An early phase local-area model for modeling disease transmission

Building on Equation (1.1), for counties $i=1,\cdots,n$ , and time points $t=1,\cdots,N$ , we posit moment constraints on the number of new infections on day $t$ in location $i$ given previously reported existing infections prior to day $t$ , as:

\displaystyle\mu_{it}

\displaystyle\triangleq\mathbbm{E}\big{(}I_{it}|\mathcal{F}_{i,t-1}\big{)}=\mathbbm{E}\big{(}R_{it}|\mathcal{F}_{i,t-1}\big{)}\Lambda_{it},\qquad\nu_{it}\triangleq\mbox{Var}\big{(}I_{it}|\mathcal{F}_{i,t-1}\big{)}=g(\mu_{it})\cdot\phi_{i},

(2.1)

where $\mathcal{F}_{i,t}$ is the filtration of past incident cases information, $\phi_{i}$ is a dispersion parameter, $g(\cdot)$ is a known variance function, and $\mu_{it}=R_{it}\Lambda_{it}$ when $R_{it}\in\mathcal{F}_{i,t-1}$ . When $g(\cdot)$ is unknown, smoothing techniques, e.g. local polynomial fitting, can be used to estimate the variance function (Chiou and Müller, 1998). More discussions on unknown variance functions can be found in Section A of the Supplementary Materials.

As previously noted, $\Lambda_{it}\triangleq\sum_{s=1}^{t}I_{i,t-s}\omega_{s}$ represents the infection potential, determined by the current number of infectious individuals and the infectiousness function $\omega_{s}$ . We assume that the $\omega_{s}$ ’s represent the probability distribution of the serial interval or generation time, and set $\omega_{s}=0$ when $s=0$ or $s>\eta$ , with $\eta$ denoting the duration from infection to either recovery or mandatory quarantine. We use $\omega=\{\omega_{s},1\leq s\leq\eta\}$ to denote the vector of $\omega_{s}$ . In practice, $\omega$ is a vector of positive values, often obtained from epidemiological studies of infectious pathogens (He et al., 2020). In cases where multiple studies on serial intervals exist, a meta-analysis can be conducted to obtain a pooled estimate.

To estimate $R_{it}$ and investigate the impact of local area factors on disease transmission, we further assume a time series model as follows:

\displaystyle h(R_{it})

\displaystyle=W^{T}_{it}\check{\beta}_{i}+X_{it}^{T}\check{\beta}_{0}+\sum_{m=1}^{q}\theta_{m}f_{m}(D_{1,it},D_{2,it},D_{3,it})+\epsilon_{it},

(2.2)

where $D_{1,it}=\big{\{}X_{ij}\big{\}}_{0\leq j\leq t}$ , $D_{2,it}=\big{\{}I_{ij}\big{\}}_{0\leq j\leq t-1}$ , $D_{3,it}=\big{\{}R_{ij}\big{\}}_{0\leq j\leq t-1}$ , and $\mathcal{F}_{i,t-1}=\sigma(D_{1,it}\cup D_{2,it})$ . The term $\sum_{m=1}^{q}\theta_{m}f_{m}(D_{1,it},D_{2,it},D_{3,it})$ describes the time-series dependency of the disease transmission rate. Specifically, $f_{m}(\cdot)$ is a function of past data and $\theta_{m}\geq 0$ for $1\leq m<q$ . $f_{q}\equiv 1$ is set as the constant function and $\theta_{q}$ is the intercept term. The $h(\cdot)$ is a known link function. Moreover, $W_{it}\in\mathbb{R}^{p_{1}}$ and $X_{it}\in\mathbb{R}^{p-p_{1}}$ are vectors of local-area factors for location $i$ on day $t$ associated with disease transmission. The factors in $W_{it}^{T}$ have fixed location-specific effects, while the factors in $X_{it}^{T}$ have common effects across locations. To simplify notion, we consolidate the presentation of all local-area factors as $Z_{it}=(X_{it}^{T},\overbrace{0,\cdots,0}^{p(i-1)},W_{it}^{T},\overbrace{0,\cdots,0}^{p(n-i)})^{T}$ , present the vector of all regression coefficients of these factor as $\beta=(\check{\beta}_{0}^{T},\check{\beta}_{1}^{T},\cdots,\check{\beta}_{n}^{T})^{T}\in\mathbb{R}^{p}$ , and denote the vector of all other regression coefficients as $\theta=(\theta_{1},\cdots,\theta_{q})^{T}\in\mathbb{R}^{q}$ .

In practice, a choice for Equation (2.2) can be a non-stationary log-transmission model with an autoregressive process as follows:

\log(R_{it})=Z_{it}^{T}\beta+\sum_{m=1}^{q-1}\theta_{m}\log(R_{i,t-m})+\theta_{q}+\epsilon_{it},\;\|\theta\|_{1}<1\;\text{and}\;\mu_{it}=\nu_{it},\;{\rm for}\;t\geq q,\;\theta_{m}\geq 0.

(2.3)

In the above models, to counter potential biases arising from data errors and incorrectly specified model assumptions, we incorporate $\epsilon_{it}$ as a measurement error term for $R_{it}$ . This term can also be viewed as a random effect term of $R_{it}$ , capturing unexplained deviations in $R_{it}$ that could result from inaccuracies in reported infection data or model mis-specifications. At this point, our proposed model breaks down the serial dependency of incidence cases into two components: Equation (2.1) encapsulates the serial dependency inherent in the generative nature of disease transmission given $R_{it}$ , and Equation (2.2) portrays the serial dependency arising from the $R_{it}$ interdependence between adjacent time points, elucidated by covariates and time series structures.

2.2 Estimation

With the measurement error term in Equation (2.2), the estimation of $R_{it}$ and parameters in the proposed model becomes difficult. In a simple scenario when $\epsilon_{it}$ is assumed to be 0, Equation (2.2) reduces to

\displaystyle h(R_{it})

\displaystyle=Z_{it}^{T}{\beta}+\sum_{m=1}^{q}\theta_{m}f_{m}(D_{1,{it}},D_{2,it},D_{3,it}).

(2.4)

In this situation, an estimator of the parameters $\gamma=(\beta^{T},\theta^{T})^{T}$ , denoted as $\hat{\gamma}$ , can be obtained by solving $U_{N}({\gamma})=0$ , where $U_{N}(\gamma)$ is the quasi-score function

U_{N}(\gamma)=\sum_{i=1}^{n}\sum_{t=q}^{N}\big{(}\frac{\partial\mu_{it}}{\partial\gamma}\big{)}^{T}\nu_{it}^{-1}(I_{it}-\mu_{it})\triangleq\sum_{i=1}^{n}\sum_{t=q}^{N}\xi_{it}(\gamma).

(2.5)

However, when $\epsilon_{it}$ is a non-degenerate random variable, calculating the expectation in $\mu_{it}=\mathbbm{E}[R_{it}|\mathcal{F}_{i,t-1}]\Lambda_{it}$ and solving $U_{N}(\gamma)=0$ are generally difficult. Similar models were studied by Davis, Dunsmuir and Streett (2003) to model count data using the log link function, where the measurement error was assumed to be a stationary process and calculated using an autoregressive moving average recursion with a distributed lag structure. Here, given that $\{R_{it}\}_{1\leq t\leq N}^{1\leq i\leq n}$ are not directly observed, we describe an iterative algorithm for estimating parameters in (2.2) without requiring any stationary structure and distribution assumption on the measurement error term. Briefly, the proposed iterative algorithm consists of two steps. The first step uses the second layer of the model, the time series model, to construct a semi-parametric, locally efficient estimator of $\beta$ based on a location-shift regression model. In this step, $\beta$ and $R_{it}$ can be written as functions of $\theta$ using an initial estimate of $\theta$ or an estimate from the last iteration. In the second step, the parameters were updated using the first layer of the model, the quasi-score function. Specifically, for $t=q,\cdots,N$ , $i=1,\cdots,n$ , we first rearrange Equation (2.2) to

h(R_{it})-\sum_{m=1}^{q}\theta_{m}f_{m}(D_{1,it},D_{2,it},D_{3,it})=Z_{it}^{T}\beta+\epsilon_{it}.

Assuming the Gauss-Markov assumption (Wooldridge, 2016), i.e., the conditional independence of $\{\epsilon_{it}|Z_{it},t\geq 1\}$ , $\mathbbm{E}[\epsilon_{it}|Z_{it}]=0$ and homoscedasticity, the above equation forms a location-shift regression model. Thus, assuming $\theta$ is known, a semi-parametric locally efficient estimator (Tsiatis, 2007) for $\beta$ is obtained by

	$\displaystyle\hat{\beta}$	$\displaystyle=\left\{\sum_{i=1}^{n}\sum_{t=1}^{N}(Z_{it}-\bar{Z})(Z_{it}-\bar{Z})^{T}\right\}^{-1}$		(2.6)
		$\displaystyle\qquad\qquad\sum_{i=1}^{n}\sum_{t=1}^{N}(Z_{it}-\bar{Z})\left(h(R_{it})-\sum_{m=1}^{q}\theta_{m}f_{m}(D_{1,it},D_{2,it},D_{3,it})\right),$

where $\bar{Z}=\big{(}\sum_{i=1}^{n}\sum_{t=1}^{T}Z_{it}\big{)}/Nn$ , $N$ is the total number of observed time points and $n$ is the number of counties. Based on this formulation, we develop an iterative estimation method, namely the local-area disease transmission model using the quasi-score estimation (LOCAL-QUEST), to estimate $\gamma$ and $R_{it}$ by iteratively updating the locally-efficient semi-parametric estimator of $\beta$ and the quasi-score estimator of $\theta$ as shown in Algorithm 1. Intuitively, with an initial estimate of $\theta$ and $\beta$ , intermediate semi-parametric locally efficient estimator of $\beta$ can be written as a function of $\theta$ and these initial estimates using equations (2.6) and (2.4). Then $\hat{\theta}$ can be updated by solving equation (2.5) and $\hat{\beta}$ is updated by the updated $\hat{\theta}$ and equation (2.6). The algorithm is then proceeded iteratively until it converges. Due to its iterative nature, this procedure provides online estimators (Bottou, 1998; Farhang-Boroujeny, 2013) for $\{R_{it}\}_{1\leq t\leq N}^{1\leq i\leq n}$ and $\gamma$ , allowing daily updates or update when new data become available. This can be particularly useful for real-time monitoring of outbreaks, as disease incidence and local-area factors are changing over time, and the data stream comes in continuously. The parameters are updated incrementally as new data arrives, which also reduces the requirement of computational memory.

Algorithm 1 : The LOCAL-QUEST algorithm.
Estimation of the instantaneous reproduction number and local-area effects

1:procedure Initialize the parameters using data from a small initial period of the pandemic

\tau_{0}

and the model (2.4) that ignores measurement error. Denote the initial values as

\hat{\theta}^{(\tau_{0})}

\hat{\beta}^{(\tau_{0})}

and

\{\hat{R}_{it}^{(\tau_{0})}\}_{0<t\leq\tau_{0}}^{1\leq i\leq n}

2: loop

3: for

k>\tau_{0}

, and

k\leq N

, with estimates

\hat{\theta}^{(k-1)},\hat{\beta}^{(k-1)}

, and

\{\hat{R}_{it}^{(k-1)}\}_{0<t\leq k-1}^{1\leq i\leq n}

obtained in the last iteration, do

4: write the approximations of

\hat{\beta}

and

\{\hat{R}_{it}\}_{\tau_{0}+1\leq t\leq k}^{1\leq i\leq n}

as a function of

\theta

using equations

(\ref{sudoestimator})

and

(\ref{model_assumption})

, that is

$\displaystyle\tilde{\beta}^{(k)}$	$\displaystyle\triangleq\left\{\sum_{i=1}^{n}\sum_{t=1}^{k-1}(Z_{it}-\bar{Z}_{k-1})(Z_{it}-\bar{Z}_{k-1})^{T}\right\}^{-1}$	(2.7)
	$\displaystyle\qquad\times\sum_{i=1}^{n}\sum_{t=1}^{(k-1)}(Z_{it}-\bar{Z}_{k-1})\Big{[}h(\hat{R}_{it}^{(k-1)})-\sum_{m=1}^{q}\theta_{m}f_{m}\big{(}D_{1,it},D_{2,it},\hat{D}_{3,it}^{(k-1)}\big{)}\Big{]}$
and	$\displaystyle\tilde{R}_{it}^{(k)}\triangleq h^{-1}\left(\sum_{m=1}^{q}\theta_{m}f_{m}(D_{1,it},D_{2,it},\hat{D}_{3,it}^{(k-1)})+Z_{it}^{T}\tilde{\beta}^{(k)}\right),$	(2.8)

5: Obtain

\hat{\theta}^{(k)}

by solving the quasi-score equation

U_{k}(\theta)=\sum_{i=1}^{n}\sum_{t=\tau_{0}+1}^{k}\big{(}\frac{\partial\tilde{\mu}_{it}^{(k)}}{\partial\theta}\big{)}^{T}\big{(}\tilde{\nu}_{it}^{(k)}\big{)}^{-1}(I_{it}-\tilde{\mu}_{it}^{(k)})

(2.9)

using iterative methods, e.g. Newton-Raphson and iteratively reweighted least squares, where

\tilde{\mu}_{it}^{(k)}=\tilde{\nu}_{it}^{(k)}\triangleq\tilde{R}_{it}^{(k)}\Lambda_{it}

6: Obtain

\hat{\beta}^{(k)}

and

\{\hat{R}_{it}^{(k)}\}_{1\leq t\leq k}^{1\leq i\leq n}

by plugging

\hat{\theta}^{(k)}

into (2.7) and (2.8), i.e.,

$\displaystyle\hat{\beta}^{(k)}$	$\displaystyle\triangleq\left\{\sum_{i=1}^{n}\sum_{t=1}^{k-1}(Z_{it}-\bar{Z}_{k-1})(Z_{it}-\bar{Z}_{k-1})^{T}\right\}^{-1}$	(2.10)
	$\displaystyle\qquad\times\sum_{i=1}^{n}\sum_{t=1}^{(k-1)}(Z_{it}-\bar{Z}_{k-1})\Big{[}h(\hat{R}_{it}^{(k-1)})-\sum_{m=1}^{q}\hat{\theta}_{m}^{(k)}f_{m}\big{(}D_{1,it},D_{2,it},\hat{D}_{3,it}^{(k-1)}\big{)}\Big{]},$
and	$\displaystyle\hat{R}_{it}^{(k)}\triangleq h^{-1}\left(\sum_{m=1}^{q}\hat{\theta}_{m}^{(k)}f_{m}(D_{1,it},D_{2,it},\hat{D}_{3,it}^{(k)})+Z_{it}^{T}\hat{\beta}^{(k)}\right).$	(2.11)

7: end for

8: end loop

9:end procedure

10:

2.3 Uncertainty quantification

We use the block bootstrap method introduced by Hall (1985) and Künsch (1989) to quantify the uncertainty of the proposed online estimator. We will illustrate this calculation for $\hat{\beta}$ . The confidence band of $\{\hat{R}_{it}\}_{1\leq t\leq N}^{1\leq i\leq n}$ can be obtained similarly. Given the time series data of daily case incidence and local area factors, denoted as $\{(I_{it},Z_{it})\}_{1\leq t\leq N,1\leq i\leq n}$ , where $Z_{it}$ is a $p$ -dimensional vector of covariates at time $t$ in location $i$ , we assume that $\beta$ can be estimated using a block of the time series data as long as the length of the block exceeds $\ell$ . Specifically, we require that the expression $\left(\sum_{i}\sum_{t\in\mathcal{S}_{i}}(Z_{it}-\bar{Z}_{\mathcal{S}})(Z_{it}-\bar{Z}_{\mathcal{S}})^{T}\right)^{-1}$ be well defined for any arbitrary set $\mathcal{S}_{i}\subset\{1,\cdots,N\}$ with $|\mathcal{S}_{i}|\geq\ell$ , $i=1,\cdots,n$ and $\bar{Z}_{\mathcal{S}}$ is defined as an analogy to $\bar{Z}$ .

For $B$ bootstrapping samples, we assume the sampled block $\mathcal{S}_{b}$ is the same for all counties in one bootstrapping observation for simplicity, $b=1,\cdots,B$ . That is, we denote a block of data up to time $t$ with length $\ell$ as

\xi_{t}=\Big{\{}\big{(}I_{i,t-\ell+1},Z_{i,t-\ell+1}\big{)},\cdots,\big{(}I_{it},Z_{it}\big{)},i=1,\cdots,n\Big{\}}

for $t=\ell,\cdots,N$ . We adopt the suggested $\ell=O(N^{1/3})$ from Bühlmann and Künsch (1999) and independently resample $B$ blocks of data with replacement, for $B=O(\lfloor(N-p-\tau_{0})/\ell\rfloor)$ (Bühlmann, 2002) to obtain $B$ bootstrap samples $\xi_{t_{1}},\xi_{t_{2}},\ldots,\xi_{t_{B}}$ .

For the effect of $j$ -th covariate, $\beta_{j}$ , where $j=1,\cdots,p$ , let’s denote the estimator based on the $b$ -th sample as $\hat{\beta}^{*}_{t_{b},j}$ and the estimator using the whole data as $\hat{\beta_{j}}$ . One may use $\{\hat{\beta}^{*}_{t_{b},j}-\hat{\beta}_{j}:b=1,\cdots,B\}$ to approximate the empirical distribution of $(\hat{\beta}_{j}-\beta_{j})$ or use $\{\hat{\beta}^{*}_{t_{b},j}:b=1,\cdots,B\}$ as an approximation of the empirical distribution of $\hat{\beta}_{j}$ . Thus, the level $1-\alpha$ bootstrap confidence interval of $\beta_{j}$ based on the $B$ samples, denoted as $(L_{j,\alpha/2,B}^{*},U_{j,\alpha/2,B}^{*})$ , can be constructed by

	$\displaystyle L_{j,\alpha/2,B}^{}=2\hat{\beta}_{j}-\sup\left\{t:\frac{1}{B}\sum_{b=1}^{B}\mathbbm{1}(\hat{\beta}^{}_{t_{b},j}\leq t)\leq 1-\frac{\alpha}{2}\right\},$
	$\displaystyle U_{j,\alpha/2,B}^{}=2\hat{\beta}_{j}-\inf\left\{t:\frac{1}{B}\sum_{b=1}^{B}\mathbbm{1}(\hat{\beta}^{}_{t_{b},j}\leq t)\geq\frac{\alpha}{2}\right\},\;\;\text{or}$		(2.12)
	$\displaystyle L_{j,\alpha/2,B}^{}=\inf\left\{t:\frac{1}{B}\sum_{b=1}^{B}\mathbbm{1}(\hat{\beta}^{}_{t_{b},j}\leq t)\geq\frac{\alpha}{2}\right\},\;U_{j,\alpha/2,B}^{}=\sup\left\{t:\frac{1}{B}\sum_{b=1}^{B}\mathbbm{1}(\hat{\beta}^{}_{t_{b},j}\leq t)\leq 1-\frac{\alpha}{2}\right\}.$

3 Properties of the Estimators and Algorithm

In this section, we show the asymptotic results of the estimators and property of the estimating procedure.

3.1 Asymptotic properties

The proposed method is a general approach for various counting processes and is robust to mis-specification of distribution assumptions. When the measurement error can be ignored, i.e., $\epsilon_{it}=0$ , the Poisson process used in (Cori et al., 2013) becomes a special case of Equation (2.1), with $\nu_{it}=\mu_{it}$ . Since $U_{t}(\gamma)\in\mathcal{F}_{t}\triangleq\sigma(\cup_{i=1}^{n}\mathcal{F}_{i,t})$ and $\partial\mu_{it}/\partial\gamma,\mu_{it},\nu_{it}\in\mathcal{F}_{i,t-1}$ , then ${(U_{s}(\gamma),\mathcal{F}_{s}),s\geq q}$ forms a mean zero martingale sequence, indicating that $U_{N}(\gamma)$ is an unbiased estimating function. For asymptotic properties of the estimators, we consider the least favorable case where we only have one county’s data and discard subscript $i$ in all notations. Denote the true parameter value as $\gamma_{0}$ , and according to Theorem 3 of Kaufmann (1987), we have:

Theorem 3.1.

Under condition 1 described in Section B of the Supplementary Material and on the non-extinction set defined in (B.1),

\Big{[}\sum_{t=q}^{N}\mbox{Cov}\big{(}\xi_{t}(\gamma_{0})|\mathcal{F}_{t-1}\big{)}\Big{]}^{1/2}(\hat{\gamma}-\gamma_{0})\overset{d}{\longrightarrow}\mathcal{N}(0,I).

(3.1)

In the Supplementary Materials, we also show a special case of (2.4), given by

\log(R_{it})=Z_{it}^{T}\beta+\theta_{q}+\sum_{m=1}^{q-1}\theta_{m}\left[\log(R_{i,t-m})-Z_{i,t-m}^{T}\beta\right],\;{\rm for}\;t>q>0,

(3.2)

where $\|\theta\|_{1}<1$ and $\theta_{m}\geq 0$ for $m<q$ satisfies (3.1) under a more traceable condition 2.

3.2 Bias correction and property of the estimation procedure

When the measurement error cannot be ignored, i.e., $\epsilon_{it}\neq 0$ , the proposed iterative algorithm consists of two major steps: one that utilizes a semi-parametric locally efficient estimator to express $\beta$ and $R_{it}$ as functions of $\theta$ , and another that updates the estimates by solving the score equation. However, the latter step is subject to estimation bias due to the non-zero measurement error term in equation (2.2), which can cause bias in the score equation (Cameron and Trivedi, 2013). Correcting this bias in the general case represented by (2.2) is challenging, but it can be tackled in the special case described by (2.3) by requiring $\mathbbm{E}\big{(}e^{\epsilon_{t}}|\mathcal{F}_{t-1}\big{)}=c_{0}$ for some constant $c_{0}$ (Zeger, 1988). Therefore, we can correct the bias by modifying the quasi-score equation (2.9) into an unbiased estimating equation as follows:

U_{k}^{*}(\theta)=\sum_{t=\tau_{0}+1}^{k}\big{(}\frac{\partial\tilde{\mu}_{t}^{(k)}}{\partial\theta}\big{)}^{T}\big{(}\tilde{\nu}_{t}^{(k)}\big{)}^{-1}(I_{t}/c_{0}-\tilde{\mu}_{t}^{(k)}).

In practice, $c_{0}$ can be treated as a nuisance parameter and estimated by solving the quasi-score equation. Moreover, under Equation (2.3), solving the modified quasi-score Equation (2.9) is equivalent to obtaining

\hat{\theta}^{(k)}=\arg\max_{\|\theta\|_{1}<1}\sum_{t=\tau_{0}+1}^{k}\Big{(}(I_{t}/c_{0})\log\tilde{R}_{t}^{(k)}-\tilde{R}_{t}^{(k)}\Lambda_{t}\Big{)},

(3.3)

where $\hat{\theta}^{(k)}$ is the MLE based on conditional profile likelihood of Poisson counts. Therefore, the proposed online estimation procedure guarantees the following properties. The proof of the theorems is presented in Section C of the Supplementary Materials.

Theorem 3.2 (Concavity).

When $k>\tau_{0}$ , updating the estimates in each iteration, which is equivalently to solve equation (3.3), is a globally concave maximization problem.

Theorem 3.3 (Iteration Difference).

Given regularity condition 3 in Section C of the Supplementary Materials, using data of observed case incidence and local-area factors up to time $N$ , $\{(I_{t},Z_{t})\}_{1\leq t\leq N}$ , for each $1\leq m\leq q-1$ ,

\big{|}\hat{\theta}_{m}^{(k)}-\hat{\theta}_{m}^{(k-1)}\big{|}=O\Big{(}\frac{1}{k-1}+\frac{I_{k}}{\big{(}\sum_{t=\tau_{0}+1}^{k-1}I_{t}\big{)}}\Big{)},

where $k\leq N$ is the indicator of iteration.

Theorem 3.3 shows that the bound of the step-wise difference of the online estimator between iterations decreases as the number of observation time increases. The consistency of the estimator remains an open question as discussed in Davis, Dunsmuir and Streett (2003). The difficulty arises from the measurement error, $\epsilon_{it}$ , and the series dependency of disease incidence, $\Lambda_{it}$ , in the model. Both elements are essential in modeling the dynamics of disease transmission. Similar but simplified models, which do not model the measurement errors and series dependency of disease incidence were studied for different contexts, e.g. studies of economics or disease without series dependency of disease incidence (Davis, Dunsmuir and Wang, 1999, 2000; Davis, Dunsmuir and Streett, 2003; Fokianos, Rahbek and Tjøstheim, 2009; Neumann et al., 2011; Doukhan, Fokianos and Li, 2012; Doukhan, Fokianos and Tjøstheim, 2012). Inspired by the derivation and discussion in Davis, Dunsmuir and Streett (2003), we conjecture that conditions requiring stationary and uniformly ergodic of $\log(R_{t}\Lambda_{t})$ are needed to establish the consistency of estimators of model (2.3), although these conditions may be difficult to verify in practice.

4 Simulation Studies

This section describes the simulation studies we conducted to assess the performance of the proposed method. We focused on the estimation of the disease transmission rate and the impacts of time-varying factors . We calculated the relative bias, the coefficient of variation of the estimates, and the Bootstrap confidence interval coverage probability under various scenarios. Additionally, we compared the performance of the proposed model and the basic SIR model, the details of which are presented in Section A of the Supplementary Material.

4.1 Simulation settings

We generated data on the daily numbers of SARS-CoV-2 infections, along with associated local-area factors influencing the disease transmission rate. We generated data in a single location with time-varying factors and suppressed the location subscript $i$ in this section. We fit the proposed model, as described by Equation (2.3), assuming that the $R_{t}$ values exhibited an AR(1) dependency over time and were linked with an intercept term and two time-varying factors, i.e. $q=2$ and $p=2$ . Each scenario was replicated 1,000 times to calculate evaluation statistics. The infectiousness function was assumed to be known and was modeled using the probability density function of a gamma distribution, in a manner mirroring the results of a previous epidemiological study of SARS-CoV-2 cases in Wuhan, China (Li et al., 2020). The estimation algorithm was applied with a pre-estimation period of 5 days, i.e., $\tau_{0}=5$ .

Table 1: Parameter values used to generate data in simulation scenarios. In all scenarios, we set

\beta_{1}=-0.02

and

\beta_{2}=-0.125

. Except for scenario 6, the standard deviation of the error terms in each model is determined by

(\mbox{Var}\epsilon)^{1/2}\asymp\mathbbm{E}|Z_{T1}|\beta_{1}

Z_{t1}

was generated from

4.5+(t-\frac{T}{2})/6+\mathcal{N}(0,9)

in scenario 1 and generated from

9+(t-\frac{T}{2})/16+\mathcal{N}(0,9)

in scenario 3 to avoid unrealistically large values of

R_{t}

Scenario 1:	$T=90,$	$I_{0}=500,$	$\epsilon\sim N(0,10^{-3}),$	$\theta_{2}=0.5,$	$\theta_{1}=0.7,$
Scenario 2:	$T=120,$	$I_{0}=500,$	$\epsilon\sim N(0,10^{-3}),$	$\theta_{2}=0.5,$	$\theta_{1}=0.7,$
Scenario 3:	$T=200,$	$I_{0}=500,$	$\epsilon\sim N(0,10^{-3}),$	$\theta_{2}=0.5,$	$\theta_{1}=0.7,$
Scenario 4:	$T=120,$	$I_{0}=125,$	$\epsilon\sim N(0,10^{-3}),$	$\theta_{2}=0.5,$	$\theta_{1}=0.7,$
Scenario 5:	$T=120,$	$I_{0}=250,$	$\epsilon\sim N(0,10^{-3}),$	$\theta_{2}=0.5,$	$\theta_{1}=0.7,$
Scenario 6:	$T=120,$	$I_{0}=500,$	$\epsilon\sim N(0,10^{-2}),$	$\theta_{2}=0.5,$	$\theta_{1}=0.7,$
Scenario 7:	$T=120,$	$I_{0}=500,$	$\epsilon\sim t_{3}/10^{3/2},$	$\theta_{2}=0.5,$	$\theta_{1}=0.7,$
Scenario 8:	$T=120,$	$I_{0}=500,$	$\epsilon\sim N(0,10^{-3}),$	$\theta_{2}=0.5,$	$\theta_{1}=0.7,$	$\beta_{3}=-0.03,$
Scenario 9:	$T=120,$	$I_{0}=500,$	$\epsilon\sim N(0,10^{-3}),$	$\theta_{3}=0.45,$	$\theta_{1}=0.5,$	$\theta_{2}=0.3,$

Data were simulated under various scenarios, including settings where the fitted models were correctly specified or misspecified, as shown in Table 1. For the scenarios with correctly specified models (scenarios 1-7), data were generated using the parameters $(\theta_{1},\theta_{2},\beta_{1},\beta_{2})=(.7,.5,-.02,-.125)$ . Here, the simulated values $\{R_{t},t\geq 1\}$ were designed to mimic the trend of the $R_{it}$ reported in Pan et al. (2020). We varied factors such as the days of observation ( $T$ ), initial incident cases ( $I_{0}$ ), and utilized different distributions of error terms ${\epsilon_{t}\sim_{i.i.d}\epsilon,t=1,\ldots,T}$ for data generation. In the settings where the model was mis-specified (scenarios 8-9), data were simulated with the assumption that the $R_{t}$ values followed an AR(2) dependency (i.e., $q=3$ ) or were associated with an intercept term and three covariates (i.e., $p=3$ ). The intercept term is $\theta_{q}$ , while the two time-varying covariates ( $Z_{t,1},Z_{t,2}$ ) were simulated independently to mimic the real data of temperature in Philadelphia and data of social distancing from daily cellular telephone movement, provided by Unacast (Unacast, 2021), which measured the percent change in visits to nonessential businesses, e.g., restaurants and hair salons during March 1st and June 30th, 2020. Specifically, daily temperatures were generated from a shifted normal distribution, i.e., $Z_{t1}\overset{d}{=}5+\Big{(}t-\frac{T}{2}\Big{)}\big{/}8+\mathcal{N}(0,9)$ , for $t=1,\cdots,T$ . The social distancing data were generated from a uniform distribution and applied a logit transformation afterwards, i.e., $Z_{t2}\overset{i.i.d}{\sim}2+logit(Uniform(0,1)),$ for $t=1,\cdots,T$ . Additionally, in the misspecified scenarios with three covariates, the third covariate was generated from $Z_{t3}\overset{d}{=}\Big{(}t-\frac{T}{2}\Big{)}\big{/}9+\mathcal{N}(0,5)$ , for $t=1,\cdots,T$ . The variations of error terms were assumed to be small as the variation of $\log(R)$ was small with $R$ ranging from 0.5 to 3 and has a comparable size of the variation of covariates times their effects.

Table 2: Bias and coefficient of variation for the estimators in each scenario. The relative bias, bias, and coefficient of variation of the estimators were calculated using the following formulas. Relative bias:

\sum_{i=1}^{N_{s}}(\hat{a}_{i}-a)/aN_{s}

. Bias:

\sum_{i=1}^{N_{s}}(\hat{a}_{i}-a)/N_{s}

,Coefficient of variation:

\{\sum_{i=1}^{N_{s}}[\hat{a}_{i}-(\sum_{i=1}^{N_{s}}\hat{a}_{i}/N_{s})]^{2}\}^{1/2}/(\sum_{i=1}^{N_{s}}\hat{a}_{i}/N_{s})

. Here

N_{s}=1000

represents the number of replicates,

a

denotes the true value of a parameter, and

\hat{a}

indicates the estimated value of that parameter.

	relative bias (%)				empirical bias ( $\times 10^{-2}$ )				coefficient of variation $\hat{\sigma}/\hat{\mu}$
Scenario	$\theta_{1}$	$\theta_{2}$	$\beta_{1}$	$\beta_{2}$	$\theta_{1}$	$\theta_{2}$	$\beta_{1}$	$\beta_{2}$	$\theta_{1}$	$\theta_{2}$	$\beta_{1}$	$\beta_{2}$
1	-1.51	-2.94	-1.19	6.26	-1.05	-1.47	-0.02	0.78	0.10	0.37	0.55	0.38
2	-1.42	-2.41	-0.34	5.03	-0.99	-1.20	-0.01	0.63	0.11	0.27	0.36	0.36
3	-0.78	-1.15	-0.47	2.19	-0.55	-0.57	-0.01	0.27	0.10	0.24	0.27	0.29
4	-3.57	-2.08	-0.04	7.18	-2.50	-1.04	-0.00	0.90	0.14	0.37	0.69	0.64
5	-1.85	-4.87	-5.67	5.00	-1.29	-2.44	-0.11	0.63	0.13	0.39	0.98	0.60
6	-4.16	-1.10	-4.47	6.90	-2.91	-0.55	-0.09	0.86	0.13	0.39	0.43	0.48
7	-1.43	-4.20	-0.53	8.22	-1.00	-2.10	-0.01	1.03	0.11	0.28	0.37	0.36
8	-0.72	-17.83	4.06	9.64	-0.51	-8.92	0.14	1.20	0.12	0.31	0.31	0.42
9	$\diagup$	$\diagup$	-17.67	9.95	$\diagup$	$\diagup$	-3.53	1.24	$\diagup$	$\diagup$	0.27	0.27

4.2 Simulation results

Table 2 presents the bias and coefficient of variation of the estimators for each scenario. In scenarios where the models were correctly specified, both the bias and the coefficient of variation decreased with an increase in either the days of observation (scenarios 1-3) or the initial case count (scenarios 4-5). When the variance of the error term increased to a magnitude smaller than the product of the time-varying local-area factors and their corresponding effect sizes (comparing scenarios 6 and 2), the proposed estimator still demonstrated robust performance, with less than a 5% increase in relative bias and a 0.2 increase in the coefficient of variation. Even when the error term exhibited a heavier tail (comparing scenarios 7 and 2), the relative bias increased by less than 4%, and the coefficient of variation increased by less than 0.01.

Two types of mis-specification were considered. In scenario 8, an important time-varying covariate was omitted from the model. Due to the correlation between the generated local-area factors over time, this scenario differs from merely increasing the error term’s variance. In both mis-specified cases, the estimator $\hat{\theta}$ is closer to the corresponding partial correlation coefficients rather than to the true values. Although the mis-specification led to an increase in estimation bias, the algorithm still provided a relatively accurate estimator for the correctly specified parts, showing a small bias and coefficient of variation (comparing scenarios 2, 8, and 9). Most importantly, as illustrated in Figure 1 from a random replication of the simulation, the estimated $R_{t}$ values still captured the tendency of the true $\{R_{t}\}_{t\geq 0}$ , especially when the time-series sample size increased.

Two types of mis-specification were considered. In scenario 8, an important time-varying covariate was omitted from the model. Owing to the correlation between the generated local-area factors over time, this scenario represents more than just an increase in the error term’s variance. In both cases of mis-specification, the estimator $\hat{\theta}$ was closer to the corresponding partial correlation coefficients than to the true values. While the mis-specification led to increased estimation bias, the algorithm still provided a relatively accurate estimator for the correctly specified parts, exhibiting a small bias and coefficient of variation (as seen when comparing scenarios 2, 8, and 9). Most importantly, as illustrated in Figure 1 from a random replication of the simulation, the estimated $R_{t}$ values still accurately captured the trend of the true $\{R_{t}\}_{t\geq 0}$ , particularly as the time-series sample size increased.

Refer to caption — Figure 1: Estimated $R_{t}$ ’s and 90% bootstrap confidence bands from random replications of simulation, with block length $\ell=45$ and bootstrap replication $B=200$ .

Table 3: Coverage Probability of

1-\alpha

Empirical Bootstrap Confidence Interval for

\theta_{2}

	T=300	T=300	T=300	T=450	T=450
	$\ell=30$	$\ell=45$	$\ell=60$	$\ell=30$	$\ell=45$
$\alpha=5\%$	96.4%	94.8%	93.0%	99.8%	99.2%
$\alpha=10\%$	91.2%	87.8%	86.2%	98.6%	97.2%
$\alpha=20\%$	83.4%	79.8%	77.2%	96.8%	93.2%

We also constructed the bootstrap confidence interval according to (2.12), and the coverage probability of the bootstrap confidence interval for $\theta_{2}$ is shown in Table 3 as an illustration. Table 3 confirms that the proposed bootstrap confidence interval provides the expected coverage probability when choosing $\ell=45=\tau_{0}+O(T^{1/3})$ for $T=300$ , following the recommendations in Bühlmann and Künsch (1999), and setting the number of replications to $B=200=O(T^{2/3})$ , as per Bühlmann (2002). When a smaller block length $\ell=30$ is chosen for $T=300$ , the accuracy of the bootstrap estimator is reduced, leading to a conservative confidence interval. In contrast, when a larger block length $\ell=60$ is chosen, the bootstrap estimator’s accuracy may improve, but the coverage probability becomes less than ideal due to the high correlation between bootstrap subsamples. Furthermore, the optimal block length is also related to $T$ , as $\ell=30$ and $\ell=45$ would only result in conservative confidence intervals for $T=450$ . Practically, we suggest choosing the smaller value of $\ell$ that satisfies $\ell=O(N^{1/3})$ as recommended in Bühlmann and Künsch (1999).

5 Application to COVID-19 Data

We utilized the proposed method to analyze a dataset that includes data from the early stages of the COVID-19 pandemic. The dataset contains daily infection data and county-level risk factors from 808 US counties across 47 states and the District of Columbia, spanning from March 2020 to May 2021. We examined the impact of county demographics, social behavior, and vaccination coverage on disease transmission and provided projections of daily COVID-19 case counts for three distinct pandemic waves, including the first one. For each period, the model was trained using observed daily infection and covariate data for 4 months (training window) and made projections for the subsequent four weeks (projection window). The first two periods, from March to July 2020 and August to December 2020, covered the first wave and the first holiday season of the pandemic, respectively. During these periods, we used county-level population density, daily wet-bulb temperature, and social distancing measures to explain the variation of $R_{it}$ and project future cases. These decisions were informed by studies identifying factors that affect disease transmission and the predominant use of social distancing measures at the time (Rubin et al., 2020; Talic et al., 2021; Sera et al., 2021; Hou et al., 2021; Weaver et al., 2022). The third period, from January to May 2021, coincided with when COVID-19 vaccines became publicly available. In this phase, we incorporated vaccination coverage data along with the three existing covariates to model variation of $R_{it}$ and project future cases.

To be included in this analysis, a county had to have 5 incident cases for more than two out of six consecutive days for at least 20 times, as of June 1st, 2020 and meet at least one of the following criteria: 1)contain a city with a population exceeding 100,000; 2)contain a state capital; 3)be the most populated county in the state or 4) have an average daily case incidence exceeding 10 during June 1st to June 30th, 2020. Counties were also required to have social distancing data for each of the study period and vaccination data from the CDC for the third study period. The 808 counties represent 79.5% of the US population, as shown in Figure 2. Population density data was sourced from the US Census data and is expressed as number of people per square mile. Log transformation was performed followed by standardization due to the large values of this variable and substantial skewness in density for the largest cities. Social distancing, obtained from Unacast, was measured by the percent change in visits to nonessential businesses (e.g., restaurants, hair salons), revealed by daily cell phone movement within each county, compared with visits in a four-week baseline period between February 10th and March 8th, 2020. We employed a rolling average of the percentage of visits from 7 to 14 days prior to the time of interest to account for the potential lag between changes in social distancing and alterations in disease transmission. Vaccine coverage data, obtained from the CDC, was defined as the percentage of the population that has received at least one vaccine dose. We applied a 14-day lag to allow time for any potential effects of the vaccine to manifest. The percentage of the non-vaccinated population, calculated by subtracting the 14-day-lagged vaccination percentage from 100%, was then entered into the regression model. When projecting future case numbers, we used county population density, historical average wet-bulb temperature from the last 10 years before the analysis year, and the most recent weekly average values for social distancing and vaccination coverage to calculate the assumed covariate values for future days.

5.1 Influential local-area covariates for disease transmission

The estimated covariate effects on $R_{it}$ with bootstrap intervals for the three time periods are shown in Table 4. Among the four local-area covariates we examined, social distancing appeared to be the most influential factor related to disease transmission, particularly during the first wave of the pandemic. It exhibited a significant positive association with disease transmission. During the first wave, a 50% reduction in the frequency of visiting non-essential businesses was estimated to reduce $R_{it}$ by an average of 11%. This finding aligns with the effects of social distancing reported in other studies from the first wave of the pandemic, but our study encompasses more counties across the US (Rubin et al., 2020; Courtemanche et al., 2020). Over time, the impact of social distancing diminished. This change could be attributed to multiple factors, including possible changes in the virus and variations in population behavior, which might have prevented our social distancing measure from capturing all aspects of population mobility. Furthermore, we observed a gradual decrease in social distancing over time, from a median of 31% reduction in the first wave to 11% reduction in the third wave. It’s also possible that the effect of social distancing is nonlinear, and its impact was minimal when only a small amount of social distancing was implemented.

We also observed a significant heterogeneity in disease transmission across counties with varying population densities during the first wave. The reproduction number was higher in more populated areas compared to rural areas. Simultaneously, we found that wet-bulb temperature was negatively associated with $R_{it}$ , suggesting that transmission was generally higher in colder weather compared to warmer days. Although the regression coefficient of temperature was small, its impact on policy-making could be substantial due to the wide range of temperatures a county might experience within a year. During the first study period, the largest increment in wet-bulb temperatures observed within a county was 21^∘C, which could reduce COVID-19 transmission by 15%, with other covariates held constant. However, temperature could also affect social distancing values, as outdoor activities tend to increase during spring and summer, potentially offsetting the reduction in transmission due to increased temperatures. Both the heterogeneity associated with varying population densities and the impact of temperature decreased over time, indicating that the entire US began to experience a more uniform pandemic situation with less seasonality as we moved into the middle or later stages of the pandemic. Additionally, the effect of vaccination coverage on disease transmission during the first half of 2021 was not found to be significant. This lack of significance could be due to factors such as relatively lower vaccine coverage during early-stage vaccination rollout, behavioral changes post-vaccination, and challenges in achieving herd immunity.

\newcolumntype

P[1]¿\arraybackslashp#1 study period 1 2 3 training window 03/01/2020-06/30/2020 08/01/2020-11/30/2020 01/01/2021-05/31/2021 projection window 07/01/2020-07/28/2020 12/01/2020-12/28/2020 06/01/2021-06/28/2021 covariates population density
temperature
social distancing population density
temperature
social distancing population density
temperature
social distancing
vaccination coverage covariates effects population density( $\times 10^{-2}$ ) 7.32 (1.89, 3.19) 2.37 (1.14, 4.74) 0.29 (-0.04, 0.59) temperature ( $\times 10^{-2}$ ) -0.76 (-1.69, -0.16) -0.36 (-0.83, -0.09) -0.16 (-0.82, 0.05) social distancing 0.24 (0.10, 0.85) 0.15 (0.07, 0.24) 0.03 (-0.31, 0.15) proportion of non-vaccinated — — 0.02 (-0.20, 0.03) mean PAE (%) week 1 27.9 20.8 24.1 week 2 25.5 20.2 27.4 week 3 22.5 23.9 34.2 week 4 24.6 40.5 39.1 median PAE (%) week 1 22.5 17.8 16.4 week 2 19.4 16.6 17.9 week 3 17.7 16.7 19.9 week 4 20.1 27.3 25.1

Table 4: Model estimates and projections for the COVID-19 data during three study periods. Bootstrap intervals were based on replication

k=200

and block length

\ell=45

5.2 Forecasting future transmission

We compared the projected daily case counts with the actual observed numbers for the three study periods, and the accuracy of the projection was assessed using the weekly Percentage Absolute Error (PAE). Mean and median PAE are shown in Table 4 by week and study period. The mean PAE ranges from 20.2% to 27.9% for the first two weeks, and 22.5% to 40.5% for the second two weeks, respectively. We also compared our projection results with the CDC COVID-19 Cases Ensemble Forecast, which combines forecasts submitted by a large and variable number of contributing teams using different modeling techniques and data sources (Ray et al., 2020; Cramer et al., 2022). During the three study periods, our model showed similar performance to the CDC model in the 1-week projection and was superior in the 3-4 week projections. The CDC model’s average PAE across all states and weeks was 22%, 32%, 44%, and 57% for the 1 to 4-week forecast windows, respectively (Du et al., 2022). It’s worth noting that the CDC ensemble model is designed for state-level projection while our model focuses on local-level, specifically county-level, projections. Projecting at the county level is more challenging due to the smaller case incidence numbers and higher noise-to-signal ratio in county-level data.

6 Discussion

In this paper, we introduced a regression model with a measurement error term to monitor disease transmission and evaluate the impact of local-area factors in the early phases of outbreaks when data quantity and quality are limited. The time series dependency among the incidence case counts was decomposed into an epidemiological component, captured by the TSI model, and a statistical component, characterized using a time-series regression model. We employed the quasi-score method, complemented by a measurement error term, to tackle challenges related to poor data quality and potential model mispecification. We developed the iterative LOCAL-QUEST method to calculate online estimators of disease transmission and the impact of local-area factors on it.

Our method provides another novel tool to a much larger toolbox for combating future outbreaks, particularly useful during the early stages or for small regional-level analysis when supplemental data resources are scarce. However, the challenges of modeling transmission during outbreaks are still not fully addressed by existing methods. While the measurement error term in our model partially accounts for the data errors and reduces bias, as demonstrated in our simulation analysis, our disease transmission estimates might still be biased. When additional data, such as seroprevalence and symptom onset data, become available, they should be incorporated to further correct biases caused by underreporting or reporting delays (Lison et al., 2023; Quick, Dey and Lin, 2021). Furthermore, identifying causal risk factors for disease transmission using our method remains challenging, necessitating specialized study designs for accurate inference.

While acknowledging the challenges and limitations of our approach, the proposed method offers flexibility for modifications and further extensions. For instance, one can introduce a lag time between the instantaneous reproduction number and local-area factors, e.g., by regressing $R_{it}$ on $X_{i(t-\tau_{t})}$ with $\tau_{t}$ being a certain number of days in the regression equation, to account for delayed effects between changes in local-area factors and mean $R_{it}$ estimates across locations. It is also possible to incorporate a time series of under-reporting rates into the TSI model to adjust for varying under-reporting rates over time. When the under-reporting rate remains constant over time, using the reported cases counts to estimate $R_{it}$ is still valid (Cori et al., 2013) as the constant under-reporting rate is canceled out in the TSI renewal process. Furthermore, the model can be extended into a nonparametric model by relaxing assumptions on the link function to avoid mis-specification. Further details are provided in Supplementary Material A.6. The estimation is straightforward if the dependency of $\{R_{it}\}_{t\geq 0}$ is ignored for each $i$ . When time series dependency of $\{R_{it}\}_{t\geq 0}$ is considered, e.g., $\mathbbm{E}\big{[}R_{it}|Z_{it},D_{3,it}\big{]}=g(Z_{it})+\sum_{m=1}^{q}\theta_{m}f_{m}(R_{i,t-m})$ with a known $R_{i0}$ and known functions $f_{m}$ , $1\leq m\leq q$ and unknown $g(\cdot)$ , we can adopt local linear kernel estimators (Fan and Gijbels, 1996) to estimate the parameters in the LOCAL-QUEST algorithm.

References

Alam and Sultana (2021) {barticle}[author] \bauthor\bsnmAlam, \bfnmMd Shafiul\binitsM. S. and \bauthor\bsnmSultana, \bfnmRumana\binitsR. (\byear2021). \btitleInfluences of climatic and non-climatic factors on COVID-19 outbreak: a review of existing literature. \bjournalEnvironmental Challenges \bvolume5 \bpages100255. \endbibitem
Amman et al. (2022) {barticle}[author] \bauthor\bsnmAmman, \bfnmFabian\binitsF., \bauthor\bsnmMarkt, \bfnmRudolf\binitsR., \bauthor\bsnmEndler, \bfnmLukas\binitsL., \bauthor\bsnmHupfauf, \bfnmSebastian\binitsS., \bauthor\bsnmAgerer, \bfnmBenedikt\binitsB., \bauthor\bsnmSchedl, \bfnmAnna\binitsA., \bauthor\bsnmRichter, \bfnmLukas\binitsL., \bauthor\bsnmZechmeister, \bfnmMelanie\binitsM., \bauthor\bsnmBicher, \bfnmMartin\binitsM., \bauthor\bsnmHeiler, \bfnmGeorg\binitsG. \betalet al. (\byear2022). \btitleViral variant-resolved wastewater surveillance of SARS-CoV-2 at national scale. \bjournalNature biotechnology \bvolume40 \bpages1814–1822. \endbibitem
Aubrey (2020) {barticle}[author] \bauthor\bsnmAubrey, \bfnmAllison\binitsA. (\byear2020). \btitleCan Coronavirus Be Crushed By Warmer Weather? \bjournalNPR. \endbibitem
Bottou (1998) {barticle}[author] \bauthor\bsnmBottou, \bfnmLéon\binitsL. (\byear1998). \btitleOnline algorithms and stochastic approximations. \bjournalOnline learning in neural networks. \endbibitem
Bühlmann (2002) {barticle}[author] \bauthor\bsnmBühlmann, \bfnmPeter\binitsP. (\byear2002). \btitleBootstraps for time series. \bjournalStatistical science \bpages52–72. \endbibitem
Bühlmann and Künsch (1999) {barticle}[author] \bauthor\bsnmBühlmann, \bfnmPeter\binitsP. and \bauthor\bsnmKünsch, \bfnmHans R\binitsH. R. (\byear1999). \btitleBlock length selection in the bootstrap for time series. \bjournalComputational Statistics & Data Analysis \bvolume31 \bpages295–310. \endbibitem
Cameron and Trivedi (2013) {bbook}[author] \bauthor\bsnmCameron, \bfnmA Colin\binitsA. C. and \bauthor\bsnmTrivedi, \bfnmPravin K\binitsP. K. (\byear2013). \btitleRegression analysis of count data \bvolume53. \bpublisherCambridge university press. \endbibitem
Chiou and Müller (1998) {barticle}[author] \bauthor\bsnmChiou, \bfnmJeng-Min\binitsJ.-M. and \bauthor\bsnmMüller, \bfnmHans-Georg\binitsH.-G. (\byear1998). \btitleQuasi-likelihood regression with unknown link and variance functions. \bjournalJournal of the American Statistical Association \bvolume93 \bpages1376–1387. \endbibitem
Cori et al. (2013) {barticle}[author] \bauthor\bsnmCori, \bfnmAnne\binitsA., \bauthor\bsnmFerguson, \bfnmNeil M\binitsN. M., \bauthor\bsnmFraser, \bfnmChristophe\binitsC. and \bauthor\bsnmCauchemez, \bfnmSimon\binitsS. (\byear2013). \btitleA new framework and software to estimate time-varying reproduction numbers during epidemics. \bjournalAmerican journal of epidemiology \bvolume178 \bpages1505–1512. \endbibitem
Courtemanche et al. (2020) {barticle}[author] \bauthor\bsnmCourtemanche, \bfnmCharles\binitsC., \bauthor\bsnmGaruccio, \bfnmJoseph\binitsJ., \bauthor\bsnmLe, \bfnmAnh\binitsA., \bauthor\bsnmPinkston, \bfnmJoshua\binitsJ. and \bauthor\bsnmYelowitz, \bfnmAaron\binitsA. (\byear2020). \btitleStrong Social Distancing Measures In The United States Reduced The COVID-19 Growth Rate: Study evaluates the impact of social distancing measures on the growth rate of confirmed COVID-19 cases across the United States. \bjournalHealth affairs \bvolume39 \bpages1237–1246. \endbibitem
Cramer et al. (2022) {barticle}[author] \bauthor\bsnmCramer, \bfnmEstee Y\binitsE. Y., \bauthor\bsnmHuang, \bfnmYuxin\binitsY., \bauthor\bsnmWang, \bfnmYijin\binitsY., \bauthor\bsnmRay, \bfnmEvan L\binitsE. L., \bauthor\bsnmCornell, \bfnmMatthew\binitsM., \bauthor\bsnmBracher, \bfnmJohannes\binitsJ., \bauthor\bsnmBrennen, \bfnmAndrea\binitsA., \bauthor\bsnmRivadeneira, \bfnmAlvaro J Castro\binitsA. J. C., \bauthor\bsnmGerding, \bfnmAaron\binitsA., \bauthor\bsnmHouse, \bfnmKatie\binitsK. \betalet al. (\byear2022). \btitleThe united states covid-19 forecast hub dataset. \bjournalScientific data \bvolume9 \bpages462. \endbibitem
Davis, Dunsmuir and Wang (1999) {barticle}[author] \bauthor\bsnmDavis, \bfnmRichard A\binitsR. A., \bauthor\bsnmDunsmuir, \bfnmWilliam TM\binitsW. T. and \bauthor\bsnmWang, \bfnmYing\binitsY. (\byear1999). \btitleModeling time series of count data. \bjournalStatistics Textbooks and Monographs \bvolume158 \bpages63–114. \endbibitem
Davis, Dunsmuir and Wang (2000) {barticle}[author] \bauthor\bsnmDavis, \bfnmRichard A\binitsR. A., \bauthor\bsnmDunsmuir, \bfnmWilliam TM\binitsW. T. and \bauthor\bsnmWang, \bfnmYin\binitsY. (\byear2000). \btitleOn autocorrelation in a Poisson regression model. \bjournalBiometrika \bvolume87 \bpages491–505. \endbibitem
Davis, Dunsmuir and Streett (2003) {barticle}[author] \bauthor\bsnmDavis, \bfnmRichard A\binitsR. A., \bauthor\bsnmDunsmuir, \bfnmWilliam TM\binitsW. T. and \bauthor\bsnmStreett, \bfnmSarah B\binitsS. B. (\byear2003). \btitleObservation-driven models for Poisson counts. \bjournalBiometrika \bvolume90 \bpages777–790. \endbibitem
Dempsey (2020) {barticle}[author] \bauthor\bsnmDempsey, \bfnmWalter\binitsW. (\byear2020). \btitleAddressing selection bias and measurement error in COVID-19 case count data using auxiliary information. \bjournalarXiv preprint arXiv:2005.10425. \endbibitem
Doukhan, Fokianos and Li (2012) {barticle}[author] \bauthor\bsnmDoukhan, \bfnmPaul\binitsP., \bauthor\bsnmFokianos, \bfnmKonstantinos\binitsK. and \bauthor\bsnmLi, \bfnmXiaoyin\binitsX. (\byear2012). \btitleOn weak dependence conditions: the case of discrete valued processes. \bjournalStatistics & Probability Letters \bvolume82 \bpages1941–1948. \endbibitem
Doukhan, Fokianos and Tjøstheim (2012) {barticle}[author] \bauthor\bsnmDoukhan, \bfnmPaul\binitsP., \bauthor\bsnmFokianos, \bfnmKonstantinos\binitsK. and \bauthor\bsnmTjøstheim, \bfnmDag\binitsD. (\byear2012). \btitleOn weak dependence conditions for Poisson autoregressions. \bjournalStatistics & Probability Letters \bvolume82 \bpages942–948. \endbibitem
Du et al. (2022) {barticle}[author] \bauthor\bsnmDu, \bfnmHongru\binitsH., \bauthor\bsnmDong, \bfnmEnsheng\binitsE., \bauthor\bsnmBadr, \bfnmHamada S\binitsH. S., \bauthor\bsnmPetrone, \bfnmMary E\binitsM. E., \bauthor\bsnmGrubaugh, \bfnmNathan D\binitsN. D. and \bauthor\bsnmGardner, \bfnmLauren M\binitsL. M. (\byear2022). \btitleA Deep Learning Approach to Forecast Short-Term COVID-19 Cases and Deaths in the US. \bjournalmedRxiv \bpages2022–08. \endbibitem
Fan and Gijbels (1996) {bbook}[author] \bauthor\bsnmFan, \bfnmJianqing\binitsJ. and \bauthor\bsnmGijbels, \bfnmIrene\binitsI. (\byear1996). \btitleLocal polynomial modelling and its applications: monographs on statistics and applied probability 66 \bvolume66. \bpublisherCRC Press. \endbibitem
Farhang-Boroujeny (2013) {bbook}[author] \bauthor\bsnmFarhang-Boroujeny, \bfnmBehrouz\binitsB. (\byear2013). \btitleAdaptive filters: theory and applications. \bpublisherJohn Wiley & Sons. \endbibitem
Fokianos, Rahbek and Tjøstheim (2009) {barticle}[author] \bauthor\bsnmFokianos, \bfnmKonstantinos\binitsK., \bauthor\bsnmRahbek, \bfnmAnders\binitsA. and \bauthor\bsnmTjøstheim, \bfnmDag\binitsD. (\byear2009). \btitlePoisson autoregression. \bjournalJournal of the American Statistical Association \bvolume104 \bpages1430–1439. \endbibitem
Ge et al. (2023) {barticle}[author] \bauthor\bsnmGe, \bfnmYong\binitsY., \bauthor\bsnmWu, \bfnmXilin\binitsX., \bauthor\bsnmZhang, \bfnmWenbin\binitsW., \bauthor\bsnmWang, \bfnmXiaoli\binitsX., \bauthor\bsnmZhang, \bfnmDie\binitsD., \bauthor\bsnmWang, \bfnmJianghao\binitsJ., \bauthor\bsnmLiu, \bfnmHaiyan\binitsH., \bauthor\bsnmRen, \bfnmZhoupeng\binitsZ., \bauthor\bsnmRuktanonchai, \bfnmNick W\binitsN. W., \bauthor\bsnmRuktanonchai, \bfnmCorrine W\binitsC. W. \betalet al. (\byear2023). \btitleEffects of public-health measures for zeroing out different SARS-CoV-2 variants. \bjournalNature Communications \bvolume14 \bpages5270. \endbibitem
Gorman (2020) {barticle}[author] \bauthor\bsnmGorman, \bfnmJames\binitsJ. (\byear2020). \btitleSummer heat may not diminish coronavirus strength. \bjournalNew York Times. \endbibitem
Gostic et al. (2020) {barticle}[author] \bauthor\bsnmGostic, \bfnmKatelyn M\binitsK. M., \bauthor\bsnmMcGough, \bfnmLauren\binitsL., \bauthor\bsnmBaskerville, \bfnmEdward B\binitsE. B., \bauthor\bsnmAbbott, \bfnmSam\binitsS., \bauthor\bsnmJoshi, \bfnmKeya\binitsK., \bauthor\bsnmTedijanto, \bfnmChristine\binitsC., \bauthor\bsnmKahn, \bfnmRebecca\binitsR., \bauthor\bsnmNiehus, \bfnmRene\binitsR., \bauthor\bsnmHay, \bfnmJames A\binitsJ. A., \bauthor\bsnmDe Salazar, \bfnmPablo M\binitsP. M. \betalet al. (\byear2020). \btitlePractical considerations for measuring the effective reproductive number, R t. \bjournalPLoS computational biology \bvolume16 \bpagese1008409. \endbibitem
Hall (1985) {barticle}[author] \bauthor\bsnmHall, \bfnmPeter\binitsP. (\byear1985). \btitleResampling a coverage pattern. \bjournalStochastic processes and their applications \bvolume20 \bpages231–246. \endbibitem
He et al. (2020) {barticle}[author] \bauthor\bsnmHe, \bfnmXi\binitsX., \bauthor\bsnmLau, \bfnmEric HY\binitsE. H., \bauthor\bsnmWu, \bfnmPeng\binitsP., \bauthor\bsnmDeng, \bfnmXilong\binitsX., \bauthor\bsnmWang, \bfnmJian\binitsJ., \bauthor\bsnmHao, \bfnmXinxin\binitsX., \bauthor\bsnmLau, \bfnmYiu Chung\binitsY. C., \bauthor\bsnmWong, \bfnmJessica Y\binitsJ. Y., \bauthor\bsnmGuan, \bfnmYujuan\binitsY., \bauthor\bsnmTan, \bfnmXinghua\binitsX. \betalet al. (\byear2020). \btitleTemporal dynamics in viral shedding and transmissibility of COVID-19. \bjournalNature medicine \bvolume26 \bpages672–675. \endbibitem
Hou et al. (2021) {barticle}[author] \bauthor\bsnmHou, \bfnmXiao\binitsX., \bauthor\bsnmGao, \bfnmSong\binitsS., \bauthor\bsnmLi, \bfnmQin\binitsQ., \bauthor\bsnmKang, \bfnmYuhao\binitsY., \bauthor\bsnmChen, \bfnmNan\binitsN., \bauthor\bsnmChen, \bfnmKaiping\binitsK., \bauthor\bsnmRao, \bfnmJinmeng\binitsJ., \bauthor\bsnmEllenberg, \bfnmJordan S\binitsJ. S. and \bauthor\bsnmPatz, \bfnmJonathan A\binitsJ. A. (\byear2021). \btitleIntracounty modeling of COVID-19 infection with human mobility: Assessing spatial heterogeneity with business traffic, age, and race. \bjournalProceedings of the National Academy of Sciences \bvolume118 \bpagese2020524118. \endbibitem
Jones et al. (2021) {barticle}[author] \bauthor\bsnmJones, \bfnmJefferson M\binitsJ. M., \bauthor\bsnmStone, \bfnmMars\binitsM., \bauthor\bsnmSulaeman, \bfnmHasan\binitsH., \bauthor\bsnmFink, \bfnmRebecca V\binitsR. V., \bauthor\bsnmDave, \bfnmHoney\binitsH., \bauthor\bsnmLevy, \bfnmMatthew E\binitsM. E., \bauthor\bsnmDi Germanio, \bfnmClara\binitsC., \bauthor\bsnmGreen, \bfnmValerie\binitsV., \bauthor\bsnmNotari, \bfnmEdward\binitsE., \bauthor\bsnmSaa, \bfnmPaula\binitsP. \betalet al. (\byear2021). \btitleEstimated US infection-and vaccine-induced SARS-CoV-2 seroprevalence based on blood donations, July 2020-May 2021. \bjournalJama \bvolume326 \bpages1400–1409. \endbibitem
Jones et al. (2023) {barticle}[author] \bauthor\bsnmJones, \bfnmJefferson M\binitsJ. M., \bauthor\bsnmManrique, \bfnmIrene Molina\binitsI. M., \bauthor\bsnmStone, \bfnmMars S\binitsM. S., \bauthor\bsnmGrebe, \bfnmEduard\binitsE., \bauthor\bsnmSaa, \bfnmPaula\binitsP., \bauthor\bsnmGermanio, \bfnmClara D\binitsC. D., \bauthor\bsnmSpencer, \bfnmBryan R\binitsB. R., \bauthor\bsnmNotari, \bfnmEdward\binitsE., \bauthor\bsnmBravo, \bfnmMarjorie\binitsM., \bauthor\bsnmLanteri, \bfnmMarion C\binitsM. C. \betalet al. (\byear2023). \btitleEstimates of SARS-CoV-2 seroprevalence and incidence of primary SARS-CoV-2 infections among blood donors, by COVID-19 vaccination status—United States, April 2021–September 2022. \bjournalMorbidity and Mortality Weekly Report \bvolume72 \bpages601. \endbibitem
Kaufmann (1987) {barticle}[author] \bauthor\bsnmKaufmann, \bfnmHeinz\binitsH. (\byear1987). \btitleRegression models for nonstationary categorical time series: asymptotic estimation theory. \bjournalThe Annals of Statistics \bpages79–98. \endbibitem
Khan et al. (2022) {barticle}[author] \bauthor\bsnmKhan, \bfnmDiba\binitsD., \bauthor\bsnmPark, \bfnmMeeyoung\binitsM., \bauthor\bsnmLerma, \bfnmSamuel\binitsS., \bauthor\bsnmSoroka, \bfnmStephen\binitsS., \bauthor\bsnmGaughan, \bfnmDenise\binitsD., \bauthor\bsnmBottichio, \bfnmLyndsay\binitsL., \bauthor\bsnmBray, \bfnmMonika\binitsM., \bauthor\bsnmFukushima, \bfnmMary\binitsM., \bauthor\bsnmBregman, \bfnmBrooke\binitsB., \bauthor\bsnmWiedeman, \bfnmCaleb\binitsC. \betalet al. (\byear2022). \btitleImproving efficiency of COVID-19 aggregate case and death surveillance data transmission for jurisdictions: current and future role of application programming interfaces (APIs). \bjournalJournal of the American Medical Informatics Association \bvolume29 \bpages1807–1809. \endbibitem
Khan et al. (2023) {barticle}[author] \bauthor\bsnmKhan, \bfnmDiba\binitsD., \bauthor\bsnmPark, \bfnmMeeyoung\binitsM., \bauthor\bsnmBurkholder, \bfnmJacqueline\binitsJ., \bauthor\bsnmDumbuya, \bfnmSorie\binitsS., \bauthor\bsnmRitchey, \bfnmMatthew D\binitsM. D., \bauthor\bsnmYoon, \bfnmPaula\binitsP., \bauthor\bsnmGalante, \bfnmAmanda\binitsA., \bauthor\bsnmDuva, \bfnmJoseph L\binitsJ. L., \bauthor\bsnmFreeman, \bfnmJeffrey\binitsJ., \bauthor\bsnmDuck, \bfnmWilliam\binitsW. \betalet al. (\byear2023). \btitleTracking COVID-19 in the United States with surveillance of aggregate cases and deaths. \bjournalPublic Health Reports \bvolume138 \bpages428–437. \endbibitem
Künsch (1989) {barticle}[author] \bauthor\bsnmKünsch, \bfnmHans R\binitsH. R. (\byear1989). \btitleThe jackknife and the bootstrap for general stationary observations. \bjournalAnnals of Statistics \bvolume17 \bpages1217–1241. \endbibitem
Li et al. (2020) {barticle}[author] \bauthor\bsnmLi, \bfnmQun\binitsQ., \bauthor\bsnmGuan, \bfnmXuhua\binitsX., \bauthor\bsnmWu, \bfnmPeng\binitsP., \bauthor\bsnmWang, \bfnmXiaoye\binitsX., \bauthor\bsnmZhou, \bfnmLei\binitsL., \bauthor\bsnmTong, \bfnmYeqing\binitsY., \bauthor\bsnmRen, \bfnmRuiqi\binitsR., \bauthor\bsnmLeung, \bfnmKathy SM\binitsK. S., \bauthor\bsnmLau, \bfnmEric HY\binitsE. H., \bauthor\bsnmWong, \bfnmJessica Y\binitsJ. Y. \betalet al. (\byear2020). \btitleEarly transmission dynamics in Wuhan, China, of novel coronavirus–infected pneumonia. \bjournalNew England journal of medicine. \endbibitem
Lison et al. (2023) {barticle}[author] \bauthor\bsnmLison, \bfnmAdrian\binitsA., \bauthor\bsnmAbbott, \bfnmSam\binitsS., \bauthor\bsnmHuisman, \bfnmJana\binitsJ. and \bauthor\bsnmStadler, \bfnmTanja\binitsT. (\byear2023). \btitleGenerative Bayesian modeling to nowcast the effective reproduction number from line list data with missing symptom onset dates. \bjournalarXiv preprint arXiv:2308.13262. \endbibitem
Nash, Nouvellet and Cori (2022) {barticle}[author] \bauthor\bsnmNash, \bfnmRebecca K\binitsR. K., \bauthor\bsnmNouvellet, \bfnmPierre\binitsP. and \bauthor\bsnmCori, \bfnmAnne\binitsA. (\byear2022). \btitleReal-time estimation of the epidemic reproduction number: Scoping review of the applications and challenges. \bjournalPLOS Digital Health \bvolume1 \bpagese0000052. \endbibitem
Neumann et al. (2011) {barticle}[author] \bauthor\bsnmNeumann, \bfnmMichael H\binitsM. H. \betalet al. (\byear2011). \btitleAbsolute regularity and ergodicity of Poisson count processes. \bjournalBernoulli \bvolume17 \bpages1268–1284. \endbibitem
Noh and Danuser (2021) {barticle}[author] \bauthor\bsnmNoh, \bfnmJungsik\binitsJ. and \bauthor\bsnmDanuser, \bfnmGaudenz\binitsG. (\byear2021). \btitleEstimation of the fraction of COVID-19 infected people in US states and countries worldwide. \bjournalPloS one \bvolume16 \bpagese0246772. \endbibitem
Nouvellet et al. (2021) {barticle}[author] \bauthor\bsnmNouvellet, \bfnmPierre\binitsP., \bauthor\bsnmBhatia, \bfnmSangeeta\binitsS., \bauthor\bsnmCori, \bfnmAnne\binitsA., \bauthor\bsnmAinslie, \bfnmKylie EC\binitsK. E., \bauthor\bsnmBaguelin, \bfnmMarc\binitsM., \bauthor\bsnmBhatt, \bfnmSamir\binitsS., \bauthor\bsnmBoonyasiri, \bfnmAdhiratha\binitsA., \bauthor\bsnmBrazeau, \bfnmNicholas F\binitsN. F., \bauthor\bsnmCattarino, \bfnmLorenzo\binitsL., \bauthor\bsnmCooper, \bfnmLaura V\binitsL. V. \betalet al. (\byear2021). \btitleReduction in mobility and COVID-19 transmission. \bjournalNature communications \bvolume12 \bpages1090. \endbibitem
Pan et al. (2020) {barticle}[author] \bauthor\bsnmPan, \bfnmAn\binitsA., \bauthor\bsnmLiu, \bfnmLi\binitsL., \bauthor\bsnmWang, \bfnmChaolong\binitsC., \bauthor\bsnmGuo, \bfnmHuan\binitsH., \bauthor\bsnmHao, \bfnmXingjie\binitsX., \bauthor\bsnmWang, \bfnmQi\binitsQ., \bauthor\bsnmHuang, \bfnmJiao\binitsJ., \bauthor\bsnmHe, \bfnmNa\binitsN., \bauthor\bsnmYu, \bfnmHongjie\binitsH., \bauthor\bsnmLin, \bfnmXihong\binitsX. \betalet al. (\byear2020). \btitleAssociation of public health interventions with the epidemiology of the COVID-19 outbreak in Wuhan, China. \bjournalJama \bvolume323 \bpages1915–1923. \endbibitem
Pica and Bouvier (2012) {barticle}[author] \bauthor\bsnmPica, \bfnmNatalie\binitsN. and \bauthor\bsnmBouvier, \bfnmNicole M\binitsN. M. (\byear2012). \btitleEnvironmental factors affecting the transmission of respiratory viruses. \bjournalCurrent opinion in virology \bvolume2 \bpages90–95. \endbibitem
Quick, Dey and Lin (2021) {barticle}[author] \bauthor\bsnmQuick, \bfnmCorbin\binitsC., \bauthor\bsnmDey, \bfnmRounak\binitsR. and \bauthor\bsnmLin, \bfnmXihong\binitsX. (\byear2021). \btitleRegression Models for Understanding COVID-19 Epidemic Dynamics With Incomplete Data. \bjournalJournal of the American Statistical Association \bvolume116 \bpages1561–1577. \endbibitem
Ray et al. (2020) {barticle}[author] \bauthor\bsnmRay, \bfnmEvan L\binitsE. L., \bauthor\bsnmWattanachit, \bfnmNutcha\binitsN., \bauthor\bsnmNiemi, \bfnmJarad\binitsJ., \bauthor\bsnmKanji, \bfnmAbdul Hannan\binitsA. H., \bauthor\bsnmHouse, \bfnmKatie\binitsK., \bauthor\bsnmCramer, \bfnmEstee Y\binitsE. Y., \bauthor\bsnmBracher, \bfnmJohannes\binitsJ., \bauthor\bsnmZheng, \bfnmAndrew\binitsA., \bauthor\bsnmYamana, \bfnmTeresa K\binitsT. K., \bauthor\bsnmXiong, \bfnmXinyue\binitsX. \betalet al. (\byear2020). \btitleEnsemble forecasts of coronavirus disease 2019 (COVID-19) in the US. \bjournalMedRXiv \bpages2020–08. \endbibitem
Roach (2020) {barticle}[author] \bauthor\bsnmRoach, \bfnmJohn\binitsJ. (\byear2020). \btitleSpring may impact the spread of the coronavirus. \bjournalAccuweather. \endbibitem
Rubin et al. (2020) {barticle}[author] \bauthor\bsnmRubin, \bfnmDavid\binitsD., \bauthor\bsnmHuang, \bfnmJing\binitsJ., \bauthor\bsnmFisher, \bfnmBrian T\binitsB. T., \bauthor\bsnmGasparrini, \bfnmAntonio\binitsA., \bauthor\bsnmTam, \bfnmVicky\binitsV., \bauthor\bsnmSong, \bfnmLihai\binitsL., \bauthor\bsnmWang, \bfnmXi\binitsX., \bauthor\bsnmKaufman, \bfnmJason\binitsJ., \bauthor\bsnmFitzpatrick, \bfnmKate\binitsK., \bauthor\bsnmJain, \bfnmArushi\binitsA. \betalet al. (\byear2020). \btitleAssociation of social distancing, population density, and temperature with the instantaneous reproduction number of SARS-CoV-2 in counties across the United States. \bjournalJAMA network open \bvolume3 \bpagese2016099–e2016099. \endbibitem
Sera et al. (2021) {barticle}[author] \bauthor\bsnmSera, \bfnmFrancesco\binitsF., \bauthor\bsnmArmstrong, \bfnmBen\binitsB., \bauthor\bsnmAbbott, \bfnmSam\binitsS., \bauthor\bsnmMeakin, \bfnmSophie\binitsS., \bauthor\bsnmO’Reilly, \bfnmKathleen\binitsK., \bauthor\bparticlevon \bsnmBorries, \bfnmRosa\binitsR., \bauthor\bsnmSchneider, \bfnmRochelle\binitsR., \bauthor\bsnmRoyé, \bfnmDominic\binitsD., \bauthor\bsnmHashizume, \bfnmMasahiro\binitsM., \bauthor\bsnmPascal, \bfnmMathilde\binitsM. \betalet al. (\byear2021). \btitleA cross-sectional analysis of meteorological factors and SARS-CoV-2 transmission in 409 cities across 26 countries. \bjournalNature communications \bvolume12 \bpages5968. \endbibitem
Stewart-Ibarra et al. (2014) {barticle}[author] \bauthor\bsnmStewart-Ibarra, \bfnmAnna M\binitsA. M., \bauthor\bsnmMuñoz, \bfnmÁngel G\binitsÁ. G., \bauthor\bsnmRyan, \bfnmSadie J\binitsS. J., \bauthor\bsnmAyala, \bfnmEfraín Beltrán\binitsE. B., \bauthor\bsnmBorbor-Cordova, \bfnmMercy J\binitsM. J., \bauthor\bsnmFinkelstein, \bfnmJulia L\binitsJ. L., \bauthor\bsnmMejía, \bfnmRaúl\binitsR., \bauthor\bsnmOrdoñez, \bfnmTania\binitsT., \bauthor\bsnmRecalde-Coronel, \bfnmG Cristina\binitsG. C. and \bauthor\bsnmRivero, \bfnmKeytia\binitsK. (\byear2014). \btitleSpatiotemporal clustering, climate periodicity, and social-ecological risk factors for dengue during an outbreak in Machala, Ecuador, in 2010. \bjournalBMC infectious diseases \bvolume14 \bpages1–16. \endbibitem
Svensson (2007) {barticle}[author] \bauthor\bsnmSvensson, \bfnmÅke\binitsÅ. (\byear2007). \btitleA note on generation times in epidemic models. \bjournalMathematical biosciences \bvolume208 \bpages300–311. \endbibitem
Talic et al. (2021) {barticle}[author] \bauthor\bsnmTalic, \bfnmStella\binitsS., \bauthor\bsnmShah, \bfnmShivangi\binitsS., \bauthor\bsnmWild, \bfnmHolly\binitsH., \bauthor\bsnmGasevic, \bfnmDanijela\binitsD., \bauthor\bsnmMaharaj, \bfnmAshika\binitsA., \bauthor\bsnmAdemi, \bfnmZanfina\binitsZ., \bauthor\bsnmLi, \bfnmXue\binitsX., \bauthor\bsnmXu, \bfnmWei\binitsW., \bauthor\bsnmMesa-Eguiagaray, \bfnmInes\binitsI., \bauthor\bsnmRostron, \bfnmJasmin\binitsJ. \betalet al. (\byear2021). \btitleEffectiveness of public health measures in reducing the incidence of covid-19, SARS-CoV-2 transmission, and covid-19 mortality: systematic review and meta-analysis. \bjournalbmj \bvolume375. \endbibitem
WHO Ebola Response Team (2014) {barticle}[author] \bauthor\bsnmWHO Ebola Response Team (\byear2014). \btitleEbola virus disease in West Africa—the first 9 months of the epidemic and forward projections. \bjournalNew England Journal of Medicine \bvolume371 \bpages1481–1495. \endbibitem
Tsiatis (2007) {bbook}[author] \bauthor\bsnmTsiatis, \bfnmAnastasios\binitsA. (\byear2007). \btitleSemiparametric theory and missing data. \bpublisherSpringer Science & Business Media. \endbibitem
Unacast (2021) {barticle}[author] \bauthor\bsnmUnacast (\byear2021). \btitleSocial distancing scoreboard. \bjournalUnacast. \endbibitem
Weaver et al. (2022) {barticle}[author] \bauthor\bsnmWeaver, \bfnmAmanda K\binitsA. K., \bauthor\bsnmHead, \bfnmJennifer R\binitsJ. R., \bauthor\bsnmGould, \bfnmCarlos F\binitsC. F., \bauthor\bsnmCarlton, \bfnmElizabeth J\binitsE. J. and \bauthor\bsnmRemais, \bfnmJustin V\binitsJ. V. (\byear2022). \btitleEnvironmental factors influencing COVID-19 incidence and severity. \bjournalAnnual review of public health \bvolume43 \bpages271–291. \endbibitem
Wooldridge (2016) {bbook}[author] \bauthor\bsnmWooldridge, \bfnmJeffrey M\binitsJ. M. (\byear2016). \btitleIntroductory econometrics: A modern approach. \bpublisherNelson Education. \endbibitem
Zeger (1988) {barticle}[author] \bauthor\bsnmZeger, \bfnmScott L\binitsS. L. (\byear1988). \btitleA regression model for time series of counts. \bjournalBiometrika \bvolume75 \bpages621–629. \endbibitem

	$\displaystyle L_{j,\alpha/2,B}^{}=2\hat{\beta}_{j}-\sup\left\{t:\frac{1}{B}\sum_{b=1}^{B}\mathbbm{1}(\hat{\beta}^{}_{t_{b},j}\leq t)\leq 1-\frac{\alpha}{2}\right\},$
	$\displaystyle U_{j,\alpha/2,B}^{}=2\hat{\beta}_{j}-\inf\left\{t:\frac{1}{B}\sum_{b=1}^{B}\mathbbm{1}(\hat{\beta}^{}_{t_{b},j}\leq t)\geq\frac{\alpha}{2}\right\},\;\;\text{or}$		(2.12)
	$\displaystyle L_{j,\alpha/2,B}^{}=\inf\left\{t:\frac{1}{B}\sum_{b=1}^{B}\mathbbm{1}(\hat{\beta}^{}_{t_{b},j}\leq t)\geq\frac{\alpha}{2}\right\},\;U_{j,\alpha/2,B}^{}=\sup\left\{t:\frac{1}{B}\sum_{b=1}^{B}\mathbbm{1}(\hat{\beta}^{}_{t_{b},j}\leq t)\leq 1-\frac{\alpha}{2}\right\}.$