This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

\jyear

2021

[1,2,3]\fnmOliver \surEales

1]\orgdivInfectious Disease Dynamics Unit, Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, \orgnameThe University of Melbourne, \orgaddress \cityMelbourne, \countryAustralia

2]\orgdivSchool of Public Health, \orgnameImperial College London, \orgaddress \cityLondon, \countryUnited Kingdom

3]\orgdivMRC Centre for Global infectious Disease Analysis and Abdul Latif Jameel Institute for Disease and Emergency Analytics, \orgnameImperial College London, \orgaddress \cityLondon, \countryUnited Kingdom

Differences between the true reproduction number and the apparent reproduction number of an epidemic time series

[email protected]    \fnmSteven \surRiley [email protected] [ [ [
Abstract

The time-varying reproduction number R(t)R(t) measures the number of new infections per infectious individual and is closely correlated with the time series of infection incidence by definition. The timings of actual infections are rarely known, and analysis of epidemics usually relies on time series data for other outcomes such as symptom onset. A common implicit assumption, when estimating R(t)R(t) from an epidemic time series, is that R(t)R(t) has the same relationship with these downstream outcomes as it does with the time series of incidence. However, this assumption is unlikely to be valid given that most epidemic time series are not perfect proxies of incidence. Rather they represent convolutions of incidence with uncertain delay distributions. Here we define the apparent time-varying reproduction number, RA(t)R_{A}(t), the reproduction number calculated from a downstream epidemic time series and demonstrate how differences between RA(t)R_{A}(t) and R(t)R(t) depend on the convolution function. The mean of the convolution function sets a time offset between the two signals, whilst the variance of the convolution function introduces a relative distortion between them. We present the convolution functions of epidemic time series that were available during the SARS-CoV-2 pandemic. Infection prevalence, measured by random sampling studies, presents fewer biases than other epidemic time series. Here we show that additionally the mean and variance of its convolution function were similar to that obtained from traditional surveillance based on mass-testing and could be reduced using more frequent testing, or by using stricter thresholds for positivity. Infection prevalence studies continue to be a versatile tool for tracking the temporal trends of R(t)R(t), and with additional refinements to their study protocol, will be of even greater utility during any future epidemics or pandemics.

keywords:
Reproduction number, Epidemics, SARS-CoV-2, Pandemics, COVID-19, Infection prevalence, Disease surveillance, Testing

1 Introduction

Infectious disease epidemics are a major threat to public health. Over the past two decades there has been major outbreaks of SARS Anderson2004-dx , MERS Sharif-Yakan2014-dg , influenza Gog2014-fy , Ebola Gire2014-de , dengue MacCormack-Gelles2018-wk and most recently the emergence of SARS-CoV-2 Li2021-wx , which resulted in the COVID-19 pandemic. Accurately quantifying the transmission dynamics of infection is crucial for informing decisions about changes to policy on public health interventions.
The reproduction number, the expected number of secondary infections resulting from a typical primary infection, is a key quantity describing a pathogen’s transmission dynamics. If public health interventions are to be implemented, R(t)R(t) can inform on the magnitude of the interventions required to bring the epidemic under control Anderson1992-xa . Analysis of how R(t)R(t) changes over time is also important for assessing the impact of interventions including vaccination and social distancing. Li2021-uz ; Eales2022-ub .
Mathematically, R(t)R(t) is linked to the time series of infection incidence (the rate of new infections) by a renewal process. If the infection incidence and the generation time (the time between a primary and secondary infection) distribution are known, then R(t)R(t) can simply be calculated Wallinga2007-qv . However, the timings of infections are rarely known, and so accurate time series of infection incidence rarely exist. Estimates of R(t)R(t) must instead rely on other epidemic time series that are often treated as proxies for the infection incidence Nash2022-ob .
During an epidemic there are often many sources of time series data collected. These can include measurements of the frequency of downstream points in the natural history of infection such as onset of symptoms, hospitalisation, and death noauthor_undated-pb . During the SARS-CoV-2 pandemic, time series data for infection prevalence (the proportion of people testing positive for the virus) was also available in England Elliott2022-xj ; Chadeau-Hyam2022-kc . Infection prevalence time series are often less biased than other epidemic time series, which can be biased by changes in behaviour (health-care seeking behaviour, test-seeking behaviour, etc) or in severity of the virus (e.g, due to vaccination). However, there were concerns that such data would be ill-suited for estimating R(t)R(t) due to the presence of long-term shedders – individuals who continue to test positive for a long duration of time, following infection.
In general, an epidemic time series is linked to the time series of infection incidence by a convolution; if the function for the convolution is known then it is possible to estimate the time series of incidence, and from that R(t)R(t) Huisman2022-nv . However, in many instances the form of this function is not known accurately (or at all) and so R(t)R(t) cannot be straightforwardly calculated. In these instances, it is common practice for the epidemic time series to be treated as a reasonable proxy for the infection incidence and estimates of R(t)R(t) are obtained by assuming the same renewal process between downstream points in the natural history of infection, for example from onset to onset or from death to death Nash2022-ob . Often this assumption is still made even when an estimate of the convolution function could be made Dighe2020-kq ; Ali2013-yj ; Shim2021-oa ; despite the existence of computer packages that allow the inclusion of a convolution function when estimating R(t)R(t) Abbott2020_abc ; Lytras_undated-jd .
There are other potential biases in epidemic time series which may influence estimates of R(t)R(t). However, biases introduced by assuming the epidemic time series to be a perfect proxy for infection incidence have often been overlooked. Here we investigate the form that these biases take and how they depend on the relationship between an epidemic time series and the infection incidence. We further investigate how estimates of R(t)R(t) made during the SARS-CoV-2 pandemic may have been biased by the epidemic time series that were available (onset of symptoms, infection prevalence, hospitalisations and deaths considered), and how data could be better collected and used to minimise these biases.

2 Background

2.1 The relationship between incidence and R(t)R(t)

The reproduction number can be written as

R=η(τ)dτ,R=\int\eta(\tau)\text{d}\tau, (1)

where η(τ)\eta(\tau) describes the rate of secondary infections as a function of the time since a primary infection, τ\tau. Normalising η(τ)\eta(\tau) gives the generation time,

g(τ)=η(τ)η(τ)dτ,g(\tau)=\frac{\eta(\tau)}{\int\eta(\tau)\text{d}\tau}, (2)

the probability distribution describing the time between primary and secondary infections. The incidence at time tt, I(t)I(t) can then be written in terms of the incidence at times less than tt:

I(t)\displaystyle I(t) =0I(tτ)η(τ)dτ\displaystyle=\int_{-\infty}^{0}I(t-\tau)\eta(\tau)\text{d}\tau (3)
=0I(tτ)g(τ)R(t)dτ.\displaystyle=\int_{-\infty}^{0}I(t-\tau)g(\tau)R(t)\text{d}\tau.

Where R(t)R(t) is the reproduction number at time tt. Expressed in this way the reproduction number at time t can be calculated directly from the time-series of incidence (if the generation time function is known — which we will assume throughout) through the equation:

R(t)=I(t)0I(tτ)g(τ)dτ.R(t)=\frac{I(t)}{\int_{-\infty}^{0}I(t-\tau)g(\tau)\text{d}\tau}. (4)

2.2 The relationship between epidemic time series and R(t)R(t)

A general epidemic time-series, J(t)J(t), can be written in terms of the incidence at times less than tt and a convolution function, f(τ)f(\tau), that describes the relationship between the two time series:

J(t)=0I(tτ)f(τ)dτ.J(t)=\int_{-\infty}^{0}I(t-\tau)f(\tau)\text{d}\tau. (5)

The convolution function will take different forms depending on the epidemic time series. For example, for the time series of deaths due to an infectious disease, f(τ)f(\tau) would describe the average rate of deaths as a function of the time since infection. Substituting the relationship in equation 3 into equation 5 we obtain:

J(t)=00I(tτζ)g(ζ)R(tτ)f(τ)dτdζ.J(t)=\int_{-\infty}^{0}\int_{-\infty}^{0}I(t-\tau-\zeta)g(\zeta)R(t-\tau)f(\tau)\text{d}\tau\text{d}\zeta. (6)

In general there is not a simple relationship between J(t)J(t) and R(t)R(t). To calculate R(t)R(t) from J(t)J(t) the convolution function, f(τ)f(\tau), would need to be known (and of course g(τ)g(\tau)). However, it is often assumed that the same relationship exists between J(t)J(t) and R(t)R(t), as the one between I(t)I(t) and R(t)R(t):

J(t)=0J(tζ)g(ζ)RA(t)dζ.J(t)=\int_{-\infty}^{0}J(t-\zeta)g(\zeta)R_{A}(t)\text{d}\zeta. (7)

Here RA(t)R_{A}(t) is the apparent reproduction number inferred from a general epidemic time series under this incorrect assumption, and is given by:

RA(t)=J(t)0J(tζ)g(ζ)dζ.R_{A}(t)=\frac{J(t)}{\int_{-\infty}^{0}J(t-\zeta)g(\zeta)\text{d}\zeta}. (8)

In general RA(t)R(t)R_{A}(t)\neq R(t), but under certain conditions relationships between the two will hold.

R(t)=R(t)= constant

If R(t)R(t), has been constant for the period of time over which f(τ)f(\tau) goes to 0 then equation 6 can be written as:

J(t)=0J(tζ)g(ζ)R(t)¯dζ.J(t)=\int_{-\infty}^{0}J(t-\zeta)g(\zeta)\overline{R(t)}\text{d}\zeta. (9)

Where we have written R(t)R(t) as R(t)¯\overline{R(t)} to make it clear this equation is only valid when R(t)R(t) is constant. In this situation the relationship between J(t)J(t) and R(t)R(t) is the same as the relationship between I(t)I(t) and R(t)R(t) and we have

RA(t)=R(t)¯.R_{A}(t)=\overline{R(t)}. (10)

f(τ)=δ(μτ)f(\tau)=\delta(\mu-\tau)

If the convolution function, f(τ)f(\tau), is the Dirac-delta function with mean μ\mu, δ(μτ)\delta(\mu-\tau), then J(t)J(t) can be written as:

J(t)=I(tμ).J(t)=I(t-\mu). (11)

RA(t)R_{A}(t) can then be written as:

RA(t)\displaystyle R_{A}(t) =J(t)0J(tζ)g(ζ)dζ\displaystyle=\frac{J(t)}{\int_{-\infty}^{0}J(t-\zeta)g(\zeta)\text{d}\zeta} (12)
=I(tμ)0I(tμζ)g(ζ)dζ\displaystyle=\frac{I(t-\mu)}{\int_{-\infty}^{0}I(t-\mu-\zeta)g(\zeta)\text{d}\zeta}
=R(tμ).\displaystyle=R(t-\mu).

3 Results

3.1 The effect of convolution on estimates of R(t)R(t)

Trends in the apparent reproduction number, RA(t)R_{A}(t), lagged trends in the actual reproduction number, R(t)R(t), in ways that were dependent on the shape of the underlying infection time series. Infection incidence was simulated for two time series of R(t)R(t): sinusoidal and square waves. A sinusoidal wave describes gradual changes in R(t)R(t); gradual changes in R(t)R(t) would be expected over the course of an epidemic (due to the depletion of susceptibles). A square wave describes rapid changes in R(t)R(t), which would be expected when some public health interventions (e.g., lockdowns, school closures) are implemented. Epidemic time series and their associated RA(t)R_{A}(t) were then calculated for different gamma distributed convolution functions (Figure 1). As we would expect, the greater the mean of the convolution function, the greater the lag between RA(t)R_{A}(t) and R(t)R(t). When R(t)R(t) changed gradually (simulated as a sine function in time) (Figure 1b) there was a delay between R(t)R(t) and RA(t)R_{A}(t) reaching their respective maximums (Figure 2c). This time delay was approximately equal to the mean of the convolution function. When R(t)R(t) underwent a step decrease (figure 1a) there was a delay between R(t)R(t) decreasing and RA(t)R_{A}(t) decreasing (with similar results for a step increase). Measuring the time delay between R(t)R(t) decreasing (a step change) and RA(t)R_{A}(t) falling to under 95% of its maximum value (Figure 2a) we observed a greater delay for convolution functions with greater means and lower standard deviations.

Refer to caption
Figure 1: Estimates of the time-varying reproduction number from epidemic time series with different gamma distributed convolution functions. (A, B) The actual reproduction number, R(t)R(t) (black), and the apparent reproduction number, RA(t)R_{A}(t) (coloured), for epidemic time series with different gamma distributed convolution functions. (C) The gamma distributed convolution functions used in (A) and (B).
Refer to caption
Figure 2: Diagnostic differences between R(t)R(t) and RA(t)R_{A}(t). Diagnostic differences between R(t)R(t) and RA(t)R_{A}(t), for different gamma-distributed convolution functions. Only convolution functions with a mean value greater than their standard deviation are included. (A,B) Differences between RA(t)R_{A}(t) and R(t)R(t), when R(t)R(t) describes a step decrease (see Figure 1a, day 50). (A) The number of days it takes for RA(t)R_{A}(t) to fall below 95% of its maximum value following the step change in R(t)R(t). (B) The number of days between RA(t)R_{A}(t) falling below 95% of its maximum value and falling below 5% of its maximum value. (C,D) Differences between RA(t)R_{A}(t) and R(t)R(t), when R(t)R(t) describes a sine wave (see Figure 1b). (C) The number of days between R(t)R(t) reaching its maximum value and RA(t)R_{A}(t) reaching its maximum value. (D) A measure of the difference between the peak values of R(t)R(t) and RA(t)R_{A}(t), defined as 1max(R(t))max(RA(t))max(R(t))min(R(t))1-\frac{max(R(t))-max(R_{A}(t))}{max(R(t))-min(R(t))} it reflects the proportion of the maximum value that RA(t)R_{A}(t) reaches.

In addition to RA(t)R_{A}(t) lagging R(t)R(t), trends in RA(t)R_{A}(t) (the shape of the curve over time) were distorted relative to R(t)R(t), with increases in the standard deviation of the convolution function resulting in greater levels of distortion. When R(t)R(t) changed gradually (simulated as a sine function in time) the maximum value of RA(t)R_{A}(t) was lower than the maximum value of R(t)R(t). The greater the standard deviation of the convolution function the greater the relative reduction in the maximum value of RA(t)R_{A}(t) (Figure 2d). When R(t)R(t) underwent a sharp decrease (a step change over a single day) RA(t)R_{A}(t) decreased at a slower rate (Figure 1a). Measuring the time delay between RA(t)R_{A}(t) falling from 95% of its maximum value to 5% of its maximum value we observed a greater delay when the standard deviation of the convolution function was greater (Figure 2b).

3.2 Case study: SARS-CoV-2 epidemic time series

For all epidemic time series considered (that would have been available during the SARS-CoV-2 pandemic) there were clear differences between RA(t)R_{A}(t) and R(t)R(t), during a representative scenario (chosen to resemble trends in R(t)R(t) for SARS-CoV-2 in England from March to December 2020) (Figure 3). During periods when R(t)R(t) was constant, the value of RA(t)R_{A}(t) was the same (as expected). When R(t)R(t) changed, RA(t)R_{A}(t) lagged R(t)R(t) and had its shape distorted in a way that was dependent on the shape of the convolution function.

Refer to caption
Figure 3: Trends in R(t)R(t) and RA(t)R_{A}(t) for representative epidemic time series during the SARS-CoV-2 pandemic. (A) The actual reproduction number, R(t)R(t) (black), and the apparent reproduction number, RA(t)R_{A}(t) (coloured), for epidemic time series with different convolution functions. R(t)R(t) was chosen to resemble a possible trajectory of R(t)R(t) in England from May 2020 to December 2020 (B) The probability distribution of the convolution functions for deaths (Green), hospitalisations (Orange), prevalence (Purple), symptom onset (Pink). The mean and standard deviation of all convolution functions are presented in the inset graph.

The values of RA(t)R_{A}(t) calculated from the time series of symptom onset most closely resembled the underlying R(t)R(t). The mean and standard deviation of the convolution function for symptom onset was lower than all other time series considered (Figure 3b). This was expected for deaths and hospitalisations, for which the convolution functions were composed of the distributions of time from infection to symptom onset (incubation period) and time from symptom onset to outcome (death/hospitalisation) (see Methods). Interestingly, the peak probability density of the convolution function for infection prevalence was the same as for symptom onset, but due to a long tail in the distribution the mean and standard deviation were greater.

3.3 Improving inference using frequent testing

Increasing the frequency of random/asymptomatic testing reduced the mean and standard deviation of the convolution function (Figure 4c). Random testing or asymptomatic testing for SARS-CoV-2 should, in the absence of repeat testing, have a convolution function describing the probability of testing positive following infection. In our simulations, increasing the frequency of testing decreased the mean convolution function, as infections were more likely to be identified earlier. Testing once every three days gave a convolution function with a lower mean than that of symptom onset. Testing every day resulted in a convolution function with mean and standard deviation of approximately 3 days and 1 day respectively. The lower mean and standard deviation of the convolution function resulted in RA(t)R_{A}(t) estimates more closely resembling R(t)R(t) as expected (Figures 4a and 4b).

Refer to caption
Figure 4: The effect of increasing testing frequency on estimates of RA(t)R_{A}(t). (A,B) The actual reproduction number, R(t)R(t) (black), and the apparent reproduction number, RA(t)R_{A}(t) (coloured), for epidemic time series with different convolution functions. (C) The probability distribution of convolution functions for different frequencies of rt-PCR testing. Testing frequency of ’every 100 days’ reflects no repeat testing and has the same convolution function as for prevalence in figure 3. The mean and standard deviation of all convolution functions are presented in the inset graph.

3.4 Improving inference from infection prevalence studies

Imposing a stricter viral threshold for positivity reduced the standard deviation of the convolution function (Figure 5c). rt-PCR tests, which are often used to test for the presence of SARS-CoV-2, can also provide information on the Ct value (proxy for viral load) of a positive test. By only including individual tests with a Ct value lower (higher viral load) than a certain threshold, it becomes less likely that early-infections or long-term shedders are included within the epidemic time series. This led to substantial decreases in the standard deviation of the convolution function, and smaller decreases in the mean of the convolution function. Accordingly, this led to the shape of RA(t)R_{A}(t) more closely resembling that of R(t)R(t), while the lag between the two signals stayed approximately constant (Figure 5a and 5b). Imposing a stricter threshold reduced the number of individuals classified as positive and therefore the power of any analysis (Figure 5d).

Refer to caption
Figure 5: The effect of imposing stricter Ct thresholds for positivity on estimates of RA(t)R_{A}(t). (A,B) The actual reproduction number, R(t)R(t) (black), and the apparent reproduction number, RA(t)R_{A}(t) (coloured), for epidemic time series with different convolution functions. (C) The probability distribution of convolution functions for different Ct thresholds of rt-PCR positivity. Note that the convolution function was defined from a model of population Ct values rather that rt-PCR positivity, and so will not match the convolution function for prevalence in figure 3. The mean and standard deviation of all convolution functions are presented in the inset graph. (D) The proportion of individual tests classified as positive relative to the number defined as positive with a Ct threshold of 37.

4 Discussion

During the SARS-CoV-2 pandemic, estimates of the time-varying reproduction number, R(t)R(t), were crucial for informing public health interventions. In most counties there were numerous epidemic time series available for estimating R(t)R(t). We have described the biases that will be present in estimates of ‘R(t)R(t)’ when it is assumed that these time series are reliable proxies for infection incidence (and not the convolution of infection incidence with some other function). This assumption is commonly made Nash2022-ob , despite methods existing to incorporate estimates of the convolution function into estimates of R(t)R(t) Abbott2020_abc . We presented the convolution functions representing four different epidemic time series for SARS-CoV-2: date of symptom onset (symptom reporting studies), prevalence (rt-PCR testing of random samples or asymptomatic testing), hospitalisations and deaths. Additionally, the convolution function for case-numbers, obtained from mass testing (available in many countries worldwide), would likely be a linear combination of the convolution functions for prevalence (asymptomatic testing) and symptom onset (symptomatic testing) with an additional delay to test distribution (delay between symptom onset and positive test).
If the exact form of these convolution function was known, then estimates of R(t)R(t) could easily be made from the corresponding epidemic time series Huisman2022-nv , but this is not always the case. During the early stages of a pandemic there is little epidemic information, and so convolution functions cannot be inferred. During later periods of a pandemic there can still be great uncertainty in estimates of the convolution function. There is also no guarantee that the convolution functions remain constant over time. For example, rates of symptomatic infection can change (effecting mass testing) Eales2022-vf ; Eales2022-if , severity can change (effecting hospitalisation and death time series) Eales2023-wm , and even the duration an individual remains test positive can change due to new variants Eales2022-aq , or due to vaccination Kissler2021-ng . Thus, there are times when only the apparent reproduction number, RA(t)R_{A}(t), can be estimated, and so understanding the inherent biases in these estimates is crucial.
Changes in RA(t)R_{A}(t) lagged changes in R(t)R(t), with the duration of the lag controlled by the mean of the convolution function. If timeliness in R(t)R(t) estimates are required (e.g., for pandemic surveillance), then this time lag is a major limitation (though estimates could still be useful for retrospective analysis). The temporal trends in RA(t)R_{A}(t) were distorted relative to R(t)R(t), with a greater degree of distortion when variance in the convolution function was greater; If there was no variance in the convolution function (standard deviation=0), RA(t)R_{A}(t) would exhibit the exact same temporal trends as R(t)R(t) (lagged by the mean of the convolution function). This distortion was most pronounced when R(t)R(t) changed rapidly, and less pronounced when R(t)R(t) changed more gradually. During many epidemics only gradual changes in R(t)R(t) are observed (e.g., due to the changing proportion of susceptibility as individuals are infected and develop immunity) and such distortions would be minimal. However, in some instances R(t)R(t) may change rapidly, such as when restrictions are implemented Knock2021-ug — for example lock downs during the SARS-CoV-2 pandemic — or when schools are closed/opened Li2021-uz . In these instances, there will be a high degree of distortion between R(t)R(t) and RA(t)R_{A}(t).
The convolution function for SARS-CoV-2 symptom onset had the smallest mean and variance, of the four convolution functions considered, which would result in estimates of RA(t)R_{A}(t) that were the most similar to R(t)R(t). However, there are often multiple circulating pathogens which may present similar symptom profiles. To estimate RA(t)R_{A}(t), it must be determined which individuals reporting symptoms are infected with the relevant pathogen. For this symptomatic testing the exact convolution function would rely on a distribution describing the time between symptom onset and testing (testing delay); this would likely increase the convolution function’s mean and variance. To improve estimates of RA(t)R_{A}(t) the testing delay distribution should be reduced as much as possible. Additionally, the date of symptom onset should be routinely collected; this would reduce the mean and variance of the convolution function (no testing delay component required) and would also allow R(t)R(t) to be estimated (once the incubation period has been estimated) as the convolution function would be less likely to vary over time (testing delay can change over time).
Random sampling has been proposed as a method for tracking COVID-19 infections Dean2022-ub , without the biases present in standard pandemic surveillance methods such as mass testing Ricoca_Peixoto2020-wj . There has been some concern that such studies would be less well suited for estimating R(t)R(t), due to the presence of long-term shedders — individuals who continue to test positive for a long duration of time, following infection. Random sampling studies measure the infection prevalence, the proportion of a population that test positive for the virus. We have demonstrated that the convolution function for prevalence of SARS-CoV-2 has a mean and variance only slightly greater than that of symptom onset. In fact, compared to mass-testing, which relies heavily on symptomatic testing, it is likely that the mean and variance of the convolution functions would be comparable, depending on the testing delay distribution.
The convolution function for symptomatic testing can at best match the incubation period of the pathogen. In contrast we have shown that there exist methods for improving the inference of RA(t)R_{A}(t) based on prevalence studies. Imposing a stricter threshold for positivity can reduce variance of the convolution function (though larger samples may be required as more positive tests would be excluded). It is worth noting that the Ct value model we presented is not a perfect analogue to rt-PCR positivity and so results are informative, but not the exact outcome that would be obtained using a stricter threshold for positivity. Increasing the frequency of testing could also reduce both the mean and variation of the convolution function. We observed that for daily testing the convolution function’s mean and variance was smaller than that of the incubation period of SARS-CoV-2. It is highly unlikely that a random sampling study, which is very expensive, could be designed to undertake such high rates of testing. However, the convolution function presented would also be valid for asymptomatic testing. Regular asymptomatic testing was performed by many individuals and businesses throughout the pandemic. Studies or surveillance systems set up around frequent asymptomatic testing could result in estimates of RA(t)R_{A}(t) that more closely resemble R(t)R(t). In the latter stages of the pandemic Lateral Flow Viral Antigen detection devices (LFDs) were regularly used by some individuals. LFDs are less sensitive in general than rt-PCR testing, but they are still highly sensitive at detecting infections with high viral loads Joint_PHE_Porton_Down_University_of_Oxford_SARS-CoV-2_test_development_and_validation_cell2020-bg . This is at times a limitation as less infected individuals will be identified, but the resulting convolution function would likely have even less variation.
We have only considered the biases in R(t)R(t) due to the convolution function. There are often many other sources of bias associated with epidemic time series data Ricoca_Peixoto2020-wj . For example, changing testing rates can bias estimates of R(t)R(t) obtained from cases identified through mass testing Eales2022-yr . Additionally, estimates of R(t)R(t) also rely on accurate estimates of the generation time distribution Wallinga2007-qv . The generation time is not always well characterised (especially during the early period of a pandemic) and in some circumstances it can change over time Abbott2022-ku . When the generation time is not known (or poorly estimated), the growth rate is regularly used to quantify epidemic growth and decline as an alternative to R(t)R(t) Parag2022-jw . However, though the growth rate is independent of generation time, estimating it from a general epidemic time series would introduce a similar set of biases as for R(t)R(t) due to the convolution function.

5 Conclusion

There are clear biases when estimating R(t)R(t) from an epidemic time series, under the assumption that the time series is a perfect proxy for infection incidence. By collecting data to inform on the convolution function linking the epidemic time series to incidence, estimates of R(t)R(t) could be made without such biases. During periods in which the convolution function cannot be estimated, the apparent reproduction number can and should still be estimated. Despite its biases, it is highly useful for quantifying transmission dynamics and informing public health interventions. To minimise the inherent biases present, more careful consideration should be given into what epidemic data is used and how it can be better collected and manipulated. Additionally, studies in which it is the apparent reproduction number being estimated should state this explicitly, making it clear what possible biases may be present.

6 Materials and Methods

6.1 Simulating epidemic time series

The time series of R(t)R(t) are pre-specified. R(t)R(t) is assumed to be 1 for a period of 100 days (days -99 to 0) prior to the simulation for ease of later integral calculations (integration’s are all performed over the previous 100 days). The time series of R(t)R(t) can then be defined in any way. We define two main time series of R(t)R(t) following a square wave and a sine wave, both with a period of 100 days, a maximum value of 1.5 and a minimum value of 0.5. We also define a time series of R(t)R(t) that follows approximately the same trend as the estimated R(t)R(t) in England from March 2020 to December 2020 Knock2021-ug . With a time series of R(t)R(t), the time series for infection incidence can be calculated through equation 3:

I(t)=0I(tτ)g(τ)R(t)dτ.I(t)=\int_{-\infty}^{0}I(t-\tau)g(\tau)R(t)\text{d}\tau. (13)

The generation time is assumed to be gamma distributed with a mean of 6.36 days and a standard deviation of 4.20 days Bi2020-ls . This was chosen to match the generation time of SARS-CoV-2, but as we assume perfect knowledge of the generation time all results are independent of the choice of the generation time distribution used. When calculating the value of I(t)I(t) we perform the integral over the previous 100 days. This ensures the integral has fallen to approximately 0. Days -99 to 0 are assumed to have I(t)=1I(t)=1.
General epidemic time series, J(t)J(t) are calculated directly from I(t)I(t) through equation 5:

J(t)=0I(tτ)f(τ)dτ.J(t)=\int_{-\infty}^{0}I(t-\tau)f(\tau)\text{d}\tau. (14)

The integration is again performed over the previous 100 days, ensuring the integral has fallen to approximately 0. f(τ)f(\tau) is a convolution function which we define over a period of only 100 days.

6.2 Calculating the reproduction number

We calculate the apparent reproduction number, RA(t)R_{A}(t), directly from the general epidemic time series J(t)J(t) through equation 8:

RA(t)=J(t)0J(tζ)g(ζ)dζ.R_{A}(t)=\frac{J(t)}{\int_{-\infty}^{0}J(t-\zeta)g(\zeta)\text{d}\zeta}. (15)

The integration is again performed over the previous 100 days, ensuring the integral has fallen to approximately 0. As before g(ζ)g(\zeta) is the generation time distribution which is known exactly when performing the calculation.

6.3 Convolution functions

We define many convolution functions, f(τ)f(\tau), for converting I(t)I(t) into a general epidemic time-series. All convolution functions are defined only for values of τ\tau between 0 and 100.

Gamma-distributed convolution functions

The gamma-distributed convolution functions are defined using the gamma distribution. A gamma distribution defined with scale parameter, kk, and shape parameter, θ\theta has mean, kθk\theta and standard deviation, k1/2θk^{1/2}\theta. We then define our gamma-distributed convolution functions with mean, μ\mu and standard deviation, σ\sigma using the inverse relationships: k=μ2/σ2k=\mu^{2}/\sigma^{2} and θ=σ2/μ\theta=\sigma^{2}/\mu.

Convolution functions for SARS-CoV-2 epidemic time series

The convolution function for symptom onset is defined by the incubation period of SARS-CoV-2, which we assume is gamma distributed with mean 6.0 days and standard deviation 3.1 days Linton2020-oi .
The convolution functions for deaths and hospitalisations are calculated using the equation:

f(τ)=τ0f1(τζ)f2(ζ)dζ.f(\tau)=\int_{\tau}^{0}f_{1}(\tau-\zeta)f_{2}(\zeta)\text{d}\zeta. (16)

In this equation f1(τ)f_{1}(\tau) describes the incubation period of SARS-CoV-2 (same gamma distribution as above) and f2(ζ)f_{2}(\zeta) describes the distribution of the time between symptom onset and death/hospitalisation. For deaths we assume f2(ζ)f_{2}(\zeta) is gamma distributed with mean 15.0 days and standard deviation 6.9 days Linton2020-oi . For hospitalisations we assume f2(ζ)f_{2}(\zeta) is gamma distributed with mean 7.8 days and standard deviation 6.0 days Hawryluk2020-qf . The parameters used for all distributions are only meant to be informative, there are many other potential values that could have been selected from the literature.
The convolution function for prevalence is defined using the probability of testing positive as a function of time since infection. We use the median of the modelled probability of testing positive (as a function of time since infection) estimated in Hellewell et al 2021 Hellewell2021-fo .

Convolution function for frequent testing

The convolution function for frequent testing is calculated using the distribution describing the probability of testing positive as a function of time since infection (the convolution function for prevalence). When there is frequent testing the convolution function describes the day in which an individual first tests positive only. Writing the probability of testing positive τ\tau days after infection as P(τ)P(\tau), we can calculate the probability of first testing positive τ\tau days after infection as:

P(first positive,τ)=P(τ)n=1N11P(τn×γ),P(\text{first positive},\tau)=P(\tau)\prod_{n=1}^{N-1}1-P(\tau-n\times\gamma), (17)

where NN is the total number of tests performed since infection and γ\gamma is the number of days between tests. The product in the above equation is calculating the probability of testing negative in all previous tests.

Convolution function for different Ct value thresholds

The convolution function for different Ct value thresholds of positivity was calculated by simulating the trajectory of Ct values over time since infection for 1000 individuals. The proportion of individuals with a Ct value less than a threshold Ct value was then calculated as a function of time since infection. Ct values were simulated using the results and data of Hay et al 2022 Hay2022-dn (Ct values represent ORF1ab). In the paper an individual’s Ct value trajectories can be quantified using three parameters: the minimum Ct value reached, the time taken for Ct value to decrease from the limit of detection (Ct value=40) to its minimum value (linear decrease assumed), and the time taken for Ct value to increase from its minimum value to the limit of detection (linear increase assumed). The parameters for an individual are assumed to be drawn from parameter distributions describing the population as a whole. To simulate 1000 individuals we extracted 1000 individual level parameters from the population parameter distributions estimated for the model fit to the data for all Omicron infections. We had to assume the time of infection for all individuals. We assumed that all individuals reached peak viral load (minimum Ct value) exactly 5 days after infection.

Declarations

Conflicts of interest

The authors declare no competing interests.

References

  • \bibcommenthead
  • (1) Anderson RM, Fraser C, Ghani AC, Donnelly CA, Riley S, Ferguson NM, et al. Epidemiology, transmission dynamics and control of SARS: the 2002-2003 epidemic. Philos Trans R Soc Lond B Biol Sci. 2004 Jul;359(1447):1091–1105.
  • (2) Sharif-Yakan A, Kanj SS. Emergence of MERS-CoV in the Middle East: origins, transmission, treatment, and perspectives. PLoS Pathog. 2014 Dec;10(12):e1004457.
  • (3) Gog JR, Ballesteros S, Viboud C, Simonsen L, Bjornstad ON, Shaman J, et al. Spatial Transmission of 2009 Pandemic Influenza in the US. PLoS Comput Biol. 2014 Jun;10(6):e1003635.
  • (4) Gire SK, Goba A, Andersen KG, Sealfon RSG, Park DJ, Kanneh L, et al. Genomic surveillance elucidates Ebola virus origin and transmission during the 2014 outbreak. Science. 2014 Sep;345(6202):1369–1372.
  • (5) MacCormack-Gelles B, Lima Neto AS, Sousa GS, Nascimento OJ, Machado MMT, Wilson ME, et al. Epidemiological characteristics and determinants of dengue transmission during epidemic and non-epidemic years in Fortaleza, Brazil: 2011-2015. PLoS Negl Trop Dis. 2018 Dec;12(12):e0006990.
  • (6) Li J, Lai S, Gao GF, Shi W. The emergence, genomic diversity and global spread of SARS-CoV-2. Nature. 2021 Dec;600(7889):408–418.
  • (7) Anderson RM, May RM. Infectious Diseases of Humans: Dynamics and Control. OUP Oxford; 1992.
  • (8) Li Y, Campbell H, Kulkarni D, Harpur A, Nundy M, Wang X, et al. The temporal association of introducing and lifting non-pharmaceutical interventions with the time-varying reproduction number (R) of SARS-CoV-2: a modelling study across 131 countries. Lancet Infect Dis. 2021 Feb;21(2):193–202.
  • (9) Eales O, Wang H, Haw D, Ainslie KEC, Walters CE, Atchison C, et al. Trends in SARS-CoV-2 infection prevalence during England’s roadmap out of lockdown, January to July 2021. PLoS Comput Biol. 2022 Nov;18(11):e1010724.
  • (10) Wallinga J, Lipsitch M. How generation intervals shape the relationship between growth rates and reproductive numbers. Proc Biol Sci. 2007 Feb;274(1609):599–604.
  • (11) Nash RK, Nouvellet P, Cori A. Real-time estimation of the epidemic reproduction number: Scoping review of the applications and challenges. PLOS Digital Health. 2022 Jun;1(6):e0000052.
  • (12) : Official UK Coronavirus Dashboard. Accessed: 2022-10-20. https://coronavirus.data.gov.uk/.
  • (13) Elliott P, Eales O, Bodinier B, Tang D, Wang H, Jonnerby J, et al. Dynamics of a national Omicron SARS-CoV-2 epidemic during January 2022 in England. Nat Commun. 2022 Aug;13(1):4500.
  • (14) Chadeau-Hyam M, Eales O, Bodinier B, Wang H, Haw D, Whitaker M, et al. Breakthrough SARS-CoV-2 infections in double and triple vaccinated adults and single dose vaccine effectiveness among children in Autumn 2021 in England: REACT-1 study. EClinicalMedicine. 2022 Jun;48:101419.
  • (15) Huisman JS, Scire J, Angst DC, Li J, Neher RA, Maathuis MH, et al. Estimation and worldwide monitoring of the effective reproductive number of SARS-CoV-2. Elife. 2022 Aug;11.
  • (16) Dighe A, Cattarino L, Cuomo-Dannenburg G, Skarp J, Imai N, Bhatia S, et al. Response to COVID-19 in South Korea and implications for lifting stringent interventions. BMC Med. 2020 Oct;18(1):321.
  • (17) Ali ST, Kadi AS, Ferguson NM. Transmission dynamics of the 2009 influenza A (H1N1) pandemic in India: the impact of holiday-related school closure. Epidemics. 2013 Dec;5(4):157–163.
  • (18) Shim E, Tariq A, Chowell G. Spatial variability in reproduction number and doubling time across two waves of the COVID-19 pandemic in South Korea, February to July, 2020. Int J Infect Dis. 2021 Jan;102:1–9.
  • (19) Abbott S, Hellewell J, Hickson J, Munday J, Gostic K, Ellis P, et al. EpiNow2: Estimate Real-Time Case Counts and Time-Varying Epidemiological Parameters. -. 2020;-(-):–. 10.5281/zenodo.3957489.
  • (20) Lytras T.: bayEStim: Estimate epidemic effective reproduction number in a Bayesian framework.
  • (21) Eales O, de Oliveira Martins L, Page AJ, Wang H, Bodinier B, Tang D, et al. Dynamics of competing SARS-CoV-2 variants during the Omicron epidemic in England. Nat Commun. 2022 Jul;13(1):1–11.
  • (22) Eales O, Page AJ, de Oliveira Martins L, Wang H, Bodinier B, Haw D, et al. SARS-CoV-2 lineage dynamics in England from September to November 2021: high diversity of Delta sub-lineages and increased transmissibility of AY.4.2. BMC Infect Dis. 2022 Jul;22(1):647.
  • (23) Eales O, Haw D, Wang H, Atchison C, Ashby D, Cooke GS, et al. Dynamics of SARS-CoV-2 infection hospitalisation and infection fatality ratios over 23 months in England. PLoS Biol. 2023 May;21(5):e3002118.
  • (24) Eales O, Walters CE, Wang H, Haw D, Ainslie KEC, Atchison CJ, et al. Characterising the persistence of RT-PCR positivity and incidence in a community survey of SARS-CoV-2. Wellcome Open Res. 2022 Mar;7:102.
  • (25) Kissler SM, Fauver JR, Mack C, Tai CG, Breban MI, Watkins AE, et al. Viral Dynamics of SARS-CoV-2 Variants in Vaccinated and Unvaccinated Persons. N Engl J Med. 2021 Dec;385(26):2489–2491.
  • (26) Knock ES, Whittles LK, Lees JA, Perez-Guzman PN, Verity R, FitzJohn RG, et al. Key epidemiological drivers and impact of interventions in the 2020 SARS-CoV-2 epidemic in England. Sci Transl Med. 2021 Jul;13(602).
  • (27) Dean N. Tracking COVID-19 infections: time for change. Nature. 2022 Feb;602(7896):185.
  • (28) Ricoca Peixoto V, Nunes C, Abrantes A. Epidemic Surveillance of Covid-19: Considering Uncertainty and Under-Ascertainment. Portuguese Journal of Public Health. 2020;38(1):23–29.
  • (29) Joint PHE Porton Down & University of Oxford SARS-CoV-2 test development and validation cell. Preliminary report: Rapid evaluation of Lateral Flow Viral Antigen detection devices (LFDs) for mass community testing; 2020.
  • (30) Eales O, Ainslie KEC, Walters CE, Wang H, Atchison C, Ashby D, et al. Appropriately smoothing prevalence data to inform estimates of growth rate and reproduction number. Epidemics. 2022 Jun;40:100604.
  • (31) Abbott S, Sherratt K, Gerstung M, Funk S. Estimation of the test to test distribution as a proxy for generation interval distribution for the Omicron variant in England; 2022.
  • (32) Parag KV, Thompson RN, Donnelly CA. Are epidemic growth rates more informative than reproduction numbers? J R Stat Soc Ser A Stat Soc. 2022 May;.
  • (33) Bi Q, Wu Y, Mei S, Ye C, Zou X, Zhang Z, et al. Epidemiology and transmission of COVID-19 in 391 cases and 1286 of their close contacts in Shenzhen, China: a retrospective cohort study. Lancet Infect Dis. 2020 Aug;20(8):911–919.
  • (34) Linton NM, Kobayashi T, Yang Y, Hayashi K, Akhmetzhanov AR, Jung SM, et al. Incubation Period and Other Epidemiological Characteristics of 2019 Novel Coronavirus Infections with Right Truncation: A Statistical Analysis of Publicly Available Case Data. J Clin Med Res. 2020 Feb;9(2).
  • (35) Hawryluk I, Mellan TA, Hoeltgebaum H, Mishra S, Schnekenberg RP, Whittaker C, et al. Inference of COVID-19 epidemiological distributions from Brazilian hospital data. J R Soc Interface. 2020 Nov;17(172):20200596.
  • (36) Hellewell J, Russell TW, SAFER Investigators and Field Study Team, Crick COVID-19 Consortium, CMMID COVID-19 working group, Beale R, et al. Estimating the effectiveness of routine asymptomatic PCR testing at different frequencies for the detection of SARS-CoV-2 infections. BMC Med. 2021 Apr;19(1):106.
  • (37) Hay JA, Kissler SM, Fauver JR, Mack C, Tai CG, Samant RM, et al. Viral dynamics and duration of PCR positivity of the SARS-CoV-2 Omicron variant; 2022.