Computing the daily reproduction number of COVID-19 by inverting the renewal equation using a variational technique

Significance

Based on a signal-processing approach, we propose a method to compute the reproduction number R_t, the transmission potential of an epidemic over time. R_t is estimated by minimizing a functional that enforces: 1) the ability to produce an incidence curve i_t corrected of the weekly periodic bias produced by the weekend effect, obtained from R_t through a renewal equation; and 2) the regularity of R_t. A good agreement is found between our R_t estimate and the one provided by the currently accepted method, EpiEstim, except that our method predicts R_t several days closer to present. We provide the mathematical arguments for this shift.

Abstract

The COVID-19 pandemic has undergone frequent and rapid changes in its local and global infection rates, driven by governmental measures or the emergence of new viral variants. The reproduction number R_t indicates the average number of cases generated by an infected person at time t and is a key indicator of the spread of an epidemic. A timely estimation of R_t is a crucial tool to enable governmental organizations to adapt quickly to these changes and assess the consequences of their policies. The EpiEstim method is the most widely accepted method for estimating R_t. But it estimates R_t with a significant temporal delay. Here, we propose a method, EpiInvert, that shows good agreement with EpiEstim, but that provides estimates of R_t several days in advance. We show that R_t can be estimated by inverting the renewal equation linking R_t with the observed incidence curve of new cases, i_t. Our signal-processing approach to this problem yields both R_t and a restored i_t corrected for the weekend effect by applying a deconvolution and denoising procedure. The implementations of the EpiInvert and EpiEstim methods are fully open source and can be run in real time on every country in the world and every US state.

The reproduction number R_t is a key epidemiological parameter evaluating transmission potential of a disease over time. It is defined as the average number of new infections caused by a single infected individual at time t in a partially susceptible population (1). R_t can be computed from the daily observation of the incidence curve i_t, but requires empirical knowledge of the probability distribution $_{s}$ of the delay between two infections (2, 3).

There are two different models for the incidence curve and its corresponding infection delay $_{s}$ . In a theoretical model, i_t would represent the real daily number of new infections, and $_{s}$ is sometimes called generation time (4, 5) and represents the probability distribution of the time between infection of a primary case and infections in secondary cases. In practice, neither parameter is easily observable because the infected are rarely detected before the appearance of symptoms, and tests will be negative until the virus has multiplied over several days. What is routinely recorded by health organizations is the number of new detected, incident cases. When dealing with this real incidence curve, $_{s}$ is called the serial interval (4, 5). The serial interval is defined as the delay between the onset of symptoms in a primary case and the onset of symptoms in secondary cases (5).

R_t is linked to i_t and $_{s}$ through the renewal equation, first formulated for birthdeath processes in a 1907 note of Alfred Lotka (6). We adopt the Nishiura et al. formulation (7, 8), $i_{t} =_{s = f_{0}}^{f} R_{t s} i_{t s}_{s} for t = 0, .., t_{c},$ [1]where t_c represents the last time at which i_t was available, and f₀ and f are the minimal and maximal observed times between a primary and a secondary case. The underlying epidemiological assumption of this model is that the time-varying factor R_t causes a constant proportional change in an individuals infectiousness, over the course of their entire infectious period, based on the day on which they were infected. In this case, we refer to R_t as the case reproductive number. According to Cori et al. (ref. 5, which cites ref. 9), It is the average number of secondary cases that a case infected at time step t will eventually infect.

It is important to note that secondary infections are sometimes detected before primary ones, and, therefore, the minimum delay f₀ is generally negative (see Fig. 2). Eq. 1 does not yield an explicit expression for R_t. Yet, an easy solution can be found for the version of the renewal equation proposed in Fraser (equation 9 in ref. 9) and Cori et al. (5), $i_{t} = R_{t}_{s = f_{0}}^{f} i_{t s}_{s} .$ [2]

By this equation, R_t is derived at time t from the past incidence values $i_{t s}$ by a simple division, provided that $f_{0} 0$ : $R_{t} = \frac{i_{t}}{_{s = f_{0}}^{f} i_{t s}_{s}} .$ [3]

The underlying epidemiological assumption of this model is that the time-varying factor of R_t causes a change in the infectiousness only on the day on which transmission occurs.* In this case, we refer to R_t as the instantaneous reproduction number. This R_t estimate, implemented by the EpiEstim software, is highly recommended in a very recent review (10) signed by representatives from 10 different epidemiological laboratories from several continents.

EpiEstim is the standard method to compute a real-time estimation of the reproduction number and of widespread use. In its stochastic formulation, the first member i_t of Eq. 2 is assumed to be a Poisson variable, and the second member of this equation is interpreted as the expectation of this Poisson variable. This leads to a maximum-likelihood estimation strategy to compute R_t (5, 1114). A detailed description of the EpiEstim method can be found in SI Appendix.

Comparing Eqs. 2 and 1 shows that when applied with the same serial interval and case-incidence curve, the second equation is derived from the first by assuming R_t constant on the serial interval support $[t f, t f_{0}]$ . Replacing $R_{t s}$ by R_t in Eq. 1 indeed yields Eq. 2. However, it is more precise, when replacing $R_{t s}$ by a constant value, to take into account where the values of the serial interval $_{s}$ are concentrated, namely, the expectation (of the probability distribution) of the serial interval. In particular, a more accurate interpretation of the quotient on the right of Eq. 3 would be $R_{t} = \frac{i_{t}}{_{f_{0}}^{f} i_{t s}_{s}},$ [4]where is a central value of the probability distribution of the serial interval that could be, for instance, the median or the mean. In the Ma et al. (15) estimate of the serial interval for COVID-19, we have $5.5$ for the median and $6.7$ for the mean. This supports the hypothesis that EpiEstim estimates R_t with an average delay of more than 5 d.

In practice, the way the sliding average of the incidence is calculated causes another delay. Indeed, as illustrated in Fig. 1, the raw data of the incidence curve i_t can oscillate strongly with a 7-d period. This oscillation has little to do with the Poisson noise used in most aforementioned publications. Government statistics are affected by changes of testing and polling policies and by weekend reporting delays. These recording delays and subsequent rash corrections result in impulse noise and a strong weekly periodic bias observable on the incidence curve (in green) of Fig. 1 (Top).

Illustration of the EpiInvert method, using the online interface (25), on the US incidence curve of new cases. (Top) In green, the raw oscillating curve of incident cases up to September 5, 2021; in blue, the incidence curve after correction of the weekend bias; and in red, the incidence curve simulated from R_t after the inversion of the renewal equation. (Bottom) In black, R_t, the reproduction number estimated by the current EpiEstim method, adopted by most health experts (27), shifted back 8 d. Estimating its value every day guides the health policy of each country. Having R_t slightly smaller than one after a period of positive values of R_t, as is the case for the United States on September 5 means a potential modification of the pandemic trend. In red is the estimation of R_t by the EpiInvert method. This estimate, obtained by compensating the weekend bias and inverting the integral equation, has a temporal shift of about 8 d with respect to EpiEstim. The shadowed areas give the 90% and 95% CIs for the R_t estimation.

» data-icon-position= » » data-hide-link-title= »0″>

Fig. 1.

Illustration of the EpiInvert method, using the online interface (25), on the US incidence curve of new cases. (Top) In green, the raw oscillating curve of incident cases up to September 5, 2021; in blue, the incidence curve after correction of the weekend bias; and in red, the incidence curve simulated from R_t after the inversion of the renewal equation. (Bottom) In black, R_t, the reproduction number estimated by the current EpiEstim method, adopted by most health experts (27), shifted back 8 d. Estimating its value every day guides the health policy of each country. Having R_t slightly smaller than one after a period of positive values of R_t, as is the case for the United States on September 5 means a potential modification of the pandemic trend. In red is the estimation of R_t by the EpiInvert method. This estimate, obtained by compensating the weekend bias and inverting the integral equation, has a temporal shift of about 8 d with respect to EpiEstim. The shadowed areas give the 90% and 95% CIs for the R_t estimation.

To reliably estimate the reproduction number, a regularity constraint on R_t is needed. Cori et al. (5), initiators of the EpiEstim method, use as regularity constraint the assumption that R_t is locally constant in a time window of size ending at time t (usually = 7 d). This results in smoothing the incidence curve with a sliding mean over 7 d. This assumption has two limitations: It causes a significant resolution loss and an additional $\frac{}{2} = 3.5$ backward shift in the estimation of R_t, given that R_t is assumed constant in $[t, t]$ .

In summary, the computation of R_t by Eqs. 1 and 2 raises three challenges:

1) The renewal equation [1] involves future values of i_t, those for $t + 1,, t f_{0}$ .

2) Its second form, Eq. 2, used by the standard method estimates R_t with a backward shift of about 5 d.

3) Smoothing of the weekend effect causes a 3.5-d shift backward.

These cumulative backward shifts may cause a time delay of up to 8.5 d. We shall give an experimental confirmation of such delays by two independent methods: using a simulator with synthetic ground truths and a thorough study of the incidence curves of 55 countries. The practical meaning of this study is that the value of R_t computed by EpiEstim at time t might refer approximately to $R_{t 8}$ .

Here, we address these three issues by proposing a method to invert the renewal equations [1] and [2]. The inversion method developed for Eq. 1 is illustrated in Fig. 1 (Bottom), where the EpiEstim result using the renewal equation [2] (in black) is superposed with the estimate (in red) of R_t by EpiInvert using Eq. 1. After registering both, the black EpiEstim curve stops 8 d before EpiInvert, the red curve. More generally, we found, using the incidence curve of 55 countries, that the median of the temporal shift between the EpiEstim and EpiInvert R_t estimates using the form Eq. 1 of the renewal equation is about 8.24 d and that the median of the root mean squared error (RMSE) between both estimates is just about 0.036.

This result is slightly surprising, given that the interpretation of R_t in both equations is different and that the serial interval used in both equations also is different. In Eq. 2, the serial interval is indeed truncated to preserve the temporal causality of this equation. This excellent 0.036 fit nevertheless suggests that the EpiInvert method, applied to the renewal equation [1], is compatible with the EpiEstim method, but brings an R_t estimate closer to present. This fact will be investigated by simulations in Sections 3 and 4.

One of the key points of the success of EpiInvert is the correction of the weekly bias. For instance, in Fig. 1 (Top), we observe that, in the last months, the registered incidence curve (green) in the United States shows a clear and significant weekly bias, which is the main source of perturbation of the incidence curve. EpiInvert is able to properly correct the weekly bias (blue curve), which allows for a more accurate estimation of R_t closer to present and a restored incidence curve simulated from R_t after the inversion of the renewal equation (red curve).

The general integral Eq. 1 is a functional equation in R_t. Integral equations have been used to estimate R_t: In ref. 16, the authors estimate R_t as the direct deconvolution of a simplified integral equation, where i_t is expressed in terms of R_t and i_t in the past, without using the serial interval. Such inverse problems involving noise and a reproducing kernel can be resolved through the TikhonovArsenin (17) variational approach involving a regularization term. This method is widely used to solve integral equations and convolutional equations (18). The solution of the equation is estimated by an energy minimization. The regularity of the solution is obtained by penalizing high values of the derivative of the solution. Our variational formulation includes the correction of the weekly periodic bias, or weekend effect. The standard way to deal with a weekly periodic bias is to smooth the incidence curve by a 7-d sliding mean. This implicitly assumes that the periodic bias is additive. The present study supports the idea that this bias is better dealt with as multiplicative. In the variational framework, the periodic bias is therefore corrected by estimating multiplicative periodic correction factors. This is illustrated in Fig. 1 (Top), where the green oscillatory curve is transformed into the blue filtered curve by the same energy-minimization process that also computes R_t (Bottom, in red) and reconstructs the incidence curve up to present by evaluating the renewal equation using the computed R_t and the filtered incidence curve (Top, in red).

In this work we use two versions of the renewal equation formulation to compute R_t. It is, however, possible to formulate statistical models for R_t that do not take into account the serial interval and the renewal equation. For instance, in ref. 19, the author proposes to use the model: $\log (R_{t}) = \log (R_{t 1}) + Z_{t} i_{t 1},$ [5]where Z_t is an independent and identically distributed sequence of standard normal random variables, is the dispersion of the random walk, and is the coefficient of drift. The model was fit to the provided incidence data by applying Bayesian inference on the parameter and state space with assumed prior distributions.

1. Available Serial Interval Functions for SARS-CoV-2

As we saw, the serial interval in epidemiology refers to the time between successive observed cases in a chain of transmission. Du et al. (20) define it as the time duration between a primary case (infector) developing symptoms and secondary case (infectee) developing symptoms.

Du et al. (20) obtained the distribution of the serial interval by a careful inquiry on 468 pairs of patients, where one was the probable cause of the infection of the other. The serial distribution $_{s}$ obtained (20) has a significant number of cases on negative days, meaning that the infectee had developed symptoms up to 10 d before the infector $(f_{0} = 10)$ . In addition to this first serial interval, we test a serial interval obtained by Nishiura et al. (21) using 28 cases, which is approximated by a log-normal distribution, and a serial interval obtained by Ma et al. (15) using 689 cases. As proposed by the authors, this serial interval has been approximated by a shifted log-normal to take into account the cases in the negative days. In Fig. 2, we show the profile of the three serial intervals. There is good agreement of the serial intervals obtained by Du et al. (20) and Ma et al. (15).^{Note that $f_{0} = 4$ for the Ma et al. serial interval, $f_{0} = 0$ for Nishiura et al., and $f_{0} = 10$ for Du et al. The discrete support of is therefore contained in the interval $[f_{0}, f]$ .}

Serial intervals used in our experiments: the discrete one proposed by Du et al. (20) (solid bars in blue), the serial interval proposed by Ma et al. (15) (solid bars in orange) and its shifted log-normal approximation (in green), and finally a log-normal approximation of the serial interval proposed by Nishiura et al. (21) (in red).

» data-icon-position= » » data-hide-link-title= »0″>

Fig. 2.

Serial intervals used in our experiments: the discrete one proposed by Du et al. (20) (solid bars in blue), the serial interval proposed by Ma et al. (15) (solid bars in orange) and its shifted log-normal approximation (in green), and finally a log-normal approximation of the serial interval proposed by Nishiura et al. (21) (in red).

We are assuming that the serial interval profile does not change across time. As stated in ref. 22 (equation 10), it is nevertheless possible to use a more general form of the renewal equation [1], $i_{t} =_{s = f_{0}}^{f} R_{t s} i_{t s}_{t s, s} for t = 0, .., t_{c},$ [6]where $_{t s, s}$ is the forward serial interval, which takes into account that the onset of symptoms and transmission potential can jointly depend on the life history of a disease. The forward serial interval measures the time forward from symptom onset of an infector, obtained from a cohort of infectors that developed symptoms at the same time t s. This more general form of the renewal equation is used in ref. 22 to properly link the initial epidemic growth to the reproduction number R0. The variational approach proposed in the present work can be easily extended to compute R_t from i_t and $_{p, s}$ , provided an estimation of $_{p, s}$ is available.

Transmissibility can also depend on coronavirus lineage. For instance in ref. 23, the authors show that the SARS-CoV-2 variant B.1.1.7 has a 43 to 90% higher reproduction number than preexisting variants. It cannot be ruled out that these new variants have a different serial interval than preexisting ones.

2. Computing R_t by a Variational Method

As explained in the previous section, we aim at solving two versions of the renewal equation $i_{t} = F (i, R,, t) for t = 0, .., t_{c},$ [7]

where $F = F_{1} R_{t}_{s = f_{0}}^{f} i_{t s}_{s}; F = F_{2}_{s = f_{0}}^{f} i_{t s} R_{t s}_{s} .$ [8]

F₂ corresponds to the case reproductive number formulation (Eq. 1) and F₁ to the instantaneous reproduction number formulation (Eq. 2). Both formulations of the renewal equation are valid, and we can apply our methodology to both. As we shall see, this leads to anticipate by several days the estimate of R_t. Eq. 2 is also used in the classic Wallinga Teunis method (4), as shown in SI Appendix. This last method is widely used to compute R_t, retrospectively.

Correcting the Weekend Effect

We must first formulate a compensation for the weekend effect, which in most countries is stationary, strong, and the main cause of discrepancy between i_t and its expected value $F (i, R,, t)$ . To remove the weekend effect, we estimate periodic multiplicative factors defined by a vector $q = (q_{0}, q_{1}, q_{2}, q_{3}, q_{4}, q_{5}, q_{6})$ .

The variational framework we propose to estimate R_t is therefore given by the minimization of the energy $\begin{array}{l} E ({R_{t}}; q) =_{t = 0}^{t_{c}} {(\frac{q_{t % 7} i_{t} F ({q_{t % 7} i_{t}}, R,, t)}{media n_{(t, t]} (i)})}^{2} + \\ w_{t = 1}^{t_{c}} {(R_{t} R_{t 1})}^{2}, \end{array}$ [9]where $t % 7$ denotes the remainder of the Euclidean division of t by 7, t = 0 represents the beginning of the epidemic spread, and t_c represents the date of the last available incidence value.

The weekend effect has varied over the course of the pandemic. Hence, for the estimate of q, it is better to use a time interval $[t_{c} T + 1, T]$ , where T is fixed in the experiments to T = 56 (8 wk). This 2-mo time interval is long enough to avoid overfitting and small enough to ensure that the testing policy has not changed too much. The optimization of R_t is still performed through the whole time interval $[0, t_{c}]$ . The corrected value ${\hat{i}}_{t} = q_{t % 7} i_{t}$ amounts to a deterministic attenuation of the weekend effect on i_t. An obvious objection is that this correction might not be mean-preserving. To preserve the number of accumulated cases in the period of estimation, we therefore add the constraint $_{t = t_{c} T + 1}^{t_{c}} i_{t} =_{t = t_{c} T + 1}^{t_{c}} {\hat{i}}_{t} =_{t = t_{c} T + 1}^{t_{c}} q_{t % 7} i_{t},$ [10]

to the minimization problem Eq. 9.

In that way, the multiplication by the factor $q_{t % 7}$ produces a redistribution of the cases i_t during the period of estimation, but it does not change the global number of cases. In Eq. 9, $media n_{(t, t]} (i)$ is the median of i_t in the interval $(t, t]$ used to normalize the energy with respect to the size of i_t. In the experiments, we use = 21. The first term of E is a data-fidelity term, which forces the renewal Eq. 7 to be satisfied as closely as possible. The second term is a classic TikhonovArsenin regularizer of R_t. As in the case of EpiEstim, this method provides a real-time estimate of R_t up to the date, t_c, of the last available incidence value. Yet, in contrast with EpiEstim, this method takes advantage for $t < t_{c}$ of the knowledge of the incidence curve $i_{\overset{}{t}}$ for $\overset{}{t} [t, t_{c}]$ . This improves the posterior accuracy of the R_t estimate.

The Regularization Weight

The regularization weight $w 0$ is a dimensionless constant weight fixing the balance between the data-adjustment term and the regularization term.

Boundary Conditions of the Variational Model

Since t = 0 is the beginning of the epidemic spread where the virus runs free, one is led to use an estimate of $R_{0} = R 0$ based on the basic reproduction number R0. (In SI Appendix, we present a basic estimation of R0 from the initial exponential growth rate of the epidemic obtained as in ref. 24). Therefore, to solve Eq. 9, we add the boundary condition $R_{0} = R 0$ . The proposed inversion model provides an estimation of R_t up to the date, t_c, of the last available incidence value. Yet, if $f_{0} < 0$ , the functional Eq. 9 involves a few future values of R_t and i_t for $t_{c} < t t_{c} f_{0}$ . These values are unknown at present time t_c. We use a basic linear regression using the last seven values of i_t to extrapolate the values of i_t beyond t_c. We show in SI Appendix that the boundary conditions and the choice of the extrapolation procedure have a minor influence in the estimation of R_t in the last days when minimizing Eq. 9.

All of the experiments described here can be reproduced with the online interface available in ref. 25. This online interface allows one to assess the performance of the method applied to the total world population and to any country and any state in the United States, with the last available data. We detail our daily sources in SI Appendix.

An Empirical CI for R_t

In the absence of a statistical model on the distribution of R_t, no theoretical a priori CI for this estimate can be given. Nevertheless, a realistic CI is obtained by the following procedure: Let us denote by $R_{t}^{\overset{}{t}}$ the EpiInvert estimate at time t using the incidence curve up to the date $\overset{}{t} t$ . Therefore, $R_{t}^{t_{c}}$ represents the final EpiInvert estimate of R_t using the incidence data up to the last available date t_c. As shown below using the real and simulated data, for $F F_{2}, R_{t}^{t + k}$ stabilizes for $k 8$ . We can therefore consider $R_{t}^{t + 8}$ as an approximation of the reproduction number ground truth. We want to provide an empirical CI $I_{t} = [R_{t}^{t_{c}} r (t), R_{t}^{t_{c}} + r (t)]$ such that 95% of the times $R_{t}^{t + 8} I_{t}$ (for $t = t_{c}, t_{c} 1,, t_{c} 7$ ). To define r(t) we use, on the one hand, a measure of the variation of R_t in the last few days given by $(t) = \sqrt{\frac{_{n = 1}^{N} {(R_{t}^{t_{c}} R_{t}^{t_{c} n})}^{2}}{N}},$ [11]where $R_{t}^{t_{c} n}$ in $(t_{c} n, t_{c}]$ are obtained by linear extrapolation. In our experiments, we use N = 3. On the other hand, we use, supported by results obtained below for real and simulated data, that the error in the estimation of R_t grows linearly when t approaches t_c (the last time at which i_t was available). Combining $(t)$ with a linear function with respect to $(t_{c} t)$ , we obtain the following expression for r(t): $r (t) = (t) + {(B C (t_{c} t))}_{+},$ [12]
where B and C are parameters of the estimation and ${(x)}_{+} \max (0, x)$ . The advantage of this empirical approach is that the estimation of the CI is adapted to the variation of R_t in the last few days. To compute the empirical CIs, we use the incidence curve of 55 countries. For each country, and for any $t (t_{c} 308,$ $t_{c} 8]$ and $k 8$ , we refer to $R_{t}^{t + k}$ as the EpiInvert estimate of the reproduction number at time t using the incidence curve up to the time t + k. Using the values of $R_{t}^{t + k}$ for $t (t_{c} 308, t_{c} 8]$ and the 55 countries, we obtain, in the case of $F F_{2}$ , that using B = 0.24 and C = 0.03, 95% of times $R_{t}^{t + 8} I_{t}$ (for $t = t_{c}, t_{c} 1,, t_{c} 7$ ). In the same way, for the empirical 90% CI, we obtain B = 0.16 and C = 0.022. If we consider now $F F_{1}$ , then it is observed that $R_{t}^{t + k}$ stabilizes for k = 3. Using it as ground truth, the obtained empirical CIs for $R_{t}^{t + 3}$ are given by B = 0.04 and C = 0.016 (in the case of 95%) and B = 0.02 and C = 0.009 (in the case of 90%) These empirical intervals are displayed for each t in the online algorithm in ref. 25 and have the aspect of fattened curves above and below R_t.

Efficiency Measure of the Weekly Bias Correction

We estimate the correction of the weekly periodic bias by the efficiency measure $I = \sqrt{\frac{_{t = t_{c} T + 1}^{t_{c}} {({\hat{i}}_{t} F (\hat{i}, R,, t))}^{2}}{_{t = t_{c} T + 1}^{t_{c}} {(i_{t} F (i, R 1,, t))}^{2}}} .$ [13]

I represents the reduction factor of the RMSE between the incidence curve and its estimate using the renewal equation after correcting the weekend bias. ${\hat{i}}_{t} = i_{t} q_{t % 7}$ and R are the optimal values for the energy Eq. 9, and R1 denotes the R estimate without correction of the weekly bias. The value of I can be used to assess whether it is worth applying the correction of the weekly periodic bias to a given country in a given time interval.

Estimation of the Temporal Shift between EpiEstim and EpiInvert.

In what follows, we will denote by $R_{t}^{Epi}$ the EpiEstim estimation of the reproduction number by Cori et al. (5), detailed in SI Appendix. As we have argued above, we expect a significant temporal shift between the EpiInvert estimate of R_t and $R_{t}^{Epi}$ of the order of 9 d. This expectation is strongly confirmed by the experimental results and can be checked by applying the proposed method to any country using the online interface available in ref. 25. In summary, the time shift between both methods should be a half-week (3.5 d) for $F F_{1}$ and by Eq. 4 of about $+ 3.5 9$ for $F F_{2}$ . This will be verified experimentally by computing the shift $\overset{}{t}$ between $R_{t}^{Epi}$ and R_t yielding the best RMSE between both estimates: $\overset{}{t} = \underset{_{t [0, 12]}}{\arg \min} S (t) \sqrt{\frac{_{k = t_{c} T + 1}^{t_{c}} {(R_{k t} R_{k}^{Epi})}^{2}}{T}},$ [14]

where T = 56 (8 wk) and where we evaluate $R_{k t}$ for noninteger values of k t by linear interpolation.

Summary of the Algorithm Parameters and Options

Choice of the serial interval: The default options are the serial intervals obtained by Ma et al. (we use the shifted log-normal approximation), Nishiura et al., and Du et al. The users can also upload their own serial interval;

Choice of the renewal equation used, $F F_{1}$ or $F F_{2}$ ;

Correction of the weekly periodic bias (option by default)

The regularization weight w is always fixed to five, the value we obtain below, experimentally, by comparing with EpiEstim.

Summary of the Output Displayed in Ref. 25

First, we draw two charts. In the first one, we draw R_t and $R_{t}^{Epi}$ shifted back $\overset{}{t}$ days where $\overset{}{t}$ is defined in Eq. 14. R_t is surrounded by a shaded area that represents the above-defined empirical CIs. In the second chart, we draw the initial incidence curve i_t in green, the incidence curve after the correction of the weekly periodic bias ${\hat{i}}_{t} = i_{t} q_{t % 7}$ in blue, and the evaluation of the renewal equation given by $t F (\hat{i}, R,, t)$ in red. For each experiment, we also compute:

1) $R_{t_{c}}$ : last available value of the EpiInvert R_t estimate.

2) $R_{t_{c}}^{Epi}$ : last available value of the EpiEstim estimate $R_{t}^{Epi}$ .

3) $\overset{}{t}$ : optimal shift (in days) between R and R^Epi defined in Eq. 14.

4) $S (\overset{}{t})$ : RMSE between R and R^Epi shifted back $\overset{}{t}$ days (defined in Eq. 14).

5) $V (i)$ : variability of the original incidence curve, i_t, given by: $V (i) \frac{| i^{} |_{L^{1} [t_{c} T, t_{c}]}}{| i |_{L^{1} [t_{c} T, t_{c}]}} \frac{_{t = t_{c} T + 1}^{t_{c}} | i_{t} i_{t 1} |}{_{t = t_{c} T + 1}^{t_{c}} i_{t}} .$ [15]

6) $V (\hat{i})$ : variability of the filtered incidence ${\hat{i}}_{t}$ after the correction of the weekly periodic bias.

7) I: reduction factor of the RMSE between the incidence curve and its estimate using the renewal equation after the correction of the weekly periodic bias (defined in Eq. 13).

8) $q = (q_{0}, .., q_{6})$ : the correction coefficients of the weekly periodic bias (q₆ corresponds to the t_c, the last time at which i_t was available).

3. Results on Incidence Curves from 55 Countries

To estimate a reference value for the regularization parameter w, we used the incidence data up to July 17, 2021, for the 55 countries showing the larger number of cases. For each country, we performed 30 experiments. Starting with the incidence data up to July 17, in each experiment, we removed the last 10 d from the incidence data used in the previous experiment. In that way, we got a large variety of real epidemic scenarios. We highlight the challenges of interpreting real case report time series, which frequently have an additional layer of convolution/delay relative to the true case incidence time series due, for instance, to the delay between the test and their registration in the official records.

We optimized the RMSE $S (\overset{}{t})$ between R_t and $R_{t}^{Epi}$ shifted back $\overset{}{t}$ days (defined in Eq. 14). This optimization was performed with respect to w and $\overset{}{t}$ . The goal was to fix w, the only parameter of the method, so that the result of EpiInvert is as close as possible to EpiEstim in the days where both methods predict R_t. The second goal of this optimization was to estimate the effective time shift $\overset{}{t}$ between both methods.

In Fig. 3, we show the box plot of the distribution of w for F₁ and F₂ when w was optimized independently for each experiment to minimize the average error over 56 d between the EpiEstim and the EpiInvert methods. The median of the distribution of w is five for F₁ and F₂, which indicated that a common value of w could be fixed for all countries. Here, and in all figures to follow, each dot represents the average of all experimental results associated to a country.

Distribution of w for F₁ and F₂ when the regularization weight w and the delay t˜ are optimized independently for each country to minimize the average error S(t˜) between the EpiEstim and the EpiInvert methods on a time lapse of 56 d. France, blue; Japan, green; Peru, cyan; South Africa, magenta; United States, red.

» data-icon-position= » » data-hide-link-title= »0″>

Fig. 3.

Distribution of w for F₁ and F₂ when the regularization weight w and the delay $\overset{}{t}$ are optimized independently for each country to minimize the average error $S (\overset{}{t})$ between the EpiEstim and the EpiInvert methods on a time lapse of 56 d. France, blue; Japan, green; Peru, cyan; South Africa, magenta; United States, red.

In Fig. 4, we show, for the versions F₁ and F₂ of the renewal equation, the average error $S (\overset{}{t})$ over 56 consecutive days of the error between the EpiEstim and the EpiInvert estimates of R_t for each country. The median of the overall average error is 0.025 for F₁ and 0.034 for F₂.

Average error S(t˜) between the EpiEstim and the EpiInvert estimates of R_t for each country. In Left, w is fixed, and in Tight, it is the optimal weight per country. France, blue; Japan, green; Peru, cyan; South Africa, magenta; United States, red.

» data-icon-position= » » data-hide-link-title= »0″>

Fig. 4.

Average error $S (\overset{}{t})$ between the EpiEstim and the EpiInvert estimates of R_t for each country. In Left, w is fixed, and in Tight, it is the optimal weight per country. France, blue; Japan, green; Peru, cyan; South Africa, magenta; United States, red.

Computing the daily reproduction number of COVID-19 by inverting the renewal equation using a variational technique

Significance

Abstract

1. Available Serial Interval Functions for SARS-CoV-2

2. Computing R_t by a Variational Method

Correcting the Weekend Effect

The Regularization Weight

Boundary Conditions of the Variational Model

An Empirical CI for R_t

Efficiency Measure of the Weekly Bias Correction

Estimation of the Temporal Shift between EpiEstim and EpiInvert.

Summary of the Algorithm Parameters and Options

Summary of the Output Displayed in Ref. 25

3. Results on Incidence Curves from 55 Countries

4. Validation on Epidemic Simulations

5. Conclusion

Data Availability

Footnotes

www.actusduweb.com

Suivez Actusduweb sur Google News

Computing the daily reproduction number of COVID-19 by inverting the renewal equation using a variational technique

Significance

Abstract

1. Available Serial Interval Functions for SARS-CoV-2

2. Computing Rt by a Variational Method

Correcting the Weekend Effect

The Regularization Weight

Boundary Conditions of the Variational Model

An Empirical CI for Rt

Efficiency Measure of the Weekly Bias Correction

Estimation of the Temporal Shift between EpiEstim and EpiInvert.

Summary of the Algorithm Parameters and Options

Summary of the Output Displayed in Ref. 25

3. Results on Incidence Curves from 55 Countries

4. Validation on Epidemic Simulations

5. Conclusion

Data Availability

Footnotes

www.actusduweb.com

Suivez Actusduweb sur Google News

2. Computing R_t by a Variational Method

An Empirical CI for R_t