--- title: "Seasonal Epidemic Onset" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Seasonal Epidemic Onset} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r setup, warning=FALSE, message=FALSE} library(aedseo) ``` ## Methodology The methodology used to detect the seasonal onset of epidemics, can be divided into two essential criteria: 1. The local estimate of the exponential growth rate, $r$, is significantly greater than zero. 2. The sum of cases (SoC) over the past $k$ units of time exceeds a disease-specific threshold. Here, $k$ denotes the window size employed to obtain the local estimate of the exponential growth rate and SoC. When both of these criteria are met, an alarm is triggered and the onset of the seasonal epidemic is detected. The model is implemented in the `seasonal_onset()` function of the `aedseo` package. Criteria one is fulfilled if the `growth_warning` in the output is `TRUE`. Criteria two is fulfilled if the `sum_of_cases_warning` in the output is `TRUE`. ### Exponential growth rate The exponential growth rate, denoted as $r$, represents the per capita change in the number of new cases per unit of time. Given that incidence data are integer-valued, the proposed method relies on generalized linear models (GLM). For count data, the Poisson distribution is a suitable choice as a model. Hence, the count observations denoted as $Y$ are assumed to follow a Poisson distribution \begin{equation} Y \sim \mathrm{Pois}(\lambda) \end{equation} Here, the link function, $\log()$, connects the linear predictor to the expected value of the data point, expressed as $\log(\lambda)=\mu$. Given a single continuous covariate $t$, the mean $\mu$ can be expressed as \begin{equation} \mu = \alpha + r t \end{equation} This is equivalent to a multiplicative model for $\lambda$, i.e. \begin{equation} \lambda = \exp(\alpha + r t) = \exp(\alpha) \exp(r t) \end{equation} Intuitively, negative values of $r$ result in a decline in the number of observed cases, while $r=0$ represents stability, and positive values of $r$ indicate an increase. It is important to note that the Poisson distribution assumes that the mean and variance are equal. In reality, real data often deviate from this assumption, with the variance ($v$) being significantly larger than the mean. This biological phenomenon, known as overdispersion, can be addressed within a model in various ways. One approach is to employ quasi-Poisson regression, which assumes $v=\sigma\lambda$, or to use negative binomial regression (not implemented yet), which assumes $v=\lambda+\lambda^2/\theta$, where both $\sigma$ and $\theta$ are overdispersion parameters. ## Applying the seasonal_onset algorithm First we generate some data as an `tsd` object, with the `generate_seasonal_data()` function. ```{r} # Construct an 'tsd' object with time series data set.seed(222) tsd_data <- generate_seasonal_data( years = 3, start_date = as.Date("2020-10-18"), trend_rate = 1.002, noise_overdispersion = 3, relative_epidemic_concentration = 2, time_interval = "week" ) ``` Next, the `tsd` object is passed to the `seasonal_onset()` function. Here, a window size of `k=5` is specified, meaning that a total of 5 weeks is used in the local estimate of the exponential growth rate. `na_fraction_allowed = 0.4` defines how large a fraction of observations in the `k` window that are allowed to be `NA`, here `0.4*5 = 2` observations. Additionally, a 95\% confidence interval is specified. Finally, the exponential growth rate is estimated using quasi-Poisson regression to account for overdispersion in the data. A disease-specific threshold can additionally be passed to the function, but is not necessary if only the growth rate estimations are wanted. `season_start` and `season_end` can be used to specify the season to stratify the observations by. This algorithm runs across seasons, such that the first observation in a new season will use the last `k-1` observations from the previous season. The `seasonal_onset()` function provides a `tsd_onset` object with a comprehensive seasonal onset analysis. ```{r} seasonal_onset_results <- seasonal_onset( tsd = tsd_data, k = 5, level = 0.95, disease_threshold = 20, family = "quasipoisson", season_start = 21, season_end = 20, only_current_season = FALSE ) ``` ## Visualising Growth Rates In the first figure, observations over time are shown with a legend for the seasonal onset alarm. In the second figure, the local estimates of the growth rates are presented along with their corresponding 95% confidence interval with a legend for the growth warning. This visualisation can be generated by utilizing the `plot()` S3 method with objects of the `tsd_onset` class. ```{r, dpi=300} plot(seasonal_onset_results) ``` ## Predicting Growth Rates The `predict()` S3 method for `tsd_onset` objects allows you to make predictions for future time steps based on the estimated growth rates. Following is an example of predict observations for the next 5 weekly time steps. ```{r} prediction <- predict(seasonal_onset_results, n_step = 5) ``` ```{r, echo = FALSE} prediction <- prediction |> dplyr::select(-t) |> dplyr::rename(observation = estimate) # Extract two seasons for better visualisation: seasonal_onset_short <- seasonal_onset_results |> dplyr::filter(season %in% c("2023/2024", "2024/2025")) # Plot observations and predictions autoplot(seasonal_onset_short)$observed + # Add the prediction ribbon ggplot2::geom_ribbon( data = prediction, ggplot2::aes( x = reference_time, ymin = lower, ymax = upper, fill = "Observations with \n Confidence intervals" ), alpha = 0.2, inherit.aes = FALSE ) + ggplot2::geom_line( data = prediction, ggplot2::aes( x = reference_time, y = observation, color = "Observations with \n Confidence intervals" ), linewidth = 1, inherit.aes = FALSE ) + ggplot2::scale_color_manual( name = "Predictions", values = c("Observations with \n Confidence intervals" = "red") ) + ggplot2::scale_fill_manual( name = "Predictions", values = c("Observations with \n Confidence intervals" = "red") ) + ggplot2::coord_cartesian( xlim = c( min(seasonal_onset_short$reference_time), max(prediction$reference_time) ) ) ``` In the example above, we use the predict method to predict growth rates for the next 5 time steps, according to the `time_interval = "week"` in the `tsd_onset` object. The `n_step` argument specifies the number of steps into the future you want to forecast. The resulting `tsd_predict` object contains observation estimates, lower bounds, and upper bounds for each time step. ## Summarising seasonal_onset results The `summary()` S3 method for `tsd_onset` objects provides a concise summary of your automated early detection of `seasonal_onset` analysis. You can use it to retrieve important information about your analysis, including the first seasonal onset alarm (reference time point), SoC at reference time point (here over a 5 week window), growth rate estimates at reference time point, total number of growth warnings in the series and latest warnings (growth and SoC). It helps you quickly assess the key findings of your analysis. ```{r summary} summary(seasonal_onset_results) ```