---
title: "Getting Started with nowcastr"
knitr:
  opts_chunk:
    collapse: true
    comment: "#>"
# description: |
#   An overview of the nowcastr package.
vignette: >
  %\VignetteIndexEntry{Getting Started with nowcastr}
  %\VignetteEngine{quarto::html}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
library(dplyr) # Ensure pipe operator is available
```

Nowcasting is the process of estimating the current state of a phenomenon when the data are incomplete due to reporting delays.

The **nowcastr** package implements the chain-ladder method for nowcasting, supporting both non-cumulative delay-based estimation and model-based completeness fitting (*e.g.*, logistic or Gompertz curves).

This vignette provides a quick start guide to using the package with demo data.

## Setup

The package is available on GitHub. Install it with:

```{r}
#| eval: false
pak::pak("whocov/nowcastr")
```

```{r}
library(nowcastr)
```

## Data Structure

Your dataset must contain at least three columns, plus any optional grouping columns:

- **occurrence date**: when the event happened
- **reporting date**: when the event was reported
- **value**: the observed count/value
- \<*groups*\>: zero, one, or multiple grouping columns, *e.g.* `group_cols = c("group") # or c("region", "disease")`

The package includes a demo dataset, `nowcast_demo`, that follows this structure:

```{r}
print(nowcast_demo)
```

The demo data also includes a `group` column for demonstrating grouped processing, though you can have multiple grouping columns.

```{r}
#| echo: false
#| eval: false
# generate_test_data(
#   n_reportdates = 5,
#   n_delays = 5
# )
```

## Workflow

A typical nowcasting workflow with **nowcastr** involves the following steps.

### 1. Visualize Input Data

Before nowcasting, inspect the reporting pattern of your data:

```{r, fig.asp=5.5/10}
nowcast_demo %>%
  plot_nc_input(
    option = "triangle",
    col_date_occurrence = date_occurrence,
    col_date_reporting = date_report,
    col_value = value,
    group_cols = "group"
  )
```

The "millipede" plot provides an alternative view of delays:

```{r}
nowcast_demo %>%
  plot_nc_input(
    option = "millipede",
    col_date_occurrence = date_occurrence,
    col_date_reporting = date_report,
    col_value = value,
    group_cols = "group"
  )
```

### 2. Prepare Data (Optional)

You may want to fill missing values with the last known reported values to ensure consistent time units:

```{r, fig.asp=5.5/10}
data_filled <- nowcast_demo %>%
  fill_future_reported_values(
    col_date_occurrence = date_occurrence,
    col_date_reporting = date_report,
    col_value = value,
    group_cols = "group",
    max_delay = "auto"
  )

data_filled %>%
  plot_nc_input(
    option = "triangle",
    col_date_occurrence = date_occurrence,
    col_date_reporting = date_report,
    col_value = value,
    group_cols = "group"
  )
```

This step is optional; `nowcast_cl()` can handle unfilled data.

### 3. Run Nowcast

Perform the nowcasting using the chain-ladder method:

```{r}
nc_obj <- data_filled %>%
  nowcast_cl(
    col_date_occurrence = date_occurrence,
    col_date_reporting = date_report,
    col_value = value,
    group_cols = "group",
    time_units = "weeks",
    do_model_fitting = TRUE
  )
```

The `nowcast_cl()` function returns a `nowcast_results` object containing predictions, delay distributions, completeness estimates, and parameters.

```{r}
S7::prop_names(nc_obj)
```

### 4. Explore Results

Access the components of the result object:

```{r slots}
nc_obj@results       # Final nowcasted values
nc_obj@delays        # Delay distribution
nc_obj@completeness  # Data with completeness estimates
str(nc_obj@params)   # Parameters used
```

Plot the results:

```{r plots}
#| warning: false
plot(nc_obj, which = "delays")   # Delay distribution
plot(nc_obj, which = "results")  # Nowcast time series
```

Open a Shiny app to explore results group by group:

```{r}
#| eval: false
explore_nowcast(nc_obj)
```

## How It Works

The chain-ladder method estimates "completeness" for each delay bucket:

- **Delay** = reporting date - occurrence date
- **Completeness** = observed value / last reported value (approximation of true value)
- **Average completeness** per delay bucket (across occurrence dates)
- **Nowcast** = observed value / average completeness

Recent occurrence dates have shorter delays and therefore lower completeness. The method scales these observations up, dividing each by the average completeness of its delay bucket, to estimate the true count.

### Grouped Processing

You can nowcast multiple groups (e.g., regions, diseases) in a single call by specifying multiple grouping columns:

```{r grouped}
#| eval: false
nowcast_cl(
  # ...
  group_cols = c("region", "disease")
)
```

## Other Utility Functions

### Calculate Retro Scores of input data

`retro_score` = number of actual value changes / max possible value changes, so it ranges from 0 to 1.

```{r calculate_retro_score}
# Calculate retro-scores (= number of actual value changes / max possible value changes)
nowcast_demo %>%
  calculate_retro_score(
    col_date_occurrence = date_occurrence,
    col_date_reporting = date_report,
    col_value = value,
    group_cols = c("group")
  )
```

### Remove duplicated data

`rm_repeated_values()` is the opposite of `fill_future_reported_values()`. It can be useful for reducing data size without losing information.

```{r rm_repeated_values}
# Remove duplicate reported values (same value and higher reporting date)
nowcast_demo %>%
  rm_repeated_values(
    col_date_occurrence = date_occurrence,
    col_date_reporting = date_report,
    col_value = value,
    group_cols = c("group")
  )
```
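
## Worked Example: Completeness by Hand

The arithmetic listed under "How It Works" can be reproduced by hand on a toy reporting triangle. The sketch below uses base R only and does not call any **nowcastr** function. The data frame `toy`, its numbers, and the rule that only occurrence dates observed through the maximum delay count as complete are assumptions made for illustration; they are not a description of how `nowcast_cl()` is implemented.

```{r}
# Toy reporting triangle (made-up numbers): one row per
# (occurrence week, reporting week) with the value known at that report.
toy <- data.frame(
  date_occurrence = as.Date(c(
    "2024-01-01", "2024-01-01", "2024-01-01",
    "2024-01-08", "2024-01-08", "2024-01-08",
    "2024-01-15", "2024-01-15",
    "2024-01-22"
  )),
  date_report = as.Date(c(
    "2024-01-01", "2024-01-08", "2024-01-15",
    "2024-01-08", "2024-01-15", "2024-01-22",
    "2024-01-15", "2024-01-22",
    "2024-01-22"
  )),
  value = c(60, 80, 100, 20, 35, 50, 60, 90, 45)
)

# Delay = reporting date - occurrence date, in whole weeks
toy$delay <- as.integer(round(as.numeric(
  difftime(toy$date_report, toy$date_occurrence, units = "weeks")
)))

# Latest report available for each occurrence date
latest <- do.call(rbind, lapply(split(toy, toy$date_occurrence), function(d) {
  i <- which.max(d$delay)
  data.frame(
    date_occurrence = d$date_occurrence[i],
    last_delay = d$delay[i],
    last_value = d$value[i]
  )
}))

# Assumption: an occurrence date is "complete" once it has been
# observed at the maximum delay present in the data.
max_delay <- max(toy$delay)
tri <- merge(toy, latest, by = "date_occurrence")
complete <- tri[tri$last_delay == max_delay, ]

# Completeness = observed value / last reported value
complete$completeness <- complete$value / complete$last_value

# Average completeness per delay bucket (across occurrence dates)
avg <- aggregate(completeness ~ delay, data = complete, FUN = mean)

# Nowcast = latest observed value / average completeness at its delay
nowcast <- merge(latest, avg, by.x = "last_delay", by.y = "delay")
nowcast$nowcast <- nowcast$last_value / nowcast$completeness
nowcast <- nowcast[order(nowcast$date_occurrence), ]

avg
nowcast[, c("date_occurrence", "last_delay", "last_value", "nowcast")]
```

In this toy triangle the two oldest weeks are already complete, so their nowcast equals the last reported value. The two most recent weeks, seen only at delays of one and zero weeks, are divided by the average completeness of those delay buckets and are therefore revised upward.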