---
title: "Getting started with epiviz"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 2
vignette: >
  %\VignetteIndexEntry{Getting started with epiviz}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

## Introduction

The `epiviz` package provides epidemiological visualization functions for creating both static (ggplot2) and interactive (plotly) charts commonly used in public health surveillance and outbreak investigation. This guide introduces you to the package using the built-in `lab_data` dataset.

## Prerequisites

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  echo = TRUE,
  eval = TRUE,
  fig.width = 8,
  fig.height = 6,
  warning = FALSE,
  message = FALSE
)
```

```{r load-libraries, eval=TRUE, echo=TRUE}
library(epiviz)
library(dplyr)
library(lubridate)
```

## The lab_data dataset

`lab_data` is a synthetic laboratory dataset included with epiviz for demonstration purposes. It contains simulated laboratory detection data with typical epidemiological variables:

```{r explore-data, eval=TRUE, echo=TRUE}
# Explore the structure of lab_data
glimpse(epiviz::lab_data)
```

The dataset includes:

- **Patient demographics**: `date_of_birth`, `sex`
- **Laboratory information**: `organism_species_name`, `specimen_date`, `lab_code`
- **Geographic data**: `local_authority_name`, `local_authority_code`, `region`

## Example 1: Regional distribution of detections

When analyzing laboratory surveillance data, we often want to understand the geographic distribution of detections. Here we'll create a simple column chart showing detections by region for a specific time period.

### Prepare the data

```{r prepare-regional-data, eval=TRUE, echo=TRUE}
# Filter to a specific time period and aggregate by region
regional_detections <- epiviz::lab_data %>%
  filter(
    specimen_date >= as.Date("2023-01-01"),
    specimen_date <= as.Date("2023-01-31")
  ) %>%
  count(region, name = "detections") %>%
  arrange(desc(detections)) %>%
  slice(1:6) %>%  # Keep top 6 regions for readability
  mutate(
    # Handle long region names for better display
    region = ifelse(region == "Yorkshire and Humber", "Yorkshire and\nHumber", region)
  )
```

### Create the visualization

```{r regional-chart, fig.width=8, fig.height=6, eval=TRUE, echo=TRUE}
col_chart(
  dynamic = FALSE,  # Create static ggplot chart
  params = list(
    df = regional_detections,
    x = "region",                # Variable for x-axis
    y = "detections",            # Variable for y-axis
    fill_colours = "#007C91",    # Single color for all bars
    chart_title = "Laboratory detections by region (January 2023)",
    x_axis_title = "Region",
    y_axis_title = "Number of detections",
    x_axis_label_angle = -45,    # Rotate labels for readability
    show_gridlines = FALSE       # Remove grid lines for cleaner look
  )
)
```

**Interpretation**: This chart shows the regional distribution of laboratory detections in January 2023, with London having the highest number of detections.

## Example 2: Temporal trends in detections

Time series analysis is fundamental in epidemiological surveillance. Here we'll create a line chart showing monthly trends in detections over a two-year period.

### Prepare the data

```{r prepare-monthly-data, eval=TRUE, echo=TRUE}
# Aggregate detections by month
monthly_detections <- epiviz::lab_data %>%
  filter(
    specimen_date >= as.Date("2022-01-01"),
    specimen_date <= as.Date("2023-12-31")
  ) %>%
  mutate(
    # Truncate each date to the first day of its month
    specimen_month = floor_date(specimen_date, "month")
  ) %>%
  count(specimen_month, name = "detections")
```

### Create the visualization

```{r monthly-line-chart, fig.width=8, fig.height=6, eval=TRUE, echo=TRUE}
line_chart(
  dynamic = FALSE,  # Create static ggplot chart
  params = list(
    dfr = monthly_detections,    # Note: use 'dfr' parameter for line_chart
    x = "specimen_month",        # Date variable for x-axis
    y = "detections",            # Count variable for y-axis
    line_colour = c("#007C91"),  # Color for the line (vector format)
    line_type = c("solid")       # Line type
  )
)
```

**Interpretation**: This line chart shows how monthly detection counts change across the two-year period, which makes any seasonal peaks and troughs easy to spot.

## Tips for getting started

1. **Start with static charts**: Use `dynamic = FALSE` initially to create ggplot2 charts, then switch to `dynamic = TRUE` for interactive plotly charts when you need zooming, hovering, or filtering capabilities.
2. **Filter your data**: The `lab_data` dataset is quite large. Always filter to specific time periods, regions, or organisms to create readable visualizations.
3. **Check your data structure**: Use `glimpse()` or `str()` to understand your data before passing it to visualization functions.
4. **Parameter naming**: Most functions take a `params` list that holds the chart parameters. This keeps function calls clean and makes parameters easy to reuse.
5. **Color consistency**: Use consistent color schemes across your visualizations. The package provides sensible defaults, but you can customize colors using the `*_colours` parameters.

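
Tips 1 and 4 can be combined: define the shared settings once in a list, then derive a static and an interactive variant from it. The sketch below is illustrative rather than a prescribed workflow. `toy_counts` and its values are made up, `base_params`, `static_params`, and `dynamic_params` are hypothetical names, and it assumes `col_chart()` accepts the same `params` entries used in Example 1 when `dynamic = TRUE`. `modifyList()` comes from base R's `utils` package.

```r
# Toy data standing in for an aggregated summary (values are invented)
toy_counts <- data.frame(
  region = c("North", "South", "East"),
  detections = c(12L, 30L, 21L)
)

# Shared settings, defined once
base_params <- list(
  df = toy_counts,
  x = "region",
  y = "detections",
  fill_colours = "#007C91",
  x_axis_title = "Region",
  y_axis_title = "Number of detections"
)

# modifyList() returns a copy of the list with the given entries added or replaced
static_params <- modifyList(
  base_params,
  list(chart_title = "Detections by region (static)")
)
dynamic_params <- modifyList(
  base_params,
  list(chart_title = "Detections by region (interactive)")
)

# Only draw the charts when epiviz is installed
if (requireNamespace("epiviz", quietly = TRUE)) {
  static_chart <- epiviz::col_chart(dynamic = FALSE, params = static_params)
  dynamic_chart <- epiviz::col_chart(dynamic = TRUE, params = dynamic_params)
}
```

Keeping the shared entries in one list means a change to colours or axis titles is made in a single place, and `base_params` itself is left unmodified.
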
## Next steps

- Explore the function-specific vignettes for detailed examples of each visualization type
- Try setting `dynamic = TRUE` in the examples above to see interactive versions
- Experiment with different time periods and filters to explore the `lab_data` dataset
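
As one possible starting point for experimenting with other time periods and filters, the sketch below aggregates by week instead of by month and restricts the data to a single region. It runs on a small invented data frame (`toy_lab`, a hypothetical name) that borrows the `specimen_date` and `region` column names from `lab_data`; the dates and regions are made up. `week_start = 1` makes weeks begin on Monday.

```r
library(dplyr)
library(lubridate)

# Invented records using two of the lab_data column names
toy_lab <- data.frame(
  specimen_date = as.Date(c(
    "2023-01-02", "2023-01-04", "2023-01-08",
    "2023-01-09", "2023-01-15", "2023-01-10"
  )),
  region = c("London", "London", "London", "London", "London", "North West")
)

weekly_london <- toy_lab %>%
  filter(region == "London") %>%
  mutate(
    # Truncate each date to the Monday that starts its week
    specimen_week = floor_date(specimen_date, "week", week_start = 1)
  ) %>%
  count(specimen_week, name = "detections")

# Two rows: week of 2023-01-02 (3 detections) and week of 2023-01-09 (2 detections)
weekly_london
```

A summary shaped like this has one date column and one count column, so it can be passed to `line_chart()` in the same way as `monthly_detections` in Example 2.
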