---
title: "Introduction to GHRexplore"
output:
  rmarkdown::html_vignette:
    toc: true
vignette: >
  %\VignetteIndexEntry{Introduction to GHRexplore}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.align = "center",
  warning = FALSE,
  message = FALSE
)
```

# Overview

**GHRexplore** is an R package for exploratory analysis of temporal and spatiotemporal health data including case counts, incidence rates, and covariates. It provides commonly used visualizations and supports data transformations such as temporal and spatial aggregations. The package also offers extensive customization options for the resulting figures. The output of all functions is a ggplot2 object that can be further modified if needed.

```{r setup, results='hide', message=FALSE}
# Load GHRexplore
library(GHRexplore)

# Other necessary libraries for the vignette
library(dplyr)
library(sf)
```

## Data requirements

The data must be a long-format data frame containing a regularly spaced time series (e.g., daily, weekly, monthly) of a single or several spatial units. In this data frame, the time and optional space identifiers must be included as columns.

As an example, we can have a look at the `dengue_MS` dataset included in the package, which includes data on dengue cases from the *Mato Grosso do Sul* state in Brazil as well as a set of relevant covariates.

```{r load data}
data("dengue_MS")
glimpse(dengue_MS)
```

In this data frame, the *date* column contains the temporal identifier (in this case, we have monthly data where the variable *date* represents the first day of the month) and the spatial unit identifier is *micro_code*.

Since we have data for several spatial units, it is useful to have the polygon geometries of the areas (e.g., a geopackage, shapefile o geojson file), loaded as an `sf` object, in order to plot cartographic representations. The `sf` object must also contain the area unit identifier to be able to link the geometries with the data frame.

For the `dengue_MS`, the geometries are already included in the package in the `map_MS` object:

```{r load boundaries}
data("map_MS")
glimpse(map_MS)
```

In `map_MS`, the *code* variable corresponds to the *micro_code* area identifier in the `dengue_MS` object.

> 💡 **Tip**: If you don't have geometries for your data, a good place to look for them is [GADM](https://gadm.org/), [natural earth](https://www.naturalearthdata.com/), or [GeoBoundaries](https://www.geoboundaries.org/). 

## Covariates, case counts, and incidence rates

In all **GHRexplore** functions, the column name of the variable to be plotted needs to be specified with the `var` argument. Plotting functions behave differently depending on the type of data being plotted as defined by the `type` argument: covariates (`type='cov'`), case counts (`type='counts'`), and incidence rates (`type='inc'`). Several examples for each type are provided in the following sections.

To plot incidence rates, the column with the disease counts must be used as `var` and the population must be also supplied as the `pop` argument. By default, rates are computed per 100,000 persons, but this number can be changed using the `pt` argument.

## Spatial and temporal aggregations

Most **GHRexplore** functions support temporal and spatial aggregations from finer to coarser resolution. For example, you can aggregate daily to weekly data, or aggregate small regions to larger regions. 

Temporal and spatial aggregations are performed using the arguments `aggregate_time` and `aggregate_space`, respectively. For covariates (`type='cov'`), the aggregation function can be specified using `aggregate_space_fun` and `aggregate_time_fun` (options include mean (default), median, and sum). For case counts and incidence, the aggregation function is fixed to be the sum of cases.

## Color palettes

In **GHRexplore**, color is controlled by the `palette` argument. We included a few in-house palettes that are used as defaults depending on the plot type:

```{r IDE palettes, fig.width=6}
GHR_palettes()
```

In addition to these, all color palettes included in the packages `RColorBrewer` and `colorspace` can also be used. All available options can be checked by running `RColorBrewer::display.brewer.all()` and `colorspace::hcl_palettes(plot=TRUE)`. 

As a few examples, the 'Blues' palette can be useful when plotting precipitation-related variables, 'Greens' when plotting vegetation-related variables, or 'Blue-Red' when displaying temperature. When a single color is needed, a color in `colors()` or a hex code can also be specified. In addition, the user can provide custom palettes using a vector of hex codes to the palette argument.

> 💡 **Tip**: Palettes can be reversed by preceding them with a minus sign, e.g. '-IDE1'. 

# Plot types

## Time series

`plot_timeseries` produces time series plots of covariates, case counts, or incidence rates. It allows for spatial and temporal aggregations, plotting single or multiple time series of different areas simultaneously (using facets or colors), y-axis transformations, and axis and title labeling.

We start by plotting the time series data for minimum temperature (note `type="cov"`) for all the 11 areas in `dengue_MS`:

```{r ts1, fig.width=7}
plot_timeseries(dengue_MS, var = "tmin", type = "cov", 
                var_label = "Minimum temp.",
                time = "date", area = "micro_code")
```

Since 11 areas are quite a lot for a single graph, we can use `facet = TRUE` to display them in different panels:

```{r ts2, fig.width=7, fig.height=5}
plot_timeseries(dengue_MS, var = "tmin", type = "cov", var_label = "Minimum temp.",
                time = "date", area = "micro_code", facet = TRUE)
```

Another possible strategy is to keep the 11 areas, but highlight the one we are interested in using the `highlight` argument by setting it to the area identifier we want to highlight:

```{r ts3, fig.width=7}
plot_timeseries(dengue_MS, var = "tmin", type = "cov", var_label = "Minimum temp.",
                time = "date", area = "micro_code", highlight = "50001",
                title = "Micro code 50001")
```

As an alternative option, we could also aggregate to a coarser spatial unit by using an aggregation function. Here, we aggregate to meso areas using the `aggregate_space` argument. Since `type="cov"`, it will aggregate temperatures using the mean as a default.

```{r ts4, fig.width=7}
plot_timeseries(dengue_MS, var = "tmin", type = "cov", var_label = "Minimum temp.",
                time = "date", area = "micro_code", aggregate_space = "meso_code")
```

After temperature, we move on to plotting dengue counts (note `type="counts"`) aggregating to meso areas using `aggregate_space`, which applies a sum function by default when `type="counts"`. Given the right-skewed distribution of the counts, we scale the y axis using a `log10(x+1)` transformation:

```{r ts5, fig.width=7}
plot_timeseries(dengue_MS, var = "dengue_cases", type = "counts",
                time = "date", area = "micro_code", aggregate_space = "meso_code",
                transform = "log10p1")
```

In a similar fashion, we can also plot incidence rates (note `type="inc"` and the defined `pop` argument) aggregating to meso areas using `aggregate_space`, which for `type="inc"` it first sums the cases per area, and then computes the incidence rates. We set the scale of the incidence rate (`pt` argument) to 1,000 persons, and scale the y axis using a `log10(x+1)` transformation:

```{r ts6, fig.width=7}
plot_timeseries(dengue_MS, var = "dengue_cases", type = "inc", pop = "population",
                time = "date", area = "micro_code", aggregate_space = "meso_code",
                pt = 1000, transform = "log10p1")
```

## Heatmap

`plot_heatmap` plots time series heatmaps of covariates, case counts, or incidence rates. Years are displayed on the y axis and weeks or months on the x axis. The function allows for spatial and temporal aggregations, plotting single or multiple time series for different areas simultaneously (using facets), color transformations, and axis and title labeling.

In this first example, we plot the variable (note `type="cov"`) PDSI (Palmer Drought Severity Index) aggregated at the meso code level. We use a suitable palette for drought and we center it at zero:

```{r heatmap1, fig.width=6, fig.height=5}
plot_heatmap(dengue_MS, var = "pdsi", type = "cov", var_label = "PDSI",
             time = "date", area = "micro_code",
             aggregate_space = "meso_code", palette = "-Vik", centering = 0) 
```

In this second example, we plot dengue incidence rates (note `type="inc"` and `pop` arguments) at the meso code level and apply a transformation to the color gradient to have a better contrast.

```{r heatmap2, fig.width=6, fig.height=5}
plot_heatmap(dengue_MS, var = "dengue_cases", type = "inc", pop = "population",
             time = "date", area = "micro_code", aggregate_space = "meso_code",
             title= "Monthly Incidence", transform = "log10p1") 
```

## Seasonality

`plot_seasonality` produces yearly time series of covariates, case counts, or incidence rates to explore seasonality patterns. Months/weeks are shown on the x axis, the magnitude of the variable on the y axis, and years are represented as colors. The function allows for spatial and temporal aggregations, plotting single or multiple time series of different areas simultaneously (using facets), axis transformations, and axis and title labeling.

In this example, we explore the seasonal patterns of minimum temperature aggregated at the meso code level.

```{r seasonality, fig.width=7, fig.height=4}
plot_seasonality(dengue_MS, var = "tmin", var_label = "Minimum temperature",
                 type = "cov", time = "date", area = "micro_code",   
                 aggregate_space = "meso_code") 
```

## Map

`plot_map` plots choropleth maps of covariates, case counts, or incidence rates. In this function, we also need to supply the sf object containing the polygon geometries in the `map` argument. We can use the `map_area` argument to specify the column in the `sf` object that contains the area identifiers to be merged to the `area` column of the data frame specified in `data`.

`plot_map` allows for temporal aggregations (by year or across all years), color transformations, centering and binning (current options include quantiles and equal area), as well as title labeling. It can plot both numerical and categorical variables.

In this first example, we plot average urbanicity levels (note `type="cov"`) over the entire study period while using an inverted palette:

```{r map1, fig.width=5}
plot_map(data = dengue_MS, var = "urban",  time = "date", type = "cov", area = "micro_code",  
         map = map_MS, map_area = "code", by_year = FALSE, var_label= "Urbanicity",
         palette = "-Heat")
```

We now plot case incidence per 1,000 persons (note `type="inc"` and `pt=1000`) for every year (`by_year=TRUE`) and binning into 5 categories using the quantile method (see `bins` and `bins_method` arguments).

```{r map2, fig.width=5, fig.height=4}
plot_map(dengue_MS, var = "dengue_cases", type = "inc", pop = "population", 
         pt = 1000, time = "date", area = "micro_code",  
         map = map_MS, map_area = "code", by_year = TRUE, 
         bins = 5, bins_method = "quantile", palette = "-Rocket")  
```

Lastly, here is one example with the categorical, time-invariant *biome* covariate:

```{r map3, fig.width=4, fig.height=3}
plot_map(data = dengue_MS, var = "biome_name", type = "cov",
         time = "date", area = "micro_code", by_year = FALSE,
         map = map_MS, map_area = "code", var_label= "Biome")
```

## Bivariate

`plot_bivariate` allows the visually assessment of associations between two variables. It will be a scatter plot if both variables are numeric and box plots if one of them is categorical. Options for customization include grouping by a variable using color or facets (`area` argument) and axis and title labeling.

In this first example, we explore the relationship between maximum temperature and drought while grouping by meso code:

```{r biv1, fig.width=5}
plot_bivariate(dengue_MS, 
               var = c("tmax", "pdsi"), 
               var_label = c("Max. temp", "PDSI"),
               area = "meso_code")
```

Next, we do the same but grouping using facets with free scales:

```{r biv2, fig.width=7, fig.height=5}
plot_bivariate(dengue_MS, 
               var = c("tmax", "pdsi"), 
               var_label = c("Max. temp", "PDSI"),
               area = "meso_code", 
               facet = TRUE, free_x_scale = TRUE, free_y_scale = TRUE)
```

Lastly, we explore the distribution of minimum temperature and the categorical variable biome while coloring by  meso code:

```{r biv3, fig.width=6}
plot_bivariate(dengue_MS, 
               var = c("biome_name", "tmax"), 
               var_label = c("Biome", "Max. temp"),
               area = "meso_code")
```

## Correlation

`plot_correlation` plots a correlation matrix of a set of variables. By default, Pearson correlation is computed, and circles are used to depict correlation values in the lower triangle and the diagonal of the matrix, whereas numbering is used in the upper triangle:

```{r corr1, fig.width=5, fig.height=4}
plot_correlation(dengue_MS, 
                 var = c("dengue_cases","pop_density", "tmax", "tmin",
                         "pdsi", "urban", "water_network", "water_shortage")) 
```

In this second example, we use Spearman correlation and customize the triangles and the palette.

```{r corr2, fig.width=5, fig.height=4}
plot_correlation(dengue_MS, var = c("dengue_cases","pop_density", "tmax", "tmin",
                                    "pdsi", "urban", "water_network", "water_shortage"),  
                 method = "spearman", plot_type = c("number", "raster"),
                 palette = "RdBu") 
```

Custom labels for the variables can be provided using the `var_label` argument.

## Compare

`plot_compare` allows the visualization of several variables in the same figure using the same **GHRexplore** plotting function. Possible functions include `plot_timeseries`, `plot_heatmap`, `plot_seasonality` and `plot_map`.

In `plot_compare`, the `var`, `var_lab`, `type` and `palette` can be vectors of the same length that refer to each of the elements to be plotted, while some other elements, like `pop`,  `pt`, `time`, `area` and aggregation functions are shared. The final display of the multiple plots is automatically done via the `cowplot` package, and several options to arrange the figures are available (see `?plot_combine`) including all arguments of `cowplot::plot_grid`.

In this first example, we plot the time series of PDSI and dengue incidence as a single column and combine the legends (`ncol=1`, `combine_legend=TRUE`):

```{r compare, fig.width=7, fig.height=5}
plot_compare(plot_function = plot_timeseries,
             data = dengue_MS, 
             var = c("pdsi", "dengue_cases"),
             type = c("cov", "inc"), 
             var_lab = c("PDSI", "Dengue Incidence"),
             pop = "population",
             time = "date", 
             area = "micro_code", 
             aggregate_space = "meso_code",
             ncol=1, 
             combine_legend=TRUE)
```

In this second example, we plot heatmaps of PDSI and dengue incidence using different palettes:

```{r compare2, fig.width=7, fig.height=9}
plot_compare(plot_function = plot_heatmap,
             data = dengue_MS, 
             var = c("pdsi", "dengue_cases"),
             type = c("cov", "inc"), 
             var_lab = c("PDSI", "Incidence"),
             palette = c("Purp", "Reds"), 
             pop = "population",
             time = "date", 
             area = "micro_code", 
             aggregate_space = "meso_code",
             ncol=1)
```