---
title: "Introduction to sfhotspot"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to sfhotspot}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.align = "center",
  fig.asp = 0.7,
  fig.width = 6,
  out.width = "80%"
)
```

```{r setup, echo=FALSE, include=FALSE}
library(sf)
library(sfhotspot)
library(ggplot2)
```

sfhotspot is a package for understanding patterns in data that represent points in space. You can use it to count points in different places, estimate the density of points, show changes in the distribution of points over time, identify places with more points than would be expected by chance, and classify areas based on the number of points in them during different periods. The specific motivation for this package was to analyse the locations of crimes, but the functions should be useful for understanding patterns in points representing other features or events.

The package is called sfhotspot because it works with (and -- where relevant -- produces) SF objects as produced by the [sf](https://r-spatial.github.io/sf/) package. sfhotspot also produces data that is [*tidy*](https://tidyr.tidyverse.org/articles/tidy-data.html), making it easy to use functions from packages such as [dplyr](https://dplyr.tidyverse.org/) to filter the results, etc.

All the functions in sfhotspot work on an SF data frame or tibble in which each row in the data represents a single point (e.g. the location of an event). In this introduction we will use the built-in `memphis_robberies` dataset to show how each of the `hotspot_*()` family of functions works. `memphis_robberies` contains details of `r format(nrow(memphis_robberies), big.mark = ",")` robberies in Memphis, Tennessee, in 2019.
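
If your own data are stored in an ordinary data frame rather than an SF object, they need converting before use. The following is a minimal sketch of one way to do that with `sf::st_as_sf()`. The `events` data frame, its column names and its co-ordinates are made up for illustration, and the data are assumed to be longitude/latitude pairs in WGS 84 (EPSG:4326).

```r
library(sf)

# Hypothetical event records: one row per event, with longitude and latitude
# columns (these values are invented for illustration)
events <- data.frame(
  id = 1:3,
  lon = c(-90.05, -89.97, -90.01),
  lat = c(35.15, 35.12, 35.10)
)

# Convert to an sf object with one point per row, assuming the co-ordinates
# are WGS 84 longitudes and latitudes
events_sf <- st_as_sf(events, coords = c("lon", "lat"), crs = "EPSG:4326")

events_sf
```
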

```{r, echo=FALSE}
knitr::kable(
  memphis_robberies[1:10, ],
  caption = "The included `memphis_robberies` dataset"
)
```

We can plot this raw data, but the resulting plot is not very informative (even with the points made semi-transparent), since there are too many points to see clear patterns.

```{r}
#| fig.alt: >
#|   A very basic point map showing the locations of individual robberies in
#|   Memphis, TN. Each robbery is represented as a small, semi-transparent dot.
ggplot(memphis_robberies) +
  geom_sf(alpha = 0.1) +
  theme_minimal()
```

## Counting points

The `hotspot_count()` function produces an SF object with counts of the number of points in (by default) each cell in a grid of cells. As with all the functions in the package, this can be customised in various ways -- see *Common arguments*, below.

```{r}
point_counts <- hotspot_count(memphis_robberies)

point_counts
```

We can then plot that grid of cells.

```{r}
#| fig.alt: >
#|   A basic map showing the count of robberies in each cell in a grid covering
#|   Memphis, TN. Grid cells with more robberies are shown in darker shades of
#|   blue, while those with fewer robberies are shown in lighter shades.
ggplot() +
  geom_sf(
    mapping = aes(fill = n),
    data = point_counts,
    alpha = 0.75,
    colour = NA
  ) +
  scale_fill_distiller(direction = 1)
```

## Calculating kernel density

The `hotspot_kde()` function can be used to calculate kernel density estimates for each cell in a grid. The kernel density estimation (KDE) can be customised using the `bandwidth` and `bandwidth_adjust` arguments. This function also accepts the arguments explained in the *Common arguments* section, below. If you do not specify any optional arguments, `hotspot_kde()` will try to choose reasonable default values.

The KDE algorithm requires projected co-ordinates (i.e. not longitudes and latitudes), so we must first transform the data to use an appropriate local projected co-ordinate system.

```r
robbery_kde <- memphis_robberies |>
  st_transform("EPSG:2843") |>
  hotspot_kde()

robbery_kde
```

```{r, echo=FALSE, include=FALSE}
robbery_kde <- memphis_robberies |>
  st_transform("EPSG:2843") |>
  hotspot_kde()
```

```{r echo=FALSE}
robbery_kde
```

Again, we can plot the result.

```{r}
#| fig.alt: >
#|   A hotspot map showing the density of robberies in each cell in a grid
#|   covering Memphis, TN. Grid cells with higher densities of robberies are
#|   shown in darker shades of blue, while those with lower densities are shown
#|   in lighter shades.
ggplot() +
  geom_sf(
    mapping = aes(fill = kde),
    data = robbery_kde,
    alpha = 0.75,
    colour = NA
  ) +
  scale_fill_distiller(direction = 1)
```

We can adjust the appearance of the KDE layer on this map by specifying optional arguments to `hotspot_kde()`. In particular, the `bandwidth_adjust` argument is useful for controlling the level of detail visible in the density layer -- use values of `bandwidth_adjust` below 1 to show more detail, and values above 1 to show a smoother density surface.

## Common arguments

All the functions in this package work on a grid of cells, which can be customised using one or more of these common arguments:

* `cell_size` specifies the size of each equally spaced grid cell, using the same units (metres, degrees, etc.) as used in the sf data frame given in the `data` argument. Ignored if `grid` is not `NULL`. If this argument and `grid` are both `NULL` (the default), the cell size will be calculated automatically.
* `grid_type` specifies whether the grid should be made up of squares (`"rect"`, the default) or hexagons (`"hex"`). Ignored if `grid` is not `NULL`.
* `grid` specifies an `sf` data frame containing polygons, which will be used as the grid for which counts are made.
* `quiet` specifies whether to suppress the messages that report the values of any parameters (such as `cell_size`) that have been set automatically.
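
As an illustration of how these arguments fit together, the sketch below counts robberies in hexagonal cells of a fixed size. The cell size of 1000 is an arbitrary value chosen for the example. The data are transformed first, using the same co-ordinate system as in the KDE example, so that `cell_size` is expressed in the units of a projected co-ordinate system rather than in degrees.

```r
library(sf)
library(sfhotspot)

# Project the data so that cell_size is in the units of the projected
# co-ordinate system rather than in decimal degrees
robberies_projected <- st_transform(memphis_robberies, "EPSG:2843")

# Count robberies in hexagonal cells; 1000 is an illustrative value only
hex_counts <- hotspot_count(
  robberies_projected,
  cell_size = 1000,
  grid_type = "hex",
  quiet = TRUE
)

hex_counts
```

Because `cell_size` is given explicitly here, no automatic cell-size selection takes place.
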

### Automatic cell-size selection

If `grid` and `cell_size` are both `NULL`, the cell size will be set so that there are 50 cells on the shorter side of the grid. If the data SF object is projected in metres or feet, the number of cells will be adjusted upwards so that the cell size is a multiple of 100.
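
The arithmetic behind this rule can be sketched in base R. The function below is a hypothetical helper written for this vignette, not the package's internal code. It assumes that "adjusted upwards" means rounding the cell size down to the nearest multiple of 100, and it leaves cell sizes below 100 unrounded, a case the description above does not cover.

```r
# Illustrative helper (not part of sfhotspot): choose a cell size from the
# width and height of the bounding box of the data
auto_cell_size <- function(width, height, round_to_100 = TRUE) {
  # Start with 50 cells along the shorter side of the grid
  cell_size <- min(width, height) / 50

  # For data in metres or feet, round the cell size down to a multiple of
  # 100, which increases the number of cells (assumption: sizes below 100
  # are left unchanged)
  if (round_to_100 && cell_size >= 100) {
    cell_size <- floor(cell_size / 100) * 100
  }

  cell_size
}

# A 60,000 x 42,500 bounding box: 42500 / 50 = 850, rounded down to 800
auto_cell_size(60000, 42500)

# The same box for unprojected data, where no rounding is applied: 850
auto_cell_size(60000, 42500, round_to_100 = FALSE)
```

In the first call the shorter side ends up with just over 53 cells (42,500 / 800) rather than exactly 50.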