raceland: Describing local racial patterns of racial landscapes at different spatial scales

Anna Dmowska, Tomasz Stepinski, Jakub Nowosad

2023-04-14

INTRODUCTION

The raceland package implements a computational framework for a pattern-based, zoneless analysis and visualization of (ethno)racial topography. It is a reimagined approach for analyzing residential segregation and racial diversity based on the concept of ‘landscape’ used in the domain of landscape ecology. An overview of the implemented method is presented in the first vignette. Here we demonstrate, how the raceland package can be used for describing racial landscape at different spatial scales.

Racial landscape method is based on the raster gridded data, and unlike the previous methods, does not depend on the division for specific zones (census tract, census block, etc.). Calculation of racial diversity (entropy) and racial segregation (mutual information) can be performed for the whole area of interests (i.e., metropolitan area) without introducing any arbitrary divisions. Racial landscape method also allows for performing the calculation at different spatial scales.

# install required packages 
pkgs = c(
  "raceland",                
  "comat",            
  "terra",   
  "sf",              
  "dplyr"
)
to_install = !pkgs %in% installed.packages()
if(any(to_install)) {
  install.packages(pkgs[to_install])
}

A computational framework requires a few steps (see the first vignette for details) before calculating IT-metrics:

# reading input data
list_raster = list.files(system.file("rast_data", package = "raceland"),
                         full.names = TRUE)
race_raster = rast(list_raster)

# constructing racial landscape
real_raster = create_realizations(x = race_raster, n = 100)

# calculating local subpopulation densities
dens_raster = create_densities(real_raster, race_raster, window_size = 10)

DESCRIBING LOCAL RACIAL PATTERNS OF RACIAL LANDSCAPES AT DIFFERENT SPATIAL SCALES

Defining local patterns

Let consider an example presented below. The racial landscape covers the area of 16 by 16 cells. Such an area can be divided into a square-shaped block of cells. Each square of cells will represent a local pattern (a local landscape), and for each local pattern, IT metrics (entropy and mutual information) are calculated. The extent of a local pattern is defined by two parameters: size and shift.

The example presented below consists of the racial landscape 16 by 16 cells. Setting size = 4 (and shift= 4) results in dividing the racial landscape into four squared windows, each 4x4 cells. Each window represents a local pattern. For each local pattern, IT metrics can be calculated, and the results will be assigned to the resultant grid of square windows. In fact, the original racial landscape with 16x16 cells is reduced to the 2x2 ‘large cells’.

Setting size=4 and shift = 2 results in overlapping square windows. First, the window of the size 4x4 defines the local pattern (see dark blue square). In the next step, this window is shifted by two cells to the right, and the new local pattern is selected (see the light blue square). It will create a resultant grid of the cell size defined by the shift parameter.

Calculate IT metrics from local patterns (1)

The create_grid() function creates spatial object with a grid (each ‘cell’ is defined by size and shift). This function requires the SpatRaster object with realizations (racial landscapes) and size parameter. If the shift parameter is not set, it is assumed that size=shift. Below such grid is imposed into the racial landscape to show local patterns.

race_colors = c("#F16667", "#6EBE44", "#7E69AF", "#C77213", "#F8DF1D")
grid_sf = create_grid(real_raster, size = 20)
plot_realization(real_raster[[1]], race_raster, hex = race_colors)
plot(st_geometry(grid_sf), add = TRUE, lwd = 2)

The calculate_metrics() function is used to calculate IT metrics. Parameter size=20 means that the area of interests will be divided into a grid of local patterns of the size 20x20 cells (which in this case corresponds to the square of 0.6 km x 0.6km). The neighboorhood = 4 defines that adjacencies between cells are defined in four directions, fun="mean" calculate average values of population density from adjacent cells, threshold = 0.5 - calculation will be performed if there is at least 50% of non-NA cells.

IT metrics are calculated for each local pattern for each realization. The output table will have 900 rows (there are nine local patterns of size 20x20 cells and 100 realizations).

metr_df_20 = calculate_metrics(x = real_raster, w = dens_raster, 
                               neighbourhood = 4, fun = "mean", 
                               size = 20, threshold = 0.5)
metr_df_20[metr_df_20$realization == 1, ]
#>   realization row col      ent  joinent  condent      mutinf
#> 1           1   1   1 1.242992 2.473338 1.230346 0.012645856
#> 2           1   1   2 1.618187 3.101661 1.483474 0.134712712
#> 3           1   1   3 1.391554 2.766716 1.375162 0.016392488
#> 4           1   2   1 1.604952 2.985754 1.380802 0.224149925
#> 5           1   2   2 1.535208 3.029593 1.494386 0.040821771
#> 6           1   2   3 1.535544 2.904158 1.368614 0.166929892
#> 7           1   3   1 1.419478 2.696524 1.277046 0.142432066
#> 8           1   3   2 1.544421 3.080596 1.536174 0.008246697
#> 9           1   3   3 1.371094 2.704907 1.333813 0.037280600

Racial topography at the analyzed scale is quantified as an ensemble average from multiple realizations. First, for each square window is calculated the average value of entropy and mutual information based on 100 realizations. The table below shows the mean (ent_mean, mutinf_mean) and standard deviation (ent_sd, mutinf_sd) values for each square window.

smr = metr_df_20 %>%
  group_by(row, col) %>%
  summarize(
    ent_mean = mean(ent, na.rm = TRUE),
    ent_sd = sd(ent, na.rm = TRUE),
    mutinf_mean = mean(mutinf, na.rm = TRUE),
    mutinf_sd = sd(mutinf, na.rm = TRUE)
  )
#> `summarise()` has grouped output by 'row'. You can override using the `.groups`
#> argument.
smr
#> # A tibble: 9 × 6
#> # Groups:   row [3]
#>     row   col ent_mean ent_sd mutinf_mean mutinf_sd
#>   <dbl> <dbl>    <dbl>  <dbl>       <dbl>     <dbl>
#> 1     1     1     1.27 0.0460      0.0193   0.00883
#> 2     1     2     1.59 0.0115      0.160    0.0289 
#> 3     1     3     1.42 0.0318      0.0462   0.0154 
#> 4     2     1     1.61 0.0258      0.151    0.0373 
#> 5     2     2     1.52 0.0218      0.0347   0.0134 
#> 6     2     3     1.52 0.0285      0.156    0.0293 
#> 7     3     1     1.38 0.0470      0.136    0.0424 
#> 8     3     2     1.54 0.0302      0.0152   0.00658
#> 9     3     3     1.42 0.0299      0.0477   0.0138

Then the averages from the mean values of entropy and mutual information are calculated.

smr %>% 
  ungroup() %>% 
  select(-row, -col) %>% 
  summarise_all(mean)
#> # A tibble: 1 × 4
#>   ent_mean ent_sd mutinf_mean mutinf_sd
#>      <dbl>  <dbl>       <dbl>     <dbl>
#> 1     1.47 0.0303      0.0851    0.0218

Racial diversity-segregation classification

Racial diversity and segregation can be analyzed and displayed separately. However, much more information can be gained by visualizing those two measures at the same time. For this purpose, entropy and mutual information are reclassified into three classes (1-low, 2-medium, 3-high). By joining those two classification, we obtain 9 classes, each describing the level of diversity and segregation: 11 - low segregation/low diversity; 12-low segregation/medium diversity; 13-low segregation/high diversity; 21 - medium segregation/low diversity; 22-medium segregation/medium diversity; 23-medium segregation/high diversity; 31 - high segregation/low diversity; 32-high segregation/medium diversity; 33-high segregation/high diversity.

Each class is coded by one color using bivariate palette (see below).

The bivariate_classification() function presented below takes three arguments (entropy - a vector of entropy values, mutual_information - a vector of mutual information values, n - a number of categories in racial landscape) and return a 9-coded vector of racial segregation/divesity classes.

# n is a number of categories in racial landscape
bivariate_classification = function(entropy, mutual_information, n) {
  
  # calculate bivariate classification
  nent = log2(n)
  ent_cat = cut(entropy, breaks = c(0, 0.66, 1.33, nent), labels = c(1, 2, 3), 
                include.lowest = TRUE, right = TRUE)
  ent_cat = as.integer(as.character(ent_cat))
  
  mut_cat = cut(mutual_information, breaks = c(0, 0.33, 0.66, 1), labels = c(10, 20, 30), 
                include.lowest = TRUE, right = TRUE)
  mut_cat = as.integer(as.character(mut_cat))
  
  bivar_cls = mut_cat + ent_cat
  bivar_cls = as.factor(bivar_cls)
  
  return(bivar_cls)
}
smr$bivar_cls = bivariate_classification(entropy = smr$ent_mean, 
                                         mutual_information = smr$mutinf_mean,
                                         n = nlyr(race_raster))

Mapping local racial diversity and racial segregation

The average value of entropy and mutual information calculated for each square-shaped window from all realizations can be joined to the spatial grid object. Such operation allows for mapping metrics and shows how segregation and racial diversity change over the area.

# join IT-metric to the grid
attr_grid = dplyr::left_join(grid_sf, smr, by = c("row", "col"))
# calculate breaks parameter for plotting entropy and mutual information
# the values of entropy and mutual information are divided into equal breaks
ent_breaks = c(seq(0, 2, by = 0.25), log2(nlyr(race_raster)))
mut_breaks = seq(0, 1, by = 0.1)

Mapping racial diversity

plot(attr_grid["ent_mean"], breaks = ent_breaks, key.pos = 1, 
     pal = rev(hcl.colors(length(ent_breaks) - 1, palette = "RdBu")),
     bty = "n", main = "Racial diversity (Entropy)")

Mapping racial segregation

plot(attr_grid["mutinf_mean"], breaks = mut_breaks, key.pos = 1, 
     pal = rev(hcl.colors(length(mut_breaks) - 1, palette = "RdBu")),
     bty = "n", main = "Racial segregation (Mutual information)")

Mapping racial segregation and diversity using bivariate classification

biv_colors = c("11" = "#e8e8e8", "12" = "#e4acac", "13" = "#c85a5a", "21" = "#b0d5df",
               "22" = "#ad9ea5", "23" = "#985356", "31" = "#64acbe", "32"= "#627f8c", 
               "33" = "#574249")
bcat = biv_colors[names(biv_colors)%in%unique(attr_grid$bivar_cls)]
plot(attr_grid["bivar_cls"], pal = bcat, main = "Racial diversity and residential segregation")

Calculate IT metrics from local patterns (overlapping windows)

The next example shows how to calculate a local pattern using overlapping windows. This option is recommended to use, especially for a larger size value. Using overlapping windows does not introduce arbitrary boundaries.

To obtain overlapping windows the calculate_metrics() function requires additional argument - shift.

# calculate metrics for overlapping windows
metr_df_10 = calculate_metrics(x = real_raster, w = dens_raster, 
                               neighbourhood = 4, fun = "mean", 
                               size = 20, shift = 10, threshold = 0.5)

smr10 = metr_df_10 %>%
  group_by(row, col) %>%
  summarize(
    ent_mean = mean(ent, na.rm = TRUE),
    ent_sd = sd(ent, na.rm = TRUE),
    mutinf_mean = mean(mutinf, na.rm = TRUE),
    mutinf_sd = sd(mutinf, na.rm = TRUE)
  )
#> `summarise()` has grouped output by 'row'. You can override using the `.groups`
#> argument.

smr10 %>% 
  ungroup() %>% 
  select(-row, -col) %>% 
  summarise_all(mean)
#> # A tibble: 1 × 4
#>   ent_mean ent_sd mutinf_mean mutinf_sd
#>      <dbl>  <dbl>       <dbl>     <dbl>
#> 1     1.48 0.0333      0.0811    0.0224

# calculate bivariate classification
smr10$bivar_cls = bivariate_classification(
  entropy = smr10$ent_mean,
  mutual_information = smr10$mutinf_mean,
  n = nlyr(race_raster)
)

Mapping local racial diversity and racial segregation

The average value of entropy and mutual information calculated for each square-shaped window from all realizations can be joined to spatial grid object (created by create_grid()). For overlapping windows, the resolution of the grid will be defined by shift parameter.

# create spatial grid object
grid_sf10 = create_grid(real_raster, size = 20, shift = 10)

# join IT-metrics to the grid
attr_grid10 = dplyr::left_join(grid_sf10, smr10, by = c("row", "col"))

Mapping racial diversity

plot(attr_grid10["ent_mean"], breaks = ent_breaks, key.pos = 1, 
     pal = rev(hcl.colors(length(ent_breaks) - 1, palette = "RdBu")),
     # pal = grDevices::hcl.colors(length(ent_breaks) - 1, palette = "Blue-Red"),
     bty = "n", main = "Racial diversity (Entropy)")

Mapping racial segregation

plot(attr_grid10["mutinf_mean"], breaks = mut_breaks, key.pos = 1,
     pal = rev(hcl.colors(length(mut_breaks) - 1, palette = "RdBu")),
     bty = "n", main = "Racial segregation (Mutual information)")

Mapping racial segregation and diversity using bivariate classification

# `biv_color`s defines a bivariate palette, 
# `bcat` selects only colors for categories available for analyzed areas
biv_colors = c("11" = "#e8e8e8", "12" = "#e4acac", "13" = "#c85a5a", "21" = "#b0d5df", 
               "22" = "#ad9ea5", "23" = "#985356", "31" = "#64acbe","32" = "#627f8c",
               "33" = "#574249")
bcat = biv_colors[names(biv_colors)%in%unique(attr_grid10$bivar_cls)]
plot(attr_grid10["bivar_cls"], pal = bcat, main = "Racial diversity and residential segregation")