---
title: "Workflow example for the Tasmanian thylacine"
author: "Global ChEC Lab"
date: "2024-04-11"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Thylacine example}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
In this vignette we provide a more advanced example of the *poems* workflow using a
population model of the historic decline of the *thylacine* in Tasmania, Australia.
The *thylacine* was a carnivorous marsupial that declined to extinction in response to
hunting and land-use change following European settlement of Australia. As well as
being hunted for the exotic fur and skin trade, *thylacine* were also killed because
they were considered a threat to livestock. A government bounty was offered for
eradicated *thylacine* during the period 1888 until 1909 (Guiler 1985). We model these
hunting and bounty dynamics using a bio-economic approach adapted from Bulte, Horan,
and Shogren (2003). We build our model ensemble by selecting the simulations most
congruent to expected extirpation/extinction dates and changes (regression slopes) in
the number of *thylacines* harvested during the bounty interval. These targets are
derived from historic bounty, capture and sighting records.
## Setup
We begin by loading the *poems* package. We will
set a flag for running the vignette in demonstration mode (default). When set to *TRUE*,
pre-run simulation results are loaded rather than waiting (potentially hours) to run
the full sample model set. Feel free to set this flag to *FALSE* to run your own
simulations. You may also change the number of samples generated, set the number of
parallel cores available on your system, and set the simulation output directory.
```r
library(poems)
DEMONSTRATION <- TRUE # load pre-run data rather than running simulations
SAMPLES <- 20000
PARALLEL_CORES <- 2
```
## Workflow
The *poems* workflow implements a pattern-oriented modeling (POM) approach
(Grimm et al., 2005), which can be broken into six steps:
1. Build the population model for the study region.
1. Generate dynamic model parameters.
1. Sample model and generator parameters for each simulation.
1. Build a simulation manager to run each simulation.
1. Build a results manager to generate summary results (metrics).
1. Build a validator to select a model ensemble based on POM.
### Step 1: Build the population model for the study region
We start by building a model template (using the *PopulationModel* class) with fixed
model parameters and user-defined functions, including our bio-economic harvest
function. Since our model is spatially explicit, we also need to define our study region
(via the *Region* class).
##### Tasmanian study region
First, we'll define our study region via a *raster* of 0.1 degree grid-cells for the
majority of Tasmania, once inhabited by the *thylacine*. This *raster* of Tasmania
accompanies the *poems* package, and may be loaded via its variable name.
```r
# Raster of Tasmania (note: islands removed where there was no evidence of thylacine occupancy).
data(tasmania_raster)
raster::plot(tasmania_raster,
main = "Tasmania raster",
xlab = "Longitude (degrees)", ylab = "Latitude (degrees)",
colNA = "blue"
)
```
The grid-cell size was selected as a reasonable approximation for *thylacine*
territorial range (Guiler & Godard, 1998).
To define our region, we will use this *raster* of Tasmania as a template for the
*Region* object, which will set the occupiable cell indices in the order in which they
are stored in the *raster*.
```r
# Tasmania study region (795 cells stored in the order shown)
region <- Region$new(template_raster = tasmania_raster)
raster::plot(region$region_raster,
main = "Tasmanian study region (cell indices)",
xlab = "Longitude (degrees)", ylab = "Latitude (degrees)",
colNA = "blue"
)
```
##### IBRA bioregions
We partition our study region into Interim Bioregionalisation of Australia (IBRA)
bioregions for several purposes. Our harvest function will distribute our simulated
harvest across these IBRA bioregions. We will also use these IBRA bioregions to
approximate spatial differences in habitat suitability decline. In step 6, we will use
estimated extirpation dates for each IBRA bioregion when selecting our model ensemble.
A *raster* of the approximate distribution of our study region cells across IBRA
bioregions, as well as a *data frame* containing information about each IBRA bioregion,
are included with the *poems* package. Here we also collate lists of indices and the
number of cells for each bioregion.
```r
# Tasmania study Interim Bioregionalisation of Australia (IBRA) bioregion cell distribution
data(tasmania_ibra_raster)
raster::plot(tasmania_ibra_raster,
main = "Tasmania study IBRA bioregions", colNA = "blue",
xlab = "Longitude (degrees)", ylab = "Latitude (degrees)"
)
```
```r
data(tasmania_ibra_data)
tasmania_ibra_data # examine
#> index key abbr name
#> 1 1 A FUR Furneaux
#> 2 2 B BEN Ben Lomond
#> 3 3 C TNM Tasmanian Northern Midlands
#> 4 4 D TSE Tasmanian South East
#> 5 5 E TW Tasmanian West
#> 6 6 F TNS Tasmanian Northern Slopes
#> 7 7 G TSR Tasmanian Southern Ranges
#> 8 8 H TCH Tasmanian Central Highlands
#> 9 9 I KIN King
# Calculate cell indices and counts for IBRA bioregions
ibra_indices <- lapply(
as.list(tasmania_ibra_data$index),
function(i) {
which(tasmania_ibra_raster[region$region_indices] == i)
}
)
ibra_indices[1:2] # examine
#> [[1]]
#> [1] 13 14 15 25 26 27 28 29 30 31 44 45 46 47 48 50 52 53 54
#> [20] 55 71 72 73 74 75 76 77 78 79 85 86 87 105 106 107 108 109 110
#> [39] 124 143 161 197 198 233 234 270
#>
#> [[2]]
#> [1] 49 51 80 81 82 83 84 111 112 113 114 115 116 117 118 119 120 121 122
#> [20] 123 148 149 150 151 152 153 154 155 156 157 158 159 160 187 188 189 190 191
#> [39] 192 193 194 195 196 225 226 227 228 229 230 231 232 261 262 263 264 265 266
#> [58] 267 268 269 296 297 298 299 300 301 302 303 304 305 332 333 334 335 336 337
ibra_cells <- unlist(lapply(ibra_indices, length))
ibra_cells # examine
#> [1] 46 76 40 147 188 72 93 86 47
```
##### Density dependence function
Here we create a user-defined function for density dependence utilizing the logistic
approach (Ricker, 1954). Although the *poems* package has the Ricker logistic function
as a predefined function operating at the population grid-cell level (via
*density_dependence = "logistic"*), this was found to be inadequate for our *thylacine*
model due to low densities of *thylacine* at the grid-cell level. To overcome this our
density dependence function operates at a neighborhood level, where each
cell's neighborhood include its adjacent cells (thus up to 9 cells in total).
This approach considers the mobility of the *thylacine* when calculating annual
reproduction and survival rates, whereby individuals have the ability to search for
mates or more suitable habitat in neighboring cells.
In order to first define our neighborhoods for our density dependence function, it is
convenient to partially skip ahead to step 2 (building generators) and build our
dispersal generator.
```r
# Build the dispersal generator and calculate distance data
dispersal_gen <- DispersalGenerator$new(
region = region,
dispersal_max_distance = 50,
distance_scale = 1000, # in km
dispersal_friction = DispersalFriction$new(), # modify coastline distances
inputs = c("dispersal_p", "dispersal_b"),
decimals = 5
)
dispersal_gen$calculate_distance_data()
head(dispersal_gen$distance_data$base) # examine
#> target_pop source_pop compact_row distance_class
#> 1 2 1 1 9
#> 2 3 1 2 17
#> 3 4 1 3 26
#> 4 5 1 4 14
#> 5 6 1 5 12
#> 6 7 1 6 14
```
We will use the generator for its dispersal functionality later, but for now we will
use it to calculate distance data in order to define our neighborhoods of adjacent
cells for each grid-cell. At our defined spatial resolution, adjacent cells are within
14 km from each cell.
```r
# Define neighborhoods (of up to 9 adjacent cells) based on a 14 km range from each
# grid cell for density dependence calculations (using a dispersal generator)
distance_data <- dispersal_gen$distance_data[[1]]
nh_data <- distance_data[which(distance_data$distance_class <= 14), 2:1]
neighborhoods <- as.list(1:795)
for (i in 1:nrow(nh_data)) {
neighborhoods[[nh_data$source_pop[i]]] <- c(
neighborhoods[[nh_data$source_pop[i]]],
nh_data$target_pop[i]
)
}
neighborhoods[1:3] # examine
#> [[1]]
#> [1] 1 2 5 6 7
#>
#> [[2]]
#> [1] 2 1 3 6 7 8
#>
#> [[3]]
#> [1] 3 2 4 7 8 9
```
We can now define our density dependence function, which is list-nested with our
neighborhoods and an Allee effect (Allee, 1931) parameter. We apply a Tasmania-wide Allee
effect to suppress a sustained low, non-viable population. Additional functionality is also
included to prevent reproduction when a neighborhood has a single *thylacine*. We also
define an alias to the Allee parameter so we can sample it later.
```r
# User-defined function for Ricker logistic density dependence via neighborhoods, with
# Allee effects; also remove fecundities if single thylacine in a neighborhood
density_dependence <- list(
neighborhoods = neighborhoods,
allee = 25, # Tasmania-wide Allee effect parameter
function(params) {
# Apply logistic density dependence using neighborhoods
growth_rate_max <- params$growth_rate_max
nh_density_abundance <- unlist(lapply(
params$neighborhoods,
function(nh_indices) {
sum(params$density_abundance[nh_indices])
}
))
nh_carrying_capacity <- unlist(lapply(
params$neighborhoods,
function(nh_indices) {
sum(params$carrying_capacity[nh_indices])
}
))
occupied_indices <- params$occupied_indices
growth_rate <- growth_rate_max * (1 - (nh_density_abundance[occupied_indices] /
nh_carrying_capacity[occupied_indices]))
params$transition_array[, , occupied_indices] <-
params$apply_multipliers(
params$transition_array[, , occupied_indices],
params$calculate_multipliers(growth_rate)
)
# Apply Tasmania-wide allee effect
total_abundance <- sum(params$density_abundance)
params$transition_array <-
params$transition_array * total_abundance / (params$allee + total_abundance)
# Remove fecundities for single thylacines
single_indices <- which(nh_density_abundance == 1)
params$transition_array[, , single_indices] <-
(params$transition_array[, , single_indices] * as.vector(+(!params$fecundity_mask)))
return(params$transition_array)
}
)
density_aliases <- list(density_allee = "density_dependence$allee")
```
Note that after calculating a modified growth rate via neighborhood abundance and
carrying capacity for each cell, we utilize a lookup table (calculated at the time of
simulation) so as to apply a multiplier to the stage matrix corresponding to each cell,
thus modifying its equivalent growth rate appropriately.
##### Harvest function
Let's now define our bio-economic harvest function, which is optionally list-nested
with harvest parameters. The total (Tasmania-wide) number of *thylacines* harvested at
each model simulation time step is calculated via the combination of a constant
opportunistic harvest rate plus a bio-economic component adapted from Bulte et al.
(2003). The bio-economic component simulates hunting effort based on economic return
(or profit), relative to other economic alternatives. Economic hunting rewards include
a bounty applied during the bounty period (1888-1909), plus an ongoing reward for pelts.
It has been estimated that up to half of *thylacines* killed were not submitted for bounty
(Guiler & Godard, 1998). We therefore include an additional parameter in our harvest
model to calculate the fraction of each harvest that is submitted for bounty. Once we
calculate our simulated yearly total harvest and bounty, we then distribute them across
IBRA bioregions based on *thylacine* bioregion densities. We define aliases for the
harvest/bounty parameters so we can sample them later.
```r
# Harvest bounty (economic) model user-defined function adapted from Bulte et al. (2003).
harvest <- list(
# Function parameters (passed to function in params list)
ohr = 0.05, # opportunistic harvest rate
t1 = 1888, # first year of bounty
tb = 1909, # last year of bounty
fb = 0.75, # harvest fraction submitted for bounty
B = c(1.6, 0.6), # bounty/skin price in pounds, pre/post bounty
w = 3.5, # opportunity cost in pounds per year
E0 = 25, # effort in 1888 (no. hunters)
q = 0.0025, # catchability coefficient
v1 = 0.02, # entry rate
v2 = 0.5, # exit rate
ibra_indices = ibra_indices, # bioregion cell (row) indices
ibra_cells = ibra_cells, # number of cells in bioregions
# Function definition
bounty_function = function(params) {
# Unpack parameters (used at every time step)
ohr <- params$ohr
t1 <- params$t1
tb <- params$tb
fb <- params$fb
B <- params$B
w <- params$w
q <- params$q
v1 <- params$v1
v2 <- params$v2
ibra_indices <- params$ibra_indices
ibra_cells <- params$ibra_cells
ibra_number <- length(ibra_cells)
stages <- params$stages
populations <- params$populations
simulator <- params$simulator
tm <- params$tm
x <- params$stage_abundance
# Initialise (first time step only)
if (tm == 1) { # attach variables and access results via simulator reference object
simulator$attached$E <- params$E0 # current bounty effort
simulator$attached$vi <- v1 # current bounty rate
simulator$results$bounty <- array(0, c(ibra_number, params$time_steps))
}
# Access persistent parameters via simulator reference object
E <- simulator$attached$E
vi <- simulator$attached$vi
# Next year's hunting effort and entry/exit rates based on this year's profit
h <- max(0, round((ohr + q * E) * sum(x))) # harvest
b <- round(h * fb * ((tm + t1 - 1) <= tb)) # bounty submitted
profit <- round(B[((tm + t1 - 1) > tb) + 1] * b + B[2] * (h - b) - w * E, 1)
simulator$attached$E <- max(0, round(E + vi * profit))
simulator$attached$vi <- c(v1, v2)[(profit < 0) + 1]
# Distribute harvest and bounty across bioregions based on each IBRA density
staged_indices <- array(1:(stages * populations), c(stages, populations))
rep_indices <- unlist(apply(
matrix(staged_indices[, unlist(ibra_indices)]), 1,
function(i) rep(i, x[i])
))
distributed_h <- array(0, c(stages, populations))
if (length(rep_indices) && h > 0) {
ibra_x <- unlist(lapply(ibra_indices, function(indices) sum(x[, indices])))
rep_ibra <- unlist(apply(matrix(1:ibra_number), 1, function(i) rep(i, ibra_x[i])))
rep_prob <- 1 / ibra_cells[rep_ibra]
h_indices <- sample(1:length(rep_indices), min(h, sum(x)), prob = rep_prob)
if (b > 0) {
b_indices <- h_indices[sample(1:length(h_indices), b)]
simulator$results$bounty[, tm] <- tabulate(rep_ibra[b_indices],
nbins = ibra_number
)
}
for (i in rep_indices[h_indices]) distributed_h[i] <- distributed_h[i] + 1
}
# Return abundance
return(x - distributed_h)
}
)
harvest_aliases <- list(
harvest_ohr = "harvest$ohr", harvest_fb = "harvest$fb",
harvest_w = "harvest$w", harvest_E0 = "harvest$E0",
harvest_q = "harvest$q", harvest_v1 = "harvest$v1",
harvest_v2 = "harvest$v2"
)
```
##### Template model
Finally, we can build our template model with these user-defined functions and their
associated parameters, as well as other fixed model parameters.
```r
# Population (simulation) model template for fixed parameters
model_template <- PopulationModel$new(
region = region,
time_steps = 80, # years (1888-1967)
populations = region$region_cells, # 795
# initial_abundance : generated
# stage_matrix: generated
fecundity_max = 2,
demographic_stochasticity = TRUE,
# carrying_capacity : generated
density_dependence = density_dependence, # user-defined
harvest = harvest, # user-defined
# dispersal : generated
dispersal_target_k = 0.5,
dispersal_target_n = list(threshold = 4, cutoff = 8),
simulation_order = c("results", "harvest", "transition", "dispersal"),
results_selection = c("abundance", "harvested"),
attribute_aliases = c(density_aliases, harvest_aliases)
)
```
Note the use of thresholds/cutoffs for dispersal target carrying capacities and
abundances. The former suppress dispersal to cells with near-zero suitability,
whereas the latter prevent model *thylacines* from overcrowding cells. When overridden,
the simulation order may allow the initial abundance to be included in the results.
Also note the commented placeholders for model parameters that will be sampled or
generated.
### Step 2: Build generators for dynamically generating model parameters
Here we build a generator for combined model initial abundance and carrying capacity,
as well as a generator for generating a stage matrix for each sampled model based on
sampled growth rate. We will test our generators, including our pre-built dispersal
generator (from part 1), by generating some example outputs.
##### Initial habitat suitability
Firstly, our initial abundance and carrying capacity generator utilizes the habitat
suitability for our study region. The initial habitat suitability was derived from a
species distribution model (SDM) for the *thylacine*. The initial habitat suitability
*raster* has also been included with the *poems* package.
```r
# Initial thylacine habitat suitability
data(thylacine_hs_raster)
raster::plot(thylacine_hs_raster,
main = "Initial thylacine habitat suitability", colNA = "blue",
xlab = "Longitude (degrees)", ylab = "Latitude (degrees)"
)
```
##### Habitat decline
Bulte et al. (2003) estimated a 3% per annum Tasmania-wide decline in *thylacine*
habitat suitability due to human land use. To approximate spatial differences in
decline, we approximate the decline in *thylacine* habitat suitability by applying
a constant annual decline to all IBRA bioregions except Tasmanian West (5) and Central
Highlands (8), which have mostly retained natural ecosystems.
##### Initial abundance and carrying capacity generator
We will utilize *Generator* class functionality for creating user-defined (template)
functions to generate initial abundance and carrying capacity (outputs). The functions
will use the initial habitat suitability and spatial distribution of habitat decline,
along with sampled values (inputs) for initial carrying capacity, percentage decline
(in 7/9 IBRA bioregions), and the fraction of capacity for initial abundance (phi).
```r
# Build a carrying capacity generator model based on habitat suitability and sampled
# initial capacity, initial fraction (phi), & decline rate per year in selected bioregions.
capacity_gen <- Generator$new(
description = "capacity",
time_steps = 80, # years (1888-1967)
initial_hs = thylacine_hs_raster[region$region_indices],
decline_indices = which(!tasmania_ibra_raster[region$region_indices] %in% c(5, 8)),
inputs = c("k_init", "k_decline", "k_phi"),
outputs = c("initial_abundance", "carrying_capacity"),
generative_requirements = list(
initial_abundance = "function",
carrying_capacity = "function"
)
)
capacity_gen$add_function_template(
"initial_abundance",
function_def = function(params) {
distr_k <- round(params$initial_hs / sum(params$initial_hs) * params$k_init)
a_init <- round(params$k_init * params$k_phi) # total initial
distr_a <- array(0, length(distr_k))
rep_indices <- unlist(apply(
matrix(1:length(distr_k)), 1,
function(i) rep(i, distr_k[i])
))
sample_indices <- rep_indices[sample(
1:length(rep_indices),
min(a_init, length(rep_indices))
)]
for (i in sample_indices) distr_a[i] <- distr_a[i] + 1
return(distr_a)
},
call_params = c("initial_hs", "k_init", "k_phi")
)
capacity_gen$add_function_template(
"carrying_capacity",
function_def = function(params) {
distr_k <- params$initial_hs / sum(params$initial_hs) * params$k_init
decline_matrix <- array(1, c(length(distr_k), params$time_steps))
decline_matrix[params$decline_indices, ] <-
matrix((1 - params$k_decline)^(0:(params$time_steps - 1)),
nrow = length(params$decline_indices), ncol = params$time_steps,
byrow = TRUE
)
return(distr_k * decline_matrix)
},
call_params = c(
"initial_hs", "time_steps", "decline_indices",
"k_init", "k_decline"
)
)
```
We can test our generator with some example (mid-value) sample inputs and
examine plots of the output initial abundance and carrying capacity (final).
```r
# Generate example initial abundance and declining carrying capacity time-series
generated_k <- capacity_gen$generate(input_values = list(
k_init = 2800,
k_decline = 0.04,
k_phi = 0.8
))
example_initial_abundance <- generated_k$initial_abundance
example_carrying_capacity <- generated_k$carrying_capacity
# Plot the example initial abundance
example_initial_n_raster <- region$region_raster
example_initial_n_raster[region$region_indices] <- example_initial_abundance
raster::plot(example_initial_n_raster,
main = "Example initial thylacines",
colNA = "blue", xlab = "Longitude (degrees)", ylab = "Latitude (degrees)"
)
```
```r
# Plot the example final carrying capacity
example_final_raster <- region$region_raster
example_final_raster[region$region_indices] <- example_carrying_capacity[, 80]
raster::plot(example_final_raster,
main = "Final thylacine carrying capacity",
colNA = "blue", xlab = "Longitude (degrees)", ylab = "Latitude (degrees)",
zlim = c(0, 8)
)
```
##### Stage matrix generator
Our stage matrix generator adjusts our original stage matrix so that its
equivalent (simple) growth rate (*lambda* - 1) is that of a sampled input rate.
```r
# Build a stage matrix generator based on sampled growth rate
stage_matrix_gen <- Generator$new(
description = "stage matrix",
base_matrix = matrix(c(
0.00, 0.57, 1.17,
0.50, 0.00, 0.00,
0.00, 0.80, 0.80
), nrow = 3, ncol = 3, byrow = TRUE),
inputs = c("growth_r"),
outputs = c("stage_matrix"),
generative_requirements = list(stage_matrix = "function")
)
stage_matrix_gen$add_function_template(
"stage_matrix",
function_def = function(params) {
return(params$base_matrix * (1 + params$growth_r) /
Re((eigen(params$base_matrix)$values)[1]))
},
call_params = c("base_matrix", "growth_r")
)
```
We can test our generator with an example (mid-value) sample input growth rate of 0.25
(*lambda* = 1.25).
```r
# Generate sampled stage matrix for growth rate: lambda = 1.25
gen_stage_m <- stage_matrix_gen$generate(input_values = list(growth_r = 0.25))
gen_stage_m # examine
#> $stage_matrix
#> [,1] [,2] [,3]
#> [1,] 0.0000000 0.5923549 1.2158863
#> [2,] 0.5196095 0.0000000 0.0000000
#> [3,] 0.0000000 0.8313752 0.8313752
```
##### Dispersal generator
We already built our dispersal generator in step 1, so as to use it to calculate
neighborhood distances for density dependence. Our distance-based dispersal generator
will use sampled values for the dispersal proportion (*p*) and breadth (*b*) parameters,
which are described in the generator's help documentation (*?DispersalGenerator*). We
will now use it to generate some example dispersal data, based on our best estimates for
the *p* and *b* parameters.
```r
# Generate sampled dispersals for p = 0.5, b = 7 (km)
sample_dispersal_data <- dispersal_gen$generate(
input_values = list(dispersal_p = 0.5, dispersal_b = 7)
)$dispersal_data
head(sample_dispersal_data[[1]]) # examine
#> target_pop source_pop emigrant_row immigrant_row dispersal_rate
#> 1 2 1 1 1 0.13164
#> 2 3 1 2 1 0.04198
#> 3 4 1 3 1 0.01161
#> 4 5 1 4 1 0.06444
#> 5 6 1 5 1 0.08576
#> 6 7 1 6 1 0.06444
```
### Example model run
Before proceeding to step 3 to setup multiple model simulations, let's run our model
with its fixed parameter values, those initially set in our user-defined density
dependence and harvest functions, along with our generated example initial abundance,
carrying capacity, stage matrix, and dispersal data. We will do this by cloning our
model template and setting our example parameters before running the population
simulator.
```r
# Run the model with example parameters
model <- model_template$clone()
model$set_attributes(
initial_abundance = example_initial_abundance,
carrying_capacity = example_carrying_capacity,
stage_matrix = gen_stage_m$stage_matrix,
dispersal = sample_dispersal_data
)
results <- population_simulator(model) # run poems simulator
# Plot the total abundance and number harvested
plot(
x = 1888:1967, y = results$all$abundance, xlab = "Year",
ylab = "Number of thylacines", main = "Thylacine example model run",
ylim = c(0, 2500), type = "l", col = "green", lwd = 2
)
lines(x = 1888:1967, y = results$all$harvested, lty = 1, col = "blue", lwd = 2)
legend("topright",
legend = c("Population size", "Simulated harvest"),
col = c("green", "blue"), lty = c(1, 1), lwd = 2, cex = 0.8
)
```
### Step 3: Sample model and generator parameters for each simulation
In order to explore the model parameter space to find the best models, we will now
generate 20,000 Latin hypercube samples of model and generator parameters using the
*LatinHypercubeSampler* class. We'll sample each parameter from uniform distributions
across parameter ranges derived from Bulte et al. (2003), and via model trial runs.
```r
# Create a LHS object
lhs_gen <- LatinHypercubeSampler$new()
# Set capacity and growth parameters (as per Bulte et al., 2003)
lhs_gen$set_uniform_parameter("k_init", lower = 2100, upper = 3500, decimals = 0)
lhs_gen$set_uniform_parameter("k_decline", lower = 0.03, upper = 0.05, decimals = 3)
lhs_gen$set_uniform_parameter("k_phi", lower = 0.6, upper = 1.0, decimals = 2)
lhs_gen$set_uniform_parameter("growth_r", lower = 0.1875, upper = 0.3125, decimals = 2)
# Set density dependence allee effect parameter
lhs_gen$set_uniform_parameter("density_allee", lower = 0, upper = 50, decimals = 1)
# Set bio-economic harvest parameters (as per Bulte et al., 2003)
lhs_gen$set_uniform_parameter("harvest_ohr", lower = 0, upper = 0.1, decimals = 3)
lhs_gen$set_uniform_parameter("harvest_fb", lower = 0.5, upper = 1.0, decimals = 2)
lhs_gen$set_uniform_parameter("harvest_w", lower = 2.625, upper = 4.375, decimals = 1)
lhs_gen$set_uniform_parameter("harvest_E0", lower = 18.75, upper = 31.25, decimals = 0)
lhs_gen$set_uniform_parameter("harvest_q", lower = 0, upper = 0.005, decimals = 4)
lhs_gen$set_uniform_parameter("harvest_v1", lower = 0.015, upper = 0.025, decimals = 3)
lhs_gen$set_uniform_parameter("harvest_v2", lower = 0.375, upper = 0.625, decimals = 3)
# Set new spatial parameters for dispersal
lhs_gen$set_uniform_parameter("dispersal_p", lower = 0.3, upper = 0.7, decimals = 2)
lhs_gen$set_uniform_parameter("dispersal_b", lower = 4, upper = 10, decimals = 1)
# Generate samples
sample_data <- lhs_gen$generate_samples(number = SAMPLES, random_seed = 123)
head(sample_data) # examine
#> k_init k_decline k_phi growth_r density_allee harvest_ohr harvest_fb
#> 1 3451 0.050 0.67 0.27 46.1 0.016 0.64
#> 2 3208 0.048 0.80 0.28 1.0 0.052 0.50
#> 3 2375 0.033 0.96 0.29 30.9 0.072 0.76
#> 4 2733 0.042 0.87 0.25 23.5 0.057 0.63
#> 5 3048 0.040 0.64 0.19 46.4 0.074 0.74
#> 6 3250 0.031 0.76 0.21 38.5 0.081 0.99
#> harvest_w harvest_E0 harvest_q harvest_v1 harvest_v2 dispersal_p dispersal_b
#> 1 3.3 30 0.0025 0.020 0.555 0.59 5.5
#> 2 2.9 23 0.0009 0.017 0.534 0.33 5.1
#> 3 3.8 29 0.0023 0.019 0.406 0.68 8.1
#> 4 4.0 26 0.0008 0.017 0.510 0.54 4.8
#> 5 3.1 21 0.0049 0.019 0.444 0.54 5.0
#> 6 2.9 24 0.0021 0.016 0.463 0.57 6.7
dim(sample_data) # dimensions
#> [1] 20000 14
```
### Step 4: Build a simulation manager to run each simulation
We will now setup a *SimulationManager* class to manage the simulation of each set
(or row) of sampled parameters. In demonstration mode we will only run the simulations
for the first two samples. The *poems* package provides example simulation summary data
for all 20,000 samples (which is used in our model ensemble selection and validation).
```r
OUTPUT_DIR <- tempdir()
# Build the simulation manager
sim_manager <- SimulationManager$new(
sample_data = sample_data,
model_template = model_template,
generators = list(capacity_gen, stage_matrix_gen, dispersal_gen),
parallel_cores = PARALLEL_CORES,
results_dir = OUTPUT_DIR
)
# Run the simulations
if (DEMONSTRATION) {
sim_manager$sample_data <- sample_data[1:2, ]
}
run_output <- sim_manager$run()
run_output$summary
#> [1] "2 of 2 sample models ran and saved results successfully"
if (DEMONSTRATION) {
dir(OUTPUT_DIR, "*.RData") # includes 2 result files
}
#> [1] "sample_1_results.RData" "sample_2_results.RData"
#> [3] "sample_93_rerun_1_results.RData" "sample_93_rerun_2_results.RData"
dir(OUTPUT_DIR, "*.txt") # plus simulation log
#> [1] "generation_log.txt" "simulation_log.txt"
```
Note that the output directory contains a R-data result files for each sample simulation
and a simulation log file.
### Step 5: Build a results manager to generate summary results (metrics)
We will now collate summary result metrics for each of our simulations using the
*ResultsManager* class. Here we will generate three metrics:
1. Regression slopes of total bounty submitted over three time intervals.
1. Estimated IBRA bioregion extirpation dates.
1. Estimated Tasmania-wide extinction date.
The metrics (and any desired summary matrices) are calculated via user-defined functions
operating on, or direct attributes of, *PopulationResults* class objects.
##### Using the *PopulationResults* class
To setup and test our summary metric functions, we can load the results from our example
simulation run into a *PopulationResults* class object. A time-series of IBRA bioregion
bounty values are attached to our results via our harvest function. This can be used to
calculate the regression slopes for specified intervals. To obtain a time-series of
bioregion abundance values, we can include the bioregion cell indices in our result
class object (as an attachment). We can then use this to calculate a time-series of
abundance for each bioregion, and thus calculate extirpation times for each bioregion.
We will use *PopulationResults* functionality for calculating population (cell)
abundance slopes (trends) and extirpations via result object cloning. Our clones will
instead be initialized with the bounty or bioregion abundance values. The Tasmania-wide
extinction time is directly available from the results object.
```r
# Load our results (list) into a PopulationResults object
p_results <- PopulationResults$new(
results = results,
ibra_indices = ibra_indices
)
# Summary metrics for IBRA bioregions and Tasmania-wide extinction
ibra_bounty <- p_results$get_attribute("bounty") # saved in harvest function
ibra_bounty_clone <- p_results$new_clone(
results = list(abundance = ibra_bounty),
trend_interval = (1888:1894) - 1887
)
ibra_bounty_clone$all$abundance_trend # 1888-1894 total bounty slope
#> [1] -5
ibra_abundance <- t(array(unlist(lapply(
p_results$get_attribute("ibra_indices"),
function(indices) {
colSums(p_results$abundance[indices, ])
}
)), c(80, 9)))
ibra_abundance_clone <- p_results$new_clone(results = list(abundance = ibra_abundance))
(1888:1967)[ibra_abundance_clone$extirpation] # IBRA extirpation
#> [1] 1935 1930 1938 1937 1942 1939 1941 1941 1939
(1888:1967)[p_results$all$extirpation] # total extinction
#> [1] 1942
```
##### Generating summary metrics and matrices
Once we're happy with how to derive our summary metrics (and matrices) via the
*PopulationResults* class, we can setup our results manager to generate the metrics
and matrix (rows) for each sample simulation results (file). Note that passing the
simulation manager when initializing the results manager copies the previous setup
(sample data, output directory, etc.).
Our bounty regression slope and IBRA bioregion extirpation metrics can be combined into
single metrics by calculating the square root of the mean square error (RMSE) of the
simulated metrics from known or derived target values, or confidence intervals (CI),
for each time interval or bioregion. Thus our targets for each of these RMSE metrics
(in step 6) will be zero. We can calculate these combined (RMSE) metrics using our
results manager by including (or attaching) the target values/CI to our results object.
However, if we wish to retain flexibility in how we combine our individual metrics, or
how we deal with *NA* values (e.g. no extirpation) in step 6, we can alternatively
store our metrics for each slope interval or bioregion in generated matrix rows and
calculate the difference with targets later. Here we will demonstrate both approaches,
although the matrix approach requires more disk space.
In demonstration mode, we will again only generate metrics for the first two samples.
We will load pre-generated example summaries for all 20,000 sample simulations, provided
with the *poems* package prior to our validation and ensemble selection (in step 6).
```r
# Set targets for our summary metrics (used to calculate combined metric errors)
slope_intervals <- c("1888-1894", "1895-1901", "1902-1909")
targets <- list(
bounty_slope = array(c(2.36, 3.25, -17.71), dimnames = list(slope_intervals)),
ibra_extirpation = array(
c(
c(NA, NA), c(1934, 1934), c(1912, 1919), c(1921, 1940),
c(1936, 1938), c(1935, 1935), c(1934, 1942),
c(1934, 1934), c(1932, 1932)
), c(2, 9),
dimnames = list(c("lower", "upper"), tasmania_ibra_data$abbr)
),
total_extinction = c(lower = 1936, upper = 1942) # CI
)
# Create a results manager for summary metrics and matrices
results_manager <- ResultsManager$new(
simulation_manager = sim_manager,
simulation_results = PopulationResults$new(
ibra_indices = ibra_indices, # attachments
targets = targets,
extirp_NA_replace = 1968
),
result_attachment_functions = list( # attached for multiple use
bounty_slope = function(results) { # via results object cloning
bounty_slope <- array(NA, 3)
ibra_bounty <- results$get_attribute("bounty") # saved in harvest function
ibra_bounty_clone <- results$new_clone(
results = list(abundance = ibra_bounty),
trend_interval = (1888:1894) - 1887
)
bounty_slope[1] <- ibra_bounty_clone$all$abundance_trend
ibra_bounty_clone <- results$new_clone(
results = list(abundance = ibra_bounty),
trend_interval = (1895:1901) - 1887
)
bounty_slope[2] <- ibra_bounty_clone$all$abundance_trend
ibra_bounty_clone <- results$new_clone(
results = list(abundance = ibra_bounty),
trend_interval = (1902:1909) - 1887
)
bounty_slope[3] <- ibra_bounty_clone$all$abundance_trend
bounty_slope
},
ibra_extirpation = function(results) { # via results object cloning
ibra_abundance_clone <- results$new_clone(results = list(
abundance = t(array(unlist(lapply(
results$get_attribute("ibra_indices"),
function(indices) {
colSums(results$abundance[indices, ])
}
)), c(80, 9)))
))
(1888:1967)[ibra_abundance_clone$extirpation]
}
),
summary_metrics = c(
"bounty_slope_error", "ibra_extirpation_error",
"total_extinction"
),
summary_matrices = c(
"extirpation", "total_bounty", "ibra_bounty",
"bounty_slope", "ibra_extirpation"
),
summary_functions = list(
# Summary metrics
bounty_slope_error = function(results) { # RMSE
sqrt(mean((results$get_attribute("targets")$bounty_slope -
results$get_attribute("bounty_slope"))^2))
},
ibra_extirpation_error = function(results) { # RMSE with NAs replaced
ibra_extirpation <- results$get_attribute("ibra_extirpation")
ibra_extirpation[is.na(ibra_extirpation)] <-
results$get_attribute("extirp_NA_replace")
target_CI <- results$get_attribute("targets")$ibra_extirpation
sqrt(mean(
((ibra_extirpation < target_CI[1, ]) * (ibra_extirpation - target_CI[1, ]) +
(ibra_extirpation > target_CI[2, ]) * (ibra_extirpation - target_CI[2, ]))^2,
na.rm = TRUE
))
},
total_extinction = function(results) {
(1888:1967)[results$all$extirpation]
},
# Summary matrices
extirpation = function(results) { # for later use
(1888:1967)[results$extirpation]
},
total_bounty = function(results) { # for later use
colSums(results$get_attribute("bounty"))
},
ibra_bounty = function(results) { # for later use
rowSums(results$get_attribute("bounty"))
},
bounty_slope = function(results) { # calculate RMSE later
results$get_attribute("bounty_slope")
},
ibra_extirpation = function(results) { # calculate RMSE later
results$get_attribute("ibra_extirpation")
}
),
parallel_cores = PARALLEL_CORES
)
# Generate the summary metrics and matrices
gen_output <- results_manager$generate()
gen_output$summary
#> [1] "2 of 2 summary metrics/matrices generated from sample results successfully"
dir(OUTPUT_DIR, "*.txt") # plus generation log
#> [1] "generation_log.txt" "simulation_log.txt"
summary_metric_data <- results_manager$summary_metric_data
summary_matrix_list <- results_manager$summary_matrix_list
head(summary_metric_data) # examine
#> index bounty_slope_error ibra_extirpation_error total_extinction
#> 1 1 11.271672 12.03121 1950
#> 2 2 9.343379 32.79672 NA
lapply(summary_matrix_list, dim) # dimensions
#> $extirpation
#> [1] 2 795
#>
#> $total_bounty
#> [1] 2 80
#>
#> $ibra_bounty
#> [1] 2 9
#>
#> $bounty_slope
#> [1] 2 3
#>
#> $ibra_extirpation
#> [1] 2 9
head(summary_matrix_list$bounty_slope) # examine
#> [,1] [,2] [,3]
#> [1,] 1.75 -9.666667 -3.083333
#> [2,] -1.00 -2.600000 -3.000000
head(summary_matrix_list$ibra_extirpation) # examine
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
#> [1,] 1939 1941 1941 1939 1950 1945 1946 1948 1945
#> [2,] 1952 1954 1967 NA NA NA NA NA NA
```
Note that the summary matrix list collates the specified result for each simulation
on a single row of each matrix. Thus the columns in the extirpation matrix represent
the extirpation time for each population cell, the columns in the total bounty and IBRA
bounty matrices represent the total bounty at each of the 80 simulation time steps, and
for each IBRA bioregion respectively. Similarly, the columns in the bounty slope matrix
represent regression slopes for each of the three time intervals, and the IBRA
extirpation matrix has a column for each of our nine bioregions.
##### Summary metric refinement
We can now collate and refine the summary metrics that we wish to utilize in our
validation and ensemble model selection (in step 6). Although we have already calculated
RMSE metrics for our bounty regression slopes and IBRA extirpation dates, we could
refine these by utilizing their uncombined values in the corresponding matrices. Here
we will reproduce these metrics for our two samples for demonstration purposes (in
demonstration mode), before loading the full pre-generated example metrics summary set.
We will also calculate the error from the target confidence interval (CI) for our
simulated total extinction dates for our full metrics set.
```r
# Demonstrate calculating RMSE metrics from matrices
if (DEMONSTRATION) { # Calculate RMSE for bounty slopes
bounty_slope_error2 <- sqrt(rowMeans((summary_matrix_list$bounty_slope -
matrix(targets$bounty_slope,
nrow = 2,
ncol = 3, byrow = TRUE
))^2))
cbind(
bounty_slope_error = summary_metric_data$bounty_slope_error,
bounty_slope_error2
) # examine
}
#> bounty_slope_error bounty_slope_error2
#> [1,] 11.271672 11.271672
#> [2,] 9.343379 9.343379
if (DEMONSTRATION) { # Calculate RMSE for IBRA extirpation
ibra_extirpation <- summary_matrix_list$ibra_extirpation
ibra_extirpation[is.na(ibra_extirpation)] <- 1968
target_CI <- array(targets$ibra_extirpation, c(dim(targets$ibra_extirpation), 2))
ibra_extirpation_error2 <- sqrt(rowMeans(
((ibra_extirpation < t(target_CI[1, , ])) * (ibra_extirpation - t(target_CI[1, , ])) +
(ibra_extirpation > t(target_CI[2, , ])) * (ibra_extirpation - t(target_CI[2, , ])))^2,
na.rm = TRUE
))
cbind(
ibra_extirpation_error = summary_metric_data$ibra_extirpation_error,
ibra_extirpation_error2
) # examine
}
#> ibra_extirpation_error ibra_extirpation_error2
#> [1,] 12.03121 12.03121
#> [2,] 32.79672 32.79672
# Load full example metrics
if (DEMONSTRATION) {
data(thylacine_example_metrics)
dim(thylacine_example_metrics) # dimensions
summary_metric_data <- thylacine_example_metrics
}
# Calculate the error from the CI of total extinction
extinct <- summary_metric_data$total_extinction
target_CI <- targets$total_extinction
summary_metric_data$total_extinction_error <-
((extinct < target_CI[1]) * (extinct - target_CI[1]) +
(extinct > target_CI[2]) * (extinct - target_CI[2]))
head(summary_metric_data) # examine
#> index bounty_slope_error ibra_extirpation_error total_extinction
#> 1 1 11.66632 22.597013 1961
#> 2 2 10.00490 31.334885 NA
#> 3 3 17.04603 9.480243 1925
#> 4 4 10.35301 25.763346 NA
#> 5 5 20.13609 31.040296 1902
#> 6 6 21.62111 22.605309 1913
#> total_extinction_error
#> 1 19
#> 2 NA
#> 3 -11
#> 4 NA
#> 5 -34
#> 6 -23
```
Here we could have also replaced *NA* values, those indicating that a population
persisted, when calculating the error of the total extinction date (from the target CI),
however we will defer this so as to demonstrate *NA* replacement functionality in the
validator in the next step.
### Step 6: Build a validator to select a model ensemble
We will now validate and select our model ensemble via the *Validator* class, which by
default utilizes an approximate Bayesian computation (ABC) approach (Beaumont, Zhang,
& Balding, 2002) provided by the *abc* library (Csillery et al., 2015). We will use the
default configuration to select the best 200 models (via a *tolerance* of 0.01). Our
targets for selecting the best models will be zero for our RMSE metrics for bounty
slope, IBRA extirpation, and total extinction. Note that for models where our simulated
*thylacine* persist until the end of the simulation (1967), the total extinction value
will be *NA*. Since the *abc* function requires finite values only, we will use the
*non_finite_replacements* attribute of the validator to replace these *NA* values with
the RMSE (based on target CI) corresponding to 1968. Stronger penalties could be applied
if desired.
Note that the validator generates diagnostics (as per the *abc* documentation) as a PDF
file in the configured output directory. We can configure this directory, because
locating the default output directory generated via *tempdir()* can be tedious.
```r
# Create a validator for selecting the 'best' example models
validator <- Validator$new(
simulation_parameters = sample_data,
simulation_summary_metrics = summary_metric_data[c(
"bounty_slope_error",
"ibra_extirpation_error",
"total_extinction_error"
)],
observed_metric_targets = c(
bounty_slope_error = 0,
ibra_extirpation_error = 0,
total_extinction_error = 0
),
non_finite_replacements = list(total_extinction_error = function(x) {
(1968 - targets$total_extinction[2])
}),
output_dir = OUTPUT_DIR
)
suppressWarnings(validator$run(
tolerance = 0.01, sizenet = 1, lambda = 0.0001,
output_diagnostics = TRUE
))
#> 12345678910
#> 12345678910
dir(OUTPUT_DIR, "*.pdf") # plus validation diagnostics (see abc library documentation)
#> [1] "validation_diagnostics.pdf"
head(validator$selected_simulations) # examine
#> index weight
#> 1 93 0.02871514
#> 2 99 0.08332668
#> 3 164 0.21062364
#> 4 248 0.11186737
#> 5 309 0.43248925
#> 6 337 0.15984413
dim(validator$selected_simulations) # dimensions
#> [1] 200 2
selected_indices <- validator$selected_simulations$index
selected_weights <- validator$selected_simulations$weight
```
Note that each selected model (via its index in the sample data frame) have a
corresponding weight value, which is indicative of the congruence between each model's
summary metrics and the corresponding target patterns.
### Selected model ensemble
Now that we have selected our model ensemble we can examine:
* how well our simulations did in hitting validation targets;
* which model parameters were most influential in our ensemble selection; and
* the impact of model stochasticity on our ensemble selection.
We can also verify our model via historic bounty record data not explicitly utilized in
our validation and model selection.
Lastly, we can use our model ensemble to examine the spatio-temporal extirpation pattern
of our simulated *thylacine* decline.
##### Model ensemble summary metrics
To examine how well our individual (uncombined) metric targets have been matched, let's
now plot the distributions of our summary metrics of our selected model ensemble, as
well as those of the full 20,000 model simulations.
When in demonstration mode, we will load pre-generated example matrices, provided with
the *poems* package, containing bounty slope and IBRA bioregion extirpations for all
20,000 sample simulations. The example matrices also include cell extirpation dates, and
a time-series and IBRA distribution of total bounty, which we will use later.
```r
# Load pre-generated example matrices
if (DEMONSTRATION) {
data(thylacine_example_matrices)
summary_matrix_list <- thylacine_example_matrices
} else { # cell extirpation and total/ibra bounty for selected samples only
summary_matrix_list$extirpation <- summary_matrix_list$extirpation[selected_indices, ]
summary_matrix_list$total_bounty <- summary_matrix_list$total_bounty[selected_indices, ]
summary_matrix_list$ibra_bounty <- summary_matrix_list$ibra_bounty[selected_indices, ]
}
lapply(summary_matrix_list, dim) # dimensions
#> $extirpation
#> [1] 200 795
#>
#> $total_bounty
#> [1] 200 80
#>
#> $ibra_bounty
#> [1] 200 9
#>
#> $bounty_slope
#> [1] 20000 3
#>
#> $ibra_extirpation
#> [1] 20000 9
# Plot the simulation, targets, and selected metrics for bounty slopes
bounty_slope <- summary_matrix_list$bounty_slope
colnames(bounty_slope) <- slope_intervals
graphics::boxplot(bounty_slope,
border = rep("gray", 3), outline = FALSE, range = 0,
col = NULL, ylim = c(-35, 20), main = "Thylacine bounty slope",
xlab = "Slope interval (years)", ylab = "Bounty regression slope"
)
graphics::boxplot(cbind(bounty_slope[selected_indices, ], NA), # NA for relative width
border = rep("blue", 3), width = c(rep(0.5, 3), 1), outline = FALSE,
range = 0, add = TRUE, col = NULL
)
graphics::points(x = 1:3, y = targets$bounty_slope, col = "red", pch = 15)
legend("topright", c("All", "Target", "Selected"),
fill = c("gray", "red", "blue"),
border = NA, cex = 0.8
)
```
From this plot we see that bounty slopes from all 20,000 sample models do not all align
well with our bounty slope targets, however, bounty slopes from some models do
reasonably well at matching targets. These are the models that have been selected.
Generally, the distribution of our full sample set does not facilitate model congruence
with the targets, suggesting that our harvest function may be inadequate for generating
the expected temporal contour of the total *thylacine* bounty. This inadequacy becomes
clearer later, when we plot the total bounty across time for our model ensemble.
```r
# Plot the simulation, targets, and selected metrics for extirpation
extirpation <- cbind(
summary_matrix_list$ibra_extirpation,
summary_metric_data$total_extinction
)
colnames(extirpation) <- c(as.character(tasmania_ibra_data$abbr), "Total")
graphics::boxplot(extirpation[, 10:1],
border = rep("gray", 3), horizontal = TRUE,
outline = FALSE, range = 0, col = NULL, ylim = c(1888, 1968),
main = "Thylacine model IBRA extirpation and total extinction",
xlim = c(0.5, 11), xlab = "Extirpation/extinction time (year)",
ylab = "IBRA bioregion/Tasmania-wide total"
)
graphics::boxplot(cbind(extirpation[selected_indices, 10:1], NA),
horizontal = TRUE,
ylim = c(1888, 1968), border = rep("blue", 10),
width = c(rep(0.5, 10), 1), outline = FALSE, range = 0,
add = TRUE, col = NULL
)
for (i in 1:9) {
graphics::points(
x = targets$ibra_extirpation[, i], y = rep(11 - i, 2),
col = "red", pch = 15
)
graphics::lines(
x = targets$ibra_extirpation[, i], y = rep(11 - i, 2),
col = "red", lwd = 2
)
}
graphics::points(x = targets$total_extinction, y = c(1, 1), col = "red", pch = 15)
graphics::lines(x = targets$total_extinction, y = c(1, 1), col = "red", lwd = 2)
legend("topright", c("All", "Target (CI)", "Selected"),
fill = c("gray", "red", "blue"),
border = NA, cex = 0.8
)
```
Here we see that the simulated extirpation dates for individual IBRA bioregions vary in
their congruence with the target values, with some strong matches, including our total
Tasmania-wide extirpation date. These results may contribute insight into the degree to
which our sampled model harvest parameters can influence the spatial dynamics of the
simulated *thylacine* decline.
##### Model ensemble parameters
We can now examine the features of our selected model ensemble to analyze its properties
and adequacy. We can plot the distribution of our 200 selected model parameters in
comparison to the (uniform) distribution of the full 20,000 simulated models. Here we
will examine two notable example parameters: density Allee effect and harvest
catchability.
```r
# Allee effect
hist(sample_data$density_allee,
breaks = 30, main = "Model ensemble Allee effect",
xlab = "Allee effect", ylim = c(0, 1000), col = "gray", yaxt = "n"
)
hist(rep(sample_data$density_allee[selected_indices], 20), breaks = 20, col = "blue", add = TRUE)
legend("topright", c("All", "Selected"), fill = c("gray", "blue"), cex = 0.8)
```
```r
# Harvest catchability
hist(sample_data$harvest_q,
breaks = 30, main = "Model ensemble harvest catchability",
xlab = "Harvest catchability (q)", col = "gray"
) # , yaxt = "n")
hist(rep(sample_data$harvest_q[selected_indices], 20), breaks = 20, col = "blue", add = TRUE)
legend("topright", c("All", "Selected"), fill = c("gray", "blue"), cex = 0.8)
```
From the first plot we could note the general reliance our model has on the Allee effect
in matching extirpation/extinction targets (or non-persistence in general). The second
plot illustrates how the validation/selection process has "tuned" our harvest
catchability parameter.
We can also explore relationships between selected model parameters, thus identifying
which parameter combinations contributed to our selected model ensemble's congruence
with our metric targets. Here we examine one notable correlation between our selected
model opportunistic harvest rate and our catchability parameter.
```r
plot(
x = sample_data$harvest_ohr, y = sample_data$harvest_q, ylim = c(0, 0.0055),
xlab = "Opportunistic harvest rate", ylab = "Bio-economic harvest catchability",
main = "Opportunistic harvest vs. catchability", col = "gray"
)
points(
x = sample_data$harvest_ohr[selected_indices],
y = sample_data$harvest_q[selected_indices], col = "blue", pch = 3
)
graphics::legend("topright",
legend = c("All samples", "Selected"),
col = c("gray", "blue"), pch = c(1, 3), cex = 0.8
)
```
From the diagnostic output (PDF file in our output directory), we can also examine
the posterior distributions of our 200 selected model parameters relative to their
prior distributions. This can give us further insight into the influence of each
model parameter has had on our model selection.
From these diagnostics and plots we can analyze and assess the adequacy of our model
and/or summary metrics. We may choose to refine our model parameter ranges or our
summary metrics. We can also choose to simulate a more extensive parameter space,
select fewer or more models via our validator's *tolerance* parameter, or revisit
our model structure. We may also wish to examine the role (demographic) stochasticity
has played in our model selection.
##### Model ensemble stochasticity
We will now re-run each of our selected 200 models ten times, so as to examine the
impact of demographic stochasticity on our summary metrics. Again, in demonstration
mode, we will only run the first two re-run simulations, then load example re-run
summary metrics and matrices provided by the *poems* package.
Note that in order to not override existing results files we can either change our
output directory, or alternatively, as we do here, change the naming of the output
files via the *results_filename_attributes* attribute of the simulation and results
managers.
```r
# Run replicates of 10 for each selected model
sample_data_rerun <- cbind(sample = 1:nrow(sample_data), sample_data)
sample_data_rerun <- cbind(sample_data_rerun[rep(selected_indices, each = 10), ],
rerun = rep(1:10, length(selected_indices))
)
head(sample_data_rerun) # examine
#> sample k_init k_decline k_phi growth_r density_allee harvest_ohr
#> 93 93 3153 0.042 0.73 0.29 43.2 0.072
#> 93.1 93 3153 0.042 0.73 0.29 43.2 0.072
#> 93.2 93 3153 0.042 0.73 0.29 43.2 0.072
#> 93.3 93 3153 0.042 0.73 0.29 43.2 0.072
#> 93.4 93 3153 0.042 0.73 0.29 43.2 0.072
#> 93.5 93 3153 0.042 0.73 0.29 43.2 0.072
#> harvest_fb harvest_w harvest_E0 harvest_q harvest_v1 harvest_v2
#> 93 0.66 3.7 28 0.0014 0.016 0.414
#> 93.1 0.66 3.7 28 0.0014 0.016 0.414
#> 93.2 0.66 3.7 28 0.0014 0.016 0.414
#> 93.3 0.66 3.7 28 0.0014 0.016 0.414
#> 93.4 0.66 3.7 28 0.0014 0.016 0.414
#> 93.5 0.66 3.7 28 0.0014 0.016 0.414
#> dispersal_p dispersal_b rerun
#> 93 0.39 8.4 1
#> 93.1 0.39 8.4 2
#> 93.2 0.39 8.4 3
#> 93.3 0.39 8.4 4
#> 93.4 0.39 8.4 5
#> 93.5 0.39 8.4 6
if (DEMONSTRATION) {
sim_manager$sample_data <- sample_data_rerun[1:2, ]
} else {
sim_manager$sample_data <- sample_data_rerun
}
sim_manager$results_filename_attributes <- c("sample", "rerun")
run_output <- sim_manager$run()
run_output$summary
#> [1] "2 of 2 sample models ran and saved results successfully"
if (DEMONSTRATION) {
dir(OUTPUT_DIR, "*.RData") # includes 2 new result files
}
#> [1] "sample_1_results.RData" "sample_2_results.RData"
#> [3] "sample_93_rerun_1_results.RData" "sample_93_rerun_2_results.RData"
# Collate summary metrics for re-runs
if (DEMONSTRATION) {
results_manager$sample_data <- sample_data_rerun[1:2, ]
} else {
results_manager$sample_data <- sample_data_rerun
}
results_manager$summary_matrices <- c("bounty_slope", "ibra_extirpation")
results_manager$results_filename_attributes <- c("sample", "rerun")
gen_output <- results_manager$generate()
gen_output$summary
#> [1] "2 of 2 summary metrics/matrices generated from sample results successfully"
if (DEMONSTRATION) {
results_manager$summary_metric_data # examine demo
}
#> index bounty_slope_error ibra_extirpation_error total_extinction
#> 1 1 11.61502 5.000000 1933
#> 2 2 10.30186 4.168333 1933
if (DEMONSTRATION) { # load full example metrics and (some) matrices
data(thylacine_example_metrics_rerun)
summary_metric_data_rerun <- thylacine_example_metrics_rerun
data(thylacine_example_matrices_rerun)
summary_matrix_list_rerun <- thylacine_example_matrices_rerun
} else {
summary_metric_data_rerun <- results_manager$summary_metric_data
summary_matrix_list_rerun <- results_manager$summary_matrix_list
}
head(summary_metric_data_rerun) # examine
#> index bounty_slope_error ibra_extirpation_error total_extinction
#> 1 1 10.51599 7.516648 1929
#> 2 2 11.47151 4.227884 1934
#> 3 3 11.57192 4.358899 1934
#> 4 4 11.42602 5.809475 1938
#> 5 5 11.36964 6.304760 1930
#> 6 6 12.07262 5.937171 1933
dim(summary_metric_data_rerun) # dimensions
#> [1] 2000 4
lapply(summary_matrix_list_rerun, dim) # dimensions
#> $bounty_slope
#> [1] 2000 3
#>
#> $ibra_extirpation
#> [1] 2000 9
```
Note that the output directory contains new R-data result files for each sample rerun
simulation, named via a unique combination of sample data frame attributes.
Now we can compare the distributions of our metrics for all, selected, and re-run models
to better assess the impact of stochasticity. Note that simulations that persist are
represented via an extirpation date of 1968.
```r
# Bounty slope error
bounty_slope_error <- summary_metric_data$bounty_slope_error
hist(bounty_slope_error,
breaks = 140, main = "Thylacine bounty slope error",
xlim = c(0, 50), xlab = "Bounty slope RMSE", col = "gray", yaxt = "n"
)
bounty_slope_error_r <- summary_metric_data_rerun$bounty_slope_error
hist(rep(bounty_slope_error_r, 2), breaks = 20, col = "gold3", add = TRUE)
hist(rep(bounty_slope_error[selected_indices], 5),
breaks = 12,
col = "blue", add = TRUE
)
lines(x = rep(0, 2), y = c(0, 10000), col = "red", lwd = 2)
legend("topright", c("All", "Target", "Selected", "Replicates"),
fill = c("gray", "red", "blue", "gold3"), cex = 0.8
)
```
```r
# IBRA extirpation error
ibra_extirpation_error <- summary_metric_data$ibra_extirpation_error
hist(bounty_slope_error,
breaks = 140, main = "Thylacine IBRA extirpation error",
xlim = c(0, 50), xlab = "IBRA extirpation RMSE", col = "grey", yaxt = "n"
)
ibra_extirpation_error_r <- summary_metric_data_rerun$ibra_extirpation_error
hist(rep(ibra_extirpation_error_r, 2), breaks = 50, col = "gold3", add = TRUE)
hist(rep(ibra_extirpation_error[selected_indices], 5),
breaks = 20, col = "blue",
add = TRUE
)
lines(x = rep(0, 2), y = c(0, 10000), col = "red", lwd = 2)
legend("topright", c("All", "Target", "Selected", "Replicates"),
fill = c("grey", "red", "blue", "gold3"), cex = 0.8
)
```
```r
# Extinction time
extinction_time <- summary_metric_data$total_extinction
persistent_number <- length(which(is.na(extinction_time)))
extinction_time_finite <- extinction_time[!is.na(extinction_time)]
extinction_time_modified <-
hist(c(extinction_time_finite, rep(1968, round(persistent_number / 10))),
breaks = 81,
main = "Thylacine extinction", xlim = c(1888, 1968),
xlab = "Extinction time (year)", col = "gray40", yaxt = "n"
)
hist(extinction_time_finite, breaks = 81, col = "gray", add = TRUE)
extinction_time_rerun <- summary_metric_data_rerun$total_extinction
extinction_time_rerun[which(is.na(extinction_time_rerun))] <- 1968
hist(rep(extinction_time_rerun, 2), breaks = 50, col = "gold3", add = TRUE)
hist(rep(extinction_time[selected_indices], 5), breaks = 28, col = "blue", add = TRUE)
lines(x = rep(1931, 2), y = c(0, 10000), col = "red", lwd = 2)
lines(x = rep(1937, 2), y = c(0, 10000), col = "red", lwd = 2)
legend("topleft", c("All", "(persistent/10)", "Target (CI)", "Selected", "Replicates"),
fill = c("gray", "gray40", "red", "blue", "gold3"), cex = 0.8
)
```
From these plots we can observe that stochastic processes operating in our model have
some impact on our model validation and ensemble selection, since the distributions
of the metrics have slightly widened in our model ensemble replicate re-runs. However,
they have retained their proximity to the targets well. We could further analyze the
effects of stochasticity by exploring approaches for identifying which of our selected
models are congruent with our target metrics most frequently.
##### Model ensemble verification
We can verify the adequacy of our model by examining how well it produces observed data
patterns that were not (explicitly) utilized in the model calibration, validation and
selection stages. While the slopes of the bounty were used for validating and selecting
our model ensemble, the actual bounty magnitudes were not. Let's now use our model
ensemble to compare our simulated total bounty and the total historical (time-series)
records across time (provided with the *poems* package). Here we will use the selected
model weights to calculate a weighted average of simulated bounty submitted for the
selected ensemble and their replicates.
```r
# Compare weighted ensemble model to actual historical bounty time-series
data(thylacine_bounty_record)
selected_bounty <- summary_matrix_list$total_bounty
weighted_bounty <- colSums(selected_bounty * selected_weights / sum(selected_weights))
# Plot the simulated and actual bounty
plot(
x = 1888:1909, y = weighted_bounty[1:22], xlab = "Year",
ylab = "Number of thylacines", main = "Thylacine bounty submitted",
ylim = c(0, 200), type = "l", col = "blue", lwd = 2
)
lines(x = 1888:1909, y = thylacine_bounty_record$Total, lty = 1, col = "red", lwd = 2)
legend("topright",
legend = c("Model ensemble bounty", "Actual bounty"),
col = c("blue", "red"), lty = c(1, 1), lwd = 2, cex = 0.8
)
```
We can also compare our simulated total bounty submitted, both across IBRA bioregions
and Tasmania-wide, to our historical bounty record.
```r
# Compare weighted ensemble model to actual historical bioregion bounty values
selected_bounty <- cbind(
summary_matrix_list$ibra_bounty,
rowSums(summary_matrix_list$ibra_bounty)
)
weighted_bounty <- colSums(selected_bounty * selected_weights / sum(selected_weights))
combined_bounty <- rbind(
weighted_bounty,
actual_bounty <- colSums(thylacine_bounty_record[, c(3:11, 2)])
)
combined_bounty[, 10] <- combined_bounty[, 10] / 2
# Comparative plot of simulated and historic IBRA/total bounty
barplot(combined_bounty,
xlab = "IBRA bioregion/Tasmania-wide total", beside = TRUE,
ylab = "Number of thylacines", col = c("blue", "red"),
main = "Thylacine bounty submitted by region", border = NA
)
legend("topleft", c("Model ensemble bounty", "Actual bounty", "Note: Total/2"),
fill = c("blue", "red", NA), border = NA, cex = 0.8
)
```
We could do more thorough statistical analyzes to verify our model ensemble.
However, from our comparative plots of *thylacine* bounty harvests we have gained
insight into how well our ensemble model matches the bounty record both temporally and
spatially. We could thus devise ways in which to improve future versions of our model,
including developing a harvest function with mechanisms and parameterization that
better model spatio-temporal responses to the bounty reward. With these advances we
could possibly gain better insights into spatio-temporal dynamics of the *thylacine*
decline.
##### Model ensemble usage
As a final demonstration let's see how we can use our ensemble to examine our
simulated spatio-temporal extirpation pattern.
```r
# Calculate the weighted cell extirpation dates
selected_extirp <- summary_matrix_list$extirpation
weighted_extirpation <- colSums(selected_extirp * selected_weights / sum(selected_weights))
# Plot the weighted cell extirpation dates
extirpation_raster <- region$raster_from_values(weighted_extirpation)
raster::plot(extirpation_raster,
main = "Thylacine model ensemble extirpation",
xlab = "Longitude (degrees)", ylab = "Latitude (degrees)",
zlim = c(1888, 1940), col = grDevices::heat.colors(100), colNA = "blue"
)
```
## Summary
Building and utilizing our model ensemble for the *thylacine* has provided a relatively
advanced demonstration of the *poems* 6-step workflow and its modules.
1. First we constructed a customized model with user-defined density
dependence and harvest functions.
1. We then used functionality for dynamically generating stage matrices, initial
abundances, carrying capacities, and dispersal rates.
1. We generated sample model parameters via Latin hypercube sampling.
1. We ran these sampled models via a *poems* simulation manager.
1. We then collated and calculated summary metrics, based on functions we defined
within a results manager.
1. After which, we fed our sample model parameters, our summary metrics, and target
values for those metrics (derived from historical bounty and sighting records),
into an ABC validator, and ran it to select our model ensemble.
We then examined how well our model ensemble matched our target metrics, and analyzed
the influence that our model parameters and stochasticity had on the ensemble selection.
We also verified our model via the bounty records to see how well it matched patterns
that was not explicitly used in the model calibration, validation and ensemble
selection.
Applying the *poems* workflow to the *thylacine* resulted in the selection of models
that best reconstructed the range and extinction dynamics of the species, given the
validation targets that were accessible and the model structure. Simulations of bounty
harvest matched the historical records well, particularly at the Tasmania-wide level.
Simulated extirpation pattern and time of extinction matched known estimates for the
*thylacine* with fair-to-good accuracy. Further model developments could include
changes to the way that harvesters respond to the bounty reward, so as to better
match how the historical bounty changed over time and space.
In running the steps in this vignette's workflow, we have explored much of the
functionality offered by *poems*. The aim of the feature-rich example was to provide
a guide for building your own customized models with *poems*, so that you can explore
the spatio-temporal population dynamics of species of interest in your research and
practice.
Thank you :-)
## References
Allee, W. C. (1931). *Animal Aggregations: A Study in General Sociology*. University of
Chicago Press.
Beaumont, M. A., Zhang, W., & Balding, D. J. (2002). 'Approximate Bayesian computation
in population genetics'. *Genetics*, vol. 162, no. 4, pp, 2025–2035.
Bulte, E. H., Horan, R. D., & Shogren, J. F. (2003). 'Is the Tasmanian tiger extinct?
A biological–economic re-evaluation', *Ecological Economics*, vol. 45, no. 2,
pp. 271–279.
Csillery, K., Lemaire L., Francois O., & Blum M. (2015). 'abc: Tools for Approximate
Bayesian Computation (ABC)'. *R package version 2.1*. Retrieved from
Grimm, V., Revilla, E., Berger, U., Jeltsch, F., Mooij, W. M., Railsback, S. F.,
Thulke, H. H., Weiner, J., Wiegand, T., DeAngelis, D. L., (2005). 'Pattern-Oriented
Modeling of Agent-Based Complex Systems: Lessons from Ecology'. *Science*
vol. 310, no. 5750, pp. 987–991.
Guiler, E. (1985). *Thylacine: The Tragedy of the Tasmanian Tiger*. Oxford University
Press, Melbourne.
Guiler E. R. & Godard P. (1998). *Tasmanian tiger: a lesson to be learnt*.
Abrolhos Publishing, Perth, Western Australia.
Ricker, W. E. (1954). 'Stock and recruitment'.
*Journal of the Fisheries Research Board of Canada*, vol. 11, no. 5, pp. 559-623.