General IPMs

Simple IPMs vs General IPMs

In the introduction vignette, we reviewed some basic IPM theory and worked through examples of simple IPMs, which only model a single, continuously distributed state variable. General IPMs are distinguished from simple ones in that they incorporate multiple state variables and/or include transitions between continuous and discrete states. These state variables could be, for example, a continuous measure of coral colony surface area and a discrete state to denote aspergillosis infection (in Bruno et al. 2011), a mixture of trees classified by basal diameter and a discrete state for seeds in a seed bank (in Crandall & Knight 2017), or a mixture of multiple continuous and discrete state (e.g. Aikens & Roach 2014). A more comprehensive definition is provided in Ellner & Rees (2006) and in Ellner, Childs & Rees, Chapter 6 (2016).

A general, density independent, deterministic IPM

We’ll start with the least complicated of the general IPMs and build progressively from there. If you’ve already read through the introduction vignette, then most of this will look pretty familiar. If not, it is probably best to at least skim the first few sections before proceeding here.

Key differences between simple and general IPMs.

As noted above, we are now working with models that may have multiple continuous state variables and/or discrete states to describe the demography of our species. Thus, we need to add some more information to the model’s definition in order to fully capture these additional demographic details.

Relative to how we define simple models in ipmr, there are now two new bits:

  1. Each kernel that has an integration requires that d_statevariable get appended to the kernel formula. This is equivalent to the “dz” in \(\int_L^U K(z',z) n(z,t) \mathrm{dz}\). ipmr automatically generates this variable internally, so there is no need to define it in the data_list or define_domains(), we can just add it any of the formula(s) that we want to. We skipped this step in the simple IPMs because it gets appended automatically. Unfortunately, it is less easy to infer the correct state variable to d_z with for general IPMs where there may be many continuous and/or discrete states.

  2. The implementation arguments list can now have different values in the state_start and state_end slots. This is demonstrated in a separate chunk below.

Mathematical overview of the example

This example will use an IPM for Ligustrum obtusifolium, a tree species that is now invasive in North America. Data were collected outside of St. Louis, Missouri, and the full model is described in Levin et al. 2019. The IPM consists of a single discrete stage (seed bank, abbreviated b) and a single continuous state (height, abbreviated ht/\(z,z'\)). The census timing was such that all seeds must enter the seed bank. They can either germinate in the next year, or they can die (so stay_discrete = 0). Thus, the full model takes the following form:

  1. \(n(z', t + 1) = \int_L^U P(z', z) n(z,t)d\mathrm{z} + b(t) * leave\_discrete(z')\)

  2. \(b(t + 1) = \int_L^Ugo\_discrete(z) n(z,t)d\mathrm{z}+ stay\_discrete\)

  3. \(P(z',z) = s(z) * G(z',z)\)

  4. \(go\_discrete(z) = f_s(z) * f_r(z) * g_i\)

  5. \(leave\_discrete(z') = e_p * f_d(z')\)

  6. \(stay\_discrete = 0\)

\(f_G\) and \(f_{r_d}\) denote Normal probability density functions. The vital rates functions and example code to fit them are:

  1. survival (s): A logistic regression with a squared term

    • Example code: glm(surv ~ ht_1 + I(ht_1)^2, data = my_surv_data, family = binomial())

    • Mathematical form: \(Logit(s(z)) = \alpha_s + \beta_{s,1} * z + \beta_{s,2} * z^2\)

  2. growth (g): A linear regression

    • example code: lm(ht_2 ~ ht_1, data = grow_data)

    • Mathematical form:

      • \(\mu_G = \alpha_G + \beta_G * z\)

      • \(G(z',z) = f_G(\mu_G, \sigma_G)\)

  3. flowering probability (r_r): A logistic regression

    • example code: glm(repro ~ ht_1, data = my_repro_data, family = binomial())

    • Mathematical form: \(Logit(r_r(z)) = \alpha_{r_r} + \beta_{r_r} * z\)

  4. seed production (r_s): A Poisson regression

    • example code: glm(seeds ~ ht_1, data = my_seed_data, family = poisson())

    • Mathematical form: \(Log(r_s(z)) = \alpha_{r_s} + \beta_{r_s} * z\)

  5. Recruit size distribution (r_d): A normal distribution with mean r_d_mu and standard deviation r_d_sd.

    • example code: r_d_mu <- mean(my_recr_data$ht_2, na.rm = TRUE) and r_d_sd <- sd(my_rer_data$ht_2, na.rm = TRUE).

    • Mathematical form: \(r_d(z') = f_{r_d}(\mu_{r_d}, \sigma_{r_d})\)

  6. germination (g_i) and establishment (e_p) are constants. The code below assumes we have our data in long format (each seed get its own row) and that successful germination/establishment is coded as 1s, failures are 0s, and seeds we don’t know the fate of for whatever reason are NAs.

    • example code: g_i <- mean(my_germ_data, na.rm = TRUE) and e_p <- mean(my_est_data, na.rm = TRUE)

    • Mathematical form: \(g_i = 0.5067, e_p = 0.15\)

Model code

Below is the code to implement this model. First, we define our our parameters in a list.

library(ipmr)

# Set up the initial population conditions and parameters

data_list <- list(
  g_int     = 5.781,
  g_slope   = 0.988,
  g_sd      = 20.55699,
  s_int     = -0.352,
  s_slope   = 0.122,
  s_slope_2 = -0.000213,
  r_r_int   = -11.46,
  r_r_slope = 0.0835,
  r_s_int   = 2.6204,
  r_s_slope = 0.01256,
  r_d_mu    = 5.6655,
  r_d_sd    = 2.0734,
  e_p       = 0.15,
  g_i       = 0.5067
)

Next, we set up two functions to pass into the model. These perform the inverse logit transformations for the probability of flowering model (r_r/\(r_r(z)\)) and survival model (s/\(s(z)\)).

# We'll set up some helper functions. The survival function
# in this model is a quadratic function, so we use an additional inverse logit function
# that can handle the quadratic term.

inv_logit <- function(int, slope, sv) {
  1/(1 + exp(-(int + slope * sv)))
}

inv_logit_2 <- function(int, slope, slope_2, sv) {
  1/(1 + exp(-(int + slope * sv + slope_2 * sv ^ 2)))
}

Now, we’re ready to begin making the IPM kernels. We change the sim_gen argument of init_ipm() to "general".

general_ipm <- init_ipm(sim_gen = "general", di_dd = "di", det_stoch = "det") %>%
  define_kernel(
    name          = "P",
    
    # We add d_ht to formula to make sure integration is handled correctly.
    # This variable is generated internally by make_ipm(), so we don't need
    # to do anything else.
    
    formula       = s * g * d_ht,
    
    # The family argument tells ipmr what kind of transition this kernel describes.
    # it can be "CC" for continuous -> continuous, "DC" for discrete -> continuous
    # "CD" for continuous -> discrete, or "DD" for discrete -> discrete.
    
    family        = "CC",
    
    # The rest of the arguments are exactly the same as in the simple models
    
    g             = dnorm(ht_2, g_mu, g_sd),
    g_mu          = g_int + g_slope * ht_1,
    s             = inv_logit_2(s_int, s_slope, s_slope_2, ht_1),
    data_list     = data_list,
    states        = list(c('ht')),
    uses_par_sets = FALSE,
    evict_cor     = TRUE,
    evict_fun     = truncated_distributions('norm',
                                            'g')
  ) %>%
  define_kernel(
    name          = "go_discrete",
    formula       = r_r * r_s * d_ht,
    
    # Note that now, family = "CD" because it denotes a continuous -> discrete transition
    
    family        = 'CD',
    r_r           = inv_logit(r_r_int, r_r_slope, ht_1),
    r_s           = exp(r_s_int + r_s_slope * ht_1),
    data_list     = data_list,
    
    # Note that here, we add "b" to our list in states, because this kernel
    # "creates" seeds entering the seedbank
    
    states        = list(c('ht', "b")),
    uses_par_sets = FALSE
  ) %>%
  define_kernel(
    name    = 'stay_discrete',
    
    # In this case, seeds in the seed bank either germinate or die, but they 
    # do not remain for multiple time steps. This can be adjusted as needed.
    
    formula = 0,
    
    # Note that now, family = "DD" because it denotes a discrete -> discrete transition
    
    family  = "DD",
    
    # The only state variable this operates on is "b", so we can leave "ht"
    # out of the states list
    
    states  = list(c('b')),
    evict_cor = FALSE
  ) %>%
  define_kernel(
    
    # Here, the family changes to "DC" because it is the discrete -> continuous 
    # transition
    
    name          = 'leave_discrete',
    formula       = e_p * g_i * r_d,
    r_d           = dnorm(ht_2, r_d_mu, r_d_sd),
    family        = 'DC',
    data_list     = data_list,
    
    # Again, we need to add "b" to the states list
    
    states        = list(c('ht', "b")),
    uses_par_sets = FALSE,
    evict_cor     = TRUE,
    evict_fun     = truncated_distributions('norm',
                                            'r_d')
  ) 

We’ve now defined all of the kernels, next are the implementation details. These also differ somewhat from simple IPMs. The key difference in the implementation arguments list lies in the state_start and state_end of each kernel, and is related to the family argument of each kernel. Kernels that begin with one state and end in a different state (e.g. moving from seed bank to a plant) will have different entries in the state_start and state_end slots. It is very important to get these correct, as ipmr uses this information to generate the model iteration procedure automatically (i.e. code corresponding to Equations 1-2).

general_ipm <- general_ipm %>%
  define_impl(
    list(
      P              = list(int_rule    = "midpoint",
                            state_start = "ht",
                            state_end   = "ht"),
      go_discrete    = list(int_rule    = "midpoint",
                            state_start = "ht",
                            state_end   = "b"),
      stay_discrete  = list(int_rule    = "midpoint",
                            state_start = "b",
                            state_end   = "b"),
      leave_discrete = list(int_rule    = "midpoint",
                            state_start = "b",
                            state_end   = "ht")
    )
  )

An alternative to the list above is to use make_impl_args_list(). The chunk above and the chunk below generate equivalent proto_ipm objects.

general_ipm <- general_ipm %>%
  define_impl(
    make_impl_args_list(
      kernel_names = c("P", "go_discrete", "stay_discrete", "leave_discrete"),
      int_rule     = c(rep("midpoint", 4)),
      state_start    = c('ht', "ht", "b", "b"),
      state_end      = c('ht', "b", "b", 'ht')
    )
  )

The rest is the same as the simple IPM. We can use our pre-defined functions in make_ipm, and we can use pre-defined variables in define_domains. We’ll create variables for the upper (U) and lower (L) size bounds for the population, and the number of meshpoints for integration (n). We also define initial population vectors for both b and ht.

# The lower and upper bounds for the continuous state variable and the number
# of meshpoints for the midpoint rule integration. We'll also create the initial
# population vector from a random uniform distribution

L <- 1.02
U <- 624
n <- 500

set.seed(2312)

init_pop_vec   <- runif(500)
init_seed_bank <- 20


general_ipm <- general_ipm %>%
  define_domains(
    
    # We can pass the variables we created above into define_domains
    
    ht = c(L, U, n)
    
  ) %>%
  define_pop_state(
    
    # We can also pass them into define_pop_state
    
    pop_vectors = list(
      n_ht = init_pop_vec,
      n_b  = init_seed_bank
    )
  ) %>%
  make_ipm(iterations = 100,
           usr_funs = list(inv_logit   = inv_logit,
                           inv_logit_2 = inv_logit_2))


# lambda is a generic function to compute per-capita growth rates. It has a number
# of different options depending on the type of model

lambda(general_ipm)

# If we are worried about whether or not the model converged to stable
# dynamics, we can use the exported utility is_conv_to_asymptotic. The default
# tolerance for convergence is 1e-10, but can be changed with the 'tol' argument.

is_conv_to_asymptotic(general_ipm, tol = 1e-10)

w <- right_ev(general_ipm)
v <- left_ev(general_ipm)

Our model is now built! We can explore the asymptotic population growth rate with the lambda() function, which will work on any object made my make_ipm(). The same is true for is_conv_to_asymptotic(), which is a helper to function to figure out if we’ve set the number of iterations high enough to actually reach asymptotic population dynamics. left_ev and right_ev work for general deterministic IPMs as well. Stochastic versions are in the works, but are not yet implemented.

Further analysis

Say we wanted to calculate the mean and variance of life span conditional on initial state \(z_0\). This requires us to generate the IPM’s fundamental operator, \(N\), which is computed as follows: \(N = (I - P)^{-1}\). The following chunk is a function to compute average lifespan as a function of initial size. Note that we omit the fecundity from these calculations entirely, as our model does not assume that reproduction imposes a cost on survival.

The formula for mean lifespan is given as \(\bar{\eta}(z_0) = eN\) (where \(e\) is a constant function such that \(e(z)\equiv1\)), and the formula for its variance is \(\sigma^2_\eta(z_0) = e(2N^2-N) - (eN)^2\). Since the \(e\) function has the effect of summing columns, we’ll replace that with the R function colSums. Our second function, sigma_eta, will make use of %^% from ipmr, which multiples matrices by themselves, rather than the point-wise power provided by ^. More details on the math underlying these calculations are provided in Ellner, Childs, & Rees (2016), chapter 3.

make_N <- function(ipm) {
  
  P     <- ipm$sub_kernels$P
  
  I     <- diag(nrow(P))
  N     <- solve(I - P)
  
  return(N)
}

eta_bar <- function(ipm) {
  
  N     <- make_N(ipm)
  out   <- colSums(N)
  
  return(as.vector(out))
  
}

sigma_eta <- function(ipm) {
  
  N     <- make_N(ipm)  

  out <- colSums(2 * (N %^% 2L) - N) - colSums(N) ^ 2
  
  return(as.vector(out))
}

mean_l <- eta_bar(general_ipm)
var_l  <- sigma_eta(general_ipm)

mesh_ps <- int_mesh(general_ipm)$ht_1 %>%
  unique()

par(mfrow = c(1,2))

plot(mesh_ps, mean_l, type = "l", xlab = expression( "Initial size z"[0]))
plot(mesh_ps, var_l, type = "l", xlab = expression( "Initial size z"[0]))

From prior knowledge, we happen to know that even in the best conditions, Ligustrum obtusifolium individuals usually only live 25-50 years. It appears that our model (wildly) overestimates their average lifespan, and we’d want to think about how to re-parameterize it to more accurately capture mortality events. Our survival model would seem the most likely candidate for re-examination, particularly for young and mid-sized trees. That is a problem for another day though, and so the next section will investigate how to implement general IPMs in discretely varying environments.

General models for discretely varying environments

Discretely varying parameters can be used to construct general IPMs with little additional effort. These are typically the result of fitting a set of fixed effects vital rate models that include both continuous predictors for size/state and categorical variables like treatment or maturation state. They can also result from mixed effects models, for example, working with conditional modes for a random intercept corresponding to year or site.

ipmr refers to these as parameter sets, and abbreviates them par_sets. For example, a random intercept for year may be denoted \(\alpha_{yr}\), where \(_{yr}\) provides an index for the different values that \(\alpha\) can take on.

If you’ve already read the Intro to ipmr article and the example above, then there aren’t any new concepts to introduce here. Below is an example showing how all of these come together.

Mathematical overview of the example

We’ll use a variation of the model above, simulating some random year-specific intercepts for growth and seed production. Parameters, functions, and kernels that are now time varying have a subscript \(x_{yr}\) appended to them.

  1. \(n(z', t + 1) = \int_L^U P_{yr}(z', z) n(z,t)d\mathrm{z} + b(t) * leave\_discrete(z')\)

  2. \(b(t + 1) = \int_L^Ugo\_discrete_{yr}(z) n(z,t)d\mathrm{z}+ stay\_discrete\)

  3. \(P_{yr}(z',z) = s(z) * G_{yr}(z',z)\)

  4. \(go\_discrete_{yr}(z) = r_{s,yr}(z) * r_r(z) * g_i\)

  5. \(leave\_discrete(z') = e_p * r_d(z')\)

  6. \(stay\_discrete = 0\)

The vital rate models are as follows:

  1. survival (s): A logistic regression

    • Example code: glm(surv ~ ht_1, data = my_surv_data, family = binomial())

    • Mathematical form: \(Logit(s(z)) = \alpha_s + \beta_{s,1} * z + \beta_{s,2} * z^2\)

  2. growth (g): A linear mixed effects regression. \(f_G\) denotes a normal probability density function.

    • example code: lmer(ht_2 ~ ht_1 + (1 | year), data = grow_data)

    • Mathematical form:

      • \(\mu_{G,yr} = \alpha_G + \alpha_{G,yr} + \beta_G * z\)

      • \(G_{yr}(z',z) = f_G(z', \mu_{G,yr}, \sigma_G)\)

  3. flowering probability (r_r): A logistic regression

    • example code: glm(repro ~ ht_1, data = my_repro_data, family = binomial())

    • Mathematical form: \(Logit(r_r(z)) = \alpha_{r_r} + \beta_{r_r} * z\)

  4. seed production (r_s): A Poisson mixed effects regression

    • example code: glmer(seeds ~ ht_1 + (1 | year), data = my_seed_data, family = poisson())

    • Mathematical form: \(Log(r_{s,yr}(z)) = \alpha_{r_s} + \alpha_{r_s, yr} + \beta_{r_s} * z\)

  5. Recruit size distribution (r_d): A normal distribution (\(f_{r_d}\)) with mean r_d_mu and standard deviation r_d_sd.

    • example code: r_d_mu <- mean(my_recr_data$ht_2, na.rm = TRUE) and r_d_sd <- sd(my_recr_data$ht_2, na.rm = TRUE).

    • Mathematical form: \(r_d(z') = f_{r_d}(z', \mu_{r_d}, \sigma_{r_d})\)

  6. germination (g_i) and establishment (e_p) are constants. The code below assumes we have our data in long format (each seed get its own row) and that successful germinations/establishments are coded as 1s, failures are 0s, and seeds we don’t know the fate of for whatever reason are NAs.

    • example code: g_i <- mean(my_germ_data, na.rm = TRUE) and e_p <- mean(my_est_data, na.rm = TRUE)

    • Mathematical form: \(g_i = 0.5067, e_p = 0.15\)

Model parameterization

In the next chunk, we’ll set up the data list with all of our constants and make sure the names are what we want them to be. This example generates intercepts for 5 years of data. This can be altered according to your own needs. Then, we’ll define a couple functions to make the vital rate expressions easier to write.

A key aspect here is setting the names on the time-varying parameters in the data_list. We are using the form "g_int_year", and substituting the values that "year" can take in for the actual index. These don’t need to be numbers, they can be words, letters, or some combination of the two. For this example, the years will be represented with the values 1:5. If we wanted to denote the actual years of the censuses, we could switch 1:5 to, for example, 2002:2006.

library(ipmr)

# Set up the initial population conditions and parameters
# Here, we are simulating random intercepts for growth
# and seed production, converting them to a list,
# and adding them into the list of constants. Equivalent code
# to produce the output for the output from lmer/glmer
# is in the comments next to each line

all_g_int   <- as.list(rnorm(5, mean = 5.781, sd = 0.9)) # as.list(unlist(ranef(my_growth_model)))
all_r_s_int <- as.list(rnorm(5, mean = 2.6204, sd = 0.3)) # as.list(unlist(ranef(my_seed_model)))


names(all_g_int)   <- paste("g_int_", 1:5, sep = "")
names(all_r_s_int) <- paste("r_s_int_", 1:5, sep = "")

constant_list <- list(
  g_slope   = 0.988,
  g_sd      = 20.55699,
  s_int     = -0.352,
  s_slope   = 0.122,
  s_slope_2 = -0.000213,
  r_r_int   = -11.46,
  r_r_slope = 0.0835,
  r_s_slope = 0.01256,
  r_d_mu    = 5.6655,
  r_d_sd    = 2.0734,
  e_p       = 0.15,
  g_i       = 0.5067
)

all_params <- c(constant_list, all_g_int, all_r_s_int)

# The lower and upper bounds for the continuous state variable and the number
# of meshpoints for the midpoint rule integration.

L <- 1.02
U <- 624
n <- 500

set.seed(2312)

init_pop_vec   <- runif(500)
init_seed_bank <- 20

# add some helper functions. The survival function
# in this model is a quadratic function, so we use an additional inverse logit function
# that can handle the quadratic term.

inv_logit <- function(int, slope, sv) {
  1/(1 + exp(-(int + slope * sv)))
}

inv_logit_2 <- function(int, slope, slope_2, sv) {
  1/(1 + exp(-(int + slope * sv + slope_2 * sv ^ 2)))
}

Next, we will set up our sub-kernels. We’ll append the _year suffix to the kernel name, the vital rate expressions and parameter values that have time-varying components (g_year, f_s_year, g_int_year, f_s_int_year), and the call to truncated_distributions so that the growth kernel is correctly specified. We’ll also pass a list(year = 1:5)) into the par_set_indices argument of define_kernel(). When building the model, make_ipm() automatically expands these expressions to create 5 different expressions containing the various levels of year.

general_stoch_kern_ipm <- init_ipm(sim_gen    = "general",
                                   di_dd      = "di", 
                                   det_stoch  = "stoch",
                                   kern_param = "kern") %>%
  define_kernel(
    
    # The kernel name gets indexed by _year to denote that there
    # are multiple possible kernels we can build with our parameter set.
    # The _year gets substituted by the values in "par_set_indices" in the 
    # output, so in this example we will have P_1, P_2, P_3, P_4, and P_5
    
    name          = "P_year",
    
    # We also add _year to "g" to signify that it is going to vary across kernels.
    
    formula       = s * g_year * d_ht,
    family        = "CC",
    
    # Here, we add the suffixes again, ensuring they are expanded and replaced
    # during model building by the parameter names
    
    g_year           = dnorm(ht_2, g_mu_year, g_sd),
    g_mu_year        = g_int_year + g_slope * ht_1,
    s                = inv_logit_2(s_int, s_slope, s_slope_2, ht_1),
    data_list        = all_params,
    states           = list(c('ht')),
    
    # We set uses_par_sets to TRUE, signalling that we want to expand these
    # expressions across all levels of par_set_indices.
    
    uses_par_sets    = TRUE,
    par_set_indices = list(year = 1:5),
    evict_cor        = TRUE,
    
    # we also add the suffix to `target` here, because the value modified by 
    # truncated_distributions is time-varying.
    
    evict_fun        = truncated_distributions('norm',
                                               target = 'g_year')
  ) %>%
  define_kernel(
    
    # again, we append the index to the kernel name, vital rate expressions,
    # and in the model formula.
    
    name          = "go_discrete_year",
    formula       = r_r * r_s_year * g_i * d_ht,
    family        = 'CD',
    r_r           = inv_logit(r_r_int, r_r_slope, ht_1),
    
    # Again, we modify the left and right hand side of this expression to 
    # show that there is a time-varying component
    
    r_s_year      = exp(r_s_int_year + r_s_slope * ht_1),
    data_list     = all_params,
    states        = list(c('ht', "b")),
    uses_par_sets    = TRUE,
    par_set_indices = list(year = 1:5)
  ) %>%
  define_kernel(
    
    # This kernel has no time-varying parameters, and so is not indexed.
    
    name    = 'stay_discrete',
    
    # In this case, seeds in the seed bank either germinate or die, but they 
    # do not remain for multipe time steps. This can be adjusted as needed.
    
    formula = 0,
    
    # Note that now, family = "DD" becuase it denotes a discrete -> discrete transition
    
    family  = "DD",
    states  = list(c('b')),
    
    # This kernel has no time-varying parameters, so we don't need to designate
    # it as such.
    
    uses_par_sets = FALSE,
    evict_cor = FALSE
  ) %>%
  define_kernel(

    # This kernel also doesn't get a index, because there are no varying parameters.
    
    name          = 'leave_discrete',
    formula       = e_p * r_d,
    r_d           = dnorm(ht_2, r_d_mu, r_d_sd),
    family        = 'DC',
    data_list     = all_params,
    states        = list(c('ht', "b")),
    uses_par_sets = FALSE,
    evict_cor     = TRUE,
    evict_fun     = truncated_distributions('norm',
                                            'r_d')
  )  %>%
  define_impl(
    make_impl_args_list(
      
      # We add suffixes to the kernel names here to make sure they match the names
      # we specified above.
      
      kernel_names = c("P_year", "go_discrete_year", "stay_discrete", "leave_discrete"),
      int_rule     = c(rep("midpoint", 4)),
      state_start    = c('ht', "ht", "b", "b"),
      state_end      = c('ht', "b", "b", 'ht')
    )
  ) %>%
  define_domains(
    ht = c(L, U, n)
  ) %>%
  define_pop_state(
      n_ht = init_pop_vec,
      n_b  = init_seed_bank
  ) %>%
  make_ipm(
    iterations = 100,
    
    # We can specify a sequence of kernels to select for the simulation.
    # This helps others to reproduce what we did,
    # and lets us keep track of the consequences of different selection
    # sequences for population dynamics. 
    
    kernel_seq = sample(1:5, size = 100, replace = TRUE),
    usr_funs = list(inv_logit   = inv_logit,
                    inv_logit_2 = inv_logit_2)
  )

100 isn’t too many iterations, but hopefully this demonstrates how to set up and implement such a model. There are a number of slots in the output that may be useful for further analyses.

  1. env_seq: This contains a character vector which shows the sequence in which levels of the time-varying components were chosen during model iteration. We could use this to reproduce the model outputs later.

  2. pop_state$lambda: This shows the single timestep per-capita growth rates for the simulation.

Additionally, there are functions to compute the stochastic population growth rate (\(\lambda_s\), computed as mean(log(pop_state$lambda)), burn in can be controlled with the burn_in parameter), and the mean sub-kernels.

mean_kernels <- mean_kernel(general_stoch_kern_ipm)
lam_s        <- lambda(general_stoch_kern_ipm, burn_in = 0.15) # Remove first 15% of iterations

General models with continuously varying environments

We can also use continuously varying parameters to construct general IPMs. These are good tools for exploring the consequences of environmental variation on demographic rates. ipmr handles these using define_env_state(), which can take both functions and data and generate draws from distributions during each iteration of the model.

Mathematical overview

This example includes survival and growth models that make use of environmental covariates. The model can be written as:

  1. \(n(z', t+1) = \int_L^U K(z',z, \theta)n(z,t)dz + B(t) * r_s * r_d * f_{c_d}(z')\)

  2. \(B(t + 1) = r_e * r_s * \int_L^U c_r(z) * c_s(z)n(z,t)dz + B(t) * r_s * r_r\)

  3. \(K(z',z,\theta) = P(z',z,\theta) + F(z',z)\)

  4. \(P(z',z,\theta) = s(z, \theta) * G(z',z,\theta)\)

  5. \(F(z',z) = c_r(z) * c_s(z) * c_d(z') * (1 - r_e)\)

\(\theta\) is a vector of time-varying environmental parameters. For this example, we’ll use a Gaussian distribution for temperature and a Gamma distribution for precipitation:

  1. \(\theta_t \sim Norm(\mu = 8.9, \sigma = 1.2)\)

  2. \(\theta_p \sim Gamma(k = 1000, \beta = 2)\)

Next, we’ll write out the actual vital rate functions in the model:

  1. survival (s/\(s(z,\theta)\)): a logistic regression.

    • Example model formula: glm(survival ~ size_1 + temp + precip, data = my_surv_data, family = binomial())

    • Mathematical form: \(Logit(s(z,\theta)) = \alpha_s + \beta_s^z * z + \beta_s^t * temp + \beta_s^p * precip\)

  2. growth (g, \(G(z',z,\theta)\)): a linear regression with a Normal error distribution (denoted \(f_G\))

    • Example model formula: glm(size_2 ~ size_1 + temp + precip, data = my_grow_data)

    • Mathematical form:

      • \(G(z',z) = f_G(z', \mu_G(z,\theta), \sigma_G)\)

      • \(\mu_G(z, \theta) = \alpha_G + \beta_G^z * z + \beta_G^t * temp + \beta_G^p * precip\)

  3. flower probability (c_r/\(c_r(z)\)): A logistic regression.

    • Example model formula: glm(repro ~ size_1, data = my_repro_data, family = binomial())

    • Mathematical form: \(Logit(c_r(z)) = \alpha_{c} + \beta_{c_r} * z\)

  4. seed production (c_s/\(c_s(z)\): a logistic regression.

    • Example model formula: glm(flower_n ~ size_1, data = my_flower_data, family = poisson())

    • Mathematical form: \(Log(c_s(z)) = \alpha_{c_s} + \beta_{c_s} * z\)

  5. recruit sizes (c_d/\(c_d(z')\)): A Normal distribution (denoted \(f_{c_d}\))

    • Example code: mean (c_d_mu) mean(my_recruit_data$size_2, na.rm = TRUE) and standard deviation (c_d_sd) sd(my_recruit_data$size_2, na.rm = TRUE)

    • Mathematical form: \(c_d(z') = f_{c_d}(z', \mu_{f_d}, \sigma_{f_d})\)

  6. Constants: Discrete stage survival (r_s), discrete stage entrance probability (r_e), discrete stage departure probability conditional on survival (r_d), and probability of remaining in discrete stage (r_r).

Model parameterization

First, we’ll specify the constant parameters. We keep all values related to demography in the data_list.

library(ipmr)

# Define the fixed parameters in a list

constant_params <- list(
  s_int     = -5,
  s_slope   = 2.2,
  s_precip  = 0.0002,
  s_temp    = -0.003,
  g_int     = 0.2,
  g_slope   = 1.01,
  g_sd      = 1.2,
  g_temp    = -0.002,
  g_precip  = 0.004,
  c_r_int   = 0.3,
  c_r_slope = 0.03,
  c_s_int   = 0.4,
  c_s_slope = 0.01,
  c_d_mu    = 1.1,
  c_d_sd    = 0.1,
  r_e       = 0.3,
  r_d       = 0.3,
  r_r       = 0.2,
  r_s       = 0.2
)

In addition to creating the standard data_list, we need to create a function to sample the environmental covariates, and a list of parameters that generate the environmental covariates. The function needs to return a named list. The names in the returned list can be referenced in vital rate expressions, kernel formulas, etc. as if we had specified them in the data_list. make_ipm() only runs the functions in define_env_state() once per model iteration. This ensures that parameters from joint distributions can be used consistently across kernels without losing user-any specified correlations.

# Now, we create a set of environmental covariates. In this example, we use
# a normal distribution for temperature and a Gamma for precipitation. 

env_params <- list(
  temp_mu = 8.9,
  temp_sd = 1.2,
  precip_shape = 1000,
  precip_rate  = 2
)

# We define a wrapper function that samples from these distributions

sample_env <- function(env_params) {
  
  # We generate one value for each covariate per iteration, and return it 
  # as a named list. 
  
  temp_now <- rnorm(1,
                    env_params$temp_mu, 
                    env_params$temp_sd)
  
  precip_now <- rgamma(1, 
                       shape = env_params$precip_shape,
                       rate = env_params$precip_rate)
  
  # The vital rate expressions can now use the names "temp" and "precip" 
  # as if they were in the data_list.
  
  out        <- list(temp = temp_now, precip = precip_now)
  
  return(out)
  
}

# Again, we can define our own functions and pass them into calls to make_ipm. This
# isn't strictly necessary, but can make the model code more readable/less error prone.

inv_logit <- function(lin_term) {
  1/(1 + exp(-lin_term))
}

Model specification

We are now ready to begin implementing the model. This next chunk should look familiar, with the caveat that we have now add the terms temp and precip to vital rates in the P kernel.

general_stoch_param_model <- init_ipm(sim_gen    = "general",
                                      di_dd      = "di", 
                                      det_stoch  = "stoch",
                                      kern_param = "param") %>%
  define_kernel(
    name       = "P_stoch",
    family     = "CC",
    
    # As in the examples above, we have to add the d_surf_area
    # to ensure the integration of the functions is done.
    
    formula    = s * g * d_surf_area,
    
    # We can reference continuously varying parameters by name
    # in the vital rate expressions just as before, even though
    # they are passed in define_env_state() as opposed to the kernel's
    # data_list
    
    g_mu    = g_int + g_slope * surf_area_1 + g_temp * temp + g_precip * precip,
    s_lin_p = s_int + s_slope * surf_area_1 + s_temp * temp + s_precip * precip,
    s       = inv_logit(s_lin_p),
    g       = dnorm(surf_area_2, g_mu, g_sd),
   
    data_list     = constant_params,
    states        = list(c("surf_area")),
    uses_par_sets = FALSE,
    evict_cor     = TRUE,
    evict_fun     = truncated_distributions("norm", "g")
  ) %>%
  define_kernel(
    name          = "F",
    family        = "CC",
    formula       = c_r * c_s * c_d * (1 - r_e) * d_surf_area,
    
    c_r_lin_p     = c_r_int + c_r_slope * surf_area_1,
    c_r           = inv_logit(c_r_lin_p),
    c_s           = exp(c_s_int + c_s_slope * surf_area_1),
    c_d           = dnorm(surf_area_2, c_d_mu, c_d_sd),
    data_list     = constant_params,
    states        = list(c("surf_area")),
    uses_par_sets = FALSE,
    evict_cor     = TRUE,
    evict_fun     = truncated_distributions("norm", "c_d")
  ) %>%
  define_kernel(
    
    # Name can be anything, but it helps to make sure they're descriptive
    
    name = "go_discrete",
    
    # Family is now "CD" because it is a continuous -> discrete transition
    
    family        = "CD",
    formula       = r_e * r_s * c_r * c_s * d_surf_area,
    c_r_lin_p     = c_r_int + c_r_slope * surf_area_1,
    c_r           = inv_logit(c_r_lin_p),
    c_s           = exp(c_s_int + c_s_slope * surf_area_1),
    data_list     = constant_params,
    states        = list(c("surf_area", "sb")),
    uses_par_sets = FALSE,
    
    # There is not eviction to correct here, so we can set this to false
    
    evict_cor     = FALSE
  ) %>%
  define_kernel(
    name          = "stay_discrete",
    family        = "DD",
    formula       = r_s * r_r,
    data_list     = constant_params,
    states        = list("sb"),
    uses_par_sets = FALSE,
    evict_cor     = FALSE
  ) %>%
  define_kernel(
    name          = "leave_discrete",
    family        = "DC",
    formula       = r_d * r_s * c_d,
    c_d           = dnorm(surf_area_2, c_d_mu, c_d_sd),
    data_list     = constant_params,
    states        = list(c("surf_area", "sb")),
    uses_par_sets = FALSE,
    evict_cor     = TRUE,
    evict_fun     = truncated_distributions("norm", "c_d")
  ) %>%
  define_impl(
    make_impl_args_list(
      kernel_names = c("P_stoch", 
                       "F",
                       "go_discrete", 
                       "stay_discrete", 
                       "leave_discrete"),
      int_rule     = rep("midpoint", 5),
      state_start  = c("surf_area", 
                       "surf_area",
                       "surf_area",
                       "sb", 
                       "sb"),
      state_end    = c("surf_area",
                       "surf_area", 
                       "sb", 
                       "sb", 
                       "surf_area")
    )
  ) %>%
  define_domains(
    surf_area = c(0, 10, 100)
  ) 

Now, we need to define_env_state(). This consists of two parts - specifying expressions that generate environmental covariates, and supplying the information needed to evaluate those expressions. In this example, we want to use the env_params in sample_env (i.e. sample_env(env_params)). define_env_state uses the data_list argument to supply the env_params, and the call to sample_env goes into the ... portion of the function. Here, we assign the result to env_covs. The name env_covs isn’t important for specifying vital rate expressions and you could call it whatever you want, but it must be named something in order to work properly. The only thing that needs to match are the names in the list that sample_env returns, and the names used in the vital rate expressions/kernels.

Note that you can pass the sample_env function in either the data_list of define_env_state(), or the usr_funs argument of make_ipm().

# In the first version, sample_env is provided in the data_list of 
# define_env_state.

general_stoch_param_ipm <-  define_env_state(
  proto_ipm  = general_stoch_param_model,
  env_covs   = sample_env(env_params),
  data_list  = list(env_params = env_params,
                    sample_env = sample_env)
) %>% 
  define_pop_state(
    n_surf_area = runif(100),
    n_sb        = rpois(1, 20)
  ) %>%
  make_ipm(usr_funs = list(inv_logit  = inv_logit),
           iterate = TRUE,
           iterations = 100)


# in the second version, sample_env is provided in the usr_funs list of 
# make_ipm(). These two versions are equivalent. 

general_stoch_param_ipm <-  define_env_state(
  proto_ipm  = general_stoch_param_model,
  env_covs   = sample_env(env_params),
  data_list  = list(env_params = env_params)
) %>% 
  define_pop_state(
    n_surf_area = runif(100),
    n_sb        = rpois(1, 20)
  ) %>%
  make_ipm(usr_funs = list(inv_logit  = inv_logit,
                           sample_env = sample_env),
           iterate = TRUE,
           iterations = 100,
           return_sub_kernels = TRUE)

Stochastic parameter-resampled models also return an env_seq, but this time it will be a data.frame of parameter draws rather than a character vector. We can also compute mean sub-kernels and \(\lambda_s\) as we did in the kernel-resampled models. This time, each mean kernel is computed as the average of each sub-kernel over the course of the simulation (i.e. mean of all P kernels). Some sub-kernels in our example are not time-varying - they will be numerically equivalent to the sub-kernels stored in the IPM object.

env_draws  <- general_stoch_param_ipm$env_seq

mean_kerns <- mean_kernel(general_stoch_param_ipm)

lam_s      <- lambda(general_stoch_param_ipm)

all.equal(mean_kerns$mean_F, 
          general_stoch_param_ipm$sub_kernels$F_it_1,
          check.attributes = FALSE)

A note on memory management

Longer running stochastic parameter-resampled models can take up a lot of space in memory when all of the sub-kernels are saved from each iteration. For example, running the model above for 10,000 iterations would result in 20,000 \(100 \times 100\) matrices, 20,000 \(1 \times 100\)/\(100 \times 1\) matrices, and 10,000 \(1 \times 1\) matrices in the sub_kernels slot of the IPM object (~16.4 GB of RAM). This will likely result in crashes as smaller machines run out of available RAM. Therefore, make_ipm() contains an argument return_sub_kernels for these types of models that allows you to switch off that behavior and conserve available RAM. By default, this is set to FALSE. If you need sub-kernels for downstream analyses, set this option to TRUE and make sure you have a computer with sufficient RAM (64-128 GB range is likely required to store all information for longer running models).

These warnings also apply to all density dependent model classes and the same return_sub_kernels argument can be used for those as well.

Code to construct mega-kernels

Sometimes, we may want to work with our sub-kernels arranged into a single block kernel. This isn’t required for any of the code in ipmr, except for the plot method for general_di_det IPMs. Other use cases may arise for analyses not included in this package though, so below is a brief overview of how to generate those.

make_iter_kernel() takes an IPM object and a vector of symbols (for interactive use) or a character version of the expression (for programming) showing where each sub-kernel should go. It works in ROW MAJOR order. This example will use the model from the general deterministic example at the top of the article.

First, re-run the model to create the IPM object (if you haven’t already).

data_list <- list(
  g_int     = 5.781,
  g_slope   = 0.988,
  g_sd      = 20.55699,
  s_int     = -0.352,
  s_slope   = 0.122,
  s_slope_2 = -0.000213,
  r_r_int   = -11.46,
  r_r_slope = 0.0835,
  r_s_int   = 2.6204,
  r_s_slope = 0.01256,
  r_d_mu    = 5.6655,
  r_d_sd    = 2.0734,
  e_p       = 0.15,
  g_i       = 0.5067
)


L <- 1.02
U <- 624
n <- 500

set.seed(2312)

init_pop_vec   <- runif(500)
init_seed_bank <- 20

# Initialize the state list and add some helper functions. The survival function
# in this model is a quadratic function.

inv_logit <- function(int, slope, sv) {
  1/(1 + exp(-(int + slope * sv)))
}

inv_logit_2 <- function(int, slope, slope_2, sv) {
  1/(1 + exp(-(int + slope * sv + slope_2 * sv ^ 2)))
}

general_ipm <- init_ipm(sim_gen    = "general",
                        di_dd      = "di", 
                        det_stoch  = "det") %>%
  define_kernel(
    name          = "P",
    formula       = s * g * d_ht,
    family        = "CC",
    g             = dnorm(ht_2, g_mu, g_sd),
    g_mu          = g_int + g_slope * ht_1,
    s             = inv_logit_2(s_int, s_slope, s_slope_2, ht_1),
    data_list     = data_list,
    states        = list(c('ht')),
    uses_par_sets = FALSE,
    evict_cor     = TRUE,
    evict_fun     = truncated_distributions('norm',
                                            'g')
  ) %>%
  define_kernel(
    name          = "go_discrete",
    formula       = r_r * r_s * g_i,
    family        = 'CD',
    r_r           = inv_logit(r_r_int, r_r_slope, ht_1),
    r_s           = exp(r_s_int + r_s_slope * ht_1),
    data_list     = data_list,
    states        = list(c('ht', "b")),
    uses_par_sets = FALSE
  ) %>%
  define_kernel(
    name      = 'stay_discrete',
    formula   = 0,
    family    = "DD",  
    states    = list(c('ht', "b")),
    evict_cor = FALSE
  ) %>%
  define_kernel(
    name          = 'leave_discrete',
    formula       = e_p * r_d,
    r_d           = dnorm(ht_2, r_d_mu, r_d_sd),
    family        = 'DC',
    data_list     = data_list,
    states        = list(c('ht', "b")),
    uses_par_sets = FALSE,
    evict_cor     = TRUE,
    evict_fun     = truncated_distributions('norm',
                                            'r_d')
  ) %>%
  define_impl(
    make_impl_args_list(
      kernel_names = c("P", "go_discrete", "stay_discrete", "leave_discrete"),
      int_rule     = c(rep("midpoint", 4)),
      state_start    = c('ht', "ht", "b", "b"),
      state_end      = c('ht', "b", "b", 'ht')
    )
  ) %>%
  define_domains(
    ht = c(L, U, n)
  ) %>%
  define_pop_state(
    pop_vectors = list(
      n_ht = init_pop_vec,
      n_b  = init_seed_bank
    )
  ) %>%
  make_ipm(iterations = 100,
           usr_funs = list(inv_logit   = inv_logit,
                           inv_logit_2 = inv_logit_2))

Now, we specify which kernel belongs where in row major order, using a call to c().

mega_mat <- make_iter_kernel(ipm = general_ipm,
                             mega_mat = c(
                               stay_discrete, go_discrete,
                               leave_discrete, P
                             ))

# These values should be almost identical, so this should ~0

Re(eigen(mega_mat[[1]])$values[1]) - lambda(general_ipm)

Say we wanted to program with this function. Passing bare expression is difficult programatically, and how to do that is not really within the scope of this vignette (though if you’re interested in learning how, this is a good start). make_iter_kernel() also accepts text strings in the same format as above.

# Get the names of each sub_kernel
sub_k_nms     <- names(general_ipm$sub_kernels)

mega_mat_text <- c(sub_k_nms[3], sub_k_nms[2], sub_k_nms[4], sub_k_nms[1])

mega_mat_2 <- make_iter_kernel(general_ipm,
                               mega_mat = mega_mat_text)

# Should be TRUE
identical(mega_mat, mega_mat_2)

make_iter_kernel() can also handle cases where you need blocks of 0s or identity matrices. These are specified using 0 for 0s, and I for identity matrices. make_iter_kernel() automatically works out the correct dimensions internally, so you don’t need to worry about specifying those.

Below is an example that inserts 0s and identity matrices on the off-diagonals with the P kernel duplicated along the diagonal.

mega_mat <- make_iter_kernel(general_ipm,
                               mega_mat = c(P, 0, 
                                            I, P))

Finally, make_iter_kernel supports ipmr’s parameter set index syntax as well, enabling us to generate a list of mega-kernels for each combination of parameter set values. We’ll re-run the "general_di_stoch_kern" example from above to demonstrate this.

all_g_int   <- as.list(rnorm(5, mean = 5.781, sd = 0.9)) 
all_f_s_int <- as.list(rnorm(5, mean = 2.6204, sd = 0.3))

names(all_g_int)   <- paste("g_int_", 1:5, sep = "")
names(all_f_s_int) <- paste("f_s_int_", 1:5, sep = "")

constant_list <- list(
  g_slope   = 0.988,
  g_sd      = 20.55699,
  s_int     = -0.352,
  s_slope   = 0.122,
  s_slope_2 = -0.000213,
  f_r_int   = -11.46,
  f_r_slope = 0.0835,
  f_s_slope = 0.01256,
  f_d_mu    = 5.6655,
  f_d_sd    = 2.0734,
  e_p       = 0.15,
  g_i       = 0.5067
)

all_params <- c(constant_list, all_g_int, all_f_s_int)

L <- 1.02
U <- 624
n <- 500

set.seed(2312)

init_pop_vec   <- runif(500)
init_seed_bank <- 20

inv_logit <- function(int, slope, sv) {
  1/(1 + exp(-(int + slope * sv)))
}

inv_logit_2 <- function(int, slope, slope_2, sv) {
  1/(1 + exp(-(int + slope * sv + slope_2 * sv ^ 2)))
}

general_stoch_kern_ipm <- init_ipm(sim_gen    = "general",
                                   di_dd      = "di", 
                                   det_stoch  = "stoch",
                                   kern_param = "kern") %>%
  define_kernel(
    name          = "P_year",
    formula       = s * g_year * d_ht,
    family        = "CC",
    g_year           = dnorm(ht_2, g_mu_year, g_sd),
    g_mu_year        = g_int_year + g_slope * ht_1,
    s                = inv_logit_2(s_int, s_slope, s_slope_2, ht_1),
    data_list        = all_params,
    states           = list(c('ht')),
    uses_par_sets    = TRUE,
    par_set_indices = list(year = 1:5),
    evict_cor        = TRUE,
    evict_fun        = truncated_distributions('norm',
                                               'g_year')
  ) %>%
  define_kernel(
    name          = "go_discrete_year",
    formula       = f_r * f_s_year * g_i * d_ht,
    family        = 'CD',
    f_r           = inv_logit(f_r_int, f_r_slope, ht_1),
    f_s_year      = exp(f_s_int_year + f_s_slope * ht_1),
    data_list     = all_params,
    states        = list(c('ht', "b")),
    uses_par_sets    = TRUE,
    par_set_indices = list(year = 1:5)
  ) %>%
  define_kernel(
    name    = 'stay_discrete',
    formula = 0,
    family  = "DD",
    states  = list(c('b')),
    uses_par_sets = FALSE,
    evict_cor = FALSE
  ) %>%
  define_kernel(
    name          = 'leave_discrete',
    formula       = e_p * f_d * d_ht,
    f_d           = dnorm(ht_2, f_d_mu, f_d_sd),
    family        = 'DC',
    data_list     = all_params,
    states        = list(c('ht', "b")),
    uses_par_sets = FALSE,
    evict_cor     = TRUE,
    evict_fun     = truncated_distributions('norm',
                                            'f_d')
  )  %>%
  define_impl(
    make_impl_args_list(
      kernel_names = c("P_year", "go_discrete_year", "stay_discrete", "leave_discrete"),
      int_rule     = c(rep("midpoint", 4)),
      state_start    = c('ht', "ht", "b", "b"),
      state_end      = c('ht', "b", "b", 'ht')
    )
  ) %>%
  define_domains(
    ht = c(L, U, n)
  ) %>%
  define_pop_state(
      n_ht = init_pop_vec,
      n_b  = init_seed_bank
  ) %>%
  make_ipm(
    iterations = 10,
    kernel_seq = sample(1:5, size = 10, replace = TRUE),
    usr_funs = list(inv_logit   = inv_logit,
                    inv_logit_2 = inv_logit_2)
  )

Next, we call make_iter_kernel() using the kernel names like so:

block_list <- make_iter_kernel(general_stoch_kern_ipm,
                                 mega_mat = c(stay_discrete, go_discrete_year,
                                              leave_discrete, P_year))

make_iter_kernel() also works for simple models, but assumes that all sub-kernels are combined additively (i.e. \(K(z',z) = P(z',z) + F('z,z)\)). It can handle parameter set index syntax as well, but does not require the mega_mat argument, and can just be called with an IPM object.