`ipmr`

is a package for implementing integral projection
models of varying degrees of complexity. It uses mathematical-ish
expressions to build up the iteration kernels from smaller pieces, as
well as helpers to ensure that models are implemented correctly.
Finally, it provides machinery for stochastic simulations. basic
analyses, and model diagnostics. More complicated analysis functions are
being implemented in a separate package.

**This package does not help with fitting regression models to
demographic data!** This is a distinct enough problem that it
should not be in the purview of this package - and there are much better
tools out there already that can do a much better job of helping you
with that than I can. Some of my favorites are `lme4`

,
`brms`

, `mgcv`

, and `nlme`

.
`IPMpack`

handles the regression modeling and IPM
construction for some, but not all, types of IPMs. `IPMpack`

is no longer on CRAN, but can be installed from the
`MetaCRAN/IPMpack`

Github page or running
`devtools::install_github("CRAN/IPMpack")`

.

Unfortunately, `ipmr`

does not yet have built-in functions
for dealing with uncertainty. The goal is to change that soon, but for
now, doing this will require `for`

loops and storing the
results of each iteration. Some example code is provided at the end of
this vignette.

NB: If you are already familiar with how to build IPMs, you can probably skip this part and get straight into the next section.

An IPM describes how the abundance and distribution of trait values
(also called *state variables*/*states*, and denoted \(z\) and \(z'\)) for a population changes in
discrete time. The distribution of trait values in a population at time
\(t\) is given by the function \(n(z,t)\). A simple IPM for the trait
distribution \(z'\) at time \(t+1\) is then:

- \(n(z', t+1) = \int_L^UK(z',z)n(z,t)\mathrm{dz}\)

\(K(z',z)\), known as the
*projection kernel*, describes all possible transitions of
existing individuals and recruitment of new individuals from \(t\) to \(t+1\), generating a new trait distribution
\(n(z',t+1)\). \(L,U\) are the lower and upper bounds for
values that the trait \(z\) can have,
which defines the *domain* over which the integration is
performed. The integral \(\int_L^Un(z,t)\mathrm{dz}\) gives the total
population size at time \(t\).

To make the model more biologically interpretable, the projection
kernel \(K(z',z)\) is usually
decomposed into *sub-kernels* (Eq 2). For example, a projection
kernel to describe a lifecycle where individuals can survive, transition
to different state values, and reproduce via sexual and asexual
pathways, can be decomposed as follows:

- \(K(z',z) = P(z',z) + F(z',z) + C(z',z)\)

\(P(z',z)\) is a sub-kernel
describing transitions due to survival and trait changes of existing
individuals, \(F(z',z)\) is a
sub-kernel describing per-capita sexual contributions of existing
individuals to recruitment, and \(C(z',z)\) is a sub-kernel describing
per-capita asexual contributions of existing individuals to recruitment.
The sub-kernels are typically comprised of functions derived from
regression models that relate an individuals trait value \(z\) at time \(t\) to a new trait value at \(t+1\). For example, the \(P\) kernel for Soay sheep (*Ovis
aries*) on St. Kilda (Eq 3) may contain two regression models: (i) a
logistic regression of survival on log body mass (Eq 4), and (ii) a
linear regression of log body mass at \(t+1\) on log body mass at \(t\) (Eq 5-6). Here, \(f_G\) is used to denote a normal
probability density function, and \(\sigma_G\) is computed as the standard
deviation of the residuals of the linear regression.

\(P(z',z) = s(z) * G(z',z)\)

\(Logit(s(z)) = \alpha_s + \beta_s * z\)

\(G(z',z) = f_G(z', \mu_G, \sigma_G)\)

\(\mu_G = \alpha_G + \beta_G * z\)

Projecting the population state \(n(z',t+1)\) requires a solution to the
integral in Eq 1. Analytical solutions to the integral in Eq 1 are
usually not possible (Ellner & Rees 2006). However, numerical
approximations of these integrals can be constructed using a numerical
integration rule. A commonly used rule is the midpoint rule (this is
currently the only integration rule available in `ipmr`

,
though others should be implemented soon(ish)). The midpoint rule
divides the domain \([L,U]\) into \(m\) artificial size bins centered at \(z_i\) with width \(h = (U-L) / m\). The midpoints \(z_i = L + (i - 0.5) * h\) for \(i = \textrm{1, 2, ...}, m\). The midpoint
rule approximation for Eq 1 then becomes:

- \(n(z_j, t+1) = h\sum\limits_{i = 1}^mK(z_j, z_i)n(z_i,t)\)

In practice, the numerical approximation of the integral converts the
continuous projection kernel into a (large) discretized matrix. A matrix
multiplication of the discretized projection kernel and the discretized
trait distribution then generates a new trait distribution, a process
referred to as *model iteration* (*sensu* Easterling et
al. 2000).

While `ipmr`

intends to mimic the notation above as
closely as possible, it does provide some abstraction layers. For
example, Equations 1 and 2 are generated internally by
`ipmr`

, so you will not need to define those.

`ipmr`

The first step of defining a model in `ipmr`

is to
initialize the model using `init_ipm()`

. This function has
five arguments: `sim_gen`

, `di_dd`

,
`det_stoch`

, `kern_param`

, and
`uses_age`

. We will ignore `uses_age`

for now,
because age-size models are less common and have their own
vignette.

The combination of these arguments defines the type of IPM, and makes sure that the machinery for subsequent analyses works correctly. The possible entries for each argument are as follows:

`sim_gen`

:`"simple"`

/`"general"`

**simple**: This describes an IPM with a single continuous state variable and no discrete stages. The model presented in the Terminology section is a simple IPM, because it makes use of one, and only one, continuous state variable (\(z\)).

**general**: This describes and IPM with either more than one continuous state variable, one or more discrete stages, or both of the above. Basically, anything other than an IPM with a single continuous state variable.

`di_dd`

:`"di"`

/`"dd"`

A.

**di**: This is used to denote a**d**ensity-**i**ndependent IPM.B.

**dd**: This is used to denote a**d**ensity-**d**ependent IPM.

`det_stoch`

:`"det"`

/`"stoch"`

A.

**det**: This is used to denote a deterministic IPM. If this is the third argument of`init_ipm`

,`kern_param`

must be left as`NULL`

.B.

**stoch**: This is used to denote a stochastic IPM. If this is the third argument of`init_ipm`

,`kern_param`

must be specified. The two possibilities for the fourth are described next.

`kern_param`

:`"kern"`

/`"param"`

(Complete definitions found in Metcalf et al. 2015)A.

**kern**: This describes an IPM with discretely varying parameters such that their values are known before the IPM is specified. This is usually the case for vital rate models that estimate discrete fixed and/or random year/site effects and we do not wish to sample from parameter distributions. These models can be a bit more computationally efficient than the`param`

alternative because all kernels can be constructed before the iteration procedure begins, as opposed to requiring reconstruction for every single iteration.`ipmr`

provides a parameter set index syntax to help describe these models quickly and safely. See below for more details on this feature.B.

**param**: This describes an IPM with parameters that are re-sampled from some distribution at each iteration of the model. This could be a multivariate normal defined by covarying slopes and intercepts, or distributions of environmental variables that change from time to time. All that is required is that the parameters for the distribution are specified and that the function that generates the parameters at each iteration returns named lists that correspond to the parameter names in the model. Jump down to the`"simple_di_stoch_param"`

example for some inspiration in writing those.

The rest of this vignette will deal with simple, density independent IPMs. If you already know that you need a general IPM (i.e. an IPM with discrete stages and/or multiple continuous state variables), I still strongly recommend reading at least one complete example here before you start with those.

This is the simplest model that `ipmr`

works with. It is
an IPM with a single continuous state variable and no density dependent
functions. We’ll walk through the steps required to implement such an
IPM before getting into more complex models.

The model that we are going to implement here is similar to the one above, except there is not asexual reproduction (i.e. no \(C(z', z)\)). It can be written like this:

\(n(z',t+1) = \int_L^U K(z',z)n(z,t)dz\)

\(K(z',z) = P(z',z) + F(z',z)\)

\(P(z',z) = s(z) * G(z',z)\)

\(Logit(s(z)) = \alpha_s + \beta_s * z\)

\(G(z',z) = f_G(z', \mu_G, \sigma_G)\)

\(\mu_G = \alpha_G + \beta_G * z\)

\(F(z',z) = r_r(z) * r_s(z) * r_d(z')\)

\(Logit(r_r(z)) = \alpha_{r_r} + \beta_{r_r} * z\)

\(Log(r_s(z)) = \alpha_{r_s} + \beta_{r_s} * z\)

\(r_d(z') = f_{r_d}(z', \mu_{r_d}, \sigma_{r_d})\)

\(f_{r_d}\) and \(f_G\) denote normal probability density functions. The vital rate functions and code that might be used to generate models corresponding to them are below.

Survival (\(s(z)\)/

`s`

): a generalized linear model w/ a logit link.- Example model formula:
`glm(surv ~ size_1, data = my_surv_data, family = binomial())`

- Example model formula:
Growth (\(G(z',z)\)/

`G`

): a linear model with a normal error distribution.- Example model formula:
`lm(size_2 ~ size_1, data = my_grow_data)`

- Example model formula:
Pr(flowering) (\(r_r(z)\)/

`r_r`

): a generalized linear model w/ a logit link.- Example model formula:
`glm(flower ~ size_1, data = my_repro_data, family = binomial())`

- Example model formula:
Seed production (\(r_s(z)\)/

`r_s`

): a generalized linear model w/ log link.- Example model formula:
`glm(seeds ~ size_1, data = my_flower_data, family = poisson())`

- Example model formula:
Recruit size distribution (\(r_d(z')\)/

`r_d`

): a normal distribution w parameters`mu_rd`

(mean) and`sd_rd`

(standard deviation).- Example code:
`mu_fd = mean(my_recr_data$size_2, na.rm = TRUE)`

and`sd_fd = sd(my_recr_data$size_2, na.rm = TRUE)`

- Example code:

The first step is to decide what class of model we want to implement.
We have one continuous state variable and no spatial or temporal
variation to deal with. Thus, we have a simple, density independent,
deterministic IPM. We initialize it with `init_ipm()`

.

`my_ipm <- init_ipm(sim_gen = "simple", di_dd = "di", det_stoch = "det")`

The next step is to define the sub-kernels comprising the IPM. This
corresponds to equations 3-10. These are defined individually with calls
to `define_kernel()`

.

```
my_ipm <- define_kernel(
proto_ipm = my_ipm,
name = "P",
# The formula describes how the vital rates generate the sub-kernel. This is
# equivalent to equation 3 above.
formula = s * g,
# Next, we define the vital rate expressions for the P kernel (Eqs 4-6).
# Here, we create an expression for the inverse of the link function from
# our GLM for survival. In this case, it's the inverse logit (Eq 4).
s = 1/(1 + exp(-(s_int + s_slope * dbh_1))),
# Growth is defined by Eqs 5-6 above. There is no inverse link function required
# because we use a Gaussian GLM with an identity link function (i.e. no
# transformation).
g = dnorm(dbh_2, g_mu, g_sd), # Eq 5
g_mu = g_int + g_slope * dbh_1, # Eq 6
# The family describes the type of transition that kernel produces. See below.
family = "CC",
data_list = list(
s_int = 0.2, # coef(my_surv_mod)[1]
s_slope = 0.5, # coef(my_surv_mod)[2]
g_int = 0.1, # coef(my_grow_mod)[1]
g_slope = 1.033, # coef(my_grow_mod)[2]
g_sd = 2.2 # sd(resid(my_grow_mod))
),
# states should be a list of the state variables that the kernel operates on
states = list(c('dbh')),
uses_par_sets = FALSE,
evict_cor = TRUE,
evict_fun = truncated_distributions("norm", "g")
)
```

This function takes the kernel `name`

, the mathematical
`formula`

for the kernel, and expressions for the vital rates
that comprise it (`s`

, `g`

,
`g_mu`

).

The `family`

argument refers to the type of transition
that the kernel describes and has 4 options:

`"CC"`

- a continuous -> continuous transition`"CD"`

- a continuous -> discrete transition`"DC"`

- a discrete -> continuous transition`"DD"`

- a discrete -> discrete transition

These aren’t required for `simple`

IPMs, as all
transitions are from a continuous state to a continuous state, and so
`family = "CC"`

every time. If you forget to specify this for
simple models, `ipmr`

will automatically add it for you.
However, it will throw an error for general models. As such, it is
probably good to get in the habit of specifying it for every kernel.

The `data_list`

argument holds the constant parameters
(e.g. regression coefficient estimates).

The `states`

argument is a list containing the names of
the state variables in use. `ipmr`

internally appends
`_1`

and `_2`

to the names in `states`

list. These correspond the trait values at \(t\) and \(t+1\), respectively (distributions, if
continuously distributed states, single numbers if discrete traits).
They can also be referenced in the vital rate expressions. If they are
continuously distributed state variables, `ipmr`

will also
generate meshpoints for them that are defined in
`define_domains()`

(see below).

`uses_par_sets`

is a logical indicating whether or not
some or all of the model parameters have multiple values they can take
on. These usually correspond vital rates that are estimated from
populations in different years and/or sites. In this example, the model
is a simple, deterministic IPM and so this is set to `FALSE`

.
There are examples later on that introduce their usage and syntax, so we
will ignore it for now.

`evict_cor`

refers to whether or not to correct for
eviction in the kernel. If this is set to `TRUE`

, then you
must supply a function specifying which expressions need to be corrected
and the correction to apply. `ipmr`

provides
`truncated_distributions`

for now, though others will
eventually be implemented as well. Subsequent additions are mainly to
accommodate models in PADRINO, and I very strongly suggest sticking to
`truncated_distributions`

for your own models.

Finally, `ipmr`

is designed to be pipe-friendly. All functions
prefixed with `define_*`

take a `proto_ipm`

as
their first argument and always return a `proto_ipm`

, meaning
operations can be chained together with the `%>%`

operator. This function is included in `ipmr`

, so there is no
need to load any other packages to access it (e.g. `dplyr`

or
`magrittr`

). The first chunk below is equivalent to the two
chunks above.

```
# the %>% takes the result of the first operation and passes it as the first
# argument to the second function.
my_ipm <- init_ipm(sim_gen = "simple", di_dd = "di", det_stoch = "det") %>%
define_kernel(
name = "P",
formula = s * g,
family = "CC",
s = 1/(1 + exp(-(s_int + s_slope * dbh_1))),
g = dnorm(dbh_2, g_mu, g_sd),
g_mu = g_int + g_slope * dbh_1,
data_list = list(
s_int = -3,
s_slope = 0.5,
g_int = 0.1,
g_slope = 1.033,
g_sd = 2.2
),
states = list(c('dbh')),
uses_par_sets = FALSE,
evict_cor = TRUE,
evict_fun = truncated_distributions(fun = "norm",
target = "g")
)
```

We’ll now continue with specifying Eqs 7-10, which comprise the sexual reproduction portion of the IPM.

The rest of the model definition sequence in this example will use
the `%>%`

operator. *It is not a requirement* - you
can assign a value to `my_ipm`

at each step and the model
will be identical. Choose which ever process is more comfortable for
you!

```
my_ipm <- my_ipm %>%
define_kernel(
name = "F",
formula = r_r * r_s * r_d, # Eq 7
family = "CC",
# Eq 8. Again, we use the inverse logit transformation to compute pr(flowering)
r_r = 1/(1 + exp(-(r_r_int + r_r_slope * dbh_1))),
# Eq 9. We exponentiate this because of the log link in our seed production model
r_s = exp(r_s_int + r_s_slope * dbh_1),
# Eq 10. In this case, both the mean and standard deviation are constants
r_d = dnorm(dbh_2, r_d_mu, r_d_sd),
data_list = list(
r_r_int = -5, # coef(my_flower_mod)[1]
r_r_slope = 0.1, # coef(my_flower_mod)[2]
r_s_int = -3, # coef(my_seed_mod)[1]
r_s_slope = 0.03, # coef(my_seed_mod)[2]
r_d_mu = 1.2, # mean(my_recr_data$size_2, na.rm = TRUE)
r_d_sd = 0.7 # sd(my_recr_data$size_2, na.rm = TRUE)
),
states = list(c('dbh')),
uses_par_sets = FALSE,
evict_cor = TRUE,
evict_fun = truncated_distributions("norm", "r_d")
)
```

We are now ready to describe how to implement the model numerically.

`impl_args`

)Next, we need to define how the kernels are implemented numerically,
and how they act on state values at time \(t\) to create new state values at \(t+1\). This step is where we define the
integration rule `int_rule`

, the state the kernels modify
`state_start`

, and the state that they produce
`state_end`

. This is done with the `define_impl()`

function. It takes a named list where names correspond to the kernel
names, and each entry is itself a list containing 3 slots:
`int_rule`

, `state_start`

, and
`state_end`

. That information is used to generate Eq 1 from
above internally - you will not ever need to define Eqs 1-2 when using
`ipmr`

.

```
my_ipm <- my_ipm %>%
define_impl(
# Create one list of lists with all the info
list(
# The components of the big list should be the sub-kernels. Each sub-kernel
# requires 3 pieces of information: the numerical integration rule, the
# state variable it modifies (state_start), and the state variable it
# generates (state_end).
P = list(int_rule = "midpoint",
state_start = "dbh",
state_end = "dbh"),
F = list(int_rule = "midpoint",
state_start = "dbh",
state_end = "dbh")
)
)
```

Because this format is a bit specific, there is a helper function
that can be called within `define_impl`

or called before
initializing the IPM to make sure everything is formatted correctly -
`make_ipml_args_list()`

.

The first argument to the `make_ipml_args_list()`

function
is `kernel_names`

. This is a character vector with kernel
names for which the implementation arguments are being supplied. These
should be the same as `name`

from
`define_kernel`

.

Next, `int_rule`

is a character vector, and currently
`'midpoint'`

is the only option that is implemented.
`'b2b'`

(bin to bin method) and `'cdf'`

(cumulative density function) are on the to-do list. Additional rules
may be added if there is more demand for certain ones.

`state_start`

and `state_end`

are always the
same in `simple_*`

IPMs. In `general_*`

ones, they
may be different as individuals move from, for example, discrete to
continuous states or vice versa. These must always match the names
provided in the `states`

list.

Elements of each vector in each argument in
`make_impl_args_list()`

are matched by position. Thus, if you
specify `"P"`

as the first entry in
`kernel_names`

, then the first entries of
`int_rule`

, `state_start`

, and
`state_end`

should all correspond to the `P`

kernel. If you specify `"F"`

as the second entry, the second
entries in `int_rule`

, `state_start`

, and
`state_end`

should be the rules that correspond to the
`F`

kernel, and so on.

It is important to note that `make_impl_args_list`

*does not return a proto_ipm*. Thus, it must either
be called before beginning the model definition, or inside of

`define_impl`

!```
# Alternative 1 - call make_impl_args_list() before beginning the IPM creation pipe
impl_args <- make_impl_args_list(
kernel_names = c("P", "F"),
int_rule = rep("midpoint", 2),
state_start = rep("dbh", 2),
state_end = rep("dbh", 2)
)
my_ipm <- my_ipm %>%
define_impl(impl_args)
my_ipm <- my_ipm %>%
# Alternative 2, put the call to make_impl_args_list() inside of define_impl().
define_impl(
make_impl_args_list(
kernel_names = c("P", "F"),
int_rule = rep("midpoint", 2),
state_start = rep("dbh", 2),
state_end = rep("dbh", 2)
)
)
```

The next step in creating an IPM with `ipmr`

is to define
the domain of each continuous state variable in the `states`

list. This is done with the `define_domains()`

function. When
the `int_rule`

is `"midpoint"`

, this takes a named
set of vectors that have 3 entries each. The name corresponds to the
`state`

it is associated with, the first entry is the lower
bound, the second entry is the upper bound, and the third entry is the
number of meshpoints.

Note that for other `int_rule`

s, the vectors associated
with each domain will look different. However, those are not yet
implemented and so beyond the scope of this vignette for now.

```
my_ipm <- my_ipm %>%
define_domains(
dbh = c( # the name of the state variable.
1, # the lower bound for the domain. L from Eq 1
30, # the upper bound for the domain. U from Eq 1
200 # the number of mesh points to use for integration
)
)
```

`ipmr`

computes all values by iterating models (as opposed
to eigenvalue methods). This means we have to specify the initial
conditions for each simulation. In this case, the only initial condition
we need to specify is the initial population state. We do this using
`define_pop_state()`

. It takes a set of named expressions:
the name corresponds to the state variable, and the expression generates
values. The name of each state variable should be prefixed with an
`n_`

. The resulting vector should be the same length as the
number of meshpoints for the state variable. This chunk generates a
uniform distribution to represent the initial population state.

```
my_ipm <- my_ipm %>%
define_pop_state(n_dbh = rep(1/200, 200))
```

The minimal set of information to generate a deterministic IPM is now
wrapped up in our `proto_ipm`

and it is time to run the
model. `make_ipm()`

is a generic function and can be used for
any type of `proto_ipm`

generated with `ipmr`

.

```
my_ipm <- make_ipm(my_ipm,
iterations = 100)
lambda_ipmr <- lambda(my_ipm)
repro_value <- left_ev(my_ipm)
stable_dist <- right_ev(my_ipm)
# make_ipm_report works on either proto_ipms or made ipm objects. This
# requires the 'rmarkdown' package to run.
make_ipm_report(my_ipm$proto_ipm, title = "my_ipm_report", render_output = TRUE)
```

In this example, there are no additional arguments that need to be
passed to `make_ipm()`

- the `proto_ipm`

has all
of the information needed to generate a deterministic iteration kernel.
All of the `make_ipm()`

methods return a list with a length
of 5 with entries:

`sub_kernels`

contains, in this example,`P`

and`F`

.`env_list`

contains an environment named`main_env`

. This environment holds key information for implementing the model, such as the meshpoints, and upper and lower state variable bounds. This can be omitted by setting`return_main_env = FALSE`

.`return_all_envs = TRUE`

in`make_ipm()`

returns the evaluation environments for each sub-kernel. These are used by subsequent analysis methods and so are important for internal usage, but are probably of limited direct use to most users.`env_seq`

contains either a character vector with the sequence of indices used to select kernels from the`sub_kernels`

during a stochastic simulation (`*_stoch_kern`

), or a matrix of parameter estimates from each iteration of the stochastic simulation (`*_kern_param`

). Not relevant for`*_det`

methods and so contains`NA`

.`pop_state`

contains a list of matrices for each item defined in`define_pop_state()`

. The rows of each matrix correspond to the population state and columns are time steps. Additionally, contains a slot`lambda`

with a 1 x`iterations`

matrix containing per-capita growth rates for every time step of the model.`proto_ipm`

contains the`proto_ipm`

object used to generate the model. This is used extensively by other methods in the package.

`lambda`

, `left_ev`

, and `right_ev`

are generic functions corresponding to the dominant eigenvalue, dominant
left eigenvector, and dominant right eigenvector of the iteration
kernel, respectively. `lambda`

is available for all classes
of IPMs, while `left/right_ev`

is available for all
deterministic IPMs. Stochastic equivalents of the latter will get
implemented eventually. `make_ipm_report()`

generates an
Rmarkdown file containing Latex equations and parameter values used to
implement the IPM. This may be useful for publications/appendices, or
for sending an IPM to the PADRINO project for
archiving.

`predict()`

methods in vital rate expressionsThe vast majority of IPM pipelines entail fitting a statistical models to data to create vital rate functions. Thus, we’re typically working with model objects rather than just parameter values. Sometimes the models have many terms, including interactions and/or discrete grouping effects, and the vital rate expressions are complicated to write out. Furthermore, the flexibility of IPMs means our vital rate models may be semi- or fully non-parametric, and writing out the functional form may not even be possible prior to the model fitting exercise (e.g. a GAM).

To deal with this, many regression modeling packages provide a
`predict`

method for the class of models that they implement.
We can use these in `ipmr`

to take the place of the vital
rate expressions (more common) or the kernel formulas (probably not as
common).

We’ll go through a quick example of how to incorporate these
expressions into the model building process using a data set for
*Carpobrotus edulis*, an invasive succulent. This is real data
collected in Israel and appears in Bogdan
et al. (2021). The data set is also included in `ipmr`

and can be loaded by running `data(iceplant_ex)`

.

First, we’ll load in the example data set and fit some simple vital rate models. Some notes about the data:

`log_size`

/`log_size_next`

are log transformed surface areas in \(\mathrm{m}^2\). This will be the state variable for the model, and is abbreviated`sa_1`

and`sa_2`

in the model implementation code below.`repro`

indicates whether or not a plant flowered and is either 0 or 1`flower_n`

indicates the number of flowers a reproductive plant made, so is either a positive integer or`NA`

.

\(f_G\) and \(f_{r_d}\) denote normal probability density
functions. The vital rates models and the IPM variable names are as
follows (`vital_rate_model`

,
`IPM_variable_name`

):

- Growth (
`grow_mod`

,`g`

)

\(G(z',z) = f_G(z', \mu_G, \sigma_G)\)

\(\mu_G = \alpha_G + \beta_G * z\)

- Survival (
`surv_mod`

,`s`

)

- \(Logit(s(z)) = \alpha_s + \beta_s * z\)

- Probability of flowering (
`repr_mod`

,`r_p`

)

- \(Logit(r_p(z)) = \alpha_{r_p} + \beta_{r_p} * z\)

- Number of flowers for flowering plants (
`seed_mod`

,`r_s`

)

- \(Log(r_s(z)) = \alpha_{r_s} + \beta_{r_s} * z\)

- Total number of new recruits at \(t +
1\) (
`recr_n`

,`recr_n`

)

- \(N_r\)

- Total number of flowers at \(t\)
(
`flow_n`

,`flow_n`

)

- \(N_f = \int_L^Ur_s(z)dz\)

- The number of new recruits at \(t+1\) per flower at \(t\) (
`NA`

,`r_r`

)

- \(\frac{N_r}{N_f}\)

- Recruit size distribution (
`recr_mu, recr_sd`

,`r_d`

)

- \(r_d(z') = f_{r_d}(z', \mu_{r_d}, \sigma_{r_d})\)

```
library(ipmr)
data(iceplant_ex)
# growth model
grow_mod <- lm(log_size_next ~ log_size, data = iceplant_ex)
grow_sd <- sd(resid(grow_mod))
# survival model
surv_mod <- glm(survival ~ log_size, data = iceplant_ex, family = binomial())
# Pr(flowering) model
repr_mod <- glm(repro ~ log_size, data = iceplant_ex, family = binomial())
# Number of flowers per plant model
seed_mod <- glm(flower_n ~ log_size, data = iceplant_ex, family = poisson())
# New recruits have no size(t), but do have size(t + 1)
recr_data <- subset(iceplant_ex, is.na(log_size))
recr_mu <- mean(recr_data$log_size_next)
recr_sd <- sd(recr_data$log_size_next)
# This data set doesn't include information on germination and establishment.
# Thus, we'll compute the realized recruitment parameter as the number
# of observed recruits divided by the number of flowers produced in the prior
# year.
recr_n <- length(recr_data$log_size_next)
flow_n <- sum(iceplant_ex$flower_n, na.rm = TRUE)
# Now, we bundle everything into a list as we did before. We can
# pass the model objects directly into the list, and do not need to extract
# coefficients.
params <- list(recr_mu = recr_mu,
recr_sd = recr_sd,
grow_sd = grow_sd,
surv_mod = surv_mod,
grow_mod = grow_mod,
repr_mod = repr_mod,
seed_mod = seed_mod,
recr_n = recr_n,
flow_n = flow_n)
# The lower and upper bounds for integration. Adding 20% on either end to minimize
# eviction. The minimum value is negative, so we multiply by 1.2, rather than 0.8.
# If it were positive, we would need to change that to 0.8.
L <- min(c(iceplant_ex$log_size, iceplant_ex$log_size_next), na.rm = TRUE) * 1.2
U <- max(c(iceplant_ex$log_size, iceplant_ex$log_size_next), na.rm = TRUE) * 1.2
```

We now have everything we need to get started. We use almost the same
code as the first example to implement the model, but substitute
`predict()`

in for the mathematical form of the vital rate
expressions. When we do this, we have to make sure that the names of the
`newdata`

argument in `predict`

match the names in
our model formula, and that the values we put into it are the names of
the state variables in our model. These same caveats apply to using
`predict`

interactively.

```
pred_ipm <- init_ipm(sim_gen = "simple", di_dd = "di", det_stoch = "det") %>%
define_kernel(
name = "P",
family = "CC",
formula = s * g,
# Instead of the inverse logit transformation, we use predict() here.
# We have to be sure that the "newdata" argument of predict is correctly specified.
# This means matching the names used in the model itself (log_size) to the names
# we give the domains (sa_1). In this case, we use "sa" (short for surface area) as
# the new data to generate predictions for.
s = predict(surv_mod,
newdata = data.frame(log_size = sa_1),
type = 'response'),
g_mu = predict(grow_mod,
newdata = data.frame(log_size = sa_1),
type = 'response'),
# We specify the rest of the kernel the same way.
g = dnorm(sa_2, g_mu, grow_sd),
states = list(c("sa")),
data_list = params,
uses_par_sets = FALSE,
evict_cor = TRUE,
evict_fun = truncated_distributions(fun = "norm",
target = "g")
) %>%
define_kernel(
name = "F",
family = "CC",
formula = r_p * r_s * r_d * r_r,
# As above, we use predict(model_object). We make sure the names of the "newdata"
# match the names in the vital rate model formulas, and the values match the
# names of domains they use.
r_p = predict(repr_mod,
newdata = data.frame(log_size = sa_1),
type = 'response'),
r_s = predict(seed_mod,
newdata = data.frame(log_size = sa_1),
type = 'response'),
r_r = recr_n / flow_n,
r_d = dnorm(sa_2, recr_mu, recr_sd),
states = list(c("sa")),
data_list = params,
uses_par_sets = FALSE,
evict_cor = TRUE,
evict_fun = truncated_distributions(fun = "norm",
target = "r_d")
) %>%
define_impl(
make_impl_args_list(
kernel_names = c("P", "F"),
int_rule = rep('midpoint', 2),
state_start = rep("sa", 2),
state_end = rep("sa", 2)
)
) %>%
define_domains(
sa = c(L,
U,
100)
) %>%
define_pop_state(n_sa = runif(100)) %>%
make_ipm(iterate = TRUE,
iterations = 200)
```

`make_ipm()`

can handle most types of widely used model
classes without any additional effort (e.g. `lm`

,
`glm`

, `lme4`

, `brms`

,
`nlme`

,`betareg`

). The list of method available in
`ipmr`

is not exhaustive though, and sometimes there will be
errors.

If you are sure that a `predict`

method exists for your
model type, but you get errors along the lines of
`Element <some number> of '.x' must be a vector, not a call`

,
this usually means that I forgot to add the class of model that you’re
trying to use to `ipmr`

. We can get around this by wrapping
those model objects in `use_vr_model()`

. Please also create
an Issue here
describing the model class you’re trying to work with so it can be added
to future versions.

Say our growth model isn’t recognized by `ipmr`

. We try to
`make_ipm()`

, and get our obscure error message about
`.x`

. The snippet below shows how to rectify this. We wrap
our model objects in `use_vr_model()`

when creating the
`data_list`

, and then re-run it with the exact same code as
above. `use_vr_model()`

tells `ipmr`

that this
object is a model object, not a raw parameter value. As the package
(hopefully) gets used more widely, this issue should disappear, and
`use_vr_model()`

will (hopefully) become obsolete.

```
params <- list(recr_mu = recr_mu,
recr_sd = recr_sd,
grow_sd = grow_sd,
surv_mod = use_vr_model(surv_mod), # wrap the model
grow_mod = use_vr_model(grow_mod), # wrap the model
repr_mod = use_vr_model(repr_mod), # wrap the model
seed_mod = use_vr_model(seed_mod), # wrap the model
recr_n = recr_n,
flow_n = flow_n)
pred_ipm <- init_ipm(sim_gen = "simple", di_dd = "di", det_stoch = "det") %>%
define_kernel(
name = "P",
family = "CC",
formula = s * g,
s = predict(surv_mod,
newdata = data.frame(log_size = sa_1),
type = 'response'),
g_mu = predict(grow_mod,
newdata = data.frame(log_size = sa_1),
type = 'response'),
g = dnorm(sa_2, g_mu, grow_sd),
states = list(c("sa")),
data_list = params,
uses_par_sets = FALSE,
evict_cor = TRUE,
evict_fun = truncated_distributions("norm", "g")
) %>%
define_kernel(
name = "F",
family = "CC",
formula = r_p * r_s * r_d * r_r,
r_p = predict(repr_mod,
newdata = data.frame(log_size = sa_1),
type = 'response'),
r_s = predict(seed_mod,
newdata = data.frame(log_size = sa_1),
type = 'response'),
r_r = recr_n / flow_n,
r_d = dnorm(sa_2, recr_mu, recr_sd),
states = list(c("sa")),
data_list = params,
uses_par_sets = FALSE,
evict_cor = TRUE,
evict_fun = truncated_distributions("norm", "r_d")
) %>%
define_impl(
make_impl_args_list(
kernel_names = c("P", "F"),
int_rule = rep('midpoint', 2),
state_start = rep("sa", 2),
state_end = rep("sa", 2)
)
) %>%
define_domains(
sa = c(L,
U,
100)
) %>%
define_pop_state(n_sa = runif(100)) %>%
make_ipm(iterate = TRUE,
iterations = 200)
```

Usually, we want a bit more out of our model than just the
eigenvalues and eigenvectors. Here is a quick example of how to compute
generation time (\(T\)) and the
per-generation growth rate (\(R_0\)).
We’ll write two functions to help out. These are not part of
`ipmr`

, but will eventually be incorporated into a separate
package for life history analyses with IPMs which is designed to work
with `ipmr`

’s models.

```
R_0 <- function(ipm) {
Fm <- ipm$sub_kernels$F
Pm <- ipm$sub_kernels$P
I <- diag(nrow(Pm))
N <- solve(I - Pm)
R <- Fm %*% N
out <- Re(eigen(R)$values[1])
return(out)
}
gen_T <- function(ipm) {
R0 <- R_0(ipm)
lam <- lambda(ipm) %>% as.vector()
out <- log(R0) / log(lam)
return(out)
}
R_0(pred_ipm)
gen_T(pred_ipm)
```

In the example above, we created a model with a single, deterministic kernel defined by a single state variable. Next, we’ll go through an example showing how to build kernels that are constructed from discretely varying parameter estimates.

Say we have multiple sites sampled in the same year, but are only
interested in each site’s asymptotic dynamics (perhaps to compare
performance across space). `ipmr`

uses a syntax that closely
mirrors the mathematical notation of these models to successively build
up more complicated expressions without requiring that much extra code
as compared to the examples above. `ipmr`

refers to these as
*parameter sets*, and abbreviates them `par_sets`

. For
example, a random intercept for site may be denoted \(\alpha_{site}\), where \(_{site}\) provides an index for the
different values that \(\alpha\) can
take on.

The general idea is to append an index suffix to each variable in the
code that is modified, and supply a list of values that the index can
actually take in the `par_set_indices`

slot of
`define_kernel()`

. In the example below, the \(site\) index subscripts are incorporated
into the `ipmr`

code using the `_site`

suffix.
There is a longer vignette dedicated to translating math to R code to
the `ipmr`

syntax here.

NB: because of the way these indices are handled internally, they should always appear at the end of the name they modify. For example, use

`s_slope_site`

instead of`s_site_slope`

. Multiple indices are fine as long as they chained together at the end (e.g.`s_slope_site_year`

). Models may still work if formatted with indices in the middle of a name, but they also may not (often with obscure error messages).

For this example, we’ll use the following vital rate models. The
`(g)lmer`

functions are from the `lme4`

package.

survival (

`s_site`

): a logistic regression with a random site intercept (`s_r_site`

).Example mathematical notation: \(Logit(s_{site}(z)) = \alpha_{s,site} + \alpha_s + \beta_s * z\)

Example model formula:

`glmer(surv ~ size_1 + (1 | site), data = my_surv_data, family = binomial()))`

growth (

`g_site`

): A linear regression random site intercept (`g_r_site`

).Example mathematical notation:

\(\mu_G = \alpha_{G,site} + \alpha_G + \beta_G * z\)

\(G_{site}(z',z) = f_G(z', \mu_{G,site}, \sigma_G)\)

Example model formula:

`lmer(size_2 ~ size_1 + (1 | site), data = my_grow_data, family = gaussian()))`

pr(flowering) (

`p_r`

): A logistic regression. This has no random site effect.Example mathematical notation: \(Logit(p_r(z)) = \alpha_{r_p} + \beta_{r_p} * z\)

Example model formula:

`glm(flower ~ size_1 , data = my_flower_data, family = binomial()))`

seed production (

`r_s_site`

): A Poisson regression with a random site intercept (`r_s_r_site`

)Example mathematical notation: \(Log(r_{s_{site}}(z)) = \alpha_{r_s, site} + \alpha_{r_s} + \beta_{r_s} * z\)

Example model formula:

`glmer(seed_num ~ size_1 + (1 | site), data = my_surv_data, family = poisson()))`

recruit size distribution (

`r_d`

): A normal distribution with two constant parameters, the mean (`mu_rd`

) and standard deviation (`sd_rd`

).- Example mathematical notation: \(r_d(z') = f_{r_d}(z', \mu_{r_d}, \sigma_{r_d})\)

The chunk below takes the place of fitting regression models to
actual data, so the code that replaces this chunk will look a little
different (and probably involve the use of
`fixef(some_vital_rate_model`

and
`ranef(some_vital_rate_model)`

)).

```
library(ipmr)
# Define some fixed parameters
fixed_list <- list(
s_int = 1.03, # fixef(my_surv_mod)[1] - uses fixef because we now have a model with random effects
s_slope = 2.2, # fixef(my_surv_mod)[2]
g_int = 3.7, # fixef(my_grow_mod)[1]
g_slope = 0.92, # fixef(my_grow_mod)[2]
sd_g = 0.9, # sd(resid(my_grow_mod))
r_r_int = 0.09, # coef(my_repro_mod)[1] - uses coef because there are no random effects in this model
r_r_slope = 0.05, # coef(my_repro_mod)[2]
r_s_int = 0.1, # fixef(my_flower_mod)[1]
r_s_slope = 0.005, # fixef(my_flower_mod)[2]
mu_rd = 9, # mean(my_recr_data$size_2, na.rm = TRUE)
sd_rd = 2 # sd(my_recr_data$size_2, na.rm = TRUE)
)
```

We’ve defined a `fixed_list`

that holds all of the fixed
parameters in our model. Next, we’ll make up some random site specific
intercepts, and add those to the `fixed_list`

, naming it
`all_params_list`

. You don’t necessarily need to rename
anything, this is just for disambiguation.

```
# Now, simulate some random intercepts for growth (g_), survival (s_),
# and offspring production (r_s_). This part is for the purpose of the example.
# First, we create vector of values that each random component can take.
g_r_int <- rnorm(5, 0, 0.3) # unlist(ranef(my_grow_mod)) for an lme4 output
s_r_int <- rnorm(5, 0, 0.7) # unlist(ranef(my_surv_mod)) for an lme4 output
r_s_r_int <- rnorm(5, 0, 0.2) # unlist(ranef(my_flower_mod)) for an lme4 output
# We'll call our hypothetical sites 1, 2, 3, 4, and 5. The "r" prefix is to
# remind us that these are random quantities.
nms <- paste("r_", 1:5, sep = "")
names(g_r_int) <- paste('g_', nms, sep = "")
names(s_r_int) <- paste('s_', nms, sep = "")
names(r_s_r_int) <- paste('r_s_', nms, sep = "")
# Each set of parameters is converted to a named list. The names should match
# the variables referenced in each define_kernel() call.
g_params <- as.list(g_r_int)
s_params <- as.list(s_r_int)
r_s_params <- as.list(r_s_r_int)
# add them all together using c()
all_params_list <- c(fixed_list, g_params, s_params, r_s_params)
```

We’ve created a list where each entry is named and contains a single
parameter value. This is now ready for use in
`define_kernel()`

.

```
my_ipm <- init_ipm(sim_gen = "simple", di_dd = "di", det_stoch = "det") %>%
define_kernel(
# Our P kernels will vary from site to site, so we index it with "_site"
name = 'P_site',
# Similarly, our survival and growth functions will vary from site to site
# so these are also indexed
formula = s_site * g_site,
family = "CC",
# The linear predictor for the survival function can be split out
# into its own expression as well. This might help keep track of things.
# Survival is indexed by site as well.
s_lin_site = s_int + s_r_site + s_slope * ht_1,
s_site = 1 / (1 + exp(-s_lin_site)),
# Again, we modify the vital rate expression to include "_site".
g_site = dnorm(ht_2, mean = mu_g_site, sd = sd_g),
mu_g_site = g_int + g_slope * ht_1 + g_r_site,
data_list = all_params_list,
states = list(c('ht')),
# Here, we tell ipmr that the model has some parameter sets, and
# provide a list describing the values the index can take. The values in
# par_set_indices are substituted for "site" everywhere in the model, except
# for the data list. This is why we had to make sure that the names there
# matched the levels we supply here.
uses_par_sets = TRUE,
par_set_indices = list(site = 1:5),
# We must also index the variables in the eviction function
evict_cor = TRUE,
evict_fun = truncated_distributions("norm", "g_site")
) %>%
define_kernel(
# The F kernel also varies from site to site
name = "F_site",
formula = r_r * r_s_site * r_d,
family = "CC",
# In this example, we didn't include a site level effect for probability
# of flowering, only seed production. Thus, this expression is NOT indexed.
r_r_lin = r_r_int + r_r_slope * ht_1,
r_r = 1 / (1 + exp(- r_r_lin)),
# We index the seed production expression with the site effect
r_s_site = exp(r_s_int + r_s_r_site + r_s_slope * ht_1),
r_d = dnorm(ht_2, mean = mu_rd, sd = sd_rd),
data_list = all_params_list,
states = list(c('ht')),
# As in the P kernel, we specify the values the index can have.
uses_par_sets = TRUE,
par_set_indices = list(site = 1:5),
evict_cor = TRUE,
evict_fun = truncated_distributions("norm", "r_d")
) %>%
define_impl(
make_impl_args_list(
# The impl_args are also modified with the index
kernel_names = c("P_site", "F_site"),
int_rule = rep("midpoint", 2),
state_start = rep("ht", 2),
state_end = rep("ht", 2)
)
) %>%
define_domains(ht = c(0.2, 40, 100)) %>%
# We also append the suffix in define_pop_state(). THis will create a deterministic
# simulation for every "site"
define_pop_state(n_ht_site = runif(100)) %>%
make_ipm(iterate = TRUE,
iterations = 100)
lambda(my_ipm)
```

Aside from appending the `_site`

index to a number of
variables, the code is pretty much the same as the first example.
`ipmr`

automatically substitutes 1, 2, 3, 4, and 5 for each
occurrence of `site`

in the kernel formulas and vital rate
expressions. Thus, `P_site`

is expanded to `P_1`

,
`P_2`

, `P_3`

, `P_4`

, `P_5`

,
`s_site`

to `s_1`

, `s_2`

,
`s_3`

, `s_4`

, and `s_5`

.
`s_r_site`

is converted to `s_r_1`

,
`s_r_2`

, etc. This is why we needed to make sure the names in
`all_params_list`

had the actual numbers appended to
them.

For more complicated models, you may need multiple indices. In those
cases, you can append them in any order you like to each variable. For
example, sites (`_site`

) and years (`_yr`

) could
be appended to survival (`s_`

) like so:

`s_site_yr = some_expression`

The `par_set_indices`

list then gets modified like so:

`par_set_indices = list(site = c("A", "B", "C"), yr = c(2001:2005))`

In the examples above, we first created a model with a single, deterministic kernel defined by a single state variable. Then we modeled multiple sites with deterministic kernels that varied from site to site. Next, we’ll go through an example showing how to build stochastic models from kernels that are constructed from discretely varying parameter estimates.

The example below is the same as the last example, except that this
time, we call our random effect “year” (indexed with `_yr`

)
and we’ll run a simulation by randomly choosing a year’s kernel for each
model iteration (“kernel resampling” *sensu* Metcalf
et al. 2015). Along the way, we’ll also learn how to pass custom
functions to the model.

The vital rate models are as follows:

survival (

`s_yr`

): a logistic regression with a random year intercept (`s_r_yr`

).- Example model formula:
`glmer(surv ~ size_1 + (1 | yr), data = my_surv_data, family = binomial()))`

- Example model formula:
growth (

`g_yr`

): A linear regression random year intercept (`g_r_yr`

).- Example model formula:
`lmer(size_2 ~ size_1 + (1 | yr), data = my_grow_data, family = gaussian()))`

- Example model formula:
pr(flowering) (

`p_r`

): A logistic regression. This has no random year effect.- Example model formula:
`glm(flower ~ size_1 , data = my_surv_data, family = binomial()))`

- Example model formula:
seed production (

`r_s_yr`

): A Poisson regression with a random year intercept (`r_s_r_yr`

)- Example model formula:
`glmer(seed_num ~ size_1 + (1 | yr), data = my_surv_data, family = poisson()))`

- Example model formula:
recruit size distribution (

`r_d`

): A normal distribution with two constant parameters, the mean (`mu_rd`

) and standard deviation (`sd_rd`

).

The chunk below takes the place of fitting regression models to
actual data, so the code that replaces this chunk will look a little
different (and probably involve the use of
`fixef(some_vital_rate_model)`

and/or
`ranef(some_vital_rate_model)`

).

```
library(ipmr)
# Define some fixed parameters
fixed_list <- list(
s_int = 1.03, # fixef(my_surv_mod)[1] - uses fixef because we now have a model with random effects
s_slope = 2.2, # fixef(my_surv_mod)[2]
g_int = 3.7, # fixef(my_grow_mod)[1]
g_slope = 0.92, # fixef(my_grow_mod)[2]
sd_g = 0.9, # sd(resid(my_grow_mod))
r_r_int = 0.09, # coef(my_repro_mod)[1] - uses coef because there are no random effects in this model
r_r_slope = 0.05, # coef(my_repro_mod)[2]
r_s_int = 0.1, # fixef(my_flower_mod)[1]
r_s_slope = 0.005, # fixef(my_flower_mod)[2]
mu_fd = 9, # mean(my_recr_data$size_2, na.rm = TRUE)
sd_fd = 2 # sd(my_recr_data$size_2, na.rm = TRUE)
)
```

We’ve defined a `fixed_list`

that holds all of the fixed
parameters in our model. Next, we’ll make up some random year specific
intercepts, and add those to the `fixed_list`

, naming it
`all_params_list`

. You don’t necessarily need to rename
anything, this is just for disambiguation.

```
# Now, simulate some random intercepts for growth (g_), survival (s_),
# and offspring production (r_s_). This part is for the purpose of the example.
# First, we create vector of values corresponding to
g_r_int <- rnorm(5, 0, 0.3) # unlist(ranef(my_grow_mod)) for an lme4 output
s_r_int <- rnorm(5, 0, 0.7) # unlist(ranef(my_surv_mod)) for an lme4 output
r_s_r_int <- rnorm(5, 0, 0.2) # unlist(ranef(my_flower_mod)) for an lme4 output
nms <- paste("r_", 1:5, sep = "")
names(g_r_int) <- paste('g_', nms, sep = "")
names(s_r_int) <- paste('s_', nms, sep = "")
names(r_s_r_int) <- paste('r_s_', nms, sep = "")
# Each set of parameters is converted to a named list. The names should match
# the variables referenced in each define_kernel() call.
g_params <- as.list(g_r_int)
s_params <- as.list(s_r_int)
r_s_params <- as.list(r_s_r_int)
# add them all together using c()
all_params_list <- c(fixed_list, g_params, s_params, r_s_params)
```

We’ve created a list where each entry is named and contains a single
parameter value. This is now ready for use in
`define_kernel()`

We have our parameter values. In the next step, we’ll compose some
helper functions to make the vital rate expressions inside of
`define_kernel()`

a bit more concise and expressive. These
are passed in a list to the `usr_funs`

argument of
`make_ipm()`

.

```
inv_logit <- function(sv, int, slope) {
return(
1/(1 + exp(-(int + slope * sv)))
)
}
# same as above, but handles the extra term from the random effect we simulated.
inv_logit_r <- function(sv, int, slope, r_eff) {
return(
1/(1 + exp(-(int + slope * sv + r_eff)))
)
}
pois_r <- function(sv, int, slope, r_eff) {
return(
exp(
int + slope * sv + r_eff
)
)
}
my_funs <- list(inv_logit = inv_logit,
inv_logit_r = inv_logit_r,
pois_r = pois_r)
```

The only requirement for these functions is that they contain valid R code, and they return either real numbers or integers.

With the functions and parameter values defined, we are now ready to
begin composing the model. Each expression’s syntax will look very
similar to the deterministic example - the main difference is we are now
pretending that our expressions are indexed by year (`_yr`

)
as opposed to site, and we change the `det_stoch`

and
`kern_param`

arguments in `init_ipm()`

.

```
my_ipm <- init_ipm(sim_gen = "simple",
di_dd = "di",
det_stoch = "stoch",
kern_param = "kern") %>%
define_kernel(
name = 'P_yr', # P becomes P_yr
formula = s_yr * g_yr, # g and s become g_yr and s_yr, respectively
family = "CC",
# Note the usage of the inv_logit_r, which we defined in the block above.
# it is passed to make_ipm()
s_yr = inv_logit_r(ht_1, s_int, s_slope, s_r_yr),
g_yr = dnorm(ht_2, mean = mu_g_yr, sd = sd_g),
mu_g_yr = g_int + g_slope * ht_1 + g_r_yr,
# all_params_list contains the named parameters g_r_1, g_r_2, s_r_1, s_r_2, etc.
# This is the only level where the user is required to fully expand the name
# X par_set_indices combinations.
data_list = all_params_list,
states = list(c('ht')),
uses_par_sets = TRUE,
par_set_indices = list(yr = 1:5),
evict_cor = TRUE,
# reference to g_yr in evict_fun is also updated
evict_fun = truncated_distributions("norm", "g_yr")
) %>%
define_kernel(
name = "F_yr", # Update the names as we did for the P kernel
formula = r_r * r_s_yr * r_d,
family = "CC",
r_r = inv_logit(ht_1, r_r_int, r_r_slope),
r_s_yr = pois_r(ht_1, r_s_int, r_s_slope, r_s_r_yr),
r_d = dnorm(ht_2, mean = mu_fd, sd = sd_fd),
data_list = all_params_list,
states = list(c('ht')),
uses_par_sets = TRUE,
par_set_indices = list(yr = 1:5),
evict_cor = TRUE,
evict_fun = truncated_distributions("norm", "r_d")
) %>%
define_impl(
make_impl_args_list(
kernel_names = c("P_yr", "F_yr"),
int_rule = rep("midpoint", 2),
state_start = rep("ht", 2),
state_end = rep("ht", 2)
)
) %>%
define_domains(ht = c(0.2, 40, 100)) %>%
define_pop_state(n_ht = runif(100)) %>%
make_ipm(usr_funs = my_funs,
kernel_seq = sample(1:5, 100, replace = TRUE),
iterate = TRUE,
iterations = 100)
# The default for stochastic lambda is to compute mean(log(ipm$pop_state$lambda)).
# It also removes the first 10% of iterations to handle "burn in". The amount
# removed can be adjusted with the burn_in parameter.
stoch_lambda <- lambda(my_ipm)
# lambda(comp_method = 'pop_size', type = 'all') will compute the population
# growth rate for every time step as the sum(n_ht_t_1) / sum(n_ht_t).
iteration_lambdas <- lambda(my_ipm, type_lambda = 'all')
```

The only major difference between the IPM definition in the first
example and this one is that we’ve passed custom functions to the call
to `make_ipm()`

, and altered our vital rate expressions to
use them instead of the pure math for each variable transformation. On
the other hand, the contents of the output *will* look a little
different.

The

`sub_kernels`

slot of`my_ipm`

now contains 5 P and 5 F kernels (10 total).The

`env_seq`

slot will now contain a character vector with the indices used to select sub-kernels for each model iteration. These can be used to reproduce results later on.All other slots will look the same as in the previous

`simple_di_det`

example.

100 iterations is not enough to estimate stochastic growth rates (\(\lambda_s\)), but the computations can take some time with a lot of iterations, and are not practical for demonstration purposes.

In some cases, it is not desirable to work with single estimates of a random variable, and we would prefer to work with the distributions they come from. Environmental variables like temperature and precipitation may be random with reasonably well known distributions. A stochastic simulation can incorporate this information to help us understand the consequences of this random variability.

This is where the `kern_param = "param"`

methods come in
handy. Unfortunately, these methods are less computationally efficient
than their `kern_param = "kern"`

counterparts because they
must rebuild each kernel for every single iteration. However, they are
fantastic tools for exploring the consequences of continuous
environmental variation.

Below is an example that demonstrates how to work with environmental
variation. `ipmr`

is careful to only evaluate each expression
in `define_env_state()`

once per iteration, so we can safely
work with multivariate distributions using very little additional code.
It is important to note that this not necessarily the same thing as
incorporating parameter uncertainty into your model. Parameter
uncertainty is covered in the final section of this vignette.

The parameters for the growth and survival functions will be sampled
from distributions for temperature and precipitation one time per
iteration, and an example of a function to pass to
`define_env_state()`

is included in the first chunk
below.

Stochastic simulations require specification of the initial
conditions. `ipmr`

aims to make this straightforward for you
by providing two helpers - `define_pop_state()`

and
`define_env_state()`

. We were introduced
`define_pop_state()`

above, and will now cover
`define_env_state()`

.

`define_env_state()`

`define_env_state()`

takes a named set of expressions in
the `...`

and then a data list, much like how
`define_kernel()`

takes them. The values created need to be
in a list that has names corresponding to the parameter names in the
vital rate expressions. In this example, they are called
`temp`

and `precip`

. The `data_list`

in
`define_env_state()`

should contain any variables used in the
function we define (in this example, `sample_env()`

). we can
reference the names in list that `sample_env`

returns in the
vital rate expressions in each kernel definition as if they were in the
`data_list`

of `define_kernel`

.

The first chunk below initializes the parameters and functions that the model uses. It takes the place of the usual vital rate model fitting process.

This example uses models for survival and growth that include
environmental covariates. To limit complexity, we won’t include an
interaction term, but you are free to include as many as you want in
your own models. We define a single function to sample the environmental
distributions to illustrate how to use continuously varying parameter
distributions in `ipmr`

. This only uses two models and one
function to limit the the complexity of the example, however there is no
upper limit on the number of parameters or functions you can use in your
own models.

The vital rate functions are described here:

survival (

`s`

): a logistic regression.- example model formula:
`glm(survival ~ size_1 + temp + precip, data = my_surv_data, family = binomial())`

- example model formula:
growth (

`g`

): a linear regression- example model formula:
`glm(size_2 ~ size_1 + temp + precip, data = my_grow_data)`

- example model formula:
flower probability (

`r_r`

): A logistic regression.- example model formula:
`glm(repro ~ size_1, data = my_repro_data, family = binomial())`

- example model formula:
seed production (

`r_s`

): a logistic regression.- example model formula:
`glm(flower_n ~ size_1, data = my_flower_data, family = poisson())`

- example model formula:
recruit sizes (

`r_d`

): A normal distribution- example code: mean (
`r_d_mu`

)`mean(my_recruit_data$size_2, na.rm = TRUE)`

and standard deviation (`r_d_sd`

)`sd(my_recruit_data$size_2, na.rm = TRUE)`

- example code: mean (

And the the parameter values are given here:

```
library(ipmr)
# Define the fixed parameters in a list. The temperature and precipitation
# coefficients are defined as s_temp, s_precip, g_temp, and g_precip.
constant_params <- list(
s_int = -10,
s_slope = 1.5,
s_precip = 0.00001,
s_temp = -0.003,
g_int = 0.2,
g_slope = 1.01,
g_sd = 1.2,
g_temp = -0.002,
g_precip = 0.004,
r_r_int = -3.2,
r_r_slope = 0.55,
r_s_int = -0.4,
r_s_slope = 0.5,
r_d_mu = 1.1,
r_d_sd = 0.1
)
# Now, we create a set of environmental covariates. In this example, we use
# a normal distribution for temperature and a Gamma for precipitation.
env_params <- list(
temp_mu = 8.9,
temp_sd = 1.2,
precip_shape = 1000,
precip_rate = 2
)
# We define a wrapper function that samples from these distributions
sample_env <- function(env_params) {
# We generate one value for each covariate per iteration, and return it
# as a named list. We can reference the names in this list in vital rate
# expressions.
temp_now <- rnorm(1,
env_params$temp_mu,
env_params$temp_sd)
precip_now <- rgamma(1,
shape = env_params$precip_shape,
rate = env_params$precip_rate)
out <- list(temp = temp_now, precip = precip_now)
return(out)
}
# Again, we can define our own functions and pass them into calls to make_ipm. This
# isn't strictly necessary, but can make the model code more readable/less error prone.
inv_logit <- function(lin_term) {
1/(1 + exp(-lin_term))
}
```

We now have parameter estimates. Time to build the IPM!

```
init_pop_vec <- runif(100)
param_resamp_model <- init_ipm(sim_gen = "simple",
di_dd = "di",
det_stoch = "stoch",
kern_param = "param") %>%
define_kernel(
name = 'P',
formula = s * g,
family = 'CC',
# Parameters created by define_env_state() can be referenced by name just like
# any other parameter in the model.
g_mu = g_int + g_slope * surf_area_1 + g_temp * temp + g_precip * precip,
s_lin_p = s_int + s_slope * surf_area_1 + s_temp * temp + s_precip * precip,
s = inv_logit(s_lin_p),
g = dnorm(surf_area_2, g_mu, g_sd),
data_list = constant_params,
states = list(c('surf_area')),
uses_par_sets = FALSE,
evict_cor = TRUE,
evict_fun = truncated_distributions("norm", "g")
) %>%
define_kernel(
name = 'F',
formula = r_r * r_s * r_d,
family = 'CC',
r_r_lin_p = r_r_int + r_r_slope * surf_area_1,
r_r = inv_logit(r_r_lin_p),
r_s = exp(r_s_int + r_s_slope * surf_area_1),
r_d = dnorm(surf_area_2, r_d_mu, r_d_sd),
data_list = constant_params,
states = list(c('surf_area')),
uses_par_sets = FALSE,
evict_cor = TRUE,
evict_fun = truncated_distributions("norm", "r_d")
) %>%
define_impl(
make_impl_args_list(
kernel_names = c("P", "F"),
int_rule = rep('midpoint', 2),
state_start = rep('surf_area', 2),
state_end = rep('surf_area', 2)
)
) %>%
define_domains(surf_area = c(0, 10, 100))
```

The continuously varying parameters and the expressions that generate
them should be passed into `define_env_state()`

.
`make_ipm()`

will ensure that these are evaluated only once
per iteration of the model, so that we can safely work with joint
distributions that generate multiple parameter estimates per draw.

The `...`

part of `define_env_state()`

should
be expressions that generate the variables we would like to reference.
`temp`

and `precip`

are the names that will be
generated by the IPM code, and so they are the names that should be
referenced in the kernels’ vital rate expressions, not
`env_covs`

. we don’t need to remember that we called this
particular object `env_covs`

when we write our vital rate
expressions, but it does have to be named something for the IPM code to
run.

The `data_list`

contains the list of parameter values for
the expressions in `...`

, and any custom functions that we
specify to sample with. In this example, the environmental variables
don’t co-vary, but this is not a requirement. We could just as easily
specify them as coming from a multivariate distribution, and modify our
`env_sample`

function accordingly.

```
param_resamp_model <- param_resamp_model %>%
define_env_state(
env_covs = sample_env(env_params),
data_list = list(env_params = env_params,
sample_env = sample_env)
) %>%
define_pop_state(
pop_vectors = list(
n_surf_area = init_pop_vec
)
) %>%
make_ipm(usr_funs = list(inv_logit = inv_logit),
iterate = TRUE,
iterations = 10,
return_sub_kernels = TRUE)
# By default, lambda computes stochastic lambda with stochastic models
lambda(param_resamp_model)
# We can get lambdas for each time step by adding type_lambda = "all"
lambda(param_resamp_model, type_lambda = 'all')
# If we want to see the actual draws that were used at each step of the
# model iteration, we can access these using the output's $env_seq slot.
param_resamp_model$env_seq
```

Longer running stochastic parameter-resampled models can take up a
lot of space in memory when all of the sub-kernels are saved from each
iteration. For example, running the model above for 10,000 iterations
would result in 20,000 \(100 \times
100\) matrices in the `sub_kernels`

slot of the IPM
object (~16 GB of RAM). This will likely result in crashes as smaller
machines run out of available RAM. Therefore, `make_ipm()`

contains an argument `return_sub_kernels`

for these types of
models that allows you to switch off that behavior and conserve
available RAM. By default, this is set to `FALSE`

. If you
need sub-kernels for downstream analyses, set this option to
`TRUE`

and make sure you have a computer with sufficient RAM
(64-128 GB range is likely required to store all information for longer
running models).

These warnings also apply to *all* density dependent model
classes and the same `return_sub_kernels`

argument can be
used for those as well.

In some cases, we might want to provide a sequence of environmental
values ahead of time, as opposed to sampling them at each iteration. We
can do this by re-writing the `sample_env`

function so that
it takes the current model iteration as a parameter that selects values
from the pre-specified environmental values. `ipmr`

defines
the variable `t`

internally, which we can use to access the
current model iteration. First, we re-define the model:

```
param_resamp_model <- init_ipm(sim_gen = "simple",
di_dd = "di",
det_stoch = "stoch",
kern_param = "param") %>%
define_kernel(
name = 'P',
formula = s * g,
family = 'CC',
# Parameters created by define_env_state() can be referenced by name just like
# any other parameter in the model.
g_mu = g_int + g_slope * surf_area_1 + g_temp * temp + g_precip * precip,
s_lin_p = s_int + s_slope * surf_area_1 + s_temp * temp + s_precip * precip,
s = inv_logit(s_lin_p),
g = dnorm(surf_area_2, g_mu, g_sd),
data_list = constant_params,
states = list(c('surf_area')),
uses_par_sets = FALSE,
evict_cor = TRUE,
evict_fun = truncated_distributions("norm", "g")
) %>%
define_kernel(
name = 'F',
formula = r_r * r_s * r_d,
family = 'CC',
r_r_lin_p = r_r_int + r_r_slope * surf_area_1,
r_r = inv_logit(r_r_lin_p),
r_s = exp(r_s_int + r_s_slope * surf_area_1),
r_d = dnorm(surf_area_2, r_d_mu, r_d_sd),
data_list = constant_params,
states = list(c('surf_area')),
uses_par_sets = FALSE,
evict_cor = TRUE,
evict_fun = truncated_distributions("norm", "r_d")
) %>%
define_impl(
make_impl_args_list(
kernel_names = c("P", "F"),
int_rule = rep('midpoint', 2),
state_start = rep('surf_area', 2),
state_end = rep('surf_area', 2)
)
) %>%
define_domains(surf_area = c(0, 10, 100))
```

Next, we generate some parameter values for `temp`

and
`precip`

:

```
env_states <- data.frame(precip = rgamma(10, shape = 1000, rate = 2),
temp = rnorm(10, mean = 8.9, sd = 1.2))
```

In this case, we are trying to sample these in a specific order - the
1st row first, the 2nd row second, 3rd row third, etc. Instead of
writing a function that generates random draws from a distribution, we
write one that samples rows from this data frame and creates a named
list. `ipmr`

generates a helper variable, `t`

,
that lets us access the current model iteration. Thus, our function to
generate the environmental variables at each time step could look like
this:

```
sample_env <- function(env_states, iteration) {
out <- as.list(env_states[iteration, ])
names(out) <- names(env_states)
return(out)
}
```

We can now `define_env_state()`

and
`define_pop_state()`

:

```
param_resamp_model <- param_resamp_model %>%
define_env_state(env_params = sample_env(env_states,
iteration = t), # "t" indexes the current model iteration
data_list = list(
env_states = env_states,
sample_env = sample_env
)) %>%
define_pop_state(
pop_vectors = list(
n_surf_area = init_pop_vec
)
) %>%
make_ipm(usr_funs = list(inv_logit = inv_logit),
iterate = TRUE,
iterations = 10)
```

Notice that we set `iteration = t`

in the call to
`sample_env`

. Otherwise, everything else looks the same as
before.

Above, we defined the environment explicitly ahead of time. However,
there may be cases where we’d rather incorporate other environmental
models directly into our stochastic IPM (e.g. a specific climate change
scenario). We *could* sample our environmental model ahead of
time and simply incorporate those directly into the model, indexing by
time, similar to what we did above. However, this may not be convenient
for some purposes.

Below is an example that is similar to above, but includes information on time when drawing environmental covariate values. This simulation will set up a scenario in which temperature increases and precipitation decreases with time, and adds random noise to each draw.

```
param_resamp_model <- init_ipm(sim_gen = "simple",
di_dd = "di",
det_stoch = "stoch",
kern_param = "param") %>%
define_kernel(
name = 'P',
formula = s * g,
family = 'CC',
# Parameters created by define_env_state() can be referenced by name just like
# any other parameter in the model.
g_mu = g_int + g_slope * surf_area_1 + g_temp * temp + g_precip * precip,
s_lin_p = s_int + s_slope * surf_area_1 + s_temp * temp + s_precip * precip,
s = inv_logit(s_lin_p),
g = dnorm(surf_area_2, g_mu, g_sd),
data_list = constant_params,
states = list(c('surf_area')),
uses_par_sets = FALSE,
evict_cor = TRUE,
evict_fun = truncated_distributions("norm", "g")
) %>%
define_kernel(
name = 'F',
formula = r_r * r_s * r_d,
family = 'CC',
r_r_lin_p = r_r_int + r_r_slope * surf_area_1,
r_r = inv_logit(r_r_lin_p),
r_s = exp(r_s_int + r_s_slope * surf_area_1),
r_d = dnorm(surf_area_2, r_d_mu, r_d_sd),
data_list = constant_params,
states = list(c('surf_area')),
uses_par_sets = FALSE,
evict_cor = TRUE,
evict_fun = truncated_distributions("norm", "r_d")
) %>%
define_impl(
make_impl_args_list(
kernel_names = c("P", "F"),
int_rule = rep('midpoint', 2),
state_start = rep('surf_area', 2),
state_end = rep('surf_area', 2)
)
) %>%
define_domains(surf_area = c(0, 10, 100))
```

Next, we’ll set up a model for temperature and precipitation, and a
function that samples from those models to generate values we can use in
our IPM simulation. Setting the initial values
`init_temp`

/`init_precip`

is optional, but making
them parameters makes it a bit easier to specify a different model later
on.

```
env_sampler <- function(time, init_temp = 10, init_precip = 1500) {
t_est <- init_temp + 0.2 * time + rnorm(1, mean = 0, sd = 0.2)
p_est <- init_precip + -3.3 * time + rnorm(1, mean = 0, sd = 100)
if(p_est <= 0) p_est <- 1
out <- list(temp = t_est,
precip = p_est)
return(out)
}
```

We’re pretty much ready to run our simulation now!

```
param_resamp_ipm <- param_resamp_model %>%
define_env_state(
env_params = env_sampler(time = t, init_temp = 10, init_precip = 1500),
data_list = list()
) %>%
define_pop_state(
n_surf_area = init_pop_vec
) %>%
make_ipm(
iterations = 100,
usr_funs = list(env_sampler = env_sampler,
inv_logit = inv_logit)
)
lambda(param_resamp_ipm)
```

Currently, `ipmr`

doesn’t contain functions to deal with
uncertainty. The goal is to change that soon. In the meantime, here is
an example of how to deal with that in the context of a simple,
deterministic IPM. The procedure is similar for kernel- and
parameter-resampled models, and can be adapted to suit those as
needed.

There are a multitude of ways to incorporate uncertainty and other sources of continuous variation into regression models and IPMs. That variety is beyond the scope of this vignette. The following resources are excellent introductions to both and themselves contain a multitude of further readings:

For the purpose of this example, we’ll use the iceplant data set and do a bootstrapping procedure on the raw data. At each iteration of the bootstrap, we’ll re-run the regression models, and then re-build the IPM. We’ll store the lambda values for each one, and then repeat the procedure 50 times to keep the example short. Normally, many more re-samplings would be required to obtain reasonable confidence intervals.

First, we construct our vital rate models and IPM from the observed data.

```
library(ipmr)
data(iceplant_ex)
grow_mod <- lm(log_size_next ~ log_size, data = iceplant_ex)
grow_sd <- sd(resid(grow_mod))
surv_mod <- glm(survival ~ log_size, data = iceplant_ex, family = binomial())
repr_mod <- glm(repro ~ log_size, data = iceplant_ex, family = binomial())
seed_mod <- glm(flower_n ~ log_size, data = iceplant_ex, family = poisson())
recr_data <- subset(iceplant_ex, is.na(log_size))
recr_mu <- mean(recr_data$log_size_next)
recr_sd <- sd(recr_data$log_size_next)
recr_n <- length(recr_data$log_size_next)
flow_n <- sum(iceplant_ex$flower_n, na.rm = TRUE)
params <- list(recr_mu = recr_mu,
recr_sd = recr_sd,
grow_sd = grow_sd,
surv_mod = surv_mod,
grow_mod = grow_mod,
repr_mod = repr_mod,
seed_mod = seed_mod,
recr_n = recr_n,
flow_n = flow_n)
L <- min(c(iceplant_ex$log_size, iceplant_ex$log_size_next), na.rm = TRUE) * 1.2
U <- max(c(iceplant_ex$log_size, iceplant_ex$log_size_next), na.rm = TRUE) * 1.2
obs_ipm <- init_ipm(sim_gen = "simple",
di_dd = "di",
det_stoch = "det") %>%
define_kernel(
name = "P",
family = "CC",
formula = s * g,
# Instead of the inverse logit transformation, we use predict() here.
# We have to be sure that the "newdata" argument of predict is correctly specified.
# This means matching the names used in the model itself (log_size) to the names
# we give the domains. In this case, we use "sa" (short for surface area) as
# the new data to generate predictions for.
s = predict(surv_mod,
newdata = data.frame(log_size = sa_1),
type = 'response'),
g_mu = predict(grow_mod,
newdata = data.frame(log_size = sa_1),
type = 'response'),
# We specify the rest of the kernel the same way.
g = dnorm(sa_2, g_mu, grow_sd),
states = list(c("sa")),
data_list = params,
uses_par_sets = FALSE,
evict_cor = TRUE,
evict_fun = truncated_distributions(fun = "norm",
target = "g")
) %>%
define_kernel(
name = "F",
family = "CC",
formula = r_p * r_s * r_d * r_r,
# As above, we use predict(model_object). We make sure the names of the "newdata"
# match the names in the vital rate model formulas, and the values match the
# names of domains they use.
r_p = predict(repr_mod,
newdata = data.frame(log_size = sa_1),
type = 'response'),
r_s = predict(seed_mod,
newdata = data.frame(log_size = sa_1),
type = 'response'),
r_r = recr_n / flow_n,
r_d = dnorm(sa_2, recr_mu, recr_sd),
states = list(c("sa")),
data_list = params,
uses_par_sets = FALSE,
evict_cor = TRUE,
evict_fun = truncated_distributions(fun = "norm",
target = "r_d")
) %>%
define_impl(
make_impl_args_list(
kernel_names = c("P", "F"),
int_rule = rep('midpoint', 2),
state_start = rep("sa", 2),
state_end = rep("sa", 2)
)
) %>%
define_domains(
sa = c(L,
U,
100)
) %>%
define_pop_state(n_sa = runif(100)) %>%
make_ipm(iterate = TRUE,
iterations = 100)
lambda_obs <- lambda(obs_ipm)
```

Next, we’ll initialize a vector to hold the final time-step lambdas for each resampled dataset. We’re also going to split out the new recruits from the other data. We only have 12, and the model building won’t work if we happen to not pull any of them out when we resample the data.

```
all_lambdas <- numeric(50L)
recr_data <- subset(iceplant_ex, is.na(log_size))
adults <- subset(iceplant_ex, !is.na(log_size))
use_proto <- obs_ipm$proto_ipm
```

Now, we refit the vital rate models, and use the
`parameters<-`

setter function to update the original
`proto_ipm`

object with the new vital rate models. This saves
us from re-typing the whole model pipeline again. Normally, we would
bootstrap more than 50 times, but for the sake of this example and
saving time, we will only do 50.

```
for(i in 1:50) {
sample_ind <- seq(1, nrow(adults), by = 1)
boot_ind <- sample(sample_ind, size = nrow(adults), replace = TRUE)
boot_data <- rbind(adults[boot_ind, ],
recr_data)
grow_mod <- lm(log_size_next ~ log_size, data = boot_data)
grow_sd <- sd(resid(grow_mod))
surv_mod <- glm(survival ~ log_size, data = boot_data, family = binomial())
repr_mod <- glm(repro ~ log_size, data = boot_data, family = binomial())
seed_mod <- glm(flower_n ~ log_size, data = boot_data, family = poisson())
recr_mu <- mean(recr_data$log_size_next)
recr_sd <- sd(recr_data$log_size_next)
recr_n <- length(recr_data$log_size_next)
flow_n <- sum(boot_data$flower_n, na.rm = TRUE)
params <- list(recr_mu = recr_mu,
recr_sd = recr_sd,
grow_sd = grow_sd,
surv_mod = surv_mod,
grow_mod = grow_mod,
repr_mod = repr_mod,
seed_mod = seed_mod,
recr_n = recr_n,
flow_n = flow_n)
# Insert the new vital rate models into the proto_ipm, then rebuild the IPM.
parameters(use_proto) <- params
boot_ipm <- use_proto %>%
make_ipm(iterate = TRUE,
iterations = 100)
all_lambdas[i] <- lambda(boot_ipm)
}
# Plot the results
hist(all_lambdas)
abline(v = lambda_obs, col = 'red', lwd = 2, lty = 2)
```

You can adapt this code to store any quantity of interest (e.g. regression coefficients, population structures).

An article on these is available on the website and from an R session:

`browseVignettes("ipmr")`