--- title: Overview of nimbleMacros author: Ken Kellner output: markdown::html_format vignette: > %\VignetteIndexEntry{Overview of nimbleMacros} %\VignetteEngine{knitr::knitr} \usepackage[utf8]{inputenc} --- # Introduction The `nimbleMacros` package has two goals: 1. Provide a set of macros for linear modeling that may be used by both end users and R package developers. 2. Demonstrate the use of macros in [NIMBLE](https://r-nimble.org/) models. We intend the macros in this package to be widely useful, but also just a starting point and example for NIMBLE users to write their own macros. To begin, load the `nimbleMacros` package, which will also load the `NIMBLE` package: ``` r library(nimbleMacros) ``` Macro functionality is currently enabled by default in `NIMBLE`. Loading `nimbleMacros` will also activate it automatically. If you need to do so yourself, you can set the option manually: ``` r nimbleOptions(enableMacros = TRUE) ``` We'll begin by showing off the macros available in the package, and then discuss building your own macros at the end. # `LM`: A macro for complete linear models The `LM` macro generates complete NIMBLE model code for a variety of linear, generalized linear, and generalized linear mixed models. It uses very similar syntax to R functions like `lm`, `glm`, `lme4::lmer`, and `lme4::glmer`. ## Example logistic regression with `mtcars` Here's a logistic regression in R using the built-in `mtcars` dataset. We'll model transmission type (`am`, where `am = 1` is manual transmission) as a function of standardized miles per gallon. ``` r rmod <- glm(am ~ scale(mpg), data = mtcars, family = binomial) summary(rmod) ``` ``` ## ## Call: ## glm(formula = am ~ scale(mpg), family = binomial, data = mtcars) ## ## Coefficients: ## Estimate Std. Error z value Pr(>|z|) ## (Intercept) -0.4351 0.4543 -0.958 0.33816 ## scale(mpg) 1.8504 0.6921 2.673 0.00751 ** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## (Dispersion parameter for binomial family taken to be 1) ## ## Null deviance: 43.230 on 31 degrees of freedom ## Residual deviance: 29.675 on 30 degrees of freedom ## AIC: 33.675 ## ## Number of Fisher Scoring iterations: 5 ``` To create this model with the `LM` macro, we'll start by organizing the data and constants into a list. ``` r const <- mtcars[c("am", "mpg")] ``` Next we'll specify the prior distributions we want to use for the intercept and the slope coefficient(s). This is done using the `setPriors` function. We'll specify a uniform(-10, 10) distribution for the intercept and a normal distribution with mean 0 and precision 0.1 for the slope coefficient. Note that these distribution definitions are wrapped in `quote`; you can also supply them as strings. ``` r pr <- setPriors(intercept = quote(dunif(-10,10)), coefficient = quote(dnorm(0, 0.1))) ``` Finally, we specify the NIMBLE code, which is a single call to the `LM` macro that should look familiar to our call to `glm` above. Two things to note: 1. We include a call to `scale` in the formula as with `glm()`; `LM` will do the scaling automatically. 2. You don't need to specify dimensions of the variables with `LM`, as this is handled automatically, with `LM` assuming all data and constants are vectors of the same length. In more complex situations you can use the `LINPRED` macro; see the next section. ``` r code <- nimbleCode({ LM(am ~ scale(mpg), family = binomial, priors = pr) }) ``` We create a NIMBLE model with this input code, and show what the final code looks like after macro expansion. ``` r mod <- nimbleModel(code = code, constants = const) mod$getCode() ``` ``` ## { ## "# LM" ## " ## FORLOOP" ## for (i_1 in 1:32) { ## am[i_1] ~ dbinom(mu_[i_1], size = 1) ## } ## " ## ----" ## " ## LINPRED" ## " ### nimbleMacros::FORLOOP" ## for (i_2 in 1:32) { ## logit(mu_[i_2]) <- beta_Intercept + beta_mpg_scaled * ## mpg_scaled[i_2] ## } ## " ### ----" ## " ### nimbleMacros::LINPRED_PRIORS" ## beta_Intercept ~ dunif(-10, 10) ## beta_mpg_scaled ~ dnorm(0, 0.1) ## " ### ----" ## " ## ----" ## "# ----" ## } ``` Finally, we'll fit the model using MCMC, which yields similar results to `glm`. ``` r samples <- nimbleMCMC(mod, nchain = 1, niter = 3000, nburnin = 2000, samplesAsCodaMCMC = TRUE) ``` ``` ## |-------------|-------------|-------------|-------------| ## |-------------------------------------------------------| ``` ``` r summary(samples) ``` ``` ## ## Iterations = 1:1000 ## Thinning interval = 1 ## Number of chains = 1 ## Sample size per chain = 1000 ## ## 1. Empirical mean and standard deviation for each variable, ## plus standard error of the mean: ## ## Mean SD Naive SE Time-series SE ## beta_Intercept -0.4651 0.4693 0.01484 0.03274 ## beta_mpg_scaled 2.0391 0.7063 0.02234 0.04881 ## ## 2. Quantiles for each variable: ## ## 2.5% 25% 50% 75% 97.5% ## beta_Intercept -1.4296 -0.7595 -0.4548 -0.134 0.4101 ## beta_mpg_scaled 0.7872 1.5560 2.0039 2.464 3.5055 ``` ## Example linear mixed model using `ChickWeights` The `LM` macro can also generate code for random effects models. To illustrate this we will use the built-in `ChickWeight` dataset, which contains repeated measurements of chick weights over time. First, fit the model in R using the `lme4` package: ```r library(lme4) rmod <- lmer(weight ~ Time + (1|Chick), data=ChickWeight) summary(rmod) ``` ``` ## Linear mixed model fit by REML ['lmerMod'] ## Formula: weight ~ Time + (1 | Chick) ## Data: ChickWeight ## ## REML criterion at convergence: 5619.4 ## ## Scaled residuals: ## Min 1Q Median 3Q Max ## -3.0086 -0.5528 -0.0888 0.4975 3.4588 ## ## Random effects: ## Groups Name Variance Std.Dev. ## Chick (Intercept) 717.9 26.79 ## Residual 799.4 28.27 ## Number of obs: 578, groups: Chick, 50 ## ## Fixed effects: ## Estimate Std. Error t value ## (Intercept) 27.8451 4.3877 6.346 ## Time 8.7261 0.1755 49.716 ## ## Correlation of Fixed Effects: ## (Intr) ## Time -0.422 ``` To fit this model with NIMBLE using `LM`, we'll first organize the `ChickWeight` data into a list of constants and data for NIMBLE. ``` r chick_const <- list(weight = ChickWeight$weight, Time = ChickWeight$Time, Chick = ChickWeight$Chick) ``` We can create the model with `LM` using the same formula notation as `lme4` with the `(1|Chick)` part of the formula indicating random intercepts by chick. ``` r code <- nimbleCode({ LM(weight ~ Time + (1|Chick)) }) ``` Next we create the model object and show the expanded code. ``` r mod <- nimbleModel(code = code, constants = chick_const) mod$getCode() ``` ``` ## { ## "# LM" ## " ## FORLOOP" ## for (i_1 in 1:578) { ## weight[i_1] ~ dnorm(mu_[i_1], sd = sd_residual) ## } ## " ## ----" ## " ## LINPRED" ## " ### nimbleMacros::FORLOOP" ## for (i_2 in 1:578) { ## mu_[i_2] <- beta_Intercept + beta_Time * Time[i_2] + ## beta_Chick[Chick[i_2]] ## } ## " ### ----" ## " ### nimbleMacros::LINPRED_PRIORS" ## beta_Intercept ~ dnorm(0, sd = 1000) ## beta_Time ~ dnorm(0, sd = 1000) ## sd_Chick ~ dunif(0, 100) ## " #### nimbleMacros::FORLOOP" ## for (i_3 in 1:50) { ## beta_Chick[i_3] ~ dnorm(0, sd = sd_Chick) ## } ## " #### ----" ## " ### ----" ## " ## ----" ## sd_residual ~ dunif(0, 100) ## "# ----" ## } ``` Fit the model: ``` r samples <- nimbleMCMC(mod, nchain = 1, niter = 3000, nburnin = 2000, samplesAsCodaMCMC = TRUE) ``` ``` ## |-------------|-------------|-------------|-------------| ## |-------------------------------------------------------| ``` ``` r summary(samples) ``` ``` ## ## Iterations = 1:1000 ## Thinning interval = 1 ## Number of chains = 1 ## Sample size per chain = 1000 ## ## 1. Empirical mean and standard deviation for each variable, ## plus standard error of the mean: ## ## Mean SD Naive SE Time-series SE ## beta_Intercept 28.294 5.1556 0.163035 0.95444 ## beta_Time 8.727 0.1888 0.005971 0.01540 ## sd_Chick 27.444 3.1085 0.098299 0.25186 ## sd_residual 28.376 0.9509 0.030069 0.06735 ## ## 2. Quantiles for each variable: ## ## 2.5% 25% 50% 75% 97.5% ## beta_Intercept 18.807 24.475 28.33 31.810 38.099 ## beta_Time 8.375 8.593 8.73 8.858 9.098 ## sd_Chick 21.609 25.371 27.27 29.478 34.189 ## sd_residual 26.579 27.802 28.31 29.031 30.293 ``` # `LINPRED` As the name suggests, the `LINPRED` macro generates a linear predictor from a provided R formula. It is used internally by the `LM` macro and can be used separately as well. The macro acts something like the `model.matrix` function in R, although instead of creating a model matrix, it creates the NIMBLE model code for the linear predictor. The macro supports random effects using the same syntax as the `lme4` package. To demonstrate the macro, we'll use the built-in `ChickWeight` dataset again. In this dataset there are repeated weight measurements (`weight`) of chicks (`Chick`) over time (`Time`). ``` r head(ChickWeight) ``` ``` ## weight Time Chick Diet ## 1 42 0 1 1 ## 2 51 2 1 1 ## 3 59 4 1 1 ## 4 64 6 1 1 ## 5 76 8 1 1 ## 6 93 10 1 1 ``` ## Generate a linear predictor from a formula To create a linear predictor with time as a covariate in R, we would specify the formula as `~Time`. ``` r head(model.matrix(~Time, ChickWeight)) ``` ``` ## (Intercept) Time ## 1 1 0 ## 2 1 2 ## 3 1 4 ## 4 1 6 ## 5 1 8 ## 6 1 10 ``` Based on the model matrix, the formula implies two parameters: the intercept and the coefficient/slope for time. To generate the linear predictor NIMBLE model code, we start by setting up the constants/data list. We'll use the `chick_const` list we used with `LM`, and add the sample size `N` so we can use it to define dimensions. ``` r chick_const$N <- nrow(ChickWeight) ``` In the model code, we specify the parameter name for the linear predictor (`mu`) and calculate it with the `LINPRED` macro and the same formula as above. Here we also set an argument `priors = NULL`; we'll come back to this later. Notice that unlike with `LM`, we explicitly define the dimensions of `mu` and `Time`. ``` r code <- nimbleCode({ mu[1:N] <- LINPRED(~Time[1:N], priors = NULL) }) ``` Next create the model object and view the code. ``` r mod <- nimbleModel(code, chick_const) mod$getCode() ``` ``` ## { ## "# LINPRED" ## " ## nimbleMacros::FORLOOP" ## for (i_1 in 1:N) { ## mu[i_1] <- beta_Intercept + beta_Time * Time[i_1] ## } ## " ## ----" ## "# ----" ## } ``` NIMBLE expands the `LINPRED` macro into a `for` loop calculating a value of `mu` for each sample in the dataset, where `mu` is a function of time. Notice two parameters were created: `beta_Intercept` (the intercept) and `beta_Time`, the coefficient associated with time. We can get a list of the parameters created by macros in the model using the `getMacroParameters()` function: ``` r mod$getMacroParameters() ``` ``` ## $LINPRED ## $LINPRED[[1]] ## [1] "beta_Intercept" "beta_Time" ## ## ## $`nimbleMacros::FORLOOP` ## $`nimbleMacros::FORLOOP`[[1]] ## NULL ``` When all the data and constants used by `LINPRED` have the same dimensions, you can specify the dimensions for the left-hand-side only: ``` r code <- nimbleCode({ mu[1:N] <- LINPRED(~Time, priors = NULL) }) mod <- nimbleModel(code, chick_const) mod$getCode() ``` ``` ## { ## "# LINPRED" ## " ## nimbleMacros::FORLOOP" ## for (i_1 in 1:N) { ## mu[i_1] <- beta_Intercept + beta_Time * Time[i_1] ## } ## " ## ----" ## "# ----" ## } ``` In more complicated situations you may have different dimensions for different data/constants. For example, we'll convert `Time` into a one-column matrix, but keep `mu` as a vector. ``` r chick_const2 <- chick_const chick_const2$Time <- matrix(chick_const2$Time, ncol = 1) str(chick_const2) ``` ``` ## List of 4 ## $ weight: num [1:578] 42 51 59 64 76 93 106 125 149 171 ... ## $ Time : num [1:578, 1] 0 2 4 6 8 10 12 14 16 18 ... ## $ Chick : Ord.factor w/ 50 levels "18"<"16"<"15"<..: 15 15 15 15 15 15 15 15 15 15 ... ## $ N : int 578 ``` To handle this we just need to specify different dimensions for each in the call to `LINPRED`. ``` r code <- nimbleCode({ mu[1:N] <- LINPRED(~Time[1:N,1], priors = NULL) }) mod <- nimbleModel(code, chick_const2) mod$getCode() ``` ``` ## { ## "# LINPRED" ## " ## nimbleMacros::FORLOOP" ## for (i_1 in 1:N) { ## mu[i_1] <- beta_Intercept + beta_Time * Time[i_1, 1] ## } ## " ## ----" ## "# ----" ## } ``` ## Factor/categorical covariates The `LINPRED` macro can automatically handle categorical covariates (what R calls factors). A categorical covariate can be provided in the constants as either an R factor, or as a character vector/matrix/array (which will be converted automatically to a factor). For example, suppose we included a covariate for diet type (`Diet`) in the model. The `Diet` covariate is already coded as an R factor: ``` r class(ChickWeight$Diet) ``` ``` ## [1] "factor" ``` ``` r levels(ChickWeight$Diet) ``` ``` ## [1] "1" "2" "3" "4" ``` ``` r chick_const$Diet <- ChickWeight$Diet ``` ``` r code <- nimbleCode({ mu[1:N] <- LINPRED(~Time[1:N] + Diet[1:N], priors = NULL) }) mod <- nimbleModel(code, chick_const) mod$getCode() ``` ``` ## { ## "# LINPRED" ## " ## nimbleMacros::FORLOOP" ## for (i_1 in 1:N) { ## mu[i_1] <- beta_Intercept + beta_Time * Time[i_1] + beta_Diet[Diet[i_1]] ## } ## " ## ----" ## "# ----" ## } ``` A new vector parameter `beta_Diet` is created with four elements, one per level of `Diet`. The actual parameter `Diet` is then converted to numeric and used as the index for this vector parameter. It isn't obvious from this code, but the first element of `beta_Diet` will be fixed at 0 (serving as the reference level). We'll show this below. ## Add a link function We could also add a link function to the left-hand-side of the linear predictor calculation. Suppose we wanted to force `mu` to be positive; we could use a log link function, specifying this with the `link` argument. ``` r code <- nimbleCode({ mu[1:N] <- LINPRED(~Time[1:N], priors = NULL, link = log) }) mod <- nimbleModel(code, chick_const) mod$getCode() ``` ``` ## { ## "# LINPRED" ## " ## nimbleMacros::FORLOOP" ## for (i_1 in 1:N) { ## log(mu[i_1]) <- beta_Intercept + beta_Time * Time[i_1] ## } ## " ## ----" ## "# ----" ## } ``` ## Functions in formulas The `LINPRED` macro supports a handful of functions commonly included in formulas: `offset()`, `scale()`, `log()`, and `I()`. For example, to build a linear predictor with the `Time` covariate scaled to have mean 0 and SD 1: ``` r code <- nimbleCode({ mu[1:N] <- LINPRED(~scale(Time[1:N]), priors = NULL, link = log) }) mod <- nimbleModel(code, chick_const) mod$getCode() ``` ``` ## { ## "# LINPRED" ## " ## nimbleMacros::FORLOOP" ## for (i_1 in 1:N) { ## log(mu[i_1]) <- beta_Intercept + beta_Time_scaled * Time_scaled[i_1] ## } ## " ## ----" ## "# ----" ## } ``` This adds a new constant `Time_scaled` to the constants: ``` r head(mod$getConstants()$Time_scaled) ``` ``` ## [1] -1.5858774 -1.2899493 -0.9940213 -0.6980932 -0.4021652 -0.1062371 ``` `LINPRED` uses a modular system to handle formula functions. You can add support for additional functions in `LINPRED` formulas by writing a function with a specific name, class, and output. See the source code of the package, specifically the file `formulaFunction.R`, for examples of how to create these functions. ## Get default priors If we do not manually set the `priors` argument, `LINPRED` will automatically generate a default set of priors for the two parameters it created (`beta_Intercept` and `beta_Time`). Internally, `LINPRED` is calling the `LINPRED_PRIORS` macro (described below). ``` r code <- nimbleCode({ mu[1:N] <- LINPRED(~Time[1:N] + Diet[1:N]) }) mod <- nimbleModel(code, chick_const) mod$getCode() ``` ``` ## { ## "# LINPRED" ## " ## nimbleMacros::FORLOOP" ## for (i_1 in 1:N) { ## mu[i_1] <- beta_Intercept + beta_Time * Time[i_1] + beta_Diet[Diet[i_1]] ## } ## " ## ----" ## " ## nimbleMacros::LINPRED_PRIORS" ## beta_Intercept ~ dnorm(0, sd = 1000) ## beta_Time ~ dnorm(0, sd = 1000) ## beta_Diet[1] <- 0 ## beta_Diet[2] ~ dnorm(0, sd = 1000) ## beta_Diet[3] ~ dnorm(0, sd = 1000) ## beta_Diet[4] ~ dnorm(0, sd = 1000) ## " ## ----" ## "# ----" ## } ``` We now have priors for each parameter. Note that as mentioned previously, the first value of the `beta_Diet` vector is fixed at 0, serving as the reference level. Warning: one should check the default priors carefully to make sure that they make sense for your problem. In particular, if the values in your data are of magnitudes such that parameters of the linear predictor could be large in magnitude, the default priors could have a strong influence on your results. ## Suppress the macro comments We can also suppress the added macro comments to make things a little more concise. ``` r nimbleOptions(enableMacroComments = FALSE) ``` ``` r code <- nimbleCode({ mu[1:N] <- LINPRED(~Time[1:N]) }) mod <- nimbleModel(code, chick_const) mod$getCode() ``` ``` ## { ## for (i_1 in 1:N) { ## mu[i_1] <- beta_Intercept + beta_Time * Time[i_1] ## } ## beta_Intercept ~ dnorm(0, sd = 1000) ## beta_Time ~ dnorm(0, sd = 1000) ## } ``` ## Random effects The `LINPRED` macro supports random effects using `lme4` syntax. For example, a random intercept by `Chick`: ``` r code <- nimbleCode({ mu[1:N] <- LINPRED(~Time[1:N] + (1|Chick[1:N])) }) mod <- nimbleModel(code, chick_const) mod$getCode() ``` ``` ## { ## for (i_1 in 1:N) { ## mu[i_1] <- beta_Intercept + beta_Time * Time[i_1] + beta_Chick[Chick[i_1]] ## } ## beta_Intercept ~ dnorm(0, sd = 1000) ## beta_Time ~ dnorm(0, sd = 1000) ## sd_Chick ~ dunif(0, 100) ## for (i_2 in 1:50) { ## beta_Chick[i_2] ~ dnorm(0, sd = sd_Chick) ## } ## } ``` Note that by default, all random effects are mean 0, the same approach used by `lme4`. We can also get both random intercepts and (time) slopes by `Chick`; by default these are correlated. ``` r code <- nimbleCode({ mu[1:N] <- LINPRED(~Time[1:N] + (Time[1:N]|Chick[1:N])) }) mod <- nimbleModel(code, chick_const) mod$getCode() ``` ``` ## { ## for (i_1 in 1:N) { ## mu[i_1] <- beta_Intercept + beta_Time * Time[i_1] + beta_Chick[Chick[i_1]] + ## beta_Time_Chick[Chick[i_1]] * Time[i_1] ## } ## beta_Intercept ~ dnorm(0, sd = 1000) ## beta_Time ~ dnorm(0, sd = 1000) ## sd_Chick ~ dunif(0, 100) ## sd_Time_Chick ~ dunif(0, 100) ## re_sds_Chick[1] <- sd_Chick ## re_sds_Chick[2] <- sd_Time_Chick ## Ustar_Chick[1:2, 1:2] ~ dlkj_corr_cholesky(1, 2) ## U_Chick[1:2, 1:2] <- uppertri_mult_diag(Ustar_Chick[1:2, ## 1:2], re_sds_Chick[1:2]) ## re_means_Chick[1] <- 0 ## re_means_Chick[2] <- 0 ## for (i_2 in 1:50) { ## B_Chick[i_2, 1:2] ~ dmnorm(re_means_Chick[1:2], cholesky = U_Chick[1:2, ## 1:2], prec_param = 0) ## beta_Chick[i_2] <- B_Chick[i_2, 1] ## beta_Time_Chick[i_2] <- B_Chick[i_2, 2] ## } ## } ``` The correlated random effects model here uses an [LKJ distribution prior](https://r-nimble.org/html_manual/cha-writing-models.html#lkj-distribution-for-correlation-matrices). As with `lme4` we can also specify uncorrelated random slopes and intercepts using `||` instead of `|`. ``` r code <- nimbleCode({ mu[1:N] <- LINPRED(~Time[1:N] + (Time[1:N]||Chick[1:N])) }) mod <- nimbleModel(code, chick_const) mod$getCode() ``` ``` ## { ## for (i_1 in 1:N) { ## mu[i_1] <- beta_Intercept + beta_Time * Time[i_1] + beta_Chick[Chick[i_1]] + ## beta_Time_Chick[Chick[i_1]] * Time[i_1] ## } ## beta_Intercept ~ dnorm(0, sd = 1000) ## beta_Time ~ dnorm(0, sd = 1000) ## sd_Chick ~ dunif(0, 100) ## for (i_2 in 1:50) { ## beta_Chick[i_2] ~ dnorm(0, sd = sd_Chick) ## } ## sd_Time_Chick ~ dunif(0, 100) ## for (i_3 in 1:50) { ## beta_Time_Chick[i_3] ~ dnorm(0, sd = sd_Time_Chick) ## } ## } ``` ## Include `LINPRED` in a complete model Up to this point we've generated just the code needed to calculate the linear predictor. Here's a complete linear regression model of chick weight as a function of time making use of `LINPRED` (similar to the code generated by `LM`): ``` r code <- nimbleCode({ mu[1:N] <- LINPRED(~Time[1:N]) weight[1:N] ~ FORLOOP(dnorm(mu[1:N], sd = sigma)) sigma ~ dunif(0, 100) }) mod <- nimbleModel(code, chick_const) mod$getCode() ``` ``` ## { ## for (i_1 in 1:N) { ## mu[i_1] <- beta_Intercept + beta_Time * Time[i_1] ## } ## beta_Intercept ~ dnorm(0, sd = 1000) ## beta_Time ~ dnorm(0, sd = 1000) ## for (i_2 in 1:N) { ## weight[i_2] ~ dnorm(mu[i_2], sd = sigma) ## } ## sigma ~ dunif(0, 100) ## } ``` # `LINPRED_PRIORS` This macro generates a set of priors for parameters of a linear predictor, also based on the R formula. In most cases, we won't use this macro directly; rather it will be called automatically when we use `LINPRED` (see discussion of `priors` argument above). However, if for some reason we want to generate code for only the priors, and not the corresponding linear predictor, we can use `LINPRED_PRIORS` separately as we show below. The formula `~Time` implies two parameters as we described above, and `LINPRED_PRIORS` will generate a prior for each depending on the type of the parameter (e.g. intercept vs. slope). It is not necessary to specify the dimensions in the formula when using `LINPRED_PRIORS` directly. ``` r code <- nimbleCode({ LINPRED_PRIORS(~Time) }) mod <- nimbleModel(code, chick_const) mod$getCode() ``` ``` ## { ## beta_Intercept ~ dnorm(0, sd = 1000) ## beta_Time ~ dnorm(0, sd = 1000) ## } ``` We can also manually set the values for different types of priors such as intercepts and coefficients (slopes). See `?setPriors` for more details. ``` r pr <- setPriors(intercept = quote(dnorm(-5, 5)), coefficient = quote(dnorm(0, sd = 5))) code <- nimbleCode({ LINPRED_PRIORS(~Time + Diet, priors = pr) }) mod <- nimbleModel(code, chick_const) mod$getCode() ``` ``` ## { ## beta_Intercept ~ dnorm(-5, 5) ## beta_Time ~ dnorm(0, sd = 5) ## beta_Diet[1] <- 0 ## beta_Diet[2] ~ dnorm(0, sd = 5) ## beta_Diet[3] ~ dnorm(0, sd = 5) ## beta_Diet[4] ~ dnorm(0, sd = 5) ## } ``` Treating `beta_Diet` as a vector parameter makes sense in the NIMBLE model code, but it does not match the way categorical parameters are handled by `model.matrix`; i.e., dummy variables: ``` r colnames(model.matrix(~Time + Diet, ChickWeight)) ``` ``` ## [1] "(Intercept)" "Time" "Diet2" "Diet3" "Diet4" ``` Here `model.matrix` has created several separate parameters `Diet2`, `Diet3`, and `Diet4` rather than a vector. We can replicate this structure in NIMBLE model code by setting the `LINPRED_PRIORS` option `modelMatrixNames = TRUE`. ``` r code <- nimbleCode({ LINPRED_PRIORS(~Time + Diet, priors = pr, modelMatrixNames = TRUE) }) mod <- nimbleModel(code, chick_const) mod$getCode() ``` ``` ## { ## beta_Intercept ~ dnorm(-5, 5) ## beta_Time ~ dnorm(0, sd = 5) ## beta_Diet[1] <- 0 ## beta_Diet[2] <- beta_Diet2 ## beta_Diet2 ~ dnorm(0, sd = 5) ## beta_Diet[3] <- beta_Diet3 ## beta_Diet3 ~ dnorm(0, sd = 5) ## beta_Diet[4] <- beta_Diet4 ## beta_Diet4 ~ dnorm(0, sd = 5) ## } ``` When this option is set, a few extra lines of NIMBLE model code are added to yield parameter names that match `model.matrix`. # `FORLOOP` The `FORLOOP` macro is used by `LM`, `LINPRED`, and `LINPRED_PRIORS` and is a shorthand way of creating `for` loops (possibly nested) from a single line of code. For example, suppose you want to calculate a linear predictor `mu` from `N` values of a covariate `x`. The NIMBLE model code would be: ``` r nimbleCode({ for (i in 1:N){ mu[i] <- alpha + beta * x[i] } }) ``` The `FORLOOP` macro condenses this information into a single line using index information inside parameter brackets: ``` r code <- nimbleCode({ mu[1:N] <- FORLOOP(alpha + beta * x[1:N]) }) ``` Here's the result after expansion, after creating some example values for the constants. ``` r constants <- list(x = rnorm(5), N = 5) mod <- nimbleModel(code, constants) mod$getCode() ``` ``` ## { ## for (i_1 in 1:N) { ## mu[i_1] <- alpha + beta * x[i_1] ## } ## } ``` Note that one can use vectorization in NIMBLE model code directly, such as ``` mu[1:N] <- alpha + beta * X[1:N] ``` The distinction is that that creates a single multivariate node, `mu[1:N]`, while the loop creates `N` separate scalar nodes. Which is best will depend on the model structure and the algorithm used with the model [as discussed in the NIMBLE user manual](https://r-nimble.org/html_manual/cha-writing-models.html#subsec:vectorized-versus-scalar-declarations). It may seem a bit trivial to make a macro for such a simple chunk of code. However it can be nice to use this when you have nested for loops, which are a bit more complicated. ``` r constants <- list(x = matrix(rnorm(10), 5, 2), N = 5, J = 2) code <- nimbleCode({ mu[1:N, 1:J] <- FORLOOP(alpha + beta * x[1:N, 1:J]) }) mod <- nimbleModel(code, constants) mod$getCode() ``` ``` ## { ## for (i_1 in 1:N) { ## for (i_2 in 1:J) { ## mu[i_1, i_2] <- alpha + beta * x[i_1, i_2] ## } ## } ## } ``` Note that one place you can't use this macro directly are situations when you use the same index in multiple nested loops. For example, suppose the parameter `mu` shown above was a square matrix. ``` r constants <- list(x = matrix(rnorm(25), 5, 5), N = 5) code <- nimbleCode({ mu[1:N, 1:N] <- FORLOOP(alpha + beta * x[1:N, 1:N]) }) mod <- nimbleModel(code, constants) ``` ``` ## Error : Not sure how to expand duplicated index 1:N in code: ## mu[1:N, 1:N] ``` ``` ## Error: Model macro FORLOOP failed. ``` The structure of this `for` loop with duplicated indices could be ambiguous in more complex situations, so `FORLOOP` errors. The solution in this case would simply be to provide different variables for each dimension, even if they are the same value. ``` r constants <- list(x = matrix(rnorm(25), 5, 5), N = 5, K = 5) code <- nimbleCode({ mu[1:N, 1:K] <- FORLOOP(alpha + beta * x[1:N, 1:K]) }) mod <- nimbleModel(code, constants) mod$getCode() ``` ``` ## { ## for (i_1 in 1:N) { ## for (i_2 in 1:K) { ## mu[i_1, i_2] <- alpha + beta * x[i_1, i_2] ## } ## } ## } ``` # Writing Your Own Macros Suppose you wanted to create a macro called `MYDIST`, which creates a normal distribution with mean 0 and provided standard deviation. The macro would look like this in the NIMBLE model code: ``` r y ~ MYDIST(sdev = 2) ``` Note that by convention, we define the macro name in all capital letters. All the provided macros in `nimbleMacros` follow this convention. We want the final, processed output to look like: ``` r y ~ dnorm(0, sd = 2) ``` ## Manually creating the macro A macro is, fundamentally, a specially-formatted R function that converts one line of code to another. For example: ``` r fun <- function(input){ lhs <- input[[2]] sdev <- input[[3]]$sdev output <- substitute(LHS ~ dnorm(0, sd = SDEV), list(LHS = lhs, SDEV = sdev)) output } ``` Note that we have to do a bit of clunky code-processing to extract the left-hand-side of the assignment and the argument value. (The use of the `[[` extraction with a code object extracts information from the abstract syntax tree representation of an R expression.) ``` r input <- quote(y ~ MYDIST(sdev = 2)) fun(input) ``` ``` ## y ~ dnorm(0, sd = 2) ``` In practice, `nimble` expects the macro function converter (`fun` in the example above) to have two additional arguments: `modelInfo` and `.env`. When processing a macro, `nimble` will provide a named list to the `modelInfo` argument which contains some additional model information. Most importantly, this list includes `constants`, which can be used and/or changed by the macro. And, as you can probably guess, `nimble` provides the R environment in which `nimbleModel` was called to the `.env` argument, to help with scoping issues. In addition to these two arguments, the function also needs to return a particular structure in the output. The output should be a named list with two elements: `code`, which contains the output code (as in the original function above), and `modelInfo` which is the list of model information, which we could modify or leave as-is. ``` r fun <- function(input_code, modelInfo, .env){ LHS <- input_code[[2]] sdev <- input_code[[3]]$sdev output_code <- substitute(LHS ~ dnorm(0, sd = SDEV), list(LHS = LHS, SDEV = sdev)) # Modify the constants modelInfo$constants$new <- 1 list(code = output_code, modelInfo = modelInfo) } ``` ``` r input <- quote(y ~ MYDIST(sdev = 2)) fun(input, modelInfo = list(constants = NULL), .env = environment()) ``` ``` ## $code ## y ~ dnorm(0, sd = 2) ## ## $modelInfo ## $modelInfo$constants ## $modelInfo$constants$new ## [1] 1 ``` Finally, `nimble` needs a way to know that this function is intended to be a macro, and that it is called `MYDIST`. The final step then is to create a wrapper `list` with our function inserted as an element called `process`, and assign the class `"model_macro"`. (In the next section we'll see that the helper function `buildMacro` will create this wrapper list for us.) ``` r MYDIST <- list(process = fun) class(MYDIST) <- "model_macro" ``` Now, we can test the macro in a NIMBLE model: ``` r code <- nimbleCode({ y ~ MYDIST(sdev = 2) }) mod <- nimbleModel(code = code, data = list(y = 1)) mod$getCode() ``` ``` ## { ## y ~ dnorm(0, sd = 2) ## } ``` ## Using `buildMacro` to create the macro Manually creating the macro is fine, but it gets complicated if we want to extract a lot of separate pieces of information from the code line, e.g., if our macro has many arguments. As an alternative, we can use the `buildMacro` function included with `nimble`, which handles some of this code processing for us. The first argument to `buildMacro`, is a function structured like the one we wrote above `fun`. The other two arguments `use3pieces` and `unpackArgs` determine how the arguments of `fun` should be structured. First, suppose both `use3pieces` and `unpackArgs` are set to `FALSE`. In this case, the first argument to `fun` is the entire original line of code containing the macro, i.e., `y ~ MYDIST(sdev = 2)`. This is identical to how we wrote the function above. In this case, all `buildMacro` will do is create the required macro list structure and assign the class for us. ``` r MYDIST2 <- buildMacro(fun, use3pieces = FALSE, unpackArgs = FALSE) ``` ``` r code <- nimbleCode({ y ~ MYDIST2(sdev = 2) }) mod <- nimbleModel(code = code, data = list(y = 1)) mod$getCode() ``` ``` ## { ## y ~ dnorm(0, sd = 2) ## } ``` Now we will set `use3pieces = TRUE`. In this case, `buildMacro` will automatically do some initial processing on the code line for us. Specifically, it will identify if the assignment is stochastic (`stoch`, `TRUE` if `~` and `FALSE` if `<-`), and separate and return the left-hand-side (`LHS`) and right-hand-side (`RHS`) of the assignment. With these settings, our function's first three arguments need to be `stoch`, `LHS`, and `RHS`, in that order (plus we keep the always-required `modelInfo` and `.env`). ``` r fun3 <- function(stoch, lhs, rhs, modelInfo, .env){ sdev <- rhs$sdev output_code <- substitute(LHS ~ dnorm(0, sd = SDEV), list(LHS = lhs, SDEV = sdev)) # Modify the constants modelInfo$constants$new <- 1 list(code = output_code, modelInfo = modelInfo) } ``` We do a little less code processing this time since `buildMacro` has already separated LHS and RHS for us. Note that in this simple example we aren't using `stoch` because we want our output code to always be stochastic. ``` r MYDIST3 <- buildMacro(fun3, use3pieces = TRUE, unpackArgs = FALSE) ``` ``` r code <- nimbleCode({ y ~ MYDIST3(sdev = 2) }) mod <- nimbleModel(code = code, data = list(y = 1)) ``` ``` ## Error in (function (stoch, lhs, rhs, modelInfo, .env) : ## unused arguments (LHS = quote(y), RHS = quote(MYDIST3(sdev = 2))) ``` ``` ## Error: Model macro MYDIST3 failed. ``` ``` r mod$getCode() ``` ``` ## { ## y ~ dnorm(0, sd = 2) ## } ``` Finally, consider the case where we also set `unpackArgs = TRUE`. Now, `buildMacro` will dig into the right-hand-side of the macro and extract the argument values for us (in this case just `sdev`). Thus, we need to include all these arguments explicitly as arguments in our function. ``` r fun4 <- function(stoch, lhs, sdev, modelInfo, .env){ output_code <- substitute(LHS ~ dnorm(0, sd = SDEV), list(LHS = lhs, SDEV = sdev)) # Modify the constants modelInfo$constants$new <- 1 list(code = output_code, modelInfo = modelInfo) } ``` Even less code processing is required in this case, since `buildMacro` will split out the `sdev` argument for us. ``` r MYDIST4 <- buildMacro(fun4, use3pieces = TRUE, unpackArgs = TRUE) ``` ``` r code <- nimbleCode({ y ~ MYDIST4(sdev = 2) }) mod <- nimbleModel(code = code, data = list(y = 1)) ``` ``` ## Error in (function (stoch, lhs, sdev, modelInfo, .env) : ## unused argument (LHS = quote(y)) ``` ``` ## Error: Model macro MYDIST4 failed. ``` ``` r mod$getCode() ``` ``` ## { ## y ~ dnorm(0, sd = 2) ## } ``` The choice of arguments will depend on the macro: sometimes the short-cuts enabled by `buildMacro` will be useful, other times you may need more manual control over how the input code line is processed. See `?buildMacro` for additional detailed documentation and examples, including an example when `use3pieces = FALSE` and `unpackArgs = TRUE`. You may also find the source code for the macros included in `nimbleMacros`, found [here](https://github.com/nimble-dev/nimbleMacros/tree/master/R), a useful resource for writing your own.