This chapter focuses on hierarchical linear
models—Gaussian models with random effects—and related use
cases that rely on rlmb for block Gibbs
sampling. The function rlmb is a minimal
Bayesian simulation engine for linear (Gaussian) models: it takes a
response vector, design matrix, and prior specification, and returns
draws from the posterior. It is intended for settings where you need
repeated draws with updated conditioning inside a Gibbs
sampler (e.g., alternating between blocks for coefficients and
dispersion, or for population and unit-level parameters).
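As a sketch of the calling convention (the names y, x, mu, V, and phi here are placeholders; complete worked examples follow below):

## Schematic rlmb call: response vector y, design matrix x, and a prior
## specification; the result carries a matrix of posterior coefficient draws.
sim <- rlmb(n = 1000, y = y, x = x,
            pfamily = dNormal(mu = mu, Sigma = V, dispersion = phi))
dim(sim$coefficients)  # one row per draw, one column per coefficient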
- Use lmb or glmb when you want a single fit or repeated standalone fits: they provide a formula interface, a full fitted object, summary, predict, and so on.
- Use rlmb when you are building your own MCMC for linear models, such as a two-block Gibbs sampler: you alternate between blocks, each time calling rlmb with the current values of the other block. The low-overhead, matrix-in/matrix-out interface is then a better fit.
- For hierarchical generalized linear models (e.g., Poisson), use rglmb; see Chapter 14.

We illustrate two use cases for rlmb in the context of linear (Gaussian) models: (1) estimating the dispersion and regression coefficients when the prior is non-conjugate, and (2) hierarchical (random effects) models. All examples use data already available in the package.
In the Gaussian linear model \(y = X\beta + \varepsilon\) with \(\varepsilon \sim \text{N}(0, \phi I)\), the likelihood depends on the residual variance \(\phi\). If the prior on \((\beta, \phi)\) is not conjugate (e.g., you want independent priors on \(\beta\) and \(\phi\)), you can still sample from the posterior with a two-block Gibbs sampler: draw \(\beta \mid \phi, y\) from its Gaussian conditional, then draw \(\phi \mid \beta, y\) from its Gamma conditional, and repeat. rlmb implements both conditionals: use dNormal(..., dispersion = phi) for the \(\beta\)-step and dGamma(shape, rate, beta = beta_current) for the \(\phi\)-step; then take the returned dispersion as the new \(\phi\).
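For reference, with independent priors \(\beta \sim \text{N}(\mu_0, V_0)\) and \(1/\phi \sim \text{Gamma}(a, b)\) (rate parameterization), the two full conditionals are the standard ones; the exact parameterization expected by dGamma may differ, so treat this as a sketch:

\[
\beta \mid \phi, y \sim \text{N}(m, M), \qquad
M = \left(V_0^{-1} + X^\top X / \phi\right)^{-1}, \qquad
m = M \left(V_0^{-1} \mu_0 + X^\top y / \phi\right),
\]
\[
1/\phi \mid \beta, y \sim \text{Gamma}\!\left(a + \tfrac{n}{2},\; b + \tfrac{1}{2} \lVert y - X\beta \rVert^2\right).
\]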
We use the plant weight data from Dobson (1990, p. 9): weight by treatment group (Ctl vs. Trt). The code
below sets the prior from Prior_Setup,
then runs a short burn-in and stores draws from the Gibbs sampler.
Posterior means are compared to the conjugate
lmb(..., dIndependent_Normal_Gamma(...))
fit.
## Dobson (1990) "An Introduction to Generalized Linear Models", Page 9: Plant Weight Data
ctl <- c(4.17, 5.58, 5.18, 6.11, 4.50, 4.61, 5.17, 4.53, 5.33, 5.14)
trt <- c(4.81, 4.17, 4.41, 3.59, 5.87, 3.83, 6.03, 4.89, 4.32, 4.69)
group <- gl(2, 10, 20, labels = c("Ctl", "Trt"))
weight <- c(ctl, trt)
ps <- Prior_Setup(weight ~ group)
#> Using default pwt = 0.01 (low-d default).
x <- ps$x
y <- ps$y
mu <- ps$mu
V <- ps$Sigma
shape <- ps$shape
rate <- ps$rate
rate_dg <- if (!is.null(ps$rate_gamma)) ps$rate_gamma else rate
disp_ML <- ps$dispersion

set.seed(180)
dispersion_current <- disp_ML
## Burn-in
n_burn <- 1000
for (i in seq_len(n_burn)) {
  out_beta <- rlmb(n = 1, y = y, x = x,
                   pfamily = dNormal(mu = mu, Sigma = V, dispersion = dispersion_current))
  out_phi <- rlmb(n = 1, y = y, x = x,
                  pfamily = dGamma(shape = shape, rate = rate_dg, beta = out_beta$coefficients[1, ]))
  dispersion_current <- out_phi$dispersion
}
## Sampling
n_sim <- 1000
beta_out <- matrix(0, nrow = n_sim, ncol = 2)
disp_out <- numeric(n_sim)
for (i in seq_len(n_sim)) {
  out_beta <- rlmb(n = 1, y = y, x = x,
                   pfamily = dNormal(mu = mu, Sigma = V, dispersion = dispersion_current))
  out_phi <- rlmb(n = 1, y = y, x = x,
                  pfamily = dGamma(shape = shape, rate = rate_dg, beta = out_beta$coefficients[1, ]))
  dispersion_current <- out_phi$dispersion
  beta_out[i, ] <- out_beta$coefficients[1, ]
  disp_out[i] <- out_phi$dispersion
}

lmb_D9 <- lmb(weight ~ group,
              dIndependent_Normal_Gamma(mu, V, shape = ps$shape_ING, rate = ps$rate))
print(lmb_D9)
#>
#> Call:
#> lmb(formula = weight ~ group, pfamily = dIndependent_Normal_Gamma(mu,
#> V, shape = ps$shape_ING, rate = ps$rate))
#>
#> Posterior Mean Coefficients:
#> (Intercept) groupTrt
#> 5.023 -0.359
## Compare Gibbs vs lmb: means and SDs
coef_names <- colnames(ps$x)
tbl <- data.frame(
  Parameter = c(coef_names, "dispersion"),
  Gibbs_mean = c(colMeans(beta_out), mean(disp_out)),
  Gibbs_SD = c(apply(beta_out, 2, sd), sd(disp_out)),
  lmb_mean = c(colMeans(lmb_D9$coefficients), mean(lmb_D9$dispersion)),
  lmb_SD = c(apply(lmb_D9$coefficients, 2, sd), sd(lmb_D9$dispersion))
)
knitr::kable(tbl, digits = 4, caption = "Dobson plant weight: Gibbs sampler vs conjugate lmb fit")

Table: Dobson plant weight: Gibbs sampler vs conjugate lmb fit

| Parameter   | Gibbs_mean | Gibbs_SD | lmb_mean | lmb_SD |
|-------------|-----------:|---------:|---------:|-------:|
| (Intercept) |     5.0377 |   0.2249 |   5.0230 | 0.2182 |
| groupTrt    |    -0.3740 |   0.3253 |  -0.3590 | 0.3061 |
| dispersion  |     0.5252 |   0.1776 |   0.4855 | 0.1465 |
The Gibbs posterior means and standard deviations for \(\beta\) and \(\phi\) should be close to those from the conjugate lmb fit. In practice, use a longer burn-in and more stored draws.
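One quick way to judge whether the stored draws suffice is to look at effective sample sizes (a sketch assuming the coda package is available; it is not used elsewhere in this chapter):

## Effective sample sizes of the stored draws: values near n_sim indicate
## little autocorrelation; much smaller values suggest storing more draws.
draws <- cbind(beta_out, disp_out)
colnames(draws) <- c(colnames(ps$x), "dispersion")
coda::effectiveSize(coda::as.mcmc(draws))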
Gamma regression. The same idea—alternating draws of
coefficients given dispersion and dispersion given coefficients—can be
applied to gamma regression when the dispersion (shape)
of the Gamma response is unknown and must be estimated. The conditioning
steps are then implemented with rglmb
(family Gamma) and an appropriate prior for the dispersion
block. The package does not currently provide a ready-made example for
gamma regression in this setting; typical textbook datasets often imply
large weights that make the Gibbs steps costly. A future vignette or
example may use an alternate dataset once a suitable one is
identified.
The Eight Schools example from Gelman et al. (2013, Ch. 5) has \(J = 8\) schools; each school \(j\) has an observed effect estimate \(y_j\) with known sampling variance \(\sigma_j^2\). The hierarchical model has a population mean \(\mu\), a population variance \(\sigma_\theta^2\), and school-specific effects \(\theta_j\). With the dNormal_Gamma prior on \((\mu, \sigma_\theta^2)\), the Gibbs sampler is efficient and suitable for repeated use inside a loop because the envelope is built once per call, without the minimization overhead of dIndependent_Normal_Gamma.
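Written out, following the description above:

\[
y_j \mid \theta_j \sim \text{N}(\theta_j, \sigma_j^2), \qquad
\theta_j \mid \mu, \sigma_\theta^2 \sim \text{N}(\mu, \sigma_\theta^2), \qquad j = 1, \dots, J.
\]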
The two Gibbs blocks are:

- Population block: rlmb with dNormal_Gamma(mu_mu, sigma_mu / disp_ML, shape, rate), treating the current \(\theta_1, \dots, \theta_J\) as the response in an intercept-only regression.
- School block: rlmb with dNormal(mu, Sigma = sigma_theta^2, dispersion = sigma_j^2), one call per school \(j\).

school <- c("A", "B", "C", "D", "E", "F", "G", "H")
estimate <- c(28.39, 7.94, -2.75, 6.82, -0.64, 0.63, 18.01, 12.16)
sd_obs <- c(14.9, 10.2, 16.3, 11.0, 9.4, 11.4, 10.4, 17.6)
J <- length(school)
sigma_y_sq <- sd_obs^2 # known sampling variances for each school
mu_mu <- mean(estimate)
sigma_mu <- var(estimate)
n_prior <- 0.5
disp_ML <- var(estimate)
shape <- n_prior / 2
rate <- disp_ML * shape

The dNormal_Gamma Gibbs chunk below is
not evaluated when the vignette is built
(eval = FALSE). Stored draws are loaded in
chapter13-load-schools-gibbs from
inst/extdata/Chapter13_Eight_Schools_two_gibbs.rds,
produced by
data-raw/make_Chapter13_Eight_Schools_gibbs_output.R.
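For orientation, a sketch of how the stored object might be assembled, mirroring the fields read in the loading chunk below (the actual writer is data-raw/make_Chapter13_Eight_Schools_gibbs_output.R; field layout here is inferred from the loader, not from that script):

## Assemble and save the two samplers' draws in the structure the loader expects.
ch13 <- list(
  normal_gamma = list(theta_out = theta_out_ng, mu_out = mu_out_ng,
                      sigma_theta_out = sigma_theta_out_ng,
                      n_burn = n_burn_ng, n_sim = n_sim_ng),
  indep_norm_gamma = list(theta_out = theta_out, mu_out = mu_out,
                          sigma_theta_out = sigma_theta_out,
                          iters_out1 = iters_out1, iters_out2 = iters_out2,
                          n_burn = n_burn_schools, n_sim = n_sim_schools)
)
saveRDS(ch13, "inst/extdata/Chapter13_Eight_Schools_two_gibbs.rds")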
ch13_path <- system.file(
  "extdata", "Chapter13_Eight_Schools_two_gibbs.rds",
  package = "glmbayes"
)
stopifnot(nzchar(ch13_path), file.exists(ch13_path))
ch13 <- readRDS(ch13_path)
ng <- ch13$normal_gamma
ind <- ch13$indep_norm_gamma
stopifnot(
  ncol(ng$theta_out) == J,
  nrow(ng$theta_out) == ng$n_sim,
  ncol(ind$theta_out) == J,
  nrow(ind$theta_out) == ind$n_sim
)
theta_out_ng <- ng$theta_out
mu_out_ng <- ng$mu_out
sigma_theta_out_ng <- ng$sigma_theta_out
n_burn_ng <- ng$n_burn
n_sim_ng <- ng$n_sim
theta_out <- ind$theta_out
mu_out <- ind$mu_out
sigma_theta_out <- ind$sigma_theta_out
iters_out1 <- ind$iters_out1
iters_out2 <- ind$iters_out2
n_burn_schools <- ind$n_burn
n_sim_schools <- ind$n_sim

set.seed(101)
x_one <- matrix(rep(1, J), nrow = J, ncol = 1)  # intercept-only design for the population block
theta_ng <- estimate
n_burn_ng <- 1000
n_sim_ng <- 1000
theta_out_ng <- matrix(0, nrow = n_sim_ng, ncol = J)
mu_out_ng <- numeric(n_sim_ng)
sigma_theta_out_ng <- numeric(n_sim_ng)
for (k in seq_len(n_burn_ng)) {
  out_pop <- rlmb(1, y = theta_ng, x = x_one,
                  pfamily = dNormal_Gamma(mu_mu, sigma_mu / disp_ML, shape = shape, rate = rate))
  mu_theta <- out_pop$coefficients[1, 1]
  sigma_theta_sq <- out_pop$dispersion
  for (j in seq_len(J)) {
    theta_ng[j] <- rlmb(1, y = estimate[j], x = as.matrix(1),
                        pfamily = dNormal(mu_theta, Sigma = sigma_theta_sq, dispersion = sigma_y_sq[j]))$coefficients[1, 1]
  }
}

for (k in seq_len(n_sim_ng)) {
  out_pop <- rlmb(1, y = theta_ng, x = x_one,
                  pfamily = dNormal_Gamma(mu_mu, sigma_mu / disp_ML, shape = shape, rate = rate))
  mu_theta <- out_pop$coefficients[1, 1]
  sigma_theta_sq <- out_pop$dispersion
  for (j in seq_len(J)) {
    theta_ng[j] <- rlmb(1, y = estimate[j], x = as.matrix(1),
                        pfamily = dNormal(mu_theta, Sigma = sigma_theta_sq, dispersion = sigma_y_sq[j]))$coefficients[1, 1]
  }
  theta_out_ng[k, ] <- theta_ng
  mu_out_ng[k] <- mu_theta
  sigma_theta_out_ng[k] <- sqrt(sigma_theta_sq)
}

colMeans(theta_out_ng)
#> A B C D E F G H
#> 12.243413 8.542895 7.020558 8.310343 6.120766 6.536146 11.441157 9.249691
mean(mu_out_ng)
#> [1] 8.768267
mean(sigma_theta_out_ng)
#> [1] 6.577895
sqrt(diag(var(theta_out_ng)))
#> A B C D E F G H
#> 7.318593 5.885325 7.074361 6.026843 6.215671 6.596157 6.315940 7.188958

The dNormal_Gamma prior is well-suited
for Gibbs sampling: each call avoids the RSS/UB2 minimization used by
dIndependent_Normal_Gamma, so run time
stays manageable.
The same hierarchical model can be fit with
dIndependent_Normal_Gamma, which assumes
prior independence between coefficients and precision. This prior
triggers repeated envelope minimization (RSS and UB2) on each
population-block draw, making it slower and less suitable for long Gibbs
runs.
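As a rough check of that cost difference, one can time a handful of population-block draws under each prior (an illustrative sketch using objects defined above; the 100-draw count is arbitrary and absolute timings are machine-dependent):

## Illustrative timing: 100 population-block draws under each prior; the
## dIndependent_Normal_Gamma calls are expected to be noticeably slower
## because of the per-call envelope minimization.
x_one <- matrix(1, nrow = J, ncol = 1)  # intercept-only design
system.time(for (k in 1:100)
  rlmb(1, y = estimate, x = x_one,
       pfamily = dNormal_Gamma(mu_mu, sigma_mu / disp_ML, shape = shape, rate = rate)))
system.time(for (k in 1:100)
  rlmb(1, y = estimate, x = x_one,
       pfamily = dIndependent_Normal_Gamma(mu_mu, sigma_mu, shape = shape, rate = rate)))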
The Gibbs chunks below are not evaluated at vignette
build time (eval = FALSE); results for this block are
loaded from the same RDS as in §2.1
(chapter13-load-schools-gibbs).
The two Gibbs blocks are:

- Population block: rlmb with dIndependent_Normal_Gamma(mu_mu, sigma_mu, shape, rate).
- School block: rlmb with dNormal(...), as before.

theta <- estimate
n_burn_schools <- 1000
n_sim_schools <- 1000
theta_out <- matrix(0, nrow = n_sim_schools, ncol = J)
mu_out <- numeric(n_sim_schools)
sigma_theta_out <- numeric(n_sim_schools)
iters_out1 <- numeric(n_burn_schools)
iters_out2 <- numeric(n_sim_schools)

set.seed(102)
x_one <- matrix(rep(1, J), nrow = J, ncol = 1)  # intercept-only design for the population block
for (k in seq_len(n_burn_schools)) {
  out_pop <- rlmb(1, y = theta, x = x_one,
                  pfamily = dIndependent_Normal_Gamma(mu_mu, sigma_mu, shape = shape, rate = rate))
  mu_theta <- out_pop$coefficients[1, 1]
  sigma_theta_sq <- out_pop$dispersion
  for (j in seq_len(J)) {
    theta[j] <- rlmb(1, y = estimate[j], x = as.matrix(1),
                     pfamily = dNormal(mu_theta, Sigma = sigma_theta_sq, dispersion = sigma_y_sq[j]))$coefficients[1, 1]
  }
  iters_out1[k] <- out_pop$iters
}

## Mean draws per acceptance (population block) after burn-in.
## Lower is better. In demo Ex_07_Schools.R this was ~6.01; compare to check for regressions.
mean(iters_out1)
#> [1] 18.972

for (k in seq_len(n_sim_schools)) {
  out_pop <- rlmb(1, y = theta, x = x_one,
                  pfamily = dIndependent_Normal_Gamma(mu_mu, sigma_mu, shape = shape, rate = rate))
  mu_theta <- out_pop$coefficients[1, 1]
  sigma_theta_sq <- out_pop$dispersion
  for (j in seq_len(J)) {
    theta[j] <- rlmb(1, y = estimate[j], x = as.matrix(1),
                     pfamily = dNormal(mu_theta, Sigma = sigma_theta_sq, dispersion = sigma_y_sq[j]))$coefficients[1, 1]
  }
  theta_out[k, ] <- theta
  mu_out[k] <- mu_theta
  sigma_theta_out[k] <- sqrt(sigma_theta_sq)
  iters_out2[k] <- out_pop$iters
}

colMeans(theta_out)
#> A B C D E F G H
#> 12.752608 8.042492 6.163618 7.744056 5.239019 6.150466 11.402180 9.231072
mean(mu_out)
#> [1] 8.465987
mean(sigma_theta_out)
#> [1] 7.583587
sqrt(diag(var(theta_out)))
#> A B C D E F G H
#> 8.705483 6.580200 7.953565 6.889440 6.263768 6.946018 7.081195 7.800455

Posterior means of \(\theta_j\) show shrinkage toward the population mean compared to the raw estimates.
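To see the shrinkage directly, compare the raw estimates with the posterior means from the two samplers (a sketch using the loaded objects from the chunks above; the column names are ad hoc):

## Raw estimates versus posterior means under the two priors;
## the posterior means are pulled toward the common mean mu.
data.frame(
  school = school,
  raw = estimate,
  post_NG = round(colMeans(theta_out_ng), 2),
  post_ING = round(colMeans(theta_out), 2)
)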
Block Gibbs sampling with rlmb for the population and school blocks follows the same conditioning logic as in other glmbayes linear-model examples; the independent Normal–Gamma population step is related to the joint envelope construction in Nygren (2025).
| Use case | Function | Block 1 | Block 2 | Data used |
|---|---|---|---|---|
| Dispersion + \(\beta\) (Gaussian) | rlmb | \(\beta \mid \phi\): dNormal(..., dispersion = phi) | \(\phi \mid \beta\): dGamma(shape, rate, beta = beta) | Dobson plant weight |
| Hierarchical (Eight Schools, Gibbs-suitable) | rlmb | \((\mu, \sigma_\theta^2) \mid \theta\): dNormal_Gamma(mu, Sigma_0, shape, rate) | \(\theta_j \mid \mu, \sigma_\theta^2\): dNormal(..., dispersion = sigma_j^2) | Eight Schools |
| Hierarchical (Eight Schools) | rlmb | \((\mu, \sigma_\theta^2) \mid \theta\): dIndependent_Normal_Gamma | \(\theta_j \mid \mu, \sigma_\theta^2\): dNormal(..., dispersion = sigma_j^2) | Eight Schools |
The same dispersion-and-coefficients idea extends to gamma regression when dispersion must be estimated; a worked example may be added when a suitable dataset is available.
For hierarchical generalized linear models (e.g., Poisson), see
Chapter 14. For prior setup and conjugate fits, see
?Prior_Setup,
?lmb, and
?glmb.