Chapter 13: Hierarchical Linear Models

Kjell Nygren

2026-04-30

library(glmbayes)

This chapter focuses on hierarchical linear models—Gaussian models with random effects—and related use cases that rely on rlmb for block Gibbs sampling. The function rlmb is a minimal Bayesian simulation engine for linear (Gaussian) models: it takes a response vector, design matrix, and prior specification, and returns draws from the posterior. It is intended for settings where you need repeated draws with updated conditioning inside a Gibbs sampler (e.g., alternating between blocks for coefficients and dispersion, or for population and unit-level parameters).

We illustrate two use cases for rlmb in the context of linear (Gaussian) models: (1) estimating dispersion and regression coefficients when the prior is non-conjugate, and (2) hierarchical (random effects) models. All examples use data already available in the package. For hierarchical generalized linear models (e.g., Poisson), see Chapter 14.

1. Use Case 1: Dispersion and Regression Coefficients (Gaussian)

In the Gaussian linear model \(y = X\beta + \varepsilon\) with \(\varepsilon \sim \text{N}(0, \phi I)\), the likelihood depends on the residual variance \(\phi\). If the prior on \((\beta, \phi)\) is not conjugate (e.g., you want independent priors on \(\beta\) and \(\phi\)), you can still sample from the posterior by a two-block Gibbs sampler:

  1. Block 1: Draw \(\beta \mid \phi, y\) — Gaussian posterior given fixed \(\phi\).
  2. Block 2: Draw \(\phi \mid \beta, y\) — Gamma posterior given fixed \(\beta\).
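The two conditionals can be written out explicitly. A sketch assuming independent priors \(\beta \sim \text{N}(\mu_0, V_0)\) and a Gamma(\(a\), \(b\)) prior on the precision \(1/\phi\); see ?dGamma for the package's exact parameterization, which may differ:

\[
\beta \mid \phi, y \sim \text{N}\!\left(V_* \left(V_0^{-1}\mu_0 + X^\top y/\phi\right),\; V_*\right), \qquad V_* = \left(V_0^{-1} + X^\top X/\phi\right)^{-1},
\]

\[
1/\phi \mid \beta, y \sim \text{Gamma}\!\left(a + \tfrac{n}{2},\; b + \tfrac{1}{2}\,(y - X\beta)^\top (y - X\beta)\right).
\]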

rlmb implements both conditionals: use dNormal(..., dispersion = phi) for the \(\beta\)-step and dGamma(shape, rate, beta = beta_current) for the \(\phi\)-step; then take the returned dispersion as the new \(\phi\).

We use the plant weight data from Dobson (1990, p. 9): weight by treatment group (Ctl vs. Trt). The code below sets up the prior with Prior_Setup, then runs a short burn-in and stores draws from the Gibbs sampler. Posterior means are compared to the conjugate lmb(..., dIndependent_Normal_Gamma(...)) fit.

## Dobson (1990) "An Introduction to Generalized Linear Models", Page 9: Plant Weight Data
ctl <- c(4.17, 5.58, 5.18, 6.11, 4.50, 4.61, 5.17, 4.53, 5.33, 5.14)
trt <- c(4.81, 4.17, 4.41, 3.59, 5.87, 3.83, 6.03, 4.89, 4.32, 4.69)
group <- gl(2, 10, 20, labels = c("Ctl", "Trt"))
weight <- c(ctl, trt)

ps <- Prior_Setup(weight ~ group)
#> Using default pwt = 0.01 (low-d default).
x    <- ps$x
y    <- ps$y
mu   <- ps$mu
V    <- ps$Sigma
shape <- ps$shape
rate  <- ps$rate
rate_dg <- if (!is.null(ps$rate_gamma)) ps$rate_gamma else rate
disp_ML <- ps$dispersion
set.seed(180)
dispersion_current <- disp_ML

## Burn-in
n_burn  <- 1000
for (i in seq_len(n_burn)) {
  out_beta <- rlmb(n = 1, y = y, x = x, pfamily = dNormal(mu = mu, Sigma = V, dispersion = dispersion_current))
  out_phi  <- rlmb(n = 1, y = y, x = x, pfamily = dGamma(shape = shape, rate = rate_dg, beta = out_beta$coefficients[1, ]))
  dispersion_current <- out_phi$dispersion
}

## Sampling
n_sim   <- 1000
beta_out <- matrix(0, nrow = n_sim, ncol = 2)
disp_out <- numeric(n_sim)
for (i in seq_len(n_sim)) {
  out_beta <- rlmb(n = 1, y = y, x = x, pfamily = dNormal(mu = mu, Sigma = V, dispersion = dispersion_current))
  out_phi  <- rlmb(n = 1, y = y, x = x, pfamily = dGamma(shape = shape, rate = rate_dg, beta = out_beta$coefficients[1, ]))
  dispersion_current    <- out_phi$dispersion
  beta_out[i, ]         <- out_beta$coefficients[1, ]
  disp_out[i]           <- out_phi$dispersion
}
lmb_D9 <- lmb(weight ~ group, dIndependent_Normal_Gamma(mu, V, shape = ps$shape_ING, rate = ps$rate))
print(lmb_D9)
#> 
#> Call:  
#> lmb(formula = weight ~ group, pfamily = dIndependent_Normal_Gamma(mu, 
#>     V, shape = ps$shape_ING, rate = ps$rate))
#> 
#> Posterior Mean Coefficients:
#> (Intercept)     groupTrt  
#>       5.023       -0.359

## Compare Gibbs vs lmb: means and SDs
coef_names <- colnames(ps$x)
tbl <- data.frame(
  Parameter  = c(coef_names, "dispersion"),
  Gibbs_mean = c(colMeans(beta_out), mean(disp_out)),
  Gibbs_SD   = c(apply(beta_out, 2, sd), sd(disp_out)),
  lmb_mean   = c(colMeans(lmb_D9$coefficients), mean(lmb_D9$dispersion)),
  lmb_SD     = c(apply(lmb_D9$coefficients, 2, sd), sd(lmb_D9$dispersion))
)
knitr::kable(tbl, digits = 4, caption = "Dobson plant weight: Gibbs sampler vs conjugate lmb fit")
Dobson plant weight: Gibbs sampler vs conjugate lmb fit

Parameter      Gibbs_mean   Gibbs_SD   lmb_mean   lmb_SD
(Intercept)        5.0377     0.2249     5.0230   0.2182
groupTrt          -0.3740     0.3253    -0.3590   0.3061
dispersion         0.5252     0.1776     0.4855   0.1465

The Gibbs posterior means and standard deviations for \(\beta\) and \(\phi\) should be close to the conjugate lmb fit. In practice, use longer burn-in and more stored draws.
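Before trusting the comparison, it helps to check how well the chain mixes. The sketch below computes a crude effective sample size from the empirical autocorrelation function using base R only; the `disp_out` here is a simulated AR(1) stand-in so the snippet is self-contained (in the vignette you would pass the stored dispersion draws instead, and the coda package offers more careful diagnostics):

```r
## Crude effective sample size from the empirical autocorrelation function.
## Stand-in chain: an AR(1) series mimicking autocorrelated dispersion draws.
set.seed(1)
disp_out <- 0.5 + as.numeric(arima.sim(list(ar = 0.5), n = 1000))

ess <- function(x, lag_max = 50) {
  ac <- acf(x, lag.max = lag_max, plot = FALSE)$acf[-1]  # drop lag 0
  k  <- which(ac <= 0)[1]            # truncate at first non-positive lag
  if (!is.na(k)) ac <- ac[seq_len(k - 1)]
  length(x) / (1 + 2 * sum(ac))
}

ess(disp_out)
```

An effective sample size well below the number of stored draws signals that a longer run (or thinning) is warranted before comparing moments.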

Gamma regression. The same idea—alternating draws of coefficients given dispersion and dispersion given coefficients—can be applied to gamma regression when the dispersion (shape) of the Gamma response is unknown and must be estimated. The conditioning steps are then implemented with rglmb (family Gamma) and an appropriate prior for the dispersion block. The package does not currently provide a ready-made example for gamma regression in this setting; typical textbook datasets often imply large weights that make the Gibbs steps costly. A future vignette or example may use an alternate dataset once a suitable one is identified.

2. Use Case 2: Hierarchical (Random Effects) Models

2.1 Eight Schools with dNormal_Gamma Prior (Gibbs-suitable)

The Eight Schools example from (Gelman et al. 2013, Ch. 5) has \(J = 8\) schools; each school \(j\) has an observed effect estimate \(y_j\) with known sampling variance \(\sigma_j^2\). The hierarchical model has a population mean \(\mu\) and variance \(\sigma_\theta^2\), and school-specific effects \(\theta_j\). With the dNormal_Gamma prior on \((\mu, \sigma_\theta^2)\), the Gibbs sampler is efficient and suitable for repeated use inside a loop because the envelope is built once per call without the minimization overhead of dIndependent_Normal_Gamma.

  1. Population block: Draw \((\mu, \sigma_\theta^2) \mid \theta\) using rlmb with dNormal_Gamma(mu_mu, sigma_mu/disp_ML, shape, rate).
  2. School block: For each \(j\), draw \(\theta_j \mid \mu, \sigma_\theta^2, y_j\) using rlmb with dNormal(mu, Sigma = sigma_theta^2, dispersion = sigma_j^2).
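For reference, the school-block conditional in step 2 is the standard normal-normal update, a precision-weighted average of the observed estimate and the population mean:

\[
\theta_j \mid \mu, \sigma_\theta^2, y_j \sim \text{N}\!\left(\frac{y_j/\sigma_j^2 + \mu/\sigma_\theta^2}{1/\sigma_j^2 + 1/\sigma_\theta^2},\; \left(\frac{1}{\sigma_j^2} + \frac{1}{\sigma_\theta^2}\right)^{-1}\right).
\]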
school  <- c("A", "B", "C", "D", "E", "F", "G", "H")
estimate <- c(28.39, 7.94, -2.75, 6.82, -0.64, 0.63, 18.01, 12.16)
sd_obs  <- c(14.9, 10.2, 16.3, 11.0, 9.4, 11.4, 10.4, 17.6)
J       <- length(school)
sigma_y_sq <- sd_obs^2   # known sampling variances for each school

mu_mu     <- mean(estimate)
sigma_mu  <- var(estimate)
n_prior   <- 0.5
disp_ML   <- var(estimate)
shape     <- n_prior / 2
rate      <- disp_ML * shape

The dNormal_Gamma Gibbs chunk below is not evaluated when the vignette is built (eval = FALSE). Stored draws are instead loaded in the chapter13-load-schools-gibbs chunk from inst/extdata/Chapter13_Eight_Schools_two_gibbs.rds, produced by data-raw/make_Chapter13_Eight_Schools_gibbs_output.R.

ch13_path <- system.file(
  "extdata", "Chapter13_Eight_Schools_two_gibbs.rds",
  package = "glmbayes"
)
stopifnot(nzchar(ch13_path), file.exists(ch13_path))
ch13 <- readRDS(ch13_path)
ng <- ch13$normal_gamma
ind <- ch13$indep_norm_gamma
stopifnot(
  ncol(ng$theta_out) == J,
  nrow(ng$theta_out) == ng$n_sim,
  ncol(ind$theta_out) == J,
  nrow(ind$theta_out) == ind$n_sim
)
theta_out_ng <- ng$theta_out
mu_out_ng <- ng$mu_out
sigma_theta_out_ng <- ng$sigma_theta_out
n_burn_ng <- ng$n_burn
n_sim_ng <- ng$n_sim
theta_out <- ind$theta_out
mu_out <- ind$mu_out
sigma_theta_out <- ind$sigma_theta_out
iters_out1 <- ind$iters_out1
iters_out2 <- ind$iters_out2
n_burn_schools <- ind$n_burn
n_sim_schools <- ind$n_sim
set.seed(101)
x_one <- matrix(1, nrow = J, ncol = 1)  # intercept-only design; as.matrix() would ignore nrow/ncol
theta_ng <- estimate

n_burn_ng <- 1000
n_sim_ng  <- 1000
theta_out_ng <- matrix(0, nrow = n_sim_ng, ncol = J)
mu_out_ng    <- numeric(n_sim_ng)
sigma_theta_out_ng <- numeric(n_sim_ng)

for (k in seq_len(n_burn_ng)) {
  out_pop <- rlmb(1, y = theta_ng, x = x_one,
                  pfamily = dNormal_Gamma(mu_mu, sigma_mu / disp_ML, shape = shape, rate = rate))
  mu_theta     <- out_pop$coefficients[1, 1]
  sigma_theta_sq <- out_pop$dispersion
  for (j in seq_len(J)) {
    theta_ng[j] <- rlmb(1, y = estimate[j], x = as.matrix(1),
                        pfamily = dNormal(mu_theta, Sigma = sigma_theta_sq, dispersion = sigma_y_sq[j]))$coefficients[1, 1]
  }
}

for (k in seq_len(n_sim_ng)) {
  out_pop <- rlmb(1, y = theta_ng, x = x_one,
                  pfamily = dNormal_Gamma(mu_mu, sigma_mu / disp_ML, shape = shape, rate = rate))
  mu_theta     <- out_pop$coefficients[1, 1]
  sigma_theta_sq <- out_pop$dispersion
  for (j in seq_len(J)) {
    theta_ng[j] <- rlmb(1, y = estimate[j], x = as.matrix(1),
                        pfamily = dNormal(mu_theta, Sigma = sigma_theta_sq, dispersion = sigma_y_sq[j]))$coefficients[1, 1]
  }
  theta_out_ng[k, ] <- theta_ng
  mu_out_ng[k]      <- mu_theta
  sigma_theta_out_ng[k] <- sqrt(sigma_theta_sq)
}
colMeans(theta_out_ng)
#>         A         B         C         D         E         F         G         H 
#> 12.243413  8.542895  7.020558  8.310343  6.120766  6.536146 11.441157  9.249691
mean(mu_out_ng)
#> [1] 8.768267
mean(sigma_theta_out_ng)
#> [1] 6.577895
sqrt(diag(var(theta_out_ng)))
#>        A        B        C        D        E        F        G        H 
#> 7.318593 5.885325 7.074361 6.026843 6.215671 6.596157 6.315940 7.188958

The dNormal_Gamma prior is well-suited for Gibbs sampling: each call avoids the RSS/UB2 minimization used by dIndependent_Normal_Gamma, so run time stays manageable.

2.2 Eight Schools with dIndependent_Normal_Gamma Prior

The same hierarchical model can be fit with dIndependent_Normal_Gamma, which assumes prior independence between coefficients and precision. This prior triggers repeated envelope minimization (RSS and UB2) on each population-block draw, making it slower and less suitable for long Gibbs runs.

The Gibbs chunks below are not evaluated at vignette build time (eval = FALSE); results for this block are loaded from the same RDS as in §2.1 (chapter13-load-schools-gibbs).

  1. Population block: Draw \((\mu, \sigma_\theta^2) \mid \theta\) using rlmb with dIndependent_Normal_Gamma(mu_mu, sigma_mu, shape, rate).
  2. School block: Same as above—draw \(\theta_j \mid \mu, \sigma_\theta^2, y_j\) using dNormal(...).
theta <- estimate
n_burn_schools <- 1000
n_sim_schools  <- 1000
theta_out <- matrix(0, nrow = n_sim_schools, ncol = J)
mu_out    <- numeric(n_sim_schools)
sigma_theta_out <- numeric(n_sim_schools)
iters_out1 <- numeric(n_burn_schools)
iters_out2 <- numeric(n_sim_schools)
set.seed(102)
x_one <- matrix(1, nrow = J, ncol = 1)  # intercept-only design; as.matrix() would ignore nrow/ncol

for (k in seq_len(n_burn_schools)) {
  out_pop <- rlmb(1, y = theta, x = x_one,
                  pfamily = dIndependent_Normal_Gamma(mu_mu, sigma_mu, shape = shape, rate = rate))
  mu_theta     <- out_pop$coefficients[1, 1]
  sigma_theta_sq <- out_pop$dispersion
  for (j in seq_len(J)) {
    theta[j] <- rlmb(1, y = estimate[j], x = as.matrix(1),
                     pfamily = dNormal(mu_theta, Sigma = sigma_theta_sq, dispersion = sigma_y_sq[j]))$coefficients[1, 1]
  }
  iters_out1[k] <- out_pop$iters
}
## Mean draws per acceptance (population block) after burn-in
## Lower = better. In demo Ex_07_Schools.R this was ~6.01; compare to check for regression.
mean(iters_out1)
#> [1] 18.972
for (k in seq_len(n_sim_schools)) {
  out_pop <- rlmb(1, y = theta, x = x_one,
                  pfamily = dIndependent_Normal_Gamma(mu_mu, sigma_mu, shape = shape, rate = rate))
  mu_theta     <- out_pop$coefficients[1, 1]
  sigma_theta_sq <- out_pop$dispersion
  for (j in seq_len(J)) {
    theta[j] <- rlmb(1, y = estimate[j], x = as.matrix(1),
                     pfamily = dNormal(mu_theta, Sigma = sigma_theta_sq, dispersion = sigma_y_sq[j]))$coefficients[1, 1]
  }
  theta_out[k, ] <- theta
  mu_out[k]      <- mu_theta
  sigma_theta_out[k] <- sqrt(sigma_theta_sq)
  iters_out2[k]  <- out_pop$iters
}
## Mean draws per acceptance (population block) during main run
mean(iters_out2)
#> [1] 17.867
colMeans(theta_out)
#>         A         B         C         D         E         F         G         H 
#> 12.752608  8.042492  6.163618  7.744056  5.239019  6.150466 11.402180  9.231072
mean(mu_out)
#> [1] 8.465987
mean(sigma_theta_out)
#> [1] 7.583587
sqrt(diag(var(theta_out)))
#>        A        B        C        D        E        F        G        H 
#> 8.705483 6.580200 7.953565 6.889440 6.263768 6.946018 7.081195 7.800455

Posterior means of \(\theta_j\) show shrinkage toward the population mean compared to the raw estimates.
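The shrinkage can be made concrete with the precision-weighting formula from the school-block conditional. The self-contained sketch below uses the data from the text and plug-in values near the posterior means reported above (\(\mu \approx 8.47\), \(\sigma_\theta \approx 7.58\)); note it conditions on point estimates, whereas the Gibbs output averages over their uncertainty, so the numbers will differ somewhat from the posterior means printed earlier:

```r
## Shrinkage weights lambda_j = sigma_theta^2 / (sigma_theta^2 + sigma_j^2):
## the conditional posterior mean of theta_j is lambda_j * y_j + (1 - lambda_j) * mu.
estimate <- c(28.39, 7.94, -2.75, 6.82, -0.64, 0.63, 18.01, 12.16)
sd_obs   <- c(14.9, 10.2, 16.3, 11.0, 9.4, 11.4, 10.4, 17.6)
mu_hat          <- 8.47   # plug-in population mean (illustration only)
sigma_theta_hat <- 7.58   # plug-in population SD (illustration only)

lambda <- sigma_theta_hat^2 / (sigma_theta_hat^2 + sd_obs^2)
theta_plugin <- lambda * estimate + (1 - lambda) * mu_hat

round(rbind(raw = estimate, shrunk = theta_plugin, weight = lambda), 2)
```

Schools with larger sampling variances (smaller weights) are pulled more strongly toward the population mean.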

Block Gibbs sampling with rlmb for the population and school blocks follows the same conditioning logic as in other glmbayes linear-model examples; the independent Normal–Gamma population step is related to the joint envelope construction in (Nygren 2025).

3. Summary

  1. Dispersion + \(\beta\) (Gaussian), via rlmb. Block 1: \(\beta \mid \phi\) with dNormal(..., dispersion = phi). Block 2: \(\phi \mid \beta\) with dGamma(shape, rate, beta = beta). Data: Dobson plant weight.
  2. Hierarchical Eight Schools (Gibbs-suitable), via rlmb. Block 1: \((\mu, \sigma_\theta^2) \mid \theta\) with dNormal_Gamma(mu, Sigma_0, shape, rate). Block 2: \(\theta_j \mid \mu, \sigma_\theta^2\) with dNormal(..., dispersion = sigma_j^2). Data: Eight Schools.
  3. Hierarchical Eight Schools, via rlmb. Block 1: \((\mu, \sigma_\theta^2) \mid \theta\) with dIndependent_Normal_Gamma. Block 2: \(\theta_j \mid \mu, \sigma_\theta^2\) with dNormal(..., dispersion = sigma_j^2). Data: Eight Schools.

The same dispersion-and-coefficients idea extends to gamma regression when dispersion must be estimated; a worked example may be added when a suitable dataset is available.

For hierarchical generalized linear models (e.g., Poisson), see Chapter 14. For prior setup and conjugate fits, see ?Prior_Setup, ?lmb, and ?glmb.

References

Dobson, A. J. 1990. An Introduction to Generalized Linear Models. Chapman & Hall.
Gelman, Andrew, John B. Carlin, Hal S. Stern, David B. Dunson, Aki Vehtari, and Donald B. Rubin. 2013. Bayesian Data Analysis. 3rd ed. CRC Press.
Nygren, Kjell. 2025. Independent Normal–Gamma Regression Sampler. Vignette in the glmbayes R package.