--- title: "Chapter 13: Hierarchical Linear Models" author: "Kjell Nygren" date: "`r Sys.Date()`" output: rmarkdown::html_vignette bibliography: REFERENCES.bib reference-section-title: References vignette: > %\VignetteIndexEntry{Chapter 13: Hierarchical Linear Models} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r setup} library(glmbayes) ``` # Chapter 13: Hierarchical Linear Models This chapter focuses on **hierarchical linear models**—Gaussian models with random effects—and related use cases that rely on **`rlmb`** for block Gibbs sampling. The function **`rlmb`** is a minimal Bayesian simulation engine for linear (Gaussian) models: it takes a response vector, design matrix, and prior specification, and returns draws from the posterior. It is intended for settings where you need **repeated draws with updated conditioning** inside a Gibbs sampler (e.g., alternating between blocks for coefficients and dispersion, or for population and unit-level parameters). - **Use `lmb` or `glmb`** when you want a single fit or repeated standalone fits: formula interface, full object, `summary`, `predict`, etc. - **Use `rlmb`** when you are **building your own MCMC** for linear models, such as a two-block Gibbs sampler: you alternate between blocks, each time calling `rlmb` with the current values of the other block. The low-overhead, matrix-in/matrix-out interface is then a better fit. For hierarchical **generalized** linear models (e.g., Poisson), use **`rglmb`**—see Chapter 14. We illustrate two use cases for **`rlmb`** in the context of **linear (Gaussian)** models: (1) estimating **dispersion and regression coefficients** when the prior is non-conjugate, and (2) **hierarchical (random effects)** models. All examples use data already available in the package. For hierarchical **generalized** linear models (e.g., Poisson), see Chapter 14. ## 1. Use Case 1: Dispersion and Regression Coefficients (Gaussian) In the Gaussian linear model \(y = X\beta + \varepsilon\) with \(\varepsilon \sim \text{N}(0, \phi I)\), the likelihood depends on the residual variance \(\phi\). If the prior on \((\beta, \phi)\) is not conjugate (e.g., you want independent priors on \(\beta\) and \(\phi\)), you can still sample from the posterior by a **two-block Gibbs sampler**: 1. **Block 1:** Draw \(\beta \mid \phi, y\) — Gaussian posterior given fixed \(\phi\). 2. **Block 2:** Draw \(\phi \mid \beta, y\) — Gamma posterior given fixed \(\beta\). **`rlmb`** implements both conditionals: use **`dNormal(..., dispersion = phi)`** for the \(\beta\)-step and **`dGamma(shape, rate, beta = beta_current)`** for the \(\phi\)-step; then take the returned **`dispersion`** as the new \(\phi\). We use the plant weight data from [@Dobson1990] (Page 9): weight by treatment group (Ctl vs Trt). The code below sets the prior from **`Prior_Setup`**, then runs a short burn-in and stores draws from the Gibbs sampler. Posterior means are compared to the conjugate **`lmb(..., dIndependent_Normal_Gamma(...))`** fit. 
We use the plant weight data from [@Dobson1990] (page 9): weight by treatment group (Ctl vs Trt). The code below sets the prior from **`Prior_Setup`**, then runs a short burn-in and stores draws from the Gibbs sampler. Posterior means are compared to the conjugate **`lmb(..., dIndependent_Normal_Gamma(...))`** fit.

```{r dobson-setup}
## Dobson (1990) "An Introduction to Generalized Linear Models", Page 9: Plant Weight Data
ctl <- c(4.17, 5.58, 5.18, 6.11, 4.50, 4.61, 5.17, 4.53, 5.33, 5.14)
trt <- c(4.81, 4.17, 4.41, 3.59, 5.87, 3.83, 6.03, 4.89, 4.32, 4.69)
group <- gl(2, 10, 20, labels = c("Ctl", "Trt"))
weight <- c(ctl, trt)

ps <- Prior_Setup(weight ~ group)
x <- ps$x
y <- ps$y
mu <- ps$mu
V <- ps$Sigma
shape <- ps$shape
rate <- ps$rate
rate_dg <- if (!is.null(ps$rate_gamma)) ps$rate_gamma else rate
disp_ML <- ps$dispersion
```

```{r dobson-gibbs, eval = TRUE}
set.seed(180)
dispersion_current <- disp_ML

## Burn-in
n_burn <- 1000
for (i in seq_len(n_burn)) {
  out_beta <- rlmb(n = 1, y = y, x = x,
                   pfamily = dNormal(mu = mu, Sigma = V,
                                     dispersion = dispersion_current))
  out_phi <- rlmb(n = 1, y = y, x = x,
                  pfamily = dGamma(shape = shape, rate = rate_dg,
                                   beta = out_beta$coefficients[1, ]))
  dispersion_current <- out_phi$dispersion
}

## Sampling
n_sim <- 1000
beta_out <- matrix(0, nrow = n_sim, ncol = 2)
disp_out <- numeric(n_sim)
for (i in seq_len(n_sim)) {
  out_beta <- rlmb(n = 1, y = y, x = x,
                   pfamily = dNormal(mu = mu, Sigma = V,
                                     dispersion = dispersion_current))
  out_phi <- rlmb(n = 1, y = y, x = x,
                  pfamily = dGamma(shape = shape, rate = rate_dg,
                                   beta = out_beta$coefficients[1, ]))
  dispersion_current <- out_phi$dispersion
  beta_out[i, ] <- out_beta$coefficients[1, ]
  disp_out[i] <- out_phi$dispersion
}
```

```{r dobson-compare}
lmb_D9 <- lmb(weight ~ group,
              dIndependent_Normal_Gamma(mu, V, shape = ps$shape_ING, rate = ps$rate))
print(lmb_D9)

## Compare Gibbs vs lmb: means and SDs
coef_names <- colnames(ps$x)
tbl <- data.frame(
  Parameter = c(coef_names, "dispersion"),
  Gibbs_mean = c(colMeans(beta_out), mean(disp_out)),
  Gibbs_SD = c(apply(beta_out, 2, sd), sd(disp_out)),
  lmb_mean = c(colMeans(lmb_D9$coefficients), mean(lmb_D9$dispersion)),
  lmb_SD = c(apply(lmb_D9$coefficients, 2, sd), sd(lmb_D9$dispersion))
)
knitr::kable(tbl, digits = 4,
             caption = "Dobson plant weight: Gibbs sampler vs conjugate lmb fit")
```

The Gibbs posterior means and standard deviations for \(\beta\) and \(\phi\) should be close to those from the conjugate **`lmb`** fit. In practice, use a longer burn-in and more stored draws.

**Gamma regression.** The same idea, alternating draws of coefficients given dispersion and dispersion given coefficients, can be applied to **gamma regression** when the dispersion (shape) of the Gamma response is unknown and must be estimated. The conditioning steps are then implemented with **`rglmb`** (family `Gamma`) and an appropriate prior for the dispersion block. The package does not currently provide a ready-made example for gamma regression in this setting; typical textbook datasets often imply large weights that make the Gibbs steps costly. A future vignette or example may use an alternate dataset once a suitable one is identified.

## 2. Use Case 2: Hierarchical (Random Effects) Models

### 2.1 Eight Schools with dNormal_Gamma Prior (Gibbs-suitable)

The **Eight Schools** example from [@Gelman2013, Ch. 5] has \(J = 8\) schools; each school \(j\) has an observed effect estimate \(y_j\) with known sampling variance \(\sigma_j^2\). The hierarchical model has a population mean \(\mu\) and variance \(\sigma_\theta^2\), and school-specific effects \(\theta_j\). With the **`dNormal_Gamma`** prior on \((\mu, \sigma_\theta^2)\), the Gibbs sampler is efficient and suitable for repeated use inside a loop because the envelope is built once per call without the minimization overhead of **`dIndependent_Normal_Gamma`**. The sampler alternates two blocks (the model itself is written out after this list):

1. **Population block:** Draw \((\mu, \sigma_\theta^2) \mid \theta\) using **`rlmb`** with **`dNormal_Gamma(mu_mu, sigma_mu/disp_ML, shape, rate)`**.
2. **School block:** For each \(j\), draw \(\theta_j \mid \mu, \sigma_\theta^2, y_j\) using **`rlmb`** with **`dNormal(mu, Sigma = sigma_theta^2, dispersion = sigma_j^2)`**.
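Written out, the hierarchy the two blocks alternate over is the standard Eight Schools model:

\[
y_j \mid \theta_j \sim \text{N}(\theta_j, \sigma_j^2), \qquad
\theta_j \mid \mu, \sigma_\theta^2 \sim \text{N}(\mu, \sigma_\theta^2), \qquad j = 1, \ldots, J.
\]

The population block treats the current vector of \(\theta_j\) values as the response for an intercept-only regression; each school block is a one-observation Gaussian update.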
```{r schools-data}
school <- c("A", "B", "C", "D", "E", "F", "G", "H")
estimate <- c(28.39, 7.94, -2.75, 6.82, -0.64, 0.63, 18.01, 12.16)
sd_obs <- c(14.9, 10.2, 16.3, 11.0, 9.4, 11.4, 10.4, 17.6)
J <- length(school)
sigma_y_sq <- sd_obs^2  # known sampling variances for each school

mu_mu <- mean(estimate)
sigma_mu <- var(estimate)
n_prior <- 0.5
disp_ML <- var(estimate)
shape <- n_prior / 2
rate <- disp_ML * shape
```

The **`dNormal_Gamma`** Gibbs chunk below is **not evaluated** when the vignette is built (`eval = FALSE`). Stored draws are loaded in **`chapter13-load-schools-gibbs`** from **`inst/extdata/Chapter13_Eight_Schools_two_gibbs.rds`**, produced by **`data-raw/make_Chapter13_Eight_Schools_gibbs_output.R`**.

```{r chapter13-load-schools-gibbs}
ch13_path <- system.file(
  "extdata", "Chapter13_Eight_Schools_two_gibbs.rds",
  package = "glmbayes"
)
stopifnot(nzchar(ch13_path), file.exists(ch13_path))
ch13 <- readRDS(ch13_path)

ng <- ch13$normal_gamma
ind <- ch13$indep_norm_gamma
stopifnot(
  ncol(ng$theta_out) == J, nrow(ng$theta_out) == ng$n_sim,
  ncol(ind$theta_out) == J, nrow(ind$theta_out) == ind$n_sim
)

theta_out_ng <- ng$theta_out
mu_out_ng <- ng$mu_out
sigma_theta_out_ng <- ng$sigma_theta_out
n_burn_ng <- ng$n_burn
n_sim_ng <- ng$n_sim

theta_out <- ind$theta_out
mu_out <- ind$mu_out
sigma_theta_out <- ind$sigma_theta_out
iters_out1 <- ind$iters_out1
iters_out2 <- ind$iters_out2
n_burn_schools <- ind$n_burn
n_sim_schools <- ind$n_sim
```

```{r schools-ng-gibbs, eval = FALSE}
set.seed(101)
x_one <- matrix(1, nrow = J, ncol = 1)  # intercept-only design for the population block
theta_ng <- estimate
n_burn_ng <- 1000
n_sim_ng <- 1000
theta_out_ng <- matrix(0, nrow = n_sim_ng, ncol = J)
mu_out_ng <- numeric(n_sim_ng)
sigma_theta_out_ng <- numeric(n_sim_ng)

for (k in seq_len(n_burn_ng)) {
  out_pop <- rlmb(1, y = theta_ng, x = x_one,
                  pfamily = dNormal_Gamma(mu_mu, sigma_mu / disp_ML,
                                          shape = shape, rate = rate))
  mu_theta <- out_pop$coefficients[1, 1]
  sigma_theta_sq <- out_pop$dispersion
  for (j in seq_len(J)) {
    theta_ng[j] <- rlmb(1, y = estimate[j], x = as.matrix(1),
                        pfamily = dNormal(mu_theta, Sigma = sigma_theta_sq,
                                          dispersion = sigma_y_sq[j]))$coefficients[1, 1]
  }
}

for (k in seq_len(n_sim_ng)) {
  out_pop <- rlmb(1, y = theta_ng, x = x_one,
                  pfamily = dNormal_Gamma(mu_mu, sigma_mu / disp_ML,
                                          shape = shape, rate = rate))
  mu_theta <- out_pop$coefficients[1, 1]
  sigma_theta_sq <- out_pop$dispersion
  for (j in seq_len(J)) {
    theta_ng[j] <- rlmb(1, y = estimate[j], x = as.matrix(1),
                        pfamily = dNormal(mu_theta, Sigma = sigma_theta_sq,
                                          dispersion = sigma_y_sq[j]))$coefficients[1, 1]
  }
  theta_out_ng[k, ] <- theta_ng
  mu_out_ng[k] <- mu_theta
  sigma_theta_out_ng[k] <- sqrt(sigma_theta_sq)
}
```

```{r schools-ng-summary}
colMeans(theta_out_ng)
mean(mu_out_ng)
mean(sigma_theta_out_ng)
sqrt(diag(var(theta_out_ng)))
```
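As an added illustration (not part of the stored-results workflow), the table below puts the raw estimates next to the posterior means from the **`dNormal_Gamma`** chain loaded above; hierarchical pooling should pull each posterior mean toward the population mean.

```{r schools-ng-shrinkage}
## Raw estimates vs posterior means: pooling pulls each school-level
## mean toward the population mean.
shrink_tbl <- data.frame(
  School = school,
  Raw = estimate,
  Posterior_mean = colMeans(theta_out_ng),
  Posterior_SD = sqrt(diag(var(theta_out_ng)))
)
knitr::kable(shrink_tbl, digits = 2,
             caption = "Eight Schools: raw estimates vs posterior means (dNormal_Gamma chain)")
```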
The **`dNormal_Gamma`** prior is well-suited for Gibbs sampling: each call avoids the RSS/UB2 minimization used by **`dIndependent_Normal_Gamma`**, so run time stays manageable.

### 2.2 Eight Schools with dIndependent_Normal_Gamma Prior

The same hierarchical model can be fit with **`dIndependent_Normal_Gamma`**, which assumes prior independence between coefficients and precision. This prior triggers repeated envelope minimization (RSS and UB2) on each population-block draw, making it slower and less suitable for long Gibbs runs. The Gibbs chunks below are **not evaluated** at vignette build time (`eval = FALSE`); results for this block are loaded from the same RDS as in §2.1 (**`chapter13-load-schools-gibbs`**).

1. **Population block:** Draw \((\mu, \sigma_\theta^2) \mid \theta\) using **`rlmb`** with **`dIndependent_Normal_Gamma(mu_mu, sigma_mu, shape, rate)`**.
2. **School block:** Same as above: draw \(\theta_j \mid \mu, \sigma_\theta^2, y_j\) using **`dNormal(...)`**.

```{r schools-indep-setup, eval = FALSE}
theta <- estimate
n_burn_schools <- 1000
n_sim_schools <- 1000
theta_out <- matrix(0, nrow = n_sim_schools, ncol = J)
mu_out <- numeric(n_sim_schools)
sigma_theta_out <- numeric(n_sim_schools)
iters_out1 <- numeric(n_burn_schools)
iters_out2 <- numeric(n_sim_schools)
```

```{r schools-burnin, eval = FALSE}
set.seed(102)
x_one <- matrix(1, nrow = J, ncol = 1)  # intercept-only design for the population block
for (k in seq_len(n_burn_schools)) {
  out_pop <- rlmb(1, y = theta, x = x_one,
                  pfamily = dIndependent_Normal_Gamma(mu_mu, sigma_mu,
                                                      shape = shape, rate = rate))
  mu_theta <- out_pop$coefficients[1, 1]
  sigma_theta_sq <- out_pop$dispersion
  for (j in seq_len(J)) {
    theta[j] <- rlmb(1, y = estimate[j], x = as.matrix(1),
                     pfamily = dNormal(mu_theta, Sigma = sigma_theta_sq,
                                       dispersion = sigma_y_sq[j]))$coefficients[1, 1]
  }
  iters_out1[k] <- out_pop$iters
}
```

```{r schools-iters-burnin, eval = TRUE}
## Mean draws per acceptance (population block) during burn-in
## Lower = better. In demo Ex_07_Schools.R this was ~6.01; compare to check for regression.
mean(iters_out1)
```

```{r schools-gibbs, eval = FALSE}
for (k in seq_len(n_sim_schools)) {
  out_pop <- rlmb(1, y = theta, x = x_one,
                  pfamily = dIndependent_Normal_Gamma(mu_mu, sigma_mu,
                                                      shape = shape, rate = rate))
  mu_theta <- out_pop$coefficients[1, 1]
  sigma_theta_sq <- out_pop$dispersion
  for (j in seq_len(J)) {
    theta[j] <- rlmb(1, y = estimate[j], x = as.matrix(1),
                     pfamily = dNormal(mu_theta, Sigma = sigma_theta_sq,
                                       dispersion = sigma_y_sq[j]))$coefficients[1, 1]
  }
  theta_out[k, ] <- theta
  mu_out[k] <- mu_theta
  sigma_theta_out[k] <- sqrt(sigma_theta_sq)
  iters_out2[k] <- out_pop$iters
}
```

```{r schools-iters-main, eval = TRUE}
## Mean draws per acceptance (population block) during the main run
mean(iters_out2)
```

```{r schools-summary, eval = TRUE}
colMeans(theta_out)
mean(mu_out)
mean(sigma_theta_out)
sqrt(diag(var(theta_out)))
```

Posterior means of \(\theta_j\) show shrinkage toward the population mean compared to the raw estimates. Block Gibbs sampling with **`rlmb`** for the population and school blocks follows the same conditioning logic as in other **glmbayes** linear-model examples; the independent Normal--Gamma population step is related to the joint envelope construction in [@glmbayesIndNormGammaVignette].
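The two sections use different population-level priors (conditional vs independent Normal--Gamma), so their posteriors need not match exactly, but the summaries should be broadly similar. The table below is an added side-by-side comparison built from the draws loaded earlier:

```{r schools-prior-compare}
## Added comparison: posterior means under the two population-level priors.
## Differences reflect both the change of prior and Monte Carlo error.
cmp_tbl <- data.frame(
  Parameter = c(paste0("theta_", school), "mu", "sigma_theta"),
  dNormal_Gamma = c(colMeans(theta_out_ng), mean(mu_out_ng),
                    mean(sigma_theta_out_ng)),
  dIndependent_Normal_Gamma = c(colMeans(theta_out), mean(mu_out),
                                mean(sigma_theta_out))
)
knitr::kable(cmp_tbl, digits = 2,
             caption = "Eight Schools: posterior means under the two population priors")
```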
## 3. Summary

| Use case | Function | Block 1 | Block 2 | Data used |
|----------|----------|---------|---------|-----------|
| Dispersion + \(\beta\) (Gaussian) | **rlmb** | \(\beta \mid \phi\): `dNormal(..., dispersion = phi)` | \(\phi \mid \beta\): `dGamma(shape, rate, beta = beta)` | Dobson plant weight |
| Hierarchical (Eight Schools, Gibbs-suitable) | **rlmb** | \((\mu, \sigma_\theta^2) \mid \theta\): `dNormal_Gamma(mu, Sigma_0, shape, rate)` | \(\theta_j \mid \mu, \sigma_\theta^2\): `dNormal(..., dispersion = sigma_j^2)` | Eight Schools |
| Hierarchical (Eight Schools) | **rlmb** | \((\mu, \sigma_\theta^2) \mid \theta\): `dIndependent_Normal_Gamma` | \(\theta_j \mid \mu, \sigma_\theta^2\): `dNormal(..., dispersion = sigma_j^2)` | Eight Schools |

The same dispersion-and-coefficients idea extends to **gamma regression** when the dispersion must be estimated; a worked example may be added when a suitable dataset is available. For hierarchical generalized linear models (e.g., Poisson), see Chapter 14. For prior setup and conjugate fits, see **`?Prior_Setup`**, **`?lmb`**, and **`?glmb`**.