--- title: "Chapter 11: Estimating Models with unknown dispersion parameters" output: rmarkdown::html_vignette bibliography: REFERENCES.bib reference-section-title: References vignette: > %\VignetteIndexEntry{Chapter 11: Estimating Models with unknown dispersion parameters} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", warning= FALSE, message=FALSE ) ``` ```{r setup} library(glmbayes) ``` ## 1. Introduction Earlier chapters developed Bayesian linear and generalized linear models under the assumption that the dispersion parameter, denoted by \( \phi \) (or equivalently the residual variance \( \sigma^{2} \)), was known. For Gaussian models this corresponds to fixing the variance of the error term, and for quasi‑families it corresponds to fixing the scale parameter. This chapter extends the framework to unknown dispersion parameters---a standard topic in Bayesian linear and generalized linear models [@OHaganForster2004; @Gelman2013]---showing how glmbayes supports: (1) Priors on the dispersion only (2) Joint conjugate Normal–Gamma priors (3) Independent Normal–Gamma priors • iid sampling via truncated Gamma priors • two‑block Gibbs sampling Full derivations for Gaussian dispersion, shape, and rate calibration from `Prior_Setup()` / `compute_gaussian_prior()`—and how those objects connect to conjugate and independent Normal--Gamma priors—are given in **Chapter A12** (). All Gaussian examples use the **Dobson** plant‑weight data [@Dobson1990], first introduced in Chapter 02. ```{r dobson} ## Annette Dobson (1990) "An Introduction to Generalized Linear Models". ## Page 9: Plant Weight Data. ctl <- c(4.17, 5.58, 5.18, 6.11, 4.50, 4.61, 5.17, 4.53, 5.33, 5.14) trt <- c(4.81, 4.17, 4.41, 3.59, 5.87, 3.83, 6.03, 4.89, 4.32, 4.69) group <- gl(2, 10, 20, labels = c("Ctl", "Trt")) weight <- c(ctl, trt) ``` ## 2. Exponential-Family Background and Log-Concavity Many likelihoods used in generalized linear models belong to the exponential family. With observation weights \(w_1,\dots,w_n \ge 0\), the weighted exponential-family density is \[ f(y \mid \theta,\phi,w) = \exp\left\{ \sum_{i=1}^{n} w_i \left[ \frac{y_i \theta_i - b(\theta_i)}{a(\phi)} + c(y_i,\phi) \right] \right\}. \] Here: - \( \theta_i \) is the canonical parameter - \( b(\theta_i) \) is the cumulant function - \( a(\phi) \) is the dispersion function - \( c(y_i,\phi) \) collects terms not involving \( \theta_i \) - \( w_i \) is the observation weight The **weighted log-likelihood** is therefore \[ \ell(\beta,\phi) = \sum_{i=1}^{n} w_i \left[ \frac{y_i\theta_i - b(\theta_i)}{a(\phi)} + c(y_i,\phi) \right]. \] This is the form used by both `glm()` and the Bayesian functions `glmb()`, `lmb()`, `rglmb()`, and `rlmb()`. --- ## 2.2 Special Case: Weighted Gaussian Linear Regression Consider the weighted Gaussian model \[ y_i \mid \beta,\phi \sim N(x_i^\top\beta,\;\phi/w_i), \qquad i=1,\dots,n. \] Let \(W = \mathrm{diag}(w_1,\dots,w_n)\) and define the precision \[ \tau = \frac{1}{\phi}. \] ### **Weighted log-likelihood** The log-likelihood (up to constants) is \[ \ell(\beta,\phi) = -\frac{1}{2\phi} \sum_{i=1}^{n} w_i (y_i - x_i^\top\beta)^2 \;-\; \frac{1}{2} \sum_{i=1}^{n} w_i \log(2\pi\phi). \] Equivalently, in precision form: \[ \ell(\beta,\tau) = \frac{\tau}{2} \sum_{i=1}^{n} w_i (y_i - x_i^\top\beta)^2 \;-\; \frac{1}{2} \sum_{i=1}^{n} w_i \log\left(\frac{2\pi}{\tau}\right). 
### **Exponential-family identification**

This matches the weighted exponential-family form with

\[
\theta_i = \mu_i = x_i^\top\beta,
\qquad
b(\theta_i) = \tfrac{1}{2}\theta_i^2,
\qquad
a(\phi) = \phi,
\]

and

\[
c(y_i,\phi) = -\frac{1}{2\phi}y_i^2 -\frac{1}{2}\log(2\pi\phi).
\]

### **Weighted least squares representation**

Base R implements weighted least squares via

\[
y^\ast = W^{1/2} y,
\qquad
X^\ast = W^{1/2} X.
\]

The glmbayes functions use the same weighted likelihood, so all theoretical
expressions above apply directly.

## Log-Concavity Properties in the Weighted Gaussian Case

1. **Concavity in \( \beta \)**

   For fixed \( \phi \), the negative weighted log-likelihood is

   \[
   -\ell(\beta \mid \phi)
   = \frac{1}{2\phi} (y - X\beta)^{T} W (y - X\beta) + \text{const},
   \]

   which is strictly convex whenever \( W \) has positive diagonal entries.
   Therefore the weighted log-likelihood is strictly concave in \( \beta \).

2. **Concavity in precision \( \tau = 1/\phi \)**

   Rewriting the weighted likelihood in terms of \( \tau \):

   \[
   \ell(\beta,\tau)
   = -\frac{\tau}{2} (y - X\beta)^{T} W (y - X\beta)
   + \frac{1}{2} \left( \sum_{i=1}^{n} w_i \right) \log \tau
   + \text{const},
   \]

   which is concave in \( \tau \) because the first term is linear in
   \( \tau \) and \( \log\tau \) is concave.

These properties guarantee:

- unique posterior modes for both \( \beta \) and \( \tau \)
- efficient iid sampling under Normal, Normal-Gamma, and Independent
  Normal-Gamma priors
- stable two-block Gibbs sampling when the dispersion is updated separately

The weighted Gaussian model thus provides the cleanest illustration of how
weights enter the exponential-family structure and how they preserve
log-concavity.

### 2.3 Special Case: Gamma Regression With a Log Link

Consider the weighted Gamma regression model with log link:

\[
y_i \mid \beta,\phi \;\sim\;
\text{Gamma}\!\left( \text{shape}=\frac{1}{\phi},\;
\text{scale}=\phi\,\mu_i \right),
\qquad
\mu_i = \exp(x_i^\top\beta),
\]

with observation weights \(w_i \ge 0\). Let the **precision** be

\[
\tau = \frac{1}{\phi}.
\]

---

#### **Weighted log-likelihood**

The weighted log-likelihood is

\[
\ell(\beta,\tau)
= \sum_{i=1}^n w_i\left[
\tau\log\tau - \log\Gamma(\tau) - \tau\log \mu_i
+ (\tau-1)\log y_i - \frac{\tau y_i}{\mu_i} \right].
\]

Using \(\mu_i = \exp(x_i^\top\beta)\), this becomes

\[
\ell(\beta,\tau)
= \sum_{i=1}^n w_i\left[
\tau\log\tau - \log\Gamma(\tau) - \tau x_i^\top\beta
+ (\tau-1)\log y_i - \tau y_i e^{-x_i^\top\beta} \right].
\]

---

#### **Log-concavity**

For fixed \(\tau\):

- the log-likelihood is **concave in \(\beta\)**: each term
  \(-\tau x_i^\top\beta - \tau y_i e^{-x_i^\top\beta}\) is the sum of a linear
  function of \(\beta\) and \(-e^{-x_i^\top\beta}\), a concave function of the
  affine map \(x_i^\top\beta\).

For fixed \(\beta\):

- the log-likelihood is **concave in \(\tau\)**: the terms
  \((\tau-1)\log y_i\) and \(-\tau y_i/\mu_i\) are linear in \(\tau\), and
  \(\tau\log\tau - \log\Gamma(\tau)\) is concave because its second derivative
  \(1/\tau - \psi'(\tau)\) is negative for all \(\tau > 0\).

These properties ensure:

- unique posterior modes
- stable envelope construction to support accept-reject sampling for \(\beta\)
- valid accept-reject sampling for \(\tau\)

### Implications for Bayesian GLMs

The Gamma log-link model thus shares the same structural advantages as the
Gaussian and Poisson cases:

* concavity in both \( \beta \) and \( \tau \) (checked numerically below),
* unique posterior modes,
* stable envelope construction,
* efficient iid sampling for the dispersion parameter,
* compatibility with two-block Gibbs sampling when desired.
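As a quick sanity check on the \(\tau\)-concavity claim, the minimal sketch
below evaluates the second derivative \(1/\tau - \psi'(\tau)\) of
\(\tau\log\tau - \log\Gamma(\tau)\) on a grid using only base R. The chunk is
illustrative and is not evaluated when the vignette is built.

```{r concavity-check-tau, eval = FALSE}
## The tau-dependent curvature of the Gamma log-likelihood comes from
## tau * log(tau) - lgamma(tau); its second derivative is
## 1/tau - trigamma(tau), which is negative for all tau > 0,
## confirming concavity in tau.
tau_grid <- seq(0.01, 100, length.out = 1000)
all(1 / tau_grid - trigamma(tau_grid) < 0)
```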
These properties make the Gamma log-link model an especially clean example of
how exponential-family structure supports fast, reliable posterior simulation
in glmbayes.

## 3. Gaussian Model With Unknown Dispersion

For the Gaussian model

\[
y = X\beta + \varepsilon,\qquad
\varepsilon \sim N(0,\phi I_{n}),
\]

the likelihood is

\[
p(y \mid \beta,\phi)
\propto \phi^{-n/2}\exp\!\left(-\frac{S(\beta)}{2\phi}\right),
\]

where

\[
S(\beta) = (y - X\beta)^{T}(y - X\beta)
\]

is the residual sum of squares.

To support the different prior specifications below, we use the
`Prior_Setup()` function to set and extract default values for the needed
parameters.

```{r Prior_Setup, results = "hide"}
ps <- Prior_Setup(weight ~ group)

x  <- ps$x
mu <- ps$mu
V  <- ps$Sigma
y  <- ps$y

shape <- ps$shape
rate  <- ps$rate
```

We now specify priors for \((\beta,\tau)\) and derive the corresponding
log-priors and log-posteriors. Throughout:

- dispersion: \( \phi \)
- precision: \( \tau = 1/\phi \)
- weighted residual sum of squares:
  \[
  \mathrm{RSS}(\beta) = \sum_{i=1}^{n} w_i (y_i - x_i^\top\beta)^2.
  \]

---

## 3.1 Prior on the Dispersion Only (Gamma Prior on Precision)

We place a Gamma prior on the precision:

\[
\tau \sim \mathrm{Gamma}(a_0,b_0),
\]

with log-prior

\[
\log p(\tau) = (a_0 - 1)\log\tau - b_0\tau + \text{const}.
\]

Given fixed \(\beta\), the conditional posterior is

\[
\tau \mid \beta,y \sim
\mathrm{Gamma}\left( a_0 + \frac{n_{w}}{2},
\; b_0 + \frac{\mathrm{RSS}(\beta)}{2} \right),
\]

where \(n_{w} = \sum_i w_i\).

---

To illustrate this, we fix \(\beta\) at the full-model coefficient vector from
`Prior_Setup()` (`ps$coefficients`, the internal GLM maximum-likelihood
estimate). We can then generate conditional Bayesian draws for the dispersion
as below. This is essentially a Bayesian analogue of the classical
**gamma.dispersion()** function provided by the **MASS** package.

```{r glm_call_gamma_prior}
out_rlmb <- rlmb(
  n = 1000, y = y, x = x,
  pfamily = dGamma(shape = shape, rate = rate, beta = ps$coefficients)
)
```

The posterior mean of the dispersion is then:

```{r glm_call_gamma_mean}
mean(out_rlmb$dispersion)
```

## 3.2 Joint Conjugate Normal–Gamma Prior

The conjugate Normal--Gamma prior couples \(\beta\) and \(\tau\)
[@LindleySmith1972; @OHaganForster2004]:

\[
\beta \mid \tau \sim N(\mu_0,\;\tau^{-1}\Sigma_0),
\qquad
\tau \sim \mathrm{Gamma}(a_0,b_0).
\]

### **Log-prior**

\[
\log p(\beta,\tau)
= (a_0 - 1)\log\tau - b_0\tau
-\frac{\tau}{2}(\beta-\mu_0)^\top\Sigma_0^{-1}(\beta-\mu_0)
+\text{const}.
\]

### **Log-posterior**

Combining with the Gaussian log-likelihood:

\[
\log p(\beta,\tau \mid y)
= -\frac{\tau}{2}\mathrm{RSS}(\beta)
-\frac{n_{w}}{2}\log\left(\frac{2\pi}{\tau}\right)
+(a_0 - 1)\log\tau - b_0\tau
-\frac{\tau}{2}(\beta-\mu_0)^\top\Sigma_0^{-1}(\beta-\mu_0)
+\text{const}.
\]

This yields closed-form conditional posteriors:

- \( \tau \mid y \) is Gamma
- \( \beta \mid \tau,y \) is Normal

allowing pure i.i.d. Monte Carlo sampling.

---

The defining property of conjugacy is that the **posterior has the same
functional form as the prior**. In glmbayes, this model can be estimated as
below.
```{r Conjugate_prior}
## Conjugate Normal_Gamma Prior

lmb.D9_v2 <- lmb(
  weight ~ group,
  pfamily = dNormal_Gamma(
    ps$mu,
    Sigma_0 = ps$Sigma_0,
    shape   = ps$shape,
    rate    = ps$rate
  )
)

summary(lmb.D9_v2)$dispersion
```

## 3.3 Independent Normal–Gamma Prior

Here the prior factorizes:

\[
\beta \sim N(\mu_0,\Sigma_0),
\qquad
\tau \sim \mathrm{Gamma}(a_0,b_0),
\]

so \(\beta\) and \(\tau\) are *a priori independent*.

### **Log-prior**

\[
\log p(\beta,\tau)
= -\frac{1}{2}(\beta-\mu_0)^\top\Sigma_0^{-1}(\beta-\mu_0)
+ (a_0 - 1)\log\tau - b_0\tau
+\text{const}.
\]

### **Log-posterior**

\[
\log p(\beta,\tau \mid y)
= -\frac{\tau}{2}\mathrm{RSS}(\beta)
-\frac{n_{w}}{2}\log\left(\frac{2\pi}{\tau}\right)
-\frac{1}{2}(\beta-\mu_0)^\top\Sigma_0^{-1}(\beta-\mu_0)
+ (a_0 - 1)\log\tau - b_0\tau
+\text{const}.
\]

A variation on this prior specification---using a **truncated Gamma prior**
for the dispersion---is the model developed in **Chapter A07**, where:

- the conditional posterior in \( \beta \) is *not conjugate*,
- the conditional posterior in \( \tau = 1/\phi \) is *not conjugate*, and
- the joint posterior remains **log-concave**, enabling the
  **accept–reject envelope sampler**.

This truncated-Gamma, non-conjugate Normal–Gamma model is therefore a direct
extension of the independent Normal–Gamma prior described above, but requires
the more advanced envelope-based sampling strategy documented in Chapter A07.

### 3.3.1 Independent Normal Gamma Prior in glmbayes

This prior is implemented in glmbayes using the specification below (with a
truncated Gamma prior replacing the full Gamma prior). The chunk below is
**not evaluated** when the vignette is built. Precomputed results are read
from the file **`Chapter11_Dobson_two_sampler.rds`** under
**`inst/extdata/`**. To regenerate that file, run
**`make_Chapter11_Dobson_two_sampler_output.R`** under **`data-raw/`**, from
the package source root.

```{r chapter11-load-two-sampler}
ch11_path <- system.file(
  "extdata", "Chapter11_Dobson_two_sampler.rds",
  package = "glmbayes"
)
stopifnot(nzchar(ch11_path), file.exists(ch11_path))
ch11_saved <- readRDS(ch11_path)

stopifnot(
  ncol(ch11_saved$gibbs_two_block$beta_out) == ncol(ps$x),
  nrow(ch11_saved$indep_norm_gamma$coefficients) ==
    ch11_saved$indep_norm_gamma$n_draws
)
```

```{r Independent_normal_gamma_prior, eval = FALSE}
lmb.D9_v3 <- lmb(
  n = 10000,
  weight ~ group,
  dIndependent_Normal_Gamma(
    ps$mu, ps$Sigma,
    shape = ps$shape_ING, rate = ps$rate,
    max_disp_perc = 0.99,
    disp_lower = NULL,
    disp_upper = NULL
  )
)

summary(lmb.D9_v3)$coefficients
summary(lmb.D9_v3)$dispersion
sd(lmb.D9_v3$dispersion)
```

```{r Independent_normal_gamma_loaded}
inc <- ch11_saved$indep_norm_gamma

coef_summ_iid <- data.frame(
  posterior_mean = colMeans(inc$coefficients),
  posterior_SD   = apply(inc$coefficients, 2, sd),
  row.names      = inc$coef_colnames
)
coef_summ_iid

cat("Dispersion: posterior mean =", mean(inc$dispersion),
    " SD =", sd(inc$dispersion), "\n")
```

### 3.3.2 Two-Block Gibbs Sampling using glmbayes

If iid sampling is not desired, or if the user wants to inspect mixing
behavior, a simple two-block Gibbs sampler [@RobertCasella2004] may be used:

1. Sample \( \beta \mid \tau, y \) from a Normal distribution
2. Sample \( \tau \mid \beta, y \) from a Gamma distribution

The following chunk is **not evaluated** during the vignette build
(`eval = FALSE`). Stored draws are loaded from the same RDS as above.
```{r two_block_Gibbs_sampler, eval = FALSE}
set.seed(180)
dispersion2 <- ps$dispersion

## Burn-in
for (i in 1:1000) {
  out1 <- rlmb(
    n = 1, y = y, x = x,
    pfamily = dNormal(mu = mu, Sigma = V, dispersion = dispersion2)
  )
  out2 <- rlmb(
    n = 1, y = y, x = x,
    pfamily = dGamma(shape = shape, rate = rate,
                     beta = out1$coefficients[1, ])
  )
  dispersion2 <- out2$dispersion
}

## Sampling
beta_out <- matrix(0, nrow = 10000, ncol = 2)
disp_out <- rep(0, 10000)

for (i in 1:10000) {
  out1 <- rlmb(
    n = 1, y = y, x = x,
    pfamily = dNormal(mu = mu, Sigma = V, dispersion = dispersion2)
  )
  out2 <- rlmb(
    n = 1, y = y, x = x,
    pfamily = dGamma(shape = shape, rate = rate,
                     beta = out1$coefficients[1, ])
  )
  beta_out[i, ] <- out1$coefficients[1, 1:2]
  disp_out[i]   <- out2$dispersion
}

colMeans(beta_out)
sd(beta_out[, 1])
sd(beta_out[, 2])
mean(disp_out)
sd(disp_out)
```

```{r two_block_Gibbs_loaded}
gb <- ch11_saved$gibbs_two_block

beta_out <- gb$beta_out
disp_out <- gb$disp_out

colMeans(beta_out)
sd(beta_out[, 1])
sd(beta_out[, 2])
mean(disp_out)
sd(disp_out)
```

### 3.3.3 Comparison of the Two Samplers

Under default settings, both the truncated-Gamma iid sampler and the
two-block Gibbs sampler produce nearly identical posterior summaries for the
regression coefficients and the dispersion parameter. To make this explicit,
the table below reports posterior means and standard deviations from:

* the iid sampler using the independent Normal–Gamma prior (same
  specification as `lmb(..., dIndependent_Normal_Gamma(...), n = 10000)`
  above), and
* the two-block Gibbs sampler constructed from alternating Normal and Gamma
  updates.

These values illustrate that both samplers target essentially the same
posterior distribution (although the former has a truncated prior), with only
minor Monte Carlo variation.

```{r chapter11-sampler-comparison-table, echo = FALSE}
inc <- ch11_saved$indep_norm_gamma
gb  <- ch11_saved$gibbs_two_block

cn <- inc$coef_colnames

iid_m <- colMeans(inc$coefficients)
iid_s <- apply(inc$coefficients, 2, sd)

gib_m <- colMeans(gb$beta_out)
gib_s <- apply(gb$beta_out, 2, sd)

cmp <- data.frame(
  Parameter  = c(cn, "Dispersion"),
  iid_Mean   = c(iid_m, mean(inc$dispersion)),
  iid_SD     = c(iid_s, sd(inc$dispersion)),
  Gibbs_Mean = c(gib_m, mean(gb$disp_out)),
  Gibbs_SD   = c(gib_s, sd(gb$disp_out))
)

knitr::kable(
  cmp,
  digits = 4,
  caption = paste(
    "Dobson plant weight: independent Normal-Gamma (iid) vs",
    "two-block Gibbs"
  )
)
```

The difference between the iid and Gibbs SD for the dispersion is largely due
to the difference between a truncated and an unrestricted Gamma prior.
However, the iid sampler has two clear advantages:

* **Independent draws**: no autocorrelation and no loss of effective sample
  size
* **No tuning or burn-in**: each draw is a direct sample from the posterior

The Gibbs sampler mixes well for this model class, but it inevitably shows
mild autocorrelation; convergence diagnostics are beyond the scope of this
vignette. For most users, the iid sampler is easier to work with and safer,
since no convergence diagnostics are needed.

For readers who want the full mathematical details of the iid sampler, see
**Chapter A07**, which documents the complete envelope construction and
accept-reject algorithm for the independent Normal–Gamma prior.

## 4. Gamma Regression Models With Unknown Dispersion

We now specify priors for \((\beta,\tau)\) and derive the corresponding
log-priors and log-posteriors, using the **carinsca** data [@BaileySimon1960;
@statsciCarinsca; @McCullagh1989].
Throughout:

- dispersion: \(\phi\)
- precision: \(\tau = 1/\phi\)
- mean: \(\mu_i = \exp(x_i^\top\beta)\)
- weighted log-likelihood: \(\ell(\beta,\tau)\) from Section 2.3

```{r Carinsca}
data(carinsca)

carinsca$Merit <- ordered(carinsca$Merit)
carinsca$Class <- factor(carinsca$Class)

oldopt <- options(contrasts = c("contr.treatment", "contr.treatment"))

Claims  <- carinsca$Claims
Insured <- carinsca$Insured
Merit   <- carinsca$Merit
Class   <- carinsca$Class
Cost    <- carinsca$Cost

Claims_Adj <- Claims / 1000

glm.carinsca <- glm(Cost / Claims ~ Merit + Class,
                    family  = Gamma(link = "log"),
                    weights = Claims_Adj, x = TRUE)
```

```{r Prior_Setup_gamma, results = "hide"}
ps <- Prior_Setup(
  Cost / Claims ~ Merit + Class,
  family  = Gamma(link = "log"),
  weights = Claims_Adj
)

mu <- ps$mu
V  <- ps$Sigma

shape <- ps$shape
rate  <- ps$rate

x <- ps$x
y <- ps$y

m <- nrow(x)
p <- ncol(x)

## Starting dispersion for beta | tau in the two-block Gibbs sampler: same
## quasi-likelihood estimate as used elsewhere for this Carinsca Gamma GLM
## (not the Dobson Gaussian ps).
dispersion2 <- gamma.dispersion(glm.carinsca)

options(oldopt)
```

---

### 4.1 Prior on the Dispersion Only (Gamma Prior on Precision)

We place a Gamma prior on the precision:

\[
\tau \sim \text{Gamma}(a_0,b_0),
\]

with log-prior

\[
\log p(\tau) = (a_0 - 1)\log\tau - b_0\tau + \text{const}.
\]

Given fixed \(\beta\), the conditional posterior is

\[
\log p(\tau \mid \beta,y)
= \ell(\beta,\tau) + (a_0 - 1)\log\tau - b_0\tau + \text{const}.
\]

This posterior is **log-concave in \(\tau\)**, enabling envelope-based
accept–reject sampling in the Gamma regression case.

```{r glm_call_gamma_prior2}
## Carinsca Gamma GLM (already fitted in the Carinsca chunk for
## gamma.dispersion etc.)
gamma.dispersion(glm.carinsca)

out2 <- rglmb(
  n = 1000, y = y, x = x,
  family  = Gamma(link = "log"),
  pfamily = dGamma(shape = shape, rate = rate, beta = ps$coefficients),
  weights = Claims_Adj
)

mean(out2$dispersion)
```

### 4.2 Independent Normal–Gamma Prior

Here the priors factorize:

\[
\beta \sim N(\mu_0,\Sigma_0),
\qquad
\tau \sim \text{Gamma}(a_0,b_0),
\]

so \(\beta\) and \(\tau\) are **a priori independent**.

---

#### **Log-prior**

\[
\log p(\beta,\tau)
= -\tfrac12(\beta-\mu_0)^\top\Sigma_0^{-1}(\beta-\mu_0)
+ (a_0 - 1)\log\tau - b_0\tau + \text{const}.
\]

---

#### **Log-posterior**

\[
\log p(\beta,\tau \mid y)
= \ell(\beta,\tau)
-\tfrac12(\beta-\mu_0)^\top\Sigma_0^{-1}(\beta-\mu_0)
+ (a_0 - 1)\log\tau - b_0\tau + \text{const}.
\]

In Gamma regression:

- the conditional posterior in \(\beta\) is **not conjugate**
- the conditional posterior in \(\tau\) is **not conjugate**
- the joint posterior remains **log-concave**

In theory, this enables two sampling strategies for the Gamma regression
model:

1. **Two-block Gibbs sampler**
   - \( \beta \mid \tau, y \): sampled using a Normal approximation (via
     `rglmb`)
   - \( \tau \mid \beta, y \): sampled using a Gamma-based quasi-likelihood
     update

   This is the **only fully implemented method** for Gamma regression in the
   current version of *glmbayes*.

2. **Accept–reject envelope sampler (Chapter A07)**

   Chapter A07 develops an accept–reject sampler using a **truncated Gamma
   prior** for \( \tau \) and a joint envelope over \((\beta,\tau)\).
   **However, this method is implemented only for the Gaussian family.**
   Extending the envelope-based sampler to the Gamma regression model would
   require additional mathematical development and corresponding
   modifications to the code.
The truncated-Gamma independent Normal–Gamma model in **Chapter A07**
therefore applies exclusively to the Gaussian case, while Gamma regression
currently relies on the two-block Gibbs sampler described above.

Implementation details: alternate `rglmb` draws for \(\beta \mid \tau, y\)
using `dNormal(..., dispersion = dispersion2)` with draws for
\(\tau \mid \beta, y\) using `dGamma(...)` on the Gamma regression likelihood,
updating `dispersion2` after each \(\tau\) step. A **1000-iteration burn-in**
is followed by **1000 stored iterations** (coefficient matrix `beta_out`,
dispersion vector `disp_out`, and accept–reject counts `iters_out`). The same
logic is written to **`inst/extdata/Chapter11_Carinsca_gamma_gibbs.rds`** by
**`data-raw/make_Chapter11_Carinsca_gamma_gibbs_output.R`**
(**`set.seed(190)`**).

The chunk **`Block_Gibbs_gamma_Regression`** below is **not evaluated** when
the vignette is built (`eval = FALSE`). Stored draws are loaded from
**`inst/extdata/Chapter11_Carinsca_gamma_gibbs.rds`** in
**`chapter11-load-carinsca-gamma`**, and summaries are produced in
**`Block_Gibbs_gamma_loaded`**.

```{r chapter11-load-carinsca-gamma}
ch11_cg_path <- system.file(
  "extdata", "Chapter11_Carinsca_gamma_gibbs.rds",
  package = "glmbayes"
)
stopifnot(nzchar(ch11_cg_path), file.exists(ch11_cg_path))
ch11_cg <- readRDS(ch11_cg_path)

stopifnot(
  ncol(ch11_cg$gibbs_gamma$beta_out) == ncol(ps$x),
  length(ch11_cg$gibbs_gamma$disp_out) == nrow(ch11_cg$gibbs_gamma$beta_out)
)
```

```{r Block_Gibbs_gamma_Regression, eval = FALSE}
set.seed(190)
dispersion2 <- gamma.dispersion(glm.carinsca)

suppressWarnings(
  suppressMessages(
    for (i in 1:1000) {
      ## --- Block 1: Regression coefficients ---
      out1 <- rglmb(
        n = 1, y = y, x = x,
        family  = Gamma(link = "log"),
        pfamily = dNormal(mu = mu, Sigma = V, dispersion = dispersion2),
        weights = Claims_Adj
      )

      ## --- Block 2: Dispersion (quasi-likelihood sampler) ---
      out2 <- rglmb(
        n = 1, y = y, x = x,
        family  = Gamma(link = "log"),
        pfamily = dGamma(shape = shape, rate = rate,
                         beta = out1$coefficients[1, ]),
        weights = Claims_Adj
      )

      ## --- Pass the sampled dispersion to the next beta update ---
      ## (an optional (m - p)/m rescaling from the quasi-likelihood to the
      ## MLE convention is left disabled here)
      dispersion2 <- out2$dispersion  ## * ((m - p) / m)
    }
  )
)

## Run 1000 additional iterations and store output
beta_out  <- matrix(0, nrow = 1000, ncol = ncol(x))
disp_out  <- rep(0, 1000)
iters_out <- rep(0, 1000)

suppressWarnings(
  suppressMessages(
    for (i in 1:1000) {
      out1 <- rglmb(
        n = 1, y = y, x = x,
        family  = Gamma(link = "log"),
        pfamily = dNormal(mu = mu, Sigma = V, dispersion = dispersion2),
        weights = Claims_Adj
      )
      out2 <- rglmb(
        n = 1, y = y, x = x,
        family  = Gamma(link = "log"),
        pfamily = dGamma(shape = shape, rate = rate,
                         beta = out1$coefficients[1, ]),
        weights = Claims_Adj
      )

      dispersion2 <- out2$dispersion  ## * ((m - p) / m)

      beta_out[i, ] <- out1$coefficients[1, seq_len(ncol(x))]
      disp_out[i]   <- out2$dispersion  ## * ((m - p) / m)
      iters_out[i]  <- out2$iters
    }
  )
)

## Review output
beta_mean  <- colMeans(beta_out)
beta_sd    <- apply(beta_out, 2, sd)
beta_tlike <- beta_mean / beta_sd  # analogous to GLM t-values

bayes_coef_table <- data.frame(
  Estimate  = beta_mean,
  Std.Error = beta_sd,
  t.like    = beta_tlike
)
bayes_coef_table

mean(disp_out)
mean(iters_out)
```

```{r Block_Gibbs_gamma_loaded}
cg <- ch11_cg$gibbs_gamma

beta_out  <- cg$beta_out
disp_out  <- cg$disp_out
iters_out <- cg$iters_out

beta_mean  <- colMeans(beta_out)
beta_sd    <- apply(beta_out, 2, sd)
beta_tlike <- beta_mean / beta_sd

bayes_coef_table <- data.frame(
  Estimate  = beta_mean,
  Std.Error = beta_sd,
  t.like    = beta_tlike
)
bayes_coef_table

mean(disp_out)
mean(iters_out)
```

### 4.3 Conditional Sampling for the Dispersion Parameter

Chapter A06 provides a detailed derivation of the conditional posterior for
the precision parameter \( v = 1/\phi \) in Gamma regression. The posterior
can be written in the form

\[
\ell(v) =
( \text{shape2} - 1 ) \log v
- \text{rate1} \, v
+ \text{testfunc}(v)
+ \text{const},
\]

where \( \text{testfunc}(v) \) is a concave remainder term capturing the
curvature contributed by the Gamma likelihood.

This decomposition enables a **tangent–envelope accept–reject sampler**, in
which:

1. A Gamma proposal distribution is constructed by replacing
   \( \text{testfunc}(v) \) with its tangent at an anchor point \( v^{*} \),
   giving the Gamma log-kernel

   \[
   ( \text{shape2} - 1 ) \log v - \text{rate2} \, v,
   \]

   where \( \text{rate2} = \text{rate1} - g^{*} \) and \( g^{*} \) is the
   derivative of \( \text{testfunc}(v) \) at \( v^{*} \).

2. A proposed value \( v_{\text{prop}} \) is accepted with probability

   \[
   \exp\!\left[ \text{testfunc}(v_{\text{prop}}) - \text{testfunc}(v^{*}) -
   g^{*}(v_{\text{prop}} - v^{*}) \right].
   \]

Because \( \text{testfunc}(v) \) is concave, the tangent line at \( v^{*} \)
always lies above it, ensuring a valid rejection sampler.
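To make the construction concrete, the sketch below implements the two steps
for a generic concave `testfunc`. The concrete `testfunc` used here,
\(v\log v - \log\Gamma(v)\), is a stand-in chosen for illustration; it is
**not** the internal glmbayes decomposition from Chapter A06. The chunk is a
minimal sketch and is not evaluated when the vignette is built.

```{r tangent-envelope-sketch, eval = FALSE}
## Illustrative tangent-envelope sampler for a density proportional to
##   v^(shape2 - 1) * exp(-rate1 * v + testfunc(v)),  testfunc concave.
testfunc <- function(v) v * log(v) - lgamma(v)   # concave on (0, Inf)
testgrad <- function(v) log(v) + 1 - digamma(v)  # its first derivative

sample_tangent_envelope <- function(n, shape2, rate1, v_star) {
  g_star <- testgrad(v_star)   # tangent slope at the anchor point
  rate2  <- rate1 - g_star     # proposal rate; must remain positive
  stopifnot(shape2 > 0, rate2 > 0)
  draws <- numeric(n)
  for (i in seq_len(n)) {
    repeat {
      v_prop  <- rgamma(1, shape = shape2, rate = rate2)
      ## log acceptance probability: testfunc minus its tangent at v_star;
      ## concavity guarantees this is <= 0, so the envelope is valid
      log_acc <- testfunc(v_prop) - testfunc(v_star) -
        g_star * (v_prop - v_star)
      if (log(runif(1)) <= log_acc) {
        draws[i] <- v_prop
        break
      }
    }
  }
  draws
}

## Example call with illustrative parameter values; anchoring v_star near the
## bulk of the target keeps the acceptance rate high
v_draws <- sample_tangent_envelope(1000, shape2 = 5, rate1 = 4, v_star = 1.5)
mean(v_draws)
```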
Substitute \(y-X\mu_{0}=e+X\delta\) into the quadratic form for
\(S_{\text{marg}}\); cross terms vanish because \(e^{\mathsf T}X=0\), and the
algebra reduces to \(\mathrm{RSS} + \delta^{\mathsf T}\bigl(G -
G(\Sigma_{0}^{-1}+G)^{-1}G\bigr)\delta\), which equals the display above.

For observation weights \( w_{i} \) (Section 2.2), define
\(G = X^{\mathsf T}WX\), \(\hat\beta = G^{-1} X^{\mathsf T}Wy\), and
\(\mathrm{RSS} = (y - X\hat\beta)^{\mathsf T}W(y - X\hat\beta)\). The
**RSS + \((\hat\beta-\mu_{0})^{\mathsf T}(\Sigma_{0}+G^{-1})^{-1}(\hat\beta-\mu_{0})\)**
identity is unchanged (same proof in whitened coordinates
\(\tilde y = W^{1/2}y\), \(\tilde X = W^{1/2}X\)). The **first** display
becomes
\((y-X\mu_{0})^{\mathsf T}W^{1/2}(I + W^{1/2}X\Sigma_{0}X^{\mathsf T}W^{1/2})^{-1}W^{1/2}(y-X\mu_{0})\)
for any factor \(W^{1/2}\) with \(W^{1/2}(W^{1/2})^{\mathsf T}=W\)
(e.g.\ diagonal \(\sqrt{w_i}\)).

This closed-form marginal is what makes the Normal–Gamma prior so
computationally attractive: sampling \( \tau \) requires **only a single
Gamma draw**.

### A1.2 Conditional Posterior for \( \beta \)

Given \( \tau \), the posterior for \( \beta \) is Gaussian:

\[
\beta \mid \tau,y \sim N(m_{\text{post}}, V_{\text{post}}),
\]

with

\[
V_{\text{post}} =
\frac{1}{\tau}\left(\Sigma_{0}^{-1} + X^{T}X\right)^{-1},
\qquad
m_{\text{post}} =
\left(\Sigma_{0}^{-1} + X^{T}X\right)^{-1}
\left(\Sigma_{0}^{-1}\mu_{0} + X^{T}y\right).
\]

Because the prior precision of \( \beta \) scales with \( \tau \), the
conditional mean does not depend on \( \tau \); only the conditional
covariance does.

### A1.3 Sampling the Posterior

Because the Normal--Gamma prior is *conjugate* to the Gaussian likelihood
[@OHaganForster2004], the **joint posterior factorizes into two standard
distributions**:

\[
p(\tau,\beta \mid y) = p(\tau \mid y)\, p(\beta \mid \tau, y).
\]

This structure means **no MCMC is required**. Posterior draws are obtained by
**independent Monte Carlo sampling**, not by a Gibbs chain.

---

#### Step 1: Draw the marginal posterior for \( \tau \)

The dispersion (precision) parameter has a closed-form marginal posterior:

\[
\tau \mid y \sim
\mathrm{Gamma}\!\left( a_{0} + \frac{n}{2},\;
b_{0} + \frac{1}{2} S(\hat\beta_{\text{GLS}}) \right),
\]

where

\[
S(\hat\beta_{\text{GLS}}) =
(y - X\hat\beta_{\text{GLS}})^{T}(y - X\hat\beta_{\text{GLS}})
+ (\hat\beta_{\text{GLS}} - \mu_{0})^{T}
\Sigma_{0}^{-1}(\hat\beta_{\text{GLS}} - \mu_{0})
\]

is the **augmented** residual sum of squares and
\( \hat\beta_{\text{GLS}} = m_{\text{post}} \) is the generalized
least-squares estimator under the prior precision. This scalar equals the
\( S_{\text{marg}} \) of Section A1.1. Each draw of \( \tau \) is **i.i.d.**
from this Gamma distribution.

---

#### Step 2: Draw \( \beta \) conditionally on each sampled \( \tau \)

Given a sampled value \( \tau^{(s)} \), the regression coefficients have a
Normal posterior:

\[
\beta \mid \tau^{(s)}, y \sim
N\!\left(m_{\text{post}},\, V_{\text{post}}^{(s)}\right),
\]

with

\[
V_{\text{post}}^{(s)} =
\frac{1}{\tau^{(s)}}\left(\Sigma_{0}^{-1} + X^{T}X\right)^{-1},
\qquad
m_{\text{post}} =
\left(\Sigma_{0}^{-1} + X^{T}X\right)^{-1}
\left(\Sigma_{0}^{-1}\mu_{0} + X^{T}y\right).
\]

Each draw of \( \beta \) is **conditionally independent** given its
corresponding \( \tau^{(s)} \).

---

### Result: Pure i.i.d. Monte Carlo Sampling

The conjugate posterior allows:

- **direct sampling** of \( \tau \) from its marginal Gamma distribution
- **direct sampling** of \( \beta \) from its conditional Normal distribution

No Markov chain is constructed, and **no Gibbs or Metropolis steps** are
involved. All posterior samples are **independent**, making the conjugate
Normal–Gamma model one of the simplest Bayesian regression frameworks for
simulation.
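The two steps translate directly into a few lines of R. The sketch below is a
minimal, self-contained illustration for the unweighted case: the helper name
`rconj_ng` and its interface are hypothetical, and the mapping of
`ps$Sigma_0`, `ps$shape`, and `ps$rate` to its arguments follows the Section
3.2 notation as an assumption; `lmb()`/`rlmb()` with `dNormal_Gamma()` provide
the production implementation. The chunk is not evaluated during the build.

```{r conjugate-ng-direct-sketch, eval = FALSE}
## Minimal base-R sketch of the two-step conjugate Normal-Gamma draw
## (unweighted case); illustrative only.
rconj_ng <- function(n_draws, y, X, mu0, Sigma0, a0, b0) {
  n     <- length(y)
  S0inv <- solve(Sigma0)
  P     <- S0inv + crossprod(X)            # tau-free posterior precision factor
  m     <- drop(solve(P, S0inv %*% mu0 + crossprod(X, y)))  # E(beta | y)

  ## Step 1: marginal Gamma draw for tau, using the augmented sum of
  ## squares, which equals S_marg from Section A1.1
  S   <- sum((y - drop(X %*% m))^2) +
    drop(t(m - mu0) %*% S0inv %*% (m - mu0))
  tau <- rgamma(n_draws, shape = a0 + n / 2, rate = b0 + S / 2)

  ## Step 2: Normal draw for beta given each tau; Cov(beta | tau) = P^{-1}/tau
  U <- chol(solve(P))                      # t(U) %*% U = P^{-1}
  z <- matrix(rnorm(n_draws * length(m)), n_draws)
  beta <- sweep(z %*% U, 1, sqrt(tau), "/") +
    matrix(m, n_draws, length(m), byrow = TRUE)

  list(coefficients = beta, dispersion = 1 / tau)
}

## Illustrative call using the Dobson objects from Section 3 (ps, y, x)
draws <- rconj_ng(1000, y = y, X = x, mu0 = ps$mu, Sigma0 = ps$Sigma_0,
                  a0 = ps$shape, b0 = ps$rate)
colMeans(draws$coefficients)
```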
## A2. Detailed Sampling Procedure for the Independent Normal-Gamma Prior

### A2.1 iid Sampling Under Truncated Gamma Priors

When iid sampling is requested, glmbayes uses a truncated Gamma prior for the
precision parameter \( \tau \). The truncation is introduced for numerical
stability and does not change the basic Normal-Gamma structure.

The iid sampler for this prior family is based on the envelope accept-reject
method described in:

**Chapter A07 — Accept-Reject Sampling for Gaussian Regression Models with
Independent Normal-Gamma Priors**

That vignette explains the full simulation procedure, including:

* how the dispersion parameter is truncated,
* how the envelope is constructed across dispersion values,
* how the mixture of truncated normal proposals for \( \beta \) is formed,
* how the Gamma proposal for \( \tau \) is calibrated.

The key point for users is that the posterior remains log-concave, so the
accept-reject sampler produces independent draws for \( (\beta, \tau) \).
There is no Markov chain, no burn-in, and no autocorrelation.

## A3. Posterior mean of \( \beta \) and marginal covariance (Zellner-type prior)

This subsection records closed-form **posterior moments** for \( \beta \) in
the **conjugate Normal-Gamma** Gaussian regression model, in the same weighted
likelihood notation as Section 2.2 and in the **prior weight / effective prior
sample size** notation used by `Prior_Setup()`.

### A3.1 Weighted likelihood and prior precision

Assume

\[
y \mid \beta,\phi \sim N\!\left(X\beta,\;\phi W^{-1}\right),
\qquad
\tau = 1/\phi,
\]

with \(W = \mathrm{diag}(w_{1},\dots,w_{n})\) and \(w_i \ge 0\). Define the
**effective sample size**

\[
n_{w} = \sum_{i=1}^{n} w_i
\]

(the quantity `sum(wt)` in `rNormalGammaReg()`). Write the **weighted Gram
matrix** as

\[
G = X^{\mathsf T} W X.
\]

Let \(\hat\beta\) denote the **weighted least squares** estimator

\[
\hat\beta = G^{-1} X^{\mathsf T} W y
\]

and the **weighted residual sum of squares**

\[
\mathrm{RSS} = (y - X\hat\beta)^{\mathsf T} W (y - X\hat\beta).
\]

Take a **Zellner-type** conditional Normal prior on \(\beta\) given \(\tau\)
whose precision is proportional to the data precision:

\[
\beta \mid \tau \sim
N\!\left(\mu_{0},\;
\tau^{-1}\left(\frac{pwt}{1-pwt}G\right)^{-1}\right),
\qquad 0 < pwt < 1,
\]

equivalently prior **precision** \(\tau\, (pwt/(1-pwt))\, G\). This is the
structure implied by `Prior_Setup()` together with
`dNormal_Gamma(..., Sigma_0, ...)` when
\(\Sigma_0 = ((1-pwt)/pwt)\, G^{-1}\) (Zellner \(g\)-prior relative to
\(G^{-1}\); see Section 3.2).

Define the **effective prior sample size** \(n_{\mathrm{prior}} > 0\) by

\[
pwt = \frac{n_{\mathrm{prior}}}{n_{\mathrm{prior}} + n_{w}},
\qquad
1 - pwt = \frac{n_{w}}{n_{\mathrm{prior}} + n_{w}}.
\]

This is the relationship implemented when `Prior_Setup()` is called with
scalar `n_prior` (`n_prior` and `n_effective` in the function).

### A3.2 Posterior mean of \( \beta \)

Combining likelihood precision \(\tau G\) with prior precision
\(\tau (pwt/(1-pwt)) G\) gives posterior precision **given \(\tau\)**:

\[
\tau G + \tau \frac{pwt}{1-pwt} G = \frac{\tau}{1-pwt}\, G.
\]

Hence

\[
\mathrm{Cov}(\beta \mid \tau, y) = \frac{1-pwt}{\tau}\, G^{-1}
= \frac{n_{w}}{n_{\mathrm{prior}} + n_{w}}\,\frac{1}{\tau}\, G^{-1}.
\] The **posterior mean** \(m_{\mathrm{post}} = E(\beta \mid \tau,y)\) solves the same normal equations as the **precision-weighted average** of \(\hat\beta\) and \(\mu_{0}\), and simplifies to \[ \boxed{ E(\beta \mid y) = (1-pwt)\,\hat\beta + pwt\,\mu_{0} = \frac{n_{w}}{n_{\mathrm{prior}} + n_{w}}\,\hat\beta + \frac{n_{\mathrm{prior}}}{n_{\mathrm{prior}} + n_{w}}\,\mu_{0}. } \] The right-hand side **does not depend on \(\tau\)** (for this prior), so \(E(\beta \mid y) = E(\beta \mid \tau,y)\). ### A3.3 Marginal covariance of \( \beta \) and the Gamma parameters Marginalize \(\tau\) with \[ \tau \mid y \sim \mathrm{Gamma}(a_{n}, b_{n}) \] in the **shape-rate** parameterization \(p(\tau) \propto \tau^{a_{n}-1} e^{-b_{n}\tau}\) used in `rNormalGammaReg()` (`R::rgamma(shape, 1/rate)`), so \(E(\tau^{-1} \mid y) = b_{n}/(a_{n}-1)\) for \(a_{n} > 1\). Let \[ V_{n} = (1-pwt)\, G^{-1} = \frac{n_{w}}{n_{\mathrm{prior}} + n_{w}}\, G^{-1}, \] so \(\mathrm{Cov}(\beta \mid \tau,y) = \tau^{-1} V_{n}\). Because \(E(\beta \mid \tau,y)\) is constant in \(\tau\), the **law of total covariance** gives \[ \boxed{ \mathrm{Cov}(\beta \mid y) = E(\tau^{-1} \mid y)\, V_{n} = \frac{b_{n}}{a_{n}-1}\, \frac{n_{w}}{n_{\mathrm{prior}} + n_{w}}\, G^{-1}, \qquad a_{n} > 1. } \] In the sampler, \[ a_{n} = a_{0} + \frac{n_{w}}{2}, \qquad b_{n} = b_{0} + \frac{1}{2} S_{\mathrm{marg}}, \] with \(S_{\mathrm{marg}}\) the marginal sum of squares from Section A1.1. In the notation of this section (\(G = X^{\mathsf T}WX\), weighted least squares \(\hat\beta = G^{-1} X^{\mathsf T}Wy\), weighted \(\mathrm{RSS} = (y - X\hat\beta)^{\mathsf T}W(y - X\hat\beta)\), and \(\Sigma_{0}\) as in Section 3.2), \[ S_{\mathrm{marg}} = \mathrm{RSS} + (\hat\beta - \mu_{0})^{\mathsf T} \left(\Sigma_{0} + G^{-1}\right)^{-1} (\hat\beta - \mu_{0}). \] This is the same scalar as the augmented residual sum of squares `S` produced by `rNormal_reg.wfit()` (Section A1). Under the Zellner \(g\)-prior of A3.1, the prior **covariance** is \(\Sigma_{0} = ((1-pwt)/pwt)\,G^{-1}\), so \(\Sigma_{0} + G^{-1} = G^{-1}/pwt\) and \((\Sigma_{0} + G^{-1})^{-1} = pwt\, G\). Equivalently, \[ S_{\mathrm{marg}} = \mathrm{RSS} + pwt\, (\hat\beta - \mu_{0})^{\mathsf T} G (\hat\beta - \mu_{0}). \] The **structural** point is that the prior--data mismatch enters only through the scalar factor **\(pwt\)** multiplying a nonnegative quadratic in \(\hat\beta-\mu_{0}\). Hence, holding the data and \(\mu_{0}\) fixed so that \((\hat\beta-\mu_{0})^{\mathsf T}G(\hat\beta-\mu_{0})\) stays finite, \[ pwt \to 0^{+} \quad\Longrightarrow\quad pwt\,(\hat\beta-\mu_{0})^{\mathsf T}G(\hat\beta-\mu_{0}) \to 0, \] and \(S_{\mathrm{marg}} \to \mathrm{RSS}\): weak prior weight **washes out** the gap even when \(\mu_{0} \neq \hat\beta\). Equivalently, \(pwt\to 0\) whenever \(n_{\mathrm{prior}}/(n_{\mathrm{prior}}+n_{w})\to 0\)---for example \(n_{\mathrm{prior}}\to 0\) with \(n_{w}\) held fixed, or \(n_{w}\to\infty\) with \(n_{\mathrm{prior}}\) held fixed. The **exact** cancellation \(S_{\mathrm{marg}} = \mathrm{RSS}\) also holds for any \(pwt\) when \(\mu_{0} = \hat\beta\) (for example `Prior_Setup(..., intercept_source = "full_model", effects_source = "full_model")`). For fixed \(pwt>0\) and \(\mu_{0} \neq \hat\beta\), the second term is strictly positive unless \(\hat\beta-\mu_{0}\) lies in the null space of \(G\). 
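The \(S_{\mathrm{marg}}\) identities above are easy to verify numerically.
The sketch below uses simulated data and base R linear algebra only (all
object names are local to the example) to check the marginal quadratic form,
the RSS-plus-mismatch form, and the Zellner reduction; it is not evaluated
when the vignette is built.

```{r smarg-identity-check, eval = FALSE}
## Numerical check of the three equivalent expressions for S_marg.
set.seed(1)
n <- 50; p <- 3
X    <- cbind(1, matrix(rnorm(n * (p - 1)), n))
y    <- rnorm(n)
mu0  <- rep(0.5, p)
G    <- crossprod(X)
bhat <- drop(solve(G, crossprod(X, y)))
RSS  <- sum((y - drop(X %*% bhat))^2)

pwt    <- 0.2                                  # prior weight
Sigma0 <- ((1 - pwt) / pwt) * solve(G)         # Zellner-type prior covariance

## Form 1: marginal quadratic form in y - X mu0
S1 <- drop(t(y - X %*% mu0) %*%
             solve(diag(n) + X %*% Sigma0 %*% t(X), y - X %*% mu0))
## Form 2: RSS plus the prior-data mismatch quadratic
S2 <- RSS + drop(t(bhat - mu0) %*% solve(Sigma0 + solve(G), bhat - mu0))
## Form 3: Zellner reduction, RSS + pwt * quadratic in G
S3 <- RSS + pwt * drop(t(bhat - mu0) %*% G %*% (bhat - mu0))

all.equal(S1, S2)
all.equal(S2, S3)
```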
Under standard regularity, with fixed \(n_{\mathrm{prior}}\) and \(n_{w}\to\infty\), one has \(pwt = O(1/n_{w})\) while \((\hat\beta-\mu_{0})^{\mathsf T}G(\hat\beta-\mu_{0})\) is typically \(O_{p}(n_{w})\), so the mismatch term is often \(O_{p}(1)\): the **additive** difference \(S_{\mathrm{marg}}-\mathrm{RSS}\) need not vanish at finite \(n_{w}\), but \(S_{\mathrm{marg}}/n_{w}\) and \(\mathrm{RSS}/n_{w}\) share the same leading order. **Limit \(pwt\to 0\) and coefficient covariance.** Under the Zellner decomposition above, **\(pwt\to 0^{+}\)** forces **\(S_{\mathrm{marg}}\to \mathrm{RSS}\)** (for fixed data and \(\mu_{0}\), with a finite mismatch quadratic), not only when \(\mu_{0}=\hat\beta\). Since \(b_{n} = b_{0} + \tfrac{1}{2} S_{\mathrm{marg}}\), the marginal Gamma rate then limits to **\(b_{0} + \tfrac{1}{2}\mathrm{RSS}\)** whenever \(b_{0}\) is held fixed along the path; \(E(\tau^{-1}\mid y)=b_{n}/(a_{n}-1)\) therefore ties to **RSS** through \(b_{n}\) in the same limit. At the same time, **\((1-pwt)\,G^{-1}\to G^{-1}\)** from A3.2--A3.3. Thus \(\mathrm{Cov}(\beta \mid \tau,y) = \tau^{-1}(1-pwt)\,G^{-1}\) tends to \(\tau^{-1}G^{-1}\) (likelihood-only WLS covariance given \(\tau\)), and \[ \mathrm{Cov}(\beta \mid y) = \frac{b_{n}}{a_{n}-1}\,(1-pwt)\,G^{-1}, \qquad a_{n}>1, \] has both **\(b_{n}\)** driven by **\(S_{\mathrm{marg}}\to \mathrm{RSS}\)** and the matrix prefactor \((1-pwt)\,G^{-1}\to G^{-1}\): weak \(pwt\) removes prior shrinkage of precision **and** drives the residual sum of squares in the \(\tau\) update back to the **data** \(\mathrm{RSS}\). The usual \(\sigma^{2}(X^{\mathsf T}WX)^{-1}\) limit is recovered when, in addition, \(a_{n}\) and \(b_{0}\) follow the weak-prior path discussed below (e.g.\ formal \(n_{\mathrm{prior}}\to 0\) with \(n_{w}\) fixed, holding \(a_{n}>1\)). If \(pwt\) is driven to zero by \(n_{w}\to\infty\) at fixed \(n_{\mathrm{prior}}\), then \((1-pwt)\to 1\) **and** \(G^{-1}\) typically scales as \(O(1/n_{w})\), so \(\mathrm{Cov}(\beta\mid y)\) shrinks with more weighted data even though the prior contributes negligibly to precision. With `Prior_Setup()` defaults for the Gamma on \(\tau\), \(a_{0} = n_{\mathrm{prior}}/2\) in the pre-calibration Step 9 rate scaling, so \(a_{n} = (n_{\mathrm{prior}} + n_{w})/2\); the prior rate is \(b_{0} = (n_{\mathrm{prior}}/2)\, d\) with `dispersion` \(d\) returned by `Prior_Setup()` (see Section 3.2 and `?Prior_Setup`). The **marginal** posterior of \(\beta\) is **multivariate \(t\)**; the matrix above is its **covariance** (not the scale matrix in every textbook parameterization of the \(t\) density). #### `Prior_Setup()` dispersion and the Gamma rates \(b_{0}\), \(b_{n}\) For Gaussian models with scalar `n_prior`, `Prior_Setup()` sets the prior Gamma **rate** to \(\mathrm{rate} = (n_{\mathrm{prior}}/2)\, d\), where \(d\) is the scalar `dispersion` returned by the function. In the notation above, \(b_{0} = \mathrm{rate}\). The **posterior** Gamma rate is \(b_{n} = b_{0} + \frac{1}{2} S_{\mathrm{marg}}\). Thus a larger returned \(d\) increases \(b_{0}\) and, holding \(S_{\mathrm{marg}}\) fixed, would increase \(b_{n}\). 
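In code, this calibration is a few lines of arithmetic. The values below are
purely illustrative (not `Prior_Setup()` output) and simply trace the stated
formulas \(a_{0} = n_{\mathrm{prior}}/2\), \(b_{0} = (n_{\mathrm{prior}}/2)\,d\),
\(a_{n} = a_{0} + n_{w}/2\), and \(b_{n} = b_{0} + S_{\mathrm{marg}}/2\); the
chunk is not evaluated during the build.

```{r gamma-rate-calibration-sketch, eval = FALSE}
## Illustrative arithmetic for the Gamma-rate calibration described above.
n_prior <- 4; n_w <- 20; d <- 1.3; S_marg <- 24.5  # illustrative values

a0  <- n_prior / 2          # prior shape
b0  <- (n_prior / 2) * d    # prior rate = (n_prior / 2) * dispersion
a_n <- a0 + n_w / 2         # posterior shape
b_n <- b0 + S_marg / 2      # posterior rate

b_n / (a_n - 1)             # E(tau^{-1} | y), valid since a_n > 1
```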
In practice, for **`dNormal_Gamma`** the matrix passed to the sampler is
**`Sigma_0`** (the dispersion-free prior covariance from `Prior_Setup()`),
while the returned **`dispersion`** calibrates the Gamma rate and the
coefficient-scale **`Sigma`**. Changing \(d\) rescales the Normal--Gamma
coupling of \(\beta\) given \(\tau\); the augmented least-squares step then
produces a **different** \(S_{\mathrm{marg}}\) and posterior mean
\(m_{\mathrm{post}}\) as well.

#### Special case: \(b_{0} = \frac{1}{2}(n_{\mathrm{prior}}/n_{w})\,S_{\mathrm{marg}}\)

This choice sets the prior Gamma rate to \(n_{\mathrm{prior}}/2\) times the
per-effective-observation scale \(S_{\mathrm{marg}}/n_{w}\). It is **not** the
`Prior_Setup()` default; it is a **hypothetical** reparameterization that sets
the prior Gamma rate proportional to the **same** marginal sum of squares that
enters the likelihood fragment, with weight \(n_{\mathrm{prior}}/n_{w}\) on
the prior piece relative to the data piece (see the discussion in the main
text). Because \(S_{\mathrm{marg}}\) depends on \(y\) (and on \(\mu_{0}\)
versus \(\hat\beta\)), such a \(b_{0}\) is **data-dependent** unless used only
in stylized derivations.

Assume \(a_{0} = n_{\mathrm{prior}}/2\) (nominal rate scaling). Then
\(a_{n} = (n_{\mathrm{prior}} + n_{w})/2\) and, provided \(a_{n} > 1\)
(equivalently \(n_{\mathrm{prior}} + n_{w} > 2\)),

\[
a_{n} - 1 = \frac{n_{\mathrm{prior}} + n_{w} - 2}{2}.
\]

Take

\[
b_{0} = \frac{1}{2}\,\frac{n_{\mathrm{prior}}}{n_{w}}\, S_{\mathrm{marg}},
\qquad
b_{n} = b_{0} + \frac{1}{2} S_{\mathrm{marg}}
= \frac{1}{2}\, S_{\mathrm{marg}}\left(1 +
\frac{n_{\mathrm{prior}}}{n_{w}}\right)
= \frac{S_{\mathrm{marg}}}{2}\,\frac{n_{w} + n_{\mathrm{prior}}}{n_{w}}.
\]

Then the posterior expectation of \(\tau^{-1} = \phi = \sigma^{2}\) is

\[
\frac{b_{n}}{a_{n}-1}
= \frac{S_{\mathrm{marg}}(n_{w} + n_{\mathrm{prior}})/n_{w}}
{(n_{\mathrm{prior}} + n_{w} - 2)}.
\]

Under the Zellner prior of A3.1,
\(V_{n} = (1-pwt)\, G^{-1} =
\dfrac{n_{w}}{n_{\mathrm{prior}} + n_{w}}\, G^{-1}\). One may write
\(\mathrm{Cov}(\beta \mid y)\) as the product of \(b_{n}/(a_{n}-1)\) and
\(V_{n}\) **without** combining the \((n_{\mathrm{prior}} + n_{w})\) terms:

\[
\boxed{
\mathrm{Cov}(\beta \mid y)
= \frac{b_{n}}{a_{n}-1}\, V_{n}
= \dfrac{S_{\mathrm{marg}}\,(n_{\mathrm{prior}} + n_{w})/n_{w}}
{n_{\mathrm{prior}} + n_{w} - 2}
\cdot
\dfrac{n_{w}}{n_{\mathrm{prior}} + n_{w}}\, G^{-1}.
}
\]

The two fractions cancel to
\(S_{\mathrm{marg}} / (n_{\mathrm{prior}} + n_{w} - 2)\) times \(G^{-1}\). So
with this \(b_{0}\), the marginal coefficient covariance matches that
heuristic pattern---analogous in structure to \(\mathrm{RSS}/(n-p)\) times
\((X^{\mathsf T}WX)^{-1}\) when \(S_{\mathrm{marg}} = \mathrm{RSS}\)
(e.g.\ \(\mu_{0} = \hat\beta\)) and one reads
\(n_{\mathrm{prior}} + n_{w} - 2\) as an effective residual degrees of freedom
for the **joint** Normal--Gamma fragment. The factor \(a_{n}-1\) in the
denominator of \(b_{n}/(a_{n}-1)\) is exactly half that count, and the factor
\((1-pwt) = n_{w}/(n_{\mathrm{prior}}+n_{w})\) from \(V_{n}\) supplies the
remaining alignment.
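The cancellation can be confirmed with a line of arithmetic. The sketch below
uses the same illustrative (hypothetical) values as before and is not
evaluated during the build.

```{r special-b0-cancellation-check, eval = FALSE}
## Check the cancellation in the boxed covariance factor for the special b0.
n_prior <- 4; n_w <- 20; S_marg <- 24.5       # illustrative values

a_n       <- (n_prior + n_w) / 2
b_0       <- 0.5 * (n_prior / n_w) * S_marg
b_n       <- b_0 + S_marg / 2
Vn_factor <- n_w / (n_prior + n_w)            # scalar part of V_n = (1 - pwt) G^{-1}

(b_n / (a_n - 1)) * Vn_factor                 # uncancelled product form
S_marg / (n_prior + n_w - 2)                  # cancelled form; identical
```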
Along the same path, \(S_{\mathrm{marg}} = \mathrm{RSS} + pwt\,(\hat\beta - \mu_{0})^{\mathsf T}G(\hat\beta-\mu_{0}) \to \mathrm{RSS}\) because \(pwt\to 0\) (Section A3.3): the prior--data mismatch vanishes **even when** \(\mu_{0} \neq \hat\beta\). Also \(b_{n} = b_{0} + \frac{1}{2}S_{\mathrm{marg}} \to \frac{1}{2}\mathrm{RSS}\), so \[ \frac{b_{n}}{a_{n}-1} \to \frac{\mathrm{RSS}}{n_{w}-2}, \qquad V_{n} = (1-pwt)\,G^{-1} \to G^{-1}. \] Hence \[ \mathrm{Cov}(\beta \mid y) \to \frac{\mathrm{RSS}}{n_{w}-2}\,(X^{\mathsf T}WX)^{-1}, \] the usual weighted least squares covariance under the Gaussian `glm.fit` dispersion convention of `Prior_Setup()` (see `?Prior_Setup`, Details). **Before** taking \(n_{\mathrm{prior}}\to 0\), a fixed \(n_{\mathrm{prior}}>0\) with \(\mu_{0} \neq \hat\beta\) typically has \(S_{\mathrm{marg}} > \mathrm{RSS}\) and hence a **larger** \(b_{n}\) and posterior spread than this classical limit suggests; only in the limit does \(S_{\mathrm{marg}}\) coincide with \(\mathrm{RSS}\) for every \(\mu_{0}\). At \(n_{\mathrm{prior}} = 0\) the Zellner \(g\)-prior weight and the Gamma shape \(a_{0}\) are degenerate, so this limit is best read as **asymptotic intuition** for weak prior weight, not as a literal prior specification. **Growth of \(n_{w}\) at fixed \(n_{\mathrm{prior}}\).** The posterior mean of \(\tau^{-1}=\sigma^{2}\) in the special case can be written with the uncancelled factors as \[ \frac{b_{n}}{a_{n}-1} = \frac{S_{\mathrm{marg}}\,(n_{\mathrm{prior}} + n_{w})/n_{w}} {n_{\mathrm{prior}} + n_{w} - 2} = \frac{S_{\mathrm{marg}}}{n_{w}} \cdot \frac{n_{\mathrm{prior}} + n_{w}}{n_{\mathrm{prior}} + n_{w} - 2}. \] As \(n_{w} \to \infty\) with \(n_{\mathrm{prior}}\) fixed, \((n_{\mathrm{prior}} + n_{w})/(n_{\mathrm{prior}} + n_{w} - 2) \to 1\) and \((n_{\mathrm{prior}} + n_{w})/n_{w} \to 1\), so asymptotically \((b_{n}/(a_{n}-1)) \sim S_{\mathrm{marg}}/n_{w}\): the leading behavior is **mean marginal residual energy per effective observation**. Moreover \(pwt = n_{\mathrm{prior}}/(n_{\mathrm{prior}}+n_{w}) \to 0\), so in the Zellner decomposition \(S_{\mathrm{marg}} = \mathrm{RSS} + pwt\,(\hat\beta-\mu_{0})^{\mathsf T}G(\hat\beta-\mu_{0})\) the mismatch term is multiplied by a coefficient tending to **zero**; under typical regularity \(pwt\,(\hat\beta-\mu_{0})^{\mathsf T}G(\hat\beta-\mu_{0}) = O_{p}(1)\) while \(\mathrm{RSS}\) grows with \(n_{w}\), so \(S_{\mathrm{marg}}/n_{w}\) and \(\mathrm{RSS}/n_{w}\) have the same leading order. The factor \((n_{\mathrm{prior}}+n_{w})/n_{w}\) is the only explicit **\(O(1/n_{w})\)** correction linking total sample size to effective data count before cancellation with \(V_{n}\). #### `dGamma` posterior with fixed \(\beta\): general form and special choices For a precision-only prior \[ \tau \sim \mathrm{Gamma}(a_0,b_0), \] and Gaussian weighted likelihood with \(\beta\) treated as fixed, define \[ \mathrm{RSS}_w(\beta)=\sum_i w_i(y_i-x_i^\top\beta)^2. \] Then \[ \tau\mid y,\beta \sim \mathrm{Gamma}\!\left(a_0+\frac{n_w}{2},\; b_0+\frac12\,\mathrm{RSS}_w(\beta)\right). \] Using the Chapter 11 calibration \[ a_0=\frac{n_{\mathrm{prior}}}{2}, \qquad b_0=\frac12\frac{n_{\mathrm{prior}}}{n_w}\,\mathrm{RSS}_w(\beta), \] we get \[ \tau\mid y,\beta \sim \mathrm{Gamma}\!\left( \frac{n_{\mathrm{prior}}+n_w}{2}, \frac12\left(1+\frac{n_{\mathrm{prior}}}{n_w}\right)\mathrm{RSS}_w(\beta) \right). 
\] Hence, for \(n_{\mathrm{prior}}+n_w>2\), \[ E(\tau^{-1}\mid y,\beta) = \frac{b_n}{a_n-1} = \frac{\left(\frac{n_{\mathrm{prior}}+n_w}{n_w}\right)\mathrm{RSS}_w(\beta)} {n_{\mathrm{prior}}+n_w-2}. \] This expression holds for any fixed \(\beta\). #### Choice \(\beta=\beta_\star\) (posterior mode from `dNormal_Gamma`) and weak-prior limit If \(\beta=\beta_\star\), where \(\beta_\star\) is the posterior mode from the `dNormal_Gamma` fit, the same formula applies with \(\mathrm{RSS}_w(\beta_\star)\). Using the weighted least-squares decomposition: \[ \mathrm{RSS}_w(\beta) = \mathrm{RSS}_w(\hat\beta) + (\beta-\hat\beta)^\top G(\beta-\hat\beta),\qquad G=X^\top W X, \] so \[ \mathrm{RSS}_w(\beta_\star) = \mathrm{RSS}_w(\hat\beta) + (\beta_\star-\hat\beta)^\top G(\beta_\star-\hat\beta). \] Under scalar Zellner weighting, \(\beta_\star=(1-pwt)\hat\beta+pwt\,\mu_0\), giving \[ \mathrm{RSS}_w(\beta_\star) = \mathrm{RSS}_w(\hat\beta) + pwt^2(\hat\beta-\mu_0)^\top G(\hat\beta-\mu_0). \] As \(n_{\mathrm{prior}}\to 0\) with \(n_w\) fixed, \(pwt\to 0\), so \[ \mathrm{RSS}_w(\beta_\star)\to \mathrm{RSS}_w(\hat\beta), \] and therefore \[ E(\tau^{-1}\mid y,\beta_\star) \to \frac{\mathrm{RSS}_w(\hat\beta)}{n_w-2}, \] the classical weighted residual-variance limit.
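A short numeric sketch makes this limit visible. The chunk below uses
simulated unweighted data (\(w_i = 1\), so \(n_w = n\)); all object names are
local to the example and the chunk is not evaluated when the vignette is
built. As `n_prior` shrinks, the posterior mean of \(\phi\) given
\(\beta_\star\) approaches the classical value \(\mathrm{RSS}_w(\hat\beta)/(n_w - 2)\).

```{r weak-prior-limit-check, eval = FALSE}
## Numeric sketch of the weak-prior limit with simulated data (w_i = 1).
set.seed(2)
n <- 40; p <- 2
X    <- cbind(1, rnorm(n))
y    <- drop(X %*% c(1, 0.5)) + rnorm(n)
mu0  <- c(0, 0)
G    <- crossprod(X)
bhat <- drop(solve(G, crossprod(X, y)))
RSS  <- sum((y - drop(X %*% bhat))^2)

post_mean_phi <- function(n_prior) {
  pwt   <- n_prior / (n_prior + n)
  bstar <- (1 - pwt) * bhat + pwt * mu0    # posterior mode under Zellner weighting
  RSSb  <- RSS + drop(t(bstar - bhat) %*% G %*% (bstar - bhat))
  a_n   <- (n_prior + n) / 2
  b_n   <- 0.5 * (1 + n_prior / n) * RSSb
  b_n / (a_n - 1)                          # E(phi | y, beta = bstar)
}

sapply(c(8, 2, 0.5, 0.01), post_mean_phi)  # approaches the classical value:
RSS / (n - 2)
```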