Chapter A12: Technical Derivations for Priors Returned by `Prior_Setup()`

Kjell Nygren

2026-04-30

library(glmbayes)


1. Introduction

This appendix provides a complete and self‑contained derivation of the prior objects returned by Prior_Setup() and of the Gaussian prior families used throughout glmbayes. Its purpose is to make explicit how the returned quantities—mu, Sigma, Sigma_0, dispersion, shape, rate, and related fields—arise from the weighted Gaussian likelihood, the Normal–Gamma algebra, and the Zellner‑type calibration used by the package.

Unlike Chapter 11, which focuses on modeling workflow and examples, this chapter focuses on the mathematical structure underlying the priors:

All formulas needed by the main vignettes are derived here from first principles. No results are imported from Chapter 11; instead, Chapter 11 now serves as a conceptual overview, while this appendix provides the full algebraic details.

The goal is to make the calibration used by Prior_Setup() transparent, reproducible, and extensible, so that users can confidently interpret or modify the priors supplied to dNormal(), dNormal_Gamma(), and dIndependent_Normal_Gamma().

Textbook treatments of conjugate Normal–Gamma linear models and related updating appear in Gelman et al. (2013) and Raiffa and Schlaifer (1961). The Zellner \(g\)-prior scaling used for coefficient covariances is due to Zellner (1986). Applied prior construction with Prior_Setup() appears in Nygren (2025).

1.1 Introductory Discussion

This appendix records precise formulas and derivations for the prior objects returned by Prior_Setup() and for the conjugate Normal–Gamma Gaussian model used by dNormal_Gamma(). The goal is to connect implementation quantities (mu, Sigma, Sigma_0, dispersion, shape, rate, and related settings) to the weighted likelihood notation and \(S_{\mathrm{marg}}\) machinery in Chapter 11 (especially Section 3.2 and Appendix A3), with steps spelled out rather than only stated.

This chapter is a companion to the main vignettes: it emphasizes theory, mapping to pfamily constructors, and how defaults encode prior strength.

Roadmap. Chapter 11 fixes notation for weighted Gaussian regression (\(n_w\), \(G = X^{\mathsf T} W X\), precision \(\tau = 1/\phi\), and the conjugate Normal–Gamma structure). Appendix A3 there gives closed-form posterior moments for \(\beta\) under the Zellner-type prior implied by scalar pwt. Chapter A02 documents how pfamily objects map to lower-level simulation functions. Here we tie those ideas to what Prior_Setup() actually returns and how to pass those fields into dNormal(), dNormal_Gamma(), and dIndependent_Normal_Gamma() without mixing coefficient-scale Sigma, dispersion-free Sigma_0, and optional fixed dispersion (see ?Prior_Setup, ?compute_gaussian_prior).

2. Default Priors for Coefficient Means and Covariance Matrices

This section concerns families such as binomial and Poisson where the usual exponential-family dispersion is \(\phi=1\) (Chapters 5, 7, and 8). Gaussian models and dNormal_Gamma are in Section 3.

Let \(n_w = \sum_i w_i\) for nonnegative observation weights \(w_i\) in the weighted likelihood (the same totals appear as PriorSettings$n_effective). These \(w_i\) are fixed by design and do not depend on \(\beta\).

2.1 How prior means are determined

The Prior_Setup() function provides three options for setting the prior mean vector mu. By default, mu corresponds to the NULL (intercept-only) model (intercept_source = "null_model", effects_source = "null_effects"). Alternatively, the user can set the intercept to its OLS estimate (intercept_source = "full_model"), the predictor effects to theirs (effects_source = "full_model"), or both. Finally, the user can supply a custom prior mean vector mu directly to Prior_Setup().

2.2 Data precision \(P(\beta)\)

Let \(\ell(\beta)\) be the weighted log-likelihood as in Chapters 7–8, with \(\eta_i = x_i^{\mathsf T}\beta\). Define the data precision matrix \[ P(\beta) := \nabla^2_\beta\bigl(-\ell(\beta)\bigr), \] the Hessian of the negative log-likelihood. With \(\ell(\beta)=\sum_i \ell_i(\eta_i)\), \[ P(\beta) = X^{\mathsf T} W(\beta)\, X, \qquad W_i(\beta) := -\frac{d^2 \ell_i}{d\eta_i^2}\Big|_{\eta_i=x_i^{\mathsf T}\beta} \ge 0 \] (log-concavity in \(\eta\); Chapter 5), and \(W(\beta)\) diagonal. The Hessian form of \(P(\beta)\) matches standard GLM theory (McCullagh and Nelder 1989).

Write \(W_i(\beta) = w_i\,\omega_i(\beta)\) with fixed \(w_i\) and mean-dependent \(\omega_i(\beta)\). Let \(W_{\mathrm{obs}}=\mathrm{diag}(w_i)\) and \(\Omega(\beta)=\mathrm{diag}(\omega_i(\beta))\). Then \(W(\beta) = W_{\mathrm{obs}}\,\Omega(\beta)\) (indexwise) and \[ P(\beta) = X^{\mathsf T} W_{\mathrm{obs}}\,\Omega(\beta)\, X. \]

Examples (Chapters 7–8):

  • Poisson, log link: \(P(\beta) = X^{\mathsf T}\,\mathrm{diag}\bigl(w_i\,\mu_i(\beta)\bigr)\, X\).
  • Binomial, logit: \(P(\beta) = X^{\mathsf T}\,\mathrm{diag}\bigl(w_i\,\mu_i(\beta)(1-\mu_i(\beta))\bigr)\, X\).
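The two example weight formulas can be checked numerically. The following standalone Python sketch (illustrative only; it does not call glmbayes, and the values of w, y, and eta are arbitrary) compares \(-d^2\ell_i/d\eta_i^2\), obtained by finite differences, against the claimed diagonal weights \(w_i\mu_i\) (Poisson, log link) and \(w_i\mu_i(1-\mu_i)\) (binomial, logit):

```python
import math

def second_derivative(f, x, h=1e-4):
    """Central finite-difference approximation to f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / h**2

w, y, eta = 2.0, 3.0, 0.4

# Poisson, log link: per-observation log-likelihood l(eta) = w*(y*eta - exp(eta)).
pois_ll = lambda e: w * (y * e - math.exp(e))
W_pois = w * math.exp(eta)                      # claimed weight w_i * mu_i
assert abs(-second_derivative(pois_ll, eta) - W_pois) < 1e-5

# Binomial, logit link: l(eta) = w*(y*eta - log(1 + exp(eta))), y in [0, 1].
y_b = 0.7
binom_ll = lambda e: w * (y_b * e - math.log1p(math.exp(e)))
mu = 1.0 / (1.0 + math.exp(-eta))
W_binom = w * mu * (1.0 - mu)                   # claimed weight w_i * mu_i * (1 - mu_i)
assert abs(-second_derivative(binom_ll, eta) - W_binom) < 1e-5
```

Both weights are nonnegative for any \(\eta\), consistent with the log-concavity noted above.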

2.3 Zellner-type prior using \(P(\beta^{\ast})\)

2.3.1 Precision mapping and default covariance scaling

For these families, Prior_Setup() sets dispersion, shape, rate, and Sigma_0 to NULL. Let \(V_0\) denote the sampling covariance matrix of the fitted coefficients \(\beta^{\ast}\) under the stated model (\(\phi=1\)). Then \[ V_0^{-1} = P(\beta^{\ast}). \]

Weighted Gaussian, fixed dispersion \(d\). (See Section 3 for prior outputs.) Then \(P(\beta)=\frac{1}{d} X^{\mathsf T} W_{\mathrm{obs}} X\) for all \(\beta\), and the same identification gives \(V_0^{-1}=\frac{1}{d} X^{\mathsf T} W_{\mathrm{obs}} X\) when \(V_0\) is the covariance matrix at dispersion \(d\). User-provided \(d\) in compute_gaussian_prior() sets returned dispersion to \(d\) and rescales Sigma so this scale is explicit in the returned list.

Prior covariance with scalar pwt: \(\Sigma = \frac{1-\mathrm{pwt}}{\mathrm{pwt}}\, V_0\), equivalently \(\Sigma^{-1} = \frac{\mathrm{pwt}}{1-\mathrm{pwt}}\, P(\beta^{\ast})\).

This Sigma is what Prior_Setup() returns by default on the coefficient scale. For Gaussian fits, the returned dispersion-free matrix is

\[ \Sigma_0 = \Sigma / d, \] so \[ \Sigma_0^{-1} = d\,\Sigma^{-1} = d\,\frac{\mathrm{pwt}}{1-\mathrm{pwt}}\,P(\beta^{\ast}). \] Using \(P(\beta^{\ast})=\frac{1}{d}X^{\mathsf T}WX\) in weighted Gaussian regression gives \[ \Sigma_0^{-1} = \frac{\mathrm{pwt}}{1-\mathrm{pwt}}\,X^{\mathsf T}WX, \] which is independent of \(d\); this is the default Sigma_0 returned by Prior_Setup().
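The claim that Sigma_0 is independent of the dispersion can be verified in the simplest setting. The sketch below (plain Python, scalar stand-ins for the matrices; not glmbayes code) builds \(V_0 = d\,(X^{\mathsf T}WX)^{-1}\), scales it to \(\Sigma\), divides by \(d\), and confirms that the result does not change with \(d\):

```python
g = 5.0          # scalar stand-in for X' W X (dispersion-free cross-product)
pwt = 0.2

def sigma0(d):
    """Sigma_0 = Sigma / d with Sigma = (1 - pwt)/pwt * V0 and V0 = d / g."""
    V0 = d / g                       # coefficient covariance at dispersion d
    Sigma = (1.0 - pwt) / pwt * V0   # coefficient-scale prior covariance
    return Sigma / d                 # dispersion-free quantity (scalar here)

# Sigma_0 does not depend on the dispersion d:
assert abs(sigma0(0.5) - sigma0(7.3)) < 1e-12
# and equals (1 - pwt)/pwt * g^{-1}, the stated default:
assert abs(sigma0(1.0) - (1.0 - pwt) / pwt / g) < 1e-12
```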

2.3.2 Posterior mean and variance under dNormal()

When default settings are used, the Gaussian posterior means reduce to simple weighted averages of the fitted coefficient vector \(\beta^\ast\) and prior mean \(\mu\).

dNormal() (Gaussian, coefficient-scale covariance Sigma). For Gaussian likelihood precision \(P(\beta^\ast)\) and prior precision \(\Sigma^{-1}\), \[ E(\beta\mid y) = \bigl(P(\beta^\ast)+\Sigma^{-1}\bigr)^{-1} \Bigl(P(\beta^\ast)\beta^\ast+\Sigma^{-1}\mu\Bigr). \] With the default scalar pwt, \[ \Sigma^{-1} = \frac{\mathrm{pwt}}{1-\mathrm{pwt}}\,P(\beta^\ast), \] so \[ \begin{aligned} E(\beta\mid y) &= \left(P(\beta^\ast)+\frac{\mathrm{pwt}}{1-\mathrm{pwt}}P(\beta^\ast) \right)^{-1} \left(P(\beta^\ast)\beta^\ast+\frac{\mathrm{pwt}}{1-\mathrm{pwt}}P(\beta^\ast)\mu\right) \\ &= \left(\frac{1-\mathrm{pwt}}{1-\mathrm{pwt}}P(\beta^\ast)+\frac{\mathrm{pwt}}{1-\mathrm{pwt}}P(\beta^\ast) \right)^{-1} \left(\frac{1-\mathrm{pwt}}{1-\mathrm{pwt}}P(\beta^\ast)\beta^\ast+\frac{\mathrm{pwt}}{1-\mathrm{pwt}}P(\beta^\ast)\mu\right) \\ &= \left(\frac{1}{1-\mathrm{pwt}}P(\beta^\ast)\right)^{-1} \left(\frac{1}{1-\mathrm{pwt}}P(\beta^\ast)\bigl((1-\mathrm{pwt})\beta^\ast+\mathrm{pwt}\mu\bigr)\right) \\ &= (1-\mathrm{pwt})\beta^\ast+\mathrm{pwt}\,\mu. \end{aligned} \] Thus the posterior mean is a convex combination of the likelihood estimate \(\beta^\ast\) and prior mean \(\mu\): larger pwt gives more pull toward \(\mu\). In the limit as \(\mathrm{pwt}\to 0\), it approaches \(\beta^\ast\). The underlying precision combination is the usual normal–normal Bayes linear model update (Lindley and Smith 1972).

The posterior covariance is \[ \mathrm{Var}(\beta\mid y) = \bigl(P(\beta^\ast)+\Sigma^{-1}\bigr)^{-1}. \] With the default scalar pwt, \[ \Sigma^{-1} = \frac{\mathrm{pwt}}{1-\mathrm{pwt}}\,P(\beta^\ast), \] so \[ \begin{aligned} \mathrm{Var}(\beta\mid y) &= \left(P(\beta^\ast)+\frac{\mathrm{pwt}}{1-\mathrm{pwt}}\,P(\beta^\ast)\right)^{-1} \\ &= \left(\frac{1-\mathrm{pwt}}{1-\mathrm{pwt}}\,P(\beta^\ast)+\frac{\mathrm{pwt}}{1-\mathrm{pwt}}\,P(\beta^\ast)\right)^{-1} \\ &= \left(\frac{1}{1-\mathrm{pwt}}\,P(\beta^\ast)\right)^{-1} \\ &= (1-\mathrm{pwt})\,P(\beta^\ast)^{-1}. \end{aligned} \]

Thus the posterior covariance is the likelihood-based covariance \(P(\beta^\ast)^{-1}\) shrunk by the factor \(1-\mathrm{pwt}\): larger pwt (stronger prior pull) gives tighter posterior uncertainty. In the limit as \(\mathrm{pwt}\to 0\), it approaches the likelihood-based covariance.
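Both collapses above can be confirmed numerically. The scalar Python sketch below (illustrative stand-ins, not glmbayes code) applies the generic normal–normal update with \(\Sigma^{-1} = \frac{\mathrm{pwt}}{1-\mathrm{pwt}}P(\beta^\ast)\) and checks the convex-combination mean and the shrunk covariance:

```python
P = 3.7            # scalar stand-in for the likelihood precision P(beta*)
beta_hat = 1.5     # likelihood estimate beta*
mu = -0.4          # prior mean
pwt = 0.3
Sigma_inv = pwt / (1.0 - pwt) * P    # default scalar-pwt prior precision

# Generic normal-normal precision-weighted update...
post_mean = (P * beta_hat + Sigma_inv * mu) / (P + Sigma_inv)
post_var = 1.0 / (P + Sigma_inv)

# ...collapses to the convex combination and the shrunk covariance:
assert abs(post_mean - ((1 - pwt) * beta_hat + pwt * mu)) < 1e-12
assert abs(post_var - (1 - pwt) / P) < 1e-12
```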

2.3.3 Marginal posterior mean under dNormal_Gamma()

dNormal_Gamma() (Gaussian conjugate Normal–Gamma, using Sigma_0). The marginal posterior mean is \[ E(\beta\mid y) = E_{\tau\mid y}\!\left[E(\beta\mid \tau,y)\right]. \] For fixed \(\tau\), \[ E(\beta\mid \tau,y) = \bigl(\tau X^{\mathsf T}W_{\mathrm{obs}}X+\tau\Sigma_0^{-1}\bigr)^{-1} \Bigl(\tau X^{\mathsf T}W_{\mathrm{obs}}X\,\beta^\ast+\tau\Sigma_0^{-1}\mu\Bigr). \] Under the default scalar pwt calibration for Sigma_0, \[ \Sigma_0^{-1} = \frac{\mathrm{pwt}}{1-\mathrm{pwt}}\,X^{\mathsf T}W_{\mathrm{obs}}X, \] so \[ \begin{aligned} E(\beta\mid \tau,y) &= \left(\tau X^{\mathsf T}W_{\mathrm{obs}}X+\tau\frac{\mathrm{pwt}}{1-\mathrm{pwt}}X^{\mathsf T}W_{\mathrm{obs}}X\right)^{-1} \left(\tau X^{\mathsf T}W_{\mathrm{obs}}X\,\beta^\ast+\tau\frac{\mathrm{pwt}}{1-\mathrm{pwt}}X^{\mathsf T}W_{\mathrm{obs}}X\,\mu\right) \\ &= \left(\frac{\tau}{1-\mathrm{pwt}}X^{\mathsf T}W_{\mathrm{obs}}X\right)^{-1} \left(\frac{\tau}{1-\mathrm{pwt}}X^{\mathsf T}W_{\mathrm{obs}}X\bigl((1-\mathrm{pwt})\beta^\ast+\mathrm{pwt}\mu\bigr)\right) \\ &= (1-\mathrm{pwt})\beta^\ast+\mathrm{pwt}\,\mu. \end{aligned} \] Because this expression is free of \(\tau\), averaging over \(\tau\mid y\) gives \[ E(\beta\mid y)=(1-\mathrm{pwt})\beta^\ast+\mathrm{pwt}\,\mu. \] Thus the marginal posterior mean has the same weighted-average interpretation: larger pwt gives more pull toward \(\mu\), and as \(\mathrm{pwt}\to 0\) it approaches \(\beta^\ast\). For general non-Gaussian GLMs these equalities are not exact in finite samples, because the likelihood is not exactly quadratic in \(\beta\); however, the same weighted-average form is often a good approximation when the likelihood is close to multivariate normal, as typically occurs in large samples.

2.4 Vector pwt and optional sd

Vector pwt: the scalar construction of Section 2.3.1 applies coordinatewise (a Hadamard-style scaling); correlations in \(V_0\) are preserved, while variances are scaled per coordinate.

sd: \(\mathrm{pwt}_j = (V_0)_{jj}/\bigl((V_0)_{jj}+\mathrm{sd}_j^2\bigr)\); vector pwt is not overwritten from scalar n_prior. Gaussian fits may require scalar n_prior in addition (Section 3).
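The sd mapping is self-consistent: plugging the induced \(\mathrm{pwt}_j\) back into the per-coordinate scaling \(\frac{1-\mathrm{pwt}_j}{\mathrm{pwt}_j}(V_0)_{jj}\) recovers \(\mathrm{sd}_j^2\) exactly. A minimal Python check (hypothetical numbers, not glmbayes code):

```python
V0_jj = 0.09    # hypothetical sampling variance of coefficient j
sd_j = 0.5      # user-requested prior standard deviation for coefficient j

# Mapping from sd to per-coordinate prior weight:
pwt_j = V0_jj / (V0_jj + sd_j**2)

# The implied per-coordinate prior variance recovers sd_j^2 exactly:
prior_var_jj = (1.0 - pwt_j) / pwt_j * V0_jj
assert abs(prior_var_jj - sd_j**2) < 1e-12
```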

3. Default Priors for Dispersion, Shape, and Rate Parameters

This section develops the Gaussian prior families used when the dispersion parameter is unknown. The goal is to show how Prior_Setup() constructs the Gamma prior on the residual precision \(\tau = 1/\phi\), how the Normal block interacts with the likelihood, and how the resulting posterior hyperparameters arise.

3.1 Posterior pieces: contribution from likelihood + Normal block

We begin with the conjugate Normal–Gamma specification \[ \beta \mid \tau \sim N\!\left(\mu,\; (\tau \Sigma_0)^{-1}\right), \qquad \tau \sim \Gamma(a_0, b_0), \] where \(\Sigma_0\) is the dispersion‑free prior covariance matrix.

For the weighted Gaussian likelihood, \[ y \mid \beta,\tau \sim N\!\left(X\beta,\; \tau^{-1} W_{\mathrm{obs}}^{-1}\right), \] the Normal block and likelihood combine through:

  • the coefficient precision update
    \[ X^{\mathsf T}W_{\mathrm{obs}}X \quad\text{and}\quad \Sigma_0^{-1}, \]
  • and the marginal quadratic term \[ S_{\mathrm{marg}} = \mathrm{RSS}_w + (\hat\beta - \mu)^{\mathsf T} \left( \Sigma_0 + (X^{\mathsf T}W_{\mathrm{obs}}X)^{-1} \right)^{-1} (\hat\beta - \mu), \] where \(\hat\beta\) is the weighted least‑squares estimator and \(\mathrm{RSS}_w\) is the weighted residual sum of squares.

Integrating out \(\beta\) in the Normal–Gamma algebra adds \[ \frac{n_w}{2} \] to the Gamma shape parameter (note: this parameterization does not add \(p/2\)). Thus the posterior hyperparameters are \[ a_n = a_0 + \frac{n_w}{2}, \qquad b_n = b_0 + \frac{1}{2} S_{\mathrm{marg}}, \] with \(n_w\) the effective sample size and \(p = \mathrm{ncol}(X)\).


3.2 Prior-strength parameterization from pwt

The scalar prior‑weight pwt is mapped to an effective prior sample size \[ n_{\mathrm{prior}} = \frac{\mathrm{pwt}}{1-\mathrm{pwt}}\, n_w, \qquad\text{equivalently}\qquad \mathrm{pwt} = \frac{n_{\mathrm{prior}}}{n_{\mathrm{prior}} + n_w}. \]

Interpretation:

  • pwt controls how strongly the prior mean \(\mu\) influences the posterior,
  • n_prior is the number of “pseudo‑observations” implied by the prior,
  • and as pwt → 0, the prior becomes negligible and the posterior becomes likelihood‑dominated.
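The two maps between pwt and \(n_{\mathrm{prior}}\) are exact inverses, which the following plain-Python sketch confirms (arbitrary \(n_w\); not glmbayes code):

```python
n_w = 120.0     # effective sample size sum(w_i)

def n_prior_from_pwt(pwt):
    return pwt / (1.0 - pwt) * n_w

def pwt_from_n_prior(n_prior):
    return n_prior / (n_prior + n_w)

# The two maps are inverse to each other:
for pwt in (0.01, 0.2, 0.5, 0.9):
    assert abs(pwt_from_n_prior(n_prior_from_pwt(pwt)) - pwt) < 1e-12

# pwt = 0.5 corresponds to as many pseudo-observations as real ones:
assert abs(n_prior_from_pwt(0.5) - n_w) < 1e-12
```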

The dispersion‑free covariance used in dNormal_Gamma() is \[ \Sigma_0 = \frac{1-\mathrm{pwt}}{\mathrm{pwt}} \left(X^{\mathsf T}W_{\mathrm{obs}}X\right)^{-1}, \] so that \[ \Sigma_0^{-1} = \frac{\mathrm{pwt}}{1-\mathrm{pwt}} \,X^{\mathsf T}W_{\mathrm{obs}}X. \]

Substituting this into the expression for \(S_{\mathrm{marg}}\) yields \[ \begin{aligned} S_{\mathrm{marg}} &= \mathrm{RSS}_w + (\hat\beta - \mu)^{\mathsf T} \left( \frac{1-\mathrm{pwt}}{\mathrm{pwt}} (X^{\mathsf T}W_{\mathrm{obs}}X)^{-1} + (X^{\mathsf T}W_{\mathrm{obs}}X)^{-1} \right)^{-1} (\hat\beta - \mu) \\ &= \mathrm{RSS}_w + \mathrm{pwt}\, (\hat\beta - \mu)^{\mathsf T} \left(X^{\mathsf T}W_{\mathrm{obs}}X\right) (\hat\beta - \mu). \end{aligned} \]

Thus under scalar pwt, the prior‑mean penalty in \(S_{\mathrm{marg}}\) is scaled directly by pwt. This is the key link between the Normal block and the Gamma update for \(\tau\).
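The simplification of \(S_{\mathrm{marg}}\) can be checked in the scalar case, where \(X^{\mathsf T}W_{\mathrm{obs}}X\) reduces to a positive number \(g\) and the combined covariance \(\bigl(\Sigma_0 + g^{-1}\bigr)^{-1}\) should collapse to \(\mathrm{pwt}\,g\). A plain-Python sketch (arbitrary illustrative values, not glmbayes code):

```python
g = 4.2          # scalar stand-in for X' W_obs X
pwt = 0.35
rss_w = 10.0     # weighted residual sum of squares
delta = 0.8      # scalar stand-in for beta_hat - mu

Sigma0 = (1.0 - pwt) / pwt / g       # dispersion-free prior covariance
# Marginal quadratic term with the combined covariance (Sigma0 + G^{-1})^{-1}:
S_marg = rss_w + delta * (1.0 / (Sigma0 + 1.0 / g)) * delta

# Under scalar pwt this collapses to RSS_w + pwt * delta' G delta:
assert abs(S_marg - (rss_w + pwt * delta * g * delta)) < 1e-12
```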

3.3 Gaussian prior-family calibration and parameter mapping

This section explains how the outputs of Prior_Setup() map into the Gaussian prior families and how a single calibration—based on pwt, \(n_{\mathrm{prior}}\), and the Zellner form of \(\Sigma_0\)—governs all of them.

We proceed in four parts:

  1. Default calibration of the Gamma prior on \(\tau\) from n_prior.
  2. Conjugate Normal–Gamma posterior (Theorem 1).
  3. Weak‑prior limit as \(n_{\mathrm{prior}}\to 0^{+}\) (Theorem 2).
  4. Independent Normal–Gamma analogue (Theorem 3).

A final subsection states the unified weak‑limit theorem.


3.3.1 Default calibration and posterior Gamma shape/rate

Let \(n_w=\sum_i w_i\) be the effective sample size (n_effective).
For scalar pwt, Prior_Setup() defines the effective prior sample size \[ n_{\mathrm{prior}} = \frac{\mathrm{pwt}}{1-\mathrm{pwt}}\,n_w, \qquad \mathrm{pwt} = \frac{n_{\mathrm{prior}}}{n_{\mathrm{prior}}+n_w}. \]

Under the shape–rate parameterization \(\Gamma(a_0,b_0)\) with density \(\propto \tau^{a_0-1}e^{-b_0\tau}\), the default prior on the residual precision \(\tau\) is \[ a_0 = \frac{n_{\mathrm{prior}}+k}{2},\qquad b_0 = \frac{1}{2}\,(n_{\mathrm{prior}}+k+p-2)\,\frac{S_{\mathrm{marg}}}{n_w-p}, \] where \(S_{\mathrm{marg}}\) is the marginal quadratic term from Section 3.1,
\(n_w>p\) ensures propriety of the likelihood contribution,
and the conditions \(k \ge 0\) and \(k+p \ge 2\) guarantee that the Gamma prior itself is proper for all \(n_{\mathrm{prior}}>0\).

The posterior hyperparameters and induced moments follow from this calibration and are summarized in Theorem 1.


3.3.2 Conjugate Normal–Gamma posterior (dNormal_Gamma())

Theorem 1 (Conjugate posterior under the default dNormal_Gamma() calibration)

Assume:

  1. \(X^{\mathsf T}W_{\mathrm{obs}}X\) is positive definite,

  2. \(n_w > p\),

  3. \(\mathrm{RSS}_w > 0\),

  4. \(k \ge 0\),

  5. \(k + p \ge 2\).

Let the prior be \[ \beta\mid\tau\sim N(\mu,\tau^{-1}\Sigma_0), \qquad \tau\sim\Gamma(a_0,b_0), \] with \[ \Sigma_0 = \frac{1-\mathrm{pwt}}{\mathrm{pwt}} \left(X^{\mathsf T}W_{\mathrm{obs}}X\right)^{-1} = \frac{n_w}{n_{\mathrm{prior}}} \left(X^{\mathsf T}W_{\mathrm{obs}}X\right)^{-1}. \]

Then the posterior is again Normal–Gamma with the following hyperparameters.


(i) Posterior mean of \(\beta\)

\[ \mu_{\mathrm{post}} = \mathrm{pwt}\,\mu+(1-\mathrm{pwt})\,\hat\beta = \frac{n_{\mathrm{prior}}}{n_{\mathrm{prior}}+n_w}\,\mu + \frac{n_w}{n_{\mathrm{prior}}+n_w}\,\hat\beta. \]


(ii) Posterior dispersion‑free covariance

\[ \Sigma_{0,\mathrm{post}} = (\Sigma_0^{-1}+X^{\mathsf T}W_{\mathrm{obs}}X)^{-1} = \frac{n_{\mathrm{prior}}}{n_{\mathrm{prior}}+n_w}\,\Sigma_0 = \frac{n_w}{n_{\mathrm{prior}}+n_w} \left(X^{\mathsf T}W_{\mathrm{obs}}X\right)^{-1}. \]

For general \(\Sigma_0\), use
\(\Sigma_{0,\mathrm{post}}=(\Sigma_0^{-1}+X^{\mathsf T}W_{\mathrm{obs}}X)^{-1}\).


(iii) Posterior Gamma shape

\[ a_n = a_0 + \frac{n_w}{2} = \frac{n_{\mathrm{prior}} + k + n_w}{2}. \]


(iv) Posterior Gamma rate

\[ b_n = b_0 + \frac{1}{2}S_{\mathrm{marg}} = \frac{1}{2}\,\frac{S_{\mathrm{marg}}}{n_w-p}\,(n_{\mathrm{prior}} + k + n_w - 2). \]


(v) Marginal posterior mean of \(\beta\)

\[ \mathbb{E}[\beta\mid y]=\mu_{\mathrm{post}}. \]


(vi) Posterior expectation of \(\sigma^2=1/\tau\)

For \(a_n>1\), \[ \mathbb{E}[\sigma^2\mid y] = \frac{b_n}{a_n-1} = \frac{S_{\mathrm{marg}}}{n_w-p}. \]


(vii) Marginal posterior covariance of \(\beta\)

Let
\[ V_n=\Sigma_{0,\mathrm{post}} = (\Sigma_0^{-1}+X^{\mathsf T}W_{\mathrm{obs}}X)^{-1}. \]

Then \[ \mathrm{Cov}(\beta\mid y) = \mathbb{E}[\sigma^2\mid y]\,V_n = \frac{S_{\mathrm{marg}}}{n_w-p}\, \frac{n_w}{n_w+n_{\mathrm{prior}}}\, \left(X^{\mathsf T}W_{\mathrm{obs}}X\right)^{-1}. \]

Proof. See Appendix B.


Interpretation

  • pwt controls the pull toward the prior mean in (i).
  • pwt also controls the shrinkage of the covariance in (vii) via
    \(n_w/(n_w+n_{\mathrm{prior}})\).
  • The denominator \(n_w-p\) reflects the residual degrees of freedom in the weighted Gaussian model.

Together, these determine how prior strength interacts with sample size and model dimension.
Theorem 1 restates standard conjugate Normal–Gamma posterior formulas under this calibration
(Gelman et al. 2013; Raiffa and Schlaifer 1961).
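The closed forms in Theorem 1 (iv) and (vi) follow from the calibration of Section 3.3.1 by elementary algebra, which the plain-Python sketch below verifies for one arbitrary choice of hyperparameters (illustrative values only, not glmbayes code):

```python
n_prior, k, n_w, p = 6.0, 1.0, 80.0, 3.0
S_marg = 42.0

# Default prior calibration (Section 3.3.1):
a0 = (n_prior + k) / 2.0
b0 = (n_prior + k + p - 2.0) / 2.0 * S_marg / (n_w - p)

# Conjugate update (Section 3.1):
a_n = a0 + n_w / 2.0
b_n = b0 + S_marg / 2.0

# Closed form for the posterior rate (Theorem 1 (iv)):
assert abs(b_n - (n_prior + k + n_w - 2.0) / 2.0 * S_marg / (n_w - p)) < 1e-9
# Posterior mean of sigma^2 (Theorem 1 (vi)):
assert abs(b_n / (a_n - 1.0) - S_marg / (n_w - p)) < 1e-9
```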

Theorem 2 (Weak‑prior limit of the dNormal_Gamma() posterior)

Assume the same identifiability conditions as in Theorem 1:

  1. \(X^{\mathsf T}W_{\mathrm{obs}}X\) is positive definite,

  2. \(n_w > p\),

  3. \(\mathrm{RSS}_w > 0\),

  4. \(k \ge 0\),

  5. \(k + p \ge 2\).

Under the default calibration of Theorem 1, let
\[ n_{\mathrm{prior}} \to 0^{+} \qquad\text{equivalently}\qquad \mathrm{pwt} \to 0^{+}, \quad n_{\mathrm{prior}} = \frac{\mathrm{pwt}}{1-\mathrm{pwt}}\,n_w. \]

Then \(S_{\mathrm{marg}} \to \mathrm{RSS}_w\), and the conjugate dNormal_Gamma() posterior converges weakly to a Normal–Gamma law \(\Pi_{0}(\cdot\mid y)\) on \((\beta,\tau)\).
The limiting hyperparameters are the limits of the posterior quantities in Theorem 1 as \(n_{\mathrm{prior}}\to 0^{+}\).
(These are not the prior hyperparameters \(a_0,b_0\).)


(i) Limiting posterior mean of \(\beta\)

\[ \mu_{\Pi_{0}} = \lim_{n_{\mathrm{prior}}\to 0^{+}} \mu_{\mathrm{post}} = \hat\beta. \]


(ii) Limiting dispersion‑free covariance

\[ \Sigma_{0,\Pi_{0}} = \lim_{n_{\mathrm{prior}}\to 0^{+}} \Sigma_{0,\mathrm{post}} = \left(X^{\mathsf T}W_{\mathrm{obs}}X\right)^{-1}. \]


(iii) Limiting Gamma shape

\[ a_{\Pi_{0}} = \lim_{n_{\mathrm{prior}}\to 0^+} a_n = \frac{k + n_w}{2}. \]


(iv) Limiting Gamma rate

\[ b_{\Pi_{0}} = \lim_{n_{\mathrm{prior}}\to 0^+} b_n = \frac{1}{2}\frac{\mathrm{RSS}_w}{n_w-p}\,(k + n_w - 2). \]


(v) Limiting marginal mean of \(\beta\)

\[ \mathbb{E}_{\Pi_{0}}[\beta\mid y] = \mu_{\Pi_{0}} = \hat\beta. \]


(vi) Limiting expectation of \(\sigma^2 = 1/\tau\)

For \(\tau\mid y \sim \Gamma(a_{\Pi_{0}},b_{\Pi_{0}})\), \[ \mathbb{E}_{\Pi_{0}}[\sigma^2\mid y] = \frac{b_{\Pi_{0}}}{a_{\Pi_{0}}-1} = \frac{\mathrm{RSS}_w}{n_w-p}, \] the classical weighted residual‑variance estimator.


(vii) Limiting marginal covariance of \(\beta\)

\[ \mathrm{Cov}_{\Pi_{0}}(\beta\mid y) = \mathbb{E}_{\Pi_{0}}[\sigma^2\mid y]\, \Sigma_{0,\Pi_{0}} = \frac{\mathrm{RSS}_w}{n_w-p}\, \left(X^{\mathsf T}W_{\mathrm{obs}}X\right)^{-1}, \] matching the usual weighted least‑squares covariance.


Interpretation

The limit \(\Pi_{0}\) is the weak‑prior Normal–Gamma law obtained when the prior contributes no pseudo‑information.
It has:

  • location \(\hat\beta\),
  • geometry \((X^{\mathsf T}W_{\mathrm{obs}}X)^{-1}\),
  • and Gamma precision determined entirely by the data.

The independent Normal–Gamma posterior (Theorem 3) converges to this same \(\Pi_{0}\) under the same assumptions; only the finite‑\(n_{\mathrm{prior}}\) joint density differs.


Proof of Theorem 2.

By Theorem 1, for each \(n_{\mathrm{prior}}>0\) the dNormal_Gamma posterior is Normal–Gamma with hyperparameters \[ \mu_{\mathrm{post}}(n_{\mathrm{prior}}),\quad \Sigma_{0,\mathrm{post}}(n_{\mathrm{prior}}),\quad a_n(n_{\mathrm{prior}}),\quad b_n(n_{\mathrm{prior}}), \] given explicitly by \[ \mu_{\mathrm{post}}(n_{\mathrm{prior}}) = \frac{n_{\mathrm{prior}}}{n_{\mathrm{prior}}+n_w}\,\mu + \frac{n_w}{n_{\mathrm{prior}}+n_w}\,\hat\beta, \] \[ \Sigma_{0,\mathrm{post}}(n_{\mathrm{prior}}) = \frac{n_w}{n_{\mathrm{prior}}+n_w}\,G^{-1}, \] \[ a_n(n_{\mathrm{prior}}) = \frac{n_{\mathrm{prior}}+k+n_w}{2},\qquad b_n(n_{\mathrm{prior}}) = \frac{n_{\mathrm{prior}}+k+n_w-2}{2}\, \frac{S_{\mathrm{marg}}}{n_w-p}. \]

As \(n_{\mathrm{prior}}\to 0^+\), each of these converges to a finite, valid Normal–Gamma parameter. Using assumption 4 (\(k\ge 0\)) together with assumption 2 (\(n_w>p\)), \[ a_n(n_{\mathrm{prior}})\to\frac{k+n_w}{2}>0. \] Using assumption 5 (\(k+p\ge 2\)) and again 2 (\(n_w>p\)), which together imply \(k+n_w>2\), \[ b_n(n_{\mathrm{prior}})\to \frac{k+n_w-2}{2}\, \frac{\mathrm{RSS}_w}{n_w-p}>0, \] with \(S_{\mathrm{marg}}\to\mathrm{RSS}_w\) as \(n_{\mathrm{prior}}\to 0^+\). Thus both limiting Gamma parameters are strictly positive, ensuring that the limiting Normal–Gamma law is proper.

The Normal–Gamma family is closed under weak limits when its parameters converge in this way, so the posteriors \(\Pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}\) converge weakly to the Normal–Gamma law \(\Pi_0\) with these limiting hyperparameters. The stated formulas for the limiting mean, covariance, and variance of \(\beta\) and \(\sigma^2=1/\tau\) follow by plugging the limits into the standard Normal–Gamma moment expressions. \(\square\)
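The convergence of the Gamma hyperparameters along the weak-prior path is linear in \(n_{\mathrm{prior}}\), which the plain-Python sketch below demonstrates for arbitrary illustrative values (taking \(S_{\mathrm{marg}}\approx\mathrm{RSS}_w\) near the limit; not glmbayes code):

```python
k, n_w, p = 1.0, 50.0, 4.0
rss_w = 30.0

def hyper(n_prior, s_marg):
    """Posterior Gamma hyperparameters from Theorem 1 (iii)-(iv)."""
    a_n = (n_prior + k + n_w) / 2.0
    b_n = (n_prior + k + n_w - 2.0) / 2.0 * s_marg / (n_w - p)
    return a_n, b_n

# Limits stated in Theorem 2 (iii)-(iv):
a_lim = (k + n_w) / 2.0
b_lim = (k + n_w - 2.0) / 2.0 * rss_w / (n_w - p)

for n_prior in (1.0, 0.1, 0.001):
    a_n, b_n = hyper(n_prior, rss_w)   # S_marg ~ RSS_w near the limit
    assert abs(a_n - a_lim) <= n_prior                         # gap = n_prior / 2
    assert abs(b_n - b_lim) <= n_prior * rss_w / (n_w - p)     # gap shrinks linearly

assert a_lim > 0 and b_lim > 0  # propriety of the limiting Gamma law
```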

3.3.3 Posterior covariance under dNormal() with default dispersion

The dNormal() prior fixes the residual variance \(\sigma^2\) at a calibrated value rather than integrating over \(\tau\) as in the Normal–Gamma model.
This section shows how the default dispersion is chosen and how the resulting posterior covariance matches the weak‑prior limit of dNormal_Gamma().


Covariance under fixed \(\sigma^2\)

From Section 2.3.2, under scalar pwt, \[ \mathrm{Var}(\beta\mid y,\sigma^2) = (1-\mathrm{pwt})\,P(\beta^\ast)^{-1}. \]

For weighted Gaussian regression, \[ P(\beta^\ast) = \sigma^{-2}X^{\mathsf T}W_{\mathrm{obs}}X, \] so \[ \mathrm{Var}(\beta\mid y,\sigma^2) = (1-\mathrm{pwt})\,\sigma^2 \left(X^{\mathsf T}W_{\mathrm{obs}}X\right)^{-1}. \]

Using
Using
\[ n_{\mathrm{prior}} = \frac{\mathrm{pwt}}{1-\mathrm{pwt}}\,n_w, \qquad 1-\mathrm{pwt} = \frac{n_w}{n_w+n_{\mathrm{prior}}}, \] this becomes \[ \mathrm{Var}(\beta\mid y,\sigma^2) = \frac{n_w}{n_w+n_{\mathrm{prior}}}\, \sigma^2 \left(X^{\mathsf T}W_{\mathrm{obs}}X\right)^{-1}. \]


Default dispersion

To choose a default fixed value of \(\sigma^2\), Prior_Setup() uses the posterior mean from the Normal–Gamma model (Theorem 1 (vi)): \[ \mathrm{dispersion}_{\mathrm{default}} = \frac{S_{\mathrm{marg}}}{n_w-p}. \]

This matches the classical residual degrees‑of‑freedom adjustment.

Substituting this into the covariance expression gives \[ \mathrm{Var}(\beta\mid y,\mathrm{dispersion}_{\mathrm{default}}) = \frac{n_w}{n_w+n_{\mathrm{prior}}}\, \frac{S_{\mathrm{marg}}}{n_w-p}\, \left(X^{\mathsf T}W_{\mathrm{obs}}X\right)^{-1}. \]


Calibrated prior covariance returned by Prior_Setup()

With the same default dispersion, Prior_Setup() returns the coefficient‑scale prior covariance \[ \Sigma_{\mathrm{calibrated}} = \frac{n_w}{n_{\mathrm{prior}}}\, \mathrm{dispersion}_{\mathrm{default}}\, \left(X^{\mathsf T}W_{\mathrm{obs}}X\right)^{-1}, \] which is the matrix used by dNormal().

This matches the Normal–Gamma expression in Section 3.3.2, ensuring that the fixed‑dispersion and conjugate models share the same calibration.


Weak‑prior limit

As \(\mathrm{pwt}\to 0\) (equivalently \(n_{\mathrm{prior}}\to 0^{+}\)), \[ \mathrm{Var}(\beta\mid y) \;\longrightarrow\; \frac{\mathrm{RSS}_w}{n_w-p}\, \left(X^{\mathsf T}W_{\mathrm{obs}}X\right)^{-1}, \] the classical weighted least‑squares covariance.

Thus dNormal() with default dispersion has the same weak‑prior limit as dNormal_Gamma(), and the returned shape, rate, dispersion, and coefficient‑scale covariance remain internally consistent under the package calibration.

3.3.4 Independent Normal–Gamma Prior

The independent Normal–Gamma (ING) prior replaces the conjugate covariance structure \(\tau^{-1}\Sigma_0\) with a fixed coefficient-scale covariance \(\Sigma\), while using a Gamma prior on \(\tau\) whose shape parameter differs from the conjugate Normal–Gamma case by \(p/2\). Griffin and Brown (2010) develop inference with Normal–Gamma priors in regression when independence replaces full conjugacy.

The default call is

dIndependent_Normal_Gamma(ps$mu, Sigma = ps$Sigma, shape = ps$shape_ING, rate = ps$rate)

Let \(p = \mathrm{ncol}(X)\), and let \(a_0, b_0, S_{\mathrm{marg}}\) be as in Sections 3.3.1–3.3.2.

  • Prior mean: \[ \mu = \texttt{ps\$mu}. \]

  • Coefficient-scale covariance: \[ \Sigma = \frac{n_w}{n_{\mathrm{prior}}}\, \frac{S_{\mathrm{marg}}}{n_w - p}\, (X^{\mathsf T}W_{\mathrm{obs}}X)^{-1}. \]

  • ING Gamma shape: \[ \mathrm{shape}_{\mathrm{ING}} = a_0 + \frac{p}{2} = \frac{n_{\mathrm{prior}} + k + p}{2}. \]

  • Gamma rate: \[ \texttt{rate} = b_0 = \frac{1}{2}(n_{\mathrm{prior}} + k + p - 2)\, \frac{S_{\mathrm{marg}}}{n_w - p}. \]
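The ING hyperparameters relate to the conjugate calibration of Section 3.3.1 by two identities: the shape exceeds \(a_0\) by exactly \(p/2\), and the rate equals \(b_0\). A plain-Python check with arbitrary illustrative values (not glmbayes code):

```python
n_prior, k, p, n_w = 8.0, 1.0, 3.0, 100.0
S_marg = 55.0

# Conjugate calibration (Section 3.3.1):
a0 = (n_prior + k) / 2.0
b0 = (n_prior + k + p - 2.0) / 2.0 * S_marg / (n_w - p)

# ING calibration as stated above:
shape_ING = (n_prior + k + p) / 2.0
rate_ING = (n_prior + k + p - 2.0) / 2.0 * S_marg / (n_w - p)

# The ING shape exceeds the conjugate shape by exactly p/2; the rate is b0:
assert abs(shape_ING - (a0 + p / 2.0)) < 1e-12
assert abs(rate_ING - b0) < 1e-12
```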


Theorem 3 (Weak-prior limit of the Independent Normal–Gamma posterior)

Assume:

  1. \(X^{\mathsf T}W_{\mathrm{obs}}X\) is positive definite,

  2. \(n_w > p\),

  3. \(\mathrm{RSS}_w > 0\),

  4. \(k \ge 0\),

  5. \(k + p \ge 2\).

For each \(n_{\mathrm{prior}} > 0\), let
\[ \Pi^{(\mathrm{ING})}_{n_{\mathrm{prior}}}(\cdot \mid y) \] denote the posterior under the ING prior above.
Let \(\Pi_0(\cdot \mid y)\) be the Normal–Gamma law from Theorem 2 with hyperparameters

\[ \mu_{\Pi_0} = \hat\beta, \qquad \Sigma_{0,\Pi_0} = (X^{\mathsf T}W_{\mathrm{obs}}X)^{-1}, \] \[ a_{\Pi_0} = \frac{k + n_w}{2}, \qquad b_{\Pi_0} = \frac{1}{2}\frac{k + n_w - 2}{n_w - p}\,\mathrm{RSS}_w. \]

Then, as \(n_{\mathrm{prior}} \to 0^{+}\) (equivalently \(\mathrm{pwt} \to 0^{+}\)),

\[ \Pi^{(\mathrm{ING})}_{n_{\mathrm{prior}}}(\cdot \mid y) \;\Rightarrow\; \Pi_0(\cdot \mid y) \]

in distribution on \(\mathbb{R}^p \times (0,\infty)\).
Moreover, the posterior moments converge:

  1. Coefficient mean: \[ \mathbb{E}_{\Pi^{(\mathrm{ING})}_{n_{\mathrm{prior}}}}[\beta \mid y] \longrightarrow \hat\beta. \]

  2. Residual variance: \[ \mathbb{E}_{\Pi^{(\mathrm{ING})}_{n_{\mathrm{prior}}}}[\sigma^2 \mid y] \longrightarrow \frac{\mathrm{RSS}_w}{n_w - p}. \]

  3. Coefficient covariance: \[ \mathrm{Cov}_{\Pi^{(\mathrm{ING})}_{n_{\mathrm{prior}}}}(\beta \mid y) \longrightarrow \frac{\mathrm{RSS}_w}{n_w - p}\, (X^{\mathsf T}W_{\mathrm{obs}}X)^{-1}. \]

Thus the ING posterior has the same weak-prior limit as the conjugate Normal–Gamma posterior, even though its finite-\(n_{\mathrm{prior}}\) form is not conjugate and its Gamma shape parameter differs by \(p/2\).

Proof of Theorem 3.

Fix \(y,X,W_{\mathrm{obs}}\) satisfying Assumptions 1–3.
For each \(n_{\mathrm{prior}}>0\), let \(\Pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}\) and \(\Pi^{(\mathrm{ING})}_{n_{\mathrm{prior}}}\) denote, respectively, the NG and ING posteriors on \((\beta,\tau)\).

By Theorem 2, the NG posteriors converge weakly to the limiting Normal–Gamma law \(\Pi_0\): \[ \Pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}} \Rightarrow \Pi_0 \quad\text{as }n_{\mathrm{prior}}\to 0^+. \]

From the posterior ratio identity in A.2 and Lemma B, we have, for each \(n_{\mathrm{prior}}>0\), \[ \Pi^{(\mathrm{ING})}_{n_{\mathrm{prior}}}(\mathrm{d}\beta,\mathrm{d}\tau) = R_{n_{\mathrm{prior}}}(\beta,\tau)\, \Pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}(\mathrm{d}\beta,\mathrm{d}\tau), \] with \[ R_{n_{\mathrm{prior}}}(\beta,\tau)\to 1 \quad\text{for each fixed }(\beta,\tau), \] and a measurable envelope \(M(\beta,\tau)\) such that \[ \sup_{0<n_{\mathrm{prior}}<\delta} |R_{n_{\mathrm{prior}}}(\beta,\tau)| \le M(\beta,\tau), \qquad \int M(\beta,\tau)\,\Pi_0(\mathrm{d}\beta,\mathrm{d}\tau)<\infty. \]

Let \(f\colon\mathbb{R}^p\times(0,\infty)\to\mathbb{R}\) be bounded and continuous. Then \[ \int f\,\mathrm{d}\Pi^{(\mathrm{ING})}_{n_{\mathrm{prior}}} = \int f(\beta,\tau)\,R_{n_{\mathrm{prior}}}(\beta,\tau)\, \Pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}(\mathrm{d}\beta,\mathrm{d}\tau). \]

By Lemma A, the NG path has uniform moment bounds.
These bounds rely on Assumptions 4–5 (\(k\ge0\) and \(k+p\ge2\)), which ensure that the NG Gamma shapes and rates remain in compact subsets of \((0,\infty)\) for all \(0<n_{\mathrm{prior}}<\delta\).
Together with Claim B.2, this implies that \(|f\,R_{n_{\mathrm{prior}}}|\) is dominated by an integrable envelope under \(\Pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}\) for \(0<n_{\mathrm{prior}}<\delta\).

Using the pointwise convergence \(R_{n_{\mathrm{prior}}}\to 1\) and the weak convergence \(\Pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}\Rightarrow\Pi_0\), we obtain, by dominated convergence, \[ \int f\,\mathrm{d}\Pi^{(\mathrm{ING})}_{n_{\mathrm{prior}}} \longrightarrow \int f\,\mathrm{d}\Pi_0. \] Thus \(\Pi^{(\mathrm{ING})}_{n_{\mathrm{prior}}}\Rightarrow\Pi_0\) as \(n_{\mathrm{prior}}\to 0^+\), proving the distributional convergence.

For the moment statements, take \(f(\beta,\tau)=\beta_j\), \(f(\beta,\tau)=\tau^{-1}\), and \(f(\beta,\tau)=(\beta-\mathbb{E}_{\Pi_0}[\beta]) (\beta-\mathbb{E}_{\Pi_0}[\beta])^\top\) componentwise. Lemma A again applies because Assumptions 4–5 ensure that the NG Gamma parameters stay uniformly bounded away from zero, giving uniform integrability of the corresponding NG moments.
The same envelope \(M\) from Claim B.2 transfers this to the ING path via the ratio representation. Hence dominated convergence applies to these (unbounded) test functions as well, yielding \[ \mathbb{E}_{\Pi^{(\mathrm{ING})}_{n_{\mathrm{prior}}}}[\beta] \to \mathbb{E}_{\Pi_0}[\beta]=\hat\beta, \quad \mathbb{E}_{\Pi^{(\mathrm{ING})}_{n_{\mathrm{prior}}}}[\tau^{-1}] \to \mathbb{E}_{\Pi_0}[\tau^{-1}] = \frac{\mathrm{RSS}_w}{n_w-p}, \] and \[ \mathrm{Cov}_{\Pi^{(\mathrm{ING})}_{n_{\mathrm{prior}}}}(\beta\mid y) \to \mathrm{Cov}_{\Pi_0}(\beta\mid y) = \frac{\mathrm{RSS}_w}{n_w-p}\,G^{-1}. \]

These limits match the expressions stated in Theorem 3, so the ING posterior has the same weak–prior limit \(\Pi_0\) as the conjugate Normal–Gamma posterior, with convergence of the first two moments. \(\square\)

3.3.5 dGamma() Prior (Fixed \(\beta\), Gamma Prior on Precision)

This subsection records the Gaussian hyperparameters that Prior_Setup() passes into dGamma() and rGamma_reg() when the coefficient vector is fixed at the default blend \(\beta^{+}\). The implementation uses the fields shape, rate_gamma, and coefficients returned by compute_gaussian_prior(); the sampler then pairs a Gamma prior on \(\tau = 1/\sigma^{2}\) with the weighted Gaussian likelihood evaluated at \(\beta^{+}\).

Let \[ n_w = \sum_i w_i,\qquad p = \mathrm{ncol}(X),\qquad n_{\mathrm{prior}} = \frac{\mathrm{pwt}}{1-\mathrm{pwt}}\,n_w, \] and let \(k=1\) be the package default in compute_gaussian_prior().

Define the blended coefficient vector \[ \beta^{+} = (1-\mathrm{pwt})\,\hat\beta + \mathrm{pwt}\,\mu, \] and the corresponding weighted residual sum of squares \[ \mathrm{RSS}_w(\beta^{+}) = \sum_i w_i\,(y_i - x_i^\top\beta^{+})^2. \]


Prior on \(\tau\) (fixed-\(\beta\) path)

The prior supplied to dGamma() is a Gamma distribution on the precision \(\tau\):

  1. Shape parameter \[ a_0 = \frac{n_{\mathrm{prior}} + k}{2}. \]

  2. Rate parameter \[ b_{0,y} = \frac{n_{\mathrm{prior}} + k + p - 2}{2}\; \frac{\mathrm{RSS}_w(\beta^{+})}{\,n_w - p\,}, \qquad n_w > p. \]

This matches the structure used internally by compute_gaussian_prior(): the factor \((n_{\mathrm{prior}} + k + p - 2)/(n_w - p)\) is the same multiplier that appears in the default rate_gamma, with \(\mathrm{RSS}_w(\beta^{+})\) supplying the residual sum of squares at the blended coefficient.


Posterior for \(\tau\) given \(y\) and fixed \(\beta^{+}\)

With the weighted Gaussian likelihood \[ L(y\mid \beta^{+},\tau) \;\propto\; \tau^{n_w/2}\exp\!\left(-\frac{\tau}{2}\,\mathrm{RSS}_w(\beta^{+})\right), \] and the prior \(\tau\sim\Gamma(a_0,b_{0,y})\), the posterior is again Gamma:

  1. Posterior shape \[ a_n = a_0 + \frac{n_w}{2} = \frac{n_{\mathrm{prior}} + k + n_w}{2}. \]

  2. Posterior rate \[ b_n = b_{0,y} + \frac{1}{2}\,\mathrm{RSS}_w(\beta^{+}) = \frac{n_{\mathrm{prior}} + k + n_w - 2}{2}\; \frac{\mathrm{RSS}_w(\beta^{+})}{\,n_w - p\,}. \]


Posterior expectation of \(\sigma^2 = 1/\tau\)

For \(a_n > 1\), \[ E[\sigma^2 \mid y, \beta^{+}] = \frac{b_n}{a_n - 1} = \frac{\mathrm{RSS}_w(\beta^{+})}{\,n_w - p\,}, \] the usual weighted residual‑variance estimator evaluated at \(\beta^{+}\).
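The shape and rate updates above, and the resulting posterior mean of \(\sigma^2\), can be checked numerically. The following sketch is in Python with NumPy (purely illustrative and independent of glmbayes; all numeric values are hypothetical):

```python
import numpy as np

# Hypothetical inputs (not from glmbayes): weighted sample size n_w,
# number of predictors p, default k, prior sample size n_prior, and the
# residual sum of squares at the blended coefficient beta_plus.
n_w, p, k = 20.0, 3, 1.0
n_prior = 2.0
rss = 5.0  # RSS_w(beta_plus)

# Prior shape and rate as displayed in the text.
a0 = (n_prior + k) / 2.0
b0 = (n_prior + k + p - 2.0) / 2.0 * rss / (n_w - p)

# Conjugate update against the likelihood tau^{n_w/2} exp(-tau * rss / 2).
a_n = a0 + n_w / 2.0
b_n = b0 + rss / 2.0

# Closed forms claimed in the text.
print(np.isclose(a_n, (n_prior + k + n_w) / 2.0))                        # True
print(np.isclose(b_n, (n_prior + k + n_w - 2.0) / 2.0 * rss / (n_w - p)))  # True
# Posterior mean of sigma^2 = b_n / (a_n - 1) equals rss / (n_w - p).
print(np.isclose(b_n / (a_n - 1.0), rss / (n_w - p)))                    # True
```

Note how \(n_{\mathrm{prior}}\) cancels exactly in \(b_n/(a_n-1)\), which is the independence claimed in the Interpretation bullets.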


Interpretation

  • The prior rate \(b_{0,y}\) uses the same structural multiplier as the Normal–Gamma calibration, but evaluated at the blended coefficient \(\beta^{+}\).
  • The posterior expectation of \(\sigma^2\) is the classical residual‑variance estimator at \(\beta^{+}\), independent of \(n_{\mathrm{prior}}\).
  • In the weak‑prior limit \(\mathrm{pwt}\to 0\),
    \(\beta^{+}\to\hat\beta\) and
    \(\mathrm{RSS}_w(\beta^{+})\to\mathrm{RSS}_w(\hat\beta)\),
    recovering the usual weighted least‑squares variance estimate.

This completes the description of the fixed-\(\beta\) Gamma prior used by dGamma() and rGamma_reg().

Appendix A: Technical Ingredients for the ING Weak‑Prior Limit

This appendix collects the analytical components required to establish Theorem 3.
Theorems 1 and 2 follow directly from conjugate Normal–Gamma algebra and the Zellner‑type calibration; only the Independent Normal–Gamma (ING) case requires additional work.
The purpose of this appendix is therefore to isolate the technical machinery needed to show that the ING posterior converges to the same weak‑prior limit \(\Pi_0\) as the conjugate Normal–Gamma posterior.

The argument proceeds through five steps:

  1. A common Gaussian likelihood representation
  2. A ratio representation comparing ING and NG posteriors
  3. Uniform moment bounds for the NG path (Lemma A)
  4. Ratio convergence and domination (Lemma B)
  5. Weak convergence and moment convergence for the ING posterior

Each subsection states the required intermediate results and provides the structural components of the proof, while detailed algebraic derivations are deferred to the appropriate claims and lemmas.

A.1 Common Gaussian Setup

Let

  • \(G = X^{\mathsf T}W_{\mathrm{obs}}X\),
  • \(\hat\beta\) the weighted least‑squares estimator,
  • \(\mathrm{RSS}_w\) the weighted residual sum of squares,
  • \(\mathrm{RSS}_w(\beta) = \mathrm{RSS}_w + (\beta - \hat\beta)^{\mathsf T}G(\beta - \hat\beta)\).

The weighted Gaussian likelihood can be written as

\[ L(y \mid \beta,\tau) \propto \tau^{n_w/2} \exp\!\left( -\frac{\tau}{2}\,\mathrm{RSS}_w(\beta) \right). \]

This representation is shared by both the NG and ING posterior paths.
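The decomposition \(\mathrm{RSS}_w(\beta) = \mathrm{RSS}_w + (\beta-\hat\beta)^{\mathsf T}G(\beta-\hat\beta)\) rests on the weighted normal equations, which make the cross term vanish. A minimal NumPy sketch (hypothetical data, not glmbayes internals) verifies it:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weighted design.
n, p = 30, 4
X = rng.normal(size=(n, p))
w = rng.uniform(0.5, 2.0, size=n)      # diagonal of W_obs
y = rng.normal(size=n)

G = X.T @ (w[:, None] * X)             # G = X^T W_obs X
beta_hat = np.linalg.solve(G, X.T @ (w * y))   # weighted least squares
rss_w = float(w @ (y - X @ beta_hat) ** 2)

# Check RSS_w(beta) = RSS_w + (beta - beta_hat)^T G (beta - beta_hat)
# at an arbitrary beta.
beta = rng.normal(size=p)
lhs = float(w @ (y - X @ beta) ** 2)
d = beta - beta_hat
rhs = rss_w + float(d @ G @ d)
print(np.isclose(lhs, rhs))  # True
```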


A.2 Posterior Ratio Representation

To compare the ING and NG posterior paths, we first record their correct prior kernels.

NG prior (Theorem 1, §3.3.2)

For each \(n_{\mathrm{prior}} > 0\), \[ \beta \mid \tau \sim N\!\left(\mu,\;\tau^{-1}\Sigma_0\right), \qquad \tau \sim \Gamma\!\left(a_0(n_{\mathrm{prior}}),\, b_0(n_{\mathrm{prior}})\right), \] where the dispersion‑free Zellner matrix is \[ \Sigma_0 = \frac{1-\mathrm{pwt}}{\mathrm{pwt}}\,(X^\top W_{\mathrm{obs}}X)^{-1}. \]

Thus the NG prior kernel is \[ \pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}(\beta,\tau) \propto \tau^{a_0(n_{\mathrm{prior}})+p/2-1} \exp\!\left( -b_0(n_{\mathrm{prior}})\tau -\frac{\tau}{2}(\beta-\mu)^\top\Sigma_0^{-1}(\beta-\mu) \right). \]

ING prior (§3.3.4)

The ING prior uses a fixed coefficient–scale covariance and a Gamma shape shifted by \(p/2\): \[ \beta \mid n_{\mathrm{prior}} \sim N\!\left(\mu,\;\Sigma(n_{\mathrm{prior}})\right), \qquad \tau \sim \Gamma\!\left(a_0(n_{\mathrm{prior}})+\tfrac{p}{2},\; b_0(n_{\mathrm{prior}})\right), \] with \(\beta\) and \(\tau\) independent, and \[ \Sigma(n_{\mathrm{prior}}) = \frac{n_w}{n_{\mathrm{prior}}}\, \frac{S_{\mathrm{marg}}}{n_w - p}\, (X^\top W_{\mathrm{obs}}X)^{-1}. \]

Thus the ING prior kernel is \[ \pi^{(\mathrm{ING})}_{n_{\mathrm{prior}}}(\beta,\tau) \propto \tau^{a_0(n_{\mathrm{prior}})+p/2-1} \exp\!\left( -b_0(n_{\mathrm{prior}})\tau \right) \exp\!\left( -\frac{1}{2}(\beta-\mu)^\top\Sigma(n_{\mathrm{prior}})^{-1}(\beta-\mu) \right). \]

Ratio of prior kernels

Define \[ R_{n_{\mathrm{prior}}}(\beta,\tau) = \frac{ \pi^{(\mathrm{ING})}_{n_{\mathrm{prior}}}(\beta,\tau) }{ \pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}(\beta,\tau) }. \]

Because the ING Gamma shape equals the NG Gamma shape plus \(p/2\), the \(\tau\)-powers match and cancel. The ratio therefore reduces to \[ R_{n_{\mathrm{prior}}}(\beta,\tau) = C_{n_{\mathrm{prior}}}\, \exp\!\left( -\frac{1}{2}(\beta-\mu)^\top \bigl[\Sigma(n_{\mathrm{prior}})^{-1} -\tau\,\Sigma_0^{-1}\bigr] (\beta-\mu) \right), \] where \(C_{n_{\mathrm{prior}}}\) absorbs all \(\tau\)-free constants.

Posterior ratio identity

The ING posterior is a reweighted NG posterior: \[ \Pi^{(\mathrm{ING})}_{n_{\mathrm{prior}}}(\mathrm{d}\beta,\mathrm{d}\tau) \propto R_{n_{\mathrm{prior}}}(\beta,\tau)\, \Pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}(\mathrm{d}\beta,\mathrm{d}\tau). \]

This identity is the starting point for Lemma B and the ING weak‑prior limit.
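The cancellation of the \(\tau\)-powers and the resulting quadratic-form expression for \(\log R_{n_{\mathrm{prior}}}\) can be checked numerically. The NumPy sketch below (all quantities hypothetical; the shared Gamma rate \(b_0\) is an arbitrary positive number and \(k=1\) is assumed) evaluates both unnormalized prior kernels on the log scale and compares their difference to the quadratic form, with \(C_{n_{\mathrm{prior}}}=1\) for kernels:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical fixed quantities (illustrative only).
p, n_w, n_prior, smarg = 3, 25.0, 2.0, 4.0
pwt = n_prior / (n_prior + n_w)        # so that n_prior = pwt/(1-pwt) * n_w
A = rng.normal(size=(p, p))
G = A @ A.T + p * np.eye(p)            # X^T W_obs X, positive definite
Ginv = np.linalg.inv(G)
mu = rng.normal(size=p)

Sigma0 = (1 - pwt) / pwt * Ginv                      # NG coefficient scale
Sigma_n = n_w / n_prior * smarg / (n_w - p) * Ginv   # ING coefficient covariance
a0, b0 = (n_prior + 1) / 2, 1.3                      # shared Gamma shape/rate (k = 1)

beta = rng.normal(size=p)
tau = 0.7
d = beta - mu

# Log prior kernels as displayed in Section A.2 (unnormalized).
log_ng = ((a0 + p / 2 - 1) * np.log(tau) - b0 * tau
          - tau / 2 * d @ np.linalg.inv(Sigma0) @ d)
log_ing = ((a0 + p / 2 - 1) * np.log(tau) - b0 * tau
           - 0.5 * d @ np.linalg.inv(Sigma_n) @ d)

# Quadratic-form expression for log R.
log_ratio = -0.5 * d @ (np.linalg.inv(Sigma_n) - tau * np.linalg.inv(Sigma0)) @ d
print(np.isclose(log_ing - log_ng, log_ratio))  # True
```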


A.3 Lemma A: Uniform moment bounds for the NG path

Lemma A (Uniform moment bounds for the NG posterior)

Assume:

  1. \(X^{\mathsf T}W_{\mathrm{obs}}X\) is positive definite,

  2. \(n_w > p\),

  3. \(\mathrm{RSS}_w > 0\),

  4. \(k \ge 0\),

  5. \(k + p \ge 2\).

Fix \(X, W_{\mathrm{obs}}, y, \mu\), and hence \(\hat\beta\), \(\mathrm{RSS}_w\), \(S_{\mathrm{marg}}\), and \(G\).
For each \(n_{\mathrm{prior}} > 0\), let \(\Pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}\) be the conjugate Normal–Gamma posterior from Section 3.3.2, with hyperparameters

\[ \mu_{\mathrm{post}}(n_{\mathrm{prior}}), \quad \Sigma_{0,\mathrm{post}}(n_{\mathrm{prior}}), \quad a_n(n_{\mathrm{prior}}), \quad b_n(n_{\mathrm{prior}}). \]

With the \(k\)-generalized calibration, these are: \[ a_n(n_{\mathrm{prior}})=\frac{n_{\mathrm{prior}}+k+n_w}{2}, \qquad b_n(n_{\mathrm{prior}})=\frac{n_{\mathrm{prior}}+k+n_w-2}{2}\frac{S_{\mathrm{marg}}}{n_w-p}. \]

Then there exist \(\delta > 0\) and constants \(C_1, C_2 < \infty\) such that for all \(0 < n_{\mathrm{prior}} < \delta\),

\[ \mathbb{E}_{\Pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}} \bigl[\|\beta\|\bigr] \le C_1, \qquad \mathbb{E}_{\Pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}} \bigl[\|\beta\|^2\bigr] \le C_2, \]

and

\[ \sup_{0 < n_{\mathrm{prior}} < \delta} \mathbb{E}_{\Pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}}[\tau] < \infty, \qquad \sup_{0 < n_{\mathrm{prior}} < \delta} \mathbb{E}_{\Pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}}[\tau^2] < \infty. \]


Claim A.1 (Continuity and compactness of NG hyperparameters)

Under Assumptions 1–5 of Theorem 3, the NG hyperparameters satisfy:

  • \(n_{\mathrm{prior}} \mapsto \mu_{\mathrm{post}}(n_{\mathrm{prior}})\) is continuous on \((0,\infty)\) and \(\mu_{\mathrm{post}}(n_{\mathrm{prior}}) \to \hat\beta\) as \(n_{\mathrm{prior}} \to 0^{+}\).
  • \(n_{\mathrm{prior}} \mapsto \Sigma_{0,\mathrm{post}}(n_{\mathrm{prior}})\) is continuous on \((0,\infty)\) and \(\Sigma_{0,\mathrm{post}}(n_{\mathrm{prior}}) \to G^{-1}\) as \(n_{\mathrm{prior}} \to 0^{+}\).
  • \(n_{\mathrm{prior}} \mapsto a_n(n_{\mathrm{prior}})\) and \(n_{\mathrm{prior}} \mapsto b_n(n_{\mathrm{prior}})\) are continuous on \((0,\infty)\) and converge to strictly positive limits as \(n_{\mathrm{prior}} \to 0^{+}\).

In particular, there exists \(\delta > 0\) such that for all \(0 < n_{\mathrm{prior}} < \delta\), the four hyperparameters lie in compact subsets of their respective spaces.

Proof of Claim A.1.

By Theorem 1 and the prior setup, the NG hyperparameters can be written explicitly as functions of \(n_{\mathrm{prior}} > 0\):

\[ \mu_{\mathrm{post}}(n_{\mathrm{prior}}) = \frac{n_{\mathrm{prior}}}{n_{\mathrm{prior}}+n_w}\,\mu + \frac{n_w}{n_{\mathrm{prior}}+n_w}\,\hat\beta, \]

\[ \Sigma_{0,\mathrm{post}}(n_{\mathrm{prior}}) = \frac{n_w}{n_{\mathrm{prior}}+n_w}\,G^{-1}, \]

\[ a_n(n_{\mathrm{prior}}) = \frac{n_{\mathrm{prior}} + k + n_w}{2},\qquad b_n(n_{\mathrm{prior}}) = \frac{1}{2}\bigl(n_{\mathrm{prior}} + k + n_w - 2\bigr)\, \frac{S_{\mathrm{marg}}}{n_w-p}. \]

Here \(\mu, \hat\beta, G, S_{\mathrm{marg}}, n_w, p\) are fixed and do not depend on \(n_{\mathrm{prior}}\).

Each of these maps is a rational function of \(n_{\mathrm{prior}}\) (\(a_n\) and \(b_n\) are in fact affine, while \(\mu_{\mathrm{post}}\) and \(\Sigma_{0,\mathrm{post}}\) have denominator \(n_{\mathrm{prior}}+n_w > 0\)), so all four are continuous on \((0,\infty)\). Assumption 1 (\(G\) positive definite) ensures that \(G^{-1}\) exists and is finite, so \(\Sigma_{0,\mathrm{post}}(n_{\mathrm{prior}})\) is well defined for all \(n_{\mathrm{prior}}>0\). Assumption 2 (\(n_w > p\)) implies \(n_w-p>0\), so the denominator in \(b_n(n_{\mathrm{prior}})\) is positive. Assumption 3 (\(\mathrm{RSS}_w>0\)) implies \(S_{\mathrm{marg}}>0\), so the rate \(b_n(n_{\mathrm{prior}})\) is strictly positive for all \(n_{\mathrm{prior}}>0\).

Taking the limit \(n_{\mathrm{prior}} \to 0^{+}\) in the explicit formulas gives

\[ \mu_{\mathrm{post}}(n_{\mathrm{prior}}) \to \hat\beta,\qquad \Sigma_{0,\mathrm{post}}(n_{\mathrm{prior}}) \to G^{-1}, \]

\[ a_n(n_{\mathrm{prior}}) \to \frac{k+n_w}{2} > 0,\qquad b_n(n_{\mathrm{prior}}) \to \frac{1}{2}\frac{k+n_w-2}{n_w-p}\,S_{\mathrm{marg}} > 0, \]

where the strict positivity of the limits of \(a_n\) and \(b_n\) uses Assumptions 2–3 together with Assumptions 4–5 (\(k\ge0\) and \(k+p\ge2\)), since \(k+p\ge2\) and \(n_w>p\) imply \(k+n_w>2\).

Since each map is continuous on \((0,\infty)\) and has a finite limit as \(n_{\mathrm{prior}} \to 0^{+}\), there exists \(\delta > 0\) such that, for all \(0 < n_{\mathrm{prior}} < \delta\),

  • \(\mu_{\mathrm{post}}(n_{\mathrm{prior}})\) lies in a compact subset of \(\mathbb{R}^p\),
  • \(\Sigma_{0,\mathrm{post}}(n_{\mathrm{prior}})\) lies in a compact subset of the positive definite matrices,
  • \(a_n(n_{\mathrm{prior}})\) and \(b_n(n_{\mathrm{prior}})\) lie in compact subsets of \((0,\infty)\).

This is exactly the continuity and compactness statement of Claim A.1. \(\square\)
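Claim A.1's limits can be illustrated by evaluating the explicit hyperparameter formulas at a very small \(n_{\mathrm{prior}}\). The NumPy sketch below uses hypothetical inputs (not glmbayes output):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical fixed quantities.
p, n_w, k, smarg = 3, 25.0, 1.0, 4.0
A = rng.normal(size=(p, p))
G = A @ A.T + p * np.eye(p)            # X^T W_obs X, positive definite
Ginv = np.linalg.inv(G)
mu, beta_hat = rng.normal(size=p), rng.normal(size=p)

def ng_hyper(n_prior):
    """Explicit NG posterior hyperparameters from the proof of Claim A.1."""
    mu_post = n_prior / (n_prior + n_w) * mu + n_w / (n_prior + n_w) * beta_hat
    Sigma0_post = n_w / (n_prior + n_w) * Ginv
    a_n = (n_prior + k + n_w) / 2
    b_n = (n_prior + k + n_w - 2) / 2 * smarg / (n_w - p)
    return mu_post, Sigma0_post, a_n, b_n

# Evaluate near the weak-prior limit n_prior -> 0+.
mu_post, S_post, a_n, b_n = ng_hyper(1e-10)
print(np.allclose(mu_post, beta_hat))   # -> beta_hat
print(np.allclose(S_post, Ginv))        # -> G^{-1}
print(np.isclose(a_n, (k + n_w) / 2))   # -> (k + n_w)/2
print(np.isclose(b_n, (k + n_w - 2) / 2 * smarg / (n_w - p)))  # -> limiting rate
```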


Proof of Lemma A

For each \(n_{\mathrm{prior}} > 0\), Theorem 1 gives

  • \(\beta \mid \tau, y, n_{\mathrm{prior}} \sim N\bigl(\mu_{\mathrm{post}}(n_{\mathrm{prior}}), \tau^{-1}\Sigma_{0,\mathrm{post}}(n_{\mathrm{prior}})\bigr)\),

  • \(\tau \mid y, n_{\mathrm{prior}} \sim \Gamma\bigl(a_n(n_{\mathrm{prior}}), b_n(n_{\mathrm{prior}})\bigr)\).

By Claim A.1, there exists \(\delta > 0\) such that for all \(0 < n_{\mathrm{prior}} < \delta\),

  • \(a_n(n_{\mathrm{prior}})\) and \(b_n(n_{\mathrm{prior}})\) lie in compact subsets of \((0,\infty)\),
  • \(\mu_{\mathrm{post}}(n_{\mathrm{prior}})\) lies in a compact subset of \(\mathbb{R}^p\),
  • \(\Sigma_{0,\mathrm{post}}(n_{\mathrm{prior}})\) lies in a compact subset of the positive definite matrices.

These compactness properties rely on Assumptions 4–5 (\(k\ge0\), \(k+p\ge2\)) together with Assumptions 2–3 (\(n_w>p\), \(\mathrm{RSS}_w>0\)), which ensure that the limiting Gamma shape and rate are strictly positive and therefore bounded away from zero.


Bounds for \(\tau\)

For each \(n_{\mathrm{prior}}\),
\(\tau \mid y, n_{\mathrm{prior}} \sim \Gamma(a_n(n_{\mathrm{prior}}), b_n(n_{\mathrm{prior}}))\), so

\[ \mathbb{E}[\tau \mid y, n_{\mathrm{prior}}] = \frac{a_n(n_{\mathrm{prior}})}{b_n(n_{\mathrm{prior}})},\qquad \mathbb{E}[\tau^2 \mid y, n_{\mathrm{prior}}] = \frac{a_n(n_{\mathrm{prior}})\bigl(a_n(n_{\mathrm{prior}})+1\bigr)} {b_n(n_{\mathrm{prior}})^2}. \]

On \((0,\delta)\), both \(a_n(n_{\mathrm{prior}})\) and \(b_n(n_{\mathrm{prior}})\) stay in compact subsets of \((0,\infty)\) by Claim A.1, which uses Assumptions 4–5 to ensure positivity of the limiting Gamma parameters. Thus the maps

\[ n_{\mathrm{prior}} \mapsto \frac{a_n(n_{\mathrm{prior}})}{b_n(n_{\mathrm{prior}})},\qquad n_{\mathrm{prior}} \mapsto \frac{a_n(n_{\mathrm{prior}})\bigl(a_n(n_{\mathrm{prior}})+1\bigr)} {b_n(n_{\mathrm{prior}})^2} \]

are continuous and bounded on \((0,\delta)\). Hence

\[ \sup_{0 < n_{\mathrm{prior}} < \delta} \mathbb{E}_{\Pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}}[\tau] < \infty,\qquad \sup_{0 < n_{\mathrm{prior}} < \delta} \mathbb{E}_{\Pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}}[\tau^2] < \infty. \]
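The Gamma moment formulas used above can be confirmed by direct numerical integration against a \(\Gamma(a,b)\) density in the shape–rate parameterization (illustrative parameters only):

```python
import numpy as np
from math import gamma as gamma_fn

# Illustrative Gamma(a, b) parameters (shape a, rate b).
a, b = 10.5, 3.0

# Dense grid covering essentially all of the mass; simple Riemann sum.
tau = np.linspace(1e-8, 60.0, 600_000)
dt = tau[1] - tau[0]
dens = b**a * tau**(a - 1) * np.exp(-b * tau) / gamma_fn(a)

m1 = float((tau * dens).sum() * dt)        # approximates E[tau]
m2 = float((tau**2 * dens).sum() * dt)     # approximates E[tau^2]
print(np.isclose(m1, a / b, rtol=1e-3))               # E[tau] = a/b
print(np.isclose(m2, a * (a + 1) / b**2, rtol=1e-3))  # E[tau^2] = a(a+1)/b^2
```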


Bounds for \(\beta\)

The marginal distribution of \(\beta \mid y, n_{\mathrm{prior}}\) under \(\Pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}\) has

\[ \mathbb{E}[\beta \mid y, n_{\mathrm{prior}}] = \mu_{\mathrm{post}}(n_{\mathrm{prior}}), \]

and

\[ \mathrm{Cov}(\beta \mid y, n_{\mathrm{prior}}) = \mathbb{E}[\sigma^2 \mid y, n_{\mathrm{prior}}]\, \Sigma_{0,\mathrm{post}}(n_{\mathrm{prior}}), \]

where \(\sigma^2 = 1/\tau\) and, for \(a_n(n_{\mathrm{prior}}) > 1\),

\[ \mathbb{E}[\sigma^2 \mid y, n_{\mathrm{prior}}] = \frac{b_n(n_{\mathrm{prior}})}{a_n(n_{\mathrm{prior}})-1}. \]

By Claim A.1,
\(a_n(n_{\mathrm{prior}}) \to (k+n_w)/2 > 0\) as \(n_{\mathrm{prior}} \to 0^{+}\).
Assumptions 4–5 ensure that \((k+n_w)/2>1\) because \(k+p\ge2\) and \(n_w>p\) imply \(k+n_w>2\).
Shrinking \(\delta\) if necessary, we may therefore assume \(a_n(n_{\mathrm{prior}}) > 1\) for all \(0 < n_{\mathrm{prior}} < \delta\).

On this interval, \(a_n(n_{\mathrm{prior}})\) and \(b_n(n_{\mathrm{prior}})\) lie in compact subsets of \((0,\infty)\), so

\[ n_{\mathrm{prior}} \mapsto \frac{b_n(n_{\mathrm{prior}})}{a_n(n_{\mathrm{prior}})-1} \]

is continuous and bounded on \((0,\delta)\).
By Claim A.1, \(\Sigma_{0,\mathrm{post}}(n_{\mathrm{prior}})\) lies in a compact subset of the positive definite matrices, so its operator norm and trace are bounded on \((0,\delta)\).
Therefore

\[ \sup_{0 < n_{\mathrm{prior}} < \delta} \mathrm{tr}\,\mathrm{Cov}(\beta \mid y, n_{\mathrm{prior}}) < \infty. \]

Now

\[ \mathbb{E}\bigl[\|\beta\|^2 \mid y, n_{\mathrm{prior}}\bigr] = \bigl\|\mathbb{E}[\beta \mid y, n_{\mathrm{prior}}]\bigr\|^2 + \mathrm{tr}\,\mathrm{Cov}(\beta \mid y, n_{\mathrm{prior}}). \]

By Claim A.1, \(\mu_{\mathrm{post}}(n_{\mathrm{prior}})\) lies in a compact subset of \(\mathbb{R}^p\) for \(0 < n_{\mathrm{prior}} < \delta\), so \(\|\mu_{\mathrm{post}}(n_{\mathrm{prior}})\|\) is bounded on \((0,\delta)\). Combined with the bound on \(\mathrm{tr}\,\mathrm{Cov}(\beta \mid y, n_{\mathrm{prior}})\), this implies

\[ \sup_{0 < n_{\mathrm{prior}} < \delta} \mathbb{E}_{\Pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}} \bigl[\|\beta\|^2\bigr] < \infty. \]

Define

\[ C_2 := \sup_{0 < n_{\mathrm{prior}} < \delta} \mathbb{E}_{\Pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}} \bigl[\|\beta\|^2\bigr] < \infty. \]

Finally, by Cauchy–Schwarz,

\[ \mathbb{E}_{\Pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}} \bigl[\|\beta\|\bigr] \le \Bigl( \mathbb{E}_{\Pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}} \bigl[\|\beta\|^2\bigr] \Bigr)^{1/2} \le \sqrt{C_2} =: C_1. \]

This proves Lemma A.

A.4 Lemma B: Ratio convergence and domination

Lemma B (Ratio convergence and domination)

Let \(R_{n_{\mathrm{prior}}}(\beta,\tau)\) be the posterior density ratio \[ R_{n_{\mathrm{prior}}}(\beta,\tau) = \frac{\pi^{(\mathrm{ING})}_{n_{\mathrm{prior}}}(\beta,\tau\mid y)} {\pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}(\beta,\tau\mid y)}. \]

Under the assumptions of Theorem 3:

  1. For each fixed \((\beta,\tau)\), \[ R_{n_{\mathrm{prior}}}(\beta,\tau)\to 1 \quad\text{as }n_{\mathrm{prior}}\to 0^+. \]

  2. There exists a measurable envelope \(M(\beta,\tau)\) such that \[ \sup_{0<n_{\mathrm{prior}}<\delta} |R_{n_{\mathrm{prior}}}(\beta,\tau)| \le M(\beta,\tau), \qquad \mathbb{E}_{\Pi_0}[M(\beta,\tau)]<\infty. \]

Claim B.1 (Explicit prior ratio and quadratic form)

For each \(n_{\mathrm{prior}} > 0\), let \[ \tilde R_{n_{\mathrm{prior}}}(\beta,\tau) := \frac{ \pi^{(\mathrm{ING})}_{n_{\mathrm{prior}}}(\beta,\tau) }{ \pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}(\beta,\tau) } \] be the ratio of the ING and NG prior kernels defined in Section A.2.

Then \(\tilde R_{n_{\mathrm{prior}}}\) can be written in the form \[ \tilde R_{n_{\mathrm{prior}}}(\beta,\tau) = h_{n_{\mathrm{prior}}}(\beta)\, \tau^{c_p}\, \exp\!\bigl(-\tfrac{1}{2}\tau\,q_{n_{\mathrm{prior}}}(\beta)\bigr), \] where:

  • \(c_p\) is a constant depending only on \(p\),
  • \(q_{n_{\mathrm{prior}}}(\beta)\) is a quadratic form in \(\beta\) whose coefficients are continuous in \(n_{\mathrm{prior}}\) and converge pointwise to finite limits as \(n_{\mathrm{prior}}\to 0^+\),
  • \(h_{n_{\mathrm{prior}}}(\beta)\) does not depend on \(\tau\) and satisfies \(h_{n_{\mathrm{prior}}}(\beta)\to 1\) for each fixed \(\beta\) as \(n_{\mathrm{prior}}\to 0^+\).

Proof.

From Section A.2, the NG and ING prior kernels are \[ \pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}(\beta,\tau) \propto \tau^{a_0(n_{\mathrm{prior}})+p/2-1} \exp\!\left( -b_0(n_{\mathrm{prior}})\tau -\frac{\tau}{2}(\beta-\mu)^\top\Sigma_0^{-1}(\beta-\mu) \right), \] \[ \pi^{(\mathrm{ING})}_{n_{\mathrm{prior}}}(\beta,\tau) \propto \tau^{a_0(n_{\mathrm{prior}})+p/2-1} \exp\!\left( -b_0(n_{\mathrm{prior}})\tau \right) \exp\!\left( -\frac{1}{2}(\beta-\mu)^\top\Sigma(n_{\mathrm{prior}})^{-1}(\beta-\mu) \right), \] with \[ \Sigma_0 = \frac{1-\mathrm{pwt}}{\mathrm{pwt}}\,(X^\top W_{\mathrm{obs}}X)^{-1}, \qquad \Sigma(n_{\mathrm{prior}}) = \frac{n_w}{n_{\mathrm{prior}}}\, \frac{S_{\mathrm{marg}}}{n_w - p}\, (X^\top W_{\mathrm{obs}}X)^{-1}. \]

Assumption 1 ensures \(X^\top W_{\mathrm{obs}}X\) is positive definite, so both \(\Sigma_0^{-1}\) and \(\Sigma(n_{\mathrm{prior}})^{-1}\) are well defined. Assumptions 2–3 (\(n_w>p\), \(\mathrm{RSS}_w>0\)) ensure that the scalar multipliers in \(\Sigma(n_{\mathrm{prior}})\) are positive. Assumptions 4–5 (\(k\ge0\), \(k+p\ge2\)) ensure that the Gamma shapes and rates used in the kernels are strictly positive for all \(n_{\mathrm{prior}}>0\).

The \(\tau\)-powers match (ING shape = NG shape \(+\;p/2\)), so \[ \tilde R_{n_{\mathrm{prior}}}(\beta,\tau) = \frac{ \pi^{(\mathrm{ING})}_{n_{\mathrm{prior}}}(\beta,\tau) }{ \pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}(\beta,\tau) } = \exp\!\left( -\frac{1}{2}(\beta-\mu)^\top\Sigma(n_{\mathrm{prior}})^{-1}(\beta-\mu) +\frac{\tau}{2}(\beta-\mu)^\top\Sigma_0^{-1}(\beta-\mu) \right). \]

Define \[ q_{n_{\mathrm{prior}}}(\beta) := -(\beta-\mu)^\top\Sigma_0^{-1}(\beta-\mu), \qquad c_p := 0, \] and \[ h_{n_{\mathrm{prior}}}(\beta) := \exp\!\left( -\frac{1}{2}(\beta-\mu)^\top\Sigma(n_{\mathrm{prior}})^{-1}(\beta-\mu) \right). \]

Then \[ \tilde R_{n_{\mathrm{prior}}}(\beta,\tau) = h_{n_{\mathrm{prior}}}(\beta)\, \tau^{c_p}\, \exp\!\bigl(-\tfrac{1}{2}\tau\,q_{n_{\mathrm{prior}}}(\beta)\bigr), \] with \(q_{n_{\mathrm{prior}}}(\beta)\) a quadratic form in \(\beta\) that does not depend on \(\tau\) and, in fact, does not depend on \(n_{\mathrm{prior}}\) at all. It is therefore continuous in \(n_{\mathrm{prior}}\) and has a finite limit as \(n_{\mathrm{prior}}\to 0^+\).

Using the explicit formula for \(\Sigma(n_{\mathrm{prior}})\), \[ \Sigma(n_{\mathrm{prior}})^{-1} = \frac{n_{\mathrm{prior}}}{n_w}\, \frac{n_w - p}{S_{\mathrm{marg}}}\, (X^\top W_{\mathrm{obs}}X), \] we see that \(\Sigma(n_{\mathrm{prior}})^{-1}\to 0\) as \(n_{\mathrm{prior}}\to 0^+\). This uses Assumptions 2–3 to ensure the scalar prefactor is positive. Hence, for each fixed \(\beta\), \[ h_{n_{\mathrm{prior}}}(\beta) = \exp\!\left( -\frac{1}{2}(\beta-\mu)^\top\Sigma(n_{\mathrm{prior}})^{-1}(\beta-\mu) \right) \longrightarrow \exp(0) = 1. \]

This proves the claimed representation of \(\tilde R_{n_{\mathrm{prior}}}\) and the pointwise convergence \(h_{n_{\mathrm{prior}}}(\beta)\to 1\). \(\square\)
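Both facts in this proof, the explicit inverse of \(\Sigma(n_{\mathrm{prior}})\) and the convergence \(h_{n_{\mathrm{prior}}}(\beta)\to 1\), can be checked numerically (NumPy sketch, hypothetical values):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical fixed quantities.
p, n_w, smarg = 3, 25.0, 4.0
A = rng.normal(size=(p, p))
G = A @ A.T + p * np.eye(p)          # X^T W_obs X, positive definite
mu = rng.normal(size=p)
beta = mu + rng.normal(size=p)
d = beta - mu

def sigma_n_inv(n_prior):
    # Explicit inverse from the proof: (n_prior/n_w) * (n_w - p)/S_marg * G.
    return n_prior / n_w * (n_w - p) / smarg * G

# The explicit inverse agrees with inverting Sigma(n_prior) directly.
n0 = 2.0
Sigma_n = n_w / n0 * smarg / (n_w - p) * np.linalg.inv(G)
print(np.allclose(np.linalg.inv(Sigma_n), sigma_n_inv(n0)))  # True

# h_{n_prior}(beta) increases to 1 as n_prior -> 0+.
h = lambda n_prior: float(np.exp(-0.5 * d @ sigma_n_inv(n_prior) @ d))
print([h(t) for t in (1.0, 1e-2, 1e-4)])   # increasing toward 1
print(abs(h(1e-8) - 1.0) < 1e-6)           # True
```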

Claim B.2 (Uniform envelope and integrability)

Under Assumptions 1–5 and for \(0 < n_{\mathrm{prior}} < \delta\) as in Claim A.1, there exist constants \(C, c_1, c_2, c_3 > 0\) and a measurable function

\[ M(\beta,\tau) = C\,(1 + \tau^{c_1})\,\exp(-c_2 \tau)\,\exp\bigl(c_3 \|\beta\|^2\bigr) \]

such that

\[ \bigl|R_{n_{\mathrm{prior}}}(\beta,\tau)\bigr| \le M(\beta,\tau) \quad\text{for all }(\beta,\tau)\text{ and }0 < n_{\mathrm{prior}} < \delta, \]

and

\[ \int M(\beta,\tau)\,\Pi_0(\mathrm{d}\beta,\mathrm{d}\tau) < \infty. \]

Proof of Claim B.2.

Recall \[ R_{n_{\mathrm{prior}}}(\beta,\tau) = \frac{ \pi^{(\mathrm{ING})}_{n_{\mathrm{prior}}}(\beta,\tau\mid y) }{ \pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}(\beta,\tau\mid y) } = \tilde R_{n_{\mathrm{prior}}}(\beta,\tau)\, \frac{ Z^{(\mathrm{NG})}_{n_{\mathrm{prior}}} }{ Z^{(\mathrm{ING})}_{n_{\mathrm{prior}}} }, \] with \(\tilde R_{n_{\mathrm{prior}}}\) the prior–kernel ratio from Claim B.1 and \[ Z^{(\mathrm{NG})}_{n_{\mathrm{prior}}} = \iint L(y\mid\beta,\tau)\,\pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}(\beta,\tau)\,\mathrm{d}\beta\,\mathrm{d}\tau, \quad Z^{(\mathrm{ING})}_{n_{\mathrm{prior}}} = \iint L(y\mid\beta,\tau)\,\pi^{(\mathrm{ING})}_{n_{\mathrm{prior}}}(\beta,\tau)\,\mathrm{d}\beta\,\mathrm{d}\tau. \]


Step 1: Envelope for \(\tilde R_{n_{\mathrm{prior}}}\).

From Claim B.1 and the explicit formulas in A.2, \[ \log \tilde R_{n_{\mathrm{prior}}}(\beta,\tau) = -\tfrac12(\beta-\mu)^\top\Sigma(n_{\mathrm{prior}})^{-1}(\beta-\mu) +\tfrac12\tau\,(\beta-\mu)^\top\Sigma_0^{-1}(\beta-\mu). \]

Assumption 1 ensures \(X^\top W_{\mathrm{obs}}X\) is positive definite, so both \(\Sigma_0^{-1}\) and \(\Sigma(n_{\mathrm{prior}})^{-1}\) exist. Assumptions 2–3 (\(n_w>p\), \(\mathrm{RSS}_w>0\)) ensure the scalar multipliers in \(\Sigma(n_{\mathrm{prior}})\) are positive. Assumptions 4–5 (\(k\ge0\), \(k+p\ge2\)) ensure the Gamma shapes and rates used in the kernels are strictly positive.

For \(0<n_{\mathrm{prior}}<\delta\), Claim A.1 implies that the operator norms of \(\Sigma_0^{-1}\) and \(\Sigma(n_{\mathrm{prior}})^{-1}\) are uniformly bounded. Hence there exists \(C>0\) such that \[ \bigl|\log \tilde R_{n_{\mathrm{prior}}}(\beta,\tau)\bigr| \le C\,(1+\tau)\,\|\beta-\mu\|^2 \le C'\,(1+\tau)\,(1+\|\beta\|^2) \] for all \((\beta,\tau)\) and \(0<n_{\mathrm{prior}}<\delta\). Exponentiating and absorbing constants, \[ \bigl|\tilde R_{n_{\mathrm{prior}}}(\beta,\tau)\bigr| \le C_0\,(1+\tau^{c_1})\,\exp(-c_2\tau)\,\exp\bigl(c_3\|\beta\|^2\bigr) =: M_0(\beta,\tau), \] for suitable \(C_0,c_1,c_2,c_3>0\) independent of \(n_{\mathrm{prior}}\). This gives the desired functional form for an envelope of \(\tilde R_{n_{\mathrm{prior}}}\).


Step 2: Boundedness of the normalizing–constant ratio.

The maps \[ n_{\mathrm{prior}}\mapsto Z^{(\mathrm{NG})}_{n_{\mathrm{prior}}}, \qquad n_{\mathrm{prior}}\mapsto Z^{(\mathrm{ING})}_{n_{\mathrm{prior}}} \] are continuous on \((0,\delta)\) because the integrands depend continuously on \(n_{\mathrm{prior}}\) and are dominated by an integrable envelope. The likelihood \(L(y\mid\beta,\tau)\) times the NG prior kernel, together with the uniform moment bounds from Lemma A (which rely on Assumptions 2–5 to ensure the Gamma parameters remain in compact subsets of \((0,\infty)\)), provides such domination.

In particular, both \(Z^{(\mathrm{NG})}_{n_{\mathrm{prior}}}\) and \(Z^{(\mathrm{ING})}_{n_{\mathrm{prior}}}\) stay in compact subsets of \((0,\infty)\) for \(0<n_{\mathrm{prior}}<\delta\), so there exists \(K>0\) such that \[ \frac{ Z^{(\mathrm{NG})}_{n_{\mathrm{prior}}} }{ Z^{(\mathrm{ING})}_{n_{\mathrm{prior}}} } \in [K^{-1},K] \quad\text{for }0<n_{\mathrm{prior}}<\delta. \]


Step 3: Envelope for \(R_{n_{\mathrm{prior}}}\) and integrability under \(\Pi_0\).

Combining Steps 1–2, \[ \bigl|R_{n_{\mathrm{prior}}}(\beta,\tau)\bigr| \le K\,M_0(\beta,\tau) = C\,(1+\tau^{c_1})\,\exp(-c_2\tau)\,\exp\bigl(c_3\|\beta\|^2\bigr) =: M(\beta,\tau), \] for all \((\beta,\tau)\) and \(0<n_{\mathrm{prior}}<\delta\), with \(C=K C_0\).

Under the limiting NG law \(\Pi_0\) from Theorem 2, \(\tau\) has a Gamma distribution with shape \((k+n_w)/2>1\) and rate \(\tfrac{k+n_w-2}{2}\,\frac{S_{\mathrm{marg}}}{n_w-p}>0\), and \(\beta\mid\tau\) is Gaussian with covariance proportional to \(\tau^{-1}G^{-1}\). Assumptions 2–5 ensure these limiting parameters are strictly positive. For \(c_2>0\) small enough and \(c_3>0\) small enough, all mixed moments \(\mathbb{E}_{\Pi_0}[\tau^{k}\exp(c_3\|\beta\|^2)]\) with \(k\le c_1\) are finite, so \[ \int M(\beta,\tau)\,\Pi_0(\mathrm{d}\beta,\mathrm{d}\tau)<\infty. \]

This establishes the claimed envelope and integrability, proving Claim B.2. \(\square\)

Proof of Lemma B.

Write both posteriors as \[ \pi^{(\cdot)}_{n_{\mathrm{prior}}}(\beta,\tau\mid y) \propto L(y\mid\beta,\tau)\, \pi^{(\cdot)}_{n_{\mathrm{prior}}}(\beta,\tau), \] with the common Gaussian likelihood \(L(y\mid\beta,\tau)\) from Section A.1. The likelihood cancels in the posterior ratio, so \[ R_{n_{\mathrm{prior}}}(\beta,\tau) = \frac{ \pi^{(\mathrm{ING})}_{n_{\mathrm{prior}}}(\beta,\tau\mid y) }{ \pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}(\beta,\tau\mid y) } = \frac{ \pi^{(\mathrm{ING})}_{n_{\mathrm{prior}}}(\beta,\tau) }{ \pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}(\beta,\tau) } \cdot \frac{ Z^{(\mathrm{NG})}_{n_{\mathrm{prior}}} }{ Z^{(\mathrm{ING})}_{n_{\mathrm{prior}}} } = \tilde R_{n_{\mathrm{prior}}}(\beta,\tau)\, \frac{ Z^{(\mathrm{NG})}_{n_{\mathrm{prior}}} }{ Z^{(\mathrm{ING})}_{n_{\mathrm{prior}}} }, \] where \(\tilde R_{n_{\mathrm{prior}}}\) is the prior–kernel ratio from Claim B.1 and \[ Z^{(\mathrm{NG})}_{n_{\mathrm{prior}}} = \iint L\,\pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}},\qquad Z^{(\mathrm{ING})}_{n_{\mathrm{prior}}} = \iint L\,\pi^{(\mathrm{ING})}_{n_{\mathrm{prior}}}. \]

By Claim B.1, \[ \tilde R_{n_{\mathrm{prior}}}(\beta,\tau) = h_{n_{\mathrm{prior}}}(\beta)\,\tau^{c_p} \exp\!\bigl(-\tfrac12\tau\,q_{n_{\mathrm{prior}}}(\beta)\bigr), \] with \(h_{n_{\mathrm{prior}}}(\beta)\to 1\) for each fixed \(\beta\) and \(q_{n_{\mathrm{prior}}}(\beta)\) a quadratic form whose coefficients are continuous in \(n_{\mathrm{prior}}\) and converge pointwise. Assumption 1 ensures \(X^\top W_{\mathrm{obs}}X\) is positive definite, so \(\Sigma_0^{-1}\) and \(\Sigma(n_{\mathrm{prior}})^{-1}\) are well defined. Assumptions 2–3 (\(n_w>p\), \(\mathrm{RSS}_w>0\)) ensure the scalar multipliers in \(\Sigma(n_{\mathrm{prior}})\) are positive. Assumptions 4–5 (\(k\ge0\), \(k+p\ge2\)) ensure the Gamma shapes and rates in the kernels are strictly positive.

In our explicit construction, \(q_{n_{\mathrm{prior}}}(\beta)\) does not depend on \(n_{\mathrm{prior}}\) at all, and \[ h_{n_{\mathrm{prior}}}(\beta) = \exp\!\left( -\tfrac12(\beta-\mu)^\top\Sigma(n_{\mathrm{prior}})^{-1}(\beta-\mu) \right) \to 1 \] because \(\Sigma(n_{\mathrm{prior}})^{-1}\to 0\) as \(n_{\mathrm{prior}}\to 0^+\). Thus, for each fixed \((\beta,\tau)\), \[ \tilde R_{n_{\mathrm{prior}}}(\beta,\tau)\to 1. \]

Next, write the normalizing–constant ratio as \[ \frac{ Z^{(\mathrm{NG})}_{n_{\mathrm{prior}}} }{ Z^{(\mathrm{ING})}_{n_{\mathrm{prior}}} } = \frac{ \iint L\,\pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}} }{ \iint L\,\pi^{(\mathrm{ING})}_{n_{\mathrm{prior}}} } = \frac{ \iint L\,\pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}} }{ \iint L\,\tilde R_{n_{\mathrm{prior}}}\,\pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}} } = \frac{1}{ \mathbb{E}_{\Pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}}\!\bigl[\tilde R_{n_{\mathrm{prior}}}(\beta,\tau)\bigr] }. \]

Claim B.2 provides a measurable envelope \(M(\beta,\tau)\) such that \[ \sup_{0<n_{\mathrm{prior}}<\delta} |R_{n_{\mathrm{prior}}}(\beta,\tau)| \le M(\beta,\tau), \qquad \int M(\beta,\tau)\,\Pi_0(\mathrm{d}\beta,\mathrm{d}\tau)<\infty, \] where \(\Pi_0\) is the NG weak–prior limit from Theorem 2. Assumptions 2–5 ensure that the limiting Gamma parameters of \(\Pi_0\) are strictly positive, which guarantees integrability of the envelope.

In particular, for \(n_{\mathrm{prior}}\) small, the normalizing–constant ratio stays in a bounded interval, so \(|\tilde R_{n_{\mathrm{prior}}}|\) is also dominated by a multiple of \(M\). Together with the pointwise convergence \(\tilde R_{n_{\mathrm{prior}}}\to 1\) and the uniform moment bounds from Lemma A (which rely on Assumptions 2–5), this yields, by dominated convergence, \[ \mathbb{E}_{\Pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}}\!\bigl[\tilde R_{n_{\mathrm{prior}}}(\beta,\tau)\bigr] \longrightarrow 1, \qquad \frac{ Z^{(\mathrm{NG})}_{n_{\mathrm{prior}}} }{ Z^{(\mathrm{ING})}_{n_{\mathrm{prior}}} } \longrightarrow 1. \]

Finally, \[ R_{n_{\mathrm{prior}}}(\beta,\tau) = \tilde R_{n_{\mathrm{prior}}}(\beta,\tau)\, \frac{ Z^{(\mathrm{NG})}_{n_{\mathrm{prior}}} }{ Z^{(\mathrm{ING})}_{n_{\mathrm{prior}}} } \longrightarrow 1 \quad\text{for each fixed }(\beta,\tau), \] and the same envelope \(M\) from Claim B.2 provides the required domination. This proves Lemma B. \(\square\)


A.5 Summary

The proof of Theorem 3 reduces to:

  • establishing uniform moment bounds for the NG path (Lemma A),
  • proving ratio convergence and domination (Lemma B),
  • applying dominated convergence to show ING \(\approx\) NG for small \(n_{\mathrm{prior}}\),
  • and combining this with the NG weak‑prior limit (Theorem 2).

Only Lemmas A and B require nontrivial work; all other steps follow from standard arguments in posterior convergence theory.

Appendix B: Derivation of Theorem 1 (Conjugate Normal–Gamma posterior)

We sketch how each closed‑form expression in Theorem 1 follows from standard Normal–Gamma algebra under the calibration in §3.3.1–3.3.2; see (Raiffa and Schlaifer 1961; Gelman et al. 2013) for the underlying updates.

B.1 Setup and joint kernel

Start from the prior \[ \beta \mid \tau \sim N\bigl(\mu,\;\tau^{-1}\Sigma_0\bigr), \qquad \tau \sim \Gamma(a_0,b_0), \] and the weighted Gaussian likelihood \[ y \mid \beta,\tau \sim N\bigl(X\beta,\;\tau^{-1}W_{\mathrm{obs}}^{-1}\bigr), \] with \[ G := X^\top W_{\mathrm{obs}}X,\qquad \hat\beta := G^{-1}X^\top W_{\mathrm{obs}}y,\qquad \mathrm{RSS}_w := (y-X\hat\beta)^\top W_{\mathrm{obs}}(y-X\hat\beta). \]

Assumption 1 ensures \(G\) is positive definite, so \(G^{-1}\) exists. Assumption 2 ensures \(n_w>p\), so the residual degrees of freedom \(n_w-p\) are strictly positive. Assumption 3 ensures \(\mathrm{RSS}_w>0\), so the marginal quadratic term is strictly positive.

Under the Zellner calibration in §3.3.2, \[ \Sigma_0 = \frac{1-\mathrm{pwt}}{\mathrm{pwt}}\,G^{-1} = \frac{n_w}{n_{\mathrm{prior}}}\,G^{-1}, \]

and \[ a_0 = \frac{n_{\mathrm{prior}}+k}{2}, \qquad b_0 = \frac{n_{\mathrm{prior}}+k+p-2}{2}\,\frac{S_{\mathrm{marg}}}{n_w-p}, \]

with \(k\ge0\) and \(k+p\ge2\) by Assumptions 4–5, ensuring \(a_0>0\) and \(b_0>0\).

The joint prior–likelihood kernel in \((\beta,\tau)\) is \[ \pi(\beta,\tau\mid y) \propto \tau^{a_0-1}\exp(-b_0\tau)\, \tau^{p/2}\exp\!\Bigl(-\tfrac{\tau}{2}(\beta-\mu)^\top\Sigma_0^{-1}(\beta-\mu)\Bigr)\, \tau^{n_w/2}\exp\!\Bigl(-\tfrac{\tau}{2}\mathrm{RSS}_w(\beta)\Bigr), \] where \[ \mathrm{RSS}_w(\beta) = \mathrm{RSS}_w + (\beta-\hat\beta)^\top G(\beta-\hat\beta). \]

Collecting powers of \(\tau\) gives the Gamma shape update; collecting quadratic forms in \(\beta\) and completing the square gives the Normal block.


B.2 Posterior Normal block: mean and dispersion‑free covariance

The quadratic form in \(\beta\) is \[ \frac{\tau}{2} \Bigl[ (\beta-\hat\beta)^\top G(\beta-\hat\beta) + (\beta-\mu)^\top\Sigma_0^{-1}(\beta-\mu) \Bigr]. \]

Write \[ G_{\mathrm{post}} := G + \Sigma_0^{-1}, \] and complete the square: \[ (\beta-\hat\beta)^\top G(\beta-\hat\beta) + (\beta-\mu)^\top\Sigma_0^{-1}(\beta-\mu) = (\beta-\mu_{\mathrm{post}})^\top G_{\mathrm{post}}(\beta-\mu_{\mathrm{post}}) + \text{const}, \] with \[ \mu_{\mathrm{post}} = G_{\mathrm{post}}^{-1}\bigl(G\hat\beta + \Sigma_0^{-1}\mu\bigr). \]

Using \(\Sigma_0^{-1} = \frac{\mathrm{pwt}}{1-\mathrm{pwt}}G = \frac{n_{\mathrm{prior}}}{n_w}G\), we have \[ G_{\mathrm{post}} = \Bigl(1+\frac{n_{\mathrm{prior}}}{n_w}\Bigr)G = \frac{n_{\mathrm{prior}}+n_w}{n_w}\,G, \] so \[ G_{\mathrm{post}}^{-1} = \frac{n_w}{n_{\mathrm{prior}}+n_w}\,G^{-1}. \]

Substituting into \(\mu_{\mathrm{post}}\), \[ \mu_{\mathrm{post}} = \frac{n_{\mathrm{prior}}}{n_{\mathrm{prior}}+n_w}\,\mu + \frac{n_w}{n_{\mathrm{prior}}+n_w}\,\hat\beta, \] and the dispersion‑free posterior covariance is \[ \Sigma_{0,\mathrm{post}} = G_{\mathrm{post}}^{-1} = \frac{n_w}{n_{\mathrm{prior}}+n_w}\,G^{-1} = \frac{n_{\mathrm{prior}}}{n_{\mathrm{prior}}+n_w}\,\Sigma_0, \] which matches item (ii) of Theorem 1.
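The completing-the-square identities above can be verified numerically. The sketch below uses a small simulated design and illustrative values of \(n_{\mathrm{prior}}\) and \(\mu\) (none of these come from glmbayes itself).

```r
## Numerical check of B.2 (completing the square); values illustrative.
set.seed(2)
n_w <- 40; p <- 2; n_prior <- 8
X <- cbind(1, rnorm(n_w))
W_obs <- diag(runif(n_w, 0.5, 2))
G <- t(X) %*% W_obs %*% X
y <- rnorm(n_w); mu <- c(0.5, -0.5)
beta_hat <- solve(G, t(X) %*% W_obs %*% y)

Sigma_0_inv <- (n_prior / n_w) * G           # inverse of the Zellner prior scale
G_post <- G + Sigma_0_inv
mu_post <- solve(G_post, G %*% beta_hat + Sigma_0_inv %*% mu)

## Posterior mean is a convex combination of mu and beta_hat
w_prior <- n_prior / (n_prior + n_w)
stopifnot(isTRUE(all.equal(
  as.numeric(mu_post),
  as.numeric(w_prior * mu + (1 - w_prior) * beta_hat))))
## Dispersion-free posterior covariance: shrunken G^{-1}
stopifnot(isTRUE(all.equal(
  solve(G_post), (n_w / (n_prior + n_w)) * solve(G))))
```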


B.3 Posterior Gamma block: shape and rate

To obtain the posterior Gamma update for \(\tau\), we must work with the marginal kernel \(\pi(\tau\mid y)\), not the conditional kernel \(\pi(\tau\mid\beta,y)\). This distinction matters because the conditional Normal density in \(\beta\mid\tau\) contains a factor \(\tau^{p/2}\), but this factor is exactly canceled when we integrate out \(\beta\).

Start from the joint kernel \[ \pi(\beta,\tau\mid y) \;\propto\; \tau^{a_0-1}\,e^{-b_0\tau}\; \tau^{p/2}\exp\!\Bigl(-\tfrac{\tau}{2}(\beta-\mu)^\top\Sigma_0^{-1}(\beta-\mu)\Bigr)\; \tau^{n_w/2}\exp\!\Bigl(-\tfrac{\tau}{2}\mathrm{RSS}_w(\beta)\Bigr). \]

If we look only at the conditional kernel in \(\beta\mid\tau\), the exponent of \(\tau\) appears to be \[ a_0 - 1 + \frac{p}{2} + \frac{n_w}{2}. \]

However, the marginal Gamma update is obtained from \[ \pi(\tau\mid y) \;\propto\; \tau^{a_0-1}\,e^{-b_0\tau}\; \tau^{n_w/2} \int \tau^{p/2}\exp\!\Bigl(-\tfrac{\tau}{2}Q(\beta)\Bigr)\,d\beta, \] where \(Q(\beta)\) is the quadratic form combining the likelihood and prior.

The integral over \(\beta\) is a multivariate Gaussian integral: \[ \int \tau^{p/2}\exp\!\Bigl(-\tfrac{\tau}{2}Q(\beta)\Bigr)\,d\beta = \tau^{p/2}\cdot (2\pi)^{p/2}\cdot \tau^{-p/2}\cdot |G_{\mathrm{post}}|^{-1/2} \exp\!\Bigl(-\tfrac{\tau}{2}Q(\mu_{\mathrm{post}})\Bigr). \]

The crucial point is the cancellation: \[ \tau^{p/2}\times\tau^{-p/2} = 1. \]

Thus no \(p/2\) term survives in the marginal kernel for \(\tau\).

After cancellation, the only remaining powers of \(\tau\) are \[ a_0 - 1 + \frac{n_w}{2}, \] so the posterior Gamma shape is \[ a_n = a_0 + \frac{n_w}{2} = \frac{n_{\mathrm{prior}}+k}{2} + \frac{n_w}{2} = \frac{n_{\mathrm{prior}} + k + n_w}{2}, \] matching item (iii) of Theorem 1.

For the rate parameter, the Gaussian integral contributes the marginal quadratic term from §3.1: \[ \frac{1}{2}\,\mathrm{Smarg}. \]

Thus \[ b_n = b_0 + \frac{1}{2}\,\mathrm{Smarg} = \frac{1}{2}\,\frac{n_{\mathrm{prior}}+k+p-2}{n_w-p}\,\mathrm{Smarg} + \frac{1}{2}\,\mathrm{Smarg} = \frac{1}{2}\,\frac{n_{\mathrm{prior}}+k+n_w-2}{n_w-p}\,\mathrm{Smarg}, \] which reduces to the expression in item (iv) under the calibration of §3.3.1.
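The collapse of \(b_0 + \tfrac12\mathrm{Smarg}\) into the closed form of item (iv) is a short arithmetic identity, easily confirmed with illustrative constants:

```r
## Check that b_0 + Smarg/2 equals the closed form (constants illustrative).
n_w <- 50; p <- 3; n_prior <- 10; k <- 1; Smarg <- 37.5
b_0 <- (n_prior + k + p - 2) / 2 * Smarg / (n_w - p)
b_n <- b_0 + Smarg / 2
b_n_closed <- (n_prior + k + n_w - 2) / 2 * Smarg / (n_w - p)
stopifnot(isTRUE(all.equal(b_n, b_n_closed)))
```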


B.4 Marginal moments of \(\beta\) and \(\sigma^2\)

Given \(\tau\), the posterior factorizes as \[ \beta\mid\tau,y \sim N\bigl(\mu_{\mathrm{post}},\;\tau^{-1}\Sigma_{0,\mathrm{post}}\bigr), \qquad \tau\mid y \sim \Gamma(a_n,b_n), \] with \[ \mu_{\mathrm{post}} = \frac{n_{\mathrm{prior}}}{n_{\mathrm{prior}}+n_w}\,\mu + \frac{n_w}{n_{\mathrm{prior}}+n_w}\,\hat\beta, \qquad \Sigma_{0,\mathrm{post}} = \frac{n_w}{n_{\mathrm{prior}}+n_w}\,G^{-1}, \] \[ a_n = \frac{n_{\mathrm{prior}}+k+n_w}{2}, \qquad b_n = \frac{n_{\mathrm{prior}}+k+n_w-2}{2}\, \frac{\mathrm{Smarg}}{n_w-p}. \]


Marginal mean of \(\beta\).
Using the law of total expectation, \[ E[\beta\mid y] = E_\tau\bigl[E[\beta\mid\tau,y]\bigr] = E_\tau[\mu_{\mathrm{post}}] = \mu_{\mathrm{post}}, \] since \(\mu_{\mathrm{post}}\) does not depend on \(\tau\). Thus \[ E[\beta\mid y] = \frac{n_{\mathrm{prior}}}{n_{\mathrm{prior}}+n_w}\,\mu + \frac{n_w}{n_{\mathrm{prior}}+n_w}\,\hat\beta, \] a convex combination of the prior mean and the weighted least‑squares estimate, with weights \[ \frac{n_{\mathrm{prior}}}{n_{\mathrm{prior}}+n_w} \quad\text{and}\quad \frac{n_w}{n_{\mathrm{prior}}+n_w}, \] as in item (v).


Marginal mean of \(\sigma^2 = \tau^{-1}\).
For \(\tau\sim\Gamma(a_n,b_n)\) with shape–rate parameterization, \[ E[\tau^{-1}\mid y] = \frac{b_n}{a_n-1}, \quad\text{provided }a_n>1. \]

Assumptions 4–5 (\(k\ge0\), \(k+p\ge2\)) together with Assumption 2 (\(n_w>p\)) ensure
\[ a_n=\frac{n_{\mathrm{prior}}+k+n_w}{2}>1, \] so the expectation is well‑defined.

Substituting the expressions for \(a_n\) and \(b_n\), \[ E[\sigma^2\mid y] = \frac{ \frac{n_{\mathrm{prior}}+k+n_w-2}{2}\,\frac{\mathrm{Smarg}}{n_w-p} }{ \frac{n_{\mathrm{prior}}+k+n_w}{2}-1 } = \frac{\mathrm{Smarg}}{n_w-p}, \] which is exactly the residual‑variance estimator in item (vi).
Assumption 3 (\(\mathrm{RSS}_w>0\)) ensures \(\mathrm{Smarg}>0\).
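A quick Monte Carlo check of this identity, drawing \(\tau\) from the posterior Gamma block with illustrative constants:

```r
## Monte Carlo confirmation that E[1/tau | y] = Smarg/(n_w - p)
## for tau ~ Gamma(shape = a_n, rate = b_n); constants illustrative.
set.seed(3)
n_w <- 50; p <- 3; n_prior <- 10; k <- 1; Smarg <- 40
a_n <- (n_prior + k + n_w) / 2
b_n <- (n_prior + k + n_w - 2) / 2 * Smarg / (n_w - p)
tau <- rgamma(1e6, shape = a_n, rate = b_n)
c(mc = mean(1 / tau), exact = Smarg / (n_w - p))   # the two should agree closely
```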


Marginal covariance of \(\beta\).
By the law of total covariance, \[ \mathrm{Cov}(\beta\mid y) = E_\tau\bigl[\mathrm{Cov}(\beta\mid\tau,y)\bigr] + \mathrm{Cov}_\tau\bigl(E[\beta\mid\tau,y]\bigr). \] Since \(E[\beta\mid\tau,y]=\mu_{\mathrm{post}}\) does not depend on \(\tau\), the second term vanishes and \[ \mathrm{Cov}(\beta\mid y) = E_\tau\bigl[\tau^{-1}\Sigma_{0,\mathrm{post}}\bigr] = E[\tau^{-1}\mid y]\;\Sigma_{0,\mathrm{post}}. \]

We now compute both factors explicitly.


Step 1: \(E[\tau^{-1}\mid y]\).
From the Gamma block in Theorem 1, \[ \tau\mid y \sim \Gamma(a_n,b_n), \qquad a_n = \frac{n_{\mathrm{prior}}+k+n_w}{2}, \quad b_n = \frac{n_{\mathrm{prior}}+k+n_w-2}{2}\,\frac{\mathrm{Smarg}}{n_w-p}. \]

Assumptions 4–5 together with Assumption 2 ensure \(a_n>1\), and Assumptions 2–3 ensure \(b_n>0\).
Thus the Gamma moment formula applies: \[ E[\tau^{-1}\mid y] = \frac{b_n}{a_n-1}. \]

Substitute: \[ a_n-1 = \frac{n_{\mathrm{prior}}+k+n_w}{2}-1 = \frac{n_{\mathrm{prior}}+k+n_w-2}{2}, \] so \[ E[\tau^{-1}\mid y] = \frac{ \frac{n_{\mathrm{prior}}+k+n_w-2}{2}\,\frac{\mathrm{Smarg}}{n_w-p} }{ \frac{n_{\mathrm{prior}}+k+n_w-2}{2} } = \frac{\mathrm{Smarg}}{n_w-p}. \]


Step 2: \(\Sigma_{0,\mathrm{post}}\).
By conjugate Normal–Gamma algebra, \[ \Sigma_{0,\mathrm{post}} = \bigl(\Sigma_0^{-1} + G\bigr)^{-1}, \qquad G = X^\top W_{\mathrm{obs}}X. \]

Assumption 1 ensures \(G\) is positive definite, so all inverses exist.

Under the Zellner calibration, \[ \Sigma_0 = \frac{1-\mathrm{pwt}}{\mathrm{pwt}}\,G^{-1}, \quad\text{so}\quad \Sigma_0^{-1} = \frac{\mathrm{pwt}}{1-\mathrm{pwt}}\,G. \]

Hence \[ \Sigma_0^{-1} + G = \Bigl(\frac{\mathrm{pwt}}{1-\mathrm{pwt}} + 1\Bigr)G = \frac{1}{1-\mathrm{pwt}}\,G, \] and therefore \[ \Sigma_{0,\mathrm{post}} = (1-\mathrm{pwt})\,G^{-1}. \]

Now use the mapping between \(\mathrm{pwt}\) and \(n_{\mathrm{prior}}\): \[ \mathrm{pwt} = \frac{n_{\mathrm{prior}}}{n_{\mathrm{prior}}+n_w} \quad\Longrightarrow\quad 1-\mathrm{pwt} = \frac{n_w}{n_{\mathrm{prior}}+n_w}. \]

Thus \[ \Sigma_{0,\mathrm{post}} = (1-\mathrm{pwt})\,G^{-1} = \frac{n_w}{n_{\mathrm{prior}}+n_w}\,G^{-1}. \]


Step 3: Combine the pieces.
Putting Steps 1 and 2 together, \[ \mathrm{Cov}(\beta\mid y) = E[\tau^{-1}\mid y]\;\Sigma_{0,\mathrm{post}} = \frac{\mathrm{Smarg}}{n_w-p}\, \frac{n_w}{n_{\mathrm{prior}}+n_w}\,G^{-1}, \] which is exactly item (vii) of Theorem 1.

In particular, the covariance can be written as \[ \mathrm{Cov}(\beta\mid y) = \Bigl(\text{residual variance estimate } \tfrac{\mathrm{Smarg}}{n_w-p}\Bigr) \times \Bigl(\text{shrinkage factor } \tfrac{n_w}{n_{\mathrm{prior}}+n_w}\Bigr) \times G^{-1}, \] making explicit how larger \(n_{\mathrm{prior}}\) reduces the covariance relative to the weak‑prior (least‑squares) limit obtained when \(n_{\mathrm{prior}}\to 0^+\).
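The three-factor decomposition can be confirmed by simulation: draw \(\tau\) from its Gamma block, draw \(\beta\mid\tau\) from the Normal block, and compare the sample covariance with item (vii). The design and constants below are illustrative, not taken from glmbayes.

```r
## Monte Carlo check of item (vii): Cov(beta | y) equals
## (residual variance) x (shrinkage factor) x G^{-1}.  Values illustrative.
set.seed(4)
n_w <- 60; p <- 2; n_prior <- 12; k <- 1; Smarg <- 55
X <- cbind(1, rnorm(n_w))
W_obs <- diag(runif(n_w, 0.5, 2))
G <- t(X) %*% W_obs %*% X
mu_post <- c(1, -1)                               # any fixed posterior mean
Sigma_0_post <- (n_w / (n_prior + n_w)) * solve(G)
a_n <- (n_prior + k + n_w) / 2
b_n <- (n_prior + k + n_w - 2) / 2 * Smarg / (n_w - p)

M <- 2e5
tau <- rgamma(M, shape = a_n, rate = b_n)
U <- chol(Sigma_0_post)                           # t(U) %*% U = Sigma_0_post
z <- matrix(rnorm(M * p), M, p)
beta <- sweep(z %*% U, 1, sqrt(tau), "/")         # rows: draws of beta | tau
beta <- sweep(beta, 2, mu_post, "+")

Cov_mc <- cov(beta)
Cov_exact <- (Smarg / (n_w - p)) * (n_w / (n_prior + n_w)) * solve(G)
max(abs(Cov_mc - Cov_exact))                      # small Monte Carlo error
```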

This completes the derivation of the marginal moments in Theorem 1.

References

Gelman, Andrew, John B. Carlin, Hal S. Stern, David B. Dunson, Aki Vehtari, and Donald B. Rubin. 2013. Bayesian Data Analysis. 3rd ed. CRC Press.
Griffin, Jim E., and Philip J. Brown. 2010. “Inference with Normal-Gamma Prior Distributions in Regression Problems.” Bayesian Analysis 5 (1): 171–88. https://doi.org/10.1214/10-BA507.
Lindley, D. V., and A. F. M. Smith. 1972. “Bayes Estimates for the Linear Model.” Journal of the Royal Statistical Society. Series B (Methodological) 34 (1): 1–41. https://doi.org/10.1111/j.2517-6161.1972.tb00899.x.
McCullagh, P., and J. A. Nelder. 1989. Generalized Linear Models. 2nd ed. Chapman & Hall.
Nygren, Kjell. 2025. Chapter 03: Tailoring Priors - Leveraging the Prior_setup Function. Vignette in the glmbayes R package.
Raiffa, Howard, and R. Schlaifer. 1961. Applied Statistical Decision Theory. Clinton Press, Inc.
Zellner, Arnold. 1986. “On Assessing Prior Distributions and Bayesian Regression Analysis with g‐prior Distributions.” In Bayesian Inference and Decision Techniques: Essays in Honor of Bruno de Finetti, edited by P. K. Goel and Arnold Zellner, vol. 6. Studies in Bayesian Econometrics and Statistics. Elsevier.