Prior_Setup()

This appendix provides a complete and self‑contained derivation of
the prior objects returned by Prior_Setup() and of the
Gaussian prior families used throughout glmbayes. Its
purpose is to make explicit how the returned quantities—mu,
Sigma, Sigma_0, dispersion,
shape, rate, and related fields—arise from the
weighted Gaussian likelihood, the Normal–Gamma
algebra, and the Zellner‑type calibration used
by the package.
Unlike Chapter 11, which focuses on modeling workflow and examples, this chapter focuses on the mathematical structure underlying the priors:
Sigma_0, pwt, n_prior, and prior strength. All formulas needed by the main vignettes are derived here from first principles. No results are imported from Chapter 11; instead, Chapter 11 now serves as a conceptual overview, while this appendix provides the full algebraic details.
The goal is to make the calibration used by
Prior_Setup() transparent, reproducible, and extensible, so
that users can confidently interpret or modify the priors supplied to
dNormal(), dNormal_Gamma(), and
dIndependent_Normal_Gamma().
Textbook treatments of conjugate Normal–Gamma linear models and
related updating appear in (Gelman et al. 2013;
Raiffa and Schlaifer 1961). The Zellner \(g\)-prior scaling used for coefficient
covariances is due to (Zellner 1986).
Applied prior construction with Prior_Setup() is in (Nygren 2025).
This appendix records precise formulas and
derivations for the prior objects returned by
Prior_Setup() and for the conjugate Normal–Gamma Gaussian
model used by dNormal_Gamma(). The goal is to connect
implementation quantities (mu, Sigma,
Sigma_0, dispersion, shape,
rate, and related settings) to the weighted
likelihood notation and \(S_{\mathrm{marg}}\) machinery in
Chapter 11 (especially Section 3.2 and Appendix A3), with steps spelled
out rather than only stated.
This chapter is a companion to the main vignettes: it emphasizes
theory, mapping to pfamily constructors,
and how defaults encode prior strength.
Roadmap. Chapter 11 fixes notation for weighted
Gaussian regression (\(n_w\), \(G = X^{\mathsf T} W X\), precision \(\tau = 1/\phi\), and the conjugate
Normal–Gamma structure). Appendix A3 there gives closed-form posterior
moments for \(\beta\) under the
Zellner-type prior implied by scalar pwt. Chapter A02
documents how pfamily objects map to lower-level simulation
functions. Here we tie those ideas to what
Prior_Setup() actually returns and how to pass
those fields into dNormal(), dNormal_Gamma(),
and dIndependent_Normal_Gamma() without mixing
coefficient-scale Sigma, dispersion-free
Sigma_0, and optional fixed dispersion (see
?Prior_Setup, ?compute_gaussian_prior).
This section concerns families such as binomial and
Poisson where the usual exponential-family dispersion
is \(\phi=1\)
(Chapters 5, 7, and 8). Gaussian models and
dNormal_Gamma are in Section 3.
Let \(n_w = \sum_i w_i\) for
nonnegative observation weights \(w_i\) in the weighted likelihood (the same
totals appear as PriorSettings$n_effective). These \(w_i\) are fixed by design
and do not depend on \(\beta\).
The Prior_Setup function provides three options for setting the prior
mean vector mu. By default, it is set to correspond to the NULL
(intercept only) model
(intercept_source = "null_model", effects_source = "null_effects"). Alternatively, the user can change this to correspond to the OLS estimates for the intercept (intercept_source = "full_model"), the predictors (effects_source = "full_model"), or both.
Finally, the user can also optionally provide their own custom prior
mean vector mu directly to the Prior_Setup function.
Let \(\ell(\beta)\) be the weighted log-likelihood as in Chapters 7–8, with \(\eta_i = x_i^{\mathsf T}\beta\). Define the data precision matrix \[ P(\beta) := \nabla^2_\beta\bigl(-\ell(\beta)\bigr), \] the Hessian of the negative log-likelihood. With \(\ell(\beta)=\sum_i \ell_i(\eta_i)\), \[ P(\beta) = X^{\mathsf T} W(\beta)\, X, \qquad W_i(\beta) := -\frac{d^2 \ell_i}{d\eta_i^2}\Big|_{\eta_i=x_i^{\mathsf T}\beta} \ge 0 \] (log-concavity in \(\eta\); Chapter 5), and \(W(\beta)\) diagonal. The Hessian form of \(P(\beta)\) matches standard GLM theory (McCullagh and Nelder 1989).
Write \(W_i(\beta) = w_i\,\omega_i(\beta)\) with fixed \(w_i\) and mean-dependent \(\omega_i(\beta)\). Let \(W_{\mathrm{obs}}=\mathrm{diag}(w_i)\) and \(\Omega(\beta)=\mathrm{diag}(\omega_i(\beta))\). Then \(W(\beta) = W_{\mathrm{obs}}\,\Omega(\beta)\) (indexwise) and \[ P(\beta) = X^{\mathsf T} W_{\mathrm{obs}}\,\Omega(\beta)\, X. \]
Examples for specific families appear in Chapters 7–8.
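To make the factorization concrete, here is a minimal R sketch (illustrative data, not package code or the examples from Chapters 7–8) for a weighted logistic model, where the standard GLM curvature is \(\omega_i(\beta)=\mu_i(1-\mu_i)\):

# Minimal sketch: data precision P(beta) = X' W_obs Omega(beta) X
# for a weighted logistic regression (illustrative data, not package code).
set.seed(1)
n <- 50
X <- cbind(1, rnorm(n))                  # design matrix with intercept
w <- runif(n, 0.5, 2)                    # fixed observation weights w_i
beta <- c(-0.3, 0.8)                     # coefficient value at which P is evaluated
eta <- drop(X %*% beta)                  # linear predictor
mu <- plogis(eta)                        # logistic mean
omega <- mu * (1 - mu)                   # mean-dependent curvature omega_i(beta)
W <- diag(w * omega)                     # W(beta) = W_obs * Omega(beta)
P <- t(X) %*% W %*% X                    # data precision matrix P(beta)
P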
For these families, Prior_Setup() sets
dispersion, shape, rate, and
Sigma_0 to NULL. Let \(V_0\) denote the sampling
covariance matrix of the fitted coefficients \(\beta^{\ast}\) under the stated model
(\(\phi=1\)). Then \[
V_0^{-1} = P(\beta^{\ast}).
\]
Weighted Gaussian, fixed dispersion \(d\). (See Section 3 for prior
outputs.) Then \(P(\beta)=\frac{1}{d}
X^{\mathsf T} W_{\mathrm{obs}} X\) for all \(\beta\), and the same identification gives
\(V_0^{-1}=\frac{1}{d} X^{\mathsf T}
W_{\mathrm{obs}} X\) when \(V_0\) is the covariance matrix at
dispersion \(d\). User-provided \(d\) in
compute_gaussian_prior() sets returned
dispersion to \(d\) and
rescales Sigma so this scale is explicit in the returned
list.
Prior covariance: with scalar pwt, \(\Sigma = \frac{1-\mathrm{pwt}}{\mathrm{pwt}}\, V_0\) (equivalently \(\Sigma^{-1} = \frac{\mathrm{pwt}}{1-\mathrm{pwt}}\, P(\beta^{\ast})\)).
This Sigma is what Prior_Setup() returns by
default on the coefficient scale. For Gaussian fits, the returned
dispersion-free matrix is
\[
\Sigma_0 = \Sigma / d,
\] so \[
\Sigma_0^{-1}
=
d\,\Sigma^{-1}
=
d\,\frac{\mathrm{pwt}}{1-\mathrm{pwt}}\,P(\beta^{\ast}).
\] Using \(P(\beta^{\ast})=\frac{1}{d}X^{\mathsf
T}WX\) in weighted Gaussian regression gives \[
\Sigma_0^{-1}
=
\frac{\mathrm{pwt}}{1-\mathrm{pwt}}\,X^{\mathsf T}WX,
\] which is independent of \(d\); this is the default
Sigma_0 returned by Prior_Setup().
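As a numerical illustration of this calibration (simulated data and a plain glm() fit, not the internals of Prior_Setup()):

# Minimal sketch of the scalar-pwt calibration Sigma = (1 - pwt)/pwt * V0
# for a dispersion-1 family (illustrative data; not Prior_Setup() internals).
set.seed(2)
n <- 100
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.5 - x))
fit <- glm(y ~ x, family = binomial())
V0 <- vcov(fit)                          # sampling covariance of beta* at phi = 1
pwt <- 0.2                               # illustrative prior weight
Sigma <- (1 - pwt) / pwt * V0            # coefficient-scale prior covariance
Sigma_inv <- pwt / (1 - pwt) * solve(V0) # equivalently, scaled data precision
Sigma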
dNormal()

When default settings are used, the Gaussian posterior means reduce to simple weighted averages of the fitted coefficient vector \(\beta^\ast\) and prior mean \(\mu\).
dNormal() (Gaussian, coefficient-scale
covariance Sigma). For Gaussian likelihood
precision \(P(\beta^\ast)\) and prior
precision \(\Sigma^{-1}\), \[
E(\beta\mid y)
=
\bigl(P(\beta^\ast)+\Sigma^{-1}\bigr)^{-1}
\Bigl(P(\beta^\ast)\beta^\ast+\Sigma^{-1}\mu\Bigr).
\] With the default scalar pwt, \[
\Sigma^{-1}
=
\frac{\mathrm{pwt}}{1-\mathrm{pwt}}\,P(\beta^\ast),
\] so \[
\begin{aligned}
E(\beta\mid y)
&=
\left(P(\beta^\ast)+\frac{\mathrm{pwt}}{1-\mathrm{pwt}}P(\beta^\ast)
\right)^{-1}
\left(P(\beta^\ast)\beta^\ast+\frac{\mathrm{pwt}}{1-\mathrm{pwt}}P(\beta^\ast)\mu\right)
\\
&=
\left(\frac{1-\mathrm{pwt}}{1-\mathrm{pwt}}P(\beta^\ast)+\frac{\mathrm{pwt}}{1-\mathrm{pwt}}P(\beta^\ast)
\right)^{-1}
\left(\frac{1-\mathrm{pwt}}{1-\mathrm{pwt}}P(\beta^\ast)\beta^\ast+\frac{\mathrm{pwt}}{1-\mathrm{pwt}}P(\beta^\ast)\mu\right)
\\
&=
\left(\frac{1}{1-\mathrm{pwt}}P(\beta^\ast)\right)^{-1}
\left(\frac{1}{1-\mathrm{pwt}}P(\beta^\ast)\bigl((1-\mathrm{pwt})\beta^\ast+\mathrm{pwt}\mu\bigr)\right)
\\
&=
(1-\mathrm{pwt})\beta^\ast+\mathrm{pwt}\,\mu.
\end{aligned}
\] Thus the posterior mean is a convex combination of the
likelihood estimate \(\beta^\ast\) and
prior mean \(\mu\): larger
pwt gives more pull toward \(\mu\). In the limit as \(\mathrm{pwt}\to 0\), it approaches \(\beta^\ast\). The underlying precision
combination is the usual normal–normal Bayes linear model update (Lindley and Smith 1972).
The posterior covariance is \[
\mathrm{Var}(\beta\mid y)
=
\bigl(P(\beta^\ast)+\Sigma^{-1}\bigr)^{-1}.
\] With the default scalar pwt, \[
\Sigma^{-1}
=
\frac{\mathrm{pwt}}{1-\mathrm{pwt}}\,P(\beta^\ast),
\] so \[
\begin{aligned}
\mathrm{Var}(\beta\mid y)
&=
\left(P(\beta^\ast)+\frac{\mathrm{pwt}}{1-\mathrm{pwt}}\,P(\beta^\ast)\right)^{-1}
\\
&=
\left(\frac{1-\mathrm{pwt}}{1-\mathrm{pwt}}\,P(\beta^\ast)+\frac{\mathrm{pwt}}{1-\mathrm{pwt}}\,P(\beta^\ast)\right)^{-1}
\\
&=
\left(\frac{1}{1-\mathrm{pwt}}\,P(\beta^\ast)\right)^{-1} \\
&=
(1-\mathrm{pwt})\,P(\beta^\ast)^{-1}.
\end{aligned}
\]
Thus the posterior covariance is the likelihood-based covariance
\(P(\beta^\ast)^{-1}\) shrunk by the
factor \(1-\mathrm{pwt}\): larger
pwt (stronger prior pull) gives tighter posterior
uncertainty. In the limit as \(\mathrm{pwt}\to
0\), it approaches the likelihood-based covariance.
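The algebra above can be checked numerically; the following sketch (illustrative matrices, not a call to dNormal()) forms the precision-weighted posterior directly and compares it with the convex-combination and shrinkage forms:

# Numerical check of E(beta | y) = (1 - pwt) beta* + pwt mu and
# Var(beta | y) = (1 - pwt) P(beta*)^{-1} under the scalar-pwt prior.
set.seed(3)
p <- 3
A <- matrix(rnorm(p * p), p, p)
P <- crossprod(A) + diag(p)              # a positive definite "data precision"
beta_star <- rnorm(p)                    # likelihood estimate
mu <- rep(0, p)                          # prior mean
pwt <- 0.3
Sigma_inv <- pwt / (1 - pwt) * P         # prior precision under scalar pwt
post_prec <- P + Sigma_inv
post_mean <- solve(post_prec, P %*% beta_star + Sigma_inv %*% mu)
post_var <- solve(post_prec)
all.equal(drop(post_mean), (1 - pwt) * beta_star + pwt * mu)   # TRUE
all.equal(post_var, (1 - pwt) * solve(P))                      # TRUE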
dNormal_Gamma()

dNormal_Gamma() (Gaussian conjugate Normal–Gamma, using Sigma_0). The
marginal posterior mean is \[
E(\beta\mid y)
=
E_{\tau\mid y}\!\left[E(\beta\mid \tau,y)\right].
\] For fixed \(\tau\), \[
E(\beta\mid \tau,y)
=
\bigl(\tau X^{\mathsf T}W_{\mathrm{obs}}X+\tau\Sigma_0^{-1}\bigr)^{-1}
\Bigl(\tau X^{\mathsf
T}W_{\mathrm{obs}}X\,\beta^\ast+\tau\Sigma_0^{-1}\mu\Bigr).
\] Under the default scalar pwt calibration for
Sigma_0, \[
\Sigma_0^{-1}
=
\frac{\mathrm{pwt}}{1-\mathrm{pwt}}\,X^{\mathsf T}W_{\mathrm{obs}}X,
\] so \[
\begin{aligned}
E(\beta\mid \tau,y)
&=
\left(\tau X^{\mathsf
T}W_{\mathrm{obs}}X+\tau\frac{\mathrm{pwt}}{1-\mathrm{pwt}}X^{\mathsf
T}W_{\mathrm{obs}}X\right)^{-1}
\left(\tau X^{\mathsf
T}W_{\mathrm{obs}}X\,\beta^\ast+\tau\frac{\mathrm{pwt}}{1-\mathrm{pwt}}X^{\mathsf
T}W_{\mathrm{obs}}X\,\mu\right) \\
&=
\left(\frac{\tau}{1-\mathrm{pwt}}X^{\mathsf
T}W_{\mathrm{obs}}X\right)^{-1}
\left(\frac{\tau}{1-\mathrm{pwt}}X^{\mathsf
T}W_{\mathrm{obs}}X\bigl((1-\mathrm{pwt})\beta^\ast+\mathrm{pwt}\mu\bigr)\right)
\\
&=
(1-\mathrm{pwt})\beta^\ast+\mathrm{pwt}\,\mu.
\end{aligned}
\] Because this expression is free of \(\tau\), averaging over \(\tau\mid y\) gives \[
E(\beta\mid y)=(1-\mathrm{pwt})\beta^\ast+\mathrm{pwt}\,\mu.
\] Thus the marginal posterior mean has the same weighted-average
interpretation: larger pwt gives more pull toward \(\mu\), and as \(\mathrm{pwt}\to 0\) it approaches \(\beta^\ast\). For general non-Gaussian GLMs
these equalities are not exact in finite samples, because the likelihood
is not exactly quadratic in \(\beta\);
however, the same weighted-average form is often a good approximation
when the likelihood is close to multivariate normal, as typically occurs
in large samples.
pwt and optional sd

Vector pwt: same Hadamard construction as above; correlations in \(V_0\) are preserved, variances scaled per coordinate.

sd: \(\mathrm{pwt}_j = (V_0)_{jj}/\bigl((V_0)_{jj}+\mathrm{sd}_j^2\bigr)\) (see the sketch after this list); vector pwt is not overwritten from scalar n_prior.
Gaussian fits may require scalar n_prior
in addition (Section 3).
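A small sketch of the per-coordinate mapping with illustrative numbers (the actual vector handling is done inside Prior_Setup()):

# Per-coordinate prior weights implied by user-specified prior sd's
# (illustrative numbers; Prior_Setup() performs this mapping internally).
V0_diag <- c(0.04, 0.25, 0.10)           # diagonal of the sampling covariance V0
sd_prior <- c(0.50, 0.10, 1.00)          # user prior standard deviations
pwt_vec <- V0_diag / (V0_diag + sd_prior^2)
round(pwt_vec, 3)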
This section develops the Gaussian prior families used when the
dispersion parameter is unknown. The goal is to show how
Prior_Setup() constructs the Gamma prior on the residual
precision \(\tau = 1/\phi\), how the
Normal block interacts with the likelihood, and how the resulting
posterior hyperparameters arise.
We begin with the conjugate Normal–Gamma specification \[ \beta \mid \tau \sim N\!\left(\mu,\; (\tau \Sigma_0)^{-1}\right), \qquad \tau \sim \Gamma(a_0, b_0), \] where \(\Sigma_0\) is the dispersion‑free prior covariance matrix.
For the weighted Gaussian likelihood, \[ y \mid \beta,\tau \sim N\!\left(X\beta,\; \tau^{-1} W_{\mathrm{obs}}^{-1}\right), \] the Normal block and likelihood combine by adding precisions, with the quadratic forms collected by completing the square (the algebra is spelled out in Appendix B).
Integrating out \(\beta\) in the Normal–Gamma algebra adds \[ \frac{n_w}{2} \] to the Gamma shape parameter (note: this parameterization does not add \(p/2\)). Thus the posterior hyperparameters are \[ a_n = a_0 + \frac{n_w}{2}, \qquad b_n = b_0 + \frac{1}{2} S_{\mathrm{marg}}, \] with \(n_w\) the effective sample size and \(p = \mathrm{ncol}(X)\).
pwt

The scalar prior‑weight pwt is mapped to an
effective prior sample size \[
n_{\mathrm{prior}}
=
\frac{\mathrm{pwt}}{1-\mathrm{pwt}}\, n_w,
\qquad\text{equivalently}\qquad
\mathrm{pwt}
=
\frac{n_{\mathrm{prior}}}{n_{\mathrm{prior}} + n_w}.
\]
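A quick check of the mapping in both directions, with illustrative values:

# pwt <-> n_prior mapping (illustrative values).
n_w <- 120                               # effective sample size sum(w_i)
pwt <- 0.25
n_prior <- pwt / (1 - pwt) * n_w         # effective prior sample size (= 40)
pwt_back <- n_prior / (n_prior + n_w)    # recovers pwt (= 0.25)
c(n_prior = n_prior, pwt_back = pwt_back)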
Interpretation: pwt controls how strongly the prior mean \(\mu\) influences the posterior; n_prior is the number of “pseudo‑observations” implied by the prior; and as pwt → 0, the prior becomes negligible and the posterior becomes likelihood‑dominated.

The dispersion‑free covariance used in dNormal_Gamma()
is \[
\Sigma_0
=
\frac{1-\mathrm{pwt}}{\mathrm{pwt}}
\left(X^{\mathsf T}W_{\mathrm{obs}}X\right)^{-1},
\] so that \[
\Sigma_0^{-1}
=
\frac{\mathrm{pwt}}{1-\mathrm{pwt}}
\,X^{\mathsf T}W_{\mathrm{obs}}X.
\]
Substituting this into the expression for \(S_{\mathrm{marg}}\) yields \[ \begin{aligned} S_{\mathrm{marg}} &= \mathrm{RSS}_w + (\hat\beta - \mu)^{\mathsf T} \left( \frac{1-\mathrm{pwt}}{\mathrm{pwt}} (X^{\mathsf T}W_{\mathrm{obs}}X)^{-1} + (X^{\mathsf T}W_{\mathrm{obs}}X)^{-1} \right)^{-1} (\hat\beta - \mu) \\ &= \mathrm{RSS}_w + \mathrm{pwt}\, (\hat\beta - \mu)^{\mathsf T} \left(X^{\mathsf T}W_{\mathrm{obs}}X\right) (\hat\beta - \mu). \end{aligned} \]
Thus under scalar pwt, the prior‑mean penalty in \(S_{\mathrm{marg}}\) is scaled
directly by pwt. This is the key link
between the Normal block and the Gamma update for \(\tau\).
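This simplification can be verified numerically; the sketch below uses simulated weighted Gaussian data (not package internals) to compare the general matrix form with the pwt-scaled penalty:

# Check: Smarg = RSS_w + pwt * (betahat - mu)' (X'WX) (betahat - mu)
# equals the general matrix form with Sigma_0 = (1-pwt)/pwt * (X'WX)^{-1}.
set.seed(4)
n <- 80; p <- 3
X <- cbind(1, matrix(rnorm(n * (p - 1)), n))
w <- runif(n, 0.5, 2)
y <- drop(X %*% c(1, -0.5, 0.25)) + rnorm(n, sd = sqrt(1 / w))
G <- t(X) %*% (w * X)                                  # X' W_obs X
betahat <- solve(G, t(X) %*% (w * y))
RSS_w <- sum(w * (y - drop(X %*% betahat))^2)
mu <- rep(0, p); pwt <- 0.2
d <- betahat - mu
S1 <- RSS_w + pwt * t(d) %*% G %*% d                   # simplified form
S2 <- RSS_w + t(d) %*% solve((1 - pwt)/pwt * solve(G) + solve(G)) %*% d
all.equal(drop(S1), drop(S2))                          # TRUE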
This section explains how the outputs of Prior_Setup()
map into the Gaussian prior families and how a single calibration—based
on pwt, \(n_{\mathrm{prior}}\), and the Zellner form
of \(\Sigma_0\)—governs all of
them.
We proceed in four parts, beginning with the effective sample sizes and the definition of n_prior. A final subsection states the unified weak‑limit theorem.
Let \(n_w=\sum_i w_i\) be the
effective sample size (n_effective).
For scalar pwt, Prior_Setup() defines the
effective prior sample size \[
n_{\mathrm{prior}}
=
\frac{\mathrm{pwt}}{1-\mathrm{pwt}}\,n_w,
\qquad
\mathrm{pwt}
=
\frac{n_{\mathrm{prior}}}{n_{\mathrm{prior}}+n_w}.
\]
Under the shape–rate parameterization \(\Gamma(a_0,b_0)\) with density \(\propto \tau^{a_0-1}e^{-b_0\tau}\), the
default prior on the residual precision \(\tau\) is \[
a_0 = \frac{n_{\mathrm{prior}}+k}{2},\qquad
b_0 = \frac{1}{2}(n_{\mathrm{prior}}+k+p-2)\frac{\mathrm{Smarg}}{n_w-p},
\] where \(S_{\mathrm{marg}}\) is the marginal quadratic term from Section 3.1, \(k\) is a calibration constant (the package default in compute_gaussian_prior() is \(k=1\)), \(n_w>p\) ensures propriety of the likelihood contribution, and the conditions \(k \ge 0\) and \(k+p \ge 2\) guarantee that the Gamma prior itself is proper for all \(n_{\mathrm{prior}}>0\).
The posterior hyperparameters and induced moments follow from this calibration and are summarized in Theorem 1.
Theorem 1 (dNormal_Gamma() calibration)

Assume:

1. \(X^{\mathsf T}W_{\mathrm{obs}}X\) is positive definite,
2. \(n_w > p\),
3. \(\mathrm{RSS}_w > 0\),
4. \(k \ge 0\),
5. \(k + p \ge 2\).
Let the prior be \[ \beta\mid\tau\sim N(\mu,\tau^{-1}\Sigma_0), \qquad \tau\sim\Gamma(a_0,b_0), \] with \[ \Sigma_0 = \frac{1-\mathrm{pwt}}{\mathrm{pwt}} \left(X^{\mathsf T}W_{\mathrm{obs}}X\right)^{-1} = \frac{n_w}{n_{\mathrm{prior}}} \left(X^{\mathsf T}W_{\mathrm{obs}}X\right)^{-1}. \]
Then the posterior is again Normal–Gamma with the following hyperparameters.
(i) Posterior mean of the Normal block: \[ \mu_{\mathrm{post}} = \mathrm{pwt}\,\mu+(1-\mathrm{pwt})\,\hat\beta = \frac{n_{\mathrm{prior}}}{n_{\mathrm{prior}}+n_w}\,\mu + \frac{n_w}{n_{\mathrm{prior}}+n_w}\,\hat\beta. \]

(ii) Dispersion-free posterior covariance: \[ \Sigma_{0,\mathrm{post}} = (\Sigma_0^{-1}+X^{\mathsf T}W_{\mathrm{obs}}X)^{-1} = \frac{n_{\mathrm{prior}}}{n_{\mathrm{prior}}+n_w}\,\Sigma_0 = \frac{n_w}{n_{\mathrm{prior}}+n_w} \left(X^{\mathsf T}W_{\mathrm{obs}}X\right)^{-1}. \]
For general \(\Sigma_0\), use
\(\Sigma_{0,\mathrm{post}}=(\Sigma_0^{-1}+X^{\mathsf
T}W_{\mathrm{obs}}X)^{-1}\).
(iii) Posterior Gamma shape: \[ a_n = a_0 + \frac{n_w}{2} = \frac{n_{\mathrm{prior}} + k + n_w}{2}. \]

(iv) Posterior Gamma rate: \[ b_n = b_0 + \frac{1}{2}\mathrm{Smarg} = \frac{1}{2}\frac{\mathrm{Smarg}}{n_w-p}\,(n_{\mathrm{prior}} + k + n_w - 2). \]

(v) Marginal posterior mean of \(\beta\): \[ \mathbb{E}[\beta\mid y]=\mu_{\mathrm{post}}. \]

(vi) Posterior mean of \(\sigma^2\): for \(a_n>1\), \[ \mathbb{E}[\sigma^2\mid y] = \frac{b_n}{a_n-1} = \frac{S_{\mathrm{marg}}}{n_w-p}. \]
(vii) Marginal posterior covariance of \(\beta\). Let
\[
V_n=\Sigma_{0,\mathrm{post}}
=
(\Sigma_0^{-1}+X^{\mathsf T}W_{\mathrm{obs}}X)^{-1}.
\]
Then \[ \mathrm{Cov}(\beta\mid y) = \mathbb{E}[\sigma^2\mid y]\,V_n = \frac{S_{\mathrm{marg}}}{n_w-p}\, \frac{n_w}{n_w+n_{\mathrm{prior}}}\, \left(X^{\mathsf T}W_{\mathrm{obs}}X\right)^{-1}. \]
Proof. See Appendix B.
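For readers who want to trace the calibration end to end, here is a compact R sketch of the Theorem 1 hyperparameters under scalar pwt (k = 1 mirrors the stated default; this is not the compute_gaussian_prior() implementation):

# Sketch of the Theorem 1 hyperparameters under the scalar-pwt calibration.
# Not the package implementation; k = 1 mirrors the stated default.
ng_posterior <- function(y, X, w, mu, pwt, k = 1) {
  n_w <- sum(w); p <- ncol(X)
  G <- t(X) %*% (w * X)                          # X' W_obs X
  betahat <- drop(solve(G, t(X) %*% (w * y)))
  RSS_w <- sum(w * (y - drop(X %*% betahat))^2)
  n_prior <- pwt / (1 - pwt) * n_w
  d <- betahat - mu
  Smarg <- RSS_w + pwt * drop(t(d) %*% G %*% d)
  list(
    mu_post     = pwt * mu + (1 - pwt) * betahat,                  # (i)
    Sigma0_post = n_w / (n_prior + n_w) * solve(G),                # (ii)
    a_n         = (n_prior + k + n_w) / 2,                         # (iii)
    b_n         = (n_prior + k + n_w - 2) / 2 * Smarg / (n_w - p), # (iv)
    E_sigma2    = Smarg / (n_w - p)                                # (vi)
  )
}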
pwt controls the pull toward the prior mean in (i). pwt also controls the shrinkage of the covariance in (vii) via the factor \(n_w/(n_w+n_{\mathrm{prior}})\). Together, these determine how prior strength interacts with sample size and model dimension.
Theorem 1 restates standard conjugate Normal–Gamma posterior formulas
under this calibration
(Gelman et al. 2013; Raiffa and Schlaifer
1961).
Theorem 2 (weak-prior limit of the dNormal_Gamma() posterior)

Assume the same identifiability conditions as in Theorem 1:

1. \(X^{\mathsf T}W_{\mathrm{obs}}X\) is positive definite,
2. \(n_w > p\),
3. \(\mathrm{RSS}_w > 0\),
4. \(k \ge 0\),
5. \(k + p \ge 2\).
Under the default calibration of Theorem 1, let
\[
n_{\mathrm{prior}} \to 0^{+}
\qquad\text{equivalently}\qquad
\mathrm{pwt} \to 0^{+},
\quad
n_{\mathrm{prior}}
=
\frac{\mathrm{pwt}}{1-\mathrm{pwt}}\,n_w.
\]
Then \(S_{\mathrm{marg}} \to
\mathrm{RSS}_w\), and the conjugate dNormal_Gamma()
posterior converges weakly to a Normal–Gamma law \(\Pi_{0}(\cdot\mid y)\) on \((\beta,\tau)\).
The limiting hyperparameters are the limits of the posterior quantities
in Theorem 1 as \(n_{\mathrm{prior}}\to
0^{+}\).
(These are not the prior hyperparameters \(a_0,b_0\).)
\[ \mu_{\Pi_{0}} = \lim_{n_{\mathrm{prior}}\to 0^{+}} \mu_{\mathrm{post}} = \hat\beta. \]
\[ \Sigma_{0,\Pi_{0}} = \lim_{n_{\mathrm{prior}}\to 0^{+}} \Sigma_{0,\mathrm{post}} = \left(X^{\mathsf T}W_{\mathrm{obs}}X\right)^{-1}. \]
\[ a_{\Pi_{0}} = \lim_{n_{\mathrm{prior}}\to 0^+} a_n = \frac{k + n_w}{2}. \]
\[ b_{\Pi_{0}} = \lim_{n_{\mathrm{prior}}\to 0^+} b_n = \frac{1}{2}\frac{\mathrm{RSS}_w}{n_w-p}\,(k + n_w - 2). \]
\[ \mathbb{E}_{\Pi_{0}}[\beta\mid y] = \mu_{\Pi_{0}} = \hat\beta. \]
For \(\tau\mid y \sim \Gamma(a_{\Pi_{0}},b_{\Pi_{0}})\), \[ \mathbb{E}_{\Pi_{0}}[\sigma^2\mid y] = \frac{b_{\Pi_{0}}}{a_{\Pi_{0}}-1} = \frac{\mathrm{RSS}_w}{n_w-p}, \] the classical weighted residual‑variance estimator.
\[ \mathrm{Cov}_{\Pi_{0}}(\beta\mid y) = \mathbb{E}_{\Pi_{0}}[\sigma^2\mid y]\, \Sigma_{0,\Pi_{0}} = \frac{\mathrm{RSS}_w}{n_w-p}\, \left(X^{\mathsf T}W_{\mathrm{obs}}X\right)^{-1}, \] matching the usual weighted least‑squares covariance.
The limit \(\Pi_{0}\) is the weak‑prior Normal–Gamma law obtained when the prior contributes no pseudo‑information. It has posterior mean \(\hat\beta\), residual‑variance estimate \(\mathrm{RSS}_w/(n_w-p)\), and coefficient covariance equal to the classical weighted least‑squares covariance.
The independent Normal–Gamma posterior (Theorem 3) converges to this same \(\Pi_{0}\) under the same assumptions; only the finite‑\(n_{\mathrm{prior}}\) joint density differs.
By Theorem 1, for each \(n_{\mathrm{prior}}>0\) the dNormal_Gamma posterior is Normal–Gamma with hyperparameters \[ \mu_{\mathrm{post}}(n_{\mathrm{prior}}),\quad \Sigma_{0,\mathrm{post}}(n_{\mathrm{prior}}),\quad a_n(n_{\mathrm{prior}}),\quad b_n(n_{\mathrm{prior}}), \] given explicitly by \[ \mu_{\mathrm{post}}(n_{\mathrm{prior}}) = \frac{n_{\mathrm{prior}}}{n_{\mathrm{prior}}+n_w}\,\mu + \frac{n_w}{n_{\mathrm{prior}}+n_w}\,\hat\beta, \] \[ \Sigma_{0,\mathrm{post}}(n_{\mathrm{prior}}) = \frac{n_w}{n_{\mathrm{prior}}+n_w}\,G^{-1}, \] \[ a_n(n_{\mathrm{prior}}) = \frac{n_{\mathrm{prior}}+k+n_w}{2},\qquad b_n(n_{\mathrm{prior}}) = \frac{n_{\mathrm{prior}}+k+n_w-2}{2}\, \frac{\mathrm{Smarg}}{n_w-p}. \]
As \(n_{\mathrm{prior}}\to 0^+\), each of these converges to a finite, valid Normal–Gamma parameter. Using assumption 4 (\(k\ge 0\)) together with assumption 2 (\(n_w>p\)), \[ a_n(n_{\mathrm{prior}})\to\frac{k+n_w}{2}>0. \] Using assumption 5 (\(k+p\ge 2\)) and again 2 (\(n_w>p\)), which together imply \(k+n_w>2\), \[ b_n(n_{\mathrm{prior}})\to \frac{k+n_w-2}{2}\, \frac{\mathrm{RSS}_w}{n_w-p}>0, \] with \(\mathrm{Smarg}\to\mathrm{RSS}_w\) as \(n_{\mathrm{prior}}\to 0^+\). Thus both limiting Gamma parameters are strictly positive, ensuring that the limiting Normal–Gamma law is proper.
The Normal–Gamma family is closed under weak limits when its parameters converge in this way, so the posteriors \(\Pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}\) converge weakly to the Normal–Gamma law \(\Pi_0\) with these limiting hyperparameters. The stated formulas for the limiting mean, covariance, and variance of \(\beta\) and \(\sigma^2=1/\tau\) follow by plugging the limits into the standard Normal–Gamma moment expressions. \(\square\)
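The weak-prior limit can be checked against a classical weighted least-squares fit; a minimal sketch with simulated data (not package code) follows:

# As pwt -> 0 the posterior moments approach the weighted least-squares answers.
set.seed(5)
n <- 200
X <- cbind(1, rnorm(n)); w <- runif(n, 0.5, 2)
y <- drop(X %*% c(2, -1)) + rnorm(n, sd = sqrt(1 / w))
fit <- lm(y ~ X - 1, weights = w)
G <- t(X) %*% (w * X)
betahat <- drop(solve(G, t(X) %*% (w * y)))
RSS_w <- sum(w * (y - drop(X %*% betahat))^2)
n_w <- sum(w); p <- ncol(X)
sigma2_wls <- RSS_w / (n_w - p)          # classical weighted residual variance
cov_wls <- sigma2_wls * solve(G)         # limiting coefficient covariance
all.equal(betahat, unname(coef(fit)))    # TRUE: limiting posterior mean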
dNormal() with default dispersion

The dNormal() prior fixes the residual variance \(\sigma^2\) at a calibrated value rather
than integrating over \(\tau\) as in
the Normal–Gamma model.
This section shows how the default dispersion is chosen and how the
resulting posterior covariance matches the weak‑prior limit of
dNormal_Gamma().
From Section 2.3.2, under scalar pwt, \[
\mathrm{Var}(\beta\mid y,\sigma^2)
=
(1-\mathrm{pwt})\,P(\beta^\ast)^{-1}.
\]
For weighted Gaussian regression, \[ P(\beta^\ast) = \sigma^{-2}X^{\mathsf T}W_{\mathrm{obs}}X, \] so \[ \mathrm{Var}(\beta\mid y,\sigma^2) = (1-\mathrm{pwt})\,\sigma^2 \left(X^{\mathsf T}W_{\mathrm{obs}}X\right)^{-1}. \]
Using
\[
n_{\mathrm{prior}}
=
\frac{\mathrm{pwt}}{1-\mathrm{pwt}}\,n_w,
\qquad
1-\mathrm{pwt}
=
\frac{n_w}{n_w+n_{\mathrm{prior}}},
\] this becomes \[
\mathrm{Var}(\beta\mid y,\sigma^2)
=
\frac{n_w}{n_w+n_{\mathrm{prior}}}\,
\sigma^2
\left(X^{\mathsf T}W_{\mathrm{obs}}X\right)^{-1}.
\]
To choose a default fixed value of \(\sigma^2\), Prior_Setup() uses
the posterior mean from the Normal–Gamma model (Theorem 1 (vi)): \[
\mathrm{dispersion}_{\mathrm{default}}
=
\frac{S_{\mathrm{marg}}}{n_w-p}.
\]
This matches the classical residual degrees‑of‑freedom adjustment.
Substituting this into the covariance expression gives \[ \mathrm{Var}(\beta\mid y,\mathrm{dispersion}_{\mathrm{default}}) = \frac{n_w}{n_w+n_{\mathrm{prior}}}\, \frac{S_{\mathrm{marg}}}{n_w-p}\, \left(X^{\mathsf T}W_{\mathrm{obs}}X\right)^{-1}. \]
Prior_Setup()

With the same default dispersion, Prior_Setup() returns
the coefficient‑scale prior covariance \[
\Sigma_{\mathrm{calibrated}}
=
\frac{n_w}{n_{\mathrm{prior}}}\,
\mathrm{dispersion}_{\mathrm{default}}\,
\left(X^{\mathsf T}W_{\mathrm{obs}}X\right)^{-1},
\] which is the matrix used by dNormal().
This matches the Normal–Gamma expression in Section 3.3.2, ensuring that the fixed‑dispersion and conjugate models share the same calibration.
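A scalar illustration of this calibration (hypothetical numbers, not Prior_Setup() output):

# Coefficient-scale prior covariance returned for dNormal()
# under the default dispersion (illustrative scalar inputs).
n_w <- 120; p <- 3; n_prior <- 40
Smarg <- 95                              # illustrative marginal quadratic term
dispersion_default <- Smarg / (n_w - p)  # posterior-mean dispersion from Theorem 1 (vi)
# Sigma_calibrated = (n_w / n_prior) * dispersion_default * solve(XtWX)
scale_factor <- n_w / n_prior * dispersion_default
scale_factor                             # multiplies (X' W_obs X)^{-1}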
As \(\mathrm{pwt}\to 0\) (equivalently \(n_{\mathrm{prior}}\to 0^{+}\)), \[ \mathrm{Var}(\beta\mid y) \;\longrightarrow\; \frac{\mathrm{RSS}_w}{n_w-p}\, \left(X^{\mathsf T}W_{\mathrm{obs}}X\right)^{-1}, \] the classical weighted least‑squares covariance.
Thus dNormal() with default dispersion has the
same weak‑prior limit as dNormal_Gamma(),
and the returned shape, rate,
dispersion, and coefficient‑scale covariance remain
internally consistent under the package calibration.
The independent Normal–Gamma (ING) prior replaces the conjugate covariance structure \(\tau^{-1}\Sigma_0\) with a fixed coefficient-scale covariance \(\Sigma\), while using a Gamma prior on \(\tau\) whose shape parameter differs from the conjugate Normal–Gamma case by \(p/2\). (Griffin and Brown 2010) develop inference with Normal–Gamma priors in regression when independence replaces full conjugacy.
The default call is
dIndependent_Normal_Gamma(ps$mu, Sigma = ps$Sigma, shape = ps$shape_ING, rate = ps$rate)
Let \(p = \mathrm{ncol}(X)\), and let \(a_0, b_0, S_{\mathrm{marg}}\) be as in Sections 3.3.1–3.3.2.
Prior mean: \[ \mu = \texttt{ps\$mu}. \]
Coefficient-scale covariance: \[ \Sigma = \frac{n_w}{n_{\mathrm{prior}}}\, \frac{S_{\mathrm{marg}}}{n_w - p}\, (X^{\mathsf T}W_{\mathrm{obs}}X)^{-1}. \]
ING Gamma shape: \[ \mathrm{shape}_{\mathrm{ING}} = a_0 + \frac{p}{2} = \frac{n_{\mathrm{prior}} + k + p}{2}. \]
Gamma rate: \[ \texttt{rate} = b_0 = \frac{1}{2}(n_{\mathrm{prior}} + k + p - 2)\, \frac{S_{\mathrm{marg}}}{n_w - p}. \]
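A short sketch of how these ING defaults follow from the same inputs (illustrative values; ps$shape_ING and ps$rate in the call above are the corresponding returned fields):

# ING defaults implied by the scalar-pwt calibration (illustrative values).
n_w <- 120; p <- 3; k <- 1; pwt <- 0.25
n_prior <- pwt / (1 - pwt) * n_w
Smarg <- 95
a0 <- (n_prior + k) / 2
shape_ING <- a0 + p / 2                  # = (n_prior + k + p) / 2
rate <- (n_prior + k + p - 2) / 2 * Smarg / (n_w - p)
c(shape_ING = shape_ING, rate = rate)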
Theorem 3 (weak-prior limit of the ING posterior)

Assume:

1. \(X^{\mathsf T}W_{\mathrm{obs}}X\) is positive definite,
2. \(n_w > p\),
3. \(\mathrm{RSS}_w > 0\),
4. \(k \ge 0\),
5. \(k + p \ge 2\).
For each \(n_{\mathrm{prior}} >
0\), let
\[
\Pi^{(\mathrm{ING})}_{n_{\mathrm{prior}}}(\cdot \mid y)
\] denote the posterior under the ING prior above.
Let \(\Pi_0(\cdot \mid y)\) be the
Normal–Gamma law from Theorem 2 with hyperparameters
\[ \mu_{\Pi_0} = \hat\beta, \qquad \Sigma_{0,\Pi_0} = (X^{\mathsf T}W_{\mathrm{obs}}X)^{-1}, \] \[ a_{\Pi_0} = \frac{k + n_w}{2}, \qquad b_{\Pi_0} = \frac{1}{2}\frac{k + n_w - 2}{n_w - p}\,\mathrm{RSS}_w. \]
Then, as \(n_{\mathrm{prior}} \to 0^{+}\) (equivalently \(\mathrm{pwt} \to 0^{+}\)),
\[ \Pi^{(\mathrm{ING})}_{n_{\mathrm{prior}}}(\cdot \mid y) \;\Rightarrow\; \Pi_0(\cdot \mid y) \]
in distribution on \(\mathbb{R}^p \times
(0,\infty)\).
Moreover, the posterior moments converge:
Coefficient mean: \[ \mathbb{E}_{\Pi^{(\mathrm{ING})}_{n_{\mathrm{prior}}}}[\beta \mid y] \longrightarrow \hat\beta. \]
Residual variance: \[ \mathbb{E}_{\Pi^{(\mathrm{ING})}_{n_{\mathrm{prior}}}}[\sigma^2 \mid y] \longrightarrow \frac{\mathrm{RSS}_w}{n_w - p}. \]
Coefficient covariance: \[ \mathrm{Cov}_{\Pi^{(\mathrm{ING})}_{n_{\mathrm{prior}}}}(\beta \mid y) \longrightarrow \frac{\mathrm{RSS}_w}{n_w - p}\, (X^{\mathsf T}W_{\mathrm{obs}}X)^{-1}. \]
Thus the ING posterior has the same weak-prior limit as the conjugate Normal–Gamma posterior, even though its finite-\(n_{\mathrm{prior}}\) form is not conjugate and its Gamma shape parameter differs by \(p/2\).
Proof of Theorem 3.
Fix \(y,X,W_{\mathrm{obs}}\)
satisfying Assumptions 1–3.
For each \(n_{\mathrm{prior}}>0\),
let \(\Pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}\)
and \(\Pi^{(\mathrm{ING})}_{n_{\mathrm{prior}}}\)
denote, respectively, the NG and ING posteriors on \((\beta,\tau)\).
By Theorem 2, the NG posteriors converge weakly to the limiting Normal–Gamma law \(\Pi_0\): \[ \Pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}} \Rightarrow \Pi_0 \quad\text{as }n_{\mathrm{prior}}\to 0^+. \]
From the posterior ratio identity in A.2 and Lemma B, we have, for each \(n_{\mathrm{prior}}>0\), \[ \Pi^{(\mathrm{ING})}_{n_{\mathrm{prior}}}(\mathrm{d}\beta,\mathrm{d}\tau) = R_{n_{\mathrm{prior}}}(\beta,\tau)\, \Pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}(\mathrm{d}\beta,\mathrm{d}\tau), \] with \[ R_{n_{\mathrm{prior}}}(\beta,\tau)\to 1 \quad\text{for each fixed }(\beta,\tau), \] and a measurable envelope \(M(\beta,\tau)\) such that \[ \sup_{0<n_{\mathrm{prior}}<\delta} |R_{n_{\mathrm{prior}}}(\beta,\tau)| \le M(\beta,\tau), \qquad \int M(\beta,\tau)\,\Pi_0(\mathrm{d}\beta,\mathrm{d}\tau)<\infty. \]
Let \(f\colon\mathbb{R}^p\times(0,\infty)\to\mathbb{R}\) be bounded and continuous. Then \[ \int f\,\mathrm{d}\Pi^{(\mathrm{ING})}_{n_{\mathrm{prior}}} = \int f(\beta,\tau)\,R_{n_{\mathrm{prior}}}(\beta,\tau)\, \Pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}(\mathrm{d}\beta,\mathrm{d}\tau). \]
By Lemma A, the NG path has uniform moment bounds.
These bounds rely on Assumptions 4–5 (\(k\ge0\) and \(k+p\ge2\)), which ensure that the NG Gamma
shapes and rates remain in compact subsets of \((0,\infty)\) for all \(0<n_{\mathrm{prior}}<\delta\).
Together with Claim B.2, this implies that \(|f\,R_{n_{\mathrm{prior}}}|\) is dominated
by an integrable envelope under \(\Pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}\)
for \(0<n_{\mathrm{prior}}<\delta\).
Using the pointwise convergence \(R_{n_{\mathrm{prior}}}\to 1\) and the weak convergence \(\Pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}\Rightarrow\Pi_0\), we obtain, by dominated convergence, \[ \int f\,\mathrm{d}\Pi^{(\mathrm{ING})}_{n_{\mathrm{prior}}} \longrightarrow \int f\,\mathrm{d}\Pi_0. \] Thus \(\Pi^{(\mathrm{ING})}_{n_{\mathrm{prior}}}\Rightarrow\Pi_0\) as \(n_{\mathrm{prior}}\to 0^+\), proving the distributional convergence.
For the moment statements, take \(f(\beta,\tau)=\beta_j\), \(f(\beta,\tau)=\tau^{-1}\), and \(f(\beta,\tau)=(\beta-\mathbb{E}_{\Pi_0}[\beta])
(\beta-\mathbb{E}_{\Pi_0}[\beta])^\top\) componentwise. Lemma A
again applies because Assumptions 4–5 ensure that the NG Gamma
parameters stay uniformly bounded away from zero, giving uniform
integrability of the corresponding NG moments.
The same envelope \(M\) from Claim B.2
transfers this to the ING path via the ratio representation. Hence
dominated convergence applies to these (unbounded) test functions as
well, yielding \[
\mathbb{E}_{\Pi^{(\mathrm{ING})}_{n_{\mathrm{prior}}}}[\beta]
\to
\mathbb{E}_{\Pi_0}[\beta]=\hat\beta,
\quad
\mathbb{E}_{\Pi^{(\mathrm{ING})}_{n_{\mathrm{prior}}}}[\tau^{-1}]
\to
\mathbb{E}_{\Pi_0}[\tau^{-1}]
=
\frac{\mathrm{RSS}_w}{n_w-p},
\] and \[
\mathrm{Cov}_{\Pi^{(\mathrm{ING})}_{n_{\mathrm{prior}}}}(\beta\mid y)
\to
\mathrm{Cov}_{\Pi_0}(\beta\mid y)
=
\frac{\mathrm{RSS}_w}{n_w-p}\,G^{-1}.
\]
These limits match the expressions stated in Theorem 3, so the ING posterior has the same weak–prior limit \(\Pi_0\) as the conjugate Normal–Gamma posterior, with convergence of the first two moments. \(\square\)
This subsection records the Gaussian hyperparameters that
Prior_Setup() passes into dGamma() and
rGamma_reg() when the coefficient vector is fixed at the
default blend \(\beta^{+}\). The
implementation uses the fields shape,
rate_gamma, and coefficients returned by
compute_gaussian_prior(); the sampler then pairs a Gamma
prior on \(\tau = 1/\sigma^{2}\) with
the weighted Gaussian likelihood evaluated at \(\beta^{+}\).
Let \[
n_w = \sum_i w_i,\qquad p = \mathrm{ncol}(X),\qquad
n_{\mathrm{prior}} = \frac{\mathrm{pwt}}{1-\mathrm{pwt}}\,n_w,
\] and let \(k=1\) be the
package default in compute_gaussian_prior().
Define the blended coefficient vector \[ \beta^{+} = (1-\mathrm{pwt})\,\hat\beta + \mathrm{pwt}\,\mu, \] and the corresponding weighted residual sum of squares \[ \mathrm{RSS}_w(\beta^{+}) = \sum_i w_i\,(y_i - x_i^\top\beta^{+})^2. \]
The prior supplied to dGamma() is a Gamma distribution
on the precision \(\tau\):
Shape parameter \[ a_0 = \frac{n_{\mathrm{prior}} + k}{2}. \]
Rate parameter \[ b_{0,y} = \frac{n_{\mathrm{prior}} + k + p - 2}{2}\; \frac{\mathrm{RSS}_w(\beta^{+})}{\,n_w - p\,}, \qquad n_w > p. \]
This matches the structure used internally by
compute_gaussian_prior(): the factor \((n_{\mathrm{prior}} + k + p - 2)/(n_w -
p)\) is the same multiplier that appears in the default
rate_gamma, with \(\mathrm{RSS}_w(\beta^{+})\) supplying the
residual sum of squares at the blended coefficient.
With the weighted Gaussian likelihood \[ L(y\mid \beta^{+},\tau) \;\propto\; \tau^{n_w/2}\exp\!\left(-\frac{\tau}{2}\,\mathrm{RSS}_w(\beta^{+})\right), \] and the prior \(\tau\sim\Gamma(a_0,b_{0,y})\), the posterior is again Gamma:
Posterior shape \[ a_n = a_0 + \frac{n_w}{2} = \frac{n_{\mathrm{prior}} + k + n_w}{2}. \]
Posterior rate \[ b_n = b_{0,y} + \frac{1}{2}\,\mathrm{RSS}_w(\beta^{+}) = \frac{n_{\mathrm{prior}} + k + n_w - 2}{2}\; \frac{\mathrm{RSS}_w(\beta^{+})}{\,n_w - p\,}. \]
For \(a_n > 1\), \[ E[\sigma^2 \mid y, \beta^{+}] = \frac{b_n}{a_n - 1} = \frac{\mathrm{RSS}_w(\beta^{+})}{\,n_w - p\,}, \] the usual weighted residual‑variance estimator evaluated at \(\beta^{+}\).
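The fixed-\(\beta\) calculation can be traced with a few lines of R (simulated data; this mirrors the formulas above, not the rGamma_reg() internals):

# Gamma prior/posterior for tau at the blended coefficient beta_plus
# (simulated data; mirrors the formulas above, not package internals).
set.seed(6)
n <- 100
X <- cbind(1, rnorm(n)); w <- runif(n, 0.5, 2)
y <- drop(X %*% c(1, 0.5)) + rnorm(n, sd = sqrt(1 / w))
G <- t(X) %*% (w * X)
betahat <- drop(solve(G, t(X) %*% (w * y)))
mu <- c(0, 0); pwt <- 0.2; k <- 1
n_w <- sum(w); p <- ncol(X)
n_prior <- pwt / (1 - pwt) * n_w
beta_plus <- (1 - pwt) * betahat + pwt * mu
RSS_plus <- sum(w * (y - drop(X %*% beta_plus))^2)
a0 <- (n_prior + k) / 2
b0y <- (n_prior + k + p - 2) / 2 * RSS_plus / (n_w - p)
a_n <- a0 + n_w / 2
b_n <- b0y + RSS_plus / 2
c(E_sigma2 = b_n / (a_n - 1), check = RSS_plus / (n_w - p))   # equal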
This completes the description of the fixed-\(\beta\) Gamma prior used by
dGamma() and rGamma_reg().
This appendix collects the analytical components required to
establish Theorem 3.
Theorems 1 and 2 follow directly from conjugate Normal–Gamma algebra and
the Zellner‑type calibration; only the Independent Normal–Gamma (ING)
case requires additional work.
The purpose of this appendix is therefore to isolate the technical
machinery needed to show that the ING posterior converges to the same
weak‑prior limit \(\Pi_0\) as the
conjugate Normal–Gamma posterior.
The argument proceeds through five steps, developed in the subsections that follow.
Each subsection states the required intermediate results and provides the structural components of the proof, while detailed algebraic derivations are deferred to the appropriate claims and lemmas.
Let \(\mathrm{RSS}_w(\beta) := \sum_i w_i\,(y_i - x_i^{\mathsf T}\beta)^2\) denote the weighted residual sum of squares at \(\beta\), and let \(n_w=\sum_i w_i\).
The weighted Gaussian likelihood can be written as
\[ L(y \mid \beta,\tau) \propto \tau^{n_w/2} \exp\!\left( -\frac{\tau}{2}\,\mathrm{RSS}_w(\beta) \right). \]
This representation is shared by both the NG and ING posterior paths.
To compare the ING and NG posterior paths, we first record their correct prior kernels.
For each \(n_{\mathrm{prior}} > 0\), \[ \beta \mid \tau \sim N\!\left(\mu,\;\tau^{-1}\Sigma_0\right), \qquad \tau \sim \Gamma\!\left(a_0(n_{\mathrm{prior}}),\, b_0(n_{\mathrm{prior}})\right), \] where the dispersion–free Zellner matrix is \[ \Sigma_0 = \frac{1-pwt}{pwt}\,(X^\top W_{\mathrm{obs}}X)^{-1}. \]
Thus the NG prior kernel is \[ \pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}(\beta,\tau) \propto \tau^{a_0(n_{\mathrm{prior}})+p/2-1} \exp\!\left( -b_0(n_{\mathrm{prior}})\tau -\frac{\tau}{2}(\beta-\mu)^\top\Sigma_0^{-1}(\beta-\mu) \right). \]
The ING prior uses a fixed coefficient–scale covariance and a Gamma shape shifted by \(p/2\): \[ \beta \mid n_{\mathrm{prior}} \sim N\!\left(\mu,\;\Sigma(n_{\mathrm{prior}})\right), \qquad \tau \sim \Gamma\!\left(a_0(n_{\mathrm{prior}})+\tfrac{p}{2},\; b_0(n_{\mathrm{prior}})\right), \] with \(\beta\) and \(\tau\) independent, and \[ \Sigma(n_{\mathrm{prior}}) = \frac{n_w}{n_{\mathrm{prior}}}\, \frac{\mathrm{Smarg}}{n_w - p}\, (X^\top W_{\mathrm{obs}}X)^{-1}. \]
Thus the ING prior kernel is \[ \pi^{(\mathrm{ING})}_{n_{\mathrm{prior}}}(\beta,\tau) \propto \tau^{a_0(n_{\mathrm{prior}})+p/2-1} \exp\!\left( -b_0(n_{\mathrm{prior}})\tau \right) \exp\!\left( -\frac{1}{2}(\beta-\mu)^\top\Sigma(n_{\mathrm{prior}})^{-1}(\beta-\mu) \right). \]
Define \[ R_{n_{\mathrm{prior}}}(\beta,\tau) = \frac{ \pi^{(\mathrm{ING})}_{n_{\mathrm{prior}}}(\beta,\tau) }{ \pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}(\beta,\tau) }. \]
Because the ING Gamma shape equals the NG Gamma shape plus \(p/2\), the \(\tau\)-powers match and cancel. The ratio therefore reduces to \[ R_{n_{\mathrm{prior}}}(\beta,\tau) = C_{n_{\mathrm{prior}}}\, \exp\!\left( -\frac{1}{2}(\beta-\mu)^\top \bigl[\Sigma(n_{\mathrm{prior}})^{-1} -\tau\,\Sigma_0^{-1}\bigr] (\beta-\mu) \right), \] where \(C_{n_{\mathrm{prior}}}\) absorbs all \(\tau\)-free constants.
The ING posterior is a reweighted NG posterior: \[ \Pi^{(\mathrm{ING})}_{n_{\mathrm{prior}}}(\mathrm{d}\beta,\mathrm{d}\tau) \propto R_{n_{\mathrm{prior}}}(\beta,\tau)\, \Pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}(\mathrm{d}\beta,\mathrm{d}\tau). \]
This identity is the starting point for Lemma B and the ING weak‑prior limit.
Lemma A (uniform moment bounds along the NG posterior path)

Assume:

1. \(X^{\mathsf T}W_{\mathrm{obs}}X\) is positive definite,
2. \(n_w > p\),
3. \(\mathrm{RSS}_w > 0\),
4. \(k \ge 0\),
5. \(k + p \ge 2\).
Fix \(X, W_{\mathrm{obs}}, y, \mu\),
and hence \(\hat\beta\), \(\mathrm{RSS}_w\), \(S_{\mathrm{marg}}\), and \(G\).
For each \(n_{\mathrm{prior}} > 0\),
let \(\Pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}\)
be the conjugate Normal–Gamma posterior from Section 3.3.2, with
hyperparameters
\[ \mu_{\mathrm{post}}(n_{\mathrm{prior}}), \quad \Sigma_{0,\mathrm{post}}(n_{\mathrm{prior}}), \quad a_n(n_{\mathrm{prior}}), \quad b_n(n_{\mathrm{prior}}). \]
With the \(k\)-generalized calibration, these are: \[ a_n(n_{\mathrm{prior}})=\frac{n_{\mathrm{prior}}+k+n_w}{2}, \qquad b_n(n_{\mathrm{prior}})=\frac{n_{\mathrm{prior}}+k+n_w-2}{2}\frac{S_{\mathrm{marg}}}{n_w-p}. \]
Then there exists \(\delta > 0\) and constants \(C_1, C_2 < \infty\) such that for all \(0 < n_{\mathrm{prior}} < \delta\),
\[ \mathbb{E}_{\Pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}} \bigl[\|\beta\|\bigr] \le C_1, \qquad \mathbb{E}_{\Pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}} \bigl[\|\beta\|^2\bigr] \le C_2, \]
and
\[ \sup_{0 < n_{\mathrm{prior}} < \delta} \mathbb{E}_{\Pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}}[\tau] < \infty, \qquad \sup_{0 < n_{\mathrm{prior}} < \delta} \mathbb{E}_{\Pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}}[\tau^2] < \infty. \]
Claim A.1 (continuity and compactness of the NG hyperparameters). Under Assumptions 1–5 of Theorem 3, the NG hyperparameters are continuous in \(n_{\mathrm{prior}}\) on \((0,\infty)\) and converge to finite limits as \(n_{\mathrm{prior}}\to 0^{+}\), with the limiting Gamma shape and rate strictly positive.
In particular, there exists \(\delta > 0\) such that for all \(0 < n_{\mathrm{prior}} < \delta\), the four hyperparameters lie in compact subsets of their respective spaces.
Proof of Claim A.1.
By Theorem 1 and the prior setup, the NG hyperparameters can be written explicitly as functions of \(n_{\mathrm{prior}} > 0\):
\[ \mu_{\mathrm{post}}(n_{\mathrm{prior}}) = \frac{n_{\mathrm{prior}}}{n_{\mathrm{prior}}+n_w}\,\mu + \frac{n_w}{n_{\mathrm{prior}}+n_w}\,\hat\beta, \]
\[ \Sigma_{0,\mathrm{post}}(n_{\mathrm{prior}}) = \frac{n_w}{n_{\mathrm{prior}}+n_w}\,G^{-1}, \]
\[ a_n(n_{\mathrm{prior}}) = \frac{n_{\mathrm{prior}} + k + n_w}{2},\qquad b_n(n_{\mathrm{prior}}) = \frac{1}{2}\bigl(n_{\mathrm{prior}} + k + n_w - 2\bigr)\, \frac{S_{\mathrm{marg}}}{n_w-p}. \]
Here \(\mu, \hat\beta, G, S_{\mathrm{marg}}, n_w, p\) are fixed and do not depend on \(n_{\mathrm{prior}}\).
Each of these maps is a rational (in fact affine) function of \(n_{\mathrm{prior}}\) with denominator \(n_{\mathrm{prior}}+n_w > 0\), so all four are continuous on \((0,\infty)\). Assumption 1 (\(G\) positive definite) ensures that \(G^{-1}\) exists and is finite, so \(\Sigma_{0,\mathrm{post}}(n_{\mathrm{prior}})\) is well defined for all \(n_{\mathrm{prior}}>0\). Assumption 2 (\(n_w > p\)) implies \(n_w-p>0\), so the denominator in \(b_n(n_{\mathrm{prior}})\) is positive. Assumption 3 (\(\mathrm{RSS}_w>0\)) implies \(S_{\mathrm{marg}}>0\), so the rate \(b_n(n_{\mathrm{prior}})\) is strictly positive for all \(n_{\mathrm{prior}}>0\).
Taking the limit \(n_{\mathrm{prior}} \to 0^{+}\) in the explicit formulas gives
\[ \mu_{\mathrm{post}}(n_{\mathrm{prior}}) \to \hat\beta,\qquad \Sigma_{0,\mathrm{post}}(n_{\mathrm{prior}}) \to G^{-1}, \]
\[ a_n(n_{\mathrm{prior}}) \to \frac{k+n_w}{2} > 0,\qquad b_n(n_{\mathrm{prior}}) \to \frac{1}{2}\frac{k+n_w-2}{n_w-p}\,S_{\mathrm{marg}} > 0, \]
where the strict positivity of the limits of \(a_n\) and \(b_n\) uses Assumptions 2–3 together with Assumptions 4–5 (\(k\ge0\) and \(k+p\ge2\)), since \(k+p\ge2\) and \(n_w>p\) imply \(k+n_w>2\).
Since each map is continuous on \((0,\infty)\) and has a finite limit as \(n_{\mathrm{prior}} \to 0^{+}\), there exists \(\delta > 0\) such that, for all \(0 < n_{\mathrm{prior}} < \delta\),
This is exactly the continuity and compactness statement of Claim A.1. \(\square\)
For each \(n_{\mathrm{prior}} > 0\), Theorem 1 gives
\(\beta \mid \tau, y, n_{\mathrm{prior}} \sim N\bigl(\mu_{\mathrm{post}}(n_{\mathrm{prior}}), \tau^{-1}\Sigma_{0,\mathrm{post}}(n_{\mathrm{prior}})\bigr)\),
\(\tau \mid y, n_{\mathrm{prior}} \sim \Gamma\bigl(a_n(n_{\mathrm{prior}}), b_n(n_{\mathrm{prior}})\bigr)\).
By Claim A.1, there exists \(\delta > 0\) such that for all \(0 < n_{\mathrm{prior}} < \delta\) the hyperparameters \(\mu_{\mathrm{post}}(n_{\mathrm{prior}})\), \(\Sigma_{0,\mathrm{post}}(n_{\mathrm{prior}})\), \(a_n(n_{\mathrm{prior}})\), and \(b_n(n_{\mathrm{prior}})\) lie in compact subsets of their respective spaces.
These compactness properties rely on Assumptions 4–5 (\(k\ge0\), \(k+p\ge2\)) together with Assumptions 2–3 (\(n_w>p\), \(\mathrm{RSS}_w>0\)), which ensure that the limiting Gamma shape and rate are strictly positive and therefore bounded away from zero.
For each \(n_{\mathrm{prior}}\),
\(\tau \mid y, n_{\mathrm{prior}} \sim
\Gamma(a_n(n_{\mathrm{prior}}), b_n(n_{\mathrm{prior}}))\),
so
\[ \mathbb{E}[\tau \mid y, n_{\mathrm{prior}}] = \frac{a_n(n_{\mathrm{prior}})}{b_n(n_{\mathrm{prior}})},\qquad \mathbb{E}[\tau^2 \mid y, n_{\mathrm{prior}}] = \frac{a_n(n_{\mathrm{prior}})\bigl(a_n(n_{\mathrm{prior}})+1\bigr)} {b_n(n_{\mathrm{prior}})^2}. \]
On \((0,\delta)\), both \(a_n(n_{\mathrm{prior}})\) and \(b_n(n_{\mathrm{prior}})\) stay in compact subsets of \((0,\infty)\) by Claim A.1, which uses Assumptions 4–5 to ensure positivity of the limiting Gamma parameters. Thus the maps
\[ n_{\mathrm{prior}} \mapsto \frac{a_n(n_{\mathrm{prior}})}{b_n(n_{\mathrm{prior}})},\qquad n_{\mathrm{prior}} \mapsto \frac{a_n(n_{\mathrm{prior}})\bigl(a_n(n_{\mathrm{prior}})+1\bigr)} {b_n(n_{\mathrm{prior}})^2} \]
are continuous and bounded on \((0,\delta)\). Hence
\[ \sup_{0 < n_{\mathrm{prior}} < \delta} \mathbb{E}_{\Pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}}[\tau] < \infty,\qquad \sup_{0 < n_{\mathrm{prior}} < \delta} \mathbb{E}_{\Pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}}[\tau^2] < \infty. \]
The marginal distribution of \(\beta \mid y, n_{\mathrm{prior}}\) under \(\Pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}\) has
\[ \mathbb{E}[\beta \mid y, n_{\mathrm{prior}}] = \mu_{\mathrm{post}}(n_{\mathrm{prior}}), \]
and
\[ \mathrm{Cov}(\beta \mid y, n_{\mathrm{prior}}) = \mathbb{E}[\sigma^2 \mid y, n_{\mathrm{prior}}]\, \Sigma_{0,\mathrm{post}}(n_{\mathrm{prior}}), \]
where \(\sigma^2 = 1/\tau\) and, for \(a_n(n_{\mathrm{prior}}) > 1\),
\[ \mathbb{E}[\sigma^2 \mid y, n_{\mathrm{prior}}] = \frac{b_n(n_{\mathrm{prior}})}{a_n(n_{\mathrm{prior}})-1}. \]
By Claim A.1,
\(a_n(n_{\mathrm{prior}}) \to (k+n_w)/2 >
0\) as \(n_{\mathrm{prior}} \to
0^{+}\).
Assumptions 4–5 ensure that \((k+n_w)/2>1\) because \(k+p\ge2\) and \(n_w>p\) imply \(k+n_w>2\).
Shrinking \(\delta\) if necessary, we
may therefore assume \(a_n(n_{\mathrm{prior}})
> 1\) for all \(0 <
n_{\mathrm{prior}} < \delta\).
On this interval, \(a_n(n_{\mathrm{prior}})\) and \(b_n(n_{\mathrm{prior}})\) lie in compact subsets of \((0,\infty)\), so
\[ n_{\mathrm{prior}} \mapsto \frac{b_n(n_{\mathrm{prior}})}{a_n(n_{\mathrm{prior}})-1} \]
is continuous and bounded on \((0,\delta)\).
By Claim A.1, \(\Sigma_{0,\mathrm{post}}(n_{\mathrm{prior}})\)
lies in a compact subset of the positive definite matrices, so its
operator norm and trace are bounded on \((0,\delta)\).
Therefore
\[ \sup_{0 < n_{\mathrm{prior}} < \delta} \mathrm{tr}\,\mathrm{Cov}(\beta \mid y, n_{\mathrm{prior}}) < \infty. \]
Now
\[ \mathbb{E}\bigl[\|\beta\|^2 \mid y, n_{\mathrm{prior}}\bigr] = \bigl\|\mathbb{E}[\beta \mid y, n_{\mathrm{prior}}]\bigr\|^2 + \mathrm{tr}\,\mathrm{Cov}(\beta \mid y, n_{\mathrm{prior}}). \]
By Claim A.1, \(\mu_{\mathrm{post}}(n_{\mathrm{prior}})\) lies in a compact subset of \(\mathbb{R}^p\) for \(0 < n_{\mathrm{prior}} < \delta\), so \(\|\mu_{\mathrm{post}}(n_{\mathrm{prior}})\|\) is bounded on \((0,\delta)\). Combined with the bound on \(\mathrm{tr}\,\mathrm{Cov}(\beta \mid y, n_{\mathrm{prior}})\), this implies
\[ \sup_{0 < n_{\mathrm{prior}} < \delta} \mathbb{E}_{\Pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}} \bigl[\|\beta\|^2\bigr] < \infty. \]
Define
\[ C_2 := \sup_{0 < n_{\mathrm{prior}} < \delta} \mathbb{E}_{\Pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}} \bigl[\|\beta\|^2\bigr] < \infty. \]
Finally, by Cauchy–Schwarz,
\[ \mathbb{E}_{\Pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}} \bigl[\|\beta\|\bigr] \le \Bigl( \mathbb{E}_{\Pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}} \bigl[\|\beta\|^2\bigr] \Bigr)^{1/2} \le \sqrt{C_2} =: C_1. \]
This proves Lemma A.
Lemma B (Ratio convergence and domination)
Let \(R_{n_{\mathrm{prior}}}(\beta,\tau)\) be the posterior density ratio \[ R_{n_{\mathrm{prior}}}(\beta,\tau) = \frac{\pi^{(\mathrm{ING})}_{n_{\mathrm{prior}}}(\beta,\tau\mid y)} {\pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}(\beta,\tau\mid y)}. \]
Under the assumptions of Theorem 3:
For each fixed \((\beta,\tau)\), \[ R_{n_{\mathrm{prior}}}(\beta,\tau)\to 1 \quad\text{as }n_{\mathrm{prior}}\to 0^+. \]
There exists a measurable envelope \(M(\beta,\tau)\) such that \[ \sup_{0<n_{\mathrm{prior}}<\delta} |R_{n_{\mathrm{prior}}}(\beta,\tau)| \le M(\beta,\tau), \qquad \mathbb{E}_{\Pi_0}[M(\beta,\tau)]<\infty. \]
Claim B.1 (prior-kernel ratio). For each \(n_{\mathrm{prior}} > 0\), let \[ \tilde R_{n_{\mathrm{prior}}}(\beta,\tau) := \frac{ \pi^{(\mathrm{ING})}_{n_{\mathrm{prior}}}(\beta,\tau) }{ \pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}(\beta,\tau) } \] be the ratio of the ING and NG prior kernels defined in Section A.2.

Then \(\tilde R_{n_{\mathrm{prior}}}\) can be written in the form \[ \tilde R_{n_{\mathrm{prior}}}(\beta,\tau) = h_{n_{\mathrm{prior}}}(\beta)\, \tau^{c_p}\, \exp\!\bigl(-\tfrac{1}{2}\tau\,q_{n_{\mathrm{prior}}}(\beta)\bigr), \] where \(h_{n_{\mathrm{prior}}}(\beta)\to 1\) for each fixed \(\beta\) as \(n_{\mathrm{prior}}\to 0^{+}\), \(c_p\) is a constant exponent, and \(q_{n_{\mathrm{prior}}}(\beta)\) is a quadratic form in \(\beta\) that does not depend on \(\tau\).
Proof.
From Section A.2, the NG and ING prior kernels are \[ \pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}(\beta,\tau) \propto \tau^{a_0(n_{\mathrm{prior}})+p/2-1} \exp\!\left( -b_0(n_{\mathrm{prior}})\tau -\frac{\tau}{2}(\beta-\mu)^\top\Sigma_0^{-1}(\beta-\mu) \right), \] \[ \pi^{(\mathrm{ING})}_{n_{\mathrm{prior}}}(\beta,\tau) \propto \tau^{a_0(n_{\mathrm{prior}})+p/2-1} \exp\!\left( -b_0(n_{\mathrm{prior}})\tau \right) \exp\!\left( -\frac{1}{2}(\beta-\mu)^\top\Sigma(n_{\mathrm{prior}})^{-1}(\beta-\mu) \right), \] with \[ \Sigma_0 = \frac{1-pwt}{pwt}\,(X^\top W_{\mathrm{obs}}X)^{-1}, \qquad \Sigma(n_{\mathrm{prior}}) = \frac{n_w}{n_{\mathrm{prior}}}\, \frac{\mathrm{Smarg}}{n_w - p}\, (X^\top W_{\mathrm{obs}}X)^{-1}. \]
Assumption 1 ensures \(X^\top W_{\mathrm{obs}}X\) is positive definite, so both \(\Sigma_0^{-1}\) and \(\Sigma(n_{\mathrm{prior}})^{-1}\) are well defined. Assumptions 2–3 (\(n_w>p\), \(\mathrm{RSS}_w>0\)) ensure that the scalar multipliers in \(\Sigma(n_{\mathrm{prior}})\) are positive. Assumptions 4–5 (\(k\ge0\), \(k+p\ge2\)) ensure that the Gamma shapes and rates used in the kernels are strictly positive for all \(n_{\mathrm{prior}}>0\).
The \(\tau\)-powers match (ING shape = NG shape \(+\;p/2\)), so \[ \tilde R_{n_{\mathrm{prior}}}(\beta,\tau) = \frac{ \pi^{(\mathrm{ING})}_{n_{\mathrm{prior}}}(\beta,\tau) }{ \pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}(\beta,\tau) } = \exp\!\left( -\frac{1}{2}(\beta-\mu)^\top\Sigma(n_{\mathrm{prior}})^{-1}(\beta-\mu) +\frac{\tau}{2}(\beta-\mu)^\top\Sigma_0^{-1}(\beta-\mu) \right). \]
Define \[ q_{n_{\mathrm{prior}}}(\beta) := -(\beta-\mu)^\top\Sigma_0^{-1}(\beta-\mu), \qquad c_p := 0, \] and \[ h_{n_{\mathrm{prior}}}(\beta) := \exp\!\left( -\frac{1}{2}(\beta-\mu)^\top\Sigma(n_{\mathrm{prior}})^{-1}(\beta-\mu) \right). \]
Then \[ \tilde R_{n_{\mathrm{prior}}}(\beta,\tau) = h_{n_{\mathrm{prior}}}(\beta)\, \tau^{c_p}\, \exp\!\bigl(-\tfrac{1}{2}\tau\,q_{n_{\mathrm{prior}}}(\beta)\bigr), \] with \(q_{n_{\mathrm{prior}}}(\beta)\) a quadratic form in \(\beta\) that does not depend on \(\tau\) and, in fact, does not depend on \(n_{\mathrm{prior}}\) at all. It is therefore continuous in \(n_{\mathrm{prior}}\) and has a finite limit as \(n_{\mathrm{prior}}\to 0^+\).
Using the explicit formula for \(\Sigma(n_{\mathrm{prior}})\), \[ \Sigma(n_{\mathrm{prior}})^{-1} = \frac{n_{\mathrm{prior}}}{n_w}\, \frac{n_w - p}{\mathrm{Smarg}}\, (X^\top W_{\mathrm{obs}}X), \] we see that \(\Sigma(n_{\mathrm{prior}})^{-1}\to 0\) as \(n_{\mathrm{prior}}\to 0^+\). This uses Assumptions 2–3 to ensure the scalar prefactor is positive. Hence, for each fixed \(\beta\), \[ h_{n_{\mathrm{prior}}}(\beta) = \exp\!\left( -\frac{1}{2}(\beta-\mu)^\top\Sigma(n_{\mathrm{prior}})^{-1}(\beta-\mu) \right) \longrightarrow \exp(0) = 1. \]
This proves the claimed representation of \(\tilde R_{n_{\mathrm{prior}}}\) and the pointwise convergence \(h_{n_{\mathrm{prior}}}(\beta)\to 1\). \(\square\)
Claim B.2 (envelope and integrability). Under Assumptions 1–3 and for \(0 < n_{\mathrm{prior}} < \delta\) as in Claim A.1, there exist constants \(C, c_1, c_2, c_3 > 0\) and a measurable function
\[ M(\beta,\tau) = C\,(1 + \tau^{c_1})\,\exp(-c_2 \tau)\,\exp\bigl(c_3 \|\beta\|^2\bigr) \]
such that
\[ \bigl|R_{n_{\mathrm{prior}}}(\beta,\tau)\bigr| \le M(\beta,\tau) \quad\text{for all }(\beta,\tau)\text{ and }0 < n_{\mathrm{prior}} < \delta, \]
and
\[ \int M(\beta,\tau)\,\Pi_0(\mathrm{d}\beta,\mathrm{d}\tau) < \infty. \]
Proof of Claim B.2.
Recall \[ R_{n_{\mathrm{prior}}}(\beta,\tau) = \frac{ \pi^{(\mathrm{ING})}_{n_{\mathrm{prior}}}(\beta,\tau\mid y) }{ \pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}(\beta,\tau\mid y) } = \tilde R_{n_{\mathrm{prior}}}(\beta,\tau)\, \frac{ Z^{(\mathrm{NG})}_{n_{\mathrm{prior}}} }{ Z^{(\mathrm{ING})}_{n_{\mathrm{prior}}} }, \] with \(\tilde R_{n_{\mathrm{prior}}}\) the prior–kernel ratio from Claim B.1 and \[ Z^{(\mathrm{NG})}_{n_{\mathrm{prior}}} = \iint L(y\mid\beta,\tau)\,\pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}(\beta,\tau)\,\mathrm{d}\beta\,\mathrm{d}\tau, \quad Z^{(\mathrm{ING})}_{n_{\mathrm{prior}}} = \iint L(y\mid\beta,\tau)\,\pi^{(\mathrm{ING})}_{n_{\mathrm{prior}}}(\beta,\tau)\,\mathrm{d}\beta\,\mathrm{d}\tau. \]
From Claim B.1 and the explicit formulas in A.2, \[ \log \tilde R_{n_{\mathrm{prior}}}(\beta,\tau) = -\tfrac12(\beta-\mu)^\top\Sigma(n_{\mathrm{prior}})^{-1}(\beta-\mu) +\tfrac12\tau\,(\beta-\mu)^\top\Sigma_0^{-1}(\beta-\mu). \]
Assumption 1 ensures \(X^\top W_{\mathrm{obs}}X\) is positive definite, so both \(\Sigma_0^{-1}\) and \(\Sigma(n_{\mathrm{prior}})^{-1}\) exist. Assumptions 2–3 (\(n_w>p\), \(\mathrm{RSS}_w>0\)) ensure the scalar multipliers in \(\Sigma(n_{\mathrm{prior}})\) are positive. Assumptions 4–5 (\(k\ge0\), \(k+p\ge2\)) ensure the Gamma shapes and rates used in the kernels are strictly positive.
For \(0<n_{\mathrm{prior}}<\delta\), Claim A.1 implies that the operator norms of \(\Sigma_0^{-1}\) and \(\Sigma(n_{\mathrm{prior}})^{-1}\) are uniformly bounded. Hence there exists \(C>0\) such that \[ \bigl|\log \tilde R_{n_{\mathrm{prior}}}(\beta,\tau)\bigr| \le C\,(1+\tau)\,\|\beta-\mu\|^2 \le C'\,(1+\tau)\,(1+\|\beta\|^2) \] for all \((\beta,\tau)\) and \(0<n_{\mathrm{prior}}<\delta\). Exponentiating and absorbing constants, \[ \bigl|\tilde R_{n_{\mathrm{prior}}}(\beta,\tau)\bigr| \le C_0\,(1+\tau^{c_1})\,\exp(-c_2\tau)\,\exp\bigl(c_3\|\beta\|^2\bigr) =: M_0(\beta,\tau), \] for suitable \(C_0,c_1,c_2,c_3>0\) independent of \(n_{\mathrm{prior}}\). This gives the desired functional form for an envelope of \(\tilde R_{n_{\mathrm{prior}}}\).
The maps \[ n_{\mathrm{prior}}\mapsto Z^{(\mathrm{NG})}_{n_{\mathrm{prior}}}, \qquad n_{\mathrm{prior}}\mapsto Z^{(\mathrm{ING})}_{n_{\mathrm{prior}}} \] are continuous on \((0,\delta)\) because the integrands depend continuously on \(n_{\mathrm{prior}}\) and are dominated by an integrable envelope. The likelihood \(L(y\mid\beta,\tau)\) times the NG prior kernel, together with the uniform moment bounds from Lemma A (which rely on Assumptions 2–5 to ensure the Gamma parameters remain in compact subsets of \((0,\infty)\)), provide such domination.
In particular, both \(Z^{(\mathrm{NG})}_{n_{\mathrm{prior}}}\) and \(Z^{(\mathrm{ING})}_{n_{\mathrm{prior}}}\) stay in compact subsets of \((0,\infty)\) for \(0<n_{\mathrm{prior}}<\delta\), so there exists \(K>0\) such that \[ \frac{ Z^{(\mathrm{NG})}_{n_{\mathrm{prior}}} }{ Z^{(\mathrm{ING})}_{n_{\mathrm{prior}}} } \in [K^{-1},K] \quad\text{for }0<n_{\mathrm{prior}}<\delta. \]
Combining Steps 1–2, \[ \bigl|R_{n_{\mathrm{prior}}}(\beta,\tau)\bigr| \le K\,M_0(\beta,\tau) = C\,(1+\tau^{c_1})\,\exp(-c_2\tau)\,\exp\bigl(c_3\|\beta\|^2\bigr) =: M(\beta,\tau), \] for all \((\beta,\tau)\) and \(0<n_{\mathrm{prior}}<\delta\), with \(C=K C_0\).
Under the limiting NG law \(\Pi_0\) from Theorem 2, \(\tau\) has a Gamma distribution with shape \(a_0>1\) and rate \(b_0>0\), and \(\beta\mid\tau\) is Gaussian with covariance proportional to \(\tau^{-1}G^{-1}\). Assumptions 2–5 ensure these limiting parameters are strictly positive. For \(c_2>0\) small enough and \(c_3>0\) small enough, all mixed moments \(\mathbb{E}_{\Pi_0}[\tau^{k}\exp(c_3\|\beta\|^2)]\) with \(k\le c_1\) are finite, so \[ \int M(\beta,\tau)\,\Pi_0(\mathrm{d}\beta,\mathrm{d}\tau)<\infty. \]
This establishes the claimed envelope and integrability, proving Claim B.2. \(\square\)
Proof of Lemma B.
Write both posteriors as \[ \pi^{(\cdot)}_{n_{\mathrm{prior}}}(\beta,\tau\mid y) \propto L(y\mid\beta,\tau)\, \pi^{(\cdot)}_{n_{\mathrm{prior}}}(\beta,\tau), \] with the common Gaussian likelihood \(L(y\mid\beta,\tau)\) from Section A.1. The likelihood cancels in the posterior ratio, so \[ R_{n_{\mathrm{prior}}}(\beta,\tau) = \frac{ \pi^{(\mathrm{ING})}_{n_{\mathrm{prior}}}(\beta,\tau\mid y) }{ \pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}(\beta,\tau\mid y) } = \frac{ \pi^{(\mathrm{ING})}_{n_{\mathrm{prior}}}(\beta,\tau) }{ \pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}(\beta,\tau) } \cdot \frac{ Z^{(\mathrm{NG})}_{n_{\mathrm{prior}}} }{ Z^{(\mathrm{ING})}_{n_{\mathrm{prior}}} } = \tilde R_{n_{\mathrm{prior}}}(\beta,\tau)\, \frac{ Z^{(\mathrm{NG})}_{n_{\mathrm{prior}}} }{ Z^{(\mathrm{ING})}_{n_{\mathrm{prior}}} }, \] where \(\tilde R_{n_{\mathrm{prior}}}\) is the prior–kernel ratio from Claim B.1 and \[ Z^{(\mathrm{NG})}_{n_{\mathrm{prior}}} = \iint L\,\pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}},\qquad Z^{(\mathrm{ING})}_{n_{\mathrm{prior}}} = \iint L\,\pi^{(\mathrm{ING})}_{n_{\mathrm{prior}}}. \]
By Claim B.1, \[ \tilde R_{n_{\mathrm{prior}}}(\beta,\tau) = h_{n_{\mathrm{prior}}}(\beta)\,\tau^{c_p} \exp\!\bigl(-\tfrac12\tau\,q_{n_{\mathrm{prior}}}(\beta)\bigr), \] with \(h_{n_{\mathrm{prior}}}(\beta)\to 1\) for each fixed \(\beta\) and \(q_{n_{\mathrm{prior}}}(\beta)\) a quadratic form whose coefficients are continuous in \(n_{\mathrm{prior}}\) and converge pointwise. Assumption 1 ensures \(X^\top W_{\mathrm{obs}}X\) is positive definite, so \(\Sigma_0^{-1}\) and \(\Sigma(n_{\mathrm{prior}})^{-1}\) are well defined. Assumptions 2–3 (\(n_w>p\), \(\mathrm{RSS}_w>0\)) ensure the scalar multipliers in \(\Sigma(n_{\mathrm{prior}})\) are positive. Assumptions 4–5 (\(k\ge0\), \(k+p\ge2\)) ensure the Gamma shapes and rates in the kernels are strictly positive.
In our explicit construction, \(q_{n_{\mathrm{prior}}}(\beta)\) does not depend on \(n_{\mathrm{prior}}\) at all, and \[ h_{n_{\mathrm{prior}}}(\beta) = \exp\!\left( -\tfrac12(\beta-\mu)^\top\Sigma(n_{\mathrm{prior}})^{-1}(\beta-\mu) \right) \to 1 \] because \(\Sigma(n_{\mathrm{prior}})^{-1}\to 0\) as \(n_{\mathrm{prior}}\to 0^+\). Thus, for each fixed \((\beta,\tau)\), \[ \tilde R_{n_{\mathrm{prior}}}(\beta,\tau)\to 1. \]
Next, write the normalizing–constant ratio as \[ \frac{ Z^{(\mathrm{NG})}_{n_{\mathrm{prior}}} }{ Z^{(\mathrm{ING})}_{n_{\mathrm{prior}}} } = \frac{ \iint L\,\pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}} }{ \iint L\,\pi^{(\mathrm{ING})}_{n_{\mathrm{prior}}} } = \frac{ \iint L\,\pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}} }{ \iint L\,\tilde R_{n_{\mathrm{prior}}}\,\pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}} } = \frac{1}{ \mathbb{E}_{\Pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}}\!\bigl[\tilde R_{n_{\mathrm{prior}}}(\beta,\tau)\bigr] }. \]
Claim B.2 provides a measurable envelope \(M(\beta,\tau)\) such that \[ \sup_{0<n_{\mathrm{prior}}<\delta} |R_{n_{\mathrm{prior}}}(\beta,\tau)| \le M(\beta,\tau), \qquad \int M(\beta,\tau)\,\Pi_0(\mathrm{d}\beta,\mathrm{d}\tau)<\infty, \] where \(\Pi_0\) is the NG weak–prior limit from Theorem 2. Assumptions 2–5 ensure that the limiting Gamma parameters of \(\Pi_0\) are strictly positive, which guarantees integrability of the envelope.
In particular, for \(n_{\mathrm{prior}}\) small, the normalizing–constant ratio stays in a bounded interval, so \(|\tilde R_{n_{\mathrm{prior}}}|\) is also dominated by a multiple of \(M\). Together with the pointwise convergence \(\tilde R_{n_{\mathrm{prior}}}\to 1\) and the uniform moment bounds from Lemma A (which rely on Assumptions 2–5), this yields, by dominated convergence, \[ \mathbb{E}_{\Pi^{(\mathrm{NG})}_{n_{\mathrm{prior}}}}\!\bigl[\tilde R_{n_{\mathrm{prior}}}(\beta,\tau)\bigr] \longrightarrow 1, \qquad \frac{ Z^{(\mathrm{NG})}_{n_{\mathrm{prior}}} }{ Z^{(\mathrm{ING})}_{n_{\mathrm{prior}}} } \longrightarrow 1. \]
Finally, \[ R_{n_{\mathrm{prior}}}(\beta,\tau) = \tilde R_{n_{\mathrm{prior}}}(\beta,\tau)\, \frac{ Z^{(\mathrm{NG})}_{n_{\mathrm{prior}}} }{ Z^{(\mathrm{ING})}_{n_{\mathrm{prior}}} } \longrightarrow 1 \quad\text{for each fixed }(\beta,\tau), \] and the same envelope \(M\) from Claim B.2 provides the required domination. This proves Lemma B. \(\square\)
The proof of Theorem 3 reduces to combining Theorem 2, the posterior ratio identity of Section A.2, and Lemmas A and B via dominated convergence.
Only Lemmas A and B require nontrivial work; all other steps follow from standard arguments in posterior convergence theory.
We sketch how each closed‑form expression in Theorem 1 follows from standard Normal–Gamma algebra under the calibration in §3.3.1–3.3.2; see (Raiffa and Schlaifer 1961; Gelman et al. 2013) for the underlying updates.
Start from the prior \[ \beta \mid \tau \sim N\bigl(\mu,\;\tau^{-1}\Sigma_0\bigr), \qquad \tau \sim \Gamma(a_0,b_0), \] and the weighted Gaussian likelihood \[ y \mid \beta,\tau \sim N\bigl(X\beta,\;\tau^{-1}W_{\mathrm{obs}}^{-1}\bigr), \] with \[ G := X^\top W_{\mathrm{obs}}X,\qquad \hat\beta := G^{-1}X^\top W_{\mathrm{obs}}y,\qquad \mathrm{RSS}_w := (y-X\hat\beta)^\top W_{\mathrm{obs}}(y-X\hat\beta). \]
Assumption 1 ensures \(G\) is positive definite, so \(G^{-1}\) exists. Assumption 2 ensures \(n_w>p\), so the weighted Gaussian likelihood is proper. Assumption 3 ensures \(\mathrm{RSS}_w>0\), so the marginal quadratic term is strictly positive.
Under the Zellner calibration in §3.3.2, \[ \Sigma_0 = \frac{1-\mathrm{pwt}}{\mathrm{pwt}}\,G^{-1} = \frac{n_w}{n_{\mathrm{prior}}}\,G^{-1}. \]
and \[ a_0 = \frac{n_{\mathrm{prior}}+k}{2}, \qquad b_0 = \frac{n_{\mathrm{prior}}+k+p-2}{2}\,\frac{\mathrm{Smarg}}{n_w-p}, \]
with \(k\ge0\) and \(k+p\ge2\) by Assumptions 4–5, ensuring \(a_0>0\) and \(b_0>0\).
The joint prior–likelihood kernel in \((\beta,\tau)\) is \[ \pi(\beta,\tau\mid y) \propto \tau^{a_0-1}\exp(-b_0\tau)\, \tau^{p/2}\exp\!\Bigl(-\tfrac{\tau}{2}(\beta-\mu)^\top\Sigma_0^{-1}(\beta-\mu)\Bigr)\, \tau^{n_w/2}\exp\!\Bigl(-\tfrac{\tau}{2}\mathrm{RSS}_w(\beta)\Bigr), \] where \[ \mathrm{RSS}_w(\beta) = \mathrm{RSS}_w + (\beta-\hat\beta)^\top G(\beta-\hat\beta). \]
Collecting powers of \(\tau\) gives the Gamma shape update; collecting quadratic forms in \(\beta\) and completing the square gives the Normal block.
The quadratic form in \(\beta\) is \[ \frac{\tau}{2} \Bigl[ (\beta-\hat\beta)^\top G(\beta-\hat\beta) + (\beta-\mu)^\top\Sigma_0^{-1}(\beta-\mu) \Bigr]. \]
Write \[ G_{\mathrm{post}} := G + \Sigma_0^{-1}, \] and complete the square: \[ (\beta-\hat\beta)^\top G(\beta-\hat\beta) + (\beta-\mu)^\top\Sigma_0^{-1}(\beta-\mu) = (\beta-\mu_{\mathrm{post}})^\top G_{\mathrm{post}}(\beta-\mu_{\mathrm{post}}) + \text{const}, \] with \[ \mu_{\mathrm{post}} = G_{\mathrm{post}}^{-1}\bigl(G\hat\beta + \Sigma_0^{-1}\mu\bigr). \]
Using \(\Sigma_0^{-1} = \frac{\mathrm{pwt}}{1-\mathrm{pwt}}G = \frac{n_{\mathrm{prior}}}{n_w}G\), we have \[ G_{\mathrm{post}} = \Bigl(1+\frac{n_{\mathrm{prior}}}{n_w}\Bigr)G = \frac{n_{\mathrm{prior}}+n_w}{n_w}\,G, \] so \[ G_{\mathrm{post}}^{-1} = \frac{n_w}{n_{\mathrm{prior}}+n_w}\,G^{-1}. \]
Substituting into \(\mu_{\mathrm{post}}\), \[ \mu_{\mathrm{post}} = \frac{n_{\mathrm{prior}}}{n_{\mathrm{prior}}+n_w}\,\mu + \frac{n_w}{n_{\mathrm{prior}}+n_w}\,\hat\beta, \] and the dispersion‑free posterior covariance is \[ \Sigma_{0,\mathrm{post}} = G_{\mathrm{post}}^{-1} = \frac{n_w}{n_{\mathrm{prior}}+n_w}\,G^{-1} = \frac{n_{\mathrm{prior}}}{n_{\mathrm{prior}}+n_w}\,\Sigma_0, \] which matches item (ii) of Theorem 1.
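The completing-the-square update can be verified numerically with the objects defined above. The prior mean `mu` below is an illustrative zero vector; the `all.equal()` check confirms the convex-combination form, which holds exactly because `Sigma_0` follows the Zellner calibration.

```r
## Conjugate update of the coefficient block (continuing earlier chunks).
mu           <- rep(0, p)                      # illustrative prior mean
G_post       <- G + solve(Sigma_0)             # = ((n_prior + n_w) / n_w) * G
mu_post      <- solve(G_post, G %*% beta_hat + solve(Sigma_0) %*% mu)
Sigma_0_post <- solve(G_post)                  # = (n_w / (n_prior + n_w)) * G^{-1}

## Convex-combination form of the posterior mean:
all.equal(drop(mu_post),
          drop(n_prior / (n_prior + n_w) * mu +
               n_w / (n_prior + n_w) * beta_hat))
```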
To obtain the posterior Gamma update for \(\tau\), we must work with the marginal kernel \(\pi(\tau\mid y)\), not the conditional kernel \(\pi(\tau\mid\beta,y)\). This distinction matters because the conditional Normal density in \(\beta\mid\tau\) contains a factor \(\tau^{p/2}\), but this factor is exactly canceled when we integrate out \(\beta\).
Start from the joint kernel \[ \pi(\beta,\tau\mid y) \;\propto\; \tau^{a_0-1}\,e^{-b_0\tau}\; \tau^{p/2}\exp\!\Bigl(-\tfrac{\tau}{2}(\beta-\mu)^\top\Sigma_0^{-1}(\beta-\mu)\Bigr)\; \tau^{n_w/2}\exp\!\Bigl(-\tfrac{\tau}{2}\mathrm{RSS}_w(\beta)\Bigr). \]
If we look only at the conditional kernel in \(\beta\mid\tau\), the exponent of \(\tau\) appears to be \[ a_0 - 1 + \frac{p}{2} + \frac{n_w}{2}. \]
However, the marginal Gamma update is obtained from \[ \pi(\tau\mid y) \;\propto\; \tau^{a_0-1}\,e^{-b_0\tau}\; \tau^{n_w/2} \int \tau^{p/2}\exp\!\Bigl(-\tfrac{\tau}{2}Q(\beta)\Bigr)\,d\beta, \] where \(Q(\beta)\) is the quadratic form combining the likelihood and prior.
The integral over \(\beta\) is a multivariate Gaussian integral: \[ \int \tau^{p/2}\exp\!\Bigl(-\tfrac{\tau}{2}Q(\beta)\Bigr)\,d\beta = \tau^{p/2}\cdot (2\pi)^{p/2}\cdot \tau^{-p/2}\cdot |G_{\mathrm{post}}|^{-1/2} \exp\!\Bigl(-\tfrac{\tau}{2}Q(\mu_{\mathrm{post}})\Bigr). \]
The crucial point is the cancellation: \[ \tau^{p/2}\times\tau^{-p/2} = 1. \]
Thus no \(p/2\) term survives in the marginal kernel for \(\tau\).
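The cancellation can be checked numerically in the scalar case \(p=1\): the integral below is the same for every value of \(\tau\). The function name and the constants `g` and `m` are purely illustrative.

```r
## Numeric illustration (p = 1) that the tau^{p/2} factor cancels:
## the value of the integral does not depend on tau.
gauss_int <- function(tau, g = 2, m = 0.3) {
  integrand <- function(beta) tau^(1 / 2) * exp(-tau / 2 * g * (beta - m)^2)
  integrate(integrand, -Inf, Inf)$value
}
c(gauss_int(tau = 0.5), gauss_int(tau = 5), sqrt(2 * pi / 2))
## all three values agree up to quadrature error
```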
After cancellation, the only remaining powers of \(\tau\) are \[ a_0 - 1 + \frac{n_w}{2}, \] so the posterior Gamma shape is \[ a_n = a_0 + \frac{n_w}{2} = \frac{n_{\mathrm{prior}}+k}{2} + \frac{n_w}{2} = \frac{n_{\mathrm{prior}} + k + n_w}{2}, \] matching item (iii) of Theorem 1.
For the rate parameter, the exponential factor surviving the Gaussian integral carries the marginal quadratic term \(Q(\mu_{\mathrm{post}}) = \mathrm{Smarg}\) from §3.1, contributing \[ \frac{1}{2}\,\mathrm{Smarg} \] to the posterior rate.
Thus \[ b_n = b_0 + \frac{1}{2}\,\mathrm{Smarg} = \frac{1}{2}\,\frac{n_{\mathrm{prior}}+k+p-2}{n_w-p}\,\mathrm{Smarg} + \frac{1}{2}\,\mathrm{Smarg} = \frac{1}{2}\,\frac{n_{\mathrm{prior}}+k+n_w-2}{n_w-p}\,\mathrm{Smarg}, \] which reduces to the expression in item (iv) under the calibration of §3.3.1.
Given \(\tau\), the posterior factorizes as \[ \beta\mid\tau,y \sim N\bigl(\mu_{\mathrm{post}},\;\tau^{-1}\Sigma_{0,\mathrm{post}}\bigr), \qquad \tau\mid y \sim \Gamma(a_n,b_n), \] with \[ \mu_{\mathrm{post}} = \frac{n_{\mathrm{prior}}}{n_{\mathrm{prior}}+n_w}\,\mu + \frac{n_w}{n_{\mathrm{prior}}+n_w}\,\hat\beta, \qquad \Sigma_{0,\mathrm{post}} = \frac{n_w}{n_{\mathrm{prior}}+n_w}\,G^{-1}, \] \[ a_n = \frac{n_{\mathrm{prior}}+k+n_w}{2}, \qquad b_n = \frac{n_{\mathrm{prior}}+k+n_w-2}{2}\, \frac{\mathrm{Smarg}}{n_w-p}. \]
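The factorized posterior can be packaged as a small helper, shown below as a sketch under the calibration above. `ng_posterior()` is a hypothetical name used only in this appendix; it is not a function exported by glmbayes.

```r
## Sketch: closed-form Normal-Gamma posterior under the Zellner calibration.
ng_posterior <- function(mu, G, beta_hat, n_w, n_prior, k, p, Smarg) {
  a_n <- (n_prior + k + n_w) / 2
  b_n <- (n_prior + k + n_w - 2) / 2 * Smarg / (n_w - p)
  list(
    mu_post      = n_prior / (n_prior + n_w) * mu +
                   n_w / (n_prior + n_w) * drop(beta_hat),
    Sigma_0_post = n_w / (n_prior + n_w) * solve(G),
    a_n          = a_n,
    b_n          = b_n
  )
}
post <- ng_posterior(mu, G, beta_hat, n_w, n_prior, k, p, Smarg)
```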
Marginal mean of \(\beta\).
Using the law of total expectation, \[ E[\beta\mid y] = E_\tau\bigl[E[\beta\mid\tau,y]\bigr] = E_\tau[\mu_{\mathrm{post}}] = \mu_{\mathrm{post}}, \] since \(\mu_{\mathrm{post}}\) does not depend on \(\tau\). Thus \[ E[\beta\mid y] = \frac{n_{\mathrm{prior}}}{n_{\mathrm{prior}}+n_w}\,\mu + \frac{n_w}{n_{\mathrm{prior}}+n_w}\,\hat\beta, \] a convex combination of the prior mean and the weighted least‑squares estimate, with weights \[ \frac{n_{\mathrm{prior}}}{n_{\mathrm{prior}}+n_w} \quad\text{and}\quad \frac{n_w}{n_{\mathrm{prior}}+n_w}, \] as in item (v).
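The convex combination is easy to visualize by sweeping `n_prior` while reusing the illustrative objects from the earlier chunks: small values leave the marginal mean near \(\hat\beta\), large values pull it toward \(\mu\).

```r
## How E[beta | y] interpolates between beta_hat and mu as n_prior grows.
sapply(c(0.1, 1, 10, 100, 1000), function(np) {
  drop(np / (np + n_w) * mu + n_w / (np + n_w) * beta_hat)
})
## columns move from (approximately) beta_hat toward mu
```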
Marginal mean of \(\sigma^2 = \tau^{-1}\).
For \(\tau\sim\Gamma(a_n,b_n)\) with shape–rate parameterization, \[ E[\tau^{-1}\mid y] = \frac{b_n}{a_n-1}, \quad\text{provided }a_n>1. \] Assumptions 4–5 (\(k\ge0\), \(k+p\ge2\)) together with Assumption 2 (\(n_w>p\)) ensure \[ a_n=\frac{n_{\mathrm{prior}}+k+n_w}{2}>1, \] so the expectation is well‑defined. Substituting the expressions for \(a_n\) and \(b_n\), \[ E[\sigma^2\mid y] = \frac{ \frac{n_{\mathrm{prior}}+k+n_w-2}{2}\,\frac{\mathrm{Smarg}}{n_w-p} }{ \frac{n_{\mathrm{prior}}+k+n_w}{2}-1 } = \frac{\mathrm{Smarg}}{n_w-p}, \] which is exactly the residual‑variance estimator in item (vi). Assumption 3 (\(\mathrm{RSS}_w>0\)) ensures \(\mathrm{Smarg}>0\).
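A quick Monte Carlo check of this identity, reusing `post` from the sketch above: draws of \(\tau\) from its posterior Gamma distribution reproduce \(E[\sigma^2\mid y]=\mathrm{Smarg}/(n_w-p)\) up to simulation error.

```r
## Monte Carlo check of E[sigma^2 | y] = Smarg / (n_w - p).
set.seed(2)
tau_draws <- rgamma(1e5, shape = post$a_n, rate = post$b_n)
c(monte_carlo = mean(1 / tau_draws),
  closed_form = Smarg / (n_w - p))
```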
Marginal covariance of \(\beta\).
By the law of total covariance, \[ \mathrm{Cov}(\beta\mid y) = E_\tau\bigl[\mathrm{Cov}(\beta\mid\tau,y)\bigr] + \mathrm{Cov}_\tau\bigl(E[\beta\mid\tau,y]\bigr). \] Since \(E[\beta\mid\tau,y]=\mu_{\mathrm{post}}\) does not depend on \(\tau\), the second term vanishes and \[ \mathrm{Cov}(\beta\mid y) = E_\tau\bigl[\tau^{-1}\Sigma_{0,\mathrm{post}}\bigr] = E[\tau^{-1}\mid y]\;\Sigma_{0,\mathrm{post}}. \] We now compute both factors explicitly.
Step 1: \(E[\tau^{-1}\mid y]\).
From the Gamma block in Theorem 1, \[ \tau\mid y \sim \Gamma(a_n,b_n), \qquad a_n = \frac{n_{\mathrm{prior}}+k+n_w}{2}, \quad b_n = \frac{n_{\mathrm{prior}}+k+n_w-2}{2}\,\frac{\mathrm{Smarg}}{n_w-p}. \] Assumptions 4–5 together with Assumption 2 ensure \(a_n>1\), and Assumptions 2–3 ensure \(b_n>0\), so the Gamma moment formula applies: \[ E[\tau^{-1}\mid y] = \frac{b_n}{a_n-1}. \]
Substitute: \[ a_n-1 = \frac{n_{\mathrm{prior}}+k+n_w}{2}-1 = \frac{n_{\mathrm{prior}}+k+n_w-2}{2}, \] so \[ E[\tau^{-1}\mid y] = \frac{ \frac{n_{\mathrm{prior}}+k+n_w-2}{2}\,\frac{\mathrm{Smarg}}{n_w-p} }{ \frac{n_{\mathrm{prior}}+k+n_w-2}{2} } = \frac{\mathrm{Smarg}}{n_w-p}. \]
Step 2: \(\Sigma_{0,\mathrm{post}}\).
By conjugate Normal–Gamma algebra, \[ \Sigma_{0,\mathrm{post}} = \bigl(\Sigma_0^{-1} + G\bigr)^{-1}, \qquad G = X^\top W_{\mathrm{obs}}X. \] Assumption 1 ensures \(G\) is positive definite, so all inverses exist.
Under the Zellner calibration, \[ \Sigma_0 = \frac{1-\mathrm{pwt}}{\mathrm{pwt}}\,G^{-1}, \quad\text{so}\quad \Sigma_0^{-1} = \frac{\mathrm{pwt}}{1-\mathrm{pwt}}\,G. \]
Hence \[ \Sigma_0^{-1} + G = \Bigl(\frac{\mathrm{pwt}}{1-\mathrm{pwt}} + 1\Bigr)G = \frac{1}{1-\mathrm{pwt}}\,G, \] and therefore \[ \Sigma_{0,\mathrm{post}} = (1-\mathrm{pwt})\,G^{-1}. \]
Now use the mapping between \(\mathrm{pwt}\) and \(n_{\mathrm{prior}}\): \[ \mathrm{pwt} = \frac{n_{\mathrm{prior}}}{n_{\mathrm{prior}}+n_w} \quad\Longrightarrow\quad 1-\mathrm{pwt} = \frac{n_w}{n_{\mathrm{prior}}+n_w}. \]
Thus \[ \Sigma_{0,\mathrm{post}} = (1-\mathrm{pwt})\,G^{-1} = \frac{n_w}{n_{\mathrm{prior}}+n_w}\,G^{-1}. \]
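The mapping between `pwt` and `n_prior`, and the resulting shrinkage factor multiplying \(G^{-1}\), can be tabulated directly; the grid of `n_prior` values below is illustrative.

```r
## pwt <-> n_prior mapping and the shrinkage factor in Sigma_0_post.
n_prior_vals <- c(0.1, 1, 10, 100)
pwt_vals     <- n_prior_vals / (n_prior_vals + n_w)
cbind(n_prior   = n_prior_vals,
      pwt       = pwt_vals,
      shrinkage = 1 - pwt_vals)   # factor multiplying G^{-1} in Sigma_0_post
```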
Step 3: Combine the pieces.
Putting Steps 1 and 2 together, \[ \mathrm{Cov}(\beta\mid y) = E[\tau^{-1}\mid y]\;\Sigma_{0,\mathrm{post}} = \frac{\mathrm{Smarg}}{n_w-p}\, \frac{n_w}{n_{\mathrm{prior}}+n_w}\,G^{-1}, \] which is exactly item (vii) of Theorem 1.
In particular, the covariance can be written as \[ \mathrm{Cov}(\beta\mid y) = \Bigl(\text{residual variance estimate } \tfrac{\mathrm{Smarg}}{n_w-p}\Bigr) \times \Bigl(\text{shrinkage factor } \tfrac{n_w}{n_{\mathrm{prior}}+n_w}\Bigr) \times G^{-1}, \] making explicit how larger \(n_{\mathrm{prior}}\) reduces the covariance relative to the weak‑prior (least‑squares) limit obtained when \(n_{\mathrm{prior}}\to 0^+\).
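As a closing check, the closed-form covariance can be compared against direct simulation from the Normal–Gamma posterior, continuing the illustrative objects built up in the earlier chunks.

```r
## Monte Carlo verification of the closed-form marginal covariance:
## draw tau ~ Gamma(a_n, b_n), then beta | tau ~ N(mu_post, Sigma_0_post / tau).
set.seed(3)
n_sim    <- 2e4
tau_sim  <- rgamma(n_sim, shape = post$a_n, rate = post$b_n)
R        <- chol(post$Sigma_0_post)            # upper-triangular factor, R'R = Sigma_0_post
beta_sim <- t(sapply(tau_sim, function(tt) {
  post$mu_post + drop(t(R) %*% rnorm(p)) / sqrt(tt)
}))
round(cov(beta_sim), 4)
round(Smarg / (n_w - p) * n_w / (n_prior + n_w) * solve(G), 4)
## the two matrices agree up to Monte Carlo error
```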
This completes the derivation of the marginal moments in Theorem 1.