This appendix documents the current directional_tail()
diagnostic, how to call it directly, and how it is used inside
summary.glmb().
The diagnostic measures prior–posterior disagreement in a multivariate direction (Evans and Moshonov 2006) rather than coefficient-by-coefficient only. It reports:
- mahalanobis_shift: the standardized prior-posterior shift magnitude in whitened space
- p_directional: the directional tail probability, \(P(\delta^\top Z \le 0)\)
- delta: the posterior mean shift vector in whitened coordinates

Let \(\mu_0\) denote the reference vector (the prior mean by default, or a null/reference mode), and let \(\beta^{(m)}\) denote posterior draws.
directional_tail() computes a whitening map from posterior
precision so that
\[ Z^{(m)} = W\left(\beta^{(m)} - \mu_0\right), \]
with \(W^\top W = \text{Prec}_{post}\). In this scale, Euclidean distance corresponds to Mahalanobis distance in the original coefficient space.
Define
\[ \delta = E[Z \mid y], \qquad d = \|\delta\|_2. \]
The function returns \(d\) as
mahalanobis_shift, and the directional-tail probability
\[ p_{\text{dir}} = P\!\left(\delta^\top Z \le 0 \mid y\right), \]
estimated by posterior draws.
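For concreteness, here is a minimal sketch of that computation from raw draws, assuming a draws matrix B (m x p), a posterior precision matrix Prec_post, and a reference vector mu0; these object names are illustrative, not the package internals:

W <- chol(Prec_post)                       # upper triangular, so t(W) %*% W = Prec_post
Z <- t(W %*% (t(B) - mu0))                 # whitened draws Z^(m) = W (beta^(m) - mu0)
delta <- colMeans(Z)                       # posterior mean shift in whitened space
mahalanobis_shift <- sqrt(sum(delta^2))    # d = ||delta||_2
p_directional <- mean(Z %*% delta <= 0)    # Monte Carlo estimate of P(delta' Z <= 0 | y)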
Intuition: \(\delta\) is the posterior mean shift away from the reference in whitened space. The event \(\delta^\top Z \le 0\) is the half-space opposite the direction of disagreement, so \(p_{\text{dir}}\) quantifies how much posterior mass lies “against” that shift.
Under a Gaussian approximation in whitened space, \[ Z \mid y \approx N(\delta, I_p), \] then with \(u = \delta/\|\delta\|\), \[ u^\top Z \sim N(d, 1), \qquad p_{\text{dir}} = P(u^\top Z \le 0) = \Phi(-d). \] So the directional tail is a one-dimensional tail probability driven entirely by the standardized distance \(d\).
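In code, this approximation is a single normal tail evaluation at \(d\), which can be compared against the Monte Carlo estimate from the sketch above:

pnorm(-mahalanobis_shift)   # Gaussian-approximation value Phi(-d); should be close to p_directional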
For \(p=1\), whitening reduces to scalar standardization and \[ d = \frac{\mu_{post} - \mu_0}{\sigma_{post}}, \] so \(d\) is the posterior z/t-type contrast against \(\mu_0\) (up to sign, since the shift magnitude is \(\|\delta\| \ge 0\)).
Thus directional_tail() plays the same role as a
one-sided standardized test in the univariate case, while still
extending naturally to multivariate models.
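A scalar illustration of this reduction (the posterior summaries below are purely illustrative):

mu_post <- 1.4; sigma_post <- 0.6; mu0 <- 0   # illustrative posterior mean, sd, and reference
d <- (mu_post - mu0) / sigma_post             # standardized contrast, about 2.33
pnorm(-abs(d))                                # directional tail, about 0.01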
The squared shift \[ d^2 = \delta^\top \delta \] is the whitened quadratic distance from the reference. This is the same geometric core as multivariate Wald statistics (and, after scaling, Hotelling/F forms): larger \(d^2\) means stronger global departure from the reference.
Difference in emphasis:

- Wald/F-style tests summarize the omnibus quadratic magnitude \(d^2\).
- directional_tail() uses directional half-space evidence along \(\delta\).

Under the Gaussian approximation, both are monotone in \(d\): as \(d\) increases, quadratic-test evidence strengthens and \(p_{\text{dir}}=\Phi(-d)\) decreases.
In this sense, directional_tail() is a directional
companion to multivariate Wald/F-style testing: it retains the same
Mahalanobis geometry but reports evidence on an interpretable one-sided
directional probability scale.
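The shared geometry is easy to verify: the squared whitened shift equals the Wald-style quadratic form in the original coordinates (continuing the illustrative names from the sketch above):

mu_post <- colMeans(B)                          # posterior mean in the original coordinate space
d2_whitened <- sum(delta^2)                     # ||delta||^2 in whitened space
d2_wald <- drop(t(mu_post - mu0) %*% Prec_post %*% (mu_post - mu0))
all.equal(d2_whitened, d2_wald)                 # identical up to numerical error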
This is the key conceptual difference from classical p-values (Bernardo and Smith 1994).
In a classical t-test, the p-value is a sampling probability under the null: \[ P_{H_0}\!\left(T(Y^{rep}) \ge T(y_{obs})\right), \] that is, how extreme the observed statistic would be in repeated samples if \(H_0\) were true.
By contrast, the directional-tail quantity here is a posterior probability conditional on the observed data: \[ p_{\text{dir}} = P\!\left(\delta^\top Z \le 0 \mid y,\ \text{model},\ \text{prior},\ \mu_0\right). \]
So the probability being assessed is not a prior probability and not a frequentist repeated-sampling probability. It is posterior mass, after updating by the data.
Practical interpretation:

- Small p_directional (near 0): posterior mass is concentrated in the direction away from the reference \(\mu_0\), indicating stronger directional disagreement.
- p_directional near 0.5: weak directional separation from the reference.

For summary.glmb(), this means:

- dir_tail (vs prior): posterior directional disagreement with the prior mean.
- dir_tail_null (vs null): posterior directional disagreement with the null/intercept-only reference used in the summary routine.

In the one-parameter case, the directional tail reduces to a one-sided posterior tail area relative to \(\mu_0\), e.g. \(P(\beta \le \mu_0 \mid y)\) when the posterior mean lies above \(\mu_0\).
This is closely related to well-known Bayesian sign/tail summaries
(including posterior sign probability / “probability of direction” style
summaries in applied Bayesian reporting (Gelman
et al. 2013)). In that language, for a scalar parameter,
p_directional is essentially the opposite-side posterior
mass, and the sign probability is approximately \(1 - p_{\text{dir}}\).
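For a scalar parameter this relationship can be checked directly on draws (beta_draws and mu0 below are illustrative names, not package objects):

p_dir  <- mean((mean(beta_draws) - mu0) * (beta_draws - mu0) <= 0)  # opposite-side posterior mass
p_sign <- mean(beta_draws > mu0)                                    # "probability of direction" style summary
# when the posterior mean lies above mu0, p_sign is approximately 1 - p_dir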
The main contribution of directional_tail() is the
multivariate extension: it uses whitening plus
projection onto the posterior shift direction to provide a single
coherent directional-tail probability even when coefficients are
correlated and inference is genuinely multivariate.
Because the targets differ (posterior mass vs repeated-sampling
extremeness), p_directional is not numerically identical to
a classical p-value. Still, the two are often comparable through the
standardized shift magnitude \(d\).
In the scalar Gaussian approximation: \[ p_{\text{dir}} \approx \Phi(-|d|), \qquad p_{\text{post,2s}} \approx 2\Phi(-|d|), \] while a classical two-sided p-value is typically \[ p_{\text{classical,2s}} \approx 2\{1-F_t(|t|;\nu)\} \] (or \(2\Phi(-|z|)\) asymptotically).
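A small numeric comparison under the scalar approximation (the shift and degrees of freedom are illustrative):

d <- 1.8; nu <- 25
pnorm(-abs(d))            # p_dir, about 0.036
2 * pnorm(-abs(d))        # two-sided posterior analogue, about 0.072
2 * pt(-abs(d), df = nu)  # classical two-sided t tail at the same standardized value, about 0.08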
When the prior is strong relative to the likelihood, two effects dominate:

- a center effect: the posterior mean is pulled toward the prior mean, which shrinks the contrast against the reference whenever the prior mean sits at or near \(\mu_0\); and
- a scale effect: the posterior standard deviation is smaller than the classical standard error, which inflates the standardized shift for a given mean contrast.

Consequences for p_directional vs classical p-values:

- When shrinkage toward the reference dominates, p_directional tends to be larger (weaker apparent evidence against the reference) than classical p-values.
- When the reduction in posterior scale dominates, p_directional can be smaller (stronger evidence in that direction) than classical tails.
- directional_tail() summarizes the net directional result along \(\delta\).

As prior precision becomes small relative to the information in the data:

- In the univariate case, p_directional approaches a one-sided likelihood-based tail measure, and \(2\,p_{\text{dir}}\) approaches the familiar two-sided z/t-style tail (asymptotically).
- In the multivariate case, \(d^2\) approaches the corresponding Wald quadratic form (and its Hotelling/F scaling in finite samples), so Bayesian directional and classical quadratic tests tend to agree more closely in large-sample/weak-prior regimes.
For one coefficient, the posterior-vs-classical comparison can be decomposed into a center effect and a scale effect. Write \[ t_{\text{post}} = \frac{\hat\beta_{\text{post}}-\mu_0}{\text{SE}_{\text{post}}}, \qquad t_{\text{class}} = \frac{\hat\beta_{\text{MLE}}-\mu_0}{\text{SE}_{\text{class}}}. \] Then \[ t_{\text{post}} = t_{\text{class}} \times \underbrace{\frac{\hat\beta_{\text{post}}-\mu_0}{\hat\beta_{\text{MLE}}-\mu_0}}_{\text{center/shrinkage factor } \kappa} \times \underbrace{\frac{\text{SE}_{\text{class}}}{\text{SE}_{\text{post}}}}_{\text{scale factor}}. \]
In conjugate normal-gamma style parameterizations, a common heuristic is \[ \text{SE}_{\text{post}} \approx \text{SE}_{\text{class}} \sqrt{\frac{n-p}{n+n_{\text{prior}}}}, \] so \[ \frac{\text{SE}_{\text{class}}}{\text{SE}_{\text{post}}} \approx \sqrt{\frac{n+n_{\text{prior}}}{n-p}}. \] If one also uses a simple shrinkage approximation \(\kappa \approx \frac{n}{n+n_{\text{prior}}}\), then \[ t_{\text{post}} \approx t_{\text{class}} \cdot \frac{n}{n+n_{\text{prior}}} \cdot \sqrt{\frac{n+n_{\text{prior}}}{n-p}}. \]
Using \(n_{\text{prior}} = n\,\frac{\text{pwt}}{1-\text{pwt}}\), this can be rewritten as \[ t_{\text{post}} \approx t_{\text{class}} \cdot \sqrt{1-\text{pwt}} \cdot \sqrt{\frac{n}{n-p}}. \]
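A quick numeric check of this heuristic (n, p, pwt, and the classical t value are all illustrative):

n <- 100; p <- 3; pwt <- 0.2; t_class <- 2.1
n_prior <- n * pwt / (1 - pwt)              # prior pseudo-sample size, here 25
kappa   <- n / (n + n_prior)                # center/shrinkage factor
scale_f <- sqrt((n + n_prior) / (n - p))    # SE adjustment factor
t_post  <- t_class * kappa * scale_f
all.equal(t_post, t_class * sqrt(1 - pwt) * sqrt(n / (n - p)))   # TRUE: same rewrite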
This expression shows why posterior directional tails can be either larger or smaller than classical tails: the shrinkage factor \(\sqrt{1-\text{pwt}} < 1\) deflates the posterior t-type contrast (larger tails), while the factor \(\sqrt{n/(n-p)} > 1\) inflates it (smaller tails).
A useful threshold from this heuristic: the combined factor exceeds 1 exactly when \[ n_{\text{prior}} < p\,\frac{n}{n-p}, \] which is roughly \(n_{\text{prior}} < p\) in large samples. So when the prior pseudo-sample size is small relative to the dimension, the scale effect dominates and posterior tails can be smaller than classical ones; once \(n_{\text{prior}}\) grows well past \(p\), center shrinkage dominates instead.
This is the scalar analogue of the multivariate directional-tail story: center and scale are both active, and the final probability is driven by the standardized shift magnitude after whitening.
The package example (inst/examples/Ex_directional_tail.R) already demonstrates the intended workflow and interpretation. In particular, it shows:

- fitting a model (lmb/glmb),
- calling directional_tail(fit),
- interpreting mahalanobis_shift, p_directional, and the draw-level objects.

A compact usage sketch from that example is:
example("directional_tail", package = "glmbayes")
# Core objects used in interpretation:
dt <- directional_tail(fit)
dt$mahalanobis_shift
dt$p_directional
# Same quantities are also surfaced in summary output:
s <- summary(fit)
s$dir_tail
s$dir_tail_null

The example script (Ex_directional_tail.R) also includes two useful diagnostic visualizations. The first plots the posterior draws in whitened space:
dt <- directional_tail(fit)
Z <- dt$draws$Z
flag <- dt$draws$is_tail
delta <- dt$delta
w <- delta
plot(
Z,
col = ifelse(flag, "red", "blue"),
pch = 19,
xlab = "Z1",
ylab = "Z2",
main = "Directional Tail Diagnostic (Whitened Space)"
)
# Decision boundary orthogonal to direction vector
abline(a = 0, b = -w[1] / w[2], col = "darkgreen", lty = 2)
# Radius corresponding to Mahalanobis shift
r <- sqrt(sum(delta^2))
symbols(
delta[1], delta[2],
circles = r,
inches = FALSE,
add = TRUE,
lwd = 2,
fg = "gray"
)
points(0, 0, pch = 4, col = "black", lwd = 2) # reference center in whitened space
points(delta[1], delta[2], pch = 3, col = "purple", lwd = 2) # posterior shift

The second plots the same draws in the raw coefficient space:

B <- dt$draws$B
flag <- dt$draws$is_tail
mu0 <- as.numeric(fit$Prior$mean)
mu_post <- colMeans(B)
plot(
B,
col = ifelse(flag, "red", "blue"),
pch = 19,
xlab = "Coefficient 1",
ylab = "Coefficient 2",
main = "Directional Tail Diagnostic (Raw Coefficient Space)"
)
points(mu0[1], mu0[2], pch = 4, col = "black", cex = 1.5) # reference center
points(mu_post[1], mu_post[2], pch = 3, col = "darkgreen", cex = 1.5) # posterior center
legend(
"topright",
legend = c("Tail draws", "Non-tail draws", "Reference", "Posterior"),
col = c("red", "blue", "black", "darkgreen"),
pch = c(19, 19, 4, 3),
bty = "n"
)

How directional_tail() is called

The function is exported and can be called directly on a fitted
object (glmb or lmb):
fit <- glmb(
counts ~ outcome + treatment,
family = poisson(),
pfamily = dNormal(mu = mu, Sigma = V)
)
dt_prior <- directional_tail(fit) # reference = prior mean
dt_prior
print(dt_prior)

To compare against a user-specified reference (for example a
null/intercept-only mode), pass mu0 explicitly:
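A hedged sketch of such a call, reusing the mu0 argument that the summary internals below also pass; the reference vector itself is constructed here purely for illustration:

p <- length(as.numeric(fit$Prior$mean))           # number of coefficients (fit$Prior$mean as in the plots above)
mu0_null <- c(log(mean(counts)), rep(0, p - 1))   # rough intercept-only style reference (illustrative)
dt_null <- directional_tail(fit, mu0 = mu0_null)
dt_null$mahalanobis_shift
dt_null$p_directional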
How summary.glmb() uses the directional tail

summary.glmb() computes two directional diagnostics internally:

- directional_tail(object): posterior vs prior mean
- directional_tail(object, mu0 = null_est_full): posterior vs null reference

These are returned in the summary object as dir_tail and dir_tail_null (see s$dir_tail and s$dir_tail_null above).
The print method displays a compact “Directional Tail Summaries” table with:

- mahalanobis_shift (vs Null, vs Prior)
- p_directional (vs Null, vs Prior)

The package example inst/examples/Ex_directional_tail.R
provides a full reproducible script. A minimal call pattern is:
ps <- Prior_Setup(weight ~ group, family = gaussian(), data = dat2)
fit <- lmb(weight ~ group, dNormal(ps$mu, ps$Sigma, dispersion = ps$dispersion), data = dat2, n = 10000)
dt <- directional_tail(fit)
print(dt)

** 1.0 Why the Bayesian tail can be smaller

There are several distinct, concrete reasons the Bayesian posterior tail probability for a coefficient can be smaller than the classical t tail.
*** 1.1 Different effective degrees of freedom (v)
In the conjugate parameterization above, posterior \(\boldsymbol{v = 2\alpha_{0} + n}\). If the prior has positive \(\boldsymbol{\alpha_{0}}\) (an informative prior on the variance), the posterior \(\boldsymbol{v}\) will be larger than the classical \(n - p\). Increasing \(v\) makes the Student‑t closer to normal (lighter tails), so the tail probability at the same \(t\) value is smaller.
– Even for vaguely informative priors, the prior contributes “pseudo-observations” to the posterior degrees of freedom. This is often the main cause of smaller Bayesian tails.
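A quick illustration of the degrees-of-freedom effect at a fixed t value (the df values are illustrative):

t_val <- 2.0
2 * pt(-t_val, df = 10)   # smaller df: heavier tails, about 0.073
2 * pt(-t_val, df = 60)   # larger posterior df (2*alpha0 + n): lighter tails, about 0.050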
*** 1.2 Different standard error (scale) used in the denominator
The Bayesian marginal variance for \(\boldsymbol{\beta_{j}}\) is \(\boldsymbol{\frac{\beta_{n}}{\alpha_{n}} V_{n}[j,j]}\), where \(\alpha_{n}\) and \(\beta_{n}\) are the posterior Gamma shape and rate for the precision (not regression coefficients) and \(V_{n}\) is the posterior coefficient covariance factor.
Two effects happen here:
(a) \(\boldsymbol{V_{n}}\) generally
shrinks relative to \((X^{T}X)^{-1}\)
because of the prior precision \(\boldsymbol{V_{0}^{-1}}\), and
(b) the posterior scale factor \(\boldsymbol{\frac{\beta_{n}}{\alpha_{n}}}\)
can be smaller than the classical estimate \(\hat{s}^{2}\) depending on prior
hyperparameters and the data (e.g., prior favors smaller variance).
Either effect reduces the denominator and can increase the raw \(t\) value; combined with the larger \(\boldsymbol{v}\), the net tail probability can still be smaller than the classical one.
So you must compare both the numerator (posterior mean \(\boldsymbol{m_{n,j}}\) vs. the OLS estimate \(\hat{\beta}_{j}\)) and the denominator (posterior scale vs. \(\boldsymbol{\hat{s}\sqrt{(X^{T}X)^{-1}_{jj}}}\)). Prior shrinkage often pulls \(m_{n,j}\) toward zero, reducing the numerator, while a smaller \(V_{n}[j,j]\) reduces the denominator; the net effect on the evidence against \(H_{0}\) depends on which dominates, exactly as in the center/scale decomposition above.
For Bayesian linear models with normal-gamma priors, the standard errors used in t-statistic calculations appear to need adjustment relative to the classical estimates. In particular, the adjusted standard error is:
\[ \text{Std\_err}_t = \text{Std\_err} \cdot \sqrt{ \frac{n - p}{n + n_{\text{prior}}} } \]
which leads to:
\[ t_{\text{adj},i} = \frac{n}{n + n_{\text{prior}}} \, t_{\text{unadjusted},i} \, \sqrt{\frac{n + n_{\text{prior}}}{n - p}}. \]
where the first ratio is the shrinkage effect (lower t-value) and the second factor is the impact of the adjustment to the standard error (which boosts the t-value). Equivalently,
\[ t_{\text{adj},i} = \sqrt{\frac{n}{n + n_{\text{prior}}}} \cdot \sqrt{\frac{n}{n - p}} \cdot t_{\text{unadjusted},i}, \]
and, using \(n_{\text{prior}} = n\,\frac{\text{pwt}}{1-\text{pwt}}\) as before,
\[ t_{\text{adj},i} = \sqrt{1-\text{pwt}} \cdot \sqrt{\frac{n}{n - p}} \cdot t_{\text{unadjusted},i}. \]
It can be seen from the above that the t-statistic in the Bayesian context is larger than the unadjusted t-statistic (i.e., inflated) whenever
\[ n_{\text{prior}} < p \cdot \frac{n}{n - p} \]
which implies tail probabilities smaller than their classical counterparts.
The pwt substitution above uses the identity \[ n + n_{\text{prior}} = n + n\left(\frac{\text{pwt}}{1 - \text{pwt}}\right) = \frac{n(1 - \text{pwt}) + n\,\text{pwt}}{1 - \text{pwt}} = \frac{n}{1 - \text{pwt}}. \]
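A numeric check of the inflation condition above (n, p, and n_prior are illustrative):

n <- 50; p <- 4; n_prior <- 3
threshold <- p * n / (n - p)                            # about 4.35
factor <- sqrt(n / (n + n_prior)) * sqrt(n / (n - p))   # combined adjustment, about 1.01
c(inflated = n_prior < threshold, factor_gt_1 = factor > 1)   # both TRUE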
Considering the standard-error adjustment in isolation (before any shrinkage of the posterior mean), the adjusted standard error is \[ \text{StdErr}_{t} = \text{StdErr} \cdot \sqrt{\frac{n - p}{n + n_{\text{prior}}}}, \] which by itself leads to the adjusted t-statistic
\[ t_{\text{adj},i} = t_{\text{unadjusted},i} \cdot \sqrt{\frac{n + n_{\text{prior}}}{n - p}}. \]