Chapter 03: Tailoring Priors - Leveraging the Prior_Setup Function

Kjell Nygren

2026-04-30

1. Introductory Discussion

In this chapter, we explore how the Prior_Setup function can be used to specify multivariate normal priors for both Bayesian linear models and the generalized linear models (GLMs) introduced in later chapters. In the previous chapter, we covered the minimal arguments required to invoke the function: a model formula and a family (with an optional link function). These define the structure of the model to which the prior will be applied—namely, the variables being modeled, the assumed distribution of the response, and the link function connecting the linear predictor to the data. The multivariate normal and g-prior structure connects to standard GLM treatments (McCullagh and Nelder 1989; Venables and Ripley 2002; Zellner 1986). Prior–data conflict diagnostics (when used elsewhere in the package) relate to Evans and Moshonov (2006). Envelope sampling builds on Nygren and Nygren (2006).

For full technical derivations of the priors and calibrated quantities returned by Prior_Setup() (including Gaussian Normal–Gamma calibration), see Chapter A12 (https://knygren.r-universe.dev/articles/glmbayes/Chapter-A12.html).

In many cases, it is also helpful to provide a data frame to Prior_Setup, allowing the function to correctly identify and extract the relevant variables from the model environment.

Now, let’s examine the remaining key arguments that enable the specification of multivariate normal priors. We begin by reviewing the function signature and a representative call—similar to the one used in the previous chapter, but now with visibility into the default settings:

ps = Prior_Setup(
  weight ~ group, data = dat, family = gaussian(), pwt = 0.001,
  n_prior = NULL, sd = NULL,
  intercept_source = c("null_model", "full_model"),
  effects_source = c("null_effects", "full_model"),
  mu = NULL
)

#mu       = ps$mu
#V        = ps$Sigma
#disp_ML  = ps$dispersion

As shown above, we introduce six additional arguments beyond the core model specification: pwt, n_prior, sd, intercept_source, effects_source, and mu.

In the next section, we discuss the default settings and their implications for model interpretation. Section 3 focuses on the parameters that control prior strength, while Section 4 addresses how the prior is centered.

2. The default prior specification

When you call Prior_Setup() with a single pwt value, the prior on the coefficient vector
\(\beta = (\beta_0,\beta_1,\dots,\beta_{k})^\top\)
is set up to be

\[ \beta \sim \mathcal{N}\bigl(\mu_{Prior},\;\Sigma_{Prior}\bigr), \]

where \(\mu_{Prior}\) is the derived prior mean and \(\Sigma_{Prior}\) the derived prior covariance.

In this section, we will first focus on how the pwt argument is used to set \(\Sigma_{Prior}\) and how some of the other defaults help set \(\mu_{Prior}\). We also take a look at how these together help determine the posterior mean in Bayesian linear models.


2.1 Identifiability and Full–Rank Likelihood Covariance

Before constructing the prior, Prior_Setup() fits the full GLM and extracts
the maximum likelihood estimate \(\widehat\beta_{\mathrm{MLE}}\) and the variance–covariance matrix of the maximum-likelihood estimator: \[ V_{MLE} \;=\;\mathrm{vcov}(\widehat\beta_{\mathrm{MLE}}). \]

The matrix \(V_{MLE}\) is checked to verify that it is symmetric and positive definite (full rank). If \(V_{MLE}\) is rank-deficient, the classical GLM lacks identifiability and Prior_Setup() aborts with an error. While it is possible to estimate Bayesian models even when the classical GLM lacks identifiability, it is not possible to specify a g-prior for such a likelihood function, and we urge the user to use caution with such models.
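The full-rank check can be sketched as follows; here we hypothetically test positive definiteness via an attempted Cholesky factorization of the fitted covariance (this is an illustration, not the package's internal code):

```r
# Sketch of a full-rank check on the MLE covariance (illustrative;
# Prior_Setup()'s actual check may be implemented differently).
fit   <- glm(mpg ~ wt + disp, data = mtcars, family = gaussian())
V_mle <- vcov(fit)

# chol() throws an error unless the matrix is symmetric positive definite
is_full_rank <- !inherits(try(chol(V_mle), silent = TRUE), "try-error")
is_full_rank
```

A rank-deficient design (for example, one predictor that is an exact linear combination of others) would make this check fail.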


2.2 Canonical Precision Definitions and Scaling

For enhanced understanding and to simplify derivations, let us now define the likelihood precision, the prior precision, and the posterior precision as follows:

\[ P_{MLE} = V_{MLE}^{-1}, \quad P_{Prior} = \Sigma_{Prior}^{-1}, \quad P_{Post} = P_{MLE} + P_{Prior}. \]

When length(pwt) == 1, Prior_Setup() uses the pwt argument to set the prior variance–covariance matrix as an inflated version of the likelihood variance–covariance matrix:

\[ \Sigma_{Prior} = \frac{1 - pwt}{pwt}\;V_{MLE}, \quad \Longrightarrow P_{Prior} = \frac{pwt}{1 - pwt}\;P_{MLE}, \]

and therefore

\[ P_{\mathrm{Post}} = \Bigl(1 + \frac{pwt}{1 - pwt}\Bigr)\,P_{MLE} = \frac{1}{1 - pwt}\;P_{MLE}. \]

Thus both \(P_{\mathrm{Prior}}\) and \(P_{\mathrm{Post}}\) are simple scalar multiples of \(P_{MLE}\). The scaling factor applied to set \(\Sigma_{Prior}\) corresponds to a Zellner g-prior with

\[ g \;=\;\frac{1 - \text{pwt}}{\text{pwt}}. \]

It is worth noting that the default for pwt is 0.01, which means that g = 99 and the prior variance is thus 99 times the likelihood variance (and the prior standard deviation is \(\sqrt{99}\) times larger than the likelihood standard deviations).
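The scalar algebra above can be checked numerically. The sketch below uses a simple Gaussian fit and is illustrative rather than the package's internal code:

```r
# Illustrative check of the pwt -> g mapping and the inflated prior
# covariance (not the Prior_Setup() internals).
fit   <- glm(mpg ~ wt, data = mtcars, family = gaussian())
V_mle <- vcov(fit)

pwt <- 0.01
g   <- (1 - pwt) / pwt        # g = 99 when pwt = 0.01
Sigma_prior <- g * V_mle      # prior variances are g times likelihood variances

sqrt(g)                       # prior SDs are about 9.95x the likelihood SEs
```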


2.3 Posterior Mean via Precision Matrices

In the Gaussian‐data case, the posterior is

\[ \beta\mid y \;\sim\; \mathcal{N}\bigl(\mu_{\mathrm{Post}},\,\Sigma_{\mathrm{Post}}\bigr), \]

with

\[ \Sigma_{\mathrm{Post}} = P_{\mathrm{Post}}^{-1}, \quad \mu_{\mathrm{Post}} = P_{\mathrm{Post}}^{-1} \bigl(P_{MLE}\,\widehat\beta_{\mathrm{MLE}} \;+\;P_{\mathrm{Prior}}\,\mu_{Prior}\bigr). \]

Plugging in the scalar‐multiple forms:

\[ P_{\mathrm{Post}}^{-1} = (1 - pwt)\,P_{MLE}^{-1}, \quad P_{\mathrm{Prior}} = \frac{pwt}{1 - pwt}\,P_{MLE}, \]

we recover the familiar weighted‐average form:

\[ \mu_{\mathrm{Post}} = (1 - pwt)\,\widehat\beta_{\mathrm{MLE}} \;+\;pwt\,\mu_{Prior}. \]

Hence we can see that the pwt argument determines the relative weight on the prior in the formula for the posterior mean in these models. While we do not have the same closed-form solution for other models, pwt still plays a role in how close the posterior will be to the prior mean.

Since the default value for pwt is 0.01, the prior receives a 1% weight by default, and the prior (when different from the maximum likelihood estimate) hence pulls the posterior only minimally away from the maximum likelihood estimate.
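The weighted-average identity can be verified numerically. The sketch below uses a hypothetical prior mean and treats the dispersion as known; it illustrates the algebra rather than reproducing the package's code:

```r
# Verify mu_Post = (1 - pwt) * beta_hat + pwt * mu_Prior numerically.
fit      <- lm(mpg ~ wt, data = mtcars)
bhat     <- coef(fit)
P_mle    <- solve(vcov(fit))

pwt      <- 0.25
P_prior  <- (pwt / (1 - pwt)) * P_mle
mu_prior <- c(20, 0)                    # hypothetical prior mean

P_post  <- P_mle + P_prior
mu_post <- solve(P_post, P_mle %*% bhat + P_prior %*% mu_prior)

# Agrees with the weighted average up to rounding error
max(abs(mu_post - ((1 - pwt) * bhat + pwt * mu_prior)))
```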

2.4 The impact of the “null_model” and “null_effects” defaults for the interpretation of model diagnostics

In addition to the pwt argument and its default, we also see from above that intercept_source defaults to “null_model” and effects_source defaults to “null_effects”. The former means that the prior for the intercept defaults to the estimate from the intercept-only model, while the latter means that the prior for all effect variables defaults to 0. This is in many ways consistent with the reference model utilized in most implementations of classical models. In particular, in classical models:

- The R^2 is defined relative to the null model (i.e., the R^2 is always zero for the null model).
- The default F-test in a classical model is usually relative to the null model.
- The t-statistics are usually designed to measure how far estimates are from 0, and this is the default for how p-values are calculated.

It is worthwhile to now take a look at what this default specification means for the posterior mean when pwt is a scalar and the posterior mean simplifies as in the earlier section. In that case, the prior mean is \(\mu_{Prior} = (\widehat\beta_{0,\mathrm{null}}, 0, \dots, 0)^\top\), so each effect coefficient has posterior mean \((1 - pwt)\,\widehat\beta_{j,\mathrm{MLE}}\), shrunk slightly toward zero, while the intercept is pulled toward the null-model estimate.

Let us now re-examine the Plant data from the previous chapter.

lm.D9 <- lm(weight ~ group)
sumlm<-summary(lm.D9)
sumlm$coefficients
#>             Estimate Std. Error  t value     Pr(>|t|)
#> (Intercept)    5.032  0.2202177 22.85012 9.547128e-15
#> groupTrt      -0.371  0.3114349 -1.19126 2.490232e-01

The groupTrt estimate from the classical model is -0.371 with a two sided p-value of 0.2490 (which implies that the one sided p-value is 0.1245).

lmb.D9=lmb(weight ~ group,dNormal(mu=ps$mu,Sigma=ps$Sigma,dispersion=ps$dispersion),n=1000)
sumlmb<-summary(lmb.D9)
sumlmb$coefficients
#>             Post.Mode  Post.Mean   Post.Sd    MC Error Pr(Null_tail)   SE(tail)
#> (Intercept)  5.031814  5.0311363 0.2172248 0.006869250         0.202 0.01269630
#> groupTrt    -0.370629 -0.3641917 0.3092222 0.009778466         0.120 0.01027619
#>             Pr(Prior_tail)
#> (Intercept)          0.202
#> groupTrt             0.120

We can now see that the corresponding output from the Bayesian run in many ways mirrors the output from the classical model. In particular, four of the columns have outputs analogous to the classical output: Post.Mode and Post.Mean both mirror the Estimate column, while the Post.Sd column mirrors the Std. Error column. The Pr(Null_tail) column, meanwhile, is close to half of the value in the Pr(>|t|) column of the classical output.

In addition to this, we have an MC Error column, which represents the error connected to the estimated Post.Mean column. This reflects the fact that we are using Monte Carlo simulation (in this case based on n=1000 draws) to estimate the mean. In the gaussian() case, the actual posterior mean equals the Post.Mode, so if we increase the number of draws, we are increasingly likely to observe a Post.Mean that is close to the Post.Mode column, and the MC Error column shrinks (see below).

Finally, we also have a column SE(tail). This reflects the standard error connected to the Pr(Null_tail) column. Just as with the MC Error, we can reduce the error associated with this estimate by generating a larger number of draws (again, see below).
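The 1/sqrt(n) behavior of the Monte Carlo error can be illustrated with draws from a known normal distribution; this sketch stands in for the actual posterior sampler:

```r
# Monte Carlo error scales as sd/sqrt(n): the estimate of the mean from
# 100000 draws is roughly 10x more precise than from 1000 draws.
set.seed(1)
mc_error <- function(n) sd(rnorm(n, mean = -0.37, sd = 0.31)) / sqrt(n)

e_small <- mc_error(1000)
e_large <- mc_error(100000)
c(e_small, e_large)
```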

lmb.D9=lmb(weight ~ group,dNormal(mu=ps$mu,Sigma=ps$Sigma,dispersion=ps$dispersion),n=100000)
sumlmb<-summary(lmb.D9)
sumlmb$coefficients
#>             Post.Mode  Post.Mean   Post.Sd     MC Error Pr(Null_tail)
#> (Intercept)  5.031814  5.0315824 0.2200432 0.0006958378       0.19915
#> groupTrt    -0.370629 -0.3707955 0.3107300 0.0009826145       0.11643
#>                SE(tail) Pr(Prior_tail)
#> (Intercept) 0.001262891        0.19915
#> groupTrt    0.001014268        0.11643

For the purposes of this vignette, we simply note that under our default assumptions for the intercept and effects, the Pr(Null_tail) for slope parameters in Bayesian linear models tends to be similar to the one-sided p-value in classical models when combined with our default value for pwt (i.e., 0.01). While we don’t explore it fully here, there are certain respects in which these become fully equivalent in the limit as pwt is pushed further toward zero. A later vignette provides more details on exactly how these tail probabilities relate to their classical counterparts, while also demonstrating some properties as pwt, n_prior, and the sample size approach certain limits.

3. Adjusting the strength of priors using pwt, n_prior, and sd

In setting the strength of priors, pwt serves as the default mechanism, blending the prior mean with the maximum likelihood estimates. When pwt is a scalar, it uniformly adjusts this blend across parameters. Alternatively, a vector form of pwt allows for more nuanced adjustments, deviating from a Zellner g-prior. n_prior offers an alternative approach, influencing the strength of priors based on sample size equivalence. Finally, sd directly sets the standard deviation, offering another layer of control over the priors’ influence. Each of these parameters plays a crucial role in tailoring the prior to reflect the analyst’s beliefs and the context of the data.

To illustrate the role of these prior specifications, we will work with a model where the intercept dimension is orthogonal to the predictors, as this makes interpretation clearer.

3.1 Data Setup and Visualization

data(mtcars)

# Create gallons per mile
mtcars$gpm <- 1 / mtcars$mpg

The mtcars dataset contains fuel efficiency and engine characteristics for 32 vehicles. We begin by preparing the mtcars dataset for modeling. To analyze fuel consumption, we define gallons per mile (gpm) as the reciprocal of miles per gallon (gpm = 1/mpg). This transformation allows us to treat fuel use as a direct cost per unit distance, which aligns more naturally with physical energy expenditure.

# Mean-center predictors

mtcars$c_wt  <- as.numeric(scale(mtcars$wt, center = TRUE, scale = FALSE))
mtcars$c_cyl <- as.numeric(scale(mtcars$cyl, center = TRUE, scale = FALSE))

Next, we mean-center the predictors of interest—vehicle weight (wt) and number of cylinders (cyl)—to improve interpretability and reduce collinearity in the regression model. Mean-centering ensures that the intercept corresponds to the expected fuel consumption for a vehicle with average weight and cylinder count.


# Fit linear model
fit <- lm(gpm ~ c_wt, data = mtcars)

# Create scatter plot
plot(mtcars$c_wt, mtcars$gpm,
     pch = 19, col = "steelblue",
     xlab = "Mean-Centered Weight (c_wt)",
     ylab = "Gallons per Mile (gpm)",
     main = "Fuel Consumption vs. Mean-Centered Weight")

# Add regression line
abline(fit, col = "darkred", lwd = 2)

# Optional: Add confidence band manually
x_vals <- seq(min(mtcars$c_wt), max(mtcars$c_wt), length.out = 100)
pred <- predict(fit, newdata = data.frame(c_wt = x_vals), interval = "confidence")

lines(x_vals, pred[, "lwr"], col = "darkred", lty = 2)
lines(x_vals, pred[, "upr"], col = "darkred", lty = 2)

To assess the plausibility of a linear relationship, we first plot gpm against mean-centered weight (c_wt). The scatter plot reveals a clear positive trend: heavier vehicles tend to consume more fuel per mile. The fitted regression line aligns well with the data, and the symmetric confidence band suggests that linearity and homoscedasticity are reasonable assumptions. This relationship is also physically motivated, as rolling resistance and energy demand scale approximately linearly with mass.

# Ensure gpm is defined
mtcars$gpm <- 1 / mtcars$mpg

# Create boxplot
boxplot(gpm ~ factor(cyl), data = mtcars,
        col = "lightblue", border = "darkblue",
        xlab = "Number of Cylinders",
        ylab = "Gallons per Mile (gpm)",
        main = "Fuel Consumption by Number of Cylinders")

# Add jittered points
set.seed(123)  # for reproducibility
points(jitter(as.numeric(factor(mtcars$cyl)), amount = 0.2), mtcars$gpm,
       pch = 19, col = rgb(70/255, 130/255, 180/255, alpha = 0.6))

We then examine fuel consumption across cylinder categories using a boxplot. Treating cyl as a factor, we observe that vehicles with more cylinders tend to have higher median fuel consumption, with 8-cylinder cars showing the greatest fuel use. The jittered points reveal the distribution within each group and highlight the presence of outliers and some overlap across groups. This plot supports the inclusion of cyl as a categorical or mean-centered numeric predictor, depending on modeling goals.

Together, these visualizations provide both empirical and theoretical justification for modeling gpm as a function of c_wt and c_cyl. The linear trend with weight is physically motivated, while the categorical differences in cylinder count suggest potential predictive value. These insights set the stage for exploring how prior strength influences inference in Bayesian regression.

3.2 Using the pwt argument to adjust the Zellner g-prior

To illustrate the impact of prior strength on marginal predictors, we fit a sequence of Bayesian models using gpm as the response and two mean-centered predictors: c_wt and c_cyl.

To ensure reproducibility of posterior estimates and tail probabilities, we set a fixed random seed before simulation. This allows consistent interpretation across models and avoids misleading comparisons due to Monte Carlo variability.

# Use Prior_Setup to illustrate prior strength
pscars  <- Prior_Setup(formula = gpm ~  c_wt+c_cyl,family=gaussian(),data = mtcars,pwt=0.01) ## pwt=0.01-->n_prior= 32/99
pscars2 <- Prior_Setup(formula = gpm ~  c_wt+c_cyl,family=gaussian(),data = mtcars,pwt=0.5)  ## pwt=0.5-->n_prior= 32
pscars3 <- Prior_Setup(formula = gpm ~  c_wt+c_cyl,family=gaussian(),data = mtcars,pwt=0.55) ## pwt=0.55-->n_prior= 39.111
pscars
#> 
#> Call:  Prior_Setup(formula = gpm ~ c_wt + c_cyl, family = gaussian(), 
#>     data = mtcars, pwt = 0.01)
#> 
#> Setting up a Zellner g-type prior: 
#>   pwt = 0.01 
#>   g   = (1 - pwt)/pwt = 99 
#> 
#> Note: n_prior was computed as (pwt / (1 - pwt)) * n_likelihood: 
#>   n_prior      = 0.3232 
#>   n_likelihood = 32 
#> 
#> Prior Setup Summary
#> ====================
#> 
#> Prior Estimates with 95% Confidence Intervals
#>             Prior.Mean Prior.SD  CI.Lower CI.Upper  pwt
#> (Intercept)   0.054227 0.012701  0.029335 0.079120 0.01
#> c_wt          0.000000 0.021180 -0.041512 0.041512 0.01
#> c_cyl         0.000000 0.011604 -0.022743 0.022743 0.01
#> 
#> Prior Correlation Matrix
#>             (Intercept)    c_wt   c_cyl
#> (Intercept)           1  0.0000  0.0000
#> c_wt                  0  1.0000 -0.7825
#> c_cyl                 0 -0.7825  1.0000
#> 
#> Conditional Dispersion (Gaussian family):  1e-04 
#> 
#> Gamma Prior on Residual Precision:
#>   shape = 0.6616
#>   rate = 6.05648e-05
#>   Expected precision (inverse variance) = 1.09241e+04 which implies 1/Expected precision  = 9.15408e-05 
#> 
#>   Applicable to gaussian models with compound pfamilies (e.g., dNormal_Gamma, dIndependent_Normal_Gamma),
#>   as well as for Gamma regression, quasipoisson, and quasibinomial models.

With pwt = 0.01, the prior exerts minimal influence, resulting in wide prior standard deviations and broad credible intervals. For example, the prior SD for c_wt is approximately 0.0212, yielding a 95% credible interval of ±0.0415 around the prior mean. This reflects substantial uncertainty and allows the posterior to be largely driven by the data. The intercept’s prior interval spans roughly 0.05 units, indicating weak regularization and high flexibility in model fitting.

pscars2
#> 
#> Call:  Prior_Setup(formula = gpm ~ c_wt + c_cyl, family = gaussian(), 
#>     data = mtcars, pwt = 0.5)
#> 
#> Setting up a Zellner g-type prior: 
#>   pwt = 0.5 
#>   g   = (1 - pwt)/pwt = 1 
#> 
#> Note: n_prior was computed as (pwt / (1 - pwt)) * n_likelihood: 
#>   n_prior      = 32 
#>   n_likelihood = 32 
#> 
#> Prior Setup Summary
#> ====================
#> 
#> Prior Estimates with 95% Confidence Intervals
#>             Prior.Mean Prior.SD  CI.Lower CI.Upper pwt
#> (Intercept)   0.054227 0.002298  0.049722 0.058732 0.5
#> c_wt          0.000000 0.003833 -0.007513 0.007513 0.5
#> c_cyl         0.000000 0.002100 -0.004116 0.004116 0.5
#> 
#> Prior Correlation Matrix
#>             (Intercept)    c_wt   c_cyl
#> (Intercept)           1  0.0000  0.0000
#> c_wt                  0  1.0000 -0.7825
#> c_cyl                 0 -0.7825  1.0000
#> 
#> Conditional Dispersion (Gaussian family):  2e-04 
#> 
#> Gamma Prior on Residual Precision:
#>   shape = 16.5
#>   rate = 0.0029
#>   Expected precision (inverse variance) = 5.74117e+03 which implies 1/Expected precision  = 1.74181e-04 
#> 
#>   Applicable to gaussian models with compound pfamilies (e.g., dNormal_Gamma, dIndependent_Normal_Gamma),
#>   as well as for Gamma regression, quasipoisson, and quasibinomial models.

At pwt = 0.50, the prior and likelihood contribute equally to the posterior mean. This results in much tighter prior distributions: the prior SD for c_wt drops to ~0.00383, and its credible interval narrows to ±0.00751. The same pattern holds for c_cyl, whose prior interval shrinks to ±0.00412. These narrower intervals reflect stronger regularization, constraining the posterior estimates toward the prior mean and reducing sensitivity to noisy data.

pscars3
#> 
#> Call:  Prior_Setup(formula = gpm ~ c_wt + c_cyl, family = gaussian(), 
#>     data = mtcars, pwt = 0.55)
#> 
#> Setting up a Zellner g-type prior: 
#>   pwt = 0.55 
#>   g   = (1 - pwt)/pwt = 0.8182 
#> 
#> Note: n_prior was computed as (pwt / (1 - pwt)) * n_likelihood: 
#>   n_prior      = 39.1111 
#>   n_likelihood = 32 
#> 
#> Prior Setup Summary
#> ====================
#> 
#> Prior Estimates with 95% Confidence Intervals
#>             Prior.Mean Prior.SD  CI.Lower CI.Upper  pwt
#> (Intercept)   0.054227 0.002151  0.050011 0.058443 0.55
#> c_wt          0.000000 0.003587 -0.007031 0.007031 0.55
#> c_cyl         0.000000 0.001965 -0.003852 0.003852 0.55
#> 
#> Prior Correlation Matrix
#>             (Intercept)    c_wt   c_cyl
#> (Intercept)           1  0.0000  0.0000
#> c_wt                  0  1.0000 -0.7825
#> c_cyl                 0 -0.7825  1.0000
#> 
#> Conditional Dispersion (Gaussian family):  2e-04 
#> 
#> Gamma Prior on Residual Precision:
#>   shape = 20.0556
#>   rate = 0.0037
#>   Expected precision (inverse variance) = 5.39083e+03 which implies 1/Expected precision  = 1.855e-04 
#> 
#>   Applicable to gaussian models with compound pfamilies (e.g., dNormal_Gamma, dIndependent_Normal_Gamma),
#>   as well as for Gamma regression, quasipoisson, and quasibinomial models.

Increasing pwt slightly to 0.55 further tightens the prior distributions. The prior SD for c_cyl falls to ~0.00197, and its credible interval contracts to ±0.00385. This subtle shift can have meaningful diagnostic consequences, especially for marginal predictors, by reducing posterior variance and potentially pushing tail probabilities above conventional significance thresholds. The intercept’s interval also narrows slightly, reinforcing the model’s confidence in its prior location.

The prior standard deviations shown here are derived automatically from the specified pwt values, reflecting the degree of shrinkage toward the prior mean. While this default behavior is useful for regularization and empirical calibration, analysts may sometimes prefer to set prior uncertainty explicitly—especially when they have domain knowledge about plausible effect sizes. We explore that approach in a later section using the sd argument, which allows for direct control over the spread of each prior distribution.


set.seed(333)

lmb_cars<- lmb(formula = gpm ~  c_wt+c_cyl,pfamily=dNormal(mu=pscars$mu,Sigma=pscars$Sigma, dispersion = pscars$dispersion),data =mtcars)
lmb_cars2<-lmb(formula = gpm ~  c_wt+c_cyl,pfamily=dNormal(mu=pscars2$mu,Sigma=pscars2$Sigma, dispersion = pscars2$dispersion),data =mtcars)
lmb_cars3<-lmb(formula = gpm ~  c_wt+c_cyl,pfamily=dNormal(mu=pscars3$mu,Sigma=pscars3$Sigma, dispersion = pscars3$dispersion),data =mtcars)


sum_lmbcars<-summary(lmb_cars)  
sum_lmbcars2<-summary(lmb_cars2)
sum_lmbcars3<-summary(lmb_cars3)
sum_lmbcars$coefficients
#>               Post.Mode   Post.Mean     Post.Sd     MC Error Pr(Null_tail)
#> (Intercept) 0.054227240 0.054219787 0.001266651 4.005503e-05         0.497
#> c_wt        0.010850283 0.010859793 0.002149825 6.798343e-05         0.000
#> c_cyl       0.002757255 0.002787737 0.001166016 3.687266e-05         0.008
#>                SE(tail) Pr(Prior_tail)
#> (Intercept) 0.015811104          0.497
#> c_wt        0.000000000          0.000
#> c_cyl       0.002817091          0.008

With minimal prior influence, the posterior closely tracks the classical estimates. Both c_wt and c_cyl show strong directional evidence, with c_cyl clearly significant (Pr(Null_tail) = 0.008), suggesting a meaningful association with fuel consumption.

sum_lmbcars2$coefficients
#>               Post.Mode   Post.Mean     Post.Sd     MC Error Pr(Null_tail)
#> (Intercept) 0.054227240 0.054169822 0.001615584 5.108925e-05         0.474
#> c_wt        0.005479941 0.005557560 0.002826496 8.938167e-05         0.025
#> c_cyl       0.001392553 0.001361256 0.001530493 4.839845e-05         0.192
#>                SE(tail) Pr(Prior_tail)
#> (Intercept) 0.015789997          0.474
#> c_wt        0.004937104          0.025
#> c_cyl       0.012455360          0.192

Moderate prior strength visibly shrinks both coefficients. c_wt remains significant (Pr(Null_tail) = 0.025), while c_cyl is no longer significant (Pr(Null_tail) = 0.192), indicating reduced relevance under regularization.

sum_lmbcars3$coefficients
#>               Post.Mode   Post.Mean     Post.Sd     MC Error Pr(Null_tail)
#> (Intercept) 0.054227240 0.054255290 0.001585555 5.013964e-05         0.485
#> c_wt        0.004931947 0.004972813 0.002816137 8.905406e-05         0.042
#> c_cyl       0.001253298 0.001225835 0.001512452 4.782795e-05         0.204
#>                SE(tail) Pr(Prior_tail)
#> (Intercept) 0.015804272          0.485
#> c_wt        0.006343185          0.042
#> c_cyl       0.012742998          0.204

A slight further increase in prior weight leaves c_cyl clearly non-significant (Pr(Null_tail) = 0.204), while c_wt stays significant (Pr(Null_tail) = 0.042). This illustrates how modest prior adjustments can suppress marginal predictors and shift interpretive conclusions.

Across models with increasing pwt, the posterior mean for c_cyl shrinks steadily toward zero, while its posterior standard deviation remains relatively stable. At low prior strength (pwt = 0.01), c_cyl is clearly significant (Pr(Null_tail) = 0.008), closely tracking the classical estimate. As pwt increases to 0.50, the tail probability rises to 0.192, and at pwt = 0.55 it reaches 0.204, well past conventional one-tailed thresholds, demonstrating how modest prior influence can suppress marginal effects. In contrast, c_wt remains significant across all prior settings, with posterior means and tail probabilities consistently indicating strong evidence. This progression highlights the diagnostic sensitivity of Bayesian inference and the interpretive leverage offered by prior calibration, especially when dealing with predictors of borderline relevance.

3.3 Using n_prior as an alternative

An alternative to pwt is the n_prior argument, which allows users to specify the number of prior pseudo-observations. This framing is intuitive when the analyst views the prior as representing a hypothetical dataset. While less direct than pwt in controlling shrinkage, it offers a natural interpretation in terms of prior sample size. We revisit this idea later when splitting the dataset, but for now we illustrate its use and diagnostic implications.
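The n_prior-to-pwt mapping that Prior_Setup() reports can be sketched as a small helper; this illustrates the printed relationship rather than the package source:

```r
# Invert n_prior = (pwt / (1 - pwt)) * n_likelihood to recover pwt
# (a sketch of the mapping Prior_Setup() reports, not its source code).
pwt_from_n_prior <- function(n_prior, n_likelihood) {
  n_prior / (n_prior + n_likelihood)
}

round(pwt_from_n_prior(c(1, 10, 32), 32), 4)  # 0.0303 0.2381 0.5000
```

With the 32 observations in mtcars, a single prior pseudo-observation thus corresponds to pwt = 1/33, and 32 pseudo-observations to an equal split of pwt = 0.5.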

# Use Prior_Setup to illustrate prior strength
pscars4 <- Prior_Setup(formula = gpm ~  c_wt+c_cyl,family=gaussian(),data = mtcars,n_prior=1)  ## n_prior=1--> pwt=0.0303
#> Computed pwt = 0.0303 from n_prior = 1 and n_effective = 32
pscars5 <- Prior_Setup(formula = gpm ~  c_wt+c_cyl,family=gaussian(),data = mtcars,n_prior=10)  ## n_prior=10--> pwt=0.2381
#> Computed pwt = 0.2381 from n_prior = 10 and n_effective = 32
pscars6 <- Prior_Setup(formula = gpm ~  c_wt+c_cyl,family=gaussian(),data = mtcars,n_prior=32)  ## n_prior=32--> pwt=0.5
#> Computed pwt = 0.5 from n_prior = 32 and n_effective = 32

set.seed(333)

lmb_cars4<-lmb(formula = gpm ~  c_wt+c_cyl,pfamily=dNormal(mu=pscars4$mu,Sigma=pscars4$Sigma, dispersion = pscars4$dispersion),data =mtcars)
lmb_cars5<-lmb(formula = gpm ~  c_wt+c_cyl,pfamily=dNormal(mu=pscars5$mu,Sigma=pscars5$Sigma, dispersion = pscars5$dispersion),data =mtcars)
lmb_cars6<-lmb(formula = gpm ~  c_wt+c_cyl,pfamily=dNormal(mu=pscars6$mu,Sigma=pscars6$Sigma, dispersion = pscars6$dispersion),data =mtcars)

sum_lmbcars4<-summary(lmb_cars4)
sum_lmbcars5<-summary(lmb_cars5)
sum_lmbcars6<-summary(lmb_cars6)

As we can see from above, we now specify the n_prior argument instead of the pwt argument when calling the Prior_Setup function; everything else remains essentially the same as when we use the pwt argument. The one thing to note is that the call to Prior_Setup now triggers a printout of the derived pwt value and the information from which it was computed.

sum_lmbcars4$coefficients
#>               Post.Mode   Post.Mean     Post.Sd     MC Error Pr(Null_tail)
#> (Intercept) 0.054227240 0.054219529 0.001310542 4.144298e-05         0.497
#> c_wt        0.010627765 0.010637603 0.002224319 7.033913e-05         0.000
#> c_cyl       0.002700709 0.002732247 0.001206419 3.815033e-05         0.013
#>                SE(tail) Pr(Prior_tail)
#> (Intercept) 0.015811104          0.497
#> c_wt        0.000000000          0.000
#> c_cyl       0.003582039          0.013

With a very weak prior (n_prior = 1), the posterior remains close to the MLE. c_cyl retains directional significance, reinforcing its empirical signal when prior influence is minimal.

sum_lmbcars5$coefficients
#>               Post.Mode   Post.Mean     Post.Sd     MC Error Pr(Null_tail)
#> (Intercept) 0.054227240 0.054170966 0.001583374 5.007069e-05         0.474
#> c_wt        0.008350386 0.008426458 0.002770144 8.759966e-05         0.002
#> c_cyl       0.002121986 0.002091313 0.001499980 4.743353e-05         0.091
#>                SE(tail) Pr(Prior_tail)
#> (Intercept) 0.015789997          0.474
#> c_wt        0.001412799          0.002
#> c_cyl       0.009094999          0.091

With a moderate prior (n_prior = 10), the posterior mean for c_cyl is visibly shrunk, and its tail probability increases, suggesting reduced confidence in its directional effect.

sum_lmbcars6$coefficients

At n_prior = 32 (equivalent to pwt = 0.5), the results mirror those from the earlier pwt-based model. This confirms the equivalence of the two approaches and reinforces the interpretive leverage offered by prior calibration.

3.4 Directly Specifying Prior Standard Deviations with sd

The previous examples illustrate how pwt indirectly determines the prior standard deviations by blending the prior mean with the maximum likelihood estimate. As shown in the output from Prior_Setup, increasing pwt results in tighter prior distributions—reflected in smaller Prior.SD values and narrower credible intervals.

In contrast, the sd argument allows the analyst to explicitly set the prior standard deviation for each coefficient. This offers direct control over the spread of the prior distribution, independent of sample size or likelihood curvature. It is especially useful when the analyst has domain knowledge about plausible effect sizes or wants to impose regularization that is not tied to the data-driven scale of the MLE. We now illustrate how to use the sd argument to manually specify prior uncertainty and compare its behavior to the pwt-based default.
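Under the scalar algebra from Section 2, a directly specified prior SD implies a per-coefficient weight of roughly \(pwt_j = v_j / (v_j + sd_j^2)\), where \(v_j\) is the likelihood variance of coefficient j. The sketch below is a hypothetical approximation of this relationship; the pwt values Prior_Setup() actually reports also reflect dispersion recalibration and can differ slightly:

```r
# Hypothetical sketch: implied per-coefficient prior weight from a
# directly specified prior SD.  Approximate only: Prior_Setup() also
# recalibrates the dispersion when computing its reported pwt.
implied_pwt <- function(se_mle, sd_prior) {
  se_mle^2 / (se_mle^2 + sd_prior^2)
}

implied_pwt(se_mle = 0.0021, sd_prior = 0.010)    # weak prior -> small weight
implied_pwt(se_mle = 0.0012, sd_prior = 0.0015)   # tight prior -> larger weight
```

A large sd relative to the likelihood scale yields a small implied weight (a weak prior), while a tight sd yields a larger weight, mirroring the per-coefficient pwt column in the output below.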

pscars7 <- Prior_Setup(formula = gpm ~  c_wt+c_cyl,family=gaussian(),data =mtcars,sd=c(0.005,0.010,0.0015),n_prior=(0.01/0.99)*32)  ## 
#> Computed pwt from user-specified prior standard deviations (sd).
pscars7
#> 
#> Call:  Prior_Setup(formula = gpm ~ c_wt + c_cyl, family = gaussian(), 
#>     data = mtcars, n_prior = (0.01/0.99) * 32, sd = c(0.005, 
#>         0.01, 0.0015))
#> 
#> Note: n_prior was provided to Prior_Setup (scalar prior sample size for precision / calibration; pwt is per coefficient):
#>   n_prior      = 0.3232 
#>   n_likelihood = 32 
#> 
#> Note: Differential prior weights (pwt) were specified per coefficient.
#> 
#> Prior Setup Summary
#> ====================
#> 
#> Prior Estimates with 95% Confidence Intervals
#>             Prior.Mean Prior.SD  CI.Lower CI.Upper      pwt
#> (Intercept)   0.054227   0.0050  0.044427 0.064027 0.058549
#> c_wt          0.000000   0.0100 -0.019600 0.019600 0.041447
#> c_cyl         0.000000   0.0015 -0.002940 0.002940 0.365816
#> 
#> Prior Correlation Matrix
#>             (Intercept)    c_wt   c_cyl
#> (Intercept)           1  0.0000  0.0000
#> c_wt                  0  1.0000 -0.7825
#> c_cyl                 0 -0.7825  1.0000
#> 
#> Conditional Dispersion (Gaussian family):  1e-04 
#> 
#> Gamma Prior on Residual Precision:
#>   shape = 0.6616
#>   rate = 7.99899e-05
#>   Expected precision (inverse variance) = 8.27125e+03 which implies 1/Expected precision  = 1.20901e-04 
#> 
#>   Applicable to gaussian models with compound pfamilies (e.g., dNormal_Gamma, dIndependent_Normal_Gamma),
#>   as well as for Gamma regression, quasipoisson, and quasibinomial models.

To impose differentiated prior strength across model coefficients, we manually specify a vector of prior standard deviations. The intercept and weight coefficients are assigned relatively weak priors (SDs of 0.005 and 0.010), allowing moderate flexibility while still exerting some regularization. In contrast, the cylinder coefficient receives a much tighter prior (SD = 0.0015), reflecting stronger skepticism about its effect size. This yields credible intervals of ±0.0196 for c_wt and ±0.00294 for c_cyl, and corresponds to implied pwt values of approximately 0.041 and 0.366 respectively. The result is a prior structure that allows the data to dominate where signal is strong, while constraining marginal predictors more aggressively. It is worth noting that the same kind of differentiated prior can be generated by passing pwt as a vector instead of as a scalar. This vectorized approach to prior specification offers a flexible alternative to scalar pwt, enabling analysts to encode domain-specific beliefs about relative effect plausibility directly into the prior structure—without relying on sample-size equivalence or uniform shrinkage.
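The implied weights above can be reproduced by hand. Under the g-prior structure reported by Prior_Setup, Prior.SD = sqrt(g) * Like.sd with g = (1 - pwt)/pwt, so a user-supplied sd maps to pwt = 1/(1 + (sd/Like.sd)^2). The following base-R sketch (a consistency check, not package code; the Like.sd values are copied from the model summaries in this chapter) recovers the printed pwt column:

```r
# Likelihood SDs for (Intercept), c_wt, c_cyl, as reported in the summaries
like_sd <- c(0.0012469, 0.002079389, 0.00113924)
sd_user <- c(0.005, 0.010, 0.0015)   # the prior SDs passed to Prior_Setup

g_imp <- (sd_user / like_sd)^2       # implied g per coefficient
pwt   <- 1 / (1 + g_imp)             # since g = (1 - pwt)/pwt

round(pwt, 6)                        # close to 0.058549, 0.041447, 0.365816
```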

For this particular dataset, there are clear theoretical reasons to expect a near-proportional relationship between vehicle weight and fuel consumption. This aligns with physical models of rolling resistance and energy expenditure, which scale linearly with mass. In contrast, the theoretical basis for an effect from cylinder count is less direct. While cylinder number may proxy engine size or complexity, its impact on fuel consumption is mediated by design choices and driving conditions. As such, there is a compelling rationale for applying a more skeptical prior to the cylinder variable—reflecting weaker prior belief in its relevance. We now run the model using differentiated prior standard deviations and display the resulting output.


set.seed(333)
lmb_cars7 <- lmb(formula = gpm ~ c_wt + c_cyl,
                 pfamily = dNormal(mu = pscars7$mu, Sigma = pscars7$Sigma,
                                   dispersion = pscars7$dispersion),
                 data = mtcars)
sum_lmbcars7<-summary(lmb_cars7)
sum_lmbcars7
#> Call
#> lmb(formula = gpm ~ c_wt + c_cyl, pfamily = dNormal(mu = pscars7$mu, 
#>     Sigma = pscars7$Sigma, dispersion = pscars7$dispersion), 
#>     data = mtcars)
#> 
#> Expected Residuals:
#>           Min            1Q        Median            3Q           Max 
#> -1.605366e-02 -5.186900e-03  6.241659e-05  4.642549e-03  1.339404e-02 
#> 
#> Prior and Maximum Likelihood Estimates with Standard Deviations
#> 
#>             Null Mode Prior Mean Prior.sd Max Like. Like.sd
#> (Intercept)  0.054227   0.054227 0.005000  0.054227   0.001
#> c_wt         0.000000   0.000000 0.010000  0.010960   0.002
#> c_cyl        0.000000   0.000000 0.001500  0.002785   0.001
#> 
#> Bayesian Estimates Based on 1000 iid draws
#> 
#>             Post.Mode Post.Mean   Post.Sd  MC Error Pr(Null_tail) SE(tail)
#> (Intercept) 5.423e-02 5.422e-02 1.404e-03 4.439e-05     4.970e-01    0.016
#> c_wt        1.388e-02 1.391e-02 2.079e-03 6.573e-05     0.000e+00    0.000
#> c_cyl       1.283e-04 1.509e-04 8.647e-04 2.734e-05     4.380e-01    0.016
#>             Pr(Prior_tail)    
#> (Intercept)          0.497    
#> c_wt                <2e-16 ***
#> c_cyl                0.438    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>   Directional Tail Summaries:
#> 
#>                Metric vs Null vs Prior
#>  Mahalanobis Distance 11.1603  11.1603
#>      Tail Probability  0.0000   0.0000
#>   [Tail probabilities are P(delta^T * Z <= 0) in whitened space]
#> 
#> 
#> Distribution Percentiles
#> 
#>                   1.0%       2.5%       5.0%     Median      95.0%      97.5%
#> (Intercept)  0.0509665  0.0514507  0.0518178  0.0542199  0.0564434  0.0571390
#> c_wt         0.0087148  0.0097131  0.0105347  0.0138904  0.0173886  0.0179081
#> c_cyl       -0.0018906 -0.0015000 -0.0011939  0.0001024  0.0016031  0.0019327
#>             99.0%
#> (Intercept) 0.058
#> c_wt        0.019
#> c_cyl       0.002
#> 
#> Effective Number of Parameters: 2.310513 
#> Expected Residual Deviance: 27.41328 
#> DIC: -218.1337 
#> 
#> Expected Mean dispersion: 6.886086e-05 
#> Sq.root of Expected Mean dispersion: 0.008298245 
#> 
#> Mean Likelihood Subgradient Candidates Per iid sample: 1

The posterior output reflects the impact of our manually specified prior standard deviations, which imposed differentiated regularization across coefficients. The prior for c_cyl was especially tight (SD = 0.0015), resulting in substantial shrinkage: the posterior mean dropped to roughly 0.00015, well below the MLE of 0.00279. Despite this, the posterior standard deviation remained relatively large in proportion, and the tail probability rose to 0.438, clearly non-significant under a one-tailed test. This confirms that the skeptical prior effectively suppressed a marginal signal. In contrast, the prior for c_wt was much looser (SD = 0.010), allowing the posterior mean to remain close to the MLE (0.0139 vs. 0.01096) and yielding a highly significant tail probability (<2e-16). This reinforces the robustness of the weight effect, even under moderate regularization. The intercept, with a prior SD of 0.005, shows minimal shrinkage and remains centered near its MLE. The Mahalanobis distance (11.16) and directional tail probability (0.0000) indicate substantial prior-posterior conflict: the prior mean lies far from the posterior mean, and posterior draws almost never fall in the direction of the prior. This suggests that the prior structure is strongly contradicted by the data.

This model demonstrates how targeted prior specification can clarify which predictors are robustly supported by the data and which are sensitive to prior skepticism. The differential shrinkage across dimensions is both diagnostically informative and theoretically justified.

4. Adjusting the prior means using intercept_source, effects_source, and mu

By default, Prior_Setup() centers the prior mean for the intercept at the value estimated from the null model (i.e., the mean of the outcome), and sets the prior means for all effects to zero. However, in many modeling contexts, it is useful to shift these prior means to reflect domain knowledge, empirical estimates, or structural expectations. This can be done in three ways: by using intercept_source, by using effects_source, or by manually specifying a prior mean vector via mu.

4.1 Using intercept_source and effects_source

These arguments allow the prior mean to be derived from fitted models:

  • intercept_source = "null_model": sets the prior mean for the intercept based on a model with no predictors (i.e., just the mean of the outcome).
  • intercept_source = "full_model": uses the intercept from the full model fit.
  • effects_source = "null_effects": sets prior means for predictors to zero.
  • effects_source = "full_model": uses the coefficients from the full model fit as prior means.

This approach is useful when the analyst wants to regularize toward empirical estimates, especially in hierarchical or shrinkage settings.

pscars8 <- Prior_Setup(formula = gpm ~ c_wt + c_cyl, family = gaussian(), data = mtcars,
                       intercept_source = "full_model", effects_source = "full_model")
#> Using default pwt = 0.01 (low-d default).
pscars8
#> 
#> Call:  Prior_Setup(formula = gpm ~ c_wt + c_cyl, family = gaussian(), 
#>     data = mtcars, intercept_source = "full_model", effects_source = "full_model")
#> 
#> Setting up a Zellner g-type prior: 
#>   pwt = 0.01 
#>   g   = (1 - pwt)/pwt = 99 
#> 
#> Note: n_prior was computed as (pwt / (1 - pwt)) * n_likelihood: 
#>   n_prior      = 0.3232 
#>   n_likelihood = 32 
#> 
#> Prior Setup Summary
#> ====================
#> 
#> Prior Estimates with 95% Confidence Intervals
#>             Prior.Mean Prior.SD  CI.Lower CI.Upper  pwt
#> (Intercept)   0.054227 0.012407  0.029911 0.078544 0.01
#> c_wt          0.010960 0.020690 -0.029591 0.051511 0.01
#> c_cyl         0.002785 0.011335 -0.019432 0.025002 0.01
#> 
#> Prior Correlation Matrix
#>             (Intercept)    c_wt   c_cyl
#> (Intercept)           1  0.0000  0.0000
#> c_wt                  0  1.0000 -0.7825
#> c_cyl                 0 -0.7825  1.0000
#> 
#> Conditional Dispersion (Gaussian family):  0 
#> 
#> Gamma Prior on Residual Precision:
#>   shape = 0.6616
#>   rate = 5.77931e-05
#>   Expected precision (inverse variance) = 1.1448e+04 which implies 1/Expected precision  = 8.73514e-05 
#> 
#>   Applicable to gaussian models with compound pfamilies (e.g., dNormal_Gamma, dIndependent_Normal_Gamma),
#>   as well as for Gamma regression, quasipoisson, and quasibinomial models.

From this output, we can see that the prior means coincide with the MLE estimates, even though the prior itself remains diffuse. Let’s now estimate the model using that prior.


set.seed(333)
lmb_cars8 <- lmb(formula = gpm ~ c_wt + c_cyl,
                 pfamily = dNormal(mu = pscars8$mu, Sigma = pscars8$Sigma,
                                   dispersion = pscars8$dispersion),
                 data = mtcars)
sum_lmbcars8<-summary(lmb_cars8)
sum_lmbcars8$coefficients1
#>              Null Mode  Prior Mean   Prior.sd   Max Like.     Like.sd
#> (Intercept) 0.05422724 0.054227240 0.01240650 0.054227240 0.001246900
#> c_wt        0.00000000 0.010959882 0.02068966 0.010959882 0.002079389
#> c_cyl       0.00000000 0.002785106 0.01133530 0.002785106 0.001139240
sum_lmbcars8$coefficients
#>               Post.Mode   Post.Mean     Post.Sd     MC Error Pr(Null_tail)
#> (Intercept) 0.054227240 0.054219960 0.001237328 3.912774e-05         0.497
#> c_wt        0.010959882 0.010969171 0.002100056 6.640959e-05         0.000
#> c_cyl       0.002785106 0.002814882 0.001139022 3.601904e-05         0.007
#>               SE(tail) Pr(Prior_tail)
#> (Intercept) 0.01581110          0.497
#> c_wt        0.01581139          0.500
#> c_cyl       0.01580883          0.491
# Fit a reduced OLS model to obtain an empirical estimate for the c_wt effect
lmcars_small <- lm(formula = gpm ~  c_wt,  data = mtcars)
lmcars_small
#> 
#> Call:
#> lm(formula = gpm ~ c_wt, data = mtcars)
#> 
#> Coefficients:
#> (Intercept)         c_wt  
#>     0.05423      0.01494
# Use Prior_Setup to illustrate prior strength
pscars9 <- Prior_Setup(formula = gpm ~ c_wt + c_cyl, family = gaussian(), data = mtcars,
                       pwt = 0.01, mu = c(0.05423, 0.01494, 0))
#> Using user-specified prior mean vector (mu).
pscars9
#> 
#> Call:  Prior_Setup(formula = gpm ~ c_wt + c_cyl, family = gaussian(), 
#>     data = mtcars, pwt = 0.01, mu = c(0.05423, 0.01494, 0))
#> 
#> Setting up a Zellner g-type prior: 
#>   pwt = 0.01 
#>   g   = (1 - pwt)/pwt = 99 
#> 
#> Note: n_prior was computed as (pwt / (1 - pwt)) * n_likelihood: 
#>   n_prior      = 0.3232 
#>   n_likelihood = 32 
#> 
#> Prior Setup Summary
#> ====================
#> 
#> Prior Estimates with 95% Confidence Intervals
#>             Prior.Mean Prior.SD  CI.Lower CI.Upper  pwt
#> (Intercept)    0.05423 0.012419  0.029889 0.078571 0.01
#> c_wt           0.01494 0.020711 -0.025653 0.055533 0.01
#> c_cyl          0.00000 0.011347 -0.022240 0.022240 0.01
#> 
#> Prior Correlation Matrix
#>             (Intercept)    c_wt   c_cyl
#> (Intercept)           1  0.0000  0.0000
#> c_wt                  0  1.0000 -0.7825
#> c_cyl                 0 -0.7825  1.0000
#> 
#> Conditional Dispersion (Gaussian family):  0 
#> 
#> Gamma Prior on Residual Precision:
#>   shape = 0.6616
#>   rate = 5.79122e-05
#>   Expected precision (inverse variance) = 1.14245e+04 which implies 1/Expected precision  = 8.75314e-05 
#> 
#>   Applicable to gaussian models with compound pfamilies (e.g., dNormal_Gamma, dIndependent_Normal_Gamma),
#>   as well as for Gamma regression, quasipoisson, and quasibinomial models.
set.seed(333)
lmb_cars9 <- lmb(formula = gpm ~ c_wt + c_cyl,
                 pfamily = dNormal(mu = pscars9$mu, Sigma = pscars9$Sigma,
                                   dispersion = pscars9$dispersion),
                 data = mtcars)
sum_lmbcars9<-summary(lmb_cars9)
sum_lmbcars9$coefficients1
#>              Null Mode Prior Mean   Prior.sd   Max Like.     Like.sd
#> (Intercept) 0.05422724    0.05423 0.01241928 0.054227240 0.001246900
#> c_wt        0.00000000    0.01494 0.02071097 0.010959882 0.002079389
#> c_cyl       0.00000000    0.00000 0.01134697 0.002785106 0.001139240
sum_lmbcars9$coefficients
#>               Post.Mode   Post.Mean     Post.Sd     MC Error Pr(Null_tail)
#> (Intercept) 0.054227267 0.054219980 0.001238602 3.916804e-05         0.497
#> c_wt        0.010999683 0.011008982 0.002102218 6.647798e-05         0.000
#> c_cyl       0.002757255 0.002787062 0.001140195 3.605614e-05         0.008
#>                SE(tail) Pr(Prior_tail)
#> (Intercept) 0.015811104          0.497
#> c_wt        0.005480785          0.031
#> c_cyl       0.002817091          0.008
# Use Prior_Setup to illustrate prior strength
ps4 <- Prior_Setup(
  formula = gpm ~ c_wt + c_cyl,
  family = gaussian(),
  data = mtcars,
  n_prior = 1
)
#> Computed pwt = 0.0303 from n_prior = 1 and n_effective = 32
ps4
#> 
#> Call:  Prior_Setup(formula = gpm ~ c_wt + c_cyl, family = gaussian(), 
#>     data = mtcars, n_prior = 1)
#> 
#> Setting up a Zellner g-type prior: 
#>   pwt = 0.0303 
#>   g   = (1 - pwt)/pwt = 32 
#> 
#> Note: n_prior was computed as (pwt / (1 - pwt)) * n_likelihood: 
#>   n_prior      = 1 
#>   n_likelihood = 32 
#> 
#> Prior Setup Summary
#> ====================
#> 
#> Prior Estimates with 95% Confidence Intervals
#>             Prior.Mean Prior.SD  CI.Lower CI.Upper      pwt
#> (Intercept)   0.054227 0.007549  0.039432 0.069022 0.030303
#> c_wt          0.000000 0.012589 -0.024673 0.024673 0.030303
#> c_cyl         0.000000 0.006897 -0.013518 0.013518 0.030303
#> 
#> Prior Correlation Matrix
#>             (Intercept)    c_wt   c_cyl
#> (Intercept)           1  0.0000  0.0000
#> c_wt                  0  1.0000 -0.7825
#> c_cyl                 0 -0.7825  1.0000
#> 
#> Conditional Dispersion (Gaussian family):  1e-04 
#> 
#> Gamma Prior on Residual Precision:
#>   shape = 1
#>   rate = 8.54744e-05
#>   Expected precision (inverse variance) = 1.16994e+04 which implies 1/Expected precision  = 8.54744e-05 
#> 
#>   Applicable to gaussian models with compound pfamilies (e.g., dNormal_Gamma, dIndependent_Normal_Gamma),
#>   as well as for Gamma regression, quasipoisson, and quasibinomial models.

4.2 The role of the intercept_source in setting the prior mean for the intercept

The intercept_source argument is used to initialize the prior mean for the intercept. Our Prior_Setup function provides two options. The default ("null_model") initializes the prior mean to the intercept from a glm() fit of the null (intercept-only) model, while the alternative ("full_model") uses the intercept from a glm() fit of the full model.

The choice between these two specifications has implications for the resulting estimates and standard errors of the effect parameters. Under the former, tail probabilities for effects under the default settings (discussed further below) are likely to be relatively close to one-sided p-values from the classical frequentist model. The latter can be a good choice if we want to test the hypothesis that the true model is an intercept-only model. How meaningfully the two specifications differ depends to a substantial degree on whether continuous variables have been centered and/or categorical variables have been assigned reasonable reference levels.

The null model is the default because it tends to yield more reasonable predictions when combined with null effects.

In glmbayes, the default prior centering reflects a pragmatic balance between statistical rigor and user convenience:

4.2.1 Intercept Prior: Centered at Full Model MLE

Setting intercept_source = "full_model" centers the prior for the intercept at the maximum likelihood estimate from the full model. This avoids unintended shrinkage caused by poor centering or unbalanced predictors, especially in models with categorical variables. When predictors are not scaled or centered, the intercept in a full model can diverge substantially from the null model baseline. Even a weak prior (small pwt) can exert disproportionate influence if the prior mean is misaligned.

Centering the intercept prior at the full model MLE ensures:

  • Alignment with the model’s structural interpretation of the intercept
  • Robustness to variable coding and imbalance
  • Minimal distortion from prior miscentering
  • A smoother user experience without requiring manual preprocessing

4.2.2 Effect Priors: Centered at Zero

By contrast, the default prior for effect coefficients is centered at zero. This reflects a common statistical convention: the null hypothesis typically assumes no effect. Centering at zero allows users to interpret posterior summaries in terms of deviation from no effect, and makes tail probabilities (e.g., \(\Pr(\beta < 0)\)) directly meaningful.

This setup also makes it easy to specify informative priors centered at zero simply by adjusting:

  • pwt: the prior weight, interpreted as relative sample size
  • n_prior: the number of pseudo-observations contributing to the prior
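The two strength parameters are interchangeable through the conversion shown in the Prior_Setup notes, n_prior = (pwt / (1 - pwt)) * n_likelihood, which inverts to pwt = n_prior / (n_prior + n_likelihood). A quick base-R check using the mtcars sample size (n_likelihood = 32) from this chapter's examples:

```r
n_like <- 32

# pwt -> n_prior: the default pwt = 0.01 corresponds to about 0.32 pseudo-observations
pwt_to_nprior <- function(pwt, n) (pwt / (1 - pwt)) * n
pwt_to_nprior(0.01, n_like)     # about 0.3232, as in the printed notes

# n_prior -> pwt: one pseudo-observation corresponds to pwt = 1/33
nprior_to_pwt <- function(n_prior, n) n_prior / (n_prior + n)
nprior_to_pwt(1, n_like)        # about 0.0303, as in the n_prior = 1 example
```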

4.2.3 Summary

Prior centering in glmbayes reflects two complementary principles:

  • For intercepts: centering at the full model MLE avoids misalignment and unintended shrinkage.
  • For effects: centering at zero reflects the null hypothesis and supports intuitive posterior interpretation.

This design ensures that default behavior is both statistically sound and user-friendly, while allowing flexible prior specification through pwt or n_prior.

4.3 The role of the effects_source in setting the prior mean for the model effects

As with the intercept_source argument, our Prior_Setup function provides two alternative options.

The default is null_effects (in other words, all effect prior means equal zero). When combined with the default setting for intercept_source, this tends to result in tail probabilities close to the one-sided p-values of the classical frequentist model (the same may or may not hold when intercept_source is set to "full_model").

The alternative is to set them equal to the estimates from the full_model. This is likely to be preferred if one wants to minimize the impact of the prior on the final estimates in the sense that they mirror the classical frequentist estimates.

4.4 Manually Setting Prior Means Using mu

In addition to the automated centering mechanisms provided by intercept_source and effects_source, users may directly specify the prior mean vector through the mu argument. When supplied, mu overrides all other centering logic and is used exactly as provided.

4.4.1 How mu interacts with the other arguments

The precedence rules in Prior_Setup() are:

  1. If mu is supplied, it is used verbatim, and intercept_source and effects_source are ignored.

  2. If mu is not supplied, the function constructs the prior mean vector using:

    • intercept_source for the intercept
    • effects_source for all non‑intercept coefficients
  3. The length of mu must match the number of coefficients in the model matrix.

4.4.2 Requirements for user‑specified mu

mu may be provided as:

  • a numeric vector of length equal to the number of coefficients, or
  • a matrix of dimension nvar × 1

The function validates:

  • numeric type
  • correct length
  • correct dimensions

and assigns coefficient names automatically.

If the dimensions do not match, the function stops with an informative error.

4.4.3 When manual mu is useful

Manually specifying mu is appropriate when:

  • domain knowledge suggests plausible effect sizes
  • the analyst wants to encode directional or magnitude expectations
  • the default centering (null model or full model) is inappropriate
  • different predictors require different prior locations
  • the model is part of a hierarchical or multi‑level structure
  • the analyst wants to reproduce a prior from an external study

4.4.4 Example: specifying a custom prior mean

A simple example using the mtcars data:


ps_custom <- Prior_Setup(
  gpm ~ c_wt + c_cyl,
  data = mtcars,
  pwt  = 0.01,
  mu   = c(0.05423, 0.01494, 0)   # intercept, c_wt, c_cyl
)
#> Using user-specified prior mean vector (mu).

ps_custom$mu
#>                  mu
#> (Intercept) 0.05423
#> c_wt        0.01494
#> c_cyl       0.00000

This produces a prior centered at the user‑specified values, with the prior covariance still determined by pwt (or by n_prior or sd, if supplied).

4.4.5 Relationship between mu and prior strength

It is important to distinguish:

  • mu controls the location of the prior, while
  • pwt, n_prior, and sd control the strength of the prior.

Thus:

  • A strongly informative prior mean will have minimal influence if pwt is small.
  • A weakly informative prior mean can have substantial influence if pwt is large.
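For the Gaussian dNormal examples in this chapter, where the prior covariance follows the g-scaled structure shown in the setup output, this interplay has a simple closed form: with fixed dispersion, the posterior mode is the blend (1 - pwt) * beta_hat + pwt * mu. A base-R check against the pscars9 fit earlier in this chapter (MLEs and prior means copied from its output):

```r
pwt      <- 0.01
beta_hat <- c(0.054227240, 0.010959882, 0.002785106)  # MLEs from the summary
mu       <- c(0.05423, 0.01494, 0.00000)              # user-specified prior means

# Posterior mode as a precision-weighted blend of MLE and prior mean
post_mode <- (1 - pwt) * beta_hat + pwt * mu
round(post_mode, 6)   # matches the reported Post.Mode values
```

A small pwt therefore moves the mode only about 1% of the way from the MLE toward mu, which is why the c_wt estimate barely shifts despite its prior mean of 0.01494.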

4.4.6 Summary

  • Use mu when you want full control over prior centering.
  • Use intercept_source and effects_source when you want automatic empirical centering.
  • mu always overrides the other two.
  • Prior strength is controlled separately through pwt, n_prior, or sd.

5. Practical Considerations for Prior Means and Prior Strength

Specifying a prior involves two connected decisions:

  1. Where to center the prior (the prior mean)
  2. How strongly the prior should pull the posterior (the prior variance or weight)

The behavior of the posterior depends on both. This section summarizes practical considerations that help guide the choice of intercept_source, effects_source, mu, and the prior strength parameters pwt, n_prior, and sd.

5.1 Centering and Scaling of Predictors

The effect of prior centering depends on how predictors are coded.

  • Continuous predictors should ideally be centered. This aligns the null-model intercept with the full-model intercept and reduces unintended shrinkage.
  • Categorical predictors should have interpretable reference levels. Poor reference levels can cause the full-model intercept to differ substantially from the null-model intercept.

When predictors are not centered or reference levels are not meaningful, using intercept_source = "full_model" often produces more stable behavior.
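For reference, the derived mtcars variables used throughout this chapter can be reconstructed as follows. This construction is an assumption consistent with the printed output: gpm appears to be gallons per mile (1/mpg), and c_wt, c_cyl mean-centered versions of wt and cyl.

```r
data(mtcars)
mtcars$gpm   <- 1 / mtcars$mpg                  # fuel consumption, gallons per mile
mtcars$c_wt  <- mtcars$wt  - mean(mtcars$wt)    # centered weight
mtcars$c_cyl <- mtcars$cyl - mean(mtcars$cyl)   # centered cylinder count

# With centered predictors, the OLS intercept equals mean(gpm) = 0.054227,
# the null-model intercept seen in every Prior_Setup summary in this chapter
fit <- lm(gpm ~ c_wt + c_cyl, data = mtcars)
coef(fit)[1]
```

Centering in this way is what makes the null-model and full-model intercepts coincide in these examples.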

5.2 Choosing the Prior Mean for the Intercept

  • intercept_source = "null_model" (default)
    Appropriate when predictors are centered or balanced. Posterior tail probabilities resemble classical one-sided p-values.

  • intercept_source = "full_model"
    Preferable when predictors are uncentered or categorical. This avoids shrinkage toward an intercept that does not reflect the structure of the full model.

The intercept prior can have a noticeable influence when predictors are poorly centered, even when pwt is small.

5.3 Choosing the Prior Means for the Effects

  • effects_source = "null_effects" (default)
    Centers all effects at zero. Appropriate for hypothesis testing or when the analyst wants posterior summaries relative to the classical null.

  • effects_source = "full_model"
    Centers the prior at the maximum likelihood estimates. Useful when the goal is regularization rather than hypothesis testing, or when the analyst wants the posterior to remain close to the classical estimates.

5.4 When to Specify mu Directly

Manual specification of mu is appropriate when:

  • domain knowledge suggests plausible effect sizes
  • the analyst wants to encode directional expectations
  • the default centering mechanisms are inappropriate
  • different predictors require different prior locations
  • the model is part of a hierarchical or multi-level structure

When mu is supplied, it overrides both intercept_source and effects_source.

5.5 Choosing the Strength of the Prior

The strength of the prior is controlled by:

  • pwt: prior weight relative to the likelihood
  • n_prior: effective prior sample size
  • sd: prior standard deviations (converted internally to pwt)

These parameters determine how strongly the prior mean influences the posterior.

  • Small pwt (for example, 0.01)
    The posterior remains close to the maximum likelihood estimate unless the prior mean is far from the likelihood estimate.

  • Moderate or large pwt (for example, from n_prior = 1 or small sd)
    The posterior is pulled more strongly toward the prior mean.

  • Vector pwt or vector sd
    Allows different predictors to have different prior strengths.
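These settings map to prior standard deviations through the g-prior scaling Prior.SD = sqrt((1 - pwt)/pwt) * Like.sd, and sd inverts that mapping (which is how Prior_Setup computes pwt when sd is supplied). A base-R sketch using the intercept's likelihood SD copied from the mtcars examples:

```r
like_sd_int <- 0.0012469   # intercept Like.sd reported in the summaries above

prior_sd    <- function(pwt, like_sd) sqrt((1 - pwt) / pwt) * like_sd
pwt_from_sd <- function(sd, like_sd) 1 / (1 + (sd / like_sd)^2)

prior_sd(0.01, like_sd_int)                            # about 0.0124, the diffuse default
pwt_from_sd(prior_sd(0.01, like_sd_int), like_sd_int)  # recovers pwt = 0.01
```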

5.6 Interaction Between Prior Location and Prior Strength

Prior location and prior strength must be considered together.

  • A well-chosen prior mean has little effect if the prior is weak.
  • A poorly chosen prior mean can have substantial influence if the prior is strong.
  • Using "full_model" centering with a strong prior can effectively reproduce the maximum likelihood estimates.
  • Using "null_effects" with a strong prior can shrink effects strongly toward zero.

This interaction is especially important when predictors are uncentered or when the intercept differs substantially between the null and full models.

5.7 Summary

  • Prior means (intercept_source, effects_source, mu) determine where the prior pulls.
  • Prior strength (pwt, n_prior, sd) determines how strongly it pulls.
  • Center and scale predictors when possible.
  • Choose prior centering based on the modeling goal:
    • classical-like inference: "null_model" and "null_effects"
    • regularization toward empirical estimates: "full_model"
    • domain-informed priors: mu
  • Always consider prior strength when evaluating the impact of prior centering.

These considerations help ensure that prior specification behaves predictably and supports the analyst’s inferential goals.

References

Evans, Michael, and Hadas Moshonov. 2006. “Checking for Prior-Data Conflict.” Bayesian Analysis 1 (4): 893–914. https://doi.org/10.1214/06-BA129.
McCullagh, P., and J. A. Nelder. 1989. Generalized Linear Models. Chapman & Hall.
Nygren, K. N., and L. M. Nygren. 2006. “Likelihood Subgradient Densities.” Journal of the American Statistical Association 101 (475): 1144–56. https://doi.org/10.1198/016214506000000357.
Venables, W. N., and B. D. Ripley. 2002. Modern Applied Statistics with S. Springer.
Zellner, Arnold. 1986. “On Assessing Prior Distributions and Bayesian Regression Analysis with g‐prior Distributions.” In Bayesian Inference and Decision Techniques: Essays in Honor of Bruno de Finetti, edited by P. K. Goel and Arnold Zellner, vol. 6. Studies in Bayesian Econometrics and Statistics. Elsevier.