The glmbayes package produces iid samples for Bayesian Linear Models and Bayesian Genereralized Linear Models by providing Bayesian versions of the lmb and glm functions for classical models. Extensive efforts have been made to make the new functions, lmb and glmb, as similar as possible to their classical cousins. To that end, the functions add a single required argument (pfamily) and one optional argument (n) to those typically used by the classical functions. The Bayesian functions also generally inherit and/or provide implemented methods for nearly all of the methods available for the lm and glm functions so that the same type of outputs as from the classical functions can be readily produced.
This vignette is designed to provide users a detailed overview of the package and the various functionalities available. In the rest of this introduction, we briefly touch on the basics of loading the package and accessings various help resources and demos (which we recommend as a next step for users after they read this vignette). The next section then gives an overview of all the core functions that most users are likely to use. This includes the two aforementioned functions, functions used to specify and check prior distributions, and methods available for the two functions. Most users can likely stop after that section and can then turn to either the help pages or the next set of vignettes that go into a more detailed comparisons of how the methods are used for these Bayesian functions vs. for their classical cousins.
For advanced users who are interested in implementing Block-Gibbs sampling procedures or in doing simulation for other more complex models, we recommend continuing on to the third section in this vignette that covers the two functions rlmb and rglmb and are designed for implementation of precisely that type of sampling. The third section discusses how those functions can be called, references some examples/demos where they are used, and describes how to use the output from the functions to access similar output to that produced by the lmb and glmb functions.
For users who are interested in the details of how the simulation is performed there is the final section that contains a discussion of the various simulation functions that are used to sample from the posterior distributions. This includes a discussion of three core function that simulate for the currently implemented prior families and a couple of functions related to what we call the Central Normal distribution (the normal distribution between a lower and an upper bound). It also includes a discussion of a set of functions that are used in order to utilize the enveloping functions needed to get iid samples for some of the posterior distributions. That material is rather technical; the likelihood-subgradient foundation is given by (Nygren and Nygren 2006), with estimation architecture summarized in (Nygren 2025a) and envelope details in (Nygren 2025b).
| Function Type | Functions | Vignettes |
|---|---|---|
| Prior Setup function | Prior_Setup | Chapter 01 — Getting Started with glmbayes |
| Main Model Interfaces (GLM / LM) | glmb | Chapter 01 — Getting Started with glmbayes |
| Methods | print.glmb | Chapter 01 — Getting Started with glmbayes |
| pfamilies | dNormal | Chapter 02 — Estimating Bayesian Linear Models |
| Main Model Interfaces (GLM / LM) | lmb | Chapter 02 — Estimating Bayesian Linear Models |
| Methods | summary.lmb | Chapter 02 — Estimating Bayesian Linear Models |
| Prior Setup function | Prior_Setup | Chapter 03 — Tailoring Priors - Leveraging the Prior_Setup Function; Chapter A12 — Technical Derivations for Priors Returned by Prior_Setup |
| Methods | print.PriorSetup | Chapter 03 - Tailoring Priors - Leveraging the Prior_Setup Function |
| Methods | predict.glmb, residuals.glmb, extractAIC.glmb | Chapter 04 — Predictions, Deviance Residuals, Model Statistics |
| Main Model Interfaces (GLM / LM) | glmb | Chapter 06 — Estimating Bayesian Generalized Linear Models |
| Methods | summary.glmb | Chapter 06 — Estimating Bayesian Generalized Linear Models |
| Main Model Interfaces (GLM / LM) | glmb | Chapter 07 — Models for the Binomial Family |
| Main Model Interfaces (GLM / LM) | glmb | Chapter 08 — Models for the Poisson Family |
| Main Model Interfaces (GLM / LM) | glmb | Chapter 09 — Models for the Gamma Family |
| Lower-Level Interfaces (GLM / LM) | rlmb | Chapter 13 — Hierarchical Linear Models |
| Lower-Level Interfaces (GLM / LM) | rglmb | Chapter 14 — Hierarchical Generalized Linear Models |
| Model-Specific Simulation Functions | rNormal_reg, rGamma_reg, rNormal_Gamma_reg, rindependent_norm_gamma_reg | Chapter A02: Overview of Estimation Procedures |
| pfamilies | dGamma, dNormal_Gamma, dIndependent_Normal_Gamma | Chapter 11 — Estimating Models with Unknown Dispersion Parameters |
| Methods | Various method functions for glmbayes | Chapter A03 — Methods Available in glmbayes |
| Envelope Related Functions | Various Functions | Chapter A08 — Overview of Envelope Related Functions |
| Parallel Simulation Related Functions | Various Functions | Chapter A09: Parallel Sampling Implementation using RcppParallel |
| Accelerated EnvelopedBuild Related Functions | Various Functions | Chapter A10: Accelerated EnvelopeBuild Implementation using OpenCL |
| Utilities | add_to_libpath_linux, add_to_path, add_to_path_linux, add_to_path_windows | Chapter 00 — Introduction Chapter 01 — Getting Started |
| Datasets | AMI, carinsca, Cleveland | Chapter 01 — Getting Started |
The package curently requires the MASS package so it also needs to be installed and gets loaded as part of the loading of the glmbayes package (see below).
To access information for the package outside of the information contained in these vignettes, we suggest users review three sets of help pages. A call to help(glmbayes) yields a page with a detailed discussion of the package with an example at the bottom. Calling help(package=“glmbayes”) brings the user to a list of Help pages for specific functions as well as links to a general description file, code demos, and these vignettes. Part of the objective of this vignette is to provide more organized layout of the functions than the general list of help pages provided on the documentation page.
A call to demo(package=“glmbayes”), finally, brings up a list of available demos for the package. A useful start for users might be to go through the specific demos as they are set up to illustrate the functionality of the package.
As noted above, the lmb and glmb functions are the two core functions in this package. More detailed side-by-side comparisons of the use of methods for these functions and their classical cousins are provided in the next two vignettes. Here we provide a brief overview of supporting functions available to facilitate the use of the core functions. In the first subsection, we show how the core functions are called (and how this compares to calls for the classical functions). We then give a brief discussion of how priors are specified and some related functions designed to assist with that task as well as with the task of viewiw prior specifications after the modeling is completed.
The code snippets below illustrates how model specification for these models can be done by simply providind an additional argument, a pfamily, that provides information on the type of prior distribution that has been selected and the parameter values that are desired. A separate vignette is provided to give guidance to users on how to select appropriate prior parameters (and some standardization steps that may be useful as part of that).
[NEED TO MODIFY Prior_Setup PRINT FUNCTION AND HOW IT HANDLES CASE WHERE X IS MISSING] [MAY ALSO WANT TO HAVE IT INITIALIZE SHAPE AND RATE PARAMETERS]
Calling lm vs. calling lmb
Here is the call for the lmb function and how it compares to the call to the classical lm function. The recommended prior here is the dNormal_Gamma pfamily which requires four parameters (2 for he Normal component, and 2 for the Gamma component).
# Annette Dobson (1990) "An Introduction to Generalized Linear Models".
## Page 9: Plant Weight Data.
ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
group <- gl(2, 10, 20, labels = c("Ctl","Trt"))
weight <- c(ctl, trt)
##Classical model-call
lm.D9 <- lm(weight ~ group)
# Bayesian Prior Setup and Call-Prior Should Generally be Customized
p_info=Prior_Setup(weight~group, family=gaussian())
#> Using default pwt = 0.01 (low-d default).
mu1=p_info$mu
Sigma1=p_info$Sigma
#lmb.D9 <- lmb(weight ~ group,pfamily=dNormal_Gamma(mu1,Sigma1,shape=4,rate=0.1))Calling glm vs. calling glmb
The call for the glmb function is quite similar and is here compared to the corresponding call for the glm function. For the binomial and Poisson families, the recommended prior here is the dNormal pfamily which requires 2 parameters (a prior mean vector and a prior Variance-Covariance matrix).
## Dobson (1990) Page 93: Randomized Controlled Trial :
counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
##Classical model-call
glm.D93 <- glm(counts ~ outcome + treatment, family = poisson())
# Bayesian Prior Setup and Call-Prior Should Generally be Customized
p_info2=Prior_Setup(counts ~ outcome + treatment,family=poisson())
#> Using default pwt = 0.01 (low-d default).
mu2=p_info2$mu
Sigma2=p_info2$Sigma
glmb.D93<-glmb(counts ~ outcome + treatment, family = poisson(),pfamily=dNormal(mu=mu2,Sigma=Sigma2))The pfamily arguments described above are actually functions that provide needed information to the lmb and glmb functions. Since they are are functions, they can also be called directly, as seen below, where they return lists with items related to the prior (see documentation of these pfamily functions for details). These functions are also discussed in more detail in a latter chapter.
While the current package only provides 3 implemented pfamiles, this is actually not a severe limitations as users easily can write their own prior families and pass those to the lmb and glmb functions. One of our exaxmple includes an illustration of this and this approach is discussed more extensively in one of our other vignettes. Below, we simply illustrate how the functions can be called directly.
NOTE: IMPLEMENT THE EXAMPLE DISCUSSED ABOVE AND WRITE A SEPARATE VIGNETTE TO ACCOMPANY THE EXAMPLE.
Calling the dNormal prior family
This prior is typically used as a prior for the Poisson and Binomial families when calling either the glmb or rglm functions. Calling it directly provides a list with, among other items, the prior specification. the documentation for the pfamilies provides additional information on what the returned list contains and what required and optional arguments are.
*Calling the dNormal_Gamma prior family
This prior is typically used as a prior when calling either the lmb or rglmb functions or when using the gaussian() family in the glm and rglmb functions. Calling it directly
*Calling the dGamma prior family
This prior is typically used when giving a prior for the dispersion parameter in the lmb, rlmb, glmb, and rglmb functions (in the latter case for the gaussian() and Gamma() families). Typicall this would be done as part of Block-Gibbs sampling where the regression coefficients are updated in a separate block.
The lm and glm functions come with an extensive set of method functions that can be used to conduct supplemental analysis (typically during the post-modeling phase). Because the lmb and glmb functions are structured to return many of the same items as the classical functions and because they inherit methods from glm and lm, many of the methods available for the classical functions are directly available for the glmb and lmb functions. As some of the output does tend to be different (in particular, coefficients are returned as random draws as opposed to a single maximum likelihood estimate), we have implemented specific methods for the “glmb” and “lmb” classes of objects. The current set of methods for all four classes of objects are displayed below.
Two separate vignettes illustrate how calls to these methods can be used to replicate the output from the classical models. It is worth noting that a smaller subset of the classical methods are not yet implemented (although plans are in place to replicate some of those as well). Those are discussed in the separate vignette’s as well. Here we provide lists of the methods available for classical lm and glm functions (many of which are inherited by lmb and glmb) as well as methods that have been developed specifically for glmb and rglmb.
[NOTE: AS NEEDED, ADD ADDITIONAL METHODS FOR lmb. THE LIST OF METHODS FOR glmb SHOULD BE FAIRLY COMPLETE]
Methods for class=“lm”
methods(class="lm")
#> [1] add1 addterm alias anova boxcox
#> [6] case.names coerce confint cooks.distance deviance
#> [11] dfbeta dfbetas dffits drop1 dropterm
#> [16] dummy.coef effects extractAIC family formula
#> [21] hatvalues influence initialize kappa labels
#> [26] logLik logtrans model.frame model.matrix nobs
#> [31] plot predict print proj qr
#> [36] residuals rstandard rstudent show simulate
#> [41] slotsFromS3 summary variable.names vcov
#> see '?methods' for accessing help and source codeMethods for class=“glm”
methods(class="glm")
#> [1] add1 addterm anova coerce confint
#> [6] cooks.distance deviance dfbetas dffits drop1
#> [11] dropterm effects extractAIC family formula
#> [16] gamma.shape influence initialize logLik model.frame
#> [21] nobs predict print profile residuals
#> [26] rstandard rstudent show sigma slotsFromS3
#> [31] summary vcov weights
#> see '?methods' for accessing help and source codeMethods for class=“glmb”
While most of the methods for the glm class (when it has its own method) or the lm class (when it does not) work properly for objects of class glmb, some methods do not. For that reason, we have implemented the below methods specifically for the glmb class of objects. Future enhancements to this package may enhance these methods. There are also additional methods for lm and/or glm that do not currently have functioning methods for the glmb class. This includes the add1, drop1, and dropterm functions (which should be implementable fairly easily) and the various influence.measure methods (influence, rstandard, rstudent, dfbeta, dfbetas, cooks.distance, and hatvalues) which may require more work. Plans do exist to add the former and perhaps the latter (which likely would look at influence on posterior modes and not on posterior means).
methods(class="glmb")
#> [1] anova case.names confint cooks.distance dfbetas
#> [6] dummy.coef extractAIC influence logLik plot
#> [11] predict print residuals rstandard rstudent
#> [16] simulate summary variable.names vcov
#> see '?methods' for accessing help and source codeNOTE: CHECK WHICH METHODS FOR lmb that are inherited and functions vs. those that may require additional development.
Because the functions lmb and glmb are set up to mirror their corresponding classical cousins, they include a great deal of overhead for pre and post processing that are unsuitable if the functions need to be called repetitively (say during Block-Gibbs sampling). To faciliate Block-Gibbs sampling and other simulation, we provide more minimalistic interfaces to the simulation procedures in the form of the corresponding functions rlmb and rglmb.
These functions are called internally by lmb and glmb in much the same way that lm.fit and glm.fit are called by the lm and glm functions so in a way they are Bayesian versions of those functions. We choose the r prefix for these functions to symbolize that they also represent random draws from probability distributions in much the same way that rnorm, rgamma, and other functions in the stats package do.
Instead of passing a formula to these functions, the user instead passes the dependent and independent variables as a dependent variable vector y and as an independent variable design matrix x respectively. A useful approach that can help the user specify the design matrix involves sequential calls to model.frame and model.matrix as seen below.
NOTE 1: SHORTEN THIS CODE NOTE 2: The SUMMARY FUNCTION IS CURRENTLY NOT WORKING FOR rlmb
Calling the rlmb function
## Annette Dobson (1990) "An Introduction to Generalized Linear Models".
## Page 9: Plant Weight Data.
ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
group <- gl(2, 10, 20, labels = c("Ctl","Trt"))
weight <- c(ctl, trt)
## Classical call
lm.D9 <- lm(weight ~ group,x=TRUE,y=TRUE)
lm_summary=summary(lm.D9)
## Bayesian Call
n<-10000
y<-lm.D9$y
#x<-as.matrix(lm.D9$x)
### Set old regression and dispersion coefficients
b_old=lm.D9$coefficients
v_old=lm_summary$sigma^2
#### Set up for rGamma_reg
n0=0.1
shape=n0/2
rate=shape*v_old
rate/(shape-1)
rate/shape
#a1<-shape+n1/2
#b1<-rate+sum(SS)/2
#out<-1/rgamma(n,shape=a1,rate=b1)
## v0=sum(SS)/n1 ~ (2*rate+sum(SS))/(2*shape+n1)=[(2*rate+sum(SS))/2]/((2*shape+n1)/2)
n1=length(y)
SS=v_old*n1
n1 # This is equal to 20
n0=2 # Prior observations
v_prior=v_old # Prior point estimate for variance (the mean of (1/dispersion=1/v_prior))
wt0=(n0/n1)
## set shape=0.01*(n1/2)
## set rate= 0.01*SS/2
shape=wt0*(n1/2) ### Shape is prior observations /2
rate=shape*v_prior ### rate is essentiall prior SS - V in rmultireg should be this
rate/shape ## Should match v_prior (currently also v_old)
mu<-c(0,0)
mu=b_old ### For testing purposes, set prior=b_old
P<-0.1*diag(2)
# Bayesian Prior Setup and Call-Prior Should Generally be Customized
p_info=Prior_Setup(weight~group)
#> Using default pwt = 0.01 (low-d default).
x=p_info$x
outtemp4=rlmb(n=1000,y=y,x=x,pfamily=dNormal_Gamma(mu=mu,Sigma_0=solve(P),shape=shape,rate=rate))
#summary(outtemp4)Calling the rglmb function
data(menarche,package="MASS")
Age2=menarche$Age-13
y=menarche$Menarche/menarche$Total
wt=menarche$Total
## Use model.frame and model.matrix to derive x
#mf=model.frame(formula)
#x=model.matrix(formula,mf)
x<-matrix(as.numeric(1.0),nrow=length(Age2),ncol=2)
x[,2]=Age2
## Modify Prior_Setup so it can take a model matrix as well as an model object
mu<-matrix(as.numeric(0.0),nrow=2,ncol=1)
V1<-1*diag(as.numeric(2.0))
mu[2,1]=(log(0.9/0.1)-log(0.5/0.5))/3
# 2 standard deviations for prior estimate at age 13 between 0.1 and 0.9
## Specifies uncertainty around the point estimates
V1[1,1]<-((log(0.9/0.1)-log(0.5/0.5))/2)^2
V1[2,2]=(3*mu[2,1]/2)^2 # Allows slope to be up to 3 times as large as point estimate
out<-rglmb(n = 1000, y=y, x=x, pfamily=dNormal(mu=mu,Sigma=V1), weights = wt,
family = binomial(logit))
summary(out)
#> Call
#> rglmb(n = 1000, y = y, x = x, family = binomial(logit), pfamily = dNormal(mu = mu,
#> Sigma = V1), weights = wt)
#>
#> Expected Deviance Residuals:
#> Min 1Q Median 3Q Max
#> -2.0502951 -1.0082616 -0.4821518 0.7618818 1.3543865
#>
#> Prior Estimates with Standard Deviations
#>
#> Prior Mean Prior.sd Max Like. Like.sd
#> V1 0.00000 1.09861 -0.01081 0.063
#> V2 0.73241 1.09861 1.63197 0.059
#>
#> Bayesian Estimates Based on 1000 iid draws
#>
#> Post.Mode Post.Mean Post.Sd MC Error Pr(tail)
#> V1 -0.010696 -0.009177 0.065664 0.002 0.446553
#> V2 1.629390 1.630505 0.059787 0.002 0.000999 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Distribution Percentiles
#>
#> 1.0% 2.5% 5.0% Median 95.0% 97.5% 99.0%
#> V1 -0.167143 -0.140983 -0.114516 -0.008605 0.097706 0.115446 0.131
#> V2 1.495827 1.516834 1.535443 1.629163 1.728268 1.745358 1.766
#>
#> Effective Number of Parameters: 2.111456
#> Expected Residual Deviance: 28.81616
#> DIC: 114.9794
#>
#> Expected Mean dispersion: 1
#> Sq.root of Expected Mean dispersion: 1
#>
#> Mean Likelihood Subgradient Candidates Per iid sample: 1.293To allow for the output from the rglmb and rlmb functions to benefit from the wide range of methods available for their lmb and glmb cousins, we implement a few additional methods for these functions that enable most of the methods to also work for these objects post modeling.
methods(class="rglmb")
#> [1] deviance extractAIC print residuals summary
#> see '?methods' for accessing help and source codeIn this section, we cover additional functions included in the package that most users likely would not call directly but which provide visibility to the simulation methods utilized as part of the package. These functions can be broadly broken down into four distinct categories:.
One of the key items returned by the pfamily functions is an assigned simulation function to be called during the estimation process (one for each pfamily). These functions can also be called directly (though we don’t recommend it). The Function documentation for each of these underlying functions contains information on the type of algorihm used to generate the sample. We refer the reader to the documentation for each of these functions but note the following relationship:
For estimation of Bayesian generalized linear models with Normal Priors but non-gaussian families, accept-reject procedures are used where candidates are generated from mixtures of restricted multivariate normal distributions. Two functions for univariate restricted normal distributions (rnorm_ct and pnorm_ct) play a key role during this process and can be viewed as generalizations of the rnorm and pnorm functions.
Finally, the package contains a small set of functions that are used by various method functions. While these are currently exported, we may in the future choose not to export them as they are unlikely to be useful for users of the package as standalone functions.