Binomial generalized linear models (GLMs) are used when the response represents binary outcomes (success/failure) or proportions (successes out of trials). They are among the most widely used GLMs in applied statistics, powering models for dose-response studies, disease incidence, conversion rates, and many other binary-outcome applications.
Binomial regression is a standard generalized linear model (Nelder and Wedderburn 1972; McCullagh and Nelder 1989; Agresti 2015).
In classical statistics, these models are fit with glm() and family = binomial(). In glmbayes, the Bayesian analogues are glmb() and rglmb().
This chapter introduces:

- the binomial likelihood and its exponential-family structure,
- the logit, probit, and cloglog link functions,
- Bayesian fitting with glmb() under normal priors,
- model comparison via the Deviance Information Criterion (DIC).
We build on the foundations from Chapters 05 and 06, especially the role of link functions, log‑concavity, and prior specification.
Binomial data arise in several equivalent representations:

- a binary 0/1 response for each observational unit,
- a two-column matrix of successes and failures, e.g. cbind(Menarche, Total - Menarche),
- a proportion of successes with the number of trials supplied as weights.
In all cases, the underlying sampling model is
\[ Y_i \sim \text{Binomial}(n_i, \mu_i), \qquad 0 < \mu_i < 1, \]
where:

- \(n_i\) is the number of trials,
- \(\mu_i\) is the per-trial success probability.
A binomial GLM links the mean \(\mu_i\) to a linear predictor through
\[ \eta_i = x_i^\top \beta, \qquad \mu_i = g^{-1}(\eta_i), \]
where \(g(\cdot)\) is the chosen link function (logit, probit, cloglog, etc.).
Using weights \(w_i = n_i\) and writing \(y_i\) for the observed proportion of successes, the log-likelihood (up to constants) becomes
\[ \ell(\beta) = \sum_{i=1}^n w_i\Big[ y_i \log(\mu_i) + (1-y_i)\log(1-\mu_i) \Big]. \]
This form is used by both glm() and the Bayesian
functions glmb() and rglmb().
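The weighted form above can be checked numerically against R's dbinom(): with \(w_i = n_i\) and \(y_i\) the observed proportion of successes, it reproduces the full binomial log-likelihood up to the binomial-coefficient term, which is constant in \(\beta\). A base-R sketch with illustrative numbers:

```r
# Illustrative data: k successes out of n trials, with success probs mu
n  <- c(10, 25, 40)          # trials per observation
k  <- c(3, 12, 35)           # observed successes
mu <- c(0.3, 0.5, 0.9)       # fitted probabilities g^{-1}(eta)

w <- n                       # weights = number of trials
y <- k / n                   # observed proportions

# Weighted log-likelihood (up to constants)
ll_weighted <- sum(w * (y * log(mu) + (1 - y) * log(1 - mu)))

# Full binomial log-likelihood; differs only by the constant lchoose term
ll_full <- sum(dbinom(k, n, mu, log = TRUE))
ll_weighted - (ll_full - sum(lchoose(n, k)))  # ~ 0
```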
The binomial likelihood belongs to the exponential family (McCullagh and Nelder 1989; Agresti 2015).
For a model with linear predictor
\[
\eta_i = x_i^\top \beta,
\]
and mean
\[
\mu_i = g^{-1}(\eta_i),
\]
the contribution of observation \(i\) to the log-likelihood can be written as
\[
\ell_i(\beta) = w_i\Big[ y_i \log(\mu_i) + (1-y_i)\log(1-\mu_i) \Big],
\]
where \(w_i\) is the number of trials (or a user-supplied weight). This representation does not require the link to be canonical.
The binomial variance function is \[
V(\mu) = \mu(1-\mu),
\] so that for a count \(Y_i \sim \text{Binomial}(n_i, \mu_i)\) we have \(\mathrm{Var}(Y_i) = n_i\,\mu_i(1-\mu_i)\), regardless of the link function.
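The count-scale variance can be verified directly from the binomial pmf, without simulation (the numbers are illustrative):

```r
# Verify Var(Y) = n * mu * (1 - mu) by summing over the binomial pmf
n  <- 20
mu <- 0.35
ks <- 0:n
m1 <- sum(ks * dbinom(ks, n, mu))            # E[Y] = n * mu
v  <- sum((ks - m1)^2 * dbinom(ks, n, mu))   # Var(Y)
c(mean = m1, variance = v, formula = n * mu * (1 - mu))
```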
The link function determines how the mean response relates to the linear predictor: \[ g(\mu_i) = \eta_i. \]
The binomial() family in R supports several link
functions:
| Link Function | Formula | Notes |
|---|---|---|
| logit (canonical) | \(\eta = \log(\mu/(1-\mu))\) | Most common; canonical link; induces log‑concavity in \(\eta\) |
| probit | \(\eta = \Phi^{-1}(\mu)\) | Based on the standard normal CDF; induces log‑concavity in \(\eta\) |
| cloglog | \(\eta = \log[-\log(1-\mu)]\) | Asymmetric; useful for rare events; induces log‑concavity in \(\eta\) |
| cauchit | \(\eta = \tan[\pi(\mu - 1/2)]\) | Heavy‑tailed; does not preserve log‑concavity in general |
| identity | \(\eta = \mu\) | Must ensure \(0 < \mu < 1\); does not preserve log‑concavity in general |
In this chapter we focus on the three most commonly used links: logit, probit, and cloglog.
Each link produces a different transformation \(g^{-1}(\eta)\) and therefore a different expression for the log‑likelihood and its derivatives. For the logit, probit, and cloglog links, the resulting log‑likelihood is known to be log‑concave in the linear predictor \(\eta\), which is crucial for stable envelope construction and accept–reject sampling in glmbayes.
The explicit formulas for each link are provided at the beginning of their respective sections.
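The inverse links from the table can be written directly in base R and checked against the binomial() family object, which is what glm() uses internally:

```r
eta <- seq(-3, 3, by = 0.5)

inv_logit   <- function(eta) 1 / (1 + exp(-eta))   # = plogis(eta)
inv_probit  <- function(eta) pnorm(eta)            # Phi(eta)
inv_cloglog <- function(eta) 1 - exp(-exp(eta))

stopifnot(
  all.equal(inv_logit(eta),   binomial(link = "logit")$linkinv(eta)),
  all.equal(inv_probit(eta),  binomial(link = "probit")$linkinv(eta)),
  all.equal(inv_cloglog(eta), binomial(link = "cloglog")$linkinv(eta))
)
```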
The general Bayesian call is:
glmb(
formula,
family = binomial(link = "logit" | "probit" | "cloglog"),
pfamily = dNormal(mu = mu, Sigma = V),
data = ...
)

As in earlier chapters, the recommended workflow is to fit the classical model with glm(), use the result to guide a normal prior, and then call glmb() for iid posterior draws.
This produces a fitted object containing iid posterior draws together with classical-style summaries.
You may override these defaults for more informative priors (see Chapter 10).
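The prior objects mu and V used in the calls in this chapter are not shown; the sketch below reconstructs a plausible version of the workflow. The zero prior mean and dispersed covariance are illustrative assumptions, not the package defaults, and the glmb() call is commented out since it requires the glmbayes package (the data come from MASS):

```r
# Recommended workflow (sketch; prior settings here are illustrative)
data(menarche, package = "MASS")
menarche$Age2 <- menarche$Age - 13

# 1. Classical fit, used as a reference point
fit.glm <- glm(cbind(Menarche, Total - Menarche) ~ Age2,
               family = binomial(link = "logit"), data = menarche)

# 2. A weakly informative normal prior (assumed values)
p  <- length(coef(fit.glm))
mu <- rep(0, p)        # prior mean
V  <- diag(10, p)      # prior covariance

# 3. Bayesian fit with iid posterior draws (requires glmbayes)
# glmb.logit <- glmb(cbind(Menarche, Total - Menarche) ~ Age2,
#                    family = binomial(link = "logit"),
#                    pfamily = dNormal(mu = mu, Sigma = V),
#                    data = menarche, n = 1000)
coef(fit.glm)
```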
For the logit link: \[ \eta_i = \log\!\left(\frac{\mu_i}{1-\mu_i}\right), \qquad \mu_i = \frac{1}{1+e^{-\eta_i}}. \]
\[ \ell_{\text{logit}}(\beta) = \sum_{i=1}^n w_i\Big[ y_i\,\eta_i - \log(1+e^{\eta_i}) \Big]. \]
\[ \log p(\beta) = -\tfrac12(\beta-\mu_0)^\top \Sigma_0^{-1}(\beta-\mu_0) + \text{const}. \]
\[ \log p(\beta \mid y) = \sum_{i=1}^n w_i\Big[ y_i\,\eta_i - \log(1+e^{\eta_i}) \Big] - \tfrac12(\beta-\mu_0)^\top \Sigma_0^{-1}(\beta-\mu_0) + \text{const}. \]
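The simplified logit form follows by substituting \(\mu = 1/(1+e^{-\eta})\) into the general binomial form; the algebra can be verified numerically with illustrative values:

```r
eta <- c(-2, -0.5, 0, 1.5)
y   <- c(0.1, 0.4, 0.5, 0.9)   # observed proportions
mu  <- 1 / (1 + exp(-eta))

# General binomial form vs the simplified logit form
general    <- y * log(mu) + (1 - y) * log(1 - mu)
simplified <- y * eta - log(1 + exp(eta))
max(abs(general - simplified))  # ~ 0
```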
The logit link is the canonical choice for binomial GLMs (McCullagh and Nelder 1989; Agresti 2015):
\[ \eta = \log\left(\frac{\mu}{1-\mu}\right) \]
It is symmetric and interpretable in terms of log‑odds.
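Both properties can be illustrated in base R. The slope value 1.62 below is taken from the posterior summary shown later in this section and is used only to illustrate the odds-ratio reading of a logit coefficient:

```r
# Symmetry of the logistic curve: g^{-1}(-eta) = 1 - g^{-1}(eta)
eta <- 1.3
stopifnot(abs(plogis(-eta) - (1 - plogis(eta))) < 1e-12)

# Log-odds interpretation: a slope of ~1.62 on Age2 multiplies the
# odds of menarche by exp(1.62), roughly a five-fold increase per year
exp(1.62)
```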
We use the Menarche dataset introduced in Chapter 06:
data(menarche,package="MASS")
Age2 <- menarche$Age - 13
Menarche_Model_Data <- data.frame(
Age = menarche$Age,
Total = menarche$Total,
Menarche = menarche$Menarche,
Age2 = Age2
)
Menarche_Model_Data
#> Age Total Menarche Age2
#> 1 9.21 376 0 -3.79
#> 2 10.21 200 0 -2.79
#> 3 10.58 93 0 -2.42
#> 4 10.83 120 2 -2.17
#> 5 11.08 90 2 -1.92
#> 6 11.33 88 5 -1.67
#> 7 11.58 105 10 -1.42
#> 8 11.83 111 17 -1.17
#> 9 12.08 100 16 -0.92
#> 10 12.33 93 29 -0.67
#> 11 12.58 100 39 -0.42
#> 12 12.83 108 51 -0.17
#> 13 13.08 99 47 0.08
#> 14 13.33 106 67 0.33
#> 15 13.58 105 81 0.58
#> 16 13.83 117 88 0.83
#> 17 14.08 98 79 1.08
#> 18 14.33 97 90 1.33
#> 19 14.58 120 113 1.58
#> 20 14.83 102 95 1.83
#> 21 15.08 122 117 2.08
#> 22 15.33 111 107 2.33
#> 23 15.58 94 92 2.58
#> 24 15.83 114 112 2.83
#> 25 17.58 1049 1049 4.58
summary(glmb.logit)
#> Call
#> glmb(formula = cbind(Menarche, Total - Menarche) ~ Age2, family = binomial(link = "logit"),
#> pfamily = dNormal(mu = mu, Sigma = V), n = 1000, data = Menarche_Model_Data)
#>
#> Expected Residuals:
#> Min 1Q Median 3Q Max
#> -2.0816329 -1.0288277 -0.4470266 0.7146186 1.3219517
#>
#> Prior and Maximum Likelihood Estimates with Standard Deviations
#>
#> Null Mode Prior Mean Prior.sd Max Like. Like.sd
#> (Intercept) 0.36015 0.36015 0.62794 -0.01081 0.063
#> Age2 0.00000 0.00000 0.58658 1.63197 0.059
#>
#> Bayesian Estimates Based on 1000 iid draws
#>
#> Post.Mode Post.Mean Post.Sd MC Error Pr(Null_tail) SE(tail)
#> (Intercept) -0.007133 -0.008952 0.060739 0.001921 0.000000 0
#> Age2 1.615870 1.619447 0.057918 0.001832 0.000000 0
#> Pr(Prior_tail)
#> (Intercept) <2e-16 ***
#> Age2 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Directional Tail Summaries:
#>
#> Metric vs Null vs Prior
#> Mahalanobis Distance 28.4625 28.4625
#> Tail Probability 0.0000 0.0000
#> [Tail probabilities are P(delta^T * Z <= 0) in whitened space]
#>
#>
#> Distribution Percentiles
#>
#> 1.0% 2.5% 5.0% Median 95.0% 97.5% 99.0%
#> (Intercept) -0.149079 -0.126726 -0.112577 -0.007696 0.085953 0.103664 0.126
#> Age2 1.497308 1.510325 1.524856 1.617836 1.716736 1.735859 1.758
#>
#> Effective Number of Parameters: 1.915889
#> Expected Residual Deviance: 28.66534
#> DIC: 114.633
#>
#> Expected Mean dispersion: 1
#> Sq.root of Expected Mean dispersion: 1
#>
#> Mean Likelihood Subgradient Candidates Per iid sample: 1.274

This produces a detailed posterior summary for the logit model.
The slope parameter typically shows strong evidence of increasing probability of menarche with age.
The probit link is a common alternative to the logit when a latent normal formulation is convenient (McCullagh and Nelder 1989; Agresti 2015).
For the probit link: \[ \eta_i = \Phi^{-1}(\mu_i), \qquad \mu_i = \Phi(\eta_i), \] where \(\Phi\) is the standard normal CDF.
\[ \ell_{\text{probit}}(\beta) = \sum_{i=1}^n w_i\Big[ y_i \log\Phi(\eta_i) + (1-y_i)\log\big(1-\Phi(\eta_i)\big) \Big]. \]
\[ \log p(\beta) = -\tfrac12(\beta-\mu_0)^\top \Sigma_0^{-1}(\beta-\mu_0) + \text{const}. \]
\[ \log p(\beta \mid y) = \sum_{i=1}^n w_i\Big[ y_i \log\Phi(\eta_i) + (1-y_i)\log\big(1-\Phi(\eta_i)\big) \Big] - \tfrac12(\beta-\mu_0)^\top \Sigma_0^{-1}(\beta-\mu_0) + \text{const}. \]
It is similar to the logit link but has slightly thinner tails and a more Gaussian interpretation.
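The similarity of the two links can be quantified: a classic approximation matches the logistic CDF to \(\Phi\) with a scale factor of about 1.70, and the two curves then differ by less than 0.01 everywhere. A base-R check:

```r
# Logistic approximation to the normal CDF with scale factor 1.702
x <- seq(-6, 6, by = 0.01)
max(abs(plogis(1.702 * x) - pnorm(x)))  # < 0.01
```

This is also why probit slopes are roughly logit slopes divided by 1.7 or so, as seen when comparing the two fits in this chapter.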
summary(glmb.probit)
#> Call
#> glmb(formula = cbind(Menarche, Total - Menarche) ~ Age2, family = binomial(link = "probit"),
#> pfamily = dNormal(mu = mu2, Sigma = V2), n = 1000, data = Menarche_Model_Data)
#>
#> Expected Residuals:
#> Min 1Q Median 3Q Max
#> -1.6317919 -0.9534072 -0.4880500 0.4469835 1.7716036
#>
#> Prior and Maximum Likelihood Estimates with Standard Deviations
#>
#> Null Mode Prior Mean Prior.sd Max Like. Like.sd
#> (Intercept) 0.22517 0.22517 0.34838 -0.01724 0.035
#> Age2 0.00000 0.00000 0.29405 0.90782 0.030
#>
#> Bayesian Estimates Based on 1000 iid draws
#>
#> Post.Mode Post.Mean Post.Sd MC Error Pr(Null_tail) SE(tail)
#> (Intercept) -0.0146442 -0.0152602 0.0353322 0.0011173 0.0000000 0
#> Age2 0.8988270 0.9005246 0.0294212 0.0009304 0.0000000 0
#> Pr(Prior_tail)
#> (Intercept) <2e-16 ***
#> Age2 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Directional Tail Summaries:
#>
#> Metric vs Null vs Prior
#> Mahalanobis Distance 32.9623 32.9623
#> Tail Probability 0.0000 0.0000
#> [Tail probabilities are P(delta^T * Z <= 0) in whitened space]
#>
#>
#> Distribution Percentiles
#>
#> 1.0% 2.5% 5.0% Median 95.0% 97.5% 99.0%
#> (Intercept) -0.09072 -0.08365 -0.07379 -0.01621 0.04341 0.05405 0.075
#> Age2 0.83584 0.84476 0.85239 0.90077 0.95045 0.95751 0.967
#>
#> Effective Number of Parameters: 2.02748
#> Expected Residual Deviance: 24.97825
#> DIC: 111.0575
#>
#> Expected Mean dispersion: 1
#> Sq.root of Expected Mean dispersion: 1
#>
#> Mean Likelihood Subgradient Candidates Per iid sample: 1.244

The probit model typically yields smaller coefficients than the logit model (the two links put the linear predictor on different scales) but very similar fitted probabilities.
The DIC often favors probit slightly for smooth S‑shaped curves (as seen in Chapter 04).
The complementary log–log link is often used for asymmetric response curves and hazard‑type interpretations (McCullagh and Nelder 1989; Agresti 2015).
For the cloglog link: \[ \eta_i = \log\!\big[-\log(1-\mu_i)\big], \qquad \mu_i = 1 - \exp\!\big(-e^{\eta_i}\big). \]
\[ \ell_{\text{cloglog}}(\beta) = \sum_{i=1}^n w_i\Big[ y_i \log\!\big(1 - e^{-e^{\eta_i}}\big) + (1-y_i)\big(-e^{\eta_i}\big) \Big]. \]
\[ \log p(\beta) = -\tfrac12(\beta-\mu_0)^\top \Sigma_0^{-1}(\beta-\mu_0) + \text{const}. \]
\[ \log p(\beta \mid y) = \sum_{i=1}^n w_i\Big[ y_i \log\!\big(1 - e^{-e^{\eta_i}}\big) + (1-y_i)\big(-e^{\eta_i}\big) \Big] - \tfrac12(\beta-\mu_0)^\top \Sigma_0^{-1}(\beta-\mu_0) + \text{const}. \]
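The inverse-link formula and its asymmetry are easy to check numerically in base R:

```r
inv_cloglog <- function(eta) 1 - exp(-exp(eta))

# Asymmetric: mu approaches 1 faster than it approaches 0
inv_cloglog(c(-2, 2))                    # 0.127 vs 0.999

# Unlike logit/probit, g^{-1}(-eta) != 1 - g^{-1}(eta)
c(inv_cloglog(-1), 1 - inv_cloglog(1))   # 0.308 vs 0.066
```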
The cloglog link is asymmetric: \(\mu\) approaches 1 much faster than it approaches 0 as \(\eta\) moves away from zero. It is useful when events are rare or when the model has a hazard-type (discrete-time survival) interpretation.
glmb.cloglog <- glmb(
cbind(Menarche, Total - Menarche) ~ Age2,
family = binomial(link = "cloglog"),
pfamily = dNormal(mu = mu3, Sigma = V3),
data = Menarche_Model_Data,
n = 1000
)
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(glmb.cloglog)
#> Call
#> glmb(formula = cbind(Menarche, Total - Menarche) ~ Age2, family = binomial(link = "cloglog"),
#> pfamily = dNormal(mu = mu3, Sigma = V3), n = 1000, data = Menarche_Model_Data)
#>
#> Expected Residuals:
#> Min 1Q Median 3Q Max
#> -3.983059 -2.547182 -1.115548 1.196190 3.398801
#>
#> Prior and Maximum Likelihood Estimates with Standard Deviations
#>
#> Null Mode Prior Mean Prior.sd Max Like. Like.sd
#> (Intercept) -0.1173 -0.1173 0.4094 -0.5960 0.041
#> Age2 0.0000 0.0000 0.3117 0.9530 0.031
#>
#> Bayesian Estimates Based on 1000 iid draws
#>
#> Post.Mode Post.Mean Post.Sd MC Error Pr(Null_tail) SE(tail)
#> (Intercept) -0.5910908 -0.5924754 0.0428354 0.0013546 0.0000000 0
#> Age2 0.9451221 0.9455164 0.0293267 0.0009274 0.0000000 0
#> Pr(Prior_tail)
#> (Intercept) <2e-16 ***
#> Age2 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Directional Tail Summaries:
#>
#> Metric vs Null vs Prior
#> Mahalanobis Distance 33.1169 33.1169
#> Tail Probability 0.0000 0.0000
#> [Tail probabilities are P(delta^T * Z <= 0) in whitened space]
#>
#>
#> Distribution Percentiles
#>
#> 1.0% 2.5% 5.0% Median 95.0% 97.5% 99.0%
#> (Intercept) -0.6881 -0.6772 -0.6613 -0.5937 -0.5207 -0.5114 -0.495
#> Age2 0.8746 0.8850 0.8961 0.9465 0.9919 1.0031 1.017
#>
#> Effective Number of Parameters: 2.016892
#> Expected Residual Deviance: 120.9082
#> DIC: 206.9769
#>
#> Expected Mean dispersion: 1
#> Sq.root of Expected Mean dispersion: 1
#>
#> Mean Likelihood Subgradient Candidates Per iid sample: 1.252

The cloglog model often fits poorly for symmetric S-shaped curves (as shown in Chapter 04), but it is valuable for rare-event data and hazard-type models.
The Deviance Information Criterion (DIC) provides a Bayesian analogue to AIC (Spiegelhalter et al. 2002):
DIC_comp<-rbind(
extractAIC(glmb.logit),
extractAIC(glmb.probit),
extractAIC(glmb.cloglog))
rownames(DIC_comp)<-c("logit","probit","cloglog")
DIC_comp
#> pD DIC
#> logit 1.915889 114.6330
#> probit 2.027480 111.0575
#> cloglog 2.016892 206.9769

Typical patterns: the logit and probit models achieve similar DIC values (here probit is slightly better), while the cloglog model is heavily penalized for its poor fit to this symmetric response curve.
The effective number of parameters (pD) is part of the same framework (Spiegelhalter et al. 2002) and helps diagnose model complexity.
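The quantities reported above can be computed from posterior deviance draws: pD is the mean posterior deviance minus the deviance at the posterior mean, and DIC adds pD to the mean deviance. A toy sketch for a normal-mean model with known variance (all names and values here are illustrative, not glmbayes internals):

```r
set.seed(1)
# Toy data and posterior draws for a normal mean with known sd = 1;
# under a flat prior the posterior is approximately N(ybar, 1/n)
yobs  <- rnorm(50, mean = 2)
draws <- rnorm(4000, mean = mean(yobs), sd = 1 / sqrt(length(yobs)))

dev <- function(theta) -2 * sum(dnorm(yobs, theta, 1, log = TRUE))
D   <- sapply(draws, dev)

pD  <- mean(D) - dev(mean(draws))   # effective number of parameters (~1 here)
DIC <- mean(D) + pD
c(pD = pD, DIC = DIC)
```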
Binomial GLMs are a core component of the glmbayes package. Their log‑concave likelihoods make them ideal for the envelope‑based accept‑reject sampler, and the familiar link functions allow analysts to choose models that match the scientific context (McCullagh and Nelder 1989; Gelman et al. 2013).
This chapter demonstrated:
In the next chapter, we extend these ideas to Poisson models, which share many structural similarities but introduce new considerations for count data.