Chapter 08: Models for the Poisson Family

Kjell Nygren

2026-04-30

    library(glmbayes)

1. Introductory Discussion

Poisson regression is the canonical GLM for modeling count data (McCullagh and Nelder 1989).
The Poisson likelihood is log‑concave under the log link, making it an ideal setting for envelope‑based Bayesian sampling (Nygren and Nygren 2006).
This chapter mirrors the structure used for the Binomial vignette: we begin with the classical glm() fit, then introduce the Bayesian glmb() version, discuss prior specification, and compare posterior summaries to their classical counterparts.

We use the well‑known randomized controlled trial dataset of Dobson (1990), which appears in many GLM textbooks and is included in the base R documentation for glm().
The dataset contains counts of subjects classified by treatment group and outcome category.

2. Poisson Likelihood and Model Structure

Poisson regression models count data of the form

\[ Y_i \in \{0,1,2,\ldots\}, \qquad Y_i \sim \text{Poisson}(\mu_i), \]

where \(\mu_i > 0\) is the expected count.
Observation weights \(w_i\) (e.g., exposure times) enter the likelihood through

\[ Y_i \sim \text{Poisson}(w_i \mu_i). \]

The mean is linked to a linear predictor via

\[ \eta_i = x_i^\top \beta, \qquad \mu_i = g^{-1}(\eta_i). \]

2.1 Weighted Poisson Log‑Likelihood

Up to additive constants, the weighted log‑likelihood is

\[ \ell(\beta) = \sum_{i=1}^n \left[ w_i y_i \log(\mu_i) - w_i \mu_i \right], \]

where, under the exposure interpretation, \(y_i\) denotes the count per unit exposure, so that \(w_i y_i\) is the raw observed count.

Under the log link, the mean is

\[ \mu_i = e^{\eta_i}, \]

and the log‑likelihood simplifies to

\[ \ell(\beta) = \sum_{i=1}^n \left[ w_i y_i \eta_i - w_i e^{\eta_i} \right]. \]

This is the form used by both glm() and the Bayesian functions glmb() and rglmb().
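To make the expression concrete, here is a minimal R sketch of this log‑likelihood; pois_loglik, X, y, and w are illustrative names introduced here, not part of the glmbayes API.

    ## Weighted Poisson log-likelihood under the log link, up to constants.
    ## X: n x p model matrix; y: counts (per unit exposure); w: weights.
    pois_loglik <- function(beta, X, y, w = rep(1, length(y))) {
      eta <- drop(X %*% beta)           # linear predictor eta_i = x_i' beta
      sum(w * y * eta - w * exp(eta))   # sum_i [w_i y_i eta_i - w_i exp(eta_i)]
    }

Maximizing this function over \(\beta\) reproduces the glm() fit; the Bayesian functions combine it with a log‑prior before sampling.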

2.2 Exponential‑Family Representation

The Poisson distribution belongs to the exponential family with canonical parameter

\[ \theta_i = \log(\mu_i), \]

and cumulant function

\[ b(\theta_i) = e^{\theta_i}. \]

The variance follows from the second derivative of the cumulant function:

\[ \mathrm{Var}(Y_i) = b''(\theta_i) = e^{\theta_i} = \mu_i. \]

Under the log link, the linear predictor equals the canonical parameter:

\[ \eta_i = \theta_i. \]

This identity ensures that the Poisson log‑likelihood is globally log‑concave in the linear predictor, which is ideal for envelope construction and accept–reject sampling in glmbayes.
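To see the concavity directly, differentiate the log‑likelihood twice under the log link:

\[ \nabla_\beta^2 \, \ell(\beta) = -\sum_{i=1}^n w_i e^{\eta_i} x_i x_i^\top, \]

which is negative semidefinite for every \(\beta\), since each \(w_i e^{\eta_i} > 0\). The log‑likelihood is therefore globally concave, which underpins the envelope construction used by the sampler.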

2.3 Likelihood, Prior, and Posterior (Normal Prior on \(\beta\))

With the log link, the weighted log‑likelihood is

\[ \ell(\beta) = \sum_{i=1}^n \left[ w_i y_i \eta_i - w_i e^{\eta_i} \right]. \]

A multivariate normal prior is specified as

\[ \log p(\beta) = -\tfrac{1}{2}(\beta - \mu_0)^\top \Sigma_0^{-1} (\beta - \mu_0) + \text{const}. \]

Combining likelihood and prior yields the log‑posterior:

\[ \log p(\beta \mid y) = \sum_{i=1}^n \left[ w_i y_i \eta_i - w_i e^{\eta_i} \right] - \tfrac{1}{2}(\beta - \mu_0)^\top \Sigma_0^{-1} (\beta - \mu_0) + \text{const}. \]

Because both terms are concave in \(\beta\), the posterior is log‑concave, enabling efficient iid sampling via the envelope‑based accept–reject sampler implemented in glmbayes.
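As a hedged sketch, building on the pois_loglik() helper above (log_post, mu0, and Sigma0 are illustrative names, not glmbayes internals), the log‑posterior can be coded and its mode located numerically:

    ## Log-posterior: Poisson log-likelihood plus multivariate normal log-prior.
    log_post <- function(beta, X, y, w, mu0, Sigma0) {
      delta <- beta - mu0
      pois_loglik(beta, X, y, w) -
        0.5 * drop(t(delta) %*% solve(Sigma0) %*% delta)
    }
    ## The posterior mode can be found with a general-purpose optimizer, e.g.:
    ## optim(mu0, function(b) -log_post(b, X, y, w, mu0, Sigma0), method = "BFGS")

Internally, glmb() works with this same log‑posterior when constructing its envelope and drawing iid samples.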

3. Data Setup

We reproduce Dobson’s example exactly.
The data consist of 9 observations: three outcome levels crossed with three treatment groups.

    ## Dobson (1990) Page 93: Randomized Controlled Trial :
    set.seed(333)
    counts <- c(18,17,15,20,10,20,25,13,12)
    outcome <- gl(3,1,9)
    treatment <- gl(3,3)
    print(d.AD <- data.frame(treatment, outcome, counts))
#>   treatment outcome counts
#> 1         1       1     18
#> 2         1       2     17
#> 3         1       3     15
#> 4         2       1     20
#> 5         2       2     10
#> 6         2       3     20
#> 7         3       1     25
#> 8         3       2     13
#> 9         3       3     12

The table shows the observed counts for each treatment–outcome combination.
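It can also be helpful to view the data as a 3 × 3 contingency table; base R's xtabs() does this directly:

    ## Cross-tabulate counts by treatment and outcome
    xtabs(counts ~ treatment + outcome, data = d.AD)
    #>          outcome
    #> treatment  1  2  3
    #>         1 18 17 15
    #>         2 20 10 20
    #>         3 25 13 12

Note that each treatment row sums to 50, which is why the treatment effects in the fit below are estimated as essentially zero.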

4. Classical Poisson Regression

The classical glm() call uses the Poisson family with the canonical log link:


    glm.D93 <- glm(counts ~ outcome + treatment,
                   family = poisson(link = "log"))
    summary(glm.D93)
#> 
#> Call:
#> glm(formula = counts ~ outcome + treatment, family = poisson(link = "log"))
#> 
#> Coefficients:
#>               Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  3.045e+00  1.709e-01  17.815   <2e-16 ***
#> outcome2    -4.543e-01  2.022e-01  -2.247   0.0246 *  
#> outcome3    -2.930e-01  1.927e-01  -1.520   0.1285    
#> treatment2   1.217e-15  2.000e-01   0.000   1.0000    
#> treatment3   8.438e-16  2.000e-01   0.000   1.0000    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for poisson family taken to be 1)
#> 
#>     Null deviance: 10.5814  on 8  degrees of freedom
#> Residual deviance:  5.1291  on 4  degrees of freedom
#> AIC: 56.761
#> 
#> Number of Fisher Scoring iterations: 4

4.1 Interpretation of the Classical Output

The glm() summary reports the coefficients on the log scale: each coefficient is a log‑rate ratio, so \(\exp(\beta_j)\) gives the multiplicative effect on the expected count of a one‑unit change in the corresponding predictor, as the code below illustrates.

The residual deviance of about 5.13 on 4 degrees of freedom indicates an adequate fit, and the drop from the null deviance of about 10.58 is driven entirely by the outcome factor; both treatment coefficients are estimated as essentially zero, consistent with Dobson's original analysis.
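For example, exponentiating the fitted coefficients gives the rate ratios directly (the values follow from the fit above):

    ## Rate ratios: multiplicative effects on the expected count
    round(exp(coef(glm.D93)), 3)
    #> (Intercept)    outcome2    outcome3  treatment2  treatment3 
    #>      21.000       0.635       0.746       1.000       1.000

The baseline cell (outcome 1, treatment 1) has an expected count of 21, outcome 2 reduces the expected count by a factor of about 0.64, and the treatments leave it unchanged.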

5. Bayesian Poisson Regression with glmb()

To fit the Bayesian analogue, we must specify a prior distribution for the regression coefficients.
The recommended workflow uses Prior_Setup(), which constructs a Zellner‑type g‑prior aligned with the model matrix.

5.1 Setting the Prior


    ps <- Prior_Setup(counts ~ outcome + treatment,
                      family = poisson())
#> Using default pwt = 0.01 (low-d default).
    mu <- ps$mu
    V  <- ps$Sigma

The printed Prior_Setup() output shows the constructed prior location and scale: the prior mean vector centers the intercept at the null‑model estimate and sets all other coefficients to zero, while the prior covariance is scaled by the prior weight pwt. With the default pwt = 0.01, the prior carries roughly 1% of the information in the data, making each prior standard deviation about \(1/\sqrt{0.01} = 10\) times the corresponding likelihood standard deviation (compare the Prior.sd and Like.sd columns in Section 6), so the prior is weakly informative.
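A quick way to inspect the constructed prior (assuming mu is a numeric vector or single‑column matrix, as extracted above) is to tabulate its location and marginal standard deviations:

    ## Prior location and marginal standard deviations; compare with the
    ## Prior Mean and Prior.sd columns in the summary output below.
    cbind(prior.mean = as.vector(mu), prior.sd = sqrt(diag(V)))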

5.2 Calling glmb()


    glmb.D93 <- glmb(counts ~ outcome + treatment,
                     family = poisson(),
                     pfamily = dNormal(mu = mu, Sigma = V))

This call mirrors glm() but adds the pfamily argument, which specifies the prior: here dNormal() supplies the multivariate normal prior with mean mu and covariance V constructed by Prior_Setup().
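As a quick check (assuming glmb objects support coef() in the usual GLM fashion, returning the posterior mean coefficients shown in the printed output below), the posterior means can be set beside the MLEs:

    ## Posterior means vs. classical MLEs; the coef() method for glmb
    ## objects is assumed here to return posterior means.
    cbind(MLE = coef(glm.D93), Bayes = coef(glmb.D93))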

6. Printing and Summarizing the Bayesian Output

    print(glmb.D93)
#> 
#> Call:  glmb(formula = counts ~ outcome + treatment, family = poisson(), 
#>     pfamily = dNormal(mu = mu, Sigma = V))
#> 
#> Posterior Mean Coefficients:
#> (Intercept)     outcome2     outcome3   treatment2   treatment3  
#>    3.026184    -0.447186    -0.296234     0.005697    -0.003414  
#> 
#> Effective Number of Parameters: 4.943835 
#> Expected Residual Deviance: 10.11906 
#> DIC: 56.69507
    summary(glmb.D93)
#> Call
#> glmb(formula = counts ~ outcome + treatment, family = poisson(), 
#>     pfamily = dNormal(mu = mu, Sigma = V))
#> 
#> Expected Residuals:
#>        Min         1Q     Median         3Q        Max 
#> -0.9604284 -0.6108847 -0.1084108  0.9280288  1.0914552 
#> 
#> Prior and Maximum Likelihood Estimates with Standard Deviations
#> 
#>              Null Mode Prior Mean   Prior.sd  Max Like. Like.sd
#> (Intercept)  2.813e+00  2.813e+00  1.700e+00  3.045e+00   0.171
#> outcome2     0.000e+00  0.000e+00  2.012e+00 -4.543e-01   0.202
#> outcome3     0.000e+00  0.000e+00  1.918e+00 -2.930e-01   0.193
#> treatment2   0.000e+00  0.000e+00  1.990e+00  1.217e-15   0.200
#> treatment3   0.000e+00  0.000e+00  1.990e+00  8.438e-16   0.200
#> 
#> Bayesian Estimates Based on 1000 iid draws
#> 
#>              Post.Mode  Post.Mean    Post.Sd   MC Error Pr(Null_tail) SE(tail)
#> (Intercept)  3.042e+00  3.026e+00  1.679e-01  5.309e-03     9.200e-02    0.009
#> outcome2    -4.497e-01 -4.472e-01  1.989e-01  6.290e-03     1.400e-02    0.004
#> outcome3    -2.901e-01 -2.962e-01  1.878e-01  5.938e-03     5.900e-02    0.007
#> treatment2  -1.395e-05  5.697e-03  2.098e-01  6.634e-03     4.850e-01    0.016
#> treatment3   1.209e-06 -3.414e-03  1.989e-01  6.290e-03     4.940e-01    0.016
#>             Pr(Prior_tail)  
#> (Intercept)          0.092 .
#> outcome2             0.014 *
#> outcome3             0.059 .
#> treatment2           0.485  
#> treatment3           0.494  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>   Directional Tail Summaries:
#> 
#>                Metric vs Null vs Prior
#>  Mahalanobis Distance  2.3308   2.3308
#>      Tail Probability  0.0120   0.0120
#>   [Tail probabilities are P(delta^T * Z <= 0) in whitened space]
#> 
#> 
#> Distribution Percentiles
#> 
#>                  1.0%      2.5%      5.0%    Median     95.0%     97.5% 99.0%
#> (Intercept)  2.647232  2.697381  2.752295  3.020564  3.306623  3.358232 3.411
#> outcome2    -0.877269 -0.827067 -0.765187 -0.452620 -0.108355 -0.049461 0.022
#> outcome3    -0.703049 -0.654677 -0.601365 -0.291129  0.017370  0.069558 0.117
#> treatment2  -0.475433 -0.401161 -0.346093  0.010560  0.330793  0.393848 0.489
#> treatment3  -0.465001 -0.411936 -0.332202  0.003298  0.314890  0.371910 0.473
#> 
#> Effective Number of Parameters: 4.943835 
#> Expected Residual Deviance: 10.11906 
#> DIC: 56.69507 
#> 
#> Expected Mean dispersion: 1 
#> Sq.root of Expected Mean dispersion: 1 
#> 
#> Mean Likelihood Subgradient Candidates Per iid sample: 1.772

The summary includes:

- expected (posterior mean) residuals;
- prior and maximum likelihood estimates with their standard deviations;
- posterior modes, means, standard deviations, and Monte Carlo errors based on 1000 iid draws, together with null and prior tail probabilities;
- directional (Mahalanobis) tail summaries;
- posterior percentiles for each coefficient;
- the effective number of parameters \(p_D\), the expected residual deviance, and the DIC;
- a sampler diagnostic: the mean number of likelihood subgradient candidates per iid draw.

6.1 Posterior Mean Coefficients

Posterior means are close to the classical MLEs because the prior is weakly informative, so the likelihood dominates the posterior. The small remaining differences, such as the intercept shrinking from 3.045 to 3.026, reflect mild shrinkage toward the prior mean.

6.2 Effective Number of Parameters

The \(p_D\) value, here about 4.94 against 5 nominal coefficients, is typically slightly below the number of coefficients, reflecting mild shrinkage from the prior.

6.3 DIC

DIC is approximately 56.70, very close to the classical AIC value of 56.76. This agreement is expected when the prior is weak and the posterior is approximately Gaussian, as the derivation below makes precise.
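The agreement follows from the standard large‑sample argument. With

\[ \text{DIC} = \bar{D} + p_D, \qquad p_D = \bar{D} - D(\bar{\beta}), \]

a weak prior and an approximately Gaussian posterior give \(p_D \approx k\) (the number of coefficients) and \(\bar{D} \approx D(\hat{\beta}) + k\), so

\[ \text{DIC} \approx D(\hat{\beta}) + 2k = \text{AIC}. \]

Here \(k = 5\) and the estimated \(p_D \approx 4.94\), so the approximation is nearly exact.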

7. Comparison of Classical and Bayesian Estimates

| Quantity       | Classical glm()  | Bayesian glmb()                   |
|----------------|------------------|-----------------------------------|
| Point estimate | MLE              | Posterior mean                    |
| Uncertainty    | Standard error   | Posterior SD                      |
| Model fit      | Deviance, AIC    | \(\bar{D}\), \(p_D\), DIC         |
| Interpretation | Log-rate ratios  | Same, with shrinkage              |

The Bayesian model provides richer diagnostics (tail probabilities, posterior quantiles) while preserving the familiar GLM structure.

8. Credible Intervals and Posterior Interpretation

Posterior percentiles (2.5%, 50%, 97.5%) provide equal‑tailed Bayesian credible intervals for each coefficient. These intervals closely track the classical Wald intervals because the posterior is nearly normal: for outcome2, for example, the 95% credible interval of \((-0.83, -0.05)\) from the percentile table is close to the Wald interval of \((-0.85, -0.06)\).
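For comparison, the classical Wald intervals are available from base R; confint.default() uses the familiar estimate \(\pm 1.96 \times\) SE construction:

    ## Classical 95% Wald intervals; compare with the 2.5% and 97.5%
    ## posterior percentiles in the summary output of Section 6.
    confint.default(glm.D93)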

9. Concluding Remarks

This vignette demonstrates that Bayesian Poisson regression with glmb() and a weakly informative normal prior closely reproduces the classical glm() results while adding exact iid posterior samples, tail probabilities, credible intervals, and DIC‑based model assessment.

In later chapters, we extend these ideas to other members of the exponential family covered by glmbayes.

References

Dobson, A. J. 1990. An Introduction to Generalized Linear Models. Chapman & Hall.
McCullagh, P., and J. A. Nelder. 1989. Generalized Linear Models. 2nd ed. Chapman & Hall.
Nygren, K. N., and L. M. Nygren. 2006. "Likelihood Subgradient Densities." Journal of the American Statistical Association 101 (475): 1144–56. https://doi.org/10.1198/016214506000000357.