Chapter 08: Models for the Poisson Family

Kjell Nygren

2026-04-30

    library(glmbayes)

1. Introductory Discussion

Poisson regression is the canonical GLM for modeling count data (McCullagh and Nelder 1989).
The Poisson likelihood is log‑concave under the log link, making it an ideal setting for envelope‑based Bayesian sampling (Nygren and Nygren 2006).
This chapter mirrors the structure used for the Binomial vignette: we begin with the classical glm() fit, then introduce the Bayesian glmb() version, discuss prior specification, and compare posterior summaries to their classical counterparts.

We use the well‑known randomized controlled trial dataset of Dobson (1990), which appears in many GLM textbooks and is included in the base R documentation for glm().
The dataset contains counts of subjects classified by treatment group and outcome category.

2. Poisson Likelihood and Model Structure

Poisson regression models count data of the form

\[ Y_i \in \{0,1,2,\ldots\}, \qquad Y_i \sim \text{Poisson}(\mu_i), \]

where \(\mu_i > 0\) is the expected count.
Observation weights \(w_i\) (e.g., exposure times) enter the likelihood through

\[ Y_i \sim \text{Poisson}(w_i \mu_i). \]

The mean is linked to a linear predictor via

\[ \eta_i = x_i^\top \beta, \qquad \mu_i = g^{-1}(\eta_i). \]

2.1 Weighted Poisson Log‑Likelihood

Up to additive constants, the weighted log‑likelihood is

\[ \ell(\beta) = \sum_{i=1}^n \left[ w_i y_i \log(\mu_i) - w_i \mu_i \right], \]

where, under the exposure interpretation, \(y_i\) denotes the count per unit exposure, so that \(w_i y_i\) is the raw observed count.

Under the log link, the mean is

\[ \mu_i = e^{\eta_i}, \]

and the log‑likelihood simplifies to

\[ \ell(\beta) = \sum_{i=1}^n \left[ w_i y_i \eta_i - w_i e^{\eta_i} \right]. \]

This is the form used by both glm() and the Bayesian functions glmb() and rglmb().
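To make the expression concrete, here is a minimal R sketch of this log‑likelihood; pois_loglik, X, y, and w are illustrative names introduced here, not part of the glmbayes API.

    ## Weighted Poisson log-likelihood under the log link, up to constants.
    ## X: n x p model matrix; y: counts (per unit exposure); w: weights.
    pois_loglik <- function(beta, X, y, w = rep(1, length(y))) {
      eta <- drop(X %*% beta)           # linear predictor eta_i = x_i' beta
      sum(w * y * eta - w * exp(eta))   # sum_i [w_i y_i eta_i - w_i exp(eta_i)]
    }

Maximizing this function over \(\beta\) reproduces the glm() fit; the Bayesian functions combine it with a log‑prior before sampling.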

2.2 Exponential‑Family Representation

The Poisson distribution belongs to the exponential family with canonical parameter

\[ \theta_i = \log(\mu_i), \]

and cumulant function

\[ b(\theta_i) = e^{\theta_i}. \]

The variance follows from the second derivative of the cumulant function:

\[ \mathrm{Var}(Y_i) = b''(\theta_i) = e^{\theta_i} = \mu_i. \]

Under the log link, the linear predictor equals the canonical parameter:

\[ \eta_i = \theta_i. \]

This identity ensures that the Poisson log‑likelihood is globally log‑concave in the linear predictor, which is ideal for envelope construction and accept–reject sampling in glmbayes.
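To see the concavity directly, differentiate the log‑likelihood twice under the log link:

\[ \nabla_\beta^2 \, \ell(\beta) = -\sum_{i=1}^n w_i e^{\eta_i} x_i x_i^\top, \]

which is negative semidefinite for every \(\beta\), since each \(w_i e^{\eta_i} > 0\). The log‑likelihood is therefore globally concave, which underpins the envelope construction used by the sampler.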

2.3 Likelihood, Prior, and Posterior (Normal Prior on \(\beta\))

With the log link, the weighted log‑likelihood is

\[ \ell(\beta) = \sum_{i=1}^n \left[ w_i y_i \eta_i - w_i e^{\eta_i} \right]. \]

A multivariate normal prior is specified as

\[ \log p(\beta) = -\tfrac{1}{2}(\beta - \mu_0)^\top \Sigma_0^{-1} (\beta - \mu_0) + \text{const}. \]

Combining likelihood and prior yields the log‑posterior:

\[ \log p(\beta \mid y) = \sum_{i=1}^n \left[ w_i y_i \eta_i - w_i e^{\eta_i} \right] - \tfrac{1}{2}(\beta - \mu_0)^\top \Sigma_0^{-1} (\beta - \mu_0) + \text{const}. \]

Because both terms are concave in \(\beta\), the posterior is log‑concave, enabling efficient iid sampling via the envelope‑based accept–reject sampler implemented in glmbayes.
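As a hedged sketch, building on the pois_loglik() helper above (log_post, mu0, and Sigma0 are illustrative names, not glmbayes internals), the log‑posterior can be coded and its mode located numerically:

    ## Log-posterior: Poisson log-likelihood plus multivariate normal log-prior.
    log_post <- function(beta, X, y, w, mu0, Sigma0) {
      delta <- beta - mu0
      pois_loglik(beta, X, y, w) -
        0.5 * drop(t(delta) %*% solve(Sigma0) %*% delta)
    }
    ## The posterior mode can be found with a general-purpose optimizer, e.g.:
    ## optim(mu0, function(b) -log_post(b, X, y, w, mu0, Sigma0), method = "BFGS")

Internally, glmb() works with this same log‑posterior when constructing its envelope and drawing iid samples.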

3. Data Setup

We reproduce Dobson’s example exactly.
The data consist of 9 observations: three outcome levels crossed with three treatment groups.

    ## Dobson (1990) Page 93: Randomized Controlled Trial :
    set.seed(333)
    counts <- c(18,17,15,20,10,20,25,13,12)
    outcome <- gl(3,1,9)
    treatment <- gl(3,3)
    print(d.AD <- data.frame(treatment, outcome, counts))
#>   treatment outcome counts
#> 1         1       1     18
#> 2         1       2     17
#> 3         1       3     15
#> 4         2       1     20
#> 5         2       2     10
#> 6         2       3     20
#> 7         3       1     25
#> 8         3       2     13
#> 9         3       3     12

The table shows the observed counts for each treatment–outcome combination.
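It can also be helpful to view the data as a 3 × 3 contingency table; base R's xtabs() does this directly:

    ## Cross-tabulate counts by treatment and outcome
    xtabs(counts ~ treatment + outcome, data = d.AD)
    #>          outcome
    #> treatment  1  2  3
    #>         1 18 17 15
    #>         2 20 10 20
    #>         3 25 13 12

Note that each treatment row sums to 50, which is why the treatment effects in the fit below are estimated as essentially zero.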

4. Classical Poisson Regression

The classical glm() call uses the Poisson family with the canonical log link:


    glm.D93 <- glm(counts ~ outcome + treatment,
                   family = poisson(link = "log"))
    summary(glm.D93)
#> 
#> Call:
#> glm(formula = counts ~ outcome + treatment, family = poisson(link = "log"))
#> 
#> Coefficients:
#>               Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  3.045e+00  1.709e-01  17.815   <2e-16 ***
#> outcome2    -4.543e-01  2.022e-01  -2.247   0.0246 *  
#> outcome3    -2.930e-01  1.927e-01  -1.520   0.1285    
#> treatment2   1.217e-15  2.000e-01   0.000   1.0000    
#> treatment3   8.438e-16  2.000e-01   0.000   1.0000    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for poisson family taken to be 1)
#> 
#>     Null deviance: 10.5814  on 8  degrees of freedom
#> Residual deviance:  5.1291  on 4  degrees of freedom
#> AIC: 56.761
#> 
#> Number of Fisher Scoring iterations: 4

4.1 Interpretation of the Classical Output

The glm() summary reports the coefficients on the log scale: each coefficient is a log‑rate ratio, so \(\exp(\beta_j)\) gives the multiplicative effect on the expected count of a one‑unit change in the corresponding predictor, as the code below illustrates.

The residual deviance of about 5.13 on 4 degrees of freedom indicates an adequate fit, and the drop from the null deviance of about 10.58 is driven entirely by the outcome factor; both treatment coefficients are estimated as essentially zero, consistent with Dobson's original analysis.
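For example, exponentiating the fitted coefficients gives the rate ratios directly (the values follow from the fit above):

    ## Rate ratios: multiplicative effects on the expected count
    round(exp(coef(glm.D93)), 3)
    #> (Intercept)    outcome2    outcome3  treatment2  treatment3 
    #>      21.000       0.635       0.746       1.000       1.000

The baseline cell (outcome 1, treatment 1) has an expected count of 21, outcome 2 reduces the expected count by a factor of about 0.64, and the treatments leave it unchanged.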

5. Bayesian Poisson Regression with glmb()

To fit the Bayesian analogue, we must specify a prior distribution for the regression coefficients.
The recommended workflow uses Prior_Setup(), which constructs a Zellner‑type g‑prior aligned with the model matrix.

5.1 Setting the Prior


    ps <- Prior_Setup(counts ~ outcome + treatment,
                      family = poisson())
#> Using default pwt = 0.01 (low-d default).
    mu <- ps$mu
    V  <- ps$Sigma

The printed Prior_Setup() output shows the constructed prior location and scale: the prior mean vector centers the intercept at the null‑model estimate and sets all other coefficients to zero, while the prior covariance is scaled by the prior weight pwt. With the default pwt = 0.01, the prior carries roughly 1% of the information in the data, making each prior standard deviation about \(1/\sqrt{0.01} = 10\) times the corresponding likelihood standard deviation (compare the Prior.sd and Like.sd columns in Section 6), so the prior is weakly informative.
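A quick way to inspect the constructed prior (assuming mu is a numeric vector or single‑column matrix, as extracted above) is to tabulate its location and marginal standard deviations:

    ## Prior location and marginal standard deviations; compare with the
    ## Prior Mean and Prior.sd columns in the summary output below.
    cbind(prior.mean = as.vector(mu), prior.sd = sqrt(diag(V)))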

5.2 Calling glmb()


    glmb.D93 <- glmb(counts ~ outcome + treatment,
                     family = poisson(),
                     pfamily = dNormal(mu = mu, Sigma = V))

This call mirrors glm() but adds the pfamily argument, which specifies the prior: here dNormal() supplies the multivariate normal prior with mean mu and covariance V constructed by Prior_Setup().
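As a quick check (assuming glmb objects support coef() in the usual GLM fashion, returning the posterior mean coefficients shown in the printed output below), the posterior means can be set beside the MLEs:

    ## Posterior means vs. classical MLEs; the coef() method for glmb
    ## objects is assumed here to return posterior means.
    cbind(MLE = coef(glm.D93), Bayes = coef(glmb.D93))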

6. Printing and Summarizing the Bayesian Output

    print(glmb.D93)
#> 
#> Call:  glmb(formula = counts ~ outcome + treatment, family = poisson(), 
#>     pfamily = dNormal(mu = mu, Sigma = V))
#> 
#> Posterior Mean Coefficients:
#> (Intercept)     outcome2     outcome3   treatment2   treatment3  
#>    3.026184    -0.447186    -0.296234     0.005697    -0.003414  
#> 
#> Effective Number of Parameters: 4.943835 
#> Expected Residual Deviance: 10.11906 
#> DIC: 56.69507
    summary(glmb.D93)
#> Call
#> glmb(formula = counts ~ outcome + treatment, family = poisson(), 
#>     pfamily = dNormal(mu = mu, Sigma = V))
#> 
#> Expected Residuals:
#>        Min         1Q     Median         3Q        Max 
#> -0.9604284 -0.6108847 -0.1084108  0.9280288  1.0914552 
#> 
#> Prior and Maximum Likelihood Estimates with Standard Deviations
#> 
#>              Null Mode Prior Mean   Prior.sd  Max Like. Like.sd
#> (Intercept)  2.813e+00  2.813e+00  1.700e+00  3.045e+00   0.171
#> outcome2     0.000e+00  0.000e+00  2.012e+00 -4.543e-01   0.202
#> outcome3     0.000e+00  0.000e+00  1.918e+00 -2.930e-01   0.193
#> treatment2   0.000e+00  0.000e+00  1.990e+00  1.217e-15   0.200
#> treatment3   0.000e+00  0.000e+00  1.990e+00  8.438e-16   0.200
#> 
#> Bayesian Estimates Based on 1000 iid draws
#> 
#>              Post.Mode  Post.Mean    Post.Sd   MC Error Pr(Null_tail) SE(tail)
#> (Intercept)  3.042e+00  3.026e+00  1.679e-01  5.309e-03     9.200e-02    0.009
#> outcome2    -4.497e-01 -4.472e-01  1.989e-01  6.290e-03     1.400e-02    0.004
#> outcome3    -2.901e-01 -2.962e-01  1.878e-01  5.938e-03     5.900e-02    0.007
#> treatment2  -1.395e-05  5.697e-03  2.098e-01  6.634e-03     4.850e-01    0.016
#> treatment3   1.209e-06 -3.414e-03  1.989e-01  6.290e-03     4.940e-01    0.016
#>             Pr(Prior_tail)  
#> (Intercept)          0.092 .
#> outcome2             0.014 *
#> outcome3             0.059 .
#> treatment2           0.485  
#> treatment3           0.494  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>   Directional Tail Summaries:
#> 
#>                Metric vs Null vs Prior
#>  Mahalanobis Distance  2.3308   2.3308
#>      Tail Probability  0.0120   0.0120
#>   [Tail probabilities are P(delta^T * Z <= 0) in whitened space]
#> 
#> 
#> Distribution Percentiles
#> 
#>                  1.0%      2.5%      5.0%    Median     95.0%     97.5% 99.0%
#> (Intercept)  2.647232  2.697381  2.752295  3.020564  3.306623  3.358232 3.411
#> outcome2    -0.877269 -0.827067 -0.765187 -0.452620 -0.108355 -0.049461 0.022
#> outcome3    -0.703049 -0.654677 -0.601365 -0.291129  0.017370  0.069558 0.117
#> treatment2  -0.475433 -0.401161 -0.346093  0.010560  0.330793  0.393848 0.489
#> treatment3  -0.465001 -0.411936 -0.332202  0.003298  0.314890  0.371910 0.473
#> 
#> Effective Number of Parameters: 4.943835 
#> Expected Residual Deviance: 10.11906 
#> DIC: 56.69507 
#> 
#> Expected Mean dispersion: 1 
#> Sq.root of Expected Mean dispersion: 1 
#> 
#> Mean Likelihood Subgradient Candidates Per iid sample: 1.772

The summary includes:

- expected (posterior mean) residuals;
- prior and maximum likelihood estimates with their standard deviations;
- posterior modes, means, standard deviations, and Monte Carlo errors based on 1000 iid draws, together with null and prior tail probabilities;
- directional (Mahalanobis) tail summaries;
- posterior percentiles for each coefficient;
- the effective number of parameters \(p_D\), the expected residual deviance, and the DIC;
- a sampler diagnostic: the mean number of likelihood subgradient candidates per iid draw.

6.1 Posterior Mean Coefficients

Posterior means are close to the classical MLEs because the prior is weakly informative, so the likelihood dominates the posterior. The small remaining differences, such as the intercept shrinking from 3.045 to 3.026, reflect mild shrinkage toward the prior mean.

6.2 Effective Number of Parameters

The \(p_D\) value, here about 4.94 against 5 nominal coefficients, is typically slightly below the number of coefficients, reflecting mild shrinkage from the prior.

6.3 DIC

DIC is approximately 56.70, very close to the classical AIC value of 56.76. This agreement is expected when the prior is weak and the posterior is approximately Gaussian, as the derivation below makes precise.
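The agreement follows from the standard large‑sample argument. With

\[ \text{DIC} = \bar{D} + p_D, \qquad p_D = \bar{D} - D(\bar{\beta}), \]

a weak prior and an approximately Gaussian posterior give \(p_D \approx k\) (the number of coefficients) and \(\bar{D} \approx D(\hat{\beta}) + k\), so

\[ \text{DIC} \approx D(\hat{\beta}) + 2k = \text{AIC}. \]

Here \(k = 5\) and the estimated \(p_D \approx 4.94\), so the approximation is nearly exact.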

7. Comparison of Classical and Bayesian Estimates

| Quantity       | Classical glm()  | Bayesian glmb()                   |
|----------------|------------------|-----------------------------------|
| Point estimate | MLE              | Posterior mean                    |
| Uncertainty    | Standard error   | Posterior SD                      |
| Model fit      | Deviance, AIC    | \(\bar{D}\), \(p_D\), DIC         |
| Interpretation | Log-rate ratios  | Same, with shrinkage              |

The Bayesian model provides richer diagnostics (tail probabilities, posterior quantiles) while preserving the familiar GLM structure.

8. Credible Intervals and Posterior Interpretation

Posterior percentiles (2.5%, 50%, 97.5%) provide equal‑tailed Bayesian credible intervals for each coefficient. These intervals closely track the classical Wald intervals because the posterior is nearly normal: for outcome2, for example, the 95% credible interval of \((-0.83, -0.05)\) from the percentile table is close to the Wald interval of \((-0.85, -0.06)\).
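For comparison, the classical Wald intervals are available from base R; confint.default() uses the familiar estimate \(\pm 1.96 \times\) SE construction:

    ## Classical 95% Wald intervals; compare with the 2.5% and 97.5%
    ## posterior percentiles in the summary output of Section 6.
    confint.default(glm.D93)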

9. Concluding Remarks

This vignette demonstrates that Bayesian Poisson regression with glmb() and a weakly informative normal prior closely reproduces the classical glm() results while adding exact iid posterior samples, tail probabilities, credible intervals, and DIC‑based model assessment.

In later chapters, we extend these ideas to other members of the exponential family covered by glmbayes.

References

Dobson, A. J. 1990. An Introduction to Generalized Linear Models. Chapman & Hall.
McCullagh, P., and J. A. Nelder. 1989. Generalized Linear Models. 2nd ed. Chapman & Hall.
Nygren, K. N., and L. M. Nygren. 2006. "Likelihood Subgradient Densities." Journal of the American Statistical Association 101 (475): 1144–56. https://doi.org/10.1198/016214506000000357.