Chapter A05: Simulation Methods - Likelihood Subgradient Densities

Kjell Nygren

2026-04-30

library(glmbayes)

1. Introduction

Many of the key models implemented in the glmbayes package make use of independent and identically distributed (iid) sampling procedures for models with log-concave likelihoods and multivariate normal priors. The implementation is based on Nygren and Nygren (2006); the envelope function map and call graph are summarized in Nygren (2025).

The objective of this vignette is to provide a detailed overview of the implementation of these simulation methods. To that end, this vignette contains the following:

  1. An overview of the structure of the main .cpp function (.rNormalGLM_cpp) that serves as the “director of traffic” for the simulation pipeline and its connection to the underlying theory.

  2. A detailed discussion of input validation and preprocessing steps, including dimension checks, dispersion adjustment, and offset transformation.

  3. An overview of the posterior optimization routine, including the use of user-supplied log-posterior and gradient functions, the BFGS method, and Hessian estimation.

  4. A discussion of how models are transformed into a standardized form suitable for envelope construction and simulation, with a deep dive into the glmb_Standardize_Model() function.

  5. A discussion of options for sizing the envelope functions and densities used during sampling, including computational implications and a deep dive into how the EnvelopeOpt() function determines envelope size.

  6. A deep dive into envelope construction with theoretical connections to the Nygren & Nygren (2006) paper, including the role of GPU acceleration for large models.

  7. A discussion of how simulation is performed, including the use of parallel sampling to accelerate computation.

  8. A discussion of how simulation output is converted back into the original model scale for use in higher-level functions.

This structure is designed to guide users through both the theoretical foundations and practical implementation of the simulation routines in glmbayes.

2. Overview of the Main Simulation Function: .rNormalGLM_cpp

The core simulation engine in the glmbayes package is the .rNormalGLM_cpp function, implemented in C++ via Rcpp. This function orchestrates the entire simulation pipeline, from preprocessing inputs to generating posterior samples. It serves as the “director of traffic,” coordinating optimization, standardization, envelope sizing, envelope construction, and sampling. Below is a breakdown of its key components:

  1. Input Validation and Preprocessing
    Ensures consistency in dimensions across inputs (x, y, mu, P, offset, wt).
    Adjusts weights for dispersion and initializes key variables.
    Transforms the model so that the prior mean is zero, simplifying subsequent optimization.

  2. Posterior Optimization
    Uses the optim() function with user-supplied negative log-posterior (f2) and corresponding gradient (f3) functions. Applies the BFGS method to find the posterior mode.
    Computes an approximate Hessian matrix at the mode, which is used in standardization.

  3. Model Standardization
    Calls glmb_Standardize_Model() to transform the model into a standardized form.
    Outputs include transformed design matrix, prior precision, posterior mode, and matrices for undoing the transformation.

  4. Envelope Sizing
    Determines the appropriate size of the envelope used for sampling.
    Balances computational cost with sampling efficiency.
    Uses heuristics and diagnostics to guide envelope width and grid resolution.

  5. Envelope Construction
    Constructs an envelope for the standardized model using EnvelopeBuild_c().
    The envelope facilitates efficient sampling from the posterior.
    Supports OpenCL acceleration for large models and grid sorting for small sample sizes.

  6. Simulation Execution
    Chooses between serial and parallel sampling based on use_parallel and sample size n.
    Executes the standardized sampler via .rNormalGLM_std_cpp() to generate samples. Verbose mode provides diagnostic timestamps.

  7. Post-Processing and Output
    Applies reverse transformation to undo standardization.
    Adds back the prior mean to the simulated coefficients.
    Constructs the output list with coefficients, posterior mode, prior info, and envelope.

This function encapsulates the full simulation workflow, making it a central component of the package. Its modular design allows for flexibility in model specification, optimization routines, and computational acceleration.

3. Input Validation and Preprocessing

The first step in the simulation pipeline ensures that all inputs are consistent and properly formatted. This includes:

  1. Dimension Checks
    Verifies that the number of rows and columns in x, y, mu, P, offset, and wt are compatible. Any mismatch triggers an error.

  2. Dispersion Adjustment
    If the model family is Poisson or Binomial, the dispersion is set to 1. Otherwise, the user-supplied dispersion is used to scale the weights.

  3. Offset Transformation
    Computes the transformed offset vector alpha = x %*% mu + offset, which is used to shift the model so that the prior mean becomes zero.

  4. Initialization
    Sets up the starting values for optimization and prepares the input structures for subsequent steps.

This preprocessing step ensures that the model is well-posed and that the optimization routine receives clean, consistent inputs.
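
As a stand-alone illustration of step 3, the sketch below computes the transformed offset alpha = x %*% mu + offset in plain C++. The actual implementation operates on Rcpp matrix and vector types, so the container types here are illustrative only.

```cpp
#include <vector>
#include <cmath>
#include <cassert>
#include <cstddef>

// Sketch of the offset transformation alpha = x %*% mu + offset used to
// shift the model so that the prior mean becomes zero. Illustrative only:
// the package works with Rcpp::NumericMatrix, not std::vector.
std::vector<double> shift_offset(const std::vector<std::vector<double>>& x,
                                 const std::vector<double>& mu,
                                 const std::vector<double>& offset) {
    std::size_t n = x.size();        // number of observations
    std::size_t p = mu.size();       // number of coefficients
    std::vector<double> alpha(n);
    for (std::size_t i = 0; i < n; ++i) {
        double acc = offset[i];      // start from the user-supplied offset
        for (std::size_t j = 0; j < p; ++j)
            acc += x[i][j] * mu[j];  // add the prior-mean linear predictor
        alpha[i] = acc;
    }
    return alpha;
}
```

After this shift, the optimization in the next stage can work with a prior centered at zero.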

4. Posterior Optimization

Once the inputs are validated, the function proceeds to estimate the posterior mode using numerical optimization:

  1. Objective and Gradient Functions
    The user supplies two functions — f2 for the negative log-posterior and f3 for the corresponding gradient — which are passed to the optim() routine.

  2. Optimization Method
    The BFGS algorithm is used to minimize the negative log-posterior. The starting point is start - mu, reflecting the zero-centered prior.

  3. Hessian Estimation
    The optim() function returns an approximate Hessian matrix at the mode. This matrix is used in the standardization step to transform the model into a form suitable for envelope construction.

  4. Convergence Check
    If the optimizer fails to converge, the function halts with an error message.

This step identifies the most probable parameter configuration under the posterior by minimizing its negative log-density, and prepares the curvature information needed for standardization.
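
The mode-finding step can be illustrated with a small self-contained example. The package itself calls optim() with BFGS on the user-supplied f2/f3 pair; the sketch below instead applies Newton's method to a hypothetical one-parameter Poisson log-link model with a standard normal prior, whose negative log-posterior (up to an additive constant) is \(e^\theta - y\theta + \theta^2/2\):

```cpp
#include <cmath>
#include <cassert>

// Toy 1-D analogue of the posterior-optimization step: minimize the
// negative log-posterior f(theta) = exp(theta) - y*theta + theta^2/2
// (Poisson likelihood with log link, standard normal prior, constants
// dropped). The real pipeline uses BFGS on user-supplied f2/f3.
double posterior_mode(double y, double start = 0.0) {
    double theta = start;
    for (int it = 0; it < 100; ++it) {
        double grad = std::exp(theta) - y + theta;  // f'(theta)
        double hess = std::exp(theta) + 1.0;        // f''(theta) > 0 everywhere
        double step = grad / hess;                  // Newton step
        theta -= step;
        if (std::fabs(step) < 1e-12) break;         // converged
    }
    return theta;
}
```

At convergence the gradient vanishes, so \(e^\theta + \theta = y\); the curvature \(e^\theta + 1\) plays the same role as the approximate Hessian returned by optim() in the standardization step.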

5. Model Standardization with Theoretical Underpinnings: glmb_Standardize_Model()

The theoretical results and formulas in the Nygren and Nygren (2006) paper assume that models are in a standardized form. The paper contains the following key definitions and remarks:

  1. Definition 3: A probability model with a multivariate normal prior and log-concave likelihood function is in standard form if:

  2. Remark 11: If the log-likelihood function is concave and twice continuously differentiable, then a Cholesky decomposition of the posterior precision at the unique posterior mode (which always exists) can be used to reparameterize a model into one for which the posterior precision at the posterior mode is a diagonal matrix.

  3. Remark 12: Let P be a positive definite matrix. Then there exists a positive definite diagonal matrix D such that P - D is also a positive definite matrix.

  4. Remark 13: Let P and D be as in the previous remark. It then follows that the following two models have the same posterior density:

  5. Remark 14: Suppose that a model has a multivariate normal prior with a diagonal variance–covariance matrix. Also assume that the posterior precision at the posterior mode is a diagonal matrix. Then the model can be reparameterized into a model in standard form.

  6. Remark 15: If a probability model with a multivariate normal prior and log-concave likelihood function has a twice continuously differentiable log-likelihood function, then it can be reparameterized into a model in standard form.

In addition to this standardized form, we further require that the prior mean vector is zero, which simplifies some of our formulas and implementation steps.


Implementation Details: glmb_Standardize_Model()

The function glmb_Standardize_Model() implements this transformation in practice. It performs two key eigenvalue decompositions:

  1. The first decomposition diagonalizes the Hessian of the log-posterior at the posterior mode. This yields a transformation matrix L2 that is used to normalize the posterior precision. The transformed design matrix and prior precision are updated accordingly.

  2. The second decomposition identifies a diagonal matrix \(\varepsilon\) such that the difference \(P - \varepsilon\) remains positive definite. This ensures that the transformed model has a diagonal prior precision and can be reparameterized into standard form. A scaling loop is used to find the largest possible \(\varepsilon\) that satisfies this condition.

The function then applies a second transformation L3 to finalize the standardization. The result is a model in which both the prior and the data precision matrices are diagonal and the prior precision matrix is the identity, satisfying the theoretical conditions for standard form.

The output includes:

This implementation ensures that the model satisfies the theoretical conditions for standard form, enabling efficient envelope construction and posterior sampling. It also provides a modular and transparent framework for debugging and extending the standardization process in future versions of the package.
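
The scaling loop in the second decomposition can be sketched in isolation. The stand-alone version below uses a Cholesky factorization as the positive-definiteness test and bisection to approximate the largest scalar \(\varepsilon\) with \(P - \varepsilon I\) positive definite. The actual glmb_Standardize_Model() works with an eigendecomposition and a diagonal \(\varepsilon\) rather than a scalar multiple of the identity, so this is a simplification:

```cpp
#include <vector>
#include <cmath>
#include <cassert>
#include <algorithm>

// Positive-definiteness test via an in-place lower Cholesky factorization:
// the matrix is positive definite iff every pivot is strictly positive.
bool is_pd(std::vector<std::vector<double>> m) {
    std::size_t k = m.size();
    for (std::size_t j = 0; j < k; ++j) {
        for (std::size_t q = 0; q < j; ++q)
            m[j][j] -= m[j][q] * m[j][q];
        if (m[j][j] <= 0.0) return false;   // pivot failed: not PD
        m[j][j] = std::sqrt(m[j][j]);
        for (std::size_t i = j + 1; i < k; ++i) {
            for (std::size_t q = 0; q < j; ++q)
                m[i][j] -= m[i][q] * m[j][q];
            m[i][j] /= m[j][j];
        }
    }
    return true;
}

// Bisection for the largest eps with P - eps*I positive definite
// (a scalar stand-in for the diagonal epsilon used in the package).
double max_eps(const std::vector<std::vector<double>>& P) {
    double lo = 0.0, hi = 0.0;
    for (std::size_t i = 0; i < P.size(); ++i)
        hi = std::max(hi, P[i][i]);          // eps cannot exceed max diagonal
    for (int it = 0; it < 60; ++it) {
        double mid = 0.5 * (lo + hi);
        std::vector<std::vector<double>> Q = P;
        for (std::size_t i = 0; i < Q.size(); ++i) Q[i][i] -= mid;
        if (is_pd(Q)) lo = mid; else hi = mid;
    }
    return lo;
}
```

For P = [[2, 0.5], [0.5, 1]], the admissible \(\varepsilon\) are bounded by the smallest eigenvalue \((3 - \sqrt{2})/2 \approx 0.793\), which is what the bisection converges to.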

6. Envelope Sizing

The envelope sizing logic is governed by the Gridtype parameter, which controls how the grid points are selected for each dimension.

Gridtype Logic (Nygren & Nygren 2006)

Let \(a_i\) be the data precision for dimension \(i\) (i.e., \(A2[i,i]\)).
In a standardized model, the prior precision is the identity matrix, so the posterior precision becomes \(1 + a_i\).
Let \(\omega_i\) be the width parameter computed below.

1. Gridtype 1: Static Threshold Test

2. Gridtype 2: Dynamic Envelope via EnvelopeOpt(a, n)

3. Gridtype 3: Always Three-Point Grid

We now consider the sampling performance using the foregoing approach when the data are multivariate normal with a diagonal precision matrix given by NP for a positive definite diagonal matrix P and positive scalar N.

Theorem 3: Consider a k-dimensional multivariate normal data model in standard form with precision as above, and let \(\tilde{a}^*(N)\) denote the value of \(\tilde{a}\) at N; then
\[ \lim_{N \to \infty} \tilde{a}^*(N) = \left( \frac{2}{\sqrt{\pi}} \right)^k \]

Proof: This follows immediately from Theorem 2 in Nygren & Nygren (2006).

4. Gridtype 4: Always Single-Point Grid (Mode Only)

Following Example 1 in Nygren & Nygren (2006), the expected number of candidate draws per accepted draw in the multivariate normal case is:
\[ \prod_{i=1}^{k} \sqrt{1 + a_i} \]

where k is the number of dimensions. More generally, if the data are “normal-like,” then we expect similar acceptance rates even for other generalized linear models.
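
This acceptance bound is easy to evaluate numerically; the sketch below computes it for hypothetical values of \(a_i\):

```cpp
#include <vector>
#include <cmath>
#include <cassert>

// Expected number of candidate draws per accepted draw for a multivariate
// normal model in standard form (Example 1 in Nygren & Nygren 2006):
// prod_i sqrt(1 + a_i), where a_i is the data precision for dimension i.
double expected_candidates(const std::vector<double>& a) {
    double prod = 1.0;
    for (double ai : a) prod *= std::sqrt(1.0 + ai);
    return prod;
}
```

For example, with \(a = (1, 2, 3)\) the expected number of candidates is \(\sqrt{2 \cdot 3 \cdot 4} = \sqrt{24} \approx 4.9\), so roughly five proposals per accepted sample.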


Deep Dive: Gridtype 1 — Static Threshold Envelope

Gridtype 1 uses a simple threshold test to decide whether a single-point or three-point envelope should be constructed for each dimension. The key quantity is the posterior precision \(1 + a_i\), and the threshold \(\frac{2}{\sqrt{\pi}} \approx 1.128379\) arises from bounding the expected number of candidate draws per accepted sample.

If \(\sqrt{1 + a_i} \leq \frac{2}{\sqrt{\pi}}\), then the expected number of candidate draws per accepted sample under a single tangent envelope is less than the limiting upper bound provided by the three-tangent envelope. This implies that adding more grid points does not meaningfully reduce rejection rates, and a single tangent at the mode suffices.

This logic is especially useful for low-variance dimensions, where the posterior is sharply peaked and additional tangents offer little marginal benefit. The resulting envelope is minimal and fast to construct:

If the threshold is exceeded, a symmetric three-point grid is used:

The width parameter \(\omega_i\) is derived from the local curvature of the log-likelihood at the posterior mode. Specifically:

\[ \omega_i := \frac{ \sqrt{2} - \exp\left( -1.20491 - 0.7321 \sqrt{0.5 + a_i} \right) }{ \sqrt{1 + a_i} } \]

Since \(\sqrt{1 + a_i}\) is the square root of the posterior precision for dimension i, its inverse is essentially the posterior standard deviation. When \(a_i\) is large, the exponential term approaches zero, so:

\[ \omega_i \approx \frac{ \sqrt{2} }{ \sqrt{1 + a_i} } \]

and hence is essentially proportional to the standard deviation.
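
The Gridtype 1 rule and the width formula can be sketched directly from the expressions above (the numeric constants are taken from the formula for \(\omega_i\)):

```cpp
#include <cmath>
#include <cassert>

// Gridtype 1: choose between a single tangent at the mode and a symmetric
// three-point grid, based on the standardized posterior precision 1 + a.
int grid_points(double a) {
    const double threshold = 2.0 / std::sqrt(3.14159265358979323846); // ~1.128379
    return (std::sqrt(1.0 + a) <= threshold) ? 1 : 3;
}

// Width parameter omega_i derived from the local curvature at the mode.
double omega(double a) {
    return (std::sqrt(2.0) - std::exp(-1.20491 - 0.7321 * std::sqrt(0.5 + a)))
           / std::sqrt(1.0 + a);
}
```

For small \(a\) (e.g. 0.1) the single tangent suffices; for \(a = 1\) the threshold is exceeded and a three-point grid is built. As \(a\) grows, omega(a) approaches \(\sqrt{2}/\sqrt{1 + a}\), i.e. roughly \(\sqrt{2}\) posterior standard deviations.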


Deep Dive: Gridtype 2 — Adaptive Envelope via EnvelopeOpt()

Gridtype 2 uses a dynamic strategy to select the number of grid points per dimension based on a cost-benefit analysis. The goal is to minimize total runtime by balancing envelope construction cost against sampling efficiency.

The tradeoff is formalized as:

The function EnvelopeOpt(a_i, n) solves this tradeoff using approximations from subgradient theory. Dimensions are ranked by decreasing \(a_i\), equivalently by increasing posterior variance \(1 / (1 + a_i)\), so that dimensions with sharper curvature come first; these are the dimensions most likely to benefit from additional tangents.

The algorithm proceeds as follows:

  1. Rank dimensions by increasing \(1 / (1 + a_i)\), so the sharpest dimensions (largest \(a_i\)) come first
  2. For each possible number of three-point dimensions (from 0 to l1):
    • Assign three-point grids to the top-ranked dimensions
    • Estimate total build cost and sampling cost
    • Compute total evaluation cost: evalest = T_build + T_sample
  3. Select the configuration with minimal evalest

This results in a vector dimcount where each entry is either 1 (single-point) or 3 (three-point), depending on the optimal tradeoff.

This adaptive logic allows the simulation routine to scale efficiently across models of varying size and complexity, and reflects the theoretical insights from Nygren & Nygren (2006) on envelope efficiency.

In the implementation of EnvelopeOpt(), the line

intest[i] <- 3^(i-1)

represents the grid construction cost at iteration i. This term estimates the number of log‑likelihood and gradient evaluations required to build the envelope as more dimensions are promoted from single‑point to three‑point tangents:

In addition to the build cost term intest[i], the function also tracks the expected sampling cost through the quantity

slopeest[i] <- prod(scaleest[i,])

This term represents the expected number of likelihood/gradient (slope) evaluations required to obtain a valid draw under the current grid configuration:

Thus, slopeest[i] captures the sampling efficiency of the envelope:

Together with the build cost term intest[i], the total evaluation cost is

\[ \text{evalest}[i] \;=\; \underbrace{3^{\,i-1}}_{\text{build cost}} \;+\; \underbrace{n \cdot \text{slopeest}[i]}_{\text{sampling cost}}. \]

This balance ensures that the algorithm adaptively chooses the grid size that minimizes the combined cost of envelope construction and sampling.
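
A simplified version of this optimization is sketched below. It assumes, as a stand-in for the package's scaleest table, that a dimension contributes a factor \(\sqrt{1 + a_i}\) to the expected number of draws under a single-point grid and the limiting factor \(2/\sqrt{\pi}\) once promoted to three points; the real EnvelopeOpt() uses more refined per-dimension estimates:

```cpp
#include <vector>
#include <cmath>
#include <cassert>
#include <algorithm>
#include <functional>

// Sketch of the EnvelopeOpt() tradeoff: choose how many dimensions m
// (taken in decreasing order of a_i) get three-point grids, minimizing
//   evalest(m) = 3^m (build cost) + n * slopeest(m) (sampling cost).
// Assumed per-dimension acceptance factors: sqrt(1 + a_i) for a
// single-point grid, 2/sqrt(pi) once promoted (a simplification).
int envelope_opt(std::vector<double> a, int n) {
    const double cap = 2.0 / std::sqrt(3.14159265358979323846);
    std::sort(a.begin(), a.end(), std::greater<double>());  // sharpest first
    int best_m = 0;
    double best_cost = 1e300;
    for (std::size_t m = 0; m <= a.size(); ++m) {
        double slopeest = 1.0;                   // expected draws per sample
        for (std::size_t i = 0; i < a.size(); ++i)
            slopeest *= (i < m) ? cap : std::sqrt(1.0 + a[i]);
        double build = std::pow(3.0, (double)m); // grid-point evaluations
        double cost = build + n * slopeest;      // total evaluation estimate
        if (cost < best_cost) { best_cost = cost; best_m = (int)m; }
    }
    return best_m;  // number of dimensions promoted to three-point grids
}
```

For a sharp dimension (large \(a_i\)) and a large requested sample size n, the sampling term dominates and promotion pays off; for small n, the 3^m build term keeps the envelope minimal.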

The optimizer selects the configuration that minimizes this total. Intuitively:

When GPU acceleration is available through OpenCL, grid construction can be performed in parallel, which reduces the time that component takes. To reflect the lower grid construction cost, the call to EnvelopeOpt() is modified to pass a scaled-up sample size (which yields the same optimization result).

  int core_CNT = get_opencl_core_count();
  
  if (verbose) {
    Rcpp::Rcout << "[INFO] OpenCL core count = " << core_CNT << "\n";
  }
  
  // Second row in G1b here is the posterior mode
  
  NumericVector gridindex(l1);
  
  if (Gridtype == 2) {
    // Temporarily scale n to encourage richer envelopes when GPU parallelism is available
    int scaled_n = std::max(1, n * core_CNT);
    
    if (verbose) {
      Rcpp::Rcout << "[INFO] Scaling n from " << n << " to " << scaled_n
                  << " for envelope optimization.\n";
    }
    
    gridindex = EnvelopeOpt(a_2, scaled_n);
  }

7. Building the Envelope

The implementation of EnvelopeBuild follows the envelope construction in Nygren & Nygren (2006) for models in standard form (see Sections 3–3.3 of the paper). The outline below describes the process when every dimension uses three-point tangents. If a dimension uses a one-point tangent, only the mid-point is used and its interval becomes the full real line.

1. Compute width parameters \(\omega_i\) from the diagonal precision matrix

Let \(\theta^{\ast}\) denote the unique posterior mode. For each dimension \(i\), define:

\[ \omega_{i} := \frac{\sqrt{2} - \exp\left(-1.20491 - 0.7321 \sqrt{0.5 - \frac{\partial^{2} \log f(\theta^{\ast} \mid y)}{\partial \theta_{i}^{2}}} \right)}{\sqrt{1 - \frac{\partial^{2} \log f(\theta^{\ast} \mid y)}{\partial \theta_{i}^{2}}}} \]

The widths \(\omega_i\) are derived from the local curvature of the log-likelihood at the posterior mode. This ensures that the three-interval construction per dimension yields an envelope whose efficiency does not deteriorate with sample size.

2. Construct intervals around the posterior mode \(\theta^\star\)

Set:

\[ \ell_{i,1} = \theta^{\ast}_{i} - 0.5\,\omega_{i}, \quad \ell_{i,2} = \theta^{\ast}_{i} + 0.5\,\omega_{i} \]

Then define three intervals per dimension:

\[ A_{i,1} = (-\infty,\ell_{i,1}), \quad A_{i,2} = [\ell_{i,1},\ell_{i,2}], \quad A_{i,3} = (\ell_{i,2},\infty) \]

Let \(J_{i} = \{1,2,3\}\) and define \(J = \prod_{i=1}^{p} J_{i}\), which has \(3^{p}\) elements. Each \(j \in J\) is a vector \((j_{1},\ldots,j_{p})\), and we define:

\[ A^{\ast}_{j} = \prod_{i=1}^{p} A_{i,j_{i}} \]

The collection \(A^{\ast} = \{A^{\ast}_{j} : j \in J\}\) forms a partition of \(\Theta\).

3. Select tangency points \(\theta^\star \pm \omega_i\)

For each \(j \in J\), define index sets:

\[ C_{j1} = \{i : j_{i} = 1\}, \quad C_{j2} = \{i : j_{i} = 2\}, \quad C_{j3} = \{i : j_{i} = 3\} \]

Tangency points \(\bar{\theta}_{j}\) are defined componentwise by:

\[ \bar{\theta}_{j,i} = \begin{cases} \theta^{\ast}_{i} - \omega_{i}, & i \in C_{j1} \\ \theta^{\ast}_{i}, & i \in C_{j2} \\ \theta^{\ast}_{i} + \omega_{i}, & i \in C_{j3} \end{cases} \]

These tangency points ensure the envelope touches the log-likelihood at representative points in each interval, guaranteeing dominance and tightness.

4. Build the full grid of tangency points

The Cartesian product of per-dimension partitions yields the \(3^p\) restricted densities described in the paper, ensuring coverage of the full parameter space.
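
Steps 2 through 4 can be sketched by enumerating the \(3^p\) tangency points directly, indexing each grid point by a base-3 digit vector that maps to the offsets \(-\omega_i\), \(0\), \(+\omega_i\):

```cpp
#include <vector>
#include <cmath>
#include <cassert>

// Enumerate the 3^p tangency points theta*_i + {-omega_i, 0, +omega_i}
// by treating each grid index as a p-digit base-3 number.
std::vector<std::vector<double>> tangency_grid(const std::vector<double>& mode,
                                               const std::vector<double>& omega) {
    std::size_t p = mode.size();
    std::size_t total = 1;
    for (std::size_t i = 0; i < p; ++i) total *= 3;  // 3^p grid points
    std::vector<std::vector<double>> grid(total, std::vector<double>(p));
    for (std::size_t g = 0; g < total; ++g) {
        std::size_t rem = g;
        for (std::size_t i = 0; i < p; ++i) {
            int digit = (int)(rem % 3);  // 0 -> -omega, 1 -> 0, 2 -> +omega
            rem /= 3;
            grid[g][i] = mode[i] + (digit - 1) * omega[i];
        }
    }
    return grid;
}
```

Each row of the result corresponds to one element \(j \in J\) and hence to one restricted density in the envelope mixture.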

5. Evaluate negative log-likelihood and gradients at each grid point

This step constructs the likelihood subgradient densities and facilitates accept–reject sampling. The subgradients \(c(\bar{\theta})\) are part of the definitions of likelihood subgradient densities (see Definition 2). Both subgradients and negative log-likelihoods (via \(h_{\bar{\theta}}(.)\)) are used during the accept–reject procedure.

6. Call Set_Grid_C2_pointwise to evaluate restricted multivariate normal log-densities

(Claim 2; Remark 5)
Each restricted density corresponds to a subset of the partition, normalized as in Remark 5.

7. Call setlogP_C2 to compute component log-probabilities and constants

(Remark 6)
The constants \(\tilde{a}\) and mixture weights \(\tilde{p}_i\) are computed explicitly as in Remark 6, ensuring proper normalization of the mixture envelope.

8. Normalize probabilities (PLSD) and optionally sort grid components

(Claim 2)
Normalization ensures the mixture of restricted densities forms a valid dominating density for the posterior. Sorting improves sampling efficiency.

8. Simulation Execution

The standardized sampler called through .rNormalGLM_std_cpp() uses the envelope to generate posterior samples via rejection sampling. This routine is called internally by .rNormalGLM_cpp(), which in turn is invoked by the user-facing function rNormal_reg(). Together, these routines implement envelope-based sampling for generalized linear models with log-concave likelihood functions and multivariate normal priors.

The envelope provides a mixture of restricted likelihood–subgradient densities, each defined over a region \(A_i\), with associated mixture weights \(\tilde{p}_i\) stored in PLSD. The sampling proceeds as follows:

  1. Draw a region index.
    A region index \(J(i)\) is drawn from the discrete distribution defined by PLSD.

  2. Draw a candidate from the restricted density.
    A candidate \(\theta_i\) is drawn from the restricted density \(q^{\bar{\theta}_{J(i)}}_{A_{J(i)}}\), using the normal CDF bounds loglt and logrt, and subgradient vector cbars. Simulation for each dimension uses the internal C++ function ctrnorm_cpp(), which explicitly uses these inputs.

  3. Compute the log-likelihood at the candidate.
    The log-likelihood \(\log f(y \mid \theta_i)\) is computed and stored in testll[0] using the appropriate likelihood function f2.

The acceptance test is performed using the inequality \[ \log(U_2) \le \mathrm{LLconst}[J(i)] + \mathrm{cbars}[J(i), ]^{T} \theta_i + \log f(y \mid \theta_i), \] which is equivalent to \[ \log(U_2) \le \log f(y \mid \theta_i) - \left( \log f(y \mid \bar{\theta}_{J(i)}) - c(\bar{\theta}_{J(i)})^{T}(\theta_i - \bar{\theta}_{J(i)}) \right). \]

Where:

  • LLconst[J(i)]: the precomputed quantity \[ -\log f(y \mid \bar{\theta}_{J(i)}) - c(\bar{\theta}_{J(i)})^{T} \bar{\theta}_{J(i)}, \] computed during envelope construction via setlogP_C2().
  • cbars[J(i), ]: the precomputed subgradient vector \(c(\bar{\theta}_{J(i)})\), extracted via cbars(J(i), _). It defines the exponential tilt direction used to evaluate the envelope.
  • testll[0]: the log-likelihood at the candidate draw \(\theta_i\), evaluated using the model specified by family and link.
  • \(\log(U_2)\): the acceptance threshold, where \(U_2 \sim \mathrm{Unif}(0,1)\) is a uniform draw.

The right-hand side of this inequality is always non-positive, and equals zero when \(\theta_i = \bar{\theta}_{J(i)}\). This reflects the fact that the envelope is tangent to the log-likelihood at each \(\bar{\theta}_j\), and lies above it elsewhere.

This procedure guarantees that accepted samples are drawn from the posterior \(\pi(\theta \mid y)\). The envelope ensures bounded rejection probability, and the mixture structure allows efficient sampling across regions. The output out contains accepted draws, and draws records the number of attempts per sample.
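
The tangency property noted above can be checked numerically: with \(c(\bar{\theta}) = -\nabla \log f(\bar{\theta})\), the right-hand side of the acceptance test is zero at \(\theta = \bar{\theta}\) and strictly negative elsewhere for a strictly concave log-likelihood. The sketch below verifies this for a hypothetical Gaussian log-likelihood:

```cpp
#include <cmath>
#include <cassert>

// Hypothetical concave log-likelihood: log f(theta) = -(theta - 1)^2 / 2.
double logf(double theta) { return -0.5 * (theta - 1.0) * (theta - 1.0); }

// Subgradient of the *negative* log-likelihood at the tangency point.
double c_bar(double theta_bar) { return theta_bar - 1.0; }  // = -d/dtheta logf

// Right-hand side of the acceptance test:
//   log f(theta) - ( log f(theta_bar) - c(theta_bar) * (theta - theta_bar) ).
// Zero at theta == theta_bar, strictly negative elsewhere: the envelope
// is tangent to the log-likelihood from above.
double accept_rhs(double theta, double theta_bar) {
    return logf(theta)
         - (logf(theta_bar) - c_bar(theta_bar) * (theta - theta_bar));
}
```

A candidate \(\theta_i\) is accepted when \(\log(U_2)\) falls below this quantity, so acceptance is certain at the tangency point and decays with distance from it.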

The components returned by EnvelopeBuild() are used in specific steps of the sampling procedure as follows:

9. Post-Processing and Output

This section documents the exact reverse transformations applied to sampled draws and the structure of the object returned by .rNormalGLM_cpp.

9.1 Reverse-transformation pipeline
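
A minimal sketch of the reverse transformation, assuming the standardization is undone by a stored matrix (called L_inv here; the name is illustrative, as the package composes the inverse transforms produced by glmb_Standardize_Model()) followed by adding back the prior mean mu:

```cpp
#include <vector>
#include <cmath>
#include <cassert>
#include <cstddef>

// Illustrative reverse transformation for one standardized draw:
//   theta = L_inv %*% theta_std + mu
// L_inv stands in for the stored inverse-standardization matrix; the
// actual field names in the package output may differ.
std::vector<double> untransform(const std::vector<std::vector<double>>& L_inv,
                                const std::vector<double>& theta_std,
                                const std::vector<double>& mu) {
    std::size_t p = mu.size();
    std::vector<double> theta(p);
    for (std::size_t i = 0; i < p; ++i) {
        theta[i] = mu[i];                        // add back the prior mean
        for (std::size_t j = 0; j < p; ++j)
            theta[i] += L_inv[i][j] * theta_std[j];
    }
    return theta;
}
```

Applying this to every accepted draw returns the sample to the original coefficient scale expected by the higher-level functions.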

9.2 Returned object (fields you can rely on)

References

Nygren, K. N., and L. M. Nygren. 2006. “Likelihood Subgradient Densities.” Journal of the American Statistical Association 101 (475): 1144–56. https://doi.org/10.1198/016214506000000357.
Nygren, Kjell. 2025. Chapter A08: Overview of Envelope Related Functions. Vignette in the glmbayes R package.