The glmbayes package uses parallel
sampling for many envelope-based samplers because generating
iid draws can be computationally expensive and acceptance rates may be
low. By default, use_parallel = TRUE, so large fits often
run on multiple cores via RcppParallel.
However, RcppParallel-based sampling cannot be interrupted. Once the main parallel loop starts, the R session is blocked until it completes. Users may experience the sampler as “freezing” during long runs. To address this, the package runs pilots that estimate how long sampling will take and, in interactive sessions, asks users to opt in or out before the non-interruptible phase begins.
This chapter describes the parallel sampling implementation, the
pilot logic, and the interactive safeguards. Parallel envelope
construction (e.g., in EnvelopeDispersionBuild)
no longer uses a pilot of its own and is noted briefly. GPU acceleration via
OpenCL is covered in Chapter A10.
Parallel sampling is used in two main C++ paths:
| Entry point | When | Notes |
|---|---|---|
| rNormalGLM_cpp | Non-Gaussian GLMs (Poisson, binomial, Gamma, etc.) with use_parallel = TRUE and n > 1 | Called by rNormal_reg |
| rIndepNormalGammaReg_cpp | Gaussian with independent Normal–Gamma prior, use_parallel = TRUE and n > 1 | Uses rIndepNormalGammaReg_std_parallel_cpp |

Serial fallback: When
use_parallel = FALSE or n == 1, the serial
path is used instead. The serial path includes
Rcpp::checkUserInterrupt() in its loop, so it can be
interrupted (e.g., with Ctrl+C).
RcppParallel uses Intel TBB (Threading Building Blocks) to distribute
work across multiple threads. The R main thread is blocked while
RcppParallel::parallelFor() runs. R’s
R_CheckUserInterrupt() is not safe to call from worker
threads, and there is no supported way to cancel a running TBB task from
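R. As a result, a parallel run cannot be interrupted once it starts, and the R session stays blocked until it completes.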
The pilot phase mitigates this by estimating runtime before the full parallel run and giving users a chance to decline.
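The pilot flow consists of:
- A serial pilot: a single draw generated in serial mode to measure per-draw time and detect problems early.
- A calibration batch: a small parallel run (about 1% of n, capped so it takes at most ~5 minutes) to refine the time estimate.
- A time estimate: the calibration timing extrapolated to the full n.
- An interactive prompt, shown only when the estimate exceeds the threshold.

Threshold: 300 seconds (5 minutes). Above this, the user is asked whether to continue.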
Prompt:
"Estimated simulation exceeds 5 minutes. Continue? [y/N]: "
(or similar).
Responses: y, yes,
1, or continue → proceed; n,
no, 2, or empty → stop with an informative
error.
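
The response handling above can be sketched in plain C++. This is a minimal illustration, not the package's code: the names PilotDecision and parse_pilot_reply are hypothetical, and the whitespace trimming, case-insensitive matching, and treatment of unrecognised input as "stop" are assumptions that the text does not confirm.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical type and helper (not part of glmbayes): map a raw console
// reply to a continue/stop decision using the responses listed above.
enum class PilotDecision { Continue, Stop };

inline PilotDecision parse_pilot_reply(std::string reply) {
    // Trim leading and trailing whitespace (assumption: a trailing newline
    // from the console should not change the answer).
    auto not_space = [](unsigned char c) { return !std::isspace(c); };
    reply.erase(reply.begin(),
                std::find_if(reply.begin(), reply.end(), not_space));
    reply.erase(std::find_if(reply.rbegin(), reply.rend(), not_space).base(),
                reply.end());

    // Lower-case the reply (assumption: matching is case-insensitive).
    std::transform(reply.begin(), reply.end(), reply.begin(),
                   [](unsigned char c) {
                       return static_cast<char>(std::tolower(c));
                   });

    if (reply == "y" || reply == "yes" || reply == "1" || reply == "continue") {
        return PilotDecision::Continue;
    }
    // "n", "no", "2", and empty input stop the run. Anything unrecognised
    // also stops here, matching the [y/N] default (an assumption).
    return PilotDecision::Stop;
}
```

Defaulting to Stop keeps the safe choice on the side of not entering the non-interruptible phase, which is what the capital N in [y/N] suggests.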
For non-Gaussian GLMs (Poisson, binomial, Gamma, etc.), the
rNormalGLM C++ path uses
run_rcppparallel_pilot():
A single draw is generated in serial mode to measure per-draw time and to detect problems early.
If the pilot hits an internal max_draws cap with
zero accepted draws, the envelope may be
insufficiently tight, and the code issues a warning.

Recommended actions include:
- Set use_opencl = TRUE or increase the requested number of draws (which affects n_envopt and may tighten the envelope).
- Try a different Gridtype to force a tighter envelope.
- Strengthen the prior.

An interactive prompt then asks:
"Enter 1 to continue full run, 2 to stop and return partial results: ".
If the user chooses 2, partial test results are returned and the full
run is aborted.
The calibration batch size is:
\[ m_{\text{stage}} = \min\left( \lceil 0.01 \cdot n \rceil,\; \lfloor 300000 / \text{est\_per\_draw\_ms} \rfloor \right), \]
where 300000 ms \(\approx\) 5
minutes. The calibration run uses
parallelFor(0, m_stage, worker) to measure per-draw time
empirically.
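
As a rough sketch of the batch-size formula above, the helper below computes m_stage from n and the serial per-draw estimate. The function name calibration_batch_size is hypothetical, and the guard for a non-positive timing (for example, a draw faster than the timer resolution) is an assumption, not documented package behaviour.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper (not part of glmbayes):
//   m_stage = min(ceil(0.01 * n), floor(300000 / est_per_draw_ms))
inline std::int64_t calibration_batch_size(std::int64_t n,
                                           double est_per_draw_ms) {
    const double budget_ms = 300000.0;  // 5 minutes, as in the text

    // ceil(0.01 * n) in integer arithmetic to avoid floating-point rounding.
    const std::int64_t one_percent = (n + 99) / 100;

    // Assumption: if the serial pilot timing is zero or invalid, the time
    // cap cannot bind, so fall back to the 1% rule.
    if (!(est_per_draw_ms > 0.0)) {
        return one_percent;
    }

    const double time_cap = std::floor(budget_ms / est_per_draw_ms);

    // Compare in double first so a huge cap is never converted to an integer.
    if (time_cap >= static_cast<double>(one_percent)) {
        return one_percent;
    }
    // Note: this is 0 when a single draw is estimated to exceed the budget;
    // the text does not say how the package treats that case.
    return static_cast<std::int64_t>(time_cap);
}
```

For n = 100000 and an estimate of 2 ms per draw, the 1% rule binds and the batch is 1000 draws. With 1000 ms per draw, the time cap binds and the batch shrinks to 300 draws.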
From the calibration:
- per_candidate_sec = calibration elapsed time / total candidates used
- est_per_draw_sec = per_candidate_sec × E_draws (expected candidates per accepted draw)
- est_total_sec = est_per_draw_sec × n

If est_total_sec > 300, the user is prompted:
"Do you want to continue? [y/N]: "

With verbose = TRUE, the pilot reports:
- the estimate for n draws
- avg_candidates_per_draw (empirical) vs E_draws
- per_candidate_sec and est_per_draw_sec

For Gaussian regression with independent Normal–Gamma priors, the pilot logic is similar in spirit but tailored to joint \((\beta, \phi)\) sampling:
One observation is generated in serial mode to measure per-observation time.
\[ m_{\text{stage}} = \min\left( \lceil 0.01 \cdot n \rceil,\; \lfloor 300000 / \text{per\_obs\_ms\_serial} \rfloor \right). \]
The calibration run is parallel:
parallelFor(0, m_stage, worker).
- per_obs_sec = calibration elapsed time / m_stage
- est_total_sec = per_obs_sec × n

Same as rNormalGLM: if est_total_sec > 300, prompt in
interactive sessions.
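
The two extrapolations described above (candidate-based for the rNormalGLM path, per-observation for the Gaussian path) reduce to a few multiplications followed by a threshold check. The sketch below is illustrative only: all function names are hypothetical, and the interactive flag stands in for whatever session check the package performs.

```cpp
#include <cstdint>

// Threshold from the text: prompt when the estimate exceeds 300 seconds.
constexpr double kPromptThresholdSec = 300.0;

// Hypothetical helper for the rNormalGLM path: the calibration measures time
// per candidate, which is scaled by the expected candidates per accepted
// draw (E_draws) and then by the requested number of draws n.
inline double estimate_total_sec_glm(double calib_elapsed_sec,
                                     std::int64_t candidates_used,
                                     double e_draws,
                                     std::int64_t n) {
    const double per_candidate_sec = calib_elapsed_sec / candidates_used;
    const double est_per_draw_sec = per_candidate_sec * e_draws;
    return est_per_draw_sec * n;
}

// Hypothetical helper for the independent Normal-Gamma path: the calibration
// measures time per observation directly over m_stage draws.
inline double estimate_total_sec_gaussian(double calib_elapsed_sec,
                                          std::int64_t m_stage,
                                          std::int64_t n) {
    const double per_obs_sec = calib_elapsed_sec / m_stage;
    return per_obs_sec * n;
}

// Prompt only in interactive sessions and only above the threshold;
// non-interactive runs proceed without asking.
inline bool should_prompt(double est_total_sec, bool interactive) {
    return interactive && est_total_sec > kPromptThresholdSec;
}
```

For example, a calibration that spends 2 seconds on 4000 candidates with E_draws = 4 and n = 100000 gives an estimate of about 200 seconds, which is below the threshold, so no prompt would be shown.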
The RSS and UB2 steps in EnvelopeDispersionBuild use
closed-form bounds (bound_rss_over_dispersion and
bound_ub2_over_dispersion) and no longer run RSS/UB2
minimization or pilot timing blocks.
Because those steps are disabled, the time-estimate safeguard is not used during envelope construction.
| Scenario | Recommendation |
|---|---|
| Debugging or small runs | use_parallel = FALSE – serial path is interruptible |
| Large production runs | use_parallel = TRUE – faster; rely on the pilot and prompt |
| Scripts / non-interactive | No prompt; runs proceed automatically |
To disable parallel sampling:

    glmb(formula, data, family, pfamily, n = 5000, use_parallel = FALSE)
    rglmb(n = 5000, y, x, family, pfamily, use_parallel = FALSE)

Serial sampling is slower but can be interrupted. Parallel sampling is faster but non-interruptible once the main loop starts.
The parallel samplers use the standard RcppParallel
Worker pattern:
- The worker struct inherits from RcppParallel::Worker.
- operator()(size_t begin, size_t end) processes the range [begin, end).
- Workers access data through RcppParallel::RVector or RcppParallel::RMatrix for thread-safe writes.
- Functions such as f2 and f3 are called from workers; a mutex protects R callback invocations where needed.
- Work is dispatched through the parallelFor interface.

| Symptom | Likely cause | Suggestion |
|---|---|---|
| Sampler appears to hang | Long parallel run | Use verbose = TRUE to see pilot estimates before the main run |
| Zero accepts in pilot | Envelope too loose | Try use_opencl = TRUE, larger n, different Gridtype, or stronger prior |
| Prompt never appears | Non-interactive session | Expected; runs proceed automatically in batch/CI |
| Want to interrupt a run | Parallel phase has started | Cannot interrupt; use use_parallel = FALSE for interruptible runs |

?glmb, ?rglmb, ?lmb, ?rlmb – use_parallel argument