---
title: "MCMC-Diagnostics"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{MCMC-Diagnostics}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---


``` r
library(bvarnet)
library(bayesplot)
#> This is bayesplot version 1.15.0
#> - Online documentation and vignettes at mc-stan.org/bayesplot
#> - bayesplot theme set to bayesplot::theme_default()
#>    * Does _not_ affect other ggplot2 plots
#>    * See ?bayesplot_theme_set for details on theme setting
```

## Introduction

`bvarnet` uses the No-U-Turn Sampler (NUTS), a variant of Hamiltonian Monte Carlo (HMC), via CmdStan. Unlike simple Metropolis-Hastings, NUTS adapts the step-size and trajectory length automatically during a warmup phase, making it very efficient for the high-dimensional, correlated posteriors typical of multilevel VAR models. Even so, every MCMC analysis should pass a set of standard diagnostic checks before results are interpreted.

This vignette covers:

1. Fitting a model with sensible defaults
2. Convergence diagnostics: R-hat and Effective Sample Size
3. Trace plots for visual inspection
4. Sampler-level warnings: divergences and exceeded tree depth
5. Tuning the sampler via `iter`, `warmup`, `adapt_delta`, and
   `max_treedepth`

For a deeper treatment of HMC/NUTS theory and every diagnostic described here, see the [Stan Reference Manual — MCMC Sampling](https://mc-stan.org/docs/reference-manual/mcmc.html) and the community [Stan Wiki](https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations).

---

## Fitting a Reference Model

We reuse the five-variable ordinal model from `vignette("bvarnet")`. The same `studentlife` data are used throughout.


``` r
data(studentlife)

priors <- set_priors(
  phi   = prior(family = "normal", loc = 0, scale = 0.5),
  kappa = prior(family = "normal", loc = 0, scale = 1)
)
```


``` r
fit <- bvar(
  id_col   = "id",
  time_col = "day",
  y_cols   = c("anxious", "calm", "conventional", "critical", "dependable"),
  K        = 1,
  data     = studentlife,
  family   = "ordinal",
  priors   = priors,
  iter     = 4000,
  warmup   = 1000,
  chains   = 4,
  cores    = 4,
  seed     = 1337
)
```


``` r
print(fit)
#> BVAR Network fit
#> ======================================== 
#> Family:      ordinal
#> Outcomes (p): 5 
#> Lags (K):     1 
#> Fixed eff.:   0 
#> Observations: 147 
#> Rhat max:    1.001
#> Divergences: 1  WARNING: check model/priors.
#> Priors:
#>   beta   ~ Normal(0, 1)  (default)
#>   phi    ~ Normal(0, 0.5)
#>   kappa  ~ Normal(0, 1)
#> Total time:  20.2 sec
#> ========================================
```

---

## R-hat: Between-Chain Convergence

**R-hat** ($\hat{R}$) measures whether independent chains have converged to the same distribution by comparing within-chain to between-chain variance. A value of $\hat{R} \approx 1$ indicates that all chains are exploring the same region of parameter space.

**Rule of thumb:** $\hat{R} \leq 1.01$ is the modern criterion. Values above 1.01 suggest the chains have *not* converged and longer sampling is needed.

The convergence table is stored in `fit$convergence` as a data frame with one row per parameter, alongside bulk-ESS and tail-ESS.


``` r
head(fit$convergence)
#>   variable      rhat ess_bulk  ess_tail
#> 1     lp__ 1.0006264  5807.04  8763.894
#> 2 phi[1,1] 1.0002434 19633.01 11959.170
#> 3 phi[2,1] 0.9999424 16400.52 11459.448
#> 4 phi[3,1] 0.9999472 19919.94 12271.959
#> 5 phi[4,1] 1.0001548 29249.05 11530.322
#> 6 phi[5,1] 1.0000761 18623.57 12379.016
```

A quick summary of the R-hat distribution across all parameters:


``` r
rhat_vals <- fit$convergence$rhat
summary(rhat_vals)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>  0.9999  1.0000  1.0001  1.0001  1.0002  1.0007
```

---

## Effective Sample Size (ESS)

Even though NUTS chains produce correlated samples, the **Effective Sample Size (ESS)** measures how many *independent* draws the correlated sequence is worth. `bvarnet` stores two variants:

| Statistic | Targets | Minimum recommended |
|-----------|---------|---------------------|
| `ess_bulk` | Central tendency (mean, median) | $\geq 400$ |
| `ess_tail` | Tail quantiles, credible interval bounds | $\geq 400$ |


``` r
ess_bulk <- fit$convergence$ess_bulk
ess_tail <- fit$convergence$ess_tail

cat("Bulk ESS — min:", round(min(ess_bulk, na.rm = TRUE)),
    "  median:", round(median(ess_bulk, na.rm = TRUE)), "\n")
#> Bulk ESS — min: 5807   median: 18210
cat("Tail ESS — min:", round(min(ess_tail, na.rm = TRUE)),
    "  median:", round(median(ess_tail, na.rm = TRUE)), "\n")
#> Tail ESS — min: 8764   median: 12304
cat("Parameters with bulk ESS < 400:", sum(ess_bulk < 400, na.rm = TRUE), "\n")
#> Parameters with bulk ESS < 400: 0
```

---

## Trace Plots

A **trace plot** shows the sampled value of a parameter at each iteration for every chain. Healthy chains look like a fuzzy caterpillar, they mix rapidly and all chains overlap.  Trends, spikes, or chains that stay separated indicate sampling problems.

To investigate the trace plots, we can use the `mcmc_trace()` from the `bayesplot` package.


``` r
phi_pars <- grep("^phi", dimnames(fit$draws)[[3]], value = TRUE)[1:6] # select some lag-coefficient parameters

mcmc_trace(fit$draws, pars = phi_pars) +
  ggplot2::labs(title = "Trace plots — first six lag coefficients (phi)")
```

![plot of chunk trace-phi](figure/trace-phi-1.png)

Complementary **density overlay** plots show whether chain-specific posteriors are consistent:


``` r
mcmc_dens_overlay(fit$draws, pars = phi_pars) +
  ggplot2::labs(title = "Per-chain posterior densities — phi")
```

![plot of chunk density-phi](figure/density-phi-1.png)

We can do this for any sampled parameter, for example the threshold parameter kappa:


``` r
kappa_pars <- grep("^kappa", dimnames(fit$draws)[[3]], value = TRUE)[1:4]

mcmc_trace(fit$draws, pars = kappa_pars) +
  ggplot2::labs(title = "Trace plots — first four cutpoints (kappa)")
```

![plot of chunk trace-kappa](figure/trace-kappa-1.png)

---

## Sampler Diagnostics: Divergences and Exceeded Tree Depth

Stan's NUTS sampler reports two additional per-chain diagnostics stored in `fit$diagnostics`:


``` r
fit$diagnostics
#>   num_divergent num_max_treedepth     ebfmi
#> 1             0                 0 0.8941142
#> 2             0                 0 0.9862093
#> 3             0                 0 0.9318894
#> 4             1                 0 0.9390785
```

### Divergences

A **divergence** occurs when the numerical integrator of HMC deviates from the true Hamiltonian trajectory. Even a small number of divergences can indicate that the sampler is missing regions of the posterior, leading to **biased estimates**. They should *not* be ignored!

Common causes and remedies:

- **Funnel-shaped posterior** (common in hierarchical models)
- **Step size too large**: increase `adapt_delta` towards 1 (see below).
- **Model misspecification**: revisit the prior scale or model structure.

### Exceeded Tree Depth

To draw each new sample, NUTS simulates a trajectory through parameter space, taking small steps and deciding when to turn around. The `max_treedepth` argument (default 10) sets a hard limit on how long that trajectory can be. When this warning appears, it means the sampler wanted to keep exploring but was cut off — it hit the limit before it was ready to stop. Unlike divergences, this does **not** mean the results are biased. It is a **slowness warning**: the sampler is doing extra work to produce each draw, so your effective sample size per hour of compute time is lower than it could be. Increasing `max_treedepth` gives the sampler more room to move, which typically resolves the warning.

---

## Tuning the Sampler

### Chain Length: `iter` and `warmup`

`bvar()` exposes two length arguments:

| Argument | Default | Role |
|----------|---------|------|
| `warmup` | 1000    | Iterations used for adaptation (step-size, mass matrix). **Discarded** from inference. |
| `iter`   | 4000    | Post-warmup (sampling) iterations *per chain*. Used for inference. |

The total number of draws available for inference is `iter × chains` (16 000 with the defaults above).  Increasing `iter` directly improves ESS at a proportional computational cost.  Increasing `warmup` can help when the sampler struggles to find the typical set (visible as a long burn-in in trace plots).

### `adapt_delta`: Controlling Divergences

`adapt_delta` (default 0.80) is the **target acceptance rate** that Stan's dual-averaging algorithm uses when adapting the step size during warmup. A higher target forces a smaller step size, which reduces discretisation error and typically eliminates divergences.

**Trade-off:** smaller step sizes mean shorter trajectories per unit time, so ESS per second decreases.  Increase `adapt_delta` only when you observe divergences; do not set it near 1 unless needed.

Typical values:

| Situation | `adapt_delta` |
|-----------|--------------|
| Default / no divergences | 0.80 (CmdStan default) |
| Some divergences | 0.90–0.95 |
| Persistent divergences in hierarchical models | 0.97–0.99 |

For full details see the [Stan reference on `adapt_delta`](https://mc-stan.org/docs/reference-manual/mcmc.html#adaptation.section).

### `max_treedepth`: Controlling Trajectory Length

`max_treedepth` (default 10) caps the doubling steps NUTS may take. The maximum trajectory length is $2^{\text{max\_treedepth}}$ leapfrog steps.

Increase this when `fit$diagnostics` shows a non-trivial number of iterations hitting the tree-depth limit.  Each additional level doubles compute time per affected iteration, so increases of 1–2 beyond the default are usually sufficient.

For full details see the [Stan reference on `max_treedepth`](https://mc-stan.org/docs/reference-manual/mcmc.html#nuts-configuration).

---

## Quick Checklist

Before reporting results from a `bvarnet` model:

1. **R-hat**: all parameters $\leq 1.01$.
2. **ESS**: `ess_bulk` and `ess_tail` $\geq 400$ for all parameters of
   interest.
3. **Trace plots**: caterpillar-like mixing, no trends or stuck chains.
4. **Divergences**: zero, or investigate with higher `adapt_delta`.
5. **Tree-depth warnings**: none, or increase `max_treedepth`.