---
title: "Getting started with bayesqm"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 2
bibliography: REFERENCES.bib
link-citations: true
vignette: >
  %\VignetteIndexEntry{Getting started with bayesqm}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment  = "#>",
  fig.width  = 7,
  fig.height = 4.5,
  dpi = 110
)
```

This vignette walks through one Q study end to end in **bayesqm**:
import, fit, diagnose, interpret, select K, and export. The running
example is a synthetic 33-participant, 42-statement Q-sort.

```{r, message = FALSE, warning = FALSE}
library(bayesqm)
library(ggplot2)
```

## Setup

bayesqm performs inference with Stan, so you need a working Stan
backend before the modelling functions will run. The package
supports cmdstanr (recommended) and rstan.

cmdstanr is distributed through the Stan R-universe rather than
CRAN, and installing the R package does not install CmdStan itself;
that is a separate step. Run the following once (not evaluated
here):

```{r setup-cmdstan, eval = FALSE}
install.packages("cmdstanr", repos = c("https://stan-dev.r-universe.dev", getOption("repos")))
cmdstanr::check_cmdstan_toolchain(fix = TRUE)
cmdstanr::install_cmdstan()
cmdstanr::cmdstan_version()   # should print a version
```

On Windows you additionally need Rtools matching your R version; on
macOS, run `xcode-select --install`. If you prefer rstan, install
it from CRAN with a C++ toolchain; bayesqm uses whichever backend
is available.

The remainder of this vignette uses small precomputed demonstration
objects (`demo_fit()`, `demo_run()`) so it renders without a Stan
backend. To reproduce the examples on real data you will need the
backend installed as above.

## The Q-sort data

A Q study asks each participant to rank-order a set of statements
into a forced distribution. `bayesqm` represents that as a `J × N`
numeric matrix (statements as rows, participants as columns) plus a
vector of forced-distribution counts. `read_qsort()` auto-detects
the file format:

```{r, eval = FALSE}
qdata <- read_qsort("mystudy.csv")   # CSV / Excel / HTMLQ / FlashQ
qdata <- read_qsort("mystudy.DAT")   # PQMethod
qdata <- read_qsort("mystudy.zip")   # KADE project
```

The example dataset:

```{r}
sim   <- generate_data(N = 33, J = 42, K = 3, seed = 1)
qdata <- qsort_data(sim$Y, distribution = sim$distribution)
qdata
```

## Fitting the model

`fit_bayesian()` samples the posterior of a low-rank factor model
with a Student-t likelihood (by default) and a hierarchical normal
prior on the loadings, then resolves rotational ambiguity with the
MatchAlign post-processing of @PoworoznekEtAl2025:

```{r, eval = FALSE}
fit <- fit_bayesian(qdata, K = 3, chains = 4, iter = 2000,
                    warmup = 1000, seed = 1)
```

```{r}
fit <- demo_fit(N = 33, J = 42, K = 3, seed = 1)
fit
```

`summary(fit)` adds factor characteristics, the PSIS-LOO ELPD (when
available), a divergence summary, and the MatchAlign
diagnostic.

## Diagnostics

Convergence statistics are stored on `fit$diagnostics`. The
recommended thresholds are R-hat below 1.01, bulk and tail effective
sample sizes above 400, and zero divergent transitions
[@VehtariEtAl2021]:

```{r}
fit$diagnostics
```

`plot_tucker()` visualises the per-draw Tucker phi between each
aligned factor column and the MatchAlign pivot. Values near 1
indicate a stable alignment; bimodality signals residual
label-switching.

```{r tucker, fig.cap = "MatchAlign alignment quality by factor."}
plot_tucker(fit)
```

## Factor loadings

Each participant has a posterior distribution over their loading on
every factor. `plot_loading_posterior()` draws the loading forest,
with nested 50% and 95% credible-interval whiskers and the classical
@Brown1980 cut-off as a faint reference. Filled points are
participants with `P(factor k dominant) > 0.5`.

```{r loadings, fig.cap = "Loadings with nested 50% and 95% credible intervals."}
plot_loading_posterior(fit)
```

The same information as a tidy data frame:

```{r}
head(compute_loadings(fit$Lambda_draws, prob = 0.95))
```

## Factor z-scores

`plot()` produces a cross-panel z-score dotchart; rows share an
order (by factor 1's posterior mean, configurable via `sort_by`) so
factors can be compared horizontally.

```{r zscores, fig.cap = "Posterior z-scores across factors with 50% and 95% credible intervals."}
plot(fit)
```

For a single statement across factors, `plot_zscore_posterior()`
shows the per-factor view:

```{r zscore-one, fig.cap = "Posterior z-score for a single statement across factors."}
plot_zscore_posterior(fit, statement = 1)
```

## Choosing K

`run_bayes()` fits the model for each K in a range and applies the
**peak-plus-Sivula** protocol: the ELPD peak is the automated choice
and the Sivula parsimony rule [@SivulaEtAl2025] is reported alongside
as a diagnostic.

On real data:

```{r, eval = FALSE}
run <- run_bayes(qdata, K_max = 4, seed = 1,
                 chains = 2, iter = 1500, warmup = 800)
```

```{r}
run <- demo_run(K_max = 5, k_peak = 3, k_sivula = 1, case = "gap")
run
```

```{r elpd, fig.width = 7, fig.height = 4.2, fig.cap = "ΔELPD across K with the Sivula non-promotion band at |ΔELPD| < 4. Triangle marks the Sivula K, square marks the ELPD peak (the adopted K)."}
make_elpd_diff(run)
```

The ELPD peak is always the adopted K. The Sivula diagnostic is
reported alongside as a parsimony check. The `run$case` field labels
the relationship between the two: when Sivula lands on the same K as
the peak, the case is `agree`; when Sivula points to a smaller K
than the peak, the case is `gap`; when Sivula exceeds the peak, the
case is `reversed` (rare in practice).

## Distinguishing, consensus, and membership

`bayesqm` replaces the classical z-score test with the posterior of an
explicit viewpoint-divergence measure. For each statement, D_j is the
mean absolute pairwise difference of the K viewpoint scores; the
package reports P(D_j > delta | Y) and P(D_j < delta | Y), the
probabilities that the viewpoints diverge meaningfully or practically
agree. By default the separation delta is the reliability-adjusted
critical difference from classical Q analysis [@Brown1980], computed
from the posterior dominant-factor counts (`critical_delta()`);
`suggest_delta()` (one forced-distribution category on the standardised
scale) is an alternative, and results are reported with sensitivity
across the choice of delta. No fixed probability cutoff defines a
statement.

The full distinguishing/consensus table is stored on `fit$qdc` (per
viewpoint: grid position and z-score with 95% CrI, then `D_j` with
95% CrI, `pi_D`, `pi_C`). `critical_delta()` returns the separation
the fit used:

```{r}
critical_delta(fit$Lambda_draws)
head(fit$qdc)
```

```{r dist-cons, fig.cap = "Posterior viewpoint divergence D_j with 95% credible intervals; statements ordered by P(D_j > delta | Y)."}
plot_dist_cons(fit)
```

Participant-level membership is also probabilistic.
`classify_membership()` turns posterior dominance into a
`Strong / Moderate / Weak` tier:

```{r}
head(classify_membership(fit$Lambda_draws))
```

```{r membership, fig.width = 6.5, fig.height = 5.5, fig.cap = "Blue tiles: posterior probability that each factor is dominant for a given participant (values printed in each cell). The right column shows the overall assignment verdict on an orange-red scale."}
make_dominant_panel(fit)
```

## Posterior predictive check

`plot_ppc()` shows the posterior distribution of the RMSE between
`cor(Y_rep)` and `cor(Y_obs)`. A well-specified model puts most
posterior mass at small RMSE.

```{r ppc, fig.cap = "PPC on the by-person correlation matrix."}
plot_ppc(fit)
```

## Hyperparameters

```{r hyper, fig.cap = "Posterior densities of nu, sigma, and tau."}
plot_hyper(fit)
```

## Reporting and exporting

`save_bayesqm_plot()` writes any plot to PDF, SVG, PNG, TIFF, or
JPEG at the chosen dimensions:

```{r, eval = FALSE}
save_bayesqm_plot("fig_loadings.pdf", plot_loading_posterior(fit))
save_bayesqm_plot("fig_elpd.pdf",      plot_elpd(run),
                  width = 3.5, height = 3)
```

`caption_bayesqm()` returns a figure caption with model
configuration, sample sizes, interval probability, and convergence
diagnostics:

```{r}
caption_bayesqm(fit)
```

Standard R accessors work on the fit (`coef`, `fitted`, `residuals`,
`sigma`, `family`, `as.matrix`), and `as.matrix(fit)` returns a
Stan-style draws matrix that `posterior` and `bayesplot` consume
natively.

## Theming

Every plot reads its palette through `bayesqm_colors()`. Switch the
scheme once and every subsequent base-R plot follows:

```{r, eval = FALSE}
bayesqm_set_colors("teal")   # "blue" (default), "red", "purple", "grey"
plot(fit)
bayesqm_set_colors("blue")
```

## Where next

- `?fit_bayesian`: every prior and sampler option
- `?run_bayes`: peak-plus-Sivula thresholds
- `?bayesqm-membership`: the full set of membership summaries
- `?rename_factors`: relabel `f1..fK` with substantive factor names
- `ggplot2::autoplot()`: ggplot2 / ggdist versions of every view
  above, when `ggplot2` and `ggdist` are installed

## References