--- title: "Getting started with bayesqm" output: rmarkdown::html_vignette: toc: true toc_depth: 2 bibliography: REFERENCES.bib link-citations: true vignette: > %\VignetteIndexEntry{Getting started with bayesqm} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 4.5, dpi = 110 ) ``` This vignette walks through one Q study end to end in **bayesqm**: import, fit, diagnose, interpret, select K, and export. The running example is a synthetic 33-participant, 42-statement Q-sort. ```{r, message = FALSE, warning = FALSE} library(bayesqm) library(ggplot2) ``` ## Setup bayesqm performs inference with Stan, so you need a working Stan backend before the modelling functions will run. The package supports cmdstanr (recommended) and rstan. cmdstanr is distributed through the Stan R-universe rather than CRAN, and installing the R package does not install CmdStan itself; that is a separate step. Run the following once (not evaluated here): ```{r setup-cmdstan, eval = FALSE} install.packages("cmdstanr", repos = c("https://stan-dev.r-universe.dev", getOption("repos"))) cmdstanr::check_cmdstan_toolchain(fix = TRUE) cmdstanr::install_cmdstan() cmdstanr::cmdstan_version() # should print a version ``` On Windows you additionally need Rtools matching your R version; on macOS, run `xcode-select --install`. If you prefer rstan, install it from CRAN with a C++ toolchain; bayesqm uses whichever backend is available. The remainder of this vignette uses small precomputed demonstration objects (`demo_fit()`, `demo_run()`) so it renders without a Stan backend. To reproduce the examples on real data you will need the backend installed as above. ## The Q-sort data A Q study asks each participant to rank-order a set of statements into a forced distribution. `bayesqm` represents that as a `J × N` numeric matrix (statements as rows, participants as columns) plus a vector of forced-distribution counts. `read_qsort()` auto-detects the file format: ```{r, eval = FALSE} qdata <- read_qsort("mystudy.csv") # CSV / Excel / HTMLQ / FlashQ qdata <- read_qsort("mystudy.DAT") # PQMethod qdata <- read_qsort("mystudy.zip") # KADE project ``` The example dataset: ```{r} sim <- generate_data(N = 33, J = 42, K = 3, seed = 1) qdata <- qsort_data(sim$Y, distribution = sim$distribution) qdata ``` ## Fitting the model `fit_bayesian()` samples the posterior of a low-rank factor model with a Student-t likelihood (by default) and a hierarchical normal prior on the loadings, then resolves rotational ambiguity with the MatchAlign post-processing of @PoworoznekEtAl2025: ```{r, eval = FALSE} fit <- fit_bayesian(qdata, K = 3, chains = 4, iter = 2000, warmup = 1000, seed = 1) ``` ```{r} fit <- demo_fit(N = 33, J = 42, K = 3, seed = 1) fit ``` `summary(fit)` adds factor characteristics, the PSIS-LOO ELPD (when available), a divergence summary, and the MatchAlign diagnostic. ## Diagnostics Convergence statistics are stored on `fit$diagnostics`. The recommended thresholds are R-hat below 1.01, bulk and tail effective sample sizes above 400, and zero divergent transitions [@VehtariEtAl2021]: ```{r} fit$diagnostics ``` `plot_tucker()` visualises the per-draw Tucker phi between each aligned factor column and the MatchAlign pivot. Values near 1 indicate a stable alignment; bimodality signals residual label-switching. ```{r tucker, fig.cap = "MatchAlign alignment quality by factor."} plot_tucker(fit) ``` ## Factor loadings Each participant has a posterior distribution over their loading on every factor. `plot_loading_posterior()` draws the loading forest, with nested 50% and 95% credible-interval whiskers and the classical @Brown1980 cut-off as a faint reference. Filled points are participants with `P(factor k dominant) > 0.5`. ```{r loadings, fig.cap = "Loadings with nested 50% and 95% credible intervals."} plot_loading_posterior(fit) ``` The same information as a tidy data frame: ```{r} head(compute_loadings(fit$Lambda_draws, prob = 0.95)) ``` ## Factor z-scores `plot()` produces a cross-panel z-score dotchart; rows share an order (by factor 1's posterior mean, configurable via `sort_by`) so factors can be compared horizontally. ```{r zscores, fig.cap = "Posterior z-scores across factors with 50% and 95% credible intervals."} plot(fit) ``` For a single statement across factors, `plot_zscore_posterior()` shows the per-factor view: ```{r zscore-one, fig.cap = "Posterior z-score for a single statement across factors."} plot_zscore_posterior(fit, statement = 1) ``` ## Choosing K `run_bayes()` fits the model for each K in a range and applies the **peak-plus-Sivula** protocol: the ELPD peak is the automated choice and the Sivula parsimony rule [@SivulaEtAl2025] is reported alongside as a diagnostic. On real data: ```{r, eval = FALSE} run <- run_bayes(qdata, K_max = 4, seed = 1, chains = 2, iter = 1500, warmup = 800) ``` ```{r} run <- demo_run(K_max = 5, k_peak = 3, k_sivula = 1, case = "gap") run ``` ```{r elpd, fig.width = 7, fig.height = 4.2, fig.cap = "ΔELPD across K with the Sivula non-promotion band at |ΔELPD| < 4. Triangle marks the Sivula K, square marks the ELPD peak (the adopted K)."} make_elpd_diff(run) ``` The ELPD peak is always the adopted K. The Sivula diagnostic is reported alongside as a parsimony check. The `run$case` field labels the relationship between the two: when Sivula lands on the same K as the peak, the case is `agree`; when Sivula points to a smaller K than the peak, the case is `gap`; when Sivula exceeds the peak, the case is `reversed` (rare in practice). ## Distinguishing, consensus, and membership `bayesqm` replaces the classical z-score test with the posterior of an explicit viewpoint-divergence measure. For each statement, D_j is the mean absolute pairwise difference of the K viewpoint scores; the package reports P(D_j > delta | Y) and P(D_j < delta | Y), the probabilities that the viewpoints diverge meaningfully or practically agree. By default the separation delta is the reliability-adjusted critical difference from classical Q analysis [@Brown1980], computed from the posterior dominant-factor counts (`critical_delta()`); `suggest_delta()` (one forced-distribution category on the standardised scale) is an alternative, and results are reported with sensitivity across the choice of delta. No fixed probability cutoff defines a statement. The full distinguishing/consensus table is stored on `fit$qdc` (per viewpoint: grid position and z-score with 95% CrI, then `D_j` with 95% CrI, `pi_D`, `pi_C`). `critical_delta()` returns the separation the fit used: ```{r} critical_delta(fit$Lambda_draws) head(fit$qdc) ``` ```{r dist-cons, fig.cap = "Posterior viewpoint divergence D_j with 95% credible intervals; statements ordered by P(D_j > delta | Y)."} plot_dist_cons(fit) ``` Participant-level membership is also probabilistic. `classify_membership()` turns posterior dominance into a `Strong / Moderate / Weak` tier: ```{r} head(classify_membership(fit$Lambda_draws)) ``` ```{r membership, fig.width = 6.5, fig.height = 5.5, fig.cap = "Blue tiles: posterior probability that each factor is dominant for a given participant (values printed in each cell). The right column shows the overall assignment verdict on an orange-red scale."} make_dominant_panel(fit) ``` ## Posterior predictive check `plot_ppc()` shows the posterior distribution of the RMSE between `cor(Y_rep)` and `cor(Y_obs)`. A well-specified model puts most posterior mass at small RMSE. ```{r ppc, fig.cap = "PPC on the by-person correlation matrix."} plot_ppc(fit) ``` ## Hyperparameters ```{r hyper, fig.cap = "Posterior densities of nu, sigma, and tau."} plot_hyper(fit) ``` ## Reporting and exporting `save_bayesqm_plot()` writes any plot to PDF, SVG, PNG, TIFF, or JPEG at the chosen dimensions: ```{r, eval = FALSE} save_bayesqm_plot("fig_loadings.pdf", plot_loading_posterior(fit)) save_bayesqm_plot("fig_elpd.pdf", plot_elpd(run), width = 3.5, height = 3) ``` `caption_bayesqm()` returns a figure caption with model configuration, sample sizes, interval probability, and convergence diagnostics: ```{r} caption_bayesqm(fit) ``` Standard R accessors work on the fit (`coef`, `fitted`, `residuals`, `sigma`, `family`, `as.matrix`), and `as.matrix(fit)` returns a Stan-style draws matrix that `posterior` and `bayesplot` consume natively. ## Theming Every plot reads its palette through `bayesqm_colors()`. Switch the scheme once and every subsequent base-R plot follows: ```{r, eval = FALSE} bayesqm_set_colors("teal") # "blue" (default), "red", "purple", "grey" plot(fit) bayesqm_set_colors("blue") ``` ## Where next - `?fit_bayesian`: every prior and sampler option - `?run_bayes`: peak-plus-Sivula thresholds - `?bayesqm-membership`: the full set of membership summaries - `?rename_factors`: relabel `f1..fK` with substantive factor names - `ggplot2::autoplot()`: ggplot2 / ggdist versions of every view above, when `ggplot2` and `ggdist` are installed ## References