---
title: "diagFDR: mokapot diagnostics (competed winners)"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{diagFDR: mokapot diagnostics (competed winners)}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 4.5
)
```

This vignette demonstrates how to compute **diagFDR** diagnostics from **mokapot** PSM exports. A key choice is to compute diagnostics on **competed winners**: one PSM per `run + spectrum_id`, selecting the maximum mokapot score among candidates (targets and decoys). This corresponds to the standard *target–decoy competition* setting for PSM-level inference.

The workflow is:

1. Read mokapot target and decoy PSM outputs.
2. Construct a competed-winner universe.
3. Run `dfdr_run_all()` and inspect tables/plots.
4. Optionally export results and a human-readable report.

## Runnable toy example (no external files required)

We start with a small simulated dataset that resembles a competed-winner mokapot output. Any workflow that produces a table mappable to the columns `id`, `is_decoy`, `q`, `pep`, `run`, and `score` can be used in the same way.
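The competed-winner construction from step 2 above is simple to sketch in base R. The column names here (`run`, `spectrum_id`, `score`, `is_decoy`) are illustrative, not mokapot's actual export headers, and the data are simulated:

```r
# Minimal sketch of target-decoy competition on a toy candidate table.
set.seed(1)
cand <- data.frame(
  run         = "run1",
  spectrum_id = rep(paste0("scan", 1:4), each = 2),
  is_decoy    = rep(c(FALSE, TRUE), times = 4),
  score       = rnorm(8)
)

# Keep, per run + spectrum_id, the single highest-scoring candidate
# (target or decoy) -- the "competed winner" for that spectrum.
ord     <- order(cand$run, cand$spectrum_id, -cand$score)
cand    <- cand[ord, ]
winners <- cand[!duplicated(cand[c("run", "spectrum_id")]), ]

nrow(winners)  # one row per run + spectrum_id
```

In the package itself this step is handled by `mokapot_competed_universe()`, shown in the real-data workflow below.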
```{r toy}
library(diagFDR)

set.seed(2)
n <- 6000
toy <- data.frame(
  id       = paste0("run1||scan", seq_len(n)),
  is_decoy = sample(c(FALSE, TRUE), n, replace = TRUE, prob = c(0.98, 0.02)),
  q        = pmin(1, runif(n)^3),                                # many small q-values
  pep      = pmin(1, pmax(0, runif(n)^3 + rnorm(n, sd = 0.02))), # correlated toy PEP
  run      = "run1",
  score    = rnorm(n)
)

x <- as_dfdr_tbl(
  toy,
  unit = "psm",
  scope = "global",
  q_source = "toy mokapot",
  q_max_export = 1
)

diag <- dfdr_run_all(
  xs = list(mokapot = x),
  alpha_main = 0.01,
  alphas = c(5e-4, 1e-3, 2e-3, 5e-3, 1e-2, 2e-2, 5e-2, 1e-1, 2e-1),
  low_conf = c(0.2, 0.5)
)
```

### Headline stability at 1%

```{r headline}
diag$tables$headline
```

### Decoy tail support and stability proxy

```{r stability-plots}
diag$plots$dalpha
diag$plots$cv
```

### Local boundary support and elasticity

```{r boundary-stability}
diag$plots$dwin
diag$plots$elasticity
```

### Equal-chance plausibility by q-band (internal check)

```{r equal-chance}
diag$tables$equal_chance_pooled
diag$plots$equal_chance__mokapot
```

### PEP reliability and expected errors (ΣPEP)

```{r pep}
diag$tables$pep_IPE
diag$plots$pep_reliability__mokapot
```

```{r sumpep}
# sumpep is a list of tibbles, one per element of `xs`
diag$tables$sumpep$mokapot
```

## Real mokapot workflow (targets + decoys text exports)

The code below shows how to run **diagFDR** directly from mokapot outputs.
```{r real-mokapot, eval=FALSE}
library(diagFDR)

# Read mokapot outputs (targets + decoys)
raw <- read_mokapot_psms(
  target_path = "path/to/your_output.mokapot.psms.txt",
  decoy_path  = "path/to/your_output.mokapot.decoy.psms.txt"
)

# Construct competed winners (1 per run+spectrum_id; max mokapot score)
x <- mokapot_competed_universe(
  raw,
  id_mode = "runxid",
  unit = "psm",
  scope = "global",
  q_source = "mokapot q-value",
  q_max_export = 1
)

# Run diagnostics
diag <- dfdr_run_all(
  xs = list(mokapot = x),
  alpha_main = 0.01,
  alphas = c(5e-4, 1e-3, 2e-3, 5e-3, 1e-2, 2e-2, 5e-2, 1e-1, 2e-1),
  low_conf = c(0.2, 0.5)
)

# Write outputs to disk (tables + plots + summary + manifest + README)
dfdr_write_report(
  diag,
  out_dir = "diagFDR_mokapot_out",
  formats = c("csv", "png", "manifest", "readme", "summary")
)

# Render a single HTML report (requires rmarkdown in Suggests)
dfdr_render_report(diag, out_dir = "diagFDR_mokapot_out")
```

## Interpretation notes

- `D_alpha` (tail support) and `D_alpha_win` (local boundary support) quantify how well the cutoff is supported by decoys. Very strict thresholds can yield sparse decoy tails and thus unstable operating points (large `CV_hat`).
- `S_alpha(eps)` (elasticity) captures list sensitivity to small threshold perturbations. Low Jaccard overlap indicates that accepted IDs are sensitive to the chosen cutoff.
- Equal-chance diagnostics (decoy fraction in low-confidence q-bands) and PEP-based diagnostics (PEP reliability, ΣPEP) are **internal consistency checks** under target–decoy assumptions. They are informative but do not replace external validation strategies (e.g., entrapment) when decoy representativeness is uncertain.
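To make the elasticity idea concrete, here is a small sketch of the Jaccard overlap between the accepted-ID set at a threshold `alpha` and at a perturbed threshold `alpha * (1 + eps)`. The data are simulated, and this is only an illustration: `dfdr_run_all()` computes `S_alpha(eps)` internally.

```r
# Sketch: Jaccard overlap of accepted IDs under a small threshold perturbation.
set.seed(3)
q  <- runif(5000)^3                 # toy q-values with a dense low-q region
id <- paste0("psm", seq_along(q))

accepted <- function(alpha) id[q <= alpha]

jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

alpha <- 0.01
eps   <- 0.2
jaccard(accepted(alpha), accepted(alpha * (1 + eps)))
# values near 1: the list barely moves; low values: an elastic, cutoff-sensitive list
```

Because acceptance is a simple q-value threshold here, the smaller list is nested in the larger one, so the overlap reduces to a ratio of list sizes; the diagnostic becomes more interesting when perturbations reorder the boundary region.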