---
title: "diagFDR: DIA-NN diagnostics from report.parquet"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{diagFDR: DIA-NN diagnostics from report.parquet}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 4.5
)
```

This vignette demonstrates how to run **diagFDR** on DIA-NN exports and interpret the key diagnostics in terms of **scope**, **calibration**, and **stability**.

The typical workflow is:

1. Export DIA-NN results with decoys and a permissive q-value ceiling.
2. Read `report.parquet`.
3. Construct one or more *universes* (global precursor list, run×precursor, etc.).
4. Run diagnostics and inspect tables/plots.
5. (Optional) write tables/plots and a human-readable report to disk.

## Recommended DIA-NN export settings

To enable all diagnostics, export:

- decoys: `--report-decoys`
- a permissive export ceiling: `--qvalue 0.5` (or higher)

The q-value ceiling matters because some diagnostics operate in low-confidence regions (e.g. equal-chance plausibility checks, or local-window support around cutoffs).

## Runnable toy example (no DIA-NN files required)

We start with a small simulated dataset that exercises the diagFDR functions. Any workflow producing outputs that can be mapped to the columns `id`, `is_decoy`, `q`, `pep`, `run`, and `score` can be handled similarly.
```{r toy-data}
library(diagFDR)
set.seed(1)

n <- 3000
toy_global <- data.frame(
  id       = paste0("P", seq_len(n)),
  is_decoy = sample(c(FALSE, TRUE), n, replace = TRUE, prob = c(0.97, 0.03)),
  q        = pmin(1, runif(n)^3),  # skew toward small q-values
  pep      = NA_real_,
  run      = NA_character_,
  score    = NA_real_
)

x_global <- as_dfdr_tbl(
  toy_global,
  unit = "precursor",
  scope = "global",
  q_source = "toy",
  q_max_export = 0.5
)

diag <- dfdr_run_all(
  xs = list(global = x_global),
  alpha_main = 0.01,
  alphas = c(1e-3, 2e-3, 5e-3, 1e-2, 2e-2, 5e-2, 1e-1, 2e-1),
  low_conf = c(0.2, 0.5)
)
```

### Headline stability at 1%

```{r headline}
diag$tables$headline
```

### Tail support and stability versus threshold

```{r plots-stability}
diag$plots$dalpha
diag$plots$cv
```

### Local boundary support

```{r plot-dwin}
diag$plots$dwin
```

### Threshold elasticity (list sensitivity to changing alpha)

```{r plot-elasticity}
diag$plots$elasticity
```

### Equal-chance plausibility by q-band

```{r equal-chance}
diag$tables$equal_chance_pooled
diag$plots$equal_chance__global
```

## Real DIA-NN parquet workflow

The following code shows how to run the pipeline on a real DIA-NN `report.parquet`.

```{r real-diann, eval=FALSE}
# Requires arrow
rep <- read_diann_parquet("path/to/report.parquet")

# (A) Global precursor list using Global.Q.Value
#     Recommended for experiment-wide (pooled) lists.
x_global_gq <- diann_global_precursor(
  rep,
  q_col = "Global.Q.Value",
  q_max_export = 0.5,
  unit = "precursor",
  scope = "global",
  q_source = "Global.Q.Value"
)

# (B) Run×precursor universe using run-wise Q.Value
#     Recommended for per-run decisions / QC.
x_runx <- diann_runxprecursor(
  rep,
  q_col = "Q.Value",
  q_max_export = 0.5,
  id_mode = "runxid",
  unit = "runxprecursor",
  scope = "runwise",
  q_source = "Q.Value"
)

# (C) Scope misuse comparator: min run-wise q over runs per precursor (anti-pattern)
#     Useful for demonstrating/diagnosing scope mismatch.
x_minrun <- diann_global_minrunq(
  rep,
  q_col = "Q.Value",
  q_max_export = 0.5,
  unit = "precursor",
  scope = "aggregated",
  q_source = "min_run(Q.Value)"
)

diag <- dfdr_run_all(
  xs = list(global = x_global_gq, runx = x_runx, minrun = x_minrun),
  alpha_main = 0.01,
  compute_pseudo_pvalues = TRUE  # <-- This adds p-value diagnostics
)

# Compare accepted lists across scopes (Jaccard overlap across alpha)
scope_tbl <- dfdr_scope_disagreement(
  x1 = x_global_gq,
  x2 = x_minrun,
  alphas = c(1e-3, 2e-3, 5e-3, 1e-2, 2e-2, 5e-2),
  label1 = "Global.Q.Value",
  label2 = "min_run(Q.Value)"
)

# Write outputs to disk (tables + plots; optionally PPTX)
dfdr_write_report(
  diag,
  out_dir = "diagFDR_diann_out",
  formats = c("csv", "png", "manifest", "readme", "summary")
)

# Render a single HTML report (requires rmarkdown in Suggests)
dfdr_render_report(diag, out_dir = "diagFDR_diann_out")
```

## Interpretation notes

- **Scope**: run-wise q-values (`Q.Value`) and global q-values (`Global.Q.Value`) do not control the same multiple-testing universe. Constructing experiment-wide lists by aggregating run-wise q-values (e.g. taking `min(Q.Value)` across runs) is generally anti-conservative.
- **Stability**: stringent cutoffs can enter a *granular regime* where only a few decoys support the boundary. Inspect `D_alpha`, `CV_hat`, and the local boundary support `D_alpha_win` before making strong claims at very small alpha.
- **Equal-chance diagnostics** (decoy fractions in low-confidence q-bands) and **PEP reliability** are internal consistency checks under target--decoy assumptions; they do not replace external validation (e.g. entrapment) when decoy representativeness is uncertain.
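The anti-conservativeness of min-run aggregation can be seen in a few lines of base R. This is a minimal sketch under an idealized assumption (pure-null features whose per-run q-values are approximately Uniform(0, 1)); it uses no diagFDR functions and no DIA-NN output:

```{r minrun-anticonservative}
set.seed(2)
n_null <- 10000  # pure-null features
n_runs <- 6      # runs in the experiment
alpha  <- 0.01   # nominal threshold

# One q-value per feature per run, uniform under the null.
q_runs <- matrix(runif(n_null * n_runs), nrow = n_null)

# Run-wise control: any single run admits roughly alpha of the nulls.
mean(q_runs[, 1] <= alpha)

# Anti-pattern: experiment-wide list via the per-feature minimum over runs.
# The null pass rate inflates toward 1 - (1 - alpha)^n_runs (about 6x here).
q_min <- apply(q_runs, 1, min)
mean(q_min <= alpha)
```

This is the same mechanism the scope-disagreement comparison above is designed to surface on real data: each run's list is controlled on its own, but the minimum over runs is not a q-value for the pooled universe.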