---
title: "Real cancer drivers walkthrough"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Real cancer drivers walkthrough}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
# Skip evaluation of all chunks on CRAN's auto-check farm to fit the
# 10-minute build budget. Locally, on CI, and under devtools::check(),
# NOT_CRAN=true and all chunks evaluate normally. The vignette source
# (which CRAN users see in browseVignettes() / vignette()) is unchanged.
NOT_CRAN <- identical(tolower(Sys.getenv("NOT_CRAN")), "true")
knitr::opts_chunk$set(eval = NOT_CRAN)
```

# Real cancer drivers walkthrough

This vignette uses the bundled `dataset_real_cancer_drivers_4` dataset to
illustrate a real biological analysis: how do four canonical cancer driver
catalogs overlap? The four sources are:

* **Vogelstein**: the 138-gene catalog from Vogelstein et al. (Science 2013),
  often cited as the "core" driver-gene set.
* **COSMIC_CGC**: the COSMIC Cancer Gene Census (Sondka et al. 2018), a
  curated list of genes causally implicated in cancer.
* **OncoKB**: the MSK precision-oncology knowledge base, restricted to
  annotation level ≥ "Oncogenic" (Chakravarty et al. 2017).
* **IntOGen**: pan-cancer driver genes from the IntOGen pipeline
  (Martínez-Jiménez et al. 2020).

```{r setup-data}
library(vennDiagramLab)

ds <- load_sample("dataset_real_cancer_drivers_4")
ds@set_names
```

## Set sizes

```{r sizes}
sapply(ds@items, length)
```

The lists differ widely in size: Vogelstein is the smallest curated set, and
OncoKB is the most permissive at this annotation tier.

## Universe

The dataset was built from a 20,000-gene background (`universe_size`):

```{r universe}
ds@universe_size
```

This is the population size N used in the hypergeometric over-representation
tests (see `vignette("v05_statistics_deep_dive")`).

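
To make the role of the background concrete, here is a minimal base-R sketch of a pairwise over-representation test. The set sizes and overlap are illustrative numbers, not values from the bundled dataset, and `overlap_p_value()` is a hypothetical helper written for this example. The package's own test may differ in detail.

```r
# Hypothetical helper (not part of vennDiagramLab): upper-tail
# hypergeometric p-value for the overlap of two gene sets drawn from a
# background of N genes.
overlap_p_value <- function(k, size_a, size_b, N) {
  # P(X >= k), where X counts members of set A among size_b genes
  # sampled without replacement from the N-gene background.
  phyper(k - 1, m = size_a, n = N - size_a, k = size_b, lower.tail = FALSE)
}

# Illustrative numbers only (assumed, not taken from the dataset)
N      <- 20000  # background size, playing the role of ds@universe_size
size_a <- 150    # size of the first set
size_b <- 600    # size of the second set
k      <- 90     # observed overlap

expected <- size_a * size_b / N  # overlap expected by chance
p_value  <- overlap_p_value(k, size_a, size_b, N)

expected
p_value
```

With these numbers about 4.5 shared genes would be expected by chance, so an overlap of 90 gives a vanishingly small p-value. A smaller N raises the expected overlap and weakens the result, which is why the choice of background matters.
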
## Analyze

```{r analyze}
result <- analyze(ds)
result@model
length(result@regions)
```

The default model for 4 sets is `venn-4-set` (Edwards-style).

## Set sizes (inclusive) and intersection layout

```{r set-sizes-table}
result@set_sizes
```

## A summary at a glance

`broom::glance()` returns a one-row tibble with the headline numbers:

```{r glance}
broom::glance(result)
```

## Render the venn diagram

The default render uses the dataset's set names as labels. To shorten them
for the diagram, pass a per-letter override:

```{r render-custom}
svg <- render_venn_svg(
  result,
  set_names = c(A = "Vogelstein", B = "COSMIC",
                C = "OncoKB",     D = "IntOGen"),
  title = "Cancer driver overlap (4 sources)"
)
nchar(svg)
```

(See `vignette("v08_custom_styling_and_export")` for color overrides and
post-render SVG manipulation.)

## UpSet view

For 4+ sets, an UpSet plot is often easier to read than the Venn diagram:
each intersection is drawn as a bar, and the bars are sorted by size.

```{r upset, eval = NOT_CRAN && (getRversion() >= "4.6")}
upset_plot <- render_upset(result, sort_by = "size")
upset_plot
```

(The chunk above is gated on `R >= 4.6` because the CRAN release of
`ComplexUpset` (1.3.3) is incompatible with `ggplot2 >= 4.0` on older R; see
`?vennDiagramLab::render_upset` for context.)

## Top significant intersections

`broom::tidy()` returns one row per set pair, with all five pairwise metrics
plus the BH-FDR-adjusted hypergeometric p-value:

```{r tidy}
top_pairs <- broom::tidy(result)
top_pairs[order(top_pairs$p_adjusted),
          c("set_a", "set_b", "intersection", "jaccard",
            "p_adjusted", "significant")]
```

Every pair is significant at FDR < 0.05, as expected, since all four catalogs
target the same underlying biology.

## Item-level annotation

`broom::augment()` returns one row per gene with set-membership flags and the
region label.

```{r augment}
gene_table <- broom::augment(result)
head(gene_table)
nrow(gene_table)                # total unique genes across all four sets
table(gene_table$region_label)  # how many genes in each region
```

## Save the region summary

```{r save-summary, eval = FALSE}
to_region_summary_tsv(result, "cancer_drivers_regions.tsv")
```

## What's next

* `vignette("v05_statistics_deep_dive")`: interpret the Jaccard / Dice /
  hypergeometric numbers in detail.
* `vignette("v07_pdf_reports")`: turn this analysis into a multi-page PDF.
* `vignette("v08_custom_styling_and_export")`: customize colors, embed in a
  ggplot, export to PDF/PNG.