Introduction to annotaR

This vignette provides a walkthrough of the annotaR package, demonstrating how to perform a multi-layered annotation of a gene list.

1. Starting the Pipeline

First, we define a character vector of our genes of interest. For this example, we use a small list of well-known cancer-related genes. Then, we initialize the pipeline with the annotaR() function.

# Attach the package so that annotaR() and the add_*() functions are available
library(annotaR)

# A small list of well-known genes involved in cancer
genes_of_interest <- c(
  "TP53", "EGFR", "BRCA1", "BRCA2", "KRAS", "PIK3CA", "AKT1", "BRAF",
  "MYC", "ERBB2", "CDKN2A", "PTEN"
)

# Create the initial object
annotaR_obj <- annotaR(genes_of_interest)

print(annotaR_obj)
#> # A tibble: 12 × 1
#>    gene  
#>    <chr> 
#>  1 TP53  
#>  2 EGFR  
#>  3 BRCA1 
#>  4 BRCA2 
#>  5 KRAS  
#>  6 PIK3CA
#>  7 AKT1  
#>  8 BRAF  
#>  9 MYC   
#> 10 ERBB2 
#> 11 CDKN2A
#> 12 PTEN
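
Real gene lists often arrive with stray whitespace, empty entries, or duplicates. The following is a minimal base R sketch of tidying such a vector before initializing the pipeline. The helper `clean_gene_symbols()` and the `raw_genes` input are hypothetical and not part of annotaR; whether annotaR() performs any of this cleaning itself is not covered in this vignette. Case is deliberately left untouched, because some official symbols contain lowercase letters.

```r
# Hypothetical raw input with padding, a duplicate, a missing value, and an empty string
raw_genes <- c(" TP53", "EGFR ", "BRCA1", "TP53", NA, "")

# Hypothetical helper (not an annotaR function)
clean_gene_symbols <- function(x) {
  x <- x[!is.na(x)]   # drop missing entries first
  x <- trimws(x)      # remove leading and trailing whitespace
  x <- x[nzchar(x)]   # drop empty strings
  unique(x)           # keep the first occurrence of each symbol
}

genes_clean <- clean_gene_symbols(raw_genes)
genes_clean
#> [1] "TP53"  "EGFR"  "BRCA1"
```

The cleaned vector can then be passed to annotaR() in the same way as genes_of_interest above.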

2. Adding Functional and Disease Annotations

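annotaR is built around a pipe-friendly, layered approach: each add_*() function takes the current table and returns it with additional annotation columns, so calls can be chained to add data progressively. Here, we add Gene Ontology (GO) terms, restricted to the Biological Process source with sources = c("GO:BP"), followed by disease associations and known drug links.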

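# %>% is provided by magrittr (and is also attached by dplyr);
# load it explicitly in case it is not already available in your session
library(magrittr)

# Note: The following steps query live APIs, so they need a network
# connection and may take a few moments.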

full_annotation <- annotaR_obj %>%
  add_go_terms(sources = c("GO:BP")) %>%
  add_disease_links() %>%
  add_drug_links()

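# Take a look at the resulting tidy data frame.
# Use `head()` to show just the first few rows. Rows that look identical
# in the printed columns may still differ in the truncated ones
# (drug_name, drug_type, mechanism_of_action, phase).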
head(full_annotation)
#> # A tibble: 6 × 11
#>   gene  term_id    term_name       p_value source disease_name association_score
#>   <chr> <chr>      <chr>             <dbl> <chr>  <chr>                    <dbl>
#> 1 TP53  GO:0006915 apoptotic pro… 4.26e-10 GO:BP  Li-Fraumeni…             0.876
#> 2 TP53  GO:0006915 apoptotic pro… 4.26e-10 GO:BP  Li-Fraumeni…             0.876
#> 3 TP53  GO:0006915 apoptotic pro… 4.26e-10 GO:BP  Li-Fraumeni…             0.876
#> 4 TP53  GO:0006915 apoptotic pro… 4.26e-10 GO:BP  Li-Fraumeni…             0.876
#> 5 TP53  GO:0006915 apoptotic pro… 4.26e-10 GO:BP  Li-Fraumeni…             0.876
#> 6 TP53  GO:0006915 apoptotic pro… 4.26e-10 GO:BP  Li-Fraumeni…             0.876
#> # ℹ 4 more variables: drug_name <chr>, drug_type <chr>,
#> #   mechanism_of_action <chr>, phase <int>
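
The repeated rows in the head() output above are consistent with a long format in which every combination of GO term, disease, and drug for a gene gets its own row. The base R sketch below reproduces that effect on toy data and shows one way to recover a single layer. It assumes the layers are combined per gene, which we have not confirmed against the annotaR internals; the column names are taken from the output above, and all values are placeholders rather than real results.

```r
# Toy stand-ins for three annotation layers (placeholder values, not real results)
go_layer      <- data.frame(gene = "GENE1", term_id = c("term_1", "term_2"))
disease_layer <- data.frame(gene = "GENE1", disease_name = c("disease_1", "disease_2"))
drug_layer    <- data.frame(gene = "GENE1", drug_name = c("drug_1", "drug_2", "drug_3"))

# Joining on gene alone pairs every row of one layer with every row of the next
long <- merge(merge(go_layer, disease_layer, by = "gene"), drug_layer, by = "gene")
nrow(long)
#> [1] 12

# Recover one row per gene-term pair by keeping only those columns and de-duplicating
go_only <- unique(long[, c("gene", "term_id")])
nrow(go_only)
#> [1] 2
```

The same de-duplication idea applies to full_annotation: select the columns for the layer of interest before counting or summarizing, so that rows multiplied by other layers are not counted more than once.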

3. Visualizing Enrichment Results

After annotating, we can visualize the results. The plot_enrichment_dotplot() function creates a publication-ready dot plot from the GO enrichment data; in the call below, n_terms = 20 limits the plot to 20 terms and title sets the plot title.

# The plot function uses the data from the `add_go_terms` step
plot_enrichment_dotplot(
  full_annotation,
  n_terms = 20,
  title = "Top 20 Enriched GO Biological Processes"
)
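
To inspect which terms would rank highest before plotting, a selection like the one below can help. This is a sketch under the assumption that n_terms keeps the terms with the smallest p-values; the vignette does not state how plot_enrichment_dotplot() ranks terms. The helper `top_terms()` and the toy data are hypothetical, and the p-values are placeholders.

```r
# Toy enrichment rows; a term may repeat because other layers multiply rows
enrich <- data.frame(
  term_id = c("term_1", "term_1", "term_2", "term_3", "term_3"),
  p_value = c(1e-8, 1e-8, 3e-4, 2e-6, 2e-6),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not an annotaR function)
top_terms <- function(df, n_terms = 20) {
  df <- unique(df[, c("term_id", "p_value")])  # one row per term
  df <- df[order(df$p_value), , drop = FALSE]  # smallest p-value first
  head(df, n_terms)                            # keep at most n_terms rows
}

top2 <- top_terms(enrich, n_terms = 2)
top2$term_id
#> [1] "term_1" "term_3"
```

Applied to full_annotation, the same helper would work on its term_id and p_value columns.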