---
title: "Quick Start"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Quick Start}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
# Quick Start for gimap
For more background on gimap and the calculations done here, [read here](https://github.com/FredHutch/gimap/blob/main/README.md)
## Requirements
Besides installing the gimap package, you will also need to install wget if you do not already have it installed. This will allow you to download the annotation files needed to run `gimap`.
[How to install wget](https://phoenixnap.com/kb/wget-command-with-examples#How_to_Install_wget)
To install the `gimap` package you will need to run:
```
install.packages("gimap")
```
Or you can install the development version from GitHub:
```
install.packages("remotes")
remotes::install_github("FredHutch/gimap")
```
## Loading needed packages
```{r echo = FALSE, results = 'hide'}
library(gimap)
```
```{r echo = FALSE, results = 'hide'}
library(dplyr)
```
## Set Up
First we can create a folder we will save files to.
```
output_dir <- "output_timepoints"
dir.create(output_dir, showWarnings = FALSE)
```
```{r eval = FALSE}
example_data <- get_example_data("count")
```
## Setting up data
We're going to set up three datasets that we will provide to the `set_up()` function to create a `gimap` dataset object.
- `counts` - the counts generated from pgPEN
- `pg_ids` - the IDs that correspond to the rows of the counts and specify the construct
- `sample_metadata` - metadata that describes the columns of the counts including their timepoints
```{r eval = FALSE}
counts <- example_data %>%
select(c("Day00_RepA", "Day05_RepA", "Day22_RepA", "Day22_RepB", "Day22_RepC")) %>%
as.matrix()
```
`pg_id` are just the unique IDs listed in the same order/sorted the same way as the count data.
```{r eval = FALSE}
pg_ids <- example_data %>%
dplyr::select("id")
```
Sample metadata is the information that describes the samples and is sorted the same order as the columns in the count data.
```{r eval = FALSE}
sample_metadata <- data.frame(
col_names = c("Day00_RepA", "Day05_RepA", "Day22_RepA", "Day22_RepB", "Day22_RepC"),
day = as.numeric(c("0", "5", "22", "22", "22")),
rep = as.factor(c("RepA", "RepA", "RepA", "RepB", "RepC"))
)
```
We'll need to provide `example_counts`, `pg_ids` and `sample_metadata` to `setup_data()`.
```{r eval = FALSE}
gimap_dataset <- setup_data(
counts = counts,
pg_ids = pg_ids,
sample_metadata = sample_metadata
)
```
It's ideal to run quality checks first. The `run_qc()` function will create a report we can look at to assess this.
```{r eval = FALSE}
run_qc(gimap_dataset,
output_file = "example_qc_report.Rmd",
overwrite = TRUE,
quiet = TRUE)
```
You can take a look at an [example QC report here](http://htmlpreview.github.io/?https://raw.githubusercontent.com/FredHutch/gimap/main/inst/example_qc_report.html).
```{r eval = FALSE}
gimap_dataset <- gimap_dataset %>%
gimap_filter() %>%
gimap_annotate(cell_line = "HELA") %>%
gimap_normalize(
timepoints = "day"
) %>%
calc_gi()
```
## Example output
Genetic interaction is calculated by:
- `rep` - indicates which sample from the original the data is from. Note the pretreatment is used for calculation and its statistics are not reported.
- `pgRNA_target` - what gene(s) were targeted by this the original pgRNAs for these data
- `mean_expected_cs` - the average expected genetic interaction score
- `mean_gi_score` - the average observer genetic interaction score
- `target_type` - describes whether the CRISPR design is targeting two genes ("gene_gene"), or a gene and a non targeting control ("gene_ctrl") or a targeting control and a gene ("ctrl_gene").
- `p_val` - p values from the testing whether a double knockout construct is significantly different in its genetic interaction score from single targets.
- `fdr` - False discovery rate corrected p values
```{r eval = FALSE}
gimap_dataset$gi_scores %>%
dplyr::arrange(fdr) %>%
head() %>%
knitr::kable(format = "html")
```
## Plot the results
You can remove any samples from these plots by altering the `reps_to_drop` argument.
```{r eval = FALSE}
plot_exp_v_obs_scatter(gimap_dataset, reps_to_drop = "Day05_RepA_early")
```
```{r eval = FALSE}
plot_rank_scatter(gimap_dataset, reps_to_drop = "Day05_RepA_early")
```
```{r eval = FALSE}
plot_volcano(gimap_dataset, reps_to_drop = "Day05_RepA_early", facet_rep = FALSE)
```
Here's how you can save plots like the above.
```
ggplot2::ggsave("volcano_plot.png")
```
### Plot specific target pair
We can pick out a specific pair to plot.
```{r eval = FALSE}
# "NDEL1_NDE1" is top result so let's plot that
plot_targets_bar(gimap_dataset, target1 = "NDEL1", target2 = "NDE1")
```
## Saving data to a file
We can save all these data as an RDS or the genetic interaction scores themselves to a tsv file.
```
saveRDS(gimap_dataset, "gimap_dataset_final.RDS")
```
```
readr::write_tsv(gimap_dataset$gi_scores, "gi_scores.tsv")
```
## Session Info
This is just for provenance purposes.
```{r eval = FALSE}
sessionInfo()
```