---
title: "Getting Started with rsynthbio"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started with rsynthbio}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE  # Set to FALSE since API calls require credentials
)
```

# rsynthbio

`rsynthbio` is an R package that provides a convenient interface to the Synthesize Bio API, allowing users to generate realistic gene expression data based on specified biological conditions. This package enables researchers to easily access AI-generated transcriptomic data for various modalities including bulk RNA-seq, single-cell RNA-seq, microarray data, and more.

## How to install

You can install `rsynthbio` from CRAN:

```{r installation, eval=FALSE}
install.packages("rsynthbio")
```

To install the development version from GitHub, use the `remotes` package:

```{r github-installation, eval=FALSE}
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}
remotes::install_github("synthesizebio/rsynthbio")
```

Once installed, load the package:

```{r}
library(rsynthbio)
```

## Authentication

Before using the Synthesize Bio API, you need to set up your API token. The package provides a secure way to handle authentication:

```{r auth-secure, eval=FALSE}
# Securely prompt for and store your API token
# The token will not be visible in the console
set_synthesize_token()

# You can also store the token in your system keyring for persistence
# across R sessions (requires the 'keyring' package)
set_synthesize_token(use_keyring = TRUE)
```

In later sessions, load the stored token from the keyring and confirm that it is set:

```{r, eval=FALSE}
# In future sessions, load the stored token
load_synthesize_token_from_keyring()

# Check if a token is already set
has_synthesize_token()
```

You can obtain an API token by registering at [Synthesize Bio](https://app.synthesize.bio).

### Security Best Practices

For security reasons, remember to clear your token when you're done:

```{r clear-token, eval = FALSE}
# Clear token from current session
clear_synthesize_token()

# Clear token from both session and keyring
clear_synthesize_token(remove_from_keyring = TRUE)
```

Never hard-code your token in scripts that will be shared or committed to version control.

## Basic Usage

### Available Modalities

Some Synthesize models support generation of different gene expression data types. 

In the v2 model, you should use "bulk" for bulk gene expression.

```{r modalities}
# Check available modalities
get_valid_modalities()
```

### Creating a Query

The first step in generating gene expression data is to create a query. The package provides a sample query that you can modify:

```{r query}
# Get a sample query
query <- get_valid_query()

# Inspect the query structure
str(query)
```

The query consists of:

1. `output_modality`: The type of gene expression data to generate (see `get_valid_modalities()`)
2. `mode`: The prediction mode (e.g., "mean estimation" or "sample generation")
3. `inputs`: A list of biological conditions to generate data for
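
As a rough illustration of that structure, the list below is built by hand rather than taken from `get_valid_query()`. The field names follow the ones used in this vignette; the metadata values are placeholders, and the API may require fields that are not shown here.

```{r query-structure-sketch}
# Hand-built query mirroring the three components listed above.
# Values are illustrative placeholders, not a verified query.
manual_query <- list(
  output_modality = "bulk",
  mode = "mean estimation",
  inputs = list(
    list(
      metadata = list(
        tissue = "lung",
        sex = "male"
      ),
      num_samples = 2
    )
  )
)

# Top-level names and the number of conditions
names(manual_query)
length(manual_query$inputs)
```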

We train our models with diverse multi-omics datasets. Two prediction modes are available today:

+ Sample generation: This runs in "diffusion" mode and generates different results for each sample requested. Use this mode to understand the distribution of expression across sample groups.

+ Mean estimation: This is deterministic. For a given metadata specification, you will get the same values.

```{r predict, eval=FALSE}
# Send the query to the API and retrieve the generated data
result <- predict_query(query)
```

The result is a list of two data frames: `metadata` and `expression`.
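
Before passing a result to downstream code, it can help to confirm that it has this shape. The helper below is a minimal sketch; `check_result()` is a hypothetical function, not part of `rsynthbio`, and the mock object contains made-up values used only to exercise it.

```{r result-shape-check}
# Hypothetical helper (not part of rsynthbio): confirm that a result holds
# the two data frames named `metadata` and `expression`.
check_result <- function(result) {
  expected <- c("metadata", "expression")
  absent <- setdiff(expected, names(result))
  if (length(absent) > 0) {
    stop("Result is missing: ", paste(absent, collapse = ", "))
  }
  is_df <- vapply(result[expected], is.data.frame, logical(1))
  if (!all(is_df)) {
    stop("Not a data frame: ", paste(expected[!is_df], collapse = ", "))
  }
  invisible(TRUE)
}

# Stand-in object with made-up values, used only to exercise the helper
mock_result <- list(
  metadata = data.frame(sample_id = c("s1", "s2")),
  expression = data.frame(GENE_A = c(10, 12), GENE_B = c(0, 3))
)
check_result(mock_result)
```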

### Modifying a Query

You can customize the query to fit your specific research needs:

```{r modify-query}
# Change output modality
query$output_modality <- "single_cell_rna-seq"

# Adjust number of samples
query$inputs[[1]]$num_samples <- 10

# Modify cell line information
query$inputs[[1]]$metadata$cell_line <- "MCF7"
query$inputs[[1]]$metadata$perturbation <- "TP53"

# Add a new condition
query$inputs[[length(query$inputs) + 1]] <- list(
  metadata = list(
    tissue = "lung",
    disease = "adenocarcinoma",
    sex = "male",
    age = "57 years",
    sample_type = "primary tissue"
  ),
  num_samples = 3
)
```
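
When a script adds several conditions, a small helper keeps the append logic in one place. The sketch below assumes each entry in `inputs` is a list with `metadata` and `num_samples`, as in the chunk above; `add_condition()` is a hypothetical helper, not a package function, and `demo_query` is a stand-in so the code runs without API access.

```{r add-condition-sketch}
# Hypothetical helper (not part of rsynthbio): append one condition to a query
add_condition <- function(query, metadata, num_samples = 1) {
  stopifnot(is.list(metadata), num_samples >= 1)
  new_input <- list(metadata = metadata, num_samples = num_samples)
  query$inputs <- c(query$inputs, list(new_input))
  query
}

# Minimal stand-in query so the sketch runs without the package
demo_query <- list(
  output_modality = "bulk",
  mode = "mean estimation",
  inputs = list()
)
demo_query <- add_condition(
  demo_query,
  metadata = list(tissue = "lung", sex = "male"),
  num_samples = 3
)
length(demo_query$inputs)
```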

### Making Predictions

Once your query is ready, you can send it to the API to generate gene expression data.

```{r predict-2, eval=FALSE}
# Request raw counts data
result <- predict_query(query, as_counts = TRUE)
```

To get the full API response rather than only the `metadata` and `expression` data frames, set `raw_response = TRUE`.

### Working with Results

```{r analyze, eval=FALSE}
# Access metadata and expression matrices
metadata <- result$metadata
expression <- result$expression

# Check dimensions
dim(expression)

# View metadata sample
head(metadata)
```

You may want to save the results for later use or export them as CSV files:

```{r large-data, eval=FALSE}
# Save results to RDS file
saveRDS(result, "synthesize_results.rds")

# Load previously saved results
result <- readRDS("synthesize_results.rds")

# Export as CSV
write.csv(result$expression, "expression_matrix.csv")
write.csv(result$metadata, "sample_metadata.csv")
```
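
The two formats behave differently on reload. The sketch below round-trips a stand-in result through temporary files; the values in `mock_result` are made up, and a real result would come from `predict_query()`.

```{r export-roundtrip-sketch}
# Stand-in result with made-up values (a real one comes from predict_query())
mock_result <- list(
  metadata = data.frame(sample_id = c("s1", "s2"), tissue = c("lung", "lung")),
  expression = data.frame(GENE_A = c(10, 12), GENE_B = c(0, 3))
)

# RDS stores the whole list in one file and restores it as an R object
rds_path <- tempfile(fileext = ".rds")
saveRDS(mock_result, rds_path)
restored <- readRDS(rds_path)
identical(restored, mock_result)

# CSV stores one table per file; row.names = FALSE avoids an extra index column
csv_path <- tempfile(fileext = ".csv")
write.csv(mock_result$expression, csv_path, row.names = FALSE)
expression_csv <- read.csv(csv_path)
dim(expression_csv)
```

RDS is the simpler choice when the data stays in R. CSV is more portable, but column types are re-inferred on import, so check them after reading.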

### Custom Validation

You can validate your queries before sending them to the API:

```{r validation}
# Validate structure
validate_query(query)

# Validate modality
validate_modality(query)
```
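
In a longer script you may prefer a validation failure to be reported rather than halt execution. The wrapper below is a minimal sketch that assumes the validators signal an R error on invalid input, which this vignette does not confirm. `passes_validation()` and `requires_inputs()` are hypothetical names; the latter is a stand-in validator so the code runs without the package.

```{r validation-guard-sketch}
# Hypothetical wrapper (not part of rsynthbio): run a validator and report
# failure as FALSE plus a message instead of stopping the script.
# Assumes the validator signals an R error when the query is invalid.
passes_validation <- function(validator, query) {
  tryCatch(
    {
      validator(query)
      TRUE
    },
    error = function(e) {
      message("Validation failed: ", conditionMessage(e))
      FALSE
    }
  )
}

# Stand-in validator so the sketch runs without the package
requires_inputs <- function(query) {
  if (length(query$inputs) == 0) stop("query has no inputs")
  invisible(TRUE)
}

passes_validation(requires_inputs, list(inputs = list()))
passes_validation(requires_inputs, list(inputs = list(list(num_samples = 1))))
```

With the package loaded, `passes_validation(validate_query, query)` would apply the same guard to the real validator, under the same assumption about how it reports problems.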

## Session info

```{r session-info}
sessionInfo()
```

## Additional Resources

- [Package Source Code](https://github.com/synthesizebio/rsynthbio)
- [File Bug Reports](https://github.com/synthesizebio/rsynthbio/issues)