---
title: "Databases"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Databases}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
There are many databases available in AWS services. This package deals with a subset of them, including:
- RDS: MariaDB, MySQL, Postgres
- Redshift
We only handle a subset as `sixtyfour` aims to be an opinionated package providing easy to use interfaces - achieving that goal means supporting the subset of the most well troden paths.
```r
library(sixtyfour)
```
All database related functions in `sixtyfour` should start with `aws_db`.
## Redshift
The `aws_db_redshift_create` function creates a cluster for you (a Redshift instance is called a "cluster").
There's an important distinction between Redshift and RDS. Redshift uses IAM username/password, whereas with RDS you can use username/password setup for each instance of a RDS database, or do authentication through IAM. However, with RDS you can't simply pass your IAM username/password (see notes below).
First, let's create a security group with an ingress rule so you can access your Redshift cluster.
```r
my_security_group <- aws_vpc_sg_with_ingresss("redshift")
```
Notes on the parameters used in the example below:
- `id`: this is an ID you come up with, it must be unique for all clusters within your AWS account
- `user`/`pwd`: your IAM username and password
- `security_group_ids`: it's best to first create a security group to handle permissions to your Redshift cluster. See example above for how to do that. Pass in it's identifier here
- `wait`: Since we're using `wait=TRUE` the call to `aws_db_redshift_create` will likely take about 3 minutes to run. You can set `wait=FALSE` and not wait, but then you'll want to check yourself when the instance is available.
```r
aws_db_redshift_create(
id = "some-id",
user = "your-username",
pwd = "your-pwd",
security_group_ids = list(my_security_group),
wait = TRUE
)
```
Connect to the cluster
```r
con <- aws_db_redshift_con(
user = "your-username",
pwd = "your-pwd",
id = "some-id"
)
```
List tables, create a table, etc
```r
library(DBI)
dbListTables(con)
dbWriteTable(con, "mtcars", mtcars)
dbListTables(con)
dbReadTable(con, "mtcars")
```
Use dplyr/et al.
```r
library(dplyr)
tbl(con, "mtcars") %>%
filter(mpg == 4)
```
Important: Remember to delete your cluster when your done!
## RDS
The process for MariaDB, MySQL, and Postgres are more or less the same, so we'll only demonstrate MariaDB.
In a future version of this package you'll be able to use IAM to authenticate with RDS, but for now `sixtyfour` does not support IAM for RDS. The current "happy path" (read: easy) process of starting an RDS instance with `aws_db_rds_create` is as follows:
- Supply an ID (identifier) for your instance that you create
- A random username is created and used for database authentication
- A random password is created (via AWS Secretamanager) and used for database authentication
- A security groups is created
- An ingress rule is added to the security group with your IP address
- On RDS instance creation, we use the above username, password, and security group
- The username, password, and security group information are stored in AWS Secretamanager for subsequent use
To connect to your RDS instance, you use `aws_db_rds_con`. The "happy path" for connecting is:
- Pass in only the ID used above
- We fetch secrets from the AWS Secretamanager and ask you which you'd like to use
- We gather all the necessary details to connect to your instance
- Return a DBI connection object, e.g., `MariaDBConnection` for MariaDB
Let's walk through the steps with some code.
First, create an RDS instance - in this case for MariaDB.
Notes on the parameters used in the example below:
- `id`: this is an ID you come up with. see `?aws_db_rds_create` for constraints
- `class`: The compute and memory capacity of the instance; see the [AWS docs](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.DBInstanceClass.html) for instance class options. The default for this parameter (`db.t3.micro`) gives you an instance with relatively small capacity in terms of memory and compute - but is set that way to minimize costs in case you forget to turn it off if you're not using it!
- `engine`: the default is `mariadb`; other options are mysql or postgres
- `wait`: Since we're using `wait=TRUE` (the default for this function) the call to `aws_db_rds_create` will likely take about 5 minutes to run. You can set `wait=FALSE` and not wait, but then you'll want to check yourself when the instance is available.
```r
aws_db_rds_create(
id = "myinstance",
class = "db.t3.micro",
engine = "mariadb",
wait = TRUE
)
```
Connect to the instance
```r
con <- aws_db_rds_con(id = "myinstance")
```
List tables, create a table, etc
```r
library(DBI)
dbListTables(con)
dbWriteTable(con, "mtcars", mtcars)
dbListTables(con)
dbReadTable(con, "mtcars")
```
Use dplyr/et al.
```r
library(dplyr)
tbl(con, "mtcars") %>%
filter(mpg == 4)
```