
# staRburst

Seamless AWS cloud bursting for parallel R workloads.
staRburst lets you run parallel R code on AWS with zero infrastructure management. Scale from your laptop to 100+ cloud workers with a single function call. Supports both EC2 (recommended for performance and cost) and Fargate (serverless) backends.
Everything runs through a single function, `starburst_map()` - no new concepts to learn.

## Installation

CRAN submission in progress for v0.3.6 (expected within 2-4 weeks). Once available:

```r
install.packages("starburst")
```

Development version from GitHub:

```r
remotes::install_github("scttfrdmn/starburst")
```

## Quick start

```r
library(starburst)
# One-time setup (2 minutes)
starburst_setup()
# Run parallel computation on AWS
results <- starburst_map(
1:1000,
function(x) expensive_computation(x),
workers = 50
)
#> 🚀 Starting starburst cluster with 50 workers
#> 💰 Estimated cost: ~$2.80/hour
#> 📊 Processing 1000 items with 50 workers
#> 📦 Created 50 chunks (avg 20 items per chunk)
#> 🚀 Submitting tasks...
#> ✓ Submitted 50 tasks
#> ⏳ Progress: 50/50 tasks (3.2 minutes elapsed)
#>
#> ✓ Completed in 3.2 minutes
#> 💰 Estimated cost: $0.15
```
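The log above reports 50 chunks averaging 20 items each. The even chunking can be sketched in base R; this is an illustrative approximation, not staRburst's actual implementation:

```r
# Evenly chunk 1000 items for 50 workers (illustrative sketch;
# staRburst's internal chunking strategy may differ)
items <- 1:1000
workers <- 50
chunk_size <- ceiling(length(items) / workers)           # 20 items per chunk
chunks <- split(items, ceiling(seq_along(items) / chunk_size))
length(chunks)    # 50 chunks
lengths(chunks)   # each chunk holds 20 items
```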
## Example: Monte Carlo portfolio simulation

```r
library(starburst)

# Define simulation
simulate_portfolio <- function(seed) {
set.seed(seed)
returns <- rnorm(252, mean = 0.0003, sd = 0.02)
prices <- cumprod(1 + returns)
list(
final_value = prices[252],
sharpe_ratio = mean(returns) / sd(returns) * sqrt(252)
)
}
# Run 10,000 simulations on 100 AWS workers
results <- starburst_map(
1:10000,
simulate_portfolio,
workers = 100
)
#> 🚀 Starting starburst cluster with 100 workers
#> 💰 Estimated cost: ~$5.60/hour
#> 📊 Processing 10000 items with 100 workers
#> ⏳ Progress: 100/100 tasks (3.1 minutes elapsed)
#>
#> ✓ Completed in 3.1 minutes
#> 💰 Estimated cost: $0.29
# Extract results
final_values <- sapply(results, function(x) x$final_value)
sharpe_ratios <- sapply(results, function(x) x$sharpe_ratio)
# Summary
mean(final_values) # Average portfolio outcome
quantile(final_values, c(0.05, 0.95)) # Risk range
# Comparison:
# Local (single core): ~4 hours
# Cloud (100 workers): 3 minutes, $0.29
```

## Reusable clusters

```r
# Create cluster once
cluster <- starburst_cluster(workers = 50, cpu = 4, memory = "8GB")
# Run multiple analyses
results1 <- cluster$map(dataset1, analysis_function)
results2 <- cluster$map(dataset2, processing_function)
results3 <- cluster$map(dataset3, modeling_function)
# All use the same Docker image and configuration
```

## Resource configuration

```r
# For memory-intensive workloads
results <- starburst_map(
large_datasets,
memory_intensive_function,
workers = 20,
cpu = 8,
memory = "16GB"
)
# For CPU-intensive workloads
results <- starburst_map(
cpu_tasks,
cpu_intensive_function,
workers = 50,
cpu = 4,
memory = "8GB"
)
```

## Detached sessions

Run long jobs and disconnect - results persist in S3:

```r
# Start detached session
session <- starburst_session(workers = 50, detached = TRUE)
# Submit work and get session ID
session$submit(quote({
results <- starburst_map(huge_dataset, expensive_function)
saveRDS(results, "results.rds")
}))
session_id <- session$session_id
# Disconnect - job continues running
# Later (hours/days), reconnect:
session <- starburst_session_attach(session_id)
status <- session$status() # Check progress
results <- session$collect() # Get results
# Cleanup when done
session$cleanup(force = TRUE)
```

## Cost controls

```r
# Set cost limits
starburst_config(
max_cost_per_job = 10, # Hard limit
cost_alert_threshold = 5 # Warning at $5
)
# Costs shown transparently
results <- starburst_map(data, fn, workers = 100)
#> 💰 Estimated cost: ~$3.50/hour
#> ✓ Completed in 23 minutes
#> 💰 Estimated cost: $1.34
```
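The final cost figure is simply the hourly rate times the elapsed time. A quick sanity check against the runs shown earlier:

```r
# Final cost = hourly rate * elapsed hours (numbers from the logs above)
round(2.80 * 3.2 / 60, 2)   # 0.15 (quick start: 50 workers, 3.2 minutes)
round(5.60 * 3.1 / 60, 2)   # 0.29 (simulation: 100 workers, 3.1 minutes)
```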
## Quota handling

staRburst automatically handles AWS Fargate quota limitations:

```r
results <- starburst_map(data, fn, workers = 100, cpu = 4)
#> ⚠ Requested 100 workers (400 vCPUs) but quota allows 25 workers (100 vCPUs)
#> ⚠ Using 25 workers instead
#> 💰 Estimated cost: ~$1.40/hour
```

Your work still completes, just with fewer workers. You can request quota increases through AWS Service Quotas.
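For example, the relevant Fargate vCPU quota can be inspected and an increase requested with the AWS CLI. The quota code below is assumed to be "Fargate On-Demand vCPU resource count"; verify it for your account and region before submitting:

```shell
# Inspect the current Fargate vCPU quota, then request 400 vCPUs.
# L-3032A538 is assumed to be "Fargate On-Demand vCPU resource count";
# confirm with: aws service-quotas list-service-quotas --service-code fargate
aws service-quotas get-service-quota \
    --service-code fargate --quota-code L-3032A538
aws service-quotas request-service-quota-increase \
    --service-code fargate --quota-code L-3032A538 --desired-value 400
```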
## API reference

- `starburst_map(.x, .f, workers, ...)` - Parallel map over data
- `starburst_cluster(workers, cpu, memory)` - Create reusable cluster
- `starburst_setup()` - Initial AWS configuration
- `starburst_config(...)` - Update configuration
- `starburst_status()` - Check cluster status

Configuration options:

```r
starburst_config(
  region = "us-east-1",
  max_cost_per_job = 10,
  cost_alert_threshold = 5
)
```

Full documentation available at [starburst.ing](https://starburst.ing).
## Comparison

| Feature | staRburst | RStudio Server on EC2 | Coiled (Python) |
|---|---|---|---|
| Setup time | 2 minutes | 30+ minutes | 5 minutes |
| Infrastructure management | Zero | Manual | Zero |
| Learning curve | Minimal | Medium | Medium |
| Auto scaling | Yes | No | Yes |
| Cost optimization | Automatic | Manual | Automatic |
| R-native | Yes | Yes | No (Python) |
## Requirements

- AWS credentials (`AWS_PROFILE` set)
- `starburstECSExecutionRole` - for ECS/ECR access
- `starburstECSTaskRole` - for S3 access

For detailed setup instructions, see the Getting Started guide.
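To confirm that credentials resolve before running `starburst_setup()`, a standard AWS CLI check (not part of staRburst) is:

```shell
# Confirm the active profile has working credentials;
# prints the account ID and caller ARN on success.
export AWS_PROFILE=default   # substitute your profile name
aws sts get-caller-identity
```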
## Roadmap

- Core API (`starburst_map`, `starburst_cluster`)
- `future` backend integration
- Compatibility with `future.apply`, `furrr`, and `targets`

Contributions welcome! See the GitHub repository for contribution guidelines.
## License

Apache License 2.0 - see LICENSE

Copyright 2026 Scott Friedman

## Citation

```bibtex
@software{starburst,
  title = {staRburst: Seamless AWS Cloud Bursting for R},
  author = {Scott Friedman},
  year = {2026},
  version = {0.3.6},
  url = {https://starburst.ing},
  license = {Apache-2.0}
}
```

## Acknowledgments

Built using the paws AWS SDK for R. Container management with renv and rocker. Inspired by Coiled for Python/Dask.