---
title: "Getting Started with SCIproj"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started with SCIproj}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## What is a research compendium?

A **research compendium** is a self-contained collection of data, code, and documentation that accompanies a research project. By structuring a project as an R package, you gain:

- a standard, well-understood directory layout,
- built-in dependency management via `DESCRIPTION`,
- documentation infrastructure (`roxygen2`, vignettes),
- testing infrastructure (`testthat`),
- easy sharing and installation via GitHub.

SCIproj automates the creation of such a compendium, adding opinionated defaults for reproducible workflows (`targets`), dependency snapshots (`renv`), and FAIR-compliant metadata (`CITATION.cff`).

## Getting started

Install SCIproj from GitHub:

```{r install, eval = FALSE}
# install.packages("remotes")
remotes::install_github("saskiaotto/SCIproj")
```

Create a new project with a single call:

```{r basic, eval = FALSE}
library(SCIproj)
create_proj("~/projects/my_analysis")
```

This creates a fully scaffolded research compendium with `renv` and `targets` enabled by default.

### Customizing the call

```{r custom, eval = FALSE}
create_proj("~/projects/baltic_cod",
  add_license = "MIT",
  license_holder = "Jane Doe",
  orcid = "0000-0001-2345-6789",
  use_docker = TRUE,
  use_git = TRUE
)
```

Directory names with underscores or hyphens are fine --- the R package name in `DESCRIPTION` is automatically sanitized (e.g., `baltic_cod` becomes `baltic.cod`).

## Project structure

After creation, the project directory looks like this:

```
your-project/
├── DESCRIPTION            # Project metadata, dependencies, and author info (with ORCID).
├── README.Rmd             # Top-level project description.
├── your-project.Rproj     # RStudio project file.
├── CITATION.cff           # Machine-readable citation metadata for FAIR compliance.
├── CONTRIBUTING.md        # Contribution guidelines.
├── LICENSE.md             # Full license text (here: MIT).
├── NAMESPACE              # Auto-generated by roxygen2 (do not edit by hand).
│
├── data-raw/              # Raw data files and pre-processing scripts.
│   ├── clean_data.R       # Script template for data cleaning.
│   ├── DATA_SOURCES.md    # Data provenance: source, license, DOI, download date.
│   └── ...
│
├── data/                  # Cleaned datasets stored as .rda files.
│
├── R/                     # Custom R functions and dataset documentation.
│   ├── function_ex.R      # Template for custom functions.
│   ├── data.R             # Template for dataset documentation.
│   └── ...
│
├── analyses/              # R scripts or R Markdown/Quarto documents for analyses.
│   ├── figures/           # Generated plots.
│   └── ...
│
├── docs/                  # Publication-ready documents (article, report, presentation).
├── trash/                 # Temporary files that can be safely deleted.
│
├── _targets.R             # Pipeline definition for reproducible workflow.
├── renv/                  # renv library and settings.
├── renv.lock              # Lockfile for reproducible package versions.
└── Dockerfile             # Container definition for full reproducibility.
```

| Directory / File    | Purpose                                              |
|---------------------|------------------------------------------------------|
| `R/`                | Reusable R functions (documented with `roxygen2`)    |
| `data/`             | Cleaned, analysis-ready datasets (`.rda` format)     |
| `data-raw/`         | Raw data files and the script that cleans them       |
| `analyses/`         | Analysis scripts, R Markdown reports, figures        |
| `docs/`             | Manuscripts, presentations, supplementary material   |
| `trash/`            | Temporary files not under version control            |
| `_targets.R`        | Pipeline definition for `targets`                    |
| `CITATION.cff`      | Machine-readable citation metadata                   |
| `CONTRIBUTING.md`   | Guidelines for collaborators                         |

## FAIR compliance

SCIproj encourages **FAIR** (Findable, Accessible, Interoperable, Reusable) research practices through several built-in features:

### CITATION.cff

A [Citation File Format](https://citation-file-format.github.io/) file is created automatically. It includes the project title, author name, version, release date, and optionally a license and ORCID iD. Services like GitHub and Zenodo can parse this file to generate proper citations.

```{r citation, eval = FALSE}
create_proj("my_project",
  license_holder = "Jane Doe",
  orcid = "0000-0001-2345-6789",
  add_license = "MIT"
)
```

### DATA_SOURCES.md

When `data_raw = TRUE` (the default), a `DATA_SOURCES.md` template is placed in `data-raw/`. Use it to document the provenance of every dataset: source, URL, DOI, license, download date, and file names.

### ORCID

Pass your [ORCID iD](https://orcid.org/) via the `orcid` parameter to embed it in `CITATION.cff`, making your authorship unambiguously machine-readable.

## Workflow with targets

By default (`use_targets = TRUE`), SCIproj adds a `_targets.R` pipeline template. The [targets](https://docs.ropensci.org/targets/) package provides:

- **Automatic dependency tracking** --- only outdated targets are re-run.
- **Caching** --- results are stored in the `_targets/` data store.
- **Visualization** --- `tar_visnetwork()` shows the pipeline as a graph.

A typical workflow:

```{r targets, eval = FALSE}
# 1. Define targets in _targets.R

# 2. Inspect the pipeline
targets::tar_manifest()
targets::tar_visnetwork()

# 3. Run the pipeline
targets::tar_make()

# 4. Read a result
targets::tar_read(my_result)
```

Edit `_targets.R` to define your data-loading, analysis, and reporting steps. Each step is a target that depends on upstream targets and R functions in `R/`.

## Dependency management with renv

By default (`use_renv = TRUE`), SCIproj initializes [renv](https://rstudio.github.io/renv/) with the `"explicit"` snapshot type. This means renv discovers dependencies from `DESCRIPTION` rather than scanning all R files, which is the recommended approach for package-based compendia.

Key commands:

```{r renv, eval = FALSE}
renv::status()    # check if lockfile is in sync
renv::snapshot()  # update the lockfile after adding packages
renv::restore()   # reinstall packages from the lockfile
```

The `renv.lock` file should be committed to version control so collaborators can reproduce your exact package versions.

## Optional features

### Docker

Set `use_docker = TRUE` to add a `Dockerfile` and `.dockerignore`. The Dockerfile provides a template for building a container that reproduces your computational environment, independent of the host system.

### GitHub and CI

Set `create_github_repo = TRUE` to create a GitHub repository (requires a configured `GITHUB_PAT`). Add `ci = "gh-actions"` to include a GitHub Actions workflow for automated R CMD check on push.

```{r github, eval = FALSE}
create_proj("my_project",
  use_git = TRUE,
  create_github_repo = TRUE,
  ci = "gh-actions"
)
```

### Licenses

Choose from `"MIT"`, `"GPL"`, `"AGPL"`, `"LGPL"`, `"Apache"`, `"CCBY"`, or `"CC0"` via the `add_license` parameter. The selected license is applied to `DESCRIPTION` and recorded in `CITATION.cff`.

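
Because `add_license` accepts only the identifiers listed above, it can be useful to validate a value before passing it on, for example when the license name is read from a configuration file. The following is a minimal sketch in base R. The names `supported_licenses` and `check_license()` are hypothetical and introduced only for illustration; they are not part of SCIproj.

```r
# Hypothetical helper for illustration; not part of SCIproj.
# License identifiers as listed in this vignette.
supported_licenses <- c("MIT", "GPL", "AGPL", "LGPL", "Apache", "CCBY", "CC0")

check_license <- function(license) {
  # Require a single string that matches one identifier exactly
  ok <- is.character(license) &&
    length(license) == 1L &&
    license %in% supported_licenses
  if (!ok) {
    stop(
      "`license` must be one of: ",
      paste(supported_licenses, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(license)
}

license <- check_license("MIT")
```

The checked value could then be passed on with `create_proj("my_project", add_license = license)`. Exact matching is used here on purpose, since `match.arg()` would also accept partial matches such as `"Apa"`.
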
### testthat

Set `testthat = TRUE` to add testing infrastructure (`tests/testthat.R` and `tests/testthat/`). Writing tests for your analysis functions helps catch regressions early.

### Makefile

Set `makefile = TRUE` to add a `makefile.R` script as an alternative to `targets` for orchestrating your workflow.

## Typical development cycle

1. **Create the project**
   ```r
   SCIproj::create_proj("~/projects/my_study",
     add_license = "MIT",
     license_holder = "Your Name")
   ```
2. **Open the `.Rproj` file** in RStudio.
3. **Add raw data** to `data-raw/` and document it in `DATA_SOURCES.md`.
4. **Write cleaning code** in `data-raw/clean_data.R`; save cleaned data to `data/` with `usethis::use_data()`.
5. **Write analysis functions** in `R/` and document them with `roxygen2`.
6. **Define the pipeline** in `_targets.R` to connect data, functions, and reports.
7. **Run `targets::tar_make()`** to execute the pipeline.
8. **Write reports** in `analyses/` using R Markdown or Quarto, reading results with `targets::tar_read()`.
9. **Snapshot dependencies** with `renv::snapshot()` before sharing.
10. **Push to GitHub** and let CI run `R CMD check` automatically.
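
To make step 5 more concrete, the sketch below shows what a small, `roxygen2`-documented function in `R/` might look like. The function `mean_by_group()`, its arguments, and the example data are hypothetical; they are not taken from the SCIproj templates and use only base R.

```r
#' Compute group means of a numeric column
#'
#' Hypothetical example function; not part of the SCIproj templates.
#'
#' @param data A data frame.
#' @param value Name of a numeric column in `data` (character scalar).
#' @param group Name of the grouping column in `data` (character scalar).
#'
#' @return A data frame with one row per group and the columns
#'   `group` and `mean`.
#' @export
mean_by_group <- function(data, value, group) {
  # Fail early on inputs the function cannot handle
  stopifnot(
    is.data.frame(data),
    value %in% names(data),
    group %in% names(data),
    is.numeric(data[[value]])
  )
  # tapply() returns a named 1-d array with one mean per group
  means <- tapply(data[[value]], data[[group]], mean, na.rm = TRUE)
  data.frame(
    group = names(means),
    mean = as.numeric(means),
    stringsAsFactors = FALSE
  )
}

# Usage with made-up data
catches <- data.frame(
  area = c("SD25", "SD25", "SD26"),
  weight_kg = c(2, 4, 5)
)
result <- mean_by_group(catches, value = "weight_kg", group = "area")
```

In `_targets.R`, a target could then call such a function, for example `tar_target(catch_means, mean_by_group(catches, "weight_kg", "area"))`, where the target and object names are again placeholders.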