--- title: "Getting started with toolero" author: "Erwin Lares" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Getting started with toolero} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ## What is toolero? `toolero` is a small, opinionated toolkit designed to make the first steps of an R project faster and more consistent. It targets researchers and analysts who want to spend less time on setup and more time on the work itself. The package currently provides two functions: - `init_project()` — creates a new R project with a standard folder structure, and optionally initializes `renv` and `git` - `read_clean_csv()` — reads a CSV file and cleans column names in one step Both functions are designed around a simple idea: the decisions you make at the start of a project — how it is organized, how data is read in, how dependencies are tracked — have an outsized effect on how maintainable and reproducible that project turns out to be. `toolero` tries to make the right defaults easy to reach for. --- ## Installation You can install `toolero` from CRAN: ```{r} #| eval: false install.packages("toolero") ``` Or install the development version from GitHub: ```{r} #| eval: false pak::pak("erwinlares/toolero") ``` --- ## Starting a project with `init_project()` ### The problem Starting a new R project usually means the same manual steps every time: create a folder, set up an RStudio project, create subdirectories for data and scripts, initialize `renv`, initialize `git`. None of these steps is hard on its own, but skipping any of them — especially early on — tends to create friction later. A project without `renv` is harder to share. A project without `git` is harder to recover. A project without a consistent folder structure is harder to hand off. ### The solution `init_project()` handles all of this in a single call: ```{r} #| eval: false library(toolero) init_project("~/Documents/my-project") ``` This creates a new RStudio project at the specified path with the following folder structure already in place: ``` my-project/ ├── data/ # input data ├── data-raw/ # original, unprocessed data ├── R/ # reusable functions ├── scripts/ # analysis scripts ├── plots/ # generated visualizations ├── images/ # static images and assets ├── results/ # processed outputs and tables └── docs/ # notes, manuscripts, Quarto documents ``` > **Why this structure?** The folder layout is opinionated but not arbitrary. > Separating `data/` from `data-raw/` makes it clear which files are original > and which have been processed. Keeping `R/` distinct from `scripts/` > encourages moving reusable logic into functions over time, which is a natural > step toward more maintainable code. By default, `init_project()` also initializes `renv` and `git` in the new project. This means the project is reproducible and version-controlled from the first commit. > **Why `renv` and `git` by default?** `renv` ensures that the packages your > project depends on are recorded and reproducible — someone else (or your > future self) can restore the exact same environment. `git` provides a full > history of changes, making it possible to recover from mistakes and understand > how the project evolved. Both are much easier to set up at the start than to > retrofit later. ### Adding extra folders If your project needs folders beyond the defaults, pass them as a character vector via `extra_folders`: ```{r} #| eval: false init_project( "~/Documents/my-project", extra_folders = c("notebooks", "presentations") ) ``` ### Opting out of renv or git If you need to skip one or both: ```{r} #| eval: false init_project( "~/Documents/my-project", use_renv = FALSE, use_git = FALSE ) ``` > **When might you skip `renv` or `git`?** Skipping them is occasionally useful > in teaching or demonstration contexts where the overhead of a full setup is > unnecessary. For any project you plan to share, archive, or return to later, > the defaults are strongly recommended. --- ## Reading data with `read_clean_csv()` ### The problem Reading a CSV file into R is straightforward — until the column names come back with spaces, mixed capitalization, or special characters. Cleaning them up is a small but recurring friction point. ### The solution `read_clean_csv()` combines `readr::read_csv()` and `janitor::clean_names()` into a single call: ```{r} #| eval: false data <- read_clean_csv("data/my-file.csv") ``` Column names are automatically converted to lowercase with underscores — consistent, predictable, and tidyverse-friendly. A column called `First Name` becomes `first_name`. `Q1 Revenue ($)` becomes `q1_revenue`. By default, column type messages from `readr` are suppressed to keep the output clean. If you want to see them — useful when reading an unfamiliar dataset for the first time — set `verbose = TRUE`: ```{r} #| eval: false data <- read_clean_csv("data/my-file.csv", verbose = TRUE) ``` > **Why suppress messages by default?** Column type messages are helpful when > you are first exploring a dataset. In a script or document that runs > repeatedly, they become noise. The `verbose` argument gives you control > without requiring you to remember the `readr` option name. --- ## What's next for toolero? `toolero` is intentionally small right now. The two functions it provides solve a specific, recurring problem — getting a project started correctly. Future versions may include: - Utilities for common data validation patterns - Helpers for writing reproducible reports - Functions that ease the transition from local analysis to running code on remote computing infrastructure The goal is not to be comprehensive. It is to make the right habits easy to reach for from the first line of code. --- ## Citation If you use `toolero` in your work, please cite it: ```{r} #| eval: false citation("toolero") ```