--- title: "Reproducibility workflow: snapshot, manifest, SHA-256, Zenodo" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Reproducibility workflow: snapshot, manifest, SHA-256, Zenodo} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set(collapse = TRUE, comment = "#>", eval = FALSE) ``` Published tax research (PBO costings, Grattan reform papers, Tax Institute briefs) has a reproducibility bar that goes beyond "I called `ato_individuals()` and summed column X." Reviewers need to verify that the data you used is exactly the data you say you used. `ato` provides four features to meet that bar: 1. **Snapshot pin** : declare the intended vintage of the data. 2. **SHA-256 integrity** : every cached file is hashed; drift warns. 3. **Session manifest** : every fetch is recorded with URL, SHA, retrieval time, and snapshot pin. 4. **Zenodo DOI** : mint a DOI for the manifest so a paper can cite the exact data snapshot. ## Setup ```{r} library(ato) ato_snapshot("2026-04-24") ato_manifest_clear() ``` ## Fetch your datasets ```{r} ind <- ato_individuals_postcode( year = c("2020-21", "2021-22", "2022-23"), state = "NSW" ) companies <- ato_companies(year = "2022-23", table = "industry") tax_gap <- ato_tax_gaps() ``` Each `ato_tbl` prints with the snapshot pin and SHA-256 digest in its provenance header. ## Inspect the session manifest ```{r} man <- ato_manifest() man[, c("title", "sha256", "retrieved", "snapshot_date")] ``` ## Export the manifest for your paper appendix ```{r} ato_manifest_write("appendix/ato_manifest.csv") ato_manifest_write("appendix/ato_manifest.yaml") ``` ## Mint a DOI via Zenodo A DOI makes "retrieved from data.gov.au on 2026-04-24" citable and immutable. Your paper then cites `doi:10.5281/zenodo.XXXXXXXX` instead of a URL that might rotate. ```{r} dep <- ato_deposit_zenodo( title = "ATO data snapshot for working paper v1", creators = list(list(name = "Author, A.", orcid = "0000-0000-0000-0000")), upload = FALSE # dry run; inspect payload first ) dep$payload$metadata$title # When ready to actually deposit: # Sys.setenv(ZENODO_TOKEN = "...your token...") # dep <- ato_deposit_zenodo(upload = TRUE) # dep$doi_prereserve ``` ## Citing a dataset with full provenance ```{r} ato_cite(ind, style = "bibtex", doi = "10.5281/zenodo.XXXXXXXX") ``` The BibTeX `note` field includes the snapshot date and first 12 hex characters of the SHA-256. That is the verifiable audit trail a reviewer would ask for.