---
title: "Creating and Sharing Recipes"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Creating and Sharing Recipes}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  warning = FALSE,
  message = FALSE
)
```

## Why Recipes?

Anyone who works with household survey microdata knows this pattern: you download the raw files, open the codebook, and spend days recoding employment status, harmonizing income variables, and building indicators. Months later, a colleague starts the same project and writes the same code from scratch. In Stata, teams share `.do` files, but these are tightly coupled to specific file paths and variable names, and there is no standard way to discover or validate them.

**Recipes** are metasurvey's answer to this problem. A recipe is a portable, documented, and validated collection of transformation steps that can:

- Be applied to any compatible edition of a survey with a single function call
- Be published to a registry where others can discover and reuse them
- Be certified by institutions for official use
- Automatically generate documentation of their inputs, outputs, and pipeline

## Recipe Lifecycle

```
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│  1. Develop  │────▶│  2. Package  │────▶│ 3. Validate  │
│  steps on a  │     │  steps into  │     │ against new  │
│    survey    │     │   a recipe   │     │     data     │
└──────────────┘     └──────────────┘     └──────┬───────┘
                                                 │
┌──────────────┐     ┌──────────────┐            │
│ 5. Discover  │◀────│  4. Publish  │◀───────────┘
│   & reuse    │     │ to registry  │
│   recipes    │     │    or API    │
└──────────────┘     └──────────────┘
```

## Loading a Survey

The typical starting point is loading a survey with `load_survey()`.
You can load example data or point to your own files:

```r
library(metasurvey)

# Load ECH 2022 with example data (downloads from GitHub)
ech_2022 <- load_survey(
  load_survey_example("ech", "ech_2022"),
  svy_type = "ech",
  svy_edition = "2022",
  svy_weight = add_weight(annual = "pesoano")
)

# Or load with existing recipes from the registry (requires API server)
ech_2022 <- load_survey(
  load_survey_example("ech", "ech_2022"),
  svy_type = "ech",
  svy_edition = "2022",
  svy_weight = add_weight(annual = "pesoano"),
  recipes = get_recipe("ech", "2022")
)
```

## Building a Recipe from Steps

The most common workflow is to develop transformations interactively on a survey and then convert the recorded steps into a recipe.

```{r build-recipe}
library(metasurvey)
library(data.table)

set.seed(42)
n <- 200

# Simulate survey microdata (standing in for load_survey)
dt <- data.table(
  id = 1:n,
  age = sample(18:80, n, replace = TRUE),
  sex = sample(c(1, 2), n, replace = TRUE),
  income = round(runif(n, 5000, 80000)),
  activity = sample(c(2, 3, 5, 6), n,
    replace = TRUE,
    prob = c(0.55, 0.05, 0.05, 0.35)
  ),
  weight = round(runif(n, 0.5, 3.0), 4)
)

svy <- Survey$new(
  data = dt,
  edition = "2023",
  type = "ech",
  psu = NULL,
  engine = "data.table",
  weight = add_weight(annual = "weight")
)

# Develop transformations interactively
svy <- step_compute(svy,
  income_thousands = income / 1000,
  employed = ifelse(activity == 2, 1L, 0L),
  comment = "Income scaling and employment indicator"
)

svy <- step_recode(svy, labor_status,
  activity == 2 ~ "Employed",
  activity %in% 3:5 ~ "Unemployed",
  activity %in% 6:8 ~ "Inactive",
  .default = "Other",
  comment = "ILO labor force classification"
)

svy <- step_recode(svy, age_group,
  age < 25 ~ "Youth",
  age < 45 ~ "Adult",
  age < 65 ~ "Mature",
  .default = "Senior",
  comment = "Standard age groups"
)

# Convert all steps to a recipe
labor_recipe <- steps_to_recipe(
  name = "Labor Force Indicators",
  user = "Research Team",
  svy = svy,
  description = "Standard labor force indicators following ILO definitions",
  steps = get_steps(svy),
  topic = "labor"
)

labor_recipe
```

## Recipe Documentation

Every recipe generates its documentation automatically from its steps. The `doc()` method returns a list with the input variables, the output variables, and the step-by-step pipeline:

```{r recipe-doc}
doc <- labor_recipe$doc()
names(doc)
```

```{r recipe-doc-detail}
# What variables does the recipe need?
doc$input_variables

# What variables does it create?
doc$output_variables

# Step-by-step pipeline
doc$pipeline
```

No manual writing is required: the documentation is derived directly from the recorded steps.

## Validation

Before applying a recipe to new data, verify that all required variables exist. The `validate()` method stops with a clear error if any dependency is missing:

```{r recipe-validate}
labor_recipe$validate(svy)
```

## Applying Recipes to a Survey

Attach one or more recipes to a survey with `add_recipe()` and apply them with `bake_recipes()`:

```{r apply-recipe}
# Create a fresh survey with the same structure (simulating a new edition)
set.seed(99)
dt2 <- data.table(
  id = 1:100,
  age = sample(18:80, 100, replace = TRUE),
  sex = sample(c(1, 2), 100, replace = TRUE),
  income = round(runif(100, 5000, 80000)),
  activity = sample(c(2, 3, 5, 6), 100,
    replace = TRUE,
    prob = c(0.55, 0.05, 0.05, 0.35)
  ),
  weight = round(runif(100, 0.5, 3.0), 4)
)

svy2 <- Survey$new(
  data = dt2,
  edition = "2024",
  type = "ech",
  psu = NULL,
  engine = "data.table",
  weight = add_weight(annual = "weight")
)

# Attach and bake
svy2 <- add_recipe(svy2, labor_recipe)
svy2 <- bake_recipes(svy2)

head(get_data(svy2)[, .(id, income_thousands, labor_status, age_group)], 5)
```

The same recipe applied to a different edition produces the same derived variables (`income_thousands`, `employed`, `labor_status`, and `age_group`), computed with identical rules. This is how metasurvey ensures reproducibility over time.
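
The reproducibility described above comes from storing *rules* rather than computed values. The sketch below illustrates that idea in plain base R, independently of metasurvey: each step is kept as a quoted expression, the inputs and outputs are derived from those expressions with `all.vars()`, and the same steps are replayed on two simulated editions of different sizes. The helpers `step_inputs()`, `step_outputs()`, and `replay_steps()` are hypothetical names for this illustration only; they are not metasurvey's API or its internal implementation.

```r
# Hypothetical sketch (not metasurvey's implementation): a "recipe" as an
# ordered, named list of quoted expressions. Each name is an output variable.
steps <- list(
  income_thousands = quote(income / 1000),
  employed         = quote(ifelse(activity == 2, 1L, 0L)),
  age_group        = quote(ifelse(age < 25, "Youth",
                             ifelse(age < 45, "Adult",
                               ifelse(age < 65, "Mature", "Senior")))),
  high_income      = quote(income_thousands > 50)  # uses an earlier output
)

# Output variables: the names the steps create
step_outputs <- function(steps) names(steps)

# Input variables: symbols the rules reference that no step creates
step_inputs <- function(steps) {
  used <- unique(unlist(lapply(steps, all.vars)))
  setdiff(used, step_outputs(steps))
}

# Replay the steps in order, evaluating each rule with the data's columns in scope
replay_steps <- function(data, steps) {
  for (out in names(steps)) {
    data[[out]] <- eval(steps[[out]], envir = data, enclos = baseenv())
  }
  data
}

# Two simulated "editions" that share a structure but not their rows
make_edition <- function(n) {
  data.frame(
    age      = sample(18:80, n, replace = TRUE),
    income   = round(runif(n, 5000, 80000)),
    activity = sample(c(2, 3, 5, 6), n, replace = TRUE)
  )
}

set.seed(1)
ed_2023 <- replay_steps(make_edition(50), steps)
ed_2024 <- replay_steps(make_edition(80), steps)

step_inputs(steps)    # variables the raw data must supply
step_outputs(steps)   # variables every edition gains
```

Because steps run in order, `high_income` can build on `income_thousands`; `step_inputs()` excludes such intermediate variables, so it lists only what the raw data of each edition must provide.
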

In practice, you can load a survey and apply published recipes in a single call:

```r
# Load ECH 2023 and apply the labor recipe from the registry (requires API)
ech_2023 <- load_survey(
  load_survey_example("ech", "ech_2023"),
  svy_type = "ech",
  svy_edition = "2023",
  svy_weight = add_weight(annual = "pesoano"),
  recipes = get_recipe("ech", "2023", topic = "labor_market"),
  bake = TRUE
)
```

## Categories

Categories help organize recipes by topic:

```{r categories}
cats <- default_categories()
vapply(cats, function(category) category$name, character(1))
```

Add categories to a recipe with `add_category()`:

```{r add-category}
labor_recipe <- add_category(labor_recipe, "labor_market", "Labor market analysis")
labor_recipe <- add_category(labor_recipe, "income", "Income-related indicators")
labor_recipe
```

## Certification

The certification system offers three levels of trust:

| Level       | Meaning                                |
|-------------|----------------------------------------|
| `community` | User contribution (default), no review |
| `reviewed`  | Peer-reviewed by a recognized team     |
| `official`  | Endorsed for official statistics       |

Higher certification levels appear first in search results and signal that the recipe has been reviewed or endorsed.

## Publishing and Discovering Recipes

The real power of recipes lies in sharing them. Every recipe you create can be published to the **metasurvey registry**, where other researchers can discover, reuse, and build on your work.

### Publishing to the Public Registry

The recommended workflow is to publish recipes to the public API. Anyone can browse recipes without an account; publishing requires registration:

```r
# One-time: register and authenticate
api_register("Your Name", "you@example.com", "password")
api_login("you@example.com", "password")

# Publish your recipe (your profile is attached automatically)
api_publish_recipe(labor_recipe)
```

When authenticated, `api_publish_recipe()` attaches your user profile to the recipe.
Other users see who published it, along with the publisher's institutional affiliation and the recipe's certification level. This builds accountability and trust in shared recipes.

### Browsing and Searching

No authentication is needed to browse and download recipes:

```r
# Browse all ECH recipes
ech_recipes <- api_list_recipes(survey_type = "ech")

# Get a specific recipe by ID
r <- api_get_recipe(id = "recipe_id_here")
```

### Interactive Explorer

The Shiny app provides a visual interface for browsing recipes and workflows:

```r
explore_recipes()
```

The explorer shows recipe cards with certification badges, download counts, and pipeline previews. Clicking a recipe opens a detail view with the full pipeline, an R code snippet, and links to related workflows.

### Private Registry for Institutions

Institutions that work with **confidential or restricted-access surveys** may need a private registry. metasurvey supports this through a self-hosted backend with MongoDB:

```r
# Point to your institution's private API
configure_api("https://your-institution.example.com/api")

# From here, the workflow is identical
api_login("analyst@institution.edu", "password")
api_publish_recipe(labor_recipe)
api_list_recipes(survey_type = "ech")
```

See the [API and Database](api-database.html) vignette for instructions on deploying the Plumber API with MongoDB for your own organization.

## Best Practices

1. **Name recipes descriptively** -- include the survey type and topic (e.g., `"ECH Labor Force Indicators"`).
2. **Add descriptions** -- document what the recipe computes and why.
3. **Use categories and topics** -- make recipes easier to discover.
4. **Validate before sharing** -- call `validate()` on sample data to ensure all dependencies exist.
5. **Version your recipes** -- use `set_version()` when updating them.
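
Practice 4 amounts to a dependency check against a small sample before publishing. On a real recipe, `labor_recipe$validate(svy)` performs this check, as shown in the Validation section. The sketch below shows the underlying idea in base R, assuming the recipe's raw input variables are already known; `check_dependencies()`, `required`, and the sample data frames are hypothetical and exist only for this illustration.

```r
# Hypothetical helper written for this illustration; it is not part of metasurvey.
# It stops with an error that names every missing variable at once.
check_dependencies <- function(required, data) {
  missing_vars <- setdiff(required, names(data))
  if (length(missing_vars) > 0) {
    stop("Missing required variables: ",
         paste(missing_vars, collapse = ", "), call. = FALSE)
  }
  invisible(TRUE)
}

# Raw variables the labor force recipe built earlier relies on
required <- c("age", "income", "activity")

# A sample with the expected structure passes silently
good_sample <- data.frame(age = 30, income = 25000, activity = 2)
check_dependencies(required, good_sample)

# A sample where `income` was renamed fails before anything is published
bad_sample <- data.frame(age = 30, income_total = 25000, activity = 2)
result <- tryCatch(
  check_dependencies(required, bad_sample),
  error = function(e) conditionMessage(e)
)
result
#> [1] "Missing required variables: income"
```

Reporting all missing variables in one message, rather than failing on the first, saves a round trip when a new edition renames several columns at once.
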

## Next Steps

- **[Estimation Workflows](workflows-and-estimation.html)** -- Use `workflow()` to compute weighted estimates from processed data
- **[ECH Case Study](ech-case-study.html)** -- See recipes in action in a real labor market analysis
- **[Getting Started](getting-started.html)** -- Review the basics of steps and Survey objects