---
title: "Get started"
format: html
execute:
  eval: true
vignette: >
  %\VignetteIndexEntry{Get started}
  %\VignetteEngine{quarto::html}
  %\VignetteEncoding{UTF-8}
---

```{r results='hide', message=FALSE, warning=FALSE}
library(chmsflow)
```

## 1. Introduction

chmsflow harmonizes variables from the Canadian Health Measures Survey (CHMS) across cycles 1--6 and derives health indicators used in health research. It works with the [recodeflow](https://big-life-lab.github.io/recodeflow/) package to transform raw CHMS variables into analysis-ready versions using recoding rules defined in CSV metadata files.

### What chmsflow provides

The package includes two metadata CSV files (`variables.csv` and `variable-details.csv`) that define how raw CHMS variables are recoded, and 42 functions that derive new health indicators.

The table below summarizes the available variables, organized by `section` and `subject` as defined in `variables.csv`:

| Section | Subject | Examples |
|---------|---------|----------|
| Sociodemographics | Age, sex, ethnicity | `clc_age`, `clc_sex`, `pgdcgt` |
| Socioeconomic | Income, education, occupation, marital status | `adj_hh_income`, `income_quintile`, `edudr04` |
| Health status | Blood pressure, hypertension | `sbp_adj_mmhg`, `htn_status`, `htn_control_status` |
| Health status | Chronic disease (diabetes, CKD, CVD) | `diab_status`, `ckd_status`, `cvd_status` |
| Health status | Medication (8 drug classes from ATC codes) | `ace_med`, `any_htn_med`, `diab_med` |
| Health status | Weight, height, cholesterol | `nonhdl_mmoll`, `waist_height_ratio`, `hwmdbmi` |
| Health status | Family history | `cvd_premature_famhist_status`, `fam_bp` |
| Health behaviour | Alcohol, diet | `alc_risk_score`, `fv_daily_times`, `healthy_diet_indicator` |
| Health behaviour | Exercise | `exercise_min_week`, `enough_exercise_indicator` |
| Health behaviour | Smoking | `pack_years`, `smoke` |

For the full variable list, see [Variable schema
reference](variables_and_variable_details.html).

### Typical workflow

1. **Merge cycle components** - At the RDC, combine household, clinic, and lab data into one object per cycle (e.g., `cycle4`). Keep medication data in a separate object named `cyclex_meds`.
2. **Recode medications first** - If your analysis needs medication variables, always recode them before other variables. See [Recoding medications](recoding_medications.html).
3. **Recode other variables** - Use `rec_with_table()` from recodeflow to transform source variables and derive new ones.

## 2. Installation

```{r, eval=FALSE}
# Install release version from CRAN
install.packages("chmsflow")

# Install the most recent version from GitHub
devtools::install_github("Big-Life-Lab/chmsflow")
```

## 3. Quick start

Use `rec_with_table()` from recodeflow to transform CHMS variables. The cycle data object must be named `cyclex`, where `x` is the cycle number (e.g., `cycle4`), for recoding to work properly.

```{r, warning=FALSE}
library(recodeflow)

# Recode a source variable (age)
cycle4_ages <- rec_with_table(
  cycle4,
  "clc_age",
  variable_details = variable_details,
  log = TRUE
)
head(cycle4_ages)
```

## 4. Variable types

chmsflow handles three types of variables, each recoded differently.

### 4.1 Source variables (direct mapping)

Source variables are mapped directly from raw CHMS columns. Variable names may differ across cycles, but chmsflow harmonizes them to a single name.

```{r, warning=FALSE}
# Recode sex (same variable name across all cycles)
cycle4_sexes <- rec_with_table(
  cycle4,
  "clc_sex",
  variable_details = variable_details,
  log = TRUE
)
head(cycle4_sexes)
```

### 4.2 Transformed variables (continuous to categorical)

Some variables convert continuous measurements into categories using thresholds defined in `variable-details.csv`.

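
As a rough illustration of what threshold-based recoding means, the base R sketch below bins a continuous vector with `cut()`. It does not use chmsflow, and the ages, cut points, and labels are hypothetical values chosen for the example; the thresholds chmsflow actually applies are the ones recorded in `variable-details.csv`.

```{r}
# Hypothetical ages and cut points, for illustration only;
# the real thresholds are defined in variable-details.csv
ages <- c(19, 34, 52, 71)
age_breaks <- c(0, 20, 40, 60, Inf)
age_labels <- c("<20", "20-39", "40-59", "60+")

# right = FALSE closes each interval on the left: [0, 20), [20, 40), ...
age_groups <- cut(ages, breaks = age_breaks, labels = age_labels, right = FALSE)
table(age_groups)
```

With chmsflow, the same kind of binning is driven by the metadata rather than by hard-coded breaks:
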
```{r, warning=FALSE}
# Recode age into 4 groups
cycle4_categorical_ages <- rec_with_table(
  cycle4,
  "agegroup4",
  variable_details = variable_details,
  log = TRUE
)
head(cycle4_categorical_ages)
```

### 4.3 Derived variables (computed by functions)

Derived variables are computed by R functions referenced as `Func::` entries in `variable-details.csv`. These require their input variables to be present in the data. See [Derived variables](derived_variables.html) for details.

```{r, warning=FALSE}
# Derive adjusted systolic blood pressure
# bpmdpbps (raw SBP) must be in the data for sbp_adj_mmhg to be computed
cycle4_adjusted_SBPs <- rec_with_table(
  cycle4,
  c("bpmdpbps", "sbp_adj_mmhg"),
  variable_details = variable_details,
  log = TRUE
)
head(cycle4_adjusted_SBPs)
```

## 5. Next steps

- **Full walkthrough** -- End-to-end hypertension prevalence analysis in [Analysis walkthrough](analysis_walkthrough.html).
- **Medication recoding** -- Required before deriving hypertension or diabetes status. See [Recoding medications](recoding_medications.html).
- **Understanding the metadata** -- Learn about the CSV schema in [Variable schema reference](variables_and_variable_details.html).
- **Derived variables** -- How `Func::` and `DerivedVar::` entries work in [Derived variables](derived_variables.html).
- **Adding variables** -- Extend chmsflow with your own variables in [How to add variables](how_to_add_variables.html).
- **Missing data** -- How `haven::tagged_na()` handles CHMS missing codes in [Missing data (tagged_na)](tagged_na_usage.html).
- **Methodology** -- Why harmonization is non-trivial and how chmsflow works in [Methodology](methodology.html).
- **RDC setup** -- Using chmsflow at a Research Data Centre in [Using chmsflow at an RDC](using_chmsflow_at_an_rdc.html).