---
title: "Analysing WHO STEPS Survey Data with stepssurvey"
author: "Abhijit Pakhare"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Analysing WHO STEPS Survey Data with stepssurvey}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5,
  fig.align = "center",
  eval = TRUE
)
```

## Introduction

The WHO STEPwise Approach to NCD Risk Factor Surveillance (STEPS) is the standard method for collecting population-level data on chronic disease risk factors. Surveys measure behavioural risk factors (tobacco, alcohol, diet, physical activity), physical measurements (anthropometry, blood pressure), and biochemical markers (blood glucose, cholesterol).

The **stepssurvey** package provides a complete, end-to-end analysis pipeline that takes raw STEPS data from any country and produces publication-ready tables, visualisations, and Word reports -- all while properly accounting for the complex survey design (stratification, clustering, sampling weights).

### What this guide covers

1. Installing the package
2. Understanding the data pipeline
3. Importing and detecting STEPS variables (v3.1 and v3.2)
4. Column mapping for non-standard datasets
5. Cleaning and deriving WHO-standard indicators
6. Configurable indicator thresholds
7. Data quality diagnostics
8. Setting up the complex survey design
9. Computing weighted prevalence estimates
10. Building tables and visualisations (including forest plot and radar chart)
11. Generating reports (fact sheet, data book, country report)
12. Running the full pipeline in one call
13. Using the interactive Shiny app
14. Working with your own data

## Installation

Install the development version from GitHub:

```{r install, eval = FALSE}
# install.packages("pak")
pak::pak("drpakhare/stepssurvey")
```

Or with **devtools**:

```{r install-devtools, eval = FALSE}
# install.packages("devtools")
devtools::install_github("drpakhare/stepssurvey")
```

Load the package:

```{r library}
library(stepssurvey)
```

## The pipeline at a glance

The package follows a linear, modular pipeline. You can use the one-command `run_steps_pipeline()` shortcut or call each step yourself for full control.

```
Raw data (.csv / .xlsx / .dta / .sav)
        |
        v
import_steps_data()          -- read any format
        |
        v
detect_steps_columns()       -- auto-detect v3.1/v3.2 codes
    -- OR --
read_column_mapping()        -- use Excel mapping template
        |
        v
clean_steps_data()           -- derive WHO indicators
        |                       (configurable thresholds)
        v
steps_data_quality()         -- digit preference, completeness,
        |                       plausibility, weight diagnostics
        v
setup_survey_design()        -- step-specific weights
        |
        v
compute_all_indicators()     -- weighted prevalences & means
        |
        +---> build_steps_tables()      -> summary flextables
        +---> build_steps_plots()       -> ggplot2 charts + forest + radar
        +---> render_fact_sheet()       -> Fact Sheet (HTML or Word)
        +---> render_country_report()   -> Summary Report (Word)
        |
        v
compute_all_tables()         -- 60+ WHO registry tables
        |
        v
build_all_tables()           -> 3-panel flextables (Men|Women|Both)
        |
        +---> render_data_book()        -> Detailed Data Book (Word)
```

Each function returns its output, so you can inspect, modify, or export results at every stage.

## Step 1: Generate or import data

### Using the built-in test data generator

For learning and testing, the package includes a realistic data simulator:

```{r generate}
raw <- generate_test_data(n = 3000, seed = 42)
dim(raw)
names(raw)
```

The generated dataset mimics a real STEPS survey with 5 strata, 40 primary sampling units (PSUs), sampling weights, and realistic correlations between risk factors (e.g.
blood pressure increasing with age, higher tobacco use in males).

### Importing real STEPS data

In practice you will have a data file exported from Epi Info, SPSS, or Stata. The `import_steps_data()` function reads all common formats and standardises column names to lowercase with underscores:

```{r import, eval = FALSE}
# CSV
raw <- import_steps_data("data/raw/steps_survey_2024.csv")

# Excel
raw <- import_steps_data("data/raw/steps_survey_2024.xlsx")

# Stata (.dta) -- common for STEPS exports
raw <- import_steps_data("data/raw/steps_survey_2024.dta")

# SPSS (.sav)
raw <- import_steps_data("data/raw/steps_survey_2024.sav")
```

The function uses the file extension to pick the right reader (`readr::read_csv`, `readxl::read_excel`, `haven::read_dta`, or `haven::read_spss`), then passes column names through `janitor::clean_names()` so that regardless of original casing you get a consistent format like `wt_final`, `age`, `sex`.

## Step 2: Auto-detect STEPS variables

WHO STEPS datasets use standardised variable codes, but the codes changed between instrument versions and many countries add their own prefixes. The `detect_steps_columns()` function searches for each variable using a prioritised alias list:

```{r detect}
cols <- detect_steps_columns(raw)
```

It returns a named list mapping each conceptual variable to the actual column found in your data. You can inspect the mapping:

```{r detect-inspect}
# Which column was matched for fasting glucose?
cols$fasting_glucose

# Which column for SBP reading 1?
cols$sbp1

# How many columns were detected?
sum(!sapply(cols, is.null))
```

### Version 3.1 vs 3.2 variable codes

A key feature of the package is transparent support for both WHO STEPS instrument versions.
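The idea of a prioritised, case-insensitive alias search can be pictured with a few lines of base R. This is only an illustrative sketch, not the code inside `detect_steps_columns()`: the helper `find_first_alias()` and the alias vector `sbp1_aliases` (including its ordering) are hypothetical, and only the codes `b1` (v3.1) and `m4a` (v3.2) for the first SBP reading come from this guide.

```{r alias-sketch}
# Hypothetical helper (not part of stepssurvey): return the column matching
# the highest-priority alias, ignoring case; NULL if no alias matches.
find_first_alias <- function(col_names, aliases) {
  hits <- match(tolower(aliases), tolower(col_names))
  hits <- hits[!is.na(hits)]
  if (length(hits) == 0) return(NULL)
  col_names[hits[1]]
}

# Assumed alias order for the first SBP reading: v3.2 code, then v3.1 code
sbp1_aliases <- c("m4a", "b1")

find_first_alias(c("PID", "Age", "B1", "B2"), sbp1_aliases)   # v3.1-style names
find_first_alias(c("pid", "c3", "M4a", "M4b"), sbp1_aliases)  # v3.2-style names
find_first_alias(c("pid", "age"), sbp1_aliases)               # nothing to match
```

Because aliases are tried in order, the position of a code in the alias vector decides which column wins when a dataset happens to contain more than one candidate.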
The variable codes changed substantially between versions:

| Measurement        | v3.1 / Epi Info | v3.2 Instrument |
|--------------------|-----------------|-----------------|
| SBP readings       | B1, B3, B5      | M4a, M5a, M6a   |
| DBP readings       | B2, B4, B6      | M4b, M5b, M6b   |
| BP medications     | B7              | M7 / H3         |
| Height             | M1              | M11             |
| Weight             | M2              | M12             |
| Waist              | M3              | M14             |
| Fasting glucose    | C1 (c1_mmol)    | B5              |
| Diabetes meds      | C5              | B6 / H8         |
| Total cholesterol  | C6              | B8              |
| Cholesterol meds   | C10             | B9 / H14        |
| Sex                | --              | C1              |
| Age                | C1              | C3              |

The detection function includes aliases for both versions, so a dataset using `b1` for SBP (v3.1) and one using `m4a` (v3.2) will both be detected correctly. The search is case-insensitive.

If a column is not found automatically, you can override the mapping before cleaning:

```{r override, eval = FALSE}
cols$fasting_glucose <- "my_custom_glucose_variable"
```

### Column mapping for non-standard datasets

Many real-world STEPS datasets use country-specific variable names that auto-detection cannot resolve. The package includes an Excel mapping template that lets you specify the correspondence between your column names and the standard STEPS variables.

**Step 1:** Get the blank template:

```{r mapping-template, eval = FALSE}
# Copy the template (here to a temporary directory; in practice, use your
# working directory or project folder)
file.copy(
  system.file("templates", "column_mapping_template.xlsx",
              package = "stepssurvey"),
  file.path(tempdir(), "my_mapping.xlsx")
)
```

The template has two sheets: **Instructions** with usage guidance, and **Column Mapping** with 110 standard variables organised by domain (Demographics, Tobacco, Alcohol, Diet, Physical Activity, Anthropometry, Blood Pressure, Biochemical, History & Treatment). Required variables are highlighted in red; optional ones in yellow.

**Step 2:** Open the template in Excel, and for each variable in column A, type your dataset's column name in column C ("Your Column Name"). Leave blank any variables your dataset does not have.
**Step 3:** Read the filled template:

```{r mapping-read, eval = FALSE}
cols <- read_column_mapping("my_mapping.xlsx", data = raw)
```

The `data` argument is optional but recommended -- it validates that every mapped column actually exists in your dataset and warns about typos. The returned `cols` list is identical in structure to what `detect_steps_columns()` produces, so you can pass it directly to `clean_steps_data()`.

The `run_steps_pipeline()` function also accepts a `mapping_file` parameter:

```{r mapping-pipeline, eval = FALSE}
result <- run_steps_pipeline(
  "my_data.dta",
  country_name = "My Country",
  survey_year = 2024,
  mapping_file = "my_mapping.xlsx"
)
```

## Step 3: Clean and derive indicators

The `clean_steps_data()` function performs all WHO-recommended data processing in a single call:

```{r clean}
clean <- clean_steps_data(raw, cols, age_min = 18, age_max = 69)
dim(clean)
```

### What the cleaning step does

**Demographics:**

- Restricts age to the specified range (default 18--69)
- Creates WHO standard age groups: 18--24, 25--34, 35--44, 45--54, 55--64, 65+
- Harmonises sex coding (1/2, "Male"/"Female", "M"/"F" all accepted)
- Ensures survey weight, stratum, and PSU columns are present

**Behavioural risk factors (Step 1):**

- Recodes tobacco and alcohol variables to logical TRUE/FALSE using `recode_yn()`, which understands 0/1, 1/2, "yes"/"no" patterns
- Computes average daily fruit and vegetable servings and flags `low_fruit_veg` (combined < 5 servings/day)
- Classifies physical activity into Low / Moderate / High based on MET-minutes/week thresholds (< 600, 600--2999, >= 3000)

**Physical measurements (Step 2):**

- Applies plausibility checks (e.g.
  height 100--250 cm, weight 20--300 kg, waist 40--200 cm) and sets implausible values to `NA`
- Computes BMI and classifies into Underweight / Normal / Overweight / Obese
- Flags central obesity using WHO waist circumference thresholds (>= 102 cm male, >= 88 cm female)
- Computes waist-to-hip ratio if both measurements are available
- Averages the last two of three BP readings (WHO protocol) to obtain mean SBP and mean DBP
- Creates the `raised_bp` indicator (SBP >= 140 or DBP >= 90 or on medication) and WHO blood pressure staging

**Biochemical measurements (Step 3):**

- Flags raised fasting glucose (>= 7.0 mmol/L or on diabetes medication)
- Flags impaired fasting glucose (6.1--6.9 mmol/L)
- Flags raised total cholesterol (>= 5.0 mmol/L)
- Flags low HDL cholesterol (sex-specific thresholds)
- Flags raised triglycerides (>= 1.7 mmol/L)

### Configurable indicator thresholds

All indicator thresholds can be customised. This is essential when a country uses non-standard definitions (e.g. Mongolia uses 130/80 mmHg for raised blood pressure instead of the WHO default 140/90):

```{r thresholds, eval = FALSE}
clean <- clean_steps_data(raw, cols,
  bp_sbp_threshold = 130,            # SBP threshold (default 140)
  bp_dbp_threshold = 80,             # DBP threshold (default 90)
  bmi_overweight = 25.0,             # BMI overweight (default 25)
  bmi_obese = 30.0,                  # BMI obese (default 30)
  glucose_threshold = 7.0,           # Raised glucose mmol/L (default 7.0)
  glucose_impaired_threshold = 6.1,  # Impaired glucose mmol/L (default 6.1)
  chol_threshold = 5.0               # Raised cholesterol mmol/L (default 5.0)
)
```

The same thresholds are available in `steps_config()` and propagate through `run_steps_pipeline()` and the Shiny app interface.
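To make the effect of a threshold change concrete, the sketch below recomputes a raised blood pressure flag on a handful of made-up readings using base R. It is an illustration under stated assumptions, not the code inside `clean_steps_data()`: the data frame `toy_bp`, its column names, and the helper `flag_raised_bp()` are hypothetical. It averages the second and third readings, as described in the cleaning rules, and then applies the 140/90 and 130/80 cut-offs.

```{r threshold-sketch}
# Made-up readings for five people (hypothetical column names)
toy_bp <- data.frame(
  sbp1 = c(150, 128, 135, 118, 142),
  sbp2 = c(146, 126, 133, 116, 138),
  sbp3 = c(144, 124, 131, 114, 136),
  dbp1 = c(95, 80, 84, 70, 88),
  dbp2 = c(92, 78, 82, 72, 86),
  dbp3 = c(90, 76, 82, 70, 84),
  on_bp_meds = c(FALSE, FALSE, FALSE, TRUE, FALSE)
)

# Hypothetical helper: mean of readings 2 and 3, then threshold or medication
flag_raised_bp <- function(d, sbp_cut = 140, dbp_cut = 90) {
  mean_sbp <- rowMeans(d[, c("sbp2", "sbp3")], na.rm = TRUE)
  mean_dbp <- rowMeans(d[, c("dbp2", "dbp3")], na.rm = TRUE)
  mean_sbp >= sbp_cut | mean_dbp >= dbp_cut | d$on_bp_meds
}

# WHO default (140/90) versus a stricter national definition (130/80)
data.frame(
  mean_sbp = rowMeans(toy_bp[, c("sbp2", "sbp3")]),
  mean_dbp = rowMeans(toy_bp[, c("dbp2", "dbp3")]),
  raised_140_90 = flag_raised_bp(toy_bp),
  raised_130_80 = flag_raised_bp(toy_bp, sbp_cut = 130, dbp_cut = 80)
)
```

With these made-up values, two of the five people are flagged under 140/90 and four under 130/80, which shows how strongly a prevalence estimate can depend on the definition chosen.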
You can inspect the derived variables:

```{r clean-inspect}
# BMI categories
table(clean$bmi_category, clean$sex)

# Blood pressure staging
table(clean$bp_stage)

# Physical activity levels
table(clean$pa_category, clean$sex)
```

## Step 3b: Data quality diagnostics

Before proceeding with analysis, the package provides a comprehensive data quality assessment. The `steps_data_quality()` function checks four dimensions:

```{r quality, eval = FALSE}
quality <- steps_data_quality(clean)
names(quality)
# [1] "digit_preference" "completeness" "plausibility" "weights"
```

**Digit preference** detects heaping on terminal digits 0 and 5 in blood pressure and anthropometric measurements -- a common data collection artefact:

```{r digit-plot, eval = FALSE}
plot_digit_preference(quality, measure = "sbp")
```

**Completeness** reports the percentage of non-missing values for every key variable, helping identify modules that may have been skipped.

```{r completeness-plot, eval = FALSE}
plot_completeness(quality)
```

**Plausibility** flags values outside physiologically reasonable ranges (e.g. systolic BP > 300 mmHg, height < 100 cm).

**Sampling weights** shows the distribution and coefficient of variation of each step-specific weight, helping detect extreme weights that might destabilise survey estimates:

```{r weights-plot, eval = FALSE}
plot_weights(quality)
```

In the Shiny app, the **Quality** tab presents all four diagnostics interactively with summary value boxes.

## Step 4: Set up the survey design

STEPS surveys use complex sampling designs. Ignoring the design leads to biased estimates and incorrect confidence intervals. The `setup_survey_design()` function creates a `survey::svydesign` object that accounts for weights, stratification, and clustering:

```{r design}
designs <- setup_survey_design(clean)
```

The returned object is a list with three elements (`step1`, `step2`, `step3`), each a `survey::svydesign` object weighted appropriately for that step of the survey.
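What one such design object amounts to can be sketched directly with the **survey** package on simulated data. This is not the internals of `setup_survey_design()`; the data frame `toy_svy` and its columns (`stratum`, `psu`, `wt`, `raised_bp`) are made up for illustration, and the real function may choose its arguments differently.

```{r design-sketch}
library(survey)

set.seed(1)
# Simulated sample: 2 strata x 4 PSUs x 25 respondents (hypothetical columns)
toy_svy <- data.frame(
  stratum = rep(1:2, each = 100),
  psu = rep(1:8, each = 25),
  wt = runif(200, min = 50, max = 150),
  raised_bp = rbinom(200, size = 1, prob = 0.3)
)

# Full complex design: clusters + strata + weights
toy_design <- svydesign(
  ids = ~psu,
  strata = ~stratum,
  weights = ~wt,
  data = toy_svy
)

# Weighted prevalence with a design-based standard error and 95% CI
est <- svymean(~raised_bp, toy_design)
est
confint(est)
```

Dropping `strata` or replacing `ids = ~psu` with `ids = ~1` in the call gives the simpler design variants; the point estimate stays the same, but the standard error and confidence interval change.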
Functions like `compute_all_indicators()` accept this list directly, but for custom estimates you pick the design matching the step of the variable you are analysing (for example, `designs$step2` for a blood pressure variable).

The function auto-detects the design complexity based on which columns are present:

- **Full complex design**: weights + strata + clusters
- **Weights + clusters**: no stratification variable
- **Weights + strata**: no clustering (rare)
- **Weights only**: self-representing design
- **Unweighted**: simple random sample (weights set to 1)

Sampling weights are used as-is without trimming, consistent with the WHO official STEPS analysis scripts. The returned object can be used with any function from the **survey** package if you need custom analyses beyond what the package provides.

## Step 5: Compute indicators

### All indicators at once

```{r indicators}
result <- compute_all_indicators(designs)
```

This returns a list with two elements:

- `result$results` -- a nested list of domain-specific estimates (total, by sex, by age group)
- `result$key_indicators` -- a tidy data frame of headline prevalences

```{r key}
result$key_indicators
```

### Domain-specific functions

For more control, call each domain function separately:

```{r domain, eval = FALSE}
tob  <- compute_tobacco_indicators(designs$step1)
alc  <- compute_alcohol_indicators(designs$step1)
diet <- compute_diet_pa_indicators(designs$step1)
anth <- compute_anthropometry_indicators(designs$step2)
bp   <- compute_bp_indicators(designs$step2)
bio  <- compute_biochemical_indicators(designs$step3)
```

Each returns a named list.
For example, the tobacco module returns:

```{r tob-example}
tob <- compute_tobacco_indicators(designs$step1)
names(tob)

# Overall prevalence of current tobacco use
tob$current_tobacco_total

# Prevalence by sex
tob$current_tobacco_by_sex
```

### Custom weighted estimates

The package exports two low-level helpers for any weighted estimate you need:

```{r custom}
# Weighted proportion with 95% CI (raised_bp is a Step 2 variable)
svyprop(~raised_bp, designs$step2)

# Stratified by sex
svyprop(~raised_bp, designs$step2, by = ~sex)

# Weighted mean with 95% CI
svymn(~mean_sbp, designs$step2, by = ~sex)
```

## Step 6: Build publication-ready tables

The package provides **two table systems** for different purposes.

### Summary tables (Both Sexes only)

```{r tables}
tables <- build_steps_tables(result$results)
names(tables)
```

Each table is a **flextable** object styled with WHO STEPS branding (dark blue headers, formatted confidence intervals). These tables show estimates by age group for Both Sexes combined -- ideal for summary reports and quick reference.

```{r table-show, eval = FALSE}
# Display the raised blood pressure table
tables$raised_bp

# Export to Word
flextable::save_as_docx(tables$raised_bp,
                        path = file.path(tempdir(), "bp_table.docx"))
```

### Detailed WHO 3-panel tables (Men | Women | Both Sexes)

For the full WHO STEPS data book format, use the detailed table engine. This produces ~60 tables in the standard 3-panel layout (Age Group | Men | Women | Both Sexes):

```{r detailed-tables, eval = FALSE}
# Step 1: Compute raw results from the table registry
computed <- compute_all_tables(designs)

# Step 2: Format into flextable objects with WHO styling
detailed <- build_all_tables(computed)
names(detailed)
# e.g. "T_current_smokers", "M_bp_mean", "B_glucose_raised"
```

The table IDs use prefixes matching WHO STEPS domains:

| Prefix | Domain                     |
|--------|----------------------------|
| T_     | Tobacco                    |
| A_     | Alcohol                    |
| D_     | Diet                       |
| P_     | Physical Activity          |
| H_     | Health History & Treatment |
| M_     | Physical Measurements      |
| B_     | Biochemical Measurements   |
| R_     | Cardiovascular Risk        |
| RF_    | Combined Risk Factors      |

You can access individual tables by ID or filter by section:

```{r registry, eval = FALSE}
# Browse the full table registry
registry <- steps_table_registry()

# Get all tables for a specific section
bp_entries <- get_registry_by_section("Blood Pressure")

# Get all Step 2 tables
step2_entries <- get_registry_by_step(2)

# List available sections
list_registry_sections()
```

## Step 7: Create visualisations

```{r plots, fig.width = 8, fig.height = 5}
plots <- build_steps_plots(
  indicators = result$results,
  key_indicators = result$key_indicators,
  country_name = "Exampleland",
  survey_year = 2024
)
names(plots)
```

### Overview chart

The overview plot shows all key indicators as a horizontal bar chart with 95% confidence intervals, sorted by prevalence:

```{r overview, fig.width = 8, fig.height = 5}
plots$overview
```

### Sex-stratified dashboard

If multiple sex-stratified indicators are available, the package creates a 2 x 2 dashboard using **patchwork**:

```{r dashboard, fig.width = 10, fig.height = 7, eval = FALSE}
plots$sex_dashboard
```

### Age-stratified trends

Age trend plots show how each risk factor varies across the WHO standard age groups, with shaded confidence bands:

```{r age-trend, fig.width = 8, fig.height = 4.5}
plots$bp_by_age
```

### Forest plot

The forest plot shows all key indicators as horizontal point-and-CI estimates, colour-coded by STEPS domain:

```{r forest, fig.width = 8, fig.height = 6, eval = FALSE}
plots$forest

# Or build standalone:
build_forest_plot(result$key_indicators, "Exampleland", 2024)
```

### Risk profile radar chart

The radar
(spider) chart provides a visual fingerprint of the country's NCD risk factor profile, making it easy to spot which domains are most affected:

```{r radar, fig.width = 7, fig.height = 7, eval = FALSE}
plots$radar

# Or build standalone:
build_radar_plot(result$key_indicators, "Exampleland", 2024)
```

### Saving plots

```{r save-plots, eval = FALSE}
save_steps_plots(plots, output_dir = file.path(tempdir(), "figures"))
# Creates, inside output_dir:
#   01_overview_indicators.png
#   02_by_sex_dashboard.png
#   03_bp_by_age.png
#   04_obesity_by_age.png
#   05_forest_plot.png
#   06_radar_plot.png
```

### WHO STEPS colour palette and theme

The package uses a consistent visual identity. You can apply the same styling to your own ggplot2 plots:

```{r theme, fig.width = 6, fig.height = 3.5}
pal <- steps_colors()
str(pal)

library(ggplot2)
ggplot(clean, aes(x = age_group, fill = sex)) +
  geom_bar(position = "dodge") +
  scale_fill_manual(values = c(Male = pal$male, Female = pal$female)) +
  theme_steps() +
  labs(title = "Sample distribution by age and sex")
```

## Step 8: Generate Word reports

The package produces three complementary reports:

| Report | Function | Format | Content |
|--------|----------|--------|---------|
| **Fact Sheet** | `render_fact_sheet()` | HTML or Word | One-page overview with radar chart, summary table, and key findings |
| **Summary Report** | `render_country_report()` | Word | Narrative with key findings, charts, and recommendations |
| **Detailed Data Book** | `render_data_book()` | Word | Complete WHO 3-panel tables (Men \| Women \| Both Sexes) across all domains |

```{r reports, eval = FALSE}
cfg <- steps_config(
  data_path = "data/raw/steps_survey_2024.csv",
  country_name = "Exampleland",
  survey_year = 2024,
  age_min = 18,
  age_max = 69
)

# Fact sheet -- one-page overview (HTML for sharing, Word for print)
render_fact_sheet(cfg, output_dir = "outputs", format = "html")
render_fact_sheet(cfg, output_dir = "outputs", format = "word")

# Summary report -- narrative with key findings, charts, recommendations
render_country_report(cfg, output_dir = "outputs")

# Data book -- detailed WHO 3-panel tables by domain
render_data_book(cfg, output_dir = "outputs")
```

Each function runs the entire pipeline internally (import, clean, analyse) and renders an R Markdown template to a Word document (or to HTML, for the fact sheet with `format = "html"`). The output files are saved in the specified directory.

### What each report contains

The **Fact Sheet** is a single-page overview with a branded header, summary table of key indicators (noting any non-default thresholds), the radar chart, sex-stratified dashboard, and forest plot. The HTML version is self-contained and ideal for web sharing; the Word version is print-ready.

The **Summary Report** includes an executive summary table, narrative sections for each risk factor domain with inline prevalence figures, embedded charts (overview indicators, by-sex breakdowns, age trends), and WHO-aligned policy recommendations.

The **Data Book** contains the full set of ~60 WHO STEPS tables in the standard 3-panel format. Each table shows estimates by age group separately for Men, Women, and Both Sexes. Tables are organised by STEPS step: Step 1 (Behavioural), Step 1.5 (Health History), Step 2 (Physical Measurements), Step 3 (Biochemical), and Combined Risk Factors.
## One-command pipeline

For the fastest path from raw data to results, `run_steps_pipeline()` chains every step and returns all intermediate objects:

```{r pipeline, eval = FALSE}
out <- run_steps_pipeline(
  data_path = "data/raw/steps_survey_2024.csv",
  country_name = "Exampleland",
  survey_year = 2024,
  age_min = 18,
  age_max = 69,
  output_dir = "outputs",
  render_reports = TRUE
)

# Access any intermediate result
out$raw_data
out$clean_data
out$design
out$indicators
out$key_indicators
out$tables
out$plots
```

Setting `render_reports = FALSE` skips the Word documents (useful for interactive exploration or when **rmarkdown** / Pandoc are not available).

If your dataset uses non-standard column names, pass a filled mapping template:

```{r pipeline-mapping, eval = FALSE}
out <- run_steps_pipeline(
  "my_data.dta",
  country_name = "My Country",
  survey_year = 2024,
  mapping_file = "my_mapping.xlsx"
)
```

## Working with real STEPS data

### Preparing your data file

The package accepts data in four formats:

| Format     | Extension | Typical source                   |
|------------|-----------|----------------------------------|
| CSV        | .csv      | Spreadsheet export               |
| Excel      | .xlsx     | Direct data entry                |
| Stata      | .dta      | WHO Epi Info / analysis template |
| SPSS       | .sav      | SPSS data export                 |

Before importing, ensure the file contains at minimum:

- **Age** and **sex** columns (required for all analyses)
- **Sampling weight** column (recommended; if absent, all weights are set to 1)
- At least some risk factor measurements from Step 1, 2, or 3

### Handling custom variable names

For datasets with a few non-standard names, override individual mappings after auto-detection:

```{r custom-names, eval = FALSE}
raw <- import_steps_data("my_steps_data.csv")
cols <- detect_steps_columns(raw)
cols$fasting_glucose <- "blood_sugar_fasting"
cols$sbp1 <- "systolic_bp_1"
clean <- clean_steps_data(raw, cols)
```

For datasets where many or most variables have non-standard names, use the column mapping template instead (see the "Column mapping for non-standard datasets" section above). This is the recommended approach for real-world STEPS microdata.

### Adjusting age range

Some STEPS surveys target populations outside the standard 18--69 range. Adjust with the `age_min` and `age_max` parameters:

```{r age-range, eval = FALSE}
# Include up to age 79
clean <- clean_steps_data(raw, cols, age_min = 18, age_max = 79)
```

Note that changing the upper age limit adds a wider final age group (e.g. 65--79 instead of 65+).

## Complete worked example

This section walks through a full analysis using simulated data, showing every step from generation to output.

```{r worked, fig.width = 8, fig.height = 5}
library(stepssurvey)

# 1. Generate a realistic test dataset
raw <- generate_test_data(n = 3000, seed = 42)

# 2. Detect standard STEPS variable columns
cols <- detect_steps_columns(raw)

# 3. Clean data and derive all indicators
clean <- clean_steps_data(raw, cols, age_min = 18, age_max = 69)

# 4. Create the complex survey design
designs <- setup_survey_design(clean)

# 5. Compute all NCD risk factor indicators
result <- compute_all_indicators(designs)

# 6. View headline estimates
result$key_indicators

# 7. Build formatted tables
tables <- build_steps_tables(result$results)

# 8. Build visualisations
plots <- build_steps_plots(
  indicators = result$results,
  key_indicators = result$key_indicators,
  country_name = "Exampleland",
  survey_year = 2024
)

# 9. Display the overview chart
plots$overview
```

## Interactive Shiny app

For users who prefer a point-and-click interface, the package includes a full-featured Shiny application:

```{r shiny, eval = FALSE}
library(stepssurvey)
run_app()
```

The app guides you through the same pipeline in seven tabs:

1. **Upload** -- load data (or use built-in demo data), set country name, survey year, age range, indicator thresholds, and optionally upload a column mapping template
2. **Clean** -- run WHO-standard cleaning with summary statistics
3. **Quality** -- interactive data quality diagnostics (digit preference, completeness, plausibility, sampling weights)
4. **Design** -- set up the complex survey design with step-specific weights
5. **Indicators** -- compute all NCD risk factor indicators with tabulated results
6. **Visualise** -- interactive plots including overview, sex dashboard, age trends, forest plot, and radar chart
7. **Reports** -- one-click generation of fact sheet (HTML/Word), country report, and data book with download buttons

A deployed version is available at .

## WHO standard definitions used

The package implements the following WHO STEPS definitions for all derived indicators:

| Indicator                      | Definition                                                           |
|--------------------------------|----------------------------------------------------------------------|
| Current tobacco use            | Currently smokes any tobacco product (T1 = Yes)                      |
| Daily tobacco use              | Smokes tobacco daily (T2 = Yes)                                      |
| Current alcohol use            | Consumed alcohol in the past 30 days (A5 = Yes)                      |
| Heavy episodic drinking        | 6 or more standard drinks on a single occasion in past 30 days (A9)  |
| Insufficient physical activity | Total MET-minutes per week < 600                                     |
| Low fruit and vegetable intake | Combined < 5 servings per day                                        |
| Overweight or obese            | BMI >= 25 kg/m^2^ (configurable)                                     |
| Obese                          | BMI >= 30 kg/m^2^ (configurable)                                     |
| Central obesity                | Waist >= 102 cm (male) or >= 88 cm (female)                          |
| Raised blood pressure          | Mean SBP >= 140 or mean DBP >= 90 or on BP meds (configurable)       |
| Raised fasting glucose         | Fasting glucose >= 7.0 mmol/L or on diabetes meds (configurable)     |
| Impaired fasting glucose       | Fasting glucose 6.1--6.9 mmol/L (configurable)                       |
| Raised total cholesterol       | Total cholesterol >= 5.0 mmol/L (configurable)                       |
| Low HDL cholesterol            | HDL < 1.0 mmol/L (male) or < 1.3 mmol/L (female)                     |
| Raised triglycerides           | Triglycerides >= 1.7 mmol/L                                          |

Blood pressure readings follow the WHO protocol of averaging the last two of three measurements taken three minutes
apart.

## FAQ

**Can I use this package with STEPS surveys from any country?**

Yes. The variable detection system supports both v3.1 and v3.2 naming conventions, plus common country-specific aliases. Override any undetected columns manually as shown above.

**What if my dataset is missing some risk factor modules?**

The package handles missing modules gracefully. If, for example, no biochemical columns are found, the glucose and cholesterol indicators are simply skipped and the tables and plots adapt accordingly.

**Can I add my own indicators?**

Absolutely. After the cleaning step you have a standard data frame with all derived variables. Use the `survey::svydesign` object with `svyprop()` or `svymn()` (or any **survey** package function) for custom analyses.

**How do I cite this package?**

```{r cite, eval = FALSE}
citation("stepssurvey")
```

## Further resources

- [WHO STEPS Manual](https://www.who.int/teams/noncommunicable-diseases/surveillance/systems-tools/steps/manuals)
- [STEPS Instrument v3.2 (Q-by-Q Guide)](https://www.who.int/teams/noncommunicable-diseases/surveillance/systems-tools/steps/instrument)
- [STEPS Data Analysis Tools](https://www.who.int/teams/noncommunicable-diseases/surveillance/systems-tools/steps/data-analysis-reporting-tools)
- [Package source on GitHub](https://github.com/drpakhare/stepssurvey)

## Session info

```{r session}
sessionInfo()
```