---
title: "Advanced Usage: Equity Analysis and Visualization"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Advanced Usage: Equity Analysis and Visualization}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)
```

This vignette demonstrates advanced analytical workflows using `unicefData`, aligned with the examples in the package documentation paper (Azevedo, 2026). All examples use the same indicators, countries, and parameters as the paper and the Stata help file, enabling cross-language reproducibility.

## Data Acquisition as Infrastructure

The examples in this vignette demonstrate the principle of **treating data acquisition as code**. Notice how each example explicitly specifies:

- Which indicators are requested (e.g., `CME_MRY0T4` for under-5 mortality)
- Which countries or regions are included
- How data are filtered (by sex, wealth, time period)
- What output format is needed (long, wide, or indicators as columns)

This approach contrasts with workflows where researchers:

1. Manually download data from a web portal
2. Apply undocumented filters in Excel or R
3. Manually clean and reshape the data

With unicefData, all these decisions are explicit and version-controlled in your script. This makes your analysis:

- **Auditable**: Anyone can inspect your code and verify exactly what data you used
- **Defensible**: Your data selection decisions are clearly documented
- **Maintainable**: If upstream data change, you can simply re-run your script
- **Reproducible**: Your entire pipeline, from data retrieval to visualization, is in one place

This is especially important in research assisted by AI tools, where automated analysis must rest on transparent and verifiable data foundations.
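As a minimal sketch of this principle, the data-selection decisions listed above can be held in a single named list that is both logged and used to drive the download. The names `request` and `describe_request()` are hypothetical and introduced here only for illustration; the argument names mirror those used with `unicefData()` elsewhere in this vignette, and the download call itself is left commented out because it requires network access.

```{r request-as-code}
# Hypothetical sketch: keep every data-selection decision in one named list.
# The argument names mirror those used with unicefData() in this vignette.
request <- list(
  indicator = "CME_MRY0T4",
  countries = c("AFG", "BGD", "IND", "PAK"),
  year      = "2015:2023",
  sex       = "ALL"
)

# describe_request() is an illustrative helper, not part of unicefData.
# It renders the request as one line per argument for a log or methods note.
describe_request <- function(req) {
  vapply(
    names(req),
    function(nm) paste0(nm, " = ", paste(req[[nm]], collapse = ", ")),
    character(1)
  )
}

writeLines(describe_request(request))

# The same list can drive the actual download (requires network access):
# df <- do.call(unicefData, request)
```

Keeping the request in one object means the logged description and the executed call cannot drift apart, which is the auditability property described above.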
```{r library}
library(unicefData)
library(dplyr)
```

## U5MR trends in South Asia

Reproduce the paper's South Asia mortality trend analysis (paper Examples 5 and 6):

```{r south-asia-trends}
# Fetch under-5 mortality for South Asian countries
df <- unicefData(
  indicator = "CME_MRY0T4",
  countries = c("AFG", "BGD", "BTN", "IND", "MDV", "NPL", "PAK", "LKA")
)

# Filter to total (both sexes)
df_total <- df %>%
  filter(sex == "_T" | is.na(sex))

# Plot trends
plot(
  value ~ period,
  data = df_total[df_total$iso3 == "AFG", ],
  type = "l", col = "red", lwd = 2,
  ylim = range(df_total$value, na.rm = TRUE),
  xlab = "Year",
  ylab = "Under-5 mortality rate (per 1,000)",
  main = "U5MR Trends in South Asia"
)
lines(value ~ period, data = df_total[df_total$iso3 == "BGD", ], col = "blue", lwd = 2)
lines(value ~ period, data = df_total[df_total$iso3 == "IND", ], col = "green", lwd = 2)
lines(value ~ period, data = df_total[df_total$iso3 == "PAK", ], col = "orange", lwd = 2)
legend("topright",
  legend = c("Afghanistan", "Bangladesh", "India", "Pakistan"),
  col = c("red", "blue", "green", "orange"),
  lwd = 2
)
```

The equivalent Stata code from the paper:

```
. unicefdata, indicator(CME_MRY0T4) countries(AFG BGD BTN IND MDV NPL PAK LKA) clear
. keep if sex == "_T"
. graph twoway ///
    (connected value period if iso3 == "AFG", lcolor(red)) ///
    (connected value period if iso3 == "BGD", lcolor(blue)) ///
    (connected value period if iso3 == "IND", lcolor(green)) ///
    (connected value period if iso3 == "PAK", lcolor(orange)), ///
    legend(order(1 "Afghanistan" 2 "Bangladesh" 3 "India" 4 "Pakistan"))
```

## Stunting by wealth quintile

Equity analysis using wealth disaggregation (paper Example 8):

```{r stunting-wealth}
# Fetch stunting data with all wealth quintiles
df <- unicefData(
  indicator = "NT_ANT_HAZ_NE2",
  sex = "ALL",
  wealth = "ALL",
  latest = TRUE
)

# Filter to wealth quintiles only
df_wealth <- df %>%
  filter(wealth_quintile %in% c("Q1", "Q2", "Q3", "Q4", "Q5"))

# Average stunting by wealth quintile (global)
summary_wealth <- df_wealth %>%
  group_by(wealth_quintile) %>%
  summarise(mean_stunting = mean(value, na.rm = TRUE), .groups = "drop") %>%
  arrange(wealth_quintile)

print(summary_wealth)

# Visualize the wealth gradient
barplot(
  summary_wealth$mean_stunting,
  names.arg = summary_wealth$wealth_quintile,
  ylab = "Stunting prevalence (%)",
  main = "Child Stunting by Wealth Quintile",
  col = c("#d73027", "#fc8d59", "#fee090", "#91bfdb", "#4575b4")
)
```

## Wealth gap analysis

Quantify the equity gap between poorest and richest quintiles:

```{r wealth-gap}
# Fetch stunting for specific countries, all wealth quintiles
df <- unicefData(
  indicator = "NT_ANT_HAZ_NE2",
  countries = c("IND", "PAK", "BGD", "ETH"),
  wealth = "ALL",
  latest = TRUE
)

# Keep Q1 and Q5, then compute the wealth gap
# (Q1 - Q5 = poorest minus richest)
df_gap <- df %>%
  filter(wealth_quintile %in% c("Q1", "Q5")) %>%
  tidyr::pivot_wider(
    id_cols = c(iso3, country),
    names_from = wealth_quintile,
    values_from = value
  ) %>%
  mutate(wealth_gap = Q1 - Q5) %>%
  arrange(desc(wealth_gap))

print(df_gap)
```

## Multiple mortality indicators comparison

Compare neonatal and under-5 mortality across countries (paper Example 10):

```{r multi-mortality}
# Fetch multiple mortality indicators
df <- unicefData(
  indicator = c("CME_MRM0", "CME_MRY0T4"),
  countries = c("BRA", "MEX", "ARG", "COL", "PER", "CHL"),
  year = "2020:2023"
)

# Keep latest year per country-indicator
df_latest <- df %>%
  filter(sex == "_T" | is.na(sex)) %>%
  group_by(iso3, indicator) %>%
  slice_max(period, n = 1) %>%
  ungroup()

# Reshape wide for comparison
df_wide <- df_latest %>%
  select(iso3, country, indicator, value) %>%
  tidyr::pivot_wider(names_from = indicator, values_from = value)

print(df_wide)
```

## Global immunization coverage trends

Track DTP3 and MCV1 coverage over time (paper immunization example):

```{r immunization-trends}
# Fetch immunization indicators
df <- unicefData(
  indicator = c("IM_DTP3", "IM_MCV1"),
  year = "2000:2023"
)

# Unweighted average across all returned rows, by year and indicator
trends <- df %>%
  group_by(period, indicator) %>%
  summarise(coverage = mean(value, na.rm = TRUE), .groups = "drop")

# Plot
dtp3 <- trends[trends$indicator == "IM_DTP3", ]
mcv1 <- trends[trends$indicator == "IM_MCV1", ]

plot(coverage ~ period, data = dtp3, type = "l", col = "blue", lwd = 2,
     ylim = c(60, 95),
     xlab = "Year", ylab = "Coverage (%)",
     main = "Global Immunization Coverage Trends")
lines(coverage ~ period, data = mcv1, col = "red", lwd = 2)
legend("bottomright", legend = c("DTP3", "MCV1"),
       col = c("blue", "red"), lwd = 2)
```

## Regional comparison with metadata

Analyze U5MR by UNICEF region using metadata enrichment (paper Example 12):

```{r regional}
# Fetch with regional classifications
df <- unicefData(
  indicator = "CME_MRY0T4",
  add_metadata = c("region", "income_group"),
  latest = TRUE
)

# Filter to countries only (exclude regional aggregates)
df_countries <- df %>%
  filter(geo_type == 0, sex == "_T" | is.na(sex))

# Average U5MR by region
by_region <- df_countries %>%
  group_by(region) %>%
  summarise(avg_u5mr = mean(value, na.rm = TRUE), .groups = "drop") %>%
  arrange(desc(avg_u5mr))

print(by_region)

# Average U5MR by income group
by_income <- df_countries %>%
  group_by(income_group) %>%
  summarise(avg_u5mr = mean(value, na.rm = TRUE), .groups = "drop") %>%
  arrange(desc(avg_u5mr))

print(by_income)
```

## Wide format for time-series analysis

Create panel datasets for econometric analysis (paper Example 9):

```{r wide-timeseries}
# Wide format: years as columns
df_wide <- unicefData(
  indicator = "CME_MRY0T4",
  countries = c("USA", "BRA", "IND", "CHN"),
  year = "2015:2023",
  format = "wide"
)

# Columns will be named yr2015, yr2016, ..., yr2023
# (exact names depend on available data); inspect them
# before computing changes over time
print(df_wide)
```

## Wide indicators for cross-domain analysis

Compare indicators from different domains side-by-side:

```{r wide-indicators}
# One column per indicator
df_cross <- unicefData(
  indicator = c("CME_MRY0T4", "CME_MRY0", "IM_DTP3", "IM_MCV1"),
  countries = c("AFG", "ETH", "PAK", "NGA"),
  latest = TRUE,
  format = "wide_indicators"
)

print(df_cross)

# Correlation between mortality and immunization
if (all(c("CME_MRY0T4", "IM_DTP3") %in% names(df_cross))) {
  cor_val <- cor(df_cross$CME_MRY0T4, df_cross$IM_DTP3, use = "complete.obs")
  message("Correlation between U5MR and DTP3: ", round(cor_val, 3))
}
```

## Sex disaggregation analysis

Examine male-female mortality gaps (paper disaggregation example):

```{r sex-gap}
# Fetch all sex categories
df <- unicefData(
  indicator = "CME_MRY0T4",
  countries = c("IND", "PAK", "BGD"),
  year = 2020,
  sex = "ALL"
)

# Compute male-female gap (biological pattern: male > female)
df_gap <- df %>%
  filter(sex %in% c("M", "F")) %>%
  tidyr::pivot_wider(
    id_cols = c(iso3, country, period),
    names_from = sex,
    values_from = value
  ) %>%
  mutate(mf_gap = M - F)

print(df_gap)
```

## Domain-specific examples

### Nutrition indicators

```{r nutrition}
# Stunting prevalence
df_stunting <- unicefData(indicator = "NT_ANT_HAZ_NE2", latest = TRUE)

# Stunting by wealth (poorest quintile only)
df_q1 <- unicefData(
  indicator = "NT_ANT_HAZ_NE2",
  wealth = "Q1",
  latest = TRUE
)

# Stunting by residence (rural only)
df_rural <- unicefData(
  indicator = "NT_ANT_HAZ_NE2",
  residence = "R",
  latest = TRUE
)
```

### WASH indicators

```{r wash}
# Basic drinking water services
df_water <- unicefData(indicator = "WS_PPL_W-B", latest = TRUE)

# Basic sanitation services
df_sanitation <- unicefData(indicator = "WS_PPL_S-B", latest = TRUE)
```

### Education indicators

```{r education}
# Out-of-school rate (primary)
df_oos <- unicefData(indicator = "ED_ROFST_L1", latest = TRUE)

# Net attendance rate (primary)
df_nar <- unicefData(indicator = "ED_ANAR_L1", latest = TRUE)
```

## Defensive programming

Handle errors gracefully when processing multiple indicators:

```{r defensive}
# Process multiple indicators, some of which may not exist
indicators <- c("CME_MRY0T4", "IM_DTP3", "INVALID_CODE_XYZ")

results <- list()
for (ind in indicators) {
  tryCatch({
    results[[ind]] <- unicefData(indicator = ind, countries = "BRA", latest = TRUE)
    message("OK: ", ind, " (", nrow(results[[ind]]), " rows)")
  }, error = function(e) {
    # A failed indicator is reported and skipped; the loop continues
    message("FAIL: ", ind, " - ", conditionMessage(e))
  })
}
```

The Stata equivalent uses `capture noisily` (paper Example 16):

```
. foreach ind in CME_MRY0T4 IM_DTP3 INVALID {
.     capture noisily unicefdata, indicator(`ind') clear
.     if _rc == 0 {
.         summarize value
.     }
. }
```

## Exporting data

```{r export}
# Fetch and export to CSV
df <- unicefData(
  indicator = "CME_MRY0T4",
  countries = c("ALB", "USA", "BRA", "IND", "CHN", "NGA"),
  year = "2015:2023",
  add_metadata = c("region", "income_group")
)

# Export
write.csv(df, "unicef_mortality_data.csv", row.names = FALSE)
```

## Metadata synchronization

Keep local metadata up to date with the UNICEF Data Warehouse:

```{r sync}
# Sync all metadata
sync_metadata()

# Or sync specific components
sync_dataflows()
sync_indicators()
sync_codelists()
```

## Further reading

- `vignette("unicefData-introduction")` --- Getting started guide
- The package documentation paper: Azevedo, J.P. (2026). "unicefData: Trilingual Library for UNICEF SDMX Indicators."
- `?unicefData` --- Main function documentation
- `?search_indicators` --- Indicator discovery
- `?filter_unicef_data` --- Post-processing filters

---

## Acknowledgments

This package was developed at the UNICEF Data and Analytics Section. The author gratefully acknowledges the collaboration of **Lucas Rodrigues**, **Yang Liu**, and **Karen Avanesian**, whose technical contributions and feedback were instrumental in the development of this R package.

Special thanks to **Yves Jaques**, **Alberto Sibileau**, and **Daniele Olivotti** for designing and maintaining the UNICEF SDMX data warehouse infrastructure that makes this package possible.

The author also acknowledges the **UNICEF database managers** and technical teams who ensure data quality, as well as the country office staff and National Statistical Offices whose data collection efforts make this work possible.

Development of this package was supported by UNICEF institutional funding for data infrastructure and statistical capacity building. The author also acknowledges UNICEF colleagues who provided testing and feedback during development, as well as the broader open-source R community.

Development was assisted by AI coding tools (GitHub Copilot, Claude). All code has been reviewed, tested, and validated by the package maintainers.

## Disclaimer

**This package is provided for research and analytical purposes.**

The `unicefData` package provides programmatic access to UNICEF's public data warehouse. While the author is affiliated with UNICEF, **this package is not an official UNICEF product and the statements in this documentation are the views of the author and do not necessarily reflect the policies or views of UNICEF**.

Data accessed through this package comes from the [UNICEF Data Warehouse](https://sdmx.data.unicef.org/). Users should verify critical data points against official UNICEF publications at [data.unicef.org](https://data.unicef.org/).
This software is provided "as is", without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and noninfringement. In no event shall the authors or UNICEF be liable for any claim, damages or other liability arising from the use of this software.

The designations employed and the presentation of material in this package do not imply the expression of any opinion whatsoever on the part of UNICEF concerning the legal status of any country, territory, city or area or of its authorities, or concerning the delimitation of its frontiers or boundaries.

## Data Citation and Provenance

**Important Note on Data Vintages**

Official statistics are subject to revisions as new information becomes available and estimation methodologies improve. UNICEF indicators are regularly updated based on new surveys, censuses, and improved modeling techniques. Historical values may be revised retroactively to reflect better information or methodological improvements.

**For reproducible research and proper data attribution, users should:**

1. **Document the indicator code** - Specify the exact SDMX indicator code(s) used (e.g., `CME_MRY0T4`)
2. **Record the download date** - Note when data was accessed (e.g., "Data downloaded: 2026-02-09")
3. **Cite the data source** - Reference both the package and the UNICEF Data Warehouse
4. **Archive your dataset** - Save a copy of the exact data used in your analysis

**Example citation for data used in research:**

> Under-5 mortality data (indicator: CME_MRY0T4) accessed from UNICEF Data Warehouse via unicefData R package (v2.1.0) on 2026-02-09. Data available at: https://sdmx.data.unicef.org/

This practice ensures that others can verify your results and understand any differences that may arise from data updates. For official UNICEF statistics in publications, always cross-reference with the current version at [data.unicef.org](https://data.unicef.org/).
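The four steps listed in this section could be scripted roughly as follows. This is a sketch in base R only: `save_with_provenance()` is a hypothetical helper, not a function provided by the package, and the small data frame is a placeholder standing in for a real `unicefData()` result.

```{r provenance-sketch}
# Hypothetical helper (not part of unicefData): archive a dataset together
# with a plain-text record of what was requested and when.
save_with_provenance <- function(df, indicator, path,
                                 pkg_version = NA_character_,
                                 accessed = Sys.Date()) {
  # Step 4: archive the exact data used in the analysis
  write.csv(df, path, row.names = FALSE)

  # Steps 1-3: indicator code, download date, and data source
  note <- c(
    paste("indicator:", paste(indicator, collapse = ", ")),
    paste("accessed:", format(accessed, "%Y-%m-%d")),
    paste("package: unicefData", pkg_version),
    "source: UNICEF Data Warehouse, https://sdmx.data.unicef.org/",
    paste("rows:", nrow(df))
  )
  note_path <- paste0(path, ".provenance.txt")
  writeLines(note, note_path)
  invisible(note_path)
}

# Stand-in for a downloaded data frame; values are placeholders, not real data
df <- data.frame(iso3 = c("AAA", "BBB"), period = c(2020, 2020), value = c(1, 2))

csv_path <- file.path(tempdir(), "unicef_mortality_data.csv")
note_path <- save_with_provenance(df, "CME_MRY0T4", csv_path,
                                  pkg_version = "2.1.0",
                                  accessed = as.Date("2026-02-09"))
readLines(note_path)
```

In a real script, `pkg_version` could be filled from `as.character(utils::packageVersion("unicefData"))` and `accessed` left at its default, so that the sidecar file records the actual download date next to the archived CSV.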
## Citation

If you use this package in your research, please cite:

```
Azevedo, J.P. (2026). unicefData: Trilingual R, Python, and Stata Interface to
UNICEF SDMX Data Warehouse. R package version 2.1.0.
https://github.com/unicef-drp/unicefData
```

For data citations, please refer to the specific UNICEF datasets accessed through the warehouse and cite them according to UNICEF's data citation guidelines.

## License

This package is released under the MIT License. See the LICENSE file for full details.

## Contact & Support

- **Package Maintainer**: Joao Pedro Azevedo (jpazevedo@unicef.org)
- **Report Issues**: https://github.com/unicef-drp/unicefData/issues
- **UNICEF Data Portal**: https://data.unicef.org/
- **SDMX API Documentation**: https://data.unicef.org/sdmx-api-documentation/