---
title: "Advanced Usage: Equity Analysis and Visualization"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Advanced Usage: Equity Analysis and Visualization}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)
```

This vignette demonstrates advanced analytical workflows using `unicefData`,
aligned with the examples in the package documentation paper (Azevedo, 2026).
All examples use the same indicators, countries, and parameters as the paper
and the Stata help file, enabling cross-language reproducibility.

## Data Acquisition as Infrastructure

The examples in this vignette demonstrate the principle of **treating data acquisition as code**. Notice how each example explicitly specifies:

- Which indicators are requested (e.g., `CME_MRY0T4` for under-5 mortality)
- Which countries or regions are included
- How data are filtered (by sex, wealth, time period)
- What output format is needed (long, wide, or indicators as columns)

This approach contrasts with workflows where researchers:

1. Manually download data from a web portal
2. Apply undocumented filters in Excel or R
3. Manually clean and reshape the data

With `unicefData`, all of these decisions are explicit and version-controlled in your script. This makes your analysis:

- **Auditable**: Anyone can inspect your code and verify exactly what data you used
- **Defensible**: Your data selection decisions are clearly documented
- **Maintainable**: If upstream data change, you can simply re-run your script
- **Reproducible**: Your entire pipeline, from data retrieval to visualization, is in one place

This is especially important in research assisted by AI tools, where automated analysis must rest on transparent and verifiable data foundations.

```{r library}
library(unicefData)
library(dplyr)
```

## U5MR trends in South Asia

Reproduce the paper's South Asia mortality trend analysis (paper Examples 5 and 6):

```{r south-asia-trends}
# Fetch under-5 mortality for South Asian countries
df <- unicefData(
  indicator = "CME_MRY0T4",
  countries = c("AFG", "BGD", "BTN", "IND", "MDV", "NPL", "PAK", "LKA")
)

# Filter to total (both sexes) and sort so each line is drawn in year order
df_total <- df %>%
  filter(sex == "_T" | is.na(sex)) %>%
  arrange(iso3, period)

# Plot trends
plot(
  value ~ period,
  data = df_total[df_total$iso3 == "AFG", ],
  type = "l", col = "red", lwd = 2,
  ylim = range(df_total$value, na.rm = TRUE),
  xlab = "Year", ylab = "Under-5 mortality rate (per 1,000)",
  main = "U5MR Trends in South Asia"
)
lines(value ~ period, data = df_total[df_total$iso3 == "BGD", ], col = "blue", lwd = 2)
lines(value ~ period, data = df_total[df_total$iso3 == "IND", ], col = "green", lwd = 2)
lines(value ~ period, data = df_total[df_total$iso3 == "PAK", ], col = "orange", lwd = 2)
legend("topright",
  legend = c("Afghanistan", "Bangladesh", "India", "Pakistan"),
  col = c("red", "blue", "green", "orange"), lwd = 2
)
```

The equivalent Stata code from the paper:

```
. unicefdata, indicator(CME_MRY0T4) countries(AFG BGD BTN IND MDV NPL PAK LKA) clear
. keep if sex == "_T"
. graph twoway ///
    (connected value period if iso3 == "AFG", lcolor(red)) ///
    (connected value period if iso3 == "BGD", lcolor(blue)) ///
    (connected value period if iso3 == "IND", lcolor(green)) ///
    (connected value period if iso3 == "PAK", lcolor(orange)), ///
        legend(order(1 "Afghanistan" 2 "Bangladesh" 3 "India" 4 "Pakistan"))
```
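
The R example above draws four of the eight requested countries by hand. As a minimal sketch, the same base-graphics approach can loop over every country present in the data. The helper name `plot_country_trends()` and the toy data frame are illustrative assumptions and not part of `unicefData`; only the column names `iso3`, `period`, and `value` are taken from the chunk above.

```{r south-asia-trends-loop}
# Hypothetical helper: draw one line per country found in `df`.
# Assumes columns iso3, period, value, as in the chunk above.
plot_country_trends <- function(df, main = "U5MR Trends") {
  df <- df[order(df$iso3, df$period), ]
  codes <- unique(df$iso3)
  cols <- setNames(grDevices::hcl.colors(length(codes), "Dark 3"), codes)
  # Empty canvas sized to the full data range
  plot(range(df$period), range(df$value, na.rm = TRUE), type = "n",
       xlab = "Year", ylab = "Under-5 mortality rate (per 1,000)",
       main = main)
  for (code in codes) {
    lines(value ~ period, data = df[df$iso3 == code, ],
          col = cols[[code]], lwd = 2)
  }
  legend("topright", legend = codes, col = cols, lwd = 2)
  invisible(cols)
}

# Toy stand-in for df_total (made-up values, not UNICEF data)
toy <- data.frame(
  iso3 = rep(c("AAA", "BBB"), each = 3),
  period = rep(2000:2002, times = 2),
  value = c(90, 85, 80, 60, 58, 55)
)
cols <- plot_country_trends(toy)
```

With real data, `plot_country_trends(df_total, main = "U5MR Trends in South Asia")` would presumably cover all eight countries, at the cost of a legend that shows ISO codes rather than country names.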

## Stunting by wealth quintile

Equity analysis using wealth disaggregation (paper Example 8):

```{r stunting-wealth}
# Fetch stunting data with all wealth quintiles
df <- unicefData(
  indicator = "NT_ANT_HAZ_NE2",
  sex = "ALL",
  wealth = "ALL",
  latest = TRUE
)

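# Filter to wealth quintiles only, keeping the both-sexes total so that
# sex-specific rows (requested via sex = "ALL") are not averaged in
df_wealth <- df %>%
  filter(
    wealth_quintile %in% c("Q1", "Q2", "Q3", "Q4", "Q5"),
    sex == "_T" | is.na(sex)
  )

# Unweighted mean of stunting across all returned rows, by wealth quintile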
summary_wealth <- df_wealth %>%
  group_by(wealth_quintile) %>%
  summarise(mean_stunting = mean(value, na.rm = TRUE), .groups = "drop") %>%
  arrange(wealth_quintile)

print(summary_wealth)

# Visualize the wealth gradient
barplot(
  summary_wealth$mean_stunting,
  names.arg = summary_wealth$wealth_quintile,
  ylab = "Stunting prevalence (%)",
  main = "Child Stunting by Wealth Quintile",
  col = c("#d73027", "#fc8d59", "#fee090", "#91bfdb", "#4575b4")
)
```

## Wealth gap analysis

Quantify the equity gap between poorest and richest quintiles:

```{r wealth-gap}
# Fetch stunting for specific countries with Q1 and Q5
df <- unicefData(
  indicator = "NT_ANT_HAZ_NE2",
  countries = c("IND", "PAK", "BGD", "ETH"),
  wealth = "ALL",
  latest = TRUE
)

# Compute wealth gap (Q1 - Q5 = poorest minus richest)
df_gap <- df %>%
  filter(wealth_quintile %in% c("Q1", "Q5")) %>%
  tidyr::pivot_wider(
    id_cols = c(iso3, country),
    names_from = wealth_quintile,
    values_from = value
  ) %>%
  mutate(wealth_gap = Q1 - Q5) %>%
  arrange(desc(wealth_gap))

print(df_gap)
```
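
Reshaping to one column per quintile presumes a single row per country and quintile. If a query returns several rows for the same key (for instance one per sex category, which is an assumption about what a given call returns), `tidyr::pivot_wider()` warns and produces list-columns, and `Q1 - Q5` no longer works on them. The sketch below uses base R and made-up values to check the key first, then reports the absolute gap alongside a relative gap (the Q1/Q5 ratio). The helper name `wealth_gap_table()` is hypothetical.

```{r wealth-gap-check}
# Hypothetical helper: absolute and relative Q1-Q5 gap per country.
# Assumes columns iso3, wealth_quintile, value, as in the chunk above.
wealth_gap_table <- function(df) {
  df <- df[df$wealth_quintile %in% c("Q1", "Q5"), ]
  # Stop early if any country-quintile pair appears more than once
  dup <- duplicated(df[c("iso3", "wealth_quintile")])
  if (any(dup)) {
    stop("Duplicate country-quintile rows for: ",
         paste(unique(df$iso3[dup]), collapse = ", "))
  }
  q1 <- df[df$wealth_quintile == "Q1", c("iso3", "value")]
  q5 <- df[df$wealth_quintile == "Q5", c("iso3", "value")]
  gap <- merge(q1, q5, by = "iso3", suffixes = c("_q1", "_q5"))
  gap$abs_gap <- gap$value_q1 - gap$value_q5  # percentage points
  gap$rel_gap <- gap$value_q1 / gap$value_q5  # ratio, poorest to richest
  gap[order(-gap$abs_gap), ]
}

# Toy data (made-up values, not UNICEF data)
toy <- data.frame(
  iso3 = c("AAA", "AAA", "BBB", "BBB"),
  wealth_quintile = c("Q1", "Q5", "Q1", "Q5"),
  value = c(40, 15, 30, 20)
)
gap <- wealth_gap_table(toy)
print(gap)
```

Reporting both measures can be useful because two countries with the same absolute gap may differ considerably in relative terms when their overall stunting levels differ.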

## Multiple mortality indicators comparison

Compare neonatal and under-5 mortality across countries (paper Example 10):

```{r multi-mortality}
# Fetch multiple mortality indicators
df <- unicefData(
  indicator = c("CME_MRM0", "CME_MRY0T4"),
  countries = c("BRA", "MEX", "ARG", "COL", "PER", "CHL"),
  year = "2020:2023"
)

# Keep latest year per country-indicator
df_latest <- df %>%
  filter(sex == "_T" | is.na(sex)) %>%
  group_by(iso3, indicator) %>%
  slice_max(period, n = 1) %>%
  ungroup()

# Reshape wide for comparison
df_wide <- df_latest %>%
  select(iso3, country, indicator, value) %>%
  tidyr::pivot_wider(names_from = indicator, values_from = value)

print(df_wide)
```

## Global immunization coverage trends

Track DTP3 and MCV1 coverage over time (paper immunization example):

```{r immunization-trends}
# Fetch immunization indicators
df <- unicefData(
  indicator = c("IM_DTP3", "IM_MCV1"),
  year = "2000:2023"
)

# Unweighted mean across all returned rows, by year and indicator
trends <- df %>%
  group_by(period, indicator) %>%
  summarise(coverage = mean(value, na.rm = TRUE), .groups = "drop")

# Plot
dtp3 <- trends[trends$indicator == "IM_DTP3", ]
mcv1 <- trends[trends$indicator == "IM_MCV1", ]

plot(coverage ~ period, data = dtp3, type = "l", col = "blue", lwd = 2,
     ylim = c(60, 95), xlab = "Year", ylab = "Coverage (%)",
     main = "Global Immunization Coverage Trends")
lines(coverage ~ period, data = mcv1, col = "red", lwd = 2)
legend("bottomright", legend = c("DTP3", "MCV1"),
       col = c("blue", "red"), lwd = 2)
```

## Regional comparison with metadata

Analyze U5MR by UNICEF region using metadata enrichment (paper Example 12):

```{r regional}
# Fetch with regional classifications
df <- unicefData(
  indicator = "CME_MRY0T4",
  add_metadata = c("region", "income_group"),
  latest = TRUE
)

# Filter to countries only (exclude regional aggregates)
df_countries <- df %>%
  filter(geo_type == 0, sex == "_T" | is.na(sex))

# Average U5MR by region
by_region <- df_countries %>%
  group_by(region) %>%
  summarise(avg_u5mr = mean(value, na.rm = TRUE), .groups = "drop") %>%
  arrange(desc(avg_u5mr))

print(by_region)

# Average U5MR by income group
by_income <- df_countries %>%
  group_by(income_group) %>%
  summarise(avg_u5mr = mean(value, na.rm = TRUE), .groups = "drop") %>%
  arrange(desc(avg_u5mr))

print(by_income)
```

## Wide format for time-series analysis

Create panel datasets for econometric analysis (paper Example 9):

```{r wide-timeseries}
# Wide format: years as columns
df_wide <- unicefData(
  indicator = "CME_MRY0T4",
  countries = c("USA", "BRA", "IND", "CHN"),
  year = "2015:2023",
  format = "wide"
)

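# Columns will be named yr2015, yr2016, ..., yr2023
# (exact names depend on available data)

# Compute change over time, guarding against missing year columns
if (all(c("yr2015", "yr2023") %in% names(df_wide))) {
  df_wide$change_2015_2023 <- df_wide$yr2023 - df_wide$yr2015
}
print(df_wide)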
```

## Wide indicators for cross-domain analysis

Compare indicators from different domains side-by-side:

```{r wide-indicators}
# One column per indicator
df_cross <- unicefData(
  indicator = c("CME_MRY0T4", "CME_MRY0", "IM_DTP3", "IM_MCV1"),
  countries = c("AFG", "ETH", "PAK", "NGA"),
  latest = TRUE,
  format = "wide_indicators"
)

print(df_cross)

# Correlation between mortality and immunization
if (all(c("CME_MRY0T4", "IM_DTP3") %in% names(df_cross))) {
  cor_val <- cor(df_cross$CME_MRY0T4, df_cross$IM_DTP3, use = "complete.obs")
  message("Correlation between U5MR and DTP3: ", round(cor_val, 3))
}
```

## Sex disaggregation analysis

Examine male-female mortality gaps (paper disaggregation example):

```{r sex-gap}
# Fetch all sex categories
df <- unicefData(
  indicator = "CME_MRY0T4",
  countries = c("IND", "PAK", "BGD"),
  year = 2020,
  sex = "ALL"
)

# Compute male-female gap (biological pattern: male > female)
df_gap <- df %>%
  filter(sex %in% c("M", "F")) %>%
  tidyr::pivot_wider(
    id_cols = c(iso3, country, period),
    names_from = sex,
    values_from = value
  ) %>%
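  # .data$ turns a missing column into an error; a bare F would otherwise
  # fall back to base R's F (FALSE)
  mutate(mf_gap = .data$M - .data$F)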

print(df_gap)
```

## Domain-specific examples

### Nutrition indicators

```{r nutrition}
# Stunting prevalence
df_stunting <- unicefData(indicator = "NT_ANT_HAZ_NE2", latest = TRUE)

# Stunting by wealth (poorest quintile only)
df_q1 <- unicefData(
  indicator = "NT_ANT_HAZ_NE2",
  wealth = "Q1",
  latest = TRUE
)

# Stunting by residence (rural only)
df_rural <- unicefData(
  indicator = "NT_ANT_HAZ_NE2",
  residence = "R",
  latest = TRUE
)
```

### WASH indicators

```{r wash}
# Basic drinking water services
df_water <- unicefData(indicator = "WS_PPL_W-B", latest = TRUE)

# Basic sanitation services
df_sanitation <- unicefData(indicator = "WS_PPL_S-B", latest = TRUE)
```

### Education indicators

```{r education}
# Out-of-school rate (primary)
df_oos <- unicefData(indicator = "ED_ROFST_L1", latest = TRUE)

# Net attendance rate (primary)
df_nar <- unicefData(indicator = "ED_ANAR_L1", latest = TRUE)
```

## Defensive programming

Handle errors gracefully when processing multiple indicators:

```{r defensive}
# Process multiple indicators, some of which may not exist
indicators <- c("CME_MRY0T4", "IM_DTP3", "INVALID_CODE_XYZ")

results <- list()
for (ind in indicators) {
  tryCatch({
    results[[ind]] <- unicefData(indicator = ind, countries = "BRA", latest = TRUE)
    message("OK: ", ind, " (", nrow(results[[ind]]), " rows)")
  }, error = function(e) {
    message("FAIL: ", ind, " - ", conditionMessage(e))
  })
}
```

The Stata equivalent uses `capture noisily` (paper Example 16):

```
. foreach ind in CME_MRY0T4 IM_DTP3 INVALID {
    . capture noisily unicefdata, indicator(`ind') clear
    . if _rc == 0 {
        . summarize value
    }
. }
```
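
The R loop above reports failures only as messages. One possible extension, sketched here under stated assumptions, records the error text per indicator and binds the successful results into a single data frame. `fake_fetch()` is a hypothetical stand-in for `unicefData()` so that the example runs without network access, and `fetch_all()` is likewise an illustrative name rather than a package function.

```{r defensive-collect}
# Hypothetical stand-in for unicefData(); fails for one made-up code
fake_fetch <- function(indicator) {
  if (indicator == "INVALID_CODE_XYZ") {
    stop("indicator not found: ", indicator)
  }
  data.frame(indicator = indicator, iso3 = "BRA", value = 1)
}

# Hypothetical helper: try each indicator, keep successes and error messages
fetch_all <- function(indicators, fetch_fun) {
  results <- list()
  failures <- character(0)
  for (ind in indicators) {
    # Return the condition object instead of aborting the loop
    out <- tryCatch(fetch_fun(ind), error = function(e) e)
    if (inherits(out, "error")) {
      failures[[ind]] <- conditionMessage(out)
    } else {
      results[[ind]] <- out
    }
  }
  list(data = do.call(rbind, unname(results)), failures = failures)
}

out <- fetch_all(c("CME_MRY0T4", "IM_DTP3", "INVALID_CODE_XYZ"), fake_fetch)
print(out$failures)
```

With the real function, `fetch_fun` could be a wrapper such as `function(ind) unicefData(indicator = ind, countries = "BRA", latest = TRUE)`, mirroring the call in the loop above.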

## Exporting data

```{r export}
# Fetch and export to CSV
df <- unicefData(
  indicator = "CME_MRY0T4",
  countries = c("ALB", "USA", "BRA", "IND", "CHN", "NGA"),
  year = "2015:2023",
  add_metadata = c("region", "income_group")
)

# Export
write.csv(df, "unicef_mortality_data.csv", row.names = FALSE)
```

## Metadata synchronization

Keep local metadata up to date with the UNICEF Data Warehouse:

```{r sync}
# Sync all metadata
sync_metadata()

# Or sync specific components
sync_dataflows()
sync_indicators()
sync_codelists()
```

## Further reading

- `vignette("unicefData-introduction")` --- Getting started guide
- The package documentation paper: Azevedo, J.P. (2026). "unicefData:
  Trilingual Library for UNICEF SDMX Indicators."
- `?unicefData` --- Main function documentation
- `?search_indicators` --- Indicator discovery
- `?filter_unicef_data` --- Post-processing filters

---

## Acknowledgments

This package was developed at the UNICEF Data and Analytics Section. The author gratefully acknowledges the collaboration of **Lucas Rodrigues**, **Yang Liu**, and **Karen Avanesian**, whose technical contributions and feedback were instrumental in the development of this R package.

Special thanks to **Yves Jaques**, **Alberto Sibileau**, and **Daniele Olivotti** for designing and maintaining the UNICEF SDMX data warehouse infrastructure that makes this package possible.

The author also acknowledges the **UNICEF database managers** and technical teams who ensure data quality, as well as the country office staff and National Statistical Offices whose data collection efforts make this work possible.

Development of this package was supported by UNICEF institutional funding for data infrastructure and statistical capacity building. The author also acknowledges UNICEF colleagues who provided testing and feedback during development, as well as the broader open-source R community.

Development was assisted by AI coding tools (GitHub Copilot, Claude). All code has been reviewed, tested, and validated by the package maintainers.

## Disclaimer

**This package is provided for research and analytical purposes.**

The `unicefData` package provides programmatic access to UNICEF's public data warehouse. While the author is affiliated with UNICEF, **this package is not an official UNICEF product and the statements in this documentation are the views of the author and do not necessarily reflect the policies or views of UNICEF**.

Data accessed through this package come from the [UNICEF Data Warehouse](https://sdmx.data.unicef.org/). Users should verify critical data points against official UNICEF publications at [data.unicef.org](https://data.unicef.org/).

This software is provided "as is", without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and noninfringement. In no event shall the authors or UNICEF be liable for any claim, damages or other liability arising from the use of this software.

The designations employed and the presentation of material in this package do not imply the expression of any opinion whatsoever on the part of UNICEF concerning the legal status of any country, territory, city or area or of its authorities, or concerning the delimitation of its frontiers or boundaries.

## Data Citation and Provenance

**Important Note on Data Vintages**

Official statistics are subject to revisions as new information becomes available and estimation methodologies improve. UNICEF indicators are regularly updated based on new surveys, censuses, and improved modeling techniques. Historical values may be revised retroactively to reflect better information or methodological improvements.

**For reproducible research and proper data attribution, users should:**

1. **Document the indicator code** - Specify the exact SDMX indicator code(s) used (e.g., `CME_MRY0T4`)
2. **Record the download date** - Note when data were accessed (e.g., "Data downloaded: 2026-02-09")
3. **Cite the data source** - Reference both the package and the UNICEF Data Warehouse
4. **Archive your dataset** - Save a copy of the exact data used in your analysis

**Example citation for data used in research:**

> Under-5 mortality data (indicator: CME_MRY0T4) accessed from UNICEF Data Warehouse via unicefData R package (v2.1.0) on 2026-02-09. Data available at: https://sdmx.data.unicef.org/

This practice ensures that others can verify your results and understand any differences that may arise from data updates. For official UNICEF statistics in publications, always cross-reference with the current version at [data.unicef.org](https://data.unicef.org/).
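
The four steps listed above can be sketched as a small helper that writes the dataset together with a plain-text provenance record. This is an illustration using base R only; the function name `archive_with_provenance()`, the file naming scheme, and the fields in the record are assumptions and not part of `unicefData`.

```{r provenance-sidecar}
# Hypothetical helper: save a dataset together with a small provenance record
archive_with_provenance <- function(df, indicator, file_stem,
                                    dir = tempdir()) {
  data_path <- file.path(dir, paste0(file_stem, ".csv"))
  meta_path <- file.path(dir, paste0(file_stem, "_provenance.txt"))
  # Package version if unicefData is installed, otherwise "unknown"
  pkg_version <- tryCatch(
    as.character(utils::packageVersion("unicefData")),
    error = function(e) "unknown"
  )
  write.csv(df, data_path, row.names = FALSE)
  writeLines(c(
    paste("indicator:", paste(indicator, collapse = ", ")),
    paste("downloaded:", format(Sys.Date(), "%Y-%m-%d")),
    paste("package: unicefData", pkg_version),
    "source: https://sdmx.data.unicef.org/",
    paste("rows:", nrow(df))
  ), meta_path)
  invisible(c(data = data_path, provenance = meta_path))
}

# Toy data (made-up values, not UNICEF data)
toy <- data.frame(iso3 = "AAA", period = 2020, value = 12.3)
paths <- archive_with_provenance(toy, "CME_MRY0T4", "u5mr_snapshot")
```

In a real project, `dir` would point at a versioned data folder rather than `tempdir()`, so that the archived copy and its record travel with the analysis script.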

## Citation

If you use this package in your research, please cite:

```
Azevedo, J.P. (2026). unicefData: Trilingual R, Python, and Stata Interface
  to UNICEF SDMX Data Warehouse. R package version 2.1.0.
  https://github.com/unicef-drp/unicefData
```

For data citations, please refer to the specific UNICEF datasets accessed through the warehouse and cite them according to UNICEF's data citation guidelines.

## License

This package is released under the MIT License. See the LICENSE file for full details.

## Contact & Support

- **Package Maintainer**: Joao Pedro Azevedo (jpazevedo@unicef.org)
- **Report Issues**: https://github.com/unicef-drp/unicefData/issues
- **UNICEF Data Portal**: https://data.unicef.org/
- **SDMX API Documentation**: https://data.unicef.org/sdmx-api-documentation/