---
title: "Getting started with BEMPdata"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting started with BEMPdata}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)
```

## Overview

The **BEMPdata** package provides access to the Bangladesh Environmental Mobility Panel (BEMP). BEMP is a household panel survey on environmental migration along the Jamuna River in Bangladesh (2021–2024). The dataset covers 1,691 households and comprises 20 survey datasets from 14 survey rounds (4 annual in-person waves and 10 bi-monthly phone waves), yielding 24,279 completed surveys.

Data are hosted on [Zenodo](https://doi.org/10.5281/zenodo.18229498) and downloaded on demand. Files are cached locally after the first download, so later calls read from disk instead of downloading again.

## Installation

```{r install}
# Install from GitHub (requires the remotes package:
# install.packages("remotes"))
remotes::install_github("janfreihardt/BEMPdata")
```

## Wave structure

The package includes a built-in overview of all 20 wave datasets:

```{r wave-overview}
library(BEMPdata)

wave_overview

# In-person waves only
wave_overview[wave_overview$type == "in-person", ]
```

Wave identifiers follow the pattern `w{round}[_M|_N|_V]`:

| Suffix | Meaning |
|--------|---------|
| *(none)* | Main household questionnaire |
| `_M` | Migrant questionnaire |
| `_N` | Non-migrant questionnaire |
| `_V` | Village profile questionnaire |

For example, `w6_M` is the migrant questionnaire from round 6.

## Downloading wave data

Use `get_wave()` to download and load a wave dataset. The first call downloads the full CSV archive (~6 MB) from Zenodo. All later calls read from the local cache.
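Identifiers passed to `get_wave()` follow the `w{round}[_M|_N|_V]` pattern from the wave structure table. The base-R helper below is a minimal sketch for catching typos such as `"w6M"` before a download is attempted. It splits an identifier into its round number and questionnaire type. `parse_wave_id()` is hypothetical and not part of BEMPdata. It checks only the naming pattern, not whether a given round actually has that questionnaire. `wave_overview` remains the authoritative list.

```{r parse-wave-id}
# Hypothetical helper (NOT part of BEMPdata): decompose wave identifiers
# following the w{round}[_M|_N|_V] naming convention.
parse_wave_id <- function(id) {
  pattern <- "^w([0-9]+)(_([MNV]))?$"
  ok <- grepl(pattern, id)
  if (!all(ok)) {
    stop("Invalid wave identifier(s): ", paste(id[!ok], collapse = ", "))
  }
  round  <- as.integer(sub(pattern, "\\1", id))
  suffix <- sub(pattern, "\\3", id)  # empty string when there is no suffix
  type <- c(M = "migrant", N = "non-migrant", V = "village profile")[suffix]
  type[suffix == ""] <- "main household"
  data.frame(id = id, round = round, questionnaire = unname(type))
}

parse_wave_id(c("w1", "w6_M", "w14_N"))
```

Because the helper stops on the first malformed identifier, it fails fast. You then get a clear error instead of an unexpected result from a download call.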
```{r get-wave}
# Baseline in-person wave (2021)
w1 <- get_wave("w1")
head(w1)

# Wave 6, migrant questionnaire
w6_migrant <- get_wave("w6_M")

# Wave 14, non-migrant questionnaire, in Stata format (with value labels)
w14_nm <- get_wave("w14_N", format = "dta")
```

## Working with codebooks

### Look up a variable by keyword

```{r lookup}
# Find all variables related to income
lookup_variable("income")

# Search only in variable labels (a partial keyword such as "migrat"
# matches both "migrant" and "migration")
lookup_variable("migrat", fields = "label")

# Use a regular expression
lookup_variable("flood|erosion")
```

### Get the full codebook for a wave

```{r get-codebook}
# Codebook for the baseline wave
cb_w1 <- get_codebook("w1")
names(cb_w1)

# Merged codebook across all waves
cb_all <- get_codebook("all")
nrow(cb_all)
```

The pre-built `codebook` object ships with the package and is available immediately, without downloading:

```{r bundled-codebook}
# Available offline
head(codebook[, c("wave", "variable_name", "variable_label", "block")])
```

## Cache management

```{r cache}
# Check what is cached and how much space it uses
bemp_cache_info()

# Clear the cache (prompts for confirmation)
bemp_cache_clear()
```

## Linking waves

The panel respondent code is stored in the registration block of each wave. Here is a minimal example of merging two waves:

```{r merge}
library(dplyr)

w1  <- get_wave("w1")
w6n <- get_wave("w6_N")

# Identify the respondent code columns in each wave
lookup_variable("respondent code", fields = "label")

# Merge on the respondent code (adjust variable names as needed). If the code
# is stored under a different column name in the later wave, pass a named
# vector instead, e.g. by = c("w1_reg1" = "<column name in w6n>").
panel <- inner_join(w1, w6n, by = "w1_reg1", suffix = c("_w1", "_w6n"))
```

`inner_join()` keeps only respondents present in both waves. Use `left_join()` instead to keep every baseline household.

## Citation

If you use this package or dataset, please cite:

**R package:**

> Freihardt, J. (2026). *BEMPdata: R package for the Bangladesh Environmental
> Mobility Panel*. Zenodo.

**Dataset:**

> Freihardt, J. et al. (2026). *The Bangladesh Environmental Mobility Panel
> (BEMP): Panel data on (im)mobility, socio-economic, and political impacts
> of riverbank erosion and flooding in Bangladesh* [Dataset]. Zenodo.

**Data descriptor:**

> Freihardt, J. et al. (*forthcoming*). Bangladesh Environmental Mobility
> Panel (BEMP). *[Journal]*. DOI: [to be added]