---
title: "LOAD_PNADC"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{LOAD_PNADC}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

The `load_pnadc` function is a wrapper for [*`get_pnadc`*](https://www.rdocumentation.org/packages/PNADcIBGE/versions/0.7.0/topics/get_pnadc) from the package `PNADcIBGE`, with added identification algorithms for panel construction. For details on the identification algorithms, see `vignette("BUILD_PNADC_PANEL")`.

***

**Panel Structure:**

The table below shows the first and last quarter (`ANOtrimestre`, e.g. `20121` = 2012 Q1) covered by each PNADC rotating panel:

| Panel | Start |   End |
|------:|------:|------:|
|     1 | 20121 | 20124 |
|     2 | 20121 | 20141 |
|     3 | 20132 | 20152 |
|     4 | 20143 | 20163 |
|     5 | 20154 | 20174 |
|     6 | 20171 | 20191 |
|     7 | 20182 | 20202 |
|     8 | 20193 | 20213 |
|     9 | 20204 | 20224 |
|    10 | 20221 | 20241 |
|    11 | 20232 | 20252 |
|    12 | 20243 | 20263 |
|    13 | 20254 | 20274 |
|    14 | 20271 | 20291 |

***

**Usage:**

Default arguments:

```{r eval=FALSE}
load_pnadc(
  save_to = getwd(),
  years,
  quarters = 1:4,
  panel = "advanced",
  raw_data = FALSE,
  save_options = c(TRUE, TRUE),
  vars = NULL
)
```

To download PNADC data for all quarters of 2022 and 2023, with advanced identification, simply run

```{r eval=FALSE}
load_pnadc(
  save_to = "Directory/You/Would/like/to/save/the/files",
  years = 2022:2023
)
```

To download PNADC data for all of 2022, but only the first quarter of 2023, run

```{r eval=FALSE}
load_pnadc(
  save_to = "Directory/You/Would/like/to/save/the/files",
  years = 2022:2023,
  quarters = list(1:4, 1)
)
```

To download PNADC data without any variable treatment or identification (e.g., for all quarters of 2021), run

```{r eval=FALSE}
load_pnadc(
  save_to = "Directory/You/Would/like/to/save/the/files",
  years = 2021,
  panel = "none",
  raw_data = TRUE
)
```

To download PNADC data, keep the quarters parquet on disk, and save panels as Parquet, run

```{r eval=FALSE}
load_pnadc(
  save_to = "Directory/You/Would/like/to/save/the/files",
  years = 2022,
  save_options = c(TRUE, FALSE)
)
```

To download PNADC data and save panels as CSV but discard the intermediate quarters parquet, run

```{r eval=FALSE}
load_pnadc(
  save_to = "Directory/You/Would/like/to/save/the/files",
  years = 2022,
  save_options = c(FALSE, TRUE)
)
```

To download only a specific subset of variables, for example age (`V2009`) and habitual income (`VD4019`), alongside the structural columns that `PNADcIBGE` always returns, run

```{r eval=FALSE}
load_pnadc(
  save_to = "Directory/You/Would/like/to/save/the/files",
  years = 2022,
  vars = c("V2009", "VD4019")
)
```

> **Note:** `PNADcIBGE::get_pnadc()` always downloads a set of ~210 structural
> columns regardless of the `vars` argument. These include survey design weights
> (`V1027`, `V1028`, `V1028001`–`V1028200`, `posest`, `posest_sxi`), deflator
> variables (`Habitual`, `Efetivo`), and identifiers such as `UF`, `Estrato`,
> `V1029`, `V1033`, and `ID_DOMICILIO`. The `vars` argument adds columns
> **on top of** those; it does not restrict them. Use `vars = NULL` (the
> default) to download all available microdata columns.

If you specify `vars` and also request panel identification, any columns required by the identification algorithm that are absent from `vars` are added automatically, and a warning tells you which ones were added. For example, when using `panel = "advanced"`, the columns `V2007`, `V20082`, `V20081`, `V2008`, and `V2003` must be present. If you omit them from `vars`, the function adds them for you:

```{r eval=FALSE}
# Only V2009 and VD4019 are requested, but panel = "advanced" (the default)
# needs V2007, V20082, V20081, V2008 and V2003. These are added automatically
# with a warning.
load_pnadc(
  save_to = "Directory/You/Would/like/to/save/the/files",
  years = 2022,
  panel = "advanced",
  vars = c("V2009", "VD4019")
)
```

***

**Options:**

1. **save_to**: The directory in which to save the downloaded files.

2. **years**: The years for which the data will be downloaded.

3. **quarters**: The quarters within those years to be downloaded. Can be either a vector such as `1:4` for consistent quarters across years, or a list of vectors, if quarters are different for each year (e.g. `list(1:4, 1:2)` for four quarters in the first year and two in the second).

4. **panel**: Which panel algorithm to apply to the data. There are three options:
    * `none`: No panel is built. If `raw_data = TRUE`, returns the original data. Otherwise, creates some extra treated variables. The intermediate quarters parquet is always kept when `panel = "none"`.
    * `basic`: Performs basic identification steps for creating household and individual identifiers for panel construction.
    * `advanced`: Performs advanced identification steps for creating household and individual identifiers for panel construction.

5. **raw_data**: Whether to download the raw or the treated data. There are two options:
    * `TRUE`: if you want the PNADC variables as they come.
    * `FALSE`: if you want the treated version of the PNADC variables.

6. **save_options**: A logical vector of length 2 controlling file saving behaviour:
    * `c(TRUE, TRUE)` (default): keeps the intermediate quarters parquet after the panels are built; saves panel files as `.csv`.
    * `c(FALSE, TRUE)`: deletes the quarters parquet after use; saves panel files as `.csv`.
    * `c(TRUE, FALSE)`: keeps the quarters parquet; saves panel files as a `.parquet` dataset.
    * `c(FALSE, FALSE)`: deletes the quarters parquet after use; saves panel files as a `.parquet` dataset.

7. **vars**: A character vector of additional variable names to download, following the same convention as `vars` in `PNADcIBGE::get_pnadc()`. Use `NULL` (the default) to download all available microdata columns. See the note above regarding the ~210 structural columns that are always returned by `PNADcIBGE::get_pnadc()` regardless of this argument.

***

**Details:**

The function performs the following steps:

1. Loop over years and quarters using `PNADcIBGE::get_pnadc` to download the data. All quarters are collected in memory and saved together into a single `pnadc_quarters.parquet` file in `save_to`.

2. Split the data into panels by the panel variable `V1014`. Each panel is saved as `.csv` or `.parquet`, depending on `save_options`.

3. Read each panel file and apply the identification algorithms defined in `build_pnadc_panel`.

4. If `save_options[1]` is `FALSE`, the intermediate quarters parquet is deleted after the panels are built.

* The identification algorithms in `build_pnadc_panel` are drawn from Ribas, Rafael Perez, and Sergei Suarez Dillon Soares (2008): "Sobre o painel da Pesquisa Mensal de Emprego (PME) do IBGE".

***
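As a rough illustration of steps 1 and 2 in the Details, the sketch below shows how a `years`/`quarters` pair could expand into individual year-quarter downloads, and how rows could then be grouped by `V1014`. It runs offline on a toy data frame. `expand_periods()` is a hypothetical helper written for this illustration, not a function exported by the package, and the `ANOtrimestre` encoding (`year * 10 + quarter`) is assumed from the panel table; the package's internal implementation may differ.

```r
# Hypothetical helper (not part of the package): pair each year with its
# quarters, accepting either one vector for all years or one vector per year.
expand_periods <- function(years, quarters = 1:4) {
  if (!is.list(quarters)) {
    # A plain vector such as 1:4 applies to every year.
    quarters <- rep(list(quarters), length(years))
  }
  stopifnot(length(quarters) == length(years))
  # One row per (year, quarter) pair, in download order.
  do.call(
    rbind,
    Map(function(y, q) data.frame(year = y, quarter = q), years, quarters)
  )
}

# All of 2022, but only the first quarter of 2023.
periods <- expand_periods(2022:2023, list(1:4, 1))

# Assumed encoding, matching the panel table (e.g. 20121 = 2012 Q1).
periods$ANOtrimestre <- periods$year * 10 + periods$quarter
print(periods)

# Toy stand-in for the downloaded microdata (made-up values, three columns).
toy <- data.frame(
  Ano       = c(2022, 2022, 2023, 2023),
  Trimestre = c(1, 2, 1, 1),
  V1014     = c(9, 10, 10, 11)
)

# Step 2 of the Details: one data frame per panel, keyed by V1014.
panels <- split(toy, toy$V1014)
print(names(panels))
```

Each element of `panels` corresponds to what would be written to its own panel file before identification is applied.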