---
title: "LOAD_PNADC"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{LOAD_PNADC}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
The `load_pnadc` function is a wrapper for [*`get_pnadc`*](https://www.rdocumentation.org/packages/PNADcIBGE/versions/0.7.0/topics/get_pnadc) from the package `PNADcIBGE`, with added identification algorithms for panel construction. For details on the identification algorithms, see `vignette("BUILD_PNADC_PANEL")`.
***
**Panel Structure:**

The table below shows the first and last quarter (`ANOtrimestre`, e.g.
`20121` = 2012 Q1) covered by each PNADC rotating panel (`V1014`):

| Panel | Start | End |
|------:|------:|------:|
| 1 | 20121 | 20124 |
| 2 | 20121 | 20141 |
| 3 | 20132 | 20152 |
| 4 | 20143 | 20163 |
| 5 | 20154 | 20174 |
| 6 | 20171 | 20191 |
| 7 | 20182 | 20202 |
| 8 | 20193 | 20213 |
| 9 | 20204 | 20224 |
| 10 | 20221 | 20241 |
| 11 | 20232 | 20252 |
| 12 | 20243 | 20263 |
| 13 | 20254 | 20274 |
| 14 | 20271 | 20291 |
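
The windows in the table follow a regular pattern. Each panel spans nine consecutive quarters, a new panel starts every five quarters, and panels 1 and 2 are truncated because the survey begins in 2012 Q1. The sketch below reproduces the table under that assumption. `panel_window()`, `to_index()` and `to_code()` are hypothetical helpers written for illustration; they are not exported by this package or by `PNADcIBGE`.

```{r eval=FALSE}
# Hypothetical helpers (not part of the package).
# Map a (year, quarter) pair to a running quarter index and back to the
# ANOtrimestre code used in the table (e.g. 20121 = 2012 Q1).
to_index <- function(year, quarter) year * 4L + (quarter - 1L)
to_code  <- function(index) (index %/% 4L) * 10L + (index %% 4L) + 1L

# Assumes: panel 3 starts in 2013 Q2, each new panel starts 5 quarters after
# the previous one, and every panel lasts 9 quarters.
panel_window <- function(panel) {
  start <- to_index(2013L, 2L) + 5L * (panel - 3L)
  end   <- start + 8L
  start <- pmax(start, to_index(2012L, 1L))  # the survey begins in 2012 Q1
  data.frame(panel = panel, start = to_code(start), end = to_code(end))
}

panel_window(1:14)
```

Because the codes increase monotonically, they can be compared numerically. For example, `subset(panel_window(1:14), start <= 20234 & end >= 20221)` lists the panels observed at some point from 2022 to 2023 (panels 9, 10 and 11).
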
***
**Usage:**

The function signature, with its default arguments, is:
```{r eval=FALSE}
load_pnadc(
save_to = getwd(),
years,
quarters = 1:4,
panel = "advanced",
raw_data = FALSE,
save_options = c(TRUE, TRUE),
vars = NULL
)
```
To download PNADC data for all quarters of 2022 and 2023, with advanced identification, simply run
```{r eval=FALSE}
load_pnadc(
save_to = "Directory/You/Would/like/to/save/the/files",
years = 2022:2023
)
```
To download PNADC data for all of 2022, but only the first quarter of 2023, run
```{r eval=FALSE}
load_pnadc(
save_to = "Directory/You/Would/like/to/save/the/files",
years = 2022:2023,
quarters = list(1:4, 1)
)
```
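When `quarters` is a list, its i-th element applies to the i-th entry of `years`; a plain vector is reused for every year. The following is a minimal sketch of that expansion, assuming this is how the argument is interpreted. `expand_quarters()` is a hypothetical helper, not part of the package.

```{r eval=FALSE}
# Hypothetical helper: turn `years` + `quarters` into explicit (year, quarter) pairs.
expand_quarters <- function(years, quarters) {
  # A plain vector such as 1:4 is reused for every year.
  if (!is.list(quarters)) quarters <- rep(list(quarters), length(years))
  # A list must supply exactly one vector of quarters per year.
  stopifnot(length(quarters) == length(years))
  data.frame(
    year    = rep(years, lengths(quarters)),
    quarter = unlist(quarters)
  )
}

expand_quarters(2022:2023, list(1:4, 1))  # four quarters of 2022, then 2023 Q1
expand_quarters(2021, 1:4)                # all four quarters of 2021
```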
To download PNADC data without any variable treatment or panel identification (e.g., for all quarters of 2021), run
```{r eval=FALSE}
load_pnadc(
save_to = "Directory/You/Would/like/to/save/the/files",
years = 2021,
panel = "none",
raw_data = TRUE
)
```
To download PNADC data, keep the intermediate quarters Parquet file on disk, and save the panels as Parquet, run
```{r eval=FALSE}
load_pnadc(
save_to = "Directory/You/Would/like/to/save/the/files",
years = 2022,
save_options = c(TRUE, FALSE)
)
```
To download PNADC data and save the panels as CSV while discarding the intermediate quarters Parquet file, run
```{r eval=FALSE}
load_pnadc(
save_to = "Directory/You/Would/like/to/save/the/files",
years = 2022,
save_options = c(FALSE, TRUE)
)
```
To request specific variables, for example age (`V2009`) and habitual income (`VD4019`), on top of the structural columns that `PNADcIBGE` always returns, run
```{r eval=FALSE}
load_pnadc(
save_to = "Directory/You/Would/like/to/save/the/files",
years = 2022,
vars = c("V2009", "VD4019")
)
```
> **Note:** `PNADcIBGE::get_pnadc()` always downloads a set of ~210 structural
> columns regardless of the `vars` argument. These include survey design weights
> (`V1027`, `V1028`, `V1028001`–`V1028200`, `posest`, `posest_sxi`), deflator
> variables (`Habitual`, `Efetivo`), and identifiers such as `UF`, `Estrato`,
> `V1029`, `V1033`, and `ID_DOMICILIO`. The `vars` argument adds columns
> **on top of** those; it does not restrict them. Use `vars = NULL` (the
> default) to download all available microdata columns.

If you specify `vars` and also request panel identification, any columns
required by the identification algorithm that are absent from `vars` will be
added automatically and a warning will tell you which ones were added. For
example, when using `panel = "advanced"`, the columns `V2007`, `V20082`,
`V20081`, `V2008`, and `V2003` must be present. If you omit them from `vars`,
the function adds them for you:
```{r eval=FALSE}
# Only V2009 and VD4019 requested, but panel = "advanced" (the default) needs
# V2007, V20082, V20081, V2008 and V2003; these are added automatically
# with a warning.
load_pnadc(
save_to = "Directory/You/Would/like/to/save/the/files",
years = 2022,
panel = "advanced",
vars = c("V2009", "VD4019")
)
```
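Completing `vars` in this way amounts to a set difference between the required identification columns and the requested ones. The sketch below illustrates that logic for `panel = "advanced"` only, using the five columns listed above. `ensure_vars()` is a hypothetical name; the package's actual implementation and warning text may differ.

```{r eval=FALSE}
# Hypothetical sketch, not the package's implementation.
# Columns the text above lists as required by panel = "advanced".
advanced_required <- c("V2007", "V20082", "V20081", "V2008", "V2003")

ensure_vars <- function(vars, required = advanced_required) {
  # vars = NULL already means "download every column", so nothing to add.
  if (is.null(vars)) return(NULL)
  missing_cols <- setdiff(required, vars)
  if (length(missing_cols) > 0) {
    warning("Adding columns required for panel identification: ",
            paste(missing_cols, collapse = ", "))
    vars <- c(vars, missing_cols)
  }
  vars
}

ensure_vars(c("V2009", "VD4019"))  # warns, then returns all seven names
```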
***
**Options:**

1. **save_to**: The directory where the downloaded and generated files are saved.
2. **years**: The years for which data will be downloaded.
3. **quarters**: The quarters to download within those years. Either a vector such as `1:4`, applied to every year, or a list of vectors with one element per year when the quarters differ across years (e.g. `list(1:4, 1:2)` for four quarters in the first year and two in the second).
4. **panel**: The panel identification algorithm to apply. There are three options:
    * `none`: No panel is built. If `raw_data = TRUE`, returns the original data; otherwise, creates some extra treated variables. The intermediate quarters Parquet file is always kept when `panel = "none"`.
    * `basic`: Performs basic identification steps, creating household and individual identifiers for panel construction.
    * `advanced`: Performs advanced identification steps, creating household and individual identifiers for panel construction.
5. **raw_data**: Whether to return the raw or the treated data. There are two options:
    * `TRUE`: the PNADC variables as they come from IBGE.
    * `FALSE`: the treated version of the PNADC variables.
6. **save_options**: A logical vector of length 2 controlling file-saving behaviour. The first element decides whether the intermediate quarters Parquet file is kept; the second chooses the format of the panel files (`TRUE` for `.csv`, `FALSE` for a `.parquet` dataset):
    * `c(TRUE, TRUE)` (default): keeps the intermediate quarters Parquet file after the panels are built; saves panel files as `.csv`.
    * `c(FALSE, TRUE)`: deletes the quarters Parquet file after use; saves panel files as `.csv`.
    * `c(TRUE, FALSE)`: keeps the quarters Parquet file; saves panel files as a `.parquet` dataset.
    * `c(FALSE, FALSE)`: deletes the quarters Parquet file after use; saves panel files as a `.parquet` dataset.
7. **vars**: A character vector of additional variable names to download, following the same convention as `vars` in `PNADcIBGE::get_pnadc()`. Use `NULL` (the default) to download all available microdata columns. See the note above regarding the ~210 structural columns that are always returned by `PNADcIBGE::get_pnadc()` regardless of this argument.
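
The four combinations of `save_options` reduce to two independent switches, plus the `panel = "none"` exception described under `panel`. The hypothetical helper below only makes that mapping explicit; `describe_save_options()` is illustrative and is not exported by the package.

```{r eval=FALSE}
# Hypothetical helper: decode save_options as described in the list above.
describe_save_options <- function(save_options = c(TRUE, TRUE), panel = "advanced") {
  stopifnot(is.logical(save_options), length(save_options) == 2,
            !anyNA(save_options))
  list(
    # First element: keep the quarters Parquet file (always kept for panel = "none").
    keep_quarters_parquet = save_options[1] || panel == "none",
    # Second element: TRUE -> CSV panel files, FALSE -> Parquet dataset.
    # No panel files are written when panel = "none".
    panel_format = if (panel == "none") NA_character_
                   else if (save_options[2]) "csv" else "parquet"
  )
}

describe_save_options(c(FALSE, TRUE))  # quarters file deleted, panels as CSV
```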
***
**Details:**

The function performs the following steps:

1. Loop over years and quarters, calling `PNADcIBGE::get_pnadc` to download the data. All quarters are collected in memory and saved together into a single `pnadc_quarters.parquet` file in `save_to`.
2. Split the data into panels using the panel variable `V1014`. Each panel's data is saved as CSV or Parquet, according to `save_options[2]`.
3. Read each panel file and apply the identification algorithms defined in `build_pnadc_panel`.
4. If `save_options[1]` is `FALSE`, delete the intermediate quarters Parquet file after the panels are built (it is always kept when `panel = "none"`).

* The identification algorithms in `build_pnadc_panel` are drawn from Ribas, Rafael Perez, and Sergei Suarez Dillon Soares (2008): "Sobre o painel da Pesquisa Mensal de Emprego (PME) do IBGE".
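
Step 2 can be pictured with base R: stack the quarters, `split()` them by `V1014`, and write one file per panel. The sketch below uses a tiny made-up data frame and hypothetical file names (`panel_<V1014>.csv`); it does not reproduce the package's actual file layout, and the Parquet branch assumes the `arrow` package is installed.

```{r eval=FALSE}
# Toy stand-in for the stacked quarterly data (made-up values).
quarters_df <- data.frame(
  Ano       = c(2022, 2022, 2022, 2023),
  Trimestre = c(1, 1, 2, 1),
  V1014     = c(9, 10, 10, 10),
  V2009     = c(34, 51, 51, 27)
)

out_dir <- tempdir()
save_as_csv <- TRUE  # plays the role of save_options[2]

# One data frame per value of the panel variable V1014.
panels <- split(quarters_df, quarters_df$V1014)

for (p in names(panels)) {
  if (save_as_csv) {
    write.csv(panels[[p]], file.path(out_dir, paste0("panel_", p, ".csv")),
              row.names = FALSE)
  } else {
    # Requires the arrow package.
    arrow::write_parquet(panels[[p]],
                         file.path(out_dir, paste0("panel_", p, ".parquet")))
  }
}
```

Splitting first means each panel can later be read and identified on its own, which keeps the identification step working on one panel's households at a time.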
***