--- title: "RREO longitudinal — handling layout drift across years" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{RREO longitudinal — handling layout drift across years} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set(collapse = TRUE, comment = "#>", eval = FALSE) ``` ## The problem: RREO labels drift across years The SICONFI RREO is a long table where each row is identified by an *appendix* (`no_anexo`), an *account* (`conta`) and a *column* (`coluna`). All three drift across fiscal years even when the underlying concept stays the same: * **Appendix renaming.** Federal social-security data on the General Regime (RGPS) lived in `RREO-Anexo 04.3 - RGPS` from 2015 to 2022 and moved to `RREO-Anexo 04.4 - RGPS` from 2023 onwards. RPPS Civis and FCDF moved from Anexo 04.1 to 04.2 in the same reform. * **Account-code shuffling.** Account labels carry trailing Roman numerals tied to the appendix layout, so the same indicator appears as `"Resultado Previdenciário RGPS (VII) = (III - VI)"` in one year and as `"Resultado Previdenciário RGPS (V) = (III - IV)"` in another. * **Column suffix instability.** A given column shows up as `"DESPESAS LIQUIDADAS ATÉ O BIMESTRE / 2019"` in 2019, then as `"DESPESAS LIQUIDADAS ATÉ O BIMESTRE"` (no year) in 2021–2022, and as `"DESPESAS LIQUIDADAS ATÉ O BIMESTRE / 2023"` in 2023+, with stray whitespace variants such as `"... BIMESTRE/ 2023"` (no space) inside the same year. Together, these three sources of drift mean that any code that filters SICONFI by literal label values ends up with `paste0()` ladders full of `ifelse()` to special-case each year. This vignette shows how the helpers in **tesouror** turn this into a single function call. ## The bundled layout reference `rreo_layout()` returns the table in `inst/extdata/rreo_layout.csv` that maps `(topic, regime, year_range)` to the correct appendix and to a year-stable matching key for the account label. ```{r} library(tesouror) rreo_layout() ``` ``` #> # A tibble: 7 × 8 #> topic regime first_year last_year co_esfera no_anexo conta_match indicador #> #> 1 previdencia civil_rpps 2015 2022 U RREO-Anexo 04.1 resultado previdenciario rpps civis resultado_previdenciario_civil #> 2 previdencia civil_rpps 2023 2099 U RREO-Anexo 04.2 resultado previdenciario rpps civis resultado_previdenciario_civil #> 3 previdencia fcdf 2015 2022 U RREO-Anexo 04.1 resultado previdenciario - fcdf resultado_previdenciario_fcdf #> 4 previdencia fcdf 2023 2099 U RREO-Anexo 04.2 resultado previdenciario - fcdf resultado_previdenciario_fcdf #> 5 previdencia militar 2015 2099 U RREO-Anexo 04.2 resultado total - militares inativos resultado_militares_inativos #> 6 previdencia rgps 2015 2022 U RREO-Anexo 04.3 - RGPS resultado previdenciario rgps resultado_previdenciario_rgps #> 7 previdencia rgps 2023 2099 U RREO-Anexo 04.4 - RGPS resultado previdenciario rgps resultado_previdenciario_rgps ``` Use it directly to look up the right appendix when fetching data with `get_rreo()`: ```{r} layout <- rreo_layout() fetch_year <- function(year) { rule <- layout[layout$topic == "previdencia" & layout$regime == "rgps" & year >= layout$first_year & year <= layout$last_year, ] get_rreo( an_exercicio = year, nr_periodo = 6, co_tipo_demonstrativo = "RREO", no_anexo = rule$no_anexo[1], co_esfera = "U", id_ente = 1 ) } ``` ## Step 1 — Normalize column labels `rreo_normalize_columns()` strips trailing year suffixes (`"/ 2019"`, `"/2023"`, `" /2023"`, `"EM 2023"`) and squishes whitespace, exposing the year separately so you can distinguish a *current-year* column from a *previous-year* comparator. ```{r} demo <- tibble::tibble( coluna = c( "DESPESAS LIQUIDADAS ATÉ O BIMESTRE / 2023", "DESPESAS LIQUIDADAS ATÉ O BIMESTRE", "DESPESAS LIQUIDADAS ATÉ O BIMESTRE/ 2018", "INSCRITAS EM RESTOS A PAGAR NÃO PROCESSADOS EM 2023" ) ) rreo_normalize_columns(demo) ``` ``` #> # A tibble: 4 × 3 #> coluna coluna_padrao coluna_ano #> #> 1 DESPESAS LIQUIDADAS ATÉ O BIMESTRE / 2023 DESPESAS LIQUIDADAS ATÉ O BIMESTRE 2023 #> 2 DESPESAS LIQUIDADAS ATÉ O BIMESTRE DESPESAS LIQUIDADAS ATÉ O BIMESTRE NA #> 3 DESPESAS LIQUIDADAS ATÉ O BIMESTRE/ 2018 DESPESAS LIQUIDADAS ATÉ O BIMESTRE 2018 #> 4 INSCRITAS EM RESTOS A PAGAR NÃO PROCESSADOS EM 2023 INSCRITAS EM RESTOS A PAGAR NÃO PROCESSADOS 2023 ``` ## Step 2 — Tidy by topic `tidy_rreo()` runs `rreo_normalize_columns()` for you and then matches each row's `conta` against the year-appropriate rule in `rreo_layout()`. Account labels are matched on a stable stem (everything before the first `(`, accent-stripped and lowercased), so the shifting Roman numerals do not break the join. ```{r} library(dplyr) # Pull the federal RGPS series for five years using the layout rgps_raw <- purrr::map_dfr(2019:2023, fetch_year) rgps_tidy <- rgps_raw |> tidy_rreo(topic = "previdencia", regime = "rgps") panel <- rgps_tidy |> filter(coluna_padrao == "DESPESAS LIQUIDADAS ATÉ O BIMESTRE", is.na(coluna_ano) | coluna_ano == exercicio) |> select(exercicio, indicador, regime, valor) panel ``` ``` #> # A tibble: 5 × 4 #> exercicio indicador regime valor #> #> 1 2019 resultado_previdenciario_rgps rgps -2.13e11 #> 2 2020 resultado_previdenciario_rgps rgps -2.61e11 #> 3 2021 resultado_previdenciario_rgps rgps -2.48e11 #> 4 2022 resultado_previdenciario_rgps rgps -2.67e11 #> 5 2023 resultado_previdenciario_rgps rgps -3.13e11 ``` The five-year deficit series of the RGPS, with no manual reconciliation of appendix names, account codes, or column suffixes. ## Multi-regime in one call Drop `regime` to keep every regime registered for the topic at once: ```{r} all_topics <- rreo_layout() fetch_topic <- function(year, regime) { rules <- all_topics[all_topics$topic == "previdencia" & all_topics$regime == regime & year >= all_topics$first_year & year <= all_topics$last_year, ] if (nrow(rules) == 0L) return(NULL) purrr::map_dfr(unique(rules$no_anexo), \(an) { get_rreo( an_exercicio = year, nr_periodo = 6, co_tipo_demonstrativo = "RREO", no_anexo = an, co_esfera = "U", id_ente = 1 ) }) } regimes <- unique(all_topics$regime[all_topics$topic == "previdencia"]) raw_22_23 <- purrr::map_dfr(2022:2023, \(yr) { purrr::map_dfr(regimes, \(rg) fetch_topic(yr, rg)) }) raw_22_23 |> tidy_rreo(topic = "previdencia") |> filter(coluna_padrao == "DESPESAS LIQUIDADAS ATÉ O BIMESTRE", is.na(coluna_ano) | coluna_ano == exercicio) |> select(exercicio, indicador, regime, valor) |> distinct() ``` ``` #> # A tibble: 7 × 4 #> exercicio indicador regime valor #> #> 1 2022 resultado_previdenciario_civil civil_rpps -5.04e10 #> 2 2023 resultado_previdenciario_civil civil_rpps -5.47e10 #> 3 2022 resultado_previdenciario_fcdf fcdf -6.56e 9 #> 4 2023 resultado_previdenciario_fcdf fcdf -8.03e 9 #> 5 2022 resultado_militares_inativos militar -3.11e10 #> 6 2022 resultado_previdenciario_rgps rgps -2.67e11 #> 7 2023 resultado_previdenciario_rgps rgps -3.13e11 ``` ## Adding new topics `inst/extdata/rreo_layout.csv` is the single source of truth. To extend coverage to a new topic — for example, RCL (`Receita Corrente Líquida`) or DPF (`Dívida Pública Fundada`) — append rows describing each year-range plus the year-stable account stem: ``` topic,regime,first_year,last_year,co_esfera,no_anexo,conta_match,indicador rcl,total,2015,2099,M,RREO-Anexo 03,receita corrente liquida (iv) = (i - ii - iii),rcl_total ``` The matching key (`conta_match`) is built by lowercasing the account label, stripping Latin diacritics, and dropping everything from the first `(` onwards. The helper `tesouror:::.clean_conta()` shows the exact transformation; running it on a sample row from `get_rreo()` gives you the value to put into the CSV. ## See also * [rsiconfi#4](https://github.com/tchiluanda/rsiconfi/issues/4) — the original issue on the competing `rsiconfi` package describing the problem this layer solves. * `?get_rreo`, `?get_rreo_for_state`, `?tidy_rreo`, `?rreo_normalize_columns`, `?rreo_layout`.