---
title: "Getting Started with realestatebr"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started with realestatebr}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.asp = 0.618,
  fig.align = "center",
  message = FALSE,
  warning = FALSE
)
```

## Introduction

This vignette provides a minimal introduction to the `realestatebr` package, showing how to use its core functions. Since `realestatebr` returns `tibble` objects by default, we recommend using it together with the `dplyr` package, though conversion to `data.table` is trivial.

```{r}
library(realestatebr)
library(dplyr)
```

The code below defines a common theme for all plots in this vignette. It is needed only to reproduce the plots exactly as shown; otherwise it is optional. If you omit it, also drop the `theme_series()` and `color_palette` references from the plotting code.

```{r setup, message = FALSE}
#| code-fold: true
library(ggplot2)

# Shared color palette for all plots
color_palette <- c(
  "#1E3A5F", "#DD6B20", "#2C7A7B", "#D69E2E", "#805AD5", "#C53030"
)

# Shared plot theme
theme_series <- function() {
  theme_minimal(
    # swap for other font if needed
    base_family = "Avenir",
    base_size = 10
  ) +
    theme(
      plot.title = element_text(size = 16),
      panel.grid.minor = element_blank(),
      panel.grid.major.x = element_blank(),
      axis.line.x = element_line(color = "gray10", linewidth = 0.5),
      axis.ticks.x = element_line(color = "gray10", linewidth = 0.5),
      axis.title.x = element_blank(),
      legend.position = "bottom",
      palette.color.discrete = color_palette
    )
}
```

```{r}
#| include: false
library(knitr)
library(kableExtra)
```

## Core Interface

The goal of `realestatebr` is to provide a unified interface to Brazilian real estate data from multiple public sources. All datasets are returned as tidy `tibble` objects.
The package is centered around a key function, `get_dataset(name, table)`, which retrieves any dataset by name. Without a `table` argument it returns the default table; use `table` to select a specific sub-table.

- **`get_dataset()`** is the main function for retrieving datasets.

```{r}
#| eval: false
# Default table
abecip <- get_dataset("abecip")

# Specific table
units <- get_dataset("abecip", table = "units")
```

To explore which datasets are available, use `list_datasets()` and `get_dataset_info()`.

- **`list_datasets()`** returns a catalogue of all available datasets and their tables.

```{r}
ds <- list_datasets()
```

```{r}
#| echo: false
ds |>
  select(name, title, source, available_tables, frequency) |>
  kable() |>
  kable_styling(bootstrap_options = "striped") |>
  scroll_box(width = "100%", height = "400px")
```

- **`get_dataset_info()`** shows available tables and metadata for a given dataset.

```{r}
#| eval: false
info <- get_dataset_info("abecip")
names(info$categories)
#> [1] "sbpe" "units" "cgi"
```

### The `source` Argument

The `source` argument of `get_dataset()` controls where the data comes from. The default (`"auto"`) checks the local cache first, then falls back to the GitHub release. The best option is usually the default or `"github"`. Choosing `"fresh"` downloads the data from the original source: this guarantees the most recent data, but it is slower.

```{r}
#| eval: false
get_dataset("abecip", source = "cache")  # local cache (instant, works offline)
get_dataset("abecip", source = "github") # GitHub release
get_dataset("abecip", source = "fresh")  # direct from the original source
```

Cache files are stored in the user data directory and can be inspected with `list_cached_files()` or cleared with `clear_user_cache()`.

## Example: Housing Credit Cycle

SBPE (Sistema Brasileiro de Poupança e Empréstimo) is the primary funding mechanism for residential mortgages in Brazil.
The `sbpe` table from `abecip` tracks deposits into and withdrawals from savings accounts, which help finance real estate construction and acquisition.

```{r}
sbpe <- get_dataset("abecip", table = "sbpe")

glimpse(sbpe)
```

The plot below shows the annual net savings flow in recent years.

```{r}
#| code-fold: true
# Annual net savings flow
sbpe_annual <- sbpe |>
  filter(date >= as.Date("2019-01-01")) |>
  mutate(year = lubridate::year(date)) |>
  # divide by 1,000 to match the R$ billions axis label
  summarise(net_flow = sum(sbpe_netflow, na.rm = TRUE) / 1e3, .by = year) |>
  mutate(
    label_num = format(round(net_flow, 1)),
    # place labels just above positive bars and just below negative ones
    ypos = if_else(net_flow > 0, net_flow + 10, net_flow - 10)
  )

ggplot(sbpe_annual, aes(year, net_flow)) +
  geom_col(fill = color_palette[1], alpha = 0.9, width = 0.8) +
  geom_text(aes(y = ypos, label = label_num), size = 3) +
  geom_hline(yintercept = 0) +
  scale_x_continuous(breaks = 2019:2026) +
  labs(
    title = "Annual Net Savings Flow (SBPE)",
    x = NULL,
    y = "R$ billions"
  ) +
  theme_series()
```

The companion table `"units"` contains monthly counts of financed units.

```{r}
units <- get_dataset("abecip", table = "units")

glimpse(units)
```

The plot shows the number of units financed per month together with a LOESS trend line.

```{r}
#| code-fold: true
# SBPE units financed per month, from 2019 onwards
units_recent <- units |>
  filter(date >= as.Date("2019-01-01"))

ggplot(units_recent, aes(date, units_total)) +
  geom_point(alpha = 0.5, size = 0.8, color = color_palette[1]) +
  geom_smooth(
    color = color_palette[1],
    lwd = 0.7,
    se = FALSE,
    method = stats::loess,
    method.args = list(span = 0.4)
  ) +
  scale_x_date(date_breaks = "1 year", date_labels = "%Y") +
  labs(
    title = "Monthly Financed Units",
    y = "Units"
  ) +
  theme_series()
```

## Example: Real Estate Credit Portfolio

The `bcb_realestate` dataset imports all real estate statistics from the [Brazilian Central Bank](https://www.bcb.gov.br/estatisticas/mercadoimobiliario). This is a relatively large dataset and exploring it can be cumbersome.
Each series is uniquely identified by `date` and `series_info`. The helper columns `v1`, `v2`, ..., `v5`, `abbrev_state`, `category`, and `type` are provided to simplify the use of the dataset. The code below shows how to access a specific series and how to fetch a group of related series.

```{r}
bcb <- get_dataset("bcb_realestate")

# Get a specific series
sfh_pf <- bcb |>
  filter(series_info == "credito_estoque_carteira_credito_pf_sfh_br")

# Get all the related series for 'estoque_carteira_credito_pf'
credit_stock <- bcb |>
  filter(
    category == "credito",
    type == "estoque",
    v1 == "carteira",
    v2 == "credito",
    v3 == "pf",
    # since v4 is not filtered, we get all credit lines
    v5 == "br"
  )

# The helper columns essentially split the 'series_info' column, allowing
# for easier filtering. This is equivalent to filtering by regex
credit_stock <- bcb |>
  filter(grepl(
    "(?<=credito_estoque_carteira_credito_pf_).+_br$",
    series_info,
    perl = TRUE
  ))
```

The single series shows only the values from SFH (a specific credit line).

```{r}
#| code-fold: true
ggplot(sfh_pf, aes(date, value / 1e9)) +
  geom_line(lwd = 0.7, color = color_palette[1]) +
  labs(title = "SFH", y = "R$ (billions)") +
  theme_series()
```

The grouped series show the entire household credit stock by credit line.

```{r}
#| code-fold: true
# Display labels (names) mapped to the raw values found in `v4`
credit_labels <- c(
  "Home Equity" = "home-equity",
  "Comercial" = "comercial",
  "Livre" = "livre",
  "FGTS" = "fgts",
  "SFH" = "sfh"
)

credit_stock <- credit_stock |>
  mutate(
    credit_line_label = factor(
      v4,
      levels = credit_labels,
      labels = names(credit_labels)
    )
  )

ggplot(credit_stock, aes(date, value / 1e9)) +
  geom_area(aes(fill = credit_line_label), alpha = 0.9) +
  scale_fill_manual(values = rev(color_palette[1:5])) +
  scale_x_date(expand = expansion(mult = c(0.01))) +
  scale_y_continuous(expand = expansion(mult = c(0, 0.05))) +
  labs(
    title = "Real Estate Credit Stock",
    subtitle = "Household real estate credit stock (total debt) by credit line",
    y = "R$ (billions)",
    fill = NULL
  ) +
  theme_series()
```

As a final warning, note that dates in the `bcb_realestate` dataset follow the `YYYY-MM-DD` format and default to the last day of the month (e.g. `2023-01-31`). This can cause issues when merging with other datasets, since the first day of the month is the more common convention (e.g. `2023-01-01`): a join on `date` will find no matching rows. To avoid this, use `lubridate::floor_date(date, 'month')` before merging. Future versions of `realestatebr` might provide this as a default behavior.

## Reference (all datasets)

The available datasets are listed below.


| Dataset | Source | Tables | Status |
|---------|--------|--------|--------|
| `abecip` | ABECIP | `sbpe`, `units`, `cgi` | Active |
| `abrainc` | ABRAINC / FIPE | `indicator`, `radar`, `leading` | Active |
| `bcb_realestate` | Banco Central do Brasil | `accounting`, `application`, `indices`, `sources`, `units` | Active |
| `bcb_series` | Banco Central do Brasil | `core`, `primary`, `secondary`, `tertiary`, `full` | Active |
| `fgv_ibre` | FGV IBRE | - | Active |
| `rppi` | FIPE/ZAP, IVGR, IGMI, IQA, IVAR, SECOVI-SP | `sale`, `rent`, `fipezap`, `ivgr`, `igmi`, `iqa`, `iqaiw`, `ivar`, `secovi_sp` | Active |
| `rppi_bis` | Bank for International Settlements | `selected`, `detailed_monthly`, `detailed_quarterly`, `detailed_annual`, `detailed_halfyearly` | Active |
| `secovi` | SECOVI-SP | `condo`, `rent`, `launch`, `sale` | Active |
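
When combining datasets from this table, keep in mind the month-end date convention noted for `bcb_realestate`. The sketch below illustrates the issue with two toy tibbles: the values are made up and the column names `credit` and `units` are hypothetical, not package output. Only `dplyr::left_join()` and `lubridate::floor_date()` are real calls here. The code is shown as a static example rather than an evaluated chunk.

```r
library(dplyr)

# Toy month-end series, mimicking the date convention of `bcb_realestate`
# (values are made up for illustration)
month_end <- tibble(
  date = as.Date(c("2023-01-31", "2023-02-28", "2023-03-31")),
  credit = c(100, 102, 105)
)

# Toy month-start series, the more common convention (values are made up)
month_start <- tibble(
  date = as.Date(c("2023-01-01", "2023-02-01", "2023-03-01")),
  units = c(10, 12, 11)
)

# Joining directly on `date` finds no matching rows
unaligned <- left_join(month_end, month_start, by = "date")
sum(!is.na(unaligned$units))
#> [1] 0

# Flooring the month-end dates to the first day of the month fixes the join
aligned <- month_end |>
  mutate(date = lubridate::floor_date(date, "month")) |>
  left_join(month_start, by = "date")
sum(!is.na(aligned$units))
#> [1] 3
```

Flooring the month-end side, rather than shifting the month-start side forward, avoids having to handle months of different lengths.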