| Title: | Download and Process Brazilian Education Data from INEP |
| Version: | 0.1.0 |
| Description: | Download and process public education data from INEP (Instituto Nacional de Estudos e Pesquisas Educacionais Anísio Teixeira). Provides functions to access microdata from the School Census (Censo Escolar), ENEM (Exame Nacional do Ensino Médio), IDEB (Índice de Desenvolvimento da Educação Básica), and other educational datasets. Returns data in tidy format ready for analysis. Data source: INEP Open Data Portal https://www.gov.br/inep/pt-br/acesso-a-informacao/dados-abertos. |
| License: | MIT + file LICENSE |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.3 |
| URL: | https://github.com/SidneyBissoli/educabR, https://sidneybissoli.github.io/educabR/ |
| BugReports: | https://github.com/SidneyBissoli/educabR/issues |
| Depends: | R (≥ 4.1.0) |
| Imports: | cli, dplyr, httr2, purrr, readr, rlang, stringr, tidyr, tools |
| Suggests: | ggplot2, knitr, readxl, rmarkdown, testthat (≥ 3.0.0), tibble, withr |
| Config/testthat/edition: | 3 |
| VignetteBuilder: | knitr |
| NeedsCompilation: | no |
| Packaged: | 2026-01-30 13:17:33 UTC; SIDNEY |
| Author: | Sidney da Silva Pereira Bissoli |
| Maintainer: | Sidney da Silva Pereira Bissoli <sbissoli76@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2026-02-03 13:30:08 UTC |
educabR: Download and Process Brazilian Education Data from INEP
Description
The educabR package provides functions to download and process public education data from INEP (Instituto Nacional de Estudos e Pesquisas Educacionais Anísio Teixeira). It offers easy access to microdata from:
- School Census (Censo Escolar): Annual data on schools, enrollment, teachers, and classes in basic education
- ENEM: Data from the National High School Exam
- IDEB: Basic Education Development Index
All functions return data in tidy format, ready for analysis with tidyverse tools.
Main functions
School Census:
- get_censo_escolar(): Download School Census microdata
ENEM:
- get_enem(): Download ENEM microdata
IDEB:
- get_ideb(): Download IDEB data
Cache system
The package implements a local cache system to avoid repeated downloads.
Use set_cache_dir() to configure a persistent cache directory.
Use get_cache_dir() to check the current cache location.
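The idea behind such a cache can be sketched in base R. The helper cached_path() and its fetch argument are hypothetical names used only for illustration; they are not educabR functions, and the package's actual implementation may differ.

```r
# Hypothetical helper: return a cached file, fetching it only when missing.
# `fetch` is any function(destfile) that writes the file to `destfile`.
cached_path <- function(cache_dir, filename, fetch) {
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
  destfile <- file.path(cache_dir, filename)
  if (!file.exists(destfile)) {
    fetch(destfile)  # first call: "download" the file
  }
  destfile           # later calls: reuse the copy on disk
}

# Simulated download that counts how often it is actually invoked.
n_downloads <- 0
fake_fetch <- function(destfile) {
  n_downloads <<- n_downloads + 1
  writeLines("CO_UF;NO_ENTIDADE", destfile)
}

cache_dir <- file.path(tempdir(), "educabR_cache_demo")
unlink(cache_dir, recursive = TRUE)
p1 <- cached_path(cache_dir, "censo_2023.csv", fake_fetch)
p2 <- cached_path(cache_dir, "censo_2023.csv", fake_fetch)
```

The second call returns the same path without invoking the download function again.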
Data source
All data is downloaded from INEP's official portal: https://www.gov.br/inep/pt-br/acesso-a-informacao/dados-abertos/microdados
Author(s)
Maintainer: Sidney da Silva Pereira Bissoli sbissoli76@gmail.com (ORCID)
See Also
Useful links:
- https://github.com/SidneyBissoli/educabR
- https://sidneybissoli.github.io/educabR/
- Report bugs at https://github.com/SidneyBissoli/educabR/issues
Check available years for a dataset
Description
Returns the years available for a given INEP dataset.
Usage
available_years(dataset)
Arguments
| dataset | The dataset name. |
Value
An integer vector of available years.
Examples
available_years("censo_escolar")
available_years("enem")
Build INEP microdata URL
Description
Internal function to construct URLs for INEP microdata.
Usage
build_inep_url(dataset, year, ...)
Arguments
| dataset | The dataset name (e.g., "censo_escolar", "enem"). |
| year | The year of the data. |
| ... | Additional parameters for URL construction. |
Value
A character string with the URL.
Clear the educabR cache
Description
Removes all cached files from the educabR cache directory.
Usage
clear_cache(dataset = NULL)
Arguments
| dataset | Optional. A character string specifying which dataset cache to clear. If |
Value
Invisibly returns TRUE if successful.
Examples
# clear all cached data
clear_cache()
# clear only ENEM cache
clear_cache("enem")
Detect file encoding
Description
Internal function to detect the encoding of a text file. INEP files typically use Latin-1 or UTF-8.
Usage
detect_encoding(file)
Arguments
| file | Path to the file. |
Value
A character string with the encoding name.
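One simple way to choose between the two encodings mentioned above is to test whether the file's lines are valid UTF-8 and fall back to Latin-1 otherwise. The function guess_encoding_sketch() below is a hypothetical base-R stand-in, not the package's internal code.

```r
# Hypothetical stand-in for an encoding check: "UTF-8" if every sampled
# line is valid UTF-8, otherwise assume Latin-1.
guess_encoding_sketch <- function(file, n_lines = 1000) {
  lines <- readLines(file, n = n_lines, warn = FALSE)
  if (all(validUTF8(lines))) "UTF-8" else "latin1"
}

# Two tiny files containing the word "Médio", written byte by byte
# so that each one has a known encoding.
utf8_file <- tempfile(fileext = ".csv")
latin1_file <- tempfile(fileext = ".csv")
writeBin(as.raw(c(0x4d, 0xc3, 0xa9, 0x64, 0x69, 0x6f, 0x0a)), utf8_file)
writeBin(as.raw(c(0x4d, 0xe9, 0x64, 0x69, 0x6f, 0x0a)), latin1_file)

enc_utf8 <- guess_encoding_sketch(utf8_file)
enc_latin1 <- guess_encoding_sketch(latin1_file)
```

A file containing only ASCII characters is valid under both encodings, so this heuristic reports it as UTF-8.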
Download a file from INEP
Description
Internal function to download files from INEP's servers with progress indication and error handling.
Usage
download_inep_file(url, destfile, quiet = FALSE)
Arguments
| url | The URL to download from. |
| destfile | The destination file path. |
| quiet | Logical. If |
Value
The path to the downloaded file.
Summary statistics for ENEM scores
Description
Calculates summary statistics for ENEM scores, optionally grouped by demographic variables.
Usage
enem_summary(data, by = NULL)
Arguments
| data | A tibble with ENEM data (from |
| by | Optional grouping variable(s) as character vector. |
Value
A tibble with summary statistics for each score area.
Examples
enem <- get_enem(2023, n_max = 10000)
# overall summary
enem_summary(enem)
# summary by sex
enem_summary(enem, by = "tp_sexo")
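The grouped calculation can be illustrated without downloading anything. The sketch below uses base R on synthetic toy rows and computes only group means; which statistics enem_summary() actually returns is not specified here, so treat this as an approximation of the idea rather than the function's output.

```r
# Synthetic toy rows (not real ENEM results). The column names follow the
# nu_nota_ prefix and the tp_sexo variable mentioned in this manual.
toy <- data.frame(
  tp_sexo = c("F", "M", "F", "M"),
  nu_nota_mt = c(500, 600, 700, 400),
  nu_nota_cn = c(450, 550, 650, 350)
)

# Select every score column, then average each one within each group.
score_cols <- grep("^nu_nota_", names(toy), value = TRUE)
by_sex <- aggregate(toy[score_cols], by = toy["tp_sexo"], FUN = mean, na.rm = TRUE)
```

The result has one row per value of tp_sexo and one column per score area.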
Extract a ZIP file
Description
Internal function to extract ZIP files with progress indication.
Usage
extract_zip(zipfile, exdir, quiet = FALSE)
Arguments
| zipfile | Path to the ZIP file. |
| exdir | Directory to extract to. |
| quiet | Logical. If |
Value
A character vector of extracted file paths.
Find the Censo Escolar data file
Description
Internal function to locate the main data file within the extracted census directory.
Usage
find_censo_file(exdir, year)
Arguments
| exdir | The extraction directory. |
| year | The year. |
Value
The path to the data file.
Find data files in extracted directory
Description
Internal function to locate the main data files after extraction.
Usage
find_data_files(exdir, pattern = "\\.(csv|CSV|txt|TXT)$")
Arguments
| exdir | The extraction directory. |
| pattern | Optional regex pattern to filter files. |
Value
A character vector of file paths.
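A lookup of this kind can be approximated with list.files() and the default pattern shown in the usage. The function name find_data_files_sketch() and the recursive search are assumptions made for illustration; the internal function may behave differently.

```r
# Hypothetical sketch: list data files under `exdir`, including subfolders.
find_data_files_sketch <- function(exdir, pattern = "\\.(csv|CSV|txt|TXT)$") {
  list.files(exdir, pattern = pattern, recursive = TRUE, full.names = TRUE)
}

# Build a small fake extraction directory to try it on.
exdir <- file.path(tempdir(), "inep_extract_demo")
unlink(exdir, recursive = TRUE)
dir.create(file.path(exdir, "DADOS"), recursive = TRUE)
invisible(file.create(file.path(exdir, "DADOS", "microdados.CSV")))
invisible(file.create(file.path(exdir, "leia_me.pdf")))
invisible(file.create(file.path(exdir, "notas.txt")))

found <- find_data_files_sketch(exdir)
```

Only the .CSV and .txt files match; the PDF is ignored.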
Find the ENEM data file
Description
Internal function to locate the main ENEM data file within the extracted directory.
Usage
find_enem_file(exdir, year)
Arguments
| exdir | The extraction directory. |
| year | The year. |
Value
The path to the data file.
Get the current cache directory
Description
Returns the current cache directory used by educabR.
Usage
get_cache_dir()
Value
A character string with the path to the cache directory.
Examples
get_cache_dir()
Get School Census (Censo Escolar) data
Description
Downloads and processes microdata from the Brazilian School Census (Censo Escolar), conducted annually by INEP. Returns school-level data with information about infrastructure, location, and administrative details.
Usage
get_censo_escolar(year, uf = NULL, n_max = Inf, keep_zip = TRUE, quiet = FALSE)
Arguments
| year | The year of the census (2007-2024). |
| uf | Optional. Filter by state (UF code or abbreviation). |
| n_max | Maximum number of rows to read. Default is |
| keep_zip | Logical. If |
| quiet | Logical. If |
Details
The School Census is the main statistical survey on basic education in Brazil. It collects data from all public and private schools offering basic education (early childhood, elementary, and high school).
Important notes:
- The microdata contains one row per school (~217,000 schools in 2023).
- Column names are standardized to lowercase with underscores.
- Use the uf parameter to filter by state for faster processing.
Value
A tibble with school data in tidy format.
Data dictionary
For detailed information about variables, see INEP's documentation: https://www.gov.br/inep/pt-br/acesso-a-informacao/dados-abertos/microdados/censo-escolar
Examples
# get schools data for 2023
escolas <- get_censo_escolar(2023)
# get schools from Sao Paulo state only
escolas_sp <- get_censo_escolar(2023, uf = "SP")
# read only first 1000 rows for exploration
escolas_sample <- get_censo_escolar(2023, n_max = 1000)
Get ENEM (Exame Nacional do Ensino Médio) data
Description
Downloads and processes microdata from ENEM, the Brazilian National High School Exam. ENEM is used for university admissions and as a high school equivalency exam.
Usage
get_enem(year, n_max = Inf, keep_zip = TRUE, quiet = FALSE)
Arguments
| year | The year of the exam (2009-2023). |
| n_max | Maximum number of rows to read. Default is |
| keep_zip | Logical. If |
| quiet | Logical. If |
Details
ENEM is conducted annually by INEP and is the largest exam in Brazil, with millions of participants. The microdata includes:
- Participant demographics (age, sex, race, etc.)
- Socioeconomic questionnaire responses
- Scores for each test area
- Essay scores
- School information (when applicable)
Important notes:
- ENEM files are very large (several GB when extracted).
- Use n_max to read a sample first for exploration.
- Column names are standardized to lowercase with underscores.
- Score variables start with the nu_nota_ prefix.
Value
A tibble with the ENEM microdata in tidy format.
Data dictionary
For detailed information about variables, see INEP's documentation: https://www.gov.br/inep/pt-br/acesso-a-informacao/dados-abertos/microdados/enem
Examples
# get a sample of 10000 rows for exploration
enem_sample <- get_enem(2023, n_max = 10000)
# get full data (warning: large file)
enem_2023 <- get_enem(2023)
Get ENEM item response data
Description
Downloads and processes ENEM item response (gabarito) data, which contains detailed information about each question.
Usage
get_enem_itens(year, n_max = Inf, quiet = FALSE)
Arguments
| year | The year of the exam (2009-2023). |
| n_max | Maximum number of rows to read. |
| quiet | Logical. If |
Value
A tibble with item response data.
Examples
# get item data for 2023
itens <- get_enem_itens(2023)
Get IDEB (Índice de Desenvolvimento da Educação Básica) data
Description
Downloads and processes IDEB data from INEP. IDEB is the main indicator of education quality in Brazil, combining student performance (from SAEB) with grade promotion rates.
Usage
get_ideb(
  year,
  level = c("escola", "municipio"),
  stage = c("anos_iniciais", "anos_finais", "ensino_medio"),
  uf = NULL,
  quiet = FALSE
)
Arguments
| year | The year of the IDEB (available: 2017, 2019, 2021, 2023). |
| level | The aggregation level: |
| stage | The education stage: |
| uf | Optional. Filter by state (UF code or abbreviation). |
| quiet | Logical. If |
Details
IDEB has been calculated every two years since 2005 based on:
- Learning: Average scores in Portuguese and Mathematics from SAEB
- Flow: Grade promotion rate (inverse of repetition/dropout)
The index ranges from 0 to 10. Brazil's national goal was to reach 6.0 by 2022 (the level of developed countries in PISA).
Note: IDEB data is relatively small compared to other INEP datasets, so no n_max parameter is provided.
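As a worked illustration of how the two components combine: INEP's published methodology defines the index as the product of a standardized proficiency score N (on a 0 to 10 scale) and a flow indicator P (between 0 and 1). The function ideb_sketch() below is a hypothetical helper, and the input values are illustrative rather than real results.

```r
# N: standardized SAEB proficiency on a 0-10 scale (learning component).
# P: flow indicator between 0 and 1, derived from promotion rates.
ideb_sketch <- function(n, p) {
  stopifnot(all(n >= 0 & n <= 10), all(p >= 0 & p <= 1))
  n * p
}

# Illustrative inputs, not real results.
ideb_example <- ideb_sketch(n = 6.0, p = 0.9)
```

With these inputs the product is 5.4: a unit with strong test scores still loses index points when a tenth of its students are not promoted.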
Value
A tibble with IDEB data in tidy format.
Data source
Official IDEB portal: https://www.gov.br/inep/pt-br/areas-de-atuacao/pesquisas-estatisticas-e-indicadores/ideb
Examples
# get school-level IDEB for early elementary in 2021
ideb_escolas <- get_ideb(2021, level = "escola", stage = "anos_iniciais")
# get municipality-level IDEB for São Paulo state
ideb_sp <- get_ideb(2021, level = "municipio", stage = "anos_iniciais", uf = "SP")
# get high school IDEB for all municipalities
ideb_em <- get_ideb(2023, level = "municipio", stage = "ensino_medio")
Get IDEB historical series
Description
Downloads and combines IDEB data across multiple years to create a historical series.
Usage
get_ideb_series(
  years = NULL,
  level = c("escola", "municipio"),
  stage = c("anos_iniciais", "anos_finais", "ensino_medio"),
  uf = NULL,
  quiet = FALSE
)
Arguments
| years | Vector of years to include (default: all available). |
| level | The aggregation level. |
| stage | The education stage. |
| uf | Optional. Filter by state. |
| quiet | Logical. If |
Value
A tibble with IDEB data for all requested years.
Examples
# get IDEB history for municipalities
ideb_hist <- get_ideb_series(
  years = c(2017, 2019, 2021),
  level = "municipio",
  stage = "anos_iniciais"
)
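Combining several years into one table amounts to fetching each year, tagging its rows, and stacking the pieces. The sketch below shows that pattern in base R with a hypothetical fetch_year() that returns placeholder rows instead of downloading; the column name ano is also an assumption.

```r
# Hypothetical stand-in for a single-year download; returns placeholder rows.
fetch_year <- function(year) {
  data.frame(municipio = c("A", "B"), ideb = c(NA_real_, NA_real_))
}

# Fetch each year, tag its rows with the year, and stack the results.
combine_years <- function(years, fetch) {
  pieces <- lapply(years, function(y) {
    out <- fetch(y)
    out$ano <- y
    out
  })
  do.call(rbind, pieces)
}

series <- combine_years(c(2017, 2019, 2021), fetch_year)
```

The combined table has one block of rows per requested year, distinguishable by the added year column.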
List cached files
Description
Lists all files currently in the educabR cache.
Usage
list_cache(dataset = NULL)
Arguments
| dataset | Optional. Filter by dataset name. |
Value
A tibble with information about cached files.
Examples
list_cache()
List available Censo Escolar files
Description
Lists the data files available in a downloaded School Census.
Usage
list_censo_files(year)
Arguments
| year | The year of the census. |
Value
A character vector of file names found.
Examples
list_censo_files(2023)
List available IDEB data
Description
Lists the IDEB data files available in the INEP portal.
Usage
list_ideb_available()
Value
A tibble with available IDEB datasets.
Examples
list_ideb_available()
Read IDEB Excel file
Description
Internal function to read IDEB Excel files.
Usage
read_ideb_excel(file)
Arguments
| file | Path to the Excel file. |
Value
A tibble with the data.
Read INEP data file
Description
Internal function to read INEP data files with appropriate settings.
Usage
read_inep_file(file, delim = ";", encoding = NULL, n_max = Inf)
Arguments
| file | Path to the data file. |
| delim | The delimiter character. |
| encoding | The file encoding. |
| n_max | Maximum number of rows to read. |
Value
A tibble with the data.
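The effect of the delim, encoding, and n_max arguments can be seen with a base-R stand-in. read_inep_sketch() below is hypothetical and uses utils::read.delim(); the package imports readr and may well use it instead, and it returns a tibble rather than a plain data frame. The file contents are toy rows.

```r
# Hypothetical base-R stand-in for reading a semicolon-delimited INEP file.
read_inep_sketch <- function(file, delim = ";", encoding = "latin1", n_max = Inf) {
  utils::read.delim(
    file,
    sep = delim,
    fileEncoding = encoding,
    nrows = if (is.finite(n_max)) n_max else -1,  # -1 means "read all rows"
    check.names = FALSE,
    stringsAsFactors = FALSE
  )
}

# Toy file with a header and two rows.
demo_file <- tempfile(fileext = ".csv")
writeLines(c("CO_UF;NO_ENTIDADE", "35;ESCOLA A", "33;ESCOLA B"), demo_file)

first_row <- read_inep_sketch(demo_file, n_max = 1)
all_rows <- read_inep_sketch(demo_file)
```

Reading with n_max = 1 returns only the first data row, which is a cheap way to inspect column names before loading a large file.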
Set the cache directory for educabR
Description
Sets the directory where downloaded files will be cached. This avoids repeated downloads of the same data.
Usage
set_cache_dir(path = NULL, persistent = FALSE)
Arguments
| path | A character string with the path to the cache directory. If |
| persistent | Logical. If |
Value
Invisibly returns the cache directory path.
Examples
# set a persistent cache directory
set_cache_dir("~/educabR_cache")
Standardize column names
Description
Internal function to standardize column names to lowercase with underscores.
Usage
standardize_names(df)
Arguments
| df | A data frame. |
Value
The data frame with standardized names.
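A minimal sketch of this kind of renaming, assuming that any run of characters other than lowercase letters and digits becomes a single underscore. standardize_names_sketch() is a hypothetical helper; the package's exact rules may differ.

```r
# Hypothetical sketch: lowercase names and replace other characters with "_".
standardize_names_sketch <- function(df) {
  nm <- tolower(names(df))
  nm <- gsub("[^a-z0-9]+", "_", nm)  # runs of other characters become "_"
  nm <- gsub("^_+|_+$", "", nm)      # drop leading/trailing underscores
  names(df) <- nm
  df
}

raw <- data.frame(a = 1, b = 2, c = 3)
names(raw) <- c("NU_NOTA_MT", "Tp Sexo", "CO.UF")
clean <- standardize_names_sketch(raw)
```

Here the three columns are renamed to nu_nota_mt, tp_sexo, and co_uf.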
Convert UF abbreviation to code
Description
Internal function to convert state abbreviations to IBGE codes.
Usage
uf_to_code(uf)
Arguments
| uf | UF abbreviation or code. |
Value
The numeric UF code.
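The conversion is essentially a lookup in the table of IBGE state codes. The sketch below is a hypothetical single-value version, uf_to_code_sketch(), that accepts either an abbreviation or a code; how the internal function handles vectors or invalid input is not documented here.

```r
# IBGE two-digit codes for the 26 states and the Federal District.
uf_codes <- c(
  RO = 11, AC = 12, AM = 13, RR = 14, PA = 15, AP = 16, TO = 17,
  MA = 21, PI = 22, CE = 23, RN = 24, PB = 25, PE = 26, AL = 27,
  SE = 28, BA = 29, MG = 31, ES = 32, RJ = 33, SP = 35,
  PR = 41, SC = 42, RS = 43, MS = 50, MT = 51, GO = 52, DF = 53
)

# Hypothetical sketch: accept an abbreviation ("SP") or a code (35 or "35").
uf_to_code_sketch <- function(uf) {
  key <- toupper(trimws(as.character(uf)))
  code <- if (key %in% names(uf_codes)) {
    uf_codes[[key]]
  } else {
    suppressWarnings(as.numeric(key))
  }
  if (is.na(code) || !code %in% uf_codes) {
    stop("Unknown UF: ", uf, call. = FALSE)
  }
  code
}

sp_code <- uf_to_code_sketch("SP")
rj_code <- uf_to_code_sketch("rj")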
Validate year parameter
Description
Internal function to validate that a year is available for a dataset.
Usage
validate_year(year, dataset)
Arguments
| year | The year to validate. |
| dataset | The dataset name. |
Value
The validated year (invisibly), or aborts with error.
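The check itself is a membership test followed by an informative error. The sketch below is a hypothetical variant, validate_year_sketch(), that takes the vector of available years directly instead of a dataset name, and uses base stop() rather than whatever condition system the package uses.

```r
# Hypothetical sketch: return the year invisibly, or abort with a message.
validate_year_sketch <- function(year, available) {
  if (length(year) != 1 || !year %in% available) {
    stop(
      "Year ", paste(year, collapse = ", "), " is not available. ",
      "Available years: ", paste(available, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(year)
}

enem_years <- 2009:2023  # range documented for get_enem()
ok <- validate_year_sketch(2023, enem_years)
bad <- tryCatch(
  validate_year_sketch(1990, enem_years),
  error = function(e) conditionMessage(e)
)
```

A valid year passes through unchanged, while an unavailable one produces an error message that lists the valid choices.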