---
title: "How to use this package"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{How to use this package}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(eudata)
```

```{r}
library(dplyr)
library(purrr)
library(ggplot2)
```

The data on GISCO is divided into topics.

```{r}
get_topics()
```

Select the topics that you are interested in. Within each topic there are numerous files. These may differ in the year they are associated with, spatial resolution, coordinate reference system, and data format, among other things. This package provides easy access to the latest files. The example below selects the highest-resolution file, where the coordinate system is the usual lat/long (EPSG:4326).

```{r}
api <- get_topic("NUTS")
file_list <- get_latest_files(api)$gpkg |>
  grep(pattern = "01M_.*_4326_", value = TRUE)
file_list
```

Be aware that these files can be huge. The `get_content_length` function returns the size of a file without downloading it. It is not vectorized, so you have to use a `map`-like construct if you have a list of files.

```{r}
# Small helper: turn a named vector into a two-column tibble.
to_tibble <- function(x, column_name = "value")
  tibble::tibble(names = names(x), `:=`(!!column_name, x))

file_sizes <- map_int(file_list, get_content_length, api = api) |>
  to_tibble(column_name = "size") # alternatively: tibble::as_tibble_col()

file_sizes |>
  knitr::kable(
    format.args = list(big.mark = "_", scientific = FALSE)
  )
```

Suppose we have selected a file to download. You can save it to a local file using the `get_content` function, which also saves a copy into a cache under your cache folder. The location of this folder is OS dependent; use `rappdirs::user_cache_dir("eudata")` to locate it. If you do not specify a `dest` file, the data is downloaded into a temporary file, whose path is the `body` element of the result of the call.

```{r}
file_to_download <- grep(pattern = "RG.*LEVL_3", file_list, value = TRUE)
file_to_download
result <- get_content(
  api = api,
  end_point = file_to_download,
  save_to_file = TRUE
)
result
```

The selected data format `gpkg` (GeoPackage) can be read into memory with the `sf` package. To start, only the first five records are read.

```{r}
db_file <- result$body
layers <- sf::st_layers(db_file)
layers
# A GeoPackage may contain several layers; query the first one by name.
layer <- layers$name[1]
sample <- sf::st_read(
  db_file,
  query = glue::glue("select * from \"{layer}\" limit 5")
)
sample
```

Once you know the structure of the database, it is easy to filter, for example, for Hungarian data only.

```{r}
hu_data <- sf::st_read(
  db_file,
  query = glue::glue("select * from \"{layer}\" where CNTR_CODE = 'HU'")
)
hu_data |> knitr::kable()
```

A map with `ggplot2`:

```{r}
hu_data |>
  ggplot() +
  geom_sf()
```
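The download above also left a copy in the cache folder mentioned earlier. Below is a minimal sketch of how you could peek into it, assuming only that the cache lives at `rappdirs::user_cache_dir("eudata")` as noted above; the chunk is not evaluated here, and the folder layout is an internal detail of the package.

```{r, eval = FALSE}
# Not run: list what has accumulated in the cache so far.
# The location is OS dependent; rappdirs resolves it for us.
cache_dir <- rappdirs::user_cache_dir("eudata")
list.files(cache_dir, recursive = TRUE)
```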
Another example, now for postal codes.

```{r}
api <- get_topic("Postal")
file_to_download <- grep("_4326", get_latest_files(api)$gpkg, value = TRUE)
result <- get_content(api, file_to_download, save_to_file = TRUE)
result
```

```{r}
db_file <- result$body
layers <- sf::st_layers(db_file)
layers
layer <- layers$name[1]
sample <- sf::st_read(
  db_file,
  query = glue::glue("select * from \"{layer}\" limit 5")
)
sample
hu_data <- sf::st_read(
  db_file,
  query = glue::glue("select * from \"{layer}\" where CNTR_ID = 'HU'")
)
hu_data |> select(POSTCODE, LAU_NAT)
```

A single city, reprojected to the Hungarian national projection, EOV (`EPSG:23700`):

```{r}
EOV <- "EPSG:23700"
hu_data |>
  filter(grepl("^Gyöngyös$", LAU_NAT)) |>
  ggplot() +
  geom_sf() +
  coord_sf(crs = EOV, datum = EOV)
```

Cities with the highest number of associated postal codes:

```{r}
hu_data |>
  sf::st_drop_geometry() |>
  count(LAU_NAT) |>
  arrange(desc(n)) |>
  filter(n > 1)
```
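The same counts also make a quick bar chart. A minimal sketch with `ggplot2`, reusing only columns already shown above:

```{r}
# Bar chart of the cities that have more than one postal code,
# ordered by the number of postal codes.
hu_data |>
  sf::st_drop_geometry() |>
  count(LAU_NAT) |>
  filter(n > 1) |>
  ggplot(aes(x = reorder(LAU_NAT, n), y = n)) +
  geom_col() +
  coord_flip() +
  labs(x = NULL, y = "number of postal codes")
```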