---
title: "How to use this package"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{How to use this package}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(eudata)
```

```{r}
library(dplyr)
library(purrr)
library(ggplot2)
```

The data on GISCO is divided into topics.

```{r}
get_topics()
```

Select the topics that you are interested in. Within each topic there are numerous files. These may differ in the year they are associated with, spatial resolution, coordinate reference system, and data format, among other things. This package provides easy access to the latest files. The example below selects the highest-resolution file, where the coordinate system is the usual lat/long (EPSG:4326).

```{r}
api <- get_topic("NUTS")
file_list <- get_latest_files(api)$gpkg |>
  grep(pattern = "01M_.*_4326_", value = TRUE)
file_list
```

Be aware that these files can be huge. The `get_content_length` function returns the size of a file without downloading it. It is not vectorized, so you have to use a `map`-like construct if you have a list of files.

```{r}
# Small helper: turn a named vector into a two-column tibble.
to_tibble <- function(x, column_name = "value")
  tibble::tibble(names = names(x), `:=`(!!column_name, x))

file_sizes <- map_int(file_list, get_content_length, api = api) |>
  to_tibble(column_name = "size") # alternatively: tibble::as_tibble_col()

file_sizes |>
  knitr::kable(
    format.args = list(big.mark = "_", scientific = FALSE)
  )
```

Suppose we have selected a file to download. You can save it to a local file using the `get_content` function, which also saves a copy into a cache under your cache folder. The location of this folder is OS dependent; use `rappdirs::user_cache_dir("eudata")` to locate it. If you do not specify a `dest` file, the data is downloaded into a temporary file, whose path is the `body` element of the result of the call.

```{r}
file_to_download <- grep(pattern = "RG.*LEVL_3", file_list, value = TRUE)
file_to_download
result <- get_content(
  api = api,
  end_point = file_to_download,
  save_to_file = TRUE
)
result
```

The selected data format `gpkg` (GeoPackage) can be read into memory with the `sf` package. To start, only the first five records are read.

```{r}
db_file <- result$body
layers <- sf::st_layers(db_file)
layers
# A GeoPackage may contain several layers; query the first one by name.
layer <- layers$name[1]
sample <- sf::st_read(
  db_file,
  query = glue::glue("select * from \"{layer}\" limit 5")
)
sample
```

Once you know the structure of the database, it is easy to filter, for example, for Hungarian data only.

```{r}
hu_data <- sf::st_read(
  db_file,
  query = glue::glue("select * from \"{layer}\" where CNTR_CODE = 'HU'")
)
hu_data |> knitr::kable()
```

A map with `ggplot2`:

```{r}
hu_data |>
  ggplot() +
  geom_sf()
```
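The download above also left a copy in the cache folder mentioned earlier. Below is a minimal sketch of how you could peek into it, assuming only that the cache lives at `rappdirs::user_cache_dir("eudata")` as noted above; the chunk is not evaluated here, and the folder layout is an internal detail of the package.

```{r, eval = FALSE}
# Not run: list what has accumulated in the cache so far.
# The location is OS dependent; rappdirs resolves it for us.
cache_dir <- rappdirs::user_cache_dir("eudata")
list.files(cache_dir, recursive = TRUE)
```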
Another example, now for postal codes.

```{r}
api <- get_topic("Postal")
file_to_download <- grep("_4326", get_latest_files(api)$gpkg, value = TRUE)
result <- get_content(api, file_to_download, save_to_file = TRUE)
result
```

```{r}
db_file <- result$body
layers <- sf::st_layers(db_file)
layers
layer <- layers$name[1]
sample <- sf::st_read(
  db_file,
  query = glue::glue("select * from \"{layer}\" limit 5")
)
sample
hu_data <- sf::st_read(
  db_file,
  query = glue::glue("select * from \"{layer}\" where CNTR_ID = 'HU'")
)
hu_data |> select(POSTCODE, LAU_NAT)
```

A single city, reprojected to the Hungarian national projection, EOV (`EPSG:23700`):

```{r}
EOV <- "EPSG:23700"
hu_data |>
  filter(grepl("^Gyöngyös$", LAU_NAT)) |>
  ggplot() +
  geom_sf() +
  coord_sf(crs = EOV, datum = EOV)
```

Cities with the highest number of associated postal codes:

```{r}
hu_data |>
  sf::st_drop_geometry() |>
  count(LAU_NAT) |>
  arrange(desc(n)) |>
  filter(n > 1)
```
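The same counts also make a quick bar chart. A minimal sketch with `ggplot2`, reusing only columns already shown above:

```{r}
# Bar chart of the cities that have more than one postal code,
# ordered by the number of postal codes.
hu_data |>
  sf::st_drop_geometry() |>
  count(LAU_NAT) |>
  filter(n > 1) |>
  ggplot(aes(x = reorder(LAU_NAT, n), y = n)) +
  geom_col() +
  coord_flip() +
  labs(x = NULL, y = "number of postal codes")
```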