---
title: "Download data"
author: "Martin Westgate & Dax Kellie"
date: '2024-11-19'
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Download data}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

The `atlas_` functions are used to return data from the atlas chosen using `galah_config()`. They are:

- `atlas_counts()`
- `atlas_occurrences()`
- `atlas_species()`
- `atlas_media()`
- `atlas_taxonomy()`

The final `atlas_` function, `atlas_citation()`, is unusual: it does not return any new data, but instead provides a citation for an existing dataset (downloaded using `atlas_occurrences()`) that has an associated DOI. The other functions are described below.

It is equally permissible to use the `type` argument of `galah_call()` to specify the kind of data you want, and then retrieve the data using `collect()`. Here we use the `atlas_` prefix for consistency with earlier versions of galah, and because some `atlas_` functions include shortcuts that make life easier.

# Record counts

`atlas_counts()` provides summary counts of records in the specified atlas without needing to download all the records first.

``` r
galah_config(atlas = "Australia")

# Total number of records in the ALA
atlas_counts()
```

```
## # A tibble: 1 × 1
##       count
##
## 1 146185520
```

Group and summarise record counts by specific fields using `galah_group_by()`.

``` r
galah_call() |>
  galah_group_by(kingdom) |>
  atlas_counts()
```

```
## # A tibble: 12 × 2
##    kingdom           count
##
##  1 Animalia      113408280
##  2 Plantae        27572183
##  3 Fungi           2448600
##  4 Chromista       1057157
##  5 Protista         316541
##  6 Bacteria         113480
##  7 Archaea            4120
##  8 Virus              2382
##  9 Bamfordvirae        210
## 10 Orthornavirae       138
## 11 Viroid              104
## 12 Shotokuvirae         41
```

# Species lists

A common use case of atlas data is to identify which species occur in a specified region, time period, or taxonomic group.
`atlas_species()` is similar to `search_taxa()` in that it returns taxonomic information and unique identifiers, but it differs in two ways: it returns information only on species, and it is more flexible because it supports filtering.

``` r
species <- galah_call() |>
  galah_identify("Rodentia") |>
  galah_filter(stateProvince == "Northern Territory") |>
  atlas_species()

species |> head()
```

```
## # A tibble: 6 × 11
##   taxon_concept_id species_name scientific_name_auth…¹ taxon_rank kingdom phylum class order family genus vernacular_name
##
## 1 https://biodive… Pseudomys d… (Gould, 1842)          species    Animal… Chord… Mamm… Rode… Murid… Pseu… Delicate Mouse
## 2 https://biodive… Mesembriomy… (J.E. Gray, 1843)      species    Animal… Chord… Mamm… Rode… Murid… Mese… Black-footed T…
## 3 https://biodive… Zyzomys arg… (Thomas, 1889)         species    Animal… Chord… Mamm… Rode… Murid… Zyzo… Common Rock-rat
## 4 https://biodive… Pseudomys h… (Waite, 1896)          species    Animal… Chord… Mamm… Rode… Murid… Pseu… Sandy Inland M…
## 5 https://biodive… Melomys bur… (Ramsay, 1887)         species    Animal… Chord… Mamm… Rode… Murid… Melo… Grassland Melo…
## 6 https://biodive… Notomys ale… Thomas, 1922           species    Animal… Chord… Mamm… Rode… Murid… Noto… Spinifex Hoppi…
## # ℹ abbreviated name: ¹scientific_name_authorship
```

# Occurrence data

To download occurrence data you will need to specify an email in `galah_config()` that has been registered to an account with your selected GBIF node. See more information in the [config section](#configuring-galah).

``` r
galah_config(email = "your_email@email.com", atlas = "Australia")
```

Download occurrence records for *Eolophus roseicapilla*.

``` r
occ <- galah_call() |>
  galah_identify("Eolophus roseicapilla") |>
  galah_filter(
    stateProvince == "Australian Capital Territory",
    year >= 2010,
    profile = "ALA"
  ) |>
  galah_select(institutionID, group = "basic") |>
  atlas_occurrences()
```

```
## Retrying in 1 seconds.
## Retrying in 2 seconds.
## Retrying in 4 seconds.
```

``` r
occ |> head()
```

```
## # A tibble: 6 × 9
##   recordID            scientificName taxonConceptID decimalLatitude decimalLongitude eventDate           occurrenceStatus
##
## 1 0000a928-d756-42eb… Eolophus rose… https://biodi…           -35.6             149. 2017-04-19 09:11:00 PRESENT
## 2 0001bc78-d2e9-48aa… Eolophus rose… https://biodi…           -35.2             149. 2019-08-13 15:13:00 PRESENT
## 3 0002064f-08ea-425b… Eolophus rose… https://biodi…           -35.3             149. 2014-03-16 06:48:00 PRESENT
## 4 00022dd2-9f85-4802… Eolophus rose… https://biodi…           -35.3             149. 2022-05-08 08:20:00 PRESENT
## 5 0002cc35-8d5a-4d20… Eolophus rose… https://biodi…           -35.3             149. 2015-11-01 08:00:00 PRESENT
## 6 00030a8c-082f-44f0… Eolophus rose… https://biodi…           -35.3             149. 2022-01-06 11:47:00 PRESENT
## # ℹ 2 more variables: dataResourceName, institutionID
```

# Media metadata

In addition to text data describing individual occurrences and their attributes, the ALA stores images, sounds and videos associated with a given record. Metadata on these media files can be downloaded using `atlas_media()`.

``` r
media_data <- galah_call() |>
  galah_identify("Eolophus roseicapilla") |>
  galah_filter(
    year == 2020,
    cl22 == "Australian Capital Territory") |>
  atlas_media()

media_data |> head()
```

```
## # A tibble: 6 × 19
##   media_id   recordID scientificName taxonConceptID decimalLatitude decimalLongitude eventDate           occurrenceStatus
##
## 1 ff8322d0-… 003a192… Eolophus rose… https://biodi…           -35.3             149. 2020-09-12 16:11:00 PRESENT
## 2 c66fc819-… 015ee7c… Eolophus rose… https://biodi…           -35.4             149. 2020-08-09 15:11:00 PRESENT
## 3 fe6d7b94-… 05e86b7… Eolophus rose… https://biodi…           -35.4             149. 2020-11-13 22:29:00 PRESENT
## 4 2f4d32c0-… 063bb0f… Eolophus rose… https://biodi…           -35.6             149. 2020-08-04 11:50:00 PRESENT
## 5 73407414-… 063bb0f… Eolophus rose… https://biodi…           -35.6             149. 2020-08-04 11:50:00 PRESENT
## 6 89171c49-… 063bb0f… Eolophus rose… https://biodi…           -35.6             149. 2020-08-04 11:50:00 PRESENT
## # ℹ 11 more variables: dataResourceName, multimedia, images, sounds, videos,
## #   creator, license, mimetype, width, height, image_url
```

To actually download the media files to your computer, use `collect_media()`.

``` r
media_data |>
  collect_media()
```

# Taxonomic trees

`atlas_taxonomy()` provides a way to build taxonomic trees from one clade down to another using each GBIF node's internal taxonomy. Specify the taxonomic level that your tree should descend to by filtering on `rank` in `galah_filter()`.

``` r
galah_call() |>
  galah_identify("chordata") |>
  galah_filter(rank == class) |>
  atlas_taxonomy()
```

```
## # A tibble: 19 × 4
##    name            rank      parent_taxon_concept_id                                                   taxon_concept_id
##
##  1 Chordata        phylum                                                                              https://biodivers…
##  2 Cephalochordata subphylum https://biodiversity.org.au/afd/taxa/065f1da4-53cd-40b8-a396-80fa5c74dedd https://biodivers…
##  3 Tunicata        subphylum https://biodiversity.org.au/afd/taxa/065f1da4-53cd-40b8-a396-80fa5c74dedd https://biodivers…
##  4 Appendicularia  class     https://biodiversity.org.au/afd/taxa/1c20ed62-d918-4e42-b625-8b86d533cc51 https://biodivers…
##  5 Ascidiacea      class     https://biodiversity.org.au/afd/taxa/1c20ed62-d918-4e42-b625-8b86d533cc51 https://biodivers…
##  6 Thaliacea       class     https://biodiversity.org.au/afd/taxa/1c20ed62-d918-4e42-b625-8b86d533cc51 https://biodivers…
##  7 Vertebrata      subphylum https://biodiversity.org.au/afd/taxa/065f1da4-53cd-40b8-a396-80fa5c74dedd https://biodivers…
##  8 Agnatha         informal  https://biodiversity.org.au/afd/taxa/5d6076b1-b7c7-487f-9d61-0fea0111cc7e https://biodivers…
##  9 Myxini          informal  https://biodiversity.org.au/afd/taxa/66db22c8-891d-4b16-a1a2-b66feaeaa3e0 https://biodivers…
## 10 Petromyzontida  informal  https://biodiversity.org.au/afd/taxa/66db22c8-891d-4b16-a1a2-b66feaeaa3e0 https://biodivers…
## 11 Gnathostomata   informal  https://biodiversity.org.au/afd/taxa/5d6076b1-b7c7-487f-9d61-0fea0111cc7e https://biodivers…
## 12 Amphibia        class     https://biodiversity.org.au/afd/taxa/ef5515fd-a0a2-4e16-b61a-0f19f8900f76 https://biodivers…
## 13 Aves            class     https://biodiversity.org.au/afd/taxa/ef5515fd-a0a2-4e16-b61a-0f19f8900f76 https://biodivers…
## 14 Mammalia        class     https://biodiversity.org.au/afd/taxa/ef5515fd-a0a2-4e16-b61a-0f19f8900f76 https://biodivers…
## 15 Reptilia        class     https://biodiversity.org.au/afd/taxa/ef5515fd-a0a2-4e16-b61a-0f19f8900f76 https://biodivers…
## 16 Pisces          informal  https://biodiversity.org.au/afd/taxa/ef5515fd-a0a2-4e16-b61a-0f19f8900f76 https://biodivers…
## 17 Actinopterygii  class     https://biodiversity.org.au/afd/taxa/e22efeb4-2cb5-4250-8d71-61c48bdaa051 https://biodivers…
## 18 Chondrichthyes  class     https://biodiversity.org.au/afd/taxa/e22efeb4-2cb5-4250-8d71-61c48bdaa051 https://biodivers…
## 19 Sarcopterygii   class     https://biodiversity.org.au/afd/taxa/e22efeb4-2cb5-4250-8d71-61c48bdaa051 https://biodivers…
```

# Configuring galah

Various aspects of the galah package can be customized.

## Email

To download occurrence records, species lists or media, you will need to provide an email address registered with the service that you want to use (e.g. for the ALA you can create an account [here](https://auth.ala.org.au/userdetails/registration/createAccount)). Once an email is registered, it should be stored in the config:

``` r
galah_config(email = "myemail@gmail.com")
```

## Setting your directory

By default, galah stores downloads in a temporary folder, meaning that the local files are automatically deleted when the R session ends. To preserve downloaded files, set the directory to a non-temporary location.

``` r
galah_config(directory = "example/dir")
```

## Setting the download reason

ALA requires that you provide a reason when downloading occurrence data (via the galah `atlas_occurrences()` function). The download reason is set to "scientific research" by default, but you can change this using the `download_reason_id` argument of `galah_config()`.
See `show_all(reasons)` for valid download reasons.

``` r
galah_config(download_reason_id = your_reason_id)
```

## Debugging

If things aren't working as expected, more detail (particularly about web requests and caching behaviour) can be obtained by setting `verbose = TRUE`.

``` r
galah_config(verbose = TRUE)
```
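
The settings described in this section can also be supplied together in a single `galah_config()` call. The following is a minimal sketch rather than a recommended setup: the email address, the folder name `galah_downloads` and the reason ID `4` are placeholder assumptions, so check `show_all(reasons)` for the IDs that are valid for your atlas before relying on them.

``` r
library(galah)

# Hypothetical folder for persistent downloads; created first in case it does not exist yet
download_dir <- "galah_downloads"
dir.create(download_dir, showWarnings = FALSE)

galah_config(
  atlas = "Australia",
  email = "your_email@email.com",  # placeholder; use your registered address
  directory = download_dir,        # keep downloads after the R session ends
  download_reason_id = 4,          # assumed ID; confirm with show_all(reasons)
  verbose = TRUE                   # print extra detail while debugging
)
```

Setting everything in one place at the top of a script makes it easier to see which atlas and account a later download will use.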