---
title: "Getting Started with perumammals"
author: "Paul Efren Santos Andrade"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started with perumammals}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(perumammals)
```

## Introduction

The **perumammals** package provides tools for working with Peru's mammalian biodiversity. It includes a curated taxonomic backbone based on Pacheco et al. (2021), the most comprehensive and up-to-date synthesis of Peruvian mammal diversity.

This vignette will show you how to:

- Install and load the package
- Access the mammal species data
- Validate species names
- Identify endemic species
- Explore species by ecoregion
- Work with taxonomic families

## Installation

You can install the development version of perumammals from GitHub:

```{r install, eval=FALSE}
# Using pak (recommended)
pak::pak("PaulESantos/perumammals")

# Or using remotes
remotes::install_github("PaulESantos/perumammals")
```

## Loading the package

```{r setup}
library(perumammals)
```

## Available dataset

The main dataset included in the package is the species list provided as an appendix in Pacheco et al.
(2021):

```{r datasets}
# Main species backbone
data(peru_mammals)
head(peru_mammals)
```

## Basic name validation

The core function `validate_peru_mammals()` checks whether species names are present in the Peruvian mammal checklist:

```{r validate-basic}
# A vector of species names, including one misspelling
species_list <- c(
  "Puma concolor",           # Valid name
  "Tremarctos ornatus",      # Valid name
  "Panthera onca",           # Valid name
  "Lycalopex sechurae",      # Valid name
  "Odocoileus virginianus",  # Valid name
  "Puma concolar"            # Misspelled
)

results <- validate_peru_mammals(species_list)
results
```

## Quick checks with wrapper functions

### Check if species occur in Peru

```{r is-peru}
# Returns TRUE/FALSE
is_peru_mammal(species_list)
```

### Identify endemic species

```{r endemic}
# Check which species are endemic to Peru
species_list <- c("Thomasomys notatus", "Tremarctos ornatus",
                  "Eptesicus mochica", "Puma concolar")

is_endemic_peru(species_list)

# Recode endemic status as a Spanish-language label
endemic_status <- ifelse(
  is_endemic_peru(species_list) == "Endemic to Peru",
  "Endémica",
  "No endémica"
)
endemic_status
```

### Check match quality

```{r match-quality}
# Get match quality levels
match_quality_peru(species_list)
```

## Working with data frames

The validation functions accept character vectors, so they can be applied directly to data frame columns in a dplyr pipeline:

```{r dataframe, warning=FALSE, message=FALSE}
library(dplyr)

# Create a sample dataset
my_data <- tibble(
  species = species_list,
  abundance = c(5, 3, 2, 8)
)

# Add validation results
my_data_validated <- my_data |>
  mutate(
    in_peru = is_peru_mammal(species),
    endemic = is_endemic_peru(species),
    match_quality = match_quality_peru(species)
  )

my_data_validated
```

## Exploring taxonomic families

### List all families

```{r families}
# Get summary of all families
families <- pm_list_families()
families

# Families with highest species richness
families |>
  arrange(desc(n_species)) |>
  head(10)
```

### Filter by family

```{r family-filter}
# Get the summary for leaf-nosed bats (Phyllostomidae)
pm_list_families() |>
  filter(family == "Phyllostomidae")

# Get species list for a specific family
pm_species(family = "Phyllostomidae")
```

## Analyzing endemic species

### Get endemic species list

```{r endemic-list}
# List all endemic species
endemic_mammals <- pm_species(endemic = TRUE)
endemic_mammals

# Endemic species by family
endemic_mammals |>
  group_by(family) |>
  summarise(n_species = n_distinct(scientific_name)) |>
  arrange(desc(n_species)) |>
  head(10)
```

### Endemic species by ecoregion

```{r endemic-ecoregion}
# Compare endemism across ecoregions
endemic_rate <- pm_list_ecoregions(include_endemic = TRUE)
endemic_rate

# Endemic species in Yungas
pm_by_ecoregion(ecoregion = "YUN", endemic = TRUE)
```

## Ecoregion analysis

### Species distribution across ecoregions

```{r ecoregion-dist}
# Count species per ecoregion
pm_list_ecoregions()
```

### Species with widest distribution

```{r wide-distribution}
# Species occurring in the most ecoregions (top 10, ties included)
peru_mammals_ecoregions |>
  count(scientific_name, name = "n_ecoregions") |>
  slice_max(n_ecoregions, n = 10)
```

## Practical examples

### Example 1: Data cleaning workflow

```{r cleaning}
# Messy species list from field observations
field_data <- tibble(
  location = c("Manu", "Tambopata", "Paracas", "Cusco", "Lima"),
  species_name = c(
    "puma concolor",      # lowercase
    "Tremarctos ornatu",  # missing 's'
    "Otaria flavescens",  # marine mammal
    "Lycalopex sechure",  # missing 'a'
    "Unknown bat"         # invalid
  ),
  count = c(2, 1, 15, 3, 8)
)

# Validate and clean
field_data_clean <- field_data |>
  mutate(
    # Validate names
    validated = validate_peru_mammals(species_name)$Matched.Name,
    # Check if in Peru
    in_checklist = is_peru_mammal(species_name),
    # Get match quality
    quality = match_quality_peru(species_name)
  )

field_data_clean
```

### Example 2: Endemic species summary

```{r endemic-summary}
# Get all endemic mammals
endemic_species <- pm_species(endemic = TRUE)
endemic_species

# Total endemic species by order
endemic_species |>
  count(order, name =
"n_endemic") |> arrange(desc(n_endemic)) ``` ### Example 3: Ecoregion-specific analysis ```{r ecoregion-analysis} # Focus on Selva Baja (Amazon lowlands) selva_baja_species <- pm_by_ecoregion(ecoregion = "SB") selva_baja_species # Endemic species in Selva Baja pm_by_ecoregion(ecoregion = "SB", endemic = TRUE) |> count(family, name = "n_species") |> arrange(desc(n_species)) ``` ## Advanced: Fuzzy matching details The validation algorithm uses a hierarchical matching approach: 1. **Exact match**: Perfect match with accepted name 2. **Genus + fuzzy species**: Genus exact, species with small differences 4. **Fuzzy genus + exact species**: Species exact, genus with small differences 5. **Double fuzzy**: Both genus and species with small differences 6. **No match**: No acceptable match found ```{r fuzzy-details} # Examples of different match levels test_names <- c( "Puma concolor", # Level: Exact "Tremarctos ornatus Cuvier", "Lycalopex sechure", # Level: Genus + fuzzy species "Lyclopex sechurae", # Level: Fuzzy genus + exact species "Panthera onca" # Level: Exact ) validate_peru_mammals(test_names) |> select(Orig.Name, Matched.Name, matched) ``` ## Citation When using this package, please cite both the package and the source data: ```{r citation, eval=FALSE} citation("perumammals") ``` **Package citation:** Santos Andrade, P. E., & Gonzales Guillen, F. N. (2025). perumammals: Taxonomic Backbone and Name Validation Tools for Mammals of Peru. **Data source:** Pacheco, V., Cadenillas, R., Zeballos, H., Hurtado, C. M., Ruelas, D., & Pari, A. (2021). Lista actualizada de la diversidad de los mamíferos del Perú y una propuesta para su actualización. *Revista Peruana de Biología*, 28(special issue), e21019. https://doi.org/10.15381/rpb.v28i4.21019