--- title: "Reptile Data Formatting Tutorial" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Reptile Data Formatting Tutorial} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` This tutorial explains how to use our reptile taxonomy functions to format and access structured information about reptile species in your research or applications. ## Overview of Functions Our package provides two main functions for working with reptile taxonomic data: 1. `format_all_attributes()` 2. `format_selected_attributes()` These functions help researchers efficiently organize and access taxonomic information about reptile species from various sources into standardized formats. ## Using `format_all_attributes()` The `format_all_attributes()` function processes a complete reptile dataset and formats all available taxonomic attributes into structured tibbles (data frames). ### Syntax ```r format_all_attributes(reptile_data) ``` ### Parameters - `reptile_data`: A dataframe or list containing raw reptile taxonomic information ### Return Value The function returns a list of tibbles, with each tibble containing a specific category of taxonomic information: - `distribution`: Geographic distribution information - `synonyms`: Scientific name synonyms throughout taxonomic history - `higher_taxa`: Taxonomic classification hierarchy - `subspecies`: Information about recognized subspecies - `common_names`: Common names in different languages - `reproduction`: Reproductive biology information - `types`: Type specimen information - `diagnosis`: Diagnostic morphological features - `comments`: Additional taxonomic notes - `etymology`: Origin and meaning of scientific names - `references`: Scientific literature references ### Example ```{r} library(reptiledbr) species_list <- c( "Lachesis muta", "Python bivittatus", "Crotalus atrox", "Bothrops atrox insularis", # Trinomial (con subespecie) - debería dar error "Lachesis sp" ) # reptile_data <- get_reptiledb_data(species_list) # reptile_data # Format all attributes all_attributes <- format_all_attributes(reptile_data) # Access specific attribute categories all_attributes$distribution all_attributes$common_names ``` ## Using `format_selected_attributes()` If you only need specific taxonomic attributes, the `format_selected_attributes()` function allows you to extract only the categories you're interested in. ### Syntax ```r format_selected_attributes(reptile_data, attributes, quiet = FALSE) ``` ### Parameters - `reptile_data`: A dataframe or list containing raw reptile taxonomic information - `attributes`: A character vector listing the specific attributes to extract - `quiet`: Logical value. If TRUE, suppresses processing messages ### Return Value Returns a list containing only the requested attribute tibbles. ### Example ```{r} # Extract only synonyms, higher taxa, and common names selected_info <- format_selected_attributes( reptile_data = reptile_data, attributes = c("Synonym", "Higher Taxa", "Common Names"), quiet = TRUE ) # Access the selected information selected_info$Synonym selected_info$`Higher Taxa` selected_info$`Common Names` ``` ## Working with the Formatted Data Once you have formatted your reptile data, you can use standard R data manipulation techniques to analyze the information: ```{r} # Find all venomous species all_attributes$comments |> dplyr::filter(stringr::str_detect(comment_detail, "Venomous")) # Extract distribution information for a specific species all_attributes$distribution |> dplyr::filter(input_name == "Crotalus atrox") ``` ## Data Structure Each formatted attribute tibble includes at least these standard columns: - `input_name`: The original species name provided - `genus`: The genus name - `species`: The species epithet - Attribute-specific column (e.g., `distribution`, `common_name`, `synonym`, etc.) This consistent structure allows for easy filtering, joining, and analysis across different attribute categories. ## Tips for Effective Use - When working with large datasets, use `format_selected_attributes()` to improve performance - Consider that some species may have multiple entries in certain categories (e.g., multiple common names or distribution records) - The `input_name` column provides a reliable key for joining information across different attribute tibbles By effectively using these formatting functions, you can streamline your reptile taxonomic research and data analysis workflows.