rtry: Preprocessing Plant Trait Data

Overview

rtry is an R package that supports the use of plant trait data by providing easy-to-apply functions for the basic steps of data preprocessing: data import, data exploration, selection of columns and rows, exclusion of trait data according to different attributes, long- to wide-table transformation, data export, and geocoding. The package is designed for preprocessing data released from the TRY Plant Trait Database (https://www.try-db.org), but it is also applicable to other trait data.


Sources of rtry

There are two sources where users can download the rtry package and the relevant documentation.

CRAN

The rtry package is available on the CRAN repository. This is the recommended option to obtain the latest version of the package.

GitHub Repository

The TRY R project is an open-source project that can be found on the MPI-BGC-Functional-Biogeography GitHub repository: https://github.com/MPI-BGC-Functional-Biogeography/rtry.

Developers are also welcome to contribute to the package.


Installation guide

R environment

R 4.0.5 was used to develop and build the rtry package, and this is the minimum version required to use the package.
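
The version requirement above can be checked from the R console before installing. The following is a minimal sketch that only uses base R; the variable names are illustrative and not part of rtry.

```r
# Minimum R version stated above for rtry
required_r <- "4.0.5"

# getRversion() returns the running R version as a comparable version object
r_is_recent_enough <- getRversion() >= required_r

if (!r_is_recent_enough) {
  warning("R ", as.character(getRversion()), " is older than ", required_r,
          "; please update R before installing rtry.")
}
```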

The latest version of R can be downloaded from CRAN, a network of FTP and web servers around the world that store the code and documentation of R: https://cran.r-project.org/

If you use RStudio, we also recommend using its latest version, which can be found at https://posit.co/download/rstudio-desktop/. The free and open-source version of RStudio Desktop is sufficient.


Install the rtry package

The installation of the rtry package can be performed through the RStudio console.

First, install all the dependencies with the command:

install.packages(c("data.table", "dplyr", "tidyr", "jsonlite", "curl"))

Once the installation is completed, the message “The downloaded source packages are in <path>” should appear.
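
If some of the dependencies are already installed, the step above could be limited to the missing ones. The sketch below assumes a small helper, missing_packages, which is hypothetical and not part of rtry; it only uses base R functions.

```r
# Dependencies named in the install command above
deps <- c("data.table", "dplyr", "tidyr", "jsonlite", "curl")

# Hypothetical helper: return the packages that cannot be found in any library
missing_packages <- function(pkgs) {
  # system.file() returns "" when a package is not installed
  installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)),
                      logical(1))
  pkgs[!installed]
}

to_install <- missing_packages(deps)
print(to_install)

# Install only in an interactive session, and only if something is missing
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```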

Next, install the rtry package with the command:

From CRAN:

install.packages("rtry")

Alternatively, if you downloaded the source package (.tar.gz) from the GitHub repository:

install.packages("<path_to_rtry.tar.gz>", repos = NULL, type = "source")

You may ignore the warning message “Rtools is required to build R packages but is not currently installed” if it appears.

Once the installation is completed, the rtry package needs to be loaded with the command library(rtry).


Functions

Inside the rtry package, we use a function naming convention where each function begins with the prefix rtry_ followed by a description of what the function does. Examples are rtry_import, rtry_explore, rtry_select_row and rtry_exclude, which are used in the Usage section.

Once rtry is installed and loaded, type ? followed by the function name to open its documentation, e.g.:

?rtry_import

To view the R code underlying the function:

View(rtry_import)
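
Because all functions share the rtry_ prefix, the functions provided by the installed version can also be listed from the console. The following is a minimal sketch; list_prefixed_exports is a hypothetical helper, not an rtry function, and it returns an empty vector if the package is not installed.

```r
# Hypothetical helper: exported names of an installed package that start
# with a given prefix
list_prefixed_exports <- function(pkg, prefix) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(character(0))  # package not installed: nothing to list
  }
  exports <- getNamespaceExports(pkg)
  sort(exports[startsWith(exports, prefix)])
}

# All rtry_ functions available in the installed version
print(list_prefixed_exports("rtry", "rtry_"))
```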


Usage

Here we provide a brief example of how to use the rtry package to import a dataset released from TRY, explore the data and exclude trait records based on specific criteria.

The rtry_import function displays the number of columns and rows of the imported dataset, together with the column headers, and thus provides a first step in exploring the dataset. TRY releases data in a long-table format: one trait record or one ancillary data entry per row.
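
To make the long-table format concrete, here is a small illustration in base R. The table is a toy example with invented values: the TraitID and the DataID 9999 are placeholders, and only a few of the columns of a real TRY release are shown.

```r
# Toy long table (made-up values): one trait record or ancillary entry per row
long_table <- data.frame(
  ObservationID = c(1, 1, 2, 2),
  TraitID       = c(9998, NA, 9998, NA),    # NA marks ancillary data
  DataID        = c(9999, 413, 9999, 413),  # 413: plant development status
  OrigValueStr  = c("12.5", "adult", "9.8", "juvenile"),
  stringsAsFactors = FALSE
)

dim(long_table)       # number of rows and columns
colnames(long_table)  # column headers

# Trait records have a TraitID, ancillary data do not
trait_records  <- long_table[!is.na(long_table$TraitID), ]
ancillary_data <- long_table[is.na(long_table$TraitID), ]
```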

In the second step, we explore the dataset for plant species, traits and ancillary data.

Finally, we use the ancillary data on plant maturity (DataID 413) to exclude traits measured on juvenile plants or on plants of unknown development state. For this, we use a feature of the TRY data structure: trait records and ancillary data measured on the same entity (plant) are linked via the ObservationID. Then, we double-check that the data retained for further analyses contain only observations of adult and mature plants.
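
The idea behind excluding whole observations via the ObservationID can be illustrated in base R. This is a sketch of the concept on a toy table with invented values (DataID 9999 is a placeholder); it is not the implementation used inside rtry_exclude.

```r
# Toy long table (made-up values): each plant (ObservationID) has one trait
# record (DataID 9999, a placeholder) and one maturity record (DataID 413)
toy <- data.frame(
  ObservationID = c(1, 1, 2, 2, 3, 3),
  DataID        = c(9999, 413, 9999, 413, 9999, 413),
  OrigValueStr  = c("12.5", "adult", "9.8", "juvenile", "11.1", "unknown"),
  stringsAsFactors = FALSE
)

# Step 1: find the rows that match the exclusion criteria
flagged <- toy$DataID %in% 413 &
  toy$OrigValueStr %in% c("juvenile", "unknown")

# Step 2: collect the ObservationIDs of the plants those rows belong to
excluded_ids <- unique(toy$ObservationID[flagged])

# Step 3: drop every row of those plants, trait records included
toy_filtered <- toy[!toy$ObservationID %in% excluded_ids, ]

# Double-check: list the maturity values that remain after filtering
print(unique(toy_filtered$OrigValueStr[toy_filtered$DataID %in% 413]))
```

Matching on the ObservationID rather than on the flagged rows alone is what removes the trait measurements of the excluded plants together with their maturity records.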

For a comprehensive introduction and detailed example, see the vignettes rtry-introduction and rtry-workflow-general.

# Load the rtry package
library(rtry)

# Import the sample dataset from TRY provided within rtry package
TRYdata1 <- rtry_import(system.file("testdata", "data_TRY_15160.txt", package = "rtry"))

# View the imported data
View(TRYdata1)

# Explore the imported data
# Group the input data based on AccSpeciesID, AccSpeciesName, DataID, DataName, TraitID and TraitName, and sort by TraitID
# Note: a TraitID of NA means that the entry is ancillary data
TRYdata1_explore_anc <- rtry_explore(TRYdata1,
                          AccSpeciesID, AccSpeciesName, DataID, DataName,
                          TraitID, TraitName,
                          sortBy = TraitID)
View(TRYdata1_explore_anc)

# Select the rows where DataID is 413, i.e. the data containing the plant development status
# Explore the unique values of the OrigValueStr within the selected data
tmp_unfiltered <- rtry_select_row(TRYdata1, DataID %in% 413)
tmp_unfiltered <- rtry_explore(tmp_unfiltered,
                    DataID, DataName, OriglName, OrigValueStr, OrigUnitStr,
                    StdValue, Comment,
                    sortBy = OrigValueStr)
View(tmp_unfiltered)

# Exclude (remove) observations of juvenile plants or unknown development state
# Criteria
# 1. DataID equals 413
# 2. OrigValueStr equals "juvenile" or "unknown"
TRYdata1_filtered <- rtry_exclude(TRYdata1,
                      (DataID %in% 413) & (OrigValueStr %in% c("juvenile", "unknown")),
                      baseOn = ObservationID)
View(TRYdata1_filtered)

# Double-check the filtered data to ensure the exclusion worked as expected
# Select the rows where DataID is 413
# Explore the unique values of the OrigValueStr within the selected data
tmp_filtered <- rtry_select_row(TRYdata1_filtered, DataID %in% 413)
tmp_filtered <- rtry_explore(tmp_filtered,
                  DataID, DataName, OriglName, OrigValueStr, OrigUnitStr,
                  StdValue, Comment,
                  sortBy = OrigValueStr)
View(tmp_filtered)

Additional vignettes, which provide a detailed introduction to rtry and example workflows for trait data preprocessing and for geocoding, can be opened with:

vignette("<name_of_vignette>")
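
If the vignette names are not known, they can be looked up first. The following is a minimal sketch; list_vignette_names is a hypothetical helper built on the base function vignette(), and it returns an empty vector if the package is not installed or ships no vignettes.

```r
# Hypothetical helper: names of the vignettes shipped with an installed package
list_vignette_names <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(character(0))  # package not installed: no vignettes to list
  }
  # vignette(package = ...) returns a listing whose "Item" column holds the names
  unname(vignette(package = pkg)$results[, "Item"])
}

vignette_names <- list_vignette_names("rtry")
print(vignette_names)

# Any listed name can then be opened, e.g. one named in this README:
# vignette("rtry-introduction", package = "rtry")
```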


Copyright license

The rtry package is distributed under the CC BY 4.0 license. Note that the (reverse) geocoding functions provided within the package use Nominatim, which is built on OpenStreetMap data. Although the API and the data provided are free to use for any purpose, including commercial use, they are governed by the Open Database License (ODbL).