Working with Units

Carl Boettiger

2022-04-28

Overview

One essential role of EML metadata is to define precisely the units in which data are measured. For a computer to understand these units (and thus potentially convert them to other units), we must be exact in how we name them. EML already knows about many commonly used units, referred to as “standardUnits.” If a unit is in EML’s standardUnit dictionary, we can refer to it without further explanation, as long as we use the exact id for that unit, as we will see below.

Sometimes data involve a unit that is not in the standardUnit dictionary. In this case, the metadata must provide additional information about the unit, including how to convert it into the SI system. EML uses an existing standard, STMML, to represent this information, which must be given in the additionalMetadata section of an EML file. The STMML standard is also used to specify EML’s own standardUnit definitions.

Add a custom unit to EML

library("EML")
custom_units <- 
  data.frame(id = "speciesPerSquareMeter", 
             unitType = "arealDensity", 
             parentSI = "numberPerSquareMeter", 
             multiplierToSI = 1, 
             description = "number of species per square meter")


unitList <- set_unitList(custom_units)
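The parentSI and multiplierToSI fields are what make a custom unit convertible: a measured value multiplied by multiplierToSI gives the value in the parent SI unit. The following is a minimal base-R sketch of that arithmetic. The unit speciesPerHectare and the helper to_parent_si() are hypothetical, introduced only for illustration; they are not part of the EML package.

```r
# Hypothetical custom unit, defined here only to illustrate the conversion.
# One hectare is 10,000 square meters, so a per-hectare count is multiplied
# by 1e-4 to express it per square meter.
per_hectare <- data.frame(id = "speciesPerHectare",
                          unitType = "arealDensity",
                          parentSI = "numberPerSquareMeter",
                          multiplierToSI = 1e-4,
                          description = "number of species per hectare",
                          stringsAsFactors = FALSE)

# Hypothetical helper: convert values to the parent SI unit by multiplying.
to_parent_si <- function(value, unit) value * unit$multiplierToSI

richness_per_ha <- c(120, 250)
richness_per_m2 <- to_parent_si(richness_per_ha, per_hectare)
```

For speciesPerSquareMeter as defined above, multiplierToSI is 1 because the unit is already expressed per square meter, so the conversion leaves values unchanged.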

Start with a minimal EML document

me <- list(individualName = list(givenName = "Carl", surName = "Boettiger"))
my_eml <- list(
  dataset = list(
    title = "A Minimal Valid EML Dataset",
    creator = me,
    contact = me
  ),
  # custom unit definitions belong in additionalMetadata, not in dataset
  additionalMetadata = list(metadata = list(unitList = unitList))
)
write_eml(my_eml, "eml-with-units.xml")
eml_validate("eml-with-units.xml")
## [1] TRUE
## attr(,"errors")
## character(0)
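Defining a custom unit is only half of the picture: an attribute must also refer to it by its exact id. As a hedged sketch, and assuming that set_attributes() accepts a data frame with attributeName, attributeDefinition, unit, and numberType columns, a numeric column measured in the custom unit could be described as follows. The column name richness and its definition are hypothetical.

```r
library("EML")

# Hypothetical attribute table; the name and definition are illustrative only.
attributes <- data.frame(
  attributeName = "richness",
  attributeDefinition = "Species richness per unit of plot area",
  unit = "speciesPerSquareMeter",  # must match the custom unit id exactly
  numberType = "real",
  stringsAsFactors = FALSE
)

# col_classes marks the column as numeric, so a unit is expected for it.
attributeList <- set_attributes(attributes, col_classes = "numeric")
```

The resulting attributeList would then be attached to a data table in the dataset, while the unit definition itself stays in additionalMetadata.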

Note: Custom units are widely misunderstood and misused in EML. See the examples from custom-units.

Reading EML: parsing unit information, including custom unit types

Let us start by examining the numeric attributes in an example EML file. First we read in the file:

f <- system.file("tests", emld::eml_version(), "eml-datasetWithUnits.xml", package = "emld")
eml <- read_eml(f)

We extract the attributeList and examine the numeric attributes (i.e., those that have units):