---
title: "Introduction to fluxfinder"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Introduction to fluxfinder}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
The `fluxfinder` package provides functions to parse static-chamber
greenhouse gas measurement files generated by a variety of instruments;
compute flux rates using multi-observation metadata; and generate diagnostic
metrics and plots. It's designed to be easy to integrate into scientific workflows.
## Load sample data
```{r setup}
library(fluxfinder)
# Data from a LI-7810
f <- system.file("extdata/TG10-01087.data", package = "fluxfinder")
dat <- ffi_read_LI7810(f)
# Note that the fluxfinder read functions print some info after reading
# Set "options(fluxfinder.quiet = TRUE)" to suppress such messages
# Look at a subset of the data; the full data frame has 500+ rows and 25 columns
dat[1:6, 1:9]
```
The data frame returned by `ffi_read_LI7810` is all data from the raw
[LI-7810](https://www.licor.com/env/products/trace-gas/LI-7810) file,
except that `TIMESTAMP`, `TZ` (time zone of the timestamps),
`SN` (serial number), and `MODEL` columns have been added.
The analyzer data is basically a stream of measured
greenhouse gas concentrations:
```{r overview-plot, fig.width=7}
library(ggplot2)
ggplot(dat, aes(TIMESTAMP, CO2)) + geom_point()
```
## Match with metadata
For these data to be useful, we need to
associate them with _metadata_ about the measurements: when they were started,
how long they lasted, plot/treatment/collar information, etc.
```{r metadata}
# Accompanying metadata
md <- system.file("extdata/TG10-01087-metadata.csv", package = "fluxfinder")
metadat <- read.csv(md)
print(metadat)
```
**Important note**: in this sample metadata, our measurement identified is
labeled `Plot`, but this could be named, and refer to, anything: bottle,
sample, collar, etc. It's simply an identifier for _this_ measurement, i.e.
this row.
The `ffi_metadata_match` function matches up the data with metadata,
using the `TIMESTAMP` column that `ffi_read_LI7810` helpfully created
when it read the data file.
```{r matching, fig.width=7}
dat$metadat_row <- ffi_metadata_match(
data_timestamps = dat$TIMESTAMP,
start_dates = metadat$Date,
start_times = metadat$Start_time,
obs_lengths = metadat$Obs_length + 10) # 10 is expected dead band length
# Note that ffi_metadata_match() warns us that one metadata row didn't match any data
# Based on the row match information, add a "Plot" column to the data
dat$Plot <- metadat$Plot[dat$metadat_row]
metadat$metadat_row <- seq_len(nrow(metadat))
# ...and plot
p <- ggplot(dat, aes(TIMESTAMP, CO2, color = Plot)) + geom_point()
print(p)
```
Some of these are clearly not correct--the measurement time seems to be shorter
then 60 seconds for the C, D, and E plots:
```{r emphasize-problems, fig.width=7, echo=FALSE, message=FALSE}
library(lubridate)
p + annotate("rect", ymin = 455, ymax = 476,
xmin = ymd_hms("2022-10-27 10:39:30", tz = "EST"),
xmax = ymd_hms("2022-10-27 10:43:30", tz = "EST"),
color = "red", fill = NA, linewidth = 1.5)
```
In real life we'd want to correct the faulty metadata at its source.
Here, we'll just change the values programmatically and re-match:
```{r matching2, fig.width=7}
metadat$Obs_length[3:5] <- c(30, 45, 45)
dat$metadat_row <- ffi_metadata_match(
data_timestamps = dat$TIMESTAMP,
start_dates = metadat$Date,
start_times = metadat$Start_time,
obs_lengths = metadat$Obs_length + 10)
dat$Plot <- metadat$Plot[dat$metadat_row]
p %+% dat
```
That looks better!
## Unit conversion
We'd like our final units to be in µmol/m2/s, and so need to do some unit
conversion. (This can happen either before or after flux computation, below.)
The package provides `ffi_ppm_to_umol()` and `ffi_ppb_to_nmol()` functions
that perform this conversion using the
[Ideal Gas Law](https://en.wikipedia.org/wiki/Ideal_gas_law).
```{r units}
dat$CO2_umol <- ffi_ppm_to_umol(dat$CO2,
volume = 0.1, # m3
temp = 24) # degrees C
# See the message: because we didn't provide the 'atm' parameter,
# ffi_ppm_to_umol assumed standard pressure.
# Also normalize by ground area (0.16 m2 in this example)
dat$CO2_umol_m2 <- dat$CO2_umol / 0.16
```
Note that in the example above we're using a **constant system volume and
measurement ground area**. If that's not the case,
there should be a column in the metadata providing the changing values
(e.g. giving volume in m3) for each measurement.
Then after calling `ffi_metadata_match()`, merge the data and metadata and
pass the appropriate column to `ffi_ppm_to_umol()`. Here's an example:
```{r}
# Let's say volume varies by measurement; this can happen if the chamber
# height changes depending on the ground vegetation in each plot
metadat$Volume <- c(0.1, 0.2, 0.1, 0.1, 0.3, 0.1, 0.1)
# Merge the data and metadata
dat_changing_vol <- merge(dat, metadat[c("Plot", "Volume")], by = "Plot", all.x = TRUE)
# Unit conversion as above, but using the changing volume information:
dat_changing_vol$CO2_umol <- ffi_ppm_to_umol(dat_changing_vol$CO2,
volume = dat_changing_vol$Volume,
temp = 24)
# We still have constant ground area in this example
dat_changing_vol$CO2_umol_m2 <- dat_changing_vol$CO2_umol / 0.16
# Relative to the previous constant-volume example, our area-normalized
# amounts (µmol) have now increased for plots B and E because
# of their larger volumes:
aggregate(CO2_umol_m2 ~ Plot, data = dat, FUN = mean)
aggregate(CO2_umol_m2 ~ Plot, data = dat_changing_vol, FUN = mean)
```
## Compute fluxes
The `ffi_compute_fluxes` function provides a general-purpose tool for
computing fluxes from concentration time series, as well as associated
QA/QC information. It returns statistics for four types of models:
_linear_, _robust linear_, _polynomial_, and _HM81_, an exponential model
derived from diffusion theory, following
[Hutchinson and Mosier (1981)](http://dx.doi.org/10.2136/sssaj1981.03615995004500020017x).
Model statistics include Akaike information criterion (`AIC`),
R squared (`r.squared`), standard error of the residuals (`sigma`),
and model p-value (`p.value`). For the robust linear regression only, a
logical value `converged` is included; see the documentation for `MASS::rlm()`.
Flux (slope) statistics estimate and std.error;
For the robust linear regression model only, a logical value converged.
```{r compute-fluxes}
fluxes <- ffi_compute_fluxes(dat,
group_column = "Plot",
time_column = "TIMESTAMP",
gas_column = "CO2_umol_m2",
dead_band = 10)
# By default, ffi_compute_fluxes returns a data.frame with one row per
# grouping variable value (i.e., per measurement). The first column is the
# group label; the second is the average value of the `time_column`;
# and the rest of the columns are fit statistics for a linear fit of
# concentration as a function of time, along with information about polynomial
# and robust-linear fits. See ?ffi_compute_fluxes for more details.
names(fluxes)
# For clarity, print out only a subset of the columns
fluxes[c("Plot", "TIMESTAMP", "lin_r.squared", "lin_flux.estimate", "HM81_flux.estimate")]
```
Note that the `fluxes` extract printed above has one row per `Plot`, the
grouping variable; the mean `TIMESTAMP` of the group; model statistics
such as `lin_r.squared`; and the flux estimate. The final
column, `HM81_flux.estimate` is only numeric (i.e., not `NA`) when the data
show evidence of a saturating curvature. So in this case we might want to
examine more carefully the data from plots A, C, and F.
Plotting our computed fluxes:
```{r plot-fluxes, fig.width=7}
ggplot(fluxes, aes(Plot, lin_flux.estimate, color = lin_r.squared)) +
geom_point() +
geom_linerange(aes(ymin = lin_flux.estimate - lin_flux.std.error,
ymax = lin_flux.estimate + lin_flux.std.error)) +
ylab("CO2 flux (µmol/m2/s)")
```
We might want to check whether the robust-linear slope diverges
from the linear fit slope, suggesting influential outliers, or whether the
polynomial R2 is much larger, potentially indicating curvature
of the observations due to e.g. diffusion limitations.
```{r, figures-side, fig.show="hold", out.width="46%"}
ggplot(fluxes, aes(lin_flux.estimate, rob_flux.estimate, color = Plot)) +
geom_point() + geom_abline() + theme(legend.position = "none")
ggplot(fluxes, aes(lin_r.squared, poly_r.squared, color = Plot)) +
geom_point() + geom_abline() + theme(legend.position="none")
```
The plot C (green) data have more scatter, and thus lower R2 values and
higher uncertainty on the computed flux,
but there's no strong evidence of nonlinearity or outlier problems (although
see note above about the `HM81_estimate` field).
## A clearly nonlinear case
In our experience, static-chamber fluxes are frequently linear
([Forbrich et al. 2010](https://doi.org/10.1016/j.soilbio.2009.12.004))
even though molecular diffusion means that chamber feedbacks will inevitably
lead to curvilinear (saturating) behavior over time
([Pedersen et al. 2010](https://doi.org/10.1111/j.1365-2389.2010.01291.x)).
Still, how does one use `fluxfinder` to diagnose and work with nonlinear data?
Let's use R's built-in `Puromycin` dataset (simply because it exhibits
saturating behavior; it's unrelated to greenhouse gases) as an example:
```{r, fig.width=7}
ggplot(Puromycin, aes(conc, rate)) + geom_point() + geom_smooth(method = "lm")
```
Visually, we can see that a linear model will not be appropriate in this
case.
```{r}
ffi_compute_fluxes(Puromycin,
group_column = NULL,
time_column = "conc",
gas_column = "rate")
```
```{r include=FALSE}
x <- ffi_compute_fluxes(Puromycin,
group_column = NULL,
time_column = "conc",
gas_column = "rate")
x <- round(x, 3)
```
From the diagnostics returned by `ffi_compute_fluxes`:
* `HM81_flux.estimate` is not `NA`, which only occurs with saturating behavior;
* The `lin_AIC` (`r x$lin_AIC`) and `rob_AIC` (`r x$rob_AIC`) [Akaike information criterion](https://en.wikipedia.org/wiki/Akaike_information_criterion) values are similar, so no indication of influential outliers;
* The `lin_r.squared` (`r x$lin_r.squared`) and `poly_r.squared` (`r x$poly_r.squared`) values are _very_ different, suggesting a failure of the linear model;
* The [root mean square error](https://en.wikipedia.org/wiki/Root_mean_square_deviation) (RMSE) of the linear model is much higher than the other models' values;
* The `HM81_r.squared` (`r x$HM81_r.squared`) and `HM81_AIC` (`r x$HM81_AIC`) are considerably higher and lower, respectively, than the linear model.
All of these metrics point to a common conclusion: a linear model is _not_
appropriate for these data. If these were real data, we should use the
`HM81_flux.estimate` value as our flux estimate.
## Conclusion
This vignette covered `fluxfinder` basics: loading data and metadata,
matching the two, performing unit conversion, computing fluxes, and some
basic QA/QC. The test data we worked above could be fit well by linear model,
but for many reasons this might not always be true; see the vignette on
[integrating with the gasfluxes package](integrating-gasfluxes.html) for
guidance on using more sophisticated model-fitting routines.