Introduction to nmrrr

This vignette describes the general workflow for processing NMR data using the {nmrrr} package.

This package can be used for batch processing and analysis of NMR (nuclear magnetic resonance) data, including combining and cleaning spectral data, assigning compound classes to the peaks, and calculating relative contributions of the compound classes.

This package does not preprocess raw spectral data (e.g., phase correction, baseline correction, or peak picking). These steps must be completed before using {nmrrr}, with the appropriate software (e.g., MNova, TopSpin).

For tips on processing NMR data in MNova/MestreNova, check out the repository wiki.

Currently, this package can handle data generated by MNova and TopSpin software. Because the file formats differ, users must specify the source through the method argument when importing spectra (e.g., method = "mnova" in nmr_import_spectra()).


Example 1


A note on the data used here

This example uses data from the kfp_hysteresis dataset included with the {nmrrr} package. This is a subset of the data reported in Patel et al. 2021, representing samples subjected to drought and flood treatments. 1H solution-state NMR was performed on extracts reconstituted in DMSO-d6. The raw spectra were processed and cleaned using MNova, and the spectra and peaks were exported as .csv files. We use the bin set from Clemente et al. 2012 for compound classification.

This dataset contains (a) spectra data and (b) peaks data (peaks picked in MNova).


Part 0: Setup

library(nmrrr)

library(ggplot2)
theme_set(theme_bw()) # set the default ggplot theme

Set input directories


SPECTRA_FILES <- system.file("extdata", "kfp_hysteresis", "spectra_mnova", package = "nmrrr")
PEAKS_FILES <- system.file("extdata", "kfp_hysteresis", "peaks_mnova_multiple", package = "nmrrr")
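Because system.file() returns an empty string when a path is not found, it can be worth checking the input directories before importing. The sketch below uses only base R; the helper check_input_dir() and the temporary demo folder are hypothetical and are not part of {nmrrr}. It also assumes the exported files end in .csv, as they do in this example.

```r
# Hypothetical helper (not part of nmrrr): confirm that a directory exists
# and contains at least one file matching the expected pattern
check_input_dir <- function(path, pattern = "\\.csv$") {
  if (!nzchar(path) || !dir.exists(path)) {
    stop("Input directory not found: ", path)
  }
  files <- list.files(path, pattern = pattern, full.names = TRUE)
  if (length(files) == 0) {
    stop("No matching files in: ", path)
  }
  files
}

# Demonstration on a temporary folder holding one small made-up file
demo_dir <- file.path(tempdir(), "spectra_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(ppm = 1:3, intensity = c(0.1, 0.2, 0.1)),
          file.path(demo_dir, "sample1.csv"), row.names = FALSE)

basename(check_input_dir(demo_dir))
```

The same helper could be called on SPECTRA_FILES and PEAKS_FILES once the package is installed.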

Part 1: Importing spectra files: nmr_import_spectra()

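This function imports all the spectra files in the specified directory and combines them into a single long-form dataframe, with columns for ppm, intensity, and sampleID.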

spectra_df <- nmr_import_spectra(path = SPECTRA_FILES,
                                 method = "mnova") 

str(spectra_df)
#> tibble [65,589 × 3] (S3: tbl_df/tbl/data.frame)
#>  $ ppm      : num [1:65589] 0.00392 0.00422 0.00453 0.00483 0.00514 ...
#>  $ intensity: num [1:65589] 0.00202 0.00202 0.00203 0.00203 0.00204 ...
#>  $ sampleID : chr [1:65589] "29" "29" "29" "29" ...

The user may clean the dataframe further, according to specific needs, for instance by keeping only certain ranges of ppm shift. For this dataset, we include only points between 0 and 10 ppm.

spectra_df <- subset(spectra_df, ppm >= 0 & ppm <= 10)

str(spectra_df)
#> tibble [65,440 × 3] (S3: tbl_df/tbl/data.frame)
#>  $ ppm      : num [1:65440] 0.00392 0.00422 0.00453 0.00483 0.00514 ...
#>  $ intensity: num [1:65440] 0.00202 0.00202 0.00203 0.00203 0.00204 ...
#>  $ sampleID : chr [1:65440] "29" "29" "29" "29" ...

Part 2: Plotting the spectra: nmr_plot_spectra()

This function will plot all the spectra present in the spectra_df dataframe. The spectra will be stacked and offset vertically (this can be customized).

nmr_plot_spectra(dat = spectra_df,
                 binset = bins_Clemente2012,
                 label_position = 5,
                 mapping = aes(x = ppm, 
                               y = intensity, 
                               group = sampleID, 
                               color = sampleID),
                 stagger = 0.5) +
  # OPTIONAL PARAMETERS/LAYERS
  geom_rect(aes(xmin = 2, xmax = 4, ymin = 0, ymax = 5.5), 
            fill = "white", color = NA, alpha = 0.8)+
  labs(subtitle = "binset: Clemente et al. 2012")+
  ylim(0, 5.5)



Part 3: Assigning compound classes: nmr_assign_bins()

This function will assign bins/compound classes to the peaks based on the preferred bin set.

This package provides bin sets for DMSO-d6, D2O, and MeOD solvents. Users can choose from the available options, or can import their own preferred bin set. See vignette("nmrrr_binsets") for more details.

spectra_bins <- nmr_assign_bins(dat = spectra_df,
                                binset = bins_Clemente2012)
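To make the idea of bin assignment concrete, here is a minimal base R sketch of interval matching. It is not the {nmrrr} implementation: the helper assign_bins_sketch() and the toy data are hypothetical, and the bin limits are copied from the bins_Clemente2012 table printed at the end of this vignette. Treating start as inclusive and stop as exclusive is an assumption made here; the package may handle bin boundaries differently.

```r
# Toy binset with the same column names as bins_Clemente2012 (three bins only)
toy_bins <- data.frame(
  group = c("aliphatic1", "aliphatic2", "oalkyl"),
  start = c(0.3, 1.3, 2.9),
  stop  = c(1.3, 2.2, 4.1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the bin name for each ppm value, or NA when the
# value falls outside every bin (assumes start <= ppm < stop)
assign_bins_sketch <- function(ppm, bins) {
  vapply(ppm, function(p) {
    hit <- which(p >= bins$start & p < bins$stop)
    if (length(hit) == 0) NA_character_ else bins$group[hit[1]]
  }, character(1))
}

toy_spectra <- data.frame(ppm = c(0.1, 0.8, 1.5, 2.5, 3.3))
toy_spectra$group <- assign_bins_sketch(toy_spectra$ppm, toy_bins)
toy_spectra
```

Points that fall in the gaps between bins (here 0.1 and 2.5 ppm) receive NA in this sketch, which is one reason a bin set should be chosen to match the solvent and the regions of interest.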

Note: The user may want to apply additional filtering steps to remove certain flagged data points, e.g. impurities or weak peaks.

For this current dataset, because of the strong influence of water peaks in the o-alkyl region, we exclude that region from our calculations.

spectra_bins <- subset(spectra_bins, group != "oalkyl")

Part 4: Calculating relative abundance of compound classes: nmr_relabund()

Method 1: Integrating area under the curve from processed spectra files

relabund_integration <- nmr_relabund(dat = spectra_bins,
                                     method = "AUC")
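As an illustration of what an area-under-the-curve calculation involves, the base R sketch below integrates a toy spectrum within each bin and converts the areas to relative contributions. This is a sketch under stated assumptions, not the code behind nmr_relabund(): the trapezoidal rule, the percent scale, the helper trapz(), and the toy values are all choices made here for illustration.

```r
# Toy binned spectrum for one sample: ppm, intensity, and assigned group
toy <- data.frame(
  sampleID  = "s1",
  ppm       = c(0.5, 0.7, 0.9, 1.5, 1.7, 1.9),
  intensity = c(1, 2, 1, 3, 3, 3),
  group     = rep(c("aliphatic1", "aliphatic2"), each = 3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: trapezoidal area under intensity vs ppm
trapz <- function(x, y) {
  o <- order(x)
  x <- x[o]
  y <- y[o]
  sum(diff(x) * (y[-length(y)] + y[-1]) / 2)
}

# Area per sample and bin
pieces <- split(toy, list(toy$sampleID, toy$group), drop = TRUE)
auc <- do.call(rbind, lapply(pieces, function(d) {
  data.frame(sampleID = d$sampleID[1],
             group = d$group[1],
             area = trapz(d$ppm, d$intensity),
             stringsAsFactors = FALSE)
}))

# Convert areas to a percentage of each sample's total area
auc$relabund <- 100 * auc$area / ave(auc$area, auc$sampleID, FUN = sum)
auc
```

In this toy case the second bin has twice the area of the first, so it accounts for two thirds of the total.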

Method 2: Calculating from peaks data

This method is specific to MNova-processed data. Users may pick peaks within MNova and export these as a table. In this case, the peak areas within each bin are summed, and each bin's relative contribution is calculated as its share of the total area.

The peaks data can be exported in one of two ways, giving two different formats of data files (“single columns” and “multiple columns”); this package can handle both versions. More details can be found in the repository wiki.

For both types, however, we first need to import and combine the files, then assign bin classes, and then add the areas.

(a) import the peaks

peaks_df <- nmr_import_peaks(path = PEAKS_FILES, 
                             method = "multiple columns")

str(peaks_df)
#> tibble [207 × 10] (S3: tbl_df/tbl/data.frame)
#>  $ Obs              : int [1:207] 1 2 3 4 5 6 7 8 9 10 ...
#>  $ ppm              : num [1:207] 15.37 14.27 14.09 7.08 7.07 ...
#>  $ Intensity        : num [1:207] 0 0 0 0.3 0 0.1 0.5 0 0 0 ...
#>  $ Width            : num [1:207] 0.63 0.61 0.71 43.83 15.82 ...
#>  $ Area             : num [1:207] 0.08 0.07 0.07 179.46 3.01 ...
#>  $ Type             : chr [1:207] "Artifact" "Artifact" "Artifact" "Compound" ...
#>  $ Flags            : chr [1:207] "Weak" "Weak" "Weak" "None" ...
#>  $ Impurity/Compound: chr [1:207] NA NA NA NA ...
#>  $ Annotation       : chr [1:207] "" "" "" "" ...
#>  $ sampleID         : chr [1:207] "29" "29" "29" "29" ...

The columns we care about the most are ppm and Area. There are additional columns that provide flags for the peaks identified (e.g. Type == "Artifact"/"Compound"/"Solvent", Flags == "Weak"/"None", etc.). These can be filtered by the user as needed.

peaks_df <- subset(peaks_df, Type == "Compound")

(b) assign compound classes/bins to each peak

peaks_bins <- nmr_assign_bins(dat = peaks_df,
                              binset = bins_Clemente2012)

For this current dataset, because of the strong influence of water peaks in the o-alkyl region, we exclude that region from our calculations.

peaks_bins <- subset(peaks_bins, group != "oalkyl")

(c) calculate relative abundance

relabund_peaks <- nmr_relabund(dat = peaks_bins,
                               method = "peaks")
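The arithmetic described for the peaks method, summing peak areas within each bin and dividing by the sample total, can be sketched in base R as follows. The toy peak table is made up, and the percent scale is an assumption; this is an illustration of the calculation rather than the code behind nmr_relabund().

```r
# Toy peak table: one row per picked peak, already assigned to a bin
toy_peaks <- data.frame(
  sampleID = c("s1", "s1", "s1", "s2", "s2"),
  group    = c("aliphatic1", "aliphatic1", "aromatic", "aliphatic1", "aromatic"),
  Area     = c(10, 30, 60, 25, 75),
  stringsAsFactors = FALSE
)

# Sum peak areas within each sample and bin
area_by_bin <- aggregate(Area ~ sampleID + group, data = toy_peaks, FUN = sum)

# Express each bin as a percentage of its sample's total area
area_by_bin$relabund <- 100 * area_by_bin$Area /
  ave(area_by_bin$Area, area_by_bin$sampleID, FUN = sum)

area_by_bin[order(area_by_bin$sampleID, area_by_bin$group), ]
```

For sample s1, the two aliphatic1 peaks sum to 40 of a total area of 100, so that bin contributes 40%.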

Part 5: Visualizing the processed relative abundance data

Users may then plot the relative abundance data using stacked bar plots, for example:


ggplot(relabund_integration,
       aes(x = sampleID, y = relabund, fill = group))+
  geom_bar(stat = "identity")+
  labs(title = "Relative abundance by AUC",
       subtitle = "binset: Clemente et al. 2012")


Example 2


A note on the data used here

This example uses data from the amp_burnseverity dataset included with the {nmrrr} package. This is a subset of the data available in Greiger et al. 2022, representing vegetation samples that were experimentally burnt in an open air burn table. Solid-state cross-polarization (CP) 13C NMR was performed on these samples. The raw spectra were processed and cleaned by scaling to mass using SIMPSON, and the spectra were batch-exported as a single .csv file. We use the SS bin set from Clemente et al. 2012 for compound classification.

This dataset contains one .csv file with all the samples. This file cannot be processed with {nmrrr} in its current form, but we can import it and convert it to long form, after which it is compatible with the {nmrrr} functions.


Here, we provide the workflow to demonstrate how to use additional formats with the {nmrrr} package. The first step is to bring the data into a format that is compatible with {nmrrr} functions, i.e., long-form data, with one column each for ppm, intensity, and sampleID.

This workflow makes use of {tidyverse} functions, but users may use other preferred packages and functions to get the same results.

library(tidyverse)

SS_FILE <- system.file("extdata", "amp_burnseverity", "spectra_wide.csv", package = "nmrrr")
ss_data <- read.csv(SS_FILE)

## Make long form and do additional cleaning if needed.
ss_data_long <-
  ss_data %>%
  pivot_longer(-ppm,
               names_to = "sampleID",
               values_to = "intensity") %>% 
  arrange(sampleID, ppm)

ss_data_long <- subset(ss_data_long, ppm >= 0 & ppm <= 250)
ss_data_long <- subset(ss_data_long, intensity >= 0)

## Assign bins
data_long_bins <- nmr_assign_bins(dat = ss_data_long,
                                  binset = bins_ss_Clemente2012)

## Plot spectra
nmr_plot_spectra(dat = data_long_bins,
                 binset = bins_ss_Clemente2012,
                 mapping = aes(x = ppm, y = intensity,
                               group = sampleID,
                               color = sampleID),
                 stagger = 15,
                 label_position = 70)+
  theme(axis.text.y = element_blank())+
  xlim(210, 0)

## Calculate relative abundance
data_relabund <- nmr_relabund(dat = data_long_bins,
                              method = "AUC")

ggplot(data = data_relabund,
       aes(x =  sampleID,
           y = relabund,
           fill = group))+
  geom_bar(stat = "identity")
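For users who prefer not to load {tidyverse}, the wide-to-long reshaping step in this example can also be done in base R. The sketch below uses a small made-up wide table (the column names sampleA and sampleB are hypothetical) and produces the same three columns that the {nmrrr} functions expect: ppm, intensity, and sampleID.

```r
# Toy wide-format spectra: one ppm column plus one intensity column per sample
toy_wide <- data.frame(
  ppm     = c(10, 20, 30),
  sampleA = c(0.1, 0.5, 0.2),
  sampleB = c(0.3, 0.4, 0.6)
)

sample_cols <- setdiff(names(toy_wide), "ppm")

# stack() concatenates the sample columns: 'values' holds the intensities
# and 'ind' holds the name of the column each value came from
stacked <- stack(toy_wide[sample_cols])

toy_long <- data.frame(
  ppm       = rep(toy_wide$ppm, times = length(sample_cols)),
  intensity = stacked$values,
  sampleID  = as.character(stacked$ind),
  stringsAsFactors = FALSE
)

# Sort by sample and ppm, mirroring arrange(sampleID, ppm)
toy_long <- toy_long[order(toy_long$sampleID, toy_long$ppm), ]
toy_long
```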

Importing your own preferred bin sets

Users may import their own binsets, if they do not wish to use the binsets provided with the {nmrrr} package. Binsets are simply dataframes, and therefore can be imported from any .csv, .txt, Excel file, or similar.

The binset dataframe must have columns:

  1. number - Serial number of the group
  2. group - Shortened name of the group. This column is used to label the groups, and will be seen in legends, tables, etc.
  3. start - Lower limit (ppm shift) of the bin
  4. stop - Upper limit (ppm shift) of the bin
  5. description - Optional column, with full-length description of the group

Below is an example of the binset format required.


bins_Clemente2012
#> # A tibble: 6 × 5
#>   number group      start  stop description                                
#>    <int> <chr>      <dbl> <dbl> <chr>                                      
#> 1      1 aliphatic1   0.3   1.3 aliphatic methyl and methylene             
#> 2      2 aliphatic2   1.3   2.2 aliphatic methyl and methylene near O and N
#> 3      3 oalkyl       2.9   4.1 O-alkyl, mainly from carbs and lignin      
#> 4      4 alphah       4.1   4.8 alpha-H from proteins                      
#> 5      5 aromatic     6.2   7.8 aromatic, from lignin and proteins         
#> 6      6 amide        7.8   8.4 amide from proteins
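
A custom binset with these columns can be built directly as a data frame or read from a file. The sketch below uses base R only; the name my_bins, the ppm limits, and the descriptions are purely illustrative and do not come from any published bin set. The sanity checks (start below stop, unique group names, no overlapping bins) are suggestions made here, not requirements documented by {nmrrr}.

```r
# Hypothetical custom binset; the ppm limits are illustrative only
my_bins <- data.frame(
  number = 1:3,
  group = c("aliphatic", "oalkyl", "aromatic"),
  start = c(0.5, 3.0, 6.5),
  stop = c(2.5, 4.5, 8.5),
  description = c("aliphatic protons", "O-alkyl protons", "aromatic protons"),
  stringsAsFactors = FALSE
)

# Basic sanity checks before using the binset
required <- c("number", "group", "start", "stop")
stopifnot(
  all(required %in% names(my_bins)),
  all(my_bins$start < my_bins$stop),
  !anyDuplicated(my_bins$group)
)

# Bins should not overlap: each bin starts at or after the previous stop
sorted <- my_bins[order(my_bins$start), ]
stopifnot(all(sorted$start[-1] >= sorted$stop[-nrow(sorted)]))

# Round trip through a .csv file, as one way of storing and importing a binset
csv_path <- tempfile(fileext = ".csv")
write.csv(my_bins, csv_path, row.names = FALSE)
my_bins_imported <- read.csv(csv_path, stringsAsFactors = FALSE)
my_bins_imported
```

The imported data frame could then be supplied through the binset argument of nmr_assign_bins() and nmr_plot_spectra(), in place of bins_Clemente2012.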