MetAlyzer User Guide

Nils Mechtel, Carolin Andresen & Luis Herfurth

2024-12-06

The package provides methods to read output files from the MetIDQ™ software into R. Metabolomics data is read and reformatted into an S4 object for convenient data handling, statistics and downstream analysis.

Install

There is a version available on CRAN.

install.packages("MetAlyzer")

The latest version can directly be installed from the github.

library(devtools)
install_github("nilsmechtel/MetAlyzer")

Overview

The package takes metabolomic measurements as “.xlsx” files generated from the MetIDQ™ software. Additionally, meta data for each sample can be provided for further analysis.

Effective quantification of metabolites requires a reliable signal. The limit of detection (LOD) is defined as 3 times signal-to-noise of the baseline, calculated by the software MetIDQ™ for each metabolite. Values are classified as “Valid”, “LOQ” (below limit of quantification) or “LOD” (below limit of detection). This information is encoded in the color in the “.xlsx” table. Further color codes can include “ISTD Out of Range”, “Invalid” and “Incomplete”. The MetAlyzer packages allow to read both information, the value and the quantification status, gives statistics about the quantification efficacy and allows filtering based on the LOD.

Example of excel sheet.
Example of excel sheet.

Get started

To present the base functionalities of MetAlyzer, we use a data set published by Gegner et al. 2022. This data set was analyzed using the MxP®Quant 500 kit and inlcudes 6 different extraction methods (1 to 6) to quantify 630 metabolites in triplicates for 4 tissue types (Drosophila, Mouse Liver, Mouse Kidney, Zebrafish Liver). It is included in the MetAlyzer package and can be accessed via example_extraction_data(). MetAlyzer_dataset reads the data from Excel into R and stores it as a SummarizedExperiment (SE).

fpath <- example_extraction_data()
metalyzer_se <- MetAlyzer_dataset(file_path = fpath)
#> 
#> 
#>  _____ ______   _______  _________  ________  ___           ___    ___ ________  _______   ________
#> |\   _ \  _   \|\  ___ \|\___   ___\\   __  \|\  \         |\  \  /  /|\_____  \|\  ___ \ |\   __  \
#> \ \  \\\__\ \  \ \   __/\|___ \  \_\ \  \|\  \ \  \        \ \  \/  / /\|___/  /\ \   __/|\ \  \|\  \
#>  \ \  \\|__| \  \ \  \_|/__  \ \  \ \ \   __  \ \  \        \ \    / /     /  / /\ \  \_|/_\ \   _  _\
#>   \ \  \    \ \  \ \  \_|\ \  \ \  \ \ \  \ \  \ \  \____    \/   / /     /  /_/__\ \  \_|\ \ \  \\  \| 
#>    \ \__\    \ \__\ \_______\  \ \__\ \ \__\ \__\ \_______\__/   / /     |\________\ \_______\ \__\\ _\ 
#>     \|__|     \|__|\|_______|   \|__|  \|__|\|__|\|_______|\____/ /       \|_______|\|_______|\|__|\|__|
#>                                                           \|____|/
#> 
#> 
#> Info: Reading color code "FFFFCCCC" as "#FFCCCC"
#> Info: Reading color code "FF00CD66" as "#00CD66"
#> Info: Reading color code "FF6A5ACD" as "#6A5ACD"
#> Info: Reading color code "FF87CEEB" as "#87CEEB"
#> Info: Reading color code "FFFFFFCC" as "#FFFFCC"
#> 
#> Measured concentration values:
#> ------------------------------
#>         0%        25%        50%        75%       100% 
#>      0.000      0.017      1.760     21.200 288149.000 
#> 
#> NAs: 5348 (8.38%)
#> Note: 'Metabolism Indicators' are frequently NA!
#> 
#> Measured quantification status:
#> -------------------------------
#> Valid: 24095 (37.77%)
#> LOQ: 5799 (9.09%)
#> LOD: 21789 (34.16%)
#> Invalid: 12105 (18.98%)
#> NAs: 0 (0%)

metalyzer_se
#> class: SummarizedExperiment 
#> dim: 862 74 
#> metadata(4): file_path sheet_index status_list aggregated_data
#> assays(2): conc_values quant_status
#> rownames(862): C0 C2 ... Sum of TGs Sum of Unsaturated TGs
#> rowData names(1): metabolic_classes
#> colnames(74): 9 10 ... 89 90
#> colData names(7): Plate Bar Code Sample Bar Code ... Sample Volume
#>   Measurement Time

The SE includes assay information with concentration values and quantification status, while storing metabolites and their respective classes in the row data. It is important to note that there are 630 metabolites, and 232 metabolite indicators calculated using the MetIDQ™ software. Sample-wise metadata is stored in the column data. Additional data such as the file path (‘file_path’), the Excel sheet number (‘sheet_index’), a list of colors for each quantification status (status_list), as well as a tibble of aggregated concentration data and quantification status for downstream analysis and visualizations (‘aggregated_data’) are stored in the SE’s metadata.

Summaries of concentration values and quantification status are automatically called during data loading, but can also manually be called using summarizeConcValues(metalyzer_se) and summarizeQuantData(metalyzer_se). The function summarizeConcValues prints the quantiles and the amount / percentage of NAs in the concentration data. The function summarizeQuantData gives an overview of the quantification status of the metabolites. E.g 34.16% of metabolites where below the Limit of Detection.

The various elements within the SummarizedExperiment (SE) can be accessed in a manner consistent with a typical SE.

meta_data <- colData(metalyzer_se)
head(meta_data)
#> DataFrame with 6 rows and 7 columns
#>            Plate Bar Code Sample Bar Code Sample Type Sample Description
#>               <character>     <character> <character>        <character>
#> 9  1030423581-1 | 10304..      1030420836      Sample                  1
#> 10 1030423581-1 | 10304..      1030420841      Sample                  1
#> 11 1030423581-1 | 10304..      1030420855      Sample                  1
#> 12 1030423581-1 | 10304..      1030423460      Sample                  2
#> 13 1030423581-1 | 10304..      1030420860      Sample                  2
#> 14 1030423581-1 | 10304..      1030420874      Sample                  2
#>         Tissue Sample Volume       Measurement Time
#>    <character>   <character>            <character>
#> 9   Drosophila            10 2020.10.05 17:52:54 ..
#> 10  Drosophila            10 2020.10.05 17:59:08 ..
#> 11  Drosophila            10 2020.10.05 18:05:19 ..
#> 12  Drosophila            10 2020.10.05 18:11:33 ..
#> 13  Drosophila            10 2020.10.05 18:17:47 ..
#> 14  Drosophila            10 2020.10.05 18:23:58 ..
metabolites <- rowData(metalyzer_se)
head(metabolites)
#> DataFrame with 6 rows and 1 column
#>               metabolic_classes
#>                     <character>
#> C0               Acylcarnitines
#> C2               Acylcarnitines
#> C3               Acylcarnitines
#> C3-DC (C4-OH)    Acylcarnitines
#> C3-OH            Acylcarnitines
#> C3:1             Acylcarnitines
concentration_values <- assays(metalyzer_se)$conc_values
head(concentration_values, c(5, 5))
#>                     9     10      11      12      13
#> C0            203.000 86.800 246.000 198.000 369.000
#> C2             29.500 15.800  34.600  39.200  42.200
#> C3             39.200  9.290  49.900  20.100  50.000
#> C3-DC (C4-OH)   0.057  0.055   0.076   0.679   0.071
#> C3-OH           0.106  0.132   0.116   0.145   0.170
quantification_status <- assays(metalyzer_se)$quant_status
head(quantification_status, c(5, 5))
#>               9       10      11      12      13     
#> C0            "Valid" "Valid" "Valid" "Valid" "Valid"
#> C2            "Valid" "Valid" "Valid" "Valid" "Valid"
#> C3            "Valid" "Valid" "Valid" "Valid" "Valid"
#> C3-DC (C4-OH) "LOD"   "LOD"   "LOD"   "Valid" "LOD"  
#> C3-OH         "LOD"   "LOD"   "LOD"   "LOD"   "LOD"

In addition to accessing the aggregated data through metadata(metalyzer_se)$aggregated_data, it can be directly accessed using aggregatedData.

aggregated_data <- aggregatedData(metalyzer_se)
head(aggregated_data)
#> # A tibble: 6 × 5
#> # Groups:   Metabolite [1]
#>   ID    Metabolite Class          Concentration Status
#>   <fct> <fct>      <fct>                  <dbl> <fct> 
#> 1 9     C0         Acylcarnitines         203   Valid 
#> 2 10    C0         Acylcarnitines          86.8 Valid 
#> 3 11    C0         Acylcarnitines         246   Valid 
#> 4 12    C0         Acylcarnitines         198   Valid 
#> 5 13    C0         Acylcarnitines         369   Valid 
#> 6 14    C0         Acylcarnitines         127   Valid

Manage metabolites and sample-wise meta data

The functions filterMetabolites and filterMetaData can be used to subset the data set based on certain metabolites or meta variables. Both functions subset conc_values, quant_status and aggregated_data.

Individual metabolites as well as metabolic classes can be passed to filterMetabolites. Here, we want to exclude the metabolism indicators.

metalyzer_se <- filterMetabolites(metalyzer_se, drop_metabolites = "Metabolism Indicators")
#> Info: 232 metabolites were removed!
metalyzer_se
#> class: SummarizedExperiment 
#> dim: 630 74 
#> metadata(4): file_path sheet_index status_list aggregated_data
#> assays(2): conc_values quant_status
#> rownames(630): C0 C2 ... TG(22:6_34:3) Choline
#> rowData names(1): metabolic_classes
#> colnames(74): 9 10 ... 89 90
#> colData names(7): Plate Bar Code Sample Bar Code ... Sample Volume
#>   Measurement Time

The extraction methods are encoded in the column “Sample Description” of the meta data. There are 2 blank samples in the data that we want to remove and only keep extraction method 1 to 6.

metalyzer_se <- filterMetaData(metalyzer_se, `Sample Description` %in% 1:6)
#> Info: 2 samples were removed!

To be more specific and for easier handling, we change the column “Sample Description” of the meta data to “Extraction_Method”.

metalyzer_se <- renameMetaData(metalyzer_se, "Extraction_Method" = "Sample Description")

meta_data <- colData(metalyzer_se)
head(meta_data)
#> DataFrame with 6 rows and 7 columns
#>            Plate Bar Code Sample Bar Code Sample Type Extraction_Method
#>               <character>     <character> <character>       <character>
#> 9  1030423581-1 | 10304..      1030420836      Sample                 1
#> 10 1030423581-1 | 10304..      1030420841      Sample                 1
#> 11 1030423581-1 | 10304..      1030420855      Sample                 1
#> 12 1030423581-1 | 10304..      1030423460      Sample                 2
#> 13 1030423581-1 | 10304..      1030420860      Sample                 2
#> 14 1030423581-1 | 10304..      1030420874      Sample                 2
#>         Tissue Sample Volume       Measurement Time
#>    <character>   <character>            <character>
#> 9   Drosophila            10 2020.10.05 17:52:54 ..
#> 10  Drosophila            10 2020.10.05 17:59:08 ..
#> 11  Drosophila            10 2020.10.05 18:05:19 ..
#> 12  Drosophila            10 2020.10.05 18:11:33 ..
#> 13  Drosophila            10 2020.10.05 18:17:47 ..
#> 14  Drosophila            10 2020.10.05 18:23:58 ..

Additional meta data can be easily added to the Summarized Experiment. Here, we want to add the column “Date” with the current date to the meta data as well as the column “Replicate” with extraction method replicates.

replicate_meta_data <- example_meta_data()
head(replicate_meta_data)
#>   Method Replicate
#> 1      1        R1
#> 2      1        R2
#> 3      1        R3
#> 4      2        R1
#> 5      2        R2
#> 6      2        R3
metalyzer_se <- updateMetaData(
  metalyzer_se,
  Date = Sys.Date(),
  Replicate = replicate_meta_data$Replicate
)

meta_data <- colData(metalyzer_se)
head(meta_data)
#> DataFrame with 6 rows and 9 columns
#>            Plate Bar Code Sample Bar Code Sample Type Extraction_Method
#>               <character>     <character> <character>       <character>
#> 9  1030423581-1 | 10304..      1030420836      Sample                 1
#> 10 1030423581-1 | 10304..      1030420841      Sample                 1
#> 11 1030423581-1 | 10304..      1030420855      Sample                 1
#> 12 1030423581-1 | 10304..      1030423460      Sample                 2
#> 13 1030423581-1 | 10304..      1030420860      Sample                 2
#> 14 1030423581-1 | 10304..      1030420874      Sample                 2
#>         Tissue Sample Volume       Measurement Time       Date Replicate
#>    <character>   <character>            <character>   <factor>  <factor>
#> 9   Drosophila            10 2020.10.05 17:52:54 .. 2024-12-06        R1
#> 10  Drosophila            10 2020.10.05 17:59:08 .. 2024-12-06        R2
#> 11  Drosophila            10 2020.10.05 18:05:19 .. 2024-12-06        R3
#> 12  Drosophila            10 2020.10.05 18:11:33 .. 2024-12-06        R1
#> 13  Drosophila            10 2020.10.05 18:17:47 .. 2024-12-06        R2
#> 14  Drosophila            10 2020.10.05 18:23:58 .. 2024-12-06        R3

Analysis of extraction methods

MetAlyzer includes functions to perform data analysis as it was performed for MetaboExtract (Andresen et al. 2022). All analysing functions in MetAlyzer use the aggregated concentration data and quantification status (‘aggregated_data’). The calculate_cv calculates the mean, standard deviation (SD) and the coefficient of variation (CV) within each group and saves it to ‘aggregated_data’. Each group is additionally categorized into given thresholds of variation based on the calculated CV.

metalyzer_se <- calculate_cv(
  metalyzer_se,
  groups = c("Tissue", "Extraction_Method", "Metabolite"),
  cv_thresholds = c(0.1, 0.2, 0.3),
  na.rm = TRUE
)
#> Info: Calculating mean and coefficient of variation (groupwise: Tissue * Extraction_Method * Metabolite)...  finished!

aggregated_data <- aggregatedData(metalyzer_se) %>%
  select(c(Extraction_Method, Metabolite, Mean, SD, CV, CV_thresh))
#> Adding missing grouping variables: `Tissue`
head(aggregated_data)
#> # A tibble: 6 × 7
#> # Groups:   Tissue, Extraction_Method, Metabolite [2]
#>   Tissue     Extraction_Method Metabolite  Mean    SD    CV CV_thresh
#>   <fct>      <fct>             <fct>      <dbl> <dbl> <dbl> <fct>    
#> 1 Drosophila 1                 C0          179.  82.4 0.461 more30   
#> 2 Drosophila 1                 C0          179.  82.4 0.461 more30   
#> 3 Drosophila 1                 C0          179.  82.4 0.461 more30   
#> 4 Drosophila 2                 C0          231. 124.  0.538 more30   
#> 5 Drosophila 2                 C0          231. 124.  0.538 more30   
#> 6 Drosophila 2                 C0          231. 124.  0.538 more30

As described in Gegner et al., an ANOVA is calculated to identify extraction methods with superior and inferior extraction performance. NAs or zero values are first imputed, to avoid NAs during log2 transformation. To impute NAs, the impute_NA argument can be set to TRUE. The argument impute_perc_of_min specifies the percentage of the minimum concentration value that is used for imputation.


metalyzer_se <- calculate_anova(
  metalyzer_se,
  categorical = "Extraction_Method",
  groups = c("Tissue", "Metabolite"),
  impute_perc_of_min = 0.2,
  impute_NA = TRUE
)
#> Info: Imputing concentrations (groupwise: Metabolite) with 20% of the minimal positive value...  finished!
#> Info: Calculating ANOVA (groupwise: Tissue * Metabolite)...  finished!

aggregated_data <- aggregatedData(metalyzer_se) %>%
  select(c(Extraction_Method, Metabolite, imputed_Conc, log2_Conc, ANOVA_n, ANOVA_Group))
#> Adding missing grouping variables: `Tissue`
head(aggregated_data)
#> # A tibble: 6 × 7
#> # Groups:   Tissue, Metabolite, Extraction_Method [2]
#>   Tissue Extraction_Method Metabolite imputed_Conc log2_Conc ANOVA_n ANOVA_Group
#>   <fct>  <fct>             <fct>             <dbl>     <dbl>   <int> <chr>      
#> 1 Droso… 1                 C0                203        7.67       3 A          
#> 2 Droso… 1                 C0                 86.8      6.44       3 A          
#> 3 Droso… 1                 C0                246        7.94       3 A          
#> 4 Droso… 2                 C0                198        7.63       3 A          
#> 5 Droso… 2                 C0                369        8.53       3 A          
#> 6 Droso… 2                 C0                127        6.99       3 A

The result of the imputation is stored in an extra column “imputed_Conc”. In total 12,212 zero values were imputed in this example.

cat("Number of zero values before imputation:",
    sum(aggregatedData(metalyzer_se)$Concentration == 0, na.rm = TRUE), "\n")
#> Number of zero values before imputation: 12212

cat("Number of zero values after imputation:",
    sum(aggregatedData(metalyzer_se)$imputed_Conc == 0, na.rm = TRUE), "\n")
#> Number of zero values after imputation: 0

Analysis and visualization of treatment data

MetAlyzer also includes functions to calculate and visualize the log2 fold change of data with paired effects. For this, we load a dataset with control and mutated samples analyzed using the MxP®Quant 500 XL kit. The XL kit covers 332 additional metabolites and 6 additional metabolic classes compared to the MxP®Quant 500 kit.

fpath <- example_mutation_data_xl()
metalyzer_se <- MetAlyzer_dataset(file_path = fpath)
#> 
#> 
#>  _____ ______   _______  _________  ________  ___           ___    ___ ________  _______   ________
#> |\   _ \  _   \|\  ___ \|\___   ___\\   __  \|\  \         |\  \  /  /|\_____  \|\  ___ \ |\   __  \
#> \ \  \\\__\ \  \ \   __/\|___ \  \_\ \  \|\  \ \  \        \ \  \/  / /\|___/  /\ \   __/|\ \  \|\  \
#>  \ \  \\|__| \  \ \  \_|/__  \ \  \ \ \   __  \ \  \        \ \    / /     /  / /\ \  \_|/_\ \   _  _\
#>   \ \  \    \ \  \ \  \_|\ \  \ \  \ \ \  \ \  \ \  \____    \/   / /     /  /_/__\ \  \_|\ \ \  \\  \| 
#>    \ \__\    \ \__\ \_______\  \ \__\ \ \__\ \__\ \_______\__/   / /     |\________\ \_______\ \__\\ _\ 
#>     \|__|     \|__|\|_______|   \|__|  \|__|\|__|\|_______|\____/ /       \|_______|\|_______|\|__|\|__|
#>                                                           \|____|/
#> 
#> 
#> Info: Reading color code "FFCBD2D7" as "#CBD2D7"
#> Info: Reading color code "FF7FB2C5" as "#7FB2C5"
#> Info: Reading color code "FFCBD2D7" as "#CBD2D7"
#> Info: Reading color code "FFB9DE83" as "#B9DE83"
#> Info: Reading color code "FFB9DE83" as "#B9DE83"
#> Info: Reading color code "FFA28BA3" as "#A28BA3"
#> Info: Reading color code "FFA28BA3" as "#A28BA3"
#> Info: Reading color code "FFB2D1DC" as "#B2D1DC"
#> Info: Reading color code "FFB2D1DC" as "#B2D1DC"
#> Info: Reading color code "FF7FB2C5" as "#7FB2C5"
#> 
#> Measured concentration values:
#> ------------------------------
#>        0%       25%       50%       75%      100% 
#>      0.00      1.63      7.77     43.07 139200.40 
#> 
#> NAs: 155 (0.85%)
#> 
#> 
#> Measured quantification status:
#> -------------------------------
#> Valid: 14280 (77.85%)
#> LOQ: 819 (4.47%)
#> LOD: 3243 (17.68%)
#> NAs: 0 (0%)

metalyzer_se
#> class: SummarizedExperiment 
#> dim: 1019 18 
#> metadata(4): file_path sheet_index status_list aggregated_data
#> assays(2): conc_values quant_status
#> rownames(1019): C0 C2 ... TG 22:6_34:3 Choline
#> rowData names(1): metabolic_classes
#> colnames(18): 32 33 ... 48 49
#> colData names(11): Plate bar code Sample bar code ... Org info
#>   Measurement time

Again, we want to remove the metabolism indicators.

metalyzer_se <- filterMetabolites(metalyzer_se, drop_metabolites = "Metabolism Indicators")
#> Info: No metabolites were removed!
metalyzer_se
#> class: SummarizedExperiment 
#> dim: 1019 18 
#> metadata(4): file_path sheet_index status_list aggregated_data
#> assays(2): conc_values quant_status
#> rownames(1019): C0 C2 ... TG 22:6_34:3 Choline
#> rowData names(1): metabolic_classes
#> colnames(18): 32 33 ... 48 49
#> colData names(11): Plate bar code Sample bar code ... Org info
#>   Measurement time

In this data set, the column “Sample Description” holds information about the wether a sample is a wild type or a mutant.

meta_data <- colData(metalyzer_se)
meta_data$`Sample Description`
#>  [1] "Mutant"  "Mutant"  "Mutant"  "Mutant"  "Mutant"  "Mutant"  "Mutant" 
#>  [8] "Mutant"  "Mutant"  "Control" "Control" "Control" "Control" "Control"
#> [15] "Control" "Control" "Control" "Control"

To determine the direction of the effect during log2 fold change calculation, the information is converted into a factor and saved as a new column “Control_Mutant”. The log2 fold change will now be calculated from “Control” to “Mutant”.

control_mutant <- factor(colData(metalyzer_se)$`Sample Description`, levels = c("Control", "Mutant"))
metalyzer_se <- updateMetaData(metalyzer_se, Control_Mutant = control_mutant)

meta_data <- colData(metalyzer_se)
meta_data$Control_Mutant
#>  [1] Mutant  Mutant  Mutant  Mutant  Mutant  Mutant  Mutant  Mutant  Mutant 
#> [10] Control Control Control Control Control Control Control Control Control
#> Levels: Control Mutant

In order to avoid NAs during log2 transformation the same imputation of NAs or zero values is applied as for calculate_anova. To impute NAs, the impute_NA argument can be set to TRUE. The argument impute_perc_of_min specifies the percentage of the minimum concentration value that is used for imputation.

metalyzer_se <- calculate_log2FC(
  metalyzer_se,
  categorical = "Control_Mutant",
  impute_perc_of_min = 0.2,
  impute_NA = TRUE
)
#> Info: Imputing concentrations (groupwise: Metabolite) with 20% of the minimal positive value...  finished!
#> Warning: Partial NA coefficients for 2 probe(s)

In addition to accessing the log2 fold change data through metadata(metalyzer_se)$log2FC, it can be directly accessed using log2FC.

log2FC(metalyzer_se)
#> # A tibble: 1,019 × 5
#> # Groups:   Metabolite [1,019]
#>    Metabolite    Class           log2FC   pval  qval
#>    <chr>         <fct>            <dbl>  <dbl> <dbl>
#>  1 C0            Acylcarnitines  0.0953 0.728  0.868
#>  2 C2            Acylcarnitines -0.323  0.419  0.633
#>  3 C3            Acylcarnitines  0.351  0.249  0.486
#>  4 C3-DC (C4-OH) Acylcarnitines -0.315  0.413  0.627
#>  5 C3-OH         Acylcarnitines  0.196  0.619  0.798
#>  6 C3:1          Acylcarnitines  0.322  0.475  0.684
#>  7 C4            Acylcarnitines -0.240  0.305  0.537
#>  8 C4:1          Acylcarnitines -0.959  0.0995 0.357
#>  9 C5            Acylcarnitines -0.115  0.671  0.831
#> 10 C5-DC (C6-OH) Acylcarnitines -0.342  0.137  0.394
#> # ℹ 1,009 more rows

The log2 fold change between the wild type and the mutant can be visualized with a volcano plot:


log2fc_vulcano <- plot_log2FC(
  metalyzer_se,
  hide_labels_for = rownames(rowData(metalyzer_se)),
  vulcano=TRUE
)

log2fc_vulcano

as well as in a scatter plot categorized by metabolic classes:


log2fc_by_class <- plot_log2FC(
  metalyzer_se,
  hide_labels_for = rownames(rowData(metalyzer_se)),
  vulcano=FALSE
)

log2fc_by_class

MetAlyzer also provides a visualization of the log2 fold change across different pathways, in order to identify groups of metabolites impacted by the effect.


log2fc_network <- plot_network(
  metalyzer_se,
  q_value=0.05,
  metabolite_text_size=2,
  connection_width=0.75,
  pathway_text_size=4,
  pathway_width=4,
  scale_colors = c("green", "black", "magenta")
)
#> Warning: Removing 5 invalid nodes.

log2fc_network