SIPmg-vignette

Pranav Sampara

Introduction

The SIPmg R package was designed to enable the exploration and statistical analysis of qSIP metagenomics data. Particularly, this package allows the identification of isotope incorporators and quantifies isotopic enrichment.

Please check out this introductory vignette to qSIP metagenomics, before checking out vignettes on:

[Coverage normalization and scaling based on linear regression model][https://zielslab.github.io/SIPmg.github.io/scaling-coverage-data.html]

[Identification of incorporators based on qSIP, ΔBD, or DESeq2 methods][https://zielslab.github.io/SIPmg.github.io/identifying-incorporators.html]

Background

This section introduces a bit of a background on the methodologies used in this pipeline, such as stable isotope probing, quantitative metagenomics, and quantitative stable isotope probing. References as a point of departure for more reading are also provided.

Stable isotope probing

Stable isotope probing (SIP) targets active populations within complex microbiomes by providing growth substrates enriched in a heavy stable isotope of carbon, nitrogen, or oxygen. In DNA-SIP, the DNA of microbes actively incorporating the labelled substrate into their DNA during cell division and growth would become increasingly labelled and heavier compared to the microbes not incorporating the substrate into their DNA. Heavier DNA can be physically separated via isopycnic centrifugation on a CsCl gradient, followed by fractionation. Sequencing effort on the “heavier density fractions” could inform the identity and function of microbes that actively incorporated the isotopic substrate into their DNA. Thus, DNA-SIP is a powerful technique to link the identity and function of environmental microbes.

However, traditional DNA-SIP is only a qualitative technique to identify active microbial populations and no quantitative estimates of isotopic enrichment of DNA are provided. For instance, the distinction between labelled and unlabelled microbes is defined by density gradient regions, limiting the resolution of taxon-specific responses to labelled or unlabelled. Moreover, there exists a guanine-cytosine (GC) bias in conventional SIP studies. The native buoyant density of DNA can differ by as much as 0.05 g/mL over the range of GC contents that can occur in complex communities. Lighter GC-content genomes may not shift into the researcher-assigned “heavier density fractions”, while high GC-content genomes may fall in the “heavier density fractions” without any isotopic substrate assimilation. Thus, a qualitative analysis may mask the true incorporators and, worse, falsely identify non-incorporators as incorporators, skewing the analysis of the study.

Quantitative metagenomics

Although metagenomics offers a comprehensive account of the metabolic repertoire in an environment, typically, the data is compositional. Relative abundance of genes or taxa is estimated to determine microbial community dynamics, which is directly influenced by the dynamics of rest of the community. Even if a certain gene remains in the same concentration, its relative abundance changes if a certain other gene increases/decreases in concentration. Absolute abundance of gene/genome can be critical, for instance, in the surveillance of a pathogenic gene/genome in an environment. For quantitative measurements of microbial taxon absolute abundance in a given metagenomic sample, Hardwick et al. (2018) proposed synthetic spike-ins, or sequins, as internal reference standards. Reference standards in metagenomics can act as scaling factors to evaluate quantitative estimates of individual biological features, in this case, genes or genomes.

Quantitative SIP (qSIP)

In quantitative SIP (qSIP), quantitative estimates of isotope incorporation, expressed as atom fraction excess (AFE), are calculated based on a mathematical model. Quantitative estimates of isotopic enrichment of DNA facilitate accurate and sensitive estimates of substrate uptake measurements, and growth and mortality rates of individual taxa in a complex community. Moreover, qSIP analysis is less susceptible to GC and enrichment biases, and taxon abundance, making it amenable to quantify isotopic incorporations in a complex community such as the activated sludge microbiome, with in-situ conditions. Integration of qSIP and quantitative metagenomics (qSIP metagenomics) facilitates the exploration of active microbial populations’ taxonomic and metabolic diversity with a quantitative estimate of their abundance and isotope incorporation.

For details on SIP, quantitative metagenomics, and qSIP, please refer to the following works:

Using SIPmg R package

This R package was designed to facilitate statistical analyses using SIP metagenomic data. Particularly, coverage and taxonomic classification of metagenome-assembled genomes and metadata of fractionated DNA is used to identify incorporators and quantify isotopic enrichment.

Overall, the package can perform the following:

  1. Coverage normalization to absolute concentrations or relative coverage. Scaling of coverages to absolute abundances based on post-fractionation reference standards, called sequins.

  2. Statistical analysis based on either of

  1. Data visualization for identification of isotope incorporators.

Example workflow

qSIP analysis is typically performed using sequins, MAG coverage, and MAG taxonomy is performed using this R markdown. Briefly, coverages are normalized, MAG coverages are then scaled based on linear regression models from sequin coverage and concentrations, a phyloseq object is created from MAG absolute concentrations and taxonomy (GTDB taxonomy output format is required) data. Note: Sequins that were spiked in the DNA-SIP fractions will be used in scaling and creating linear regression models for evaluating absolute MAG concentrations. Please see sequin_scaling.R For normalizing coverage values please see pooling_funs.R

This R markdown primarily uses Tidyverse, phyloseq, and HTSSIP packages This markdown uses functions from functions_qSIP_MAGs.R

For this data, MetaSIPSim (developed by Barnett et al, 2020) was used to simulate a data set from a hypothetical SIP workflow performed on the MBARC-26 mock community (as defined in Singer et al, 2016). The data set was further augmented with simulated post-fractionation sequin spike-ins and the data set was simulated to have all 26 organisms as incorporators. Some of the 26 organisms also had plasmids, which were assembled as distinct genomic units, adding the total genomic features in the simulated study to be 38

#Load required libraries

#list.of.packages <- c("tidyverse", "HTSSIP", "ggpubr","data.table")
#new.packages <- list.of.packages[!(list.of.packages %in% #installed.packages()[,"Package"])]
#if(length(new.packages)) install.packages(new.packages, quiet = TRUE, #dependencies = TRUE, repos = "http://cran.us.r-project.org")

#if (!require("BiocManager", quietly = TRUE))
#    install.packages("BiocManager")
#BiocManager::install("phyloseq")

#BiocManager::install("EBImage")

library(phyloseq)
library(HTSSIP)
library(ggpubr)
#> Loading required package: ggplot2
library(SIPmg)
#> 
#> Attaching package: 'SIPmg'
#> The following objects are masked from 'package:HTSSIP':
#> 
#>     DESeq2_l2fc, HRSIP


set.seed(seed = 1000)

Load required data

The following files are required:

MAG coverage data

Pooled coverages data as a comma separated file (.csv file) across samples for Features including sequins that were used as spike-ins. The followings columns are required:

  1. Feature: A character string describing the Feature label

  2. Sample: The label for these n number of columns should be in the format of “‘isotope’_rep_#_fraction_#”. For instance, “12C_rep_1_fraction_1”. The number of sample columns should be the same as the product of (replicates, fractions, isotopes)

Sequin metadata

Load the sequins metadata as a comma separated file (.csv file) which has the following columns:

  1. Feature: As described above

  2. Length: Length of the sequin in bp

  3. GC_content: GC content of the sequin

  4. Sequence: Sequin nucleotide sequence

  5. Concentration: Concentration of the sequin in attamoles/uL

Dilutions data

Load dilutions data as a comma separated file (.csv file) that contains the following columns

  1. Sample: Similar to the sample name as described above.

  2. Dilution: Dilution of sequins added to the fraction before sequencing.

Fractions metadata

A fractions file as a comma separated file (.csv file) with the following columns:

  1. Replicate: Depends on how many replicates the study has

  2. Fractions: Typically in the range of 1-24

  3. Buoyant_density: As calculated from the refractometer for each fraction and replicate

  4. Isotope - “12C”, “13C”, “14N”, “15N” etc.

  5. DNA_concentration

  6. Sample - In the format “‘isotope’_rep_#_fraction_#”. For instance 12C_rep_1_fraction_1

GTDB style taxonomy data

A taxonomy file in the GTDB output format (.tsv format). Load the bacteria and archaea taxonomy outputs separately. The markdown requires loading the standard output files from GTDB-Tk separately for bacteria and archaea

MAG absolute concentrations

MAG absolute concentrations. mag_tab object obtained from the above script is to be used as the input here

GC content

GC content of the MAGs as a comma separated file (.csv file). The table should contain the following columns:

  1. OTU: MAG identifier such as the Feature label from the sequin_scaling.R script

  2. GC_content: GC content of the Feature in the range of 0-1

Log scale BOOLEAN

True or False depending on how you would want the MAG coverages to be scaled. Select TRUE if you need MAG concentrations scaled on the log scale

Coefficient of variation

Acceptable coefficient of variation for coverage and detection (eg. 20 for 20 % threshold of coefficient of variation)

Coverages above the threshold value will be flagged in the plots. Here a value of 20 is used.

## Load data
#Coverage metadata
#Uncomment if your coverage data is in the format mentioned above for this file. Remains commented if you are using the output from `checkm coverage`
f_tibble <- readr::read_csv("mock_input_data/coverage_metadata.csv")
#> Rows: 124 Columns: 61
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr  (1): Feature
#> dbl (60): 12C_rep_1_fraction_10, 12C_rep_1_fraction_1, 12C_rep_1_fraction_2,...
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

#Sequins metadata
sequins <- readr::read_csv(file="mock_input_data/sequins_metadata.csv")
#> Rows: 115 Columns: 5
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (2): Feature, Sequence
#> dbl (3): Length, GC_content, Concentration
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

#Dilutions data
seq_dil = readr::read_csv(file = "mock_input_data/dilutions_data.csv")
#> Rows: 60 Columns: 2
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (1): Sample
#> dbl (1): Dilution
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

#Log scale BOOLEAN. True or False depending on how you would want the MAG coverages to be scaled. Select TRUE if you need MAG concentrations scaled on the log scale
log_scale = TRUE

#coe_of_variation. Acceptable coefficient of variation for coverage and detection (eg. 20 - for 20 % threshold of coefficient of variation) (Coverages above the threshold value will be flagged in the plots)
coe_of_variation = 20 

#Taxonomy
gtdbtk_bac_summary = readr::read_delim("mock_input_data/gtdbtk.bac120.summary.tsv", 
                                 "\t", escape_double = FALSE, trim_ws = TRUE)
#> Rows: 33 Columns: 19
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: "\t"
#> chr  (2): user_genome, classification
#> lgl (17): fastani_reference, fastani_reference_radius, fastani_taxonomy, fas...
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
gtdbtk_archaea = readr::read_delim("mock_input_data/gtdbtk.ar122.summary.tsv", 
                             "\t", escape_double = FALSE, trim_ws = TRUE)
#> Rows: 5 Columns: 19
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: "\t"
#> chr  (2): user_genome, classification
#> lgl (17): fastani_reference, fastani_reference_radius, fastani_taxonomy, fas...
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#GC content
GC_content <- readr::read_csv(file = "mock_input_data/GC_content.csv")
#> Rows: 38 Columns: 2
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (1): OTU
#> dbl (1): Gi
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

#Fractions
fractions = readr::read_csv("mock_input_data/fractions.csv")
#> Rows: 60 Columns: 6
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (2): Isotope, Sample
#> dbl (4): Replicate, Fraction, Buoyant_density, DNA_concentration(ng/uL)
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Coverage normalization and scaling

Coverage data of MAGs (or features of interest) and sequins is scaled using sequin concentration to obtain absolute concentration of MAGs (or features of interest). For this step, either linear regression or robust linear regression can be used.

If linear regression is used, the function scale_features_lm is used. The functions in sequin_scaling_lm.R are used for this step.

If robust linear regression is used, the outliers impact on the regression models is minimized. For robust linear regression, scale_features_rlm is used. The functions in sequin_scaling_rlm.R are used for this step. In this example workflow, robust linear regression is used to scale coverage data in log10 scale.

For more discussion on the coice of regression method and comparison of the two methods please see this [vignette][https://zielslab.github.io/SIPmg.github.io/scaling-coverage-data.html]

This function uses the following datasets:

  1. Coverage data (f_tibble)
  2. Sequins metadata (sequins)
  3. Dilution of sequins used to add into the fractions (seq_dil)

The regression scaling plots are saved in the project folder within a subfolder sequin_scaling_plots

taxonomy_tibble = dplyr::bind_rows(gtdbtk_bac_summary, gtdbtk_archaea) #Combine bacteria and archaea taxonomy files if it has not been done yet
#mag_tab is a tibble with absolute concentrations of MAGs obtained by scaling MAG coverages using linear regression models on sequin coverages and concentration

##Scale MAG coverages to obtain MAG absolute concentrations and save scaling plots in the working directory
#For rlm scaling using scale_features_rlm
#For rlm scaling using scale_features_lm
mag_tab_scaled <- SIPmg::scale_features_rlm(f_tibble, sequins, seq_dil, log_scale, coe_of_variation = coe_of_variation, save_plots = FALSE)
#> Warning: There was 1 warning in `dplyr::mutate()`.
#> ℹ In argument: `fit = ifelse(...)`.
#> Caused by warning in `rlm.default()`:
#> ! 'rlm' failed to converge in 20 steps
#> Warning in dir.create(plot_dir):
#> '/var/folders/bz/_jkww2rn53j3fb0m7jdgdnyr0000gn/T//RtmpeaTW4l' already exists
mag_tab = as.matrix(mag_tab_scaled$mag_tab) #Extract absolute abundances as a matrix

Example of a scaling plot

An example of a scaling plot is as below

#> Warning: The dot-dot notation (`..rr.label..`) was deprecated in ggplot2 3.4.0.
#> ℹ Please use `after_stat(rr.label)` instead.
#> ℹ The deprecated feature was likely used in the SIPmg package.
#>   Please report the issue at <]8;;https://github.com/ZielsLab/SIPmghttps://github.com/ZielsLab/SIPmg]8;;>.
#> `geom_smooth()` using formula = 'y ~ x'

Preparation of scaled data for qSIP analysis

The scaled abundance data, taxonomy data, and metadata is then converted to phyloseq objects and a master phyloseq object is created for qSIP analysis.


mag.table = phyloseq::otu_table(mag_tab, taxa_are_rows = TRUE) #Phyloseq OTU table

taxonomy.object = SIPmg::tax.table(taxonomy_tibble) # Create a taxonomy phyloseq object
samples.object = SIPmg::sample.table(fractions) # Create a samples phyloseq object
phylo.qSIP = SIPmg::phylo.table(mag.table, taxonomy.object, samples.object) # Make a phyloseq table for downstream qSIP analysis

Estimation of atom fraction excess

The following steps estimate the atom fraction excess (AFE) in the MAGs. This calculation is based on the mathematical model suggested by Hungate et al. (2015). In the article, the authors suggest that GC contents of the individual biological features of interest should be accounted for to have a better estimate of AFE. Through the power of metagenomic analysis and the recovery of MAGs, one can estimate the GC content of features of interest and thereby obtain a better estimate of AFE. Thus, here the user would use the GC contents of MAGs in the AFE estimations and obtain isotopic enrichment of individual MAGs.

An important consideration is that the AFE calculations are based on a mathematical model and are not an absolute estimate of isotopic enrichment. One of the goals of AFE estimation is to determine isotope incorporators.

In this example workflow, the approach reported in Hungate et al. (2015) is used to determine incorporators.

For this step, the following are used:

  1. Master phyloseq object from the previous step
  2. GC content

Thanks to HTSSIP R package, bootstrapping can be performed faster with multithreading.

atomX = SIPmg::qSIP_atom_excess_MAGs(phylo.qSIP,
                               control_expr='Isotope=="12C"',
                               treatment_rep='Replicate',
                               Gi = GC_content)
#> Warning: `mutate_()` was deprecated in dplyr 0.7.0.
#> ℹ Please use `mutate()` instead.
#> ℹ See vignette('programming') for more help
#> ℹ The deprecated feature was likely used in the HTSSIP package.
#>   Please report the issue to the authors.
#> Warning: `select_()` was deprecated in dplyr 0.7.0.
#> ℹ Please use `select()` instead.
#> ℹ The deprecated feature was likely used in the HTSSIP package.
#>   Please report the issue to the authors.
#> Warning: `filter_()` was deprecated in dplyr 0.7.0.
#> ℹ Please use `filter()` instead.
#> ℹ See vignette('programming') for more help
#> ℹ The deprecated feature was likely used in the SIPmg package.
#>   Please report the issue at <]8;;https://github.com/ZielsLab/SIPmghttps://github.com/ZielsLab/SIPmg]8;;>.
#> Warning: `summarise_()` was deprecated in dplyr 0.7.0.
#> ℹ Please use `summarise()` instead.
#> ℹ The deprecated feature was likely used in the SIPmg package.
#>   Please report the issue at <]8;;https://github.com/ZielsLab/SIPmghttps://github.com/ZielsLab/SIPmg]8;;>.
#> Warning: `group_by_()` was deprecated in dplyr 0.7.0.
#> ℹ Please use `group_by()` instead.
#> ℹ See vignette('programming') for more help
#> ℹ The deprecated feature was likely used in the SIPmg package.
#>   Please report the issue at <]8;;https://github.com/ZielsLab/SIPmghttps://github.com/ZielsLab/SIPmg]8;;>.
#Bootstrap confidence intervals
df_atomX_boot = SIPmg::qSIP_bootstrap_fcr(atomX, n_boot=10) #Change "parallel = FALSE" to compute using a single-core

Plot the atom fraction excess plot

Plot the atom fraction excess data with 5 % and 95 % quantiles as the confidence limits. In the same way as reported in Hungate et al. (2015), a MAG is considered an incorporator if the lower confidence interval of its AFE is above zero.

CI_threshold = 0
df_atomX_boot = df_atomX_boot %>%
  dplyr::mutate(Incorporator = A_CI_fcr_low > CI_threshold,
         OTU = reorder(OTU, -A))

(atom_f_excess_plot = ggplot2::ggplot(df_atomX_boot, aes(OTU, A, ymin=A_CI_low, ymax=A_CI_high, color=Incorporator)) +
  geom_pointrange(size=0.25) +
  geom_linerange() +
  geom_hline(yintercept=0, linetype='dashed', alpha=0.5) +
  labs(x='MAGs', y='Atom fraction excess') +
  theme_bw() +
  coord_flip() +
  ggtitle("Isotope incorporating MAGs"))

ggplot2::ggsave(filename = "atom_fration_excess.pdf", plot = atom_f_excess_plot)
#> Saving 3 x 3 in image

Check incorporator list


#Get incorporator info
n_incorp = df_atomX_boot %>%
  dplyr::filter(Incorporator == TRUE) %>%
  nrow 
#Get incorporator list
incorporator_list = SIPmg::incorporators_taxonomy(taxonomy = taxonomy_tibble, bootstrapped_AFE_table = df_atomX_boot)
#Print incorporator information
cat('Number of incorporators:', n_incorp, '\n')
#> Number of incorporators: 28
print(incorporator_list, n = nrow(incorporator_list))
#> # A tibble: 28 × 2
#>    OTU         taxonomy                       
#>    <fct>       <chr>                          
#>  1 NC_000913.3 Terriglobusroseus              
#>  2 NC_003450.3 Segniliparusrotundus           
#>  3 NC_008261.1 Nocardiopsisdassonvillei       
#>  4 NC_010067.1 Olsenellauli                   
#>  5 NC_014008.1 Meiothermus_Bsilvanus          
#>  6 NC_014168.1 Meiothermus_Bsilvanus          
#>  7 NC_014210.1 Hungateiclostridiumthermocellum
#>  8 NC_014211.1 Clostridium_Pperfringens       
#>  9 NC_014212.1 Desulfosporosinusacidiphilus   
#> 10 NC_014213.1 Desulfosporosinusacidiphilus   
#> 11 NC_014214.1 Desulfosporosinusacidiphilus   
#> 12 NC_014363.1 Desulfosporosinusmeridiei      
#> 13 NC_015761.1 Streptococcuspyogenes          
#> 14 NC_017033.1 Thermobacilluscomposti         
#> 15 NC_018014.1 Hirschiabaltica                
#> 16 NC_018066.1 Hirschiabaltica                
#> 17 NC_018068.1 Salmonellaarizonae             
#> 18 NC_019792.1 Pseudomonas_Astutzeri_AE       
#> 19 NC_019897.1 Pseudomonas_Astutzeri_AE       
#> 20 NC_019936.1 Frateuriaaurantia              
#> 21 NC_019937.1 Sediminispirochaetasmaragdinae 
#> 22 NC_019938.1 Fervidobacteriumpennivorans    
#> 23 NC_019939.1 Coraliomargaritaakajimensis    
#> 24 NC_019964.1 Halovivaxruber                 
#> 25 NC_019974.1 Natronobacteriumgregoryi       
#> 26 NC_019975.1 Natronococcusoccultus          
#> 27 NC_019976.1 Natronococcusoccultus          
#> 28 NC_021184.1 Natronococcusoccultus

It appears that not all incorporators were identified with the bootstrapping method. This could be due to the model unable to detect the incorporators. Another way to detect incorporators, as mentioned above, is to test with the ΔBD method or with the HR-SIP or MW-HR-SIP methods. For a more detailed discussion, please refer this [vignette][https://zielslab.github.io/SIPmg.github.io/identifying-incorporators.html]

Scaling coverage data

In this example we will look at the two scaling approaches that can be utilized in the SIPmg package. The user can decide either

  1. A robust linear regression model
  2. An ordinary least squares linear regression model We will also briefly discuss data filtering methods when there are heavily influencing outliers, as a result of upstream methods.

Basic approach for scaling

Sequencing effort on each fraction and subsequently obtained coverage is typically used to estimate relative abundance for a feature of interest. However, as we discussed in the introductory vignette, this data is compositional and a quantitative estimate of concentrations can be highly informative. For this purpose, synthetic spike-in standards (sequins) with known concentrations are added before recovery of fractionated DNA, from the CsCl medium into TE buffer or water. For details on sequins please refer to Hardwick et al. (2018). These reference standards in each fraction, can now be used to estimate the concentration of the feature of interest in each fraction.

The approach used for scaling is detailed in the following steps:

  1. For each fraction, sequin coverages and known concentrations are obtained to make a standard curve.

  2. Limit of detection of sequins, i.e., the lowest concentration where at least one sequin has detectable coverage, is determined for each group of sequin concentrations.

  3. For each group of sequin concentrations, mean, standard deviation, and the coefficient of variation of coverage are determined. Groups of sequins with coefficient of variation greater than the set threshold will be flagged.

  4. Sequin groups with coverage values above the limit of detection and below the coefficient of variation threshold are filtered in preparation for regression.

  5. Linear regression or robust linear regression, based on the user choice, is performed on sequin coverage values as the independent variable and concentration of the sequin group as the dependent variable. The user can also decide on performing log scaling of coverage and concentration values for regression.

  6. The regression parameters are extracted and plots with regression parameters and best fit line are saved for inspection.

  7. Coverage values above the limit of detection and below the coefficient of variation threshold are filtered for further analysis. The rest of the values are flagged and reported.

  8. The filtered values are scaled based on regression parameters to estimate absolute concentrations in each library.

  9. Absolute concentrations of biological features are saved as a fresh dataset for the subsequent isotope incorporator identification and AFE estimation pipeline.

Choice of regression model

Ordinary least squares regression can be very sensitive to outliers, resulting a poor model performance when the data is contaminated with outliers. Although there are methods, like Cook’s distance based filtering to address outliers, it may not always be the best idea to remove these data points.

See these discussions and more for better insights into handling outliers.

Robust linear regression is an alternative to this situation. Robust linear regression assigns appropriate weights to the data points, minimizing the influence of outliers on the model performance, without deleting data. In this pipeline, a Huber loss function from the [MASS R package][https://CRAN.R-project.org/package=MASS] was used for robust linear regression.

The user is free to choose between the two methods for regression.

Scaling the data

For illustrating the differences in the regression models, a different dataset, that was impacted by one or many upstream methods, will be used than the one in the introductory vignette.

This dataset can be found here

#> Rows: 86 Columns: 6
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (1): Feature
#> dbl (5): 52434.2.332926.AAAGGCGT-TCAACTCC, 52434.2.332926.GCAGTGAA-ACATTCGG,...
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Rows: 80 Columns: 2
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (1): Feature
#> dbl (1): Concentration
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Rows: 5 Columns: 2
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (1): Sample
#> dbl (1): Dilution
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

For both scaling functions the following data and parameters are required

  1. Coverage data. The output of checkm coverage command can be directly used.

  2. Sequin dilutions and metadata.

  3. Whether or not log10 scaling to be performed (a BOOLEAN value of TRUE or FALSE).

  4. Coefficient of variation. In this dataset, a higher coefficient of variation compared to the value used in the [introduction vignette][https://zielslab.github.io/SIPmg.github.io/quick-example.html] is used to allow data scaling to occur.

If the robust linear regression is chosen, the function is scale_features_rlm which is performed the function file sequin_scaling_rlm.R.

mag_tab_scaled_rlm <- SIPmg::scale_features_rlm(f_tibble, sequins, seq_dil, log_scale, coe_of_variation = coe_of_variation, save_plots = FALSE)
#> Warning: There was 1 warning in `dplyr::mutate()`.
#> ℹ In argument: `fit = ifelse(...)`.
#> Caused by warning in `rlm.default()`:
#> ! 'rlm' failed to converge in 20 steps
#> Warning in dir.create(plot_dir):
#> '/var/folders/bz/_jkww2rn53j3fb0m7jdgdnyr0000gn/T//RtmpeaTW4l' already exists

If a linear regression is chosen, the function is scale_features_lm which is performed by the function file sequin_scaling_lm.R.

If the user chooses linear regression, they can choose to perform outlier filtering based on the traditional Cook’s distance threshold of 4/n (n being the sample size). Only the outliers in the sequin coverage data are flagged as outliers with the Cook’s distance threshold and are filtered out. Later, the remaining data is subjected to linear regression to obtain model parameters and to scale coverage values of feature of interest.

mag_tab_scaled_lm <- SIPmg::scale_features_lm(f_tibble, sequins, seq_dil, log_scale, coe_of_variation = coe_of_variation, cook_filtering = TRUE, save_plots = FALSE)
#> Warning: There were 2 warnings in `dplyr::mutate()`.
#> The first warning was:
#> ℹ In argument: `slope = purrr::map_dbl(fit, ~summary(.)$coef[2])`.
#> Caused by warning in `summary.lm()`:
#> ! essentially perfect fit: summary may be unreliable
#> ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
#> Warning: There were 2 warnings in `dplyr::mutate()`.
#> The first warning was:
#> ℹ In argument: `slope_filtered = purrr::map_dbl(fit_filtered_lm,
#>   ~summary(.)$coef[2])`.
#> Caused by warning in `summary.lm()`:
#> ! essentially perfect fit: summary may be unreliable
#> ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
#> Warning in dir.create(plot_dir):
#> '/var/folders/bz/_jkww2rn53j3fb0m7jdgdnyr0000gn/T//RtmpeaTW4l' already exists
#> Warning in file.remove(plot_dir): cannot remove file
#> '/var/folders/bz/_jkww2rn53j3fb0m7jdgdnyr0000gn/T//RtmpeaTW4l', reason
#> 'Directory not empty'

Comparison of the fits

Robust linear regression provides a better model fit compared to the linear regression. Additionally, robust linear regression has less variability compared to the linear regression model.

# Robust linear regression plot
EBImage::display(rlm_example)


#Linear regression plot without filtering sequin data
EBImage::display(lm_example)

As discussed previously, removing the outliers may not always be the best idea. However, the outliers can be visualized to assess how “far out” are the outliers and how influential the data points are, negatively influencing the linear regression method.

The mag_tab_scaled_lm function provides the plots for the visualization of Cook’s distance for the sequin data in each fraction. In the plot below, an outlier is visualized. For this data, the Cook’s distance threshold was 0.09, and the outlier had a Cook’s distance of 10.8. The highly influential data point could well be the reason the linear regression model had a poor fit.


#Cook's distance threshold of the data set
4/(length(mag_tab_scaled_lm$scale_fac$cooksd[[3]]))
#> [1] 0.1142857

#Cook's distance of the outlier before filtering the data
max(mag_tab_scaled_lm$scale_fac$cooksd[[3]])
#> [1] 0.3588489

#Before filtration
EBImage::display(cooksd_example)

Upon filtering the data to remove the outlier, the fit becomes much better for the linear regression model

#Linear regression plot with filtered outliers in sequin data
EBImage::display(filtered_lm_example)

What if there were no sequins added in the study

In our study, we realized that the sequin-scaled data provided the best correlation (Spearman correlation coefficient = 0.85), compared to methods without the use of sequins. However, not all studies would have access to sequins or could choose not to add sequins. We tested qSIP methods with absolute concentration of MAGs obtained as a product of fraction DNA concentration and either with MAG relative abundance (as performed by Greenlon et al.) or relative coverage (as performed by Starr et al.), or simply with relative coverage, i.e., without converting the MAG coverages to absolute concentration. We found that without sequins, simply using relative coverage provided better specificity (0.99) and good correlation (Spearman correlation coefficient = 0.76), compared to the other two methods. The user could choose any of the three approaches to obtain normalized coverages using the equation coverage_normalization() where the parameter approach would be chosen accordingly (either “greenlon”, “starr”, or “relative_coverage”). In this example, simple relative coverage will be estimated without converting to absolute concentration based on fraction DNA concentrations. The relative coverage table can then be converted to a phyloseq object to run incorporator identification and/or AFE estimation. For more details, please refer to the [incorporator identification section][https://zielslab.github.io/SIPmg.github.io/identifying-incorporators.html]

f_tibble <- readr::read_csv("mock_input_data/coverage_metadata.csv")
#> Rows: 124 Columns: 61
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr  (1): Feature
#> dbl (60): 12C_rep_1_fraction_10, 12C_rep_1_fraction_1, 12C_rep_1_fraction_2,...
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
rel.cov = SIPmg::coverage_normalization(f_tibble = f_tibble, approach = "relative_coverage")
mag.table = phyloseq::otu_table(as.matrix(rel.cov %>% tibble::column_to_rownames(var = "Feature")), taxa_are_rows = TRUE) #Phyloseq OTU table

Decision on the method to scale coverages

It can be seen from the above plots that the regression parameters differ based on the method of regression and whether or not data filtering is used. These regression parameters directly influence the estimation of concentrations of features of interest in the microbiome. Thus, the method for regression must be a careful choice.

SIP metagenomics and identification of isotope incorporators

The isotope incorporators could be identified using the below methods:

While the first two methods gives a quantitative estimate of isotopic enrichment in a taxon to determine if it is an incorporator, the third method relies on differential abundance analysis over multiple overlapping windows. Please refer to Youngblut et al. and HTS-SIP R package for more details.

Comparison of the methods

AFE method

Isotope incorporators from the introductory vignette were identified using the AFE method Reproducing the code here, we have

#> Rows: 124 Columns: 61
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr  (1): Feature
#> dbl (60): 12C_rep_1_fraction_10, 12C_rep_1_fraction_1, 12C_rep_1_fraction_2,...
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Rows: 115 Columns: 5
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (2): Feature, Sequence
#> dbl (3): Length, GC_content, Concentration
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Rows: 60 Columns: 2
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (1): Sample
#> dbl (1): Dilution
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Rows: 33 Columns: 19
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: "\t"
#> chr  (2): user_genome, classification
#> lgl (17): fastani_reference, fastani_reference_radius, fastani_taxonomy, fas...
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Rows: 5 Columns: 19
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: "\t"
#> chr  (2): user_genome, classification
#> lgl (17): fastani_reference, fastani_reference_radius, fastani_taxonomy, fas...
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Rows: 38 Columns: 2
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (1): OTU
#> dbl (1): Gi
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Rows: 60 Columns: 6
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (2): Isotope, Sample
#> dbl (4): Replicate, Fraction, Buoyant_density, DNA_concentration(ng/uL)
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Warning in dir.create(plot_dir):
#> '/var/folders/bz/_jkww2rn53j3fb0m7jdgdnyr0000gn/T//RtmpeaTW4l' already exists
#> Warning in file.remove(plot_dir): cannot remove file
#> '/var/folders/bz/_jkww2rn53j3fb0m7jdgdnyr0000gn/T//RtmpeaTW4l', reason
#> 'Directory not empty'
atomX = SIPmg::qSIP_atom_excess_MAGs(phylo.qSIP,
                               control_expr='Isotope=="12C"',
                               treatment_rep='Replicate',
                               Gi = GC_content)
#Bootstrap confidence intervals
df_atomX_boot = SIPmg::qSIP_bootstrap_fcr(atomX, n_boot=10)
CI_threshold = 0
df_atomX_boot = df_atomX_boot %>%
  dplyr::mutate(Incorporator_qSIP = A_CI_fcr_low > CI_threshold,
                Incorporator_delbd = A_delbd - A_delbd_sd > 0,
         OTU = stats::reorder(OTU, -A))

df_atomX_boot = df_atomX_boot %>%
  dplyr::inner_join(taxonomy_tibble %>% 
               dplyr::select(user_genome, classification) %>%
               dplyr::rename(OTU = user_genome)) 
#> Joining with `by = join_by(OTU)`

MW-HR-SIP

Check incorporators in the overlapping windows of 1.71 - 1.74; 1.72 - 1.75; 1.73 - 1.76. The windows must be chosen in a more judicious manner that fits the hypothesis of the study.

windows = data.frame(density_min=c(1.71,1.72, 1.73), 
                     density_max=c(1.74,1.75,1.76))

padj_cutoff = 0.05
#ncores = 6
#doParallel::registerDoParallel(ncores)

mw.hr.sip = SIPmg::HRSIP(physeq = phylo.qSIP, design = ~Isotope,
                                    density_windows = windows,
                                    sparsity_threshold = seq(0, 0.3, 0.05), 
                                    padj_cutoff = padj_cutoff)
#> Sparsity threshold:0
#> Density window:1.71-1.74
#> converting counts to integer mode
#> Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
#> design formula are characters, converting to factors
#> Sparsity threshold:0.05
#> Density window:1.71-1.74
#> converting counts to integer mode
#> Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
#> design formula are characters, converting to factors
#> Sparsity threshold:0.1
#> Density window:1.71-1.74
#> converting counts to integer mode
#> Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
#> design formula are characters, converting to factors
#> Sparsity threshold:0.15
#> Density window:1.71-1.74
#> converting counts to integer mode
#> Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
#> design formula are characters, converting to factors
#> Sparsity threshold:0.2
#> Density window:1.71-1.74
#> converting counts to integer mode
#> Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
#> design formula are characters, converting to factors
#> Sparsity threshold:0.25
#> Density window:1.71-1.74
#> converting counts to integer mode
#> Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
#> design formula are characters, converting to factors
#> Sparsity threshold:0.3
#> Density window:1.71-1.74
#> converting counts to integer mode
#> Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
#> design formula are characters, converting to factors
#> Sparsity threshold:0
#> Density window:1.72-1.75
#> converting counts to integer mode
#> Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
#> design formula are characters, converting to factors
#> Sparsity threshold:0.05
#> Density window:1.72-1.75
#> converting counts to integer mode
#> Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
#> design formula are characters, converting to factors
#> Sparsity threshold:0.1
#> Density window:1.72-1.75
#> converting counts to integer mode
#> Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
#> design formula are characters, converting to factors
#> Sparsity threshold:0.15
#> Density window:1.72-1.75
#> converting counts to integer mode
#> Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
#> design formula are characters, converting to factors
#> Sparsity threshold:0.2
#> Density window:1.72-1.75
#> converting counts to integer mode
#> Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
#> design formula are characters, converting to factors
#> Sparsity threshold:0.25
#> Density window:1.72-1.75
#> converting counts to integer mode
#> Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
#> design formula are characters, converting to factors
#> Sparsity threshold:0.3
#> Density window:1.72-1.75
#> converting counts to integer mode
#> Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
#> design formula are characters, converting to factors
#> Sparsity threshold:0
#> Density window:1.73-1.76
#> converting counts to integer mode
#> Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
#> design formula are characters, converting to factors
#> Sparsity threshold:0.05
#> Density window:1.73-1.76
#> converting counts to integer mode
#> Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
#> design formula are characters, converting to factors
#> Sparsity threshold:0.1
#> Density window:1.73-1.76
#> converting counts to integer mode
#> Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
#> design formula are characters, converting to factors
#> Sparsity threshold:0.15
#> Density window:1.73-1.76
#> converting counts to integer mode
#> Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
#> design formula are characters, converting to factors
#> Sparsity threshold:0.2
#> Density window:1.73-1.76
#> converting counts to integer mode
#> Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
#> design formula are characters, converting to factors
#> Sparsity threshold:0.25
#> Density window:1.73-1.76
#> converting counts to integer mode
#> Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
#> design formula are characters, converting to factors
#> Sparsity threshold:0.3
#> Density window:1.73-1.76
#> converting counts to integer mode
#> Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
#> design formula are characters, converting to factors
#> Sparsity threshold with the most rejected hypotheses:0

mw.hr.sip = mw.hr.sip %>%
  dplyr::mutate(incorp = padj < padj_cutoff)

Compare number of incorporators

#Get incorporator info
qSIP_incorp = df_atomX_boot %>%
  dplyr::select(OTU, classification, A, A_sd, Incorporator_qSIP) %>%
  dplyr::filter(Incorporator_qSIP == TRUE) %>%
  dplyr::select(-classification)
n_qSIP_incorp = nrow(qSIP_incorp)

delbd_incorp = df_atomX_boot %>%
  dplyr::select(OTU, classification, A_delbd, A_delbd_sd, Incorporator_delbd) %>%
  dplyr::filter(Incorporator_delbd == TRUE) %>%
  dplyr::select(-classification)
n_delbd_incorp = nrow(delbd_incorp)

mw.hr.sip_incorp = mw.hr.sip %>%
  dplyr::select(OTU, taxa, incorp) %>%
  dplyr::filter(incorp == TRUE) %>%
  dplyr::rename("Incorporator_mw_hr.sip" = "incorp") %>%
  dplyr::select(-taxa)
n_mw.hr.sip_incorp = nrow(mw.hr.sip_incorp)

all_incorp_tibble = dplyr::full_join(qSIP_incorp, dplyr::full_join(delbd_incorp, mw.hr.sip_incorp, by = "OTU"), by = "OTU")

#Print incorporator information
cat('Number of incorporators detected by qSIP:', n_qSIP_incorp, '\n')
#> Number of incorporators detected by qSIP: 25
cat('Number of incorporators detected by ΔBD:', n_delbd_incorp, '\n')
#> Number of incorporators detected by ΔBD: 25
cat('Number of incorporators detected by MW-HR-SIP:', n_mw.hr.sip_incorp, '\n')
#> Number of incorporators detected by MW-HR-SIP: 6

rmarkdown::paged_table(all_incorp_tibble)

It is evident here that the AFE estimates using qSIP model or the ΔBD method are not the same, however, both provide the same incorporator information. Although the incorporators may not be the same everytime, we found an overlap between qSIP and ΔBD methods in our metagenome datasets. We also found that qSIP model provided more accurate estimates of AFE compared to ΔBD method. It also appears that MW-HR-SIP outputs lower number of incorporators. In our study and by Youngblut et al., MW-HR-SIP was found to have better sensitivity compared to qSIP or ΔBD methods. It is interesting that the AFE based methods and differential abundance based methods do not provide the same incorporator identity, which is perhaps due to comparistive analysis in only the multiple overlapping windows in the MW-HR-SIP method, while the AFE methods examine the complete BD gradient. Due to the higher sensitivity of the MW-HR-SIP method, we attempted a sequential analysis with first MW-HR-SIP to eliminate likely false positives, and then perform qSIP to estimate AFE. Such a method reduces multiple comparisons and increases confidence in detecting true incorporators due to higher statistical power. We are merely providing the different methods for the user to decide which method to apply for detecting incorporators in their study. The choice of the method is to the user’s discretion that fits best their study. ## Session information

sessionInfo()
#> R version 4.2.2 (2022-10-31)
#> Platform: aarch64-apple-darwin20 (64-bit)
#> Running under: macOS Monterey 12.0
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib
#> 
#> locale:
#> [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] SIPmg_1.4.1     ggpubr_0.6.0    ggplot2_3.4.1   HTSSIP_1.4.1   
#> [5] phyloseq_1.42.0
#> 
#> loaded via a namespace (and not attached):
#>   [1] colorspace_2.1-0            ggsignif_0.6.4             
#>   [3] ellipsis_0.3.2              XVector_0.38.0             
#>   [5] fftwtools_0.9-11            GenomicRanges_1.50.2       
#>   [7] rstudioapi_0.14             farver_2.1.1               
#>   [9] bit64_4.0.5                 AnnotationDbi_1.60.0       
#>  [11] fansi_1.0.4                 codetools_0.2-19           
#>  [13] splines_4.2.2               cachem_1.0.6               
#>  [15] geneplotter_1.76.0          knitr_1.42                 
#>  [17] ade4_1.7-22                 polynom_1.4-1              
#>  [19] jsonlite_1.8.4              broom_1.0.3                
#>  [21] annotate_1.76.0             cluster_2.1.4              
#>  [23] png_0.1-8                   readr_2.1.4                
#>  [25] compiler_4.2.2              httr_1.4.4                 
#>  [27] backports_1.4.1             lazyeval_0.2.2             
#>  [29] Matrix_1.5-3                fastmap_1.1.0              
#>  [31] cli_3.6.0                   htmltools_0.5.4            
#>  [33] tools_4.2.2                 igraph_1.4.0               
#>  [35] gtable_0.3.1                glue_1.6.2                 
#>  [37] GenomeInfoDbData_1.2.9      reshape2_1.4.4             
#>  [39] dplyr_1.1.0                 Rcpp_1.0.10                
#>  [41] EBImage_4.40.0              carData_3.0-5              
#>  [43] Biobase_2.58.0              jquerylib_0.1.4            
#>  [45] vctrs_0.5.2                 Biostrings_2.66.0          
#>  [47] rhdf5filters_1.10.0         multtest_2.54.0            
#>  [49] ape_5.7                     nlme_3.1-162               
#>  [51] iterators_1.0.14            xfun_0.37                  
#>  [53] stringr_1.5.0               lifecycle_1.0.3            
#>  [55] rstatix_0.7.2               XML_3.99-0.13              
#>  [57] zlibbioc_1.44.0             MASS_7.3-58.2              
#>  [59] scales_1.2.1                vroom_1.6.1                
#>  [61] ragg_1.2.5                  hms_1.1.2                  
#>  [63] MatrixGenerics_1.10.0       parallel_4.2.2             
#>  [65] SummarizedExperiment_1.28.0 biomformat_1.26.0          
#>  [67] rhdf5_2.42.0                RColorBrewer_1.1-3         
#>  [69] yaml_2.3.7                  memoise_2.0.1              
#>  [71] sass_0.4.5                  stringi_1.7.12             
#>  [73] RSQLite_2.3.0               highr_0.10                 
#>  [75] S4Vectors_0.36.1            foreach_1.5.2              
#>  [77] permute_0.9-7               tiff_0.1-11                
#>  [79] BiocGenerics_0.44.0         BiocParallel_1.32.5        
#>  [81] GenomeInfoDb_1.34.9         systemfonts_1.0.4          
#>  [83] rlang_1.0.6                 pkgconfig_2.0.3            
#>  [85] matrixStats_0.63.0          bitops_1.0-7               
#>  [87] evaluate_0.20               lattice_0.20-45            
#>  [89] purrr_1.0.1                 Rhdf5lib_1.20.0            
#>  [91] htmlwidgets_1.6.1           labeling_0.4.2             
#>  [93] bit_4.0.5                   tidyselect_1.2.0           
#>  [95] plyr_1.8.8                  magrittr_2.0.3             
#>  [97] DESeq2_1.38.3               R6_2.5.1                   
#>  [99] IRanges_2.32.0              generics_0.1.3             
#> [101] DelayedArray_0.24.0         DBI_1.1.3                  
#> [103] pillar_1.8.1                withr_2.5.0                
#> [105] mgcv_1.8-41                 survival_3.5-3             
#> [107] KEGGREST_1.38.0             abind_1.4-5                
#> [109] RCurl_1.98-1.10             tibble_3.1.8               
#> [111] crayon_1.5.2                car_3.1-1                  
#> [113] utf8_1.2.3                  tzdb_0.3.0                 
#> [115] rmarkdown_2.20              jpeg_0.1-10                
#> [117] locfit_1.5-9.7              grid_4.2.2                 
#> [119] data.table_1.14.8           blob_1.2.3                 
#> [121] vegan_2.6-4                 digest_0.6.31              
#> [123] xtable_1.8-4                tidyr_1.3.0                
#> [125] textshaping_0.3.6           stats4_4.2.2               
#> [127] munsell_0.5.0               bslib_0.4.2