--- title: "Analyzing Microbial Social Behavior with bsocialv2" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Analyzing Microbial Social Behavior with bsocialv2} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 5 ) ``` ## Introduction The **bsocialv2** package provides an S4 class and methods for analyzing microbial social behavior in bacterial consortia. Starting from growth curve data, the package classifies strains as cooperators, cheaters, or neutrals based on their effect on consortium fitness. It also quantifies diversity effects, consortium stability, and finds optimal biofilm assembly sequences. The analysis pipeline supports two workflows: - **Curated workflow** -- start from pre-processed growth parameters (LogPhase, NGen, GR) that were computed externally. - **Raw workflow** -- start from plate reader CSVs, preprocess the curves, then extract growth parameters automatically. This vignette demonstrates the curated workflow end-to-end and gives a brief overview of the raw workflow. ## Curated Workflow ### Loading example data The package ships with example data in `inst/extdata/`. Two files are needed for the curated workflow: - `consortia.csv` -- a presence/absence matrix indicating which strains are in each consortium. - `curated_MTBE.csv` -- pre-processed growth parameters (Consortia, LogPhase, NGen, GR). ```{r load-data} library(bsocialv2) # Read the consortia presence/absence matrix consortia <- read.csv( system.file("extdata", "consortia.csv", package = "bsocialv2") ) if (!"Consortia" %in% colnames(consortia)) colnames(consortia)[1] <- "Consortia" # Read the curated (pre-processed) growth parameters curated <- read.csv( system.file("extdata", "curated_MTBE.csv", package = "bsocialv2") ) if (!"Consortia" %in% colnames(curated)) colnames(curated)[1] <- "Consortia" head(consortia) head(curated) ``` ### Creating the bsocial object Create a new `bsocial` object, set the project identifier, define the strain names, and populate the raw-data slots: ```{r create-object} obj <- new("bsocial") obj@id_proyecto <- "MTBE_example" # Strain names are all columns in consortia except "Consortia" obj@cepas_seleccionadas <- setdiff(colnames(consortia), "Consortia") # Populate the raw-data list obj@datos_crudos <- list( consortia = consortia, curated = curated ) ``` ### Step 1: Transform curated data `transform_curated_data()` joins the curated growth parameters with the consortia presence/absence matrix and stores the result in `datos_procesados`: ```{r transform-curated} obj <- transform_curated_data(obj) head(obj@datos_procesados) ``` ### Step 2: Analyze growth `analyze_growth()` creates a scatter plot of LogPhase vs NGen colored by consortium diversity (number of strains), and generates top-10 tables ranked by NGen and GR: ```{r analyze-growth} obj <- analyze_growth(obj) # View the scatter plot obj@graficos$growth_scatter # Top 10 consortia by number of generations obj@resultados_analisis$best_10_ngen ``` ### Step 3: Analyze social behavior `analyze_social_behavior()` compares each strain's fitness in consortia versus its monoculture baseline. It produces boxplots showing relative fitness split by whether each strain is present or absent: ```{r analyze-social} obj <- analyze_social_behavior(obj) # Fitness boxplots (NGen and GR) obj@resultados_analisis$social_behavior$social_generations_plot obj@resultados_analisis$social_behavior$social_gr_plot ``` ### Step 4: Classify strains `summarize_social_behavior()` uses pairwise t-tests and median comparisons to classify each strain as a cooperator (positive), cheater (negative), or neutral: ```{r summarize-social} obj <- summarize_social_behavior(obj) # Classification based on number of generations obj@resultados_analisis$summary_gen # Classification based on growth rate obj@resultados_analisis$summary_gr ``` ### Step 5: Analyze diversity effects `analyze_diversity()` examines the relationship between consortium diversity (number of strains) and fitness. It produces boxplots of relative fitness across diversity levels and a "top-k" analysis: ```{r analyze-diversity} obj <- analyze_diversity(obj) # Diversity effect on fitness (NGen) obj@graficos$diversity_gen_plot # Diversity effect on fitness (GR) obj@graficos$diversity_gr_plot ``` ### Step 6: Analyze stability `analyze_stability()` calculates how consistent growth metrics are across diversity levels. For curated data it groups consortia by diversity and produces violin plots: ```{r analyze-stability} obj <- analyze_stability(obj) # Stability violin plots obj@graficos$stability_ngen_plot obj@graficos$stability_gr_plot ``` ### Step 7: Find biofilm assembly sequences `analyze_biofilm_sequence()` builds a directed graph of consortium assembly paths. It finds shortest paths from monocultures to the full consortium, based on increasing diversity and fitness: ```{r analyze-biofilm} obj <- analyze_biofilm_sequence(obj) # The paths are stored as named lists str(obj@resultados_analisis$biofilm_gen_paths, max.level = 2) # Plot the assembly graph (base R plot) obj@graficos$biofilm_gen_plot_func() ``` ## Raw Workflow (Overview) When working with raw plate reader data the workflow starts with CSV files exported from the plate reader. Each plate CSV contains OD readings over time. The six example plates are included in the package (`plate1.csv` through `plate6.csv`). The following code is **not executed** here because it requires raw plate files with specific formatting. It illustrates the additional preprocessing steps. ```{r raw-workflow, eval = FALSE} library(bsocialv2) library(readr) obj <- new("bsocial") obj@id_proyecto <- "raw_example" # Read consortia matrix consortia <- read.csv( system.file("extdata", "consortia.csv", package = "bsocialv2") ) if (!"Consortia" %in% colnames(consortia)) colnames(consortia)[1] <- "Consortia" obj@cepas_seleccionadas <- setdiff(colnames(consortia), "Consortia") # Read plate CSVs plate_files <- paste0("plate", 1:6, ".csv") plates <- lapply(plate_files, function(f) { read_csv(system.file("extdata", f, package = "bsocialv2"), show_col_types = FALSE) }) obj@datos_crudos <- list( consortia = consortia, plates = plates, type = "raw" ) # Step 1: Preprocess raw data (background correction + replicate aggregation) # groups - integer vector assigning plates to replicate groups # bg_type - "blank" (subtract blank well) or "threshold" (subtract OD value) # bg_param - blank well ID (for "blank") or numeric threshold (for "threshold") obj <- transform_raw_data(obj, groups = c(1, 1, 2, 2, 3, 3), bg_type = "blank", bg_param = "BLANK" ) # Step 2: Calculate growth parameters obj <- calculate_growth_params(obj, method = "growthcurver") # Optional: visualize processed curves obj <- plot_processed_curves(obj) obj@graficos$processed_curves_plot # Step 3 onwards: same pipeline as curated workflow obj <- analyze_growth(obj) obj <- analyze_social_behavior(obj) obj <- summarize_social_behavior(obj) obj <- analyze_diversity(obj) obj <- analyze_stability(obj) obj <- analyze_biofilm_sequence(obj) ``` ## Accessing Results After running the full pipeline, results are stored in the `bsocial` object's slots. Here is a summary of key outputs: | Slot | Key | Description | |------|-----|-------------| | `datos_procesados` | (data frame) | Merged consortia + growth parameters | | `resultados_analisis` | `best_10_ngen` | Top 10 consortia by NGen | | `resultados_analisis` | `best_10_gr` | Top 10 consortia by GR | | `resultados_analisis` | `social_behavior` | Fitness comparison data and plots | | `resultados_analisis` | `summary_gen` | Strain classification (NGen) | | `resultados_analisis` | `summary_gr` | Strain classification (GR) | | `resultados_analisis` | `diversity_gen_table` | Diversity effect matrix (NGen) | | `resultados_analisis` | `diversity_gr_table` | Diversity effect matrix (GR) | | `resultados_analisis` | `stability_cv_data` | Stability CV data | | `resultados_analisis` | `biofilm_gen_paths` | Assembly paths (NGen) | | `resultados_analisis` | `biofilm_gr_paths` | Assembly paths (GR) | | `graficos` | `growth_scatter` | LogPhase vs NGen scatter (ggplot) | | `graficos` | `diversity_gen_plot` | Diversity boxplot, NGen (ggplot) | | `graficos` | `diversity_gr_plot` | Diversity boxplot, GR (ggplot) | | `graficos` | `stability_ngen_plot` | Stability violin, NGen (ggplot) | | `graficos` | `stability_gr_plot` | Stability violin, GR (ggplot) | | `graficos` | `biofilm_gen_plot_func` | Assembly graph plot function (NGen) | | `graficos` | `biofilm_gr_plot_func` | Assembly graph plot function (GR) |