Using ipolygrowth for bacterial growth estimation

Introduction

The ipolygrowth package calculates bacterial growth curve parameters using 4th degree polynomial functions. Polynomial growth curves are estimated from time series data from a single biological sample or multiple samples. Technical replicates within biological samples are allowed.

In this vignette, we will demonstrate examples with both single and multiple samples as input data. To use this package, we need to start with installing and loading the ipolygrowth package.

# install.packages(c("ipolygrowth", "dplyr", "ggplot2"))
library(ipolygrowth)
library(dplyr)
library(ggplot2)
library(kableExtra) # for RMarkdown table output

Expected outputs are shown in the rendered version of the [vignette] (https://CRAN.R-project.org/package=ipolygrowth) on CRAN.

Example data

We will use the bacterial growth data from the R package growthrates for demonstration. This data contains growth data of 3 bacteria strains and 12 antibiotics tetracycline concentration levels. Each strain-concentration combination has 2 replicates. In each replicate, the growth of bacteria is measured 31 times (i.e. 31 time points in the time series). For our demonstration, we consider each strain-concentration combination as a sample (i.e. a biological replicate) and each replicate as a technical replicate.

For more details about this data, see https://CRAN.R-project.org/package=growthrates.

The bactgrowth.txt can be loaded from the growthrates package directly or be downloaded from here and read in via read.table(). Let’s load the data and check its structure.

if (!"growthrates" %in% installed.packages()) {install.packages("growthrates")}
df.gr <- growthrates::bactgrowth
# Download the `bactgrowth.txt` to the directory of your script. Then run:
# data <- read.table("bactgrowth.txt", header = TRUE) %>%
  # mutate(strain = factor(strain, levels = c("D", "R", "T")))

str(df.gr)
#> 'data.frame':    2232 obs. of  5 variables:
#>  $ strain   : Factor w/ 3 levels "D","R","T": 3 3 3 3 3 3 3 3 3 3 ...
#>  $ replicate: int  2 2 2 2 2 2 2 2 2 2 ...
#>  $ conc     : num  0 0 0 0 0 0 0 0 0 0 ...
#>  $ time     : int  0 1 2 3 4 5 6 7 8 9 ...
#>  $ value    : num  0.013 0.014 0.017 0.022 0.03 0.039 0.042 0.045 0.048 0.049 ...

We can also take a look at the first few rows of the data.

head(df.gr)
#>   strain replicate conc time value
#> 1      T         2    0    0 0.013
#> 2      T         2    0    1 0.014
#> 3      T         2    0    2 0.017
#> 4      T         2    0    3 0.022
#> 5      T         2    0    4 0.030
#> 6      T         2    0    5 0.039

Let’s plot the data.

ggplot()+
  geom_point(data = df.gr, aes(x = time, y = value, color = factor(replicate)))+
  facet_wrap(~ strain + conc)+
  labs(color = "replicate")+
  theme_bw()

Requirement of data structure for ipolygrowth

The ipolygrowth functions require the input data to be in long format with a time variable and a dependent variable (y) as the measure of growth in the input data. Both variables need to be numeric. Time points in the time series can be any length (uneven and different spacing allowed, even between technical replicates), as long as the time scale is measured uniformly (same units used) across all samples.

When you have a single sample

We will use the ipg_singlesample() function to calculate growth curve parameters for one sample. Since the growthrates data contains 36 strain-concentration combinations, we will use data from donor (D) with concentration 15.63, to represent two replicates from a single sample. We will take out the variables strain and conc from the data frame to avoid confusion.

df.singlesample <- df.gr %>% filter(strain == "D", conc == 15.63) %>% select(-strain, -conc)
str(df.singlesample)
#> 'data.frame':    62 obs. of  3 variables:
#>  $ replicate: int  2 2 2 2 2 2 2 2 2 2 ...
#>  $ time     : int  0 1 2 3 4 5 6 7 8 9 ...
#>  $ value    : num  0.008 0.008 0.009 0.011 0.014 0.027 0.028 0.037 0.046 0.052 ...
table(df.singlesample$replicate, df.singlesample$time)
#>    
#>     0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
#>   1 1 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
#>   2 1 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
#>    
#>     28 29 30
#>   1  1  1  1
#>   2  1  1  1

Now we specify the data frame name, time variable, and y variable in the ipg_singlesample() function and save the results. The output of ipg_singlesample() contains a table of calculated growth curve parameters, the polynomial model deriving the calculated parameters, a table of \(\beta\) coefficients, and a table of fitted values with time. All components can be viewed by calling from the list output. Here, we show the table of calculated growth curve parameters.

out.singlesample <- ipg_singlesample(data = df.singlesample, time.name = "time", y.name = "value")
#> max y time is equal to the largest value of "time"

out.singlesample$estimates %>%
  kable %>% kable_styling("striped", full_width = F)  # table formatting for rmarkdown

peak growth rate	peak growth time	doubling time	lag time	max y	max y time
0.0064197	7.246653	107.972	1.515281	0.1149354	30

We can use the fitted values and the original data to plot our results.

ggplot()+
  geom_point(data = df.singlesample, aes(x = time, y = value, color = factor(replicate)))+ 
  geom_line(data = out.singlesample$fitted, aes(x = time, y = fit))+ 
  labs(color = "replicate")+
  scale_x_continuous(n.breaks = 10)+
  scale_y_continuous(n.breaks = 7)+
  theme_bw()

Altering epsilon for the maximum in y

Notice the printed message “max y time is equal to the largest value of”time”” after ipg_singlesample() is called. This message appears when max y time is equal to the largest observed time point in the sample and when the search algorithm to identify max y time did not converge. Usually, this indicates the growth curve did not reach an asymptote. If this message is printed, we recommend to plot the data from the sample to ensure the calculated max y time is appropriate for your data.

epsilon is a tuning parameter to change the threshold in the calculation of max y time. epsilon must be between 0 and 1, and a small value is recommended as it represents the convergence threshold as a fraction of the range of the dependent variable. The default value is 0.2%. The table below shows how the max y time is affected when epsilon is set to 1%. For this data, a reasonable value of epsilon is 1% as it corresponds to max growth at the end of the first growth phase.

out.singlesample2 <- ipg_singlesample(data = df.singlesample, time.name = "time", y.name = "value", epsilon = 1/100)
out.singlesample2$estimates %>%
  kable %>% kable_styling("striped", full_width = F)  # table formatting for rmarkdown

peak growth rate	peak growth time	doubling time	lag time	max y	max y time
0.0064197	7.246653	107.972	1.515281	0.1020701	23

When you have multiple samples

If you have multiple samples, ipolygrowth can calculate polynomial growth curve parameters for each sample. An ID variable must be included in the input data to uniquely identify each sample. For our demonstration, we will create a new ID variable to represent multiple biological samples.

df.gr2 <- df.gr %>%  mutate(id = paste(strain, conc, sep = "-"))
str(df.gr2)
#> 'data.frame':    2232 obs. of  6 variables:
#>  $ strain   : Factor w/ 3 levels "D","R","T": 3 3 3 3 3 3 3 3 3 3 ...
#>  $ replicate: int  2 2 2 2 2 2 2 2 2 2 ...
#>  $ conc     : num  0 0 0 0 0 0 0 0 0 0 ...
#>  $ time     : int  0 1 2 3 4 5 6 7 8 9 ...
#>  $ value    : num  0.013 0.014 0.017 0.022 0.03 0.039 0.042 0.045 0.048 0.049 ...
#>  $ id       : chr  "T-0" "T-0" "T-0" "T-0" ...
table(df.gr2$strain, df.gr2$conc)
#>    
#>      0 0.24 0.49 0.98 1.95 3.91 7.81 15.63 31.25 62.5 125 250
#>   D 62   62   62   62   62   62   62    62    62   62  62  62
#>   R 62   62   62   62   62   62   62    62    62   62  62  62
#>   T 62   62   62   62   62   62   62    62    62   62  62  62
unique(df.gr2$id)
#>  [1] "T-0"     "T-0.24"  "T-0.49"  "T-0.98"  "T-1.95"  "T-3.91"  "T-7.81" 
#>  [8] "T-15.63" "T-31.25" "T-62.5"  "T-125"   "T-250"   "D-0"     "D-0.24" 
#> [15] "D-0.49"  "D-0.98"  "D-1.95"  "D-3.91"  "D-7.81"  "D-15.63" "D-31.25"
#> [22] "D-62.5"  "D-125"   "D-250"   "R-0"     "R-0.24"  "R-0.49"  "R-0.98" 
#> [29] "R-1.95"  "R-3.91"  "R-7.81"  "R-15.63" "R-31.25" "R-62.5"  "R-125"  
#> [36] "R-250"

The input data foripg_multisample() is the same long format data used for ipg_singlesample() with the addition of the ID variable. epsilon is specified as a single value that will be applied across samples. It can also be input as a vector of values, one for each sample, thus allowing different thresholds for each sample.

out.multi.f <- ipg_multisample(data = df.gr2, id = "id", time.name = "time", y.name = "value", epsilon = 0.2/100)
#> max y time is equal to the largest value of "time" in 31 samples.
out.multi.f$estimates %>%
  kable() %>% kable_styling("striped", full_width = F) %>% scroll_box(width = "800px", height = "300px")  # table formatting for rmarkdown

id	peak growth rate	peak growth time	doubling time	lag time	max y	max y time
D-0	0.0054743	3.636922	126.61846	0.1231345	0.1073791	30
D-0.24	0.0047571	3.564325	145.70717	0.1410942	0.0765797	21
D-0.49	0.0051573	2.613987	134.40205	0.0384437	0.1027946	30
D-0.98	0.0054686	3.170291	126.75098	0.0728170	0.0962677	30
D-1.95	0.0055555	3.241725	124.76779	0.0779164	0.0961664	30
D-125	0.0010447	8.343843	663.46846	1.7678931	0.0312341	30
D-15.63	0.0064197	7.246653	107.97198	1.5152814	0.1149354	30
D-250	0.0011268	0.000000	615.12569	0.0000000	0.0426043	30
D-3.91	0.0056744	4.503602	122.15256	0.2438311	0.1014891	30
D-31.25	0.0084764	11.111193	81.77329	6.7640880	0.1395516	30
D-62.5	0.0053191	24.468397	130.31169	13.6297131	0.0906272	30
D-7.81	0.0058283	4.077346	118.92788	0.1753184	0.1074358	30
R-0	0.0053276	2.465975	130.10494	0.0481645	0.0667463	17
R-0.24	0.0013591	17.532117	510.01794	5.5878873	0.0456137	30
R-0.49	0.0026599	14.471380	260.59538	4.8918821	0.0732316	30
R-0.98	0.0018747	17.529965	369.73801	6.0976827	0.0486636	30
R-1.95	0.0015789	18.956604	438.99250	5.1036184	0.0542633	30
R-125	0.0007020	6.825708	987.42540	0.9884225	0.0212105	30
R-15.63	0.0015305	10.817963	452.89623	4.8301017	0.0416298	30
R-250	0.0008737	0.000000	793.31947	0.0000000	0.0363853	30
R-3.91	0.0012402	14.589281	558.89347	4.8800780	0.0415794	30
R-31.25	0.0009972	8.036184	695.10246	2.5311519	0.0251161	30
R-62.5	0.0013886	9.196834	499.15604	2.6981003	0.0344943	30
R-7.81	0.0013439	10.387510	515.76271	4.0522071	0.0373071	30
T-0	0.0065034	0.000000	106.58301	0.0000000	0.0701819	30
T-0.24	0.0074865	0.000000	92.58581	0.0000000	0.0668370	30
T-0.49	0.0072152	0.000000	96.06805	0.0000000	0.0874455	30
T-0.98	0.0056525	0.000000	122.62676	0.0000000	0.0886480	30
T-1.95	0.0068241	0.000000	101.57321	0.0000000	0.1030483	30
T-125	0.0008745	0.000000	792.60224	0.0000000	0.0176879	19
T-15.63	0.0049651	7.690180	139.60458	2.4044653	0.0852344	30
T-250	0.0012768	0.000000	542.88605	0.0000000	0.0407266	30
T-3.91	0.0062803	0.000000	110.36914	0.0000000	0.0956584	30
T-31.25	0.0074080	18.546689	93.56768	10.8612979	0.1182624	28
T-62.5	0.0010513	4.495226	659.34379	0.5534449	0.0206101	16
T-7.81	0.0056621	0.000000	122.41853	0.0000000	0.1005954	30

We can use the same method to plot the fitted growth curves.

ggplot()+
  geom_point(data = df.gr2, aes(x = time, y = value, color = factor(replicate)))+ 
  geom_line(data = out.multi.f$fitted, aes(x = time, y = fit))+ 
  facet_wrap(~ id)+
  labs(color = "replicate")+
  theme_bw()