--- title: "Vignette: Bayesian Kernel Machine Regression (BKMR) for Metal Exposure Analysis with Dynamic Threshold Function" author: "Kazi Tanvir Hasan and Dr. Gabriel Odom" output: rmarkdown::html_vignette: toc: true toc_depth: 2 vignette: > %\VignetteIndexEntry{Calculate PIP Threshold from Response Vector} %\VignetteEngine{quarto::html} %\VignetteLanguage{en} --- # Introduction In this vignette, we demonstrate how to use the bkmr package in R to analyze the relationship between metal exposures (e.g., Cadmium, Mercury, Arsenic, Lead, and Manganese) and a response variable (e.g., IQ score or QI). Additionally, we will compute a dynamic threshold for model inclusion based on the coefficient of variation and the sample size. ## 1. Required Libraries Before starting, ensure the following libraries are installed and loaded: ```{r} # Load the necessary libraries library(bkmr) library(simBKMRdata) library(tidyverse) ``` ## 2. Data Preprocessing First, we load the data, clean it, and perform any necessary transformations. In this case, we take the log-transformation of the metal concentrations to ensure they are on a comparable scale. ```{r} # Set the seed for reproducibility set.seed(2025) # Load the dataset and preprocess data("metalExposChildren_df") bkmrAnalysisData_df <- metalExposChildren_df %>% select(QI, Cadmium:Manganese) %>% # Selecting relevant columns na.omit() %>% # Remove rows with missing values mutate_at( vars(Cadmium:Manganese), ~ log10(. + 1) %>% as.vector ) # Log-transform metal concentrations ``` Here, `mutate_at` is used to log-transform the exposure variables (Cadmium to Manganese) by applying `log10(. + 1)`. ## 3. Exploratory Data Analysis (EDA) Before fitting the model, we perform some exploratory data analysis. We visualize the distribution of metal concentrations using histograms and density plots. ```{r} # Create a histogram with density plot for each metal metalDataPlot <- bkmrAnalysisData_df %>% select(Cadmium:Manganese) %>% pivot_longer(cols = everything(), names_to = "Metal", values_to = "Value") %>% ggplot(aes(x = Value, fill = Metal)) + geom_histogram(aes(y = ..density..), bins = 30, alpha = 0.5, position = "identity") + geom_density(aes(color = Metal), linewidth = 1) + facet_wrap(~Metal, scales = "free") + theme_minimal() + labs(title = "Histogram with Density Plot for Metals", x = "Concentration", y = "Density") + theme(legend.position = "none") # Display the plot metalDataPlot ``` This plot provides a visual understanding of how the metal concentrations are distributed. ## 4. Model Setup We now set up the Bayesian Kernel Machine Regression (BKMR) model by creating a response variable (QI) and the exposure matrix, which includes all the metal concentrations. ```{r} # Extract the response variable (IQ score or QI) iq <- bkmrAnalysisData_df %>% pull(QI) %>% na.omit() # Convert exposure variables (metals) to a matrix for modeling expos <- data.matrix(bkmrAnalysisData_df %>% select(Cadmium:Manganese)) ``` ## 5. Generating Knot Points for the Model To fit the Bayesian Kernel Machine Regression model, we generate knot points using the cover.design function. These points will be used for the MCMC model. ```{r} # Generate knot points using a cover design for Bayesian modeling knots50 <- fields::cover.design(expos, nd = 50)$design ``` Here, we specify 50 knots for the cover design. These knot points help in fitting the non-linear kernel in BKMR. ## 6. Fitting the BKMR Model We now fit the BKMR model using the kmbayes() function. This function performs Bayesian regression using MCMC (Markov Chain Monte Carlo) to estimate the relationship between the exposures and the response variable. ```{r} #| label: run-BKMR-MCMC # Fit the BKMR model using MCMC modelFit <- kmbayes( y = iq, # Response variable Z = expos, # Exposure matrix (metal concentrations) X = NULL, # No additional covariates iter = 1000, # Number of MCMC iterations family = "gaussian", # Gaussian response verbose = TRUE, # Output progress for each iteration varsel = TRUE, # Perform variable selection knots = knots50 # Knot points generated earlier ) ``` ## 7. Extracting Results After the model is fitted, we extract the posterior inclusion probabilities (PIPs) to determine which exposures are most important in predicting the response variable. ```{r} # Extract posterior inclusion probabilities (PIPs) and sort them pipFit <- ExtractPIPs(modelFit) %>% arrange(desc(PIP)) # Display the PIPs pipFit ``` The `ExtractPIPs()` function returns the posterior inclusion probabilities, which are then sorted in descending order to identify the most important exposures. 8. Dynamic Threshold Calculation Next, we calculate the dynamic threshold for model inclusion. This threshold is based on the coefficient of variation (CV) of the response variable (QI) and the sample size. ```{r} # Calculate the dynamic threshold for model inclusion pipThresh_fn <- calculate_pip_threshold( absCV = sd(bkmrAnalysisData_df$QI) / mean(bkmrAnalysisData_df$QI), # Coefficient of variation sampSize = length(bkmrAnalysisData_df$QI) # Sample size ) # Display the threshold pipThresh_fn ``` This function computes the threshold for the PIPs using the coefficient of variation of the response variable and the sample size. ## 9. Interpretation of Results Finally, we compare the PIPs with the threshold to identify the variables (metals) that have a significant effect on the response variable. ```{r} # Identify exposures with PIPs greater than the threshold significant_exposures <- pipFit %>% filter(PIP > pipThresh_fn) # Display significant exposures significant_exposures ``` This step filters the exposures whose posterior inclusion probabilities exceed the dynamic threshold, indicating that these variables are significant predictors of the outcome. # Conclusion This vignette demonstrates the complete workflow for using Bayesian Kernel Machine Regression (BKMR) to analyze the relationship between various metal exposures and an outcome of interest (e.g., IQ score). Additionally, we computed a dynamic threshold for model inclusion, which adjusts the threshold based on the coefficient of variation and the sample size.