# Motivation

This vignette introduces the R package BayesGmed, a package for Bayesian causal mediation analysis implemented in Stan, a probabilistic programming language for Bayesian inference. BayesGmed uses a parametric approach to specify the outcome and mediator models, and the g-formula approach for estimation. In addition to estimating causal mediation effects, the package allows researchers to conduct sensitivity analyses.

# Getting started

You can install the package from GitHub via devtools:

```r
devtools::install_github("belayb/BayesGmed")
```

# Example

BayesGmed comes with an example data set, `example_data`, which contains a binary outcome $Y$, a single binary mediator $M$, a binary exposure $A$, and two numeric confounding variables $Z = (Z_1, Z_2)$. The first six rows of the data are shown below.

```r
library(BayesGmed)
head(example_data)
#>   Z1 Z2 A M Y
#> 1  0  1 1 1 0
#> 2  1  0 0 1 0
#> 3  0  0 0 0 0
#> 4  1  1 1 0 0
#> 5  0  1 1 0 0
#> 6  1  0 1 1 0
```

The aim in this example is to estimate the direct and indirect effect of $A$ on $Y$ adjusting for $Z$. To do this, we specify logistic regression models for the outcome and the mediator:

$$\mathrm{logit}\left( P(Y_{i} = 1 \mid A_{i}, M_{i}, \mathbf{Z}_{i}) \right) = \alpha_{0} + \boldsymbol{\alpha}_{Z}^{'}\mathbf{Z}_{i} + \alpha_{A}A_{i} + \alpha_{M}M_{i},$$

$$\mathrm{logit}\left( P(M_{i} = 1 \mid A_{i}, \mathbf{Z}_{i}) \right) = \beta_{0} + \boldsymbol{\beta}_{Z}^{'}\mathbf{Z}_{i} + \beta_{A}A_{i}.$$
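As a quick frequentist sanity check, the same two model structures can be fitted with base R's `glm`. The sketch below uses simulated data with the layout of `example_data` (the coefficient values in the simulation are illustrative, not taken from the package):

```r
# Simulate data with the structure Z1, Z2, A, M, Y (all binary here)
set.seed(1)
n  <- 200
Z1 <- rbinom(n, 1, 0.5)
Z2 <- rbinom(n, 1, 0.5)
A  <- rbinom(n, 1, 0.5)
M  <- rbinom(n, 1, plogis(-0.5 + 0.3 * Z1 + 0.3 * Z2 + 0.8 * A))
Y  <- rbinom(n, 1, plogis(-1.0 + 0.3 * Z1 + 0.3 * Z2 + 0.7 * A + 0.9 * M))
dat <- data.frame(Z1, Z2, A, M, Y)

# Outcome model: logit P(Y = 1 | A, M, Z)
fit_y <- glm(Y ~ Z1 + Z2 + A + M, family = binomial(link = "logit"), data = dat)
# Mediator model: logit P(M = 1 | A, Z)
fit_m <- glm(M ~ Z1 + Z2 + A, family = binomial(link = "logit"), data = dat)

coef(fit_y)  # alpha_0, alpha_Z1, alpha_Z2, alpha_A, alpha_M
coef(fit_m)  # beta_0, beta_Z1, beta_Z2, beta_A
```

These maximum-likelihood fits correspond to the two regression models above; BayesGmed instead places priors on the coefficients, as described next.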

To complete the model specification, we assume the following priors for the model parameters.

$$\begin{aligned} \boldsymbol{\beta} &\sim MVN(\text{location}_m, \text{scale}_m)\\ \boldsymbol{\alpha} &\sim MVN(\text{location}_y, \text{scale}_y) \end{aligned}$$

We then need to specify the parameters of the prior distributions or assume a hyper-prior. BayesGmed currently allows only fixed values. These can be passed as part of the model call; if none are supplied, the following default values are used:

```r
P <- 3  # number of covariates plus the intercept term
priors <- list(scale_m = 2.5 * diag(P + 1),
               scale_y = 2.5 * diag(P + 2),
               location_m = rep(0, P + 1),
               location_y = rep(0, P + 2))
```
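To override the defaults, the list can be supplied in the model call. The sketch below assumes the argument is named `priors`, matching the list above; here the prior scales are tightened from 2.5 to 1.0 as an illustration:

```r
# Custom priors with tighter scales (illustrative values)
P <- 3  # number of covariates plus the intercept term
priors <- list(scale_m = 1.0 * diag(P + 1),
               scale_y = 1.0 * diag(P + 2),
               location_m = rep(0, P + 1),
               location_y = rep(0, P + 2))

fit_prior <- bayesgmed(outcome = "Y", mediator = "M", treat = "A",
                       covariates = c("Z1", "Z2"),
                       dist.y = "binary", dist.m = "binary",
                       link.y = "logit", link.m = "logit",
                       priors = priors, data = example_data)
```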

The model can then be fitted as follows:

```r
fit1 <- bayesgmed(outcome = "Y", mediator = "M", treat = "A",
                  covariates = c("Z1", "Z2"),
                  dist.y = "binary", dist.m = "binary",
                  link.y = "logit", link.m = "logit",
                  data = example_data)
#>
#> SAMPLING FOR MODEL 'BY_BM_single' NOW (CHAIN 1).
#> Chain 1:
#> Chain 1: Gradient evaluation took 0.00014 seconds
#> Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 1.4 seconds.
#> Chain 1:
#> Chain 1:
#> Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
#> Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
#> Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
#> Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
#> Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
#> Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
#> Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
#> Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
#> Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
#> Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
#> Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
#> Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
#> Chain 1:
#> Chain 1:  Elapsed Time: 0.968 seconds (Warm-up)
#> Chain 1:                0.953 seconds (Sampling)
#> Chain 1:                1.921 seconds (Total)
#> Chain 1:
#>
#> SAMPLING FOR MODEL 'BY_BM_single' NOW (CHAIN 2).
#> Chain 2:
#> Chain 2: Gradient evaluation took 7.2e-05 seconds
#> Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.72 seconds.
#> Chain 2:
#> Chain 2:
#> Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
#> Chain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)
#> Chain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)
#> Chain 2: Iteration:  600 / 2000 [ 30%]  (Warmup)
#> Chain 2: Iteration:  800 / 2000 [ 40%]  (Warmup)
#> Chain 2: Iteration: 1000 / 2000 [ 50%]  (Warmup)
#> Chain 2: Iteration: 1001 / 2000 [ 50%]  (Sampling)
#> Chain 2: Iteration: 1200 / 2000 [ 60%]  (Sampling)
#> Chain 2: Iteration: 1400 / 2000 [ 70%]  (Sampling)
#> Chain 2: Iteration: 1600 / 2000 [ 80%]  (Sampling)
#> Chain 2: Iteration: 1800 / 2000 [ 90%]  (Sampling)
#> Chain 2: Iteration: 2000 / 2000 [100%]  (Sampling)
#> Chain 2:
#> Chain 2:  Elapsed Time: 1.062 seconds (Warm-up)
#> Chain 2:                1.094 seconds (Sampling)
#> Chain 2:                2.156 seconds (Total)
#> Chain 2:
#>
#> SAMPLING FOR MODEL 'BY_BM_single' NOW (CHAIN 3).
#> Chain 3:
#> Chain 3: Gradient evaluation took 5.9e-05 seconds
#> Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.59 seconds.
#> Chain 3:
#> Chain 3:
#> Chain 3: Iteration:    1 / 2000 [  0%]  (Warmup)
#> Chain 3: Iteration:  200 / 2000 [ 10%]  (Warmup)
#> Chain 3: Iteration:  400 / 2000 [ 20%]  (Warmup)
#> Chain 3: Iteration:  600 / 2000 [ 30%]  (Warmup)
#> Chain 3: Iteration:  800 / 2000 [ 40%]  (Warmup)
#> Chain 3: Iteration: 1000 / 2000 [ 50%]  (Warmup)
#> Chain 3: Iteration: 1001 / 2000 [ 50%]  (Sampling)
#> Chain 3: Iteration: 1200 / 2000 [ 60%]  (Sampling)
#> Chain 3: Iteration: 1400 / 2000 [ 70%]  (Sampling)
#> Chain 3: Iteration: 1600 / 2000 [ 80%]  (Sampling)
#> Chain 3: Iteration: 1800 / 2000 [ 90%]  (Sampling)
#> Chain 3: Iteration: 2000 / 2000 [100%]  (Sampling)
#> Chain 3:
#> Chain 3:  Elapsed Time: 1.252 seconds (Warm-up)
#> Chain 3:                1.577 seconds (Sampling)
#> Chain 3:                2.829 seconds (Total)
#> Chain 3:
#>
#> SAMPLING FOR MODEL 'BY_BM_single' NOW (CHAIN 4).
#> Chain 4:
#> Chain 4: Gradient evaluation took 5.7e-05 seconds
#> Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.57 seconds.
#> Chain 4:
#> Chain 4:
#> Chain 4: Iteration:    1 / 2000 [  0%]  (Warmup)
#> Chain 4: Iteration:  200 / 2000 [ 10%]  (Warmup)
#> Chain 4: Iteration:  400 / 2000 [ 20%]  (Warmup)
#> Chain 4: Iteration:  600 / 2000 [ 30%]  (Warmup)
#> Chain 4: Iteration:  800 / 2000 [ 40%]  (Warmup)
#> Chain 4: Iteration: 1000 / 2000 [ 50%]  (Warmup)
#> Chain 4: Iteration: 1001 / 2000 [ 50%]  (Sampling)
#> Chain 4: Iteration: 1200 / 2000 [ 60%]  (Sampling)
#> Chain 4: Iteration: 1400 / 2000 [ 70%]  (Sampling)
#> Chain 4: Iteration: 1600 / 2000 [ 80%]  (Sampling)
#> Chain 4: Iteration: 1800 / 2000 [ 90%]  (Sampling)
#> Chain 4: Iteration: 2000 / 2000 [100%]  (Sampling)
#> Chain 4:
#> Chain 4:  Elapsed Time: 1.154 seconds (Warm-up)
#> Chain 4:                1.349 seconds (Sampling)
#> Chain 4:                2.503 seconds (Total)
#> Chain 4:
```

```r
bayesgmed_summary(fit1)
#>     Parameter  Mean    SE Median   2.5% 97.5% n_eff Rhat
#> 1 NDE_control 0.263 0.069  0.263  0.127 0.397  5292    1
#> 2 NDE_treated 0.286 0.069  0.287  0.150 0.420  5012    1
#> 3 NIE_control 0.042 0.036  0.040 -0.027 0.117  3925    1
#> 4 NIE_treated 0.065 0.048  0.063 -0.027 0.163  4434    1
#> 5        ANDE 0.274 0.064  0.273  0.148 0.398  5482    1
#> 6        ANIE 0.053 0.034  0.052 -0.010 0.123  4353    1
#> 7          TE 0.327 0.065  0.327  0.200 0.453  4959    1
```
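As a consistency check on the summary table, the posterior means of the average effects satisfy the usual decomposition of the total effect into direct and indirect components, TE = ANDE + ANIE (up to rounding):

```r
# Posterior means taken from the summary table above
ande <- 0.274  # average natural direct effect
anie <- 0.053  # average natural indirect effect
te   <- 0.327  # total effect

isTRUE(all.equal(ande + anie, te))  # TRUE
```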