IPD meta-analysis with missing data

Michael Seo

2022-06-03

Fitting an IPD meta-analysis model with missing data

In this vignette, we go over how to run an IPD meta-analysis using models from this package when there are missing data. This is a simple example showcasing one way of handling missing data: multiple imputation via the mice package. First, let's generate a dataset with some missingness.

# install.packages('bipd') or
# devtools::install_github('MikeJSeo/bipd')
library(bipd)
set.seed(1)
# Simulate an IPD-MA dataset: 10 studies, 5 covariates, and no
# systematically missing covariates (sys_missing_prob = 0)
simulated_dataset <- generate_sysmiss_ipdma_example(Nstudies = 10,
    Ncov = 5, sys_missing_prob = 0, magnitude = 0.5, heterogeneity = 0.1)

# Introduce sporadic missingness: roughly 20% of x1 and 10% of x3
simulated_dataset_missing <- simulated_dataset
randomindex <- sample(c(TRUE, FALSE), dim(simulated_dataset_missing)[1],
    replace = TRUE, prob = c(0.2, 0.8))
randomindex2 <- sample(c(TRUE, FALSE), dim(simulated_dataset_missing)[1],
    replace = TRUE, prob = c(0.1, 0.9))
simulated_dataset_missing[randomindex, c("x1")] <- NA
simulated_dataset_missing[randomindex2, c("x3")] <- NA
head(simulated_dataset_missing)
#> # A tibble: 6 x 8
#>       y     x1 x2    x3         x4     x5 study treat
#>   <dbl>  <dbl> <fct> <fct>   <dbl>  <dbl> <int> <int>
#> 1  2.48  0.271 0     0     -0.280  0.819      1     0
#> 2  3.61  0.768 1     1      0.429  0.0837     1     0
#> 3  2.61 -1.31  1     0     -1.23   0.165      1     1
#> 4  1.47 -0.590 1     1      0.435  0.345      1     0
#> 5  4.39  1.20  1     1      0.0561 0.287      1     0
#> 6  1.95 -1.00  0     0     -2.19   1.63       1     1
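
As a quick check (this step is not part of the original workflow), we can confirm how much missingness was introduced in each covariate:

# Proportion of missing values per covariate (roughly 0.2 for x1 and 0.1 for x3)
colMeans(is.na(simulated_dataset_missing[, c("x1", "x2", "x3", "x4", "x5")]))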

Now we would like to create multiply imputed datasets. You can use the mice package to create your own imputations, or you can use the pre-built imputation tool in this package. For demonstration, we will use the tool in this package with the '2l.pmm' method, i.e. predictive mean matching that accounts for the multilevel structure through mixed-effects modelling; see the miceadds package for details. (A sketch of building the imputations yourself with mice is shown after the call below.)

library(miceadds)  # for multilevel datasets without systematically missing predictors
imputation <- ipdma.impute(simulated_dataset_missing, covariates = c("x1",
    "x2", "x3", "x4", "x5"), typeofvar = c("continuous", "binary",
    "binary", "continuous", "continuous"), interaction = TRUE,
    studyname = "study", treatmentname = "treat", outcomename = "y",
    m = 5)
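
If you prefer to build the imputations yourself with mice, the sketch below shows one way to do it. This is an illustration under assumptions, not a replacement for ipdma.impute: it converts the binary factor x3 to a numeric 0/1 coding (the '2l.pmm' routine in miceadds is based on lme4), codes study as the cluster variable in the predictor matrix, and does not construct the treatment-covariate interactions that ipdma.impute handles when interaction = TRUE.

library(mice)
library(miceadds)

dat <- simulated_dataset_missing
dat$x3 <- as.numeric(as.character(dat$x3))  # numeric 0/1 coding for 2l.pmm

ini <- mice(dat, maxit = 0)                 # dry run to get default settings
meth <- ini$method
meth[c("x1", "x3")] <- "2l.pmm"             # multilevel predictive mean matching
pred <- ini$predictorMatrix
pred[, "study"] <- -2                       # code 'study' as the cluster variable
diag(pred) <- 0

imp <- mice(dat, method = meth, predictorMatrix = pred,
    m = 5, printFlag = FALSE)
imp.list.manual <- lapply(1:5, function(i) complete(imp, i))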

The call above returns five imputed datasets. A convenient aspect of using Bayesian methods with missing data is that, for a proper analysis, we can simply fit the IPD-MA model to each imputed dataset and merge the MCMC results; we do not need Rubin's rules to combine the multiply imputed analyses. Simply merging the mcmc.list objects is enough.

multiple.imputations <- imputation$imp.list

# Fit the one-stage IPD-MA model to each imputed dataset and pool the MCMC draws
for (ii in 1:length(multiple.imputations)) {

    current.data <- multiple.imputations[[ii]]

    # Covariate matrix as numeric columns
    X <- with(current.data, apply(current.data[, c("x1", "x2",
        "x3", "x4", "x5")], 2, function(x) as.numeric(x)))

    ipd <- with(current.data, ipdma.model.onestage(y = y, study = study,
        treat = treat, X = X, response = "normal", shrinkage = "none"))

    # Run only 100 iterations for demonstration
    samples <- ipd.run(ipd, pars.save = c("beta", "gamma", "delta"),
        n.chains = 3, n.burnin = 100, n.iter = 100)

    # Append this imputation's draws to the pooled mcmc.list
    if (ii == 1) {
        final.result <- samples
    } else {
        final.result <- add.mcmc(final.result, samples)
    }
}
#> Compiling model graph
#>    Resolving undeclared variables
#>    Allocating nodes
#> Graph information:
#>    Observed stochastic nodes: 2088
#>    Unobserved stochastic nodes: 33
#>    Total graph size: 27192
#> 
#> Initializing model
#> 
#> Compiling model graph
#>    Resolving undeclared variables
#>    Allocating nodes
#> Graph information:
#>    Observed stochastic nodes: 2088
#>    Unobserved stochastic nodes: 33
#>    Total graph size: 27192
#> 
#> Initializing model
#> 
#> Compiling model graph
#>    Resolving undeclared variables
#>    Allocating nodes
#> Graph information:
#>    Observed stochastic nodes: 2088
#>    Unobserved stochastic nodes: 33
#>    Total graph size: 27192
#> 
#> Initializing model
#> 
#> Compiling model graph
#>    Resolving undeclared variables
#>    Allocating nodes
#> Graph information:
#>    Observed stochastic nodes: 2088
#>    Unobserved stochastic nodes: 33
#>    Total graph size: 27192
#> 
#> Initializing model
#> 
#> Compiling model graph
#>    Resolving undeclared variables
#>    Allocating nodes
#> Graph information:
#>    Observed stochastic nodes: 2088
#>    Unobserved stochastic nodes: 33
#>    Total graph size: 27192
#> 
#> Initializing model

We have written a convenience function, add.mcmc, which combines mcmc.list objects. Having run the code above, we have a single mcmc.list containing the results from all of the multiply imputed datasets.
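
For intuition, combining two mcmc.list objects amounts to stacking the draws of corresponding chains; a minimal sketch of the idea (an illustration only, not the internals of add.mcmc) is:

library(coda)

# Stack the draws of corresponding chains from two mcmc.list objects
merge_two_mcmc_lists <- function(a, b) {
    as.mcmc.list(lapply(seq_along(a), function(i) {
        mcmc(rbind(as.matrix(a[[i]]), as.matrix(b[[i]])))
    }))
}

We can now use the pooled mcmc.list to summarize the findings.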

summary(final.result)
#> 
#> Iterations = 1:500
#> Thinning interval = 1 
#> Number of chains = 3 
#> Sample size per chain = 500 
#> 
#> 1. Empirical mean and standard deviation for each variable,
#>    plus standard error of the mean:
#> 
#>             Mean      SD  Naive SE Time-series SE
#> beta[1]  0.59774 0.03936 0.0010163       0.002392
#> beta[2]  0.26136 0.03394 0.0008763       0.001732
#> beta[3]  0.24367 0.03602 0.0009301       0.001948
#> beta[4]  0.63482 0.04008 0.0010350       0.002517
#> beta[5]  0.52734 0.03583 0.0009251       0.002159
#> delta[1] 0.00000 0.00000 0.0000000       0.000000
#> delta[2] 0.91045 0.18739 0.0048385       0.004841
#> gamma[1] 0.17109 0.05203 0.0013434       0.002670
#> gamma[2] 0.09573 0.04778 0.0012337       0.002371
#> gamma[3] 0.13870 0.05106 0.0013183       0.002755
#> gamma[4] 0.36311 0.05540 0.0014304       0.003193
#> gamma[5] 0.24112 0.04979 0.0012857       0.002623
#> 
#> 2. Quantiles for each variable:
#> 
#>              2.5%     25%    50%    75%  97.5%
#> beta[1]  0.521586 0.57142 0.5987 0.6246 0.6704
#> beta[2]  0.193070 0.23958 0.2613 0.2845 0.3266
#> beta[3]  0.173381 0.22020 0.2437 0.2674 0.3147
#> beta[4]  0.558550 0.60657 0.6342 0.6610 0.7165
#> beta[5]  0.457109 0.50341 0.5263 0.5507 0.5994
#> delta[1] 0.000000 0.00000 0.0000 0.0000 0.0000
#> delta[2] 0.550003 0.78580 0.9126 1.0296 1.2809
#> gamma[1] 0.071197 0.13569 0.1711 0.2053 0.2712
#> gamma[2] 0.001207 0.06467 0.0959 0.1266 0.1908
#> gamma[3] 0.036020 0.10546 0.1385 0.1727 0.2376
#> gamma[4] 0.248599 0.32601 0.3655 0.4003 0.4710
#> gamma[5] 0.140260 0.20909 0.2407 0.2729 0.3402

Another important function in this package is treatment.effect, which estimates the individual treatment effect for a new patient. With imputed data, one additional precaution is choosing the overall mean and standard deviation (SD) used to scale the covariates: instead of the mean and SD of an imputed dataset, we should use the mean and SD of the original dataset. The treatment.effect function therefore allows users to supply the overall mean (scale_mean) and overall SD (scale_sd).

# Covariate matrix from the original (pre-missingness) dataset
X <- as.matrix(apply(simulated_dataset[, c("x1", "x2", "x3",
    "x4", "x5")], 2, as.numeric))

# Calculate the overall mean and SD used for scaling
overall_mean <- apply(X, 2, mean, na.rm = TRUE)
overall_sd <- apply(X, 2, sd, na.rm = TRUE)

# Predicted treatment effect for a new patient with the given covariate values
treatment.effect(ipd, samples, newpatient = c(0.5, 1, 1, -0.5,
    0.5), scale_mean = overall_mean, scale_sd = overall_sd)
#>     0.025       0.5     0.975 
#> 0.9062374 1.2857335 1.6615570
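
Note that the call above uses samples, i.e. the draws from the last imputed dataset fitted in the loop. A natural alternative (an assumption on our part, not something documented above) would be to pass the pooled draws in final.result, since it is an mcmc.list over the same saved parameters:

# Hedged sketch: the same prediction using the pooled draws across imputations,
# assuming treatment.effect accepts any mcmc.list containing the saved parameters
treatment.effect(ipd, final.result, newpatient = c(0.5, 1, 1, -0.5, 0.5),
    scale_mean = overall_mean, scale_sd = overall_sd)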