---
title: "Subgroup Analysis with Multivariate Binary Outcomes in Multilevel Data"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Subgroup Analysis with Multivariate Binary Outcomes in Multilevel Data}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5
)
```

```{r, echo=FALSE}
biblio <- RefManageR::ReadBib("../inst/REFERENCES.bib", check = "error")
```

## Introduction

When data are clustered (e.g., students within schools, patients within hospitals), observations are not independent. Ignoring this clustering can lead to:

- Underestimated standard errors
- Inflated Type I error rates
- Invalid inference
- Under- or overpowered decisions

The `bglmm()` function extends `bglm()` to handle clustered data by incorporating **random effects**.

## When to Use Multilevel Models

Use `bglmm()` when:

- Observations are grouped into clusters (schools, hospitals, families, etc.);
- Cluster variation affects outcomes;
- Clustering is theoretically meaningful.

Use `bglm()` when:

- Data are not clustered; AND
- Clustering is not theoretically meaningful.

## Example: Educational Intervention Study

We'll analyze a study comparing two teaching methods across 20 schools on binary math and reading outcomes, examining whether the method effect differs by students' baseline ability.

### Generate Example Data

```{r setup}
library(bmco)

set.seed(2020)
n_schools <- 20
n_per_school <- 15
n_total <- n_schools * n_per_school

# Generate school-level random intercepts
school_intercepts_math <- rnorm(n_schools, mean = 0, sd = 1.50)
school_intercepts_reading <- rnorm(n_schools, mean = 0, sd = 1.50)

# Cluster randomization: whole schools receive the same method.
school_method <- rep(c("traditional", "new_method"), each = n_schools / 2)

study_data <- data.frame(
  school_id = factor(rep(1:n_schools, each = n_per_school)),
  method = rep(school_method, each = n_per_school),
  baseline_ability = rnorm(n_total, mean = 50, sd = 10),
  stringsAsFactors = FALSE
)

# Standardize covariate
study_data$baseline_ability_std <- scale(study_data$baseline_ability)[, 1]

study_data$math_pass <- NA
study_data$reading_pass <- NA
p_math <- p_reading <- rep(NA, nrow(study_data))

# Generate outcomes from a logistic model with a method effect, a
# baseline-ability effect, their interaction, and a school intercept
for (i in 1:nrow(study_data)) {
  school <- as.numeric(study_data$school_id[i])
  is_new <- ifelse(study_data$method[i] == "new_method", 1, 0)
  p_math[i] <- plogis(-0.50 + 0.75 * is_new +
                        0.10 * study_data$baseline_ability_std[i] +
                        0.20 * is_new * study_data$baseline_ability_std[i] +
                        school_intercepts_math[school])
  p_reading[i] <- plogis(-0.50 + 0.80 * is_new +
                           0.05 * study_data$baseline_ability_std[i] +
                           0.15 * is_new * study_data$baseline_ability_std[i] +
                           school_intercepts_reading[school])
  study_data$math_pass[i] <- rbinom(1, 1, p_math[i])
  study_data$reading_pass[i] <- rbinom(1, 1, p_reading[i])
}
```

### Fit Multilevel Model

```{r fit_bglmm, results='hide', message=FALSE, warning=FALSE, eval=TRUE}
set.seed(2025)
result <- bglmm(
  data = study_data,
  grp = "method",
  grp_a = "traditional",
  grp_b = "new_method",
  id_var = "school_id",
  x_var = "baseline_ability",
  y_vars = c("math_pass", "reading_pass"),
  x_method = "Empirical",
  x_def = c(-Inf, Inf),
  fixed = c("method", "baseline_ability", "method_baseline_ability"),
  random = c("Intercept"),
  test = "right_sided",
  rule = "All",
  n_burn = 200,
  n_it = 500,
  n_thin = 1
)
```

```{r show_results, eval=TRUE}
print(result)
summary(result)
```

### Interpretation

The posterior probability indicates the probability that the new teaching method improves **both** math and reading outcomes (`rule = "All"`), accounting for:

1. **Fixed effects**: Overall method effect, baseline ability effect, and their interaction
2.
**Random effects**: School-specific deviations from the overall pattern
3. **Clustering**: Within-school correlation of student outcomes

## Specifying the Population of Interest

Like `bglm()`, you can define subpopulations:

### Specific Ability Level

```{r value_method, results='hide', message=FALSE, warning=FALSE, eval = FALSE}
set.seed(2027)
result_value <- bglmm(
  # ... other arguments ...
  x_method = "Value",
  x_def = 50 # Average ability
)
```

```{r show_value, eval = FALSE}
cat("Effect for students with baseline ability = 50:\n")
cat("  Treatment difference:", result_value$delta$mean_delta, "\n")
cat("  Posterior probability:", result_value$delta$pop, "\n")
```

### Ability Range (Empirical)

```{r empirical_method, results='hide', message=FALSE, warning=FALSE, eval = FALSE}
set.seed(2028)
result_empirical <- bglmm(
  # ... other arguments ...
  x_method = "Empirical",
  x_def = c(40, 60) # Ability range 40-60
)
```

### Ability Range (Analytical)

```{r analytical_method, results='hide', message=FALSE, warning=FALSE, eval = FALSE}
set.seed(2029)
result_analytical <- bglmm(
  # ... other arguments ...
  x_method = "Analytical",
  x_def = c(40, 60)
)
```

## Decision Rules with Multilevel Data

### All Rule (Conjunctive)

Both outcomes must favor the new method:

```{r rule_all, eval = FALSE}
result_all <- bglmm(
  data = study_data,
  # ... other arguments ...
  rule = "All" # P(new_method better on BOTH outcomes)
)
```

### Any Rule (Disjunctive)

At least one outcome must favor the new method:

```{r rule_any, eval = FALSE}
result_any <- bglmm(
  data = study_data,
  # ... other arguments ...
  rule = "Any" # P(new_method better on AT LEAST ONE outcome)
)
```

### Compensatory Rule

Weighted combination (e.g., math weighted 60%, reading 40%):

```{r rule_comp, eval = FALSE}
result_comp <- bglmm(
  data = study_data,
  # ... other arguments ...
  rule = "Comp",
  w = c(0.6, 0.4) # Weights for math and reading
)
```

## Specifying Prior Distributions

`bglmm()` uses **weakly informative priors** by default.
You can customize these priors for:

- Fixed effects regression coefficients (`b_mu0`, `b_sigma0`)
- Random effects means (`g_mu0`, `g_sigma0`)
- Random effects covariance (`nu0`, `tau0`)

### Fixed Effects Priors (bglm and bglmm)

Fixed effects have a **multivariate normal prior**:

$$\beta \sim N(\mu_0, \Sigma_0)$$

#### Default Priors

```{r default_priors, eval=TRUE}
# Default
p_fixed <- 2
b_mu0 <- rep(0, p_fixed) # Zero prior mean
b_sigma0 <- solve(diag(1e1, p_fixed)) # Small prior precision (large prior
# variance on the log-odds scale). Weakly informative.
# solve() generates the precision matrix needed as input
```

Where `p_fixed` is the number of fixed effects. In our example, we use:

- Covariate effect(s)
- Interaction term(s)

#### Custom Fixed Effects Priors

```{r custom_prior_1, results='hide', message=FALSE, warning=FALSE, eval=TRUE}
# Assume treatment improves outcomes (positive coefficient on group)
custom_b_mu0 <- c(0, 0.5) # Expect positive group effect
custom_b_sigma0 <- solve(diag(c(1e1, 5))) # More certainty on group effect
```

### Random Effects Priors

#### Population-Level Random Effects

Random effects means have a **multivariate normal prior**:

$$\gamma \sim N(\mu_g, \Sigma_g)$$

**Default**:

```{r random_default, eval=TRUE}
p_random <- 2
g_mu0 <- rep(0, p_random) # Prior mean
g_sigma0 <- solve(diag(1e1, p_random)) # Weakly informative prior variance;
# solve() transforms the variance matrix to the precision matrix needed as input
```

Where `p_random` is the number of random effects.
In our example, we use:

- Intercept
- Group effect

#### Random Effects Covariance

The covariance matrix has an **inverse-Wishart prior**:

$$\Sigma \sim \text{InvWishart}(\nu_0, \mathbf{T}_0)$$

**Default**:

```{r iwishart_default, eval=TRUE}
nu0 <- p_random # Degrees of freedom (dimension)
tau0 <- diag(1e-1, p_random) # Scale matrix
```

**Important**:

- `nu0` must be ≥ `p_random`
- `tau0` must be **positive definite** (never use `diag(0, p_random)`)
- Smaller `nu0` = less informative
- Larger `tau0` diagonal values = expect more between-cluster variation

### Prior Sensitivity Analysis

Compare results under different priors:

```{r prior_sensitivity, results='hide', message=FALSE, warning=FALSE, eval = FALSE}
# -----------------------------------------------------------------
# Two priors that pull in clearly different directions.
# -----------------------------------------------------------------

# 1. Weakly informative (neutral starting point)
#    Diffuse on fixed effects, diffuse IW for random effects.
result_weak <- bglmm(
  # ... other arguments ...
  b_mu0 = rep(0, p_fixed),
  b_sigma0 = solve(diag(10, p_fixed)),
  g_mu0 = rep(0, p_random),
  g_sigma0 = solve(diag(10, p_random)),
  nu0 = p_random,
  tau0 = diag(0.1, p_random)
)

# 2. Skeptical prior
#    Tight variance so the prior resists positive or negative evidence.
#    Also a small tau0, implying homogeneity between schools.
result_skeptical <- bglmm(
  # ... other arguments ...
  b_mu0 = rep(0, p_fixed),
  b_sigma0 = solve(diag(c(1, 1), p_fixed)), # Tight — hard(er) to overcome
  g_mu0 = rep(0, p_random),
  g_sigma0 = solve(diag(c(1, 1), p_random)), # Tight — hard(er) to overcome
  nu0 = p_random + 4, # More informative IW
  tau0 = diag(0.05, p_random) # Expects small school variation
)
```

**Interpretation**:

- If results are **similar** → data dominate, prior has little influence
- If results **differ substantially** → prior is influential

### Guidelines for Choosing Priors

1.
**Default priors**: Use for exploratory analysis or when no prior information exists
2. **Informative priors**: Use when:
   - You have strong theoretical expectations
   - Previous studies provide evidence
   - You want to regularize extreme estimates
   - Sample size is very small
3. **Skeptical priors**: Use for:
   - Conservative hypothesis testing
   - Replication studies (favor the null)
   - High-stakes decisions requiring strong evidence
4. **Prior sensitivity**: Check:

```{r prior_check, eval = FALSE}
# Run with default priors
# Run with informative priors
# Compare posterior probabilities and effect sizes
# If similar → robust; if different → prior-dependent
```

### Common Mistakes to Avoid

- **Don't**: Use `tau0 = diag(0, p)` (singular matrix)
- **Do**: Use `tau0 = diag(1e-1, p)` or `diag(0.5, p)`
- **Don't**: Set `nu0 < p` (too few degrees of freedom)
- **Do**: Use `nu0 >= p`
- **Don't**: Use extremely small prior variances without justification
- **Do**: Use weakly informative priors unless you have strong evidence
- **Don't**: Ignore prior sensitivity when sample size is small
- **Do**: Report results under multiple priors for transparency

### Further Reading

For Bayesian logistic regression priors:

- Gelman, A., Jakulin, A., Pittau, M. G., & Su, Y.-S. (2008). A weakly informative default prior distribution for logistic and other regression models. *Annals of Applied Statistics, 2*(4), 1360-1383. \doi{10.1214/08-AOAS191}

For inverse-Wishart priors:

- Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models. *Bayesian Analysis, 1*(3), 515-534. \doi{10.1214/06-BA117A}

## MCMC Diagnostics

The function internally checks the multivariate potential scale reduction factor (MPSRF) and warns if it exceeds 1.1. Additional MCMC diagnostics are returned when `return_diagnostics = TRUE`. When `return_diagnostic_plots = TRUE`, `bglmm()` returns diagnostic plots as well.
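The same quantities can also be checked by hand with the `coda` package. The sketch below is illustrative only: the two "chains" are simulated stand-ins for matrices of posterior draws (iterations × parameters), not actual `bglmm()` output.

```r
library(coda)

set.seed(1)
# Stand-in posterior draws: two chains of 500 iterations for two parameters
make_chain <- function() {
  mcmc(matrix(rnorm(500 * 2), ncol = 2,
              dimnames = list(NULL, c("beta_1", "beta_2"))))
}
chains <- mcmc.list(make_chain(), make_chain())

# Multivariate potential scale reduction factor (bglmm() warns if > 1.1)
gelman.diag(chains, multivariate = TRUE)$mpsrf

# Effective sample size per parameter (ideally > 100)
effectiveSize(chains)
```

Because the stand-in draws are independent white noise, both checks should look unproblematic here; with real MCMC output, an MPSRF well above 1.1 or very small effective sample sizes would signal convergence or mixing problems.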
```{r diagnose, results='hide', message=FALSE, warning=FALSE, eval=TRUE}
set.seed(2030)
result_diag <- bglmm(
  data = study_data,
  grp = "method",
  grp_a = "traditional",
  grp_b = "new_method",
  id_var = "school_id",
  x_var = "baseline_ability",
  y_vars = c("math_pass", "reading_pass"),
  x_method = "Value",
  x_def = 50,
  n_burn = 200,
  n_it = 500,
  n_thin = 1,
  start = c(0.5, 1),
  return_diagnostics = TRUE
)
```

Key diagnostics:

- **MPSRF** (multivariate PSRF): Ideally < 1.1
- **Effective sample size**: Should be > 100 per parameter
- **Rhat**: Ideally < 1.1 for all parameters

```{r show_diag, eval=TRUE}
# Check convergence
cat("Random effects convergence:\n")
print(result_diag$diags$g$convergence)

cat("\nRandom effects covariance convergence:\n")
print(result_diag$diags$tau$convergence)
```

Available plots:

- **Autocorrelation**: Equals 1 at lag 0 by definition and should drop off quickly toward 0 within a few lags.
- **Traceplot**: Chains should fluctuate randomly around a stable mean, with no clear trends, drifts, or long flat stretches (good mixing).
- **Density**: The posterior density of individual parameters (regression coefficients and variance parameters) is preferably smooth, unimodal (unless theory suggests otherwise), and without irregular spikes.

```{r plot_convergence, eval = FALSE}
plot(result_diag) # All
plot(result_diag, type = "trace") # Trace only
plot(result_diag, type = "density") # Density only
plot(result_diag, which = "fixed") # Fixed effects
plot(result_diag, which = "random") # Random effects
plot(result_diag, which = "variance") # Variance components
plot(result_diag, which = "all") # All

# Combine type + which
plot(result_diag, type = "trace", which = "all") # Trace for all parameters
plot(result_diag, type = "density", which = "variance") # Variance densities
```

If convergence issues occur:

1. Increase `n_burn` (more warmup)
2. Increase `n_it` (more samples)
3. Check for very few clusters
4. Check for very small cluster sizes
5.
Check the range of the covariate and consider standardizing
6. Check for problems and warnings in the data (perfect separation, low variation, etc.)

If autocorrelation is high, the effective sample size is usually much lower than `n_it`:

1. Consider thinning
2. Consider increasing `n_it`

For more information on MCMC diagnostics, consult:

- Brooks, S. P., & Gelman, A. (1998). General methods for monitoring convergence of iterative simulations. *Journal of Computational and Graphical Statistics, 7*(4), 434-455. \doi{10.1080/10618600.1998.10474787}
- Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. *Statistical Science, 7*(4), 457-472. \doi{10.1214/ss/1177011136}

## Extracting Posterior Samples

```{r samples, results='hide', message=FALSE, warning=FALSE, eval = FALSE}
set.seed(2031)
result_samples <- bglmm(
  # ... other arguments ...
  return_samples = TRUE
)
```

```{r use_samples, eval = FALSE}
# Treatment effect samples
str(result_samples$samples$delta)

# Compute custom summaries
cat("Treatment effect quantiles:\n")
print(apply(result_samples$samples$delta, 2, quantile,
            probs = c(0.025, 0.5, 0.975)))

# Probability that the effect on math > 0.1
cat("\nP(delta_math > 0.1):", mean(result_samples$samples$delta[, 1] > 0.1), "\n")
```

## Data Requirements

For reliable multilevel modeling:

1. **Number of clusters**: Ideally there are sufficient clusters
   - More clusters → better random effect estimates
2. **Cluster sizes**:
   - Very small clusters provide little information about random effects
   - Unbalanced cluster sizes are acceptable
3. **Group balance within clusters**: Ideally, each cluster contains both treatment groups
   - If some clusters have only one group, the model can still fit, but with a warning
4.
**Missing data**: Handled automatically
   - Rows with missing outcomes or covariates are removed per cluster
   - Complete-case analysis within each cluster

## Common Issues and Solutions

### Warning: "Very few clusters (J < 5)"

**Problem**: Too few clusters may hinder reliable random effect estimation

**Solutions**:

- Collect more clusters if possible
- Use `bglm()` instead (ignores clustering), but only if appropriate

### Warning: "MCMC chains may not have converged"

**Problem**: MPSRF > 1.1

**Solutions**:

1. Increase burn-in: `n_burn = 2000` or higher
2. Increase iterations: `n_it = 5000` or higher
3. Reduce thinning: `n_thin = 1` (keeps more samples)
4. Check data quality (outliers, sufficient variation)

### Slow computation

**Problem**: Large number of clusters or long chains

**Solutions**:

- Start with a small `n_it` to verify that the model runs
- Increase `n_thin` to reduce stored samples: `n_thin = 10`
- Run overnight for production analyses
- Typical times: 5-10 minutes for 10 clusters and 10,000 iterations

## Summary

`bglmm()` extends Bayesian multivariate analysis to clustered data by providing:

1. **Accounting for clustering** via random effects
2. **Subgroup effects** via logistic regression
3. **Flexible population definitions** (Value, Empirical, Analytical)
4. **Multivariate decision rules** (All, Any, Compensatory)

Choose your analysis:

- **`bmvb()`**: Full sample, no clustering
- **`bglm()`**: Subgroup analysis, no clustering
- **`bglmm()`**: Subgroup analysis + clustering

All three functions support multivariate binary outcomes and provide posterior probabilities for treatment effects.

## References

For more details on the statistical methodology, or when using the `bglmm()` function, please cite:

`r capture.output(print(biblio["Kavelaars2023"], .opts = list(style = "markdown", bib.style = "authoryear", sorting = "ydnt")))`