--- title: "Multivariate Regression" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Multivariate Regression} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} ## Use ragg for better font rendering if available if (requireNamespace("ragg", quietly = TRUE)) { old_opts <- options(summata.use_ragg = TRUE, width = 180) knitr::opts_chunk$set( dev = "ragg_png", fig.retina = 1, collapse = TRUE, comment = "##>", message = FALSE, warning = FALSE, fig.width = 8, fig.height = 5, out.width = "80%" ) } else { old_opts <- options(width = 180) knitr::opts_chunk$set( collapse = TRUE, comment = "##>", message = FALSE, warning = FALSE, fig.width = 8, fig.height = 5, out.width = "80%" ) } ## Dynamic figure sizing: queue_plot() stashes rec_dims from a plot object, ## and the opts_hook on the NEXT chunk (with use_rec_dims = TRUE) applies them ## before knitr opens the graphics device. Plots render via ragg (dev = "ragg_png" ## set above) and knitr captures them natively. No files written to disk. .plot_dims <- new.env(parent = emptyenv()) .plot_dims$width <- NULL .plot_dims$height <- NULL knitr::opts_hooks$set(use_rec_dims = function(options) { if (isTRUE(options$use_rec_dims)) { if (!is.null(.plot_dims$width)) options$fig.width <- .plot_dims$width if (!is.null(.plot_dims$height)) options$fig.height <- .plot_dims$height .plot_dims$width <- NULL .plot_dims$height <- NULL } options }) ## Call at the end of a plot-creation chunk to stash dimensions for the next chunk. queue_plot <- function(plot) { dims <- attr(plot, "rec_dims") if (!is.null(dims)) { .plot_dims$width <- dims$width .plot_dims$height <- dims$height } invisible(plot) } ``` Multivariate regression refers to the simultaneous examination of a single independent predictor across multiple dependent variables. This approach inverts the typical regression paradigm: rather than testing multiple predictors against one outcome (as in `uniscreen()`, `fit()`, and `fullfit()`), multivariate regression tests the predictive value of a individual factor against multiple outcomes, with or without multivariable adjustment for other independent factors. This design is particularly useful when a key exposure or intervention must be evaluated against several endpoints simultaneously. The `multifit()` function implements this workflow, supporting all model types available in `summata` with optional covariate adjustment, interaction terms, and mixed-effects specifications. Results can be visualized using `multiforest()` and exported using standard table export functions. As with other `summata` functions, this function adheres to the standard calling convention: ```{r, eval = FALSE} multifit(data, outcomes, predictor, covariates, model_type, ...) ``` where `data` is the dataset, `outcomes` is a vector of screened outcome variables, `predictor` is the predicting variable under examination, `covariates` is a vector of simultaneous independent predicting variables (i.e., potential confounders), and `model_type` is the type of model to be implemented. This vignette demonstrates the various capabilities of this function using the included sample dataset. --- # Preliminaries The examples in this vignette use the `clintrial` dataset included with `summata`: ```{r setup} library(summata) library(survival) library(ggplot2) data(clintrial) data(clintrial_labels) ``` The `clintrial` dataset includes multiple outcome types suitable for multivariate analysis: | Outcome Type | Variables | Model | |:-------------|:----------|:------| | Continuous | `los_days`, `pain_score`, `recovery_days` | `lm` | | Binary | `any_complication`, `wound_infection`, `readmission_30d`, `icu_admission` | `glm` (binomial) | | Count (equidispersed) | `fu_count` | `glm` (poisson) | | Count (overdispersed) | `ae_count` | `negbin` or `quasipoisson` | | Time-to-event | `Surv(pfs_months, pfs_status)`, `Surv(os_months, os_status)` | `coxph` | > *n.b.:* To ensure correct font rendering and figure sizing, the forest plots below are displayed using a helper function (`queue_plot()`) that applies each plot's recommended dimensions (stored in the `"rec_dims"` attribute) via the [`ragg`](https://ragg.r-lib.org/) graphics device. In practice, replace `queue_plot()` with `ggplot2::ggsave()` using recommended plot dimensions for equivalent results: > > ```{r, eval = FALSE} > p <- glmforest(model, data = mydata) > dims <- attr(p, "rec_dims") > ggplot2::ggsave("forest_plot.png", p, > width = dims$width, > height = dims$height) > ``` > > This ensures that the figure size is always large enough to accommodate the constituent plot text and graphics, and it is generally the preferred method for saving forest plot outputs in `summata`. --- # Basic Usage ## **Example 1:** Unadjusted Analysis Test a single predictor against multiple binary outcomes: ```{r} example1 <- multifit( data = clintrial, outcomes = c("any_complication", "wound_infection", "readmission_30d", "icu_admission"), predictor = "surgery", labels = clintrial_labels, parallel = FALSE ) example1 ``` Each row shows the effect of the predictor on a specific outcome. For categorical predictors, each non-reference level appears separately. ## **Example 2:** Adjusted Analysis Include covariates for confounding control: ```{r} example2 <- multifit( data = clintrial, outcomes = c("any_complication", "wound_infection", "readmission_30d", "icu_admission"), predictor = "surgery", covariates = c("age", "sex", "smoking", "diabetes"), labels = clintrial_labels, parallel = FALSE ) example2 ``` ## **Example 3:** Unadjusted and Adjusted Comparison Use `columns = "both"` to display unadjusted and adjusted results side-by-side: ```{r} example3 <- multifit( data = clintrial, outcomes = c("any_complication", "wound_infection", "readmission_30d", "icu_admission"), predictor = "surgery", covariates = c("age", "sex", "diabetes", "surgery"), columns = "both", labels = clintrial_labels, parallel = FALSE ) example3 ``` Comparing columns reveals confounding (large differences) or robust associations (similar estimates). --- # Predictor Types ## **Example 4:** Continuous Predictors For continuous predictors, one row appears per outcome: ```{r} example4 <- multifit( data = clintrial, outcomes = c("any_complication", "wound_infection", "icu_admission"), predictor = "age", covariates = c("sex", "treatment", "surgery"), labels = clintrial_labels, parallel = FALSE ) example4 ``` The effect estimate represents the change in log-odds per one-unit increase in the predictor. ## **Example 5:** Multilevel Categorical Predictors For categorical predictors with multiple levels, the display is expanded: ```{r} example5 <- multifit( data = clintrial, outcomes = c("any_complication", "wound_infection", "readmission_30d", "icu_admission"), predictor = "treatment", covariates = c("age", "sex", "surgery"), labels = clintrial_labels, parallel = FALSE ) example5 ``` --- # Model Types ## **Example 6:** Cox Regression For time-to-event outcomes, use `model_type = "coxph"`: ```{r} example6 <- multifit( data = clintrial, outcomes = c("Surv(pfs_months, pfs_status)", "Surv(os_months, os_status)"), predictor = "treatment", covariates = c("age", "sex", "stage"), model_type = "coxph", labels = clintrial_labels, parallel = FALSE ) example6 ``` ## **Example 7:** Linear Regression For continuous outcomes, use `model_type = "lm"`: ```{r} example7 <- multifit( data = clintrial, outcomes = c("los_days", "pain_score", "recovery_days"), predictor = "treatment", covariates = c("age", "sex", "surgery"), model_type = "lm", labels = clintrial_labels, parallel = FALSE ) example7 ``` ## **Example 8:** Mixed-Effects Models Account for clustering using random effects: ```{r} example8 <- multifit( data = clintrial, outcomes = c("any_complication", "wound_infection"), predictor = "treatment", covariates = c("age", "sex"), random = "(1|site)", model_type = "glmer", labels = clintrial_labels, parallel = FALSE ) example8 ``` --- # Advanced Features ## **Example 9:** Interaction Terms Interaction terms can be added to test effect modification: ```{r} example9 <- multifit( data = clintrial, outcomes = c("any_complication", "wound_infection"), predictor = "treatment", covariates = c("age", "sex"), interactions = c("treatment:sex"), labels = clintrial_labels, parallel = FALSE ) example9 ``` ## **Example 10:** Filtering by p-value Outputs can be filtered to retain only significant associations: ```{r} example10 <- multifit( data = clintrial, outcomes = c("any_complication", "wound_infection", "readmission_30d", "icu_admission"), predictor = "treatment", covariates = c("age", "sex", "surgery"), p_threshold = 0.01, labels = clintrial_labels, parallel = FALSE ) example10 ``` ## **Example 11:** Accessing Model Objects Underlying model objects are stored as attributes for additional analysis: ```{r} result <- multifit( data = clintrial, outcomes = c("any_complication", "wound_infection"), predictor = "treatment", covariates = c("age", "sex"), labels = clintrial_labels, keep_models = TRUE, parallel = FALSE ) # Access individual models models <- attr(result, "models") names(models) # Examine a specific model summary(models[["any_complication"]]) ``` --- # Forest Plot Visualization The `multiforest()` function creates forest plots from `multifit()` results. ## **Example 12:** Forest Plot for Logistic (Binary) Outcomes Results from `multifit()` can be directly inserted into the `multiforest()` function to generate a forest plot: ```{r} result <- multifit( data = clintrial, outcomes = c("any_complication", "wound_infection", "readmission_30d", "icu_admission"), predictor = "treatment", covariates = c("age", "sex", "diabetes", "surgery"), labels = clintrial_labels, parallel = FALSE ) example12 <- multiforest( result, title = "Treatment Effects Across Outcomes", indent_predictor = TRUE, zebra_stripes = TRUE ) queue_plot(example12) ``` ```{r, echo = FALSE, out.width = "100%", use_rec_dims = TRUE} print(example12) ``` ## **Example 13:** Customization Options Customize the appearance of the forest plot by adjusting function parameters: ```{r} example13 <- multiforest( result, title = "Effect Estimates", column = "adjusted", show_predictor = FALSE, covariates_footer = TRUE, table_width = 0.65, color = "#4BA6B6" ) queue_plot(example13) ``` ```{r, echo = FALSE, out.width = "100%", use_rec_dims = TRUE} print(example13) ``` ## **Example 14:** Forest Plot for Continuous Outcomes Create forest plots for continuous outcomes: ```{r} lm_result <- multifit( data = clintrial, outcomes = c("pain_score", "recovery_days", "los_days"), predictor = "treatment", covariates = c("age", "sex", "surgery"), model_type = "lm", parallel = FALSE ) example14 <- multiforest( lm_result, title = "Treatment Effects on Recovery Metrics", show_predictor = FALSE, covariates_footer = TRUE, labels = clintrial_labels ) queue_plot(example14) ``` ```{r, echo = FALSE, out.width = "100%", use_rec_dims = TRUE} print(example14) ``` ## **Example 15:** Forest Plot for Survival Outcomes The `multiforest()` function supports multiple `model_type` outputs from `multifit()`. In addition, variable labels can be applied to `multiforest()` outputs similarly to other forest plot functions: ```{r} cox_result <- multifit( data = clintrial, outcomes = c("Surv(pfs_months, pfs_status)", "Surv(os_months, os_status)"), predictor = "treatment", covariates = c("age", "sex", "stage"), model_type = "coxph", parallel = FALSE ) example15 <- multiforest( cox_result, title = "Treatment Effects on Survival Outcomes", indent_predictor = TRUE, zebra_stripes = TRUE, labels = clintrial_labels ) queue_plot(example15) ``` ```{r, echo = FALSE, out.width = "100%", use_rec_dims = TRUE} print(example15) ``` --- # Exporting Results ## Tables Export tables using standard export functions: ```{r, eval = FALSE} table2docx( table = result, file = file.path(tempdir(), "multioutcome_analysis.docx"), caption = "Treatment Effects Across Outcomes" ) table2pdf( table = result, file = file.path(tempdir(), "multioutcome_analysis.pdf"), caption = "Treatment Effects Across Outcomes" ) ``` ## Forest Plots Save forest plots using `ggsave()`: ```{r, eval = FALSE} p <- multiforest(result, title = "Effect Estimates") dims <- attr(p, "rec_dims") ggsave(file.path(tempdir(), "multioutcome_forest.pdf"), p, width = attr(result, "rec_dims")$width, height = attr(result, "rec_dims")$height, units = "in") ``` --- ## Complete Workflow Example The following demonstrates a complete multi-outcome analysis workflow: ```{r} ## Define outcomes by type binary_outcomes <- c("any_complication", "wound_infection", "readmission_30d", "icu_admission") survival_outcomes <- c("Surv(pfs_months, pfs_status)", "Surv(os_months, os_status)") ## Unadjusted screening unadjusted <- multifit( data = clintrial, outcomes = binary_outcomes, predictor = "treatment", labels = clintrial_labels, parallel = FALSE ) unadjusted ## Adjusted analysis with comparison adjusted <- multifit( data = clintrial, outcomes = binary_outcomes, predictor = "treatment", covariates = c("age", "sex", "diabetes", "surgery"), columns = "both", labels = clintrial_labels, parallel = FALSE ) adjusted ## Forest plot visualization forest_plot <- multiforest( adjusted, title = "Treatment Effect Estimates", column = "adjusted", indent_predictor = TRUE, zebra_stripes = TRUE, table_width = 0.65, labels = clintrial_labels ) queue_plot(forest_plot) ``` ```{r, echo = FALSE, out.width = "100%", use_rec_dims = TRUE} print(forest_plot) ``` --- ## Parameter Reference | Parameter | Description | |:----------|:------------| | `outcomes` | Character vector of outcome variable names | | `predictor` | Single predictor variable name | | `covariates` | Optional adjustment covariates | | `interactions` | Interaction terms (colon notation) | | `random` | Random effects formula for mixed models | | `strata` | Stratification variable (Cox models) | | `cluster` | Clustering variable for robust standard errors | | `model_type` | `"glm"`, `"lm"`, `"coxph"`, `"glmer"`, `"lmer"`, `"coxme"` | | `family` | GLM family (default: `"binomial"`) | | `columns` | `"adjusted"`, `"unadjusted"`, or `"both"` | | `p_threshold` | Filter results by *p*-value | | `labels` | Custom variable labels | | `parallel` | Enable parallel processing | --- # Best Practices ## Outcome Selection 1. Use compatible outcome types within a single analysis 2. Group conceptually related outcomes 3. Consider multiple testing adjustments when testing many outcomes ## Adjustment Strategy 1. Pre-specify covariates based on domain knowledge 2. Use `columns = "both"` to assess confounding 3. Apply consistent covariates across outcomes for comparability ## Interpretation 1. Focus on effect magnitude and precision, not only *p*-values 2. Look for consistent patterns across related outcomes 3. Consider practical significance alongside statistical significance --- # Common Issues ## Empty Results If results are empty, verify: - Predictor variable exists and has variation - Outcomes are correctly specified (`Surv()` syntax for survival) - Model type matches outcome type ## Convergence Warnings For mixed-effects models, simplify the random effects structure: ```{r, eval = FALSE} ## Start with random intercepts only multifit(data, outcomes, predictor, random = "(1|site)", model_type = "glmer") ``` ## Many Factor Levels For predictors with many levels, consider collapsing categories: ```{r, eval = FALSE} clintrial$treatment_binary <- ifelse(clintrial$treatment == "Control", "Control", "Active") ``` --- # Other Considerations ## Multivariate Regression vs. Univariable Screening The distinction between `multifit()` and `uniscreen()` is important: | Function | Tests | Use Case | |:---------|:------|:---------| | `uniscreen()` | Multiple predictors → One outcome | Variable screening, risk factor identification | | `multifit()` | One predictor → Multiple outcomes | Exposure effects, intervention evaluation | All outcomes in a single `multifit()` call should be of the same type (all binary, all continuous, or all survival). Mixing outcome types produces tables with incompatible effect measures. The function validates outcome compatibility and issues a warning when mixed types are detected. ## Categorical Outcomes with More Than Two Levels `multifit()` supports binary outcomes (via logistic regression) but not multinomial or ordinal outcomes. If a categorical outcome with three or more levels is included (e.g., treatment group with "Control", "Drug A", "Drug B"), the function will issue a warning because binomial GLM coerces such variables to binary (first level *vs.* all others). For multilevel categorical outcomes, use dedicated packages outside of `summata`: ```{r, eval = FALSE} ## Multinomial regression (unordered categories) library(nnet) model <- multinom(treatment ~ age + sex + stage, data = clintrial) ## Ordinal regression (ordered categories) library(MASS) model <- polr(grade ~ age + sex + stage, data = clintrial, Hess = TRUE) ``` ```{r, include = FALSE} options(old_opts) ``` --- ## Further Reading - [Descriptive Tables](descriptive_tables.html): `desctable()` for baseline characteristics - [Regression Modeling](regression_modeling.html): `fit()`, `uniscreen()`, and `fullfit()` - [Model Comparison](model_comparison.html): `compfit()` for comparing models - [Table Export](table_export.html): Export to PDF, Word, and other formats - [Forest Plots](forest_plots.html): Visualization of regression results - [Advanced Workflows](advanced_workflows.html): Interactions and mixed-effects models