Data Preparation and Settings

The hima function provides a flexible, user-friendly interface for high-dimensional mediation analysis. It supports methods for continuous, binary, and survival outcomes, as well as continuous (Gaussian), count, and compositional mediators, with optional quantile and efficient variants. Below is an overview of the hima function and its key parameters:

`hima` Function Interface

hima(
  formula,          # The model formula specifying outcome, exposure, and covariate(s)
  data.pheno,       # Data frame with outcome, exposure, and covariate(s)
  data.M,           # Data frame or matrix of high-dimensional mediators
  mediator.type,    # Type of mediators: "gaussian", "negbin", or "compositional"
  penalty = "DBlasso",  # Penalty method: "DBlasso", "MCP", "SCAD", or "lasso"
  quantile = FALSE, # Use quantile mediation analysis (default: FALSE)
  efficient = FALSE, # Use efficient mediation analysis (default: FALSE)
  scale = TRUE,     # Scale data (default: TRUE)
  sigcut = 0.05,    # Significance cutoff for mediator selection
  contrast = NULL,  # Named list of contrasts for factor covariate(s)
  subset = NULL,    # Optional subset of observations
  verbose = FALSE,  # Display progress messages (default: FALSE)
  ...               # Additional arguments, e.g., tau, parallel, ncore
)

To use the hima function, ensure your data is prepared according to the following guidelines:

1. Formula Argument (`formula`)

Define the model formula to specify the relationship between the Outcome, Exposure, and Covariate(s). Ensure the following:

General Form: Use the format Outcome ~ Exposure + Covariate(s). The Exposure variable is the exposure of interest (e.g., “Treatment” in the demo examples) and must be listed as the first independent variable in the formula. Covariate(s) are optional.
Survival Data: For survival analysis, use the format Surv(time, event) ~ Exposure + Covariate(s). See the example dataset SurvivalData$PhenoData for details.
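Because hima treats the first right-hand-side term as the exposure, it can help to check the term order before fitting. The sketch below uses only base R's terms() and all.vars(); first_rhs_term() is a hypothetical helper, not part of HIMA.

```r
# Hypothetical helper (not part of HIMA): return the first term on the
# right-hand side of a formula, which hima treats as the exposure.
first_rhs_term <- function(f) attr(terms(f), "term.labels")[1]

f_cont <- Outcome ~ Treatment + Sex + Age
f_surv <- Surv(Time, Status) ~ Treatment + Sex + Age  # Surv() is not evaluated here

first_rhs_term(f_cont)  # "Treatment"
first_rhs_term(f_surv)  # "Treatment"

# Every variable the formula refers to, i.e. the columns data.pheno must contain
all.vars(f_surv)        # "Time" "Status" "Treatment" "Sex" "Age"
```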

2. Phenotype Data (`data.pheno`)

The data.pheno object should be a data.frame or matrix containing the phenotype information for the analysis (without missing values). Key requirements include:

Rows: Represent samples.
Columns: Include the outcome, the exposure (e.g., Treatment), and any optional covariate(s).
Formula Consistency: Ensure that all variables specified in the formula argument (e.g., Outcome, Treatment, and Covariate(s)) are present in data.pheno.
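The requirements above can be checked before calling hima. A minimal base-R sketch follows; check_pheno() is a hypothetical helper (not part of HIMA), and the toy data frame stands in for your own phenotype data. It only inspects the columns named in the formula.

```r
# Hypothetical pre-flight check (not part of HIMA): every variable used in
# the formula must be a column of data.pheno, and those columns must have no NAs.
check_pheno <- function(formula, data.pheno) {
  needed <- all.vars(formula)
  absent <- setdiff(needed, colnames(data.pheno))
  if (length(absent) > 0) {
    stop("Not found in data.pheno: ", paste(absent, collapse = ", "))
  }
  if (anyNA(data.pheno[, needed, drop = FALSE])) {
    stop("data.pheno has missing values in formula variables")
  }
  invisible(TRUE)
}

# Toy phenotype data (illustrative only)
set.seed(1)
toy_pheno <- data.frame(
  Outcome   = rnorm(10),
  Treatment = rbinom(10, 1, 0.5),
  Sex       = rbinom(10, 1, 0.5),
  Age       = rnorm(10, 50, 10)
)
check_pheno(Outcome ~ Treatment + Sex + Age, toy_pheno)  # passes silently
```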

3. Mediator Data (`data.M`)

The data.M object should be a data.frame or matrix containing high-dimensional mediators (without missing values). Key requirements include:

Rows: Represent samples, aligned with the rows in data.pheno.
Columns: Represent mediators (e.g., CpGs, genes, or other molecular features).
Mediator Type: Specify the type of mediators in the mediator.type argument. Supported types include:
- "gaussian" for continuous mediators (default, e.g., DNA methylation data).
- "negbin" for count data (e.g., transcriptomic data).
- "compositional" for microbiome or other compositional data.
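A similar check can confirm that data.M is aligned with data.pheno and free of missing values. This is a sketch: check_mediators() is a hypothetical helper (not part of HIMA), and comparing row names assumes both objects carry sample IDs as row names.

```r
# Hypothetical pre-flight check (not part of HIMA) for the mediator data.
check_mediators <- function(data.M, data.pheno) {
  if (nrow(data.M) != nrow(data.pheno)) {
    stop("data.M and data.pheno have different numbers of rows")
  }
  if (anyNA(data.M)) {
    stop("data.M contains missing values")
  }
  # If both objects carry sample IDs as row names, they must be in the same order
  rn_M <- rownames(data.M)
  rn_P <- rownames(data.pheno)
  if (!is.null(rn_M) && !is.null(rn_P) && !identical(rn_M, rn_P)) {
    stop("Row names of data.M and data.pheno are not in the same order")
  }
  invisible(TRUE)
}

# Toy data (illustrative only): 5 samples, 3 mediators
ids <- paste0("S", 1:5)
toy_pheno <- data.frame(Outcome = rnorm(5), Treatment = c(0, 1, 0, 1, 1), row.names = ids)
toy_M <- matrix(rnorm(5 * 3), nrow = 5, dimnames = list(ids, c("M1", "M2", "M3")))
check_mediators(toy_M, toy_pheno)  # passes silently
```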

4. About data scaling

In most real-world analyses, scale is set to TRUE so that the exposure (variable of interest), mediators, and covariate(s) (if included) are standardized to a mean of zero and a variance of one. The outcome is not scaled. However, if your data are already standardized, as in simulation studies or our demo datasets, set scale to FALSE to avoid introducing bias or altering the original data structure.

When applying HIMA to simulated data with scale = TRUE, standardize the mediators to a mean of zero and a variance of one before generating the outcome variable, so that the true effects refer to the scaled mediators.
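The base-R sketch below illustrates this simulation order: the mediators are standardized with scale() first, and the outcome is then generated from the standardized mediators. The sample size, number of mediators, effect sizes, and the choice of three true mediators are arbitrary assumptions for illustration.

```r
set.seed(2025)
n <- 300   # samples
p <- 100   # candidate mediators

X <- rnorm(n)                                          # exposure
M_raw <- matrix(rnorm(n * p, mean = 5, sd = 2), n, p)
M_raw[, 1:3] <- M_raw[, 1:3] + outer(X, c(1, 1, 1))    # exposure affects M1-M3

# Standardize the mediators BEFORE generating the outcome
M <- scale(M_raw)
colnames(M) <- paste0("M", seq_len(p))

# Outcome generated from the standardized mediators; the outcome itself is not scaled
Y <- 0.5 * X + drop(M[, 1:3] %*% c(0.8, 0.8, 0.8)) + rnorm(n)
```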

Parallel Computing Support

The hima() function supports parallel computing to speed up high-dimensional mediation analysis, especially when dealing with a large number of mediators.

Enabling Parallel Computing

To enable parallel computing, simply set parallel = TRUE and specify the number of CPU cores to use via the ncore argument:

hima(..., parallel = TRUE, ncore = 4)
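How many cores to request depends on your machine. One common choice, sketched below with base R's parallel package, is to leave one core free for the rest of the system; this is our assumption, not a HIMA requirement.

```r
# detectCores() may return NA on some platforms, so fall back to one core.
n_avail <- parallel::detectCores()
ncore <- if (is.na(n_avail)) 1L else max(1L, n_avail - 1L)
ncore

# Then pass it on, e.g.:
# hima(..., parallel = TRUE, ncore = ncore)
```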

Applications and Examples

Load the `HIMA` Package

library(HIMA)

Continuous Outcome Analysis

For continuous, normally distributed outcomes and mediators, use the following code:

data(ContinuousOutcome)
pheno_data <- ContinuousOutcome$PhenoData
mediator_data <- ContinuousOutcome$Mediator

hima_continuous.fit <- hima(
  Outcome ~ Treatment + Sex + Age,
  data.pheno = pheno_data,
  data.M = mediator_data,
  mediator.type = "gaussian",
  penalty = "DBlasso",
  scale = FALSE # Demo data is already standardized
)
summary(hima_continuous.fit, desc = TRUE)
# `desc = TRUE` also prints a description of the output results

Compared with penalty = "MCP", penalty = "DBlasso" is more effective at identifying mediators with weaker signals, but it requires more computation time.
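For a single Gaussian mediator, the mediation (indirect) effect is commonly estimated as the product alpha_hat * beta_hat, where alpha is the exposure-to-mediator effect and beta is the mediator-to-outcome effect adjusted for the exposure. HIMA works with all mediators at once using the selected penalty, so the base-R sketch below only illustrates this quantity for one simulated mediator; the true values alpha = 0.5 and beta = 0.6 are assumptions of the simulation.

```r
set.seed(42)
n <- 2000
alpha_true <- 0.5   # exposure -> mediator effect (assumed for the simulation)
beta_true  <- 0.6   # mediator -> outcome effect (assumed for the simulation)

X <- rnorm(n)                                  # exposure
M <- alpha_true * X + rnorm(n)                 # mediator model
Y <- 0.3 * X + beta_true * M + rnorm(n)        # outcome model (direct effect 0.3)

alpha_hat <- unname(coef(lm(M ~ X))["X"])      # estimate of alpha
beta_hat  <- unname(coef(lm(Y ~ M + X))["M"])  # estimate of beta, adjusting for X
indirect  <- alpha_hat * beta_hat              # should be close to 0.5 * 0.6 = 0.3
indirect
```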

Efficient HIMA

For continuous, normally distributed mediators and outcomes, efficient HIMA can be enabled with efficient = TRUE (use penalty = "MCP" for best results). This method may also provide greater statistical power to detect mediators with weaker signals.

hima_efficient.fit <- hima(
  Outcome ~ Treatment + Sex + Age,
  data.pheno = pheno_data,
  data.M = mediator_data,
  mediator.type = "gaussian",
  efficient = TRUE,
  penalty = "MCP",
  scale = FALSE # Demo data is already standardized
)
summary(hima_efficient.fit, desc = TRUE)
# Note that efficient HIMA controls the false discovery rate (FDR)

We recommend trying different penalty options, with and without efficient = TRUE, to find the best one for your data.
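One way to follow this advice is to fit the same model under each penalty and compare the summaries. compare_penalties() below is a hypothetical convenience wrapper, not part of HIMA; it calls HIMA::hima() with the arguments shown in this guide and forwards any others (such as mediator.type and scale) unchanged.

```r
# Hypothetical wrapper (not part of HIMA): refit one model under several penalties.
compare_penalties <- function(formula, data.pheno, data.M,
                              penalties = c("DBlasso", "MCP", "SCAD", "lasso"),
                              ...) {
  if (!requireNamespace("HIMA", quietly = TRUE)) {
    stop("The HIMA package is required")
  }
  fits <- lapply(penalties, function(pen) {
    # `...` is taken from compare_penalties() and passed through to hima()
    HIMA::hima(formula, data.pheno = data.pheno, data.M = data.M,
               penalty = pen, ...)
  })
  names(fits) <- penalties
  fits
}

# Example usage with the continuous demo data (DBlasso may take a while):
# fits <- compare_penalties(Outcome ~ Treatment + Sex + Age,
#                           pheno_data, mediator_data,
#                           mediator.type = "gaussian", scale = FALSE)
# lapply(fits, summary)
```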

Binary Outcome Analysis

The package can handle binary outcomes based on logistic regression:

data(BinaryOutcome)
pheno_data <- BinaryOutcome$PhenoData
mediator_data <- BinaryOutcome$Mediator

hima_binary.fit <- hima(
  Disease ~ Treatment + Sex + Age,
  data.pheno = pheno_data,
  data.M = mediator_data,
  mediator.type = "gaussian",
  penalty = "MCP",
  scale = FALSE # Demo data is already standardized
)
summary(hima_binary.fit)
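For intuition about the outcome model, the base-R sketch below fits a logistic regression of a binary outcome on the exposure, one simulated mediator, and covariates. The data and effect sizes are made up, and HIMA itself works with all mediators at once rather than through this single glm() call.

```r
set.seed(7)
n <- 500
Treatment <- rbinom(n, 1, 0.5)
Sex <- rbinom(n, 1, 0.5)
Age <- rnorm(n, 50, 10)
M1 <- 0.5 * Treatment + rnorm(n)                 # one simulated mediator
lin_pred <- -0.5 + 0.3 * Treatment + 0.8 * M1    # log-odds of disease
Disease <- rbinom(n, 1, plogis(lin_pred))        # binary outcome coded 0/1

toy <- data.frame(Disease, Treatment, M1, Sex, Age)
fit <- glm(Disease ~ Treatment + M1 + Sex + Age, family = binomial, data = toy)
coef(fit)["M1"]   # log-odds ratio for the mediator, adjusting for the exposure
```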

Survival Outcome Analysis

For survival data, HIMA incorporates a Cox proportional hazards approach. Here is an example of survival outcome analysis using HIMA:

data(SurvivalData)
pheno_data <- SurvivalData$PhenoData
mediator_data <- SurvivalData$Mediator

hima_survival.fit <- hima(
  Surv(Time, Status) ~ Treatment + Sex + Age,
  data.pheno = pheno_data,
  data.M = mediator_data,
  mediator.type = "gaussian",
  penalty = "DBlasso",
  scale = FALSE # Demo data is already standardized
)
summary(hima_survival.fit)
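Similarly, a Cox proportional hazards model for one simulated mediator can be fitted with coxph() from the survival package. This is only an illustration of the model form; the exponential event times, censoring scheme, and effect sizes are assumptions of the simulation.

```r
library(survival)

set.seed(11)
n <- 400
Treatment <- rbinom(n, 1, 0.5)
Sex <- rbinom(n, 1, 0.5)
Age <- rnorm(n, 50, 10)
M1 <- 0.5 * Treatment + rnorm(n)                     # one simulated mediator

# Exponential event times whose hazard depends on the exposure and the mediator
event_time <- rexp(n, rate = 0.1 * exp(0.3 * Treatment + 0.7 * M1))
censor_time <- rexp(n, rate = 0.05)                  # independent censoring
Time <- pmin(event_time, censor_time)
Status <- as.integer(event_time <= censor_time)      # 1 = event, 0 = censored

toy <- data.frame(Time, Status, Treatment, M1, Sex, Age)
fit <- coxph(Surv(Time, Status) ~ Treatment + M1 + Sex + Age, data = toy)
coef(fit)["M1"]   # log hazard ratio for the mediator
```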

Microbiome Mediation Analysis

For compositional microbiome data, HIMA employs the isometric log-ratio (ILR) transformation. Here is an example of microbiome mediation analysis using HIMA:

data(MicrobiomeData)
pheno_data <- MicrobiomeData$PhenoData
mediator_data <- MicrobiomeData$Mediator

hima_microbiome.fit <- hima(
  Outcome ~ Treatment + Sex + Age,
  data.pheno = pheno_data,
  data.M = mediator_data,
  mediator.type = "compositional",
  penalty = "DBlasso"
)
summary(hima_microbiome.fit)
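The ILR transformation maps a composition with D parts to D - 1 unconstrained coordinates. The base-R sketch below computes pivot coordinates, one standard ILR basis; it is not claimed to reproduce HIMA's internal code, and the pseudocount of 0.5 used to avoid log(0) is an assumption. The first pivot coordinate contrasts part 1 with the geometric mean of the remaining parts, so reordering the columns places any taxon of interest in that position.

```r
# Pivot (isometric log-ratio) coordinates for one composition x with all parts > 0:
# z_i = sqrt((D - i) / (D - i + 1)) * log(x_i / geometric mean of x_(i+1), ..., x_D)
ilr_pivot <- function(x) {
  D <- length(x)
  lx <- log(x)
  vapply(seq_len(D - 1), function(i) {
    sqrt((D - i) / (D - i + 1)) * (lx[i] - mean(lx[(i + 1):D]))
  }, numeric(1))
}

# Centered log-ratio, used below to check the isometry property
clr <- function(x) log(x) - mean(log(x))

# Toy taxa counts (rows = samples, columns = taxa), including a zero count
counts <- rbind(c(10, 0, 5, 20, 65),
                c(3, 12, 8, 40, 37))
comp <- (counts + 0.5) / rowSums(counts + 0.5)  # add pseudocount, then close rows to 1

Z <- t(apply(comp, 1, ilr_pivot))   # 2 samples x 4 ILR coordinates
Z

# Isometry: Euclidean distance between ILR coordinates equals the distance
# between CLR vectors (the Aitchison distance between the two compositions)
d_ilr <- sqrt(sum((Z[1, ] - Z[2, ])^2))
d_clr <- sqrt(sum((clr(comp[1, ]) - clr(comp[2, ]))^2))
```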

Quantile Mediation Analysis

Perform quantile mediation analysis with quantile = TRUE, and use tau to specify the desired quantile level(s):

data(QuantileData)
pheno_data <- QuantileData$PhenoData
mediator_data <- QuantileData$Mediator

hima_quantile.fit <- hima(
  Outcome ~ Treatment + Sex + Age,
  data.pheno = pheno_data,
  data.M = mediator_data,
  mediator.type = "gaussian",
  quantile = TRUE,
  penalty = "MCP",
  tau = c(0.3, 0.5, 0.7),
  scale = FALSE # Demo data is already standardized
)
summary(hima_quantile.fit)
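Each value of tau names a quantile level: tau = 0.5 targets the conditional median, while 0.3 and 0.7 target lower and upper parts of the outcome distribution. Quantile mediation methods generally build on quantile regression, which estimates these levels by minimizing the check (pinball) loss rho_tau(u) = u * (tau - 1{u < 0}). The base-R sketch below, on simulated data, shows that minimizing this loss over a constant recovers the sample quantile.

```r
# Check (pinball) loss for quantile level tau
check_loss <- function(u, tau) u * (tau - (u < 0))

set.seed(3)
y <- rnorm(101)

# For each tau, find the constant q minimizing the total check loss
taus <- c(0.3, 0.5, 0.7)
q_hat <- sapply(taus, function(tau) {
  optimize(function(q) sum(check_loss(y - q, tau)), interval = range(y))$minimum
})

# Compare with the sample quantiles (type = 1 inverts the empirical CDF)
rbind(q_hat, sample = quantile(y, taus, type = 1))
```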

High-Dimensional Mediation Analysis

A Guide to Using the HIMA Package

The HIMA Development Team

2025-06-11
