--- title: "Getting Started with hierNest" author: "Ziren Jiang, Lingfeng Huo, Jue Hou, and Jared D. Huling" date: "`r Sys.Date()`" output: rmarkdown::html_vignette: toc: true toc_depth: 3 number_sections: true vignette: > %\VignetteIndexEntry{Getting Started with hierNest} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 5, warning = FALSE, message = FALSE ) ``` # Introduction The **hierNest** package implements penalized regression with a hierarchical nested parameterization designed for settings in which observations belong to a two-level group hierarchy — for example, Diagnosis Related Groups (DRGs) nested within Major Diagnostic Categories (MDCs) in electronic health record (EHR) data. ## Motivation In health systems, patient populations are highly heterogeneous. A patient hospitalized for heart failure has very different clinical risk factors from one admitted for a traumatic injury. Fitting a single pooled model ignores this heterogeneity, while fitting fully separate models per DRG is statistically infeasible when subgroup sample sizes are small and the number of predictors is large. **hierNest** resolves this tension by decomposing each covariate's effect into three interpretable, hierarchically nested components: $$\beta_d = \mu + \eta^{M(d)} + \delta_d$$ where: - $\mu$ is the **overall common effect** shared across all subgroups, - $\eta^{M(d)}$ is the **MDC-specific deviation** for the Major Diagnostic Category $M(d)$ that DRG $d$ belongs to, - $\delta_d$ is the **DRG-specific deviation** unique to subgroup $d$. Pairing this reparameterization with structured regularization (lasso or overlapping group lasso) allows the model to borrow strength across related subgroups while remaining flexible enough to capture genuine heterogeneity. ## Two penalization strategies The package provides two penalization methods: 1. 
**hierNest-Lasso** — applies a standard lasso penalty to the reparameterized coefficients, with penalty weights adjusted for subgroup sample size. This is implemented via a modified design matrix that can be passed directly to `glmnet`. It is fast and serves as a useful baseline. 2. **hierNest-OGLasso** — applies an overlapping group lasso penalty that additionally enforces *effect hierarchy*: an MDC-specific effect can be non-zero only if the corresponding overall effect is non-zero, and a DRG-specific effect can be non-zero only if the corresponding MDC effect is non-zero. This is the recommended method when the hierarchical structure is plausible. ## Package scope This vignette covers: - The structure of the required `hier_info` input. - Simulating or loading data suitable for `hierNest`. - Fitting a model with `hierNest()` at a fixed penalty. - Selecting tuning parameters with `cv.hierNest()`. - Extracting and interpreting coefficients. - Generating predictions on new data with `predict_hierNest()`. - Visualizing results with `plot()`. --- # Installation Install the package from CRAN (when available) or from source: ```{r install, eval = FALSE} # From CRAN install.packages("hierNest") # From source (development version) # install.packages("devtools") devtools::install_github("ZirenJiang/hierNest") ``` Load the package: ```{r load} library(hierNest) ``` --- # The `hier_info` matrix All `hierNest` functions require a `hier_info` argument that encodes the two-level hierarchy. This is an $n \times 2$ integer matrix where: - Column 1 gives the **group index** (e.g., MDC) for each observation, taking values in $\{1, \ldots, n_{\text{MDC}}\}$. - Column 2 gives the **subgroup index** (e.g., DRG) for each observation, taking values in $\{1, \ldots, n_{\text{DRG}}\}$. Subgroup indices must be *globally unique* across groups — two DRGs in different MDCs must have different integer codes. 
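As a minimal sketch (not part of the package), a valid `hier_info` matrix can be built from raw identifiers using `factor()` codes; the vectors `mdc_id` and `drg_id` below are made-up inputs, not package objects:

```{r hier_info_build, eval = FALSE}
# Hypothetical raw identifiers for 6 observations
mdc_id <- c("A", "A", "B", "B", "C", "C")
drg_id <- c("A1", "A2", "B1", "B1", "C1", "C2")

# Convert to integer codes; factor() yields globally unique
# subgroup codes as long as the DRG labels themselves are unique
mdc_idx <- as.integer(factor(mdc_id))  # values in 1..n_MDC
drg_idx <- as.integer(factor(drg_id))  # values in 1..n_DRG
hier_info <- cbind(mdc_idx, drg_idx)

# Sanity check: each DRG code must belong to exactly one MDC
stopifnot(all(rowSums(table(drg_idx, mdc_idx) > 0) == 1))
```

The `stopifnot()` check verifies the nesting assumption: a subgroup code that appears under two different group codes indicates mislabeled data.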
```{r hier_info_example, eval = FALSE}
# Illustration: 3 MDCs with 2 DRGs each (6 DRGs total)
#
# MDC 1 -> DRG 1, DRG 2
# MDC 2 -> DRG 3, DRG 4
# MDC 3 -> DRG 5, DRG 6
#
# For an observation in MDC 2 / DRG 4:
# hier_info[i, ] = c(2, 4)
```

---

# Example data

The package ships with a built-in example dataset that illustrates the required data structure.

```{r load_data}
data("example_data")

# Dimensions
cat("X dimensions:", dim(example_data$X), "\n")
cat("Y length: ", length(example_data$Y), "\n")
cat("hier_info dimensions:", dim(example_data$hier_info), "\n")

# Peek at hier_info
head(example_data$hier_info)

# Group sizes
table_mdc <- table(example_data$hier_info[, 1])
cat("\nObservations per MDC group:\n")
print(table_mdc)

table_drg <- table(example_data$hier_info[, 2])
cat("\nObservations per DRG subgroup (first 10):\n")
print(head(table_drg, 10))

# Outcome distribution (binary)
cat("\nOutcome distribution:\n")
print(table(example_data$Y))
```

---

# Fitting the model

## `hierNest()` — fit at fixed penalty weights

`hierNest()` computes the full regularization path over a sequence of $\lambda$ values for a single, fixed pair of penalty weights $(\alpha_1, \alpha_2)$. It is useful for exploration or when the tuning parameters have already been selected.
```{r hierNest_fit, eval = FALSE} library(hierNest) fit <- hierNest( x = example_data$X, y = example_data$Y, hier_info = example_data$hier_info, family = "binomial", asparse1 = 1, # weight for MDC-level group penalty asparse2 = 1 # weight for DRG-level subgroup penalty ) # The fit contains a lambda sequence and corresponding coefficient paths length(fit$lambda) dim(fit$beta) # rows = reparameterized coefficients, cols = lambda values ``` Key arguments: | Argument | Description | |---|---| | `x` | $n \times p$ predictor matrix | | `y` | Response (numeric for Gaussian, 0/1 or factor for binomial) | | `hier_info` | $n \times 2$ group/subgroup index matrix | | `family` | `"gaussian"` or `"binomial"` | | `asparse1` | $\alpha_1$: relative weight for the MDC-level overlapping group penalty | | `asparse2` | $\alpha_2$: relative weight for the DRG-level overlapping group penalty | | `nlambda` | Length of the $\lambda$ sequence (default 100) | | `standardize` | Whether to standardize predictors before fitting (default `TRUE`) | | `method` | For now, only `"overlapping"` (hierNest-OGLasso, default) | --- ## `cv.hierNest()` — cross-validated tuning parameter selection In practice, the penalty hyperparameters $\lambda$, $\alpha_1$, and $\alpha_2$ need to be chosen by cross-validation. `cv.hierNest()` wraps the fitting procedure with cross-validation and returns the optimal parameter combination. ### Cross-validation methods The `cvmethod` argument controls how $\alpha_1$ and $\alpha_2$ are selected: - `"general"` — uses a single pair of $\alpha_1$ and $\alpha_2$ (supplied as scalars) and runs standard cross-validation over the $\lambda$ sequence only. - `"grid_search"` — searches a grid of $\alpha_1 \times \alpha_2$ pairs. The ranges are specified via `asparse1` and `asparse2` (each a length-2 vector giving the lower and upper bound), with `asparse1_num` and `asparse2_num` controlling the grid resolution. 
- `"user_supply"` — the user provides explicit paired vectors of $(\alpha_1, \alpha_2)$ values to evaluate. ### Recommended: grid search ```{r cv_fit, eval = FALSE} cv.fit <- cv.hierNest( x = example_data$X, y = example_data$Y, method = "overlapping", hier_info = example_data$hier_info, family = "binomial", partition = "subgroup", # stratify CV folds within subgroups cvmethod = "grid_search", asparse1 = c(0.5, 20), # search alpha_1 over [0.5, 20] asparse2 = c(0.05, 0.20), # search alpha_2 over [0.05, 0.20] asparse1_num = 3, # 3 x 3 = 9 grid points asparse2_num = 3, nlambda = 50, # lambda values per grid point pred.loss = "ROC" # maximize AUROC ) ``` > **Note on `partition = "subgroup"`:** This stratifies the cross-validation folds so that each fold contains observations from all DRG subgroups (to the extent possible). This avoids degenerate folds where a small subgroup is entirely absent from the training data. ### Single hyperparameter pair (fast) ```{r cv_fit_general, eval = FALSE} cv.fit.simple <- cv.hierNest( x = example_data$X, y = example_data$Y, method = "overlapping", hier_info = example_data$hier_info, family = "binomial", partition = "subgroup", cvmethod = "general", asparse1 = 1, asparse2 = 0.1, nlambda = 100, pred.loss = "ROC" ) ``` ### User-supplied grid ```{r cv_fit_user, eval = FALSE} cv.fit.user <- cv.hierNest( x = example_data$X, y = example_data$Y, method = "overlapping", hier_info = example_data$hier_info, family = "binomial", partition = "subgroup", cvmethod = "user_supply", asparse1 = c(0.5, 1, 5, 10), # explicit (alpha1, alpha2) pairs asparse2 = c(0.05, 0.1, 0.1, 0.2), nlambda = 50 ) ``` --- # Inspecting the cross-validated fit After running `cv.hierNest()`, several components of the returned object are of interest. 
```{r inspect_cv, eval = FALSE}
# Optimal tuning parameters selected by cross-validation
cv.fit$lambda.min   # lambda minimising CV loss
cv.fit$min_alpha1   # alpha_1 selected
cv.fit$min_alpha2   # alpha_2 selected

# Number of non-zero coefficients in the reparameterized model
# at the optimal lambda. Match lambda by nearest value rather than
# exact equality, which is fragile for floating-point numbers.
idx <- which.min(abs(cv.fit$hierNest.fit$lambda - cv.fit$lambda.min))
nnz <- sum(cv.fit$hierNest.fit$beta[, idx] != 0)
cat("Non-zero reparameterized coefficients:", nnz, "\n")
```

---

# Visualizing the estimated effects

The `plot()` method for `cv.hierNest` objects provides two heatmap visualizations.

## Coefficient heatmap

Plotting with `type = "coefficients"` shows the estimated overall, MDC-specific, and DRG-specific components for each selected predictor, at the cross-validation optimal $\lambda$.

```{r plot_coef, eval = FALSE}
# Returns a plotly interactive heatmap
p_coef <- plot(cv.fit, type = "coefficients")
p_coef
```

Each row of the heatmap corresponds to one level in the hierarchy (overall mean, then each MDC, then each DRG within that MDC). Each column is a selected predictor. Blue cells indicate a positive contribution; red cells indicate negative. Rows with all zeros are suppressed.

## Subgroup effect heatmap

Plotting with `type = "Subgroup effects"` reconstructs the *total* effect for each DRG subgroup: $\hat{\beta}_d = \hat{\mu} + \hat{\eta}^{M(d)} + \hat{\delta}_d$.

```{r plot_subgroup, eval = FALSE}
p_sub <- plot(cv.fit, type = "Subgroup effects")
p_sub
```

This visualization is useful for comparing risk profiles across subgroups — subgroups with similar colors for a given predictor share similar risk processes for that covariate.

---

# Methodological background

The statistical framework underlying **hierNest** is described in detail in:

> Jiang, Z., Huo, L., Hou, J., Vaughan-Sarrazin, M., Smith, M. A., & Huling, J. D. (2026+). *Heterogeneous readmission prediction with hierarchical effect decomposition and regularization*.
## Hierarchical reparameterization

For participant $i$ with DRG $D_i = d$, the logistic regression model is:

$$\text{logit}(\Pr(Y_i = 1 \mid X_i, D_i = d)) = X_i^\top \beta_d$$

The hierNest reparameterization decomposes $\beta_d$ as:

$$\beta_d = \mu + \eta^{M(d)} + \delta_d$$

This is encoded via a modified design matrix $X_H$ constructed as the Khatri–Rao (column-wise Kronecker) product of $X$ and the hierarchy indicator matrix $H = [\mathbf{1}_n \;;\; H_M \;;\; H_D]$.

## Overlapping group lasso penalty

Let $\theta_j = \{\mu_j, \{\eta_{Mj}\}_{M}, \{\delta_{dj}\}_d\}$ collect all reparameterized coefficients for the $j$-th predictor, and let $\theta_{Mj} = \{\eta_{Mj}, \{\delta_{dj}\}_{d \in M}\}$ collect the deviation for MDC $M$ together with its nested DRG-level deviations. The overlapping group lasso penalty is:

$$P_{\text{og}}(\theta) = \lambda \sum_{j=1}^{p} \left[ \|\theta_j\|_2 + \sum_{M \in \mathcal{M}} \left(\alpha_1 \|\theta_{Mj}\|_2 + \alpha_2 \sum_{d \in M} |\delta_{dj}| \right) \right]$$

This penalty enforces a hierarchical zero pattern: $\mu_j = 0$ implies all $\eta_{Mj} = 0$ and $\delta_{dj} = 0$; $\eta_{Mj} = 0$ implies $\delta_{dj} = 0$ for all $d \in M$.

## Computation

For `method = "overlapping"`, the majorization-minimization (MM) algorithm described in Algorithm 1 of the paper is used, with sequential strong rules to skip inactive predictor groups and accelerate computation. The design matrix is stored as a sparse matrix to minimise memory requirements.

---

# Session information

```{r session_info}
sessionInfo()
```