The hierNest package implements penalized regression with a hierarchical nested parameterization designed for settings in which observations belong to a two-level group hierarchy — for example, Diagnosis Related Groups (DRGs) nested within Major Diagnostic Categories (MDCs) in electronic health record (EHR) data.
In health systems, patient populations are highly heterogeneous. A patient hospitalized for heart failure has very different clinical risk factors from one admitted for a traumatic injury. Fitting a single pooled model ignores this heterogeneity, while fitting fully separate models per DRG is statistically infeasible when subgroup sample sizes are small and the number of predictors is large.
hierNest resolves this tension by decomposing each covariate’s effect into three interpretable, hierarchically nested components:
\[\beta_d = \mu + \eta^{M(d)} + \delta_d\]
where:

- \(\mu\) is the overall effect shared across all observations,
- \(\eta^{M(d)}\) is the deviation specific to MDC \(M(d)\), the MDC containing DRG \(d\), and
- \(\delta_d\) is the deviation specific to DRG \(d\).
Pairing this reparameterization with structured regularization (lasso or overlapping group lasso) allows the model to borrow strength across related subgroups while remaining flexible enough to capture genuine heterogeneity.
The package provides two penalization methods:

- hierNest-Lasso — applies a standard lasso penalty to the reparameterized coefficients, with penalty weights adjusted for subgroup sample size. This is implemented via a modified design matrix that can be passed directly to glmnet. It is fast and serves as a useful baseline.
- hierNest-OGLasso — applies an overlapping group lasso penalty that additionally enforces effect hierarchy: an MDC-specific effect can be non-zero only if the corresponding overall effect is non-zero, and a DRG-specific effect can be non-zero only if the corresponding MDC effect is non-zero. This is the recommended method when the hierarchical structure is plausible.
This vignette covers:

- Preparing the hier_info input.
- Fitting a model with hierNest() at a fixed penalty.
- Tuning-parameter selection with cv.hierNest().
- Prediction with predict_hierNest().
- Visualization with plot().

## Installation

Install the package from CRAN (when available) or from source:
# From CRAN
install.packages("hierNest")
# From source (development version)
# install.packages("devtools")
devtools::install_github("ZirenJiang/hierNest")

Load the package:

library(hierNest)
## The hier_info matrix

All hierNest functions require a hier_info argument that encodes the two-level hierarchy. This is an \(n \times 2\) integer matrix where:

- column 1 holds the group (MDC) index of each observation, and
- column 2 holds the subgroup (DRG) index.
Subgroup indices must be globally unique across groups — two DRGs in different MDCs must have different integer codes.
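One way to guarantee globally unique subgroup codes is to derive them from the (MDC, DRG) label pair rather than from the DRG label alone. A minimal sketch; the label vectors below are hypothetical and not part of the package:

```r
# Hypothetical raw labels for six observations
mdc <- c("circulatory", "circulatory", "trauma", "trauma", "trauma", "respiratory")
drg <- c("HF",          "HF",          "fracture", "burn",  "burn",  "COPD")

# Group index: one integer per MDC
mdc_id <- as.integer(factor(mdc))

# Subgroup index: one integer per (MDC, DRG) pair, so codes can never
# collide even if two MDCs happen to reuse the same DRG label
drg_id <- as.integer(factor(paste(mdc, drg, sep = "/")))

hier_info <- cbind(mdc_id, drg_id)
```

Because the subgroup factor is built from the concatenated pair, each DRG code maps back to exactly one MDC, as required.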
# Illustration: 3 MDCs with 2 DRGs each (6 DRGs total)
#
# MDC 1 -> DRG 1, DRG 2
# MDC 2 -> DRG 3, DRG 4
# MDC 3 -> DRG 5, DRG 6
#
# For an observation in MDC 2 / DRG 4:
# hier_info[i, ] = c(2, 4)

The package ships with a built-in example dataset that illustrates the required data structure.
data("example_data")
# Dimensions
cat("X dimensions:", dim(example_data$X), "\n")
#> X dimensions: 200 30
cat("Y length: ", length(example_data$Y), "\n")
#> Y length: 200
cat("hier_info dimensions:", dim(example_data$hier_info), "\n")
#> hier_info dimensions: 200 2
# Peek at hier_info
head(example_data$hier_info)
#> [,1] [,2]
#> [1,] 1 1
#> [2,] 1 1
#> [3,] 1 1
#> [4,] 1 1
#> [5,] 1 1
#> [6,] 1 1
# Group sizes
table_mdc <- table(example_data$hier_info[, 1])
cat("\nObservations per MDC group:\n")
#>
#> Observations per MDC group:
print(table_mdc)
#>
#> 1 2
#> 100 100
table_drg <- table(example_data$hier_info[, 2])
cat("\nObservations per DRG subgroup (first 10):\n")
#>
#> Observations per DRG subgroup (first 10):
print(head(table_drg, 10))
#>
#> 1 2 3 4
#> 50 50 50 50
# Outcome distribution (binary)
cat("\nOutcome distribution:\n")
#>
#> Outcome distribution:
print(table(example_data$Y))
#>
#> 0 1
#> 108 92

## hierNest() — fit at a fixed penalty

hierNest() fits the full regularization path in \(\lambda\) for a single pair of hyperparameters \((\alpha_1, \alpha_2)\). It is useful for exploration or when tuning parameters have already been selected.
library(hierNest)
fit <- hierNest(
x = example_data$X,
y = example_data$Y,
hier_info = example_data$hier_info,
family = "binomial",
asparse1 = 1, # weight for MDC-level group penalty
asparse2 = 1 # weight for DRG-level subgroup penalty
)
# The fit contains a lambda sequence and corresponding coefficient paths
length(fit$lambda)
dim(fit$beta)   # rows = reparameterized coefficients, cols = lambda values

Key arguments:
| Argument | Description |
|---|---|
| x | \(n \times p\) predictor matrix |
| y | Response (numeric for Gaussian, 0/1 or factor for binomial) |
| hier_info | \(n \times 2\) group/subgroup index matrix |
| family | "gaussian" or "binomial" |
| asparse1 | \(\alpha_1\): relative weight for the MDC-level overlapping group penalty |
| asparse2 | \(\alpha_2\): relative weight for the DRG-level overlapping group penalty |
| nlambda | Length of the \(\lambda\) sequence (default 100) |
| standardize | Whether to standardize predictors before fitting (default TRUE) |
| method | For now, only "overlapping" (hierNest-OGLasso, the default) |
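The lambda/beta components returned by hierNest() can be indexed to inspect sparsity at a particular \(\lambda\). A sketch using stand-in objects with the same shapes as fit$lambda and fit$beta described above (the real values come from the fitted object):

```r
# Stand-ins shaped like fit$lambda (length-nlambda) and fit$beta
# (reparameterized coefficients x nlambda); purely illustrative
set.seed(1)
lambda <- exp(seq(log(1), log(0.01), length.out = 50))
beta   <- matrix(rnorm(20 * 50) * rbinom(20 * 50, 1, 0.2), 20, 50)

# Path position whose lambda is closest to a target value
idx <- which.min(abs(lambda - 0.05))

# Number of active reparameterized coefficients at that lambda
nnz <- sum(beta[, idx] != 0)
```

The same two lines of indexing apply verbatim to a real fit by substituting fit$lambda and fit$beta.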
## cv.hierNest() — cross-validated tuning-parameter selection

In practice, the penalty hyperparameters \(\lambda\), \(\alpha_1\), and \(\alpha_2\) need to be chosen by cross-validation. cv.hierNest() wraps the fitting procedure with cross-validation and returns the optimal parameter combination.
The cvmethod argument controls how \(\alpha_1\) and \(\alpha_2\) are selected:

- "general" — uses a single pair of \(\alpha_1\) and \(\alpha_2\) (supplied as scalars) and runs standard cross-validation over the \(\lambda\) sequence only.
- "grid_search" — searches a grid of \(\alpha_1 \times \alpha_2\) pairs. The ranges are specified via asparse1 and asparse2 (each a length-2 vector giving the lower and upper bound), with asparse1_num and asparse2_num controlling the grid resolution.
- "user_supply" — the user provides explicit paired vectors of \((\alpha_1, \alpha_2)\) values to evaluate.

cv.fit <- cv.hierNest(
x = example_data$X,
y = example_data$Y,
method = "overlapping",
hier_info = example_data$hier_info,
family = "binomial",
partition = "subgroup", # stratify CV folds within subgroups
cvmethod = "grid_search",
asparse1 = c(0.5, 20), # search alpha_1 over [0.5, 20]
asparse2 = c(0.05, 0.20), # search alpha_2 over [0.05, 0.20]
asparse1_num = 3, # 3 x 3 = 9 grid points
asparse2_num = 3,
nlambda = 50, # lambda values per grid point
pred.loss = "ROC" # maximize AUROC
)

Note on partition = "subgroup": This stratifies the cross-validation folds so that each fold contains observations from all DRG subgroups (to the extent possible). This avoids degenerate folds where a small subgroup is entirely absent from the training data.
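What subgroup-stratified folds look like can be illustrated directly. The fold assignment below is hypothetical (the package builds its folds internally when partition = "subgroup"); it simply shows the property the stratification guarantees:

```r
set.seed(1)
drg    <- rep(1:4, each = 50)   # 4 DRG subgroups of 50 observations each
nfolds <- 5

# Assign fold labels within each subgroup, so every fold sees every DRG
foldid <- integer(length(drg))
for (g in unique(drg)) {
  idx <- which(drg == g)
  foldid[idx] <- sample(rep_len(seq_len(nfolds), length(idx)))
}

table(drg, foldid)   # every (DRG, fold) cell is non-empty
```

With unstratified random folds, a subgroup of only a handful of observations could easily fall entirely into one fold and be absent from the corresponding training split.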
cv.fit.user <- cv.hierNest(
x = example_data$X,
y = example_data$Y,
method = "overlapping",
hier_info = example_data$hier_info,
family = "binomial",
partition = "subgroup",
cvmethod = "user_supply",
asparse1 = c(0.5, 1, 5, 10), # explicit (alpha1, alpha2) pairs
asparse2 = c(0.05, 0.1, 0.1, 0.2),
nlambda = 50
)

After running cv.hierNest(), several components of the returned object are of interest.
# Optimal tuning parameters selected by cross-validation
cv.fit$lambda.min # lambda minimising CV loss
cv.fit$min_alpha1 # alpha_1 selected
cv.fit$min_alpha2 # alpha_2 selected
# Number of non-zero coefficients in the reparameterized model
# at the optimal lambda
nnz <- sum(cv.fit$hierNest.fit$beta[,
which(cv.fit$hierNest.fit$lambda == cv.fit$lambda.min)] != 0)
cat("Non-zero reparameterized coefficients:", nnz, "\n")

The plot() method for cv.hierNest objects provides two heatmap visualizations.
Plotting with type = "coefficients" shows the estimated
overall, MDC-specific, and DRG-specific components for each selected
predictor, at the cross-validation optimal \(\lambda\).
Each row of the heatmap corresponds to one level in the hierarchy (overall mean, then each MDC, then each DRG within that MDC). Each column is a selected predictor. Blue cells indicate a positive contribution; red cells indicate negative. Rows with all zeros are suppressed.
Plotting with type = "Subgroup effects" reconstructs the
total effect for each DRG subgroup: \(\hat{\beta}_d = \hat{\mu} + \hat{\eta}^{M(d)} +
\hat{\delta}_d\).
This visualization is useful for comparing risk profiles across subgroups — subgroups with similar colors for a given predictor share similar risk processes for that covariate.
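Both heatmaps are produced from the cross-validated fit; a usage sketch assuming the cv.fit object from above and the two type values named in this section:

```r
# Decomposed components (overall / MDC / DRG) for each selected predictor
plot(cv.fit, type = "coefficients")

# Reconstructed total effects beta_d = mu + eta^{M(d)} + delta_d per DRG
plot(cv.fit, type = "Subgroup effects")
```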
The statistical framework underlying hierNest is described in detail in:
Jiang, Z., Huo, L., Hou, J., Vaughan-Sarrazin, M., Smith, M. A., & Huling, J. D. (2026+). Heterogeneous readmission prediction with hierarchical effect decomposition and regularization.
For participant \(i\) with DRG \(D_i = d\), the logistic regression model is:
\[\text{logit}(\Pr(Y_i = 1 \mid X_i, D_i = d)) = X_i^\top \beta_d\]
The hierNest reparameterization decomposes \(\beta_d\) as:
\[\beta_d = \mu + \eta^{M(d)} + \delta_d\]
This is encoded via a modified design matrix \(X_H\) constructed as the Khatri–Rao (column-wise Kronecker) product of \(X\) and the hierarchy indicator matrix \(H = [\mathbf{1}_n \;;\; H_M \;;\; H_D]\).
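Concretely, each row of \(X_H\) is the Kronecker product of the corresponding rows of \(X\) and \(H\). A toy construction under these definitions (the package builds and stores this matrix internally; the sizes here are illustrative):

```r
# Toy problem: n = 6 observations, p = 2 predictors,
# 3 MDCs with 2 DRGs each (matching the hierarchy illustrated earlier)
n <- 6; p <- 2
set.seed(1)
X   <- matrix(rnorm(n * p), n, p)
mdc <- c(1, 1, 2, 2, 3, 3)
drg <- c(1, 2, 3, 4, 5, 6)

H_M <- model.matrix(~ factor(mdc) - 1)   # n x 3 MDC indicators
H_D <- model.matrix(~ factor(drg) - 1)   # n x 6 DRG indicators
H   <- cbind(1, H_M, H_D)                # H = [1_n ; H_M ; H_D], n x 10

# Row-wise Kronecker product: one expanded row per observation
X_H <- t(sapply(seq_len(n), function(i) kronecker(X[i, ], H[i, ])))
dim(X_H)   # 6 x 20, i.e. n x (p * (1 + #MDC + #DRG))
```

Fitting a penalized regression on \(X_H\) estimates the stacked \((\mu, \eta, \delta)\) components directly, since \(X_i^\top \beta_{D_i} = (X_H)_i^\top \theta\).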
The overlapping group lasso penalty on the grouped coefficient vector \(\theta_j = \{\mu_j, \{\eta_{Mj}\}_{M}, \{\delta_{dj}\}_d\}\) for the \(j\)-th predictor is:
\[P_{\text{og}}(\theta) = \lambda \sum_{j=1}^{p} \left[ \|\theta_j\|_2 + \sum_{M \in \mathcal{M}} \left(\alpha_1 \|\theta_{Mj}\|_2 + \alpha_2 |\eta_{Mj}| \right) \right]\]
This penalty enforces a hierarchical zero pattern: \(\mu_j = 0\) implies all \(\eta_{Mj} = 0\) and \(\delta_{dj} = 0\); \(\eta_{Mj} = 0\) implies all \(\delta_{dj} = 0\) for \(d \in M\).
For method = "overlapping", the
majorization-minimization (MM) algorithm described in Algorithm 1 of the
paper is used, with sequential strong rules to skip inactive predictor
groups and accelerate computation. The design matrix is stored as a
sparse matrix to minimise memory requirements.
sessionInfo()
#> R version 4.3.2 (2023-10-31 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 11 x64 (build 26200)
#>
#> Matrix products: default
#>
#>
#> locale:
#> [1] LC_COLLATE=C
#> [2] LC_CTYPE=Chinese (Simplified)_China.utf8
#> [3] LC_MONETARY=Chinese (Simplified)_China.utf8
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=Chinese (Simplified)_China.utf8
#>
#> time zone: America/Chicago
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] hierNest_1.0.0
#>
#> loaded via a namespace (and not attached):
#> [1] Matrix_1.6-5 gtable_0.3.6 jsonlite_1.8.8 dplyr_1.1.4
#> [5] compiler_4.3.2 tidyselect_1.2.1 tidyr_1.3.1 jquerylib_0.1.4
#> [9] scales_1.4.0 yaml_2.3.8 fastmap_1.1.1 lattice_0.22-5
#> [13] ggplot2_4.0.0 R6_2.6.1 generics_0.1.4 knitr_1.51
#> [17] htmlwidgets_1.6.4 dotCall64_1.1-1 tibble_3.2.1 bslib_0.7.0
#> [21] pillar_1.11.1 RColorBrewer_1.1-3 rlang_1.1.6 cachem_1.0.8
#> [25] xfun_0.56 sass_0.4.9 S7_0.2.0 lazyeval_0.2.2
#> [29] otel_0.2.0 viridisLite_0.4.2 plotly_4.10.4 cli_3.6.2
#> [33] magrittr_2.0.3 digest_0.6.34 grid_4.3.2 rstudioapi_0.16.0
#> [37] rTensor_1.4.8 lifecycle_1.0.4 vctrs_0.6.5 evaluate_0.23
#> [41] glue_1.8.0 data.table_1.17.8 farver_2.1.1 purrr_1.0.2
#> [45] rmarkdown_2.30 httr_1.4.7 tools_4.3.2 pkgconfig_2.0.3
#> [49] htmltools_0.5.8.1