The hierNest package implements penalized regression with a hierarchical nested parameterization designed for settings in which observations belong to a two-level group hierarchy — for example, Diagnosis Related Groups (DRGs) nested within Major Diagnostic Categories (MDCs) in electronic health record (EHR) data.
In health systems, patient populations are highly heterogeneous. A patient hospitalized for heart failure has very different clinical risk factors from one admitted for a traumatic injury. Fitting a single pooled model ignores this heterogeneity, while fitting fully separate models per DRG is statistically infeasible when subgroup sample sizes are small and the number of predictors is large.
hierNest resolves this tension by decomposing each covariate’s effect into three interpretable, hierarchically nested components:
\[\beta_d = \mu + \eta^{M(d)} + \delta_d\]
where:

- \(\mu\) is the overall effect shared across all observations,
- \(\eta^{M(d)}\) is the deviation specific to MDC \(M(d)\), the MDC containing DRG \(d\), and
- \(\delta_d\) is the deviation specific to DRG \(d\).
Pairing this reparameterization with structured regularization (lasso or overlapping group lasso) allows the model to borrow strength across related subgroups while remaining flexible enough to capture genuine heterogeneity.
The package provides two penalization methods:

- hierNest-Lasso — applies a standard lasso penalty to the reparameterized coefficients, with penalty weights adjusted for subgroup sample size. This is implemented via a modified design matrix that can be passed directly to glmnet. It is fast and serves as a useful baseline.
- hierNest-OGLasso — applies an overlapping group lasso penalty that additionally enforces effect hierarchy: an MDC-specific effect can be non-zero only if the corresponding overall effect is non-zero, and a DRG-specific effect can be non-zero only if the corresponding MDC effect is non-zero. This is the recommended method when the hierarchical structure is plausible.
This vignette covers:

- Preparing the hier_info input.
- Fitting a model with hierNest() at a fixed penalty.
- Tuning-parameter selection with cv.hierNest().
- Prediction with predict_hierNest().
- Visualization with plot().

## Installation

Install the package from CRAN (when available) or from source:
# From CRAN
install.packages("hierNest")
# From source (development version)
# install.packages("devtools")
devtools::install_github("ZirenJiang/hierNest")

Load the package:

library(hierNest)
## The hier_info matrix

All hierNest functions require a hier_info argument that encodes the two-level hierarchy. This is an \(n \times 2\) integer matrix where:

- column 1 holds the group (MDC) index of each observation, and
- column 2 holds the subgroup (DRG) index.
Subgroup indices must be globally unique across groups — two DRGs in different MDCs must have different integer codes.
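One way to guarantee globally unique subgroup codes is to derive them from the (MDC, DRG) label pair rather than from the DRG label alone. A minimal sketch; the label vectors below are hypothetical and not part of the package:

```r
# Hypothetical raw labels for six observations
mdc <- c("circulatory", "circulatory", "trauma", "trauma", "trauma", "respiratory")
drg <- c("HF",          "HF",          "fracture", "burn",  "burn",  "COPD")

# Group index: one integer per MDC
mdc_id <- as.integer(factor(mdc))

# Subgroup index: one integer per (MDC, DRG) pair, so codes can never
# collide even if two MDCs happen to reuse the same DRG label
drg_id <- as.integer(factor(paste(mdc, drg, sep = "/")))

hier_info <- cbind(mdc_id, drg_id)
```

Because the subgroup factor is built from the concatenated pair, each DRG code maps back to exactly one MDC, as required.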
# Illustration: 3 MDCs with 2 DRGs each (6 DRGs total)
#
# MDC 1 -> DRG 1, DRG 2
# MDC 2 -> DRG 3, DRG 4
# MDC 3 -> DRG 5, DRG 6
#
# For an observation in MDC 2 / DRG 4:
# hier_info[i, ] = c(2, 4)

The package ships with a built-in example dataset that illustrates the required data structure.
data("example_data")
# Dimensions
cat("X dimensions:", dim(example_data$X), "\n")
#> X dimensions: 200 30
cat("Y length: ", length(example_data$Y), "\n")
#> Y length: 200
cat("hier_info dimensions:", dim(example_data$hier_info), "\n")
#> hier_info dimensions: 200 2
# Peek at hier_info
head(example_data$hier_info)
#> [,1] [,2]
#> [1,] 1 1
#> [2,] 1 1
#> [3,] 1 1
#> [4,] 1 1
#> [5,] 1 1
#> [6,] 1 1
# Group sizes
table_mdc <- table(example_data$hier_info[, 1])
cat("\nObservations per MDC group:\n")
#>
#> Observations per MDC group:
print(table_mdc)
#>
#> 1 2
#> 100 100
table_drg <- table(example_data$hier_info[, 2])
cat("\nObservations per DRG subgroup (first 10):\n")
#>
#> Observations per DRG subgroup (first 10):
print(head(table_drg, 10))
#>
#> 1 2 3 4
#> 50 50 50 50
# Outcome distribution (binary)
cat("\nOutcome distribution:\n")
#>
#> Outcome distribution:
print(table(example_data$Y))
#>
#> 0 1
#> 108 92

## hierNest() — fit at a fixed penalty

hierNest() fits the full regularization path in \(\lambda\) for a single pair of hyperparameters \((\alpha_1, \alpha_2)\). It is useful for exploration or when tuning parameters have already been selected.
library(hierNest)
fit <- hierNest(
x = example_data$X,
y = example_data$Y,
hier_info = example_data$hier_info,
family = "binomial",
asparse1 = 1, # weight for MDC-level group penalty
asparse2 = 1 # weight for DRG-level subgroup penalty
)
# The fit contains a lambda sequence and corresponding coefficient paths
length(fit$lambda)
dim(fit$beta)   # rows = reparameterized coefficients, cols = lambda values

Key arguments:
| Argument | Description |
|---|---|
| x | \(n \times p\) predictor matrix |
| y | Response (numeric for Gaussian, 0/1 or factor for binomial) |
| hier_info | \(n \times 2\) group/subgroup index matrix |
| family | "gaussian" or "binomial" |
| asparse1 | \(\alpha_1\): relative weight for the MDC-level overlapping group penalty |
| asparse2 | \(\alpha_2\): relative weight for the DRG-level overlapping group penalty |
| nlambda | Length of the \(\lambda\) sequence (default 100) |
| standardize | Whether to standardize predictors before fitting (default TRUE) |
| method | For now, only "overlapping" (hierNest-OGLasso, the default) |
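The lambda/beta components returned by hierNest() can be indexed to inspect sparsity at a particular \(\lambda\). A sketch using stand-in objects with the same shapes as fit$lambda and fit$beta described above (the real values come from the fitted object):

```r
# Stand-ins shaped like fit$lambda (length-nlambda) and fit$beta
# (reparameterized coefficients x nlambda); purely illustrative
set.seed(1)
lambda <- exp(seq(log(1), log(0.01), length.out = 50))
beta   <- matrix(rnorm(20 * 50) * rbinom(20 * 50, 1, 0.2), 20, 50)

# Path position whose lambda is closest to a target value
idx <- which.min(abs(lambda - 0.05))

# Number of active reparameterized coefficients at that lambda
nnz <- sum(beta[, idx] != 0)
```

The same two lines of indexing apply verbatim to a real fit by substituting fit$lambda and fit$beta.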
## cv.hierNest() — cross-validated tuning-parameter selection

In practice, the penalty hyperparameters \(\lambda\), \(\alpha_1\), and \(\alpha_2\) need to be chosen by cross-validation. cv.hierNest() wraps the fitting procedure with cross-validation and returns the optimal parameter combination.
The cvmethod argument controls how \(\alpha_1\) and \(\alpha_2\) are selected:

- "general" — uses a single pair of \(\alpha_1\) and \(\alpha_2\) (supplied as scalars) and runs standard cross-validation over the \(\lambda\) sequence only.
- "grid_search" — searches a grid of \(\alpha_1 \times \alpha_2\) pairs. The ranges are specified via asparse1 and asparse2 (each a length-2 vector giving the lower and upper bound), with asparse1_num and asparse2_num controlling the grid resolution.
- "user_supply" — the user provides explicit paired vectors of \((\alpha_1, \alpha_2)\) values to evaluate.

cv.fit <- cv.hierNest(
x = example_data$X,
y = example_data$Y,
method = "overlapping",
hier_info = example_data$hier_info,
family = "binomial",
partition = "subgroup", # stratify CV folds within subgroups
cvmethod = "grid_search",
asparse1 = c(0.5, 20), # search alpha_1 over [0.5, 20]
asparse2 = c(0.05, 0.20), # search alpha_2 over [0.05, 0.20]
asparse1_num = 3, # 3 x 3 = 9 grid points
asparse2_num = 3,
nlambda = 50, # lambda values per grid point
pred.loss = "ROC" # maximize AUROC
)

Note on partition = "subgroup": This stratifies the cross-validation folds so that each fold contains observations from all DRG subgroups (to the extent possible). This avoids degenerate folds where a small subgroup is entirely absent from the training data.
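What subgroup-stratified folds look like can be illustrated directly. The fold assignment below is hypothetical (the package builds its folds internally when partition = "subgroup"); it simply shows the property the stratification guarantees:

```r
set.seed(1)
drg    <- rep(1:4, each = 50)   # 4 DRG subgroups of 50 observations each
nfolds <- 5

# Assign fold labels within each subgroup, so every fold sees every DRG
foldid <- integer(length(drg))
for (g in unique(drg)) {
  idx <- which(drg == g)
  foldid[idx] <- sample(rep_len(seq_len(nfolds), length(idx)))
}

table(drg, foldid)   # every (DRG, fold) cell is non-empty
```

With unstratified random folds, a subgroup of only a handful of observations could easily fall entirely into one fold and be absent from the corresponding training split.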
cv.fit.user <- cv.hierNest(
x = example_data$X,
y = example_data$Y,
method = "overlapping",
hier_info = example_data$hier_info,
family = "binomial",
partition = "subgroup",
cvmethod = "user_supply",
asparse1 = c(0.5, 1, 5, 10), # explicit (alpha1, alpha2) pairs
asparse2 = c(0.05, 0.1, 0.1, 0.2),
nlambda = 50
)

After running cv.hierNest(), several components of the returned object are of interest.
# Optimal tuning parameters selected by cross-validation
cv.fit$lambda.min # lambda minimising CV loss
cv.fit$min_alpha1 # alpha_1 selected
cv.fit$min_alpha2 # alpha_2 selected
# Number of non-zero coefficients in the reparameterized model
# at the optimal lambda
nnz <- sum(cv.fit$hierNest.fit$beta[,
which(cv.fit$hierNest.fit$lambda == cv.fit$lambda.min)] != 0)
cat("Non-zero reparameterized coefficients:", nnz, "\n")

The plot() method for cv.hierNest objects provides two heatmap visualizations.
Plotting with type = "coefficients" shows the estimated
overall, MDC-specific, and DRG-specific components for each selected
predictor, at the cross-validation optimal \(\lambda\).
Each row of the heatmap corresponds to one level in the hierarchy (overall mean, then each MDC, then each DRG within that MDC). Each column is a selected predictor. Blue cells indicate a positive contribution; red cells indicate negative. Rows with all zeros are suppressed.
Plotting with type = "Subgroup effects" reconstructs the
total effect for each DRG subgroup: \(\hat{\beta}_d = \hat{\mu} + \hat{\eta}^{M(d)} +
\hat{\delta}_d\).
This visualization is useful for comparing risk profiles across subgroups — subgroups with similar colors for a given predictor share similar risk processes for that covariate.
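Both heatmaps are produced from the cross-validated fit; a usage sketch assuming the cv.fit object from above and the two type values named in this section:

```r
# Decomposed components (overall / MDC / DRG) for each selected predictor
plot(cv.fit, type = "coefficients")

# Reconstructed total effects beta_d = mu + eta^{M(d)} + delta_d per DRG
plot(cv.fit, type = "Subgroup effects")
```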
The statistical framework underlying hierNest is described in detail in:
Jiang, Z., Huo, L., Hou, J., Vaughan-Sarrazin, M., Smith, M. A., & Huling, J. D. (2026+). Heterogeneous readmission prediction with hierarchical effect decomposition and regularization.
For participant \(i\) with DRG \(D_i = d\), the logistic regression model is:
\[\text{logit}(\Pr(Y_i = 1 \mid X_i, D_i = d)) = X_i^\top \beta_d\]
The hierNest reparameterization decomposes \(\beta_d\) as:
\[\beta_d = \mu + \eta^{M(d)} + \delta_d\]
This is encoded via a modified design matrix \(X_H\) constructed as the Khatri–Rao (column-wise Kronecker) product of \(X\) and the hierarchy indicator matrix \(H = [\mathbf{1}_n \;;\; H_M \;;\; H_D]\).
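Concretely, each row of \(X_H\) is the Kronecker product of the corresponding rows of \(X\) and \(H\). A toy construction under these definitions (the package builds and stores this matrix internally; the sizes here are illustrative):

```r
# Toy problem: n = 6 observations, p = 2 predictors,
# 3 MDCs with 2 DRGs each (matching the hierarchy illustrated earlier)
n <- 6; p <- 2
set.seed(1)
X   <- matrix(rnorm(n * p), n, p)
mdc <- c(1, 1, 2, 2, 3, 3)
drg <- c(1, 2, 3, 4, 5, 6)

H_M <- model.matrix(~ factor(mdc) - 1)   # n x 3 MDC indicators
H_D <- model.matrix(~ factor(drg) - 1)   # n x 6 DRG indicators
H   <- cbind(1, H_M, H_D)                # H = [1_n ; H_M ; H_D], n x 10

# Row-wise Kronecker product: one expanded row per observation
X_H <- t(sapply(seq_len(n), function(i) kronecker(X[i, ], H[i, ])))
dim(X_H)   # 6 x 20, i.e. n x (p * (1 + #MDC + #DRG))
```

Fitting a penalized regression on \(X_H\) estimates the stacked \((\mu, \eta, \delta)\) components directly, since \(X_i^\top \beta_{D_i} = (X_H)_i^\top \theta\).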
The overlapping group lasso penalty on the grouped coefficient vector \(\theta_j = \{\mu_j, \{\eta_{Mj}\}_{M}, \{\delta_{dj}\}_d\}\) for the \(j\)-th predictor is:
\[P_{\text{og}}(\theta) = \lambda \sum_{j=1}^{p} \left[ \|\theta_j\|_2 + \sum_{M \in \mathcal{M}} \left(\alpha_1 \|\theta_{Mj}\|_2 + \alpha_2 |\eta_{Mj}| \right) \right]\]
This penalty enforces a hierarchical zero pattern: \(\mu_j = 0\) implies all \(\eta_{Mj} = 0\) and \(\delta_{dj} = 0\); \(\eta_{Mj} = 0\) implies all \(\delta_{dj} = 0\) for \(d \in M\).
For method = "overlapping", the
majorization-minimization (MM) algorithm described in Algorithm 1 of the
paper is used, with sequential strong rules to skip inactive predictor
groups and accelerate computation. The design matrix is stored as a
sparse matrix to minimise memory requirements.
sessionInfo()
#> R version 4.3.2 (2023-10-31 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 11 x64 (build 26200)
#>
#> Matrix products: default
#>
#>
#> locale:
#> [1] LC_COLLATE=C
#> [2] LC_CTYPE=Chinese (Simplified)_China.utf8
#> [3] LC_MONETARY=Chinese (Simplified)_China.utf8
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=Chinese (Simplified)_China.utf8
#>
#> time zone: America/Chicago
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] hierNest_1.0.0
#>
#> loaded via a namespace (and not attached):
#> [1] Matrix_1.6-5 gtable_0.3.6 jsonlite_1.8.8 dplyr_1.1.4
#> [5] compiler_4.3.2 tidyselect_1.2.1 tidyr_1.3.1 jquerylib_0.1.4
#> [9] scales_1.4.0 yaml_2.3.8 fastmap_1.1.1 lattice_0.22-5
#> [13] ggplot2_4.0.0 R6_2.6.1 generics_0.1.4 knitr_1.51
#> [17] htmlwidgets_1.6.4 dotCall64_1.1-1 tibble_3.2.1 bslib_0.7.0
#> [21] pillar_1.11.1 RColorBrewer_1.1-3 rlang_1.1.6 cachem_1.0.8
#> [25] xfun_0.56 sass_0.4.9 S7_0.2.0 lazyeval_0.2.2
#> [29] otel_0.2.0 viridisLite_0.4.2 plotly_4.10.4 cli_3.6.2
#> [33] magrittr_2.0.3 digest_0.6.34 grid_4.3.2 rstudioapi_0.16.0
#> [37] rTensor_1.4.8 lifecycle_1.0.4 vctrs_0.6.5 evaluate_0.23
#> [41] glue_1.8.0 data.table_1.17.8 farver_2.1.1 purrr_1.0.2
#> [45] rmarkdown_2.30 httr_1.4.7 tools_4.3.2 pkgconfig_2.0.3
#> [49] htmltools_0.5.8.1