hcruR

R-CMD-check Codecov test coverage GitHub version License: MIT Lifecycle: experimental

hcruR is an R package to help health economists and RWE analysts estimate and compare healthcare resource utilization (HCRU) from observational healthcare data, such as claims or electronic health records.


🚀 Features


📥 Installation

You can install the development version of hcruR from GitHub with:

# Install from GitHub (after you upload the repo)
# install.packages("devtools")
devtools::install_github("mumbarkar/hcruR")

# install.packages("pak")
pak::pak("mumbarkar/hcruR")

đź§Ş Example Usage

This is a basic example which shows you how to solve a common problem:

This is a basic example which shows you how to solve a common problem:

# Load library
library(hcruR)

## Generate HCRU summary using dplyr (this can be used for create HCRU plots)

# Load sample data
data(hcru_sample_data)
data <- hcru_sample_data
head(hcru_sample_data)

# Estimate HCRU
hcru_summary <- estimate_hcru(data,
                          cohort_col = "cohort",
                          patient_id_col = "patient_id",
                          admit_col = "admission_date",
                          discharge_col = "discharge_date",
                          index_col = "index_date",
                          visit_col = "visit_date",
                          encounter_id_col = "encounter_id",
                          setting_col = "care_setting",
                          cost_col = "cost_usd",
                          readmission_col = "readmission",
                          time_window_col = "period",
                          los_col = "length_of_stay",
                          custom_var_list = NULL,
                          pre_days = 180,
                          post_days = 365,
                          readmission_days_rule = 30,
                          group_var_main = "cohort",
                          group_var_by = "care_setting",
                          test = NULL,
                          timeline = "Pre",
                          gt_output = FALSE)

hcru_summary

## Generate HCRU summary using gtsummary (a publication ready output) 

# Estimate HCRU
hcru_summary_gt <- estimate_hcru(data,
                          cohort_col = "cohort",
                          patient_id_col = "patient_id",
                          admit_col = "admission_date",
                          discharge_col = "discharge_date",
                          index_col = "index_date",
                          visit_col = "visit_date",
                          encounter_id_col = "encounter_id",
                          setting_col = "care_setting",
                          cost_col = "cost_usd",
                          readmission_col = "readmission",
                          time_window_col = "period",
                          los_col = "length_of_stay",
                          custom_var_list = NULL,
                          pre_days = 180,
                          post_days = 365,
                          readmission_days_rule = 30,
                          group_var_main = "cohort",
                          group_var_by = "care_setting",
                          test = NULL,
                          timeline = "Pre",
                          gt_output = TRUE)

hcru_summary_gt

## Generate the HCRU plot for average visits by cohort and time-line
# Calculate the average visits
sum_df1 <- hcru_summary$`Summary by settings using dplyr` |>
    dplyr::group_by(
      .data[["time_window"]], 
      .data[["cohort"]], 
      .data[["care_setting"]]) |>
    dplyr::summarise(
      AVG_VISIT = mean(.data[["Visits"]], na.rm = TRUE), .groups = "drop")

# Load the plot_hcru function
p1 <- plot_hcru(
  summary_df = sum_df1,
  x_var = "time_window",
  y_var = "AVG_VISIT",
  cohort_col = "cohort",
  facet_var = "care_setting",
  facet_var_n = 3,
  title = "Average visits by domain and cohort",
  x_label = "Healthcare Setting (Domain)",
  y_label = "Average visit",
  fill_label = "Cohort"
)

p1


## Generate HCRU plot for average cost by cohort and timeline
# Calculate the total cost
df2 <- hcru_summary$`Summary by settings using dplyr` |>
    dplyr::group_by(
      .data[["time_window"]], 
      .data[["cohort"]], 
      .data[["care_setting"]]) |>
    dplyr::summarise(
      AVG_COST = sum(.data[["Cost"]], na.rm = TRUE), .groups = "drop")

p2 <- plot_hcru(
  summary_df = df2,
  x_var = "time_window",
  y_var = "AVG_COST",
  cohort_col = "cohort",
  facet_var = "care_setting",
  facet_var_n = 3,
  title = "Average cost by domain and cohort",
  x_label = "Healthcare Setting (Domain)",
  y_label = "Average cost",
  fill_label = "Cohort"
)

p2

đź§ľ Function Reference

estimate_hcru() estimates of healthcare resource utilization (HCRU) from electronic health record data across various care settings (e.g., IP, OP, ED/ER). It provides descriptive summaries of patient counts, encounters, costs, length of stay, and readmission rates for pre- and post-index periods

Arguments

Argument Type Description
data data.frame Input claims dataset
cohort_col character Column name for cohort group
patient_id_col character Column name for patient ID
admit_col character Admission/start date column
discharge_col character Discharge/end date column
index_col character Index or anchor date for each patient
visit_col character Visit or claim date
encounter_id_col character Encounter or claim ID
setting_col character Setting type (e.g., “IP”, “OP”, “ED”)
cost_col character Column for cost data
readmission_col character Readmission indicator column
time_window_col character “Pre”/“Post” period column
los_col character Length of stay column
custom_var_list character Additional user-defined metrics (optional)
pre_days numeric Days before index date (default = 180)
post_days numeric Days after index date (default = 365)
readmission_days_rule numeric Max days for qualifying readmission (default = 30)
group_var_main character Main grouping variable (default = “cohort”)
group_var_by character Secondary grouping variable (e.g., “care_setting”)
test list Named list of tests for continuous vars (e.g., list(cost = "wilcox.test"))
timeline character Time window label (e.g., “Pre”, “Post”)
gt_output logical Whether to return a formatted gtsummary output (default = TRUE)
... ... Additional arguments for gtsummary::tbl_summary() if gt_output = TRUE
return_type character Type of output to return: “dplyr” for dplyr summary, “gtsummary” for gtsummary output (default = “dplyr”)

plot_hcru() provides the visualization of the events of the settings/domains grouped by cohort and time window.

Arguments:

Argument Type Description
summary_df dataframe Output from estimate_hcru() function.
x_var character Column name to plot on the x-axis (default "period").
y_var character Column name to plot on the y-axis (default "Cost").
cohort_col character Name of the column identifying cohorts (default "cohort").
facet_var character Column to generate subplots for (default "care_setting").
facet_var_n numeric Number of columns in the facet grid (default 3).
title character Title of the plot.
x_lable character Label for the x-axis.
y_lable character Label for the y-axis.
fill_lable character Label for the fill legend.

📊 Sample Data

This package includes a demo datasets for easy testing:

head(hcru_sample_data)

📚 Vignette

Run the following to access the full walkthrough:

vignette("hcru-analysis", package = "hcruR")

🔬 Use Cases


🛠️ Development

To contribute:

git clone https://github.com/mumbarkar/hcruR.git
cd hcruR

📜 License

This package is licensed under the MIT License.