--- title: "5. Advanced Hyperparameter Tuning & Grid Search" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{5. Advanced Hyperparameter Tuning & Grid Search} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ## Introduction A single machine learning configuration is rarely optimal for every dataset. A Random Forest with deep trees might overfit on small data, while shallow trees might underfit on complex data. In the Super Learner framework, we handle "tuning" by adding multiple versions of the same algorithm to our library, each with different hyperparameters. The meta-learner automatically evaluates them via cross-validation and assigns the highest weight to the best configuration. `SuperSurv` provides the `create_grid` function to automate this process entirely. ## 1. Prepare the Data ```{r setup, message=FALSE, warning=FALSE} library(SuperSurv) library(survival) data("metabric", package = "SuperSurv") set.seed(123) train_idx <- sample(1:nrow(metabric), 0.7 * nrow(metabric)) train <- metabric[train_idx, ] test <- metabric[-train_idx, ] X_tr <- train[, grep("^x", names(metabric))] X_te <- test[, grep("^x", names(metabric))] new.times <- seq(50, 200, by = 25) ``` ## 2. Defining the Search Space Let's assume we want to tune a Random Survival Forest (`surv.rfsrc`). Instead of writing manual wrapper functions for every combination of parameters, we define a list of the values we want to test. ```{r, message=FALSE, warning=FALSE} # Define our hyperparameter search space rf_tuning_params <- list( nodesize = c(3, 15, 30), # Deep vs. Shallow trees ntree = c(100, 500) # Number of trees in the forest ) ``` ## 3. Generating the Tuning Grid We pass our base learner name and parameter list into `create_grid()`. The function will automatically generate unique R functions for all 6 combinations in your global environment and return their names! ```{r create-grid} # Automatically generate 6 different Random Forest wrappers rf_grid <- create_grid( base_learner = "surv.rfsrc", grid_params = rf_tuning_params ) # View the dynamically generated function names print(rf_grid) ``` ## 4. Fitting the Tuned Ensemble You can now pass this dynamically generated character vector directly into your `SuperSurv` library. We can combine our tuned ML grid with a baseline Cox model to ensure we always have a linear fallback. ```{r fit-tuned-model, eval=FALSE} # Combine baseline with our 6 tuned forests my_ml_library <- c("surv.coxph", rf_grid) # Fit the ensemble (pseudo-code) fit_tuned <- SuperSurv( time = train$duration, event = train$event, X = X_tr, newdata = X_te, new.times = new.times, event.library = my_ml_library, cens.library = c("surv.coxph"), nFolds = 3 ) # Inspect which hyperparameter combination won! round(fit_tuned$event.coef, 3) ``` By leveraging `create_grid()`, you can easily execute a massive hyperparameter search space across XGBoost, SVMs, or Random Forests, mathematically guaranteeing that your final ensemble minimizes the cross-validated risk without any manual trial-and-error!