---
title: "6. Machine Learning with Random Survival Forests"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{6. Machine Learning with Random Survival Forests}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5
)
```

## Introduction

While classical models such as the Cox proportional hazards model are highly interpretable, they rely on the strict assumption of linear, additive covariate effects. Random Survival Forests (RSF) relax this assumption by building a non-linear ensemble of decision trees.

In this tutorial, we demonstrate how to fit a standalone RSF using `SuperSurv`'s unified wrappers, evaluate its individual performance, and finally include it in a Super Learner ensemble.

## 1. Prepare the Data

```{r setup, message=FALSE, warning=FALSE}
library(SuperSurv)
library(survival)

data("metabric", package = "SuperSurv")

set.seed(123)
train_idx <- sample(1:nrow(metabric), 0.7 * nrow(metabric))
train <- metabric[train_idx, ]
test  <- metabric[-train_idx, ]

X_tr <- train[, grep("^x", names(metabric))]
X_te <- test[, grep("^x", names(metabric))]

new.times <- seq(50, 200, by = 25)
```

## 2. Standalone Machine Learning

`SuperSurv` provides unified wrappers that automatically handle model training and the standardization of survival probabilities. You can use them completely independently of the Super Learner ensemble.

```{r standalone-rf}
# 1. Fit the standalone wrapper
rf_standalone <- surv.rfsrc(
  time = train$duration,
  event = train$event,
  X = X_tr,
  new.times = new.times
)

# 2.
# Extract the fitted model object and prediction matrix
rf_fit <- rf_standalone$fit
rf_pred_matrix <- rf_standalone$pred
```

Because our plotting functions are universally compatible, we can plot individual patient curves directly from this standalone matrix:

```{r plot-standalone, fig.align='center'}
# Plot the first 3 patients in our training set
plot_predict(preds = rf_pred_matrix, eval_times = new.times, patient_idx = 1:3)
```

## 3. Evaluating the Standalone Model

We can also pass this standalone model directly into our evaluation suite to test its performance on new data.

```{r eval-standalone, fig.align='center', fig.height=4, fig.width=9}
# The function automatically detects that this is a single model and plots it
plot_benchmark(
  object = rf_fit,
  newdata = X_te,
  time = test$duration,
  event = test$event,
  eval_times = new.times
)
```

Calibration can be inspected the same way (left commented out here):

```{r}
# plot_calibration(
#   object = rf_fit,
#   newdata = X_te,
#   time = test$duration,
#   event = test$event,
#   eval_time = 150,
#   bins = 2
# )
```

## 4. Train the Benchmark Ensemble

While the standalone RSF is powerful, we can objectively assess whether it outperforms classical models by combining them in a `SuperSurv` ensemble.

```{r fit-models, results='hide', message=FALSE, warning=FALSE}
my_library <- c("surv.coxph", "surv.weibull", "surv.rfsrc")

fit_supersurv <- SuperSurv(
  time = train$duration,
  event = train$event,
  X = X_tr,
  newdata = X_te,
  new.times = new.times,
  event.library = my_library,
  cens.library = c("surv.coxph"),
  control = list(saveFitLibrary = TRUE),
  verbose = FALSE,
  nFolds = 3
)
```

## 5. Visualizing Ensemble vs. Base Learners

When we pass the `SuperSurv` ensemble into the exact same benchmark function, it automatically unpacks the library and plots the ensemble against all of its constituent models.
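Conceptually, the ensemble curve in this comparison is a convex combination of the base learners' survival-probability matrices (rows are patients, columns are evaluation times). Here is a minimal base-R sketch of that combination step; the weights below are invented purely for illustration, since `SuperSurv` estimates the actual weights internally by cross-validation:

```r
# Weighted average of base-learner survival matrices.
# NOTE: the weights are hypothetical; SuperSurv estimates them by CV.
combine_learners <- function(pred_list, weights) {
  stopifnot(abs(sum(weights) - 1) < 1e-8)  # convex combination
  Reduce(`+`, Map(`*`, pred_list, weights))
}

# Toy example: two learners, 2 patients x 3 evaluation times
p_cox <- matrix(c(0.90, 0.80, 0.70,
                  0.95, 0.85, 0.75), nrow = 2, byrow = TRUE)
p_rsf <- matrix(c(0.88, 0.75, 0.60,
                  0.97, 0.90, 0.80), nrow = 2, byrow = TRUE)

ens <- combine_learners(list(p_cox, p_rsf), weights = c(0.4, 0.6))
ens[1, 1]  # 0.4 * 0.90 + 0.6 * 0.88 = 0.888
```

Because the weights are non-negative and sum to one, the combined matrix is guaranteed to stay within [0, 1], so it remains a valid survival-probability matrix.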
```{r plot-ensemble-benchmark, fig.align='center', fig.height=9}
plot_benchmark(
  object = fit_supersurv,
  newdata = X_te,
  time = test$duration,
  event = test$event,
  eval_times = new.times
)
```

By incorporating advanced machine learning algorithms into your library, you let the final predictions adapt to the complexity of your data. The Super Learner's oracle inequality guarantees that, asymptotically, the cross-validated ensemble performs at least as well as the best single learner in the library, so in practice its Brier score is typically as good as, or better than, that of any individual candidate model.
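The Brier score tracked by these benchmark plots can be sketched in a few lines of base R. The version below evaluates a single time point and, for simplicity, simply drops subjects censored before that time; a proper evaluation (as a tool like `plot_benchmark` would presumably use) reweights with inverse-probability-of-censoring weights (IPCW) instead:

```r
# Time-dependent Brier score at evaluation time t: the mean squared
# difference between the predicted survival probability S(t) and the
# observed status (1 = still event-free at t, 0 = event occurred by t).
# Subjects censored before t are dropped here; IPCW would reweight them.
brier_at <- function(surv_prob, time, event, t) {
  event_by_t <- event == 1 & time <= t   # event observed by time t
  at_risk    <- time > t                 # known to be event-free at t
  keep       <- event_by_t | at_risk     # excludes those censored before t
  obs        <- as.numeric(at_risk[keep])
  mean((surv_prob[keep] - obs)^2)
}

# Toy data: 4 subjects with predicted S(t = 100)
time  <- c(80, 120, 150, 90)
event <- c(1, 0, 1, 1)
pred  <- c(0.30, 0.80, 0.70, 0.40)

brier_at(pred, time, event, t = 100)  # 0.095
```

Lower values are better: a model predicting the observed outcomes perfectly scores 0, while an uninformative constant prediction of 0.5 scores 0.25.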