--- title: "11. Extending SuperSurv" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{11. Extending SuperSurv} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include=FALSE} knitr::opts_chunk$set(collapse = TRUE, comment = "#>") ``` This vignette describes the practical contract used by `SuperSurv()` to call survival learners and screening methods. The goal is to make small extensions possible without changing the ensemble algorithm. ## How Learners Are Found Learners are supplied to `SuperSurv()` by name: ```r event.library <- c("surv.coxph", "my.surv.km") ``` During cross-validation and final fitting, `SuperSurv()` looks up each name as an R function and calls it directly. A custom learner can therefore live in a user script, package, or analysis file, as long as it is available in the R session before calling `SuperSurv()`. ## Learner Wrapper Contract A survival learner wrapper should have this shape: ```r my.surv.learner <- function(time, event, X, newdata, new.times, obsWeights, id, ...) { # fit model on X, time, event # predict survival probabilities for newdata at new.times list(pred = pred_matrix, fit = fitted_object) } ``` The required arguments are: | Argument | Meaning | | --- | --- | | `time` | numeric observed follow-up time | | `event` | event indicator, with `1` for the event and `0` for censoring | | `X` | training covariates as a data frame | | `newdata` | covariates for prediction as a data frame | | `new.times` | time grid where survival probabilities are needed | | `obsWeights` | observation weights, possibly `NULL` | | `id` | optional cluster or subject identifier, possibly `NULL` | | `...` | learner-specific tuning parameters | The returned `pred` must be a numeric matrix with: - one row per row of `newdata`, - one column per value of `new.times`, - entries on the survival-probability scale, usually between `0` and `1`. If possible, each row should be non-increasing across time. Wrappers commonly clamp small numerical drift back into `[0, 1]`. The returned `fit` can be any object needed for future prediction. When `control = list(saveFitLibrary = TRUE)`, `SuperSurv()` stores these fitted objects and `predict.SuperSurv()` later calls `predict()` on them. ## Prediction Method Contract If a custom learner should support prediction on new data after fitting, store a classed object in `fit` and define a matching S3 prediction method: ```r predict.my.surv.learner <- function(object, newdata, new.times, ...) { pred_matrix } ``` This method should return the same type of matrix as the wrapper's `pred` component: `nrow(newdata)` rows by `length(new.times)` columns. ## Direct Survival Curves vs Risk Scores Some survival learners directly return survival curves. In that case, the wrapper usually interpolates the native prediction grid to `new.times`. Other learners return a risk score or linear predictor. Those wrappers need one extra calibration step: 1. fit the risk-score model, 2. estimate a baseline cumulative hazard on the training data, 3. predict risk scores for `newdata`, 4. convert scores to survival curves with `S(t | x) = exp(-exp(eta(x)) H0(t))`. Several built-in Cox-style wrappers follow this pattern. The internal helper `safe_breslow_step()` is used by some package wrappers, but it is not exported; custom wrappers should either use a standard package prediction method that already returns survival curves or implement their own baseline-hazard calibration carefully. ## Minimal Custom Learner The following learner ignores covariates and fits a weighted Kaplan-Meier curve. It is intentionally simple, but it satisfies the full `SuperSurv()` learner contract. ```{r custom-learner} my.surv.km <- function(time, event, X, newdata, new.times, obsWeights = NULL, id = NULL, ...) { if (is.null(obsWeights)) { obsWeights <- rep(1, length(time)) } fit_km <- survival::survfit( survival::Surv(time, event) ~ 1, weights = obsWeights ) step_surv <- stats::stepfun(fit_km$time, c(1, fit_km$surv), right = FALSE) surv_probs <- step_surv(new.times) pred <- matrix( surv_probs, nrow = nrow(newdata), ncol = length(new.times), byrow = TRUE ) fit <- list(object = fit_km) class(fit) <- "my.surv.km" list(pred = pred, fit = fit) } predict.my.surv.km <- function(object, newdata, new.times, ...) { fit_km <- object$object step_surv <- stats::stepfun(fit_km$time, c(1, fit_km$surv), right = FALSE) surv_probs <- step_surv(new.times) matrix( surv_probs, nrow = nrow(newdata), ncol = length(new.times), byrow = TRUE ) } ``` You can test the wrapper outside the ensemble before adding it to a library: ```{r custom-learner-check} data("metabric", package = "SuperSurv") dat <- metabric[1:40, ] x_cols <- grep("^x", names(dat))[1:3] X <- dat[, x_cols, drop = FALSE] times <- seq(50, 150, by = 50) wrapper_out <- my.surv.km( time = dat$duration, event = dat$event, X = X, newdata = X[1:5, , drop = FALSE], new.times = times ) dim(wrapper_out[["pred"]]) dim(predict(wrapper_out[["fit"]], newdata = X[1:5, , drop = FALSE], new.times = times)) ``` The custom learner can be mixed with built-in learners: ```{r custom-fit, eval=FALSE} fit <- SuperSurv( time = train$duration, event = train$event, X = X_train, newdata = X_test, new.times = seq(50, 300, by = 50), event.library = c("surv.coxph", "my.surv.km"), cens.library = c("surv.coxph"), control = list(saveFitLibrary = TRUE) ) event_weights(fit) ``` ## Screening Method Contract Screening methods are also supplied by name. A screener should accept the same basic training inputs and return a logical vector with one value per column of `X`: ```r my.screen.first3 <- function(time, event, X, obsWeights = NULL, id = NULL, ...) { keep <- rep(FALSE, ncol(X)) keep[seq_len(min(3, ncol(X)))] <- TRUE names(keep) <- names(X) keep } ``` Use a list entry to pair a learner with one or more screeners: ```{r custom-screener, eval=FALSE} fit <- SuperSurv( time = train$duration, event = train$event, X = X_train, newdata = X_test, new.times = seq(50, 300, by = 50), event.library = list(c("surv.coxph", "screen.all", "my.screen.first3")), cens.library = c("surv.coxph") ) selected_variables(fit, learner = 2) ``` ## Development Checklist Before using a custom wrapper in a large ensemble, check: - `pred` is a numeric matrix with dimensions `nrow(newdata)` by `length(new.times)`, - predictions are on the survival-probability scale, - `predict(wrapper_out[["fit"]], newdata, new.times)` returns the same shape when saved fits are needed, - the wrapper tolerates `obsWeights = NULL` and `id = NULL`, - all required packages are checked with `requireNamespace()`, - any screener returns a logical vector aligned with the columns of `X`. These rules are deliberately small. They allow new learners to be added without changing the SuperSurv ensemble machinery.