--- title: "Cross-validation with the landmaRk package" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Cross-validation with the landmaRk package} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ## Setup In addition to the `landmaRk` package, we will also use `tidyverse`. ```{r setup} set.seed(123) library(landmaRk) library(tidyverse) library(survival) library(prodlim) ``` ## Example: `aids` data As in the first vignette, we use the epileptic dataset to perform landmarking analysis of time-to-event data with time-varying covariates. Here is the structure of the dataset. ```{r} library(JMbayes2) data("aids") aids$patient <- as.numeric(aids$patient) str(aids) ``` ## Initialising the landmarking analysis for cross-validation As in the introductory vignette, the dataset `epileptic` comes in wide format. We split it into two dataframes, one for static covariates and one for dynamic covariates. ```{r} # DF with Static covariates aids_dfs <- split_wide_df( aids, ids = "patient", times = "obstime", static = c("Time", "death", "drug", "gender", "prevOI"), dynamic = c("CD4"), measurement_name = "value" ) static <- aids_dfs$df_static head(static) ``` ```{r} # DF with Dynamic covariates dynamic <- aids_dfs$df_dynamic head(dynamic[["CD4"]]) ``` As in the introductory vignette, we create an object of class `LandmarkAnalysis`, using the helper function of the same name. We now use the optional argument `K` to specify the number of cross-validations folds. In this example, we request five cross validation folds. We then calculate the risk sets using `compute_risk_sets()`. ```{r} landmarking_object <- LandmarkAnalysis( data_static = static, data_dynamic = dynamic, event_indicator = "death", ids = "patient", event_time = "Time", times = "obstime", measurements = "value", K = 5 ) landmarking_object <- landmarking_object |> compute_risk_sets(landmarks = c(6, 8)) ``` ## Performance evaluation in hold-out dataset Now that we have split the dataset into `K=5` parts for cross-validation, we can use one of the five parts as test set and the remaining four parts as the training set. To do so, use the argument `validation_fold` to indicate the that you want to use as test set when calling `fit_longitudinal()`, `predict_longitudinal()`, `fit_survival()` and `predict_survival()`. ```{r} landmarking_object <- landmarking_object |> fit_longitudinal( landmarks = c(6, 8), method = "lme4", formula = value ~ prevOI + obstime + (obstime | patient), dynamic_covariates = c("CD4"), validation_fold = 5 ) |> predict_longitudinal( landmarks = c(6, 8), method = "lme4", dynamic_covariates = c("CD4"), validation_fold = 5, ) |> fit_survival( formula = Surv(event_time, event_status) ~ drug, landmarks = c(6, 8), horizons = 12 + c(6, 8), method = "coxph", dynamic_covariates = c("CD4"), validation_fold = 5 ) |> predict_survival( landmarks = c(6, 8), horizons = 12 + c(6, 8), validation_fold = 5 ) ``` We can also use `summary()` to print parameter estimates. Note that this time the sample size is smaller, because we have held out part of the original dataset for validation. ```{r} summary(landmarking_object, type = "longitudinal", landmark = 6, dynamic_covariate = "CD4") ``` ```{r} summary(landmarking_object, type = "survival", landmark = 6, horizon = 18) ``` Here are the in-sample performance metrics: ```{r} performance_metrics( landmarking_object, landmarks = c(6, 8), horizons = c(18, 20), auc_t = TRUE, c_index = FALSE, h_times = c(3, 6, 12) ) ``` Out-of-sample performance metrics can be obtained by specifying `train = FALSE`: ```{r} performance_metrics( landmarking_object, landmarks = c(6, 8), horizons = c(18, 20), auc_t = TRUE, c_index = FALSE, h_times = c(3, 6, 12), train = FALSE ) ``` ## Cross-validation Now, we can embed the above pipeline in a for loop to perform cross-validation: ```{r} landmarking_object <- LandmarkAnalysis( data_static = static, data_dynamic = dynamic, event_indicator = "death", ids = "patient", event_time = "Time", times = "obstime", measurements = "value", K = 5 ) landmarking_object <- landmarking_object |> compute_risk_sets(landmarks = c(6, 8)) ``` ```{r} metrics <- list() for (k in 1:5) { metrics[[k]] <- landmarking_object |> fit_longitudinal( landmarks = c(6, 8), method = "lme4", formula = value ~ prevOI + obstime + (obstime | patient), dynamic_covariates = c("CD4"), validation_fold = k ) |> predict_longitudinal( landmarks = c(6, 8), method = "lme4", dynamic_covariates = c("CD4"), validation_fold = k, allow.new.levels = TRUE ) |> fit_survival( formula = Surv(event_time, event_status) ~ drug, landmarks = c(6, 8), horizons = 12 + c(6, 8), method = "coxph", dynamic_covariates = c("CD4"), validation_fold = k ) |> predict_survival( landmarks = c(6, 8), horizons = 12 + c(6, 8), validation_fold = k ) |> performance_metrics( landmarks = c(6, 8), horizons = c(18, 20), auc_t = TRUE, brier = TRUE, c_index = FALSE, h_times = c(3, 6, 12) ) } metrics ```