Generating unit test data

A data set drawn from a known distribution is useful for testing novel modeling approaches, because model output can be checked against the parameters that generated the data. The sample data set generated here is the one used for unit testing the spect package.


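library(spect)
library(dplyr)  # provides select()

# Fix the RNG seed so the synthetic data is reproducible
rng_seed <- 42
set.seed(rng_seed)

syn_data <- create_synthetic_data(sample_size = 2500,
                                  censor_percentage = 0.1,
                                  perturbation_shift = 6)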
#> INFO [2025-04-06 20:26:02] Creating 2500 income samples from normal distribution of median 50000, variance 10000 
#>             and watchtimes samples from uniform distribution with min: 0 and max: 6
                      

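# Drop the baseline columns, which are not used as modeling inputs
source_data <- select(syn_data, -c(baseline_time_to_cancel, perturbed_baseline))

# Hold out the first 10 rows for prediction; use the rest for modeling
predict_data <- source_data[1:10, ]
modeling_data <- source_data[11:nrow(source_data), ]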
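
The benefit of a known generating distribution is that a test can assert against it directly. The following is a minimal base R sketch of that idea, not the spect implementation: the `toy` data frame and its column names are hypothetical, and the parameters are taken from the log message above (a normal distribution centred on 50000 with variance 10000, and a uniform distribution on 0 to 6).

```r
# Hypothetical stand-in for the covariates named in the log message;
# this does not reproduce what create_synthetic_data() does internally.
set.seed(42)
n <- 2500

toy <- data.frame(
  income     = rnorm(n, mean = 50000, sd = sqrt(10000)),  # variance 10000, so sd is 100
  watchtimes = runif(n, min = 0, max = 6)
)

# Same hold-out pattern as above: first 10 rows for prediction, the rest for modeling.
toy_predict  <- toy[1:10, ]
toy_modeling <- toy[11:nrow(toy), ]

# Because the generating parameters are known, a unit test can check them.
stopifnot(nrow(toy_predict) == 10, nrow(toy_modeling) == n - 10)
stopifnot(all(toy$watchtimes >= 0 & toy$watchtimes <= 6))
stopifnot(abs(mean(toy$income) - 50000) < 10)  # standard error of the mean is 2
```

The tolerance on the mean is deliberately loose (five standard errors), so the check stays stable if the seed changes.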

Training the model then becomes a straightforward call to spect_train.


# Column names in modeling_data for the event flag and the survival time
event_indicator_var <- "cancel_event_detected"
survival_time_var <- "total_months"
obs_window <- 48  # length of the observation window
alg <- "glm"

result <- spect_train(model_algorithm = alg, modeling_data = modeling_data,
                      event_indicator_var = event_indicator_var,
                      survival_time_var = survival_time_var,
                      obs_window = obs_window, use_parallel = FALSE)
#> INFO [2025-04-06 20:26:02] Splitting test/train data at 0.200000/0.800000...
#> INFO [2025-04-06 20:26:02] Creating person-period data set...
#> INFO [2025-04-06 20:26:03] Creating caret train control...
#> INFO [2025-04-06 20:26:03] Training glm using repeatedcv method with 10 resamples and 3 kfold repeats
#> INFO [2025-04-06 20:26:05] Calculating probabilities on 498 test individuals...
#> INFO [2025-04-06 20:26:05] Transforming 498 rows of data
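
The log above mentions creating a person-period data set. As a rough illustration of that transformation in general (a sketch in base R, not the code spect runs internally), each individual is expanded into one row per observed period, and the event flag is set only in the final period of an individual who experienced the event. The `people` data frame and all column names here are hypothetical.

```r
# Toy input: one row per individual (hypothetical column names).
people <- data.frame(
  id    = c(1, 2),
  time  = c(3, 2),  # number of periods each individual was observed
  event = c(1, 0)   # 1 = event observed, 0 = censored
)

# Repeat each individual's row once per observed period.
pp <- people[rep(seq_len(nrow(people)), times = people$time), ]
pp$period <- sequence(people$time)  # 1, 2, 3 for id 1; 1, 2 for id 2
rownames(pp) <- NULL

# The event occurs only in the last period of an individual who had the event.
pp$event_in_period <- as.integer(pp$event == 1 & pp$period == pp$time)
```

With this layout, a binary classifier such as a logistic regression can be fit to `event_in_period`, which is one common way to frame discrete-time survival modeling.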
