---
title: "Getting Started with bayespmtools"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started with bayespmtools}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## What is bayespmtools?

The **bayespmtools** package helps you determine sample sizes for external validation studies of risk prediction models using a Bayesian approach. Unlike traditional methods that require fixed performance values, this package allows you to incorporate uncertainty about model performance into your calculations.

## Why use a Bayesian approach?

Traditional sample size calculations require you to specify exact values for metrics such as the c-statistic or calibration slope. In reality, we are uncertain about these values. The Bayesian approach in **bayespmtools** lets you:

- Express uncertainty about model performance using probability distributions
- Calculate sample sizes based on expected precision or assurance levels
- Incorporate Value of Information (VoI) analysis to assess clinical utility

## Quick Example

Let's walk through a simple example. Suppose you are planning to externally validate a risk prediction model and have some prior information about its likely performance.
```{r setup}
library(bayespmtools)

set.seed(123) # Set seed for reproducibility
```

### Step 1: Specify Your Evidence

First, define what you know (or believe) about the model's performance using probability distributions:

```{r}
evidence <- list(
  prev ~ beta(116, 155),          # Outcome prevalence
  cstat ~ beta(3628, 1139),       # C-statistic
  cal_mean ~ norm(-0.009, 0.125), # Mean calibration error
  cal_slp ~ norm(0.995, 0.024)    # Calibration slope
)
```

**What this means:**

- `prev`: Outcome prevalence
- `cstat`: c-statistic (discrimination)
- `cal_mean`: Mean calibration error (the difference between average observed and expected risks)
- `cal_slp`: Expected calibration slope (from a logistic model regressing the observed outcome on logit-transformed predicted risks)

You can parameterize distributions flexibly using means and SDs, confidence interval bounds, or natural parameters.

### Step 2: Define Your Targets

Next, specify the precision you want to achieve. We evaluate sample size under three types of rules:

- Targeting the expected 95% CI width for the c-statistic, the observed-to-expected outcome ratio (cal_oe), and the calibration slope (cal_slp).
- Targeting 90% assurance on the calibration slope. That is, we want to be 90% confident that the calibration slope's CI width will be no greater than a maximum tolerable value.
- Using a Value of Information (VoI) criterion for net benefit: we want this validation study to reduce the net benefit loss due to uncertainty by 90%.
```{r}
targets <- list(
  eciw.cstat = 0.1,            # Expected CI width for c-statistic
  eciw.cal_oe = 0.22,          # Expected CI width for O/E ratio
  eciw.cal_slp = 0.30,         # Expected CI width for calibration slope
  qciw.cal_slp = c(0.9, 0.35), # 90% assurance that CI width ≤ 0.35
  voi.nb = 0.90
)
```

### Step 3: Calculate Sample Size

Now run the main calculation:

```{r eval=FALSE}
results <- bpm_valsamp(
  evidence = evidence,
  targets = targets,
  n_sim = 1000,   # Number of Monte Carlo simulations
  threshold = 0.2 # Risk threshold for net benefit calculations
)
```

NOTE: 1,000 Monte Carlo simulations (`n_sim`) is low and is used here only for convenience. In practice, use at least 10,000 simulations and check the stability of the results under different random seeds.

```{r include=FALSE}
# For vignette building speed, we'll use pre-computed results
# In practice, run the code above
results <- list(results = c(
  eciw.cstat = 347,
  eciw.cal_oe = 430,
  eciw.cal_slp = 1037,
  qciw.cal_slp = 896,
  voi.nb = 717
))
```

### Step 4: View Results

```{r}
print(results$results)
```

The output shows the required sample size for each criterion. The largest sample size (1037) ensures that all targets are met. However, the VoI criterion indicates that a sample size of 717 is expected to reduce the uncertainty-related loss in clinical utility by 90%, which could be grounds for relaxing the calibration-slope criteria.
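Before settling on targets, it can help to sanity-check what the evidence distributions actually imply. The base-R sketch below (independent of **bayespmtools**) computes the mean and approximate 95% interval implied by the c-statistic prior `beta(3628, 1139)` used above, and shows a method-of-moments conversion for when prior knowledge is instead expressed as a mean and SD rather than natural parameters. The helper `beta_from_moments()` is illustrative, not part of the package.

```r
# Sanity-check the c-statistic prior beta(3628, 1139) used in `evidence`
a <- 3628
b <- 1139

prior_mean <- a / (a + b)                      # implied mean c-statistic, ~0.761
prior_sd   <- sqrt(a * b / ((a + b)^2 * (a + b + 1)))
prior_ci   <- qbeta(c(0.025, 0.975), a, b)     # implied 95% interval

# Method-of-moments: recover beta shape parameters from a mean and SD.
# For a Beta(a, b), m = a/(a+b) and s^2 = m(1-m)/(a+b+1), so k = a+b.
beta_from_moments <- function(m, s) {
  k <- m * (1 - m) / s^2 - 1
  c(shape1 = m * k, shape2 = (1 - m) * k)
}

# Round-trips back to (approximately) the original shapes 3628 and 1139
beta_from_moments(prior_mean, prior_sd)
```

With shape parameters this large the prior is tight (SD of about 0.006), which is why a demanding expected CI width such as 0.1 for the c-statistic is plausible here.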
## Next Steps

For more advanced usage, see the full tutorial vignette:

```{r eval=FALSE}
vignette("bayespmtools_tutorial")
```

This covers:

- Working with different distribution types
- Net benefit and Value of Information analysis
- Precision calculations for fixed sample sizes
- A real-world case study

## Key Functions

- `bpm_valsamp()`: Calculate the required sample size given precision/VoI targets
- `bpm_valprec()`: Calculate precision/VoI given a fixed sample size

## Getting Help

For detailed documentation on any function:

```{r eval=FALSE}
?bpm_valsamp
?bpm_valprec
```

Visit the package repository: https://github.com/resplab/bayespmtools

## References

For methodological details, see:

Sadatsafavi M, et al. (2025). Bayesian sample size considerations for external validation of risk prediction models. *Statistics in Medicine*. doi:10.1002/sim.70389