---
title: "BayesPMTools Package Tutorial"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{bayespmtools_tutorial}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: references.bib
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(bayespmtools)
```

## Introduction

Current sample size calculations for external validation of risk prediction models require researchers to specify fixed values of assumed model performance metrics alongside target precision levels. A Bayesian framework instead enables researchers to 1) incorporate uncertainty about model performance into sample size calculations, and 2) specify targets based on expected precision, assurance probabilities, and Value of Information (VoI). The BayesPMTools R package implements our proposed Bayesian extension of Riley's multi-criteria sample size calculations. Details of this approach are provided in @sadatsafavi2026.

The two core functions of this package are ***bpm_valsamp*** and ***bpm_valprec***. In general, ***bpm_valsamp*** is used for sample size calculations given a set of sample size rules, while ***bpm_valprec*** is used in fixed designs: quantifying the anticipated precision for a given set of sample sizes. In this tutorial, we use the ***bpm_valsamp*** function to determine the sample size for an exemplary scenario, and use ***bpm_valprec*** to 1) check the validity of the results and 2) perform a VoI analysis quantifying the expected gain in net benefit across sample size components.

This tutorial assumes the BayesPMTools package is installed on your computer. Please refer to the package documentation for more instructions on how to install and use this package.
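For convenience, a typical installation step is sketched below. This assumes the package is available on CRAN under the name `bayespmtools` (matching the `library()` call above), which is an assumption; if the package is distributed elsewhere, follow its own installation instructions instead.

```r
# Install the package (assumes a CRAN release named 'bayespmtools')
install.packages("bayespmtools")
```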
```{r set-seed}
# Set the random number seed for reproducibility
set.seed(123)
```

### The ISARIC Model

As a case study, we present sample size calculations for the ISARIC 4C model, a risk prediction model for predicting deterioration in patients hospitalized with COVID-19 [@gupta2021isaric]. The investigators used electronic hospital records from all 9 health regions of the UK to develop and validate this model: London was held out for external validation, while the other 8 regions were used to develop the model. We assume we plan to validate this model in the London region.

### Specifying Evidence (Uncertainty Quantification)

The investigators used leave-one-region-out cross-validation to estimate out-of-sample performance metrics of the model. We use this analysis as the source of evidence on model performance for the London region. Under the exchangeability assumption, the performance of the model in the London region is a random draw from the predictive distribution of performance metrics observed across the other 8 regions of the UK.

The Bayesian sample size calculation approach requires specifying prior distributions on 1) outcome prevalence (prev), 2) c-statistic (cstat), 3) calibration slope (cal_slp), and 4) one of the calibration intercept (cal_int), the observed-to-expected (O/E) ratio (cal_oe), or mean calibration (cal_mean, the difference between average observed and predicted risks), which is the one we use for this example. These distributions are derived from a meta-analysis of the internal-external validation results from the original study. Details of this meta-analysis are provided in @sadatsafavi2026. Here, we directly use the predictive distributions from this meta-analysis.

Evidence can be specified in different ways.
Perhaps the most straightforward is a list of R formulas, as below:

```{r load-evidence}
evidence <- list(
  prev~beta(mean=0.428, sd=0.030),
  cstat~beta(mean=0.761, cih=0.773),
  cal_mean~norm(-0.009, 0.125), #mean and SD
  cal_slp~norm(0.995, 0.024)    #mean and SD
)
```

Note the flexibility in specifying the parameters of these distributions. For prevalence, we parameterize the Beta distribution by its mean and SD; for the c-statistic, we use the mean and the upper bound of the 95% confidence interval (i.e., the 97.5th percentile). These values are internally mapped to the parameters of the respective distribution. If parameters are not named, they are taken as the natural parameters of the distribution as defined in R, as in the last two components. Currently recognized distributions are norm (Normal), logitnorm (Logit-normal), probitnorm (Probit-normal), and beta (Beta).

### Sample size targets and rules

Targets of sample size / precision calculations are generally divided into two groups: statistical metrics of model performance, and the net benefit (NB) as a measure of clinical utility. For statistical metrics, the user can choose among the c-statistic (cstat), mean calibration error (cal_mean), O/E ratio (cal_oe), calibration intercept (cal_int), and calibration slope (cal_slp). For our example, we use the same targets as chosen in @riley2021minsample. The target 95% confidence interval widths are as follows: c-statistic: 0.10; O/E ratio: 0.22; calibration slope: 0.30.

For these metrics, two types of rules can be defined: those that target the expected CI width (ECIW), and those that target a given quantile of the CI width (QCIW). The former can be seen as the Bayesian equivalent of the frequentist sample size calculation in @riley2021minsample.
The latter is unique to our Bayesian framework and can be used to devise 'assurance-type' rules: determining the sample size that provides a given level of assurance that the CI width will be below a desired target. For our case study, we calculate sample sizes corresponding to the expected value of these metrics, as well as sample sizes that provide 90% assurance that the future CI width will be at most as wide as the targets specified above. Expected confidence interval width (eciw) rules only require a target confidence interval width. Quantile confidence interval width (qciw) rules require the target confidence interval width along with the desired assurance level (quantile); in our example, we use a quantile of 0.9 for 90% assurance.

For net benefit (NB) as a decision-theoretic metric, this framework offers two sample size rules. The first is Optimality Assurance. An Optimality Assurance of 90% indicates that we want to be at least 90% confident that the strategy we declare as optimal (highest NB) in the sample is truly the optimal strategy; the oa.nb parameter is therefore set to 0.9. The second rule is voi.nb, defined as the ratio of the Expected Value of Sample Information (EVSI) at a given sample size to the Expected Value of Perfect Information (EVPI). A ratio of 0.8 means that the planned study is expected to reduce the NB loss due to uncertainty by 80%. In our case study, we do not impose a VoI criterion. Instead, we will later calculate VoI metrics at the sample size components derived for the other targets. This will help us inspect the efficiency of these sample sizes in terms of expected gain in NB.
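To build intuition for the ECIW and QCIW rules, the following package-free sketch simulates both for a single, simple metric: the outcome prevalence. The choice of metric, the candidate sample size `N`, and the use of a Wald interval are illustrative assumptions, not what ***bpm_valsamp*** does internally; the Beta shape parameters are obtained by standard moment matching from the mean and SD specified in the evidence above.

```r
# Illustrative ECIW/QCIW calculation for outcome prevalence
# (assumptions: Wald interval, N = 1000; not the package's internal method)
set.seed(123)
n_sim <- 10000
N <- 1000                               # candidate validation sample size

# Moment-match a Beta distribution to mean = 0.428, SD = 0.030
m <- 0.428; s <- 0.030
k <- m * (1 - m) / s^2 - 1
shape1 <- m * k; shape2 <- (1 - m) * k

prev  <- rbeta(n_sim, shape1, shape2)   # draw 'true' prevalence from the prior
p_hat <- rbinom(n_sim, N, prev) / N     # simulate the estimate in a future sample
ciw   <- 2 * 1.96 * sqrt(p_hat * (1 - p_hat) / N)  # Wald 95% CI width

mean(ciw)           # ECIW: expected CI width at this N
quantile(ciw, 0.9)  # QCIW: width not exceeded with 90% assurance
```

Because the 0.9 quantile of the simulated CI width is necessarily at least as large as typical widths, an assurance-type rule generally demands a larger N than the corresponding expected-width rule for the same target.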
```{r targets-samp}
targets_samp <- list(eciw.cstat=0.1,
                     eciw.cal_oe=0.22,
                     eciw.cal_slp=0.30,
                     qciw.cstat=c(0.9, 0.1),
                     qciw.cal_oe=c(0.9, 0.22),
                     qciw.cal_slp=c(0.9, 0.30),
                     oa.nb=0.90)
```

### Sample size calculator (*bpm_valsamp*())

This is the main function call that computes the sample size requirements based on our evidence and targets. It returns the required external validation sample size for each target requested.

```{r eval=FALSE}
samp <- bpm_valsamp(evidence=evidence,    #Evidence as a list
                    targets=targets_samp, #Targets (as specified above)
                    n_sim=10000,          #Number of Monte Carlo simulations
                    threshold=0.2)        #Risk threshold for NB calculations
```

```{r include=FALSE}
samp <- readRDS("precomputed/samp_result.rds")
```

Let's take a look at the sample size components.

```{r print-samp-results}
#Print results
print(samp$results)
```

As can be seen above, different components require different sample sizes. The largest sample size, `r max(samp$results)`, is dictated by the assurance rule for the calibration slope. The smallest, `r min(samp$results)`, pertains to Optimality Assurance.

### Computing precision / VoI for a given sample size (*bpm_valprec*())

This function returns precision / VoI metrics for given sample sizes and targets. Its utility is in fixed designs, where we are interested in evaluating the sufficiency of an available sample size for various targets. For this example, we use the outputs from ***bpm_valsamp*** to verify that the 95% CI widths and VoI values that ***bpm_valprec*** computes at the derived sample sizes are indeed close to the desired ones. As such, we use the same evidence element defined earlier, and create a new target element, *targets_prec*, that contains the same components as *targets_samp*, but with each eciw target set to T and each qciw target set to its quantile, 0.9.

```{r}
N_prec <- samp$results #Sample size components from the main function call.
targets_prec = list(eciw.cstat = T,
                    eciw.cal_oe = T,
                    eciw.cal_slp = T,
                    qciw.cstat = 0.9,
                    qciw.cal_oe = 0.9,
                    qciw.cal_slp = 0.9,
                    oa.nb = T)
```

Note that for ECIW (as well as Optimality Assurance), we simply express our desire to have the metric calculated. For QCIW values, the quantile needs to be supplied.

```{r eval=FALSE}
prec <- bpm_valprec(
          N = N_prec,             #Sample sizes
          evidence = evidence,    #Evidence as a list
          dist_type="logitnorm",  #Distribution type for calibrated risks
          method="sample",        #Sample-based or two-level ("2s") method
          targets = targets_prec, #Targets specified above
          n_sim = 10000,          #Number of Monte Carlo simulations
          threshold=0.2)          #Risk threshold for NB VoI calculations
```

```{r include=FALSE}
prec <- readRDS("precomputed/prec_result.rds")
```

The main output of **bpm_valprec()** is a matrix named 'results' that contains all requested precision / VoI metrics at all requested sample sizes. The rows of this matrix are named after the requested sample sizes (as characters), while the columns are named after the sample size rules. This allows us to extract the relevant cells from this matrix using the short code below:

```{r compare_results}
res <- prec$results[cbind(samp$results, names(samp$results))]
names(res) <- names(samp$results)
print(res)
```

These values should be close to the requested precision / assurance / VoI targets. If they are not, one should rerun the analysis with a larger n_sim, and check that the evidence is specified reasonably.

## Drawing the EVSI curve

Finally, we use VoI analysis to determine the relative efficiency of the sample size components in terms of their expected NB gain. Here, we call the **bpm_valprec()** function with only one target (voi.nb=TRUE). Also note that for the 'evidence' input, we pass the sample object from the main results. This bypasses the sample creation step, which has already been done once. Because of this, some other inputs are no longer needed.
For example, there is no need to specify n_sim, as it is already determined by the number of rows of the sample data frame.

```{r, eval=FALSE}
voi <- bpm_valprec(
         N = N_prec,               #Sample sizes at which calculations are to be conducted
         evidence = samp$sample,   #Previously generated sample (in lieu of the evidence list)
         targets = list(voi.nb=T), #Only requesting EVSI and EVPI
         threshold=0.2)            #Risk threshold for NB VoI calculations
```

```{r include=FALSE}
voi <- readRDS("precomputed/voi_result.rds")
```

Below we use ggplot2 to draw the EVSI curve for each sample size component. The main (left) Y axis shows the raw EVSI(N) values, which asymptote to the EVPI value. The secondary (right) Y axis shows the EVSI(N)/EVPI ratio, which asymptotes to 1 as the sample size increases.

```{r fig.fullwidth, fig.width = 8, fig.height = 5, out.width = "100%"}
library(ggplot2)

graph_shapes <- c("oa.nb"=18,
                  "eciw.cstat"=15,   "qciw.cstat"=15,
                  "eciw.cal_oe"=16,  "qciw.cal_oe"=16,
                  "eciw.cal_slp"=17, "qciw.cal_slp"=17)

graph_names <- c("oa.nb"="Optimality assurance",
                 "eciw.cstat"="c-statistic (expected value)",
                 "qciw.cstat"="c-statistic (assurance)",
                 "eciw.cal_oe"="O/E ratio (expected value)",
                 "qciw.cal_oe"="O/E ratio (assurance)",
                 "eciw.cal_slp"="Calibration slope (expected value)",
                 "qciw.cal_slp"="Calibration slope (assurance)")

graph_colors <- c("oa.nb"="black",
                  "eciw.cstat"="#1f77b4",  "qciw.cstat"="orange",
                  "eciw.cal_oe"="#1f77b4", "qciw.cal_oe"="orange",
                  "eciw.cal_slp"="#1f77b4","qciw.cal_slp"="orange")

df <- data.frame(N=voi$N,
                 EVPI=voi$results[,'voi.evpi'],
                 EVSI=voi$results[,'voi.evsi'])
o <- order(df$N)
df <- df[o, ]

df$shapes <- graph_shapes[rownames(df)]
df$graph_names <- graph_names[rownames(df)]
df$colors <- graph_colors[rownames(df)]
df$names <- rownames(df)
df$names <- factor(df$names, levels = unique(df$names))

df$EVSI_EVPI_ratio <- df$EVSI / df$EVPI
scale_factor <- max(df$EVSI) / max(df$EVSI_EVPI_ratio)
df$EVSI_EVPI_ratio_scaled <- df$EVSI_EVPI_ratio * scale_factor

# EVSI plot
plt <- ggplot(df, aes(x=N)) +
  # EVSI line and per-rule points
  geom_line(aes(y=EVSI), color="black") +
  geom_point(aes(y=EVSI, shape=names, color=names), size=2) +
  labs(x = "Sample size (N)",
       y = "Expected gain in NB (EVSI)",
       title = "") +
  scale_shape_manual(
    name = "Sample size targets and rules",
    values = df$shapes,      # Unique shape per rule
    labels = df$graph_names  # Human-readable labels
  ) +
  scale_color_manual(
    name = "Sample size targets and rules",
    values = df$colors,      # Unique color per rule type
    labels = df$graph_names  # Human-readable labels
  ) +
  # Primary Y-axis limits; secondary axis shows the EVSI/EVPI ratio
  scale_y_continuous(
    limits = c(0, max(df$EVSI, df$EVPI)),
    sec.axis = sec_axis(~ . / scale_factor, name = "EVSI/EVPI Ratio")
  ) +
  # Horizontal line at the EVPI (the upper bound of EVSI)
  geom_hline(yintercept = df$EVPI, color = "gray") +
  theme_minimal()

print(plt)
```

The EVSI curve can be used to reason, for example, about whether one can relax the sample size rules around the calibration slope (which more than double the required sample size) without losing much clinical utility.

### References