---
title: "Use Case 04: Estimation of optimal cluster size for a trial with pre-determined buffer width"
output:
  rmarkdown::html_vignette:
    toc: true
vignette: >
  %\VignetteIndexEntry{Use Case 04: Estimation of optimal cluster size for a trial with pre-determined buffer width}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

Both the number and size of clusters affect power calculations for CRTs:

+ If there are no logistical constraints, and spillover can be neglected (as in trials of vaccines that enrol only small proportions of the population), there is no need for a buffer zone and the most efficient design is an individually randomized trial (i.e. a CRT with a cluster size of one). In general, a trial with many small clusters has more power than one with the same number of individuals enrolled in larger clusters.

+ If spillover is an issue, and it is decided to address this by including buffer zones, then the number of individuals included in the trial is less than the total population. Enumeration and intervention allocation are still required for the full trial area, so there can be substantial resource implications if many people are included in the buffers.

There is a trade-off between increasing power by creating many small clusters (leading to a large proportion of locations in buffer zones) and reducing the proportion of locations in buffer zones by using large clusters. The `CRTspat` package provides functions for analysing this trade-off for any site for which baseline data are available.

The example shown here uses the baseline prevalence data introduced in [Use Case 1](Usecase1.html). The planned trial is assumed to use the same outcome (prevalence) and to be powered for an efficacy of 30%. A set of different algorithmic cluster allocations is carried out with different numbers of clusters. Each allocation is randomized and buffer zones are specified with a pre-specified width (in this example, 0.5 km).
The ICC is computed from the baseline data, excluding the buffer zones, and corresponding power calculations are carried out. The power is calculated and plotted as a function of cluster size.

``` r
# use the same dataset as for Use Case 1.
library(CRTspat)
example_locations <- readdata('example_site.csv')
example_locations$base_denom <- 1
exampleCRT <- CRTsp(example_locations)
example <- aggregateCRT(exampleCRT, auxiliaries = c("RDT_test_result", "base_denom"))
# randomly sample an array of numbers of clusters to allocate
set.seed(5)
c_vec <- round(runif(20, min = 6, max = 60))

CRTscenario <- function(c, CRT, buffer_width) {
    ex <- specify_clusters(CRT, c = c, algo = "kmeans") %>%
        randomizeCRT() %>%
        specify_buffer(buffer_width = buffer_width)
    GEEanalysis <- CRTanalysis(ex, method = "GEE", baselineOnly = TRUE, excludeBuffer = TRUE,
        baselineNumerator = "RDT_test_result", baselineDenominator = "base_denom")
    locations <- GEEanalysis$description$locations
    ex_power <- CRTpower(trial = ex, effect = 0.3, yC = GEEanalysis$pt_ests$controlY,
        outcome_type = "p", N = GEEanalysis$description$sum.denominators/locations, c = c,
        ICC = GEEanalysis$pt_ests$ICC)
    value <- c(c_full = c, c_core = ex_power$geom_core$c,
        clustersRequired = ex_power$geom_full$clustersRequired,
        power = ex_power$geom_full$power, mean_h = ex_power$geom_full$mean_h,
        locations = locations, ICC = GEEanalysis$pt_ests$ICC)
    names(value) <- c("c_full", "c_core", "clustersRequired", "power", "mean_h", "locations", "ICC")
    return(value)
}

results <- t(sapply(c_vec, FUN = CRTscenario, simplify = "array", CRT = example, buffer_width = 0.5)) %>%
    data.frame()
```

```
## (Intercept)
## -0.9370939
```

```
## (Intercept)
## -0.8589939
```

```
## (Intercept)
## -0.935533
```

```
## (Intercept)
## -0.8964881
```

```
## (Intercept)
## -1.023863
```

```
## (Intercept)
## -0.9778199
```

```
## (Intercept)
## -1.103676
```

```
## (Intercept)
## -1.003547
```

```
## (Intercept)
## -0.9010233
```

```
## (Intercept)
## -0.9686587
```
```
## (Intercept)
## -1.221709
```

```
## (Intercept)
## -1.118359
```

```
## (Intercept)
## -0.9804311
```

```
## (Intercept)
## -1.210983
```

```
## (Intercept)
## -1.140902
```

```
## (Intercept)
## -1.077208
```

```
## (Intercept)
## -1.252144
```

```
## (Intercept)
## -0.9988478
```

```
## (Intercept)
## -1.17822
```

```
## (Intercept)
## -0.9236708
```

Each simulated cluster allocation is different, as are the randomizations. This leads to variation in the locations of the buffer zones, so the number of core clusters is a stochastic function of the number of clusters randomised (c). There is also variation in the estimated Intracluster Correlation (see [Use Case 3](Usecase3.html)) for any value of c.

``` r
total_locations <- example$geom_full$locations
results$proportion_included <- results$c_core * results$mean_h * 2/total_locations
results$corelocations_required <- results$clustersRequired * results$mean_h
results$totallocations_required <- with(results, total_locations/locations * corelocations_required)

library(ggplot2)
theme_set(theme_bw(base_size = 14))
ggplot(data = results, aes(x = c_full, y = c_core)) + geom_smooth() +
    xlab("Clusters allocated (per arm)") + ylab("Clusters in core (per arm)") +
    geom_segment(aes(x = 5, xend = 35, y = 18.5, yend = 18.5),
        arrow = arrow(length = unit(1, "cm")), lwd = 2, color = "red")
```
Fig 4.1 Numbers of clusters
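
The derived columns computed above are simple arithmetic on the scenario results, and a toy example may make them easier to follow. The sketch below applies the same three formulas to a single hypothetical scenario in base R. All the numbers are invented for illustration and are not output from `CRTspat`. Reading the factor of 2 as the two trial arms is an interpretation based on the "per arm" axis labels, not something stated by the package.

``` r
# Hypothetical values for one scenario (invented, not CRTspat output)
total_locations <- 1200        # locations in the full trial area
scenario <- data.frame(
    c_core = 15,               # clusters in the core (per arm)
    mean_h = 20,               # mean locations per cluster, as held in mean_h
    clustersRequired = 40,     # clusters required, as reported by the power calculation
    locations = 600            # locations retained once the buffer is excluded
)

# Same formulas as in the vignette code above
scenario$proportion_included <- scenario$c_core * scenario$mean_h * 2/total_locations
scenario$corelocations_required <- scenario$clustersRequired * scenario$mean_h
scenario$totallocations_required <- with(scenario,
    total_locations/locations * corelocations_required)

print(scenario)
```

With these invented numbers, half of the locations fall in the core, and the required total (1600 locations) exceeds the 1200 available, so a site of this size would not support an adequately powered trial at this cluster size.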
Fig 4.2 Map of clusters with c = 6
Fig 4.3 Map of clusters with c = 40
Fig 4.4 Size of clusters
Fig 4.5 Power achievable with given site
Fig 4.6 Power achievable with given site
Fig 4.7 Number of clusters required for full trial area
Fig 4.8 Proportions of locations in core
Fig 4.9 Size of trial area required to achieve adequate power
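
The statement at the start of this use case, that many small clusters give more power than the same number of individuals in larger clusters, can be illustrated with the standard design effect, 1 + (m - 1) × ICC, for clusters of size m. The base R sketch below is a simplified normal-approximation calculation under assumed values (a control prevalence of 0.4, an ICC of 0.05 and 1000 individuals per arm are hypothetical; the 30% efficacy matches the example). The function `crt_power_sketch` is defined here for illustration only and is not the calculation performed by `CRTpower`. It also ignores buffer zones.

``` r
# Approximate power for a two-arm comparison of proportions with clustering,
# using the design effect 1 + (m - 1) * ICC. Illustrative sketch only.
crt_power_sketch <- function(n_per_arm, m, icc, p_control, efficacy, alpha = 0.05) {
    p_interv <- p_control * (1 - efficacy)   # prevalence in the intervention arm
    deff <- 1 + (m - 1) * icc                # design effect for cluster size m
    n_eff <- n_per_arm/deff                  # effective sample size per arm
    se <- sqrt((p_control * (1 - p_control) + p_interv * (1 - p_interv))/n_eff)
    pnorm(abs(p_control - p_interv)/se - qnorm(1 - alpha/2))
}

# Hypothetical inputs: same number of individuals, different cluster sizes
cluster_sizes <- c(1, 10, 50, 200)
power_by_size <- sapply(cluster_sizes, crt_power_sketch,
    n_per_arm = 1000, icc = 0.05, p_control = 0.4, efficacy = 0.3)

print(data.frame(cluster_size = cluster_sizes, power = round(power_by_size, 3)))
```

Under these assumptions the power falls steadily as the cluster size increases. Once buffer zones are introduced, small clusters lose a larger share of their locations to the buffer, which is the opposing effect that the `CRTspat` analysis above quantifies.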