--- title: "Custom statistics" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Custom statistics} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} bibliography: references.bib --- The normal tidystats workflow consists of running statistics functions such as `lm()`, saving the output into variables, and then using the `add_stats()` function to add the statistics to a list. This works as long as tidystats has built-in support for the statistics functions you ran. So what should you do when this is not the case? The first thing would be to let me know that there's a function you would like tidystats to support. There are various ways to contact me. You can go to the Github [page](https://github.com/WillemSleegers/tidystats) and create an [issue](https://github.com/WillemSleegers/tidystats/issues). This is the preferred method because it is easy to paste code snippets and to ask follow-up questions. For alternative ways to contact me, see the tidystats [website](https://tidystats.io/). Of course, it's not always possible to wait for me to add support for the function to tidystats. Nor is it always possible for me to add support for the function. This can happen when the statistic you want to report is not identifiable as belonging to a particular type of analysis (for example, the result of `confint()` returns a matrix, which does not contain any information about it being a matrix containing confidence intervals). For these reasons, it is useful to know that there are two possible solutions. The first solution is that I can still add support for functions that return objects without sufficient information. You can tell `add_stats()` what kind of object it is using the `class` argument. This only works if I explicitly coded support for a specific class. You can see which classes are supported in the help document of `add_stats()` (`?add_stats`). The second solution is to manually extract the statistics yourself and create an object to add to `add_stats()`. I've created a few helper functions to help you do this: `custom_stats()` and `custom_stat()`. These two functions work together to help you store the statistics in a format needed for tidystats to function. `custom_stats()` has two arguments: `method` and `statistics`. The method should contain a description of the type of method you used. The `statistics` argument requires a vector of statistics created with the `custom_stat()` function. The `custom_stat()` function serves to create a statistic, along with the necessary information to report the statistic. At a minimum, it requires specifying the `name` and `value` of the statistic. Optionally you can also specify a symbol and a subscript so that the text editor add-ins can correctly report the statistic. Finally, it's also possible that the statistic is a ranged statistic, i.e., it has a lower and upper bound. These statistics require that you specify the type of `interval` (e.g., "CI", "HDI"), `level` (e.g., .95), `lower` and `upper` bound. Below I show a few examples of adding custom statistics. ## Example 1: Using the `class` argument Say we want to calculate the confidence intervals for several parameters in a linear model, using the `confint()` function. ```{r, eval = FALSE} # Run a linear model fit <- lm(100 / mpg ~ disp, data = mtcars) # Compute the confidence intervals fit_confint <- confint(fit) # Create an empty list statistics <- list() # Add linear model and confidence intervals to the list statistics <- statistics %>% add_stats(fit) %>% add_stats(fit_confint) ``` Unfortunately, we get an error: > Error in UseMethod("tidy_stats") : no applicable method for 'tidy_stats' applied to an object of class "c('matrix', 'array', 'double', 'numeric')" That's because `confint()` return a standard matrix, rather than an object specific to the `confint()` function. You can check this yourself by running the `class()` function on the output of `confint()` (e.g., `class(fit_confint)`). You'll see that it simply says `"matrix" "array"`. That's not enough for tidystats to work with. Ideally it would say something like `"confint"` so that tidystats knows what it is working with and extract the statistics. Thankfully, we can still add statistics from `confint()` to a list via `add_stats()`, using the `class` argument. We can specify that the statistics are from the `confint()` function by saying `class = "confint"`. ```{r, eval = FALSE} statistics <- statistics %>% add_stats(fit) %>% add_stats(fit_confint, class = "confint") ``` We don't get an error this time, indicating that it worked. ## Example 2: Using `custom_stats()` Say you want to calculate a Bayes Factor using the BIC approach ([Wagenmakers, 2007](https://doi.org/10.3758/BF03194105)). An example of this approach can be found [here](https://rstudio-pubs-static.s3.amazonaws.com/358672_09291d0b37ce43f08cf001cfd25c16c2.html); which I'll repeat down below. ```{r} # Set seed for reproducibility set.seed(14) # Simulate some data intercept_data <- data.frame(score = scale(rnorm(40), center = 0.72)) # Run two models and calculate the BIC full_lm <- lm(score ~ 1, intercept_data) null_lm <- lm(score ~ 0, intercept_data) BF_BIC <- exp((BIC(null_lm) - BIC(full_lm)) / 2) ``` The Bayes Factor is `r round(BF_BIC, 2)`. Now, how do you add this value to a tidystats list? If we try it the standard way, we'll see that it fails. ```{r, eval = FALSE} # Load the tidystats package library(tidystats) # Create an empty list statistics <- list() # Add BIC to the list using add_stats() statistics <- add_stats(statistics, BF_BIC) ``` This produces an error message that says: > Error in UseMethod("tidy_stats") : no applicable method for 'tidy_stats' applied to an object of class "c('double', 'numeric')" It's because `BF_BIC` is simply a number and not the output of a statistics function, so tidystats doesn't know how to store this number. Let's fix this using `custom_stats()` and `custom_stat()`. ```{r, eval = FALSE} # Create a list of custom statistics BIC <- custom_stats( method = "BIC", statistics = custom_stat( name = "BIC Bayes Factor", value = BF_BIC, symbol = "BF", subscript = "10" ) ) # Add the statistics to the list statistics <- add_stats(statistics, BIC) ``` Now we don't get an error. Thanks to `custom_stats()` and `custom_stat()` we correctly structured the statistic so it can be added to the list via `add_stats()`. ## Summary tidystats works by taking the output of statistical tests, extracting the statistics, and reorganizing them into a particular structure. This works if 1) tidystats has built-in support for the function and 2) if the function used to run the statistical test returns an object that tidystats can use to identify the test. If you want to use tidystats on a function that is not supported yet, please contact me to let me know that I should add support for it. Alternatively, you can manually create a list of statistics and supply it to the `add_stats()` function using `custom_stats()` and `custom_stat()`. The goal is to have tidystats support as many tests as possible, so that you rarely have to resort to this solution.