---
title: "Custom statistics"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Custom statistics}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: references.bib
---

The normal tidystats workflow consists of running statistics functions such as
`lm()`, saving the output into variables, and then using the `add_stats()` 
function to add the statistics to a list. This works as long as tidystats
has built-in support for the statistics functions you ran. So what should you do
when this is not the case?

The first thing would be to let me know that there's a function you would like 
tidystats to support. There are various ways to contact me. You can go to the 
Github [page](https://github.com/WillemSleegers/tidystats) and create an 
[issue](https://github.com/WillemSleegers/tidystats/issues). This is the 
preferred method because it is easy to paste code snippets and to ask follow-up 
questions. For alternative ways to contact me, see the tidystats 
[website](https://tidystats.io/).

Of course, it's not always possible to wait for me to add support for the 
function to tidystats. Nor is it always possible for me to add support for the 
function. This can happen when the statistic  you want to report is not 
identifiable as belonging to a particular type of analysis (for example, the 
result of `confint()` returns a matrix, which does not contain any information 
about it being a matrix containing confidence intervals).

For these reasons, it is useful to know that there are two possible solutions.

The first solution is that I can still add support for functions that return 
objects without sufficient information. You can tell `add_stats()` what kind of 
object it is using the `class` argument. This only works if I explicitly coded 
support for a specific class. You can see which classes are supported in the 
help document of `add_stats()` (`?add_stats`).

The second solution is to manually extract the statistics yourself and create
an object to add to `add_stats()`. I've created a few helper functions to help 
you do this: `custom_stats()` and `custom_stat()`. These two functions work 
together to help you store the statistics in a format needed for tidystats to 
function.

`custom_stats()` has two arguments: `method` and `statistics`. The method should
contain a description of the type of method you used. The `statistics` argument
requires a vector of statistics created with the `custom_stat()` function.

The `custom_stat()` function serves to create a statistic, along with the 
necessary information to report the statistic. At a minimum, it requires 
specifying the `name` and `value` of the statistic. Optionally you can also 
specify a symbol and a subscript so that the text editor add-ins can correctly 
report the statistic. Finally, it's also possible that the statistic is a ranged
statistic, i.e., it has a lower and upper bound. These statistics require that 
you specify the type of `interval` (e.g., "CI", "HDI"), `level` (e.g., .95), 
`lower` and `upper` bound. 

Below I show a few examples of adding custom statistics.

## Example 1: Using the `class` argument

Say we want to calculate the confidence intervals for several parameters in a 
linear model, using the `confint()` function.

```{r, eval = FALSE}
# Run a linear model
fit <- lm(100 / mpg ~ disp, data = mtcars)

# Compute the confidence intervals
fit_confint <- confint(fit)

# Create an empty list
statistics <- list()

# Add linear model and confidence intervals to the list
statistics <- statistics %>%
  add_stats(fit) %>%
  add_stats(fit_confint)
```

Unfortunately, we get an error:

> Error in UseMethod("tidy_stats") : 
  no applicable method for 'tidy_stats' applied to an object of class 
  "c('matrix', 'array', 'double', 'numeric')"

 That's because `confint()` return a standard matrix, rather than an object 
 specific to the `confint()` function. You can check this yourself by running
 the `class()` function on the output of `confint()` 
 (e.g., `class(fit_confint)`). You'll see that it simply says 
 `"matrix" "array"`. That's not enough for tidystats to work with. Ideally it
 would say something like `"confint"` so that tidystats knows what it is working
 with and extract the statistics. 

Thankfully, we can still add statistics from `confint()` to a list via 
`add_stats()`, using the `class` argument. We can specify that the statistics
are from the `confint()` function by saying `class = "confint"`. 

```{r, eval = FALSE}
statistics <- statistics %>%
  add_stats(fit) %>%
  add_stats(fit_confint, class = "confint")
```

We don't get an error this time, indicating that it worked.

## Example 2: Using `custom_stats()`

Say you want to calculate a Bayes Factor using the BIC approach 
([Wagenmakers, 2007](https://doi.org/10.3758/BF03194105)). An example of this 
approach can be found 
[here](https://rstudio-pubs-static.s3.amazonaws.com/358672_09291d0b37ce43f08cf001cfd25c16c2.html); 
which I'll repeat down below.

```{r}
# Set seed for reproducibility
set.seed(14)

# Simulate some data
intercept_data <- data.frame(score = scale(rnorm(40), center = 0.72))

# Run two models and calculate the BIC
full_lm <- lm(score ~ 1, intercept_data)
null_lm <- lm(score ~ 0, intercept_data)

BF_BIC <- exp((BIC(null_lm) - BIC(full_lm)) / 2)
```

The Bayes Factor is `r round(BF_BIC, 2)`. Now, how do you add this value to a 
tidystats list? If we try it the standard way, we'll see that it fails.

```{r, eval = FALSE}
# Load the tidystats package
library(tidystats)

# Create an empty list
statistics <- list()

# Add BIC to the list using add_stats()
statistics <- add_stats(statistics, BF_BIC)
```

This produces an error message that says:

> Error in UseMethod("tidy_stats") : no applicable method for 'tidy_stats' 
  applied to an object of class "c('double', 'numeric')"

It's because `BF_BIC` is simply a number and not the output of a statistics 
function, so tidystats doesn't know how to store this number. Let's fix this 
using `custom_stats()` and `custom_stat()`.

```{r, eval = FALSE}
# Create a list of custom statistics
BIC <- custom_stats(
  method = "BIC",
  statistics = custom_stat(
    name = "BIC Bayes Factor",
    value = BF_BIC,
    symbol = "BF",
    subscript = "10"
  )
)

# Add the statistics to the list
statistics <- add_stats(statistics, BIC)
```

Now we don't get an error. Thanks to `custom_stats()` and `custom_stat()` we 
correctly structured the statistic so it can be added to the list via 
`add_stats()`. 

## Summary

tidystats works by taking the output of statistical tests, extracting the 
statistics, and reorganizing them into a particular structure. This works if 1)
tidystats has built-in support for the function and 2) if the function used to 
run the statistical test returns an object that tidystats can use to identify 
the test.

If you want to use tidystats on a function that is not supported yet, please 
contact me to let me know that I should add support for it.

Alternatively, you can manually create a list of statistics and supply it to 
the `add_stats()` function using `custom_stats()` and `custom_stat()`. 

The goal is to have tidystats support as many tests as possible, so that you 
rarely have to resort to this solution.