---
title: "Get started"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Get started}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  out.width = "100%"
)
```
```{r setup}
library(unsum)
```
Here is a brief walkthrough of using CLOSURE in unsum.
Call `closure_generate()` to run the CLOSURE algorithm. Enter the mean, SD, and sample size as reported in a paper. For `scale_min` and `scale_max`, use the empirical minimum and maximum if they are available. Otherwise, use the theoretical scale bounds, e.g., `1` and `7` for a 1-7 scale.
The `mean` and `sd` arguments must be strings so that trailing zeros are preserved: `"3.50"` conveys two decimal places, whereas the number `3.50` is stored as `3.5`. Note that CLOSURE only applies if all individual values must be integers: e.g., a value can be 2 or 3, but not 2.5.
```{r}
data <- closure_generate(
  mean = "3.5",
  sd = "1.8",
  n = 80,
  scale_min = 1,
  scale_max = 5
)
```
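To illustrate why `mean` and `sd` are passed as strings, here is a minimal base R sketch. The helpers `decimal_places()` and `rounding_bounds()` are hypothetical and not part of unsum. The sketch assumes conventional rounding, i.e., that a reported value may deviate from the true value by up to half a unit in the last reported digit. How unsum handles rounding internally may differ.

```{r}
# A numeric value cannot carry a trailing zero: 3.50 and 3.5 are identical.
as.character(3.50)

# `decimal_places()` is a hypothetical helper, not part of unsum.
# It counts the digits after the decimal point in a reported string.
decimal_places <- function(x) {
  parts <- strsplit(x, ".", fixed = TRUE)[[1]]
  if (length(parts) < 2) 0L else nchar(parts[2])
}

# Assumption: the true value lies within half a unit
# of the last reported digit.
rounding_bounds <- function(x) {
  half_unit <- 0.5 * 10^(-decimal_places(x))
  as.numeric(x) + c(lower = -half_unit, upper = half_unit)
}

# One reported decimal place allows a wider range of true values...
rounding_bounds("3.5")

# ...than two reported decimal places.
rounding_bounds("3.50")
```

Under this assumption, dropping a trailing zero would widen the range of values consistent with the report tenfold, so more samples would appear compatible with the reported statistics than actually are.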
First, plot the average sample found by CLOSURE. This gives a sense of the overall results, which are quite polarized:
```{r}
#| fig.alt: >
#| Barplot of `data`, the CLOSURE output.
#| It specifically visualizes the `f_average` column of
#| the `frequency` tibble, but also gives percentage figures,
#| similar to the `f_relative` column. The overall shape is
#| a strongly polarized distribution.
closure_plot_bar(data)
```
You can customize the plot, e.g., to show the sum of all samples found instead of the average sample, or only percentages, or different colors. See the documentation of `closure_plot_bar()`. However, the default should be informative enough for a start.
## CLOSURE results
Now let's look at the results themselves:
```{r}
data
```
- `inputs` records the arguments in `closure_generate()`.
- `metrics` shows the number of possible samples that could have led to the reported summary statistics (`samples_all`) and the total number of all values found in them (`values_all`).
Importantly, it also features an index of variation in bounded scales (`horns`). It ranges from 0 to 1, where 0 means no variability and 1 means a sample evenly split between the extremes (here, 1 and 5) with no values in between. The reference value `horns_uniform` shows which value `horns` would have if the average sample were uniformly distributed. It is 0.5 here because of the 1-5 scale. See `horns()` for more details.
The actual `horns` value is 0.79, which indicates a high degree of variability even in the abstract. In practice, 0.79 might be extremely high compared to theoretical expectations: if the sample should have a roughly normal shape, even the hypothetical uniform value of 0.5 would be surprisingly high, let alone the actual value of 0.79.
- `frequency` shows the absolute and relative frequencies of values found by CLOSURE at each scale point. It also gives us the (absolute) frequency of values in the average sample that we saw in the plot above.
- `results` stores all the samples that CLOSURE found (`sample`). Each has a unique number (`id`).
See `closure_generate()` for more details.
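To make the horns index more concrete, here is a minimal base R sketch. It assumes that the index is the observed (population) variance divided by the maximum variance possible on the scale, which is reached when the sample is evenly split between the two extremes. The function `horns_sketch()` is a hypothetical stand-in, not unsum's own `horns()`. See the documentation of `horns()` for the authoritative definition.

```{r}
# `horns_sketch()` is hypothetical and only illustrates the idea.
# Assumption: horns = observed variance / maximum possible variance.
# `freqs` holds the frequency of each scale value,
# from `scale_min` to `scale_max`.
horns_sketch <- function(freqs, scale_min, scale_max) {
  values <- seq(scale_min, scale_max)
  stopifnot(length(freqs) == length(values))
  n <- sum(freqs)
  mean_observed <- sum(freqs * values) / n
  var_observed <- sum(freqs * (values - mean_observed)^2) / n
  # Maximum variance: half the sample at each extreme
  var_max <- ((scale_max - scale_min) / 2)^2
  var_observed / var_max
}

# No variability: all 80 values at the midpoint
horns_sketch(c(0, 0, 80, 0, 0), scale_min = 1, scale_max = 5)

# Uniform distribution on a 1-5 scale
horns_sketch(c(16, 16, 16, 16, 16), scale_min = 1, scale_max = 5)

# Evenly split between the extremes
horns_sketch(c(40, 0, 0, 0, 40), scale_min = 1, scale_max = 5)
```

Under this assumption, the three calls return 0, 0.5, and 1, which matches the range described above as well as the `horns_uniform` value of 0.5 for a 1-5 scale.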
In addition to the bar plot, unsum offers an ECDF plot for CLOSURE results:
```{r}
#| fig.alt: >
#| Empirical cumulative distribution function (ECDF) plot
#| of `data`, the CLOSURE output. The curve rises most steeply at the
#| first and last scale values, indicating a strongly polarized distribution.
closure_plot_ecdf(data)
```
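As a rough illustration of what an ECDF shows, the base R sketch below derives the cumulative proportions from a set of frequencies. The frequencies are made up for this example and are not actual CLOSURE output. The code also does not claim to reproduce how `closure_plot_ecdf()` works internally.

```{r}
# Hypothetical frequencies on a 1-5 scale (not actual CLOSURE output)
values <- 1:5
f_absolute <- c(30, 5, 5, 10, 30)

# Relative frequencies sum to 1
f_relative <- f_absolute / sum(f_absolute)

# The ECDF at each scale point is the cumulative sum
# of the relative frequencies up to that point
ecdf_values <- cumsum(f_relative)
ecdf_values

# Cross-check against base R's `ecdf()` applied to the expanded values
ecdf(rep(values, f_absolute))(values)
```

The largest jumps occur at 1 and 5, which is the same pattern that marks a polarized distribution in the plot above.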
## Horns index variation
You may wonder about variability between the samples. Couldn't some of them have a much lower or higher horns index than the overall mean `horns`? If so, there would be a chance that the original data looked quite different from the average sample. Check this using `closure_horns_analyze()`:
```{r}
data_horns <- closure_horns_analyze(data)
data_horns
```
Also, `closure_horns_histogram()` visualizes the distribution of horns values across all samples:
```{r}
#| fig.alt: >
#| Histogram of the distribution of horns values.
#| The scale limits range from 0 to 1,
#| but values only start after 0.75
#| and end before 0.90.
closure_horns_histogram(data_horns)
```
In sum, the horns values are quite tightly confined. Wide variation among them seems to occur only if `mean` and `sd` have no decimal places.
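One possible explanation for this narrow range can be sketched in base R. It rests on several assumptions that are not confirmed here: that the reported SD is a sample SD (denominator `n - 1`), that the horns index is the population variance divided by the maximum possible variance, and that a reported SD of `"1.8"` stands for any true SD between 1.75 and 1.85. If these hold, every sample found by CLOSURE must have a horns value within bounds set by the rounding of `sd` alone.

```{r}
# All of the following rests on the assumptions stated above.
n <- 80
scale_min <- 1
scale_max <- 5

# Assumed range of true SDs that round to "1.8"
sd_bounds <- c(lower = 1.75, upper = 1.85)

# Convert the sample variance (n - 1) to the population variance (n)
var_population <- sd_bounds^2 * (n - 1) / n

# Maximum variance: half the sample at each extreme
var_max <- ((scale_max - scale_min) / 2)^2

horns_bounds <- var_population / var_max
round(horns_bounds, 2)
```

With these inputs, the bounds are roughly 0.76 and 0.84. An SD reported without decimal places would allow a much wider range of true SDs, and with it a much wider range of horns values.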
## Read and write
What if you have a huge object with CLOSURE results that you want to save? Write it to disk with `closure_write()`:
```{r}
# Using a temporary folder via `tempdir()` just for this example --
# you should use a real folder on your computer instead!
path_new_folder <- closure_write(data, path = tempdir())
```
This stores the results in the highly efficient Parquet format, which takes up only a small fraction of the disk space that an equivalent CSV file would need.
In a later session, read the data in from the folder to get the same CLOSURE list back:
```{r}
data_new <- closure_read(path_new_folder)
```
A caveat: don't modify the output of `closure_generate()` before passing it into other `closure_*()` functions. The latter need input with a very specific format, and if you manipulate the data between two `closure_*()` calls, these assumptions may no longer hold. Some checks are in place to detect alterations, but they may not catch all of them.