---
title: "Tidy Aggregation and Required Data Inputs"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Tidy Aggregation and Required Data Inputs}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
date-modified: last-modified
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

The design philosophy of [aggreCAT]{.pkg} is founded on the principles of 'tidy' data [@Wickham:2014vp]. Each aggregation method expects a [data.frame]{.class} or [tibble]{.class} of judgements (`data_ratings`) as its input, and returns a [tibble]{.class} containing the variables `method`, `paper_id`, `cs` and `n_experts` (see @sec-AverageWAgg for an illustration of the outputs), where `method` is a character vector corresponding to the aggregation method name specified in the `type` argument. Each aggregation is applied as a summary function [@Wickham2017R], and therefore returns a single row, or observation, with a single confidence score `cs` for each claim or `paper_id`. The number of expert judgements summarised in the aggregated confidence score is returned in the column `n_experts`. Because the aggregation outputs are tidy, multiple aggregations can be applied to the same data, with the results of all aggregation methods row-bound together in a single `tibble` (see the example repliCATS workflow in @sec-workflow).

The tibble of judgements to be aggregated (`data_ratings`) requires the columns `round`, `paper_id`, `user_name`, `question`, `element`, `value` and `group`. Each observation in the judgement data corresponds to a single `value` for a single `question` elicited from a single `user_name` about a given `paper_id` in a single `round`. There are four types of `question` to which an elicited `value` may correspond. Estimates of the event probability for a given `paper_id` correspond to `"direct_replication"` in the `question` variable.
The type of estimate the `value` belongs to is recorded in the `element` variable, and may be one of `"three_point_lower"`, `"three_point_best"`, or `"three_point_upper"`. Every aggregation function requires at least one `value` derived from three-point elicitation (`question == "direct_replication"`) in the data frame supplied to the `expert_judgements` argument; however, some methods require only the best estimates (`element == "three_point_best"`) for mathematical aggregation. Similarly, some aggregation methods require multiple `round`s of judgements, while others require only a single round. Only the aggregation method *CompWAgg* requires `value`s for the `comprehension` question.

For a summary of each aggregation method, its calling function, and its data requirements and sources, see @tbl-method-summary-table.

```{r setup}
library(aggreCAT)
```
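
To make the input and output contract described above concrete, the following sketch builds a small, invented judgement table with the seven required columns and applies a hand-rolled summary aggregation in base R. It is not part of [aggreCAT]{.pkg} and does not reproduce any of its methods: the helper `toy_aggregate()`, the method labels `"toy_mean"` and `"toy_median"`, and every identifier and value in `toy_judgements` are hypothetical and serve only to illustrate the shape of the data going in and coming out.

```r
# Invented judgement data: 2 papers x 3 experts x 3 elicited elements.
# All identifiers and values are hypothetical.
toy_judgements <- data.frame(
  round     = "round_2",
  paper_id  = rep(c("paper_A", "paper_B"), each = 9),
  user_name = rep(rep(c("expert_1", "expert_2", "expert_3"), each = 3),
                  times = 2),
  question  = "direct_replication",
  element   = rep(c("three_point_lower", "three_point_best",
                    "three_point_upper"), times = 6),
  value     = c(40, 60, 80,  50, 70, 90,  85, 95, 99,
                10, 20, 30,  20, 40, 60,  30, 45, 70),
  group     = "group_1",
  stringsAsFactors = FALSE
)

# Hypothetical helper: summarise the best estimates to one row per paper_id.
toy_aggregate <- function(judgements, summary_fun, method_name) {
  # Check that the required columns are all present.
  required_cols <- c("round", "paper_id", "user_name", "question",
                     "element", "value", "group")
  missing_cols <- setdiff(required_cols, names(judgements))
  if (length(missing_cols) > 0) {
    stop("Missing required columns: ", paste(missing_cols, collapse = ", "))
  }

  # Keep only best estimates from the three-point elicitation.
  best <- judgements[
    judgements$question == "direct_replication" &
      judgements$element == "three_point_best", ]

  # One confidence score and one expert count per paper_id.
  cs <- tapply(best$value, best$paper_id, summary_fun)
  n_experts <- tapply(best$user_name, best$paper_id,
                      function(x) length(unique(x)))

  data.frame(
    method    = method_name,
    paper_id  = names(cs),
    cs        = as.numeric(cs),
    n_experts = as.integer(n_experts),
    stringsAsFactors = FALSE
  )
}

# Because every result has the same four columns, results can be row-bound.
toy_results <- rbind(
  toy_aggregate(toy_judgements, mean,   "toy_mean"),
  toy_aggregate(toy_judgements, median, "toy_median")
)
print(toy_results)
```

Each call returns one row per `paper_id`, so the combined table has four rows here: two papers for each of the two toy methods. Keeping the output columns identical across methods is what allows the results to be stacked with a plain `rbind()`.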