---
title: "sdcLog options"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 5
    number_sections: false
vignette: >
  %\VignetteIndexEntry{sdcLog options}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#"
)
# store the user's options so they can be restored at the end of the vignette
user_options <- options()
options(datatable.print.class = FALSE)
options(datatable.print.keys = FALSE)
options(datatable.print.trunc.cols = FALSE)
# start from the sdcLog defaults
options(sdc.n_ids = 5L)
options(sdc.n_ids_dominance = 2L)
options(sdc.share_dominance = 0.85)
```

You can set several options to

1. adapt sdcLog to the policies at your research data center
2. make sdcLog a little more convenient.

First, we create a tiny `data.frame` to demonstrate the effects of the options:

```{r create_df}
library(sdcLog)
df <- data.frame(id = LETTERS[1:3], v1 = 1L:3L, v2 = c(1L, 2L, 4L))
df
```

# Options to adapt sdcLog to the policies at your research data center

## sdc.n_ids

By default, sdcLog expects at least five different entities behind each
calculated number. The functions in sdcLog derive this number from
`getOption("sdc.n_ids", default = 5)`. That is, if the option `sdc.n_ids` is not
set, it defaults to `5`. Consider the following example:

```{r example1_sdc.n_ids}
sdc_descriptives(data = df, id_var = "id", val_var = "v1")
```

This can be adapted to the policy of your research data center by setting the
option `sdc.n_ids` to the desired value. For example, if your policy allows
results to be released when there are at least three different entities behind
each number, set

```{r set_sdc.n_ids}
options(sdc.n_ids = 3)
```

Now, `getOption("sdc.n_ids", default = 5)` evaluates to `3` and warnings are
thrown only if there are fewer than three entities behind each result.
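
The fallback behaviour of `getOption()` can be checked with base R alone. The following is a minimal sketch and not sdcLog's internal code; it only assumes that the option is looked up via `getOption("sdc.n_ids", default = 5)`, as stated above. The variable names `old`, `fallback` and `custom` are chosen for illustration.

```r
# Unset the option and remember the previous value so it can be restored
old <- options(sdc.n_ids = NULL)

# Option not set: getOption() falls back to the supplied default
fallback <- getOption("sdc.n_ids", default = 5)
fallback
# 5

# Option set: the default is ignored
options(sdc.n_ids = 3)
custom <- getOption("sdc.n_ids", default = 5)
custom
# 3

# Restore the previous setting
options(old)
```

Because `options()` returns the previous values, assigning its result and passing it back to `options()` restores the earlier state.
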
Note that the current value of `sdc.n_ids` is shown in the first line of output
from every sdcLog function:

```{r example2_sdc.n_ids}
sdc_descriptives(data = df, id_var = "id", val_var = "v1")
```

## sdc.n_ids_dominance

The default value for `sdc.n_ids_dominance` is `2`, that is, the dominance check
considers the two largest entities together. In our example, this leads to a
warning:

```{r example1_sdc.n_ids_dominance}
sdc_descriptives(data = df, id_var = "id", val_var = "v2")
```

If your policy only requires that the largest entity alone accounts for a share
of less than `0.85`, set

```{r set_sdc.n_ids_dominance}
options(sdc.n_ids_dominance = 1)
```

Then, there is no problem in the example:

```{r example2_sdc.n_ids_dominance}
sdc_descriptives(data = df, id_var = "id", val_var = "v2")
```

## sdc.share_dominance

The last sdcLog option that affects internal calculations is
`sdc.share_dominance`. To demonstrate, we first reset `sdc.n_ids_dominance` to
its default value of `2`.

```{r reset_options1, include=FALSE}
options(sdc.n_ids_dominance = 2L)
```

Consider a policy that allows the two largest entities to account for a combined
share of `0.8`. To reflect this, set

```{r set_sdc.share_dominance}
options(sdc.share_dominance = 0.8)
```

Now, the initial example from the `sdc.n_ids` section throws a warning:

```{r example1_sdc.share_dominance}
sdc_descriptives(data = df, id_var = "id", val_var = "v1")
```

## sdc.info_level

This option differs from the previous ones in that it does not affect the actual
calculations. Instead, it determines the verbosity of the output of sdcLog
functions. Possible values are `0`, `1` (default), and `2`. Before demonstrating
the effects of `sdc.info_level`, we reset `sdc.share_dominance` to its default
value of `0.85`.

```{r reset_options2, include=FALSE}
options(sdc.share_dominance = 0.85)
```

The example below shows the different levels of information printed to the
console for each value of `sdc.info_level`:

```{r example_sdc.info_level}
for (i in 0:2) {
  options(sdc.info_level = i)
  cat("\nsdc.info_level: ", getOption("sdc.info_level"), "\n")
  print(sdc_descriptives(data = df, id_var = "id", val_var = "v1"))
}
```

At level `0`, only options and settings are printed. Level `1` also prints a
short message about the overall outcome of the checks. Level `2` additionally
prints the results of the separate checks on distinct entities and dominance.

# Option to make sdcLog more convenient

Usually, the ID variable does not change during the course of your analysis.
Therefore, it is convenient to set

```{r sdc.id_var}
options(sdc.id_var = "id")
```

Then you do not have to specify `id_var` every time you use one of the `sdc_*`
functions:

```{r reset_options, echo = -1}
options(user_options)
sdc_descriptives(data = df, val_var = "v1")
```

# General remarks

Please note that these options affect all functions of sdcLog, not just
`sdc_descriptives()`.
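
Because the options apply to every sdcLog function, one possible pattern is to set them together at the top of an analysis script and to restore the previous values at the end. The following is a minimal sketch using base R only. The values are illustrative (they mirror the defaults and the example data used in this vignette) and should be replaced by whatever your research data center prescribes; the variable name `old` is an assumption made for this example.

```r
# Set all sdcLog options in one call; options() returns the previous values
old <- options(
  sdc.n_ids = 5L,             # minimum number of distinct entities
  sdc.n_ids_dominance = 2L,   # number of largest entities in the dominance check
  sdc.share_dominance = 0.85, # share threshold for the dominance check
  sdc.info_level = 1L,        # verbosity of the output
  sdc.id_var = "id"           # ID column; "id" matches the example data
)

# ... run the analysis with the sdc_*() functions here ...

# Restore whatever was set before
options(old)
```
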