How to generate tables and plots?

The plot and table type is determined by your column selection. Try it out!

Categorical variables

# A single variable
tab_counts(data, use_private)
Usage: in private context n p
never 12 12%
rarely 40 40%
several times a month 30 30%
several times a week 15 15%
almost daily 4 4%
Total 101 100%
Missing 0
# A list of variables
tab_counts(data, c(use_private, use_work))
Usage: in pr never rarely several times a month several times a week almost daily Total
ivate context 12% (12) 40% (40) 30% (30) 15% (15) 4% (4) 100% (101)
ofessional context 38% (38) 21% (21) 15% (15) 17% (17) 10% (10) 100% (101)
# Variables matched by a pattern
tab_counts(data, starts_with("use_"))
Usage: in pr never rarely several times a month several times a week almost daily Total
ivate context 12% (12) 40% (40) 30% (30) 15% (15) 4% (4) 100% (101)
ofessional context 38% (38) 21% (21) 15% (15) 17% (17) 10% (10) 100% (101)

Metric variables

To select the appropriate function, you need to decide whether your data is categorical or metric.

# One metric variable
tab_metrics(data, sd_age)
Age value
min 18
q1 27
median 38
q3 52
max 68
m 39.7
sd 13.8
missing 0
n 101
# Multiple metric items
tab_metrics(data, starts_with("cg_adoption_"))
Expectations min q1 median q3 max m sd missing n
ChatGPT has clear advantages compared to similar offerings. 1 3 4 4 5 3.5 1.0 2 101
Using ChatGPT brings financial benefits. 1 2 3 4 5 2.7 1.2 0 101
Using ChatGPT is advantageous in many tasks. 1 3 4 4 5 3.6 1.1 0 101
Compared to other systems, using ChatGPT is more fun. 1 3 4 4 5 3.5 1.0 0 101
Much can go wrong when using ChatGPT. 1 2 3 4 5 3.1 1.1 0 101
There are legal issues with using ChatGPT. 1 2 3 4 5 3.1 1.2 0 101
The security of user data is not guaranteed with ChatGPT. 1 3 3 4 5 3.2 1.0 1 101
Using ChatGPT could bring personal disadvantages. 1 2 3 3 5 2.7 1.1 0 101
In my environment, using ChatGPT is standard. 1 2 2 3 5 2.5 1.1 1 101
Almost everyone in my environment uses ChatGPT. 1 1 2 3 5 2.4 1.2 0 101
Not using ChatGPT is considered being an outsider. 1 1 2 3 5 2.0 1.2 1 101
Using ChatGPT brings me recognition from my environment. 1 1 2 3 5 2.3 1.2 0 101
plot_metrics(data, starts_with("cg_adoption_"))

Cross tabulation and group comparison

Provide a grouping column in the third parameter to compare different groups.

tab_counts(data, adopter, sd_gender)
Gender Total I try new offers immediately I try new offers rather quickly I wait until offers establish themselves I only use new offers when I have no other choice
female 40% (40) 2% (2) 25% (25) 13% (13) 0% (0)
male 59% (60) 12% (12) 38% (38) 9% (9) 1% (1)
diverse 1% (1) 1% (1) 0% (0) 0% (0) 0% (0)
Total 100% (101) 15% (15) 62% (63) 22% (22) 1% (1)

In the corresponding plot function, the prop parameter scales the bars to 100%, and the numbers parameter prints the percentages onto the bars.

data |> 
  filter(sd_gender != "diverse") |> 
  plot_counts(adopter, sd_gender, prop="rows", numbers="p")
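
What row-wise proportions mean can be sketched in base R. This is only an illustration of the arithmetic, not the package's own implementation. The counts are taken from the cross table above, and the shortened column names are abbreviations introduced for this sketch.

```r
# Counts from the cross table above (rows: gender, columns: adopter type).
# The short column names are abbreviations made up for this sketch.
counts <- matrix(
  c( 2, 25, 13, 0,
    12, 38,  9, 1),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    gender  = c("female", "male"),
    adopter = c("immediately", "rather quickly", "wait", "no other choice")
  )
)

# margin = 1 divides each cell by its row total, so every row sums to 1
row_props <- prop.table(counts, margin = 1)

# Show the shares as percentages
round(row_props * 100)
```

With row-wise proportions, each group is read on its own scale, which makes groups of different size comparable.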

For metric variables, you can compare the mean values.

# Compare the means of one variable
tab_metrics(data, sd_age, sd_gender)
Gender min q1 median q3 max m sd missing n
female 18 25.8 38.0 44.2 63 37.5 13.4 0 40
male 19 32.5 38.5 52.0 68 41.2 14.0 0 60
diverse 33 33.0 33.0 33.0 33 33.0 NA 0 1
Total 18 27.0 38.0 52.0 68 39.7 13.8 0 101
# Compare the means of multiple items
tab_metrics(data, starts_with("cg_adoption_"), sd_gender)
Expectations Total female male diverse
ChatGPT has clear advantages compared to similar offerings. 3.4 (1.0) 3.6 (1.0) 3.3 (1.0) 4.0 (NA)
Using ChatGPT brings financial benefits. 2.7 (1.2) 2.6 (1.2) 2.7 (1.2) 3.0 (NA)
Using ChatGPT is advantageous in many tasks. 3.6 (1.1) 3.7 (1.0) 3.5 (1.1) 4.0 (NA)
Compared to other systems, using ChatGPT is more fun. 3.5 (1.0) 3.6 (1.0) 3.5 (1.0) 3.0 (NA)
Much can go wrong when using ChatGPT. 3.1 (1.1) 3.1 (1.0) 3.1 (1.2) 3.0 (NA)
There are legal issues with using ChatGPT. 3.1 (1.2) 3.0 (1.0) 3.1 (1.3) 3.0 (NA)
The security of user data is not guaranteed with ChatGPT. 3.2 (1.0) 3.0 (1.0) 3.3 (1.1) 3.0 (NA)
Using ChatGPT could bring personal disadvantages. 2.7 (1.1) 2.5 (0.9) 2.8 (1.2) 4.0 (NA)
In my environment, using ChatGPT is standard. 2.5 (1.1) 2.5 (0.9) 2.5 (1.3) 4.0 (NA)
Almost everyone in my environment uses ChatGPT. 2.4 (1.2) 2.4 (1.0) 2.3 (1.3) 4.0 (NA)
Not using ChatGPT is considered being an outsider. 2.0 (1.2) 1.8 (1.0) 2.1 (1.3) 4.0 (NA)
Using ChatGPT brings me recognition from my environment. 2.3 (1.2) 2.4 (1.2) 2.3 (1.3) 3.0 (NA)
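
The group comparison of means can be sketched with base R. The data frame below is made up for illustration and is not the survey data; the sketch also shows why a group with a single case gets NA as its standard deviation, as for the "diverse" group in the tables above.

```r
# Made-up example data; the values are illustrative only
df <- data.frame(
  gender = c("female", "female", "male", "male", "male", "diverse"),
  age    = c(25, 35, 30, 40, 50, 33)
)

# Mean and standard deviation of age within each group
group_m  <- tapply(df$age, df$gender, mean)
group_sd <- tapply(df$age, df$gender, sd)

# sd() needs at least two values, so the single-case group yields NA
data.frame(
  gender = names(group_m),
  m      = as.numeric(group_m),
  sd     = as.numeric(group_sd)
)
```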

Automatically generate reports

Reports combine plots and tables. Optionally, for item batteries, an index is calculated and reported.

To see an example or develop your own reports, use the volker report template in RStudio:

Have fun developing your own reports!

Alternatively, manually add volker::html_report to the output options of your Markdown document:

---
title: "How to create reports?"
output: 
  volker::html_report
---

Then, you can generate combined outputs using the report-functions. One advantage of the report-functions is that plots are automatically scaled to fit the page.

The main entry points for reports are the report-functions. See the function help or the report vignette for further options.

data %>% 
  filter(sd_gender != "diverse") %>% 
  report_metrics(starts_with("cg_adoption_"), sd_gender)

Expectations

Plot

Table
Expectations Total female male
ChatGPT has clear advantages compared to similar offerings. 3.4 (1.0) 3.6 (1.0) 3.3 (1.0)
Using ChatGPT brings financial benefits. 2.7 (1.2) 2.6 (1.2) 2.7 (1.2)
Using ChatGPT is advantageous in many tasks. 3.6 (1.1) 3.7 (1.0) 3.5 (1.1)
Compared to other systems, using ChatGPT is more fun. 3.5 (1.0) 3.6 (1.0) 3.5 (1.0)
Much can go wrong when using ChatGPT. 3.1 (1.1) 3.1 (1.0) 3.1 (1.2)
There are legal issues with using ChatGPT. 3.1 (1.2) 3.0 (1.0) 3.1 (1.3)
The security of user data is not guaranteed with ChatGPT. 3.2 (1.0) 3.0 (1.0) 3.3 (1.1)
Using ChatGPT could bring personal disadvantages. 2.7 (1.1) 2.5 (0.9) 2.8 (1.2)
In my environment, using ChatGPT is standard. 2.5 (1.1) 2.5 (0.9) 2.5 (1.3)
Almost everyone in my environment uses ChatGPT. 2.4 (1.2) 2.4 (1.0) 2.3 (1.3)
Not using ChatGPT is considered being an outsider. 2.0 (1.2) 1.8 (1.0) 2.1 (1.3)
Using ChatGPT brings me recognition from my environment. 2.3 (1.2) 2.4 (1.2) 2.3 (1.3)
Index: Plot

Index: Table
Gender min q1 median q3 max m sd missing n items alpha
female 2 2.5 2.9 3.1 3.8 2.9 0.4 0 40 12 0.81
male 1 2.5 2.8 3.2 5.0 2.9 0.7 0 60 12 0.81
Total 1 2.5 2.8 3.2 5.0 2.9 0.6 0 100 12 0.81

If you want to add content before the report outputs, set the title parameter to FALSE and add your own title.

A good place for methodological details is a tabsheet next to the “Plot” and the “Table” buttons. You can add a tab by setting the close-parameter to FALSE and adding a new header on the fifth level (5 x # followed by the tab name). Close your new tabsheet with #### {-} (4 x #). See the example Markdown behind this vignette.

Adoption types

data %>% 
  filter(sd_gender != "diverse") %>% 
  report_counts(adopter, sd_gender, prop="rows", title= FALSE, close= FALSE)

Plot

Table
Gender Total I try new offers immediately I try new offers rather quickly I wait until offers establish themselves I only use new offers when I have no other choice
female 100% (40) 5% (2) 62% (25) 32% (13) 0% (0)
male 100% (60) 20% (12) 63% (38) 15% (9) 2% (1)
Total 100% (100) 14% (14) 63% (63) 22% (22) 1% (1)
Method

Basis: Only male and female respondents.

Customizing outputs

Plot and table functions share a number of parameters that can be used to customize the outputs. Look up the available parameters in the help of the specific function:

Custom labels: Where do they come from?

Labels used in plots and tables are stored in the comment attribute of the variable. You can inspect all labels using the codebook()-function:

codebook(data)
# A tibble: 94 × 6
   item_name     item_group item_class item_label         value_name value_label
   <chr>         <chr>      <chr>      <chr>              <chr>      <chr>      
 1 case          case       <NA>       case               <NA>       <NA>       
 2 sd_age        sd         <NA>       Age                <NA>       <NA>       
 3 cg_activities cg         <NA>       Activities with C… <NA>       <NA>       
 4 use_private   use        <NA>       Usage: in private… 1          never      
 5 use_private   use        <NA>       Usage: in private… 2          rarely     
 6 use_private   use        <NA>       Usage: in private… 3          several ti…
 7 use_private   use        <NA>       Usage: in private… 4          several ti…
 8 use_private   use        <NA>       Usage: in private… 5          almost dai…
 9 use_work      use        <NA>       Usage: in profess… 1          never      
10 use_work      use        <NA>       Usage: in profess… 2          rarely     
# ℹ 84 more rows
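
How a label travels with a column through the comment attribute can be sketched in base R. The data frame is made up for this illustration; only the column name and the label mirror the codebook above.

```r
# A small made-up data frame; the column name mirrors the codebook above
df <- data.frame(sd_age = c(23, 41, 35))

# Attach a label to the column via the comment attribute
comment(df$sd_age) <- "Age"

# Read the label back
comment(df$sd_age)

# The same value is available through attr()
attr(df$sd_age, "comment")
```

Because the label is an attribute of the column itself, it stays attached as long as the operations applied to the data preserve attributes.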

You can set custom or new labels with labs_apply() by providing a tibble with item names in the first column and item labels in the second column.

newlabels <- tribble(
  ~item_name, ~item_label,
  "cg_adoption_advantage_01", "Allgemeine Vorteile",
  "cg_adoption_advantage_02", "Finanzielle Vorteile",
  "cg_adoption_advantage_03", "Vorteile bei der Arbeit",
  "cg_adoption_advantage_04", "Macht mehr Spaß"
)

data %>%
  labs_apply(newlabels) %>%
  tab_metrics_items(starts_with("cg_adoption_advantage_"))
Item min q1 median q3 max m sd missing n
Allgemeine Vorteile 1 3 4 4 5 3.5 1.0 2 101
Finanzielle Vorteile 1 2 3 4 5 2.7 1.2 0 101
Vorteile bei der Arbeit 1 3 4 4 5 3.6 1.1 0 101
Macht mehr Spaß 1 3 4 4 5 3.5 1.0 0 101

You can remove all labels with labs_clear() to get a plain dataset.

data %>%
  labs_clear(everything()) %>%
  tab_counts(starts_with("cg_adoption_advantage_"))
cg_adoption_advantage_0 1 2 3 4 5 Total
1 6% (6) 8% (8) 34% (34) 37% (37) 14% (14) 100% (99)
2 22% (22) 21% (21) 29% (29) 21% (21) 6% (6) 100% (99)
3 6% (6) 10% (10) 21% (21) 45% (45) 17% (17) 100% (99)
4 6% (6) 4% (4) 35% (35) 39% (39) 15% (15) 100% (99)

Setting the labels parameter to FALSE achieves a similar result.

data %>%
  tab_counts(starts_with("cg_adoption_advantage_"), labels= FALSE)
cg_adoption_advantage_0 1 2 3 4 5 Total
1 6% (6) 8% (8) 34% (34) 37% (37) 14% (14) 100% (99)
2 22% (22) 21% (21) 29% (29) 21% (21) 6% (6) 100% (99)
3 6% (6) 10% (10) 21% (21) 45% (45) 17% (17) 100% (99)
4 6% (6) 4% (4) 35% (35) 39% (39) 15% (15) 100% (99)

Scales

You can calculate mean indexes from a set of items using idx_add(). A new column is created with the average value of all selected columns for each case.

Reliability and number of items are calculated with psych::alpha() and stored as a column attribute named “psych.alpha”. The reliability values are printed by tab_metrics().
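
The idea behind a mean index and its reliability can be sketched in base R. This is a minimal illustration with made-up responses, not the code behind idx_add(): it uses the textbook formula for raw Cronbach's alpha, whereas psych::alpha() reports further statistics. The layout of the stored attribute (a list with items and alpha) is an assumption made for this sketch.

```r
# Made-up item responses on a 1-5 scale (rows: cases, columns: items)
items <- data.frame(
  item_01 = c(4, 3, 5, 2, 4, 3),
  item_02 = c(5, 3, 4, 2, 4, 2),
  item_03 = c(4, 2, 5, 1, 3, 3)
)

# Mean index: the average of all selected items for each case
idx <- rowMeans(items)

# Raw Cronbach's alpha from the item covariance matrix:
# alpha = k / (k - 1) * (1 - sum of item variances / variance of the sum score)
k     <- ncol(items)
covm  <- cov(items)
alpha <- k / (k - 1) * (1 - sum(diag(covm)) / sum(covm))

# Keep the reliability next to the index as a column attribute.
# The attribute name follows the text above; the list layout is an assumption.
attr(idx, "psych.alpha") <- list(items = k, alpha = alpha)

round(alpha, 2)
```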

Add a single index

data %>%
  idx_add(starts_with("cg_adoption_")) %>%
  tab_metrics(idx_cg_adoption)
Index: cg_adoption value
min 1
q1 2.5
median 2.8
q3 3.2
max 5
m 2.9
sd 0.6
missing 0
n 101
items 12
alpha 0.81

Compare the index values by group

data %>%
  idx_add(starts_with("cg_adoption_")) %>%
  tab_metrics(idx_cg_adoption, adopter)
Innovator type min q1 median q3 max m sd missing n items alpha
I try new offers immediately 1.5 3.2 3.3 4.1 5.0 3.5 0.9 0 15 12 0.81
I try new offers rather quickly 1.8 2.5 2.8 3.1 3.8 2.8 0.5 0 63 12 0.81
I wait until offers establish themselves 1.0 2.4 2.8 3.1 3.8 2.7 0.6 0 22 12 0.81
I only use new offers when I have no other choice 2.4 2.4 2.4 2.4 2.4 2.4 NA 0 1 12 0.81
Total 1.0 2.5 2.8 3.2 5.0 2.9 0.6 0 101 12 0.81

Add multiple indices and summarize them

data %>%
  idx_add(starts_with("cg_adoption_")) %>%
  idx_add(starts_with("cg_adoption_advantage")) %>%
  idx_add(starts_with("cg_adoption_fearofuse")) %>%
  idx_add(starts_with("cg_adoption_social")) %>%
  tab_metrics(starts_with("idx_cg_adoption"))
Index: cg_adoption min q1 median q3 max m sd missing n
Index: cg_adoption 1 2.5 2.8 3.2 5 2.9 0.6 0 101
_advantage 1 3.0 3.5 3.8 5 3.3 0.8 0 101
_fearofuse 1 2.5 3.0 3.5 5 3.0 0.8 0 101
_social 1 1.5 2.0 2.8 5 2.3 1.0 0 101

What’s behind the scenes?

The volker-package is based on standard methods for data handling and visualisation. You can produce all outputs with a handful of functions. The package just keeps your code DRY (don’t repeat yourself) and wraps often used snippets into a simple interface.

Basically, all table values are calculated by two tidyverse functions:

To shape the data frames, two essential functions come into play:

Plots are generated by ggplot().

The package provides print- and knit-functions that enhance console and markdown output. To make this work, the cleaned data, produced plots, tables and markdown snippets gain new classes (vlkr_df, vlkr_plt, vlkr_tbl, vlkr_rprt).
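
The general mechanism of adding a class to an object so that a custom print method takes over can be sketched with R's S3 system. The class name demo_tbl and the helper as_demo_tbl() are hypothetical names made up for this sketch; they are not part of the package.

```r
# Hypothetical helper and class name, used only for this sketch
as_demo_tbl <- function(df) {
  # Prepend the new class so that S3 dispatch finds print.demo_tbl() first
  class(df) <- c("demo_tbl", class(df))
  df
}

# Custom print method: add a caption line, then fall back to the default printing
print.demo_tbl <- function(x, ...) {
  cat("Table with", nrow(x), "rows\n")
  NextMethod()
  invisible(x)
}

tbl <- as_demo_tbl(data.frame(value = c("never", "rarely"), n = c(12, 40)))

# Printing the object now dispatches to print.demo_tbl()
print(tbl)
```

Since the original classes are kept, the object still behaves like a regular data frame everywhere else.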