How to use the volkeR package?
First, load the package, set the plot theme and get some data.
# Load the package
library(volker)
# Set the basic plot theme
theme_set(theme_vlkr())
# Load an example dataset ds from the package
ds <- volker::chatgpt
How to generate tables and plots?
Decide whether your data is categorical or metric and choose the appropriate function:
- tab_counts() shows frequency tables.
- plot_counts() generates simple and stacked bar charts.
- effect_counts() calculates test statistics for categorical data.
- tab_metrics() creates tables with distribution parameters.
- plot_metrics() visualises distributions in density plots, box plots or scatter plots.
- effect_metrics() calculates test statistics for metric data.
The column selection determines whether to analyse single variables, item lists or to compare and correlate multiple variables. Try it out!
Categorical variables
# A single variable
tab_counts(ds, use_private)
Usage: in private context | n | p |
---|---|---|
never | 12 | 12% |
rarely | 40 | 40% |
several times a month | 30 | 30% |
several times a week | 15 | 15% |
almost daily | 4 | 4% |
total | 101 | 100% |
# A list of variables
tab_counts(ds, c(use_private, use_work))
Usage | never | rarely | several times a month | several times a week | almost daily | total |
---|---|---|---|---|---|---|
in private context | 12% (12) | 40% (40) | 30% (30) | 15% (15) | 4% (4) | 100% (101) |
in professional context | 38% (38) | 21% (21) | 15% (15) | 17% (17) | 10% (10) | 100% (101) |
# Variables matched by a pattern
tab_counts(ds, starts_with("use_"))
Usage | never | rarely | several times a month | several times a week | almost daily | total |
---|---|---|---|---|---|---|
in private context | 12% (12) | 40% (40) | 30% (30) | 15% (15) | 4% (4) | 100% (101) |
in professional context | 38% (38) | 21% (21) | 15% (15) | 17% (17) | 10% (10) | 100% (101) |
Metric variables
# One metric variable
tab_metrics(ds, sd_age)
Age | value |
---|---|
min | 18 |
q1 | 27 |
median | 38 |
q3 | 52 |
max | 68 |
mean | 39.7 |
sd | 13.8 |
n | 101 |
# Multiple metric items
tab_metrics(ds, starts_with("cg_adoption_"))
Expectations | min | q1 | median | q3 | max | mean | sd | n |
---|---|---|---|---|---|---|---|---|
ChatGPT has clear advantages compared to similar offerings. | 1 | 3 | 4 | 4 | 5 | 3.4 | 1.0 | 97 |
Using ChatGPT brings financial benefits. | 1 | 2 | 3 | 4 | 5 | 2.7 | 1.2 | 97 |
Using ChatGPT is advantageous in many tasks. | 1 | 3 | 4 | 4 | 5 | 3.6 | 1.1 | 97 |
Compared to other systems, using ChatGPT is more fun. | 1 | 3 | 4 | 4 | 5 | 3.5 | 1.0 | 97 |
Much can go wrong when using ChatGPT. | 1 | 2 | 3 | 4 | 5 | 3.1 | 1.1 | 97 |
There are legal issues with using ChatGPT. | 1 | 2 | 3 | 4 | 5 | 3.1 | 1.2 | 97 |
The security of user data is not guaranteed with ChatGPT. | 1 | 3 | 3 | 4 | 5 | 3.2 | 1.0 | 97 |
Using ChatGPT could bring personal disadvantages. | 1 | 2 | 3 | 3 | 5 | 2.7 | 1.1 | 97 |
In my environment, using ChatGPT is standard. | 1 | 2 | 2 | 3 | 5 | 2.5 | 1.1 | 97 |
Almost everyone in my environment uses ChatGPT. | 1 | 1 | 2 | 3 | 5 | 2.4 | 1.2 | 97 |
Not using ChatGPT is considered being an outsider. | 1 | 1 | 2 | 3 | 5 | 2.0 | 1.2 | 97 |
Using ChatGPT brings me recognition from my environment. | 1 | 1 | 2 | 3 | 5 | 2.3 | 1.2 | 97 |
4 missing case(s) omitted.
Cross tabulation and group comparison
Provide a grouping column in the third parameter to compare different groups.
tab_counts(ds, adopter, sd_gender)
Innovator type | total | female | male | diverse |
---|---|---|---|---|
I try new offers immediately | 15% (15) | 2% (2) | 12% (12) | 1% (1) |
I try new offers rather quickly | 62% (63) | 25% (25) | 38% (38) | 0% (0) |
I wait until offers establish themselves | 22% (22) | 13% (13) | 9% (9) | 0% (0) |
I only use new offers when I have no other choice | 1% (1) | 0% (0) | 1% (1) | 0% (0) |
total | 100% (101) | 40% (40) | 59% (60) | 1% (1) |
For metric variables, you can compare the mean values.
# Compare the means of one grouping variable (including the confidence interval)
tab_metrics(ds, sd_age, sd_gender, ci = TRUE)
Gender | min | q1 | median | q3 | max | mean | sd | ci low | ci high | n |
---|---|---|---|---|---|---|---|---|---|---|
female | 18 | 25.8 | 38.0 | 44.2 | 63 | 37.5 | 13.4 | 33.2 | 41.8 | 40 |
male | 19 | 32.5 | 38.5 | 52.0 | 68 | 41.2 | 14.0 | 37.6 | 44.8 | 60 |
diverse | 33 | 33.0 | 33.0 | 33.0 | 33 | 33.0 | | | | 1 |
total | 18 | 27.0 | 38.0 | 52.0 | 68 | 39.7 | 13.8 | 37.0 | 42.4 | 101 |
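The confidence limits in the table above are consistent with a t-based interval around the mean. The following base R sketch reproduces the limits of the female group from its rounded mean, standard deviation and n. Note that the helper name ci_mean is hypothetical and that the formula is an assumption about the method, not taken from the package source.

```r
# Hypothetical helper: t-based confidence interval from summary statistics.
# Assumption: interval = mean +/- t * sd / sqrt(n); volker's internals may differ.
ci_mean <- function(mean, sd, n, level = 0.95) {
  se <- sd / sqrt(n)                              # standard error of the mean
  t_crit <- qt(1 - (1 - level) / 2, df = n - 1)   # two-sided critical value
  c(low = mean - t_crit * se, high = mean + t_crit * se)
}

# Rounded values of the female group from the table above
ci_female <- ci_mean(mean = 37.5, sd = 13.4, n = 40)
round(ci_female, 1)
# low 33.2, high 41.8
```

Because the inputs are already rounded, the result can deviate slightly from the package output for other groups.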
By default, the crossing variable is treated as categorical. You can change this behaviour using the metric-parameter to calculate correlations:
# Correlate two metric variables
tab_metrics(ds, sd_age, use_work, metric = TRUE, ci = TRUE)
Item 1 | Item 2 | n | Pearson’s r | ci low | ci high |
---|---|---|---|---|---|
Age | Usage: in professional context | 101 | -0.2 | -0.38 | 0 |
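The confidence limits of the correlation are in line with an interval based on Fisher's z transformation. The sketch below applies that transformation to the rounded r and the n from the table. The helper name ci_pearson is hypothetical, and whether the package computes the interval exactly this way is an assumption.

```r
# Hypothetical helper: confidence interval for Pearson's r via Fisher's z.
# Assumption: this mirrors the usual textbook interval, not necessarily volker's code.
ci_pearson <- function(r, n, level = 0.95) {
  z <- atanh(r)                          # transform r to the z scale
  se <- 1 / sqrt(n - 3)                  # standard error on the z scale
  z_crit <- qnorm(1 - (1 - level) / 2)   # two-sided critical value
  tanh(c(low = z - z_crit * se, high = z + z_crit * se))  # back to the r scale
}

# Rounded values from the table above
ci_r <- ci_pearson(r = -0.2, n = 101)
round(ci_r, 2)
# approximately -0.38 and 0
```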
Each table function has a corresponding plot function with parameters to customise the result. See the function help (F1 key) to learn the options.
For example, you can use the prop parameter to scale the bars to 100%. The numbers parameter prints frequencies and percentages onto the bars.
ds |>
filter(sd_gender != "diverse") |>
plot_counts(adopter, sd_gender, prop="rows", numbers=c("p","n"))
Further, the effect-functions conduct statistical tests:
ds |>
filter(sd_gender != "diverse") |>
effect_counts(adopter, sd_gender)
Statistic | Value |
---|---|
Cramer’s V | 0.28 |
Number of cases | 100 |
Degrees of freedom | |
Chi-squared | 7.87 |
p value | 0.031 |
stars | * |
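To see where the chi-squared value and Cramér's V come from, you can recompute them in base R from the counts of the cross table (female and male respondents only). This is a minimal sketch using the textbook formulas; the short row labels are made up for brevity, and the p value is left out because the document does not state how it is obtained.

```r
# Observed counts (rows: innovator type, columns: gender),
# copied from the cross table in this document without the "diverse" case
observed <- matrix(
  c( 2, 12,
    25, 38,
    13,  9,
     0,  1),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    adopter = c("immediately", "rather quickly", "wait", "no other choice"),
    gender  = c("female", "male")
  )
)

# Expected counts under independence of rows and columns
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson's chi-squared statistic
chi2 <- sum((observed - expected)^2 / expected)

# Cramér's V: chi-squared scaled by n and the smaller table dimension minus one
cramers_v <- sqrt(chi2 / (sum(observed) * (min(dim(observed)) - 1)))

round(c(chi2 = chi2, v = cramers_v), 2)
# chi2 7.87, v 0.28
```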
Automatically generate reports
Getting started
Reports combine plots, tables and effect calculations. Optionally, for item batteries, an index is calculated and reported.
To see an example or develop your own reports, use the volker report template in RStudio:
- Create a new R Markdown document from the main menu
- In the popup select the “From Template” option
- Select the volker template.
- The template contains a working example. Just click knit to see the result.
Have fun developing your own reports!
Custom reports
To generate a volker report from any R Markdown document, add volker::html_report to the output options of your Markdown document:
---
title: "How to create reports?"
output:
  volker::html_report
---
Then, you can generate combined outputs using the report-functions. One advantage of the report-functions is that plots are automatically scaled to fit the page. See the function help for further options (F1 key).
ds %>%
filter(sd_gender != "diverse") %>%
report_metrics(starts_with("cg_adoption_"), sd_gender, index=TRUE, box=TRUE, ci=TRUE)
Expectations
Table
Expectations | total | female | male |
---|---|---|---|
ChatGPT has clear advantages compared to similar offerings. | 3.4 (1.0) | 3.6 (1.0) | 3.3 (1.0) |
Using ChatGPT brings financial benefits. | 2.7 (1.2) | 2.6 (1.2) | 2.7 (1.2) |
Using ChatGPT is advantageous in many tasks. | 3.6 (1.1) | 3.7 (1.0) | 3.5 (1.1) |
Compared to other systems, using ChatGPT is more fun. | 3.5 (1.0) | 3.6 (1.0) | 3.5 (1.0) |
Much can go wrong when using ChatGPT. | 3.1 (1.1) | 3.1 (1.0) | 3.1 (1.2) |
There are legal issues with using ChatGPT. | 3.1 (1.2) | 3.0 (1.0) | 3.1 (1.3) |
The security of user data is not guaranteed with ChatGPT. | 3.2 (1.0) | 3.0 (1.0) | 3.3 (1.1) |
Using ChatGPT could bring personal disadvantages. | 2.7 (1.1) | 2.5 (0.9) | 2.8 (1.2) |
In my environment, using ChatGPT is standard. | 2.5 (1.1) | 2.5 (0.9) | 2.5 (1.3) |
Almost everyone in my environment uses ChatGPT. | 2.4 (1.2) | 2.4 (1.0) | 2.3 (1.3) |
Not using ChatGPT is considered being an outsider. | 2.0 (1.2) | 1.8 (1.0) | 2.1 (1.3) |
Using ChatGPT brings me recognition from my environment. | 2.3 (1.2) | 2.4 (1.2) | 2.3 (1.3) |
4 missing case(s) omitted.
Index: Table
Gender | min | q1 | median | q3 | max | mean | sd | ci low | ci high | n | items | alpha |
---|---|---|---|---|---|---|---|---|---|---|---|---|
female | 2 | 2.4 | 2.9 | 3.1 | 3.8 | 2.9 | 0.5 | 2.7 | 3.0 | 37 | 12 | 0.81 |
male | 1 | 2.5 | 2.8 | 3.2 | 5.0 | 2.9 | 0.7 | 2.7 | 3.1 | 59 | 12 | 0.81 |
total | 1 | 2.4 | 2.8 | 3.2 | 5.0 | 2.9 | 0.6 | 2.7 | 3.0 | 96 | 12 | 0.81 |
4 missing case(s) omitted.
Custom tab sheets
By default, a header and tabsheets are automatically created. You can mix in custom content.
- If you want to add content before the report outputs, set the title parameter to FALSE and add your own title.
- A good place for methodological details is a custom tabsheet next to the “Plot” and the “Table” buttons. You can add a tab by setting the close parameter to FALSE and adding a new header on the fifth level (5 x # followed by the tab name). Close your custom tabsheet with #### {-} (4 x #).
Altogether, the report output shown below is generated by this pattern:
#> ### Adoption types
#>
#> ```{r echo=FALSE}
#> ds %>%
#> filter(sd_gender != "diverse") %>%
#> report_counts(adopter, sd_gender, prop="rows", title=FALSE, close=FALSE, box=TRUE, ci=TRUE)
#> ```
#>
#> ##### Method
#> Basis: Only male and female respondents.
#>
#> #### {-}
Adoption types
Table
Innovator type | total | female | male |
---|---|---|---|
I try new offers immediately | 100% (14) | 14% (2) | 86% (12) |
I try new offers rather quickly | 100% (63) | 40% (25) | 60% (38) |
I wait until offers establish themselves | 100% (22) | 59% (13) | 41% (9) |
I only use new offers when I have no other choice | 100% (1) | 0% (0) | 100% (1) |
total | 100% (100) | 40% (40) | 60% (60) |
Theming
Plot and table functions share a number of parameters that can be used to customise the outputs. Look up the available parameters in the help of the specific function.
The theme_vlkr()-function lets you customise colors:
theme_set(theme_vlkr(
base_fill = c("#F0983A","#3ABEF0","#95EF39","#E35FF5","#7A9B59"),
base_gradient = c("#FAE2C4","#F0983A")
))
Custom labels
Labels used in plots and tables are stored in the comment attribute of the variable. You can inspect all labels using the codebook() function:
codebook(ds)
#> # A tibble: 94 × 6
#> item_name item_group item_class item_label value_name value_label
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 case case numeric case NA NA
#> 2 sd_age sd numeric Age NA NA
#> 3 cg_activities cg character Activities with C… NA NA
#> 4 adopter adopter factor Innovator type I try new… I try new …
#> 5 adopter adopter factor Innovator type I try new… I try new …
#> 6 adopter adopter factor Innovator type I wait un… I wait unt…
#> 7 adopter adopter factor Innovator type I only us… I only use…
#> 8 adopter adopter factor Innovator type [no answe… [no answer]
#> 9 sd_gender sd factor Gender female female
#> 10 sd_gender sd factor Gender male male
#> # ℹ 84 more rows
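Since item labels live in the comment attribute, you can also read or set a single label with base R. The following sketch uses a small made-up data frame instead of the chatgpt dataset; it only illustrates the attribute mechanism and does not replace labs_apply().

```r
# Illustrative data frame (made-up values, not the chatgpt dataset)
df <- data.frame(sd_age = c(23, 41, 35))

# Attach a label to the column via the base R comment attribute
comment(df$sd_age) <- "Age"

# Read the label back
attr(df$sd_age, "comment")
# "Age"
```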
You can set specific column labels by providing a named list to the items parameter of labs_apply():
ds %>%
labs_apply(
items = list(
"cg_adoption_advantage_01" = "Allgemeine Vorteile",
"cg_adoption_advantage_02" = "Finanzielle Vorteile",
"cg_adoption_advantage_03" = "Vorteile bei der Arbeit",
"cg_adoption_advantage_04" = "Macht mehr Spaß"
)
) %>%
tab_metrics(starts_with("cg_adoption_advantage_"))
Item | min | q1 | median | q3 | max | mean | sd | n |
---|---|---|---|---|---|---|---|---|
Allgemeine Vorteile | 1 | 3 | 4 | 4 | 5 | 3.5 | 1.0 | 99 |
Finanzielle Vorteile | 1 | 2 | 3 | 4 | 5 | 2.7 | 1.2 | 99 |
Vorteile bei der Arbeit | 1 | 3 | 4 | 4 | 5 | 3.6 | 1.1 | 99 |
Macht mehr Spaß | 1 | 3 | 4 | 4 | 5 | 3.5 | 1.0 | 99 |
2 missing case(s) omitted.
Labels for values inside a column can be adjusted by providing a named list to the values parameter of labs_apply(). In addition, select the columns where value labels should be changed:
ds %>%
labs_apply(
cols=starts_with("cg_adoption"),
values = list(
"1" = "Stimme überhaupt nicht zu",
"2" = "Stimme nicht zu",
"3" = "Unentschieden",
"4" = "Stimme zu",
"5" = "Stimme voll und ganz zu"
)
) %>%
plot_metrics(starts_with("cg_adoption"))
To conveniently manage all labels of a dataset, save the result of codebook() to an Excel file, change the labels manually in a copy of the Excel file, and finally call labs_apply() with your revised codebook.
library(readxl)
library(writexl)
# Save codebook to a file
codes <- codebook(ds)
write_xlsx(codes,"codebook.xlsx")
# Load and apply a codebook from a file
codes <- read_xlsx("codebook_revised.xlsx")
ds <- labs_apply(ds, codes)
Be aware that some data operations, such as mutate() from the tidyverse, lose labels on their way. In this case, store the labels (in the codebook attribute of the data frame) before the operation and restore them afterwards:
ds %>%
labs_store() %>%
mutate(sd_age = 2024 - sd_age) %>%
labs_restore() %>%
tab_metrics(sd_age)
Age | value |
---|---|
min | 1956 |
q1 | 1972 |
median | 1986 |
q3 | 1997 |
max | 2006 |
mean | 1984.3 |
sd | 13.8 |
n | 101 |
Index calculation for item batteries
You can calculate mean indexes from a set of items using idx_add(). A new column is created with the average value of all selected columns for each case.
Reliability and number of items are calculated with psych::alpha() and stored as a column attribute named “psych.alpha”. The reliability values are printed by tab_metrics().
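To make the idea concrete, here is a base R sketch of a mean index and the raw Cronbach's alpha for a made-up item battery. It uses the textbook alpha formula rather than psych::alpha(), so treat it as an illustration of the concept and not as the package's implementation; the item names and values are invented.

```r
# Illustrative item battery (made-up answers on a 1-5 scale)
items <- data.frame(
  item_01 = c(1, 2, 3, 4, 5),
  item_02 = c(2, 2, 3, 5, 5),
  item_03 = c(1, 3, 3, 4, 4)
)

# Mean index: average of all items for each case
idx <- rowMeans(items)

# Raw Cronbach's alpha:
# k / (k - 1) * (1 - sum of item variances / variance of the sum score)
k <- ncol(items)
item_vars <- sapply(items, var)
total_var <- var(rowSums(items))
alpha <- k / (k - 1) * (1 - sum(item_vars) / total_var)

round(alpha, 2)
# 0.95
```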
Add a single index
ds %>%
idx_add(starts_with("cg_adoption_")) %>%
tab_metrics(idx_cg_adoption)
Index: cg_adoption | value |
---|---|
min | 1 |
q1 | 2.4 |
median | 2.8 |
q3 | 3.2 |
max | 5 |
mean | 2.9 |
sd | 0.6 |
n | 97 |
items | 12 |
alpha | 0.81 |
4 missing case(s) omitted.
Compare the index values by group
ds %>%
idx_add(starts_with("cg_adoption_")) %>%
tab_metrics(idx_cg_adoption, adopter)
Innovator type | min | q1 | median | q3 | max | mean | sd | n | items | alpha |
---|---|---|---|---|---|---|---|---|---|---|
I try new offers immediately | 1.5 | 3.2 | 3.3 | 4.1 | 5.0 | 3.5 | 0.9 | 15 | 12 | 0.81 |
I try new offers rather quickly | 1.8 | 2.5 | 2.8 | 3.1 | 3.8 | 2.8 | 0.5 | 61 | 12 | 0.81 |
I wait until offers establish themselves | 1.0 | 2.4 | 2.7 | 3.0 | 3.8 | 2.7 | 0.6 | 20 | 12 | 0.81 |
I only use new offers when I have no other choice | 2.4 | 2.4 | 2.4 | 2.4 | 2.4 | 2.4 | | 1 | 12 | 0.81 |
total | 1.0 | 2.4 | 2.8 | 3.2 | 5.0 | 2.9 | 0.6 | 97 | 12 | 0.81 |
4 missing case(s) omitted.
Add multiple indices and summarize them
ds %>%
idx_add(starts_with("cg_adoption_")) %>%
idx_add(starts_with("cg_adoption_advantage")) %>%
idx_add(starts_with("cg_adoption_fearofuse")) %>%
idx_add(starts_with("cg_adoption_social")) %>%
tab_metrics(starts_with("idx_cg_adoption"))
Item | min | q1 | median | q3 | max | mean | sd | n | items | alpha |
---|---|---|---|---|---|---|---|---|---|---|
Index: cg_adoption | 1 | 2.4 | 2.8 | 3.2 | 5 | 2.9 | 0.6 | 97 | 12 | 0.81 |
Index: cg_adoption_advantage_0 | 1 | 3.0 | 3.5 | 3.8 | 5 | 3.3 | 0.9 | 97 | 4 | 0.8 |
Index: cg_adoption_fearofuse_0 | 1 | 2.5 | 3.0 | 3.5 | 5 | 3.0 | 0.8 | 97 | 4 | 0.7 |
Index: cg_adoption_social_0 | 1 | 1.5 | 2.0 | 3.0 | 5 | 2.3 | 1.0 | 97 | 4 | 0.84 |
4 missing case(s) omitted.
What’s behind the scenes?
The volker package is based on standard methods for data handling and visualisation. You could produce all outputs yourself with a handful of functions. The package just keeps your code DRY (don’t repeat yourself) and wraps frequently used snippets into a simple interface.
Basically, all table values are calculated by two functions:
- count() from dplyr is used to produce counts.
- skim() from skimr is used to produce metrics.
To shape the data frames, two essential functions come into play:
- group_by() is used to calculate grouped outputs.
- pivot_longer() brings multiple items into a format where the item name becomes a grouping variable.
Plots are generated by ggplot().
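As a rough sketch of that pattern (not the package's actual code), the following snippet builds a frequency table for two items with dplyr and tidyr. The tiny data frame is made up; the column names only echo the example dataset.

```r
library(dplyr)
library(tidyr)

# Illustrative item list (made-up answers)
items <- tibble(
  use_private = c("never", "rarely", "rarely"),
  use_work    = c("never", "never", "rarely")
)

counts <- items |>
  # Long format: the item name becomes a grouping variable
  pivot_longer(everything(), names_to = "item", values_to = "value") |>
  # Frequencies per item and answer
  count(item, value) |>
  # Shares within each item
  group_by(item) |>
  mutate(p = n / sum(n)) |>
  ungroup()

counts
```

The result has one row per combination of item and answer, with the count n and the share p within each item.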
The package provides print and knit functions that polish console and markdown output. To make this work, the cleaned data, produced plots, tables and markdown snippets gain new classes (vlkr_df, vlkr_plt, vlkr_tbl, vlkr_list, vlkr_rprt).