Visualize and tabulate single-choice, multiple-choice, and matrix-style questions from survey data. Cross-tabulations, frequency distributions, and plots can all be grouped by categorical variables.
Every plot and table can also incorporate survey weights.
The functions are ideal for quickly uncovering descriptive patterns in survey data.
install.packages("surveyexplorer")
# or devtools::install_github("liamhaller/surveyexplorer") for the development version
library(surveyexplorer)
The data used in the following examples is from the berlinbears dataset, a fictional survey of bears in Berlin that is included in the surveyexplorer package.
#Basic table
single_table(berlinbears,
question = income)
Question: income

| | n | freq |
|---|---|---|
<1000 | 82 | 16.40% |
1000-2000 | 50 | 10.00% |
2000-3000 | 177 | 35.40% |
3000-4000 | 109 | 21.80% |
5000+ | 57 | 11.40% |
No answer | 22 | 4.40% |
NA | 3 | 0.60% |
Column Total | 500 | 1 |
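
As a sanity check, the n and freq columns can be reproduced by hand in base R. The sketch below does not call surveyexplorer; it re-enters the counts printed above (income_counts and income_freq are names made up for this illustration) and derives the percentages from them.

```r
# Counts copied from the table above; `income_counts` is an illustrative name.
income_counts <- c("<1000" = 82, "1000-2000" = 50, "2000-3000" = 177,
                   "3000-4000" = 109, "5000+" = 57, "No answer" = 22,
                   "NA" = 3)

# Relative frequency of each answer option, as a percentage of all responses
income_freq <- round(100 * income_counts / sum(income_counts), 2)

# Side-by-side view of counts and formatted percentages
data.frame(n = income_counts, freq = sprintf("%.2f%%", income_freq))
```

Each percentage is simply the count divided by the column total of 500.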
Use group_by to partition the question into several groups
single_table(berlinbears,
question = income,
group_by = gender)
Question: income, grouped by: gender

| | female (Frequency) | female (Count) | male (Frequency) | male (Count) | NA (Frequency) | NA (Count) | Rowwise Total (Frequency) | Rowwise Total (Count) |
|---|---|---|---|---|---|---|---|---|
<1000 | 16.74% | 39 | 15.73% | 39 | 21.05% | 4 | 16.40% | 82 |
1000-2000 | 9.87% | 23 | 9.68% | 24 | 15.79% | 3 | 10.00% | 50 |
2000-3000 | 35.62% | 83 | 35.89% | 89 | 26.32% | 5 | 35.40% | 177 |
3000-4000 | 21.89% | 51 | 22.18% | 55 | 15.79% | 3 | 21.80% | 109 |
5000+ | 11.59% | 27 | 10.89% | 27 | 15.79% | 3 | 11.40% | 57 |
No answer | 3.86% | 9 | 4.84% | 12 | 5.26% | 1 | 4.40% | 22 |
NA | 0.43% | 1 | 0.81% | 2 | 0.00% | 0 | 0.60% | 3 |
Columnwise Total | 46.60% | 233 | 49.60% | 248 | 3.80% | 19 | 100.00% | 500 |
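
To see how the grouped percentages relate to the counts, the following base R sketch re-enters the counts from the table above and recomputes them. It is an illustration only, not the package's internal code; the object names are made up here.

```r
# Counts copied from the grouped table above (rows: income, columns: gender).
counts <- matrix(
  c(39, 23, 83, 51, 27,  9, 1,   # female
    39, 24, 89, 55, 27, 12, 2,   # male
     4,  3,  5,  3,  3,  1, 0),  # NA (gender missing)
  ncol = 3,
  dimnames = list(
    income = c("<1000", "1000-2000", "2000-3000", "3000-4000",
               "5000+", "No answer", "NA"),
    gender = c("female", "male", "NA")
  )
)

# Frequencies within each subgroup: every column sums to 100%
within_group <- round(100 * prop.table(counts, margin = 2), 2)

# Share of each subgroup in the whole sample (the "Columnwise Total" row)
group_share <- round(100 * colSums(counts) / sum(counts), 2)

within_group
group_share
```

The frequencies are column-wise proportions, so they describe the income distribution within each gender rather than the gender split within each income band.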
Ignore unwanted subgroups with subgroups_to_exclude
single_table(berlinbears,
question = income,
group_by = gender,
subgroups_to_exclude = NA)
Question: income, grouped by: gender

| | female (Frequency) | female (Count) | male (Frequency) | male (Count) | Rowwise Total (Frequency) | Rowwise Total (Count) |
|---|---|---|---|---|---|---|
<1000 | 16.74% | 39 | 15.73% | 39 | 16.22% | 78 |
1000-2000 | 9.87% | 23 | 9.68% | 24 | 9.77% | 47 |
2000-3000 | 35.62% | 83 | 35.89% | 89 | 35.76% | 172 |
3000-4000 | 21.89% | 51 | 22.18% | 55 | 22.04% | 106 |
5000+ | 11.59% | 27 | 10.89% | 27 | 11.23% | 54 |
No answer | 3.86% | 9 | 4.84% | 12 | 4.37% | 21 |
NA | 0.43% | 1 | 0.81% | 2 | 0.62% | 3 |
Columnwise Total | 48.44% | 233 | 51.56% | 248 | 100.00% | 481 |
Remove NAs from the question variable with na.rm
single_table(berlinbears,
question = income,
group_by = gender,
subgroups_to_exclude = NA,
na.rm = TRUE)
Question: income, grouped by: gender

| | female (Frequency) | female (Count) | male (Frequency) | male (Count) | Rowwise Total (Frequency) | Rowwise Total (Count) |
|---|---|---|---|---|---|---|
<1000 | 16.81% | 39 | 15.85% | 39 | 16.32% | 78 |
1000-2000 | 9.91% | 23 | 9.76% | 24 | 9.83% | 47 |
2000-3000 | 35.78% | 83 | 36.18% | 89 | 35.98% | 172 |
3000-4000 | 21.98% | 51 | 22.36% | 55 | 22.18% | 106 |
5000+ | 11.64% | 27 | 10.98% | 27 | 11.30% | 54 |
No answer | 3.88% | 9 | 4.88% | 12 | 4.39% | 21 |
Columnwise Total | 48.54% | 232 | 51.46% | 246 | 100.00% | 478 |
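
The two filters can be mimicked in base R on a small data frame. The sketch below uses hypothetical data (toy is not part of the package) and only approximates what subgroups_to_exclude = NA and na.rm = TRUE appear to do, judging from the tables above.

```r
# Hypothetical mini survey; not the berlinbears data.
toy <- data.frame(
  income = c("<1000", "<1000", NA, "5000+", "5000+", "<1000"),
  gender = c("female", "male", "female", NA, "male", "female")
)

# Mimics subgroups_to_exclude = NA: drop rows whose group is missing
kept <- toy[!is.na(toy$gender), ]

# Mimics na.rm = TRUE: also drop rows whose answer is missing
kept <- kept[!is.na(kept$income), ]

# Within-group frequencies on the remaining rows
round(100 * prop.table(table(kept$income, kept$gender), margin = 2), 2)
```

This matches the pattern in the tables above: with na.rm = TRUE the column totals fall from 233 to 232 and from 248 to 246, which corresponds to the 1 and 2 NA answers in those groups.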
Finally, you can specify survey weights using the weights argument
single_table(berlinbears,
question = income,
group_by = gender,
subgroups_to_exclude = NA,
na.rm = TRUE,
weights = weights)
Question: income, grouped by: gender

| | female (Frequency) | female (Count) | male (Frequency) | male (Count) | Rowwise Total (Frequency) | Rowwise Total (Count) |
|---|---|---|---|---|---|---|
<1000 | 15.96% | 59.6 | 17.21% | 75.2 | 16.63% | 134.8 |
1000-2000 | 10.46% | 39.1 | 10.19% | 44.5 | 10.31% | 83.6 |
2000-3000 | 33.79% | 126.3 | 33.88% | 148.0 | 33.84% | 274.3 |
3000-4000 | 25.08% | 93.7 | 25.34% | 110.7 | 25.22% | 204.4 |
5000+ | 9.82% | 36.7 | 8.68% | 37.9 | 9.21% | 74.6 |
No answer | 4.90% | 18.3 | 4.70% | 20.5 | 4.79% | 38.8 |
Columnwise Total | 46.10% | 373.6 | 53.90% | 436.9 | 100.00% | 810.5 |
Frequencies and counts are weighted |
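
Weighted counts are typically the sum of the weights in each category, which is why the counts above are no longer whole numbers. The following base R sketch shows that general idea on hypothetical data; it is an assumption about the usual approach, not a copy of the package's implementation.

```r
# Hypothetical answers and weights; not the berlinbears data.
toy <- data.frame(
  income = c("<1000", "<1000", "5000+", "5000+", "5000+"),
  w      = c(2.0, 1.0, 0.5, 0.5, 1.0)
)

# Weighted count: sum of the weights in each answer category
weighted_n <- tapply(toy$w, toy$income, sum)

# Weighted frequency: each weighted count over the total weight
weighted_freq <- round(100 * weighted_n / sum(weighted_n), 2)

weighted_n
weighted_freq
```

In this toy data "5000+" is the more common raw answer (3 of 5 rows), but after weighting "<1000" carries the larger share.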
The same syntax can be applied to the single_freq function to plot the frequencies of the question, optionally partitioned by subgroups.
single_freq(berlinbears,
question = income,
group_by = gender,
subgroups_to_exclude = NA,
na.rm = TRUE,
weights = weights)
The options and syntax for the multiple-choice tables (multi_table) and graphs (multi_freq) are the same. The only difference is that the question input also accepts tidyselect syntax to select several columns, one for each answer option. For example, the question “will_eat” has five answer options, each stored in a column prefixed by “will_eat”
berlinbears |>
dplyr::select(starts_with('will_eat')) |>
head()
#> will_eat.SQ001 will_eat.SQ002 will_eat.SQ003 will_eat.SQ004 will_eat.SQ005
#> 1 0 1 0 1 1
#> 2 0 1 1 1 1
#> 3 1 1 0 1 1
#> 4 0 0 0 1 0
#> 5 0 0 0 1 1
#> 6 0 0 0 1 0
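
Because each answer option is a 0/1 indicator column, per-option counts are just column sums. The sketch below re-enters the six rows printed above by hand (it does not load the package) to illustrate this; the object names are made up here.

```r
# The six rows printed above, re-entered by hand for illustration.
will_eat <- data.frame(
  will_eat.SQ001 = c(0, 0, 1, 0, 0, 0),
  will_eat.SQ002 = c(1, 1, 1, 0, 0, 0),
  will_eat.SQ003 = c(0, 1, 0, 0, 0, 0),
  will_eat.SQ004 = c(1, 1, 1, 1, 1, 1),
  will_eat.SQ005 = c(1, 1, 1, 0, 1, 0)
)

# Number of respondents who selected each option
option_counts <- colSums(will_eat)

# Share of respondents selecting each option; shares need not sum to 100%
option_share <- round(100 * option_counts / nrow(will_eat), 2)

sort(option_counts, decreasing = TRUE)
```

Since a respondent can tick several options, the shares can add up to more than 100%.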
The same syntax can be used to select the question for the multiple choice tables and graphs
multi_table(berlinbears,
question = dplyr::starts_with('will_eat'),
group_by = genus,
subgroups_to_exclude = NA,
na.rm = TRUE)
Question: dplyr::starts_with("will_eat"), grouped by: genus

| | Ailuropoda (Frequency) | Ailuropoda (Count) | Ursus (Frequency) | Ursus (Count) | Rowwise Total (Frequency) | Rowwise Total (Count) |
|---|---|---|---|---|---|---|
will_eat.SQ004 | 97.53% | 237 | 92.07% | 151 | 40.00% | 388 |
will_eat.SQ002 | 58.02% | 141 | 63.41% | 104 | 25.26% | 245 |
will_eat.SQ005 | 46.09% | 112 | 48.78% | 80 | 19.79% | 192 |
will_eat.SQ001 | 25.10% | 61 | 26.83% | 44 | 10.82% | 105 |
will_eat.SQ003 | 9.05% | 22 | 10.98% | 18 | 4.12% | 40 |
Columnwise Total | 59.07% | 573 | 40.93% | 397 | 100.00% | 970 |
For graphing, the multi_freq function creates an UpSet plot to visualize the frequencies of the intersecting sets for each answer combination and also includes the ability to specify weights.
multi_freq(berlinbears,
question = dplyr::starts_with('will_eat'),
na.rm = TRUE,
weights = weights)
#> Estimes are only preciese to one significant digit, weights may have been rounded
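
An UpSet plot shows how often each exact combination of answers occurs. As a rough sketch of that underlying quantity, the base R code below counts answer combinations in the first six rows of the will_eat columns shown earlier, re-entered by hand. It illustrates the idea only and is not how multi_freq is implemented.

```r
# First six rows of the will_eat columns, re-entered by hand for illustration.
will_eat <- data.frame(
  will_eat.SQ001 = c(0, 0, 1, 0, 0, 0),
  will_eat.SQ002 = c(1, 1, 1, 0, 0, 0),
  will_eat.SQ003 = c(0, 1, 0, 0, 0, 0),
  will_eat.SQ004 = c(1, 1, 1, 1, 1, 1),
  will_eat.SQ005 = c(1, 1, 1, 0, 1, 0)
)

# Label each respondent by the set of options they selected
combo <- apply(will_eat, 1, function(row) {
  paste(names(row)[row == 1], collapse = " & ")
})

# Frequency of each distinct combination, largest first
combo_counts <- sort(table(combo), decreasing = TRUE)
combo_counts
```

Each bar in an UpSet plot corresponds to one entry of a table like combo_counts.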
The graphs can also be grouped
multi_freq(berlinbears,
question = dplyr::starts_with('will_eat'),
group_by = genus,
subgroups_to_exclude = NA,
na.rm = FALSE,
weights = weights)
#> Estimes are only preciese to one significant digit, weights may have been rounded
matrix_table has the same syntax as above and works with array or categorical questions
matrix_table(berlinbears,
dplyr::starts_with('c_'),
group_by = is_parent)
Question: dplyr::starts_with("c_"), grouped by: is_parent

| | high | low | medium | NA |
|---|---|---|---|---|
0 | ||||
c_diet | 6.02% (20) | 71.99% (239) | 16.57% (55) | 5.42% (18) |
c_exercise | 25% (83) | 27.71% (92) | 24.1% (80) | 23.19% (77) |
1 | ||||
c_diet | 3.57% (6) | 75% (126) | 17.26% (29) | 4.17% (7) |
c_exercise | 19.05% (32) | 27.38% (46) | 23.81% (40) | 29.76% (50) |
matrix_freq visualizes the frequencies of responses
matrix_freq(berlinbears,
dplyr::starts_with('p_'),
na.rm = TRUE)
For array/matrix-style questions that are numeric, matrix_mean plots the mean values and confidence intervals
matrix_mean(berlinbears,
question = dplyr::starts_with('p_'),
na.rm = TRUE)
#Can also apply grouping; survey weights can be added with the weights argument
matrix_mean(berlinbears,
question = dplyr::starts_with('p_'),
na.rm = TRUE,
group_by = species,
subgroups_to_exclude = NA)
Finally, for Likert questions (scales of 3, 5, 7, 9…), matrix_likert provides a custom plot
#you can specify custom labels with the `labels` argument
matrix_likert(berlinbears,
question = dplyr::starts_with('p_'),
labels = c('Strongly disagree', 'Disagree','Neutral','Agree','Strongly agree'))
#you can also pass custom colors and specify weights
matrix_likert(berlinbears,
question = dplyr::starts_with('p_'),
labels = c('Strongly disagree', 'Disagree','Neutral','Agree','Strongly agree'),
colors = c("#E1AA28", "#1E5F46", "#7E8F75", "#EFCD83", "#E17832"),
weights = weights)
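
A Likert plot is built from the share of responses at each scale point. The base R sketch below computes those shares for one hypothetical item. It assumes responses are coded 1 to 5 and that the labels map onto the codes in order; that coding is an assumption made for this illustration, not something stated above.

```r
# Hypothetical 5-point responses coded 1-5; not the berlinbears data.
responses <- c(1, 2, 2, 3, 4, 4, 4, 5)

scale_labels <- c("Strongly disagree", "Disagree", "Neutral",
                  "Agree", "Strongly agree")

# Attach the labels to the numeric codes, preserving scale order
likert <- factor(responses, levels = 1:5, labels = scale_labels)

# Percentage of responses at each scale point
level_share <- round(100 * prop.table(table(likert)), 1)
level_share
```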
- Single-choice
single_table
single_freq
- Multiple-choice
multi_table
multi_freq
- Matrix
matrix_table
matrix_freq
matrix_mean
matrix_likert
*_table functions return a gt table of the cross-tabulations and frequencies for each question, while *_freq functions return the same data as a plot.
For matrix-style questions with numerical input, matrix_mean plots the mean value ± two standard deviations. matrix_likert visualizes questions that accept Likert responses (strongly agree to strongly disagree) or questions with 3, 5, 7, 9… categories.
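
Taking the description of matrix_mean literally, the plotted quantities can be sketched in base R as below. The data and column names are hypothetical, and the package may compute its interval differently (for example from standard errors), so treat this as an illustration of "mean ± two standard deviations" only.

```r
# Hypothetical numeric ratings for two matrix items; not the berlinbears data.
ratings <- data.frame(
  p_item1 = c(1, 2, 3, 4, 5),
  p_item2 = c(2, 2, 3, 4, 4)
)

# Per-item mean and standard deviation
item_mean <- sapply(ratings, mean)
item_sd   <- sapply(ratings, sd)

# Interval of mean ± two standard deviations for each item
data.frame(
  mean  = item_mean,
  lower = item_mean - 2 * item_sd,
  upper = item_mean + 2 * item_sd
)
```

Both items have the same mean here, but the wider spread of p_item1 gives it a wider interval.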
Each function contains the following options
- dataset —The input dataframe (or tibble) of survey questions
- question — The column(s) that contain the response options for a question, can be selected by using tidyselect semantics or providing a vector of column names or numbers
- group_by — Optional variable to group the analysis. If provided, the frequencies and counts will be calculated within each subgroup
- subgroups_to_exclude — Optional vector specifying subgroups to exclude from the analysis
- weights — Optional variable containing survey weights. If provided, frequencies and counts will be weighted accordingly
- na.rm — Logical indicating whether to remove NA values from question before analysis