---
title: "Simple Conjoint Analyses and Visualization"
author: "Thomas J. Leeper"
output:
  md_document:
    variant: markdown_github
---
**cregg** is a package for analyzing and visualizing the results of conjoint ("cj") factorial experiments using methods described by Hainmueller, Hopkins, and Yamamoto (2014). It provides functionality that is useful for analyzing and otherwise examining conjoint experimental data through a main function - `cj()` - that simply wraps around a number of analytic tools:
- Estimation of average marginal component effects (AMCEs) for fully randomized conjoint designs (as well as designs involving an unlimited number of two-way constraints between features) and munging of AMCE estimates into tidy data frames, via `amce()`
- Calculation of marginal means (MMs) for conjoint designs and munging them into tidy data frames via `mm()`
- Tabulation of display frequencies of feature levels via `cj_table()` and `cj_freqs()` and cross-tabulation of feature restrictions using `cj_props()`
- Diagnostics to assess preference heterogeneity, including an omnibus statistical test (`cj_anova()`) and tidying of differences in MMs (`mm_diffs()`) and AMCEs (`amce_diffs()`) across subgroups
In addition, the package provides a number of tools that are likely useful to conjoint analysts:
- **ggplot2**-based visualizations of AMCEs and MMs, via `plot()` methods for all of the above
- Tidying of raw "wide"-format conjoint survey datasets into "long" or "tidy" datasets using `cj_tidy()`
- Diagnostics to choose feature reference categories, via `amce_by_reference()`
To demonstrate package functionality, the package includes three example datasets:
- `taxes`, a fully randomized choice task conjoint experiment conducted by Ballard-Rosa et al. (2016)
- `immigration`, a partial factorial conjoint experiment with several restrictions between features conducted by Hainmueller, Hopkins, and Yamamoto (2014)
- `conjoint_wide`, a simulated "wide"-format conjoint dataset that is used to demonstrate functionality of `cj_tidy()`
The design of cregg follows a few key principles:
- Following tidy data principles throughout, so that all of the main functions produce consistently structured, metadata-rich data frames. Thus the response from any function is a tidy data frame that can easily be stacked with others (e.g., for computing AMCEs for subsets of respondents) and then used to produce ggplot2 visualizations.
- A formula-based interface that meshes well with the underlying [**survey**](https://cran.r-project.org/package=survey)-based effect estimation API.
- A consistent interface for both unconstrained and two-way constrained designs that relies only on formula notation without any package-specific "design" specification. Conjoint designs involving two-way constraints between features are easily supported using simple formula notation: `Y ~ A + B + C` implies an unconstrained design, while `Y ~ A * B + C` implies a constraint between levels of features A and B. cregg figures out the constraints automatically without needing to further specify them explicitly.
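
As a small base-R illustration of the formula notation above, the two formulas differ only in the interaction term that `A * B` expands to. This sketch only inspects the formulas themselves; how cregg detects and handles the constraint internally is not shown, and `Y`, `A`, `B`, and `C` are the placeholder names from the text rather than real variables.

```r
# The two design specifications from the text
unconstrained <- Y ~ A + B + C
constrained <- Y ~ A * B + C

# Expand each formula into its individual terms
terms_unconstrained <- attr(terms(unconstrained), "term.labels")
terms_constrained <- attr(terms(constrained), "term.labels")

# The only extra term is the A-by-B interaction, which is what
# signals a constraint between features A and B
extra_term <- setdiff(terms_constrained, terms_unconstrained)
print(extra_term)
```

With the bundled `immigration` data, a constrained specification would presumably look like `ChosenImmigrant ~ Gender + Education * Job`, assuming those two features are among the restricted pairs in that design.
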
cregg also provides some sugar:
- Using "label" attributes on variables to provide pretty printing, with options to relabel features or plots on the fly. The `cj_df()` function (and data frame class "cj_df") is designed to preserve these attributes during subsetting.
- Using factor base levels rather than trying to set baseline levels atomically
- A convenient API (via the `cj(..., by = ~ group)` idiom) for repeated, subgroup operations without the need for `lapply()` or `for` loops
- All functions have arguments in data-formula order, making it simple to pipe into them via the magrittr pipe (`%>%`).
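
A minimal sketch of the data-first argument order in practice, assuming both cregg and magrittr are installed. It uses the bundled `immigration` data and a deliberately short formula; the piped call should be equivalent to the direct one.

```r
library("cregg")
library("magrittr")
data("immigration")

# Direct call: data first, then formula
direct <- mm(immigration, ChosenImmigrant ~ Gender, id = ~ CaseID)

# Piped call: the data frame is passed as the first argument
piped <- immigration %>% mm(ChosenImmigrant ~ Gender, id = ~ CaseID)

# Both approaches should return the same estimates
all.equal(direct$estimate, piped$estimate)
```
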
A detailed website showcasing package functionality is available at: https://thomasleeper.com/cregg/. Contributions and feedback are welcome on [GitHub](https://github.com/leeper/cregg/issues).
The package, whose primary point of contact is `cj()`, takes its name from the surname of a famous White House Press Secretary.
## Basic Code Examples
```{r opts, echo=FALSE}
options(width = 120)
knitr::opts_knit$set(upload.fun = knitr::imgur_upload, base.url = NULL)
knitr::opts_chunk$set(comment = "", warning = FALSE, message = FALSE, echo = TRUE, tidy = TRUE, size="small", fig.width = 10, fig.height = 10)
```
The package includes several example conjoint datasets, which are used here and in examples:
```{r load}
library("cregg")
data("immigration")
data("taxes")
```
The package provides straightforward calculation and visualization of descriptive marginal means (MMs). These represent the mean outcome across all appearances of a particular conjoint feature level, averaging across all other features. In forced choice conjoint designs, MMs by definition average 0.5, with values above 0.5 indicating features that increase profile favorability and values below 0.5 indicating features that decrease profile favorability. For continuous outcomes, MMs can take any value in the full range of the outcome. Calculating MMs entails no modelling assumptions; they are simply descriptive quantities of interest:
```{r mmplot}
# descriptive plotting
f1 <- ChosenImmigrant ~ Gender + Education + LanguageSkills +
  CountryOfOrigin + Job + JobExperience + JobPlans +
  ReasonForApplication + PriorEntry
plot(mm(immigration, f1, id = ~ CaseID), vline = 0.5)
```
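To make the definition of a marginal mean concrete, here is a hand-computed sketch in base R on a tiny made-up dataset. The data frame `toy` and its values are invented for illustration, and this is not how `mm()` is implemented (it also produces standard errors, which are omitted here).

```r
# Hypothetical forced-choice data: one row per displayed profile
toy <- data.frame(
  chosen = c(1, 0, 1, 0, 0, 1, 1, 0),
  language = factor(rep(c("Fluent", "Broken"), times = 4),
                    levels = c("Fluent", "Broken"))
)

# Marginal mean: average outcome over all appearances of each level
mm_by_hand <- tapply(toy$chosen, toy$language, mean)
print(mm_by_hand)

# In a balanced forced-choice design the MMs average to 0.5
mean(mm_by_hand)
```

Here "Fluent" profiles are chosen 75% of the time and "Broken" profiles 25% of the time, so the two MMs sit symmetrically around the 0.5 reference line.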
cregg functions use `attr(data$feature, "label")` to provide pretty printing of feature labels, so that variable names can be arbitrary. These labels can be overridden using the `feature_labels` argument. Feature levels are always deduced from the `levels()` of right-hand-side (RHS) variables in the model specification. All variables should be factors with levels in the desired display order. Similarly, the plotted order of features is given by the order of terms in the RHS formula unless overridden by the order of variable names given in `feature_order`.
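The following base-R sketch shows the two pieces of metadata described above: a "label" attribute and the order of factor levels. The variable `education` and its label are invented for illustration; only `attr()`, `levels()`, and `relevel()` from base R are used.

```r
# A toy feature stored as a factor; levels are listed in display order
education <- factor(
  c("college", "none", "high school", "college"),
  levels = c("none", "high school", "college")
)

# Attach a human-readable label under the "label" attribute
attr(education, "label") <- "Educational Attainment"

attr(education, "label")
levels(education)  # the first level is the base (reference) level

# relevel() moves a chosen level to the front of the level order
education_ref <- relevel(education, ref = "college")
levels(education_ref)
```

Setting the level order on the factor itself, rather than passing a baseline to each function call, keeps the reference category consistent across every analysis that uses the variable.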
A more common analytic approach for conjoints is to estimate average marginal component effects (AMCEs) using some form of regression analysis. cregg uses `glm()` and `svyglm()` to perform estimation and [margins](https://cran.r-project.org/package=margins) to generate average marginal effect estimates. Designs can be specified with any interactions between conjoint features but only AMCEs are returned. (No functionality is provided at the moment for explicit estimation of feature interaction effects.) Just like for `mm()`, the output of `cj()` (or its alias, `amce()`) is a tidy data frame:
```{r amce}
# estimation
amces <- cj(taxes, chose_plan ~ taxrate1 + taxrate2 + taxrate3 +
            taxrate4 + taxrate5 + taxrate6 + taxrev, id = ~ ID)
head(amces[c("feature", "level", "estimate", "std.error")], 20L)
```
This makes it very easy to modify, combine, print, etc. the resulting output. It also makes it easy to visualize using ggplot2. A convenience visualization function is provided:
```{r plot_amce}
# plotting of AMCEs
plot(amces)
```
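For intuition about what an AMCE measures, the sketch below uses a made-up, single-feature dataset in base R. With one randomized feature, the regression coefficient for a level equals the difference between that level's marginal mean and the reference level's. This is an illustration of the quantity only; cregg's own estimation (clustered standard errors, multiple features, constraints) is not reproduced here, and `toy` is hypothetical data.

```r
# Hypothetical forced-choice data with a single two-level feature
toy <- data.frame(
  chosen = c(1, 0, 1, 0, 0, 1, 1, 0),
  language = factor(rep(c("Fluent", "Broken"), times = 4),
                    levels = c("Fluent", "Broken"))
)

# Linear probability model; "Fluent" is the reference level
fit <- lm(chosen ~ language, data = toy)
amce_broken <- unname(coef(fit)["languageBroken"])

# The same number as a difference in marginal means
mms <- tapply(toy$chosen, toy$language, mean)
mm_difference <- unname(mms["Broken"] - mms["Fluent"])

all.equal(amce_broken, mm_difference)
```
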
To provide simple subgroup analyses, the `cj()` function provides a `by` argument to iterate over subsets of `data` and calculate AMCEs or MMs on each subgroup. For example, we may want to ensure that there are no substantial variations in preferences within-respondents across multiple conjoint decision tasks:
```{r mm_by}
immigration$contest_no <- factor(immigration$contest_no)
mm_by <- cj(immigration, ChosenImmigrant ~ Gender + Education + LanguageSkills,
            id = ~ CaseID, estimate = "mm", by = ~ contest_no)
plot(mm_by, group = "contest_no", vline = 0.5)
```
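Conceptually, a `by` analysis of marginal means is a split-apply-combine operation. The base-R sketch below mimics that idea on invented data with `aggregate()`; it is not cregg's implementation, and the data frame `toy` and its `task` grouping variable are hypothetical.

```r
# Hypothetical data: two decision tasks, one two-level feature
toy <- data.frame(
  chosen = c(1, 0, 1, 0, 0, 1, 1, 0),
  language = factor(rep(c("Fluent", "Broken"), times = 4),
                    levels = c("Fluent", "Broken")),
  task = factor(rep(c(1, 2), each = 4))
)

# Mean outcome for every combination of feature level and subgroup,
# returned as one stacked data frame
mm_by_task <- aggregate(chosen ~ language + task, data = toy, FUN = mean)
print(mm_by_task)
```

The result has one row per level-by-subgroup combination, which is the same long, stackable shape that makes grouped plotting straightforward.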
A more formal test of these differences is provided by a nested model comparison test:
```{r cj_anova}
cj_anova(immigration, ChosenImmigrant ~ Gender + Education + LanguageSkills, by = ~ contest_no)
```
which provides a test of whether any of the interactions between the `by` variable and feature levels differ from zero.
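The general idea of a nested model comparison can be sketched in base R with simulated data. This is a generic illustration using `lm()` and `anova()`, not the code behind `cj_anova()`; the variables `feature`, `group`, and `y` are simulated, so no substantive result should be read from the output.

```r
set.seed(42)  # simulated data, for illustration only
n <- 400
sim <- data.frame(
  feature = factor(sample(c("a", "b"), n, replace = TRUE)),
  group = factor(sample(c("g1", "g2"), n, replace = TRUE))
)
sim$y <- rbinom(n, size = 1, prob = 0.5)

# Reduced model: no feature-by-group interaction
reduced <- lm(y ~ feature + group, data = sim)
# Full model: adds the interaction between feature and group
full <- lm(y ~ feature * group, data = sim)

# F-test of whether the added interaction terms improve fit
comparison <- anova(reduced, full)
p_value <- comparison[2, "Pr(>F)"]
print(comparison)
```

A small p-value would suggest that the effect of the feature differs across groups; a large one gives no evidence of such heterogeneity.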
Again, a detailed website showcasing package functionality is available at: https://thomasleeper.com/cregg/ and the content thereof is installed as a vignette. The package documentation provides further examples.
## Installation
[![CRAN](https://www.r-pkg.org/badges/version/cregg)](https://cran.r-project.org/package=cregg)
![Downloads](https://cranlogs.r-pkg.org/badges/cregg)
[![Travis Build Status](https://travis-ci.org/leeper/cregg.png?branch=master)](https://travis-ci.org/leeper/cregg)
[![codecov.io](https://codecov.io/github/leeper/cregg/coverage.svg?branch=master)](https://codecov.io/github/leeper/cregg?branch=master)
This package can be installed directly from CRAN. To install the latest development version you can pull from GitHub:
```R
# stable release from CRAN
install.packages("cregg")

# latest development version from GitHub
if (!require("remotes")) {
    install.packages("remotes")
}
remotes::install_github("leeper/cregg")
```