This vignette provides an introduction to cregg, a package for analyzing and visualizing the results of conjoint experiments, which are factorial discrete choice experiments that are increasingly popular in the political and social sciences for studying decision making and preferences over multidimensional issues. cregg provides functionality that is useful for analyzing and otherwise examining data from these designs, namely:
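- Estimation of average marginal component effects (AMCEs) via amce()
- Calculation of marginal means (MMs) via mm()
- Tabulation of display frequencies of feature levels via cj_table() and cj_freqs(), and cross-tabulation of feature restrictions using cj_props()
- Tidying of "wide"-format conjoint datasets via cj_tidy()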
In addition, the package provides a number of tools that are likely useful to conjoint analysts:
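- Visualization via plot() methods for all of the above
- Tidying of "wide"-format conjoint datasets via cj_tidy()
- Diagnostics for choosing feature reference categories via amce_by_reference()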
To demonstrate package functionality, the package includes three example datasets:
- taxes, a fully randomized choice task conjoint experiment conducted by Ballard-Rosa et al. (2016)
- immigration, a partial factorial conjoint experiment with several restrictions between features conducted by Hainmueller, Hopkins, and Yamamoto (2014)
- conjoint_wide, a simulated "wide"-format conjoint dataset that is used to demonstrate functionality of cj_tidy()
The design of cregg follows a few key principles, including a formula-based interface for specifying designs:
- Y ~ A + B + C implies an unconstrained design, while Y ~ A * B + C implies a constraint between levels of features A and B. cregg figures out the constrained level pairs automatically without needing to further specify them explicitly.

cregg also provides some sugar:
- An idiom, cj(..., by = ~ group), for repeated, subgroup operations without the need for lapply() or for loops
- Support for the pipe operator (%>%)

The package, whose primary point of contact is cj(), takes its name from the surname of a famous White House Press Secretary.
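The formula convention described above (+ for unconstrained features, * for a constrained pair) builds on R's standard formula expansion. The following is a minimal base R sketch of that expansion only; how cregg itself detects constrained level pairs is not shown in this vignette, and the names Y, A, B, and C are placeholders.

```r
# Base R only: inspect how the two formula styles expand.
# Y, A, B, and C are placeholder names; no data are needed for terms().
unconstrained <- terms(Y ~ A + B + C)
constrained   <- terms(Y ~ A * B + C)

labels_unconstrained <- attr(unconstrained, "term.labels")
labels_constrained   <- attr(constrained, "term.labels")

print(labels_unconstrained)  # main effects only
print(labels_constrained)    # main effects plus the A:B interaction term

# Terms of order 2 identify which feature pairs are linked by *
linked <- labels_constrained[attr(constrained, "order") == 2]
print(linked)
```

Because the information about linked features is already present in the formula, no separate design object is needed to describe a two-way constraint.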
Contributions and feedback are welcome on GitHub.
The package includes several example conjoint datasets, which are used here and in examples:
library("cregg")
data("immigration")
The package provides straightforward calculation and visualization of descriptive marginal means (MMs). These represent the mean outcome across all appearances of a particular conjoint feature level, averaging across all other features. In forced choice conjoint designs with two profiles per choice task, MMs by definition average 0.5, with values above 0.5 indicating features that increase profile favorability and values below 0.5 indicating features that decrease profile favorability. (They will average 0.33 in designs with three profiles, 0.25 with four profiles, etc.) For continuous/ordinal outcomes, MMs can take any value in the full range of the outcome. Calculating MMs entails no modelling assumptions; they are simply descriptive quantities of interest:
# descriptive plotting
f1 <- ChosenImmigrant ~ Gender + LanguageSkills + PriorEntry + Education * Job + CountryOfOrigin * ReasonForApplication +
JobExperience + JobPlans
plot(mm(immigration, f1, id = ~CaseID), vline = 0.5)
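To illustrate why MMs center on 0.5 in a two-profile forced choice design, here is a small base R sketch on simulated data. It is not cregg's implementation: the data frame, the single feature with levels low, mid, and high, and the utility model are all hypothetical.

```r
# Base R sketch on simulated data; this is not cregg's implementation.
set.seed(1)
n_tasks <- 2000

# Two profiles per task, one randomized feature with three hypothetical levels
d <- data.frame(
  task    = rep(seq_len(n_tasks), each = 2),
  feature = factor(sample(c("low", "mid", "high"), 2 * n_tasks, replace = TRUE),
                   levels = c("low", "mid", "high"))
)

# Latent utility rises with the level; exactly one profile per task is chosen
d$utility <- as.integer(d$feature) + rnorm(nrow(d))
d$chosen  <- as.integer(d$utility == ave(d$utility, d$task, FUN = max))

# Marginal mean: mean outcome across all appearances of each level
mms <- tapply(d$chosen, d$feature, mean)
print(round(mms, 3))

# One of every two profiles is chosen, so the overall choice rate is 0.5
overall <- mean(d$chosen)
print(overall)
```

Levels that raise the chance of being chosen end up above 0.5 and levels that lower it end up below, which is the reading suggested for the plot above.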
cregg functions use attr(data$feature, "label") to provide pretty printing of feature labels, so that variable names can be arbitrary. These labels can be overridden within cregg functions using the feature_labels argument. (To overwrite or create feature labels permanently, use, for example, attr(immigration$LanguageSkills, "label") <- "English Proficiency".) Feature levels are always deduced from the levels() of right-hand-side variables in the model specification. All variables should be factors with levels in the desired display order. Similarly, the plotted order of features is given by the order of terms in the right-hand side of the formula unless overridden by the order of variable names given in feature_order.
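The two conventions just described (a "label" attribute for the feature name, and factor levels for the display order of levels) can be set up with base R alone. The sketch below uses a hypothetical toy data frame with a column named lang; it does not call any cregg function.

```r
# Hypothetical toy data; only base R is used here.
d <- data.frame(
  lang = factor(c("fluent", "broken", "none", "fluent"),
                levels = c("fluent", "broken", "none"))  # levels in display order
)

# Attach a human-readable label to the feature
attr(d$lang, "label") <- "English Proficiency"

feature_label <- attr(d$lang, "label")
level_order   <- levels(d$lang)

print(feature_label)
print(level_order)
```

Setting the levels explicitly in factor() avoids the default alphabetical ordering, which is rarely the order wanted in a plot.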
A more common analytic approach for conjoints is to estimate average marginal component effects (AMCEs) using some form of regression analysis. cregg uses glm() and svyglm() to perform estimation and margins to generate average marginal effect estimates. Designs can be specified with any interactions between conjoint features, but only AMCEs are returned. Any terms that are linked by a * in the formula are treated as design constraints, and AMCEs are estimated cognizant of these constraints; only two-way interactions are supported, however. Just as for mm(), the output of cj() (or its alias, amce()) is a tidy data frame:
# estimation
amces <- cj(immigration, f1, id = ~CaseID)
head(amces[c("feature", "level", "estimate", "std.error")], 20L)
                  feature                    level     estimate   std.error
 1                 Gender                   Female  0.000000000          NA
 2                 Gender                     Male -0.026023159 0.008012413
 3        Language Skills           Fluent English  0.000000000          NA
 4        Language Skills           Broken English -0.056319723 0.011312971
 5        Language Skills Tried English but Unable -0.126359527 0.011370259
 6        Language Skills         Used Interpreter -0.159740917 0.011589128
 7            Prior Entry                    Never  0.000000000          NA
 8            Prior Entry          Once as Tourist  0.055954954 0.012463168
 9            Prior Entry    Many Times as Tourist  0.054748425 0.012912603
10            Prior Entry   Six Months with Family  0.075317887 0.012603718
11            Prior Entry   Once w/o Authorization -0.110084275 0.013026767
12 Educational Attainment                No Formal  0.000000000          NA
13 Educational Attainment                4th Grade  0.033068508 0.014957991
14 Educational Attainment                8th Grade  0.057744013 0.014961024
15 Educational Attainment              High School  0.119483476 0.015101284
16 Educational Attainment         Two-Year College  0.163405352 0.023021124
17 Educational Attainment           College Degree  0.190036705 0.023063509
18 Educational Attainment          Graduate Degree  0.176068029 0.016690780
19                    Job                  Janitor  0.000000000          NA
20                    Job                   Waiter -0.006814709 0.016856109
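In the output above, each reference level has an estimate of zero and every other row is an effect relative to it. For intuition, the following base R sketch shows the simplest case on simulated data: with a single randomized feature, the coefficients of a linear probability model equal the differences in mean outcome relative to the reference level. This is an illustration under assumed data and choice probabilities, not cregg's estimation code, and it ignores clustering by respondent.

```r
# Base R sketch on simulated data; cregg itself estimates via glm()/svyglm().
set.seed(42)
n <- 3000
d <- data.frame(
  skill = factor(sample(c("fluent", "broken", "none"), n, replace = TRUE),
                 levels = c("fluent", "broken", "none"))
)

# Hypothetical choice probabilities for each level
p <- c(fluent = 0.60, broken = 0.50, none = 0.40)
d$chosen <- rbinom(n, 1, p[as.character(d$skill)])

# Linear probability model; "fluent" (the first level) is the reference
fit <- lm(chosen ~ skill, data = d)
coef_est <- unname(coef(fit)[c("skillbroken", "skillnone")])

# The same quantities computed as differences in level means
means <- tapply(d$chosen, d$skill, mean)
diff_est <- unname(c(means["broken"] - means["fluent"],
                     means["none"] - means["fluent"]))

print(cbind(coef_est, diff_est))
```

With several features in the model the coefficients are no longer exact differences in raw means, but the interpretation relative to a reference level is the same.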
This makes it very easy to modify, combine, print, etc. the resulting output. It also makes it easy to visualize using ggplot2. A convenience visualization function is provided:
# plotting of AMCEs
plot(amces)
Reference categories for AMCEs are often arbitrary and can affect intuitions about results, so the package also provides a diagnostic tool for helping to decide on an appropriate reference category:
amce_diagnostic <- amce_by_reference(immigration, ChosenImmigrant ~ LanguageSkills, ~LanguageSkills, id = ~CaseID)
plot(amce_diagnostic, group = "REFERENCE", legend_title = "Reference Category")
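To see why the choice of reference category matters for interpretation but not for the underlying comparisons, the base R sketch below refits a simple model with a different reference level. The data and level names are simulated and hypothetical, and lm() stands in for cregg's estimation. Changing the reference shifts every estimate by the same constant.

```r
# Base R sketch on simulated data; not a call to amce_by_reference().
set.seed(7)
n  <- 3000
lv <- c("fluent", "broken", "none")
d  <- data.frame(skill = factor(sample(lv, n, replace = TRUE), levels = lv))
d$chosen <- rbinom(n, 1, c(0.6, 0.5, 0.4)[as.integer(d$skill)])

# Estimates with "fluent" as the reference category
fit_a <- lm(chosen ~ skill, data = d)

# Same model after switching the reference to "none"
d$skill_b <- relevel(d$skill, ref = "none")
fit_b <- lm(chosen ~ skill_b, data = d)

# Re-express the first fit relative to "none" by subtracting a constant
shift  <- coef(fit_a)[["skillnone"]]
from_a <- c(fluent = 0, broken = coef(fit_a)[["skillbroken"]]) - shift
from_b <- c(fluent = coef(fit_b)[["skill_bfluent"]],
            broken = coef(fit_b)[["skill_bbroken"]])

print(rbind(from_a, from_b))
```

The two rows agree, so the differences between levels are unchanged; only the level that is pinned to zero moves, which is what can alter a reader's intuition about the results.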