---
title: "Variable Encoding"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Variable Encoding}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  warning = FALSE,
  message = FALSE,
  fig.retina = 3,
  comment = "#>"
)
```

This article demonstrates how to convert between different encoding schemes for categorical variables in choice-based conjoint designs using the `cbc_encode()` function.

# Overview

Choice-based conjoint data can use different encoding schemes for categorical variables:

- **Standard encoding**: Categorical variables are represented as factors or characters
- **Dummy coding**: Binary (0/1) indicators for each non-reference level, with the reference category coded as all zeros
- **Effects coding**: Indicators coded 1, 0, or -1, with the reference category coded -1 on every indicator so that the effects of all levels sum to zero

The `cbc_encode()` function converts between these encodings and lets you customize reference levels.

# Basic Encoding Conversion

## Creating a Design

Let's start by creating a simple design:

```{r}
library(cbcTools)

# Create profiles
profiles <- cbc_profiles(
  price = c(1, 1.5, 2, 2.5, 3),
  type = c("Fuji", "Gala", "Honeycrisp"),
  freshness = c("Poor", "Average", "Excellent")
)

# Create design (uses standard encoding by default)
design <- cbc_design(
  profiles = profiles,
  n_alts = 3,
  n_q = 6,
  n_resp = 100,
  method = "random"
)

head(design)
```

By default, designs are created with **standard encoding**, where categorical variables remain as factors.

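
To see what the three schemes in the overview look like numerically, the base-R sketch below builds dummy and effects codes for the `type` attribute by hand. It is only an illustration, not how `cbc_encode()` is implemented, and the object names are hypothetical. The effects matrix is built manually because base R's `contr.sum()` codes the *last* level as -1, whereas here `Fuji` (the first level) is the reference.

```r
# Levels of the `type` attribute, with "Fuji" first (the default reference)
type_levels <- c("Fuji", "Gala", "Honeycrisp")

# Dummy coding: one 0/1 column per non-reference level; Fuji's row is all 0
dummy <- contr.treatment(type_levels)
colnames(dummy) <- paste0("type", colnames(dummy))

# Effects coding: same columns, but the reference row is all -1
effects <- dummy
effects["Fuji", ] <- -1

# Encode a few example rows by looking up each observed level
obs <- c("Gala", "Fuji", "Honeycrisp", "Fuji")
dummy[obs, ]
effects[obs, ]
```

Rows for `Fuji` come out as `0 0` under dummy coding and `-1 -1` under effects coding, while the `Gala` and `Honeycrisp` rows are identical in both schemes.
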

## Converting to Dummy Coding

Convert to dummy coding for model estimation:

```{r}
design_dummy <- cbc_encode(design, coding = "dummy")
head(design_dummy)
```

Notice that:

- The `type` variable is replaced with `typeGala` and `typeHoneycrisp`
- The `freshness` variable is replaced with `freshnessAverage` and `freshnessExcellent`
- `Fuji` and `Poor` are the reference levels (a row has the reference level when all of that variable's dummy columns are 0)
- Continuous variables like `price` remain unchanged

## Converting to Effects Coding

Effects coding uses -1 for the reference level:

```{r}
design_effects <- cbc_encode(design, coding = "effects")
head(design_effects)
```

In effects coding:

- Each non-reference level's indicator is 1 for rows with that level and 0 for rows with other non-reference levels (the same as dummy coding)
- Rows with the reference level have -1 on all of that variable's indicators
- As a result, the effects of all levels, including the implied effect of the reference level, sum to zero

## Converting Back to Standard

Convert back to categorical variables:

```{r}
design_standard <- cbc_encode(design_dummy, coding = "standard")
head(design_standard)
```

# Customizing Reference Levels

By default, the first level of each categorical variable is used as the reference. You can specify different reference levels using the `ref_levels` argument.

## Setting Custom References

```{r}
# Use "Honeycrisp" as reference for type, "Excellent" for freshness
design_custom <- cbc_encode(
  design,
  coding = "dummy",
  ref_levels = list(
    type = "Honeycrisp",
    freshness = "Excellent"
  )
)

head(design_custom)
```

Now `Honeycrisp` and `Excellent` are the reference categories.

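
The choice of reference level and coding also changes how estimated coefficients read. The sketch below uses made-up dummy-coded estimates for `type` (with `Fuji` as the reference) to show two conversions. It assumes a design without a no-choice option, so every alternative has exactly one level of `type`; adding the same constant to every level's effect then shifts every alternative's utility equally and leaves logit choice probabilities unchanged. Under that assumption, switching the reference to `Honeycrisp` means subtracting its effect from every level, and the effects-coded version is obtained by centering the level effects so they sum to zero. The numbers and object names are purely illustrative.

```r
# Made-up dummy-coded estimates for `type`; the reference (Fuji) is fixed at 0
beta_dummy <- c(Fuji = 0, Gala = 0.3, Honeycrisp = 0.6)

# Re-express relative to "Honeycrisp": subtract its effect from every level
beta_ref_honeycrisp <- beta_dummy - beta_dummy[["Honeycrisp"]]

# Effects-coded equivalent: center the level effects so they sum to zero
beta_effects <- beta_dummy - mean(beta_dummy)

beta_ref_honeycrisp
beta_effects

# Under effects coding only the non-reference coefficients are estimated;
# the reference level's effect is minus their sum
-sum(beta_effects[c("Gala", "Honeycrisp")])
```

With a no-choice option, the no-choice alternative has no `type` level, so a constant shift is absorbed by the no-choice parameter instead of cancelling, and these simple conversions no longer apply as written.
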

## Updating References Without Changing Encoding

You can update reference levels while keeping the current encoding:

```{r}
# Start with dummy coding
design_dummy <- cbc_encode(design, coding = "dummy")

# Update reference levels only (keeps dummy coding)
design_updated <- cbc_encode(
  design_dummy,
  ref_levels = list(type = "Gala")
)

head(design_updated)
```

# Working with No-Choice Options

When using designs with no-choice options, the data should be converted to dummy or effects coding before power analysis or model estimation. First, create a design with a no-choice option and simulate choices:

```{r}
# Create profiles
profiles_nc <- cbc_profiles(
  price = c(1, 2, 3),
  quality = c("Low", "High")
)

# Create priors including no-choice
priors_nc <- cbc_priors(
  profiles = profiles_nc,
  price = -0.1,
  quality = c("High" = 0.5),
  no_choice = -1.5
)

# Create design with no-choice
design_nc <- cbc_design(
  profiles = profiles_nc,
  priors = priors_nc,
  n_alts = 2,
  n_q = 4,
  n_resp = 50,
  no_choice = TRUE,
  method = "random"
)

# Simulate choices
choices_nc <- cbc_choices(design_nc, priors_nc)

head(choices_nc)
```

Then convert the simulated choices before running a power analysis:

```{r eval=FALSE}
# Convert to dummy coding for power analysis
choices_dummy <- cbc_encode(choices_nc, coding = "dummy")

# Run power analysis
power_result <- cbc_power(
  data = choices_dummy,
  n_breaks = 5
)

power_result
```

# Use Cases

## For Model Estimation

Encoding is not required by the `logitr` package, but converting the data to dummy or effects coding can make models easier to interpret and gives you more control over which levels are included in the model:

```{r eval=FALSE}
library(logitr)

# `choices` is assumed to hold simulated choices for the apple design above,
# e.g. created with cbc_choices()
choices_dummy <- cbc_encode(choices, coding = "dummy")

# Estimate model
model <- logitr(
  data = choices_dummy,
  outcome = "choice",
  obsID = "obsID",
  pars = c(
    "price", "typeGala", "typeHoneycrisp",
    "freshnessAverage", "freshnessExcellent"
  )
)
```

## For Data Inspection

It is generally easier to inspect your data when using standard encoding:

```{r}
# Simulate random choices for the design (standard encoding)
choices <- cbc_choices(design)

# Keep only the chosen alternatives
chosen <- choices[choices$choice == 1, ]

# Examine choice frequencies by category
table(chosen$type)
table(chosen$freshness)

# Check attribute balance with cbc_inspect()
cbc_inspect(design, sections = "balance")
```

## For Power Analysis

You can use either encoding, but results differ:

```{r eval=FALSE}
# Dummy coding: one estimate for each non-reference level
power_dummy <- cbc_power(
  cbc_encode(choices, coding = "dummy"),
  n_breaks = 5
)

# Standard coding: categorical variables are passed by name in `pars`
power_standard <- cbc_power(
  cbc_encode(choices, coding = "standard"),
  pars = c("price", "type", "freshness"),
  n_breaks = 5
)
```

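
Since inspection is easiest in standard encoding, it can help to see what converting back amounts to. Below is a base-R sketch of a hypothetical helper, `undummy()`, that rebuilds a categorical column from indicators named `<attribute><Level>` (such as `typeGala`). It is not part of cbcTools and assumes that no unrelated column name starts with the attribute name; in practice `cbc_encode(coding = "standard")` does this for you.

```r
# Hypothetical helper (not part of cbcTools): rebuild a factor from indicator
# columns named <attr><Level>; rows with no indicator equal to 1 get `ref`.
# Assumes no unrelated column name starts with `attr`.
undummy <- function(data, attr, ref) {
  cols <- grep(paste0("^", attr), names(data), value = TRUE)
  lvls <- sub(paste0("^", attr), "", cols)
  out <- rep(ref, nrow(data))
  for (i in seq_along(cols)) {
    out[data[[cols[i]]] == 1] <- lvls[i]
  }
  factor(out, levels = c(ref, lvls))
}

# Small made-up dummy-coded data set with Fuji as the reference level
toy <- data.frame(
  price = c(1, 2, 3, 1.5),
  typeGala = c(1, 0, 0, 0),
  typeHoneycrisp = c(0, 0, 1, 0)
)

toy$type <- undummy(toy, "type", ref = "Fuji")
table(toy$type)
```

Because reference rows are never coded 1, the same helper also works on effects-coded columns, where those rows are -1 instead of 0.
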