--- title: "Other Examples of Nested Logit Models" author: "Michael Friendly" date: "`r Sys.Date()`" package: nestedLogit output: rmarkdown::html_vignette: toc: true toc_depth: 1 number_sections: true fig_caption: yes bibliography: ["references.bib", "packages.bib"] csl: apa.csl vignette: > %\VignetteIndexEntry{Other Examples of Nested Logit Models} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include=FALSE} knitr::opts_chunk$set( collapse = TRUE, message = FALSE, warning = FALSE, fig.align = "center", fig.height = 5, fig.width = 7, fig.path = "fig/", dev = "png", comment = "#>" ) # packages to be cited in packages.bib .to.cite <- c("nestedLogit", "nnet", "car", "ggplot2", "AER", "dplyr", "mlogit", "vcd", "MASS") ``` # Introduction This document explores datasets beyond those already packaged with `nestedLogit` (`Womenlf`, `GSS`, `gators`, `HealthInsurance`) that could profit from treatment as nested logit models, either because: - the response categories have a natural **hierarchical tree structure** and the independence-of-irrelevant-alternatives (IIA) assumption of plain multinomial logit is implausible across nests, or - the response is **ordinal** and **continuation logits** (a special case of nested dichotomies) are a natural model, and the **proportional odds** model (which assumes equal slopes for the predictors) might not be likely to hold. From a scan of suitable datasets in available packages, the best candidates identified for this vignette are: | Dataset | Package | Response levels | Structure | N | |---------|---------|-----------------|-----------|---| | `Chile` | carData | Abstain/No/Undec/Yes | two competing trees | 2700 | | `TravelMode` | AER | air/train/bus/car | IIA violation | 210 | | `Fishing` | mlogit | beach/pier/boat/charter | shore vs. vessel | 1182 | | `Arthritis` | vcd | None/Some/Marked | ordinal continuation | 84 | | `survey$Smoke` | MASS | Never/Occas/Regul/Heavy | ordinal continuation | 237 | | `pneumo` | VGAM | normal/mild/severe | ordinal continuation | 8 | Polytomous response data comes in different forms (e.g., wide vs. long) so this vignette also provides an opportunity to illustrate transformations among these to make the data suitable for analysis and for plotting. ## Load packages Load the packages we'll use here: ```{r load-packages} library(nestedLogit) library(nnet) # for: multinom() library(car) # for: Anova() library(ggplot2) ``` --- # Chilean plebiscite voting intent (`carData::Chile`) {#chile} ## Data The September, 1988 Chilean national plebiscite asked citizens whether the military government of Augusto Pinochet should remain in power for another eight years. Six months before that, the independent research center FLASCO/Chile conducted a national survey of 27000 randomly selected voters. The main question concerned their intention to vote (`vote`) in the plebiscite, which was recorded as: - **Yes** (support Pinochet) - **No** (oppose Pinochet) - **Undec** (undecided) - **Abstain** (abstain / not vote) [The levels of `Chile$vote` are actually just the first letters: "Y", "N", "U", "A". For ease of interpretation, these are recoded in a step below.] `Chile` is in `carData`, which is already a Suggests: dependency of `nestedLogit`, so no extra package is needed. One main predictor is a standardized scale of support for the `statusquo`, where high values represent general support for the policies of the military regime. Other predictors are: `sex`, `age`, `education` (a factor) and `income` of the respondents. ```{r chile-data} data("Chile", package = "carData") str(Chile) ``` The dataset contains missing values on a number of predictors, but there were 168 cases who did not answer the voting intention question. These are dropped here, so that all models use the same cases. ```{r chile-drop} colSums(is.na(Chile)) # Drop the 168 rows with missing vote so all models use the same cases chile <- Chile[!is.na(Chile$vote), ] # Recode vote with longer labels, ordered by median statusquo (No < Abstain < Undec < Yes) chile$vote <- factor(chile$vote, levels = c("N", "A", "U", "Y"), labels = c("No", "Abstain", "Undec", "Yes")) nrow(chile) table(chile$vote) ``` The strong predictor `statusquo` is a scale measuring attitude toward the political status quo (higher = more supportive of Pinochet's regime). ```{r chile-eda} # statusquo distribution by vote boxplot(statusquo ~ vote, data = chile, xlab = "Vote", ylab = "Attitude toward status quo", main = "Chile 1988: status quo attitude by voting intention") ``` With this reordering of the response levels, a clear pattern emerges in the boxplot. But, also note how the shapes of the distributions change, and the presence of outliers. But here, the goal is to explain voting intention using the other predictors. ## Two competing tree structures The four `vote` categories may seem to be unordered in the general case. However, we can think more productively about the underlying process for a respondent that leads to their choice. A key theoretical question is the **ordering of the decision process**, which gives rise to two different sets of nested comparisons. ### Engagement-first tree The voter first decides whether to participate meaningfully (cast a Yes or No vote) vs. disengage (Abstain or remain Undecided). Engagement is then governed by demographic/civic variables; direction by political attitude. ``` vote / \ engaged disengaged / \ / \ Yes No Abstain Undec ``` We specify this as three nested dichotomies: ```{r chile-dichots-eng} dichots_eng <- logits( engage = dichotomy(engaged = c("Yes", "No"), disengaged = c("Abstain", "Undec")), direction = dichotomy("Yes", "No"), disengage = dichotomy("Abstain", "Undec") ) # print method dichots_eng # print as an ASCII tree as.tree(dichots_eng, response = "vote") ``` ### Direction-first tree The voter first forms a political opinion (pro or anti status quo) and only then decides whether to act on it. `statusquo` should dominate the direction split; demographics should govern the two engagement splits. ``` vote / \ pro anti / \ / \ Yes Undec No Abstain ``` These dichotomies are specified as follows: ```{r chile-dichots-dir} dichots_dir <- logits( direction = dichotomy(pro = c("Yes", "Undec"), anti = c("No", "Abstain")), engage_pro = dichotomy("Yes", "Undec"), engage_ant = dichotomy("No", "Abstain") ) dichots_dir ``` ## Fitting the models For comparison, we can fit the standard multinomial logistic regression models (using `"N"` as the baseline category) and the two nested dichotomy models. I use the main effects of the principal predictors (ignoring `region` and `population`) in all three models. ```{r chile-fit} mod.formula <- vote ~ statusquo + age + sex + education + income # Multinomial logit baseline (reference = "No") chile2 <- within(chile, vote <- relevel(vote, ref = "No")) chile_multi <- multinom(mod.formula, data = chile2, trace = FALSE) # Nested logit: engagement-first tree chile_eng <- nestedLogit(mod.formula, dichotomies = dichots_eng, data = chile) # Nested logit: direction-first tree chile_dir <- nestedLogit(mod.formula, dichotomies = dichots_dir, data = chile) ``` Details of this analysis are provided by `summary()`, which provides tests of coefficients as well as overall measures of fit (residual deviance) for each dichotomy as well as for the combined models comprising all four vote choices. This output is suppressed here ```{r chile-summaries, eval=FALSE} summary(chile_eng) summary(chile_dir) ``` ### Omnibus tests `car::Anova()` gives $G^2$ tests for each term in the submodels for the dichotomies as well as tests for these terms in the combined model. ```{r chile-anova} Anova(chile_eng) Anova(chile_dir) ``` ## Which tree fits better? Both nested models and the multinomial model have identical parameter counts (5 predictors × 3 sub-models = 15 slopes plus 3 intercepts), so AIC is a fair comparison. ```{r chile-aic} data.frame( model = c("Multinomial", "Engagement tree", "Direction tree"), logLik = c(logLik(chile_multi), logLik(chile_eng), logLik(chile_dir)), AIC = c(AIC(chile_multi), AIC(chile_eng), AIC(chile_dir)) ) ``` The direction-first tree should fit better (as it does) if `statusquo` is primarily a predictor of political opinion (Y/N vs. U/A) rather than of engagement. We can check this by inspecting the `statusquo` coefficients in each sub-model of the engagement tree: they should be large in `direction` and small in `engage` and `disengage`. ## Predicted probabilities To visualize these models, it is useful to calculate the predicted probabilities of `vote` choice over a range of predictor values, holding constant (averaging over) the predictor we don't want to display in a given plot ```{r chile-newdata} new_chile <- data.frame( statusquo = seq(-1.5, 1.5, by = 0.25), age = median(chile$age, na.rm = TRUE), sex = factor("F", levels = levels(chile$sex)), education = factor("S", levels = levels(chile$education)), income = median(chile$income, na.rm = TRUE) ) ``` We can get the predicted probabilities for these separate models for the dichotomies using `predict.nestedLogit()` and turning these into data frames for plotting with `ggplot2`. ```{r chile-predict} pred_eng <- as.data.frame(predict(chile_eng, newdata = new_chile)) pred_dir <- as.data.frame(predict(chile_dir, newdata = new_chile)) ``` ```{r chile-plot, fig.height=5, fig.width=9} # as.data.frame() returns long format: one row per (observation × response level) # with predictor columns, plus 'response', 'p', 'se.p', 'logit', 'se.logit' pred_long <- rbind( transform(pred_eng, model = "Engagement-first tree"), transform(pred_dir, model = "Direction-first tree") ) ggplot(pred_long, aes(x = statusquo, y = p, colour = response, fill = response)) + geom_ribbon(aes(ymin = pmax(0, p - 1.96 * se.p), ymax = pmin(1, p + 1.96 * se.p)), alpha = 0.15, colour = NA) + geom_line(linewidth = 1.2) + geom_point(color = "blacK") + facet_wrap(~ model) + labs(x = "Attitude toward status quo", y = "Predicted probability", colour = "Vote", fill = "Vote", title = "Chile 1988: predicted vote probabilities by tree structure") + theme_bw() ``` This is substantively interesting, because the two different structures for the dichotomy comparisons give quite different views of the predictions from the models. ### Plotting logits Using the `plot.nestedLogit()` method, we can also show these on the logit scale, using the `scale = "logit"` argument. The `others` argument allows us to condition on (average over) values of the variables not shown in this plot. There are other niceties, not shown here, such as using direct labels on the curves (`label = TRUE`, `label.x`) rather than a legend. ```{r chile-plot-logit, fig.height=5, fig.width=6} # Logit-scale plot using the new scale= argument plot(chile_eng, "statusquo", others = list(age = median(chile$age, na.rm = TRUE), sex = "F", education = "S", income = median(chile$income, na.rm = TRUE)), scale = "logit", xlab = "Attitude toward status quo", main = "Engagement-first tree") ``` --- # Travel mode choice (`AER::TravelMode`) {#travel} The dataset `AER::TravelMode` is a classic in econometrics, originally from @Greene:2003. It relates to the mode of travel chosen by individuals for travel between Sydney and Melbourne, Australia. * Do they choose to go by "car", "air", "train", or "bus"? * How do variables like cost, waiting time, household income influence that choice? ```{r travel-pkg, include=FALSE} have_AER <- requireNamespace("AER", quietly = TRUE) have_dplyr <- requireNamespace("dplyr", quietly = TRUE) ``` This section requires the `AER` and `dplyr` packages. ```{r travel-eval, eval=have_AER && have_dplyr} library(AER) library(dplyr) data("TravelMode", package = "AER") str(TravelMode) ``` ## Data structure `TravelMode` is in **long format**: 840 rows = 210 individuals × 4 travel alternatives (air, train, bus, car). Variables such as `wait` and `gcost` are *alternative-specific* — they differ across modes for the same individual. `nestedLogit()` requires **one row per individual** with the chosen mode as the response. Only **individual-specific** predictors (`income`, `size`) work here directly; alternative-specific costs would require a different approach. ```{r travel-reshape, eval=have_AER && have_dplyr} tm <- TravelMode |> filter(choice == "yes") |> select(mode, income, size) |> mutate(mode = relevel(factor(mode), ref = "car")) table(tm$mode) ``` ## Nesting structure The canonical nested structure separates *private* (car) from *public* transit, then within public separates *air* from *ground*, then *train* from *bus*. This reflects the IIA problem: adding a near-identical alternative (e.g., a second bus service) should not equally reduce the probability of choosing car. ```{r travel-dichots, eval=have_AER && have_dplyr} travel_dichots <- logits( pvt_pub = dichotomy("car", public = c("air", "train", "bus")), air_grnd = dichotomy("air", ground = c("train", "bus")), tr_bus = dichotomy("train", "bus") ) travel_dichots ``` ## Fitting ```{r travel-fit, eval=have_AER && have_dplyr} tm_multi <- multinom(mode ~ income + size, data = tm, trace = FALSE) tm_nested <- nestedLogit(mode ~ income + size, dichotomies = travel_dichots, data = tm) summary(tm_nested) Anova(tm_nested) ``` ## Predicted probabilities ```{r travel-predict, eval=have_AER && have_dplyr} new_tm <- data.frame(income = seq(10, 60, by = 10), size = 1) # as.data.frame() returns long format; reshape to wide for comparison pred_tm_nested_long <- as.data.frame(predict(tm_nested, newdata = new_tm)) nested_wide <- reshape( pred_tm_nested_long[, c("income", "size", "response", "p")], idvar = c("income", "size"), timevar = "response", direction = "wide" ) names(nested_wide) <- sub("^p\\.", "", names(nested_wide)) pred_tm_multi <- as.data.frame(predict(tm_multi, newdata = new_tm, type = "probs")) cat("Nested logit:\n") cbind(income = new_tm$income, round(nested_wide[, levels(tm$mode)], 3)) cat("Multinomial logit:\n") cbind(income = new_tm$income, round(pred_tm_multi[, levels(tm$mode)], 3)) ``` ## Model comparison ```{r travel-aic, eval=have_AER && have_dplyr} data.frame( model = c("Multinomial", "Nested"), logLik = c(logLik(tm_multi), logLik(tm_nested)), AIC = c(AIC(tm_multi), AIC(tm_nested)) ) ``` --- # Fishing mode choice (`mlogit::Fishing`) {#fishing} Another classic econometric dataset amenable to nested dichotomies treatment concerns the choice by recreational anglers among different modes of fishing: from a beach or pier, or on a boat, that could be private or a charter. The dataset is from @HerrgesKling:1999; see also @CameronTrivedi:2005 for an econometric treatment that also considers the cost of fishing by these modes and also the expected catch associated with each. Models for such response-specific predictors are interesting, but outside the scope of our nested logit models. ```{r fishing-pkg, include=FALSE} have_mlogit <- requireNamespace("mlogit", quietly = TRUE) ``` This section requires the `mlogit` package for the dataset. ```{r fishing-load, eval=have_mlogit} data("Fishing", package = "mlogit") str(Fishing) ``` ## Overview 1182 US recreational anglers each chose one of four fishing modes: `beach`, `pier`, `boat`, `charter`. The natural nesting separates *shore-based* from *vessel-based* alternatives: ``` fishing / \ shore vessel / \ / \ beach pier boat charter ``` Shore-based modes share unobserved attributes (no boat required, lower cost and commitment) that vessel-based modes do not, so IIA is plausible within but not across groups. Unlike `AER::TravelMode`, `mlogit::Fishing` is **already in wide format** — one row per individual — with `mode` indicating the chosen alternative and `income` as the only individual-specific predictor. The `price.*` and `catch.*` variables are *alternative-specific* (one column per mode) and cannot be used directly with `nestedLogit()`, which expects predictors that apply to the individual, not to each alternative. A full conditional-logit model (e.g., via `mlogit::mlogit()`) would be needed to exploit those variables. ```{r fishing-dichots, eval=have_mlogit} fishing_dichots <- logits( shore_vessel = dichotomy(shore = c("beach", "pier"), vessel = c("boat", "charter")), shore_type = dichotomy("beach", "pier"), vessel_type = dichotomy("boat", "charter") ) fishing_dichots as.tree(fishing_dichots, response = "mode") ``` Fit the model and show the summaries for the dichotomies: ```{r fishing-fit, eval=have_mlogit} fish_nested <- nestedLogit(mode ~ income, dichotomies = fishing_dichots, data = Fishing) summary(fish_nested) Anova(fish_nested) ``` ## Plots Again, the `plot.nestedLogit()` gives an interpretable plot of the response probabilities: ```{r fishing-plot, eval=have_mlogit} plot(fish_nested, x.var = "income", xlab = "Monthly income (USD)", main = "Fishing mode choice vs. income", label=TRUE, label.col="black", cex.lab = 1.2) ``` * Fishing from the beach is a low-probability choice, which doesn't vary much with income. * Fishing from a pier is next overall, followed by charter fishing. Both of these decline with income. * The tendency to fish from a private boat (yours or friend's) increases dramatically with income. --- # Arthritis treatment outcome (`vcd::Arthritis`) {#arthritis} ```{r arthritis-pkg, include=FALSE} have_vcd <- requireNamespace("vcd", quietly = TRUE) ``` ```{r arthritis-load, eval=have_vcd} data("Arthritis", package = "vcd") str(Arthritis) table(Arthritis$Improved) ``` ## Overview 84 patients in a double-blind clinical trial received either an active treatment or a placebo (`Treatment`). They are classified by `Sex` and `Age`. The outcome `Improved` is *ordinal*, with levels "None", "Some" or "Marked" improvement. Simpler analyses of these data consider only the dichotomy ("Improved") between "None" and the other categories. ``` Improved / \ None Any improvement / \ Some Marked ``` Continuation logits are natural here: dichotomy 1 asks whether the patient improved at all; dichotomy 2 asks, given improvement, whether it was marked rather than merely some. Treated patients should have higher probabilities at both transitions. ```{r arthritis-dichots, eval=have_vcd} arth_dichots <- continuationLogits(c("None", "Some", "Marked")) arth_dichots ``` ```{r arthritis-fit, eval=have_vcd} arth_nested <- nestedLogit(Improved ~ Treatment + Sex + Age, dichotomies = arth_dichots, data = Arthritis) summary(arth_nested) Anova(arth_nested) ``` ## Plotting ```{r arthritis-plot, eval=have_vcd} plot(arth_nested, x.var = "Age", others = list(Treatment = "Treated", Sex = "Female"), xlab = "Age", main = "Arthritis: Treatment = Treated, Sex = Female") ``` --- # Smoking status (`MASS::survey`) {#smoking} ```{r survey-load} data("survey", package = "MASS") # Drop NAs on Smoke sv <- survey[!is.na(survey$Smoke), ] table(sv$Smoke) ``` ## Overview 237 Australian university students. `Smoke` is ordinal: `Never` < `Occas` < `Regul` < `Heavy`. Continuation logits: - Dichotomy 1: Never vs. ever smoked - Dichotomy 2: Occasional vs. regular+ - Dichotomy 3: Regular vs. heavy ```{r smoke-dichots} smoke_dichots <- continuationLogits(c("Never", "Occas", "Regul", "Heavy")) smoke_dichots ``` ```{r smoke-fit} smoke_nested <- nestedLogit(Smoke ~ Sex + Age + Exer, dichotomies = smoke_dichots, data = sv) summary(smoke_nested) Anova(smoke_nested) ``` ```{r smoke-plot} plot(smoke_nested, x.var = "Age", others = list(Sex = "Female", Exer = "Some"), xlab = "Age", main = "Smoking: Female students, some exercise") ``` **Note**: With only 237 observations and modest predictors, results are illustrative rather than definitive. --- # Ordinal examples — brief sketches ## Pneumoconiosis severity (`VGAM::pneumo`) ```{r pneumo-pkg, include=FALSE} have_VGAM <- requireNamespace("VGAM", quietly = TRUE) ``` ```{r pneumo-show, eval=have_VGAM} data("pneumo", package = "VGAM") pneumo ``` 8 grouped observations on coal miners. Response: `normal` / `mild` / `severe`. Predictor: `duration` (years of dust exposure). Very small data but the story is clean — continuation logits on an ordinal severity scale vs. a single continuous dose predictor. ```{r pneumo-fit, eval=have_VGAM} # pneumo is aggregated; nestedLogit() needs individual-level data or # frequency weights — requires expansion or a weighted interface. # Placeholder: would use continuationLogits(c("normal", "mild", "severe")) ``` ## Mental health status (Agresti) Mental health status (`Well`, `Mild`, `Moderate`, `Impaired`) vs. SES and life events — Table 9.1 in @Agresti:2002. May be available via `vcdExtra::Mental` or must be entered from the book. Classic illustration of continuation logits for an ordinal psychiatric outcome. --- # Session info ```{r session-info} sessionInfo() ``` ```{r write-bib, echo=FALSE} pkgs <- unique(c(.to.cite, .packages())) knitr::write_bib(pkgs, file = here::here("vignettes", "packages.bib")) ``` ## References