--- title: "New Features in rdrobust 4.0.0" author: "Sebastian Calonico, Matias D. Cattaneo, Max H. Farrell, Rocio Titiunik" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{New Features in rdrobust 4.0.0} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include=FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 4.5 ) ``` This vignette documents the features added in **rdrobust 4.0.0**: 1. A `data =` argument for `rdrobust()`, `rdbwselect()`, and `rdplot()`, letting you refer to variables by their bare names inside a data frame. 2. Heteroskedasticity-robust variance estimators (`vce = "hc0"`, `"hc1"`, `"hc2"`, `"hc3"`). 3. Cluster-robust variance estimators (`vce = "cr1"`, `"cr2"`, `"cr3"`) with automatic validation. 4. A `plot` method for `rdrobust` objects (`plot.rdrobust`). 5. A `hide = TRUE` flag and `vars_bins` / `vars_poly` / `coef` outputs from `rdplot()`, so you can build custom RD plots on top of `ggplot2` (or any other graphics system) without re-implementing the binning logic. 6. `tidy` / `glance` methods for integration with the **broom** package. Existing functionality — bandwidth selection, covariates, kernels, fuzzy and kink designs — is unchanged. See `?rdrobust` and `?rdbwselect` for full documentation. --- ## Setup We use the U.S. Senate incumbency advantage dataset included in the package. The running variable is the Democratic vote margin at election $t$; the outcome is the Democratic vote share at election $t+2$. The `state` variable serves as a natural cluster identifier, since elections within the same state share common political and institutional features. ```{r data} library(rdrobust) data(rdrobust_RDsenate) df <- rdrobust_RDsenate head(df) ``` --- ## 1. The `data =` argument Previously, callers either had to `attach()` the data frame or pass each column with `df$` prefixing. In 4.0.0, `rdrobust()`, `rdbwselect()`, and `rdplot()` accept a `data =` argument so `y`, `x`, `covs`, `cluster`, `fuzzy`, `weights`, and `subset` can be given as bare names, resolved against the supplied data frame: ```{r data-arg-basic} r <- rdrobust(y = vote, x = margin, data = df) round(r$coef, 3) ``` It also works for expressions built from columns (e.g. `cbind()` for multiple covariates, comparisons for `subset`): ```{r data-arg-expr} r_cov <- rdrobust(y = vote, x = margin, covs = cbind(class, termshouse, termssenate), data = df) round(r_cov$coef, 3) # subset via an expression evaluated inside `data` r_sub <- rdrobust(y = vote, x = margin, subset = year >= 1950, data = df) round(r_sub$coef, 3) ``` The same pattern works for `rdbwselect()` and `rdplot()`. For the rest of the vignette we pass variables through `data =`. ### 1.1 `covs =`: formula, names, or matrix The `covs` argument accepts three forms, so the way you specify covariates can match the shape of your problem: 1. **One-sided formula** — recommended when you have factors, interactions, or transformations: ```{r covs-formula-factor} df$classf <- factor(df$class, labels = c("A", "B", "C")) r_fm <- rdrobust(y = vote, x = margin, covs = ~ classf + termshouse + I(termssenate^2), data = df) round(r_fm$coef, 3) ``` Factors are expanded to contrast dummies via `model.matrix()`; interactions (`~ z1 * z2`) and transformations (`I(...)`, `log()`, `poly()`) all work, and the intercept column is dropped automatically. Symbols missing from `data` fall through to the caller's environment. 2. **Character vector of column names** — convenient for programmatic specifications: ```{r covs-char} covs_names <- c("class", "termshouse", "termssenate") r_ch <- rdrobust(y = vote, x = margin, covs = covs_names, data = df) round(r_ch$coef, 3) ``` Missing column names give a clear error (`Column(s) not found in 'data': ...`). Passing a character vector without `data = ` is rejected up front instead of failing deep inside the estimator. 3. **Numeric vector, matrix, or data frame** — fully backwards compatible: ```{r covs-matrix} Z <- with(df, cbind(class, termshouse, termssenate)) r_mat <- rdrobust(y = vote, x = margin, covs = Z, data = df) round(r_mat$coef, 3) ``` All three forms — formula with the same plain terms, character vector, and pre-built matrix — yield identical results when the covariates are numeric and the expansion is trivial; the formula form is the right choice when factor expansion or interactions matter. ```{r rdplot, fig.cap="RD plot: U.S. Senate incumbency advantage."} rdplot(y = vote, x = margin, data = df, title = "Senate: incumbency advantage", x.label = "Vote margin (t)", y.label = "Vote share (t+2)") ``` --- ## 2. Heteroskedasticity-robust variance (HC family) The full `HC0`–`HC3` family is available via `vce =`. `HC0` is the plug-in sandwich; `HC1` adds a finite-sample scalar; `HC2` and `HC3` reweight residuals by leverage, with `HC3` being the most conservative. ```{r hc-variants} hc_fits <- lapply(c("nn", "hc0", "hc1", "hc2", "hc3"), function(v) { rdrobust(y = vote, x = margin, vce = v, data = df) }) names(hc_fits) <- c("NN", "HC0", "HC1", "HC2", "HC3") hc_tbl <- do.call(rbind, lapply(names(hc_fits), function(nm) { r <- hc_fits[[nm]] data.frame(vce = nm, coef = round(r$coef[2], 3), # bias-corrected se_rb = round(r$se[3], 3), # robust SE pval = round(r$pv[3], 3)) })) print(hc_tbl, row.names = FALSE) ``` As expected, the point estimate is identical across HC variants (the variance estimator does not change it); only the standard errors and p-values move. --- ## 3. Cluster-Robust Variance Estimators (CR family) When observations are grouped into clusters, standard errors should account for within-cluster correlation. Version 4.0.0 adds three cluster-robust variance estimators: | `vce` | Description | |-------|-------------| | `"cr1"` | CR1 — cluster sandwich with degrees-of-freedom correction; the default when `cluster` is supplied | | `"cr2"` | CR2 — Bell & McCaffrey (2002); accounts for within-cluster leverage | | `"cr3"` | CR3 — Pustejovsky & Tipton (2018); leave-one-cluster-out jackknife; most conservative | ```{r cluster-counts} cat("Number of clusters:", length(unique(df$state)), "\n") ``` ```{r crv-variants} r_cr1 <- rdrobust(y = vote, x = margin, cluster = state, vce = "cr1", data = df) r_cr2 <- rdrobust(y = vote, x = margin, cluster = state, vce = "cr2", data = df) r_cr3 <- rdrobust(y = vote, x = margin, cluster = state, vce = "cr3", data = df) cl_tbl <- data.frame( vce = c("CR1", "CR2", "CR3"), coef = c(r_cr1$coef[2], r_cr2$coef[2], r_cr3$coef[2]), se_rb = c(r_cr1$se[3], r_cr2$se[3], r_cr3$se[3]), pval = c(r_cr1$pv[3], r_cr2$pv[3], r_cr3$pv[3]) ) cl_tbl[, 2:4] <- round(cl_tbl[, 2:4], 3) print(cl_tbl, row.names = FALSE) ``` The point estimate (bias-corrected) is the same across estimators — only the standard errors differ. CR3 produces the widest intervals and is the safest default when the number of clusters is moderate. ### Automatic validation When `cluster` is supplied, only `cr1`/`cr2`/`cr3` are valid; the HC variants and `nn` are silently remapped (or warned and remapped, for HC) to the corresponding cluster-robust estimator: - `vce = "nn"` + cluster → `cr1` (silent default) - `vce = "hc0"` + cluster → `cr1` (with warning) - `vce = "hc1"` + cluster → `cr1` (with warning) - `vce = "hc2"` + cluster → `cr2` (with warning) - `vce = "hc3"` + cluster → `cr3` (with warning) Conversely, asking for `cr1/cr2/cr3` without supplying a `cluster` falls back to the corresponding HC variant with a warning. ### Full output for CR3 ```{r cr3-summary} summary(r_cr3) ``` --- ## 4. `plot.rdrobust`: Visual Summary of RD Results Version 4.0.0 adds a `plot` S3 method for `rdrobust` objects. It produces a binscatter with polynomial fits and confidence bands on a common scale. **Signature:** ```r plot(x, y, x_run, nbins = 20, ci = TRUE, show_effect = FALSE, title = NULL, x.label = "Running Variable", y.label = "Outcome", ...) ``` ### Basic plot ```{r plot-basic, fig.cap="Binscatter with local polynomial fits and 95% CI bands."} plot(r_cr1, df$vote, df$margin, title = "Senate: incumbency advantage", x.label = "Vote margin (t)", y.label = "Vote share (t+2)") ``` ### Adding an effect panel `show_effect = TRUE` appends a second panel with the conventional point estimate, the robust bias-corrected confidence interval, and significance stars. ```{r plot-effect, fig.height=6, fig.cap="Main plot with RD effect panel."} plot(r_cr1, df$vote, df$margin, show_effect = TRUE, title = "Senate: incumbency advantage", x.label = "Vote margin (t)", y.label = "Vote share (t+2)") ``` ### Clustered plot The method works identically for any `rdrobust` object, regardless of how it was estimated. ```{r plot-cr3, fig.cap="CR3 clustered variance, with effect panel."} plot(r_cr3, df$vote, df$margin, show_effect = TRUE, title = "Senate: CR3 clustered SE", x.label = "Vote margin (t)", y.label = "Vote share (t+2)") ``` --- ## 5. Custom RD plots from `rdplot()` `rdplot()` now accepts a `hide = TRUE` flag and returns the bin-level data used to draw the plot, so you can build any visualization on top of `ggplot2`, `lattice`, or base graphics without re-implementing the binning logic. The returned list contains: - `vars_bins` — one row per bin: bin midpoint (`rdplot_mean_bin`), within-bin mean of `y` (`rdplot_mean_y`), within-bin standard error (`rdplot_se_y`), and pointwise CI endpoints (`rdplot_ci_l` / `rdplot_ci_r`) when `ci = ` is set. - `vars_poly` — a fine evaluation grid (`rdplot_x`) with the fitted polynomial values on each side of the cutoff (`rdplot_y`). - `coef` — the per-side polynomial coefficient matrix (`Left` / `Right` columns) that produced `vars_poly`. ```{r rdplot-hide} out <- rdplot(y = vote, x = margin, c = 0, hide = TRUE, ci = 95, data = df) str(out$vars_bins, give.attr = FALSE) str(out$vars_poly, give.attr = FALSE) out$coef ``` A minimal custom plot using `ggplot2`: ```{r rdplot-manual, fig.cap="Custom RD plot built from rdplot() output."} library(ggplot2) poly_l <- subset(out$vars_poly, rdplot_x < 0) poly_r <- subset(out$vars_poly, rdplot_x >= 0) ggplot() + geom_vline(xintercept = 0, linewidth = 0.4) + geom_errorbar(data = out$vars_bins, aes(x = rdplot_mean_bin, ymin = rdplot_ci_l, ymax = rdplot_ci_r), color = "grey60", width = 0) + geom_point(data = out$vars_bins, aes(x = rdplot_mean_bin, y = rdplot_mean_y), color = "grey40", size = 1.5) + geom_line(data = poly_l, aes(x = rdplot_x, y = rdplot_y)) + geom_line(data = poly_r, aes(x = rdplot_x, y = rdplot_y)) + theme_bw() + labs(x = "Vote margin (t)", y = "Vote share (t+2)", title = "Senate: regression function fit") ``` You can also reconstruct the fitted polynomial on a custom x-range directly from `coef`, which is convenient when the default evaluation grid doesn't span the range you want: ```{r rdplot-from-coef} eval_fit <- function(xs, coef, c) sapply(xs, function(xi) sum(coef * (xi - c) ^ (0:(length(coef) - 1)))) xs <- seq(-50, 50, length.out = 5) data.frame( x = xs, y_left = ifelse(xs < 0, eval_fit(xs, out$coef[, "Left"], 0), NA), y_right = ifelse(xs >= 0, eval_fit(xs, out$coef[, "Right"], 0), NA) ) ``` The full set of worked examples (default plot, error-bar variant, shaded CI ribbons, coefficient-based reconstruction) is in `rdplot_illustration.R` at the project root, with Python and Stata twins (`rdplot_illustration.py`, `rdplot_illustration.do`). --- ## 6. broom Integration **rdrobust** 4.0.0 registers `tidy` and `glance` methods so that results integrate seamlessly with the **broom** package and tidyverse workflows. ### `tidy()`: Coefficient-level output `tidy()` returns one row per estimator (Conventional, Bias-Corrected, Robust), with standard broom column names: ```{r broom-tidy, eval=requireNamespace("broom", quietly=TRUE)} library(broom) tidy(r_cr3) ``` For most applications, the **Robust** row (row 3) is the recommended inference — it combines the bias-corrected point estimate with robust standard errors. ### `glance()`: Model-level summaries `glance()` returns a single-row data frame with key estimation details: sample sizes, bandwidths, VCE method, and whether covariates or clusters were used. ```{r broom-glance, eval=requireNamespace("broom", quietly=TRUE)} glance(r_cr3) ``` ### Collecting results across specifications The main payoff of broom integration is easy comparison across specifications. Here we compare all three cluster-robust estimators: ```{r broom-collect, eval=requireNamespace("broom", quietly=TRUE)} specs <- list(CR1 = r_cr1, CR2 = r_cr2, CR3 = r_cr3) # Robust BC estimates side-by-side res <- do.call(rbind, lapply(names(specs), function(nm) { cbind(spec = nm, tidy(specs[[nm]])[3, ]) # row 3 = robust })) res[, c("spec", "estimate", "std.error", "conf.low", "conf.high")] ``` We can also compare model-level details: ```{r broom-glance-compare, eval=requireNamespace("broom", quietly=TRUE)} gl <- do.call(rbind, lapply(names(specs), function(nm) { cbind(spec = nm, glance(specs[[nm]])) })) gl[, c("spec", "nobs.effective.left", "nobs.effective.right", "h.left", "vce", "clusters.left", "clusters.right")] ``` ### Broader comparison: HC3 vs clustered A common workflow is comparing heteroskedasticity-robust and cluster-robust results to assess sensitivity to within-cluster correlation: ```{r broom-hc-vs-cl, eval=requireNamespace("broom", quietly=TRUE)} r_hc3 <- rdrobust(y = vote, x = margin, vce = "hc3", data = df) all_specs <- list("HC3" = r_hc3, "CR1" = r_cr1, "CR2" = r_cr2, "CR3" = r_cr3) comparison <- do.call(rbind, lapply(names(all_specs), function(nm) { g <- glance(all_specs[[nm]]) t <- tidy(all_specs[[nm]])[3, ] # robust row data.frame(spec = nm, estimate = round(t$estimate, 3), std.error = round(t$std.error, 3), ci = paste0("[", round(t$conf.low, 3), ", ", round(t$conf.high, 3), "]"), h = round(g$h.left, 3), clusters = g$clusters.left) })) print(comparison, row.names = FALSE) ``` --- ## 7. Reporting tables: conventional point + RBC CI A common workflow is to present a small panel of specifications (sharp, covariate-adjusted, cluster-robust, etc.) in a single table. The canonical Calonico-Cattaneo-Titiunik presentation pairs the **conventional point estimate** with the **robust bias-corrected (RBC) confidence interval** — the point estimate is from row 1 of `coef`, the CI is row 3 of `ci`. The same pattern appears in the Stata illustration (`rdrobust_illustration_new_features.do`), built there with `esttab`; the R analog uses `knitr::kable()`. We fit four specifications side-by-side: sharp, sharp+covs, cluster, cluster+covs. ```{r table-fits} covs_form <- ~ class + termshouse + termssenate m1 <- rdrobust(y = vote, x = margin, data = df) m2 <- rdrobust(y = vote, x = margin, covs = covs_form, data = df) m3 <- rdrobust(y = vote, x = margin, vce = "cr1", cluster = state, data = df) m4 <- rdrobust(y = vote, x = margin, vce = "cr1", cluster = state, covs = covs_form, data = df) ``` A small helper extracts the conventional point estimate, the RBC confidence interval, and a few model-level stats from a fitted object: ```{r table-row-builder} rdr_row <- function(fit, label) { data.frame( spec = label, estimate = fit$coef[1, 1], # tau_conv rbc_ci = sprintf("[%.3f, %.3f]", fit$ci[3, 1], fit$ci[3, 2]), rbc_pval = fit$pv[3, 1], h = fit$bws[1, 1], # h_left N_eff = fit$N_h[1] + fit$N_h[2], clusters = if (!is.null(fit$n_clust)) sum(fit$n_clust) else NA_integer_ ) } tab <- rbind( rdr_row(m1, "sharp"), rdr_row(m2, "sharp + covs"), rdr_row(m3, "cluster"), rdr_row(m4, "cluster + covs") ) ``` Render with `knitr::kable()` for the report: ```{r table-kable} knitr::kable( tab, digits = 3, col.names = c("Specification", "Estimate", "Robust 95% CI", "RBC p-value", "h", "Eff. N", "Clusters"), caption = "RD estimates: conventional point estimate with RBC confidence interval." ) ``` The Estimate column reports `tau_conv` (the standard local-polynomial point estimate); the **Robust 95% CI** is the bias-corrected, robust-SE confidence interval — the inference recommended by CCT 2014 and the same one printed by `summary(fit)` in the Robust row. The CI is centered on `tau_bc`, not on `tau_conv`, which is why a CI may not be symmetric about the reported point estimate. This is a feature of robust bias-corrected inference, not a typo. ### Specs-as-columns variant (paper style) For a typeset table closer to the Stata `esttab` layout (specifications as columns, statistics as rows), transpose: ```{r table-paper} paper <- t(tab[, c("estimate", "rbc_ci", "h", "N_eff")]) colnames(paper) <- tab$spec rownames(paper) <- c("RD effect (conv.)", "Robust 95% CI", "h", "Eff. N") knitr::kable( paper, caption = "Senate incumbency: RD estimates across specifications." ) ``` ### LaTeX export To write a publication-ready LaTeX table to a file (the R analog of the Stata `esttab ... using rdrobust_table.tex`): ```{r table-latex, eval=FALSE} tex <- knitr::kable( paper, format = "latex", booktabs = TRUE, caption = "RD estimates for Senate incumbency.", label = "tab:rdrobust" ) writeLines(tex, "rdrobust_table.tex") ``` For richer formatting (multi-row headers, footnotes, grouping), the `kableExtra` and `gt` packages build on top of `kable()` / `gt::gt()` and accept the same input data frame. --- ## Session Info ```{r session} sessionInfo() ```