1 Introduction

The NonlinearDiD package implements difference-in-differences estimators for staggered treatment adoption with binary, count, and other nonlinear outcomes.

1.1 Why Nonlinear Outcomes Require Special Methods

The landmark Callaway & Sant’Anna (2021) framework assumes that parallel trends holds on the probability scale:

\[E[Y_{it}(0) - Y_{it-1}(0) \mid G = g] = E[Y_{it}(0) - Y_{it-1}(0) \mid G = \infty]\]

For continuous outcomes (wages, test scores), this is natural. But for binary outcomes (employment, hospitalization, default), parallel trends in probabilities implies parallel trends in log-odds only under a very specific functional form restriction.

Roth & Sant’Anna (2023) show this creates two distinct problems:

Sensitivity to functional form: Whether pre-treatment trends appear parallel depends entirely on which scale you test.
Treatment effect heterogeneity conflation: In nonlinear models, the “treatment effect” includes both the true effect and Jensen’s inequality adjustments.

This package addresses both problems.

2 Installation

# From CRAN (forthcoming)
install.packages("NonlinearDiD")

# Development version from GitHub
remotes::install_github("example/NonlinearDiD")

3 Binary Outcomes: Staggered DiD

3.1 Simulating Data

dat <- sim_binary_panel(
  n            = 800,
  nperiods     = 8,
  prop_treated = 0.6,
  n_cohorts    = 3,
  true_att     = c(0.15, 0.25, 0.20),  # heterogeneous treatment effects
  base_prob    = 0.3,
  seed         = 42
)

head(dat)
#>   id period y g D alpha_i     x1 x2
#> 1  1      1 1 0 0  0.8921 0.6888  1
#> 2  1      2 1 0 0  0.8921 0.6888  1
#> 3  1      3 0 0 0  0.8921 0.6888  1
#> 4  1      4 1 0 0  0.8921 0.6888  1
#> 5  1      5 0 0 0  0.8921 0.6888  1
#> 6  1      6 0 0 0  0.8921 0.6888  1
cat("Treatment cohorts:", table(dat$g[dat$period == 1]), "\n")
#> Treatment cohorts: 320 160 160 160
cat("Baseline outcome rate:", round(mean(dat$y[dat$D == 0]), 3), "\n")
#> Baseline outcome rate: 0.343

3.2 Estimating ATT(g,t) with Logit Model

res <- nonlinear_attgt(
  data          = dat,
  yname         = "y",
  tname         = "period",
  idname        = "id",
  gname         = "g",
  xformla       = ~ x1 + x2,
  outcome_model = "logit",
  estimand      = "att",
  control_group = "nevertreated",
  doubly_robust = TRUE
)

summary(res)
#> 
#> === Nonlinear DiD Summary ===
#> 
#> Model:         logit 
#> Estimand:      att 
#> Control group: nevertreated 
#> 
#> --- Post-treatment ATT(g,t) ---
#>   Mean ATT: 0.0327
#>   Median ATT: 0.0287
#>   Min ATT: -0.0531  |  Max ATT: 0.1634
#>   N(g,t) cells: 14
#> 
#> --- Pre-treatment ATT(g,t) (should be ~0 under PT) ---
#>   Mean: 0.0399  |  SD: 0.0568
#>   N(g,t) cells: 10
#> 
#> --- Convergence ---
#>   Converged: 24 / 24

The nonlinear_attgt() function estimates a separate ATT for each (group, time) pair. The key difference from the linear case: we model the change in log-odds for control units as the counterfactual, not the change in probability.

3.3 Event-Study Aggregation

agg_dynamic <- nonlinear_aggte(res, type = "dynamic")
print(agg_dynamic)
#> 
#> === Nonlinear DiD: Aggregated ATT ===
#> 
#> Aggregation type: dynamic 
#> Outcome model:   logit 
#> 
#> Overall ATT: 0.0327  (SE: 0.0130, t = 2.508, p = 0.0121)
#> 95% CI: [0.0071, 0.0582]
#> 
#>  label       att      se     ci_lo   ci_hi n_groups  post
#>     -6  0.155126 0.04903  0.059034 0.25122        1 FALSE
#>     -5  0.082694 0.04779 -0.010974 0.17636        1 FALSE
#>     -4  0.108083 0.04850  0.013032 0.20313        1 FALSE
#>     -3  0.002701 0.03444 -0.064793 0.07019        2 FALSE
#>     -2  0.024059 0.03406 -0.042694 0.09081        2 FALSE
#>     -1  0.000000 0.00000  0.000000 0.00000        3 FALSE
#>      0  0.005890 0.02791 -0.048816 0.06060        3  TRUE
#>      1  0.062026 0.02836  0.006445 0.11761        3  TRUE
#>      2  0.002272 0.03493 -0.066189 0.07073        2  TRUE
#>      3  0.065044 0.03432 -0.002223 0.13231        2  TRUE
#>      4  0.045172 0.03519 -0.023797 0.11414        2  TRUE
#>      5 -0.001089 0.04710 -0.093395 0.09122        1  TRUE
#>      6  0.029827 0.04741 -0.063086 0.12274        1  TRUE
plot(agg_dynamic)

Event-study plot for binary outcome DiD

3.4 Group-Level ATT

agg_group <- nonlinear_aggte(res, type = "group")
print(agg_group)
#> 
#> === Nonlinear DiD: Aggregated ATT ===
#> 
#> Aggregation type: group 
#> Outcome model:   logit 
#> 
#> Overall ATT: 0.0327  (SE: 0.0130, t = 2.508, p = 0.0121)
#> 95% CI: [0.0071, 0.0582]
#> 
#>  label      att      se      ci_lo   ci_hi n_periods post
#>      2 0.003981 0.01821 -0.0317023 0.03966         7 TRUE
#>      4 0.043094 0.02214 -0.0003072 0.08650         5 TRUE
#>      7 0.107062 0.03449  0.0394586 0.17467         2 TRUE

The group-level ATT tells us: cohort 4 experienced the largest treatment effect (they adopted treatment earlier, when baseline rates were lower).

3.5 Pre-Trends Test

pt <- nonlinear_pretest(res, plot = FALSE)
print(pt)
#> 
#> === Nonlinear DiD Pre-Treatment Parallel Trends Test ===
#> 
#> Test type: joint 
#> Significance level: 0.05 
#> 
#> --- Pre-period ATT(g,t) estimates ---
#>    group time         att         se      tstat        pval   sig
#> 1      2    1  0.00000000 0.00000000        NaN         NaN    NA
#> 9      4    1  0.02766112 0.04872714  0.5676738 0.570256489 FALSE
#> 10     4    2  0.02104178 0.04841993  0.4345685 0.663875632 FALSE
#> 11     4    3  0.00000000 0.00000000        NaN         NaN    NA
#> 17     7    1  0.15512568 0.04902738  3.1640623 0.001555835  TRUE
#> 18     7    2  0.08269389 0.04779045  1.7303435 0.083568925 FALSE
#> 19     7    3  0.10808258 0.04849612  2.2286851 0.025834862  TRUE
#> 20     7    4 -0.02225973 0.04867279 -0.4573342 0.647430846 FALSE
#> 21     7    5  0.02707614 0.04790974  0.5651490 0.571972408 FALSE
#> 22     7    6  0.00000000 0.00000000        NaN         NaN    NA
#> 
#> --- Joint Test ---
#> REJECT parallel trends in pre-period (joint chi2(7) = 19.012, p = 0.0081 < 0.05).
#> Estimates may be biased.

The pre-trends test examines whether ATT(g,t) = 0 for all pre-treatment periods. Critical caveat: A failure to reject pre-trends does NOT validate parallel trends — it only fails to find evidence against it. For binary outcomes, the test is conducted on the same scale as the estimator.

4 Odds-Ratio DiD

For some binary outcome applications, the odds-ratio DiD is preferable because: - It is invariant to the choice of reference group - It does not require parallel trends in probabilities (only in log-odds) - It has a natural multiplicative interpretation

# Use simple 2-period case for clarity
dat2 <- dat[dat$period %in% c(3, 5), ]

res_or <- odds_ratio_did(
  data           = dat2,
  yname          = "y",
  tname          = "period",
  idname         = "id",
  treat_period   = 5,
  control_period = 3,
  gname          = "g"
)

print(res_or)
#> 
#> === Odds-Ratio DiD ===
#> 
#> Interpretation: OR-DiD = 0.964 [95% CI: 0.634, 1.464] 
#> 
#> Log(OR-DiD): -0.0372  (SE: 0.2135)
#> 95% CI [OR]: [0.6340, 1.4642]
#> t-stat: -0.174  |  p-value: 0.8618

An OR-DiD of 1 means no treatment effect. Values > 1 indicate the treatment increased the odds of Y = 1.

5 Count Outcomes: Poisson DiD

For count outcomes (patents, hospital visits, crimes), we assume parallel trends on the log scale (multiplicative parallel trends):

\[\frac{E[Y_{it}(0)]}{E[Y_{it-1}(0)]} = \frac{E[Y_{jt}(0)]}{E[Y_{jt-1}(0)]}\]

count_dat <- sim_count_panel(
  n            = 600,
  nperiods     = 6,
  prop_treated = 0.5,
  true_rr      = c(1.5, 2.0, 1.3),  # rate ratios per cohort
  base_rate    = 10,
  seed         = 7
)

summary(count_dat$y)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>    0.00   10.00   14.00   15.53   19.00   63.00

# Staggered Poisson DiD
res_count <- nonlinear_attgt(
  data          = count_dat,
  yname         = "y",
  tname         = "period",
  idname        = "id",
  gname         = "g",
  outcome_model = "poisson",
  estimand      = "att",
  control_group = "nevertreated"
)

agg_count <- nonlinear_aggte(res_count, type = "dynamic")
plot(agg_count)

# Simple 2x2 Poisson DiD (Wooldridge QMLE)
count_sub <- count_dat[count_dat$period %in% c(2, 4), ]
res_pois  <- count_did_poisson(
  count_sub,
  yname          = "y",
  tname          = "period",
  idname         = "id",
  treat_period   = 4,
  control_period = 2,
  gname          = "g"
)
print(res_pois)
#> 
#> === Count DiD (Poisson QMLE) ===
#> 
#> ATT (log rate ratio): 0.2643  (SE: 0.0510)
#> Rate ratio:           1.3026  [95% CI: 1.1787, 1.4394]
#> ATT (APE, count scale): 3.9909
#> t-stat: 5.185  |  p-value: 0.0000

6 Doubly-Robust Estimation

The doubly-robust estimator combines an outcome model with a propensity score model. It is consistent if either (but not necessarily both) is correctly specified.

res_dr <- binary_did_dr(
  data           = dat[dat$period %in% c(3, 5), ],
  yname          = "y",
  tname          = "period",
  idname         = "id",
  treat_period   = 5,
  control_period = 3,
  gname          = "g",
  xformla        = ~ x1 + x2,
  outcome_model  = "logit"
)
print(res_dr)
#> 
#> === Doubly-Robust Binary DiD ===
#> 
#> ATT:   -0.0080  (SE: 0.0475)
#> 95% CI: [-0.1010, 0.0851]
#> t-stat: -0.168  |  p-value: 0.8670
#> N (treated): 480  |  N (control): 320

7 Nonparametric Bounds

When we are uncertain about functional form, we can compute sharp bounds on the ATT. Under parallel trends alone (with no functional form assumption), the ATT for binary outcomes is point identified. But under weaker assumptions (e.g., only monotone treatment response), we get intervals.

bounds <- nonlinear_bounds(
  data          = dat,
  yname         = "y",
  tname         = "period",
  idname        = "id",
  gname         = "g",
  bound_type    = "manski",  # widest (no assumptions)
  control_group = "nevertreated"
)

# Show post-treatment bounds
post_bounds <- bounds[bounds$post, ]
head(post_bounds[, c("group", "time", "lb", "ub", "identified")])
#>   group time       lb      ub identified
#> 2     2    2 -0.67500 0.32500       TRUE
#> 3     2    3 -0.64375 0.35625       TRUE
#> 4     2    4 -0.62500 0.37500       TRUE
#> 5     2    5 -0.58750 0.41250       TRUE
#> 6     2    6 -0.56250 0.43750       TRUE
#> 7     2    7 -0.58750 0.41250       TRUE

8 Comparison: Linear vs. Nonlinear DiD

A critical question: does it matter? When baseline probabilities are far from 0 and 1, the answer is yes — the linear DiD can be severely biased.

# Linear comparison (for illustration)
res_linear <- nonlinear_attgt(
  data          = dat,
  yname         = "y",
  tname         = "period",
  idname        = "id",
  gname         = "g",
  outcome_model = "linear",
  estimand      = "att"
)

agg_lin  <- nonlinear_aggte(res_linear, type = "dynamic")
agg_logit <- nonlinear_aggte(res, type = "dynamic")

# The two produce different estimates when baseline rates are moderate
cat("Linear overall ATT:",  round(agg_lin$overall_att, 4), "\n")
cat("Logit overall ATT:",   round(agg_logit$overall_att, 4), "\n")
cat("True ATT (avg):    ",  round(mean(c(0.15, 0.25, 0.20)), 4), "\n")

9 Methodological Details

9.1 Identifying Assumption

For the logit-scale ATT(g,t) estimator, the key identifying assumption is:

Parallel Trends on the Logit Scale: For all \(t \geq 2\) and all treated cohorts \(g\):

\[[\text{logit}(E[Y_{it}(0)]) - \text{logit}(E[Y_{it-1}(0)])] \cdot \mathbf{1}_{G_i = g}\] \[= [\text{logit}(E[Y_{it}(0)]) - \text{logit}(E[Y_{it-1}(0)])] \cdot \mathbf{1}_{G_i = \infty}\]

This is a stronger assumption than CS2021’s linear parallel trends when treatment effects are heterogeneous (it restricts both the means and the full distribution).

9.2 DR Estimator

The doubly-robust estimator for binary outcomes uses the influence function:

\[\hat{\psi}^{DR}_{g,t} = \frac{1}{n} \sum_i \left[ \frac{D_i}{\hat{p}} (\Delta Y_i - \hat{\mu}(X_i)) - \frac{(1-D_i)\hat{\pi}(X_i)}{1-\hat{\pi}(X_i)} \cdot \frac{1}{\hat{p}} (\Delta Y_i - \hat{\mu}(X_i)) \right]\]

where \(\Delta Y_i = Y_{it} - Y_{it-1}\), \(\hat{\pi}\) is the propensity score, and \(\hat{\mu}\) is the outcome regression on controls.

Staggered DiD with Nonlinear Outcomes: Methods and Usage

NonlinearDiD Package

2026-05-03