A Causal Framework for Hierarchical Outcome Analysis

library(causalWins)

Description

Quantifying causal effects with complex, multivariate outcomes remains a challenge in treatment evaluation. For hierarchical outcomes, regulatory agencies such as the U.S. Food and Drug Administration recommend methods based on the Win Ratio (Pocock et al. (2012)) and Generalized Pairwise Comparisons (Buyse (2010)). These approaches typically estimate (in this vignette using the WINS package by Cui and Huang (2025)) a population-level quantity — the probability that a randomly selected treated patient fares better than a randomly selected control patient. In Even and Josse (2025), the authors studied another quantity that can be very interesting in this individual context: the probability that a given individual would have a better outcome under treatment than under control. Unfortunately, this quantity is not identifiable, because for each individual we do not observe both the treated and control outcomes. Even and Josse (2025) thus proposed a proxy for this quantity that is identifiable, and three ways to estimate it. This package implements that methodology and the three complementary estimators: (i) a nearest-neighbor matching estimator (nn_win_ratio), (ii) a distributional regression estimator (drf_win_ratio), and (iii) a semiparametric efficient estimator (eif_win_ratio).

How to install

The latest release of the package can be installed through CRAN (soon):

install.packages("causalWins")

Example 1

The goal of this example is to illustrate, in a simple setting, how to use the functions nn_win_ratio, drf_win_ratio, and eif_win_ratio. The estimators produced by these functions target the causal (non-identifiable) quantity

\[ \tau_{\text{indiv}} := \mathbb{E}\Big[w\big(Y_i(1) \big| Y_i(0)\big)\Big], \]

where \(Y_i(0)\) and \(Y_i(1)\) denote the potential outcomes for individual \(i\) under control and treatment, respectively. The function \(w(\cdot| \cdot)\) compares these outcomes to determine whether a patient benefits more from treatment or from control. This measure is not identifiable. The authors of Buyse (2010) introduce the following population quantity estimated via nearest-neighbor matching: \[ \tau_{\text{pop}} := \mathbb{E}\Big[w\big(Y_i(1)\,\big|\, Y_j(0)\big)\Big], \] where \(i\) and \(j\) are two different and independent individuals. In Even and Josse (2025) the authors studied a proxy of the individual quantity,

\[ \tau^\star := \mathbb{E}\Big[ w\big(Y^{(X_i)}(1), Y_i(0)\big) \Big], \quad \text{ where } \tau^\star(x) := \mathbb{E}\Big[ w\big(Y^{(x)}(1), Y_i(0)\big) \;\big|\; X_i = x \Big], \] and where for \(x \in X\), \(\{Y^{(x)}(0), Y^{(x)}(1)\}\) denotes an independent copy of \(\{Y_i(0), Y_i(1)\} \mid X_i = x\).

For more details on how to use them, please refer to the function documentation ?nn_win_ratio, ?drf_win_ratio and eif_win_ratio.

# ---------------------------------------------------------
    # Generate data with mixed covariates
# ---------------------------------------------------------
set.seed(123)
n <- 100
base <- data.frame(
  X1 = rnorm(n, mean = 5, sd = 2), # numeric covariate 1
  X2 = rnorm(n, mean = 10, sd = 3), # numeric covariate 2
  X3 = factor(sample(c("A", "B", "C"), n, replace = TRUE)), # factor covariate
  Y1 = rnorm(n, mean = 50, sd = 10), # numeric outcome 1
  Y2 = rnorm(n, mean = 100, sd = 20), # numeric outcome 2
  arm = sample(c(0, 1), n, replace = TRUE) # binary treatment arm
)
# ---------------------------------------------------------
    # Compute win ratio with nearest neighbor pairing
# ---------------------------------------------------------
res_nn <- nn_win_ratio(
  base = base,
  treatment_name = "arm",
  outcome_names = c("Y1", "Y2")
)
# ---------------------------------------------------------
    # Compute win ratio with Distributional Random Forests
# ---------------------------------------------------------
res_drf <- drf_win_ratio(
  base = base,
  treatment_name = "arm",
  outcome_names = c("Y1", "Y2")
)
# ---------------------------------------------------------
    # Compute win ratio with Efficient Influence Functions
# ---------------------------------------------------------
res_eif <- eif_win_ratio(
  base = base,
  treatment_name = "arm",
  outcome_names = c("Y1", "Y2")
)

Example 2

Drawing on Example 2 from Even and Josse (2025), the following simulation illustrates the potential discrepancy between the historical Win Ratio and the individual-level estimator proposed in that work. The historical Win Ratio estimated here using the WINS package (Cui and Huang (2025)), which is based on complete pairing, differs from the estimates obtained via the individual-level approach and the causalWins package, which employs nearest-neighbor (NN) matching.

library(WINS)

# In this example, there are two subpopulations: A and B. The potential outcomes are

y_0a <- 4 # potential outcome of individuals of subpopulation A without treatment
y_1b <- 3 # potential outcome of individuals of subpopulation B under treatment
y_0b <- 2 # potential outcome of individuals of subpopulation B without treatment
y_1a <- 1 # potential outcome of individuals of subpopulation A under treatment

alpha <- 1 / 3 # proportion of women

n_rounds <- 100
n <- 300*1:5  # number of individuals 300, 600, 900, 1200, 1500

rct_prob <- .5 # probability of assignment to treatment/control group

results_nn <- data.frame(row_id = 1:n_rounds)
results_comp <- data.frame(row_id = 1:n_rounds)

for (n_individuals in n) {
  current_result_nn <- c()
  current_result_comp <- c()

  for (round in 1:n_rounds) {
    set.seed(123 + n_individuals + round)

    # ---------------------------------------------------
    # Generate Data
    # ---------------------------------------------------

    X <- rbinom(n_individuals, 1, alpha) # Sample element in A with prob `alpha`
    treatment <- rbinom(n_individuals, 1, rct_prob) # Assign to treatment with prob `rct_prob`
    Y <- X * treatment * y_1a + X * (1 - treatment) * y_0a +
      (1 - X) * treatment * y_1b + (1 - X) * (1 - treatment) * y_0b # Outcomes
    base <- data.frame(X = X, Y = Y, arm = treatment)

    # ---------------------------------------------------
    # Compute Win Proportion for Nearest Neighbor Pairing
    # ---------------------------------------------------
    res_nn <- nn_win_ratio(
      base = base,
      treatment_name = "arm",
      outcome_names = "Y"
    )
    res_nn <- res_nn$win_proportion$value
    current_result_nn <- c(current_result_nn, res_nn)

    # ---------------------------------------------------
    # Compute Win Proportion for Complete Pairing
    # ---------------------------------------------------
    data <- base

    colnames(data)[colnames(data) == "Y"] <- "Y_1" # Format for use with the WINS Package
    data$id <- 1:n_individuals # Format for use with the WINS Package
    res_comp <- WINS::win.stat(
      data = data,
      ep_type = "continuous",
      arm.name = c(1, 0),
      priority = c(1),
      summary.print = FALSE
    )
    res_comp <- res_comp$Win_prop[[2]]
    current_result_comp <- c(current_result_comp, res_comp)
  }
  results_nn[[as.character(n_individuals)]] <- current_result_nn
  results_comp[[as.character(n_individuals)]] <- current_result_comp
}

Boxplots

We now present boxplots of the simulated win proportions calculated using the causalWins and WINS packages. The results indicate that, under substantial heterogeneity, the two methods are not equivalent and can result in different treatment recommendations.

results_nn$row_id <- NULL
results_comp$row_id <- NULL

# # Interleave columns for side-by-side plotting
combined <- unlist(Map(list, results_nn, results_comp), recursive = FALSE)

# Set names for x-axis
names(combined) <- rep(names(results_nn), each = 2)

# Plot
boxplot(combined,
  col = rep(c("skyblue", "orange"), ncol(results_nn)),
  las = 2,
  main = "Comparison of the win proportions computed \nwith Nearest Neighbors and Complete Pairings",
  ylab = "Win proportion",
  xlab = "Number of patients (n)"
)

legend("bottomleft",
  legend = c("NN", "Complete"),
  fill = c("skyblue", "orange")
)

abline(h = 0.5, lty = 2, col = "red")

Example 3

In contrast with the previous example, the following simulation demonstrates that in an RCT with a more homogeneous population, the complete pairing and nearest neighbor approaches yield broadly similar results.

library(MASS)
library(WINS)

n_rounds <- 100
n <- 300* 1:5 # number of individuals 300, 600, 900, 1200, 1500

mu <- c(0, 0) # Covariates mean
sigma <- matrix(c(1, 0.6, 0.6, 1), 2, 2) # Covariates covariance

tau_cont <- 2 # ATE of continuous outcome
tau_bin <- 0.8 # ATE on logit scale of binary outcome

rct_prob <- .5 # probability of assignment to treatment/control group

results_nn <- data.frame(row_id = 1:n_rounds)
results_comp <- data.frame(row_id = 1:n_rounds)

for (n_individuals in n) {
  current_result_nn <- c()
  current_result_comp <- c()

  for (round in 1:n_rounds) {
    set.seed(123 + n_individuals + round)

    # ---------------------------------------------------
    # Generate Covariates and Treatment
    # ---------------------------------------------------

    covariates <- as.data.frame(mvrnorm(n_individuals, mu = mu, Sigma = sigma)) # covariates
    colnames(covariates) <- c("X1", "X2")
    treatment <- rbinom(n_individuals, 1, rct_prob) # Assign to treatment with prob `rct_prob`

    # ---------------------------------------------------
    # Generate Binary Outcomes
    # ---------------------------------------------------
    # Potential Outcomes
    lin_pred0 <- -1 + 0.3 * covariates$X1 - 0.5 * covariates$X2
    lin_pred1 <- lin_pred0 + tau_bin

    # Sample binary potential outcomes
    prob0 <- 1 / (1 + exp(-lin_pred0))
    prob1 <- 1 / (1 + exp(-lin_pred1))
    Y0_bin <- rbinom(n_individuals, size = 1, prob = prob0)
    Y1_bin <- rbinom(n_individuals, size = 1, prob = prob1)

    # Observed binary outcome
    Y_1 <- Y0_bin * (1 - treatment) + Y1_bin * treatment

    # ---------------------------------------------------
    # Generate Continuous Outcomes
    # ---------------------------------------------------
    # Potential Outcomes
    Y0_cont <- 1 + 0.5 * covariates$X1 - 0.7 * covariates$X2 + rnorm(n_individuals, 0, 1)
    Y1_cont <- Y0_cont + tau_cont

    # Observed continuous outcome
    Y_2 <- Y0_cont * (1 - treatment) + Y1_cont * treatment

    # ---------------------------------------------------
    # Compute Win Proportion for Nearest Neighbor Pairing
    # ---------------------------------------------------
    base <- data.frame(covariates, arm = treatment, Y_1, Y_2)

    res_nn <- nn_win_ratio(
      base = base,
      treatment_name = "arm",
      outcome_names = c("Y_1", "Y_2")
    )
    res_nn <- res_nn$win_proportion$value
    current_result_nn <- c(current_result_nn, res_nn)
    # ---------------------------------------------------
    # Compute Win Proportion for Complete Pairing
    # ---------------------------------------------------
    data <- base

    data$id <- 1:n_individuals # Format for use with the WINS Package
    res_comp <- WINS::win.stat(
      data = data,
      ep_type = c("binary", "continuous"),
      arm.name = c(1, 0),
      priority = c(1, 2),
      summary.print = FALSE
    )
    res_comp <- res_comp$Win_prop[[2]]
    current_result_comp <- c(current_result_comp, res_comp)
  }
  results_nn[[as.character(n_individuals)]] <- current_result_nn
  results_comp[[as.character(n_individuals)]] <- current_result_comp
}

Boxplots

Unlike the previous plot, we now observe that both approaches yield similar treatment recommendations.

results_nn$row_id <- NULL
results_comp$row_id <- NULL

# # Interleave columns for side-by-side plotting
combined <- unlist(Map(list, results_nn, results_comp), recursive = FALSE)

# Set names for x-axis
names(combined) <- rep(names(results_nn), each = 2)

# Plot
boxplot(combined,
  col = rep(c("skyblue", "orange"), ncol(results_nn)),
  las = 2,
  main = "Comparison of the win proportions computed \nwith Nearest Neighbors and Complete Pairings",
  ylab = "Win proportion",
  xlab = "Number of patients (n)"
)

legend("bottomleft",
  legend = c("NN", "Complete"),
  fill = c("skyblue", "orange")
)

abline(h = 0.5, lty = 2, col = "red")

Example 4

This simulation illustrates the application of the Efficient Influence Function (EIF) method. The data-generating process is similar to the previous example: covariates are produced in the same way. However, the treatment assignment is no longer randomized. Instead, it depends on the covariates, meaning the setting corresponds to an observational study.

library(doParallel)
library(foreach)

n_rounds <- 100
n <- 10000
mu <- c(0, 0)
sigma <- matrix(c(1, 0.6, 0.6, 1), 2, 2)
tau_cont <- 2
tau_bin <- 0.8

# Set up cluster
n_cores <- parallel::detectCores() - 1  # leave 1 core free
cl <- makeCluster(n_cores)
registerDoParallel(cl)

results_eif <- foreach(
  round = 1:n_rounds,
  .combine = c,
  .packages = c("MASS", "causalWins")
) %dopar% {
  set.seed(123 + n + round)

  # -----------------------------------------------------------
  # Generate Covariates and Treatment (observational)
  # -----------------------------------------------------------

  covariates <- mvrnorm(n, mu = mu, Sigma = sigma) # covariates
  colnames(covariates) <- c("X1", "X2")
  covariates <- as.data.frame(covariates)

  p_treatment <- 1 / (1 + exp(-(0.5 * covariates$X1 - 0.3 * covariates$X2))) # non-rct
  treatment <- rbinom(n, 1, p_treatment)

  # -----------------------------------------------------------
  # Generate Binary Outcomes
  # -----------------------------------------------------------
  # Potential Outcomes
  lin_pred0 <- -1 + 0.3 * covariates$X1 - 0.5 * covariates$X2
  lin_pred1 <- lin_pred0 + tau_bin

  # Sample binary potential outcomes
  prob0 <- 1 / (1 + exp(-lin_pred0))
  prob1 <- 1 / (1 + exp(-lin_pred1))
  Y0_bin <- rbinom(n, size = 1, prob = prob0)
  Y1_bin <- rbinom(n, size = 1, prob = prob1)

  # Observed binary outcome
  Y_1 <- Y0_bin * (1 - treatment) + Y1_bin * treatment

  # -----------------------------------------------------------
  # Generate Continuous Outcomes
  # -----------------------------------------------------------
  # Potential Outcomes
  Y0_cont <- 1 + 0.5 * covariates$X1 - 0.7 * covariates$X2 + rnorm(n, 0, 1)
  Y1_cont <- Y0_cont + tau_cont

  # Observed continuous outcome
  Y_2 <- Y0_cont * (1 - treatment) + Y1_cont * treatment

  # -----------------------------------------------------------
  # Compute Win Proportion for drf and eif
  # -----------------------------------------------------------
  base <- data.frame(covariates, arm = treatment, Y_1, Y_2)

  res_eif <- eif_win_ratio(
    base = base,
    treatment_name = "arm",
    outcome_names = c("Y_1", "Y_2"),
    propensity_model = "glm"
  )

  res_eif <- res_eif$win_proportion$value

  res_eif
}
stopCluster(cl)

Approximation of the true causal measure

In what follows, we construct an approximation of the causal measure \(\tau^{\star}\) defined in Even and Josse (2025) and which is revisited in Example 1 for completeness.

library(MASS)
mu <- c(0, 0) # Covariates mean
sigma <- matrix(c(1, 0.6, 0.6, 1), 2, 2)
tau_cont <- 2 # ATE of continuous outcome
tau_bin <- 0.8 # ATE on logit scale of binary outcome

# Default win function
win_function <- function(y, z) {
  if (all(y == z)) {
    return(0) # Tie
  }
  result <- (y > z)[which.max(y != z)] * 1
  return(result)
}

mean_given <- function(x) {
  n_samples <- 10000
  # Binary potential outcomes
  lin_pred0 <- -1 + 0.3 * x[1] - 0.5 * x[2]
  lin_pred1 <- lin_pred0 + tau_bin
  prob0 <- 1 / (1 + exp(-lin_pred0))
  prob1 <- 1 / (1 + exp(-lin_pred1))
  Y0_bin <- rbinom(n_samples, size = 1, prob = prob0)
  Y1_bin <- rbinom(n_samples, size = 1, prob = prob1)

  # Continuous Potential Outcomes
  Y0_cont <- 1 + 0.5 * x[1] - 0.7 * x[2] + rnorm(n_samples, 0, 1)
  Y1_cont <- 1 + 0.5 * x[1] - 0.7 * x[2] + rnorm(n_samples, 0, 1) + tau_cont

  wins_given_x <- mapply(function(yb, yc, zb, zc) {
    win_function(c(yb, yc), c(zb, zc))
  }, Y1_bin, Y1_cont, Y0_bin, Y0_cont)

  return(mean(wins_given_x))
}

set.seed(123)

covariates <- as.data.frame(mvrnorm(10000, mu = mu, Sigma = sigma))

tau_star <- mean(apply(covariates, 1, function(row) mean_given(row)))

Plot

The plot displays the simulation boxplots alongside the target causal measure.

boxplot(results_eif,
  main = "Win proportion with EIF",
  xlab = "Number of patients 10000",
  ylab = "Win proportion"
)
abline(h = tau_star, lty = 2, col = "red")

Computation of the Confidence Interval for Win Ratio

Let \(w\) and \(l\) denote the win and loss vectors, respectively, as computed by any of the three methods. The win ratio is defined as \(\bar{w} / \bar{l}\), where \[ \bar{w} = \frac{1}{n} \sum_{i=1}^{n} w_i \quad \text{and} \quad \bar{l} = \frac{1}{n} \sum_{i=1}^{n} l_i, \] with \(n\) denoting the number of individuals. Let \(s_w^2\), \(s_l^2\), and \(s^2_{wl}\) denote the sample variances of \(w\), \(l\) and the sample covariance of \((w,l)\), i.e., \[\begin{equation} \begin{aligned} s^2_w &= \frac{1}{n-1}\sum_i (w_i-\bar w)^2, \quad s^2_l = \frac{1}{n-1}\sum_i (l_i-\bar l)^2\quad \text{ and} \\ s^2_{wl} &= \frac{1}{n-1}\sum_i (w_i-\bar w)(l_i-\bar l) \end{aligned} \end{equation}\] To obtain the confidence interval for the win ratio we first compute \[\begin{equation} s_\text{log} = \sqrt{\frac{s^2_w}{n\bar w^2}+\frac{s^2_l}{n\bar l^2}-2\frac{s^2_{wl}}{n \bar w \bar l}} \end{equation}\] which can be seen to be an approximation of the variance of \(\log(\bar w/ \bar l)=\log(\bar w) -\log(\bar l)\) using the delta method. The confidence interval is then computed by \[\begin{equation} \Big[\exp\big(\log(\bar w/ \bar l)-1.96 s_{\text{log}})\big), \exp\big(\log(\bar w \bar l)+1.96 s_{\text{log}})\big)\Big]; \end{equation}\] which can be seen as the exponential of the confidence interval for the \(\log(\bar w/\bar l)\).

References

Buyse, Marc. 2010. “Generalized Pairwise Comparisons of Prioritized Outcomes in the Two-Sample Problem.” Statistics in Medicine 29 (30): 3245–57.
Cui, Ying, and Bo Huang. 2025. WINS: The r WINS Package. https://doi.org/10.32614/CRAN.package.WINS.
Even, Mathieu, and Julie Josse. 2025. “Rethinking the Win Ratio: A Causal Framework for Hierarchical Outcome Analysis.” https://arxiv.org/abs/2501.16933.
Pocock, Stuart J, Cono A Ariti, Timothy J Collier, and Duolao Wang. 2012. “The Win Ratio: A New Approach to the Analysis of Composite Endpoints in Clinical Trials Based on Clinical Priorities.” European Heart Journal 33 (2): 176–82.