--- title: "Power Analysis and Sample Size Determination for Agreement Studies" author: "Aaron R. Caldwell" date: "Last Updated: `r Sys.Date()`" bibliography: refs.bib link-citations: true output: rmarkdown::html_vignette: toc: true toc_depth: 1 vignette: > %\VignetteIndexEntry{Power Analysis and Sample Size Determination for Agreement Studies} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 5 ) ``` # Introduction This vignette provides comprehensive guidance on power analysis and sample size determination for method comparison and agreement studies using the `SimplyAgree` package. ## Available Methods `SimplyAgree` implements four approaches to power/sample size calculations: 1. **`power_agreement_exact()`** - Exact agreement test [@shieh2019] 2. **`blandPowerCurve()`** - Bland-Altman power curves [@lu2016] 3. **`agree_expected_half()`** - Expected half-width criterion [@JanShieh2018] 4. **`agree_assurance()`** - Assurance probability criterion [@JanShieh2018] ```{r load} library(SimplyAgree) ``` # Understanding the Approaches ## Hypothesis Testing vs. Estimation The methods divide into two categories: **Hypothesis Testing** (binary decision): - `power_agreement_exact()` - Tests if central proportion, essentially tolerance intevals, are within the maximal allowable difference - `blandPowerCurve()` - Tests if confidence intervals of limits of agreement fall within the maximal allowable difference **Estimation** (quantifying precision): - `agree_expected_half()` - Controls average CI half-width of limits of agreement - `agree_assurance()` - Controls probability of achieving target CI half-width of limits of agreement # Method 1: Exact Agreement Test ## Overview Tests whether the central P* proportion of paired differences falls within the maximal allowable difference [-delta, delta]. **Hypotheses:** - H0: Methods disagree (central portion extends beyond bounds) - H1: Methods agree (central portion within bounds) ## Usage ```{r exact_usage, eval=FALSE} power_agreement_exact( n = NULL, # Sample size delta = NULL, # Tolerance bound mu = 0, # Mean of differences sigma = NULL, # SD of differences p0_star = 0.95, # Central proportion (tolerance coverage) power = NULL, # Target power alpha = 0.05 # Significance level ) ``` Specify **exactly three** of: n, delta, power, sigma. ## Example: Sample Size Calculation ```{r exact_ex1, eval=TRUE} # Blood pressure device comparison result <- power_agreement_exact( delta = 7, # +/-7 mmHg tolerance mu = 0.5, # Expected bias sigma = 2.5, # Expected SD p0_star = 0.95, # 95% must be within bounds power = 0.80, # 80% power alpha = 0.05 ) print(result) ``` # Method 2: Bland-Altman Power Curves ## Overview Calculates power curves using approximate Bland-Altman confidence intervals using the method of @lu2016 (which is approximate). Useful for exploring power across sample sizes. 
# Method 2: Bland-Altman Power Curves

## Overview

Calculates power curves using the approximate confidence interval method for Bland-Altman limits of agreement described by @lu2016. Useful for exploring power across a range of sample sizes.

## Usage

```{r bland_usage, eval=FALSE}
blandPowerCurve(
  samplesizes = seq(10, 100, 1), # Range of sample sizes
  mu = 0,                        # Mean difference
  SD,                            # SD of differences
  delta,                         # Tolerance bound(s)
  conf.level = 0.95,             # CI confidence level
  agree.level = 0.95             # LOA agreement level
)
```

## Example: Power Curve

```{r bland_ex1, eval=TRUE}
# Generate power curve
pc <- blandPowerCurve(
  samplesizes = seq(10, 200, 1),
  mu = 0,
  SD = 3.3,
  delta = 8,
  conf.level = 0.95,
  agree.level = 0.95
)

# Plot
plot(pc, type = 1)

# Find n for 80% power
find_n(pc, power = 0.8)
```

# Method 3: Expected Half-Width

## Overview

Determines the sample size needed to ensure the **average** CI half-width is <= delta across hypothetical repeated studies.

## Usage

```{r expected_usage, eval=FALSE}
agree_expected_half(
  conf.level = 0.95, # CI confidence level
  delta = NULL,      # Target expected half-width
  pstar = 0.95,      # Central proportion
  sigma = 1,         # SD of differences
  n = NULL           # Sample size
)
```

Specify **either** n OR delta.

## Example: Sample Size for Precision

```{r expected_ex1, eval=TRUE}
# Want E[H] <= 2.5*sigma
result <- agree_expected_half(
  conf.level = 0.95,
  delta = 2.5, # As multiple of sigma
  pstar = 0.95,
  sigma = 1    # Standardized
)
print(result)
```

# Method 4: Assurance Probability

## Overview

Determines the sample size needed to ensure that the **probability** the CI half-width is <= omega meets a specified assurance level (e.g., 0.90). This is a stronger guarantee than the expected half-width criterion because it ensures a specific probability of achieving the target precision.

## Usage

```{r assurance_usage, eval=FALSE}
agree_assurance(
  conf.level = 0.95, # CI confidence level
  assurance = 0.90,  # Target assurance probability
  omega = NULL,      # Target half-width bound
  pstar = 0.95,      # Central proportion
  sigma = 1,         # SD of differences
  n = NULL           # Sample size
)
```

Specify **either** n OR omega.

## Example: Sample Size with Guarantee

```{r assurance_ex1}
# Want 90% probability that H <= 2.5*sigma
result <- agree_assurance(
  conf.level = 0.95,
  assurance = 0.90, # 90% probability
  omega = 2.5,      # Target bound
  pstar = 0.95,
  sigma = 1
)
print(result)
```

# Decision Guide for Choosing a Method

```
Research Goal?
|
|- Hypothesis Testing
|    |- Need exact Type I error control -> Exact Agreement Test (power_agreement_exact)
|    \- Explore power across sample sizes -> Power Curves (blandPowerCurve)
|
\- Precision Estimation
     |- Average precision sufficient -> Expected Half-Width (agree_expected_half)
     \- Need probabilistic guarantee -> Assurance Probability (agree_assurance)
```

# Handling Clustered/Nested Data

## The Problem

Many studies have **clustered** data with multiple measurements per subject or other natural groupings (e.g., repeated measures, multi-center studies). Note that the advice here applies only to clustering, not to situations where replicate measures are taken within a single measurement occasion (e.g., multiple measures at the same time point, where any variation would represent only measurement error).

Standard formulas assume independence^[The implications of ignoring clustering are discussed by @bland2003cluster, among many others.]. Ignoring clustering can lead to studies that lack precision. To my knowledge, there are no well-developed methods for accounting for clustering in sample size calculations for agreement studies, so we use a common approximation from survey sampling and multilevel modeling: the **design effect**.
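Before turning to that approximation, a small simulation sketch in base R (not part of `SimplyAgree`; all parameter values here are arbitrary) illustrates the problem: with the same total number of observations, clustered differences yield a noisier estimate of the SD of differences, and therefore less precise limits of agreement.

```{r cluster_sim, eval=TRUE}
set.seed(42)

# Sampling variability of the estimated SD of differences.
# `icc` splits a fixed total variance into between- and within-participant parts.
sim_sd <- function(K, m, icc, sigma = 2.5, nsim = 2000) {
  replicate(nsim, {
    between <- rnorm(K, 0, sqrt(icc) * sigma)         # participant-level shifts
    within  <- rnorm(K * m, 0, sqrt(1 - icc) * sigma) # occasion-level noise
    sd(rep(between, each = m) + within)
  })
}

# Same total of 60 observations:
# 60 independent pairs vs. 20 participants measured 3 times each (ICC = 0.5)
sd_indep <- sim_sd(K = 60, m = 1, icc = 0)
sd_clust <- sim_sd(K = 20, m = 3, icc = 0.5)

# Clustering inflates the variability of the SD estimate
c(independent = sd(sd_indep), clustered = sd(sd_clust))
```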
## My Best Approximation: Design Effect

The design effect (DEFF) quantifies the loss of efficiency due to clustering:

$$\text{DEFF} = 1 + (m - 1) \times \text{ICC}$$

where:

- m = observations per cluster
- ICC = intraclass correlation coefficient

**Effect on sample size:**

$$n_{\text{total}} = n_{\text{independent}} \times \text{DEFF}$$

where $n_{\text{total}}$ is the total number of clustered observations required to match the information in $n_{\text{independent}}$ independent pairs. (Equivalently, the effective sample size of a clustered design is $n_{\text{ESS}} = n_{\text{total}} / \text{DEFF}$.)

## Understanding ICC

The ICC is the proportion of total variance that lies between clusters:

$$\text{ICC} = \frac{\sigma^2_{\text{between}}}{\sigma^2_{\text{between}} + \sigma^2_{\text{within}}}$$

## Application Workflow

1. Calculate the independent sample size (using the power function)
2. Determine m (observations per cluster)
3. Estimate the ICC (from pilot data, literature, or theory; a sketch of estimating it from pilot data appears at the end of this section)
4. Calculate DEFF = 1 + (m - 1) * ICC
5. Inflate: n_total = n_indep * DEFF
6. Calculate clusters: K = ceiling(n_total / m)

## Example: Repeated Measures Design

```{r cluster_ex1, eval=TRUE}
# Step 1: Independent sample size
result <- power_agreement_exact(
  delta = 7,
  mu = 0.5,
  sigma = 2.5,
  p0_star = 0.95,
  power = 0.80,
  alpha = 0.05
)
n_indep <- result$n
cat("Independent pairs needed:", n_indep, "\n")

# Step 2: Apply design effect
m <- 3      # 3 measurements per participant
ICC <- 0.15 # from pilot or literature
DEFF <- 1 + (m - 1) * ICC
cat("Design effect:", round(DEFF, 3), "\n")

# Step 3: Calculate participants needed
n_total <- ceiling(n_indep * DEFF)
K <- ceiling(n_total / m)
cat("Total observations:", n_total, "\n")
cat("Participants needed:", K, "\n")
```

**Result**: Instead of 34 independent pairs, the study needs 15 participants measured 3 times each (45 total observations).

## Impact of ICC

```{r cluster_ex2, eval=TRUE}
# Compare different ICC values
n_indep <- 50
m <- 4
ICC_values <- c(0, 0.05, 0.10, 0.15, 0.20)

for (ICC in ICC_values) {
  DEFF <- 1 + (m - 1) * ICC
  K <- ceiling(ceiling(n_indep * DEFF) / m)
  cat(sprintf("ICC = %.2f: Need %d participants\n", ICC, K))
}
```

## When Design Effect Works Well

**Good situations:**

- Balanced designs (equal cluster sizes)
- Moderate ICC (0.01 - 0.30)
- Sufficient clusters (K >= 10)
- Simple two-level hierarchy

**Problematic:**

- Highly unbalanced clusters
- Very high ICC (> 0.4)
- Small number of clusters (K < 10)
- Complex correlation structures
- Multiple levels of nesting

For complex designs, consider simulation-based power analysis and consult a statistician.

## Complete Example with Clustering

```{r cluster_complete, eval=TRUE}
# Study parameters
sigma <- 3.3
delta <- 7
m <- 4 # measurements per participant
ICC <- 0.15
dropout <- 0.20

# Step 1: Independent sample size
result <- power_agreement_exact(
  delta = delta,
  mu = 0,
  sigma = sigma,
  p0_star = 0.95,
  power = 0.80,
  alpha = 0.05
)

# Step 2: Account for clustering
DEFF <- 1 + (m - 1) * ICC
n_total <- ceiling(result$n * DEFF)
K_pre <- ceiling(n_total / m)

# Step 3: Account for dropout
K_final <- ceiling(K_pre / (1 - dropout))

# Summary
cat("Independent pairs:", result$n, "\n")
cat("Design effect:", round(DEFF, 3), "\n")
cat("Participants (no dropout):", K_pre, "\n")
cat("Participants to recruit:", K_final, "\n")
cat("Total measurements:", K_final * m, "\n")
```
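The workflow above treats the ICC as a known input. When pilot data are available, one common way to estimate it is with a variance-components (random-intercept) model. A minimal sketch using the `lme4` package (not a `SimplyAgree` function), assuming a hypothetical long-format data frame `pilot` with a participant identifier `id` and the paired difference `d` for each measurement occasion:

```{r icc_pilot, eval=FALSE}
library(lme4)

# Random-intercept model: between-participant vs. residual (within) variance
fit <- lmer(d ~ 1 + (1 | id), data = pilot)
vc <- as.data.frame(VarCorr(fit))

# ICC = between-participant variance / total variance
icc_hat <- vc$vcov[vc$grp == "id"] / sum(vc$vcov)

# Total SD of a single difference; usable as `sigma` in the power functions
sigma_hat <- sqrt(sum(vc$vcov))

c(ICC = icc_hat, sigma = sigma_hat)
```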
# Practical Recommendations

## Planning Checklist

- [ ] Define research question (hypothesis test vs. estimation)
- [ ] Choose appropriate power method
- [ ] If possible, obtain pilot estimates (SD, ICC if clustered)
- [ ] Calculate independent sample size
- [ ] Adjust for clustering (if applicable)
- [ ] Adjust for dropout
- [ ] Conduct sensitivity analyses
- [ ] Document all assumptions and sources
- [ ] Pre-register before data collection

## Conservative Planning

When uncertain:

- Use the **upper range** of plausible SD estimates
- Use the **upper range** of plausible ICC estimates
- Build in a **10-20% buffer** beyond the calculated sample size
- Conduct **sensitivity analyses** for key parameters (SD, ICC, etc.); a brief sketch follows below
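As one example of a sensitivity analysis, the calculation can simply be repeated over a plausible range for an uncertain input. Below is a minimal sketch reusing the blood pressure example from Method 1 (the range of SD values is arbitrary):

```{r sens_sd, eval=TRUE}
# Required sample size across a plausible range of SD assumptions
for (s in c(2.0, 2.5, 3.0, 3.5)) {
  res <- power_agreement_exact(
    delta = 7, mu = 0.5, sigma = s,
    p0_star = 0.95, power = 0.80, alpha = 0.05
  )
  cat(sprintf("SD = %.1f: n = %d\n", s, ceiling(res$n)))
}
```

# References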