---
title: "Rotating Panels and PoolSurvey"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Rotating Panels and PoolSurvey}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  warning = FALSE,
  message = FALSE
)
```

## Introduction

Many national household surveys use **rotating panel designs**, where a sample
of respondents is interviewed in an initial wave (*implantation*) and then
followed up over successive periods. Uruguay's ECH, for example, interviews
each household once and then conducts monthly follow-ups for the rest of the
year.

metasurvey provides two classes for this type of design:

- `RotativePanelSurvey` -- a panel with an implantation survey and a list of
  follow-up surveys
- `PoolSurvey` -- a collection of surveys grouped together for combined
  estimation across periods

## Creating a RotativePanelSurvey

A `RotativePanelSurvey` requires an implantation `Survey` and one or more
follow-up `Survey` objects.

```{r create-panel}
library(metasurvey)
library(data.table)
set_use_copy(TRUE)

set.seed(42)
n <- 100

make_survey <- function(edition) {
  dt <- data.table(
    id = 1:n,
    age = sample(18:80, n, replace = TRUE),
    income = round(runif(n, 5000, 80000)),
    employed = sample(0:1, n, replace = TRUE),
    w = round(runif(n, 0.5, 3.0), 4)
  )
  Survey$new(
    data = dt,
    edition = edition,
    type = "ech",
    psu = NULL,
    engine = "data.table",
    weight = add_weight(annual = "w")
  )
}

# Implantation: 2023 wave 1
impl <- make_survey("2023")

# Follow-ups: waves 2 through 4
fu_2 <- make_survey("2023")
fu_3 <- make_survey("2023")
fu_4 <- make_survey("2023")

panel <- RotativePanelSurvey$new(
  implantation = impl,
  follow_up = list(fu_2, fu_3, fu_4),
  type = "ech",
  default_engine = "data.table",
  steps = list(),
  recipes = list(),
  workflows = list(),
  design = NULL
)
```

## Accessing panel components

Use `get_implantation()` and `get_follow_up()` to retrieve the individual
surveys:

```{r access-panel}
# Implantation survey
imp <- get_implantation(panel)
class(imp)
head(get_data(imp), 3)
```

```{r access-followup}
# Follow-up surveys
follow_ups <- get_follow_up(panel)
cat("Number of follow-ups:", length(follow_ups), "\n")
```

## Applying steps to panel components

Apply transformations to individual panel components. The same step functions
work on both the implantation and follow-up surveys:

```{r panel-steps}
# Transform the implantation survey
panel$implantation <- step_compute(panel$implantation,
  income_k = income / 1000,
  comment = "Income in thousands"
)

# Apply the same step to each follow-up
panel$follow_up <- lapply(panel$follow_up, function(svy) {
  step_compute(svy, income_k = income / 1000, comment = "Income in thousands")
})
```

## Estimation on panel components

Use `workflow()` on individual panel components to perform cross-sectional or
time-series analysis.

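
The estimation calls in this section pass `survey::svymean()` to `workflow()`, and the results are later read through columns such as `value`, `se`, and `cv`. As a point of reference, the following is a minimal sketch of the same statistic computed with the survey package alone. It does not show how `workflow()` is implemented. The object names (`toy`, `des`, `est`, `summary_row`) are hypothetical, and the design is an assumption: no clustering and a single weight column `w`, mirroring the simulated surveys above.

```r
library(survey)

set.seed(42)
n <- 100

# Toy data shaped like the simulated surveys (hypothetical, for illustration)
toy <- data.frame(
  income = round(runif(n, 5000, 80000)),
  w = round(runif(n, 0.5, 3.0), 4)
)

# Assumed design: no clustering (ids = ~1), sampling weights in column `w`
des <- svydesign(ids = ~1, weights = ~w, data = toy)

# Weighted mean of income with its design-based standard error
est <- svymean(~income, des, na.rm = TRUE)

# Point estimate, standard error, and coefficient of variation (se / value)
summary_row <- data.frame(
  value = as.numeric(coef(est)),
  se = as.numeric(SE(est)),
  cv = as.numeric(cv(est))
)
summary_row
```

Under this design the point estimate equals `weighted.mean(toy$income, toy$w)`, which gives a quick sanity check on any weighted mean.
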
### Cross-sectional analysis (Implantation)

```{r workflow-impl}
result_impl <- workflow(
  list(panel$implantation),
  survey::svymean(~income, na.rm = TRUE),
  estimation_type = "annual"
)

result_impl
```

### Comparison across follow-ups

```{r workflow-followup}
results <- rbindlist(lapply(seq_along(panel$follow_up), function(i) {
  r <- workflow(
    list(panel$follow_up[[i]]),
    survey::svymean(~income, na.rm = TRUE),
    estimation_type = "annual"
  )
  r$period <- panel$follow_up[[i]]$edition
  r
}))

results[, .(period, stat, value, se, cv)]
```

## PoolSurvey: Combined estimation

A `PoolSurvey` groups multiple surveys for combined estimation. This is useful
when you want to aggregate monthly data into quarterly or annual estimates, or
when combining surveys reduces sampling variability.

The constructor takes a nested list:
`list(estimation_type = list(group = list(surveys)))`.

```{r pool-create}
s1 <- make_survey("2023")
s2 <- make_survey("2023")
s3 <- make_survey("2023")

pool <- PoolSurvey$new(
  list(annual = list("q1" = list(s1, s2, s3)))
)

class(pool)
```

### Pooled estimation

```{r pool-workflow}
pool_result <- workflow(
  pool,
  survey::svymean(~income, na.rm = TRUE),
  estimation_type = "annual"
)

pool_result
```

### Multiple groups

Surveys can be organized into multiple groups:

```{r pool-groups}
s4 <- make_survey("2023")
s5 <- make_survey("2023")
s6 <- make_survey("2023")

pool_semester <- PoolSurvey$new(
  list(annual = list(
    "q1" = list(s1, s2, s3),
    "q2" = list(s4, s5, s6)
  ))
)

result_semester <- workflow(
  pool_semester,
  survey::svymean(~income, na.rm = TRUE),
  estimation_type = "annual"
)

result_semester
```

## Extracting surveys from panels

Use `extract_surveys()` to select specific periods from a
`RotativePanelSurvey`:

```{r extract}
# Extract specific follow-ups by index
first_two <- extract_surveys(panel, index = 1:2)
class(first_two)
```

```r
# Extract by month (requires Date-format editions)
march_data <- extract_surveys(panel, monthly = 3)
```

## Time patterns

metasurvey
provides utilities for working with survey edition dates:

```{r time-patterns}
# Extract periodicity from edition strings
extract_time_pattern("2023")
extract_time_pattern("2023-06")
```

```{r validate-time}
# Validate edition format
validate_time_pattern(svy_type = "ech", svy_edition = "2023")
```

```{r group-dates}
# Group dates by period
dates <- as.Date(c(
  "2023-01-15", "2023-03-20", "2023-06-10",
  "2023-09-05", "2023-11-30"
))

group_dates(dates, type = "quarterly")
group_dates(dates, type = "biannual")
```

## Loading panel data from files

In practice, panel data is loaded from files using `load_panel_survey()`:

```r
panel <- load_panel_survey(
  path_implantation = "data/ECH_implantacion_2023.csv",
  path_follow_up = "data/seguimiento/",
  svy_type = "ech",
  svy_weight_implantation = add_weight(annual = "pesoano"),
  svy_weight_follow_up = add_weight(monthly = "pesomes")
)

# Access components
imp <- get_implantation(panel)
fups <- get_follow_up(panel)
```

## Bootstrap replicate weights

For surveys that provide bootstrap replicate weights (such as the ECH), use
`add_replicate()` inside `add_weight()` to configure robust variance
estimation:

```r
panel <- load_panel_survey(
  path_implantation = "data/ECH_implantacion_2023.csv",
  path_follow_up = "data/seguimiento/",
  svy_type = "ech",
  svy_weight_implantation = add_weight(
    annual = add_replicate(
      weight = "pesoano",
      replicate_pattern = "wr\\d+",
      replicate_path = "data/pesos_replicados_anual.csv",
      replicate_id = c("numero" = "numero"),
      replicate_type = "bootstrap"
    )
  ),
  svy_weight_follow_up = add_weight(monthly = "pesomes")
)
```

When replicate weights are configured, `workflow()` automatically uses
`survey::svrepdesign()` for variance estimation instead of the standard Taylor
linearization approach.

## Best practices

1. **Set the periodicity** on each component survey before building the panel
2. **Apply transformations uniformly** -- ensure that the same steps are
   applied to both implantation and follow-up surveys to guarantee
   comparability
3. **Use PoolSurvey** when combining surveys to reduce variance or for
   quarterly/annual aggregations
4. **Validate results** -- compare pooled estimates with direct estimates to
   verify consistency
5. **Use bootstrap replicate weights** when available for more robust variance
   estimation

## Next steps

- **[Survey designs and validation](complex-designs.html)** -- Stratification,
  clustering, and pipeline validation
- **[ECH case study](ech-case-study.html)** -- Complete labor market analysis
  with the ECH rotating panel
- **[Estimation workflows](workflows-and-estimation.html)** -- `workflow()` and
  `RecipeWorkflow`