After transforming survey data with steps and recipes, the next task is estimation: computing means, totals, ratios, and their standard errors while accounting for the complex survey design.
The workflow() function wraps the estimators from the
survey package (svymean,
svytotal, svyratio, svyby) and
returns tidy results as a data.table that include:
We use the Academic Performance Index (API) dataset from the
survey package, which contains real data from stratified
schools in California.
We estimate the population mean of the API score in the year 2000:
We estimate total enrollment across all schools:
You can pass multiple estimation calls to workflow() to
compute them in a single step:
results <- workflow(
list(svy),
survey::svymean(~api00, na.rm = TRUE),
survey::svytotal(~enroll, na.rm = TRUE),
estimation_type = "annual"
)
results
#> stat value se cv confint_lower
#> <char> <num> <num> <num> <num>
#> 1: survey::svymean: api00 662.2874 9.585429e+00 0.01447322 643.5003
#> 2: survey::svytotal: enroll 3687177.5324 1.645323e+05 0.04462283 3364700.1537
#> confint_upper
#> <num>
#> 1: 681.0745
#> 2: 4009654.9112We use survey::svyby() to compute estimates by
subpopulations (domains):
# Mean API score by school type
api_by_type <- workflow(
list(svy),
survey::svyby(~api00, ~stype, survey::svymean, na.rm = TRUE),
estimation_type = "annual"
)
api_by_type
#> stat value se cv confint_lower
#> <char> <num> <num> <num> <num>
#> 1: survey::svyby: api00 [stype=E] 674.43 12.49343 0.01852443 649.9433
#> 2: survey::svyby: api00 [stype=H] 625.82 15.34078 0.02451309 595.7526
#> 3: survey::svyby: api00 [stype=M] 636.60 16.50239 0.02592270 604.2559
#> confint_upper stype
#> <num> <fctr>
#> 1: 698.9167 E
#> 2: 655.8874 H
#> 3: 668.9441 M# Mean enrollment by awards status
enroll_by_award <- workflow(
list(svy),
survey::svyby(~enroll, ~awards, survey::svymean, na.rm = TRUE),
estimation_type = "annual"
)
enroll_by_award
#> stat value se cv
#> <char> <num> <num> <num>
#> 1: survey::svyby: enroll [awards=No] 727.5958 57.54094 0.07908366
#> 2: survey::svyby: enroll [awards=Yes] 520.5114 25.51854 0.04902590
#> confint_lower confint_upper awards
#> <num> <num> <fctr>
#> 1: 614.8177 840.3740 No
#> 2: 470.4960 570.5269 YesThe coefficient of variation (CV) measures the
precision of an estimate. You can use evaluate_cv() to
classify quality following standard guidelines:
| CV Range | Quality | Recommendation |
|---|---|---|
| < 5% | Excellent | Use without restrictions |
| 5-10% | Very good | Use with confidence |
| 10-15% | Good | Use for most purposes |
| 15-25% | Acceptable | Use with caution |
| 25-35% | Poor | Only for general trends |
| >= 35% | Unreliable | Do not publish |
A RecipeWorkflow bundles estimation calls with metadata,
making the analysis reproducible and shareable. It records:
wf <- RecipeWorkflow$new(
name = "API Score Analysis 2000",
description = "Mean API score estimation by school type",
user = "Research Team",
survey_type = "api",
edition = "2000",
estimation_type = "annual",
recipe_ids = character(0),
calls = list(
"survey::svymean(~api00, na.rm = TRUE)",
"survey::svyby(~api00, ~stype, survey::svymean, na.rm = TRUE)"
)
)
wf
#>
#> ── Workflow: API Score Analysis 2000 ──
#> Author: Research Team
#> Survey: api / 2000
#> Version: 1.0.0
#> Description: Mean API score estimation by school type
#> Certification: community
#> Estimation types: annual
#>
#> ── Calls (2) ──
#> 1. survey::svymean(~api00, na.rm = TRUE)
#> 2. survey::svyby(~api00, ~stype, survey::svymean, na.rm = TRUE)We publish the workflow so that others can discover and reuse it:
# Configure a local backend
wf_path <- tempfile(fileext = ".json")
set_workflow_backend("local", path = wf_path)
# Publish
publish_workflow(wf)
# Discover workflows
all_wf <- list_workflows()
length(all_wf)
#> [1] 1
# Search by text
found <- search_workflows("income")
length(found)
#> [1] 0
# Filter by survey type
ech_wf <- filter_workflows(survey_type = "ech")
length(ech_wf)
#> [1] 0If you have a recipe and want to know which estimates have been
published for it, you can use
find_workflows_for_recipe():
# Create a workflow that references a recipe
wf2 <- RecipeWorkflow$new(
name = "Labor Market Estimates",
user = "Team",
survey_type = "ech",
edition = "2023",
estimation_type = "annual",
recipe_ids = c("labor_force_recipe_001"),
calls = list("survey::svymean(~employed, na.rm = TRUE)")
)
publish_workflow(wf2)
# Find all workflows that use this recipe
related <- find_workflows_for_recipe("labor_force_recipe_001")
length(related)
#> [1] 1
if (length(related) > 0) cat("Found:", related[[1]]$name, "\n")
#> Found: Labor Market EstimatesBelow is a complete pipeline from raw data to publishable estimation, using the API dataset:
# 1. Create survey from real data
dt_full <- data.table(apistrat)
svy_full <- Survey$new(
data = dt_full,
edition = "2000",
type = "api",
psu = NULL,
engine = "data.table",
weight = add_weight(annual = "pw")
)
# 2. Apply steps: compute derived variables
svy_full <- step_compute(svy_full,
api_growth = api00 - api99,
high_growth = ifelse(api00 - api99 > 50, 1L, 0L),
comment = "API score growth indicators"
)
svy_full <- step_recode(svy_full, school_level,
stype == "E" ~ "Elementary",
stype == "M" ~ "Middle",
stype == "H" ~ "High",
.default = "Other",
comment = "School level classification"
)
# 3. Estimate means
estimates <- workflow(
list(svy_full),
survey::svymean(~api_growth, na.rm = TRUE),
survey::svymean(~high_growth, na.rm = TRUE),
estimation_type = "annual"
)
estimates
#> stat value se cv confint_lower
#> <char> <num> <num> <num> <num>
#> 1: survey::svymean: api_growth 32.8925184 2.1583789 0.06561914 28.6621734
#> 2: survey::svymean: high_growth 0.2938489 0.0363651 0.12375443 0.2225746
#> confint_upper
#> <num>
#> 1: 37.1228633
#> 2: 0.3651232# 4. Domain estimation (by school type)
by_school <- workflow(
list(svy_full),
survey::svyby(~api00, ~stype, survey::svymean, na.rm = TRUE),
estimation_type = "annual"
)
by_school
#> stat value se cv confint_lower
#> <char> <num> <num> <num> <num>
#> 1: survey::svyby: api00 [stype=E] 674.43 12.49343 0.01852443 649.9433
#> 2: survey::svyby: api00 [stype=H] 625.82 15.34078 0.02451309 595.7526
#> 3: survey::svyby: api00 [stype=M] 636.60 16.50239 0.02592270 604.2559
#> confint_upper stype
#> <num> <fctr>
#> 1: 698.9167 E
#> 2: 655.8874 H
#> 3: 668.9441 MEvery Survey object records provenance
metadata: where the data came from, which steps were applied, how many
rows survived each step, and which versions of R and metasurvey were
used. This makes it possible to trace any estimate back to the raw
data.
# Provenance is populated automatically after bake_steps()
prov <- provenance(svy_full)
prov
#> ── Data Provenance ─────────────────────────────────────────────────────────────
#> Loaded: 2026-02-20T14:09:33
#> Initial rows: 200
#>
#> Environment:
#> metasurvey: 0.0.21
#> R: 4.4.3
#> survey: 4.4.8Provenance is also attached to workflow() results, so
you can always inspect the full lineage of an estimate:
prov_wf <- provenance(estimates)
cat("metasurvey version:", prov_wf$environment$metasurvey_version, "\n")
#> metasurvey version: 0.0.21
cat("Steps applied:", length(prov_wf$steps), "\n")
#> Steps applied: 0For audit trails, export provenance to JSON:
To compare two runs (e.g., different editions), use
provenance_diff():
workflow_table() formats estimation results as
publication-ready tables using the gt package. It adds
confidence intervals, CV quality classification with color coding, and
provenance-based source notes.
| Survey Estimation Results | ||||||
| Statistic | Estimate | SE | CI Lower | CI Upper | CV (%) | Quality |
|---|---|---|---|---|---|---|
| :svymean: api_growth | 32.89 | 2.158 | 28.66 | 37.12 | 6.6 | Very good |
| :svymean: high_growth | 0.29 | 0.036 | 0.22 | 0.37 | 12.4 | Good |
| metasurvey 0.0.21 | CI: 95% | 2026-02-20 | ||||||
You can customize the output:
# Spanish locale, hide SE, custom title
workflow_table(
estimates,
locale = "es",
show_se = FALSE,
title = "API Growth Indicators",
subtitle = "California Schools, 2000"
)| API Growth Indicators | |||||
| California Schools, 2000 | |||||
| Statistic | Estimate | CI Lower | CI Upper | CV (%) | Quality |
|---|---|---|---|---|---|
| :svymean: api_growth | 32,89 | 28,66 | 37,12 | 6,6 | Very good |
| :svymean: high_growth | 0,29 | 0,22 | 0,37 | 12,4 | Good |
| metasurvey 0.0.21 | CI: 95% | 2026-02-20 | |||||
For domain estimates, the table detects group columns automatically:
| Survey Estimation Results | |||||||
| Statistic | stype | Estimate | SE | CI Lower | CI Upper | CV (%) | Quality |
|---|---|---|---|---|---|---|---|
| :svyby: api00 | E | 674.43 | 12.493 | 649.94 | 698.92 | 1.9 | Excellent |
| :svyby: api00 | H | 625.82 | 15.341 | 595.75 | 655.89 | 2.5 | Excellent |
| :svyby: api00 | M | 636.60 | 16.502 | 604.26 | 668.94 | 2.6 | Excellent |
| metasurvey 0.0.21 | CI: 95% | 2026-02-20 | |||||||
Export to any format supported by gt::gtsave():