API and Database Reference

Overview

metasurvey provides a REST API, built with plumber and backed by MongoDB, for sharing recipes, workflows, and variable metadata with the community. The API can be self-hosted (see vignette("self-hosting")) and is used by both the R client functions (api_*) and the Shiny exploration application.

After deployment, Swagger UI at <your-api-url>/__docs__/ provides an interactive endpoint explorer generated automatically by plumber. For detailed request/response schemas and MongoDB collection documentation, see the sections below.

Configuration

library(metasurvey)

# Point to your self-hosted API
configure_api("https://your-api-host.example.com")

# Or use an environment variable
Sys.setenv(METASURVEY_API_URL = "https://your-api-host.example.com")

The R client reads the URL first from configure_api(), then falls back to the METASURVEY_API_URL environment variable.
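
The lookup order can be sketched in a few lines of base R. This is an illustration only: resolve_api_url() and the metasurvey.api_url option are hypothetical names standing in for whatever configure_api() stores internally.

```r
# Hypothetical helper: mirrors the documented precedence, not the package internals.
resolve_api_url <- function(configured = getOption("metasurvey.api_url")) {
  # 1. A URL set explicitly (modelled here as an R option) wins.
  if (!is.null(configured) && nzchar(configured)) {
    return(configured)
  }
  # 2. Otherwise fall back to the environment variable.
  env_url <- Sys.getenv("METASURVEY_API_URL", unset = "")
  if (nzchar(env_url)) {
    return(env_url)
  }
  stop("No API URL configured: call configure_api() or set METASURVEY_API_URL")
}
```

Failing loudly when neither source is set is a design choice of this sketch; the real client may behave differently.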

Authentication

The API uses JWT (JSON Web Token) authentication with HMAC-SHA256 signing. Tokens expire after 24 hours; long-lived tokens (90 days) can be generated for automated scripts.

Registration

# Individual account (auto-approved)
api_register("Ana Garcia", "ana@example.com", "password123")

# Institutional member (requires admin review)
api_register(
  "Carlos Rodriguez",
  "carlos@ine.gub.uy",
  "password123",
  user_type = "institutional_member",
  institution = "INE Uruguay"
)

Account types:

| Type | Description | Approval |
|---|---|---|
| individual | Independent researcher | Automatic |
| institutional_member | Member of a recognized institution | Requires admin review |
| institution | Institutional account | Requires admin review |
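
The approval rule can be expressed as a small function. This is a sketch under the assumption that accounts awaiting review start with review_status "pending" (one of the values the users collection allows); initial_review_status() is a hypothetical name, not part of the package.

```r
# Hypothetical helper: derives the initial review status from the account type.
initial_review_status <- function(user_type) {
  user_type <- match.arg(
    user_type,
    c("individual", "institutional_member", "institution")
  )
  # Only individual accounts skip the admin review queue.
  if (user_type == "individual") "approved" else "pending"
}
```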

Login

api_login("ana@example.com", "password123")

The token is stored in the session and sent automatically with subsequent API calls. The client renews a token automatically once it is within 5 minutes of expiring.
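
The renewal window amounts to a comparison between the expiry time and the current time. The following sketch assumes the client knows the token's expiry timestamp; token_needs_refresh() is a hypothetical name and not the client's actual implementation.

```r
# Hypothetical helper: TRUE when the token should be renewed before the next call.
token_needs_refresh <- function(expires_at, now = Sys.time(), window_secs = 5 * 60) {
  remaining <- as.numeric(difftime(expires_at, now, units = "secs"))
  # Inside the 5-minute window; tokens that already expired are flagged as well.
  remaining <= window_secs
}

issued_at  <- as.POSIXct("2026-02-15 12:00:00", tz = "UTC")
expires_at <- issued_at + 24 * 60 * 60  # standard tokens last 24 hours

token_needs_refresh(expires_at, now = issued_at + 60)       # FALSE
token_needs_refresh(expires_at, now = expires_at - 4 * 60)  # TRUE
```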

Session Management

# View current user profile
api_me()

# Refresh token
api_refresh_token()

# Logout
api_logout()

Long-lived Tokens

For automated scripts and CI/CD, generate a 90-day token from the Shiny application (Profile tab) and supply it through the METASURVEY_TOKEN environment variable:

Sys.setenv(METASURVEY_TOKEN = "your-long-lived-token")

# API calls work without interactive login
recipes <- api_list_recipes(survey_type = "ech")

API Endpoints

Recipes

| Method | Endpoint | Auth | Description |
|---|---|---|---|
| GET | /recipes | No | List and search recipes |
| GET | /recipes/:id | No | Get an individual recipe |
| POST | /recipes | Yes | Publish a new recipe |
| POST | /recipes/:id/download | No | Increment download counter |

List Recipes

# All recipes
all <- api_list_recipes()

# Filter by survey type
ech <- api_list_recipes(survey_type = "ech")

# Search by text
labor <- api_list_recipes(search = "empleo")

# Filter by topic
income <- api_list_recipes(topic = "income")

# Filter by certification level
official <- api_list_recipes(certification = "official")

# Pagination
page2 <- api_list_recipes(limit = 10, offset = 10)
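
To collect every result rather than a single page, the limit/offset pair can be advanced in a loop. This is a minimal sketch, assuming the listing function returns a list with one element per recipe; fetch_all_pages() is a hypothetical helper, not part of the package.

```r
# Hypothetical helper: pages through an endpoint until a short page signals the end.
fetch_all_pages <- function(fetch_page, limit = 50) {
  results <- list()
  offset <- 0
  repeat {
    page <- fetch_page(limit = limit, offset = offset)
    results <- c(results, page)
    if (length(page) < limit) break  # last (possibly empty) page
    offset <- offset + limit
  }
  results
}

# Usage with the client (requires a reachable API):
# all_ech <- fetch_all_pages(function(limit, offset) {
#   api_list_recipes(survey_type = "ech", limit = limit, offset = offset)
# })
```

Passing the page fetcher as a function keeps the loop independent of any particular endpoint, so the same helper could be applied to api_list_workflows().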

Query parameters:

| Parameter | Type | Description |
|---|---|---|
| search | string | Regex search on recipe name |
| survey_type | string | ech, eaii, eph, eai |
| topic | string | labor_market, income, education, health, demographics, housing |
| certification | string | community, reviewed, official |
| user | string | Filter by author email |
| limit | integer | Maximum results (default 50) |
| offset | integer | Skip N results (default 0) |
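
For callers that talk to the endpoint without the R client, the parameters above map directly onto a query string. The sketch below uses only base R; build_recipes_url() and encode_value() are hypothetical helpers, and the host name is the placeholder used earlier in this document.

```r
# Hypothetical helper: URL-encodes one parameter value, avoiding scientific notation.
encode_value <- function(v) {
  txt <- if (is.numeric(v)) format(v, scientific = FALSE) else as.character(v)
  utils::URLencode(txt, reserved = TRUE)
}

# Hypothetical helper: builds the GET /recipes URL from named filters.
build_recipes_url <- function(base_url, ...) {
  params <- list(...)
  params <- params[!vapply(params, is.null, logical(1))]  # drop unset filters
  if (length(params) == 0) {
    return(paste0(base_url, "/recipes"))
  }
  pairs <- paste0(names(params), "=", vapply(params, encode_value, character(1)))
  paste0(base_url, "/recipes?", paste(pairs, collapse = "&"))
}

build_recipes_url(
  "https://your-api-host.example.com",
  survey_type = "ech", topic = "labor_market", limit = 10, offset = 10
)
```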

Get Recipe

recipe <- api_get_recipe("ech_employment_001")

Publish Recipe

api_login("ana@example.com", "password123")
api_publish_recipe(my_recipe)

The server automatically sets the user field from the JWT, initializes downloads = 0, generates an id if not provided, and assigns the community certification by default.
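
The effect of these defaults on a submitted recipe can be sketched as follows. This is not the server code: apply_recipe_defaults() is a hypothetical function, the generated id format is a placeholder, and the sketch assumes an existing certification in the payload is left untouched.

```r
# Hypothetical helper: applies the documented server-side defaults to a submitted recipe.
apply_recipe_defaults <- function(recipe, jwt_email) {
  recipe$user <- jwt_email  # author comes from the token, not from the payload
  recipe$downloads <- 0     # counter starts at zero
  id <- recipe[["id"]]
  if (is.null(id) || !nzchar(id)) {
    # Placeholder id scheme for illustration; the real format is not documented here.
    recipe$id <- paste0("recipe_", format(Sys.time(), "%Y%m%d%H%M%S"))
  }
  if (is.null(recipe[["certification"]])) {
    recipe$certification <- list(level = "community")
  }
  recipe
}
```

Taking the author from the token rather than from the request body means a client cannot publish under another user's email.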

Workflows

| Method | Endpoint | Auth | Description |
|---|---|---|---|
| GET | /workflows | No | List and search workflows |
| GET | /workflows/:id | No | Get an individual workflow |
| POST | /workflows | Yes | Publish a new workflow |
| POST | /workflows/:id/download | No | Increment download counter |

# List workflows for ECH
wf <- api_list_workflows(survey_type = "ech")

# Find workflows that use a specific recipe
wf <- api_list_workflows(recipe_id = "ech_employment_001")

# Get specific workflow
w <- api_get_workflow("wf_labor_market_001")

# Publish
api_publish_workflow(my_workflow)

ANDA Variable Metadata

Note: The ANDA integration is an unofficial implementation that parses DDI XML metadata from INE Uruguay’s public ANDA catalog. It is not endorsed by INE and may contain errors or become outdated if INE changes the catalog structure. Always verify critical variable definitions against the official codebook.

The /anda/variables endpoint provides variable metadata obtained from INE Uruguay’s ANDA catalog (DDI XML format). This includes variable labels, value categories, and type information.

| Method | Endpoint | Auth | Description |
|---|---|---|---|
| GET | /anda/variables | No | Get variable metadata |

# Get all ECH variables
vars <- api_get_anda_variables(survey_type = "ech")

# Get specific variables
vars <- api_get_anda_variables(
  survey_type = "ech",
  var_names = c("pobpcoac", "e27", "ht11")
)

Query parameters:

| Parameter | Type | Description |
|---|---|---|
| survey_type | string | Survey type (default "ech") |
| names | string | Comma-separated variable names (all if empty) |

Each variable document contains:

| Field | Description |
|---|---|
| name | Variable name (lowercase) |
| label | Human-readable label |
| type | discrete, continuous, or unknown |
| value_labels | List of code-label mappings |
| description | Extended description |
| source_edition | Survey edition (e.g., "2024") |
| source_catalog_id | ANDA catalog ID (e.g., 767) |
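
A typical use of value_labels is turning raw survey codes into labelled factors. The sketch below assumes value_labels arrives as a named list mapping code to label; the document shown is made up for illustration and decode_codes() is a hypothetical helper.

```r
# Illustrative document shaped like the fields above; the name and labels are made up.
var_doc <- list(
  name = "example_var",
  label = "Example categorical variable",
  type = "discrete",
  value_labels = list("1" = "Category A", "2" = "Category B")
)

# Hypothetical helper: converts raw codes to a labelled factor using value_labels.
decode_codes <- function(codes, var_doc) {
  labels <- unlist(var_doc$value_labels)
  factor(as.character(codes), levels = names(labels), labels = unname(labels))
}

decode_codes(c(1, 2, 2, 9), var_doc)  # code 9 has no label and becomes NA
```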

Administration

| Method | Endpoint | Auth | Description |
|---|---|---|---|
| GET | /admin/pending-users | Admin | List institutional accounts pending review |
| POST | /admin/approve/:email | Admin | Approve an institutional account |
| POST | /admin/reject/:email | Admin | Reject an institutional account |

Admin access is controlled via the METASURVEY_ADMIN_EMAIL environment variable on the server.
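
One way such a check could look is shown below. This is a sketch, not the server's implementation: is_admin() is a hypothetical name, and the case-insensitive comparison is an assumption.

```r
# Hypothetical helper: TRUE when the authenticated email matches the configured admin.
is_admin <- function(email,
                     admin_email = Sys.getenv("METASURVEY_ADMIN_EMAIL", unset = "")) {
  # With no admin configured, nobody is treated as an admin.
  nzchar(admin_email) && identical(tolower(email), tolower(admin_email))
}
```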

Health Check

| Method | Endpoint | Auth | Description |
|---|---|---|---|
| GET | /health | No | API and MongoDB status |

{
  "status": "ok",
  "service": "metasurvey-api",
  "version": "2.0.0",
  "database": "metasurvey",
  "mongodb": "connected",
  "timestamp": "2026-02-15T12:00:00Z"
}
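
A monitoring script can parse this response and act on the two status fields. The sketch below uses jsonlite on a copy of the response above; is_healthy() is a hypothetical helper, and treating any value other than "ok" or "connected" as unhealthy is an assumption.

```r
library(jsonlite)

# Hypothetical helper: TRUE only when both the API and MongoDB report healthy.
is_healthy <- function(body) {
  health <- fromJSON(body)
  identical(health$status, "ok") && identical(health$mongodb, "connected")
}

body <- '{"status": "ok", "service": "metasurvey-api", "version": "2.0.0",
          "database": "metasurvey", "mongodb": "connected",
          "timestamp": "2026-02-15T12:00:00Z"}'
is_healthy(body)
```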

MongoDB Schema

The database has four collections, each with JSON Schema validation and optimized indexes.

Entity-Relationship Diagram

The following diagram shows the MongoDB collections and their relationships:

  ┌──────────────────┐       ┌──────────────────────┐
  │     users         │       │      recipes          │
  ├──────────────────┤       ├──────────────────────┤
  │ email (PK)       │──┐    │ id (PK)              │
  │ name             │  │    │ name                 │
  │ password_hash    │  ├───>│ user (FK)            │
  │ user_type        │  │    │ survey_type          │
  │ institution      │  │    │ edition              │
  └──────────────────┘  │    │ steps[]              │
                        │    │ certification{}      │
                        │    │ categories[]         │
                        │    └──────────┬───────────┘
                        │               │
                        │    ┌──────────┴───────────┐
                        │    │     workflows         │
                        │    ├──────────────────────┤
                        │    │ id (PK)              │
                        └───>│ user (FK)            │
                             │ survey_type          │
                             │ recipe_ids[] (FK)    │
                             │ calls[]              │
                             └──────────────────────┘

  ┌──────────────────────┐
  │   anda_variables      │
  ├──────────────────────┤
  │ survey_type (PK)     │
  │ name (PK)            │
  │ label                │
  │ type                 │
  │ value_labels{}       │
  └──────────────────────┘

  Relationships:
    users    ──1:N──>  recipes     (publishes)
    users    ──1:N──>  workflows   (publishes)
    recipes  ──N:M──>  workflows   (referenced by)

Collections

users

| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Display name |
| email | string | Yes | Email (unique, validated) |
| password_hash | string | Yes | SHA-256 hash (64 characters) |
| user_type | enum | Yes | individual, institutional_member, institution |
| institution | string | No | Institution name |
| verified | boolean | No | Whether identity is verified |
| review_status | enum | No | approved, pending, rejected |
| reviewed_by | string | No | Reviewing admin's email |
| reviewed_at | string | No | ISO timestamp |
| created_at | string | Yes | ISO timestamp |

Indexes: unique on email.
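
The constraints in this table can be checked client-side before a document is sent. The server enforces them through JSON Schema validation; the function below is only a base R approximation, and validate_user() is a hypothetical name.

```r
# Hypothetical validator mirroring the users table above.
validate_user <- function(user) {
  required <- c("name", "email", "password_hash", "user_type", "created_at")
  absent <- setdiff(required, names(user))
  if (length(absent) > 0) {
    stop("Missing required fields: ", paste(absent, collapse = ", "))
  }
  if (!user$user_type %in% c("individual", "institutional_member", "institution")) {
    stop("Invalid user_type: ", user$user_type)
  }
  # A SHA-256 digest rendered as hex is exactly 64 hexadecimal characters.
  if (!grepl("^[0-9a-fA-F]{64}$", user$password_hash)) {
    stop("password_hash must be a 64-character hex string")
  }
  invisible(TRUE)
}
```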

recipes

| Field | Type | Required | Description |
|---|---|---|---|
| id | string | No | Unique identifier (auto-generated) |
| name | string | Yes | Recipe name |
| user | string | Yes | Author email |
| survey_type | enum | Yes | ech, eaii, eph, eai |
| edition | string/array | No | Survey edition(s) |
| description | string | No | Description |
| topic | enum | No | labor_market, income, education, health, demographics, housing |
| version | string | No | Semantic version (default "1.0.0") |
| downloads | number | No | Download counter (default 0) |
| steps | array | No | Step expressions as strings |
| depends_on | array | No | Required input variable names |
| depends_on_recipes | array | No | IDs of recipes this recipe depends on |
| categories | array | No | Category objects |
| certification | object | No | {level, certified_at, certified_by, notes} |
| user_info | object | No | {name, user_type, email, url, verified} |
| doc | object | No | {input_variables, output_variables, pipeline} |
| data_source | object | No | {s3_bucket, s3_prefix, file_pattern, provider} |

Indexes: unique on id; on user, survey_type, topic, downloads (desc), certification.level; compound on (survey_type, edition); text search on (name, description, topic).

workflows

| Field | Type | Required | Description |
|---|---|---|---|
| id | string | No | Unique identifier (auto-generated) |
| name | string | Yes | Workflow name |
| user | string | Yes | Author email |
| survey_type | enum | Yes | ech, eaii, eph, eai |
| edition | string/array | No | Survey edition(s) |
| description | string | No | Description |
| version | string | No | Semantic version |
| downloads | number | No | Download counter |
| estimation_type | string/array | No | annual, quarterly, monthly |
| recipe_ids | array | No | Referenced recipe IDs |
| calls | array | No | Estimation calls as strings |
| call_metadata | array | No | Call descriptions |
| categories | array | No | Category objects |
| certification | object | No | Same as recipes |
| user_info | object | No | Same as recipes |

Indexes: unique on id; on user, survey_type, recipe_ids, downloads (desc); compound on (survey_type, edition); text search on (name, description).

anda_variables

| Field | Type | Required | Description |
|---|---|---|---|
| survey_type | string | Yes | Survey type |
| name | string | Yes | Variable name (lowercase) |
| label | string | Yes | Human-readable label |
| type | enum | No | discrete, continuous, unknown |
| value_labels | object | No | Code-label mappings |
| description | string | No | Extended description |
| source_edition | string | No | Edition (e.g., "2024") |
| source_catalog_id | number | No | ANDA catalog ID |

Indexes: compound unique on (survey_type, name); on survey_type.
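
Because of the compound unique index, an insert that repeats a (survey_type, name) pair would be rejected. A quick base R check before seeding might look like the sketch below; the rows are illustrative and reuse variable names from the example earlier in this document.

```r
# Illustrative rows to be seeded; the third repeats the first key pair.
vars <- data.frame(
  survey_type = c("ech", "ech", "ech"),
  name = c("pobpcoac", "e27", "pobpcoac"),
  stringsAsFactors = FALSE
)

# Rows that would violate the compound unique index on (survey_type, name).
dupes <- vars[duplicated(vars[c("survey_type", "name")]), ]
dupes
```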

Database Setup

To set up the database on a new deployment:

# 1. Create collections with JSON Schema validation and indexes
mongosh "$METASURVEY_MONGO_URI" inst/scripts/setup_mongodb.js

# 2. Seed recipes, workflows, and users
METASURVEY_MONGO_URI="..." Rscript inst/scripts/seed_ech_recipes.R

# 3. Seed ANDA variable metadata from INE catalog
METASURVEY_MONGO_URI="..." Rscript inst/scripts/seed_anda_metadata.R

The setup script creates the four collections and builds the indexes. It is idempotent: existing collections are skipped.

Server Deployment

Environment Variables

| Variable | Required | Default | Description |
|---|---|---|---|
| METASURVEY_MONGO_URI | Yes | | MongoDB connection string |
| METASURVEY_DB | No | metasurvey | Database name |
| METASURVEY_JWT_SECRET | No | metasurvey-dev-secret-... | JWT signing secret (override in production) |
| METASURVEY_ADMIN_EMAIL | No | | Admin email for institutional review |
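
A startup routine can read these variables once, apply the defaults, and stop early when the required one is missing. The following is a minimal sketch, not the code in inst/api/; load_server_config() is a hypothetical name, and the warning about an unset JWT secret is an added safeguard rather than documented behavior.

```r
# Hypothetical helper: collects server settings from the environment.
load_server_config <- function() {
  mongo_uri <- Sys.getenv("METASURVEY_MONGO_URI", unset = "")
  if (!nzchar(mongo_uri)) {
    stop("METASURVEY_MONGO_URI is required")
  }
  jwt_secret <- Sys.getenv("METASURVEY_JWT_SECRET", unset = "")
  if (!nzchar(jwt_secret)) {
    warning("METASURVEY_JWT_SECRET is not set; the development default would be used")
  }
  list(
    mongo_uri = mongo_uri,
    db = Sys.getenv("METASURVEY_DB", unset = "metasurvey"),  # documented default
    jwt_secret_set = nzchar(jwt_secret),
    admin_email = Sys.getenv("METASURVEY_ADMIN_EMAIL", unset = "")
  )
}
```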

Running Locally

METASURVEY_MONGO_URI="mongodb+srv://user:pass@cluster.mongodb.net" \
  Rscript -e 'plumber::plumb("inst/api/plumber.R")$run(port = 8787)'

Swagger UI will be available at http://localhost:8787/__docs__/.

Docker

docker build -t metasurvey-api inst/api/
docker run -p 8787:8787 \
  -e METASURVEY_MONGO_URI="mongodb+srv://..." \
  -e METASURVEY_JWT_SECRET="your-production-secret" \
  -e METASURVEY_ADMIN_EMAIL="admin@example.com" \
  metasurvey-api

Railway

The API is configured for Railway deployment via the render.yaml file in inst/api/. Push the repository and configure the environment variables in the Railway dashboard.

CORS

The API allows cross-origin requests from any origin.
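
In plumber, a permissive policy of this kind is commonly implemented as a filter that sets the Access-Control-Allow-Origin header and answers preflight requests. The sketch below follows that general pattern and is not necessarily the code shipped in inst/api/; make_cors_filter() is a hypothetical name, and the forward argument exists only so the filter can be exercised without a running router.

```r
# Hypothetical factory: returns a plumber-style filter that allows any origin.
make_cors_filter <- function(origin = "*", forward = plumber::forward) {
  function(req, res) {
    res$setHeader("Access-Control-Allow-Origin", origin)
    if (identical(req$REQUEST_METHOD, "OPTIONS")) {
      # Answer preflight requests directly instead of routing them to an endpoint.
      res$setHeader("Access-Control-Allow-Methods", "*")
      res$setHeader(
        "Access-Control-Allow-Headers",
        req$HTTP_ACCESS_CONTROL_REQUEST_HEADERS
      )
      res$status <- 200
      return(list())
    }
    forward()  # hand every other request to the next handler
  }
}

# Registering it on a router (requires the plumber package):
# plumber::pr("inst/api/plumber.R") |>
#   plumber::pr_filter("cors", make_cors_filter())
```

Allowing every origin is convenient for a public, mostly read-only API; a deployment that needs tighter control could pass a specific origin instead of "*".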

Next Steps