---
title: "Getting started with ibger"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting started with ibger}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)
```

## Overview

**ibger** provides a tidyverse-friendly interface to the [IBGE Aggregate Data API](https://servicodados.ibge.gov.br/api/docs/agregados?versao=3) (version 3). This is the same API that powers [SIDRA](https://sidra.ibge.gov.br/), the automatic data retrieval system for all surveys and censuses conducted by the Brazilian Institute of Geography and Statistics (IBGE).

Each SIDRA table corresponds to an **aggregate** in the API. With ibger you can browse aggregates, inspect their metadata, and retrieve tidy data, all from R.

## Installation

```{r}
# install.packages("remotes")
remotes::install_github("StrategicProjects/ibger")
```

```{r setup}
library(ibger)
```

## A typical workflow

### Step 1: Find an aggregate

Use `ibge_aggregates()` to list every aggregate grouped by survey. Optional filters let you narrow the search:

```{r}
# All aggregates
ibge_aggregates()
#> ✔ 1420 aggregates found.
#> # A tibble: 1,420 × 4
#>   survey_id survey_name      aggregate_id aggregate_name
#>
#> 1 AB        Abate de animais         1705 Animais abatidos …
#> 2 AB        Abate de animais         1706 Peso total das ca…
#> ...
# Monthly aggregates only
ibge_aggregates(periodicity = "P5")

# Aggregates with municipality-level data
ibge_aggregates(level = "N6")
```

### Step 2: Inspect the metadata

Once you have an aggregate ID, `ibge_metadata()` tells you everything about its structure:

```{r}
meta <- ibge_metadata(1705)
meta
```

The print method shows a structured summary:

```
── Animais abatidos ──
  ID: 1705
  Survey: Pesquisa Trimestral do Abate de Animais
  Periodicity: trimestral (200101 to 202404)
  Territorial levels: N1, N2, N3

── Variables (2) ──
  284: Número de informantes (Unidades)
  285: Cabeças abatidas (Cabeças)

── Classifications (1) ──
  12529: Tipo de rebanho bovino (9 categories)
    115236: Total [level 0]
    115237: Bois [level 1]
    115238: Vacas [level 1]
    ...
```

Each component is accessible directly:

```{r}
meta$variables
#> # A tibble: 2 × 3
#>      id name                  unit
#>
#> 1   284 Número de informantes Unidades
#> 2   285 Cabeças abatidas      Cabeças

meta$classifications
#> # A tibble: 1 × 3
#>      id name                   categories
#>
#> 1 12529 Tipo de rebanho bovino

# Unnest to see every category
tidyr::unnest(meta$classifications, categories)

# Geographic levels
meta$territorial_level
#> $administrative
#> [1] "N1" "N2" "N3"

# Time range
meta$periodicity
#> $frequency
#> [1] "trimestral"
#> $start
#> [1] "200101"
#> $end
#> [1] "202404"
```

### Step 3: Retrieve data

`ibge_variables()` is the main workhorse. It sends a single request and returns a tidy tibble:

```{r}
ibge_variables(1705, localities = "BR")
#> ✔ 12 records retrieved.
#> # A tibble: 12 × 9
#>   variable_id variable_name     variable_unit classification_12529
#>
#> 1 284         Número de inform… Unidades      Total
#> 2 285         Cabeças abatidas  Cabeças       Total
#> ...
#>   locality_id locality_name locality_level period value
#>
#> 1 1           Brasil        Brasil         202303 2584
#> 2 1           Brasil        Brasil         202303 7802044
#> ...
```

## Specifying localities

The `localities` parameter accepts several convenient formats:

```{r}
# Country total
ibge_variables(1705, localities = "BR")

# All states
ibge_variables(8884, localities = "N3")

# Specific states (RJ = 33, SP = 35)
ibge_variables(8884, localities = list(N3 = c(33, 35)))

# Mix levels: metropolitan areas + a specific municipality
ibge_variables(1705, localities = list(N7 = c(3501, 3301), N6 = 5208707))
```

The geographic level codes follow the IBGE convention:

| Code | Level             | Example                             |
|------|-------------------|-------------------------------------|
| `N1` | Brazil            | `"BR"` or `list(N1 = 1)`            |
| `N2` | Major region      | `list(N2 = 1)` (North)              |
| `N3` | State (UF)        | `list(N3 = 33)` (Rio de Janeiro)    |
| `N6` | Municipality      | `list(N6 = 3550308)` (São Paulo/SP) |
| `N7` | Metropolitan area | `list(N7 = 3501)` (RM São Paulo)    |

> **Tip**: Not every aggregate is available at every level. Aggregate 1705
> has data for N1, N2, and N3 but not N6. Use `ibge_metadata()` to check.

## Specifying periods

Periods follow the API convention, where negative values mean "last N":

```{r}
# Last 6 periods (the default)
ibge_variables(1705, periods = -6, localities = "BR")

# Last 12 periods
ibge_variables(1705, periods = -12, localities = "BR")

# Specific period codes
ibge_variables(8884, periods = c(202301, 202302, 202303), localities = "BR")

# Range (inclusive)
ibge_variables(8884, periods = "202101-202304", localities = "BR")

# Range + extra period
ibge_variables(8884, periods = "202101-202106|202301", localities = "BR")
```

> **Note**: Negative values cannot be mixed with specific periods. Period
> codes encode both the date and the periodicity: `202001` could mean
> January 2020 (monthly), Q1 2020 (quarterly), or S1 2020 (semi-annual),
> depending on the aggregate.

## Filtering with classifications

Many aggregates break their data further by classifications (dimensions).
For instance, aggregate 1712 (crop production) has a classification for the type of product (226) and another for the producer condition (218).

```{r}
# Single category: pineapple (4844) from product classification (226)
ibge_variables(
  aggregate = 1712,
  localities = "BR",
  classification = list("226" = 4844)
)

# Multiple categories
ibge_variables(
  aggregate = 1712,
  localities = "BR",
  classification = list("226" = c(4844, 96608, 96609))
)

# Multiple classifications
ibge_variables(
  aggregate = 1712,
  localities = "BR",
  classification = list("226" = c(4844, 96608), "218" = 4780)
)

# All categories of a classification (can be large!)
ibge_variables(
  aggregate = 1712,
  periods = -1,
  localities = "BR",
  classification = list("226" = "all")
)
```

When no classification is specified, the API returns the **Total** category (ID = 0), an aggregate across all categories.

## Automatic validation

Before sending any request, `ibge_variables()` and `ibge_localities()` validate your parameters against the aggregate's metadata. If something doesn't match, you get a clear error with the allowed values:

```{r}
# N3 (states) is not available for aggregate 1705
ibge_variables(1705, localities = "N3")
#> Error:
#> ! Geographic level(s) "N3" not available for aggregate 1705.
#> ℹ Available levels: "N1", "N6", and "N7".

# Period out of range
ibge_variables(1705, periods = 199901, localities = "BR")
#> Error:
#> ! Period(s) "199901" out of range for aggregate 1705.
#> ℹ Valid range: "201202" to "202001" (monthly).

# Non-existent variable
ibge_variables(1705, variable = 999, localities = "BR")
#> Error:
#> 355 - IPCA15 - Variação mensal (%)
#> 356 - IPCA15 - Variação acumulada no ano (%)
#> 1120 - IPCA15 - Variação acumulada em 12 meses (%)
#> 357 - IPCA15 - Peso mensal (%)
```

Metadata is fetched once per session and cached.
To force a refresh:

```{r}
ibge_clear_cache()
```

Skip validation entirely with `validate = FALSE`:

```{r}
ibge_variables(1705, localities = "BR", validate = FALSE)
```

## Browsing the survey catalog

Beyond aggregate-level data, ibger also provides access to the [IBGE Metadata API](https://servicodados.ibge.gov.br/api/docs/metadados?versao=2) (v2), which catalogs IBGE's surveys with institutional and methodological information such as status, category, collection frequency, and thematic classifications.

This is useful when you want to understand **what surveys exist** and **how they are structured** before diving into specific aggregates.

```{r}
# List all 98 IBGE surveys
ibge_surveys()
#> # A tibble: 98 × 8
#>   id    name                                 status category   ...
#>
#> 1 AC    Pesquisa Anual da Indústria da Cons… Ativa  Estrutural
#> 2 AA    Pesquisa Nacional de Saúde do Escol… Ativa  Especial
#> ...

# Filter active short-term ("Conjuntural") surveys
library(dplyr)
ibge_surveys(thematic_classifications = FALSE) |>
  filter(status == "Ativa", category == "Conjuntural")

# Check which periods have metadata for the Censo Demográfico
ibge_survey_periods("CD")
#> # A tibble: 9 × 3
#>    year month order
#>
#> 1  2022    NA     0
#> 2  2010    NA     0
#> ...

# Get full institutional metadata for a specific period
meta <- ibge_survey_metadata("CD", year = 2022)
meta
#> ── CD ──
#> Status: Ativa
#> Category: Estrutural
#> ...
#> ── Metadata occurrences (1) ──
#> Use `meta$occurrences` to explore the full metadata.

# Explore methodology fields
names(meta$occurrences[[1]])
```

Survey codes are validated before each request. If you use a wrong code, the error suggests similar alternatives:

```{r}
ibge_survey_periods("PMS")
#> Error: Survey code "PMS" not found in the IBGE catalog.
#> ℹ Did you mean one of these?
#> * SC - Pesquisa Mensal de Serviços
#> * MC - Pesquisa Mensal de Comércio
#> ...
```

## API limits and special values

Each request can return at most **100,000 values**, computed as:

> categories × periods × localities ≤ 100,000

If exceeded, the API returns HTTP 500. Split your request into smaller chunks when working with many localities or categories.

The `value` column may contain special characters instead of numbers:

| Value | Meaning                                                |
|-------|--------------------------------------------------------|
| `-`   | Numeric zero (not from rounding)                       |
| `..`  | Not applicable                                         |
| `...` | Data not available                                     |
| `X`   | Suppressed to avoid identifying individual respondents |

These come through as character strings in the `value` column. Use `parse_ibge_value()` to convert to numeric in one step:

```{r}
ibge_variables(7060, localities = "BR") |>
  dplyr::mutate(value = parse_ibge_value(value))
```
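
To stay under the 100,000-value limit described above, one option is to split the locality codes into batches before calling `ibge_variables()`. The sketch below is only an illustration: `chunk_codes()` is a hypothetical helper, not part of ibger, and the locality codes and the category and period counts are made up. It applies the categories × periods × localities formula to work out how many localities fit in one request.

```{r}
# Hypothetical helper (not part of ibger): split a vector of locality codes
# so that categories x periods x localities stays within the API limit.
chunk_codes <- function(codes, n_categories, n_periods, limit = 100000) {
  # Maximum number of localities that fit in one request
  per_request <- floor(limit / (n_categories * n_periods))
  if (per_request < 1) {
    stop("A single locality already exceeds the limit; ",
         "reduce the number of categories or periods.")
  }
  # Assign consecutive codes to groups of at most `per_request` elements
  split(codes, ceiling(seq_along(codes) / per_request))
}

# Made-up example: 5,570 placeholder codes, 50 categories, 12 periods
codes <- seq_len(5570)
chunks <- chunk_codes(codes, n_categories = 50, n_periods = 12)

# Each chunk could then be requested separately and the results combined,
# for example (`aggregate_id` is a placeholder for a real aggregate ID):
# results <- lapply(chunks, function(ch) {
#   ibge_variables(aggregate_id, localities = list(N6 = ch))
# })
# dplyr::bind_rows(results)
```

The same idea applies when a classification has many categories: batch the category IDs instead of the localities and keep the other two factors fixed.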