---
title: "Recoding & Relabelling"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Recoding & Relabelling}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup, message=FALSE, warning=FALSE, echo=FALSE}
library(regions)
library(dplyr)
library(tidyr)
```

Eurostat offers so-called correspondence tables to follow boundary changes, recoding and relabelling for all NUTS changes since the formalization of the NUTS typology. Unfortunately, these Excel tables do not conform to the requirements of tidy data, and their vocabulary is not standardized, either. For example, recoding changes are variously labelled as *recoding*, *recoding and renaming*, *code change*, *Code change*, etc.

The `data-raw` directory contains these Excel tables and the very long data wrangling code that unifies their vocabulary and brings the tables into a single, tidy format, starting with the `NUTS1999` definition. The resulting data file, `nuts_changes`, is included in the `regions` package. It already contains the changes that will come into force in 2021.

Let's review a few changes.

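
As a brief aside on the vocabulary problem mentioned above, here is a minimal sketch of how inconsistent change labels could be collapsed into one category. It is not the package's actual `data-raw` code: the `raw_labels` data frame, its made-up codes, the `boundary shift` label and the `change_type` column are all assumptions for illustration.

```r
library(dplyr)

# Hypothetical input: made-up codes with the label variants quoted above,
# plus one invented label ("boundary shift") that is not a recoding.
raw_labels <- data.frame(
  code   = c("AA1", "AA2", "AA3", "AA4", "AA5"),
  change = c("recoding", "recoding and renaming", "code change",
             "Code change", "boundary shift"),
  stringsAsFactors = FALSE
)

normalised <- raw_labels %>%
  mutate(
    # Lower-case and trim first, so "Code change" and "code change" match
    change_clean = tolower(trimws(.data$change)),
    # Collapse every recoding variant into a single category
    change_type = case_when(
      grepl("recod|code change", .data$change_clean) ~ "recoded",
      TRUE ~ "other"
    )
  )

table(normalised$change_type)
```

The packaged `nuts_changes` table already has this kind of cleanup applied, so the review that follows works directly on it.
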
```{r changes, results='asis'}
data(nuts_changes)

nuts_changes %>%
  mutate(
    geo_16 = .data$code_2016,
    geo_13 = .data$code_2013
  ) %>%
  filter(
    code_2016 %in% c("FRB", "HU11") |
      code_2013 %in% c("FR7", "HU10", "FR24")
  ) %>%
  select(
    all_of(c("typology", "geo_16", "geo_13", "start_year",
             "code_2013", "change_2013",
             "code_2016", "change_2016"))
  ) %>%
  # Stack the code columns, then the change descriptions, into long form
  pivot_longer(
    cols = starts_with("code"),
    names_to = "definition",
    values_to = "code"
  ) %>%
  pivot_longer(
    cols = starts_with("change"),
    names_to = "change",
    values_to = "description"
  ) %>%
  filter(
    !is.na(.data$description),
    !is.na(.data$code)
  ) %>%
  select(-all_of("change")) %>%
  knitr::kable()
```

You will not find the `geo` identifier `FRB` in any statistical data that was released before France changed its administrative boundaries and the `NUTS2016` boundary definition came into force. However, as the description says, you may find historical data elsewhere, in a historical NUTS2-level product, for the `FRB` _CENTRE — VAL DE LOIRE_ `NUTS1` region, because it is identical to the earlier `NUTS2` level region `FR24`, i.e. Central France, which was known as _Centre_ for many years before the transition to `NUTS2016`.

The size and importance of this territorial unit are more similar to those of `NUTS1` than `NUTS2` units. Because `FRB` contains only one `NUTS2` region, `FRB0` (the earlier `FR24`), it is technically identified as a NUTS2-level region, too. You find the same data in the `NUTS2` typology.

With statistical products on NUTS2 level, you can simply recode historical `FR24` data to `FRB0`, since neither the aggregation level nor the boundaries have changed. Furthermore, you can project this data to any `NUTS1` level panel, either under the earlier `FR2` `NUTS1` label, if you use the old definition, or under the new `FRB` label, if you use the current `NUTS2016` typology.

Let's see a hypothetical data frame with random variables. (Real data frames usually do not have this many issues at once; the example is constructed so that several cases can be shown together.)

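
Before turning to that example, the simple recoding case described above can be sketched by hand. This is only an illustration under stated assumptions: `historical_df`, its `year` and `values` columns and the new column names are invented, and only the correspondence of `FR24` to `FRB0` and `FRB` is taken from the text.

```r
library(dplyr)

# Hypothetical NUTS2-level panel coded with the old definition;
# years and values are invented for illustration.
historical_df <- data.frame(
  geo    = c("FR24", "FR24", "FR10"),
  year   = c(2010, 2011, 2010),
  values = c(10, 12, 50),
  stringsAsFactors = FALSE
)

recoded_df <- historical_df %>%
  mutate(
    # Same boundaries and aggregation level, so only the label changes.
    # Other codes are passed through untouched in this sketch; whether
    # they need recoding has to be checked against nuts_changes.
    geo_2016 = if_else(.data$geo == "FR24", "FRB0", .data$geo),
    # The same rows can also be projected to the NUTS1 label FRB
    geo_nuts1_2016 = if_else(.data$geo == "FR24", "FRB", NA_character_)
  )

recoded_df
```

In practice `recode_nuts()` does this lookup from `nuts_changes`, as the next example shows.
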
```{r recode, results='asis'}
example_df <- data.frame(
  geo = c("FR", "DEE32", "UKI3", "HU12", "DED", "FRK"),
  values = runif(6, 0, 100),
  stringsAsFactors = FALSE
)

recode_nuts(dat = example_df, nuts_year = 2013) %>%
  select(geo, values, code_2013) %>%
  knitr::kable()
```

In this hypothetical example we are creating backward compatibility with the `NUTS2013` definition. There are three types of observations:

- Observations about typologies that did not change. Nothing further needs to be done to make the data comparable across time.
- Typologies which changed their geo codes, but are not affected by boundary changes, i.e. the data is comparable, it is only found under a different geographical label.
- Typologies that are not comparable, and that we cannot compare meaningfully with a `NUTS2013` dataset.

```{r recode2013, results='asis'}
recode_nuts(example_df, nuts_year = 2013) %>%
  select(all_of(c("geo", "values", "typology_change", "code_2013"))) %>%
  knitr::kable()
```

The first three observations are comparable with a `NUTS2013` dataset. The fourth observation is comparable, too, but when joining with a `NUTS2013` dataset or map, it is likely that `FRK` needs to be re-coded to `FR7`.

The following data can be joined with a `NUTS2013` dataset or map:

```{r changes2013, results='asis'}
recode_nuts(example_df, nuts_year = 2013) %>%
  select(all_of(c("code_2013", "values", "typology_change"))) %>%
  rename(geo = "code_2013") %>%
  # Drop observations that have no NUTS2013 equivalent
  filter(!is.na(.data$geo)) %>%
  knitr::kable()
```

Reassuringly, these data will be compatible with the next NUTS typology, too!

```{r recode2021, results='asis'}
recode_nuts(example_df, nuts_year = 2021) %>%
  select(all_of(c("code_2021", "values", "typology_change"))) %>%
  rename(geo = "code_2021") %>%
  # Drop observations that have no NUTS2021 equivalent
  filter(!is.na(.data$geo)) %>%
  knitr::kable()
```

What about `HU12`?

```{r recodeHU12, results='asis'}
data(nuts_changes)

nuts_changes %>%
  select(all_of(c("code_2016", "geo_name_2016", "change_2016"))) %>%
  filter(code_2016 == "HU12") %>%
  filter(complete.cases(.)) %>%
  knitr::kable()
```

The description in the correspondence tables clarifies that historical data can in fact be assembled for `HU12` (*Pest county*):

- It can be accessed from national LAU sources (as `HU-PE`) or from `NUTS3` data (as `HU102`).
- It can be calculated by deducting the Budapest data from the former Central Hungary `NUTS1` region data.

That will be the topic of a later vignette on aggregation and re-aggregation.

# Citations and related work

### Citing the data sources

Eurostat data: cite [Eurostat](https://ec.europa.eu/eurostat/).

Administrative boundaries: cite [EuroGeographics](https://ec.europa.eu/eurostat/web/gisco/geodata/reference-data/administrative-units-statistical-units).

### Citing the regions R package

For main developer and contributors, see the [package homepage](https://regions.dataobservatory.eu/).

This work can be freely used, modified and distributed under the GPL-3 license:

```{r citation-regions, message=FALSE, eval=TRUE, echo=TRUE}
citation("regions")
```

### Contact

For contact information, see the [package homepage](https://regions.dataobservatory.eu/).