The bdl package is an interface to Local Data Bank(Bank
Danych Lokalnych - bdl) API with a set of
useful tools like quick plotting using data from the data bank.
Working with bdl is based on id codes. Most of the data
downloading functions require specifying one or vector of multiple unit
or variable ids as a string.
It is recommended to use a private API key which u can get here. To apply it
use: options(bdl.api_private_key ="your_key")
Also, every function returns data in Polish by default. If you would
like to get data in English, just add lang = "en" to any
function.
Any metadata information (unit levels, aggregates, NUTS code explanation, etc.) can be found here.
When searching for unit id, we can use two methods:
search_units()get_units()Units consist of 6 levels:
get_levels()The lowest - seventh level has its own separate functions with suffix
localities. Warning - the localities functions
have a different set of arguments. Check package or API documentation
for more info.
Direct searching search_units() takes couple different
arguments like:
name - required search phrase (can be empty
string)level - narrows returned units to given leveland more. To look for more arguments on any given function check package or API documentation.
search_units(name = "wro")
search_units(name = "", level = 3)To get all units available in local data bank run
get_units() without any argument(warning - it can eat data
limit very fast around 4.5k rows):
To narrow the list add unitParentId. The function will
return all children units for a given parent at all levels. Add
level argument to filter units even further.
get_units(parentId = "000000000000", level = 5)Subjects are themed directories of variables.
We have two searching methods for both subjects and variables:
search_variables() and
search_subjects()get_subjects() and
get_variables()To directly search for subject we just provide search phrase:
search_subjects("lud")Subjects consist of 3 levels (categories, groups, subgroups) -
K, G and P respectively. The
fourth level of the subject (child of a subgroup) would be
variables.
To list all top level subjects use get_subjects():
get_subjects()To list sub-subjects to given category or group use
get_subjects() with parentId argument:
get_subjects(parentId = "K3")
get_subjects(parentId = "G7")Firstly you can list variables for given subject (subgroup):
get_variables("P2425")Secondly, you can direct search variables with
search_variables(). You can use an empty string as
name to list all variables but I strongly advise against as
it has around 40 000 rows and you will probably hit data limit.
search_variables("samochod")You can narrow the search to the given subject - subgroup:
search_variables("lud", subjectId = "P2425")If you picked unit and variable codes, you are ready to download data. You can do this two ways:
get_data_by_unit()get_data_by_variable()We will use get_data_by_unit(). We specify our single
unit as unitId string argument and variables by a vector of
strings. Optionally we can specify years of data. If not all available
years are used.
get_data_by_unit(unitId = "023200000000", varId =  "3643")
get_data_by_unit(unitId = "023200000000", varId =  c("3643", "2137", "148190"))To get more information about data we can add type
argument and set it to "label" to add an additional column
with the variable info.
get_data_by_unit(unitId = "023200000000", varId = "3643", type = "label")We will use get_data_by_variable(). We specify our
single variable as varId string argument. If no
unitParentId is provided, the function will return all
available units for a given variable. Setting unitParentId
will return all available children units (on all levels). To narrow unit
level set unitLevel. Optionally we can specify years of
data. If not all available years are used.
get_data_by_variable("420", unitParentId = "011210000000", year = 2013:2016)
get_data_by_variable("420", unitLevel = "2", year = 2013:2016)The bdl package provides a couple of additional
functions for summarizing and visualizing data.
Data downloaded via get_data_by_unit() or
get_data_by_variable() and their locality versions can be
easily summarized by summary():
df <- get_data_by_variable(varId = "3643", unitParentId = "010000000000")
summary(df)Plotting functions in this package are interfaces to the data
downloading functions. Some of them require specifying
data_type - a method for downloading data, and the rest of
the arguments will be relevant to specify data_type
function. Check documentation for more details.
line_plot(data_type = "unit", unitId = "000000000000", varId = c("415","420"))pie_plot(data_type ="variable" ,"1", "2018",unitParentId="042214300000", unitLevel = "6")Scatter plot is unique - requires vector of only 2 variables.
scatter_2var_plot(data_type ="variable" ,c("60559","415"), unitLevel = "2")The bdl package comes with the bdl.maps
dataset containing spatial maps for each Poland’s level.
generate_map() use them to generate maps filled with the
bdl data. Use unitLevel to change the type of map. When the
lower level is chosen, the map generation can be more time consuming as
it has more spatial data to process. This function will download and
load maps automatically. In case of any errors you can download them
manually here.
Download data file and double-click to load it to environment.
generate_map(varId = "60559", year = "2017", unitLevel = 3)Downloading functions get_data_by_unit() and
get_data_by_variable() have alternative “multi” downloading
mode. Function that would work for example single unit, if provided a
vector will make additional column with values for each unit
provided:
get_data_by_unit(unitId = c("023200000000", "020800000000"), varId =  c("3643", "2137", "148190"))Or multiple variables for get_data_by_variable():
get_data_by_variable(varId =c("3643","420"), unitParentId = "010000000000")This mode works for the locality version as well.
More consistent method of downloading multiple variables for multiple
units is provided by get_panel_data() function:
get_panel_data(unitId = c("030210101000", "030210105000", "030210106000"), varId =  c("60270", "461668"), year = c(2015:2016))It offers also parameter ggplot = TRUE which produces
output in the long form suitable for plotting with ggplot package:
library(ggplot2)
df <- get_panel_data(unitId = c("030210101000", "030210105000", "030210106000"), varId =  c("60270", "461668"), year = c(2015:2018), ggplot = TRUE)
ggplot(df,aes(x=year, y= values, color = unit)) + geom_line(aes(linetype = variables)) + scale_color_discrete(labels = c("A", "B", "C")) + scale_linetype_discrete(labels = c("X", "Y"))