
The goal of ucimlrepo is to download and import data
sets directly into R from the UCI
Machine Learning Repository.
[!IMPORTANT]
This package is an unofficial port of the Python
ucimlrepo package.
[!NOTE]
Want datasets that come with help documentation entries?
Check out the
{ucidata} R package! It provides a small selection of data sets from the UC Irvine Machine Learning Repository alongside help entries.
You can install the development version of ucimlrepo from GitHub with:
# install.packages("remotes")
remotes::install_github("coatless-rpkg/ucimlrepo")
To use ucimlrepo, load the package using:
library(ucimlrepo)
With the package now loaded, we can download a dataset using the
fetch_ucirepo() function or use the
list_available_datasets() function to view a list of
available datasets.
For example, to download the iris dataset, we can
use:
# Fetch a dataset by name
iris_by_name <- fetch_ucirepo(name = "iris")
names(iris_by_name)
#> [1] "data"      "metadata"  "variables"
There are many levels to the data returned. For example, we can
extract the original data frame containing the iris dataset
using:
iris_uci <- iris_by_name$data$original
head(iris_uci)
#>   sepal length sepal width petal length petal width       class
#> 1          5.1         3.5          1.4         0.2 Iris-setosa
#> 2          4.9         3.0          1.4         0.2 Iris-setosa
#> 3          4.7         3.2          1.3         0.2 Iris-setosa
#> 4          4.6         3.1          1.5         0.2 Iris-setosa
#> 5          5.0         3.6          1.4         0.2 Iris-setosa
#> 6          5.4         3.9          1.7         0.4 Iris-setosa
Alternatively, we could retrieve two data frames, one for the features and one for the targets:
iris_features <- iris_by_name$data$features
iris_targets <- iris_by_name$data$targets
We can then view the first few rows of each data frame:
head(iris_features)
#>   sepal length sepal width petal length petal width
#> 1          5.1         3.5          1.4         0.2
#> 2          4.9         3.0          1.4         0.2
#> 3          4.7         3.2          1.3         0.2
#> 4          4.6         3.1          1.5         0.2
#> 5          5.0         3.6          1.4         0.2
#> 6          5.4         3.9          1.7         0.4
head(iris_targets)
#>         class
#> 1 Iris-setosa
#> 2 Iris-setosa
#> 3 Iris-setosa
#> 4 Iris-setosa
#> 5 Iris-setosa
#> 6 Iris-setosa
Alternatively, you can query a dataset directly by its ID, found either by
calling list_available_datasets() or by looking up the
dataset on the UCI ML Repo website:
# Fetch a dataset by id
iris_by_id <- fetch_ucirepo(id = 53)
We can also view a list of data sets available for download using the
list_available_datasets() function:
# List available datasets
list_available_datasets()
[!NOTE]
Not all 600+ datasets on UCI ML Repo are available for download using the package. The current list of available datasets can be viewed here.
If you would like to see a specific dataset added, please submit a comment on an issue ticket in the upstream repository.
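Since not every dataset can be downloaded, a script may prefer to handle a failed fetch gracefully rather than stop. The following is a minimal sketch, assuming fetch_ucirepo() signals a standard R error when a dataset cannot be retrieved (this README does not confirm that behavior). The safe_fetch() helper and its fetch_fun argument are hypothetical and not part of ucimlrepo.

```r
# Hypothetical helper (not part of ucimlrepo): returns NULL instead of
# stopping when a fetch fails. Assumes the fetch function signals an
# ordinary R error on failure. `fetch_fun` defaults to the package
# function but can be swapped out, e.g. for offline experimentation.
safe_fetch <- function(..., fetch_fun = ucimlrepo::fetch_ucirepo) {
  tryCatch(
    fetch_fun(...),
    error = function(e) {
      # Report the problem without aborting the calling script
      message("Could not fetch dataset: ", conditionMessage(e))
      NULL
    }
  )
}
```

With the package installed, safe_fetch(name = "iris") or safe_fetch(id = 53) would pass the arguments straight through to fetch_ucirepo(); checking the result with is.null() before using it lets the rest of a script continue when a dataset is unavailable.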