---
title:
Datasets in `"sdam"` package
date: "August 2022"
author:
- name: Antonio Rivero Ostoic
affiliation: Aarhus University
email: jaro@cas.au.dk
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Datasets in `"sdam"` package}
%\VignetteEngine{knitr::rmarkdown}
\usepackage[utf8]{inputenc}
---
```{r setup, echo=FALSE, message=FALSE}
knitr::opts_chunk$set(echo=TRUE,error=TRUE)
knitr::opts_chunk$set(comment = "")
library("sdam")
```
```{r set-options, echo=FALSE, cache=FALSE}
options(width = 96)
```
## Preliminaries
Install and load one version of the `"sdam"` package.
```{r, echo=TRUE, eval=FALSE}
install.packages("sdam") # from CRAN
devtools::install_github("sdam-au/sdam") # development version
devtools::install_github("mplex/cedhar", subdir="pkg/sdam") # legacy version for R 3.6.x
```
```{r}
# load and check version
library(sdam)
packageVersion("sdam")
```
## Built-in datasets
The `"sdam"` package comes with a suite of datasets and external data for running the functions available in the package
and for performing analyses.
For a list of the built-in datasets in `"sdam"`, use the `"utils"` function `data()`, or `utils::data()`, with the `'package'` argument.
The CRAN distribution has four built-in datasets, while the development and legacy distributions add three more built-in datasets.
```{r, echo=TRUE, eval=FALSE}
# opens the list of datasets in a new window
data(package="sdam")
```
```{r, echo=FALSE, eval=TRUE}
print(data(package="sdam"))
```
```{r}
# Data sets in package 'sdam':
#
# rp Roman province names and acronyms as in EDH
# rpcp Roman provinces chronological periods
# rpd Roman provinces dates from EDH
# rpmcd Caption maps and affiliation dates of Roman provinces
```
```{r}
# Additional built-in datasets in 'sdam':
#
# EDH Epigraphic Database Heidelberg Dataset
# rpmp Maps of ancient Roman provinces and Italian regions
# retn Roman Empire transport network and Mediterranean sea
```
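The same listing can also be kept as a regular object instead of being read from a pop-up window. The sketch below relies on the `results` component of the object returned by `utils::data()`; the helper name `list_datasets()` is hypothetical and is not part of `"sdam"`.

```{r, echo=TRUE, eval=FALSE}
# Hypothetical helper (not in "sdam"): names and titles of the datasets in a package
list_datasets <- function(pkg) {
  info <- utils::data(package = pkg)$results
  as.data.frame(info[, c("Item", "Title"), drop = FALSE], stringsAsFactors = FALSE)
}

# query "sdam" only when the package is installed
if (requireNamespace("sdam", quietly = TRUE)) {
  print(list_datasets("sdam"))
}
```

The number of rows returned depends on the installed distribution, since the CRAN version has fewer built-in datasets than the development and legacy versions.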
A description of each dataset is available in the manual, which is accessible from the R console;
for example, for the `EDH` dataset in a non-CRAN distribution:
```{r, echo=TRUE, eval=FALSE}
# Epigraphic Database Heidelberg Dataset help
?EDH
```
### Ancient Mediterranean built-in datasets
The `EDH` dataset in `"sdam"` has information about Latin epigraphy from the Roman world during antiquity, retrieved from the Epigraphic Database Heidelberg API repository.
A list of the Roman provinces and regions in this dataset is available in dataset `"rp"`. Use function `data()` again to load this built-in dataset,
and then look at its internal structure with the `utils::str()` function.
* Dataset `"rp"` is a named list with Roman provinces and regions with acronyms according to the Epigraphic Database Heidelberg.
```{r, echo=TRUE, eval=TRUE}
# load dataset
data("rp")
# obtain object structure
str(rp)
```
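Since `"rp"` is a named list, base R is enough to look up a province or region. The following is a minimal sketch in which the helper `find_entries()` is hypothetical (not part of `"sdam"`), and it only assumes that the entries of `"rp"` can be coerced to character strings.

```{r, echo=TRUE, eval=FALSE}
# Hypothetical helper (not in "sdam"): keep entries of a named list matching a pattern
find_entries <- function(lst, pattern) {
  hit <- vapply(lst,
                function(e) any(grepl(pattern, as.character(e), ignore.case = TRUE)),
                logical(1))
  lst[hit]
}

if (requireNamespace("sdam", quietly = TRUE)) {
  data("rp", package = "sdam")
  # first names in the list
  print(head(names(rp)))
  # entries whose content mentions "arm"
  print(find_entries(rp, "arm"))
}
```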
#### `edhw()` interface with `"rp"` dataset
* Function `edhw()` is a wrapper that extracts and transforms the records in the `EDH` dataset; it invokes the `"rp"` dataset
to retrieve the records from a specific Roman province or region in `EDH`.
```{r}
# Armenian records in 'EDH'
edhw(province="Arm")[1]
```
Function `edhw()` issues two `Warning` messages here. The first arises because there is no explicit input in `x`, so the input data is assumed to be the `EDH` dataset.
The second just says that the type of object returned is always a list when argument `province` is used alone.
#### `EDH` in data frames
All records in the `EDH` dataset have a list format, and the wrapper function `edhw()` can transform this information
into a data frame.
For instance, the first record from `Arm` is selected by its record `'id'` number and displayed as a data frame with argument `'as'`.
```{r, echo=TRUE, eval=FALSE}
# record HD015521
edhw(id="15521", as="df")
```
However, the output is easier to read on the screen when only the variables related to people are displayed.
```{r, echo=TRUE, eval=FALSE}
# record HD015521 with explicit variables
edhw(id="15521", vars="people", as="df")
```
```{r, echo=FALSE, eval=TRUE}
suppressWarnings(edhw(id="15521", vars="people", as="df"))
```
```{r, echo=TRUE, eval=FALSE}
# record HD015521 with more explicit variables
edhw(id="15521", vars=c("people", "province_label"), as="df")
```
```{r, echo=FALSE, eval=TRUE}
suppressWarnings(edhw(id="15521", vars=c("people", "province_label"), as="df"))
```
### Obtaining all `people` variables
Start by looking at the `people` variables in the `EDH` dataset for the Roman province of **Armenia**.
#### Armenia
```{r, echo=FALSE, eval=TRUE, out.width="25%", fig.align="center", fig.cap="Roman province of Armenia (ca 117 AD)."}
plot.map("Arm", cap=TRUE, name=FALSE)
```
Transforming an entire province from the `EDH` dataset requires first extracting a list with the province content.
Function `edhw()` is used twice: first to obtain the available inscriptions per province from `EDH`, and then to obtain all data attributes from the `people` variable.
The default outputs are a list for the first call and a data frame for the second.
```{r, echo=TRUE, eval=FALSE}
# people in Armenia
edhw(province="Arm") |>
edhw(vars="people")
```
```{r, echo=FALSE, eval=TRUE}
edhw(province="Arm") |>
suppressWarnings() |>
edhw(vars="people") |>
suppressWarnings()
```
The people attribute variables in inscriptions for `Armenia` are
`age: years`, `cognomen`, `gender`, `name`, `nomen`, `person_id`, `praenomen`, and `status`,
but no inscription has `tribus` or `origo`, as is the case in other provinces.
For `Armenia`, two inscriptions have people variables and all the people recorded are `male`.
Record `HD015524` spans two rows because it has two persons, one of whom has an illegible `nomen`, `cognomen`, and `name`.
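The description above can be checked with a short tabulation. This is only a sketch: it assumes that the second call to `edhw()` returns a data frame with a `gender` column, as described in the text, and that the `EDH` dataset is available in the installed distribution. The helper `count_by()` is hypothetical and not part of `"sdam"`.

```{r, echo=TRUE, eval=FALSE}
# Hypothetical helper (not in "sdam"): frequency of the values in one column
count_by <- function(df, var) {
  table(df[[var]], useNA = "ifany")
}

if (requireNamespace("sdam", quietly = TRUE)) {
  library("sdam")
  # people in Armenia as a data frame; requires the 'EDH' dataset
  arm <- suppressWarnings(edhw(province = "Arm"))
  arm_people <- suppressWarnings(edhw(x = arm, vars = "people"))
  # one row per person recorded
  print(nrow(arm_people))
  # people per gender
  print(count_by(arm_people, "gender"))
}
```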
### Datasets for cartographical maps
Plotting the Roman province in the previous section requires other datasets.
Apart from `"rp"`, there are three other datasets in `"sdam"` invoked for plotting cartographical maps related to the Roman Empire and the
Mediterranean basin, which are `"rpmp"`, `"rpmcd"`, and `"retn"`.
Function `plot.map()` calls dataset `"rpmp"` for the shapes and colours in the plotting of the cartographical maps
of different regions of the Roman Empire. The shapes and colours for the caption and the province dates drawn by this function are in dataset `"rpmcd"`.
* Dataset `"retn"` bears the shapes of places and routes of an ancient transportation system in the Mediterranean region and political
divisions of the Roman Empire.
It also has the land contours around the Mediterranean and parts of the European continent.
```{r}
# land contour around Mediterranean
plot.map(type="plain")
```
```{r, echo=TRUE, eval=FALSE}
# display settlements and shipping routes
plot.map(type="plain", settl=TRUE, shipr=TRUE)
```
Vignette [Cartographical maps and networks](../doc/Maps.html) has more about
transportation networks in the ancient Mediterranean.
### Datasets with dates
There are also built-in datasets in `"sdam"` related to dates, which are either displayed in a cartographical map or used for other computations.
* Dataset `"rpd"` has dates for provinces from the `EDH` dataset.
It serves for performing a restricted imputation on data subsets in `EDH` or in another dataset.
```{r, echo=TRUE, eval=TRUE}
# dates from EDH
data("rpd")
# three provinces in object structure
str(rpd[1:3])
```
From this set of three Roman provinces in the `EDH`, the longest timespan is for `Aem`, and on average `Ach`
has the oldest inscriptions, while `Aeg` has inscriptions with the newest dates.
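A comparison of timespans like this one can be sketched in base R. The helper `span_of()` is hypothetical (not part of `"sdam"`), and the sketch assumes that each entry of `"rpd"` holds only numeric dates, so that the difference between the latest and the earliest value is the timespan of the province.

```{r, echo=TRUE, eval=FALSE}
# Hypothetical helper (not in "sdam"): years between earliest and latest date
span_of <- function(x) {
  diff(range(as.numeric(unlist(x)), na.rm = TRUE))
}

if (requireNamespace("sdam", quietly = TRUE)) {
  data("rpd", package = "sdam")
  # timespan for the first three provinces
  print(sapply(rpd[1:3], span_of))
}
```

If the entries of `"rpd"` have another layout, the helper needs to select the relevant values before computing the range.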
* Dataset `"rpcp"` has chronological periods for regions with early and later Roman influence
per province.
```{r, echo=TRUE, eval=TRUE}
# periods for Roman provinces
data("rpcp")
# object structure
str(rpcp)
```
The periods of early and later Roman influence in the 45 ancient provinces and regions are timespans, each with a
*terminus ante quem* and a *terminus post quem*.
Vignette [Dates and missing dating data](../doc/Dates.html) has the visualisation of these
and other dates.
### External data
Apart from the built-in datasets, the semicolon-separated file `StraussShipwrecks.csv` is attached as external data;
it holds the Shipwrecks dataset for performing analyses.
Reference and documentation in
Strauss, J. (2013). *Shipwrecks Database*. Version 1.0.
Accessed (07-12-2021) from oxrep.classics.ox.ac.uk/databases/shipwrecks_database/
Built from Parker, A.J. *Ancient Shipwrecks of the Mediterranean and the Roman Provinces* (Oxford: BAR International Series 580, 1992)
Details about the access to the database are in:
- [Shipwrecks network in the Mediterranean Basin (23-June-2022)](https://htmlpreview.github.io/?https://github.com/sdam-au/R_code/blob/master/HTML/Shipwrecks%20Network%20in%20the%20Mediterranean%20Basin.html)
- Vignettes [Dates and missing dating data](../doc/Dates.html) and
[Cartographical maps and networks](../doc/Maps.html) also use the Shipwrecks dataset.
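As a minimal sketch, the external file can be read with base R. The `"extdata"` folder used below is an assumption about where the file sits in the installed package; `system.file()` returns an empty string when the file is not found, in which case nothing is read.

```{r, echo=TRUE, eval=FALSE}
# assumed location: the "extdata" folder of the installed package
path <- system.file("extdata", "StraussShipwrecks.csv", package = "sdam")

if (nzchar(path)) {
  # the file is semicolon-separated
  shipwrecks <- utils::read.csv(path, sep = ";", header = TRUE, stringsAsFactors = FALSE)
  # first variables in the dataset
  str(shipwrecks, list.len = 5)
}
```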
### See also
#### Vignettes
* [Re-encoding `people` in the `EDH` dataset](../doc/Encoding.html)
* [Dates and missing dating data](../doc/Dates.html)
* [Cartographical maps and networks](../doc/Maps.html)
#### Manuals
* [sdam: Digital Tools for the SDAM Project at Aarhus University](../html/sdam-package.html)
* [`"sdam"` manual](https://github.com/mplex/cedhar/blob/master/typesetting/reports/sdam.pdf)
#### Project
* [Release candidate version](https://github.com/sdam-au/sdam)
* [Code snippets using `"sdam"`](https://github.com/sdam-au/R_code)
* [Social Dynamics and complexity in the Ancient Mediterranean project](https://sdam-au.github.io/sdam-au/)