--- title: "BayLum: Specification of the YAML config file" author: "Sebastian Kreutzer" date: 'Updated for BayLum package version >= `r packageVersion("BayLum")` (`r Sys.Date()`)' output: rmarkdown::html_vignette: toc: yes toc_depth: 4 number_sections: yes vignette: > %\VignetteIndexEntry{BayLum: Specification of the YAML config file} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, echo = FALSE, message = FALSE} knitr::opts_chunk$set(comment = "") options(max.print = 100) library(BayLum) ``` # Background and scope In earlier versions of `'BayLum'` measurement data and settings had to be prepared using a multiple folder structure comprising various CVS and BIN/BINX files in a very particular way. This concept proved error-prone and left a lot of frustrated `'BayLum'` users behind, who sometimes spent hours trying to understand unclear error messages and then realising that there was a typo in one of the CSV files, or the folder structure was not precisely how `'BayLum'` expected it to be found. By switching to a single-configuration file users have more options while the settings are cleaner and less scatters over different files in numerous subfolders. The parameter naming follows the naming convention used in the "old" `'BayLum'` CSV-files. The purpose of this document is the specification and description of the [YAML](https://en.wikipedia.org/wiki/YAML) (file ending `*.yml`) configuration file used by the function `create_DataFile()` to provide data input and settings to the `'BayLum'` modelling. The YAML file is an alternative and a future replacement of the previous folder structure with various CSV files required by the functions `Generate_DataFile()` and `Generate_DateFile_MG()`. # Key concepts The configuration file uses the [YAML](https://en.wikipedia.org/wiki/YAML) format, which uses indention to nest different parameters. Please see the cited documentation for details, for `'BayLum'` the following features stick out: * Only a single configuration file is needed * The path of the file on your hard drive does not matter * Measurement data can be stored wherever you like to have them, although it makes sense to pool them into one folder for the analysis * One sample is one record in the YAML file. If data of one sample is scattered over different measurement files (e.g., BIN/BINX), one sample still has only one record. # Examples and detailed specifications ## A single sample entry A single sample entry appears as follows: ``` - sample: "samp1" files: - "/yourhardrive/yourfolder/sample_one.binx" settings: dose_points: null dose_source: { value: 0.1535, error: 0.00005891 } dose_env: { value: 2.512, error: 0.05626 } rules: beginSignal: 6 endSignal: 8 beginBackground: 50 endBackground: 55 beginTest: 6 endTest: 8 beginTestBackground: 50 endTestBackground: 55 inflatePercent: 0.027 nbOfLastCycleToRemove: 1 ``` Each new record (aka sample) starts with a `-` and the indention as shown above. Furthermore: * Indention rules need to be followed strictly * Each record has three levels: top level, settings level, rules level and this nesting needs to be kept * Only parameters shown in the examples are allowed, and the list of parameters need to be complete (i.e. you cannot just remove a parameter) * Sample names need to be **unique** Other than that, if you keep these simple rules in mind, you will have an easy time preparing your `'BayLum'` analysis. ## Multiple records While the single record makes an easy case, you probably have more than one sample to be thrown into the modelling. Although the number of records is not limited, we keep it simple here; a two records entry (dots replace the entries as shown above): ``` # this is a coment for sample number one - sample: "samp1" files: - "/yourhardrive/yourfolder/sample_one.binx" settings: dose_points: null dose_source: { value: 0.1535, error: 0.00005891 } dose_env: { value: 2.512, error: 0.05626 } rules: beginSignal: 6 . . . nbOfLastCycleToRemove: 1 # this is a coment for sample number two - sample: "samp2" files: null settings: dose_points: null dose_source: { value: 0.1535, error: 0.00005891 } dose_env: { value: 2.512, error: 0.05626 } rules: beginSignal: 6 . . . nbOfLastCycleToRemove: 1 ``` As you can see from the example, you can also add comments to the records, which start with `#`. The two records also show different entries for argument `files`. In the first case, a file path is given, while for record number two, `files` is set to `null`. Both options are possible. In the first case, the record specifies where the measurement data can be found. In the second case it is assumed that an R object with the name `samp2` can be found in the global environment of your R session. ## Paramter specifcation ### Top level (`sample`, `files`) The top level has two parameters: #### `sample` This parameter specifies the name of the sample. This **name must be unique** and is ideally free of non-ASCII characters and white space. #### `files` This parameter can be `null` (`files` is the only parameter that can be set to `null`) or is followed by a set of `-` with the path to the measurement file given in quotes. The number of entries under `files` is not limited. Example: ``` files: - "/yourhardrive/yourfolder/sample_one_a.binx" - "/yourhardrive/yourfolder/sample_one_b.binx" ``` If the entry is `null`, the function `BayLum::create_DataFile()` that uses the settings from the YAML file will assume that R objects with the name specified in `sample` are available in the global session environment. For instance, files are imported and treated with `Luminescence::read_BIN2R(...) |> subset(...)` or similar. Setting `files` to `null` gives you all options to pre-process your measurement data and is the recommended mode of operation. If `files` comes with file path entries, then `BayLum::create_DataFile()` will try to import those files using the appropriate import functions. This is very convenient, however, except for minimal filtering (e.g., removing non-OSL and non-IRSL curves), the measurement data remain untreated, and `BayLum::create_DataFile()` expects that all data are complete (e.g., identical number of curves), without error and strictly follow the SAR structure. ### `settings` level The `settings` level allows you to specify the dose rate of your source used for the irradiation in Gy/s (`dose_source`) and the environmental dose rate in Gy/ka (`dose_env`). Each value needs to be provided with its uncertainty, as shown in the example: ``` settings: dose_points: null dose_source: { value: 0.1535, error: 0.00005891 } dose_env: { value: 2.512, error: 0.05626 } ``` Additionally, you can set specify the regeneration dose points (in s). The default is `null`, because irradiation times are automatically extracted from the data by `create_DataFile()`. However, this information might be missing or, more likely, wrong and it is very cumbersome to fix those numbers manually in the measurement data. Therefore the dose points can be provided with the config file: ``` settings: dose_points: [10, 20, 50, 0, 10] dose_source: { value: 0.1535, error: 0.00005891 } dose_env: { value: 2.512, error: 0.05626 } ``` The example corresponds to 5 (five) regeneration dose points of 10 s, 20 s, ..., 10 s. *Note* - *You should add the values as you have specified them in the measurement sequence, except for the natural dose point (0 s) and the test dose points, which must not be added.* - *The provided vector will be shortened automatically to fit the actual number of dose points.* - *An error will be thrown if you provide not enough dose points* ### `rules` level The rules level enables you to provide a couple of parameters, which are used in Bayesian modelling. PARAMETER | TYPE | COMMENT ----------------- | ------------| -------------------------------------------- `beginSignal` | integer | Channel number start OSL signal integral ($L_x$) `endSignal` | integer | Channel number end OSL signal integral ($L_x$) `beginBackground` | integer | Channel number start OSL background integral ($L_x$) `endBackground` | integer | Channel number end OSL background integral ($L_x$) `beginTest` | integer | Channel number start OSL signal integral ($T_x$) `endTest` | integer | Channel number end OSL signal integral ($T_x$) `beginTestBackground` | integer | Channel number start OSL background integral ($T_x$) `endTestBackground` | integer | Channel number end OSL background integral ($T_x$) `inflatePercent` | double | Additional overdispersion value to inflate the uncertainty in percentage `nbOfLastCycleToRemove` | integer | Number of SAR cycles to be removed from the measurement file Example: ``` rules: beginSignal: 6 endSignal: 8 beginBackground: 50 endBackground: 55 beginTest: 6 endTest: 8 beginTestBackground: 50 endTestBackground: 55 inflatePercent: 0.027 nbOfLastCycleToRemove: 1 ``` Please ensure that the set values correspond to your measurement data. For instance, if your OSL curve has only 100 channels (data points), it does not make sense to set larger integral settings (e.g., 1000), and such a setting will lead to an error. Integral values for ($L_x$) and ($T_x$) are usually set to identical values unless you have good reasons to use different integral settings. ## Final remarks ## Auto-generate the config file using `write_YAMLConfigFile()` To ease the generation of configuration files for many samples, you can use the function `write_YAMLConfigFile()`. The function has two different operation modes, which are shown below. Important is to note that the function does not seem to have function parameters, because all parameters are extracted from a reference file within the package. All parameters in the reference file are allowed. However, you can only preset each parameter for all records, except for the parameter `sample`. The length of this parameter (e.g., `write_YAMLConfigFile(sample = c("a1", "a2))`) determines the number of records in the configuration file output. ### Show available parameters In this mode, the function displays available parameters in the terminal and returns a list that can be modified in R and then passed to `create_DataFile()`: ```{r, collapse=TRUE} l <- write_YAMLConfigFile() ``` ```{r} str(l) ``` ### Write YAML file Alternatively, the function can be used to generate a config file with preset values. You can then modify the generated YAML file with any text-editor. ```{r, eval=FALSE} l <- write_YAMLConfigFile(output_file = "") ``` ## Internals The YAML settings file is loaded and processed by the function `BayLum::create_DataFile()` using `yaml::read_yaml()` from the R package `'yaml'`. `yaml::read_yaml()` returns a `list` on R, which is then processed by `BayLum::create_DataFile()`. Sometimes it makes sense to modify the settings on the fly in R. To avoid import and export of YAML files, `BayLum::create_DataFile()` always tries to process the input of the parameter `BayLum::create_DataFile(config_file, ...)` as a `list` before trying to load a YAML file from the hard drive. While this option is usually unnecessary, this information may help in more complex R scripts.