---
title: "Adding New Data-Generating Mechanisms"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Adding New Data-Generating Mechanisms}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

This vignette explains how to add new data-generating mechanisms (DGMs) to the `PublicationBiasBenchmark` package, using the `no_bias` DGM as a running example. (See the [Using Presimulated Datasets](Using_Presimulated_Datasets.html) vignette for details on working with the simulated datasets that are already stored.)

## Overview

Each DGM in the package consists of three key components:

1. **Main DGM function**: Implements the data-generating mechanism
2. **Validation function**: Validates input parameters and settings
3. **Conditions function**: Defines pre-specified conditions

All three functions must be implemented in a single file named `dgm-{DGM_NAME}.R` in the `R/` directory. Once these three functions are implemented, users can generate data from the DGM via the [`simulate_dgm()`](../reference/simulate_dgm.html) function.

## File Structure and Naming

For a DGM called "no_bias", create a file named `R/dgm-no_bias.R` containing three functions:

- `dgm.no_bias()`: The main data-generating mechanism implementation
- `validate_dgm_setting.no_bias()`: Parameter validation
- `dgm_conditions.no_bias()`: Pre-defined conditions

The naming pattern is crucial: the package's S3 method dispatch finds each function through the `.{DGM_NAME}` suffix, so a misnamed function will not be found.

## 1. Main DGM Function: `dgm.{DGM_NAME}()`

This is the core function that implements your data-generating mechanism. Here is the `no_bias` implementation as an example:

```{r, eval=FALSE}
#' @title Normal Unbiased Data-Generating Mechanism
#'
#' @description
#' An example data-generating mechanism to simulate effect sizes without
#' publication bias.
#'
#' @param dgm_name DGM name (automatically passed)
#' @param settings List containing \describe{
#'   \item{mean_effect}{Mean effect}
#'   \item{heterogeneity}{Effect heterogeneity}
#'   \item{n_studies}{Number of effect size estimates}
#' }
#'
#'
#' @return Data frame with \describe{
#'   \item{yi}{effect size}
#'   \item{sei}{standard error}
#'   \item{ni}{sample size}
#'   \item{es_type}{effect size type}
#' }
#'
#' @references
#' \insertAllCited{}
#'
#' @seealso [dgm()], [validate_dgm_setting()]
#' @export
dgm.no_bias <- function(dgm_name, settings) {

  # Extract settings
  n_studies     <- settings[["n_studies"]]
  mean_effect   <- settings[["mean_effect"]]
  heterogeneity <- settings[["heterogeneity"]]

  # Simulate sample sizes from a negative binomial distribution
  # truncated to [N_low, N_high]
  N_shape <- 2
  N_scale <- 58
  N_low   <- 25
  N_high  <- 500

  N_seq <- seq(N_low, N_high, 1)
  N_den <- stats::dnbinom(N_seq, size = N_shape, prob = 1/(N_scale+1)) /
    (stats::pnbinom(N_high, size = N_shape, prob = 1/(N_scale+1)) -
     stats::pnbinom(N_low - 1, size = N_shape, prob = 1/(N_scale+1)))

  N <- sample(N_seq, n_studies, TRUE, N_den)

  # Compute standard errors based on sample sizes (Cohen's d formula)
  standard_errors <- sqrt(4/N)

  # Simulate observed effect sizes: the variance combines between-study
  # heterogeneity (a standard deviation, hence squared) and sampling error
  effect_sizes <- stats::rnorm(n_studies, mean_effect, sqrt(heterogeneity^2 + standard_errors^2))

  # Return standardized data frame
  # (es_type added to match the required columns listed below; "SMD" follows
  # from the Cohen's d standard error formula used above)
  data <- data.frame(
    yi      = effect_sizes,
    sei     = standard_errors,
    ni      = N,
    es_type = "SMD"
  )

  return(data)
}
```

### Key Requirements for the Main Function:

**Input Parameters:**

- `dgm_name`: Automatically passed by the framework
- `settings`: Named list containing all DGM parameters or the `condition_id` value

**Output:**

Must return a data frame with these **required columns**:

- `yi`: Effect sizes
- `sei`: Standard errors
- `ni`: Sample sizes
- `es_type`: Type of effect size (e.g., "SMD", "logOR", "none")

**Optional additional columns** (commonly used):

- `study_id`: Unique identifier for each study/cluster (for multilevel/clustered data)

## 2. Validation Function: `validate_dgm_setting.{DGM_NAME}()`

This function checks that all required parameters are provided and have valid values:

```{r, eval=FALSE}
#' @export
validate_dgm_setting.no_bias <- function(dgm_name, settings) {

  # Check that all required settings are specified
  required_params <- c("n_studies", "mean_effect", "heterogeneity")
  missing_params  <- setdiff(required_params, names(settings))
  if (length(missing_params) > 0)
    stop("Missing required settings: ", paste(missing_params, collapse = ", "))

  # Extract settings for validation
  n_studies     <- settings[["n_studies"]]
  mean_effect   <- settings[["mean_effect"]]
  heterogeneity <- settings[["heterogeneity"]]

  # Validate each parameter
  # (is.wholenumber() is not a base R function; it is assumed here to be a
  # helper available inside the package)
  if (length(n_studies) != 1 || !is.numeric(n_studies) || is.na(n_studies) || !is.wholenumber(n_studies) || n_studies < 1)
    stop("'n_studies' must be an integer larger than 0")
  if (length(mean_effect) != 1 || !is.numeric(mean_effect) || is.na(mean_effect))
    stop("'mean_effect' must be numeric")
  if (length(heterogeneity) != 1 || !is.numeric(heterogeneity) || is.na(heterogeneity) || heterogeneity < 0)
    stop("'heterogeneity' must be non-negative")

  return(invisible(TRUE))
}
```

### Key Points for Validation:

- Check for missing required parameters
- Validate parameter types (numeric, integer, character, etc.)
- Check parameter ranges and constraints
- Provide clear, informative error messages
- Return `invisible(TRUE)` on successful validation
- Use `stop()` for validation failures

## 3. Conditions Function: `dgm_conditions.{DGM_NAME}()`

This function defines pre-specified conditions for benchmarking studies:

```{r, eval=FALSE}
#' @export
dgm_conditions.no_bias <- function(dgm_name) {

  # Generate a grid of pre-specified settings
  settings <- data.frame(expand.grid(
    mean_effect   = c(0, 0.3),
    heterogeneity = c(0, 0.15),
    n_studies     = c(10, 100)
  ))

  # Attach unique condition identifiers
  settings$condition_id <- seq_len(nrow(settings))

  return(settings)
}
```

Always add a `condition_id` column with unique identifiers. This column is used to generate data from the pre-defined conditions. Once defined, these settings must not be changed retroactively; this ensures the reproducibility and continuity of the benchmark.

## Using Your New DGM

Once implemented, your DGM can be used through a unified interface:

```{r, eval=FALSE}
# Use with custom settings
data <- simulate_dgm("no_bias", list(
  mean_effect   = 0.2,
  heterogeneity = 0.1,
  n_studies     = 50
))
head(data)

# Use with pre-defined conditions
data <- simulate_dgm("no_bias", settings = 1)
head(data)

# View available conditions
conditions <- dgm_conditions("no_bias")
conditions
```
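
Before relying on a new DGM, it can help to confirm that its output matches the required columns listed under "Key Requirements for the Main Function". The following is a minimal sketch of such a check. `check_dgm_output()` is a hypothetical helper written for this illustration, not a function provided by `PublicationBiasBenchmark`, and the toy data frames merely stand in for the result of a `simulate_dgm()` call.

```{r, eval=FALSE}
# Hypothetical helper (not part of the package): returns a character vector
# describing every problem found; an empty vector means the output looks fine.
check_dgm_output <- function(data, n_studies = NULL) {

  if (!is.data.frame(data))
    return("output is not a data frame")

  required_columns <- c("yi", "sei", "ni", "es_type")
  problems         <- character(0)

  # Required columns must all be present
  missing_columns <- setdiff(required_columns, names(data))
  if (length(missing_columns) > 0)
    problems <- c(problems, paste0("missing columns: ", paste(missing_columns, collapse = ", ")))

  # Standard errors must be finite and strictly positive
  if ("sei" %in% names(data) && any(!is.finite(data$sei) | data$sei <= 0))
    problems <- c(problems, "'sei' must be finite and positive")

  # Optionally compare the number of rows with the requested number of studies
  if (!is.null(n_studies) && nrow(data) != n_studies)
    problems <- c(problems, "number of rows differs from 'n_studies'")

  problems
}

# Toy data standing in for DGM output
good <- data.frame(yi = c(0.10, -0.05), sei = c(0.20, 0.15), ni = c(100, 178), es_type = "SMD")
bad  <- good[, c("yi", "sei")]

check_dgm_output(good, n_studies = 2)  # no problems: character(0)
check_dgm_output(bad,  n_studies = 2)  # reports the missing 'ni' and 'es_type' columns
```

Returning all problems at once, rather than stopping at the first one, makes it easier to fix a new DGM in a single pass. The same checks could equally be written as `stop()` calls if a hard failure is preferred.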