--- title: "Introduction to the Spower package" author: Phil Chalmers date: "`r format(Sys.time(), '%B %d, %Y')`" output: bookdown::html_document2: base_format: rmarkdown::html_vignette number_sections: false toc: true vignette: > %\VignetteIndexEntry{Introduction to the Spower package} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} --- ```{r include=FALSE} library(Spower) set.seed(42) formals(SpowerCurve)$plotly <- FALSE ``` ```{r include=FALSE} eval <- as.logical(Sys.getenv("SPOWER_EVAL")) if(is.na(eval)) eval <- FALSE # set to FALSE for normal run store <- list() if(!eval) store <- readRDS(system.file("intro.rds", package = 'Spower')) ``` ```{r include=eval} getwd() ``` This vignette provides a brief introduction to using the package `Spower` for prospective/post-hoc, a priori, sensitivity, criterion, and compromise power analyses. For a more detailed description of the package structure and philosophy please refer to associated publication (Chalmers, in press), as well as the documentation found within the functions; particularly, `Spower()`, `SpowerBatch()`, and `SpowerCurve()`. # Types of functions There are generally two ways to go about using `Spower`. The first is by utilizing one of a handful of **build-in simulation experiments** with an associated analysis, or, more flexibility (but less user friendly) by constructing a **user-defined simulation experiment** by way of writing simulation code encapsulated inside a function. In either case, the goal is to design R code to perform a given simulation experiment with a set of meaningful (often scalar) arguments, where the output from this function returns either: 1) A suitable $p$-value under the null hypothesis, $P(D|H_0)$, 2) The posterior probability of a (typically alternative) hypothesis, $P(H_1|D)$, or 3) A logical value indicating support for the hypothesis of interest, where the average of the `TRUE`/`FALSE` returns reflects something to do with statistical power (e.g., `TRUE` if rejecting the null; `TRUE` if supporting the alternative; some more complicated combination involving precision, multiple hypotheses, regions of practical equivalence/significance, etc). The package defaults primarily focus on the first approach criteria, as this is historically the most common in software (e.g., *GPower 3*), however nothing precludes `Spower` from more complex power analyses. See the vignette *"Logical Vectors, Bayesian power analyses and ROPEs"* for more advanced examples involving confidence intervals (CIs), parameter precision criterion, regions of practical equivalences (ROPEs), equivalence tests, Bayes Factors, and power analyses involving posterior probabilities. See also the vignette *"Type S and Type M errors"* for conditional power evaluation information to estimate sign (S) and magnitude (M) power information and their respective Type S and Type M errors (Gelman and Carlin, 2014). ## Built-in experiments `Spower` ships with several common statistical inference experiments, such as those involving linear regression models, mediation, ANOVAs, $t$-tests, and so on. The simulation experiments are organized with the prefix `p_`, followed by the name of the analysis method. For instance, ```{r} p_lm.R2(50, k=3, R2=.3) ``` performs a single simulation experiment reflecting the null hypothesis $H_0:\, R^2=0$ for a linear regression model with $k=3$ predictor variables and a sample size of $N=50$. 
Translating this into a power analysis simply requires passing the experiment to `Spower()` (the details of which are discussed below), where by default the estimate of power ($1-\hat{\beta}$) is returned.

```{r eval=eval}
p_lm.R2(50, k=3, R2=.3) |> Spower()
```

```{r echo=FALSE}
if(eval) store$R2ex <- getLastSpower()
print(store$R2ex)
```

Each of the `p_*` functions returns a $p$-value ($P(D|H_0)$), as this is the general information required to evaluate statistical power with `Spower()`. Alternatively, users may define their own simulation functions if the desired experiment has not been defined within the package.

## User-defined functions

As a very simple example, suppose one were interested in the power to reject the null hypothesis $H_0:\, \mu = \mu_0$ in a one-sample $t$-test scenario, where $P(D|H_0)$ is the probability of the data given the null hypothesis. Note that while the package already supports this type of analysis (see `help(p_t.test)`), it is instructive to see how users can write their own version of this experiment, as doing so will help when defining simulation experiments that are not currently available in the package.

```{r}
p_single.t <- function(n, mean, mu=0){
    g <- rnorm(n, mean=mean)
    p <- t.test(g, mu=mu)$p.value
    p
}
```

This simulation experiment first draws a sample of data from a Gaussian distribution with a specific `mean` ($\mu$), and then evaluates the conditional probability that the data were generated from a population with $\mu_0=0$ (the null; hence, $P(D|\mu_0)$). As such, the $p$-value returned from this experiment reflects the probability of observing the data given the null hypothesis $H_0:\, \mu=\mu_0$, or more specifically $H_0:\, \mu=0$, for a single generated dataset.

```{r}
# a single experiment
p_single.t(n=100, mean=.2)
```

From here, a suitable cut-off is required to evaluate whether the experiment was 'significant', which is the purpose of the parameter $\alpha$. Specifically, if the observed data ($D$) were plausibly drawn from a population with mean $\mu_0$ (hence, $P(D|\mu_0) \ge \alpha$) then a `FALSE` would be returned to indicate non-significance; otherwise, if the data were unlikely to have been observed given the null ($P(D|\mu_0) < \alpha$) then a `TRUE` would be returned, thereby indicating significance. For convenience, and for the purpose of other types of specialized power analyses (e.g., compromise analyses), the $\alpha$ parameter is controlled via the argument `Spower(..., sig.level = .05)`, which creates the `TRUE`/`FALSE` evaluations internally. This saves a step in the writing, though if users wish to define `sig.level` within the simulation experiment itself that is an acceptable approach as well.
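To make concrete what `Spower()` automates, note that a prospective power estimate is conceptually just the proportion of many such $p$-values that fall below `sig.level`. The following minimal base-R sketch (not evaluated here, and lacking `Spower()`'s conveniences such as Monte Carlo uncertainty estimates) illustrates the idea:

```{r eval=FALSE}
# manual prospective power estimate: proportion of p-values below alpha
pvals <- replicate(1000, p_single.t(n=100, mean=.2))
mean(pvals < .05)   # crude estimate of 1 - beta
```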
# Types of power analyses to evaluate

For power analyses there are typically four parameters that can be manipulated or solved in a given experiment: the $\alpha$ level (the Type I error rate, often reflexively set to $\alpha=.05$), the power (the complement of the Type II error rate, $1-\beta$), an effect size of interest, and the sample size. Given three of these values, the fourth can always be solved. Note that this description is a rule of thumb, as there may be multiple effect sizes of interest, multiple sample sizes (e.g., in the form of cluster sizes in multilevel models), and so on. In `Spower()`, switching between these types of power analysis criteria is done simply by explicitly specifying which parameter is missing (`NA`), as demonstrated below.

## Prospective/post-hoc power analysis

The canonical setup for `Spower()` evaluates prospective or post-hoc power, thereby obtaining the estimate $1-\hat{\beta}$. By default, `Spower()` uses $10,000$ independent simulation `replications` to estimate the power when this is the target criterion. The following provides an estimate of power given the null hypothesis $H_0:\, \mu=0.3$.

```{r eval=eval}
p_single.t(n=100, mean=.5, mu=.3) |> Spower() -> prospective
prospective
```

```{r echo=FALSE}
if(eval) store$prospective <- prospective
prospective <- store$prospective
print(prospective)
```

## Compromise power analysis

Compromise power analysis involves manipulating the $\alpha$ level until some sufficient balance between the Type I and Type II error rates is met, expressed in terms of the ratio $q=\frac{\beta}{\alpha}$. In `Spower`, there are two ways to approach this criterion. The first, which focuses on the `beta_alpha` ratio at the outset, requires passing the target ratio to `Spower()` using the same setup as the previous prospective power analysis.

```{r eval=eval}
p_single.t(n=100, mean=.5, mu=.3) |> Spower(beta_alpha=4) -> compromise
compromise
```

```{r echo=FALSE}
if(eval) store$compromise <- getLastSpower()
compromise <- store$compromise
print(compromise)
```

This returns the estimated `sig.level` ($\hat{\alpha}$) and the resulting $\hat{\beta}$ that satisfy the target $q$ ratio.

```{r}
# satisfies q = 4 ratio
with(compromise, (1 - power) / sig.level)
```

The second way to perform a compromise analysis is to re-use a previous evaluation of a prospective/post-hoc power analysis, as this contains all the necessary information.

```{r}
# using previous post-hoc/prospective power analysis
update(prospective, beta_alpha=4)
```

In either case, the use of `update()` can be beneficial, as the stored result can be reused with alternative `beta_alpha` values without having to generate and analyse new experimental data.

## A priori power analysis

The goal of an a priori power analysis is generally to obtain the sample size ($N$) associated with a specific power rate of interest (e.g., $1-\beta=.90$), which is useful when planning future data collection. To estimate the sample size using Monte Carlo simulation experiments, `Spower()` performs stochastic root solving using the ProBABLI approach from the `SimDesign` package (see `SimSolve()`), and requires a specific `interval` to search within. The width of the interval should reflect a plausible range in which the researcher believes the solution to lie, though this range may be quite large as ProBABLI is relatively insensitive to the width of the search interval (Chalmers, 2024). Below, the sample size `n` is solved to achieve a target power of $1-\beta=.80$, where the solution for $N$ is suspected to lie somewhere within the search `interval = c(20, 200)`.

```{r eval=eval}
p_single.t(n=NA, mean=.5) |> Spower(power=.8, interval=c(20,200))
```

```{r echo=FALSE}
if(eval) store$apriori <- getLastSpower()
print(store$apriori)
```
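As an optional sanity check, the solved sample size can be rounded and passed back through a prospective analysis to confirm that the estimated power lands near the .80 target. In the sketch below `n_solved` is a hypothetical placeholder; substitute the solution printed above.

```{r eval=FALSE}
n_solved <- 34   # hypothetical value; use the solved N printed above
p_single.t(n=n_solved, mean=.5) |> Spower()
```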
## Sensitivity power analysis

Sensitivity power analysis is similar to a priori power analysis, except that the goal is now to find the specific standardized or unstandardized effect size associated with a specific power rate. This pertains to the question of how large an effect must be in order to be reliably detected. Below, the sample size is fixed at $N=100$, while the interval for the standardized effect size $d$ is searched between `c(.1, 3)`. Note that the use of decimals in the interval tells `Spower()` to use a continuous rather than discrete search space (cf. a priori analyses, which use an integer search space for the sample size).

```{r eval=eval}
p_single.t(n=100, mean=NA) |> Spower(power=.8, interval=c(.1, 3))
```

```{r echo=FALSE}
if(eval) store$sensitive <- getLastSpower()
print(store$sensitive)
```

## Criterion power analysis

Finally, in criterion power analysis the goal is to locate the $\alpha$ level (`sig.level`) required to achieve a target power while holding the other modeling information constant. This is done in `Spower()` by setting the `sig.level` input to `NA` while providing values for the other parameters. Note that technically no search interval is required in this context, as $\alpha$ necessarily lies within the interval $[0,1]$.

```{r eval=eval}
p_single.t(n=50, mean=.5) |> Spower(power=.8, sig.level=NA)
```

```{r echo=FALSE}
if(eval) store$criterion <- getLastSpower()
print(store$criterion)
```

# Multiple power evaluation functions

The functions `SpowerBatch()` and `SpowerCurve()` can be used to evaluate and visualize power analysis results across a range of inputs rather than for a single set of fixed inputs. This section demonstrates their general usage, as their specifications differ slightly from those of `Spower()`, despite the fact that `Spower()` is the underlying estimation engine.

## SpowerBatch()

To begin, suppose there were interest in evaluating the `p_single.t()` function across multiple $n$ inputs to obtain estimates of $1-\beta$. While this could be performed using independent calls to `Spower()`, the function `SpowerBatch()` can be used instead, where the variable inputs are specified in a suitable vector format. For instance, given the effect size $\mu=.5$, what would the power be to reject the null hypothesis $H_0:\, \mu=0$ across three different sample sizes, $N = [30,60,90]$?

```{r eval=eval}
p_single.t(mean=.5) |> SpowerBatch(n=c(30, 60, 90)) -> prospective.batch
prospective.batch
```

```{r echo=FALSE}
if(eval) store$prospective.batch <- prospective.batch
prospective.batch <- store$prospective.batch
print(prospective.batch)
```

This result can further be coerced to a `data.frame` object if there is reason to do so (e.g., for plotting purposes, though see also `SpowerCurve()`).

```{r}
as.data.frame(prospective.batch)
```

Similarly, if the interest were related to a priori analyses for sample size planning, then the inputs to `SpowerBatch()` would be modified to set `n` as the missing quantity.

```{r eval=eval}
apriori.batch <- p_single.t(mean=.5, n=NA) |>
    SpowerBatch(power=c(.7, .8, .9), interval=c(20, 200))
apriori.batch
```

```{r echo=FALSE}
if(eval) store$apriori.batch <- apriori.batch
apriori.batch <- store$apriori.batch
print(apriori.batch)
```

```{r}
as.data.frame(apriori.batch)
```

## SpowerCurve()

Researchers often wish to visualize the results of power analyses in the form of graphical representations. `Spower` supports various types of visualizations through the function `SpowerCurve()`, which creates power curve plots from previously obtained results (e.g., via `SpowerBatch()`) or for to-be-explored inputs. Importantly, each graphic contains estimates of the Monte Carlo sampling uncertainty to deter over-interpretation of any resulting point estimates.
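For contrast with what `SpowerCurve()` automates, the `data.frame` coerced from the earlier `SpowerBatch()` result could also be plotted by hand with `ggplot2`. The sketch below assumes the coerced object contains columns named `n` and `power` (inspect the printed `data.frame` above to confirm) and, unlike `SpowerCurve()`, displays no Monte Carlo uncertainty information:

```{r eval=FALSE}
# manual power curve from the batch results; assumes n and power columns
library(ggplot2)
df <- as.data.frame(prospective.batch)
ggplot(df, aes(n, power)) + geom_point() + geom_line()
```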
To demonstrate, suppose one were interested in visualizing the power of the running single-sample $t$-test example across four different sample sizes, $N=[30,60,90,120]$. Doing so requires passing the simulation experiment and the varying information to the function `SpowerCurve()`, which fills the variable information into the supplied simulation experiment and plots the resulting output.

```{r eval=FALSE}
p_single.t(mean=.5) |> SpowerCurve(n=c(30, 60, 90, 120))
```

```{r echo=FALSE}
if(eval) store$gg1 <- p_single.t(mean=.5) |> SpowerCurve(n=c(30, 60, 90, 120))
print(store$gg1)
```

Alternatively, were the above information already evaluated using `SpowerBatch()`, then the resulting batch object (here, `prospective.batch`) could be passed directly to `SpowerCurve()`'s `batch` argument, thereby avoiding the need to re-evaluate the simulation experiments.

```{r eval=FALSE}
# pass previous SpowerBatch() object
SpowerCurve(batch=prospective.batch)
```

```{r echo=FALSE}
if(eval) SpowerCurve(batch=store$prospective.batch)
```

`SpowerCurve()` will accept as many arguments as exist in the supplied simulation experiment definition; however, it will only provide aesthetic mappings for the first three variable input specifications, as anything past this becomes difficult to display automatically. Below is an example that varies both the `n` input and the `mean` input, where the first appears on the $x$-axis while the second is mapped to the default colour aesthetic in `ggplot2`.

```{r eval=FALSE}
p_single.t() |> SpowerCurve(n=c(30, 60, 90, 120), mean=c(.2, .5, .8))
```

```{r echo=FALSE}
if(eval) store$gg2 <- p_single.t() |> SpowerCurve(n=c(30, 60, 90, 120), mean=c(.2, .5, .8))
print(store$gg2)
```

```{r include=FALSE, eval=eval}
saveRDS(store, '../inst/intro.rds') # rebuild package when done
```
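Finally, although not evaluated here, a third varying argument (for instance, the null value `mu` from `p_single.t()`) would receive the third automatic aesthetic mapping described above; which aesthetic is assigned to it is left to `SpowerCurve()`'s defaults.

```{r eval=FALSE}
# three varying inputs; the third receives the remaining automatic aesthetic
p_single.t() |>
    SpowerCurve(n=c(30, 60, 90, 120), mean=c(.2, .5, .8), mu=c(0, .1))
```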