---
title: "Logical Vectors, Bayesian power analyses, and ROPEs"
author: Phil Chalmers
date: "`r format(Sys.time(), '%B %d, %Y')`"
output: 
  bookdown::html_document2:
    base_format: rmarkdown::html_vignette
    number_sections: false
    toc: true
vignette: >
  %\VignetteIndexEntry{Logical Vectors, Bayesian power analyses, and ROPEs}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options: 
  chunk_output_type: console
  markdown: 
    wrap: 72
---

```{r include=FALSE}
library(Spower)
set.seed(42)
formals(SpowerCurve)$plotly <- FALSE
```

```{r include=FALSE}
eval <- as.logical(Sys.getenv("SPOWER_EVAL"))
if(is.na(eval)) eval <- FALSE  # set to FALSE for normal run
store <- list()
if(!eval)
	store <- readRDS(system.file("intro2.rds", package = 'Spower'))
```

```{r include=eval}
getwd()
```

# Logical returns

In many applications it can be advantageous to directly return
`logical` values in the simulation experiment rather than
letting `Spower()` perform these threshold transformations internally 
(e.g., using `sig.level`) as these can
include more intricate experimental result requirements. The following
showcases various ways that returning `logical` values works in the
`Spower` package, where the average across the `TRUE`/`FALSE` values 
reflects the target power estimate.

Note that returning a `logical` in the simulation
experiment necessarily implies that the `sig.level` argument in `Spower()`
and friends will not be used, and therefore suitable alternatives must
be defined within the context of the simulation experiment code
(e.g., including `conf.level` or `sig.level` in the simulation experiment function directly).

## Confidence (and credible) intervals

Keeping with the basic t-test experiment in the introduction vignette,
suppose we're interested in the power to reject the null hypothesis
$H_0:\, \mu = \mu_0$ in a one-sample $t$-test, where $P(D|H_0$)
is the probability of the observing the data given the null hypothesis. Normally, one
could simply write an experiment that returns a $p$-value in this
context, such as the following,

```{r}
p_single.t <- function(n, mean, mu=0){
	g <- rnorm(n, mean=mean)
	p <- t.test(g, mu=mu)$p.value
	p
}
```

However, an equivalent way to explore power in this context would be to
investigate the same null hypothesis via *confidence intervals* given a
specific $\alpha$ level to define their range, where $CI_\mu=[CI_{\alpha/2},CI_{1-\alpha/2}]$.

If one were to take this approach, the defined simulation function should return a
`logical` value based on the relation of the parameter estimate to the CI, 
where the CI is used to evaluate the
plausibility of $\mu = \mu_0$. Specifically, the CI is used to evaluate whether $\mu_0$ falls *outside*
the advertised interval, returning `TRUE` if outside the CI and `FALSE` if within the interval. 
Alternatively, if one were in a Bayesian analysis
context, a *credible interval* could be used instead of the confidence
interval to construct the same logical output.

The following code demonstrates this logic, assuming that $\alpha = .05$
(and therefore a two-tailed, 95% CI is used), and uses the
`is.outside_CI()` function to evaluate whether the $\mu_0$ parameter 
falls outside the estimated `CI` returned from `t.test()`.

```{r}
l_single.t <- function(n, mean, mu=0, conf.level=.95){
	g <- rnorm(n, mean=mean)
	out <- t.test(g, mu=mu, conf.level=conf.level)
	CI <- out$conf.int
	is.outside_CI(mu, CI)   # equivalent to: !(CI[1] < mu && mu < CI[2])
}

l_single.t(100, mean=.2)
```

Evaluating the power analysis with `Spower()` works out of the box now, noting again that
`l_single.t()` will ignore the `Spower(..., sig.level)` information altogether as it is no longer 
relevant when `logical` information is returned. The following compares both the $p$-value and logical CI approaches, both of which provide identical inferential information in this case (this will not always be true; the $t$-test simply reflects a special case).

```{r eval=eval}
p_single.t(n=100, mean=.3) |> Spower()
```

```{r echo=FALSE}
if(eval) store$pPower <- getLastSpower()
pPower <- store$pPower
print(pPower)
```

```{r eval=eval}
l_single.t(n=100, mean=.3) |> Spower()
```

```{r echo=FALSE}
if(eval) store$lPower <- getLastSpower()
lPower <- store$lPower
print(lPower)
```

### Using previouls defined simulation code

Note that even in the CI context presented above writing user-defined functions may not be entirely necessary. This is because the related, internally defined function `p_t.test()` can be used to obtain the same CI information by returning the model itself and subsequently extracting the `$conf.int` element. The benefit of this, as shown below, is that users do not need to reinvent the data generation and analysis portions of the experiment if this is already available (and well tested) in the package.

```{r}
l_single.t <- function(n, mean, mu=0, conf.level=.95){
	# return analysis output from t.test() for further extraction
	out <- p_t.test(n=n, d=mean, mu=mu, type='one.sample', 
					conf.level=conf.level, return_analysis=TRUE)
	CI <- out$conf.int
	is.outside_CI(mu, CI)
}

l_single.t(100, mean=.2)
```

## Precision criterion

Using confidence or credible intervals are also useful in contexts where
specific *precision* criteria are important to satisfy.
Suppose that, in addition to detecting a particular effect of interest
in a given sample, the results are only deemed "practically useful" if
the resulting effect size inference are sufficiently precise, where
precision could be based on the *magnitude of the SE*, the *width of the uncertainty interval*,
or other relevant precision-based criterion. In this case, one may join the logic of the $p$-value/CI approaches
presented thus far to create a joint evaluation for power, where a
result is deemed "significant and useful" if the tests are null hypothesis test is
rejected (in the $p$-value context) *and* the CI is sufficiently narrow.

As a working example, suppose that the above one-sample $t$-test experiment was generalized
such that a meaningfully significant result would require a) the rejection
of the null, $\mu_0=0$, and b) a CI width less than
1/4 standardized mean units. What value of $N$ would be required to obtain such a
significant and sufficiently accurate inference to obtain a power of 80% given the small effect size of $0.2$?

```{r}
l_precision <- function(n, mean, CI.width, mu=0, alpha=.05){
	g <- rnorm(n, mean=mean)
	out <- t.test(g, mu=mu)
	CI <- out$conf.int
	width <- CI[2] - CI[1]
	# return TRUE if significant and CI is sufficiently narrow
	out$p.value < alpha && width < CI.width   
}
```

```{r eval=eval}
l_precision(n=NA, mean=.2, CI.width=1/4) |> 
	Spower(power=.80, interval=c(10, 500))
```

```{r echo=FALSE}
if(eval) store$lprecision <- getLastSpower()
lprecision <- store$lprecision
print(lprecision)
```

Compared the required $N$ to an power analysis that just contains a
significant result this joint practical significance criteria requires a notably higher sample size. Note that in the special case where `CI.width=Inf` then all CI widths will be accepted, which will result in the same power output that would have been obtained using `p_single.t()`.

```{r eval=eval}
l_precision(n=NA, mean=.2, CI.width=Inf) |> 
	Spower(power=.80, interval=c(10, 500))
```

```{r echo=FALSE}
if(eval) store$lprecision2 <- getLastSpower()
lprecision2 <- store$lprecision2
print(lprecision2)
```

## Bayes Factors

If one were using a Bayesian analysis criteria rather than the $p$-value approach, the Bayes factor ($BF$)
ratio could be used in the `logical` return context too. For example,
returning whether the observed $BF>3$ in a given random sample would indicate
at least "moderate" supporting evidence for the hypothesis of interest
compared to some competing hypothesis (often the complementary null, $P(H_0|D)$,
though not necessarily), and the average across the independent samples would indicate the degree of power when using 
this Bayes factor cut-off.

The downside of focusing on BFs is that they require the computation of
the marginal likelihoods, typically via bridge sampling (e.g., via the
`bridgesampling` package), in addition to fitting the model using Markov
chain Monte Carlo (MCMC) methods (e.g., `brms`, `rstan`, `rstanarm`).
Though not a strict limitation per se, it is often more natural to focus
directly on the sample from posterior distribution for power analysis
applications rather than on the marginal Bayes factors; this is
demonstrated in the next section. Nevertheless, such applications are
possible with `Spower` if there is interest in doing so.

As a simple example, the following one-sample $t$-test initially defined
above could be redefined to focus on output from the `BayesFactor`
package, which returns the $BF$ criteria in log units (hence, `exp()` is
used to return the ratio to its original metric) assuming a non-informative Jeffreys prior for $\mu$.
In this case a `TRUE` is returned if the Bayes factor is greater than 3
and `FALSE` if less than or equal to 3. 

Finally, to ensure that nothing important is lost in the simulation experiment code 
a `data.frame()` object is returned instead of just the `logical` information, 
while `Spower()` is informed to only focus on the `logical` information for the purpose of the power computations.

```{r eval=eval}
l_single.Bayes.t_BF <- function(n, mean, mu=0, bf.cut=3){
	g <- rnorm(n, mean=mean)
	res <- BayesFactor::ttestBF(g, mu=mu)	
	bf <- exp(as.numeric(res@bayesFactor[1])) # Bayes factor
	data.frame(largeBF=bf > bf.cut, bf=bf)
}
```

Evaluating this simulation with $N=100$,
$\mu=.5$, and $\mu_0=.3$ gives the following power estimate.

```{r eval=eval}
l_single.Bayes.t_BF(n=100, mean=.5, mu=.3) |> Spower(select='largeBF') -> BFsim
BFsim
```

```{r echo=FALSE}
if(eval) store$prospectiveBF3 <- getLastSpower()
BFsim <- store$prospectiveBF3
print(BFsim)
```

To view the complete simulation results use `SimResults()` on the resulting output, which if useful could be further plotted. Note that when plotting Bayes factors it is advantageous to present the plot in natural log units.

```{r include=FALSE}
library(ggplot2)
```

```{r}
BFresults <- SimResults(BFsim)
BFresults

# use log-scale for Bayes factors as this is a more useful metric
library(ggplot2)
ggplot(BFresults, aes(log(bf), fill=largeBF)) + 
	geom_histogram(bins=50) + geom_vline(xintercept=log(3)) + 
	ggtitle('log(BF) distribution')
```

# Bayesian power analysis via posterior probabiltes

The canonical way that *Spower* has been designed focuses primarily
on $p$-values involving the null hypothesis to be tested ($P(D|H_0)$).
The reason for setting the package up this way is so that the parameter
$\alpha$ (`sig.level`) can be used as the "line-in-the-sand" threshold to
flag whether a null hypothesis was rejected in each sample of data as
this behaviour is common among popular power analysis software.
Bayesian power analysis, on the other hand, are also supported by the package,
where instead the posterior probability of the alternative
hypothesis, $P(H_1|D)$, is the focus of the simulation experiment. 

Continuing with the simple one-sample $t$-test example in
the introduction vignette and above, were the power analysis context be that of a
Bayesian analysis the conditional probability of the alternative,
$P(H_1|D)$, may be used instead. For this to work with *Spower* though,
the argument `sig.direction = 'above'` should be supplied, where now the 
`sig.level` indicates that "significance" only occurs when an probability
observation is *above* the define `sig.level` cutoff (hence, the default of `.05`
is no longer reasonable and should be modified).

Below is one such Bayesian approach using posterior probabilities using
the `BayesFactor` package, which is obtained by translating the Bayes
factor output into a suitable posterior probability and focusing on the
alternative hypothesis (hence, the posterior probability returned
corresponds to $P(\mu \ne \mu_0|D)$). The following also assumes that
the competing hypotheses are equally likely when obtaining the posterior
probability (hence, prior odds are 1:1, reflected in the argument `prior_odds`).

```{r eval=eval}
# assuming P(H1)/P(H0) are equally likely; hence, prior_odds = 1
pp_single.Bayes.t <- function(n, mean, mu, prior_odds = 1){
	g <- rnorm(n, mean=mean)
	res <- BayesFactor::ttestBF(g, mu=mu)	
	bf <- exp(as.numeric(res@bayesFactor[1])) # Bayes factor
	posterior_odds <- bf * prior_odds
	posterior <- posterior_odds / (posterior_odds + 1)   
	posterior   # P(H_1|D)
}
```

For the Bayesian $t$-test definition in the next code chunk
evaluation, "significance" is obtained whenever the sample posterior is
*greater* than `sig.level = .90`, demonstrating strong support of $H_1$.
Note that this is a more strict criteria than the null hypothesis
criteria presented in the introduction vignette, and therefore has
notably lower power.

```{r eval=eval}
# power cut-off for a significantly supportive posterior is > 0.90
pp_single.Bayes.t(n=100, mean=.5, mu=.3) |> 
	Spower(sig.level = .90, sig.direction = 'above')
```

```{r echo=FALSE}
if(eval) store$prospectiveBayes <- getLastSpower()
prospectiveBayes <- store$prospectiveBayes
print(prospectiveBayes)
```

With this approach all of the power analysis criteria described in
`help(Spower)` are still possible, where for instance solving other
experimental components (such as the sample size `n`) are easy to setup
by providing suitable `NA` argument flags and search intervals in `Spower()`.

# Regions of practical equivalence (ROPEs)

This section presents two related concepts for estimating the power
where some justifiable equivalence interval is of interest.

## Equivalence testing

As an alternative approach to the rejection of the null hypothesis via
the $p$-value or CI approaches, there may be interest in evaluating
power in the context of establishing *equivalence*, or in directional
cases *superiority* or *non-inferiority*. The purpose of an equivalence
tests is to establish that, although true differences may exist between groups, the differences are
small enough to be considered "practically equivalent" in all subsequent applications. 

As a running example, suppose that in an independent samples $t$-test 
the two groups might be considered "equivalent" if the true mean difference in the population is
somewhere above $\epsilon_L$ but below $\epsilon_U$, where the $\epsilon$s
are used to define the **equivalence interval**. If, for instance, two
groups are to be deemed statistically equivalent given these boundary locations then, 
using a two-one sided hypothesis testing approach (TOST), the two null hypotheses must be evaluated are
$$H_{0a}:\, (\mu_1 - \mu_2) \le -\epsilon_L$$ and
$$H_{0b}:\,(\mu_1 - \mu_2) \ge \epsilon_U$$ 
Rejecting both of these null
hypotheses leads to the induced complementary hypothesis of interest
$$H_1:\, \epsilon_L < (\mu_1 - \mu_2) < \epsilon_U$$ 
or in words, the population mean difference falls within the defined region of
equivalence. Superiority testing and non-inferiority testing follow the
same type of logic, however rather than defining a region of equivalence
only one tail of the equivalence interval is of interest.

To put numbers to the above expression, suppose that the true mean
difference between the groups was $\mu_d = \mu_2 - \mu_1 = 1$ (labeled `delta`), and each group had an $SD = 2.5$ (labeled `sds`).
Furthermore, suppose *any* true difference that fell within the equivalence interval $[-2.5, 2.5]$ (labeled `equiv`)
would be deemed practically equivalent a priori. The power to jointly reject the above null hypotheses, and therefore conclude the groups are practically equivalence ($H_1$), is evaluated in the following output for an experiment with $N=100$ observations ($n=50$ for each group).

```{r}
l_equiv.t <- function(n, delta, equiv, sds = c(1,1), 
					  sig.level = .025){
	g1 <- rnorm(n, mean=0, sd=sds[1])
	g2 <- rnorm(n, mean=delta, sd=sds[2])
	outL <- t.test(g2, g1, mu=-equiv[1], alternative = "less")$p.value
	outU <- t.test(g2, g1, mu=equiv[2], alternative = "less")$p.value
	outL < sig.level && outU < sig.level
}
```

```{r eval=eval}
l_equiv.t(50, delta=1, equiv=c(-2.5, 2.5), 
		  sds=c(2.5, 2.5)) |> Spower()
```

```{r echo=FALSE}
if(eval) store$equivT <- getLastSpower()
equivT <- store$equivT
print(equivT)
```

In this case, the power to conclude that the two groups are equivalent, expressed as a percentage, is `r paste0(round(equivT$power*100), '%')`.
You can verify that these computations are correct by comparing to
established software for now, such as via the `TOSTER` package.

```{r eval=FALSE}
TOSTER::power_t_TOST(n = 50,
			 delta = 1,
			 sd = 2.5,
			 eqb = 2.5,
			 alpha = .025,
			 power = NULL,
			 type = "two.sample")
```

```         
     Two-sample TOST power calculation 

          power = 0.8438747
           beta = 0.1561253
          alpha = 0.025
              n = 50
          delta = 1
             sd = 2.5
         bounds = -2.5, 2.5

NOTE: n is number in *each* group
```

Again, the same type of logic can be evaluated using `CI`s alone, and with the built-in `p_t.test()` function, where in this case `TRUE` is returned if the estimated  90% `CI` falls within the defined equivalence interval.

```{r}
l_equiv.t_CI <- function(n, delta, equiv, 
						 sds = c(1,1), conf.level = .95){
	out <- p_t.test(n, delta, sds=sds, conf.level=conf.level, 
					return_analysis=TRUE)
	is.CI_within(out$conf.int, interval=equiv)  # returns TRUE if CI is within equiv interval
}
```

```{r eval=eval}
# an equivalent power analysis for "equivalence tests" via CI evaluations
l_equiv.t_CI(50, delta=1, equiv=c(-2.5, 2.5), 
		  sds=c(2.5, 2.5)) |> Spower()
```

```{r echo=FALSE}
if(eval) store$equivTL <- getLastSpower()
equivTL <- store$equivTL
print(equivTL)
```

## Bayesian approach to ROPEs (HDI + ROPE)

Finally, though not exhaustively, one could approach the topic of
practical equivalence using Bayesian methods using draws from the posterior
distribution of interest, such as those available from BUGS or HMC samplers (e.g.,
`stan`). This approach is highly similar to the equivalence testing
approach described above, but uses highest density interval + ROPE in
Bayesian modeling instead. Below is one such example that constructs a
simple linear regression model with a binary $X$ term that is analysed
with `rstanarm::stan_glm()`.

```{r eval=eval}
library(bayestestR)
library(rstanarm)

rope.lm <- function(n, beta0, beta1, range, sigma=1, ...){
	# generate data
	x <- matrix(rep(0:1, each=n))
	y <- beta0 + beta1 * x + rnorm(nrow(x), sd=sigma)
	dat <- data.frame(y, x)
	
	# run model, but tell stan_glm() to use its indoor voice
	model <- quiet(rstanarm::stan_glm(y ~ x, data = dat))
	rope <- bayestestR::rope(model, ci=1, range=range, parameters="x")
	as.numeric(rope)
}
```

In the above example, the proportion of the sampled posterior
distribution that falls within the ROPE is returned, which works well
with the `sig.level` argument coupled with `sig.direction = 'above')` in
`Spower()` to define a suitable accept/reject cut-off. Specifically, if
`sig.level = .95` and `sig.direction = 'above')` then the ROPE will only
be accepted when the percentage of the posterior distribution that falls within the
defined ROPE is greater than .95. This can of course be performed
manually, returning a `TRUE` when satisfied and `FALSE` otherwise, 
however in this case it is not necessary.

Below reports a power estimate given $N=50\times 2=100$, where the ROPE
criteria is deemed satisfied/significant if 95% of the posterior
distribution for the $\beta_1=1$ falls within the defined range of
$1 \pm .2\rightarrow [.8,1.2]$. Due to the slower execution speeds of
the simulations the power evaluations are computed using `parallel=TRUE`
to utilize all available cores.

```{r eval=eval}
rope.lm(n=50, beta0=2, beta1=1, sigma=1/2, range=c(.8, 1.2)) |> 
	Spower(sig.level=.95, sig.direction='above', parallel=TRUE)
```

```{r echo=FALSE}
if(eval) store$ROPE.lm <- getLastSpower()
ROPE.lm <- store$ROPE.lm
print(ROPE.lm)
```

Finally, to demonstrate why this might be useful, the following
estimates the required sample size to achieve 80% power when using a 95%
HDI-ROPE criteria.

```{r eval=eval}
rope.lm(n=NA, beta0=2, beta1=1, sigma=1/2, range=c(.8, 1.2)) |> 
	Spower(power=.80, sig.level=.95, sig.direction='above',
		   interval=c(50, 200), parallel=TRUE)
```

```{r echo=FALSE}
if(eval) store$ROPE.lm_n <- getLastSpower()
ROPE.lm_n <- store$ROPE.lm_n
print(ROPE.lm_n)
```

```{r include=FALSE, eval=eval}
saveRDS(store, '../inst/intro2.rds') # rebuild package when done
```