---
title: "Tutorial of QualityMeasure functions"
author: "Kenneth Nieser"
output:  rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Tutorial of QualityMeasure functions}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 8, 
  fig.height = 6
)
```
This vignette shows how to use the functions in `QualityMeasure` to analyze quality measure performance and reliability. In the first half, we demonstrate how to use functions for measures without risk adjustment. In the second half, we show how to incorporate risk adjustment into the analyses.
```{r setup}
library(QualityMeasure)
```
------------------------------------------------------------------------
# Example #1: Measures without risk adjustment
## Simulate data
We'll simulate some data to use for the example analyses in this section.
```{r, eval = FALSE}
### Data parameters
n.entity = 100  # number of accountable entities
n.obs = 50      # average number of observations per accountable entity
mu = .2         # marginal probability of the outcome
r = .7          # median reliability
### Simulate data
example_df_1 <- simulateData(n.entity = n.entity, n.obs = n.obs, mu = mu, r = r)
```
## Provider profiling analyses
The `calcPerformance()` function can be used to obtain some basic provider profiling results, including confidence intervals for entity-level rates.
```{r}
profiling.results <- calcPerformance(df = example_df_1)
perf.results <- profiling.results$perf.results
```
Several control parameters are available through the `controlPerf()` parameter in `calcPerformance()`, including:
-   `min.n` which filters out entities with sample sizes below the specified amount; default is 2.
-   `alpha` which specifies the statistical significance level; default is 0.05.
The estimated marginal probability of the outcome is `r round(profiling.results$marg.p, 2)`. Recall that the true marginal probability according to our data-generation process is 0.2.
Using the `perf.results` dataframe, you can obtain summary statistics of the performance distribution. We see that the outcome rate varies from `r round(min(perf.results$p), 3)` to `r round(max(perf.results$p), 3)` with a median entity-level rate of `r round(median(perf.results$p), 3)`.
```{r, results = 'asis', echo = FALSE}
rate.summary = rbind(summary(perf.results$p))
perf.summary = cbind(Method = c('Unadjusted'), round(rate.summary, 3))
perf.summary = as.data.frame(perf.summary)
knitr::kable(perf.summary, caption = 'Performance summary statistics across entities')
```
### Number of observations per entity
```{r}
plotN(perf.results$n)
```
### Unadjusted rates across entities
```{r}
plotPerformance(df = perf.results)
```
Categorization of entities according to whether rates are lower, higher, or no different from the overall average can also be found in the `perf.results` output from `calcPerformance()` function.
```{r, results = 'asis', echo = FALSE}
knitr::kable(table(perf.results$category.p), col.names = c('Category', 'Number of entities'), caption = 'Categorization of entities based on their outcome rates')
```
### Plot of ORs comparing each entity with the average
Another way to assess whether entities have measure performance that differs from the national average is to examine the predicted random intercept values. These values likely will show fewer outliers, given that estimates are shrunken towards the mean.
Entities highlighted in red have estimated ORs that are statistically different from 1.0 at alpha = 0.05. There are `r sum(perf.results$intercept.sig==1)` entities with outcome odds that are significantly different from the average entity performance.
```{r}
plotPerformance(df = perf.results, plot.type = 'OR')
```
 
## Beta-binomial-based reliability estimates
You can calculate reliability estimates based on a Beta-Binomial model by using the `calcBetaBin()` function.
```{r, eval = FALSE}
tutorial_BB_results <- calcBetaBin(df = example_df_1)
```
The estimated alpha is `r round(tutorial_BB_results$alpha, 3)` and beta is `r round(tutorial_BB_results$beta, 3)`, which you can find in `tutorial_BB_results`.
Summary statistics of the reliability distribution:
```{r}
summary(tutorial_BB_results$est.BB)
```
Alternatively, if you have data that are aggregated by entity, you can calculate Beta-Binomial estimates by using
```{r, eval = FALSE}
df.agg <- data.frame(n = aggregate(y ~ entity, data = example_df_1, length)$y,
                     x = aggregate(y ~ entity, data = example_df_1, sum)$y)
tutorial_BB_agg_results <- calcBetaBin(df = df.agg, df.aggregate = T, n = 'n', x = 'x')`
```
where `n` is the column of sample sizes for each entity and `x` is the column of counts of observations meeting measure numerator criteria for each entity.
Again, the estimated alpha is `r round(tutorial_BB_agg_results$alpha, 3)` and beta is `r round(tutorial_BB_agg_results$beta, 3)`, and the distribution of reliability estimates is summarzied by:
```{r, echo = FALSE}
summary(tutorial_BB_agg_results$est.BB)
```
These estimates match our results from analyzing the data before aggregation.
## Calculate reliability from all methods
If you would like to estimate reliability using the split-sample method, Beta-Binomial method, and hierarchical logistic regression method at once, you can use the `calcReliability()` function. 
Again, minimum sample size per entity can be specified through the `min.n` parameter in `controlPerf()`. In addition, several control parameters are available through the `controlRel()` function:
-   `n.resamples` specifies the number of resamples to use for the calculation of the permutation split-sample reliability estimate; default is 100.
-   `n.cores` specifies the number of cores to use for parallel processing when calculating the split-sample estimates; default is 2.
-   `SSR.method` specifies whether the permutation method (i.e., resampling without replacement) or bootstrap method (i.e., resampling with replacement) should be used for split-sample reliability estimates. Default is the permutation method.
```{r, eval = FALSE}
tutorial_reliability_results_1 <- calcReliability(df = example_df_1, ctrRel = controlRel(n.resamples = 100))
```
```{r, results = 'asis', echo = FALSE}
rel.results <- tutorial_reliability_results_1$rel.results
rel.results.sub <- rel.results[,c('method', 'reliability', 'reliability_min', 'reliability_max')]
rel.results.sub$reliability <- round(rel.results.sub$reliability, 3)
rel.results.sub$reliability_min <- round(rel.results.sub$reliability_min, 3)
rel.results.sub$reliability_max <- round(rel.results.sub$reliability_max, 3)
names(rel.results.sub) <- c('Method', 'Reliability', 'Min Reliability', 'Max Reliability')
knitr::kable(rel.results.sub, caption = 'Reliability estimates')
```
```{r}
plotReliability(tutorial_reliability_results_1)
```
------------------------------------------------------------------------
# Example #2: Measures with risk adjustment
In this next part of the tutorial, we will work with an example measure where risk adjustment is required.
## Simulate data
We can use the built-in function `simulateData()` to simulate data from a hierarchical logistic regression model with covariates for risk adjustment. The simulated data will include a continuous covariate `x1` which is sampled from a standard Normal distribution.
```{r, eval = FALSE}
### Data parameters 
n.entity = 100  # number of accountable entities
n.obs = 50 # average number of patients/cases per accountable entity
mu = .2 # marginal probability of the outcome
r = .7 # reliability for entity with an median number of patients
beta1 = log(1.5) # parameter for risk adjustment model---coefficient for x1 which is simulated from a standard Normal
### Simulate data 
example_df_2 <- simulateData(n.entity = n.entity, n.obs = n.obs, mu = mu, r = r, beta1 = beta1)
```
```{r, echo = FALSE, results = 'asis'}
knitr::kable(head(example_df_2, 10), caption = 'Simulated data with a covariate')
```
## Fit risk adjustment model to the data
To incorporate risk adjustment, we specify a model to use for risk adjustment:
```{r}
model = 'y ~ x1 + (1 | entity)'
```
This is a model that adjusts for `x1` and includes a random intercept for `entity`.
### Estimates of model parameters
```{r}
model.perf <- model_performance(df = example_df_2, model = model)
plotEstimates(model.perf)
```
The estimated value of the regression coefficient, after exponentiating is, `r round(model.perf$model.results$est,2)` with a 95% confidence interval ranging from `r round(model.perf$model.results$lb,2)` to `r round(model.perf$model.results$ub,2)`.
### Discrimination
The estimated c-statistic is `r round(model.perf$c.statistic,2)`, and below is a plot of the distributions of predicted probabilities separately for observations with and without the outcome occurring.
```{r}
plotPredictedDistribution(model.perf)
```
### Calibration
Below is a plot of the observed rate of the outcome for quantiles of the predicted probabilities from the model. The number of quantiles can be specified through the `quantiles` argument of the `plotCalibration()` function.
```{r}
plotCalibration(model.perf, quantiles = 5)
```
## Provider profiling analyses
The `calcPerformance()` function can be used to obtain some basic provider profiling results.
This analysis generates confidence intervals for entity-level performance. Confidence intervals for the observed-to-expected (O/E) standardized rates are Wilson score intervals, while the confidence intervals for the predicted-to-expected (P/E) standardized rates use a parametric bootstrapping method. The number of bootstraps and the number of cores to use for parallel processing can be adjusted with the `controlPerf()` function within `calcPerformance()`.
```{r, eval = FALSE}
perf.out <- calcPerformance(df = example_df_2, model = model, ctrPerf = controlPerf(n.boots = 1000, n.cores = 2))
tutorial_profiling_results <- perf.out$perf.results
```
The estimated marginal probability of the outcome is `r round(mean(example_df_2$y), 2)`.
```{r, echo = FALSE, results = 'asis'}
knitr::kable(tutorial_profiling_results$perf.summary, caption = 'Performance summary statistics across entities')
```
### Number of observations per entity
```{r}
plotN(tutorial_profiling_results$n)
```
### Unadjusted rates across entities
```{r}
plotPerformance(df = tutorial_profiling_results)
```
### OE-risk-standardized rates across entities
```{r}
plotPerformance(df = tutorial_profiling_results, plot.type = 'oe')
```
Categorization of entities according to whether their risk-standardized rates are lower, higher, or no different from the overall, unadjusted average can be found in the `perf.results` output from `calcPerformance()` function.
```{r, echo = FALSE, results = 'asis'}
knitr::kable(table(tutorial_profiling_results$category.oe), col.names = c('Category', 'Number of entities'), caption = 'Categorization of entities based on OE-risk-standardized rates')
```
### PE-risk-standardized rates across entities
```{r}
plotPerformance(df = tutorial_profiling_results, plot.type = 'pe')
```
```{r, echo = FALSE, results = 'asis'}
knitr::kable(table(tutorial_profiling_results$category.pe), col.names = c('Category', 'Number of entities'), caption = 'Categorization of entities based on PE-risk-standardized rates')
```
### Plot of ORs comparing each entity with the average
Another way to assess whether entities have measure performance that differs from the average is to examine the predicted random intercept values. Entities highlighted in red have estimated ORs that are statistically different from 1.0 at alpha = 0.05. There are `r sum(tutorial_profiling_results$intercept.sig==1)` entities with outcome adjusted odds that are significantly different from the average entity performance.
```{r}
plotPerformance(df = tutorial_profiling_results, plot.type = 'OR')
```
 
## Calculate reliability from all methods
Again, to calculate reliability estimates from all methods, we use the `calcReliability()` function. This time we specify the risk adjustment model as follows:
```{r, eval = FALSE}
tutorial_reliability_results_2 <- calcReliability(df = example_df_2, model = model)
```
Note that while the results from the Beta-Binomial method are included, these estimates currently do not account for risk adjustment variables.
```{r, echo = FALSE, results = 'asis'}
rel.results2 <- tutorial_reliability_results_2$rel.results
rel.results.sub2 <- rel.results2[,c('method', 'reliability', 'reliability_min', 'reliability_max')]
rel.results.sub2$reliability <- round(rel.results.sub2$reliability, 3)
rel.results.sub2$reliability_min <- round(rel.results.sub2$reliability_min, 3)
rel.results.sub2$reliability_max <- round(rel.results.sub2$reliability_max, 3)
names(rel.results.sub2) <- c('Method', 'Reliability', 'Min Reliability', 'Max Reliability')
knitr::kable(rel.results.sub2, caption = 'Reliability estimates')
```
```{r}
plotReliability(tutorial_reliability_results_2)
```