If you use this package, please cite both the package and the paper that first presented the algorithm that you used (see below for details). Cite the package as follows:
Marwick, B. and K. Krishnamoorthy 2019 cvequality: Tests for the Equality of Coefficients of Variation from Multiple Groups. R software package version 0.1.3. Retrieved from https://github.com/benmarwick/cvequality, on 07/01/2019
And reference it in your text similar to this example:
“We used the R package cvequality (Version 0.1.3; Marwick and Krishnamoorthy 2019) to test for significant differences [etc.].”
A BibTeX entry for LaTeX users is:
 @Manual{,
    title = {cvequality: Tests for the Equality of Coefficients of Variation from Multiple Groups},
    author = {Ben Marwick and Kalimuthu Krishnamoorthy},
    note = {R package version 0.1.3},
    url = {https://github.com/benmarwick/cvequality},
  }You can also access the citation details from the R console with citation(package = "cvequality")
The coefficient of variation statistic is a simple and widely-used standardized measure of the spread of a set of measurements of a sample. It is a useful value because it allows us to directly compare variation in samples measured with different units, or with very different means. For example, with coefficients of variation we can compare the relative variability of lengths with that of weights, and so forth. It is also known as as relative standard deviation (RSD), and defined as the standard deviation divided by the mean:
\[c_v = \frac{\sigma}{\mu}\]
The coefficient of variation was introduced by Karl Pearson in 1896. Since then it has become widely used in chemistry, engineering, physics, sociology, finance, and other fields. Despite its ubiquity, we lack convenient methods for testing if \(c_v\) values from different samples are equal or not. Numerous statistical methods have been described in publications, but until now, none have been implemented in an easy-to-use statistical package. This R package aims to remedy this situation by providing methods for testing for the equality of \(c_v\) values from different samples. This will help researchers make decisions when comparing coefficients of variation.
Note that there are some limitations to using \(c_v\) values. The main restriction is that \(c_v\) should only be computed for data measured on a ratio scale, as these are the measurements that can only take non-negative values. For example it is not appropriate to compare \(c_v\) of measurements made using Celsius and Fahrenheit, because these are interval scales that can take negative values and have arbitrary zero points. When the mean of a variable is zero, the \(c_v\) cannot be computed, and when the mean is close to zero and the variable contains both positive and negative values, then the \(c_v\) can be inaccurate. A lesser concern is that log-normal data have stationary \(c_v\) values, and so they are generally uninformative.
This package includes two tests:
We include the Feltz and Miller (1996) test because it is widely cited as an authoritative test for the equality of \(c_v\) values that performs better than many approximative tests. It is a ‘gold standard’ test for comparing multiple \(c_v\). Our implementation is based on the following from Feltz and Miller (1996):
Let \(s_i\) be the observed standard deviation of the \(i\)th sample (or group of measurements), \(x_i\) be the observed mean of the \(i\)th sample, and let
\[m_i=n_i-1\]
and we do not know the population \(c_v\), so let \(\widehat{\tau }_c\) be an estimate as follows:
\[\widehat{\tau }_c = \frac{\left( \sum _{j=1}^km_j\frac{S_j}{\bar{x}_j} \right)}{M}\]
where
\[M=\sum _{j=1}^m m_j\]
Then the test statistic can be computed with:
\[\begin{aligned} D'A D= \widehat{\tau }_c^{-2}\left( 0.5+\widehat{\tau }_c^2\right) ^{-1}\left[ \sum _{i=1}^km_i\left(\frac{s_i}{\bar{x}_i} \right)^2-\frac{\widehat{\tau }_c^2}{M} \right] \end{aligned}\]
The \(D'AD\) value measures how far each sample \(c_v\) is from our estimate of the population \(c_v\). Feltz and Miller (1996) note that the \(D'AD\) value distributes as a central \(\chi^2\) random variable with \(k-1\) degrees of freedom, from which a p-value can be computed.
We include the Krishnamoorthy and Lee (2014) test as a more recent development with lower rates of type I error, better performance with uneven sample numbers, and more power across a range of conditions. The formulae for the Krishnamoorthy and Lee (2014) are quite involved, and we do not reproduce them here, but refer the reader to the paper.
For each test there are two functions, which we demonstrate below. The first function uses raw measurement data as its inputs. The second function uses only summary statistics as its inputs, intended for use when the raw data are not available.
To install the package:
To demonstrate the use the functions with raw data we will use Galton’s (1886) observations of heights (in inches) of 934 children in 205 families. This is one of the datasets that Pearson used when he introduced the \(c_v\)
# read in the data, we obtained this from the HistData package
GaltonFamilies <- read.csv("GaltonFamilies.csv")
library(knitr)
kable(head(GaltonFamilies), caption = "Preview of first few rows of the GaltonFamilies data")| X | family | father | mother | midparentHeight | children | childNum | gender | childHeight | 
|---|---|---|---|---|---|---|---|---|
| 1 | 001 | 78.5 | 67.0 | 75.43 | 4 | 1 | male | 73.2 | 
| 2 | 001 | 78.5 | 67.0 | 75.43 | 4 | 2 | female | 69.2 | 
| 3 | 001 | 78.5 | 67.0 | 75.43 | 4 | 3 | female | 69.0 | 
| 4 | 001 | 78.5 | 67.0 | 75.43 | 4 | 4 | female | 69.0 | 
| 5 | 002 | 75.5 | 66.5 | 73.66 | 4 | 1 | male | 73.5 | 
| 6 | 002 | 75.5 | 66.5 | 73.66 | 4 | 2 | male | 72.5 | 
Do male and female children have different variation in their heights? Let’s have a look, first with a plot to get some intuition about the variation:
library(ggplot2)
library(ggbeeswarm)
ggplot(GaltonFamilies, 
       aes(gender, 
           childHeight)) +
  geom_boxplot() +
  geom_quasirandom(alpha = 0.05) +
  theme_bw()Their spreads look quite similar, let’s test our intuition, first with Feltz and Miller’s (1996) asymptotic test:
# test for difference in CVs between boys and girls
library(cvequality)
child_height_by_gender_cv_test <- 
with(GaltonFamilies, 
     asymptotic_test(childHeight,
                     gender))We can repeat the test with Krishnamoorthy and Lee’s (2014) modified signed-likelihood ratio test:
child_height_by_gender_cv_test_MSLRT <- 
with(GaltonFamilies, 
     mslr_test(nr = 1e4, 
               childHeight,
               gender))| Test name | Test statistic | p-value | 
|---|---|---|
| asymptotic | 0.44164 | 0.5063320 | 
| M-SLRT | 0.44164 | 0.5077081 | 
We conclude that the \(c_v\) values are very similar, and there is no difference in the variation of heights in male and female children.
For something a little different, let’s look at measurements of 600 stone artefacts from archaeological sites in Berkshire, England dating to the Lower Paleolithic (Fox and Baker 2006). We want to know if the length of the artefacts more variable than the other dimensions.
# read in the data, we obtained this from the archdata package
handaxes <- read.csv("Handaxes.csv")
# take a quick look
kable(head(handaxes), 
      caption = "Preview of first few rows of the handaxes data")| X | Catalog | L | L1 | B | B1 | B2 | T | T1 | 
|---|---|---|---|---|---|---|---|---|
| 1 | 896.4.1 | 134 | 69 | 79 | 65 | 57 | 41 | 22 | 
| 2 | 909.49.1 | 225 | 51 | 104 | 52 | 99 | 68 | 28 | 
| 3 | 909.49.2 | 198 | 51 | 104 | 55 | 102 | 45 | 18 | 
| 4 | 909.49.3 | 159 | 36 | 78 | 33 | 78 | 37 | 13 | 
| 5 | 909.50.7 | 173 | 75 | 112 | 92 | 90 | 57 | 25 | 
| 6 | 909.63.36 | 101 | 35 | 68 | 28 | 64 | 28 | 10 | 
We need to reshape the data from the one-artefact-per-line format to a long format where the variable names are in one column, and the value for that variable is in another column:
library(dplyr)
library(tidyr)
handaxes_reshape <- 
  handaxes %>% 
  select(c(L, L1, B, B1, B2, T, T1)) %>% 
  gather(variable, value) %>% 
  na.omit
# take a quick look
kable(head(handaxes_reshape), 
      caption = "Preview of first few rows of the handaxes data")| variable | value | 
|---|---|
| L | 134 | 
| L | 225 | 
| L | 198 | 
| L | 159 | 
| L | 173 | 
| L | 101 | 
Let’s have a look to get an idea of the variation in each variable:
ggplot(handaxes_reshape, 
       aes(variable, 
           value)) +
  geom_boxplot() +
  geom_quasirandom(alpha = 0.02) +
  theme_bw()It looks like the L variable (length) has much greater variation that the others. Let’s test to see if this is due to chance or not:
handaxes_asymptotic_test <- 
with(handaxes_reshape, 
     asymptotic_test(value, 
                     variable))
handaxes_mlrt_test <- 
with(handaxes_reshape, 
     mslr_test(nr = 1e4, 
               value, 
               variable))| Test name | Test statistic | p-value | 
|---|---|---|
| asymptotic | 309.1919 | 0 | 
| M-SLRT | 309.1919 | 0 | 
We see low p-values (rounded to zero in the table the asymptotic test p-value is 8.76610110^{-64}), suggesting that the variation in \(c_v\) values across different measurements is not due to chance. Length is more variable than the other dimensions of the handaxes.
In some circumstances we do not have access to the individual measurement data. For the situations we provide versions of the functions that take summary statistics as input.
Here’s an example using the same data that Feltz and Miller (1996) present in their paper. They provide mean, \(c_v\), and n for three groups of measurements:
miller <- data.frame(test = c('ELISA', 'WEHI', '`Viral inhibition`'),
                     Mean = c(6.8, 8.5, 6.0),
                     CV =   c(0.090, 0.462, 0.340),
                     N =    c(5, 5, 5))To test for the equality of these \(c_v\) values, we need the standard deviations of the observations, which are easily obtained:
We can plot to see what to expect from the test:
ggplot(miller,
       aes(test,
           Mean)) +
  # points to show mean values
  geom_point(size = 4) +
  # lines to show standard deviations
  geom_linerange(aes(ymin = Mean - SD,
                     ymax = Mean + SD)) +
  theme_bw()Looking at the plot, we might expect that the ELISA values have significantly less variation that the other two groups. Let’s see what the tests suggest:
miller_asymptotic_test2 <- 
asymptotic_test2(k = nrow(miller), 
                 n = miller$N, 
                 s = miller$SD, 
                 x = miller$Mean)The asymptotic test statistic is 5.5304524, which gives a p-value of 0.0629619. Assuming a typical value of 0.05 for \(\alpha\), this test does not lead us to reject the hypothesis of no difference in variance. So we would conclude that the ELISA values are no less variable than the other two groups.
Let’s see how the modified signed-likelihood ratio test performs:
This results in for the test statistic, and a p-value of 0.036453. This p-value would lead us to reject the hypothesis of no difference in variance. So unlike the asymptotic test, this M-SLR test would conclude that the ELISA values are significantly less variable than the other two groups, consistent with our observations of the plot.
We can use set.seed() before we run the function to ensure that we get the same output each time.
Here is some example data:
set.seed(42)
df <- data.frame(x = c(rnorm(20, 5, 2), 
                       rnorm(20, 5, 4)), 
                 y = c(rep(1, 20), 
                       rep(2, 20)))Here is the first run of the test, using set.seed()
## $MSLRT
## [1] 5.827931
## 
## $p_value
## [1] 0.01577366And here is a second run of the same test, note that the set.seed() function must immediately preceed the statistic function:
## $MSLRT
## [1] 5.827931
## 
## $p_value
## [1] 0.01577366Let’s scale it up to see if this is robust. We can re-run the same test 10 times with set.seed() before each run, and if the results are reproduible then we should expect to get the same output each time. We can test this by counting how many unique values are in the output of all the runs combined - it should be just one.
# repeat it n times to see how the results vary
n <- 10
reps <- replicate(n, {set.seed(42); mslr_test(nr = 10000, x = df$x, y = df$y)})
# take a look
stat <- unlist(reps[seq(1, length(reps), 2)])
pvals <- unlist(reps[seq(2, length(reps), 2)])
how_many_unique_values <- data.frame(unique_stat_values = length(unique(stat)),
                                     unique_p_values = length(unique(pvals)))
kable(how_many_unique_values)| unique_stat_values | unique_p_values | 
|---|---|
| 1 | 1 | 
Feltz, C. J., & Miller, G. E. (1996). An asymptotic test for the equality of coefficients of variation from k populations. Statistics in Medicine, 15(6), 647-658. https://www.ncbi.nlm.nih.gov/pubmed/8731006
Fox, William and Tony Baker. 2006. Dimensions of 600 Acheulean Handaxes from Furze Platt, Maidenhead, Berkshire, England. Online at https://ele.net/acheulean/FPatROM.htm.
Galton, F. (1886). Regression Towards Mediocrity in Hereditary Stature Journal of the Anthropological Institute, 15, 246-263. https://www.jstor.org/stable/2841583
Krishnamoorthy, K., & Lee, M. (2014). Improved tests for the equality of normal coefficients of variation. Computational Statistics, 29(1-2), 215-232. https://link.springer.com/article/10.1007/s00180-013-0445-2
Pearson, K. (1896). Mathematical Contributions to the Theory of Evolution. III. Regression, Heredity, and Panmixia. Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character, 187, 253-318. Retrieved from https://www.jstor.org/stable/90707