In this document, we show that the Wilcoxon-Mann-Whitney test is comparable or superior to alternative methods.
Two alternative methods can be compared with the Wilcoxon-Mann-Whitney (WMW) test used by BioQC: the Kolmogorov-Smirnov (KS) test and Student’s t-test, or more specifically Welch’s t-test, which assumes neither equal sample sizes nor equal variances and is therefore appropriate for gene expression studies.
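As a minimal sketch of how the three tests can be applied to the same data, the following uses only base R (`wilcox.test`, `ks.test`, `t.test`) rather than BioQC's own optimized implementation. The vectors `background` and `signature`, their sizes, and the mean shift are illustrative assumptions, and the tests are run two-sided for simplicity.

```r
# Hypothetical example data; names, sizes and effect size are assumptions.
set.seed(1)
background <- rnorm(2000, mean = 0, sd = 1)  # expression of non-signature genes
signature  <- rnorm(50,   mean = 1, sd = 1)  # expression of signature genes

# Wilcoxon-Mann-Whitney test (rank-based)
p_wmw <- wilcox.test(signature, background)$p.value

# Two-sample Kolmogorov-Smirnov test (compares empirical distributions)
p_ks <- ks.test(signature, background)$p.value

# Welch's t-test: var.equal = FALSE is the default of t.test,
# written out here to make the choice explicit
p_welch <- t.test(signature, background, var.equal = FALSE)$p.value

print(c(WMW = p_wmw, KS = p_ks, Welch = p_welch))
```

All three calls accept unequal group sizes, which matches the setting of a small signature tested against a large background.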
Based on these considerations, BioQC implements a computationally efficient version of the WMW test. To avoid confusing end users, no alternative methods are implemented.
Nevertheless, to demonstrate the power of the WMW test in comparison with the KS-test and the t-test, we ran the sensitivity benchmark described in the simulation studies for each of the two alternative tests.
 
Figure 1: Sensitivity benchmark. Expression levels of genes in the ovary signature are sampled randomly from normal distributions with different mean values. The lines show the enrichment score for the Wilcoxon-Mann-Whitney test, the t-test, and the Kolmogorov-Smirnov test, respectively. In the right panel, outliers were introduced by adding a random value to 1% of the simulated genes.
As expected, the results suggest that both the KS-test and the WMW-test are robust to noise, while the performance of the t-test drops markedly on noisy data. Additionally, the WMW-test appears to be superior to the KS-test for small expression differences.
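The kind of simulation summarized above can be sketched in base R as follows. This is not the code behind Figure 1: the helper `simulate_once`, the number of genes, the signature size, the outlier scale (`sd = 50`), and the use of `-log10(p)` as an enrichment score are all assumptions made for illustration.

```r
set.seed(42)

# Hypothetical helper: simulate one sample and return the p-values of the
# three tests for a signature whose genes are shifted by 'mu'.
simulate_once <- function(mu, n_genes = 2000, n_sig = 50, outlier_frac = 0) {
  expr <- rnorm(n_genes)                     # background expression, N(0, 1)
  sig_idx <- seq_len(n_sig)                  # first n_sig genes form the signature
  expr[sig_idx] <- rnorm(n_sig, mean = mu)   # signature genes drawn from N(mu, 1)

  # Optionally add a random value to a fraction of all genes (outliers).
  n_out <- round(outlier_frac * n_genes)
  if (n_out > 0) {
    out_idx <- sample(n_genes, n_out)
    expr[out_idx] <- expr[out_idx] + rnorm(n_out, sd = 50)  # assumed outlier scale
  }

  x <- expr[sig_idx]    # signature genes
  y <- expr[-sig_idx]   # all remaining genes
  c(mu    = mu,
    wmw   = wilcox.test(x, y)$p.value,
    ks    = ks.test(x, y)$p.value,
    welch = t.test(x, y)$p.value)
}

mus   <- c(0, 0.5, 1, 2)                                      # mean shifts to test
clean <- t(sapply(mus, simulate_once))                        # no outliers
noisy <- t(sapply(mus, simulate_once, outlier_frac = 0.01))   # 1% outliers

# Assumed enrichment score: -log10 of the p-value
score_clean <- -log10(clean[, c("wmw", "ks", "welch")])
score_noisy <- -log10(noisy[, c("wmw", "ks", "welch")])
print(round(cbind(mu = mus, score_clean), 2))
print(round(cbind(mu = mus, score_noisy), 2))
```

A single draw per mean value is noisy; averaging the scores over many repetitions per setting would give smoother curves comparable to those in a sensitivity plot.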
Because the KS-test is slow, we did not replicate the sensitivity benchmark from the simulation studies using the enrichment score rank. While BioQC takes about 3 seconds on a single thread to test all 155 signatures, the KS-test already takes about 2 seconds to test a single signature. The benchmark output below reports elapsed time in seconds summed over 5 replications.
##       test replications elapsed relative
## 2  runKS()            5  11.411    1.000
## 1 runWMW()            5  16.116    1.412

## R version 4.1.0 (2021-05-18)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.2 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.13-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.13-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    parallel  stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] ggplot2_3.3.3        plyr_1.8.6           reshape2_1.4.4      
##  [4] hgu133plus2.db_3.2.3 rbenchmark_1.0.0     gplots_3.1.1        
##  [7] gridExtra_2.3        latticeExtra_0.6-29  lattice_0.20-44     
## [10] org.Hs.eg.db_3.13.0  AnnotationDbi_1.54.0 IRanges_2.26.0      
## [13] S4Vectors_0.30.0     testthat_3.0.2       limma_3.48.0        
## [16] RColorBrewer_1.1-2   BioQC_1.20.0         Biobase_2.52.0      
## [19] BiocGenerics_0.38.0  knitr_1.33          
## 
## loaded via a namespace (and not attached):
##  [1] httr_1.4.2             sass_0.4.0             pkgload_1.2.1         
##  [4] edgeR_3.34.0           bit64_4.0.5            jsonlite_1.7.2        
##  [7] gtools_3.8.2           bslib_0.2.5.1          assertthat_0.2.1      
## [10] highr_0.9              blob_1.2.1             GenomeInfoDbData_1.2.6
## [13] yaml_2.2.1             pillar_1.6.1           RSQLite_2.2.7         
## [16] glue_1.4.2             digest_0.6.27          XVector_0.32.0        
## [19] colorspace_2.0-1       htmltools_0.5.1.1      pkgconfig_2.0.3       
## [22] zlibbioc_1.38.0        purrr_0.3.4            scales_1.1.1          
## [25] jpeg_0.1-8.1           tibble_3.1.2           KEGGREST_1.32.0       
## [28] farver_2.1.0           generics_0.1.0         ellipsis_0.3.2        
## [31] cachem_1.0.5           withr_2.4.2            magrittr_2.0.1        
## [34] crayon_1.4.1           memoise_2.0.0          evaluate_0.14         
## [37] fansi_0.4.2            tools_4.1.0            lifecycle_1.0.0       
## [40] stringr_1.4.0          munsell_0.5.0          locfit_1.5-9.4        
## [43] Biostrings_2.60.0      compiler_4.1.0         jquerylib_0.1.4       
## [46] GenomeInfoDb_1.28.0    caTools_1.18.2         rlang_0.4.11          
## [49] grid_4.1.0             RCurl_1.98-1.3         labeling_0.4.2        
## [52] bitops_1.0-7           rmarkdown_2.8          gtable_0.3.0          
## [55] DBI_1.1.1              R6_2.5.0               dplyr_1.0.6           
## [58] utf8_1.2.1             fastmap_1.1.0          bit_4.0.4             
## [61] rprojroot_2.0.2        KernSmooth_2.23-20     desc_1.3.0            
## [64] stringi_1.6.2          Rcpp_1.0.6             vctrs_0.3.8           
## [67] png_0.1-7              tidyselect_1.1.1       xfun_0.23