if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install("Dune")

We use a subset of the Allen Smart-Seq nuclei dataset. Run ?Dune::nuclei for more details on pre-processing.
suppressPackageStartupMessages({
  library(RColorBrewer)
  library(dplyr)
  library(ggplot2)
  library(tidyr)
  library(knitr)
  library(purrr)
  library(Dune)
})
data("nuclei", package = "Dune")
theme_set(theme_classic())

We have a dataset of 1744 cells, with the results from three clustering algorithms: Seurat3, Monocle3 and SC3. The Allen Institute also produced hand-picked cluster and subclass labels. Finally, we include the coordinates from a t-SNE representation for visualization.
ggplot(nuclei, aes(x = x, y = y, col = subclass_label)) +
  geom_point()

We can also see how the three clustering algorithms initially partitioned the dataset:
walk(c("SC3", "Seurat", "Monocle"), function(clus_algo){
  df <- nuclei
  df$clus_algo <- nuclei[, clus_algo]
  p <- ggplot(df, aes(x = x, y = y, col = as.character(clus_algo))) +
    geom_point(size = 1.5) +
    # guides(color = FALSE) +
    labs(title = clus_algo, col = "clusters") +
    theme(legend.position = "bottom")
  print(p)
})

The adjusted Rand index (ARI) between the three methods can be computed.
plotARIs(nuclei %>% select(SC3, Seurat, Monocle))
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.

As we can see, the ARI between the three methods is initially quite low.
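For readers unfamiliar with the ARI, the following base R sketch computes it for two label vectors from their contingency table. The helper `adjusted_rand` and the toy vectors are hypothetical and written for illustration only; this is not how Dune computes the index internally.

```r
# Adjusted Rand index between two label vectors, written out in base R.
# `adjusted_rand` is a hypothetical helper for illustration, not part of Dune.
adjusted_rand <- function(a, b) {
  stopifnot(length(a) == length(b))
  tab <- table(a, b)                      # contingency table of the two partitions
  pairs <- function(n) n * (n - 1) / 2    # number of unordered pairs among n items
  sum_cells <- sum(pairs(tab))            # pairs grouped together in both partitions
  sum_rows  <- sum(pairs(rowSums(tab)))   # pairs grouped together in `a`
  sum_cols  <- sum(pairs(colSums(tab)))   # pairs grouped together in `b`
  expected  <- sum_rows * sum_cols / pairs(length(a))
  maximum   <- (sum_rows + sum_cols) / 2
  if (maximum == expected) return(1)      # degenerate case, e.g. a single cluster in both
  (sum_cells - expected) / (maximum - expected)
}

truth   <- c(1, 1, 1, 2, 2, 2)
same    <- c("b", "b", "b", "a", "a", "a")  # same partition, different label names
shifted <- c(1, 1, 2, 2, 2, 2)              # one cell assigned to the other cluster

ari_same    <- adjusted_rand(truth, same)
ari_shifted <- adjusted_rand(truth, shifted)
```

The index depends only on how cells are grouped, not on the label names, so `ari_same` reaches the maximum value of 1 while `ari_shifted` is lower.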
We can now try to merge clusters with the Dune function. With verbose = TRUE, the algorithm prints, at each step, which clustering is being modified (by name, for example SC3), followed by the pair of clusters that get merged.
merger <- Dune(clusMat = nuclei %>% select(SC3, Seurat, Monocle), verbose = TRUE)
## [1] "SC3" "21"  "20" 
## [1] "Monocle" "20"      "4"      
## [1] "SC3" "11"  "12" 
## [1] "SC3" "30"  "28" 
## [1] "SC3" "11"  "24"

The output from Dune is a list with five components:
names(merger)
## [1] "initialMat" "currentMat" "merges"     "ImpMetric"  "metric"

initialMat is the initial matrix of cluster labels. currentMat is the final matrix of cluster labels. merges is a matrix that recapitulates what has been printed above, while ImpMetric lists the improvement in the metric over the merges, and metric names the metric that was used.
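To make the idea of a merge concrete, here is a minimal base R sketch of a single greedy step on toy data: every pair of clusters in one partition is tentatively merged, and the pair that most increases the ARI with a reference partition is kept. The helpers `ari` and `best_merge` and the toy vectors are hypothetical, and the sketch assumes a single reference partition; it is an illustration of the principle, not Dune's implementation.

```r
# Compact adjusted Rand index (hypothetical helper, not Dune's implementation).
ari <- function(a, b) {
  tab <- table(a, b)
  pairs <- function(n) n * (n - 1) / 2
  expected <- sum(pairs(rowSums(tab))) * sum(pairs(colSums(tab))) / pairs(length(a))
  maximum  <- (sum(pairs(rowSums(tab))) + sum(pairs(colSums(tab)))) / 2
  (sum(pairs(tab)) - expected) / (maximum - expected)
}

# Try every pair of clusters in `labels`, merge it, and score the result
# against `reference`; return the merge with the largest ARI gain.
best_merge <- function(labels, reference) {
  candidates <- combn(sort(unique(labels)), 2)   # one column per pair of clusters
  gains <- apply(candidates, 2, function(pair) {
    merged <- ifelse(labels == pair[2], pair[1], labels)  # relabel pair[2] as pair[1]
    ari(merged, reference) - ari(labels, reference)
  })
  best <- which.max(gains)
  list(pair = candidates[, best], gain = gains[best])
}

reference <- c(1, 1, 1, 1, 2, 2, 2, 2)
oversplit <- c(1, 1, 3, 3, 2, 2, 2, 2)   # cluster 1 of `reference` split into 1 and 3
step <- best_merge(oversplit, reference)
```

On this toy input, `step$pair` identifies clusters 1 and 3 as the pair to merge, and `step$gain` is positive because merging them recovers the reference partition.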
We can now see how much the ARI has improved:
plotARIs(clusMat = merger$currentMat)
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.

The methods now look much more similar, as can be expected.
We can also see how the number of clusters got reduced.
plotPrePost(merger)

For SC3, for example, we can visualize how the clusters got merged:
ConfusionPlot(merger$initialMat[, "SC3"], merger$currentMat[, "SC3"]) +
  labs(x = "Before merging", y = "After merging")

Finally, the ARItrend function tracks the mean ARI improvement as pairs of clusters get merged.
ARItrend(merger)

sessionInfo()
## R version 4.1.1 (2021-08-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.3 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.14-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.14-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] Dune_1.6.0         purrr_0.3.4        knitr_1.36         tidyr_1.1.4       
## [5] ggplot2_3.3.5      dplyr_1.0.7        RColorBrewer_1.1-2
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.7                  gganimate_1.0.7            
##  [3] lattice_0.20-45             prettyunits_1.1.1          
##  [5] assertthat_0.2.1            digest_0.6.28              
##  [7] utf8_1.2.2                  R6_2.5.1                   
##  [9] GenomeInfoDb_1.30.0         stats4_4.1.1               
## [11] evaluate_0.14               highr_0.9                  
## [13] pillar_1.6.4                zlibbioc_1.40.0            
## [15] rlang_0.4.12                progress_1.2.2             
## [17] jquerylib_0.1.4             magick_2.7.3               
## [19] S4Vectors_0.32.0            Matrix_1.3-4               
## [21] rmarkdown_2.11              labeling_0.4.2             
## [23] BiocParallel_1.28.0         stringr_1.4.0              
## [25] RCurl_1.98-1.5              munsell_0.5.0              
## [27] DelayedArray_0.20.0         compiler_4.1.1             
## [29] xfun_0.27                   pkgconfig_2.0.3            
## [31] BiocGenerics_0.40.0         htmltools_0.5.2            
## [33] tidyselect_1.1.1            SummarizedExperiment_1.24.0
## [35] tibble_3.1.5                GenomeInfoDbData_1.2.7     
## [37] IRanges_2.28.0              matrixStats_0.61.0         
## [39] viridisLite_0.4.0           fansi_0.5.0                
## [41] aricode_1.0.0               crayon_1.4.1               
## [43] withr_2.4.2                 bitops_1.0-7               
## [45] grid_4.1.1                  jsonlite_1.7.2             
## [47] gtable_0.3.0                lifecycle_1.0.1            
## [49] DBI_1.1.1                   magrittr_2.0.1             
## [51] scales_1.1.1                stringi_1.7.5              
## [53] farver_2.1.0                XVector_0.34.0             
## [55] bslib_0.3.1                 ellipsis_0.3.2             
## [57] generics_0.1.1              vctrs_0.3.8                
## [59] tools_4.1.1                 Biobase_2.54.0             
## [61] glue_1.4.2                  tweenr_1.0.2               
## [63] hms_1.1.1                   MatrixGenerics_1.6.0       
## [65] parallel_4.1.1              fastmap_1.1.0              
## [67] yaml_2.2.1                  colorspace_2.0-2           
## [69] GenomicRanges_1.46.0        sass_0.4.0