
Sccomp is a generalised method for differential composition and variability analyses.
Bioconductor
if (!requireNamespace("BiocManager")) install.packages("BiocManager")
BiocManager::install("sccomp")
GitHub
devtools::install_github("stemangiola/sccomp")
sccomp can model changes in composition and variability. The formula for variability is either ~ 1 (the default), which assumes that the
cell-group variability is independent of any covariate, or ~ factor_of_interest, which assumes that the variability depends on the
factor of interest only. The variability model must be a subset of the model for composition.
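The subset requirement can be checked before fitting. The following is a minimal sketch in base R; `is_formula_subset` is a hypothetical helper written for illustration and is not part of sccomp, and `age` is a made-up covariate name.

```r
# Hypothetical helper (not part of sccomp): TRUE when every term of the
# variability formula also appears in the composition formula.
is_formula_subset <- function(formula_variability, formula_composition) {
  variability_terms <- attr(terms(formula_variability), "term.labels")
  composition_terms <- attr(terms(formula_composition), "term.labels")
  all(variability_terms %in% composition_terms)
}

# Intercept-only variability has no terms, so it is always a subset
is_formula_subset(~ 1, ~ type)     # TRUE
is_formula_subset(~ type, ~ type)  # TRUE
# `age` is not in the composition formula, so this is not a valid pair
is_formula_subset(~ age, ~ type)   # FALSE
```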
single_cell_object |>
  sccomp_glm( 
    formula_composition = ~ type, 
    .sample =  sample, 
    .cell_group = cell_group, 
    bimodal_mean_variability_association = TRUE,
    cores = 1 
  )
counts_obj |>
  sccomp_glm( 
    formula_composition = ~ type, 
    .sample = sample,
    .cell_group = cell_group,
    .count = count, 
    bimodal_mean_variability_association = TRUE,
    cores = 1 
  )
## # A tibble: 72 × 18
##    cell_group parameter   factor c_lower c_eff…¹ c_upper   c_pH0   c_FDR c_n_eff
##    <chr>      <chr>       <chr>    <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
##  1 B1         (Intercept) <NA>    0.871    1.06   1.23   0       0         3701.
##  2 B1         typecancer  type   -1.20    -0.869 -0.520  2.50e-4 4.17e-5   4461.
##  3 B2         (Intercept) <NA>    0.406    0.697  0.963  2.50e-4 1.32e-5   4330.
##  4 B2         typecancer  type   -1.23    -0.773 -0.343  5.00e-3 6.67e-4   4862.
##  5 B3         (Intercept) <NA>   -0.621   -0.386 -0.144  6.70e-2 3.73e-3   4531.
##  6 B3         typecancer  type   -0.619   -0.228  0.143  4.43e-1 1.37e-1   4835.
##  7 BM         (Intercept) <NA>   -1.29    -1.02  -0.760  0       0         5571.
##  8 BM         typecancer  type   -0.755   -0.348  0.0301 2.18e-1 3.83e-2   4645.
##  9 CD4 1      (Intercept) <NA>    0.129    0.322  0.507  1.10e-1 1.14e-2   4999.
## 10 CD4 1      typecancer  type   -0.0999   0.160  0.428  6.2 e-1 2.29e-1   4322.
## # … with 62 more rows, 9 more variables: c_R_k_hat <dbl>, v_lower <dbl>,
## #   v_effect <dbl>, v_upper <dbl>, v_pH0 <dbl>, v_FDR <dbl>, v_n_eff <dbl>,
## #   v_R_k_hat <dbl>, count_data <list>, and abbreviated variable name ¹c_effect
In the output table, the estimate columns prefixed with c_ refer to composition, and those prefixed with v_ refer to variability (when formula_variability is set).
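As an illustration of how the c_ columns can be used, the sketch below selects the cell groups whose composition effect passes a false-discovery-rate cut-off. The values are the typecancer rows copied from the printed output above; the 0.025 cut-off and the object names are assumptions chosen for this example.

```r
# typecancer rows copied from the printed tibble (first five cell groups)
res_subset <- data.frame(
  cell_group = c("B1", "B2", "B3", "BM", "CD4 1"),
  parameter  = "typecancer",
  c_effect   = c(-0.869, -0.773, -0.228, -0.348, 0.160),
  c_FDR      = c(4.17e-5, 6.67e-4, 1.37e-1, 3.83e-2, 2.29e-1),
  stringsAsFactors = FALSE
)

# Assumed cut-off for this example; adjust to your own analysis
fdr_threshold <- 0.025

# Keep the rows whose composition FDR is below the cut-off
significant <- res_subset[res_subset$c_FDR < fdr_threshold, ]
significant$cell_group  # "B1" "B2"
```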
seurat_obj |>
  sccomp_glm( 
    formula_composition = ~ 0 + type, 
    contrasts =  c("typecancer - typehealthy", "typehealthy - typecancer"),
    .sample = sample,
    .cell_group = cell_group, 
    bimodal_mean_variability_association = TRUE,
    cores = 1 
  )
## # A tibble: 60 × 18
##    cell_group     param…¹ factor c_lower c_eff…² c_upper   c_pH0   c_FDR c_n_eff
##    <chr>          <chr>   <chr>    <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
##  1 B immature     typeca… type    -2.01   -1.57   -1.13  0       0            NA
##  2 B immature     typehe… type     1.13    1.57    2.01  0       0            NA
##  3 B mem          typeca… type    -2.29   -1.71   -1.10  0       0            NA
##  4 B mem          typehe… type     1.10    1.71    2.29  0       0            NA
##  5 CD4 cm S100A4  typeca… type    -1.23   -0.879  -0.540 0       0            NA
##  6 CD4 cm S100A4  typehe… type     0.540   0.879   1.23  0       0            NA
##  7 CD4 cm high c… typeca… type     0.972   1.87    2.92  0       0            NA
##  8 CD4 cm high c… typehe… type    -2.92   -1.87   -0.972 0       0            NA
##  9 CD4 cm riboso… typeca… type     0.563   1.25    1.99  0.00125 2.81e-4      NA
## 10 CD4 cm riboso… typehe… type    -1.99   -1.25   -0.563 0.00125 2.81e-4      NA
## # … with 50 more rows, 9 more variables: c_R_k_hat <dbl>, v_lower <dbl>,
## #   v_effect <dbl>, v_upper <dbl>, v_pH0 <dbl>, v_FDR <dbl>, v_n_eff <dbl>,
## #   v_R_k_hat <dbl>, count_data <list>, and abbreviated variable names
## #   ¹parameter, ²c_effect
Whether a factor is associated with composition overall can be tested through model comparison with loo. In the following example, the model with association with factors fits the data better than the baseline model with no factor association. For such comparisons, check_outliers must be set to FALSE, because leave-one-out cross-validation requires both models to be fitted to the same data, which outlier elimination does not guarantee.
If elpd_diff is more than 5 se_diff away from zero, we can be confident that one model is better than the other.
In this case, -79.9 / 11.5 = -6.9, so we can conclude that model one, the one with factor association, is better than model two.
library(loo)
# Fit first model
model_with_factor_association = 
  seurat_obj |>
  sccomp_glm( 
    formula_composition = ~ type, 
    .sample =  sample, 
    .cell_group = cell_group, 
    check_outliers = FALSE, 
    bimodal_mean_variability_association = TRUE,
    cores = 1, 
    enable_loo = TRUE
  )
# Fit second model
model_without_association = 
  seurat_obj |>
  sccomp_glm( 
    formula_composition = ~ 1, 
    .sample =  sample, 
    .cell_group = cell_group, 
    check_outliers = FALSE, 
    bimodal_mean_variability_association = TRUE,
    cores = 1 , 
    enable_loo = TRUE
  )
# Compare models
loo_compare(
  model_with_factor_association |> attr("fit") |> loo(),
  model_without_association |> attr("fit") |> loo()
)
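The rule of thumb described above can be checked by hand. This sketch uses the elpd_diff and se_diff values quoted in the text; the variable names are illustrative only.

```r
# Values quoted in the text for the baseline model relative to the
# model with factor association
elpd_diff <- -79.9
se_diff   <- 11.5

# Distance of elpd_diff from zero, in units of its standard error
ratio <- elpd_diff / se_diff
ratio

# Rule of thumb from the text: more than 5 se_diff away from zero
abs(ratio) > 5  # TRUE
```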
We can also model the cell-group variability as dependent on the type, and so test for differences in variability.
res = 
  seurat_obj |>
  sccomp_glm( 
    formula_composition = ~ type, 
    formula_variability = ~ type,
    .sample = sample,
    .cell_group = cell_group,
    bimodal_mean_variability_association = TRUE,
    cores = 1 
  )
res
## # A tibble: 60 × 18
##    cell_group     param…¹ factor c_lower c_eff…² c_upper   c_pH0   c_FDR c_n_eff
##    <chr>          <chr>   <chr>    <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
##  1 B immature     (Inter… <NA>     0.599   0.969   1.35  0       0         6867.
##  2 B immature     typehe… type     0.979   1.50    2.03  0       0         4165.
##  3 B mem          (Inter… <NA>    -1.66   -1.08   -0.444 0.00500 5.28e-4   4900.
##  4 B mem          typehe… type     1.24    2.01    2.75  0       0         3757.
##  5 CD4 cm S100A4  (Inter… <NA>     1.76    2.01    2.26  0       0         7733.
##  6 CD4 cm S100A4  typehe… type     0.366   0.755   1.18  0.00275 9.00e-4   3432.
##  7 CD4 cm high c… (Inter… <NA>    -0.858  -0.380   0.135 0.235   3.30e-2   4519.
##  8 CD4 cm high c… typehe… type    -3.16   -1.35    1.60  0.203   4.21e-2   2869.
##  9 CD4 cm riboso… (Inter… <NA>     0.179   0.501   0.847 0.0360  2.71e-3   3690.
## 10 CD4 cm riboso… typehe… type    -2.47   -1.73   -0.791 0.00300 1.25e-3   2787.
## # … with 50 more rows, 9 more variables: c_R_k_hat <dbl>, v_lower <dbl>,
## #   v_effect <dbl>, v_upper <dbl>, v_pH0 <dbl>, v_FDR <dbl>, v_n_eff <dbl>,
## #   v_R_k_hat <dbl>, count_data <list>, and abbreviated variable names
## #   ¹parameter, ²c_effect
For single-cell RNA sequencing data, we recommend setting bimodal_mean_variability_association = TRUE. The bimodality of the mean-variability association can be confirmed from the plots$credible_intervals_2D (see below).
For CyTOF and microbiome data, we recommend setting bimodal_mean_variability_association = FALSE (the default).
plots = plot_summary(res) 
## Joining, by = c("cell_group", "sample")
## Joining, by = c("cell_group", "type")
## Warning: Expected 2 pieces. Additional pieces discarded in 4 rows [6, 7, 13,
## 14].
A plot of group proportions, faceted by group. The blue box plots represent the posterior predictive check. If the model is
descriptively adequate for the data, the blue box plots should roughly overlay the black box plots, which represent the observed data.
Outliers are coloured in red. A box plot is returned for every (discrete) covariate present in formula_composition. The colour coding
represents the significant associations for composition and/or variability.
plots$boxplot
## [[1]]
A plot of estimates of differential composition (c_) on the x-axis and differential variability (v_) on the y-axis. The error bars represent 95% credible intervals. The dashed lines represent the minimal effect that the hypothesis test is based on. An effect is labelled as significant if bigger than the minimal effect according to the 95% credible interval. Facets represent the covariates in the model.
plots$credible_intervals_1D
It is possible to directly evaluate the posterior distribution. In this example, we plot the Monte Carlo chain for the slope parameter of the first cell type. We can see that it has converged and is negative with probability 1.
res %>% attr("fit") %>% rstan::traceplot("beta[2,1]")
Plot the 1D significance plot.
plots = plot_summary(res)
## Joining, by = c("cell_group", "sample")
## Joining, by = c("cell_group", "type")
## Warning: Expected 2 pieces. Additional pieces discarded in 4 rows [6, 7, 13,
## 14].
plots$credible_intervals_1D
Plot the 2D significance plot. Data points are cell groups. Error bars are the 95% credible intervals. The dashed lines represent the default fold-change threshold for which the probabilities (c_pH0, v_pH0) are calculated. A pH0 of 0 represents the rejection of the null hypothesis that no effect is observed.
This plot is provided only if differential variability has been tested. The differential variability estimates are reliable only if the linear association between mean and variability for (Intercept) (left-hand side facet) is satisfied. A scatterplot (besides the Intercept) is provided for each category of interest. For each category of interest, the composition and variability effects should be generally uncorrelated.
plots$credible_intervals_2D
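To make the meaning of a threshold-based pH0 more concrete, here is a minimal sketch of how such a probability could be derived from posterior draws. It is an illustration under stated assumptions, not necessarily the computation sccomp performs: the helper `ph0_from_draws`, the threshold of 0.2 and the draws themselves are all made up for this example.

```r
# Hypothetical helper: one minus the larger of the two tail fractions,
# i.e. the share of draws that does NOT support an effect beyond the
# threshold in the dominant direction.
ph0_from_draws <- function(draws, threshold) {
  p_above <- mean(draws > threshold)
  p_below <- mean(draws < -threshold)
  1 - max(p_above, p_below)
}

threshold <- 0.2  # assumed fold-change threshold on the model scale

# All draws lie below -threshold: the null is rejected
clear_effect <- c(-1.2, -1.1, -1.0, -0.9, -0.8, -0.7, -0.6, -0.5)
ph0_from_draws(clear_effect, threshold)  # 0

# Draws straddle zero: most of them do not support a clear effect
weak_effect <- c(-0.6, -0.4, -0.3, -0.1, 0.0, 0.05, 0.1, 0.25)
ph0_from_draws(weak_effect, threshold)   # 0.625
```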
sessionInfo()
## R version 4.2.2 (2022-10-31)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.5 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.16-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.16-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] rstan_2.21.7         StanHeaders_2.21.0-7 tidyr_1.2.1         
## [4] forcats_0.5.2        ggplot2_3.4.0        sccomp_1.2.1        
## [7] dplyr_1.0.10        
## 
## loaded via a namespace (and not attached):
##  [1] bitops_1.0-7                matrixStats_0.63.0         
##  [3] RColorBrewer_1.1-3          GenomeInfoDb_1.34.4        
##  [5] tools_4.2.2                 utf8_1.2.2                 
##  [7] R6_2.5.1                    DBI_1.1.3                  
##  [9] BiocGenerics_0.44.0         colorspace_2.0-3           
## [11] withr_2.5.0                 sp_1.5-1                   
## [13] tidyselect_1.2.0            gridExtra_2.3              
## [15] prettyunits_1.1.1           processx_3.8.0             
## [17] compiler_4.2.2              progressr_0.12.0           
## [19] cli_3.5.0                   Biobase_2.58.0             
## [21] DelayedArray_0.24.0         labeling_0.4.2             
## [23] scales_1.2.1                readr_2.1.3                
## [25] callr_3.7.3                 stringr_1.5.0              
## [27] digest_0.6.31               XVector_0.38.0             
## [29] pkgconfig_2.0.3             parallelly_1.33.0          
## [31] MatrixGenerics_1.10.0       highr_0.9                  
## [33] rlang_1.0.6                 farver_2.1.1               
## [35] generics_0.1.3              inline_0.3.19              
## [37] RCurl_1.98-1.9              magrittr_2.0.3             
## [39] GenomeInfoDbData_1.2.9      loo_2.5.1                  
## [41] patchwork_1.1.2             Matrix_1.5-3               
## [43] Rcpp_1.0.9                  munsell_0.5.0              
## [45] S4Vectors_0.36.1            fansi_1.0.3                
## [47] lifecycle_1.0.3             stringi_1.7.8              
## [49] SummarizedExperiment_1.28.0 zlibbioc_1.44.0            
## [51] pkgbuild_1.4.0              grid_4.2.2                 
## [53] parallel_4.2.2              listenv_0.9.0              
## [55] ggrepel_0.9.2               crayon_1.5.2               
## [57] lattice_0.20-45             hms_1.1.2                  
## [59] knitr_1.41                  ps_1.7.2                   
## [61] pillar_1.8.1                GenomicRanges_1.50.2       
## [63] boot_1.3-28.1               future.apply_1.10.0        
## [65] codetools_0.2-18            stats4_4.2.2               
## [67] rstantools_2.2.0            glue_1.6.2                 
## [69] evaluate_0.19               SeuratObject_4.1.3         
## [71] RcppParallel_5.1.5          vctrs_0.5.1                
## [73] tzdb_0.3.0                  gtable_0.3.1               
## [75] purrr_1.0.0                 future_1.30.0              
## [77] assertthat_0.2.1            xfun_0.36                  
## [79] SingleCellExperiment_1.20.0 tibble_3.1.8               
## [81] IRanges_2.32.0              globals_0.16.2             
## [83] ellipsis_0.3.2