scpdata 1.12.0
scpdata package

scpdata disseminates mass spectrometry (MS)-based single-cell
proteomics (SCP) data sets formatted using the scp data structure.
The data structure is described in the scp vignette.
In this vignette, we describe how to access the SCP data sets. To
start, we load the scpdata package.
library("scpdata")

ExperimentHub

The data is stored using the ExperimentHub infrastructure. We first
create a connection with ExperimentHub.

eh <- ExperimentHub()

You can list all data sets available in scpdata using the query
function.
query(eh, "scpdata")
#> ExperimentHub with 26 records
#> # snapshotDate(): 2024-04-29
#> # $dataprovider: MassIVE, PRIDE, SlavovLab website, Dataverse
#> # $species: Homo sapiens, Mus musculus, Rattus norvegicus, Gallus gallus
#> # $rdataclass: QFeatures
#> # additional mcols(): taxonomyid, genome, description,
#> #   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#> #   rdatapath, sourceurl, sourcetype 
#> # retrieve records with, e.g., 'object[["EH3899"]]' 
#> 
#>            title                  
#>   EH3899 | specht2019v2           
#>   EH3900 | specht2019v3           
#>   EH3901 | dou2019_lysates        
#>   EH3902 | dou2019_mouse          
#>   EH3903 | dou2019_boosting       
#>   ...      ...                    
#>   EH9450 | gregoire2023_mixCTRL   
#>   EH9477 | khan2023               
#>   EH9487 | guise2024              
#>   EH9497 | petrosius2023_mES      
#>   EH9498 | petrosius2023_AstralAML

Another way to get information about the available data sets is to
call scpdata(), which retrieves all the available metadata. For
example, we can retrieve the data set titles along with their
descriptions to make an informed choice about which data set to use.
info <- scpdata()
knitr::kable(info[, c("title", "description")])

| | title | description |
|---|---|---|
| EH3899 | specht2019v2 | SCP expression data for monocytes (U-937) and macrophages at PSM, peptide and protein level | 
| EH3900 | specht2019v3 | SCP expression data for more monocytes (U-937) and macrophages at PSM, peptide and protein level | 
| EH3901 | dou2019_lysates | SCP expression data for Hela digests (0.2 or 10 ng) at PSM and protein level | 
| EH3902 | dou2019_mouse | SCP expression data for C10, SVEC or Raw cells at PSM and protein level | 
| EH3903 | dou2019_boosting | SCP expression data for C10, SVEC or Raw cells and 3 boosters (0, 5 or 50 ng) at PSM and protein level | 
| EH3904 | zhu2018MCP | Near SCP expression data for micro-dissection rat brain samples (50, 100, or 200 µm width) at PSM level | 
| EH3905 | zhu2018NC_hela | Near SCP expression data for HeLa samples (aproximately 12, 40, or 140 cells) at PSM level | 
| EH3906 | zhu2018NC_lysates | Near SCP expression data for HeLa lysates (10, 40 and 140 cell equivalent) at PSM level | 
| EH3907 | zhu2018NC_islets | Near SCP expression data for micro-dissected human pancreas samples (control patients or type 1 diabetes) at PSM level | 
| EH3908 | cong2020AC | SCP expression data for Hela cells at PSM, peptide and protein level | 
| EH3909 | zhu2019EL | SCP expression data for chicken utricle samples (1, 3, 5 or 20 cells) at PSM, peptide and protein level | 
| EH6011 | liang2020_hela | Expression data for HeLa cells (0, 1, 10, 150, 500 cells) at PSM, peptide and protein level | 
| EH7085 | schoof2021 | Single-cell proteomics data from OCI-AML8227 cell culture to reconstruct the cellular hierarchy. | 
| EH7295 | williams2020_lfq | Single-cell label free proteomics data from a MCF10A cell line culture. | 
| EH7296 | williams2020_tmt | Single-cell proteomics data from three acute myeloid leukemia cell line culture (MOLM-14, K562, CMK). | 
| EH7712 | derks2022 | Single-cell and bulk (100-cell) proteomics data of PDAC, melanoma cells and monocytes. | 
| EH7713 | brunner2022 | Single-cell proteomics data of cell cycle stages in HeLa. | 
| EH8301 | leduc2022_pSCoPE | Single-cell proteomics data of 878 melanoma cells and 877 monocytes (pSCoPE). | 
| EH8302 | leduc2022_plexDIA | Single-cell proteomics data of 126 melanoma cells (plexDIA). | 
| EH8303 | woo2022_macrophage | Single-cell proteomics data from LPS-treated macrophages. | 
| EH8304 | woo2022_lung | Single-cell proteomics data from primary human lung cells. | 
| EH9450 | gregoire2023_mixCTRL | Single-cell proteomics data from two monocyte cell lines | 
| EH9477 | khan2023 | Single-cell proteomics data of 421 MCF-10A cells undergoing EMT triggered by TGF-beta | 
| EH9487 | guise2024 | Single-cell proteomics data of 108 postmortem CTL or ALS spinal moto neurons | 
| EH9497 | petrosius2023_mES | Mouse embryonic stem cells across ground-state (m2i) and differentiation-permissive (m15) culture conditions. | 
| EH9498 | petrosius2023_AstralAML | Single-cell proteomics data of 4 cell types from the OCI-AML8227 model. | 
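
With this many records, it can help to filter the metadata table by keyword rather than reading it in full. The sketch below uses a hypothetical helper, find_datasets(), which is not part of scpdata. It assumes only that the table has title and description columns, as in the listing above. To keep the example runnable offline, it is applied to a small stand-in table built from three rows of that listing; in a live session the result of scpdata() could be passed instead.

```r
# Hypothetical helper (not part of scpdata): keep the rows of a metadata
# table whose title or description matches a pattern (case-insensitive).
find_datasets <- function(info, pattern) {
    hit <- grepl(pattern, info$title, ignore.case = TRUE) |
        grepl(pattern, info$description, ignore.case = TRUE)
    info[hit, c("title", "description"), drop = FALSE]
}

# Stand-in table with three rows copied from the listing above. In a live
# session one would pass the result of scpdata() instead.
info_demo <- data.frame(
    title = c("dou2019_lysates", "cong2020AC", "schoof2021"),
    description = c(
        "SCP expression data for Hela digests (0.2 or 10 ng) at PSM and protein level",
        "SCP expression data for Hela cells at PSM, peptide and protein level",
        "Single-cell proteomics data from OCI-AML8227 cell culture to reconstruct the cellular hierarchy."
    ),
    row.names = c("EH3901", "EH3908", "EH7085")
)

# Keep the records that mention HeLa in either column.
hela <- find_datasets(info_demo, "hela")
hela
```

The row names of the result are the ExperimentHub identifiers, so they can be used directly to retrieve a record.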
To get one of the data sets (e.g. dou2019_lysates), you can either
retrieve it from the ExperimentHub connection using its record
identifier:
scp <- eh[["EH3901"]]
#> see ?scpdata and browseVignettes('scpdata') for documentation
#> loading from cache
scp
#> An instance of class QFeatures containing 4 assays:
#>  [1] Hela_run_1: SingleCellExperiment with 24562 rows and 10 columns 
#>  [2] Hela_run_2: SingleCellExperiment with 24310 rows and 10 columns 
#>  [3] peptides: SingleCellExperiment with 13934 rows and 20 columns 
#>  [4] proteins: SingleCellExperiment with 1641 rows and 20 columns

or you can use the built-in function from scpdata:
scp <- dou2019_lysates()
#> see ?scpdata and browseVignettes('scpdata') for documentation
#> loading from cache
scp
#> An instance of class QFeatures containing 4 assays:
#>  [1] Hela_run_1: SingleCellExperiment with 24562 rows and 10 columns 
#>  [2] Hela_run_2: SingleCellExperiment with 24310 rows and 10 columns 
#>  [3] peptides: SingleCellExperiment with 13934 rows and 20 columns 
#>  [4] proteins: SingleCellExperiment with 1641 rows and 20 columns

Each data set is extensively documented in its own man page
(e.g. ?dou2019_lysates), where you can find information about the data
content, the acquisition protocol, the data collection procedure, as
well as the data sources and reference.
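
As a quick orientation, a loaded data set can be inspected with the generic accessors that QFeatures objects inherit from MultiAssayExperiment and SummarizedExperiment. The following is a minimal sketch, assuming the dou2019_lysates data set and the assay names shown in the printed summary above. It needs network access or a populated ExperimentHub cache, and the variable names are illustrative only.

```r
library("scpdata")

# Load the data set (downloaded on first use, then read from the local cache).
scp <- dou2019_lysates()

# Names of the assays held in the QFeatures object.
assay_names <- names(scp)

# Extract a single assay; "proteins" is one of the assay names printed above.
prot <- scp[["proteins"]]
dim(prot)

# Quantitative matrix (features x samples) of that assay.
quant <- SummarizedExperiment::assay(prot)

# Sample annotations shared across all assays of the data set.
samples <- SummarizedExperiment::colData(scp)

quant[1:3, 1:3]
head(samples)
```

The same pattern applies to any other data set in the package, with the assay name replaced by one of the names reported when the object is printed.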
For more information about manipulating the data sets, check the scp
package. The scp vignette will guide you through a typical SCP data
processing workflow. Once your data is loaded from scpdata, you can
skip section 2, "Read in SCP data", of the scp vignette.
R version 4.4.0 beta (2024-04-15 r86425)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.4 LTS
Matrix products: default
BLAS:   /home/biocbuild/bbs-3.19-bioc/R/lib/libRblas.so 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB              LC_COLLATE=C              
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
time zone: America/New_York
tzcode source: system (glibc)
attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     
other attached packages:
 [1] scpdata_1.12.0              ExperimentHub_2.12.0       
 [3] AnnotationHub_3.12.0        BiocFileCache_2.12.0       
 [5] dbplyr_2.5.0                QFeatures_1.14.0           
 [7] MultiAssayExperiment_1.30.0 SummarizedExperiment_1.34.0
 [9] Biobase_2.64.0              GenomicRanges_1.56.0       
[11] GenomeInfoDb_1.40.0         IRanges_2.38.0             
[13] S4Vectors_0.42.0            BiocGenerics_0.50.0        
[15] MatrixGenerics_1.16.0       matrixStats_1.3.0          
[17] BiocStyle_2.32.0           
loaded via a namespace (and not attached):
 [1] tidyselect_1.2.1            dplyr_1.1.4                
 [3] blob_1.2.4                  filelock_1.0.3             
 [5] Biostrings_2.72.0           fastmap_1.1.1              
 [7] SingleCellExperiment_1.26.0 lazyeval_0.2.2             
 [9] digest_0.6.35               mime_0.12                  
[11] lifecycle_1.0.4             cluster_2.1.6              
[13] ProtGenerics_1.36.0         KEGGREST_1.44.0            
[15] RSQLite_2.3.6               magrittr_2.0.3             
[17] compiler_4.4.0              rlang_1.1.3                
[19] sass_0.4.9                  tools_4.4.0                
[21] igraph_2.0.3                utf8_1.2.4                 
[23] yaml_2.3.8                  knitr_1.46                 
[25] S4Arrays_1.4.0              bit_4.0.5                  
[27] curl_5.2.1                  DelayedArray_0.30.0        
[29] abind_1.4-5                 withr_3.0.0                
[31] purrr_1.0.2                 grid_4.4.0                 
[33] fansi_1.0.6                 MASS_7.3-60.2              
[35] cli_3.6.2                   rmarkdown_2.26             
[37] crayon_1.5.2                generics_0.1.3             
[39] httr_1.4.7                  BiocBaseUtils_1.6.0        
[41] DBI_1.2.2                   cachem_1.0.8               
[43] zlibbioc_1.50.0             AnnotationDbi_1.66.0       
[45] AnnotationFilter_1.28.0     BiocManager_1.30.22        
[47] XVector_0.44.0              vctrs_0.6.5                
[49] Matrix_1.7-0                jsonlite_1.8.8             
[51] bookdown_0.39               bit64_4.0.5                
[53] clue_0.3-65                 tidyr_1.3.1                
[55] jquerylib_0.1.4             glue_1.7.0                 
[57] BiocVersion_3.19.1          UCSC.utils_1.0.0           
[59] tibble_3.2.1                pillar_1.9.0               
[61] rappdirs_0.3.3              htmltools_0.5.8.1          
[63] GenomeInfoDbData_1.2.12     R6_2.5.1                   
[65] evaluate_0.23               lattice_0.22-6             
[67] png_0.1-8                   memoise_2.0.1              
[69] bslib_0.7.0                 SparseArray_1.4.0          
[71] xfun_0.43                   MsCoreUtils_1.16.0         
[73] pkgconfig_2.0.3

This vignette is distributed under a CC BY-SA license.