The gDNAinRNAseqData package provides access, through ExperimentHub, to RNA-seq BAM files containing different levels of genomic DNA (gDNA) contamination. This vignette illustrates how to download them.
gDNAinRNAseqData 1.6.0
Here we show how to download a subset of the RNA-seq data published in:
Li, X., Zhang, P., and Yu, Y. Gene expressed at low levels raise false discovery rates in RNA samples contaminated with genomic DNA. BMC Genomics, 23:554, 2022. https://doi.org/10.1186/s12864-022-08785-1
The subset of the data available through this package consists of BAM files containing
about 100,000 alignments, sampled uniformly at random from complete BAM files.
These complete BAM files were obtained by aligning the RNA-seq reads sequenced
from total RNA libraries mixed with different concentrations of gDNA,
specifically 0% (no contamination), 1% and 10%; see Fig. 2 from Li et al. (2022).
The original RNA-seq data is publicly available at
https://ngdc.cncb.ac.cn/bioproject/browse/PRJCA007961 and you can find the
pipeline to generate this subset of the data in the file
gDNAinRNAseqData/inst/scripts/make-data_LiYu22subsetBAMfiles.R stored in this
package.
To download these subsetted BAM files, and the corresponding index (.bai) files,
we load this package and call the function LiYu22subsetBAMfiles():
library(gDNAinRNAseqData)
bamfiles <- LiYu22subsetBAMfiles()
bamfiles
## [1] "/home/biocbuild/bbs-3.20-data-experiment/tmpdir/Rtmp12VPrv/s32gDNA0.bam"
## [2] "/home/biocbuild/bbs-3.20-data-experiment/tmpdir/Rtmp12VPrv/s33gDNA0.bam" 
## [3] "/home/biocbuild/bbs-3.20-data-experiment/tmpdir/Rtmp12VPrv/s34gDNA0.bam" 
## [4] "/home/biocbuild/bbs-3.20-data-experiment/tmpdir/Rtmp12VPrv/s26gDNA1.bam" 
## [5] "/home/biocbuild/bbs-3.20-data-experiment/tmpdir/Rtmp12VPrv/s27gDNA1.bam" 
## [6] "/home/biocbuild/bbs-3.20-data-experiment/tmpdir/Rtmp12VPrv/s28gDNA1.bam" 
## [7] "/home/biocbuild/bbs-3.20-data-experiment/tmpdir/Rtmp12VPrv/s23gDNA10.bam"
## [8] "/home/biocbuild/bbs-3.20-data-experiment/tmpdir/Rtmp12VPrv/s24gDNA10.bam"
## [9] "/home/biocbuild/bbs-3.20-data-experiment/tmpdir/Rtmp12VPrv/s25gDNA10.bam"
The previous function call can take a path argument to specify the path in
the filesystem where we would like to store the downloaded BAM files, which
by default is a temporary path from the current R session; consult the help
page of LiYu22subsetBAMfiles() for full details.
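As a minimal sketch of how the path argument might be used, the helper below stores the BAM files in a directory of our choosing. The helper name fetch_bams() and the directory name are hypothetical; creating the directory beforehand is a precaution, since we do not assume here that LiYu22subsetBAMfiles() creates it.

```r
# fetch_bams() is a hypothetical helper, not part of gDNAinRNAseqData.
fetch_bams <- function(bamdir) {
    # Create the destination directory up front; whether
    # LiYu22subsetBAMfiles() would create it is not assumed here.
    dir.create(bamdir, recursive = TRUE, showWarnings = FALSE)
    # Needs the gDNAinRNAseqData package and access to ExperimentHub.
    gDNAinRNAseqData::LiYu22subsetBAMfiles(path = bamdir)
}

# Hypothetical destination; any writable directory would do.
bamdir <- file.path(tempdir(), "LiYu22subsetBAMs")
```

Calling bamfiles <- fetch_bams(bamdir) would then be expected to return the BAM file paths under bamdir instead of directly under the session's temporary directory.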
We can also retrieve the gDNA concentrations associated with each BAM file with the following function call:
pdat <- LiYu22phenoData(bamfiles)
pdat
##           gDNA
## s32gDNA0     0
## s33gDNA0     0
## s34gDNA0     0
## s26gDNA1     1
## s27gDNA1     1
## s28gDNA1     1
## s23gDNA10   10
## s24gDNA10   10
## s25gDNA10   10
sessionInfo()
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] Rsamtools_2.22.0       Biostrings_2.74.0      XVector_0.46.0        
##  [4] GenomicRanges_1.58.0   GenomeInfoDb_1.42.0    IRanges_2.40.0        
##  [7] S4Vectors_0.44.0       BiocGenerics_0.52.0    gDNAinRNAseqData_1.6.0
## [10] BiocStyle_2.34.0      
## 
## loaded via a namespace (and not attached):
##  [1] KEGGREST_1.46.0         xfun_0.48               bslib_0.8.0            
##  [4] Biobase_2.66.0          vctrs_0.6.5             tools_4.4.1            
##  [7] bitops_1.0-9            generics_0.1.3          parallel_4.4.1         
## [10] curl_5.2.3              tibble_3.2.1            fansi_1.0.6            
## [13] AnnotationDbi_1.68.0    RSQLite_2.3.7           blob_1.2.4             
## [16] pkgconfig_2.0.3         dbplyr_2.5.0            lifecycle_1.0.4        
## [19] GenomeInfoDbData_1.2.13 compiler_4.4.1          codetools_0.2-20       
## [22] htmltools_0.5.8.1       sass_0.4.9              RCurl_1.98-1.16        
## [25] yaml_2.3.10             pillar_1.9.0            crayon_1.5.3           
## [28] jquerylib_0.1.4         BiocParallel_1.40.0     cachem_1.1.0           
## [31] mime_0.12               ExperimentHub_2.14.0    AnnotationHub_3.14.0   
## [34] tidyselect_1.2.1        digest_0.6.37           purrr_1.0.2            
## [37] dplyr_1.1.4             bookdown_0.41           BiocVersion_3.20.0     
## [40] fastmap_1.2.0           cli_3.6.3               magrittr_2.0.3         
## [43] XML_3.99-0.17           utf8_1.2.4              withr_3.0.2            
## [46] filelock_1.0.3          UCSC.utils_1.2.0        rappdirs_0.3.3         
## [49] bit64_4.5.2             rmarkdown_2.28          httr_1.4.7             
## [52] bit_4.5.0               png_0.1-8               memoise_2.0.1          
## [55] evaluate_1.0.1          knitr_1.48              BiocFileCache_2.14.0   
## [58] rlang_1.1.4             glue_1.8.0              DBI_1.2.3              
## [61] BiocManager_1.30.25     jsonlite_1.8.9          R6_2.5.1               
## [64] zlibbioc_1.52.0
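Returning to the phenotype data shown earlier: the BAM file names appear to encode both the sample identifier and the gDNA percentage (for instance, s23gDNA10.bam). As a rough cross-check that needs no download, the base R sketch below parses the file names listed in this vignette. The naming convention is an assumption inferred from that listing, and this is not how LiYu22phenoData() is necessarily implemented.

```r
# Base names copied from the listing of downloaded BAM files.
bamnames <- c("s32gDNA0.bam", "s33gDNA0.bam", "s34gDNA0.bam",
              "s26gDNA1.bam", "s27gDNA1.bam", "s28gDNA1.bam",
              "s23gDNA10.bam", "s24gDNA10.bam", "s25gDNA10.bam")

# Sample identifiers: file names without the .bam extension.
samples <- sub("\\.bam$", "", bamnames)

# Assumption: the digits after "gDNA" give the gDNA percentage.
gdna <- as.integer(sub("^.*gDNA([0-9]+)$", "\\1", samples))

# One row per sample, mirroring the layout of the pdat table.
parsed <- data.frame(gDNA = gdna, row.names = samples)
parsed
```

With the downloaded files at hand, basename(bamfiles) could replace the hard-coded vector, and the parsed values could be compared against the gDNA column of pdat.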