systemPipeRdata 1.20.0
Note: the most recent version of this vignette can be found here.
Note: if you use systemPipeR and systemPipeRdata in published research, please cite:
Backman, T.W.H and Girke, T. (2016). systemPipeR: Workflow and Report Generation Environment. BMC Bioinformatics, 17: 388. 10.1186/s12859-016-1241-0.
systemPipeRdata is a helper package that generates, with a single command,
workflow templates intended for use with its parent package
systemPipeR (H Backman and Girke 2016).
The systemPipeR project provides a suite of R/Bioconductor packages for designing,
building and running end-to-end analysis workflows on local machines, HPC clusters
and cloud systems, while also generating publication-quality analysis reports.
To test workflows quickly or design new ones from existing templates, users can
generate, with a single command, workflow instances fully populated with the sample data
and parameter files required for running a chosen workflow.
The preconfigured directory structure of the workflow environment and the sample data
used by systemPipeRdata are described here.
The systemPipeRdata package is available at Bioconductor and can be installed from within R as follows:
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("systemPipeRdata")

Also, it is possible to install the development version from Bioconductor.

BiocManager::install("systemPipeRdata", version = "devel", build_vignettes = TRUE,
    dependencies = TRUE)  # Installs Devel version from Bioconductor

library("systemPipeRdata")  # Loads the package
library(help = "systemPipeRdata")  # Lists package info
vignette("systemPipeRdata")  # Opens vignette

Load one of the available workflows into your current working directory.
The following does this for the varseq workflow template. The name of the resulting
workflow directory can be set with the mydirname argument. The default, NULL,
uses the name of the chosen workflow. An error is issued if a directory with the same
name already exists at that path.
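
Because the call stops with an error when the target directory already exists, a script that may be rerun can check for the directory first. The following is a minimal base R sketch of such a check; the helper name workdir_is_free is hypothetical and not part of systemPipeRdata, and the demonstration uses a temporary directory so that nothing real is touched. The actual call for the varseq template follows the sketch.

```r
# Hypothetical helper (not part of systemPipeRdata): TRUE only if `dirname`
# does not yet exist under `path`, i.e. it is safe to create it there.
workdir_is_free <- function(dirname, path = getwd()) {
  !dir.exists(file.path(path, dirname))
}

# Demonstration in a temporary location.
tmp <- tempfile("wfdemo_")
dir.create(tmp)
free_before <- workdir_is_free("varseq", path = tmp)  # nothing created yet
dir.create(file.path(tmp, "varseq"))
free_after <- workdir_is_free("varseq", path = tmp)   # directory now exists
print(c(before = free_before, after = free_after))

# Intended use (requires systemPipeRdata; not run here):
# if (workdir_is_free("varseq")) {
#   systemPipeRdata::genWorkenvir(workflow = "systemPipeR/SPvarseq",
#                                 mydirname = "varseq")
# }
```
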
genWorkenvir(workflow = "systemPipeR/SPvarseq", mydirname = "varseq")
setwd("varseq")

On Linux and OS X systems the same can be achieved from the command line of a terminal with the following command.

$ Rscript -e "systemPipeRdata::genWorkenvir(workflow='systemPipeR/SPvarseq', mydirname='varseq')"

A collection of workflow templates is available, and the current list can be browsed as follows:
availableWF(github = TRUE)

This function returns the list of workflow templates available within the package and in the systemPipeR organization on GitHub. Each listed template can be created as described above.
A workflow template chosen from GitHub is installed as an R package, and its environment is created with all the settings and files needed to run the demo analysis.

genWorkenvir(workflow = "systemPipeR/SPrnaseq", mydirname = NULL)
setwd("SPrnaseq")

It is also possible to choose a different version of a workflow template,
defined through another branch of the GitHub repository. By default, the master
branch is selected; a different branch can be chosen with the ref argument.
genWorkenvir(workflow = "systemPipeR/SPrnaseq", ref = "singleMachine")
setwd("SPrnaseq")

It is also possible to download a specific workflow script for your analysis.
The URL is specified with the url argument and the R Markdown file name with
the urlname argument. The default, NULL, copies the current version available in the chosen template.
genWorkenvir(workflow = "systemPipeR/SPrnaseq",
             url = "https://raw.githubusercontent.com/systemPipeR/systemPipeRNAseq/cluster/vignettes/systemPipeRNAseq.Rmd",
             urlname = "rnaseq_V-cluster.Rmd")
setwd("SPrnaseq")

A new workflow structure can also be created from the RStudio
menu File -> New File -> R Markdown -> From Template -> systemPipeR New WorkFlow.
This interactive option creates the same environment as demonstrated above.
Figure 1: Selecting workflow template within RStudio.
The workflow templates generated by genWorkenvir contain the following preconfigured directory structure:
CWL param and input.yml files need to be in the same subdirectory.
Note: Directory names are indicated in green. Users can change this structure as needed, but need to adjust the code in their workflows accordingly.
Figure 2: systemPipeR’s preconfigured directory structure.
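
To check the layout of a generated workflow directory directly, its subdirectories can be listed from R. The following is a minimal base R sketch; the helper name list_workflow_dirs is hypothetical, and the directory tree built here is a made-up stand-in rather than the real template layout.

```r
# Hypothetical helper: list all subdirectories of a workflow directory,
# as paths relative to its root, sorted alphabetically.
list_workflow_dirs <- function(wfdir = ".") {
  dirs <- list.dirs(wfdir, full.names = FALSE, recursive = TRUE)
  sort(dirs[nzchar(dirs)])  # drop the empty entry that stands for the root
}

# Stand-in directory tree for demonstration only; these names are invented
# and do not reflect the actual template structure.
demo <- tempfile("wfdemo_")
dir.create(file.path(demo, "a", "b"), recursive = TRUE)
dir.create(file.path(demo, "c"))
demo_dirs <- list_workflow_dirs(demo)
print(demo_dirs)
```

After running genWorkenvir, calling the helper on the new workflow directory (for example "varseq") would show the directories that the template actually created.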
Next, run from within R the chosen sample workflow by executing the code provided
in the corresponding *.Rmd template file.
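
As a starting point, the template file can be located from within R. The sketch below is a minimal base R example under two assumptions: the *.Rmd file sits in the root of the workflow directory, and the helper name find_workflow_rmd is hypothetical. The files created here are empty placeholders used only for demonstration.

```r
# Hypothetical helper: find R Markdown files in the root of a workflow directory.
find_workflow_rmd <- function(wfdir = ".") {
  list.files(wfdir, pattern = "\\.Rmd$", full.names = TRUE)
}

# Stand-in for a generated workflow directory; file names are illustrative.
demo <- tempfile("wfdemo_")
dir.create(demo)
file.create(file.path(demo, "demo_workflow.Rmd"))
file.create(file.path(demo, "notes.txt"))  # ignored: not an .Rmd file
rmd <- find_workflow_rmd(demo)
print(basename(rmd))

# Optional next step (requires the knitr package; not run here):
# extract the code chunks into a plain R script to step through interactively.
# knitr::purl(rmd[1], output = sub("\\.Rmd$", ".R", rmd[1]))
```

Stepping through the code interactively is often preferable for a first run, since individual workflow steps may need external software or adjusted parameters.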
Much more detailed information on running and customizing systemPipeR
workflows is available in its overview vignette here.
This vignette can also be opened from R with the following command.
library("systemPipeR")  # Loads systemPipeR which needs to be installed via BiocManager::install() from Bioconductor
vignette("systemPipeR", package = "systemPipeR")

The location of the sample data provided by systemPipeRdata can be returned as a list.

pathList()
## $targets
## [1] "/tmp/RtmpO6E3SN/Rinst2749ac3d119c73/systemPipeRdata/extdata/param/targets.txt"
## 
## $targetsPE
## [1] "/tmp/RtmpO6E3SN/Rinst2749ac3d119c73/systemPipeRdata/extdata/param/targetsPE.txt"
## 
## $annotationdir
## [1] "/tmp/RtmpO6E3SN/Rinst2749ac3d119c73/systemPipeRdata/extdata/annotation/"
## 
## $fastqdir
## [1] "/tmp/RtmpO6E3SN/Rinst2749ac3d119c73/systemPipeRdata/extdata/fastq/"
## 
## $bamdir
## [1] "/tmp/RtmpO6E3SN/Rinst2749ac3d119c73/systemPipeRdata/extdata/bam/"
## 
## $paramdir
## [1] "/tmp/RtmpO6E3SN/Rinst2749ac3d119c73/systemPipeRdata/extdata/param/"
## 
## $workflows
## [1] "/tmp/RtmpO6E3SN/Rinst2749ac3d119c73/systemPipeRdata/extdata/workflows/"
## 
## $chipseq
## [1] "/tmp/RtmpO6E3SN/Rinst2749ac3d119c73/systemPipeRdata/extdata/workflows/chipseq/"
## 
## $rnaseq
## [1] "/tmp/RtmpO6E3SN/Rinst2749ac3d119c73/systemPipeRdata/extdata/workflows/rnaseq/"
## 
## $riboseq
## [1] "/tmp/RtmpO6E3SN/Rinst2749ac3d119c73/systemPipeRdata/extdata/workflows/riboseq/"
## 
## $varseq
## [1] "/tmp/RtmpO6E3SN/Rinst2749ac3d119c73/systemPipeRdata/extdata/workflows/varseq/"
## 
## $new
## [1] "/tmp/RtmpO6E3SN/Rinst2749ac3d119c73/systemPipeRdata/extdata/workflows/new/"

sessionInfo()
## R version 4.1.0 (2021-05-18)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.2 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.13-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.13-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    parallel  stats     graphics  grDevices
## [6] utils     datasets  methods   base     
## 
## other attached packages:
##  [1] systemPipeRdata_1.20.0      batchtools_0.9.15          
##  [3] ape_5.5                     ggplot2_3.3.3              
##  [5] systemPipeR_1.26.0          ShortRead_1.50.0           
##  [7] GenomicAlignments_1.28.0    SummarizedExperiment_1.22.0
##  [9] Biobase_2.52.0              MatrixGenerics_1.4.0       
## [11] matrixStats_0.58.0          BiocParallel_1.26.0        
## [13] Rsamtools_2.8.0             Biostrings_2.60.0          
## [15] XVector_0.32.0              GenomicRanges_1.44.0       
## [17] GenomeInfoDb_1.28.0         IRanges_2.26.0             
## [19] S4Vectors_0.30.0            BiocGenerics_0.38.0        
## [21] BiocStyle_2.20.0           
## 
## loaded via a namespace (and not attached):
##   [1] colorspace_2.0-1         rjson_0.2.20            
##   [3] hwriter_1.3.2            ellipsis_0.3.2          
##   [5] remotes_2.3.0            bit64_4.0.5             
##   [7] AnnotationDbi_1.54.0     fansi_0.4.2             
##   [9] codetools_0.2-18         splines_4.1.0           
##  [11] cachem_1.0.5             knitr_1.33              
##  [13] jsonlite_1.7.2           annotate_1.70.0         
##  [15] GO.db_3.13.0             dbplyr_2.1.1            
##  [17] png_0.1-7                pheatmap_1.0.12         
##  [19] graph_1.70.0             BiocManager_1.30.15     
##  [21] compiler_4.1.0           httr_1.4.2              
##  [23] backports_1.2.1          GOstats_2.58.0          
##  [25] assertthat_0.2.1         Matrix_1.3-3            
##  [27] fastmap_1.1.0            limma_3.48.0            
##  [29] formatR_1.9              htmltools_0.5.1.1       
##  [31] prettyunits_1.1.1        tools_4.1.0             
##  [33] gtable_0.3.0             glue_1.4.2              
##  [35] GenomeInfoDbData_1.2.6   Category_2.58.0         
##  [37] dplyr_1.0.6              rsvg_2.1.2              
##  [39] rappdirs_0.3.3           V8_3.4.2                
##  [41] Rcpp_1.0.6               jquerylib_0.1.4         
##  [43] vctrs_0.3.8              nlme_3.1-152            
##  [45] debugme_1.1.0            rtracklayer_1.52.0      
##  [47] xfun_0.23                stringr_1.4.0           
##  [49] lifecycle_1.0.0          restfulr_0.0.13         
##  [51] XML_3.99-0.6             edgeR_3.34.0            
##  [53] zlibbioc_1.38.0          scales_1.1.1            
##  [55] BSgenome_1.60.0          VariantAnnotation_1.38.0
##  [57] hms_1.1.0                RBGL_1.68.0             
##  [59] RColorBrewer_1.1-2       yaml_2.2.1              
##  [61] curl_4.3.1               memoise_2.0.0           
##  [63] sass_0.4.0               biomaRt_2.48.0          
##  [65] latticeExtra_0.6-29      stringi_1.6.2           
##  [67] RSQLite_2.2.7            genefilter_1.74.0       
##  [69] BiocIO_1.2.0             checkmate_2.0.0         
##  [71] GenomicFeatures_1.44.0   filelock_1.0.2          
##  [73] DOT_0.1                  rlang_0.4.11            
##  [75] pkgconfig_2.0.3          bitops_1.0-7            
##  [77] evaluate_0.14            lattice_0.20-44         
##  [79] purrr_0.3.4              bit_4.0.4               
##  [81] tidyselect_1.1.1         GSEABase_1.54.0         
##  [83] AnnotationForge_1.34.0   magrittr_2.0.1          
##  [85] bookdown_0.22            R6_2.5.0                
##  [87] generics_0.1.0           base64url_1.4           
##  [89] DelayedArray_0.18.0      DBI_1.1.1               
##  [91] withr_2.4.2              pillar_1.6.1            
##  [93] survival_3.2-11          KEGGREST_1.32.0         
##  [95] RCurl_1.98-1.3           tibble_3.1.2            
##  [97] crayon_1.4.1             utf8_1.2.1              
##  [99] BiocFileCache_2.0.0      rmarkdown_2.8           
## [101] jpeg_0.1-8.1             progress_1.2.2          
## [103] locfit_1.5-9.4           grid_4.1.0              
## [105] data.table_1.14.0        blob_1.2.1              
## [107] Rgraphviz_2.36.0         digest_0.6.27           
## [109] xtable_1.8-4             brew_1.0-6              
## [111] munsell_0.5.0            bslib_0.2.5.1

This project was supported by funds from the National Institutes of Health (NIH) and the National Science Foundation (NSF).
H Backman, Tyler W, and Thomas Girke. 2016. “systemPipeR: NGS workflow and report generation environment.” BMC Bioinformatics 17 (1): 388. https://doi.org/10.1186/s12859-016-1241-0.