Capsule 0.2.0
This release adds comprehensive support for bioinformatics and
computational biology workflows, with enhancements specifically designed
for NGS analysis, HPC environments, and large-scale data processing.
- External Tool Version Tracking
  - track_external_tools(): Track versions of command-line tools (samtools, STAR, BWA, etc.)
  - get_tool_versions(): Retrieve tracked tool versions
  - Automatically detects and tracks 18+ common bioinformatics tools
- Conda/Mamba Environment Support
  - track_conda_env(): Export and track conda environments
  - restore_conda_env(): Restore conda environments from YAML
  - get_conda_env_info(): Retrieve conda environment information
  - Full support for both conda and mamba
- Reference Genome Tracking
  - track_reference_genome(): Track reference genomes, annotations, and indices
  - get_reference_info(): Retrieve reference genome information
  - list_reference_sources(): Display common reference genome sources
  - Tracks FASTA files, GTF/GFF annotations, and aligner indices (STAR, BWA, etc.)
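As a rough illustration of what tool version tracking involves, the sketch below looks up a tool on the PATH and records the first line of its version output using base R only. It is not Capsule's implementation: `detect_tool_version()` is a hypothetical helper, and the `--version` flag is an assumption that does not hold for every tool (some print their version only in a usage message).

```r
# Hypothetical helper, not part of Capsule's API: find a command-line tool
# on the PATH and return the first line of its version output.
detect_tool_version <- function(tool, version_arg = "--version") {
  path <- unname(Sys.which(tool))
  if (!nzchar(path)) {
    return(NA_character_)  # tool is not installed or not on the PATH
  }
  out <- tryCatch(
    suppressWarnings(system2(path, version_arg, stdout = TRUE, stderr = TRUE)),
    error = function(e) character(0)
  )
  if (length(out) == 0) NA_character_ else trimws(out[[1]])
}

# Tools that are missing on this machine are reported as NA.
tools <- c("samtools", "STAR")
versions <- vapply(tools, detect_tool_version, character(1))
print(versions)
```

Recording `NA` for a missing tool, rather than stopping with an error, keeps the record complete on machines where only part of the toolchain is installed.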
New Features - High Priority
- Large File Handling
  - Enhanced track_data() with smart checksumming for large files (>1GB)
  - xxHash64 support for 10-100x faster checksumming of BAM/FASTQ files
  - Metadata-based fingerprinting for very large files
  - Automatic algorithm selection based on file size
- System Library Detection
  - capture_system_libraries(): Detect system library versions
  - Tracks libcurl, libxml2, BLAS/LAPACK implementations
  - Essential for documenting system dependencies
- Hardware Information Capture
  - capture_hardware(): Capture CPU, RAM, and GPU specifications
  - NVIDIA GPU detection via nvidia-smi
  - Cross-platform support (Linux, macOS, Windows)
  - Essential for HPC job documentation
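The size-based algorithm selection described under Large File Handling could look roughly like the sketch below. It is an illustration under stated assumptions, not Capsule's code: both function names are hypothetical, the 1 GB threshold comes from the note above, and the 50 GB cut-off for "very large" files is invented for the example.

```r
# Hypothetical sketch, not Capsule's implementation: choose a checksum
# strategy from the file size. The 50 GB threshold is an assumed value.
choose_checksum_strategy <- function(size_bytes,
                                     large = 1e9,         # 1 GB
                                     very_large = 50e9) { # 50 GB (assumed)
  if (is.na(size_bytes)) stop("file size is unknown")
  if (size_bytes > very_large) {
    "metadata"   # fingerprint from size and modification time only
  } else if (size_bytes > large) {
    "xxhash64"   # fast non-cryptographic hash
  } else {
    "sha256"     # cryptographic hash for ordinary files
  }
}

# Metadata fingerprint: cheap to compute, but it cannot detect a content
# change that leaves the size and modification time untouched.
metadata_fingerprint <- function(path) {
  info <- file.info(path)
  mtime <- format(info$mtime, "%Y-%m-%dT%H:%M:%S", tz = "UTC")
  paste(basename(path), info$size, mtime, sep = "|")
}

choose_checksum_strategy(5e6)    # "sha256"
choose_checksum_strategy(2e9)    # "xxhash64"
choose_checksum_strategy(8e10)   # "metadata"
```

The trade-off is speed against strength: a metadata fingerprint avoids reading the file at all, while a full hash detects any change in content.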
New Features - Containerization & HPC
- Singularity/Apptainer Support
  - generate_singularity(): Generate Singularity definition files
  - Full support for HPC environments where Docker is unavailable
  - Automatic build script generation
  - Conda environment integration
New Features - Pipeline Integration
- Workflow Manager Integration
  - export_for_nextflow(): Export data for Nextflow pipelines
  - export_for_snakemake(): Export data for Snakemake workflows
  - export_for_wdl(): Export data for WDL workflows
  - export_for_cwl(): Export data for CWL workflows
  - Seamless integration with major workflow managers
New Features - Snapshot Management
- Snapshot Comparison
  - compare_snapshots(): Compare two workflow snapshots
  - list_snapshots(): List all available snapshots with metadata
  - Detailed diff reports showing package, parameter, and data changes
  - Markdown report generation
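To make the idea of a package diff between two snapshots concrete, here is a minimal base R sketch. It does not reflect how compare_snapshots() stores or reports its results: `diff_package_versions()` is a hypothetical helper, and the package versions are made-up illustrative values.

```r
# Hypothetical sketch: diff two named character vectors of package versions
# (names are package names, values are version strings).
diff_package_versions <- function(old, new) {
  common  <- intersect(names(old), names(new))
  changed <- common[old[common] != new[common]]
  list(
    added   = setdiff(names(new), names(old)),
    removed = setdiff(names(old), names(new)),
    changed = data.frame(
      package = changed,
      old     = unname(old[changed]),
      new     = unname(new[changed]),
      stringsAsFactors = FALSE
    )
  )
}

# Illustrative values only, not taken from a real snapshot.
before <- c(dplyr = "1.1.2", ggplot2 = "3.4.2", xml2 = "1.3.4")
after  <- c(dplyr = "1.1.4", ggplot2 = "3.4.2", digest = "0.6.33")
str(diff_package_versions(before, after))
```

Separating added, removed, and changed packages matches the categories a reader of a diff report usually wants to scan first.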
Enhancements
- Updated DESCRIPTION with bioinformatics focus
- Enhanced documentation with bioinformatics examples
- Improved error handling and user feedback
- Better cross-platform compatibility
- Added utils to imports for better compatibility
Bug Fixes
- Fixed issue with checksum verification for legacy tracked files
- Improved handling of missing files in verification
- Better error messages for conda/mamba detection
Capsule 0.1.0
Initial Release
Features
- Session Tracking: Comprehensive R session information capture
  - capture_session(): Capture R version, platform, and system info
  - capture_environment(): Capture global environment state
- Package Management: Complete package version tracking
  - snapshot_packages(): Create detailed package manifests
  - create_renv_lockfile(): Generate renv lockfiles
  - Automatic dependency graph creation
- Data Provenance: Track data files with integrity verification
  - track_data(): Record data source, checksums, and metadata
  - verify_data(): Verify data integrity via SHA-256 checksums
  - get_data_lineage(): Retrieve complete data provenance
- Parameter Tracking: Document analysis parameters
  - track_params(): Store analysis parameters with metadata
  - get_param_history(): Retrieve parameter history
- Random Seed Management: Reproducible random number generation
  - set_seed(): Set and track random seeds
  - restore_seed(): Restore previously tracked seeds
  - Complete RNG state tracking
- Script Generation: Create reproducible analysis scripts
  - generate_repro_script(): Generate executable R scripts
  - create_repro_report(): Generate markdown reports
  - Automatic integration of all tracked components
- Docker Support: Containerization for perfect reproducibility
  - generate_docker(): Generate Dockerfile and docker-compose.yml
  - RStudio Server support
  - Automatic system dependency configuration
- Workflow Management: Complete workflow orchestration
  - init_capsule(): Initialize Capsule in projects
  - snapshot_workflow(): Create complete workflow snapshots
  - Automatic artifact generation
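The RNG state tracking listed under Random Seed Management rests on a base R mechanism that can be sketched as follows. This is not the code behind set_seed() or restore_seed(): the helper names and the `seed_log` environment are hypothetical, and only base R's `set.seed()`, `RNGkind()`, and `.Random.seed` are used.

```r
# Hypothetical sketch, not Capsule's API: set a seed and keep the full RNG
# state so that it can be restored later.
seed_log <- new.env()

set_tracked_seed <- function(seed) {
  set.seed(seed)
  seed_log$seed  <- seed
  seed_log$kind  <- RNGkind()  # generator kinds in use
  seed_log$state <- get(".Random.seed", envir = globalenv())
  invisible(seed)
}

restore_tracked_state <- function() {
  assign(".Random.seed", seed_log$state, envir = globalenv())
  invisible(seed_log$seed)
}

set_tracked_seed(42)
first <- runif(3)
restore_tracked_state()
second <- runif(3)
identical(first, second)  # TRUE
```

Storing the generator kind alongside the seed matters because the same seed yields different streams under different generators.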
Documentation
- Comprehensive README with quick start guide
- Complete function documentation with examples
- Example workflow demonstrating all features
- Docker usage instructions
Infrastructure
- MIT License
- Complete test suite
- Package structure following R best practices