For an overview of the design principles and use of Bioconductor sequence classes, see Lawrence et al., 2013, Software for Computing and Annotating Genomic Ranges. PLoS Comput Biol 9(8): doi:10.1371/journal.pcbi.1003118
For an overview of select high-throughput sequence packages in Bioconductor, see Intermediate Sequence Analysis 2013 section 3.3.
Classes
IRanges(), GRanges()
Metadata on the whole object (e.g., metadata()), and on individual elements (e.g., mcols())
Vector(), e.g., length(), subset, etc.
*List(), e.g., IntegerList(), GRangesList()
  IntegerList() is a list where all elements are integer vectors
  Stored as a single integer() with a partitioning vector.
DataFrame()
Rle()
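A minimal sketch of these classes in use may help. The chromosome names, coordinates, and the `score` and `source` fields are made up for illustration, and the code assumes the GenomicRanges package (which attaches IRanges and S4Vectors) is installed.

```r
library(GenomicRanges)   # attaches IRanges and S4Vectors as dependencies

# Three made-up ranges on two chromosomes
gr <- GRanges(
    seqnames = c("chr1", "chr1", "chr2"),
    ranges   = IRanges(start = c(10, 50, 5), width = 20),
    strand   = c("+", "-", "+")
)

# Element-wise metadata: one value per range
mcols(gr)$score <- c(1.5, 2.0, 0.5)

# Object-level metadata: a plain list describing the whole object
metadata(gr)$source <- "toy example"

# Vector-like behavior: length() and subsetting
length(gr)
plus <- gr[strand(gr) == "+"]

# A *List whose elements are all integer vectors
il <- IntegerList(a = 1:3, b = integer(0), c = 4:5)
lengths(il)

# The same data viewed as one flat integer vector plus a partitioning
flat <- unlist(il, use.names = FALSE)
part <- PartitioningByEnd(il)
end(part)

# Run-length encoding of a vector with repeated values
r <- Rle(c(1, 1, 1, 2, 2, 3))
runLength(r)
runValue(r)
```

The flat vector and the partitioning together carry the same information as the list, which is one reason list-like objects of this kind can stay compact when they hold many short elements.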
Methods
Users
DNAString(), DNAStringSet() classes
XString(), XStringSet classes.
Users
ShortRead uses XStringSet as the basis for coordinating short reads and quality scores.
BSgenome uses DNAString to represent whole genome sequences.
GAlignments(), GAlignmentsList(), GAlignmentPairs()
SummarizedExperiment()
VCF()
ShortRead – FASTQ files
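  ShortReadQ() – Reads and their quality scores
rtracklayer import(), so complexity is hidden from the user
VariantAnnotation readVcf(), filterVcf(). Manage large data by:
  Restricting input to ranges of interest with ScanVcfParam().
  Restricting input to the required fields with ScanVcfParam()
  Reading individual fields with readInfo(), readGeno()
  Iterating through the file in chunks with TabixFile(<...>, yieldSize=10000) and a paradigm like
tbx <- open(TabixFile(fl, yieldSize=10000))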
repeat {
    vcf <- readVcf(tbx, "hg19")  ## up to 10000 records
    if (length(vcf) == 0)
        break  ## all done
    ## do work
}
close(tbx)
  Filtering to create a subset of a large file with filterVcf()
Rsamtools BamFile() and TabixFile() to open and iterate through BAM and Tabix files
  Selective input with ScanBamParam(); iterate through large files using the yieldSize argument of BamFile().
GenomicAlignments readGAlignmentsFromBam(), readGAlignmentsListFromBam()
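As a rough sketch of the BamFile() yieldSize pattern, the loop below counts alignments in chunks. It assumes the small example file ex1.bam that ships with Rsamtools is available, and it uses readGAlignments(), the name this reader has in recent GenomicAlignments releases, rather than readGAlignmentsFromBam(). The chunk size of 1000 and the choice of the "mapq" field are arbitrary.

```r
library(Rsamtools)
library(GenomicAlignments)

# Example BAM file distributed with Rsamtools (assumed to be installed)
fl <- system.file("extdata", "ex1.bam", package = "Rsamtools")

# Input only what is needed: here, the mapping quality of each record
param <- ScanBamParam(what = "mapq")

# Open the file so that each read returns at most 1000 records
bf <- open(BamFile(fl, yieldSize = 1000))
n_aln <- 0L
repeat {
    aln <- readGAlignments(bf, param = param)  # next chunk of alignments
    if (length(aln) == 0)
        break                                  # end of file reached
    n_aln <- n_aln + length(aln)               # stand-in for real work
}
close(bf)

n_aln
```

Only one chunk is held in memory at a time, so the same loop should scale to files much larger than this example.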
ShortRead FastqStreamer(), FastqSampler(), readFastq()
  Iterate through a file using yield() on an instance created by FastqStreamer()
  Sample from a file using yield() on an instance created with FastqSampler()
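The two yield() patterns can be sketched on a tiny FASTQ file. The five reads below are invented and written to a temporary file so the example does not depend on any external data; the chunk size of 2 and the sample size of 3 are arbitrary.

```r
library(ShortRead)

# Write a small, made-up FASTQ file (four lines per record)
fq <- tempfile(fileext = ".fastq")
writeLines(c(
    "@read1", "ACGTACGT", "+", "IIIIIIII",
    "@read2", "TTGCAAGC", "+", "IIIIHHHH",
    "@read3", "GGGCCCAA", "+", "HHHHIIII",
    "@read4", "ATATATAT", "+", "IIIIIIII",
    "@read5", "CCGGTTAA", "+", "HHHHHHHH"
), fq)

# Iterate: each yield() returns the next chunk of up to 2 records
strm <- FastqStreamer(fq, n = 2)
n_reads <- 0L
repeat {
    chunk <- yield(strm)             # a ShortReadQ object
    if (length(chunk) == 0)
        break                        # no records left
    n_reads <- n_reads + length(chunk)
}
close(strm)

# Sample: yield() returns a random subset of 3 records
set.seed(1)
smpl <- FastqSampler(fq, n = 3)
sampled <- yield(smpl)
close(smpl)

n_reads
length(sampled)
```

Streaming suits work that must visit every read, while sampling is often enough for quick quality checks on very large files.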
Annotated
  o metadata
  -- Vector
     o many methods (showMethods(class="Vector", where=search()))
     -- Rle
     -- List
        -- SimpleList
           -- DataFrame
           -- Simple*List, e.g., SimpleNumericList
        -- CompressedList (IRanges package)
           -- Compressed*List, e.g., CompressedNumericList
        -- Ranges
           -- IRanges
        -- ... *StringSet, e.g., DNAStringSet
     -- GenomicRanges
        -- GRanges (GenomicRanges package)
     -- ... *String, e.g., DNAString (Biostrings package)
        o transcribe, reverseComplement, pairwiseAlignment
SummarizedExperiment (GenomicRanges package)
  -- VCF (VariantAnnotation package; readVcf)
ShortReadQ
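Some of the relationships in this hierarchy can be checked interactively with is(). The sketch below assumes GenomicRanges and Biostrings are installed; the objects are toy values, and only relationships expected to hold in current releases are tested. It also applies reverseComplement(), one of the DNAString methods listed above.

```r
library(GenomicRanges)
library(Biostrings)

gr  <- GRanges("chr1", IRanges(1, 10))   # toy genomic range
dna <- DNAString("AACG")                 # toy DNA sequence

# Each entry asks whether an object inherits from a class in the hierarchy
checks <- c(
    GRanges_is_Annotated     = is(gr, "Annotated"),
    GRanges_is_Vector        = is(gr, "Vector"),
    IRanges_is_Vector        = is(IRanges(1, 10), "Vector"),
    Rle_is_Vector            = is(Rle(c(1, 1, 2)), "Vector"),
    DataFrame_is_List        = is(DataFrame(x = 1:2), "List"),
    IntegerList_is_Compressed = is(IntegerList(1:3), "CompressedList"),
    DNAStringSet_is_List     = is(DNAStringSet("ACGT"), "List"),
    DNAString_is_Vector      = is(dna, "Vector")
)
checks

# A method defined for DNAString: reverse the sequence and complement each base
rc <- reverseComplement(dna)
as.character(rc)
```

Because these classes share the Vector interface, operations such as length() and subsetting behave consistently across ranges, sequences, and run-length encoded data.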