The RaggedExperiment package provides a flexible data representation for copy number, mutation and other ragged array schema for genomic location data. It aims to provide a framework for a set of samples that have differing numbers of genomic ranges.
The RaggedExperiment class derives from a GRangesList representation and
provides a semblance of a rectangular dataset. The row and column dimensions of
the RaggedExperiment correspond to the number of ranges in the entire dataset
and the number of samples represented in the data, respectively.
if (!require("BiocManager"))
install.packages("BiocManager")
BiocManager::install("RaggedExperiment")
Loading the package:
library(RaggedExperiment)
library(GenomicRanges)
RaggedExperiment class overviewA schematic showing the class geometry and supported transformations of the
RaggedExperiment class is show below. There are three main operations for
transforming the RaggedExperiment representation:
sparseAssaycompactAssayqreduceAssay
Figure 1: RaggedExperiment object schematic
Rows and columns represent genomic ranges and samples, respectively. Assay operations can be performed with (from left to right) compactAssay, qreduceAssay, and sparseAssay.
RaggedExperiment objectWe start with a couple of GRanges objects, each representing an individual
sample:
sample1 <- GRanges(
c(A = "chr1:1-10:-", B = "chr1:8-14:+", C = "chr2:15-18:+"),
score = 3:5)
sample2 <- GRanges(
c(D = "chr1:1-10:-", E = "chr2:11-18:+"),
score = 1:2)
Include column data colData to describe the samples:
colDat <- DataFrame(id = 1:2)
GRanges objectsragexp <- RaggedExperiment(sample1 = sample1,
sample2 = sample2,
colData = colDat)
ragexp
## class: RaggedExperiment
## dim: 5 2
## assays(1): score
## rownames(5): A B C D E
## colnames(2): sample1 sample2
## colData names(1): id
GRangesList instancegrl <- GRangesList(sample1 = sample1, sample2 = sample2)
RaggedExperiment(grl, colData = colDat)
## class: RaggedExperiment
## dim: 5 2
## assays(1): score
## rownames(5): A B C D E
## colnames(2): sample1 sample2
## colData names(1): id
list of GRangesrangeList <- list(sample1 = sample1, sample2 = sample2)
RaggedExperiment(rangeList, colData = colDat)
## class: RaggedExperiment
## dim: 5 2
## assays(1): score
## rownames(5): A B C D E
## colnames(2): sample1 sample2
## colData names(1): id
List of GRanges with metadataNote: In cases where a SimpleGenomicRangesList is provided along with
accompanying metadata (accessed by mcols), the metadata is used as the
colData for the RaggedExperiment.
grList <- List(sample1 = sample1, sample2 = sample2)
mcols(grList) <- colDat
RaggedExperiment(grList)
## class: RaggedExperiment
## dim: 5 2
## assays(1): score
## rownames(5): A B C D E
## colnames(2): sample1 sample2
## colData names(1): id
rowRanges(ragexp)
## GRanges object with 5 ranges and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## A chr1 1-10 -
## B chr1 8-14 +
## C chr2 15-18 +
## D chr1 1-10 -
## E chr2 11-18 +
## -------
## seqinfo: 2 sequences from an unspecified genome; no seqlengths
dimnames(ragexp)
## [[1]]
## [1] "A" "B" "C" "D" "E"
##
## [[2]]
## [1] "sample1" "sample2"
colDatacolData(ragexp)
## DataFrame with 2 rows and 1 column
## id
## <integer>
## sample1 1
## sample2 2
Subsetting a RaggedExperiment is akin to subsetting a matrix object. Rows
correspond to genomic ranges and columns to samples or specimen. It is possible
to subset using integer, character, and logical indices.
The overlapsAny and subsetByOverlaps functionalities are available for use
for RaggedExperiment. Please see the corresponding documentation in
RaggedExperiment and GenomicRanges.
RaggedExperiment package provides several different functions for representing
ranged data in a rectangular matrix via the *Assay methods.
The most straightforward matrix representation of a RaggedExperiment will
return a matrix of dimensions equal to the product of the number of ranges and
samples.
dim(ragexp)
## [1] 5 2
Reduce(`*`, dim(ragexp))
## [1] 10
sparseAssay(ragexp)
## sample1 sample2
## A 3 NA
## B 4 NA
## C 5 NA
## D NA 1
## E NA 2
length(sparseAssay(ragexp))
## [1] 10
We provide sparse matrix representations with the help of the Matrix package.
To obtain a sparse representation, the user can use the sparse = TRUE
argument.
sparseAssay(ragexp, sparse = TRUE)
## 5 x 2 sparse Matrix of class "dgCMatrix"
## sample1 sample2
## A 3 .
## B 4 .
## C 5 .
## D . 1
## E . 2
This representation is of class dgCMatrix see the Matrix documentation for
more details.
Samples with identical ranges are placed in the same row. Non-disjoint ranges are not collapsed.
compactAssay(ragexp)
## sample1 sample2
## chr1:8-14:+ 4 NA
## chr1:1-10:- 3 1
## chr2:11-18:+ NA 2
## chr2:15-18:+ 5 NA
Similarly, to sparseAssay the compactAssay function provides the option to
obtain a sparse matrix representation with the sparse = TRUE argument. This
will return a dgCMatrix class from the Matrix package.
compactAssay(ragexp, sparse = TRUE)
## 4 x 2 sparse Matrix of class "dgCMatrix"
## sample1 sample2
## chr1:1-10:- 3 1
## chr1:8-14:+ 4 .
## chr2:15-18:+ 5 .
## chr2:11-18:+ . 2
This function returns a matrix of disjoint ranges across all samples. Elements
of the matrix are summarized by applying the simplifyDisjoin functional
argument to assay values of overlapping ranges.
disjoinAssay(ragexp, simplifyDisjoin = mean)
## sample1 sample2
## chr1:8-14:+ 4 NA
## chr1:1-10:- 3 1
## chr2:11-14:+ NA 2
## chr2:15-18:+ 5 2
The qreduceAssay function works with a query parameter that highlights
a window of ranges for the resulting matrix. The returned matrix will have
dimensions length(query) by ncol(x). Elements contain assay values for the
i th query range and the j th sample, summarized according to the
simplifyReduce functional argument.
For demonstration purposes, we can have a look at the original GRangesList
and the associated scores from which the current ragexp object is derived:
unlist(grl, use.names = FALSE)
## GRanges object with 5 ranges and 1 metadata column:
## seqnames ranges strand | score
## <Rle> <IRanges> <Rle> | <integer>
## A chr1 1-10 - | 3
## B chr1 8-14 + | 4
## C chr2 15-18 + | 5
## D chr1 1-10 - | 1
## E chr2 11-18 + | 2
## -------
## seqinfo: 2 sequences from an unspecified genome; no seqlengths
This data is represented as rowRanges and assays in RaggedExperiment:
rowRanges(ragexp)
## GRanges object with 5 ranges and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## A chr1 1-10 -
## B chr1 8-14 +
## C chr2 15-18 +
## D chr1 1-10 -
## E chr2 11-18 +
## -------
## seqinfo: 2 sequences from an unspecified genome; no seqlengths
assay(ragexp, "score")
## sample1 sample2
## A 3 NA
## B 4 NA
## C 5 NA
## D NA 1
## E NA 2
Here we provide the “query” region of interest:
(query <- GRanges(c("chr1:1-14:-", "chr2:11-18:+")))
## GRanges object with 2 ranges and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## [1] chr1 1-14 -
## [2] chr2 11-18 +
## -------
## seqinfo: 2 sequences from an unspecified genome; no seqlengths
The simplifyReduce argument in qreduceAssay allows the user to summarize
overlapping regions with a custom method for the given “query” region of
interest. We provide one for calculating a weighted average score per
query range, where the weight is proportional to the overlap widths between
overlapping ranges and a query range.
Note that there are three arguments to this function. See the documentation for additional details.
weightedmean <- function(scores, ranges, qranges)
{
isects <- pintersect(ranges, qranges)
sum(scores * width(isects)) / sum(width(isects))
}
A call to qreduceAssay involves the RaggedExperiment, the GRanges query
and the simplifyReduce functional argument.
qreduceAssay(ragexp, query, simplifyReduce = weightedmean)
## sample1 sample2
## chr1:1-14:- 3 1
## chr2:11-18:+ 5 2
See the schematic for a visual representation.
The RaggedExperiment provides a family of parallel functions for coercing to
the SummarizedExperiment class. By selecting a particular assay index (i),
a parallel assay coercion method can be achieved.
Here is the list of function names:
sparseSummarizedExperimentcompactSummarizedExperimentdisjoinSummarizedExperimentqreduceSummarizedExperimentSee the documentation for details.
In the special case where the rownames of a sparseMatrix are coercible to
GRanges, RaggedExperiment provides the facility to convert sparse matrices
into RaggedExperiment. This can be done using the as coercion method. The
example below first creates an example sparse dgCMatrix class and then shows
the as method usage to this end.
library(Matrix)
##
## Attaching package: 'Matrix'
## The following object is masked from 'package:S4Vectors':
##
## expand
sm <- Matrix::sparseMatrix(
i = c(2, 3, 4, 3, 4, 3, 4),
j = c(1, 1, 1, 3, 3, 4, 4),
x = c(2L, 4L, 2L, 2L, 2L, 4L, 2L),
dims = c(4, 4),
dimnames = list(
c("chr2:1-10", "chr2:2-10", "chr2:3-10", "chr2:4-10"),
LETTERS[1:4]
)
)
as(sm, "RaggedExperiment")
## class: RaggedExperiment
## dim: 7 3
## assays(1): counts
## rownames: NULL
## colnames(3): A C D
## colData names(0):
sessionInfo()
## R version 4.2.1 (2022-06-23)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.5 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.16-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.16-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] Matrix_1.5-1 RaggedExperiment_1.22.0 GenomicRanges_1.50.0
## [4] GenomeInfoDb_1.34.0 IRanges_2.32.0 S4Vectors_0.36.0
## [7] BiocGenerics_0.44.0 BiocStyle_2.26.0
##
## loaded via a namespace (and not attached):
## [1] highr_0.9 bslib_0.4.0
## [3] compiler_4.2.1 BiocManager_1.30.19
## [5] jquerylib_0.1.4 XVector_0.38.0
## [7] MatrixGenerics_1.10.0 bitops_1.0-7
## [9] tools_4.2.1 zlibbioc_1.44.0
## [11] digest_0.6.30 lattice_0.20-45
## [13] jsonlite_1.8.3 evaluate_0.17
## [15] rlang_1.0.6 DelayedArray_0.24.0
## [17] cli_3.4.1 yaml_2.3.6
## [19] xfun_0.34 fastmap_1.1.0
## [21] GenomeInfoDbData_1.2.9 stringr_1.4.1
## [23] knitr_1.40 sass_0.4.2
## [25] grid_4.2.1 Biobase_2.58.0
## [27] R6_2.5.1 rmarkdown_2.17
## [29] bookdown_0.29 magrittr_2.0.3
## [31] htmltools_0.5.3 matrixStats_0.62.0
## [33] SummarizedExperiment_1.28.0 stringi_1.7.8
## [35] RCurl_1.98-1.9 cachem_1.0.6