The SpatialExperiment package is available via Bioconductor.
if(!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("SpatialExperiment")
Load the package as follows:
library(SpatialExperiment)
The SpatialExperiment class is designed to represent spatially
resolved transcriptomics (ST) data. It inherits from the
SingleCellExperiment class and is used in the same manner. In
addition, the class supports storage of spatial information via
spatialData and spatialCoords, and storage of
images via imgData.
The SpatialExperiment class constructor is defined with several arguments to
provide maximum flexibility to the user.
In particular, we distinguish between spatialData and spatialCoords as
follows:
spatialData expects a DataFrame with all the data associated with the
spatial information (optionally including spatial coordinates from spatialCoords).spatialCoords is a numeric matrix containing only the defined
(via spatialCoordsNames) spatial coordinates.When building a SpatialExperiment object, the columns of spatial coordinates from the
spatialData DFrame are identified via the spatialCoordsNames argument and
stored separately as a numeric matrix within spatialCoords.
Following is an example of spatialCoords built via spatialData and
spatialCoordsNames.
cd <- DataFrame(x=1:26, y=1:26, z=letters)
mat <- matrix(nrow=26, ncol=26)
spe <- SpatialExperiment(assay=mat,
spatialData=cd,
spatialCoordsNames=c("x", "y"))
head(spatialCoords(spe))
## x y
## [1,] 1 1
## [2,] 2 2
## [3,] 3 3
## [4,] 4 4
## [5,] 5 5
## [6,] 6 6
spatialCoordsNames(spe)
## [1] "x" "y"
head(spatialData(spe))
## DataFrame with 6 rows and 1 column
## z
## <character>
## 1 a
## 2 b
## 3 c
## 4 d
## 5 e
## 6 f
spatialDataNames(spe)
## [1] "z"
head(colData(spe))
## DataFrame with 6 rows and 2 columns
## sample_id z
## <character> <character>
## 1 sample01 a
## 2 sample01 b
## 3 sample01 c
## 4 sample01 d
## 5 sample01 e
## 6 sample01 f
It is also possible to display the combined spatial information with
spatialData() using the spatialCoords=TRUE argument:
spatialData(spe, spatialCoords=TRUE)
## DataFrame with 26 rows and 3 columns
## z x y
## <character> <integer> <integer>
## 1 a 1 1
## 2 b 2 2
## 3 c 3 3
## 4 d 4 4
## 5 e 5 5
## ... ... ... ...
## 22 v 22 22
## 23 w 23 23
## 24 x 24 24
## 25 y 25 25
## 26 z 26 26
Alternatively, it is possible to define the spatialDataNames to define
the spatialData DFrame from the columns of colData.
cd <- DataFrame(x=1:26, y=1:26, z=letters)
mat <- matrix(nrow=26, ncol=26)
spe <- SpatialExperiment(
assay=mat, colData=cd, spatialCoordsNames=c("x", "y"), spatialDataNames="z")
head(spatialData(spe))
## DataFrame with 6 rows and 1 column
## z
## <character>
## 1 a
## 2 b
## 3 c
## 4 d
## 5 e
## 6 f
head(spatialCoords(spe))
## x y
## [1,] 1 1
## [2,] 2 2
## [3,] 3 3
## [4,] 4 4
## [5,] 5 5
## [6,] 6 6
head(colData(spe))
## DataFrame with 6 rows and 2 columns
## z sample_id
## <character> <character>
## 1 a sample01
## 2 b sample01
## 3 c sample01
## 4 d sample01
## 5 e sample01
## 6 f sample01
Also, it is possible to load a numeric matrix of coordinates with the
spatialCoords argument.
y <- diag(n <- 10)
mat <- matrix(0, n, m <- 2)
spe <- SpatialExperiment(assays = y, spatialCoords = mat)
Finally, it is possible to set spatialData, spatialCoords, and colData
separately.
mat <- as.matrix(cd[,1:2])
colnames(mat) <- c("ecs","uai")
spad <- DataFrame(a=1:26, b=1:26, z=letters)
asy <- matrix(nrow=26, ncol=26)
spe <- SpatialExperiment(assays = asy, spatialCoords = mat,
spatialData=spad, colData=cd)
head(spatialData(spe))
## DataFrame with 6 rows and 3 columns
## a b z
## <integer> <integer> <character>
## 1 1 1 a
## 2 2 2 b
## 3 3 3 c
## 4 4 4 d
## 5 5 5 e
## 6 6 6 f
head(spatialCoords(spe))
## ecs uai
## [1,] 1 1
## [2,] 2 2
## [3,] 3 3
## [4,] 4 4
## [5,] 5 5
## [6,] 6 6
head(colData(spe))
## DataFrame with 6 rows and 7 columns
## x y z sample_id a b z
## <integer> <integer> <character> <character> <integer> <integer> <character>
## 1 1 1 a sample01 1 1 a
## 2 2 2 b sample01 2 2 b
## 3 3 3 c sample01 3 3 c
## 4 4 4 d sample01 4 4 d
## 5 5 5 e sample01 5 5 e
## 6 6 6 f sample01 6 6 f
To work with multiple samples, the SpatialExperiment class provides the cbind
method, which assumes unique sample_id(s) are provided for each sample.
In case the sample_id(s) are duplicated across multiple samples, the cbind
method takes care of this by appending indices to create unique sample identifiers.
spe1 <- spe2 <- spe
spe3 <- cbind(spe1, spe2)
## 'sample_id's are duplicated across 'SpatialExperiment' objects to cbind; appending sample indices.
unique(spe3$sample_id)
## [1] "sample01.1" "sample01.2"
Otherwise, it is possible to create unique sample_id(s) as follows.
# make sample identifiers unique
spe1 <- spe2 <- spe
spe1$sample_id <- paste(spe1$sample_id, "sample1", sep = ".")
spe2$sample_id <- paste(spe2$sample_id, "sample2", sep = ".")
# combine into single object
spe3 <- cbind(spe1, spe2)
spe3
## class: SpatialExperiment
## dim: 26 52
## metadata(0):
## assays(1): ''
## rownames: NULL
## rowData names(0):
## colnames: NULL
## colData names(4): x y sample_id z.1
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialData names(3) : a b z
## spatialCoords names(2) : ecs uai
## imgData names(1): sample_id
Subsetting objects is automatically defined to synchronize across all attributes of the objects, as for any other Bioconductor Experiment class.
For example, it is possible to subset by sample_id as follows:
spe3[, colData(spe)$sample_id=="sample1.sample1"]
## class: SpatialExperiment
## dim: 26 0
## metadata(0):
## assays(1): ''
## rownames: NULL
## rowData names(0):
## colnames: NULL
## colData names(4): x y sample_id z.1
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialData names(3) : a b z
## spatialCoords names(2) : ecs uai
## imgData names(1): sample_id
In particular, when trying to replace the sample_id(s) of a SpatialExperiment
object, these must map uniquely with the already existing ones, otherwise an
error is returned.
new <- spe3$sample_id;
new[1] <- "sample1.sample2"
spe3$sample_id <- new
## Error in .local(x, ..., value): Number of unique 'sample_id's is 2, but 3 were provided.
new[1] <- "third.one.of.two"
spe3$sample_id <- new
## Error in .local(x, ..., value): Number of unique 'sample_id's is 2, but 3 were provided.
When working with spot-based ST data, such as 10x Genomics Visium or other
platforms providing images, it is possible to store the image information in the
dedicated imgData structure.
Also, the SpatialExperiment class stores a sample_id value in the
spatialData structure, which is possible to set with the sample_id argument
(default is “sample_01”).
Here we show how to load the default Space Ranger data files from a 10x Genomics
Visium experiment, and build a SpatialExperiment object.
In particular, the readImgData() function is used to build an imgData
DataFrame to be passed to the SpatialExperiment constructor.
The sample_id used to build the imgData object must be the
same one used to build the SpatialExperiment object, otherwise an error is
returned.
dir <- system.file(
file.path("extdata", "10xVisium", "section1"),
package = "SpatialExperiment")
# read in counts
fnm <- file.path(dir, "raw_feature_bc_matrix")
sce <- DropletUtils::read10xCounts(fnm)
# read in image data
img <- readImgData(
path = file.path(dir, "spatial"),
sample_id="foo")
# read in spatial coordinates
fnm <- file.path(dir, "spatial", "tissue_positions_list.csv")
xyz <- read.csv(fnm, header = FALSE,
col.names = c(
"barcode", "in_tissue", "array_row", "array_col",
"pxl_row_in_fullres", "pxl_col_in_fullres"))
# construct observation & feature metadata
rd <- S4Vectors::DataFrame(
symbol = rowData(sce)$Symbol)
# construct 'SpatialExperiment'
(spe <- SpatialExperiment(
assays = list(counts = assay(sce)),
colData = colData(sce), rowData = rd, imgData = img,
spatialData=DataFrame(xyz),
spatialCoordsNames=c("pxl_col_in_fullres", "pxl_row_in_fullres"),
sample_id="foo"))
## class: SpatialExperiment
## dim: 50 50
## metadata(0):
## assays(1): counts
## rownames(50): ENSMUSG00000051951 ENSMUSG00000089699 ...
## ENSMUSG00000005886 ENSMUSG00000101476
## rowData names(1): symbol
## colnames: NULL
## colData names(3): Sample Barcode sample_id
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialData names(4) : barcode in_tissue array_row array_col
## spatialCoords names(2) : pxl_col_in_fullres pxl_row_in_fullres
## imgData names(4): sample_id image_id data scaleFactor
Alternatively, the read10xVisium() function facilitates the import of
10x Genomics Visium data to handle one or more samples organized in
folders reflecting the default Space Ranger folder tree organization:
sample
|—outs
· · |—raw/filtered_feature_bc_matrix.h5
· · |—raw/filtered_feature_bc_matrix
· · · · |—barcodes.tsv
· · · · |—features.tsv
· · · · |—matrix.mtx
· · |—spatial
· · · · |—scalefactors_json.json
· · · · |—tissue_lowres_image.png
· · · · |—tissue_positions_list.csv
dir <- system.file(
file.path("extdata", "10xVisium"),
package = "SpatialExperiment")
sample_ids <- c("section1", "section2")
samples <- file.path(dir, sample_ids)
(spe <- read10xVisium(samples, sample_ids,
type = "sparse", data = "raw",
images = "lowres", load = FALSE))
## class: SpatialExperiment
## dim: 50 99
## metadata(0):
## assays(1): counts
## rownames(50): ENSMUSG00000051951 ENSMUSG00000089699 ...
## ENSMUSG00000005886 ENSMUSG00000101476
## rowData names(1): symbol
## colnames(99): AAACAACGAATAGTTC-1 AAACAAGTATCTCCCA-1 ...
## AAAGTCGACCCTCAGT-1 AAAGTGCCATCAATTA-1
## colData names(1): sample_id
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialData names(3) : in_tissue array_row array_col
## spatialCoords names(2) : pxl_col_in_fullres pxl_row_in_fullres
## imgData names(4): sample_id image_id data scaleFactor
To demonstrate how to accommodate molecule-based ST data (e.g. seqFISH platform)
inside a SpatialExperiment object, we generate some mock data of
molecule coordinates across and cells.
These should be formatted into a data.frame where each row corresponds to a
molecule, and columns specify the xy-position as well as which gene/cell the
molecule has been assigned to:
# sample xy-coordinates in [0,1]
x <- runif(n)
y <- runif(n)
# assign each molecule to some gene-cell pair
gs <- paste0("gene", seq(ng))
cs <- paste0("cell", seq(nc))
gene <- sample(gs, n, TRUE)
cell <- sample(cs, n, TRUE)
# construct data.frame of molecule coodinates
df <- data.frame(gene, cell, x, y)
head(df)
## gene cell x y
## 1 gene2 cell1 0.8808741 0.320652463
## 2 gene13 cell3 0.6192341 0.543930311
## 3 gene35 cell16 0.8563139 0.249875277
## 4 gene12 cell16 0.1921838 0.740945357
## 5 gene49 cell1 0.8762683 0.001373172
## 6 gene40 cell4 0.2871124 0.811706067
Next, it is possible to re-shape the above table into a
BumpyMatrix using splitAsBumpyMatrix(), which takes
as input the xy-coordinates, as well as arguments specifying the row and column
index of each observation:
# (assure gene & cell are factor so that
# missing observations aren't dropped)
df$gene <- factor(df$gene, gs)
df$cell <- factor(df$cell, cs)
# construct BumpyMatrix
library(BumpyMatrix)
mol <- splitAsBumpyMatrix(
df[, c("x", "y")],
row = gs, col = cs)
Finally, it is possible to construct a SpatialExperiment object with two data
slots:
- The counts assay stores the number of molecules per gene and cell
(equivalent to transcript counts in spot-based data).
- The molecules assay holds the spatial molecule positions (xy-coordinates).
Here, each entry is a DFrame that contains the positions of all molecules
from a given gene that have been assigned to a given cell.
# get count matrix
y <- with(df, table(gene, cell))
y <- as.matrix(unclass(y))
y[1:5, 1:5]
## cell
## gene cell1 cell2 cell3 cell4 cell5
## gene1 2 0 0 1 0
## gene2 3 2 4 1 0
## gene3 1 1 0 1 3
## gene4 1 0 1 1 1
## gene5 2 1 1 0 0
# construct SpatialExperiment
spe <- SpatialExperiment(
assays = list(
counts = y,
molecules = mol))
spe
## class: SpatialExperiment
## dim: 50 20
## metadata(0):
## assays(2): counts molecules
## rownames(50): gene1 gene2 ... gene49 gene50
## rowData names(0):
## colnames(20): cell1 cell2 ... cell19 cell20
## colData names(1): sample_id
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialData names(0) :
## spatialCoords names(0) :
## imgData names(0):
The BumpyMatrix of molecule locations can be accessed using the dedicated
molecules() accessor:
molecules(spe)
## 50 x 20 BumpyDataFrameMatrix
## rownames: gene1 gene2 ... gene49 gene50
## colnames: cell1 cell2 ... cell19 cell20
## preview [1,1]:
## DataFrame with 20 rows and 2 columns
## x y
## <numeric> <numeric>
## 1 0.8808741 0.320652
## 2 0.0260193 0.469372
## 3 0.8526963 0.294299
## 4 0.0341913 0.918395
## 5 0.3358630 0.748199
## ... ... ...
## 16 0.38597319 0.799529
## 17 0.50525161 0.116668
## 18 0.62342685 0.925884
## 19 0.00431502 0.625905
## 20 0.34600773 0.944289
sessionInfo()
## R version 4.1.0 (2021-05-18)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.2 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.13-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.13-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] BumpyMatrix_1.0.0 SpatialExperiment_1.2.1
## [3] SingleCellExperiment_1.14.1 SummarizedExperiment_1.22.0
## [5] Biobase_2.52.0 GenomicRanges_1.44.0
## [7] GenomeInfoDb_1.28.0 IRanges_2.26.0
## [9] S4Vectors_0.30.0 BiocGenerics_0.38.0
## [11] MatrixGenerics_1.4.0 matrixStats_0.59.0
## [13] BiocStyle_2.20.0
##
## loaded via a namespace (and not attached):
## [1] locfit_1.5-9.4 xfun_0.23
## [3] bslib_0.2.5.1 beachmat_2.8.0
## [5] HDF5Array_1.20.0 lattice_0.20-44
## [7] rhdf5_2.36.0 htmltools_0.5.1.1
## [9] yaml_2.2.1 rlang_0.4.11
## [11] R.oo_1.24.0 jquerylib_0.1.4
## [13] scuttle_1.2.0 R.utils_2.10.1
## [15] BiocParallel_1.26.0 dqrng_0.3.0
## [17] GenomeInfoDbData_1.2.6 stringr_1.4.0
## [19] zlibbioc_1.38.0 R.methodsS3_1.8.1
## [21] codetools_0.2-18 evaluate_0.14
## [23] knitr_1.33 Rcpp_1.0.6
## [25] edgeR_3.34.0 formatR_1.11
## [27] BiocManager_1.30.15 limma_3.48.0
## [29] DelayedArray_0.18.0 magick_2.7.2
## [31] jsonlite_1.7.2 XVector_0.32.0
## [33] rjson_0.2.20 digest_0.6.27
## [35] stringi_1.6.2 bookdown_0.22
## [37] grid_4.1.0 tools_4.1.0
## [39] bitops_1.0-7 rhdf5filters_1.4.0
## [41] magrittr_2.0.1 sass_0.4.0
## [43] RCurl_1.98-1.3 Matrix_1.3-4
## [45] DelayedMatrixStats_1.14.0 sparseMatrixStats_1.4.0
## [47] rmarkdown_2.8 Rhdf5lib_1.14.1
## [49] R6_2.5.0 DropletUtils_1.12.1
## [51] compiler_4.1.0