In this guide, we illustrate here two common downstream analysis workflows for ChIP-seq experiments, one is for comparing and combining peaks for single transcription factor (TF) with replicates, and the other is for comparing binding profiles from ChIP-seq experiments with multiple TFs.
This workflow shows how to convert BED/GFF files to GRanges, find overlapping peaks between two peak sets, and visualize the number of common and specific peaks with Venn diagram.
The input for ChIPpeakAnno1 is a list of called peaks identified from ChIP-seq experiments or any other experiments that yield a set of chromosome coordinates. Although peaks are represented as GRanges in ChIPpeakAnno, other common peak formats such as BED, GFF and MACS can be converted to GRanges easily using a conversion toGRanges method. For detailed information on how to use this method, please type ?toGRanges.
The following examples illustrate the usage of this method to convert BED and GFF file to GRanges, add metadata from orignal peaks to the overlap GRanges using function addMetadata, and visualize the overlapping using function makeVennDiagram.
library(ChIPpeakAnno)
bed <- system.file("extdata", "MACS_output.bed", package="ChIPpeakAnno")
gr1 <- toGRanges(bed, format="BED", header=FALSE) 
## one can also try import from rtracklayer
gff <- system.file("extdata", "GFF_peaks.gff", package="ChIPpeakAnno")
gr2 <- toGRanges(gff, format="GFF", header=FALSE, skip=3)
## must keep the class exactly same as gr1$score, i.e., numeric.
gr2$score <- as.numeric(gr2$score) 
ol <- findOverlapsOfPeaks(gr1, gr2)
## add metadata (mean of score) to the overlapping peaks
ol <- addMetadata(ol, colNames="score", FUN=mean) 
ol$peaklist[["gr1///gr2"]][1:2]## GRanges object with 2 ranges and 2 metadata columns:
##       seqnames           ranges strand |
##          <Rle>        <IRanges>  <Rle> |
##   [1]     chr1 [713791, 715578]      * |
##   [2]     chr1 [724851, 727191]      * |
##                                           peakNames     score
##                                     <CharacterList> <numeric>
##   [1] gr1__MACS_peak_13,gr2__region_0,gr2__region_1  850.2033
##   [2]               gr2__region_2,gr1__MACS_peak_14   29.1700
##   -------
##   seqinfo: 1 sequence from an unspecified genome; no seqlengthsmakeVennDiagram(ol)## $p.value
##      gr1 gr2 pval
## [1,]   1   1    0
## 
## $vennCounts
##      gr1 gr2 Counts
## [1,]   0   0      0
## [2,]   0   1     61
## [3,]   1   0     62
## [4,]   1   1    166
## attr(,"class")
## [1] "VennCounts"Annotation data should be an object of GRanges. Same as import peaks, we use the method toGRanges, which can return an object of GRanges, to represent the annotation data. An annotation data be constructed from not only BED, GFF or user defined readable text files, but also EnsDb or TxDb object, by calling the toGRanges method. Please type ?toGRanges for more information.
library(EnsDb.Hsapiens.v75) ##(hg19)
## create annotation file from EnsDb or TxDb
annoData <- toGRanges(EnsDb.Hsapiens.v75, feature="gene")
annoData[1:2]## GRanges object with 2 ranges and 1 metadata column:
##                   seqnames      ranges strand |   gene_name
##                      <Rle>   <IRanges>  <Rle> | <character>
##   ENSG00000210049     chrM [577,  647]      + |       MT-TF
##   ENSG00000211459     chrM [648, 1601]      + |     MT-RNR1
##   -------
##   seqinfo: 273 sequences from GRCh37 genomeAfter finding the overlapping peaks, the distribution of the distance of overlapped peaks to the nearest feature such as the transcription start sites (TSS) can be plotted by binOverFeature function. The sample code here plots the distribution of peaks around the TSS.
overlaps <- ol$peaklist[["gr1///gr2"]]
binOverFeature(overlaps, annotationData=annoData,
               radius=5000, nbins=20, FUN=length, errFun=0,
               ylab="count", 
               main="Distribution of aggregated peak numbers around TSS")In addition, assignChromosomeRegion can be used to summarize the distribution of peaks over different type of features such as exon, intron, enhancer, proximal promoter, 5’ UTR and 3’ UTR. This distribution can be summarized in peak centric or nucleotide centric view using the function assignChromosomeRegion. Please note that one peak might span multiple type of features, leading to the number of annotated features greater than the total number of input peaks. At the peak centric view, precedence will dictate the annotation order when peaks span multiple type of features.
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
aCR<-assignChromosomeRegion(overlaps, nucleotideLevel=FALSE, 
                           precedence=c("Promoters", "immediateDownstream", 
                                         "fiveUTRs", "threeUTRs", 
                                         "Exons", "Introns"), 
                           TxDb=TxDb.Hsapiens.UCSC.hg19.knownGene)
barplot(aCR$percentage, las=3)As shown from the distribution of aggregated peak numbers around TSS and the distribution of peaks in different of chromosome regions, most of the peaks locate around TSS. Therefore, it is reasonable to use annotatePeakInBatch or annoPeaks to annotate the peaks to the promoter regions of Hg19 genes. Promoters can be specified with bindingRegion. For the following example, promoter region is defined as upstream 2000 and downstream 500 from TSS (bindingRegion=c(-2000, 500)).
overlaps.anno <- annotatePeakInBatch(overlaps, 
                                     AnnotationData=annoData, 
                                     output="nearestBiDirectionalPromoters",
                                     bindingRegion=c(-2000, 500))
library(org.Hs.eg.db)
overlaps.anno <- addGeneIDs(overlaps.anno,
                            "org.Hs.eg.db",
                            IDs2Add = "entrez_id")
head(overlaps.anno)## GRanges object with 6 ranges and 10 metadata columns:
##       seqnames           ranges strand |
##          <Rle>        <IRanges>  <Rle> |
##   [1]     chr1 [713791, 715578]      * |
##   [2]     chr1 [713791, 715578]      * |
##   [3]     chr1 [839467, 840090]      * |
##   [4]     chr1 [856361, 856999]      * |
##   [5]     chr1 [859315, 860144]      * |
##   [6]     chr1 [901118, 902861]      * |
##                                           peakNames     score
##                                     <CharacterList> <numeric>
##   [1] gr1__MACS_peak_13,gr2__region_0,gr2__region_1  850.2033
##   [2] gr1__MACS_peak_13,gr2__region_0,gr2__region_1  850.2033
##   [3]               gr1__MACS_peak_16,gr2__region_3   73.1200
##   [4]               gr1__MACS_peak_17,gr2__region_4   54.6900
##   [5]               gr2__region_5,gr1__MACS_peak_18   81.4850
##   [6]              gr2__region_11,gr1__MACS_peak_23  119.9100
##               feature   feature.ranges feature.strand  distance
##           <character>        <IRanges>          <Rle> <integer>
##   [1] ENSG00000228327 [700237, 714006]              -         0
##   [2] ENSG00000237491 [714150, 745440]              +         0
##   [3] ENSG00000272438 [840214, 851356]              +       123
##   [4] ENSG00000223764 [852245, 856396]              -         0
##   [5] ENSG00000187634 [860260, 879955]              +       115
##   [6] ENSG00000187583 [901877, 911245]              +         0
##       insideFeature distanceToStart     gene_name   entrez_id
##            <factor>       <numeric>   <character> <character>
##   [1]  overlapStart             215 RP11-206L10.2        <NA>
##   [2]  overlapStart             359 RP11-206L10.9        <NA>
##   [3]      upstream             124  RP11-54O7.16        <NA>
##   [4]  overlapStart              35   RP11-54O7.3   100130417
##   [5]      upstream             116        SAMD11      148398
##   [6]  overlapStart             759       PLEKHN1       84069
##   -------
##   seqinfo: 1 sequence from an unspecified genome; no seqlengthswrite.csv(as.data.frame(unname(overlaps.anno)), "anno.csv")The distribution of the common peaks around features can be visualized using a pie chart.
pie1(table(overlaps.anno$insideFeature))The following example shows how to use getEnrichedGO to obtain a list of enriched GO terms with annotated peaks. For pathway analysis, please use function getEnrichedPATH with reactome or KEGG database. Please note that by default feature_id_type is set as “ensembl_gene_id”. If you are using TxDb as annotation data, please set it to “entrez_id”.
over <- getEnrichedGO(overlaps.anno, orgAnn="org.Hs.eg.db", 
                     maxP=.05, minGOterm=10, 
                     multiAdjMethod="BH", condense=TRUE)
head(over[["bp"]][, -c(3, 10)])## [1] go.id              go.term            Ontology          
## [4] pvalue             count.InDataset    count.InGenome    
## [7] totaltermInDataset totaltermInGenome  EntrezID          
## <0 rows> (or 0-length row.names)library(reactome.db)
path <- getEnrichedPATH(overlaps.anno, "org.Hs.eg.db", "reactome.db", maxP=.05)
head(path)##   path.id EntrezID count.InDataset count.InGenome     pvalue
## 1  114604     5590               1             28 0.04645649
## 2 1296041     2782               1             25 0.04158312
## 3 1296059     2782               1             25 0.04158312
## 4 1852241    54998               3            283 0.01261177
## 5 1852241    55052               3            283 0.01261177
## 6 1852241   261734               3            283 0.01261177
##   totaltermInDataset totaltermInGenome
## 1                111             65404
## 2                111             65404
## 3                111             65404
## 4                111             65404
## 5                111             65404
## 6                111             65404
##                                                                                                                            PATH
## 1                                 Homo sapiens: GPVI-mediated activation cascade;Homo sapiens: GPVI-mediated activation cascade
## 2 Homo sapiens: Activation of G protein gated Potassium channels;Homo sapiens: Activation of G protein gated Potassium channels
## 3                             Homo sapiens: G protein gated Potassium channels;Homo sapiens: G protein gated Potassium channels
## 4                         Homo sapiens: Organelle biogenesis and maintenance;Homo sapiens: Organelle biogenesis and maintenance
## 5                         Homo sapiens: Organelle biogenesis and maintenance;Homo sapiens: Organelle biogenesis and maintenance
## 6                         Homo sapiens: Organelle biogenesis and maintenance;Homo sapiens: Organelle biogenesis and maintenanceHere is an example to get the sequences of the peaks plus 20 bp upstream and downstream for PCR validation or motif discovery.
library(BSgenome.Hsapiens.UCSC.hg19)
seq <- getAllPeakSequence(overlaps, upstream=20, downstream=20, genome=Hsapiens)
write2FASTA(seq, "test.fa")Here is an example to get the Z-scores for short oligos3.
## summary of the short oligos
freqs <- oligoFrequency(Hsapiens$chr1, MarkovOrder=3)
os <- oligoSummary(seq, oligoLength=6, MarkovOrder=3, 
                   quickMotif=FALSE, freqs=freqs)
## plot the results
zscore <- sort(os$zscore)
h <- hist(zscore, breaks=100, xlim=c(-50, 50), main="Histogram of Z-score")
text(zscore[length(zscore)], max(h$counts)/10, 
     labels=names(zscore[length(zscore)]), adj=1)## We can also try simulation data
seq.sim.motif <- list(c("t", "g", "c", "a", "t", "g"), 
                      c("g", "c", "a", "t", "g", "c"))
set.seed(1)
seq.sim <- sapply(sample(c(2, 1, 0), 1000, replace=TRUE, prob=c(0.07, 0.1, 0.83)), 
                  function(x){
    s <- sample(c("a", "c", "g", "t"), 
                sample(100:1000, 1), replace=TRUE)
    if(x>0){
        si <- sample.int(length(s), 1)
        if(si>length(s)-6) si <- length(s)-6
        s[si:(si+5)] <- seq.sim.motif[[x]]
    }
    paste(s, collapse="")
})
os <- oligoSummary(seq.sim, oligoLength=6, MarkovOrder=3, 
                   quickMotif=TRUE)
zscore <- sort(os$zscore, decreasing=TRUE)
h <- hist(zscore, breaks=100, main="Histogram of Z-score")
text(zscore[1:2], rep(5, 2), 
     labels=names(zscore[1:2]), adj=0, srt=90)## generate the motifs
library(motifStack)
pfms <- mapply(function(.ele, id)
    new("pfm", mat=.ele, name=paste("SAMPLE motif", id)), 
    os$motifs, 1:length(os$motifs))
motifStack(pfms[[1]])Bidirectional promoters are the DNA regions located between TSS of two adjacent genes that are transcribed on opposite directions and often co-regulated by this shared promoter region5. Here is an example to find peaks near bi-directional promoters.
bdp <- peaksNearBDP(overlaps, annoData, maxgap=5000)
c(bdp$percentPeaksWithBDP, 
  bdp$n.peaks, 
  bdp$n.peaksWithBDP)## [1]   0.1084337 166.0000000  18.0000000bdp$peaksWithBDP[1:2]## GRangesList object of length 2:
## $1 
## GRanges object with 2 ranges and 10 metadata columns:
##       seqnames           ranges strand |
##          <Rle>        <IRanges>  <Rle> |
##   [1]     chr1 [713791, 715578]      * |
##   [2]     chr1 [713791, 715578]      * |
##                                           peakNames     score   bdp_idx
##                                     <CharacterList> <numeric> <integer>
##   [1] gr1__MACS_peak_13,gr2__region_0,gr2__region_1  850.2033         1
##   [2] gr1__MACS_peak_13,gr2__region_0,gr2__region_1  850.2033         1
##               feature   feature.ranges feature.strand  distance
##           <character>        <IRanges>          <Rle> <integer>
##   [1] ENSG00000228327 [700237, 714006]              -         0
##   [2] ENSG00000237491 [714150, 745440]              +         0
##       insideFeature distanceToStart     gene_name
##            <factor>       <numeric>   <character>
##   [1]  overlapStart             215 RP11-206L10.2
##   [2]  overlapStart             359 RP11-206L10.9
## 
## $4 
## GRanges object with 2 ranges and 10 metadata columns:
##       seqnames           ranges strand |                       peakNames
##   [1]     chr1 [856361, 856999]      * | gr1__MACS_peak_17,gr2__region_4
##   [2]     chr1 [856361, 856999]      * | gr1__MACS_peak_17,gr2__region_4
##       score bdp_idx         feature   feature.ranges feature.strand
##   [1] 54.69       4 ENSG00000223764 [852245, 856396]              -
##   [2] 54.69       4 ENSG00000187634 [860260, 879955]              +
##       distance insideFeature distanceToStart   gene_name
##   [1]        0  overlapStart              35 RP11-54O7.3
##   [2]     3260      upstream            3261      SAMD11
## 
## -------
## seqinfo: 1 sequence from an unspecified genome; no seqlengthsThere are several techniques available to determine the spatial organization of chromosomes at high resolution such as 3C, 5C and HiC6. These techniques make it possible to search peaks binding to the potential enhancer regions. Here is an example to find peaks binding to the potential enhancer regions.
DNA5C <- system.file("extdata", 
                     "wgEncodeUmassDekker5CGm12878PkV2.bed.gz",
                     package="ChIPpeakAnno")
DNAinteractiveData <- toGRanges(gzfile(DNA5C))
findEnhancers(overlaps, annoData, DNAinteractiveData)## GRanges object with 5 ranges and 13 metadata columns:
##       seqnames                 ranges strand |
##          <Rle>              <IRanges>  <Rle> |
##   [1]     chr1 [151591700, 151591800]      * |
##   [2]     chr1 [151591700, 151591800]      * |
##   [3]     chr1 [151591700, 151591800]      * |
##   [4]     chr1 [151591700, 151591800]      * |
##   [5]     chr1 [151630186, 151630286]      * |
##                                peakNames     score         feature
##                          <CharacterList> <numeric>     <character>
##   [1] gr2__region_228,gr1__MACS_peak_229    78.675 ENSG00000207606
##   [2] gr2__region_228,gr1__MACS_peak_229    78.675 ENSG00000143390
##   [3] gr2__region_228,gr1__MACS_peak_229    78.675 ENSG00000143376
##   [4] gr2__region_228,gr1__MACS_peak_229    78.675 ENSG00000143367
##   [5] gr2__region_229,gr1__MACS_peak_230    78.675 ENSG00000143393
##               feature.ranges feature.strand   feature.shift.ranges
##                    <IRanges>          <Rle>              <IRanges>
##   [1] [151518272, 151518367]              + [151594534, 151594629]
##   [2] [151313116, 151319833]              - [151595209, 151601927]
##   [3] [151584541, 151671567]              + [151500588, 151587615]
##   [4] [151512781, 151556059]              + [151595902, 151639180]
##   [5] [151264273, 151300191]              - [151594247, 151630165]
##       feature.shift.strand  distance insideFeature distanceToStart
##                      <Rle> <integer>      <factor>       <numeric>
##   [1]                    +      2733      upstream            2734
##   [2]                    +      3408      upstream            3409
##   [3]                    -      4084      upstream            4085
##   [4]                    +      4101      upstream            4102
##   [5]                    -        20      upstream              21
##         gene_name  DNAinteractive.ranges             DNAinteractive.blocks
##       <character>              <IRanges>                     <IRangesList>
##   [1]      MIR554 [151516086, 151603110]     [    1, 19082] [76263, 87025]
##   [2]        RFX5 [151309062, 151603110] [     1,  13633] [283287, 294049]
##   [3]       SNX27 [151546428, 151636526]     [    1,  6978] [72324, 90099]
##   [4]       TUFT1 [151546428, 151636526]     [    1,  6978] [72324, 90099]
##   [5]       PI4KB [151283079, 151636526] [     1,   5699] [335673, 353448]
##   -------
##   seqinfo: 1 sequence from an unspecified genome; no seqlengthsGiven two or more peak lists from different TFs, one may be interested in finding whether DNA binding profile of those TFs are correlated, and if correlated, what is the common binding pattern. The workflow here shows how to test the correlation of binding profiles of three TFs and how to discover the common binding pattern.
path <- system.file("extdata", package="ChIPpeakAnno")
files <- dir(path, "broadPeak")
data <- sapply(file.path(path, files), toGRanges, format="broadPeak")
names(data) <- gsub(".broadPeak", "", files)When we test the association between two sets of data based on hypergeometric distribution, the number of all potential binding sites is required. The parameter totalTest in the function makeVennDiagram indicates how many potential peaks in total will be used in the hypergeometric test. It should be larger than the largest number of peaks in the peak list. The smaller it is set, the more stringent the test is. The time used to calculate p-value does not depend on the value of the totalTest. For practical guidance on how to choose totalTest, please refer to the post. The following example makes an assumption that there are 3% of coding region plus promoter region. Because the sample data is only a subset of chromosome 2, we estimate that the total binding sites is 1/24 of possible binding region in the genome.
ol <- findOverlapsOfPeaks(data, connectedPeaks="keepAll")
averagePeakWidth <- mean(width(unlist(GRangesList(ol$peaklist))))
tot <- ceiling(3.3e+9 * .03 / averagePeakWidth / 24)
makeVennDiagram(ol, totalTest=tot, connectedPeaks="keepAll")## $p.value
##      TAF Tead4 YY1          pval
## [1,]   0     1   1  1.000000e+00
## [2,]   1     0   1 2.904297e-258
## [3,]   1     1   0  8.970986e-04
## 
## $vennCounts
##      TAF Tead4 YY1 Counts count.TAF count.Tead4 count.YY1
## [1,]   0     0   0    849         0           0         0
## [2,]   0     0   1    621         0           0       621
## [3,]   0     1   0   2097         0        2097         0
## [4,]   0     1   1    309         0         310       319
## [5,]   1     0   0     59        59           0         0
## [6,]   1     0   1    166       172           0       170
## [7,]   1     1   0      8         8           8         0
## [8,]   1     1   1    476       545         537       521
## attr(,"class")
## [1] "VennCounts"The above hypergeometric test requires users to input an estimate of the total potential binding sites for a given TF. To circumvent this requirement, we implemented a permutation test called peakPermTest. Before performing a permutation test, users need to generate random peaks using the distribution discovered from the input peaks for a given feature type (transcripts or exons), to make sure the binding positions relative to features, such as TSS and geneEnd, and the width of the random peaks follow the distribution of that of the input peaks.
Alternatively, a peak pool representing all potential binding sites can be created with associated binding probabilities for random peak sampling using preparePool. Here is an example to build a peak pool for human genome using the transcription factor binding site clusters (V3) (see ?wgEncodeTfbsV3) downloaded from ENCODE with the HOT spots (?HOT.spots) removed. HOT spots are the genomic regions with high probability of being bound by many TFs in ChIP-seq experiments7. We suggest remove those HOT spots from the peak lists before performing permutation test to avoid the overestimation of the association between the two input peak lists. Users can also choose to remove ENCODE blacklist for a given species. The blacklists were constructed by identifying consistently problematic regions over independent cell lines and types of experiments for each species in the ENCODE and modENCODE datasets8. Please note that some of the blacklists may need to be converted to the correct genome assembly using liftover utility.
Following are the sample codes to do the permutation test using permTest:
    data(HOT.spots)
    data(wgEncodeTfbsV3)
    hotGR <- reduce(unlist(HOT.spots))
    removeOl <- function(.ele){
        ol <- findOverlaps(.ele, hotGR)
        if(length(ol)>0) .ele <- .ele[-unique(queryHits(ol))]
        .ele
    }
    TAF <- removeOl(data[["TAF"]])
    TEAD4 <- removeOl(data[["Tead4"]])
    YY1 <- removeOl(data[["YY1"]])
    # we subset the pool to save demo time
    set.seed(1)
    wgEncodeTfbsV3.subset <- 
        wgEncodeTfbsV3[sample.int(length(wgEncodeTfbsV3), 2000)]
    pool <- new("permPool", grs=GRangesList(wgEncodeTfbsV3.subset), N=length(YY1))
    pt1 <- peakPermTest(YY1, TEAD4, pool=pool, seed=1, force.parallel=FALSE)
    plot(pt1)    pt2 <- peakPermTest(YY1, TAF, pool=pool, seed=1, force.parallel=FALSE)
    plot(pt2)The binding pattern around a genome feature could be visualized and compared using a side-by-side heatmap and density plot using the binding ranges of overlapping peaks.
features <- ol$peaklist[[length(ol$peaklist)]]
feature.recentered <- reCenterPeaks(features, width=4000)
## here we also suggest importData function in bioconductor trackViewer package 
## to import the coverage.
## compare rtracklayer, it will save you time when handle huge dataset.
library(rtracklayer)
files <- dir(path, "bigWig")
if(.Platform$OS.type != "windows"){
    cvglists <- sapply(file.path(path, files), import, 
                       format="BigWig", 
                       which=feature.recentered, 
                       as="RleList")
}else{## rtracklayer can not import bigWig files on Windows
    load(file.path(path, "cvglist.rds"))
}
names(cvglists) <- gsub(".bigWig", "", files)
feature.center <- reCenterPeaks(features, width=1)
sig <- featureAlignedSignal(cvglists, feature.center, 
                            upstream=2000, downstream=2000)
##Because the bw file is only a subset of the original file,
##the signals are not exists for every peak.
keep <- rowSums(sig[[2]]) > 0
sig <- sapply(sig, function(.ele) .ele[keep, ], simplify = FALSE)
feature.center <- feature.center[keep]
heatmap <- featureAlignedHeatmap(sig, feature.center, 
                                 upstream=2000, downstream=2000,
                                 upper.extreme=c(3,.5,4))sig.rowsums <- sapply(sig, rowSums, na.rm=TRUE)
d <- dist(sig.rowsums)
hc <- hclust(d)
feature.center$order <- hc$order
heatmap <- featureAlignedHeatmap(sig, feature.center, 
                                 upstream=2000, downstream=2000,
                                 upper.extreme=c(3,.5,4),
                                 sortBy="order")featureAlignedDistribution(sig, feature.center, 
                           upstream=2000, downstream=2000,
                           type="l")1. Zhu, L. J. et al. ChIPpeakAnno: A bioconductor package to annotate chIP-seq and chIP-chip data. BMC bioinformatics 11, 237 (2010).
2. Zhu, L. J. in Tiling arrays 105–124 (Springer, 2013).
3. Leung, M.-Y., Marsh, G. M. & Speed, T. P. Over-and underrepresentation of short dNA words in herpesvirus genomes. Journal of Computational Biology 3, 345–360 (1996).
4. Helden, J. van, Olmo, M. lí del & Pérez-Ortín, J. E. Statistical analysis of yeast genomic downstream sequences reveals putative polyadenylation signals. Nucleic Acids Research 28, 1000–1010 (2000).
5. Robertson, G. et al. Genome-wide profiles of sTAT1 dNA association using chromatin immunoprecipitation and massively parallel sequencing. Nature methods 4, 651–657 (2007).
6. Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. science 326, 289–293 (2009).
7. Yip, K. Y. et al. Classification of human genomic regions based on experimentally determined binding sites of more than 100 transcription-related factors. Genome Biol 13, R48 (2012).
8. Consortium, E. P. & others. An integrated encyclopedia of dNA elements in the human genome. Nature 489, 57–74 (2012).