---
title: "gVenn: Proportional Venn diagrams for genomic regions and gene set overlaps"
author: "Christophe Tav"
date: "`r format(Sys.time(), '%B %Y')`"
output:
  html_document:
    toc: true
    toc_depth: 3
    number_sections: false
vignette: >
  %\VignetteIndexEntry{gVenn}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
knitr:
  opts_chunk:
    fig.bg: "white"
---

```{r, echo=FALSE, out.width="20%", fig.align="center"}
knitr::include_graphics("figures/20250827_hex_gVenn_v1.png")
```

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
suppressWarnings(library(GenomicRanges))
```

# Introduction

**gVenn** stands for **gene/genomic Venn**. It provides tools to compute overlaps between genomic regions or sets of genes and visualize them as **Venn** diagrams with areas proportional to the number of overlapping elements. In addition, the package can generate **UpSet** plots for cases with many sets, offering a clear alternative to complex Venn diagrams.

With seamless support for `GRanges` and `GRangesList` objects, **gVenn** integrates naturally into Bioconductor workflows such as ChIP-seq, ATAC-seq, or other interval-based analyses.

Overlap groups can be easily extracted for further analysis, such as motif enrichment, transcription factor binding enrichment, or gene annotation.

The **gVenn** package produces clean, publication-ready figures.

```{r, echo=FALSE, out.width="100%", fig.align="center"}
knitr::include_graphics("figures/20250827_graphical_abstract_v2.png")
```

# Installation

The gVenn package is available through Bioconductor and GitHub.

You can install it from Bioconductor using:

```{r, install-bioconductor, eval=FALSE}
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("gVenn")
```

To install the development version from GitHub, use:

```{r, install-github, eval=FALSE}
# install.packages("pak")  # if not already installed
pak::pak("ckntav/gVenn")

# or, alternatively:
# install.packages("devtools")  # if not already installed
devtools::install_github("ckntav/gVenn")
```

# Example workflow

This section demonstrates a typical workflow with gVenn, from computing overlaps to generating clean, publication-ready figures. The examples show how to work with genomic interval data.

We start by loading the package:

```{r setup}
library(gVenn)
```

## 1. Load example ChIP-seq peak sets (genomic)

We use the dataset **`a549_chipseq_peaks`**, which contains example consensus peak subsets for **MED1**, **BRD4**, and **GR** after dexamethasone treatment in A549 cells. To keep the dataset small and suitable for examples and tests, each set has been restricted to peaks located on *chromosome 7*.

These data originate from Tav *et al.* (2023) (doi:10.3389/fgene.2023.1237092).

```{r, load_chip_dataset}
# Load the example A549 ChIP-seq peaks (subset on chr7 for demo)
data(a549_chipseq_peaks)
```

## 2. Compute overlaps between genomic regions

We compute overlaps between the ChIP-seq peak sets using `computeOverlaps()`:

```{r compute_overlaps}
genomic_overlaps <- computeOverlaps(a549_chipseq_peaks)
```

The result is a structured `GenomicOverlapResult` object that contains:

- A GRanges object, where each region includes metadata describing its overlap pattern across the input sets.
- An associated logical matrix (or data frame) indicating which reduced regions overlap with which input sets.

## 3. Visualization

### Venn diagram

`plotVenn()` draws proportional Venn diagrams from the overlap object.

```{r, plot_venn, fig.width=5, fig.height=3, fig.align='center'}
plotVenn(genomic_overlaps)
```

### UpSet plot

For more than **three sets**, a Venn diagram with **areas exactly proportional** to all intersections is **generally not mathematically attainable**. Solvers (like those used by `eulerr`) provide **best-effort approximations**, but the layout can become hard to read. In these cases, an **UpSet plot** is the recommended visualization because it scales cleanly to many sets and preserves intersection sizes precisely on bar axes.

We therefore suggest using `plotUpSet()` when you have **> 3 sets** (or any time the Venn becomes visually crowded).

```{r, plot_upset, fig.width=5, fig.height=3, fig.align='center'}
plotUpSet(genomic_overlaps)
```

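
The rule of thumb above can be written as a small helper. This is only a sketch: `plotOverlapsAuto()` and its `max_venn_sets` argument are hypothetical names that are not part of gVenn, and the code assumes the input is a list-like object (for example a `GRangesList`) whose length equals the number of sets.

```{r, choose_plot_sketch, eval=FALSE}
library(gVenn)
data(a549_chipseq_peaks)

# Hypothetical helper (not part of gVenn): choose the plot type from the
# number of input sets. Assumes length(sets) is the number of sets.
plotOverlapsAuto <- function(sets, max_venn_sets = 3) {
  overlaps <- computeOverlaps(sets)
  if (length(sets) > max_venn_sets) {
    plotUpSet(overlaps)   # many sets: exact intersection sizes on bars
  } else {
    plotVenn(overlaps)    # few sets: proportional Venn diagram
  }
}

plotOverlapsAuto(a549_chipseq_peaks)
```

The threshold is kept as an argument because a crowded three-set Venn diagram can also read better as an UpSet plot.
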
### Export visualization

You can export any visualization using `saveViz()`:

```{r save_plot, eval=FALSE}
venn <- plotVenn(genomic_overlaps)
saveViz(venn,
        output_dir = ".",
        output_file = "figure_gVenn",
        format = "pdf")
```

By default, files are written to the current directory (`"."`). If the date option is enabled, today's date is prepended to the filename.

You can also export to PNG or SVG:

```{r save_plot2, eval=FALSE}
saveViz(venn,
        output_dir = ".",
        output_file = "figure_gVenn",
        format = "png")

saveViz(venn,
        output_dir = ".",
        output_file = "figure_gVenn",
        format = "svg")
```

## 4. Extract elements per overlap group

`extractOverlaps()` returns the elements of each overlap group:

```{r, extractOverlaps_example1}
groups <- extractOverlaps(genomic_overlaps)
```

```{r, extractOverlaps_example2}
# Display the number of genomic regions per overlap group
sapply(groups, length)
```

In this example:

- 243 peaks are shared across all three factors (MED1, BRD4, and GR)
- 267 peaks are unique to BRD4
- 48 peaks are shared between MED1 and BRD4 only

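
The per-group counts can also be ranked or turned into percentages. The chunk below is a minimal sketch that reuses `groups` from the chunk above and assumes it is a list-like object with one element per overlap group; `group_sizes` is an illustrative variable name.

```{r, group_sizes_sketch, eval=FALSE}
# Reuses `groups` created by extractOverlaps() above
# (assumed to be list-like, one element per overlap group).
group_sizes <- lengths(groups)

# Largest overlap groups first
sort(group_sizes, decreasing = TRUE)

# Share of all regions that falls in each group, in percent
round(100 * group_sizes / sum(group_sizes), 1)
```
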
#### Overlap group naming

When overlaps are computed, each group of elements or genomic regions is labeled with a binary code that indicates which sets the element belongs to.

- Each digit in the code corresponds to one input set (e.g., A, B, C).
- A 1 means the element is present in that set, while 0 means absent.
- The group names in the output are prefixed with "group_" for clarity.

| Group name   | Meaning                       |
|--------------|-------------------------------|
| `group_100`  | Elements only in **A**        |
| `group_010`  | Elements only in **B**        |
| `group_001`  | Elements only in **C**        |
| `group_110`  | Elements in **A ∩ B** (not C) |
| `group_101`  | Elements in **A ∩ C** (not B) |
| `group_011`  | Elements in **B ∩ C** (not A) |
| `group_111`  | Elements in **A ∩ B ∩ C**     |
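
To make the naming scheme concrete, the following base R sketch builds the same kind of binary codes for three toy gene sets. It illustrates the convention only and is not how gVenn computes overlaps internally; `toy_sets`, `toy_groups`, and `decodeGroup()` are hypothetical names introduced for this example.

```{r, group_naming_sketch}
# Toy gene sets (illustrative symbols only)
toy_sets <- list(
  A = c("TP53", "MYC", "EGFR", "BRCA1"),
  B = c("MYC", "EGFR", "KRAS"),
  C = c("EGFR", "PTEN")
)

# Logical membership matrix: one row per gene, one column per set
all_genes <- sort(unique(unlist(toy_sets)))
membership <- vapply(toy_sets,
                     function(s) all_genes %in% s,
                     logical(length(all_genes)))
rownames(membership) <- all_genes

# Collapse each row into a code such as "group_110"
codes <- apply(membership, 1, function(x) {
  paste0("group_", paste(as.integer(x), collapse = ""))
})

# Split the genes by code: one element per exclusive overlap group
toy_groups <- split(all_genes, codes)
toy_groups[["group_100"]]  # "BRCA1" "TP53" (present in A only)

# Hypothetical helper: translate a group name back into set names
decodeGroup <- function(group_name, set_names) {
  bits <- strsplit(sub("^group_", "", group_name), "")[[1]]
  set_names[bits == "1"]
}
decodeGroup("group_110", names(toy_sets))  # "A" "B"
```

Because each element receives exactly one code, the groups are mutually exclusive: `group_110` holds elements in A and B that are not in C.
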

#### Extract one particular group

Each overlap group can be accessed directly by name for downstream analyses, including motif enrichment, transcription factor (TF) enrichment, annotation of peaks to nearby genes, functional enrichment or visualization.

For example, to extract all elements that are present in **A ∩ B ∩ C**:

```{r, extractOverlaps_example3}
# Extract elements in group_111 (present in all three sets: MED1, BRD4, and GR)
peaks_in_all_sets <- groups[["group_111"]]

# Display the elements
peaks_in_all_sets
```

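
As a sketch of a first downstream step, the extracted group can be summarized and written to a BED file for external tools. This assumes `peaks_in_all_sets` is a `GRanges` object, as in the ChIP-seq example, and that the `rtracklayer` package is installed; the output filename is illustrative.

```{r, downstream_sketch, eval=FALSE}
library(GenomicRanges)

# Assumes `peaks_in_all_sets` (from the chunk above) is a GRanges object.

# Distribution of peak widths, in base pairs
summary(width(peaks_in_all_sets))

# Number of peaks per chromosome
table(as.character(seqnames(peaks_in_all_sets)))

# Write the regions to a BED file (illustrative filename)
rtracklayer::export(peaks_in_all_sets, "peaks_in_all_sets.bed", format = "BED")
```
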
#### Exporting overlap groups

Each overlap group (e.g., `group_100`, `group_110`, `group_111`) can be exported for downstream analysis. The function `exportOverlaps()` writes each group to an Excel file, which makes it easy to reuse the results outside of R.

```{r, exportOverlaps, eval=FALSE}
# export overlaps
exportOverlaps(groups,
               output_dir = ".",
               output_file = "overlap_groups")
```

# Customization examples

This section shows common ways to customize the Venn diagram produced by `plotVenn()`. All examples use the built-in `gene_list` dataset.

```{r venn-custom-default, fig.width=6, fig.height=4, fig.align="center"}
# load the example gene_list
data(gene_list)

# compute overlaps between gene sets
res_sets <- computeOverlaps(gene_list)

# basic default venn plot (uses package defaults)
plotVenn(res_sets)
```

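
The same steps can be applied to your own gene sets. The sketch below assumes that `computeOverlaps()` accepts a named list of character vectors, that is, the same structure as `gene_list` (check with `str(gene_list)` before relying on this). The set names and gene symbols are placeholders.

```{r, custom_gene_sets_sketch, eval=FALSE}
library(gVenn)

# Placeholder gene sets; names and symbols are illustrative only
my_sets <- list(
  treated  = c("TP53", "MYC", "EGFR", "BRCA1"),
  control  = c("MYC", "EGFR", "KRAS"),
  knockout = c("EGFR", "PTEN")
)

# Assumption: a named list of character vectors is a valid input,
# mirroring the structure of the built-in `gene_list` dataset.
my_res <- computeOverlaps(my_sets)
plotVenn(my_res)
```
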
#### Custom fills with transparency

```{r venn-custom-fills, fig.width=6, fig.height=4, fig.align="center"}
plotVenn(res_sets,
         fills = list(fill = c("#FF6B6B", "#4ECDC4", "#45B7D1"), alpha = 0.5),
         legend = "right",
         main = list(label = "Custom fills (transparent)", fontsize = 14))
```

#### Colored edges, no fills (colored borders only)

```{r venn-transparent-fills, fig.width=6, fig.height=4, fig.align="center"}
plotVenn(res_sets,
         fills = "transparent",
         edges = list(col = c("red", "blue", "darkgreen"), lwd = 2),
         main = list(label = "Colored borders only"))
```

#### Custom labels and counts + percentages

```{r venn-labels-quantities, fig.width=6, fig.height=4, fig.align="center"}
plotVenn(res_sets,
         labels = list(col = "black", fontsize = 12, font = 2),
         quantities = list(type = c("counts", "percent"),
                           col = "black", fontsize = 10),
         main = list(label = "Counts + Percentages", fontsize = 14))
```

#### Legend at the bottom with custom text

```{r venn-legend-bottom, fig.width=6, fig.height=4, fig.align="center"}
plotVenn(res_sets,
         legend = list(side = "bottom",
                       labels = c("Treatment A", "Treatment B", "Control"),
                       fontsize = 10),
         main = list(label = "Custom legend"))
```

#### Combining multiple custom options

```{r venn-multiple-custom, fig.width=6, fig.height=4, fig.align="center"}
plotVenn(res_sets,
         fills = list(fill = c("#2B70AB", "#FFB027", "#3EA742"), alpha = 0.6),
         edges = list(col = "gray30", lwd = 1.5),
         labels = list(col = "black", fontsize = 7, font = 2),
         quantities = list(type = "counts", col = "black", fontsize = 10),
         main = list(label = "multiple custom options Venn", fontsize = 16, font = 2),
         legend = FALSE)
```

# Session info

This vignette was built with the following R session:

```{r session-info}
sessionInfo()
```

# References

### Example A549 ChIP-seq dataset

- Tav, C., Fournier, É., Fournier, M., Khadangi, F., Baguette, A., Côté, M.C., Silveira, M.A.D., Bérubé-Simard, F.-A., Bourque, G., Droit, A., & Bilodeau, S. (2023). *Glucocorticoid stimulation induces regionalized gene responses within topologically associating domains.* **Frontiers in Genetics**, 14, 1237092. doi:10.3389/fgene.2023.1237092

### Supporting packages

- **eulerr**: Larsson, J. (2023). *eulerr: Area-Proportional Euler and Venn Diagrams with Ellipses.* CRAN package page
- **ComplexHeatmap**: Gu, Z., Eils, R., & Schlesner, M. (2016). *Complex heatmaps reveal patterns and correlations in multidimensional genomic data.* **Bioinformatics**, 32(18), 2847–2849. doi:10.1093/bioinformatics/btw313