Gene set analysis workflow with gVenn: identifying overlaps and shared genes

Supplementary data S2

Last updated: 11 February 2026

Note: Interactive notebook

This analysis workflow is available online at:
https://ckntav.github.io/gVenn_manuscript/supp_data_S2.html

1 Overview

This supplementary material demonstrates a complete gene set analysis workflow using the gVenn R package. The workflow includes computing overlaps between gene lists, visualizing results with Venn diagrams and UpSet plots, extracting overlap groups, and exporting results for downstream analysis.

2 Installation

The gVenn package can be installed from Bioconductor or GitHub:

# From Bioconductor
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("gVenn")

# From GitHub (development version)
# install.packages("pak")
# pak::pak("ckntav/gVenn")

3 Load required packages

library(gVenn)
library(knitr)

3.1 Checking gVenn version

packageVersion("gVenn")
[1] '1.1.1'

4 Example dataset

The gVenn package includes a synthetic gene list dataset (gene_list) comprising three sets of human gene symbols with designed overlaps. This dataset was generated from the first 250 gene symbols in the org.Hs.eg.db package using a reproducible random seed.

# Load the example gene list dataset
data(gene_list)

# Display the structure of the dataset
str(gene_list)
List of 3
 $ random_genes_A: chr [1:125] "ALPP" "ACTG1P9" "AHSG" "ASIC2" ...
 $ random_genes_B: chr [1:115] "AFM" "ADPRH" "AIF1" "ACVR2A" ...
 $ random_genes_C: chr [1:70] "ACTN1" "ALDOA" "CRYBG1" "AK4" ...

The dataset contains three gene lists with the following sizes:

# Create a summary table
summary_df <- data.frame(
  "gene list" = names(gene_list),
  "number of genes" = sapply(gene_list, length),
  check.names = FALSE
)

kable(summary_df, 
      caption = "Summary of example gene lists")
Summary of example gene lists

                 gene list        number of genes
random_genes_A   random_genes_A   125
random_genes_B   random_genes_B   115
random_genes_C   random_genes_C   70
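
The dataset description above mentions sampling from a pool of 250 gene symbols with a reproducible random seed. A minimal sketch of that idea follows, under stated assumptions: the pool is a set of placeholder names (symbol_pool, hypothetical), the seed value is arbitrary, and the lists are drawn independently, so the overlaps arise by chance rather than by design. This is not the procedure used to build gene_list.

```r
# Hypothetical stand-in for a pool of 250 gene symbols (placeholder names)
symbol_pool <- sprintf("GENE%03d", 1:250)

# Fixing the seed makes the random draws identical on every run
set.seed(42)  # arbitrary seed chosen for this sketch

# Draw three lists with the same sizes as the example dataset
toy_gene_list <- list(
  random_genes_A = sample(symbol_pool, 125),
  random_genes_B = sample(symbol_pool, 115),
  random_genes_C = sample(symbol_pool, 70)
)

# List sizes are fixed by construction; the overlaps are random
print(lengths(toy_gene_list))
```

Re-running the block with the same seed reproduces the same three lists, which is what makes a synthetic example dataset stable across sessions.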

5 Computing overlaps

The computeOverlaps() function analyzes the intersection patterns across all gene lists and returns a structured object containing:

  • A vector of all unique elements across sets
  • A logical matrix indicating set membership for each element
  • Category labels encoding the overlap pattern (e.g., “110”, “101”)

# Compute overlaps between gene lists
gene_overlaps <- computeOverlaps(gene_list)

# Display the structure of the result
class(gene_overlaps)
[1] "SetOverlapResult"
names(gene_overlaps)
[1] "unique_elements"    "overlap_matrix"     "intersect_category"
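
To make the three components concrete, the following base R sketch builds the same kind of information for a small toy input. It is an illustration only, not the implementation of computeOverlaps(); the names toy_sets, toy_unique, toy_matrix, and toy_category are hypothetical.

```r
# Toy input: three small sets (hypothetical, not the package dataset)
toy_sets <- list(
  set_A = c("g1", "g2", "g3", "g4"),
  set_B = c("g3", "g4", "g5"),
  set_C = c("g4", "g6")
)

# 1. All unique elements, in order of first appearance
toy_unique <- unique(unlist(toy_sets, use.names = FALSE))

# 2. Logical membership matrix: one row per element, one column per set
toy_matrix <- sapply(toy_sets, function(s) toy_unique %in% s)
rownames(toy_matrix) <- toy_unique

# 3. Category label per element: one digit per set, 1 = present, 0 = absent
toy_category <- apply(toy_matrix, 1,
                      function(r) paste(as.integer(r), collapse = ""))

print(toy_matrix)
print(toy_category)
```

In this toy input, g4 appears in all three sets and receives the label "111", while g5 appears only in set_B and receives "010".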

5.1 Overlap matrix

The overlap matrix shows which genes belong to which sets:

# Display the first 10 rows of the overlap matrix
kable(head(gene_overlaps$overlap_matrix, 10),
      caption = "First 10 rows of the overlap matrix (TRUE = gene present in set)")
First 10 rows of the overlap matrix (TRUE = gene present in set)

           random_genes_A   random_genes_B   random_genes_C
ALPP       TRUE             FALSE            FALSE
ACTG1P9    TRUE             FALSE            FALSE
AHSG       TRUE             FALSE            FALSE
ASIC2      TRUE             FALSE            FALSE
ACTG1P10   TRUE             FALSE            FALSE
ALAS1      TRUE             FALSE            FALSE
AKT2       TRUE             FALSE            FALSE
PARP1P1    TRUE             FALSE            FALSE
ABCD1      TRUE             FALSE            FALSE
SLC25A6    TRUE             FALSE            FALSE

5.2 Overlap categories

Each gene is assigned a category code representing its overlap pattern:

# Show the distribution of overlap categories
category_table <- table(gene_overlaps$intersect_category)
category_df <- data.frame(
  "Overlap pattern" = names(category_table),
  "Number of genes" = as.integer(category_table),
  check.names = FALSE
)

kable(category_df,
      caption = "Distribution of overlap patterns")
Distribution of overlap patterns

Overlap pattern   Number of genes
001               17
010               45
011               16
100               67
101               4
110               21
111               33

Interpretation of overlap patterns:

  • 100: Genes only in set A (random_genes_A)
  • 010: Genes only in set B (random_genes_B)
  • 001: Genes only in set C (random_genes_C)
  • 110: Genes in A ∩ B (not in C)
  • 101: Genes in A ∩ C (not in B)
  • 011: Genes in B ∩ C (not in A)
  • 111: Genes in A ∩ B ∩ C (all three sets)
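
The pattern codes can be checked against the list sizes reported earlier. The base R sketch below decodes each code into per-set flags (position 1 = A, 2 = B, 3 = C, as in the interpretation above) and sums the counts from the distribution table; the variable names are hypothetical.

```r
# Counts copied from the "Distribution of overlap patterns" table
pattern_counts <- c("001" = 17, "010" = 45, "011" = 16, "100" = 67,
                    "101" = 4, "110" = 21, "111" = 33)
set_names <- c("random_genes_A", "random_genes_B", "random_genes_C")

# Split each code into one logical flag per set (one row per pattern)
flags <- t(sapply(names(pattern_counts),
                  function(code) strsplit(code, "")[[1]] == "1"))
colnames(flags) <- set_names

# Size of each set = sum of counts over the patterns that include it
set_sizes <- colSums(flags * pattern_counts)
print(set_sizes)

# Total number of distinct genes across the three lists
print(sum(pattern_counts))
```

The recovered sizes are 125, 115, and 70, matching the lengths shown by str(gene_list), and the seven pattern counts sum to 203 distinct genes.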

6 Visualization

6.1 Venn diagram

The plotVenn() function creates area-proportional Venn diagrams to visualize the overlaps:

# Create a basic Venn diagram
plotVenn(gene_overlaps)
Figure 1: Basic Venn diagram showing overlaps between three gene lists

6.1.1 Customized Venn diagram

The appearance can be customized by adjusting colors, transparency, labels, and other parameters:

# Create a customized Venn diagram
plotVenn(gene_overlaps,
         fills = list(fill = c("#2B70AB", "#FFB027", "#3EA742"), alpha = 0.6),
         edges = list(col = "gray30", lwd = 1.5),
         labels = list(col = "black", fontsize = 12, font = 2),
         quantities = list(type = c("counts", "percent"), 
                          col = "black", fontsize = 10),
         main = list(label = "gene set overlaps", 
                    fontsize = 14, font = 2, col = "navy"),
         legend = list(side = "right", fontsize = 10))
Figure 2: Customized Venn diagram with adjusted colors and transparency

6.2 UpSet plot

When more than three sets are compared, UpSet plots are usually easier to read than Venn diagrams. The three-set example is reused here for illustration:

# Create an UpSet plot
plotUpSet(gene_overlaps)
Figure 3: UpSet plot showing intersection sizes between gene lists
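
One reason Venn diagrams become hard to read as sets are added is that the number of possible non-empty overlap patterns grows as 2^n - 1. The base R sketch below tabulates that growth and enumerates the seven patterns for three sets; it is a standalone illustration and does not call gVenn.

```r
# Number of non-empty overlap patterns for n sets is 2^n - 1
n_sets <- 2:6
pattern_growth <- data.frame(sets = n_sets, patterns = 2^n_sets - 1)
print(pattern_growth)

# Enumerate the patterns for three sets as binary codes
grid <- expand.grid(A = 0:1, B = 0:1, C = 0:1)
grid <- grid[rowSums(grid) > 0, ]   # drop the empty pattern "000"
codes <- sort(apply(grid, 1, paste, collapse = ""))
print(unname(codes))
```

For three sets this yields the same seven codes listed in the overlap pattern table; six sets would already allow up to 63 patterns.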

6.2.1 Customized UpSet plot

The UpSet plot can also be customized with colors:

# Create a customized UpSet plot with colored dots
plotUpSet(gene_overlaps, 
          comb_col = c("#2B70AB", "#FFB027", "#3EA742", "#CD3301", 
                      "#9370DB", "#008B8B", "#D87093"))
Figure 4: Customized UpSet plot with colored combination matrix

7 Extracting overlap groups

The extractOverlaps() function separates genes into distinct groups based on their overlap patterns:

# Extract genes grouped by overlap pattern
gene_groups <- extractOverlaps(gene_overlaps)

# Display the number of genes per group
group_sizes <- sapply(gene_groups, length)
group_df <- data.frame(
  "Overlap group" = names(group_sizes),
  "Number of genes" = as.integer(group_sizes),
  check.names = FALSE
)

kable(group_df,
      caption = "Number of genes in each overlap group")
Number of genes in each overlap group

Overlap group   Number of genes
group_001       17
group_010       45
group_100       67
group_011       16
group_101       4
group_110       21
group_111       33
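
Conceptually, this grouping step amounts to splitting the element vector by its category label. A minimal base R sketch with toy data follows; it uses split() as a stand-in and is not the implementation of extractOverlaps(). The toy_ names are hypothetical, and the "group_" prefix simply mirrors the naming seen in the table above.

```r
# Toy elements and their overlap categories (hypothetical values)
toy_elements <- c("g1", "g2", "g3", "g4", "g5", "g6")
toy_category <- c("100", "100", "110", "111", "010", "001")

# Split elements into one character vector per category
toy_groups <- split(toy_elements, toy_category)

# Prefix the names to mirror the "group_<pattern>" convention
names(toy_groups) <- paste0("group_", names(toy_groups))

print(toy_groups)
```

Note that split() orders groups alphabetically by code, which differs from the group order in the table above.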

7.1 Examining specific groups

Individual overlap groups can be accessed for downstream analysis:

7.1.1 Extract genes present in all three sets (group_111)

# Extract genes present in all three sets (group_111)
genes_in_all_three <- gene_groups[["group_111"]]

cat("Genes present in all three sets (A ∩ B ∩ C):\n")
Genes present in all three sets (A ∩ B ∩ C):
print(genes_in_all_three)
 [1] "ACP2"      "ALDH3A1"   "ACTB"      "ACACA"     "ASIC1"     "SLC25A5"  
 [7] "ACTL6A"    "AMY2B"     "AMH"       "AMPH"      "ADK"       "ALDH3A2"  
[13] "ACTG1P3"   "ACO1"      "ACTG1P7"   "ALPI"      "ANXA4"     "AGL"      
[19] "ADRB2"     "ABCF1"     "ABO"       "AMD1"      "ALS3"      "ALOX12"   
[25] "AMBP"      "AMPD2"     "ALDH1A1"   "AFG3L1P"   "ADFN"      "ADCYAP1R1"
[31] "ADD3"      "ALOX12P2"  "BIN1"     

7.1.2 Extract genes unique to random_genes_A (group_100)

# Extract genes unique to random_genes_A (group_100)
genes_unique_to_A <- gene_groups[["group_100"]]

cat("Genes unique to random_genes_A:\n")
Genes unique to random_genes_A:
print(genes_unique_to_A)
 [1] "ALPP"     "ACTG1P9"  "AHSG"     "ASIC2"    "ACTG1P10" "ALAS1"   
 [7] "AKT2"     "PARP1P1"  "ABCD1"    "SLC25A6"  "AAMP"     "ADCP1"   
[13] "ACADVL"   "ACTG1"    "ANGPT2"   "AGTR1"    "ACACB"    "ACTBP9"  
[19] "ALDH1B1"  "ADAR"     "ABCD2"    "AMHR2"    "ABCB7"    "ABCA1"   
[25] "PARP4"    "ACTG1P1"  "JAG1"     "ACTA2"    "ADH7"     "AP1B1"   
[31] "ACVR1"    "ACTN4"    "A2MP1"    "ABCA4"    "ALAD"     "ADRA1A"  
[37] "ADCY5"    "ALDOB"    "AP2B1"    "AMELY"    "ABL1"     "ACTC1"   
[43] "AK2"      "ALOX12B"  "ACTN3"    "AIC"      "ALB"      "NATP"    
[49] "ANG"      "AHR"      "ABCA2"    "ALPL"     "ANXA2P1"  "AMELX"   
[55] "AHCY"     "PARP1P2"  "ALOX5"    "AMPD1"    "AFA"      "ACADSB"  
[61] "AIH3"     "ACAN"     "AGA"      "AMY1C"    "ADSS2"    "ALDH2"   
[67] "ALOX15B" 
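
Because the group names encode the overlap pattern, groups can also be combined by pattern for downstream questions. The sketch below works on a toy list shaped like the output above (the toy_groups object and its contents are hypothetical) and selects genes shared by A and B regardless of C, as well as genes found in exactly one set.

```r
# Toy list shaped like the extracted groups (hypothetical contents)
toy_groups <- list(
  group_100 = c("g1", "g2"),
  group_110 = "g3",
  group_111 = "g4",
  group_010 = "g5",
  group_001 = "g6"
)

# Genes in both A and B, whatever their status in C: codes starting with "11"
shared_AB <- unlist(toy_groups[grepl("^group_11", names(toy_groups))],
                    use.names = FALSE)

# Genes found in exactly one set: codes containing a single "1"
codes <- sub("^group_", "", names(toy_groups))
n_sets_hit <- nchar(gsub("0", "", codes))
exclusive <- unlist(toy_groups[n_sets_hit == 1], use.names = FALSE)

print(shared_AB)
print(exclusive)
```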

8 Exporting results

8.1 Export to Excel

The exportOverlaps() function exports each overlap group to a separate sheet in an Excel file:

# Export overlap groups to Excel
exportOverlaps(gene_groups,
               output_dir = "results",
               output_file = "gene_overlap_groups",
               with_date = TRUE,
               verbose = TRUE)

This creates an Excel file with one sheet per overlap group, making it easy to:

  • Review genes in each category
  • Perform functional enrichment analysis
  • Share results with collaborators
  • Import into other analysis tools
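
When a plain-text format is more convenient for other tools, the groups can also be written out with base R. The sketch below is an alternative to, not a description of, exportOverlaps(): it writes one CSV file per group into a temporary directory. The toy_groups object, the out_dir location, and the "gene" column name are all hypothetical.

```r
# Toy groups standing in for the extracted overlap groups (hypothetical)
toy_groups <- list(
  group_100 = c("g1", "g2"),
  group_111 = "g4"
)

# Write into a temporary directory so the sketch leaves no files behind
out_dir <- file.path(tempdir(), "results_sketch")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# One CSV per group, with a single "gene" column
for (grp in names(toy_groups)) {
  write.csv(data.frame(gene = toy_groups[[grp]]),
            file = file.path(out_dir, paste0(grp, ".csv")),
            row.names = FALSE)
}

print(list.files(out_dir))
```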

8.2 Saving visualizations

Visualizations can be exported in multiple formats (PDF, PNG, SVG):

# Create a Venn diagram
venn_plot <- plotVenn(gene_overlaps)

# Save as PDF
saveViz(venn_plot,
        output_dir = "figures",
        output_file = "gene_venn_diagram",
        format = "pdf",
        width = 6,
        height = 4)

# Save as high-resolution PNG
saveViz(venn_plot,
        output_dir = "figures",
        output_file = "gene_venn_diagram",
        format = "png",
        width = 6,
        height = 4,
        resolution = 300)

# Save with transparent background for presentations
saveViz(venn_plot,
        output_dir = "figures",
        output_file = "gene_venn_diagram_transparent",
        format = "png",
        bg = "transparent")
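
For raster output, it can help to know the resulting pixel dimensions. The sketch below assumes that width and height are given in inches and resolution in dots per inch, as with grDevices::png(units = "in", res = ...); the units are not stated in this workflow, so the saveViz() help page should be checked before relying on this.

```r
# Assumption: width/height are inches and resolution is dots per inch
width_in <- 6
height_in <- 4
resolution_dpi <- 300

# Pixel dimensions implied by the PNG call above under that assumption
pixel_dims <- c(width = width_in, height = height_in) * resolution_dpi
print(pixel_dims)
```

Under that assumption, the high-resolution PNG above would measure 1800 by 1200 pixels.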

9 References

For more information about the gVenn package, visit:

10 Session information

sessionInfo()
R version 4.5.2 (2025-10-31)
Platform: aarch64-apple-darwin20
Running under: macOS Sequoia 15.7.3

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1

locale:
[1] fr_CA.UTF-8/fr_CA.UTF-8/fr_CA.UTF-8/C/fr_CA.UTF-8/fr_CA.UTF-8

time zone: America/Toronto
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] knitr_1.51  gVenn_1.1.1

loaded via a namespace (and not attached):
 [1] ComplexHeatmap_2.26.1 jsonlite_2.0.0        compiler_4.5.2       
 [4] rjson_0.2.23          crayon_1.5.3          Rcpp_1.1.1           
 [7] stringr_1.6.0         magick_2.9.0          parallel_4.5.2       
[10] cluster_2.1.8.2       IRanges_2.44.0        png_0.1-8            
[13] yaml_2.3.12           fastmap_1.2.0         generics_0.1.4       
[16] shape_1.4.6.1         Cairo_1.7-0           BiocGenerics_0.56.0  
[19] iterators_1.0.14      GetoptLong_1.1.0      htmlwidgets_1.6.4    
[22] polyclip_1.10-7       circlize_0.4.17       lubridate_1.9.5      
[25] RColorBrewer_1.1-3    polylabelr_1.0.0      rlang_1.1.7          
[28] stringi_1.8.7         xfun_0.56             GlobalOptions_0.1.3  
[31] otel_0.2.0            doParallel_1.0.17     timechange_0.4.0     
[34] cli_3.6.5             magrittr_2.0.4        digest_0.6.39        
[37] foreach_1.5.2         grid_4.5.2            lifecycle_1.0.5      
[40] clue_0.3-67           eulerr_7.0.4          vctrs_0.7.1          
[43] S4Vectors_0.48.0      glue_1.8.0            evaluate_1.0.5       
[46] codetools_0.2-20      stats4_4.5.2          colorspace_2.1-2     
[49] rmarkdown_2.30        matrixStats_1.5.0     tools_4.5.2          
[52] htmltools_0.5.9