---
title: "Four Dimensional High Throughput GoMiner (HTGM4D)"
author: Barry Zeeberg [aut, cre]
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Four Dimensional High Throughput GoMiner (HTGM4D)}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
Four Dimensional High Throughput GoMiner (HTGM4D)
Barry Zeeberg
barryz2013@gmail.com
Motivation
'Four Dimensional High Throughput GoMiner (HTGM4D)' is the final package of the seven CRAN packages that together comprise the GoMiner suite. The other six are 'minimalistGODB', 'randomGODB', 'GoMiner', 'High Throughput GoMiner (HTGM)', 'Two Dimensional High Throughput GoMiner (HTGM2D)', and 'Three Dimensional High Throughput GoMiner (HTGM3D)'. HTGM4D is an extension of HTGM2D that is intended to give the user an enhanced context for interpreting the basic HTGM2D results.
The Gene Ontology (GO) Consortium organizes genes into hierarchical categories based on biological process (BP), molecular function (MF), and cellular component (CC, i.e., subcellular localization). Tools such as GoMiner (see Zeeberg, B.R., Feng, W., Wang, G. et al. (2003)) can leverage GO to perform ontological analysis of microarray and proteomics studies, typically generating a list of significant functional categories. Microarray studies are usually analyzed with BP, whereas proteomics researchers often prefer CC.
To capture the benefit of both of those ontologies simultaneously, I developed a two-dimensional version of High-Throughput GoMiner (HTGM2D). It generates a 2D heat map whose axes are any two of BP, MF, or CC. Each picture element of the heat map corresponds to a pair of categories, one from each axis, and its value reflects the Jaccard metric p-value for the number of genes that the two categories have in common.
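As a rough illustration of the overlap measure behind each picture element, the sketch below computes a plain Jaccard index for every BP/CC category pair. The category names and gene sets are hypothetical toy data, not taken from GOGOA3, and the helper `jaccard()` is not part of the package. The step that converts the index into a p-value is not shown here.

```r
# Toy category-to-gene mappings (hypothetical, for illustration only)
bp <- list(
  cell_adhesion  = c("CDH1", "ITGB1", "VCL", "TLN1"),
  cell_migration = c("ITGB1", "RAC1", "VCL")
)
cc <- list(
  focal_adhesion = c("ITGB1", "VCL", "TLN1", "PXN"),
  nucleus        = c("TP53", "RAC1")
)

# Jaccard index: size of the intersection divided by size of the union
jaccard <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

# Fill a BP x CC matrix, one cell per category pair
jmat <- outer(seq_along(bp), seq_along(cc),
  Vectorize(function(i, j) jaccard(bp[[i]], cc[[j]])))
dimnames(jmat) <- list(names(bp), names(cc))
print(round(jmat, 2))
```

In this toy case, cell_adhesion and focal_adhesion share 3 of 5 distinct genes, so their cell holds 0.6.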
The HTGM2D heatmap has only 2 axes, so the identities of the genes are unfortunately 'integrated out of the equation.' Because the graphic for the heatmap is implemented in Scalable Vector Graphics (SVG) technology, it is relatively easy to hyperlink each picture element to the relevant list of genes. By clicking on the desired picture element, the user can recover the 'lost' genes.
To complement the hyperlink approach, HTGM4D provides enhanced gene information by aligning the corresponding pair of standard GoMiner heatmaps along the axes of the HTGM2D heatmap. This does not show the genes that are in the HTGM2D heatmap, but rather the significant genes that were found in the standard GoMiner analyses, while restricting (and visually matching up) the significant categories to those found in the HTGM2D analysis. If a category in the HTGM2D analysis was not found in the GoMiner analysis, a blank row or column is added to the GoMiner heatmap in its place.
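The alignment idea can be sketched as follows, under the assumption that a GoMiner result is available as a category-by-gene matrix. The matrix, the category order, and the use of 0 for blank cells are all hypothetical choices for illustration. The package's internal representation may differ.

```r
# Hypothetical GoMiner result: rows are significant categories, columns are genes
gm <- matrix(c(1, 0, 1,
               0, 1, 1),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("cell adhesion", "cell migration"),
                  c("CDH1", "ITGB1", "VCL")))

# Category order along one axis of a hypothetical HTGM2D heatmap
axisCats <- c("cell migration", "focal adhesion assembly", "cell adhesion")

# Start from an all-blank matrix in the HTGM2D order, then copy the rows
# that also appear in the GoMiner result; unmatched rows stay blank
aligned <- matrix(0, nrow = length(axisCats), ncol = ncol(gm),
  dimnames = list(axisCats, colnames(gm)))
shared <- intersect(axisCats, rownames(gm))
aligned[shared, ] <- gm[shared, , drop = FALSE]
print(aligned)
```

The row for "focal adhesion assembly" stays blank because that category has no counterpart in the toy GoMiner matrix, while the other two rows are reordered to match the HTGM2D axis.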
Results
The list of genes that I will use here for proof of concept, referred to as 'cluster52', was derived from a published analysis of a large set of cancer cell lines (Zeeberg, B.R., Kohn, K.W., Kahn, A., Larionov, V., Weinstein, J.N., Reinhold, W., Pommier, Y. (2012)). It was subsequently the subject of intensive research (Kohn, K.W., Zeeberg, B.R, Reinhold, W.C., Sunshine, M., Luna, A., Pommier, Y. (2012)) because the genes were associated with categories like cell adhesion, which are key to understanding metastasis. The gene list is available in data/cluster52.RData. The GO database *GOGOA3* can be obtained from my package 'minimalistGODB' or downloaded from https://github.com/barryzee/GO.
The HTGM4D study was invoked by:
```
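library(HTGM4D)
# GOGOA3 database; this path is specific to the author's machine
load("/Users/barryzeeberg/personal/GODB_RDATA/goa_human/GOGOA3_goa_human.RData")
geneList <- cluster52
# the two ontologies that form the axes of the HTGM2D heatmap
ontologies <- c("biological_process", "cellular_component")
dir <- tempdir()
odir <- HTGM4Ddriver(dir, geneList, ontologies, GOGOA3, enrichThresh = 2,
  countThresh = 5, pvalThresh = 0.10, fdrThresh = 0.10, nrand = 100,
  mn = 2, mx = 2000)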
```
This differs slightly from how the study was invoked in the vignette for HTGM2D. The current call uses two parameters that were not available at the time of the HTGM2D study: *mn* and *mx*. They restrict the categories permitted in the study according to the number of genes that map to each category within the GOGOA3 database. Thus, the present study did not use all of the GOGOA3 categories that had been used in the HTGM2D study.
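A minimal sketch of this kind of size filter is shown below. The table `godb` and its column names are hypothetical stand-ins rather than the actual GOGOA3 structure, and treating both bounds as inclusive is an assumption.

```r
# Hypothetical category-to-gene table; column names are illustrative only
godb <- data.frame(
  category = c("A", "A", "A", "B", "C", "C"),
  gene     = c("g1", "g2", "g3", "g1", "g2", "g4"),
  stringsAsFactors = FALSE
)
mn <- 2
mx <- 2

# Number of distinct genes that map to each category
sizes <- tapply(godb$gene, godb$category, function(g) length(unique(g)))

# Keep categories whose size lies in [mn, mx] (inclusive bounds are an assumption)
keep <- names(sizes)[sizes >= mn & sizes <= mx]
filtered <- godb[godb$category %in% keep, ]
print(keep)
```

With these toy values, category A (3 genes) is too large and category B (1 gene) is too small, so only category C is retained.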
Figure 1 shows the HTGM4D arrangement for svg graphics. The color scale represents the false discovery rate (FDR) of the category, with bright red corresponding to the most significant FDR close to 0.00, and the light background color corresponding to the fdrThresh of 0.10.
[Figure 1: HTGM4D arrangement, svg graphics]
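One way to picture the color scale is as a ramp from bright red at an FDR of 0.00 to a light background at fdrThresh. The sketch below uses base R's `colorRampPalette()`. The endpoint colors, the 100-step resolution, and the example FDR values are illustrative assumptions, not the palette the package actually uses.

```r
fdrThresh <- 0.10
fdr <- c(0.00, 0.02, 0.05, 0.10)

# 100-step ramp from bright red (FDR near 0) to a light color (FDR at threshold)
# Both endpoint colors are assumptions chosen for illustration
ramp <- colorRampPalette(c("red", "lightyellow"))(100)

# Scale each FDR to [0, 1], clamping values above the threshold,
# then map it to a ramp index in 1..100
frac <- pmin(fdr, fdrThresh) / fdrThresh
idx <- 1 + round(99 * frac)
cols <- ramp[idx]
print(data.frame(fdr, cols))
```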
Figure 2 shows the same arrangement for svg graphics, but with the labels showing for the HTGM2D portion. This is too crowded for normal viewing, and serves only to validate that the correct order of labels is present in the standard GoMiner graphics.
[Figure 2: HTGM4D arrangement, svg graphics, with HTGM2D labels shown]
Figure 3 shows the same arrangement as in Figure 1, but this time for png images. The reason for including the png version is that initially it seemed impossible to generate a stable arrangement using svg, so png was used as a substitute. Fortunately, I eventually figured out the trick that allowed using svg directly. The trick is that svg() generates the category and gene labels using vector graphics, whereas svglite() uses regular text characters. The latter are more robust and do not degrade upon scaling and translation.
[Figure 3: HTGM4D arrangement, png images]
Note that for both the png and svg versions, it is difficult to pre-compute the exact scaling and positioning of the three graphics components, and the package includes facilities for the user to interactively and iteratively apply the appropriate corrections. This procedure is very intuitive and easy (and fun to do!), usually taking no more than a couple of iterations.