--- title: "ggtree: a phylogenetic tree viewer for different types of tree annotations" author: "\\ Guangchuang Yu () and Tommy Tsan-Yuk Lam ()\\ School of Public Health, The University of Hong Kong" date: "`r Sys.Date()`" bibliography: ggtree.bib csl: nature.csl output: html_document: toc: true pdf_document: toc: true vignette: > %\VignetteIndexEntry{00 ggtree introduction} %\VignetteEngine{knitr::rmarkdown} %\usepackage[utf8]{inputenc} --- ```{r style, echo=FALSE, results="asis", message=FALSE} knitr::opts_chunk$set(tidy = FALSE, message = FALSE) ``` ```{r echo=FALSE, results="hide", message=FALSE} library("colorspace") library("Biostrings") library("ape") library("ggplot2") library("ggtree") ``` > You can't even begin to understand biology, you can't understand life, unless you understand what it's all there for, how it arose - and that means evolution. > --- Richard Dawkins # Citation If you use `ggtree` in published research, please cite: __G Yu__, DK Smith, H Zhu, Y Guan, TTY Lam^\*^. ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. __*Methods in Ecology and Evolution*__. _accepted_. doi:[10.1111/2041-210X.12628](http://dx.doi.org/10.1111/2041-210X.12628) # Introduction This project arose from our needs to annotate nucleotide substitutions in the phylogenetic tree, and we found that there is no tree visualization software can do this easily. Existing tree viewers are designed for displaying phylogenetic tree, but not annotating it. Although some tree viewers can displaying bootstrap values in the tree, it is hard/impossible to display other information in the tree. Our first solution for displaying nucleotide substituitions in the tree is to add this information in the node/tip names and use traditional tree viewer to show it. We displayed the information in the tree successfully, but we believe this indirect approach is inefficient. Previously, phylogenetic trees were much smaller. Annotation of phylogenetic trees was not as necessary as nowadays much more data is becomming available. We want to associate our experimental data, for instance antigenic change, with the evolution relationship. Visualizing these associations in a phylogenetic tree can help us to identify evolution patterns. We believe we need a next generation tree viewer that should be programmable and extensible. It can view a phylogenetic tree easily as we did with classical software and support adding annotation data in a layer above the tree. This is the objective of developing the `ggtree`. Common tasks of annotating a phylogenetic tree should be easy and complicated tasks can be possible to achieve by adding multiple layers of annotation. The `ggtree` is designed by extending the `ggplot2`[@wickham_ggplot2_2009] package. It is based on the grammar of graphics and takes all the good parts of `ggplot2`. There are other R packages that implement tree viewer using `ggplot2`, including `OutbreakTools`, `phyloseq`[@mcmurdie_phyloseq_2013] and [ggphylo](https://github.com/gjuggler/ggphylo); they mostly create complex tree view functions for their specific needs. Internally, these packages interpret a phylogenetic as a collection of `lines`, which makes it hard to annotate diverse user input that are related to node (taxa). The `ggtree` is different to them by interpreting a tree as a collection of `taxa` and allowing general flexibilities of annotating phylogenetic tree with diverse types of user inputs. # Getting data into `R` Most of the tree viewer software (including `R` packages) focus on `Newick` and `Nexus` file format, while there are file formats from different evolution analysis software that contain supporting evidences within the file that are ready for annotating a phylogenetic tree. In addition to `Newick` and `Nexus`, ggtree supports `NHX`, `jplace` and `Phylip` file formats. `ggtree` also supports software outputs from [BEAST](http://beast2.org/)[@bouckaert_beast_2014], [EPA](http://sco.h-its.org/exelixis/web/software/epa/index.html)[@berger_EPA_2011], [HYPHY](http://hyphy.org/w/index.php/Main_Page)[@pond_hyphy_2005], [PAML](http://abacus.gene.ucl.ac.uk/software/paml.html)[@yang_paml_2007], [PHYLDOG](http://pbil.univ-lyon1.fr/software/phyldog/)[@boussau_genome-scale_2013], [pplacer](http://matsen.fhcrc.org/pplacer/)[@matsen_pplacer_2010], [r8s](http://loco.biosci.arizona.edu/r8s/)[@marazzi_locating_2012], [RAxML](http://sco.h-its.org/exelixis/web/software/raxml/)[@stamatakis_raxml_2014] and [RevBayes](http://revbayes.github.io/intro.html)[@hohna_probabilistic_2014]. Parsing data from a number of molecular evolution software is not only for visualization in `ggtree`, but also bring these data to `R` users for further analysis (e.g. summarization, visualization, comparision, test, _etc_). For more details, please refer to [Tree Data Import](treeImport.html) vignette. # Tree Visualization and Annotation Tree Visualization in `ggtree` is easy, with one line of command `ggtree(tree_object)`. It supports several layouts, including `rectangular`, `slanted` and `circular` for `Phylogram` and `Cladogram`, `unrooted` layout, time-scaled and two dimentional phylogenies. [Tree Visualization](treeVisualization.html) vignette describes these feature in details. We implement several functions to manipulate a phylogenetic tree. + taxa can be clustered together using `groupClade` or `groupOTU` functions + clades can be collapsed via `collapse` function + collapsed clade can be expanded by using `expand` function + clade can be re-scale to zoom in or zoom out by `scaleClade` function + selected clade can be rotated by 180 degree using `rotate` function + position of two selected clades (should share a same parent) can be exchanged by `flip` function Details and examples can be found in [Tree Manipulation](treeManipulation.html) vignette. Most of the phylogenetic trees are scaled by evolutionary distance (substitution/site), in `ggtree` a phylogenetic tree can be re-scaled by any numerical variable inferred by evolutionary analysis (e.g. species divergence time, *dN/dS*, _etc_). Numerical and category variable can be used to color a phylogenetic tree. The `ggtree` package provides several layers to annotate a phylogenetic tree, including: + `geom_cladelabel` for labelling selected clades + `geom_hilight` for highlighting selected clades + `geom_range` to indicate uncertainty of branch lengths + `geom_strip` for adding strip/bar to label associated taxa (with optional label) + `geom_taxalink` for connecting related taxa + `geom_tiplab` for adding tip labels + `geom_treescale` for adding a legend of tree scale It supports annotating phylogenetic trees with analyses obtained from R packages and other commonly used evolutionary software. User's specific annotation (e.g. experimental data) can be integrated to annotate phylogenetic trees. `ggtree` provides `write.jplace` function to combine Newick tree file and user's own data to a single `jplace` file that can be parsed and the data can be used to annotate the tree directly in `ggtree`. `ggtree` integrates `phylopic` database and silhouette images of organisms can be downloaded and used to annotate phylogenetic directly. `ggtree` also supports using local images to annotate a phylogenetic tree. Visualizing an annotated phylogenetic tree with numerical matrix (e.g. genotype table), multiple sequence alignment and subplots are also supported in `ggtree`. Examples of annotating phylogenetic trees can be found in the [Tree Annotation](treeAnnotation.html) and [Advance Tree Annotation](advanceTreeAnnotation.html) vignettes. # Vignette Entry + [Tree Data Import](treeImport.html) + [Tree Visualization](treeVisualization.html) + [Tree Manipulation](treeManipulation.html) + [Tree Annotation](treeAnnotation.html) + [Advance Tree Annotation](advanceTreeAnnotation.html) + [ggtree utilities](ggtreeUtilities.html) More documents can be found in . ## Feedback ## - For bugs or feature request, please post to [github issue](https://github.com/GuangchuangYu/ggtree/issues). - For user questions, please post to [google group](https://groups.google.com/forum/#!forum/bioc-ggtree) or post to [Bioconductor support site](https://support.bioconductor.org/) or [Biostars](https://www.biostars.org/). We are following every post tagged with **ggtree**. # Session info Here is the output of `sessionInfo()` on the system on which this document was compiled: ```{r echo=FALSE} sessionInfo() ``` # References