---
title: "Update of a treestructure object with new sequences"
author: "Vinicius Franceschi and Fabricia F Nascimento"
date: "`r Sys.Date()`"
output:
  bookdown::html_vignette2:
  #rmarkdown::html_vignette
  #bookdown::pdf_book:
    toc: TRUE
pkgdown:
  as_is: true
fontsize: 12pt
vignette: >
  %\VignetteIndexEntry{Update of a treestructure object with new sequences}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 11,
  warning = FALSE,
  message = FALSE
)
```

# Introduction

In this tutorial, we show how to update an existing `treestructure` object with new sequences, using a down-sampled version of the publicly available [Ebola dated tree](https://github.com/ebov/space-time/blob/master/Data/Makona_1610_cds_ig.GLM.MCC.tree).

First, we load the R packages used in this tutorial.

```{r message=FALSE}
library(ape)
library(treestructure)
library(phangorn)
```

Now we can read the down-sampled time-tree for Ebola. This pruned tree has 1,310 tips.

```{r}
# Load the down-sampled Ebola time-tree shipped with the package
pruned_tree <- readRDS(
  system.file('Ebola_down_sampled_tree.rds', package = 'treestructure')
)
```

## Assign clusters using node support

Next, we assign clusters to the down-sampled Ebola phylogenetic tree, using the posterior probabilities as node support values:

```{r eval=FALSE}
trestruct_res <- trestruct(pruned_tree,
                           minCladeSize = 30,
                           nodeSupportValues = TRUE,
                           nodeSupportThreshold = 95,
                           level = 0.01)
```

Because `trestruct` takes several minutes to run, we load the precomputed results instead:

```{r}
# Load the precomputed treestructure result shipped with the package
trestruct_res <- readRDS(
  system.file('downsampled_tree_struc.rds', package = 'treestructure')
)

plot(trestruct_res, use_ggtree = TRUE) + ggtree::geom_tippoint()
```

The `treestructure` analysis resulted in 4 clusters.
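
To check the number of clusters and how many tips each one contains, the `clusterSets` element of the result can be summarised with base R. The sketch below is illustrative only: `toy_sets` is a small hypothetical list that mimics the assumed structure of `trestruct_res$clusterSets` (a named list with one vector of tip labels per cluster), and the tip names are invented.

```r
# Hypothetical stand-in for trestruct_res$clusterSets:
# a named list with one character vector of tip labels per cluster.
toy_sets <- list(
  `1` = c("tipA", "tipB", "tipC"),
  `2` = c("tipD", "tipE"),
  `3` = c("tipF")
)

# Number of clusters
n_clusters <- length(toy_sets)

# Number of tips in each cluster, as a named integer vector
cluster_sizes <- lengths(toy_sets)

n_clusters
cluster_sizes
```

Under the same assumption, replacing `toy_sets` with `trestruct_res$clusterSets` should report the size of each of the 4 clusters found above.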
## Update a previous treestructure object with new sequences

To update the previous `treestructure` object with new sequences, we will now use the maximum likelihood [Ebola tree](https://github.com/ebov/space-time/blob/master/Data/Makona_1610_genomes_2016-06-23.ml.tree). Note that this new tree must be rooted, but it does not need to be time-scaled or binary.

```{r}
# Note that this tree has more sequences than the previous tree used in this
# tutorial.
new_tree <- ape::read.nexus(
  system.file('Makona_1610_genomes_2016-06-23.ml.tree', package = 'treestructure')
)

# Root the tree using mid-point rooting (for illustration)
ml_rooted_tree <- phangorn::midpoint(new_tree)

# Remove the single quotes from the tip names (to avoid an error with the
# treestructure function); gsub() is vectorised over the labels
ml_rooted_tree$tip.label <- gsub("'", "", ml_rooted_tree$tip.label)
```

Without re-estimating a time-tree or re-running `trestruct` from scratch, we can now add the new sequences to the existing `treestructure` object:

```{r}
trestruct_add_tips <- addtips(trst = trestruct_res, tre = ml_rooted_tree)

plot(trestruct_add_tips, use_ggtree = TRUE) + ggtree::geom_tippoint()
```

To compare the sequence names that make up each cluster in the two objects, you can do:

```{r}
# Compare the sequences in cluster 1 of the trestruct_res object with those
# in cluster 1 of the trestruct_add_tips object
tree1_cluster1 <- trestruct_res$clusterSets$`1`
tree2_cluster1 <- trestruct_add_tips$clusterSets$`1`

length(tree1_cluster1)
length(tree2_cluster1)
```

Note that the lengths of `tree1_cluster1` and `tree2_cluster1` differ. That is because we _added_ tips from the ML tree, `ml_rooted_tree`, to the `treestructure` object, `trestruct_res`.

You can also check that all elements of `tree1_cluster1` are contained in `tree2_cluster1`. If they are, the sum below equals `length(tree1_cluster1)`:

```{r}
sum(tree1_cluster1 %in% tree2_cluster1)
```
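
The comparison above can be extended to every cluster at once, including a list of the tips that were newly assigned to each one. The following is a minimal sketch in base R, not part of `treestructure`: `new_tips_per_cluster()` is a hypothetical helper, and the toy lists `old_sets` and `new_sets` stand in for `trestruct_res$clusterSets` and `trestruct_add_tips$clusterSets`, assuming both are named lists of tip labels that share the same cluster names.

```r
# Hypothetical helper (not part of treestructure):
# for each cluster name in `old_sets`, return the tips that are present in
# `new_sets` but absent from `old_sets`.
new_tips_per_cluster <- function(old_sets, new_sets) {
  cluster_ids <- names(old_sets)
  added <- lapply(cluster_ids, function(id) {
    setdiff(new_sets[[id]], old_sets[[id]])
  })
  names(added) <- cluster_ids
  added
}

# Toy data with invented tip names
old_sets <- list(`1` = c("tipA", "tipB"), `2` = c("tipC"))
new_sets <- list(`1` = c("tipA", "tipB", "tipX"),
                 `2` = c("tipC", "tipY", "tipZ"))

# Tips added to each cluster, and how many
added <- new_tips_per_cluster(old_sets, new_sets)
added
lengths(added)

# TRUE if every original tip is still present in its cluster
all_retained <- all(mapply(function(o, n) all(o %in% n),
                           old_sets, new_sets[names(old_sets)]))
all_retained
```

With the real objects, the call would be `new_tips_per_cluster(trestruct_res$clusterSets, trestruct_add_tips$clusterSets)`, provided the assumption about matching cluster names holds.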