--- title: "Posterior tree distribution" author: "Gustavo A. Ballen and Sandra Reinales" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Posterior tree distribution} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- There are two steps necessary for summarizing tree samples. One is to decide how many distinct topologies (disregarding branch lengths) are in the collection of trees, and then we need to summarize branch lengths for each of these tree subsets using any sensible summary measure (e.g., mean or median). The functions `topoFreq` and `summaryBrlen` aid in these two tasks respectively. `topoFreq` uses the Robinson-Foulds (RF) distance among trees in order to find clusters which are identical in topology and then bind them together in a list that we can later operate over. The RF distance is defined over unrooted trees, so it is necessary to check whether the input trees are. Unrooting a rooted tree will destroy the root node and then add up together both adjacent branch lengths, and this will be reflected when summarizing with` summaryBrlens` using the output of `topoFreq`. This will return one branch length less than the number present in the original sample of rooted trees. Accordingly, this method will only work on collections of unrooted trees, or collections of time trees of fixed topology where we are only interested in finding a summary values for branch lengths. It should not be used when both the topology and divergence times are being coestimated (e.g., when the posterior of trees comes from not fixing the topology as in `Beast2`, `RevBayes`, and `MrBayes`), but it is perfectly fine when the inference was done to estimate only the unrooted topology and branch lengths, such as in standard tree inference in `MrBayes`. We see an example using these two functions below where we want to use the median branch length for each group: ```{r, fig.align="center", out.width="100%"} # load the packages library(ape) library(tbea) # these urls render correctly on html but not on pdf. The former is more important for CRAN. # download the trees from the supplementary material of Ballen et al. (2022) t1<-"https://raw.githubusercontent.com/gaballench/cynodontidaeWare/refs/heads/main/bayesian/mrbayes" t2<-"https://raw.githubusercontent.com/gaballench/cynodontidaeWare/refs/heads/main/bayesian/mrbayes" mtr1 <- read.nexus(paste(t1, "concatenatedMolmorph.nexus.run1.t", sep="/")) mtr2 <- read.nexus(paste(t2, "concatenatedMolmorph.nexus.run2.t", sep="/")) trees <- c(mtr1, mtr2) # calculate topological frequencies tpf <- topoFreq(trees, output="trees") # summarize median branch length sumtrees <- summaryBrlen(tpf$trees, method="median") # sort the frequencies in decreasing order decreasingIdx <- order(tpf$fabs, decreasing=TRUE) # how many topologies comprise about 90% of the trees? sum(cumsum(tpf$fabs[decreasingIdx])/sum(tpf$fabs)<0.9) # plot these four trees with branch lengths already summarized par(mfrow=c(2,2)) par(oma=c(0, 0, 0, 0)) par(mai=c(0.5, 0.2, 0.5, 0.2)) for(i in 1:4){ plot(sumtrees[[decreasingIdx[i]]], type="unrooted", show.node.label=FALSE, cex=0.4, main=round(tpf$frel[decreasingIdx[i]], digits=2)) } ```