\documentclass[a4paper]{article} %\VignetteIndexEntry{Drawing Phylogenies} %\VignettePackage{ape} \usepackage{ape} \author{Emmanuel Paradis} \title{Drawing Phylogenies in \R: Basic and Advanced Features With \pkg{ape}} \begin{document} \DefineVerbatimEnvironment{Sinput}{Verbatim}{formatcom=\color{darkblue}} \DefineVerbatimEnvironment{Soutput}{Verbatim}{formatcom=\color{black}\vspace{-1.5em}} \maketitle \tableofcontents\vspace*{1pc}\hrule <>= options(width = 80, prompt = "> ") @ \vspace{1cm} \section{Introduction} Graphical functions have been present in \ape\ since its first version (0.1, released in August 2002). Over the years, these tools have been improved to become quite sophisticated although complicated to use efficiently. This document gives an overview of these functionalities. Section~\ref{sec:basic} explains the basic concepts and tools behind graphics in \ape. A figure made with \ape\ usually starts by calling the function \code{plot.phylo} which is detailed in Section~\ref{sec:plotphylo}, and further graphical annotations can be done with functions covered in Section~\ref{sec:annot}. Section~\ref{sec:spec} shows some specialized functions available in \ape, and finally, Sections~\ref{sec:geom} and \ref{sec:build} give an overview of some ideas to help making complicated figures. \section{Basic Concepts}\label{sec:basic} The core of \ape's graphical tools is the \code{plot} method for the class \code{"phylo"}, the function \code{plot.phylo}. This function is studied in details in Section~\ref{sec:plotphylo}, but first we see the basic ideas behind it and other functions mentioned in this document. \subsection{Graphical Model} The graphical functions in \ape\ use the package \pkg{graphics}. Overall, the conventions of this package are followed quite closely (see Murrell's book \cite{Murrell2006}), so users familiar with graphics in \R\ are expected to find their way relatively easily when plotting phylogenies with \ape. \ape\ has several functions to perform computations before drawing a tree, so that they may be used to implement the same graphical functionalities with other graphical engines such as the \pkg{grid} package. These functions are detailed in the next section. To start simply, we build a small tree with three genera of primates which we will use in several examples in this document: <<>>= library(ape) mytr <- read.tree(text = "((Pan:5,Homo:5):2,Gorilla:7);") @ \noindent Now let's build a small function to show the frame around the plot with dots, and the $x$- and $y$-axes in green: <<>>= foo <- function() { col <- "green" for (i in 1:2) axis(i, col = col, col.ticks = col, col.axis = col, las = 1) box(lty = "19") } @ \noindent We then plot the tree in four different ways (see below for explanations about the options) and call for each of them the previous small function: <>= layout(matrix(1:4, 2, 2, byrow = TRUE)) plot(mytr); foo() plot(mytr, "c", FALSE); foo() plot(mytr, "u"); foo() par(xpd = TRUE) plot(mytr, "f"); foo() box("outer") @ \noindent The last command (\code{box("outer")}) makes visible the most outer frame of the figure showing more clearly the margins around each tree (more on this in Sect.~\ref{sec:geom}). We note also the command \code{par(xpd = TRUE)}: by default this parameter is \code{FALSE} so that graphical elements (points, lines, text, \dots) outside the plotting region (i.e., in the margins or beyond) are cut (clipped).\footnote{\code{par(xpd = TRUE)} is used in several examples in this document mainly because of the small size of the trees drawn here. However, in practice, this is rarely needed.} These small figures illustrate the way trees are drawn with \ape. This can be summarised with the following (pseudo-)algorithm: \bigskip\hrule height 1pt\relax %\renewcommand{\theenumi}{\alph{enumi}} \renewcommand{\labelenumi}{\textbf{\theenumi.}} \begin{enumerate}\small \item Compute the node coordinates depending on the type of tree plot, the branch lengths, and other parameters. \item Evaluate the space required for printing the tip labels. \item Depending on the options, do some rotations and/or translations. \item Set the limits of the $x$- and $y$-axes. \item Open a graphical device (or reset it if already open) and draw an empty plot with the limits found at the previous step. \item Call \code{segments()} to draw the branches. \item Call \code{text()} to draw the labels. \end{enumerate} \hrule height 1pt\relax\bigskip There are a lot of ways to control these steps. The main variations along these steps are given below. \textbf{Step 1. } The option \code{type} specifies the shape of the tree plot: five values are possible, \code{"phylogram"}, \code{"cladogram"}, \code{"fan"}, \code{"unrooted"}, and \code{"radial"} (the last one is not considered in this document). The first three types are valid representations for rooted trees, while the fourth one should be selected for unrooted trees. The node coordinates depend also on whether the tree has branch lengths or not, and on the options \code{node.pos} and \code{node.depth}. This is illustrated below using a tree with eight tips and all branch length equal to one (these options have little effect if the tree has only three tips): <<>>= tr <- compute.brlen(stree(8, "l"), 0.1) tr$tip.label[] <- "" @ \noindent We now draw this tree using the option \code{type = "phylogram"} (first column of plots) or \code{type = "cladogram"} (second column) and different options: <>= foo <- function() { col <- "green" axis(1, col = col, col.ticks = col, col.axis = col) axis(2, col = col, col.ticks = col, col.axis = col, at = 1:Ntip(tr), las = 1) box(lty = "19") } @ <<>>= @ <>= layout(matrix(1:12, 6, 2)) par(mar = c(2, 2, 0.3, 0)) for (type in c("p", "c")) { plot(tr, type); foo() plot(tr, type, node.pos = 2); foo() plot(tr, type, FALSE); foo() plot(tr, type, FALSE, node.pos = 1, node.depth = 2); foo() plot(tr, type, FALSE, node.pos = 2); foo() plot(tr, type, FALSE, node.pos = 2, node.depth = 2); foo() } @ \noindent Some combinations of options may result in the same tree shape as shown by the last two rows of trees. For unrooted and circular trees, only the option \code{use.edge.length} has an effect on the layout and/or the scales of the axes: <>= foo <- function() { col <- "green" for (i in 1:2) axis(i, col = col, col.ticks = col, col.axis = col, las = 1) box(lty = "19") } @ <>= layout(matrix(1:4, 2, 2)) par(las = 1) plot(tr, "u"); foo() plot(tr, "u", FALSE); foo() plot(tr, "f"); foo() plot(tr, "f", FALSE); foo() @ \textbf{Step 2.} In the \pkg{graphics} package, text are printed with a fixed size, which means that whether you draw a small tree or a large tree, on a small or large device, the labels will have the same size. However, before anything is plotted or drawn on the device it is difficult to find the correspondence between this size (in inches) and the user coordinates used for the node coordinates. Therefore, the following steps are implemented to determine the limits on the $x$-axis: \renewcommand{\labelenumi}{\theenumi.} \begin{enumerate} \item Find the width of the device in inches (see Sect.~\ref{sec:overlay}). \item Find the widths of all labels in inches: if at least one of them is wider than the device, assign two thirds of the device for the branches and one third to the tip labels. (This makes sure that by default the tree is visible in the case there are very long tip labels.) \item Otherwise, the space allocated to the tip labels is increased incrementally until all labels are visible on the device. \end{enumerate} The limits on the $y$-axis are easier to determine since it depends only on the number of branches in the tree. The limits on both axes can be changed manually with the options \code{x.lim} and \code{y.lim} which take one or two values: if only one value is given this will set the rightmost or uppermost limit, respectively; if two values are given these will set both limits on the respecive axis.\footnote{These two options differ from their standard counterparts \code{xlim} and \code{ylim} which always require two values.} By default, there is no space between the tip labels and the tips of the terminal branches; however, text strings are printed with a bounding box around them making sure there is actually a small space (besides, the default font is italics making this space more visible). The option \code{label.offset} (which is 0 by default) makes possible to add an explicit space between them (this must be in user coordinates). \textbf{Step 3.} For rooted trees, only 90\textdegree\ rotations are supported using the option \code{direction}.\footnote{To have full control of the tree rotation, the option `rotate' in \LaTeX\ does the job very well.} For unrooted (\code{type = "u"}) and circular (\code{type = "fan"}) trees, full rotation is supported with the option \code{rotate.tree}. If these options are used, the tip labels are not rotated. Label rotation is controlled by other options: \code{srt}\footnote{\code{srt} is for \textit{string rotation}, not to be confused with the function \code{str} to print the \textit{structure} of an object.} for all trees, and \code{lab4ut} for unrooted trees. \textbf{Step 4.} These can be fully controlled with the options \code{x.lim} and \code{y.lim}. Note that the options \code{xlim} and \code{ylim} \emph{cannot} be used from \code{plot.phylo}. \textbf{Step 5.} If the options \code{plot = FALSE} is used, then steps 6 and 7 are not performed. \subsection{Computations}\label{sec:comput} As we can see from the previous section, a lot of computations are done before a tree is plotted. Some of these computations are performed by special functions accessible to all users, particularly the three functions used to calculate the node coordinates. First, two functions calculate ``node depths'' which are the coordinates of the nodes on the $x$-axis for rooted trees: <<>>= args(node.depth.edgelength) args(node.depth) @ \noindent Here, \code{phy} is an object of class \code{"phylo"}. The first function uses edge lengths to calculate these coordinates, while the second one calculates these coordinates proportional to the number of tips descending from each node (if \code{method = 1}), or evenly spaced (if \code{method = 2}). The third function is \code{node.height} and is used to calculate ``node heights'', the coordinates of the nodes on the $y$-axis: <<>>= args(node.height) @ \noindent If \code{clado.style = TRUE}, the node heights are calculated for a ``triangular cladogram'' (see figure above). Otherwise, by default they are calculated to fall in the middle of the vertical segments with the default \code{type = "phylogram"}.\footnote{It may be good to remind here than these segments, vertical since \code{direction = "rightwards"} is the default, are not part of the edges of the tree.} For unrooted trees, the node coordinates are calculated with the ``equal angle'' algorithm described by Felsenstein \cite{Felsenstein2004}. This is done by an internal function which arguments are: <<>>= args(unrooted.xy) @ \noindent There are three other internal functions used to plot the segments of the tree after the above calculations have been performed (steps 1--4 in the previous section): <<>>= args(phylogram.plot) args(cladogram.plot) args(circular.plot) @ \noindent Although these four functions are not formally documented, they are anyway exported because they are used by several packages outside \ape. \section{The \code{plot.phylo} Function}\label{sec:plotphylo} The \code{plot} method for \code{"phylo"} objects follows quite closely the \R\ standard practice. It has a relatively large number of arguments: the first one (\code{x}) is mandatory and is the tree to be drawn. It is thus not required to name it, so in practice the tree \code{tr} can be plotted with the command \code{plot(tr)}. All other arguments have default values: <<>>= args(plot.phylo) @ \noindent The second and third arguments are the two commonly used in practice, so they can be modified without explicitly naming them like in the above examples. Besides, \code{"cladogram"} can be abbreviated with \code{"c"}, \code{"unrooted"} with \code{"u"}, and so on. For the other arguments, it is better to name them if they are used or modified (e.g., \code{lab4ut = "a"}). \subsection{Overview on the Options} The logic of this long list of options is double: the user can modify the aspect of the tree plot, and/or use some of these options to display some data in association with the tree. Therefore, the table below group these options into three categories. The following two sections show how data can be displayed in connection to the tips or to the branches of the tree. \begin{center} \begin{tabular}{lll} \toprule Aspect of the tree & Attributes of the labels & Attributes of the edges\\ \midrule \code{type} & \code{show.tip.label} & \code{edge.color}\\ \code{use.edge.length} & \code{show.node.label} & \code{edge.width}\\ \code{node.pos} & \code{font} & \code{edge.lty}\\ \code{x.lim} & \code{tip.color}\\ \code{y.lim} & \code{cex}\\ \code{direction} & \code{adj}\\ \code{no.margin} & \code{underscore}\\ \code{root.edge} & \code{srt}\\ \code{rotate.tree} & \code{lab4ut}\\ \code{open.angle} & \code{label.offset}\\ \code{node.depth} & \code{align.tip.label}\\ \bottomrule \end{tabular} \end{center} \subsection{Connecting Data to the Tips} It is common that some data are associated with the tips of a tree: body mass, population, treatment, \dots\ The options \code{font}, \code{tip.color}, and \code{cex} make possible to show this kind of information by changing the font (normal, bold, italics, or bold-italics), the colour, or the size of the tip labels, or any combination of these. These three arguments work in the usual \R\ way: they can a vector of any length whose values are eventually recycled if this length is less than the number of tips. This makes possible to change all tips if a single value is given. For instance, consider the small primate tree where we want to show the geographic distributions stored in a factor: <<>>= geo <- factor(c("Africa", "World", "Africa")) @ \noindent We can define a color for each region and use the above factor as a numeric index vector and pass it to \code{tip.color}: \begin{center} \setkeys{Gin}{width=.5\textwidth} <>= (mycol <- c("blue", "red")[geo]) plot(mytr, tip.color = mycol) @ \end{center} The values must be in the same order than in the vector of tip labels, here \code{mytr\$tip.label}. Reordering can be done in the usual \R\ way (e.g., with \code{names} or with \code{row.names} if the data are in a data frame). This can be combined with another argument, for instance to show (relative) body size: \begin{center} \setkeys{Gin}{width=.5\textwidth} <>= par(xpd = TRUE) plot(mytr, tip.color = mycol, cex = c(1, 1, 1.5)) @ \end{center} The function \code{def} gives another way to define the above arguments given a vector of labels (\code{x}): <<>>= args(def) @ \noindent The `\code{...}' are arguments separated by commas of the form \code{\textsl{