-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy path03-Phylogenies.Rmd
590 lines (438 loc) · 32.6 KB
/
03-Phylogenies.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
# Understanding, plotting and manipulating phylogenies in R
Using phylogenies in R can feel daunting at first, even if you're already familiar with R. Luckily, lots of nice R packages exist to help us work with them, and fairly new packages like `ggtree` mean we can now make beautiful phylogeny plots without needing to resort to other software. In this chapter I'll introduce you to how R stores phylogenies, and then show you how to plot them and manipulate them before using them in your PCMs. Most of the examples come from the *Primer*.
## Before you start
- Open the `03-Phylogenies.RProj` file in the `03-Phylogenies` folder to open your R Project for this exercise.
- Make yourself a new R script for your code.
You will also need to install the following packages:
* `ape`
* `tidyverse`
* `ggtree`
* `patchwork`
* `ggimage`
* `phytools`
## What do phylogeny files look like?
There are many different formats that you might get a phylogeny in, but the most common formats of the tree files we read into R are Newick and NEXUS. Here we will just focus on these two. If you need to read in lots of different kinds of tree files check out the documentation for [ggtree](https://yulab-smu.github.io/treedata-book/chapter1.html).
### Newick trees
Newick (or New Hampshire) format uses brackets/parentheses to group taxa together with their closest relatives. Tips are represented by their names, and these names can include any characters *except* blanks, colons, semicolons, parentheses, and square brackets. Because blanks are not allowed, we use underscores to replace them, meaning that in many phylogenies the tip names are formatted as Genus_species.
Our basic tree from the *Primer* is shown below.
```{r, echo = FALSE}
library(ape)
tree <- read.tree("data/basic.tre")
plot(tree)
```
The topology of this tree can be represented in Newick format as:
`(((robin,deer),spider),jellyfish);`
Note that Newick trees always end with a *semicolon*. Branch lengths can be added into a tree by adding a colon followed by the branch length after the node. This represents the length of the branch immediately following that node, so numbers after a tip label show how long the branches at the tips of the tree are, and others show how long the internal branches are.
For our tree above this can be written as:
`(((robin:4.2,deer:4.2):3.1,spider:7.3):6.3,jellyfish:13.5);`
And that is basically all there is to Newick trees! In R these can be read in using `read.tree` from the APE package. They are usually stored in `.tre` or `.phy` files.
### NEXUS trees
NEXUS files are a little more complicated, but are also based on the Newick format. Many tree inference packages will output trees as NEXUS files, so they are very commonly used in R. You read them in using `read.nexus` from the APE package, and they are usually stored in `.nex` files.
A NEXUS file for the tree above might look like the example below. First it uses the `#NEXUS` tag to tell the computer it is a NEXUS file, then the first block lists how many taxa there are (`NTAX`) and then what the names of the taxa are (`TAXLABELS`), i.e. the tip labels. Then the second block first gives each taxon a number, then the `TREE` block shows the tree in Newick format, exactly as we did above except using the numbers for the taxa rather than their names. The `UNTITLED = [&R]` part just means we haven't given the tree a name, and that the tree is *rooted*.
```
#NEXUS
BEGIN TAXA;
DIMENSIONS NTAX = 4;
TAXLABELS
robin
deer
spider
jellyfish
;
END;
BEGIN TREES;
TRANSLATE
1 robin,
2 deer,
3 spider,
4 jellyfish
;
TREE * UNTITLED = [&R] (((1:4.2,2:4.2):3.1,3:7.3):6.3,4:13.5);
END;
```
NEXUS files can be much more complex and include the data used to infer the tree too, but we don't need this for comparative analyses so `read.nexus` will just ignore it.
```{block, type = "info"}
Sometimes someone will email you a `.tre` or `.nex` file, or you'll download one from the internet, but when you use `read.tre` or `read.nexus` it just keeps giving you an error. Before panicking, check what you did after you downloaded the tree file. Did you open it to take a look? On some computers, doing this makes the computer freak out because it doesn't know what to do with `.tre` or `.nex` files, so it converts them into something it does understand like `.txt` or `.docx`. This then alters the tree file to the point that R can no longer read it. To solve the problem, just download the file again but this time *don't open it!*. Just save it and then read it into R. If you need to look at one of these files outside of R, try "Open with" and pick a text editor like Notepad or TextEdit. This shouldn't alter the file.
```
## How are phylogenies formatted in R?
You don't need to know how phylogenies are represented in R to use them, but it helps to have a bit of an idea what is going on.
To load a tree into R you need either the function `read.tree` or `read.nexus` from the package APE. `read.tree` can deal with a number of different types of data (including DNA) whereas `read.nexus` reads NEXUS files. Let's practice with our basic tree from the *Primer*. We can read it in using `read.tree`. Remember to load the APE library first or this won't work.
```{r}
# Load ape
library(ape)
# Read in the tree
tree <- read.tree("data/basic.tre")
```
Note that whether we read the tree in using `read.tree` or `read.nexus` the resulting tree in R is the same.
To understand what is going on we need a bit of computer programming jargon. Phylogenies are stored as objects of class `phylo`. What do I mean by this? Whenever we make a new "thing" in R and save it with a name we have created an *object*. So the code `tree <- read.tree("data/basic.tre")` above makes a new object called `tree` which will have the phylogeny stored in it. Each object in R belongs to a *class*, which is a sort of blueprint for how the object should behave. For example, objects of *numeric* class will behave differently to objects of *character* class - we can't multiply two characters (i.e. words) together but we can multiply two numbers. There are many classes in R and each have different rules. The `phylo` class is one of these that we use for phylogenies.
Let's examine the tree by typing:
```{r}
# Look at the tree summary
tree
```
Notice that when we print out a `phylo` object like `tree`, it doesn't just give us all of the information in the tree file. Instead it gives us a couple of bits of summary information. It tells us how many tips (4) and internal nodes (3) the tree has, then lists the tip labels. It also tells us the tree is rooted and contains branch lengths.
If we want to look more closely at the components that make up a `phylo` object we need to look at the structure of `tree` using `str`.
```{r}
# Look at the summary of tree components
str(tree)
```
This gives us a bit more information about what is going on. `tree` contains four variables:
1. **edge** shows how the branches and tips are linked together (I'll show you how this works below).
2. **edge.length** gives the branch lengths.
3. **Nnode** tells you how many internal nodes there are.
4. **tip.label** is a list of the taxon names.
If we want to see the whole of the `phylo` object we have to use a little trick. Everything we do to `tree` at the moment will work based on `tree` being a `phylo` object. If we want to see the whole thing, we need to tell R to ignore that, and present it as is. To do this we can use `unclass` to remove the class rules.
```{r}
# Look at the tree as is
unclass(tree)
```
If we only wanted to see the full `edge` component we can use the `$` to extract just that variable from `tree`:
```{r}
# Just look at the edge variable
tree$edge
```
What is going on here? The easiest way to understand is to look at the `edge` matrix and the tree with node numbers added to it (see below). For all trees, 1 is the number of the first tip taxon at the base of the phylogeny, 2 is the number of the second taxon and so on until you have each taxon numbered. The numbers then refer to *nodes*, working through the tree from the root forwards.
So in the `edge` matrix above, 1, 2, 3 and 4 are the tips. Node 7 leads to tips 1 and 2. Node 6 leads to node 7, and tip 3. Node 5 leads to node 6 and tip 4. Node 5 is the root node.
```{r, echo = FALSE}
plot(tree, show.node.label = TRUE, label.offset = 1)
nodelabels(text = c(7,6,5,4,3,2,1), node = c(7,6,5,4,3,2,1), frame = "circle")
```
Notice that the `edge.length` variable is in the same order as `edge`. So the first row of `edge` shows a branch joining node 5 to node 6, and this has a branch length of 6.3 (the first entry in `edge.length`). And so on...
```{r}
tree$edge
tree$edge.length
```
As I said above, you don't need to fully understand this, so don't worry if this is confusing. But you will see at various points we do things like `tree$tip.label` to extract or change tip labels, and we might set branch lengths using `tree$edge.length` somewhere in our code. So now you understand where those bits of information come from.
## Plotting phylogenies
Regardless of what you're using a phylogeny for, you're likely to want to plot it at some point. Even if you don't need a plot straight away, I highly advise plotting your phylogeny before you do anything else. It's good practice to look at whatever data you read into R to check it, and plotting helps you check it looks correct.
Most phylogeny plotting in R uses the package `ape`. Recently, however, a new package called `ggtree` has been introduced that builds on the popular `ggplot2` method of plotting in R, and is much more flexible than `ape`. This leaves us in a bit of a quandary - which should you use? And which should I teach you?! I think there is a benefit in learning the basics of both approaches. In my own work, I use `ape` to plot trees I am working with to quickly check what is going on. I also use `ape` plotting indirectly in other packages like `phytools` - see later exercises. When I need to produce a pretty or complex tree I'll often use `ggtree` instead. The exercises here will mirror this. The aim is not to give you a thorough introduction to all the things we can do with plotting phlyogenies in R, but more to give a taster of what is possible, and what we will need for the later exercises.
### Basic phylogeny plotting with APE
Our basic tree is a bit small, so let's use some data that is built into R with a phlyogeny of bird orders.
First load the data. This will read in a tree called `bird.orders`.
```{r}
# Load the data from R
data(bird.orders)
```
Now plot the tree.
```{r}
# Plot the tree
plot(bird.orders)
```
Note that the function we use to plot phylogenies in `ape` is just called `plot`, but R knows to plot a phylogeny not anything else. How does this work? `plot` is one of a set of clever functions in R that uses an _ifelse_ statement to decide what kind of plot it should do. When you ask R to `plot` something, it first determines what _class_ of object it is. It then chooses the correct version of `plot` for that class. In this case the function it is actually using to plot the phylogeny is `plot.phylo`. This is important if you want to look at the help file for plotting phylogenies, because you need to use `?plot.phylo` *not* `?plot`.
```{r, eval = FALSE}
# Access the help file for phylogeny plotting
# This should open a help file in another window of RStudio
?plot.phylo
```
You should see from the help file that there are lots of options for plotting trees. For example if we want to use a fan-shaped tree, with smaller tip labels, and no white space round the edges to make the tree fill the plotting area, we can use:
```{r}
# Plot the tree as a circular/fan phylogeny with small labels
plot(bird.orders, cex = 0.6, type = "fan", no.margin = TRUE)
```
You can change the style of the tree (`type`), the color of the branches and tips (`edge.color`, `tip.color`), and the size of the tip labels (`cex`). Here's an fun/hideous example!
```{r}
plot(bird.orders, edge.color = "deeppink", tip.color = "springgreen", no.margin = TRUE)
```
We can also add information about the traits of species to trees. We will come back to this when we cover models of evolution, but as a quick demonstration, let's imagine we want to display some data on our bird tree. For example, let's code these bird orders based on how much I like them (this is clearly a total lie because I love them all equally, but for the sake of an example I'm willing to pretend!).
```{r}
# Make a factor which contains how much I like each order
myfaves <- factor(c("awesome", "cool", "ok", "awesome", "ok",
"ok", "ok", "awesome", "cool", "ok", "ok",
"cool", "cool", "awesome", "ok", "ok",
"cool", "ok", "ok", "ok", "ok", "cool", "ok"))
```
Now we can plot this on a phylogeny. First we decide which colours we'd like and make a list of these. To look at a list of inbuilt colours in R type in `colors()`. You can also use any hex colour coded as e.g. "#000000" instead of "white". The first colour will be the first category alphabetically, the second will be the second category alphabetically, and so on.
```{r}
mycolours <- c("gold", "cornflowerblue", "cyan4")
```
Now plot the tree and add square labels to the tips showing the categories. We use `label.offset = 1` to move the labels to the right a bit so the squares will fit. I've also added a legend.
```{r}
# Plot the tree
plot(bird.orders, label.offset = 1, cex = 0.9, no.margin = TRUE)
# Add the squares at the tip labels.
tiplabels(pch = 22, bg = mycolours[as.numeric(myfaves)], cex = 1.2, adj = 1)
# Add a legend
legend("topleft", fill = mycolours,
legend = c("Awesome", "Cool", "OK"),
bty = "n")
```
`pch = 22` sets the tip labels to be unfilled squares, `bg` (background) defines the colours of the squares using the list of colours we provided, and sorting them based on what the value for that order was for `myfaves`.
`cex = 1.2` increases the point size, and `adj = 1` moves the tip labels sideways a bit so they don't obscure the ends of the branches.
### Basic phylogeny plotting with `ggtree`
I won't go into a huge amount of detail about `ggtree` because the [manual for `ggtree`](https://yulab-smu.github.io/treedata-book/index.html) is really comprehensive, and for the most part you won't need to know how to do most of the things it can do. Instead I'll give a basic introduction, and then show you how to make some of the plots used in chapter 2 of the *Primer*. I will also provide the code used to build figures in the *Primer* where appropriate in later exercises.
#### A quick intro to `ggplot2`
`ggtree` extends the `ggplot2` package to work with phylogenies. What's `ggplot2`? The `ggplot2` package was developed by Hadley Wickham to implement some of the ideas in a book called "The Grammar of Graphics" by Wilkinson (2005), hence the *gg* bit of the name. Many books and online tutorials cover `ggplot2` in detail, so here I'll just cover the basics you need to understand how `ggtree` works. See https://ggplot2-book.org/ for details.
`ggplot2` works on the basis of __layers__. You start off with a line of code that uses the function `ggplot`, then at the end of the line you add a `+`. On the next line you add a *layer*. This layer might tell R what kind of plot to make, what the axes should be, how to draw the legend etc. You keep adding layers until you get the plot that you want.
Each layer can have six components, but we'll just focus on the main three:
1. The __data__. Every layer needs some data, in the form of a dataframe (or tibble). Each layer can be associated with a different dataset if appropriate.
* A __geometric object__, called a 'geom'. geoms refer to the things we can see on a plot, such as points, lines or bars.
* A set of __aesthetic mappings__. These describe how variables in the data are associated with the aesthetic properties of the layer. This can include what to use as x and y axes, and the colour and size of the objects (e.g. points) on a plot. Each layer can be associated with its own unique aesthetic mappings. Aesthetic mappings are always defined inside the `aes` function. Different geoms will need different sets of aesthetic mappings, for example to draw a scatter plot you need to define the x and y axes, but for a histogram you only need the x axis.
Each layer will also have layer specific parameters. These are the features of a layer, for example in `geom_point`, the geom that makes points on a graph, one layer specific parameter is `colour = ` which defines the colour of the data points.
Let's quickly demonstrate this using some fake data - read in the code below to create a monster dataset.
```{r}
# Make some fake data
monsterdata <- data.frame(monster = c("vampire", "golem",
"cookie-monster", "manticore", "wererat"),
type = c("dead", "dead", "alive", "dead", "alive"),
scariness = c(65, 50, 10, 75, 95))
```
To make a really basic plot...
```{r}
ggplot(data = monsterdata, aes(x = monster, y = scariness)) +
geom_point(colour = "deeppink", size = 4) +
theme_bw() +
ylim(0, 100)
```
A couple of points to note:
In `ggplot2`, *theme* is used to define the overall look of the plot. We can use the `theme` function modify everything individually, but there are some built in themes that do lots of things at once. `theme_bw` is one of my favourites as it gets rid of the horrible grey background that is a `ggplot2` default, but keeps the nice background grid.
I also used `ylim` to change the y axis limits to go from 0 to 100.
Another thing I'm going to do below with `ggtree` is plot multiple plots on one plotting space. There are a couple of ways to do this, but the easiest is to use the package `patchwork`. First we need to assign our plots to an object using `<-`. We do this as follows
```{r}
myplot1 <-
ggplot(data = monsterdata, aes(x = monster, y = scariness)) +
geom_point(colour = "deeppink", size = 4) +
theme_bw() +
ylim(0, 100)
myplot2 <-
ggplot(data = monsterdata, aes(x = monster, y = scariness)) +
geom_point(colour = "gold1", size = 4, shape = "triangle") +
theme_bw() +
ylim(0, 100)
```
To look at the plots we would now have to type `myplot1` or `myplot2` into R, but I haven't done this here because we already looked at the first plot above, and all I changed in the second plot was the colour and shape of the points.
With `patchwork`, we can use `+` to plot things next to each other, or `/` to plot things on top of one another, and `()` to do more complex arrangements. See https://patchwork.data-imaginist.com/articles/patchwork.html for details. Here let's just plot them next to each other...
```{r}
library(patchwork)
myplot1 + myplot2
```
```{r}
myplot1 / (myplot1 + myplot2)
```
Below you'll see I use the `ggplot2` layer `labs` to add labels (i.e. A, B, C) to plots in multi-panel figures. We can also angle the x axis labels so they fit, add trend lines etc. But I'll let you find out how to do that for yourselves!
### Back to `ggtree`
OK that should give you the basics of what to expect from `ggplot2`. Now let's use `ggtree`...
To plot a phylogeny using `ggtree` we always start with the `ggtree` function (*not* `ggplot`). We then add layers using the `+` like in `ggplot2`. Some layers are the same (e.g. `xlim`, `ylim`), others are unique to `ggtree` (e.g. `geom_tiplab`). `ggtree` also has an inbuilt theme for trees called `theme_tree`.
To plot the basic phylogeny we've been practising with we use:
```{r, message = FALSE}
library(ggtree)
ggtree(tree) +
# Add theme for trees
theme_tree() +
# Change x and y limits so the tip labels fit
xlim(0, 22) +
ylim(0, 5) +
# Add tip labels as text, slightly offset from the tips and aligned
geom_tiplab(geom = "text", align = TRUE, offset = 0.5, linetype = NA)
```
Note that you might need to fiddle around with the `xlim` and `ylim` a bit to make sure your phylogeny and the tip labels fit. If we don't specify the limits, the tip labels often fall off the plotting window (see below), and the tree will push right up against the top and bottom of the page making it hard to add things like images to the tips.
```{r}
# Demonstration of what happens if you don't define the x and y limits
ggtree(tree) +
theme_tree() +
geom_tiplab(geom = "text", align = TRUE, offset = 0.5, linetype = NA)
```
`ggtree` is clearly a bit more involved than just using `plot.phylo` with `ape`. If you're happy with plotting in `ape` you don't need to use `ggtree`. As I mentioned above, for day-to-day tree viewing and checking I just tend to use `ape`, so don't worry if this is all incomprehensible.
One thing, among many, that `ggtree` is great for is adding images to the tips rather than species names. To do this you need to collate the images you want to use first. You may want to draw your own, or I often download them from the excellent [PhyloPic](http://phylopic.org/) website (these are free to use but don't forget to record who made the image so you can credit them in your papers/talks). To keep my folder tidy I keep images in a separate subfolder called, predictably, `images`.
You need the package `ggimage` installed for this to work, which for me took some time and also required that I install `ggplotify`, `magick`, and `gridGraphics`. If this is a struggle, then just skip this example.
```{r, message = FALSE}
# Plot with images at tips
ggtree(tree) +
theme_tree() +
# Change limits so labels fit
xlim(0, 22) +
ylim(0, 5) +
# Add tip label pictures
geom_tiplab(aes(image = c("images/deer.png",
"images/robin.png",
"images/spider.png",
"images/jellyfish.png",
rep(NA, 3))),
geom = "image", align = TRUE, offset = 0.5,
linetype = NA, size = c(0.12, 0.15, 0.09, 0.15))
```
Note that to add either text or images to the tips we use the same layer function, `geom_tiplab`. If doing this with your own data you'll likely need to fiddle with the sizes of the images to get them looking right.
Some of you may be wondering what the `rep(NA, 3)` bit of the code above does. When we try to add images to the tips, the function also tries to add images to the internal nodes. We don't want these to have images so we just ask `ggtree` to use `NA`, i.e. don't add an image. `rep(NA, 3)` means replicate `NA` three times (for the three internal nodes). Again you'll probably need to modify this number when you do this for your own data.
As with `plot.phylo` we can plot phylogenies in different ways, see code below. Fan trees are called circular trees. Note the different values for `offset` and `hjust` (horizontal justification) I've used here to make the labels fit. Again you will need to fiddle with these numbers to get this to work for your data.
```{r, message=FALSE}
# Standard tree
p1 <-
ggtree(tree) +
theme_tree() +
# Change limits so labels fit
xlim(0, 22) +
ylim(0, 5) +
# Add tip labels
geom_tiplab(geom = "text", align = TRUE, offset = 0.5, linetype = NA) +
labs(tag = "A", size = 2)
# Standard tree facing up
p2 <-
ggtree(tree) +
theme_tree() +
coord_flip() +
# Change limits so labels fit
xlim(0, 22) +
ylim(0, 5) +
# Add tip labels
geom_tiplab(geom = "text", align = TRUE, offset = 2, hjust = 0.5, linetype = NA) +
labs(tag = "B", size = 2)
# Slanted tree
p3 <-
ggtree(tree, layout = "slanted") +
theme_tree() +
# Change limits so labels fit
xlim(0, 22) +
ylim(0, 5) +
# Add tip labels
geom_tiplab(geom = "text", align = TRUE, offset = 0.5, linetype = NA) +
labs(tag = "C", size = 2)
# Fan/circular tree
p4 <-
ggtree(tree, layout = "circular") +
theme_tree() +
# Add tip labels
geom_tiplab(geom = "text", align = TRUE, offset = 5, hjust = 0.5, linetype = NA, size = 3) +
labs(tag = "D", size = 2)
# Plot
(p1 + p2) / (p3 + p4)
```
What about highlighting clades? We can use `geom_hilight` to add coloured shading to all species from a defined node, and/or `geom_cladelabel` to label clades (A and B below) from a defined node. The offset in `geom_cladelabel` defines how far from the phylogeny tips the lines should be drawn. Again, you'll need to play with this to get the best figure possible.
```{r, message = FALSE}
# Plot with images at tips
ggtree(tree) +
theme_tree() +
# Change limits so labels fit
xlim(0, 22) +
ylim(0, 5) +
# Add tip label pictures
geom_tiplab(aes(image = c("images/deer.png",
"images/robin.png",
"images/spider.png",
"images/jellyfish.png",
rep(NA, 3))),
geom = "image", align = TRUE, offset = 0.5,
linetype = NA, size = c(.12, .15, .09, .15)) +
# Highlight clades
geom_hilight(node = 6, fill = "cornflowerblue", alpha = 0.4) +
geom_hilight(node = 7, fill = "springgreen", alpha = 0.4) +
geom_cladelabel(node = 7 , label = "A", offset = 4,
fontsize = 5) +
geom_cladelabel(node = 6 , label = "B", offset = 6,
fontsize = 5)
```
You can also highlight non-monophyletic groups using `groupOTU`. Here's a quick and silly example...
```{r, message = FALSE}
# Define group
feathers <- list(no = c("deer","jellyfish", "spider"),
yes = c("robin"))
# Plot with images at tips
p <-
ggtree(tree) +
theme_tree() +
# Change limits so labels fit
xlim(0, 25) +
ylim(0, 5) +
# Add tip labels
geom_tiplab(geom = "text", align = TRUE, offset = 0.5, linetype = NA)
# Add colours to branches and labels
groupOTU(p, feathers, 'feathers') +
aes(color = feathers) +
scale_colour_manual(values = c("orange1","darkblue")) +
theme(legend.position = "top")
```
Note the warning here that tells us not to use `filter_`. This is from the internal `ggtree` code and can be ignored for now. Either this will be updated in the future, or `ggtree` will change how this is done. So if this code breaks completely try the `ggtree` help files for a solution.
```{block, type = "info"}
There are many more things you can do with `ggtree`, all based on this idea of adding layers. See the manual for `ggtree` at http://yulab-smu.top/treedata-book/index.html for more details.
```
## Phylogeny manipulation
To run analyses in R we often need to manipulate our phylogeny in some way so that the analyses can work. I'll mention these requirements where needed, but here are some common things we might need to change. I'll use the `bird.orders` phylogeny again as it's a manageable size.
```{r}
# Load the tree
data(bird.orders)
```
### Removing polytomies
Most R functions require your tree to be dichotomous, i.e. to have no polytomies. To check whether your tree is dichotomous use `is.binary.tree`.
```{r}
# Check whether the tree is binary
# We want this to be TRUE
is.binary.tree(bird.orders)
```
If this is FALSE, use `multi2di` to make the tree dichotomous. This function works by randomly resolving polytomies with zero-length branches. This doesn't change the tree overall, it's just a clever trick to get the functions to work.
```{r, eval = FALSE}
# Make the tree into a binary tree
bird.orders <- multi2di(bird.orders)
```
### Rooting your phylogeny
Most R functions also require the tree to be rooted, i.e., to have one (or more) taxon designated as the outgroup. We can check whether the tree is rooted as follows.
```{r}
# Check whether the tree is rooted
# We want this to be TRUE
is.rooted(bird.orders)
```
Our tree is rooted but if you wanted to change the root, or root an unrooted tree use `root`. We can make a new tree and root it on something silly (e.g. Passeriformes) to demonstrate. Remember that your root should be the outgroup from the phylogenetic inference. Rooting a tree incorrectly can cause big issues with downstream analyses, so make sure you choose carefully.
```{r}
new.tree <- root(bird.orders, outgroup = "Passeriformes")
plot(new.tree)
```
### Manipulating the species in your phlyogeny
When we do comparative analyses, we have data and a tree to deal with. It's very common to have species in your data that are not in your tree and vice versa. In the next exercise I'm going to show you how to match up the species in your comparative data with the species in your phylogeny. But you may also want to make these changes for plotting purposes, so it is useful to know how to do them anyway.
#### Renaming species
We can change the names of species using `gsub` which stands for generalised substitution. This can help with typos etc. or if you have minor taxonomic changes to make. Here, for fun let's rename Passeriformes...
```{r}
bird.orders$tip.label <- gsub("Passeriformes", "birbs", bird.orders$tip.label)
plot(bird.orders)
```
#### Removing species
Removing species uses the function `drop.tip`. If we want to remove birbs from the tree above..
```{r}
bird.orders <- drop.tip(bird.orders, tip = "birbs")
plot(bird.orders)
```
You can also remove multiple tips at once...
```{r}
bird.orders <- drop.tip(bird.orders, tip = c("Struthioniformes", "Tinamiformes"))
plot(bird.orders)
```
However, if you're looking to do something more complex like removing entire groups it's probably easier to use the functions in the next exercise.
#### Adding species
Sometimes you'll find your phylogeny is missing some species that you have in your dataset. In general the easiest thing is just to omit those species from your analysis, but it may be that they're data points you really want to keep. In which case you can add species to the tree.
Before you can add the species you need to find out *where* in the tree the species belongs. This will require looking in the literature to find the closest relative of the species that already exists in the tree. You also need to know how long the branch should be that will attach the new species to the tree. In some cases this will also be possible to locate in the literature. In other cases, for example if you're adding a species as a sister species to another species within its genus, you can just choose an arbitrary short branch length, e.g. 0.1 or 1. The exact value depends on the branch lengths in the tree. We can then use R to add the species to the tree.
To add a species in R, we first select the node that we want to add the tip to. For example, if we want to add Big Bird to our `bird.orders` tree as a sister to the Galliformes, we can identify the node to add Big Bird to using:
```{r}
# Identify the node
node <- which(bird.orders$tip.label == "Galliformes")
# Which node is it?
node
```
We then add the species to this node using the `bind.tip` function in `phytools`:
```{r}
library(phytools)
bird.orders_plusbigbird <- bind.tip(bird.orders, tip.label = "Big_Bird",
where = node, position = 4.5)
plot(bird.orders_plusbigbird)
```
You can see that Big Bird has been added to the Galliformes branch. In the `bind.tip` function, `position` is the length of the new branch. Here we used 4.5, but you could use other numbers if you needed the branch to be longer or shorter (i.e. if you knew the divergence between Galliformes and Big Bird was longer ago or more recent).
```{block, type = "detail"}
Note that the value of `position` is limited by the length of the branch you are adding the new species to. If you see the error below it means you've given `position` a value which is longer that the existing branch, and you'll need to make it a bit smaller.
> `Error in bind.tree(tree, tip, where = where, position = pp) :
'position' is larger than the branch length`
```
### Writing phylogenies to file
After manipulating a phylogeny, if you want to save the new tree you can use `write.tree` or `write.nexus` as preferred. You can then use this modified tree for you analyses or plots, meaning that you don't have to do all the manipulation again every time you want to use it.
If we want to save the Big Bird tree we can use:
```{r, eval = FALSE}
write.nexus(bird.orders_plusbigbird, file = "bigbirdtree.nex")
```
## Summary
This exercise should have given you the skills to understand, plot and manipulate phylogenies in R.
## Practical exercise
Read in the `frog-tree.nex` phylogeny from the folder. This comes from @feng2017phylogenomics. Then do the following:
1. Use R functions to determine:
a) how many species are in the tree?
b) is the tree fully resolved?
c) is the tree rooted?
2. Use `plot.phylo` to plot the tree.
a) Can you change the size of the tip labels?
b) Can you make a fan-shaped plot?
c) Can you change the colour of the tips and branches?
3. Save the tree to file as "mysuperdoopertree.nex"
EXTRA FUN. Use `ggtree` and the `ggtree` manual to produce a phylogeny plot of this tree that you could use in a publication OR have some fun and make the most garish looking tree you possibly can. Playing around with these packages is the best way to learn how to use them. If you make a particularly amazing or horrific tree, send a screenshot to me.