Skip to content

Commit

Permalink
Islands() (#121)
Browse files Browse the repository at this point in the history
  • Loading branch information
ms609 authored Jun 28, 2024
1 parent 4ecb9f9 commit 217a8c0
Show file tree
Hide file tree
Showing 15 changed files with 271 additions and 3 deletions.
4 changes: 3 additions & 1 deletion DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
Package: TreeDist
Type: Package
Title: Calculate and Map Distances Between Phylogenetic Trees
Version: 2.7.1.9002
Version: 2.7.1.9003
Authors@R: c(person("Martin R.", "Smith",
email = "[email protected]",
role = c("aut", "cre", "cph", "prg"),
Expand Down Expand Up @@ -29,6 +29,8 @@ Description: Implements measures of tree similarity, including
(1996) <doi:10.1007/3-540-61332-3_168>.
Includes tools for visualizing mappings of tree space (Smith 2022)
<doi:10.1093/sysbio/syab100>,
for identifying islands of trees (Silva and Wilkinson 2021)
<doi:10.1093/sysbio/syab015>,
for calculating the median of sets of trees,
and for computing the information content of trees and splits.
Copyright: Jonker-Volgenant Linear Assignment Problem implementation
Expand Down
1 change: 1 addition & 0 deletions NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -54,6 +54,7 @@ export(GeneralizedRF)
export(GetParallel)
export(InfoRobinsonFoulds)
export(InfoRobinsonFouldsSplits)
export(Islands)
export(JaccardRobinsonFoulds)
export(JaccardSplitSimilarity)
export(KCDiameter)
Expand Down
4 changes: 3 additions & 1 deletion NEWS.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,6 @@
# branch no-phangorn (9002)
# TreeDist 2.7.1.9003 (2024-06-28)

- `Islands()` allows the identification of islands of trees.

- Internal implementation of path and SPR distances, removing dependency
on phangorn (and thus R 4.4).
Expand Down
79 changes: 79 additions & 0 deletions R/Islands.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,79 @@
#' Find islands from distance matrix
#'
#' `Islands()` assigns a set of objects to islands, such that all elements
#' within an island can form a connected graph in which each edge is no longer
#' than `threshold` distance units \insertRef{Silva2021}{TreeDist}.
#'
#' @inheritParams SpectralEigens
#' @param threshold Elements greater than `threshold` distance units will not be
#' assigned to the same island.
#' @param dense Logical; if `FALSE`, each island will be named according to the
#' index of its lowest-indexed member; if `TRUE`, each island will be numbered
#' sequentially from `1`, ordered by the index of the lowest-indexed member.
#' @param smallest Integer; Islands comprising no more than `smallest` elements
#' will be assigned to island `NA`.
#' @return `Islands()` returns a vector listing the island to which
#' each element is assigned.
#' @references \insertAllCited{}
#' @examples
#' library("TreeTools", quietly = TRUE)
#' # Generate a set of trees
#' trees <- as.phylo(as.TreeNumber(BalancedTree(16)) + c(-(40:20), 70:105), 16)
#'
#' # Calculate distances between trees
#' distances <- ClusteringInfoDist(trees)
#' summary(distances)
#'
#' # Assign trees to islands
#' isle <- Islands(distances, quantile(distances, 0.1))
#' table(isle)
#'
#' # Indicate island membership on 2D mapping of tree distances
#' mapping <- cmdscale(distances, 2)
#' plot(mapping, col = isle + 1,
#' asp = 1, # Preserve aspect ratio - do not distort distances
#' ann = FALSE, axes = FALSE, # Don't label axes: dimensions are meaningless)
#' pch = 16 # Plotting character: Filled circle
#' )
#'
#' # Compare strict consensus with island consensus trees
#' oPar <- par(mfrow = c(2, 2), mai = rep(0.1, 4))
#' plot(Consensus(trees), main = "Strict")
#' plot(Consensus(trees[isle == 1]), edge.col = 2, main = "Island 1")
#' plot(Consensus(trees[isle == 2]), edge.col = 3, main = "Island 2")
#' plot(Consensus(trees[isle == 3]), edge.col = 4, main = "Island 3")
#'
#' # Restore graphical parameters
#' par(oPar)
#' @template MRS
#' @family tree space functions
#' @export
Islands <- function(D, threshold, dense = TRUE, smallest = 0) {
linked <- as.matrix(D) <= threshold
n <- dim(linked)[[1]]
ret <- integer(n)
i <- 1
repeat {
links <- seq_len(n) == i
repeat {
nowLinked <- colSums(linked[links, , drop = FALSE]) > 0
if (any(nowLinked[!links])) {
# Added to island
links <- nowLinked
} else {
break
}
}
ret[links] <- i
i <- which.min(ret)
if (ret[[i]]) break
}
tab <- table(ret, dnn = NULL)
ret[ret %in% as.integer(names(tab)[tab < smallest])] <- NA

if (dense) {
as.integer(factor(rank(ret, ties.method = "min", na.last = "keep")))
} else {
ret
}
}
11 changes: 11 additions & 0 deletions inst/REFERENCES.bib
Original file line number Diff line number Diff line change
Expand Up @@ -491,6 +491,17 @@ @article{Shannon1948
journal = {Bell System Technical Journal}
}

@article{Silva2021,
title = {On Defining and Finding Islands of Trees and Mitigating Large Island Bias},
author = {Silva, Ana Serra and Wilkinson, Mark},
year = {2021},
journal = {Systematic Biology},
volume = {70},
number = {6},
pages = {1282--1294},
doi = {10.1093/sysbio/syab015}
}

@article{Smith2014,
author = {Smith, Martin R. and Ortega-Hern{\'{a}}ndez, Javier},
doi = {10.1038/nature13576},
Expand Down
79 changes: 79 additions & 0 deletions man/Islands.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions man/MSTSegments.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions man/MapTrees.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions man/MappingQuality.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions man/SpectralEigens.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions man/cluster-statistics.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions man/median.multiPhylo.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

19 changes: 19 additions & 0 deletions tests/testthat/test-Islands.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
test_that("Islands() works", {
library("TreeTools", quietly = TRUE)
trees <- as.phylo(as.TreeNumber(BalancedTree(16)) + c(70:78, -(39:30), 80:99), 16)
distances <- ClusteringInfoDist(trees)
expect_equal(Islands(distances, 2.5), rep(c(1, 2, 1), c(9, 10, 20)))
expect_equal(Islands(distances, 2.5, FALSE), rep(c(1, 10, 1), c(9, 10, 20)))


trees <- as.phylo(c(0:10, 1000:1005, 2000:2019), 16)
distances <- ClusteringInfoDist(trees)
expect_equal(Islands(distances, 2.5, TRUE, smallest = 6),
rep(c(1, 2, 3), c(11, 6, 20)))
expect_equal(Islands(distances, 2.5, TRUE, smallest = 7),
rep(c(1, NA, 2), c(11, 6, 20)))
expect_equal(Islands(distances, 2.5, FALSE, smallest = 6),
rep(c(1, 12, 18), c(11, 6, 20)))
expect_equal(Islands(distances, 2.5, FALSE, smallest = 7),
rep(c(1, NA, 18), c(11, 6, 20)))
})
30 changes: 30 additions & 0 deletions vignettes/tree-islands.Rmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
---
title: "Finding islands of phylogenetic trees"
author: "[Martin R. Smith](https://smithlabdurham.github.io/)"
output:
rmarkdown::html_vignette:
fig_width: 6
fig_height: 4
bibliography: ../inst/REFERENCES.bib
csl: https://raw.githubusercontent.com/citation-style-language/styles/master/apa-old-doi-prefix.csl
vignette: >
%\VignetteIndexEntry{Finding islands of phylogenetic trees}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---

Collections of phylogenetic trees, such as those output by Bayesian or parsimony
analysis, may occupy discrete regions of tree space such that individual
clusters of trees are separated from all other trees by a certain distance.
Finding such islands, and taking their consensus, can be an effective way
of summarising conflicting signal in a phylogenetic dataset [@Silva2021].

```{r col-trees-by-score}
# Load required libraries
library("TreeTools", quietly = TRUE)
library("TreeDist")
# Generate a set of trees
trees <- as.phylo(as.TreeNumber(BalancedTree(16)) + 0:100 - 15, 16)
```
41 changes: 40 additions & 1 deletion vignettes/treespace.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -207,7 +207,7 @@ points(seq_along(trees), rep(1, length(trees)), pch = 16,

Another thing we may wish to do is to take the consensus of each cluster:

```{r consensus, fig.align="center"}
```{r cluster-consensus, fig.align="center"}
par(mfrow = c(1, 2), mar = rep(0.2, 4))
col1 <- spectrum[mean(treeNumbers[cluster == 1])]
col2 <- spectrum[mean(treeNumbers[cluster == 2])]
Expand All @@ -219,6 +219,45 @@ plot(consensus(trees[cluster == 2], p = 0.5),

Here, we learn that the two clusters are distinguished by the position of `t7`.


### Identifying islands

Besides clustering, we can also define 'islands' in tree space that are
separated by a 'moat', such that all trees on one island are separated from
all trees on another by at least a certain distance [@Silva2021].

```{r island-id, fig.asp = 1, fig.width = 3, fig.align="center"}
par(mar = rep(0, 4))
# set a threshold corresponding to the width of the "moat" between islands
threshold <- 1.8
island <- Islands(distances, threshold)
# See how many trees are on each island
table(island)
# Let's ignore the small islands for now
largeIsle <- Islands(distances, threshold, smallest = 5)
# Colour trees according to their island
plot(mapping,
asp = 1, # Preserve aspect ratio - do not distort distances
ann = FALSE, axes = FALSE, # Don't label axes: dimensions are meaningless
col = ifelse(is.na(largeIsle), "grey", largeIsle + 1),
pch = 16
)
```

Let's view the consensus of each large island cluster:

```{r island-consensus, fig.align="center"}
par(mfrow = c(1, 2), mar = rep(0.2, 4))
plot(consensus(trees[!is.na(largeIsle) & largeIsle == 1], p = 0.5),
edge.color = 2, edge.width = 2, tip.color = 2)
plot(consensus(trees[!is.na(largeIsle) & largeIsle == 2], p = 0.5),
edge.color = 3, edge.width = 2, tip.color = 3)
```


### Validating a mapping

Now let's evaluate whether our map of tree space is representative.
Expand Down

0 comments on commit 217a8c0

Please sign in to comment.