Fix typos
camillescott authored Apr 4, 2018
1 parent 3e7c02f commit 5f26904
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions docs/paper.md
@@ -32,13 +32,13 @@ Python has several packages for working with phylogenetic trees, each focused on
* The [`Bio.Phylo`](http://dx.doi.org/10.1186/1471-2105-13-209) subpackage in [biopython](http://biopython.org/) collects useful tools for working with common (and not so common) file formats in phylogenetics, along with utilities for analysis and visualization [@biophylo]
* The [`skbio.tree`](http://scikit-bio.org/docs/latest/tree.html) module in [`scikit-bio`](http://scikit-bio.org/) is a base class for phylogenetic trees providing analytical and file processing functions for working with phylogenetic trees [@skbio]

-Each of these packages allow trees to be manipulated, edited and reshaped. To make this possible, they must strike a balance between raw performance and flexibility, and most prioritize flexibility and a rich set of features. This is desireable for most use cases, but computational scaling challanges arise when using these packages to work with very large trees. Trees representing microbial communities may contain tens of thousands to tens of millions of taxa, depending on the community diversity and the survey methodology.
+Each of these packages allow trees to be manipulated, edited and reshaped. To make this possible, they must strike a balance between raw performance and flexibility, and most prioritize flexibility and a rich set of features. This is desireable for most use cases, but computational scaling challenges arise when using these packages to work with very large trees. Trees representing microbial communities may contain tens of thousands to tens of millions of taxa, depending on the community diversity and the survey methodology.

`SuchTree` is designed purely as a backend for analysis of large trees. Significant advantages in memory layout, parallelism and speed are achieved by sacrificing the ability to manipulate, edit or reshape trees (these capabilities exist in other packages). It scales to millions of taxa, and the key algorithms and data structures permit concurrent threads without locks.

![](nj_vs_ml.png)

-**Figure 1 :** Two phylogenetic trees of 54,327 taxa were constructed using different methods (approximate maximum likelihood using [`FastTree`](http://www.microbesonline.org/fasttree/) and the [`neighbor joining`](https://en.wikipedia.org/wiki/Neighbor_joining)) agglomerative clustering method). To explore the different topologies of the trees, pairs of taxa were chosen at random and the patristic distance between each pair was computed through each of the two trees. This plot shows 1,000,000 random pairs sampled from 1,475,684,301 possible pairs (0.07%). The two million distances calculations required about 12.5 seconds using a single thread.
+**Figure 1 :** Two phylogenetic trees of 54,327 taxa were constructed using different methods (approximate maximum likelihood using [`FastTree`](http://www.microbesonline.org/fasttree/) and the [`neighbor joining`](https://en.wikipedia.org/wiki/Neighbor_joining) agglomerative clustering method). To explore the different topologies of the trees, pairs of taxa were chosen at random and the patristic distance between each pair was computed through each of the two trees. This plot shows 1,000,000 random pairs sampled from 1,475,684,301 possible pairs (0.07%). The two million distances calculations required about 12.5 seconds using a single thread.

`SuchTree` supports co-phylogenies, with functions for efficiently extracting graphs and subgraphs for network analysis, and has native support for [`igraph`](http://igraph.org/) and [`networkx`](https://networkx.github.io/).
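
The pair count and sampling fraction quoted in the Figure 1 caption can be checked with a few lines of Python, and the patristic distance it refers to (the sum of branch lengths along the path between two leaves) can be sketched on a toy tree. This is only an illustrative sketch in pure Python: the five-node tree, the `parent`/`length` arrays, and the helper names are hypothetical and are not taken from `SuchTree` or its API.

```python
import math
import random

# Number of unordered pairs among 54,327 taxa, as quoted in the caption.
n_pairs = math.comb(54327, 2)
sampled_percent = 100 * 1_000_000 / n_pairs

# Hypothetical toy tree stored as flat arrays (an assumption for this sketch):
# parent[i] is the parent of node i (-1 marks the root) and length[i] is the
# length of the branch connecting node i to its parent.
#
#         0
#        / \
#       1   2
#      / \
#     3   4
parent = [-1, 0, 0, 1, 1]
length = [0.0, 1.0, 2.0, 0.5, 1.5]
leaves = [2, 3, 4]


def patristic_distance(a, b):
    """Sum of branch lengths on the path between nodes a and b."""
    # Record the distance from a to each of its ancestors (including a).
    dist_from_a = {}
    node, dist = a, 0.0
    while node != -1:
        dist_from_a[node] = dist
        dist += length[node]
        node = parent[node]
    # Climb from b until reaching a node that is also an ancestor of a.
    node, dist = b, 0.0
    while node not in dist_from_a:
        dist += length[node]
        node = parent[node]
    return dist + dist_from_a[node]


def sample_pairs(items, k, seed=0):
    """Draw k random pairs of distinct items (pairs may repeat)."""
    rng = random.Random(seed)
    return [tuple(rng.sample(items, 2)) for _ in range(k)]


if __name__ == "__main__":
    print(n_pairs, round(sampled_percent, 2))
    for a, b in sample_pairs(leaves, 5):
        print(a, b, patristic_distance(a, b))
```

Running the same sampled pairs through two trees built from the same taxa, as the caption describes, would give one distance per tree for each pair, which is where the two million distance calculations come from.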
