A Python library for doing fast, thread-safe computations with phylogenetic trees.
New for SuchTree v1.2
- Quartet topology tests provided by SuchTree.get_quartet_topology( a, b, c, d )
- Optimized, thread-safe bulk quartet topology tests provided by SuchTree.quartet_topologies( [N,4] )
- SuchTree now automatically detects NEWICK strings and uses them for initialization
New for SuchTree v1.1
- Basic support for support values provided by SuchTree.get_support( node_id )
- Relative evolutionary divergence (RED)
- Bipartitions
- Node generators for in-order and preorder traversal
- Summary of leaf relationships via SuchTree.relationships()
So, you have a phylogenetic tree, and you want to do some statistics with it.
There are lots of packages in Python that let you manipulate phylogenies, like dendropy, the tree model included in scikit-bio, ete3 and the awesome, shiny new toytree. If your tree isn't too big and your statistical tests don't require too many traversals, there are a lot of great options. If you're working with about a thousand taxa or fewer, you should be able to use any of those packages for your tree.
However, if you are working with trees that include tens of thousands, or maybe even millions of taxa, you are going to run into problems. ete3, dendropy, toytree, and scikit-bio's TreeNode are all designed to give you lots of flexibility. You can re-root trees, use different traversal schemes, attach metadata to nodes, attach and detach nodes, splice sub-trees into or out of the main tree, plot trees for publication figures and do lots of other useful things. That power and flexibility come with a price -- speed.
For trees of moderate size, it is possible to solve the speed issue by working with matrix representations of the tree. Unfortunately, these representations scale quadratically with the number of taxa in the tree. A distance matrix for a tree of 100,000 taxa will consume about 20GB of RAM. If your method performs sampling, then almost every operation will be a cache miss. Unless you are very clever about access patterns and matrix layout, the performance will be limited by RAM latency, leaving the CPU mostly idle.
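To make the quadratic scaling concrete, here is a small back-of-the-envelope calculation. The storage layout is an assumption (the text above doesn't say how the matrix is stored): 4-byte floats, either as a full square matrix or as a condensed matrix that keeps only one triangle. The 20GB figure matches the condensed case.

```python
# Rough memory estimate for a pairwise distance matrix.
# Assumption: 32-bit (4-byte) floats; the layout is hypothetical.
n_taxa = 100_000
bytes_per_value = 4

# Full square matrix: n * n entries
full_bytes = n_taxa * n_taxa * bytes_per_value

# Condensed matrix: one entry per unordered pair, n * (n - 1) / 2
condensed_bytes = n_taxa * (n_taxa - 1) // 2 * bytes_per_value

print(f"full      : {full_bytes / 1e9:.1f} GB")
print(f"condensed : {condensed_bytes / 1e9:.1f} GB")
```

Doubling the number of taxa roughly quadruples both numbers, which is the whole problem.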
Suppose you have more than one group of organisms, and you want to study the way their interactions have influenced their evolution. Now, you have several trees that link together to form a generalized graph.
SuchLinkedTrees has you covered. At the moment, SuchLinkedTrees supports trees of two interacting groups. Like SuchTree, SuchLinkedTrees is not intended to be a general-purpose graph theory package. Instead, it leverages SuchTree to efficiently handle the problem-specific tasks of working with co-phylogeny systems. It will load your datasets. It will build the graphs. It will let you subset the graphs using their phylogenetic or ecological properties. It will generate weighted adjacency and Laplacian matrices of the whole graph or of subgraphs you have selected. It will generate spectral decompositions of subgraphs if spectral graph theory is your thing.
And, if that doesn't solve your problem, it will emit subgraphs as Graph objects for use with the igraph network analysis package, or node and edge data for building graphs in networkx. Now you can do even more things.
Maybe you want to get all crazy with some graph kernels? Well, now you can.
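If weighted adjacency and Laplacian matrices are unfamiliar, the sketch below builds both for a tiny toy graph in plain Python. It does not use SuchLinkedTrees, and the node names and edge weights are made up for illustration. It only shows the relationship between the two matrices (Laplacian = degree matrix minus adjacency matrix).

```python
# Toy weighted graph: two hosts and two guests (hypothetical names and weights).
nodes = ['h0', 'h1', 'g0', 'g1']
edges = {('h0', 'g0'): 1.0, ('h1', 'g0'): 0.5, ('h1', 'g1'): 2.0}

index = {name: i for i, name in enumerate(nodes)}
n = len(nodes)

# Symmetric weighted adjacency matrix
adjacency = [[0.0] * n for _ in range(n)]
for (u, v), weight in edges.items():
    i, j = index[u], index[v]
    adjacency[i][j] = weight
    adjacency[j][i] = weight

# Weighted degree of each node is the sum of its row
degree = [sum(row) for row in adjacency]

# Laplacian: degree on the diagonal, negative edge weights elsewhere
laplacian = [
    [(degree[i] if i == j else 0.0) - adjacency[i][j] for j in range(n)]
    for i in range(n)
]

for row in laplacian:
    print(row)
```

Every row of a graph Laplacian sums to zero, which is a handy sanity check. For the spectral part, a symmetric matrix like this one can be handed to numpy.linalg.eigvalsh.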
SuchTree is motivated by the observation that the memory usage of distance matrices grows quadratically with the number of taxa, while for trees it grows linearly. A matrix of 100,000 taxa is quite bulky, but the tree it represents can be made to fit into about 7.6MB of RAM if implemented using only C primitives. This is small enough to fit into L2 cache on many modern microprocessors. This comes at the cost of traversing the tree for every calculation (about 16 hops from leaf to root for a roughly balanced 100,000 taxa tree), but, as these operations all happen on-chip, the processor can take full advantage of pipelining, speculative execution and other optimizations available in modern CPUs. And, because SuchTree objects are immutable, they are thread-safe. You can take full advantage of modern multicore chips.
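The trade described above (store only the tree, walk it for every query) can be sketched in a few lines of plain Python. This is an illustration of the idea, not SuchTree's actual implementation: the flat `parent` and `distance` arrays, the node ids and the toy tree are all assumptions made for the example.

```python
# Toy tree ((A:1,B:2):1,C:4); stored as two flat arrays indexed by node id.
# Node ids (hypothetical): 0 = A, 1 = ancestor of A and B, 2 = B, 3 = root, 4 = C
parent   = [1, 3, 1, -1, 3]              # -1 marks the root
distance = [1.0, 1.0, 2.0, 0.0, 4.0]     # branch length to the parent

def path_to_root(node):
    """List of node ids from node up to and including the root."""
    path = []
    while node != -1:
        path.append(node)
        node = parent[node]
    return path

def mrca(a, b):
    """First node on b's path to the root that is also an ancestor of a."""
    ancestors = set(path_to_root(a))
    for node in path_to_root(b):
        if node in ancestors:
            return node

def distance_to_root(node):
    """Sum of branch lengths from node up to the root."""
    total = 0.0
    while parent[node] != -1:
        total += distance[node]
        node = parent[node]
    return total

def patristic_distance(a, b):
    """Path length between a and b through their most recent common ancestor."""
    m = mrca(a, b)
    return distance_to_root(a) + distance_to_root(b) - 2 * distance_to_root(m)

print(patristic_distance(0, 2))  # A to B
print(patristic_distance(0, 4))  # A to C
```

Each query touches only the handful of nodes between the two leaves and the root, so the working set stays small no matter how many pairs you sample.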
Here, we use SuchTree to compare the topology of two trees built from the same 54,327 sequences using two methods : neighbor joining and Morgan Price's FastTree approximate maximum likelihood algorithm. Using one million randomly chosen pairs of leaf nodes, we look at the patristic distances in each of the two trees, plot them against one another, and compute correlation coefficients.
On an Intel i7-3770S, SuchTree completes the two million distance calculations in a little more than ten seconds.
from SuchTree import SuchTree
import random
T1 = SuchTree( 'data/bigtrees/ml.tree' )
T2 = SuchTree( 'data/bigtrees/nj.tree' )
print( 'nodes : %d, leafs : %d' % ( T1.length, len(T1.leafs) ) )
print( 'nodes : %d, leafs : %d' % ( T2.length, len(T2.leafs) ) )
nodes : 108653, leafs : 54327
nodes : 108653, leafs : 54327
N = 1000000
v = list( T1.leafs.keys() )
pairs = []
for i in range(N) :
    pairs.append( ( random.choice( v ), random.choice( v ) ) )
%time D1 = T1.distances_by_name( pairs ); D2 = T2.distances_by_name( pairs )
CPU times: user 10.1 s, sys: 0 ns, total: 10.1 s
Wall time: 10.1 s
from scipy.stats import kendalltau, pearsonr
print( 'Kendall\'s tau : %0.3f' % kendalltau( D1, D2 )[0] )
print( 'Pearson\'s r : %0.3f' % pearsonr( D1, D2 )[0] )
Kendall's tau : 0.709
Pearson's r : 0.969
SuchTree depends on the following packages :
- scipy
- numpy
- dendropy
- cython
- pandas
To install the current release, you can install from PyPI :
pip install SuchTree
If you install using pip, binary packages (wheels) are available for CPython 3.6, 3.7, 3.8, 3.9, 3.10 and 3.11 on Linux x86_64 and on MacOS with Intel and Apple silicon. If your platform isn't in that list, but it is supported by cibuildwheel, please file an issue to request your platform! I would be absolutely delighted if someone was actually running SuchTree on an exotic embedded system or a mainframe.
To install the most recent development version :
git clone https://github.com/ryneches/SuchTree.git
cd SuchTree
./setup.py install
To install via conda, first make sure you've got the bioconda channel set up, if you haven't already :
conda config --add channels bioconda
conda config --add channels conda-forge
conda config --set channel_priority strict
Then, install in the usual way :
conda install suchtree
Note that the conda package name is lower case!
SuchTree will accept either a URL or a file path (and, as of v1.2, a NEWICK string) :
from SuchTree import SuchTree
T = SuchTree( 'test.tree' )
T = SuchTree( 'https://github.com/ryneches/SuchTree/blob/master/data/gopher-louse/gopher.tree' )
The available properties are :
- length : the number of nodes in the tree
- depth : the maximum depth of the tree
- root : the id of the root node
- leafs : a dictionary mapping leaf names to their ids
- leafnodes : a dictionary mapping leaf node ids to leaf names
- RED : a dictionary of RED (relative evolutionary divergence) scores for internal nodes, calculated on first access
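For readers who haven't met RED before, here is a minimal sketch of how the score is commonly defined: the root gets 0, leaves get 1, and an internal node n with parent p gets RED(p) + (d/u) * (1 - RED(p)), where d is the branch length from n to p and u is the mean distance from p to the leaves below n. This is a plain-Python illustration on a made-up tree, with hypothetical names throughout. SuchTree's own calculation may differ in detail (for one thing, its RED dictionary covers internal nodes only).

```python
# Toy tree ((A:1,B:2)X:1,C:3)root; as plain dictionaries (hypothetical layout).
children = {'root': ['X', 'C'], 'X': ['A', 'B']}
branch = {'X': 1.0, 'C': 3.0, 'A': 1.0, 'B': 2.0}   # length to parent

def leaf_distances(node):
    """Distances from node down to every leaf below it."""
    if node not in children:          # node is a leaf
        return [0.0]
    return [branch[c] + d for c in children[node] for d in leaf_distances(c)]

def red_scores(root):
    """Pre-order pass assigning a RED score to every node."""
    red = {root: 0.0}
    stack = [root]
    while stack:
        p = stack.pop()
        for n in children.get(p, []):
            if n not in children:
                red[n] = 1.0          # leaves are fixed at 1
            else:
                below = leaf_distances(n)
                # mean distance from the parent p to the leaves under n
                u = branch[n] + sum(below) / len(below)
                red[n] = red[p] + (branch[n] / u) * (1.0 - red[p])
                stack.append(n)
    return red

red = red_scores('root')
print(red['X'])
```

In this toy tree, X sits one unit below the root and its leaves are on average 2.5 units from the root, so X lands 40% of the way from root to tips.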
The available methods of SuchTree are :
- get_parent : for a given node id or leaf name, return the parent id
- get_support : return the support value, if available
- get_children : for a given node id or leaf name, return the ids of the child nodes (leaf nodes have no children, so their child node ids will always be -1)
- get_leafs : return an array of ids of all leaf nodes that descend from a node
- get_descendant_nodes : generator for ids of all nodes that descend from a node, including leafs
- get_bipartition : return the two sets of leaf nodes partitioned by a node
- bipartitions : generator of all bipartitions
- get_internal_nodes : return array of internal nodes
- get_nodes : return an array of all nodes
- in_order : generator for an in-order traversal of the tree
- pre_order : generator for a pre-order traversal of the tree
- get_distance_to_root : for a given node id or leaf name, return the integrated phylogenetic distance to the root node
- mrca : for a given pair of node ids or leaf names, return the id of the nearest node that is parent to both
- is_leaf : returns True if the node is a leaf
- is_internal_node : returns True if the node is an internal node
- is_ancestor : returns 1 if a is an ancestor of b, -1 if b is an ancestor of a, or 0 otherwise
- distance : for a given pair of node ids or leaf names, return the patristic distance between the pair
- distances : for an (n,2) array of pairs of node ids, return an (n) array of patristic distances between the pairs
- distances_by_name : for an (n,2) list of pairs of leaf names, return an (n) list of patristic distances between each pair
- get_quartet_topology : for a given quartet, return the topology of that quartet
- quartet_topologies : compute the topologies of an array of quartets by id
- quartet_topologies_by_name : compute the topologies of quartets by their taxa names
- dump_array : print out the entire tree (for debugging only! May produce pathologically gigantic output.)
- adjacency : build the graph adjacency matrix of the tree
- laplacian : build the Laplacian matrix of the tree
- nodes_data : generator for node data, compatible with networkx
- edges_data : generator for edge data, compatible with networkx
- relationships : builds a Pandas DataFrame describing relationships among taxa
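If "quartet topology" is new to you, the sketch below shows what the question is. For four leaves there are three ways to split them into two pairs, and the tree supports the split whose within-pair distances add up to the smallest total (the four-point condition). This is a self-contained illustration working from a made-up distance table. It is not how SuchTree computes quartets, and the return format here is an assumption of the example, not the format returned by get_quartet_topology.

```python
# Patristic distances for the toy tree ((A:1,B:1):1,(C:1,D:1):1);
# (hypothetical data, keyed by unordered pair).
table = {
    frozenset('AB'): 2.0, frozenset('CD'): 2.0,
    frozenset('AC'): 4.0, frozenset('AD'): 4.0,
    frozenset('BC'): 4.0, frozenset('BD'): 4.0,
}

def dist(x, y):
    """Look up the distance between two leaves, in either order."""
    return table[frozenset((x, y))]

def quartet_topology(dist, a, b, c, d):
    """Return the pairing of a, b, c, d with the smallest within-pair distance sum."""
    candidates = [
        ((a, b), (c, d)),
        ((a, c), (b, d)),
        ((a, d), (b, c)),
    ]
    return min(candidates, key=lambda p: dist(*p[0]) + dist(*p[1]))

print(quartet_topology(dist, 'A', 'B', 'C', 'D'))
print(quartet_topology(dist, 'A', 'C', 'B', 'D'))
```

Both calls recover the same split, A with B and C with D, regardless of the order in which the four leaves are passed in.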
For analysis of ecological interactions, SuchTree is distributed with a curated collection of several different examples from the literature. A collection of simulated interactions with various properties, along with an annotated notebook of Python code for generating them, is also included. Interactions are registered in a JSON object (data/studies.json).
- gopher-louse Hafner, M.S. & Nadler, S.A. 1988. Phylogenetic trees support the coevolution of parasites and their hosts. Nature 332: 258-259.
- dove-louse Dale H. Clayton, Sarah E. Bush, Brad M. Goates, and Kevin P. Johnson. 2003. Host defense reinforces host–parasite cospeciation. PNAS.
- sedge-smut Escudero, Marcial. 2015. Phylogenetic congruence of parasitic smut fungi (Anthracoidea, Anthracoideaceae) and their host plants (Carex, Cyperaceae): Cospeciation or host-shift speciation? American journal of botany.
- fish-worm Maarten P. M. Vanhove, Antoine Pariselle, Maarten Van Steenberge, Joost A. M. Raeymaekers, Pascal I. Hablützel, Céline Gillardin, Bart Hellemans, Floris C. Breman, Stephan Koblmüller, Christian Sturmbauer, Jos Snoeks, Filip A. M. Volckaert & Tine Huyse. 2015. Hidden biodiversity in an ancient lake: phylogenetic congruence between Lake Tanganyika tropheine cichlids and their monogenean flatworm parasites, Scientific Reports.
The following datasets were originally collected by Enrico Rezende et al. :
Enrico L. Rezende, Jessica E. Lavabre, Paulo R. Guimarães, Pedro Jordano & Jordi Bascompte "Non-random coextinctions in phylogenetically structured mutualistic networks," Nature, 2007
- arr1 Arroyo, M.T.K., R. Primack & J.J. Armesto. 1982. Community studies in pollination ecology in the high temperate Andes of central Chile. I. Pollination mechanisms and altitudinal variation. Amer. J. Bot. 69:82-97.
- arr2 Arroyo, M.T.K., R. Primack & J.J. Armesto. 1982. Community studies in pollination ecology in the high temperate Andes of central Chile. I. Pollination mechanisms and altitudinal variation. Amer. J. Bot. 69:82-97.
- arr3 Arroyo, M.T.K., R. Primack & J.J. Armesto. 1982. Community studies in pollination ecology in the high temperate Andes of central Chile. I. Pollination mechanisms and altitudinal variation. Amer. J. Bot. 69:82-97.
- bahe Barrett, S. C. H., and K. Helenurm. 1987. The Reproductive-Biology of Boreal Forest Herbs.1. Breeding Systems and Pollination. Canadian Journal of Botany 65:2036-2046.
- cllo Clements, R. E., and F. L. Long. 1923, Experimental pollination. An outline of the ecology of flowers and insects. Washington, D.C., USA, Carnegie Institute of Washington.
- dihi Dicks, LV, Corbet, SA and Pywell, RF 2002. Compartmentalization in plant–insect flower visitor webs. J. Anim. Ecol. 71: 32–43
- dish Dicks, LV, Corbet, SA and Pywell, RF 2002. Compartmentalization in plant–insect flower visitor webs. J. Anim. Ecol. 71: 32–43
- dupo Dupont YL, Hansen DM and Olesen JM 2003 Structure of a plant-flower-visitor network in the high-altitude sub-alpine desert of Tenerife, Canary Islands. Ecography 26:301-310
- eol Elberling, H., and J. M. Olesen. 1999. The structure of a high latitude plant-flower visitor system: the dominance of flies. Ecography 22:314-323.
- eolz Elberling & Olesen unpubl.
- eski Eskildsen et al. unpubl.
- herr Herrera, J. 1988 Pollination relationships in southern Spanish Mediterranean shrublands. Journal of Ecology 76: 274-287.
- hock Hocking, B. 1968. Insect-flower associations in the high Arctic with special reference to nectar. Oikos 19:359-388.
- inpk Inouye, D. W., and G. H. Pyke. 1988. Pollination biology in the Snowy Mountains of Australia: comparisons with montane Colorado, USA. Australian Journal of Ecology 13:191-210.
- kevn Kevan P. G. 1970. High Arctic insect-flower relations: The interrelationships of arthropods and flowers at Lake Hazen, Ellesmere Island, Northwest Territories, Canada. Ph.D. thesis, University of Alberta, Edmonton, 399 pp.
- kt90 Kato, M., Kakutani, T., Inoue, T. and Itino, T. (1990). Insect-flower relationship in the primary beech forest of Ashu, Kyoto: An overview of the flowering phenology and the seasonal pattern of insect visits. Contrib. Biol. Lab., Kyoto, Univ., 27, 309-375.
- med1 Medan, D., N. H. Montaldo, M. Devoto, A. Mantese, V. Vasellati, and N. H. Bartoloni. 2002. Plant-pollinator relationships at two altitudes in the Andes of Mendoza, Argentina. Arctic Antarctic and Alpine Research 34:233-241.
- med2 Medan, D., N. H. Montaldo, M. Devoto, A. Mantese, V. Vasellati, and N. H. Bartoloni. 2002. Plant-pollinator relationships at two altitudes in the Andes of Mendoza, Argentina. Arctic Antarctic and Alpine Research 34:233-241.
- memm Memmott J. 1999. The structure of a plant-pollinator food web. Ecology Letters 2:276-280.
- moma Mosquin, T., and J. E. H. Martin. 1967. Observations on the pollination biology of plants on Melville Island, N.W.T., Canada. Canadian Field Naturalist 81:201-205.
- mott Motten, A. F. 1982. Pollination Ecology of the Spring Wildflower Community in the Deciduous Forests of Piedmont North Carolina. Doctoral Dissertation thesis, Duke University, Durham, North Carolina, USA; Motten, A. F. 1986. Pollination ecology of the spring wildflower community of a temperate deciduous forest. Ecological Monographs 56:21-42.
- mull McMullen 1993
- oflo Olesen unpubl.
- ofst Olesen unpubl.
- olau Olesen unpubl.
- olle Ollerton, J., S. D. Johnson, L. Cranmer, and S. Kellie. 2003. The pollination ecology of an assemblage of grassland asclepiads in South Africa. Annals of Botany 92:807-834.
- perc Percival, M. 1974. Floral ecology of coastal scrub in southeast Jamaica. Biotropica, 6, 104-129.
- prap Primack, R.B. 1983. Insect pollination in the New Zealand mountain flora. New Zealand J. Bot. 21, 317-333, AB.
- prca Primack, R.B. 1983. Insect pollination in the New Zealand mountain flora. New Zealand J. Bot. 21, 317-333. Cass
- prcg Primack, R.B. 1983. Insect pollination in the New Zealand mountain flora. New Zealand J. Bot. 21, 317-333. Craigieb.
- ptnd Petanidou, T. 1991. Pollination ecology in a phryganic ecosystem. Unp. PhD. Thesis, Aristotelian University, Thessaloniki.
- rabr Ramirez, N., and Y. Brito. 1992. Pollination Biology in a Palm Swamp Community in the Venezuelan Central Plains. Botanical Journal of the Linnean Society 110:277-302.
- rmrz Ramirez, N. 1989. Biología de polinización en una comunidad arbustiva tropical de la alta Guyana Venezolana. Biotropica 21, 319-330.
- schm Schemske, D. W., M. F. Willson, M. N. Melampy, L. J. Miller, L. Verner, K. M. Schemske, and L. B. Best. 1978. Flowering Ecology of Some Spring Woodland Herbs. Ecology 59:351-366.
- smal Small, E. 1976. Insect pollinators of the Mer Bleue peat bog of Ottawa. Canadian Field Naturalist 90:22-28.
- smra Smith-Ramírez C., P. Martinez, M. Nuñez, C. González and J. J. Armesto 2005 Diversity, flower visitation frequency and generalism of pollinators in temperate rain forests of Chiloé Island, Chile. Botanical Journal of the Linnean Society, 2005, 147, 399–416.
- bair Baird, J.W. 1980. The selection and use of fruit by birds in an eastern forest. Wilson Bulletin 92: 63-73.
- beeh Beehler, B. 1983. Frugivory and polygamy in birds of paradise. Auk, 100: 1-12.
- cacg Carlo et al. 2003. Avian fruit preferences across a Puerto Rican forested landscape: pattern consistency and implications for seed removal. Oecologia 134: 119-131
- caci Carlo et al. 2003. Avian fruit preferences across a Puerto Rican forested landscape: pattern consistency and implications for seed removal. Oecologia 134: 119-131
- caco Carlo et al. 2003. Avian fruit preferences across a Puerto Rican forested landscape: pattern consistency and implications for seed removal. Oecologia 134: 119-131
- cafr Carlo et al. 2003. Avian fruit preferences across a Puerto Rican forested landscape: pattern consistency and implications for seed removal. Oecologia 134: 119-131
- crom Crome, F.H.J. 1975. The ecology of fruit pigeons in tropical Northern Queensland. Australian Journal of Wildlife Research, 2: 155-185.
- fros Frost, P.G.H. 1980. Fruit-frugivore interactions in a South African coastal dune forest. Pages 1179-1184 in: R. Noring (ed.). Acta XVII Congresus Internationalis Ornithologici, Deutsches Ornithologische Gessenshaft, Berlin.
- gen1 Galetti, M., Pizo, M.A. 1996. Fruit eating birds in a forest fragment in southeastern Brazil. Ararajuba, Revista Brasileira de Ornitologia, 4: 71-79.
- gen2 Galetti, M., Pizo, M.A. 1996. Fruit eating birds in a forest fragment in southeastern Brazil. Ararajuba, Revista Brasileira de Ornitologia, 4: 71-79.
- hamm Hammann, A. & Curio, B. 1999. Interactions among frugivores and fleshy fruit trees in a Philippine submontane rainforest
- hrat Jordano P. 1985. El ciclo anual de los paseriformes frugívoros en el matorral mediterráneo del sur de España: importancia de su invernada y variaciones interanuales. Ardeola, 32, 69-94.
- kant Kantak, G.E. 1979. Observations on some fruit-eating birds in Mexico. Auk, 96: 183-186.
- lamb Lambert F. 1989. Fig-eating by birds in a Malaysian lowland rain forest. J. Trop. Ecol., 5, 401-412.
- lope Tutin, C.E.G., Ham, R.M., White, L.J.T., Harrison, M.J.S. 1997. The primate community of the Lopé Reserve, Gabon: diets, responses to fruit scarcity, and effects on biomass. American Journal of Primatology, 42: 1-24.
- mack Mack, AL and Wright, DD. 1996. Notes on occurrence and feeding of birds at Crater Mountain Biological Research Station, Papua New Guinea. Emu 96: 89-101.
- mont Wheelwright, N.T., Haber, W.A., Murray, K.G., Guindon, C. 1984. Tropical fruit-eating birds and their food plants: a survey of a Costa Rican lower montane forest. Biotropica, 16: 173-192.
- ncor P. Jordano, unpubl.
- nnog P. Jordano, unpubl.
- sapf Noma, N. 1997. Annual fluctuations of sapfruits production and synchronization within and inter species in a warm temperate forest on Yakushima Island, Japan. Tropics, 6: 441-449.
- snow Snow, B.K., Snow, D.W. 1971. The feeding ecology of tanagers and honeycreepers in Trinidad. Auk, 88: 291-322.
- wes Silva, W.R., P. De Marco, E. Hasui, and V.S.M. Gomes, 2002. Patterns of fruit-frugivores interactions in two Atlantic Forest bird communities of South-eastern Brazil: implications for conservation. Pp. 423-435. In: D.J. Levey, W.R. Silva and M. Galetti (eds.) Seed dispersal and frugivory: ecology, evolution and conservation. Wallingford: CAB International.
- wyth Snow B.K. & Snow D.W. 1988. Birds and berries, Calton, England.
Special thanks to @camillescott and @pmarkowsky for their many helpful suggestions (and for their patience).