GitHub - neherlab/treetime: Maximum likelihood inference of time stamped phylogenies and ancestral reconstruction

TreeTime: maximum likelihood dating and ancestral sequence inference

Overview

TreeTime provides routines for ancestral sequence reconstruction and inference of molecular-clock phylogenies, i.e., a tree where all branches are scaled such that the positions of terminal nodes correspond to their sampling times and internal nodes are placed at the most likely time of divergence.

To optimize the likelihood of time-scaled phylogenies, TreeTime uses an iterative approach that first infers ancestral sequences given the branch lengths of the tree (assuming the branch lengths of the input tree are in units of the average number of nucleotide or protein substitutions per site). After inferring ancestral sequences, TreeTime optimizes the positions of unconstrained nodes on the time axis, and then repeats this cycle. The only topology optimization is the (optional) resolution of polytomies in a way that is approximately most consistent with the sampling time constraints on the tree. The package is designed to be used as a stand-alone tool on the command line or as a library in larger phylogenetic analysis workflows. The documentation of TreeTime is hosted on readthedocs.org.

In addition to scripting TreeTime or using it via the command-line, there is also a small web server at treetime.ch.

Have a look at our repository with example data and the tutorials.

Features

ancestral sequence reconstruction (marginal and joint maximum likelihood)
molecular clock tree inference (marginal and joint maximum likelihood)
inference of GTR models
rerooting to maximize temporal signal and optimize the root-to-tip distance vs time relationship
simple phylodynamic analysis such as coalescent model fits
sequence evolution along trees using flexible site specific models.
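
The rerooting feature listed above relies on the relationship between root-to-tip distance and sampling time. As a rough illustration only (this is not TreeTime's implementation), an ordinary least-squares fit of distance against date gives a clock-rate estimate as the slope and a root-date estimate where the fitted line crosses zero. The function name and the toy numbers are hypothetical.

```python
def root_to_tip_regression(dates, distances):
    """Least-squares fit of root-to-tip distance against sampling date.

    Hypothetical helper for illustration; not part of the TreeTime API.
    Returns (rate, root_date, r_squared), where rate is the slope in
    substitutions per site per year and root_date is the date at which
    the fitted line reaches zero distance.
    """
    n = len(dates)
    if n < 2 or n != len(distances):
        raise ValueError("need at least two (date, distance) pairs")

    mean_t = sum(dates) / n
    mean_d = sum(distances) / n
    var_t = sum((t - mean_t) ** 2 for t in dates)
    var_d = sum((d - mean_d) ** 2 for d in distances)
    cov = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, distances))
    if var_t == 0:
        raise ValueError("all sampling dates are identical: no temporal signal")

    rate = cov / var_t
    intercept = mean_d - rate * mean_t
    root_date = -intercept / rate if rate != 0 else float("nan")
    r_squared = cov ** 2 / (var_t * var_d) if var_d > 0 else float("nan")
    return rate, root_date, r_squared


if __name__ == "__main__":
    # Toy data: tips sampled yearly, evolving at a constant rate.
    sampling_dates = [2000.0, 2001.0, 2002.0, 2003.0]
    root_to_tip = [0.010, 0.012, 0.014, 0.016]
    rate, root_date, r2 = root_to_tip_regression(sampling_dates, root_to_tip)
    print(f"rate={rate:.4g}  root date={root_date:.2f}  R^2={r2:.3f}")
```

Rerooting can then be thought of as choosing the root position for which this fit is best, which is why the regression is a useful first check of temporal signal in a data set.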

Migration between discrete geographic regions, host switching, or other transitions between discrete states are often parameterized by time-reversible models analogous to models describing the evolution of genome sequences. Such models are hence often called "mugration" models. TreeTime's GTR model machinery can be used to infer mugration models:

  treetime mugration --tree <input.nwk> --states <states.csv> --attribute <field>

where <field> is the column in the csv file states.csv that holds the relevant metadata, e.g. <field>=country. The full list of options is available by typing treetime mugration -h. Please see the documentation on readthedocs.org for examples and further details.
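
To make "time-reversible model of discrete states" concrete, here is a minimal sketch of how such a rate matrix can be built from equilibrium frequencies and symmetric exchange rates, together with a check of the detailed-balance condition that defines reversibility. The function names, the row convention for the matrix, and the example regions and numbers are assumptions made for illustration; they are not taken from TreeTime.

```python
def reversible_rate_matrix(pi, W):
    """Build a time-reversible rate matrix for discrete states.

    Illustrative convention (an assumption, not TreeTime's internal one):
    Q[i][j] is the rate of moving from state i to state j, with
    Q[i][j] = W[i][j] * pi[j] for i != j and rows summing to zero.
    pi are equilibrium frequencies, W is a symmetric exchangeability matrix.
    """
    n = len(pi)
    if abs(sum(pi) - 1.0) > 1e-9:
        raise ValueError("equilibrium frequencies must sum to 1")
    for i in range(n):
        for j in range(n):
            if abs(W[i][j] - W[j][i]) > 1e-12:
                raise ValueError("W must be symmetric")

    Q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                Q[i][j] = W[i][j] * pi[j]
        # The diagonal is still zero here, so this is minus the off-diagonal sum.
        Q[i][i] = -sum(Q[i])
    return Q


def satisfies_detailed_balance(pi, Q, tol=1e-12):
    """Reversibility check: pi[i] * Q[i][j] == pi[j] * Q[j][i] for all i, j."""
    n = len(pi)
    return all(
        abs(pi[i] * Q[i][j] - pi[j] * Q[j][i]) <= tol
        for i in range(n)
        for j in range(n)
    )


if __name__ == "__main__":
    regions = ["Europe", "Asia", "Africa"]   # hypothetical discrete states
    pi = [0.5, 0.3, 0.2]                      # made-up equilibrium frequencies
    W = [[0.0, 1.0, 2.0],
         [1.0, 0.0, 0.5],
         [2.0, 0.5, 0.0]]                     # made-up symmetric exchange rates
    Q = reversible_rate_matrix(pi, W)
    for region, row in zip(regions, Q):
        print(region, [round(q, 3) for q in row])
    print("reversible:", satisfies_detailed_balance(pi, Q))
```

The same construction with four nucleotide states gives the familiar GTR substitution model, which is why the sequence-evolution machinery can be reused for geographic regions or hosts.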

Metadata and date format

Several TreeTime commands require the user to specify a file with dates and/or other metadata. TreeTime assumes these files to be either comma-separated (csv) or tab-separated (tsv) files. The first line of these files is interpreted as a header line specifying the content of the columns. Each file needs to have at least one column that is named name, accession, or strain. This column needs to contain the name of each sequence and match the names of the taxa in the tree if one is provided. If more than one of name, accession, or strain is found, TreeTime will use the first.
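
The rules above can be sketched with the standard library as follows. This is an illustration of the described behaviour, not TreeTime's own reader: the function names are hypothetical, "the first" is assumed to mean the first matching column in header order, and matching is assumed to be case-insensitive.

```python
import csv
import io

# Column names that may identify the sequence, as listed in the text.
NAME_COLUMNS = ("name", "accession", "strain")


def find_name_column(header):
    """Return the first header entry that is name, accession, or strain.

    Assumption: 'first' means first in header order, compared
    case-insensitively after stripping whitespace.
    """
    for column in header:
        if column.strip().lower() in NAME_COLUMNS:
            return column
    raise ValueError("no column named name, accession, or strain")


def read_metadata(text):
    """Parse csv or tsv text into a dict keyed by sequence name."""
    if not text.strip():
        raise ValueError("metadata is empty")
    header_line = text.splitlines()[0]
    # Treat the file as tab-separated if the header contains a tab.
    delimiter = "\t" if "\t" in header_line else ","
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    name_column = find_name_column(reader.fieldnames)
    return {row[name_column]: row for row in reader}


if __name__ == "__main__":
    example = "strain,date,country\nA,2017.5,CH\nB,2016.1,US\n"
    for name, row in read_metadata(example).items():
        print(name, row["date"], row["country"])
```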

If the analysis requires dates, at least one column name needs to contain date (e.g. sampling date is fine). Again, if multiple hits are found, TreeTime will use the first. TreeTime will attempt to parse dates in the following way and order:

order	type/format	example	description
1	float	2017.56	decimal date
2	[float:float]	[2013.45:2015.56]	decimal date range
3	%Y-%m-%d	2017-08-25	calendar date in ISO format
4	%Y-XX-XX	2017-XX-XX	calendar date missing month and/or day
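
The parsing order in the table can be sketched as a chain of attempts, each tried only if the previous one fails. This is a simplified illustration rather than TreeTime's parser: the function names are hypothetical, and representing ranges and incomplete dates as (lower, upper) tuples of decimal years is an assumption made for this example.

```python
import re
from datetime import date, datetime


def find_date_column(header):
    """Return the first column whose name contains 'date' (case-insensitive here)."""
    for column in header:
        if "date" in column.lower():
            return column
    raise ValueError("no column name contains 'date'")


def decimal_year(d):
    """Convert a calendar date to a decimal year, e.g. 1 July -> about year + 0.5."""
    start = date(d.year, 1, 1)
    end = date(d.year + 1, 1, 1)
    return d.year + (d - start).days / (end - start).days


def parse_date(value):
    """Try the four formats in table order.

    Returns a float, a (lower, upper) tuple for ranges and incomplete
    dates, or None if nothing matches.
    """
    value = value.strip()

    # 1. decimal date, e.g. 2017.56
    try:
        return float(value)
    except ValueError:
        pass

    # 2. decimal date range, e.g. [2013.45:2015.56]
    match = re.fullmatch(r"\[([^:\]]+):([^:\]]+)\]", value)
    if match:
        try:
            bounds = float(match.group(1)), float(match.group(2))
        except ValueError:
            return None
        return (min(bounds), max(bounds))

    # 3. calendar date in ISO format, e.g. 2017-08-25
    try:
        return decimal_year(datetime.strptime(value, "%Y-%m-%d").date())
    except ValueError:
        pass

    # 4. calendar date missing the day, or the month and the day
    match = re.fullmatch(r"(\d{4})-(\d{2}|XX)-XX", value)
    if match:
        year = int(match.group(1))
        if match.group(2) == "XX":
            return (float(year), float(year + 1))
        month = int(match.group(2))
        if not 1 <= month <= 12:
            return None
        start = date(year, month, 1)
        end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
        return (decimal_year(start), decimal_year(end))

    return None


if __name__ == "__main__":
    for raw in ["2017.56", "[2013.45:2015.56]", "2017-08-25", "2017-XX-XX"]:
        print(raw, "->", parse_date(raw))
```

Trying the plain float first matters: a value such as 2017 is a valid decimal date, so it never reaches the calendar-date branches.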

Example scripts

The following scripts illustrate how treetime can be used to solve common problems with short python scripts. They are meant to be used in an interactive ipython environment and can be run with run examples/ancestral_inference.py.

ancestral_inference.py illustrates how ancestral sequences are inferred and likely mutations are assigned to branches in the tree.
relaxed_clock.py walks the user through the usage of relaxed molecular clock models.
rerooting_and_timetrees.py illustrates rerooting and root-to-tip regression scatter plots.
ebola.py uses about 300 sequences from the 2014-2015 Ebola virus outbreak to infer a timetree. This example takes a few minutes to run.

HTML documentation of the different classes and functions is available here.

Related tools

There are several other tools which estimate molecular clock phylogenies.

BEAST relies on MCMC-type sampling of trees and is hence rather slow for large data sets. However, BEAST allows the flexible inclusion of prior distributions, complex evolutionary models, and estimation of parameters.
Least-Square-Dating (LSD) emphasizes speed (it scales as O(N), like TreeTime), but provides limited scope for customization.
treedater by Eric Volz and Simon Frost is an R package that implements time tree estimation and supports relaxed clocks.

Projects using TreeTime

TreeTime is an integral part of the nextstrain.org project to track and analyze viral sequence data in real time.
panX uses TreeTime for ancestral reconstructions and inference of gene gain-loss patterns.

Building the documentation

The API documentation for the TreeTime package is generated with Sphinx. The source code for the documentation is located in the doc folder.

The sphinx-build tool generates static html pages from the source. It can be installed with:

pip install Sphinx

After the required packages are installed, navigate to the doc directory and build the docs by typing:

make html

Instead of html, another target such as latex or epub can be specified to build the docs in the desired format.

Requirements

To build the documentation, the sphinx-build tool needs to be installed. The doc pages use the basicstrap html theme to have the same design as the TreeTime web server. Therefore, the basicstrap theme also needs to be available on the system.

Developer info

Copyright and License: Pavel Sagulenko, Emma Hodcroft, and Richard Neher, MIT License

References
- TreeTime: Maximum-likelihood phylodynamic analysis by Pavel Sagulenko, Vadim Puller and Richard A Neher. Virus Evolution.
- Nextstrain: real-time tracking of pathogen evolution by James Hadfield et al. Bioinformatics.
