Incremental distance tree tools
The executables are in the directory $TT/phylogeny/.
All incremental distance tree scripts start with distTree_inc.
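To see which incremental distance tree scripts are installed, the naming rule above can be applied directly. The following is a minimal sketch, assuming the environment variable TT is set as in the path above; the helper name list_inc_scripts is hypothetical and is not part of the toolkit.

```python
import glob
import os


def list_inc_scripts(phylogeny_dir):
    """Return the sorted names of files starting with 'distTree_inc' in phylogeny_dir."""
    pattern = os.path.join(phylogeny_dir, "distTree_inc*")
    return sorted(os.path.basename(path) for path in glob.glob(pattern))


if __name__ == "__main__":
    # Assumption: TT points to the installation root, as in $TT/phylogeny/.
    tt = os.environ.get("TT")
    if tt is None:
        print("TT is not set")
    else:
        for name in list_inc_scripts(os.path.join(tt, "phylogeny")):
            print(name)
```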
The first parameter of these scripts is an incremental distance tree directory.
Create an incremental distance tree directory with parameter files and empty scripts.
Create an incremental distance tree directory with the standard parameter files and scripts for a biological project:
- Genome
  - bacteria
  - fungi
  - Metazoa
  - Protists
  - Viridiplantae
- rRNA
  - bacteria (prokaryotic 16S)
  - fungi
    - ITS
    - 18S
    - 28S
  - SSU (eukaryotic 18S)
  - 5.8S (eukaryotic)
- virus
  - SARS-CoV-2
For an incremental distance tree directory and a list of objects, compute a complete pairwise dissimilarity matrix, store it in a Data Master file data.dm, and build a distance tree.
On finishing, a Data Master format file data.dm is created.
This file contains a two-way attribute "dissim": the dissimilarity matrix computed by inc/request2dissim.sh.
Test specific variance or dissimilarity parameters on a pairwise dissimilarity matrix stored in a Data Master file, e.g., data.dm produced by $TT/phylogeny/distTree_inc_complete.sh.
Incremental distance tree building.
Requires a computer with large memory.
For a tree with 200,000 objects, new objects are added at a rate of about 7,000 objects per day using 30 threads on a computer with a 3500 MHz CPU.
Theoretically, the running time is O(n log^4 n) and space is O(n log^3 n), where n is the number of objects.
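To get a feel for what these bounds imply when a tree grows, one can compare the leading terms for two tree sizes. This is only a rough sketch: it ignores constant factors and lower-order terms, so it indicates a trend rather than a measured running time. The function name growth_ratio is hypothetical.

```python
import math


def growth_ratio(n_old, n_new, log_power):
    """Ratio of n * log(n)**log_power between two tree sizes.

    The base of the logarithm cancels out in the ratio.
    """
    size_factor = n_new / n_old
    log_factor = (math.log(n_new) / math.log(n_old)) ** log_power
    return size_factor * log_factor


# Doubling a tree from 200,000 to 400,000 objects:
time_ratio = growth_ratio(200_000, 400_000, log_power=4)   # O(n log^4 n)
space_ratio = growth_ratio(200_000, 400_000, log_power=3)  # O(n log^3 n)
print(f"time grows by about {time_ratio:.2f}x, space by about {space_ratio:.2f}x")
```

Under these bounds, doubling the number of objects is expected to cost somewhat more than twice the time and memory, because of the logarithmic factors.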
On finishing, a Data Master format file leaf_errors.dm is created.
This file contains two attributes defined on objects:
- "leaf_error": normalized object criterion, which theoretically has a standard normal distribution;
- "deformation": relative object deformation, which theoretically is distributed as the maximum of 100 chi^2 variables with 1 degree of freedom each (if the tree has 100 objects).
Large values of these attributes identify outlier objects.
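The theoretical distributions above suggest how to decide what counts as a "large" value. The following is a minimal sketch, not the toolkit's own procedure: it assumes the chi^2 variables are independent, and the probability level p as well as both function names are hypothetical choices made for illustration. It uses the fact that a chi^2 variable with 1 degree of freedom is the square of a standard normal variable.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def leaf_error_threshold(p=0.999):
    """Upper p-quantile of the standard normal distribution."""
    return STD_NORMAL.inv_cdf(p)


def deformation_threshold(n_objects, p=0.999):
    """Upper p-quantile of the maximum of n_objects independent chi^2(1) variables.

    P(max <= x) = P(chi^2_1 <= x) ** n_objects, and
    P(chi^2_1 <= x) = 2 * Phi(sqrt(x)) - 1, where Phi is the standard normal CDF.
    """
    per_object = p ** (1.0 / n_objects)
    z = STD_NORMAL.inv_cdf((1.0 + per_object) / 2.0)
    return z * z


if __name__ == "__main__":
    print("leaf_error cutoff:", leaf_error_threshold())
    print("deformation cutoff for 100 objects:", deformation_threshold(100))
```

Objects whose "leaf_error" or "deformation" exceeds the corresponding cutoff would be candidates for closer inspection; the cutoff for "deformation" grows with the number of objects in the tree.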
Delete a list of objects from the tree in an incremental distance tree directory.
Test specific variance or dissimilarity parameters on an existing incremental distance tree directory without changing it.
Print the status of an incremental distance tree directory: version, the number of objects in the tree, etc.