Distance tree

Vyacheslav Brover edited this page Sep 28, 2021 · 7 revisions

The executables are in the directory $TT/phylogeny/.

makeDistTree

Optimize an existing distance tree or create a distance tree.

Main parameters:

  • -input_tree tree: tree file which has been produced by the -output_tree parameter;
  • dissimilarity data:
    • -data data: data file in Data Master format, see Data Master format, or an incremental distance tree directory ending with /;
    • -dissim_attr dissim_attr: dissimilarity attribute in the data file;
  • dissimilarity transformations:
    • -dissim_coeff c: dissimilarities are multiplied by c;
    • -dissim_power p: dissimilarities are raised to the power of p;
  • dissimilarity variance:
    • -variance { lin | sqr | pow | exp | linExp }: dissimilarity variance function;
    • -variance_power p: non-negative power for -variance pow;
    • -variance_dissim: flag indicating that variance function is applied to dissimilarities rather than to tree distances;
    • -variance_min m: minimum dissimilarity variance to be added to the computed dissimilarity variance;
  • deletion of objects:
    • -delete obj_list: list of objects to delete from the tree;
    • -keep obj_list: list of objects to keep in the tree and delete all the other objects;
  • optimization:
    • -optimize: flag indicating that tree must be optimized;
    • -subgraph_iter_max i: maximum number of iterations of subgraph optimizations;
    • -skip_len: flag indicating that arc length optimization should be skipped;
    • -reinsert: flag indicating the usage of optimization by reinsertion;
  • fitness outliers:
    • -delete_criterion_outliers criterion_outlier_list: output file to save the list of criterion outliers;
    • -criterion_outlier_num_max n: maximum length of criterion_outlier_list;
    • -delete_deformation_outliers deformation_outlier_list: output file to save the list of deformation outliers;
    • -deformation_outlier_num_max n: maximum length of deformation_outlier_list;
  • hybrid outliers:
    • -hybridness_min hybridness_min: minimum hybridness of hybrid triangles;
    • -dissim_boundary b: dissimilarity threshold at which two different dissimilarities are merged, causing a discontinuity.
      Hybrid triangles are not identified for dissimilarities close to this value;
    • -delete_hybrids hybrid_triangles: output file with hybrid triangles;
  • -reroot_at obj1:obj2: make the middle of the arc of the least common ancestor of the objects named obj1 and obj2 the root of the tree;
  • -output_tree tree: create a tree file in internal format;
  • -threads n: use n processor threads.

Examples

Create a tree using the Data Master file $TT/phylogeny/data/Saccharomyces.dm:

$TT/phylogeny/makeDistTree  -threads 3  -data $TT/phylogeny/data/Saccharomyces \
   -variance linExp  -optimize  -subgraph_iter_max 2 \
   -hybridness_min 1.2  -delete_hybrids Saccharomyces.hybrid  -dissim_boundary 0.675 \
   -output_tree Saccharomyces.tree

Remove all objects from a tree in.tree which are not in the list list:

$TT/phylogeny/makeDistTree  -input_tree in.tree  -keep list  -output_tree out.tree
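
The -reroot_at parameter follows the same -input_tree / -output_tree pattern. The sketch below is only an illustration: in.tree, out.tree, objA and objB are hypothetical names, and the command is printed rather than executed when makeDistTree is not installed under $TT.

```shell
# Reroot a tree at the arc determined by two objects (dry run if the tool is absent).
# Arguments: input tree, first object, second object, output tree (all hypothetical names).
reroot_tree() {
  bin="${TT:-.}/phylogeny/makeDistTree"
  if [ -x "$bin" ]; then
    "$bin" -input_tree "$1" -reroot_at "$2:$3" -output_tree "$4"
  else
    # Executable not found: show the command that would be run.
    echo "$bin -input_tree $1 -reroot_at $2:$3 -output_tree $4"
  fi
}

reroot_tree in.tree objA objB out.tree
```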

tree2genogroup

Find genogroups in a tree given a distance threshold.
Main parameters:

  • input_tree: Input tree file;
  • genogroup_dist: Max. distance between objects of the same genogroup;
  • -genogroup_table table: Output file with lines: <object> <genogroup leader>;
  • -genogroups genogroups: Output file with the names of the interior nodes which are genogroup roots;
  • -genogroup_under_genogroup table: Output file with lines: <node1 LCA name> <node2 LCA name>, where nodes belong to different genogroups, but node1 is a child of node2.
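
The -genogroup_table output can be summarized with standard tools. The sketch below counts the objects per genogroup leader. It assumes that the two columns are separated by whitespace and that object names contain no spaces; the sample file and its object names are made up and only stand in for a real table.

```shell
# Stand-in for a file written by -genogroup_table (made-up object names).
# Assumed layout: <object> <genogroup leader>, whitespace-separated.
cat > genogroup_table.txt <<'EOF'
obj1 obj1
obj2 obj1
obj3 obj1
obj4 obj4
obj5 obj4
obj6 obj6
EOF

# Count the objects per genogroup leader, largest genogroup first.
genogroup_sizes() {
  awk '{ n[$2]++ } END { for (g in n) print n[g], g }' "$1" | sort -k1,1nr -k2,2
}

genogroup_sizes genogroup_table.txt
```

Each output line holds the genogroup size followed by the leader name, which gives a quick check of whether the chosen genogroup_dist splits the tree too finely or too coarsely.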

tree2obj.sh

Print the list of objects of a distance tree.
Parameter: Input distance tree made by makeDistTree.

Example

Optimize an existing tree using a subset of dissimilarities and a different dissimilarity variance function:

$TT/phylogeny/tree2obj.sh Saccharomyces.tree > Saccharomyces.list
$TT/dm/dm2subset $TT/phylogeny/data/Saccharomyces Saccharomyces.list > subset.dm
$TT/phylogeny/makeDistTree  -threads 3  -input_tree Saccharomyces.tree  -data subset \
  -variance pow  -variance_power 3  -optimize  -subgraph_iter_max 2

hybrid2list.sh

Extract the list of hybrid objects from the file hybrid_triangles made by makeDistTree and print it.
Parameter: file hybrid_triangles.

Converters

printDistTree

Main parameters:

  • Input tree file;
  • -name_match name_match: File with lines: <name_old> <tab> <name_new>, to replace leaf names;
  • -decimals decimals: Number of decimals in arc lengths, default = 6;
  • -format { newick | itree (makeDistTree output) | ASNT (textual ASN.1) } : default = newick;
  • -ext_name: Extended leaf names for newick;
  • -order: Order subtrees by the number of leaves, descending.

Examples

Convert a tree from the internal format to Newick, adding the normalized object criterion to each leaf:

$TT/phylogeny/printDistTree  -data $TT/phylogeny/data/Enterobacteriaceae  -dissim_attr Conservation \
  -variance linExp  Enterobacteriaceae.tree  \
  -order  -decimals 4  -ext_name > Enterobacteriaceae.nw

Convert a tree from the internal format to Newick without adding the normalized object criterion to each leaf:

$TT/phylogeny/printDistTree  Enterobacteriaceae.tree  -order  -decimals 4 \
   > Enterobacteriaceae.nw
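
The -name_match file needs one <name_old> <tab> <name_new> pair per line. A minimal sketch for writing such a file and checking its layout before passing it to printDistTree; the leaf names, the file name name_match.tsv and the helper check_name_match are hypothetical.

```shell
# Write a name_match file: one "<name_old><tab><name_new>" pair per line.
# The leaf names are hypothetical.
printf '%s\t%s\n' \
  leaf_1 Species_A \
  leaf_2 Species_B > name_match.tsv

# Succeed only if every line has exactly two non-empty tab-separated fields.
check_name_match() {
  awk -F '\t' 'NF != 2 || $1 == "" || $2 == "" { bad = 1 } END { exit bad }' "$1"
}

check_name_match name_match.tsv && echo "name_match.tsv looks well-formed"
```

The checked file can then be given to printDistTree with -name_match name_match.tsv.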

newick2tree

Convert a newick tree to the makeDistTree tree format.
Parameter: Input newick tree.

asnt2tree

attr2_2paup

attr2_2phylip

Comparison with PAUP*

PAUP* version used: Portable version 4.0b10 for Unix

Prepare data for PAUP*

$TT/phylogeny/attr2_2paup $TT/phylogeny/data/Saccharomyces cons map > Saccharomyces.nex

Run PAUP*

$ paup Saccharomyces.nex
paup> Set criterion=distance;
paup> dset objective=lsfit power=2;
paup> hsearch
...
    Elapsed   Taxa      Rearr.   -- Number of trees --      Best
       time  added      tried    saved    left-to-swap   tree(s)
  --------------------------------------------------------------
    0:01:00      -         247       1          1      3148.9391
    ...
    1:00:07      -       14984       1          1      1334.1309
    ^C

Run makeDistTree

$TT/phylogeny/makeDistTree -data $TT/phylogeny/data/Saccharomyces -variance sqr \
   -variance_dissim -optimize

Takes 2 min.
Abs. criterion = 6.4861e+02.
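
The figures above can be put side by side with a little arithmetic. This sketch assumes that the value in the PAUP* "Best tree(s)" column and the makeDistTree absolute criterion measure the same least-squares fit on the same scale, which is not stated explicitly here. The PAUP* run was interrupted, so its value is the best tree found after 1:00:07, not a converged optimum.

```shell
# Ratios of criterion and running time, PAUP* relative to makeDistTree.
# Arguments: PAUP* criterion, PAUP* seconds, makeDistTree criterion, makeDistTree seconds.
compare_runs() {
  awk -v pc="$1" -v ps="$2" -v mc="$3" -v ms="$4" 'BEGIN {
    printf "criterion ratio PAUP*/makeDistTree: %.2f\n", pc / mc
    printf "time ratio PAUP*/makeDistTree: %.0f\n", ps / ms
  }'
}

# 1334.1309 after 1:00:07 (3607 s) for PAUP*; 6.4861e+02 = 648.61 after 2 min (120 s) for makeDistTree.
compare_runs 1334.1309 3607 648.61 120
```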