Home
Jaime Huerta-Cepas edited this page Jun 14, 2016 · 9 revisions
- Develop an ETE module to search for custom tree patterns in large collections of trees.
- A pattern could be defined as a Newick structure, where rules and filters are encoded in a (hopefully) user-friendly vocabulary. For instance:
((sp=Hsa,sp=Pta,name=Hsa001),dups>1, name, H in species, )[dist>0.1*starter]
- The matcher should be a Python object transparent to the user, so querying can be done like:
TreeMatcher(pattern).find_matching_nodes([tree1, tree2, tree3])
- The tree matcher object should allow for different types of searches. For instance,
TreeMatcher.is_match(node)
TreeMatcher.has_match(tree)
TreeMatcher.find_matching_nodes([tree1, tree2], matches_per_tree=1)
etc...
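A minimal sketch of how the three search methods above could fit together. Everything here is an assumption made for illustration: `Node` is a tiny stand-in for an ETE tree node, and the pattern is a plain Python predicate rather than the Newick-style expression proposed above.

```python
class Node:
    """Tiny stand-in for a tree node (hypothetical, not the ETE class)."""

    def __init__(self, name="", children=None, dist=0.0):
        self.name = name
        self.children = list(children or [])
        self.dist = dist

    def traverse(self):
        """Yield this node and all of its descendants in preorder."""
        yield self
        for child in self.children:
            yield from child.traverse()


class TreeMatcher:
    """Sketch of the proposed interface; `pattern` is assumed to be callable(node) -> bool."""

    def __init__(self, pattern):
        self.pattern = pattern

    def is_match(self, node):
        """Test a single node against the pattern."""
        return bool(self.pattern(node))

    def has_match(self, tree):
        """True as soon as any node in the tree matches."""
        return any(self.is_match(node) for node in tree.traverse())

    def find_matching_nodes(self, trees, matches_per_tree=None):
        """Yield matching nodes, at most `matches_per_tree` per tree (None means all)."""
        for tree in trees:
            found = 0
            for node in tree.traverse():
                if self.is_match(node):
                    yield node
                    found += 1
                    if matches_per_tree is not None and found >= matches_per_tree:
                        break


tree1 = Node("root", [Node("Hsa001"), Node("Hsa002"), Node("Pta001")])
tree2 = Node("root", [Node("Mmu001")])
matcher = TreeMatcher(lambda node: node.name.startswith("Hsa"))
hits = list(matcher.find_matching_nodes([tree1, tree2], matches_per_tree=1))
```

Making `find_matching_nodes` a generator is one possible design choice: callers scanning thousands of trees can stop early without building a full result list.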
- Search for the most efficient way to find matches: consider recursion, heuristic methods, etc. The matching algorithm should be able to search over thousands of trees.
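As a starting point for the recursive option, here is a rough sketch of topology matching that ignores child order. It assumes trees are nested tuples with leaf names as strings, and patterns are nested tuples whose leaves are predicates; both representations are made up for this example. Trying every child permutation is factorial in the number of children, so this only suits binary or near-binary trees without further heuristics.

```python
from itertools import permutations


def matches(pattern, node):
    """True if `node` fits `pattern`, trying every ordering of the children."""
    if callable(pattern):            # a pattern leaf is a predicate on the node
        return bool(pattern(node))
    if isinstance(node, str) or len(pattern) != len(node):
        return False                 # structural pattern against a leaf, or arity mismatch
    return any(
        all(matches(sub, child) for sub, child in zip(pattern, order))
        for order in permutations(node)
    )


def find(pattern, tree):
    """Yield every subtree of `tree` that matches `pattern`, in preorder."""
    if matches(pattern, tree):
        yield tree
    if not isinstance(tree, str):
        for child in tree:
            yield from find(pattern, child)


def is_sp(name):
    """Hypothetical helper: predicate that accepts a leaf with this exact name."""
    return lambda node: node == name


def anything(node):
    """Wildcard predicate: accepts any leaf or subtree."""
    return True


tree = ("Mmu", ("Pta", "Hsa"))
pattern = ((is_sp("Hsa"), is_sp("Pta")), anything)
found = list(find((is_sp("Hsa"), is_sp("Pta")), tree))
```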
- Develop a way to auto-generate patterns from a set of real trees. For example, find the commonalities among a group of trees and generate a pattern expression that can be used to find similar structures.
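One simple notion of "commonality" is the set of clades (groups of leaves under one internal node) that appear in every tree. The sketch below assumes nested-tuple trees with string leaf names, and the emitted pattern string only imitates the draft vocabulary above; neither is an agreed format.

```python
def collect_clades(tree):
    """Return the set of clades (frozensets of leaf names) in a nested-tuple tree."""
    found = set()

    def leaves(node):
        if isinstance(node, str):    # a leaf is just its name
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)           # record the clade under this internal node
        return members

    leaves(tree)
    return found


def shared_clades(trees):
    """Clades present in every tree: raw material for a generated pattern."""
    clade_sets = [collect_clades(tree) for tree in trees]
    return set.intersection(*clade_sets) if clade_sets else set()


def to_pattern(clade):
    """Render one clade as a pattern string (syntax is an assumption)."""
    return "(" + ",".join(f"name={name}" for name in sorted(clade)) + ")"


trees = [
    (("Hsa", "Pta"), "Mmu"),
    ((("Hsa", "Pta"), "Ggo"), "Mmu"),
    (("Pta", "Hsa"), ("Mmu", "Rno")),
]
common = shared_clades(trees)
patterns = sorted(to_pattern(clade) for clade in common)
```

Here the only clade shared by all three trees is the Hsa/Pta pair, so a single pattern is generated.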
- Tree patterns should allow common operators and have a basic language that permits user-defined functions and filters.
@ = target node
OR ||
AND &&
NOT !
OPERATORS >= > < <= != == ~= IN
custom functions
@leaf
@size
@contains
function arguments = {}
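One possible way to prototype this vocabulary is to rewrite an expression into Python and evaluate it against the target node. The sketch below makes several assumptions that are not settled above: `@name` calls a custom function on the target node, `@name(args)` passes extra arguments, and a bare `@` (as in `@.dist`) is the node itself. The `~=` operator is not handled, the rewriting does not protect operator-like text inside string literals, and `eval` with empty builtins is not a real sandbox, so this is for trusted patterns only.

```python
import re
from types import SimpleNamespace


def leaf_names(node):
    """Names of all leaves under `node`."""
    if not node.children:
        return [node.name]
    return [name for child in node.children for name in leaf_names(child)]


# Custom functions reachable from patterns as @leaf, @size and @contains(...)
FUNCTIONS = {
    "leaf": lambda node: not node.children,
    "size": lambda node: len(leaf_names(node)),
    "contains": lambda node, name: name in leaf_names(node),
}


def translate(expr):
    """Rewrite the draft operator vocabulary into a Python expression string."""
    expr = expr.replace("&&", " and ").replace("||", " or ")
    expr = re.sub(r"!(?!=)", " not ", expr)          # '!' but not '!='
    expr = re.sub(r"\bIN\b", " in ", expr)
    expr = re.sub(r"@(\w+)\(", r"\1(node, ", expr)   # @contains("x") -> contains(node, "x")
    expr = re.sub(r"@(\w+)", r"\1(node)", expr)      # @leaf -> leaf(node)
    return expr.replace("@", "node")                 # bare '@' is the target node


def evaluate(expr, node):
    """Evaluate a pattern expression against one target node."""
    namespace = dict(FUNCTIONS, node=node)
    return bool(eval(translate(expr), {"__builtins__": {}}, namespace))


hsa = SimpleNamespace(name="Hsa", dist=0.2, children=[])
pta = SimpleNamespace(name="Pta", dist=0.1, children=[])
clade = SimpleNamespace(name="", dist=0.5, children=[hsa, pta])

ok = evaluate('@size >= 2 && @contains("Hsa") && !@leaf', clade)
```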
- Develop a visualization layout to compare trees and patterns.
- Implement parallel tree matching to scan large collections of trees (e.g., with multiprocessing).
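A rough sketch of the multiprocessing idea using the standard library `multiprocessing.Pool`. The per-tree matcher here is a placeholder substring test on Newick strings, and the function names and parameters are hypothetical; a real worker would parse each tree and apply the pattern.

```python
from functools import partial
from multiprocessing import Pool


def tree_has_match(pattern, newick):
    """Placeholder matcher: a real one would parse the tree and test the pattern."""
    return pattern in newick


def scan_collection(trees, pattern, processes=None, chunksize=100):
    """Return the indexes of matching trees, scanning in worker processes."""
    worker = partial(tree_has_match, pattern)
    if processes == 1:
        flags = list(map(worker, trees))     # serial path, handy for debugging
    else:
        with Pool(processes) as pool:        # None lets Pool pick the CPU count
            flags = pool.map(worker, trees, chunksize=chunksize)
    return [index for index, flag in enumerate(flags) if flag]


if __name__ == "__main__":
    collection = ["((Hsa,Pta),Mmu);", "(Mmu,Rno);", "(Hsa,Ggo);"]
    print(scan_collection(collection, "Hsa"))
```

Passing Newick strings rather than parsed tree objects keeps the data sent to each worker small, and a larger `chunksize` reduces inter-process overhead when each tree is cheap to test.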