
proposed_taxonomy_api

Noah Hoffman edited this page Aug 11, 2011 · 7 revisions

# proposed API for module Bio.Taxonomy

## class Taxonomy

```__init__(filename) :: String -> Taxonomy```

filename is an SQLite3 file holding the taxonomy. It is created if it does not exist.

```close() :: -> None```

Closes this taxonomy database.
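The open/close lifecycle described above could be sketched as follows. This is only an illustration: the table layout and the column names are assumptions, not part of the proposal. It relies on the fact that `sqlite3.connect` creates the database file when it does not already exist.

```python
import os
import sqlite3
import tempfile


class Taxonomy:
    """Hypothetical sketch showing only the open/close lifecycle."""

    def __init__(self, filename):
        # sqlite3.connect creates the file if it does not exist yet.
        self._conn = sqlite3.connect(filename)
        # Assumed schema, for illustration only.
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS nodes ("
            "taxid TEXT PRIMARY KEY, parent TEXT, rank_name TEXT)"
        )
        self._conn.commit()

    def close(self):
        self._conn.close()


# Usage: open a taxonomy at a fresh path, then close it again.
path = os.path.join(tempfile.mkdtemp(), "taxonomy.db")
existed_before = os.path.exists(path)
tax = Taxonomy(path)
tax.close()
existed_after = os.path.exists(path)
```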

```node(taxid) :: String -> Node (or raise NoSuchTaxIDException)```

Return a Node object corresponding to taxid.

```has_node(taxid) :: String -> Bool```

Check if taxid exists in this taxonomy.
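A minimal sketch of how node and has_node might relate, using an in-memory dictionary in place of the SQLite backend. Making NoSuchTaxIDException a subclass of KeyError and representing nodes as plain dictionaries are both assumptions made for this example.

```python
class NoSuchTaxIDException(KeyError):
    """Raised for an unknown taxid; the KeyError base class is an assumption."""


class Taxonomy:
    """Hypothetical in-memory stand-in for the SQLite-backed class."""

    def __init__(self):
        self._nodes = {}  # taxid -> node record (a plain dict here)

    def has_node(self, taxid):
        return taxid in self._nodes

    def node(self, taxid):
        try:
            return self._nodes[taxid]
        except KeyError:
            raise NoSuchTaxIDException(taxid) from None


tax = Taxonomy()
tax._nodes["t1"] = {"taxid": "t1", "rank": "species"}  # made-up record

try:
    tax.node("missing")
    raised = False
except NoSuchTaxIDException:
    raised = True
```

Callers that prefer not to handle the exception can check has_node first.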

```species(taxid) :: String -> Node | None (or raise NoSuchTaxIDException)```

Return the entry in taxid's lineage at the species rank, if any exists, or None otherwise.

```match_species(regex) :: String -> [Node]```

Find any nodes whose species entry matches regex.

_it might be nice to generalize the above method to other ranks, perhaps something like_

```match_name(regex, rank=None) :: String, String -> [Node]```

Find any nodes whose entry matches regex at the specified rank, or at any rank if rank is None.
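The generalized lookup could behave roughly like the sketch below. The records are made up, the function returns taxids rather than Node objects to stay short, and the use of `re.search` (rather than `re.match`) is an assumption, since the proposal does not say how the regex is anchored.

```python
import re

# Made-up records: (taxid, rank, name).
NODES = [
    ("g1", "genus", "Lactobacillus"),
    ("s1", "species", "Lactobacillus gasseri"),
    ("s2", "species", "Lactobacillus crispatus"),
    ("g2", "genus", "Streptococcus"),
]


def match_name(regex, rank=None):
    """Return taxids whose name matches regex, optionally limited to one rank."""
    pattern = re.compile(regex)
    return [
        taxid
        for taxid, node_rank, name in NODES
        if (rank is None or node_rank == rank) and pattern.search(name)
    ]


any_rank = match_name("^Lactobacillus")
species_only = match_name("^Lactobacillus", rank="species")
```

With this shape, match_species(regex) is just match_name(regex, rank="species").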

```lineage(taxid) :: String -> {String: Node} (or raise NoSuchTaxIDException)```

Return a dictionary of "rank":Node pairs of all nodes in the lineage of taxid (including taxid itself).

_the return value of lineage should specify an order, so it should be either an OrderedDict or a list of (rank, Node) tuples_
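One way to meet the ordering requirement is to walk the parent links and return a list of (rank, taxid) tuples. The sketch below makes several assumptions: the data is made up, the root is treated as its own parent, the order is root first, and KeyError stands in for NoSuchTaxIDException.

```python
# Made-up parent links: taxid -> (parent taxid, rank).
# The root is assumed to be its own parent.
NODES = {
    "1": ("1", "root"),
    "10": ("1", "family"),
    "100": ("10", "genus"),
    "1000": ("100", "species"),
}


def lineage(taxid):
    """Return [(rank, taxid), ...] from the root down to taxid itself."""
    if taxid not in NODES:
        raise KeyError(taxid)  # stand-in for NoSuchTaxIDException
    path = []
    while True:
        parent, rank = NODES[taxid]
        path.append((rank, taxid))
        if parent == taxid:  # reached the root
            break
        taxid = parent
    path.reverse()
    return path


def species(taxid):
    """Return the species-rank entry of the lineage, or None if there is none."""
    return dict(lineage(taxid)).get("species")
```

A list of tuples keeps the order explicit and still converts to a dictionary with `dict(...)` when lookup by rank is wanted, as species does here.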

```is_primary(taxid) :: String -> Bool (or raise NoSuchTaxIDException)```

Returns True if taxid is a primary taxid (i.e., has not been merged with another); returns False otherwise.
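As a rough illustration, is_primary could be derived from the old_taxids recorded on each node. The records below are made up, and KeyError again stands in for NoSuchTaxIDException.

```python
# Made-up records: primary taxid -> taxids that were merged into it.
OLD_TAXIDS = {
    "A1": ["A0"],
    "B1": [],
}

# Reverse index: merged (old) taxid -> the primary taxid it now points to.
MERGED = {old: new for new, olds in OLD_TAXIDS.items() for old in olds}


def is_primary(taxid):
    if taxid in OLD_TAXIDS:
        return True
    if taxid in MERGED:
        return False
    raise KeyError(taxid)  # stand-in for NoSuchTaxIDException
```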

String, String, String, String, [String], [String], {String:?} -> Taxonomy```

Add a new entry, automatically creating a node given the information here.

```add_node(node) :: Node -> Taxonomy```

Add a node object.

```add_nodes(nodes) :: [Node] -> Taxonomy```

Calls add_node on each element of *nodes*.
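The `-> Taxonomy` return type suggests that the mutating methods return the taxonomy itself, which would allow calls to be chained. That reading is an assumption; the sketch below shows what it would look like with an in-memory dictionary and nodes represented as plain dictionaries.

```python
class Taxonomy:
    """Hypothetical in-memory stand-in; nodes are plain dicts here."""

    def __init__(self):
        self._nodes = {}

    def add_node(self, node):
        self._nodes[node["taxid"]] = node
        return self  # returning self is what enables chaining

    def add_nodes(self, nodes):
        for node in nodes:
            self.add_node(node)
        return self


tax = (
    Taxonomy()
    .add_node({"taxid": "t1"})
    .add_nodes([{"taxid": "t2"}, {"taxid": "t3"}])
)
taxids = sorted(tax._nodes)
```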

```add_nodes_from_excel(filename, mapping=?, source=?, prefix=?) :: String -> Taxonomy```

Add nodes from an Excel spreadsheet. *source* is the source to list for all of the new nodes, *prefix* is inserted before each taxid in the spreadsheet, and *mapping* is a function that takes a dictionary representing a spreadsheet row and returns the Node built from that row. The default *mapping* works on the vaginal set's spreadsheets.

_perhaps generalize the above to the following, where file_type may be one of ['excel','csv']_

```add_nodes_from_file(filename, file_type='excel', mapping=?, source=?, prefix=?) :: String -> Taxonomy```
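For the CSV case, the role of *mapping*, *source*, and *prefix* could look like the sketch below. The column names, the default mapping, and the choice to apply the prefix to the parent taxid as well are all assumptions; nodes are plain dictionaries and the helper name nodes_from_csv is hypothetical.

```python
import csv
import io


def default_mapping(row):
    """Turn one row (a dict keyed by column name) into a node record."""
    # The column names used here are hypothetical.
    return {
        "taxid": row["taxid"],
        "parent": row["parent"],
        "rank": row["rank"],
        "names": [row["name"]],
    }


def nodes_from_csv(handle, mapping=default_mapping, source=None, prefix=""):
    nodes = []
    for row in csv.DictReader(handle):
        node = mapping(row)
        # Assumption: the prefix applies to parent taxids too, so that
        # parent links still resolve within the prefixed taxonomy.
        node["taxid"] = prefix + node["taxid"]
        node["parent"] = prefix + node["parent"]
        node["source"] = source
        nodes.append(node)
    return nodes


data = io.StringIO(
    "taxid,parent,rank,name\n"
    "2,1,genus,Examplea\n"
    "3,2,species,Examplea prima\n"
)
nodes = nodes_from_csv(data, source="lab", prefix="X")
```

Keeping *mapping* limited to the row-to-node conversion means a custom spreadsheet layout only needs a new mapping function, while prefixing and source tagging stay in the loader.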

```pop_node(taxid) :: String -> Node (or raise NoSuchTaxIDException)```

Remove the node corresponding to *taxid* from the taxonomy, and return it.

```ranks() :: -> [(String, Integer)]```

Returns a list of rank names and levels that are in the taxonomy.

```rank(r) :: String|Integer -> (String,Integer)|None```

Returns the information on a rank specified by a name or level.

```add_rank(name, level) :: String, Integer -> Taxonomy```

Add a new rank to the taxonomy. _it may be more convenient to specify the name of a parent rank rather than the level_

```add_ranks(ranks) :: [(String,Integer)] -> Taxonomy```

Calls add_rank on each element of *ranks*.

```pop_rank(r) :: String|Integer -> (String,Integer) (or raise NoSuchRankException)```

Delete a rank from the taxonomy, either by name or level.
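The rank methods above could fit together as in the following sketch. The class name RankTable is hypothetical, the levels are arbitrary integers, and sorting ranks() by level is an assumption.

```python
class NoSuchRankException(KeyError):
    """Raised for an unknown rank; the KeyError base class is an assumption."""


class RankTable:
    """Hypothetical holder for the rank methods of Taxonomy."""

    def __init__(self):
        self._levels = {}  # rank name -> level

    def add_rank(self, name, level):
        self._levels[name] = level
        return self

    def add_ranks(self, ranks):
        for name, level in ranks:
            self.add_rank(name, level)
        return self

    def ranks(self):
        # Sorted by level; the ordering is an assumption.
        return sorted(self._levels.items(), key=lambda item: item[1])

    def rank(self, r):
        # r may be a rank name (str) or a level (int).
        for name, level in self._levels.items():
            if r == name or r == level:
                return (name, level)
        return None

    def pop_rank(self, r):
        found = self.rank(r)
        if found is None:
            raise NoSuchRankException(r)
        del self._levels[found[0]]
        return found


table = RankTable().add_ranks([("species", 10), ("genus", 5)])
```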

```sources() :: -> [(String,String)]```

Return a list of all sources in the taxonomy.  Each source is returned as (name,description).

```add_source(name, description=None) :: String[,String] -> Taxonomy```

Adds a source to the taxonomy.

```pop_source(name) :: String -> (String,String)|None```

Removes the source named *name* if it exists, returning its (name,description).  If there is no such source, returns None.

## class Node
```__init__(taxid=None, parent=None, rank=None, source=None, annotations=None, names=None, old_taxids=None)```

taxid :: String, parent :: String, rank :: String, source :: String, annotations :: {String:?}, names :: [String], old_taxids :: [String]


* _should it be an error to initialize a Node object without taxid, parent, rank, or names?_
* _there needs to be some way of indicating which name in *names* is the primary name; perhaps the first in the list is assumed to be the primary name, and any others are synonyms?_
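A sketch of Node as a dataclass, taking the "first name is primary" option from the question above. The primary_name and synonyms properties are hypothetical additions, and the collection fields default to empty containers rather than None, which differs slightly from the proposed signature.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Node:
    taxid: Optional[str] = None
    parent: Optional[str] = None
    rank: Optional[str] = None
    source: Optional[str] = None
    annotations: Dict[str, Any] = field(default_factory=dict)
    names: List[str] = field(default_factory=list)
    old_taxids: List[str] = field(default_factory=list)

    @property
    def primary_name(self):
        # Assumption: the first entry in names is the primary name.
        return self.names[0] if self.names else None

    @property
    def synonyms(self):
        # Everything after the first entry is treated as a synonym.
        return self.names[1:]


node = Node(taxid="s1", rank="species", names=["Examplea prima", "E. prima"])
```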

# module Bio.Taxonomy.NCBI

(All modules provide ``ranks``, and a function to load the taxonomy into a Taxonomy object.)

```ranks :: Bimap(String:Integer)```
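The proposal does not define Bimap. One plausible reading is a two-way mapping between rank names and levels, sketched below. The class and its lookup behaviour are assumptions, and the levels are arbitrary.

```python
class Bimap:
    """Hypothetical two-way mapping between rank names and integer levels."""

    def __init__(self, pairs):
        self._by_name = dict(pairs)
        self._by_level = {level: name for name, level in self._by_name.items()}
        if len(self._by_level) != len(self._by_name):
            raise ValueError("each level must belong to exactly one rank name")

    def __getitem__(self, key):
        # Names are strings and levels are integers, so the two key spaces
        # cannot collide and a single lookup method can serve both directions.
        if key in self._by_name:
            return self._by_name[key]
        return self._by_level[key]


ranks = Bimap([("genus", 5), ("species", 10)])
```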

```load_ncbi_taxonomy(taxdb, local=None, prefix=None) :: Taxonomy, local=String|None, prefix=String|None -> Taxonomy```

Inserts a copy of the NCBI taxonomy into *taxdb*.  If *local* is None, the taxonomy is downloaded from NCBI; otherwise *local* should be the path to a previously downloaded zip file, which is used instead.  If *prefix* is None, the taxids of the taxonomy are inserted as is.  Otherwise the given prefix is inserted before each taxid.
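The effect of *prefix* can be illustrated on rows in the style of the `nodes.dmp` file from the NCBI taxonomy dump, where fields are separated by a tab, a pipe, and a tab. The sample rows are made up and shortened to three fields, the helper name parse_nodes_dmp is hypothetical, and applying the prefix to the parent taxid as well is an assumption.

```python
def parse_nodes_dmp(lines, prefix=None):
    """Yield (taxid, parent, rank) tuples, with prefix applied to both taxids."""
    prefix = prefix or ""
    for line in lines:
        # Each field is followed by "\t|"; strip the leftover tabs afterwards.
        fields = [f.strip("\t") for f in line.rstrip("\n").split("\t|")]
        taxid, parent, rank = fields[0], fields[1], fields[2]
        yield (prefix + taxid, prefix + parent, rank)


# Made-up sample rows, shortened to the first three fields.
sample = [
    "1\t|\t1\t|\tno rank\t|\n",
    "2\t|\t1\t|\tgenus\t|\n",
]

plain = list(parse_nodes_dmp(sample))
prefixed = list(parse_nodes_dmp(sample, prefix="ncbi:"))
```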

# module Bio.Taxonomy.Greengenes, etc.