add details of how to run manually
DavidBSauer authored Oct 29, 2018
1 parent 6953c56 commit bd0b87e
Showing 1 changed file with 6 additions and 0 deletions.
6 changes: 6 additions & 0 deletions prediction/README.md
@@ -22,3 +22,9 @@ python3 prediction_pipeline.py regression_model_directory/ genomes_retrieved.txt
Note: Models for the same taxon in the regression model directory should be cleaned up prior to use so that all taxon models are non-redundant. Each model file should be named rank-taxon-xxx. Each model file is a tab-separated list of feature-coefficient pairs.

The final result will be in the file newly_predicted_OGTs.txt, listing each species, the predicted OGT, and the taxonomic model used for the prediction.

## Notes
If you are running the pipeline on your own genomes that are not available for download:
1. Place each genome in a folder in the prediction directory called "genomes/XXX/", where XXX is the name of the species. The species names are unimportant for this regression and can be placeholders. However, they should be unique, as they determine which genomes' features are averaged together.
2. Create a tab-separated file of genome and species pairs. Provide this file in place of genomes_retrieved.txt.
3. Create a file giving the taxonomic classification of each species. The top line needs to be "species", followed by all ranks for which a species will be classified, tab separated. Each species should then be listed on a new line, followed by its classification, all tab separated. Provide this file in place of species_taxonomic.txt.
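
The two input files described in steps 2 and 3 could be generated with a short script. The following is only a minimal sketch and not part of the pipeline: the function names are hypothetical, and it assumes that each line of the pairs file is the genome file name followed by the species name (the README does not state the column order, or whether a file name or a path is expected, so check this against a genomes_retrieved.txt produced by the pipeline). The ranks passed to `write_taxonomy` are whatever ranks your models use.

```python
import csv
from pathlib import Path


def write_genome_species_pairs(genomes_dir, out_path):
    """Write one tab-separated 'genome, species' line per genome file.

    Expects the layout genomes_dir/<species>/<genome file>.
    Column order (genome first) is an assumption, not confirmed by the README.
    """
    pairs = []
    for species_dir in sorted(Path(genomes_dir).iterdir()):
        if not species_dir.is_dir():
            continue
        for genome in sorted(species_dir.iterdir()):
            if genome.is_file():
                pairs.append((genome.name, species_dir.name))
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerows(pairs)
    return pairs


def write_taxonomy(classification, ranks, out_path):
    """Write the taxonomy table: a 'species' header plus one line per species.

    classification maps species name -> {rank: taxon}; ranks fixes the column order.
    """
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["species", *ranks])
        for species, taxa in sorted(classification.items()):
            writer.writerow([species, *(taxa[rank] for rank in ranks)])
```

For example, `write_genome_species_pairs("genomes", "my_genomes.txt")` would list every file under genomes/XXX/, and `write_taxonomy({"XXX": {"phylum": "..."}}, ["phylum"], "my_taxonomy.txt")` would write the matching classification table; the output file names here are placeholders.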
