add details of how to run manually

DavidBSauer · Oct 29, 2018 · bd0b87e
1 parent 6953c56
commit bd0b87e
Showing 1 changed file with 6 additions and 0 deletions.
diff --git a/prediction/README.md b/prediction/README.md
@@ -22,3 +22,9 @@ python3 prediction_pipeline.py regression_model_directory/ genomes_retrieved.txt
 note: Models for the same taxon in the regression model directory should be cleaned up prior to use so that all taxon models are non-redundant. Each model file should be named rank-taxon-xxx. Each model file is a list of feature-coefficient pairs, tab separated.
 
 The final result will be in the file newly_predicted_OGTs.txt, listing each species, the predicted OGT, and the taxonomic model used for the prediction.
+
+## Notes
+If you are running the pipeline on your own genomes that are not available for download:
+1. Place each genome in a folder in the prediction directory called "genomes/XXX/", where XXX is the name of the species. The species names are unimportant for this regression and can be placeholders. However, they should be unique, as they identify which genomes' features are averaged together.
+2. Create a tab-separated file of genome-species pairs. Provide this file in place of genomes_retrieved.txt.
+3. Create a file giving the taxonomic classification of each species. The first line must be "species" followed by every rank at which a species will be classified, tab separated. List each species on its own line, followed by its classification at each rank, also tab separated. Provide this file in place of species_taxonomic.txt.
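
The manual steps added in this hunk could be scripted roughly as follows. This is only a sketch under stated assumptions, not part of the commit: the helper names `write_genome_species_file` and `write_taxonomy_file` are hypothetical, the column order (genome file name first, species second) is a guess based on the phrase "genomes and species pairs", and whether the pipeline expects a bare file name or a path in the genome column is not stated in the README. Check both against an existing genomes_retrieved.txt before relying on it.

```python
import csv
import os


def write_genome_species_file(genomes_dir, out_path):
    """Write one tab-separated line per genome found under genomes_dir/<species>/."""
    rows = []
    for species in sorted(os.listdir(genomes_dir)):
        species_dir = os.path.join(genomes_dir, species)
        if not os.path.isdir(species_dir):
            continue  # skip stray files next to the species folders
        for genome in sorted(os.listdir(species_dir)):
            # ASSUMPTION: genome file name first, species name second.
            rows.append((genome, species))
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerows(rows)
    return rows


def write_taxonomy_file(classifications, ranks, out_path):
    """Write the header ("species" plus ranks), then one line per species.

    classifications maps species name -> {rank: taxon}; ranks fixes the
    column order. The rank names are supplied by the caller.
    """
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["species"] + list(ranks))
        for species in sorted(classifications):
            taxa = classifications[species]
            writer.writerow([species] + [taxa[rank] for rank in ranks])
```

For example, with a placeholder species folder `genomes/sp1/`, calling `write_genome_species_file("genomes", "my_genomes.txt")` and `write_taxonomy_file({"sp1": {"phylum": "P", "class": "C"}}, ["phylum", "class"], "my_taxonomy.txt")` would produce the two files to supply in place of genomes_retrieved.txt and species_taxonomic.txt. The output file names and the rank names here are illustrative only.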