This project is a search engine for protein functions. It is designed to take free-text queries and return ranked, relevant protein function results.
Pre-reqs:
- The program downloads a large amount of data from APIs.
- Please ensure that at least 2 GB of disk space is free.
- It takes a while to run the classification program.
- Solr version 6.4 is required for indexing.
- Three collections must be created in Solr: PF-WORDS, PF-Core, and PF-Intermediate.
- Ensure that the newly configured collections in Solr match the managed schema in the git repo.
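
The disk-space prerequisite above can be checked before starting the downloads. The following is a minimal sketch using only the Python standard library; the function name `has_enough_space` and the choice of the current directory as the download location are assumptions for illustration, not part of this project.

```python
import shutil

# 2 GB, per the prerequisite above (interpreted here as 2 * 1024**3 bytes).
REQUIRED_FREE_BYTES = 2 * 1024**3


def has_enough_space(path=".", required_bytes=REQUIRED_FREE_BYTES):
    """Return True if the filesystem holding `path` has at least `required_bytes` free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_bytes


if __name__ == "__main__":
    # Hypothetical usage: check the directory the scripts are run from.
    if has_enough_space():
        print("Enough free disk space for the downloads.")
    else:
        print("Less than 2 GB free; free up space before running.")
```

Pass a different `path` if the data is written to another drive or partition.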
The following libraries are required:
- pysolr (https://github.com/django-haystack/pysolr)
- sklearn (https://github.com/scikit-learn/scikit-learn)
- flask (https://github.com/pallets/flask)
- NLTK (http://www.nltk.org/install.html)
- numpy (https://www.scipy.org/install.html)
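
To confirm that the libraries listed above are installed, a quick check like the one below may help. It is a sketch, not part of the project: the helper name `missing_modules` is hypothetical, and the module names are the usual import names for these packages (scikit-learn is imported as `sklearn`).

```python
import importlib.util

# Import names for the libraries listed above.
REQUIRED_MODULES = ["pysolr", "sklearn", "flask", "nltk", "numpy"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the names from `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing libraries:", ", ".join(missing))
    else:
        print("All required libraries are installed.")
```

Any library reported as missing can typically be installed with pip, for example `pip install pysolr scikit-learn flask nltk numpy`.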
To run:
- Run the UI:
python3 FlaskContainer.py
Then go to: http://localhost:5000/test/
- Compile All Docs and Core Docs:
python3 AllDocs.py
- Classify documents (Train: SVM, NB | Predict: SVM):
python3 Classification.py
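
Assuming the steps above have indexed documents into the Solr collections, a collection can be sanity-checked by querying Solr's standard `select` handler directly. The sketch below only builds the request URL. The host and port (`localhost:8983`) are Solr's defaults and an assumption about this setup, the helper name `build_select_url` is hypothetical, and the query relies on whatever default search field the managed schema defines, since the project's field names are not listed here.

```python
from urllib.parse import urlencode

# Assumed Solr location: the default host and port of a local install.
SOLR_BASE_URL = "http://localhost:8983/solr"


def build_select_url(collection, query, rows=10, base_url=SOLR_BASE_URL):
    """Build a URL for Solr's select handler that returns JSON results."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base_url}/{collection}/select?{params}"


if __name__ == "__main__":
    # Example free-text query against the PF-Core collection.
    print(build_select_url("PF-Core", "kinase activity", rows=5))
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return a JSON response if the collection exists and Solr is running.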