This repository explores the translation of natural language questions into SQL queries for retrieving data from relational databases. The base model and algorithm are based on the Coarse2Fine repository.
This repo provides:
- inference files for running the Coarse2Fine model with new input questions over tables from WikiSQL,
- a sample Flask app that uses the inference files to serve the model, and
- a simplified implementation of execution guidance during SQL decoding, which improves the model's accuracy.
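Execution guidance, in its simplest form, means trying candidate queries against the table during decoding and preferring the highest-scoring one that actually executes and returns rows. A minimal sketch of that idea (the helper names here are illustrative, not the repo's actual decoder API):

```python
import sqlite3

def execute(sql, conn):
    """Try to run a candidate query; return its rows, or None on failure."""
    try:
        rows = conn.execute(sql).fetchall()
        return rows if rows else None  # an empty result also counts as a miss
    except sqlite3.Error:
        return None  # malformed / non-executable candidate

def execution_guided_pick(candidates, conn):
    """Return the best-scoring candidate SQL that executes successfully.

    `candidates` is a list of (score, sql) pairs sorted best-first,
    e.g. the output of beam search. Falls back to the top candidate
    if nothing executes.
    """
    for score, sql in candidates:
        if execute(sql, conn) is not None:
            return sql
    return candidates[0][1]

# Toy demo on an in-memory table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (Opponent TEXT, Result TEXT)")
conn.execute("INSERT INTO t VALUES ('New York Jets', 'w 20-13')")

beam = [
    (-0.1, "SELECT Result FROM t WHERE Opponent = 'New York Giants'"),  # runs, but empty
    (-0.4, "SELECT Result FROM t WHERE Opponent = 'New York Jets'"),    # runs with rows
]
print(execution_guided_pick(beam, conn))
```

Here the top beam candidate executes but returns no rows, so the guided decoder falls through to the second candidate.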
Slides from the presentation of this repo are available here, and the Flask app serving the model runs at www.nlp2sql.com:
You need:
- Stanford CoreNLP for data annotation
- spaCy for part-of-speech tagging of the natural language question
- PyTorch 0.2.0.post3, available from the previous-versions page of PyTorch (GPU build)
pip install -r requirements.txt
Download the pretrained model from here and unzip it into the `pretrained` folder in the repository root.
Use these commands for training (`preprocess.py` saves the preprocessed data files in `.pt` format):
cd src/
python preprocess.py
python train.py
You can modify the `config/model_config.json` file and run `run_model.py` to run inference on new input questions:
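For orientation, a config of this kind typically looks something like the sketch below; the actual field names in this repo may differ, so check `config/model_config.json` itself:

```json
{
  "model_path": "../pretrained/pretrain.pt",
  "data_path": "../data_model/",
  "beam_search": true,
  "gpu": 0
}
```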
Example:
python run_model.py -config_path "config/model_config.json" -question "what was the result of the game with New York Jets?"
Result:
SQL code: SELECT `Result` FROM table WHERE `Opponent` = New York Jets
Execution result: w 20-13
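The execution result comes from running the generated SQL against the question's WikiSQL table. The lookup can be reproduced on a toy table (the column names and cell values are taken from the example above; the table name and SQLite storage are assumptions for illustration):

```python
import sqlite3

# Build a tiny stand-in for the WikiSQL table from the example
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "table" ("Opponent" TEXT, "Result" TEXT)')
conn.executemany('INSERT INTO "table" VALUES (?, ?)', [
    ("New York Jets", "w 20-13"),
    ("Miami Dolphins", "l 7-21"),
])

# The generated query, with the string literal parameterized for execution
row = conn.execute(
    'SELECT "Result" FROM "table" WHERE "Opponent" = ?',
    ("New York Jets",),
).fetchone()
print(row[0])  # → w 20-13
```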
To evaluate the model on all questions from the WikiSQL test set:
cd src/
python evaluate.py -model_path ../pretrained/pretrain.pt -beam_search
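WikiSQL evaluation is usually reported as execution accuracy: the fraction of questions whose predicted query returns the same result as the gold query when both are executed. A minimal sketch of the metric (function names are illustrative, not this repo's API):

```python
def execution_accuracy(pairs, run):
    """Fraction of (pred_sql, gold_sql) pairs whose executions agree.

    `run` is any callable that executes a query and returns its result.
    """
    if not pairs:
        return 0.0
    correct = sum(1 for pred, gold in pairs if run(pred) == run(gold))
    return correct / len(pairs)

# Toy check with a fake executor mapping queries to canned results
results = {"A": [("x",)], "B": [("y",)]}
run = results.get
print(execution_accuracy([("A", "A"), ("A", "B")], run))  # → 0.5
```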