Please use the following citation:
@inproceedings{puzikov-gurevych-2018-e2e,
title = "{E}2{E} {NLG} Challenge: Neural Models vs. Templates",
author = "Puzikov, Yevgeniy and Gurevych, Iryna",
booktitle = "Proceedings of the 11th International Conference on Natural Language Generation",
month = nov,
year = "2018",
address = "Tilburg University, The Netherlands",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/W18-6557",
doi = "10.18653/v1/W18-6557",
pages = "463--471",
abstract = "E2E NLG Challenge is a shared task on generating restaurant descriptions from sets of key-value pairs. This paper describes the results of our participation in the challenge. We develop a simple, yet effective neural encoder-decoder model which produces fluent restaurant descriptions and outperforms a strong baseline. We further analyze the data provided by the organizers and conclude that the task can also be approached with a template-based model developed in just a few hours.",
}
Abstract:
E2E NLG Challenge is a shared task on generating restaurant descriptions from sets of key-value pairs. This paper describes the results of our participation in the challenge. We develop a simple, yet effective neural encoder-decoder model which produces fluent restaurant descriptions and outperforms a strong baseline. We further analyze the data provided by the organizers and conclude that the task can also be approached with a template-based model developed in just a few hours.
Contact person: Yevgeniy Puzikov, [email protected]
https://www.ukp.tu-darmstadt.de/
Don't hesitate to send us an e-mail or report an issue if something is broken (and it shouldn't be) or if you have further questions.
This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.
- Official website: http://www.macs.hw.ac.uk/InteractionLab/E2E/
- Submission deadline: 31 October 2017
- Evaluation protocol:
- automatic metrics for system development
- final (human) evaluation by crowd workers and experts
The repository contains code for an MLP-based encoder-decoder model and a template-based deterministic system:
- run_experiment.py: main script to run
- config_e2e_MLP_train.yaml and config_e2e_MLP_predict.yaml: configuration files to use with the script above
- components/: NN components and the template model
- predictions/:
  - e2e_model_MLP_seedXXX: 20 folders with predictions and scores from the NN model (one per different random seed)
  - model-t_predictions.txt: predictions of the template-based model
  - aggregate.py: a script to aggregate NN model scores
- 64-bit Linux versions
- Python 3 and dependencies in the environment.yaml file
- Install Python3 dependencies:
  $ conda env create -f environment.yaml
  This will create an Anaconda environment named e2e. To activate this environment, run:
  $ conda activate e2e
- Python2 dependencies are needed only to run the official evaluation scripts; see the installation instructions that come with those tools.
- Step 1
  The repository contains two template yaml files for training Model-D and using it later for prediction. Before using the files, run:
  $ envsubst < config.yaml > my_config.yaml
  This will replace shell format strings (e.g., $HOME) in your .yaml files with the values of the corresponding environment variables (see the envsubst documentation for details). Use my_config.yaml for the experiments.
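  For illustration, if the template config contained a hypothetical entry data_dir: $HOME/e2e-data (the key name is only an example, not necessarily one of the repository's actual config fields), the substitution would look like this for a user whose home directory is /home/alice:
  $ grep data_dir config.yaml
  data_dir: $HOME/e2e-data
  $ envsubst < config.yaml > my_config.yaml
  $ grep data_dir my_config.yaml
  data_dir: /home/alice/e2e-data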
- Step 2
  Modify the PYTHON2 and E2E_METRICS_FOLDER variables in the following file:
  components/evaluator/eval_scripts/run_eval.sh
  This shell script calls the external evaluation tools. PYTHON2 denotes a specific python environment with all the necessary dependencies installed; E2E_METRICS_FOLDER denotes the cloned repository with the aforementioned tools.
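  For example, the two assignments inside run_eval.sh might end up looking roughly like this (both paths are placeholders for your own setup, not values shipped with the repository):
  PYTHON2=/home/alice/anaconda3/envs/py2-eval/bin/python
  E2E_METRICS_FOLDER=/home/alice/e2e-eval-tools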
- Model-D:
  - Adjust data paths and hyper-parameter values in the config file (my_config.yaml, as a running example).
  - Run the following command:
    $ python run_experiment.py my_config.yaml
  - After the experiment, a folder will be created in the directory specified by the experiments_dir field of the my_config.yaml file. This folder should contain the following files:
    - experiment log (log.txt)
    - model weights and development set predictions for each training epoch (weights.epochX, predictions.epochX)
    - a csv file with scores and train/dev losses for each epoch (scores.csv)
    - configuration dictionary in json format (config.json)
    - pdf files with learning curves (optional)
  - If you use a model for prediction (by setting "predict" as the value of the mode field in the config file and specifying the model path in model_fn), the predictions made by the loaded model will be stored in:
    - $model_fn.devset.predictions.txt
    - $model_fn.testset.predictions.txt
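    As a rough illustration of the fields mentioned above, a prediction run might be configured with values along these lines (the exact field set is defined by the template yaml files; the paths and epoch number below are placeholders):
    experiments_dir: /home/alice/e2e-experiments
    mode: predict
    model_fn: /home/alice/e2e-experiments/my_run/weights.epoch5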
- Model-T:
  To make predictions on filename.txt, run the following command:
  $ python components/template-baseline.py filename.txt MODE
  Here, filename.txt is either the devset or the testset CSV file, and MODE is either 'dev' or 'test'.
  Model-T's predictions are saved in filename.txt.predicted.
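  For example, to produce predictions for the development set (the CSV file name below is a placeholder; point it at your local copy of the dev data):
  $ python components/template-baseline.py devset.csv dev
  The outputs would then appear in devset.csv.predicted.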
./predictions contains prediction files for 20 instances of Model-D, trained with different values of the random seed. Note that these are the predictions of the best-scoring epoch for each model. The folder also contains the predictions of Model-T (also on the dev set) and a Python script to aggregate the results.
Navigate to ./predictions/ and run:
$ python aggregate.py */scores.csv
This will output scores averaged over the 20 runs (with standard deviations and some other useful statistics).
After running the experiments, you should expect the following results (development set):
| Metric | TGen | Model-D | Model-T |
|---|---|---|---|
| BLEU | 0.6925 | 0.7128 (±0.013) | 0.6051 |
| NIST | 8.4781 | 8.5020 (±0.092) | 7.5257 |
| METEOR | 0.4703 | 0.4770 (±0.012) | 0.4678 |
| ROUGE-L | 0.7257 | 0.7378 (±0.015) | 0.6890 |
| CIDEr | 2.3987 | 2.4432 (±0.088) | 1.6997 |
- TGen: the baseline system provided by the organizers
- Model-D: our data-driven model (an encoder-decoder model with an MLP encoder)
- Model-T: our template-based system