Term project for the course PM Computational Semantics with Pictures at the Universität Potsdam in the summer semester 2020, taught by Prof. Dr. David Schlangen.
Developed by Alexander Koch, Meryem Karalioglu and Rodrigo Lopez Portillo Alcocer.
Automatic image captioning systems are increasingly relied upon, yet the captions they produce are often very generic and apply equally well to a multitude of images. Moreover, evaluating the quality of these captions is itself a difficult task. In this paper we evaluate the quality of automatically generated image captions by how discriminative they are. We trained an image captioning system, as well as a multimodal version of it, on the Flickr8k dataset and conducted experiments of varying levels of difficulty with both. Their implementation and theoretical foundations are described.
The generated captions turn out not to be sufficiently discriminative, as demonstrated by the retrieval evaluation method. We show that this is an applicable method of automatic evaluation.
To install Pytorch Lightning, check out their official GitHub repo here; the pip installation command for Test-tube can be found here.
In the `notebooks/` directory you can find some already-run IPython notebooks with the key parts of our models and evaluations. If you want to execute the ones related to our caption generator, please use those in `models/caption_generator/` and `models/level_generator`.

If you want to look more closely at the scripts and models we used, see the corresponding `.py` files in `models/`. Most of our generated data can be found in the `data/` directory.
Larger files, such as our neural networks' weights and checkpoints, can be downloaded from here.
The final weights and embedding for the multimodal model can be downloaded from here.
The versions of the Flickr8k and COCO val2014 datasets we used can be downloaded from Kaggle and cocodataset.org, respectively.
Pytorch Lightning makes Pytorch code device-independent. If you want to retrain or run some of our models on CPU, simply comment out the following arguments passed to the Trainer:
- gpus
- num_nodes
- auto_select_gpus
- distributed_backend
For a quick review of how to transform Pytorch code into Pytorch Lightning models, https://towardsdatascience.com/from-pytorch-to-pytorch-lightning-a-gentle-introduction-b371b7caaf09 is a good place to start.
If you need help adapting our models to non-SLURM computational clusters, please contact us at [email protected] or check the official Pytorch Lightning documentation at https://pytorch-lightning.readthedocs.io/en/latest/.