Term project for the course PM Computational Semantics with Pictures at the Universität Potsdam in the summer semester 2020, taught by Prof. Dr. David Schlangen.
Developed by Alexander Koch, Meryem Karalioglu and Rodrigo Lopez Portillo Alcocer.
Automatic image captioning systems are increasingly relied upon, yet the captions they produce are often very generic and apply equally well to a multitude of images. Moreover, evaluating the quality of these captions is itself a difficult task. In this paper we evaluate the quality of automatically generated image captions by how discriminative they are. We trained an image captioning system, as well as a multimodal version of it, on the Flickr8k dataset and conducted experiments of varying levels of difficulty with both. Their implementation and theoretical foundations are described.
The generated captions turn out not to be sufficiently discriminative, as demonstrated by the retrieval evaluation method. We show that this is an applicable method of automatic evaluation.
To install Pytorch Lightning, check out their official GitHub repo here; the pip installation command for Test-tube can be found here.
In the `notebooks/` directory you can find some already-run IPython notebooks with the key parts of our models and evaluations. If you want to execute the ones related to our caption generator, please use those in `models/caption_generator/` and `models/level_generator`.

If you want to look more closely at the scripts and models we used, see the corresponding `.py` files in `models/`. Most of our generated data can be found in the `data/` directory.
Larger files, such as our neural networks' weights and checkpoints, can be downloaded from here.
The final weights and embedding for the multimodal model can be downloaded from here.
The versions of the Flickr8k and COCO val2014 datasets we used can be downloaded from Kaggle and cocodataset.org, respectively.
Pytorch Lightning makes Pytorch code device-independent. If you want to retrain or run some of our models on CPU, simply comment out the following arguments passed to the Trainer:
- gpus
- num_nodes
- auto_select_gpus
- distributed_backend
For a quick review of how to transform Pytorch code into Pytorch Lightning models, https://towardsdatascience.com/from-pytorch-to-pytorch-lightning-a-gentle-introduction-b371b7caaf09 is a good place to start.
If you need help adapting our models to non-SLURM computational clusters, please contact us at [email protected] or check the official Pytorch Lightning documentation at https://pytorch-lightning.readthedocs.io/en/latest/.