
Evaluation of Neural Image Captions Based on Caption-Image Retrieval

Term project for the course PM Computational Semantics with Pictures at the Universität Potsdam in the summer semester 2020, taught by Prof. Dr. David Schlangen.

Developed by Alexander Koch, Meryem Karalioglu and Rodrigo Lopez Portillo Alcocer.

Abstract

A growing reliance on automatic image captioning systems can currently be observed, yet the captions such systems produce are often very generic and apply equally well to a multitude of images. Furthermore, evaluating the quality of these captions is a difficult task. In this paper we evaluate the quality of automatically generated image captions by how discriminative they are, that is, by whether a caption retrieves its own image from a set of distractors. We trained an image captioning system, and a multimodal version of it, on the Flickr8K dataset, and used both to conduct experiments with varying levels of difficulty. Their implementation and theoretical foundations are described.

The generated captions turn out not to be sufficiently discriminative, as the retrieval evaluation method demonstrates. We show that this retrieval-based procedure is a workable method of automatic caption evaluation.

Setup

To install Pytorch Lightning, check out its official GitHub repo here; Test-tube's pip installation command can be found here.
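(Both packages are also published on PyPI, so a command along the lines of `pip install pytorch-lightning test_tube` should typically work; to run this 2020 code unchanged, you may need to pin package versions from around that time.)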

Demos

In the notebooks/ directory you can find pre-executed IPython notebooks showing the key parts of our models and evaluations. If you want to execute the ones related to our caption generator, please use the ones in models/caption_generator/ and models/level_generator.
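(Re-running the notebooks requires a working Jupyter installation; launching `jupyter notebook` from the repository root lets you open any of them.)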

Looking a bit deeper

If you want to look more closely at the scripts and models we used, see the corresponding .py files in models/.

Reproducibility

Data

Most of our generated data can be found in the data/ directory.
Larger files, such as our neural networks' weights and checkpoints, can be downloaded from here.
The final weights and embedding for the multimodal model can be downloaded from here.
The versions of the Flickr8k and COCO val2014 datasets that we used can be downloaded from kaggle and cocodataset.org, respectively.

Training

Pytorch Lightning makes Pytorch code device-independent. If you want to retrain or run some of our models on CPU, simply comment out the following arguments of the Trainer constructor (see the sketch after this list):

  • gpus
  • num_nodes
  • auto_select_gpus
  • distributed_backend
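As an illustration, here is a minimal, hypothetical sketch of such a Trainer call. The tiny module and its random data are placeholders rather than our actual captioning models, and the four GPU-related arguments follow the Pytorch Lightning API as of 2020 (later releases renamed some of them, e.g. distributed_backend):

```python
import torch
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset


class TinyModule(pl.LightningModule):
    """Placeholder model; any LightningModule is configured the same way."""

    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(8, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.mse_loss(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)


# Dummy data so the sketch is self-contained.
train_loader = DataLoader(
    TensorDataset(torch.randn(64, 8), torch.randn(64, 1)), batch_size=16
)

trainer = pl.Trainer(
    max_epochs=1,
    # --- comment out the next four arguments to run on CPU ---
    gpus=1,                      # GPUs per node
    num_nodes=1,                 # machines in the cluster
    auto_select_gpus=True,       # pick free GPUs automatically
    distributed_backend="ddp",   # distributed strategy
)
trainer.fit(TinyModule(), train_loader)
```

With the four arguments commented out, the same script trains on CPU without any other change; that device independence is the point of moving the code into Lightning.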

Other

For a quick review of how to transform Pytorch code into Pytorch Lightning models, https://towardsdatascience.com/from-pytorch-to-pytorch-lightning-a-gentle-introduction-b371b7caaf09 is a good place to start.
If you need help adapting our models to non-SLURM computational clusters, please contact us at [email protected] or consult the official Pytorch Lightning documentation at https://pytorch-lightning.readthedocs.io/en/latest/.
