Image Caption Generator

Final project for the Deep Learning 2022 course at Skoltech

Team members:

Farid Davletshin
Fakhriddin Tojiboev
Albert Sayapin
Olga Gorbunova
Evgeniy Garsiya
Hai Le
Lina Bashaeva
Dmitriy Gilyov

Environment

We use the conda package manager to install the required Python packages. To improve the speed and reliability of package version resolution, we advise using mamba-forge (see its installation instructions), which works on top of conda. Once mamba is installed, run the following command from the root of the repository:

mamba env create -f environment.yml

This creates a new environment named img_caption with most of the required packages already installed. You can install additional packages by running:

mamba install <package name>

Then activate the environment and install the PyTorch libraries:

conda activate img_caption

conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch

conda install -c pytorch torchtext
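After the installation it can help to confirm that the packages are visible from inside the img_caption environment. The following is a minimal sketch, not part of the repository: the helper names check_packages and cuda_available are hypothetical, and the package list simply mirrors the install commands above.

```python
import importlib.util

# Packages named in the install commands above.
PACKAGES = ["torch", "torchvision", "torchaudio", "torchtext"]


def check_packages(names):
    """Map each top-level package name to True if it can be found for import."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def cuda_available():
    """Return True only if torch is installed and reports a usable CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch  # imported lazily so the check also works without torch
    return torch.cuda.is_available()


if __name__ == "__main__":
    for name, installed in check_packages(PACKAGES).items():
        print(f"{name}: {'installed' if installed else 'missing'}")
    print(f"CUDA available: {cuda_available()}")
```

If any package is reported as missing, make sure the img_caption environment is active before re-running the install commands.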

To read and run the Jupyter notebooks, choose either of two options:

[recommended] Use the notebook-compatibility features of a modern IDE, e.g. the Python and Jupyter extensions for VS Code.
Install the Jupyter notebook packages, either with mamba install jupyterlab or with mamba install jupyter notebook.

Note: If you prefer to use conda, just replace mamba commands with conda, e.g. instead of mamba install use conda install.

General setup

Clone this repository

$ git clone https://github.com/tojiboyevf/image_captioning.git

Move to the project's directory and download the Flickr8k and COCO 2014 datasets and the GloVe embeddings

$ cd image_captioning
$ bash load_flickr8k.sh
$ bash load_glove.sh
$ bash load_coco.sh
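GloVe embeddings are distributed as plain text, one token per line followed by its space-separated vector components. The sketch below shows one way to read such a file into a dictionary. It is an illustration only and not the code in the repository's glove.py; the function names parse_glove_lines and load_glove are hypothetical, and the path of the downloaded file depends on load_glove.sh.

```python
def parse_glove_lines(lines, vocab=None):
    """Parse GloVe-format lines into a dict mapping word -> list of floats.

    If vocab is given, only words contained in it are kept.
    """
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip empty or malformed lines
        word, values = parts[0], parts[1:]
        if vocab is not None and word not in vocab:
            continue
        vectors[word] = [float(v) for v in values]
    return vectors


def load_glove(path, vocab=None):
    """Read a GloVe text file from path (the location is an assumption of the caller)."""
    with open(path, encoding="utf-8") as f:
        return parse_glove_lines(f, vocab)
```

Restricting the result to the caption vocabulary through the vocab argument keeps memory use low, since the full GloVe files contain far more words than a captioning dataset typically needs.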

Quick start

If you want to retrain our models or inspect the evaluation results, see the examples folder.

Open any notebook from there and follow the instructions inside.

Evaluation results

Link to the report

Flickr8k

Model	Split	BLEU-1	BLEU-2	BLEU-3	BLEU-4
vgg16 + lstm	`train` `val` `test`	`55.53` `55.14` `55.41`	`34.94` `34.42` `34.34`	`21.94` `21.36` `21.13`	`14.02` `13.47` `13.29`
vgg16 + transformer	`train` `val` `test`	`53.13` `52.79` `52.76`	`33.63` `33.07` `33.04`	`21.01` `20.13` `20.27`	`13.21` `12.31` `12.38`
densenet161 + lstm	`train` `val` `test`	`55.05` `55.18` `55.27`	`31.18` `31.23` `30.76`	`17.79` `17.75` `17.11`	`10.84` `10.78` `10.23`
densenet161 + transformer	`train` `val` `test`	`69.55` `65.71` `65.98`	`49.93` `44.46` `44.79`	`35.55` `29.94` `30.04`	`25.03` `20.13` `19.75`
DeiT + lstm	`train` `val` `test`	`56.06` `53.23` `53.48`	`34.40` `30.86` `31.06`	`20.97` `17.62` `17.61`	`13.24` `10.91` `10.61`
DeiT + transformer	`train` `val` `test`	`70.43` `62.71` `62.57`	`53.22` `43.71` `44.09`	`42.16` `34.58` `35.11`	`35.15` `29.32` `29.80`
inceptionV3 + transformer	`train` `val` `test`	`61.44` `60.37` `60.19`	`41.09` `39.84` `39.19`	`27.52` `26.26` `25.70`	`18.29` `17.25` `16.70`
resnet34 + transformer	`train` `val` `test`	`67.23` `63.33` `63.70`	`48.05` `42.58` `42.92`	`34.08` `28.69` `29.19`	`23.84` `19.22` `19.51`

COCO val2014

Model	BLEU-1	BLEU-2	BLEU-3	BLEU-4
vgg16 + lstm	`46.71`	`23.75`	`12.25`	`8.39`
vgg16 + transformer	`50.24`	`27.14`	`16.10`	`8.80`
densenet161 + lstm	`49.33`	`23.25`	`11.70`	`9.46`
densenet161 + transformer	`55.38`	`30.71`	`17.09`	`9.79`
DeiT + lstm	`45.73`	`22.04`	`11.14`	`9.12`
DeiT + transformer	`53.09`	`29.76`	`16.92`	`9.95`
inceptionV3 + transformer	`49.14`	`26.49`	`14.21`	`8.11`
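
For readers unfamiliar with the metric, the sketch below computes a sentence-level BLEU-n score from clipped n-gram precisions and a brevity penalty. It is a simplified illustration and not the implementation in the repository's metrics.py. It assumes cumulative BLEU-n with uniform weights and no smoothing, and it returns values in the range 0 to 1, whereas the tables above appear to use a 0 to 100 scale.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of the candidate against several references."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    # For each n-gram, the maximum count seen in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU with uniform weights over 1..max_n grams, no smoothing."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # without smoothing, one zero precision zeroes the score
    # Reference length closest to the candidate length (shorter one on ties).
    closest = min((len(r) for r in references),
                  key=lambda length: (abs(length - len(candidate)), length))
    bp = 1.0 if len(candidate) > closest else math.exp(1 - closest / len(candidate))
    log_avg = sum(math.log(p) for p in precisions) / max_n
    return bp * math.exp(log_avg)


if __name__ == "__main__":
    reference = "a dog runs on the grass".split()
    print(bleu("a dog runs on the grass".split(), [reference], max_n=4))
    print(bleu("a dog runs".split(), [reference], max_n=1))
```

In the second call every unigram of the short caption matches, so the score is lowered only by the brevity penalty.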
