Final project for the Deep Learning 2022 course at Skoltech
Team members:
- Farid Davletshin
- Fakhriddin Tojiboev
- Albert Sayapin
- Olga Gorbunova
- Evgeniy Garsiya
- Hai Le
- Lina Bashaeva
- Dmitriy Gilyov
We use the conda package manager to install the required Python packages. To improve the speed and reliability of package version resolution, we advise using mamba-forge (installation), which works on top of conda. Once mamba is installed, run the following command from the root of the repository:
mamba env create -f environment.yml
This will create a new environment named img_caption with many of the required packages already installed. You can install additional packages by running:
mamba install <package name>
To install the PyTorch library, run the following commands:
conda activate img_caption
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
conda install -c pytorch torchtext
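After the install commands above, a quick check can confirm that the packages are visible from the img_caption environment. The following is a minimal sketch, not part of the repository: the helper names `check_packages` and `report` are hypothetical, and the package list simply mirrors the install commands above.

```python
import importlib.util

# Package names taken from the install commands above.
REQUIRED = ["torch", "torchvision", "torchaudio", "torchtext"]


def check_packages(names=REQUIRED):
    """Map each package name to True if it can be found in this environment."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def report():
    """Print the status of each package and, if torch is present, CUDA availability."""
    status = check_packages()
    for name, found in status.items():
        print(f"{name}: {'found' if found else 'MISSING'}")
    if status["torch"]:
        import torch  # imported lazily so the check also works without torch

        print("torch version:", torch.__version__)
        print("CUDA available:", torch.cuda.is_available())
    return status
```

Calling `report()` from a Python session started inside the activated environment lists any package that still needs to be installed.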
To read and run the Jupyter notebooks, you may follow either of two options:
- [recommended] Use the notebook-compatibility features of a modern IDE, e.g. via the python and jupyter extensions of VS Code.
- Install the Jupyter notebook packages, either with
mamba install jupyterlab
or with
mamba install jupyter notebook
Note: If you prefer to use conda, just replace the mamba commands with conda, e.g. use conda install instead of mamba install.
- Clone this repository
$ git clone https://github.com/tojiboyevf/image_captioning.git
- Move to the project directory and download the Flickr8k and COCO_2014 datasets and the GloVe embeddings
$ cd image_captioning
$ bash load_flickr8k.sh
$ bash load_glove.sh
$ bash load_coco.sh
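The three download steps above can also be driven from Python, which makes it easy to stop at the first failure. This is a rough sketch under assumptions: the script names come from the commands above, while the helpers `missing_scripts` and `download_all` are hypothetical and not part of the repository.

```python
import subprocess
from pathlib import Path

# Script names as they appear in the commands above.
SCRIPTS = ["load_flickr8k.sh", "load_glove.sh", "load_coco.sh"]


def missing_scripts(repo_root="."):
    """Return the download scripts that are not present under repo_root."""
    root = Path(repo_root)
    return [name for name in SCRIPTS if not (root / name).is_file()]


def download_all(repo_root="."):
    """Run each download script in order, stopping at the first failure."""
    missing = missing_scripts(repo_root)
    if missing:
        raise FileNotFoundError(f"scripts not found in {repo_root}: {missing}")
    for name in SCRIPTS:
        # check=True raises CalledProcessError if a script exits with a non-zero status.
        subprocess.run(["bash", name], cwd=repo_root, check=True)
```

Running `download_all("image_captioning")` from the parent directory of the clone would be equivalent to the three bash commands, assuming the scripts sit in the repository root.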
If you want to retrain our models or inspect the evaluation results, see the examples folder. Open any notebook there and follow the instructions inside.
Link to the report
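model | split | bleu 1 | bleu 2 | bleu 3 | bleu 4 |
---|---|---|---|---|---|
vgg16 + lstm | train | 55.53 | 34.94 | 21.94 | 14.02 |
vgg16 + lstm | val | 55.14 | 34.42 | 21.36 | 13.47 |
vgg16 + lstm | test | 55.41 | 34.34 | 21.13 | 13.29 |
vgg16 + transformer | train | 53.13 | 33.63 | 21.01 | 13.21 |
vgg16 + transformer | val | 52.79 | 33.07 | 20.13 | 12.31 |
vgg16 + transformer | test | 52.76 | 33.04 | 20.27 | 12.38 |
densenet161 + lstm | train | 55.05 | 31.18 | 17.79 | 10.84 |
densenet161 + lstm | val | 55.18 | 31.23 | 17.75 | 10.78 |
densenet161 + lstm | test | 55.27 | 30.76 | 17.11 | 10.23 |
densenet161 + transformer | train | 69.55 | 49.93 | 35.55 | 25.03 |
densenet161 + transformer | val | 65.71 | 44.46 | 29.94 | 20.13 |
densenet161 + transformer | test | 65.98 | 44.79 | 30.04 | 19.75 |
DeiT + lstm | train | 56.06 | 34.40 | 20.97 | 13.24 |
DeiT + lstm | val | 53.23 | 30.86 | 17.62 | 10.91 |
DeiT + lstm | test | 53.48 | 31.06 | 17.61 | 10.61 |
DeiT + transformer | train | 70.43 | 53.22 | 42.16 | 35.15 |
DeiT + transformer | val | 62.71 | 43.71 | 34.58 | 29.32 |
DeiT + transformer | test | 62.57 | 44.09 | 35.11 | 29.80 |
inceptionV3 + transformer | train | 61.44 | 41.09 | 27.52 | 18.29 |
inceptionV3 + transformer | val | 60.37 | 39.84 | 26.26 | 17.25 |
inceptionV3 + transformer | test | 60.19 | 39.19 | 25.70 | 16.70 |
resnet34 + transformer | train | 67.23 | 48.05 | 34.08 | 23.84 |
resnet34 + transformer | val | 63.33 | 42.58 | 28.69 | 19.22 |
resnet34 + transformer | test | 63.70 | 42.92 | 29.19 | 19.51 |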

model | bleu 1 | bleu 2 | bleu 3 | bleu 4 |
---|---|---|---|---|
vgg16 + lstm | 46.71 | 23.75 | 12.25 | 8.39 |
vgg16 + transformer | 50.24 | 27.14 | 16.10 | 8.80 |
densenet161 + lstm | 49.33 | 23.25 | 11.70 | 9.46 |
densenet161 + transformer | 55.38 | 30.71 | 17.09 | 9.79 |
DeiT + lstm | 45.73 | 22.04 | 11.14 | 9.12 |
DeiT + transformer | 53.09 | 29.76 | 16.92 | 9.95 |
inceptionV3 + transformer | 49.14 | 26.49 | 14.21 | 8.11 |
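
For readers unfamiliar with the metric, the bleu 1 to bleu 4 columns above correspond to BLEU with n-grams up to length 1 to 4. The sketch below is a plain-Python, sentence-level illustration of the standard definition (clipped n-gram precision, geometric mean, brevity penalty, no smoothing). It is an assumption-laden illustration, not the evaluation code used in the notebooks, which may rely on a library implementation and corpus-level aggregation; the function names are hypothetical. The tables appear to report scores scaled by 100, whereas this sketch returns values in [0, 1].

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of the candidate against several references."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    # For each n-gram, the largest count seen in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU with uniform weights over 1..max_n and no smoothing."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # without smoothing, one zero precision zeroes the score
    # Reference length closest to the candidate length (ties go to the shorter).
    ref_len = min((abs(len(r) - len(candidate)), len(r)) for r in references)[1]
    cand_len = len(candidate)
    brevity = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return brevity * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

For example, `bleu("a dog runs".split(), ["a dog runs on grass".split()], max_n=2)` scores a short but otherwise exact caption; the brevity penalty lowers the result below 1 even though every n-gram matches.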