Image-Captioning-PyTorch

An attention-based sequential deep learning model implemented in PyTorch to generate a single-line caption for an input image.

This repo contains code to preprocess, train and evaluate sequence models on the Flickr8k image dataset in PyTorch. It was part of a Deep Learning project for the Machine Learning Sessional course of the Department of CSE, BUET, for the January 2020 session.

Models experimented with:

  • Pretrained CNN encoder & LSTM-based decoder
    • VGG-16, Inception-v3, ResNet-50, ResNet-101, ResNeXt-101, DenseNet-201
  • Pretrained ResNet-101 encoder & LSTM decoder with attention mechanism (see the sketch below)
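A minimal sketch of the attention-based decoder idea follows. It is illustrative only: the class name, dimensions and variable names are assumptions, not the exact modules used in this repo.

```python
import torch
import torch.nn as nn


class AttentionDecoderSketch(nn.Module):
    """Illustrative soft-attention LSTM decoder over CNN feature maps."""

    def __init__(self, feat_dim=2048, embed_dim=256, hidden_dim=512,
                 vocab_size=5000, attn_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Additive (Bahdanau-style) attention over spatial image features
        self.feat_proj = nn.Linear(feat_dim, attn_dim)
        self.hidden_proj = nn.Linear(hidden_dim, attn_dim)
        self.attn_score = nn.Linear(attn_dim, 1)
        self.lstm_cell = nn.LSTMCell(embed_dim + feat_dim, hidden_dim)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, feats, captions):
        # feats: (B, R, feat_dim) spatial features from a pretrained CNN encoder
        # captions: (B, T) token ids of the ground-truth captions (teacher forcing)
        B, T = captions.shape
        h = feats.new_zeros(B, self.lstm_cell.hidden_size)
        c = feats.new_zeros(B, self.lstm_cell.hidden_size)
        emb = self.embed(captions)                      # (B, T, embed_dim)
        outputs = []
        for t in range(T):
            # Attention weights over the R image regions, conditioned on the hidden state
            scores = self.attn_score(torch.tanh(
                self.feat_proj(feats) + self.hidden_proj(h).unsqueeze(1)))  # (B, R, 1)
            alpha = torch.softmax(scores, dim=1)
            context = (alpha * feats).sum(dim=1)        # (B, feat_dim)
            h, c = self.lstm_cell(torch.cat([emb[:, t], context], dim=1), (h, c))
            outputs.append(self.fc(h))                  # (B, vocab_size)
        return torch.stack(outputs, dim=1)              # (B, T, vocab_size)
```

In the attention variant, `feats` would come from the final convolutional layer of a pretrained ResNet-101, flattened to (batch, regions, channels); the plain encoder-decoder models instead feed a single pooled feature vector to the LSTM.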

Open the Pretrained Attention Model's Notebook or the Pretrained MonoLSTM Model's Notebook in Colab and execute the cells from top to bottom.

Prerequisites:

Data folder structure expected for training with train_torch.py or train_attntn.py:

data/
    flickr8k/
        Flicker8k_Dataset/
            *.jpg
        Flickr8k_text/
            Flickr8k.token.txt
            Flickr_8k.devImages.txt
            Flickr_8k.testImages.txt
            Flickr_8k.trainImages.txt
    glove.6B/
        glove.6B.50d.txt
        glove.6B.100d.txt
        glove.6B.200d.txt
        glove.6B.300d.txt
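
The glove.6B files are plain-text GloVe word vectors used to initialise the decoder's word embeddings. A minimal sketch of how such a file can be turned into an embedding matrix is shown below; the function name, vocabulary and usage are illustrative assumptions, not the repo's actual loading code.

```python
import numpy as np
import torch


def load_glove_embeddings(path, word2idx, dim=100):
    """Build an embedding matrix from a GloVe text file for a given vocabulary."""
    # Words missing from GloVe keep a small random initialisation
    matrix = np.random.normal(scale=0.1, size=(len(word2idx), dim)).astype(np.float32)
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            word, vec = parts[0], parts[1:]
            if word in word2idx and len(vec) == dim:
                matrix[word2idx[word]] = np.asarray(vec, dtype=np.float32)
    return torch.from_numpy(matrix)


# Example usage with a toy vocabulary (illustrative)
vocab = {"<pad>": 0, "<start>": 1, "<end>": 2, "a": 3, "dog": 4}
emb = load_glove_embeddings("data/glove.6B/glove.6B.100d.txt", vocab, dim=100)
# emb can then initialise nn.Embedding.from_pretrained(emb, freeze=False)
```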

Pretrained Models:
Some pretrained weights are provided here.
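
A hedged sketch of how such weights can typically be loaded in PyTorch follows; the checkpoint filename and its internal structure are assumptions, so check the notebooks for the exact keys.

```python
import torch

# Illustrative only: the actual checkpoint name and contents may differ.
checkpoint = torch.load("attention_model.pt", map_location="cpu")
# Handle both a raw state_dict and a dict that wraps one under "state_dict"
state_dict = checkpoint.get("state_dict", checkpoint)
# model.load_state_dict(state_dict)  # `model` is the instantiated encoder/decoder
```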

BLEU score comparison of trained models: (see the comparison chart image in the repo)
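
BLEU here is the standard corpus-level metric comparing generated captions against the reference captions. A minimal sketch using NLTK is shown below; the tokenised data is a toy example and this is not the repo's evaluation script.

```python
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# Each image has several reference captions and one generated caption (toy example)
references = [
    [["a", "dog", "runs", "on", "grass"], ["a", "dog", "running", "outside"]],
]
hypotheses = [["a", "dog", "runs", "on", "the", "grass"]]

smooth = SmoothingFunction().method1
bleu1 = corpus_bleu(references, hypotheses, weights=(1.0, 0, 0, 0),
                    smoothing_function=smooth)
bleu4 = corpus_bleu(references, hypotheses, weights=(0.25, 0.25, 0.25, 0.25),
                    smoothing_function=smooth)
print(f"BLEU-1: {bleu1:.3f}  BLEU-4: {bleu4:.3f}")
```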
