- Authors: Youngjae Yu*, Jiwan Chung*, Heeseung Yun, Jongseok Kim, Gunhee Kim
- Paper: CVPR 2021 (pdf, slide, video)
PyTorch code for the CVPR 2021 paper "Transitional Adaptation of Pretrained Models for Visual Storytelling".
We propose an explicit visual adaptation step to harmonize the visual encoder with the pretrained language models. Our simple adaptation objective aims to bridge the gap between the nature of the information stored in the visual encoder and the language decoder.
- Python 3.7
- PyTorch 1.5

The other dependencies are specified in the requirements.txt file.
git clone $THIS_REPO
cd $THIS_REPO
pip install -r requirements_primary.txt
pip install -r requirements.txt
python -c "import stanfordnlp; stanfordnlp.download('en_ewt')"
Store the datasets in $THIS_REPO/data, e.g. data/LSMDC and data/VIST.
For detailed instructions on how to extract the relevant features, please refer to our guide on Dataset Preparation.
Please follow the instructions on Download to download the LSMDC dataset.
From the downloaded files, extract the task1 folder and move it under the $THIS_REPO/data/LSMDC directory.
The above link provides two sets of features: I3D and ResNet152.
Extract both and move them under the $THIS_REPO/data/LSMDC/features directory.
We also provide alternative features extracted with ResNeXt (Download). Note that reproducing our results requires these features rather than the official ones.
Please follow the instructions on Download to download the VIST dataset.
Download the Stories of Images-in-Sequence (SIS) set, then extract the folder and move it under the $THIS_REPO/data/VIST directory, e.g. data/VIST/sis.
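Before extracting features, it can help to confirm that the folders mentioned so far are in place. The following is a minimal sketch, not part of the repository: `EXPECTED_DIRS` and `missing_dirs` are hypothetical names, and the folder list is assembled only from the paths named in this README (with `sis` taken from the example path).

```python
from pathlib import Path

# Folders named in the instructions above, relative to the repository root.
# Assumption: "sis" is the extracted SIS folder name, as in the example path.
EXPECTED_DIRS = [
    "data/LSMDC/task1",
    "data/LSMDC/features",
    "data/VIST/sis",
]


def missing_dirs(repo_root, expected=EXPECTED_DIRS):
    """Return the expected folders that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).is_dir()]


if __name__ == "__main__":
    # Run from the repository root; prints nothing when everything is present.
    for rel in missing_dirs("."):
        print(f"missing: {rel}")
```

If you only work with one of the two datasets, pass a shorter list as `expected`.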
The above link contains the raw image files.
Use ResNet152 pretrained on ImageNet to extract features for each image.
Store the features with numpy.save following the structure below.
resnet/
  train/
    {image_id}.npy
  test/
  val/
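The extraction and storage step above could be sketched as follows. This is an illustration under stated assumptions, not the authors' extraction script: the helper names (`feature_path`, `save_feature`, `extract_resnet152`) are hypothetical, `root` is whichever folder holds `resnet/`, and using the 2048-dimensional pooled output with standard ImageNet preprocessing is an assumption, since the README does not say which layer or preprocessing was used.

```python
from pathlib import Path

import numpy as np


def feature_path(root, split, image_id):
    """Return <root>/resnet/<split>/<image_id>.npy, following the layout above."""
    if split not in ("train", "val", "test"):
        raise ValueError(f"unknown split: {split}")
    return Path(root) / "resnet" / split / f"{image_id}.npy"


def save_feature(root, split, image_id, feature):
    """Save one image's feature with numpy.save, creating folders as needed."""
    path = feature_path(root, split, image_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    np.save(path, np.asarray(feature, dtype=np.float32))
    return path


def extract_resnet152(image_file):
    """Assumption: pooled 2048-d feature from torchvision's ImageNet ResNet152."""
    # Imported lazily so the path helpers above work without torch installed.
    import torch
    from PIL import Image
    from torchvision import models, transforms

    # `pretrained=True` matches older torchvision releases; newer ones prefer `weights=`.
    model = models.resnet152(pretrained=True)
    model.fc = torch.nn.Identity()  # drop the classifier, keep pooled features
    model.eval()
    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])
    image = preprocess(Image.open(image_file).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return model(image).squeeze(0).numpy()
```

A typical call would be `save_feature(root, "train", image_id, extract_resnet152(image_file))` for each image. In practice you would build the model once and reuse it rather than reloading it per image.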
Use a Faster-RCNN model to extract object classification logits.
Store the features with numpy.save following the structure below.
rcnn/
  train/
    {image_id}.npy
  test/
  val/
Use the ViLBERT model to extract the last hidden state vector.
Store the features with pickle.dump following the structure below.
rcnn/
  train/
    {album_id}/
      {image_id}.pickle
  test/
  val/
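The per-album pickle layout above differs from the flat `.npy` layouts, so a small helper may be useful. This is a minimal sketch using only the standard library: `save_pickled_feature` and `load_pickled_feature` are hypothetical names, and `feature_dir` stands for the top-level feature folder shown above, passed in explicitly rather than hard-coded.

```python
import pickle
from pathlib import Path


def save_pickled_feature(feature_dir, split, album_id, image_id, feature):
    """Write <feature_dir>/<split>/<album_id>/<image_id>.pickle with pickle.dump."""
    path = Path(feature_dir) / split / str(album_id) / f"{image_id}.pickle"
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(feature, f)
    return path


def load_pickled_feature(path):
    """Read a feature back, e.g. to check that it was written correctly."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

The object you pass as `feature` would be the last hidden state produced by your ViLBERT setup; how to obtain it depends on the ViLBERT implementation you use and is not shown here.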
cd code
python cli.py train with model=no_gt_sos fix_gpt_epoch=5 feature_names="['video', 'images']"
python cli.py train with model=no_gt_sos fix_gpt_epoch=3 feature_names="['images', 'box']" use_vist=True
With additional ViLBERT features:
cd code
python cli.py train with model=no_gt_sos fix_gpt_epoch=3 feature_names="['images', 'box', 'vilbert']" use_vist=True
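The training commands above differ only in a few `key=value` options, and `feature_names` is passed as a quoted Python-style list. If you launch runs from a script, the argument list can be assembled as in the sketch below. `build_train_command` is a hypothetical helper, not part of the repository, and it covers only the options that appear in the commands above.

```python
def build_train_command(feature_names, fix_gpt_epoch, use_vist=False,
                        model="no_gt_sos"):
    """Build the argv list for `python cli.py train with ...` as shown above."""
    # The feature list is passed as one argument holding a Python-literal string.
    names = "[" + ", ".join(f"'{name}'" for name in feature_names) + "]"
    argv = [
        "python", "cli.py", "train", "with",
        f"model={model}",
        f"fix_gpt_epoch={fix_gpt_epoch}",
        f"feature_names={names}",
    ]
    if use_vist:
        argv.append("use_vist=True")
    return argv


if __name__ == "__main__":
    argv = build_train_command(["images", "box", "vilbert"],
                               fix_gpt_epoch=3, use_vist=True)
    print(argv)
    # To launch it, run from the repository root (uses the `code` folder above):
    # import subprocess; subprocess.run(argv, cwd="code", check=True)
```

Passing the list to `subprocess.run` without a shell avoids the extra layer of quoting needed on the command line.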
python cli.py scripts with script=[SCRIPT_NAME] (additional args)
Please take a look at the config.py file for more options.