This is an implementation of the papers "On Diversity in Image Captioning: Metrics and Methods" and "Towards Diverse and Accurate Image Captions via Reinforcing Determinantal Point Process".
Python 2.7
PyTorch 0.4
Java 1.8
Please refer to README-DISC.md to train the Att2in model. Note that it is enough to train the model with the cross-entropy loss for 30 epochs.
XE denotes the cross-entropy loss, CIDEr denotes the CIDEr reward, DISC denotes the retrieval reward proposed in DISCCap, DIV denotes the Self-CIDEr diversity reward, and DPP denotes the determinantal point process reward. The DPP reward combines quality (CIDEr) and diversity (Self-CIDEr) into one matrix, whose determinant reflects both.
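To make the DPP idea concrete, here is a minimal sketch, assuming the common quality-diversity decomposition L[i][j] = q_i * S[i][j] * q_j, where q_i is a caption's quality score and S is a pairwise similarity matrix. The function names and toy values are hypothetical, and the kernel actually built by this repository may differ.

```python
def dpp_kernel(quality, similarity):
    """Build L[i][j] = q_i * S[i][j] * q_j (assumed decomposition)."""
    n = len(quality)
    return [[quality[i] * similarity[i][j] * quality[j] for j in range(n)]
            for i in range(n)]


def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [[float(x) for x in row] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # singular matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
    return det


# Toy values: two captions with equal quality.
distinct = determinant(dpp_kernel([1.0, 1.0], [[1.0, 0.0], [0.0, 1.0]]))
similar = determinant(dpp_kernel([1.0, 1.0], [[1.0, 0.9], [0.9, 1.0]]))
weaker = determinant(dpp_kernel([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]]))
```

In this toy setting the determinant shrinks both when the two captions become more similar (similar < distinct) and when their quality drops (weaker < distinct), which is the behaviour the DPP reward relies on.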
xe=1
cider=10
disc=1
div=0
naiverl=0
numc=1
bash train_diversity.sh $xe $cider $disc $div selfcider $naiverl $numc
The values above are the weights of the loss functions, and selfcider denotes the diversity function; you can also set it to LSA or mcider. Setting naiverl to 0 works better. numc is the number of captions drawn from p(c|I); in this section, 1 is fine.
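As a rough illustration of what the weights do, the sketch below assumes they enter the objective as a plain weighted sum of the individual terms. This is an assumption, not a description of how train_diversity.sh combines them; combine_losses and the per-term values are hypothetical.

```python
def combine_losses(terms, weights):
    """Weighted sum of named loss terms; zero-weight terms are skipped."""
    total = 0.0
    for name, weight in weights.items():
        if weight != 0:
            total += weight * terms[name]
    return total


# Weights taken from the command above; the term values are made up.
weights = {"xe": 1, "cider": 10, "disc": 1, "div": 0}
terms = {"xe": 2.0, "cider": 0.5, "disc": 0.25, "div": 0.75}
total = combine_losses(terms, weights)
```

With div=0 the diversity term drops out of the sum entirely, which is why a single sampled caption is enough in this setting.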
xe=0
cider=1
disc=0
div=1
naiverl=0
numc=5
bash train_diversity.sh $xe $cider $disc $div selfcider $naiverl $numc
In this section, numc must be set to a value larger than 1 so that the diversity reward can be computed.
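The requirement on numc follows from the fact that diversity is measured across a set of captions for the same image. The sketch below uses a deliberately simple stand-in (one minus the mean pairwise Jaccard similarity of word sets), not Self-CIDEr itself, to show that a single caption leaves nothing to compare. All names here are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Word-set overlap between two captions (stand-in similarity)."""
    ta, tb = set(a.split()), set(b.split())
    union = ta | tb
    return len(ta & tb) / float(len(union)) if union else 1.0


def pairwise_diversity(captions):
    """One minus the mean pairwise similarity; needs at least two captions."""
    pairs = list(combinations(captions, 2))
    if not pairs:
        raise ValueError("need at least two captions to measure diversity")
    return 1.0 - sum(jaccard(a, b) for a, b in pairs) / len(pairs)


same = pairwise_diversity(["a dog runs", "a dog runs"])
different = pairwise_diversity(["a dog", "the cat"])
```

Calling pairwise_diversity with a single caption raises ValueError, which mirrors why numc=1 cannot produce a diversity reward.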
numc=5
subset=-1
retrieval=1
bash train_dpp.sh $numc $subset $retrieval
numc must be larger than 1 and subset should be -1. To use the retrieval reward as the quality function, set retrieval to a value larger than 0; otherwise, only CIDEr is used as the quality function.
bash eval.sh $model_id test random_sample $num_samples
num_samples denotes the number of captions to generate for each image.
This repository is based on DISCCap, and we would like to thank its author, Ruotian Luo.
If you find this repository helpful, please cite the following papers:
@article{wang2020tpami,
author={Qingzhong Wang and Jia Wan and Antoni B. Chan},
title={On Diversity in Image Captioning: Metrics and Methods},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
year={2020}
}
@article{wang2019towards,
title={Towards Diverse and Accurate Image Captions via Reinforcing Determinantal Point Process},
author={Wang, Qingzhong and Chan, Antoni B},
journal={arXiv preprint arXiv:1908.04919},
year={2019}
}