NLPCC2023 Best Paper: Bounding and Filling: A Fast and Flexible Framework for Image Captioning
- We use conda to build the virtual environment, and we have exported the env config to `env.yaml`; you can reproduce the project env from this yaml.
- Besides the virtual env, evaluation relies on several metric projects, so we build our project on top of self-critical.pytorch; you can refer to that project to install the metric packages, e.g., cider.
- We use the MSCOCO dataset and follow the standard Karpathy splits.
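The Karpathy splits are commonly distributed as a `dataset_coco.json` file in which every image record carries a `split` field. A minimal sketch of partitioning records by split (the field names below follow the standard Karpathy release, not files from this repo, and the toy records are illustrative):

```python
from collections import defaultdict

def partition_by_split(images):
    """Group Karpathy-style image records by their 'split' field."""
    splits = defaultdict(list)
    for img in images:
        splits[img["split"]].append(img)
    return splits

# Toy records mimicking the Karpathy annotation format.
toy = [
    {"filename": "a.jpg", "split": "train"},
    {"filename": "b.jpg", "split": "val"},
    {"filename": "c.jpg", "split": "test"},
    {"filename": "d.jpg", "split": "train"},
]
splits = partition_by_split(toy)
print({k: len(v) for k, v in splits.items()})
```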
- To reproduce our training, you need to preprocess the sentences in the data into phrase level; the preprocessed data has been uploaded to the `data` folder.
- We also provide preprocessing scripts to build the phrase datasets; detailed usage will be uploaded soon.
- Finally, your `data` folder should contain three files:
- cocotalk_stanza_kd100_syn_dep0.json
- cocotalk_stanza_kd100_syn_dep0_label.h5
- cocobu_att.lmdb (for detailed info, refer to here.)
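Missing data files are a common cause of training failures, so a quick sanity check before launching can save time. A small sketch using the file list above (the helper name is ours, not part of the repo):

```python
import os

REQUIRED = [
    "cocotalk_stanza_kd100_syn_dep0.json",
    "cocotalk_stanza_kd100_syn_dep0_label.h5",
    "cocobu_att.lmdb",
]

def missing_files(data_dir, required=REQUIRED):
    """Return the required data files that are absent from data_dir.

    os.path.exists covers both plain files and lmdb directories.
    """
    return [f for f in required
            if not os.path.exists(os.path.join(data_dir, f))]

# Example: report anything missing before launching training.
# missing = missing_files("data")
# if missing:
#     raise FileNotFoundError(f"missing data files: {missing}")
```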
Our training process includes two stages: Cross Entropy Training and Self-Critical Training.
- Cross Entropy Training

  `python tools/train.py --cfg configs/uic_sd.yaml --id any_thing_you_like`
- Self-Critical Training

  `python tools/train.py --cfg configs/uic_sd_kd100_sd_nscl.yaml --id any_thing_you_like`

  Remember to edit the checkpoint path when using self-critical training.
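"Edit the checkpoint path" typically means pointing the self-critical config at the weights saved during cross-entropy training. A plain-text sketch of rewriting a top-level yaml key; the key name `start_from` is an assumption borrowed from self-critical.pytorch conventions, not verified against this repo's configs:

```python
def set_config_value(cfg_path, key, value):
    """Naively rewrite (or append) a top-level 'key: value' line in a yaml config.

    A plain-text sketch: assumes the key, if present, starts its own line.
    """
    with open(cfg_path) as f:
        lines = f.readlines()
    new_line = f"{key}: {value}\n"
    for i, line in enumerate(lines):
        if line.split(":")[0].strip() == key:
            lines[i] = new_line  # replace the existing setting
            break
    else:
        lines.append(new_line)  # key absent: append it
    with open(cfg_path, "w") as f:
        f.writelines(lines)

# Hypothetical usage: point the RL config at the XE checkpoint directory.
# set_config_value("configs/uic_sd_kd100_sd_nscl.yaml",
#                  "start_from", "log/any_thing_you_like")
```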
To evaluate a trained model:

`python tools/eval.py --input_json data/cocotalk_stanza_kd100_syn_dep0.json --input_att_dir data/cocobu_att.lmdb --input_label_h5 data/cocotalk_stanza_kd100_syn_dep0_label.h5 --num_images -1 --model model.pth --infos_path infos.pkl --language_eval 0`