This repository is an official implementation of the paper titled above. Please refer to the project page or the paper for more details.
We have checked reproducibility under the following environment:

- Python 3.7
- CUDA 11.3
- TensorFlow 2.8
Install Python dependencies, preferably inside a virtual environment (venv):

```bash
pip install -r requirements.txt
```
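For instance, setting up an isolated environment first might look like this (a minimal sketch using the standard `venv` module; the environment name `.venv` is an arbitrary choice):

```bash
python3.7 -m venv .venv          # create the virtual environment
source .venv/bin/activate        # activate it for the current shell
pip install -r requirements.txt  # install dependencies into it
```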
Note that TensorFlow has version-specific system requirements for GPU environments. Check that a compatible CUDA/cuDNN runtime is installed.
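As a quick sanity check, you can ask TensorFlow whether it sees the GPU (an empty list usually indicates a CUDA/cuDNN mismatch or a missing driver):

```bash
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
```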
To try the demo with pre-trained models:

- Download the pre-processed datasets for crello / rico and unzip them under `./data`.
- Download the pre-trained checkpoints for crello / rico and unzip them under `./results`.
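The extraction step might look like the following (the archive names here are hypothetical; use the files you actually downloaded):

```bash
unzip crello_dataset.zip -d ./data         # hypothetical archive name
unzip crello_checkpoints.zip -d ./results  # hypothetical archive name
```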
You can test some tasks using the pre-trained models in the notebook.
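To launch it locally, something like the following should work (the notebook path is an assumption; adjust it to the actual file in this repository):

```bash
jupyter notebook notebooks/demo.ipynb  # hypothetical notebook path
```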
You can also train your own model. The trainer script takes a few arguments to control hyperparameters. See `src/mfp/mfp/args.py` for the list of available options. If the script throws an out-of-memory error, make sure no other processes occupy GPU memory and adjust `--batch_size` (see the example after the commands below).
```bash
bin/train_mfp.sh crello --masking_method random                                     # Ours-IMP
bin/train_mfp.sh crello --masking_method elem_pos_attr_img_txt                      # Ours-EXP
bin/train_mfp.sh crello --masking_method elem_pos_attr_img_txt --weights <WEIGHTS>  # Ours-EXP-FT
```
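For example, to reduce memory pressure you might lower the batch size (the value 16 is an arbitrary illustration; the flag itself is listed in `src/mfp/mfp/args.py`):

```bash
bin/train_mfp.sh crello --masking_method random --batch_size 16
```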
The trainer writes logs, evaluation results, and checkpoints to `tmp/mfp/jobs/<job_id>`. Training progress can be monitored via TensorBoard.
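For example, pointing TensorBoard at the job directory mentioned above:

```bash
tensorboard --logdir tmp/mfp/jobs/<job_id>
```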
To perform quantitative evaluation:

```bash
bin/eval_mfp.sh --job_dir <JOB_DIR> (<ADDITIONAL_ARGS>)
```

See `eval.py` for `<ADDITIONAL_ARGS>`.
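For instance, evaluating a finished crello job might look like this (the job id is illustrative; use the directory produced by your own run):

```bash
bin/eval_mfp.sh --job_dir tmp/mfp/jobs/20230101000000  # hypothetical job id
```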
Training on rico is almost the same as above:
```bash
bin/train_mfp.sh rico --masking_method random                             # Ours-IMP
bin/train_mfp.sh rico --masking_method elem_pos_attr                      # Ours-EXP
bin/train_mfp.sh rico --masking_method elem_pos_attr --weights <WEIGHTS>  # Ours-EXP-FT
```
Evaluation also proceeds as above.
If you find this code useful for your research, please cite our paper.
```bibtex
@inproceedings{inoue2023document,
  title={{Towards Flexible Multi-modal Document Models}},
  author={Naoto Inoue and Kotaro Kikuchi and Edgar Simo-Serra and Mayu Otani and Kota Yamaguchi},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2023},
  pages={14287-14296},
}
```