Code for the system presented in "DocParser: Hierarchical Structure Parsing of Document Renderings".
Tested on Ubuntu 18.04 and 20.04.
Use of a GPU significantly speeds up generation of detection outputs, but it is possible to run the inference demo code on CPU.
To set up via Anaconda, follow these steps:
- Install Anaconda. Up-to-date instructions can be found at: https://docs.anaconda.com/anaconda/install/
- Set up a Python 3.6 environment:
  conda create -n docparser python=3.6
- Activate the environment:
  source activate docparser
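Once the environment is active, a quick check can confirm that the interpreter in use is the Python 3.6 one created above. This is a minimal sketch and not part of the repository; the helper name `is_expected_python` is hypothetical and only the version number comes from the setup step.

```python
import sys

# Python version named in the environment setup step above.
EXPECTED_VERSION = (3, 6)


def is_expected_python(version_info=None):
    """Return True if the major.minor version matches EXPECTED_VERSION.

    Hypothetical helper for illustration; it is not part of DocParser.
    """
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) == EXPECTED_VERSION


if __name__ == "__main__":
    found = "{}.{}".format(*sys.version_info[:2])
    if is_expected_python():
        print("Python", found, "matches the tested setup.")
    else:
        print("Python", found, "differs from the tested 3.6 setup.")
```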
- Install all requirements:
  pip install -r requirements.txt
  - For a GPU-enabled installation:
    pip install -r requirements_gpu.txt
- Install the Mask R-CNN library:
  - We used a slightly modified version of https://github.com/matterport/Mask_RCNN, though the original version should still be usable, possibly with minor adaptations.
  - Clone the repository from https://github.com/j-rausch/Mask_RCNN
  - Change into the Mask R-CNN directory
  - Run:
    python setup.py install
- Install docparser:
  - Change into the DocParser directory
  - Run:
    python setup.py develop
- Prepare the datasets:
  - Download arxivdocs-target from https://github.com/DS3Lab/arXivDocs
  - To run the ICDAR demo, download the prepared files from: https://drive.google.com/file/d/1SdGTq80eUGqUJBA6kdVQBO9L6a_ijAcN/view?usp=sharing
  - Extract the datasets to the DocParser subdirectory (resulting in the structure: DocParser/datasets).
- Prepare the trained models:
  - Download from URL: https://drive.google.com/file/d/1Hi4-tg4Zmtx8zYiCg6IBi47R88PdmAW4/view?usp=sharing
  - Extract the pretrained models to the default_models subdirectory in DocParser/docparser/ (resulting in the structure DocParser/docparser/default_models/).
  - For convenience, we include the COCO pre-trained weights from https://github.com/matterport/Mask_RCNN/releases/download/v2.0/mask_rcnn_coco.h5 in the zip file.
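The directory layout expected after extracting the datasets and models can be verified with a short script. This is a minimal sketch, not part of the repository: the helper name `missing_paths` is hypothetical, and it assumes it is given the path to the DocParser directory. Only the two directory names are taken from the steps above.

```python
from pathlib import Path

# Directories the steps above say should exist, relative to the DocParser directory.
EXPECTED_DIRS = ("datasets", "docparser/default_models")


def missing_paths(docparser_root):
    """Return the expected directories that are absent under docparser_root.

    Hypothetical helper for illustration; it is not part of DocParser.
    """
    root = Path(docparser_root)
    return [rel for rel in EXPECTED_DIRS if not (root / rel).is_dir()]


if __name__ == "__main__":
    # Assumes the script is run from inside the DocParser directory.
    missing = missing_paths(".")
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("Dataset and model directories found.")
```

An empty list means both archives were extracted to the locations the later steps expect.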
- For running the ICDAR demo:
  - Please note that a Java installation is necessary to run the ICDAR 2013 evaluation script provided by the competition organizers. We used openjdk 11.0.7 2020-04-14 in our experiments.
  - If necessary, update the permissions for the evaluation script (on Linux systems):
    chmod a+x DocParser/docparser/utils/dataset-tools-20180206.jar
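Both prerequisites for the ICDAR evaluation (a Java installation and an executable evaluation jar) can be checked before starting the demo. The following is a minimal sketch using only the Python standard library. The helper names `java_available` and `jar_is_executable` are hypothetical, and the default jar path assumes the script is run from the directory that contains DocParser.

```python
import os
import shutil

# Path of the evaluation tool as given in the chmod command above,
# relative to the directory that contains DocParser (an assumption).
JAR_PATH = "DocParser/docparser/utils/dataset-tools-20180206.jar"


def java_available():
    """Return True if a `java` executable is found on PATH."""
    return shutil.which("java") is not None


def jar_is_executable(jar_path=JAR_PATH):
    """Return True if the jar exists and the current user may execute it."""
    return os.path.isfile(jar_path) and os.access(jar_path, os.X_OK)


if __name__ == "__main__":
    print("java on PATH:", java_available())
    print("evaluation jar executable:", jar_is_executable())
```

If the second check reports `False` although the file exists, the `chmod a+x` command above is the likely fix.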
- From the DocParser directory, execute:
  python demos/demo_inference.py
  plus one or more of the following command line arguments:
  --page
  --table
  --icdar
  - e.g.
    python demos/demo_inference.py --page --table
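When the demo is launched from another script, the command line can be assembled programmatically. The sketch below is illustrative and not part of the repository: `build_demo_command` is a hypothetical helper, and raising an error when no flag is selected is an assumption based on the "one or more" wording above, not documented behavior of the demo script.

```python
import sys

DEMO_SCRIPT = "demos/demo_inference.py"
# Command line arguments listed in the step above.
KNOWN_FLAGS = ("--page", "--table", "--icdar")


def build_demo_command(page=False, table=False, icdar=False):
    """Assemble the demo command line as a list of arguments.

    Hypothetical helper; requiring at least one flag is an assumption.
    """
    selected = [flag for flag, on in zip(KNOWN_FLAGS, (page, table, icdar)) if on]
    if not selected:
        raise ValueError("select at least one of: " + ", ".join(KNOWN_FLAGS))
    return [sys.executable, DEMO_SCRIPT] + selected


if __name__ == "__main__":
    cmd = build_demo_command(page=True, table=True)
    # Only prints the command; nothing is executed here.
    print("Would run:", " ".join(cmd))
```

Passing the returned list to `subprocess.run(cmd, check=True)` from the DocParser directory would then start the demo with the interpreter of the active environment.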
Our current system is likely to perform better on arXivDocs-target than the one evaluated in the last version of the paper, mostly due to further improvements to postprocessing.
Updated Results. We corrected a read-out error on the outputs of the provided evaluation script for documents with multiple tables.
System | F1* | F1 |
---|---|---|
DocParser Baseline | 0.8443 | 0.8209 |
DocParser WS | 0.8117 | 0.8056 |
DocParser WS+FT | 0.9292 | 0.9292 |
(PDF-based system F1: 0.9221)
Parts of our code are based on: https://github.com/rafaelpadilla/Object-Detection-Metrics
https://github.com/matterport/Mask_RCNN
Rausch, J., Martinez, O., Bissig, F., Zhang, C., & Feuerriegel, S. (2019). DocParser: Hierarchical Structure Parsing of Document Renderings. http://arxiv.org/abs/1911.01702