Update README.md
gwkrsrch authored Oct 24, 2023
1 parent e7a54e8 commit 747be44
Showing 1 changed file with 4 additions and 5 deletions.
9 changes: 4 additions & 5 deletions README.md
@@ -14,12 +14,12 @@ Official Implementation of **Web**-based **Vi**sual **Co**rpus **B**uilder (**WE

**WEBVICOB** 🕸, **Web**-based **Vi**sual **Co**rpus **B**uilder, is a dataset generator that can readily construct a large-scale visual corpus (i.e., images with text annotations) from a raw Wikipedia HTML dump. The constructed visual corpora can be utilized in building Visual Document Understanding (VDU) backbones. Our academic paper, which describes our engine in detail and provides full experimental results and analyses, can be found here:<br>
> [**On Web-based Visual Corpus Construction for Visual Document Understanding**](https://arxiv.org/abs/2211.03256).<br>
- > [Donghyun Kim](https://github.com/dhkim0225), [Teakgyu Hong](https://dblp.org/pid/183/0952.html), [Moonbin Yim](https://github.com/moonbings), [Yoonsik Kim](https://scholar.google.com/citations?user=nuxd_BsAAAAJ) and [Geewook Kim](https://geewook.kim). In ICDAR 2023 (to appear).
+ > [Donghyun Kim](https://github.com/dhkim0225), [Teakgyu Hong](https://dblp.org/pid/183/0952.html), [Moonbin Yim](https://github.com/moonbings), [Yoonsik Kim](https://scholar.google.com/citations?user=nuxd_BsAAAAJ) and [Geewook Kim](https://geewook.kim). In ICDAR 2023.
![annot](resources/annot.png)

## Updates
- **_2023-05-03_** Our paper is accepted at ICDAR2023. A new version of the paper has been published on arxiv.
+ **_2023-05-03_** Our paper is accepted at ICDAR2023. A new version of the paper has been published on arXiv.
**_2023-02-11_** HTML Section Chunker added, Solve memory-leak issue.
**_2022-11-08_** [Paper](https://arxiv.org/abs/2211.03256) published on arxiv.
**_2022-11-04_** First Commit, We release the codebase.
@@ -111,12 +111,11 @@ And untar ndjson files on `[your workspace path]/raw`.
## How to Cite
If you find this work useful to you, please cite:
```
- @inproceedings{kim2023web,
+ @InProceedings{kim2023web,
title = {On Web-based Visual Corpus Construction for Visual Document Understanding},
author = {Kim, Donghyun and Hong, Teakgyu and Yim, Moonbin and Kim, Yoonsik and Kim, Geewook},
- booktitle = {International Conference on Document Analysis and Recognition (ICDAR)},
+ booktitle = {Document Analysis and Recognition - ICDAR 2023},
year = {2023},
- note = {accepted, to appear},
}
```
