
# To tag a batch of documents organized under the dataset name "tagging_push_2023":
mkdir -p to_be_tagged/pdfs/tagging_push_2023

docker-compose up

# In another terminal or in Finder, copy the PDFs into to_be_tagged/pdfs/tagging_push_2023. This kicks off light background processing that splits each PDF into page-level images and adds them to the backend. After a few minutes, they should be visible when you select the "Tag" option on localhost:8080.
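The copy step can also be scripted. The following is a minimal sketch, not part of the project: it assumes the commands are run from the repository root and that the PDFs sit in a hypothetical source directory, SRC_DIR, which you would adjust to your own setup.

```shell
# Hypothetical source directory holding the PDFs to tag (an assumption, not a project setting).
SRC_DIR="${SRC_DIR:-$HOME/Downloads/papers}"
# Dataset directory created by the mkdir step above.
DEST_DIR="to_be_tagged/pdfs/tagging_push_2023"

mkdir -p "$DEST_DIR"

# Copy every *.pdf found directly inside SRC_DIR (no recursion into subdirectories).
if [ -d "$SRC_DIR" ]; then
  find "$SRC_DIR" -maxdepth 1 -type f -name '*.pdf' -exec cp {} "$DEST_DIR"/ \;
fi

# Report how many PDFs are now queued under this dataset name.
find "$DEST_DIR" -type f -name '*.pdf' | wc -l
```

The match is case-sensitive, so files ending in .PDF would need to be renamed or matched with a second pattern.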

Data is persisted in ./pg_data and can be dumped in XML+PNG format for model training by running:

docker cp dump_to_xml.py cosmos-tagger_import_data_1:/src/
docker exec cosmos-tagger_import_data_1 python dump_to_xml.py /data/pngs/ /data/dump

The annotations are now on your local host in to_be_tagged/dump. Improvements to this step are coming soon.
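To sanity-check the export, the dumped files can be counted by extension. This is a minimal sketch: it assumes the dump lands in to_be_tagged/dump as stated above, and that the files carry .xml and .png extensions, which is inferred from the "XML+PNG" description rather than confirmed.

```shell
DUMP_DIR="to_be_tagged/dump"

# Count files by extension. Subdirectories are searched too, because the
# exact layout of the dump is not documented here.
if [ -d "$DUMP_DIR" ]; then
  xml_count=$(find "$DUMP_DIR" -type f -name '*.xml' | wc -l | tr -d ' ')
  png_count=$(find "$DUMP_DIR" -type f -name '*.png' | wc -l | tr -d ' ')
else
  # Nothing has been dumped yet.
  xml_count=0
  png_count=0
fi

echo "XML annotation files: $xml_count"
echo "PNG page images: $png_count"
```

If either count is zero after a dump, the docker exec step above may not have completed.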

Database changes

The following operations are possible, but require direct database changes, for which there is no easy interface yet:

Adding users
Adding tag classes

Both are straightforward to do directly in the PostgreSQL database, and changes should appear immediately in the app.

NOTE: Completed annotations are tied to the tag_id in the tag table, so if you delete tag classes after documents have been annotated, do not reuse the deleted ids for new tag classes.

UW-COSMOS/cosmos-tagger
