OMEN is a self-hosted annotation platform with multi-user support. It is deployed as a Docker container and, by default, backed by a PostgreSQL database. It is maintained by the Semantic Computing Group at the Center for Cognitive Interaction Technology (CITEC), Bielefeld University.
- Simple dataset management for document-level annotation tasks.
- Role-based dataset and task management: Users interact with datasets as either owner, curator, or annotator (each providing different levels of functionality and access).
- Work package definitions for annotation tasks based on subsets of your dataset.
A simple annotation task with three labels. Each label is configured with a background color and an icon. Annotators can either click the label buttons or press a hot-key (the numbers 1-3 in this task; number keys give quick access to up to 9 labels).
Users with the curator or owner role can access further management functionality, e.g. browsing the dataset, seeing the distribution of all annotations, and inspecting the inter-annotator agreement on the dataset overall.
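Which agreement measure OMEN reports is not stated here. As an illustration of what an inter-annotator agreement statistic captures, the following minimal Python sketch computes Cohen's kappa for two annotators who labeled the same samples. The function name `cohens_kappa` and the choice of kappa are assumptions made for this example, not OMEN's implementation.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same samples.

    Illustrative helper only; not part of OMEN.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)

    # Observed agreement: fraction of samples where both labels match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement expected from each annotator's label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if expected == 1.0:
        # Both annotators used one and the same label for every sample.
        return 1.0
    return (observed - expected) / (1.0 - expected)
```

A value of 1 indicates perfect agreement, while a value near 0 indicates agreement no better than chance.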
Configuring a dataset is as easy as uploading a CSV file, choosing columns to identify samples and their content, and configuring the possible labels.
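Before uploading, it can help to check that the CSV actually has usable columns for sample identity and content. The sketch below is a hypothetical pre-upload check in Python, not OMEN's import logic. The helper name `check_dataset_csv` and the example column names are placeholders, and the assumption that sample IDs should be unique and content non-empty is ours.

```python
import csv
import io


def check_dataset_csv(csv_text, id_column, content_column):
    """Return a list of problems found in a CSV (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []

    # Both chosen columns must exist in the header row.
    problems = [
        f"missing column: {name}"
        for name in (id_column, content_column)
        if name not in header
    ]
    if problems:
        return problems

    seen = set()
    # Records are numbered from 2 because the header is record 1.
    for row_no, row in enumerate(reader, start=2):
        sample_id = (row[id_column] or "").strip()
        if not sample_id:
            problems.append(f"row {row_no}: empty sample ID")
        elif sample_id in seen:
            problems.append(f"row {row_no}: duplicate sample ID {sample_id!r}")
        seen.add(sample_id)

        if not (row[content_column] or "").strip():
            problems.append(f"row {row_no}: empty content")
    return problems
```

For a file with an `id` and a `text` column, `check_dataset_csv(open("data.csv", encoding="utf-8").read(), "id", "text")` returns an empty list when no problems are found.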
To use OMEN on your own tasks, please make sure to install the following prerequisites:
- Docker
- Docker Compose
- PostgreSQL (only needed if you provide your own database instead of using the Docker Compose setup below)
The software is made available as a Docker container via GitHub Packages. For regular deployments, the omen-prod package should be used.
Note that pulling images, even public ones, from GitHub's infrastructure requires authentication. A personal access token with the read:packages permission is required. Please see the GitHub documentation on how to set this up (cat ~/TOKEN.txt | docker login https://docker.pkg.github.com -u USERNAME --password-stdin).
Alternatively, you can pull the production image from Docker Hub.
To pull the image using the command line: docker pull docker.pkg.github.com/frankgrimm/omen/omen-prod:latest
Our standard deployment model uses Docker Compose. An example docker-compose.yml configuration that sets up a database and an OMEN instance can be found in the examples/ directory of the repository. Note that this example requires mapping the database files to a volume so that the data is retained when the infrastructure is restarted.
After pulling the image and configuring your preferred deployment method, make sure to:
- a) Adjust your configuration with the mandatory parameters (e.g. database connection and credentials).
- b) Provide it to the container by mapping a volume, and expose the web server (running on port TCP/5000 by default) so you can reach the web application.
- c) Create a first user via the command line and try to log in (by default at http://yourhost.domain.tld:5000). You only have to do this once; additional users can be created in the application itself.
Note that this deployment example ends at the container level; it should be used behind an HTTPS-enabled reverse proxy (e.g. nginx, Apache, Caddy 2, or similar).
[...]
ports:
- "5000:5000"
volumes:
- ${PWD}/config.json:/home/omenuser/app/config.json
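Once the container is running with the port mapping in place, a quick way to confirm that the web server is reachable is to attempt a TCP connection to the exposed port. The following is a minimal sketch using only the Python standard library. The helper name `port_is_open` is hypothetical, and the host name in the usage note is the placeholder from the text above.

```python
import socket


def port_is_open(host, port=5000, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        # create_connection resolves the host and opens a TCP socket;
        # the context manager closes the socket again right away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and resolution failures.
        return False
```

For example, `port_is_open("yourhost.domain.tld")` checks the default port 5000. This only verifies that something accepts connections on that port, not that the application itself is healthy.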
Check out our milestones and issues to see what is going on with the project. If you want to get started, go ahead and fork the project. We provide two ready-to-go Docker Compose configurations. These should work for most setups and are also used to configure the CI (via GitHub Actions) and package releases:
- docker-compose.dev.yml, which runs the current OMEN branch using Flask debug mode (featuring auto-reloading) and configures a local database within the same compose network. This should be the default configuration for development:
  docker-compose --env-file /dev/null -f docker-compose.dev.yml up
- docker-compose.prod.yml, which runs a full gunicorn instance and is used to create the production release. This is otherwise mostly used directly when setting up test and staging environments:
  docker-compose --env-file /dev/null -f docker-compose.prod.yml up