This is a repository for the temporal worker docker image for EPP.
This project facilitates asynchronous importing of content identified from a docmap provider. We are using the docmaps to provide a feed of preprints that have been reviewed by a particular publisher. The data in the docmap provides the history and location of content, which we parse and retrieve.
We then push the parsed content into an EPP server endpoint.
Finally, the results of all this retrieval are stored in an S3 bucket in well-structured paths (which can then be configured as a source for a Cantaloupe IIIF server).
The monitoring and scheduling of the import workflows are handled by a temporal server (testing and dev).
Ensure you have docker and docker-compose (v2 tested). Also install the temporal CLI to start and control jobs.
- clone the repo
- run yarn
- run docker compose up to start temporal and the workers in "watch" mode
- run temporal operator namespace list to list namespaces; you should see the default namespace listed and no errors
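When scripting the setup, the namespace check above can be wrapped in a retry loop so that later commands only run once temporal responds. This is a minimal sketch: the retry helper, the attempt count and the delay are illustrative assumptions and are not part of this repository.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper (not part of this repo): runs COMMAND until it exits 0
# or ATTEMPTS runs have failed. Returns 0 on success, 1 otherwise.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while :; do
    # Discard the command's output; only its exit status matters here.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    if [ "$i" -ge "$attempts" ]; then
      return 1
    fi
    i=$((i + 1))
    sleep "$delay"
  done
}

# Example (commented out so the snippet has no side effects):
# retry 30 2 temporal operator namespace list && echo "temporal is ready"
```

The example call would try the namespace listing up to 30 times, two seconds apart, before giving up.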
The docker compose workflow above will restart the worker when your mounted filesystem changes.
To run an import workflow, run:
temporal workflow execute --type importDocmaps -t epp -w import-docmap-test -i '{ "docMapIndexUrl": "http://mock-datahub/enhanced-preprints/docmaps/v1/index" }'
This will kick off a full import for a docmap index from eLife's API.
To re-run the whole process, you will first need to remove the containers and volumes:
docker compose down --volumes
To prevent a large reimport of docmaps that would cause content to become unpublished, you can specify an optional numeric threshold for the number of docmap changes that are allowed.
temporal workflow execute --type importDocmaps -t epp -w import-docmap-test -i '{ "docMapIndexUrl": "http://mock-datahub/enhanced-preprints/docmaps/v1/index", "docMapThreshold": 2 }'
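If the input JSON is assembled in a script, the optional threshold can be validated before the workflow is started. The helper below is a sketch under assumptions: build_import_input is a hypothetical name, and the check only enforces that the threshold is a non-negative integer, which may be stricter or looser than what the workflow itself accepts.

```shell
#!/bin/sh
# build_import_input INDEX_URL [THRESHOLD]
# Hypothetical helper: prints the JSON input for importDocmaps, adding
# docMapThreshold only when a threshold is given.
build_import_input() {
  url=$1
  threshold=${2:-}
  if [ -z "$threshold" ]; then
    printf '{ "docMapIndexUrl": "%s" }\n' "$url"
    return 0
  fi
  # Reject anything that is not made up of digits only.
  case $threshold in
    *[!0-9]*)
      echo "threshold must be a non-negative integer" >&2
      return 1
      ;;
  esac
  printf '{ "docMapIndexUrl": "%s", "docMapThreshold": %s }\n' "$url" "$threshold"
}

build_import_input "http://mock-datahub/enhanced-preprints/docmaps/v1/index" 2
```

The printed JSON can then be passed to the -i option of the command above, for example -i "$(build_import_input "$INDEX_URL" 2)".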
The xsltTransformPassthrough option is set through workflowArgs. This can also be applied to the importDocmap, importManuscriptData and importContent workflows:
temporal workflow execute --type importDocmaps -t epp -w import-docmap-test -i '{ "docMapIndexUrl": "http://mock-datahub/enhanced-preprints/docmaps/v1/index", "workflowArgs": { "xsltTransformPassthrough": true } }'
Sometimes we want to disable specific types of XSLT, e.g. handle-etal-in-refs.xsl (the full list of options can be found in xsltLogs in Temporal). This can also be applied to the importDocmap, importManuscriptData and importContent workflows:
temporal workflow execute --type importDocmaps -t epp -w import-docmap-test -i '{ "docMapIndexUrl": "http://mock-datahub/enhanced-preprints/docmaps/v1/index", "workflowArgs": { "xsltBlacklist": "file1.xsl, file2.xsl" } }'
The preferPreprintContent option is set through workflowArgs. This can also be applied to the importDocmap, importManuscriptData and importContent workflows:
temporal workflow execute --type importDocmaps -t epp -w import-docmap-test -i '{ "docMapIndexUrl": "http://mock-datahub/enhanced-preprints/docmaps/v1/index", "workflowArgs": { "preferPreprintContent": true } }'
Sometimes, due to issues with the Temporal UI, we need to use the command line to send a signal. You need to specify the target workflow id, and the name and input of the signal.
tctl workflow signal --workflow_id import-docmap-test --name approval -i true
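tctl is the older Temporal command-line tool. Assuming the newer temporal CLI used for the other commands in this document is installed, the same signal could be sent with it. The sketch below only prints the command so it is safe to run anywhere; signal_cmd is a hypothetical helper name.

```shell
#!/bin/sh
# signal_cmd WORKFLOW_ID SIGNAL_NAME INPUT_JSON
# Hypothetical helper: prints the `temporal workflow signal` command that
# corresponds to the tctl call above, without executing it.
signal_cmd() {
  printf 'temporal workflow signal --workflow-id %s --name %s --input %s\n' "$1" "$2" "$3"
}

signal_cmd import-docmap-test approval true
```

Running the printed command against a live Temporal server sends the approval signal with the input true to the import-docmap-test workflow.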
To run an import workflow that only imports docmaps that are new or have changed since a previous run, start an importDocmaps workflow with a state file name in the s3StateFileUrl input field, and add a state file to minio:
temporal workflow execute --type importDocmaps -t epp -w import-docmap-test -i '{ "docMapIndexUrl": "http://mock-datahub/enhanced-preprints/docmaps/v1/index", "s3StateFileUrl": "state.json" }'
This will read in previously seen (and hashed) docmaps from the S3 bucket in config, skipping any it has seen before.
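The skip behaviour described above can be illustrated with a small stand-alone sketch. It is not the workflow's implementation: cksum stands in for whatever hash the workflow uses, the state file here is a plain list of hashes in a temporary file rather than an object in S3, and process_if_new is a hypothetical name.

```shell
#!/bin/sh
# Sketch of "skip docmaps already seen": keep one hash per docmap in a state
# file and only import docmaps whose hash is not yet recorded.
# Assumptions: cksum replaces the real hash function, and the state file
# format (one hash per line) is invented for this illustration.
STATE_FILE=$(mktemp)

process_if_new() {
  digest=$(printf '%s' "$1" | cksum | cut -d' ' -f1)
  if grep -qx "$digest" "$STATE_FILE"; then
    echo "skip"
  else
    echo "$digest" >> "$STATE_FILE"
    echo "import"
  fi
}

process_if_new '{"id":"docmap-1","version":1}'   # first time seen
process_if_new '{"id":"docmap-1","version":1}'   # unchanged since last run
process_if_new '{"id":"docmap-1","version":2}'   # content has changed
```

The three calls print import, skip and import: an unchanged docmap hashes to a value already in the state file, while any change to its content produces a new hash and triggers a fresh import.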
You can also kick off a full import for a docmap index from eLife's API that then repeats every hour (see the next command to change this), skipping docmaps that have no changes.
To change the sleep time, add a semantic time parameter to the --interval inputs, for example 1 minute or 5 minutes:
temporal schedule create --schedule-id import-docmaps -w import-docmaps -t epp --workflow-type importDocmaps -i '{ "docMapIndexUrl": "http://mock-datahub/enhanced-preprints/docmaps/v1/index", "s3StateFileUrl": "import-docmaps.json" }' --overlap-policy Skip --interval '1m'
You can then view these runs on the dashboard.
SERVER_DIR="../your-directory-here" docker compose -f docker-compose.yaml -f docker-compose.override.yaml -f docker-compose.localserver.yaml up
To start the application with a local version of the EPP API server, so you can run the application and test local changes of the API, define an environment variable SERVER_DIR with the location of your EPP API server project, e.g. SERVER_DIR="../enhanced-preprints-server", then run the above command to invoke the .localserver overrides. This will work with the first import workflow command.
To run with the local API but without the mocked services, omit -f docker-compose.override.yaml from the compose command.
SERVER_DIR="../enhanced-preprints-server" APP_DIR="../enhanced-preprints-client" docker compose -f docker-compose.yaml -f docker-compose.override.yaml -f docker-compose.localserver.yaml -f docker-compose.localapp.yaml up
ENCODA_DIR="../enhanced-preprints-encoda" docker compose -f docker-compose.yaml -f docker-compose.override.yaml -f docker-compose.localencoda.yaml up
NOTE: this will only read meca files from the real S3, so you don't need to mock them out
Define a .env file with these variables:
MECA_AWS_ACCESS_KEY_ID=your access key
MECA_AWS_SECRET_ACCESS_KEY=your secret key
MECA_AWS_ROLE_ARN=a role to assume to have permission to source S3 buckets # optional
Then run docker-compose with the base, override and s3 configs, like below:
docker compose -f docker-compose.yaml -f docker-compose.override.yaml -f docker-compose.s3.yaml up
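Before starting compose with real S3 credentials, it can help to confirm that the .env file defines the required variables. The check below is a sketch: check_env_file is a hypothetical helper, and it only looks for non-empty VAR=value lines without validating the credentials themselves.

```shell
#!/bin/sh
# check_env_file FILE VAR [VAR...]
# Hypothetical pre-flight check: reports every VAR that has no non-empty
# "VAR=value" line in FILE. Returns 0 when all are present, 1 otherwise.
check_env_file() {
  file=$1
  shift
  missing=0
  for var in "$@"; do
    if ! grep -q "^${var}=." "$file" 2>/dev/null; then
      echo "missing: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example (commented out so the snippet has no side effects):
# check_env_file .env MECA_AWS_ACCESS_KEY_ID MECA_AWS_SECRET_ACCESS_KEY
```

MECA_AWS_ROLE_ARN is left out of the example call because it is optional.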
To import a specific docmap such as 85111 use the importDocmap workflow:
temporal workflow execute --type importDocmap -w import-docmap-85111 -t epp -i '{ "url": "https://data-hub-api.elifesciences.org/enhanced-preprints/docmaps/v2/by-publisher/elife/get-by-manuscript-id?manuscript_id=85111" }'
NOTE: this will only write extracted resources to the real S3, so you can verify that the process works
Define a .env file with these variables:
AWS_ACCESS_KEY_ID=your access key
AWS_SECRET_ACCESS_KEY=your secret key
BUCKET_NAME=you will want to create an S3 bucket for your dev experiments
Then run docker-compose with the base, override and s3 configs, like below:
docker compose -f docker-compose.yaml -f docker-compose.override.yaml -f docker-compose.s3-epp.yaml up
You can combine the S3 source and destination configs to retrieve content from the S3 source, prepare the assets and upload them to the destination S3 bucket:
docker compose -f docker-compose.yaml -f docker-compose.override.yaml -f docker-compose.s3.yaml -f docker-compose.s3-epp.yaml up
To run the tests with docker (especially useful if they are not working locally) use the following command:
docker compose -f docker-compose.tests.yaml run tests