Continuous integration and automatic tests
The corpus has continuous integration and automated tests set up. These consist of several tests, plus a mechanism that keeps track of manual and automatic curations.
Each paragraph has its hash stored in its n attribute. The n attribute is used because it can store numeric values about the element and has no special meaning in the Parla-Clarin schema. However, if the paragraph has been edited manually, the n attribute is populated with "manual" instead of the hash.
The test that flags changes that break this policy is still a work in progress.
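As a rough illustration of the hashing convention described above, the following sketch writes a hash into n for every paragraph that is not marked "manual". The hashing scheme (SHA-256 over whitespace-normalised text, truncated to 8 hex characters), the choice of seg as the paragraph element, and the function names are assumptions made for this example; the corpus scripts may do this differently.

```python
import hashlib
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"


def paragraph_hash(elem):
    """Hash the whitespace-normalised text of an element (assumed scheme)."""
    text = " ".join("".join(elem.itertext()).split())
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:8]


def update_hashes(root):
    """Set n to the hash for every seg element that is not marked manual."""
    for elem in root.iter(f"{{{TEI_NS}}}seg"):  # seg is an assumed element name
        if elem.get("n") == "manual":
            continue  # manually curated paragraph: keep the marker
        elem.set("n", paragraph_hash(elem))
    return root
```

Skipping elements marked "manual" is what keeps the marker from being overwritten when hashes are recalculated.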
Currently, the following is run on push:
- push.yml
  - Runs unit tests:
    - test/schemas.py
    - test/db.py
    - test/mp.py
  - Further documentation on individual tests is available as docstrings in the Python files
The following scripts are run on updates to pull requests:
- check_unchanged.yml
  - Checks that we don't change or delete files that should be static
  - test/unchanged.py for newly changed and deleted files
- validate.yml
  - Validates the Parla-Clarin schema on new and changed .xml files
  - test/validate_parlaclarin.py for newly created and changed files
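The static-file check listed above could look roughly like the sketch below. It is not the contents of test/unchanged.py: the pattern list, the example paths, and the function name are hypothetical, and the lists of changed and deleted files are assumed to be supplied by the workflow.

```python
from fnmatch import fnmatch

# Hypothetical patterns for files that should stay static.
STATIC_PATTERNS = ["corpus/static/*.xml"]


def static_violations(changed, deleted, patterns=STATIC_PATTERNS):
    """Return the changed or deleted paths that match a static pattern."""
    touched = set(changed) | set(deleted)
    return sorted(
        path for path in touched
        if any(fnmatch(path, pattern) for pattern in patterns)
    )
```

A workflow step could then fail the pull request whenever the returned list is non-empty.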
And the following script is run when a pull request is merged:
- log_changes.yml
  - Recalculates hashes of paragraphs for Parla-Clarin files
The .yml files can be found under .github/workflows/.
On updates to pull requests:
- check_manual.yml (doesn't exist yet)
  - Checks that manual changes have not been overridden in Parla-Clarin files
  - scripts/update_hashes.py
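Since check_manual.yml does not exist yet, the following is only a sketch of what such a check might do. It assumes that both the old and the new version of a file are available (for example the base and head of the pull request) and that elements carry an xml:id that can be used to match them; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def normalised_text(elem):
    """Collapse whitespace so formatting-only changes are ignored."""
    return " ".join("".join(elem.itertext()).split())


def overridden_manual_edits(old_root, new_root):
    """Return xml:id values of manual elements that were changed or removed."""
    new_by_id = {e.get(XML_ID): e for e in new_root.iter() if e.get(XML_ID)}
    problems = []
    for old in old_root.iter():
        if old.get("n") != "manual" or not old.get(XML_ID):
            continue
        new = new_by_id.get(old.get(XML_ID))
        if (
            new is None  # element was deleted
            or new.get("n") != "manual"  # marker was replaced by a hash
            or normalised_text(new) != normalised_text(old)  # text was altered
        ):
            problems.append(old.get(XML_ID))
    return problems
```

Comparing against the old version of the file is what lets this sketch detect deletions; a check that only sees the new file cannot tell that a manually edited element is missing.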
General
- Refactor paragraph hashes so that manual changes are also saved to an external file. Otherwise we cannot observe deletions of manually edited paragraphs.