Continuous integration and automatic tests

The corpus has continuous integration and automatic testing associated with it. This consists of several tests, plus a mechanism that keeps track of which curations are manual and which are automatic.

Manual / automatic curations

Each paragraph stores its hash in its n attribute. The n attribute is used because it can hold numeric values about the element and has no special meaning in the Parla-Clarin schema. However, if the paragraph has been curated manually, the n attribute is populated with "manual" instead of the hash.
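A minimal sketch of this policy in Python. It assumes the hash is a SHA-256 digest of the paragraph text with whitespace collapsed; the actual hash function is not documented here. `paragraph_hash` and `tag_paragraph` are hypothetical helpers, and a bare `<p>` element stands in for a Parla-Clarin paragraph.

```python
import hashlib
import xml.etree.ElementTree as ET

MANUAL = "manual"  # stored in n instead of a hash for manually curated paragraphs


def paragraph_hash(elem: ET.Element) -> str:
    """Hypothetical scheme: SHA-256 of the element's text with whitespace collapsed."""
    text = " ".join("".join(elem.itertext()).split())
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def tag_paragraph(elem: ET.Element, manually_curated: bool) -> None:
    """Write either the "manual" marker or the content hash to the n attribute."""
    elem.set("n", MANUAL if manually_curated else paragraph_hash(elem))


auto = ET.fromstring("<p>Herr talman!</p>")
curated = ET.fromstring("<p>Fru talman!</p>")
tag_paragraph(auto, manually_curated=False)
tag_paragraph(curated, manually_curated=True)
print(auto.get("n"))     # a 64-character hex digest
print(curated.get("n"))  # manual
```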

The test that flags changes that break this policy is still a work in progress.
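One possible core for that check, sketched under the same kind of assumption (the hash is a SHA-256 digest of whitespace-collapsed text): a paragraph whose n is neither "manual" nor the hash of its current text was presumably edited without being marked as manual. `policy_violations` is a hypothetical name, not an existing test.

```python
import hashlib
import xml.etree.ElementTree as ET
from typing import List

MANUAL = "manual"


def paragraph_hash(elem: ET.Element) -> str:
    """Hypothetical scheme: SHA-256 of the element's text with whitespace collapsed."""
    text = " ".join("".join(elem.itertext()).split())
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def policy_violations(root: ET.Element, tag: str = "p") -> List[int]:
    """Return positions of paragraphs whose n is neither "manual" nor a matching hash."""
    violations = []
    for i, elem in enumerate(root.iter(tag)):
        stored = elem.get("n")
        if stored == MANUAL:
            continue  # manual curations are exempt from the hash comparison
        if stored != paragraph_hash(elem):  # also catches a missing n attribute
            violations.append(i)
    return violations


root = ET.fromstring("<body><p>Ja.</p><p n='manual'>Nej.</p><p>Kanske.</p></body>")
paras = list(root.iter("p"))
for p in (paras[0], paras[2]):
    p.set("n", paragraph_hash(p))
paras[2].text = "Kanske inte."  # edited, but n was not switched to "manual"
print(policy_violations(root))  # [2]
```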

GH Actions

Currently, the following workflow is run on every push:

push.yml
- Runs unit tests:
  - test/schemas.py
  - test/db.py
  - test/mp.py
- Further documentation on individual tests is available as docstrings in the Python files

The following workflows are run whenever a pull request is updated:

check_unchanged.yml
- Checks that files that should stay static are not changed or deleted
- Runs test/unchanged.py on newly changed and deleted files

validate.yml
- Validates new and changed .xml files against the Parla-Clarin schema
- Runs test/validate_parlaclarin.py on newly created and changed files
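The core of a check like check_unchanged.yml can be sketched as parsing the output of `git diff --name-status` between the pull request's base and head, then flagging modified, deleted, renamed, or type-changed paths that match a list of protected patterns. This is an illustration, not the logic of test/unchanged.py: `STATIC_PATTERNS` holds a placeholder pattern, since the actual set of static files is not listed on this page.

```python
import fnmatch
from typing import Iterable, List, Tuple

# Placeholder pattern; the real set of static files is not listed on this page.
STATIC_PATTERNS = ["frozen/*.xml"]

# git status letters that alter or remove an existing path
CHANGING_STATUSES = {"M", "D", "R", "T"}


def protected_violations(
    name_status: str, patterns: Iterable[str] = STATIC_PATTERNS
) -> List[Tuple[str, str]]:
    """Return (status, path) pairs for static files that were changed or deleted.

    `name_status` is the text printed by `git diff --name-status BASE HEAD`.
    """
    patterns = list(patterns)
    violations = []
    for line in name_status.splitlines():
        if not line.strip():
            continue
        status, *paths = line.split("\t")
        kind = status[0]  # renames look like "R100"; keep only the letter
        if kind not in CHANGING_STATUSES:
            continue  # added (A) or copied (C) files leave existing paths intact
        old_path = paths[0]  # for renames, the first path is the original one
        if any(fnmatch.fnmatch(old_path, pat) for pat in patterns):
            violations.append((kind, old_path))
    return violations


diff = "M\tfrozen/a.xml\nA\tfrozen/new.xml\nD\tnotes/x.txt\nR100\tfrozen/b.xml\tfrozen/c.xml\n"
print(protected_violations(diff))  # [('M', 'frozen/a.xml'), ('R', 'frozen/b.xml')]
```

In a workflow, `name_status` could come from `git diff --name-status origin/main...HEAD`, assuming the base branch is called `main`.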

The following workflow is run when a pull request is merged:

log_changes.yml
- Recalculates the paragraph hashes in Parla-Clarin files
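A sketch of such a recalculation pass over a single file. It assumes the files use the TEI namespace, that `seg` and `note` elements are the "paragraphs" being hashed, and that the hash is a SHA-256 digest of whitespace-collapsed text; none of these details are confirmed here, and `rehash_file` is a hypothetical name. Paragraphs marked "manual" are left untouched.

```python
import hashlib
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"  # assumed namespace of the corpus files
ET.register_namespace("", TEI_NS)        # write TEI as the default namespace

# Assumption: which elements count as "paragraphs" for hashing.
PARAGRAPH_TAGS = {f"{{{TEI_NS}}}seg", f"{{{TEI_NS}}}note"}


def paragraph_hash(elem: ET.Element) -> str:
    """Hypothetical scheme: SHA-256 of the element's text with whitespace collapsed."""
    text = " ".join("".join(elem.itertext()).split())
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def rehash_file(path: str) -> int:
    """Recompute n for every non-manual paragraph in a file; return how many changed."""
    tree = ET.parse(path)
    updated = 0
    for elem in tree.iter():
        if elem.tag not in PARAGRAPH_TAGS or elem.get("n") == "manual":
            continue  # not a paragraph, or a manual curation that must be kept
        new_hash = paragraph_hash(elem)
        if elem.get("n") != new_hash:
            elem.set("n", new_hash)
            updated += 1
    if updated:  # only rewrite files that actually changed
        tree.write(path, encoding="utf-8", xml_declaration=True)
    return updated
```

Note that `xml.etree.ElementTree` drops XML comments and processing instructions when parsing by default, so a real implementation may prefer lxml to round-trip files faithfully.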

The .yml files can be found under .github/workflows/

TODOs

On updates to pull requests

check_manual.yml (doesn't exist yet)
- Checks that manual changes have not been overridden in Parla-Clarin files
- scripts/update_hashes.py
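Since this workflow does not exist yet, here is one hedged sketch of the comparison it could make. It assumes paragraphs carry an `xml:id` and compares the base and head versions of a file in the pull request: a paragraph that was "manual" in the base but carries something else in the head was presumably overwritten by an automatic run. `overridden_manual` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import List

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # how ElementTree names xml:id


def overridden_manual(base_root: ET.Element, head_root: ET.Element) -> List[str]:
    """Return xml:ids that are n="manual" in base but lost the marker in head."""
    head_by_id = {e.get(XML_ID): e for e in head_root.iter() if e.get(XML_ID)}
    overridden = []
    for elem in base_root.iter():
        pid = elem.get(XML_ID)
        if not pid or elem.get("n") != "manual":
            continue
        new = head_by_id.get(pid)
        if new is None:
            continue  # deleted paragraphs are left to a separate check
        if new.get("n") != "manual":
            overridden.append(pid)
    return overridden


base = ET.fromstring(
    '<body><seg xml:id="a" n="manual">Ja</seg>'
    '<seg xml:id="b" n="manual">Nej</seg><seg xml:id="c" n="0f3a">X</seg></body>'
)
head = ET.fromstring(
    '<body><seg xml:id="a" n="manual">Ja!</seg><seg xml:id="b" n="9c1d">Nej</seg></body>'
)
print(overridden_manual(base, head))  # ['b']
```

A further manual edit that keeps n="manual" (paragraph "a" above) is not flagged, since only the loss of the marker indicates an automatic override.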

General

Refactor paragraph hashes so that manual changes are also saved to an external file. Otherwise we cannot observe deletions of manually edited paragraphs.
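One possible shape for that refactor, sketched under assumptions: paragraphs carry an `xml:id`, and the external file is a JSON list of the IDs of manually curated paragraphs. A manual paragraph that disappears from the XML keeps its ID in the registry, so the deletion becomes visible. All names and the JSON layout are hypothetical.

```python
import json
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import List, Set

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def load_registry(path: Path) -> Set[str]:
    """Read the set of manually curated paragraph IDs (empty if no file yet)."""
    if not path.exists():
        return set()
    return set(json.loads(path.read_text(encoding="utf-8")))


def save_registry(path: Path, ids: Set[str]) -> None:
    """Write the IDs as a sorted JSON list so diffs of the file stay stable."""
    path.write_text(json.dumps(sorted(ids), indent=2), encoding="utf-8")


def manual_ids(root: ET.Element) -> Set[str]:
    """IDs of all paragraphs currently marked n="manual"."""
    return {e.get(XML_ID) for e in root.iter() if e.get(XML_ID) and e.get("n") == "manual"}


def deleted_manual(registry: Set[str], root: ET.Element) -> List[str]:
    """IDs recorded as manual in the registry that no longer exist in the XML."""
    present = {e.get(XML_ID) for e in root.iter() if e.get(XML_ID)}
    return sorted(registry - present)


before = ET.fromstring('<body><seg xml:id="a" n="manual">Ja</seg><seg xml:id="b">Nej</seg></body>')
after = ET.fromstring('<body><seg xml:id="b">Nej</seg></body>')
registry = manual_ids(before)           # {'a'}, recorded when the manual edit is made
print(deleted_manual(registry, after))  # ['a']
```

An intentional deletion would be reported as well, so removing the registry entry would become part of deleting a manually edited paragraph.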
