This repository has been archived by the owner on May 8, 2024. It is now read-only.

Continuous integration and automatic tests

ninpnin edited this page Jul 23, 2021 · 1 revision

The corpus is covered by continuous integration and automated testing. This consists of several tests, plus a mechanism that keeps track of manual and automatic curations.

Manual / automatic curations

Each paragraph's hash is stored in its n attribute. The n attribute is used because it can store numeric values about the element and has no special meaning in the Parla-Clarin schema. However, if the paragraph has been curated manually, the n attribute is populated with "manual" instead of the hash.
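
A minimal sketch of this scheme in Python follows. It is an illustration only, not the repository's code: the hash function (SHA-256 truncated to 8 hex characters), the whitespace normalization, the element name seg, and the helper names paragraph_hash and update_hashes are all assumptions.

```python
import hashlib
import xml.etree.ElementTree as ET

MANUAL = "manual"  # marker value described above


def paragraph_hash(text: str) -> str:
    """Hash of the whitespace-normalized text.

    Assumption: SHA-256 truncated to 8 hex characters. The corpus may use
    a different hash function or length.
    """
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:8]


def update_hashes(root: ET.Element, tag: str = "seg") -> None:
    """Write a hash into the n attribute of every non-manual paragraph."""
    for elem in root.iter(tag):
        if elem.get("n") == MANUAL:
            continue  # manually curated: keep the marker, do not hash
        elem.set("n", paragraph_hash("".join(elem.itertext())))


# Hypothetical sample document; real files use the Parla-Clarin namespace.
SAMPLE = """<body>
  <seg n="manual">Hand-corrected text.</seg>
  <seg>Automatically   segmented text.</seg>
</body>"""

root = ET.fromstring(SAMPLE)
update_hashes(root)
segs = list(root.iter("seg"))
for seg in segs:
    print(seg.get("n"), "|", seg.text)
```

With this layout, a paragraph whose text no longer matches its stored hash has been changed without the hash being recalculated, while a paragraph marked "manual" is never re-hashed.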

The test that flags changes that break this policy is still a work in progress.

GH Actions

Currently, the following workflow is run on every push:

  • push.yml
    • Runs unit tests:
      • test/schemas.py
      • test/db.py
      • test/mp.py
    • Further documentation on individual tests is available as docstrings in the Python files

The following workflows are run on updates to pull requests:

  • check_unchanged.yml
    • Checks that files that should be static are not changed or deleted
    • Runs test/unchanged.py on newly changed and deleted files
  • validate.yml
    • Validates new and changed .xml files against the Parla-Clarin schema
    • Runs test/validate_parlaclarin.py on newly created and changed files

The following workflow is run when a pull request is merged:

  • log_changes.yml
    • Recalculates the paragraph hashes in Parla-Clarin files

The .yml workflow files can be found under .github/workflows/.

TODOs

On updates to pull requests

  • check_manual.yml (doesn't exist yet)
    • Checks that manual changes have not been overridden in Parla-Clarin files
    • scripts/update_hashes.py
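
One way such a check could work is sketched below in Python. It is a hypothetical illustration, not the planned implementation: it assumes that both the base and the pull request version of a file are available to the check, that paragraphs are seg elements, and the helper names manual_texts and overridden_manual are invented for this example.

```python
from collections import Counter
import xml.etree.ElementTree as ET

MANUAL = "manual"


def manual_texts(root, tag="seg"):
    """Whitespace-normalized text of every element whose n attribute is "manual"."""
    return [
        " ".join("".join(elem.itertext()).split())
        for elem in root.iter(tag)
        if elem.get("n") == MANUAL
    ]


def overridden_manual(old_root, new_root, tag="seg"):
    """Manual paragraphs of the old version that are missing or altered in the new one."""
    remaining = Counter(manual_texts(new_root, tag))
    lost = []
    for text in manual_texts(old_root, tag):
        if remaining[text] > 0:
            remaining[text] -= 1  # still present and still marked manual
        else:
            lost.append(text)  # edited, deleted, or re-hashed
    return lost


# Hypothetical base and pull request versions of the same file.
OLD = """<body>
  <seg n="manual">Hand-corrected first paragraph.</seg>
  <seg n="manual">Hand-corrected second paragraph.</seg>
  <seg n="3f2a9c1d">Automatic paragraph.</seg>
</body>"""

NEW = """<body>
  <seg n="manual">Hand-corrected first paragraph.</seg>
  <seg n="9b1e77a0">Hand-corected second paragraf.</seg>
  <seg n="3f2a9c1d">Automatic paragraph.</seg>
</body>"""

lost = overridden_manual(ET.fromstring(OLD), ET.fromstring(NEW))
for text in lost:
    print("manual paragraph overridden:", text)
```

A check along these lines would fail the workflow whenever the returned list is non-empty. Because it compares two versions of the file, it also catches manual paragraphs that were deleted outright.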

General

  • Refactor paragraph hashes so that manual changes are also saved to an external file. Otherwise we cannot detect deletions of manually edited paragraphs.