v1.2.0

@unhammer released this 11 Mar 13:08

(see also the mailing list announcement)

This release comes courtesy of Nynorsk pressekontor / NPK, with funding from the Norwegian Ministry of Culture. There has been some press about the project.

NPK have been using apertium-nno-nob in production since fall 2018 – it's integrated into their translation/editing systems – and we've been continually improving it with the help of their post-edits and feedback. The form/spelling/style choices used by nob→nno are now more modern and uniform (there was a major release of Nynorsk back in 2012, while most style decisions in the translator were made in the first release back in 2009).

Other major changes to the pair:

  • 35 new transfer rules (one of which required a bugfix to apertium-transfer)
  • 248 new lrx rules
  • about 42,000 new names and 3,800 new non-names added to bidix
  • regression testing by checking that WER does not increase
  • lots of work on nob disambiguation
  • we now do long-distance adjective agreement
  • there's a post-nno.dix to get rid of triple consonants resulting from
    compounding
  • compounding happens on proper nouns too now
  • genitives are no longer translated only by preposition-rewriting; we now also have:
    • lists of exceptions where we want to keep genitives
    • rewriting some nouns with relatives
    • rewriting nationalities with adjectives
    • rewriting some abstract nouns into compounds
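
To illustrate the triple-consonant cleanup mentioned above: Norwegian spelling reduces three identical consonants at a compound boundary to two (natt + tog is written nattog). The real fix lives in post-nno.dix, a post-generation dictionary; the Python below is only a rough sketch of the same idea, and the function name and the consonant set are assumptions made for the example.

```python
import re

# Hypothetical illustration only: post-nno.dix is an Apertium dictionary,
# not Python. The consonant set below is an assumption for this sketch.
_TRIPLE_CONSONANT = re.compile(r"([bcdfghjklmnpqrstvwxz])\1\1", re.IGNORECASE)


def collapse_triple_consonants(word: str) -> str:
    """Reduce a run of three identical consonants to two."""
    return _TRIPLE_CONSONANT.sub(r"\1\1", word)


if __name__ == "__main__":
    for compound in ["natttog", "busssjåfør", "bokhylle"]:
        print(compound, "->", collapse_triple_consonants(compound))
```

Words without a triple consonant, such as bokhylle, pass through unchanged.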

Below is the median/mean WER (lower is better) on a test set of 1135 NTB news articles. The reference translations are post-edits of output from the January 2019 git checkout; the articles were evaluated with various git checkouts of apertium-nno-nob:

  | git date   | median WER | mean WER | stdev |
  |------------+------------+----------+-------|
  | 2018-10-01 |      11.79 |    12.96 |  7.49 |
  | 2018-10-31 |       9.68 |    10.96 |  7.28 |
  | 2018-12-20 |       7.26 |     8.52 |  7.05 |
  | 2019-02-28 |       6.77 |     8.04 |  7.04 |

(apertium-eval-translator was run once for each of the 1135 articles, for each of the checkouts of the translator+deps)
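
For readers who want to reproduce this kind of table, here is a minimal sketch of a per-article WER and the summary columns. It is not apertium-eval-translator: tokenisation is plain whitespace splitting, the function names are invented for the example, and the use of the sample standard deviation is an assumption (the table does not say which variant was used).

```python
from statistics import mean, median, stdev


def word_error_rate(hypothesis: str, reference: str) -> float:
    """Word-level edit distance as a percentage of the reference length."""
    hyp = hypothesis.split()
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # extra word in hypothesis
                            curr[j - 1] + 1,      # word missing from hypothesis
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def summarize(wers: list[float]) -> dict[str, float]:
    """Summary columns as in the table (sample stdev is an assumption)."""
    return {"median": median(wers), "mean": mean(wers), "stdev": stdev(wers)}


def has_regressed(baseline: list[float], candidate: list[float],
                  tolerance: float = 0.0) -> bool:
    """True if the candidate checkout has a higher mean WER than the baseline."""
    return mean(candidate) > mean(baseline) + tolerance
```

A regression check along these lines would compute `word_error_rate` once per article against its post-edited reference, then fail the build when `has_regressed` returns true for the new checkout.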