A repository of scores for MT models based on contributed test set translations. The scores are intended to feed into the OPUS-MT dashboard.
The `scores` directory includes leaderboards for each evaluated benchmark in OPUS-MT-testsets. The benchmark-specific leaderboards are stored in SQLite3 databases, one per benchmark:
```
scores/bleu_scores.db
scores/chrf_scores.db
scores/chrf++_scores.db
...
```
There are also plain text files that record the timestamp of the last update of each database file. This makes it easy for a web application to decide whether it needs to fetch a new version of a database. The timestamp files follow the same naming convention:
```
scores/bleu_scores.date
scores/chrf_scores.date
scores/chrf++_scores.date
...
```
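As a minimal sketch of how a client could use these files, the snippet below compares the remote `.date` file with a locally cached copy and only downloads the database when the timestamp has changed. The base URL, cache layout, and helper function are assumptions for illustration, not part of this repository:

```python
import urllib.request
from pathlib import Path

# Placeholder base URL for the raw leaderboard files; adjust to the actual hosting location.
BASE_URL = "https://raw.githubusercontent.com/Helsinki-NLP/OPUS-MT-leaderboard/master"

def fetch_if_outdated(metric: str, cache_dir: Path = Path("cache")) -> Path:
    """Download scores/<metric>_scores.db only if its .date file has changed."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    date_file = cache_dir / f"{metric}_scores.date"
    db_file = cache_dir / f"{metric}_scores.db"

    remote_date = urllib.request.urlopen(f"{BASE_URL}/scores/{metric}_scores.date").read()
    local_date = date_file.read_bytes() if date_file.exists() else b""

    if remote_date != local_date or not db_file.exists():
        urllib.request.urlretrieve(f"{BASE_URL}/scores/{metric}_scores.db", db_file)
        date_file.write_bytes(remote_date)
    return db_file

db_path = fetch_if_outdated("bleu")
```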
Each database contains just one simple table called `scores` with the essential information about test set scores:
| field | data type | explanation |
|---|---|---|
| model | TEXT | model name, including its path in this repository relative to the `models` directory |
| langpair | TEXT | language pair of the benchmark from OPUS-MT-testsets (ISO639-3 language codes) |
| testset | TEXT | name of the benchmark (according to OPUS-MT-testsets) |
| score | NUMERIC | score of the evaluation |
The primary key is (model, langpair, testset).
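A minimal sketch of querying one of the leaderboard databases with Python's built-in sqlite3 module; the database path assumes a local checkout of this repository, and the benchmark and language pair values are just illustrative:

```python
import sqlite3

# Connect to one of the benchmark-specific leaderboard databases.
con = sqlite3.connect("scores/bleu_scores.db")

# Top 10 models for a given benchmark and language pair, best score first.
rows = con.execute(
    """SELECT model, score FROM scores
       WHERE testset = ? AND langpair = ?
       ORDER BY score DESC LIMIT 10""",
    ("flores200-devtest", "eng-rus"),
).fetchall()

for model, score in rows:
    print(f"{score:6.1f}\t{model}")

con.close()
```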
All scores are taken from the model score files described below.
The repository also includes recipes for evaluating MT models and scores obtained by systematically running MT benchmarks.
Each subdirectory in `models` corresponds to a specific model type and includes tables of automatic evaluation results.
The structure corresponds to the repository of OPUS-MT models with separate tables for different evaluation metrics (like BLEU, chrF and COMET):
```
models/provider/model-release-name.bleu-scores.txt
models/provider/model-release-name.spbleu-scores.txt
models/provider/model-release-name.chrf-scores.txt
models/provider/model-release-name.chrf++-scores.txt
models/provider/model-release-name.comet-scores.txt
```
The `provider` part specifies the name of the model provider (for example, facebook). The `model-release-name` part corresponds to the release name of the model (for example, nllb-200-54.5B).
There is also another file that combines BLEU and chrF scores together with some additional information about the test set and the model (see further below):

```
models/provider/model-release-name.scores.txt
```
Additional metrics can be added using the same format, replacing `metric` in `model-release-name.metric-scores.txt` with a descriptive, unique name for the metric.
Note that, for historical reasons, chrF scores are stored as decimal fractions and not as the percentages reported by current versions of sacrebleu. This matches the implementation of the OPUS-MT leaderboard web interface.
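A minimal sketch of how scores computed with sacrebleu could be scaled before being written to the score files; the example sentences are placeholders, and the scaling assumes a current sacrebleu version that reports chrF on a 0-100 scale:

```python
import sacrebleu

# Placeholder system output and references (one reference stream).
hyps = ["The cat sits on the mat ."]
refs = [["The cat sat on the mat ."]]

bleu = sacrebleu.corpus_bleu(hyps, refs)   # BLEU on a 0-100 scale
chrf = sacrebleu.corpus_chrf(hyps, refs)   # current sacrebleu also reports chrF on a 0-100 scale

print(f"BLEU: {bleu.score:.1f}")           # stored as-is, e.g. 6.8
print(f"chrF: {chrf.score / 100:.5f}")     # divided by 100 to get a decimal fraction, e.g. 0.36182
```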
Each model score file for a specific evaluation metric follows a very simple format: a plain text file with TAB-separated values in three columns specifying

- the language pair of the benchmark (e.g. eng-rus)
- the name of the benchmark (e.g. flores200-devtest)
- the score
As an example, the file models/facebook/nllb-200-54.5B.bleu-scores.txt for the NLLB model from facebook includes the following lines:
```
ace_Arab-ace_Latn	flores200-devtest	6.8
ace_Arab-acm	flores200-devtest	2.5
ace_Arab-acq	flores200-devtest	2.4
ace_Arab-aeb	flores200-devtest	3.3
ace_Arab-afr	flores200-devtest	8.0
...
```
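Such a file can be read with a few lines of Python; a minimal sketch using the file from the example above:

```python
import csv

# Read a metric-specific score file: langpair <TAB> testset <TAB> score
with open("models/facebook/nllb-200-54.5B.bleu-scores.txt", newline="") as f:
    for langpair, testset, score in csv.reader(f, delimiter="\t"):
        print(langpair, testset, float(score))
```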
The only file that differs from this general format is models/provider/model-release-name.scores.txt, which combines BLEU and chrF scores. In addition to the scores, this file also includes
- the link to the actual model for downloading
- the size of the benchmark in terms of the number of sentences
- the size of the benchmark in terms of the number of tokens
Here is an example from models/facebook/nllb-200-54.5B.scores.txt:
```
ace_Arab-ace_Latn	flores200-devtest	0.36182	6.8	facebook/nllb-200-54.5B	1012	24121
ace_Arab-acm	flores200-devtest	0.2232	2.5	facebook/nllb-200-54.5B	1012	20497
ace_Arab-acq	flores200-devtest	0.21289	2.4	facebook/nllb-200-54.5B	1012	20945
ace_Arab-aeb	flores200-devtest	0.2582	3.3	facebook/nllb-200-54.5B	1012	20498
ace_Arab-afr	flores200-devtest	0.34476	8.0	facebook/nllb-200-54.5B	1012	25740
ace_Arab-ajp	flores200-devtest	0.27019	4.1	facebook/nllb-200-54.5B	1012	20450
ace_Arab-aka	flores200-devtest	0.24491	3.9	facebook/nllb-200-54.5B	1012	29549
ace_Arab-als	flores200-devtest	0.33013	8.8	facebook/nllb-200-54.5B	1012	27783
...
```
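A minimal sketch of reading the combined file; the column order (language pair, benchmark, chrF, BLEU, model link, sentence count, token count) is inferred from the example above rather than formally specified:

```python
import csv

# Column order inferred from the example above:
# langpair, testset, chrF (decimal), BLEU, model link, #sentences, #tokens
with open("models/facebook/nllb-200-54.5B.scores.txt", newline="") as f:
    for langpair, testset, chrf, bleu, model, n_sents, n_tokens in csv.reader(f, delimiter="\t"):
        print(f"{langpair}\t{testset}\tchrF={float(chrf):.3f}\tBLEU={float(bleu):.1f}\t"
              f"{n_sents} sentences, {n_tokens} tokens")
```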
Related tools and resources:

- MT-ComparEval, with live instances for WMT submissions and other experiments
- compare-mt, a command-line tool for MT output comparison (available as a pip package)
- the intento report on the state of MT