# opus-2020-06-28.zip

* dataset: opus
* model: transformer
* source language(s): eng
* target language(s): asm awa ben bho gom guj hif_Latn hin mai mar npi ori pan_Guru pnb rom sin snd_Arab urd
* pre-processing: normalization + SentencePiece (spm32k,spm32k)
* a sentence-initial language token is required in the form `>>id<<` (id = valid target language ID)
* download: opus-2020-06-28.zip
* test set translations: opus-2020-06-28.test.txt
* test set scores: opus-2020-06-28.eval.txt
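Because this is a multi-target model, every source sentence must carry the `>>id<<` token before it is fed to the tokenizer. A minimal sketch of that step (the helper name is illustrative; the real pipeline applies SentencePiece after this prefixing):

```python
# Valid target language IDs for this release (taken from the list above).
TARGET_LANGS = {
    "asm", "awa", "ben", "bho", "gom", "guj", "hif_Latn", "hin", "mai",
    "mar", "npi", "ori", "pan_Guru", "pnb", "rom", "sin", "snd_Arab", "urd",
}

def add_lang_token(sentence: str, target_lang: str) -> str:
    """Prepend the sentence-initial >>id<< token required by the model."""
    if target_lang not in TARGET_LANGS:
        raise ValueError(f"unknown target language ID: {target_lang}")
    return f">>{target_lang}<< {sentence}"

print(add_lang_token("How are you?", "hin"))  # >>hin<< How are you?
```

Without the token the model has no way to know which of the 18 target languages to produce, so output quality degrades sharply.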

## Benchmarks

| testset | BLEU | chr-F |
|---------|------|-------|
| Tatoeba-test.eng-asm.eng.asm | 3.0 | 0.245 |
| Tatoeba-test.eng-awa.eng.awa | 0.4 | 0.098 |
| Tatoeba-test.eng-ben.eng.ben | 16.5 | 0.481 |
| Tatoeba-test.eng-bho.eng.bho | 0.8 | 0.110 |
| Tatoeba-test.eng-guj.eng.guj | 19.9 | 0.393 |
| Tatoeba-test.eng-hif.eng.hif | 0.5 | 0.022 |
| Tatoeba-test.eng-hin.eng.hin | 17.4 | 0.463 |
| Tatoeba-test.eng-kok.eng.kok | 8.1 | 0.006 |
| Tatoeba-test.eng-lah.eng.lah | 0.2 | 0.001 |
| Tatoeba-test.eng-mai.eng.mai | 7.6 | 0.374 |
| Tatoeba-test.eng-mar.eng.mar | 20.4 | 0.464 |
| Tatoeba-test.eng.multi | 17.0 | 0.442 |
| Tatoeba-test.eng-nep.eng.nep | 1.0 | 0.102 |
| Tatoeba-test.eng-ori.eng.ori | 2.2 | 0.198 |
| Tatoeba-test.eng-pan.eng.pan | 8.4 | 0.343 |
| Tatoeba-test.eng-rom.eng.rom | 0.3 | 0.185 |
| Tatoeba-test.eng-sin.eng.sin | 9.5 | 0.368 |
| Tatoeba-test.eng-snd.eng.snd | 6.8 | 0.343 |
| Tatoeba-test.eng-urd.eng.urd | 12.5 | 0.414 |
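The chr-F column is a character n-gram F-score, which is more forgiving than BLEU for morphologically rich Indic languages. A simplified pure-Python sketch of the metric (character n-grams up to order 6, β = 2; real evaluation should use the sacreBLEU implementation, which also handles word n-grams and whitespace differently):

```python
from collections import Counter

def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams, ignoring spaces."""
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chr-F: averaged n-gram precision/recall combined as F-beta."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / max(sum(hyp.values()), 1))
        recalls.append(overlap / max(sum(ref.values()), 1))
    p, r = sum(precisions) / max_n, sum(recalls) / max_n
    if p + r == 0.0:
        return 0.0
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```

A perfect match scores 1.0 and disjoint strings score 0.0; the scores in the tables here are on that 0–1 scale.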

# opus-2020-07-06.zip

* dataset: opus
* model: transformer
* source language(s): eng
* target language(s): asm awa ben bho gom guj hif_Latn hin mai mar npi ori pan_Guru pnb rom san_Deva sin snd_Arab urd
* pre-processing: normalization + SentencePiece (spm32k,spm32k)
* a sentence-initial language token is required in the form `>>id<<` (id = valid target language ID)
* download: opus-2020-07-06.zip
* test set translations: opus-2020-07-06.test.txt
* test set scores: opus-2020-07-06.eval.txt

## Benchmarks

| testset | BLEU | chr-F |
|---------|------|-------|
| Tatoeba-test.eng-asm.eng.asm | 3.6 | 0.277 |
| Tatoeba-test.eng-awa.eng.awa | 0.4 | 0.144 |
| Tatoeba-test.eng-ben.eng.ben | 15.9 | 0.466 |
| Tatoeba-test.eng-bho.eng.bho | 0.6 | 0.152 |
| Tatoeba-test.eng-guj.eng.guj | 20.9 | 0.380 |
| Tatoeba-test.eng-hif.eng.hif | 0.6 | 0.032 |
| Tatoeba-test.eng-hin.eng.hin | 17.2 | 0.461 |
| Tatoeba-test.eng-kok.eng.kok | 3.3 | 0.022 |
| Tatoeba-test.eng-lah.eng.lah | 0.3 | 0.007 |
| Tatoeba-test.eng-mai.eng.mai | 8.9 | 0.392 |
| Tatoeba-test.eng-mar.eng.mar | 20.1 | 0.463 |
| Tatoeba-test.eng.multi | 16.8 | 0.439 |
| Tatoeba-test.eng-nep.eng.nep | 0.6 | 0.058 |
| Tatoeba-test.eng-ori.eng.ori | 2.2 | 0.187 |
| Tatoeba-test.eng-pan.eng.pan | 9.6 | 0.351 |
| Tatoeba-test.eng-rom.eng.rom | 0.4 | 0.188 |
| Tatoeba-test.eng-san.eng.san | 1.5 | 0.111 |
| Tatoeba-test.eng-sin.eng.sin | 9.1 | 0.370 |
| Tatoeba-test.eng-snd.eng.snd | 1.9 | 0.235 |
| Tatoeba-test.eng-urd.eng.urd | 12.7 | 0.412 |

# opus-2020-07-26.zip

* dataset: opus
* model: transformer
* source language(s): eng
* target language(s): asm awa ben bho gom guj hif_Latn hin mai mar npi ori pan_Guru pnb rom san_Deva sin snd_Arab urd
* pre-processing: normalization + SentencePiece (spm32k,spm32k)
* a sentence-initial language token is required in the form `>>id<<` (id = valid target language ID)
* download: opus-2020-07-26.zip
* test set translations: opus-2020-07-26.test.txt
* test set scores: opus-2020-07-26.eval.txt

## Benchmarks

| testset | BLEU | chr-F |
|---------|------|-------|
| newsdev2014-enghin.eng.hin | 7.5 | 0.337 |
| newsdev2019-engu-engguj.eng.guj | 6.3 | 0.282 |
| newstest2014-hien-enghin.eng.hin | 11.0 | 0.358 |
| newstest2019-engu-engguj.eng.guj | 7.1 | 0.291 |
| Tatoeba-test.eng-asm.eng.asm | 3.7 | 0.260 |
| Tatoeba-test.eng-awa.eng.awa | 0.4 | 0.144 |
| Tatoeba-test.eng-ben.eng.ben | 16.0 | 0.466 |
| Tatoeba-test.eng-bho.eng.bho | 0.6 | 0.143 |
| Tatoeba-test.eng-guj.eng.guj | 20.2 | 0.375 |
| Tatoeba-test.eng-hif.eng.hif | 0.5 | 0.040 |
| Tatoeba-test.eng-hin.eng.hin | 17.3 | 0.462 |
| Tatoeba-test.eng-kok.eng.kok | 3.3 | 0.044 |
| Tatoeba-test.eng-lah.eng.lah | 0.2 | 0.005 |
| Tatoeba-test.eng-mai.eng.mai | 9.3 | 0.385 |
| Tatoeba-test.eng-mar.eng.mar | 19.9 | 0.461 |
| Tatoeba-test.eng.multi | 16.6 | 0.436 |
| Tatoeba-test.eng-nep.eng.nep | 0.7 | 0.067 |
| Tatoeba-test.eng-ori.eng.ori | 2.2 | 0.196 |
| Tatoeba-test.eng-pan.eng.pan | 7.0 | 0.342 |
| Tatoeba-test.eng-rom.eng.rom | 0.4 | 0.187 |
| Tatoeba-test.eng-san.eng.san | 1.7 | 0.109 |
| Tatoeba-test.eng-sin.eng.sin | 9.1 | 0.365 |
| Tatoeba-test.eng-snd.eng.snd | 5.6 | 0.343 |
| Tatoeba-test.eng-urd.eng.urd | 12.9 | 0.411 |

# opus2m-2020-08-01.zip

* dataset: opus2m
* model: transformer
* source language(s): eng
* target language(s): asm awa ben bho gom guj hif_Latn hin mai mar npi ori pan_Guru pnb rom san_Deva sin snd_Arab urd
* pre-processing: normalization + SentencePiece (spm32k,spm32k)
* a sentence-initial language token is required in the form `>>id<<` (id = valid target language ID)
* download: opus2m-2020-08-01.zip
* test set translations: opus2m-2020-08-01.test.txt
* test set scores: opus2m-2020-08-01.eval.txt

## Benchmarks

| testset | BLEU | chr-F |
|---------|------|-------|
| newsdev2014-enghin.eng.hin | 8.2 | 0.342 |
| newsdev2019-engu-engguj.eng.guj | 6.5 | 0.293 |
| newstest2014-hien-enghin.eng.hin | 11.4 | 0.364 |
| newstest2019-engu-engguj.eng.guj | 7.2 | 0.296 |
| Tatoeba-test.eng-asm.eng.asm | 2.7 | 0.277 |
| Tatoeba-test.eng-awa.eng.awa | 0.5 | 0.132 |
| Tatoeba-test.eng-ben.eng.ben | 16.7 | 0.470 |
| Tatoeba-test.eng-bho.eng.bho | 4.3 | 0.227 |
| Tatoeba-test.eng-guj.eng.guj | 17.5 | 0.373 |
| Tatoeba-test.eng-hif.eng.hif | 0.6 | 0.028 |
| Tatoeba-test.eng-hin.eng.hin | 17.7 | 0.469 |
| Tatoeba-test.eng-kok.eng.kok | 1.7 | 0.000 |
| Tatoeba-test.eng-lah.eng.lah | 0.3 | 0.028 |
| Tatoeba-test.eng-mai.eng.mai | 15.6 | 0.429 |
| Tatoeba-test.eng-mar.eng.mar | 21.3 | 0.477 |
| Tatoeba-test.eng.multi | 17.3 | 0.448 |
| Tatoeba-test.eng-nep.eng.nep | 0.8 | 0.081 |
| Tatoeba-test.eng-ori.eng.ori | 2.2 | 0.208 |
| Tatoeba-test.eng-pan.eng.pan | 8.0 | 0.347 |
| Tatoeba-test.eng-rom.eng.rom | 0.4 | 0.197 |
| Tatoeba-test.eng-san.eng.san | 0.5 | 0.108 |
| Tatoeba-test.eng-sin.eng.sin | 9.1 | 0.364 |
| Tatoeba-test.eng-snd.eng.snd | 4.4 | 0.284 |
| Tatoeba-test.eng-urd.eng.urd | 13.3 | 0.423 |
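The four releases can be compared on the aggregate Tatoeba-test.eng.multi row. The scores below are copied from the benchmark tables above; the selection helper itself is just an illustration of how one might pick a release programmatically:

```python
# Aggregate Tatoeba-test.eng.multi scores, copied from the tables above.
RELEASES = {
    "opus-2020-06-28":   {"bleu": 17.0, "chrf": 0.442},
    "opus-2020-07-06":   {"bleu": 16.8, "chrf": 0.439},
    "opus-2020-07-26":   {"bleu": 16.6, "chrf": 0.436},
    "opus2m-2020-08-01": {"bleu": 17.3, "chrf": 0.448},
}

def best_release(metric: str = "bleu") -> str:
    """Return the release with the highest aggregate score on the given metric."""
    return max(RELEASES, key=lambda name: RELEASES[name][metric])

print(best_release("bleu"))  # opus2m-2020-08-01
```

On both aggregate metrics the opus2m-2020-08-01 release scores highest, though individual language pairs (e.g. eng-guj, eng-snd) peak in earlier releases, so the per-pair rows are worth checking before choosing.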