eng-bat

opus-2020-06-28.zip

  • dataset: opus
  • model: transformer
  • source language(s): eng
  • target language(s): lav lit ltg prg_Latn sgs
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token is required in the form >>id<< (id = a valid target language ID)
  • download: opus-2020-06-28.zip
  • test set translations: opus-2020-06-28.test.txt
  • test set scores: opus-2020-06-28.eval.txt
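
Because the model translates into several target languages, each source sentence has to carry the >>id<< token before it is passed to the model. The following is a minimal sketch of that step only, not the project's own tooling: the helper name `add_language_token` is hypothetical, and the single space between the token and the sentence is an assumption based on common usage.

```python
# Target language IDs listed for this model in the card above.
TARGET_LANGS = ("lav", "lit", "ltg", "prg_Latn", "sgs")


def add_language_token(sentence: str, target_lang: str) -> str:
    """Prefix a source sentence with the >>id<< token (hypothetical helper).

    Assumption: the token is separated from the sentence by one space.
    """
    if target_lang not in TARGET_LANGS:
        raise ValueError(
            f"unknown target language {target_lang!r}; "
            f"expected one of {', '.join(TARGET_LANGS)}"
        )
    return f">>{target_lang}<< {sentence.strip()}"


if __name__ == "__main__":
    # Prepare the same English sentence for Latvian and Lithuanian output.
    for lang in ("lav", "lit"):
        print(add_language_token("How are you today?", lang))
```

The same requirement applies to every release listed in this file; only the sentence prefix changes when switching target language, not the model.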

Benchmarks

| testset | BLEU | chr-F |
|---|---|---|
| newsdev2017-enlv-englav.eng.lav | 22.4 | 0.533 |
| newsdev2019-enlt-englit.eng.lit | 19.5 | 0.520 |
| newstest2017-enlv-englav.eng.lav | 17.3 | 0.493 |
| newstest2019-enlt-englit.eng.lit | 12.7 | 0.453 |
| Tatoeba-test.eng-lav.eng.lav | 40.4 | 0.637 |
| Tatoeba-test.eng-lit.eng.lit | 35.1 | 0.634 |
| Tatoeba-test.eng.multi | 33.9 | 0.596 |
| Tatoeba-test.eng-prg.eng.prg | 0.2 | 0.110 |
| Tatoeba-test.eng-sgs.eng.sgs | 1.5 | 0.136 |

opus-2020-07-26.zip

  • dataset: opus
  • model: transformer
  • source language(s): eng
  • target language(s): lav lit ltg prg_Latn sgs
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token is required in the form >>id<< (id = a valid target language ID)
  • download: opus-2020-07-26.zip
  • test set translations: opus-2020-07-26.test.txt
  • test set scores: opus-2020-07-26.eval.txt

Benchmarks

| testset | BLEU | chr-F |
|---|---|---|
| newsdev2017-enlv-englav.eng.lav | 22.8 | 0.533 |
| newsdev2019-enlt-englit.eng.lit | 19.4 | 0.518 |
| newstest2017-enlv-englav.eng.lav | 17.2 | 0.493 |
| newstest2019-enlt-englit.eng.lit | 13.1 | 0.456 |
| Tatoeba-test.eng-lav.eng.lav | 41.2 | 0.636 |
| Tatoeba-test.eng-lit.eng.lit | 34.6 | 0.631 |
| Tatoeba-test.eng.multi | 35.1 | 0.599 |
| Tatoeba-test.eng-prg.eng.prg | 0.5 | 0.130 |
| Tatoeba-test.eng-sgs.eng.sgs | 3.8 | 0.192 |

opus2m-2020-08-01.zip

  • dataset: opus2m
  • model: transformer
  • source language(s): eng
  • target language(s): lav lit ltg prg_Latn sgs
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token is required in the form >>id<< (id = a valid target language ID)
  • download: opus2m-2020-08-01.zip
  • test set translations: opus2m-2020-08-01.test.txt
  • test set scores: opus2m-2020-08-01.eval.txt

Benchmarks

| testset | BLEU | chr-F |
|---|---|---|
| newsdev2017-enlv-englav.eng.lav | 24.0 | 0.546 |
| newsdev2019-enlt-englit.eng.lit | 20.9 | 0.533 |
| newstest2017-enlv-englav.eng.lav | 18.3 | 0.506 |
| newstest2019-enlt-englit.eng.lit | 13.6 | 0.466 |
| Tatoeba-test.eng-lav.eng.lav | 42.8 | 0.652 |
| Tatoeba-test.eng-lit.eng.lit | 37.1 | 0.650 |
| Tatoeba-test.eng.multi | 37.0 | 0.616 |
| Tatoeba-test.eng-prg.eng.prg | 0.5 | 0.130 |
| Tatoeba-test.eng-sgs.eng.sgs | 4.1 | 0.178 |