sem-sem

opus-2020-07-27.zip

  • dataset: opus
  • model: transformer
  • source language(s): apc ara arq arz heb mlt
  • target language(s): apc ara arq arz heb mlt
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token in the form >>id<< is required (id = a valid target language ID)
  • download: opus-2020-07-27.zip
  • test set translations: opus-2020-07-27.test.txt
  • test set scores: opus-2020-07-27.eval.txt

Benchmarks

testset BLEU chr-F
Tatoeba-test.ara-ara.ara.ara 4.2 0.200
Tatoeba-test.ara-heb.ara.heb 34.0 0.542
Tatoeba-test.ara-mlt.ara.mlt 16.6 0.513
Tatoeba-test.heb-ara.heb.ara 18.8 0.477
Tatoeba-test.mlt-ara.mlt.ara 20.7 0.388
Tatoeba-test.multi.multi 27.1 0.507
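The chr-F column in these tables is the character n-gram F-score (chrF), which complements BLEU for morphologically rich languages such as Arabic and Hebrew. A rough pure-Python sketch of the default variant (character 1- to 6-grams, β = 2, whitespace removed); published scores are computed with standard tooling such as sacrebleu, so treat this only as an illustration of the metric:

```python
from collections import Counter

def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Character n-gram F-score: average n-gram precision and recall
    over n = 1..max_n, then combine with an F_beta (beta = 2 weights
    recall twice as heavily as precision)."""
    hyp = hypothesis.replace(" ", "")
    ref = reference.replace(" ", "")
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp_ngrams = Counter(hyp[i:i + n] for i in range(len(hyp) - n + 1))
        ref_ngrams = Counter(ref[i:i + n] for i in range(len(ref) - n + 1))
        overlap = sum((hyp_ngrams & ref_ngrams).values())  # clipped matches
        if hyp_ngrams:
            precisions.append(overlap / sum(hyp_ngrams.values()))
        if ref_ngrams:
            recalls.append(overlap / sum(ref_ngrams.values()))
    p = sum(precisions) / len(precisions) if precisions else 0.0
    r = sum(recalls) / len(recalls) if recalls else 0.0
    if p + r == 0.0:
        return 0.0
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)

print(round(chrf("the cat sat", "the cat sat"), 3))  # 1.0
```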

opus-2020-09-26.zip

  • dataset: opus
  • model: transformer
  • source language(s): acm afb amh apc ara arq ary arz eng heb mlt phn_Phnx syc_Syrc tir tmr_Hebr
  • target language(s): acm afb amh apc ara arq ary arz eng heb mlt phn_Phnx syc_Syrc tir tmr_Hebr
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token in the form >>id<< is required (id = a valid target language ID)
  • download: opus-2020-09-26.zip
  • test set translations: opus-2020-09-26.test.txt
  • test set scores: opus-2020-09-26.eval.txt

Benchmarks

testset BLEU chr-F
Tatoeba-test.amh-eng.amh.eng 42.0 0.593
Tatoeba-test.ara-ara.ara.ara 2.7 0.167
Tatoeba-test.ara-eng.ara.eng 38.6 0.564
Tatoeba-test.ara-heb.ara.heb 34.9 0.558
Tatoeba-test.ara-mlt.ara.mlt 24.3 0.532
Tatoeba-test.ara-tmr.ara.tmr 2.7 0.014
Tatoeba-test.eng-amh.eng.amh 13.7 0.510
Tatoeba-test.eng-ara.eng.ara 12.2 0.412
Tatoeba-test.eng-heb.eng.heb 32.1 0.550
Tatoeba-test.eng-mlt.eng.mlt 17.6 0.556
Tatoeba-test.eng-phn.eng.phn 1.3 0.007
Tatoeba-test.eng-tir.eng.tir 2.6 0.250
Tatoeba-test.eng-tmr.eng.tmr 1.1 0.007
Tatoeba-test.heb-ara.heb.ara 19.5 0.496
Tatoeba-test.heb-eng.heb.eng 43.3 0.598
Tatoeba-test.heb-phn.heb.phn 2.0 0.009
Tatoeba-test.heb-syc.heb.syc 3.3 0.000
Tatoeba-test.heb-tmr.heb.tmr 0.2 0.005
Tatoeba-test.mlt-ara.mlt.ara 17.3 0.427
Tatoeba-test.mlt-eng.mlt.eng 48.3 0.647
Tatoeba-test.multi.multi 33.2 0.534
Tatoeba-test.phn-eng.phn.eng 2.2 0.071
Tatoeba-test.phn-heb.phn.heb 0.4 0.044
Tatoeba-test.phn-tmr.phn.tmr 0.5 0.000
Tatoeba-test.syc-heb.syc.heb 0.0 0.000
Tatoeba-test.tir-eng.tir.eng 16.1 0.344
Tatoeba-test.tmr-ara.tmr.ara 2.5 0.075
Tatoeba-test.tmr-eng.tmr.eng 2.2 0.141
Tatoeba-test.tmr-heb.tmr.heb 1.0 0.142
Tatoeba-test.tmr-phn.tmr.phn 0.0 0.017

opus-2020-10-04.zip

  • dataset: opus
  • model: transformer
  • source language(s): acm afb amh apc ara arq ary arz eng heb mlt phn_Phnx syc_Syrc tir tmr_Hebr
  • target language(s): acm afb amh apc ara arq ary arz eng heb mlt phn_Phnx syc_Syrc tir tmr_Hebr
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token in the form >>id<< is required (id = a valid target language ID)
  • download: opus-2020-10-04.zip
  • test set translations: opus-2020-10-04.test.txt
  • test set scores: opus-2020-10-04.eval.txt

Benchmarks

testset BLEU chr-F
Tatoeba-test.amh-eng.amh.eng 41.0 0.586
Tatoeba-test.ara-ara.ara.ara 2.9 0.181
Tatoeba-test.ara-eng.ara.eng 39.0 0.566
Tatoeba-test.ara-heb.ara.heb 35.8 0.565
Tatoeba-test.ara-mlt.ara.mlt 20.3 0.575
Tatoeba-test.ara-tmr.ara.tmr 2.2 0.013
Tatoeba-test.eng-amh.eng.amh 15.7 0.531
Tatoeba-test.eng-ara.eng.ara 12.4 0.416
Tatoeba-test.eng-heb.eng.heb 32.2 0.551
Tatoeba-test.eng-mlt.eng.mlt 17.7 0.563
Tatoeba-test.eng-phn.eng.phn 1.3 0.007
Tatoeba-test.eng-tir.eng.tir 2.5 0.242
Tatoeba-test.eng-tmr.eng.tmr 1.2 0.007
Tatoeba-test.heb-ara.heb.ara 20.1 0.497
Tatoeba-test.heb-eng.heb.eng 43.4 0.600
Tatoeba-test.heb-phn.heb.phn 1.9 0.008
Tatoeba-test.heb-syc.heb.syc 0.2 0.000
Tatoeba-test.heb-tmr.heb.tmr 0.6 0.005
Tatoeba-test.mlt-ara.mlt.ara 13.5 0.451
Tatoeba-test.mlt-eng.mlt.eng 49.2 0.659
Tatoeba-test.multi.multi 33.5 0.537
Tatoeba-test.phn-eng.phn.eng 1.1 0.058
Tatoeba-test.phn-heb.phn.heb 0.2 0.046
Tatoeba-test.phn-tmr.phn.tmr 0.0 0.000
Tatoeba-test.syc-heb.syc.heb 3.3 0.045
Tatoeba-test.tir-eng.tir.eng 17.5 0.359
Tatoeba-test.tmr-ara.tmr.ara 1.4 0.077
Tatoeba-test.tmr-eng.tmr.eng 4.5 0.186
Tatoeba-test.tmr-heb.tmr.heb 1.0 0.134
Tatoeba-test.tmr-phn.tmr.phn 0.0 0.017

opus-2021-02-24.zip

  • dataset: opus
  • model: transformer
  • source language(s): afb apc ara arq arz heb jpa mlt oar phn syc tmr
  • target language(s): afb apc ara arq arz heb jpa mlt oar phn syc tmr
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token in the form >>id<< is required (id = a valid target language ID)
  • valid language labels: >>eng<< >>ara<< >>heb<< >>mlt<< >>tir<< >>amh<< >>arq<< >>arz<<
  • download: opus-2021-02-24.zip
  • test set translations: opus-2021-02-24.test.txt
  • test set scores: opus-2021-02-24.eval.txt

Benchmarks

testset BLEU chr-F #sent #words BP
Tatoeba-test.afb-tmr_Hebr 10.7 0.019 1 3 1.000
Tatoeba-test.apc-ara 29.8 0.385 2 9 0.882
Tatoeba-test.apc-arz 6.4 0.105 2 6 1.000
Tatoeba-test.apc-heb 27.5 0.234 1 5 0.779
Tatoeba-test.apc-tmr_Hebr 7.6 0.014 2 7 1.000
Tatoeba-test.ara-apc 5.5 0.012 2 8 1.000
Tatoeba-test.ara-ara 2.9 0.176 16 60 1.000
Tatoeba-test.ara-arq 4.8 0.086 1 5 1.000
Tatoeba-test.ara-arz 1.1 0.095 2 9 1.000
Tatoeba-test.ara-heb 35.8 0.566 1208 6800 1.000
Tatoeba-test.ara-mlt 19.7 0.553 28 88 1.000
Tatoeba-test.ara-tmr 2.2 0.013 8 28 1.000
Tatoeba-test.ara-tmr_Hebr 5.3 0.012 3 11 1.000
Tatoeba-test.arq-ara 10.7 0.056 1 5 1.000
Tatoeba-test.arq-heb 16.0 0.179 1 4 1.000
Tatoeba-test.arz-apc 4.1 0.007 2 6 1.000
Tatoeba-test.arz-ara 31.2 0.672 2 9 1.000
Tatoeba-test.arz-heb 0.5 0.048 2 10 1.000
Tatoeba-test.arz-tmr_Hebr 4.8 0.013 2 7 1.000
Tatoeba-test.heb-apc 12.7 0.251 1 4 1.000
Tatoeba-test.heb-ara 20.2 0.495 1208 6371 0.896
Tatoeba-test.heb-arq 6.6 0.008 1 4 1.000
Tatoeba-test.heb-arz 2.5 0.242 2 8 1.000
Tatoeba-test.heb-jpa 10.7 0.012 1 4 1.000
Tatoeba-test.heb-oar 0.1 0.001 8 95 1.000
Tatoeba-test.heb-phn 1.9 0.008 9 47 1.000
Tatoeba-test.heb-syc 0.2 0.000 1 6 1.000
Tatoeba-test.heb-tmr 0.6 0.005 16 94 1.000
Tatoeba-test.jpa-heb 12.7 0.142 1 4 1.000
Tatoeba-test.jpa-tmr 6.6 0.013 1 4 1.000
Tatoeba-test.mlt-ara 13.8 0.437 28 91 0.944
Tatoeba-test.multi-multi 26.1 0.498 2596 14039 1.000
Tatoeba-test.oar-heb 0.8 0.062 8 82 0.937
Tatoeba-test.oar-syc 0.2 0.000 1 6 1.000
Tatoeba-test.phn-heb 0.2 0.046 9 48 1.000
Tatoeba-test.phn-tmr 0.0 0.000 1 3 1.000
Tatoeba-test.syc-heb 3.3 0.045 1 6 1.000
Tatoeba-test.syc-oar 0.0 0.000 1 6 0.368
Tatoeba-test.tmr-ara 1.4 0.077 8 24 1.000
Tatoeba-test.tmr-heb 1.0 0.134 16 102 0.918
Tatoeba-test.tmr_Hebr-afb 0.0 0.000 1 2 0.368
Tatoeba-test.tmr_Hebr-apc 1.5 0.032 2 6 1.000
Tatoeba-test.tmr_Hebr-ara 4.3 0.170 3 10 1.000
Tatoeba-test.tmr_Hebr-arz 3.2 0.045 2 6 1.000
Tatoeba-test.tmr-jpa 0.0 0.000 1 4 0.050
Tatoeba-test.tmr-phn 0.0 0.017 1 4 0.717
tico19-test.eng-amh 4.3 0.238 2100 44943 0.826
tico19-test.eng-ara 15.8 0.465 2100 51336 0.938
tico19-test.eng-tir 2.8 0.182 2100 46792 0.962
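In the last table, #sent and #words give the test-set size, and BP is BLEU's brevity penalty: 1 when the hypothesis corpus is at least as long as the reference, otherwise exp(1 − r/c) for reference length r and hypothesis length c. A sketch of the standard definition (not the exact evaluation script used here):

```python
import math

def brevity_penalty(hyp_len: int, ref_len: int) -> float:
    """BLEU brevity penalty: penalizes hypothesis corpora that are
    shorter than the reference; no reward for being longer."""
    if hyp_len == 0:
        return 0.0
    if hyp_len >= ref_len:
        return 1.0
    return math.exp(1.0 - ref_len / hyp_len)

# The BP of 0.368 on several one-sentence rows above corresponds to a
# hypothesis half the reference length: exp(1 - 2) ≈ 0.368.
print(round(brevity_penalty(1, 2), 3))  # 0.368
```

Note that on the tiny language pairs (1–3 sentences), BLEU, chr-F, and BP are all dominated by noise; only the larger rows (e.g. ara-heb with 1208 sentences) are meaningful.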