opus-2020-06-28.zip

  • dataset: opus
  • model: transformer
  • source language(s): eng
  • target language(s): arg ast cat cos egl ext fra frm_Latn fvr glg ita lad lad_Latn lij lld_Latn lmo mwl oci osp_Latn pms por roh ron scn spa vec wln
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token is required, in the form >>id<< (id = a valid target language ID)
  • download: opus-2020-06-28.zip
  • test set translations: opus-2020-06-28.test.txt
  • test set scores: opus-2020-06-28.eval.txt
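
The language-token requirement above can be illustrated with a minimal sketch. The helper name `add_language_token` is hypothetical and not part of any released tooling. The ID list is copied from the target languages of this release. Separating the token from the sentence with a single space is an assumption, not something this README states.

```python
# Target language IDs listed for opus-2020-06-28 (copied from the list above).
TARGET_IDS = (
    "arg ast cat cos egl ext fra frm_Latn fvr glg ita lad lad_Latn lij "
    "lld_Latn lmo mwl oci osp_Latn pms por roh ron scn spa vec wln"
).split()


def add_language_token(sentence, target_id, valid_ids=TARGET_IDS):
    """Prefix a source sentence with the >>id<< token for the target language.

    Hypothetical helper. A single space between token and text is assumed.
    """
    if target_id not in valid_ids:
        raise ValueError(f"unknown target language ID: {target_id!r}")
    return f">>{target_id}<< {sentence.strip()}"


# Example: mark an English sentence for translation into French.
tagged = add_language_token("How are you today?", "fra")
print(tagged)
```

Validating the ID before tagging is a design choice made here for illustration: a multilingual model cannot be expected to handle a token it was never trained with, so rejecting it early is easier to debug than a poor translation.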

Benchmarks

testset BLEU chr-F
Tatoeba-test.eng-arg.eng.arg 2.2 0.147
Tatoeba-test.eng-ast.eng.ast 17.2 0.415
Tatoeba-test.eng-cat.eng.cat 47.7 0.669
Tatoeba-test.eng-cos.eng.cos 3.2 0.262
Tatoeba-test.eng-egl.eng.egl 0.4 0.119
Tatoeba-test.eng-ext.eng.ext 5.5 0.304
Tatoeba-test.eng-fra.eng.fra 45.8 0.641
Tatoeba-test.eng-frm.eng.frm 0.9 0.212
Tatoeba-test.eng-fvr.eng.fvr 2.6 0.260
Tatoeba-test.eng-glg.eng.glg 45.8 0.655
Tatoeba-test.eng-ita.eng.ita 45.9 0.678
Tatoeba-test.eng-lad.eng.lad 8.9 0.324
Tatoeba-test.eng-lij.eng.lij 1.8 0.191
Tatoeba-test.eng-lld.eng.lld 0.5 0.215
Tatoeba-test.eng-lmo.eng.lmo 0.9 0.203
Tatoeba-test.eng.multi 44.1 0.645
Tatoeba-test.eng-mwl.eng.mwl 4.1 0.331
Tatoeba-test.eng-oci.eng.oci 7.8 0.289
Tatoeba-test.eng-osp.eng.osp 10.8 0.382
Tatoeba-test.eng-pms.eng.pms 1.8 0.197
Tatoeba-test.eng-por.eng.por 41.7 0.637
Tatoeba-test.eng-roh.eng.roh 2.8 0.257
Tatoeba-test.eng-ron.eng.ron 41.8 0.640
Tatoeba-test.eng-scn.eng.scn 1.8 0.175
Tatoeba-test.eng-spa.eng.spa 50.3 0.691
Tatoeba-test.eng-vec.eng.vec 3.2 0.251
Tatoeba-test.eng-wln.eng.wln 6.6 0.236
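
Benchmark rows in this layout can be read into a dictionary for comparison across test sets or releases. The following is a minimal sketch, assuming whitespace-separated rows of the form `testset BLEU chr-F`. The function name `parse_benchmarks` is hypothetical, and the sample rows are copied from the opus-2020-06-28 table.

```python
def parse_benchmarks(text):
    """Map each test set name to its (BLEU, chr-F) pair."""
    scores = {}
    for line in text.strip().splitlines():
        parts = line.split()
        # Skip the header row and anything that is not "name BLEU chr-F".
        if len(parts) != 3 or parts[0] == "testset":
            continue
        name, bleu, chrf = parts
        scores[name] = (float(bleu), float(chrf))
    return scores


# A few rows copied from the opus-2020-06-28 benchmarks.
SAMPLE = """
testset BLEU chr-F
Tatoeba-test.eng-cat.eng.cat 47.7 0.669
Tatoeba-test.eng-fra.eng.fra 45.8 0.641
Tatoeba-test.eng-egl.eng.egl 0.4 0.119
Tatoeba-test.eng-spa.eng.spa 50.3 0.691
"""

scores = parse_benchmarks(SAMPLE)
# Test set with the highest BLEU among the sampled rows.
best = max(scores, key=lambda name: scores[name][0])
print(best, scores[best])
```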

opus-2020-07-14.zip

  • dataset: opus
  • model: transformer
  • source language(s): eng
  • target language(s): arg ast cat cos egl ext fra frm_Latn gcf_Latn glg hat ind ita lad lad_Latn lij lld_Latn lmo max_Latn mfe min mwl oci pap pms por roh ron scn spa tmw_Latn vec wln zlm_Latn zsm_Latn
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token is required, in the form >>id<< (id = a valid target language ID)
  • download: opus-2020-07-14.zip
  • test set translations: opus-2020-07-14.test.txt
  • test set scores: opus-2020-07-14.eval.txt

Benchmarks

testset BLEU chr-F
Tatoeba-test.eng-arg.eng.arg 1.7 0.133
Tatoeba-test.eng-ast.eng.ast 17.2 0.415
Tatoeba-test.eng-cat.eng.cat 47.5 0.668
Tatoeba-test.eng-cos.eng.cos 1.8 0.215
Tatoeba-test.eng-egl.eng.egl 0.4 0.087
Tatoeba-test.eng-ext.eng.ext 13.7 0.353
Tatoeba-test.eng-fra.eng.fra 44.1 0.629
Tatoeba-test.eng-frm.eng.frm 0.6 0.196
Tatoeba-test.eng-gcf.eng.gcf 0.9 0.116
Tatoeba-test.eng-glg.eng.glg 43.7 0.640
Tatoeba-test.eng-hat.eng.hat 30.1 0.529
Tatoeba-test.eng-ita.eng.ita 44.8 0.668
Tatoeba-test.eng-lad.eng.lad 7.5 0.301
Tatoeba-test.eng-lij.eng.lij 1.5 0.187
Tatoeba-test.eng-lld.eng.lld 0.8 0.199
Tatoeba-test.eng-lmo.eng.lmo 0.8 0.177
Tatoeba-test.eng-mfe.eng.mfe 91.9 0.956
Tatoeba-test.eng.multi 42.3 0.631
Tatoeba-test.eng-mwl.eng.mwl 2.7 0.252
Tatoeba-test.eng-oci.eng.oci 7.3 0.290
Tatoeba-test.eng-pap.eng.pap 43.7 0.627
Tatoeba-test.eng-pms.eng.pms 2.4 0.194
Tatoeba-test.eng-por.eng.por 40.7 0.632
Tatoeba-test.eng-roh.eng.roh 3.5 0.258
Tatoeba-test.eng-ron.eng.ron 40.0 0.628
Tatoeba-test.eng-scn.eng.scn 1.6 0.100
Tatoeba-test.eng-spa.eng.spa 48.7 0.680
Tatoeba-test.eng-vec.eng.vec 1.9 0.166
Tatoeba-test.eng-wln.eng.wln 8.1 0.226

opus-2020-07-20.zip

  • dataset: opus
  • model: transformer
  • source language(s): eng
  • target language(s): arg ast cat cos egl ext fra frm_Latn gcf_Latn glg hat ind ita lad lad_Latn lij lld_Latn lmo max_Latn mfe min mwl oci pap pms por roh ron scn spa tmw_Latn vec wln zlm_Latn zsm_Latn
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token is required, in the form >>id<< (id = a valid target language ID)
  • download: opus-2020-07-20.zip
  • test set translations: opus-2020-07-20.test.txt
  • test set scores: opus-2020-07-20.eval.txt

Benchmarks

testset BLEU chr-F
Tatoeba-test.eng-arg.eng.arg 1.5 0.132
Tatoeba-test.eng-ast.eng.ast 15.4 0.413
Tatoeba-test.eng-cat.eng.cat 47.8 0.671
Tatoeba-test.eng-cos.eng.cos 3.3 0.293
Tatoeba-test.eng-egl.eng.egl 0.2 0.085
Tatoeba-test.eng-ext.eng.ext 11.7 0.311
Tatoeba-test.eng-fra.eng.fra 44.8 0.633
Tatoeba-test.eng-frm.eng.frm 1.0 0.213
Tatoeba-test.eng-gcf.eng.gcf 0.8 0.119
Tatoeba-test.eng-glg.eng.glg 44.5 0.646
Tatoeba-test.eng-hat.eng.hat 25.5 0.494
Tatoeba-test.eng-ita.eng.ita 45.1 0.673
Tatoeba-test.eng-lad.eng.lad 8.0 0.305
Tatoeba-test.eng-lij.eng.lij 1.5 0.178
Tatoeba-test.eng-lld.eng.lld 0.4 0.171
Tatoeba-test.eng-lmo.eng.lmo 1.5 0.191
Tatoeba-test.eng-mfe.eng.mfe 91.9 0.956
Tatoeba-test.eng-msa.eng.msa 31.2 0.548
Tatoeba-test.eng.multi 42.6 0.632
Tatoeba-test.eng-mwl.eng.mwl 3.3 0.288
Tatoeba-test.eng-oci.eng.oci 7.5 0.287
Tatoeba-test.eng-pap.eng.pap 44.8 0.630
Tatoeba-test.eng-pms.eng.pms 2.7 0.198
Tatoeba-test.eng-por.eng.por 41.3 0.635
Tatoeba-test.eng-roh.eng.roh 4.3 0.271
Tatoeba-test.eng-ron.eng.ron 40.6 0.631
Tatoeba-test.eng-scn.eng.scn 1.4 0.173
Tatoeba-test.eng-spa.eng.spa 49.2 0.684
Tatoeba-test.eng-vec.eng.vec 4.8 0.240
Tatoeba-test.eng-wln.eng.wln 5.4 0.233

opus-2020-07-27.zip

  • dataset: opus
  • model: transformer
  • source language(s): eng
  • target language(s): arg ast cat cos egl ext fra frm_Latn gcf_Latn glg hat ind ita lad lad_Latn lij lld_Latn lmo max_Latn mfe min mwl oci pap pms por roh ron scn spa tmw_Latn vec wln zlm_Latn zsm_Latn
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token is required, in the form >>id<< (id = a valid target language ID)
  • download: opus-2020-07-27.zip
  • test set translations: opus-2020-07-27.test.txt
  • test set scores: opus-2020-07-27.eval.txt

Benchmarks

testset BLEU chr-F
newsdev2016-enro-engron.eng.ron 27.3 0.565
newsdiscussdev2015-enfr-engfra.eng.fra 29.9 0.573
newsdiscusstest2015-enfr-engfra.eng.fra 35.2 0.609
newssyscomb2009-engfra.eng.fra 27.8 0.569
newssyscomb2009-engita.eng.ita 29.0 0.590
newssyscomb2009-engspa.eng.spa 29.5 0.567
news-test2008-engfra.eng.fra 25.1 0.538
news-test2008-engspa.eng.spa 27.2 0.547
newstest2009-engfra.eng.fra 26.6 0.557
newstest2009-engita.eng.ita 28.6 0.582
newstest2009-engspa.eng.spa 28.7 0.565
newstest2010-engfra.eng.fra 29.2 0.573
newstest2010-engspa.eng.spa 33.6 0.598
newstest2011-engfra.eng.fra 31.2 0.591
newstest2011-engspa.eng.spa 34.8 0.599
newstest2012-engfra.eng.fra 29.2 0.574
newstest2012-engspa.eng.spa 35.1 0.601
newstest2013-engfra.eng.fra 29.7 0.565
newstest2013-engspa.eng.spa 31.7 0.576
newstest2016-enro-engron.eng.ron 25.9 0.548
Tatoeba-test.eng-arg.eng.arg 1.7 0.131
Tatoeba-test.eng-ast.eng.ast 16.6 0.417
Tatoeba-test.eng-cat.eng.cat 47.6 0.670
Tatoeba-test.eng-cos.eng.cos 3.3 0.284
Tatoeba-test.eng-egl.eng.egl 0.9 0.118
Tatoeba-test.eng-ext.eng.ext 8.7 0.301
Tatoeba-test.eng-fra.eng.fra 44.8 0.633
Tatoeba-test.eng-frm.eng.frm 0.8 0.201
Tatoeba-test.eng-gcf.eng.gcf 0.8 0.117
Tatoeba-test.eng-glg.eng.glg 44.0 0.642
Tatoeba-test.eng-hat.eng.hat 28.8 0.510
Tatoeba-test.eng-ita.eng.ita 45.3 0.674
Tatoeba-test.eng-lad.eng.lad 8.4 0.310
Tatoeba-test.eng-lij.eng.lij 1.4 0.178
Tatoeba-test.eng-lld.eng.lld 0.8 0.220
Tatoeba-test.eng-lmo.eng.lmo 0.9 0.189
Tatoeba-test.eng-mfe.eng.mfe 82.4 0.915
Tatoeba-test.eng-msa.eng.msa 31.3 0.549
Tatoeba-test.eng.multi 42.6 0.633
Tatoeba-test.eng-mwl.eng.mwl 2.9 0.311
Tatoeba-test.eng-oci.eng.oci 7.9 0.292
Tatoeba-test.eng-pap.eng.pap 47.4 0.661
Tatoeba-test.eng-pms.eng.pms 2.5 0.198
Tatoeba-test.eng-por.eng.por 41.4 0.636
Tatoeba-test.eng-roh.eng.roh 3.2 0.259
Tatoeba-test.eng-ron.eng.ron 40.8 0.632
Tatoeba-test.eng-scn.eng.scn 1.8 0.191
Tatoeba-test.eng-spa.eng.spa 49.4 0.685
Tatoeba-test.eng-vec.eng.vec 5.1 0.253
Tatoeba-test.eng-wln.eng.wln 7.1 0.235

opus2m-2020-08-01.zip

  • dataset: opus2m
  • model: transformer
  • source language(s): eng
  • target language(s): arg ast cat cos egl ext fra frm_Latn gcf_Latn glg hat ind ita lad lad_Latn lij lld_Latn lmo max_Latn mfe min mwl oci pap pms por roh ron scn spa tmw_Latn vec wln zlm_Latn zsm_Latn
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token is required, in the form >>id<< (id = a valid target language ID)
  • download: opus2m-2020-08-01.zip
  • test set translations: opus2m-2020-08-01.test.txt
  • test set scores: opus2m-2020-08-01.eval.txt

Benchmarks

testset BLEU chr-F
newsdev2016-enro-engron.eng.ron 27.6 0.567
newsdiscussdev2015-enfr-engfra.eng.fra 30.2 0.575
newsdiscusstest2015-enfr-engfra.eng.fra 35.5 0.612
newssyscomb2009-engfra.eng.fra 27.9 0.570
newssyscomb2009-engita.eng.ita 29.3 0.590
newssyscomb2009-engspa.eng.spa 29.6 0.570
news-test2008-engfra.eng.fra 25.2 0.538
news-test2008-engspa.eng.spa 27.3 0.548
newstest2009-engfra.eng.fra 26.9 0.560
newstest2009-engita.eng.ita 28.7 0.583
newstest2009-engspa.eng.spa 29.0 0.568
newstest2010-engfra.eng.fra 29.3 0.574
newstest2010-engspa.eng.spa 34.2 0.601
newstest2011-engfra.eng.fra 31.4 0.592
newstest2011-engspa.eng.spa 35.0 0.599
newstest2012-engfra.eng.fra 29.5 0.576
newstest2012-engspa.eng.spa 35.5 0.603
newstest2013-engfra.eng.fra 29.9 0.567
newstest2013-engspa.eng.spa 32.1 0.578
newstest2016-enro-engron.eng.ron 26.1 0.551
Tatoeba-test.eng-arg.eng.arg 1.4 0.125
Tatoeba-test.eng-ast.eng.ast 17.8 0.406
Tatoeba-test.eng-cat.eng.cat 48.3 0.676
Tatoeba-test.eng-cos.eng.cos 3.2 0.275
Tatoeba-test.eng-egl.eng.egl 0.2 0.084
Tatoeba-test.eng-ext.eng.ext 11.2 0.344
Tatoeba-test.eng-fra.eng.fra 45.3 0.637
Tatoeba-test.eng-frm.eng.frm 1.1 0.221
Tatoeba-test.eng-gcf.eng.gcf 0.6 0.118
Tatoeba-test.eng-glg.eng.glg 44.2 0.645
Tatoeba-test.eng-hat.eng.hat 28.0 0.502
Tatoeba-test.eng-ita.eng.ita 45.6 0.674
Tatoeba-test.eng-lad.eng.lad 8.2 0.322
Tatoeba-test.eng-lij.eng.lij 1.4 0.182
Tatoeba-test.eng-lld.eng.lld 0.8 0.217
Tatoeba-test.eng-lmo.eng.lmo 0.7 0.190
Tatoeba-test.eng-mfe.eng.mfe 91.9 0.956
Tatoeba-test.eng-msa.eng.msa 31.1 0.548
Tatoeba-test.eng.multi 42.9 0.636
Tatoeba-test.eng-mwl.eng.mwl 2.1 0.234
Tatoeba-test.eng-oci.eng.oci 7.9 0.297
Tatoeba-test.eng-pap.eng.pap 44.1 0.648
Tatoeba-test.eng-pms.eng.pms 2.1 0.190
Tatoeba-test.eng-por.eng.por 41.8 0.639
Tatoeba-test.eng-roh.eng.roh 3.5 0.261
Tatoeba-test.eng-ron.eng.ron 41.0 0.635
Tatoeba-test.eng-scn.eng.scn 1.7 0.184
Tatoeba-test.eng-spa.eng.spa 50.1 0.689
Tatoeba-test.eng-vec.eng.vec 3.2 0.248
Tatoeba-test.eng-wln.eng.wln 7.2 0.220