It appears that the pipeline only supports training a joint BPE model, but it is sometimes better to have separate source/target BPE vocabularies.
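To make the request concrete, here is a rough sketch of what training one model per side could look like with the `sentencepiece` Python package directly. This is not the pipeline's own API: the helper names (`spm_train_args`, `train_separate`), the output directory, and the default vocabulary size are all made up for illustration.

```python
from pathlib import Path


def spm_train_args(corpus: str, prefix: str, vocab_size: int = 8000) -> dict:
    """Build keyword arguments for one SentencePieceTrainer.train call (BPE).

    Hypothetical helper; vocab_size=8000 is an arbitrary illustrative default.
    """
    return {
        "input": corpus,          # path to a plain-text file, one sentence per line
        "model_prefix": prefix,   # writes <prefix>.model and <prefix>.vocab
        "vocab_size": vocab_size,
        "model_type": "bpe",
    }


def train_separate(src_corpus: str, tgt_corpus: str, out_dir: str = "spm") -> None:
    """Train one BPE model per side instead of a single joint model."""
    import sentencepiece as spm  # third-party package, imported only when training

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for side, corpus in (("src", src_corpus), ("tgt", tgt_corpus)):
        args = spm_train_args(corpus, str(Path(out_dir) / side))
        spm.SentencePieceTrainer.train(**args)
```

Calling `train_separate("train.src", "train.tgt")` (hypothetical file names) would leave `spm/src.model` and `spm/tgt.model`, which the source and target sides could then load independently.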
radinplaid changed the title from "Support training seperate source/target SentencePiece Models" to "Support training separate source/target SentencePiece Models" on Jul 15, 2022
I would really like to see this too. I work on language pairs with no overlap between the source and target character sets, so a separate tokenization model for each side makes sense.
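For anyone unsure whether their own language pair is in the same situation, a quick check along these lines might help. It is only a sketch: `char_overlap` is a hypothetical helper, and it measures the Jaccard overlap of non-whitespace characters between the two sides, which is just one rough signal.

```python
def char_overlap(src_lines, tgt_lines) -> float:
    """Jaccard overlap between the source and target character sets.

    Whitespace is ignored. Returns 0.0 when the sides share no characters
    and 1.0 when both sides use exactly the same characters.
    """
    src_chars = {c for line in src_lines for c in line if not c.isspace()}
    tgt_chars = {c for line in tgt_lines for c in line if not c.isspace()}
    union = src_chars | tgt_chars
    if not union:
        return 0.0  # both sides empty: treat as no overlap
    return len(src_chars & tgt_chars) / len(union)
```

A value near zero suggests a joint vocabulary would mostly be two disjoint halves anyway. On real corpora, shared digits and punctuation will usually keep the score slightly above zero even for unrelated scripts.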