Support training separate source/target SentencePiece Models #97

Open
radinplaid opened this issue Jul 15, 2022 · 2 comments
Labels
enhancement New feature or request

Comments

@radinplaid

It appears that the pipeline only supports training a joint BPE model, but it is sometimes better to have separate source/target BPE vocabularies.
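
For illustration, here is a minimal sketch of what training separate models could look like with the sentencepiece Python API, as opposed to one joint model trained on the concatenated corpus. The file names, vocabulary size, and model prefixes are placeholders, not the pipeline's actual configuration:

```python
import sentencepiece as spm

# Hypothetical monolingual corpora, one per language side.
for side, corpus in [("src", "corpus.src.txt"), ("trg", "corpus.trg.txt")]:
    spm.SentencePieceTrainer.train(
        input=corpus,                  # train on this side's text only
        model_prefix=f"vocab.{side}",  # writes vocab.src.model / vocab.trg.model
        vocab_size=32000,
        model_type="bpe",
    )

# Each side then loads its own model for tokenization.
sp_src = spm.SentencePieceProcessor(model_file="vocab.src.model")
sp_trg = spm.SentencePieceProcessor(model_file="vocab.trg.model")
```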

radinplaid changed the title from "Support training seperate source/target SentencePiece Models" to "Support training separate source/target SentencePiece Models" on Jul 15, 2022
eu9ene added the enhancement (New feature or request) label on Jul 15, 2022
@AmitMY
Contributor

AmitMY commented Jul 27, 2022

I would really like to see this too. I work on language pairs with no overlap between the source and target character sets, so a separate tokenization model for each side makes sense.

@gregtatum
Member

#913 is also about this, I think.
