Support training separate source/target SentencePiece Models #97

Open
radinplaid opened this issue Jul 15, 2022 · 2 comments
Labels
enhancement New feature or request

Comments

@radinplaid

It appears that the pipeline only supports training a joint BPE model, but it is sometimes better to have separate source/target BPE vocabularies.
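
For illustration, here is a minimal sketch of what training separate models could look like with the sentencepiece Python API, as opposed to one joint model trained on the concatenated corpus. The file names, vocabulary size, and model prefixes are placeholders, not the pipeline's actual configuration:

```python
import sentencepiece as spm

# Hypothetical monolingual corpora, one per language side.
for side, corpus in [("src", "corpus.src.txt"), ("trg", "corpus.trg.txt")]:
    spm.SentencePieceTrainer.train(
        input=corpus,                  # train on this side's text only
        model_prefix=f"vocab.{side}",  # writes vocab.src.model / vocab.trg.model
        vocab_size=32000,
        model_type="bpe",
    )

# Each side then loads its own model for tokenization.
sp_src = spm.SentencePieceProcessor(model_file="vocab.src.model")
sp_trg = spm.SentencePieceProcessor(model_file="vocab.trg.model")
```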

radinplaid changed the title from "Support training seperate source/target SentencePiece Models" to "Support training separate source/target SentencePiece Models" on Jul 15, 2022
eu9ene added the enhancement (New feature or request) label on Jul 15, 2022
@AmitMY
Contributor

AmitMY commented Jul 27, 2022

I would really like to see this too. I work on language pairs with no overlap between the source and target character sets, so a separate tokenization model for each side makes sense.

@gregtatum
Member

#913 is also about this, I think.
