Beyond tokenization, SentencePiece has other features that may replace the pre-post-process.sh scripts. By default it applies NFKC normalization, and this can be customized. The default normalization already covers some of what preprocess.sh does, like:
If the user needs to add more normalization or change it, they can borrow a rule file from https://github.com/google/sentencepiece/tree/master/data, modify it, and provide it in the `spm_train` step, so no separate preprocessing is needed.
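
To make this more concrete, here is a rough sketch using only Python's standard library. It shows what NFKC normalization does to a few characters, and how a line of a custom rule file could be built. The rule-file layout (space-separated hexadecimal code points for source and target, separated by a tab) is my reading of the SentencePiece normalization docs, so please double-check it. The helper names `nfkc` and `rule_line`, the file name `custom_rules.tsv`, and the example mapping are hypothetical, chosen just for illustration.

```python
import unicodedata


def nfkc(text: str) -> str:
    """Apply Unicode NFKC normalization (compatibility decomposition, then composition)."""
    return unicodedata.normalize("NFKC", text)


def rule_line(source: str, target: str) -> str:
    """Build one rule-file line: hex code points of source, a tab, hex code points of target.

    Assumed layout, based on the SentencePiece normalization docs.
    """
    src = " ".join(f"{ord(ch):X}" for ch in source)
    tgt = " ".join(f"{ord(ch):X}" for ch in target)
    return f"{src}\t{tgt}"


if __name__ == "__main__":
    # Full-width letters, an ideographic space and full-width digits
    # are folded to their ASCII counterparts by NFKC.
    print(nfkc("\uff21\uff22\uff23\u3000\uff11\uff12\uff13"))

    # Hypothetical extra rule: map a left curly quote to a plain double quote.
    extra_rules = [rule_line("\u201c", '"')]

    # In practice these lines would be appended to a copy of one of the
    # TSV files borrowed from the sentencepiece data directory.
    with open("custom_rules.tsv", "w", encoding="utf-8") as fh:
        fh.write("\n".join(extra_rules) + "\n")

    # The file would then be passed to training, for example:
    #   spm_train --input=<corpus> --model_prefix=<prefix> \
    #             --vocab_size=<size> --normalization_rule_tsv=custom_rules.tsv
```

As far as I understand, a file passed via `--normalization_rule_tsv` is used instead of the built-in rules rather than on top of them, which is why starting from a copy of an existing rule file and editing it makes sense.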