Add keep_tokens_separator as alternative for keep_tokens #975
Hi, great job as always.
I'd like to propose this feature, which is inspired by NovelAI's tagging. They train their model by placing some important tags at the head of the caption and shuffling the rest.
Got this from their docs:
And this is also confirmed by finetunej.
We also know that some Danbooru images have more than one tag in `tag_character_string` and `tag_copyright_string`, and that some have both `1boy` and `1girl` in one picture. Because `keep_tokens` keeps a fixed number of leading tokens for every caption, using `keep_tokens` alone is not effective for mimicking NovelAI tagging.

The `keep_tokens_separator` option is proposed so that each caption can mark its own set of leading tokens to keep from being shuffled. For example:
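To illustrate the idea, here is a minimal sketch of how a per-caption separator could split a caption into a fixed head and a shuffled tail. It is not the code from this PR: the function name `shuffle_with_separator`, the `|||` separator value, the `caption_separator` and `rng` parameters, and the sample tags are all assumptions made for illustration.

```python
import random


def shuffle_with_separator(caption, keep_tokens_separator="|||",
                           caption_separator=",", rng=None):
    """Keep tags before the separator in place and shuffle the tags after it.

    Hypothetical helper for illustration only; the names and the "|||"
    default are assumptions, not taken from the PR.
    """
    rng = rng or random  # any object with a shuffle() method works

    if keep_tokens_separator and keep_tokens_separator in caption:
        # Split once: everything before the separator is the fixed head.
        fixed_part, flex_part = caption.split(keep_tokens_separator, 1)
    else:
        # No separator in this caption: nothing is pinned, shuffle all tags.
        fixed_part, flex_part = "", caption

    fixed = [t.strip() for t in fixed_part.split(caption_separator) if t.strip()]
    flex = [t.strip() for t in flex_part.split(caption_separator) if t.strip()]

    rng.shuffle(flex)  # only the tail is shuffled, in place
    return ", ".join(fixed + flex)


if __name__ == "__main__":
    # Two captions with a different number of pinned tags.
    print(shuffle_with_separator("1girl, hatsune miku, vocaloid ||| solo, long hair, smile"))
    print(shuffle_with_separator("1boy, 1girl, char a, char b, series x ||| outdoors, day, sky"))
```

With this approach the number of pinned tags can differ from caption to caption, which a single fixed `keep_tokens` count cannot express.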
I haven't tested this for fine-tuning, but I have trained some LoRAs with this separator:
link to model | link to datasets (5.65gb)
Thank you!