Implement parallel Inference and Generation #113
Note: the commits `fix overlap should be checked in reverse` and `feat: add stopword checker + iterable generate function` come from the "implement stopwords" PR.
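For context on the stopword logic, here is a minimal sketch of a streaming stop-word checker. It is hypothetical: `check_stop` is not a function from this PR. It also assumes that "overlap should be checked in reverse" means testing candidate overlaps from longest to shortest, so the longest tail of the text that could still grow into a stop word is held back instead of being emitted.

```python
def check_stop(text: str, stop_words: list[str]) -> tuple[str, bool, str]:
    """Hypothetical helper: split text into (safe_to_emit, stopped, held_back).

    - If any stop word occurs in `text`, return everything before the earliest
      occurrence and stopped=True.
    - Otherwise hold back the longest suffix of `text` that is a proper prefix
      of some stop word, because the next tokens might complete it.
    """
    # A full match anywhere ends generation; the earliest match wins.
    cut = -1
    for word in stop_words:
        idx = text.find(word)
        if idx != -1 and (cut == -1 or idx < cut):
            cut = idx
    if cut != -1:
        return text[:cut], True, ""

    # Assumed meaning of "check in reverse": try the longest overlap first
    # for each stop word, and keep the longest overlap across all of them.
    longest = 0
    for word in stop_words:
        for k in range(min(len(word) - 1, len(text)), 0, -1):
            if text.endswith(word[:k]):
                longest = max(longest, k)
                break
    if longest:
        return text[:-longest], False, text[-longest:]
    return text, False, ""


# Streaming usage: feed decoded pieces in, emit only what is safe.
buffer, emitted = "", []
for piece in ["Hello", "\n\nUs", "er: hi"]:
    buffer += piece
    safe, stopped, buffer = check_stop(buffer, ["\n\nUser:"])
    emitted.append(safe)
    if stopped:
        break
print("".join(emitted))  # Hello
```

Holding back the partial match is what keeps a stop word that is split across tokens (here `"\n\nUs"` + `"er:"`) from leaking into the output.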
I've only tested with the strategy `cpu fp32 *1`, but I think it should work for all strategies.

The parallel sampling method builds heavily on logic implemented in the stopwords PR, and the commits don't define clean boundaries between the two. Let me know if you want me to fix that. If you merge this one, the other can be deleted.
Otherwise, it seems to run smoothly in everything I've tried. Let me know what you think, and whether I've introduced any obvious math issues!
I've written a bit more about this in #dev-chatrwkv on Discord.
Some examples that I should document:
- Initialise
- Generate parallel
- Generate single
- Infer single
- Infer parallel
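The difference between single and parallel generation can be sketched as a batched decoding loop in which every row stops on its own. This is a minimal sketch, not this PR's API: `generate_parallel`, `toy_next_logits`, greedy decoding, and a single stop token id are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence


def argmax(values: List[float]) -> int:
    # Index of the largest logit (first one wins on ties).
    return max(range(len(values)), key=values.__getitem__)


def generate_parallel(
    prompts: Sequence[List[int]],
    next_logits: Callable[[List[List[int]]], List[List[float]]],
    max_new_tokens: int,
    stop_token: int,
) -> List[List[int]]:
    """Hypothetical batched greedy decoder; each row finishes independently."""
    seqs = [list(p) for p in prompts]
    outputs: List[List[int]] = [[] for _ in prompts]
    active = [True] * len(prompts)
    for _ in range(max_new_tokens):
        rows = [i for i, a in enumerate(active) if a]
        if not rows:
            break  # every row has hit its stop token
        # One batched call covering only the rows that are still generating.
        logits = next_logits([seqs[i] for i in rows])
        for b, i in enumerate(rows):
            tok = argmax(logits[b])
            if tok == stop_token:
                active[i] = False  # this row is done; others continue
                continue
            seqs[i].append(tok)
            outputs[i].append(tok)
    return outputs


def toy_next_logits(batch: List[List[int]]) -> List[List[float]]:
    # Stand-in model: the next token is (last token + 1) mod 5, one-hot.
    vocab = 5
    return [[1.0 if t == (seq[-1] + 1) % vocab else 0.0 for t in range(vocab)]
            for seq in batch]


print(generate_parallel([[3], [1]], toy_next_logits, max_new_tokens=10, stop_token=0))
# [[4], [2, 3, 4]]
```

The toy model re-reads each row's full history for simplicity; an RNN-style model would more likely keep one recurrent state per row and drop a row's state once it finishes.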