Read EOS token from model runtime information for speculative_decoding_lm #353
Conversation
Made changes to accommodate the dynamic EOS token
Read the rt_info from the tokenizer instead of the LLM
Made changes according to the review; this is the latest commit
Fixed the error related to the tokenizer read_model call
Added comments and organised the code better
Changed the beam search header to take the EOS token as a parameter (sketched below) and removed the comment saying the EOS token was not implemented
Added the EOS token to the parameters
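A minimal sketch of what that header change could look like, assuming the sample's group beam searcher keeps its configuration in a `Parameters` struct; apart from `eos_token`, the field names are illustrative guesses rather than the PR's exact code:

```cpp
#include <cstdint>
#include <vector>

// Beam-search configuration: the EOS token id is now a regular field supplied
// by the caller (read from the model's runtime information) instead of a
// hard-coded constant.
struct Parameters {
    std::vector<int64_t> prompt;   // tokenized input
    int64_t eos_token;             // EOS token id taken from rt_info
    size_t n_groups = 3;
    size_t group_size = 5;
    float diversity_penalty = 1.0f;
};
```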
@ilya-lavrenov This is the PR corresponding to the change we discussed.
@ilya-lavrenov The README does not contain proper instructions for running speculative_decoding_lm; in the example, it passes the "Llama-2-7b-chat-hf" model as the main model directory.
Thanks for noticing that. I will update the README in a follow-up PR.
Extension of issue #277: added the functionality to read the EOS token from the model's runtime information in speculative_decoding_lm, as sketched below.
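A minimal sketch of reading the EOS token from the tokenizer model's runtime information with the OpenVINO C++ API; the `openvino_tokenizer.xml` path and the `eos_token_id` key follow the openvino_tokenizers convention, but treat them as assumptions here:

```cpp
#include <openvino/openvino.hpp>

#include <cstdint>
#include <iostream>
#include <stdexcept>

int main() {
    ov::Core core;
    // Reading the converted tokenizer model may additionally require the
    // openvino_tokenizers extension to be registered via core.add_extension().
    std::shared_ptr<ov::Model> tokenizer_model = core.read_model("openvino_tokenizer.xml");

    // Runtime information is exposed on the model as an ov::AnyMap.
    ov::AnyMap rt_info = tokenizer_model->get_rt_info();

    int64_t eos_token_id;
    if (rt_info.count("eos_token_id") > 0) {
        eos_token_id = rt_info.at("eos_token_id").as<int64_t>();
    } else {
        throw std::runtime_error("EOS token ID was not found in the model's runtime information");
    }
    std::cout << "EOS token id: " << eos_token_id << '\n';
    return 0;
}
```

Reading the id from rt_info instead of hard-coding it lets the same sample work across models with different special tokens.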