Replies: 1 comment
I am also having the same problem: the previously generated tokens are returned again every time a new token is generated. Can someone explain how to stream the response from vLLM?
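If the goal is token-by-token streaming without the repeated prefix, one option is vLLM's OpenAI-compatible server, whose streamed chunks carry only the newly generated text. Below is a minimal sketch, assuming a server is already running locally on the default port; the model name, prompt, and URL are placeholders, not values from this thread:

```python
# Sketch: streaming from a locally running vLLM OpenAI-compatible server.
# Assumes the server was started separately (e.g. `vllm serve <model>` or the
# api_server entrypoint, depending on your vLLM version) on port 8000.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

stream = client.completions.create(
    model="your-model-name",   # placeholder: whichever model the server is serving
    prompt="Write a haiku about GPUs.",
    max_tokens=64,
    stream=True,               # stream incremental chunks
)

for chunk in stream:
    # Each streamed chunk contains only the new text, not the full output so far.
    print(chunk.choices[0].text, end="", flush=True)
print()
```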
First, I am sorry for posting so many questions; I am new to vLLM.
I'm using vLLM for streaming inference, but I have found that at each step it returns the full output generated so far, including all previous tokens. As a temporary workaround, I store the last output in a variable and, when the current step finishes, cut the previous text off the new result.
I want to know whether there is a way (such as a configuration option) to receive only the newly generated tokens.
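For reference, here is a minimal sketch of the delta-slicing workaround described above, assuming streaming through AsyncLLMEngine; the exact class names and generate() signature may differ slightly across vLLM versions, and the model name is a placeholder:

```python
# Sketch: keep only the newly generated text at each decoding step by slicing
# off the previously seen prefix (the streamed output text is cumulative).
import asyncio

from vllm import SamplingParams
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine


async def stream_deltas(prompt: str) -> None:
    engine = AsyncLLMEngine.from_engine_args(
        AsyncEngineArgs(model="your-model-name")  # placeholder model name
    )
    params = SamplingParams(temperature=0.8, max_tokens=64)

    previous_text = ""
    # generate() yields a RequestOutput after every step; outputs[0].text holds
    # the full text generated so far, so the delta is everything past what we
    # have already printed.
    async for output in engine.generate(prompt, params, request_id="demo-0"):
        full_text = output.outputs[0].text
        print(full_text[len(previous_text):], end="", flush=True)
        previous_text = full_text
    print()


asyncio.run(stream_deltas("Write a haiku about GPUs."))
```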