How do you implement continuous batching of incoming requests? #492
Where can I find the code for it? I'm really interested in it.
Answered by zhuohan123 on Jul 18, 2023
Replies: 3 comments
It looks like batching isn't supported. For example, the attention hidden state here has no batch dimension.
Please check the logic starting at the step function: vllm/vllm/engine/llm_engine.py, line 228 in b4b195b. Moving this issue to Discussions for future questions.
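For anyone who wants the general idea before reading the engine code, here is a minimal sketch of iteration-level (continuous) batching. It is a toy under stated assumptions, not vLLM's implementation: `Request`, `ToyEngine`, `add_request`, `has_unfinished`, and `max_batch_size` are hypothetical names made up for illustration, and the "model" just appends a placeholder token. The only point it tries to show is that requests join and leave the running batch between decoding iterations instead of waiting for the whole batch to finish.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Request:
    """A toy generation request (hypothetical, not a vLLM class)."""
    request_id: str
    max_new_tokens: int
    generated: list = field(default_factory=list)

    @property
    def finished(self) -> bool:
        return len(self.generated) >= self.max_new_tokens


class ToyEngine:
    """Toy scheduler: admits waiting requests at every iteration."""

    def __init__(self, max_batch_size: int):
        self.max_batch_size = max_batch_size
        self.waiting = deque()   # requests not yet scheduled
        self.running = []        # requests in the current batch
        self.step_count = 0

    def add_request(self, request: Request) -> None:
        self.waiting.append(request)

    def has_unfinished(self) -> bool:
        return bool(self.waiting or self.running)

    def step(self) -> list:
        # 1. Fill free batch slots from the waiting queue.
        while self.waiting and len(self.running) < self.max_batch_size:
            self.running.append(self.waiting.popleft())

        # 2. One decoding iteration: every running request gets one token.
        #    A real engine would run the model on the whole batch here.
        for req in self.running:
            req.generated.append(f"tok{len(req.generated)}")
        self.step_count += 1

        # 3. Retire finished requests so their slots free up immediately.
        finished = [r for r in self.running if r.finished]
        self.running = [r for r in self.running if not r.finished]
        return finished


def run_demo() -> dict:
    engine = ToyEngine(max_batch_size=2)
    engine.add_request(Request("a", max_new_tokens=1))
    engine.add_request(Request("b", max_new_tokens=3))
    engine.add_request(Request("c", max_new_tokens=2))

    finish_step = {}
    while engine.has_unfinished():
        for req in engine.step():
            finish_step[req.request_id] = engine.step_count
    return finish_step


if __name__ == "__main__":
    print(run_demo())  # {'a': 1, 'b': 3, 'c': 3}
```

In this toy run, request "c" takes over the slot freed by "a" at the second iteration, so all three requests finish in 3 iterations. Batching "a" and "b" together and only then starting "c" would take 3 + 2 = 5 iterations.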
Answer selected by zhuohan123