Upgrade to support latest vLLM version (max_lora_rank) #2389
Comments
Hi @frankfliu - would you be able to help? Thanks.
We are planning a release that will use vLLM 0.6.0 (or 0.6.1.post2) soon. In the meantime, you can try providing a requirements.txt file with vllm==0.5.5 (or a later version) to work around this.
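For example, a minimal requirements.txt (placed alongside the model artifacts or baked into a custom image) could pin the newer vLLM; the version below is simply the one suggested above:

```
# requirements.txt -- extra pip packages installed on top of the container's defaults
vllm==0.5.5
```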
Thank you @siddvenk for your suggestions. I tried rebuilding the custom image, and we specified the following in our configuration:
We tried setting max_token to a very high value, but the responses are still very short.
Do you have any insights?
Yes, you should use the generation parameters from our inference API. You can find the schema for our inference API here: https://github.com/deepjavalibrary/djl-serving/blob/master/serving/docs/lmi/user_guides/lmi_input_output_schema.md. We also support the OpenAI chat completions schema; details here: https://github.com/deepjavalibrary/djl-serving/blob/master/serving/docs/lmi/user_guides/chat_input_output_schema.md.
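As a rough illustration of that schema, a request that bounds the generated length might look like the sketch below; it assumes the length parameter in the linked schema is max_new_tokens, and the values are examples only:

```json
{
  "inputs": "What is Deep Java Library?",
  "parameters": {
    "max_new_tokens": 512,
    "temperature": 0.7
  }
}
```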
Thanks again for your quick response, @siddvenk. Just to make sure, should we:
By the way, I forgot to mention that we are deploying this to SageMaker.
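For context, deploying an LMI image (stock or custom) to a SageMaker endpoint usually goes through the SageMaker Python SDK; the sketch below uses placeholder values for the image URI, model data location, role, endpoint name, and instance type:

```python
import sagemaker
from sagemaker.model import Model

# All of these are placeholders -- substitute your own values.
image_uri = "<account>.dkr.ecr.<region>.amazonaws.com/my-custom-lmi:latest"
model_data = "s3://my-bucket/my-model/model.tar.gz"
role = "arn:aws:iam::<account>:role/MySageMakerExecutionRole"

model = Model(
    image_uri=image_uri,
    model_data=model_data,
    role=role,
    sagemaker_session=sagemaker.Session(),
)

# Create a real-time endpoint; the instance type here is only an example.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    endpoint_name="my-lmi-endpoint",
)
```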
There are two different configurations. On a per-request basis, you can specify the generation-length parameter in the request payload. You can also limit the maximum length of sequences globally by setting the corresponding option in your serving configuration.
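To make the global side of that concrete, a serving.properties for the vLLM backend can cap the maximum sequence length; the option names below follow the LMI vLLM user guide, but the model and values are examples, not a recommendation:

```properties
# serving.properties -- example values only
engine=Python
option.rolling_batch=vllm
option.model_id=meta-llama/Llama-2-7b-hf
option.tensor_parallel_degree=1
# Global cap on total sequence length (prompt + generated tokens)
option.max_model_len=4096
```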
Thanks, @siddvenk. We did more tests, and it turns out the short-response issue was specific to the custom image I built (mentioned above). So we suspect we missed some key steps when building the image; can you help us review our process? Steps:
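For comparison, a minimal sketch of one common approach is to extend an LMI base image and pin the newer vLLM; the image URI, tag, and version below are placeholders, not the exact steps referenced above:

```dockerfile
# Placeholders only -- use the LMI base image and vLLM version appropriate for your setup
FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/djl-inference:<lmi-tag>
RUN pip install --no-cache-dir vllm==0.5.5
```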
Description
In the current version (using the LMI SageMaker image), we are running into the following error:
It looks like the above error was fixed in vLLM v0.5.5.
See release notes here: https://github.com/vllm-project/vllm/releases/tag/v0.5.5
See PR here: vllm-project/vllm#7146
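For context, max_lora_rank (mentioned in the title) is a vLLM LoRA engine argument; a minimal sketch of how it is set when using vLLM directly, with an example model and rank:

```python
from vllm import LLM

# max_lora_rank must cover the rank of any LoRA adapter that will be loaded.
# The error described above was fixed in vLLM v0.5.5 (see the release notes and PR linked).
llm = LLM(
    model="meta-llama/Llama-2-7b-hf",  # example base model
    enable_lora=True,
    max_lora_rank=64,                  # example rank
)
```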
References
N/A