Why must max_num_batched_tokens be <= 65528 when LoRA is enabled? Can the limit be raised? #6247
-
When I used LoRA, I hit the message `Due to limitations of the custom LoRA CUDA kernel, max_num_batched_tokens must be <= 65528 when LoRA is enabled.` I would like to know the specific reason for this restriction. Can the limit be raised? I have a use case that requires it. Do you have any suggestions for me? Thank you @Yard1 @simon-mo
Answered by jeejeelee, Aug 26, 2024
-
Version 0.5.5 has already removed this restriction; see #7288.
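For reference, on vLLM >= 0.5.5 a `max_num_batched_tokens` value above 65528 should be accepted together with LoRA. Below is a minimal sketch; the base model name and adapter path are placeholders, not values from this thread:

```python
# Minimal sketch (assumes vLLM >= 0.5.5; model name and adapter path are placeholders).
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# On vLLM >= 0.5.5 the 65528-token cap no longer applies, so a larger
# max_num_batched_tokens can be combined with enable_lora=True.
llm = LLM(
    model="meta-llama/Llama-2-7b-hf",   # placeholder base model
    enable_lora=True,
    max_num_batched_tokens=131072,      # previously capped at 65528
    max_loras=1,
)

outputs = llm.generate(
    ["Hello, my name is"],
    SamplingParams(max_tokens=32),
    lora_request=LoRARequest("my_adapter", 1, "/path/to/lora_adapter"),  # placeholder path
)
print(outputs[0].outputs[0].text)
```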
Answer selected by Yard1