
Commit 48abee9
[Frontend] remove max_num_batched_tokens limit for lora (vllm-project…
NiuBlibing authored Aug 8, 2024
1 parent 7467096 commit 48abee9
Showing 1 changed file with 0 additions and 5 deletions.
5 changes: 0 additions & 5 deletions vllm/config.py
@@ -1377,11 +1377,6 @@ def verify_with_model_config(self, model_config: ModelConfig):
                            model_config.quantization)
 
     def verify_with_scheduler_config(self, scheduler_config: SchedulerConfig):
-        if scheduler_config.max_num_batched_tokens > 65528:
-            raise ValueError(
-                "Due to limitations of the custom LoRA CUDA kernel, "
-                "max_num_batched_tokens must be <= 65528 when "
-                "LoRA is enabled.")
         if scheduler_config.chunked_prefill_enabled:
             raise ValueError("LoRA is not supported with chunked prefill yet.")

