-
Recently, when I tried to use LoRAs with a GPTQ model, vLLM raised an error saying it does not support this feature yet. The "yet" in that message gives me hope that the feature is already on the roadmap. Is it possible to get an estimate of when it will be implemented?
-
Is there any progress on this issue?
-
This is supported now. See the example and test in the vLLM repo: https://github.com/vllm-project/vllm/blob/main/examples/lora_with_quantization_inference.py and https://github.com/vllm-project/vllm/blob/main/tests/lora/test_quant_model.py
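For quick reference, here is a minimal sketch of the pattern the linked example demonstrates: load a GPTQ-quantized base model with LoRA enabled, then attach an adapter per request via `LoRARequest`. The model name and adapter path below are placeholders, not values from this thread.

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Load a GPTQ-quantized base model with LoRA support enabled.
# The model name and adapter path are placeholders; substitute your own.
llm = LLM(
    model="TheBloke/Llama-2-7B-GPTQ",
    quantization="gptq",
    enable_lora=True,
    max_lora_rank=16,  # must be >= the rank of the adapters you load
)

sampling_params = SamplingParams(temperature=0.0, max_tokens=64)

# Attach the adapter per request: (adapter name, unique integer id, local path).
lora_request = LoRARequest("my-lora", 1, "/path/to/lora_adapter")

outputs = llm.generate(
    ["Give me a short introduction to vLLM."],
    sampling_params,
    lora_request=lora_request,
)
for output in outputs:
    print(output.outputs[0].text)
```

Because the adapter is passed per request rather than baked into the engine, you can serve different LoRAs on the same quantized base model within one process.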