[lora] Add load option to LoRA adapter API #2536
Conversation
Force-pushed 52d9886 to 7a3e234
if adapter_load:
    _service.add_lora(adapter_name, adapter_alias, adapter_path)
else:
    _service.remove_lora(adapter_name, adapter_alias)
if adapter_load is false, why are we removing the adapter? Should this just be a noop?
This is to have an unload option.
I see, so we support unloading only for unpinned adapters
Yes
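The "unload only unpinned adapters" rule agreed on above can be sketched as a small guard in the unload path. This is an illustrative sketch only; the class, method names, and pinned-set bookkeeping are assumptions, not this PR's actual code.

```python
class AdapterRegistry:
    """Toy registry illustrating the unpinned-only unload rule (hypothetical)."""

    def __init__(self):
        self.loaded = set()
        self.pinned = set()

    def remove_lora(self, name):
        # Pinned adapters are never evicted; unload is refused for them.
        if name in self.pinned:
            return False
        self.loaded.discard(name)
        return True
```

Under this rule, an unload request for a pinned adapter simply reports failure instead of silently removing weights that inference may still depend on.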
return self.engine.add_lora(lora_request) and self.engine.pin_lora(
    lora_request.lora_int_id)
do we need to check the result of add_lora before pinning?
Also, I would prefer if we kept these calls separate. It's more readable.
Same questions for vlm_rolling_batch.py
- This is to make sure it's successfully loaded before pinning.
- Made these calls separate.
Description
Add an additional `load` option to the register adapter API and the update adapter API. The reason for this change is to stay consistent with other model servers such as vLLM and LoRAX: vLLM's load_lora_adapter API does not load adapter weights; the weights are loaded only when inference runs with that particular adapter.
Discussed with the Hosting team; the default is `true`.
Example: