
[lora] Add load option to LoRA adapter API #2536

Merged
xyang16 merged 2 commits into deepjavalibrary:master from lora on Nov 12, 2024

Conversation

xyang16 (Contributor) commented Nov 8, 2024

Description

Add a load option to the register adapter API and the update adapter API.

The reason for this change is to stay consistent with other model servers such as vLLM and LoRAX. vLLM's load_lora_adapter API does not load the adapter weights; the weights are only loaded when inference is run with that particular adapter.

After discussing with the Hosting team, the default is set to true.

Example:

curl -X POST "http://localhost:8080/models/model/adapters?name=eng_alpaca&load=false&src=/opt/ml/model/adapters/eng_alpaca"
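
Since load defaults to true, omitting the flag registers the adapter and loads its weights eagerly (a hypothetical call against the same endpoint):

curl -X POST "http://localhost:8080/models/model/adapters?name=eng_alpaca&src=/opt/ml/model/adapters/eng_alpaca"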

xyang16 requested review from zachgk and a team as code owners on November 8, 2024 21:09
xyang16 force-pushed the lora branch 12 times, most recently from 52d9886 to 7a3e234 on November 8, 2024 23:22
if adapter_load:
    # load=true (the default): load the adapter weights
    _service.add_lora(adapter_name, adapter_alias, adapter_path)
else:
    # load=false: unload the adapter weights
    _service.remove_lora(adapter_name, adapter_alias)
Contributor

if adapter_load is false, why are we removing the adapter? Should this just be a noop?

xyang16 (Author)

This is to have an unload option.

Contributor

I see, so we support unloading only for unpinned adapters

xyang16 (Author)

Yes

Comment on lines 286 to 287
return self.engine.add_lora(lora_request) and self.engine.pin_lora(
    lora_request.lora_int_id)
siddvenk (Contributor) commented Nov 11, 2024

do we need to check the result of add_lora before pinning?

Also, I would prefer if we kept these calls separate. It's more readable.

Same questions for vlm_rolling_batch.py

xyang16 (Author)

  1. This is to make sure it's successfully loaded before pinning.
  2. Made these calls separate (see the sketch below).
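
A minimal sketch of the separated calls (assuming self.engine is a vLLM engine exposing add_lora/pin_lora and lora_request is a vLLM LoRARequest; not the exact diff in this PR):

# Only pin the adapter if it was successfully loaded.
loaded = self.engine.add_lora(lora_request)
if not loaded:
    return False
# Pinning keeps the adapter resident in the engine.
return self.engine.pin_lora(lora_request.lora_int_id)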

xyang16 merged commit 8bddf35 into deepjavalibrary:master on Nov 12, 2024
9 checks passed