[LoRA] Change lora_tokenizers capacity #10796
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Signed-off-by: Xin Yang <[email protected]>
Thanks for your contribution, I will look at this PR tomorrow or the day after tomorrow.
Thank you very much for your contribution. Could we pass max_loras as a keyword argument? This way the overall changes would be minimal.
@jeejeelee Thanks for reviewing. There's already **tokenizer_config (https://github.com/vllm-project/vllm/blob/main/vllm/transformers_utils/tokenizer_group/tokenizer_group.py#L18), do you mean adding max_loras to **tokenizer_config?
Yeah
OK |
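For illustration, here is a minimal sketch of the keyword-argument route discussed above. It is not the actual vLLM TokenizerGroup; ToyTokenizerGroup and its attributes are made-up placeholders. The point is that max_loras can ride along in the existing **tokenizer_config catch-all and be popped out before the remaining kwargs are forwarded to the Hugging Face tokenizer, so the constructor signature does not change.

```python
# Illustrative sketch only -- not the actual vLLM TokenizerGroup. It shows
# how max_loras can be passed through the existing **tokenizer_config
# kwargs without changing the constructor signature.
from typing import Optional


class ToyTokenizerGroup:
    def __init__(self, tokenizer_id: str, enable_lora: bool,
                 max_num_seqs: int, max_input_length: Optional[int],
                 **tokenizer_config):
        # Pop max_loras so it is not forwarded as a Hugging Face
        # tokenizer kwarg; None means the caller did not set a limit.
        self.max_loras: Optional[int] = tokenizer_config.pop("max_loras", None)
        self.max_num_seqs = max_num_seqs
        self.tokenizer_config = tokenizer_config  # remaining HF kwargs


# The caller adds one keyword; everything else stays the same.
group = ToyTokenizerGroup("some-base-model", enable_lora=True,
                          max_num_seqs=32, max_input_length=None,
                          max_loras=50)
print(group.max_loras)         # 50
print(group.tokenizer_config)  # {}
```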
Force-pushed from 1053c54 to 2f362f9
Signed-off-by: Xin Yang <[email protected]>
@jeejeelee I have updated the code, please review.
Force-pushed from 5defaac to d9994b0
Signed-off-by: Xin Yang <[email protected]>
Overall LGTM, thanks for your contribution and patience.
Currently, the lora_tokenizers capacity is max_num_seqs. This is fine when max_loras is smaller than max_num_seqs.
But if max_loras is greater than max_num_seqs, lora_tokenizers should have capacity=max_loras, just as the active_adapters LoRALRUCache has capacity=max_loras. Otherwise, items in active_adapters are not evicted while items in lora_tokenizers are, so tokenizers get evicted unnecessarily and performance degrades.
In our benchmark, we tested max_loras=50 and max_num_seqs=32, with the 50 adapters invoked in an evenly distributed pattern. We found this caused a 50% degradation in P90 latency compared to the case where tokenizers are not evicted.
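As a toy reproduction of the mismatch described above (hypothetical code, not taken from vLLM): two small LRU caches are fed the same evenly distributed stream of 50 LoRA IDs, one sized like the adapter cache (max_loras) and one sized like the current tokenizer cache (max_num_seqs). The larger cache never evicts, while the smaller one evicts on essentially every request once it is warm.

```python
# Toy demonstration of the eviction mismatch (not vLLM code).
from collections import OrderedDict


class TinyLRU:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()
        self.evictions = 0

    def get_or_put(self, key):
        if key in self.data:
            self.data.move_to_end(key)      # mark as most recently used
            return
        if len(self.data) >= self.capacity:
            self.data.popitem(last=False)   # evict least recently used
            self.evictions += 1
        self.data[key] = object()           # stand-in for a tokenizer/adapter


max_loras, max_num_seqs = 50, 32
adapters = TinyLRU(capacity=max_loras)       # like active_adapters
tokenizers = TinyLRU(capacity=max_num_seqs)  # like lora_tokenizers today

# 50 adapters invoked round-robin, as in the benchmark above.
for step in range(10 * max_loras):
    lora_id = step % max_loras
    adapters.get_or_put(lora_id)
    tokenizers.get_or_put(lora_id)

print(adapters.evictions)    # 0   -> adapters stay cached
print(tokenizers.evictions)  # >0  -> tokenizers churn on nearly every request
```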
So this PR changes the lora_tokenizers capacity to max(max_loras, max_num_seqs).
Another option is to add a separate max_lora_tokenizers setting, just as max_loras and max_cpu_loras control the GPU cache and CPU cache respectively, but this might be overkill.
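For clarity, here is the sizing rule proposed above written as a standalone sketch. The helper function is hypothetical (the actual change computes the capacity where the lora_tokenizers cache is created), but the formula matches the description: max(max_loras, max_num_seqs), falling back to max_num_seqs when no LoRA limit is set.

```python
from typing import Optional


def lora_tokenizer_capacity(max_num_seqs: int,
                            max_loras: Optional[int]) -> int:
    """Hypothetical helper illustrating the proposed sizing rule."""
    if not max_loras:                    # LoRA disabled or limit not set
        return max_num_seqs
    return max(max_loras, max_num_seqs)  # never smaller than either limit


print(lora_tokenizer_capacity(max_num_seqs=32, max_loras=50))    # 50
print(lora_tokenizer_capacity(max_num_seqs=256, max_loras=8))    # 256
print(lora_tokenizer_capacity(max_num_seqs=32, max_loras=None))  # 32
```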