Issues: vllm-project/vllm
#10674 · [RFC]: Make any model an embedding model · RFC · opened Nov 26, 2024 by DarkLight1337
#10673 · [Usage]: Llama-2-7b-chat-hf as embedding model · usage · opened Nov 26, 2024 by ra-MANUJ-an
#10671 · [Usage]: No generation when running vLLM with neuralmagic/Meta-Llama-3.1-8b-Instruct-quantized.w4a16 using langchain_openai · usage · opened Nov 26, 2024 by ehab-akram
#10670 · [Usage]: How to get the score of every output token? · usage · opened Nov 26, 2024 by TonyUSTC
#10668 · [Bug]: When using Ray as the inference backend for Qwen2-VL, there are issues with the inference results · bug · opened Nov 26, 2024 by my17th2
#10666 · [RFC]: Create VllmState to save immutable args in VllmConfig · RFC · opened Nov 26, 2024 by MengqingCao
#10664 · [Performance]: 10x performance gap between the lora-modules deployment and the merged-weights deployment · performance · opened Nov 26, 2024 by LIUKAI0815
#10663 · [Installation]: Request for a solution to enable Llama 3.1 405B-FP8 model compatibility with AMD Mi250 · installation · opened Nov 26, 2024 by Bihan
#10662 · [Usage]: Cannot use xformers with an old GPU · usage · opened Nov 26, 2024 by baimushan
#10661 · [Bug]: No available block found in 60 second. · bug · opened Nov 26, 2024 by Went-Liang
#10660 · [Feature]: Integrate with XGrammar for zero-overhead structured generation in LLM inference · feature request · opened Nov 26, 2024 by choisioo
#10658 · [Feature]: Add a macOS installation script · feature request, good first issue · opened Nov 26, 2024 by youkaichao
#10656 · [Bug]: Qwen2.5-32B-GPTQ-Int4 inference !!!!! · bug · opened Nov 26, 2024 by jklj077
#10653 · [Bug]: AMD GPU RX 7900XT: Failed to infer device type · bug · opened Nov 26, 2024 by githust66
#10652 · [Bug]: Inference is exceptionally slow on the L20 GPU · bug · opened Nov 26, 2024 by joey9503
#10650 · [Bug]: vLLM inference for Qwen2-VL-72B-Instruct-GPTQ-Int8 · bug · opened Nov 26, 2024 by DoctorTar
#10649 · [Feature]: Mixtral manual head_dim · feature request · opened Nov 26, 2024 by wavy-jung
#10648 · [Bug]: Llama 3.2 90b crash · bug · opened Nov 26, 2024 by yessenzhar
#10638 · [Feature]: Support explicitly specifying GPU devices for a model instance · feature request · opened Nov 25, 2024 by wlll123456
#10637 · [Bug]: The parameter gpu_memory_utilization does not take effect · bug · opened Nov 25, 2024 by liutao053877
#10634 · [Feature]: Initial idea and design for asynchronous scheduling · feature request · opened Nov 25, 2024 by lixiaolx
#10630 · [Bug]: GPU memory leak when using the bad_words feature · bug · opened Nov 25, 2024 by wsp317
#10628 · [Performance]: Using bge-m3 for acceleration did not achieve the expected results · performance · opened Nov 25, 2024 by Jay-ju
#10627 · [Bug]: Crash with Qwen2-Audio model in vLLM during audio processing · bug · opened Nov 25, 2024 by jiahansu
Filter: updated in the last three days (updated:>2024-11-23).