Issues: vllm-project/vllm
#10674 · [RFC]: Make any model an embedding model · RFC · opened Nov 26, 2024 by DarkLight1337
#10673 · [Usage]: Llama-2-7b-chat-hf as embedding model · usage · opened Nov 26, 2024 by ra-MANUJ-an
#10671 · [Usage]: No generation when running vLLM with neuralmagic/Meta-Llama-3.1-8b-Instruct-quantized.w4a16 using langchain_openai · usage · opened Nov 26, 2024 by ehab-akram
#10670 · [Usage]: How to get the score of every output token? · usage · opened Nov 26, 2024 by TonyUSTC
#10668 · [Bug]: When using Ray as the inference backend for Qwen2-VL, there are issues with the inference results · bug · opened Nov 26, 2024 by my17th2
#10666 · [RFC]: Create VllmState to save immutable args in VllmConfig · RFC · opened Nov 26, 2024 by MengqingCao
#10664 · [Performance]: 10x performance gap between the lora-modules deployment and the merged-weights deployment · performance · opened Nov 26, 2024 by LIUKAI0815
#10663 · [Installation]: Request for a solution to enable Llama 3.1 405B-FP8 model compatibility with AMD Mi250 · installation · opened Nov 26, 2024 by Bihan
#10662 · [Usage]: Cannot use xformers with an old GPU · usage · opened Nov 26, 2024 by baimushan
#10661 · [Bug]: No available block found in 60 second. · bug · opened Nov 26, 2024 by Went-Liang
#10660 · [Feature]: Integrate with XGrammar for zero-overhead structured generation in LLM inference · feature request · opened Nov 26, 2024 by choisioo
#10658 · [Feature]: Add a macOS installation script · feature request, good first issue · opened Nov 26, 2024 by youkaichao
#10656 · [Bug]: Qwen2.5-32B-GPTQ-Int4 inference !!!!! · bug · opened Nov 26, 2024 by jklj077
#10653 · [Bug]: AMD GPU RX 7900XT: Failed to infer device type · bug · opened Nov 26, 2024 by githust66
#10652 · [Bug]: Inference is exceptionally slow on the L20 GPU · bug · opened Nov 26, 2024 by joey9503
#10650 · [Bug]: vLLM inference for Qwen2-VL-72B-Instruct-GPTQ-Int8 · bug · opened Nov 26, 2024 by DoctorTar
#10649 · [Feature]: Mixtral manual head_dim · feature request · opened Nov 26, 2024 by wavy-jung
#10648 · [Bug]: Llama 3.2 90b crash · bug · opened Nov 26, 2024 by yessenzhar
#10638 · [Feature]: Support explicitly specifying GPU devices for a model instance · feature request · opened Nov 25, 2024 by wlll123456
#10637 · [Bug]: The parameter gpu_memory_utilization does not take effect · bug · opened Nov 25, 2024 by liutao053877
#10634 · [Feature]: Initial idea and design for asynchronous scheduling · feature request · opened Nov 25, 2024 by lixiaolx
#10630 · [Bug]: GPU memory leak when using the bad_words feature · bug · opened Nov 25, 2024 by wsp317
#10628 · [Performance]: Using bge-m3 for acceleration did not achieve the expected results · performance · opened Nov 25, 2024 by Jay-ju
#10627 · [Bug]: Crash with Qwen2-Audio model in vLLM during audio processing · bug · opened Nov 25, 2024 by jiahansu
Filter: updated in the last three days (updated:>2024-11-23).