Replies: 8 comments 22 replies
-
I'm having the same issue.
-
Same error here.
-
Same for me.
-
I encountered a similar issue that was resolved by increasing `VLLM_ENGINE_ITERATION_TIMEOUT_S`.
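In case it helps others, a minimal sketch of raising that timeout, assuming the suggestion refers to `VLLM_ENGINE_ITERATION_TIMEOUT_S` as later replies mention (180 is an example value, not a recommendation):

```bash
# Raise the engine iteration timeout (60 s by default) before launching the server.
export VLLM_ENGINE_ITERATION_TIMEOUT_S=180
python -m vllm.entrypoints.openai.api_server --model facebook/opt-125m
```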
-
Similar issue: #10002
-
Should changing that setting even help? No matter what value I set, I still get checks every 10 seconds.
-
I will check it out and let you know. Thanks.

On Thu, Nov 14, 2024, 4:28 PM JAEWON ROH wrote:
> It's strange.. I ran 4 vLLM instances and tested with the OpenAI client, and it works fine. Maybe your error occurred because of the server environment, such as the GPU.
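For anyone wanting to reproduce that check, a minimal sketch of the kind of smoke test described, assuming a vLLM OpenAI-compatible server listening on the default port 8000:

```bash
# Send a completions request to a locally running vLLM server.
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "facebook/opt-125m", "prompt": "Hello, my name is", "max_tokens": 16}'
```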
-
I encountered this too. I use `--disable-frontend-multiprocessing` to disable the MQEngine and avoid it.
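For reference, a sketch of where that flag goes, assuming the standard OpenAI API server entrypoint:

```bash
# Run the API frontend and the engine in a single process instead of the
# multiprocessing (MQ) engine path.
python -m vllm.entrypoints.openai.api_server \
  --model facebook/opt-125m \
  --disable-frontend-multiprocessing
```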
-
Hi everyone,

I am trying to perform inference using TheBloke/Mistral-7B-Instruct-v0.2-AWQ with a vLLM CPU installation using Docker, and I keep receiving the error `RuntimeError('Engine loop has died')`.

I can successfully build the CPU Docker image, and with the default facebook/opt-125m model I can also run the server and perform inference to receive a completions response.

As for the Mistral model, AWQ is among the quantization kernels supported on this hardware, and I am able to start the server with the following command:

```bash
docker run -it --rm -v Mistral:/mnt/models/Mistral --network=host --ipc=host \
  -e VLLM_CPU_KVCACHE_SPACE=40 \
  vllm-cpu-env --model="/mnt/models/Mistral/Mistral-7B-Instruct-v0.2-AWQ" \
  --dtype="half" --quantization awq --device "cpu" --max-model-len 2048
```

When I send an inference query, I can also see it being processed in the server log. It is only after a few seconds that I receive `RuntimeError('Engine loop has died')`, which kills the server and shuts down the Docker container. I have tried various values of `VLLM_CPU_KVCACHE_SPACE` and have increased `VLLM_ENGINE_ITERATION_TIMEOUT_S`, as well as setting `VLLM_CPU_OMP_THREADS_BIND` to my physical cores, but to no avail.

I'm reaching out in the hope that this error can be rectified. Thank you for your attention thus far. Cheers
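For completeness, a sketch combining the settings mentioned above into one command; the timeout and core range are illustrative assumptions, and nothing in this thread confirms they fix the crash:

```bash
# Illustrative only: combine KV-cache size, engine timeout, and CPU thread
# binding in a single CPU Docker run (adjust values to your machine).
docker run -it --rm -v Mistral:/mnt/models/Mistral --network=host --ipc=host \
  -e VLLM_CPU_KVCACHE_SPACE=40 \
  -e VLLM_ENGINE_ITERATION_TIMEOUT_S=300 \
  -e VLLM_CPU_OMP_THREADS_BIND=0-15 \
  vllm-cpu-env --model="/mnt/models/Mistral/Mistral-7B-Instruct-v0.2-AWQ" \
  --dtype="half" --quantization awq --device "cpu" --max-model-len 2048
```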