When running the Qwen2-Audio model with vLLM for audio transcription tasks, the process may crash with a RuntimeError due to a shape mismatch during tensor operations.
INFO 11-24 17:46:31 logger.py:37] Received request chatcmpl-247fa5bbcca64896b269dc58fe916d23: prompt: '<|im_start|>system\nYou are responsible for transcribing audio recordings into text.<|im_end|>\n<|im_start|>user\nAudio 1: <|audio_bos|><|AUDIO|><|audio_eos|>\n提取文字<|im_end|>\n<|im_start|>assistant\n', params: SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.01, top_p=1.0, top_k=-1, min_p=0.0, seed=None, stop=[], stop_token_ids=[], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=512, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=None), prompt_token_ids: None, lora_request: None, prompt_adapter_request: None.
INFO 11-24 17:46:31 preprocess.py:215] Your model uses the legacy input pipeline instead of the new multi-modal processor. Please note that the legacy pipeline will be removed in a future release. For more details, see: https://github.com/vllm-project/vllm/issues/10114
INFO 11-24 17:46:31 engine.py:267] Added request chatcmpl-247fa5bbcca64896b269dc58fe916d23.
INFO 11-24 17:46:32 model_runner_base.py:120] Writing input of failed execution to /tmp/err_execute_model_input_20241124-174632.pkl...
INFO 11-24 17:46:32 model_runner_base.py:149] Completed writing input of failed execution to /tmp/err_execute_model_input_20241124-174632.pkl.
CRITICAL 11-24 17:46:32 launcher.py:99] MQLLMEngine is already dead, terminating server process
INFO: 100.64.0.25:41350 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
ERROR 11-24 17:46:32 engine.py:135] RuntimeError('Error in model execution (input dumped to /tmp/err_execute_model_input_20241124-174632.pkl): shape mismatch: value tensor of shape [211, 4096] cannot be broadcast to indexing result of shape [212, 4096]')
ERROR 11-24 17:46:32 engine.py:135] Traceback (most recent call last):
ERROR 11-24 17:46:32 engine.py:135] File "/workspace/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/worker/model_runner_base.py", line 116, in _wrapper
ERROR 11-24 17:46:32 engine.py:135] return func(*args, **kwargs)
ERROR 11-24 17:46:32 engine.py:135] ^^^^^^^^^^^^^^^^^^^^^
ERROR 11-24 17:46:32 engine.py:135] File "/workspace/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/worker/model_runner.py", line 1654, in execute_model
ERROR 11-24 17:46:32 engine.py:135] hidden_or_intermediate_states = model_executable(
ERROR 11-24 17:46:32 engine.py:135] ^^^^^^^^^^^^^^^^^
ERROR 11-24 17:46:32 engine.py:135] File "/workspace/miniconda3/envs/vllm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
ERROR 11-24 17:46:32 engine.py:135] return self._call_impl(*args, **kwargs)
ERROR 11-24 17:46:32 engine.py:135] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 11-24 17:46:32 engine.py:135] File "/workspace/miniconda3/envs/vllm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
ERROR 11-24 17:46:32 engine.py:135] return forward_call(*args, **kwargs)
ERROR 11-24 17:46:32 engine.py:135] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 11-24 17:46:32 engine.py:135] File "/workspace/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/model_executor/models/qwen2_audio.py", line 396, in forward
ERROR 11-24 17:46:32 engine.py:135] inputs_embeds[mask, :] = masked_audio_features
ERROR 11-24 17:46:32 engine.py:135] ~~~~~~~~~~~~~^^^^^^^^^
ERROR 11-24 17:46:32 engine.py:135] RuntimeError: shape mismatch: value tensor of shape [211, 4096] cannot be broadcast to indexing result of shape [212, 4096]
ERROR 11-24 17:46:32 engine.py:135]
ERROR 11-24 17:46:32 engine.py:135] The above exception was the direct cause of the following exception:
ERROR 11-24 17:46:32 engine.py:135]
ERROR 11-24 17:46:32 engine.py:135] Traceback (most recent call last):
ERROR 11-24 17:46:32 engine.py:135] File "/workspace/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 133, in start
ERROR 11-24 17:46:32 engine.py:135] self.run_engine_loop()
ERROR 11-24 17:46:32 engine.py:135] File "/workspace/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 196, in run_engine_loop
ERROR 11-24 17:46:32 engine.py:135] request_outputs = self.engine_step()
ERROR 11-24 17:46:32 engine.py:135] ^^^^^^^^^^^^^^^^^^
ERROR 11-24 17:46:32 engine.py:135] File "/workspace/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 214, in engine_step
ERROR 11-24 17:46:32 engine.py:135] raise e
ERROR 11-24 17:46:32 engine.py:135] File "/workspace/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 205, in engine_step
ERROR 11-24 17:46:32 engine.py:135] return self.engine.step()
ERROR 11-24 17:46:32 engine.py:135] ^^^^^^^^^^^^^^^^^^
ERROR 11-24 17:46:32 engine.py:135] File "/workspace/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/engine/llm_engine.py", line 1454, in step
ERROR 11-24 17:46:32 engine.py:135] outputs = self.model_executor.execute_model(
ERROR 11-24 17:46:32 engine.py:135] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 11-24 17:46:32 engine.py:135] File "/workspace/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/executor/gpu_executor.py", line 125, in execute_model
ERROR 11-24 17:46:32 engine.py:135] output = self.driver_worker.execute_model(execute_model_req)
ERROR 11-24 17:46:32 engine.py:135] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 11-24 17:46:32 engine.py:135] File "/workspace/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/worker/worker_base.py", line 343, in execute_model
ERROR 11-24 17:46:32 engine.py:135] output = self.model_runner.execute_model(
ERROR 11-24 17:46:32 engine.py:135] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 11-24 17:46:32 engine.py:135] File "/workspace/miniconda3/envs/vllm/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 11-24 17:46:32 engine.py:135] return func(*args, **kwargs)
ERROR 11-24 17:46:32 engine.py:135] ^^^^^^^^^^^^^^^^^^^^^
ERROR 11-24 17:46:32 engine.py:135] File "/workspace/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/worker/model_runner_base.py", line 152, in _wrapper
ERROR 11-24 17:46:32 engine.py:135] raise type(err)(
ERROR 11-24 17:46:32 engine.py:135] RuntimeError: Error in model execution (input dumped to /tmp/err_execute_model_input_20241124-174632.pkl): shape mismatch: value tensor of shape [211, 4096] cannot be broadcast to indexing result of shape [212, 4096]
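The failure at `inputs_embeds[mask, :] = masked_audio_features` happens when the boolean mask selects more `<|AUDIO|>` placeholder positions than the audio encoder produced feature rows. The following is a minimal sketch of that pattern using NumPy in place of PyTorch tensors; the sizes (300 tokens, hidden size 8) are illustrative stand-ins, not the model's real dimensions:

```python
import numpy as np

# Sketch of the failing masked assignment: the mask marks 212 placeholder
# positions, but the value tensor has only 211 rows, one fewer than needed.
hidden_size = 8                                # 4096 in the real model
inputs_embeds = np.zeros((300, hidden_size))   # token embeddings
mask = np.zeros(300, dtype=bool)
mask[:212] = True                              # 212 <|AUDIO|> placeholders
audio_features = np.ones((211, hidden_size))   # encoder emitted 211 rows

try:
    inputs_embeds[mask, :] = audio_features    # row counts disagree
except ValueError as e:
    print(f"shape mismatch reproduced: {e}")
```

NumPy raises `ValueError` where PyTorch raises `RuntimeError`, but the broadcast check is the same: a masked assignment requires the value's first dimension to equal the number of `True` entries in the mask.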
Are you able to reproduce this with a particular example? Or does this just happen randomly?
I have noticed occasional errors with MiniCPM-V similar to this one in the CI, but retrying always fixes it.
I’m not entirely sure because the input sound is captured from the microphone. It’s likely that this issue only occurs with a specific audio clip. I’ll try to reproduce the problem and upload the corresponding audio clip.
Your current environment
The output of `python collect_env.py`
Model Input Dumps
dump.zip
🐛 Describe the bug
Host the Qwen2-Audio Server:
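The exact launch flags are not shown in the report; a sketch of a typical vLLM OpenAI-compatible server launch for this model follows, where the model name and port are assumptions:

```shell
# Sketch of launching Qwen2-Audio behind vLLM's OpenAI-compatible server.
# Model name and port are assumptions; the reporter's flags are not shown.
vllm serve Qwen/Qwen2-Audio-7B-Instruct --port 8000
```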
Submit an audio file for text transcription:
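The request seen in the log can be sketched as a chat-completions payload carrying the audio as a base64 data URL. The system prompt and sampling parameters mirror the logged request; the payload-builder function and the placeholder audio bytes are illustrative assumptions:

```python
import base64
import json

def build_transcription_payload(wav_bytes: bytes) -> dict:
    """Build a chat-completions body matching the logged request.

    The helper name is illustrative; the prompt text mirrors the log
    above ("提取文字" means "extract the text").
    """
    audio_b64 = base64.b64encode(wav_bytes).decode("ascii")
    return {
        "model": "Qwen/Qwen2-Audio-7B-Instruct",
        "max_tokens": 512,
        "temperature": 0.01,
        "messages": [
            {
                "role": "system",
                "content": "You are responsible for transcribing audio recordings into text.",
            },
            {
                "role": "user",
                "content": [
                    {
                        "type": "audio_url",
                        "audio_url": {"url": f"data:audio/wav;base64,{audio_b64}"},
                    },
                    {"type": "text", "text": "提取文字"},
                ],
            },
        ],
    }

# Placeholder bytes stand in for a real WAV file read from disk.
payload = build_transcription_payload(b"\x00" * 16)
print(json.dumps(payload, ensure_ascii=False)[:80])
```

The resulting body would be POSTed to the server's `/v1/chat/completions` endpoint, as in the `POST /v1/chat/completions` line in the log.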
Error log: see the engine output at the top of this report.