
The latest version of Xinference cannot start the qwen2-vl-instruct model #2554

Open
majestichou opened this issue Nov 14, 2024 · 9 comments

@majestichou

System Info

CUDA 12.2, CentOS 7

Running Xinference with Docker?

  • docker
  • pip install
  • installation from source

Version info

v0.16.3

The command used to start Xinference

docker run -d -v /home/llm-test/embedding_and_rerank_model:/root/models -p 9998:9997 --gpus all xprobe/xinference:latest xinference-local -H 0.0.0.0

Reproduction

  1. Download the Qwen2-VL-7B-Instruct model to the target directory /home/llm-test/embedding_and_rerank_model.
  2. Start Xinference with: docker run -d -v /home/llm-test/embedding_and_rerank_model:/root/models -p 9998:9997 --gpus all xprobe/xinference:latest xinference-local -H 0.0.0.0
  3. Open the web UI, choose Launch Model, select the qwen2-vl-instruct model, set Model Path to /root/models/Qwen2-VL-7B-Instruct, and click the launch button.
  4. The launch fails with the following error:
2024-11-14 08:35:27,521 xinference.core.worker 140 ERROR    Failed to load model qwen2-vl-instruct-0
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/worker.py", line 897, in launch_builtin_model
    await model_ref.load()
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 231, in send
    return self._process_result_message(result)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 659, in send
    result = await self._run_coro(message.message_id, coro)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 370, in _run_coro
    return await coro
  File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 384, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
  File "xoscar/core.pyx", line 558, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    result = await result
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/model.py", line 398, in load
    self._model.load()
  File "/usr/local/lib/python3.10/dist-packages/xinference/model/llm/transformers/qwen2_vl.py", line 53, in load
    from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
ImportError: [address=0.0.0.0:43921, pid=1213] cannot import name 'Qwen2VLForConditionalGeneration' from 'transformers' (/usr/local/lib/python3.10/dist-packages/transformers/__init__.py)

Expected behavior

The model should start normally.
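
For diagnosis, a quick check from the host can confirm whether the transformers build inside the image is what breaks the load (a sketch; `<container>` is a placeholder for the actual container ID or name from `docker ps`):

```bash
# Sketch: check whether the transformers build inside the running container
# exposes the Qwen2-VL classes referenced in the traceback above.
# <container> is a placeholder for the actual container ID or name.
docker exec -it <container> python -c "import transformers; print(transformers.__version__)"
docker exec -it <container> python -c "from transformers import Qwen2VLForConditionalGeneration; print('Qwen2-VL classes available')"
```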

@XprobeBot XprobeBot added the gpu label Nov 14, 2024
@XprobeBot XprobeBot added this to the v0.16 milestone Nov 14, 2024
@codingl2k1
Contributor

What version of transformers do you have? You could try updating transformers.
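
For reference, a sketch of one way to do that in the running container without rebuilding the image (the `<container>` placeholder and the `>=4.45.0` pin are assumptions; the pin reflects the transformers release that is believed to have added Qwen2-VL support):

```bash
# Sketch: upgrade transformers inside the running container, then restart it
# so the Xinference worker reloads the new package. <container> is a
# placeholder; the version pin is an assumption about the minimum release
# that ships Qwen2VLForConditionalGeneration.
docker exec -it <container> pip install --upgrade "transformers>=4.45.0"
docker restart <container>
```

Note that changes made with docker exec are lost if the container is recreated; a rebuilt or derived image, as discussed later in the thread, makes the fix persistent.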

@majestichou
Author

@codingl2k1 Huh? Isn't the version that ships in the image enough? I'm using the Docker image.

@jacobdong

@codingl2k1
The qwen2-audio model fails in the same way:

2024-11-17 16:25:41 ImportError: [address=0.0.0.0:46487, pid=176] cannot import name 'Qwen2AudioForConditionalGeneration' from 'transformers' (/usr/local/lib/python3.10/dist-packages/transformers/__init__.py)

@ChiayenGu

I ran into this problem too; my Docker image version is 0.16.0.

@JumpNew

JumpNew commented Nov 18, 2024

After upgrading transformers to the latest version, the model starts fine.

@harryzwh

Same here.
I actually loaded the AWQ version, but why is only the Transformers engine available? Does vLLM support vision models with AWQ quantization?

@cyhasuka
Contributor

> Same here. I actually loaded the AWQ version, but why is only the Transformers engine available? Does vLLM support vision models with AWQ quantization?

Yes, please confirm vllm>=0.6.4.
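
A quick way to see which vLLM build the image ships (a sketch using the image tag from the original command):

```bash
# Sketch: print the vLLM version bundled in the Xinference image; per the
# comment above, AWQ-quantized vision models need vllm >= 0.6.4.
docker run --rm xprobe/xinference:latest python -c "import vllm; print(vllm.__version__)"
```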

@harryzwh

> Same here. I actually loaded the AWQ version, but why is only the Transformers engine available? Does vLLM support vision models with AWQ quantization?
>
> Yes, please confirm vllm>=0.6.4.

Confirmed that updating transformers to >4.46 and rebuilding the Docker image fixed this issue. However, switching the base Docker image from vllm 0.6.0 to 0.6.4 introduces a number of errors, mainly because Python is also updated from 3.10 to 3.12. Still figuring out how to build a Docker image based on vllm 0.6.4.
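
A lighter-weight alternative, sketched below on the assumption that only the transformers bump is needed, is to derive a new image from the published one rather than switching to the vllm 0.6.4 base, which keeps Python at 3.10:

```bash
# Sketch: build a derived image that only upgrades transformers, avoiding the
# vllm 0.6.4 base image and its Python 3.10 -> 3.12 jump. The ">=4.46.0" pin
# mirrors the version mentioned above; the image name and tag are placeholders.
cat > Dockerfile.qwen2vl <<'EOF'
FROM xprobe/xinference:latest
RUN pip install --no-cache-dir --upgrade "transformers>=4.46.0"
EOF
docker build -f Dockerfile.qwen2vl -t xinference-qwen2vl:local .
```

The original docker run command then works unchanged with xinference-qwen2vl:local in place of xprobe/xinference:latest.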

@cnrbi1

cnrbi1 commented Nov 24, 2024

Calling this model twice doubles the GPU memory usage. Is there any mechanism to release it? It runs out of memory (OOM) very quickly.
