Alibaba-NLP/gte-Qwen2-1.5B-instruct #8
Comments
Having the same issue on my end.
Same issue here.
Thanks for reporting this. Do you have any idea why this error could arise, @michaelfeil?
@TimPietrusky Because flash-attn is not installed. Solution: you can use vLLM for the 7B model embeddings, since Qwen is a decoder model and not built for high-throughput embeddings; it's actually pretty annoying to support. You can always build your own Docker image, `pip install flash-attn` in it, and use that. I recommend installing it from Tri Dao's prebuilt wheels so you don't have to deal with nvcc.
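For illustration, here is a minimal offline sketch of the vLLM route suggested above. It assumes a vLLM version that recognizes Qwen2-based checkpoints as embedding (pooling) models; the exact constructor arguments and method names (`encode` vs. `embed`, the optional `task="embed"` flag) differ between vLLM releases, so treat this as a sketch rather than the worker's actual code:

```python
# Sketch only: assumes vLLM is installed and treats this checkpoint as an
# embedding/pooling model; API details vary across vLLM versions.
from vllm import LLM

llm = LLM(model="Alibaba-NLP/gte-Qwen2-1.5B-instruct", enforce_eager=True)

prompts = [
    "What is the capital of France?",
    "Paris is the capital of France.",
]

# For embedding models, encode() returns one embedding vector per prompt.
outputs = llm.encode(prompts)

for prompt, output in zip(prompts, outputs):
    embedding = output.outputs.embedding  # list of floats, hidden-size long
    print(f"{prompt!r} -> {len(embedding)}-dim embedding")
```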
@michaelfeil thank you, that makes total sense! @axeloh can you please take a look at the comment above? @pandyamarut we should update our README to mention that we don't support Qwen, so that users know to use our vllm-worker or something similar.
Hi 😄
I am trying to run the Alibaba-NLP/gte-Qwen2-1.5B-instruct model (https://huggingface.co/Alibaba-NLP/gte-Qwen2-1.5B-instruct) on RunPod serverless, using the Docker image runpod/worker-infinity-embedding:dev-cuda11.8.0. Upon incoming requests, the pod logs show this error:
ImportError: This modeling file requires the following packages that were not found in your environment: flash_attn. Run `pip install flash_attn`
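For context on where this comes from: the ImportError is raised by transformers when it checks the imports declared by the model's remote modeling code. A rough reproduction outside the worker, assuming `transformers` is installed and `flash_attn` is not, would look something like this:

```python
# Rough reproduction (assumption: flash_attn is absent from the environment).
# transformers verifies the imports of the model's remote modeling file and
# raises the ImportError shown above when flash_attn is missing.
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "Alibaba-NLP/gte-Qwen2-1.5B-instruct",
    trust_remote_code=True,  # pulls in the custom modeling code
)
```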