Alibaba-NLP/gte-Qwen2-1.5B-instruct #8

Open
axeloh opened this issue Jul 1, 2024 · 5 comments

Comments

axeloh commented Jul 1, 2024

Hi 😄

I am trying to run the Alibaba-NLP/gte-Qwen2-1.5B-instruct model (https://huggingface.co/Alibaba-NLP/gte-Qwen2-1.5B-instruct) on RunPod serverless.
I am using the docker image runpod/worker-infinity-embedding:dev-cuda11.8.0.

Upon incoming requests, the pod logs show this error:

ImportError: This modeling file requires the following packages that were not found in your environment: flash_attn. Run `pip install flash_attn`

@Steel-skull

Having the same issue on my end.

@minhazma

Same issue here with Alibaba-NLP/gte-Qwen2-7B-instruct.

@TimPietrusky

Thanks for reporting this.

Do you have any idea why this error could arise @michaelfeil?

michaelfeil (Contributor) commented Aug 15, 2024

@TimPietrusky Because flash-attn is not installed.

Solution: You can use vLLM for the 7B model embeddings. Qwen is a decoder model and not built for high-throughput embeddings - it's actually pretty annoying to support.
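
For reference, a rough sketch of what that could look like with vLLM's offline embedding API (untested; it assumes a vLLM version that supports embedding models and this particular architecture, and the exact API may differ between releases):

```python
# Sketch only: offline embeddings with vLLM. Assumes a vLLM build that
# supports embedding models; details may vary between versions.
from vllm import LLM

llm = LLM(model="Alibaba-NLP/gte-Qwen2-7B-instruct", enforce_eager=True)

prompts = [
    "What is the capital of France?",
    "Explain gravity in one sentence.",
]

# For embedding models, encode() returns one result per prompt, with the
# embedding vector available as output.outputs.embedding.
outputs = llm.encode(prompts)
for output in outputs:
    print(len(output.outputs.embedding))
```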

You can always build your own Docker image, pip install flash-attn in it, and use that one. I recommend installing it from the prebuilt wheels from Tri Dao, to avoid dealing with nvcc.
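
If someone wants to go that route, a rough Dockerfile sketch could look like this (untested; it builds on the worker image from the original report, and the commented-out wheel URL is a placeholder to replace with the wheel matching your CUDA, PyTorch, and Python versions from the flash-attention releases page):

```dockerfile
# Sketch only (untested): custom image that adds flash-attn on top of the
# worker image from this thread. Adjust to your environment.
FROM runpod/worker-infinity-embedding:dev-cuda11.8.0

# Preferred per the comment above: install a prebuilt flash-attn wheel from
# https://github.com/Dao-AILab/flash-attention/releases that matches your
# CUDA, PyTorch, and Python versions (the URL below is a placeholder):
# RUN pip install <url-of-matching-flash_attn-wheel>

# Fallback: build from source, which requires nvcc in the image.
RUN pip install flash-attn --no-build-isolation
```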

@TimPietrusky

@michaelfeil thank you, makes total sense!

@axeloh can you please take a look at the comment above?

@pandyamarut we should update our README to mention that we don't support Qwen, so that users know to use our vllm-worker or something similar.
