[Usage]: Does vLLM support the embedding API for multimodal LLMs? #8483

sfyumi · 2024-09-14T04:19:50Z

Your current environment

The output of `python collect_env.py`

How would you like to use vllm

E.g., get embeddings from MiniCPM-V 2.6.



DarkLight1337 · 2024-09-14T04:26:54Z

No, this is not supported yet.

DarkLight1337 · 2024-09-14T04:27:40Z

In fact, this isn't even available for most language-only models. The only one supported right now is Mistral. See also #7915

noooop · 2024-09-14T05:52:14Z

I am working on it. #8453 #8452 @DarkLight1337

noooop · 2024-09-20T03:44:05Z

As I understand it, MiniCPM-V 2.6 is a generative model, not a retrieval model designed to produce embeddings. (Maybe you need a multimodal retrieval model such as BAAI/bge-visualized-m3: https://huggingface.co/BAAI/bge-visualized)

Can you send some sample code and explain how you want to use MiniCPM-V 2.6 to generate embeddings?
@sfyumi

sfyumi · 2024-09-20T09:46:10Z

@noooop
We take the last hidden states from a language model (LLM) and apply a multi-layer linear projection to reduce their dimensionality; the result is used as the embedding.
We use both MiniCPM-V 2.6 and Qwen2 as base models to get embeddings.

Sample code:

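import torch.nn as nn

# MiniCPMV (the base model class for MiniCPM-V 2.6) must be importable here.
class MiniCpmlWithProjectionModel(MiniCPMV):
    def __init__(self, config):
        super().__init__(config)
        embedding_dim = config.hidden_size
        # Fall back to 128 output dimensions when the config does not set one.
        projection_output_dim = getattr(config, "projection_output_dim", 128)
        # MLP that reduces the hidden size down to the embedding dimension.
        self.projection_layer = nn.Sequential(
            nn.Linear(embedding_dim, 512),
            nn.ReLU(),
            nn.Linear(512, 256),
            nn.ReLU(),
            nn.Linear(256, projection_output_dim),
        )

    def forward(self, data, **kwargs):
        # Request hidden states unless the caller already set this flag;
        # otherwise outputs.hidden_states may be None.
        kwargs.setdefault("output_hidden_states", True)
        outputs = super().forward(data, **kwargs)
        # Last-layer hidden states, one vector per token.
        hidden_states = outputs.hidden_states[-1]
        return self.projection_layer(hidden_states)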
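
The projection idea above can be tried without loading any checkpoint. The following is a minimal sketch, not the poster's actual pipeline: it applies the same MLP shape to random stand-in hidden states. The `ProjectionHead` and `last_token_pool` names are hypothetical, and the last-token pooling step (with right padding) is an assumption added for illustration, since the sample code itself returns one projected vector per token.

```python
import torch
import torch.nn as nn


class ProjectionHead(nn.Module):
    """MLP that maps hidden states to a smaller embedding dimension."""

    def __init__(self, hidden_size: int, output_dim: int = 128):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(hidden_size, 512),
            nn.ReLU(),
            nn.Linear(512, 256),
            nn.ReLU(),
            nn.Linear(256, output_dim),
        )

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return self.layers(hidden_states)


def last_token_pool(hidden_states: torch.Tensor,
                    attention_mask: torch.Tensor) -> torch.Tensor:
    """Select the hidden state of the last non-padding token per sequence.

    Assumption: sequences are right-padded, so the last real token sits at
    index (number of attended tokens - 1).
    """
    last_index = attention_mask.sum(dim=1) - 1            # shape: (batch,)
    batch_index = torch.arange(hidden_states.size(0))     # shape: (batch,)
    return hidden_states[batch_index, last_index]         # shape: (batch, hidden)


torch.manual_seed(0)
batch, seq_len, hidden_size = 2, 5, 64                    # illustrative sizes only
hidden_states = torch.randn(batch, seq_len, hidden_size)  # stand-in for LLM output
attention_mask = torch.tensor([[1, 1, 1, 0, 0],
                               [1, 1, 1, 1, 1]])

head = ProjectionHead(hidden_size)
pooled = last_token_pool(hidden_states, attention_mask)
embeddings = head(pooled)
print(embeddings.shape)  # torch.Size([2, 128])
```

Pooling before projecting keeps the MLP cost independent of sequence length; whether last-token, mean, or another pooling is appropriate depends on how the projection layers were trained.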

noooop · 2024-09-20T16:05:36Z

~~Understood~~

~~There is a small detail here.~~
~~vllm routing model_name to it corresponding implementation is through the architectures parameter in config.json.~~

~~You must think of a cool name to avoid routing to the previous model.~~

noooop · 2024-09-20T16:16:42Z

~~There are also issues that propose to output last hidden states #853, but I think this is very costly and the best way is to implement a model yourself. adding_model~~

noooop · 2024-09-20T17:22:46Z

Simple but inefficient method:

Output the last hidden states (see #853). A hacky way to do this is shown at https://github.com/WuNein/vllm4mteb/tree/main
(Maybe vLLM will add an option to output the last hidden states in the future.)
With this approach, however, you need to run the MLP in a separate process.

More efficient implementation:

Implement the model yourself (see adding_model).
There is one small detail here:
vLLM routes a model name to its corresponding implementation through the architectures parameter in config.json.

You must pick a new (cool) architecture name so that the model is not routed to the existing implementation.
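
To illustrate the routing detail, here is a minimal sketch that rewrites the `architectures` entry of a Hugging Face style config.json using only the standard library. The helper name `set_architecture`, the architecture name `MiniCPMVWithProjection`, and the contents of the throwaway config are all hypothetical and chosen for illustration.

```python
import json
import tempfile
from pathlib import Path


def set_architecture(config_path: Path, architecture: str) -> dict:
    """Replace the `architectures` list in a config.json and return the new config."""
    config = json.loads(config_path.read_text(encoding="utf-8"))
    config["architectures"] = [architecture]
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config


# Demo on a throwaway config so that no real checkpoint is modified.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "config.json"
    # Illustrative contents only, not the real MiniCPM-V 2.6 config.
    path.write_text(json.dumps({"architectures": ["MiniCPMV"], "hidden_size": 64}),
                    encoding="utf-8")
    updated = set_architecture(path, "MiniCPMVWithProjection")  # hypothetical name
    reloaded = json.loads(path.read_text(encoding="utf-8"))

print(reloaded["architectures"])  # ['MiniCPMVWithProjection']
```

The new name only takes effect once an implementation is registered under it. vLLM exposes `ModelRegistry.register_model` for out-of-tree models, but the exact registration steps depend on the vLLM version, so check the adding-a-model documentation for the release you use.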

DarkLight1337 · 2024-10-23T08:40:48Z

You can now modify any existing model to support embeddings; please see #9314 (comment).

sfyumi added the usage How to use vllm label Sep 14, 2024

DarkLight1337 mentioned this issue Sep 14, 2024

[RFC]: Multi-modality Support Refactoring #4194

Open

34 tasks

noooop mentioned this issue Sep 24, 2024

[RFC]: Support encode only models by Workflow Defined Engine #8453

Open

1 task

noooop mentioned this issue Oct 17, 2024

[Model] Add user-configurable task for models that support both generation and embedding #9424

Merged

DarkLight1337 closed this as completed Oct 23, 2024
