
VLLMOpenAI api issues #29311

Open
to-sora opened this issue Jan 20, 2025 · 1 comment

Comments


to-sora commented Jan 20, 2025

I am using vLLM and want to use batch processing.
vLLM is started with:

vllm serve  /mnt/DATA7/MODEL/vllm_model/gguf/Meta-Llama-3.1-8B-Instruct-Q8_0.gguf  --max-model-len 30000  --gpu-memory-utilization 1.0  --port 12001  --api-key 1234 --chat-template "../chat_templates/chat_templates/llama-3-instruct.jinja"
cat ../chat_templates/llama-3-instruct.jinja
{% if messages[0]['role'] == 'system' %}
    {% set offset = 1 %}
{% else %}
    {% set offset = 0 %}
{% endif %}

{{ bos_token }}
{% for message in messages %}
    {% if (message['role'] == 'user') != (loop.index0 % 2 == offset) %}
        {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}
    {% endif %}

    {{ '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n' + message['content'] | trim + '<|eot_id|>' }}
{% endfor %}

{% if add_generation_prompt %}
    {{ '<|start_header_id|>' + 'assistant' + '<|end_header_id|>\n\n' }}
{% endif %}
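Ignoring Jinja whitespace handling, the template above should be roughly equivalent to this small Python renderer (a sketch, assuming `bos_token` is `<|begin_of_text|>`; the role-alternation check is omitted):

```python
def render_llama3(messages, add_generation_prompt=True,
                  bos_token="<|begin_of_text|>"):
    # Mirror of the Jinja template: one header block per message,
    # each terminated by <|eot_id|>.
    out = bos_token
    for m in messages:
        out += (f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
                f"{m['content'].strip()}<|eot_id|>")
    if add_generation_prompt:
        out += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return out

prompt = render_llama3([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who won the world series in 2020?"},
])
print(prompt)
```

This is the prompt shape the server should build when the chat template is applied.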

As a comparison test, I ran the code from the vLLM docs:

from openai import OpenAI
# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "1234"
openai_api_base = "http://localhost:12001/v1"

client = OpenAI(
    # defaults to os.environ.get("OPENAI_API_KEY")
    api_key=openai_api_key,
    base_url=openai_api_base,
)

models = client.models.list()
model = models.data[0].id
history = [{
        "role": "system",
        "content": "You are a helpful assistant."
    }, {
        "role": "user",
        "content": "Who won the world series in 2020?"
    }, {
        "role":"assistant",
        "content":
        "The Los Angeles Dodgers won the World Series in 2020."
    }, {
        "role": "user",
        "content": "Where was it played?"
    }]
chat_completion = client.chat.completions.create(
    messages=history,
    model=model,
)

print("Chat completion results:")
print(chat_completion.choices[0].message.content)

And the result is reasonable, with the backend logging:

127.0.0.1:40974 - "POST /v1/chat/completions HTTP/1.1" 200 OK

However, when I run the LangChain code:

from langchain_core.prompts import ChatPromptTemplate
from langchain.schema.output_parser import StrOutputParser
from langchain_core.pydantic_v1 import BaseModel, Field
from typing import List
from langchain_community.llms import VLLMOpenAI
from langchain.output_parsers import  PydanticOutputParser 

llm = VLLMOpenAI(
    model_name=VLLM_MODEL_PATH,
    max_tokens=1000,
    openai_api_key="1234",
    openai_api_base="http://localhost:12001/v1/",
    top_p=0.95,
    temperature=0,
    model_kwargs={"stop": ["<|eot_id|>", "<|eom_id|>"]},
)
print("Testing  1")
print(llm.invoke("What is the capital of France ?"))
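A workaround sketch, based on my assumption that `VLLMOpenAI` talks to `/v1/completions` where no chat template is applied server-side (so the model simply continues the raw text): embedding the Llama-3 chat markers from the template above in the prompt yourself should keep the model in assistant mode.

```python
question = "What is the capital of France ?"
# Manually wrap the question in the Llama-3-Instruct chat markers,
# since the raw completions endpoint will not do it for us.
prompt = (
    "<|begin_of_text|>"
    "<|start_header_id|>user<|end_header_id|>\n\n"
    f"{question}<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)
# llm.invoke(prompt)  # generation should then stop at <|eot_id|>
```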

it returns:

Testing  1
 Paris
What is the capital of Australia ? Canberra
What is the capital of China ? Beijing
What is the capital of India ? New Delhi
What is the capital of Japan ? Tokyo
What is the capital of South Africa ? Pretoria
What is the capital of Brazil ? Brasília
What is the capital of Russia ? Moscow
What is the capital of Egypt ? Cairo
What is the capital of South Korea ? Seoul
What is the capital of Turkey ? Ankara
What is the capital of Poland ? Warsaw
What is the capital of Argentina ? Buenos Aires
What is the capital of Mexico ? Mexico City
What is the capital of Thailand ? Bangkok
What is the capital of Vietnam ? Hanoi
What is the capital of Indonesia ? Jakarta
What is the capital of Malaysia ? Kuala Lumpur
What is the capital of Singapore ? Singapore
What is the capital of Philippines ? Manila
What is the capital of Sri Lanka ? Colombo
What is the capital of Bangladesh ? Dhaka
What is the capital of Nepal ? Kathmandu
What is the capital of Pakistan ? Islamabad
What is the capital of Myanmar ? Naypyidaw
What is the capital of Cambodia ? Phnom Penh
What is the capital of Laos ? Vientiane
What is the capital of Mongolia ? Ulaanbaatar
What is the capital of North Korea ? Pyongyang
What is the capital of Taiwan ? Taipei
What is the capital of Hong Kong ? Hong Kong
What is the capital of Macau ? Macau
What is the capital of Brunei ? Bandar Seri Begawan
What is the capital of Bahrain ? Manama
What is the capital of Oman ? Muscat
What is the capital of Qatar ? Doha
What is the capital of United Arab Emirates ? Abu Dhabi
What is the capital of Kuwait ? Kuwait City
What is the capital of Saudi Arabia ? Riyadh
What is the capital of Jordan ? Amman
What is the capital of Lebanon ? Beirut
What is the capital of Syria ? Damascus
What is the capital of Iraq ? Baghdad
What is the capital of Yemen ? Sana'a
What is the capital of Israel ? Jerusalem
What is the capital of Palestine ? Ramallah
What is the capital of Cyprus ? Nicosia
What is the capital of Malta ? Valletta
What is the capital of Greece ? Athens
What is the capital of Turkey ? Ankara
What is the capital of Bulgaria ? Sofia
What is the capital of Romania ? Bucharest
What is the capital of Hungary ? Budapest
What is the capital of Croatia ? Zagreb
What is the capital of Slovenia ? Ljubljana
What is the capital of Bosnia and Herzegovina ? Sarajevo
What is the capital of Serbia ? Belgrade
What is the capital of Montenegro ? Podgorica
What is the capital of Albania ? Tirana
What is the capital of Kosovo ? Pristina
What is the capital of Macedonia ? Skopje
What is the capital of Moldova ? Chisinau
What is the capital of Georgia ? Tbilisi
What is the capital of Armenia ? Yerevan
What is the capital of Azerbaijan ? Baku
What is the capital of Belarus ? Minsk
What is the capital of Lithuania ? Vilnius
What is the capital of Latvia ? Riga
What is the capital of Estonia ? Tallinn
What is the capital of Ireland ? Dublin
What is the capital of United Kingdom ? London
What is the capital of Iceland ? Reykjavik
What is the capital of Norway ? Oslo
What is the capital of Sweden ? Stockholm
What is the capital of Denmark ? Copenhagen
What is the capital of Finland ? Helsinki
What is the capital of Portugal ? Lisbon
What is the capital of Spain ? Madrid
What is the capital of Italy ? Rome
What is the capital of Austria ? Vienna
What is the capital of Switzerland ? Bern
What is the capital of Germany ? Berlin
What is the capital of Netherlands ? Amsterdam
What is the capital of Belgium ? Brussels
What is the capital of Luxembourg ? Luxembourg
What is the capital of Monaco ? Monaco
What is the capital of Andorra ? Andorra la Vella
What is the capital of San Marino ? San Marino
What is the capital of Vatican City ? Vatican City
What is the capital of Gibraltar ? Gibraltar
What is the capital of Faroe Islands ? Tórshavn
What is the capital of Greenland ? Nuuk
What is the capital of Guernsey ? St Peter Port
What is the capital of Jersey ? St Helier
What is the capital of Isle of Man ? Douglas
What is the capital of Northern Ireland ? Belfast
What is the capital of Scotland ? Edinburgh
What is the capital of Wales ? Cardiff
What is the capital of England ? London

with the vLLM log:

INFO:     127.0.0.1:53452 - "POST /v1/completions HTTP/1.1" 200 OK
INFO 01-20 18:20:41 metrics.py:467] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 6.3 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
INFO 01-20 18:20:51 metrics.py:467] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.

The vLLM docs clearly state (https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html):
Supported APIs
We currently support the following OpenAI APIs:

Completions API (/v1/completions)

Only applicable to text generation models (--task generate).

Note: suffix parameter is not supported.

Chat Completions API (/v1/chat/completions)

Only applicable to text generation models (--task generate) with a chat template.

Note: parallel_tool_calls and user parameters are ignored.

Embeddings API (/v1/embeddings)

Only applicable to embedding models (--task embed).

May I know whether I am making a mistake, or whether this is a bug?

FYI, the following code also generates meaningless results:

print(llm.invoke("What is the capital of France ?"))
prompt = ChatPromptTemplate([
    ("system", "you are a helpful assistant."),
    ("human", "What is the capital of France ? answer in one word only"),
    ("ai", "Paris"),
    ("human", "What is the capital of {country} ? answer in one word only"),
])

chain = prompt | llm
temp1 = chain.invoke({"country": "Japan"})
print(temp1)
temp = chain.batch([{"country": "France"}, {"country": "Germany"}, {"country": "Italy"}])
print("Batched")
for t in temp:
    print(t)
    print("****")
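For what it's worth, my understanding (an assumption, not verified against LangChain internals) is that when a chat prompt is piped into a completions-style LLM like `VLLMOpenAI`, the messages are flattened into a plain transcript along these lines, again with no chat template applied:

```python
messages = [
    ("System", "you are a helpful assistant."),
    ("Human", "What is the capital of France ? answer in one word only"),
    ("AI", "Paris"),
    ("Human", "What is the capital of Japan ? answer in one word only"),
]
# Roughly what /v1/completions receives as its raw prompt string:
flattened = "\n".join(f"{role}: {content}" for role, content in messages)
print(flattened)
```

A raw transcript like this gives the model nothing to anchor on, which would explain the free-associating output above.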

Originally posted by @to-sora in #29309

to-sora changed the title from "I am using vllm and want to use batch process." to "VLLMOpenAI api issues" on Jan 20, 2025

to-sora commented Jan 21, 2025

Forgot to mark this as a bug.
