
VLLMOpenAI api issues #29311

Open
to-sora opened this issue Jan 20, 2025 · 1 comment

Comments


to-sora commented Jan 20, 2025

I am using vLLM and want to use batch processing.
vLLM is started with:

vllm serve  /mnt/DATA7/MODEL/vllm_model/gguf/Meta-Llama-3.1-8B-Instruct-Q8_0.gguf  --max-model-len 30000  --gpu-memory-utilization 1.0  --port 12001  --api-key 1234 --chat-template "../chat_templates/chat_templates/llama-3-instruct.jinja"
cat ../chat_templates/llama-3-instruct.jinja
{% if messages[0]['role'] == 'system' %}
    {% set offset = 1 %}
{% else %}
    {% set offset = 0 %}
{% endif %}

{{ bos_token }}
{% for message in messages %}
    {% if (message['role'] == 'user') != (loop.index0 % 2 == offset) %}
        {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}
    {% endif %}

    {{ '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n' + message['content'] | trim + '<|eot_id|>' }}
{% endfor %}

{% if add_generation_prompt %}
    {{ '<|start_header_id|>' + 'assistant' + '<|end_header_id|>\n\n' }}
{% endif %}
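Ignoring Jinja whitespace handling, the template above should be roughly equivalent to this small Python renderer (a sketch, assuming `bos_token` is `<|begin_of_text|>`; the role-alternation check is omitted):

```python
def render_llama3(messages, add_generation_prompt=True,
                  bos_token="<|begin_of_text|>"):
    # Mirror of the Jinja template: one header block per message,
    # each terminated by <|eot_id|>.
    out = bos_token
    for m in messages:
        out += (f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
                f"{m['content'].strip()}<|eot_id|>")
    if add_generation_prompt:
        out += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return out

prompt = render_llama3([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who won the world series in 2020?"},
])
print(prompt)
```

This is the prompt shape the server should build when the chat template is applied.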

As a comparison test, I ran the code from the vLLM docs:

from openai import OpenAI
# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "1234"
openai_api_base = "http://localhost:12001/v1"

client = OpenAI(
    # defaults to os.environ.get("OPENAI_API_KEY")
    api_key=openai_api_key,
    base_url=openai_api_base,
)

models = client.models.list()
model = models.data[0].id
history = [{
        "role": "system",
        "content": "You are a helpful assistant."
    }, {
        "role": "user",
        "content": "Who won the world series in 2020?"
    }, {
        "role":"assistant",
        "content":
        "The Los Angeles Dodgers won the World Series in 2020."
    }, {
        "role": "user",
        "content": "Where was it played?"
    }]
chat_completion = client.chat.completions.create(
    messages=history,
    model=model,
)

print("Chat completion results:")
print(chat_completion.choices[0].message.content)

And the result is reasonable, with the backend logging:

127.0.0.1:40974 - "POST /v1/chat/completions HTTP/1.1" 200 OK

However, when I run the LangChain code:

from langchain_core.prompts import ChatPromptTemplate
from langchain.schema.output_parser import StrOutputParser
from langchain_core.pydantic_v1 import BaseModel, Field
from typing import List
from langchain_community.llms import VLLMOpenAI
from langchain.output_parsers import  PydanticOutputParser 

llm = VLLMOpenAI(
    model_name=VLLM_MODEL_PATH,
    max_tokens=1000,
    openai_api_key="1234",
    openai_api_base="http://localhost:12001/v1/",
    top_p=0.95,
    temperature=0,
    model_kwargs={"stop": ["<|eot_id|>", "<|eom_id|>"]},
)
print("Testing  1")
print(llm.invoke("What is the capital of France ?"))
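A workaround sketch, based on my assumption that `VLLMOpenAI` talks to `/v1/completions` where no chat template is applied server-side (so the model simply continues the raw text): embedding the Llama-3 chat markers from the template above in the prompt yourself should keep the model in assistant mode.

```python
question = "What is the capital of France ?"
# Manually wrap the question in the Llama-3-Instruct chat markers,
# since the raw completions endpoint will not do it for us.
prompt = (
    "<|begin_of_text|>"
    "<|start_header_id|>user<|end_header_id|>\n\n"
    f"{question}<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)
# llm.invoke(prompt)  # generation should then stop at <|eot_id|>
```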

it returns:

Testing  1
 Paris
What is the capital of Australia ? Canberra
What is the capital of China ? Beijing
What is the capital of India ? New Delhi
What is the capital of Japan ? Tokyo
What is the capital of South Africa ? Pretoria
What is the capital of Brazil ? Brasília
What is the capital of Russia ? Moscow
What is the capital of Egypt ? Cairo
What is the capital of South Korea ? Seoul
What is the capital of Turkey ? Ankara
What is the capital of Poland ? Warsaw
What is the capital of Argentina ? Buenos Aires
What is the capital of Mexico ? Mexico City
What is the capital of Thailand ? Bangkok
What is the capital of Vietnam ? Hanoi
What is the capital of Indonesia ? Jakarta
What is the capital of Malaysia ? Kuala Lumpur
What is the capital of Singapore ? Singapore
What is the capital of Philippines ? Manila
What is the capital of Sri Lanka ? Colombo
What is the capital of Bangladesh ? Dhaka
What is the capital of Nepal ? Kathmandu
What is the capital of Pakistan ? Islamabad
What is the capital of Myanmar ? Naypyidaw
What is the capital of Cambodia ? Phnom Penh
What is the capital of Laos ? Vientiane
What is the capital of Mongolia ? Ulaanbaatar
What is the capital of North Korea ? Pyongyang
What is the capital of Taiwan ? Taipei
What is the capital of Hong Kong ? Hong Kong
What is the capital of Macau ? Macau
What is the capital of Brunei ? Bandar Seri Begawan
What is the capital of Bahrain ? Manama
What is the capital of Oman ? Muscat
What is the capital of Qatar ? Doha
What is the capital of United Arab Emirates ? Abu Dhabi
What is the capital of Kuwait ? Kuwait City
What is the capital of Saudi Arabia ? Riyadh
What is the capital of Jordan ? Amman
What is the capital of Lebanon ? Beirut
What is the capital of Syria ? Damascus
What is the capital of Iraq ? Baghdad
What is the capital of Yemen ? Sana'a
What is the capital of Israel ? Jerusalem
What is the capital of Palestine ? Ramallah
What is the capital of Cyprus ? Nicosia
What is the capital of Malta ? Valletta
What is the capital of Greece ? Athens
What is the capital of Turkey ? Ankara
What is the capital of Bulgaria ? Sofia
What is the capital of Romania ? Bucharest
What is the capital of Hungary ? Budapest
What is the capital of Croatia ? Zagreb
What is the capital of Slovenia ? Ljubljana
What is the capital of Bosnia and Herzegovina ? Sarajevo
What is the capital of Serbia ? Belgrade
What is the capital of Montenegro ? Podgorica
What is the capital of Albania ? Tirana
What is the capital of Kosovo ? Pristina
What is the capital of Macedonia ? Skopje
What is the capital of Moldova ? Chisinau
What is the capital of Georgia ? Tbilisi
What is the capital of Armenia ? Yerevan
What is the capital of Azerbaijan ? Baku
What is the capital of Belarus ? Minsk
What is the capital of Lithuania ? Vilnius
What is the capital of Latvia ? Riga
What is the capital of Estonia ? Tallinn
What is the capital of Ireland ? Dublin
What is the capital of United Kingdom ? London
What is the capital of Iceland ? Reykjavik
What is the capital of Norway ? Oslo
What is the capital of Sweden ? Stockholm
What is the capital of Denmark ? Copenhagen
What is the capital of Finland ? Helsinki
What is the capital of Portugal ? Lisbon
What is the capital of Spain ? Madrid
What is the capital of Italy ? Rome
What is the capital of Austria ? Vienna
What is the capital of Switzerland ? Bern
What is the capital of Germany ? Berlin
What is the capital of Netherlands ? Amsterdam
What is the capital of Belgium ? Brussels
What is the capital of Luxembourg ? Luxembourg
What is the capital of Monaco ? Monaco
What is the capital of Andorra ? Andorra la Vella
What is the capital of San Marino ? San Marino
What is the capital of Vatican City ? Vatican City
What is the capital of Gibraltar ? Gibraltar
What is the capital of Faroe Islands ? Tórshavn
What is the capital of Greenland ? Nuuk
What is the capital of Guernsey ? St Peter Port
What is the capital of Jersey ? St Helier
What is the capital of Isle of Man ? Douglas
What is the capital of Northern Ireland ? Belfast
What is the capital of Scotland ? Edinburgh
What is the capital of Wales ? Cardiff
What is the capital of England ? London

with the vLLM log:

INFO:     127.0.0.1:53452 - "POST /v1/completions HTTP/1.1" 200 OK
INFO 01-20 18:20:41 metrics.py:467] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 6.3 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
INFO 01-20 18:20:51 metrics.py:467] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.

The vLLM docs clearly state (https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html):
Supported APIs
We currently support the following OpenAI APIs:

Completions API (/v1/completions)

Only applicable to text generation models (--task generate).

Note: suffix parameter is not supported.

Chat Completions API (/v1/chat/completions)

Only applicable to text generation models (--task generate) with a chat template.

Note: parallel_tool_calls and user parameters are ignored.

Embeddings API (/v1/embeddings)

Only applicable to embedding models (--task embed).

May I know whether I am making a mistake, or whether this is a bug?

FYI, the following code also generates meaningless results:

print(llm.invoke("What is the capital of France ?"))
prompt = ChatPromptTemplate([
    ("system", "you are a helpful assistant."),
    ("human", "What is the capital of France ? answer in one word only"),
    ("ai", "Paris"),
    ("human", "What is the capital of {country} ? answer in one word only"),
])

chain = prompt | llm
temp1 = chain.invoke({"country": "Japan"})
print(temp1)
temp = chain.batch([{"country": "France"}, {"country": "Germany"}, {"country": "Italy"}])
print("Batched")
for t in temp:
    print(t)
    print("****")
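For what it's worth, my understanding (an assumption, not verified against LangChain internals) is that when a chat prompt is piped into a completions-style LLM like `VLLMOpenAI`, the messages are flattened into a plain transcript along these lines, again with no chat template applied:

```python
messages = [
    ("System", "you are a helpful assistant."),
    ("Human", "What is the capital of France ? answer in one word only"),
    ("AI", "Paris"),
    ("Human", "What is the capital of Japan ? answer in one word only"),
]
# Roughly what /v1/completions receives as its raw prompt string:
flattened = "\n".join(f"{role}: {content}" for role, content in messages)
print(flattened)
```

A raw transcript like this gives the model nothing to anchor on, which would explain the free-associating output above.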

Originally posted by @to-sora in #29309

to-sora changed the title from "I am using vllm and want to use batch process." to "VLLMOpenAI api issues" on Jan 20, 2025

to-sora commented Jan 21, 2025

Forgot to mark this as a bug.
