2024-08-25T07:02:45.593647Z WARN text_generation_router: router/src/main.rs:372: Invalid hostname, defaulting to 0.0.0.0
2024-08-25T07:02:45.597333Z INFO text_generation_router::server: router/src/server.rs:1613: Warming up model
2024-08-25T07:02:45.597833Z DEBUG text_generation_launcher: Prefilling 1 new request(s) with 1 empty slot(s)
2024-08-25T07:02:45.598003Z DEBUG text_generation_launcher: Request 0 assigned to slot 0
2024-08-25T07:02:45.671381Z DEBUG text_generation_launcher: Model ready for decoding
2024-08-25T07:02:45.671501Z INFO text_generation_launcher: Removing slot 0 with request 0
2024-08-25T07:02:45.671737Z INFO text_generation_router::server: router/src/server.rs:1640: Using scheduler V2
2024-08-25T07:02:45.671750Z INFO text_generation_router::server: router/src/server.rs:1646: Setting max batch total tokens to 1024
2024-08-25T07:02:45.740855Z INFO text_generation_router::server: router/src/server.rs:1884: Connected
Expected behavior
The TGI server can be started normally.
cszhz changed the title from "Cannot host Llama-3-8B with optimum-neuron TGI container using optimum-neuron(0.0.24) and neuron-sdk(2.19.1)" to "Cannot host Llama-3-8B exported by optimum-neuron with TGI container using optimum-neuron(0.0.24) and neuron-sdk(2.19.1)" on Aug 25, 2024
@cszhz thank you for your feedback.
According to your traces, the server started normally. What do you mean when you say it hangs?
What do you get when you query its URL using CURL or the huggingface_hub inference client ?
Hi @dacorvo
I don't think the server started normally. The previous 0.0.21 image worked fine.
Here is the response from the Docker container (image: 763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-tgi-inference:2.1.2-optimum0.0.24-neuronx-py310-ubuntu22.04):
curl 127.0.0.1:8080/generate \
-X POST \
-d '{
"inputs":"What is Deep Learning?",
"parameters":{
"max_new_tokens":20
}
}' \
-H 'Content-Type: application/json'
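Since the huggingface_hub inference client was also suggested above, here is a minimal Python sketch of the same `/generate` request, using only the standard library. It assumes the container is listening on 127.0.0.1:8080 as in the curl example; `build_generate_payload` and `query_tgi` are illustrative helper names, not part of any library. If the server has hung, the request will block or time out rather than return JSON.

```python
import json
from urllib import request


def build_generate_payload(prompt: str, max_new_tokens: int) -> bytes:
    """Build the JSON body that TGI's /generate endpoint expects,
    matching the curl example above."""
    body = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    return json.dumps(body).encode("utf-8")


def query_tgi(base_url: str, prompt: str, max_new_tokens: int = 20) -> dict:
    """POST the prompt to a running TGI container and return the parsed reply."""
    req = request.Request(
        base_url + "/generate",
        data=build_generate_payload(prompt, max_new_tokens),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read())


# Example (only meaningful while the container is up and responsive):
# print(query_tgi("http://127.0.0.1:8080", "What is Deep Learning?"))
```

A healthy TGI server answers this with a JSON object containing the generated text; a hung server never responds, which distinguishes the two cases cleanly.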
System Info
Who can help?
Inference @dacorvo, @JingyaHuang
TGI @dacorvo
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)

Reproduction (minimal, reproducible, runnable)
I confirm optimum-neuron version 0.0.21 with Neuron 2.18.2 is working fine. After about 1 minute, the server hangs.