How does this compare to triton inference server? #403

Answered by WoosukKwon
jebarpg asked this question in Q&A

Hi @jebarpg, thanks for your interest and great question! NVIDIA Triton Inference Server is a serving system that provides high availability, observability, model versioning, etc. It needs to work with an inference engine ("backend") that actually runs the models on GPUs, such as vLLM, FasterTransformer, or PyTorch. Thus, while we haven't investigated it much, vLLM could potentially be used as a backend for Triton.

For now, Ray Serve provides an example of using vLLM as its backend. Please check it out.
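
For illustration, here is a minimal sketch (not the official Ray Serve example referenced above) of how vLLM's offline `LLM` API could be wrapped in a Ray Serve deployment. The model name, sampling settings, and route are placeholders chosen for the example.

```python
# Hypothetical sketch: serving vLLM behind Ray Serve.
from ray import serve
from starlette.requests import Request

from vllm import LLM, SamplingParams


@serve.deployment(ray_actor_options={"num_gpus": 1})
class VLLMDeployment:
    def __init__(self):
        # Each replica loads the model onto its GPU once.
        # "facebook/opt-125m" is only a placeholder model.
        self.llm = LLM(model="facebook/opt-125m")

    async def __call__(self, request: Request) -> dict:
        body = await request.json()
        params = SamplingParams(temperature=0.8, max_tokens=64)
        # vLLM batches and schedules requests internally.
        outputs = self.llm.generate([body["prompt"]], params)
        return {"text": outputs[0].outputs[0].text}


app = VLLMDeployment.bind()
# Start with `serve.run(app)`, then POST {"prompt": "..."} to the Serve HTTP endpoint.
```

In this setup Ray Serve handles routing, scaling, and HTTP, while vLLM acts purely as the inference engine, which mirrors the serving-system/backend split described above for Triton.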

This discussion was converted from issue #398 on July 08, 2023 18:36.