deploy gradio app for llama2 on inf2/ray to k8s #495

harishvs · 2024-04-08T05:21:47Z

Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
If you are interested in working on this issue or have submitted a pull request, please leave a comment

Right now, we don't have a YAML file to deploy the Gradio app for Llama 2 inference to k8s. We run it locally on the user's laptop, which can be very slow.

Deploy the Gradio app for Llama 2 on Inf2/Ray to k8s, and update the documentation at https://awslabs.github.io/data-on-eks/docs/gen-ai/inference/Llama2 to reflect that.
This should speed up end-to-end response time significantly.
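
As a rough illustration of what such a deployment could look like, here is a minimal sketch that builds a Deployment and a ClusterIP Service for the Gradio app and prints them as JSON, which `kubectl apply` accepts alongside YAML. The app name, container image, `MODEL_ENDPOINT` variable, and Ray Serve URL are all hypothetical placeholders, not values taken from the repository. Port 7860 is Gradio's default, and `GRADIO_SERVER_NAME=0.0.0.0` makes Gradio listen on all interfaces inside the container.

```python
import json

# Hypothetical placeholders: none of these values come from the repository.
APP_NAME = "gradio-llama2"
IMAGE = "example.com/gradio-llama2:latest"
MODEL_ENDPOINT = "http://llama2-serve.llama2.svc.cluster.local:8000"
GRADIO_PORT = 7860  # Gradio's default listening port


def build_manifests(app_name=APP_NAME, image=IMAGE,
                    endpoint=MODEL_ENDPOINT, port=GRADIO_PORT):
    """Return a v1 List holding a Deployment and a Service for the Gradio UI."""
    labels = {"app": app_name}
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": app_name, "labels": labels},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": app_name,
                        "image": image,
                        "ports": [{"containerPort": port}],
                        "env": [
                            # Hypothetical variable the app would read to find
                            # the in-cluster inference endpoint.
                            {"name": "MODEL_ENDPOINT", "value": endpoint},
                            # Bind Gradio to all interfaces in the container.
                            {"name": "GRADIO_SERVER_NAME", "value": "0.0.0.0"},
                        ],
                    }],
                },
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": app_name, "labels": labels},
        "spec": {
            "type": "ClusterIP",
            "selector": labels,
            "ports": [{"port": port, "targetPort": port}],
        },
    }
    return {"apiVersion": "v1", "kind": "List", "items": [deployment, service]}


if __name__ == "__main__":
    print(json.dumps(build_manifests(), indent=2))
```

Assuming the script is saved as `gen_manifest.py` (a hypothetical name), the output could be applied with `python gen_manifest.py | kubectl apply -f -` and the UI reached with `kubectl port-forward svc/gradio-llama2 7860:7860`. Running the UI in the cluster keeps the call to the model endpoint on the cluster network instead of going through the user's laptop.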

None

The Stable Diffusion model deployment already does this, so the same pattern can be copied from it.

vara-bonthu added the enhancement (New feature or request) label on Apr 14, 2024
