deploy gradio app for llama2 on inf2/ray to k8s #495

Open
harishvs opened this issue Apr 8, 2024 · 0 comments
Labels
enhancement New feature or request

harishvs commented Apr 8, 2024

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

What is the outcome that you are trying to reach?

Right now, we don't have a YAML file to deploy the Gradio app for Llama2 inference to k8s. We run it locally on the user's laptop, which can be very slow.

Describe the solution you would like

Deploy the Gradio app for Llama2 on inf2/Ray to k8s, and update the documentation at https://awslabs.github.io/data-on-eks/docs/gen-ai/inference/Llama2 to reflect that.
This should speed up end-to-end response time significantly.
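
To make the request concrete, here is a rough sketch of what the manifest could contain, written as a small Python script that emits JSON (which `kubectl apply` accepts in place of YAML). Everything specific in it is an assumption for illustration, not something taken from the repo: the resource name, namespace, container image, the `SERVICE_NAME` variable, and the backend URL are all hypothetical placeholders. Only port 7860 (Gradio's default) and `GRADIO_SERVER_NAME` (a Gradio environment variable) are standard.

```python
import json


def build_gradio_manifests(
    name="gradio-llama2",                      # hypothetical resource name
    namespace="gradio-llama2",                 # hypothetical namespace (must already exist)
    image="example.com/gradio-llama2:latest",  # placeholder image, not a real one
    backend_url="http://llama2-serve.llama2.svc.cluster.local:8000",  # hypothetical Ray Serve URL
    port=7860,                                 # Gradio's default port
):
    """Return a Kubernetes List holding a Deployment and a Service for the Gradio UI."""
    labels = {"app": name}
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "namespace": namespace, "labels": labels},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": "gradio",
                            "image": image,
                            "ports": [{"containerPort": port}],
                            "env": [
                                # Bind to all interfaces so the Service can reach the pod.
                                {"name": "GRADIO_SERVER_NAME", "value": "0.0.0.0"},
                                # Hypothetical variable the app would read to find the backend.
                                {"name": "SERVICE_NAME", "value": backend_url},
                            ],
                        }
                    ]
                },
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "type": "ClusterIP",
            "selector": labels,
            "ports": [{"port": port, "targetPort": port}],
        },
    }
    return {"apiVersion": "v1", "kind": "List", "items": [deployment, service]}


if __name__ == "__main__":
    print(json.dumps(build_gradio_manifests(), indent=2))
```

If saved as, say, `gen_manifest.py` (a hypothetical filename), the output could be piped to `kubectl apply -f -`. The actual manifest should probably follow whatever the Stable Diffusion example does for naming, namespaces, and exposure.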

Describe alternatives you have considered

None

Additional context

The Stable Diffusion model example already does this, so we can copy the pattern from that.

@vara-bonthu vara-bonthu added the enhancement New feature or request label Apr 14, 2024