Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request.
Please do not leave "+1" or other comments that do not add relevant new information or questions. They generate extra noise for issue followers and do not help prioritize the request.
If you are interested in working on this issue or have submitted a pull request, please leave a comment.
What is the outcome that you are trying to reach?
Run CPU inference on EKS with Graviton instances. Ray Serve with llama.cpp can help with that.
Describe the solution you would like
Describe alternatives you have considered
Additional context
This blueprint will include a Ray deployment of llama.cpp for model inference, a script for quantizing the model and rearranging the model weights, and a benchmark script.
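To make the benchmark part of the proposal concrete, here is a minimal sketch of what a timing harness could look like. It is not the blueprint's actual script: the names `benchmark` and `fake_generate` are hypothetical, and the harness assumes the caller supplies a function that sends one prompt to the deployed model and returns the number of tokens generated.

```python
import statistics
import time
from typing import Callable, Dict, List


def benchmark(generate: Callable[[str], int], prompts: List[str]) -> Dict[str, float]:
    """Call `generate` once per prompt, sequentially, and summarize the timings.

    `generate` is an assumed interface: it takes a prompt and returns the
    number of tokens the model produced for it.
    """
    if not prompts:
        raise ValueError("at least one prompt is required")

    latencies = []
    total_tokens = 0
    start = time.perf_counter()
    for prompt in prompts:
        t0 = time.perf_counter()
        total_tokens += generate(prompt)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start

    return {
        "requests": len(prompts),
        "total_tokens": total_tokens,
        "mean_latency_s": statistics.mean(latencies),
        "max_latency_s": max(latencies),
        # Overall throughput across the whole run, not per request.
        "tokens_per_s": total_tokens / elapsed if elapsed > 0 else 0.0,
    }


def fake_generate(prompt: str) -> int:
    # Stand-in for a real model call: pretends one token per word in the prompt.
    return len(prompt.split())


if __name__ == "__main__":
    print(benchmark(fake_generate, ["hello Graviton", "run llama.cpp on CPU"]))
```

In a real run, `fake_generate` would be replaced by a function that sends the prompt to the Ray Serve endpoint and returns the completion token count. Running the same prompts before and after quantization would then give a like-for-like comparison.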