[PROPOSAL] Apply GPU acceleration to ML Node in k8s operator #906
Comments
[Triage] Coming from the operator side: yes, this is a limitation; a custom image cannot be set at the nodePool level. Regarding GPU acceleration - OpenSearch Documentation, adding @dblock to provide some guidelines. Thank you |
My thoughts: |
Adding @vamshin to provide some thoughts on CUDA OpenSearch development. More than the operator, I feel this issue should be part of the ml-commons repo, to continue the discussion related to GPU acceleration - OpenSearch Documentation. |
@YeonghyeonKO could you please help us with use cases for GPU with ml nodes? |
@prudhvigodithi Thanks for continuing the discussion. I totally agree with you that building a CUDA image takes priority over the nodePools[] change. |
@vamshin Sure, I would happily help as a tester for deploying a CUDA image to hosts (k8s worker nodes) where the NVIDIA toolkit has been installed. |
@vamshin
ML nodes will be deployed on this GPU worker node. I have already tested deploying them via opensearch-k8s-operator using the existing OpenSearch Docker image instead of a CUDA-based image. To avoid interference from nodes with other roles, the nodeSelector and tolerations properties are used:

    nodePools:
      - component: ml
        replicas: 3
        nodeSelector:
          gpu: "true"
        tolerations:
          - key: node.kubernetes.io/unschedulable
            operator: "Exists"
            effect: "NoSchedule"

What are you proposing?
As @stevapple has already suggested in #832 how to override the CUDA image in nodePools[] in the opensearch k8s operator, this issue is an extended discussion of that concept.
nodeSelector property as below: But this is all for now, because we can't set a GPU-enabled image for opensearch except through opensearchCluster.general.image in yaml. Is there any further progress on this idea?
To implement GPU acceleration - OpenSearch Documentation, we need guidelines and an exact standard for it.
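
To make the current workaround concrete, here is a rough sketch of a cluster manifest that sets a GPU-enabled image through the cluster-wide general.image field and pins the ml node pool to GPU workers. The cluster name, version, image reference, and disk size are hypothetical placeholders, and the nvidia.com/gpu limit assumes the NVIDIA device plugin is running on the GPU workers. This is not a tested configuration, and other node pools (such as cluster managers) are left out for brevity.

```yaml
# Sketch only, under the assumptions stated above.
apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  name: gpu-ml-cluster                # hypothetical cluster name
spec:
  general:
    serviceName: gpu-ml-cluster       # hypothetical service name
    version: 2.11.0                   # placeholder version
    # Hypothetical CUDA-based image; this field is cluster-wide, so every
    # node pool runs this image, not only the ml pool.
    image: registry.example.com/opensearch-cuda:custom
  nodePools:
    - component: ml
      replicas: 3
      diskSize: "30Gi"                # placeholder size
      roles:
        - "ml"
      resources:
        limits:
          nvidia.com/gpu: 1           # assumes the NVIDIA device plugin is installed
      nodeSelector:
        gpu: "true"                   # label taken from the comment thread
      tolerations:
        - key: node.kubernetes.io/unschedulable
          operator: "Exists"
          effect: "NoSchedule"
```

Because the image is set once for the whole cluster, non-ML pools would also pull the CUDA image, which is the limitation a per-nodePool image override would remove.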