[FEATURE] Allow overriding image spec for node group #832

stevapple · 2024-05-31T01:46:25Z

Is your feature request related to a problem?

We’re running an OpenSearch cluster with dedicated ML nodes supported by CUDA. These nodes run with GPU-enabled container runtimes and require a specific image with CUDA built in, while other nodes should use the original images, which are smaller and runtime-agnostic.

What solution would you like?

Add an optional image property to node spec that overrides the Pod spec template of the resulting StatefulSet.
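A minimal Go sketch of how such an override could resolve, offered only as an illustration. The `NodePool` type here is a simplified stand-in, not the operator's actual struct. The `Image` field, the `resolveImage` helper, and both image names are hypothetical.

```go
package main

import "fmt"

// NodePool is a simplified, hypothetical stand-in for the operator's
// node pool spec. Only the fields needed for this sketch are shown.
type NodePool struct {
	Component string
	// Image is the proposed optional override (assumed name).
	// An empty value means "use the cluster-wide image".
	Image string
}

// resolveImage returns the per-pool image when one is set and falls
// back to the cluster-wide image otherwise.
func resolveImage(clusterImage string, pool NodePool) string {
	if pool.Image != "" {
		return pool.Image
	}
	return clusterImage
}

func main() {
	// Placeholder image names, used for illustration only.
	clusterImage := "opensearchproject/opensearch:2.14.0"
	pools := []NodePool{
		{Component: "data"},
		{Component: "ml", Image: "example.com/opensearch-cuda:2.14.0"},
	}
	for _, p := range pools {
		fmt.Printf("%s -> %s\n", p.Component, resolveImage(clusterImage, p))
	}
}
```

With this shape, pools that leave the field empty keep the smaller default image, so existing cluster definitions would not need to change.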

What alternatives have you considered?

If all ML nodes are CUDA-enabled, we can endure the larger images and just use the CUDA version for all nodes.

Do you have any additional context?

To use CUDA with a specific runtime, we also need the ability to set the runtimeClassName property in the Pod spec. This should be another small feature request.
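As a rough sketch of that second request, the Go snippet below sets a runtime class on a Pod spec only when a node pool asks for one. The local `PodSpec` type mirrors just one field of the real Kubernetes `PodSpec` (where `RuntimeClassName` is likewise a `*string`). The `applyRuntimeClass` helper and the `nvidia` class name are assumptions for illustration.

```go
package main

import "fmt"

// PodSpec mirrors only the single field needed for this sketch.
// The real type lives in k8s.io/api/core/v1.
type PodSpec struct {
	RuntimeClassName *string
}

// applyRuntimeClass sets the runtime class only when the (hypothetical)
// node pool setting is non-empty, leaving the spec untouched otherwise.
func applyRuntimeClass(spec *PodSpec, runtimeClassName string) {
	if runtimeClassName == "" {
		return
	}
	spec.RuntimeClassName = &runtimeClassName
}

func main() {
	var ml, data PodSpec
	applyRuntimeClass(&ml, "nvidia") // placeholder RuntimeClass name
	applyRuntimeClass(&data, "")     // no override requested

	fmt.Println("ml runtime class set:", ml.RuntimeClassName != nil)
	fmt.Println("data runtime class set:", data.RuntimeClassName != nil)
}
```

Leaving the pointer nil when nothing is configured keeps the default container runtime for pools that do not need a GPU.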

prudhvigodithi · 2024-06-20T19:57:01Z

[Triage]
Hey @stevapple, as of today a custom image at the node pool level (spec.nodePools[0].image) is not supported in the NodePool struct. Also, just curious: CUDA built-in OpenSearch images are not officially released by the project today (going by the issue opensearch-project/opensearch-build#4743 that you created :) ). Do you have a custom-built image for this purpose?
Also, if you are open to it, could you please contribute the feature to allow overriding the image spec for a node group?

Thank you
@getsaurabh02 @swoehrl-mw @rishabh6788 @peterzhuamazon

stevapple added the enhancement and untriaged labels May 31, 2024

prudhvigodithi removed the untriaged label Jun 20, 2024

peterzhuamazon added this to Engineering Effectiveness Board Jul 11, 2024

github-project-automation bot moved this to 🆕 New in Engineering Effectiveness Board Jul 11, 2024

getsaurabh02 moved this from 🆕 New to Backlog in Engineering Effectiveness Board Jul 18, 2024

YeonghyeonKO mentioned this issue Nov 18, 2024

[PROPOSAL] Apply GPU acceleration to ML Node in k8s operator #906

Open
