
Fix the resume action in the slurmctld charm #34

Merged


matheushent
Contributor

This PR fixes the resume action in the slurmctld charm. The original error can be seen at this link.

Essentially, the command to resume the nodes has been patched from `scontrol update nodename=<names> state=resume` to `scontrol update nodename=<names> state=idle`.
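To make the change concrete, here is a minimal sketch of how an action might build the patched command as an argv list. The helper name `build_resume_command` is hypothetical and is not taken from the charm's source; it only illustrates the `state=resume` to `state=idle` switch described above.

```python
"""Hypothetical sketch: build the scontrol argv used by the resume action."""

from typing import List, Sequence


def build_resume_command(nodes: Sequence[str]) -> List[str]:
    """Return the argv that returns `nodes` to service with state=idle.

    The pre-fix form used state=resume; the patched form uses state=idle.
    """
    if not nodes:
        raise ValueError("at least one node name is required")
    # scontrol accepts a comma-separated list of node names for nodename=.
    return ["scontrol", "update", f"nodename={','.join(nodes)}", "state=idle"]


if __name__ == "__main__":
    print(" ".join(build_resume_command(["compute-0", "compute-1"])))
    # scontrol update nodename=compute-0,compute-1 state=idle
```

Building an argv list instead of a shell string lets the caller pass it straight to `subprocess.run` without a shell, so node names are never reinterpreted by shell quoting.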

As I understand Slurm's documentation, this PR introduces no operational change.

This commit modifies the command run by the *resume* action. The previous command
set the state argument to *resume*, which led to the error *slurm_update error: Invalid node state specified*.
@NucciTheBoss NucciTheBoss self-requested a review November 12, 2024 14:03
@NucciTheBoss NucciTheBoss added the bug Something isn't working label Nov 12, 2024
@NucciTheBoss
Member

Hey @matheushent 👋

Thanks for this PR! I'm currently doing some work over on #35 that will simplify how we interface with Slurm. Once we have that landed, I'll get this PR reviewed 😄

@matheushent matheushent changed the title F the resume action in the slurmctld charm Fix the resume action in the slurmctld charm Nov 13, 2024
Member

@NucciTheBoss NucciTheBoss left a comment


LGTM 👍

Eventually we'll roll this into the `scontrol` method we've added to `slurm_ops`, since all calls there will be streamed to the Juju debug log, but it'll be good to have the resume action up and running in the meantime 🤩

Here's the scontrol method from slurm_ops: https://github.com/charmed-hpc/hpc-libs/blob/eaa1978d2c51f22e6bffd6a1cea56e54b67534d4/lib/charms/hpc_libs/v0/slurm_ops.py#L901-L908
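As a rough illustration of the idea of routing every call through one logged wrapper, the sketch below is a hypothetical `scontrol` helper. It is not the `slurm_ops` implementation linked above; it assumes that the charm's Python `logging` output is forwarded to `juju debug-log`, which the ops framework does by default. The `runner` parameter is an illustrative addition so the function can be exercised without a real `scontrol` binary.

```python
"""Hypothetical sketch of a logged scontrol wrapper (not the slurm_ops API)."""

import logging
import subprocess
from typing import Callable

logger = logging.getLogger(__name__)


def scontrol(
    *args: str,
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> str:
    """Run `scontrol <args>`, log the call and its output, and return stdout.

    Raises subprocess.CalledProcessError if scontrol exits non-zero.
    """
    cmd = ["scontrol", *args]
    logger.debug("running %s", cmd)
    # check=True makes a non-zero exit raise CalledProcessError, with
    # scontrol's stderr available on the exception.
    result = runner(cmd, capture_output=True, text=True, check=True)
    logger.debug("scontrol output: %s", result.stdout)
    return result.stdout
```

With a wrapper like this, an action handler could catch `subprocess.CalledProcessError` and pass `e.stderr` (for example, the `slurm_update error: Invalid node state specified` message) to `event.fail()`, so the failure reaches the operator instead of only the logs.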

@NucciTheBoss NucciTheBoss merged commit c5e0f6e into charmed-hpc:main Nov 18, 2024
5 checks passed
@matheushent matheushent deleted the slurmctld/fix-resume-action branch November 18, 2024 23:32
Labels
bug Something isn't working
Development

Successfully merging this pull request may close these issues.

resume action does not work due to 'resume' not being a valid state for a node.
2 participants