Our current howto for deploying Slurm takes folks through deploying their own Slurm cluster with Juju, but it does not mention that the compute nodes are initially in the down state and that the node-configured action must be executed for the compute nodes to become active:
juju run slurmd/<unit-number> node-configured --background
If we don't tell folks about this step, it will cause confusion when they attempt to run their first job on the cluster, as they will think that something went wrong when they deployed the workload manager. To prevent this confusion, we must add a howto that comes after the "Deploy workload manager" documentation and informs folks how to change the state of new compute nodes from down to active.
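As a possible starting point for that howto, here is a minimal sketch that runs the action above against several compute nodes at once. The function name `mark_nodes_configured` and the `JUJU_CMD` override are hypothetical and only for illustration; the only thing taken from the charm is the `node-configured` action invocation shown above.

```shell
# Hypothetical helper (not part of Juju or the Slurm charms).
# Runs the node-configured action for each slurmd unit number passed in.
# JUJU_CMD is an assumed override so the commands can be previewed,
# e.g. JUJU_CMD="echo juju" prints them instead of running them.
mark_nodes_configured() {
  for unit_number in "$@"; do
    # Left unquoted on purpose so "echo juju" splits into two words.
    ${JUJU_CMD:-juju} run "slurmd/${unit_number}" node-configured --background
  done
}
```

Calling `mark_nodes_configured 0 1 2` would then issue the action for slurmd/0, slurmd/1, and slurmd/2. The unit numbers are placeholders; the real ones depend on the deployment.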
Misc.
We should also add explanation documentation that justifies why compute nodes are initially brought up as down and not active (e.g. compute nodes need additional configuration before they're ready to start running user workloads).
NucciTheBoss changed the title from "[Enhancement]: Add howto section for marking compute nodes active after deployment" to "Add howto section for marking compute nodes active after deployment" on Nov 25, 2024