-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(all): implement slurm_ops
charm library
#35
feat(all): implement slurm_ops
charm library
#35
Conversation
…ipulation These submodules are created underneath `utils` since `slurm_ops` is specific to Slurm rather than including customizations required specifically by the slurmd operator. The `machine` module is a wrapper around the `slurmd -C` command which retrieves compute node information required by slurmctld, the `nhc` module is a wrapper around the NHC installation + configuration logic, and `service` includes the systemd customizations required to modify ulimits + support `juju_systemd_notices`. Signed-off-by: Jason C. Nucciarone <[email protected]>
`slurm_ops` provides the neccessary methods to manage slurmd without needing to carry a `slurmd_ops` module that copies methods from several other custom ops managers. Signed-off-by: Jason C. Nucciarone <[email protected]>
Signed-off-by: Jason C. Nucciarone <[email protected]>
Signed-off-by: Jason C. Nucciarone <[email protected]>
Signed-off-by: Jason C. Nucciarone <[email protected]>
Signed-off-by: Jason C. Nucciarone <[email protected]>
Signed-off-by: Jason C. Nucciarone <[email protected]>
Signed-off-by: Jason C. Nucciarone <[email protected]>
Signed-off-by: Jason C. Nucciarone <[email protected]>
Signed-off-by: Jason C. Nucciarone <[email protected]>
Signed-off-by: Jason C. Nucciarone <[email protected]>
Signed-off-by: Jason C. Nucciarone <[email protected]>
Signed-off-by: Jason C. Nucciarone <[email protected]>
Signed-off-by: Jason C. Nucciarone <[email protected]>
Signed-off-by: Jason C. Nucciarone <[email protected]>
Signed-off-by: Jason C. Nucciarone <[email protected]>
Signed-off-by: Jason C. Nucciarone <[email protected]>
`ConfigData.get(...)` from `ops` returns a looser type than what's allowed by pyright if we're passing the result off to `nhc.generate_config(...)`, so we use `typing.cast(str, ...)` to ensure that the type of the output is just str. Other changes: - Removed stale TODO comment from development code. Signed-off-by: Jason C. Nucciarone <[email protected]>
c1d0d0b
to
648e032
Compare
The `rustc` compiler is required by the cryptography Python package since all charm dependencies are pulled as source packages rather than as pre-built wheels. The parts declarations will not be required when `jwtctl` is implemented, but until then, the `rustc` compiler is required to successfully pack the Slurm charms. Signed-off-by: Jason C. Nucciarone <[email protected]>
Hmm... seems like the CI hit a timeout. Will try running locally to see what the issues are |
Upon further investigation, it seems like the current issue with the CI is the time it takes to build the |
This looks awesome! Nice work! |
Signed-off-by: Jason C. Nucciarone <[email protected]>
Installation would fail originally because lbnl-* would be the top-level directory under `/tmp` so `subprocess.check_output(...)` could not find the `autogen.sh` file we needed to build NHC on the node. Signed-off-by: Jason C. Nucciarone <[email protected]>
Modifies `interface_slurmd` to not add an empty 'Nodes=' field to the Partition info if no compute nodes are enlisted yet. `scontrol reconfigure` will fail as Slurm cannot process an empty 'Nodes=' field. Signed-off-by: Jason C. Nucciarone <[email protected]>
Ignoring an empty 'Nodes=' field in the Partition configuration by checking the length of configured `partition_nodes` to be greater than 0, the cyclomatic complexity of `interface_slurmd.Slurmd.get_new_nodes_and_nodes_and_partitions` becomes 11 which cause the lint job to fail. This method should eventually be reduced if possible, but setting the max cyclomatic complexity to 15 works as an effective stop-gap. Signed-off-by: Jason C. Nucciarone <[email protected]>
Signed-off-by: Jason C. Nucciarone <[email protected]>
Signed-off-by: Jason C. Nucciarone <[email protected]>
`pyright` doesn't like the dynamic attributes provided by the `slurmutils` config models in the `slurmdbd` charm, so we'll need to enable type checking over on that project. For the `slurmctld` charm, we just needed to correct the hint for `assemble_slurmctld_parameters` to be `dict[str, Any]` rather than string which is returned for the slurm.conf editor. Signed-off-by: Jason C. Nucciarone <[email protected]>
3680b20
to
14b32f0
Compare
@jedel1043 only took ~45 minutes for the integration tests to run, but this PR is R4R 🥳 Still looking into |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Impressive work! I love the line diff; +1766 - 2422 = -656 less lines, which is always great.
Prematurely called the debug message that MUNGE was restarted correctly before actually restarting MUNGE on the machine. Signed-off-by: Jason C. Nucciarone <[email protected]>
…es not exist Signed-off-by: Jason C. Nucciarone <[email protected]>
Changes: * slurmctld: create gh issue for enabling Juju Secrets. * slurmd: remove stale TODO comment from development work. Signed-off-by: Jason C. Nucciarone <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's everything addressed - LGTM 👍
This PR implements the
slurm_ops
andis_container
charm libraries to provide the Slurm operations logic to all the Slurm charms.Key features are that there's now one standard interface for interacting with the Slurm services, and we have an easy way to drive munge, jwt tokens, prometheus, etc, without requiring duplication of logic between charms.
slurm_ops
also provides back-end implementations for both the Slurm deb and Slurm snap so that we can easily switch between the chosen format.Related issues
Misc.
I came across a few issues that were fixed as part of the work to integrate
slurm_ops
into the charms:SlurmctldParameters
was being overwritten byCHARM_MAINTAINED_SLURM_CONF_PARAMETERS
when writing out theslurm.conf
file.