Utilities and APIs for interfacing with the Slurm workload manager.
slurmutils is a collection of various utilities that make it easier for you and your friends to interface with the Slurm workload manager, especially if you are orchestrating deployments of new and current Slurm clusters. Gone are the days of seething over incomplete Jinja2 templates. Current utilities shipped in the slurmutils package include:
calculate_rs
: A function for calculating the ranges and strides of an iterable with unique elements. This function can help convert arrays of node hostnames, device file IDs, etc. into a Slurm hostname specification.
acctgatherconfig
: An editor for acct_gather.conf configuration files.
cgroupconfig
: An editor for cgroup.conf configuration files.
gresconfig
: An editor for gres.conf configuration files.
slurmconfig
: An editor for slurm.conf configuration files.
slurmdbdconfig
: An editor for slurmdbd.conf configuration files.
For more information on how to use or contribute to slurmutils, check out the Getting Started and Development sections below 👇
$ python3 -m pip install slurmutils
We use the Poetry packaging and dependency manager to manage this project. It must be installed on your system if installing slurmutils from source.
$ git clone https://github.com/canonical/slurmutils.git
$ cd slurmutils
$ poetry install
The top-level module provides access to utilities that streamline common Slurm-related operations, such as calculating the ranges and strides for a Slurm hostname specification. Here are some example operations you can perform with these utilities:
from os.path import commonprefix
from slurmutils import calculate_rs
nodes = ["juju-abc654-1", "juju-abc654-2", "juju-abc654-4"]
prefix = commonprefix(nodes)
nums = [int(n.partition(prefix)[2]) for n in nodes]
slurm_host_spec = prefix + calculate_rs(nums) # "juju-abc654-[1-2,4]"
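import re
from pathlib import Path
from slurmutils import calculate_rs
# Keep only numbered device files such as /dev/nvidia0; skip entries like /dev/nvidiactl.
device_files = [file for file in Path("/dev").iterdir() if re.fullmatch(r"nvidia\d+", file.name)]
prefix = "/dev/nvidia"
# Path objects have no partition() method, so convert each one to a string first.
nums = [int(str(n).partition(prefix)[2]) for n in device_files]
file_spec = prefix + calculate_rs(nums) # e.g. "/dev/nvidia[0-4]"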
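To make the bracket notation in these examples concrete, here is a minimal sketch of how a list of unique integers can be folded into a range expression. It is not the implementation behind calculate_rs: the name `fold_ranges` is hypothetical, and the sketch handles only contiguous ranges, not strides.

```python
def fold_ranges(nums):
    """Fold unique integers into a bracketed range expression.

    Hypothetical helper for illustration only; not part of slurmutils.
    """
    ordered = sorted(set(nums))
    if not ordered:
        raise ValueError("cannot fold an empty collection")

    parts = []
    start = prev = ordered[0]
    for n in ordered[1:]:
        if n == prev + 1:
            # Still inside a contiguous run; extend it.
            prev = n
            continue
        # The run ended; record it as "a" or "a-b" and start a new one.
        parts.append(str(start) if start == prev else f"{start}-{prev}")
        start = prev = n
    # Record the final run.
    parts.append(str(start) if start == prev else f"{start}-{prev}")
    return "[" + ",".join(parts) + "]"


print("juju-abc654-" + fold_ranges([1, 2, 4]))  # juju-abc654-[1-2,4]
```

Sorting first means the input order does not matter, which is convenient when the numbers come from a directory listing.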
This module provides an API for editing configuration files and for creating them if they do not exist. Here are some operations you can perform on files using the editors in this module:
from slurmutils.editors import acctgatherconfig

with acctgatherconfig.edit("/etc/slurm/acct_gather.conf") as config:
    config.profile_influx_db_database = "test_acct_gather_db"
    config.profile_influx_db_default = ["NONE"]
    config.profile_influx_db_host = "testhostname1"
    config.profile_influx_db_pass = "testpassword1"
    config.profile_influx_dbrt_policy = "testpolicy1"
    config.profile_influx_db_user = "testuser1"
    config.profile_influx_db_timeout = "20"
from slurmutils.editors import cgroupconfig

with cgroupconfig.edit("/etc/slurm/cgroup.conf") as config:
    config.constrain_cores = "yes"
    config.constrain_devices = "yes"
    config.constrain_ram_space = "yes"
    config.constrain_swap_space = "yes"
from slurmutils.editors import gresconfig
from slurmutils.models import GRESName, GRESNode

with gresconfig.edit("/etc/slurm/gres.conf") as config:
    new_gres = GRESName(
        Name="gpu",
        Type="epyc",
        File="/dev/amd4",
        Cores=["0", "1"],
    )
    new_node = GRESNode(
        NodeName="juju-abc654-[1-20]",
        Name="gpu",
        Type="epyc",
        File="/dev/amd[0-3]",
        Count="12G",
    )
    config.auto_detect = "rsmi"
    config.names[new_gres.name] = [new_gres]
    config.nodes[new_node.node_name] = [new_node]
from slurmutils.editors import slurmconfig

# Open, edit, and save the slurm.conf file located at /etc/slurm/slurm.conf.
with slurmconfig.edit("/etc/slurm/slurm.conf") as config:
    del config.inactive_limit
    config.max_job_count = 20000
    config.proctrack_type = "proctrack/linuxproc"
from slurmutils.editors import slurmconfig
from slurmutils.models import Node

with slurmconfig.edit("/etc/slurm/slurm.conf") as config:
    node = Node(
        NodeName="batch-[0-25]",
        NodeAddr="12.34.56.78",
        CPUs=1,
        RealMemory=1000,
        TmpDisk=10000,
    )
    config.nodes.update(node.dict())
from slurmutils.editors import slurmdbdconfig

with slurmdbdconfig.edit("/etc/slurm/slurmdbd.conf") as config:
    config.archive_usage = "yes"
    config.log_file = "/var/spool/slurmdbd.log"
    config.debug_flags = ["DB_EVENT", "DB_JOB", "DB_USAGE"]
    del config.auth_alt_types
    del config.auth_alt_parameters
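Each editor shown here follows the same pattern: open a file, modify the loaded configuration, and save on exit. The sketch below illustrates that general pattern using only the standard library. It is an assumption about the shape of such an API, not slurmutils' actual code: `edit_key_value_file` is a hypothetical name, and the flat `key=value` format is a simplification of real Slurm configuration files.

```python
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def edit_key_value_file(path):
    """Load key=value lines into a dict, yield it, and write it back on exit.

    Hypothetical stand-in for an editor; not part of slurmutils.
    """
    file = Path(path)
    config = {}
    if file.exists():
        for line in file.read_text().splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # Skip blank lines and comments.
            key, _, value = line.partition("=")
            config[key.strip()] = value.strip()

    # Hand the in-memory configuration to the body of the `with` block.
    yield config

    # Reached only if the body finished without raising, so a failed
    # edit never overwrites the file on disk.
    file.write_text("".join(f"{k}={v}\n" for k, v in config.items()))
```

Because the file is written only after the `with` body completes, a missing file is simply created on the first successful edit, and an exception inside the block leaves the existing file untouched.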
If you want to learn more about all the things you can do with slurmutils, here are some further resources for you to explore:
This project uses tox as its command runner, which provides some useful commands that will help you while hacking on slurmutils:
tox run -e fmt # Apply formatting standards to code.
tox run -e lint # Check code against coding style standards.
tox run -e unit # Run unit tests.
If you're interested in contributing your work to slurmutils, take a look at our contributing guidelines for further details.
slurmutils is a project of the Ubuntu High-Performance Computing community. Interested in contributing bug fixes, new editors, documentation, or feedback? Want to join the Ubuntu HPC community? You’ve come to the right place 🤩
Here are some links to help you get started with joining the community:
- Ubuntu Code of Conduct
- Contributing guidelines
- Join the conversation on Matrix
- Get the latest news on Discourse
- Ask and answer questions on GitHub
slurmutils is free software, distributed under the GNU Lesser General Public License, v3.0. See the LGPL-3.0 LICENSE file for further details.