This module creates a compute partition that can be used as input to the schedmd-slurm-gcp-v5-controller.
The partition module is designed to work alongside the schedmd-slurm-gcp-v5-node-group module. A partition can be made up of one or more node groups, provided either through `use` (preferred) or defined manually in the `node_groups` variable.
Warning: updating a partition and running `terraform apply` will not cause the slurm controller to update its own configurations (`slurm.conf`) unless `enable_reconfigure` is set to true in the partition and controller modules.
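As a minimal sketch of the warning above, the blueprint fragment below sets `enable_reconfigure` on both modules. The controller module ID (`slurm_controller`), its source path, and its `use` list are assumptions for illustration and should be checked against the controller module's own README. `network1` and `node_group_1` refer to modules like the ones in the example that follows.

```yaml
- id: compute_partition
  source: community/modules/compute/schedmd-slurm-gcp-v5-partition
  use:
  - network1
  - node_group_1
  settings:
    partition_name: compute
    enable_reconfigure: true   # partition side

- id: slurm_controller   # hypothetical ID
  # Assumed source path; verify against the controller module documentation.
  source: community/modules/scheduler/schedmd-slurm-gcp-v5-controller
  use:
  - network1
  - compute_partition
  settings:
    enable_reconfigure: true   # controller side; must match the partition
```

Per the description of `enable_reconfigure` in the inputs of this module, it requires Python and the Google Pub/Sub API, and toggling it destroys deployed compute nodes and requeues their jobs.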
The following code snippet creates a partition module with:
- 2 node groups added via `use`.
  - The first node group is made up of machines of type `c2-standard-30`.
  - The second node group is made up of machines of type `c2-standard-60`.
  - Both node groups have a maximum count of 200 dynamically created nodes.
- partition name of "compute".
- connected to the `network1` module via `use`.
- nodes mounted to homefs via `use`.
- id: node_group_1
  source: community/modules/compute/schedmd-slurm-gcp-v5-node-group
  settings:
    name: c30
    node_count_dynamic_max: 200
    machine_type: c2-standard-30
- id: node_group_2
  source: community/modules/compute/schedmd-slurm-gcp-v5-node-group
  settings:
    name: c60
    node_count_dynamic_max: 200
    machine_type: c2-standard-60
- id: compute_partition
  source: community/modules/compute/schedmd-slurm-gcp-v5-partition
  use:
  - network1
  - homefs
  - node_group_1
  - node_group_2
  settings:
    partition_name: compute
For a complete example using this module, see slurm-gcp-v5-cluster.yaml.
WARNING: Lenient zone policies can lead to additional egress costs when moving data between Google Cloud resources in different zones in the same region, such as between filestore and other VM instances. For more information on egress fees, see the Network Pricing Google Cloud documentation.
To avoid egress charges, ensure your compute nodes are created in the same zone as the other resources that share data with them by setting `zone_policy_deny` to all other zones in the region.
The Slurm on GCP partition modules provide the option to set policies regarding which zone the compute VM instances will be created in through the `zone_policy_allow` and `zone_policy_deny` variables.
As an example, see the following module:
- id: partition-with-zone-policy
  source: community/modules/compute/schedmd-slurm-gcp-v5-partition
  settings:
    zone_policy_allow:
    - us-central1-a
    - us-central1-b
    zone_policy_deny: [us-central1-f]
In this module, the following is defined:
- `us-central1-a` and `us-central1-b` zones have been explicitly allowed.
- `us-central1-f` has been explicitly denied, therefore no nodes in this partition will be created in that zone.
- Since `us-central1-c` was not included in the zone policy, it will default to "Allow", which means the partition has the same likelihood of creating a node in that zone as the zones explicitly listed under `zone_policy_allow`.
NOTE: `zone_policy_allow` does not guarantee the use of specified zones because zones are allowed by default. Configure `zone_policy_deny` to ensure that zones outside the allowed list are not used.
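To illustrate the note above, here is a sketch of a partition restricted to a single zone by allowing it and denying every other zone. The module ID is hypothetical, and the deny list assumes that `us-central1` currently consists of zones a, b, c and f; check the active zones of your own region before copying it.

```yaml
- id: partition-single-zone   # hypothetical ID
  source: community/modules/compute/schedmd-slurm-gcp-v5-partition
  settings:
    partition_name: compute
    zone_policy_allow: [us-central1-a]
    # Any zone left out of both lists defaults to "Allow",
    # so every other zone in the region is denied explicitly.
    zone_policy_deny:
    - us-central1-b
    - us-central1-c
    - us-central1-f
```

Placing the data-sharing resources (for example a Filestore instance) in the same allowed zone is what avoids the cross-zone egress charges described earlier.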
The `zone` variable is another option for setting the zone policy. If `zone` is set and neither `zone_policy_deny` nor `zone_policy_allow` is set, the policy will be configured as follows:
- All currently active zones in the region at deploy time will be set in the `zone_policy_deny` list, with the exception of the provided `zone`.
- The provided `zone` will be set as the only value in the `zone_policy_allow` list.
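The two rules above can be sketched with a concrete zone. The module ID is hypothetical, and the commented-out policy assumes the deployment region is `us-central1` with zones a, b, c and f active at deploy time; the actual deny list depends on what the region reports when you deploy.

```yaml
- id: partition-with-zone   # hypothetical ID
  source: community/modules/compute/schedmd-slurm-gcp-v5-partition
  settings:
    partition_name: compute
    zone: us-central1-a
    # With neither zone_policy_allow nor zone_policy_deny set, the
    # resulting policy should be equivalent to (under the assumption above):
    #   zone_policy_allow: [us-central1-a]
    #   zone_policy_deny: [us-central1-b, us-central1-c, us-central1-f]
```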
`zone_policy_allow` and `zone_policy_deny` take precedence over `zone`: if either of them is set together with `zone`, the `zone` variable is ignored.
NOTE: If a new zone is added to the region while the cluster is active, nodes in the partition may be created in that zone as well. In this case, the partition may need to be redeployed (possible via `enable_reconfigure` if set) to ensure the newly added zone is set to "Deny".
The HPC Toolkit team maintains the wrapper around the slurm-on-gcp terraform modules. For support with the underlying modules, see the instructions in the slurm-gcp README.
Copyright 2022 Google LLC
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Name | Version |
---|---|
terraform | >= 0.13.0 |
google | >= 3.83 |
Name | Version |
---|---|
google | >= 3.83 |
Name | Source | Version |
---|---|---|
slurm_partition | github.com/SchedMD/slurm-gcp.git//terraform/slurm_cluster/modules/slurm_partition | 5.3.0 |
Name | Type |
---|---|
google_compute_zones.available | data source |
Name | Description | Type | Default | Required |
---|---|---|---|---|
additional_disks | Deprecated: Use the schedmd-slurm-gcp-v5-node-group module for defining node groups instead. | list(object({ | null | no |
bandwidth_tier | Deprecated: Use the schedmd-slurm-gcp-v5-node-group module for defining node groups instead. | string | null | no |
can_ip_forward | Deprecated: Use the schedmd-slurm-gcp-v5-node-group module for defining node groups instead. | bool | null | no |
deployment_name | Name of the deployment. | string | n/a | yes |
disable_smt | Deprecated: Use the schedmd-slurm-gcp-v5-node-group module for defining node groups instead. | bool | null | no |
disk_auto_delete | Deprecated: Use the schedmd-slurm-gcp-v5-node-group module for defining node groups instead. | bool | null | no |
disk_size_gb | Deprecated: Use the schedmd-slurm-gcp-v5-node-group module for defining node groups instead. | number | null | no |
disk_type | Deprecated: Use the schedmd-slurm-gcp-v5-node-group module for defining node groups instead. | string | null | no |
enable_confidential_vm | Deprecated: Use the schedmd-slurm-gcp-v5-node-group module for defining node groups instead. | bool | null | no |
enable_oslogin | Deprecated: Use the schedmd-slurm-gcp-v5-node-group module for defining node groups instead. | bool | null | no |
enable_placement | Enable placement groups. | bool | true | no |
enable_reconfigure | Enables automatic Slurm reconfigure when the Slurm configuration changes (e.g. slurm.conf.tpl, partition details). Compute instances and resource policies (e.g. placement groups) will be destroyed to align with the new configuration. NOTE: Requires Python and Google Pub/Sub API. WARNING: Toggling this will impact the running workload. Deployed compute nodes will be destroyed and their jobs will be requeued. | bool | false | no |
enable_shielded_vm | Deprecated: Use the schedmd-slurm-gcp-v5-node-group module for defining node groups instead. | bool | null | no |
enable_spot_vm | Deprecated: Use the schedmd-slurm-gcp-v5-node-group module for defining node groups instead. | bool | null | no |
exclusive | Exclusive job access to nodes. | bool | true | no |
gpu | Deprecated: Use the schedmd-slurm-gcp-v5-node-group module for defining node groups instead. | object({ | null | no |
is_default | Sets this partition as the default partition by updating the partition_conf. If "Default" is already set in partition_conf, this variable will have no effect. | bool | false | no |
labels | Deprecated: Use the schedmd-slurm-gcp-v5-node-group module for defining node groups instead. | any | null | no |
machine_type | Deprecated: Use the schedmd-slurm-gcp-v5-node-group module for defining node groups instead. | string | null | no |
metadata | Deprecated: Use the schedmd-slurm-gcp-v5-node-group module for defining node groups instead. | map(string) | null | no |
min_cpu_platform | Deprecated: Use the schedmd-slurm-gcp-v5-node-group module for defining node groups instead. | string | null | no |
network_storage | An array of network attached storage mounts to be configured on the partition compute nodes. | list(object({ | [] | no |
node_conf | Deprecated: Use the schedmd-slurm-gcp-v5-node-group module for defining node groups instead. | map(any) | null | no |
node_count_dynamic_max | Deprecated: Use the schedmd-slurm-gcp-v5-node-group module for defining node groups instead. | number | null | no |
node_count_static | Deprecated: Use the schedmd-slurm-gcp-v5-node-group module for defining node groups instead. | number | null | no |
node_groups | A list of node groups associated with this partition. See schedmd-slurm-gcp-v5-node-group for more information on defining a node group in a blueprint. | list(object({ | [] | no |
on_host_maintenance | Deprecated: Use the schedmd-slurm-gcp-v5-node-group module for defining node groups instead. | string | null | no |
partition_conf | Slurm partition configuration as a map. See https://slurm.schedmd.com/slurm.conf.html#SECTION_PARTITION-CONFIGURATION | map(string) | {} | no |
partition_name | The name of the slurm partition. | string | n/a | yes |
preemptible | Deprecated: Use the schedmd-slurm-gcp-v5-node-group module for defining node groups instead. | string | null | no |
project_id | Project in which the HPC deployment will be created. | string | n/a | yes |
region | The default region for Cloud resources. | string | n/a | yes |
service_account | Deprecated: Use the schedmd-slurm-gcp-v5-node-group module for defining node groups instead. | object({ | null | no |
shielded_instance_config | Deprecated: Use the schedmd-slurm-gcp-v5-node-group module for defining node groups instead. | object({ | null | no |
slurm_cluster_name | Cluster name, used for resource naming and slurm accounting. If not provided it will default to the first 8 characters of the deployment name (removing any invalid characters). | string | null | no |
source_image | Deprecated: Use the schedmd-slurm-gcp-v5-node-group module for defining node groups instead. | string | null | no |
source_image_family | Deprecated: Use the schedmd-slurm-gcp-v5-node-group module for defining node groups instead. | string | null | no |
source_image_project | Deprecated: Use the schedmd-slurm-gcp-v5-node-group module for defining node groups instead. | string | null | no |
spot_instance_config | Deprecated: Use the schedmd-slurm-gcp-v5-node-group module for defining node groups instead. | object({ | null | no |
subnetwork_project | The project the subnetwork belongs to. | string | "" | no |
subnetwork_self_link | Subnet to deploy to. | string | null | no |
tags | Deprecated: Use the schedmd-slurm-gcp-v5-node-group module for defining node groups instead. | list(string) | null | no |
zone | Zone in which to create all compute VMs. If zone_policy_deny or zone_policy_allow are set, the zone variable will be ignored. | string | null | no |
zone_policy_allow | Partition nodes will prefer to be created in the listed zones. If a zone appears in both zone_policy_allow and zone_policy_deny, then zone_policy_deny will take priority for that zone. | set(string) | [] | no |
zone_policy_deny | Partition nodes will not be created in the listed zones. If a zone appears in both zone_policy_allow and zone_policy_deny, then zone_policy_deny will take priority for that zone. | set(string) | [] | no |
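As a hedged illustration of how the `is_default` and `partition_conf` inputs listed above might be combined, the sketch below marks a partition as the default and passes one Slurm partition option through. The module ID is hypothetical and the `MaxTime` value is illustrative only; `network1` and `node_group_1` stand for modules defined elsewhere in the blueprint.

```yaml
- id: default_partition   # hypothetical ID
  source: community/modules/compute/schedmd-slurm-gcp-v5-partition
  use:
  - network1
  - node_group_1
  settings:
    partition_name: compute
    # Has no effect if "Default" is already set in partition_conf.
    is_default: true
    # map(string): keys are Slurm partition options, values are strings.
    partition_conf:
      MaxTime: "60"   # illustrative run time limit, in minutes
```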
Name | Description |
---|---|
partition | Details of a slurm partition |