Modules

This directory contains a set of core modules built for the HPC Toolkit. Modules describe the building blocks of an HPC deployment. The expected fields in a module are listed in more detail below. Blueprints can also be extended by incorporating modules from GitHub repositories.

All Modules

Modules from various sources are all listed here for visibility. Badges are used to indicate the source and status of many of these resources.

Modules listed below with the core-badge badge are located in this folder and are tested and maintained by the HPC Toolkit team.

Modules labeled with the community-badge badge are contributed by the community (including the HPC Toolkit team, partners, etc.). Community modules are located in the community folder.

Modules that are still in development and less stable are labeled with the experimental-badge badge.

Compute

Database

File System

Monitoring

Network

Packer

  • custom-image core-badge : Creates a custom VM Image based on the GCP HPC VM image.

Project

Scheduler

Scripts

  • startup-script core-badge : Creates a customizable startup script that can be fed into compute VMs.
  • htcondor-install community-badge experimental-badge : Creates a startup script to install HTCondor and exports a list of required APIs.
  • omnia-install community-badge experimental-badge : Installs Slurm via Dell Omnia onto a cluster of VM instances.
  • pbspro-preinstall community-badge experimental-badge : Creates a Cloud Storage bucket in which to save PBS Professional RPM packages for use by PBS clusters.
  • pbspro-install community-badge experimental-badge : Creates a Toolkit runner to install PBS Professional from RPM packages.
  • pbspro-qmgr community-badge experimental-badge : Creates a Toolkit runner to run common qmgr commands when configuring a PBS Professional cluster.
  • spack-install community-badge experimental-badge : Creates a startup script to install Spack on an instance or a Slurm login or controller node.
  • wait-for-startup community-badge experimental-badge : Waits for successful completion of a startup script on a compute VM.

Module Fields

ID (Required)

The id field is used to uniquely identify and reference a defined module. IDs are used in variables and become the name of each module in the generated Terraform main.tf file. They are also used in the use and outputs lists described below.

For Terraform modules, the ID is rendered as the module label in the top-level main.tf file.

Source (Required)

The source is a path or URL that points to the source files for a module. The actual content of those files is determined by the kind of the module.

A source can be a path referring either to a module embedded in the ghpc binary or to a module on the local file system. It can also be a URL pointing to a GitHub path containing a conforming module.

Embedded Modules

Embedded modules are embedded in the ghpc binary during compilation and cannot be edited. To refer to embedded modules, set the source path to modules/<<MODULE_PATH>>.

The paths match the modules in the repository at compilation time. You can review the directory structure of the core modules and community modules to determine which path to use. For example, the following module definition uses the embedded pre-existing-vpc module:

  - id: network1
    source: modules/network/pre-existing-vpc

Local Modules

Local modules point to a module in the file system and can easily be edited. They are very useful during module development. To use a local module, set the source to a path starting with /, ./, or ../. For instance, the following module definition refers to the local pre-existing-vpc module.

  - id: network1
    source: ./modules/network/pre-existing-vpc

NOTE: This example would have to be run from the HPC Toolkit repository directory, otherwise the path would need to be updated to point at the correct directory.

GitHub Modules

To use a Terraform module available on GitHub, set the source to a path starting with github.com (over HTTPS) or git@github.com (over SSH). For instance, the following module definitions source the vpc module from the HPC Toolkit GitHub repository:

Get module from GitHub over SSH:

  - id: network1
    source: git@github.com:GoogleCloudPlatform/hpc-toolkit.git//modules/network/vpc

Get module from GitHub over HTTPS:

  - id: network1
    source: github.com/GoogleCloudPlatform/hpc-toolkit//modules/network/vpc

Both examples above use the double-slash notation (//) to mark the root directory of the git repository; the remainder of the path indicates the location of the Terraform module within it.

Additionally, a specific revision of a remote module can be selected with any valid git reference, typically a branch, tag, or commit hash. The Intel DAOS blueprint makes extensive use of this feature. For example, to temporarily point to a development copy of the Toolkit vpc module, use:

  - id: network1
    source: github.com/GoogleCloudPlatform/hpc-toolkit//modules/network/vpc?ref=develop

Generic Git Modules

To use a Terraform module available in a non-GitHub git repository, such as GitLab, set the source to a path starting with git::. Two standard git protocols are supported: git::https:// for HTTPS and git::git@gitlab.com for SSH.

Additional formatting and features after git:: are identical to those of the GitHub modules described above.
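
For example, a module hosted on GitLab could be referenced as follows; the organization and repository names below are placeholders, not a real repository:

  - id: network1
    source: git::https://gitlab.com/example-org/example-repo.git//modules/network/vpc

The equivalent SSH form would begin with git::git@gitlab.com:.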

Kind (May be Required)

kind refers to the way in which a module is deployed. Currently, kind can be either terraform or packer. It must be specified for modules of type packer. If omitted, it will default to terraform.
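
For example, a minimal sketch of a Packer module definition, assuming the custom-image module listed above lives at modules/packer/custom-image:

  - id: custom-image
    source: modules/packer/custom-image
    kind: packer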

Settings (May Be Required)

The settings field is a map that supplies any user-defined variables for each module. Settings values can be simple strings, numbers, or booleans, but can also be complex data types like maps and lists of variable depth. These settings become the values for the variables defined in either the variables.tf file for Terraform or the variables.pkr.hcl file for Packer.

Some modules have mandatory variables that must be set, making settings a required field in that case. In many situations, a combination of sensible defaults, deployment variables, and used modules can populate all required settings, so the settings field can be omitted.
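
As an illustrative sketch, the following snippet mixes simple and complex values under settings; the setting names are hypothetical and should be replaced with the variables the module actually defines:

  - id: workstation
    source: modules/compute/vm-instance
    settings:
      name_prefix: workstation   # simple string
      instance_count: 2          # number
      metadata:                  # map
        enable-oslogin: "TRUE"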

Use (Optional)

The use field is a powerful way of linking a module to one or more other modules. When a module "uses" another module, the outputs of the used module are compared to the settings of the current module. If they have matching names and the setting has no explicit value, then it will be set to the used module's output. For example, see the following blueprint snippet:

modules:
- id: network1
  source: modules/network/vpc

- id: workstation
  source: modules/compute/vm-instance
  use: [network1]
  settings:
  ...

In this snippet, the VM instance workstation uses the outputs of vpc network1.

In this case both network_self_link and subnetwork_self_link in the workstation settings will be set to $(network1.network_self_link) and $(network1.subnetwork_self_link) which refer to the network1 outputs of the same names.

ghpc determines each setting value using the following order of precedence (an example follows the list):

  1. Explicitly set in the blueprint using the settings field
  2. Output from a used module, taken in the order provided in the use list
  3. Deployment variable (vars) of the same name
  4. Default value for the setting
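
For example, in the sketch below the explicit network_self_link setting takes precedence over the network1 output of the same name, while subnetwork_self_link would still be inferred from the used module; the deployment variable shared_network_self_link is hypothetical:

modules:
- id: network1
  source: modules/network/pre-existing-vpc

- id: workstation
  source: modules/compute/vm-instance
  use: [network1]
  settings:
    network_self_link: $(vars.shared_network_self_link)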

NOTE: See the network storage documentation for more information about mounting network storage file systems via the use field.

Outputs (Optional)

The outputs field allows a module-level output to be made available at the deployment group level, and therefore available via terraform output in Terraform-based deployment groups. This can be useful for displaying the IP address of a login node or simply displaying instructions on how to use a module, as in the monitoring dashboard module.
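
As a sketch, the following snippet exposes a network_name output at the deployment group level, assuming the vpc module defines an output of that name:

  - id: network1
    source: modules/network/vpc
    outputs:
    - network_name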

Required Services (APIs) (optional)

Each Toolkit module depends upon Google Cloud services ("APIs") being enabled in the project used by the HPC environment. For example, the creation of VMs requires the Compute Engine API (compute.googleapis.com). The startup-script module requires the Cloud Storage API (storage.googleapis.com) for storage of the scripts themselves. Each module included in the Toolkit source code describes its required APIs internally. The Toolkit will merge the requirements from all modules and automatically validate that all APIs are enabled in the project specified by $(vars.project_id).

For advanced multi-project use cases and for modules not included with the Toolkit, you may manually add required APIs to each module with the following format:

deployment_groups:
- group: primary
  modules:
  ...
  - id: examplevm
    source: modules/example/module
    required_apis:
      $(vars.project_id):
      - compute.googleapis.com
      - storage.googleapis.com
      $(vars.other_project_id):
      - storage.googleapis.com
      explicit-project-id:
      - file.googleapis.com
    settings:
    ...

Common Settings

The following common naming conventions should be used to decrease the verbosity needed to define a blueprint. These conventions allow multiple modules to share settings inferred from deployment variables or from other modules listed under the use field.

For example, if all modules are to be created in a single region, that region can be defined as a deployment variable named region, which is shared between all modules without an explicit setting. Similarly, if many modules need to be connected to the same VPC network, they can all add the vpc module ID to their use list so that network_name is inferred from that vpc module rather than set manually. An example follows the list below.

  • project_id: The GCP project ID in which to create the GCP resources.
  • deployment_name: The name of the current deployment of a blueprint. This can help to avoid naming conflicts of modules when multiple deployments are created from the same blueprint.
  • region: The GCP region the module will be created in.
  • zone: The GCP zone the module will be created in.
  • network_name: The name of the network a module will use or connect to.
  • labels: Labels added to the module. In order to include any module in advanced monitoring, labels must be exposed. We strongly recommend that all modules expose this variable.
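
As a sketch, declaring these values once as top-level deployment variables makes them available to every module without explicit settings; all values shown are placeholders:

vars:
  project_id: my-project-id
  deployment_name: demo-deployment
  region: us-central1
  zone: us-central1-a
  labels:
    env: demo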

Writing Custom HPC Modules

Modules are flexible by design; however, we define some best practices for creating a new module meant to be used with the HPC Toolkit.

Terraform Requirements

The module source field must point to a single Terraform module. We recommend the following structure:

  • main.tf file composing the Terraform resources using provided variables.
  • variables.tf file defining the variables used.
  • (Optional) outputs.tf file defining any exported outputs.
  • (Optional) modules/ sub-directory containing any submodules needed to create the top-level module.

General Best Practices

  • Variables for environment-specific values (like project_id) should not be given defaults. This forces the calling module to provide meaningful values.
  • Variables should only have zero-value defaults (like null or empty strings) where leaving the variable empty is a valid preference which will not be rejected by the underlying API(s).
  • Set good defaults wherever possible. Be opinionated about HPC use cases.
  • Follow common variable naming conventions.

Terraform Coding Standards

Any Terraform-based modules in the HPC Toolkit should implement the following standards:

  • terraform-docs is used to generate README files for each module.
  • The first parameter listed under a module should be source (when referring to an external implementation).
  • The order for parameters in inputs should be:
    • description
    • type
    • default
  • The order for parameters in outputs should be:
    • description
    • value