Refactor submission templates #722
This moves much of the task and number of nodes logic to the environments where it is easier to manage the more complicated logic.
TODO:
Note: we need to make sure to set -N only when required, so that we do not over-request resources. Anyone working on this, please feel free to modify it.
CPUs and GPUs per partition are in theory supported.
Use a sentinel of -1 to denote no node structure, and always return 1 node requested for either CPU or GPU tasks.
Add the ComputeEnvironment._shared_partitions attribute to check whether submissions requesting less than a single node should be allowed in ComputeEnvironment._get_scheduler_values.
The Delta template is now tested and works.
The environment is now tested.
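The sentinel and shared-partition items above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: `nodes_requested`, `per_node`, and `shared` are hypothetical names, and raising an error for a sub-node request on a non-shared partition is one possible policy.

```python
import math

# Sentinel from the TODO list: -1 means the partition has no node structure.
NO_NODE_STRUCTURE = -1


def nodes_requested(tasks, per_node, shared=False):
    """Return the number of nodes to request for CPU or GPU tasks.

    Hypothetical helper for illustration only; it is not part of flow.
    """
    if tasks < 1:
        raise ValueError("tasks must be at least 1")
    if per_node == NO_NODE_STRUCTURE:
        # No node structure: always request exactly one node.
        return 1
    if tasks < per_node and not shared:
        # Assumed policy: only shared partitions accept sub-node requests.
        raise ValueError(
            f"{tasks} tasks is less than one node ({per_node}) "
            "on a non-shared partition"
        )
    return math.ceil(tasks / per_node)
```

With this sketch, a request on a partition marked with the sentinel always resolves to one node, while a partial-node request is accepted only when `shared=True`.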
I do not have access to the cluster, so to prevent regressions, I am resetting this.
@b-butler Can we push this through or close it?
updates: - [github.com/psf/black: 23.1.0 → 23.3.0](psf/black@23.1.0...23.3.0) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* First pass at fix
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Add filter test
* Update changelog
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Delta changed their compute node hostnames in their April 19th maintenance. This fixes the detection of the Delta environment.
* doc: Update changelog.
* Bump up to version 0.25.1.
* feat: Add the Frontier supercomputer to environments.
* test: Add Frontier to template testing.
* test: Update environment test template generation to signac 2.0
* doc: Update changelog
* doc: Add Frontier documentation.
* doc: Update incode comment clarity Co-authored-by: Bradley Dice <[email protected]>
---------
Co-authored-by: Bradley Dice <[email protected]>
Co-authored-by: Bradley Dice <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Bradley Dice <[email protected]>
* feat (WIP): create flow CLI subcommand for testing templates
* feat: Finish new CLI option.
* test: flow test-workflow.
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* doc: Add test-workflow to documentation.
* doc: Add changes to changelog.
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Bumps [ruamel-yaml](https://sourceforge.net/p/ruamel-yaml/code/ci/default/tree) from 0.17.21 to 0.17.31. --- updated-dependencies: - dependency-name: ruamel-yaml dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Codecov Report
@@            Coverage Diff             @@
##             main     #722      +/-   ##
==========================================
- Coverage   69.25%   69.17%   -0.09%
==========================================
  Files          44       44
  Lines        4297     4331      +34
  Branches      950     1052     +102
==========================================
+ Hits         2976     2996      +20
- Misses       1109     1129      +20
+ Partials      212      206       -6
I have two small suggestions and a question:
This PR touches a lot of cluster templates. Does flow normally need to be validated on each of the clusters in addition to unit tests? If so, has the validation been done after the changes in this PR?
@tommy-waltmann that is part of the reason the PR took so long. All of the changed templates have been validated, I believe (I need to double-check Andes). That is also why some templates were not changed: I could not access those clusters.
Description
This PR moves much of the task and number-of-nodes logic to the environment classes, where they can specify any needed parameters using the `resources` template context value now provided to the Jinja templates. This allows more complicated logic to be written in a format that is simpler to understand than Jinja. It also promotes consistent computation of resources and explicit overrides through inheritance. For more discussion, see #702.
Motivation and Context
This PR attempts to solve some bugs in the existing template logic. For one, we sometimes fail on multi-node GPU submissions (see #702 and related issues). We also currently modify user resource requests (e.g. rounding CPU tasks to the same number per node) rather than submitting them as is or raising an error.
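To make the contrast with rounding concrete, here is a minimal sketch of computing a resources mapping in Python and raising an error instead of silently adjusting the request. Everything in it is an assumption for illustration: `SketchEnvironment`, `SmallNodeEnvironment`, the `resources` classmethod, and the dictionary keys are hypothetical and do not reflect flow's actual classes or the keys of its `resources` context value.

```python
class SketchEnvironment:
    """Hypothetical stand-in for an environment class; not flow's API."""

    cores_per_node = 128  # assumed node size, for illustration only

    @classmethod
    def resources(cls, cpu_tasks, num_nodes):
        """Build a mapping that a template could consume unchanged."""
        if cpu_tasks < 1 or num_nodes < 1:
            raise ValueError("cpu_tasks and num_nodes must be at least 1")
        tasks_per_node, remainder = divmod(cpu_tasks, num_nodes)
        if remainder:
            # Raise rather than round the request to an even split.
            raise ValueError(
                f"{cpu_tasks} tasks do not divide evenly over {num_nodes} nodes"
            )
        if tasks_per_node > cls.cores_per_node:
            raise ValueError(
                f"{tasks_per_node} tasks per node exceeds {cls.cores_per_node} cores"
            )
        return {
            "num_nodes": num_nodes,
            "cpu_tasks": cpu_tasks,
            "tasks_per_node": tasks_per_node,
        }


class SmallNodeEnvironment(SketchEnvironment):
    """Override through inheritance: only the node size changes."""

    cores_per_node = 32
```

The subclass reuses the same validation with a different node size, which is the kind of explicit override through inheritance the description refers to.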
Checklist: