Many updates.
* Improve documentation.
* Show Job ID column only when showing submitted or completed jobs.
* Validate submit_whole.
* Cache workspace directory mtime and list it only when changed.
* New subcommands: `row init` and `row clean`.
* Show requested job statuses in `show directories`.
joaander committed May 12, 2024
1 parent 8d4129a commit f2114a8
Showing 68 changed files with 2,475 additions and 311 deletions.
1 change: 1 addition & 0 deletions .github/CODEOWNERS
@@ -0,0 +1 @@
* joaander
6 changes: 3 additions & 3 deletions .gitignore
@@ -1,5 +1,5 @@
/target
/workspace
/workflow.toml
/.row
/.signac
workspace
.row
.signac
16 changes: 14 additions & 2 deletions .pre-commit-config.yaml
@@ -34,5 +34,17 @@ repos:
hooks:
- id: ruff-format
- id: ruff

# TODO: add fix-license-header
- repo: https://github.com/glotzerlab/fix-license-header
rev: v0.3.2
hooks:
- id: fix-license-header
name: Fix license headers (rust)
types_or: [rust]
args:
- --license-file=LICENSE
- --add=Part of row, released under the BSD 3-Clause License.
- --comment-prefix=//
- repo: https://github.com/crate-ci/typos
rev: v1.21.0
hooks:
- id: typos
19 changes: 19 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default.

1 change: 1 addition & 0 deletions Cargo.toml
@@ -15,6 +15,7 @@ indicatif-log-bridge = "0.2.2"
log = "0.4.21"
memchr = "2.7.2"
nix = { version = "0.28.0", features = ["signal"] }
path-absolutize = "3.1.1"
postcard = { version = "1.0.8", default-features = false, features = ["use-std"] }
serde = { version = "1.0.197", features = ["derive"] }
serde_json = "1.0.114"
21 changes: 12 additions & 9 deletions DESIGN.md
@@ -52,11 +52,6 @@ Row is yet another workflow engine that automates the process of executing **act
Ideas:
* List scheduler jobs and show useful information.
* Cancel scheduler jobs specific to actions and/or directories.
* Command to uncomplete an action for a set of directories. This would remove the product files and
update the cache.
* Some method to clear any cache (maybe this instead of uncomplete?). This would allow
users to discover changed action names, changed products, manually uncompleted
actions, and deal with corrupt cache files.

## Overview

@@ -138,7 +133,8 @@ completed **directories** is read.
### The cache files

Row maintains the state of the workflow in several files:
* `values.json`
* `directories.json`
* Last time the workspace was modified.
* Cached copies of the user-provided static value file.
* `completed.postcard`
* Completion status for each **action**.
@@ -263,13 +259,14 @@ status may take a long time, so it should display a progress bar.

## Subcommands

* `init` - create `workflow.toml` and `workspace` if they do not yet exist. (TODO: write init)
* `init` - create `workflow.toml` and `workspace` if they do not yet exist.
* `scan` - scan the workspace for directories that have completed actions.
* `show` - show properties of the workflow:
* `status` - summarize the status of the workflow.
* `directories` - list directories in the workflow.

Ideas for other commands, `uncomplete`
* `cluster` - show the currently selected cluster configuration.
* `launchers` - list the launchers for the current cluster.
* `clean` - delete row cache files.

## Definitions

@@ -296,3 +293,9 @@ Ideas for other commands, `uncomplete`
- **whole group**: A **submission group** that is identical to the **group** found
without applying the additional submission filters.
- **workspace**: The location on the file system that contains **directories**.

# TODO: Pull request template
# TODO: Issue templates?
# TODO: Dependabot configuration
# TODO: readthedocs builds
# TODO: logo
82 changes: 57 additions & 25 deletions README.md
@@ -1,47 +1,79 @@
# Row

[![GitHub Actions](https://github.com/glotzerlab/row/actions/workflows/test.yaml/badge.svg?branch=trunk)](https://github.com/glotzerlab/row/actions/workflows/test.yaml)
[![Read the Docs](https://img.shields.io/readthedocs/row/latest.svg)](https://row.readthedocs.io/)
[![Contributors](https://img.shields.io/github/contributors-anon/glotzerlab/row.svg?style=flat)](https://row.readthedocs.io/en/latest/contributors.html)
[![License](https://img.shields.io/badge/license-BSD--3--Clause-green.svg)](https://row.readthedocs.io/en/latest/license.html)

Row is a command line tool that helps you manage workflows on HPC resources. Define
**actions** in a workflow configuration file that apply to **groups** of **directories**
in your **workspace**. **Submit** actions to your HPC **scheduler**. Row tracks which
actions have been submitted on which directories so that you don't submit the same work
twice. Once a job completes, subsequent actions become eligible allowing you to process
your entire workflow to completion over many submissions.
in your **workspace**. **Submit** actions to your cluster's **scheduler**. Row tracks
which actions have been submitted on which directories so that you don't submit the same
work twice. Once a job completes, subsequent actions become eligible allowing you to
process your entire workflow to completion over many submissions.

The name is "row" as in "row, row, row your boat".
The name is **row** as in *"row, row, row your boat"*.

Notable features:
* Support both arbitrary directories and [signac](https://signac.io) workspaces.
* Execute actions via arbitrary shell commands.
* Support arbitrary directories and [signac](https://signac.io) workspaces.
* Define your workflow in a configuration file.
* Execute actions via user-defined shell commands.
* Flexible group definitions:
* Select directories with conditions on their value.
* Split directories by their value and/or into fixed size groups.
* Execute groups in serial or parallel.
* Schedule CPU and GPU resources.
* Automatically determine the partition based on the batch job size.
* Automatically determine the partition based on the job's resources and size.
* Built-in configurations for many national and university HPC systems.
* Add custom cluster definitions for your resources.
* Row is **fast**.

TODO: better demo script to get output for README and row show documentation examples.
## Demonstration

For example:
```bash
> row show status
Action Completed Submitted Eligible Waiting Remaining cost
one 1000 100 900 0 24K CPU-hours
two 0 200 800 1000 8K GPU-hours
$ row submit --action=step1 -n 1
[1/1] Submitting action 'step1' on directory dir12 and 3 more (0ms).
Row submitted job 5095791.
```

```bash
> row show directories --value "/value"
Directory Status Job ID /value
dir1 submitted 1432876 0.9
dir2 submitted 1432876 0.8
dir3 submitted 1432876 0.7

dir4 completed 0.5
dir5 completed 0.4
dir6 completed 0.3
$ row show status
Action Completed Submitted Eligible Waiting Remaining cost
initialize 50 0 50 0 8 CPU-hours
step1 4 4 42 50 2K CPU-hours
step2 0 0 4 96 800 GPU-hours
```

```bash
$ row show directories step1 -n 3 --value="/value"
Directory Status Job ID /value
dir1 completed 116
dir10 completed 952
dir100 completed 139
dir11 completed 998

dir12 submitted anvil/5095791 950
dir13 submitted anvil/5095791 107
dir14 submitted anvil/5095791 127
dir15 submitted anvil/5095791 122

dir16 eligible 682
dir17 eligible 816
dir18 eligible 803
dir19 eligible 691
```
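
The `--value="/value"` argument above reads as a JSON pointer into each directory's value file. As a minimal sketch of how such a pointer selects a field (assuming RFC 6901 semantics; `resolve_pointer` is a hypothetical helper written for illustration, not part of **row**):

```python
import json


def resolve_pointer(document, pointer):
    """Resolve an RFC 6901 JSON pointer against a parsed JSON document."""
    if pointer == "":
        return document
    if not pointer.startswith("/"):
        raise ValueError(f"invalid JSON pointer: {pointer!r}")

    current = document
    for token in pointer[1:].split("/"):
        # Unescape per RFC 6901: "~1" becomes "/" first, then "~0" becomes "~".
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]
        else:
            current = current[token]
    return current


# Stand-in for the contents of one directory's value.json (illustrative data).
value_file = json.loads('{"value": 950}')
print(resolve_pointer(value_file, "/value"))  # prints 950
```

A longer pointer such as `/a/0/b` would walk nested objects and arrays in the same way.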

**Row** is a spiritual successor to
[signac-flow](https://docs.signac.io/projects/flow/en/latest/).
## Resources

- [Documentation](https://row.readthedocs.io/):
Tutorial, command line interface documentation, and configuration file specifications.
- [Row discussion board](https://github.com/glotzerlab/row/discussions/):
Ask the **row** user community for help.
- [signac](https://signac.io):
Python package to help you manage your workspace.

## History

**Row** is a spiritual successor to [signac-flow][flow].

[flow]: https://docs.signac.io/projects/flow/en/latest/
1 change: 1 addition & 0 deletions demo/.gitignore
@@ -0,0 +1 @@
*.out
19 changes: 19 additions & 0 deletions demo/demo.sh
@@ -0,0 +1,19 @@
mkdir workspace || exit 1
cd workspace

for i in {1..100}
do
mkdir dir$i || exit 1
v=$((1 + RANDOM % 1000))
echo "{\"value\": $v}" > dir$i/value.json || exit 1
done

row submit --cluster none --action=initialize -n 5 --yes || exit 1

row submit --cluster none --action=step1 -n 1 --yes || exit 1

row submit --action=step1 -n 1 --yes || exit 1

row show status || exit 1

row show directories step1 -n 3 --value="/value"
26 changes: 26 additions & 0 deletions demo/workflow.toml
@@ -0,0 +1,26 @@
[workspace]
value_file = "value.json"

[[action]]
name = "initialize"
command = "touch workspace/{directory}/initialize.out"
products = ["initialize.out"]
resources.walltime.per_directory = "00:10:00"
group.maximum_size = 10

[[action]]
name = "step1"
command = "touch workspace/{directory}/step1.out"
products = ["step1.out"]
previous_actions = ["initialize"]
resources.walltime.per_directory = "1 day, 00:00:00"
group.maximum_size = 4

[[action]]
name = "step2"
command = "touch workspace/{directory}/step2.out"
previous_actions = ["step1"]
products = ["step2.out"]
resources.walltime.per_directory = "08:00:00"
resources.gpus_per_process = 1
group.maximum_size = 4
12 changes: 6 additions & 6 deletions doc/src/SUMMARY.md
@@ -21,7 +21,7 @@
- [Thread parallelism](guide/concepts/thread-parallelism.md)
- [Directory status](guide/concepts/status.md)
- [JSON pointers](guide/concepts/json-pointers.md)
- [The row cache](guide/concepts/cache.md)
- [Cache files](guide/concepts/cache.md)
# Reference

- [row](row/index.md)
@@ -33,7 +33,7 @@
- [show cluster](row/show/cluster.md)
- [show launchers](row/show/launchers.md)
- [scan](row/scan.md)
- [uncomplete](row/uncomplete.md)
- [clean](row/clean.md)

- [`workflow.toml`](workflow/index.md)
- [workspace](workflow/workspace.md)
@@ -52,14 +52,14 @@

# Appendix

- [Change log]()
- [Release notes](release-notes.md)
- [Migrating from signac-flow](signac-flow.md)
- [For developers](developers/index.md)
- [Contributing]()
- [Contributing](developers/contributing.md)
- [Code style](developers/style.md)
- [Testing](developers/testing.md)
- [Documentation](developers/documentation.md)
- [License]()
- [License](license.md)

-----
[Contributors]()
[Contributors](contributors.md)
4 changes: 2 additions & 2 deletions doc/src/clusters/cluster.md
@@ -75,7 +75,7 @@ total_cpus <= maximum_cpus_per_job

`cluster.partition.require_cpus_multiple_of`: **integer** - All jobs submitted to this
partition **must** use an integer multiple of the given number of cpus:
```
```plaintext
total_cpus % require_cpus_multiple_of == 0
```

@@ -118,7 +118,7 @@ total_gpus <= maximum_gpus_per_job

`cluster.partition.require_gpus_multiple_of`: **integer** - All jobs submitted to this
partition **must** use an integer multiple of the given number of gpus:
```
```plaintext
total_gpus % require_gpus_multiple_of == 0
```
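
The multiple-of conditions above reduce to a single modulo test. Here is a small illustration, where `is_multiple_of` is a hypothetical helper name and the 128-core node size is an assumed example rather than a value taken from any built-in cluster definition:

```python
def is_multiple_of(total: int, require_multiple_of: int) -> bool:
    """Return True when total is an integer multiple of require_multiple_of."""
    return total % require_multiple_of == 0


# Hypothetical partition that only allocates whole 128-core nodes.
print(is_multiple_of(256, 128))  # prints True: exactly two nodes
print(is_multiple_of(192, 128))  # prints False: one and a half nodes
```

The same test applies to `total_gpus` with `require_gpus_multiple_of`.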

5 changes: 5 additions & 0 deletions doc/src/contributors.md
@@ -0,0 +1,5 @@
# Contributors

The following people have contributed to the development of **row**:

* Joshua A. Anderson, University of Michigan