Fix tutorial #12

Merged
merged 7 commits into from
May 30, 2024
2 changes: 1 addition & 1 deletion .github/workflows/test.yaml
@@ -20,7 +20,7 @@ env:
CARGO_TERM_COLOR: always
ROW_COLOR: always
CLICOLOR: 1
MDBOOK_VERSION: 0.4.37
MDBOOK_VERSION: 0.4.40
LINKCHECK_VERSION: 0.7.7
RUST_LATEST_VERSION: 1.78.0
BUNDLE_LICENSES_VERSION: 1.3.0
2 changes: 1 addition & 1 deletion .readthedocs.yaml
@@ -3,7 +3,7 @@ build:
os: "ubuntu-22.04"
commands:
- mkdir -p bin
- curl -sSL "https://github.com/rust-lang/mdBook/releases/download/v0.4.37/mdbook-v0.4.37-x86_64-unknown-linux-gnu.tar.gz" | tar -xvz --directory "bin"
- curl -sSL "https://github.com/rust-lang/mdBook/releases/download/v0.4.40/mdbook-v0.4.40-x86_64-unknown-linux-gnu.tar.gz" | tar -xvz --directory "bin"
- mkdir -p $READTHEDOCS_OUTPUT/html
- echo "site-url = \"/$READTHEDOCS_LANGUAGE/$READTHEDOCS_VERSION/\"" >> doc/book.toml
- bin/mdbook build doc --dest-dir=$READTHEDOCS_OUTPUT/html
25 changes: 13 additions & 12 deletions doc/src/guide/python/actions.md
@@ -1,8 +1,8 @@
# Writing action commands in Python

In **row**, actions execute arbitrary **shell commands**. When your action is
**Python** code, you must structure that code so that it is a command line tool
that takes directories as arguments. There are many ways you can achieve this goal.
In **row**, actions execute arbitrary **shell commands**. When your action is **Python**
code, you must wrap it with command line parsing that takes directories as arguments.
There are many ways you can achieve this goal.

This guide will show you how to structure all of your actions in a single file:
`actions.py`. This layout is inspired by **row's** predecessor **signac-flow**
@@ -54,32 +54,34 @@ Now, create a file `actions.py` with the contents:
This file defines each **action** as a function with the same name. These functions take
an array of jobs as an argument: `def square(*jobs)` and `def compute_sum(*jobs)`. The
`if __name__ == "__main__":` block parses the command line arguments, forms an array of
signac jobs and calls the requested **action** function.
signac jobs, and calls the requested **action** function.
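The layout described above can be sketched as follows. This is a minimal illustration and not the tutorial's actual `actions.py`: it passes plain directory names where the tutorial passes signac jobs, the function bodies are placeholders, and the `ACTIONS` table and `main` helper are hypothetical names introduced here.

```python
import argparse


def square(*directories):
    """Placeholder action: report each directory it was given."""
    return [f"square:{directory}" for directory in directories]


def compute_sum(*directories):
    """Placeholder action: operate on the whole group at once."""
    return f"compute_sum:{len(directories)}"


# Hypothetical lookup table mapping action names to functions.
ACTIONS = {"square": square, "compute_sum": compute_sum}


def main(argv=None):
    """Parse `--action NAME dir1 dir2 ...` and call the matching function."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--action", choices=sorted(ACTIONS))
    parser.add_argument("directories", nargs="*")
    args = parser.parse_args(argv)

    if args.action is None:
        # No action requested: show usage instead of failing.
        parser.print_help()
        return None

    return ACTIONS[args.action](*args.directories)


if __name__ == "__main__":
    main()
```

Invoked as `python actions.py --action square dir0 dir1`, this sketch would call `square("dir0", "dir1")` once with both directories.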

> Note: This example demonstrates looping over directories in **serial**. However, this
> structure also gives you the power to choose **serial** or **parallel** execution.
> structure also gives you the ability to choose **serial** or **[parallel]** execution.
> Grouping many directories into a single cluster job submission will increase your
> workflow's throughput.
> workflow's throughput.

[parallel]: ../concepts/process-parallelism.md

## Write workflow.toml

Next, replace the contents of `workflow.toml` with the corresponding workflow:
```toml
{{#include signac-workflow.toml}}
{{#include signac-workflow.toml:whole}}
```

*Both* actions have the same **command**, set once by the
[**default action**](../../workflow/default.md):
```toml
{{#include signac-workflow.toml:5}}
{{#include signac-workflow.toml:default}}
```

`python actions.py` executes the `actions.py` file above. It is given the argument
`--action $ACTION_NAME` which selects the Python function to call. Here `$ACTION_NAME`
is an [environment variable](../../env.md) that **row** sets in job scripts. The
last arguments are given by `{directories}`. Unlike `{directory}` shown in previous
tutorials, `{directories}` expands to *ALL* directories in the submitted **group**. In
this way, `actions.py` is executed once and is free to process the list of directories in
tutorials, `{directories}` expands to *ALL* directories in the submitted **group**.
`actions.py` is executed once and is free to process the list of directories in
any way it chooses (e.g. in serial, with
[multiprocessing parallelism, multiple threads](../concepts/thread-parallelism.md),
using [MPI parallelism](../concepts/process-parallelism.md), ...).
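As one illustration of that freedom, the sketch below processes the same list of directories either in serial or with a thread pool from the standard library. The `process` function is a hypothetical stand-in for real per-directory work, and `run_serial` and `run_threaded` are names introduced here, not part of **row**.

```python
from concurrent.futures import ThreadPoolExecutor


def process(directory):
    """Hypothetical per-directory work; here it only measures the name."""
    return len(directory)


def run_serial(directories):
    """Process each directory one after another."""
    return [process(directory) for directory in directories]


def run_threaded(directories, max_workers=4):
    """Process directories concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process, directories))
```

Both functions return the same results for the same input, so an action can switch between them without changing the command in `workflow.toml`.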
@@ -130,8 +132,7 @@ these steps:
> Note: You may write functions that take only one job `def action(job)` without
> modifying the given implementation of `__main__`. However, you will need to set
> `action.group.maximum_size = 1` or use `{directory}` to ensure that `actions.py` is
> given a single directory. If you implement your code using arrays, you can use
> **row's** grouping functionality to your benefit.
> given a single directory.

## Next steps

4 changes: 4 additions & 0 deletions doc/src/guide/python/signac-workflow.toml
@@ -1,8 +1,11 @@
# ANCHOR: whole
[workspace]
value_file = "signac_statepoint.json"

# ANCHOR: default
[default.action]
command = "python actions.py --action $ACTION_NAME {directories}"
# ANCHOR_END: default

[[action]]
name = "square"
@@ -14,3 +17,4 @@ name = "compute_sum"
previous_actions = ["square"]
resources.walltime.per_directory = "00:00:01"
group.submit_whole = true
# ANCHOR_END: whole
2 changes: 2 additions & 0 deletions doc/src/guide/tutorial/group-workflow3.toml
@@ -5,7 +5,9 @@ value_file = "value.json"
name = "process_point"
command = "echo {directory}"
[action.group]
# ANCHOR: sort
sort_by = ["/x"]
# ANCHOR_END: sort
[[action.group.include]]
condition = ["/type", "==", "point"]

2 changes: 2 additions & 0 deletions doc/src/guide/tutorial/group-workflow4.toml
@@ -6,7 +6,9 @@ name = "process_point"
command = "echo {directory}"
[action.group]
sort_by = ["/x"]
# ANCHOR: split
split_by_sort_key = true
# ANCHOR_END: split
[[action.group.include]]
condition = ["/type", "==", "point"]

2 changes: 2 additions & 0 deletions doc/src/guide/tutorial/group-workflow5.toml
@@ -6,7 +6,9 @@ name = "process_point"
command = "echo {directory}"
[action.group]
sort_by = ["/x"]
# ANCHOR: max
maximum_size = 4
# ANCHOR_END: max
[[action.group.include]]
condition = ["/type", "==", "point"]

24 changes: 13 additions & 11 deletions doc/src/guide/tutorial/group.md
@@ -41,10 +41,11 @@ read thousands of files every time you execute a **row** command.

## Grouping by value

Now that your workspace directories have **values**, you can use those to
form **groups**. Every action in your workflow operates on **groups**. Set the
`action.group.include` key in an action to select which directories to include by
Now that your workspace directories have **values**, you can use those to form
**groups**. Every action in your workflow operates on **groups**. Add entries to the
`action.group.include` array in an action to select which directories to include by
**value**. To see how this works, replace the contents of `workflow.toml` with:

```toml
{{#include group-workflow2.toml}}
```
@@ -53,14 +54,15 @@ This workflow will apply the `process_point` action to the directories where
`value/type == "point"` and the `process_letter` action to the directories where
`value/type == "letter"`.

`condition` is a length 3 array with the contents: `[JSON pointer, operator, operand]`.
Think of each element as an expression. The
[*JSON pointer*](../concepts/json-pointers.md) is a string that reads a particular value
from the directory's **value**. The *operator* is a comparison operator: `"<"`, `"<="`,
`action.group.include` is an array of conditions. A directory is included when *any*
condition is true. `condition` is a length 3 array with the contents: `[JSON pointer,
operator, operand]`. Think of the condition as an expression. The
[*JSON pointer*](../concepts/json-pointers.md) is a string that references a portion of
the directory's **value**. The *operator* is a comparison operator: `"<"`, `"<="`,
`"=="`, `">="`, or `">"`. The *operand* is the value to compare to. Together, these 3
elements make a *condition*.

**Row** applies the *condition* to all directories in the workspace. When the
**Row** applies each *condition* to all directories in the workspace. When a
*condition* is true, the directory is included in the action's **groups**.
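The selection rule can be sketched in Python as follows. This illustrates the semantics described above and is not **row's** implementation; `resolve_pointer`, `matches`, and `included` are hypothetical helper names, and the pointer handling follows the general JSON pointer syntax (tokens separated by `/`, with `~1` and `~0` escapes).

```python
import operator

# The five comparison operators a condition may use.
OPERATORS = {
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    ">=": operator.ge,
    ">": operator.gt,
}


def resolve_pointer(value, pointer):
    """Return the portion of `value` that the JSON pointer references."""
    if pointer == "":
        return value  # The empty pointer references the whole value.
    current = value
    for token in pointer[1:].split("/"):
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]
        else:
            current = current[token]
    return current


def matches(value, condition):
    """Evaluate one `[JSON pointer, operator, operand]` condition."""
    pointer, op, operand = condition
    return OPERATORS[op](resolve_pointer(value, pointer), operand)


def included(value, include):
    """A directory is included when any entry's condition is true."""
    return any(matches(value, entry["condition"]) for entry in include)
```

For example, `included({"type": "point", "x": 1}, [{"condition": ["/type", "==", "point"]}])` evaluates to `True`. In this sketch, a pointer that is missing from a value raises `KeyError` rather than evaluating to false.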

> Note: This implies that every JSON pointer used in an `include` condition **MUST**
@@ -114,7 +116,7 @@ behavior. You can choose to instead sort **groups** by any number of **value** e

To demonstrate, add the line:
```toml
{{#include group-workflow3.toml:9}}
{{#include group-workflow3.toml:sort}}
```
to the `[action.group]` table for the `"process_point"` action.

@@ -169,7 +171,7 @@ groups to a `maximum_size`.

Add the line:
```toml
{{#include group-workflow4.toml:10}}
{{#include group-workflow4.toml:split}}
```
to the `[action.group]` table for the `"process_point"` action.
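To make the effect of `sort_by` and `split_by_sort_key` concrete, here is a rough Python sketch of sorting directories by a value element and then splitting them wherever the sort key changes. It is an illustration under stated assumptions, not **row's** algorithm: the function names are hypothetical, only single-level pointers such as `/x` are handled, and directories with equal keys keep their input order.

```python
from itertools import groupby


def sort_key(value, sort_by):
    """Build the sort key from single-level pointers such as "/x"."""
    return [value[pointer[1:]] for pointer in sort_by]


def sorted_directories(values, sort_by):
    """Order directory names by their sort key (stable for equal keys)."""
    return sorted(values, key=lambda name: sort_key(values[name], sort_by))


def split_by_sort_key(values, sort_by):
    """Place directories with identical sort keys in the same group."""
    ordered = sorted_directories(values, sort_by)
    grouped = groupby(ordered, key=lambda name: sort_key(values[name], sort_by))
    return [list(members) for _, members in grouped]
```

`itertools.groupby` only merges adjacent items, which is why the directories are sorted by the same key before grouping.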

@@ -225,7 +227,7 @@ of other ways that you might utilize `split_by_sort_key` in your workflows.
**Row** can also limit groups to a maximum size. To see how this works,
**REPLACE** the `split_by_sort_key = true` line with:
```toml
{{#include group-workflow5.toml:10}}
{{#include group-workflow5.toml:max}}
```

Now:
12 changes: 6 additions & 6 deletions doc/src/guide/tutorial/submit.md
@@ -16,8 +16,8 @@ This section explains how to **submit** jobs to the **scheduler** with **row**.
You can skip to the [next heading](#checking-your-job-script) if you are using one of
these clusters.

If not, then you need to create a configuration files that describe your
cluster. You may also need to define launchers specific to your cluster.
If not, then you need to create a configuration file that describes your cluster. You may
also need to define launchers specific to your cluster.

* [`$HOME/.config/row/clusters.toml`](../../clusters/index.md) gives your cluster
a name, instructions on how to identify it, and lists the partitions your cluster
@@ -56,7 +56,7 @@ Remember, **YOU ARE RESPONSIBLE** for the content of the scripts that you submit
Make sure that the script is requesting the correct resources and is routed to the
correct **partition**.

For example, the example workflow might generate a job script like this on Anvil:
For example, the example workflow generates this job script on Anvil:
```bash
#!/bin/bash
#SBATCH --job-name=hello-directory0+2
@@ -85,9 +85,9 @@ Notice the selection of 1 task on the `shared` **partition**. This is correct fo
where the `shared` **partition** allows jobs smaller than one node and charges based
on the number of CPU cores requested.

> Note: If you are using **row** on one of the built-in clusters, then **row** should
> always select the correct partition for your jobs. If you find it does not, please
> open an [issue](https://github.com/glotzerlab/row/issues).
> Note: When using **row** on one of the built-in clusters, **row** should always select
> the correct partition for your jobs. Please open an
> [issue](https://github.com/glotzerlab/row/issues) if it does not.

### Submitting jobs

6 changes: 6 additions & 0 deletions doc/src/release-notes.md
@@ -1,5 +1,11 @@
# Release notes

## 0.1.2 (2024-05-29)

*Fixed:*

* Erroneous code examples in the *Grouping directories* tutorial.

## 0.1.1 (2024-05-29)

*Added:*
8 changes: 4 additions & 4 deletions doc/src/workflow/action/group.md
@@ -27,8 +27,8 @@ which may be true for a directory to be included in this group.

Each selector is a **table** with only one of the following keys:
* `condition`: An array of three elements: The *JSON pointer*, *the operator*, and the
*operand*. The [JSON pointer](../../guide/concepts/json-pointers.md) points to a
specific element from the directory's value. The operator may be `"<"`, `"<="`,
*operand*. The [JSON pointer](../../guide/concepts/json-pointers.md) references a
specific portion of the directory's value. The operator may be `"<"`, `"<="`,
`"=="`, `">="`, or `">"`.
* `all`: Array of conditions (see above). All conditions must be true for this selector
to be true.
@@ -54,8 +54,8 @@ Compare by array:
condition = ["/array", "==", [1, "string", 14.0]]
```

Both operands **must** have the same data type. The JSON pointer must be present in the
value of **every** directory.
Both operands **must** have the same data type. The element referenced by JSON pointer
must be present in the value of **every** directory.

When you omit `include`, **row** includes **all** directories in the workspace.

9 changes: 9 additions & 0 deletions doc/theme/head.hbs
@@ -0,0 +1,9 @@
<!-- Google tag (gtag.js) -->
<script async src="https://www.googletagmanager.com/gtag/js?id=G-2DY110Y9DY"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());

gtag('config', 'G-2DY110Y9DY');
</script>