diff --git a/dev/search/search_index.json b/dev/search/search_index.json index 97c63c4d..2bd46214 100644 --- a/dev/search/search_index.json +++ b/dev/search/search_index.json @@ -1 +1 @@ -{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Setup","text":"

LUTE is publicly available on GitHub. To run it, the first step is to clone the repository:

# Navigate to the directory of your choice.\ngit clone git@github.com:slac-lcls/lute\n

The repository directory structure is as follows:

lute\n  |--- config             # Configuration YAML files (see below) and templates for third party config\n  |--- docs               # Documentation (including this page)\n  |--- launch_scripts     # Entry points for using SLURM and communicating with Airflow\n  |--- lute               # Code\n        |--- run_task.py  # Script to run an individual managed Task\n        |--- ...\n  |--- utilities          # Help utility programs\n  |--- workflows          # This directory contains workflow definitions. It is synced elsewhere and not used directly.\n\n

In general, most interactions with the software will be through scripts located in the launch_scripts directory. Some users (for certain use-cases) may also choose to run the run_task.py script directly - its location is highlighted in the hierarchy above. To begin, you will need a YAML file; templates are available in the config directory. The structure of the YAML file and how to use the various launch scripts are described in more detail below.

"},{"location":"#a-note-on-utilties","title":"A note on utilties","text":"

In the utilities directory there are two useful programs to provide assistance with using the software:

"},{"location":"#basic-usage","title":"Basic Usage","text":""},{"location":"#overview","title":"Overview","text":"

LUTE runs code as Tasks that are managed by an Executor. The Executor modifies the environment the Task runs in, and controls details of inter-process communication, reporting of results to the eLog, etc. Combinations of specific Executors and Tasks are already provided and are referred to as managed Tasks. Managed Tasks are submitted as a single unit. They can be run individually, or a series of independent steps can be submitted all at once in the form of a workflow, or directed acyclic graph (DAG). This latter option makes use of Airflow to manage the individual execution steps.

Running analysis with LUTE is the process of submitting one or more managed Tasks. This is generally a two-step process.

  1. First, a configuration YAML file is prepared. This contains the parameterizations of all the Tasks which you may run.
  2. Individual managed Task submission, or workflow (DAG) submission.

These two steps are described below.

"},{"location":"#preparing-a-configuration-yaml","title":"Preparing a Configuration YAML","text":"

All Tasks are parameterized through a single configuration YAML file - even third party code which requires its own configuration files is managed through this YAML file. The basic structure is split into two documents, a brief header section which contains information that is applicable across all Tasks, such as the experiment name, run numbers and the working directory, followed by per Task parameters:

%YAML 1.3\n---\ntitle: \"Some title.\"\nexperiment: \"MYEXP123\"\n# run: 12 # Does not need to be provided\ndate: \"2024/05/01\"\nlute_version: 0.1\ntask_timeout: 600\nwork_dir: \"/sdf/scratch/users/d/dorlhiac\"\n...\n---\nTaskOne:\n  param_a: 123\n  param_b: 456\n  param_c:\n    sub_var: 3\n    sub_var2: 4\n\nTaskTwo:\n  new_param1: 3\n  new_param2: 4\n\n# ...\n...\n

In the first document, the header, it is important that the work_dir is properly specified. This is the root directory to which Task outputs will be written and where the LUTE database will be stored. It may also be desirable to modify the task_timeout parameter, which defines the time limit for individual Task jobs. By default it is set to 10 minutes, although this may not be sufficient for long-running jobs. This value will be applied to all Tasks, so it should account for the longest-running job you expect.

The actual analysis parameters are defined in the second document. As these vary from Task to Task, a full description will not be provided here. An actual template with real Task parameters is available in config/test.yaml. Your analysis POC can also help you set up and choose the correct Tasks to include as a starting point. The template YAML file has further descriptions of what each parameter does and how to fill it out. You can also refer to the lute_help program described under the following sub-heading.

Some things to consider and possible points of confusion:

Managed Task The Task it Runs Task Description SmallDataProducer SubmitSMD Smalldata production CrystFELIndexer IndexCrystFEL Crystallographic indexing PartialatorMerger MergePartialator Crystallographic merging HKLComparer CompareHKL Crystallographic figures of merit HKLManipulator ManipulateHKL Crystallographic format conversions DimpleSolver DimpleSolve Crystallographic structure solution with molecular replacement PeakFinderPyAlgos FindPeaksPyAlgos Peak finding with PyAlgos algorithm. PeakFinderPsocake FindPeaksPsocake Peak finding with psocake algorithm. StreamFileConcatenator ConcatenateStreamFiles Stream file concatenation."},{"location":"#how-do-i-know-what-parameters-are-available-and-what-they-do","title":"How do I know what parameters are available, and what they do?","text":"

A summary of Task parameters is available through the lute_help program.

> utilities/lute_help -t [TaskName]\n

Note that some parameters may say \"Unknown description\" - this either means they use an old-style definition that does not include parameter help, or they have some internal use. In particular, you will see this for lute_config on every Task; this parameter is filled in automatically and should be ignored. For example:

> utilities/lute_help -t IndexCrystFEL\nINFO:__main__:Fetching parameter information for IndexCrystFEL.\nIndexCrystFEL\n-------------\nParameters for CrystFEL's `indexamajig`.\n\nThere are many parameters, and many combinations. For more information on\nusage, please refer to the CrystFEL documentation, here:\nhttps://www.desy.de/~twhite/crystfel/manual-indexamajig.html\n\n\nRequired Parameters:\n--------------------\n[...]\n\nAll Parameters:\n-------------\n[...]\n\nhighres (number)\n    Mark all pixels greater than `x` as bad.\n\nprofile (boolean) - Default: False\n    Display timing data to monitor performance.\n\ntemp_dir (string)\n    Specify a path for the temp files folder.\n\nwait_for_file (integer) - Default: 0\n    Wait at most `x` seconds for a file to be created. A value of -1 means wait forever.\n\nno_image_data (boolean) - Default: False\n    Load only the metadata, no images. Can check indexability without high data requirements.\n\n[...]\n
"},{"location":"#running-managed-tasks-and-workflows-dags","title":"Running Managed Tasks and Workflows (DAGs)","text":"

After a YAML file has been filled in you can run a Task. There are multiple ways to submit a Task, but there are 3 that are most likely:

  1. Run a single managed Task interactively by running python ...
  2. Run a single managed Task as a batch job (e.g. on S3DF) via a SLURM submission submit_slurm.sh ...
  3. Run a DAG (workflow with multiple managed Tasks).

These will be covered in turn below; however, in general all methods will require two parameters: the path to a configuration YAML file, and the name of the managed Task or workflow you want to run. When submitting via SLURM or submitting an entire workflow there are additional parameters to control these processes.

"},{"location":"#running-single-managed-tasks-interactively","title":"Running single managed Tasks interactively","text":"

The simplest submission method is just to run Python interactively. In most cases this is not practical for long-running analysis, but may be of use for short Tasks or when debugging. From the root directory of the LUTE repository (or after installation) you can use the run_task.py script:

> python -B [-O] run_task.py -t <ManagedTaskName> -c </path/to/config/yaml>\n

The command-line arguments in square brackets [] are optional, while those in <> must be provided:

"},{"location":"#submitting-a-single-managed-task-as-a-batch-job","title":"Submitting a single managed Task as a batch job","text":"

On S3DF you can also submit individual managed Tasks to run as batch jobs. To do so use launch_scripts/submit_slurm.sh

> launch_scripts/submit_slurm.sh -t <ManagedTaskName> -c </path/to/config/yaml> [--debug] $SLURM_ARGS\n

As before, command-line arguments in square brackets [] are optional, while those in <> must be provided.

In addition to the LUTE-specific arguments, SLURM arguments must also be provided ($SLURM_ARGS above). You can provide as many as you want; however, you will need to provide at least:

You will likely also want to provide at a minimum:

In general, it is best to prefer the long form of each SLURM argument (--arg=<...>) in order to avoid potential clashes with present or future LUTE arguments.
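
As a concrete illustration only (the partition, account, and core count below are placeholders to adapt to your own allocation), a full submission using long-form SLURM arguments might look like:

> launch_scripts/submit_slurm.sh -t PeakFinderPyAlgos -c /path/to/config.yaml --partition=milano --account=lcls:myexp --ntasks=100\n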

"},{"location":"#workflow-dag-submission","title":"Workflow (DAG) submission","text":"

Finally, you can submit a full workflow (e.g. SFX analysis, smalldata production and summary results, geometry optimization...). This can be done using a single script, submit_launch_airflow.sh, similarly to the SLURM submission above:

> launch_scripts/submit_launch_airflow.sh /path/to/lute/launch_scripts/launch_airflow.py -c </path/to/yaml.yaml> -w <dag_name> [--debug] [--test] [-e <exp>] [-r <run>] $SLURM_ARGS\n

The submission process is slightly more complicated in this case. A more in-depth explanation is provided under \"Airflow Launch and DAG Execution Steps\" in the advanced usage section below. The parameters are as follows - as before, command-line arguments in square brackets [] are optional, while those in <> must be provided:

The $SLURM_ARGS must be provided in the same manner as when submitting an individual managed Task by hand to be run as a batch job with the script above. Note that these parameters will be used as the starting point for the SLURM arguments of every managed Task in the DAG; however, individual steps in the DAG may have built-in overrides where appropriate to make sure that a step is not submitted with potentially incompatible arguments. For example, a single-threaded analysis Task may be capped to running on one core, even if in general everything should be running on 100 cores per the SLURM arguments provided. These caps are added during development and cannot be disabled through configuration changes in the YAML.

DAG List

"},{"location":"#dag-submission-from-the-elog","title":"DAG Submission from the eLog","text":"

You can use the script in the previous section to submit jobs through the eLog. To do so navigate to the Workflow > Definitions tab using the blue navigation bar at the top of the eLog. On this tab, in the top-right corner (underneath the help and zoom icons) you can click the + sign to add a new workflow. This will bring up a \"Workflow definition\" UI window. When filling out the eLog workflow definition the following fields are needed (all of them):

Upon clicking create you will see a new entry in the table on the definitions page. In order to run MANUAL workflows, or re-run automatic workflows, you must navigate to the Workflows > Control tab. For each acquisition run you will find a drop down menu under the Job column. To submit a workflow you select it from this drop down menu by the Name you provided when creating its definition.

"},{"location":"#advanced-usage","title":"Advanced Usage","text":""},{"location":"#variable-substitution-in-yaml-files","title":"Variable Substitution in YAML Files","text":"

Using validators, it is possible to define (generally, default) model parameters for a Task in terms of other parameters. It is also possible to use validated Pydantic model parameters to substitute values into a configuration file required to run a third party Task (e.g. some Tasks may require their own JSON, TOML files, etc. to run properly). For more information on these types of substitutions, refer to the new_task.md documentation on Task creation.

These types of substitutions, however, have a limitation in that they are not easily adapted at run time. They therefore address only a small number of the possible combinations in the dependencies between different input parameters. In order to support more complex relationships between parameters, variable substitutions can also be used in the configuration YAML itself. Using a syntax similar to Jinja templates, you can define values for YAML parameters in terms of other parameters or environment variables. The values are substituted before Pydantic attempts to validate the configuration.

It is perhaps easiest to illustrate with an example. A test case is provided in config/test_var_subs.yaml and is reproduced here:

%YAML 1.3\n---\ntitle: \"Configuration to Test YAML Substitution\"\nexperiment: \"TestYAMLSubs\"\nrun: 12\ndate: \"2024/05/01\"\nlute_version: 0.1\ntask_timeout: 600\nwork_dir: \"/sdf/scratch/users/d/dorlhiac\"\n...\n---\nOtherTask:\n  useful_other_var: \"USE ME!\"\n\nNonExistentTask:\n  test_sub: \"/path/to/{{ experiment }}/file_r{{ run:04d }}.input\"         # Substitute `experiment` and `run` from header above\n  test_env_sub: \"/path/to/{{ $EXPERIMENT }}/file.input\"                   # Substitute from the environment variable $EXPERIMENT\n  test_nested:\n    a: \"outfile_{{ run }}_one.out\"                                        # Substitute `run` from header above\n    b:\n      c: \"outfile_{{ run }}_two.out\"                                      # Also substitute `run` from header above\n      d: \"{{ OtherTask.useful_other_var }}\"                               # Substitute `useful_other_var` from `OtherTask`\n  test_fmt: \"{{ run:04d }}\"                                               # Substitute `run` and format as 0012\n  test_env_fmt: \"{{ $RUN:04d }}\"                                          # Substitute environment variable $RUN and pad to 4 w/ zeros\n...\n

Input parameters in the config YAML can be substituted with either other input parameters or environment variables, with or without limited string formatting. All substitutions occur between double curly brackets: {{ VARIABLE_TO_SUBSTITUTE }}. Environment variables are indicated by a $ in front of the variable name. Parameters from the header, i.e. the first YAML document (top section) containing the run, experiment, version fields, etc., can be substituted without any qualification. If you want to use the run parameter, you can substitute it using {{ run }}. All other parameters, i.e. those from other Tasks or within Tasks, must use a qualified name. Nested levels are delimited using a period (.). E.g. consider a structure like:

Task:\n  param_set:\n    a: 1\n    b: 2\n    c: 3\n

In order to use parameter c, you would use {{ Task.param_set.c }} as the substitution.

Take care when using substitutions! This process will not try to guess for you. When a substitution is not available, e.g. due to misspelling, one of two things will happen:

Defining your own parameters

The configuration file is not validated in its totality, only on a Task-by-Task basis, but it is read in its totality. E.g. when running MyTask, only that portion of the configuration is validated even though the entire file has been read and is available for substitutions. As a result, it is safe to introduce extra entries into the YAML file, as long as they are not entered under a specific Task's configuration. This may be useful for creating your own global substitutions, for example if there is a key variable that may be used across different Tasks. Consider a case where you want to create a more generic configuration file in which a single variable is used by multiple Tasks. This single variable may be changed between experiments, for instance, but is likely static for the duration of a single set of analyses. In order to avoid mistakes when changing the configuration between experiments, you can define this special variable (or variables) as a separate entry in the YAML and make use of substitutions in each Task's configuration. This way the variable only needs to be changed in one place.

# Define our substitution. This is only for substitutions!\nMY_SPECIAL_SUB: \"EXPMT_DEPENDENT_VALUE\"  # Can change here once per experiment!\n\nRunTask1:\n  special_var: \"{{ MY_SPECIAL_SUB }}\"\n  var_1: 1\n  var_2: \"a\"\n  # ...\n\nRunTask2:\n  special_var: \"{{ MY_SPECIAL_SUB }}\"\n  var_3: \"abcd\"\n  var_4: 123\n  # ...\n\nRunTask3:\n  special_var: \"{{ MY_SPECIAL_SUB }}\"\n  #...\n\n# ... and so on\n
"},{"location":"#gotchas","title":"Gotchas!","text":"

Order matters

While in general you can use parameters that appear later in a YAML document to substitute for values of parameters that appear earlier, the substitutions themselves will be performed in order of appearance. It is therefore NOT possible to correctly use a later parameter as a substitution for an earlier one, if the later one itself depends on a substitution. The YAML document, however, can be rearranged without error. The order in the YAML document has no effect on execution order which is determined purely by the workflow definition. As mentioned above, the document is not validated in its entirety so rearrangements are allowed. For example consider the following situation which produces an incorrect substitution:

%YAML 1.3\n---\ntitle: \"Configuration to Test YAML Substitution\"\nexperiment: \"TestYAMLSubs\"\nrun: 12\ndate: \"2024/05/01\"\nlute_version: 0.1\ntask_timeout: 600\nwork_dir: \"/sdf/data/lcls/ds/exp/experiment/scratch\"\n...\n---\nRunTaskOne:\n  input_dir: \"{{ RunTaskTwo.path }}\"  # Will incorrectly be \"{{ work_dir }}/additional_path/{{ $RUN }}\"\n  # ...\n\nRunTaskTwo:\n  # Remember `work_dir` and `run` come from the header document and don't need to\n  # be qualified\n  path: \"{{ work_dir }}/additional_path/{{ run }}\"\n...\n

This configuration can be rearranged to achieve the desired result:

%YAML 1.3\n---\ntitle: \"Configuration to Test YAML Substitution\"\nexperiment: \"TestYAMLSubs\"\nrun: 12\ndate: \"2024/05/01\"\nlute_version: 0.1\ntask_timeout: 600\nwork_dir: \"/sdf/data/lcls/ds/exp/experiment/scratch\"\n...\n---\nRunTaskTwo:\n  # Remember `work_dir` comes from the header document and doesn't need to be qualified\n  path: \"{{ work_dir }}/additional_path/{{ run }}\"\n\nRunTaskOne:\n  input_dir: \"{{ RunTaskTwo.path }}\"  # Will now be /sdf/data/lcls/ds/exp/experiment/scratch/additional_path/12\n  # ...\n...\n

On the other hand, relationships such as these may point to inconsistencies in the dependencies between Tasks, which may warrant a refactor.

Found unhashable key

To avoid YAML parsing issues when using the substitution syntax, be sure to quote your substitutions. Before substitution is performed, a dictionary is first constructed by the pyyaml package which parses the document - it may fail to parse the document and raise an exception if the substitutions are not quoted. E.g.

# USE THIS\nMyTask:\n  var_sub: \"{{ other_var:04d }}\"\n\n# **DO NOT** USE THIS\nMyTask:\n  var_sub: {{ other_var:04d }}\n

During validation, Pydantic will by default cast variables if possible; because of this, it is generally safe to use strings for substitutions. E.g. if your parameter expects an integer, and after substitution you pass \"2\", Pydantic will cast this to the int 2 and validation will succeed. As part of the substitution process, limited type casting will also be handled if it is necessary for any formatting strings provided. E.g. \"{{ run:04d }}\" requires that run be an integer, so it will be treated as such in order to apply the formatting.
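
As a minimal sketch (not LUTE code; the model and field names below are hypothetical), the casting behaviour described above can be illustrated as:

from pydantic import BaseModel\n\nclass ExampleParams(BaseModel):\n    run: int  # Expects an integer\n\n# After substitution the YAML value arrives as the string \"2\"...\nparams = ExampleParams(run=\"2\")\nassert params.run == 2  # ...and Pydantic casts it to the int 2\n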

"},{"location":"#custom-run-time-dags","title":"Custom Run-Time DAGs","text":"

In most cases, standard DAGs should be called as described above. However, Airflow also supports the dynamic creation of DAGs, e.g. to vary the input data to various steps, or the number of steps that will occur. Some of this functionality has been used to allow for user-defined DAGs which are passed in the form of a dictionary, allowing Airflow to construct the workflow as it is running.

A basic YAML syntax is used to construct a series of nested dictionaries which define a DAG. Consider a simplified serial femtosecond crystallography DAG which runs peak finding through merging and then calculates some statistics. I.e. we want an execution order that looks like:

peak_finder >> indexer >> merger >> hkl_comparer\n

We can alternatively define this DAG in YAML:

task_name: PeakFinderPyAlgos\nslurm_params: ''\nnext:\n- task_name: CrystFELIndexer\n  slurm_params: ''\n  next:\n  - task_name: PartialatorMerger\n    slurm_params: ''\n    next:\n    - task_name: HKLComparer\n      slurm_params: ''\n      next: []\n

I.e. we define a tree where each node is constructed using Node(task_name: str, slurm_params: str, next: List[Node]).

As a second example, to run task1 followed by task2 and task3 in parallel we would use:

task_name: Task1\nslurm_params: ''\nnext:\n- task_name: Task2\n  slurm_params: ''\n  next: []\n- task_name: Task3\n  slurm_params: ''\n  next: []\n

In order to run a DAG defined in this way, we pass the path of the YAML file containing the definition to the launch script using -W <path_to_dag>, instead of selecting a standard DAG by name with -w. E.g.

/path/to/lute/launch_scripts/submit_launch_airflow.sh /path/to/lute/launch_scripts/launch_airflow.py -e <exp> -r <run> -c /path/to/config -W <path_to_dag> --test [--debug] [SLURM_ARGS]\n

Note that fewer options are currently supported for configuring the operators for each step of the DAG. The SLURM arguments can be replaced in their entirety using a custom slurm_params string, but individual options cannot be modified.
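
As a hedged sketch (the SLURM values shown are placeholders), a custom DAG in which one step replaces its SLURM arguments in their entirety might look like:

task_name: Task1\nslurm_params: ''\nnext:\n- task_name: Task2\n  slurm_params: '--partition=milano --account=lcls:myexp --ntasks=1'  # Replaces ALL SLURM arguments for this step\n  next: []\n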

"},{"location":"#debug-environment-variables","title":"Debug Environment Variables","text":"

Special markers have been inserted at certain points in the execution flow for LUTE. These can be enabled by setting the environment variables detailed below. These are intended to allow developers to exit the program at certain points to investigate behaviour or a bug. For instance, when working on configuration parsing, an environment variable can be set which exits the program after passing this step. This allows you to run LUTE otherwise as normal (described above), without having to modify any additional code or insert your own early exits.

Types of debug markers:

Developers can insert these markers as needed into their code to add new exit points, although as a rule of thumb they should be used sparingly, and generally only after major steps in the execution flow (e.g. after parsing, after beginning a task, after returning a result, etc.).

In order to include a new marker in your code:

from lute.execution.debug_utils import LUTE_DEBUG_EXIT\n\ndef my_code() -> None:\n    # ...\n    LUTE_DEBUG_EXIT(\"MYENVVAR\", \"Additional message to print\")\n    # If MYENVVAR is not set, the above function does nothing\n

You can enable a marker by setting the corresponding environment variable to 1, e.g. to enable the example marker above while running Tester:

MYENVVAR=1 python -B run_task.py -t Tester -c config/test.yaml\n
"},{"location":"#currently-used-environment-variables","title":"Currently used environment variables","text":""},{"location":"#airflow-launch-and-dag-execution-steps","title":"Airflow Launch and DAG Execution Steps","text":"

The Airflow launch process actually involves a number of steps, and is rather complicated. There are two wrapper steps prior to getting to the actual Airflow API communication.

  1. launch_scripts/submit_launch_airflow.sh is run.
  2. This script calls /sdf/group/lcls/ds/tools/lute_launcher with all the same parameters that it was called with.
  3. lute_launcher runs the launch_scripts/launch_airflow.py script which was provided as the first argument. This is the true launch script.
  4. launch_airflow.py communicates with the Airflow API, requesting that a specific DAG be launched. It then continues to run, and gathers the individual logs and the exit status of each step of the DAG.
  5. Airflow will then enter a loop of communication where it asks the JID to submit each step of the requested DAG as a batch job using launch_scripts/submit_slurm.sh.

There are some specific reasons for this complexity:

"},{"location":"usage/","title":"Setup","text":"

LUTE is publicly available on GitHub. To run it, the first step is to clone the repository:

# Navigate to the directory of your choice.\ngit clone git@github.com:slac-lcls/lute\n

The repository directory structure is as follows:

lute\n  |--- config             # Configuration YAML files (see below) and templates for third party config\n  |--- docs               # Documentation (including this page)\n  |--- launch_scripts     # Entry points for using SLURM and communicating with Airflow\n  |--- lute               # Code\n        |--- run_task.py  # Script to run an individual managed Task\n        |--- ...\n  |--- utilities          # Help utility programs\n  |--- workflows          # This directory contains workflow definitions. It is synced elsewhere and not used directly.\n\n

In general, most interactions with the software will be through scripts located in the launch_scripts directory. Some users (for certain use-cases) may also choose to run the run_task.py script directly - its location is highlighted in the hierarchy above. To begin, you will need a YAML file; templates are available in the config directory. The structure of the YAML file and how to use the various launch scripts are described in more detail below.

"},{"location":"usage/#a-note-on-utilties","title":"A note on utilties","text":"

In the utilities directory there are two useful programs to provide assistance with using the software:

"},{"location":"usage/#basic-usage","title":"Basic Usage","text":""},{"location":"usage/#overview","title":"Overview","text":"

LUTE runs code as Tasks that are managed by an Executor. The Executor modifies the environment the Task runs in, and controls details of inter-process communication, reporting of results to the eLog, etc. Combinations of specific Executors and Tasks are already provided and are referred to as managed Tasks. Managed Tasks are submitted as a single unit. They can be run individually, or a series of independent steps can be submitted all at once in the form of a workflow, or directed acyclic graph (DAG). This latter option makes use of Airflow to manage the individual execution steps.

Running analysis with LUTE is the process of submitting one or more managed Tasks. This is generally a two-step process.

  1. First, a configuration YAML file is prepared. This contains the parameterizations of all the Tasks which you may run.
  2. Individual managed Task submission, or workflow (DAG) submission.

These two steps are described below.

"},{"location":"usage/#preparing-a-configuration-yaml","title":"Preparing a Configuration YAML","text":"

All Tasks are parameterized through a single configuration YAML file - even third party code which requires its own configuration files is managed through this YAML file. The basic structure is split into two documents, a brief header section which contains information that is applicable across all Tasks, such as the experiment name, run numbers and the working directory, followed by per Task parameters:

%YAML 1.3\n---\ntitle: \"Some title.\"\nexperiment: \"MYEXP123\"\n# run: 12 # Does not need to be provided\ndate: \"2024/05/01\"\nlute_version: 0.1\ntask_timeout: 600\nwork_dir: \"/sdf/scratch/users/d/dorlhiac\"\n...\n---\nTaskOne:\n  param_a: 123\n  param_b: 456\n  param_c:\n    sub_var: 3\n    sub_var2: 4\n\nTaskTwo:\n  new_param1: 3\n  new_param2: 4\n\n# ...\n...\n

In the first document, the header, it is important that the work_dir is properly specified. This is the root directory to which Task outputs will be written and where the LUTE database will be stored. It may also be desirable to modify the task_timeout parameter, which defines the time limit for individual Task jobs. By default it is set to 10 minutes, although this may not be sufficient for long-running jobs. This value will be applied to all Tasks, so it should account for the longest-running job you expect.

The actual analysis parameters are defined in the second document. As these vary from Task to Task, a full description will not be provided here. An actual template with real Task parameters is available in config/test.yaml. Your analysis POC can also help you set up and choose the correct Tasks to include as a starting point. The template YAML file has further descriptions of what each parameter does and how to fill it out. You can also refer to the lute_help program described under the following sub-heading.

Some things to consider and possible points of confusion:

Managed Task The Task it Runs Task Description SmallDataProducer SubmitSMD Smalldata production CrystFELIndexer IndexCrystFEL Crystallographic indexing PartialatorMerger MergePartialator Crystallographic merging HKLComparer CompareHKL Crystallographic figures of merit HKLManipulator ManipulateHKL Crystallographic format conversions DimpleSolver DimpleSolve Crystallographic structure solution with molecular replacement PeakFinderPyAlgos FindPeaksPyAlgos Peak finding with PyAlgos algorithm. PeakFinderPsocake FindPeaksPsocake Peak finding with psocake algorithm. StreamFileConcatenator ConcatenateStreamFiles Stream file concatenation."},{"location":"usage/#how-do-i-know-what-parameters-are-available-and-what-they-do","title":"How do I know what parameters are available, and what they do?","text":"

A summary of Task parameters is available through the lute_help program.

> utilities/lute_help -t [TaskName]\n

Note that some parameters may say \"Unknown description\" - this either means they use an old-style definition that does not include parameter help, or they have some internal use. In particular, you will see this for lute_config on every Task; this parameter is filled in automatically and should be ignored. For example:

> utilities/lute_help -t IndexCrystFEL\nINFO:__main__:Fetching parameter information for IndexCrystFEL.\nIndexCrystFEL\n-------------\nParameters for CrystFEL's `indexamajig`.\n\nThere are many parameters, and many combinations. For more information on\nusage, please refer to the CrystFEL documentation, here:\nhttps://www.desy.de/~twhite/crystfel/manual-indexamajig.html\n\n\nRequired Parameters:\n--------------------\n[...]\n\nAll Parameters:\n-------------\n[...]\n\nhighres (number)\n    Mark all pixels greater than `x` as bad.\n\nprofile (boolean) - Default: False\n    Display timing data to monitor performance.\n\ntemp_dir (string)\n    Specify a path for the temp files folder.\n\nwait_for_file (integer) - Default: 0\n    Wait at most `x` seconds for a file to be created. A value of -1 means wait forever.\n\nno_image_data (boolean) - Default: False\n    Load only the metadata, no images. Can check indexability without high data requirements.\n\n[...]\n
"},{"location":"usage/#running-managed-tasks-and-workflows-dags","title":"Running Managed Tasks and Workflows (DAGs)","text":"

After a YAML file has been filled in you can run a Task. There are multiple ways to submit a Task, but there are 3 that are most likely:

  1. Run a single managed Task interactively by running python ...
  2. Run a single managed Task as a batch job (e.g. on S3DF) via a SLURM submission submit_slurm.sh ...
  3. Run a DAG (workflow with multiple managed Tasks).

These will be covered in turn below; however, in general all methods will require two parameters: the path to a configuration YAML file, and the name of the managed Task or workflow you want to run. When submitting via SLURM or submitting an entire workflow there are additional parameters to control these processes.

"},{"location":"usage/#running-single-managed-tasks-interactively","title":"Running single managed Tasks interactively","text":"

The simplest submission method is just to run Python interactively. In most cases this is not practical for long-running analysis, but may be of use for short Tasks or when debugging. From the root directory of the LUTE repository (or after installation) you can use the run_task.py script:

> python -B [-O] run_task.py -t <ManagedTaskName> -c </path/to/config/yaml>\n

The command-line arguments in square brackets [] are optional, while those in <> must be provided:

"},{"location":"usage/#submitting-a-single-managed-task-as-a-batch-job","title":"Submitting a single managed Task as a batch job","text":"

On S3DF you can also submit individual managed Tasks to run as batch jobs. To do so use launch_scripts/submit_slurm.sh

> launch_scripts/submit_slurm.sh -t <ManagedTaskName> -c </path/to/config/yaml> [--debug] $SLURM_ARGS\n

As before, command-line arguments in square brackets [] are optional, while those in <> must be provided.

In addition to the LUTE-specific arguments, SLURM arguments must also be provided ($SLURM_ARGS above). You can provide as many as you want; however, you will need to provide at least:

You will likely also want to provide at a minimum:

In general, it is best to prefer the long form of each SLURM argument (--arg=<...>) in order to avoid potential clashes with present or future LUTE arguments.
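
As a concrete illustration only (the partition, account, and core count below are placeholders to adapt to your own allocation), a full submission using long-form SLURM arguments might look like:

> launch_scripts/submit_slurm.sh -t PeakFinderPyAlgos -c /path/to/config.yaml --partition=milano --account=lcls:myexp --ntasks=100\n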

"},{"location":"usage/#workflow-dag-submission","title":"Workflow (DAG) submission","text":"

Finally, you can submit a full workflow (e.g. SFX analysis, smalldata production and summary results, geometry optimization...). This can be done using a single script, submit_launch_airflow.sh, similarly to the SLURM submission above:

> launch_scripts/submit_launch_airflow.sh /path/to/lute/launch_scripts/launch_airflow.py -c </path/to/yaml.yaml> -w <dag_name> [--debug] [--test] [-e <exp>] [-r <run>] $SLURM_ARGS\n

The submission process is slightly more complicated in this case. A more in-depth explanation is provided under \"Airflow Launch and DAG Execution Steps\" in the advanced usage section below. The parameters are as follows - as before, command-line arguments in square brackets [] are optional, while those in <> must be provided:

The $SLURM_ARGS must be provided in the same manner as when submitting an individual managed Task by hand to be run as a batch job with the script above. Note that these parameters will be used as the starting point for the SLURM arguments of every managed Task in the DAG; however, individual steps in the DAG may have built-in overrides where appropriate to make sure that a step is not submitted with potentially incompatible arguments. For example, a single-threaded analysis Task may be capped to running on one core, even if in general everything should be running on 100 cores per the SLURM arguments provided. These caps are added during development and cannot be disabled through configuration changes in the YAML.

DAG List

"},{"location":"usage/#dag-submission-from-the-elog","title":"DAG Submission from the eLog","text":"

You can use the script in the previous section to submit jobs through the eLog. To do so navigate to the Workflow > Definitions tab using the blue navigation bar at the top of the eLog. On this tab, in the top-right corner (underneath the help and zoom icons) you can click the + sign to add a new workflow. This will bring up a \"Workflow definition\" UI window. When filling out the eLog workflow definition the following fields are needed (all of them):

Upon clicking create you will see a new entry in the table on the definitions page. In order to run MANUAL workflows, or re-run automatic workflows, you must navigate to the Workflows > Control tab. For each acquisition run you will find a drop down menu under the Job column. To submit a workflow you select it from this drop down menu by the Name you provided when creating its definition.

"},{"location":"usage/#advanced-usage","title":"Advanced Usage","text":""},{"location":"usage/#variable-substitution-in-yaml-files","title":"Variable Substitution in YAML Files","text":"

Using validators, it is possible to define (generally, default) model parameters for a Task in terms of other parameters. It is also possible to use validated Pydantic model parameters to substitute values into a configuration file required to run a third party Task (e.g. some Tasks may require their own JSON, TOML files, etc. to run properly). For more information on these types of substitutions, refer to the new_task.md documentation on Task creation.

These types of substitutions, however, have a limitation in that they are not easily adapted at run time. They therefore address only a small number of the possible combinations in the dependencies between different input parameters. In order to support more complex relationships between parameters, variable substitutions can also be used in the configuration YAML itself. Using a syntax similar to Jinja templates, you can define values for YAML parameters in terms of other parameters or environment variables. The values are substituted before Pydantic attempts to validate the configuration.

It is perhaps easiest to illustrate with an example. A test case is provided in config/test_var_subs.yaml and is reproduced here:

%YAML 1.3\n---\ntitle: \"Configuration to Test YAML Substitution\"\nexperiment: \"TestYAMLSubs\"\nrun: 12\ndate: \"2024/05/01\"\nlute_version: 0.1\ntask_timeout: 600\nwork_dir: \"/sdf/scratch/users/d/dorlhiac\"\n...\n---\nOtherTask:\n  useful_other_var: \"USE ME!\"\n\nNonExistentTask:\n  test_sub: \"/path/to/{{ experiment }}/file_r{{ run:04d }}.input\"         # Substitute `experiment` and `run` from header above\n  test_env_sub: \"/path/to/{{ $EXPERIMENT }}/file.input\"                   # Substitute from the environment variable $EXPERIMENT\n  test_nested:\n    a: \"outfile_{{ run }}_one.out\"                                        # Substitute `run` from header above\n    b:\n      c: \"outfile_{{ run }}_two.out\"                                      # Also substitute `run` from header above\n      d: \"{{ OtherTask.useful_other_var }}\"                               # Substitute `useful_other_var` from `OtherTask`\n  test_fmt: \"{{ run:04d }}\"                                               # Substitute `run` and format as 0012\n  test_env_fmt: \"{{ $RUN:04d }}\"                                          # Substitute environment variable $RUN and pad to 4 w/ zeros\n...\n

Input parameters in the config YAML can be substituted with either other input parameters or environment variables, with or without limited string formatting. All substitutions occur between double curly brackets: {{ VARIABLE_TO_SUBSTITUTE }}. Environment variables are indicated by a $ in front of the variable name. Parameters from the header, i.e. the first YAML document (top section) containing the run, experiment, version fields, etc., can be substituted without any qualification. If you want to use the run parameter, you can substitute it using {{ run }}. All other parameters, i.e. those from other Tasks or within Tasks, must use a qualified name. Nested levels are delimited using a period (.). E.g. consider a structure like:

Task:\n  param_set:\n    a: 1\n    b: 2\n    c: 3\n

In order to use parameter c, you would use {{ Task.param_set.c }} as the substitution.

Take care when using substitutions! This process will not try to guess for you. When a substitution is not available, e.g. due to misspelling, one of two things will happen:

Defining your own parameters

The configuration file is not validated in its totality, only on a Task-by-Task basis, but it is read in its totality. E.g. when running MyTask, only that portion of the configuration is validated even though the entire file has been read and is available for substitutions. As a result, it is safe to introduce extra entries into the YAML file, as long as they are not entered under a specific Task's configuration. This may be useful for creating your own global substitutions, for example if there is a key variable that may be used across different Tasks. Consider a case where you want to create a more generic configuration file in which a single variable is used by multiple Tasks. This single variable may be changed between experiments, for instance, but is likely static for the duration of a single set of analyses. In order to avoid mistakes when changing the configuration between experiments, you can define this special variable (or variables) as a separate entry in the YAML and make use of substitutions in each Task's configuration. This way the variable only needs to be changed in one place.

# Define our substitution. This is only for substitutions!\nMY_SPECIAL_SUB: \"EXPMT_DEPENDENT_VALUE\"  # Can change here once per experiment!\n\nRunTask1:\n  special_var: \"{{ MY_SPECIAL_SUB }}\"\n  var_1: 1\n  var_2: \"a\"\n  # ...\n\nRunTask2:\n  special_var: \"{{ MY_SPECIAL_SUB }}\"\n  var_3: \"abcd\"\n  var_4: 123\n  # ...\n\nRunTask3:\n  special_var: \"{{ MY_SPECIAL_SUB }}\"\n  #...\n\n# ... and so on\n
"},{"location":"usage/#gotchas","title":"Gotchas!","text":"

Order matters

While in general you can use parameters that appear later in a YAML document to substitute for values of parameters that appear earlier, the substitutions themselves will be performed in order of appearance. It is therefore NOT possible to correctly use a later parameter as a substitution for an earlier one, if the later one itself depends on a substitution. The YAML document, however, can be rearranged without error. The order in the YAML document has no effect on execution order which is determined purely by the workflow definition. As mentioned above, the document is not validated in its entirety so rearrangements are allowed. For example consider the following situation which produces an incorrect substitution:

%YAML 1.3\n---\ntitle: \"Configuration to Test YAML Substitution\"\nexperiment: \"TestYAMLSubs\"\nrun: 12\ndate: \"2024/05/01\"\nlute_version: 0.1\ntask_timeout: 600\nwork_dir: \"/sdf/data/lcls/ds/exp/experiment/scratch\"\n...\n---\nRunTaskOne:\n  input_dir: \"{{ RunTaskTwo.path }}\"  # Will incorrectly be \"{{ work_dir }}/additional_path/{{ $RUN }}\"\n  # ...\n\nRunTaskTwo:\n  # Remember `work_dir` and `run` come from the header document and don't need to\n  # be qualified\n  path: \"{{ work_dir }}/additional_path/{{ run }}\"\n...\n

This configuration can be rearranged to achieve the desired result:

%YAML 1.3\n---\ntitle: \"Configuration to Test YAML Substitution\"\nexperiment: \"TestYAMLSubs\"\nrun: 12\ndate: \"2024/05/01\"\nlute_version: 0.1\ntask_timeout: 600\nwork_dir: \"/sdf/data/lcls/ds/exp/experiment/scratch\"\n...\n---\nRunTaskTwo:\n  # Remember `work_dir` comes from the header document and doesn't need to be qualified\n  path: \"{{ work_dir }}/additional_path/{{ run }}\"\n\nRunTaskOne:\n  input_dir: \"{{ RunTaskTwo.path }}\"  # Will now be /sdf/data/lcls/ds/exp/experiment/scratch/additional_path/12\n  # ...\n...\n

On the other hand, relationships such as these may point to inconsistencies in the dependencies between Tasks, which may warrant a refactor.

Found unhashable key

To avoid YAML parsing issues when using the substitution syntax, be sure to quote your substitutions. Before substitution is performed, a dictionary is first constructed by the pyyaml package which parses the document - it may fail to parse the document and raise an exception if the substitutions are not quoted. E.g.

# USE THIS\nMyTask:\n  var_sub: \"{{ other_var:04d }}\"\n\n# **DO NOT** USE THIS\nMyTask:\n  var_sub: {{ other_var:04d }}\n

During validation, Pydantic will by default cast variables if possible; because of this, it is generally safe to use strings for substitutions. E.g. if your parameter expects an integer, and after substitution you pass \"2\", Pydantic will cast this to the int 2 and validation will succeed. As part of the substitution process, limited type casting will also be handled if it is necessary for any formatting strings provided. E.g. \"{{ run:04d }}\" requires that run be an integer, so it will be treated as such in order to apply the formatting.
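
As a minimal sketch (not LUTE code; the model and field names below are hypothetical), the casting behaviour described above can be illustrated as:

from pydantic import BaseModel\n\nclass ExampleParams(BaseModel):\n    run: int  # Expects an integer\n\n# After substitution the YAML value arrives as the string \"2\"...\nparams = ExampleParams(run=\"2\")\nassert params.run == 2  # ...and Pydantic casts it to the int 2\n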

"},{"location":"usage/#custom-run-time-dags","title":"Custom Run-Time DAGs","text":"

In most cases, standard DAGs should be called as described above. However, Airflow also supports the dynamic creation of DAGs, e.g. to vary the input data to various steps, or the number of steps that will occur. Some of this functionality has been used to allow for user-defined DAGs which are passed in the form of a dictionary, allowing Airflow to construct the workflow as it is running.

A basic YAML syntax is used to construct a series of nested dictionaries which define a DAG. Consider a simplified serial femtosecond crystallography DAG which runs peak finding through merging and then calculates some statistics. I.e. we want an execution order that looks like:

peak_finder >> indexer >> merger >> hkl_comparer\n

We can alternatively define this DAG in YAML:

task_name: PeakFinderPyAlgos\nslurm_params: ''\nnext:\n- task_name: CrystFELIndexer\n  slurm_params: ''\n  next:\n  - task_name: PartialatorMerger\n    slurm_params: ''\n    next:\n    - task_name: HKLComparer\n      slurm_params: ''\n      next: []\n

I.e. we define a tree where each node is constructed using Node(task_name: str, slurm_params: str, next: List[Node]).

As a second example, to run task1 followed by task2 and task3 in parallel we would use:

task_name: Task1\nslurm_params: ''\nnext:\n- task_name: Task2\n  slurm_params: ''\n  next: []\n- task_name: Task3\n  slurm_params: ''\n  next: []\n

In order to run a DAG defined in this way, we pass the path of the YAML file containing the definition to the launch script using -W <path_to_dag>, instead of selecting a standard DAG by name with -w. E.g.

/path/to/lute/launch_scripts/submit_launch_airflow.sh /path/to/lute/launch_scripts/launch_airflow.py -e <exp> -r <run> -c /path/to/config -W <path_to_dag> --test [--debug] [SLURM_ARGS]\n

Note that fewer options are currently supported for configuring the operators for each step of the DAG. The SLURM arguments can be replaced in their entirety using a custom slurm_params string, but individual options cannot be modified.
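
As a hedged sketch (the SLURM values shown are placeholders), a custom DAG in which one step replaces its SLURM arguments in their entirety might look like:

task_name: Task1\nslurm_params: ''\nnext:\n- task_name: Task2\n  slurm_params: '--partition=milano --account=lcls:myexp --ntasks=1'  # Replaces ALL SLURM arguments for this step\n  next: []\n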

"},{"location":"usage/#debug-environment-variables","title":"Debug Environment Variables","text":"

Special markers have been inserted at certain points in the execution flow for LUTE. These can be enabled by setting the environment variables detailed below. These are intended to allow developers to exit the program at certain points to investigate behaviour or a bug. For instance, when working on configuration parsing, an environment variable can be set which exits the program after passing this step. This allows you to run LUTE otherwise as normal (described above), without having to modify any additional code or insert your own early exits.

Types of debug markers:

Developers can insert these markers as needed into their code to add new exit points, although as a rule of thumb they should be used sparingly, and generally only after major steps in the execution flow (e.g. after parsing, after beginning a task, after returning a result, etc.).

In order to include a new marker in your code:

from lute.execution.debug_utils import LUTE_DEBUG_EXIT\n\ndef my_code() -> None:\n    # ...\n    LUTE_DEBUG_EXIT(\"MYENVVAR\", \"Additional message to print\")\n    # If MYENVVAR is not set, the above function does nothing\n

You can enable a marker by setting the corresponding environment variable to 1, e.g. to enable the example marker above while running Tester:

MYENVVAR=1 python -B run_task.py -t Tester -c config/test.yaml\n
"},{"location":"usage/#currently-used-environment-variables","title":"Currently used environment variables","text":""},{"location":"usage/#airflow-launch-and-dag-execution-steps","title":"Airflow Launch and DAG Execution Steps","text":"

The Airflow launch process actually involves a number of steps, and is rather complicated. There are two wrapper steps prior to getting to the actual Airflow API communication.

  1. launch_scripts/submit_launch_airflow.sh is run.
  2. This script calls /sdf/group/lcls/ds/tools/lute_launcher with all the same parameters that it was called with.
  3. lute_launcher runs the launch_scripts/launch_airflow.py script which was provided as the first argument. This is the true launch script.
  4. launch_airflow.py communicates with the Airflow API, requesting that a specific DAG be launched. It then continues to run, and gathers the individual logs and the exit status of each step of the DAG.
  5. Airflow will then enter a loop of communication where it asks the JID to submit each step of the requested DAG as a batch job using launch_scripts/submit_slurm.sh.

There are some specific reasons for this complexity:

"},{"location":"adrs/","title":"Architecture Decision Records","text":" ADR No. Record Date Title Status 1 2023-11-06 All analysis Tasks inherit from a base class Accepted 2 2023-11-06 Analysis Task submission and communication is performed via Executors Accepted 3 2023-11-06 Executors will run all Tasks via subprocess Proposed 4 2023-11-06 Airflow Operators and LUTE Executors are separate entities. Proposed 5 2023-12-06 Task-Executor IPC is Managed by Communicator Objects Proposed 6 2024-02-12 Third-party Config Files Managed by Templates Rendered by ThirdPartyTasks Proposed 7 2024-02-12 Task Configuration is Stored in a Database Managed by Executors Proposed 8 2024-03-18 Airflow credentials/authorization requires special launch program. Proposed 9 2024-04-15 Airflow launch script will run as long lived batch job. Proposed"},{"location":"adrs/MADR_LICENSE/","title":"MADR LICENSE","text":"

Copyright 2022 ADR Github Organization

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \u201cSoftware\u201d), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED \u201cAS IS\u201d, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

"},{"location":"adrs/adr-1/","title":"[ADR-1] All Analysis Tasks Inherit from a Base Class","text":"

Date: 2023-11-06

"},{"location":"adrs/adr-1/#status","title":"Status","text":"

Accepted

"},{"location":"adrs/adr-1/#context-and-problem-statement","title":"Context and Problem Statement","text":""},{"location":"adrs/adr-1/#decision","title":"Decision","text":""},{"location":"adrs/adr-1/#decision-drivers","title":"Decision Drivers","text":""},{"location":"adrs/adr-1/#considered-options","title":"Considered Options","text":""},{"location":"adrs/adr-1/#consequences","title":"Consequences","text":""},{"location":"adrs/adr-1/#compliance","title":"Compliance","text":""},{"location":"adrs/adr-1/#metadata","title":"Metadata","text":""},{"location":"adrs/adr-2/","title":"[ADR-2] Analysis Task Submission and Communication is Performed Via Executors","text":"

Date: 2023-11-06

"},{"location":"adrs/adr-2/#status","title":"Status","text":"

Accepted

"},{"location":"adrs/adr-2/#context-and-problem-statement","title":"Context and Problem Statement","text":""},{"location":"adrs/adr-2/#decision","title":"Decision","text":""},{"location":"adrs/adr-2/#decision-drivers","title":"Decision Drivers","text":""},{"location":"adrs/adr-2/#considered-options","title":"Considered Options","text":""},{"location":"adrs/adr-2/#consequences","title":"Consequences","text":""},{"location":"adrs/adr-2/#compliance","title":"Compliance","text":""},{"location":"adrs/adr-2/#metadata","title":"Metadata","text":""},{"location":"adrs/adr-3/","title":"[ADR-3] Executors will run all Tasks via subprocess","text":"

Date: 2023-11-06

"},{"location":"adrs/adr-3/#status","title":"Status","text":"

Proposed

"},{"location":"adrs/adr-3/#context-and-problem-statement","title":"Context and Problem Statement","text":""},{"location":"adrs/adr-3/#decision","title":"Decision","text":""},{"location":"adrs/adr-3/#decision-drivers","title":"Decision Drivers","text":""},{"location":"adrs/adr-3/#considered-options","title":"Considered Options","text":""},{"location":"adrs/adr-3/#consequences","title":"Consequences","text":""},{"location":"adrs/adr-3/#compliance","title":"Compliance","text":""},{"location":"adrs/adr-3/#metadata","title":"Metadata","text":""},{"location":"adrs/adr-4/","title":"[ADR-4] Airflow Operators and LUTE Executors are Separate Entities","text":"

Date: 2023-11-06

"},{"location":"adrs/adr-4/#status","title":"Status","text":"

Proposed

"},{"location":"adrs/adr-4/#context-and-problem-statement","title":"Context and Problem Statement","text":""},{"location":"adrs/adr-4/#decision","title":"Decision","text":""},{"location":"adrs/adr-4/#decision-drivers","title":"Decision Drivers","text":"

*

"},{"location":"adrs/adr-4/#considered-options","title":"Considered Options","text":"

*

"},{"location":"adrs/adr-4/#consequences","title":"Consequences","text":"

*

"},{"location":"adrs/adr-4/#compliance","title":"Compliance","text":""},{"location":"adrs/adr-4/#metadata","title":"Metadata","text":""},{"location":"adrs/adr-5/","title":"[ADR-5] Task-Executor IPC is Managed by Communicator Objects","text":"

Date: 2023-12-06

"},{"location":"adrs/adr-5/#status","title":"Status","text":"

Proposed

"},{"location":"adrs/adr-5/#context-and-problem-statement","title":"Context and Problem Statement","text":""},{"location":"adrs/adr-5/#decision","title":"Decision","text":"

Communicator objects maintain simple read and write mechanisms for Message objects. The latter can contain arbitrary Python objects. Tasks do not interact directly with the communicator, but rather through specific instance methods which hide the communicator interfaces. Multiple Communicators can be used in parallel. The same Communicator objects are used identically at the Task and Executor layers - any changes to communication protocols are not transferred to the calling objects.

"},{"location":"adrs/adr-5/#decision-drivers","title":"Decision Drivers","text":""},{"location":"adrs/adr-5/#considered-options","title":"Considered Options","text":""},{"location":"adrs/adr-5/#communicator-types","title":"Communicator Types","text":""},{"location":"adrs/adr-5/#consequences","title":"Consequences","text":""},{"location":"adrs/adr-5/#compliance","title":"Compliance","text":""},{"location":"adrs/adr-5/#metadata","title":"Metadata","text":""},{"location":"adrs/adr-6/","title":"[ADR-6] Third-party Config Files Managed by Templates Rendered by ThirdPartyTasks","text":"

Date: 2024-02-12

"},{"location":"adrs/adr-6/#status","title":"Status","text":"

Proposed

"},{"location":"adrs/adr-6/#context-and-problem-statement","title":"Context and Problem Statement","text":""},{"location":"adrs/adr-6/#decision","title":"Decision","text":"

Templates will be used for the third party configuration files. A generic interface to heterogeneous templates will be provided through a combination of pydantic models and the ThirdPartyTask implementation. The pydantic models will label extra arguments to ThirdPartyTasks as being TemplateParameters. I.e. any extra parameters are considered to be for a templated configuration file. The ThirdPartyTask will find the necessary template and render it if any extra parameters are found. This puts the burden of correct parsing on the template definition itself.

"},{"location":"adrs/adr-6/#decision-drivers","title":"Decision Drivers","text":""},{"location":"adrs/adr-6/#considered-options","title":"Considered Options","text":""},{"location":"adrs/adr-6/#consequences","title":"Consequences","text":""},{"location":"adrs/adr-6/#compliance","title":"Compliance","text":""},{"location":"adrs/adr-6/#metadata","title":"Metadata","text":""},{"location":"adrs/adr-7/","title":"[ADR-7] Task Configuration is Stored in a Database Managed by Executors","text":"

Date: 2024-02-12

"},{"location":"adrs/adr-7/#status","title":"Status","text":"

Proposed

"},{"location":"adrs/adr-7/#context-and-problem-statement","title":"Context and Problem Statement","text":""},{"location":"adrs/adr-7/#decision","title":"Decision","text":"

Upon Task completion the managing Executor will write the AnalysisConfig object, including TaskParameters, results and generic configuration information, to a database. Some entries from this database can be retrieved to provide default values for TaskParameter fields; however, the Task itself has no knowledge of, and no access to, the database.
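
Loosely, the write step could resemble the sqlite3 sketch below, which assumes parameters have already been flattened into column names (the helper, table and column names are illustrative only, not the exact LUTE database layer):

import sqlite3\nfrom typing import Any, Dict\n\n\ndef write_task_entry(db_path: str, table: str, columns: Dict[str, Any]) -> None:\n    \"\"\"Insert one Task entry where every flattened parameter has its own column.\"\"\"\n    con = sqlite3.connect(db_path)\n    try:\n        col_names = \", \".join(f'\"{name}\"' for name in columns)\n        placeholders = \", \".join(\"?\" for _ in columns)\n        # Assumes the table and columns already exist; a real implementation\n        # would also create or extend the schema as needed.\n        con.execute(\n            f\"INSERT INTO {table} ({col_names}) VALUES ({placeholders})\",\n            tuple(columns.values()),\n        )\n        con.commit()\n    finally:\n        con.close()\n\n\n# Hypothetical usage after Task completion:\n# write_task_entry(\"lute.db\", \"TaskOne\", {\"gen_cfg_id\": 1, \"exec_cfg_id\": 1, \"param_a\": 123})\n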

"},{"location":"adrs/adr-7/#decision-drivers","title":"Decision Drivers","text":""},{"location":"adrs/adr-7/#considered-options","title":"Considered Options","text":""},{"location":"adrs/adr-7/#consequences","title":"Consequences","text":""},{"location":"adrs/adr-7/#compliance","title":"Compliance","text":""},{"location":"adrs/adr-7/#metadata","title":"Metadata","text":""},{"location":"adrs/adr-8/","title":"[ADR-8] Airflow credentials/authorization requires special launch program","text":"

Date: 2024-03-18

"},{"location":"adrs/adr-8/#status","title":"Status","text":"

Proposed

"},{"location":"adrs/adr-8/#context-and-problem-statement","title":"Context and Problem Statement","text":""},{"location":"adrs/adr-8/#decision","title":"Decision","text":"

A closed-source lute_launcher program will be used to run the Airflow launch scripts. This program accesses credentials with the correct permissions; users should otherwise not have access to the credentials. This helps ensure the credentials can be used by everyone, but only to run workflows and not to perform restricted admin activities.

"},{"location":"adrs/adr-8/#decision-drivers","title":"Decision Drivers","text":""},{"location":"adrs/adr-8/#considered-options","title":"Considered Options","text":""},{"location":"adrs/adr-8/#consequences","title":"Consequences","text":""},{"location":"adrs/adr-8/#compliance","title":"Compliance","text":""},{"location":"adrs/adr-8/#metadata","title":"Metadata","text":""},{"location":"adrs/adr-9/","title":"[ADR-9] Airflow launch script will run as long lived batch job.","text":"

Date: 2024-04-15

"},{"location":"adrs/adr-9/#status","title":"Status","text":"

Proposed

"},{"location":"adrs/adr-9/#context-and-problem-statement","title":"Context and Problem Statement","text":""},{"location":"adrs/adr-9/#decision","title":"Decision","text":"

The Airflow launch script will be a long-lived process, running for the duration of the entire DAG. It will provide basic status logging, e.g. which Tasks are running and whether they succeeded or failed. Additionally, at the end of each Task job, the launch job will collect the log file from that job and append it to its own log.

As the Airflow launch script is the entry point used from the eLog, only its log file is available to users of that UI. Converting the launch script into a long-lived monitoring job therefore makes the log information easily accessible.

To accomplish this, the launch script must itself be submitted as a batch job, in order to comply with the 30 second timeout imposed on jobs run by the ARP. This necessitates providing an additional wrapper script.
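
Illustratively (a hedged sketch, not the actual launch script; the job-status check and log locations are placeholders for real SLURM/Airflow queries), the monitoring portion might be structured like:

import time\nfrom pathlib import Path\nfrom typing import Dict\n\n\ndef job_is_finished(job_name: str) -> bool:\n    # Placeholder: a real implementation would query SLURM or Airflow for status.\n    return True\n\n\ndef monitor_jobs(job_logs: Dict[str, Path], own_log: Path, poll_s: float = 30.0) -> None:\n    \"\"\"Poll Task jobs and append each finished job's log file to this job's log.\"\"\"\n    pending = dict(job_logs)\n    while pending:\n        for name, log_path in list(pending.items()):\n            if job_is_finished(name):\n                with own_log.open(\"a\") as out:\n                    out.write(f\"==== Log for {name} ====\\n\")\n                    out.write(log_path.read_text())\n                del pending[name]\n        time.sleep(poll_s)\n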

"},{"location":"adrs/adr-9/#decision-drivers","title":"Decision Drivers","text":""},{"location":"adrs/adr-9/#considered-options","title":"Considered Options","text":""},{"location":"adrs/adr-9/#consequences","title":"Consequences","text":""},{"location":"adrs/adr-9/#compliance","title":"Compliance","text":""},{"location":"adrs/adr-9/#metadata","title":"Metadata","text":""},{"location":"adrs/madr_template/","title":"Madr template","text":""},{"location":"adrs/madr_template/#title","title":"Title","text":"

{ADR #X : Short description/title of feature/decision}

Date:

"},{"location":"adrs/madr_template/#status","title":"Status","text":"

{Accepted | Proposed | Rejected | Deprecated | Superseded} {If this proposal supersedes another, please indicate so, e.g. \"Status: Accepted, supersedes [ADR-3]\"} {Likewise, if this proposal was superseded, e.g. \"Status: Superseded by [ADR-2]\"}

"},{"location":"adrs/madr_template/#context-and-problem-statement","title":"Context and Problem Statement","text":"

{Describe the problem context and why this decision has been made/feature implemented.}

"},{"location":"adrs/madr_template/#decision","title":"Decision","text":"

{Describe how the solution was arrived at in the manner it was. You may use the sections below to help.}

"},{"location":"adrs/madr_template/#decision-drivers","title":"Decision Drivers","text":""},{"location":"adrs/madr_template/#considered-options","title":"Considered Options","text":""},{"location":"adrs/madr_template/#consequences","title":"Consequences","text":"

{Short description of anticipated consequences} * {Anticipated consequence 1} * {Anticipated consequence 2}

"},{"location":"adrs/madr_template/#compliance","title":"Compliance","text":"

{How will the decision/implementation be enforced. How will compliance be validated?}

"},{"location":"adrs/madr_template/#metadata","title":"Metadata","text":"

{Any additional information to include}

"},{"location":"design/database/","title":"LUTE Configuration Database Specification","text":"

Date: 2024-02-12 VERSION: v0.1

"},{"location":"design/database/#basic-outline","title":"Basic Outline","text":""},{"location":"design/database/#gen_cfg-table","title":"gen_cfg table","text":"

The general configuration table contains entries which may be shared between multiple Tasks. The format of the table is:

id title experiment run date lute_version task_timeout 2 \"My experiment desc\" \"EXPx00000\" 1 YYYY/MM/DD 0.1 6000

These parameters are extracted from the TaskParameters object. Each of those contains an AnalysisHeader object stored in the lute_config variable. For a given experimental run, this value will be shared across any Tasks that are executed.

"},{"location":"design/database/#column-descriptions","title":"Column descriptions","text":"Column Description id ID of the entry in this table. title Arbitrary description/title of the purpose of analysis. E.g. what kind of experiment is being conducted experiment LCLS Experiment. Can be a placeholder if debugging, etc. run LCLS Acquisition run. Can be a placeholder if debugging, testing, etc. date Date the configuration file was first setup. lute_version Version of the codebase being used to execute Tasks. task_timeout The maximum amount of time in seconds that a Task can run before being cancelled."},{"location":"design/database/#exec_cfg-table","title":"exec_cfg table","text":"

The Executor table contains information on the environment provided to the Executor for Task execution, the polling interval used for IPC between the Task and Executor, and information on the communicator protocols used for IPC. This information can be shared between Tasks or between experimental runs, but not every Task of a given run will necessarily use exactly the same Executor configuration and environment.

id env poll_interval communicator_desc 2 \"VAR1=val1;VAR2=val2\" 0.1 \"PipeCommunicator...;SocketCommunicator...\""},{"location":"design/database/#column-descriptions_1","title":"Column descriptions","text":"Column Description id ID of the entry in this table. env Execution environment used by the Executor and by proxy any Tasks submitted by an Executor matching this entry. Environment is stored as a string with variables delimited by \";\" poll_interval Polling interval used for Task monitoring. communicator_desc Description of the Communicators used.

NOTE: The env column currently only stores variables related to SLURM or LUTE itself.
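
For illustration, converting between a Python environment dictionary and the \";\"-delimited string stored in the env column could be done as in this sketch (helper names are hypothetical):

from typing import Dict\n\n\ndef env_to_str(env: Dict[str, str]) -> str:\n    \"\"\"Serialize environment variables as VAR=value pairs joined by ';'.\"\"\"\n    return \";\".join(f\"{key}={val}\" for key, val in env.items())\n\n\ndef str_to_env(env_str: str) -> Dict[str, str]:\n    \"\"\"Invert env_to_str; assumes values contain no ';' characters.\"\"\"\n    return dict(pair.split(\"=\", 1) for pair in env_str.split(\";\") if pair)\n\n\n# Round trip matching the example entry above:\n# env_to_str({\"VAR1\": \"val1\", \"VAR2\": \"val2\"}) == \"VAR1=val1;VAR2=val2\"\n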

"},{"location":"design/database/#task-tables","title":"Task tables","text":"

For every Task a table of the following format will be created. The exact number of columns will depend on the specific Task, as the number of parameters can vary between them, and each parameter gets its own column. Within a table, multiple experiments and runs can coexist. The experiment and run are not recorded directly. Instead, the first two columns point to the id of entries in the general configuration and Executor tables respectively. The general configuration table entry will contain the experiment and run information.

id timestamp gen_cfg_id exec_cfg_id P1 P2 ... Pn result.task_status result.summary result.payload result.impl_schemas valid_flag 2 \"YYYY-MM-DD HH:MM:SS\" 1 1 1 2 ... 3 \"COMPLETED\" \"Summary\" \"XYZ\" \"schema1;schema3;\" 1 3 \"YYYY-MM-DD HH:MM:SS\" 1 1 3 1 ... 4 \"FAILED\" \"Summary\" \"XYZ\" \"schema1;schema3;\" 0

Parameter sets which can be described as nested dictionaries are flattened and then delimited with a . to create column names. Parameters which are lists (or Python tuples, etc.) have a column for each entry with names that include an index (counting from 0). E.g. consider the following dictionary of parameters:

param_dict: Dict[str, Any] = {\n    \"a\": {               # First parameter a\n        \"b\": (1, 2),\n        \"c\": 1,\n        # ...\n    },\n    \"a2\": 4,             # Second parameter a2\n    # ...\n}\n

The dictionary a will produce columns: a.b[0], a.b[1], a.c, and so on.
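
A short sketch of this flattening (an illustrative reimplementation, not necessarily the code LUTE uses) which reproduces the column names above:

from typing import Any, Dict\n\n\ndef flatten_params(params: Dict[str, Any], prefix: str = \"\") -> Dict[str, Any]:\n    \"\"\"Flatten nested dicts with '.' and index list/tuple entries with [i].\"\"\"\n    flat: Dict[str, Any] = {}\n    for key, value in params.items():\n        name = f\"{prefix}.{key}\" if prefix else key\n        if isinstance(value, dict):\n            flat.update(flatten_params(value, prefix=name))\n        elif isinstance(value, (list, tuple)):\n            for i, item in enumerate(value):\n                flat[f\"{name}[{i}]\"] = item\n        else:\n            flat[name] = value\n    return flat\n\n\n# flatten_params(param_dict) == {\"a.b[0]\": 1, \"a.b[1]\": 2, \"a.c\": 1, \"a2\": 4}\n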

"},{"location":"design/database/#column-descriptions_2","title":"Column descriptions","text":"Column Description id ID of the entry in this table. CURRENT_TIMESTAMP Full timestamp for the entry. gen_cfg_id ID of the entry in the general config table that applies to this Task entry. That table has, e.g., experiment and run number. exec_cfg_id The ID of the entry in the Executor table which applies to this Task entry. P1 - Pn The specific parameters of the Task. The P{1..n} are replaced by the actual parameter names. result.task_status Reported exit status of the Task. Note that the output may still be labeled invalid by the valid_flag (see below). result.summary Short text summary of the Task result. This is provided by the Task, or sometimes the Executor. result.payload Full description of result from the Task. If the object is incompatible with the database, will instead be a pointer to where it can be found. result.impl_schemas A string of semi-colon separated schema(s) implemented by the Task. Schemas describe conceptually the type output the Task produces. valid_flag A boolean flag for whether the result is valid. May be 0 (False) if e.g., data is missing, or corrupt, or reported status is failed.

NOTE: The result.payload may be distinct from the output files. Payloads can be specified in terms of output parameters or specific output files, or may be an optional summary of the results provided by the Task, e.g. graphical descriptions of results (plots, figures, etc.). In many cases, however, the output files will most likely be pointed to by a parameter in one of the columns P{1...n} - if properly specified in the TaskParameters model, the value of this output parameter will be replicated in the result.payload column as well.

"},{"location":"design/database/#api","title":"API","text":"

This API is intended to be used at the Executor level, with some calls intended to provide default values for Pydantic models. Utilities for reading and inspecting the database outside of normal Task execution are addressed under the following subheadings.

"},{"location":"design/database/#write","title":"Write","text":""},{"location":"design/database/#read","title":"Read","text":""},{"location":"design/database/#utilities","title":"Utilities","text":""},{"location":"design/database/#scripts","title":"Scripts","text":""},{"location":"design/database/#tui-and-gui","title":"TUI and GUI","text":""},{"location":"source/managed_tasks/","title":"managed_tasks","text":"

LUTE Managed Tasks.

Executor-managed Tasks with specific environment specifications are defined here.

"},{"location":"source/managed_tasks/#managed_tasks.BinaryErrTester","title":"BinaryErrTester = Executor('TestBinaryErr') module-attribute","text":"

Runs a test of a third-party task that fails.

"},{"location":"source/managed_tasks/#managed_tasks.BinaryTester","title":"BinaryTester: Executor = Executor('TestBinary') module-attribute","text":"

Runs a basic test of a multi-threaded third-party Task.

"},{"location":"source/managed_tasks/#managed_tasks.CrystFELIndexer","title":"CrystFELIndexer: Executor = Executor('IndexCrystFEL') module-attribute","text":"

Runs crystallographic indexing using CrystFEL.

"},{"location":"source/managed_tasks/#managed_tasks.DimpleSolver","title":"DimpleSolver: Executor = Executor('DimpleSolve') module-attribute","text":"

Solves a crystallographic structure using molecular replacement.

"},{"location":"source/managed_tasks/#managed_tasks.HKLComparer","title":"HKLComparer: Executor = Executor('CompareHKL') module-attribute","text":"

Runs analysis on merge results for statistics/figures of merit.

"},{"location":"source/managed_tasks/#managed_tasks.HKLManipulator","title":"HKLManipulator: Executor = Executor('ManipulateHKL') module-attribute","text":"

Performs format conversions (among other things) of merge results.

"},{"location":"source/managed_tasks/#managed_tasks.MultiNodeCommunicationTester","title":"MultiNodeCommunicationTester: MPIExecutor = MPIExecutor('TestMultiNodeCommunication') module-attribute","text":"

Runs a test to confirm communication works between multiple nodes.

"},{"location":"source/managed_tasks/#managed_tasks.PartialatorMerger","title":"PartialatorMerger: Executor = Executor('MergePartialator') module-attribute","text":"

Runs crystallographic merging using CrystFEL's partialator.

"},{"location":"source/managed_tasks/#managed_tasks.PeakFinderPsocake","title":"PeakFinderPsocake: Executor = Executor('FindPeaksPsocake') module-attribute","text":"

Performs Bragg peak finding using psocake - DEPRECATED.

"},{"location":"source/managed_tasks/#managed_tasks.PeakFinderPyAlgos","title":"PeakFinderPyAlgos: MPIExecutor = MPIExecutor('FindPeaksPyAlgos') module-attribute","text":"

Performs Bragg peak finding using the PyAlgos algorithm.

"},{"location":"source/managed_tasks/#managed_tasks.ReadTester","title":"ReadTester: Executor = Executor('TestReadOutput') module-attribute","text":"

Runs a test to confirm database reading.

"},{"location":"source/managed_tasks/#managed_tasks.SHELXCRunner","title":"SHELXCRunner: Executor = Executor('RunSHELXC') module-attribute","text":"

Runs CCP4 SHELXC - needed for crystallographic phasing.

"},{"location":"source/managed_tasks/#managed_tasks.SmallDataProducer","title":"SmallDataProducer: Executor = Executor('SubmitSMD') module-attribute","text":"

Runs the production of a smalldata HDF5 file.

"},{"location":"source/managed_tasks/#managed_tasks.SocketTester","title":"SocketTester: Executor = Executor('TestSocket') module-attribute","text":"

Runs a test of socket-based communication.

"},{"location":"source/managed_tasks/#managed_tasks.StreamFileConcatenator","title":"StreamFileConcatenator: Executor = Executor('ConcatenateStreamFiles') module-attribute","text":"

Concatenates results from crystallographic indexing of multiple runs.

"},{"location":"source/managed_tasks/#managed_tasks.Tester","title":"Tester: Executor = Executor('Test') module-attribute","text":"

Runs a basic test of a first-party Task.

"},{"location":"source/managed_tasks/#managed_tasks.WriteTester","title":"WriteTester: Executor = Executor('TestWriteOutput') module-attribute","text":"

Runs a test to confirm database writing.

"},{"location":"source/execution/debug_utils/","title":"debug_utils","text":"

Functions to assist in debugging execution of LUTE.

Functions:

Name Description LUTE_DEBUG_EXIT

(env_var: str, str_dump: Optional[str]): Exits the program if the provided env_var is set. Optionally, also prints a message if provided.

Raises:

Type Description ValidationError

Error raised by pydantic during data validation. (From Pydantic)
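
The behaviour of LUTE_DEBUG_EXIT described above could be sketched roughly as follows (an illustration of the documented behaviour, not the exact LUTE source; the default value for str_dump is an assumption):

import os\nimport sys\nfrom typing import Optional\n\n\ndef LUTE_DEBUG_EXIT(env_var: str, str_dump: Optional[str] = None) -> None:\n    \"\"\"Exit the program if env_var is set, optionally printing a message first.\"\"\"\n    if os.getenv(env_var) is not None:\n        if str_dump is not None:\n            print(str_dump)\n        sys.exit(0)\n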

"},{"location":"source/execution/executor/","title":"executor","text":"

Base classes and functions for handling Task execution.

Executors run a Task as a subprocess and handle all communication with other services, e.g., the eLog. They accept specific handlers to override default stream parsing.

Event handlers/hooks are implemented as standalone functions which can be added to an Executor.

Classes:

Name Description AnalysisConfig

Data class for holding a managed Task's configuration.

BaseExecutor

Abstract base class from which all Executors are derived.

Executor

Default Executor implementing all basic functionality and IPC.

BinaryExecutor

Can execute any arbitrary binary/command as a managed task within the framework provided by LUTE.

"},{"location":"source/execution/executor/#execution.executor--exceptions","title":"Exceptions","text":""},{"location":"source/execution/executor/#execution.executor.BaseExecutor","title":"BaseExecutor","text":"

Bases: ABC

ABC to manage Task execution and communication with user services.

When running in a workflow, \"tasks\" (not the class instances) are submitted as Executors. The Executor manages environment setup, the actual Task submission, and communication regarding Task results and status with third party services like the eLog.

Attributes:

Methods:

Name Description add_hook

(event: str, hook: Callable[[None], None]) -> None: Create a new hook to be called each time a specific event occurs.

add_default_hooks

Populate the event hooks with the default functions.

update_environment

(env: Dict[str, str], update_path: str): Update the environment that is passed to the Task subprocess.

execute_task

Run the task as a subprocess.

Source code in lute/execution/executor.py
class BaseExecutor(ABC):\n    \"\"\"ABC to manage Task execution and communication with user services.\n\n    When running in a workflow, \"tasks\" (not the class instances) are submitted\n    as `Executors`. The Executor manages environment setup, the actual Task\n    submission, and communication regarding Task results and status with third\n    party services like the eLog.\n\n    Attributes:\n\n    Methods:\n        add_hook(event: str, hook: Callable[[None], None]) -> None: Create a\n            new hook to be called each time a specific event occurs.\n\n        add_default_hooks() -> None: Populate the event hooks with the default\n            functions.\n\n        update_environment(env: Dict[str, str], update_path: str): Update the\n            environment that is passed to the Task subprocess.\n\n        execute_task(): Run the task as a subprocess.\n    \"\"\"\n\n    class Hooks:\n        \"\"\"A container class for the Executor's event hooks.\n\n        There is a corresponding function (hook) for each event/signal. Each\n        function takes two parameters - a reference to the Executor (self) and\n        a reference to the Message (msg) which includes the corresponding\n        signal.\n        \"\"\"\n\n        def no_pickle_mode(self: Self, msg: Message): ...\n\n        def task_started(self: Self, msg: Message): ...\n\n        def task_failed(self: Self, msg: Message): ...\n\n        def task_stopped(self: Self, msg: Message): ...\n\n        def task_done(self: Self, msg: Message): ...\n\n        def task_cancelled(self: Self, msg: Message): ...\n\n        def task_result(self: Self, msg: Message): ...\n\n    def __init__(\n        self,\n        task_name: str,\n        communicators: List[Communicator],\n        poll_interval: float = 0.05,\n    ) -> None:\n        \"\"\"The Executor will manage the subprocess in which `task_name` is run.\n\n        Args:\n            task_name (str): The name of the Task to be submitted. Must match\n                the Task's class name exactly. The parameter specification must\n                also be in a properly named model to be identified.\n\n            communicators (List[Communicator]): A list of one or more\n                communicators which manage information flow to/from the Task.\n                Subclasses may have different defaults, and new functionality\n                can be introduced by composing Executors with communicators.\n\n            poll_interval (float): Time to wait between reading/writing to the\n                managed subprocess. 
In seconds.\n        \"\"\"\n        result: TaskResult = TaskResult(\n            task_name=task_name, task_status=TaskStatus.PENDING, summary=\"\", payload=\"\"\n        )\n        task_parameters: Optional[TaskParameters] = None\n        task_env: Dict[str, str] = os.environ.copy()\n        self._communicators: List[Communicator] = communicators\n        communicator_desc: List[str] = []\n        for comm in self._communicators:\n            comm.stage_communicator()\n            communicator_desc.append(str(comm))\n\n        self._analysis_desc: DescribedAnalysis = DescribedAnalysis(\n            task_result=result,\n            task_parameters=task_parameters,\n            task_env=task_env,\n            poll_interval=poll_interval,\n            communicator_desc=communicator_desc,\n        )\n\n    def add_hook(self, event: str, hook: Callable[[Self, Message], None]) -> None:\n        \"\"\"Add a new hook.\n\n        Each hook is a function called any time the Executor receives a signal\n        for a particular event, e.g. Task starts, Task ends, etc. Calling this\n        method will remove any hook that currently exists for the event. I.e.\n        only one hook can be called per event at a time. Creating hooks for\n        events which do not exist is not allowed.\n\n        Args:\n            event (str): The event for which the hook will be called.\n\n            hook (Callable[[None], None]) The function to be called during each\n                occurrence of the event.\n        \"\"\"\n        if event.upper() in LUTE_SIGNALS:\n            setattr(self.Hooks, event.lower(), hook)\n\n    @abstractmethod\n    def add_default_hooks(self) -> None:\n        \"\"\"Populate the set of default event hooks.\"\"\"\n\n        ...\n\n    def update_environment(\n        self, env: Dict[str, str], update_path: str = \"prepend\"\n    ) -> None:\n        \"\"\"Update the stored set of environment variables.\n\n        These are passed to the subprocess to setup its environment.\n\n        Args:\n            env (Dict[str, str]): A dictionary of \"VAR\":\"VALUE\" pairs of\n                environment variables to be added to the subprocess environment.\n                If any variables already exist, the new variables will\n                overwrite them (except PATH, see below).\n\n            update_path (str): If PATH is present in the new set of variables,\n                this argument determines how the old PATH is dealt with. There\n                are three options:\n                * \"prepend\" : The new PATH values are prepended to the old ones.\n                * \"append\" : The new PATH values are appended to the old ones.\n                * \"overwrite\" : The old PATH is overwritten by the new one.\n                \"prepend\" is the default option. 
If PATH is not present in the\n                current environment, the new PATH is used without modification.\n        \"\"\"\n        if \"PATH\" in env:\n            sep: str = os.pathsep\n            if update_path == \"prepend\":\n                env[\"PATH\"] = (\n                    f\"{env['PATH']}{sep}{self._analysis_desc.task_env['PATH']}\"\n                )\n            elif update_path == \"append\":\n                env[\"PATH\"] = (\n                    f\"{self._analysis_desc.task_env['PATH']}{sep}{env['PATH']}\"\n                )\n            elif update_path == \"overwrite\":\n                pass\n            else:\n                raise ValueError(\n                    (\n                        f\"{update_path} is not a valid option for `update_path`!\"\n                        \" Options are: prepend, append, overwrite.\"\n                    )\n                )\n        os.environ.update(env)\n        self._analysis_desc.task_env.update(env)\n\n    def shell_source(self, env: str) -> None:\n        \"\"\"Source a script.\n\n        Unlike `update_environment` this method sources a new file.\n\n        Args:\n            env (str): Path to the script to source.\n        \"\"\"\n        import sys\n\n        if not os.path.exists(env):\n            logger.info(f\"Cannot source environment from {env}!\")\n            return\n\n        script: str = (\n            f\"set -a\\n\"\n            f'source \"{env}\" >/dev/null\\n'\n            f'{sys.executable} -c \"import os; print(dict(os.environ))\"\\n'\n        )\n        logger.info(f\"Sourcing file {env}\")\n        o, e = subprocess.Popen(\n            [\"bash\", \"-c\", script], stdout=subprocess.PIPE\n        ).communicate()\n        new_environment: Dict[str, str] = eval(o)\n        self._analysis_desc.task_env = new_environment\n\n    def _pre_task(self) -> None:\n        \"\"\"Any actions to be performed before task submission.\n\n        This method may or may not be used by subclasses. 
It may be useful\n        for logging etc.\n        \"\"\"\n        # This prevents the Executors in managed_tasks.py from all acquiring\n        # resources like sockets.\n        for communicator in self._communicators:\n            communicator.delayed_setup()\n            # Not great, but experience shows we need a bit of time to setup\n            # network.\n            time.sleep(0.1)\n        # Propagate any env vars setup by Communicators - only update LUTE_ vars\n        tmp: Dict[str, str] = {key: os.environ[key] for key in os.environ if \"LUTE_\" in key}\n        self._analysis_desc.task_env.update(tmp)\n\n    def _submit_task(self, cmd: str) -> subprocess.Popen:\n        proc: subprocess.Popen = subprocess.Popen(\n            cmd.split(),\n            stdout=subprocess.PIPE,\n            stderr=subprocess.PIPE,\n            env=self._analysis_desc.task_env,\n        )\n        os.set_blocking(proc.stdout.fileno(), False)\n        os.set_blocking(proc.stderr.fileno(), False)\n        return proc\n\n    @abstractmethod\n    def _task_loop(self, proc: subprocess.Popen) -> None:\n        \"\"\"Actions to perform while the Task is running.\n\n        This function is run in the body of a loop until the Task signals\n        that its finished.\n        \"\"\"\n        ...\n\n    @abstractmethod\n    def _finalize_task(self, proc: subprocess.Popen) -> None:\n        \"\"\"Any actions to be performed after the Task has ended.\n\n        Examples include a final clearing of the pipes, retrieving results,\n        reporting to third party services, etc.\n        \"\"\"\n        ...\n\n    def _submit_cmd(self, executable_path: str, params: str) -> str:\n        \"\"\"Return a formatted command for launching Task subprocess.\n\n        May be overridden by subclasses.\n\n        Args:\n            executable_path (str): Path to the LUTE subprocess script.\n\n            params (str): String of formatted command-line arguments.\n\n        Returns:\n            cmd (str): Appropriately formatted command for this Executor.\n        \"\"\"\n        cmd: str = \"\"\n        if __debug__:\n            cmd = f\"python -B {executable_path} {params}\"\n        else:\n            cmd = f\"python -OB {executable_path} {params}\"\n\n        return cmd\n\n    def execute_task(self) -> None:\n        \"\"\"Run the requested Task as a subprocess.\"\"\"\n        self._pre_task()\n        lute_path: Optional[str] = os.getenv(\"LUTE_PATH\")\n        if lute_path is None:\n            logger.debug(\"Absolute path to subprocess_task.py not found.\")\n            lute_path = os.path.abspath(f\"{os.path.dirname(__file__)}/../..\")\n            self.update_environment({\"LUTE_PATH\": lute_path})\n        executable_path: str = f\"{lute_path}/subprocess_task.py\"\n        config_path: str = self._analysis_desc.task_env[\"LUTE_CONFIGPATH\"]\n        params: str = f\"-c {config_path} -t {self._analysis_desc.task_result.task_name}\"\n\n        cmd: str = self._submit_cmd(executable_path, params)\n        proc: subprocess.Popen = self._submit_task(cmd)\n\n        while self._task_is_running(proc):\n            self._task_loop(proc)\n            time.sleep(self._analysis_desc.poll_interval)\n\n        os.set_blocking(proc.stdout.fileno(), True)\n        os.set_blocking(proc.stderr.fileno(), True)\n\n        self._finalize_task(proc)\n        proc.stdout.close()\n        proc.stderr.close()\n        proc.wait()\n        if ret := proc.returncode:\n            logger.info(f\"Task failed with return code: {ret}\")\n    
        self._analysis_desc.task_result.task_status = TaskStatus.FAILED\n            self.Hooks.task_failed(self, msg=Message())\n        elif self._analysis_desc.task_result.task_status == TaskStatus.RUNNING:\n            # Ret code is 0, no exception was thrown, task forgot to set status\n            self._analysis_desc.task_result.task_status = TaskStatus.COMPLETED\n            logger.debug(f\"Task did not change from RUNNING status. Assume COMPLETED.\")\n            self.Hooks.task_done(self, msg=Message())\n        self._store_configuration()\n        for comm in self._communicators:\n            comm.clear_communicator()\n\n        if self._analysis_desc.task_result.task_status == TaskStatus.FAILED:\n            logger.info(\"Exiting after Task failure. Result recorded.\")\n            sys.exit(-1)\n\n        self.process_results()\n\n    def _store_configuration(self) -> None:\n        \"\"\"Store configuration and results in the LUTE database.\"\"\"\n        record_analysis_db(copy.deepcopy(self._analysis_desc))\n\n    def _task_is_running(self, proc: subprocess.Popen) -> bool:\n        \"\"\"Whether a subprocess is running.\n\n        Args:\n            proc (subprocess.Popen): The subprocess to determine the run status\n                of.\n\n        Returns:\n            bool: Is the subprocess task running.\n        \"\"\"\n        # Add additional conditions - don't want to exit main loop\n        # if only stopped\n        task_status: TaskStatus = self._analysis_desc.task_result.task_status\n        is_running: bool = task_status != TaskStatus.COMPLETED\n        is_running &= task_status != TaskStatus.CANCELLED\n        is_running &= task_status != TaskStatus.TIMEDOUT\n        return proc.poll() is None and is_running\n\n    def _stop(self, proc: subprocess.Popen) -> None:\n        \"\"\"Stop the Task subprocess.\"\"\"\n        os.kill(proc.pid, signal.SIGTSTP)\n        self._analysis_desc.task_result.task_status = TaskStatus.STOPPED\n\n    def _continue(self, proc: subprocess.Popen) -> None:\n        \"\"\"Resume a stopped Task subprocess.\"\"\"\n        os.kill(proc.pid, signal.SIGCONT)\n        self._analysis_desc.task_result.task_status = TaskStatus.RUNNING\n\n    def _set_result_from_parameters(self) -> None:\n        \"\"\"Use TaskParameters object to set TaskResult fields.\n\n        A result may be defined in terms of specific parameters. This is most\n        useful for ThirdPartyTasks which would not otherwise have an easy way of\n        reporting what the TaskResult is. There are two options for specifying\n        results from parameters:\n            1. A single parameter (Field) of the model has an attribute\n               `is_result`. This is a bool indicating that this parameter points\n               to a result. E.g. a parameter `output` may set `is_result=True`.\n            2. The `TaskParameters.Config` has a `result_from_params` attribute.\n               This is an appropriate option if the result is determinable for\n               the Task, but it is not easily defined by a single parameter. The\n               TaskParameters.Config.result_from_param can be set by a custom\n               validator, e.g. to combine the values of multiple parameters into\n               a single result. E.g. an `out_dir` and `out_file` parameter used\n               together specify the result. 
Currently only string specifiers are\n               supported.\n\n        A TaskParameters object specifies that it contains information about the\n        result by setting a single config option:\n                        TaskParameters.Config.set_result=True\n        In general, this method should only be called when the above condition is\n        met, however, there are minimal checks in it as well.\n        \"\"\"\n        # This method shouldn't be called unless appropriate\n        # But we will add extra guards here\n        if self._analysis_desc.task_parameters is None:\n            logger.debug(\n                \"Cannot set result from TaskParameters. TaskParameters is None!\"\n            )\n            return\n        if (\n            not hasattr(self._analysis_desc.task_parameters.Config, \"set_result\")\n            or not self._analysis_desc.task_parameters.Config.set_result\n        ):\n            logger.debug(\n                \"Cannot set result from TaskParameters. `set_result` not specified!\"\n            )\n            return\n\n        # First try to set from result_from_params (faster)\n        if self._analysis_desc.task_parameters.Config.result_from_params is not None:\n            result_from_params: str = (\n                self._analysis_desc.task_parameters.Config.result_from_params\n            )\n            logger.info(f\"TaskResult specified as {result_from_params}.\")\n            self._analysis_desc.task_result.payload = result_from_params\n        else:\n            # Iterate parameters to find the one that is the result\n            schema: Dict[str, Any] = self._analysis_desc.task_parameters.schema()\n            for param, value in self._analysis_desc.task_parameters.dict().items():\n                param_attrs: Dict[str, Any] = schema[\"properties\"][param]\n                if \"is_result\" in param_attrs:\n                    is_result: bool = param_attrs[\"is_result\"]\n                    if isinstance(is_result, bool) and is_result:\n                        logger.info(f\"TaskResult specified as {value}.\")\n                        self._analysis_desc.task_result.payload = value\n                    else:\n                        logger.debug(\n                            (\n                                f\"{param} specified as result! But specifier is of \"\n                                f\"wrong type: {type(is_result)}!\"\n                            )\n                        )\n                    break  # We should only have 1 result-like parameter!\n\n        # If we get this far and haven't changed the payload we should complain\n        if self._analysis_desc.task_result.payload == \"\":\n            task_name: str = self._analysis_desc.task_result.task_name\n            logger.debug(\n                (\n                    f\"{task_name} specified result be set from {task_name}Parameters,\"\n                    \" but no result provided! 
Check model definition!\"\n                )\n            )\n        # Now check for impl_schemas and pass to result.impl_schemas\n        # Currently unused\n        impl_schemas: Optional[str] = (\n            self._analysis_desc.task_parameters.Config.impl_schemas\n        )\n        self._analysis_desc.task_result.impl_schemas = impl_schemas\n        # If we set_result but didn't get schema information we should complain\n        if self._analysis_desc.task_result.impl_schemas is None:\n            task_name: str = self._analysis_desc.task_result.task_name\n            logger.debug(\n                (\n                    f\"{task_name} specified result be set from {task_name}Parameters,\"\n                    \" but no schema provided! Check model definition!\"\n                )\n            )\n\n    def process_results(self) -> None:\n        \"\"\"Perform any necessary steps to process TaskResults object.\n\n        Processing will depend on subclass. Examples of steps include, moving\n        files, converting file formats, compiling plots/figures into an HTML\n        file, etc.\n        \"\"\"\n        self._process_results()\n\n    @abstractmethod\n    def _process_results(self) -> None: ...\n
"},{"location":"source/execution/executor/#execution.executor.BaseExecutor.Hooks","title":"Hooks","text":"

A container class for the Executor's event hooks.

There is a corresponding function (hook) for each event/signal. Each function takes two parameters - a reference to the Executor (self) and a reference to the Message (msg) which includes the corresponding signal.

Source code in lute/execution/executor.py
class Hooks:\n    \"\"\"A container class for the Executor's event hooks.\n\n    There is a corresponding function (hook) for each event/signal. Each\n    function takes two parameters - a reference to the Executor (self) and\n    a reference to the Message (msg) which includes the corresponding\n    signal.\n    \"\"\"\n\n    def no_pickle_mode(self: Self, msg: Message): ...\n\n    def task_started(self: Self, msg: Message): ...\n\n    def task_failed(self: Self, msg: Message): ...\n\n    def task_stopped(self: Self, msg: Message): ...\n\n    def task_done(self: Self, msg: Message): ...\n\n    def task_cancelled(self: Self, msg: Message): ...\n\n    def task_result(self: Self, msg: Message): ...\n
"},{"location":"source/execution/executor/#execution.executor.BaseExecutor.__init__","title":"__init__(task_name, communicators, poll_interval=0.05)","text":"

The Executor will manage the subprocess in which task_name is run.

Parameters:

Name Type Description Default task_name str

The name of the Task to be submitted. Must match the Task's class name exactly. The parameter specification must also be in a properly named model to be identified.

required communicators List[Communicator]

A list of one or more communicators which manage information flow to/from the Task. Subclasses may have different defaults, and new functionality can be introduced by composing Executors with communicators.

required poll_interval float

Time to wait between reading/writing to the managed subprocess. In seconds.

0.05 Source code in lute/execution/executor.py
def __init__(\n    self,\n    task_name: str,\n    communicators: List[Communicator],\n    poll_interval: float = 0.05,\n) -> None:\n    \"\"\"The Executor will manage the subprocess in which `task_name` is run.\n\n    Args:\n        task_name (str): The name of the Task to be submitted. Must match\n            the Task's class name exactly. The parameter specification must\n            also be in a properly named model to be identified.\n\n        communicators (List[Communicator]): A list of one or more\n            communicators which manage information flow to/from the Task.\n            Subclasses may have different defaults, and new functionality\n            can be introduced by composing Executors with communicators.\n\n        poll_interval (float): Time to wait between reading/writing to the\n            managed subprocess. In seconds.\n    \"\"\"\n    result: TaskResult = TaskResult(\n        task_name=task_name, task_status=TaskStatus.PENDING, summary=\"\", payload=\"\"\n    )\n    task_parameters: Optional[TaskParameters] = None\n    task_env: Dict[str, str] = os.environ.copy()\n    self._communicators: List[Communicator] = communicators\n    communicator_desc: List[str] = []\n    for comm in self._communicators:\n        comm.stage_communicator()\n        communicator_desc.append(str(comm))\n\n    self._analysis_desc: DescribedAnalysis = DescribedAnalysis(\n        task_result=result,\n        task_parameters=task_parameters,\n        task_env=task_env,\n        poll_interval=poll_interval,\n        communicator_desc=communicator_desc,\n    )\n
"},{"location":"source/execution/executor/#execution.executor.BaseExecutor.add_default_hooks","title":"add_default_hooks() abstractmethod","text":"

Populate the set of default event hooks.

Source code in lute/execution/executor.py
@abstractmethod\ndef add_default_hooks(self) -> None:\n    \"\"\"Populate the set of default event hooks.\"\"\"\n\n    ...\n
"},{"location":"source/execution/executor/#execution.executor.BaseExecutor.add_hook","title":"add_hook(event, hook)","text":"

Add a new hook.

Each hook is a function called any time the Executor receives a signal for a particular event, e.g. Task starts, Task ends, etc. Calling this method will remove any hook that currently exists for the event. I.e. only one hook can be called per event at a time. Creating hooks for events which do not exist is not allowed.

Parameters:

Name Type Description Default event str

The event for which the hook will be called.

required Source code in lute/execution/executor.py
def add_hook(self, event: str, hook: Callable[[Self, Message], None]) -> None:\n    \"\"\"Add a new hook.\n\n    Each hook is a function called any time the Executor receives a signal\n    for a particular event, e.g. Task starts, Task ends, etc. Calling this\n    method will remove any hook that currently exists for the event. I.e.\n    only one hook can be called per event at a time. Creating hooks for\n    events which do not exist is not allowed.\n\n    Args:\n        event (str): The event for which the hook will be called.\n\n        hook (Callable[[None], None]) The function to be called during each\n            occurrence of the event.\n    \"\"\"\n    if event.upper() in LUTE_SIGNALS:\n        setattr(self.Hooks, event.lower(), hook)\n
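
As a usage illustration (a hedged sketch: the import of Message from lute.execution.ipc and the ability to construct Executor(\"Test\") outside a workflow are assumptions), a custom hook might be registered like this:

# Hypothetical usage: attach a custom hook to a concrete Executor.\nfrom lute.execution.executor import Executor\nfrom lute.execution.ipc import Message\n\n\ndef my_task_done(self: Executor, msg: Message) -> None:\n    # `self` is the Executor instance, `msg` the Message carrying the signal.\n    print(f\"Task {self._analysis_desc.task_result.task_name} finished.\")\n\n\nexecutor: Executor = Executor(\"Test\")  # e.g. the managed Task behind `Tester`\nexecutor.add_hook(\"task_done\", my_task_done)  # replaces the default hook for this event\n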
"},{"location":"source/execution/executor/#execution.executor.BaseExecutor.execute_task","title":"execute_task()","text":"

Run the requested Task as a subprocess.

Source code in lute/execution/executor.py
def execute_task(self) -> None:\n    \"\"\"Run the requested Task as a subprocess.\"\"\"\n    self._pre_task()\n    lute_path: Optional[str] = os.getenv(\"LUTE_PATH\")\n    if lute_path is None:\n        logger.debug(\"Absolute path to subprocess_task.py not found.\")\n        lute_path = os.path.abspath(f\"{os.path.dirname(__file__)}/../..\")\n        self.update_environment({\"LUTE_PATH\": lute_path})\n    executable_path: str = f\"{lute_path}/subprocess_task.py\"\n    config_path: str = self._analysis_desc.task_env[\"LUTE_CONFIGPATH\"]\n    params: str = f\"-c {config_path} -t {self._analysis_desc.task_result.task_name}\"\n\n    cmd: str = self._submit_cmd(executable_path, params)\n    proc: subprocess.Popen = self._submit_task(cmd)\n\n    while self._task_is_running(proc):\n        self._task_loop(proc)\n        time.sleep(self._analysis_desc.poll_interval)\n\n    os.set_blocking(proc.stdout.fileno(), True)\n    os.set_blocking(proc.stderr.fileno(), True)\n\n    self._finalize_task(proc)\n    proc.stdout.close()\n    proc.stderr.close()\n    proc.wait()\n    if ret := proc.returncode:\n        logger.info(f\"Task failed with return code: {ret}\")\n        self._analysis_desc.task_result.task_status = TaskStatus.FAILED\n        self.Hooks.task_failed(self, msg=Message())\n    elif self._analysis_desc.task_result.task_status == TaskStatus.RUNNING:\n        # Ret code is 0, no exception was thrown, task forgot to set status\n        self._analysis_desc.task_result.task_status = TaskStatus.COMPLETED\n        logger.debug(f\"Task did not change from RUNNING status. Assume COMPLETED.\")\n        self.Hooks.task_done(self, msg=Message())\n    self._store_configuration()\n    for comm in self._communicators:\n        comm.clear_communicator()\n\n    if self._analysis_desc.task_result.task_status == TaskStatus.FAILED:\n        logger.info(\"Exiting after Task failure. Result recorded.\")\n        sys.exit(-1)\n\n    self.process_results()\n
"},{"location":"source/execution/executor/#execution.executor.BaseExecutor.process_results","title":"process_results()","text":"

Perform any necessary steps to process TaskResults object.

Processing will depend on subclass. Examples of steps include, moving files, converting file formats, compiling plots/figures into an HTML file, etc.

Source code in lute/execution/executor.py
def process_results(self) -> None:\n    \"\"\"Perform any necessary steps to process TaskResults object.\n\n    Processing will depend on subclass. Examples of steps include, moving\n    files, converting file formats, compiling plots/figures into an HTML\n    file, etc.\n    \"\"\"\n    self._process_results()\n
"},{"location":"source/execution/executor/#execution.executor.BaseExecutor.shell_source","title":"shell_source(env)","text":"

Source a script.

Unlike update_environment this method sources a new file.

Parameters:

Name Type Description Default env str

Path to the script to source.

required Source code in lute/execution/executor.py
def shell_source(self, env: str) -> None:\n    \"\"\"Source a script.\n\n    Unlike `update_environment` this method sources a new file.\n\n    Args:\n        env (str): Path to the script to source.\n    \"\"\"\n    import sys\n\n    if not os.path.exists(env):\n        logger.info(f\"Cannot source environment from {env}!\")\n        return\n\n    script: str = (\n        f\"set -a\\n\"\n        f'source \"{env}\" >/dev/null\\n'\n        f'{sys.executable} -c \"import os; print(dict(os.environ))\"\\n'\n    )\n    logger.info(f\"Sourcing file {env}\")\n    o, e = subprocess.Popen(\n        [\"bash\", \"-c\", script], stdout=subprocess.PIPE\n    ).communicate()\n    new_environment: Dict[str, str] = eval(o)\n    self._analysis_desc.task_env = new_environment\n
"},{"location":"source/execution/executor/#execution.executor.BaseExecutor.update_environment","title":"update_environment(env, update_path='prepend')","text":"

Update the stored set of environment variables.

These are passed to the subprocess to setup its environment.

Parameters:

Name Type Description Default env Dict[str, str]

A dictionary of \"VAR\":\"VALUE\" pairs of environment variables to be added to the subprocess environment. If any variables already exist, the new variables will overwrite them (except PATH, see below).

required update_path str

If PATH is present in the new set of variables, this argument determines how the old PATH is dealt with. There are three options: * \"prepend\" : The new PATH values are prepended to the old ones. * \"append\" : The new PATH values are appended to the old ones. * \"overwrite\" : The old PATH is overwritten by the new one. \"prepend\" is the default option. If PATH is not present in the current environment, the new PATH is used without modification.

'prepend' Source code in lute/execution/executor.py
def update_environment(\n    self, env: Dict[str, str], update_path: str = \"prepend\"\n) -> None:\n    \"\"\"Update the stored set of environment variables.\n\n    These are passed to the subprocess to setup its environment.\n\n    Args:\n        env (Dict[str, str]): A dictionary of \"VAR\":\"VALUE\" pairs of\n            environment variables to be added to the subprocess environment.\n            If any variables already exist, the new variables will\n            overwrite them (except PATH, see below).\n\n        update_path (str): If PATH is present in the new set of variables,\n            this argument determines how the old PATH is dealt with. There\n            are three options:\n            * \"prepend\" : The new PATH values are prepended to the old ones.\n            * \"append\" : The new PATH values are appended to the old ones.\n            * \"overwrite\" : The old PATH is overwritten by the new one.\n            \"prepend\" is the default option. If PATH is not present in the\n            current environment, the new PATH is used without modification.\n    \"\"\"\n    if \"PATH\" in env:\n        sep: str = os.pathsep\n        if update_path == \"prepend\":\n            env[\"PATH\"] = (\n                f\"{env['PATH']}{sep}{self._analysis_desc.task_env['PATH']}\"\n            )\n        elif update_path == \"append\":\n            env[\"PATH\"] = (\n                f\"{self._analysis_desc.task_env['PATH']}{sep}{env['PATH']}\"\n            )\n        elif update_path == \"overwrite\":\n            pass\n        else:\n            raise ValueError(\n                (\n                    f\"{update_path} is not a valid option for `update_path`!\"\n                    \" Options are: prepend, append, overwrite.\"\n                )\n            )\n    os.environ.update(env)\n    self._analysis_desc.task_env.update(env)\n
"},{"location":"source/execution/executor/#execution.executor.Communicator","title":"Communicator","text":"

Bases: ABC

Source code in lute/execution/ipc.py
class Communicator(ABC):\n    def __init__(self, party: Party = Party.TASK, use_pickle: bool = True) -> None:\n        \"\"\"Abstract Base Class for IPC Communicator objects.\n\n        Args:\n            party (Party): Which object (side/process) the Communicator is\n                managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.\n            use_pickle (bool): Whether to serialize data using pickle prior to\n                sending it.\n        \"\"\"\n        self._party = party\n        self._use_pickle = use_pickle\n        self.desc = \"Communicator abstract base class.\"\n\n    @abstractmethod\n    def read(self, proc: subprocess.Popen) -> Message:\n        \"\"\"Method for reading data through the communication mechanism.\"\"\"\n        ...\n\n    @abstractmethod\n    def write(self, msg: Message) -> None:\n        \"\"\"Method for sending data through the communication mechanism.\"\"\"\n        ...\n\n    def __str__(self):\n        name: str = str(type(self)).split(\"'\")[1].split(\".\")[-1]\n        return f\"{name}: {self.desc}\"\n\n    def __repr__(self):\n        return self.__str__()\n\n    def __enter__(self) -> Self:\n        return self\n\n    def __exit__(self) -> None: ...\n\n    @property\n    def has_messages(self) -> bool:\n        \"\"\"Whether the Communicator has remaining messages.\n\n        The precise method for determining whether there are remaining messages\n        will depend on the specific Communicator sub-class.\n        \"\"\"\n        return False\n\n    def stage_communicator(self):\n        \"\"\"Alternative method for staging outside of context manager.\"\"\"\n        self.__enter__()\n\n    def clear_communicator(self):\n        \"\"\"Alternative exit method outside of context manager.\"\"\"\n        self.__exit__()\n\n    def delayed_setup(self):\n        \"\"\"Any setup that should be done later than init.\"\"\"\n        ...\n
"},{"location":"source/execution/executor/#execution.executor.Communicator.has_messages","title":"has_messages: bool property","text":"

Whether the Communicator has remaining messages.

The precise method for determining whether there are remaining messages will depend on the specific Communicator sub-class.

"},{"location":"source/execution/executor/#execution.executor.Communicator.__init__","title":"__init__(party=Party.TASK, use_pickle=True)","text":"

Abstract Base Class for IPC Communicator objects.

Parameters:

Name Type Description Default party Party

Which object (side/process) the Communicator is managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.

TASK use_pickle bool

Whether to serialize data using pickle prior to sending it.

True Source code in lute/execution/ipc.py
def __init__(self, party: Party = Party.TASK, use_pickle: bool = True) -> None:\n    \"\"\"Abstract Base Class for IPC Communicator objects.\n\n    Args:\n        party (Party): Which object (side/process) the Communicator is\n            managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.\n        use_pickle (bool): Whether to serialize data using pickle prior to\n            sending it.\n    \"\"\"\n    self._party = party\n    self._use_pickle = use_pickle\n    self.desc = \"Communicator abstract base class.\"\n
"},{"location":"source/execution/executor/#execution.executor.Communicator.clear_communicator","title":"clear_communicator()","text":"

Alternative exit method outside of context manager.

Source code in lute/execution/ipc.py
def clear_communicator(self):\n    \"\"\"Alternative exit method outside of context manager.\"\"\"\n    self.__exit__()\n
"},{"location":"source/execution/executor/#execution.executor.Communicator.delayed_setup","title":"delayed_setup()","text":"

Any setup that should be done later than init.

Source code in lute/execution/ipc.py
def delayed_setup(self):\n    \"\"\"Any setup that should be done later than init.\"\"\"\n    ...\n
"},{"location":"source/execution/executor/#execution.executor.Communicator.read","title":"read(proc) abstractmethod","text":"

Method for reading data through the communication mechanism.

Source code in lute/execution/ipc.py
@abstractmethod\ndef read(self, proc: subprocess.Popen) -> Message:\n    \"\"\"Method for reading data through the communication mechanism.\"\"\"\n    ...\n
"},{"location":"source/execution/executor/#execution.executor.Communicator.stage_communicator","title":"stage_communicator()","text":"

Alternative method for staging outside of context manager.

Source code in lute/execution/ipc.py
def stage_communicator(self):\n    \"\"\"Alternative method for staging outside of context manager.\"\"\"\n    self.__enter__()\n
"},{"location":"source/execution/executor/#execution.executor.Communicator.write","title":"write(msg) abstractmethod","text":"

Method for sending data through the communication mechanism.

Source code in lute/execution/ipc.py
@abstractmethod\ndef write(self, msg: Message) -> None:\n    \"\"\"Method for sending data through the communication mechanism.\"\"\"\n    ...\n
"},{"location":"source/execution/executor/#execution.executor.Executor","title":"Executor","text":"

Bases: BaseExecutor

Basic implementation of an Executor which manages simple IPC with Task.

Attributes:

Methods:

Name Description add_hook

(event: str, hook: Callable[[None], None]) -> None: Create a new hook to be called each time a specific event occurs.

add_default_hooks

Populate the event hooks with the default functions.

update_environment

(env: Dict[str, str], update_path: str): Update the environment that is passed to the Task subprocess.

execute_task

Run the task as a subprocess.

Source code in lute/execution/executor.py
class Executor(BaseExecutor):\n    \"\"\"Basic implementation of an Executor which manages simple IPC with Task.\n\n    Attributes:\n\n    Methods:\n        add_hook(event: str, hook: Callable[[None], None]) -> None: Create a\n            new hook to be called each time a specific event occurs.\n\n        add_default_hooks() -> None: Populate the event hooks with the default\n            functions.\n\n        update_environment(env: Dict[str, str], update_path: str): Update the\n            environment that is passed to the Task subprocess.\n\n        execute_task(): Run the task as a subprocess.\n    \"\"\"\n\n    def __init__(\n        self,\n        task_name: str,\n        communicators: List[Communicator] = [\n            PipeCommunicator(Party.EXECUTOR),\n            SocketCommunicator(Party.EXECUTOR),\n        ],\n        poll_interval: float = 0.05,\n    ) -> None:\n        super().__init__(\n            task_name=task_name,\n            communicators=communicators,\n            poll_interval=poll_interval,\n        )\n        self.add_default_hooks()\n\n    def add_default_hooks(self) -> None:\n        \"\"\"Populate the set of default event hooks.\"\"\"\n\n        def no_pickle_mode(self: Executor, msg: Message):\n            for idx, communicator in enumerate(self._communicators):\n                if isinstance(communicator, PipeCommunicator):\n                    self._communicators[idx] = PipeCommunicator(\n                        Party.EXECUTOR, use_pickle=False\n                    )\n\n        self.add_hook(\"no_pickle_mode\", no_pickle_mode)\n\n        def task_started(self: Executor, msg: Message):\n            if isinstance(msg.contents, TaskParameters):\n                self._analysis_desc.task_parameters = msg.contents\n                # Maybe just run this no matter what? 
Rely on the other guards?\n                # Perhaps just check if ThirdPartyParameters?\n                # if isinstance(self._analysis_desc.task_parameters, ThirdPartyParameters):\n                if hasattr(self._analysis_desc.task_parameters.Config, \"set_result\"):\n                    # Third party Tasks may mark a parameter as the result\n                    # If so, setup the result now.\n                    self._set_result_from_parameters()\n            logger.info(\n                f\"Executor: {self._analysis_desc.task_result.task_name} started\"\n            )\n            self._analysis_desc.task_result.task_status = TaskStatus.RUNNING\n            elog_data: Dict[str, str] = {\n                f\"{self._analysis_desc.task_result.task_name} status\": \"RUNNING\",\n            }\n            post_elog_run_status(elog_data)\n\n        self.add_hook(\"task_started\", task_started)\n\n        def task_failed(self: Executor, msg: Message):\n            elog_data: Dict[str, str] = {\n                f\"{self._analysis_desc.task_result.task_name} status\": \"FAILED\",\n            }\n            post_elog_run_status(elog_data)\n\n        self.add_hook(\"task_failed\", task_failed)\n\n        def task_stopped(self: Executor, msg: Message):\n            elog_data: Dict[str, str] = {\n                f\"{self._analysis_desc.task_result.task_name} status\": \"STOPPED\",\n            }\n            post_elog_run_status(elog_data)\n\n        self.add_hook(\"task_stopped\", task_stopped)\n\n        def task_done(self: Executor, msg: Message):\n            elog_data: Dict[str, str] = {\n                f\"{self._analysis_desc.task_result.task_name} status\": \"COMPLETED\",\n            }\n            post_elog_run_status(elog_data)\n\n        self.add_hook(\"task_done\", task_done)\n\n        def task_cancelled(self: Executor, msg: Message):\n            elog_data: Dict[str, str] = {\n                f\"{self._analysis_desc.task_result.task_name} status\": \"CANCELLED\",\n            }\n            post_elog_run_status(elog_data)\n\n        self.add_hook(\"task_cancelled\", task_cancelled)\n\n        def task_result(self: Executor, msg: Message):\n            if isinstance(msg.contents, TaskResult):\n                self._analysis_desc.task_result = msg.contents\n                logger.info(self._analysis_desc.task_result.summary)\n                logger.info(self._analysis_desc.task_result.task_status)\n            elog_data: Dict[str, str] = {\n                f\"{self._analysis_desc.task_result.task_name} status\": \"COMPLETED\",\n            }\n            post_elog_run_status(elog_data)\n\n        self.add_hook(\"task_result\", task_result)\n\n    def _task_loop(self, proc: subprocess.Popen) -> None:\n        \"\"\"Actions to perform while the Task is running.\n\n        This function is run in the body of a loop until the Task signals\n        that its finished.\n        \"\"\"\n        for communicator in self._communicators:\n            while True:\n                msg: Message = communicator.read(proc)\n                if msg.signal is not None and msg.signal.upper() in LUTE_SIGNALS:\n                    hook: Callable[[Executor, Message], None] = getattr(\n                        self.Hooks, msg.signal.lower()\n                    )\n                    hook(self, msg)\n                if msg.contents is not None:\n                    if isinstance(msg.contents, str) and msg.contents != \"\":\n                        logger.info(msg.contents)\n                    elif not 
isinstance(msg.contents, str):\n                        logger.info(msg.contents)\n                if not communicator.has_messages:\n                    break\n\n    def _finalize_task(self, proc: subprocess.Popen) -> None:\n        \"\"\"Any actions to be performed after the Task has ended.\n\n        Examples include a final clearing of the pipes, retrieving results,\n        reporting to third party services, etc.\n        \"\"\"\n        self._task_loop(proc)  # Perform a final read.\n\n    def _process_results(self) -> None:\n        \"\"\"Performs result processing.\n\n        Actions include:\n        - For `ElogSummaryPlots`, will save the summary plot to the appropriate\n            directory for display in the eLog.\n        \"\"\"\n        task_result: TaskResult = self._analysis_desc.task_result\n        self._process_result_payload(task_result.payload)\n        self._process_result_summary(task_result.summary)\n\n    def _process_result_payload(self, payload: Any) -> None:\n        if self._analysis_desc.task_parameters is None:\n            logger.debug(\"Please run Task before using this method!\")\n            return\n        if isinstance(payload, ElogSummaryPlots):\n            # ElogSummaryPlots has figures and a display name\n            # display name also serves as a path.\n            expmt: str = self._analysis_desc.task_parameters.lute_config.experiment\n            base_path: str = f\"/sdf/data/lcls/ds/{expmt[:3]}/{expmt}/stats/summary\"\n            full_path: str = f\"{base_path}/{payload.display_name}\"\n            if not os.path.isdir(full_path):\n                os.makedirs(full_path)\n\n            # Preferred plots are pn.Tabs objects which save directly as html\n            # Only supported plot type that has \"save\" method - do not want to\n            # import plot modules here to do type checks.\n            if hasattr(payload.figures, \"save\"):\n                payload.figures.save(f\"{full_path}/report.html\")\n            else:\n                ...\n        elif isinstance(payload, str):\n            # May be a path to a file...\n            schemas: Optional[str] = self._analysis_desc.task_result.impl_schemas\n            # Should also check `impl_schemas` to determine what to do with path\n\n    def _process_result_summary(self, summary: str) -> None: ...\n
"},{"location":"source/execution/executor/#execution.executor.Executor.add_default_hooks","title":"add_default_hooks()","text":"

Populate the set of default event hooks.
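
Hooks are also the extension point for custom Executor behaviour. Below is a minimal, hypothetical sketch of registering a hook for one of the standard events; it assumes that add_hook simply associates the callable with the named event, so check the source below for how it interacts with the default hooks.

from lute.execution.executor import Executor\nfrom lute.execution.ipc import Message\n\n# Hypothetical managed Task name - not a real LUTE Task.\nexecutor: Executor = Executor(task_name=\"SomeManagedTask\")\n\ndef announce_start(self: Executor, msg: Message) -> None:\n    # Hypothetical hook body: log when the Task reports that it has started.\n    print(f\"{self._analysis_desc.task_result.task_name} reported task_started\")\n\n# Register the callable for the \"task_started\" event.\nexecutor.add_hook(\"task_started\", announce_start)\n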

Source code in lute/execution/executor.py
def add_default_hooks(self) -> None:\n    \"\"\"Populate the set of default event hooks.\"\"\"\n\n    def no_pickle_mode(self: Executor, msg: Message):\n        for idx, communicator in enumerate(self._communicators):\n            if isinstance(communicator, PipeCommunicator):\n                self._communicators[idx] = PipeCommunicator(\n                    Party.EXECUTOR, use_pickle=False\n                )\n\n    self.add_hook(\"no_pickle_mode\", no_pickle_mode)\n\n    def task_started(self: Executor, msg: Message):\n        if isinstance(msg.contents, TaskParameters):\n            self._analysis_desc.task_parameters = msg.contents\n            # Maybe just run this no matter what? Rely on the other guards?\n            # Perhaps just check if ThirdPartyParameters?\n            # if isinstance(self._analysis_desc.task_parameters, ThirdPartyParameters):\n            if hasattr(self._analysis_desc.task_parameters.Config, \"set_result\"):\n                # Third party Tasks may mark a parameter as the result\n                # If so, setup the result now.\n                self._set_result_from_parameters()\n        logger.info(\n            f\"Executor: {self._analysis_desc.task_result.task_name} started\"\n        )\n        self._analysis_desc.task_result.task_status = TaskStatus.RUNNING\n        elog_data: Dict[str, str] = {\n            f\"{self._analysis_desc.task_result.task_name} status\": \"RUNNING\",\n        }\n        post_elog_run_status(elog_data)\n\n    self.add_hook(\"task_started\", task_started)\n\n    def task_failed(self: Executor, msg: Message):\n        elog_data: Dict[str, str] = {\n            f\"{self._analysis_desc.task_result.task_name} status\": \"FAILED\",\n        }\n        post_elog_run_status(elog_data)\n\n    self.add_hook(\"task_failed\", task_failed)\n\n    def task_stopped(self: Executor, msg: Message):\n        elog_data: Dict[str, str] = {\n            f\"{self._analysis_desc.task_result.task_name} status\": \"STOPPED\",\n        }\n        post_elog_run_status(elog_data)\n\n    self.add_hook(\"task_stopped\", task_stopped)\n\n    def task_done(self: Executor, msg: Message):\n        elog_data: Dict[str, str] = {\n            f\"{self._analysis_desc.task_result.task_name} status\": \"COMPLETED\",\n        }\n        post_elog_run_status(elog_data)\n\n    self.add_hook(\"task_done\", task_done)\n\n    def task_cancelled(self: Executor, msg: Message):\n        elog_data: Dict[str, str] = {\n            f\"{self._analysis_desc.task_result.task_name} status\": \"CANCELLED\",\n        }\n        post_elog_run_status(elog_data)\n\n    self.add_hook(\"task_cancelled\", task_cancelled)\n\n    def task_result(self: Executor, msg: Message):\n        if isinstance(msg.contents, TaskResult):\n            self._analysis_desc.task_result = msg.contents\n            logger.info(self._analysis_desc.task_result.summary)\n            logger.info(self._analysis_desc.task_result.task_status)\n        elog_data: Dict[str, str] = {\n            f\"{self._analysis_desc.task_result.task_name} status\": \"COMPLETED\",\n        }\n        post_elog_run_status(elog_data)\n\n    self.add_hook(\"task_result\", task_result)\n
"},{"location":"source/execution/executor/#execution.executor.MPIExecutor","title":"MPIExecutor","text":"

Bases: Executor

Runs first-party Tasks that require MPI.

This Executor is otherwise identical to the standard Executor, except that it uses mpirun for Task submission. Currently this Executor assumes a job has been submitted using SLURM as a first step. It will determine the number of MPI ranks based on the resources requested. As a fallback, it will try to determine the number of local cores available for cases where a job has not been submitted via SLURM. On S3DF, this fallback should agree with the core count reported by SLURM's environment variables for the allocated resources.

This Executor will submit the Task to run with a number of processes equal to the total number of cores available minus 1. A single core is reserved for the Executor itself. Note that currently this means that you must submit on 3 cores or more, since MPI requires a minimum of 2 ranks, and the number of ranks is determined from the cores dedicated to Task execution.
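
As an illustration of the rank calculation above (mirroring the logic in _submit_cmd shown below), consider a hypothetical SLURM allocation of 4 cores: one core is reserved for the Executor and 3 MPI ranks are requested.

import os\n\n# Hypothetical allocation: 4 cores requested via SLURM.\nos.environ[\"SLURM_NPROCS\"] = \"4\"\n\n# One core is reserved for the Executor itself (with a minimum of 1 rank).\nnprocs: int = max(\n    int(os.environ.get(\"SLURM_NPROCS\", len(os.sched_getaffinity(0)))) - 1, 1\n)\nprint(f\"mpirun -np {nprocs} python -B -u -m mpi4py.run <script> <args>\")\n# -> mpirun -np 3 python -B -u -m mpi4py.run <script> <args>\n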

Methods:

Name Description _submit_cmd

Run the task as a subprocess using mpirun.

Source code in lute/execution/executor.py
class MPIExecutor(Executor):\n    \"\"\"Runs first-party Tasks that require MPI.\n\n    This Executor is otherwise identical to the standard Executor, except it\n    uses `mpirun` for `Task` submission. Currently this Executor assumes a job\n    has been submitted using SLURM as a first step. It will determine the number\n    of MPI ranks based on the resources requested. As a fallback, it will try\n    to determine the number of local cores available for cases where a job has\n    not been submitted via SLURM. On S3DF, the second determination mechanism\n    should accurately match the environment variable provided by SLURM indicating\n    resources allocated.\n\n    This Executor will submit the Task to run with a number of processes equal\n    to the total number of cores available minus 1. A single core is reserved\n    for the Executor itself. Note that currently this means that you must submit\n    on 3 cores or more, since MPI requires a minimum of 2 ranks, and the number\n    of ranks is determined from the cores dedicated to Task execution.\n\n    Methods:\n        _submit_cmd: Run the task as a subprocess using `mpirun`.\n    \"\"\"\n\n    def _submit_cmd(self, executable_path: str, params: str) -> str:\n        \"\"\"Override submission command to use `mpirun`\n\n        Args:\n            executable_path (str): Path to the LUTE subprocess script.\n\n            params (str): String of formatted command-line arguments.\n\n        Returns:\n            cmd (str): Appropriately formatted command for this Executor.\n        \"\"\"\n        py_cmd: str = \"\"\n        nprocs: int = max(\n            int(os.environ.get(\"SLURM_NPROCS\", len(os.sched_getaffinity(0)))) - 1, 1\n        )\n        mpi_cmd: str = f\"mpirun -np {nprocs}\"\n        if __debug__:\n            py_cmd = f\"python -B -u -m mpi4py.run {executable_path} {params}\"\n        else:\n            py_cmd = f\"python -OB -u -m mpi4py.run {executable_path} {params}\"\n\n        cmd: str = f\"{mpi_cmd} {py_cmd}\"\n        return cmd\n
"},{"location":"source/execution/executor/#execution.executor.Party","title":"Party","text":"

Bases: Enum

Identifier for which party (side/end) is using a communicator.

For some types of communication streams there may be different interfaces depending on which side of the communicator you are on. This enum is used by the communicator to determine which interface to use.
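
For example, the same Communicator class is instantiated with a different Party value on each side of the connection; a minimal sketch:

from lute.execution.ipc import Party, PipeCommunicator\n\n# Executor (server) side - reads what the Task writes.\nexecutor_comm = PipeCommunicator(Party.EXECUTOR)\n\n# Task (client) side - the default - writes contents and signals.\ntask_comm = PipeCommunicator(Party.TASK)\n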

Source code in lute/execution/ipc.py
class Party(Enum):\n    \"\"\"Identifier for which party (side/end) is using a communicator.\n\n    For some types of communication streams there may be different interfaces\n    depending on which side of the communicator you are on. This enum is used\n    by the communicator to determine which interface to use.\n    \"\"\"\n\n    TASK = 0\n    \"\"\"\n    The Task (client) side.\n    \"\"\"\n    EXECUTOR = 1\n    \"\"\"\n    The Executor (server) side.\n    \"\"\"\n
"},{"location":"source/execution/executor/#execution.executor.Party.EXECUTOR","title":"EXECUTOR = 1 class-attribute instance-attribute","text":"

The Executor (server) side.

"},{"location":"source/execution/executor/#execution.executor.Party.TASK","title":"TASK = 0 class-attribute instance-attribute","text":"

The Task (client) side.

"},{"location":"source/execution/executor/#execution.executor.PipeCommunicator","title":"PipeCommunicator","text":"

Bases: Communicator

Provides communication through pipes over stderr/stdout.

This communicator reads and writes over stderr and stdout. In general the Task writes while the Executor reads; stderr is used for sending signals.
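
A minimal sketch of this flow: the Task-side communicator writes a Message over its own stdout/stderr, and the Executor-side communicator reads it back from the Task subprocess' pipes. The subprocess handle here (proc) is a placeholder, assumed to be created with stdout=subprocess.PIPE and stderr=subprocess.PIPE.

from lute.execution.ipc import Message, Party, PipeCommunicator\n\n# Task side (inside the subprocess): contents go to stdout, signals to stderr.\ntask_comm = PipeCommunicator(Party.TASK, use_pickle=False)\ntask_comm.write(Message(contents=\"Some progress update\"))\n\n# Executor side: read back from the Task subprocess' pipes.\nexecutor_comm = PipeCommunicator(Party.EXECUTOR)\n# msg: Message = executor_comm.read(proc)  # proc: subprocess.Popen wrapping the Task\n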

Source code in lute/execution/ipc.py
class PipeCommunicator(Communicator):\n    \"\"\"Provides communication through pipes over stderr/stdout.\n\n    The implementation of this communicator has reading and writing ocurring\n    on stderr and stdout. In general the `Task` will be writing while the\n    `Executor` will be reading. `stderr` is used for sending signals.\n    \"\"\"\n\n    def __init__(self, party: Party = Party.TASK, use_pickle: bool = True) -> None:\n        \"\"\"IPC through pipes.\n\n        Arbitrary objects may be transmitted using pickle to serialize the data.\n        If pickle is not used\n\n        Args:\n            party (Party): Which object (side/process) the Communicator is\n                managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.\n            use_pickle (bool): Whether to serialize data using Pickle prior to\n                sending it. If False, data is assumed to be text whi\n        \"\"\"\n        super().__init__(party=party, use_pickle=use_pickle)\n        self.desc = \"Communicates through stderr and stdout using pickle.\"\n\n    def read(self, proc: subprocess.Popen) -> Message:\n        \"\"\"Read from stdout and stderr.\n\n        Args:\n            proc (subprocess.Popen): The process to read from.\n\n        Returns:\n            msg (Message): The message read, containing contents and signal.\n        \"\"\"\n        signal: Optional[str]\n        contents: Optional[str]\n        raw_signal: bytes = proc.stderr.read()\n        raw_contents: bytes = proc.stdout.read()\n        if raw_signal is not None:\n            signal = raw_signal.decode()\n        else:\n            signal = raw_signal\n        if raw_contents:\n            if self._use_pickle:\n                try:\n                    contents = pickle.loads(raw_contents)\n                except (pickle.UnpicklingError, ValueError, EOFError) as err:\n                    logger.debug(\"PipeCommunicator (Executor) - Set _use_pickle=False\")\n                    self._use_pickle = False\n                    contents = self._safe_unpickle_decode(raw_contents)\n            else:\n                try:\n                    contents = raw_contents.decode()\n                except UnicodeDecodeError as err:\n                    logger.debug(\"PipeCommunicator (Executor) - Set _use_pickle=True\")\n                    self._use_pickle = True\n                    contents = self._safe_unpickle_decode(raw_contents)\n        else:\n            contents = None\n\n        if signal and signal not in LUTE_SIGNALS:\n            # Some tasks write on stderr\n            # If the signal channel has \"non-signal\" info, add it to\n            # contents\n            if not contents:\n                contents = f\"({signal})\"\n            else:\n                contents = f\"{contents} ({signal})\"\n            signal = None\n\n        return Message(contents=contents, signal=signal)\n\n    def _safe_unpickle_decode(self, maybe_mixed: bytes) -> Optional[str]:\n        \"\"\"This method is used to unpickle and/or decode a bytes object.\n\n        It attempts to handle cases where contents can be mixed, i.e., part of\n        the message must be decoded and the other part unpickled. It handles\n        only two-way splits. If there are more complex arrangements such as:\n        <pickled>:<unpickled>:<pickled> etc, it will give up.\n\n        The simpler two way splits are unlikely to occur in normal usage. 
They\n        may arise when debugging if, e.g., `print` statements are mixed with the\n        usage of the `_report_to_executor` method.\n\n        Note that this method works because ONLY text data is assumed to be\n        sent via the pipes. The method needs to be revised to handle non-text\n        data if the `Task` is modified to also send that via PipeCommunicator.\n        The use of pickle is supported to provide for this option if it is\n        necessary. It may be deprecated in the future.\n\n        Be careful when making changes. This method has seemingly redundant\n        checks because unpickling will not throw an error if a full object can\n        be retrieved. That is, the library will ignore extraneous bytes. This\n        method attempts to retrieve that information if the pickled data comes\n        first in the stream.\n\n        Args:\n            maybe_mixed (bytes): A bytes object which could require unpickling,\n                decoding, or both.\n\n        Returns:\n            contents (Optional[str]): The unpickled/decoded contents if possible.\n                Otherwise, None.\n        \"\"\"\n        contents: Optional[str]\n        try:\n            contents = pickle.loads(maybe_mixed)\n            repickled: bytes = pickle.dumps(contents)\n            if len(repickled) < len(maybe_mixed):\n                # Successful unpickling, but pickle stops even if there are more bytes\n                try:\n                    additional_data: str = maybe_mixed[len(repickled) :].decode()\n                    contents = f\"{contents}{additional_data}\"\n                except UnicodeDecodeError:\n                    # Can't decode the bytes left by pickle, so they are lost\n                    missing_bytes: int = len(maybe_mixed) - len(repickled)\n                    logger.debug(\n                        f\"PipeCommunicator has truncated message. Unable to retrieve {missing_bytes} bytes.\"\n                    )\n        except (pickle.UnpicklingError, ValueError, EOFError) as err:\n            # Pickle may also throw a ValueError, e.g. this bytes: b\"Found! \\n\"\n            # Pickle may also throw an EOFError, eg. this bytes: b\"F0\\n\"\n            try:\n                contents = maybe_mixed.decode()\n            except UnicodeDecodeError as err2:\n                try:\n                    contents = maybe_mixed[: err2.start].decode()\n                    contents = f\"{contents}{pickle.loads(maybe_mixed[err2.start:])}\"\n                except Exception as err3:\n                    logger.debug(\n                        f\"PipeCommunicator unable to decode/parse data! 
{err3}\"\n                    )\n                    contents = None\n        return contents\n\n    def write(self, msg: Message) -> None:\n        \"\"\"Write to stdout and stderr.\n\n         The signal component is sent to `stderr` while the contents of the\n         Message are sent to `stdout`.\n\n        Args:\n            msg (Message): The Message to send.\n        \"\"\"\n        if self._use_pickle:\n            signal: bytes\n            if msg.signal:\n                signal = msg.signal.encode()\n            else:\n                signal = b\"\"\n\n            contents: bytes = pickle.dumps(msg.contents)\n\n            sys.stderr.buffer.write(signal)\n            sys.stdout.buffer.write(contents)\n\n            sys.stderr.buffer.flush()\n            sys.stdout.buffer.flush()\n        else:\n            raw_signal: str\n            if msg.signal:\n                raw_signal = msg.signal\n            else:\n                raw_signal = \"\"\n\n            raw_contents: str\n            if isinstance(msg.contents, str):\n                raw_contents = msg.contents\n            elif msg.contents is None:\n                raw_contents = \"\"\n            else:\n                raise ValueError(\n                    f\"Cannot send msg contents of type: {type(msg.contents)} when not using pickle!\"\n                )\n            sys.stderr.write(raw_signal)\n            sys.stdout.write(raw_contents)\n
"},{"location":"source/execution/executor/#execution.executor.PipeCommunicator.__init__","title":"__init__(party=Party.TASK, use_pickle=True)","text":"

IPC through pipes.

Arbitrary objects may be transmitted using pickle to serialize the data. If pickle is not used, the data is passed as plain text and decoded directly.

Parameters:

Name Type Description Default party Party

Which object (side/process) the Communicator is managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.

TASK use_pickle bool

Whether to serialize data using pickle prior to sending it. If False, data is assumed to be text which is decoded directly.

True Source code in lute/execution/ipc.py
def __init__(self, party: Party = Party.TASK, use_pickle: bool = True) -> None:\n    \"\"\"IPC through pipes.\n\n    Arbitrary objects may be transmitted using pickle to serialize the data.\n    If pickle is not used\n\n    Args:\n        party (Party): Which object (side/process) the Communicator is\n            managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.\n        use_pickle (bool): Whether to serialize data using Pickle prior to\n            sending it. If False, data is assumed to be text whi\n    \"\"\"\n    super().__init__(party=party, use_pickle=use_pickle)\n    self.desc = \"Communicates through stderr and stdout using pickle.\"\n
"},{"location":"source/execution/executor/#execution.executor.PipeCommunicator.read","title":"read(proc)","text":"

Read from stdout and stderr.

Parameters:

Name Type Description Default proc Popen

The process to read from.

required

Returns:

Name Type Description msg Message

The message read, containing contents and signal.

Source code in lute/execution/ipc.py
def read(self, proc: subprocess.Popen) -> Message:\n    \"\"\"Read from stdout and stderr.\n\n    Args:\n        proc (subprocess.Popen): The process to read from.\n\n    Returns:\n        msg (Message): The message read, containing contents and signal.\n    \"\"\"\n    signal: Optional[str]\n    contents: Optional[str]\n    raw_signal: bytes = proc.stderr.read()\n    raw_contents: bytes = proc.stdout.read()\n    if raw_signal is not None:\n        signal = raw_signal.decode()\n    else:\n        signal = raw_signal\n    if raw_contents:\n        if self._use_pickle:\n            try:\n                contents = pickle.loads(raw_contents)\n            except (pickle.UnpicklingError, ValueError, EOFError) as err:\n                logger.debug(\"PipeCommunicator (Executor) - Set _use_pickle=False\")\n                self._use_pickle = False\n                contents = self._safe_unpickle_decode(raw_contents)\n        else:\n            try:\n                contents = raw_contents.decode()\n            except UnicodeDecodeError as err:\n                logger.debug(\"PipeCommunicator (Executor) - Set _use_pickle=True\")\n                self._use_pickle = True\n                contents = self._safe_unpickle_decode(raw_contents)\n    else:\n        contents = None\n\n    if signal and signal not in LUTE_SIGNALS:\n        # Some tasks write on stderr\n        # If the signal channel has \"non-signal\" info, add it to\n        # contents\n        if not contents:\n            contents = f\"({signal})\"\n        else:\n            contents = f\"{contents} ({signal})\"\n        signal = None\n\n    return Message(contents=contents, signal=signal)\n
"},{"location":"source/execution/executor/#execution.executor.PipeCommunicator.write","title":"write(msg)","text":"

Write to stdout and stderr.

The signal component is sent to stderr while the contents of the Message are sent to stdout.

Parameters:

Name Type Description Default msg Message

The Message to send.

required Source code in lute/execution/ipc.py
def write(self, msg: Message) -> None:\n    \"\"\"Write to stdout and stderr.\n\n     The signal component is sent to `stderr` while the contents of the\n     Message are sent to `stdout`.\n\n    Args:\n        msg (Message): The Message to send.\n    \"\"\"\n    if self._use_pickle:\n        signal: bytes\n        if msg.signal:\n            signal = msg.signal.encode()\n        else:\n            signal = b\"\"\n\n        contents: bytes = pickle.dumps(msg.contents)\n\n        sys.stderr.buffer.write(signal)\n        sys.stdout.buffer.write(contents)\n\n        sys.stderr.buffer.flush()\n        sys.stdout.buffer.flush()\n    else:\n        raw_signal: str\n        if msg.signal:\n            raw_signal = msg.signal\n        else:\n            raw_signal = \"\"\n\n        raw_contents: str\n        if isinstance(msg.contents, str):\n            raw_contents = msg.contents\n        elif msg.contents is None:\n            raw_contents = \"\"\n        else:\n            raise ValueError(\n                f\"Cannot send msg contents of type: {type(msg.contents)} when not using pickle!\"\n            )\n        sys.stderr.write(raw_signal)\n        sys.stdout.write(raw_contents)\n
"},{"location":"source/execution/executor/#execution.executor.SocketCommunicator","title":"SocketCommunicator","text":"

Bases: Communicator

Provides communication over Unix or TCP sockets.

Communication is provided either using sockets with the Python socket library or using ZMQ. The choice of implementation is controlled by the global bool USE_ZMQ.

Whether to use TCP or Unix sockets is controlled by the environment variable:

LUTE_USE_TCP=1

If defined, TCP sockets will be used, otherwise Unix sockets will be used.

Regardless of socket type, the environment variable LUTE_EXECUTOR_HOST=<hostname> will be defined by the Executor-side Communicator.

For TCP sockets: The Executor-side Communicator should be run first and will bind to all interfaces on the port determined by the environment variable: LUTE_PORT=###. If no port is defined, a port scan is performed and the Executor-side Communicator binds to the first available port from a random selection. It then defines the environment variable so the Task side can pick it up.

For Unix sockets: The path to the Unix socket is defined by the environment variable: LUTE_SOCKET=/path/to/socket. This class assumes proper permissions and that the above environment variable has been defined. The Task is configured as what would commonly be referred to as the client, while the Executor is configured as the server.

If the Task process is run on a different machine than the Executor, the Task-side Communicator will open an SSH tunnel to forward traffic from a local Unix socket to the Executor Unix socket. Opening of the tunnel relies on the environment variable: LUTE_EXECUTOR_HOST=<hostname> to determine the Executor's host. This variable should be defined by the Executor and passed to the Task process automatically, but it can also be defined manually if launching the Task process separately. The Task will use the local socket <LUTE_SOCKET>.task{##}. Multiple local sockets may be created. Currently, it is assumed that the user is identical on both the Task machine and the Executor machine.
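
A hedged sketch of how the Executor side might be configured for TCP, summarizing the environment variables above. The values are placeholders; in normal LUTE usage the Executor and launch scripts define these for you.

import os\n\nfrom lute.execution.ipc import Party, SocketCommunicator\n\n# Placeholder values - normally defined by the Executor/launch environment.\nos.environ[\"LUTE_USE_TCP\"] = \"1\"   # Use TCP sockets rather than Unix sockets.\nos.environ[\"LUTE_PORT\"] = \"51234\"  # Otherwise a random free port is chosen.\n\n# The Executor (server) side binds first and exports LUTE_EXECUTOR_HOST for the Task.\nexecutor_comm = SocketCommunicator(Party.EXECUTOR)\nexecutor_comm.delayed_setup()      # Socket resources are acquired lazily.\n\n# The Task (client) side later picks up LUTE_EXECUTOR_HOST/LUTE_PORT from the environment.\n# task_comm = SocketCommunicator(Party.TASK)\n# task_comm.delayed_setup()\n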

Source code in lute/execution/ipc.py
class SocketCommunicator(Communicator):\n    \"\"\"Provides communication over Unix or TCP sockets.\n\n    Communication is provided either using sockets with the Python socket library\n    or using ZMQ. The choice of implementation is controlled by the global bool\n    `USE_ZMQ`.\n\n    Whether to use TCP or Unix sockets is controlled by the environment:\n                           `LUTE_USE_TCP=1`\n    If defined, TCP sockets will be used, otherwise Unix sockets will be used.\n\n    Regardless of socket type, the environment variable\n                      `LUTE_EXECUTOR_HOST=<hostname>`\n    will be defined by the Executor-side Communicator.\n\n\n    For TCP sockets:\n    The Executor-side Communicator should be run first and will bind to all\n    interfaces on the port determined by the environment variable:\n                            `LUTE_PORT=###`\n    If no port is defined, a port scan will be performed and the Executor-side\n    Communicator will bind the first one available from a random selection. It\n    will then define the environment variable so the Task-side can pick it up.\n\n    For Unix sockets:\n    The path to the Unix socket is defined by the environment variable:\n                      `LUTE_SOCKET=/path/to/socket`\n    This class assumes proper permissions and that this above environment\n    variable has been defined. The `Task` is configured as what would commonly\n    be referred to as the `client`, while the `Executor` is configured as the\n    server.\n\n    If the Task process is run on a different machine than the Executor, the\n    Task-side Communicator will open a ssh-tunnel to forward traffic from a local\n    Unix socket to the Executor Unix socket. Opening of the tunnel relies on the\n    environment variable:\n                      `LUTE_EXECUTOR_HOST=<hostname>`\n    to determine the Executor's host. This variable should be defined by the\n    Executor and passed to the Task process automatically, but it can also be\n    defined manually if launching the Task process separately. The Task will use\n    the local socket `<LUTE_SOCKET>.task{##}`. Multiple local sockets may be\n    created. Currently, it is assumed that the user is identical on both the Task\n    machine and Executor machine.\n    \"\"\"\n\n    ACCEPT_TIMEOUT: float = 0.01\n    \"\"\"\n    Maximum time to wait to accept connections. Used by Executor-side.\n    \"\"\"\n    MSG_HEAD: bytes = b\"MSG\"\n    \"\"\"\n    Start signal of a message. The end of a message is indicated by MSG_HEAD[::-1].\n    \"\"\"\n    MSG_SEP: bytes = b\";;;\"\n    \"\"\"\n    Separator for parts of a message. Messages have a start, length, message and end.\n    \"\"\"\n\n    def __init__(self, party: Party = Party.TASK, use_pickle: bool = True) -> None:\n        \"\"\"IPC over a TCP or Unix socket.\n\n        Unlike with the PipeCommunicator, pickle is always used to send data\n        through the socket.\n\n        Args:\n            party (Party): Which object (side/process) the Communicator is\n                managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.\n\n            use_pickle (bool): Whether to use pickle. Always True currently,\n                passing False does not change behaviour.\n        \"\"\"\n        super().__init__(party=party, use_pickle=use_pickle)\n\n    def delayed_setup(self) -> None:\n        \"\"\"Delays the creation of socket objects.\n\n        The Executor initializes the Communicator when it is created. 
Since\n        all Executors are created and available at once we want to delay\n        acquisition of socket resources until a single Executor is ready\n        to use them.\n        \"\"\"\n        self._data_socket: Union[socket.socket, zmq.sugar.socket.Socket]\n        if USE_ZMQ:\n            self.desc: str = \"Communicates using ZMQ through TCP or Unix sockets.\"\n            self._context: zmq.context.Context = zmq.Context()\n            self._data_socket = self._create_socket_zmq()\n        else:\n            self.desc: str = \"Communicates through a TCP or Unix socket.\"\n            self._data_socket = self._create_socket_raw()\n            self._data_socket.settimeout(SocketCommunicator.ACCEPT_TIMEOUT)\n\n        if self._party == Party.EXECUTOR:\n            # Executor created first so we can define the hostname env variable\n            os.environ[\"LUTE_EXECUTOR_HOST\"] = socket.gethostname()\n            # Setup reader thread\n            self._reader_thread: threading.Thread = threading.Thread(\n                target=self._read_socket\n            )\n            self._msg_queue: queue.Queue = queue.Queue()\n            self._partial_msg: Optional[bytes] = None\n            self._stop_thread: bool = False\n            self._reader_thread.start()\n        else:\n            # Only used by Party.TASK\n            self._use_ssh_tunnel: bool = False\n            self._ssh_proc: Optional[subprocess.Popen] = None\n            self._local_socket_path: Optional[str] = None\n\n    # Read\n    ############################################################################\n\n    def read(self, proc: subprocess.Popen) -> Message:\n        \"\"\"Return a message from the queue if available.\n\n        Socket(s) are continuously monitored, and read from when new data is\n        available.\n\n        Args:\n            proc (subprocess.Popen): The process to read from. Provided for\n                compatibility with other Communicator subtypes. 
Is ignored.\n\n        Returns:\n             msg (Message): The message read, containing contents and signal.\n        \"\"\"\n        msg: Message\n        try:\n            msg = self._msg_queue.get(timeout=SocketCommunicator.ACCEPT_TIMEOUT)\n        except queue.Empty:\n            msg = Message()\n\n        return msg\n\n    def _read_socket(self) -> None:\n        \"\"\"Read data from a socket.\n\n        Socket(s) are continuously monitored, and read from when new data is\n        available.\n\n        Calls an underlying method for either raw sockets or ZMQ.\n        \"\"\"\n\n        while True:\n            if self._stop_thread:\n                logger.debug(\"Stopping socket reader thread.\")\n                break\n            if USE_ZMQ:\n                self._read_socket_zmq()\n            else:\n                self._read_socket_raw()\n\n    def _read_socket_raw(self) -> None:\n        \"\"\"Read data from a socket.\n\n        Raw socket implementation for the reader thread.\n        \"\"\"\n        connection: socket.socket\n        addr: Union[str, Tuple[str, int]]\n        try:\n            connection, addr = self._data_socket.accept()\n            full_data: bytes = b\"\"\n            while True:\n                data: bytes = connection.recv(8192)\n                if data:\n                    full_data += data\n                else:\n                    break\n            connection.close()\n            self._unpack_messages(full_data)\n        except socket.timeout:\n            pass\n\n    def _read_socket_zmq(self) -> None:\n        \"\"\"Read data from a socket.\n\n        ZMQ implementation for the reader thread.\n        \"\"\"\n        try:\n            full_data: bytes = self._data_socket.recv(0)\n            self._unpack_messages(full_data)\n        except zmq.ZMQError:\n            pass\n\n    def _unpack_messages(self, data: bytes) -> None:\n        \"\"\"Unpacks a byte stream into individual messages.\n\n        Messages are encoded in the following format:\n                 <HEAD><SEP><len(msg)><SEP><msg><SEP><HEAD[::-1]>\n        The items between <> are replaced as follows:\n            - <HEAD>: A start marker\n            - <SEP>: A separator for components of the message\n            - <len(msg)>: The length of the message payload in bytes.\n            - <msg>: The message payload in bytes\n            - <HEAD[::-1]>: The start marker in reverse to indicate the end.\n\n        Partial messages (a series of bytes which cannot be converted to a full\n        message) are stored for later. 
An attempt is made to reconstruct the\n        message with the next call to this method.\n\n        Args:\n            data (bytes): A raw byte stream containing anywhere from a partial\n                message to multiple full messages.\n        \"\"\"\n        msg: Message\n        working_data: bytes\n        if self._partial_msg:\n            # Concatenate the previous partial message to the beginning\n            working_data = self._partial_msg + data\n            self._partial_msg = None\n        else:\n            working_data = data\n        while working_data:\n            try:\n                # Message encoding: <HEAD><SEP><len><SEP><msg><SEP><HEAD[::-1]>\n                end = working_data.find(\n                    SocketCommunicator.MSG_SEP + SocketCommunicator.MSG_HEAD[::-1]\n                )\n                msg_parts: List[bytes] = working_data[:end].split(\n                    SocketCommunicator.MSG_SEP\n                )\n                if len(msg_parts) != 3:\n                    self._partial_msg = working_data\n                    break\n\n                cmd: bytes\n                nbytes: bytes\n                raw_msg: bytes\n                cmd, nbytes, raw_msg = msg_parts\n                if len(raw_msg) != int(nbytes):\n                    self._partial_msg = working_data\n                    break\n                msg = pickle.loads(raw_msg)\n                self._msg_queue.put(msg)\n            except pickle.UnpicklingError:\n                self._partial_msg = working_data\n                break\n            if end < len(working_data):\n                # Add len(SEP+HEAD) since end marks the start of <SEP><HEAD[::-1]\n                offset: int = len(\n                    SocketCommunicator.MSG_SEP + SocketCommunicator.MSG_HEAD\n                )\n                working_data = working_data[end + offset :]\n            else:\n                working_data = b\"\"\n\n    # Write\n    ############################################################################\n\n    def _write_socket(self, msg: Message) -> None:\n        \"\"\"Sends data over a socket from the 'client' (Task) side.\n\n        Messages are encoded in the following format:\n                 <HEAD><SEP><len(msg)><SEP><msg><SEP><HEAD[::-1]>\n        The items between <> are replaced as follows:\n            - <HEAD>: A start marker\n            - <SEP>: A separator for components of the message\n            - <len(msg)>: The length of the message payload in bytes.\n            - <msg>: The message payload in bytes\n            - <HEAD[::-1]>: The start marker in reverse to indicate the end.\n\n        This structure is used for decoding the message on the other end.\n        \"\"\"\n        data: bytes = pickle.dumps(msg)\n        cmd: bytes = SocketCommunicator.MSG_HEAD\n        size: bytes = b\"%d\" % len(data)\n        end: bytes = SocketCommunicator.MSG_HEAD[::-1]\n        sep: bytes = SocketCommunicator.MSG_SEP\n        packed_msg: bytes = cmd + sep + size + sep + data + sep + end\n        if USE_ZMQ:\n            self._data_socket.send(packed_msg)\n        else:\n            self._data_socket.sendall(packed_msg)\n\n    def write(self, msg: Message) -> None:\n        \"\"\"Send a single Message.\n\n        The entire Message (signal and contents) is serialized and sent through\n        a connection over Unix socket.\n\n        Args:\n            msg (Message): The Message to send.\n        \"\"\"\n        self._write_socket(msg)\n\n    # Generic create\n    
############################################################################\n\n    def _create_socket_raw(self) -> socket.socket:\n        \"\"\"Create either a Unix or TCP socket.\n\n        If the environment variable:\n                              `LUTE_USE_TCP=1`\n        is defined, a TCP socket is returned, otherwise a Unix socket.\n\n        Refer to the individual initialization methods for additional environment\n        variables controlling the behaviour of these two communication types.\n\n        Returns:\n            data_socket (socket.socket): TCP or Unix socket.\n        \"\"\"\n        import struct\n\n        use_tcp: Optional[str] = os.getenv(\"LUTE_USE_TCP\")\n        sock: socket.socket\n        if use_tcp is not None:\n            if self._party == Party.EXECUTOR:\n                logger.info(\"Will use raw TCP sockets.\")\n            sock = self._init_tcp_socket_raw()\n        else:\n            if self._party == Party.EXECUTOR:\n                logger.info(\"Will use raw Unix sockets.\")\n            sock = self._init_unix_socket_raw()\n        sock.setsockopt(\n            socket.SOL_SOCKET, socket.SO_LINGER, struct.pack(\"ii\", 1, 10000)\n        )\n        return sock\n\n    def _create_socket_zmq(self) -> zmq.sugar.socket.Socket:\n        \"\"\"Create either a Unix or TCP socket.\n\n        If the environment variable:\n                              `LUTE_USE_TCP=1`\n        is defined, a TCP socket is returned, otherwise a Unix socket.\n\n        Refer to the individual initialization methods for additional environment\n        variables controlling the behaviour of these two communication types.\n\n        Returns:\n            data_socket (socket.socket): Unix socket object.\n        \"\"\"\n        socket_type: Literal[zmq.PULL, zmq.PUSH]\n        if self._party == Party.EXECUTOR:\n            socket_type = zmq.PULL\n        else:\n            socket_type = zmq.PUSH\n\n        data_socket: zmq.sugar.socket.Socket = self._context.socket(socket_type)\n        data_socket.set_hwm(160000)\n        # Need to multiply by 1000 since ZMQ uses ms\n        data_socket.setsockopt(\n            zmq.RCVTIMEO, int(SocketCommunicator.ACCEPT_TIMEOUT * 1000)\n        )\n        # Try TCP first\n        use_tcp: Optional[str] = os.getenv(\"LUTE_USE_TCP\")\n        if use_tcp is not None:\n            if self._party == Party.EXECUTOR:\n                logger.info(\"Will use TCP (ZMQ).\")\n            self._init_tcp_socket_zmq(data_socket)\n        else:\n            if self._party == Party.EXECUTOR:\n                logger.info(\"Will use Unix sockets (ZMQ).\")\n            self._init_unix_socket_zmq(data_socket)\n\n        return data_socket\n\n    # TCP Init\n    ############################################################################\n\n    def _find_random_port(\n        self, min_port: int = 41923, max_port: int = 64324, max_tries: int = 100\n    ) -> Optional[int]:\n        \"\"\"Find a random open port to bind to if using TCP.\"\"\"\n        from random import choices\n\n        sock: socket.socket\n        ports: List[int] = choices(range(min_port, max_port), k=max_tries)\n        for port in ports:\n            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)\n            try:\n                sock.bind((\"\", port))\n                sock.close()\n                del sock\n                return port\n            except:\n                continue\n        return None\n\n    def _init_tcp_socket_raw(self) -> socket.socket:\n        \"\"\"Initialize a 
TCP socket.\n\n        Executor-side code should always be run first. It checks to see if\n        the environment variable\n                                `LUTE_PORT=###`\n        is defined, if so binds it, otherwise find a free port from a selection\n        of random ports. If a port search is performed, the `LUTE_PORT` variable\n        will be defined so it can be picked up by the the Task-side Communicator.\n\n        In the event that no port can be bound on the Executor-side, or the port\n        and hostname information is unavailable to the Task-side, the program\n        will exit.\n\n        Returns:\n            data_socket (socket.socket): TCP socket object.\n        \"\"\"\n        data_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)\n        port: Optional[Union[str, int]] = os.getenv(\"LUTE_PORT\")\n        if self._party == Party.EXECUTOR:\n            if port is None:\n                # If port is None find one\n                # Executor code executes first\n                port = self._find_random_port()\n                if port is None:\n                    # Failed to find a port to bind\n                    logger.info(\n                        \"Executor failed to bind a port. \"\n                        \"Try providing a LUTE_PORT directly! Exiting!\"\n                    )\n                    sys.exit(-1)\n                # Provide port env var for Task-side\n                os.environ[\"LUTE_PORT\"] = str(port)\n            data_socket.bind((\"\", int(port)))\n            data_socket.listen()\n        else:\n            hostname: str = socket.gethostname()\n            executor_hostname: Optional[str] = os.getenv(\"LUTE_EXECUTOR_HOST\")\n            if executor_hostname is None or port is None:\n                logger.info(\n                    \"Task-side does not have host/port information!\"\n                    \" Check environment variables! Exiting!\"\n                )\n                sys.exit(-1)\n            if hostname == executor_hostname:\n                data_socket.connect((\"localhost\", int(port)))\n            else:\n                data_socket.connect((executor_hostname, int(port)))\n        return data_socket\n\n    def _init_tcp_socket_zmq(self, data_socket: zmq.sugar.socket.Socket) -> None:\n        \"\"\"Initialize a TCP socket using ZMQ.\n\n        Equivalent as the method above but requires passing in a ZMQ socket\n        object instead of returning one.\n\n        Args:\n            data_socket (zmq.socket.Socket): Socket object.\n        \"\"\"\n        port: Optional[Union[str, int]] = os.getenv(\"LUTE_PORT\")\n        if self._party == Party.EXECUTOR:\n            if port is None:\n                new_port: int = data_socket.bind_to_random_port(\"tcp://*\")\n                if new_port is None:\n                    # Failed to find a port to bind\n                    logger.info(\n                        \"Executor failed to bind a port. \"\n                        \"Try providing a LUTE_PORT directly! 
Exiting!\"\n                    )\n                    sys.exit(-1)\n                port = new_port\n                os.environ[\"LUTE_PORT\"] = str(port)\n            else:\n                data_socket.bind(f\"tcp://*:{port}\")\n            logger.debug(f\"Executor bound port {port}\")\n        else:\n            executor_hostname: Optional[str] = os.getenv(\"LUTE_EXECUTOR_HOST\")\n            if executor_hostname is None or port is None:\n                logger.info(\n                    \"Task-side does not have host/port information!\"\n                    \" Check environment variables! Exiting!\"\n                )\n                sys.exit(-1)\n            data_socket.connect(f\"tcp://{executor_hostname}:{port}\")\n\n    # Unix Init\n    ############################################################################\n\n    def _get_socket_path(self) -> str:\n        \"\"\"Return the socket path, defining one if it is not available.\n\n        Returns:\n            socket_path (str): Path to the Unix socket.\n        \"\"\"\n        socket_path: str\n        try:\n            socket_path = os.environ[\"LUTE_SOCKET\"]\n        except KeyError as err:\n            import uuid\n            import tempfile\n\n            # Define a path, and add to environment\n            # Executor-side always created first, Task will use the same one\n            socket_path = f\"{tempfile.gettempdir()}/lute_{uuid.uuid4().hex}.sock\"\n            os.environ[\"LUTE_SOCKET\"] = socket_path\n            logger.debug(f\"SocketCommunicator defines socket_path: {socket_path}\")\n        if USE_ZMQ:\n            return f\"ipc://{socket_path}\"\n        else:\n            return socket_path\n\n    def _init_unix_socket_raw(self) -> socket.socket:\n        \"\"\"Returns a Unix socket object.\n\n        Executor-side code should always be run first. It checks to see if\n        the environment variable\n                                `LUTE_SOCKET=XYZ`\n        is defined, if so binds it, otherwise it will create a new path and\n        define the environment variable for the Task-side to find.\n\n        On the Task (client-side), this method will also open a SSH tunnel to\n        forward a local Unix socket to an Executor Unix socket if the Task and\n        Executor processes are on different machines.\n\n        Returns:\n            data_socket (socket.socket): Unix socket object.\n        \"\"\"\n        socket_path: str = self._get_socket_path()\n        data_socket = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)\n        if self._party == Party.EXECUTOR:\n            if os.path.exists(socket_path):\n                os.unlink(socket_path)\n            data_socket.bind(socket_path)\n            data_socket.listen()\n        elif self._party == Party.TASK:\n            hostname: str = socket.gethostname()\n            executor_hostname: Optional[str] = os.getenv(\"LUTE_EXECUTOR_HOST\")\n            if executor_hostname is None:\n                logger.info(\"Hostname for Executor process not found! 
Exiting!\")\n                data_socket.close()\n                sys.exit(-1)\n            if hostname == executor_hostname:\n                data_socket.connect(socket_path)\n            else:\n                self._local_socket_path = self._setup_unix_ssh_tunnel(\n                    socket_path, hostname, executor_hostname\n                )\n                while 1:\n                    # Keep trying reconnect until ssh tunnel works.\n                    try:\n                        data_socket.connect(self._local_socket_path)\n                        break\n                    except FileNotFoundError:\n                        continue\n\n        return data_socket\n\n    def _init_unix_socket_zmq(self, data_socket: zmq.sugar.socket.Socket) -> None:\n        \"\"\"Initialize a Unix socket object, using ZMQ.\n\n        Equivalent as the method above but requires passing in a ZMQ socket\n        object instead of returning one.\n\n        Args:\n            data_socket (socket.socket): ZMQ object.\n        \"\"\"\n        socket_path = self._get_socket_path()\n        if self._party == Party.EXECUTOR:\n            if os.path.exists(socket_path):\n                os.unlink(socket_path)\n            data_socket.bind(socket_path)\n        elif self._party == Party.TASK:\n            hostname: str = socket.gethostname()\n            executor_hostname: Optional[str] = os.getenv(\"LUTE_EXECUTOR_HOST\")\n            if executor_hostname is None:\n                logger.info(\"Hostname for Executor process not found! Exiting!\")\n                self._data_socket.close()\n                sys.exit(-1)\n            if hostname == executor_hostname:\n                data_socket.connect(socket_path)\n            else:\n                # Need to remove ipc:// from socket_path for forwarding\n                self._local_socket_path = self._setup_unix_ssh_tunnel(\n                    socket_path[6:], hostname, executor_hostname\n                )\n                # Need to add it back\n                path: str = f\"ipc://{self._local_socket_path}\"\n                data_socket.connect(path)\n\n    def _setup_unix_ssh_tunnel(\n        self, socket_path: str, hostname: str, executor_hostname: str\n    ) -> str:\n        \"\"\"Prepares an SSH tunnel for forwarding between Unix sockets on two hosts.\n\n        An SSH tunnel is opened with `ssh -L <local>:<remote> sleep 2`.\n        This method of communication is slightly slower and incurs additional\n        overhead - it should only be used as a backup. If communication across\n        multiple hosts is required consider using TCP.  The Task will use\n        the local socket `<LUTE_SOCKET>.task{##}`. Multiple local sockets may be\n        created. 
It is assumed that the user is identical on both the\n        Task machine and Executor machine.\n\n        Returns:\n            local_socket_path (str): The local Unix socket to connect to.\n        \"\"\"\n        if \"uuid\" not in globals():\n            import uuid\n        local_socket_path = f\"{socket_path}.task{uuid.uuid4().hex[:4]}\"\n        self._use_ssh_tunnel = True\n        ssh_cmd: List[str] = [\n            \"ssh\",\n            \"-o\",\n            \"LogLevel=quiet\",\n            \"-L\",\n            f\"{local_socket_path}:{socket_path}\",\n            executor_hostname,\n            \"sleep\",\n            \"2\",\n        ]\n        logger.debug(f\"Opening tunnel from {hostname} to {executor_hostname}\")\n        self._ssh_proc = subprocess.Popen(ssh_cmd)\n        time.sleep(0.4)  # Need to wait... -> Use single Task comm at beginning?\n        return local_socket_path\n\n    # Clean up and properties\n    ############################################################################\n\n    def _clean_up(self) -> None:\n        \"\"\"Clean up connections.\"\"\"\n        if self._party == Party.EXECUTOR:\n            self._stop_thread = True\n            self._reader_thread.join()\n            logger.debug(\"Closed reading thread.\")\n\n        self._data_socket.close()\n        if USE_ZMQ:\n            self._context.term()\n        else:\n            ...\n\n        if os.getenv(\"LUTE_USE_TCP\"):\n            return\n        else:\n            if self._party == Party.EXECUTOR:\n                os.unlink(os.getenv(\"LUTE_SOCKET\"))  # Should be defined\n                return\n            elif self._use_ssh_tunnel:\n                if self._ssh_proc is not None:\n                    self._ssh_proc.terminate()\n\n    @property\n    def has_messages(self) -> bool:\n        if self._party == Party.TASK:\n            # Shouldn't be called on Task-side\n            return False\n\n        if self._msg_queue.qsize() > 0:\n            return True\n        return False\n\n    def __exit__(self):\n        self._clean_up()\n
"},{"location":"source/execution/executor/#execution.executor.SocketCommunicator.ACCEPT_TIMEOUT","title":"ACCEPT_TIMEOUT: float = 0.01 class-attribute instance-attribute","text":"

Maximum time to wait to accept connections. Used by Executor-side.

"},{"location":"source/execution/executor/#execution.executor.SocketCommunicator.MSG_HEAD","title":"MSG_HEAD: bytes = b'MSG' class-attribute instance-attribute","text":"

Start signal of a message. The end of a message is indicated by MSG_HEAD[::-1].

"},{"location":"source/execution/executor/#execution.executor.SocketCommunicator.MSG_SEP","title":"MSG_SEP: bytes = b';;;' class-attribute instance-attribute","text":"

Separator for parts of a message. Messages have a start, length, message and end.
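
The framing defined by these attributes can be illustrated as follows; this mirrors the packing performed in _write_socket in the source above.

import pickle\n\nfrom lute.execution.ipc import Message, SocketCommunicator\n\ndata: bytes = pickle.dumps(Message(contents=\"hello\"))\npacked: bytes = (\n    SocketCommunicator.MSG_HEAD           # b\"MSG\" - start marker\n    + SocketCommunicator.MSG_SEP          # b\";;;\" - separator\n    + b\"%d\" % len(data)                   # payload length in bytes\n    + SocketCommunicator.MSG_SEP\n    + data                                # pickled Message payload\n    + SocketCommunicator.MSG_SEP\n    + SocketCommunicator.MSG_HEAD[::-1]   # b\"GSM\" - end marker\n)\n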

"},{"location":"source/execution/executor/#execution.executor.SocketCommunicator.__init__","title":"__init__(party=Party.TASK, use_pickle=True)","text":"

IPC over a TCP or Unix socket.

Unlike the PipeCommunicator, pickle is always used to send data through the socket.

Parameters:

Name Type Description Default party Party

Which object (side/process) the Communicator is managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.

TASK use_pickle bool

Whether to use pickle. Always True currently, passing False does not change behaviour.

True Source code in lute/execution/ipc.py
def __init__(self, party: Party = Party.TASK, use_pickle: bool = True) -> None:\n    \"\"\"IPC over a TCP or Unix socket.\n\n    Unlike with the PipeCommunicator, pickle is always used to send data\n    through the socket.\n\n    Args:\n        party (Party): Which object (side/process) the Communicator is\n            managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.\n\n        use_pickle (bool): Whether to use pickle. Always True currently,\n            passing False does not change behaviour.\n    \"\"\"\n    super().__init__(party=party, use_pickle=use_pickle)\n
"},{"location":"source/execution/executor/#execution.executor.SocketCommunicator.delayed_setup","title":"delayed_setup()","text":"

Delays the creation of socket objects.

The Executor initializes the Communicator when it is created. Since all Executors are created and available at once, we want to delay acquisition of socket resources until a single Executor is ready to use them.

Source code in lute/execution/ipc.py
def delayed_setup(self) -> None:\n    \"\"\"Delays the creation of socket objects.\n\n    The Executor initializes the Communicator when it is created. Since\n    all Executors are created and available at once we want to delay\n    acquisition of socket resources until a single Executor is ready\n    to use them.\n    \"\"\"\n    self._data_socket: Union[socket.socket, zmq.sugar.socket.Socket]\n    if USE_ZMQ:\n        self.desc: str = \"Communicates using ZMQ through TCP or Unix sockets.\"\n        self._context: zmq.context.Context = zmq.Context()\n        self._data_socket = self._create_socket_zmq()\n    else:\n        self.desc: str = \"Communicates through a TCP or Unix socket.\"\n        self._data_socket = self._create_socket_raw()\n        self._data_socket.settimeout(SocketCommunicator.ACCEPT_TIMEOUT)\n\n    if self._party == Party.EXECUTOR:\n        # Executor created first so we can define the hostname env variable\n        os.environ[\"LUTE_EXECUTOR_HOST\"] = socket.gethostname()\n        # Setup reader thread\n        self._reader_thread: threading.Thread = threading.Thread(\n            target=self._read_socket\n        )\n        self._msg_queue: queue.Queue = queue.Queue()\n        self._partial_msg: Optional[bytes] = None\n        self._stop_thread: bool = False\n        self._reader_thread.start()\n    else:\n        # Only used by Party.TASK\n        self._use_ssh_tunnel: bool = False\n        self._ssh_proc: Optional[subprocess.Popen] = None\n        self._local_socket_path: Optional[str] = None\n
"},{"location":"source/execution/executor/#execution.executor.SocketCommunicator.read","title":"read(proc)","text":"

Return a message from the queue if available.

Socket(s) are continuously monitored, and read from when new data is available.

Parameters:

Name Type Description Default proc Popen

The process to read from. Provided for compatibility with other Communicator subtypes. Is ignored.

required

Returns:

Name Type Description msg Message

The message read, containing contents and signal.

Source code in lute/execution/ipc.py
def read(self, proc: subprocess.Popen) -> Message:\n    \"\"\"Return a message from the queue if available.\n\n    Socket(s) are continuously monitored, and read from when new data is\n    available.\n\n    Args:\n        proc (subprocess.Popen): The process to read from. Provided for\n            compatibility with other Communicator subtypes. Is ignored.\n\n    Returns:\n         msg (Message): The message read, containing contents and signal.\n    \"\"\"\n    msg: Message\n    try:\n        msg = self._msg_queue.get(timeout=SocketCommunicator.ACCEPT_TIMEOUT)\n    except queue.Empty:\n        msg = Message()\n\n    return msg\n
"},{"location":"source/execution/executor/#execution.executor.SocketCommunicator.write","title":"write(msg)","text":"

Send a single Message.

The entire Message (signal and contents) is serialized and sent through a connection over a Unix or TCP socket.

Parameters:

Name Type Description Default msg Message

The Message to send.

required Source code in lute/execution/ipc.py
def write(self, msg: Message) -> None:\n    \"\"\"Send a single Message.\n\n    The entire Message (signal and contents) is serialized and sent through\n    a connection over Unix socket.\n\n    Args:\n        msg (Message): The Message to send.\n    \"\"\"\n    self._write_socket(msg)\n
"},{"location":"source/execution/ipc/","title":"ipc","text":"

Classes and utilities for communication between Executors and subprocesses.

Communicators manage message passing and parsing between subprocesses. They maintain a limited public interface of \"read\" and \"write\" operations. Behind this interface the methods of communication vary from serialization across pipes to Unix sockets, etc. All communicators pass a single object called a \"Message\" which contains an arbitrary \"contents\" field as well as an optional \"signal\" field.
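For illustration, a Message can be constructed with only contents, or with an accompanying signal. This is a minimal sketch; the specific signal string below is a placeholder, not necessarily one of the defined LUTE signals.

# Minimal sketch of the Message object described above. Field names
# (contents, signal) come from this page; the signal string is a placeholder.
from lute.execution.ipc import Message

progress_update = Message(contents="Processed 500/1000 events.")
result_message = Message(contents={"result": "/path/to/output"}, signal="TASK_RESULT")  # placeholder signal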

Classes:

Name Description Party

Enum describing whether Communicator is on Task-side or Executor-side.

Message

A dataclass used for passing information from Task to Executor.

Communicator

Abstract base class for Communicator types.

PipeCommunicator

Manages communication between Task and Executor via pipes (stderr and stdout).

SocketCommunicator

Manages communication using sockets, either raw or using zmq. Supports both TCP and Unix sockets.

"},{"location":"source/execution/ipc/#execution.ipc.Communicator","title":"Communicator","text":"

Bases: ABC

Source code in lute/execution/ipc.py
class Communicator(ABC):\n    def __init__(self, party: Party = Party.TASK, use_pickle: bool = True) -> None:\n        \"\"\"Abstract Base Class for IPC Communicator objects.\n\n        Args:\n            party (Party): Which object (side/process) the Communicator is\n                managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.\n            use_pickle (bool): Whether to serialize data using pickle prior to\n                sending it.\n        \"\"\"\n        self._party = party\n        self._use_pickle = use_pickle\n        self.desc = \"Communicator abstract base class.\"\n\n    @abstractmethod\n    def read(self, proc: subprocess.Popen) -> Message:\n        \"\"\"Method for reading data through the communication mechanism.\"\"\"\n        ...\n\n    @abstractmethod\n    def write(self, msg: Message) -> None:\n        \"\"\"Method for sending data through the communication mechanism.\"\"\"\n        ...\n\n    def __str__(self):\n        name: str = str(type(self)).split(\"'\")[1].split(\".\")[-1]\n        return f\"{name}: {self.desc}\"\n\n    def __repr__(self):\n        return self.__str__()\n\n    def __enter__(self) -> Self:\n        return self\n\n    def __exit__(self) -> None: ...\n\n    @property\n    def has_messages(self) -> bool:\n        \"\"\"Whether the Communicator has remaining messages.\n\n        The precise method for determining whether there are remaining messages\n        will depend on the specific Communicator sub-class.\n        \"\"\"\n        return False\n\n    def stage_communicator(self):\n        \"\"\"Alternative method for staging outside of context manager.\"\"\"\n        self.__enter__()\n\n    def clear_communicator(self):\n        \"\"\"Alternative exit method outside of context manager.\"\"\"\n        self.__exit__()\n\n    def delayed_setup(self):\n        \"\"\"Any setup that should be done later than init.\"\"\"\n        ...\n
"},{"location":"source/execution/ipc/#execution.ipc.Communicator.has_messages","title":"has_messages: bool property","text":"

Whether the Communicator has remaining messages.

The precise method for determining whether there are remaining messages will depend on the specific Communicator sub-class.
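Concrete Communicators only need to implement read and write (and may override has_messages). The following is a minimal, hypothetical subclass shown purely to illustrate the interface; it is not part of LUTE.

# Hypothetical Communicator subclass for illustration only. The real
# implementations are PipeCommunicator and SocketCommunicator.
import subprocess
from typing import List

from lute.execution.ipc import Communicator, Message, Party

class ListCommunicator(Communicator):
    """Stores Messages in an in-memory list instead of sending them anywhere."""

    def __init__(self, party: Party = Party.TASK, use_pickle: bool = True) -> None:
        super().__init__(party=party, use_pickle=use_pickle)
        self.desc = "Stores messages in a local list (illustration only)."
        self._messages: List[Message] = []

    def write(self, msg: Message) -> None:
        self._messages.append(msg)

    def read(self, proc: subprocess.Popen) -> Message:
        # Return the oldest stored Message, or an empty Message if none remain.
        return self._messages.pop(0) if self._messages else Message()

    @property
    def has_messages(self) -> bool:
        return bool(self._messages)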

"},{"location":"source/execution/ipc/#execution.ipc.Communicator.__init__","title":"__init__(party=Party.TASK, use_pickle=True)","text":"

Abstract Base Class for IPC Communicator objects.

Parameters:

Name Type Description Default party Party

Which object (side/process) the Communicator is managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.

TASK use_pickle bool

Whether to serialize data using pickle prior to sending it.

True Source code in lute/execution/ipc.py
def __init__(self, party: Party = Party.TASK, use_pickle: bool = True) -> None:\n    \"\"\"Abstract Base Class for IPC Communicator objects.\n\n    Args:\n        party (Party): Which object (side/process) the Communicator is\n            managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.\n        use_pickle (bool): Whether to serialize data using pickle prior to\n            sending it.\n    \"\"\"\n    self._party = party\n    self._use_pickle = use_pickle\n    self.desc = \"Communicator abstract base class.\"\n
"},{"location":"source/execution/ipc/#execution.ipc.Communicator.clear_communicator","title":"clear_communicator()","text":"

Alternative exit method outside of context manager.

Source code in lute/execution/ipc.py
def clear_communicator(self):\n    \"\"\"Alternative exit method outside of context manager.\"\"\"\n    self.__exit__()\n
"},{"location":"source/execution/ipc/#execution.ipc.Communicator.delayed_setup","title":"delayed_setup()","text":"

Any setup that should be done later than init.

Source code in lute/execution/ipc.py
def delayed_setup(self):\n    \"\"\"Any setup that should be done later than init.\"\"\"\n    ...\n
"},{"location":"source/execution/ipc/#execution.ipc.Communicator.read","title":"read(proc) abstractmethod","text":"

Method for reading data through the communication mechanism.

Source code in lute/execution/ipc.py
@abstractmethod\ndef read(self, proc: subprocess.Popen) -> Message:\n    \"\"\"Method for reading data through the communication mechanism.\"\"\"\n    ...\n
"},{"location":"source/execution/ipc/#execution.ipc.Communicator.stage_communicator","title":"stage_communicator()","text":"

Alternative method for staging outside of context manager.

Source code in lute/execution/ipc.py
def stage_communicator(self):\n    \"\"\"Alternative method for staging outside of context manager.\"\"\"\n    self.__enter__()\n
"},{"location":"source/execution/ipc/#execution.ipc.Communicator.write","title":"write(msg) abstractmethod","text":"

Method for sending data through the communication mechanism.

Source code in lute/execution/ipc.py
@abstractmethod\ndef write(self, msg: Message) -> None:\n    \"\"\"Method for sending data through the communication mechanism.\"\"\"\n    ...\n
"},{"location":"source/execution/ipc/#execution.ipc.Party","title":"Party","text":"

Bases: Enum

Identifier for which party (side/end) is using a communicator.

For some types of communication streams there may be different interfaces depending on which side of the communicator you are on. This enum is used by the communicator to determine which interface to use.

Source code in lute/execution/ipc.py
class Party(Enum):\n    \"\"\"Identifier for which party (side/end) is using a communicator.\n\n    For some types of communication streams there may be different interfaces\n    depending on which side of the communicator you are on. This enum is used\n    by the communicator to determine which interface to use.\n    \"\"\"\n\n    TASK = 0\n    \"\"\"\n    The Task (client) side.\n    \"\"\"\n    EXECUTOR = 1\n    \"\"\"\n    The Executor (server) side.\n    \"\"\"\n
"},{"location":"source/execution/ipc/#execution.ipc.Party.EXECUTOR","title":"EXECUTOR = 1 class-attribute instance-attribute","text":"

The Executor (server) side.

"},{"location":"source/execution/ipc/#execution.ipc.Party.TASK","title":"TASK = 0 class-attribute instance-attribute","text":"

The Task (client) side.

"},{"location":"source/execution/ipc/#execution.ipc.PipeCommunicator","title":"PipeCommunicator","text":"

Bases: Communicator

Provides communication through pipes over stderr/stdout.

The implementation of this communicator has reading and writing occurring on stderr and stdout. In general the Task will be writing while the Executor will be reading. stderr is used for sending signals.

Source code in lute/execution/ipc.py
class PipeCommunicator(Communicator):\n    \"\"\"Provides communication through pipes over stderr/stdout.\n\n    The implementation of this communicator has reading and writing ocurring\n    on stderr and stdout. In general the `Task` will be writing while the\n    `Executor` will be reading. `stderr` is used for sending signals.\n    \"\"\"\n\n    def __init__(self, party: Party = Party.TASK, use_pickle: bool = True) -> None:\n        \"\"\"IPC through pipes.\n\n        Arbitrary objects may be transmitted using pickle to serialize the data.\n        If pickle is not used\n\n        Args:\n            party (Party): Which object (side/process) the Communicator is\n                managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.\n            use_pickle (bool): Whether to serialize data using Pickle prior to\n                sending it. If False, data is assumed to be text whi\n        \"\"\"\n        super().__init__(party=party, use_pickle=use_pickle)\n        self.desc = \"Communicates through stderr and stdout using pickle.\"\n\n    def read(self, proc: subprocess.Popen) -> Message:\n        \"\"\"Read from stdout and stderr.\n\n        Args:\n            proc (subprocess.Popen): The process to read from.\n\n        Returns:\n            msg (Message): The message read, containing contents and signal.\n        \"\"\"\n        signal: Optional[str]\n        contents: Optional[str]\n        raw_signal: bytes = proc.stderr.read()\n        raw_contents: bytes = proc.stdout.read()\n        if raw_signal is not None:\n            signal = raw_signal.decode()\n        else:\n            signal = raw_signal\n        if raw_contents:\n            if self._use_pickle:\n                try:\n                    contents = pickle.loads(raw_contents)\n                except (pickle.UnpicklingError, ValueError, EOFError) as err:\n                    logger.debug(\"PipeCommunicator (Executor) - Set _use_pickle=False\")\n                    self._use_pickle = False\n                    contents = self._safe_unpickle_decode(raw_contents)\n            else:\n                try:\n                    contents = raw_contents.decode()\n                except UnicodeDecodeError as err:\n                    logger.debug(\"PipeCommunicator (Executor) - Set _use_pickle=True\")\n                    self._use_pickle = True\n                    contents = self._safe_unpickle_decode(raw_contents)\n        else:\n            contents = None\n\n        if signal and signal not in LUTE_SIGNALS:\n            # Some tasks write on stderr\n            # If the signal channel has \"non-signal\" info, add it to\n            # contents\n            if not contents:\n                contents = f\"({signal})\"\n            else:\n                contents = f\"{contents} ({signal})\"\n            signal = None\n\n        return Message(contents=contents, signal=signal)\n\n    def _safe_unpickle_decode(self, maybe_mixed: bytes) -> Optional[str]:\n        \"\"\"This method is used to unpickle and/or decode a bytes object.\n\n        It attempts to handle cases where contents can be mixed, i.e., part of\n        the message must be decoded and the other part unpickled. It handles\n        only two-way splits. If there are more complex arrangements such as:\n        <pickled>:<unpickled>:<pickled> etc, it will give up.\n\n        The simpler two way splits are unlikely to occur in normal usage. 
They\n        may arise when debugging if, e.g., `print` statements are mixed with the\n        usage of the `_report_to_executor` method.\n\n        Note that this method works because ONLY text data is assumed to be\n        sent via the pipes. The method needs to be revised to handle non-text\n        data if the `Task` is modified to also send that via PipeCommunicator.\n        The use of pickle is supported to provide for this option if it is\n        necessary. It may be deprecated in the future.\n\n        Be careful when making changes. This method has seemingly redundant\n        checks because unpickling will not throw an error if a full object can\n        be retrieved. That is, the library will ignore extraneous bytes. This\n        method attempts to retrieve that information if the pickled data comes\n        first in the stream.\n\n        Args:\n            maybe_mixed (bytes): A bytes object which could require unpickling,\n                decoding, or both.\n\n        Returns:\n            contents (Optional[str]): The unpickled/decoded contents if possible.\n                Otherwise, None.\n        \"\"\"\n        contents: Optional[str]\n        try:\n            contents = pickle.loads(maybe_mixed)\n            repickled: bytes = pickle.dumps(contents)\n            if len(repickled) < len(maybe_mixed):\n                # Successful unpickling, but pickle stops even if there are more bytes\n                try:\n                    additional_data: str = maybe_mixed[len(repickled) :].decode()\n                    contents = f\"{contents}{additional_data}\"\n                except UnicodeDecodeError:\n                    # Can't decode the bytes left by pickle, so they are lost\n                    missing_bytes: int = len(maybe_mixed) - len(repickled)\n                    logger.debug(\n                        f\"PipeCommunicator has truncated message. Unable to retrieve {missing_bytes} bytes.\"\n                    )\n        except (pickle.UnpicklingError, ValueError, EOFError) as err:\n            # Pickle may also throw a ValueError, e.g. this bytes: b\"Found! \\n\"\n            # Pickle may also throw an EOFError, eg. this bytes: b\"F0\\n\"\n            try:\n                contents = maybe_mixed.decode()\n            except UnicodeDecodeError as err2:\n                try:\n                    contents = maybe_mixed[: err2.start].decode()\n                    contents = f\"{contents}{pickle.loads(maybe_mixed[err2.start:])}\"\n                except Exception as err3:\n                    logger.debug(\n                        f\"PipeCommunicator unable to decode/parse data! 
{err3}\"\n                    )\n                    contents = None\n        return contents\n\n    def write(self, msg: Message) -> None:\n        \"\"\"Write to stdout and stderr.\n\n         The signal component is sent to `stderr` while the contents of the\n         Message are sent to `stdout`.\n\n        Args:\n            msg (Message): The Message to send.\n        \"\"\"\n        if self._use_pickle:\n            signal: bytes\n            if msg.signal:\n                signal = msg.signal.encode()\n            else:\n                signal = b\"\"\n\n            contents: bytes = pickle.dumps(msg.contents)\n\n            sys.stderr.buffer.write(signal)\n            sys.stdout.buffer.write(contents)\n\n            sys.stderr.buffer.flush()\n            sys.stdout.buffer.flush()\n        else:\n            raw_signal: str\n            if msg.signal:\n                raw_signal = msg.signal\n            else:\n                raw_signal = \"\"\n\n            raw_contents: str\n            if isinstance(msg.contents, str):\n                raw_contents = msg.contents\n            elif msg.contents is None:\n                raw_contents = \"\"\n            else:\n                raise ValueError(\n                    f\"Cannot send msg contents of type: {type(msg.contents)} when not using pickle!\"\n                )\n            sys.stderr.write(raw_signal)\n            sys.stdout.write(raw_contents)\n
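As a minimal sketch of the Task-side usage (assuming the lute package is importable): contents are written to stdout, pickled by default, while any signal string goes to stderr. The Executor side pairs this with PipeCommunicator.read(proc) on the child process.

# Minimal Task-side sketch for the pipe-based communicator described above.
from lute.execution.ipc import Message, Party, PipeCommunicator

comm = PipeCommunicator(party=Party.TASK)
# Contents are pickled and written to stdout; the (optional) signal goes to stderr.
comm.write(Message(contents={"status": "50% complete"}))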
"},{"location":"source/execution/ipc/#execution.ipc.PipeCommunicator.__init__","title":"__init__(party=Party.TASK, use_pickle=True)","text":"

IPC through pipes.

Arbitrary objects may be transmitted using pickle to serialize the data. If pickle is not used, data is assumed to be text which can be decoded directly.

Parameters:

Name Type Description Default party Party

Which object (side/process) the Communicator is managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.

TASK use_pickle bool

Whether to serialize data using pickle prior to sending it. If False, data is assumed to be text which can be decoded directly.

True Source code in lute/execution/ipc.py
def __init__(self, party: Party = Party.TASK, use_pickle: bool = True) -> None:\n    \"\"\"IPC through pipes.\n\n    Arbitrary objects may be transmitted using pickle to serialize the data.\n    If pickle is not used\n\n    Args:\n        party (Party): Which object (side/process) the Communicator is\n            managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.\n        use_pickle (bool): Whether to serialize data using Pickle prior to\n            sending it. If False, data is assumed to be text whi\n    \"\"\"\n    super().__init__(party=party, use_pickle=use_pickle)\n    self.desc = \"Communicates through stderr and stdout using pickle.\"\n
"},{"location":"source/execution/ipc/#execution.ipc.PipeCommunicator.read","title":"read(proc)","text":"

Read from stdout and stderr.

Parameters:

Name Type Description Default proc Popen

The process to read from.

required

Returns:

Name Type Description msg Message

The message read, containing contents and signal.

Source code in lute/execution/ipc.py
def read(self, proc: subprocess.Popen) -> Message:\n    \"\"\"Read from stdout and stderr.\n\n    Args:\n        proc (subprocess.Popen): The process to read from.\n\n    Returns:\n        msg (Message): The message read, containing contents and signal.\n    \"\"\"\n    signal: Optional[str]\n    contents: Optional[str]\n    raw_signal: bytes = proc.stderr.read()\n    raw_contents: bytes = proc.stdout.read()\n    if raw_signal is not None:\n        signal = raw_signal.decode()\n    else:\n        signal = raw_signal\n    if raw_contents:\n        if self._use_pickle:\n            try:\n                contents = pickle.loads(raw_contents)\n            except (pickle.UnpicklingError, ValueError, EOFError) as err:\n                logger.debug(\"PipeCommunicator (Executor) - Set _use_pickle=False\")\n                self._use_pickle = False\n                contents = self._safe_unpickle_decode(raw_contents)\n        else:\n            try:\n                contents = raw_contents.decode()\n            except UnicodeDecodeError as err:\n                logger.debug(\"PipeCommunicator (Executor) - Set _use_pickle=True\")\n                self._use_pickle = True\n                contents = self._safe_unpickle_decode(raw_contents)\n    else:\n        contents = None\n\n    if signal and signal not in LUTE_SIGNALS:\n        # Some tasks write on stderr\n        # If the signal channel has \"non-signal\" info, add it to\n        # contents\n        if not contents:\n            contents = f\"({signal})\"\n        else:\n            contents = f\"{contents} ({signal})\"\n        signal = None\n\n    return Message(contents=contents, signal=signal)\n
"},{"location":"source/execution/ipc/#execution.ipc.PipeCommunicator.write","title":"write(msg)","text":"

Write to stdout and stderr.

The signal component is sent to stderr while the contents of the Message are sent to stdout.

Parameters:

Name Type Description Default msg Message

The Message to send.

required Source code in lute/execution/ipc.py
def write(self, msg: Message) -> None:\n    \"\"\"Write to stdout and stderr.\n\n     The signal component is sent to `stderr` while the contents of the\n     Message are sent to `stdout`.\n\n    Args:\n        msg (Message): The Message to send.\n    \"\"\"\n    if self._use_pickle:\n        signal: bytes\n        if msg.signal:\n            signal = msg.signal.encode()\n        else:\n            signal = b\"\"\n\n        contents: bytes = pickle.dumps(msg.contents)\n\n        sys.stderr.buffer.write(signal)\n        sys.stdout.buffer.write(contents)\n\n        sys.stderr.buffer.flush()\n        sys.stdout.buffer.flush()\n    else:\n        raw_signal: str\n        if msg.signal:\n            raw_signal = msg.signal\n        else:\n            raw_signal = \"\"\n\n        raw_contents: str\n        if isinstance(msg.contents, str):\n            raw_contents = msg.contents\n        elif msg.contents is None:\n            raw_contents = \"\"\n        else:\n            raise ValueError(\n                f\"Cannot send msg contents of type: {type(msg.contents)} when not using pickle!\"\n            )\n        sys.stderr.write(raw_signal)\n        sys.stdout.write(raw_contents)\n
"},{"location":"source/execution/ipc/#execution.ipc.SocketCommunicator","title":"SocketCommunicator","text":"

Bases: Communicator

Provides communication over Unix or TCP sockets.

Communication is provided either using sockets with the Python socket library or using ZMQ. The choice of implementation is controlled by the global bool USE_ZMQ.

Whether to use TCP or Unix sockets is controlled by the environment variable:

LUTE_USE_TCP=1

If defined, TCP sockets will be used, otherwise Unix sockets will be used.

Regardless of socket type, the environment variable LUTE_EXECUTOR_HOST=<hostname> will be defined by the Executor-side Communicator.

For TCP sockets: The Executor-side Communicator should be run first and will bind to all interfaces on the port determined by the environment variable LUTE_PORT=###. If no port is defined, a port scan will be performed and the Executor-side Communicator will bind to the first available port from a random selection. It will then define the environment variable so the Task-side can pick it up.

For Unix sockets: The path to the Unix socket is defined by the environment variable LUTE_SOCKET=/path/to/socket. This class assumes proper permissions and that the above environment variable has been defined. The Task is configured as what would commonly be referred to as the client, while the Executor is configured as the server.

If the Task process is run on a different machine than the Executor, the Task-side Communicator will open an SSH tunnel to forward traffic from a local Unix socket to the Executor Unix socket. Opening the tunnel relies on the environment variable LUTE_EXECUTOR_HOST=<hostname> to determine the Executor's host. This variable should be defined by the Executor and passed to the Task process automatically, but it can also be defined manually if launching the Task process separately. The Task will use the local socket <LUTE_SOCKET>.task{##}. Multiple local sockets may be created. Currently, it is assumed that the user is identical on both the Task machine and the Executor machine.
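A hedged sketch of how the environment variables above tie a SocketCommunicator pair together over TCP. Both sides are shown in a single process purely for illustration; in LUTE the Executor and Task run as separate processes and the Task inherits the variables the Executor defines. The port number is arbitrary.

# Sketch only: wiring an Executor-side and Task-side SocketCommunicator over TCP.
import os
import time

from lute.execution.ipc import Message, Party, SocketCommunicator

os.environ["LUTE_USE_TCP"] = "1"   # Use TCP; leave unset to use Unix sockets instead
os.environ["LUTE_PORT"] = "45211"  # Optional; omit to let the Executor find a free port

# Executor (server) side is created first; it defines LUTE_EXECUTOR_HOST
# (and LUTE_PORT, if a port scan was needed) for the Task side to pick up.
executor_comm = SocketCommunicator(party=Party.EXECUTOR)
executor_comm.delayed_setup()  # Socket resources are only acquired here, not in __init__

# Task (client) side connects using LUTE_EXECUTOR_HOST and LUTE_PORT.
task_comm = SocketCommunicator(party=Party.TASK)
task_comm.delayed_setup()
task_comm.write(Message(contents="Hello from the Task side."))
task_comm.clear_communicator()  # Close the Task side so the data is flushed

msg = Message()
while msg.contents is None:  # Poll until the Executor's reader thread enqueues the data
    time.sleep(0.05)
    msg = executor_comm.read(None)  # The proc argument is ignored by SocketCommunicator
print(msg.contents)
executor_comm.clear_communicator()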

Source code in lute/execution/ipc.py
class SocketCommunicator(Communicator):\n    \"\"\"Provides communication over Unix or TCP sockets.\n\n    Communication is provided either using sockets with the Python socket library\n    or using ZMQ. The choice of implementation is controlled by the global bool\n    `USE_ZMQ`.\n\n    Whether to use TCP or Unix sockets is controlled by the environment:\n                           `LUTE_USE_TCP=1`\n    If defined, TCP sockets will be used, otherwise Unix sockets will be used.\n\n    Regardless of socket type, the environment variable\n                      `LUTE_EXECUTOR_HOST=<hostname>`\n    will be defined by the Executor-side Communicator.\n\n\n    For TCP sockets:\n    The Executor-side Communicator should be run first and will bind to all\n    interfaces on the port determined by the environment variable:\n                            `LUTE_PORT=###`\n    If no port is defined, a port scan will be performed and the Executor-side\n    Communicator will bind the first one available from a random selection. It\n    will then define the environment variable so the Task-side can pick it up.\n\n    For Unix sockets:\n    The path to the Unix socket is defined by the environment variable:\n                      `LUTE_SOCKET=/path/to/socket`\n    This class assumes proper permissions and that this above environment\n    variable has been defined. The `Task` is configured as what would commonly\n    be referred to as the `client`, while the `Executor` is configured as the\n    server.\n\n    If the Task process is run on a different machine than the Executor, the\n    Task-side Communicator will open a ssh-tunnel to forward traffic from a local\n    Unix socket to the Executor Unix socket. Opening of the tunnel relies on the\n    environment variable:\n                      `LUTE_EXECUTOR_HOST=<hostname>`\n    to determine the Executor's host. This variable should be defined by the\n    Executor and passed to the Task process automatically, but it can also be\n    defined manually if launching the Task process separately. The Task will use\n    the local socket `<LUTE_SOCKET>.task{##}`. Multiple local sockets may be\n    created. Currently, it is assumed that the user is identical on both the Task\n    machine and Executor machine.\n    \"\"\"\n\n    ACCEPT_TIMEOUT: float = 0.01\n    \"\"\"\n    Maximum time to wait to accept connections. Used by Executor-side.\n    \"\"\"\n    MSG_HEAD: bytes = b\"MSG\"\n    \"\"\"\n    Start signal of a message. The end of a message is indicated by MSG_HEAD[::-1].\n    \"\"\"\n    MSG_SEP: bytes = b\";;;\"\n    \"\"\"\n    Separator for parts of a message. Messages have a start, length, message and end.\n    \"\"\"\n\n    def __init__(self, party: Party = Party.TASK, use_pickle: bool = True) -> None:\n        \"\"\"IPC over a TCP or Unix socket.\n\n        Unlike with the PipeCommunicator, pickle is always used to send data\n        through the socket.\n\n        Args:\n            party (Party): Which object (side/process) the Communicator is\n                managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.\n\n            use_pickle (bool): Whether to use pickle. Always True currently,\n                passing False does not change behaviour.\n        \"\"\"\n        super().__init__(party=party, use_pickle=use_pickle)\n\n    def delayed_setup(self) -> None:\n        \"\"\"Delays the creation of socket objects.\n\n        The Executor initializes the Communicator when it is created. 
Since\n        all Executors are created and available at once we want to delay\n        acquisition of socket resources until a single Executor is ready\n        to use them.\n        \"\"\"\n        self._data_socket: Union[socket.socket, zmq.sugar.socket.Socket]\n        if USE_ZMQ:\n            self.desc: str = \"Communicates using ZMQ through TCP or Unix sockets.\"\n            self._context: zmq.context.Context = zmq.Context()\n            self._data_socket = self._create_socket_zmq()\n        else:\n            self.desc: str = \"Communicates through a TCP or Unix socket.\"\n            self._data_socket = self._create_socket_raw()\n            self._data_socket.settimeout(SocketCommunicator.ACCEPT_TIMEOUT)\n\n        if self._party == Party.EXECUTOR:\n            # Executor created first so we can define the hostname env variable\n            os.environ[\"LUTE_EXECUTOR_HOST\"] = socket.gethostname()\n            # Setup reader thread\n            self._reader_thread: threading.Thread = threading.Thread(\n                target=self._read_socket\n            )\n            self._msg_queue: queue.Queue = queue.Queue()\n            self._partial_msg: Optional[bytes] = None\n            self._stop_thread: bool = False\n            self._reader_thread.start()\n        else:\n            # Only used by Party.TASK\n            self._use_ssh_tunnel: bool = False\n            self._ssh_proc: Optional[subprocess.Popen] = None\n            self._local_socket_path: Optional[str] = None\n\n    # Read\n    ############################################################################\n\n    def read(self, proc: subprocess.Popen) -> Message:\n        \"\"\"Return a message from the queue if available.\n\n        Socket(s) are continuously monitored, and read from when new data is\n        available.\n\n        Args:\n            proc (subprocess.Popen): The process to read from. Provided for\n                compatibility with other Communicator subtypes. 
Is ignored.\n\n        Returns:\n             msg (Message): The message read, containing contents and signal.\n        \"\"\"\n        msg: Message\n        try:\n            msg = self._msg_queue.get(timeout=SocketCommunicator.ACCEPT_TIMEOUT)\n        except queue.Empty:\n            msg = Message()\n\n        return msg\n\n    def _read_socket(self) -> None:\n        \"\"\"Read data from a socket.\n\n        Socket(s) are continuously monitored, and read from when new data is\n        available.\n\n        Calls an underlying method for either raw sockets or ZMQ.\n        \"\"\"\n\n        while True:\n            if self._stop_thread:\n                logger.debug(\"Stopping socket reader thread.\")\n                break\n            if USE_ZMQ:\n                self._read_socket_zmq()\n            else:\n                self._read_socket_raw()\n\n    def _read_socket_raw(self) -> None:\n        \"\"\"Read data from a socket.\n\n        Raw socket implementation for the reader thread.\n        \"\"\"\n        connection: socket.socket\n        addr: Union[str, Tuple[str, int]]\n        try:\n            connection, addr = self._data_socket.accept()\n            full_data: bytes = b\"\"\n            while True:\n                data: bytes = connection.recv(8192)\n                if data:\n                    full_data += data\n                else:\n                    break\n            connection.close()\n            self._unpack_messages(full_data)\n        except socket.timeout:\n            pass\n\n    def _read_socket_zmq(self) -> None:\n        \"\"\"Read data from a socket.\n\n        ZMQ implementation for the reader thread.\n        \"\"\"\n        try:\n            full_data: bytes = self._data_socket.recv(0)\n            self._unpack_messages(full_data)\n        except zmq.ZMQError:\n            pass\n\n    def _unpack_messages(self, data: bytes) -> None:\n        \"\"\"Unpacks a byte stream into individual messages.\n\n        Messages are encoded in the following format:\n                 <HEAD><SEP><len(msg)><SEP><msg><SEP><HEAD[::-1]>\n        The items between <> are replaced as follows:\n            - <HEAD>: A start marker\n            - <SEP>: A separator for components of the message\n            - <len(msg)>: The length of the message payload in bytes.\n            - <msg>: The message payload in bytes\n            - <HEAD[::-1]>: The start marker in reverse to indicate the end.\n\n        Partial messages (a series of bytes which cannot be converted to a full\n        message) are stored for later. 
An attempt is made to reconstruct the\n        message with the next call to this method.\n\n        Args:\n            data (bytes): A raw byte stream containing anywhere from a partial\n                message to multiple full messages.\n        \"\"\"\n        msg: Message\n        working_data: bytes\n        if self._partial_msg:\n            # Concatenate the previous partial message to the beginning\n            working_data = self._partial_msg + data\n            self._partial_msg = None\n        else:\n            working_data = data\n        while working_data:\n            try:\n                # Message encoding: <HEAD><SEP><len><SEP><msg><SEP><HEAD[::-1]>\n                end = working_data.find(\n                    SocketCommunicator.MSG_SEP + SocketCommunicator.MSG_HEAD[::-1]\n                )\n                msg_parts: List[bytes] = working_data[:end].split(\n                    SocketCommunicator.MSG_SEP\n                )\n                if len(msg_parts) != 3:\n                    self._partial_msg = working_data\n                    break\n\n                cmd: bytes\n                nbytes: bytes\n                raw_msg: bytes\n                cmd, nbytes, raw_msg = msg_parts\n                if len(raw_msg) != int(nbytes):\n                    self._partial_msg = working_data\n                    break\n                msg = pickle.loads(raw_msg)\n                self._msg_queue.put(msg)\n            except pickle.UnpicklingError:\n                self._partial_msg = working_data\n                break\n            if end < len(working_data):\n                # Add len(SEP+HEAD) since end marks the start of <SEP><HEAD[::-1]\n                offset: int = len(\n                    SocketCommunicator.MSG_SEP + SocketCommunicator.MSG_HEAD\n                )\n                working_data = working_data[end + offset :]\n            else:\n                working_data = b\"\"\n\n    # Write\n    ############################################################################\n\n    def _write_socket(self, msg: Message) -> None:\n        \"\"\"Sends data over a socket from the 'client' (Task) side.\n\n        Messages are encoded in the following format:\n                 <HEAD><SEP><len(msg)><SEP><msg><SEP><HEAD[::-1]>\n        The items between <> are replaced as follows:\n            - <HEAD>: A start marker\n            - <SEP>: A separator for components of the message\n            - <len(msg)>: The length of the message payload in bytes.\n            - <msg>: The message payload in bytes\n            - <HEAD[::-1]>: The start marker in reverse to indicate the end.\n\n        This structure is used for decoding the message on the other end.\n        \"\"\"\n        data: bytes = pickle.dumps(msg)\n        cmd: bytes = SocketCommunicator.MSG_HEAD\n        size: bytes = b\"%d\" % len(data)\n        end: bytes = SocketCommunicator.MSG_HEAD[::-1]\n        sep: bytes = SocketCommunicator.MSG_SEP\n        packed_msg: bytes = cmd + sep + size + sep + data + sep + end\n        if USE_ZMQ:\n            self._data_socket.send(packed_msg)\n        else:\n            self._data_socket.sendall(packed_msg)\n\n    def write(self, msg: Message) -> None:\n        \"\"\"Send a single Message.\n\n        The entire Message (signal and contents) is serialized and sent through\n        a connection over Unix socket.\n\n        Args:\n            msg (Message): The Message to send.\n        \"\"\"\n        self._write_socket(msg)\n\n    # Generic create\n    
############################################################################\n\n    def _create_socket_raw(self) -> socket.socket:\n        \"\"\"Create either a Unix or TCP socket.\n\n        If the environment variable:\n                              `LUTE_USE_TCP=1`\n        is defined, a TCP socket is returned, otherwise a Unix socket.\n\n        Refer to the individual initialization methods for additional environment\n        variables controlling the behaviour of these two communication types.\n\n        Returns:\n            data_socket (socket.socket): TCP or Unix socket.\n        \"\"\"\n        import struct\n\n        use_tcp: Optional[str] = os.getenv(\"LUTE_USE_TCP\")\n        sock: socket.socket\n        if use_tcp is not None:\n            if self._party == Party.EXECUTOR:\n                logger.info(\"Will use raw TCP sockets.\")\n            sock = self._init_tcp_socket_raw()\n        else:\n            if self._party == Party.EXECUTOR:\n                logger.info(\"Will use raw Unix sockets.\")\n            sock = self._init_unix_socket_raw()\n        sock.setsockopt(\n            socket.SOL_SOCKET, socket.SO_LINGER, struct.pack(\"ii\", 1, 10000)\n        )\n        return sock\n\n    def _create_socket_zmq(self) -> zmq.sugar.socket.Socket:\n        \"\"\"Create either a Unix or TCP socket.\n\n        If the environment variable:\n                              `LUTE_USE_TCP=1`\n        is defined, a TCP socket is returned, otherwise a Unix socket.\n\n        Refer to the individual initialization methods for additional environment\n        variables controlling the behaviour of these two communication types.\n\n        Returns:\n            data_socket (socket.socket): Unix socket object.\n        \"\"\"\n        socket_type: Literal[zmq.PULL, zmq.PUSH]\n        if self._party == Party.EXECUTOR:\n            socket_type = zmq.PULL\n        else:\n            socket_type = zmq.PUSH\n\n        data_socket: zmq.sugar.socket.Socket = self._context.socket(socket_type)\n        data_socket.set_hwm(160000)\n        # Need to multiply by 1000 since ZMQ uses ms\n        data_socket.setsockopt(\n            zmq.RCVTIMEO, int(SocketCommunicator.ACCEPT_TIMEOUT * 1000)\n        )\n        # Try TCP first\n        use_tcp: Optional[str] = os.getenv(\"LUTE_USE_TCP\")\n        if use_tcp is not None:\n            if self._party == Party.EXECUTOR:\n                logger.info(\"Will use TCP (ZMQ).\")\n            self._init_tcp_socket_zmq(data_socket)\n        else:\n            if self._party == Party.EXECUTOR:\n                logger.info(\"Will use Unix sockets (ZMQ).\")\n            self._init_unix_socket_zmq(data_socket)\n\n        return data_socket\n\n    # TCP Init\n    ############################################################################\n\n    def _find_random_port(\n        self, min_port: int = 41923, max_port: int = 64324, max_tries: int = 100\n    ) -> Optional[int]:\n        \"\"\"Find a random open port to bind to if using TCP.\"\"\"\n        from random import choices\n\n        sock: socket.socket\n        ports: List[int] = choices(range(min_port, max_port), k=max_tries)\n        for port in ports:\n            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)\n            try:\n                sock.bind((\"\", port))\n                sock.close()\n                del sock\n                return port\n            except:\n                continue\n        return None\n\n    def _init_tcp_socket_raw(self) -> socket.socket:\n        \"\"\"Initialize a 
TCP socket.\n\n        Executor-side code should always be run first. It checks to see if\n        the environment variable\n                                `LUTE_PORT=###`\n        is defined, if so binds it, otherwise find a free port from a selection\n        of random ports. If a port search is performed, the `LUTE_PORT` variable\n        will be defined so it can be picked up by the the Task-side Communicator.\n\n        In the event that no port can be bound on the Executor-side, or the port\n        and hostname information is unavailable to the Task-side, the program\n        will exit.\n\n        Returns:\n            data_socket (socket.socket): TCP socket object.\n        \"\"\"\n        data_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)\n        port: Optional[Union[str, int]] = os.getenv(\"LUTE_PORT\")\n        if self._party == Party.EXECUTOR:\n            if port is None:\n                # If port is None find one\n                # Executor code executes first\n                port = self._find_random_port()\n                if port is None:\n                    # Failed to find a port to bind\n                    logger.info(\n                        \"Executor failed to bind a port. \"\n                        \"Try providing a LUTE_PORT directly! Exiting!\"\n                    )\n                    sys.exit(-1)\n                # Provide port env var for Task-side\n                os.environ[\"LUTE_PORT\"] = str(port)\n            data_socket.bind((\"\", int(port)))\n            data_socket.listen()\n        else:\n            hostname: str = socket.gethostname()\n            executor_hostname: Optional[str] = os.getenv(\"LUTE_EXECUTOR_HOST\")\n            if executor_hostname is None or port is None:\n                logger.info(\n                    \"Task-side does not have host/port information!\"\n                    \" Check environment variables! Exiting!\"\n                )\n                sys.exit(-1)\n            if hostname == executor_hostname:\n                data_socket.connect((\"localhost\", int(port)))\n            else:\n                data_socket.connect((executor_hostname, int(port)))\n        return data_socket\n\n    def _init_tcp_socket_zmq(self, data_socket: zmq.sugar.socket.Socket) -> None:\n        \"\"\"Initialize a TCP socket using ZMQ.\n\n        Equivalent as the method above but requires passing in a ZMQ socket\n        object instead of returning one.\n\n        Args:\n            data_socket (zmq.socket.Socket): Socket object.\n        \"\"\"\n        port: Optional[Union[str, int]] = os.getenv(\"LUTE_PORT\")\n        if self._party == Party.EXECUTOR:\n            if port is None:\n                new_port: int = data_socket.bind_to_random_port(\"tcp://*\")\n                if new_port is None:\n                    # Failed to find a port to bind\n                    logger.info(\n                        \"Executor failed to bind a port. \"\n                        \"Try providing a LUTE_PORT directly! 
Exiting!\"\n                    )\n                    sys.exit(-1)\n                port = new_port\n                os.environ[\"LUTE_PORT\"] = str(port)\n            else:\n                data_socket.bind(f\"tcp://*:{port}\")\n            logger.debug(f\"Executor bound port {port}\")\n        else:\n            executor_hostname: Optional[str] = os.getenv(\"LUTE_EXECUTOR_HOST\")\n            if executor_hostname is None or port is None:\n                logger.info(\n                    \"Task-side does not have host/port information!\"\n                    \" Check environment variables! Exiting!\"\n                )\n                sys.exit(-1)\n            data_socket.connect(f\"tcp://{executor_hostname}:{port}\")\n\n    # Unix Init\n    ############################################################################\n\n    def _get_socket_path(self) -> str:\n        \"\"\"Return the socket path, defining one if it is not available.\n\n        Returns:\n            socket_path (str): Path to the Unix socket.\n        \"\"\"\n        socket_path: str\n        try:\n            socket_path = os.environ[\"LUTE_SOCKET\"]\n        except KeyError as err:\n            import uuid\n            import tempfile\n\n            # Define a path, and add to environment\n            # Executor-side always created first, Task will use the same one\n            socket_path = f\"{tempfile.gettempdir()}/lute_{uuid.uuid4().hex}.sock\"\n            os.environ[\"LUTE_SOCKET\"] = socket_path\n            logger.debug(f\"SocketCommunicator defines socket_path: {socket_path}\")\n        if USE_ZMQ:\n            return f\"ipc://{socket_path}\"\n        else:\n            return socket_path\n\n    def _init_unix_socket_raw(self) -> socket.socket:\n        \"\"\"Returns a Unix socket object.\n\n        Executor-side code should always be run first. It checks to see if\n        the environment variable\n                                `LUTE_SOCKET=XYZ`\n        is defined, if so binds it, otherwise it will create a new path and\n        define the environment variable for the Task-side to find.\n\n        On the Task (client-side), this method will also open a SSH tunnel to\n        forward a local Unix socket to an Executor Unix socket if the Task and\n        Executor processes are on different machines.\n\n        Returns:\n            data_socket (socket.socket): Unix socket object.\n        \"\"\"\n        socket_path: str = self._get_socket_path()\n        data_socket = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)\n        if self._party == Party.EXECUTOR:\n            if os.path.exists(socket_path):\n                os.unlink(socket_path)\n            data_socket.bind(socket_path)\n            data_socket.listen()\n        elif self._party == Party.TASK:\n            hostname: str = socket.gethostname()\n            executor_hostname: Optional[str] = os.getenv(\"LUTE_EXECUTOR_HOST\")\n            if executor_hostname is None:\n                logger.info(\"Hostname for Executor process not found! 
Exiting!\")\n                data_socket.close()\n                sys.exit(-1)\n            if hostname == executor_hostname:\n                data_socket.connect(socket_path)\n            else:\n                self._local_socket_path = self._setup_unix_ssh_tunnel(\n                    socket_path, hostname, executor_hostname\n                )\n                while 1:\n                    # Keep trying reconnect until ssh tunnel works.\n                    try:\n                        data_socket.connect(self._local_socket_path)\n                        break\n                    except FileNotFoundError:\n                        continue\n\n        return data_socket\n\n    def _init_unix_socket_zmq(self, data_socket: zmq.sugar.socket.Socket) -> None:\n        \"\"\"Initialize a Unix socket object, using ZMQ.\n\n        Equivalent as the method above but requires passing in a ZMQ socket\n        object instead of returning one.\n\n        Args:\n            data_socket (socket.socket): ZMQ object.\n        \"\"\"\n        socket_path = self._get_socket_path()\n        if self._party == Party.EXECUTOR:\n            if os.path.exists(socket_path):\n                os.unlink(socket_path)\n            data_socket.bind(socket_path)\n        elif self._party == Party.TASK:\n            hostname: str = socket.gethostname()\n            executor_hostname: Optional[str] = os.getenv(\"LUTE_EXECUTOR_HOST\")\n            if executor_hostname is None:\n                logger.info(\"Hostname for Executor process not found! Exiting!\")\n                self._data_socket.close()\n                sys.exit(-1)\n            if hostname == executor_hostname:\n                data_socket.connect(socket_path)\n            else:\n                # Need to remove ipc:// from socket_path for forwarding\n                self._local_socket_path = self._setup_unix_ssh_tunnel(\n                    socket_path[6:], hostname, executor_hostname\n                )\n                # Need to add it back\n                path: str = f\"ipc://{self._local_socket_path}\"\n                data_socket.connect(path)\n\n    def _setup_unix_ssh_tunnel(\n        self, socket_path: str, hostname: str, executor_hostname: str\n    ) -> str:\n        \"\"\"Prepares an SSH tunnel for forwarding between Unix sockets on two hosts.\n\n        An SSH tunnel is opened with `ssh -L <local>:<remote> sleep 2`.\n        This method of communication is slightly slower and incurs additional\n        overhead - it should only be used as a backup. If communication across\n        multiple hosts is required consider using TCP.  The Task will use\n        the local socket `<LUTE_SOCKET>.task{##}`. Multiple local sockets may be\n        created. 
It is assumed that the user is identical on both the\n        Task machine and Executor machine.\n\n        Returns:\n            local_socket_path (str): The local Unix socket to connect to.\n        \"\"\"\n        if \"uuid\" not in globals():\n            import uuid\n        local_socket_path = f\"{socket_path}.task{uuid.uuid4().hex[:4]}\"\n        self._use_ssh_tunnel = True\n        ssh_cmd: List[str] = [\n            \"ssh\",\n            \"-o\",\n            \"LogLevel=quiet\",\n            \"-L\",\n            f\"{local_socket_path}:{socket_path}\",\n            executor_hostname,\n            \"sleep\",\n            \"2\",\n        ]\n        logger.debug(f\"Opening tunnel from {hostname} to {executor_hostname}\")\n        self._ssh_proc = subprocess.Popen(ssh_cmd)\n        time.sleep(0.4)  # Need to wait... -> Use single Task comm at beginning?\n        return local_socket_path\n\n    # Clean up and properties\n    ############################################################################\n\n    def _clean_up(self) -> None:\n        \"\"\"Clean up connections.\"\"\"\n        if self._party == Party.EXECUTOR:\n            self._stop_thread = True\n            self._reader_thread.join()\n            logger.debug(\"Closed reading thread.\")\n\n        self._data_socket.close()\n        if USE_ZMQ:\n            self._context.term()\n        else:\n            ...\n\n        if os.getenv(\"LUTE_USE_TCP\"):\n            return\n        else:\n            if self._party == Party.EXECUTOR:\n                os.unlink(os.getenv(\"LUTE_SOCKET\"))  # Should be defined\n                return\n            elif self._use_ssh_tunnel:\n                if self._ssh_proc is not None:\n                    self._ssh_proc.terminate()\n\n    @property\n    def has_messages(self) -> bool:\n        if self._party == Party.TASK:\n            # Shouldn't be called on Task-side\n            return False\n\n        if self._msg_queue.qsize() > 0:\n            return True\n        return False\n\n    def __exit__(self):\n        self._clean_up()\n
"},{"location":"source/execution/ipc/#execution.ipc.SocketCommunicator.ACCEPT_TIMEOUT","title":"ACCEPT_TIMEOUT: float = 0.01 class-attribute instance-attribute","text":"

Maximum time to wait to accept connections. Used by Executor-side.

"},{"location":"source/execution/ipc/#execution.ipc.SocketCommunicator.MSG_HEAD","title":"MSG_HEAD: bytes = b'MSG' class-attribute instance-attribute","text":"

Start signal of a message. The end of a message is indicated by MSG_HEAD[::-1].

"},{"location":"source/execution/ipc/#execution.ipc.SocketCommunicator.MSG_SEP","title":"MSG_SEP: bytes = b';;;' class-attribute instance-attribute","text":"

Separator for parts of a message. Messages have a start, length, message and end.
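Together, MSG_HEAD and MSG_SEP define the framing used when writing to and reading from the socket. A short sketch of what a packed message looks like:

# Framing sketch: <HEAD><SEP><len(msg)><SEP><msg><SEP><HEAD[::-1]>
import pickle

MSG_HEAD: bytes = b"MSG"
MSG_SEP: bytes = b";;;"

payload: bytes = pickle.dumps({"contents": "hello"})  # Stand-in for a pickled Message
packed: bytes = (
    MSG_HEAD + MSG_SEP + b"%d" % len(payload) + MSG_SEP + payload + MSG_SEP + MSG_HEAD[::-1]
)
# i.e. b"MSG;;;<payload length>;;;<pickled payload>;;;GSM"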

"},{"location":"source/execution/ipc/#execution.ipc.SocketCommunicator.__init__","title":"__init__(party=Party.TASK, use_pickle=True)","text":"

IPC over a TCP or Unix socket.

Unlike with the PipeCommunicator, pickle is always used to send data through the socket.

Parameters:

Name Type Description Default party Party

Which object (side/process) the Communicator is managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.

TASK use_pickle bool

Whether to use pickle. Always True currently, passing False does not change behaviour.

True Source code in lute/execution/ipc.py
def __init__(self, party: Party = Party.TASK, use_pickle: bool = True) -> None:\n    \"\"\"IPC over a TCP or Unix socket.\n\n    Unlike with the PipeCommunicator, pickle is always used to send data\n    through the socket.\n\n    Args:\n        party (Party): Which object (side/process) the Communicator is\n            managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.\n\n        use_pickle (bool): Whether to use pickle. Always True currently,\n            passing False does not change behaviour.\n    \"\"\"\n    super().__init__(party=party, use_pickle=use_pickle)\n
"},{"location":"source/execution/ipc/#execution.ipc.SocketCommunicator.delayed_setup","title":"delayed_setup()","text":"

Delays the creation of socket objects.

The Executor initializes the Communicator when it is created. Since all Executors are created and available at once, we want to delay acquisition of socket resources until a single Executor is ready to use them.

Source code in lute/execution/ipc.py
def delayed_setup(self) -> None:\n    \"\"\"Delays the creation of socket objects.\n\n    The Executor initializes the Communicator when it is created. Since\n    all Executors are created and available at once we want to delay\n    acquisition of socket resources until a single Executor is ready\n    to use them.\n    \"\"\"\n    self._data_socket: Union[socket.socket, zmq.sugar.socket.Socket]\n    if USE_ZMQ:\n        self.desc: str = \"Communicates using ZMQ through TCP or Unix sockets.\"\n        self._context: zmq.context.Context = zmq.Context()\n        self._data_socket = self._create_socket_zmq()\n    else:\n        self.desc: str = \"Communicates through a TCP or Unix socket.\"\n        self._data_socket = self._create_socket_raw()\n        self._data_socket.settimeout(SocketCommunicator.ACCEPT_TIMEOUT)\n\n    if self._party == Party.EXECUTOR:\n        # Executor created first so we can define the hostname env variable\n        os.environ[\"LUTE_EXECUTOR_HOST\"] = socket.gethostname()\n        # Setup reader thread\n        self._reader_thread: threading.Thread = threading.Thread(\n            target=self._read_socket\n        )\n        self._msg_queue: queue.Queue = queue.Queue()\n        self._partial_msg: Optional[bytes] = None\n        self._stop_thread: bool = False\n        self._reader_thread.start()\n    else:\n        # Only used by Party.TASK\n        self._use_ssh_tunnel: bool = False\n        self._ssh_proc: Optional[subprocess.Popen] = None\n        self._local_socket_path: Optional[str] = None\n
"},{"location":"source/execution/ipc/#execution.ipc.SocketCommunicator.read","title":"read(proc)","text":"

Return a message from the queue if available.

Socket(s) are continuously monitored, and read from when new data is available.

Parameters:

Name Type Description Default proc Popen

The process to read from. Provided for compatibility with other Communicator subtypes. Is ignored.

required

Returns:

Name Type Description msg Message

The message read, containing contents and signal.

Source code in lute/execution/ipc.py
def read(self, proc: subprocess.Popen) -> Message:\n    \"\"\"Return a message from the queue if available.\n\n    Socket(s) are continuously monitored, and read from when new data is\n    available.\n\n    Args:\n        proc (subprocess.Popen): The process to read from. Provided for\n            compatibility with other Communicator subtypes. Is ignored.\n\n    Returns:\n         msg (Message): The message read, containing contents and signal.\n    \"\"\"\n    msg: Message\n    try:\n        msg = self._msg_queue.get(timeout=SocketCommunicator.ACCEPT_TIMEOUT)\n    except queue.Empty:\n        msg = Message()\n\n    return msg\n
"},{"location":"source/execution/ipc/#execution.ipc.SocketCommunicator.write","title":"write(msg)","text":"

Send a single Message.

The entire Message (signal and contents) is serialized and sent through a connection over a Unix or TCP socket.

Parameters:

Name Type Description Default msg Message

The Message to send.

required Source code in lute/execution/ipc.py
def write(self, msg: Message) -> None:\n    \"\"\"Send a single Message.\n\n    The entire Message (signal and contents) is serialized and sent through\n    a connection over Unix socket.\n\n    Args:\n        msg (Message): The Message to send.\n    \"\"\"\n    self._write_socket(msg)\n
"},{"location":"source/io/_sqlite/","title":"_sqlite","text":"

Backend SQLite database utilities.

Functions should be used only by the higher-level database module.

"},{"location":"source/io/config/","title":"config","text":"

Machinery for the IO of configuration YAML files and their validation.

Functions:

Name Description parse_config

str, config_path: str) -> TaskParameters: Parse a configuration file and return a TaskParameters object of validated parameters for a specific Task. Raises an exception if the provided configuration does not match the expected model.

Raises:

Type Description ValidationError

Error raised by pydantic during data validation. (From Pydantic)
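A short usage sketch of parse_config follows; the parameter names (task_name, config_path) are assumptions inferred from the signature fragment above, and the Task name and path are illustrative.

# Hedged sketch: parameter names are assumed from the signature fragment above.
from lute.io.config import parse_config

params = parse_config(
    task_name="TaskOne",                      # A Task section defined in the configuration YAML
    config_path="/path/to/lute_config.yaml",  # Illustrative path
)
print(params)  # A validated TaskParameters model for the requested Task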

"},{"location":"source/io/config/#io.config.AnalysisHeader","title":"AnalysisHeader","text":"

Bases: BaseModel

Header information for LUTE analysis runs.

Source code in lute/io/models/base.py
class AnalysisHeader(BaseModel):\n    \"\"\"Header information for LUTE analysis runs.\"\"\"\n\n    title: str = Field(\n        \"LUTE Task Configuration\",\n        description=\"Description of the configuration or experiment.\",\n    )\n    experiment: str = Field(\"\", description=\"Experiment.\")\n    run: Union[str, int] = Field(\"\", description=\"Data acquisition run.\")\n    date: str = Field(\"1970/01/01\", description=\"Start date of analysis.\")\n    lute_version: Union[float, str] = Field(\n        0.1, description=\"Version of LUTE used for analysis.\"\n    )\n    task_timeout: PositiveInt = Field(\n        600,\n        description=(\n            \"Time in seconds until a task times out. Should be slightly shorter\"\n            \" than job timeout if using a job manager (e.g. SLURM).\"\n        ),\n    )\n    work_dir: str = Field(\"\", description=\"Main working directory for LUTE.\")\n\n    @validator(\"work_dir\", always=True)\n    def validate_work_dir(cls, directory: str, values: Dict[str, Any]) -> str:\n        work_dir: str\n        if directory == \"\":\n            std_work_dir = (\n                f\"/sdf/data/lcls/ds/{values['experiment'][:3]}/\"\n                f\"{values['experiment']}/scratch\"\n            )\n            work_dir = std_work_dir\n        else:\n            work_dir = directory\n        # Check existence and permissions\n        if not os.path.exists(work_dir):\n            raise ValueError(f\"Working Directory: {work_dir} does not exist!\")\n        if not os.access(work_dir, os.W_OK):\n            # Need write access for database, files etc.\n            raise ValueError(f\"Not write access for working directory: {work_dir}!\")\n        return work_dir\n\n    @validator(\"run\", always=True)\n    def validate_run(\n        cls, run: Union[str, int], values: Dict[str, Any]\n    ) -> Union[str, int]:\n        if run == \"\":\n            # From Airflow RUN_NUM should have Format \"RUN_DATETIME\" - Num is first part\n            run_time: str = os.environ.get(\"RUN_NUM\", \"\")\n            if run_time != \"\":\n                return int(run_time.split(\"_\")[0])\n        return run\n\n    @validator(\"experiment\", always=True)\n    def validate_experiment(cls, experiment: str, values: Dict[str, Any]) -> str:\n        if experiment == \"\":\n            arp_exp: str = os.environ.get(\"EXPERIMENT\", \"EXPX00000\")\n            return arp_exp\n        return experiment\n
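The validators above fill empty fields from the environment and derive a standard working directory when none is given. A small sketch of that behaviour (the experiment name and run string are illustrative, and work_dir must point to an existing, writable directory):

# Sketch of the default-filling behaviour implemented by the validators above.
import os

from lute.io.models.base import AnalysisHeader

os.environ["EXPERIMENT"] = "mfxl1001021"      # Used when `experiment` is left empty
os.environ["RUN_NUM"] = "12_20240501T120000"  # "RUN_DATETIME" format; the run number is the first part

header = AnalysisHeader(work_dir="/tmp")      # Must exist and be writable
print(header.experiment, header.run)          # -> mfxl1001021 12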
"},{"location":"source/io/config/#io.config.CompareHKLParameters","title":"CompareHKLParameters","text":"

Bases: ThirdPartyParameters

Parameters for CrystFEL's compare_hkl for calculating figures of merit.

There are many parameters, and many combinations. For more information on usage, please refer to the CrystFEL documentation, here: https://www.desy.de/~twhite/crystfel/manual-partialator.html

Source code in lute/io/models/sfx_merge.py
class CompareHKLParameters(ThirdPartyParameters):\n    \"\"\"Parameters for CrystFEL's `compare_hkl` for calculating figures of merit.\n\n    There are many parameters, and many combinations. For more information on\n    usage, please refer to the CrystFEL documentation, here:\n    https://www.desy.de/~twhite/crystfel/manual-partialator.html\n    \"\"\"\n\n    class Config(ThirdPartyParameters.Config):\n        long_flags_use_eq: bool = True\n        \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n\n        set_result: bool = True\n        \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n    executable: str = Field(\n        \"/sdf/group/lcls/ds/tools/crystfel/0.10.2/bin/compare_hkl\",\n        description=\"CrystFEL's reflection comparison binary.\",\n        flag_type=\"\",\n    )\n    in_files: Optional[str] = Field(\n        \"\",\n        description=\"Path to input HKLs. Space-separated list of 2. Use output of partialator e.g.\",\n        flag_type=\"\",\n    )\n    ## Need mechanism to set is_result=True ...\n    symmetry: str = Field(\"\", description=\"Point group symmetry.\", flag_type=\"--\")\n    cell_file: str = Field(\n        \"\",\n        description=\"Path to a file containing unit cell information (PDB or CrystFEL format).\",\n        flag_type=\"-\",\n        rename_param=\"p\",\n    )\n    fom: str = Field(\n        \"Rsplit\", description=\"Specify figure of merit to calculate.\", flag_type=\"--\"\n    )\n    nshells: int = Field(10, description=\"Use n resolution shells.\", flag_type=\"--\")\n    # NEED A NEW CASE FOR THIS -> Boolean flag, no arg, one hyphen...\n    # fix_unity: bool = Field(\n    #    False,\n    #    description=\"Fix scale factors to unity.\",\n    #    flag_type=\"-\",\n    #    rename_param=\"u\",\n    # )\n    shell_file: str = Field(\n        \"\",\n        description=\"Write the statistics in resolution shells to a file.\",\n        flag_type=\"--\",\n        rename_param=\"shell-file\",\n        is_result=True,\n    )\n    ignore_negs: bool = Field(\n        False,\n        description=\"Ignore reflections with negative reflections.\",\n        flag_type=\"--\",\n        rename_param=\"ignore-negs\",\n    )\n    zero_negs: bool = Field(\n        False,\n        description=\"Set negative intensities to 0.\",\n        flag_type=\"--\",\n        rename_param=\"zero-negs\",\n    )\n    sigma_cutoff: Optional[Union[float, int, str]] = Field(\n        # \"-infinity\",\n        description=\"Discard reflections with I/sigma(I) < n. -infinity means no cutoff.\",\n        flag_type=\"--\",\n        rename_param=\"sigma-cutoff\",\n    )\n    rmin: Optional[float] = Field(\n        description=\"Low resolution cutoff of 1/d (m-1). Use this or --lowres NOT both.\",\n        flag_type=\"--\",\n    )\n    lowres: Optional[float] = Field(\n        descirption=\"Low resolution cutoff in Angstroms. Use this or --rmin NOT both.\",\n        flag_type=\"--\",\n    )\n    rmax: Optional[float] = Field(\n        description=\"High resolution cutoff in 1/d (m-1). Use this or --highres NOT both.\",\n        flag_type=\"--\",\n    )\n    highres: Optional[float] = Field(\n        description=\"High resolution cutoff in Angstroms. 
Use this or --rmax NOT both.\",\n        flag_type=\"--\",\n    )\n\n    @validator(\"in_files\", always=True)\n    def validate_in_files(cls, in_files: str, values: Dict[str, Any]) -> str:\n        if in_files == \"\":\n            partialator_file: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\", \"MergePartialator\", \"out_file\"\n            )\n            if partialator_file:\n                hkls: str = f\"{partialator_file}1 {partialator_file}2\"\n                return hkls\n        return in_files\n\n    @validator(\"cell_file\", always=True)\n    def validate_cell_file(cls, cell_file: str, values: Dict[str, Any]) -> str:\n        if cell_file == \"\":\n            idx_cell_file: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\",\n                \"IndexCrystFEL\",\n                \"cell_file\",\n                valid_only=False,\n            )\n            if idx_cell_file:\n                return idx_cell_file\n        return cell_file\n\n    @validator(\"symmetry\", always=True)\n    def validate_symmetry(cls, symmetry: str, values: Dict[str, Any]) -> str:\n        if symmetry == \"\":\n            partialator_sym: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\", \"MergePartialator\", \"symmetry\"\n            )\n            if partialator_sym:\n                return partialator_sym\n        return symmetry\n\n    @validator(\"shell_file\", always=True)\n    def validate_shell_file(cls, shell_file: str, values: Dict[str, Any]) -> str:\n        if shell_file == \"\":\n            partialator_file: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\", \"MergePartialator\", \"out_file\"\n            )\n            if partialator_file:\n                shells_out: str = partialator_file.split(\".\")[0]\n                shells_out = f\"{shells_out}_{values['fom']}_n{values['nshells']}.dat\"\n                return shells_out\n        return shell_file\n
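
For illustration, a sketch of a possible config entry is shown below, assuming the YAML key is the managed Task name CompareHKL (the model name without the Parameters suffix); fields left empty are auto-filled from earlier MergePartialator and IndexCrystFEL results by the validators above:

CompareHKL:\n  fom: \"Rsplit\"    # Default figure of merit\n  nshells: 10      # Default number of resolution shells\n  in_files: \"\"     # Auto-filled from MergePartialator output if empty\n  symmetry: \"\"     # Auto-filled from MergePartialator if empty\n  cell_file: \"\"    # Auto-filled from IndexCrystFEL if empty\n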
"},{"location":"source/io/config/#io.config.CompareHKLParameters.Config","title":"Config","text":"

Bases: Config

Source code in lute/io/models/sfx_merge.py
class Config(ThirdPartyParameters.Config):\n    long_flags_use_eq: bool = True\n    \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n\n    set_result: bool = True\n    \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n
"},{"location":"source/io/config/#io.config.CompareHKLParameters.Config.long_flags_use_eq","title":"long_flags_use_eq: bool = True class-attribute instance-attribute","text":"

Whether long command-line arguments are passed like --long=arg.

"},{"location":"source/io/config/#io.config.CompareHKLParameters.Config.set_result","title":"set_result: bool = True class-attribute instance-attribute","text":"

Whether the Executor should mark a specified parameter as a result.

"},{"location":"source/io/config/#io.config.ConcatenateStreamFilesParameters","title":"ConcatenateStreamFilesParameters","text":"

Bases: TaskParameters

Parameters for stream concatenation.

Concatenates the stream file output from CrystFEL indexing for multiple experimental runs.

Source code in lute/io/models/sfx_index.py
class ConcatenateStreamFilesParameters(TaskParameters):\n    \"\"\"Parameters for stream concatenation.\n\n    Concatenates the stream file output from CrystFEL indexing for multiple\n    experimental runs.\n    \"\"\"\n\n    class Config(TaskParameters.Config):\n        set_result: bool = True\n        \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n    in_file: str = Field(\n        \"\",\n        description=\"Root of directory tree storing stream files to merge.\",\n    )\n\n    tag: Optional[str] = Field(\n        \"\",\n        description=\"Tag identifying the stream files to merge.\",\n    )\n\n    out_file: str = Field(\n        \"\", description=\"Path to merged output stream file.\", is_result=True\n    )\n\n    @validator(\"in_file\", always=True)\n    def validate_in_file(cls, in_file: str, values: Dict[str, Any]) -> str:\n        if in_file == \"\":\n            stream_file: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\", \"IndexCrystFEL\", \"out_file\"\n            )\n            if stream_file:\n                stream_dir: str = str(Path(stream_file).parent)\n                return stream_dir\n        return in_file\n\n    @validator(\"tag\", always=True)\n    def validate_tag(cls, tag: str, values: Dict[str, Any]) -> str:\n        if tag == \"\":\n            stream_file: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\", \"IndexCrystFEL\", \"out_file\"\n            )\n            if stream_file:\n                stream_tag: str = Path(stream_file).name.split(\"_\")[0]\n                return stream_tag\n        return tag\n\n    @validator(\"out_file\", always=True)\n    def validate_out_file(cls, tag: str, values: Dict[str, Any]) -> str:\n        if tag == \"\":\n            stream_out_file: str = str(\n                Path(values[\"in_file\"]).parent / f\"{values['tag'].stream}\"\n            )\n            return stream_out_file\n        return tag\n
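
A minimal sketch of a possible config entry, assuming a ConcatenateStreamFiles key and hypothetical paths; if left empty, in_file, tag and out_file are derived from the latest IndexCrystFEL entry as in the validators above:

ConcatenateStreamFiles:\n  in_file: \"/path/to/streams\"               # Hypothetical directory tree holding the stream files\n  tag: \"lyso\"                               # Hypothetical tag identifying the streams\n  out_file: \"/path/to/streams/lyso.stream\"  # Hypothetical merged output path\n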
"},{"location":"source/io/config/#io.config.ConcatenateStreamFilesParameters.Config","title":"Config","text":"

Bases: Config

Source code in lute/io/models/sfx_index.py
class Config(TaskParameters.Config):\n    set_result: bool = True\n    \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n
"},{"location":"source/io/config/#io.config.ConcatenateStreamFilesParameters.Config.set_result","title":"set_result: bool = True class-attribute instance-attribute","text":"

Whether the Executor should mark a specified parameter as a result.

"},{"location":"source/io/config/#io.config.DimpleSolveParameters","title":"DimpleSolveParameters","text":"

Bases: ThirdPartyParameters

Parameters for CCP4's dimple program.

There are many parameters. For more information on usage, please refer to the CCP4 documentation, here: https://ccp4.github.io/dimple/

Source code in lute/io/models/sfx_solve.py
class DimpleSolveParameters(ThirdPartyParameters):\n    \"\"\"Parameters for CCP4's dimple program.\n\n    There are many parameters. For more information on\n    usage, please refer to the CCP4 documentation, here:\n    https://ccp4.github.io/dimple/\n    \"\"\"\n\n    executable: str = Field(\n        \"/sdf/group/lcls/ds/tools/ccp4-8.0/bin/dimple\",\n        description=\"CCP4 Dimple for solving structures with MR.\",\n        flag_type=\"\",\n    )\n    # Positional requirements - all required.\n    in_file: str = Field(\n        \"\",\n        description=\"Path to input mtz.\",\n        flag_type=\"\",\n    )\n    pdb: str = Field(\"\", description=\"Path to a PDB.\", flag_type=\"\")\n    out_dir: str = Field(\"\", description=\"Output DIRECTORY.\", flag_type=\"\")\n    # Most used options\n    mr_thresh: PositiveFloat = Field(\n        0.4,\n        description=\"Threshold for molecular replacement.\",\n        flag_type=\"--\",\n        rename_param=\"mr-when-r\",\n    )\n    slow: Optional[bool] = Field(\n        False, description=\"Perform more refinement.\", flag_type=\"--\"\n    )\n    # Other options (IO)\n    hklout: str = Field(\n        \"final.mtz\", description=\"Output mtz file name.\", flag_type=\"--\"\n    )\n    xyzout: str = Field(\n        \"final.pdb\", description=\"Output PDB file name.\", flag_type=\"--\"\n    )\n    icolumn: Optional[str] = Field(\n        # \"IMEAN\",\n        description=\"Name for the I column.\",\n        flag_type=\"--\",\n    )\n    sigicolumn: Optional[str] = Field(\n        # \"SIG<ICOL>\",\n        description=\"Name for the Sig<I> column.\",\n        flag_type=\"--\",\n    )\n    fcolumn: Optional[str] = Field(\n        # \"F\",\n        description=\"Name for the F column.\",\n        flag_type=\"--\",\n    )\n    sigfcolumn: Optional[str] = Field(\n        # \"F\",\n        description=\"Name for the Sig<F> column.\",\n        flag_type=\"--\",\n    )\n    libin: Optional[str] = Field(\n        description=\"Ligand descriptions for refmac (LIBIN).\", flag_type=\"--\"\n    )\n    refmac_key: Optional[str] = Field(\n        description=\"Extra Refmac keywords to use in refinement.\",\n        flag_type=\"--\",\n        rename_param=\"refmac-key\",\n    )\n    free_r_flags: Optional[str] = Field(\n        description=\"Path to a mtz file with freeR flags.\",\n        flag_type=\"--\",\n        rename_param=\"free-r-flags\",\n    )\n    freecolumn: Optional[Union[int, float]] = Field(\n        # 0,\n        description=\"Refree column with an optional value.\",\n        flag_type=\"--\",\n    )\n    img_format: Optional[str] = Field(\n        description=\"Format of generated images. 
(png, jpeg, none).\",\n        flag_type=\"-\",\n        rename_param=\"f\",\n    )\n    white_bg: bool = Field(\n        False,\n        description=\"Use a white background in Coot and in images.\",\n        flag_type=\"--\",\n        rename_param=\"white-bg\",\n    )\n    no_cleanup: bool = Field(\n        False,\n        description=\"Retain intermediate files.\",\n        flag_type=\"--\",\n        rename_param=\"no-cleanup\",\n    )\n    # Calculations\n    no_blob_search: bool = Field(\n        False,\n        description=\"Do not search for unmodelled blobs.\",\n        flag_type=\"--\",\n        rename_param=\"no-blob-search\",\n    )\n    anode: bool = Field(\n        False, description=\"Use SHELX/AnoDe to find peaks in the anomalous map.\"\n    )\n    # Run customization\n    no_hetatm: bool = Field(\n        False,\n        description=\"Remove heteroatoms from the given model.\",\n        flag_type=\"--\",\n        rename_param=\"no-hetatm\",\n    )\n    rigid_cycles: Optional[PositiveInt] = Field(\n        # 10,\n        description=\"Number of cycles of rigid-body refinement to perform.\",\n        flag_type=\"--\",\n        rename_param=\"rigid-cycles\",\n    )\n    jelly: Optional[PositiveInt] = Field(\n        # 4,\n        description=\"Number of cycles of jelly-body refinement to perform.\",\n        flag_type=\"--\",\n    )\n    restr_cycles: Optional[PositiveInt] = Field(\n        # 8,\n        description=\"Number of cycles of refmac final refinement to perform.\",\n        flag_type=\"--\",\n        rename_param=\"restr-cycles\",\n    )\n    lim_resolution: Optional[PositiveFloat] = Field(\n        description=\"Limit the final resolution.\", flag_type=\"--\", rename_param=\"reso\"\n    )\n    weight: Optional[str] = Field(\n        # \"auto-weight\",\n        description=\"The refmac matrix weight.\",\n        flag_type=\"--\",\n    )\n    mr_prog: Optional[str] = Field(\n        # \"phaser\",\n        description=\"Molecular replacement program. phaser or molrep.\",\n        flag_type=\"--\",\n        rename_param=\"mr-prog\",\n    )\n    mr_num: Optional[Union[str, int]] = Field(\n        # \"auto\",\n        description=\"Number of molecules to use for molecular replacement.\",\n        flag_type=\"--\",\n        rename_param=\"mr-num\",\n    )\n    mr_reso: Optional[PositiveFloat] = Field(\n        # 3.25,\n        description=\"High resolution for molecular replacement. If >10 interpreted as eLLG.\",\n        flag_type=\"--\",\n        rename_param=\"mr-reso\",\n    )\n    itof_prog: Optional[str] = Field(\n        description=\"Program to calculate amplitudes. 
truncate, or ctruncate.\",\n        flag_type=\"--\",\n        rename_param=\"ItoF-prog\",\n    )\n\n    @validator(\"in_file\", always=True)\n    def validate_in_file(cls, in_file: str, values: Dict[str, Any]) -> str:\n        if in_file == \"\":\n            get_hkl_file: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\", \"ManipulateHKL\", \"out_file\"\n            )\n            if get_hkl_file:\n                return get_hkl_file\n        return in_file\n\n    @validator(\"out_dir\", always=True)\n    def validate_out_dir(cls, out_dir: str, values: Dict[str, Any]) -> str:\n        if out_dir == \"\":\n            get_hkl_file: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\", \"ManipulateHKL\", \"out_file\"\n            )\n            if get_hkl_file:\n                return os.path.dirname(get_hkl_file)\n        return out_dir\n
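
A sketch of a possible config entry, assuming a DimpleSolve key and a hypothetical PDB path; in_file and out_dir are auto-filled from the latest ManipulateHKL output when left empty:

DimpleSolve:\n  in_file: \"\"                    # Auto-filled from ManipulateHKL output if empty\n  pdb: \"/path/to/reference.pdb\"  # Hypothetical search model (required)\n  out_dir: \"\"                    # Auto-filled from the ManipulateHKL output directory if empty\n  mr_thresh: 0.4                 # Default molecular replacement threshold\n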
"},{"location":"source/io/config/#io.config.FindOverlapXSSParameters","title":"FindOverlapXSSParameters","text":"

Bases: TaskParameters

TaskParameter model for FindOverlapXSS Task.

This Task determines spatial or temporal overlap between an optical pulse and the FEL pulse based on difference scattering (XSS) signal. This Task uses SmallData HDF5 files as a source.

Source code in lute/io/models/smd.py
class FindOverlapXSSParameters(TaskParameters):\n    \"\"\"TaskParameter model for FindOverlapXSS Task.\n\n    This Task determines spatial or temporal overlap between an optical pulse\n    and the FEL pulse based on difference scattering (XSS) signal. This Task\n    uses SmallData HDF5 files as a source.\n    \"\"\"\n\n    class ExpConfig(BaseModel):\n        det_name: str\n        ipm_var: str\n        scan_var: Union[str, List[str]]\n\n    class Thresholds(BaseModel):\n        min_Iscat: Union[int, float]\n        min_ipm: Union[int, float]\n\n    class AnalysisFlags(BaseModel):\n        use_pyfai: bool = True\n        use_asymls: bool = False\n\n    exp_config: ExpConfig\n    thresholds: Thresholds\n    analysis_flags: AnalysisFlags\n
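
Since this model nests sub-models, the YAML entry is nested as well. A sketch assuming a FindOverlapXSS key and hypothetical detector, monitor and scan variable names:

FindOverlapXSS:\n  exp_config:\n    det_name: \"epix_1\"    # Hypothetical detector name\n    ipm_var: \"ipm_dg2\"    # Hypothetical intensity monitor variable\n    scan_var: \"lxt\"       # A single variable or a list\n  thresholds:\n    min_Iscat: 10         # Hypothetical minimum scattering intensity\n    min_ipm: 500          # Hypothetical minimum intensity monitor value\n  analysis_flags:\n    use_pyfai: true       # Default\n    use_asymls: false     # Default\n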
"},{"location":"source/io/config/#io.config.FindPeaksPsocakeParameters","title":"FindPeaksPsocakeParameters","text":"

Bases: ThirdPartyParameters

Parameters for crystallographic (Bragg) peak finding using Psocake.

This peak finding Task optionally has the ability to compress/decompress data with SZ for the purpose of compression validation. NOTE: This Task is deprecated and provided for compatibility only.

Source code in lute/io/models/sfx_find_peaks.py
class FindPeaksPsocakeParameters(ThirdPartyParameters):\n    \"\"\"Parameters for crystallographic (Bragg) peak finding using Psocake.\n\n    This peak finding Task optionally has the ability to compress/decompress\n    data with SZ for the purpose of compression validation.\n    NOTE: This Task is deprecated and provided for compatibility only.\n    \"\"\"\n\n    class Config(TaskParameters.Config):\n        set_result: bool = True\n        \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n        result_from_params: str = \"\"\n        \"\"\"Defines a result from the parameters. Use a validator to do so.\"\"\"\n\n    class SZParameters(BaseModel):\n        compressor: Literal[\"qoz\", \"sz3\"] = Field(\n            \"qoz\", description=\"SZ compression algorithm (qoz, sz3)\"\n        )\n        binSize: int = Field(2, description=\"SZ compression's bin size paramater\")\n        roiWindowSize: int = Field(\n            2, description=\"SZ compression's ROI window size paramater\"\n        )\n        absError: float = Field(10, descriptionp=\"Maximum absolute error value\")\n\n    executable: str = Field(\"mpirun\", description=\"MPI executable.\", flag_type=\"\")\n    np: PositiveInt = Field(\n        max(int(os.environ.get(\"SLURM_NPROCS\", len(os.sched_getaffinity(0)))) - 1, 1),\n        description=\"Number of processes\",\n        flag_type=\"-\",\n    )\n    mca: str = Field(\n        \"btl ^openib\", description=\"Mca option for the MPI executable\", flag_type=\"--\"\n    )\n    p_arg1: str = Field(\n        \"python\", description=\"Executable to run with mpi (i.e. python).\", flag_type=\"\"\n    )\n    u: str = Field(\n        \"\", description=\"Python option for unbuffered output.\", flag_type=\"-\"\n    )\n    p_arg2: str = Field(\n        \"findPeaksSZ.py\",\n        description=\"Executable to run with mpi (i.e. 
python).\",\n        flag_type=\"\",\n    )\n    d: str = Field(description=\"Detector name\", flag_type=\"-\")\n    e: str = Field(\"\", description=\"Experiment name\", flag_type=\"-\")\n    r: int = Field(-1, description=\"Run number\", flag_type=\"-\")\n    outDir: str = Field(\n        description=\"Output directory where .cxi will be saved\", flag_type=\"--\"\n    )\n    algorithm: int = Field(1, description=\"PyAlgos algorithm to use\", flag_type=\"--\")\n    alg_npix_min: float = Field(\n        1.0, description=\"PyAlgos algorithm's npix_min parameter\", flag_type=\"--\"\n    )\n    alg_npix_max: float = Field(\n        45.0, description=\"PyAlgos algorithm's npix_max parameter\", flag_type=\"--\"\n    )\n    alg_amax_thr: float = Field(\n        250.0, description=\"PyAlgos algorithm's amax_thr parameter\", flag_type=\"--\"\n    )\n    alg_atot_thr: float = Field(\n        330.0, description=\"PyAlgos algorithm's atot_thr parameter\", flag_type=\"--\"\n    )\n    alg_son_min: float = Field(\n        10.0, description=\"PyAlgos algorithm's son_min parameter\", flag_type=\"--\"\n    )\n    alg1_thr_low: float = Field(\n        80.0, description=\"PyAlgos algorithm's thr_low parameter\", flag_type=\"--\"\n    )\n    alg1_thr_high: float = Field(\n        270.0, description=\"PyAlgos algorithm's thr_high parameter\", flag_type=\"--\"\n    )\n    alg1_rank: int = Field(\n        3, description=\"PyAlgos algorithm's rank parameter\", flag_type=\"--\"\n    )\n    alg1_radius: int = Field(\n        3, description=\"PyAlgos algorithm's radius parameter\", flag_type=\"--\"\n    )\n    alg1_dr: int = Field(\n        1, description=\"PyAlgos algorithm's dr parameter\", flag_type=\"--\"\n    )\n    psanaMask_on: str = Field(\n        \"True\", description=\"Whether psana's mask should be used\", flag_type=\"--\"\n    )\n    psanaMask_calib: str = Field(\n        \"True\", description=\"Psana mask's calib parameter\", flag_type=\"--\"\n    )\n    psanaMask_status: str = Field(\n        \"True\", description=\"Psana mask's status parameter\", flag_type=\"--\"\n    )\n    psanaMask_edges: str = Field(\n        \"True\", description=\"Psana mask's edges parameter\", flag_type=\"--\"\n    )\n    psanaMask_central: str = Field(\n        \"True\", description=\"Psana mask's central parameter\", flag_type=\"--\"\n    )\n    psanaMask_unbond: str = Field(\n        \"True\", description=\"Psana mask's unbond parameter\", flag_type=\"--\"\n    )\n    psanaMask_unbondnrs: str = Field(\n        \"True\", description=\"Psana mask's unbondnbrs parameter\", flag_type=\"--\"\n    )\n    mask: str = Field(\n        \"\", description=\"Path to an additional mask to apply\", flag_type=\"--\"\n    )\n    clen: str = Field(\n        description=\"Epics variable storing the camera length\", flag_type=\"--\"\n    )\n    coffset: float = Field(0, description=\"Camera offset in m\", flag_type=\"--\")\n    minPeaks: int = Field(\n        15,\n        description=\"Minimum number of peaks to mark frame for indexing\",\n        flag_type=\"--\",\n    )\n    maxPeaks: int = Field(\n        15,\n        description=\"Maximum number of peaks to mark frame for indexing\",\n        flag_type=\"--\",\n    )\n    minRes: int = Field(\n        0,\n        description=\"Minimum peak resolution to mark frame for indexing \",\n        flag_type=\"--\",\n    )\n    sample: str = Field(\"\", description=\"Sample name\", flag_type=\"--\")\n    instrument: Union[None, str] = Field(\n        None, description=\"Instrument name\", 
flag_type=\"--\"\n    )\n    pixelSize: float = Field(0.0, description=\"Pixel size\", flag_type=\"--\")\n    auto: str = Field(\n        \"False\",\n        description=(\n            \"Whether to automatically determine peak per event peak \"\n            \"finding parameters\"\n        ),\n        flag_type=\"--\",\n    )\n    detectorDistance: float = Field(\n        0.0, description=\"Detector distance from interaction point in m\", flag_type=\"--\"\n    )\n    access: Literal[\"ana\", \"ffb\"] = Field(\n        \"ana\", description=\"Data node type: {ana,ffb}\", flag_type=\"--\"\n    )\n    szfile: str = Field(\"qoz.json\", description=\"Path to SZ's JSON configuration file\")\n    lute_template_cfg: TemplateConfig = Field(\n        TemplateConfig(\n            template_name=\"sz.json\",\n            output_path=\"\",  # Will want to change where this goes...\n        ),\n        description=\"Template information for the sz.json file\",\n    )\n    sz_parameters: SZParameters = Field(\n        description=\"Configuration parameters for SZ Compression\", flag_type=\"\"\n    )\n\n    @validator(\"e\", always=True)\n    def validate_e(cls, e: str, values: Dict[str, Any]) -> str:\n        if e == \"\":\n            return values[\"lute_config\"].experiment\n        return e\n\n    @validator(\"r\", always=True)\n    def validate_r(cls, r: int, values: Dict[str, Any]) -> int:\n        if r == -1:\n            return values[\"lute_config\"].run\n        return r\n\n    @validator(\"lute_template_cfg\", always=True)\n    def set_output_path(\n        cls, lute_template_cfg: TemplateConfig, values: Dict[str, Any]\n    ) -> TemplateConfig:\n        if lute_template_cfg.output_path == \"\":\n            lute_template_cfg.output_path = values[\"szfile\"]\n        return lute_template_cfg\n\n    @validator(\"sz_parameters\", always=True)\n    def set_sz_compression_parameters(\n        cls, sz_parameters: SZParameters, values: Dict[str, Any]\n    ) -> None:\n        values[\"compressor\"] = sz_parameters.compressor\n        values[\"binSize\"] = sz_parameters.binSize\n        values[\"roiWindowSize\"] = sz_parameters.roiWindowSize\n        if sz_parameters.compressor == \"qoz\":\n            values[\"pressio_opts\"] = {\n                \"pressio:abs\": sz_parameters.absError,\n                \"qoz\": {\"qoz:stride\": 8},\n            }\n        else:\n            values[\"pressio_opts\"] = {\"pressio:abs\": sz_parameters.absError}\n        return None\n\n    @root_validator(pre=False)\n    def define_result(cls, values: Dict[str, Any]) -> Dict[str, Any]:\n        exp: str = values[\"lute_config\"].experiment\n        run: int = int(values[\"lute_config\"].run)\n        directory: str = values[\"outDir\"]\n        fname: str = f\"{exp}_{run:04d}.lst\"\n\n        cls.Config.result_from_params = f\"{directory}/{fname}\"\n        return values\n
"},{"location":"source/io/config/#io.config.FindPeaksPsocakeParameters.Config","title":"Config","text":"

Bases: Config

Source code in lute/io/models/sfx_find_peaks.py
class Config(TaskParameters.Config):\n    set_result: bool = True\n    \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n    result_from_params: str = \"\"\n    \"\"\"Defines a result from the parameters. Use a validator to do so.\"\"\"\n
"},{"location":"source/io/config/#io.config.FindPeaksPsocakeParameters.Config.result_from_params","title":"result_from_params: str = '' class-attribute instance-attribute","text":"

Defines a result from the parameters. Use a validator to do so.

"},{"location":"source/io/config/#io.config.FindPeaksPsocakeParameters.Config.set_result","title":"set_result: bool = True class-attribute instance-attribute","text":"

Whether the Executor should mark a specified parameter as a result.

"},{"location":"source/io/config/#io.config.FindPeaksPyAlgosParameters","title":"FindPeaksPyAlgosParameters","text":"

Bases: TaskParameters

Parameters for crystallographic (Bragg) peak finding using PyAlgos.

This peak finding Task optionally has the ability to compress/decompress data with SZ for the purpose of compression validation.

Source code in lute/io/models/sfx_find_peaks.py
class FindPeaksPyAlgosParameters(TaskParameters):\n    \"\"\"Parameters for crystallographic (Bragg) peak finding using PyAlgos.\n\n    This peak finding Task optionally has the ability to compress/decompress\n    data with SZ for the purpose of compression validation.\n    \"\"\"\n\n    class Config(TaskParameters.Config):\n        set_result: bool = True\n        \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n    class SZCompressorParameters(BaseModel):\n        compressor: Literal[\"qoz\", \"sz3\"] = Field(\n            \"qoz\", description='Compression algorithm (\"qoz\" or \"sz3\")'\n        )\n        abs_error: float = Field(10.0, description=\"Absolute error bound\")\n        bin_size: int = Field(2, description=\"Bin size\")\n        roi_window_size: int = Field(\n            9,\n            description=\"Default window size\",\n        )\n\n    outdir: str = Field(\n        description=\"Output directory for cxi files\",\n    )\n    n_events: int = Field(\n        0,\n        description=\"Number of events to process (0 to process all events)\",\n    )\n    det_name: str = Field(\n        description=\"Psana name of the detector storing the image data\",\n    )\n    event_receiver: Literal[\"evr0\", \"evr1\"] = Field(\n        description=\"Event Receiver to be used: evr0 or evr1\",\n    )\n    tag: str = Field(\n        \"\",\n        description=\"Tag to add to the output file names\",\n    )\n    pv_camera_length: Union[str, float] = Field(\n        \"\",\n        description=\"PV associated with camera length \"\n        \"(if a number, camera length directly)\",\n    )\n    event_logic: bool = Field(\n        False,\n        description=\"True if only events with a specific event code should be \"\n        \"processed. False if the event code should be ignored\",\n    )\n    event_code: int = Field(\n        0,\n        description=\"Required events code for events to be processed if event logic \"\n        \"is True\",\n    )\n    psana_mask: bool = Field(\n        False,\n        description=\"If True, apply mask from psana Detector object\",\n    )\n    mask_file: Union[str, None] = Field(\n        None,\n        description=\"File with a custom mask to apply. 
If None, no custom mask is \"\n        \"applied\",\n    )\n    min_peaks: int = Field(2, description=\"Minimum number of peaks per image\")\n    max_peaks: int = Field(\n        2048,\n        description=\"Maximum number of peaks per image\",\n    )\n    npix_min: int = Field(\n        2,\n        description=\"Minimum number of pixels per peak\",\n    )\n    npix_max: int = Field(\n        30,\n        description=\"Maximum number of pixels per peak\",\n    )\n    amax_thr: float = Field(\n        80.0,\n        description=\"Minimum intensity threshold for starting a peak\",\n    )\n    atot_thr: float = Field(\n        120.0,\n        description=\"Minimum summed intensity threshold for pixel collection\",\n    )\n    son_min: float = Field(\n        7.0,\n        description=\"Minimum signal-to-noise ratio to be considered a peak\",\n    )\n    peak_rank: int = Field(\n        3,\n        description=\"Radius in which central peak pixel is a local maximum\",\n    )\n    r0: float = Field(\n        3.0,\n        description=\"Radius of ring for background evaluation in pixels\",\n    )\n    dr: float = Field(\n        2.0,\n        description=\"Width of ring for background evaluation in pixels\",\n    )\n    nsigm: float = Field(\n        7.0,\n        description=\"Intensity threshold to include pixel in connected group\",\n    )\n    compression: Optional[SZCompressorParameters] = Field(\n        None,\n        description=\"Options for the SZ Compression Algorithm\",\n    )\n    out_file: str = Field(\n        \"\",\n        description=\"Path to output file.\",\n        flag_type=\"-\",\n        rename_param=\"o\",\n        is_result=True,\n    )\n\n    @validator(\"out_file\", always=True)\n    def validate_out_file(cls, out_file: str, values: Dict[str, Any]) -> str:\n        if out_file == \"\":\n            fname: Path = (\n                Path(values[\"outdir\"])\n                / f\"{values['lute_config'].experiment}_{values['lute_config'].run}_\"\n                f\"{values['tag']}.list\"\n            )\n            return str(fname)\n        return out_file\n
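
An illustrative config entry, assuming a FindPeaksPyAlgos key; the output directory and detector name are placeholders, and the optional compression block mirrors the documented defaults:

FindPeaksPyAlgos:\n  outdir: \"/path/to/output\"   # Hypothetical output directory for cxi files\n  det_name: \"epix10k2M\"       # Hypothetical psana detector name\n  event_receiver: \"evr0\"      # evr0 or evr1\n  n_events: 0                 # 0 processes all events\n  compression:\n    compressor: \"qoz\"\n    abs_error: 10.0\n    bin_size: 2\n    roi_window_size: 9\n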
"},{"location":"source/io/config/#io.config.FindPeaksPyAlgosParameters.Config","title":"Config","text":"

Bases: Config

Source code in lute/io/models/sfx_find_peaks.py
class Config(TaskParameters.Config):\n    set_result: bool = True\n    \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n
"},{"location":"source/io/config/#io.config.FindPeaksPyAlgosParameters.Config.set_result","title":"set_result: bool = True class-attribute instance-attribute","text":"

Whether the Executor should mark a specified parameter as a result.

"},{"location":"source/io/config/#io.config.IndexCrystFELParameters","title":"IndexCrystFELParameters","text":"

Bases: ThirdPartyParameters

Parameters for CrystFEL's indexamajig.

There are many parameters, and many combinations. For more information on usage, please refer to the CrystFEL documentation, here: https://www.desy.de/~twhite/crystfel/manual-indexamajig.html

Source code in lute/io/models/sfx_index.py
class IndexCrystFELParameters(ThirdPartyParameters):\n    \"\"\"Parameters for CrystFEL's `indexamajig`.\n\n    There are many parameters, and many combinations. For more information on\n    usage, please refer to the CrystFEL documentation, here:\n    https://www.desy.de/~twhite/crystfel/manual-indexamajig.html\n    \"\"\"\n\n    class Config(ThirdPartyParameters.Config):\n        set_result: bool = True\n        \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n        long_flags_use_eq: bool = True\n        \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n\n    executable: str = Field(\n        \"/sdf/group/lcls/ds/tools/crystfel/0.10.2/bin/indexamajig\",\n        description=\"CrystFEL's indexing binary.\",\n        flag_type=\"\",\n    )\n    # Basic options\n    in_file: Optional[str] = Field(\n        \"\", description=\"Path to input file.\", flag_type=\"-\", rename_param=\"i\"\n    )\n    out_file: str = Field(\n        \"\",\n        description=\"Path to output file.\",\n        flag_type=\"-\",\n        rename_param=\"o\",\n        is_result=True,\n    )\n    geometry: str = Field(\n        \"\", description=\"Path to geometry file.\", flag_type=\"-\", rename_param=\"g\"\n    )\n    zmq_input: Optional[str] = Field(\n        description=\"ZMQ address to receive data over. `input` and `zmq-input` are mutually exclusive\",\n        flag_type=\"--\",\n        rename_param=\"zmq-input\",\n    )\n    zmq_subscribe: Optional[str] = Field(  # Can be used multiple times...\n        description=\"Subscribe to ZMQ message of type `tag`\",\n        flag_type=\"--\",\n        rename_param=\"zmq-subscribe\",\n    )\n    zmq_request: Optional[AnyUrl] = Field(\n        description=\"Request new data over ZMQ by sending this value\",\n        flag_type=\"--\",\n        rename_param=\"zmq-request\",\n    )\n    asapo_endpoint: Optional[str] = Field(\n        description=\"ASAP::O endpoint. zmq-input and this are mutually exclusive.\",\n        flag_type=\"--\",\n        rename_param=\"asapo-endpoint\",\n    )\n    asapo_token: Optional[str] = Field(\n        description=\"ASAP::O authentication token.\",\n        flag_type=\"--\",\n        rename_param=\"asapo-token\",\n    )\n    asapo_beamtime: Optional[str] = Field(\n        description=\"ASAP::O beatime.\",\n        flag_type=\"--\",\n        rename_param=\"asapo-beamtime\",\n    )\n    asapo_source: Optional[str] = Field(\n        description=\"ASAP::O data source.\",\n        flag_type=\"--\",\n        rename_param=\"asapo-source\",\n    )\n    asapo_group: Optional[str] = Field(\n        description=\"ASAP::O consumer group.\",\n        flag_type=\"--\",\n        rename_param=\"asapo-group\",\n    )\n    asapo_stream: Optional[str] = Field(\n        description=\"ASAP::O stream.\",\n        flag_type=\"--\",\n        rename_param=\"asapo-stream\",\n    )\n    asapo_wait_for_stream: Optional[str] = Field(\n        description=\"If ASAP::O stream does not exist, wait for it to appear.\",\n        flag_type=\"--\",\n        rename_param=\"asapo-wait-for-stream\",\n    )\n    data_format: Optional[str] = Field(\n        description=\"Specify format for ZMQ or ASAP::O. `msgpack`, `hdf5` or `seedee`.\",\n        flag_type=\"--\",\n        rename_param=\"data-format\",\n    )\n    basename: bool = Field(\n        False,\n        description=\"Remove directory parts of filenames. 
Acts before prefix if prefix also given.\",\n        flag_type=\"--\",\n    )\n    prefix: Optional[str] = Field(\n        description=\"Add a prefix to the filenames from the infile argument.\",\n        flag_type=\"--\",\n        rename_param=\"asapo-stream\",\n    )\n    nthreads: PositiveInt = Field(\n        max(int(os.environ.get(\"SLURM_NPROCS\", len(os.sched_getaffinity(0)))) - 1, 1),\n        description=\"Number of threads to use. See also `max_indexer_threads`.\",\n        flag_type=\"-\",\n        rename_param=\"j\",\n    )\n    no_check_prefix: bool = Field(\n        False,\n        description=\"Don't attempt to correct the prefix if it seems incorrect.\",\n        flag_type=\"--\",\n        rename_param=\"no-check-prefix\",\n    )\n    highres: Optional[float] = Field(\n        description=\"Mark all pixels greater than `x` has bad.\", flag_type=\"--\"\n    )\n    profile: bool = Field(\n        False, description=\"Display timing data to monitor performance.\", flag_type=\"--\"\n    )\n    temp_dir: Optional[str] = Field(\n        description=\"Specify a path for the temp files folder.\",\n        flag_type=\"--\",\n        rename_param=\"temp-dir\",\n    )\n    wait_for_file: conint(gt=-2) = Field(\n        0,\n        description=\"Wait at most `x` seconds for a file to be created. A value of -1 means wait forever.\",\n        flag_type=\"--\",\n        rename_param=\"wait-for-file\",\n    )\n    no_image_data: bool = Field(\n        False,\n        description=\"Load only the metadata, no iamges. Can check indexability without high data requirements.\",\n        flag_type=\"--\",\n        rename_param=\"no-image-data\",\n    )\n    # Peak-finding options\n    # ....\n    # Indexing options\n    indexing: Optional[str] = Field(\n        description=\"Comma-separated list of supported indexing algorithms to use. Default is to automatically detect.\",\n        flag_type=\"--\",\n    )\n    cell_file: Optional[str] = Field(\n        description=\"Path to a file containing unit cell information (PDB or CrystFEL format).\",\n        flag_type=\"-\",\n        rename_param=\"p\",\n    )\n    tolerance: str = Field(\n        \"5,5,5,1.5\",\n        description=(\n            \"Tolerances (in percent) for unit cell comparison. \"\n            \"Comma-separated list a,b,c,angle. Default=5,5,5,1.5\"\n        ),\n        flag_type=\"--\",\n    )\n    no_check_cell: bool = Field(\n        False,\n        description=\"Do not check cell parameters against unit cell. Replaces '-raw' method.\",\n        flag_type=\"--\",\n        rename_param=\"no-check-cell\",\n    )\n    no_check_peaks: bool = Field(\n        False,\n        description=\"Do not verify peaks are accounted for by solution.\",\n        flag_type=\"--\",\n        rename_param=\"no-check-peaks\",\n    )\n    multi: bool = Field(\n        False, description=\"Enable multi-lattice indexing.\", flag_type=\"--\"\n    )\n    wavelength_estimate: Optional[float] = Field(\n        description=\"Estimate for X-ray wavelength. Required for some methods.\",\n        flag_type=\"--\",\n        rename_param=\"wavelength-estimate\",\n    )\n    camera_length_estimate: Optional[float] = Field(\n        description=\"Estimate for camera distance. Required for some methods.\",\n        flag_type=\"--\",\n        rename_param=\"camera-length-estimate\",\n    )\n    max_indexer_threads: Optional[PositiveInt] = Field(\n        # 1,\n        description=\"Some indexing algos can use multiple threads. 
In addition to image-based.\",\n        flag_type=\"--\",\n        rename_param=\"max-indexer-threads\",\n    )\n    no_retry: bool = Field(\n        False,\n        description=\"Do not remove weak peaks and try again.\",\n        flag_type=\"--\",\n        rename_param=\"no-retry\",\n    )\n    no_refine: bool = Field(\n        False,\n        description=\"Skip refinement step.\",\n        flag_type=\"--\",\n        rename_param=\"no-refine\",\n    )\n    no_revalidate: bool = Field(\n        False,\n        description=\"Skip revalidation step.\",\n        flag_type=\"--\",\n        rename_param=\"no-revalidate\",\n    )\n    # TakeTwo specific parameters\n    taketwo_member_threshold: Optional[PositiveInt] = Field(\n        # 20,\n        description=\"Minimum number of vectors to consider.\",\n        flag_type=\"--\",\n        rename_param=\"taketwo-member-threshold\",\n    )\n    taketwo_len_tolerance: Optional[PositiveFloat] = Field(\n        # 0.001,\n        description=\"TakeTwo length tolerance in Angstroms.\",\n        flag_type=\"--\",\n        rename_param=\"taketwo-len-tolerance\",\n    )\n    taketwo_angle_tolerance: Optional[PositiveFloat] = Field(\n        # 0.6,\n        description=\"TakeTwo angle tolerance in degrees.\",\n        flag_type=\"--\",\n        rename_param=\"taketwo-angle-tolerance\",\n    )\n    taketwo_trace_tolerance: Optional[PositiveFloat] = Field(\n        # 3,\n        description=\"Matrix trace tolerance in degrees.\",\n        flag_type=\"--\",\n        rename_param=\"taketwo-trace-tolerance\",\n    )\n    # Felix-specific parameters\n    # felix_domega\n    # felix-fraction-max-visits\n    # felix-max-internal-angle\n    # felix-max-uniqueness\n    # felix-min-completeness\n    # felix-min-visits\n    # felix-num-voxels\n    # felix-sigma\n    # felix-tthrange-max\n    # felix-tthrange-min\n    # XGANDALF-specific parameters\n    xgandalf_sampling_pitch: Optional[NonNegativeInt] = Field(\n        # 6,\n        description=\"Density of reciprocal space sampling.\",\n        flag_type=\"--\",\n        rename_param=\"xgandalf-sampling-pitch\",\n    )\n    xgandalf_grad_desc_iterations: Optional[NonNegativeInt] = Field(\n        # 4,\n        description=\"Number of gradient descent iterations.\",\n        flag_type=\"--\",\n        rename_param=\"xgandalf-grad-desc-iterations\",\n    )\n    xgandalf_tolerance: Optional[PositiveFloat] = Field(\n        # 0.02,\n        description=\"Relative tolerance of lattice vectors\",\n        flag_type=\"--\",\n        rename_param=\"xgandalf-tolerance\",\n    )\n    xgandalf_no_deviation_from_provided_cell: Optional[bool] = Field(\n        description=\"Found unit cell must match provided.\",\n        flag_type=\"--\",\n        rename_param=\"xgandalf-no-deviation-from-provided-cell\",\n    )\n    xgandalf_min_lattice_vector_length: Optional[PositiveFloat] = Field(\n        # 30,\n        description=\"Minimum possible lattice length.\",\n        flag_type=\"--\",\n        rename_param=\"xgandalf-min-lattice-vector-length\",\n    )\n    xgandalf_max_lattice_vector_length: Optional[PositiveFloat] = Field(\n        # 250,\n        description=\"Minimum possible lattice length.\",\n        flag_type=\"--\",\n        rename_param=\"xgandalf-max-lattice-vector-length\",\n    )\n    xgandalf_max_peaks: Optional[PositiveInt] = Field(\n        # 250,\n        description=\"Maximum number of peaks to use for indexing.\",\n        flag_type=\"--\",\n        rename_param=\"xgandalf-max-peaks\",\n    )\n    
xgandalf_fast_execution: bool = Field(\n        False,\n        description=\"Shortcut to set sampling-pitch=2, and grad-desc-iterations=3.\",\n        flag_type=\"--\",\n        rename_param=\"xgandalf-fast-execution\",\n    )\n    # pinkIndexer parameters\n    # ...\n    # asdf_fast: bool = Field(False, description=\"Enable fast mode for asdf. 3x faster for 7% loss in accuracy.\", flag_type=\"--\", rename_param=\"asdf-fast\")\n    # Integration parameters\n    integration: str = Field(\n        \"rings-nocen\", description=\"Method for integrating reflections.\", flag_type=\"--\"\n    )\n    fix_profile_radius: Optional[float] = Field(\n        description=\"Fix the profile radius (m^{-1})\",\n        flag_type=\"--\",\n        rename_param=\"fix-profile-radius\",\n    )\n    fix_divergence: Optional[float] = Field(\n        0,\n        description=\"Fix the divergence (rad, full angle).\",\n        flag_type=\"--\",\n        rename_param=\"fix-divergence\",\n    )\n    int_radius: str = Field(\n        \"4,5,7\",\n        description=\"Inner, middle, and outer radii for 3-ring integration.\",\n        flag_type=\"--\",\n        rename_param=\"int-radius\",\n    )\n    int_diag: str = Field(\n        \"none\",\n        description=\"Show detailed information on integration when condition is met.\",\n        flag_type=\"--\",\n        rename_param=\"int-diag\",\n    )\n    push_res: str = Field(\n        \"infinity\",\n        description=\"Integrate `x` higher than apparent resolution limit (nm-1).\",\n        flag_type=\"--\",\n        rename_param=\"push-res\",\n    )\n    overpredict: bool = Field(\n        False,\n        description=\"Over-predict reflections. Maybe useful with post-refinement.\",\n        flag_type=\"--\",\n    )\n    cell_parameters_only: bool = Field(\n        False, description=\"Do not predict refletions at all\", flag_type=\"--\"\n    )\n    # Output parameters\n    no_non_hits_in_stream: bool = Field(\n        False,\n        description=\"Exclude non-hits from the stream file.\",\n        flag_type=\"--\",\n        rename_param=\"no-non-hits-in-stream\",\n    )\n    copy_hheader: Optional[str] = Field(\n        description=\"Copy information from header in the image to output stream.\",\n        flag_type=\"--\",\n        rename_param=\"copy-hheader\",\n    )\n    no_peaks_in_stream: bool = Field(\n        False,\n        description=\"Do not record peaks in stream file.\",\n        flag_type=\"--\",\n        rename_param=\"no-peaks-in-stream\",\n    )\n    no_refls_in_stream: bool = Field(\n        False,\n        description=\"Do not record reflections in stream.\",\n        flag_type=\"--\",\n        rename_param=\"no-refls-in-stream\",\n    )\n    serial_offset: Optional[PositiveInt] = Field(\n        description=\"Start numbering at `x` instead of 1.\",\n        flag_type=\"--\",\n        rename_param=\"serial-offset\",\n    )\n    harvest_file: Optional[str] = Field(\n        description=\"Write parameters to file in JSON format.\",\n        flag_type=\"--\",\n        rename_param=\"harvest-file\",\n    )\n\n    @validator(\"in_file\", always=True)\n    def validate_in_file(cls, in_file: str, values: Dict[str, Any]) -> str:\n        if in_file == \"\":\n            filename: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\", \"FindPeaksPyAlgos\", \"out_file\"\n            )\n            if filename is None:\n                exp: str = values[\"lute_config\"].experiment\n                run: int = 
int(values[\"lute_config\"].run)\n                tag: Optional[str] = read_latest_db_entry(\n                    f\"{values['lute_config'].work_dir}\", \"FindPeaksPsocake\", \"tag\"\n                )\n                out_dir: Optional[str] = read_latest_db_entry(\n                    f\"{values['lute_config'].work_dir}\", \"FindPeaksPsocake\", \"outDir\"\n                )\n                if out_dir is not None:\n                    fname: str = f\"{out_dir}/{exp}_{run:04d}\"\n                    if tag is not None:\n                        fname = f\"{fname}_{tag}\"\n                    return f\"{fname}.lst\"\n            else:\n                return filename\n        return in_file\n\n    @validator(\"out_file\", always=True)\n    def validate_out_file(cls, out_file: str, values: Dict[str, Any]) -> str:\n        if out_file == \"\":\n            expmt: str = values[\"lute_config\"].experiment\n            run: int = int(values[\"lute_config\"].run)\n            work_dir: str = values[\"lute_config\"].work_dir\n            fname: str = f\"{expmt}_r{run:04d}.stream\"\n            return f\"{work_dir}/{fname}\"\n        return out_file\n
"},{"location":"source/io/config/#io.config.IndexCrystFELParameters.Config","title":"Config","text":"

Bases: Config

Source code in lute/io/models/sfx_index.py
class Config(ThirdPartyParameters.Config):\n    set_result: bool = True\n    \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n    long_flags_use_eq: bool = True\n    \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n
"},{"location":"source/io/config/#io.config.IndexCrystFELParameters.Config.long_flags_use_eq","title":"long_flags_use_eq: bool = True class-attribute instance-attribute","text":"

Whether long command-line arguments are passed like --long=arg.

"},{"location":"source/io/config/#io.config.IndexCrystFELParameters.Config.set_result","title":"set_result: bool = True class-attribute instance-attribute","text":"

Whether the Executor should mark a specified parameter as a result.

"},{"location":"source/io/config/#io.config.ManipulateHKLParameters","title":"ManipulateHKLParameters","text":"

Bases: ThirdPartyParameters

Parameters for CrystFEL's get_hkl for manipulating lists of reflections.

This Task is predominantly used internally to convert hkl to mtz files. Note that performing multiple manipulations is undefined behaviour. Run the Task with multiple configurations in explicit separate steps. For more information on usage, please refer to the CrystFEL documentation, here: https://www.desy.de/~twhite/crystfel/manual-partialator.html

Source code in lute/io/models/sfx_merge.py
class ManipulateHKLParameters(ThirdPartyParameters):\n    \"\"\"Parameters for CrystFEL's `get_hkl` for manipulating lists of reflections.\n\n    This Task is predominantly used internally to convert `hkl` to `mtz` files.\n    Note that performing multiple manipulations is undefined behaviour. Run\n    the Task with multiple configurations in explicit separate steps. For more\n    information on usage, please refer to the CrystFEL documentation, here:\n    https://www.desy.de/~twhite/crystfel/manual-partialator.html\n    \"\"\"\n\n    class Config(ThirdPartyParameters.Config):\n        long_flags_use_eq: bool = True\n        \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n\n        set_result: bool = True\n        \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n    executable: str = Field(\n        \"/sdf/group/lcls/ds/tools/crystfel/0.10.2/bin/get_hkl\",\n        description=\"CrystFEL's reflection manipulation binary.\",\n        flag_type=\"\",\n    )\n    in_file: str = Field(\n        \"\",\n        description=\"Path to input HKL file.\",\n        flag_type=\"-\",\n        rename_param=\"i\",\n    )\n    out_file: str = Field(\n        \"\",\n        description=\"Path to output file.\",\n        flag_type=\"-\",\n        rename_param=\"o\",\n        is_result=True,\n    )\n    cell_file: str = Field(\n        \"\",\n        description=\"Path to a file containing unit cell information (PDB or CrystFEL format).\",\n        flag_type=\"-\",\n        rename_param=\"p\",\n    )\n    output_format: str = Field(\n        \"mtz\",\n        description=\"Output format. One of mtz, mtz-bij, or xds. Otherwise CrystFEL format.\",\n        flag_type=\"--\",\n        rename_param=\"output-format\",\n    )\n    expand: Optional[str] = Field(\n        description=\"Reflections will be expanded to fill asymmetric unit of specified point group.\",\n        flag_type=\"--\",\n    )\n    # Reducing reflections to higher symmetry\n    twin: Optional[str] = Field(\n        description=\"Reflections equivalent to specified point group will have intensities summed.\",\n        flag_type=\"--\",\n    )\n    no_need_all_parts: Optional[bool] = Field(\n        description=\"Use with --twin to allow reflections missing a 'twin mate' to be written out.\",\n        flag_type=\"--\",\n        rename_param=\"no-need-all-parts\",\n    )\n    # Noise - Add to data\n    noise: Optional[bool] = Field(\n        description=\"Generate 10% uniform noise.\", flag_type=\"--\"\n    )\n    poisson: Optional[bool] = Field(\n        description=\"Generate Poisson noise. Intensities assumed to be A.U.\",\n        flag_type=\"--\",\n    )\n    adu_per_photon: Optional[int] = Field(\n        description=\"Use with --poisson to convert A.U. 
to photons.\",\n        flag_type=\"--\",\n        rename_param=\"adu-per-photon\",\n    )\n    # Remove duplicate reflections\n    trim_centrics: Optional[bool] = Field(\n        description=\"Duplicated reflections (according to symmetry) are removed.\",\n        flag_type=\"--\",\n    )\n    # Restrict to template file\n    template: Optional[str] = Field(\n        description=\"Only reflections which also appear in specified file are written out.\",\n        flag_type=\"--\",\n    )\n    # Multiplicity\n    multiplicity: Optional[bool] = Field(\n        description=\"Reflections are multiplied by their symmetric multiplicites.\",\n        flag_type=\"--\",\n    )\n    # Resolution cutoffs\n    cutoff_angstroms: Optional[Union[str, int, float]] = Field(\n        description=\"Either n, or n1,n2,n3. For n, reflections < n are removed. For n1,n2,n3 anisotropic trunction performed at separate resolution limits for a*, b*, c*.\",\n        flag_type=\"--\",\n        rename_param=\"cutoff-angstroms\",\n    )\n    lowres: Optional[float] = Field(\n        description=\"Remove reflections with d > n\", flag_type=\"--\"\n    )\n    highres: Optional[float] = Field(\n        description=\"Synonym for first form of --cutoff-angstroms\"\n    )\n    reindex: Optional[str] = Field(\n        description=\"Reindex according to specified operator. E.g. k,h,-l.\",\n        flag_type=\"--\",\n    )\n    # Override input symmetry\n    symmetry: Optional[str] = Field(\n        description=\"Point group symmetry to use to override. Almost always OMIT this option.\",\n        flag_type=\"--\",\n    )\n\n    @validator(\"in_file\", always=True)\n    def validate_in_file(cls, in_file: str, values: Dict[str, Any]) -> str:\n        if in_file == \"\":\n            partialator_file: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\", \"MergePartialator\", \"out_file\"\n            )\n            if partialator_file:\n                return partialator_file\n        return in_file\n\n    @validator(\"out_file\", always=True)\n    def validate_out_file(cls, out_file: str, values: Dict[str, Any]) -> str:\n        if out_file == \"\":\n            partialator_file: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\", \"MergePartialator\", \"out_file\"\n            )\n            if partialator_file:\n                mtz_out: str = partialator_file.split(\".\")[0]\n                mtz_out = f\"{mtz_out}.mtz\"\n                return mtz_out\n        return out_file\n\n    @validator(\"cell_file\", always=True)\n    def validate_cell_file(cls, cell_file: str, values: Dict[str, Any]) -> str:\n        if cell_file == \"\":\n            idx_cell_file: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\",\n                \"IndexCrystFEL\",\n                \"cell_file\",\n                valid_only=False,\n            )\n            if idx_cell_file:\n                return idx_cell_file\n        return cell_file\n
"},{"location":"source/io/config/#io.config.ManipulateHKLParameters.Config","title":"Config","text":"

Bases: Config

Source code in lute/io/models/sfx_merge.py
class Config(ThirdPartyParameters.Config):\n    long_flags_use_eq: bool = True\n    \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n\n    set_result: bool = True\n    \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n
"},{"location":"source/io/config/#io.config.ManipulateHKLParameters.Config.long_flags_use_eq","title":"long_flags_use_eq: bool = True class-attribute instance-attribute","text":"

Whether long command-line arguments are passed like --long=arg.

"},{"location":"source/io/config/#io.config.ManipulateHKLParameters.Config.set_result","title":"set_result: bool = True class-attribute instance-attribute","text":"

Whether the Executor should mark a specified parameter as a result.

"},{"location":"source/io/config/#io.config.MergePartialatorParameters","title":"MergePartialatorParameters","text":"

Bases: ThirdPartyParameters

Parameters for CrystFEL's partialator.

There are many parameters, and many combinations. For more information on usage, please refer to the CrystFEL documentation, here: https://www.desy.de/~twhite/crystfel/manual-partialator.html

Source code in lute/io/models/sfx_merge.py
class MergePartialatorParameters(ThirdPartyParameters):\n    \"\"\"Parameters for CrystFEL's `partialator`.\n\n    There are many parameters, and many combinations. For more information on\n    usage, please refer to the CrystFEL documentation, here:\n    https://www.desy.de/~twhite/crystfel/manual-partialator.html\n    \"\"\"\n\n    class Config(ThirdPartyParameters.Config):\n        long_flags_use_eq: bool = True\n        \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n\n        set_result: bool = True\n        \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n    executable: str = Field(\n        \"/sdf/group/lcls/ds/tools/crystfel/0.10.2/bin/partialator\",\n        description=\"CrystFEL's Partialator binary.\",\n        flag_type=\"\",\n    )\n    in_file: Optional[str] = Field(\n        \"\", description=\"Path to input stream.\", flag_type=\"-\", rename_param=\"i\"\n    )\n    out_file: str = Field(\n        \"\",\n        description=\"Path to output file.\",\n        flag_type=\"-\",\n        rename_param=\"o\",\n        is_result=True,\n    )\n    symmetry: str = Field(description=\"Point group symmetry.\", flag_type=\"--\")\n    niter: Optional[int] = Field(\n        description=\"Number of cycles of scaling and post-refinement.\",\n        flag_type=\"-\",\n        rename_param=\"n\",\n    )\n    no_scale: Optional[bool] = Field(\n        description=\"Disable scaling.\", flag_type=\"--\", rename_param=\"no-scale\"\n    )\n    no_Bscale: Optional[bool] = Field(\n        description=\"Disable Debye-Waller part of scaling.\",\n        flag_type=\"--\",\n        rename_param=\"no-Bscale\",\n    )\n    no_pr: Optional[bool] = Field(\n        description=\"Disable orientation model.\", flag_type=\"--\", rename_param=\"no-pr\"\n    )\n    no_deltacchalf: Optional[bool] = Field(\n        description=\"Disable rejection based on deltaCC1/2.\",\n        flag_type=\"--\",\n        rename_param=\"no-deltacchalf\",\n    )\n    model: str = Field(\n        \"unity\",\n        description=\"Partiality model. Options: xsphere, unity, offset, ggpm.\",\n        flag_type=\"--\",\n    )\n    nthreads: int = Field(\n        max(int(os.environ.get(\"SLURM_NPROCS\", len(os.sched_getaffinity(0)))) - 1, 1),\n        description=\"Number of parallel analyses.\",\n        flag_type=\"-\",\n        rename_param=\"j\",\n    )\n    polarisation: Optional[str] = Field(\n        description=\"Specification of incident polarisation. 
Refer to CrystFEL docs for more info.\",\n        flag_type=\"--\",\n    )\n    no_polarisation: Optional[bool] = Field(\n        description=\"Synonym for --polarisation=none\",\n        flag_type=\"--\",\n        rename_param=\"no-polarisation\",\n    )\n    max_adu: Optional[float] = Field(\n        description=\"Maximum intensity of reflection to include.\",\n        flag_type=\"--\",\n        rename_param=\"max-adu\",\n    )\n    min_res: Optional[float] = Field(\n        description=\"Only include crystals diffracting to a minimum resolution.\",\n        flag_type=\"--\",\n        rename_param=\"min-res\",\n    )\n    min_measurements: int = Field(\n        2,\n        description=\"Include a reflection only if it appears a minimum number of times.\",\n        flag_type=\"--\",\n        rename_param=\"min-measurements\",\n    )\n    push_res: Optional[float] = Field(\n        description=\"Merge reflections up to higher than the apparent resolution limit.\",\n        flag_type=\"--\",\n        rename_param=\"push-res\",\n    )\n    start_after: int = Field(\n        0,\n        description=\"Ignore the first n crystals.\",\n        flag_type=\"--\",\n        rename_param=\"start-after\",\n    )\n    stop_after: int = Field(\n        0,\n        description=\"Stop after processing n crystals. 0 means process all.\",\n        flag_type=\"--\",\n        rename_param=\"stop-after\",\n    )\n    no_free: Optional[bool] = Field(\n        description=\"Disable cross-validation. Testing ONLY.\",\n        flag_type=\"--\",\n        rename_param=\"no-free\",\n    )\n    custom_split: Optional[str] = Field(\n        description=\"Read a set of filenames, event and dataset IDs from a filename.\",\n        flag_type=\"--\",\n        rename_param=\"custom-split\",\n    )\n    max_rel_B: float = Field(\n        100,\n        description=\"Reject crystals if |relB| > n sq Angstroms.\",\n        flag_type=\"--\",\n        rename_param=\"max-rel-B\",\n    )\n    output_every_cycle: bool = Field(\n        False,\n        description=\"Write per-crystal params after every refinement cycle.\",\n        flag_type=\"--\",\n        rename_param=\"output-every-cycle\",\n    )\n    no_logs: bool = Field(\n        False,\n        description=\"Do not write logs needed for plots, maps and graphs.\",\n        flag_type=\"--\",\n        rename_param=\"no-logs\",\n    )\n    set_symmetry: Optional[str] = Field(\n        description=\"Set the apparent symmetry of the crystals to a point group.\",\n        flag_type=\"-\",\n        rename_param=\"w\",\n    )\n    operator: Optional[str] = Field(\n        description=\"Specify an ambiguity operator. E.g. k,h,-l.\", flag_type=\"--\"\n    )\n    force_bandwidth: Optional[float] = Field(\n        description=\"Set X-ray bandwidth. As percent, e.g. 0.0013 (0.13%).\",\n        flag_type=\"--\",\n        rename_param=\"force-bandwidth\",\n    )\n    force_radius: Optional[float] = Field(\n        description=\"Set the initial profile radius (nm-1).\",\n        flag_type=\"--\",\n        rename_param=\"force-radius\",\n    )\n    force_lambda: Optional[float] = Field(\n        description=\"Set the wavelength. 
In Angstroms.\",\n        flag_type=\"--\",\n        rename_param=\"force-lambda\",\n    )\n    harvest_file: Optional[str] = Field(\n        description=\"Write parameters to file in JSON format.\",\n        flag_type=\"--\",\n        rename_param=\"harvest-file\",\n    )\n\n    @validator(\"in_file\", always=True)\n    def validate_in_file(cls, in_file: str, values: Dict[str, Any]) -> str:\n        if in_file == \"\":\n            stream_file: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\",\n                \"ConcatenateStreamFiles\",\n                \"out_file\",\n            )\n            if stream_file:\n                return stream_file\n        return in_file\n\n    @validator(\"out_file\", always=True)\n    def validate_out_file(cls, out_file: str, values: Dict[str, Any]) -> str:\n        if out_file == \"\":\n            in_file: str = values[\"in_file\"]\n            if in_file:\n                tag: str = in_file.split(\".\")[0]\n                return f\"{tag}.hkl\"\n            else:\n                return \"partialator.hkl\"\n        return out_file\n
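As a rough illustration of the defaulting behaviour encoded in the validators above, the standalone sketch below mirrors how out_file is derived from in_file when it is left empty. This is not the LUTE API itself, just the naming convention the validator applies; the paths are invented.

# Standalone sketch: mirrors the `out_file` defaulting convention of the
# validator shown above. Not part of LUTE; for illustration only.
def default_partialator_out_file(in_file: str) -> str:
    """Derive a default .hkl output name from the input stream path."""
    if in_file:
        tag = in_file.split(".")[0]  # strip the extension, keep the path stem
        return f"{tag}.hkl"
    return "partialator.hkl"

print(default_partialator_out_file("/some/work_dir/myexp_r0012.stream"))
# -> /some/work_dir/myexp_r0012.hkl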
"},{"location":"source/io/config/#io.config.MergePartialatorParameters.Config","title":"Config","text":"

Bases: Config

Source code in lute/io/models/sfx_merge.py
class Config(ThirdPartyParameters.Config):\n    long_flags_use_eq: bool = True\n    \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n\n    set_result: bool = True\n    \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n
"},{"location":"source/io/config/#io.config.MergePartialatorParameters.Config.long_flags_use_eq","title":"long_flags_use_eq: bool = True class-attribute instance-attribute","text":"

Whether long command-line arguments are passed like --long=arg.

"},{"location":"source/io/config/#io.config.MergePartialatorParameters.Config.set_result","title":"set_result: bool = True class-attribute instance-attribute","text":"

Whether the Executor should mark a specified parameter as a result.

"},{"location":"source/io/config/#io.config.RunSHELXCParameters","title":"RunSHELXCParameters","text":"

Bases: ThirdPartyParameters

Parameters for CCP4's SHELXC program.

SHELXC prepares files for SHELXD and SHELXE.

For more information please refer to the official documentation: https://www.ccp4.ac.uk/html/crank.html

Source code in lute/io/models/sfx_solve.py
class RunSHELXCParameters(ThirdPartyParameters):\n    \"\"\"Parameters for CCP4's SHELXC program.\n\n    SHELXC prepares files for SHELXD and SHELXE.\n\n    For more information please refer to the official documentation:\n    https://www.ccp4.ac.uk/html/crank.html\n    \"\"\"\n\n    executable: str = Field(\n        \"/sdf/group/lcls/ds/tools/ccp4-8.0/bin/shelxc\",\n        description=\"CCP4 SHELXC. Generates input files for SHELXD/SHELXE.\",\n        flag_type=\"\",\n    )\n    placeholder: str = Field(\n        \"xx\", description=\"Placeholder filename stem.\", flag_type=\"\"\n    )\n    in_file: str = Field(\n        \"\",\n        description=\"Input file for SHELXC with reflections AND proper records.\",\n        flag_type=\"\",\n    )\n\n    @validator(\"in_file\", always=True)\n    def validate_in_file(cls, in_file: str, values: Dict[str, Any]) -> str:\n        if in_file == \"\":\n            # get_hkl needed to be run to produce an XDS format file...\n            xds_format_file: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\", \"ManipulateHKL\", \"out_file\"\n            )\n            if xds_format_file:\n                in_file = xds_format_file\n        if in_file[0] != \"<\":\n            # Need to add a redirection for this program\n            # Runs like `shelxc xx <input_file.xds`\n            in_file = f\"<{in_file}\"\n        return in_file\n
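The in_file validator above prepends a "<" so that the binary is invoked as shelxc xx <input_file.xds. The sketch below is a standalone mirror of that redirection convention, not LUTE code; the path is invented.

# Standalone sketch of the stdin-redirection convention applied by the
# `in_file` validator above. Illustration only.
def add_redirection(in_file: str) -> str:
    """Prefix the input file with '<' if the redirection is not already present."""
    if in_file and in_file[0] != "<":
        return f"<{in_file}"
    return in_file

print(add_redirection("/some/work_dir/output.xds"))  # -> </some/work_dir/output.xds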
"},{"location":"source/io/config/#io.config.SubmitSMDParameters","title":"SubmitSMDParameters","text":"

Bases: ThirdPartyParameters

Parameters for running smalldata to produce reduced HDF5 files.

Source code in lute/io/models/smd.py
class SubmitSMDParameters(ThirdPartyParameters):\n    \"\"\"Parameters for running smalldata to produce reduced HDF5 files.\"\"\"\n\n    class Config(ThirdPartyParameters.Config):\n        \"\"\"Identical to super-class Config but includes a result.\"\"\"\n\n        set_result: bool = True\n        \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n        result_from_params: str = \"\"\n        \"\"\"Defines a result from the parameters. Use a validator to do so.\"\"\"\n\n    executable: str = Field(\"mpirun\", description=\"MPI executable.\", flag_type=\"\")\n    np: PositiveInt = Field(\n        max(int(os.environ.get(\"SLURM_NPROCS\", len(os.sched_getaffinity(0)))) - 1, 1),\n        description=\"Number of processes\",\n        flag_type=\"-\",\n    )\n    p_arg1: str = Field(\n        \"python\", description=\"Executable to run with mpi (i.e. python).\", flag_type=\"\"\n    )\n    u: str = Field(\n        \"\", description=\"Python option for unbuffered output.\", flag_type=\"-\"\n    )\n    m: str = Field(\n        \"mpi4py.run\",\n        description=\"Python option to execute a module's contents as __main__ module.\",\n        flag_type=\"-\",\n    )\n    producer: str = Field(\n        \"\", description=\"Path to the SmallData producer Python script.\", flag_type=\"\"\n    )\n    run: str = Field(\n        os.environ.get(\"RUN_NUM\", \"\"), description=\"DAQ Run Number.\", flag_type=\"--\"\n    )\n    experiment: str = Field(\n        os.environ.get(\"EXPERIMENT\", \"\"),\n        description=\"LCLS Experiment Number.\",\n        flag_type=\"--\",\n    )\n    stn: NonNegativeInt = Field(0, description=\"Hutch endstation.\", flag_type=\"--\")\n    nevents: int = Field(\n        int(1e9), description=\"Number of events to process.\", flag_type=\"--\"\n    )\n    directory: Optional[str] = Field(\n        None,\n        description=\"Optional output directory. If None, will be in ${EXP_FOLDER}/hdf5/smalldata.\",\n        flag_type=\"--\",\n    )\n    ## Need mechanism to set result_from_param=True ...\n    gather_interval: PositiveInt = Field(\n        25, description=\"Number of events to collect at a time.\", flag_type=\"--\"\n    )\n    norecorder: bool = Field(\n        False, description=\"Whether to ignore recorder streams.\", flag_type=\"--\"\n    )\n    url: HttpUrl = Field(\n        \"https://pswww.slac.stanford.edu/ws-auth/lgbk\",\n        description=\"Base URL for eLog posting.\",\n        flag_type=\"--\",\n    )\n    epicsAll: bool = Field(\n        False,\n        description=\"Whether to store all EPICS PVs. Use with care.\",\n        flag_type=\"--\",\n    )\n    full: bool = Field(\n        False,\n        description=\"Whether to store all data. Use with EXTRA care.\",\n        flag_type=\"--\",\n    )\n    fullSum: bool = Field(\n        False,\n        description=\"Whether to store sums for all area detector images.\",\n        flag_type=\"--\",\n    )\n    default: bool = Field(\n        False,\n        description=\"Whether to store only the default minimal set of data.\",\n        flag_type=\"--\",\n    )\n    image: bool = Field(\n        False,\n        description=\"Whether to save everything as images. Use with care.\",\n        flag_type=\"--\",\n    )\n    tiff: bool = Field(\n        False,\n        description=\"Whether to save all images as a single TIFF. 
Use with EXTRA care.\",\n        flag_type=\"--\",\n    )\n    centerpix: bool = Field(\n        False,\n        description=\"Whether to mask center pixels for Epix10k2M detectors.\",\n        flag_type=\"--\",\n    )\n    postRuntable: bool = Field(\n        False,\n        description=\"Whether to post run tables. Also used as a trigger for summary jobs.\",\n        flag_type=\"--\",\n    )\n    wait: bool = Field(\n        False, description=\"Whether to wait for a file to appear.\", flag_type=\"--\"\n    )\n    xtcav: bool = Field(\n        False,\n        description=\"Whether to add XTCAV processing to the HDF5 generation.\",\n        flag_type=\"--\",\n    )\n    noarch: bool = Field(\n        False, description=\"Whether to not use archiver data.\", flag_type=\"--\"\n    )\n\n    lute_template_cfg: TemplateConfig = TemplateConfig(template_name=\"\", output_path=\"\")\n\n    @validator(\"producer\", always=True)\n    def validate_producer_path(cls, producer: str) -> str:\n        return producer\n\n    @validator(\"lute_template_cfg\", always=True)\n    def use_producer(\n        cls, lute_template_cfg: TemplateConfig, values: Dict[str, Any]\n    ) -> TemplateConfig:\n        if not lute_template_cfg.output_path:\n            lute_template_cfg.output_path = values[\"producer\"]\n        return lute_template_cfg\n\n    @root_validator(pre=False)\n    def define_result(cls, values: Dict[str, Any]) -> Dict[str, Any]:\n        exp: str = values[\"lute_config\"].experiment\n        hutch: str = exp[:3]\n        run: int = int(values[\"lute_config\"].run)\n        directory: Optional[str] = values[\"directory\"]\n        if directory is None:\n            directory = f\"/sdf/data/lcls/ds/{hutch}/{exp}/hdf5/smalldata\"\n        fname: str = f\"{exp}_Run{run:04d}.h5\"\n\n        cls.Config.result_from_params = f\"{directory}/{fname}\"\n        return values\n
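The define_result validator above assembles the expected HDF5 output path from the experiment name and run number. The standalone sketch below mirrors that naming convention; it is not the LUTE API, and the experiment name and run number are invented for illustration.

from typing import Optional

# Mirror of the result-path convention used by `define_result` above; illustration only.
def smalldata_result_path(experiment: str, run: int, directory: Optional[str] = None) -> str:
    hutch = experiment[:3]  # hutch code is the first three characters of the experiment name
    if directory is None:
        directory = f"/sdf/data/lcls/ds/{hutch}/{experiment}/hdf5/smalldata"
    return f"{directory}/{experiment}_Run{run:04d}.h5"

print(smalldata_result_path("mfxl1234567", 12))
# -> /sdf/data/lcls/ds/mfx/mfxl1234567/hdf5/smalldata/mfxl1234567_Run0012.h5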
"},{"location":"source/io/config/#io.config.SubmitSMDParameters.Config","title":"Config","text":"

Bases: Config

Identical to super-class Config but includes a result.

Source code in lute/io/models/smd.py
class Config(ThirdPartyParameters.Config):\n    \"\"\"Identical to super-class Config but includes a result.\"\"\"\n\n    set_result: bool = True\n    \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n    result_from_params: str = \"\"\n    \"\"\"Defines a result from the parameters. Use a validator to do so.\"\"\"\n
"},{"location":"source/io/config/#io.config.SubmitSMDParameters.Config.result_from_params","title":"result_from_params: str = '' class-attribute instance-attribute","text":"

Defines a result from the parameters. Use a validator to do so.

"},{"location":"source/io/config/#io.config.SubmitSMDParameters.Config.set_result","title":"set_result: bool = True class-attribute instance-attribute","text":"

Whether the Executor should mark a specified parameter as a result.

"},{"location":"source/io/config/#io.config.TaskParameters","title":"TaskParameters","text":"

Bases: BaseSettings

Base class for models of task parameters to be validated.

Parameters are read from a configuration YAML file and validated against subclasses of this type in order to ensure both that all parameters are present and that they are of the correct type.

Note

Pydantic is used for data validation. Pydantic does not perform \"strict\" validation by default. Parameter values may be cast to conform with the model specified by the subclass definition if it is possible to do so. Consider whether this may cause issues (e.g. if a float is cast to an int).

Source code in lute/io/models/base.py
class TaskParameters(BaseSettings):\n    \"\"\"Base class for models of task parameters to be validated.\n\n    Parameters are read from a configuration YAML file and validated against\n    subclasses of this type in order to ensure that both all parameters are\n    present, and that the parameters are of the correct type.\n\n    Note:\n        Pydantic is used for data validation. Pydantic does not perform \"strict\"\n        validation by default. Parameter values may be cast to conform with the\n        model specified by the subclass definition if it is possible to do so.\n        Consider whether this may cause issues (e.g. if a float is cast to an\n        int).\n    \"\"\"\n\n    class Config:\n        \"\"\"Configuration for parameters model.\n\n        The Config class holds Pydantic configuration. A number of LUTE-specific\n        configuration has also been placed here.\n\n        Attributes:\n            env_prefix (str): Pydantic configuration. Will set parameters from\n                environment variables containing this prefix. E.g. a model\n                parameter `input` can be set with an environment variable:\n                `{env_prefix}input`, in LUTE's case `LUTE_input`.\n\n            underscore_attrs_are_private (bool): Pydantic configuration. Whether\n                to hide attributes (parameters) prefixed with an underscore.\n\n            copy_on_model_validation (str): Pydantic configuration. How to copy\n                the input object passed to the class instance for model\n                validation. Set to perform a deep copy.\n\n            allow_inf_nan (bool): Pydantic configuration. Whether to allow\n                infinity or NAN in float fields.\n\n            run_directory (Optional[str]): None. If set, it should be a valid\n                path. The `Task` will be run from this directory. This may be\n                useful for some `Task`s which rely on searching the working\n                directory.\n\n            set_result (bool). False. If True, the model has information about\n                setting the TaskResult object from the parameters it contains.\n                E.g. it has an `output` parameter which is marked as the result.\n                The result can be set with a field value of `is_result=True` on\n                a specific parameter, or using `result_from_params` and a\n                validator.\n\n            result_from_params (Optional[str]): None. Optionally used to define\n                results from information available in the model using a custom\n                validator. E.g. use a `outdir` and `filename` field to set\n                `result_from_params=f\"{outdir}/{filename}`, etc. Only used if\n                `set_result==True`\n\n            result_summary (Optional[str]): None. Defines a result summary that\n                can be known after processing the Pydantic model. Use of summary\n                depends on the Executor running the Task. All summaries are\n                stored in the database, however. Only used if `set_result==True`\n\n            impl_schemas (Optional[str]). Specifies a the schemas the\n                output/results conform to. 
Only used if `set_result==True`.\n        \"\"\"\n\n        env_prefix = \"LUTE_\"\n        underscore_attrs_are_private: bool = True\n        copy_on_model_validation: str = \"deep\"\n        allow_inf_nan: bool = False\n\n        run_directory: Optional[str] = None\n        \"\"\"Set the directory that the Task is run from.\"\"\"\n        set_result: bool = False\n        \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n        result_from_params: Optional[str] = None\n        \"\"\"Defines a result from the parameters. Use a validator to do so.\"\"\"\n        result_summary: Optional[str] = None\n        \"\"\"Format a TaskResult.summary from output.\"\"\"\n        impl_schemas: Optional[str] = None\n        \"\"\"Schema specification for output result. Will be passed to TaskResult.\"\"\"\n\n    lute_config: AnalysisHeader\n
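New first-party Task parameter models follow the pattern shown by TestParameters further down this page: subclass TaskParameters and declare fields with pydantic's Field. Below is a minimal sketch; the class and field names are hypothetical, and the import path is assumed from the "Source code in lute/io/models/base.py" note above.

# Minimal sketch of a hypothetical Task parameter model. The class and field
# names are invented for illustration; the import path is an assumption.
from pydantic import Field

from lute.io.models.base import TaskParameters


class MyAnalysisParameters(TaskParameters):
    """Parameters for a hypothetical `MyAnalysis` Task."""

    input_file: str = Field("", description="Path to an input file.")
    threshold: float = Field(1.0, description="An example cutoff value.")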
"},{"location":"source/io/config/#io.config.TaskParameters.Config","title":"Config","text":"

Configuration for parameters model.

The Config class holds Pydantic configuration. A number of LUTE-specific configuration options have also been placed here.

Attributes:

Name Type Description env_prefix str

Pydantic configuration. Will set parameters from environment variables containing this prefix. E.g. a model parameter input can be set with an environment variable: {env_prefix}input, in LUTE's case LUTE_input.

underscore_attrs_are_private bool

Pydantic configuration. Whether to hide attributes (parameters) prefixed with an underscore.

copy_on_model_validation str

Pydantic configuration. How to copy the input object passed to the class instance for model validation. Set to perform a deep copy.

allow_inf_nan bool

Pydantic configuration. Whether to allow infinity or NAN in float fields.

run_directory Optional[str]

None. If set, it should be a valid path. The Task will be run from this directory. This may be useful for some Tasks which rely on searching the working directory.

result_from_params Optional[str]

None. Optionally used to define results from information available in the model using a custom validator. E.g. use an outdir and a filename field to set result_from_params=f\"{outdir}/{filename}\", etc. Only used if set_result==True

result_summary Optional[str]

None. Defines a result summary that can be known after processing the Pydantic model. Use of summary depends on the Executor running the Task. All summaries are stored in the database, however. Only used if set_result==True

Source code in lute/io/models/base.py
class Config:\n    \"\"\"Configuration for parameters model.\n\n    The Config class holds Pydantic configuration. A number of LUTE-specific\n    configuration has also been placed here.\n\n    Attributes:\n        env_prefix (str): Pydantic configuration. Will set parameters from\n            environment variables containing this prefix. E.g. a model\n            parameter `input` can be set with an environment variable:\n            `{env_prefix}input`, in LUTE's case `LUTE_input`.\n\n        underscore_attrs_are_private (bool): Pydantic configuration. Whether\n            to hide attributes (parameters) prefixed with an underscore.\n\n        copy_on_model_validation (str): Pydantic configuration. How to copy\n            the input object passed to the class instance for model\n            validation. Set to perform a deep copy.\n\n        allow_inf_nan (bool): Pydantic configuration. Whether to allow\n            infinity or NAN in float fields.\n\n        run_directory (Optional[str]): None. If set, it should be a valid\n            path. The `Task` will be run from this directory. This may be\n            useful for some `Task`s which rely on searching the working\n            directory.\n\n        set_result (bool). False. If True, the model has information about\n            setting the TaskResult object from the parameters it contains.\n            E.g. it has an `output` parameter which is marked as the result.\n            The result can be set with a field value of `is_result=True` on\n            a specific parameter, or using `result_from_params` and a\n            validator.\n\n        result_from_params (Optional[str]): None. Optionally used to define\n            results from information available in the model using a custom\n            validator. E.g. use a `outdir` and `filename` field to set\n            `result_from_params=f\"{outdir}/{filename}`, etc. Only used if\n            `set_result==True`\n\n        result_summary (Optional[str]): None. Defines a result summary that\n            can be known after processing the Pydantic model. Use of summary\n            depends on the Executor running the Task. All summaries are\n            stored in the database, however. Only used if `set_result==True`\n\n        impl_schemas (Optional[str]). Specifies a the schemas the\n            output/results conform to. Only used if `set_result==True`.\n    \"\"\"\n\n    env_prefix = \"LUTE_\"\n    underscore_attrs_are_private: bool = True\n    copy_on_model_validation: str = \"deep\"\n    allow_inf_nan: bool = False\n\n    run_directory: Optional[str] = None\n    \"\"\"Set the directory that the Task is run from.\"\"\"\n    set_result: bool = False\n    \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n    result_from_params: Optional[str] = None\n    \"\"\"Defines a result from the parameters. Use a validator to do so.\"\"\"\n    result_summary: Optional[str] = None\n    \"\"\"Format a TaskResult.summary from output.\"\"\"\n    impl_schemas: Optional[str] = None\n    \"\"\"Schema specification for output result. Will be passed to TaskResult.\"\"\"\n
"},{"location":"source/io/config/#io.config.TaskParameters.Config.impl_schemas","title":"impl_schemas: Optional[str] = None class-attribute instance-attribute","text":"

Schema specification for output result. Will be passed to TaskResult.

"},{"location":"source/io/config/#io.config.TaskParameters.Config.result_from_params","title":"result_from_params: Optional[str] = None class-attribute instance-attribute","text":"

Defines a result from the parameters. Use a validator to do so.

"},{"location":"source/io/config/#io.config.TaskParameters.Config.result_summary","title":"result_summary: Optional[str] = None class-attribute instance-attribute","text":"

Format a TaskResult.summary from output.

"},{"location":"source/io/config/#io.config.TaskParameters.Config.run_directory","title":"run_directory: Optional[str] = None class-attribute instance-attribute","text":"

Set the directory that the Task is run from.

"},{"location":"source/io/config/#io.config.TaskParameters.Config.set_result","title":"set_result: bool = False class-attribute instance-attribute","text":"

Whether the Executor should mark a specified parameter as a result.

"},{"location":"source/io/config/#io.config.TemplateConfig","title":"TemplateConfig","text":"

Bases: BaseModel

Parameters used for templating of third party configuration files.

Attributes:

Name Type Description template_name str

The name of the template to use. This template must live in config/templates.

output_path str

The FULL path, including the filename, to write the rendered template to.

Source code in lute/io/models/base.py
class TemplateConfig(BaseModel):\n    \"\"\"Parameters used for templating of third party configuration files.\n\n    Attributes:\n        template_name (str): The name of the template to use. This template must\n            live in `config/templates`.\n\n        output_path (str): The FULL path, including filename to write the\n            rendered template to.\n    \"\"\"\n\n    template_name: str\n    output_path: str\n
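For instance, a managed Task that renders a third party configuration file might carry values like the following; the template name and output path are hypothetical and the import path is assumed from the source location noted above.

# Example values only; the template name and output path are hypothetical.
from lute.io.models.base import TemplateConfig

cfg = TemplateConfig(
    template_name="some_template.json",          # must live in config/templates
    output_path="/some/work_dir/rendered.json",  # FULL path, including the filename
)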
"},{"location":"source/io/config/#io.config.TemplateParameters","title":"TemplateParameters","text":"

Class for representing parameters for third party configuration files.

These parameters can represent arbitrary data types and are used in conjunction with templates for modifying third party configuration files from the single LUTE YAML. Due to the storage of arbitrary data types, and the use of a template file, a single instance of this class can hold anything from a single template variable to an entire configuration file. The data parsing is done by Jinja using the complementary template. All data is stored in the single model variable params.

The pydantic \"dataclass\" is used over the BaseModel/Settings to allow positional argument instantiation of the params Field.

Source code in lute/io/models/base.py
@dataclass\nclass TemplateParameters:\n    \"\"\"Class for representing parameters for third party configuration files.\n\n    These parameters can represent arbitrary data types and are used in\n    conjunction with templates for modifying third party configuration files\n    from the single LUTE YAML. Due to the storage of arbitrary data types, and\n    the use of a template file, a single instance of this class can hold from a\n    single template variable to an entire configuration file. The data parsing\n    is done by jinja using the complementary template.\n    All data is stored in the single model variable `params.`\n\n    The pydantic \"dataclass\" is used over the BaseModel/Settings to allow\n    positional argument instantiation of the `params` Field.\n    \"\"\"\n\n    params: Any\n
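Because all data lives in the single params field and positional instantiation is allowed, a TemplateParameters instance can wrap anything from a scalar to a whole mapping. The values below are illustrative and the import path is assumed from the source location noted above.

# TemplateParameters wraps arbitrary data in its single `params` field and
# supports positional instantiation. Values are illustrative only.
from lute.io.models.base import TemplateParameters

single_value = TemplateParameters(42)
whole_section = TemplateParameters({"detector": "epix10k2M", "mask": [0, 1, 1]})

print(single_value.params)   # 42
print(whole_section.params)  # {'detector': 'epix10k2M', 'mask': [0, 1, 1]}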
"},{"location":"source/io/config/#io.config.TestBinaryErrParameters","title":"TestBinaryErrParameters","text":"

Bases: ThirdPartyParameters

Same as TestBinary, but exits with non-zero code.

Source code in lute/io/models/tests.py
class TestBinaryErrParameters(ThirdPartyParameters):\n    \"\"\"Same as TestBinary, but exits with non-zero code.\"\"\"\n\n    executable: str = Field(\n        \"/sdf/home/d/dorlhiac/test_tasks/test_threads_err\",\n        description=\"Multi-threaded test binary with non-zero exit code.\",\n    )\n    p_arg1: int = Field(1, description=\"Number of threads.\")\n
"},{"location":"source/io/config/#io.config.TestMultiNodeCommunicationParameters","title":"TestMultiNodeCommunicationParameters","text":"

Bases: TaskParameters

Parameters for the test Task TestMultiNodeCommunication.

Test verifies communication across multiple machines.

Source code in lute/io/models/mpi_tests.py
class TestMultiNodeCommunicationParameters(TaskParameters):\n    \"\"\"Parameters for the test Task `TestMultiNodeCommunication`.\n\n    Test verifies communication across multiple machines.\n    \"\"\"\n\n    send_obj: Literal[\"plot\", \"array\"] = Field(\n        \"array\", description=\"Object to send to Executor. `plot` or `array`\"\n    )\n    arr_size: Optional[int] = Field(\n        None, description=\"Size of array to send back to Executor.\"\n    )\n
"},{"location":"source/io/config/#io.config.TestParameters","title":"TestParameters","text":"

Bases: TaskParameters

Parameters for the test Task Test.

Source code in lute/io/models/tests.py
class TestParameters(TaskParameters):\n    \"\"\"Parameters for the test Task `Test`.\"\"\"\n\n    float_var: float = Field(0.01, description=\"A floating point number.\")\n    str_var: str = Field(\"test\", description=\"A string.\")\n\n    class CompoundVar(BaseModel):\n        int_var: int = 1\n        dict_var: Dict[str, str] = {\"a\": \"b\"}\n\n    compound_var: CompoundVar = Field(\n        description=(\n            \"A compound parameter - consists of a `int_var` (int) and `dict_var`\"\n            \" (Dict[str, str]).\"\n        )\n    )\n    throw_error: bool = Field(\n        False, description=\"If `True`, raise an exception to test error handling.\"\n    )\n
"},{"location":"source/io/config/#io.config.ThirdPartyParameters","title":"ThirdPartyParameters","text":"

Bases: TaskParameters

Base class for third party task parameters.

Contains special validators for extra arguments and handling of parameters used for filling in third party configuration files.

Source code in lute/io/models/base.py
class ThirdPartyParameters(TaskParameters):\n    \"\"\"Base class for third party task parameters.\n\n    Contains special validators for extra arguments and handling of parameters\n    used for filling in third party configuration files.\n    \"\"\"\n\n    class Config(TaskParameters.Config):\n        \"\"\"Configuration for parameters model.\n\n        The Config class holds Pydantic configuration and inherited configuration\n        from the base `TaskParameters.Config` class. A number of values are also\n        overridden, and there are some specific configuration options to\n        ThirdPartyParameters. A full list of options (with TaskParameters options\n        repeated) is described below.\n\n        Attributes:\n            env_prefix (str): Pydantic configuration. Will set parameters from\n                environment variables containing this prefix. E.g. a model\n                parameter `input` can be set with an environment variable:\n                `{env_prefix}input`, in LUTE's case `LUTE_input`.\n\n            underscore_attrs_are_private (bool): Pydantic configuration. Whether\n                to hide attributes (parameters) prefixed with an underscore.\n\n            copy_on_model_validation (str): Pydantic configuration. How to copy\n                the input object passed to the class instance for model\n                validation. Set to perform a deep copy.\n\n            allow_inf_nan (bool): Pydantic configuration. Whether to allow\n                infinity or NAN in float fields.\n\n            run_directory (Optional[str]): None. If set, it should be a valid\n                path. The `Task` will be run from this directory. This may be\n                useful for some `Task`s which rely on searching the working\n                directory.\n\n            set_result (bool). True. If True, the model has information about\n                setting the TaskResult object from the parameters it contains.\n                E.g. it has an `output` parameter which is marked as the result.\n                The result can be set with a field value of `is_result=True` on\n                a specific parameter, or using `result_from_params` and a\n                validator.\n\n            result_from_params (Optional[str]): None. Optionally used to define\n                results from information available in the model using a custom\n                validator. E.g. use a `outdir` and `filename` field to set\n                `result_from_params=f\"{outdir}/{filename}`, etc.\n\n            result_summary (Optional[str]): None. Defines a result summary that\n                can be known after processing the Pydantic model. Use of summary\n                depends on the Executor running the Task. All summaries are\n                stored in the database, however.\n\n            impl_schemas (Optional[str]). Specifies a the schemas the\n                output/results conform to. Only used if set_result is True.\n\n            -----------------------\n            ThirdPartyTask-specific:\n\n            extra (str): \"allow\". Pydantic configuration. Allow (or ignore) extra\n                arguments.\n\n            short_flags_use_eq (bool): False. If True, \"short\" command-line args\n                are passed as `-x=arg`. ThirdPartyTask-specific.\n\n            long_flags_use_eq (bool): False. If True, \"long\" command-line args\n                are passed as `--long=arg`. 
ThirdPartyTask-specific.\n        \"\"\"\n\n        extra: str = \"allow\"\n        short_flags_use_eq: bool = False\n        \"\"\"Whether short command-line arguments are passed like `-x=arg`.\"\"\"\n        long_flags_use_eq: bool = False\n        \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n        set_result: bool = True\n        \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n    # lute_template_cfg: TemplateConfig\n\n    @root_validator(pre=False)\n    def extra_fields_to_thirdparty(cls, values: Dict[str, Any]):\n        for key in values:\n            if key not in cls.__fields__:\n                values[key] = TemplateParameters(values[key])\n\n        return values\n
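The short_flags_use_eq and long_flags_use_eq options only control how a flag is joined to its value on the rendered command line. The sketch below is a standalone illustration of the two styles the options describe, not LUTE's actual argument builder.

# Standalone illustration of the flag styles controlled by `short_flags_use_eq`
# and `long_flags_use_eq`. Not LUTE's argument builder.
def render_flag(flag: str, value: str, use_eq: bool) -> str:
    return f"{flag}={value}" if use_eq else f"{flag} {value}"

print(render_flag("-j", "8", use_eq=False))           # -j 8
print(render_flag("-j", "8", use_eq=True))            # -j=8
print(render_flag("--min-res", "2.5", use_eq=False))  # --min-res 2.5
print(render_flag("--min-res", "2.5", use_eq=True))   # --min-res=2.5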
"},{"location":"source/io/config/#io.config.ThirdPartyParameters.Config","title":"Config","text":"

Bases: Config

Configuration for parameters model.

The Config class holds Pydantic configuration and configuration inherited from the base TaskParameters.Config class. A number of values are also overridden, and there are some configuration options specific to ThirdPartyParameters. A full list of options (with the TaskParameters options repeated) is described below.

Attributes:

Name Type Description env_prefix str

Pydantic configuration. Will set parameters from environment variables containing this prefix. E.g. a model parameter input can be set with an environment variable: {env_prefix}input, in LUTE's case LUTE_input.

underscore_attrs_are_private bool

Pydantic configuration. Whether to hide attributes (parameters) prefixed with an underscore.

copy_on_model_validation str

Pydantic configuration. How to copy the input object passed to the class instance for model validation. Set to perform a deep copy.

allow_inf_nan bool

Pydantic configuration. Whether to allow infinity or NAN in float fields.

run_directory Optional[str]

None. If set, it should be a valid path. The Task will be run from this directory. This may be useful for some Tasks which rely on searching the working directory.

result_from_params Optional[str]

None. Optionally used to define results from information available in the model using a custom validator. E.g. use an outdir and a filename field to set result_from_params=f\"{outdir}/{filename}\", etc.

result_summary Optional[str]

None. Defines a result summary that can be known after processing the Pydantic model. Use of summary depends on the Executor running the Task. All summaries are stored in the database, however.

ThirdPartyTask-specific: extra str

\"allow\". Pydantic configuration. Allow (or ignore) extra arguments.

short_flags_use_eq bool

False. If True, \"short\" command-line args are passed as -x=arg. ThirdPartyTask-specific.

long_flags_use_eq bool

False. If True, \"long\" command-line args are passed as --long=arg. ThirdPartyTask-specific.

Source code in lute/io/models/base.py
class Config(TaskParameters.Config):\n    \"\"\"Configuration for parameters model.\n\n    The Config class holds Pydantic configuration and inherited configuration\n    from the base `TaskParameters.Config` class. A number of values are also\n    overridden, and there are some specific configuration options to\n    ThirdPartyParameters. A full list of options (with TaskParameters options\n    repeated) is described below.\n\n    Attributes:\n        env_prefix (str): Pydantic configuration. Will set parameters from\n            environment variables containing this prefix. E.g. a model\n            parameter `input` can be set with an environment variable:\n            `{env_prefix}input`, in LUTE's case `LUTE_input`.\n\n        underscore_attrs_are_private (bool): Pydantic configuration. Whether\n            to hide attributes (parameters) prefixed with an underscore.\n\n        copy_on_model_validation (str): Pydantic configuration. How to copy\n            the input object passed to the class instance for model\n            validation. Set to perform a deep copy.\n\n        allow_inf_nan (bool): Pydantic configuration. Whether to allow\n            infinity or NAN in float fields.\n\n        run_directory (Optional[str]): None. If set, it should be a valid\n            path. The `Task` will be run from this directory. This may be\n            useful for some `Task`s which rely on searching the working\n            directory.\n\n        set_result (bool). True. If True, the model has information about\n            setting the TaskResult object from the parameters it contains.\n            E.g. it has an `output` parameter which is marked as the result.\n            The result can be set with a field value of `is_result=True` on\n            a specific parameter, or using `result_from_params` and a\n            validator.\n\n        result_from_params (Optional[str]): None. Optionally used to define\n            results from information available in the model using a custom\n            validator. E.g. use a `outdir` and `filename` field to set\n            `result_from_params=f\"{outdir}/{filename}`, etc.\n\n        result_summary (Optional[str]): None. Defines a result summary that\n            can be known after processing the Pydantic model. Use of summary\n            depends on the Executor running the Task. All summaries are\n            stored in the database, however.\n\n        impl_schemas (Optional[str]). Specifies a the schemas the\n            output/results conform to. Only used if set_result is True.\n\n        -----------------------\n        ThirdPartyTask-specific:\n\n        extra (str): \"allow\". Pydantic configuration. Allow (or ignore) extra\n            arguments.\n\n        short_flags_use_eq (bool): False. If True, \"short\" command-line args\n            are passed as `-x=arg`. ThirdPartyTask-specific.\n\n        long_flags_use_eq (bool): False. If True, \"long\" command-line args\n            are passed as `--long=arg`. ThirdPartyTask-specific.\n    \"\"\"\n\n    extra: str = \"allow\"\n    short_flags_use_eq: bool = False\n    \"\"\"Whether short command-line arguments are passed like `-x=arg`.\"\"\"\n    long_flags_use_eq: bool = False\n    \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n    set_result: bool = True\n    \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n
"},{"location":"source/io/config/#io.config.ThirdPartyParameters.Config.long_flags_use_eq","title":"long_flags_use_eq: bool = False class-attribute instance-attribute","text":"

Whether long command-line arguments are passed like --long=arg.

"},{"location":"source/io/config/#io.config.ThirdPartyParameters.Config.set_result","title":"set_result: bool = True class-attribute instance-attribute","text":"

Whether the Executor should mark a specified parameter as a result.

"},{"location":"source/io/config/#io.config.ThirdPartyParameters.Config.short_flags_use_eq","title":"short_flags_use_eq: bool = False class-attribute instance-attribute","text":"

Whether short command-line arguments are passed like -x=arg.

"},{"location":"source/io/config/#io.config.parse_config","title":"parse_config(task_name='test', config_path='')","text":"

Parse a configuration file and validate the contents.

Parameters:

Name Type Description Default task_name str

Name of the specific task that will be run.

'test' config_path str

Path to the configuration file.

''

Returns:

Name Type Description params TaskParameters

A TaskParameters object of validated task-specific parameters. Parameters are accessed with \"dot\" notation. E.g. params.param1.

Raises:

Type Description ValidationError

Raised if there are problems with the configuration file. Passed through from Pydantic.

Source code in lute/io/config.py
def parse_config(task_name: str = \"test\", config_path: str = \"\") -> TaskParameters:\n    \"\"\"Parse a configuration file and validate the contents.\n\n    Args:\n        task_name (str): Name of the specific task that will be run.\n\n        config_path (str): Path to the configuration file.\n\n    Returns:\n        params (TaskParameters): A TaskParameters object of validated\n            task-specific parameters. Parameters are accessed with \"dot\"\n            notation. E.g. `params.param1`.\n\n    Raises:\n        ValidationError: Raised if there are problems with the configuration\n            file. Passed through from Pydantic.\n    \"\"\"\n    task_config_name: str = f\"{task_name}Parameters\"\n\n    with open(config_path, \"r\") as f:\n        docs: Iterator[Dict[str, Any]] = yaml.load_all(stream=f, Loader=yaml.FullLoader)\n        header: Dict[str, Any] = next(docs)\n        config: Dict[str, Any] = next(docs)\n    substitute_variables(header, header)\n    substitute_variables(header, config)\n    LUTE_DEBUG_EXIT(\"LUTE_DEBUG_EXIT_AT_YAML\", pprint.pformat(config))\n    lute_config: Dict[str, AnalysisHeader] = {\"lute_config\": AnalysisHeader(**header)}\n    try:\n        task_config: Dict[str, Any] = dict(config[task_name])\n        lute_config.update(task_config)\n    except KeyError as err:\n        warnings.warn(\n            (\n                f\"{task_name} has no parameter definitions in YAML file.\"\n                \" Attempting default parameter initialization.\"\n            )\n        )\n    parsed_parameters: TaskParameters = globals()[task_config_name](**lute_config)\n    return parsed_parameters\n
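A typical call is sketched below. Since parse_config looks up the model named f"{task_name}Parameters", task_name="Test" resolves to the TestParameters model shown earlier on this page. The YAML path is hypothetical, and its Test section must supply any required fields.

# Usage sketch; the YAML path is hypothetical.
from lute.io.config import parse_config

params = parse_config(task_name="Test", config_path="/some/work_dir/config.yaml")
print(params.float_var)             # validated Task parameter from the YAML
print(params.lute_config.work_dir)  # shared header values are attached as `lute_config`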
"},{"location":"source/io/config/#io.config.substitute_variables","title":"substitute_variables(header, config, curr_key=None)","text":"

Performs variable substitutions on a dictionary read from config YAML file.

Can be used to define input parameters in terms of other input parameters. This is similar to functionality employed by validators for parameters in the specific Task models, but is intended to be more accessible to users. Variable substitutions are defined using a minimal syntax from Jinja: {{ experiment }} defines a substitution of the variable experiment. The characters {{ }} can be escaped if the literal symbols are needed in place.

For example, a path to a file can be defined in terms of experiment and run values in the config file:
MyTask:
  experiment: myexp
  run: 2
  special_file: /path/to/{{ experiment }}/{{ run }}/file.inp

Acceptable variables for substitutions are values defined elsewhere in the YAML file. Environment variables can also be used if prefaced with a $ character. E.g. to get the experiment from an environment variable:
MyTask:
  run: 2
  special_file: /path/to/{{ $EXPERIMENT }}/{{ run }}/file.inp

Parameters:

Name Type Description Default config Dict[str, Any]

A dictionary of parsed configuration.

required curr_key Optional[str]

Used to keep track of recursion level when scanning through iterable items in the config dictionary.

None

Returns:

Name Type Description subbed_config Dict[str, Any]

The config dictionary after substitutions have been made. May be identical to the input if no substitutions are needed.

Source code in lute/io/config.py
def substitute_variables(\n    header: Dict[str, Any], config: Dict[str, Any], curr_key: Optional[str] = None\n) -> None:\n    \"\"\"Performs variable substitutions on a dictionary read from config YAML file.\n\n    Can be used to define input parameters in terms of other input parameters.\n    This is similar to functionality employed by validators for parameters in\n    the specific Task models, but is intended to be more accessible to users.\n    Variable substitutions are defined using a minimal syntax from Jinja:\n                               {{ experiment }}\n    defines a substitution of the variable `experiment`. The characters `{{ }}`\n    can be escaped if the literal symbols are needed in place.\n\n    For example, a path to a file can be defined in terms of experiment and run\n    values in the config file:\n        MyTask:\n          experiment: myexp\n          run: 2\n          special_file: /path/to/{{ experiment }}/{{ run }}/file.inp\n\n    Acceptable variables for substitutions are values defined elsewhere in the\n    YAML file. Environment variables can also be used if prefaced with a `$`\n    character. E.g. to get the experiment from an environment variable:\n        MyTask:\n          run: 2\n          special_file: /path/to/{{ $EXPERIMENT }}/{{ run }}/file.inp\n\n    Args:\n        config (Dict[str, Any]):  A dictionary of parsed configuration.\n\n        curr_key (Optional[str]): Used to keep track of recursion level when scanning\n            through iterable items in the config dictionary.\n\n    Returns:\n        subbed_config (Dict[str, Any]): The config dictionary after substitutions\n            have been made. May be identical to the input if no substitutions are\n            needed.\n    \"\"\"\n    _sub_pattern = r\"\\{\\{[^}{]*\\}\\}\"\n    iterable: Dict[str, Any] = config\n    if curr_key is not None:\n        # Need to handle nested levels by interpreting curr_key\n        keys_by_level: List[str] = curr_key.split(\".\")\n        for key in keys_by_level:\n            iterable = iterable[key]\n    else:\n        ...\n        # iterable = config\n    for param, value in iterable.items():\n        if isinstance(value, dict):\n            new_key: str\n            if curr_key is None:\n                new_key = param\n            else:\n                new_key = f\"{curr_key}.{param}\"\n            substitute_variables(header, config, curr_key=new_key)\n        elif isinstance(value, list):\n            ...\n        # Scalars str - we skip numeric types\n        elif isinstance(value, str):\n            matches: List[str] = re.findall(_sub_pattern, value)\n            for m in matches:\n                key_to_sub_maybe_with_fmt: List[str] = m[2:-2].strip().split(\":\")\n                key_to_sub: str = key_to_sub_maybe_with_fmt[0]\n                fmt: Optional[str] = None\n                if len(key_to_sub_maybe_with_fmt) == 2:\n                    fmt = key_to_sub_maybe_with_fmt[1]\n                sub: Any\n                if key_to_sub[0] == \"$\":\n                    sub = os.getenv(key_to_sub[1:], None)\n                    if sub is None:\n                        print(\n                            f\"Environment variable {key_to_sub[1:]} not found! 
Cannot substitute in YAML config!\",\n                            flush=True,\n                        )\n                        continue\n                    # substitutions from env vars will be strings, so convert back\n                    # to numeric in order to perform formatting later on (e.g. {var:04d})\n                    sub = _check_str_numeric(sub)\n                else:\n                    try:\n                        sub = config\n                        for key in key_to_sub.split(\".\"):\n                            sub = sub[key]\n                    except KeyError:\n                        sub = header[key_to_sub]\n                pattern: str = (\n                    m.replace(\"{{\", r\"\\{\\{\").replace(\"}}\", r\"\\}\\}\").replace(\"$\", r\"\\$\")\n                )\n                if fmt is not None:\n                    sub = f\"{sub:{fmt}}\"\n                else:\n                    sub = f\"{sub}\"\n                iterable[param] = re.sub(pattern, sub, iterable[param])\n            # Reconvert back to numeric values if needed...\n            iterable[param] = _check_str_numeric(iterable[param])\n
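A small usage sketch, with illustrative values: substitutions are applied in place to the config dictionary, falling back to the header when a key is not found in the config itself.

# Usage sketch; values are illustrative only. Substitutions happen in place.
from lute.io.config import substitute_variables

header = {"experiment": "myexp123", "run": 12}
config = {
    "MyTask": {
        "special_file": "/path/to/{{ experiment }}/{{ run }}/file.inp",
    }
}
substitute_variables(header, config)
print(config["MyTask"]["special_file"])
# -> /path/to/myexp123/12/file.inp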
"},{"location":"source/io/db/","title":"db","text":"

Tools for working with the LUTE parameter and configuration database.

The current implementation relies on a sqlite backend database. In the future this may change - therefore relatively few high-level API function calls are intended to be public. These abstract away the details of the database interface and work exclusively on LUTE objects.

Functions:

Name Description record_analysis_db(cfg: DescribedAnalysis) -> None

Writes the configuration to the backend database.

read_latest_db_entry(db_dir: str, task_name: str, param: str) -> Any

Retrieve the most recent entry from a database for a specific Task.

Raises:

Type Description DatabaseError

Generic exception raised for LUTE database errors.

"},{"location":"source/io/db/#io.db.DatabaseError","title":"DatabaseError","text":"

Bases: Exception

General LUTE database error.

Source code in lute/io/db.py
class DatabaseError(Exception):\n    \"\"\"General LUTE database error.\"\"\"\n\n    ...\n
"},{"location":"source/io/db/#io.db.read_latest_db_entry","title":"read_latest_db_entry(db_dir, task_name, param, valid_only=True)","text":"

Read most recent value entered into the database for a Task parameter.

(Will be updated for schema compliance as well as Task name.)

Parameters:

Name Type Description Default db_dir str

Database location.

required task_name str

The name of the Task to check the database for.

required param str

The parameter name for the Task that we want to retrieve.

required valid_only bool

Whether to consider only valid results or not. E.g. An input file may be useful even if the Task result is invalid (Failed). Default = True.

True

Returns:

Name Type Description val Any

The most recently entered value for param of task_name that can be found in the database. Returns None if nothing found.

Source code in lute/io/db.py
def read_latest_db_entry(\n    db_dir: str, task_name: str, param: str, valid_only: bool = True\n) -> Optional[Any]:\n    \"\"\"Read most recent value entered into the database for a Task parameter.\n\n    (Will be updated for schema compliance as well as Task name.)\n\n    Args:\n        db_dir (str): Database location.\n\n        task_name (str): The name of the Task to check the database for.\n\n        param (str): The parameter name for the Task that we want to retrieve.\n\n        valid_only (bool): Whether to consider only valid results or not. E.g.\n            An input file may be useful even if the Task result is invalid\n            (Failed). Default = True.\n\n    Returns:\n        val (Any): The most recently entered value for `param` of `task_name`\n            that can be found in the database. Returns None if nothing found.\n    \"\"\"\n    import sqlite3\n    from ._sqlite import _select_from_db\n\n    con: sqlite3.Connection = sqlite3.Connection(f\"{db_dir}/lute.db\")\n    with con:\n        try:\n            cond: Dict[str, str] = {}\n            if valid_only:\n                cond = {\"valid_flag\": \"1\"}\n            entry: Any = _select_from_db(con, task_name, param, cond)\n        except sqlite3.OperationalError as err:\n            logger.debug(f\"Cannot retrieve value {param} due to: {err}\")\n            entry = None\n    return entry\n
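Usage mirrors the parameter validators shown earlier, e.g. recovering the most recent stream file written by the ConcatenateStreamFiles managed Task. The work directory below is hypothetical.

# Usage sketch; the work directory is hypothetical. Returns None if the
# database has no matching entry.
from lute.io.db import read_latest_db_entry

stream_file = read_latest_db_entry(
    db_dir="/some/work_dir",             # directory containing lute.db
    task_name="ConcatenateStreamFiles",  # managed Task whose output we want
    param="out_file",
)
if stream_file is not None:
    print(f"Most recent stream file: {stream_file}")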
"},{"location":"source/io/db/#io.db.record_analysis_db","title":"record_analysis_db(cfg)","text":"

Write a DescribedAnalysis object to the database.

The DescribedAnalysis object is maintained by the Executor and contains all information necessary to fully describe a single Task execution. The contained fields are split across multiple tables within the database as some of the information can be shared across multiple Tasks. Refer to docs/design/database.md for more information on the database specification.

Source code in lute/io/db.py
def record_analysis_db(cfg: DescribedAnalysis) -> None:\n    \"\"\"Write an DescribedAnalysis object to the database.\n\n    The DescribedAnalysis object is maintained by the Executor and contains all\n    information necessary to fully describe a single `Task` execution. The\n    contained fields are split across multiple tables within the database as\n    some of the information can be shared across multiple Tasks. Refer to\n    `docs/design/database.md` for more information on the database specification.\n    \"\"\"\n    import sqlite3\n    from ._sqlite import (\n        _make_shared_table,\n        _make_task_table,\n        _add_row_no_duplicate,\n        _add_task_entry,\n    )\n\n    try:\n        work_dir: str = cfg.task_parameters.lute_config.work_dir\n    except AttributeError:\n        logger.info(\n            (\n                \"Unable to access TaskParameters object. Likely wasn't created. \"\n                \"Cannot store result.\"\n            )\n        )\n        return\n    del cfg.task_parameters.lute_config.work_dir\n\n    exec_entry, exec_columns = _cfg_to_exec_entry_cols(cfg)\n    task_name: str = cfg.task_result.task_name\n    # All `Task`s have an AnalysisHeader, but this info can be shared so is\n    # split into a different table\n    (\n        task_entry,  # Dict[str, Any]\n        task_columns,  # Dict[str, str]\n        gen_entry,  # Dict[str, Any]\n        gen_columns,  # Dict[str, str]\n    ) = _params_to_entry_cols(cfg.task_parameters)\n    x, y = _result_to_entry_cols(cfg.task_result)\n    task_entry.update(x)\n    task_columns.update(y)\n\n    con: sqlite3.Connection = sqlite3.Connection(f\"{work_dir}/lute.db\")\n    with con:\n        # --- Table Creation ---#\n        if not _make_shared_table(con, \"gen_cfg\", gen_columns):\n            raise DatabaseError(\"Could not make general configuration table!\")\n        if not _make_shared_table(con, \"exec_cfg\", exec_columns):\n            raise DatabaseError(\"Could not make Executor configuration table!\")\n        if not _make_task_table(con, task_name, task_columns):\n            raise DatabaseError(f\"Could not make Task table for: {task_name}!\")\n\n        # --- Row Addition ---#\n        gen_id: int = _add_row_no_duplicate(con, \"gen_cfg\", gen_entry)\n        exec_id: int = _add_row_no_duplicate(con, \"exec_cfg\", exec_entry)\n\n        full_task_entry: Dict[str, Any] = {\n            \"gen_cfg_id\": gen_id,\n            \"exec_cfg_id\": exec_id,\n        }\n        full_task_entry.update(task_entry)\n        # Prepare flag to indicate whether the task entry is valid or not\n        # By default we say it is assuming proper completion\n        valid_flag: int = (\n            1 if cfg.task_result.task_status == TaskStatus.COMPLETED else 0\n        )\n        full_task_entry.update({\"valid_flag\": valid_flag})\n\n        _add_task_entry(con, task_name, full_task_entry)\n
"},{"location":"source/io/elog/","title":"elog","text":"

Provides utilities for communicating with the LCLS eLog.

Makes use of various eLog API endpoints to retrieve information or post results.

Functions:

Name Description get_elog_opr_auth(exp: str)

Return an authorization object to interact with the eLog API as an opr account for the hutch where exp was conducted.

get_elog_kerberos_auth()

Return the authorization headers for the user account submitting the job.

elog_http_request(exp: str, endpoint: str, request_type: str, **params)

Make an HTTP request to an eLog API endpoint.

format_file_for_post(in_file: Union[str, tuple, list])

Prepare files according to the specification needed to add them as attachments to eLog posts.

post_elog_message(exp: str, msg: str, tag: Optional[str], title: Optional[str], in_files: List[Union[str, tuple, list]], auth: Optional[Union[HTTPBasicAuth, Dict]] = None)

Post a message to the eLog.

post_elog_run_status(data: Dict[str, Union[str, int, float]], update_url: Optional[str] = None)

Post a run status to the summary section on the Workflows>Control tab.

post_elog_run_table(exp: str, run: int, data: Dict[str, Any], auth: Optional[Union[HTTPBasicAuth, Dict]] = None)

Update the run table in the eLog.

get_elog_runs_by_tag(exp: str, tag: str, auth: Optional[Union[HTTPBasicAuth, Dict]] = None)

Return a list of runs with a specific tag.

get_elog_params_by_run(exp: str, params: List[str], runs: Optional[List[int]])

Retrieve the requested parameters by run. If no run is provided, retrieve the requested parameters for all runs.

"},{"location":"source/io/elog/#io.elog.elog_http_request","title":"elog_http_request(exp, endpoint, request_type, **params)","text":"

Make an HTTP request to the eLog.

This method will determine the proper authorization method and update the passed parameters appropriately. Functions implementing specific endpoint functionality and calling this function should only pass the necessary endpoint-specific parameters and not include the authorization objects.

Parameters:

Name Type Description Default exp str

Experiment.

required endpoint str

eLog API endpoint.

required request_type str

Type of request to make. Recognized options: POST or GET.

required **params Dict

Endpoint parameters to pass with the HTTP request! Differs depending on the API endpoint. Do not include auth objects.

{}

Returns:

Name Type Description status_code int

Response status code. Can be checked for errors.

msg str

An error message, or a message saying SUCCESS.

value Optional[Any]

For GET requests ONLY, return the requested information.

Source code in lute/io/elog.py
def elog_http_request(\n    exp: str, endpoint: str, request_type: str, **params\n) -> Tuple[int, str, Optional[Any]]:\n    \"\"\"Make an HTTP request to the eLog.\n\n    This method will determine the proper authorization method and update the\n    passed parameters appropriately. Functions implementing specific endpoint\n    functionality and calling this function should only pass the necessary\n    endpoint-specific parameters and not include the authorization objects.\n\n    Args:\n        exp (str): Experiment.\n\n        endpoint (str): eLog API endpoint.\n\n        request_type (str): Type of request to make. Recognized options: POST or\n            GET.\n\n        **params (Dict): Endpoint parameters to pass with the HTTP request!\n            Differs depending on the API endpoint. Do not include auth objects.\n\n    Returns:\n        status_code (int): Response status code. Can be checked for errors.\n\n        msg (str): An error message, or a message saying SUCCESS.\n\n        value (Optional[Any]): For GET requests ONLY, return the requested\n            information.\n    \"\"\"\n    auth: Union[HTTPBasicAuth, Dict[str, str]] = get_elog_auth(exp)\n    base_url: str\n    if isinstance(auth, HTTPBasicAuth):\n        params.update({\"auth\": auth})\n        base_url = \"https://pswww.slac.stanford.edu/ws-auth/lgbk/lgbk\"\n    elif isinstance(auth, dict):\n        params.update({\"headers\": auth})\n        base_url = \"https://pswww.slac.stanford.edu/ws-kerb/lgbk/lgbk\"\n\n    url: str = f\"{base_url}/{endpoint}\"\n\n    resp: requests.models.Response\n    if request_type.upper() == \"POST\":\n        resp = requests.post(url, **params)\n    elif request_type.upper() == \"GET\":\n        resp = requests.get(url, **params)\n    else:\n        return (-1, \"Invalid request type!\", None)\n\n    status_code: int = resp.status_code\n    msg: str = \"SUCCESS\"\n\n    if resp.json()[\"success\"] and request_type.upper() == \"GET\":\n        return (status_code, msg, resp.json()[\"value\"])\n\n    if status_code >= 300:\n        msg = f\"Error when posting to eLog: Response {status_code}\"\n\n    if not resp.json()[\"success\"]:\n        err_msg = resp.json()[\"error_msg\"]\n        msg += f\"\\nInclude message: {err_msg}\"\n    return (resp.status_code, msg, None)\n
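
A minimal usage sketch of a GET request; the experiment name and tag below are placeholders, and most code should prefer the higher-level wrappers (e.g. get_elog_runs_by_tag) over calling this function directly:

from lute.io.elog import elog_http_request\n\n# \"mfxl1234567\" and the DARK tag are hypothetical examples.\nstatus_code, msg, value = elog_http_request(\n    exp=\"mfxl1234567\",\n    endpoint=\"mfxl1234567/ws/get_runs_with_tag?tag=DARK\",\n    request_type=\"GET\",\n)\nif status_code >= 300:\n    print(msg)\nelse:\n    print(value)\n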
"},{"location":"source/io/elog/#io.elog.format_file_for_post","title":"format_file_for_post(in_file)","text":"

Format a file for attachment to an eLog post.

The eLog API expects a specifically formatted tuple when adding file attachments. This function prepares the tuple to specification given a number of different input types.

Parameters:

Name Type Description Default in_file str | tuple | list

File to include as an attachment in an eLog post.

required Source code in lute/io/elog.py
def format_file_for_post(\n    in_file: Union[str, tuple, list]\n) -> Tuple[str, Tuple[str, BufferedReader], Any]:\n    \"\"\"Format a file for attachment to an eLog post.\n\n    The eLog API expects a specifically formatted tuple when adding file\n    attachments. This function prepares the tuple to specification given a\n    number of different input types.\n\n    Args:\n        in_file (str | tuple | list): File to include as an attachment in an\n            eLog post.\n    \"\"\"\n    description: str\n    fptr: BufferedReader\n    ftype: Optional[str]\n    if isinstance(in_file, str):\n        description = os.path.basename(in_file)\n        fptr = open(in_file, \"rb\")\n        ftype = mimetypes.guess_type(in_file)[0]\n    elif isinstance(in_file, tuple) or isinstance(in_file, list):\n        description = in_file[1]\n        fptr = open(in_file[0], \"rb\")\n        ftype = mimetypes.guess_type(in_file[0])[0]\n    else:\n        raise ElogFileFormatError(f\"Unrecognized format: {in_file}\")\n\n    out_file: Tuple[str, Tuple[str, BufferedReader], Any] = (\n        \"files\",\n        (description, fptr),\n        ftype,\n    )\n    return out_file\n
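
A brief sketch of the accepted input forms; the file paths and description are placeholders:

from lute.io.elog import format_file_for_post\n\n# A bare path: the file name is used as the description.\nattachment1 = format_file_for_post(in_file=\"/path/to/plot.png\")\n# A (path, description) pair provides an explicit description.\nattachment2 = format_file_for_post(in_file=(\"/path/to/stats.txt\", \"Summary statistics\"))\n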
"},{"location":"source/io/elog/#io.elog.get_elog_active_expmt","title":"get_elog_active_expmt(hutch, *, endstation=0)","text":"

Get the current active experiment for a hutch.

This function is one of two which manage their HTTP requests independently. This is because it does not require an authorization object, and its result is needed for the generic elog_http_request function to work properly.

Parameters:

Name Type Description Default hutch str

The hutch to get the active experiment for.

required endstation int

The hutch endstation to get the experiment for. This should generally be 0.

0 Source code in lute/io/elog.py
def get_elog_active_expmt(hutch: str, *, endstation: int = 0) -> str:\n    \"\"\"Get the current active experiment for a hutch.\n\n    This function is one of two functions to manage the HTTP request independently.\n    This is because it does not require an authorization object, and its result\n    is needed for the generic function `elog_http_request` to work properly.\n\n    Args:\n        hutch (str): The hutch to get the active experiment for.\n\n        endstation (int): The hutch endstation to get the experiment for. This\n            should generally be 0.\n    \"\"\"\n\n    base_url: str = \"https://pswww.slac.stanford.edu/ws/lgbk/lgbk\"\n    endpoint: str = \"ws/activeexperiment_for_instrument_station\"\n    url: str = f\"{base_url}/{endpoint}\"\n    params: Dict[str, str] = {\"instrument_name\": hutch, \"station\": f\"{endstation}\"}\n    resp: requests.models.Response = requests.get(url, params)\n    if resp.status_code > 300:\n        raise RuntimeError(\n            f\"Error getting current experiment!\\n\\t\\tIncorrect hutch: '{hutch}'?\"\n        )\n    if resp.json()[\"success\"]:\n        return resp.json()[\"value\"][\"name\"]\n    else:\n        msg: str = resp.json()[\"error_msg\"]\n        raise RuntimeError(f\"Error getting current experiment! Err: {msg}\")\n
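
A short usage sketch; the hutch name is an example:

from lute.io.elog import get_elog_active_expmt\n\n# Retrieve the experiment currently marked active for the MFX hutch.\ncurrent_exp: str = get_elog_active_expmt(hutch=\"mfx\", endstation=0)\nprint(current_exp)\n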
"},{"location":"source/io/elog/#io.elog.get_elog_auth","title":"get_elog_auth(exp)","text":"

Determine the appropriate auth method depending on experiment state.

Returns:

Name Type Description auth HTTPBasicAuth | Dict[str, str]

Depending on whether an experiment is active/live, returns authorization for the hutch operator account or the current user submitting a job.

Source code in lute/io/elog.py
def get_elog_auth(exp: str) -> Union[HTTPBasicAuth, Dict[str, str]]:\n    \"\"\"Determine the appropriate auth method depending on experiment state.\n\n    Returns:\n        auth (HTTPBasicAuth | Dict[str, str]): Depending on whether an experiment\n            is active/live, returns authorization for the hutch operator account\n            or the current user submitting a job.\n    \"\"\"\n    hutch: str = exp[:3]\n    if exp.lower() == get_elog_active_expmt(hutch=hutch).lower():\n        return get_elog_opr_auth(exp)\n    else:\n        return get_elog_kerberos_auth()\n
"},{"location":"source/io/elog/#io.elog.get_elog_kerberos_auth","title":"get_elog_kerberos_auth()","text":"

Returns Kerberos authorization key.

This function returns authorization for the USER account submitting jobs. It assumes that kinit has been run.

Returns:

Name Type Description auth Dict[str, str]

Dictionary containing Kerberos authorization key.

Source code in lute/io/elog.py
def get_elog_kerberos_auth() -> Dict[str, str]:\n    \"\"\"Returns Kerberos authorization key.\n\n    This functions returns authorization for the USER account submitting jobs.\n    It assumes that `kinit` has been run.\n\n    Returns:\n        auth (Dict[str, str]): Dictionary containing Kerberos authorization key.\n    \"\"\"\n    from krtc import KerberosTicket\n\n    return KerberosTicket(\"HTTP@pswww.slac.stanford.edu\").getAuthHeaders()\n
"},{"location":"source/io/elog/#io.elog.get_elog_opr_auth","title":"get_elog_opr_auth(exp)","text":"

Produce authentication for the \"opr\" user associated with an experiment.

This method uses basic authentication using username and password.

Parameters:

Name Type Description Default exp str

Name of the experiment to produce authentication for.

required

Returns:

Name Type Description auth HTTPBasicAuth

HTTPBasicAuth for an active experiment based on username and password for the associated operator account.

Source code in lute/io/elog.py
def get_elog_opr_auth(exp: str) -> HTTPBasicAuth:\n    \"\"\"Produce authentication for the \"opr\" user associated to an experiment.\n\n    This method uses basic authentication using username and password.\n\n    Args:\n        exp (str): Name of the experiment to produce authentication for.\n\n    Returns:\n        auth (HTTPBasicAuth): HTTPBasicAuth for an active experiment based on\n            username and password for the associated operator account.\n    \"\"\"\n    opr: str = f\"{exp[:3]}opr\"\n    with open(\"/sdf/group/lcls/ds/tools/forElogPost.txt\", \"r\") as f:\n        pw: str = f.readline()[:-1]\n    return HTTPBasicAuth(opr, pw)\n
"},{"location":"source/io/elog/#io.elog.get_elog_params_by_run","title":"get_elog_params_by_run(exp, params, runs=None)","text":"

Retrieve requested parameters by run or for all runs.

Parameters:

Name Type Description Default exp str

Experiment to retrieve parameters for.

required params List[str]

A list of parameters to retrieve. These can be any parameter recorded in the eLog (PVs, parameters posted by other Tasks, etc.)

required Source code in lute/io/elog.py
def get_elog_params_by_run(\n    exp: str, params: List[str], runs: Optional[List[int]] = None\n) -> Dict[str, str]:\n    \"\"\"Retrieve requested parameters by run or for all runs.\n\n    Args:\n        exp (str): Experiment to retrieve parameters for.\n\n        params (List[str]): A list of parameters to retrieve. These can be any\n            parameter recorded in the eLog (PVs, parameters posted by other\n            Tasks, etc.)\n    \"\"\"\n    ...\n
"},{"location":"source/io/elog/#io.elog.get_elog_runs_by_tag","title":"get_elog_runs_by_tag(exp, tag, auth=None)","text":"

Retrieve run numbers with a specified tag.

Parameters:

Name Type Description Default exp str

Experiment name.

required tag str

The tag to retrieve runs for.

required Source code in lute/io/elog.py
def get_elog_runs_by_tag(\n    exp: str, tag: str, auth: Optional[Union[HTTPBasicAuth, Dict]] = None\n) -> List[int]:\n    \"\"\"Retrieve run numbers with a specified tag.\n\n    Args:\n        exp (str): Experiment name.\n\n        tag (str): The tag to retrieve runs for.\n    \"\"\"\n    endpoint: str = f\"{exp}/ws/get_runs_with_tag?tag={tag}\"\n    params: Dict[str, Any] = {}\n\n    status_code, resp_msg, tagged_runs = elog_http_request(\n        exp=exp, endpoint=endpoint, request_type=\"GET\", **params\n    )\n\n    if not tagged_runs:\n        tagged_runs = []\n\n    return tagged_runs\n
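
A short usage sketch; the experiment name and tag are placeholders:

from typing import List\n\nfrom lute.io.elog import get_elog_runs_by_tag\n\ndark_runs: List[int] = get_elog_runs_by_tag(exp=\"mfxl1234567\", tag=\"DARK\")\nprint(f\"Runs tagged DARK: {dark_runs}\")\n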
"},{"location":"source/io/elog/#io.elog.get_elog_workflows","title":"get_elog_workflows(exp)","text":"

Get the current workflow definitions for an experiment.

Returns:

Name Type Description defns Dict[str, str]

A dictionary of workflow definitions.

Source code in lute/io/elog.py
def get_elog_workflows(exp: str) -> Dict[str, str]:\n    \"\"\"Get the current workflow definitions for an experiment.\n\n    Returns:\n        defns (Dict[str, str]): A dictionary of workflow definitions.\n    \"\"\"\n    raise NotImplementedError\n
"},{"location":"source/io/elog/#io.elog.post_elog_message","title":"post_elog_message(exp, msg, *, tag, title, in_files=[])","text":"

Post a new message to the eLog. Inspired by the elog package.

Parameters:

Name Type Description Default exp str

Experiment name.

required msg str

BODY of the eLog post.

required tag str | None

Optional \"tag\" to associate with the eLog post.

required title str | None

Optional title to include in the eLog post.

required in_files List[str | tuple | list]

Files to include as attachments in the eLog post.

[]

Returns:

Name Type Description err_msg str | None

If successful, nothing is returned; otherwise, an error message is returned.

Source code in lute/io/elog.py
def post_elog_message(\n    exp: str,\n    msg: str,\n    *,\n    tag: Optional[str],\n    title: Optional[str],\n    in_files: List[Union[str, tuple, list]] = [],\n) -> Optional[str]:\n    \"\"\"Post a new message to the eLog. Inspired by the `elog` package.\n\n    Args:\n        exp (str): Experiment name.\n\n        msg (str): BODY of the eLog post.\n\n        tag (str | None): Optional \"tag\" to associate with the eLog post.\n\n        title (str | None): Optional title to include in the eLog post.\n\n        in_files (List[str | tuple | list]): Files to include as attachments in\n            the eLog post.\n\n    Returns:\n        err_msg (str | None): If successful, nothing is returned, otherwise,\n            return an error message.\n    \"\"\"\n    # MOSTLY CORRECT\n    out_files: list = []\n    for f in in_files:\n        try:\n            out_files.append(format_file_for_post(in_file=f))\n        except ElogFileFormatError as err:\n            logger.debug(f\"ElogFileFormatError: {err}\")\n    post: Dict[str, str] = {}\n    post[\"log_text\"] = msg\n    if tag:\n        post[\"log_tags\"] = tag\n    if title:\n        post[\"log_title\"] = title\n\n    endpoint: str = f\"{exp}/ws/new_elog_entry\"\n\n    params: Dict[str, Any] = {\"data\": post}\n\n    if out_files:\n        params.update({\"files\": out_files})\n\n    status_code, resp_msg, _ = elog_http_request(\n        exp=exp, endpoint=endpoint, request_type=\"POST\", **params\n    )\n\n    if resp_msg != \"SUCCESS\":\n        return resp_msg\n
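
A minimal usage sketch; the experiment name, message and attachment path are placeholders:

from typing import Optional\n\nfrom lute.io.elog import post_elog_message\n\nerr: Optional[str] = post_elog_message(\n    exp=\"mfxl1234567\",\n    msg=\"Peak finding finished for run 12.\",\n    tag=\"LUTE\",\n    title=\"Peak finding summary\",\n    in_files=[\"/path/to/peaks_r0012.png\"],\n)\nif err is not None:\n    print(err)\n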
"},{"location":"source/io/elog/#io.elog.post_elog_run_status","title":"post_elog_run_status(data, update_url=None)","text":"

Post a summary to the status/report section of a specific run.

In contrast to most eLog update/post mechanisms, this function searches for a specific environment variable which contains the URL for posting. This URL is updated for each job/run as jobs are submitted by the JID. The URL can optionally be passed to this function if it is known.

Parameters:

Name Type Description Default data Dict[str, Union[str, int, float]]

The data to post to the eLog report section. Formatted in key:value pairs.

required update_url Optional[str]

Optional update URL. If not provided, the function searches for the corresponding environment variable. If neither is found, the function aborts.

None Source code in lute/io/elog.py
def post_elog_run_status(\n    data: Dict[str, Union[str, int, float]], update_url: Optional[str] = None\n) -> None:\n    \"\"\"Post a summary to the status/report section of a specific run.\n\n    In contrast to most eLog update/post mechanisms, this function searches\n    for a specific environment variable which contains a specific URL for\n    posting. This is updated every job/run as jobs are submitted by the JID.\n    The URL can optionally be passed to this function if it is known.\n\n    Args:\n        data (Dict[str, Union[str, int, float]]): The data to post to the eLog\n            report section. Formatted in key:value pairs.\n\n        update_url (Optional[str]): Optional update URL. If not provided, the\n            function searches for the corresponding environment variable. If\n            neither is found, the function aborts\n    \"\"\"\n    if update_url is None:\n        update_url = os.environ.get(\"JID_UPDATE_COUNTERS\")\n        if update_url is None:\n            logger.info(\"eLog Update Failed! JID_UPDATE_COUNTERS is not defined!\")\n            return\n    current_status: Dict[str, Union[str, int, float]] = _get_current_run_status(\n        update_url\n    )\n    current_status.update(data)\n    post_list: List[Dict[str, str]] = [\n        {\"key\": f\"{key}\", \"value\": f\"{value}\"} for key, value in current_status.items()\n    ]\n    params: Dict[str, List[Dict[str, str]]] = {\"json\": post_list}\n    resp: requests.models.Response = requests.post(update_url, **params)\n
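
A minimal usage sketch; the keys and values are placeholders. When jobs are submitted by the JID, the update URL is normally taken from the JID_UPDATE_COUNTERS environment variable:

from lute.io.elog import post_elog_run_status\n\n# Placeholder key/value pairs to display in the run summary.\npost_elog_run_status(data={\"Number of hits\": 1234, \"Hit rate (%)\": 5.6})\n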
"},{"location":"source/io/elog/#io.elog.post_elog_run_table","title":"post_elog_run_table(exp, run, data)","text":"

Post data for eLog run tables.

Parameters:

Name Type Description Default exp str

Experiment name.

required run int

Run number corresponding to the data being posted.

required data Dict[str, Any]

Data to be posted in format data[\"column_header\"] = value.

required

Returns:

Name Type Description err_msg None | str

If successful, nothing is returned; otherwise, an error message is returned.

Source code in lute/io/elog.py
def post_elog_run_table(\n    exp: str,\n    run: int,\n    data: Dict[str, Any],\n) -> Optional[str]:\n    \"\"\"Post data for eLog run tables.\n\n    Args:\n        exp (str): Experiment name.\n\n        run (int): Run number corresponding to the data being posted.\n\n        data (Dict[str, Any]): Data to be posted in format\n            data[\"column_header\"] = value.\n\n    Returns:\n        err_msg (None | str): If successful, nothing is returned, otherwise,\n            return an error message.\n    \"\"\"\n    endpoint: str = f\"run_control/{exp}/ws/add_run_params\"\n\n    params: Dict[str, Any] = {\"params\": {\"run_num\": run}, \"json\": data}\n\n    status_code, resp_msg, _ = elog_http_request(\n        exp=exp, endpoint=endpoint, request_type=\"POST\", **params\n    )\n\n    if resp_msg != \"SUCCESS\":\n        return resp_msg\n
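
A minimal usage sketch; the experiment, run number and column data are placeholders:

from typing import Optional\n\nfrom lute.io.elog import post_elog_run_table\n\nerr: Optional[str] = post_elog_run_table(\n    exp=\"mfxl1234567\",\n    run=12,\n    data={\"Indexed frames\": 4321, \"Indexing rate (%)\": 45.0},\n)\nif err is not None:\n    print(err)\n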
"},{"location":"source/io/elog/#io.elog.post_elog_workflow","title":"post_elog_workflow(exp, name, executable, wf_params, *, trigger='run_end', location='S3DF', **trig_args)","text":"

Create a new eLog workflow, or update an existing one.

The workflow will run a specific executable as a batch job when the specified trigger occurs. The precise arguments may vary depending on the selected trigger type.

Parameters:

Name Type Description Default name str

An identifying name for the workflow. E.g. \"process data\"

required executable str

Full path to the executable to be run.

required wf_params str

All command-line parameters for the executable as a string.

required trigger str

When to trigger execution of the specified executable. One of: - 'manual': Must be manually triggered. No automatic processing. - 'run_start': Execute immediately if a new run begins. - 'run_end': As soon as a run ends. - 'param_is': As soon as a parameter has a specific value for a run.

'run_end' location str

Where to submit the job. S3DF or NERSC.

'S3DF' **trig_args str

Arguments required for a specific trigger type. For trigger='param_is', two arguments are required: trig_param (str), the name of the parameter to watch for, and trig_param_val (str), the value the parameter should have to trigger.

{} Source code in lute/io/elog.py
def post_elog_workflow(\n    exp: str,\n    name: str,\n    executable: str,\n    wf_params: str,\n    *,\n    trigger: str = \"run_end\",\n    location: str = \"S3DF\",\n    **trig_args: str,\n) -> None:\n    \"\"\"Create a new eLog workflow, or update an existing one.\n\n    The workflow will run a specific executable as a batch job when the\n    specified trigger occurs. The precise arguments may vary depending on the\n    selected trigger type.\n\n    Args:\n        name (str): An identifying name for the workflow. E.g. \"process data\"\n\n        executable (str): Full path to the executable to be run.\n\n        wf_params (str): All command-line parameters for the executable as a string.\n\n        trigger (str): When to trigger execution of the specified executable.\n            One of:\n                - 'manual': Must be manually triggered. No automatic processing.\n                - 'run_start': Execute immediately if a new run begins.\n                - 'run_end': As soon as a run ends.\n                - 'param_is': As soon as a parameter has a specific value for a run.\n\n        location (str): Where to submit the job. S3DF or NERSC.\n\n        **trig_args (str): Arguments required for a specific trigger type.\n            trigger='param_is' - 2 Arguments\n                trig_param (str): Name of the parameter to watch for.\n                trig_param_val (str): Value the parameter should have to trigger.\n    \"\"\"\n    endpoint: str = f\"{exp}/ws/create_update_workflow_def\"\n    trig_map: Dict[str, str] = {\n        \"manual\": \"MANUAL\",\n        \"run_start\": \"START_OF_RUN\",\n        \"run_end\": \"END_OF_RUN\",\n        \"param_is\": \"RUN_PARAM_IS_VALUE\",\n    }\n    if trigger not in trig_map.keys():\n        raise NotImplementedError(\n            f\"Cannot create workflow with trigger type: {trigger}\"\n        )\n    wf_defn: Dict[str, str] = {\n        \"name\": name,\n        \"executable\": executable,\n        \"parameters\": wf_params,\n        \"trigger\": trig_map[trigger],\n        \"location\": location,\n    }\n    if trigger == \"param_is\":\n        if \"trig_param\" not in trig_args or \"trig_param_val\" not in trig_args:\n            raise RuntimeError(\n                \"Trigger type 'param_is' requires: 'trig_param' and 'trig_param_val' arguments\"\n            )\n        wf_defn.update(\n            {\n                \"run_param_name\": trig_args[\"trig_param\"],\n                \"run_param_val\": trig_args[\"trig_param_val\"],\n            }\n        )\n    post_params: Dict[str, Dict[str, str]] = {\"json\": wf_defn}\n    status_code, resp_msg, _ = elog_http_request(\n        exp, endpoint=endpoint, request_type=\"POST\", **post_params\n    )\n
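
A minimal usage sketch; the experiment name, workflow name, executable path and parameter string are placeholders:

from lute.io.elog import post_elog_workflow\n\npost_elog_workflow(\n    exp=\"mfxl1234567\",\n    name=\"process data\",\n    executable=\"/path/to/my_launch_script.sh\",\n    wf_params=\"-c /path/to/config.yaml\",\n    trigger=\"run_end\",\n    location=\"S3DF\",\n)\n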
"},{"location":"source/io/exceptions/","title":"exceptions","text":"

Specifies custom exceptions defined for IO problems.

Raises:

Type Description ElogFileFormatError

Raised if an attachment is specified in an incorrect format.

"},{"location":"source/io/exceptions/#io.exceptions.ElogFileFormatError","title":"ElogFileFormatError","text":"

Bases: Exception

Raised when an eLog attachment is specified in an invalid format.

Source code in lute/io/exceptions.py
class ElogFileFormatError(Exception):\n    \"\"\"Raised when an eLog attachment is specified in an invalid format.\"\"\"\n\n    ...\n
"},{"location":"source/io/models/base/","title":"base","text":"

Base classes for describing Task parameters.

Classes:

Name Description AnalysisHeader

Model holding shared configuration across Tasks. E.g. experiment name, run number and working directory.

TaskParameters

Base class for Task parameters. Subclasses specify a model of parameters and their types for validation.

ThirdPartyParameters

Base class for Third-party, binary executable Tasks.

TemplateParameters

Dataclass to represent parameters of binary (third-party) Tasks which are used for additional config files.

TemplateConfig

Class for holding information on where templates are stored in order to properly handle ThirdPartyParameter objects.

"},{"location":"source/io/models/base/#io.models.base.AnalysisHeader","title":"AnalysisHeader","text":"

Bases: BaseModel

Header information for LUTE analysis runs.

Source code in lute/io/models/base.py
class AnalysisHeader(BaseModel):\n    \"\"\"Header information for LUTE analysis runs.\"\"\"\n\n    title: str = Field(\n        \"LUTE Task Configuration\",\n        description=\"Description of the configuration or experiment.\",\n    )\n    experiment: str = Field(\"\", description=\"Experiment.\")\n    run: Union[str, int] = Field(\"\", description=\"Data acquisition run.\")\n    date: str = Field(\"1970/01/01\", description=\"Start date of analysis.\")\n    lute_version: Union[float, str] = Field(\n        0.1, description=\"Version of LUTE used for analysis.\"\n    )\n    task_timeout: PositiveInt = Field(\n        600,\n        description=(\n            \"Time in seconds until a task times out. Should be slightly shorter\"\n            \" than job timeout if using a job manager (e.g. SLURM).\"\n        ),\n    )\n    work_dir: str = Field(\"\", description=\"Main working directory for LUTE.\")\n\n    @validator(\"work_dir\", always=True)\n    def validate_work_dir(cls, directory: str, values: Dict[str, Any]) -> str:\n        work_dir: str\n        if directory == \"\":\n            std_work_dir = (\n                f\"/sdf/data/lcls/ds/{values['experiment'][:3]}/\"\n                f\"{values['experiment']}/scratch\"\n            )\n            work_dir = std_work_dir\n        else:\n            work_dir = directory\n        # Check existence and permissions\n        if not os.path.exists(work_dir):\n            raise ValueError(f\"Working Directory: {work_dir} does not exist!\")\n        if not os.access(work_dir, os.W_OK):\n            # Need write access for database, files etc.\n            raise ValueError(f\"Not write access for working directory: {work_dir}!\")\n        return work_dir\n\n    @validator(\"run\", always=True)\n    def validate_run(\n        cls, run: Union[str, int], values: Dict[str, Any]\n    ) -> Union[str, int]:\n        if run == \"\":\n            # From Airflow RUN_NUM should have Format \"RUN_DATETIME\" - Num is first part\n            run_time: str = os.environ.get(\"RUN_NUM\", \"\")\n            if run_time != \"\":\n                return int(run_time.split(\"_\")[0])\n        return run\n\n    @validator(\"experiment\", always=True)\n    def validate_experiment(cls, experiment: str, values: Dict[str, Any]) -> str:\n        if experiment == \"\":\n            arp_exp: str = os.environ.get(\"EXPERIMENT\", \"EXPX00000\")\n            return arp_exp\n        return experiment\n
"},{"location":"source/io/models/base/#io.models.base.TaskParameters","title":"TaskParameters","text":"

Bases: BaseSettings

Base class for models of task parameters to be validated.

Parameters are read from a configuration YAML file and validated against subclasses of this type in order to ensure both that all parameters are present and that they are of the correct type.

Note

Pydantic is used for data validation. Pydantic does not perform \"strict\" validation by default. Parameter values may be cast to conform with the model specified by the subclass definition if it is possible to do so. Consider whether this may cause issues (e.g. if a float is cast to an int).

Source code in lute/io/models/base.py
class TaskParameters(BaseSettings):\n    \"\"\"Base class for models of task parameters to be validated.\n\n    Parameters are read from a configuration YAML file and validated against\n    subclasses of this type in order to ensure that both all parameters are\n    present, and that the parameters are of the correct type.\n\n    Note:\n        Pydantic is used for data validation. Pydantic does not perform \"strict\"\n        validation by default. Parameter values may be cast to conform with the\n        model specified by the subclass definition if it is possible to do so.\n        Consider whether this may cause issues (e.g. if a float is cast to an\n        int).\n    \"\"\"\n\n    class Config:\n        \"\"\"Configuration for parameters model.\n\n        The Config class holds Pydantic configuration. A number of LUTE-specific\n        configuration has also been placed here.\n\n        Attributes:\n            env_prefix (str): Pydantic configuration. Will set parameters from\n                environment variables containing this prefix. E.g. a model\n                parameter `input` can be set with an environment variable:\n                `{env_prefix}input`, in LUTE's case `LUTE_input`.\n\n            underscore_attrs_are_private (bool): Pydantic configuration. Whether\n                to hide attributes (parameters) prefixed with an underscore.\n\n            copy_on_model_validation (str): Pydantic configuration. How to copy\n                the input object passed to the class instance for model\n                validation. Set to perform a deep copy.\n\n            allow_inf_nan (bool): Pydantic configuration. Whether to allow\n                infinity or NAN in float fields.\n\n            run_directory (Optional[str]): None. If set, it should be a valid\n                path. The `Task` will be run from this directory. This may be\n                useful for some `Task`s which rely on searching the working\n                directory.\n\n            set_result (bool). False. If True, the model has information about\n                setting the TaskResult object from the parameters it contains.\n                E.g. it has an `output` parameter which is marked as the result.\n                The result can be set with a field value of `is_result=True` on\n                a specific parameter, or using `result_from_params` and a\n                validator.\n\n            result_from_params (Optional[str]): None. Optionally used to define\n                results from information available in the model using a custom\n                validator. E.g. use a `outdir` and `filename` field to set\n                `result_from_params=f\"{outdir}/{filename}`, etc. Only used if\n                `set_result==True`\n\n            result_summary (Optional[str]): None. Defines a result summary that\n                can be known after processing the Pydantic model. Use of summary\n                depends on the Executor running the Task. All summaries are\n                stored in the database, however. Only used if `set_result==True`\n\n            impl_schemas (Optional[str]). Specifies a the schemas the\n                output/results conform to. 
Only used if `set_result==True`.\n        \"\"\"\n\n        env_prefix = \"LUTE_\"\n        underscore_attrs_are_private: bool = True\n        copy_on_model_validation: str = \"deep\"\n        allow_inf_nan: bool = False\n\n        run_directory: Optional[str] = None\n        \"\"\"Set the directory that the Task is run from.\"\"\"\n        set_result: bool = False\n        \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n        result_from_params: Optional[str] = None\n        \"\"\"Defines a result from the parameters. Use a validator to do so.\"\"\"\n        result_summary: Optional[str] = None\n        \"\"\"Format a TaskResult.summary from output.\"\"\"\n        impl_schemas: Optional[str] = None\n        \"\"\"Schema specification for output result. Will be passed to TaskResult.\"\"\"\n\n    lute_config: AnalysisHeader\n
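
As an illustration, a new Task's parameters can be described by subclassing TaskParameters; the class and field names below are hypothetical:

from pydantic import Field\n\nfrom lute.io.models.base import TaskParameters\n\n\nclass MyAnalysisParameters(TaskParameters):\n    \"\"\"Parameters for a hypothetical first-party analysis Task.\"\"\"\n\n    input_file: str = Field(\"\", description=\"Path to the input data file.\")\n    threshold: float = Field(10.0, description=\"Detection threshold.\")\n    n_events: int = Field(0, description=\"Number of events to process (0 for all).\")\n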
"},{"location":"source/io/models/base/#io.models.base.TaskParameters.Config","title":"Config","text":"

Configuration for parameters model.

The Config class holds Pydantic configuration. A number of LUTE-specific configuration options have also been placed here.

Attributes:

Name Type Description env_prefix str

Pydantic configuration. Will set parameters from environment variables containing this prefix. E.g. a model parameter input can be set with an environment variable: {env_prefix}input, in LUTE's case LUTE_input.

underscore_attrs_are_private bool

Pydantic configuration. Whether to hide attributes (parameters) prefixed with an underscore.

copy_on_model_validation str

Pydantic configuration. How to copy the input object passed to the class instance for model validation. Set to perform a deep copy.

allow_inf_nan bool

Pydantic configuration. Whether to allow infinity or NAN in float fields.

run_directory Optional[str]

None. If set, it should be a valid path. The Task will be run from this directory. This may be useful for some Tasks which rely on searching the working directory.

result_from_params Optional[str]

None. Optionally used to define results from information available in the model using a custom validator. E.g. use an outdir and filename field to set result_from_params=f\"{outdir}/{filename}\", etc. Only used if set_result==True.

result_summary Optional[str]

None. Defines a result summary that can be known after processing the Pydantic model. Use of summary depends on the Executor running the Task. All summaries are stored in the database, however. Only used if set_result==True

Source code in lute/io/models/base.py
class Config:\n    \"\"\"Configuration for parameters model.\n\n    The Config class holds Pydantic configuration. A number of LUTE-specific\n    configuration has also been placed here.\n\n    Attributes:\n        env_prefix (str): Pydantic configuration. Will set parameters from\n            environment variables containing this prefix. E.g. a model\n            parameter `input` can be set with an environment variable:\n            `{env_prefix}input`, in LUTE's case `LUTE_input`.\n\n        underscore_attrs_are_private (bool): Pydantic configuration. Whether\n            to hide attributes (parameters) prefixed with an underscore.\n\n        copy_on_model_validation (str): Pydantic configuration. How to copy\n            the input object passed to the class instance for model\n            validation. Set to perform a deep copy.\n\n        allow_inf_nan (bool): Pydantic configuration. Whether to allow\n            infinity or NAN in float fields.\n\n        run_directory (Optional[str]): None. If set, it should be a valid\n            path. The `Task` will be run from this directory. This may be\n            useful for some `Task`s which rely on searching the working\n            directory.\n\n        set_result (bool). False. If True, the model has information about\n            setting the TaskResult object from the parameters it contains.\n            E.g. it has an `output` parameter which is marked as the result.\n            The result can be set with a field value of `is_result=True` on\n            a specific parameter, or using `result_from_params` and a\n            validator.\n\n        result_from_params (Optional[str]): None. Optionally used to define\n            results from information available in the model using a custom\n            validator. E.g. use a `outdir` and `filename` field to set\n            `result_from_params=f\"{outdir}/{filename}`, etc. Only used if\n            `set_result==True`\n\n        result_summary (Optional[str]): None. Defines a result summary that\n            can be known after processing the Pydantic model. Use of summary\n            depends on the Executor running the Task. All summaries are\n            stored in the database, however. Only used if `set_result==True`\n\n        impl_schemas (Optional[str]). Specifies a the schemas the\n            output/results conform to. Only used if `set_result==True`.\n    \"\"\"\n\n    env_prefix = \"LUTE_\"\n    underscore_attrs_are_private: bool = True\n    copy_on_model_validation: str = \"deep\"\n    allow_inf_nan: bool = False\n\n    run_directory: Optional[str] = None\n    \"\"\"Set the directory that the Task is run from.\"\"\"\n    set_result: bool = False\n    \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n    result_from_params: Optional[str] = None\n    \"\"\"Defines a result from the parameters. Use a validator to do so.\"\"\"\n    result_summary: Optional[str] = None\n    \"\"\"Format a TaskResult.summary from output.\"\"\"\n    impl_schemas: Optional[str] = None\n    \"\"\"Schema specification for output result. Will be passed to TaskResult.\"\"\"\n
"},{"location":"source/io/models/base/#io.models.base.TaskParameters.Config.impl_schemas","title":"impl_schemas: Optional[str] = None class-attribute instance-attribute","text":"

Schema specification for output result. Will be passed to TaskResult.

"},{"location":"source/io/models/base/#io.models.base.TaskParameters.Config.result_from_params","title":"result_from_params: Optional[str] = None class-attribute instance-attribute","text":"

Defines a result from the parameters. Use a validator to do so.

"},{"location":"source/io/models/base/#io.models.base.TaskParameters.Config.result_summary","title":"result_summary: Optional[str] = None class-attribute instance-attribute","text":"

Format a TaskResult.summary from output.

"},{"location":"source/io/models/base/#io.models.base.TaskParameters.Config.run_directory","title":"run_directory: Optional[str] = None class-attribute instance-attribute","text":"

Set the directory that the Task is run from.

"},{"location":"source/io/models/base/#io.models.base.TaskParameters.Config.set_result","title":"set_result: bool = False class-attribute instance-attribute","text":"

Whether the Executor should mark a specified parameter as a result.

"},{"location":"source/io/models/base/#io.models.base.TemplateConfig","title":"TemplateConfig","text":"

Bases: BaseModel

Parameters used for templating of third party configuration files.

Attributes:

Name Type Description template_name str

The name of the template to use. This template must live in config/templates.

output_path str

The FULL path, including filename to write the rendered template to.

Source code in lute/io/models/base.py
class TemplateConfig(BaseModel):\n    \"\"\"Parameters used for templating of third party configuration files.\n\n    Attributes:\n        template_name (str): The name of the template to use. This template must\n            live in `config/templates`.\n\n        output_path (str): The FULL path, including filename to write the\n            rendered template to.\n    \"\"\"\n\n    template_name: str\n    output_path: str\n
"},{"location":"source/io/models/base/#io.models.base.TemplateParameters","title":"TemplateParameters","text":"

Class for representing parameters for third party configuration files.

These parameters can represent arbitrary data types and are used in conjunction with templates for modifying third party configuration files from the single LUTE YAML. Due to the storage of arbitrary data types, and the use of a template file, a single instance of this class can hold anything from a single template variable to an entire configuration file. The data parsing is done by jinja using the complementary template. All data is stored in the single model variable params.

The pydantic \"dataclass\" is used over the BaseModel/Settings to allow positional argument instantiation of the params Field.

Source code in lute/io/models/base.py
@dataclass\nclass TemplateParameters:\n    \"\"\"Class for representing parameters for third party configuration files.\n\n    These parameters can represent arbitrary data types and are used in\n    conjunction with templates for modifying third party configuration files\n    from the single LUTE YAML. Due to the storage of arbitrary data types, and\n    the use of a template file, a single instance of this class can hold from a\n    single template variable to an entire configuration file. The data parsing\n    is done by jinja using the complementary template.\n    All data is stored in the single model variable `params.`\n\n    The pydantic \"dataclass\" is used over the BaseModel/Settings to allow\n    positional argument instantiation of the `params` Field.\n    \"\"\"\n\n    params: Any\n
"},{"location":"source/io/models/base/#io.models.base.ThirdPartyParameters","title":"ThirdPartyParameters","text":"

Bases: TaskParameters

Base class for third party task parameters.

Contains special validators for extra arguments and handling of parameters used for filling in third party configuration files.

Source code in lute/io/models/base.py
class ThirdPartyParameters(TaskParameters):\n    \"\"\"Base class for third party task parameters.\n\n    Contains special validators for extra arguments and handling of parameters\n    used for filling in third party configuration files.\n    \"\"\"\n\n    class Config(TaskParameters.Config):\n        \"\"\"Configuration for parameters model.\n\n        The Config class holds Pydantic configuration and inherited configuration\n        from the base `TaskParameters.Config` class. A number of values are also\n        overridden, and there are some specific configuration options to\n        ThirdPartyParameters. A full list of options (with TaskParameters options\n        repeated) is described below.\n\n        Attributes:\n            env_prefix (str): Pydantic configuration. Will set parameters from\n                environment variables containing this prefix. E.g. a model\n                parameter `input` can be set with an environment variable:\n                `{env_prefix}input`, in LUTE's case `LUTE_input`.\n\n            underscore_attrs_are_private (bool): Pydantic configuration. Whether\n                to hide attributes (parameters) prefixed with an underscore.\n\n            copy_on_model_validation (str): Pydantic configuration. How to copy\n                the input object passed to the class instance for model\n                validation. Set to perform a deep copy.\n\n            allow_inf_nan (bool): Pydantic configuration. Whether to allow\n                infinity or NAN in float fields.\n\n            run_directory (Optional[str]): None. If set, it should be a valid\n                path. The `Task` will be run from this directory. This may be\n                useful for some `Task`s which rely on searching the working\n                directory.\n\n            set_result (bool). True. If True, the model has information about\n                setting the TaskResult object from the parameters it contains.\n                E.g. it has an `output` parameter which is marked as the result.\n                The result can be set with a field value of `is_result=True` on\n                a specific parameter, or using `result_from_params` and a\n                validator.\n\n            result_from_params (Optional[str]): None. Optionally used to define\n                results from information available in the model using a custom\n                validator. E.g. use a `outdir` and `filename` field to set\n                `result_from_params=f\"{outdir}/{filename}`, etc.\n\n            result_summary (Optional[str]): None. Defines a result summary that\n                can be known after processing the Pydantic model. Use of summary\n                depends on the Executor running the Task. All summaries are\n                stored in the database, however.\n\n            impl_schemas (Optional[str]). Specifies a the schemas the\n                output/results conform to. Only used if set_result is True.\n\n            -----------------------\n            ThirdPartyTask-specific:\n\n            extra (str): \"allow\". Pydantic configuration. Allow (or ignore) extra\n                arguments.\n\n            short_flags_use_eq (bool): False. If True, \"short\" command-line args\n                are passed as `-x=arg`. ThirdPartyTask-specific.\n\n            long_flags_use_eq (bool): False. If True, \"long\" command-line args\n                are passed as `--long=arg`. 
ThirdPartyTask-specific.\n        \"\"\"\n\n        extra: str = \"allow\"\n        short_flags_use_eq: bool = False\n        \"\"\"Whether short command-line arguments are passed like `-x=arg`.\"\"\"\n        long_flags_use_eq: bool = False\n        \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n        set_result: bool = True\n        \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n    # lute_template_cfg: TemplateConfig\n\n    @root_validator(pre=False)\n    def extra_fields_to_thirdparty(cls, values: Dict[str, Any]):\n        for key in values:\n            if key not in cls.__fields__:\n                values[key] = TemplateParameters(values[key])\n\n        return values\n
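
As an illustration, a third-party executable can be wrapped in the same way, with a flag_type on each Field controlling how the parameter is rendered on the command line; the executable and parameter names below are hypothetical:

from pydantic import Field\n\nfrom lute.io.models.base import ThirdPartyParameters\n\n\nclass MyBinaryParameters(ThirdPartyParameters):\n    \"\"\"Parameters for a hypothetical third-party executable.\"\"\"\n\n    executable: str = Field(\"/path/to/my_binary\", description=\"Binary to run.\", flag_type=\"\")\n    i: str = Field(\"\", description=\"Input file.\", flag_type=\"-\")\n    out_dir: str = Field(\"\", description=\"Output directory.\", flag_type=\"--\")\n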
"},{"location":"source/io/models/base/#io.models.base.ThirdPartyParameters.Config","title":"Config","text":"

Bases: Config

Configuration for parameters model.

The Config class holds Pydantic configuration and inherited configuration from the base TaskParameters.Config class. A number of values are also overridden, and there are some configuration options specific to ThirdPartyParameters. A full list of options (with TaskParameters options repeated) is described below.

Attributes:

Name Type Description env_prefix str

Pydantic configuration. Will set parameters from environment variables containing this prefix. E.g. a model parameter input can be set with an environment variable: {env_prefix}input, in LUTE's case LUTE_input.

underscore_attrs_are_private bool

Pydantic configuration. Whether to hide attributes (parameters) prefixed with an underscore.

copy_on_model_validation str

Pydantic configuration. How to copy the input object passed to the class instance for model validation. Set to perform a deep copy.

allow_inf_nan bool

Pydantic configuration. Whether to allow infinity or NAN in float fields.

run_directory Optional[str]

None. If set, it should be a valid path. The Task will be run from this directory. This may be useful for some Tasks which rely on searching the working directory.

result_from_params Optional[str]

None. Optionally used to define results from information available in the model using a custom validator. E.g. use an outdir and filename field to set result_from_params=f\"{outdir}/{filename}\", etc.

result_summary Optional[str]

None. Defines a result summary that can be known after processing the Pydantic model. Use of summary depends on the Executor running the Task. All summaries are stored in the database, however.

ThirdPartyTask-specific: extra str

\"allow\". Pydantic configuration. Allow (or ignore) extra arguments.

short_flags_use_eq bool

False. If True, \"short\" command-line args are passed as -x=arg. ThirdPartyTask-specific.

long_flags_use_eq bool

False. If True, \"long\" command-line args are passed as --long=arg. ThirdPartyTask-specific.

Source code in lute/io/models/base.py
class Config(TaskParameters.Config):\n    \"\"\"Configuration for parameters model.\n\n    The Config class holds Pydantic configuration and inherited configuration\n    from the base `TaskParameters.Config` class. A number of values are also\n    overridden, and there are some specific configuration options to\n    ThirdPartyParameters. A full list of options (with TaskParameters options\n    repeated) is described below.\n\n    Attributes:\n        env_prefix (str): Pydantic configuration. Will set parameters from\n            environment variables containing this prefix. E.g. a model\n            parameter `input` can be set with an environment variable:\n            `{env_prefix}input`, in LUTE's case `LUTE_input`.\n\n        underscore_attrs_are_private (bool): Pydantic configuration. Whether\n            to hide attributes (parameters) prefixed with an underscore.\n\n        copy_on_model_validation (str): Pydantic configuration. How to copy\n            the input object passed to the class instance for model\n            validation. Set to perform a deep copy.\n\n        allow_inf_nan (bool): Pydantic configuration. Whether to allow\n            infinity or NAN in float fields.\n\n        run_directory (Optional[str]): None. If set, it should be a valid\n            path. The `Task` will be run from this directory. This may be\n            useful for some `Task`s which rely on searching the working\n            directory.\n\n        set_result (bool). True. If True, the model has information about\n            setting the TaskResult object from the parameters it contains.\n            E.g. it has an `output` parameter which is marked as the result.\n            The result can be set with a field value of `is_result=True` on\n            a specific parameter, or using `result_from_params` and a\n            validator.\n\n        result_from_params (Optional[str]): None. Optionally used to define\n            results from information available in the model using a custom\n            validator. E.g. use a `outdir` and `filename` field to set\n            `result_from_params=f\"{outdir}/{filename}`, etc.\n\n        result_summary (Optional[str]): None. Defines a result summary that\n            can be known after processing the Pydantic model. Use of summary\n            depends on the Executor running the Task. All summaries are\n            stored in the database, however.\n\n        impl_schemas (Optional[str]). Specifies a the schemas the\n            output/results conform to. Only used if set_result is True.\n\n        -----------------------\n        ThirdPartyTask-specific:\n\n        extra (str): \"allow\". Pydantic configuration. Allow (or ignore) extra\n            arguments.\n\n        short_flags_use_eq (bool): False. If True, \"short\" command-line args\n            are passed as `-x=arg`. ThirdPartyTask-specific.\n\n        long_flags_use_eq (bool): False. If True, \"long\" command-line args\n            are passed as `--long=arg`. ThirdPartyTask-specific.\n    \"\"\"\n\n    extra: str = \"allow\"\n    short_flags_use_eq: bool = False\n    \"\"\"Whether short command-line arguments are passed like `-x=arg`.\"\"\"\n    long_flags_use_eq: bool = False\n    \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n    set_result: bool = True\n    \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n
"},{"location":"source/io/models/base/#io.models.base.ThirdPartyParameters.Config.long_flags_use_eq","title":"long_flags_use_eq: bool = False class-attribute instance-attribute","text":"

Whether long command-line arguments are passed like --long=arg.

"},{"location":"source/io/models/base/#io.models.base.ThirdPartyParameters.Config.set_result","title":"set_result: bool = True class-attribute instance-attribute","text":"

Whether the Executor should mark a specified parameter as a result.

"},{"location":"source/io/models/base/#io.models.base.ThirdPartyParameters.Config.short_flags_use_eq","title":"short_flags_use_eq: bool = False class-attribute instance-attribute","text":"

Whether short command-line arguments are passed like -x=arg.

"},{"location":"source/io/models/sfx_find_peaks/","title":"sfx_find_peaks","text":""},{"location":"source/io/models/sfx_find_peaks/#io.models.sfx_find_peaks.FindPeaksPsocakeParameters","title":"FindPeaksPsocakeParameters","text":"

Bases: ThirdPartyParameters

Parameters for crystallographic (Bragg) peak finding using Psocake.

This peak finding Task optionally has the ability to compress/decompress data with SZ for the purpose of compression validation. NOTE: This Task is deprecated and provided for compatibility only.

Source code in lute/io/models/sfx_find_peaks.py
class FindPeaksPsocakeParameters(ThirdPartyParameters):\n    \"\"\"Parameters for crystallographic (Bragg) peak finding using Psocake.\n\n    This peak finding Task optionally has the ability to compress/decompress\n    data with SZ for the purpose of compression validation.\n    NOTE: This Task is deprecated and provided for compatibility only.\n    \"\"\"\n\n    class Config(TaskParameters.Config):\n        set_result: bool = True\n        \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n        result_from_params: str = \"\"\n        \"\"\"Defines a result from the parameters. Use a validator to do so.\"\"\"\n\n    class SZParameters(BaseModel):\n        compressor: Literal[\"qoz\", \"sz3\"] = Field(\n            \"qoz\", description=\"SZ compression algorithm (qoz, sz3)\"\n        )\n        binSize: int = Field(2, description=\"SZ compression's bin size paramater\")\n        roiWindowSize: int = Field(\n            2, description=\"SZ compression's ROI window size paramater\"\n        )\n        absError: float = Field(10, descriptionp=\"Maximum absolute error value\")\n\n    executable: str = Field(\"mpirun\", description=\"MPI executable.\", flag_type=\"\")\n    np: PositiveInt = Field(\n        max(int(os.environ.get(\"SLURM_NPROCS\", len(os.sched_getaffinity(0)))) - 1, 1),\n        description=\"Number of processes\",\n        flag_type=\"-\",\n    )\n    mca: str = Field(\n        \"btl ^openib\", description=\"Mca option for the MPI executable\", flag_type=\"--\"\n    )\n    p_arg1: str = Field(\n        \"python\", description=\"Executable to run with mpi (i.e. python).\", flag_type=\"\"\n    )\n    u: str = Field(\n        \"\", description=\"Python option for unbuffered output.\", flag_type=\"-\"\n    )\n    p_arg2: str = Field(\n        \"findPeaksSZ.py\",\n        description=\"Executable to run with mpi (i.e. 
python).\",\n        flag_type=\"\",\n    )\n    d: str = Field(description=\"Detector name\", flag_type=\"-\")\n    e: str = Field(\"\", description=\"Experiment name\", flag_type=\"-\")\n    r: int = Field(-1, description=\"Run number\", flag_type=\"-\")\n    outDir: str = Field(\n        description=\"Output directory where .cxi will be saved\", flag_type=\"--\"\n    )\n    algorithm: int = Field(1, description=\"PyAlgos algorithm to use\", flag_type=\"--\")\n    alg_npix_min: float = Field(\n        1.0, description=\"PyAlgos algorithm's npix_min parameter\", flag_type=\"--\"\n    )\n    alg_npix_max: float = Field(\n        45.0, description=\"PyAlgos algorithm's npix_max parameter\", flag_type=\"--\"\n    )\n    alg_amax_thr: float = Field(\n        250.0, description=\"PyAlgos algorithm's amax_thr parameter\", flag_type=\"--\"\n    )\n    alg_atot_thr: float = Field(\n        330.0, description=\"PyAlgos algorithm's atot_thr parameter\", flag_type=\"--\"\n    )\n    alg_son_min: float = Field(\n        10.0, description=\"PyAlgos algorithm's son_min parameter\", flag_type=\"--\"\n    )\n    alg1_thr_low: float = Field(\n        80.0, description=\"PyAlgos algorithm's thr_low parameter\", flag_type=\"--\"\n    )\n    alg1_thr_high: float = Field(\n        270.0, description=\"PyAlgos algorithm's thr_high parameter\", flag_type=\"--\"\n    )\n    alg1_rank: int = Field(\n        3, description=\"PyAlgos algorithm's rank parameter\", flag_type=\"--\"\n    )\n    alg1_radius: int = Field(\n        3, description=\"PyAlgos algorithm's radius parameter\", flag_type=\"--\"\n    )\n    alg1_dr: int = Field(\n        1, description=\"PyAlgos algorithm's dr parameter\", flag_type=\"--\"\n    )\n    psanaMask_on: str = Field(\n        \"True\", description=\"Whether psana's mask should be used\", flag_type=\"--\"\n    )\n    psanaMask_calib: str = Field(\n        \"True\", description=\"Psana mask's calib parameter\", flag_type=\"--\"\n    )\n    psanaMask_status: str = Field(\n        \"True\", description=\"Psana mask's status parameter\", flag_type=\"--\"\n    )\n    psanaMask_edges: str = Field(\n        \"True\", description=\"Psana mask's edges parameter\", flag_type=\"--\"\n    )\n    psanaMask_central: str = Field(\n        \"True\", description=\"Psana mask's central parameter\", flag_type=\"--\"\n    )\n    psanaMask_unbond: str = Field(\n        \"True\", description=\"Psana mask's unbond parameter\", flag_type=\"--\"\n    )\n    psanaMask_unbondnrs: str = Field(\n        \"True\", description=\"Psana mask's unbondnbrs parameter\", flag_type=\"--\"\n    )\n    mask: str = Field(\n        \"\", description=\"Path to an additional mask to apply\", flag_type=\"--\"\n    )\n    clen: str = Field(\n        description=\"Epics variable storing the camera length\", flag_type=\"--\"\n    )\n    coffset: float = Field(0, description=\"Camera offset in m\", flag_type=\"--\")\n    minPeaks: int = Field(\n        15,\n        description=\"Minimum number of peaks to mark frame for indexing\",\n        flag_type=\"--\",\n    )\n    maxPeaks: int = Field(\n        15,\n        description=\"Maximum number of peaks to mark frame for indexing\",\n        flag_type=\"--\",\n    )\n    minRes: int = Field(\n        0,\n        description=\"Minimum peak resolution to mark frame for indexing \",\n        flag_type=\"--\",\n    )\n    sample: str = Field(\"\", description=\"Sample name\", flag_type=\"--\")\n    instrument: Union[None, str] = Field(\n        None, description=\"Instrument name\", 
flag_type=\"--\"\n    )\n    pixelSize: float = Field(0.0, description=\"Pixel size\", flag_type=\"--\")\n    auto: str = Field(\n        \"False\",\n        description=(\n            \"Whether to automatically determine peak per event peak \"\n            \"finding parameters\"\n        ),\n        flag_type=\"--\",\n    )\n    detectorDistance: float = Field(\n        0.0, description=\"Detector distance from interaction point in m\", flag_type=\"--\"\n    )\n    access: Literal[\"ana\", \"ffb\"] = Field(\n        \"ana\", description=\"Data node type: {ana,ffb}\", flag_type=\"--\"\n    )\n    szfile: str = Field(\"qoz.json\", description=\"Path to SZ's JSON configuration file\")\n    lute_template_cfg: TemplateConfig = Field(\n        TemplateConfig(\n            template_name=\"sz.json\",\n            output_path=\"\",  # Will want to change where this goes...\n        ),\n        description=\"Template information for the sz.json file\",\n    )\n    sz_parameters: SZParameters = Field(\n        description=\"Configuration parameters for SZ Compression\", flag_type=\"\"\n    )\n\n    @validator(\"e\", always=True)\n    def validate_e(cls, e: str, values: Dict[str, Any]) -> str:\n        if e == \"\":\n            return values[\"lute_config\"].experiment\n        return e\n\n    @validator(\"r\", always=True)\n    def validate_r(cls, r: int, values: Dict[str, Any]) -> int:\n        if r == -1:\n            return values[\"lute_config\"].run\n        return r\n\n    @validator(\"lute_template_cfg\", always=True)\n    def set_output_path(\n        cls, lute_template_cfg: TemplateConfig, values: Dict[str, Any]\n    ) -> TemplateConfig:\n        if lute_template_cfg.output_path == \"\":\n            lute_template_cfg.output_path = values[\"szfile\"]\n        return lute_template_cfg\n\n    @validator(\"sz_parameters\", always=True)\n    def set_sz_compression_parameters(\n        cls, sz_parameters: SZParameters, values: Dict[str, Any]\n    ) -> None:\n        values[\"compressor\"] = sz_parameters.compressor\n        values[\"binSize\"] = sz_parameters.binSize\n        values[\"roiWindowSize\"] = sz_parameters.roiWindowSize\n        if sz_parameters.compressor == \"qoz\":\n            values[\"pressio_opts\"] = {\n                \"pressio:abs\": sz_parameters.absError,\n                \"qoz\": {\"qoz:stride\": 8},\n            }\n        else:\n            values[\"pressio_opts\"] = {\"pressio:abs\": sz_parameters.absError}\n        return None\n\n    @root_validator(pre=False)\n    def define_result(cls, values: Dict[str, Any]) -> Dict[str, Any]:\n        exp: str = values[\"lute_config\"].experiment\n        run: int = int(values[\"lute_config\"].run)\n        directory: str = values[\"outDir\"]\n        fname: str = f\"{exp}_{run:04d}.lst\"\n\n        cls.Config.result_from_params = f\"{directory}/{fname}\"\n        return values\n
"},{"location":"source/io/models/sfx_find_peaks/#io.models.sfx_find_peaks.FindPeaksPsocakeParameters.Config","title":"Config","text":"

Bases: Config

Source code in lute/io/models/sfx_find_peaks.py
class Config(TaskParameters.Config):\n    set_result: bool = True\n    \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n    result_from_params: str = \"\"\n    \"\"\"Defines a result from the parameters. Use a validator to do so.\"\"\"\n
"},{"location":"source/io/models/sfx_find_peaks/#io.models.sfx_find_peaks.FindPeaksPsocakeParameters.Config.result_from_params","title":"result_from_params: str = '' class-attribute instance-attribute","text":"

Defines a result from the parameters. Use a validator to do so.

"},{"location":"source/io/models/sfx_find_peaks/#io.models.sfx_find_peaks.FindPeaksPsocakeParameters.Config.set_result","title":"set_result: bool = True class-attribute instance-attribute","text":"

Whether the Executor should mark a specified parameter as a result.

"},{"location":"source/io/models/sfx_find_peaks/#io.models.sfx_find_peaks.FindPeaksPyAlgosParameters","title":"FindPeaksPyAlgosParameters","text":"

Bases: TaskParameters

Parameters for crystallographic (Bragg) peak finding using PyAlgos.

This peak finding Task optionally has the ability to compress/decompress data with SZ for the purpose of compression validation.

Source code in lute/io/models/sfx_find_peaks.py
class FindPeaksPyAlgosParameters(TaskParameters):\n    \"\"\"Parameters for crystallographic (Bragg) peak finding using PyAlgos.\n\n    This peak finding Task optionally has the ability to compress/decompress\n    data with SZ for the purpose of compression validation.\n    \"\"\"\n\n    class Config(TaskParameters.Config):\n        set_result: bool = True\n        \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n    class SZCompressorParameters(BaseModel):\n        compressor: Literal[\"qoz\", \"sz3\"] = Field(\n            \"qoz\", description='Compression algorithm (\"qoz\" or \"sz3\")'\n        )\n        abs_error: float = Field(10.0, description=\"Absolute error bound\")\n        bin_size: int = Field(2, description=\"Bin size\")\n        roi_window_size: int = Field(\n            9,\n            description=\"Default window size\",\n        )\n\n    outdir: str = Field(\n        description=\"Output directory for cxi files\",\n    )\n    n_events: int = Field(\n        0,\n        description=\"Number of events to process (0 to process all events)\",\n    )\n    det_name: str = Field(\n        description=\"Psana name of the detector storing the image data\",\n    )\n    event_receiver: Literal[\"evr0\", \"evr1\"] = Field(\n        description=\"Event Receiver to be used: evr0 or evr1\",\n    )\n    tag: str = Field(\n        \"\",\n        description=\"Tag to add to the output file names\",\n    )\n    pv_camera_length: Union[str, float] = Field(\n        \"\",\n        description=\"PV associated with camera length \"\n        \"(if a number, camera length directly)\",\n    )\n    event_logic: bool = Field(\n        False,\n        description=\"True if only events with a specific event code should be \"\n        \"processed. False if the event code should be ignored\",\n    )\n    event_code: int = Field(\n        0,\n        description=\"Required events code for events to be processed if event logic \"\n        \"is True\",\n    )\n    psana_mask: bool = Field(\n        False,\n        description=\"If True, apply mask from psana Detector object\",\n    )\n    mask_file: Union[str, None] = Field(\n        None,\n        description=\"File with a custom mask to apply. 
If None, no custom mask is \"\n        \"applied\",\n    )\n    min_peaks: int = Field(2, description=\"Minimum number of peaks per image\")\n    max_peaks: int = Field(\n        2048,\n        description=\"Maximum number of peaks per image\",\n    )\n    npix_min: int = Field(\n        2,\n        description=\"Minimum number of pixels per peak\",\n    )\n    npix_max: int = Field(\n        30,\n        description=\"Maximum number of pixels per peak\",\n    )\n    amax_thr: float = Field(\n        80.0,\n        description=\"Minimum intensity threshold for starting a peak\",\n    )\n    atot_thr: float = Field(\n        120.0,\n        description=\"Minimum summed intensity threshold for pixel collection\",\n    )\n    son_min: float = Field(\n        7.0,\n        description=\"Minimum signal-to-noise ratio to be considered a peak\",\n    )\n    peak_rank: int = Field(\n        3,\n        description=\"Radius in which central peak pixel is a local maximum\",\n    )\n    r0: float = Field(\n        3.0,\n        description=\"Radius of ring for background evaluation in pixels\",\n    )\n    dr: float = Field(\n        2.0,\n        description=\"Width of ring for background evaluation in pixels\",\n    )\n    nsigm: float = Field(\n        7.0,\n        description=\"Intensity threshold to include pixel in connected group\",\n    )\n    compression: Optional[SZCompressorParameters] = Field(\n        None,\n        description=\"Options for the SZ Compression Algorithm\",\n    )\n    out_file: str = Field(\n        \"\",\n        description=\"Path to output file.\",\n        flag_type=\"-\",\n        rename_param=\"o\",\n        is_result=True,\n    )\n\n    @validator(\"out_file\", always=True)\n    def validate_out_file(cls, out_file: str, values: Dict[str, Any]) -> str:\n        if out_file == \"\":\n            fname: Path = (\n                Path(values[\"outdir\"])\n                / f\"{values['lute_config'].experiment}_{values['lute_config'].run}_\"\n                f\"{values['tag']}.list\"\n            )\n            return str(fname)\n        return out_file\n
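As a rough illustration, a config entry for this Task might look like the sketch below. The parameter names come from the model above, all values are placeholders, the compression block is optional, and out_file is derived from outdir, the experiment, run and tag when left empty.

FindPeaksPyAlgos:
  outdir: "/path/to/output"   # placeholder
  det_name: "DETECTOR_NAME"   # placeholder psana detector name
  event_receiver: "evr0"
  tag: "sample1"              # placeholder
  n_events: 1000
  son_min: 7.0
  compression:                # optional SZ compression block; omit to disable
    compressor: "qoz"
    abs_error: 10.0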
"},{"location":"source/io/models/sfx_find_peaks/#io.models.sfx_find_peaks.FindPeaksPyAlgosParameters.Config","title":"Config","text":"

Bases: Config

Source code in lute/io/models/sfx_find_peaks.py
class Config(TaskParameters.Config):\n    set_result: bool = True\n    \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n
"},{"location":"source/io/models/sfx_find_peaks/#io.models.sfx_find_peaks.FindPeaksPyAlgosParameters.Config.set_result","title":"set_result: bool = True class-attribute instance-attribute","text":"

Whether the Executor should mark a specified parameter as a result.

"},{"location":"source/io/models/sfx_index/","title":"sfx_index","text":"

Models for serial femtosecond crystallography indexing.

Classes:

IndexCrystFELParameters: Perform indexing of hits/peaks using CrystFEL's indexamajig.

"},{"location":"source/io/models/sfx_index/#io.models.sfx_index.ConcatenateStreamFilesParameters","title":"ConcatenateStreamFilesParameters","text":"

Bases: TaskParameters

Parameters for stream concatenation.

Concatenates the stream file output from CrystFEL indexing for multiple experimental runs.

Source code in lute/io/models/sfx_index.py
class ConcatenateStreamFilesParameters(TaskParameters):\n    \"\"\"Parameters for stream concatenation.\n\n    Concatenates the stream file output from CrystFEL indexing for multiple\n    experimental runs.\n    \"\"\"\n\n    class Config(TaskParameters.Config):\n        set_result: bool = True\n        \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n    in_file: str = Field(\n        \"\",\n        description=\"Root of directory tree storing stream files to merge.\",\n    )\n\n    tag: Optional[str] = Field(\n        \"\",\n        description=\"Tag identifying the stream files to merge.\",\n    )\n\n    out_file: str = Field(\n        \"\", description=\"Path to merged output stream file.\", is_result=True\n    )\n\n    @validator(\"in_file\", always=True)\n    def validate_in_file(cls, in_file: str, values: Dict[str, Any]) -> str:\n        if in_file == \"\":\n            stream_file: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\", \"IndexCrystFEL\", \"out_file\"\n            )\n            if stream_file:\n                stream_dir: str = str(Path(stream_file).parent)\n                return stream_dir\n        return in_file\n\n    @validator(\"tag\", always=True)\n    def validate_tag(cls, tag: str, values: Dict[str, Any]) -> str:\n        if tag == \"\":\n            stream_file: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\", \"IndexCrystFEL\", \"out_file\"\n            )\n            if stream_file:\n                stream_tag: str = Path(stream_file).name.split(\"_\")[0]\n                return stream_tag\n        return tag\n\n    @validator(\"out_file\", always=True)\n    def validate_out_file(cls, out_file: str, values: Dict[str, Any]) -> str:\n        if out_file == \"\":\n            # Default: place \"<tag>.stream\" next to the directory of input stream files.\n            stream_out_file: str = str(\n                Path(values[\"in_file\"]).parent / f\"{values['tag']}.stream\"\n            )\n            return stream_out_file\n        return out_file\n
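A minimal sketch of a possible config entry follows. All values are placeholders; as the validators above show, any field left empty is filled in from the most recent IndexCrystFEL database entry.

ConcatenateStreamFiles:
  in_file: "/path/to/stream_dir"       # directory holding the per-run stream files (placeholder)
  tag: "sample1"                       # placeholder
  out_file: "/path/to/sample1.stream"  # placeholder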
"},{"location":"source/io/models/sfx_index/#io.models.sfx_index.ConcatenateStreamFilesParameters.Config","title":"Config","text":"

Bases: Config

Source code in lute/io/models/sfx_index.py
class Config(TaskParameters.Config):\n    set_result: bool = True\n    \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n
"},{"location":"source/io/models/sfx_index/#io.models.sfx_index.ConcatenateStreamFilesParameters.Config.set_result","title":"set_result: bool = True class-attribute instance-attribute","text":"

Whether the Executor should mark a specified parameter as a result.

"},{"location":"source/io/models/sfx_index/#io.models.sfx_index.IndexCrystFELParameters","title":"IndexCrystFELParameters","text":"

Bases: ThirdPartyParameters

Parameters for CrystFEL's indexamajig.

There are many parameters, and many combinations. For more information on usage, please refer to the CrystFEL documentation, here: https://www.desy.de/~twhite/crystfel/manual-indexamajig.html

Source code in lute/io/models/sfx_index.py
class IndexCrystFELParameters(ThirdPartyParameters):\n    \"\"\"Parameters for CrystFEL's `indexamajig`.\n\n    There are many parameters, and many combinations. For more information on\n    usage, please refer to the CrystFEL documentation, here:\n    https://www.desy.de/~twhite/crystfel/manual-indexamajig.html\n    \"\"\"\n\n    class Config(ThirdPartyParameters.Config):\n        set_result: bool = True\n        \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n        long_flags_use_eq: bool = True\n        \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n\n    executable: str = Field(\n        \"/sdf/group/lcls/ds/tools/crystfel/0.10.2/bin/indexamajig\",\n        description=\"CrystFEL's indexing binary.\",\n        flag_type=\"\",\n    )\n    # Basic options\n    in_file: Optional[str] = Field(\n        \"\", description=\"Path to input file.\", flag_type=\"-\", rename_param=\"i\"\n    )\n    out_file: str = Field(\n        \"\",\n        description=\"Path to output file.\",\n        flag_type=\"-\",\n        rename_param=\"o\",\n        is_result=True,\n    )\n    geometry: str = Field(\n        \"\", description=\"Path to geometry file.\", flag_type=\"-\", rename_param=\"g\"\n    )\n    zmq_input: Optional[str] = Field(\n        description=\"ZMQ address to receive data over. `input` and `zmq-input` are mutually exclusive\",\n        flag_type=\"--\",\n        rename_param=\"zmq-input\",\n    )\n    zmq_subscribe: Optional[str] = Field(  # Can be used multiple times...\n        description=\"Subscribe to ZMQ message of type `tag`\",\n        flag_type=\"--\",\n        rename_param=\"zmq-subscribe\",\n    )\n    zmq_request: Optional[AnyUrl] = Field(\n        description=\"Request new data over ZMQ by sending this value\",\n        flag_type=\"--\",\n        rename_param=\"zmq-request\",\n    )\n    asapo_endpoint: Optional[str] = Field(\n        description=\"ASAP::O endpoint. zmq-input and this are mutually exclusive.\",\n        flag_type=\"--\",\n        rename_param=\"asapo-endpoint\",\n    )\n    asapo_token: Optional[str] = Field(\n        description=\"ASAP::O authentication token.\",\n        flag_type=\"--\",\n        rename_param=\"asapo-token\",\n    )\n    asapo_beamtime: Optional[str] = Field(\n        description=\"ASAP::O beatime.\",\n        flag_type=\"--\",\n        rename_param=\"asapo-beamtime\",\n    )\n    asapo_source: Optional[str] = Field(\n        description=\"ASAP::O data source.\",\n        flag_type=\"--\",\n        rename_param=\"asapo-source\",\n    )\n    asapo_group: Optional[str] = Field(\n        description=\"ASAP::O consumer group.\",\n        flag_type=\"--\",\n        rename_param=\"asapo-group\",\n    )\n    asapo_stream: Optional[str] = Field(\n        description=\"ASAP::O stream.\",\n        flag_type=\"--\",\n        rename_param=\"asapo-stream\",\n    )\n    asapo_wait_for_stream: Optional[str] = Field(\n        description=\"If ASAP::O stream does not exist, wait for it to appear.\",\n        flag_type=\"--\",\n        rename_param=\"asapo-wait-for-stream\",\n    )\n    data_format: Optional[str] = Field(\n        description=\"Specify format for ZMQ or ASAP::O. `msgpack`, `hdf5` or `seedee`.\",\n        flag_type=\"--\",\n        rename_param=\"data-format\",\n    )\n    basename: bool = Field(\n        False,\n        description=\"Remove directory parts of filenames. 
Acts before prefix if prefix also given.\",\n        flag_type=\"--\",\n    )\n    prefix: Optional[str] = Field(\n        description=\"Add a prefix to the filenames from the infile argument.\",\n        flag_type=\"--\",\n    )\n    nthreads: PositiveInt = Field(\n        max(int(os.environ.get(\"SLURM_NPROCS\", len(os.sched_getaffinity(0)))) - 1, 1),\n        description=\"Number of threads to use. See also `max_indexer_threads`.\",\n        flag_type=\"-\",\n        rename_param=\"j\",\n    )\n    no_check_prefix: bool = Field(\n        False,\n        description=\"Don't attempt to correct the prefix if it seems incorrect.\",\n        flag_type=\"--\",\n        rename_param=\"no-check-prefix\",\n    )\n    highres: Optional[float] = Field(\n        description=\"Mark all pixels greater than `x` as bad.\", flag_type=\"--\"\n    )\n    profile: bool = Field(\n        False, description=\"Display timing data to monitor performance.\", flag_type=\"--\"\n    )\n    temp_dir: Optional[str] = Field(\n        description=\"Specify a path for the temp files folder.\",\n        flag_type=\"--\",\n        rename_param=\"temp-dir\",\n    )\n    wait_for_file: conint(gt=-2) = Field(\n        0,\n        description=\"Wait at most `x` seconds for a file to be created. A value of -1 means wait forever.\",\n        flag_type=\"--\",\n        rename_param=\"wait-for-file\",\n    )\n    no_image_data: bool = Field(\n        False,\n        description=\"Load only the metadata, no images. Can check indexability without high data requirements.\",\n        flag_type=\"--\",\n        rename_param=\"no-image-data\",\n    )\n    # Peak-finding options\n    # ....\n    # Indexing options\n    indexing: Optional[str] = Field(\n        description=\"Comma-separated list of supported indexing algorithms to use. Default is to automatically detect.\",\n        flag_type=\"--\",\n    )\n    cell_file: Optional[str] = Field(\n        description=\"Path to a file containing unit cell information (PDB or CrystFEL format).\",\n        flag_type=\"-\",\n        rename_param=\"p\",\n    )\n    tolerance: str = Field(\n        \"5,5,5,1.5\",\n        description=(\n            \"Tolerances (in percent) for unit cell comparison. \"\n            \"Comma-separated list a,b,c,angle. Default=5,5,5,1.5\"\n        ),\n        flag_type=\"--\",\n    )\n    no_check_cell: bool = Field(\n        False,\n        description=\"Do not check cell parameters against unit cell. Replaces '-raw' method.\",\n        flag_type=\"--\",\n        rename_param=\"no-check-cell\",\n    )\n    no_check_peaks: bool = Field(\n        False,\n        description=\"Do not verify peaks are accounted for by solution.\",\n        flag_type=\"--\",\n        rename_param=\"no-check-peaks\",\n    )\n    multi: bool = Field(\n        False, description=\"Enable multi-lattice indexing.\", flag_type=\"--\"\n    )\n    wavelength_estimate: Optional[float] = Field(\n        description=\"Estimate for X-ray wavelength. Required for some methods.\",\n        flag_type=\"--\",\n        rename_param=\"wavelength-estimate\",\n    )\n    camera_length_estimate: Optional[float] = Field(\n        description=\"Estimate for camera distance. Required for some methods.\",\n        flag_type=\"--\",\n        rename_param=\"camera-length-estimate\",\n    )\n    max_indexer_threads: Optional[PositiveInt] = Field(\n        # 1,\n        description=\"Some indexing algos can use multiple threads. In addition to image-based.\",\n        flag_type=\"--\",\n        rename_param=\"max-indexer-threads\",\n    )\n    no_retry: bool = Field(\n        False,\n        description=\"Do not remove weak peaks and try again.\",\n        flag_type=\"--\",\n        rename_param=\"no-retry\",\n    )\n    no_refine: bool = Field(\n        False,\n        description=\"Skip refinement step.\",\n        flag_type=\"--\",\n        rename_param=\"no-refine\",\n    )\n    no_revalidate: bool = Field(\n        False,\n        description=\"Skip revalidation step.\",\n        flag_type=\"--\",\n        rename_param=\"no-revalidate\",\n    )\n    # TakeTwo specific parameters\n    taketwo_member_threshold: Optional[PositiveInt] = Field(\n        # 20,\n        description=\"Minimum number of vectors to consider.\",\n        flag_type=\"--\",\n        rename_param=\"taketwo-member-threshold\",\n    )\n    taketwo_len_tolerance: Optional[PositiveFloat] = Field(\n        # 0.001,\n        description=\"TakeTwo length tolerance in Angstroms.\",\n        flag_type=\"--\",\n        rename_param=\"taketwo-len-tolerance\",\n    )\n    taketwo_angle_tolerance: Optional[PositiveFloat] = Field(\n        # 0.6,\n        description=\"TakeTwo angle tolerance in degrees.\",\n        flag_type=\"--\",\n        rename_param=\"taketwo-angle-tolerance\",\n    )\n    taketwo_trace_tolerance: Optional[PositiveFloat] = Field(\n        # 3,\n        description=\"Matrix trace tolerance in degrees.\",\n        flag_type=\"--\",\n        rename_param=\"taketwo-trace-tolerance\",\n    )\n    # Felix-specific parameters\n    # felix_domega\n    # felix-fraction-max-visits\n    # felix-max-internal-angle\n    # felix-max-uniqueness\n    # felix-min-completeness\n    # felix-min-visits\n    # felix-num-voxels\n    # felix-sigma\n    # felix-tthrange-max\n    # felix-tthrange-min\n    # XGANDALF-specific parameters\n    xgandalf_sampling_pitch: Optional[NonNegativeInt] = Field(\n        # 6,\n        description=\"Density of reciprocal space sampling.\",\n        flag_type=\"--\",\n        rename_param=\"xgandalf-sampling-pitch\",\n    )\n    xgandalf_grad_desc_iterations: Optional[NonNegativeInt] = Field(\n        # 4,\n        description=\"Number of gradient descent iterations.\",\n        flag_type=\"--\",\n        rename_param=\"xgandalf-grad-desc-iterations\",\n    )\n    xgandalf_tolerance: Optional[PositiveFloat] = Field(\n        # 0.02,\n        description=\"Relative tolerance of lattice vectors\",\n        flag_type=\"--\",\n        rename_param=\"xgandalf-tolerance\",\n    )\n    xgandalf_no_deviation_from_provided_cell: Optional[bool] = Field(\n        description=\"Found unit cell must match provided.\",\n        flag_type=\"--\",\n        rename_param=\"xgandalf-no-deviation-from-provided-cell\",\n    )\n    xgandalf_min_lattice_vector_length: Optional[PositiveFloat] = Field(\n        # 30,\n        description=\"Minimum possible lattice length.\",\n        flag_type=\"--\",\n        rename_param=\"xgandalf-min-lattice-vector-length\",\n    )\n    xgandalf_max_lattice_vector_length: Optional[PositiveFloat] = Field(\n        # 250,\n        description=\"Maximum possible lattice length.\",\n        flag_type=\"--\",\n        rename_param=\"xgandalf-max-lattice-vector-length\",\n    )\n    xgandalf_max_peaks: Optional[PositiveInt] = Field(\n        # 250,\n        description=\"Maximum number of peaks to use for indexing.\",\n        flag_type=\"--\",\n        rename_param=\"xgandalf-max-peaks\",\n    )\n    
xgandalf_fast_execution: bool = Field(\n        False,\n        description=\"Shortcut to set sampling-pitch=2, and grad-desc-iterations=3.\",\n        flag_type=\"--\",\n        rename_param=\"xgandalf-fast-execution\",\n    )\n    # pinkIndexer parameters\n    # ...\n    # asdf_fast: bool = Field(False, description=\"Enable fast mode for asdf. 3x faster for 7% loss in accuracy.\", flag_type=\"--\", rename_param=\"asdf-fast\")\n    # Integration parameters\n    integration: str = Field(\n        \"rings-nocen\", description=\"Method for integrating reflections.\", flag_type=\"--\"\n    )\n    fix_profile_radius: Optional[float] = Field(\n        description=\"Fix the profile radius (m^{-1})\",\n        flag_type=\"--\",\n        rename_param=\"fix-profile-radius\",\n    )\n    fix_divergence: Optional[float] = Field(\n        0,\n        description=\"Fix the divergence (rad, full angle).\",\n        flag_type=\"--\",\n        rename_param=\"fix-divergence\",\n    )\n    int_radius: str = Field(\n        \"4,5,7\",\n        description=\"Inner, middle, and outer radii for 3-ring integration.\",\n        flag_type=\"--\",\n        rename_param=\"int-radius\",\n    )\n    int_diag: str = Field(\n        \"none\",\n        description=\"Show detailed information on integration when condition is met.\",\n        flag_type=\"--\",\n        rename_param=\"int-diag\",\n    )\n    push_res: str = Field(\n        \"infinity\",\n        description=\"Integrate `x` higher than apparent resolution limit (nm-1).\",\n        flag_type=\"--\",\n        rename_param=\"push-res\",\n    )\n    overpredict: bool = Field(\n        False,\n        description=\"Over-predict reflections. Maybe useful with post-refinement.\",\n        flag_type=\"--\",\n    )\n    cell_parameters_only: bool = Field(\n        False, description=\"Do not predict refletions at all\", flag_type=\"--\"\n    )\n    # Output parameters\n    no_non_hits_in_stream: bool = Field(\n        False,\n        description=\"Exclude non-hits from the stream file.\",\n        flag_type=\"--\",\n        rename_param=\"no-non-hits-in-stream\",\n    )\n    copy_hheader: Optional[str] = Field(\n        description=\"Copy information from header in the image to output stream.\",\n        flag_type=\"--\",\n        rename_param=\"copy-hheader\",\n    )\n    no_peaks_in_stream: bool = Field(\n        False,\n        description=\"Do not record peaks in stream file.\",\n        flag_type=\"--\",\n        rename_param=\"no-peaks-in-stream\",\n    )\n    no_refls_in_stream: bool = Field(\n        False,\n        description=\"Do not record reflections in stream.\",\n        flag_type=\"--\",\n        rename_param=\"no-refls-in-stream\",\n    )\n    serial_offset: Optional[PositiveInt] = Field(\n        description=\"Start numbering at `x` instead of 1.\",\n        flag_type=\"--\",\n        rename_param=\"serial-offset\",\n    )\n    harvest_file: Optional[str] = Field(\n        description=\"Write parameters to file in JSON format.\",\n        flag_type=\"--\",\n        rename_param=\"harvest-file\",\n    )\n\n    @validator(\"in_file\", always=True)\n    def validate_in_file(cls, in_file: str, values: Dict[str, Any]) -> str:\n        if in_file == \"\":\n            filename: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\", \"FindPeaksPyAlgos\", \"out_file\"\n            )\n            if filename is None:\n                exp: str = values[\"lute_config\"].experiment\n                run: int = 
int(values[\"lute_config\"].run)\n                tag: Optional[str] = read_latest_db_entry(\n                    f\"{values['lute_config'].work_dir}\", \"FindPeaksPsocake\", \"tag\"\n                )\n                out_dir: Optional[str] = read_latest_db_entry(\n                    f\"{values['lute_config'].work_dir}\", \"FindPeaksPsocake\", \"outDir\"\n                )\n                if out_dir is not None:\n                    fname: str = f\"{out_dir}/{exp}_{run:04d}\"\n                    if tag is not None:\n                        fname = f\"{fname}_{tag}\"\n                    return f\"{fname}.lst\"\n            else:\n                return filename\n        return in_file\n\n    @validator(\"out_file\", always=True)\n    def validate_out_file(cls, out_file: str, values: Dict[str, Any]) -> str:\n        if out_file == \"\":\n            expmt: str = values[\"lute_config\"].experiment\n            run: int = int(values[\"lute_config\"].run)\n            work_dir: str = values[\"lute_config\"].work_dir\n            fname: str = f\"{expmt}_r{run:04d}.stream\"\n            return f\"{work_dir}/{fname}\"\n        return out_file\n
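Only a handful of the many parameters are usually needed. A rough sketch of a config entry is shown below; the parameter names come from the model above, all values are placeholders, and in_file/out_file are derived from earlier Tasks and the work_dir when left empty, as the validators above show.

IndexCrystFEL:
  geometry: "/path/to/detector.geom"   # placeholder
  cell_file: "/path/to/unit_cell.pdb"  # optional; placeholder
  indexing: "xgandalf"
  multi: true
  int_radius: "4,5,7"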
"},{"location":"source/io/models/sfx_index/#io.models.sfx_index.IndexCrystFELParameters.Config","title":"Config","text":"

Bases: Config

Source code in lute/io/models/sfx_index.py
class Config(ThirdPartyParameters.Config):\n    set_result: bool = True\n    \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n    long_flags_use_eq: bool = True\n    \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n
"},{"location":"source/io/models/sfx_index/#io.models.sfx_index.IndexCrystFELParameters.Config.long_flags_use_eq","title":"long_flags_use_eq: bool = True class-attribute instance-attribute","text":"

Whether long command-line arguments are passed like --long=arg.

"},{"location":"source/io/models/sfx_index/#io.models.sfx_index.IndexCrystFELParameters.Config.set_result","title":"set_result: bool = True class-attribute instance-attribute","text":"

Whether the Executor should mark a specified parameter as a result.

"},{"location":"source/io/models/sfx_merge/","title":"sfx_merge","text":"

Models for merging reflections in serial femtosecond crystallography.

Classes:

MergePartialatorParameters: Perform merging using CrystFEL's partialator.

CompareHKLParameters: Calculate figures of merit using CrystFEL's compare_hkl.

ManipulateHKLParameters: Perform transformations on lists of reflections using CrystFEL's get_hkl.

"},{"location":"source/io/models/sfx_merge/#io.models.sfx_merge.CompareHKLParameters","title":"CompareHKLParameters","text":"

Bases: ThirdPartyParameters

Parameters for CrystFEL's compare_hkl for calculating figures of merit.

There are many parameters, and many combinations. For more information on usage, please refer to the CrystFEL documentation, here: https://www.desy.de/~twhite/crystfel/manual-partialator.html

Source code in lute/io/models/sfx_merge.py
class CompareHKLParameters(ThirdPartyParameters):\n    \"\"\"Parameters for CrystFEL's `compare_hkl` for calculating figures of merit.\n\n    There are many parameters, and many combinations. For more information on\n    usage, please refer to the CrystFEL documentation, here:\n    https://www.desy.de/~twhite/crystfel/manual-partialator.html\n    \"\"\"\n\n    class Config(ThirdPartyParameters.Config):\n        long_flags_use_eq: bool = True\n        \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n\n        set_result: bool = True\n        \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n    executable: str = Field(\n        \"/sdf/group/lcls/ds/tools/crystfel/0.10.2/bin/compare_hkl\",\n        description=\"CrystFEL's reflection comparison binary.\",\n        flag_type=\"\",\n    )\n    in_files: Optional[str] = Field(\n        \"\",\n        description=\"Path to input HKLs. Space-separated list of 2. Use output of partialator e.g.\",\n        flag_type=\"\",\n    )\n    ## Need mechanism to set is_result=True ...\n    symmetry: str = Field(\"\", description=\"Point group symmetry.\", flag_type=\"--\")\n    cell_file: str = Field(\n        \"\",\n        description=\"Path to a file containing unit cell information (PDB or CrystFEL format).\",\n        flag_type=\"-\",\n        rename_param=\"p\",\n    )\n    fom: str = Field(\n        \"Rsplit\", description=\"Specify figure of merit to calculate.\", flag_type=\"--\"\n    )\n    nshells: int = Field(10, description=\"Use n resolution shells.\", flag_type=\"--\")\n    # NEED A NEW CASE FOR THIS -> Boolean flag, no arg, one hyphen...\n    # fix_unity: bool = Field(\n    #    False,\n    #    description=\"Fix scale factors to unity.\",\n    #    flag_type=\"-\",\n    #    rename_param=\"u\",\n    # )\n    shell_file: str = Field(\n        \"\",\n        description=\"Write the statistics in resolution shells to a file.\",\n        flag_type=\"--\",\n        rename_param=\"shell-file\",\n        is_result=True,\n    )\n    ignore_negs: bool = Field(\n        False,\n        description=\"Ignore reflections with negative intensities.\",\n        flag_type=\"--\",\n        rename_param=\"ignore-negs\",\n    )\n    zero_negs: bool = Field(\n        False,\n        description=\"Set negative intensities to 0.\",\n        flag_type=\"--\",\n        rename_param=\"zero-negs\",\n    )\n    sigma_cutoff: Optional[Union[float, int, str]] = Field(\n        # \"-infinity\",\n        description=\"Discard reflections with I/sigma(I) < n. -infinity means no cutoff.\",\n        flag_type=\"--\",\n        rename_param=\"sigma-cutoff\",\n    )\n    rmin: Optional[float] = Field(\n        description=\"Low resolution cutoff of 1/d (m-1). Use this or --lowres NOT both.\",\n        flag_type=\"--\",\n    )\n    lowres: Optional[float] = Field(\n        description=\"Low resolution cutoff in Angstroms. Use this or --rmin NOT both.\",\n        flag_type=\"--\",\n    )\n    rmax: Optional[float] = Field(\n        description=\"High resolution cutoff in 1/d (m-1). Use this or --highres NOT both.\",\n        flag_type=\"--\",\n    )\n    highres: Optional[float] = Field(\n        description=\"High resolution cutoff in Angstroms. 
Use this or --rmax NOT both.\",\n        flag_type=\"--\",\n    )\n\n    @validator(\"in_files\", always=True)\n    def validate_in_files(cls, in_files: str, values: Dict[str, Any]) -> str:\n        if in_files == \"\":\n            partialator_file: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\", \"MergePartialator\", \"out_file\"\n            )\n            if partialator_file:\n                hkls: str = f\"{partialator_file}1 {partialator_file}2\"\n                return hkls\n        return in_files\n\n    @validator(\"cell_file\", always=True)\n    def validate_cell_file(cls, cell_file: str, values: Dict[str, Any]) -> str:\n        if cell_file == \"\":\n            idx_cell_file: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\",\n                \"IndexCrystFEL\",\n                \"cell_file\",\n                valid_only=False,\n            )\n            if idx_cell_file:\n                return idx_cell_file\n        return cell_file\n\n    @validator(\"symmetry\", always=True)\n    def validate_symmetry(cls, symmetry: str, values: Dict[str, Any]) -> str:\n        if symmetry == \"\":\n            partialator_sym: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\", \"MergePartialator\", \"symmetry\"\n            )\n            if partialator_sym:\n                return partialator_sym\n        return symmetry\n\n    @validator(\"shell_file\", always=True)\n    def validate_shell_file(cls, shell_file: str, values: Dict[str, Any]) -> str:\n        if shell_file == \"\":\n            partialator_file: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\", \"MergePartialator\", \"out_file\"\n            )\n            if partialator_file:\n                shells_out: str = partialator_file.split(\".\")[0]\n                shells_out = f\"{shells_out}_{values['fom']}_n{values['nshells']}.dat\"\n                return shells_out\n        return shell_file\n
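A possible config entry is sketched below. The managed Task name CompareHKL is assumed from the class name and is not confirmed here; in_files, symmetry, cell_file and shell_file are filled in from earlier MergePartialator/IndexCrystFEL results when left empty.

CompareHKL:        # assumed Task name
  fom: "Rsplit"
  nshells: 10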
"},{"location":"source/io/models/sfx_merge/#io.models.sfx_merge.CompareHKLParameters.Config","title":"Config","text":"

Bases: Config

Source code in lute/io/models/sfx_merge.py
class Config(ThirdPartyParameters.Config):\n    long_flags_use_eq: bool = True\n    \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n\n    set_result: bool = True\n    \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n
"},{"location":"source/io/models/sfx_merge/#io.models.sfx_merge.CompareHKLParameters.Config.long_flags_use_eq","title":"long_flags_use_eq: bool = True class-attribute instance-attribute","text":"

Whether long command-line arguments are passed like --long=arg.

"},{"location":"source/io/models/sfx_merge/#io.models.sfx_merge.CompareHKLParameters.Config.set_result","title":"set_result: bool = True class-attribute instance-attribute","text":"

Whether the Executor should mark a specified parameter as a result.

"},{"location":"source/io/models/sfx_merge/#io.models.sfx_merge.ManipulateHKLParameters","title":"ManipulateHKLParameters","text":"

Bases: ThirdPartyParameters

Parameters for CrystFEL's get_hkl for manipulating lists of reflections.

This Task is predominantly used internally to convert hkl to mtz files. Note that performing multiple manipulations is undefined behaviour. Run the Task with multiple configurations in explicit separate steps. For more information on usage, please refer to the CrystFEL documentation, here: https://www.desy.de/~twhite/crystfel/manual-partialator.html

Source code in lute/io/models/sfx_merge.py
class ManipulateHKLParameters(ThirdPartyParameters):\n    \"\"\"Parameters for CrystFEL's `get_hkl` for manipulating lists of reflections.\n\n    This Task is predominantly used internally to convert `hkl` to `mtz` files.\n    Note that performing multiple manipulations is undefined behaviour. Run\n    the Task with multiple configurations in explicit separate steps. For more\n    information on usage, please refer to the CrystFEL documentation, here:\n    https://www.desy.de/~twhite/crystfel/manual-partialator.html\n    \"\"\"\n\n    class Config(ThirdPartyParameters.Config):\n        long_flags_use_eq: bool = True\n        \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n\n        set_result: bool = True\n        \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n    executable: str = Field(\n        \"/sdf/group/lcls/ds/tools/crystfel/0.10.2/bin/get_hkl\",\n        description=\"CrystFEL's reflection manipulation binary.\",\n        flag_type=\"\",\n    )\n    in_file: str = Field(\n        \"\",\n        description=\"Path to input HKL file.\",\n        flag_type=\"-\",\n        rename_param=\"i\",\n    )\n    out_file: str = Field(\n        \"\",\n        description=\"Path to output file.\",\n        flag_type=\"-\",\n        rename_param=\"o\",\n        is_result=True,\n    )\n    cell_file: str = Field(\n        \"\",\n        description=\"Path to a file containing unit cell information (PDB or CrystFEL format).\",\n        flag_type=\"-\",\n        rename_param=\"p\",\n    )\n    output_format: str = Field(\n        \"mtz\",\n        description=\"Output format. One of mtz, mtz-bij, or xds. Otherwise CrystFEL format.\",\n        flag_type=\"--\",\n        rename_param=\"output-format\",\n    )\n    expand: Optional[str] = Field(\n        description=\"Reflections will be expanded to fill asymmetric unit of specified point group.\",\n        flag_type=\"--\",\n    )\n    # Reducing reflections to higher symmetry\n    twin: Optional[str] = Field(\n        description=\"Reflections equivalent to specified point group will have intensities summed.\",\n        flag_type=\"--\",\n    )\n    no_need_all_parts: Optional[bool] = Field(\n        description=\"Use with --twin to allow reflections missing a 'twin mate' to be written out.\",\n        flag_type=\"--\",\n        rename_param=\"no-need-all-parts\",\n    )\n    # Noise - Add to data\n    noise: Optional[bool] = Field(\n        description=\"Generate 10% uniform noise.\", flag_type=\"--\"\n    )\n    poisson: Optional[bool] = Field(\n        description=\"Generate Poisson noise. Intensities assumed to be A.U.\",\n        flag_type=\"--\",\n    )\n    adu_per_photon: Optional[int] = Field(\n        description=\"Use with --poisson to convert A.U. 
to photons.\",\n        flag_type=\"--\",\n        rename_param=\"adu-per-photon\",\n    )\n    # Remove duplicate reflections\n    trim_centrics: Optional[bool] = Field(\n        description=\"Duplicated reflections (according to symmetry) are removed.\",\n        flag_type=\"--\",\n    )\n    # Restrict to template file\n    template: Optional[str] = Field(\n        description=\"Only reflections which also appear in specified file are written out.\",\n        flag_type=\"--\",\n    )\n    # Multiplicity\n    multiplicity: Optional[bool] = Field(\n        description=\"Reflections are multiplied by their symmetric multiplicities.\",\n        flag_type=\"--\",\n    )\n    # Resolution cutoffs\n    cutoff_angstroms: Optional[Union[str, int, float]] = Field(\n        description=\"Either n, or n1,n2,n3. For n, reflections < n are removed. For n1,n2,n3 anisotropic truncation performed at separate resolution limits for a*, b*, c*.\",\n        flag_type=\"--\",\n        rename_param=\"cutoff-angstroms\",\n    )\n    lowres: Optional[float] = Field(\n        description=\"Remove reflections with d > n\", flag_type=\"--\"\n    )\n    highres: Optional[float] = Field(\n        description=\"Synonym for first form of --cutoff-angstroms\"\n    )\n    reindex: Optional[str] = Field(\n        description=\"Reindex according to specified operator. E.g. k,h,-l.\",\n        flag_type=\"--\",\n    )\n    # Override input symmetry\n    symmetry: Optional[str] = Field(\n        description=\"Point group symmetry to use to override. Almost always OMIT this option.\",\n        flag_type=\"--\",\n    )\n\n    @validator(\"in_file\", always=True)\n    def validate_in_file(cls, in_file: str, values: Dict[str, Any]) -> str:\n        if in_file == \"\":\n            partialator_file: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\", \"MergePartialator\", \"out_file\"\n            )\n            if partialator_file:\n                return partialator_file\n        return in_file\n\n    @validator(\"out_file\", always=True)\n    def validate_out_file(cls, out_file: str, values: Dict[str, Any]) -> str:\n        if out_file == \"\":\n            partialator_file: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\", \"MergePartialator\", \"out_file\"\n            )\n            if partialator_file:\n                mtz_out: str = partialator_file.split(\".\")[0]\n                mtz_out = f\"{mtz_out}.mtz\"\n                return mtz_out\n        return out_file\n\n    @validator(\"cell_file\", always=True)\n    def validate_cell_file(cls, cell_file: str, values: Dict[str, Any]) -> str:\n        if cell_file == \"\":\n            idx_cell_file: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\",\n                \"IndexCrystFEL\",\n                \"cell_file\",\n                valid_only=False,\n            )\n            if idx_cell_file:\n                return idx_cell_file\n        return cell_file\n
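A possible config entry is sketched below; as the validators show, in_file, out_file and cell_file default to the latest MergePartialator and IndexCrystFEL results when left empty. The key ManipulateHKL matches the Task name referenced by downstream validators in this documentation.

ManipulateHKL:
  output_format: "mtz"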
"},{"location":"source/io/models/sfx_merge/#io.models.sfx_merge.ManipulateHKLParameters.Config","title":"Config","text":"

Bases: Config

Source code in lute/io/models/sfx_merge.py
class Config(ThirdPartyParameters.Config):\n    long_flags_use_eq: bool = True\n    \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n\n    set_result: bool = True\n    \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n
"},{"location":"source/io/models/sfx_merge/#io.models.sfx_merge.ManipulateHKLParameters.Config.long_flags_use_eq","title":"long_flags_use_eq: bool = True class-attribute instance-attribute","text":"

Whether long command-line arguments are passed like --long=arg.

"},{"location":"source/io/models/sfx_merge/#io.models.sfx_merge.ManipulateHKLParameters.Config.set_result","title":"set_result: bool = True class-attribute instance-attribute","text":"

Whether the Executor should mark a specified parameter as a result.

"},{"location":"source/io/models/sfx_merge/#io.models.sfx_merge.MergePartialatorParameters","title":"MergePartialatorParameters","text":"

Bases: ThirdPartyParameters

Parameters for CrystFEL's partialator.

There are many parameters, and many combinations. For more information on usage, please refer to the CrystFEL documentation, here: https://www.desy.de/~twhite/crystfel/manual-partialator.html

Source code in lute/io/models/sfx_merge.py
class MergePartialatorParameters(ThirdPartyParameters):\n    \"\"\"Parameters for CrystFEL's `partialator`.\n\n    There are many parameters, and many combinations. For more information on\n    usage, please refer to the CrystFEL documentation, here:\n    https://www.desy.de/~twhite/crystfel/manual-partialator.html\n    \"\"\"\n\n    class Config(ThirdPartyParameters.Config):\n        long_flags_use_eq: bool = True\n        \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n\n        set_result: bool = True\n        \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n    executable: str = Field(\n        \"/sdf/group/lcls/ds/tools/crystfel/0.10.2/bin/partialator\",\n        description=\"CrystFEL's Partialator binary.\",\n        flag_type=\"\",\n    )\n    in_file: Optional[str] = Field(\n        \"\", description=\"Path to input stream.\", flag_type=\"-\", rename_param=\"i\"\n    )\n    out_file: str = Field(\n        \"\",\n        description=\"Path to output file.\",\n        flag_type=\"-\",\n        rename_param=\"o\",\n        is_result=True,\n    )\n    symmetry: str = Field(description=\"Point group symmetry.\", flag_type=\"--\")\n    niter: Optional[int] = Field(\n        description=\"Number of cycles of scaling and post-refinement.\",\n        flag_type=\"-\",\n        rename_param=\"n\",\n    )\n    no_scale: Optional[bool] = Field(\n        description=\"Disable scaling.\", flag_type=\"--\", rename_param=\"no-scale\"\n    )\n    no_Bscale: Optional[bool] = Field(\n        description=\"Disable Debye-Waller part of scaling.\",\n        flag_type=\"--\",\n        rename_param=\"no-Bscale\",\n    )\n    no_pr: Optional[bool] = Field(\n        description=\"Disable orientation model.\", flag_type=\"--\", rename_param=\"no-pr\"\n    )\n    no_deltacchalf: Optional[bool] = Field(\n        description=\"Disable rejection based on deltaCC1/2.\",\n        flag_type=\"--\",\n        rename_param=\"no-deltacchalf\",\n    )\n    model: str = Field(\n        \"unity\",\n        description=\"Partiality model. Options: xsphere, unity, offset, ggpm.\",\n        flag_type=\"--\",\n    )\n    nthreads: int = Field(\n        max(int(os.environ.get(\"SLURM_NPROCS\", len(os.sched_getaffinity(0)))) - 1, 1),\n        description=\"Number of parallel analyses.\",\n        flag_type=\"-\",\n        rename_param=\"j\",\n    )\n    polarisation: Optional[str] = Field(\n        description=\"Specification of incident polarisation. 
Refer to CrystFEL docs for more info.\",\n        flag_type=\"--\",\n    )\n    no_polarisation: Optional[bool] = Field(\n        description=\"Synonym for --polarisation=none\",\n        flag_type=\"--\",\n        rename_param=\"no-polarisation\",\n    )\n    max_adu: Optional[float] = Field(\n        description=\"Maximum intensity of reflection to include.\",\n        flag_type=\"--\",\n        rename_param=\"max-adu\",\n    )\n    min_res: Optional[float] = Field(\n        description=\"Only include crystals diffracting to a minimum resolution.\",\n        flag_type=\"--\",\n        rename_param=\"min-res\",\n    )\n    min_measurements: int = Field(\n        2,\n        description=\"Include a reflection only if it appears a minimum number of times.\",\n        flag_type=\"--\",\n        rename_param=\"min-measurements\",\n    )\n    push_res: Optional[float] = Field(\n        description=\"Merge reflections up to higher than the apparent resolution limit.\",\n        flag_type=\"--\",\n        rename_param=\"push-res\",\n    )\n    start_after: int = Field(\n        0,\n        description=\"Ignore the first n crystals.\",\n        flag_type=\"--\",\n        rename_param=\"start-after\",\n    )\n    stop_after: int = Field(\n        0,\n        description=\"Stop after processing n crystals. 0 means process all.\",\n        flag_type=\"--\",\n        rename_param=\"stop-after\",\n    )\n    no_free: Optional[bool] = Field(\n        description=\"Disable cross-validation. Testing ONLY.\",\n        flag_type=\"--\",\n        rename_param=\"no-free\",\n    )\n    custom_split: Optional[str] = Field(\n        description=\"Read a set of filenames, event and dataset IDs from a filename.\",\n        flag_type=\"--\",\n        rename_param=\"custom-split\",\n    )\n    max_rel_B: float = Field(\n        100,\n        description=\"Reject crystals if |relB| > n sq Angstroms.\",\n        flag_type=\"--\",\n        rename_param=\"max-rel-B\",\n    )\n    output_every_cycle: bool = Field(\n        False,\n        description=\"Write per-crystal params after every refinement cycle.\",\n        flag_type=\"--\",\n        rename_param=\"output-every-cycle\",\n    )\n    no_logs: bool = Field(\n        False,\n        description=\"Do not write logs needed for plots, maps and graphs.\",\n        flag_type=\"--\",\n        rename_param=\"no-logs\",\n    )\n    set_symmetry: Optional[str] = Field(\n        description=\"Set the apparent symmetry of the crystals to a point group.\",\n        flag_type=\"-\",\n        rename_param=\"w\",\n    )\n    operator: Optional[str] = Field(\n        description=\"Specify an ambiguity operator. E.g. k,h,-l.\", flag_type=\"--\"\n    )\n    force_bandwidth: Optional[float] = Field(\n        description=\"Set X-ray bandwidth. As percent, e.g. 0.0013 (0.13%).\",\n        flag_type=\"--\",\n        rename_param=\"force-bandwidth\",\n    )\n    force_radius: Optional[float] = Field(\n        description=\"Set the initial profile radius (nm-1).\",\n        flag_type=\"--\",\n        rename_param=\"force-radius\",\n    )\n    force_lambda: Optional[float] = Field(\n        description=\"Set the wavelength. 
In Angstroms.\",\n        flag_type=\"--\",\n        rename_param=\"force-lambda\",\n    )\n    harvest_file: Optional[str] = Field(\n        description=\"Write parameters to file in JSON format.\",\n        flag_type=\"--\",\n        rename_param=\"harvest-file\",\n    )\n\n    @validator(\"in_file\", always=True)\n    def validate_in_file(cls, in_file: str, values: Dict[str, Any]) -> str:\n        if in_file == \"\":\n            stream_file: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\",\n                \"ConcatenateStreamFiles\",\n                \"out_file\",\n            )\n            if stream_file:\n                return stream_file\n        return in_file\n\n    @validator(\"out_file\", always=True)\n    def validate_out_file(cls, out_file: str, values: Dict[str, Any]) -> str:\n        if out_file == \"\":\n            in_file: str = values[\"in_file\"]\n            if in_file:\n                tag: str = in_file.split(\".\")[0]\n                return f\"{tag}.hkl\"\n            else:\n                return \"partialator.hkl\"\n        return out_file\n
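A rough sketch of a config entry follows. The symmetry shown is a placeholder and must match your crystal; in_file defaults to the latest ConcatenateStreamFiles output, and out_file defaults to the input name with an .hkl extension.

MergePartialator:
  symmetry: "4/mmm"   # placeholder point group
  model: "unity"
  niter: 3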
"},{"location":"source/io/models/sfx_merge/#io.models.sfx_merge.MergePartialatorParameters.Config","title":"Config","text":"

Bases: Config

Source code in lute/io/models/sfx_merge.py
class Config(ThirdPartyParameters.Config):\n    long_flags_use_eq: bool = True\n    \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n\n    set_result: bool = True\n    \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n
"},{"location":"source/io/models/sfx_merge/#io.models.sfx_merge.MergePartialatorParameters.Config.long_flags_use_eq","title":"long_flags_use_eq: bool = True class-attribute instance-attribute","text":"

Whether long command-line arguments are passed like --long=arg.

"},{"location":"source/io/models/sfx_merge/#io.models.sfx_merge.MergePartialatorParameters.Config.set_result","title":"set_result: bool = True class-attribute instance-attribute","text":"

Whether the Executor should mark a specified parameter as a result.

"},{"location":"source/io/models/sfx_solve/","title":"sfx_solve","text":"

Models for structure solution in serial femtosecond crystallography.

Classes:

DimpleSolveParameters: Perform structure solution using CCP4's dimple (molecular replacement).

"},{"location":"source/io/models/sfx_solve/#io.models.sfx_solve.DimpleSolveParameters","title":"DimpleSolveParameters","text":"

Bases: ThirdPartyParameters

Parameters for CCP4's dimple program.

There are many parameters. For more information on usage, please refer to the CCP4 documentation, here: https://ccp4.github.io/dimple/

Source code in lute/io/models/sfx_solve.py
class DimpleSolveParameters(ThirdPartyParameters):\n    \"\"\"Parameters for CCP4's dimple program.\n\n    There are many parameters. For more information on\n    usage, please refer to the CCP4 documentation, here:\n    https://ccp4.github.io/dimple/\n    \"\"\"\n\n    executable: str = Field(\n        \"/sdf/group/lcls/ds/tools/ccp4-8.0/bin/dimple\",\n        description=\"CCP4 Dimple for solving structures with MR.\",\n        flag_type=\"\",\n    )\n    # Positional requirements - all required.\n    in_file: str = Field(\n        \"\",\n        description=\"Path to input mtz.\",\n        flag_type=\"\",\n    )\n    pdb: str = Field(\"\", description=\"Path to a PDB.\", flag_type=\"\")\n    out_dir: str = Field(\"\", description=\"Output DIRECTORY.\", flag_type=\"\")\n    # Most used options\n    mr_thresh: PositiveFloat = Field(\n        0.4,\n        description=\"Threshold for molecular replacement.\",\n        flag_type=\"--\",\n        rename_param=\"mr-when-r\",\n    )\n    slow: Optional[bool] = Field(\n        False, description=\"Perform more refinement.\", flag_type=\"--\"\n    )\n    # Other options (IO)\n    hklout: str = Field(\n        \"final.mtz\", description=\"Output mtz file name.\", flag_type=\"--\"\n    )\n    xyzout: str = Field(\n        \"final.pdb\", description=\"Output PDB file name.\", flag_type=\"--\"\n    )\n    icolumn: Optional[str] = Field(\n        # \"IMEAN\",\n        description=\"Name for the I column.\",\n        flag_type=\"--\",\n    )\n    sigicolumn: Optional[str] = Field(\n        # \"SIG<ICOL>\",\n        description=\"Name for the Sig<I> column.\",\n        flag_type=\"--\",\n    )\n    fcolumn: Optional[str] = Field(\n        # \"F\",\n        description=\"Name for the F column.\",\n        flag_type=\"--\",\n    )\n    sigfcolumn: Optional[str] = Field(\n        # \"F\",\n        description=\"Name for the Sig<F> column.\",\n        flag_type=\"--\",\n    )\n    libin: Optional[str] = Field(\n        description=\"Ligand descriptions for refmac (LIBIN).\", flag_type=\"--\"\n    )\n    refmac_key: Optional[str] = Field(\n        description=\"Extra Refmac keywords to use in refinement.\",\n        flag_type=\"--\",\n        rename_param=\"refmac-key\",\n    )\n    free_r_flags: Optional[str] = Field(\n        description=\"Path to a mtz file with freeR flags.\",\n        flag_type=\"--\",\n        rename_param=\"free-r-flags\",\n    )\n    freecolumn: Optional[Union[int, float]] = Field(\n        # 0,\n        description=\"Refree column with an optional value.\",\n        flag_type=\"--\",\n    )\n    img_format: Optional[str] = Field(\n        description=\"Format of generated images. 
(png, jpeg, none).\",\n        flag_type=\"-\",\n        rename_param=\"f\",\n    )\n    white_bg: bool = Field(\n        False,\n        description=\"Use a white background in Coot and in images.\",\n        flag_type=\"--\",\n        rename_param=\"white-bg\",\n    )\n    no_cleanup: bool = Field(\n        False,\n        description=\"Retain intermediate files.\",\n        flag_type=\"--\",\n        rename_param=\"no-cleanup\",\n    )\n    # Calculations\n    no_blob_search: bool = Field(\n        False,\n        description=\"Do not search for unmodelled blobs.\",\n        flag_type=\"--\",\n        rename_param=\"no-blob-search\",\n    )\n    anode: bool = Field(\n        False, description=\"Use SHELX/AnoDe to find peaks in the anomalous map.\"\n    )\n    # Run customization\n    no_hetatm: bool = Field(\n        False,\n        description=\"Remove heteroatoms from the given model.\",\n        flag_type=\"--\",\n        rename_param=\"no-hetatm\",\n    )\n    rigid_cycles: Optional[PositiveInt] = Field(\n        # 10,\n        description=\"Number of cycles of rigid-body refinement to perform.\",\n        flag_type=\"--\",\n        rename_param=\"rigid-cycles\",\n    )\n    jelly: Optional[PositiveInt] = Field(\n        # 4,\n        description=\"Number of cycles of jelly-body refinement to perform.\",\n        flag_type=\"--\",\n    )\n    restr_cycles: Optional[PositiveInt] = Field(\n        # 8,\n        description=\"Number of cycles of refmac final refinement to perform.\",\n        flag_type=\"--\",\n        rename_param=\"restr-cycles\",\n    )\n    lim_resolution: Optional[PositiveFloat] = Field(\n        description=\"Limit the final resolution.\", flag_type=\"--\", rename_param=\"reso\"\n    )\n    weight: Optional[str] = Field(\n        # \"auto-weight\",\n        description=\"The refmac matrix weight.\",\n        flag_type=\"--\",\n    )\n    mr_prog: Optional[str] = Field(\n        # \"phaser\",\n        description=\"Molecular replacement program. phaser or molrep.\",\n        flag_type=\"--\",\n        rename_param=\"mr-prog\",\n    )\n    mr_num: Optional[Union[str, int]] = Field(\n        # \"auto\",\n        description=\"Number of molecules to use for molecular replacement.\",\n        flag_type=\"--\",\n        rename_param=\"mr-num\",\n    )\n    mr_reso: Optional[PositiveFloat] = Field(\n        # 3.25,\n        description=\"High resolution for molecular replacement. If >10 interpreted as eLLG.\",\n        flag_type=\"--\",\n        rename_param=\"mr-reso\",\n    )\n    itof_prog: Optional[str] = Field(\n        description=\"Program to calculate amplitudes. 
truncate, or ctruncate.\",\n        flag_type=\"--\",\n        rename_param=\"ItoF-prog\",\n    )\n\n    @validator(\"in_file\", always=True)\n    def validate_in_file(cls, in_file: str, values: Dict[str, Any]) -> str:\n        if in_file == \"\":\n            get_hkl_file: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\", \"ManipulateHKL\", \"out_file\"\n            )\n            if get_hkl_file:\n                return get_hkl_file\n        return in_file\n\n    @validator(\"out_dir\", always=True)\n    def validate_out_dir(cls, out_dir: str, values: Dict[str, Any]) -> str:\n        if out_dir == \"\":\n            get_hkl_file: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\", \"ManipulateHKL\", \"out_file\"\n            )\n            if get_hkl_file:\n                return os.path.dirname(get_hkl_file)\n        return out_dir\n
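A possible config entry is sketched below. The managed Task name DimpleSolve is assumed from the class name; the PDB path is a placeholder, and in_file/out_dir default to the latest ManipulateHKL output and its directory when left empty.

DimpleSolve:       # assumed Task name
  pdb: "/path/to/search_model.pdb"  # placeholder
  mr_thresh: 0.4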
"},{"location":"source/io/models/sfx_solve/#io.models.sfx_solve.RunSHELXCParameters","title":"RunSHELXCParameters","text":"

Bases: ThirdPartyParameters

Parameters for CCP4's SHELXC program.

SHELXC prepares files for SHELXD and SHELXE.

For more information please refer to the official documentation: https://www.ccp4.ac.uk/html/crank.html

Source code in lute/io/models/sfx_solve.py
class RunSHELXCParameters(ThirdPartyParameters):\n    \"\"\"Parameters for CCP4's SHELXC program.\n\n    SHELXC prepares files for SHELXD and SHELXE.\n\n    For more information please refer to the official documentation:\n    https://www.ccp4.ac.uk/html/crank.html\n    \"\"\"\n\n    executable: str = Field(\n        \"/sdf/group/lcls/ds/tools/ccp4-8.0/bin/shelxc\",\n        description=\"CCP4 SHELXC. Generates input files for SHELXD/SHELXE.\",\n        flag_type=\"\",\n    )\n    placeholder: str = Field(\n        \"xx\", description=\"Placeholder filename stem.\", flag_type=\"\"\n    )\n    in_file: str = Field(\n        \"\",\n        description=\"Input file for SHELXC with reflections AND proper records.\",\n        flag_type=\"\",\n    )\n\n    @validator(\"in_file\", always=True)\n    def validate_in_file(cls, in_file: str, values: Dict[str, Any]) -> str:\n        if in_file == \"\":\n            # get_hkl needed to be run to produce an XDS format file...\n            xds_format_file: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\", \"ManipulateHKL\", \"out_file\"\n            )\n            if xds_format_file:\n                in_file = xds_format_file\n        if in_file[0] != \"<\":\n            # Need to add a redirection for this program\n            # Runs like `shelxc xx <input_file.xds`\n            in_file = f\"<{in_file}\"\n        return in_file\n
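A possible config entry is sketched below. The managed Task name RunSHELXC is assumed from the class name; when in_file is left empty, the validator pulls the latest ManipulateHKL output (an XDS-format reflection file) and prepends '<' so it is passed to shelxc by redirection.

RunSHELXC:         # assumed Task name
  placeholder: "xx"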
"},{"location":"source/io/models/smd/","title":"smd","text":"

Models for smalldata_tools Tasks.

Classes:

Name Description SubmitSMDParameters

Parameters to run smalldata_tools to produce a smalldata HDF5 file.

FindOverlapXSSParameters

Parameter model for the FindOverlapXSS Task. Used to determine spatial/temporal overlap based on XSS difference signal.

"},{"location":"source/io/models/smd/#io.models.smd.FindOverlapXSSParameters","title":"FindOverlapXSSParameters","text":"

Bases: TaskParameters

TaskParameter model for FindOverlapXSS Task.

This Task determines spatial or temporal overlap between an optical pulse and the FEL pulse based on difference scattering (XSS) signal. This Task uses SmallData HDF5 files as a source.

Source code in lute/io/models/smd.py
class FindOverlapXSSParameters(TaskParameters):\n    \"\"\"TaskParameter model for FindOverlapXSS Task.\n\n    This Task determines spatial or temporal overlap between an optical pulse\n    and the FEL pulse based on difference scattering (XSS) signal. This Task\n    uses SmallData HDF5 files as a source.\n    \"\"\"\n\n    class ExpConfig(BaseModel):\n        det_name: str\n        ipm_var: str\n        scan_var: Union[str, List[str]]\n\n    class Thresholds(BaseModel):\n        min_Iscat: Union[int, float]\n        min_ipm: Union[int, float]\n\n    class AnalysisFlags(BaseModel):\n        use_pyfai: bool = True\n        use_asymls: bool = False\n\n    exp_config: ExpConfig\n    thresholds: Thresholds\n    analysis_flags: AnalysisFlags\n
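
A rough sketch of how the nested sub-models above can be populated; all values are hypothetical placeholders, and in practice they are supplied through the Task's section of the configuration YAML:

from lute.io.models.smd import FindOverlapXSSParameters\n\n# Hypothetical values; only the structure mirrors the model definitions above.\nexp_config = FindOverlapXSSParameters.ExpConfig(\n    det_name=\"epix_1\",   # detector name (placeholder)\n    ipm_var=\"ipm2_sum\",  # intensity monitor variable (placeholder)\n    scan_var=\"lxt\",      # scan variable; a list of names is also accepted\n)\nthresholds = FindOverlapXSSParameters.Thresholds(min_Iscat=10.0, min_ipm=500.0)\nflags = FindOverlapXSSParameters.AnalysisFlags(use_pyfai=True, use_asymls=False)\n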
"},{"location":"source/io/models/smd/#io.models.smd.SubmitSMDParameters","title":"SubmitSMDParameters","text":"

Bases: ThirdPartyParameters

Parameters for running smalldata to produce reduced HDF5 files.

Source code in lute/io/models/smd.py
class SubmitSMDParameters(ThirdPartyParameters):\n    \"\"\"Parameters for running smalldata to produce reduced HDF5 files.\"\"\"\n\n    class Config(ThirdPartyParameters.Config):\n        \"\"\"Identical to super-class Config but includes a result.\"\"\"\n\n        set_result: bool = True\n        \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n        result_from_params: str = \"\"\n        \"\"\"Defines a result from the parameters. Use a validator to do so.\"\"\"\n\n    executable: str = Field(\"mpirun\", description=\"MPI executable.\", flag_type=\"\")\n    np: PositiveInt = Field(\n        max(int(os.environ.get(\"SLURM_NPROCS\", len(os.sched_getaffinity(0)))) - 1, 1),\n        description=\"Number of processes\",\n        flag_type=\"-\",\n    )\n    p_arg1: str = Field(\n        \"python\", description=\"Executable to run with mpi (i.e. python).\", flag_type=\"\"\n    )\n    u: str = Field(\n        \"\", description=\"Python option for unbuffered output.\", flag_type=\"-\"\n    )\n    m: str = Field(\n        \"mpi4py.run\",\n        description=\"Python option to execute a module's contents as __main__ module.\",\n        flag_type=\"-\",\n    )\n    producer: str = Field(\n        \"\", description=\"Path to the SmallData producer Python script.\", flag_type=\"\"\n    )\n    run: str = Field(\n        os.environ.get(\"RUN_NUM\", \"\"), description=\"DAQ Run Number.\", flag_type=\"--\"\n    )\n    experiment: str = Field(\n        os.environ.get(\"EXPERIMENT\", \"\"),\n        description=\"LCLS Experiment Number.\",\n        flag_type=\"--\",\n    )\n    stn: NonNegativeInt = Field(0, description=\"Hutch endstation.\", flag_type=\"--\")\n    nevents: int = Field(\n        int(1e9), description=\"Number of events to process.\", flag_type=\"--\"\n    )\n    directory: Optional[str] = Field(\n        None,\n        description=\"Optional output directory. If None, will be in ${EXP_FOLDER}/hdf5/smalldata.\",\n        flag_type=\"--\",\n    )\n    ## Need mechanism to set result_from_param=True ...\n    gather_interval: PositiveInt = Field(\n        25, description=\"Number of events to collect at a time.\", flag_type=\"--\"\n    )\n    norecorder: bool = Field(\n        False, description=\"Whether to ignore recorder streams.\", flag_type=\"--\"\n    )\n    url: HttpUrl = Field(\n        \"https://pswww.slac.stanford.edu/ws-auth/lgbk\",\n        description=\"Base URL for eLog posting.\",\n        flag_type=\"--\",\n    )\n    epicsAll: bool = Field(\n        False,\n        description=\"Whether to store all EPICS PVs. Use with care.\",\n        flag_type=\"--\",\n    )\n    full: bool = Field(\n        False,\n        description=\"Whether to store all data. Use with EXTRA care.\",\n        flag_type=\"--\",\n    )\n    fullSum: bool = Field(\n        False,\n        description=\"Whether to store sums for all area detector images.\",\n        flag_type=\"--\",\n    )\n    default: bool = Field(\n        False,\n        description=\"Whether to store only the default minimal set of data.\",\n        flag_type=\"--\",\n    )\n    image: bool = Field(\n        False,\n        description=\"Whether to save everything as images. Use with care.\",\n        flag_type=\"--\",\n    )\n    tiff: bool = Field(\n        False,\n        description=\"Whether to save all images as a single TIFF. 
Use with EXTRA care.\",\n        flag_type=\"--\",\n    )\n    centerpix: bool = Field(\n        False,\n        description=\"Whether to mask center pixels for Epix10k2M detectors.\",\n        flag_type=\"--\",\n    )\n    postRuntable: bool = Field(\n        False,\n        description=\"Whether to post run tables. Also used as a trigger for summary jobs.\",\n        flag_type=\"--\",\n    )\n    wait: bool = Field(\n        False, description=\"Whether to wait for a file to appear.\", flag_type=\"--\"\n    )\n    xtcav: bool = Field(\n        False,\n        description=\"Whether to add XTCAV processing to the HDF5 generation.\",\n        flag_type=\"--\",\n    )\n    noarch: bool = Field(\n        False, description=\"Whether to not use archiver data.\", flag_type=\"--\"\n    )\n\n    lute_template_cfg: TemplateConfig = TemplateConfig(template_name=\"\", output_path=\"\")\n\n    @validator(\"producer\", always=True)\n    def validate_producer_path(cls, producer: str) -> str:\n        return producer\n\n    @validator(\"lute_template_cfg\", always=True)\n    def use_producer(\n        cls, lute_template_cfg: TemplateConfig, values: Dict[str, Any]\n    ) -> TemplateConfig:\n        if not lute_template_cfg.output_path:\n            lute_template_cfg.output_path = values[\"producer\"]\n        return lute_template_cfg\n\n    @root_validator(pre=False)\n    def define_result(cls, values: Dict[str, Any]) -> Dict[str, Any]:\n        exp: str = values[\"lute_config\"].experiment\n        hutch: str = exp[:3]\n        run: int = int(values[\"lute_config\"].run)\n        directory: Optional[str] = values[\"directory\"]\n        if directory is None:\n            directory = f\"/sdf/data/lcls/ds/{hutch}/{exp}/hdf5/smalldata\"\n        fname: str = f\"{exp}_Run{run:04d}.h5\"\n\n        cls.Config.result_from_params = f\"{directory}/{fname}\"\n        return values\n
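
The define_result validator above derives the output HDF5 path from the experiment and run when no directory is supplied. A small worked sketch of that logic, with a hypothetical experiment name and run number:

# Hypothetical inputs; mirrors the path construction in define_result above.\nexp = \"xppx1003\"  # placeholder experiment name\nrun = 12           # placeholder run number\ndirectory = None   # no user-supplied output directory\n\nhutch = exp[:3]    # \"xpp\"\nif directory is None:\n    directory = f\"/sdf/data/lcls/ds/{hutch}/{exp}/hdf5/smalldata\"\nfname = f\"{exp}_Run{run:04d}.h5\"\nprint(f\"{directory}/{fname}\")\n# /sdf/data/lcls/ds/xpp/xppx1003/hdf5/smalldata/xppx1003_Run0012.h5\n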
"},{"location":"source/io/models/smd/#io.models.smd.SubmitSMDParameters.Config","title":"Config","text":"

Bases: Config

Identical to super-class Config but includes a result.

Source code in lute/io/models/smd.py
class Config(ThirdPartyParameters.Config):\n    \"\"\"Identical to super-class Config but includes a result.\"\"\"\n\n    set_result: bool = True\n    \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n    result_from_params: str = \"\"\n    \"\"\"Defines a result from the parameters. Use a validator to do so.\"\"\"\n
"},{"location":"source/io/models/smd/#io.models.smd.SubmitSMDParameters.Config.result_from_params","title":"result_from_params: str = '' class-attribute instance-attribute","text":"

Defines a result from the parameters. Use a validator to do so.

"},{"location":"source/io/models/smd/#io.models.smd.SubmitSMDParameters.Config.set_result","title":"set_result: bool = True class-attribute instance-attribute","text":"

Whether the Executor should mark a specified parameter as a result.

"},{"location":"source/io/models/tests/","title":"tests","text":"

Models for all test Tasks.

Classes:

Name Description TestParameters

Model for most basic test case. Single core first-party Task. Uses only communication via pipes.

TestBinaryParameters

Parameters for a simple multi-threaded binary executable.

TestSocketParameters

Model for first-party test requiring communication via socket.

TestWriteOutputParameters

Model for test Task which writes an output file. Location of file is recorded in database.

TestReadOutputParameters

Model for test Task which locates an output file based on an entry in the database, if no path is provided.

"},{"location":"source/io/models/tests/#io.models.tests.TestBinaryErrParameters","title":"TestBinaryErrParameters","text":"

Bases: ThirdPartyParameters

Same as TestBinary, but exits with non-zero code.

Source code in lute/io/models/tests.py
class TestBinaryErrParameters(ThirdPartyParameters):\n    \"\"\"Same as TestBinary, but exits with non-zero code.\"\"\"\n\n    executable: str = Field(\n        \"/sdf/home/d/dorlhiac/test_tasks/test_threads_err\",\n        description=\"Multi-threaded test binary with non-zero exit code.\",\n    )\n    p_arg1: int = Field(1, description=\"Number of threads.\")\n
"},{"location":"source/io/models/tests/#io.models.tests.TestParameters","title":"TestParameters","text":"

Bases: TaskParameters

Parameters for the test Task Test.

Source code in lute/io/models/tests.py
class TestParameters(TaskParameters):\n    \"\"\"Parameters for the test Task `Test`.\"\"\"\n\n    float_var: float = Field(0.01, description=\"A floating point number.\")\n    str_var: str = Field(\"test\", description=\"A string.\")\n\n    class CompoundVar(BaseModel):\n        int_var: int = 1\n        dict_var: Dict[str, str] = {\"a\": \"b\"}\n\n    compound_var: CompoundVar = Field(\n        description=(\n            \"A compound parameter - consists of a `int_var` (int) and `dict_var`\"\n            \" (Dict[str, str]).\"\n        )\n    )\n    throw_error: bool = Field(\n        False, description=\"If `True`, raise an exception to test error handling.\"\n    )\n
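
A minimal sketch (hypothetical values) of the nested CompoundVar model defined above:

from lute.io.models.tests import TestParameters\n\n# CompoundVar is a plain pydantic model; the values here are placeholders.\ncompound = TestParameters.CompoundVar(int_var=2, dict_var={\"a\": \"b\"})\nprint(compound.int_var, compound.dict_var)\n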
"},{"location":"source/tasks/dataclasses/","title":"dataclasses","text":"

Classes for describing Task state and results.

Classes:

Name Description TaskResult

Output of a specific analysis task.

TaskStatus

Enumeration of possible Task statuses (running, pending, failed, etc.).

DescribedAnalysis

Executor's description of a Task run (results, parameters, env).

"},{"location":"source/tasks/dataclasses/#tasks.dataclasses.DescribedAnalysis","title":"DescribedAnalysis dataclass","text":"

Complete analysis description. Held by an Executor.

Source code in lute/tasks/dataclasses.py
@dataclass\nclass DescribedAnalysis:\n    \"\"\"Complete analysis description. Held by an Executor.\"\"\"\n\n    task_result: TaskResult\n    task_parameters: Optional[TaskParameters]\n    task_env: Dict[str, str]\n    poll_interval: float\n    communicator_desc: List[str]\n
"},{"location":"source/tasks/dataclasses/#tasks.dataclasses.ElogSummaryPlots","title":"ElogSummaryPlots dataclass","text":"

Holds a graphical summary intended for display in the eLog.

Attributes:

Name Type Description display_name str

This represents both a path and how the result will be displayed in the eLog. Can include \"/\" characters. E.g. display_name = \"scans/my_motor_scan\" will have plots shown on a \"my_motor_scan\" page, under a \"scans\" tab. This format mirrors how the file is stored on disk as well.

Source code in lute/tasks/dataclasses.py
@dataclass\nclass ElogSummaryPlots:\n    \"\"\"Holds a graphical summary intended for display in the eLog.\n\n    Attributes:\n        display_name (str): This represents both a path and how the result will be\n            displayed in the eLog. Can include \"/\" characters. E.g.\n            `display_name = \"scans/my_motor_scan\"` will have plots shown\n            on a \"my_motor_scan\" page, under a \"scans\" tab. This format mirrors\n            how the file is stored on disk as well.\n    \"\"\"\n\n    display_name: str\n    figures: Union[pn.Tabs, hv.Image, plt.Figure]\n
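
A brief sketch (hypothetical figure) of the display_name convention described above; these plots would appear on a \"my_motor_scan\" page under a \"scans\" tab in the eLog:

import matplotlib.pyplot as plt\n\nfrom lute.tasks.dataclasses import ElogSummaryPlots\n\n# A placeholder matplotlib figure; pn.Tabs and hv.Image are also accepted.\nfig = plt.figure()\nplt.plot([0, 1, 2], [0, 1, 4])\n\nplots = ElogSummaryPlots(display_name=\"scans/my_motor_scan\", figures=fig)\n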
"},{"location":"source/tasks/dataclasses/#tasks.dataclasses.TaskResult","title":"TaskResult dataclass","text":"

Class for storing the result of a Task's execution with metadata.

Attributes:

Name Type Description task_name str

Name of the associated task which produced it.

task_status TaskStatus

Status of associated task.

summary str

Short message/summary associated with the result.

payload Any

Actual result. May be data in any format.

impl_schemas Optional[str]

A string listing Task schemas implemented by the associated Task. Schemas define the category and expected output of the Task. An individual task may implement/conform to multiple schemas. Multiple schemas are separated by ';', e.g. * impl_schemas = \"schema1;schema2\"

Source code in lute/tasks/dataclasses.py
@dataclass\nclass TaskResult:\n    \"\"\"Class for storing the result of a Task's execution with metadata.\n\n    Attributes:\n        task_name (str): Name of the associated task which produced it.\n\n        task_status (TaskStatus): Status of associated task.\n\n        summary (str): Short message/summary associated with the result.\n\n        payload (Any): Actual result. May be data in any format.\n\n        impl_schemas (Optional[str]): A string listing `Task` schemas implemented\n            by the associated `Task`. Schemas define the category and expected\n            output of the `Task`. An individual task may implement/conform to\n            multiple schemas. Multiple schemas are separated by ';', e.g.\n                * impl_schemas = \"schema1;schema2\"\n    \"\"\"\n\n    task_name: str\n    task_status: TaskStatus\n    summary: str\n    payload: Any\n    impl_schemas: Optional[str] = None\n
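
A short sketch (hypothetical values) constructing a TaskResult, including the ';'-separated schema convention noted above:

from lute.tasks.dataclasses import TaskResult, TaskStatus\n\n# All values are placeholders; payload may be data in any format.\nresult = TaskResult(\n    task_name=\"Test\",\n    task_status=TaskStatus.COMPLETED,\n    summary=\"Finished without errors.\",\n    payload=None,\n    impl_schemas=\"schema1;schema2\",  # multiple schemas separated by ';'\n)\n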
"},{"location":"source/tasks/dataclasses/#tasks.dataclasses.TaskStatus","title":"TaskStatus","text":"

Bases: Enum

Possible Task statuses.

Source code in lute/tasks/dataclasses.py
class TaskStatus(Enum):\n    \"\"\"Possible Task statuses.\"\"\"\n\n    PENDING = 0\n    \"\"\"\n    Task has yet to run. Is Queued, or waiting for prior tasks.\n    \"\"\"\n    RUNNING = 1\n    \"\"\"\n    Task is in the process of execution.\n    \"\"\"\n    COMPLETED = 2\n    \"\"\"\n    Task has completed without fatal errors.\n    \"\"\"\n    FAILED = 3\n    \"\"\"\n    Task encountered a fatal error.\n    \"\"\"\n    STOPPED = 4\n    \"\"\"\n    Task was, potentially temporarily, stopped/suspended.\n    \"\"\"\n    CANCELLED = 5\n    \"\"\"\n    Task was cancelled prior to completion or failure.\n    \"\"\"\n    TIMEDOUT = 6\n    \"\"\"\n    Task did not reach completion due to timeout.\n    \"\"\"\n
"},{"location":"source/tasks/dataclasses/#tasks.dataclasses.TaskStatus.CANCELLED","title":"CANCELLED = 5 class-attribute instance-attribute","text":"

Task was cancelled prior to completion or failure.

"},{"location":"source/tasks/dataclasses/#tasks.dataclasses.TaskStatus.COMPLETED","title":"COMPLETED = 2 class-attribute instance-attribute","text":"

Task has completed without fatal errors.

"},{"location":"source/tasks/dataclasses/#tasks.dataclasses.TaskStatus.FAILED","title":"FAILED = 3 class-attribute instance-attribute","text":"

Task encountered a fatal error.

"},{"location":"source/tasks/dataclasses/#tasks.dataclasses.TaskStatus.PENDING","title":"PENDING = 0 class-attribute instance-attribute","text":"

Task has yet to run. It is queued or waiting for prior tasks.

"},{"location":"source/tasks/dataclasses/#tasks.dataclasses.TaskStatus.RUNNING","title":"RUNNING = 1 class-attribute instance-attribute","text":"

Task is in the process of execution.

"},{"location":"source/tasks/dataclasses/#tasks.dataclasses.TaskStatus.STOPPED","title":"STOPPED = 4 class-attribute instance-attribute","text":"

Task was, potentially temporarily, stopped/suspended.

"},{"location":"source/tasks/dataclasses/#tasks.dataclasses.TaskStatus.TIMEDOUT","title":"TIMEDOUT = 6 class-attribute instance-attribute","text":"

Task did not reach completion due to timeout.

"},{"location":"source/tasks/sfx_find_peaks/","title":"sfx_find_peaks","text":"

Classes for peak finding tasks in SFX.

Classes:

Name Description CxiWriter

Utility class for writing peak finding results to CXI files.

FindPeaksPyAlgos

Peak finding using psana's PyAlgos algorithm. Optional data compression and decompression with libpressio for data reduction tests.

"},{"location":"source/tasks/sfx_find_peaks/#tasks.sfx_find_peaks.CxiWriter","title":"CxiWriter","text":"Source code in lute/tasks/sfx_find_peaks.py
class CxiWriter:\n\n    def __init__(\n        self,\n        outdir: str,\n        rank: int,\n        exp: str,\n        run: int,\n        n_events: int,\n        det_shape: Tuple[int, ...],\n        min_peaks: int,\n        max_peaks: int,\n        i_x: Any,  # Not typed becomes it comes from psana\n        i_y: Any,  # Not typed becomes it comes from psana\n        ipx: Any,  # Not typed becomes it comes from psana\n        ipy: Any,  # Not typed becomes it comes from psana\n        tag: str,\n    ):\n        \"\"\"\n        Set up the CXI files to which peak finding results will be saved.\n\n        Parameters:\n\n            outdir (str): Output directory for cxi file.\n\n            rank (int): MPI rank of the caller.\n\n            exp (str): Experiment string.\n\n            run (int): Experimental run.\n\n            n_events (int): Number of events to process.\n\n            det_shape (Tuple[int, int]): Shape of the numpy array storing the detector\n                data. This must be aCheetah-stile 2D array.\n\n            min_peaks (int): Minimum number of peaks per image.\n\n            max_peaks (int): Maximum number of peaks per image.\n\n            i_x (Any): Array of pixel indexes along x\n\n            i_y (Any): Array of pixel indexes along y\n\n            ipx (Any): Pixel indexes with respect to detector origin (x component)\n\n            ipy (Any): Pixel indexes with respect to detector origin (y component)\n\n            tag (str): Tag to append to cxi file names.\n        \"\"\"\n        self._det_shape: Tuple[int, ...] = det_shape\n        self._i_x: Any = i_x\n        self._i_y: Any = i_y\n        self._ipx: Any = ipx\n        self._ipy: Any = ipy\n        self._index: int = 0\n\n        # Create and open the HDF5 file\n        fname: str = f\"{exp}_r{run:0>4}_{rank}{tag}.cxi\"\n        Path(outdir).mkdir(exist_ok=True)\n        self._outh5: Any = h5py.File(Path(outdir) / fname, \"w\")\n\n        # Entry_1 entry for processing with CrystFEL\n        entry_1: Any = self._outh5.create_group(\"entry_1\")\n        keys: List[str] = [\n            \"nPeaks\",\n            \"peakXPosRaw\",\n            \"peakYPosRaw\",\n            \"rcent\",\n            \"ccent\",\n            \"rmin\",\n            \"rmax\",\n            \"cmin\",\n            \"cmax\",\n            \"peakTotalIntensity\",\n            \"peakMaxIntensity\",\n            \"peakRadius\",\n        ]\n        ds_expId: Any = entry_1.create_dataset(\n            \"experimental_identifier\", (n_events,), maxshape=(None,), dtype=int\n        )\n        ds_expId.attrs[\"axes\"] = \"experiment_identifier\"\n        data_1: Any = entry_1.create_dataset(\n            \"/entry_1/data_1/data\",\n            (n_events, det_shape[0], det_shape[1]),\n            chunks=(1, det_shape[0], det_shape[1]),\n            maxshape=(None, det_shape[0], det_shape[1]),\n            dtype=numpy.float32,\n        )\n        data_1.attrs[\"axes\"] = \"experiment_identifier\"\n        key: str\n        for key in [\"powderHits\", \"powderMisses\", \"mask\"]:\n            entry_1.create_dataset(\n                f\"/entry_1/data_1/{key}\",\n                (det_shape[0], det_shape[1]),\n                chunks=(det_shape[0], det_shape[1]),\n                maxshape=(det_shape[0], det_shape[1]),\n                dtype=float,\n            )\n\n        # Peak-related entries\n        for key in keys:\n            if key == \"nPeaks\":\n                ds_x: Any = self._outh5.create_dataset(\n                    
f\"/entry_1/result_1/{key}\",\n                    (n_events,),\n                    maxshape=(None,),\n                    dtype=int,\n                )\n                ds_x.attrs[\"minPeaks\"] = min_peaks\n                ds_x.attrs[\"maxPeaks\"] = max_peaks\n            else:\n                ds_x: Any = self._outh5.create_dataset(\n                    f\"/entry_1/result_1/{key}\",\n                    (n_events, max_peaks),\n                    maxshape=(None, max_peaks),\n                    chunks=(1, max_peaks),\n                    dtype=float,\n                )\n            ds_x.attrs[\"axes\"] = \"experiment_identifier:peaks\"\n\n        # Timestamp entries\n        lcls_1: Any = self._outh5.create_group(\"LCLS\")\n        keys: List[str] = [\n            \"eventNumber\",\n            \"machineTime\",\n            \"machineTimeNanoSeconds\",\n            \"fiducial\",\n            \"photon_energy_eV\",\n        ]\n        key: str\n        for key in keys:\n            if key == \"photon_energy_eV\":\n                ds_x: Any = lcls_1.create_dataset(\n                    f\"{key}\", (n_events,), maxshape=(None,), dtype=float\n                )\n            else:\n                ds_x = lcls_1.create_dataset(\n                    f\"{key}\", (n_events,), maxshape=(None,), dtype=int\n                )\n            ds_x.attrs[\"axes\"] = \"experiment_identifier\"\n\n        ds_x = self._outh5.create_dataset(\n            \"/LCLS/detector_1/EncoderValue\", (n_events,), maxshape=(None,), dtype=float\n        )\n        ds_x.attrs[\"axes\"] = \"experiment_identifier\"\n\n    def write_event(\n        self,\n        img: NDArray[numpy.float_],\n        peaks: Any,  # Not typed becomes it comes from psana\n        timestamp_seconds: int,\n        timestamp_nanoseconds: int,\n        timestamp_fiducials: int,\n        photon_energy: float,\n    ):\n        \"\"\"\n        Write peak finding results for an event into the HDF5 file.\n\n        Parameters:\n\n            img (NDArray[numpy.float_]): Detector data for the event\n\n            peaks: (Any): Peak information for the event, as recovered from the PyAlgos\n                algorithm\n\n            timestamp_seconds (int): Second part of the event's timestamp information\n\n            timestamp_nanoseconds (int): Nanosecond part of the event's timestamp\n                information\n\n            timestamp_fiducials (int): Fiducials part of the event's timestamp\n                information\n\n            photon_energy (float): Photon energy for the event\n        \"\"\"\n        ch_rows: NDArray[numpy.float_] = peaks[:, 0] * self._det_shape[1] + peaks[:, 1]\n        ch_cols: NDArray[numpy.float_] = peaks[:, 2]\n\n        # Entry_1 entry for processing with CrystFEL\n        self._outh5[\"/entry_1/data_1/data\"][self._index, :, :] = img.reshape(\n            -1, img.shape[-1]\n        )\n        self._outh5[\"/entry_1/result_1/nPeaks\"][self._index] = peaks.shape[0]\n        self._outh5[\"/entry_1/result_1/peakXPosRaw\"][self._index, : peaks.shape[0]] = (\n            ch_cols.astype(\"int\")\n        )\n        self._outh5[\"/entry_1/result_1/peakYPosRaw\"][self._index, : peaks.shape[0]] = (\n            ch_rows.astype(\"int\")\n        )\n        self._outh5[\"/entry_1/result_1/rcent\"][self._index, : peaks.shape[0]] = peaks[\n            :, 6\n        ]\n        self._outh5[\"/entry_1/result_1/ccent\"][self._index, : peaks.shape[0]] = peaks[\n            :, 7\n        ]\n        
self._outh5[\"/entry_1/result_1/rmin\"][self._index, : peaks.shape[0]] = peaks[\n            :, 10\n        ]\n        self._outh5[\"/entry_1/result_1/rmax\"][self._index, : peaks.shape[0]] = peaks[\n            :, 11\n        ]\n        self._outh5[\"/entry_1/result_1/cmin\"][self._index, : peaks.shape[0]] = peaks[\n            :, 12\n        ]\n        self._outh5[\"/entry_1/result_1/cmax\"][self._index, : peaks.shape[0]] = peaks[\n            :, 13\n        ]\n        self._outh5[\"/entry_1/result_1/peakTotalIntensity\"][\n            self._index, : peaks.shape[0]\n        ] = peaks[:, 5]\n        self._outh5[\"/entry_1/result_1/peakMaxIntensity\"][\n            self._index, : peaks.shape[0]\n        ] = peaks[:, 4]\n\n        # Calculate and write pixel radius\n        peaks_cenx: NDArray[numpy.float_] = (\n            self._i_x[\n                numpy.array(peaks[:, 0], dtype=numpy.int64),\n                numpy.array(peaks[:, 1], dtype=numpy.int64),\n                numpy.array(peaks[:, 2], dtype=numpy.int64),\n            ]\n            + 0.5\n            - self._ipx\n        )\n        peaks_ceny: NDArray[numpy.float_] = (\n            self._i_y[\n                numpy.array(peaks[:, 0], dtype=numpy.int64),\n                numpy.array(peaks[:, 1], dtype=numpy.int64),\n                numpy.array(peaks[:, 2], dtype=numpy.int64),\n            ]\n            + 0.5\n            - self._ipy\n        )\n        peak_radius: NDArray[numpy.float_] = numpy.sqrt(\n            (peaks_cenx**2) + (peaks_ceny**2)\n        )\n        self._outh5[\"/entry_1/result_1/peakRadius\"][\n            self._index, : peaks.shape[0]\n        ] = peak_radius\n\n        # LCLS entry dataset\n        self._outh5[\"/LCLS/machineTime\"][self._index] = timestamp_seconds\n        self._outh5[\"/LCLS/machineTimeNanoSeconds\"][self._index] = timestamp_nanoseconds\n        self._outh5[\"/LCLS/fiducial\"][self._index] = timestamp_fiducials\n        self._outh5[\"/LCLS/photon_energy_eV\"][self._index] = photon_energy\n\n        self._index += 1\n\n    def write_non_event_data(\n        self,\n        powder_hits: NDArray[numpy.float_],\n        powder_misses: NDArray[numpy.float_],\n        mask: NDArray[numpy.uint16],\n        clen: float,\n    ):\n        \"\"\"\n        Write to the file data that is not related to a specific event (masks, powders)\n\n        Parameters:\n\n            powder_hits (NDArray[numpy.float_]): Virtual powder pattern from hits\n\n            powder_misses (NDArray[numpy.float_]): Virtual powder pattern from hits\n\n            mask: (NDArray[numpy.uint16]): Pixel ask to write into the file\n\n        \"\"\"\n        # Add powders and mask to files, reshaping them to match the crystfel\n        # convention\n        self._outh5[\"/entry_1/data_1/powderHits\"][:] = powder_hits.reshape(\n            -1, powder_hits.shape[-1]\n        )\n        self._outh5[\"/entry_1/data_1/powderMisses\"][:] = powder_misses.reshape(\n            -1, powder_misses.shape[-1]\n        )\n        self._outh5[\"/entry_1/data_1/mask\"][:] = (1 - mask).reshape(\n            -1, mask.shape[-1]\n        )  # Crystfel expects inverted values\n\n        # Add clen distance\n        self._outh5[\"/LCLS/detector_1/EncoderValue\"][:] = clen\n\n    def optimize_and_close_file(\n        self,\n        num_hits: int,\n        max_peaks: int,\n    ):\n        \"\"\"\n        Resize data blocks and write additional information to the file\n\n        Parameters:\n\n            num_hits (int): Number of hits for which 
information has been saved to the\n                file\n\n            max_peaks (int): Maximum number of peaks (per event) for which information\n                can be written into the file\n        \"\"\"\n\n        # Resize the entry_1 entry\n        data_shape: Tuple[int, ...] = self._outh5[\"/entry_1/data_1/data\"].shape\n        self._outh5[\"/entry_1/data_1/data\"].resize(\n            (num_hits, data_shape[1], data_shape[2])\n        )\n        self._outh5[f\"/entry_1/result_1/nPeaks\"].resize((num_hits,))\n        key: str\n        for key in [\n            \"peakXPosRaw\",\n            \"peakYPosRaw\",\n            \"rcent\",\n            \"ccent\",\n            \"rmin\",\n            \"rmax\",\n            \"cmin\",\n            \"cmax\",\n            \"peakTotalIntensity\",\n            \"peakMaxIntensity\",\n            \"peakRadius\",\n        ]:\n            self._outh5[f\"/entry_1/result_1/{key}\"].resize((num_hits, max_peaks))\n\n        # Resize LCLS entry\n        for key in [\n            \"eventNumber\",\n            \"machineTime\",\n            \"machineTimeNanoSeconds\",\n            \"fiducial\",\n            \"detector_1/EncoderValue\",\n            \"photon_energy_eV\",\n        ]:\n            self._outh5[f\"/LCLS/{key}\"].resize((num_hits,))\n        self._outh5.close()\n
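
The writer expects Cheetah-style 2-D frames: a 3-D calibrated array of shape (panels, rows, columns) is collapsed so that all panels are stacked along the row axis. A small runnable sketch of that reshape; the 16 x 352 x 384 shape is a hypothetical example:

import numpy\n\n# Hypothetical 3-D calibrated detector array: (panels, rows, columns).\nimg = numpy.zeros((16, 352, 384), dtype=numpy.float32)\n\n# Collapse the panel axis into the row axis, as done before constructing\n# CxiWriter and inside write_event.\ndet_shape = img.shape\nif len(det_shape) == 3:\n    det_shape = (det_shape[0] * det_shape[1], det_shape[2])\nprint(det_shape)  # (5632, 384)\n\ncheetah_img = img.reshape(-1, img.shape[-1])\nprint(cheetah_img.shape)  # (5632, 384)\n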
"},{"location":"source/tasks/sfx_find_peaks/#tasks.sfx_find_peaks.CxiWriter.__init__","title":"__init__(outdir, rank, exp, run, n_events, det_shape, min_peaks, max_peaks, i_x, i_y, ipx, ipy, tag)","text":"

Set up the CXI files to which peak finding results will be saved.

Parameters:

outdir (str): Output directory for cxi file.\n\nrank (int): MPI rank of the caller.\n\nexp (str): Experiment string.\n\nrun (int): Experimental run.\n\nn_events (int): Number of events to process.\n\ndet_shape (Tuple[int, int]): Shape of the numpy array storing the detector\n    data. This must be a Cheetah-style 2D array.\n\nmin_peaks (int): Minimum number of peaks per image.\n\nmax_peaks (int): Maximum number of peaks per image.\n\ni_x (Any): Array of pixel indexes along x\n\ni_y (Any): Array of pixel indexes along y\n\nipx (Any): Pixel indexes with respect to detector origin (x component)\n\nipy (Any): Pixel indexes with respect to detector origin (y component)\n\ntag (str): Tag to append to cxi file names.\n
Source code in lute/tasks/sfx_find_peaks.py
def __init__(\n    self,\n    outdir: str,\n    rank: int,\n    exp: str,\n    run: int,\n    n_events: int,\n    det_shape: Tuple[int, ...],\n    min_peaks: int,\n    max_peaks: int,\n    i_x: Any,  # Not typed becomes it comes from psana\n    i_y: Any,  # Not typed becomes it comes from psana\n    ipx: Any,  # Not typed becomes it comes from psana\n    ipy: Any,  # Not typed becomes it comes from psana\n    tag: str,\n):\n    \"\"\"\n    Set up the CXI files to which peak finding results will be saved.\n\n    Parameters:\n\n        outdir (str): Output directory for cxi file.\n\n        rank (int): MPI rank of the caller.\n\n        exp (str): Experiment string.\n\n        run (int): Experimental run.\n\n        n_events (int): Number of events to process.\n\n        det_shape (Tuple[int, int]): Shape of the numpy array storing the detector\n            data. This must be aCheetah-stile 2D array.\n\n        min_peaks (int): Minimum number of peaks per image.\n\n        max_peaks (int): Maximum number of peaks per image.\n\n        i_x (Any): Array of pixel indexes along x\n\n        i_y (Any): Array of pixel indexes along y\n\n        ipx (Any): Pixel indexes with respect to detector origin (x component)\n\n        ipy (Any): Pixel indexes with respect to detector origin (y component)\n\n        tag (str): Tag to append to cxi file names.\n    \"\"\"\n    self._det_shape: Tuple[int, ...] = det_shape\n    self._i_x: Any = i_x\n    self._i_y: Any = i_y\n    self._ipx: Any = ipx\n    self._ipy: Any = ipy\n    self._index: int = 0\n\n    # Create and open the HDF5 file\n    fname: str = f\"{exp}_r{run:0>4}_{rank}{tag}.cxi\"\n    Path(outdir).mkdir(exist_ok=True)\n    self._outh5: Any = h5py.File(Path(outdir) / fname, \"w\")\n\n    # Entry_1 entry for processing with CrystFEL\n    entry_1: Any = self._outh5.create_group(\"entry_1\")\n    keys: List[str] = [\n        \"nPeaks\",\n        \"peakXPosRaw\",\n        \"peakYPosRaw\",\n        \"rcent\",\n        \"ccent\",\n        \"rmin\",\n        \"rmax\",\n        \"cmin\",\n        \"cmax\",\n        \"peakTotalIntensity\",\n        \"peakMaxIntensity\",\n        \"peakRadius\",\n    ]\n    ds_expId: Any = entry_1.create_dataset(\n        \"experimental_identifier\", (n_events,), maxshape=(None,), dtype=int\n    )\n    ds_expId.attrs[\"axes\"] = \"experiment_identifier\"\n    data_1: Any = entry_1.create_dataset(\n        \"/entry_1/data_1/data\",\n        (n_events, det_shape[0], det_shape[1]),\n        chunks=(1, det_shape[0], det_shape[1]),\n        maxshape=(None, det_shape[0], det_shape[1]),\n        dtype=numpy.float32,\n    )\n    data_1.attrs[\"axes\"] = \"experiment_identifier\"\n    key: str\n    for key in [\"powderHits\", \"powderMisses\", \"mask\"]:\n        entry_1.create_dataset(\n            f\"/entry_1/data_1/{key}\",\n            (det_shape[0], det_shape[1]),\n            chunks=(det_shape[0], det_shape[1]),\n            maxshape=(det_shape[0], det_shape[1]),\n            dtype=float,\n        )\n\n    # Peak-related entries\n    for key in keys:\n        if key == \"nPeaks\":\n            ds_x: Any = self._outh5.create_dataset(\n                f\"/entry_1/result_1/{key}\",\n                (n_events,),\n                maxshape=(None,),\n                dtype=int,\n            )\n            ds_x.attrs[\"minPeaks\"] = min_peaks\n            ds_x.attrs[\"maxPeaks\"] = max_peaks\n        else:\n            ds_x: Any = self._outh5.create_dataset(\n                f\"/entry_1/result_1/{key}\",\n                (n_events, 
max_peaks),\n                maxshape=(None, max_peaks),\n                chunks=(1, max_peaks),\n                dtype=float,\n            )\n        ds_x.attrs[\"axes\"] = \"experiment_identifier:peaks\"\n\n    # Timestamp entries\n    lcls_1: Any = self._outh5.create_group(\"LCLS\")\n    keys: List[str] = [\n        \"eventNumber\",\n        \"machineTime\",\n        \"machineTimeNanoSeconds\",\n        \"fiducial\",\n        \"photon_energy_eV\",\n    ]\n    key: str\n    for key in keys:\n        if key == \"photon_energy_eV\":\n            ds_x: Any = lcls_1.create_dataset(\n                f\"{key}\", (n_events,), maxshape=(None,), dtype=float\n            )\n        else:\n            ds_x = lcls_1.create_dataset(\n                f\"{key}\", (n_events,), maxshape=(None,), dtype=int\n            )\n        ds_x.attrs[\"axes\"] = \"experiment_identifier\"\n\n    ds_x = self._outh5.create_dataset(\n        \"/LCLS/detector_1/EncoderValue\", (n_events,), maxshape=(None,), dtype=float\n    )\n    ds_x.attrs[\"axes\"] = \"experiment_identifier\"\n
"},{"location":"source/tasks/sfx_find_peaks/#tasks.sfx_find_peaks.CxiWriter.optimize_and_close_file","title":"optimize_and_close_file(num_hits, max_peaks)","text":"

Resize data blocks and write additional information to the file

Parameters:

num_hits (int): Number of hits for which information has been saved to the\n    file\n\nmax_peaks (int): Maximum number of peaks (per event) for which information\n    can be written into the file\n
Source code in lute/tasks/sfx_find_peaks.py
def optimize_and_close_file(\n    self,\n    num_hits: int,\n    max_peaks: int,\n):\n    \"\"\"\n    Resize data blocks and write additional information to the file\n\n    Parameters:\n\n        num_hits (int): Number of hits for which information has been saved to the\n            file\n\n        max_peaks (int): Maximum number of peaks (per event) for which information\n            can be written into the file\n    \"\"\"\n\n    # Resize the entry_1 entry\n    data_shape: Tuple[int, ...] = self._outh5[\"/entry_1/data_1/data\"].shape\n    self._outh5[\"/entry_1/data_1/data\"].resize(\n        (num_hits, data_shape[1], data_shape[2])\n    )\n    self._outh5[f\"/entry_1/result_1/nPeaks\"].resize((num_hits,))\n    key: str\n    for key in [\n        \"peakXPosRaw\",\n        \"peakYPosRaw\",\n        \"rcent\",\n        \"ccent\",\n        \"rmin\",\n        \"rmax\",\n        \"cmin\",\n        \"cmax\",\n        \"peakTotalIntensity\",\n        \"peakMaxIntensity\",\n        \"peakRadius\",\n    ]:\n        self._outh5[f\"/entry_1/result_1/{key}\"].resize((num_hits, max_peaks))\n\n    # Resize LCLS entry\n    for key in [\n        \"eventNumber\",\n        \"machineTime\",\n        \"machineTimeNanoSeconds\",\n        \"fiducial\",\n        \"detector_1/EncoderValue\",\n        \"photon_energy_eV\",\n    ]:\n        self._outh5[f\"/LCLS/{key}\"].resize((num_hits,))\n    self._outh5.close()\n
"},{"location":"source/tasks/sfx_find_peaks/#tasks.sfx_find_peaks.CxiWriter.write_event","title":"write_event(img, peaks, timestamp_seconds, timestamp_nanoseconds, timestamp_fiducials, photon_energy)","text":"

Write peak finding results for an event into the HDF5 file.

Parameters:

img (NDArray[numpy.float_]): Detector data for the event\n\npeaks: (Any): Peak information for the event, as recovered from the PyAlgos\n    algorithm\n\ntimestamp_seconds (int): Second part of the event's timestamp information\n\ntimestamp_nanoseconds (int): Nanosecond part of the event's timestamp\n    information\n\ntimestamp_fiducials (int): Fiducials part of the event's timestamp\n    information\n\nphoton_energy (float): Photon energy for the event\n
Source code in lute/tasks/sfx_find_peaks.py
def write_event(\n    self,\n    img: NDArray[numpy.float_],\n    peaks: Any,  # Not typed becomes it comes from psana\n    timestamp_seconds: int,\n    timestamp_nanoseconds: int,\n    timestamp_fiducials: int,\n    photon_energy: float,\n):\n    \"\"\"\n    Write peak finding results for an event into the HDF5 file.\n\n    Parameters:\n\n        img (NDArray[numpy.float_]): Detector data for the event\n\n        peaks: (Any): Peak information for the event, as recovered from the PyAlgos\n            algorithm\n\n        timestamp_seconds (int): Second part of the event's timestamp information\n\n        timestamp_nanoseconds (int): Nanosecond part of the event's timestamp\n            information\n\n        timestamp_fiducials (int): Fiducials part of the event's timestamp\n            information\n\n        photon_energy (float): Photon energy for the event\n    \"\"\"\n    ch_rows: NDArray[numpy.float_] = peaks[:, 0] * self._det_shape[1] + peaks[:, 1]\n    ch_cols: NDArray[numpy.float_] = peaks[:, 2]\n\n    # Entry_1 entry for processing with CrystFEL\n    self._outh5[\"/entry_1/data_1/data\"][self._index, :, :] = img.reshape(\n        -1, img.shape[-1]\n    )\n    self._outh5[\"/entry_1/result_1/nPeaks\"][self._index] = peaks.shape[0]\n    self._outh5[\"/entry_1/result_1/peakXPosRaw\"][self._index, : peaks.shape[0]] = (\n        ch_cols.astype(\"int\")\n    )\n    self._outh5[\"/entry_1/result_1/peakYPosRaw\"][self._index, : peaks.shape[0]] = (\n        ch_rows.astype(\"int\")\n    )\n    self._outh5[\"/entry_1/result_1/rcent\"][self._index, : peaks.shape[0]] = peaks[\n        :, 6\n    ]\n    self._outh5[\"/entry_1/result_1/ccent\"][self._index, : peaks.shape[0]] = peaks[\n        :, 7\n    ]\n    self._outh5[\"/entry_1/result_1/rmin\"][self._index, : peaks.shape[0]] = peaks[\n        :, 10\n    ]\n    self._outh5[\"/entry_1/result_1/rmax\"][self._index, : peaks.shape[0]] = peaks[\n        :, 11\n    ]\n    self._outh5[\"/entry_1/result_1/cmin\"][self._index, : peaks.shape[0]] = peaks[\n        :, 12\n    ]\n    self._outh5[\"/entry_1/result_1/cmax\"][self._index, : peaks.shape[0]] = peaks[\n        :, 13\n    ]\n    self._outh5[\"/entry_1/result_1/peakTotalIntensity\"][\n        self._index, : peaks.shape[0]\n    ] = peaks[:, 5]\n    self._outh5[\"/entry_1/result_1/peakMaxIntensity\"][\n        self._index, : peaks.shape[0]\n    ] = peaks[:, 4]\n\n    # Calculate and write pixel radius\n    peaks_cenx: NDArray[numpy.float_] = (\n        self._i_x[\n            numpy.array(peaks[:, 0], dtype=numpy.int64),\n            numpy.array(peaks[:, 1], dtype=numpy.int64),\n            numpy.array(peaks[:, 2], dtype=numpy.int64),\n        ]\n        + 0.5\n        - self._ipx\n    )\n    peaks_ceny: NDArray[numpy.float_] = (\n        self._i_y[\n            numpy.array(peaks[:, 0], dtype=numpy.int64),\n            numpy.array(peaks[:, 1], dtype=numpy.int64),\n            numpy.array(peaks[:, 2], dtype=numpy.int64),\n        ]\n        + 0.5\n        - self._ipy\n    )\n    peak_radius: NDArray[numpy.float_] = numpy.sqrt(\n        (peaks_cenx**2) + (peaks_ceny**2)\n    )\n    self._outh5[\"/entry_1/result_1/peakRadius\"][\n        self._index, : peaks.shape[0]\n    ] = peak_radius\n\n    # LCLS entry dataset\n    self._outh5[\"/LCLS/machineTime\"][self._index] = timestamp_seconds\n    self._outh5[\"/LCLS/machineTimeNanoSeconds\"][self._index] = timestamp_nanoseconds\n    self._outh5[\"/LCLS/fiducial\"][self._index] = timestamp_fiducials\n    self._outh5[\"/LCLS/photon_energy_eV\"][self._index] = 
photon_energy\n\n    self._index += 1\n
"},{"location":"source/tasks/sfx_find_peaks/#tasks.sfx_find_peaks.CxiWriter.write_non_event_data","title":"write_non_event_data(powder_hits, powder_misses, mask, clen)","text":"

Write to the file data that is not related to a specific event (masks, powders)

Parameters:

powder_hits (NDArray[numpy.float_]): Virtual powder pattern from hits\n\npowder_misses (NDArray[numpy.float_]): Virtual powder pattern from misses\n\nmask (NDArray[numpy.uint16]): Pixel mask to write into the file\n
Source code in lute/tasks/sfx_find_peaks.py
def write_non_event_data(\n    self,\n    powder_hits: NDArray[numpy.float_],\n    powder_misses: NDArray[numpy.float_],\n    mask: NDArray[numpy.uint16],\n    clen: float,\n):\n    \"\"\"\n    Write to the file data that is not related to a specific event (masks, powders)\n\n    Parameters:\n\n        powder_hits (NDArray[numpy.float_]): Virtual powder pattern from hits\n\n        powder_misses (NDArray[numpy.float_]): Virtual powder pattern from hits\n\n        mask: (NDArray[numpy.uint16]): Pixel ask to write into the file\n\n    \"\"\"\n    # Add powders and mask to files, reshaping them to match the crystfel\n    # convention\n    self._outh5[\"/entry_1/data_1/powderHits\"][:] = powder_hits.reshape(\n        -1, powder_hits.shape[-1]\n    )\n    self._outh5[\"/entry_1/data_1/powderMisses\"][:] = powder_misses.reshape(\n        -1, powder_misses.shape[-1]\n    )\n    self._outh5[\"/entry_1/data_1/mask\"][:] = (1 - mask).reshape(\n        -1, mask.shape[-1]\n    )  # Crystfel expects inverted values\n\n    # Add clen distance\n    self._outh5[\"/LCLS/detector_1/EncoderValue\"][:] = clen\n
"},{"location":"source/tasks/sfx_find_peaks/#tasks.sfx_find_peaks.FindPeaksPyAlgos","title":"FindPeaksPyAlgos","text":"

Bases: Task

Task that performs peak finding using the PyAlgos peak finding algorithms and writes the peak information to CXI files.

Source code in lute/tasks/sfx_find_peaks.py
class FindPeaksPyAlgos(Task):\n    \"\"\"\n    Task that performs peak finding using the PyAlgos peak finding algorithms and\n    writes the peak information to CXI files.\n    \"\"\"\n\n    def __init__(self, *, params: TaskParameters) -> None:\n        super().__init__(params=params)\n\n    def _run(self) -> None:\n        ds: Any = MPIDataSource(\n            f\"exp={self._task_parameters.lute_config.experiment}:\"\n            f\"run={self._task_parameters.lute_config.run}:smd\"\n        )\n        if self._task_parameters.n_events != 0:\n            ds.break_after(self._task_parameters.n_events)\n\n        det: Any = Detector(self._task_parameters.det_name)\n        det.do_reshape_2d_to_3d(flag=True)\n\n        evr: Any = Detector(self._task_parameters.event_receiver)\n\n        i_x: Any = det.indexes_x(self._task_parameters.lute_config.run).astype(\n            numpy.int64\n        )\n        i_y: Any = det.indexes_y(self._task_parameters.lute_config.run).astype(\n            numpy.int64\n        )\n        ipx: Any\n        ipy: Any\n        ipx, ipy = det.point_indexes(\n            self._task_parameters.lute_config.run, pxy_um=(0, 0)\n        )\n\n        alg: Any = None\n        num_hits: int = 0\n        num_events: int = 0\n        num_empty_images: int = 0\n        tag: str = self._task_parameters.tag\n        if (tag != \"\") and (tag[0] != \"_\"):\n            tag = \"_\" + tag\n\n        evt: Any\n        for evt in ds.events():\n\n            evt_id: Any = evt.get(EventId)\n            timestamp_seconds: int = evt_id.time()[0]\n            timestamp_nanoseconds: int = evt_id.time()[1]\n            timestamp_fiducials: int = evt_id.fiducials()\n            event_codes: Any = evr.eventCodes(evt)\n\n            if isinstance(self._task_parameters.pv_camera_length, float):\n                clen: float = self._task_parameters.pv_camera_length\n            else:\n                clen = (\n                    ds.env().epicsStore().value(self._task_parameters.pv_camera_length)\n                )\n\n            if self._task_parameters.event_logic:\n                if not self._task_parameters.event_code in event_codes:\n                    continue\n\n            img: Any = det.calib(evt)\n\n            if img is None:\n                num_empty_images += 1\n                continue\n\n            if alg is None:\n                det_shape: Tuple[int, ...] 
= img.shape\n                if len(det_shape) == 3:\n                    det_shape = (det_shape[0] * det_shape[1], det_shape[2])\n                else:\n                    det_shape = img.shape\n\n                mask: NDArray[numpy.uint16] = numpy.ones(det_shape).astype(numpy.uint16)\n\n                if self._task_parameters.psana_mask:\n                    mask = det.mask(\n                        self.task_parameters.run,\n                        calib=False,\n                        status=True,\n                        edges=False,\n                        centra=False,\n                        unbond=False,\n                        unbondnbrs=False,\n                    ).astype(numpy.uint16)\n\n                hdffh: Any\n                if self._task_parameters.mask_file is not None:\n                    with h5py.File(self._task_parameters.mask_file, \"r\") as hdffh:\n                        loaded_mask: NDArray[numpy.int] = hdffh[\"entry_1/data_1/mask\"][\n                            :\n                        ]\n                        mask *= loaded_mask.astype(numpy.uint16)\n\n                file_writer: CxiWriter = CxiWriter(\n                    outdir=self._task_parameters.outdir,\n                    rank=ds.rank,\n                    exp=self._task_parameters.lute_config.experiment,\n                    run=self._task_parameters.lute_config.run,\n                    n_events=self._task_parameters.n_events,\n                    det_shape=det_shape,\n                    i_x=i_x,\n                    i_y=i_y,\n                    ipx=ipx,\n                    ipy=ipy,\n                    min_peaks=self._task_parameters.min_peaks,\n                    max_peaks=self._task_parameters.max_peaks,\n                    tag=tag,\n                )\n                alg: Any = PyAlgos(mask=mask, pbits=0)  # pbits controls verbosity\n                alg.set_peak_selection_pars(\n                    npix_min=self._task_parameters.npix_min,\n                    npix_max=self._task_parameters.npix_max,\n                    amax_thr=self._task_parameters.amax_thr,\n                    atot_thr=self._task_parameters.atot_thr,\n                    son_min=self._task_parameters.son_min,\n                )\n\n                if self._task_parameters.compression is not None:\n\n                    libpressio_config = generate_libpressio_configuration(\n                        compressor=self._task_parameters.compression.compressor,\n                        roi_window_size=self._task_parameters.compression.roi_window_size,\n                        bin_size=self._task_parameters.compression.bin_size,\n                        abs_error=self._task_parameters.compression.abs_error,\n                        libpressio_mask=mask,\n                    )\n\n                powder_hits: NDArray[numpy.float_] = numpy.zeros(det_shape)\n                powder_misses: NDArray[numpy.float_] = numpy.zeros(det_shape)\n\n            peaks: Any = alg.peak_finder_v3r3(\n                img,\n                rank=self._task_parameters.peak_rank,\n                r0=self._task_parameters.r0,\n                dr=self._task_parameters.dr,\n                #      nsigm=self._task_parameters.nsigm,\n            )\n\n            num_events += 1\n\n            if (peaks.shape[0] >= self._task_parameters.min_peaks) and (\n                peaks.shape[0] <= self._task_parameters.max_peaks\n            ):\n\n                if self._task_parameters.compression is not None:\n\n                    
libpressio_config_with_peaks = (\n                        add_peaks_to_libpressio_configuration(libpressio_config, peaks)\n                    )\n                    compressor = PressioCompressor.from_config(\n                        libpressio_config_with_peaks\n                    )\n                    compressed_img = compressor.encode(img)\n                    decompressed_img = numpy.zeros_like(img)\n                    decompressed = compressor.decode(compressed_img, decompressed_img)\n                    img = decompressed_img\n\n                try:\n                    photon_energy: float = (\n                        Detector(\"EBeam\").get(evt).ebeamPhotonEnergy()\n                    )\n                except AttributeError:\n                    photon_energy = (\n                        1.23984197386209e-06\n                        / ds.env().epicsStore().value(\"SIOC:SYS0:ML00:AO192\")\n                        / 1.0e9\n                    )\n\n                file_writer.write_event(\n                    img=img,\n                    peaks=peaks,\n                    timestamp_seconds=timestamp_seconds,\n                    timestamp_nanoseconds=timestamp_nanoseconds,\n                    timestamp_fiducials=timestamp_fiducials,\n                    photon_energy=photon_energy,\n                )\n                num_hits += 1\n\n            # TODO: Fix bug here\n            # generate / update powders\n            if peaks.shape[0] >= self._task_parameters.min_peaks:\n                powder_hits = numpy.maximum(powder_hits, img)\n            else:\n                powder_misses = numpy.maximum(powder_misses, img)\n\n        if num_empty_images != 0:\n            msg: Message = Message(\n                contents=f\"Rank {ds.rank} encountered {num_empty_images} empty images.\"\n            )\n            self._report_to_executor(msg)\n\n        file_writer.write_non_event_data(\n            powder_hits=powder_hits,\n            powder_misses=powder_misses,\n            mask=mask,\n            clen=clen,\n        )\n\n        file_writer.optimize_and_close_file(\n            num_hits=num_hits, max_peaks=self._task_parameters.max_peaks\n        )\n\n        COMM_WORLD.Barrier()\n\n        num_hits_per_rank: List[int] = COMM_WORLD.gather(num_hits, root=0)\n        num_hits_total: int = COMM_WORLD.reduce(num_hits, SUM)\n        num_events_per_rank: List[int] = COMM_WORLD.gather(num_events, root=0)\n\n        if ds.rank == 0:\n            master_fname: Path = write_master_file(\n                mpi_size=ds.size,\n                outdir=self._task_parameters.outdir,\n                exp=self._task_parameters.lute_config.experiment,\n                run=self._task_parameters.lute_config.run,\n                tag=tag,\n                n_hits_per_rank=num_hits_per_rank,\n                n_hits_total=num_hits_total,\n            )\n\n            # Write final summary file\n            f: TextIO\n            with open(\n                Path(self._task_parameters.outdir) / f\"peakfinding{tag}.summary\", \"w\"\n            ) as f:\n                print(f\"Number of events processed: {num_events_per_rank[-1]}\", file=f)\n                print(f\"Number of hits found: {num_hits_total}\", file=f)\n                print(\n                    \"Fractional hit rate: \"\n                    f\"{(num_hits_total/num_events_per_rank[-1]):.2f}\",\n                    file=f,\n                )\n                print(f\"No. 
hits per rank: {num_hits_per_rank}\", file=f)\n\n            with open(Path(self._task_parameters.out_file), \"w\") as f:\n                print(f\"{master_fname}\", file=f)\n\n            # Write out_file\n\n    def _post_run(self) -> None:\n        super()._post_run()\n        self._result.task_status = TaskStatus.COMPLETED\n
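
A small sketch (hypothetical values) of the tag normalization performed in _run and the resulting per-rank CXI file name produced by CxiWriter:

# Hypothetical experiment, run, rank and tag; mirrors _run and CxiWriter.\nexp, run, rank, tag = \"xppx1003\", 12, 0, \"test\"\nif (tag != \"\") and (tag[0] != \"_\"):\n    tag = \"_\" + tag  # _run prepends an underscore if it is missing\nfname = f\"{exp}_r{run:0>4}_{rank}{tag}.cxi\"\nprint(fname)  # xppx1003_r0012_0_test.cxi\n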
"},{"location":"source/tasks/sfx_find_peaks/#tasks.sfx_find_peaks.add_peaks_to_libpressio_configuration","title":"add_peaks_to_libpressio_configuration(lp_json, peaks)","text":"

Add peak information to the libpressio configuration

Parameters:

lp_json: Dictionary storing the configuration JSON structure for the libpressio\n    library.\n\npeaks (Any): Peak information as returned by psana.\n

Returns:

lp_json: Updated configuration JSON structure for the libpressio library.\n
Source code in lute/tasks/sfx_find_peaks.py
def add_peaks_to_libpressio_configuration(lp_json, peaks) -> Dict[str, Any]:\n    \"\"\"\n    Add peak infromation to libpressio configuration\n\n    Parameters:\n\n        lp_json: Dictionary storing the configuration JSON structure for the libpressio\n            library.\n\n        peaks (Any): Peak information as returned by psana.\n\n    Returns:\n\n        lp_json: Updated configuration JSON structure for the libpressio library.\n    \"\"\"\n    lp_json[\"compressor_config\"][\"pressio\"][\"roibin\"][\"roibin:centers\"] = (\n        numpy.ascontiguousarray(numpy.uint64(peaks[:, [2, 1, 0]]))\n    )\n    return lp_json\n
"},{"location":"source/tasks/sfx_find_peaks/#tasks.sfx_find_peaks.generate_libpressio_configuration","title":"generate_libpressio_configuration(compressor, roi_window_size, bin_size, abs_error, libpressio_mask)","text":"

Create the configuration JSON for the libpressio library

Parameters:

compressor (Literal[\"sz3\", \"qoz\"]): Compression algorithm to use\n    (\"qoz\" or \"sz3\").\n\nabs_error (float): Bound value for the absolute error.\n\nbin_size (int): Bining Size.\n\nroi_window_size (int): Default size of the ROI window.\n\nlibpressio_mask (NDArray): mask to be applied to the data.\n

Returns:

lp_json (Dict[str, Any]): Dictionary storing the JSON configuration structure\nfor the libpressio library\n
Source code in lute/tasks/sfx_find_peaks.py
def generate_libpressio_configuration(\n    compressor: Literal[\"sz3\", \"qoz\"],\n    roi_window_size: int,\n    bin_size: int,\n    abs_error: float,\n    libpressio_mask,\n) -> Dict[str, Any]:\n    \"\"\"\n    Create the configuration JSON for the libpressio library\n\n    Parameters:\n\n        compressor (Literal[\"sz3\", \"qoz\"]): Compression algorithm to use\n            (\"qoz\" or \"sz3\").\n\n        abs_error (float): Bound value for the absolute error.\n\n        bin_size (int): Bining Size.\n\n        roi_window_size (int): Default size of the ROI window.\n\n        libpressio_mask (NDArray): mask to be applied to the data.\n\n    Returns:\n\n        lp_json (Dict[str, Any]): Dictionary storing the JSON configuration structure\n        for the libpressio library\n    \"\"\"\n\n    if compressor == \"qoz\":\n        pressio_opts: Dict[str, Any] = {\n            \"pressio:abs\": abs_error,\n            \"qoz\": {\"qoz:stride\": 8},\n        }\n    elif compressor == \"sz3\":\n        pressio_opts = {\"pressio:abs\": abs_error}\n\n    lp_json = {\n        \"compressor_id\": \"pressio\",\n        \"early_config\": {\n            \"pressio\": {\n                \"pressio:compressor\": \"roibin\",\n                \"roibin\": {\n                    \"roibin:metric\": \"composite\",\n                    \"roibin:background\": \"mask_binning\",\n                    \"roibin:roi\": \"fpzip\",\n                    \"background\": {\n                        \"binning:compressor\": \"pressio\",\n                        \"mask_binning:compressor\": \"pressio\",\n                        \"pressio\": {\"pressio:compressor\": compressor},\n                    },\n                    \"composite\": {\n                        \"composite:plugins\": [\n                            \"size\",\n                            \"time\",\n                            \"input_stats\",\n                            \"error_stat\",\n                        ]\n                    },\n                },\n            }\n        },\n        \"compressor_config\": {\n            \"pressio\": {\n                \"roibin\": {\n                    \"roibin:roi_size\": [roi_window_size, roi_window_size, 0],\n                    \"roibin:centers\": None,  # \"roibin:roi_strategy\": \"coordinates\",\n                    \"roibin:nthreads\": 4,\n                    \"roi\": {\"fpzip:prec\": 0},\n                    \"background\": {\n                        \"mask_binning:mask\": None,\n                        \"mask_binning:shape\": [bin_size, bin_size, 1],\n                        \"mask_binning:nthreads\": 4,\n                        \"pressio\": pressio_opts,\n                    },\n                }\n            }\n        },\n        \"name\": \"pressio\",\n    }\n\n    lp_json[\"compressor_config\"][\"pressio\"][\"roibin\"][\"background\"][\n        \"mask_binning:mask\"\n    ] = (1 - libpressio_mask)\n\n    return lp_json\n
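
A brief sketch (hypothetical shapes and values) of how the two helpers above are combined before compression, following the order used in FindPeaksPyAlgos:

import numpy\n\nfrom lute.tasks.sfx_find_peaks import (\n    add_peaks_to_libpressio_configuration,\n    generate_libpressio_configuration,\n)\n\n# Placeholder mask and peak list; shapes and values are illustrative only.\nmask = numpy.ones((5632, 384), dtype=numpy.uint16)\npeaks = numpy.array([[0, 100, 200], [1, 50, 75]], dtype=numpy.float64)\n\nlp_json = generate_libpressio_configuration(\n    compressor=\"sz3\",\n    roi_window_size=9,\n    bin_size=2,\n    abs_error=10.0,\n    libpressio_mask=mask,\n)\nlp_json = add_peaks_to_libpressio_configuration(lp_json, peaks)\n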
"},{"location":"source/tasks/sfx_find_peaks/#tasks.sfx_find_peaks.write_master_file","title":"write_master_file(mpi_size, outdir, exp, run, tag, n_hits_per_rank, n_hits_total)","text":"

Generate a virtual dataset to map all individual files for this run.

Parameters:

mpi_size (int): Number of ranks in the MPI pool.\n\noutdir (str): Output directory for cxi file.\n\nexp (str): Experiment string.\n\nrun (int): Experimental run.\n\ntag (str): Tag to append to cxi file names.\n\nn_hits_per_rank (List[int]): Array containing the number of hits found on each\n    node processing data.\n\nn_hits_total (int): Total number of hits found across all nodes.\n

Returns:

The path to the written master file\n
Source code in lute/tasks/sfx_find_peaks.py
def write_master_file(\n    mpi_size: int,\n    outdir: str,\n    exp: str,\n    run: int,\n    tag: str,\n    n_hits_per_rank: List[int],\n    n_hits_total: int,\n) -> Path:\n    \"\"\"\n    Generate a virtual dataset to map all individual files for this run.\n\n    Parameters:\n\n        mpi_size (int): Number of ranks in the MPI pool.\n\n        outdir (str): Output directory for cxi file.\n\n        exp (str): Experiment string.\n\n        run (int): Experimental run.\n\n        tag (str): Tag to append to cxi file names.\n\n        n_hits_per_rank (List[int]): Array containing the number of hits found on each\n            node processing data.\n\n        n_hits_total (int): Total number of hits found across all nodes.\n\n    Returns:\n\n        The path to the the written master file\n    \"\"\"\n    # Retrieve paths to the files containing data\n    fnames: List[Path] = []\n    fi: int\n    for fi in range(mpi_size):\n        if n_hits_per_rank[fi] > 0:\n            fnames.append(Path(outdir) / f\"{exp}_r{run:0>4}_{fi}{tag}.cxi\")\n    if len(fnames) == 0:\n        sys.exit(\"No hits found\")\n\n    # Retrieve list of entries to populate in the virtual hdf5 file\n    dname_list, key_list, shape_list, dtype_list = [], [], [], []\n    datasets = [\"/entry_1/result_1\", \"/LCLS/detector_1\", \"/LCLS\", \"/entry_1/data_1\"]\n    f = h5py.File(fnames[0], \"r\")\n    for dname in datasets:\n        dset = f[dname]\n        for key in dset.keys():\n            if f\"{dname}/{key}\" not in datasets:\n                dname_list.append(dname)\n                key_list.append(key)\n                shape_list.append(dset[key].shape)\n                dtype_list.append(dset[key].dtype)\n    f.close()\n\n    # Compute cumulative powder hits and misses for all files\n    powder_hits, powder_misses = None, None\n    for fn in fnames:\n        f = h5py.File(fn, \"r\")\n        if powder_hits is None:\n            powder_hits = f[\"entry_1/data_1/powderHits\"][:].copy()\n            powder_misses = f[\"entry_1/data_1/powderMisses\"][:].copy()\n        else:\n            powder_hits = numpy.maximum(\n                powder_hits, f[\"entry_1/data_1/powderHits\"][:].copy()\n            )\n            powder_misses = numpy.maximum(\n                powder_misses, f[\"entry_1/data_1/powderMisses\"][:].copy()\n            )\n        f.close()\n\n    vfname: Path = Path(outdir) / f\"{exp}_r{run:0>4}{tag}.cxi\"\n    with h5py.File(vfname, \"w\") as vdf:\n\n        # Write the virtual hdf5 file\n        for dnum in range(len(dname_list)):\n            dname = f\"{dname_list[dnum]}/{key_list[dnum]}\"\n            if key_list[dnum] not in [\"mask\", \"powderHits\", \"powderMisses\"]:\n                layout = h5py.VirtualLayout(\n                    shape=(n_hits_total,) + shape_list[dnum][1:], dtype=dtype_list[dnum]\n                )\n                cursor = 0\n                for i, fn in enumerate(fnames):\n                    vsrc = h5py.VirtualSource(\n                        fn, dname, shape=(n_hits_per_rank[i],) + shape_list[dnum][1:]\n                    )\n                    if len(shape_list[dnum]) == 1:\n                        layout[cursor : cursor + n_hits_per_rank[i]] = vsrc\n                    else:\n                        layout[cursor : cursor + n_hits_per_rank[i], :] = vsrc\n                    cursor += n_hits_per_rank[i]\n                vdf.create_virtual_dataset(dname, layout, fillvalue=-1)\n\n        vdf[\"entry_1/data_1/powderHits\"] = powder_hits\n        
vdf[\"entry_1/data_1/powderMisses\"] = powder_misses\n\n    return vfname\n
"},{"location":"source/tasks/sfx_index/","title":"sfx_index","text":"

Classes for indexing tasks in SFX.

Classes:

Name Description ConcatenateStreamFiles

Task that merges multiple stream files into a single file.

"},{"location":"source/tasks/sfx_index/#tasks.sfx_index.ConcatenateStreamFiles","title":"ConcatenateStreamFiles","text":"

Bases: Task

Task that merges stream files located within a directory tree.

Source code in lute/tasks/sfx_index.py
class ConcatenateStreamFiles(Task):\n    \"\"\"\n    Task that merges stream files located within a directory tree.\n    \"\"\"\n\n    def __init__(self, *, params: TaskParameters) -> None:\n        super().__init__(params=params)\n\n    def _run(self) -> None:\n\n        stream_file_path: Path = Path(self._task_parameters.in_file)\n        stream_file_list: List[Path] = list(\n            stream_file_path.rglob(f\"{self._task_parameters.tag}_*.stream\")\n        )\n\n        processed_file_list = [str(stream_file) for stream_file in stream_file_list]\n\n        msg: Message = Message(\n            contents=f\"Merging following stream files: {processed_file_list} into \"\n            f\"{self._task_parameters.out_file}\",\n        )\n        self._report_to_executor(msg)\n\n        wfd: BinaryIO\n        with open(self._task_parameters.out_file, \"wb\") as wfd:\n            infile: Path\n            for infile in stream_file_list:\n                fd: BinaryIO\n                with open(infile, \"rb\") as fd:\n                    shutil.copyfileobj(fd, wfd)\n
"},{"location":"source/tasks/task/","title":"task","text":"

Base classes for implementing analysis tasks.

Classes:

Name Description Task

Abstract base class from which all analysis tasks are derived.

ThirdPartyTask

Class to run a third-party executable binary as a Task.

"},{"location":"source/tasks/task/#tasks.task.DescribedAnalysis","title":"DescribedAnalysis dataclass","text":"

Complete analysis description. Held by an Executor.

Source code in lute/tasks/dataclasses.py
@dataclass\nclass DescribedAnalysis:\n    \"\"\"Complete analysis description. Held by an Executor.\"\"\"\n\n    task_result: TaskResult\n    task_parameters: Optional[TaskParameters]\n    task_env: Dict[str, str]\n    poll_interval: float\n    communicator_desc: List[str]\n
"},{"location":"source/tasks/task/#tasks.task.ElogSummaryPlots","title":"ElogSummaryPlots dataclass","text":"

Holds a graphical summary intended for display in the eLog.

Attributes:

Name Type Description display_name str

This represents both a path and how the result will be displayed in the eLog. Can include \"/\" characters. E.g. display_name = \"scans/my_motor_scan\" will have plots shown on a \"my_motor_scan\" page, under a \"scans\" tab. This format mirrors how the file is stored on disk as well.

Source code in lute/tasks/dataclasses.py
@dataclass\nclass ElogSummaryPlots:\n    \"\"\"Holds a graphical summary intended for display in the eLog.\n\n    Attributes:\n        display_name (str): This represents both a path and how the result will be\n            displayed in the eLog. Can include \"/\" characters. E.g.\n            `display_name = \"scans/my_motor_scan\"` will have plots shown\n            on a \"my_motor_scan\" page, under a \"scans\" tab. This format mirrors\n            how the file is stored on disk as well.\n    \"\"\"\n\n    display_name: str\n    figures: Union[pn.Tabs, hv.Image, plt.Figure]\n
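
As an illustrative sketch (the figure contents and display name below are hypothetical), a Task might package a matplotlib figure for the eLog as follows:

import matplotlib.pyplot as plt\n\n# Hypothetical plot; any supported figure type (pn.Tabs, hv.Image, plt.Figure) works.\nfig, ax = plt.subplots()\nax.plot([1, 2, 3], [4, 5, 6])\n\n# Shown on a \"my_motor_scan\" page under a \"scans\" tab in the eLog.\nplots = ElogSummaryPlots(display_name=\"scans/my_motor_scan\", figures=fig)\n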
"},{"location":"source/tasks/task/#tasks.task.Task","title":"Task","text":"

Bases: ABC

Abstract base class for analysis tasks.

Attributes:

Name Type Description name str

The name of the Task.

Source code in lute/tasks/task.py
class Task(ABC):\n    \"\"\"Abstract base class for analysis tasks.\n\n    Attributes:\n        name (str): The name of the Task.\n    \"\"\"\n\n    def __init__(self, *, params: TaskParameters, use_mpi: bool = False) -> None:\n        \"\"\"Initialize a Task.\n\n        Args:\n            params (TaskParameters): Parameters needed to properly configure\n                the analysis task. These are NOT related to execution parameters\n                (number of cores, etc), except, potentially, in case of binary\n                executable sub-classes.\n\n            use_mpi (bool): Whether this Task requires the use of MPI.\n                This determines the behaviour and timing of certain signals\n                and ensures appropriate barriers are placed to not end\n                processing until all ranks have finished.\n        \"\"\"\n        self.name: str = str(type(self)).split(\"'\")[1].split(\".\")[-1]\n        self._result: TaskResult = TaskResult(\n            task_name=self.name,\n            task_status=TaskStatus.PENDING,\n            summary=\"PENDING\",\n            payload=\"\",\n        )\n        self._task_parameters: TaskParameters = params\n        timeout: int = self._task_parameters.lute_config.task_timeout\n        signal.setitimer(signal.ITIMER_REAL, timeout)\n\n        run_directory: Optional[str] = self._task_parameters.Config.run_directory\n        if run_directory is not None:\n            try:\n                os.chdir(run_directory)\n            except FileNotFoundError:\n                warnings.warn(\n                    (\n                        f\"Attempt to change to {run_directory}, but it is not found!\\n\"\n                        f\"Will attempt to run from {os.getcwd()}. It may fail!\"\n                    ),\n                    category=UserWarning,\n                )\n        self._use_mpi: bool = use_mpi\n\n    def run(self) -> None:\n        \"\"\"Calls the analysis routines and any pre/post task functions.\n\n        This method is part of the public API and should not need to be modified\n        in any subclasses.\n        \"\"\"\n        self._signal_start()\n        self._pre_run()\n        self._run()\n        self._post_run()\n        self._signal_result()\n\n    @abstractmethod\n    def _run(self) -> None:\n        \"\"\"Actual analysis to run. 
Overridden by subclasses.\n\n        Separating the calling API from the implementation allows `run` to\n        have pre and post task functionality embedded easily into a single\n        function call.\n        \"\"\"\n        ...\n\n    def _pre_run(self) -> None:\n        \"\"\"Code to run BEFORE the main analysis takes place.\n\n        This function may, or may not, be employed by subclasses.\n        \"\"\"\n        ...\n\n    def _post_run(self) -> None:\n        \"\"\"Code to run AFTER the main analysis takes place.\n\n        This function may, or may not, be employed by subclasses.\n        \"\"\"\n        ...\n\n    @property\n    def result(self) -> TaskResult:\n        \"\"\"TaskResult: Read-only Task Result information.\"\"\"\n        return self._result\n\n    def __call__(self) -> None:\n        self.run()\n\n    def _signal_start(self) -> None:\n        \"\"\"Send the signal that the Task will begin shortly.\"\"\"\n        start_msg: Message = Message(\n            contents=self._task_parameters, signal=\"TASK_STARTED\"\n        )\n        self._result.task_status = TaskStatus.RUNNING\n        if self._use_mpi:\n            from mpi4py import MPI\n\n            comm: MPI.Intracomm = MPI.COMM_WORLD\n            rank: int = comm.Get_rank()\n            comm.Barrier()\n            if rank == 0:\n                self._report_to_executor(start_msg)\n        else:\n            self._report_to_executor(start_msg)\n\n    def _signal_result(self) -> None:\n        \"\"\"Send the signal that results are ready along with the results.\"\"\"\n        signal: str = \"TASK_RESULT\"\n        results_msg: Message = Message(contents=self.result, signal=signal)\n        if self._use_mpi:\n            from mpi4py import MPI\n\n            comm: MPI.Intracomm = MPI.COMM_WORLD\n            rank: int = comm.Get_rank()\n            comm.Barrier()\n            if rank == 0:\n                self._report_to_executor(results_msg)\n        else:\n            self._report_to_executor(results_msg)\n        time.sleep(0.1)\n\n    def _report_to_executor(self, msg: Message) -> None:\n        \"\"\"Send a message to the Executor.\n\n        Details of `Communicator` choice are hidden from the caller. This\n        method may be overriden by subclasses with specialized functionality.\n\n        Args:\n            msg (Message): The message object to send.\n        \"\"\"\n        communicator: Communicator\n        if isinstance(msg.contents, str) or msg.contents is None:\n            communicator = PipeCommunicator()\n        else:\n            communicator = SocketCommunicator()\n\n        communicator.delayed_setup()\n        communicator.write(msg)\n        communicator.clear_communicator()\n\n    def clean_up_timeout(self) -> None:\n        \"\"\"Perform any necessary cleanup actions before exit if timing out.\"\"\"\n        ...\n
"},{"location":"source/tasks/task/#tasks.task.Task.result","title":"result: TaskResult property","text":"

TaskResult: Read-only Task Result information.

"},{"location":"source/tasks/task/#tasks.task.Task.__init__","title":"__init__(*, params, use_mpi=False)","text":"

Initialize a Task.

Parameters:

Name Type Description Default params TaskParameters

Parameters needed to properly configure the analysis task. These are NOT related to execution parameters (number of cores, etc), except, potentially, in case of binary executable sub-classes.

required use_mpi bool

Whether this Task requires the use of MPI. This determines the behaviour and timing of certain signals and ensures appropriate barriers are placed to not end processing until all ranks have finished.

False Source code in lute/tasks/task.py
def __init__(self, *, params: TaskParameters, use_mpi: bool = False) -> None:\n    \"\"\"Initialize a Task.\n\n    Args:\n        params (TaskParameters): Parameters needed to properly configure\n            the analysis task. These are NOT related to execution parameters\n            (number of cores, etc), except, potentially, in case of binary\n            executable sub-classes.\n\n        use_mpi (bool): Whether this Task requires the use of MPI.\n            This determines the behaviour and timing of certain signals\n            and ensures appropriate barriers are placed to not end\n            processing until all ranks have finished.\n    \"\"\"\n    self.name: str = str(type(self)).split(\"'\")[1].split(\".\")[-1]\n    self._result: TaskResult = TaskResult(\n        task_name=self.name,\n        task_status=TaskStatus.PENDING,\n        summary=\"PENDING\",\n        payload=\"\",\n    )\n    self._task_parameters: TaskParameters = params\n    timeout: int = self._task_parameters.lute_config.task_timeout\n    signal.setitimer(signal.ITIMER_REAL, timeout)\n\n    run_directory: Optional[str] = self._task_parameters.Config.run_directory\n    if run_directory is not None:\n        try:\n            os.chdir(run_directory)\n        except FileNotFoundError:\n            warnings.warn(\n                (\n                    f\"Attempt to change to {run_directory}, but it is not found!\\n\"\n                    f\"Will attempt to run from {os.getcwd()}. It may fail!\"\n                ),\n                category=UserWarning,\n            )\n    self._use_mpi: bool = use_mpi\n
"},{"location":"source/tasks/task/#tasks.task.Task.clean_up_timeout","title":"clean_up_timeout()","text":"

Perform any necessary cleanup actions before exit if timing out.

Source code in lute/tasks/task.py
def clean_up_timeout(self) -> None:\n    \"\"\"Perform any necessary cleanup actions before exit if timing out.\"\"\"\n    ...\n
"},{"location":"source/tasks/task/#tasks.task.Task.run","title":"run()","text":"

Calls the analysis routines and any pre/post task functions.

This method is part of the public API and should not need to be modified in any subclasses.

Source code in lute/tasks/task.py
def run(self) -> None:\n    \"\"\"Calls the analysis routines and any pre/post task functions.\n\n    This method is part of the public API and should not need to be modified\n    in any subclasses.\n    \"\"\"\n    self._signal_start()\n    self._pre_run()\n    self._run()\n    self._post_run()\n    self._signal_result()\n
"},{"location":"source/tasks/task/#tasks.task.TaskResult","title":"TaskResult dataclass","text":"

Class for storing the result of a Task's execution with metadata.

Attributes:

Name Type Description task_name str

Name of the associated task which produced it.

task_status TaskStatus

Status of associated task.

summary str

Short message/summary associated with the result.

payload Any

Actual result. May be data in any format.

impl_schemas Optional[str]

A string listing Task schemas implemented by the associated Task. Schemas define the category and expected output of the Task. An individual task may implement/conform to multiple schemas. Multiple schemas are separated by ';', e.g. * impl_schemas = \"schema1;schema2\"

Source code in lute/tasks/dataclasses.py
@dataclass\nclass TaskResult:\n    \"\"\"Class for storing the result of a Task's execution with metadata.\n\n    Attributes:\n        task_name (str): Name of the associated task which produced it.\n\n        task_status (TaskStatus): Status of associated task.\n\n        summary (str): Short message/summary associated with the result.\n\n        payload (Any): Actual result. May be data in any format.\n\n        impl_schemas (Optional[str]): A string listing `Task` schemas implemented\n            by the associated `Task`. Schemas define the category and expected\n            output of the `Task`. An individual task may implement/conform to\n            multiple schemas. Multiple schemas are separated by ';', e.g.\n                * impl_schemas = \"schema1;schema2\"\n    \"\"\"\n\n    task_name: str\n    task_status: TaskStatus\n    summary: str\n    payload: Any\n    impl_schemas: Optional[str] = None\n
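
As a brief, purely illustrative sketch (all values below are hypothetical), a result conforming to two schemas could be constructed as:

result = TaskResult(\n    task_name=\"MyTask\",\n    task_status=TaskStatus.COMPLETED,\n    summary=\"Finished without errors.\",\n    payload=\"/path/to/output.h5\",\n    impl_schemas=\"schema1;schema2\",\n)\n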
"},{"location":"source/tasks/task/#tasks.task.TaskStatus","title":"TaskStatus","text":"

Bases: Enum

Possible Task statuses.

Source code in lute/tasks/dataclasses.py
class TaskStatus(Enum):\n    \"\"\"Possible Task statuses.\"\"\"\n\n    PENDING = 0\n    \"\"\"\n    Task has yet to run. Is Queued, or waiting for prior tasks.\n    \"\"\"\n    RUNNING = 1\n    \"\"\"\n    Task is in the process of execution.\n    \"\"\"\n    COMPLETED = 2\n    \"\"\"\n    Task has completed without fatal errors.\n    \"\"\"\n    FAILED = 3\n    \"\"\"\n    Task encountered a fatal error.\n    \"\"\"\n    STOPPED = 4\n    \"\"\"\n    Task was, potentially temporarily, stopped/suspended.\n    \"\"\"\n    CANCELLED = 5\n    \"\"\"\n    Task was cancelled prior to completion or failure.\n    \"\"\"\n    TIMEDOUT = 6\n    \"\"\"\n    Task did not reach completion due to timeout.\n    \"\"\"\n
"},{"location":"source/tasks/task/#tasks.task.TaskStatus.CANCELLED","title":"CANCELLED = 5 class-attribute instance-attribute","text":"

Task was cancelled prior to completion or failure.

"},{"location":"source/tasks/task/#tasks.task.TaskStatus.COMPLETED","title":"COMPLETED = 2 class-attribute instance-attribute","text":"

Task has completed without fatal errors.

"},{"location":"source/tasks/task/#tasks.task.TaskStatus.FAILED","title":"FAILED = 3 class-attribute instance-attribute","text":"

Task encountered a fatal error.

"},{"location":"source/tasks/task/#tasks.task.TaskStatus.PENDING","title":"PENDING = 0 class-attribute instance-attribute","text":"

Task has yet to run. Is Queued, or waiting for prior tasks.

"},{"location":"source/tasks/task/#tasks.task.TaskStatus.RUNNING","title":"RUNNING = 1 class-attribute instance-attribute","text":"

Task is in the process of execution.

"},{"location":"source/tasks/task/#tasks.task.TaskStatus.STOPPED","title":"STOPPED = 4 class-attribute instance-attribute","text":"

Task was, potentially temporarily, stopped/suspended.

"},{"location":"source/tasks/task/#tasks.task.TaskStatus.TIMEDOUT","title":"TIMEDOUT = 6 class-attribute instance-attribute","text":"

Task did not reach completion due to timeout.

"},{"location":"source/tasks/task/#tasks.task.ThirdPartyTask","title":"ThirdPartyTask","text":"

Bases: Task

A Task interface to analysis with binary executables.

Source code in lute/tasks/task.py
class ThirdPartyTask(Task):\n    \"\"\"A `Task` interface to analysis with binary executables.\"\"\"\n\n    def __init__(self, *, params: TaskParameters) -> None:\n        \"\"\"Initialize a Task.\n\n        Args:\n            params (TaskParameters): Parameters needed to properly configure\n                the analysis task. `Task`s of this type MUST include the name\n                of a binary to run and any arguments which should be passed to\n                it (as would be done via command line). The binary is included\n                with the parameter `executable`. All other parameter names are\n                assumed to be the long/extended names of the flag passed on the\n                command line by default:\n                    * `arg_name = 3` is converted to `--arg_name 3`\n                Positional arguments can be included with `p_argN` where `N` is\n                any integer:\n                    * `p_arg1 = 3` is converted to `3`\n\n                Note that it is NOT recommended to rely on this default behaviour\n                as command-line arguments can be passed in many ways. Refer to\n                the dcoumentation at\n                https://slac-lcls.github.io/lute/tutorial/new_task/\n                under \"Speciyfing a TaskParameters Model for your Task\" for more\n                information on how to control parameter parsing from within your\n                TaskParameters model definition.\n        \"\"\"\n        super().__init__(params=params)\n        self._cmd = self._task_parameters.executable\n        self._args_list: List[str] = [self._cmd]\n        self._template_context: Dict[str, Any] = {}\n\n    def _add_to_jinja_context(self, param_name: str, value: Any) -> None:\n        \"\"\"Store a parameter as a Jinja template variable.\n\n        Variables are stored in a dictionary which is used to fill in a\n        premade Jinja template for a third party configuration file.\n\n        Args:\n            param_name (str): Name to store the variable as. This should be\n                the name defined in the corresponding pydantic model. This name\n                MUST match the name used in the Jinja Template!\n            value (Any): The value to store. If possible, large chunks of the\n                template should be represented as a single dictionary for\n                simplicity; however, any type can be stored as needed.\n        \"\"\"\n        context_update: Dict[str, Any] = {param_name: value}\n        if __debug__:\n            msg: Message = Message(contents=f\"TemplateParameters: {context_update}\")\n            self._report_to_executor(msg)\n        self._template_context.update(context_update)\n\n    def _template_to_config_file(self) -> None:\n        \"\"\"Convert a template file into a valid configuration file.\n\n        Uses Jinja to fill in a provided template file with variables supplied\n        through the LUTE config file. 
This facilitates parameter modification\n        for third party tasks which use a separate configuration, in addition\n        to, or instead of, command-line arguments.\n        \"\"\"\n        from jinja2 import Environment, FileSystemLoader, Template\n\n        out_file: str = self._task_parameters.lute_template_cfg.output_path\n        template_name: str = self._task_parameters.lute_template_cfg.template_name\n\n        lute_path: Optional[str] = os.getenv(\"LUTE_PATH\")\n        template_dir: str\n        if lute_path is None:\n            warnings.warn(\n                \"LUTE_PATH is None in Task process! Using relative path for templates!\",\n                category=UserWarning,\n            )\n            template_dir: str = \"../../config/templates\"\n        else:\n            template_dir = f\"{lute_path}/config/templates\"\n        environment: Environment = Environment(loader=FileSystemLoader(template_dir))\n        template: Template = environment.get_template(template_name)\n\n        with open(out_file, \"w\", encoding=\"utf-8\") as cfg_out:\n            cfg_out.write(template.render(self._template_context))\n\n    def _pre_run(self) -> None:\n        \"\"\"Parse the parameters into an appropriate argument list.\n\n        Arguments are identified by a `flag_type` attribute, defined in the\n        pydantic model, which indicates how to pass the parameter and its\n        argument on the command-line. This method parses flag:value pairs\n        into an appropriate list to be used to call the executable.\n\n        Note:\n        ThirdPartyParameter objects are returned by custom model validators.\n        Objects of this type are assumed to be used for a templated config\n        file used by the third party executable for configuration. The parsing\n        of these parameters is performed separately by a template file used as\n        an input to Jinja. This method solely identifies the necessary objects\n        and passes them all along. Refer to the template files and pydantic\n        models for more information on how these parameters are defined and\n        identified.\n        \"\"\"\n        super()._pre_run()\n        full_schema: Dict[str, Union[str, Dict[str, Any]]] = (\n            self._task_parameters.schema()\n        )\n        short_flags_use_eq: bool\n        long_flags_use_eq: bool\n        if hasattr(self._task_parameters.Config, \"short_flags_use_eq\"):\n            short_flags_use_eq: bool = self._task_parameters.Config.short_flags_use_eq\n            long_flags_use_eq: bool = self._task_parameters.Config.long_flags_use_eq\n        else:\n            short_flags_use_eq = False\n            long_flags_use_eq = False\n        for param, value in self._task_parameters.dict().items():\n            # Clunky test with __dict__[param] because compound model-types are\n            # converted to `dict`. E.g. type(value) = dict not AnalysisHeader\n            if (\n                param == \"executable\"\n                or value is None  # Cannot have empty values in argument list for execvp\n                or value == \"\"  # But do want to include, e.g. 
0\n                or isinstance(self._task_parameters.__dict__[param], TemplateConfig)\n                or isinstance(self._task_parameters.__dict__[param], AnalysisHeader)\n            ):\n                continue\n            if isinstance(self._task_parameters.__dict__[param], TemplateParameters):\n                # TemplateParameters objects have a single parameter `params`\n                self._add_to_jinja_context(param_name=param, value=value.params)\n                continue\n\n            param_attributes: Dict[str, Any] = full_schema[\"properties\"][param]\n            # Some model params do not match the commnad-line parameter names\n            param_repr: str\n            if \"rename_param\" in param_attributes:\n                param_repr = param_attributes[\"rename_param\"]\n            else:\n                param_repr = param\n            if \"flag_type\" in param_attributes:\n                flag: str = param_attributes[\"flag_type\"]\n                if flag:\n                    # \"-\" or \"--\" flags\n                    if flag == \"--\" and isinstance(value, bool) and not value:\n                        continue\n                    constructed_flag: str = f\"{flag}{param_repr}\"\n                    if flag == \"--\" and isinstance(value, bool) and value:\n                        # On/off flag, e.g. something like --verbose: No Arg\n                        self._args_list.append(f\"{constructed_flag}\")\n                        continue\n                    if (flag == \"-\" and short_flags_use_eq) or (\n                        flag == \"--\" and long_flags_use_eq\n                    ):  # Must come after above check! Otherwise you get --param=True\n                        # Flags following --param=value or -param=value\n                        constructed_flag = f\"{constructed_flag}={value}\"\n                        self._args_list.append(f\"{constructed_flag}\")\n                        continue\n                    self._args_list.append(f\"{constructed_flag}\")\n            else:\n                warnings.warn(\n                    (\n                        f\"Model parameters should be defined using Field(...,flag_type='')\"\n                        f\" in the future.  
Parameter: {param}\"\n                    ),\n                    category=PendingDeprecationWarning,\n                )\n                if len(param) == 1:  # Single-dash flags\n                    if short_flags_use_eq:\n                        self._args_list.append(f\"-{param_repr}={value}\")\n                        continue\n                    self._args_list.append(f\"-{param_repr}\")\n                elif \"p_arg\" in param:  # Positional arguments\n                    pass\n                else:  # Double-dash flags\n                    if isinstance(value, bool) and not value:\n                        continue\n                    if long_flags_use_eq:\n                        self._args_list.append(f\"--{param_repr}={value}\")\n                        continue\n                    self._args_list.append(f\"--{param_repr}\")\n                    if isinstance(value, bool) and value:\n                        continue\n            if isinstance(value, str) and \" \" in value:\n                for val in value.split():\n                    self._args_list.append(f\"{val}\")\n            else:\n                self._args_list.append(f\"{value}\")\n        if (\n            hasattr(self._task_parameters, \"lute_template_cfg\")\n            and self._template_context\n        ):\n            self._template_to_config_file()\n\n    def _run(self) -> None:\n        \"\"\"Execute the new program by replacing the current process.\"\"\"\n        if __debug__:\n            time.sleep(0.1)\n            msg: Message = Message(contents=self._formatted_command())\n            self._report_to_executor(msg)\n        LUTE_DEBUG_EXIT(\"LUTE_DEBUG_BEFORE_TPP_EXEC\")\n        os.execvp(file=self._cmd, args=self._args_list)\n\n    def _formatted_command(self) -> str:\n        \"\"\"Returns the command as it would passed on the command-line.\"\"\"\n        formatted_cmd: str = \"\".join(f\"{arg} \" for arg in self._args_list)\n        return formatted_cmd\n\n    def _signal_start(self) -> None:\n        \"\"\"Override start signal method to switch communication methods.\"\"\"\n        super()._signal_start()\n        time.sleep(0.05)\n        signal: str = \"NO_PICKLE_MODE\"\n        msg: Message = Message(signal=signal)\n        self._report_to_executor(msg)\n
"},{"location":"source/tasks/task/#tasks.task.ThirdPartyTask.__init__","title":"__init__(*, params)","text":"

Initialize a Task.

Parameters:

Name Type Description Default params TaskParameters

Parameters needed to properly configure the analysis task. Tasks of this type MUST include the name of a binary to run and any arguments which should be passed to it (as would be done via command line). The binary is included with the parameter executable. All other parameter names are assumed to be the long/extended names of the flag passed on the command line by default: * arg_name = 3 is converted to --arg_name 3 Positional arguments can be included with p_argN where N is any integer: * p_arg1 = 3 is converted to 3

Note that it is NOT recommended to rely on this default behaviour as command-line arguments can be passed in many ways. Refer to the documentation at https://slac-lcls.github.io/lute/tutorial/new_task/ under \"Specifying a TaskParameters Model for your Task\" for more information on how to control parameter parsing from within your TaskParameters model definition.

required Source code in lute/tasks/task.py
def __init__(self, *, params: TaskParameters) -> None:\n    \"\"\"Initialize a Task.\n\n    Args:\n        params (TaskParameters): Parameters needed to properly configure\n            the analysis task. `Task`s of this type MUST include the name\n            of a binary to run and any arguments which should be passed to\n            it (as would be done via command line). The binary is included\n            with the parameter `executable`. All other parameter names are\n            assumed to be the long/extended names of the flag passed on the\n            command line by default:\n                * `arg_name = 3` is converted to `--arg_name 3`\n            Positional arguments can be included with `p_argN` where `N` is\n            any integer:\n                * `p_arg1 = 3` is converted to `3`\n\n            Note that it is NOT recommended to rely on this default behaviour\n            as command-line arguments can be passed in many ways. Refer to\n            the dcoumentation at\n            https://slac-lcls.github.io/lute/tutorial/new_task/\n            under \"Speciyfing a TaskParameters Model for your Task\" for more\n            information on how to control parameter parsing from within your\n            TaskParameters model definition.\n    \"\"\"\n    super().__init__(params=params)\n    self._cmd = self._task_parameters.executable\n    self._args_list: List[str] = [self._cmd]\n    self._template_context: Dict[str, Any] = {}\n
"},{"location":"source/tasks/test/","title":"test","text":"

Basic test Tasks for testing functionality.

Classes:

Name Description Test

Simplest test Task - runs a 10 iteration loop and returns a result.

TestSocket

Test Task which sends larger data to test socket IPC.

TestWriteOutput

Test Task which writes an output file.

TestReadOutput

Test Task which reads in a file. Can be used to test database access.

"},{"location":"source/tasks/test/#tasks.test.Test","title":"Test","text":"

Bases: Task

Simple test Task to ensure subprocess and pipe-based IPC work.

Source code in lute/tasks/test.py
class Test(Task):\n    \"\"\"Simple test Task to ensure subprocess and pipe-based IPC work.\"\"\"\n\n    def __init__(self, *, params: TaskParameters) -> None:\n        super().__init__(params=params)\n\n    def _run(self) -> None:\n        for i in range(10):\n            time.sleep(1)\n            msg: Message = Message(contents=f\"Test message {i}\")\n            self._report_to_executor(msg)\n        if self._task_parameters.throw_error:\n            raise RuntimeError(\"Testing Error!\")\n\n    def _post_run(self) -> None:\n        self._result.summary = \"Test Finished.\"\n        self._result.task_status = TaskStatus.COMPLETED\n        time.sleep(0.1)\n
"},{"location":"source/tasks/test/#tasks.test.TestReadOutput","title":"TestReadOutput","text":"

Bases: Task

Simple test Task to read in output from the test Task above.

Its pydantic model relies on a database access to retrieve the output file.

Source code in lute/tasks/test.py
class TestReadOutput(Task):\n    \"\"\"Simple test Task to read in output from the test Task above.\n\n    Its pydantic model relies on a database access to retrieve the output file.\n    \"\"\"\n\n    def __init__(self, *, params: TaskParameters) -> None:\n        super().__init__(params=params)\n\n    def _run(self) -> None:\n        array: np.ndarray = np.loadtxt(self._task_parameters.in_file, delimiter=\",\")\n        self._report_to_executor(msg=Message(contents=\"Successfully loaded data!\"))\n        for i in range(5):\n            time.sleep(1)\n\n    def _post_run(self) -> None:\n        super()._post_run()\n        self._result.summary = \"Was able to load data.\"\n        self._result.payload = \"This Task produces no output.\"\n        self._result.task_status = TaskStatus.COMPLETED\n
"},{"location":"source/tasks/test/#tasks.test.TestSocket","title":"TestSocket","text":"

Bases: Task

Simple test Task to ensure basic IPC over Unix sockets works.

Source code in lute/tasks/test.py
class TestSocket(Task):\n    \"\"\"Simple test Task to ensure basic IPC over Unix sockets works.\"\"\"\n\n    def __init__(self, *, params: TaskParameters) -> None:\n        super().__init__(params=params)\n\n    def _run(self) -> None:\n        for i in range(self._task_parameters.num_arrays):\n            msg: Message = Message(contents=f\"Sending array {i}\")\n            self._report_to_executor(msg)\n            time.sleep(0.05)\n            msg: Message = Message(\n                contents=np.random.rand(self._task_parameters.array_size)\n            )\n            self._report_to_executor(msg)\n\n    def _post_run(self) -> None:\n        super()._post_run()\n        self._result.summary = f\"Sent {self._task_parameters.num_arrays} arrays\"\n        self._result.payload = np.random.rand(self._task_parameters.array_size)\n        self._result.task_status = TaskStatus.COMPLETED\n
"},{"location":"source/tasks/test/#tasks.test.TestWriteOutput","title":"TestWriteOutput","text":"

Bases: Task

Simple test Task to write output other Tasks depend on.

Source code in lute/tasks/test.py
class TestWriteOutput(Task):\n    \"\"\"Simple test Task to write output other Tasks depend on.\"\"\"\n\n    def __init__(self, *, params: TaskParameters) -> None:\n        super().__init__(params=params)\n\n    def _run(self) -> None:\n        for i in range(self._task_parameters.num_vals):\n            # Doing some calculations...\n            time.sleep(0.05)\n            if i % 10 == 0:\n                msg: Message = Message(contents=f\"Processed {i+1} values!\")\n                self._report_to_executor(msg)\n\n    def _post_run(self) -> None:\n        super()._post_run()\n        work_dir: str = self._task_parameters.lute_config.work_dir\n        out_file: str = f\"{work_dir}/{self._task_parameters.outfile_name}\"\n        array: np.ndarray = np.random.rand(self._task_parameters.num_vals)\n        np.savetxt(out_file, array, delimiter=\",\")\n        self._result.summary = \"Completed task successfully.\"\n        self._result.payload = out_file\n        self._result.task_status = TaskStatus.COMPLETED\n
"},{"location":"tutorial/creating_workflows/","title":"Workflows with Airflow","text":"

Note: Airflow uses the term DAG, or directed acyclic graph, to describe workflows of tasks with defined (and acyclic) connectivities. This page will use the terms workflow and DAG interchangeably.

"},{"location":"tutorial/creating_workflows/#relevant-components","title":"Relevant Components","text":"

In addition to the core LUTE package, a number of components are generally involved to run a workflow. The current set of scripts and objects are used to interface with Airflow, and the SLURM job scheduler. The core LUTE library can also be used to run workflows using different backends, and in the future these may be supported.

For building and running workflows using SLURM and Airflow, the following components are necessary, and will be described in more detail below:

- Airflow launch script: launch_airflow.py - This has a wrapper batch submission script: submit_launch_airflow.sh. When running using the ARP (from the eLog), you MUST use this wrapper script instead of the Python script directly.
- SLURM submission script: submit_slurm.sh
- Airflow operators:
  - JIDSlurmOperator

"},{"location":"tutorial/creating_workflows/#launchsubmission-scripts","title":"Launch/Submission Scripts","text":""},{"location":"tutorial/creating_workflows/#launch_airflowpy","title":"launch_airflow.py","text":"

Sends a request to an Airflow instance to submit a specific DAG (workflow). This script prepares an HTTP request with the appropriate parameters in a specific format.

A request involves the following information, most of which is retrieved automatically:

dag_run_data: Dict[str, Union[str, Dict[str, Union[str, int, List[str]]]]] = {\n    \"dag_run_id\": str(uuid.uuid4()),\n    \"conf\": {\n        \"experiment\": os.environ.get(\"EXPERIMENT\"),\n        \"run_id\": f\"{os.environ.get('RUN_NUM')}{datetime.datetime.utcnow().isoformat()}\",\n        \"JID_UPDATE_COUNTERS\": os.environ.get(\"JID_UPDATE_COUNTERS\"),\n        \"ARP_ROOT_JOB_ID\": os.environ.get(\"ARP_JOB_ID\"),\n        \"ARP_LOCATION\": os.environ.get(\"ARP_LOCATION\", \"S3DF\"),\n        \"Authorization\": os.environ.get(\"Authorization\"),\n        \"user\": getpass.getuser(),\n        \"lute_params\": params,\n        \"slurm_params\": extra_args,\n        \"workflow\": wf_defn,  # Used only for custom DAGs. See below under advanced usage.\n    },\n}\n

Note that the environment variables are used to fill in the appropriate information because this script is intended to be launched primarily from the ARP (which passes these variables). The ARP allows for the launch job to be defined in the experiment eLog and submitted automatically for each new DAQ run. The environment variables EXPERIMENT and RUN can alternatively be defined prior to submitting the script on the command-line.

The script takes a number of parameters:

launch_airflow.py -c <path_to_config_yaml> -w <workflow_name> [--debug] [--test] [-e <exp>] [-r <run>] [SLURM_ARGS]\n

Lifetime: This script will run for the entire duration of the workflow (DAG). After making the initial request of Airflow to launch the DAG, it will enter a status update loop which will keep track of each individual job (each job runs one managed Task) submitted by Airflow. At the end of each job it will collect the log file, in addition to providing a few other status updates/debugging messages, and append it to its own log. This allows all logging for the entire workflow (DAG) to be inspected from an individual file. This is particularly useful when running via the eLog, because only a single log file is displayed.

"},{"location":"tutorial/creating_workflows/#submit_launch_airflowsh","title":"submit_launch_airflow.sh","text":"

This script is only necessary when running from the eLog using the ARP. The initial job submitted by the ARP cannot run for longer than 30 seconds, as it will then time out. As the launch_airflow.py job will live for the entire duration of the workflow, which is often much longer than 30 seconds, the solution was to have a wrapper which submits the launch_airflow.py script to run on the S3DF batch nodes. Usage of this script is mostly identical to launch_airflow.py. All arguments are passed transparently to the underlying Python script, with the exception of the first argument, which must be the location of the underlying launch_airflow.py script. The wrapper simply launches a batch job using minimal resources (1 core). While the primary purpose of the script is to allow running from the eLog, it is also a useful wrapper in general, allowing the previous script to be submitted as a SLURM job.

Usage:

submit_launch_airflow.sh /path/to/launch_airflow.py -c <path_to_config_yaml> -w <workflow_name> [--debug] [--test] [-e <exp>] [-r <run>] [SLURM_ARGS]\n
"},{"location":"tutorial/creating_workflows/#submit_slurmsh","title":"submit_slurm.sh","text":"

Launches a job on the S3DF batch nodes using the SLURM job scheduler. This script launches a single managed Task at a time. The usage is as follows:

submit_slurm.sh -c <path_to_config_yaml> -t <MANAGED_task_name> [--debug] [SLURM_ARGS ...]\n

As a reminder, the managed Task refers to the Executor-Task combination. The script does not parse any SLURM-specific parameters, and instead passes them transparently to SLURM. At least the following two SLURM arguments must be provided:

--partition=<...> # Usually partition=milano\n--account=<...> # Usually account=lcls:$EXPERIMENT\n

Generally, resource requests will also be included, such as the number of cores to use. A complete call may look like the following:

submit_slurm.sh -c /sdf/data/lcls/ds/hutch/experiment/scratch/config.yaml -t Tester --partition=milano --account=lcls:experiment --ntasks=100 [...]\n

When running a workflow using the launch_airflow.py script, each step of the workflow will be submitted using this script.

"},{"location":"tutorial/creating_workflows/#operators","title":"Operators","text":"

Operators are the objects submitted as individual steps of a DAG by Airflow. They are conceptually linked to the idea of a task in that each task of a workflow is generally an operator. Care should be taken not to confuse them with LUTE Tasks or managed Tasks, though. There is, however, usually a one-to-one correspondence between a Task and an Operator.

Airflow runs on a K8S cluster which has no access to the experiment data. When we ask Airflow to run a DAG, it will launch an Operator for each step of the DAG. However, the Operator itself cannot perform productive analysis without access to the data. The solution employed by LUTE is to have a limited set of Operators which do not perform analysis, but instead request that a LUTE managed Task be submitted on the batch nodes, where it can access the data. There may be small differences between how the various provided Operators do this, but in general they will all make a request to the job interface daemon (JID) that a new SLURM job be scheduled using the submit_slurm.sh script described above.

Therefore, running a typical Airflow DAG involves the following steps:

  1. launch_airflow.py script is submitted, usually from a definition in the eLog.
  2. The launch_airflow script requests that Airflow run a specific DAG.
  3. The Airflow instance begins submitting the Operators that make up the DAG definition.
  4. Each Operator sends a request to the JID to submit a job.
  5. The JID submits the elog_submit.sh script with the appropriate managed Task.
  6. The managed Task runs on the batch nodes, while the Operator, requesting updates from the JID on job status, waits for it to complete.
  7. Once a managed Task completes, the Operator will receive this information and tell the Airflow server whether the job completed successfully or resulted in failure.
  8. The Airflow server will then launch the next step of the DAG, and so on, until every step has been executed.

Currently, the following Operators are maintained:

- JIDSlurmOperator: The standard Operator. Each instance has a one-to-one correspondence with a LUTE managed Task.

"},{"location":"tutorial/creating_workflows/#jidslurmoperator-arguments","title":"JIDSlurmOperator arguments","text":""},{"location":"tutorial/creating_workflows/#creating-a-new-workflow","title":"Creating a new workflow","text":"

Defining a new workflow involves creating a new module (Python file) in the directory workflows/airflow, creating a number of Operator instances within the module, and then drawing the connectivity between them. At the top of the file an Airflow DAG is created and given a name. By convention all LUTE workflows use the name of the file as the name of the DAG. The following code can be copied exactly into the file:

from datetime import datetime\nimport os\nfrom airflow import DAG\nfrom lute.operators.jidoperators import JIDSlurmOperator # Import other operators if needed\n\ndag_id: str = f\"lute_{os.path.splitext(os.path.basename(__file__))[0]}\"\ndescription: str = (\n    \"Run SFX processing using PyAlgos peak finding and experimental phasing\"\n)\n\ndag: DAG = DAG(\n    dag_id=dag_id,\n    start_date=datetime(2024, 3, 18),\n    schedule_interval=None,\n    description=description,\n)\n

Once the DAG has been created, a number of Operators must be created to run the various LUTE analysis operations. As an example consider a partial SFX processing workflow which includes steps for peak finding, indexing, merging, and calculating figures of merit. Each of the 4 steps will have an Operator instance which will launch a corresponding LUTE managed Task, for example:

# Using only the JIDSlurmOperator\n# syntax: JIDSlurmOperator(task_id=\"LuteManagedTaskName\", dag=dag) # optionally, max_cores=123)\npeak_finder: JIDSlurmOperator = JIDSlurmOperator(task_id=\"PeakFinderPyAlgos\", dag=dag)\n\n# We specify a maximum number of cores for the rest of the jobs.\nindexer: JIDSlurmOperator = JIDSlurmOperator(\n    max_cores=120, task_id=\"CrystFELIndexer\", dag=dag\n)\n# We can alternatively specify this task be only ever run with the following args.\n# indexer: JIDSlurmOperator = JIDSlurmOperator(\n#     custom_slurm_params=\"--partition=milano --ntasks=120 --account=lcls:myaccount\",\n#     task_id=\"CrystFELIndexer\",\n#     dag=dag,\n# )\n\n# Merge\nmerger: JIDSlurmOperator = JIDSlurmOperator(\n    max_cores=120, task_id=\"PartialatorMerger\", dag=dag\n)\n\n# Figures of merit\nhkl_comparer: JIDSlurmOperator = JIDSlurmOperator(\n    max_cores=8, task_id=\"HKLComparer\", dag=dag\n)\n

Finally, the dependencies between the Operators are \"drawn\", defining the execution order of the various steps. The >> operator has been overloaded for the Operator class, allowing it to be used to specify the next step in the DAG. In this case, a completely linear DAG is drawn as:

peak_finder >> indexer >> merger >> hkl_comparer\n

Parallel execution can be added by using the >> operator multiple times. Consider a task1 which upon successful completion starts a task2 and task3 in parallel. This dependency can be added to the DAG using:

#task1: JIDSlurmOperator = JIDSlurmOperator(...)\n#task2 ...\n\ntask1 >> task2\ntask1 >> task3\n

As each DAG is defined in pure Python, standard control structures (loops, if statements, etc.) can be used to create more complex workflow arrangements.
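
For example, a minimal sketch of chaining several managed Tasks with a loop, reusing the dag and JIDSlurmOperator objects from the examples above (the specific chain of Task names here is arbitrary and chosen only for illustration):

# Chain a sequence of managed Tasks defined by name.\nstep_names = [\"PeakFinderPyAlgos\", \"CrystFELIndexer\", \"PartialatorMerger\"]\n\nprevious_step = None\nfor name in step_names:\n    step = JIDSlurmOperator(max_cores=120, task_id=name, dag=dag)\n    if previous_step is not None:\n        previous_step >> step\n    previous_step = step\n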

Note: Your DAG will not be available to Airflow until your PR including the file you have defined is merged! Once merged the file will be synced with the Airflow instance and can be run using the scripts described earlier in this document. For testing it is generally preferred that you run each step of your DAG individually using the submit_slurm.sh script and the independent managed Task names. If, however, you want to test the behaviour of Airflow itself (in a modified form) you can use the advanced run-time DAGs defined below as well.

"},{"location":"tutorial/creating_workflows/#advanced-usage","title":"Advanced Usage","text":""},{"location":"tutorial/creating_workflows/#run-time-dag-creation","title":"Run-time DAG creation","text":"

In most cases, standard DAGs should be defined as described above and called by name. However, Airflow also supports the creation of DAGs dynamically, e.g. to vary the input data to various steps, or the number of steps that will occur. Some of this functionality has been used to allow for user-defined DAGs which are passed in the form of a dictionary, allowing Airflow to construct the workflow as it is running.

A basic YAML syntax is used to construct a series of nested dictionaries which define a DAG. Considering the first example DAG defined above (for serial femtosecond crystallography), the standard DAG looked like:

peak_finder >> indexer >> merger >> hkl_comparer\n

We can alternatively define this DAG in YAML:

task_name: PeakFinderPyAlgos\nslurm_params: ''\nnext:\n- task_name: CrystFELIndexer\n  slurm_params: ''\n  next:\n  - task_name: PartialatorMerger\n    slurm_params: ''\n    next:\n    - task_name: HKLComparer\n      slurm_params: ''\n      next: []\n

I.e. we define a tree where each node is constructed using Node(task_name: str, slurm_params: str, next: List[Node]).
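
A minimal Python sketch of that node structure is shown below; this is only an illustration of the tree shape, not the actual code LUTE uses to parse the YAML:

from dataclasses import dataclass, field\nfrom typing import List\n\n\n@dataclass\nclass Node:\n    task_name: str\n    slurm_params: str = \"\"\n    next: List[\"Node\"] = field(default_factory=list)\n\n\n# The linear SFX chain from the YAML above.\ndag_tree = Node(\n    task_name=\"PeakFinderPyAlgos\",\n    next=[\n        Node(\n            \"CrystFELIndexer\",\n            next=[Node(\"PartialatorMerger\", next=[Node(\"HKLComparer\")])],\n        )\n    ],\n)\n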

As a second example, to run task1 followed by task2 and task3 in parallel, we would use:

task_name: Task1\nslurm_params: ''\nnext:\n- task_name: Task2\n  slurm_params: ''\n  next: []\n- task_name: Task3\n  slurm_params: ''\n  next: []\n

In order to run a DAG defined this way, we pass the path to the YAML file to the launch script using -W <path_to_dag>, instead of calling a DAG by name. E.g.

/path/to/lute/launch_scripts/submit_launch_airflow.sh /path/to/lute/launch_scripts/launch_airflow.py -e <exp> -r <run> -c /path/to/config -W <path_to_dag> --test [--debug] [SLURM_ARGS]\n

Note that fewer options are currently supported for configuring the operators for each step of the DAG. The slurm arguments can be replaced in their entirety using a custom slurm_params string but individual options cannot be modified.

"},{"location":"tutorial/new_task/","title":"Integrating a New Task","text":"

Tasks can be broadly categorized into two types:

- \"First-party\" - where the analysis or executed code is maintained within this library.
- \"Third-party\" - where the analysis, code, or program is maintained elsewhere and is simply called by a wrapping Task.

Creating a new Task of either type generally involves the same steps, although for first-party Tasks, the analysis code must of course also be written. Due to this difference, as well as additional considerations for parameter handling when dealing with \"third-party\" Tasks, the \"first-party\" and \"third-party\" Task integration cases will be considered separately.

"},{"location":"tutorial/new_task/#creating-a-third-party-task","title":"Creating a \"Third-party\" Task","text":"

There are two required steps for third-party Task integration, and one additional step which is optional and may not be applicable to all third-party Tasks. Generally, Task integration requires:

1. Defining a TaskParameters (pydantic) model which fully parameterizes the Task. This involves specifying a path to a binary, and all the required command-line arguments to run the binary.
2. Creating a managed Task by specifying an Executor for the new third-party Task. At this stage, any additional environment variables can be added which are required for the execution environment.
3. (Optional/Maybe applicable) Creating a template for a third-party configuration file. If the new Task has its own configuration file, specifying a template will allow that file to be parameterized from the singular LUTE YAML configuration file. A couple of minor additions to the pydantic model specified in 1. are required to support template usage.

Each of these stages will be discussed in detail below. The vast majority of the work is completed in step 1.

"},{"location":"tutorial/new_task/#specifying-a-taskparameters-model-for-your-task","title":"Specifying a TaskParameters Model for your Task","text":"

A brief overview of parameters objects will be provided below. The following information goes into detail only about specifics related to LUTE configuration. An in-depth description of pydantic is beyond the scope of this tutorial; please refer to the official documentation for more information. Please note that, due to environment constraints, pydantic is currently pinned to version 1.10! Make sure to read the appropriate documentation for this version as many things are different compared to the newer releases. At the end of this document there will be an example highlighting some supported behaviour as well as a FAQ to address some common integration considerations.

Tasks and TaskParameters

All Tasks have a corresponding TaskParameters object. These objects are linked exclusively by a named relationship. For a Task named MyThirdPartyTask, the parameters object must be named MyThirdPartyTaskParameters. For third-party Tasks there are a number of additional requirements:

- The model must inherit from a base class called ThirdPartyParameters.
- The model must have one field specified called executable. The presence of this field indicates that the Task is a third-party Task and the specified executable must be called. This allows all third-party Tasks to be defined exclusively by their parameters model. A single ThirdPartyTask class handles execution of all third-party Tasks.

All models are stored in lute/io/models. For any given Task, a new model can be added to an existing module contained in this directory or to a new module. If creating a new module, make sure to add an import statement to lute.io.models.__init__.
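
For instance, assuming a hypothetical new module lute/io/models/run_task.py containing a RunTaskParameters model (names are placeholders for illustration), the added import might look like:

# In lute/io/models/__init__.py - module and model names below are hypothetical.\nfrom .run_task import RunTaskParameters\n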

Defining TaskParameters

When specifying parameters, the default behaviour is to provide a one-to-one correspondence between the Python attribute specified in the parameter model and the parameter specified on the command-line. Single-letter attributes are assumed to be passed using -, e.g. n will be passed as -n when the executable is launched. Longer attributes are passed using --, e.g. by default a model attribute named my_arg will be passed on the command-line as --my_arg. Positional arguments are specified using p_argX where X is a number. All parameters are passed in the order that they are specified in the model.
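
As a rough illustration of this default fallback mapping (a simplified sketch only, not LUTE's actual parsing code), a model with the fields n, my_arg and p_arg1 would be rendered roughly as follows:

# Simplified sketch of the default flag construction described above.\nparams = {\"n\": 4, \"my_arg\": \"out.h5\", \"p_arg1\": \"input.h5\"}\n\nargs = []\nfor name, value in params.items():\n    if name.startswith(\"p_arg\"):  # Positional argument: value only.\n        args.append(str(value))\n    elif len(name) == 1:  # Single-letter attribute: short flag.\n        args.extend([f\"-{name}\", str(value)])\n    else:  # Longer attribute: long flag.\n        args.extend([f\"--{name}\", str(value)])\n\nprint(args)  # ['-n', '4', '--my_arg', 'out.h5', 'input.h5']\n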

However, because the number of possible command-line combinations is large, relying on the default behaviour above is NOT recommended. It is provided solely as a fallback. Instead, there are a number of configuration knobs which can be tuned to achieve the desired behaviour. The two main mechanisms for controlling behaviour are specification of model-wide configuration under the Config class within the model's definition, and parameter-by-parameter configuration using field attributes. For the latter, we define all parameters as Field objects. This allows parameters to have their own attributes, which are parsed by LUTE's task-layer. Given this, the preferred starting template for a TaskParameters model is the following - we assume we are integrating a new Task called RunTask:

\nfrom pydantic import Field, validator\n# Also include any pydantic type specifications - Pydantic has many custom\n# validation types already, e.g. types for constrained numeric values, URL handling, etc.\n\nfrom .base import ThirdPartyParameters\n\n# Change class name as necessary\nclass RunTaskParameters(ThirdPartyParameters):\n    \"\"\"Parameters for RunTask...\"\"\"\n\n    class Config(ThirdPartyParameters.Config): # MUST be exactly as written here.\n        ...\n        # Model-wide configuration will go here\n\n    executable: str = Field(\"/path/to/executable\", description=\"...\")\n    ...\n    # Additional params.\n    # param1: param1Type = Field(\"default\", description=\"\", ...)\n

Config settings and options

Under the class definition for Config in the model, we can modify global options for all the parameters. In addition, there are a number of configuration options related to specifying what the outputs/results from the associated Task are, and a number of options to modify runtime behaviour. Currently, the available configuration options are:

| Config Parameter | Meaning | Default Value | ThirdPartyTask-specific? |
|---|---|---|---|
| run_directory | If provided, can be used to specify the directory from which a Task is run. | None (not provided) | NO |
| set_result | bool. If True, search the model definition for a parameter that indicates what the result is. | False | NO |
| result_from_params | If set_result is True, can define a result using this option and a validator. See also is_result below. | None (not provided) | NO |
| short_flags_use_eq | Use equals sign instead of space for arguments of - parameters. | False | YES - Only affects ThirdPartyTasks |
| long_flags_use_eq | Use equals sign instead of space for arguments of -- parameters. | False | YES - Only affects ThirdPartyTasks |

These configuration options modify how the parameter models are parsed and passed along on the command-line, as well as what we consider results and where a Task can run. The default behaviour is that parameters are assumed to be passed as -p arg and --param arg, the Task will be run in the current working directory (or scratch if submitted with the ARP), and we have no information about Task results. Setting the above options can modify this behaviour.
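
As a brief sketch of how two of these options affect a ThirdPartyTask (the binary path, directory and parameter below are hypothetical), consider enabling short_flags_use_eq and providing a run_directory:

class RunEqTaskParameters(ThirdPartyParameters):\n    \"\"\"Sketch of model-wide Config options (illustrative values only).\"\"\"\n\n    class Config(ThirdPartyParameters.Config):\n        short_flags_use_eq: bool = True               # -n 4 is passed as -n=4\n        run_directory: str = \"/sdf/scratch/some/dir\"  # The Task is run from this directory\n\n    executable: str = Field(\"/path/to/binary\", description=\"Hypothetical binary.\")\n    n: int = Field(4, description=\"Some count passed with a short flag.\", flag_type=\"-\")\n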

Field attributes

In addition to the global configuration options there are a couple of ways to specify individual parameters. The following Field attributes are used when parsing the model:

| Field Attribute | Meaning | Default Value | Example |
|---|---|---|---|
| flag_type | Specify the type of flag for passing this argument. One of \"-\", \"--\", or \"\" | N/A | p_arg1 = Field(..., flag_type=\"\") |
| rename_param | Change the name of the parameter as passed on the command-line. | N/A | my_arg = Field(..., rename_param=\"my-arg\") |
| description | Documentation of the parameter's usage or purpose. | N/A | arg = Field(..., description=\"Argument for...\") |
| is_result | bool. If the set_result Config option is True, we can set this to True to indicate a result. | N/A | output_result = Field(..., is_result=True) |

The flag_type attribute allows us to specify whether the parameter corresponds to a positional (\"\") command line argument, requires a single hyphen (\"-\"), or a double hyphen (\"--\"). By default, the parameter name is passed as-is on the command-line. However, command-line arguments can have characters which would not be valid in Python variable names. In particular, hyphens are frequently used. To handle this case, the rename_param attribute can be used to specify an alternative spelling of the parameter when it is passed on the command-line. This also allows for using more descriptive variable names internally than those used on the command-line. A description can also be provided for each Field to document the usage and purpose of that particular parameter.

As an example, we can again consider defining a model for a RunTask Task. Consider an executable which would normally be called from the command-line as follows:

/sdf/group/lcls/ds/tools/runtask -n <nthreads> --method=<algorithm> -p <algo_param> [--debug]\n

A model specification for this Task may look like:

class RunTaskParameters(ThirdPartyParameters):\n    \"\"\"Parameters for the runtask binary.\"\"\"\n\n    class Config(ThirdPartyParameters.Config):\n        long_flags_use_eq: bool = True  # For the --method parameter\n\n    # Prefer using full/absolute paths where possible.\n    # No flag_type needed for this field\n    executable: str = Field(\n        \"/sdf/group/lcls/ds/tools/runtask\", description=\"Runtask Binary v1.0\"\n    )\n\n    # We can provide a more descriptive name for -n\n    # Let's assume it's a number of threads, or processes, etc.\n    num_threads: int = Field(\n        1, description=\"Number of concurrent threads.\", flag_type=\"-\", rename_param=\"n\"\n    )\n\n    # In this case we will use the Python variable name directly when passing\n    # the parameter on the command-line\n    method: str = Field(\"algo1\", description=\"Algorithm to use.\", flag_type=\"--\")\n\n    # For an actual parameter we would probably have a better name. Lets assume\n    # This parameter (-p) modifies the behaviour of the method above.\n    method_param1: int = Field(\n        3, description=\"Modify method performance.\", flag_type=\"-\", rename_param=\"p\"\n    )\n\n    # Boolean flags are only passed when True! `--debug` is an optional parameter\n    # which is not followed by any arguments.\n    debug: bool = Field(\n        False, description=\"Whether to run in debug mode.\", flag_type=\"--\"\n    )\n

The is_result attribute allows us to specify whether the corresponding Field points to the output/result of the associated Task. Consider a Task, RunTask2 which writes its output to a single file which is passed as a parameter.

class RunTask2Parameters(ThirdPartyParameters):\n    \"\"\"Parameters for the runtask2 binary.\"\"\"\n\n    class Config(ThirdPartyParameters.Config):\n        set_result: bool = True                     # This must be set here!\n        # result_from_params: Optional[str] = None  # We can use this for more complex result setups (see below). Ignore for now.\n\n    # Prefer using full/absolute paths where possible.\n    # No flag_type needed for this field\n    executable: str = Field(\n        \"/sdf/group/lcls/ds/tools/runtask2\", description=\"Runtask Binary v2.0\"\n    )\n\n    # Lets assume we take one input and write one output file\n    # We will not provide a default value, so this parameter MUST be provided\n    input: str = Field(\n        description=\"Path to input file.\", flag_type=\"--\"\n    )\n\n    # We will also not provide a default for the output\n    # BUT, we will specify that whatever is provided is the result\n    output: str = Field(\n        description=\"Path to write output to.\",\n        flag_type=\"-\",\n        rename_param=\"o\",\n        is_result=True,   # This means this parameter points to the result!\n    )\n

Additional Comments

  1. Model parameters of type bool are not passed with an argument and are only passed when True. This is a common use-case for boolean flags which enable things like test or debug modes, verbosity or reporting features, e.g. --debug, --test, --verbose, etc.
     - If you need to pass the literal words \"True\" or \"False\", use a parameter of type str.
  2. You can use pydantic types to constrain parameters beyond the basic Python types. E.g. conint can be used to define lower and upper bounds for an integer. There are also types for common categories: positive/negative numbers, paths, URLs, IP addresses, etc.
     - Even more custom behaviour can be achieved with validators (see below).
  3. All TaskParameters objects and their subclasses have access to a lute_config parameter, which is of type lute.io.models.base.AnalysisHeader. This special parameter is ignored when constructing the call for a binary Task, but it provides access to shared/common parameters between Tasks. For example, the following parameters are available through the lute_config object, and may be of use when constructing validators. All fields can be accessed with . notation, e.g. lute_config.experiment.
     - title: A user provided title/description of the analysis.
     - experiment: The current experiment name.
     - run: The current acquisition run number.
     - date: The date of the experiment or the analysis.
     - lute_version: The version of the software you are running.
     - task_timeout: How long a Task can run before it is killed.
     - work_dir: The main working directory for LUTE. Files and the database are created relative to this directory. This is separate from the run_directory config option. LUTE will write files to the work directory by default; however, the Task itself is run from run_directory if it is specified.
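
As a short, hedged sketch of points 2 and 3 above (the parameter names and the file-system layout are made up, and validators are only introduced in the next section), conint can bound an integer and lute_config can be used to build an experiment-dependent default:

from typing import Any, Dict, Optional\n\nfrom pydantic import Field, conint, validator\n\nfrom .base import ThirdPartyParameters\n\nclass RunBoundedTaskParameters(ThirdPartyParameters):\n    \"\"\"Sketch: constrained types and lute_config access.\"\"\"\n\n    executable: str = Field(\"/path/to/binary\", description=\"Hypothetical binary.\")\n\n    # Must be between 1 and 120 inclusive - validation fails otherwise\n    num_iterations: conint(ge=1, le=120) = Field(\n        10, description=\"Number of iterations.\", flag_type=\"--\", rename_param=\"n-iter\"\n    )\n\n    in_file: Optional[str] = Field(\n        None, description=\"Input file.\", flag_type=\"-\", rename_param=\"i\"\n    )\n\n    # Build a default input path from the shared lute_config values\n    @validator(\"in_file\", always=True)\n    def set_default_in_file(cls, in_file: Optional[str], values: Dict[str, Any]) -> str:\n        if in_file is None:\n            exp: str = values[\"lute_config\"].experiment\n            run: int = int(values[\"lute_config\"].run)\n            return f\"/sdf/data/lcls/ds/exp/{exp}/results/r{run:04d}.h5\"  # Made-up layout\n        return in_file\n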

Validators

Pydantic uses validators to determine whether a value for a specific field is appropriate. There are default validators for all the standard library types and the types specified within the pydantic package; however, it is straightforward to define custom ones as well. In the template code-snippet above we imported the validator decorator. To create our own validator we define a method (with any name) with the following prototype, and decorate it with the validator decorator:

@validator(\"name_of_field_to_decorate\")\ndef my_custom_validator(cls, field: Any, values: Dict[str, Any]) -> Any: ...\n

In this snippet, the field variable corresponds to the value for the specific field we want to validate. values is a dictionary of fields and their values which have been parsed prior to the current field. This means you can validate the value of a parameter based on the values provided for other parameters. Since pydantic always validates the fields in the order they are defined in the model, fields dependent on other fields should come later in the definition.

For example, consider the method_param1 field defined above for RunTask. We can provide a custom validator which changes the default value for this field depending on what type of algorithm is specified for the --method option. We will also constrain the options for method to two specific strings.

from pydantic import Field, validator, ValidationError, root_validator\nclass RunTaskParameters(ThirdPartyParameters):\n    \"\"\"Parameters for the runtask binary.\"\"\"\n\n    # [...]\n\n    # In this case we will use the Python variable name directly when passing\n    # the parameter on the command-line\n    method: str = Field(\"algo1\", description=\"Algorithm to use.\", flag_type=\"--\")\n\n    # For an actual parameter we would probably have a better name. Lets assume\n    # This parameter (-p) modifies the behaviour of the method above.\n    method_param1: Optional[int] = Field(\n        description=\"Modify method performance.\", flag_type=\"-\", rename_param=\"p\"\n    )\n\n    # We will only allow method to take on one of two values\n    @validator(\"method\")\n    def validate_method(cls, method: str, values: Dict[str, Any]) -> str:\n        \"\"\"Method validator: --method can be algo1 or algo2.\"\"\"\n\n        valid_methods: List[str] = [\"algo1\", \"algo2\"]\n        if method not in valid_methods:\n            raise ValueError(\"method must be algo1 or algo2\")\n        return method\n\n    # Lets change the default value of `method_param1` depending on `method`\n    # NOTE: We didn't provide a default value to the Field above and made it\n    # optional. We can use this to test whether someone is purposefully\n    # overriding the value of it, and if not, set the default ourselves.\n    # We set `always=True` since pydantic will normally not use the validator\n    # if the default is not changed\n    @validator(\"method_param1\", always=True)\n    def validate_method_param1(cls, param1: Optional[int], values: Dict[str, Any]) -> int:\n        \"\"\"method param1 validator\"\"\"\n\n        # If someone actively defined it, lets just return that value\n        # We could instead do some additional validation to make sure that the\n        # value they provided is valid...\n        if param1 is not None:\n            return param1\n\n        # method_param1 comes after method, so this will be defined, or an error\n        # would have been raised.\n        method: str = values['method']\n        if method == \"algo1\":\n            return 3\n        elif method == \"algo2\":\n            return 5\n

The special root_validator(pre=False) can also be used to provide validation of the model as a whole. This is also the recommended method for specifying a result (using result_from_params) which has a complex dependence on the parameters of the model. This latter use-case is described in FAQ 2 below.

"},{"location":"tutorial/new_task/#faq","title":"FAQ","text":"
  1. How can I specify a default value which depends on another parameter?

Use a custom validator. The example above shows how to do this. The parameter that depends on another parameter must come LATER in the model definition than the independent parameter.

  2. My TaskResult is determinable from the parameters model, but it isn't easily specified by one parameter. How can I use result_from_params to indicate the result?

When a result can be identified from the set of parameters defined in a TaskParameters model, but is not as straightforward as saying it is equivalent to one of the parameters alone, we can set result_from_params using a custom validator. In the example below, we have two parameters which together determine what the result is, output_dir and out_name. Using a validator we will define a result from these two values.

from pydantic import Field, root_validator\n\nclass RunTask3Parameters(ThirdPartyParameters):\n    \"\"\"Parameters for the runtask3 binary.\"\"\"\n\n    class Config(ThirdPartyParameters.Config):\n        set_result: bool = True       # This must be set here!\n        result_from_params: str = \"\"  # We will set this momentarily\n\n    # [...] executable, other params, etc.\n\n    output_dir: str = Field(\n        description=\"Directory to write output to.\",\n        flag_type=\"--\",\n        rename_param=\"dir\",\n    )\n\n    out_name: str = Field(\n        description=\"The name of the final output file.\",\n        flag_type=\"--\",\n        rename_param=\"oname\",\n    )\n\n    # We can still provide other validators as needed\n    # But for now, we just set result_from_params\n    # Validator name can be anything, we set pre=False so this runs at the end\n    @root_validator(pre=False)\n    def define_result(cls, values: Dict[str, Any]) -> Dict[str, Any]:\n        # Extract the values of output_dir and out_name\n        output_dir: str = values[\"output_dir\"]\n        out_name: str = values[\"out_name\"]\n\n        result: str = f\"{output_dir}/{out_name}\"\n        # Now we set result_from_params\n        cls.Config.result_from_params = result\n\n        # We haven't modified any other values, but we MUST return this!\n        return values\n
  3. My new Task depends on the output of a previous Task, how can I specify this dependency?

Parameters used to run a Task are recorded in a database for every Task. It is also recorded whether or not the execution of that specific parameter set was successful. A utility function is provided to access the most recent values from the database for a specific parameter of a specific Task. It can also be used to specify whether unsuccessful Tasks should be included in the query. This utility can be used within a validator to specify dependencies. For example, suppose the input of RunTask2 (parameter input) depends on the output location of RunTask1 (parameter outfile). A validator of the following type can be used to retrieve the output file and make it the default value of the input parameter.

from pydantic import Field, validator\n\nfrom .base import ThirdPartyParameters\nfrom ..db import read_latest_db_entry\n\nclass RunTask2Parameters(ThirdPartyParameters):\n    input: str = Field(\"\", description=\"Input file.\", flag_type=\"--\")\n\n    @validator(\"input\")\n    def validate_input(cls, input: str, values: Dict[str, Any]) -> str:\n        if input == \"\":\n            task1_out: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\",  # Working directory. We search for the database here.\n                \"RunTask1\",                           # Name of Task we want to look up\n                \"outfile\",                            # Name of parameter of the Task\n                valid_only=True,                      # We only want valid output files.\n            )\n            # read_latest_db_entry returns None if nothing is found\n            if task1_out is not None:\n                return task1_out\n        return input\n

There are more examples of this pattern spread throughout the various Task models.

"},{"location":"tutorial/new_task/#specifying-an-executor-creating-a-runnable-managed-task","title":"Specifying an Executor: Creating a runnable, \"managed Task\"","text":"

Overview

After a pydantic model has been created, the next required step is to define a managed Task. In the context of this library, a managed Task refers to the combination of an Executor and a Task to run. The Executor manages the process of Task submission and the execution environment, as well as performing any logging, eLog communication, etc. There are currently two types of Executor to choose from, but only one is applicable to third-party code. The second Executor is listed below for completeness only. If you need MPI see the note below.

  1. Executor: This is the standard Executor. It should be used for third-party use cases.
  2. MPIExecutor: This performs all the same types of operations as the option above; however, it will submit your Task using MPI.
     - The MPIExecutor will submit the Task using the number of available cores - 1. The number of cores is determined from the physical core/thread count on your local machine, or the number of cores allocated by SLURM when submitting on the batch nodes.

Using MPI with third-party Tasks

As mentioned, you should set up a third-party Task to use the first type of Executor. If, however, your third-party Task uses MPI, this may seem non-intuitive. When using the MPIExecutor, the LUTE code itself is submitted with MPI. This includes the code that performs signalling to the Executor and execs the third-party code you are interested in running. While it is possible to set this code up to run with MPI, it is more challenging in the case of third-party Tasks because there is no Task code to modify directly! The MPIExecutor is provided mostly for first-party code. This is not an issue, however, since the standard Executor is easily configured to run with MPI in the case of third-party code.

When using the standard Executor for a Task requiring MPI, the executable in the pydantic model must be set to mpirun. For example, a third-party Task model that uses MPI, but is intended to be run with the standard Executor, may look like the following. We assume this Task runs a Python script using MPI.

import os\n\nfrom pydantic import Field, PositiveInt\n\nfrom .base import ThirdPartyParameters\n\nclass RunMPITaskParameters(ThirdPartyParameters):\n    class Config(ThirdPartyParameters.Config):\n        ...\n\n    executable: str = Field(\"mpirun\", description=\"MPI executable\")\n    np: PositiveInt = Field(\n        max(int(os.environ.get(\"SLURM_NPROCS\", len(os.sched_getaffinity(0)))) - 1, 1),\n        description=\"Number of processes\",\n        flag_type=\"-\",\n    )\n    pos_arg: str = Field(\"python\", description=\"Python...\", flag_type=\"\")\n    script: str = Field(\"\", description=\"Python script to run with MPI\", flag_type=\"\")\n

Selecting the Executor

After deciding on which Executor to use, a single line must be added to the lute/managed_tasks.py module:

# Initialization: Executor(\"TaskName\")\nTaskRunner: Executor = Executor(\"SubmitTask\")\n# TaskRunner: MPIExecutor = MPIExecutor(\"SubmitTask\") ## If using the MPIExecutor\n

To make it easier to discern whether a Task or a managed Task is being discussed, the standard naming convention is that the Task (class name) contains a verb, e.g. RunTask, SubmitTask. The corresponding managed Task uses a related noun, e.g. TaskRunner, TaskSubmitter, etc.

As a reminder, the Task name is the first part of the class name of the pydantic model, without the Parameters suffix. This name must match. E.g. if your pydantic model's class name is RunTaskParameters, the Task name is RunTask, and this is the string passed to the Executor initializer.

Modifying the environment

If your third-party Task can run in the standard psana environment with no further configuration files, the setup process is now complete and your Task can be run within the LUTE framework. If, on the other hand, your Task requires some changes to the environment, this is managed through the Executor. There are a couple of principal methods that the Executor provides to change the environment.

  1. Executor.update_environment: if you only need to add a few environment variables, or update the PATH this is the method to use. The method takes a Dict[str, str] as input. Any variables can be passed/defined using this method. By default, any variables in the dictionary will overwrite those variable definitions in the current environment if they are already present, except for the variable PATH. By default PATH entries in the dictionary are prepended to the current PATH available in the environment the Executor runs in (the standard psana environment). This behaviour can be changed to either append, or overwrite the PATH entirely by an optional second argument to the method.
  2. Executor.shell_source: This method will source a shell script which can perform numerous modifications of the environment (PATH changes, new environment variables, conda environments, etc.). The method takes a str which is the path to a shell script to source.

As an example, we will update the PATH of one Task and source a script for a second.

TaskRunner: Executor = Executor(\"RunTask\")\n# update_environment(env: Dict[str,str], update_path: str = \"prepend\") # \"append\" or \"overwrite\"\nTaskRunner.update_environment(\n    { \"PATH\": \"/sdf/group/lcls/ds/tools\" }  # This entry will be prepended to the PATH available after sourcing `psconda.sh`\n)\n\nTask2Runner: Executor = Executor(\"RunTask2\")\nTask2Runner.shell_source(\"/sdf/group/lcls/ds/tools/new_task_setup.sh\") # Will source new_task_setup.sh script\n
"},{"location":"tutorial/new_task/#using-templates-managing-third-party-configuration-files","title":"Using templates: managing third-party configuration files","text":"

Some third-party executables will require their own configuration files. These are often separate JSON or YAML files, although they can also be bash or Python scripts which are intended to be edited. Since LUTE requires its own configuration YAML file, it attempts to handle these cases by using Jinja templates. When wrapping a third-party Task, a template can also be provided - with small modifications to the Task's pydantic model, LUTE can process special types of parameters to render them in the template. LUTE offloads all the template rendering to Jinja, making the required additions to the pydantic model small. On the other hand, it does require an understanding of the Jinja syntax, and the provision of a well-formatted template, in order for the parameters to be rendered properly. Some basic examples of this syntax will be shown below; however, it is recommended that the Task implementer refer to the official Jinja documentation for more information.

LUTE provides two additional base models which are used for template parsing in conjunction with the primary Task model. These are:

  - TemplateParameters objects which hold parameters which will be used to render a portion of a template.
  - TemplateConfig objects which hold two strings: the name of the template file to use and the full path (including filename) of where to output the rendered result.

Task models which inherit from the ThirdPartyParameters model, as all third-party Tasks should, allow for extra arguments. LUTE will automatically parse any extra arguments provided in the configuration YAML as TemplateParameters objects, which means that they do not need to be explicitly added to the pydantic model (although they can be). As such, the only requirement on the Python side when adding template rendering functionality to a Task is the addition of one parameter - an instance of TemplateConfig. The instance MUST be called lute_template_cfg.

from pydantic import Field, validator\n\nfrom .base import TemplateConfig, ThirdPartyParameters\n\nclass RunTaskParameters(ThirdPartyParameters):\n    ...\n    # This parameter MUST be called lute_template_cfg!\n    lute_template_cfg: TemplateConfig = Field(\n        TemplateConfig(\n            template_name=\"name_of_template.json\",\n            output_path=\"/path/to/write/rendered_output_to.json\",\n        ),\n        description=\"Template rendering configuration\",\n    )\n

LUTE looks for the template in config/templates, so only the name of the template file to use within that directory is required for the template_name attribute of lute_template_cfg. LUTE can write the output anywhere the user has permissions to write, and with any name, so the full absolute path including filename should be used for the output_path of lute_template_cfg.

The rest of the work is done by the combination of Jinja, LUTE's configuration YAML file, and the template itself. Understanding the interplay between these components is perhaps best illustrated by an example. As such, let us consider a simple third-party Task whose only input parameter (on the command-line) is the location of a configuration JSON file. We'll call the third-party executable jsonuser and our Task model RunJsonUserParameters. We assume the program is run like:

jsonuser -i <input_file.json>\n

The first step is to set up the pydantic model as before.

from pydantic import Field, validator\n\nfrom .base import TemplateConfig, ThirdPartyParameters\n\nclass RunJsonUserParameters(ThirdPartyParameters):\n    executable: str = Field(\n        \"/path/to/jsonuser\", description=\"Executable which requires a JSON configuration file.\"\n    )\n    # Let's assume the JSON file is passed as \"-i <path_to_json>\"\n    input_json: str = Field(\n        \"\", description=\"Path to the input JSON file.\", flag_type=\"-\", rename_param=\"i\"\n    )\n

The next step is to create a template for the JSON file. Let's assume the JSON file looks like:

{\n    \"param1\": \"arg1\",\n    \"param2\": 4,\n    \"param3\": {\n        \"a\": 1,\n        \"b\": 2\n    },\n    \"param4\": [\n        1,\n        2,\n        3\n    ]\n}\n

Any, or all, of these values can be substituted for, and we can determine the way in which we will provide them. I.e. a substitution can be provided for each variable individually, or, for example for a nested hierarchy, a dictionary can be provided which will substitute all the items at once. For this simple case, let's provide variables for param1, param2, param3.b, and assume that we want the first and second entries for param4 to be identical for our use case (i.e., we can use one variable for them both). In total, this means we will perform 5 substitutions using 4 variables. Jinja will substitute a variable anywhere it sees the following syntax, {{ variable_name }}. As such a valid template for our use-case may look like:

{\n    \"param1\": {{ str_var }},\n    \"param2\": {{ int_var }},\n    \"param3\": {\n        \"a\": 1,\n        \"b\": {{ p3_b }}\n    },\n    \"param4\": [\n        {{ val }},\n        {{ val }},\n        3\n    ]\n}\n

We save this file as jsonuser.json in config/templates. Next, we will update the original pydantic model to include our template configuration. We still have an issue, however, in that we need to decide where to write the output of the template to. In this case, we can use the input_json parameter. We will assume that the user will provide this, although a default value can also be used. A custom validator will be added so that we can take the input_json value and update the value of lute_template_cfg.output_path with it.

from typing import Any, Dict  # Add Optional if explicitly declaring TemplateParameters fields\n\nfrom pydantic import Field, validator\n\nfrom .base import TemplateConfig, ThirdPartyParameters #, TemplateParameters\n\nclass RunJsonUserParameters(ThirdPartyParameters):\n    executable: str = Field(\n        \"jsonuser\", description=\"Executable which requires a JSON configuration file.\"\n    )\n    # Let's assume the JSON file is passed as \"-i <path_to_json>\"\n    input_json: str = Field(\n        \"\", description=\"Path to the input JSON file.\", flag_type=\"-\", rename_param=\"i\"\n    )\n    # Add template configuration! *MUST* be called `lute_template_cfg`\n    lute_template_cfg: TemplateConfig = Field(\n        TemplateConfig(\n            template_name=\"jsonuser.json\", # Only the name of the file here.\n            output_path=\"\",\n        ),\n        description=\"Template rendering configuration\",\n    )\n    # We do not need to include these TemplateParameters, they will be added\n    # automatically if provided in the YAML\n    #str_var: Optional[TemplateParameters]\n    #int_var: Optional[TemplateParameters]\n    #p3_b: Optional[TemplateParameters]\n    #val: Optional[TemplateParameters]\n\n\n    # Tell LUTE to write the rendered template to the location provided with\n    # `input_json`. I.e. update `lute_template_cfg.output_path`\n    @validator(\"lute_template_cfg\", always=True)\n    def update_output_path(\n        cls, lute_template_cfg: TemplateConfig, values: Dict[str, Any]\n    ) -> TemplateConfig:\n        if lute_template_cfg.output_path == \"\":\n            lute_template_cfg.output_path = values[\"input_json\"]\n        return lute_template_cfg\n

All that is left to render the template is to provide the variables we want to substitute in the LUTE configuration YAML. In our case, we must provide the 4 variable names we included within the substitution syntax ({{ var_name }}). The names in the YAML must match those in the template.

RunJsonUser:\n    input_json: \"/my/chosen/path.json\" # We'll come back to this...\n    str_var: \"arg1\" # Will substitute for \"param1\": \"arg1\"\n    int_var: 4 # Will substitute for \"param2\": 4\n    p3_b: 2  # Will substitute for \"param3: { \"b\": 2 }\n    val: 2 # Will substitute for \"param4\": [2, 2, 3] in the JSON\n

If, on the other hand, a user already has a valid JSON file, it is possible to turn off the template rendering: simply exclude ALL template variables (TemplateParameters) from the configuration YAML.

RunJsonUser:\n    input_json: \"/path/to/existing.json\"\n    #str_var: ...\n    #...\n
"},{"location":"tutorial/new_task/#additional-jinja-syntax","title":"Additional Jinja Syntax","text":"

There are many other syntactical constructions we can use with Jinja. Some of the useful ones are:

If Statements - E.g. only include portions of the template if a value is defined.

{% if VARNAME is defined %}\n// Stuff to include\n{% endif %}\n

Loops - E.g. Unpacking multiple elements from a dictionary.

{% for name, value in VARNAME.items() %}\n// Do stuff with name and value\n{% endfor %}\n
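
Variables used in a loop like the one above are provided in the same way as any other template variable. As a hypothetical sketch (VARNAME and the key/value pairs are made up, and this assumes the mapping from the YAML is passed through to Jinja unchanged), the corresponding entry in the LUTE configuration YAML might look like:

RunJsonUser:\n    input_json: \"/my/chosen/path.json\"\n    VARNAME:           # Parsed automatically as a TemplateParameters object\n        key_one: 1     # The loop above would emit one line per (name, value) pair\n        key_two: 2\n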
"},{"location":"tutorial/new_task/#creating-a-first-party-task","title":"Creating a \"First-Party\" Task","text":"

The process for creating a \"First-Party\" Task is very similar to that for a \"Third-Party\" Task, with the difference being that you must also write the analysis code. The steps for integration are:

  1. Write the TaskParameters model.
  2. Write the Task class. There are a few rules that need to be adhered to.
  3. Make your Task available by modifying the import function.
  4. Specify an Executor.

"},{"location":"tutorial/new_task/#specifying-a-taskparameters-model-for-your-task_1","title":"Specifying a TaskParameters Model for your Task","text":"

Parameter models have a format that must be followed for \"Third-Party\" Tasks, but \"First-Party\" Tasks have a little more liberty in how parameters are dealt with, since the Task will do all the parsing itself.

To create a model, the basic steps are:

  1. If necessary, create a new module (e.g. new_task_category.py) under lute.io.models, or find an appropriate pre-existing module in that directory.
     - An import statement must be added to lute.io.models.__init__ if a new module is created, so it can be found.
     - If defining the model in a pre-existing module, make sure to modify the __all__ statement to include it.
  2. Create a new model that inherits from TaskParameters. You can look at lute.models.io.tests.TestReadOutputParameters for an example. The model must be named <YourTaskName>Parameters.
     - You should include all relevant parameters here, including input file, output file, and any potentially adjustable parameters. These parameters must be included even if there are some implicit dependencies between Tasks and it would make sense for the parameter to be auto-populated based on some other output. Creating this dependency is done with validators (see step 3). All parameters should be overridable, and all Tasks should be fully independently configurable, based solely on their model and the configuration YAML.
     - To follow the preferred format, parameters should be defined as: param_name: type = Field([default value], description=\"This parameter does X.\")
  3. Use validators to do more complex things for your parameters, including populating default values dynamically:
     - E.g. create default values that depend on other parameters in the model - see for example: SubmitSMDParameters.
     - E.g. create default values that depend on other Tasks by reading from the database - see for example: TestReadOutputParameters.
  4. The model will have access to some general configuration values by inheriting from TaskParameters. These parameters are all stored in lute_config which is an instance of AnalysisHeader (defined here).
     - For example, the experiment and run number can be obtained from this object and a validator could use these values to define the default input file for the Task.
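
Below is a minimal sketch following this format. The Task name, parameters and default values are hypothetical and purely illustrative:

\"\"\"Models for the hypothetical first-party Task RunMyAnalysis.\"\"\"\n\n__all__ = [\"RunMyAnalysisParameters\"]\n\nfrom pydantic import Field\n\nfrom .base import TaskParameters\n\nclass RunMyAnalysisParameters(TaskParameters):\n    \"\"\"Parameters for the first-party Task RunMyAnalysis.\"\"\"\n\n    in_file: str = Field(\"\", description=\"Path to the input HDF5 file.\")\n    threshold: float = Field(10.0, description=\"Threshold used during the analysis.\")\n    out_file: str = Field(\"\", description=\"Path to write analysis results to.\")\n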

A number of configuration options and Field attributes are also available for \"First-Party\" Task models. These are identical to those used for the ThirdPartyTasks, although there is a smaller selection. These options are reproduced below for convenience.

Config settings and options

Under the class definition for Config in the model, we can modify global options for all the parameters. In addition, there are a number of configuration options related to specifying what the outputs/results from the associated Task are, and a number of options to modify runtime behaviour. Currently, the available configuration options are:

| Config Parameter | Meaning | Default Value | ThirdPartyTask-specific? |
|---|---|---|---|
| run_directory | If provided, can be used to specify the directory from which a Task is run. | None (not provided) | NO |
| set_result | bool. If True, search the model definition for a parameter that indicates what the result is. | False | NO |
| result_from_params | If set_result is True, can define a result using this option and a validator. See also is_result below. | None (not provided) | NO |
| short_flags_use_eq | Use equals sign instead of space for arguments of - parameters. | False | YES - Only affects ThirdPartyTasks |
| long_flags_use_eq | Use equals sign instead of space for arguments of -- parameters. | False | YES - Only affects ThirdPartyTasks |

These configuration options modify how the parameter models are parsed and passed along on the command-line, as well as what we consider results and where a Task can run. The default behaviour is that parameters are assumed to be passed as -p arg and --param arg, the Task will be run in the current working directory (or scratch if submitted with the ARP), and we have no information about Task results. Setting the above options can modify this behaviour.

Field attributes

In addition to the global configuration options there are a couple of ways to specify individual parameters. The following Field attributes are used when parsing the model:

| Field Attribute | Meaning | Default Value | Example |
|---|---|---|---|
| description | Documentation of the parameter's usage or purpose. | N/A | arg = Field(..., description=\"Argument for...\") |
| is_result | bool. If the set_result Config option is True, we can set this to True to indicate a result. | N/A | output_result = Field(..., is_result=True) |
"},{"location":"tutorial/new_task/#writing-the-task","title":"Writing the Task","text":"

You can write your analysis code (or whatever code is to be executed) as you see fit, as long as it adheres to the limited rules below. You can create a new module for your Task in lute.tasks, or add it to any existing module if it makes sense for it to belong there. The Task itself is a single class constructed as:

  1. Your analysis Task is a class named in a way that matches its Pydantic model. E.g. RunTask is the Task, and RunTaskParameters is the Pydantic model.
  2. The class must inherit from the Task class (see template below). If you intend to use MPI see the following section.
  3. You must provide an implementation of a _run method. This is the method that will be executed when the Task is run. You can in addition write as many methods as you need. For fine-grained execution control you can also provide _pre_run() and _post_run() methods, but this is optional.
  4. For all communication (including print statements) you should use the _report_to_executor(msg: Message) method. Since the Task is run as a subprocess this method will pass information to the controlling Executor. You can pass any type of object using this method, strings, plots, arrays, etc.
  5. If you did not use the set_result configuration option in your parameters model, make sure to provide a result when finished. This is done by setting self._result.payload = .... You can set the result to be any object. If you have written the result to a file, for example, please provide a path.

A minimal template is provided below.

\"\"\"Standard docstring...\"\"\"\n\n__all__ = [\"RunTask\"]\n__author__ = \"\" # Please include so we know who the SME is\n\n# Include any imports you need here\n\nfrom lute.execution.ipc import Message # Message for communication\nfrom lute.io.models.base import *      # For TaskParameters\nfrom lute.tasks.task import *          # For Task\n\nclass RunTask(Task): # Inherit from Task\n    \"\"\"Task description goes here, or in __init__\"\"\"\n\n    def __init__(self, *, params: TaskParameters) -> None:\n        super().__init__(params=params) # Sets up Task, parameters, etc.\n        # Parameters will be available through:\n          # self._task_parameters\n          # You access with . operator: self._task_parameters.param1, etc.\n        # Your result object is availble through:\n          # self._result\n            # self._result.payload <- Main result\n            # self._result.summary <- Short summary\n            # self._result.task_status <- Semi-automatic, but can be set manually\n\n    def _run(self) -> None:\n        # THIS METHOD MUST BE PROVIDED\n        self.do_my_analysis()\n\n    def do_my_analysis(self) -> None:\n        # Send a message, proper way to print:\n        msg: Message(contents=\"My message contents\", signal=\"\")\n        self._report_to_executor(msg)\n\n        # When done, set result - assume we wrote a file, e.g.\n        self._result.payload = \"/path/to/output_file.h5\"\n        # Optionally also set status - good practice but not obligatory\n        self._result.task_status = TaskStatus.COMPLETED\n
"},{"location":"tutorial/new_task/#using-mpi-for-your-task","title":"Using MPI for your Task","text":"

In the case your Task is written to use MPI a slight modification to the template above is needed. Specifically, an additional keyword argument should be passed to the base class initializer: use_mpi=True. This tells the base class to adjust signalling/communication behaviour appropriately for a multi-rank MPI program. Doing this prevents tricky-to-track-down problems due to ranks starting, completing and sending messages at different times. The rest of your code can, as before, be written as you see fit. The use of this keyword argument will also synchronize the start of all ranks and wait until all ranks have finished to exit.

\"\"\"Task which needs to run with MPI\"\"\"\n\n__all__ = [\"RunTask\"]\n__author__ = \"\" # Please include so we know who the SME is\n\n# Include any imports you need here\n\nfrom lute.execution.ipc import Message # Message for communication\nfrom lute.io.models.base import *      # For TaskParameters\nfrom lute.tasks.task import *          # For Task\n\n# Only the init is shown\nclass RunMPITask(Task): # Inherit from Task\n    \"\"\"Task description goes here, or in __init__\"\"\"\n\n    # Signal the use of MPI!\n    def __init__(self, *, params: TaskParameters, use_mpi: bool = True) -> None:\n        super().__init__(params=params, use_mpi=use_mpi) # Sets up Task, parameters, etc.\n        # That's it.\n
"},{"location":"tutorial/new_task/#message-signals","title":"Message signals","text":"

Signals in Message objects are strings and can be one of the following:

LUTE_SIGNALS: Set[str] = {\n    \"NO_PICKLE_MODE\",\n    \"TASK_STARTED\",\n    \"TASK_FAILED\",\n    \"TASK_STOPPED\",\n    \"TASK_DONE\",\n    \"TASK_CANCELLED\",\n    \"TASK_RESULT\",\n}\n

Each of these signals is associated with a hook on the Executor-side. They are for the most part used by base classes; however, you can choose to make use of them manually as well.
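
As a small sketch (the message contents here are made up), a signal can be attached to a message manually from inside a Task method and reported to the Executor:

# Inside a method of your Task subclass (see the template above)\nmsg: Message = Message(\n    contents=\"Analysis failed to converge!\",\n    signal=\"TASK_FAILED\",  # One of LUTE_SIGNALS - triggers the corresponding Executor hook\n)\nself._report_to_executor(msg)\n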

"},{"location":"tutorial/new_task/#making-your-task-available","title":"Making your Task available","text":"

Once the Task has been written, it needs to be made available for import. Since different Tasks can have conflicting dependencies and environments, this is managed through an import function. When the Task is done, or ready for testing, a condition is added to lute.tasks.__init__.import_task. For example, assume the Task is called RunXASAnalysis and it's defined in a module called xas.py, we would add the following lines to the import_task function:

# in lute.tasks.__init__\n\n# ...\n\ndef import_task(task_name: str) -> Type[Task]:\n    # ...\n    if task_name == \"RunXASAnalysis\":\n        from .xas import RunXASAnalysis\n\n        return RunXASAnalysis\n
"},{"location":"tutorial/new_task/#defining-an-executor","title":"Defining an Executor","text":"

The process of Executor definition is identical to the process described for ThirdPartyTasks above. The one exception is that if you defined the Task to use MPI, as described in the section above (Using MPI for your Task), you will likely want to use the MPIExecutor.

"}]} \ No newline at end of file +{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Setup","text":"

LUTE is publically available on GitHub. In order to run it, the first step is to clone the repository:

# Navigate to the directory of your choice.\ngit clone@github.com:slac-lcls/lute\n

The repository directory structure is as follows:

lute\n  |--- config             # Configuration YAML files (see below) and templates for third party config\n  |--- docs               # Documentation (including this page)\n  |--- launch_scripts     # Entry points for using SLURM and communicating with Airflow\n  |--- lute               # Code\n        |--- run_task.py  # Script to run an individual managed Task\n        |--- ...\n  |--- utilities          # Help utility programs\n  |--- workflows          # This directory contains workflow definitions. It is synced elsewhere and not used directly.\n\n

In general, most interactions with the software will be through scripts located in the launch_scripts directory. Some users (for certain use-cases) may also choose to run the run_task.py script directly - it's location has been highlighted within hierarchy. To begin with you will need a YAML file, templates for which are available in the config directory. The structure of the YAML file and how to use the various launch scripts are described in more detail below.

"},{"location":"#a-note-on-utilties","title":"A note on utilties","text":"

In the utilities directory there are two useful programs to provide assistance with using the software:

"},{"location":"#basic-usage","title":"Basic Usage","text":""},{"location":"#overview","title":"Overview","text":"

LUTE runs code as Tasks that are managed by an Executor. The Executor provides modifications to the environment the Task runs in, as well as controls details of inter-process communication, reporting results to the eLog, etc. Combinations of specific Executors and Tasks are already provided, and are referred to as managed Tasks. Managed Tasks are submitted as a single unit. They can be run individually, or a series of independent steps can be submitted all at once in the form of a workflow, or directed acyclic graph (DAG). This latter option makes use of Airflow to manage the individual execution steps.

Running analysis with LUTE is the process of submitting one or more managed Tasks. This is generally a two step process.

  1. First, a configuration YAML file is prepared. This contains the parameterizations of all the Tasks which you may run.
  2. Individual managed Task submission, or workflow (DAG) submission.

These two steps are described below.

"},{"location":"#preparing-a-configuration-yaml","title":"Preparing a Configuration YAML","text":"

All Tasks are parameterized through a single configuration YAML file - even third party code which requires its own configuration files is managed through this YAML file. The basic structure is split into two documents, a brief header section which contains information that is applicable across all Tasks, such as the experiment name, run numbers and the working directory, followed by per Task parameters:

%YAML 1.3\n---\ntitle: \"Some title.\"\nexperiment: \"MYEXP123\"\n# run: 12 # Does not need to be provided\ndate: \"2024/05/01\"\nlute_version: 0.1\ntask_timeout: 600\nwork_dir: \"/sdf/scratch/users/d/dorlhiac\"\n...\n---\nTaskOne:\n  param_a: 123\n  param_b: 456\n  param_c:\n    sub_var: 3\n    sub_var2: 4\n\nTaskTwo:\n  new_param1: 3\n  new_param2: 4\n\n# ...\n...\n

In the first document, the header, it is important that the work_dir is properly specified. This is the root directory to which Task outputs will be written and where the LUTE database will be stored. It may also be desirable to modify the task_timeout parameter, which defines the time limit for individual Task jobs. By default it is set to 10 minutes, which may not be sufficient for long-running jobs. This value is applied to all Tasks, so it should account for the longest-running job you expect.
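
As a small sketch (the values below are placeholders to adapt to your own experiment), a header tuned for longer-running jobs might include:

task_timeout: 6000  # 100 minutes, in seconds - applied to every Task\nwork_dir: \"/sdf/data/lcls/ds/exp/<experiment>/results/lute_output\"  # Example path only\n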

The actual analysis parameters are defined in the second document. As these vary from Task to Task, a full description will not be provided here. An actual template with real Task parameters is available in config/test.yaml. Your analysis POC can also help you set up and choose the correct Tasks to include as a starting point. The template YAML file has further descriptions of what each parameter does and how to fill it out. You can also refer to the lute_help program described under the following sub-heading.

Some things to consider and possible points of confusion:

| Managed Task | The Task it Runs | Task Description |
|---|---|---|
| SmallDataProducer | SubmitSMD | Smalldata production |
| CrystFELIndexer | IndexCrystFEL | Crystallographic indexing |
| PartialatorMerger | MergePartialator | Crystallographic merging |
| HKLComparer | CompareHKL | Crystallographic figures of merit |
| HKLManipulator | ManipulateHKL | Crystallographic format conversions |
| DimpleSolver | DimpleSolve | Crystallographic structure solution with molecular replacement |
| PeakFinderPyAlgos | FindPeaksPyAlgos | Peak finding with PyAlgos algorithm. |
| PeakFinderPsocake | FindPeaksPsocake | Peak finding with psocake algorithm. |
| StreamFileConcatenator | ConcatenateStreamFiles | Stream file concatenation. |
"},{"location":"#how-do-i-know-what-parameters-are-available-and-what-they-do","title":"How do I know what parameters are available, and what they do?","text":"

A summary of Task parameters is available through the lute_help program.

> utilities/lute_help -t [TaskName]\n

Note, some parameters may say \"Unknown description\" - this either means they are using an old-style definition that does not include parameter help, or they may have some internal use. In particular you will see this for lute_config on every Task; this parameter is filled in automatically and should be ignored. For example:

> utilities/lute_help -t IndexCrystFEL\nINFO:__main__:Fetching parameter information for IndexCrystFEL.\nIndexCrystFEL\n-------------\nParameters for CrystFEL's `indexamajig`.\n\nThere are many parameters, and many combinations. For more information on\nusage, please refer to the CrystFEL documentation, here:\nhttps://www.desy.de/~twhite/crystfel/manual-indexamajig.html\n\n\nRequired Parameters:\n--------------------\n[...]\n\nAll Parameters:\n-------------\n[...]\n\nhighres (number)\n    Mark all pixels greater than `x` has bad.\n\nprofile (boolean) - Default: False\n    Display timing data to monitor performance.\n\ntemp_dir (string)\n    Specify a path for the temp files folder.\n\nwait_for_file (integer) - Default: 0\n    Wait at most `x` seconds for a file to be created. A value of -1 means wait forever.\n\nno_image_data (boolean) - Default: False\n    Load only the metadata, no iamges. Can check indexability without high data requirements.\n\n[...]\n
"},{"location":"#running-managed-tasks-and-workflows-dags","title":"Running Managed Tasks and Workflows (DAGs)","text":"

After a YAML file has been filled in, you can run a Task. There are multiple ways to submit a Task, but these three are the most common:

  1. Run a single managed Task interactively by running python ...
  2. Run a single managed Task as a batch job (e.g. on S3DF) via a SLURM submission submit_slurm.sh ...
  3. Run a DAG (workflow with multiple managed Tasks).

These will be covered in turn below; however, in general all methods will require two parameters: the path to a configuration YAML file, and the name of the managed Task or workflow you want to run. When submitting via SLURM or submitting an entire workflow there are additional parameters to control these processes.

"},{"location":"#running-single-managed-tasks-interactively","title":"Running single managed Tasks interactively","text":"

The simplest submission method is just to run Python interactively. In most cases this is not practical for long-running analysis, but may be of use for short Tasks or when debugging. From the root directory of the LUTE repository (or after installation) you can use the run_task.py script:

> python -B [-O] run_task.py -t <ManagedTaskName> -c </path/to/config/yaml>\n

The command-line arguments in square brackets [] are optional, while those in <> must be provided:

"},{"location":"#submitting-a-single-managed-task-as-a-batch-job","title":"Submitting a single managed Task as a batch job","text":"

On S3DF you can also submit individual managed Tasks to run as batch jobs. To do so, use launch_scripts/submit_slurm.sh:

> launch_scripts/submit_slurm.sh -t <ManagedTaskName> -c </path/to/config/yaml> [--debug] $SLURM_ARGS\n

As before, command-line arguments in square brackets [] are optional, while those in <> must be provided.

In addition to the LUTE-specific arguments, SLURM arguments must also be provided ($SLURM_ARGS above). You can provide as many as you want; however, you will need to provide at least:

You will likely also want to provide at a minimum:

In general, it is best to prefer the long form of SLURM arguments (--arg=<...>) in order to avoid potential clashes with present or future LUTE arguments.
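
As a concrete sketch (the managed Task name is one of those listed earlier, while the partition, account and core count are placeholders to replace with values appropriate for your allocation), a full submission might look like:

> launch_scripts/submit_slurm.sh -t PeakFinderPyAlgos -c /path/to/config.yaml --partition=<partition> --account=<account> --ntasks=64\n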

"},{"location":"#workflow-dag-submission","title":"Workflow (DAG) submission","text":"

Finally, you can submit a full workflow (e.g. SFX analysis, smalldata production and summary results, geometry optimization...). This can be done using a single script, submit_launch_airflow.sh, similarly to the SLURM submission above:

> launch_scripts/submit_launch_airflow.sh /path/to/lute/launch_scripts/launch_airflow.py -c </path/to/yaml.yaml> -w <dag_name> [--debug] [--test] [-e <exp>] [-r <run>] $SLURM_ARGS\n

The submission process is slightly more complicated in this case. A more in-depth explanation is provided under \"Airflow Launch Steps\", in the advanced usage section below if interested. The parameters are as follows - as before command-line arguments in square brackets [] are optional, while those in <> must be provided:

The $SLURM_ARGS must be provided in the same manner as when submitting an individual managed Task by hand to be run as batch job with the script above. Note that these parameters will be used as the starting point for the SLURM arguments of every managed Task in the DAG; however, individual steps in the DAG may have overrides built-in where appropriate to make sure that step is not submitted with potentially incompatible arguments. For example, a single threaded analysis Task may be capped to running on one core, even if in general everything should be running on 100 cores, per the SLURM argument provided. These caps are added during development and cannot be disabled through configuration changes in the YAML.

DAG List

"},{"location":"#dag-submission-from-the-elog","title":"DAG Submission from the eLog","text":"

You can use the script in the previous section to submit jobs through the eLog. To do so, navigate to the Workflow > Definitions tab using the blue navigation bar at the top of the eLog. On this tab, in the top-right corner (underneath the help and zoom icons), you can click the + sign to add a new workflow. This will bring up a \"Workflow definition\" UI window. When filling out the eLog workflow definition, all of the following fields are needed:

Upon clicking create you will see a new entry in the table on the definitions page. In order to run MANUAL workflows, or re-run automatic workflows, you must navigate to the Workflows > Control tab. For each acquisition run you will find a drop down menu under the Job column. To submit a workflow you select it from this drop down menu by the Name you provided when creating its definition.

"},{"location":"#advanced-usage","title":"Advanced Usage","text":""},{"location":"#variable-substitution-in-yaml-files","title":"Variable Substitution in YAML Files","text":"

Using validators, it is possible to define (generally, default) model parameters for a Task in terms of other parameters. It is also possible to use validated Pydantic model parameters to substitute values into a configuration file required to run a third party Task (e.g. some Tasks may require their own JSON, TOML files, etc. to run properly). For more information on these types of substitutions, refer to the new_task.md documentation on Task creation.

These types of substitutions, however, have a limitation in that they are not easily adapted at run time. They therefore address only a small number of the possible combinations in the dependencies between different input parameters. In order to support more complex relationships between parameters, variable substitutions can also be used in the configuration YAML itself. Using a syntax similar to Jinja templates, you can define values for YAML parameters in terms of other parameters or environment variables. The values are substituted before Pydantic attempts to validate the configuration.

It is perhaps easiest to illustrate with an example. A test case is provided in config/test_var_subs.yaml and is reproduced here:

%YAML 1.3\n---\ntitle: \"Configuration to Test YAML Substitution\"\nexperiment: \"TestYAMLSubs\"\nrun: 12\ndate: \"2024/05/01\"\nlute_version: 0.1\ntask_timeout: 600\nwork_dir: \"/sdf/scratch/users/d/dorlhiac\"\n...\n---\nOtherTask:\n  useful_other_var: \"USE ME!\"\n\nNonExistentTask:\n  test_sub: \"/path/to/{{ experiment }}/file_r{{ run:04d }}.input\"         # Substitute `experiment` and `run` from header above\n  test_env_sub: \"/path/to/{{ $EXPERIMENT }}/file.input\"                   # Substitute from the environment variable $EXPERIMENT\n  test_nested:\n    a: \"outfile_{{ run }}_one.out\"                                        # Substitute `run` from header above\n    b:\n      c: \"outfile_{{ run }}_two.out\"                                      # Also substitute `run` from header above\n      d: \"{{ OtherTask.useful_other_var }}\"                               # Substitute `useful_other_var` from `OtherTask`\n  test_fmt: \"{{ run:04d }}\"                                               # Subsitute `run` and format as 0012\n  test_env_fmt: \"{{ $RUN:04d }}\"                                          # Substitute environment variable $RUN and pad to 4 w/ zeros\n...\n

Input parameters in the config YAML can be substituted with either other input parameters or environment variables, with or without limited string formatting. All substitutions occur between double curly brackets: {{ VARIABLE_TO_SUBSTITUTE }}. Environment variables are indicated by a $ in front of the variable name. Parameters from the header, i.e. the first YAML document (top section) containing the run, experiment, version fields, etc., can be substituted without any qualification. If you want to use the run parameter, you can substitute it using {{ run }}. All other parameters, i.e. those from other Tasks or within Tasks, must use a qualified name, with nested levels delimited by a period (.). E.g. consider a structure like:

Task:\n  param_set:\n    a: 1\n    b: 2\n    c: 3\n

In order to use parameter c, you would use {{ Task.param_set.c }} as the substitution.

Take care when using substitutions! This process will not try to guess for you. When a substitution is not available, e.g. due to misspelling, one of two things will happen:

Defining your own parameters

The configuration file is not validated in its totality, only on a Task-by-Task basis, but it is read in its totality. E.g. when running MyTask, only that portion of the configuration is validated even though the entire file has been read and is available for substitutions. As a result, it is safe to introduce extra entries into the YAML file, as long as they are not entered under a specific Task's configuration. This can be used to define your own global substitutions, for example when a key variable is used across different Tasks. Consider a more generic configuration file in which a single variable is used by multiple Tasks: it may change between experiments, but is likely static for the duration of a single set of analyses. To avoid mistakes when changing the configuration between experiments, you can define this special variable (or variables) as a separate entry in the YAML and use substitutions in each Task's configuration. This way the variable only needs to be changed in one place.

# Define our substitution. This is only for substitutions!\nMY_SPECIAL_SUB: \"EXPMT_DEPENDENT_VALUE\"  # Can change here once per experiment!\n\nRunTask1:\n  special_var: \"{{ MY_SPECIAL_SUB }}\"\n  var_1: 1\n  var_2: \"a\"\n  # ...\n\nRunTask2:\n  special_var: \"{{ MY_SPECIAL_SUB }}\"\n  var_3: \"abcd\"\n  var_4: 123\n  # ...\n\nRunTask3:\n  special_var: \"{{ MY_SPECIAL_SUB }}\"\n  #...\n\n# ... and so on\n
"},{"location":"#gotchas","title":"Gotchas!","text":"

Order matters

While in general you can use parameters that appear later in a YAML document to substitute for values of parameters that appear earlier, the substitutions themselves will be performed in order of appearance. It is therefore NOT possible to correctly use a later parameter as a substitution for an earlier one, if the later one itself depends on a substitution. The YAML document, however, can be rearranged without error. The order in the YAML document has no effect on execution order which is determined purely by the workflow definition. As mentioned above, the document is not validated in its entirety so rearrangements are allowed. For example consider the following situation which produces an incorrect substitution:

%YAML 1.3\n---\ntitle: \"Configuration to Test YAML Substitution\"\nexperiment: \"TestYAMLSubs\"\nrun: 12\ndate: \"2024/05/01\"\nlute_version: 0.1\ntask_timeout: 600\nwork_dir: \"/sdf/data/lcls/ds/exp/experiment/scratch\"\n...\n---\nRunTaskOne:\n  input_dir: \"{{ RunTaskTwo.path }}\"  # Will incorrectly be \"{{ work_dir }}/additional_path/{{ $RUN }}\"\n  # ...\n\nRunTaskTwo:\n  # Remember `work_dir` and `run` come from the header document and don't need to\n  # be qualified\n  path: \"{{ work_dir }}/additional_path/{{ run }}\"\n...\n

This configuration can be rearranged to achieve the desired result:

%YAML 1.3\n---\ntitle: \"Configuration to Test YAML Substitution\"\nexperiment: \"TestYAMLSubs\"\nrun: 12\ndate: \"2024/05/01\"\nlute_version: 0.1\ntask_timeout: 600\nwork_dir: \"/sdf/data/lcls/ds/exp/experiment/scratch\"\n...\n---\nRunTaskTwo:\n  # Remember `work_dir` comes from the header document and doesn't need to be qualified\n  path: \"{{ work_dir }}/additional_path/{{ run }}\"\n\nRunTaskOne:\n  input_dir: \"{{ RunTaskTwo.path }}\"  # Will now be /sdf/data/lcls/ds/exp/experiment/scratch/additional_path/12\n  # ...\n...\n

On the other hand, relationships such as these may point to inconsistencies in the dependencies between Tasks, which may warrant a refactor.

Found unhashable key

To avoid YAML parsing issues when using the substitution syntax, be sure to quote your substitutions. Before substitution is performed, a dictionary is first constructed by the pyyaml package which parses the document - it may fail to parse the document and raise an exception if the substitutions are not quoted. E.g.

# USE THIS\nMyTask:\n  var_sub: \"{{ other_var:04d }}\"\n\n# **DO NOT** USE THIS\nMyTask:\n  var_sub: {{ other_var:04d }}\n

During validation, Pydantic will by default cast variables if possible; because of this, it is generally safe to use strings for substitutions. E.g. if your parameter is expecting an integer, and after substitution you pass \"2\", Pydantic will cast this to the int 2, and validation will succeed. As part of the substitution process, limited type casting will also be handled if it is necessary for any formatting strings provided. E.g. \"{{ run:04d }}\" requires that run be an integer, so it will be treated as such in order to apply the formatting.
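
As a brief illustration of this casting behaviour, consider the following sketch (ExampleParams is a hypothetical model used only for demonstration, not an actual LUTE parameter class):

from pydantic import BaseModel\n\nclass ExampleParams(BaseModel):\n    run: int  # The Task expects an integer\n\n# After substitution the YAML value may arrive as the string \"2\";\n# Pydantic coerces it to the declared int type during validation.\nparams = ExampleParams(run=\"2\")\nprint(params.run, type(params.run))  # -> 2 <class 'int'>\n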

"},{"location":"#custom-run-time-dags","title":"Custom Run-Time DAGs","text":"

In most cases, standard DAGs should be called as described above. However, Airflow also supports the dynamic creation of DAGs, e.g. to vary the input data to various steps, or the number of steps that will occur. Some of this functionality has been used to allow for user-defined DAGs which are passed in the form of a dictionary, allowing Airflow to construct the workflow as it is running.

A basic YAML syntax is used to construct a series of nested dictionaries which define a DAG. Consider a simplified serial femtosecond crystallography DAG which runs peak finding through merging and then calculates some statistics. I.e. we want an execution order that looks like:

peak_finder >> indexer >> merger >> hkl_comparer\n

We can alternatively define this DAG in YAML:

task_name: PeakFinderPyAlgos\nslurm_params: ''\nnext:\n- task_name: CrystFELIndexer\n  slurm_params: ''\n  next:\n  - task_name: PartialatorMerger\n    slurm_params: ''\n    next:\n    - task_name: HKLComparer\n      slurm_params: ''\n      next: []\n

I.e. we define a tree where each node is constructed using Node(task_name: str, slurm_params: str, next: List[Node]).

As a second example, to run task1 followed by task2 and task3 in parallel, we would use:

task_name: Task1\nslurm_params: ''\nnext:\n- task_name: Task2\n  slurm_params: ''\n  next: []\n- task_name: Task3\n  slurm_params: ''\n  next: []\n
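
To make the tree structure concrete, the sketch below shows one way such a YAML definition could be loaded into node objects. It is only an illustration of the format described above - it is not the code LUTE itself uses to parse the definition, and my_dag.yaml is a placeholder path:

from dataclasses import dataclass, field\nfrom typing import Any, Dict, List\n\nimport yaml  # Requires the pyyaml package\n\n@dataclass\nclass Node:\n    task_name: str\n    slurm_params: str\n    next: List[\"Node\"] = field(default_factory=list)\n\ndef build_tree(d: Dict[str, Any]) -> Node:\n    # Recursively convert the nested dictionaries into Node objects.\n    children = [build_tree(child) for child in (d.get(\"next\") or [])]\n    return Node(d[\"task_name\"], d[\"slurm_params\"], children)\n\nwith open(\"my_dag.yaml\", \"r\") as f:  # Placeholder path to a DAG definition\n    root: Node = build_tree(yaml.safe_load(f))\n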

In order to run a DAG defined in this way, we pass the path to the YAML file we have defined it in to the launch script using -W <path_to_dag>. This is instead of calling it by name. E.g.

/path/to/lute/launch_scripts/submit_launch_airflow.sh /path/to/lute/launch_scripts/launch_airflow.py -e <exp> -r <run> -c /path/to/config -W <path_to_dag> --test [--debug] [SLURM_ARGS]\n

Note that fewer options are currently supported for configuring the operators for each step of the DAG. The SLURM arguments can be replaced in their entirety using a custom slurm_params string, but individual options cannot be modified.

"},{"location":"#debug-environment-variables","title":"Debug Environment Variables","text":"

Special markers have been inserted at certain points in the execution flow for LUTE. These can be enabled by setting the environment variables detailed below. These are intended to allow developers to exit the program at certain points to investigate behaviour or a bug. For instance, when working on configuration parsing, an environment variable can be set which exits the program after passing this step. This allows you to run LUTE otherwise as normal (described above), without having to modify any additional code or insert your own early exits.

Types of debug markers:

Developers can insert these markers as needed into their code to add new exit points, although as a rule of thumb they should be used sparingly, and generally only after major steps in the execution flow (e.g. after parsing, after beginning a task, after returning a result, etc.).

In order to include a new marker in your code:

from lute.execution.debug_utils import LUTE_DEBUG_EXIT\n\ndef my_code() -> None:\n    # ...\n    LUTE_DEBUG_EXIT(\"MYENVVAR\", \"Additional message to print\")\n    # If MYENVVAR is not set, the above function does nothing\n

You can enable a marker by setting it to 1, e.g. to enable the example marker above while running Tester:

MYENVVAR=1 python -B run_task.py -t Tester -c config/test.yaml\n
"},{"location":"#currently-used-environment-variables","title":"Currently used environment variables","text":""},{"location":"#airflow-launch-and-dag-execution-steps","title":"Airflow Launch and DAG Execution Steps","text":"

The Airflow launch process actually involves a number of steps, and is rather complicated. There are two wrapper steps prior to getting to the actual Airflow API communication.

  1. launch_scripts/submit_launch_airflow.sh is run.
  2. This script calls /sdf/group/lcls/ds/tools/lute_launcher with all the same parameters that it was called with.
  3. lute_launcher runs the launch_scripts/launch_airflow.py script which was provided as the first argument. This is the true launch script.
  4. launch_airflow.py communicates with the Airflow API, requesting that a specific DAG be launched. It then continues to run, and gathers the individual logs and the exit status of each step of the DAG.
  5. Airflow will then enter a loop of communication where it asks the JID to submit each step of the requested DAG as a batch job using launch_scripts/submit_slurm.sh.

There are some specific reasons for this complexity:

"},{"location":"usage/","title":"Setup","text":"

LUTE is publicly available on GitHub. In order to run it, the first step is to clone the repository:

# Navigate to the directory of your choice.\ngit clone git@github.com:slac-lcls/lute\n

The repository directory structure is as follows:

lute\n  |--- config             # Configuration YAML files (see below) and templates for third party config\n  |--- docs               # Documentation (including this page)\n  |--- launch_scripts     # Entry points for using SLURM and communicating with Airflow\n  |--- lute               # Code\n        |--- run_task.py  # Script to run an individual managed Task\n        |--- ...\n  |--- utilities          # Help utility programs\n  |--- workflows          # This directory contains workflow definitions. It is synced elsewhere and not used directly.\n\n

In general, most interactions with the software will be through scripts located in the launch_scripts directory. Some users (for certain use-cases) may also choose to run the run_task.py script directly - its location has been highlighted within the hierarchy above. To begin, you will need a YAML file, templates for which are available in the config directory. The structure of the YAML file and how to use the various launch scripts are described in more detail below.

"},{"location":"usage/#a-note-on-utilties","title":"A note on utilties","text":"

In the utilities directory there are two useful programs to provide assistance with using the software:

"},{"location":"usage/#basic-usage","title":"Basic Usage","text":""},{"location":"usage/#overview","title":"Overview","text":"

LUTE runs code as Tasks that are managed by an Executor. The Executor provides modifications to the environment the Task runs in, as well as controls details of inter-process communication, reporting results to the eLog, etc. Combinations of specific Executors and Tasks are already provided, and are referred to as managed Tasks. Managed Tasks are submitted as a single unit. They can be run individually, or a series of independent steps can be submitted all at once in the form of a workflow, or directed acyclic graph (DAG). This latter option makes use of Airflow to manage the individual execution steps.

Running analysis with LUTE is the process of submitting one or more managed Tasks. This is generally a two step process.

  1. First, a configuration YAML file is prepared. This contains the parameterizations of all the Tasks which you may run.
  2. Individual managed Task submission, or workflow (DAG) submission.

These two steps are described below.

"},{"location":"usage/#preparing-a-configuration-yaml","title":"Preparing a Configuration YAML","text":"

All Tasks are parameterized through a single configuration YAML file - even third party code which requires its own configuration files is managed through this YAML file. The basic structure is split into two documents, a brief header section which contains information that is applicable across all Tasks, such as the experiment name, run numbers and the working directory, followed by per Task parameters:

%YAML 1.3\n---\ntitle: \"Some title.\"\nexperiment: \"MYEXP123\"\n# run: 12 # Does not need to be provided\ndate: \"2024/05/01\"\nlute_version: 0.1\ntask_timeout: 600\nwork_dir: \"/sdf/scratch/users/d/dorlhiac\"\n...\n---\nTaskOne:\n  param_a: 123\n  param_b: 456\n  param_c:\n    sub_var: 3\n    sub_var2: 4\n\nTaskTwo:\n  new_param1: 3\n  new_param2: 4\n\n# ...\n...\n

In the first document, the header, it is important that the work_dir is properly specified. This is the root directory to which Task outputs will be written and where the LUTE database will be stored. It may also be desirable to modify the task_timeout parameter, which defines the time limit for individual Task jobs. By default it is set to 10 minutes, although this may not be sufficient for long-running jobs. This value is applied to all Tasks, so it should account for the longest-running job you expect.

The actual analysis parameters are defined in the second document. As these vary from Task to Task, a full description will not be provided here. An actual template with real Task parameters is available in config/test.yaml. Your analysis POC can also help you set up and choose the correct Tasks to include as a starting point. The template YAML file has further descriptions of what each parameter does and how to fill it out. You can also refer to the lute_help program described under the following sub-heading.

Some things to consider and possible points of confusion:

Managed Task The Task it Runs Task Description SmallDataProducer SubmitSMD Smalldata production CrystFELIndexer IndexCrystFEL Crystallographic indexing PartialatorMerger MergePartialator Crystallographic merging HKLComparer CompareHKL Crystallographic figures of merit HKLManipulator ManipulateHKL Crystallographic format conversions DimpleSolver DimpleSolve Crystallographic structure solution with molecular replacement PeakFinderPyAlgos FindPeaksPyAlgos Peak finding with PyAlgos algorithm. PeakFinderPsocake FindPeaksPsocake Peak finding with psocake algorithm. StreamFileConcatenator ConcatenateStreamFiles Stream file concatenation."},{"location":"usage/#how-do-i-know-what-parameters-are-available-and-what-they-do","title":"How do I know what parameters are available, and what they do?","text":"

A summary of Task parameters is available through the lute_help program.

> utilities/lute_help -t [TaskName]\n

Note: some parameters may say \"Unknown description\" - this either means they are using an old-style definition that does not include parameter help, or they may have some internal use. In particular, you will see this for lute_config on every Task; this parameter is filled in automatically and should be ignored. For example:

> utilities/lute_help -t IndexCrystFEL\nINFO:__main__:Fetching parameter information for IndexCrystFEL.\nIndexCrystFEL\n-------------\nParameters for CrystFEL's `indexamajig`.\n\nThere are many parameters, and many combinations. For more information on\nusage, please refer to the CrystFEL documentation, here:\nhttps://www.desy.de/~twhite/crystfel/manual-indexamajig.html\n\n\nRequired Parameters:\n--------------------\n[...]\n\nAll Parameters:\n-------------\n[...]\n\nhighres (number)\n    Mark all pixels greater than `x` as bad.\n\nprofile (boolean) - Default: False\n    Display timing data to monitor performance.\n\ntemp_dir (string)\n    Specify a path for the temp files folder.\n\nwait_for_file (integer) - Default: 0\n    Wait at most `x` seconds for a file to be created. A value of -1 means wait forever.\n\nno_image_data (boolean) - Default: False\n    Load only the metadata, no images. Can check indexability without high data requirements.\n\n[...]\n
"},{"location":"usage/#running-managed-tasks-and-workflows-dags","title":"Running Managed Tasks and Workflows (DAGs)","text":"

After a YAML file has been filled in, you can run a Task. There are multiple ways to submit a Task, but these three are the most likely:

  1. Run a single managed Task interactively by running python ...
  2. Run a single managed Task as a batch job (e.g. on S3DF) via a SLURM submission submit_slurm.sh ...
  3. Run a DAG (workflow with multiple managed Tasks).

These will be covered in turn below; however, in general all methods will require two parameters: the path to a configuration YAML file, and the name of the managed Task or workflow you want to run. When submitting via SLURM or submitting an entire workflow there are additional parameters to control these processes.

"},{"location":"usage/#running-single-managed-tasks-interactively","title":"Running single managed Tasks interactively","text":"

The simplest submission method is just to run Python interactively. In most cases this is not practical for long-running analysis, but may be of use for short Tasks or when debugging. From the root directory of the LUTE repository (or after installation) you can use the run_task.py script:

> python -B [-O] run_task.py -t <ManagedTaskName> -c </path/to/config/yaml>\n

The command-line arguments in square brackets [] are optional, while those in <> must be provided:

"},{"location":"usage/#submitting-a-single-managed-task-as-a-batch-job","title":"Submitting a single managed Task as a batch job","text":"

On S3DF you can also submit individual managed Tasks to run as batch jobs. To do so use launch_scripts/submit_slurm.sh

> launch_scripts/submit_slurm.sh -t <ManagedTaskName> -c </path/to/config/yaml> [--debug] $SLURM_ARGS\n

As before command-line arguments in square brackets [] are optional, while those in <> must be provided

In addition to the LUTE-specific arguments, SLURM arguments must also be provided ($SLURM_ARGS above). You can provide as many as you want; however, you will need to provide at least:

You will likely also want to provide at a minimum:

In general, it is best to prefer the long form of the SLURM arguments (--arg=<...>) in order to avoid potential clashes with present or future LUTE arguments.
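
For example, a submission might look like the following, where PeakFinderPyAlgos is used purely as an example managed Task and the angle-bracket placeholders must be replaced with values appropriate for your allocation:

> launch_scripts/submit_slurm.sh -t PeakFinderPyAlgos -c /path/to/config.yaml --partition=<partition> --account=<account> --ntasks=<n>\n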

"},{"location":"usage/#workflow-dag-submission","title":"Workflow (DAG) submission","text":"

Finally, you can submit a full workflow (e.g. SFX analysis, smalldata production and summary results, geometry optimization...). This can be done using a single script, submit_launch_airflow.sh, similarly to the SLURM submission above:

> launch_scripts/submit_launch_airflow.sh /path/to/lute/launch_scripts/launch_airflow.py -c </path/to/yaml.yaml> -w <dag_name> [--debug] [--test] [-e <exp>] [-r <run>] $SLURM_ARGS\n

The submission process is slightly more complicated in this case. A more in-depth explanation is provided under \"Airflow Launch Steps\", in the advanced usage section below if interested. The parameters are as follows - as before command-line arguments in square brackets [] are optional, while those in <> must be provided:

The $SLURM_ARGS must be provided in the same manner as when submitting an individual managed Task by hand to be run as a batch job with the script above. Note that these parameters are used as the starting point for the SLURM arguments of every managed Task in the DAG; however, individual steps in the DAG may have built-in overrides where appropriate to ensure that a step is not submitted with potentially incompatible arguments. For example, a single-threaded analysis Task may be capped to run on one core, even if the provided SLURM arguments request 100 cores in general. These caps are added during development and cannot be disabled through configuration changes in the YAML.

DAG List

"},{"location":"usage/#dag-submission-from-the-elog","title":"DAG Submission from the eLog","text":"

You can use the script in the previous section to submit jobs through the eLog. To do so, navigate to the Workflow > Definitions tab using the blue navigation bar at the top of the eLog. On this tab, in the top-right corner (underneath the help and zoom icons), click the + sign to add a new workflow. This will bring up a \"Workflow definition\" UI window. When filling out the eLog workflow definition, all of the following fields are required:

Upon clicking create, you will see a new entry in the table on the definitions page. In order to run MANUAL workflows, or re-run automatic workflows, you must navigate to the Workflows > Control tab. For each acquisition run you will find a drop-down menu under the Job column. To submit a workflow, select it from this drop-down menu by the Name you provided when creating its definition.

"},{"location":"usage/#advanced-usage","title":"Advanced Usage","text":""},{"location":"usage/#variable-substitution-in-yaml-files","title":"Variable Substitution in YAML Files","text":"

Using validators, it is possible to define (generally, default) model parameters for a Task in terms of other parameters. It is also possible to use validated Pydantic model parameters to substitute values into a configuration file required to run a third party Task (e.g. some Tasks may require their own JSON, TOML files, etc. to run properly). For more information on these types of substitutions, refer to the new_task.md documentation on Task creation.

These types of substitutions, however, have a limitation in that they are not easily adapted at run time. They therefore address only a small number of the possible combinations in the dependencies between different input parameters. In order to support more complex relationships between parameters, variable substitutions can also be used in the configuration YAML itself. Using a syntax similar to Jinja templates, you can define values for YAML parameters in terms of other parameters or environment variables. The values are substituted before Pydantic attempts to validate the configuration.

It is perhaps easiest to illustrate with an example. A test case is provided in config/test_var_subs.yaml and is reproduced here:

%YAML 1.3\n---\ntitle: \"Configuration to Test YAML Substitution\"\nexperiment: \"TestYAMLSubs\"\nrun: 12\ndate: \"2024/05/01\"\nlute_version: 0.1\ntask_timeout: 600\nwork_dir: \"/sdf/scratch/users/d/dorlhiac\"\n...\n---\nOtherTask:\n  useful_other_var: \"USE ME!\"\n\nNonExistentTask:\n  test_sub: \"/path/to/{{ experiment }}/file_r{{ run:04d }}.input\"         # Substitute `experiment` and `run` from header above\n  test_env_sub: \"/path/to/{{ $EXPERIMENT }}/file.input\"                   # Substitute from the environment variable $EXPERIMENT\n  test_nested:\n    a: \"outfile_{{ run }}_one.out\"                                        # Substitute `run` from header above\n    b:\n      c: \"outfile_{{ run }}_two.out\"                                      # Also substitute `run` from header above\n      d: \"{{ OtherTask.useful_other_var }}\"                               # Substitute `useful_other_var` from `OtherTask`\n  test_fmt: \"{{ run:04d }}\"                                               # Substitute `run` and format as 0012\n  test_env_fmt: \"{{ $RUN:04d }}\"                                          # Substitute environment variable $RUN and pad to 4 w/ zeros\n...\n

Input parameters in the config YAML can be substituted with either other input parameters or environment variables, with or without limited string formatting. All substitutions occur between double curly brackets: {{ VARIABLE_TO_SUBSTITUTE }}. Environment variables are indicated by a $ in front of the variable name. Parameters from the header, i.e. the first YAML document (top section) containing the run, experiment, version fields, etc., can be substituted without any qualification. If you want to use the run parameter, you can substitute it using {{ run }}. All other parameters, i.e. those from other Tasks or within Tasks, must use a qualified name, with nested levels delimited by a period (.). E.g. consider a structure like:

Task:\n  param_set:\n    a: 1\n    b: 2\n    c: 3\n

In order to use parameter c, you would use {{ Task.param_set.c }} as the substitution.

Take care when using substitutions! This process will not try to guess for you. When a substitution is not available, e.g. due to misspelling, one of two things will happen:

Defining your own parameters

The configuration file is not validated in its totality, only on a Task-by-Task basis, but it is read in its totality. E.g. when running MyTask, only that portion of the configuration is validated even though the entire file has been read and is available for substitutions. As a result, it is safe to introduce extra entries into the YAML file, as long as they are not entered under a specific Task's configuration. This can be used to define your own global substitutions, for example when a key variable is used across different Tasks. Consider a more generic configuration file in which a single variable is used by multiple Tasks: it may change between experiments, but is likely static for the duration of a single set of analyses. To avoid mistakes when changing the configuration between experiments, you can define this special variable (or variables) as a separate entry in the YAML and use substitutions in each Task's configuration. This way the variable only needs to be changed in one place.

# Define our substitution. This is only for substitutions!\nMY_SPECIAL_SUB: \"EXPMT_DEPENDENT_VALUE\"  # Can change here once per experiment!\n\nRunTask1:\n  special_var: \"{{ MY_SPECIAL_SUB }}\"\n  var_1: 1\n  var_2: \"a\"\n  # ...\n\nRunTask2:\n  special_var: \"{{ MY_SPECIAL_SUB }}\"\n  var_3: \"abcd\"\n  var_4: 123\n  # ...\n\nRunTask3:\n  special_var: \"{{ MY_SPECIAL_SUB }}\"\n  #...\n\n# ... and so on\n
"},{"location":"usage/#gotchas","title":"Gotchas!","text":"

Order matters

While in general you can use parameters that appear later in a YAML document to substitute for values of parameters that appear earlier, the substitutions themselves will be performed in order of appearance. It is therefore NOT possible to correctly use a later parameter as a substitution for an earlier one, if the later one itself depends on a substitution. The YAML document, however, can be rearranged without error. The order in the YAML document has no effect on execution order which is determined purely by the workflow definition. As mentioned above, the document is not validated in its entirety so rearrangements are allowed. For example consider the following situation which produces an incorrect substitution:

%YAML 1.3\n---\ntitle: \"Configuration to Test YAML Substitution\"\nexperiment: \"TestYAMLSubs\"\nrun: 12\ndate: \"2024/05/01\"\nlute_version: 0.1\ntask_timeout: 600\nwork_dir: \"/sdf/data/lcls/ds/exp/experiment/scratch\"\n...\n---\nRunTaskOne:\n  input_dir: \"{{ RunTaskTwo.path }}\"  # Will incorrectly be \"{{ work_dir }}/additional_path/{{ $RUN }}\"\n  # ...\n\nRunTaskTwo:\n  # Remember `work_dir` and `run` come from the header document and don't need to\n  # be qualified\n  path: \"{{ work_dir }}/additional_path/{{ run }}\"\n...\n

This configuration can be rearranged to achieve the desired result:

%YAML 1.3\n---\ntitle: \"Configuration to Test YAML Substitution\"\nexperiment: \"TestYAMLSubs\"\nrun: 12\ndate: \"2024/05/01\"\nlute_version: 0.1\ntask_timeout: 600\nwork_dir: \"/sdf/data/lcls/ds/exp/experiment/scratch\"\n...\n---\nRunTaskTwo:\n  # Remember `work_dir` comes from the header document and doesn't need to be qualified\n  path: \"{{ work_dir }}/additional_path/{{ run }}\"\n\nRunTaskOne:\n  input_dir: \"{{ RunTaskTwo.path }}\"  # Will now be /sdf/data/lcls/ds/exp/experiment/scratch/additional_path/12\n  # ...\n...\n

On the other hand, relationships such as these may point to inconsistencies in the dependencies between Tasks, which may warrant a refactor.

Found unhashable key

To avoid YAML parsing issues when using the substitution syntax, be sure to quote your substitutions. Before substitution is performed, a dictionary is first constructed by the pyyaml package which parses the document - it may fail to parse the document and raise an exception if the substitutions are not quoted. E.g.

# USE THIS\nMyTask:\n  var_sub: \"{{ other_var:04d }}\"\n\n# **DO NOT** USE THIS\nMyTask:\n  var_sub: {{ other_var:04d }}\n

During validation, Pydantic will by default cast variables if possible; because of this, it is generally safe to use strings for substitutions. E.g. if your parameter is expecting an integer, and after substitution you pass \"2\", Pydantic will cast this to the int 2, and validation will succeed. As part of the substitution process, limited type casting will also be handled if it is necessary for any formatting strings provided. E.g. \"{{ run:04d }}\" requires that run be an integer, so it will be treated as such in order to apply the formatting.

"},{"location":"usage/#custom-run-time-dags","title":"Custom Run-Time DAGs","text":"

In most cases, standard DAGs should be called as described above. However, Airflow also supports the dynamic creation of DAGs, e.g. to vary the input data to various steps, or the number of steps that will occur. Some of this functionality has been used to allow for user-defined DAGs which are passed in the form of a dictionary, allowing Airflow to construct the workflow as it is running.

A basic YAML syntax is used to construct a series of nested dictionaries which define a DAG. Consider a simplified serial femtosecond crystallography DAG which runs peak finding through merging and then calculates some statistics. I.e. we want an execution order that looks like:

peak_finder >> indexer >> merger >> hkl_comparer\n

We can alternatively define this DAG in YAML:

task_name: PeakFinderPyAlgos\nslurm_params: ''\nnext:\n- task_name: CrystFELIndexer\n  slurm_params: ''\n  next:\n  - task_name: PartialatorMerger\n    slurm_params: ''\n    next:\n    - task_name: HKLComparer\n      slurm_params: ''\n      next: []\n

I.e. we define a tree where each node is constructed using Node(task_name: str, slurm_params: str, next: List[Node]).

As a second example, to run task1 followed by task2 and task3 in parallel, we would use:

task_name: Task1\nslurm_params: ''\nnext:\n- task_name: Task2\n  slurm_params: ''\n  next: []\n- task_name: Task3\n  slurm_params: ''\n  next: []\n

In order to run a DAG defined in this way, we pass the path to the YAML file we have defined it in to the launch script using -W <path_to_dag>. This is instead of calling it by name. E.g.

/path/to/lute/launch_scripts/submit_launch_airflow.sh /path/to/lute/launch_scripts/launch_airflow.py -e <exp> -r <run> -c /path/to/config -W <path_to_dag> --test [--debug] [SLURM_ARGS]\n

Note that fewer options are currently supported for configuring the operators for each step of the DAG. The SLURM arguments can be replaced in their entirety using a custom slurm_params string, but individual options cannot be modified.

"},{"location":"usage/#debug-environment-variables","title":"Debug Environment Variables","text":"

Special markers have been inserted at certain points in the execution flow for LUTE. These can be enabled by setting the environment variables detailed below. These are intended to allow developers to exit the program at certain points to investigate behaviour or a bug. For instance, when working on configuration parsing, an environment variable can be set which exits the program after passing this step. This allows you to run LUTE otherwise as normal (described above), without having to modify any additional code or insert your own early exits.

Types of debug markers:

Developers can insert these markers as needed into their code to add new exit points, although as a rule of thumb they should be used sparingly, and generally only after major steps in the execution flow (e.g. after parsing, after beginning a task, after returning a result, etc.).

In order to include a new marker in your code:

from lute.execution.debug_utils import LUTE_DEBUG_EXIT\n\ndef my_code() -> None:\n    # ...\n    LUTE_DEBUG_EXIT(\"MYENVVAR\", \"Additional message to print\")\n    # If MYENVVAR is not set, the above function does nothing\n

You can enable a marker by setting it to 1, e.g. to enable the example marker above while running Tester:

MYENVVAR=1 python -B run_task.py -t Tester -c config/test.yaml\n
"},{"location":"usage/#currently-used-environment-variables","title":"Currently used environment variables","text":""},{"location":"usage/#airflow-launch-and-dag-execution-steps","title":"Airflow Launch and DAG Execution Steps","text":"

The Airflow launch process actually involves a number of steps, and is rather complicated. There are two wrapper steps prior to getting to the actual Airflow API communication.

  1. launch_scripts/submit_launch_airflow.sh is run.
  2. This script calls /sdf/group/lcls/ds/tools/lute_launcher with all the same parameters that it was called with.
  3. lute_launcher runs the launch_scripts/launch_airflow.py script which was provided as the first argument. This is the true launch script.
  4. launch_airflow.py communicates with the Airflow API, requesting that a specific DAG be launched. It then continues to run, and gathers the individual logs and the exit status of each step of the DAG.
  5. Airflow will then enter a loop of communication where it asks the JID to submit each step of the requested DAG as a batch job using launch_scripts/submit_slurm.sh.

There are some specific reasons for this complexity:

"},{"location":"adrs/","title":"Architecture Decision Records","text":" ADR No. Record Date Title Status 1 2023-11-06 All analysis Tasks inherit from a base class Accepted 2 2023-11-06 Analysis Task submission and communication is performed via Executors Accepted 3 2023-11-06 Executors will run all Tasks via subprocess Proposed 4 2023-11-06 Airflow Operators and LUTE Executors are separate entities. Proposed 5 2023-12-06 Task-Executor IPC is Managed by Communicator Objects Proposed 6 2024-02-12 Third-party Config Files Managed by Templates Rendered by ThirdPartyTasks Proposed 7 2024-02-12 Task Configuration is Stored in a Database Managed by Executors Proposed 8 2024-03-18 Airflow credentials/authorization requires special launch program. Proposed 9 2024-04-15 Airflow launch script will run as long lived batch job. Proposed"},{"location":"adrs/MADR_LICENSE/","title":"MADR LICENSE","text":"

Copyright 2022 ADR Github Organization

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \u201cSoftware\u201d), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED \u201cAS IS\u201d, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

"},{"location":"adrs/adr-1/","title":"[ADR-1] All Analysis Tasks Inherit from a Base Class","text":"

Date: 2023-11-06

"},{"location":"adrs/adr-1/#status","title":"Status","text":"

Accepted

"},{"location":"adrs/adr-1/#context-and-problem-statement","title":"Context and Problem Statement","text":""},{"location":"adrs/adr-1/#decision","title":"Decision","text":""},{"location":"adrs/adr-1/#decision-drivers","title":"Decision Drivers","text":""},{"location":"adrs/adr-1/#considered-options","title":"Considered Options","text":""},{"location":"adrs/adr-1/#consequences","title":"Consequences","text":""},{"location":"adrs/adr-1/#compliance","title":"Compliance","text":""},{"location":"adrs/adr-1/#metadata","title":"Metadata","text":""},{"location":"adrs/adr-2/","title":"[ADR-2] Analysis Task Submission and Communication is Performed Via Executors","text":"

Date: 2023-11-06

"},{"location":"adrs/adr-2/#status","title":"Status","text":"

Accepted

"},{"location":"adrs/adr-2/#context-and-problem-statement","title":"Context and Problem Statement","text":""},{"location":"adrs/adr-2/#decision","title":"Decision","text":""},{"location":"adrs/adr-2/#decision-drivers","title":"Decision Drivers","text":""},{"location":"adrs/adr-2/#considered-options","title":"Considered Options","text":""},{"location":"adrs/adr-2/#consequences","title":"Consequences","text":""},{"location":"adrs/adr-2/#compliance","title":"Compliance","text":""},{"location":"adrs/adr-2/#metadata","title":"Metadata","text":""},{"location":"adrs/adr-3/","title":"[ADR-3] Executors will run all Tasks via subprocess","text":"

Date: 2023-11-06

"},{"location":"adrs/adr-3/#status","title":"Status","text":"

Proposed

"},{"location":"adrs/adr-3/#context-and-problem-statement","title":"Context and Problem Statement","text":""},{"location":"adrs/adr-3/#decision","title":"Decision","text":""},{"location":"adrs/adr-3/#decision-drivers","title":"Decision Drivers","text":""},{"location":"adrs/adr-3/#considered-options","title":"Considered Options","text":""},{"location":"adrs/adr-3/#consequences","title":"Consequences","text":""},{"location":"adrs/adr-3/#compliance","title":"Compliance","text":""},{"location":"adrs/adr-3/#metadata","title":"Metadata","text":""},{"location":"adrs/adr-4/","title":"[ADR-4] Airflow Operators and LUTE Executors are Separate Entities","text":"

Date: 2023-11-06

"},{"location":"adrs/adr-4/#status","title":"Status","text":"

Proposed

"},{"location":"adrs/adr-4/#context-and-problem-statement","title":"Context and Problem Statement","text":""},{"location":"adrs/adr-4/#decision","title":"Decision","text":""},{"location":"adrs/adr-4/#decision-drivers","title":"Decision Drivers","text":"

*

"},{"location":"adrs/adr-4/#considered-options","title":"Considered Options","text":"

*

"},{"location":"adrs/adr-4/#consequences","title":"Consequences","text":"

*

"},{"location":"adrs/adr-4/#compliance","title":"Compliance","text":""},{"location":"adrs/adr-4/#metadata","title":"Metadata","text":""},{"location":"adrs/adr-5/","title":"[ADR-5] Task-Executor IPC is Managed by Communicator Objects","text":"

Date: 2023-12-06

"},{"location":"adrs/adr-5/#status","title":"Status","text":"

Proposed

"},{"location":"adrs/adr-5/#context-and-problem-statement","title":"Context and Problem Statement","text":""},{"location":"adrs/adr-5/#decision","title":"Decision","text":"

Task-Executor IPC is managed by Communicator objects, which maintain simple read and write mechanisms for Message objects; the latter can contain arbitrary Python objects. Tasks do not interact directly with the communicator, but rather through specific instance methods which hide the communicator interfaces. Multiple Communicators can be used in parallel. The same Communicator objects are used identically at the Task and Executor layers - any changes to communication protocols do not affect the calling objects.
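
A minimal, hypothetical sketch of the kind of interface this decision implies is shown below; the class and method names are illustrative only and do not reproduce the actual LUTE implementation.

from abc import ABC, abstractmethod\nfrom dataclasses import dataclass\nfrom typing import Any, Optional\n\n@dataclass\nclass Message:\n    contents: Optional[Any] = None  # Arbitrary Python object\n    signal: Optional[str] = None    # e.g. a status or control signal\n\nclass Communicator(ABC):\n    \"\"\"Illustrative IPC interface shared by the Task and Executor layers.\"\"\"\n\n    @abstractmethod\n    def read(self, proc: Any) -> Message:\n        \"\"\"Read a Message from the underlying channel (pipe, socket, ...).\"\"\"\n        ...\n\n    @abstractmethod\n    def write(self, msg: Message) -> None:\n        \"\"\"Write a Message to the underlying channel.\"\"\"\n        ...\n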

"},{"location":"adrs/adr-5/#decision-drivers","title":"Decision Drivers","text":""},{"location":"adrs/adr-5/#considered-options","title":"Considered Options","text":""},{"location":"adrs/adr-5/#communicator-types","title":"Communicator Types","text":""},{"location":"adrs/adr-5/#consequences","title":"Consequences","text":""},{"location":"adrs/adr-5/#compliance","title":"Compliance","text":""},{"location":"adrs/adr-5/#metadata","title":"Metadata","text":""},{"location":"adrs/adr-6/","title":"[ADR-6] Third-party Config Files Managed by Templates Rendered by ThirdPartyTasks","text":"

Date: 2024-02-12

"},{"location":"adrs/adr-6/#status","title":"Status","text":"

Proposed

"},{"location":"adrs/adr-6/#context-and-problem-statement","title":"Context and Problem Statement","text":""},{"location":"adrs/adr-6/#decision","title":"Decision","text":"

Templates will be used for the third-party configuration files. A generic interface to heterogeneous templates will be provided through a combination of pydantic models and the ThirdPartyTask implementation. The pydantic models will label extra arguments to ThirdPartyTasks as being TemplateParameters, i.e. any extra parameters are considered to be for a templated configuration file. The ThirdPartyTask will find the necessary template and render it if any extra parameters are found. This puts the burden of correct parsing on the template definition itself.

"},{"location":"adrs/adr-6/#decision-drivers","title":"Decision Drivers","text":""},{"location":"adrs/adr-6/#considered-options","title":"Considered Options","text":""},{"location":"adrs/adr-6/#consequences","title":"Consequences","text":""},{"location":"adrs/adr-6/#compliance","title":"Compliance","text":""},{"location":"adrs/adr-6/#metadata","title":"Metadata","text":""},{"location":"adrs/adr-7/","title":"[ADR-7] Task Configuration is Stored in a Database Managed by Executors","text":"

Date: 2024-02-12

"},{"location":"adrs/adr-7/#status","title":"Status","text":"

Proposed

"},{"location":"adrs/adr-7/#context-and-problem-statement","title":"Context and Problem Statement","text":""},{"location":"adrs/adr-7/#decision","title":"Decision","text":"

Upon Task completion, the managing Executor will write the AnalysisConfig object, including TaskParameters, results and generic configuration information, to a database. Some entries from this database can be retrieved to provide default files for TaskParameter fields; however, the Task itself has no knowledge of, and no access to, the database.

"},{"location":"adrs/adr-7/#decision-drivers","title":"Decision Drivers","text":""},{"location":"adrs/adr-7/#considered-options","title":"Considered Options","text":""},{"location":"adrs/adr-7/#consequences","title":"Consequences","text":""},{"location":"adrs/adr-7/#compliance","title":"Compliance","text":""},{"location":"adrs/adr-7/#metadata","title":"Metadata","text":""},{"location":"adrs/adr-8/","title":"[ADR-8] Airflow credentials/authorization requires special launch program","text":"

Date: 2024-03-18

"},{"location":"adrs/adr-8/#status","title":"Status","text":"

Proposed

"},{"location":"adrs/adr-8/#context-and-problem-statement","title":"Context and Problem Statement","text":""},{"location":"adrs/adr-8/#decision","title":"Decision","text":"

A closed-source lute_launcher program will be used to run the Airflow launch scripts. This program accesses credentials with the correct permissions. Users should otherwise not have access to the credentials. This helps ensure the credentials can be used by everyone, but only to run workflows, not to perform restricted admin activities.

"},{"location":"adrs/adr-8/#decision-drivers","title":"Decision Drivers","text":""},{"location":"adrs/adr-8/#considered-options","title":"Considered Options","text":""},{"location":"adrs/adr-8/#consequences","title":"Consequences","text":""},{"location":"adrs/adr-8/#compliance","title":"Compliance","text":""},{"location":"adrs/adr-8/#metadata","title":"Metadata","text":""},{"location":"adrs/adr-9/","title":"[ADR-9] Airflow launch script will run as long lived batch job.","text":"

Date: 2024-04-15

"},{"location":"adrs/adr-9/#status","title":"Status","text":"

Proposed

"},{"location":"adrs/adr-9/#context-and-problem-statement","title":"Context and Problem Statement","text":""},{"location":"adrs/adr-9/#decision","title":"Decision","text":"

The Airflow launch script will be a long-lived process, running for the duration of the entire DAG. It will provide basic status logging information, e.g. which Tasks are running and whether they succeeded or failed. Additionally, at the end of each Task job, the launch job will collect the log file from that job and append it to its own log.

As the Airflow launch script is an entry point used from the eLog, only its log file is available to users using that UI. By converting the launch script into a long-lived monitoring job it allows the log information to be easily accessible.

In order to accomplish this, the launch script must be submitted as a batch job to comply with the 30-second timeout imposed on jobs run by the ARP. This necessitates providing an additional wrapper script.

"},{"location":"adrs/adr-9/#decision-drivers","title":"Decision Drivers","text":""},{"location":"adrs/adr-9/#considered-options","title":"Considered Options","text":""},{"location":"adrs/adr-9/#consequences","title":"Consequences","text":""},{"location":"adrs/adr-9/#compliance","title":"Compliance","text":""},{"location":"adrs/adr-9/#metadata","title":"Metadata","text":""},{"location":"adrs/madr_template/","title":"Madr template","text":""},{"location":"adrs/madr_template/#title","title":"Title","text":"

{ADR #X : Short description/title of feature/decision}

Date:

"},{"location":"adrs/madr_template/#status","title":"Status","text":"

{Accepted | Proposed | Rejected | Deprecated | Superseded} {If this proposal supersedes another, please indicate so, e.g. \"Status: Accepted, supersedes [ADR-3]\"} {Likewise, if this proposal was superseded, e.g. \"Status: Superseded by [ADR-2]\"}

"},{"location":"adrs/madr_template/#context-and-problem-statement","title":"Context and Problem Statement","text":"

{Describe the problem context and why this decision has been made/feature implemented.}

"},{"location":"adrs/madr_template/#decision","title":"Decision","text":"

{Describe how the solution was arrived at in the manner it was. You may use the sections below to help.}

"},{"location":"adrs/madr_template/#decision-drivers","title":"Decision Drivers","text":""},{"location":"adrs/madr_template/#considered-options","title":"Considered Options","text":""},{"location":"adrs/madr_template/#consequences","title":"Consequences","text":"

{Short description of anticipated consequences} * {Anticipated consequence 1} * {Anticipated consequence 2}

"},{"location":"adrs/madr_template/#compliance","title":"Compliance","text":"

{How will the decision/implementation be enforced. How will compliance be validated?}

"},{"location":"adrs/madr_template/#metadata","title":"Metadata","text":"

{Any additional information to include}

"},{"location":"design/database/","title":"LUTE Configuration Database Specification","text":"

Date: 2024-02-12 VERSION: v0.1

"},{"location":"design/database/#basic-outline","title":"Basic Outline","text":""},{"location":"design/database/#gen_cfg-table","title":"gen_cfg table","text":"

The general configuration table contains entries which may be shared between multiple Tasks. The format of the table is:

id title experiment run date lute_version task_timeout 2 \"My experiment desc\" \"EXPx00000\" 1 YYYY/MM/DD 0.1 6000

These parameters are extracted from the TaskParameters object. Each of those contains an AnalysisHeader object stored in the lute_config variable. For a given experimental run, this value will be shared across any Tasks that are executed.

"},{"location":"design/database/#column-descriptions","title":"Column descriptions","text":"Column Description id ID of the entry in this table. title Arbitrary description/title of the purpose of analysis. E.g. what kind of experiment is being conducted experiment LCLS Experiment. Can be a placeholder if debugging, etc. run LCLS Acquisition run. Can be a placeholder if debugging, testing, etc. date Date the configuration file was first setup. lute_version Version of the codebase being used to execute Tasks. task_timeout The maximum amount of time in seconds that a Task can run before being cancelled."},{"location":"design/database/#exec_cfg-table","title":"exec_cfg table","text":"

The Executor table contains information on the environment provided to the Executor for Task execution, the polling interval used for IPC between the Task and Executor and information on the communicator protocols used for IPC. This information can be shared between Tasks or between experimental runs, but not necessarily every Task of a given run will use exactly the same Executor configuration and environment.

id env poll_interval communicator_desc 2 \"VAR1=val1;VAR2=val2\" 0.1 \"PipeCommunicator...;SocketCommunicator...\""},{"location":"design/database/#column-descriptions_1","title":"Column descriptions","text":"Column Description id ID of the entry in this table. env Execution environment used by the Executor and by proxy any Tasks submitted by an Executor matching this entry. Environment is stored as a string with variables delimited by \";\" poll_interval Polling interval used for Task monitoring. communicator_desc Description of the Communicators used.

NOTE: The env column currently only stores variables related to SLURM or LUTE itself.

"},{"location":"design/database/#task-tables","title":"Task tables","text":"

For every Task a table of the following format will be created. The exact number of columns will depend on the specific Task, as the number of parameters can vary between them, and each parameter gets its own column. Within a table, multiple experiments and runs can coexist. The experiment and run are not recorded directly. Instead, the first two columns point to the id of entries in the general configuration and Executor tables respectively. The general configuration table entry will contain the experiment and run information.

id timestamp gen_cfg_id exec_cfg_id P1 P2 ... Pn result.task_status result.summary result.payload result.impl_schemas valid_flag 2 \"YYYY-MM-DD HH:MM:SS\" 1 1 1 2 ... 3 \"COMPLETED\" \"Summary\" \"XYZ\" \"schema1;schema3;\" 1 3 \"YYYY-MM-DD HH:MM:SS\" 1 1 3 1 ... 4 \"FAILED\" \"Summary\" \"XYZ\" \"schema1;schema3;\" 0

Parameter sets which can be described as nested dictionaries are flattened and then delimited with a . to create column names. Parameters which are lists (or Python tuples, etc.) have a column for each entry with names that include an index (counting from 0). E.g. consider the following dictionary of parameters:

param_dict: Dict[str, Any] = {\n    \"a\": {               # First parameter a\n        \"b\": (1, 2),\n        \"c\": 1,\n        # ...\n    },\n    \"a2\": 4,             # Second parameter a2\n    # ...\n}\n

The dictionary a will produce columns: a.b[0], a.b[1], a.c, and so on.
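
A minimal sketch of this flattening scheme is shown below. It illustrates the naming convention described above; it is not necessarily the exact routine used by LUTE.

from typing import Any, Dict\n\ndef flatten_params(params: Dict[str, Any], prefix: str = \"\") -> Dict[str, Any]:\n    \"\"\"Flatten nested dictionaries and sequences into column-name -> value pairs.\"\"\"\n    columns: Dict[str, Any] = {}\n    for key, value in params.items():\n        name = f\"{prefix}.{key}\" if prefix else key\n        if isinstance(value, dict):\n            columns.update(flatten_params(value, name))\n        elif isinstance(value, (list, tuple)):\n            for idx, item in enumerate(value):\n                columns[f\"{name}[{idx}]\"] = item\n        else:\n            columns[name] = value\n    return columns\n\nparam_dict = {\"a\": {\"b\": (1, 2), \"c\": 1}, \"a2\": 4}\nprint(flatten_params(param_dict))\n# -> {'a.b[0]': 1, 'a.b[1]': 2, 'a.c': 1, 'a2': 4}\n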

"},{"location":"design/database/#column-descriptions_2","title":"Column descriptions","text":"Column Description id ID of the entry in this table. CURRENT_TIMESTAMP Full timestamp for the entry. gen_cfg_id ID of the entry in the general config table that applies to this Task entry. That table has, e.g., experiment and run number. exec_cfg_id The ID of the entry in the Executor table which applies to this Task entry. P1 - Pn The specific parameters of the Task. The P{1..n} are replaced by the actual parameter names. result.task_status Reported exit status of the Task. Note that the output may still be labeled invalid by the valid_flag (see below). result.summary Short text summary of the Task result. This is provided by the Task, or sometimes the Executor. result.payload Full description of result from the Task. If the object is incompatible with the database, will instead be a pointer to where it can be found. result.impl_schemas A string of semi-colon separated schema(s) implemented by the Task. Schemas describe conceptually the type output the Task produces. valid_flag A boolean flag for whether the result is valid. May be 0 (False) if e.g., data is missing, or corrupt, or reported status is failed.

NOTE: The result.payload may be distinct from the output files. Payloads can be specified in terms of output parameters, specific output files, or an optional summary of the results provided by the Task. E.g. this may include graphical descriptions of results (plots, figures, etc.). In many cases, however, the output files will most likely be pointed to by a parameter in one of the columns P{1...n} - if properly specified in the TaskParameters model, the value of this output parameter will be replicated in the result.payload column as well.

"},{"location":"design/database/#api","title":"API","text":"

This API is intended to be used at the Executor level, with some calls intended to provide default values for Pydantic models. Utilities for reading and inspecting the database outside of normal Task execution are addressed in the following subheader.

"},{"location":"design/database/#write","title":"Write","text":""},{"location":"design/database/#read","title":"Read","text":""},{"location":"design/database/#utilities","title":"Utilities","text":""},{"location":"design/database/#scripts","title":"Scripts","text":""},{"location":"design/database/#tui-and-gui","title":"TUI and GUI","text":""},{"location":"source/managed_tasks/","title":"managed_tasks","text":"

LUTE Managed Tasks.

Executor-managed Tasks with specific environment specifications are defined here.

"},{"location":"source/managed_tasks/#managed_tasks.BinaryErrTester","title":"BinaryErrTester = Executor('TestBinaryErr') module-attribute","text":"

Runs a test of a third-party task that fails.

"},{"location":"source/managed_tasks/#managed_tasks.BinaryTester","title":"BinaryTester: Executor = Executor('TestBinary') module-attribute","text":"

Runs a basic test of a multi-threaded third-party Task.

"},{"location":"source/managed_tasks/#managed_tasks.CrystFELIndexer","title":"CrystFELIndexer: Executor = Executor('IndexCrystFEL') module-attribute","text":"

Runs crystallographic indexing using CrystFEL.

"},{"location":"source/managed_tasks/#managed_tasks.DimpleSolver","title":"DimpleSolver: Executor = Executor('DimpleSolve') module-attribute","text":"

Solves a crystallographic structure using molecular replacement.

"},{"location":"source/managed_tasks/#managed_tasks.HKLComparer","title":"HKLComparer: Executor = Executor('CompareHKL') module-attribute","text":"

Runs analysis on merge results for statistics/figures of merit.

"},{"location":"source/managed_tasks/#managed_tasks.HKLManipulator","title":"HKLManipulator: Executor = Executor('ManipulateHKL') module-attribute","text":"

Performs format conversions (among other things) of merge results.

"},{"location":"source/managed_tasks/#managed_tasks.MultiNodeCommunicationTester","title":"MultiNodeCommunicationTester: MPIExecutor = MPIExecutor('TestMultiNodeCommunication') module-attribute","text":"

Runs a test to confirm communication works between multiple nodes.

"},{"location":"source/managed_tasks/#managed_tasks.PartialatorMerger","title":"PartialatorMerger: Executor = Executor('MergePartialator') module-attribute","text":"

Runs crystallographic merging using CrystFEL's partialator.

"},{"location":"source/managed_tasks/#managed_tasks.PeakFinderPsocake","title":"PeakFinderPsocake: Executor = Executor('FindPeaksPsocake') module-attribute","text":"

Performs Bragg peak finding using psocake - DEPRECATED.

"},{"location":"source/managed_tasks/#managed_tasks.PeakFinderPyAlgos","title":"PeakFinderPyAlgos: MPIExecutor = MPIExecutor('FindPeaksPyAlgos') module-attribute","text":"

Performs Bragg peak finding using the PyAlgos algorithm.

"},{"location":"source/managed_tasks/#managed_tasks.ReadTester","title":"ReadTester: Executor = Executor('TestReadOutput') module-attribute","text":"

Runs a test to confirm database reading.

"},{"location":"source/managed_tasks/#managed_tasks.SHELXCRunner","title":"SHELXCRunner: Executor = Executor('RunSHELXC') module-attribute","text":"

Runs CCP4 SHELXC - needed for crystallographic phasing.

"},{"location":"source/managed_tasks/#managed_tasks.SmallDataProducer","title":"SmallDataProducer: Executor = Executor('SubmitSMD') module-attribute","text":"

Runs the production of a smalldata HDF5 file.

"},{"location":"source/managed_tasks/#managed_tasks.SocketTester","title":"SocketTester: Executor = Executor('TestSocket') module-attribute","text":"

Runs a test of socket-based communication.

"},{"location":"source/managed_tasks/#managed_tasks.StreamFileConcatenator","title":"StreamFileConcatenator: Executor = Executor('ConcatenateStreamFiles') module-attribute","text":"

Concatenates results from crystallographic indexing of multiple runs.

"},{"location":"source/managed_tasks/#managed_tasks.Tester","title":"Tester: Executor = Executor('Test') module-attribute","text":"

Runs a basic test of a first-party Task.

"},{"location":"source/managed_tasks/#managed_tasks.WriteTester","title":"WriteTester: Executor = Executor('TestWriteOutput') module-attribute","text":"

Runs a test to confirm database writing.

"},{"location":"source/execution/debug_utils/","title":"debug_utils","text":"

Functions to assist in debugging execution of LUTE.

Functions:

Name Description LUTE_DEBUG_EXIT

(env_var: str, str_dump: Optional[str]): Exits the program if the provided env_var is set. Optionally, also prints a message (str_dump) if provided.
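A hedged usage sketch follows; the environment-variable name and message are invented, the call signature follows the description above, and the import path is assumed from this page's location (lute/execution/debug_utils.py).

from lute.execution.debug_utils import LUTE_DEBUG_EXIT  # import path assumed

# Exits only if LUTE_DEBUG_BEFORE_SUBMIT is set in the environment,
# optionally printing the provided string first; a no-op otherwise.
LUTE_DEBUG_EXIT("LUTE_DEBUG_BEFORE_SUBMIT", "State dump prior to Task submission")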

Raises:

Type Description ValidationError

Error raised by pydantic during data validation. (From Pydantic)

"},{"location":"source/execution/executor/","title":"executor","text":"

Base classes and functions for handling Task execution.

Executors run a Task as a subprocess and handle all communication with other services, e.g., the eLog. They accept specific handlers to override default stream parsing.

Event handlers/hooks are implemented as standalone functions which can be added to an Executor.
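As a hedged example of the hook mechanism (the Task name and notification logic are invented; add_hook and the two-argument (executor, message) hook signature are documented in the BaseExecutor section below, and Message is assumed importable from lute.execution.ipc per the "lute/execution/ipc.py" source listings):

from lute.execution.executor import Executor
from lute.execution.ipc import Message  # import path assumed from the source listings


def notify_on_done(executor: Executor, msg: Message) -> None:
    # Replaces the default task_done hook; called when the Task signals completion.
    print(f"{executor._analysis_desc.task_result.task_name} finished.")


MyTask: Executor = Executor("SomeTask")  # hypothetical Task name
MyTask.add_hook("task_done", notify_on_done)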

Classes:

Name Description AnalysisConfig

Data class for holding a managed Task's configuration.

BaseExecutor

Abstract base class from which all Executors are derived.

Executor

Default Executor implementing all basic functionality and IPC.

BinaryExecutor

Can execute any arbitrary binary/command as a managed task within the framework provided by LUTE.

"},{"location":"source/execution/executor/#execution.executor--exceptions","title":"Exceptions","text":""},{"location":"source/execution/executor/#execution.executor.BaseExecutor","title":"BaseExecutor","text":"

Bases: ABC

ABC to manage Task execution and communication with user services.

When running in a workflow, \"tasks\" (not the class instances) are submitted as Executors. The Executor manages environment setup, the actual Task submission, and communication regarding Task results and status with third party services like the eLog.

Attributes:

Methods:

Name Description add_hook

(event: str, hook: Callable[[None], None]) -> None: Create a new hook to be called each time a specific event occurs.

add_default_hooks

Populate the event hooks with the default functions.

update_environment

(env: Dict[str, str], update_path: str): Update the environment that is passed to the Task subprocess.

execute_task

Run the task as a subprocess.

Source code in lute/execution/executor.py
class BaseExecutor(ABC):\n    \"\"\"ABC to manage Task execution and communication with user services.\n\n    When running in a workflow, \"tasks\" (not the class instances) are submitted\n    as `Executors`. The Executor manages environment setup, the actual Task\n    submission, and communication regarding Task results and status with third\n    party services like the eLog.\n\n    Attributes:\n\n    Methods:\n        add_hook(event: str, hook: Callable[[None], None]) -> None: Create a\n            new hook to be called each time a specific event occurs.\n\n        add_default_hooks() -> None: Populate the event hooks with the default\n            functions.\n\n        update_environment(env: Dict[str, str], update_path: str): Update the\n            environment that is passed to the Task subprocess.\n\n        execute_task(): Run the task as a subprocess.\n    \"\"\"\n\n    class Hooks:\n        \"\"\"A container class for the Executor's event hooks.\n\n        There is a corresponding function (hook) for each event/signal. Each\n        function takes two parameters - a reference to the Executor (self) and\n        a reference to the Message (msg) which includes the corresponding\n        signal.\n        \"\"\"\n\n        def no_pickle_mode(self: Self, msg: Message): ...\n\n        def task_started(self: Self, msg: Message): ...\n\n        def task_failed(self: Self, msg: Message): ...\n\n        def task_stopped(self: Self, msg: Message): ...\n\n        def task_done(self: Self, msg: Message): ...\n\n        def task_cancelled(self: Self, msg: Message): ...\n\n        def task_result(self: Self, msg: Message): ...\n\n    def __init__(\n        self,\n        task_name: str,\n        communicators: List[Communicator],\n        poll_interval: float = 0.05,\n    ) -> None:\n        \"\"\"The Executor will manage the subprocess in which `task_name` is run.\n\n        Args:\n            task_name (str): The name of the Task to be submitted. Must match\n                the Task's class name exactly. The parameter specification must\n                also be in a properly named model to be identified.\n\n            communicators (List[Communicator]): A list of one or more\n                communicators which manage information flow to/from the Task.\n                Subclasses may have different defaults, and new functionality\n                can be introduced by composing Executors with communicators.\n\n            poll_interval (float): Time to wait between reading/writing to the\n                managed subprocess. 
In seconds.\n        \"\"\"\n        result: TaskResult = TaskResult(\n            task_name=task_name, task_status=TaskStatus.PENDING, summary=\"\", payload=\"\"\n        )\n        task_parameters: Optional[TaskParameters] = None\n        task_env: Dict[str, str] = os.environ.copy()\n        self._communicators: List[Communicator] = communicators\n        communicator_desc: List[str] = []\n        for comm in self._communicators:\n            comm.stage_communicator()\n            communicator_desc.append(str(comm))\n\n        self._analysis_desc: DescribedAnalysis = DescribedAnalysis(\n            task_result=result,\n            task_parameters=task_parameters,\n            task_env=task_env,\n            poll_interval=poll_interval,\n            communicator_desc=communicator_desc,\n        )\n\n    def add_hook(self, event: str, hook: Callable[[Self, Message], None]) -> None:\n        \"\"\"Add a new hook.\n\n        Each hook is a function called any time the Executor receives a signal\n        for a particular event, e.g. Task starts, Task ends, etc. Calling this\n        method will remove any hook that currently exists for the event. I.e.\n        only one hook can be called per event at a time. Creating hooks for\n        events which do not exist is not allowed.\n\n        Args:\n            event (str): The event for which the hook will be called.\n\n            hook (Callable[[None], None]) The function to be called during each\n                occurrence of the event.\n        \"\"\"\n        if event.upper() in LUTE_SIGNALS:\n            setattr(self.Hooks, event.lower(), hook)\n\n    @abstractmethod\n    def add_default_hooks(self) -> None:\n        \"\"\"Populate the set of default event hooks.\"\"\"\n\n        ...\n\n    def update_environment(\n        self, env: Dict[str, str], update_path: str = \"prepend\"\n    ) -> None:\n        \"\"\"Update the stored set of environment variables.\n\n        These are passed to the subprocess to setup its environment.\n\n        Args:\n            env (Dict[str, str]): A dictionary of \"VAR\":\"VALUE\" pairs of\n                environment variables to be added to the subprocess environment.\n                If any variables already exist, the new variables will\n                overwrite them (except PATH, see below).\n\n            update_path (str): If PATH is present in the new set of variables,\n                this argument determines how the old PATH is dealt with. There\n                are three options:\n                * \"prepend\" : The new PATH values are prepended to the old ones.\n                * \"append\" : The new PATH values are appended to the old ones.\n                * \"overwrite\" : The old PATH is overwritten by the new one.\n                \"prepend\" is the default option. 
If PATH is not present in the\n                current environment, the new PATH is used without modification.\n        \"\"\"\n        if \"PATH\" in env:\n            sep: str = os.pathsep\n            if update_path == \"prepend\":\n                env[\"PATH\"] = (\n                    f\"{env['PATH']}{sep}{self._analysis_desc.task_env['PATH']}\"\n                )\n            elif update_path == \"append\":\n                env[\"PATH\"] = (\n                    f\"{self._analysis_desc.task_env['PATH']}{sep}{env['PATH']}\"\n                )\n            elif update_path == \"overwrite\":\n                pass\n            else:\n                raise ValueError(\n                    (\n                        f\"{update_path} is not a valid option for `update_path`!\"\n                        \" Options are: prepend, append, overwrite.\"\n                    )\n                )\n        os.environ.update(env)\n        self._analysis_desc.task_env.update(env)\n\n    def shell_source(self, env: str) -> None:\n        \"\"\"Source a script.\n\n        Unlike `update_environment` this method sources a new file.\n\n        Args:\n            env (str): Path to the script to source.\n        \"\"\"\n        import sys\n\n        if not os.path.exists(env):\n            logger.info(f\"Cannot source environment from {env}!\")\n            return\n\n        script: str = (\n            f\"set -a\\n\"\n            f'source \"{env}\" >/dev/null\\n'\n            f'{sys.executable} -c \"import os; print(dict(os.environ))\"\\n'\n        )\n        logger.info(f\"Sourcing file {env}\")\n        o, e = subprocess.Popen(\n            [\"bash\", \"-c\", script], stdout=subprocess.PIPE\n        ).communicate()\n        new_environment: Dict[str, str] = eval(o)\n        self._analysis_desc.task_env = new_environment\n\n    def _pre_task(self) -> None:\n        \"\"\"Any actions to be performed before task submission.\n\n        This method may or may not be used by subclasses. 
It may be useful\n        for logging etc.\n        \"\"\"\n        # This prevents the Executors in managed_tasks.py from all acquiring\n        # resources like sockets.\n        for communicator in self._communicators:\n            communicator.delayed_setup()\n            # Not great, but experience shows we need a bit of time to setup\n            # network.\n            time.sleep(0.1)\n        # Propagate any env vars setup by Communicators - only update LUTE_ vars\n        tmp: Dict[str, str] = {\n            key: os.environ[key] for key in os.environ if \"LUTE_\" in key\n        }\n        self._analysis_desc.task_env.update(tmp)\n\n    def _submit_task(self, cmd: str) -> subprocess.Popen:\n        proc: subprocess.Popen = subprocess.Popen(\n            cmd.split(),\n            stdout=subprocess.PIPE,\n            stderr=subprocess.PIPE,\n            env=self._analysis_desc.task_env,\n        )\n        os.set_blocking(proc.stdout.fileno(), False)\n        os.set_blocking(proc.stderr.fileno(), False)\n        return proc\n\n    @abstractmethod\n    def _task_loop(self, proc: subprocess.Popen) -> None:\n        \"\"\"Actions to perform while the Task is running.\n\n        This function is run in the body of a loop until the Task signals\n        that its finished.\n        \"\"\"\n        ...\n\n    @abstractmethod\n    def _finalize_task(self, proc: subprocess.Popen) -> None:\n        \"\"\"Any actions to be performed after the Task has ended.\n\n        Examples include a final clearing of the pipes, retrieving results,\n        reporting to third party services, etc.\n        \"\"\"\n        ...\n\n    def _submit_cmd(self, executable_path: str, params: str) -> str:\n        \"\"\"Return a formatted command for launching Task subprocess.\n\n        May be overridden by subclasses.\n\n        Args:\n            executable_path (str): Path to the LUTE subprocess script.\n\n            params (str): String of formatted command-line arguments.\n\n        Returns:\n            cmd (str): Appropriately formatted command for this Executor.\n        \"\"\"\n        cmd: str = \"\"\n        if __debug__:\n            cmd = f\"python -B {executable_path} {params}\"\n        else:\n            cmd = f\"python -OB {executable_path} {params}\"\n\n        return cmd\n\n    def execute_task(self) -> None:\n        \"\"\"Run the requested Task as a subprocess.\"\"\"\n        self._pre_task()\n        lute_path: Optional[str] = os.getenv(\"LUTE_PATH\")\n        if lute_path is None:\n            logger.debug(\"Absolute path to subprocess_task.py not found.\")\n            lute_path = os.path.abspath(f\"{os.path.dirname(__file__)}/../..\")\n            self.update_environment({\"LUTE_PATH\": lute_path})\n        executable_path: str = f\"{lute_path}/subprocess_task.py\"\n        config_path: str = self._analysis_desc.task_env[\"LUTE_CONFIGPATH\"]\n        params: str = f\"-c {config_path} -t {self._analysis_desc.task_result.task_name}\"\n\n        cmd: str = self._submit_cmd(executable_path, params)\n        proc: subprocess.Popen = self._submit_task(cmd)\n\n        while self._task_is_running(proc):\n            self._task_loop(proc)\n            time.sleep(self._analysis_desc.poll_interval)\n\n        os.set_blocking(proc.stdout.fileno(), True)\n        os.set_blocking(proc.stderr.fileno(), True)\n\n        self._finalize_task(proc)\n        proc.stdout.close()\n        proc.stderr.close()\n        proc.wait()\n        if ret := proc.returncode:\n            logger.info(f\"Task failed with 
return code: {ret}\")\n            self._analysis_desc.task_result.task_status = TaskStatus.FAILED\n            self.Hooks.task_failed(self, msg=Message())\n        elif self._analysis_desc.task_result.task_status == TaskStatus.RUNNING:\n            # Ret code is 0, no exception was thrown, task forgot to set status\n            self._analysis_desc.task_result.task_status = TaskStatus.COMPLETED\n            logger.debug(f\"Task did not change from RUNNING status. Assume COMPLETED.\")\n            self.Hooks.task_done(self, msg=Message())\n        self._store_configuration()\n        for comm in self._communicators:\n            comm.clear_communicator()\n\n        if self._analysis_desc.task_result.task_status == TaskStatus.FAILED:\n            logger.info(\"Exiting after Task failure. Result recorded.\")\n            sys.exit(-1)\n\n        self.process_results()\n\n    def _store_configuration(self) -> None:\n        \"\"\"Store configuration and results in the LUTE database.\"\"\"\n        record_analysis_db(copy.deepcopy(self._analysis_desc))\n\n    def _task_is_running(self, proc: subprocess.Popen) -> bool:\n        \"\"\"Whether a subprocess is running.\n\n        Args:\n            proc (subprocess.Popen): The subprocess to determine the run status\n                of.\n\n        Returns:\n            bool: Is the subprocess task running.\n        \"\"\"\n        # Add additional conditions - don't want to exit main loop\n        # if only stopped\n        task_status: TaskStatus = self._analysis_desc.task_result.task_status\n        is_running: bool = task_status != TaskStatus.COMPLETED\n        is_running &= task_status != TaskStatus.CANCELLED\n        is_running &= task_status != TaskStatus.TIMEDOUT\n        return proc.poll() is None and is_running\n\n    def _stop(self, proc: subprocess.Popen) -> None:\n        \"\"\"Stop the Task subprocess.\"\"\"\n        os.kill(proc.pid, signal.SIGTSTP)\n        self._analysis_desc.task_result.task_status = TaskStatus.STOPPED\n\n    def _continue(self, proc: subprocess.Popen) -> None:\n        \"\"\"Resume a stopped Task subprocess.\"\"\"\n        os.kill(proc.pid, signal.SIGCONT)\n        self._analysis_desc.task_result.task_status = TaskStatus.RUNNING\n\n    def _set_result_from_parameters(self) -> None:\n        \"\"\"Use TaskParameters object to set TaskResult fields.\n\n        A result may be defined in terms of specific parameters. This is most\n        useful for ThirdPartyTasks which would not otherwise have an easy way of\n        reporting what the TaskResult is. There are two options for specifying\n        results from parameters:\n            1. A single parameter (Field) of the model has an attribute\n               `is_result`. This is a bool indicating that this parameter points\n               to a result. E.g. a parameter `output` may set `is_result=True`.\n            2. The `TaskParameters.Config` has a `result_from_params` attribute.\n               This is an appropriate option if the result is determinable for\n               the Task, but it is not easily defined by a single parameter. The\n               TaskParameters.Config.result_from_param can be set by a custom\n               validator, e.g. to combine the values of multiple parameters into\n               a single result. E.g. an `out_dir` and `out_file` parameter used\n               together specify the result. 
Currently only string specifiers are\n               supported.\n\n        A TaskParameters object specifies that it contains information about the\n        result by setting a single config option:\n                        TaskParameters.Config.set_result=True\n        In general, this method should only be called when the above condition is\n        met, however, there are minimal checks in it as well.\n        \"\"\"\n        # This method shouldn't be called unless appropriate\n        # But we will add extra guards here\n        if self._analysis_desc.task_parameters is None:\n            logger.debug(\n                \"Cannot set result from TaskParameters. TaskParameters is None!\"\n            )\n            return\n        if (\n            not hasattr(self._analysis_desc.task_parameters.Config, \"set_result\")\n            or not self._analysis_desc.task_parameters.Config.set_result\n        ):\n            logger.debug(\n                \"Cannot set result from TaskParameters. `set_result` not specified!\"\n            )\n            return\n\n        # First try to set from result_from_params (faster)\n        if self._analysis_desc.task_parameters.Config.result_from_params is not None:\n            result_from_params: str = (\n                self._analysis_desc.task_parameters.Config.result_from_params\n            )\n            logger.info(f\"TaskResult specified as {result_from_params}.\")\n            self._analysis_desc.task_result.payload = result_from_params\n        else:\n            # Iterate parameters to find the one that is the result\n            schema: Dict[str, Any] = self._analysis_desc.task_parameters.schema()\n            for param, value in self._analysis_desc.task_parameters.dict().items():\n                param_attrs: Dict[str, Any] = schema[\"properties\"][param]\n                if \"is_result\" in param_attrs:\n                    is_result: bool = param_attrs[\"is_result\"]\n                    if isinstance(is_result, bool) and is_result:\n                        logger.info(f\"TaskResult specified as {value}.\")\n                        self._analysis_desc.task_result.payload = value\n                    else:\n                        logger.debug(\n                            (\n                                f\"{param} specified as result! But specifier is of \"\n                                f\"wrong type: {type(is_result)}!\"\n                            )\n                        )\n                    break  # We should only have 1 result-like parameter!\n\n        # If we get this far and haven't changed the payload we should complain\n        if self._analysis_desc.task_result.payload == \"\":\n            task_name: str = self._analysis_desc.task_result.task_name\n            logger.debug(\n                (\n                    f\"{task_name} specified result be set from {task_name}Parameters,\"\n                    \" but no result provided! 
Check model definition!\"\n                )\n            )\n        # Now check for impl_schemas and pass to result.impl_schemas\n        # Currently unused\n        impl_schemas: Optional[str] = (\n            self._analysis_desc.task_parameters.Config.impl_schemas\n        )\n        self._analysis_desc.task_result.impl_schemas = impl_schemas\n        # If we set_result but didn't get schema information we should complain\n        if self._analysis_desc.task_result.impl_schemas is None:\n            task_name: str = self._analysis_desc.task_result.task_name\n            logger.debug(\n                (\n                    f\"{task_name} specified result be set from {task_name}Parameters,\"\n                    \" but no schema provided! Check model definition!\"\n                )\n            )\n\n    def process_results(self) -> None:\n        \"\"\"Perform any necessary steps to process TaskResults object.\n\n        Processing will depend on subclass. Examples of steps include, moving\n        files, converting file formats, compiling plots/figures into an HTML\n        file, etc.\n        \"\"\"\n        self._process_results()\n\n    @abstractmethod\n    def _process_results(self) -> None: ...\n
"},{"location":"source/execution/executor/#execution.executor.BaseExecutor.Hooks","title":"Hooks","text":"

A container class for the Executor's event hooks.

There is a corresponding function (hook) for each event/signal. Each function takes two parameters - a reference to the Executor (self) and a reference to the Message (msg) which includes the corresponding signal.

Source code in lute/execution/executor.py
class Hooks:\n    \"\"\"A container class for the Executor's event hooks.\n\n    There is a corresponding function (hook) for each event/signal. Each\n    function takes two parameters - a reference to the Executor (self) and\n    a reference to the Message (msg) which includes the corresponding\n    signal.\n    \"\"\"\n\n    def no_pickle_mode(self: Self, msg: Message): ...\n\n    def task_started(self: Self, msg: Message): ...\n\n    def task_failed(self: Self, msg: Message): ...\n\n    def task_stopped(self: Self, msg: Message): ...\n\n    def task_done(self: Self, msg: Message): ...\n\n    def task_cancelled(self: Self, msg: Message): ...\n\n    def task_result(self: Self, msg: Message): ...\n
"},{"location":"source/execution/executor/#execution.executor.BaseExecutor.__init__","title":"__init__(task_name, communicators, poll_interval=0.05)","text":"

The Executor will manage the subprocess in which task_name is run.

Parameters:

Name Type Description Default task_name str

The name of the Task to be submitted. Must match the Task's class name exactly. The parameter specification must also be in a properly named model to be identified.

required communicators List[Communicator]

A list of one or more communicators which manage information flow to/from the Task. Subclasses may have different defaults, and new functionality can be introduced by composing Executors with communicators.

required poll_interval float

Time to wait between reading/writing to the managed subprocess. In seconds.

0.05 Source code in lute/execution/executor.py
def __init__(\n    self,\n    task_name: str,\n    communicators: List[Communicator],\n    poll_interval: float = 0.05,\n) -> None:\n    \"\"\"The Executor will manage the subprocess in which `task_name` is run.\n\n    Args:\n        task_name (str): The name of the Task to be submitted. Must match\n            the Task's class name exactly. The parameter specification must\n            also be in a properly named model to be identified.\n\n        communicators (List[Communicator]): A list of one or more\n            communicators which manage information flow to/from the Task.\n            Subclasses may have different defaults, and new functionality\n            can be introduced by composing Executors with communicators.\n\n        poll_interval (float): Time to wait between reading/writing to the\n            managed subprocess. In seconds.\n    \"\"\"\n    result: TaskResult = TaskResult(\n        task_name=task_name, task_status=TaskStatus.PENDING, summary=\"\", payload=\"\"\n    )\n    task_parameters: Optional[TaskParameters] = None\n    task_env: Dict[str, str] = os.environ.copy()\n    self._communicators: List[Communicator] = communicators\n    communicator_desc: List[str] = []\n    for comm in self._communicators:\n        comm.stage_communicator()\n        communicator_desc.append(str(comm))\n\n    self._analysis_desc: DescribedAnalysis = DescribedAnalysis(\n        task_result=result,\n        task_parameters=task_parameters,\n        task_env=task_env,\n        poll_interval=poll_interval,\n        communicator_desc=communicator_desc,\n    )\n
"},{"location":"source/execution/executor/#execution.executor.BaseExecutor.add_default_hooks","title":"add_default_hooks() abstractmethod","text":"

Populate the set of default event hooks.

Source code in lute/execution/executor.py
@abstractmethod\ndef add_default_hooks(self) -> None:\n    \"\"\"Populate the set of default event hooks.\"\"\"\n\n    ...\n
"},{"location":"source/execution/executor/#execution.executor.BaseExecutor.add_hook","title":"add_hook(event, hook)","text":"

Add a new hook.

Each hook is a function called any time the Executor receives a signal for a particular event, e.g. Task starts, Task ends, etc. Calling this method will remove any hook that currently exists for the event. I.e. only one hook can be called per event at a time. Creating hooks for events which do not exist is not allowed.

Parameters:

Name Type Description Default event str

The event for which the hook will be called.

required Source code in lute/execution/executor.py
def add_hook(self, event: str, hook: Callable[[Self, Message], None]) -> None:\n    \"\"\"Add a new hook.\n\n    Each hook is a function called any time the Executor receives a signal\n    for a particular event, e.g. Task starts, Task ends, etc. Calling this\n    method will remove any hook that currently exists for the event. I.e.\n    only one hook can be called per event at a time. Creating hooks for\n    events which do not exist is not allowed.\n\n    Args:\n        event (str): The event for which the hook will be called.\n\n        hook (Callable[[None], None]) The function to be called during each\n            occurrence of the event.\n    \"\"\"\n    if event.upper() in LUTE_SIGNALS:\n        setattr(self.Hooks, event.lower(), hook)\n
"},{"location":"source/execution/executor/#execution.executor.BaseExecutor.execute_task","title":"execute_task()","text":"

Run the requested Task as a subprocess.

Source code in lute/execution/executor.py
def execute_task(self) -> None:\n    \"\"\"Run the requested Task as a subprocess.\"\"\"\n    self._pre_task()\n    lute_path: Optional[str] = os.getenv(\"LUTE_PATH\")\n    if lute_path is None:\n        logger.debug(\"Absolute path to subprocess_task.py not found.\")\n        lute_path = os.path.abspath(f\"{os.path.dirname(__file__)}/../..\")\n        self.update_environment({\"LUTE_PATH\": lute_path})\n    executable_path: str = f\"{lute_path}/subprocess_task.py\"\n    config_path: str = self._analysis_desc.task_env[\"LUTE_CONFIGPATH\"]\n    params: str = f\"-c {config_path} -t {self._analysis_desc.task_result.task_name}\"\n\n    cmd: str = self._submit_cmd(executable_path, params)\n    proc: subprocess.Popen = self._submit_task(cmd)\n\n    while self._task_is_running(proc):\n        self._task_loop(proc)\n        time.sleep(self._analysis_desc.poll_interval)\n\n    os.set_blocking(proc.stdout.fileno(), True)\n    os.set_blocking(proc.stderr.fileno(), True)\n\n    self._finalize_task(proc)\n    proc.stdout.close()\n    proc.stderr.close()\n    proc.wait()\n    if ret := proc.returncode:\n        logger.info(f\"Task failed with return code: {ret}\")\n        self._analysis_desc.task_result.task_status = TaskStatus.FAILED\n        self.Hooks.task_failed(self, msg=Message())\n    elif self._analysis_desc.task_result.task_status == TaskStatus.RUNNING:\n        # Ret code is 0, no exception was thrown, task forgot to set status\n        self._analysis_desc.task_result.task_status = TaskStatus.COMPLETED\n        logger.debug(f\"Task did not change from RUNNING status. Assume COMPLETED.\")\n        self.Hooks.task_done(self, msg=Message())\n    self._store_configuration()\n    for comm in self._communicators:\n        comm.clear_communicator()\n\n    if self._analysis_desc.task_result.task_status == TaskStatus.FAILED:\n        logger.info(\"Exiting after Task failure. Result recorded.\")\n        sys.exit(-1)\n\n    self.process_results()\n
"},{"location":"source/execution/executor/#execution.executor.BaseExecutor.process_results","title":"process_results()","text":"

Perform any necessary steps to process the TaskResults object.

Processing will depend on the subclass. Examples of steps include moving files, converting file formats, compiling plots/figures into an HTML file, etc.

Source code in lute/execution/executor.py
def process_results(self) -> None:\n    \"\"\"Perform any necessary steps to process TaskResults object.\n\n    Processing will depend on subclass. Examples of steps include, moving\n    files, converting file formats, compiling plots/figures into an HTML\n    file, etc.\n    \"\"\"\n    self._process_results()\n
"},{"location":"source/execution/executor/#execution.executor.BaseExecutor.shell_source","title":"shell_source(env)","text":"

Source a script.

Unlike update_environment this method sources a new file.

Parameters:

Name Type Description Default env str

Path to the script to source.

required Source code in lute/execution/executor.py
def shell_source(self, env: str) -> None:\n    \"\"\"Source a script.\n\n    Unlike `update_environment` this method sources a new file.\n\n    Args:\n        env (str): Path to the script to source.\n    \"\"\"\n    import sys\n\n    if not os.path.exists(env):\n        logger.info(f\"Cannot source environment from {env}!\")\n        return\n\n    script: str = (\n        f\"set -a\\n\"\n        f'source \"{env}\" >/dev/null\\n'\n        f'{sys.executable} -c \"import os; print(dict(os.environ))\"\\n'\n    )\n    logger.info(f\"Sourcing file {env}\")\n    o, e = subprocess.Popen(\n        [\"bash\", \"-c\", script], stdout=subprocess.PIPE\n    ).communicate()\n    new_environment: Dict[str, str] = eval(o)\n    self._analysis_desc.task_env = new_environment\n
"},{"location":"source/execution/executor/#execution.executor.BaseExecutor.update_environment","title":"update_environment(env, update_path='prepend')","text":"

Update the stored set of environment variables.

These are passed to the subprocess to setup its environment.

Parameters:

Name Type Description Default env Dict[str, str]

A dictionary of \"VAR\":\"VALUE\" pairs of environment variables to be added to the subprocess environment. If any variables already exist, the new variables will overwrite them (except PATH, see below).

required update_path str

If PATH is present in the new set of variables, this argument determines how the old PATH is dealt with. There are three options: * \"prepend\" : The new PATH values are prepended to the old ones. * \"append\" : The new PATH values are appended to the old ones. * \"overwrite\" : The old PATH is overwritten by the new one. \"prepend\" is the default option. If PATH is not present in the current environment, the new PATH is used without modification.

'prepend' Source code in lute/execution/executor.py
def update_environment(\n    self, env: Dict[str, str], update_path: str = \"prepend\"\n) -> None:\n    \"\"\"Update the stored set of environment variables.\n\n    These are passed to the subprocess to setup its environment.\n\n    Args:\n        env (Dict[str, str]): A dictionary of \"VAR\":\"VALUE\" pairs of\n            environment variables to be added to the subprocess environment.\n            If any variables already exist, the new variables will\n            overwrite them (except PATH, see below).\n\n        update_path (str): If PATH is present in the new set of variables,\n            this argument determines how the old PATH is dealt with. There\n            are three options:\n            * \"prepend\" : The new PATH values are prepended to the old ones.\n            * \"append\" : The new PATH values are appended to the old ones.\n            * \"overwrite\" : The old PATH is overwritten by the new one.\n            \"prepend\" is the default option. If PATH is not present in the\n            current environment, the new PATH is used without modification.\n    \"\"\"\n    if \"PATH\" in env:\n        sep: str = os.pathsep\n        if update_path == \"prepend\":\n            env[\"PATH\"] = (\n                f\"{env['PATH']}{sep}{self._analysis_desc.task_env['PATH']}\"\n            )\n        elif update_path == \"append\":\n            env[\"PATH\"] = (\n                f\"{self._analysis_desc.task_env['PATH']}{sep}{env['PATH']}\"\n            )\n        elif update_path == \"overwrite\":\n            pass\n        else:\n            raise ValueError(\n                (\n                    f\"{update_path} is not a valid option for `update_path`!\"\n                    \" Options are: prepend, append, overwrite.\"\n                )\n            )\n    os.environ.update(env)\n    self._analysis_desc.task_env.update(env)\n
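A brief usage sketch of the PATH handling described above (the Task name and paths are placeholders):

from lute.execution.executor import Executor

MyTask: Executor = Executor("SomeTask")  # hypothetical Task name

# Default: the new entry is prepended to the existing PATH.
MyTask.update_environment({"PATH": "/opt/extra/bin"})

# Append to, or completely replace, the existing PATH instead.
MyTask.update_environment({"PATH": "/opt/other/bin"}, update_path="append")
MyTask.update_environment({"PATH": "/opt/only/bin"}, update_path="overwrite")

# Non-PATH variables simply overwrite any existing values.
MyTask.update_environment({"MY_TOOL_CONFIG": "/path/to/config.yaml"})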
"},{"location":"source/execution/executor/#execution.executor.Communicator","title":"Communicator","text":"

Bases: ABC

Source code in lute/execution/ipc.py
class Communicator(ABC):\n    def __init__(self, party: Party = Party.TASK, use_pickle: bool = True) -> None:\n        \"\"\"Abstract Base Class for IPC Communicator objects.\n\n        Args:\n            party (Party): Which object (side/process) the Communicator is\n                managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.\n            use_pickle (bool): Whether to serialize data using pickle prior to\n                sending it.\n        \"\"\"\n        self._party = party\n        self._use_pickle = use_pickle\n        self.desc = \"Communicator abstract base class.\"\n\n    @abstractmethod\n    def read(self, proc: subprocess.Popen) -> Message:\n        \"\"\"Method for reading data through the communication mechanism.\"\"\"\n        ...\n\n    @abstractmethod\n    def write(self, msg: Message) -> None:\n        \"\"\"Method for sending data through the communication mechanism.\"\"\"\n        ...\n\n    def __str__(self):\n        name: str = str(type(self)).split(\"'\")[1].split(\".\")[-1]\n        return f\"{name}: {self.desc}\"\n\n    def __repr__(self):\n        return self.__str__()\n\n    def __enter__(self) -> Self:\n        return self\n\n    def __exit__(self) -> None: ...\n\n    @property\n    def has_messages(self) -> bool:\n        \"\"\"Whether the Communicator has remaining messages.\n\n        The precise method for determining whether there are remaining messages\n        will depend on the specific Communicator sub-class.\n        \"\"\"\n        return False\n\n    def stage_communicator(self):\n        \"\"\"Alternative method for staging outside of context manager.\"\"\"\n        self.__enter__()\n\n    def clear_communicator(self):\n        \"\"\"Alternative exit method outside of context manager.\"\"\"\n        self.__exit__()\n\n    def delayed_setup(self):\n        \"\"\"Any setup that should be done later than init.\"\"\"\n        ...\n
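As a hedged illustration of this interface (not an existing LUTE communicator), a minimal subclass only needs to implement read and write; the import path is assumed from the lute/execution/ipc.py listings.

import subprocess

from lute.execution.ipc import Communicator, Message, Party  # import path assumed


class NoOpCommunicator(Communicator):
    """Illustrative communicator that transmits nothing."""

    def __init__(self, party: Party = Party.TASK, use_pickle: bool = False) -> None:
        super().__init__(party=party, use_pickle=use_pickle)
        self.desc = "Illustrative no-op communicator."

    def read(self, proc: subprocess.Popen) -> Message:
        # A real implementation would pull contents/signals from its channel.
        return Message(contents=None, signal=None)

    def write(self, msg: Message) -> None:
        # A real implementation would forward msg to the other Party.
        pass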
"},{"location":"source/execution/executor/#execution.executor.Communicator.has_messages","title":"has_messages: bool property","text":"

Whether the Communicator has remaining messages.

The precise method for determining whether there are remaining messages will depend on the specific Communicator sub-class.

"},{"location":"source/execution/executor/#execution.executor.Communicator.__init__","title":"__init__(party=Party.TASK, use_pickle=True)","text":"

Abstract Base Class for IPC Communicator objects.

Parameters:

Name Type Description Default party Party

Which object (side/process) the Communicator is managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.

TASK use_pickle bool

Whether to serialize data using pickle prior to sending it.

True Source code in lute/execution/ipc.py
def __init__(self, party: Party = Party.TASK, use_pickle: bool = True) -> None:\n    \"\"\"Abstract Base Class for IPC Communicator objects.\n\n    Args:\n        party (Party): Which object (side/process) the Communicator is\n            managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.\n        use_pickle (bool): Whether to serialize data using pickle prior to\n            sending it.\n    \"\"\"\n    self._party = party\n    self._use_pickle = use_pickle\n    self.desc = \"Communicator abstract base class.\"\n
"},{"location":"source/execution/executor/#execution.executor.Communicator.clear_communicator","title":"clear_communicator()","text":"

Alternative exit method outside of context manager.

Source code in lute/execution/ipc.py
def clear_communicator(self):\n    \"\"\"Alternative exit method outside of context manager.\"\"\"\n    self.__exit__()\n
"},{"location":"source/execution/executor/#execution.executor.Communicator.delayed_setup","title":"delayed_setup()","text":"

Any setup that should be done later than init.

Source code in lute/execution/ipc.py
def delayed_setup(self):\n    \"\"\"Any setup that should be done later than init.\"\"\"\n    ...\n
"},{"location":"source/execution/executor/#execution.executor.Communicator.read","title":"read(proc) abstractmethod","text":"

Method for reading data through the communication mechanism.

Source code in lute/execution/ipc.py
@abstractmethod\ndef read(self, proc: subprocess.Popen) -> Message:\n    \"\"\"Method for reading data through the communication mechanism.\"\"\"\n    ...\n
"},{"location":"source/execution/executor/#execution.executor.Communicator.stage_communicator","title":"stage_communicator()","text":"

Alternative method for staging outside of context manager.

Source code in lute/execution/ipc.py
def stage_communicator(self):\n    \"\"\"Alternative method for staging outside of context manager.\"\"\"\n    self.__enter__()\n
"},{"location":"source/execution/executor/#execution.executor.Communicator.write","title":"write(msg) abstractmethod","text":"

Method for sending data through the communication mechanism.

Source code in lute/execution/ipc.py
@abstractmethod\ndef write(self, msg: Message) -> None:\n    \"\"\"Method for sending data through the communication mechanism.\"\"\"\n    ...\n
"},{"location":"source/execution/executor/#execution.executor.Executor","title":"Executor","text":"

Bases: BaseExecutor

Basic implementation of an Executor which manages simple IPC with Task.

Attributes:

Methods:

Name Description add_hook

(event: str, hook: Callable[[None], None]) -> None: Create a new hook to be called each time a specific event occurs.

add_default_hooks

Populate the event hooks with the default functions.

update_environment

(env: Dict[str, str], update_path: str): Update the environment that is passed to the Task subprocess.

execute_task

Run the task as a subprocess.

Source code in lute/execution/executor.py
class Executor(BaseExecutor):\n    \"\"\"Basic implementation of an Executor which manages simple IPC with Task.\n\n    Attributes:\n\n    Methods:\n        add_hook(event: str, hook: Callable[[None], None]) -> None: Create a\n            new hook to be called each time a specific event occurs.\n\n        add_default_hooks() -> None: Populate the event hooks with the default\n            functions.\n\n        update_environment(env: Dict[str, str], update_path: str): Update the\n            environment that is passed to the Task subprocess.\n\n        execute_task(): Run the task as a subprocess.\n    \"\"\"\n\n    def __init__(\n        self,\n        task_name: str,\n        communicators: List[Communicator] = [\n            PipeCommunicator(Party.EXECUTOR),\n            SocketCommunicator(Party.EXECUTOR),\n        ],\n        poll_interval: float = 0.05,\n    ) -> None:\n        super().__init__(\n            task_name=task_name,\n            communicators=communicators,\n            poll_interval=poll_interval,\n        )\n        self.add_default_hooks()\n\n    def add_default_hooks(self) -> None:\n        \"\"\"Populate the set of default event hooks.\"\"\"\n\n        def no_pickle_mode(self: Executor, msg: Message):\n            for idx, communicator in enumerate(self._communicators):\n                if isinstance(communicator, PipeCommunicator):\n                    self._communicators[idx] = PipeCommunicator(\n                        Party.EXECUTOR, use_pickle=False\n                    )\n\n        self.add_hook(\"no_pickle_mode\", no_pickle_mode)\n\n        def task_started(self: Executor, msg: Message):\n            if isinstance(msg.contents, TaskParameters):\n                self._analysis_desc.task_parameters = msg.contents\n                # Maybe just run this no matter what? 
Rely on the other guards?\n                # Perhaps just check if ThirdPartyParameters?\n                # if isinstance(self._analysis_desc.task_parameters, ThirdPartyParameters):\n                if hasattr(self._analysis_desc.task_parameters.Config, \"set_result\"):\n                    # Third party Tasks may mark a parameter as the result\n                    # If so, setup the result now.\n                    self._set_result_from_parameters()\n            logger.info(\n                f\"Executor: {self._analysis_desc.task_result.task_name} started\"\n            )\n            self._analysis_desc.task_result.task_status = TaskStatus.RUNNING\n            elog_data: Dict[str, str] = {\n                f\"{self._analysis_desc.task_result.task_name} status\": \"RUNNING\",\n            }\n            post_elog_run_status(elog_data)\n\n        self.add_hook(\"task_started\", task_started)\n\n        def task_failed(self: Executor, msg: Message):\n            elog_data: Dict[str, str] = {\n                f\"{self._analysis_desc.task_result.task_name} status\": \"FAILED\",\n            }\n            post_elog_run_status(elog_data)\n\n        self.add_hook(\"task_failed\", task_failed)\n\n        def task_stopped(self: Executor, msg: Message):\n            elog_data: Dict[str, str] = {\n                f\"{self._analysis_desc.task_result.task_name} status\": \"STOPPED\",\n            }\n            post_elog_run_status(elog_data)\n\n        self.add_hook(\"task_stopped\", task_stopped)\n\n        def task_done(self: Executor, msg: Message):\n            elog_data: Dict[str, str] = {\n                f\"{self._analysis_desc.task_result.task_name} status\": \"COMPLETED\",\n            }\n            post_elog_run_status(elog_data)\n\n        self.add_hook(\"task_done\", task_done)\n\n        def task_cancelled(self: Executor, msg: Message):\n            elog_data: Dict[str, str] = {\n                f\"{self._analysis_desc.task_result.task_name} status\": \"CANCELLED\",\n            }\n            post_elog_run_status(elog_data)\n\n        self.add_hook(\"task_cancelled\", task_cancelled)\n\n        def task_result(self: Executor, msg: Message):\n            if isinstance(msg.contents, TaskResult):\n                self._analysis_desc.task_result = msg.contents\n                logger.info(self._analysis_desc.task_result.summary)\n                logger.info(self._analysis_desc.task_result.task_status)\n            elog_data: Dict[str, str] = {\n                f\"{self._analysis_desc.task_result.task_name} status\": \"COMPLETED\",\n            }\n            post_elog_run_status(elog_data)\n\n        self.add_hook(\"task_result\", task_result)\n\n    def _task_loop(self, proc: subprocess.Popen) -> None:\n        \"\"\"Actions to perform while the Task is running.\n\n        This function is run in the body of a loop until the Task signals\n        that its finished.\n        \"\"\"\n        for communicator in self._communicators:\n            while True:\n                msg: Message = communicator.read(proc)\n                if msg.signal is not None and msg.signal.upper() in LUTE_SIGNALS:\n                    hook: Callable[[Executor, Message], None] = getattr(\n                        self.Hooks, msg.signal.lower()\n                    )\n                    hook(self, msg)\n                if msg.contents is not None:\n                    if isinstance(msg.contents, str) and msg.contents != \"\":\n                        logger.info(msg.contents)\n                    elif not 
isinstance(msg.contents, str):\n                        logger.info(msg.contents)\n                if not communicator.has_messages:\n                    break\n\n    def _finalize_task(self, proc: subprocess.Popen) -> None:\n        \"\"\"Any actions to be performed after the Task has ended.\n\n        Examples include a final clearing of the pipes, retrieving results,\n        reporting to third party services, etc.\n        \"\"\"\n        self._task_loop(proc)  # Perform a final read.\n\n    def _process_results(self) -> None:\n        \"\"\"Performs result processing.\n\n        Actions include:\n        - For `ElogSummaryPlots`, will save the summary plot to the appropriate\n            directory for display in the eLog.\n        \"\"\"\n        task_result: TaskResult = self._analysis_desc.task_result\n        self._process_result_payload(task_result.payload)\n        self._process_result_summary(task_result.summary)\n\n    def _process_result_payload(self, payload: Any) -> None:\n        if self._analysis_desc.task_parameters is None:\n            logger.debug(\"Please run Task before using this method!\")\n            return\n        if isinstance(payload, ElogSummaryPlots):\n            # ElogSummaryPlots has figures and a display name\n            # display name also serves as a path.\n            expmt: str = self._analysis_desc.task_parameters.lute_config.experiment\n            base_path: str = f\"/sdf/data/lcls/ds/{expmt[:3]}/{expmt}/stats/summary\"\n            full_path: str = f\"{base_path}/{payload.display_name}\"\n            if not os.path.isdir(full_path):\n                os.makedirs(full_path)\n\n            # Preferred plots are pn.Tabs objects which save directly as html\n            # Only supported plot type that has \"save\" method - do not want to\n            # import plot modules here to do type checks.\n            if hasattr(payload.figures, \"save\"):\n                payload.figures.save(f\"{full_path}/report.html\")\n            else:\n                ...\n        elif isinstance(payload, str):\n            # May be a path to a file...\n            schemas: Optional[str] = self._analysis_desc.task_result.impl_schemas\n            # Should also check `impl_schemas` to determine what to do with path\n\n    def _process_result_summary(self, summary: str) -> None: ...\n
"},{"location":"source/execution/executor/#execution.executor.Executor.add_default_hooks","title":"add_default_hooks()","text":"

Populate the set of default event hooks.

Source code in lute/execution/executor.py
def add_default_hooks(self) -> None:\n    \"\"\"Populate the set of default event hooks.\"\"\"\n\n    def no_pickle_mode(self: Executor, msg: Message):\n        for idx, communicator in enumerate(self._communicators):\n            if isinstance(communicator, PipeCommunicator):\n                self._communicators[idx] = PipeCommunicator(\n                    Party.EXECUTOR, use_pickle=False\n                )\n\n    self.add_hook(\"no_pickle_mode\", no_pickle_mode)\n\n    def task_started(self: Executor, msg: Message):\n        if isinstance(msg.contents, TaskParameters):\n            self._analysis_desc.task_parameters = msg.contents\n            # Maybe just run this no matter what? Rely on the other guards?\n            # Perhaps just check if ThirdPartyParameters?\n            # if isinstance(self._analysis_desc.task_parameters, ThirdPartyParameters):\n            if hasattr(self._analysis_desc.task_parameters.Config, \"set_result\"):\n                # Third party Tasks may mark a parameter as the result\n                # If so, setup the result now.\n                self._set_result_from_parameters()\n        logger.info(\n            f\"Executor: {self._analysis_desc.task_result.task_name} started\"\n        )\n        self._analysis_desc.task_result.task_status = TaskStatus.RUNNING\n        elog_data: Dict[str, str] = {\n            f\"{self._analysis_desc.task_result.task_name} status\": \"RUNNING\",\n        }\n        post_elog_run_status(elog_data)\n\n    self.add_hook(\"task_started\", task_started)\n\n    def task_failed(self: Executor, msg: Message):\n        elog_data: Dict[str, str] = {\n            f\"{self._analysis_desc.task_result.task_name} status\": \"FAILED\",\n        }\n        post_elog_run_status(elog_data)\n\n    self.add_hook(\"task_failed\", task_failed)\n\n    def task_stopped(self: Executor, msg: Message):\n        elog_data: Dict[str, str] = {\n            f\"{self._analysis_desc.task_result.task_name} status\": \"STOPPED\",\n        }\n        post_elog_run_status(elog_data)\n\n    self.add_hook(\"task_stopped\", task_stopped)\n\n    def task_done(self: Executor, msg: Message):\n        elog_data: Dict[str, str] = {\n            f\"{self._analysis_desc.task_result.task_name} status\": \"COMPLETED\",\n        }\n        post_elog_run_status(elog_data)\n\n    self.add_hook(\"task_done\", task_done)\n\n    def task_cancelled(self: Executor, msg: Message):\n        elog_data: Dict[str, str] = {\n            f\"{self._analysis_desc.task_result.task_name} status\": \"CANCELLED\",\n        }\n        post_elog_run_status(elog_data)\n\n    self.add_hook(\"task_cancelled\", task_cancelled)\n\n    def task_result(self: Executor, msg: Message):\n        if isinstance(msg.contents, TaskResult):\n            self._analysis_desc.task_result = msg.contents\n            logger.info(self._analysis_desc.task_result.summary)\n            logger.info(self._analysis_desc.task_result.task_status)\n        elog_data: Dict[str, str] = {\n            f\"{self._analysis_desc.task_result.task_name} status\": \"COMPLETED\",\n        }\n        post_elog_run_status(elog_data)\n\n    self.add_hook(\"task_result\", task_result)\n
"},{"location":"source/execution/executor/#execution.executor.MPIExecutor","title":"MPIExecutor","text":"

Bases: Executor

Runs first-party Tasks that require MPI.

This Executor is otherwise identical to the standard Executor, except it uses mpirun for Task submission. Currently this Executor assumes a job has been submitted using SLURM as a first step. It will determine the number of MPI ranks based on the resources requested. As a fallback, it will try to determine the number of local cores available for cases where a job has not been submitted via SLURM. On S3DF, this fallback should match the resource count reported by SLURM's environment variable.

This Executor will submit the Task to run with a number of processes equal to the total number of cores available minus 1. A single core is reserved for the Executor itself. Note that currently this means that you must submit on 3 cores or more, since MPI requires a minimum of 2 ranks, and the number of ranks is determined from the cores dedicated to Task execution.

Methods:

Name Description _submit_cmd

Run the task as a subprocess using mpirun.

Source code in lute/execution/executor.py
class MPIExecutor(Executor):\n    \"\"\"Runs first-party Tasks that require MPI.\n\n    This Executor is otherwise identical to the standard Executor, except it\n    uses `mpirun` for `Task` submission. Currently this Executor assumes a job\n    has been submitted using SLURM as a first step. It will determine the number\n    of MPI ranks based on the resources requested. As a fallback, it will try\n    to determine the number of local cores available for cases where a job has\n    not been submitted via SLURM. On S3DF, the second determination mechanism\n    should accurately match the environment variable provided by SLURM indicating\n    resources allocated.\n\n    This Executor will submit the Task to run with a number of processes equal\n    to the total number of cores available minus 1. A single core is reserved\n    for the Executor itself. Note that currently this means that you must submit\n    on 3 cores or more, since MPI requires a minimum of 2 ranks, and the number\n    of ranks is determined from the cores dedicated to Task execution.\n\n    Methods:\n        _submit_cmd: Run the task as a subprocess using `mpirun`.\n    \"\"\"\n\n    def _submit_cmd(self, executable_path: str, params: str) -> str:\n        \"\"\"Override submission command to use `mpirun`\n\n        Args:\n            executable_path (str): Path to the LUTE subprocess script.\n\n            params (str): String of formatted command-line arguments.\n\n        Returns:\n            cmd (str): Appropriately formatted command for this Executor.\n        \"\"\"\n        py_cmd: str = \"\"\n        nprocs: int = max(\n            int(os.environ.get(\"SLURM_NPROCS\", len(os.sched_getaffinity(0)))) - 1, 1\n        )\n        mpi_cmd: str = f\"mpirun -np {nprocs}\"\n        if __debug__:\n            py_cmd = f\"python -B -u -m mpi4py.run {executable_path} {params}\"\n        else:\n            py_cmd = f\"python -OB -u -m mpi4py.run {executable_path} {params}\"\n\n        cmd: str = f\"{mpi_cmd} {py_cmd}\"\n        return cmd\n
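The rank calculation above can be restated on its own to make the 3-core minimum concrete (a standalone restatement, not a call into LUTE):

import os

# One core is reserved for the Executor itself; at least one rank must remain.
nprocs: int = max(
    int(os.environ.get("SLURM_NPROCS", len(os.sched_getaffinity(0)))) - 1, 1
)
# MPI requires at least 2 ranks, hence the documented minimum of 3 cores overall.
print(f"Task would be launched with: mpirun -np {nprocs}")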
"},{"location":"source/execution/executor/#execution.executor.Party","title":"Party","text":"

Bases: Enum

Identifier for which party (side/end) is using a communicator.

For some types of communication streams there may be different interfaces depending on which side of the communicator you are on. This enum is used by the communicator to determine which interface to use.

Source code in lute/execution/ipc.py
class Party(Enum):\n    \"\"\"Identifier for which party (side/end) is using a communicator.\n\n    For some types of communication streams there may be different interfaces\n    depending on which side of the communicator you are on. This enum is used\n    by the communicator to determine which interface to use.\n    \"\"\"\n\n    TASK = 0\n    \"\"\"\n    The Task (client) side.\n    \"\"\"\n    EXECUTOR = 1\n    \"\"\"\n    The Executor (server) side.\n    \"\"\"\n
"},{"location":"source/execution/executor/#execution.executor.Party.EXECUTOR","title":"EXECUTOR = 1 class-attribute instance-attribute","text":"

The Executor (server) side.

"},{"location":"source/execution/executor/#execution.executor.Party.TASK","title":"TASK = 0 class-attribute instance-attribute","text":"

The Task (client) side.

"},{"location":"source/execution/executor/#execution.executor.PipeCommunicator","title":"PipeCommunicator","text":"

Bases: Communicator

Provides communication through pipes over stderr/stdout.

The implementation of this communicator has reading and writing occurring on stderr and stdout. In general the Task will be writing while the Executor will be reading. stderr is used for sending signals.

Source code in lute/execution/ipc.py
class PipeCommunicator(Communicator):\n    \"\"\"Provides communication through pipes over stderr/stdout.\n\n    The implementation of this communicator has reading and writing ocurring\n    on stderr and stdout. In general the `Task` will be writing while the\n    `Executor` will be reading. `stderr` is used for sending signals.\n    \"\"\"\n\n    def __init__(self, party: Party = Party.TASK, use_pickle: bool = True) -> None:\n        \"\"\"IPC through pipes.\n\n        Arbitrary objects may be transmitted using pickle to serialize the data.\n        If pickle is not used\n\n        Args:\n            party (Party): Which object (side/process) the Communicator is\n                managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.\n            use_pickle (bool): Whether to serialize data using Pickle prior to\n                sending it. If False, data is assumed to be text whi\n        \"\"\"\n        super().__init__(party=party, use_pickle=use_pickle)\n        self.desc = \"Communicates through stderr and stdout using pickle.\"\n\n    def read(self, proc: subprocess.Popen) -> Message:\n        \"\"\"Read from stdout and stderr.\n\n        Args:\n            proc (subprocess.Popen): The process to read from.\n\n        Returns:\n            msg (Message): The message read, containing contents and signal.\n        \"\"\"\n        signal: Optional[str]\n        contents: Optional[str]\n        raw_signal: bytes = proc.stderr.read()\n        raw_contents: bytes = proc.stdout.read()\n        if raw_signal is not None:\n            signal = raw_signal.decode()\n        else:\n            signal = raw_signal\n        if raw_contents:\n            if self._use_pickle:\n                try:\n                    contents = pickle.loads(raw_contents)\n                except (pickle.UnpicklingError, ValueError, EOFError) as err:\n                    logger.debug(\"PipeCommunicator (Executor) - Set _use_pickle=False\")\n                    self._use_pickle = False\n                    contents = self._safe_unpickle_decode(raw_contents)\n            else:\n                try:\n                    contents = raw_contents.decode()\n                except UnicodeDecodeError as err:\n                    logger.debug(\"PipeCommunicator (Executor) - Set _use_pickle=True\")\n                    self._use_pickle = True\n                    contents = self._safe_unpickle_decode(raw_contents)\n        else:\n            contents = None\n\n        if signal and signal not in LUTE_SIGNALS:\n            # Some tasks write on stderr\n            # If the signal channel has \"non-signal\" info, add it to\n            # contents\n            if not contents:\n                contents = f\"({signal})\"\n            else:\n                contents = f\"{contents} ({signal})\"\n            signal = None\n\n        return Message(contents=contents, signal=signal)\n\n    def _safe_unpickle_decode(self, maybe_mixed: bytes) -> Optional[str]:\n        \"\"\"This method is used to unpickle and/or decode a bytes object.\n\n        It attempts to handle cases where contents can be mixed, i.e., part of\n        the message must be decoded and the other part unpickled. It handles\n        only two-way splits. If there are more complex arrangements such as:\n        <pickled>:<unpickled>:<pickled> etc, it will give up.\n\n        The simpler two way splits are unlikely to occur in normal usage. 
They\n        may arise when debugging if, e.g., `print` statements are mixed with the\n        usage of the `_report_to_executor` method.\n\n        Note that this method works because ONLY text data is assumed to be\n        sent via the pipes. The method needs to be revised to handle non-text\n        data if the `Task` is modified to also send that via PipeCommunicator.\n        The use of pickle is supported to provide for this option if it is\n        necessary. It may be deprecated in the future.\n\n        Be careful when making changes. This method has seemingly redundant\n        checks because unpickling will not throw an error if a full object can\n        be retrieved. That is, the library will ignore extraneous bytes. This\n        method attempts to retrieve that information if the pickled data comes\n        first in the stream.\n\n        Args:\n            maybe_mixed (bytes): A bytes object which could require unpickling,\n                decoding, or both.\n\n        Returns:\n            contents (Optional[str]): The unpickled/decoded contents if possible.\n                Otherwise, None.\n        \"\"\"\n        contents: Optional[str]\n        try:\n            contents = pickle.loads(maybe_mixed)\n            repickled: bytes = pickle.dumps(contents)\n            if len(repickled) < len(maybe_mixed):\n                # Successful unpickling, but pickle stops even if there are more bytes\n                try:\n                    additional_data: str = maybe_mixed[len(repickled) :].decode()\n                    contents = f\"{contents}{additional_data}\"\n                except UnicodeDecodeError:\n                    # Can't decode the bytes left by pickle, so they are lost\n                    missing_bytes: int = len(maybe_mixed) - len(repickled)\n                    logger.debug(\n                        f\"PipeCommunicator has truncated message. Unable to retrieve {missing_bytes} bytes.\"\n                    )\n        except (pickle.UnpicklingError, ValueError, EOFError) as err:\n            # Pickle may also throw a ValueError, e.g. this bytes: b\"Found! \\n\"\n            # Pickle may also throw an EOFError, eg. this bytes: b\"F0\\n\"\n            try:\n                contents = maybe_mixed.decode()\n            except UnicodeDecodeError as err2:\n                try:\n                    contents = maybe_mixed[: err2.start].decode()\n                    contents = f\"{contents}{pickle.loads(maybe_mixed[err2.start:])}\"\n                except Exception as err3:\n                    logger.debug(\n                        f\"PipeCommunicator unable to decode/parse data! 
{err3}\"\n                    )\n                    contents = None\n        return contents\n\n    def write(self, msg: Message) -> None:\n        \"\"\"Write to stdout and stderr.\n\n         The signal component is sent to `stderr` while the contents of the\n         Message are sent to `stdout`.\n\n        Args:\n            msg (Message): The Message to send.\n        \"\"\"\n        if self._use_pickle:\n            signal: bytes\n            if msg.signal:\n                signal = msg.signal.encode()\n            else:\n                signal = b\"\"\n\n            contents: bytes = pickle.dumps(msg.contents)\n\n            sys.stderr.buffer.write(signal)\n            sys.stdout.buffer.write(contents)\n\n            sys.stderr.buffer.flush()\n            sys.stdout.buffer.flush()\n        else:\n            raw_signal: str\n            if msg.signal:\n                raw_signal = msg.signal\n            else:\n                raw_signal = \"\"\n\n            raw_contents: str\n            if isinstance(msg.contents, str):\n                raw_contents = msg.contents\n            elif msg.contents is None:\n                raw_contents = \"\"\n            else:\n                raise ValueError(\n                    f\"Cannot send msg contents of type: {type(msg.contents)} when not using pickle!\"\n                )\n            sys.stderr.write(raw_signal)\n            sys.stdout.write(raw_contents)\n
"},{"location":"source/execution/executor/#execution.executor.PipeCommunicator.__init__","title":"__init__(party=Party.TASK, use_pickle=True)","text":"

IPC through pipes.

Arbitrary objects may be transmitted using pickle to serialize the data. If pickle is not used, only plain-text messages can be sent.

Parameters:

Name Type Description Default party Party

Which object (side/process) the Communicator is managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.

TASK use_pickle bool

Whether to serialize data using Pickle prior to sending it. If False, data is assumed to be text which can be decoded directly (see the short sketch after the source listing below).

True Source code in lute/execution/ipc.py
def __init__(self, party: Party = Party.TASK, use_pickle: bool = True) -> None:\n    \"\"\"IPC through pipes.\n\n    Arbitrary objects may be transmitted using pickle to serialize the data.\n    If pickle is not used\n\n    Args:\n        party (Party): Which object (side/process) the Communicator is\n            managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.\n        use_pickle (bool): Whether to serialize data using Pickle prior to\n            sending it. If False, data is assumed to be text whi\n    \"\"\"\n    super().__init__(party=party, use_pickle=use_pickle)\n    self.desc = \"Communicates through stderr and stdout using pickle.\"\n
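
As a short sketch of the use_pickle flag (with use_pickle=False the write method only accepts string, or empty, contents):

from lute.execution.ipc import Message, Party, PipeCommunicator

# Task-side communicator sending plain text instead of pickled objects.
comm = PipeCommunicator(party=Party.TASK, use_pickle=False)
comm.write(Message(contents="Progress: 50%"))  # written to stdout as text
# comm.write(Message(contents={"a": 1}))       # would raise ValueError without pickle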
"},{"location":"source/execution/executor/#execution.executor.PipeCommunicator.read","title":"read(proc)","text":"

Read from stdout and stderr.

Parameters:

Name Type Description Default proc Popen

The process to read from.

required

Returns:

Name Type Description msg Message

The message read, containing contents and signal.
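
The handling of non-signal stderr output can be sketched as follows (a hedged example; "some warning" is assumed not to be a recognized LUTE signal):

import subprocess
import sys

from lute.execution.ipc import Party, PipeCommunicator

# The child writes plain text on both streams; the stderr text is not a LUTE signal,
# so read() folds it into the Message contents and clears the signal.
proc = subprocess.Popen(
    [
        sys.executable,
        "-c",
        "import sys; sys.stdout.write('result'); sys.stderr.write('some warning')",
    ],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)
proc.wait()

msg = PipeCommunicator(party=Party.EXECUTOR).read(proc)
# msg.contents == "result (some warning)" and msg.signal is None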

Source code in lute/execution/ipc.py
def read(self, proc: subprocess.Popen) -> Message:\n    \"\"\"Read from stdout and stderr.\n\n    Args:\n        proc (subprocess.Popen): The process to read from.\n\n    Returns:\n        msg (Message): The message read, containing contents and signal.\n    \"\"\"\n    signal: Optional[str]\n    contents: Optional[str]\n    raw_signal: bytes = proc.stderr.read()\n    raw_contents: bytes = proc.stdout.read()\n    if raw_signal is not None:\n        signal = raw_signal.decode()\n    else:\n        signal = raw_signal\n    if raw_contents:\n        if self._use_pickle:\n            try:\n                contents = pickle.loads(raw_contents)\n            except (pickle.UnpicklingError, ValueError, EOFError) as err:\n                logger.debug(\"PipeCommunicator (Executor) - Set _use_pickle=False\")\n                self._use_pickle = False\n                contents = self._safe_unpickle_decode(raw_contents)\n        else:\n            try:\n                contents = raw_contents.decode()\n            except UnicodeDecodeError as err:\n                logger.debug(\"PipeCommunicator (Executor) - Set _use_pickle=True\")\n                self._use_pickle = True\n                contents = self._safe_unpickle_decode(raw_contents)\n    else:\n        contents = None\n\n    if signal and signal not in LUTE_SIGNALS:\n        # Some tasks write on stderr\n        # If the signal channel has \"non-signal\" info, add it to\n        # contents\n        if not contents:\n            contents = f\"({signal})\"\n        else:\n            contents = f\"{contents} ({signal})\"\n        signal = None\n\n    return Message(contents=contents, signal=signal)\n
"},{"location":"source/execution/executor/#execution.executor.PipeCommunicator.write","title":"write(msg)","text":"

Write to stdout and stderr.

The signal component is sent to stderr while the contents of the Message are sent to stdout.

Parameters:

Name Type Description Default msg Message

The Message to send.

required Source code in lute/execution/ipc.py
def write(self, msg: Message) -> None:\n    \"\"\"Write to stdout and stderr.\n\n     The signal component is sent to `stderr` while the contents of the\n     Message are sent to `stdout`.\n\n    Args:\n        msg (Message): The Message to send.\n    \"\"\"\n    if self._use_pickle:\n        signal: bytes\n        if msg.signal:\n            signal = msg.signal.encode()\n        else:\n            signal = b\"\"\n\n        contents: bytes = pickle.dumps(msg.contents)\n\n        sys.stderr.buffer.write(signal)\n        sys.stdout.buffer.write(contents)\n\n        sys.stderr.buffer.flush()\n        sys.stdout.buffer.flush()\n    else:\n        raw_signal: str\n        if msg.signal:\n            raw_signal = msg.signal\n        else:\n            raw_signal = \"\"\n\n        raw_contents: str\n        if isinstance(msg.contents, str):\n            raw_contents = msg.contents\n        elif msg.contents is None:\n            raw_contents = \"\"\n        else:\n            raise ValueError(\n                f\"Cannot send msg contents of type: {type(msg.contents)} when not using pickle!\"\n            )\n        sys.stderr.write(raw_signal)\n        sys.stdout.write(raw_contents)\n
"},{"location":"source/execution/executor/#execution.executor.SocketCommunicator","title":"SocketCommunicator","text":"

Bases: Communicator

Provides communication over Unix or TCP sockets.

Communication is provided either using sockets with the Python socket library or using ZMQ. The choice of implementation is controlled by the global bool USE_ZMQ.

Whether to use TCP or Unix sockets is controlled by the environment variable:

LUTE_USE_TCP=1

If defined, TCP sockets will be used, otherwise Unix sockets will be used.

Regardless of socket type, the environment variable LUTE_EXECUTOR_HOST=<hostname> will be defined by the Executor-side Communicator.

For TCP sockets: The Executor-side Communicator should be run first and will bind to all interfaces on the port determined by the environment variable: LUTE_PORT=### If no port is defined, a port scan will be performed and the Executor-side Communicator will bind to the first available port from a random selection. It will then define the environment variable so the Task-side can pick it up.

For Unix sockets: The path to the Unix socket is defined by the environment variable: LUTE_SOCKET=/path/to/socket This class assumes proper permissions and that the above environment variable has been defined. The Task is configured as what would commonly be referred to as the client, while the Executor is configured as the server.

If the Task process is run on a different machine than the Executor, the Task-side Communicator will open an SSH tunnel to forward traffic from a local Unix socket to the Executor Unix socket. Opening of the tunnel relies on the environment variable: LUTE_EXECUTOR_HOST=<hostname> to determine the Executor's host. This variable should be defined by the Executor and passed to the Task process automatically, but it can also be defined manually if launching the Task process separately. The Task will use the local socket <LUTE_SOCKET>.task{##}. Multiple local sockets may be created. Currently, it is assumed that the user is identical on both the Task machine and the Executor machine.
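
A rough sketch of this environment contract is shown below. Both parties are created in one process purely for illustration; in LUTE they live in separate Executor and Task processes, and the commented-out port number is only a hypothetical example:

import os

from lute.execution.ipc import Message, Party, SocketCommunicator

os.environ["LUTE_USE_TCP"] = "1"     # use TCP sockets; unset to use Unix sockets (LUTE_SOCKET)
# os.environ["LUTE_PORT"] = "45521"  # optionally pin a port instead of the random port scan

executor_comm = SocketCommunicator(party=Party.EXECUTOR)
executor_comm.delayed_setup()        # binds a port and defines LUTE_EXECUTOR_HOST/LUTE_PORT

task_comm = SocketCommunicator(party=Party.TASK)  # in LUTE this runs in the Task process
task_comm.delayed_setup()            # connects using LUTE_EXECUTOR_HOST and LUTE_PORT
task_comm.write(Message(contents="ready"))
task_comm.clear_communicator()       # close the Task-side socket

msg = Message()
while msg.contents is None:          # read() returns an empty Message until data arrives
    msg = executor_comm.read(proc=None)  # proc is ignored by this Communicator
print(msg.contents)                  # "ready"
executor_comm.clear_communicator()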

Source code in lute/execution/ipc.py
class SocketCommunicator(Communicator):\n    \"\"\"Provides communication over Unix or TCP sockets.\n\n    Communication is provided either using sockets with the Python socket library\n    or using ZMQ. The choice of implementation is controlled by the global bool\n    `USE_ZMQ`.\n\n    Whether to use TCP or Unix sockets is controlled by the environment:\n                           `LUTE_USE_TCP=1`\n    If defined, TCP sockets will be used, otherwise Unix sockets will be used.\n\n    Regardless of socket type, the environment variable\n                      `LUTE_EXECUTOR_HOST=<hostname>`\n    will be defined by the Executor-side Communicator.\n\n\n    For TCP sockets:\n    The Executor-side Communicator should be run first and will bind to all\n    interfaces on the port determined by the environment variable:\n                            `LUTE_PORT=###`\n    If no port is defined, a port scan will be performed and the Executor-side\n    Communicator will bind the first one available from a random selection. It\n    will then define the environment variable so the Task-side can pick it up.\n\n    For Unix sockets:\n    The path to the Unix socket is defined by the environment variable:\n                      `LUTE_SOCKET=/path/to/socket`\n    This class assumes proper permissions and that this above environment\n    variable has been defined. The `Task` is configured as what would commonly\n    be referred to as the `client`, while the `Executor` is configured as the\n    server.\n\n    If the Task process is run on a different machine than the Executor, the\n    Task-side Communicator will open a ssh-tunnel to forward traffic from a local\n    Unix socket to the Executor Unix socket. Opening of the tunnel relies on the\n    environment variable:\n                      `LUTE_EXECUTOR_HOST=<hostname>`\n    to determine the Executor's host. This variable should be defined by the\n    Executor and passed to the Task process automatically, but it can also be\n    defined manually if launching the Task process separately. The Task will use\n    the local socket `<LUTE_SOCKET>.task{##}`. Multiple local sockets may be\n    created. Currently, it is assumed that the user is identical on both the Task\n    machine and Executor machine.\n    \"\"\"\n\n    ACCEPT_TIMEOUT: float = 0.01\n    \"\"\"\n    Maximum time to wait to accept connections. Used by Executor-side.\n    \"\"\"\n    MSG_HEAD: bytes = b\"MSG\"\n    \"\"\"\n    Start signal of a message. The end of a message is indicated by MSG_HEAD[::-1].\n    \"\"\"\n    MSG_SEP: bytes = b\";;;\"\n    \"\"\"\n    Separator for parts of a message. Messages have a start, length, message and end.\n    \"\"\"\n\n    def __init__(self, party: Party = Party.TASK, use_pickle: bool = True) -> None:\n        \"\"\"IPC over a TCP or Unix socket.\n\n        Unlike with the PipeCommunicator, pickle is always used to send data\n        through the socket.\n\n        Args:\n            party (Party): Which object (side/process) the Communicator is\n                managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.\n\n            use_pickle (bool): Whether to use pickle. Always True currently,\n                passing False does not change behaviour.\n        \"\"\"\n        super().__init__(party=party, use_pickle=use_pickle)\n\n    def delayed_setup(self) -> None:\n        \"\"\"Delays the creation of socket objects.\n\n        The Executor initializes the Communicator when it is created. 
Since\n        all Executors are created and available at once we want to delay\n        acquisition of socket resources until a single Executor is ready\n        to use them.\n        \"\"\"\n        self._data_socket: Union[socket.socket, zmq.sugar.socket.Socket]\n        if USE_ZMQ:\n            self.desc: str = \"Communicates using ZMQ through TCP or Unix sockets.\"\n            self._context: zmq.context.Context = zmq.Context()\n            self._data_socket = self._create_socket_zmq()\n        else:\n            self.desc: str = \"Communicates through a TCP or Unix socket.\"\n            self._data_socket = self._create_socket_raw()\n            self._data_socket.settimeout(SocketCommunicator.ACCEPT_TIMEOUT)\n\n        if self._party == Party.EXECUTOR:\n            # Executor created first so we can define the hostname env variable\n            os.environ[\"LUTE_EXECUTOR_HOST\"] = socket.gethostname()\n            # Setup reader thread\n            self._reader_thread: threading.Thread = threading.Thread(\n                target=self._read_socket\n            )\n            self._msg_queue: queue.Queue = queue.Queue()\n            self._partial_msg: Optional[bytes] = None\n            self._stop_thread: bool = False\n            self._reader_thread.start()\n        else:\n            # Only used by Party.TASK\n            self._use_ssh_tunnel: bool = False\n            self._ssh_proc: Optional[subprocess.Popen] = None\n            self._local_socket_path: Optional[str] = None\n\n    # Read\n    ############################################################################\n\n    def read(self, proc: subprocess.Popen) -> Message:\n        \"\"\"Return a message from the queue if available.\n\n        Socket(s) are continuously monitored, and read from when new data is\n        available.\n\n        Args:\n            proc (subprocess.Popen): The process to read from. Provided for\n                compatibility with other Communicator subtypes. 
Is ignored.\n\n        Returns:\n             msg (Message): The message read, containing contents and signal.\n        \"\"\"\n        msg: Message\n        try:\n            msg = self._msg_queue.get(timeout=SocketCommunicator.ACCEPT_TIMEOUT)\n        except queue.Empty:\n            msg = Message()\n\n        return msg\n\n    def _read_socket(self) -> None:\n        \"\"\"Read data from a socket.\n\n        Socket(s) are continuously monitored, and read from when new data is\n        available.\n\n        Calls an underlying method for either raw sockets or ZMQ.\n        \"\"\"\n\n        while True:\n            if self._stop_thread:\n                logger.debug(\"Stopping socket reader thread.\")\n                break\n            if USE_ZMQ:\n                self._read_socket_zmq()\n            else:\n                self._read_socket_raw()\n\n    def _read_socket_raw(self) -> None:\n        \"\"\"Read data from a socket.\n\n        Raw socket implementation for the reader thread.\n        \"\"\"\n        connection: socket.socket\n        addr: Union[str, Tuple[str, int]]\n        try:\n            connection, addr = self._data_socket.accept()\n            full_data: bytes = b\"\"\n            while True:\n                data: bytes = connection.recv(8192)\n                if data:\n                    full_data += data\n                else:\n                    break\n            connection.close()\n            self._unpack_messages(full_data)\n        except socket.timeout:\n            pass\n\n    def _read_socket_zmq(self) -> None:\n        \"\"\"Read data from a socket.\n\n        ZMQ implementation for the reader thread.\n        \"\"\"\n        try:\n            full_data: bytes = self._data_socket.recv(0)\n            self._unpack_messages(full_data)\n        except zmq.ZMQError:\n            pass\n\n    def _unpack_messages(self, data: bytes) -> None:\n        \"\"\"Unpacks a byte stream into individual messages.\n\n        Messages are encoded in the following format:\n                 <HEAD><SEP><len(msg)><SEP><msg><SEP><HEAD[::-1]>\n        The items between <> are replaced as follows:\n            - <HEAD>: A start marker\n            - <SEP>: A separator for components of the message\n            - <len(msg)>: The length of the message payload in bytes.\n            - <msg>: The message payload in bytes\n            - <HEAD[::-1]>: The start marker in reverse to indicate the end.\n\n        Partial messages (a series of bytes which cannot be converted to a full\n        message) are stored for later. 
An attempt is made to reconstruct the\n        message with the next call to this method.\n\n        Args:\n            data (bytes): A raw byte stream containing anywhere from a partial\n                message to multiple full messages.\n        \"\"\"\n        msg: Message\n        working_data: bytes\n        if self._partial_msg:\n            # Concatenate the previous partial message to the beginning\n            working_data = self._partial_msg + data\n            self._partial_msg = None\n        else:\n            working_data = data\n        while working_data:\n            try:\n                # Message encoding: <HEAD><SEP><len><SEP><msg><SEP><HEAD[::-1]>\n                end = working_data.find(\n                    SocketCommunicator.MSG_SEP + SocketCommunicator.MSG_HEAD[::-1]\n                )\n                msg_parts: List[bytes] = working_data[:end].split(\n                    SocketCommunicator.MSG_SEP\n                )\n                if len(msg_parts) != 3:\n                    self._partial_msg = working_data\n                    break\n\n                cmd: bytes\n                nbytes: bytes\n                raw_msg: bytes\n                cmd, nbytes, raw_msg = msg_parts\n                if len(raw_msg) != int(nbytes):\n                    self._partial_msg = working_data\n                    break\n                msg = pickle.loads(raw_msg)\n                self._msg_queue.put(msg)\n            except pickle.UnpicklingError:\n                self._partial_msg = working_data\n                break\n            if end < len(working_data):\n                # Add len(SEP+HEAD) since end marks the start of <SEP><HEAD[::-1]\n                offset: int = len(\n                    SocketCommunicator.MSG_SEP + SocketCommunicator.MSG_HEAD\n                )\n                working_data = working_data[end + offset :]\n            else:\n                working_data = b\"\"\n\n    # Write\n    ############################################################################\n\n    def _write_socket(self, msg: Message) -> None:\n        \"\"\"Sends data over a socket from the 'client' (Task) side.\n\n        Messages are encoded in the following format:\n                 <HEAD><SEP><len(msg)><SEP><msg><SEP><HEAD[::-1]>\n        The items between <> are replaced as follows:\n            - <HEAD>: A start marker\n            - <SEP>: A separator for components of the message\n            - <len(msg)>: The length of the message payload in bytes.\n            - <msg>: The message payload in bytes\n            - <HEAD[::-1]>: The start marker in reverse to indicate the end.\n\n        This structure is used for decoding the message on the other end.\n        \"\"\"\n        data: bytes = pickle.dumps(msg)\n        cmd: bytes = SocketCommunicator.MSG_HEAD\n        size: bytes = b\"%d\" % len(data)\n        end: bytes = SocketCommunicator.MSG_HEAD[::-1]\n        sep: bytes = SocketCommunicator.MSG_SEP\n        packed_msg: bytes = cmd + sep + size + sep + data + sep + end\n        if USE_ZMQ:\n            self._data_socket.send(packed_msg)\n        else:\n            self._data_socket.sendall(packed_msg)\n\n    def write(self, msg: Message) -> None:\n        \"\"\"Send a single Message.\n\n        The entire Message (signal and contents) is serialized and sent through\n        a connection over Unix socket.\n\n        Args:\n            msg (Message): The Message to send.\n        \"\"\"\n        self._write_socket(msg)\n\n    # Generic create\n    
############################################################################\n\n    def _create_socket_raw(self) -> socket.socket:\n        \"\"\"Create either a Unix or TCP socket.\n\n        If the environment variable:\n                              `LUTE_USE_TCP=1`\n        is defined, a TCP socket is returned, otherwise a Unix socket.\n\n        Refer to the individual initialization methods for additional environment\n        variables controlling the behaviour of these two communication types.\n\n        Returns:\n            data_socket (socket.socket): TCP or Unix socket.\n        \"\"\"\n        import struct\n\n        use_tcp: Optional[str] = os.getenv(\"LUTE_USE_TCP\")\n        sock: socket.socket\n        if use_tcp is not None:\n            if self._party == Party.EXECUTOR:\n                logger.info(\"Will use raw TCP sockets.\")\n            sock = self._init_tcp_socket_raw()\n        else:\n            if self._party == Party.EXECUTOR:\n                logger.info(\"Will use raw Unix sockets.\")\n            sock = self._init_unix_socket_raw()\n        sock.setsockopt(\n            socket.SOL_SOCKET, socket.SO_LINGER, struct.pack(\"ii\", 1, 10000)\n        )\n        return sock\n\n    def _create_socket_zmq(self) -> zmq.sugar.socket.Socket:\n        \"\"\"Create either a Unix or TCP socket.\n\n        If the environment variable:\n                              `LUTE_USE_TCP=1`\n        is defined, a TCP socket is returned, otherwise a Unix socket.\n\n        Refer to the individual initialization methods for additional environment\n        variables controlling the behaviour of these two communication types.\n\n        Returns:\n            data_socket (socket.socket): Unix socket object.\n        \"\"\"\n        socket_type: Literal[zmq.PULL, zmq.PUSH]\n        if self._party == Party.EXECUTOR:\n            socket_type = zmq.PULL\n        else:\n            socket_type = zmq.PUSH\n\n        data_socket: zmq.sugar.socket.Socket = self._context.socket(socket_type)\n        data_socket.set_hwm(160000)\n        # Need to multiply by 1000 since ZMQ uses ms\n        data_socket.setsockopt(\n            zmq.RCVTIMEO, int(SocketCommunicator.ACCEPT_TIMEOUT * 1000)\n        )\n        # Try TCP first\n        use_tcp: Optional[str] = os.getenv(\"LUTE_USE_TCP\")\n        if use_tcp is not None:\n            if self._party == Party.EXECUTOR:\n                logger.info(\"Will use TCP (ZMQ).\")\n            self._init_tcp_socket_zmq(data_socket)\n        else:\n            if self._party == Party.EXECUTOR:\n                logger.info(\"Will use Unix sockets (ZMQ).\")\n            self._init_unix_socket_zmq(data_socket)\n\n        return data_socket\n\n    # TCP Init\n    ############################################################################\n\n    def _find_random_port(\n        self, min_port: int = 41923, max_port: int = 64324, max_tries: int = 100\n    ) -> Optional[int]:\n        \"\"\"Find a random open port to bind to if using TCP.\"\"\"\n        from random import choices\n\n        sock: socket.socket\n        ports: List[int] = choices(range(min_port, max_port), k=max_tries)\n        for port in ports:\n            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)\n            try:\n                sock.bind((\"\", port))\n                sock.close()\n                del sock\n                return port\n            except:\n                continue\n        return None\n\n    def _init_tcp_socket_raw(self) -> socket.socket:\n        \"\"\"Initialize a 
TCP socket.\n\n        Executor-side code should always be run first. It checks to see if\n        the environment variable\n                                `LUTE_PORT=###`\n        is defined, if so binds it, otherwise find a free port from a selection\n        of random ports. If a port search is performed, the `LUTE_PORT` variable\n        will be defined so it can be picked up by the the Task-side Communicator.\n\n        In the event that no port can be bound on the Executor-side, or the port\n        and hostname information is unavailable to the Task-side, the program\n        will exit.\n\n        Returns:\n            data_socket (socket.socket): TCP socket object.\n        \"\"\"\n        data_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)\n        port: Optional[Union[str, int]] = os.getenv(\"LUTE_PORT\")\n        if self._party == Party.EXECUTOR:\n            if port is None:\n                # If port is None find one\n                # Executor code executes first\n                port = self._find_random_port()\n                if port is None:\n                    # Failed to find a port to bind\n                    logger.info(\n                        \"Executor failed to bind a port. \"\n                        \"Try providing a LUTE_PORT directly! Exiting!\"\n                    )\n                    sys.exit(-1)\n                # Provide port env var for Task-side\n                os.environ[\"LUTE_PORT\"] = str(port)\n            data_socket.bind((\"\", int(port)))\n            data_socket.listen()\n        else:\n            hostname: str = socket.gethostname()\n            executor_hostname: Optional[str] = os.getenv(\"LUTE_EXECUTOR_HOST\")\n            if executor_hostname is None or port is None:\n                logger.info(\n                    \"Task-side does not have host/port information!\"\n                    \" Check environment variables! Exiting!\"\n                )\n                sys.exit(-1)\n            if hostname == executor_hostname:\n                data_socket.connect((\"localhost\", int(port)))\n            else:\n                data_socket.connect((executor_hostname, int(port)))\n        return data_socket\n\n    def _init_tcp_socket_zmq(self, data_socket: zmq.sugar.socket.Socket) -> None:\n        \"\"\"Initialize a TCP socket using ZMQ.\n\n        Equivalent as the method above but requires passing in a ZMQ socket\n        object instead of returning one.\n\n        Args:\n            data_socket (zmq.socket.Socket): Socket object.\n        \"\"\"\n        port: Optional[Union[str, int]] = os.getenv(\"LUTE_PORT\")\n        if self._party == Party.EXECUTOR:\n            if port is None:\n                new_port: int = data_socket.bind_to_random_port(\"tcp://*\")\n                if new_port is None:\n                    # Failed to find a port to bind\n                    logger.info(\n                        \"Executor failed to bind a port. \"\n                        \"Try providing a LUTE_PORT directly! 
Exiting!\"\n                    )\n                    sys.exit(-1)\n                port = new_port\n                os.environ[\"LUTE_PORT\"] = str(port)\n            else:\n                data_socket.bind(f\"tcp://*:{port}\")\n            logger.debug(f\"Executor bound port {port}\")\n        else:\n            executor_hostname: Optional[str] = os.getenv(\"LUTE_EXECUTOR_HOST\")\n            if executor_hostname is None or port is None:\n                logger.info(\n                    \"Task-side does not have host/port information!\"\n                    \" Check environment variables! Exiting!\"\n                )\n                sys.exit(-1)\n            data_socket.connect(f\"tcp://{executor_hostname}:{port}\")\n\n    # Unix Init\n    ############################################################################\n\n    def _get_socket_path(self) -> str:\n        \"\"\"Return the socket path, defining one if it is not available.\n\n        Returns:\n            socket_path (str): Path to the Unix socket.\n        \"\"\"\n        socket_path: str\n        try:\n            socket_path = os.environ[\"LUTE_SOCKET\"]\n        except KeyError as err:\n            import uuid\n            import tempfile\n\n            # Define a path, and add to environment\n            # Executor-side always created first, Task will use the same one\n            socket_path = f\"{tempfile.gettempdir()}/lute_{uuid.uuid4().hex}.sock\"\n            os.environ[\"LUTE_SOCKET\"] = socket_path\n            logger.debug(f\"SocketCommunicator defines socket_path: {socket_path}\")\n        if USE_ZMQ:\n            return f\"ipc://{socket_path}\"\n        else:\n            return socket_path\n\n    def _init_unix_socket_raw(self) -> socket.socket:\n        \"\"\"Returns a Unix socket object.\n\n        Executor-side code should always be run first. It checks to see if\n        the environment variable\n                                `LUTE_SOCKET=XYZ`\n        is defined, if so binds it, otherwise it will create a new path and\n        define the environment variable for the Task-side to find.\n\n        On the Task (client-side), this method will also open a SSH tunnel to\n        forward a local Unix socket to an Executor Unix socket if the Task and\n        Executor processes are on different machines.\n\n        Returns:\n            data_socket (socket.socket): Unix socket object.\n        \"\"\"\n        socket_path: str = self._get_socket_path()\n        data_socket = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)\n        if self._party == Party.EXECUTOR:\n            if os.path.exists(socket_path):\n                os.unlink(socket_path)\n            data_socket.bind(socket_path)\n            data_socket.listen()\n        elif self._party == Party.TASK:\n            hostname: str = socket.gethostname()\n            executor_hostname: Optional[str] = os.getenv(\"LUTE_EXECUTOR_HOST\")\n            if executor_hostname is None:\n                logger.info(\"Hostname for Executor process not found! 
Exiting!\")\n                data_socket.close()\n                sys.exit(-1)\n            if hostname == executor_hostname:\n                data_socket.connect(socket_path)\n            else:\n                self._local_socket_path = self._setup_unix_ssh_tunnel(\n                    socket_path, hostname, executor_hostname\n                )\n                while 1:\n                    # Keep trying reconnect until ssh tunnel works.\n                    try:\n                        data_socket.connect(self._local_socket_path)\n                        break\n                    except FileNotFoundError:\n                        continue\n\n        return data_socket\n\n    def _init_unix_socket_zmq(self, data_socket: zmq.sugar.socket.Socket) -> None:\n        \"\"\"Initialize a Unix socket object, using ZMQ.\n\n        Equivalent as the method above but requires passing in a ZMQ socket\n        object instead of returning one.\n\n        Args:\n            data_socket (socket.socket): ZMQ object.\n        \"\"\"\n        socket_path = self._get_socket_path()\n        if self._party == Party.EXECUTOR:\n            if os.path.exists(socket_path):\n                os.unlink(socket_path)\n            data_socket.bind(socket_path)\n        elif self._party == Party.TASK:\n            hostname: str = socket.gethostname()\n            executor_hostname: Optional[str] = os.getenv(\"LUTE_EXECUTOR_HOST\")\n            if executor_hostname is None:\n                logger.info(\"Hostname for Executor process not found! Exiting!\")\n                self._data_socket.close()\n                sys.exit(-1)\n            if hostname == executor_hostname:\n                data_socket.connect(socket_path)\n            else:\n                # Need to remove ipc:// from socket_path for forwarding\n                self._local_socket_path = self._setup_unix_ssh_tunnel(\n                    socket_path[6:], hostname, executor_hostname\n                )\n                # Need to add it back\n                path: str = f\"ipc://{self._local_socket_path}\"\n                data_socket.connect(path)\n\n    def _setup_unix_ssh_tunnel(\n        self, socket_path: str, hostname: str, executor_hostname: str\n    ) -> str:\n        \"\"\"Prepares an SSH tunnel for forwarding between Unix sockets on two hosts.\n\n        An SSH tunnel is opened with `ssh -L <local>:<remote> sleep 2`.\n        This method of communication is slightly slower and incurs additional\n        overhead - it should only be used as a backup. If communication across\n        multiple hosts is required consider using TCP.  The Task will use\n        the local socket `<LUTE_SOCKET>.task{##}`. Multiple local sockets may be\n        created. 
It is assumed that the user is identical on both the\n        Task machine and Executor machine.\n\n        Returns:\n            local_socket_path (str): The local Unix socket to connect to.\n        \"\"\"\n        if \"uuid\" not in globals():\n            import uuid\n        local_socket_path = f\"{socket_path}.task{uuid.uuid4().hex[:4]}\"\n        self._use_ssh_tunnel = True\n        ssh_cmd: List[str] = [\n            \"ssh\",\n            \"-o\",\n            \"LogLevel=quiet\",\n            \"-L\",\n            f\"{local_socket_path}:{socket_path}\",\n            executor_hostname,\n            \"sleep\",\n            \"2\",\n        ]\n        logger.debug(f\"Opening tunnel from {hostname} to {executor_hostname}\")\n        self._ssh_proc = subprocess.Popen(ssh_cmd)\n        time.sleep(0.4)  # Need to wait... -> Use single Task comm at beginning?\n        return local_socket_path\n\n    # Clean up and properties\n    ############################################################################\n\n    def _clean_up(self) -> None:\n        \"\"\"Clean up connections.\"\"\"\n        if self._party == Party.EXECUTOR:\n            self._stop_thread = True\n            self._reader_thread.join()\n            logger.debug(\"Closed reading thread.\")\n\n        self._data_socket.close()\n        if USE_ZMQ:\n            self._context.term()\n        else:\n            ...\n\n        if os.getenv(\"LUTE_USE_TCP\"):\n            return\n        else:\n            if self._party == Party.EXECUTOR:\n                os.unlink(os.getenv(\"LUTE_SOCKET\"))  # Should be defined\n                return\n            elif self._use_ssh_tunnel:\n                if self._ssh_proc is not None:\n                    self._ssh_proc.terminate()\n\n    @property\n    def has_messages(self) -> bool:\n        if self._party == Party.TASK:\n            # Shouldn't be called on Task-side\n            return False\n\n        if self._msg_queue.qsize() > 0:\n            return True\n        return False\n\n    def __exit__(self):\n        self._clean_up()\n
"},{"location":"source/execution/executor/#execution.executor.SocketCommunicator.ACCEPT_TIMEOUT","title":"ACCEPT_TIMEOUT: float = 0.01 class-attribute instance-attribute","text":"

Maximum time to wait to accept connections. Used by Executor-side.

"},{"location":"source/execution/executor/#execution.executor.SocketCommunicator.MSG_HEAD","title":"MSG_HEAD: bytes = b'MSG' class-attribute instance-attribute","text":"

Start signal of a message. The end of a message is indicated by MSG_HEAD[::-1].

"},{"location":"source/execution/executor/#execution.executor.SocketCommunicator.MSG_SEP","title":"MSG_SEP: bytes = b';;;' class-attribute instance-attribute","text":"

Separator for parts of a message. Messages have a start, length, message and end.
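
As an illustrative sketch, packing a Message by hand with these constants follows the <HEAD><SEP><len(msg)><SEP><msg><SEP><HEAD[::-1]> layout documented in the class source above:

import pickle

from lute.execution.ipc import Message, SocketCommunicator

# Frame a pickled Message the same way _write_socket does.
payload = pickle.dumps(Message(contents="hello"))
head = SocketCommunicator.MSG_HEAD
sep = SocketCommunicator.MSG_SEP
packed = head + sep + b"%d" % len(payload) + sep + payload + sep + head[::-1]
# packed is what is sent over the socket and parsed by _unpack_messages on the other end.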

"},{"location":"source/execution/executor/#execution.executor.SocketCommunicator.__init__","title":"__init__(party=Party.TASK, use_pickle=True)","text":"

IPC over a TCP or Unix socket.

Unlike the PipeCommunicator, this Communicator always uses pickle to send data through the socket.

Parameters:

Name Type Description Default party Party

Which object (side/process) the Communicator is managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.

TASK use_pickle bool

Whether to use pickle. Always True currently; passing False does not change behaviour.

True Source code in lute/execution/ipc.py
def __init__(self, party: Party = Party.TASK, use_pickle: bool = True) -> None:\n    \"\"\"IPC over a TCP or Unix socket.\n\n    Unlike with the PipeCommunicator, pickle is always used to send data\n    through the socket.\n\n    Args:\n        party (Party): Which object (side/process) the Communicator is\n            managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.\n\n        use_pickle (bool): Whether to use pickle. Always True currently,\n            passing False does not change behaviour.\n    \"\"\"\n    super().__init__(party=party, use_pickle=use_pickle)\n
"},{"location":"source/execution/executor/#execution.executor.SocketCommunicator.delayed_setup","title":"delayed_setup()","text":"

Delays the creation of socket objects.

The Executor initializes the Communicator when it is created. Since all Executors are created and available at once, we want to delay acquisition of socket resources until a single Executor is ready to use them.

Source code in lute/execution/ipc.py
def delayed_setup(self) -> None:\n    \"\"\"Delays the creation of socket objects.\n\n    The Executor initializes the Communicator when it is created. Since\n    all Executors are created and available at once we want to delay\n    acquisition of socket resources until a single Executor is ready\n    to use them.\n    \"\"\"\n    self._data_socket: Union[socket.socket, zmq.sugar.socket.Socket]\n    if USE_ZMQ:\n        self.desc: str = \"Communicates using ZMQ through TCP or Unix sockets.\"\n        self._context: zmq.context.Context = zmq.Context()\n        self._data_socket = self._create_socket_zmq()\n    else:\n        self.desc: str = \"Communicates through a TCP or Unix socket.\"\n        self._data_socket = self._create_socket_raw()\n        self._data_socket.settimeout(SocketCommunicator.ACCEPT_TIMEOUT)\n\n    if self._party == Party.EXECUTOR:\n        # Executor created first so we can define the hostname env variable\n        os.environ[\"LUTE_EXECUTOR_HOST\"] = socket.gethostname()\n        # Setup reader thread\n        self._reader_thread: threading.Thread = threading.Thread(\n            target=self._read_socket\n        )\n        self._msg_queue: queue.Queue = queue.Queue()\n        self._partial_msg: Optional[bytes] = None\n        self._stop_thread: bool = False\n        self._reader_thread.start()\n    else:\n        # Only used by Party.TASK\n        self._use_ssh_tunnel: bool = False\n        self._ssh_proc: Optional[subprocess.Popen] = None\n        self._local_socket_path: Optional[str] = None\n
"},{"location":"source/execution/executor/#execution.executor.SocketCommunicator.read","title":"read(proc)","text":"

Return a message from the queue if available.

Socket(s) are continuously monitored, and read from when new data is available.

Parameters:

Name Type Description Default proc Popen

The process to read from. Provided for compatibility with other Communicator subtypes. Is ignored.

required

Returns:

Name Type Description msg Message

The message read, containing contents and signal.

Source code in lute/execution/ipc.py
def read(self, proc: subprocess.Popen) -> Message:\n    \"\"\"Return a message from the queue if available.\n\n    Socket(s) are continuously monitored, and read from when new data is\n    available.\n\n    Args:\n        proc (subprocess.Popen): The process to read from. Provided for\n            compatibility with other Communicator subtypes. Is ignored.\n\n    Returns:\n         msg (Message): The message read, containing contents and signal.\n    \"\"\"\n    msg: Message\n    try:\n        msg = self._msg_queue.get(timeout=SocketCommunicator.ACCEPT_TIMEOUT)\n    except queue.Empty:\n        msg = Message()\n\n    return msg\n
"},{"location":"source/execution/executor/#execution.executor.SocketCommunicator.write","title":"write(msg)","text":"

Send a single Message.

The entire Message (signal and contents) is serialized and sent through a connection over a Unix or TCP socket.

Parameters:

Name Type Description Default msg Message

The Message to send.

required Source code in lute/execution/ipc.py
def write(self, msg: Message) -> None:\n    \"\"\"Send a single Message.\n\n    The entire Message (signal and contents) is serialized and sent through\n    a connection over Unix socket.\n\n    Args:\n        msg (Message): The Message to send.\n    \"\"\"\n    self._write_socket(msg)\n
"},{"location":"source/execution/ipc/","title":"ipc","text":"

Classes and utilities for communication between Executors and subprocesses.

Communicators manage message passing and parsing between subprocesses. They maintain a limited public interface of \"read\" and \"write\" operations. Behind this interface the methods of communication vary from serialization across pipes to Unix sockets, etc. All communicators pass a single object called a \"Message\" which contains an arbitrary \"contents\" field as well as an optional \"signal\" field.
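
For illustration, constructing such a Message might look like the following sketch (the signal string used here is hypothetical; real values come from LUTE_SIGNALS):

from lute.execution.ipc import Message

status_update = Message(contents="Processing run 12...")             # free-form contents, no signal
finished = Message(contents={"n_events": 1000}, signal="TASK_DONE")  # "TASK_DONE" is a hypothetical signal string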

Classes:

Name Description Party

Enum describing whether Communicator is on Task-side or Executor-side.

Message

A dataclass used for passing information from Task to Executor.

Communicator

Abstract base class for Communicator types.

PipeCommunicator

Manages communication between Task and Executor via pipes (stderr and stdout).

SocketCommunicator

Manages communication using sockets, either raw or using zmq. Supports both TCP and Unix sockets.

"},{"location":"source/execution/ipc/#execution.ipc.Communicator","title":"Communicator","text":"

Bases: ABC

Source code in lute/execution/ipc.py
class Communicator(ABC):\n    def __init__(self, party: Party = Party.TASK, use_pickle: bool = True) -> None:\n        \"\"\"Abstract Base Class for IPC Communicator objects.\n\n        Args:\n            party (Party): Which object (side/process) the Communicator is\n                managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.\n            use_pickle (bool): Whether to serialize data using pickle prior to\n                sending it.\n        \"\"\"\n        self._party = party\n        self._use_pickle = use_pickle\n        self.desc = \"Communicator abstract base class.\"\n\n    @abstractmethod\n    def read(self, proc: subprocess.Popen) -> Message:\n        \"\"\"Method for reading data through the communication mechanism.\"\"\"\n        ...\n\n    @abstractmethod\n    def write(self, msg: Message) -> None:\n        \"\"\"Method for sending data through the communication mechanism.\"\"\"\n        ...\n\n    def __str__(self):\n        name: str = str(type(self)).split(\"'\")[1].split(\".\")[-1]\n        return f\"{name}: {self.desc}\"\n\n    def __repr__(self):\n        return self.__str__()\n\n    def __enter__(self) -> Self:\n        return self\n\n    def __exit__(self) -> None: ...\n\n    @property\n    def has_messages(self) -> bool:\n        \"\"\"Whether the Communicator has remaining messages.\n\n        The precise method for determining whether there are remaining messages\n        will depend on the specific Communicator sub-class.\n        \"\"\"\n        return False\n\n    def stage_communicator(self):\n        \"\"\"Alternative method for staging outside of context manager.\"\"\"\n        self.__enter__()\n\n    def clear_communicator(self):\n        \"\"\"Alternative exit method outside of context manager.\"\"\"\n        self.__exit__()\n\n    def delayed_setup(self):\n        \"\"\"Any setup that should be done later than init.\"\"\"\n        ...\n
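
Since only read and write are abstract, a minimal custom Communicator can be sketched as follows (InMemoryCommunicator is a hypothetical illustration, not part of LUTE):

import subprocess

from lute.execution.ipc import Communicator, Message, Party

class InMemoryCommunicator(Communicator):
    """Stores Messages in a local list; useful, e.g., for unit tests."""

    def __init__(self, party: Party = Party.TASK, use_pickle: bool = True) -> None:
        super().__init__(party=party, use_pickle=use_pickle)
        self.desc = "Communicates through an in-memory list."
        self._buffer: list = []

    def write(self, msg: Message) -> None:
        self._buffer.append(msg)

    def read(self, proc: subprocess.Popen) -> Message:
        # proc is accepted for interface compatibility but not used here.
        return self._buffer.pop(0) if self._buffer else Message()

    @property
    def has_messages(self) -> bool:
        return bool(self._buffer)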
"},{"location":"source/execution/ipc/#execution.ipc.Communicator.has_messages","title":"has_messages: bool property","text":"

Whether the Communicator has remaining messages.

The precise method for determining whether there are remaining messages will depend on the specific Communicator sub-class.

"},{"location":"source/execution/ipc/#execution.ipc.Communicator.__init__","title":"__init__(party=Party.TASK, use_pickle=True)","text":"

Abstract Base Class for IPC Communicator objects.

Parameters:

Name Type Description Default party Party

Which object (side/process) the Communicator is managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.

TASK use_pickle bool

Whether to serialize data using pickle prior to sending it.

True Source code in lute/execution/ipc.py
def __init__(self, party: Party = Party.TASK, use_pickle: bool = True) -> None:\n    \"\"\"Abstract Base Class for IPC Communicator objects.\n\n    Args:\n        party (Party): Which object (side/process) the Communicator is\n            managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.\n        use_pickle (bool): Whether to serialize data using pickle prior to\n            sending it.\n    \"\"\"\n    self._party = party\n    self._use_pickle = use_pickle\n    self.desc = \"Communicator abstract base class.\"\n
"},{"location":"source/execution/ipc/#execution.ipc.Communicator.clear_communicator","title":"clear_communicator()","text":"

Alternative exit method outside of context manager.

Source code in lute/execution/ipc.py
def clear_communicator(self):\n    \"\"\"Alternative exit method outside of context manager.\"\"\"\n    self.__exit__()\n
"},{"location":"source/execution/ipc/#execution.ipc.Communicator.delayed_setup","title":"delayed_setup()","text":"

Any setup that should be done later than init.

Source code in lute/execution/ipc.py
def delayed_setup(self):\n    \"\"\"Any setup that should be done later than init.\"\"\"\n    ...\n
"},{"location":"source/execution/ipc/#execution.ipc.Communicator.read","title":"read(proc) abstractmethod","text":"

Method for reading data through the communication mechanism.

Source code in lute/execution/ipc.py
@abstractmethod\ndef read(self, proc: subprocess.Popen) -> Message:\n    \"\"\"Method for reading data through the communication mechanism.\"\"\"\n    ...\n
"},{"location":"source/execution/ipc/#execution.ipc.Communicator.stage_communicator","title":"stage_communicator()","text":"

Alternative method for staging outside of context manager.

Source code in lute/execution/ipc.py
def stage_communicator(self):\n    \"\"\"Alternative method for staging outside of context manager.\"\"\"\n    self.__enter__()\n
"},{"location":"source/execution/ipc/#execution.ipc.Communicator.write","title":"write(msg) abstractmethod","text":"

Method for sending data through the communication mechanism.

Source code in lute/execution/ipc.py
@abstractmethod\ndef write(self, msg: Message) -> None:\n    \"\"\"Method for sending data through the communication mechanism.\"\"\"\n    ...\n
"},{"location":"source/execution/ipc/#execution.ipc.Party","title":"Party","text":"

Bases: Enum

Identifier for which party (side/end) is using a communicator.

For some types of communication streams there may be different interfaces depending on which side of the communicator you are on. This enum is used by the communicator to determine which interface to use.

Source code in lute/execution/ipc.py
class Party(Enum):\n    \"\"\"Identifier for which party (side/end) is using a communicator.\n\n    For some types of communication streams there may be different interfaces\n    depending on which side of the communicator you are on. This enum is used\n    by the communicator to determine which interface to use.\n    \"\"\"\n\n    TASK = 0\n    \"\"\"\n    The Task (client) side.\n    \"\"\"\n    EXECUTOR = 1\n    \"\"\"\n    The Executor (server) side.\n    \"\"\"\n
"},{"location":"source/execution/ipc/#execution.ipc.Party.EXECUTOR","title":"EXECUTOR = 1 class-attribute instance-attribute","text":"

The Executor (server) side.

"},{"location":"source/execution/ipc/#execution.ipc.Party.TASK","title":"TASK = 0 class-attribute instance-attribute","text":"

The Task (client) side.

"},{"location":"source/execution/ipc/#execution.ipc.PipeCommunicator","title":"PipeCommunicator","text":"

Bases: Communicator

Provides communication through pipes over stderr/stdout.

The implementation of this communicator has reading and writing occurring on stderr and stdout. In general, the Task will be writing while the Executor will be reading. stderr is used for sending signals.

Source code in lute/execution/ipc.py
class PipeCommunicator(Communicator):\n    \"\"\"Provides communication through pipes over stderr/stdout.\n\n    The implementation of this communicator has reading and writing ocurring\n    on stderr and stdout. In general the `Task` will be writing while the\n    `Executor` will be reading. `stderr` is used for sending signals.\n    \"\"\"\n\n    def __init__(self, party: Party = Party.TASK, use_pickle: bool = True) -> None:\n        \"\"\"IPC through pipes.\n\n        Arbitrary objects may be transmitted using pickle to serialize the data.\n        If pickle is not used\n\n        Args:\n            party (Party): Which object (side/process) the Communicator is\n                managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.\n            use_pickle (bool): Whether to serialize data using Pickle prior to\n                sending it. If False, data is assumed to be text whi\n        \"\"\"\n        super().__init__(party=party, use_pickle=use_pickle)\n        self.desc = \"Communicates through stderr and stdout using pickle.\"\n\n    def read(self, proc: subprocess.Popen) -> Message:\n        \"\"\"Read from stdout and stderr.\n\n        Args:\n            proc (subprocess.Popen): The process to read from.\n\n        Returns:\n            msg (Message): The message read, containing contents and signal.\n        \"\"\"\n        signal: Optional[str]\n        contents: Optional[str]\n        raw_signal: bytes = proc.stderr.read()\n        raw_contents: bytes = proc.stdout.read()\n        if raw_signal is not None:\n            signal = raw_signal.decode()\n        else:\n            signal = raw_signal\n        if raw_contents:\n            if self._use_pickle:\n                try:\n                    contents = pickle.loads(raw_contents)\n                except (pickle.UnpicklingError, ValueError, EOFError) as err:\n                    logger.debug(\"PipeCommunicator (Executor) - Set _use_pickle=False\")\n                    self._use_pickle = False\n                    contents = self._safe_unpickle_decode(raw_contents)\n            else:\n                try:\n                    contents = raw_contents.decode()\n                except UnicodeDecodeError as err:\n                    logger.debug(\"PipeCommunicator (Executor) - Set _use_pickle=True\")\n                    self._use_pickle = True\n                    contents = self._safe_unpickle_decode(raw_contents)\n        else:\n            contents = None\n\n        if signal and signal not in LUTE_SIGNALS:\n            # Some tasks write on stderr\n            # If the signal channel has \"non-signal\" info, add it to\n            # contents\n            if not contents:\n                contents = f\"({signal})\"\n            else:\n                contents = f\"{contents} ({signal})\"\n            signal = None\n\n        return Message(contents=contents, signal=signal)\n\n    def _safe_unpickle_decode(self, maybe_mixed: bytes) -> Optional[str]:\n        \"\"\"This method is used to unpickle and/or decode a bytes object.\n\n        It attempts to handle cases where contents can be mixed, i.e., part of\n        the message must be decoded and the other part unpickled. It handles\n        only two-way splits. If there are more complex arrangements such as:\n        <pickled>:<unpickled>:<pickled> etc, it will give up.\n\n        The simpler two way splits are unlikely to occur in normal usage. 
They\n        may arise when debugging if, e.g., `print` statements are mixed with the\n        usage of the `_report_to_executor` method.\n\n        Note that this method works because ONLY text data is assumed to be\n        sent via the pipes. The method needs to be revised to handle non-text\n        data if the `Task` is modified to also send that via PipeCommunicator.\n        The use of pickle is supported to provide for this option if it is\n        necessary. It may be deprecated in the future.\n\n        Be careful when making changes. This method has seemingly redundant\n        checks because unpickling will not throw an error if a full object can\n        be retrieved. That is, the library will ignore extraneous bytes. This\n        method attempts to retrieve that information if the pickled data comes\n        first in the stream.\n\n        Args:\n            maybe_mixed (bytes): A bytes object which could require unpickling,\n                decoding, or both.\n\n        Returns:\n            contents (Optional[str]): The unpickled/decoded contents if possible.\n                Otherwise, None.\n        \"\"\"\n        contents: Optional[str]\n        try:\n            contents = pickle.loads(maybe_mixed)\n            repickled: bytes = pickle.dumps(contents)\n            if len(repickled) < len(maybe_mixed):\n                # Successful unpickling, but pickle stops even if there are more bytes\n                try:\n                    additional_data: str = maybe_mixed[len(repickled) :].decode()\n                    contents = f\"{contents}{additional_data}\"\n                except UnicodeDecodeError:\n                    # Can't decode the bytes left by pickle, so they are lost\n                    missing_bytes: int = len(maybe_mixed) - len(repickled)\n                    logger.debug(\n                        f\"PipeCommunicator has truncated message. Unable to retrieve {missing_bytes} bytes.\"\n                    )\n        except (pickle.UnpicklingError, ValueError, EOFError) as err:\n            # Pickle may also throw a ValueError, e.g. this bytes: b\"Found! \\n\"\n            # Pickle may also throw an EOFError, eg. this bytes: b\"F0\\n\"\n            try:\n                contents = maybe_mixed.decode()\n            except UnicodeDecodeError as err2:\n                try:\n                    contents = maybe_mixed[: err2.start].decode()\n                    contents = f\"{contents}{pickle.loads(maybe_mixed[err2.start:])}\"\n                except Exception as err3:\n                    logger.debug(\n                        f\"PipeCommunicator unable to decode/parse data! 
{err3}\"\n                    )\n                    contents = None\n        return contents\n\n    def write(self, msg: Message) -> None:\n        \"\"\"Write to stdout and stderr.\n\n         The signal component is sent to `stderr` while the contents of the\n         Message are sent to `stdout`.\n\n        Args:\n            msg (Message): The Message to send.\n        \"\"\"\n        if self._use_pickle:\n            signal: bytes\n            if msg.signal:\n                signal = msg.signal.encode()\n            else:\n                signal = b\"\"\n\n            contents: bytes = pickle.dumps(msg.contents)\n\n            sys.stderr.buffer.write(signal)\n            sys.stdout.buffer.write(contents)\n\n            sys.stderr.buffer.flush()\n            sys.stdout.buffer.flush()\n        else:\n            raw_signal: str\n            if msg.signal:\n                raw_signal = msg.signal\n            else:\n                raw_signal = \"\"\n\n            raw_contents: str\n            if isinstance(msg.contents, str):\n                raw_contents = msg.contents\n            elif msg.contents is None:\n                raw_contents = \"\"\n            else:\n                raise ValueError(\n                    f\"Cannot send msg contents of type: {type(msg.contents)} when not using pickle!\"\n                )\n            sys.stderr.write(raw_signal)\n            sys.stdout.write(raw_contents)\n
"},{"location":"source/execution/ipc/#execution.ipc.PipeCommunicator.__init__","title":"__init__(party=Party.TASK, use_pickle=True)","text":"

IPC through pipes.

Arbitrary objects may be transmitted using pickle to serialize the data. If pickle is not used, the data is assumed to be text.

Parameters:

Name Type Description Default party Party

Which object (side/process) the Communicator is managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.

TASK use_pickle bool

Whether to serialize data using Pickle prior to sending it. If False, data is assumed to be text and is sent/decoded as-is.

True Source code in lute/execution/ipc.py
def __init__(self, party: Party = Party.TASK, use_pickle: bool = True) -> None:\n    \"\"\"IPC through pipes.\n\n    Arbitrary objects may be transmitted using pickle to serialize the data.\n    If pickle is not used\n\n    Args:\n        party (Party): Which object (side/process) the Communicator is\n            managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.\n        use_pickle (bool): Whether to serialize data using Pickle prior to\n            sending it. If False, data is assumed to be text whi\n    \"\"\"\n    super().__init__(party=party, use_pickle=use_pickle)\n    self.desc = \"Communicates through stderr and stdout using pickle.\"\n
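A standalone sketch of the exchange PipeCommunicator wraps may help make the two channels concrete (the signal string and the contents below are placeholders, not real LUTE signals): the child process plays the Task side, writing a pickled contents object to stdout and a plain-text signal to stderr, while the parent plays the Executor side and reads both pipes.

import pickle
import subprocess
import sys
import textwrap

# Child ("Task" side): pickled contents -> stdout, text signal -> stderr.
child_code = textwrap.dedent(
    """
    import pickle, sys
    sys.stderr.write("TASK_DONE")  # placeholder signal string
    sys.stdout.buffer.write(pickle.dumps({"result": [1, 2, 3]}))
    """
)

# Parent ("Executor" side): read and interpret both channels.
proc = subprocess.Popen(
    [sys.executable, "-c", child_code],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)
proc.wait()
signal = proc.stderr.read().decode()
contents = pickle.loads(proc.stdout.read())
print(signal, contents)  # TASK_DONE {'result': [1, 2, 3]}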
"},{"location":"source/execution/ipc/#execution.ipc.PipeCommunicator.read","title":"read(proc)","text":"

Read from stdout and stderr.

Parameters:

Name Type Description Default proc Popen

The process to read from.

required

Returns:

Name Type Description msg Message

The message read, containing contents and signal.

Source code in lute/execution/ipc.py
def read(self, proc: subprocess.Popen) -> Message:\n    \"\"\"Read from stdout and stderr.\n\n    Args:\n        proc (subprocess.Popen): The process to read from.\n\n    Returns:\n        msg (Message): The message read, containing contents and signal.\n    \"\"\"\n    signal: Optional[str]\n    contents: Optional[str]\n    raw_signal: bytes = proc.stderr.read()\n    raw_contents: bytes = proc.stdout.read()\n    if raw_signal is not None:\n        signal = raw_signal.decode()\n    else:\n        signal = raw_signal\n    if raw_contents:\n        if self._use_pickle:\n            try:\n                contents = pickle.loads(raw_contents)\n            except (pickle.UnpicklingError, ValueError, EOFError) as err:\n                logger.debug(\"PipeCommunicator (Executor) - Set _use_pickle=False\")\n                self._use_pickle = False\n                contents = self._safe_unpickle_decode(raw_contents)\n        else:\n            try:\n                contents = raw_contents.decode()\n            except UnicodeDecodeError as err:\n                logger.debug(\"PipeCommunicator (Executor) - Set _use_pickle=True\")\n                self._use_pickle = True\n                contents = self._safe_unpickle_decode(raw_contents)\n    else:\n        contents = None\n\n    if signal and signal not in LUTE_SIGNALS:\n        # Some tasks write on stderr\n        # If the signal channel has \"non-signal\" info, add it to\n        # contents\n        if not contents:\n            contents = f\"({signal})\"\n        else:\n            contents = f\"{contents} ({signal})\"\n        signal = None\n\n    return Message(contents=contents, signal=signal)\n
"},{"location":"source/execution/ipc/#execution.ipc.PipeCommunicator.write","title":"write(msg)","text":"

Write to stdout and stderr.

The signal component is sent to stderr while the contents of the Message are sent to stdout.

Parameters:

Name Type Description Default msg Message

The Message to send.

required Source code in lute/execution/ipc.py
def write(self, msg: Message) -> None:\n    \"\"\"Write to stdout and stderr.\n\n     The signal component is sent to `stderr` while the contents of the\n     Message are sent to `stdout`.\n\n    Args:\n        msg (Message): The Message to send.\n    \"\"\"\n    if self._use_pickle:\n        signal: bytes\n        if msg.signal:\n            signal = msg.signal.encode()\n        else:\n            signal = b\"\"\n\n        contents: bytes = pickle.dumps(msg.contents)\n\n        sys.stderr.buffer.write(signal)\n        sys.stdout.buffer.write(contents)\n\n        sys.stderr.buffer.flush()\n        sys.stdout.buffer.flush()\n    else:\n        raw_signal: str\n        if msg.signal:\n            raw_signal = msg.signal\n        else:\n            raw_signal = \"\"\n\n        raw_contents: str\n        if isinstance(msg.contents, str):\n            raw_contents = msg.contents\n        elif msg.contents is None:\n            raw_contents = \"\"\n        else:\n            raise ValueError(\n                f\"Cannot send msg contents of type: {type(msg.contents)} when not using pickle!\"\n            )\n        sys.stderr.write(raw_signal)\n        sys.stdout.write(raw_contents)\n
"},{"location":"source/execution/ipc/#execution.ipc.SocketCommunicator","title":"SocketCommunicator","text":"

Bases: Communicator

Provides communication over Unix or TCP sockets.

Communication is provided either using sockets with the Python socket library or using ZMQ. The choice of implementation is controlled by the global bool USE_ZMQ.

Whether to use TCP or Unix sockets is controlled by the environment variable

LUTE_USE_TCP=1

If defined, TCP sockets will be used, otherwise Unix sockets will be used.

Regardless of socket type, the environment variable LUTE_EXECUTOR_HOST=<hostname> will be defined by the Executor-side Communicator.

For TCP sockets: The Executor-side Communicator should be run first and will bind to all interfaces on the port determined by the environment variable LUTE_PORT=###. If no port is defined, a port scan will be performed and the Executor-side Communicator will bind to the first available port from a random selection. It will then define the environment variable so the Task-side can pick it up.

For Unix sockets: The path to the Unix socket is defined by the environment variable LUTE_SOCKET=/path/to/socket. This class assumes proper permissions and that the above environment variable has been defined. The Task is configured as what would commonly be referred to as the client, while the Executor is configured as the server.

If the Task process is run on a different machine than the Executor, the Task-side Communicator will open an SSH tunnel to forward traffic from a local Unix socket to the Executor Unix socket. Opening of the tunnel relies on the environment variable LUTE_EXECUTOR_HOST=<hostname> to determine the Executor's host. This variable should be defined by the Executor and passed to the Task process automatically, but it can also be defined manually if launching the Task process separately. The Task will use the local socket <LUTE_SOCKET>.task{##}. Multiple local sockets may be created. Currently, it is assumed that the user is identical on both the Task machine and the Executor machine.

Source code in lute/execution/ipc.py
class SocketCommunicator(Communicator):\n    \"\"\"Provides communication over Unix or TCP sockets.\n\n    Communication is provided either using sockets with the Python socket library\n    or using ZMQ. The choice of implementation is controlled by the global bool\n    `USE_ZMQ`.\n\n    Whether to use TCP or Unix sockets is controlled by the environment:\n                           `LUTE_USE_TCP=1`\n    If defined, TCP sockets will be used, otherwise Unix sockets will be used.\n\n    Regardless of socket type, the environment variable\n                      `LUTE_EXECUTOR_HOST=<hostname>`\n    will be defined by the Executor-side Communicator.\n\n\n    For TCP sockets:\n    The Executor-side Communicator should be run first and will bind to all\n    interfaces on the port determined by the environment variable:\n                            `LUTE_PORT=###`\n    If no port is defined, a port scan will be performed and the Executor-side\n    Communicator will bind the first one available from a random selection. It\n    will then define the environment variable so the Task-side can pick it up.\n\n    For Unix sockets:\n    The path to the Unix socket is defined by the environment variable:\n                      `LUTE_SOCKET=/path/to/socket`\n    This class assumes proper permissions and that this above environment\n    variable has been defined. The `Task` is configured as what would commonly\n    be referred to as the `client`, while the `Executor` is configured as the\n    server.\n\n    If the Task process is run on a different machine than the Executor, the\n    Task-side Communicator will open a ssh-tunnel to forward traffic from a local\n    Unix socket to the Executor Unix socket. Opening of the tunnel relies on the\n    environment variable:\n                      `LUTE_EXECUTOR_HOST=<hostname>`\n    to determine the Executor's host. This variable should be defined by the\n    Executor and passed to the Task process automatically, but it can also be\n    defined manually if launching the Task process separately. The Task will use\n    the local socket `<LUTE_SOCKET>.task{##}`. Multiple local sockets may be\n    created. Currently, it is assumed that the user is identical on both the Task\n    machine and Executor machine.\n    \"\"\"\n\n    ACCEPT_TIMEOUT: float = 0.01\n    \"\"\"\n    Maximum time to wait to accept connections. Used by Executor-side.\n    \"\"\"\n    MSG_HEAD: bytes = b\"MSG\"\n    \"\"\"\n    Start signal of a message. The end of a message is indicated by MSG_HEAD[::-1].\n    \"\"\"\n    MSG_SEP: bytes = b\";;;\"\n    \"\"\"\n    Separator for parts of a message. Messages have a start, length, message and end.\n    \"\"\"\n\n    def __init__(self, party: Party = Party.TASK, use_pickle: bool = True) -> None:\n        \"\"\"IPC over a TCP or Unix socket.\n\n        Unlike with the PipeCommunicator, pickle is always used to send data\n        through the socket.\n\n        Args:\n            party (Party): Which object (side/process) the Communicator is\n                managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.\n\n            use_pickle (bool): Whether to use pickle. Always True currently,\n                passing False does not change behaviour.\n        \"\"\"\n        super().__init__(party=party, use_pickle=use_pickle)\n\n    def delayed_setup(self) -> None:\n        \"\"\"Delays the creation of socket objects.\n\n        The Executor initializes the Communicator when it is created. 
Since\n        all Executors are created and available at once we want to delay\n        acquisition of socket resources until a single Executor is ready\n        to use them.\n        \"\"\"\n        self._data_socket: Union[socket.socket, zmq.sugar.socket.Socket]\n        if USE_ZMQ:\n            self.desc: str = \"Communicates using ZMQ through TCP or Unix sockets.\"\n            self._context: zmq.context.Context = zmq.Context()\n            self._data_socket = self._create_socket_zmq()\n        else:\n            self.desc: str = \"Communicates through a TCP or Unix socket.\"\n            self._data_socket = self._create_socket_raw()\n            self._data_socket.settimeout(SocketCommunicator.ACCEPT_TIMEOUT)\n\n        if self._party == Party.EXECUTOR:\n            # Executor created first so we can define the hostname env variable\n            os.environ[\"LUTE_EXECUTOR_HOST\"] = socket.gethostname()\n            # Setup reader thread\n            self._reader_thread: threading.Thread = threading.Thread(\n                target=self._read_socket\n            )\n            self._msg_queue: queue.Queue = queue.Queue()\n            self._partial_msg: Optional[bytes] = None\n            self._stop_thread: bool = False\n            self._reader_thread.start()\n        else:\n            # Only used by Party.TASK\n            self._use_ssh_tunnel: bool = False\n            self._ssh_proc: Optional[subprocess.Popen] = None\n            self._local_socket_path: Optional[str] = None\n\n    # Read\n    ############################################################################\n\n    def read(self, proc: subprocess.Popen) -> Message:\n        \"\"\"Return a message from the queue if available.\n\n        Socket(s) are continuously monitored, and read from when new data is\n        available.\n\n        Args:\n            proc (subprocess.Popen): The process to read from. Provided for\n                compatibility with other Communicator subtypes. 
Is ignored.\n\n        Returns:\n             msg (Message): The message read, containing contents and signal.\n        \"\"\"\n        msg: Message\n        try:\n            msg = self._msg_queue.get(timeout=SocketCommunicator.ACCEPT_TIMEOUT)\n        except queue.Empty:\n            msg = Message()\n\n        return msg\n\n    def _read_socket(self) -> None:\n        \"\"\"Read data from a socket.\n\n        Socket(s) are continuously monitored, and read from when new data is\n        available.\n\n        Calls an underlying method for either raw sockets or ZMQ.\n        \"\"\"\n\n        while True:\n            if self._stop_thread:\n                logger.debug(\"Stopping socket reader thread.\")\n                break\n            if USE_ZMQ:\n                self._read_socket_zmq()\n            else:\n                self._read_socket_raw()\n\n    def _read_socket_raw(self) -> None:\n        \"\"\"Read data from a socket.\n\n        Raw socket implementation for the reader thread.\n        \"\"\"\n        connection: socket.socket\n        addr: Union[str, Tuple[str, int]]\n        try:\n            connection, addr = self._data_socket.accept()\n            full_data: bytes = b\"\"\n            while True:\n                data: bytes = connection.recv(8192)\n                if data:\n                    full_data += data\n                else:\n                    break\n            connection.close()\n            self._unpack_messages(full_data)\n        except socket.timeout:\n            pass\n\n    def _read_socket_zmq(self) -> None:\n        \"\"\"Read data from a socket.\n\n        ZMQ implementation for the reader thread.\n        \"\"\"\n        try:\n            full_data: bytes = self._data_socket.recv(0)\n            self._unpack_messages(full_data)\n        except zmq.ZMQError:\n            pass\n\n    def _unpack_messages(self, data: bytes) -> None:\n        \"\"\"Unpacks a byte stream into individual messages.\n\n        Messages are encoded in the following format:\n                 <HEAD><SEP><len(msg)><SEP><msg><SEP><HEAD[::-1]>\n        The items between <> are replaced as follows:\n            - <HEAD>: A start marker\n            - <SEP>: A separator for components of the message\n            - <len(msg)>: The length of the message payload in bytes.\n            - <msg>: The message payload in bytes\n            - <HEAD[::-1]>: The start marker in reverse to indicate the end.\n\n        Partial messages (a series of bytes which cannot be converted to a full\n        message) are stored for later. 
An attempt is made to reconstruct the\n        message with the next call to this method.\n\n        Args:\n            data (bytes): A raw byte stream containing anywhere from a partial\n                message to multiple full messages.\n        \"\"\"\n        msg: Message\n        working_data: bytes\n        if self._partial_msg:\n            # Concatenate the previous partial message to the beginning\n            working_data = self._partial_msg + data\n            self._partial_msg = None\n        else:\n            working_data = data\n        while working_data:\n            try:\n                # Message encoding: <HEAD><SEP><len><SEP><msg><SEP><HEAD[::-1]>\n                end = working_data.find(\n                    SocketCommunicator.MSG_SEP + SocketCommunicator.MSG_HEAD[::-1]\n                )\n                msg_parts: List[bytes] = working_data[:end].split(\n                    SocketCommunicator.MSG_SEP\n                )\n                if len(msg_parts) != 3:\n                    self._partial_msg = working_data\n                    break\n\n                cmd: bytes\n                nbytes: bytes\n                raw_msg: bytes\n                cmd, nbytes, raw_msg = msg_parts\n                if len(raw_msg) != int(nbytes):\n                    self._partial_msg = working_data\n                    break\n                msg = pickle.loads(raw_msg)\n                self._msg_queue.put(msg)\n            except pickle.UnpicklingError:\n                self._partial_msg = working_data\n                break\n            if end < len(working_data):\n                # Add len(SEP+HEAD) since end marks the start of <SEP><HEAD[::-1]\n                offset: int = len(\n                    SocketCommunicator.MSG_SEP + SocketCommunicator.MSG_HEAD\n                )\n                working_data = working_data[end + offset :]\n            else:\n                working_data = b\"\"\n\n    # Write\n    ############################################################################\n\n    def _write_socket(self, msg: Message) -> None:\n        \"\"\"Sends data over a socket from the 'client' (Task) side.\n\n        Messages are encoded in the following format:\n                 <HEAD><SEP><len(msg)><SEP><msg><SEP><HEAD[::-1]>\n        The items between <> are replaced as follows:\n            - <HEAD>: A start marker\n            - <SEP>: A separator for components of the message\n            - <len(msg)>: The length of the message payload in bytes.\n            - <msg>: The message payload in bytes\n            - <HEAD[::-1]>: The start marker in reverse to indicate the end.\n\n        This structure is used for decoding the message on the other end.\n        \"\"\"\n        data: bytes = pickle.dumps(msg)\n        cmd: bytes = SocketCommunicator.MSG_HEAD\n        size: bytes = b\"%d\" % len(data)\n        end: bytes = SocketCommunicator.MSG_HEAD[::-1]\n        sep: bytes = SocketCommunicator.MSG_SEP\n        packed_msg: bytes = cmd + sep + size + sep + data + sep + end\n        if USE_ZMQ:\n            self._data_socket.send(packed_msg)\n        else:\n            self._data_socket.sendall(packed_msg)\n\n    def write(self, msg: Message) -> None:\n        \"\"\"Send a single Message.\n\n        The entire Message (signal and contents) is serialized and sent through\n        a connection over Unix socket.\n\n        Args:\n            msg (Message): The Message to send.\n        \"\"\"\n        self._write_socket(msg)\n\n    # Generic create\n    
############################################################################\n\n    def _create_socket_raw(self) -> socket.socket:\n        \"\"\"Create either a Unix or TCP socket.\n\n        If the environment variable:\n                              `LUTE_USE_TCP=1`\n        is defined, a TCP socket is returned, otherwise a Unix socket.\n\n        Refer to the individual initialization methods for additional environment\n        variables controlling the behaviour of these two communication types.\n\n        Returns:\n            data_socket (socket.socket): TCP or Unix socket.\n        \"\"\"\n        import struct\n\n        use_tcp: Optional[str] = os.getenv(\"LUTE_USE_TCP\")\n        sock: socket.socket\n        if use_tcp is not None:\n            if self._party == Party.EXECUTOR:\n                logger.info(\"Will use raw TCP sockets.\")\n            sock = self._init_tcp_socket_raw()\n        else:\n            if self._party == Party.EXECUTOR:\n                logger.info(\"Will use raw Unix sockets.\")\n            sock = self._init_unix_socket_raw()\n        sock.setsockopt(\n            socket.SOL_SOCKET, socket.SO_LINGER, struct.pack(\"ii\", 1, 10000)\n        )\n        return sock\n\n    def _create_socket_zmq(self) -> zmq.sugar.socket.Socket:\n        \"\"\"Create either a Unix or TCP socket.\n\n        If the environment variable:\n                              `LUTE_USE_TCP=1`\n        is defined, a TCP socket is returned, otherwise a Unix socket.\n\n        Refer to the individual initialization methods for additional environment\n        variables controlling the behaviour of these two communication types.\n\n        Returns:\n            data_socket (socket.socket): Unix socket object.\n        \"\"\"\n        socket_type: Literal[zmq.PULL, zmq.PUSH]\n        if self._party == Party.EXECUTOR:\n            socket_type = zmq.PULL\n        else:\n            socket_type = zmq.PUSH\n\n        data_socket: zmq.sugar.socket.Socket = self._context.socket(socket_type)\n        data_socket.set_hwm(160000)\n        # Need to multiply by 1000 since ZMQ uses ms\n        data_socket.setsockopt(\n            zmq.RCVTIMEO, int(SocketCommunicator.ACCEPT_TIMEOUT * 1000)\n        )\n        # Try TCP first\n        use_tcp: Optional[str] = os.getenv(\"LUTE_USE_TCP\")\n        if use_tcp is not None:\n            if self._party == Party.EXECUTOR:\n                logger.info(\"Will use TCP (ZMQ).\")\n            self._init_tcp_socket_zmq(data_socket)\n        else:\n            if self._party == Party.EXECUTOR:\n                logger.info(\"Will use Unix sockets (ZMQ).\")\n            self._init_unix_socket_zmq(data_socket)\n\n        return data_socket\n\n    # TCP Init\n    ############################################################################\n\n    def _find_random_port(\n        self, min_port: int = 41923, max_port: int = 64324, max_tries: int = 100\n    ) -> Optional[int]:\n        \"\"\"Find a random open port to bind to if using TCP.\"\"\"\n        from random import choices\n\n        sock: socket.socket\n        ports: List[int] = choices(range(min_port, max_port), k=max_tries)\n        for port in ports:\n            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)\n            try:\n                sock.bind((\"\", port))\n                sock.close()\n                del sock\n                return port\n            except:\n                continue\n        return None\n\n    def _init_tcp_socket_raw(self) -> socket.socket:\n        \"\"\"Initialize a 
TCP socket.\n\n        Executor-side code should always be run first. It checks to see if\n        the environment variable\n                                `LUTE_PORT=###`\n        is defined, if so binds it, otherwise find a free port from a selection\n        of random ports. If a port search is performed, the `LUTE_PORT` variable\n        will be defined so it can be picked up by the the Task-side Communicator.\n\n        In the event that no port can be bound on the Executor-side, or the port\n        and hostname information is unavailable to the Task-side, the program\n        will exit.\n\n        Returns:\n            data_socket (socket.socket): TCP socket object.\n        \"\"\"\n        data_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)\n        port: Optional[Union[str, int]] = os.getenv(\"LUTE_PORT\")\n        if self._party == Party.EXECUTOR:\n            if port is None:\n                # If port is None find one\n                # Executor code executes first\n                port = self._find_random_port()\n                if port is None:\n                    # Failed to find a port to bind\n                    logger.info(\n                        \"Executor failed to bind a port. \"\n                        \"Try providing a LUTE_PORT directly! Exiting!\"\n                    )\n                    sys.exit(-1)\n                # Provide port env var for Task-side\n                os.environ[\"LUTE_PORT\"] = str(port)\n            data_socket.bind((\"\", int(port)))\n            data_socket.listen()\n        else:\n            hostname: str = socket.gethostname()\n            executor_hostname: Optional[str] = os.getenv(\"LUTE_EXECUTOR_HOST\")\n            if executor_hostname is None or port is None:\n                logger.info(\n                    \"Task-side does not have host/port information!\"\n                    \" Check environment variables! Exiting!\"\n                )\n                sys.exit(-1)\n            if hostname == executor_hostname:\n                data_socket.connect((\"localhost\", int(port)))\n            else:\n                data_socket.connect((executor_hostname, int(port)))\n        return data_socket\n\n    def _init_tcp_socket_zmq(self, data_socket: zmq.sugar.socket.Socket) -> None:\n        \"\"\"Initialize a TCP socket using ZMQ.\n\n        Equivalent as the method above but requires passing in a ZMQ socket\n        object instead of returning one.\n\n        Args:\n            data_socket (zmq.socket.Socket): Socket object.\n        \"\"\"\n        port: Optional[Union[str, int]] = os.getenv(\"LUTE_PORT\")\n        if self._party == Party.EXECUTOR:\n            if port is None:\n                new_port: int = data_socket.bind_to_random_port(\"tcp://*\")\n                if new_port is None:\n                    # Failed to find a port to bind\n                    logger.info(\n                        \"Executor failed to bind a port. \"\n                        \"Try providing a LUTE_PORT directly! 
Exiting!\"\n                    )\n                    sys.exit(-1)\n                port = new_port\n                os.environ[\"LUTE_PORT\"] = str(port)\n            else:\n                data_socket.bind(f\"tcp://*:{port}\")\n            logger.debug(f\"Executor bound port {port}\")\n        else:\n            executor_hostname: Optional[str] = os.getenv(\"LUTE_EXECUTOR_HOST\")\n            if executor_hostname is None or port is None:\n                logger.info(\n                    \"Task-side does not have host/port information!\"\n                    \" Check environment variables! Exiting!\"\n                )\n                sys.exit(-1)\n            data_socket.connect(f\"tcp://{executor_hostname}:{port}\")\n\n    # Unix Init\n    ############################################################################\n\n    def _get_socket_path(self) -> str:\n        \"\"\"Return the socket path, defining one if it is not available.\n\n        Returns:\n            socket_path (str): Path to the Unix socket.\n        \"\"\"\n        socket_path: str\n        try:\n            socket_path = os.environ[\"LUTE_SOCKET\"]\n        except KeyError as err:\n            import uuid\n            import tempfile\n\n            # Define a path, and add to environment\n            # Executor-side always created first, Task will use the same one\n            socket_path = f\"{tempfile.gettempdir()}/lute_{uuid.uuid4().hex}.sock\"\n            os.environ[\"LUTE_SOCKET\"] = socket_path\n            logger.debug(f\"SocketCommunicator defines socket_path: {socket_path}\")\n        if USE_ZMQ:\n            return f\"ipc://{socket_path}\"\n        else:\n            return socket_path\n\n    def _init_unix_socket_raw(self) -> socket.socket:\n        \"\"\"Returns a Unix socket object.\n\n        Executor-side code should always be run first. It checks to see if\n        the environment variable\n                                `LUTE_SOCKET=XYZ`\n        is defined, if so binds it, otherwise it will create a new path and\n        define the environment variable for the Task-side to find.\n\n        On the Task (client-side), this method will also open a SSH tunnel to\n        forward a local Unix socket to an Executor Unix socket if the Task and\n        Executor processes are on different machines.\n\n        Returns:\n            data_socket (socket.socket): Unix socket object.\n        \"\"\"\n        socket_path: str = self._get_socket_path()\n        data_socket = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)\n        if self._party == Party.EXECUTOR:\n            if os.path.exists(socket_path):\n                os.unlink(socket_path)\n            data_socket.bind(socket_path)\n            data_socket.listen()\n        elif self._party == Party.TASK:\n            hostname: str = socket.gethostname()\n            executor_hostname: Optional[str] = os.getenv(\"LUTE_EXECUTOR_HOST\")\n            if executor_hostname is None:\n                logger.info(\"Hostname for Executor process not found! 
Exiting!\")\n                data_socket.close()\n                sys.exit(-1)\n            if hostname == executor_hostname:\n                data_socket.connect(socket_path)\n            else:\n                self._local_socket_path = self._setup_unix_ssh_tunnel(\n                    socket_path, hostname, executor_hostname\n                )\n                while 1:\n                    # Keep trying reconnect until ssh tunnel works.\n                    try:\n                        data_socket.connect(self._local_socket_path)\n                        break\n                    except FileNotFoundError:\n                        continue\n\n        return data_socket\n\n    def _init_unix_socket_zmq(self, data_socket: zmq.sugar.socket.Socket) -> None:\n        \"\"\"Initialize a Unix socket object, using ZMQ.\n\n        Equivalent as the method above but requires passing in a ZMQ socket\n        object instead of returning one.\n\n        Args:\n            data_socket (socket.socket): ZMQ object.\n        \"\"\"\n        socket_path = self._get_socket_path()\n        if self._party == Party.EXECUTOR:\n            if os.path.exists(socket_path):\n                os.unlink(socket_path)\n            data_socket.bind(socket_path)\n        elif self._party == Party.TASK:\n            hostname: str = socket.gethostname()\n            executor_hostname: Optional[str] = os.getenv(\"LUTE_EXECUTOR_HOST\")\n            if executor_hostname is None:\n                logger.info(\"Hostname for Executor process not found! Exiting!\")\n                self._data_socket.close()\n                sys.exit(-1)\n            if hostname == executor_hostname:\n                data_socket.connect(socket_path)\n            else:\n                # Need to remove ipc:// from socket_path for forwarding\n                self._local_socket_path = self._setup_unix_ssh_tunnel(\n                    socket_path[6:], hostname, executor_hostname\n                )\n                # Need to add it back\n                path: str = f\"ipc://{self._local_socket_path}\"\n                data_socket.connect(path)\n\n    def _setup_unix_ssh_tunnel(\n        self, socket_path: str, hostname: str, executor_hostname: str\n    ) -> str:\n        \"\"\"Prepares an SSH tunnel for forwarding between Unix sockets on two hosts.\n\n        An SSH tunnel is opened with `ssh -L <local>:<remote> sleep 2`.\n        This method of communication is slightly slower and incurs additional\n        overhead - it should only be used as a backup. If communication across\n        multiple hosts is required consider using TCP.  The Task will use\n        the local socket `<LUTE_SOCKET>.task{##}`. Multiple local sockets may be\n        created. 
It is assumed that the user is identical on both the\n        Task machine and Executor machine.\n\n        Returns:\n            local_socket_path (str): The local Unix socket to connect to.\n        \"\"\"\n        if \"uuid\" not in globals():\n            import uuid\n        local_socket_path = f\"{socket_path}.task{uuid.uuid4().hex[:4]}\"\n        self._use_ssh_tunnel = True\n        ssh_cmd: List[str] = [\n            \"ssh\",\n            \"-o\",\n            \"LogLevel=quiet\",\n            \"-L\",\n            f\"{local_socket_path}:{socket_path}\",\n            executor_hostname,\n            \"sleep\",\n            \"2\",\n        ]\n        logger.debug(f\"Opening tunnel from {hostname} to {executor_hostname}\")\n        self._ssh_proc = subprocess.Popen(ssh_cmd)\n        time.sleep(0.4)  # Need to wait... -> Use single Task comm at beginning?\n        return local_socket_path\n\n    # Clean up and properties\n    ############################################################################\n\n    def _clean_up(self) -> None:\n        \"\"\"Clean up connections.\"\"\"\n        if self._party == Party.EXECUTOR:\n            self._stop_thread = True\n            self._reader_thread.join()\n            logger.debug(\"Closed reading thread.\")\n\n        self._data_socket.close()\n        if USE_ZMQ:\n            self._context.term()\n        else:\n            ...\n\n        if os.getenv(\"LUTE_USE_TCP\"):\n            return\n        else:\n            if self._party == Party.EXECUTOR:\n                os.unlink(os.getenv(\"LUTE_SOCKET\"))  # Should be defined\n                return\n            elif self._use_ssh_tunnel:\n                if self._ssh_proc is not None:\n                    self._ssh_proc.terminate()\n\n    @property\n    def has_messages(self) -> bool:\n        if self._party == Party.TASK:\n            # Shouldn't be called on Task-side\n            return False\n\n        if self._msg_queue.qsize() > 0:\n            return True\n        return False\n\n    def __exit__(self):\n        self._clean_up()\n
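The environment-variable logic described above can be summarized with a small, standalone sketch (not the library code; the port and socket path defaults here are placeholders): LUTE_USE_TCP selects the transport, and the Executor side publishes LUTE_EXECUTOR_HOST plus either LUTE_PORT or LUTE_SOCKET so the Task side can connect.

import os
import socket
import tempfile

# Executor side publishes its hostname for the Task side.
os.environ.setdefault("LUTE_EXECUTOR_HOST", socket.gethostname())

if os.getenv("LUTE_USE_TCP") is not None:
    # TCP: bind a port (here a placeholder) and advertise it via LUTE_PORT.
    port = os.environ.setdefault("LUTE_PORT", "41923")
    print(f"TCP transport: {os.environ['LUTE_EXECUTOR_HOST']}:{port}")
else:
    # Unix sockets: agree on a socket path via LUTE_SOCKET.
    sock_path = os.environ.setdefault(
        "LUTE_SOCKET", f"{tempfile.gettempdir()}/lute_demo.sock"
    )
    print(f"Unix socket transport: {sock_path}")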
"},{"location":"source/execution/ipc/#execution.ipc.SocketCommunicator.ACCEPT_TIMEOUT","title":"ACCEPT_TIMEOUT: float = 0.01 class-attribute instance-attribute","text":"

Maximum time to wait to accept connections. Used by Executor-side.

"},{"location":"source/execution/ipc/#execution.ipc.SocketCommunicator.MSG_HEAD","title":"MSG_HEAD: bytes = b'MSG' class-attribute instance-attribute","text":"

Start signal of a message. The end of a message is indicated by MSG_HEAD[::-1].

"},{"location":"source/execution/ipc/#execution.ipc.SocketCommunicator.MSG_SEP","title":"MSG_SEP: bytes = b';;;' class-attribute instance-attribute","text":"

Separator for parts of a message. Messages have a start, length, message and end.

"},{"location":"source/execution/ipc/#execution.ipc.SocketCommunicator.__init__","title":"__init__(party=Party.TASK, use_pickle=True)","text":"

IPC over a TCP or Unix socket.

Unlike with the PipeCommunicator, pickle is always used to send data through the socket.

Parameters:

Name Type Description Default party Party

Which object (side/process) the Communicator is managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.

TASK use_pickle bool

Whether to use pickle. Always True currently; passing False does not change behaviour.

True Source code in lute/execution/ipc.py
def __init__(self, party: Party = Party.TASK, use_pickle: bool = True) -> None:\n    \"\"\"IPC over a TCP or Unix socket.\n\n    Unlike with the PipeCommunicator, pickle is always used to send data\n    through the socket.\n\n    Args:\n        party (Party): Which object (side/process) the Communicator is\n            managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.\n\n        use_pickle (bool): Whether to use pickle. Always True currently,\n            passing False does not change behaviour.\n    \"\"\"\n    super().__init__(party=party, use_pickle=use_pickle)\n
"},{"location":"source/execution/ipc/#execution.ipc.SocketCommunicator.delayed_setup","title":"delayed_setup()","text":"

Delays the creation of socket objects.

The Executor initializes the Communicator when it is created. Since all Executors are created and available at once, we want to delay acquisition of socket resources until a single Executor is ready to use them.

Source code in lute/execution/ipc.py
def delayed_setup(self) -> None:\n    \"\"\"Delays the creation of socket objects.\n\n    The Executor initializes the Communicator when it is created. Since\n    all Executors are created and available at once we want to delay\n    acquisition of socket resources until a single Executor is ready\n    to use them.\n    \"\"\"\n    self._data_socket: Union[socket.socket, zmq.sugar.socket.Socket]\n    if USE_ZMQ:\n        self.desc: str = \"Communicates using ZMQ through TCP or Unix sockets.\"\n        self._context: zmq.context.Context = zmq.Context()\n        self._data_socket = self._create_socket_zmq()\n    else:\n        self.desc: str = \"Communicates through a TCP or Unix socket.\"\n        self._data_socket = self._create_socket_raw()\n        self._data_socket.settimeout(SocketCommunicator.ACCEPT_TIMEOUT)\n\n    if self._party == Party.EXECUTOR:\n        # Executor created first so we can define the hostname env variable\n        os.environ[\"LUTE_EXECUTOR_HOST\"] = socket.gethostname()\n        # Setup reader thread\n        self._reader_thread: threading.Thread = threading.Thread(\n            target=self._read_socket\n        )\n        self._msg_queue: queue.Queue = queue.Queue()\n        self._partial_msg: Optional[bytes] = None\n        self._stop_thread: bool = False\n        self._reader_thread.start()\n    else:\n        # Only used by Party.TASK\n        self._use_ssh_tunnel: bool = False\n        self._ssh_proc: Optional[subprocess.Popen] = None\n        self._local_socket_path: Optional[str] = None\n
"},{"location":"source/execution/ipc/#execution.ipc.SocketCommunicator.read","title":"read(proc)","text":"

Return a message from the queue if available.

Socket(s) are continuously monitored, and read from when new data is available.

Parameters:

Name Type Description Default proc Popen

The process to read from. Provided for compatibility with other Communicator subtypes; it is ignored.

required

Returns:

Name Type Description msg Message

The message read, containing contents and signal.

Source code in lute/execution/ipc.py
def read(self, proc: subprocess.Popen) -> Message:\n    \"\"\"Return a message from the queue if available.\n\n    Socket(s) are continuously monitored, and read from when new data is\n    available.\n\n    Args:\n        proc (subprocess.Popen): The process to read from. Provided for\n            compatibility with other Communicator subtypes. Is ignored.\n\n    Returns:\n         msg (Message): The message read, containing contents and signal.\n    \"\"\"\n    msg: Message\n    try:\n        msg = self._msg_queue.get(timeout=SocketCommunicator.ACCEPT_TIMEOUT)\n    except queue.Empty:\n        msg = Message()\n\n    return msg\n
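The non-blocking behaviour of read() comes from the reader-thread-plus-queue pattern; a minimal standalone sketch of that pattern (not the library code) is:

import queue
import threading
import time

msg_queue: "queue.Queue[str]" = queue.Queue()
stop_thread = False

def reader() -> None:
    # Stand-in for the socket-monitoring thread that enqueues decoded messages.
    count = 0
    while not stop_thread:
        msg_queue.put(f"message {count}")
        count += 1
        time.sleep(0.05)

threading.Thread(target=reader, daemon=True).start()

for _ in range(5):
    try:
        # Mirrors read(): a short timeout so the caller never blocks for long.
        print(msg_queue.get(timeout=0.01))
    except queue.Empty:
        print("no message yet")  # read() would return an empty Message here
    time.sleep(0.02)
stop_thread = True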
"},{"location":"source/execution/ipc/#execution.ipc.SocketCommunicator.write","title":"write(msg)","text":"

Send a single Message.

The entire Message (signal and contents) is serialized and sent through a connection over a Unix or TCP socket.

Parameters:

Name Type Description Default msg Message

The Message to send.

required Source code in lute/execution/ipc.py
def write(self, msg: Message) -> None:\n    \"\"\"Send a single Message.\n\n    The entire Message (signal and contents) is serialized and sent through\n    a connection over Unix socket.\n\n    Args:\n        msg (Message): The Message to send.\n    \"\"\"\n    self._write_socket(msg)\n
"},{"location":"source/io/_sqlite/","title":"_sqlite","text":"

Backend SQLite database utilities.

Functions should be used only by the higher-level database module.

"},{"location":"source/io/config/","title":"config","text":"

Machinery for the IO of configuration YAML files and their validation.

Functions:

Name Description parse_config

parse_config(..., config_path: str) -> TaskParameters: Parse a configuration file and return a TaskParameters object of validated parameters for a specific Task. Raises an exception if the provided configuration does not match the expected model.

Raises:

Type Description ValidationError

Error raised by pydantic during data validation. (From Pydantic)

"},{"location":"source/io/config/#io.config.AnalysisHeader","title":"AnalysisHeader","text":"

Bases: BaseModel

Header information for LUTE analysis runs.

Source code in lute/io/models/base.py
class AnalysisHeader(BaseModel):\n    \"\"\"Header information for LUTE analysis runs.\"\"\"\n\n    title: str = Field(\n        \"LUTE Task Configuration\",\n        description=\"Description of the configuration or experiment.\",\n    )\n    experiment: str = Field(\"\", description=\"Experiment.\")\n    run: Union[str, int] = Field(\"\", description=\"Data acquisition run.\")\n    date: str = Field(\"1970/01/01\", description=\"Start date of analysis.\")\n    lute_version: Union[float, str] = Field(\n        0.1, description=\"Version of LUTE used for analysis.\"\n    )\n    task_timeout: PositiveInt = Field(\n        600,\n        description=(\n            \"Time in seconds until a task times out. Should be slightly shorter\"\n            \" than job timeout if using a job manager (e.g. SLURM).\"\n        ),\n    )\n    work_dir: str = Field(\"\", description=\"Main working directory for LUTE.\")\n\n    @validator(\"work_dir\", always=True)\n    def validate_work_dir(cls, directory: str, values: Dict[str, Any]) -> str:\n        work_dir: str\n        if directory == \"\":\n            std_work_dir = (\n                f\"/sdf/data/lcls/ds/{values['experiment'][:3]}/\"\n                f\"{values['experiment']}/scratch\"\n            )\n            work_dir = std_work_dir\n        else:\n            work_dir = directory\n        # Check existence and permissions\n        if not os.path.exists(work_dir):\n            raise ValueError(f\"Working Directory: {work_dir} does not exist!\")\n        if not os.access(work_dir, os.W_OK):\n            # Need write access for database, files etc.\n            raise ValueError(f\"Not write access for working directory: {work_dir}!\")\n        return work_dir\n\n    @validator(\"run\", always=True)\n    def validate_run(\n        cls, run: Union[str, int], values: Dict[str, Any]\n    ) -> Union[str, int]:\n        if run == \"\":\n            # From Airflow RUN_NUM should have Format \"RUN_DATETIME\" - Num is first part\n            run_time: str = os.environ.get(\"RUN_NUM\", \"\")\n            if run_time != \"\":\n                return int(run_time.split(\"_\")[0])\n        return run\n\n    @validator(\"experiment\", always=True)\n    def validate_experiment(cls, experiment: str, values: Dict[str, Any]) -> str:\n        if experiment == \"\":\n            arp_exp: str = os.environ.get(\"EXPERIMENT\", \"EXPX00000\")\n            return arp_exp\n        return experiment\n
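A minimal sketch constructing this header directly in Python (normally it is populated from the YAML configuration). The experiment name and run number below are placeholders, and a temporary directory is used because work_dir must exist and be writable.

import tempfile

from lute.io.models.base import AnalysisHeader

header = AnalysisHeader(
    experiment="EXPL10000",          # placeholder experiment name
    run=1,                           # placeholder run number
    date="1970/01/01",
    task_timeout=600,
    work_dir=tempfile.gettempdir(),  # must exist and be writable
)
print(header.title)  # "LUTE Task Configuration" (the default)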
"},{"location":"source/io/config/#io.config.CompareHKLParameters","title":"CompareHKLParameters","text":"

Bases: ThirdPartyParameters

Parameters for CrystFEL's compare_hkl for calculating figures of merit.

There are many parameters, and many combinations. For more information on usage, please refer to the CrystFEL documentation, here: https://www.desy.de/~twhite/crystfel/manual-partialator.html

Source code in lute/io/models/sfx_merge.py
class CompareHKLParameters(ThirdPartyParameters):\n    \"\"\"Parameters for CrystFEL's `compare_hkl` for calculating figures of merit.\n\n    There are many parameters, and many combinations. For more information on\n    usage, please refer to the CrystFEL documentation, here:\n    https://www.desy.de/~twhite/crystfel/manual-partialator.html\n    \"\"\"\n\n    class Config(ThirdPartyParameters.Config):\n        long_flags_use_eq: bool = True\n        \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n\n        set_result: bool = True\n        \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n    executable: str = Field(\n        \"/sdf/group/lcls/ds/tools/crystfel/0.10.2/bin/compare_hkl\",\n        description=\"CrystFEL's reflection comparison binary.\",\n        flag_type=\"\",\n    )\n    in_files: Optional[str] = Field(\n        \"\",\n        description=\"Path to input HKLs. Space-separated list of 2. Use output of partialator e.g.\",\n        flag_type=\"\",\n    )\n    ## Need mechanism to set is_result=True ...\n    symmetry: str = Field(\"\", description=\"Point group symmetry.\", flag_type=\"--\")\n    cell_file: str = Field(\n        \"\",\n        description=\"Path to a file containing unit cell information (PDB or CrystFEL format).\",\n        flag_type=\"-\",\n        rename_param=\"p\",\n    )\n    fom: str = Field(\n        \"Rsplit\", description=\"Specify figure of merit to calculate.\", flag_type=\"--\"\n    )\n    nshells: int = Field(10, description=\"Use n resolution shells.\", flag_type=\"--\")\n    # NEED A NEW CASE FOR THIS -> Boolean flag, no arg, one hyphen...\n    # fix_unity: bool = Field(\n    #    False,\n    #    description=\"Fix scale factors to unity.\",\n    #    flag_type=\"-\",\n    #    rename_param=\"u\",\n    # )\n    shell_file: str = Field(\n        \"\",\n        description=\"Write the statistics in resolution shells to a file.\",\n        flag_type=\"--\",\n        rename_param=\"shell-file\",\n        is_result=True,\n    )\n    ignore_negs: bool = Field(\n        False,\n        description=\"Ignore reflections with negative reflections.\",\n        flag_type=\"--\",\n        rename_param=\"ignore-negs\",\n    )\n    zero_negs: bool = Field(\n        False,\n        description=\"Set negative intensities to 0.\",\n        flag_type=\"--\",\n        rename_param=\"zero-negs\",\n    )\n    sigma_cutoff: Optional[Union[float, int, str]] = Field(\n        # \"-infinity\",\n        description=\"Discard reflections with I/sigma(I) < n. -infinity means no cutoff.\",\n        flag_type=\"--\",\n        rename_param=\"sigma-cutoff\",\n    )\n    rmin: Optional[float] = Field(\n        description=\"Low resolution cutoff of 1/d (m-1). Use this or --lowres NOT both.\",\n        flag_type=\"--\",\n    )\n    lowres: Optional[float] = Field(\n        descirption=\"Low resolution cutoff in Angstroms. Use this or --rmin NOT both.\",\n        flag_type=\"--\",\n    )\n    rmax: Optional[float] = Field(\n        description=\"High resolution cutoff in 1/d (m-1). Use this or --highres NOT both.\",\n        flag_type=\"--\",\n    )\n    highres: Optional[float] = Field(\n        description=\"High resolution cutoff in Angstroms. 
Use this or --rmax NOT both.\",\n        flag_type=\"--\",\n    )\n\n    @validator(\"in_files\", always=True)\n    def validate_in_files(cls, in_files: str, values: Dict[str, Any]) -> str:\n        if in_files == \"\":\n            partialator_file: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\", \"MergePartialator\", \"out_file\"\n            )\n            if partialator_file:\n                hkls: str = f\"{partialator_file}1 {partialator_file}2\"\n                return hkls\n        return in_files\n\n    @validator(\"cell_file\", always=True)\n    def validate_cell_file(cls, cell_file: str, values: Dict[str, Any]) -> str:\n        if cell_file == \"\":\n            idx_cell_file: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\",\n                \"IndexCrystFEL\",\n                \"cell_file\",\n                valid_only=False,\n            )\n            if idx_cell_file:\n                return idx_cell_file\n        return cell_file\n\n    @validator(\"symmetry\", always=True)\n    def validate_symmetry(cls, symmetry: str, values: Dict[str, Any]) -> str:\n        if symmetry == \"\":\n            partialator_sym: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\", \"MergePartialator\", \"symmetry\"\n            )\n            if partialator_sym:\n                return partialator_sym\n        return symmetry\n\n    @validator(\"shell_file\", always=True)\n    def validate_shell_file(cls, shell_file: str, values: Dict[str, Any]) -> str:\n        if shell_file == \"\":\n            partialator_file: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\", \"MergePartialator\", \"out_file\"\n            )\n            if partialator_file:\n                shells_out: str = partialator_file.split(\".\")[0]\n                shells_out = f\"{shells_out}_{values['fom']}_n{values['nshells']}.dat\"\n                return shells_out\n        return shell_file\n
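As an illustration (not library code) of how these fields map onto a command line: long flags are formatted as --flag=value because long_flags_use_eq is True, and cell_file is renamed to the short flag -p. The input and output paths below are placeholders.

# Rough shape of the compare_hkl invocation built from the fields above.
executable = "/sdf/group/lcls/ds/tools/crystfel/0.10.2/bin/compare_hkl"
cmd = (
    f"{executable} merged.hkl1 merged.hkl2 "   # in_files (placeholder pair)
    f"--symmetry=4/mmm "                       # symmetry (placeholder point group)
    f"-p cell.cell "                           # cell_file renamed to -p
    f"--fom=Rsplit --nshells=10 "              # figure of merit and shells
    f"--shell-file=shells_Rsplit_n10.dat"      # shell_file (is_result=True)
)
print(cmd)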
"},{"location":"source/io/config/#io.config.CompareHKLParameters.Config","title":"Config","text":"

Bases: Config

Source code in lute/io/models/sfx_merge.py
class Config(ThirdPartyParameters.Config):\n    long_flags_use_eq: bool = True\n    \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n\n    set_result: bool = True\n    \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n
"},{"location":"source/io/config/#io.config.CompareHKLParameters.Config.long_flags_use_eq","title":"long_flags_use_eq: bool = True class-attribute instance-attribute","text":"

Whether long command-line arguments are passed like --long=arg.

"},{"location":"source/io/config/#io.config.CompareHKLParameters.Config.set_result","title":"set_result: bool = True class-attribute instance-attribute","text":"

Whether the Executor should mark a specified parameter as a result.

"},{"location":"source/io/config/#io.config.ConcatenateStreamFilesParameters","title":"ConcatenateStreamFilesParameters","text":"

Bases: TaskParameters

Parameters for stream concatenation.

Concatenates the stream file output from CrystFEL indexing for multiple experimental runs.

Source code in lute/io/models/sfx_index.py
class ConcatenateStreamFilesParameters(TaskParameters):\n    \"\"\"Parameters for stream concatenation.\n\n    Concatenates the stream file output from CrystFEL indexing for multiple\n    experimental runs.\n    \"\"\"\n\n    class Config(TaskParameters.Config):\n        set_result: bool = True\n        \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n    in_file: str = Field(\n        \"\",\n        description=\"Root of directory tree storing stream files to merge.\",\n    )\n\n    tag: Optional[str] = Field(\n        \"\",\n        description=\"Tag identifying the stream files to merge.\",\n    )\n\n    out_file: str = Field(\n        \"\", description=\"Path to merged output stream file.\", is_result=True\n    )\n\n    @validator(\"in_file\", always=True)\n    def validate_in_file(cls, in_file: str, values: Dict[str, Any]) -> str:\n        if in_file == \"\":\n            stream_file: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\", \"IndexCrystFEL\", \"out_file\"\n            )\n            if stream_file:\n                stream_dir: str = str(Path(stream_file).parent)\n                return stream_dir\n        return in_file\n\n    @validator(\"tag\", always=True)\n    def validate_tag(cls, tag: str, values: Dict[str, Any]) -> str:\n        if tag == \"\":\n            stream_file: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\", \"IndexCrystFEL\", \"out_file\"\n            )\n            if stream_file:\n                stream_tag: str = Path(stream_file).name.split(\"_\")[0]\n                return stream_tag\n        return tag\n\n    @validator(\"out_file\", always=True)\n    def validate_out_file(cls, tag: str, values: Dict[str, Any]) -> str:\n        if tag == \"\":\n            stream_out_file: str = str(\n                Path(values[\"in_file\"]).parent / f\"{values['tag'].stream}\"\n            )\n            return stream_out_file\n        return tag\n
"},{"location":"source/io/config/#io.config.ConcatenateStreamFilesParameters.Config","title":"Config","text":"

Bases: Config

Source code in lute/io/models/sfx_index.py
class Config(TaskParameters.Config):\n    set_result: bool = True\n    \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n
"},{"location":"source/io/config/#io.config.ConcatenateStreamFilesParameters.Config.set_result","title":"set_result: bool = True class-attribute instance-attribute","text":"

Whether the Executor should mark a specified parameter as a result.

"},{"location":"source/io/config/#io.config.DimpleSolveParameters","title":"DimpleSolveParameters","text":"

Bases: ThirdPartyParameters

Parameters for CCP4's dimple program.

There are many parameters. For more information on usage, please refer to the CCP4 documentation, here: https://ccp4.github.io/dimple/

Source code in lute/io/models/sfx_solve.py
class DimpleSolveParameters(ThirdPartyParameters):\n    \"\"\"Parameters for CCP4's dimple program.\n\n    There are many parameters. For more information on\n    usage, please refer to the CCP4 documentation, here:\n    https://ccp4.github.io/dimple/\n    \"\"\"\n\n    executable: str = Field(\n        \"/sdf/group/lcls/ds/tools/ccp4-8.0/bin/dimple\",\n        description=\"CCP4 Dimple for solving structures with MR.\",\n        flag_type=\"\",\n    )\n    # Positional requirements - all required.\n    in_file: str = Field(\n        \"\",\n        description=\"Path to input mtz.\",\n        flag_type=\"\",\n    )\n    pdb: str = Field(\"\", description=\"Path to a PDB.\", flag_type=\"\")\n    out_dir: str = Field(\"\", description=\"Output DIRECTORY.\", flag_type=\"\")\n    # Most used options\n    mr_thresh: PositiveFloat = Field(\n        0.4,\n        description=\"Threshold for molecular replacement.\",\n        flag_type=\"--\",\n        rename_param=\"mr-when-r\",\n    )\n    slow: Optional[bool] = Field(\n        False, description=\"Perform more refinement.\", flag_type=\"--\"\n    )\n    # Other options (IO)\n    hklout: str = Field(\n        \"final.mtz\", description=\"Output mtz file name.\", flag_type=\"--\"\n    )\n    xyzout: str = Field(\n        \"final.pdb\", description=\"Output PDB file name.\", flag_type=\"--\"\n    )\n    icolumn: Optional[str] = Field(\n        # \"IMEAN\",\n        description=\"Name for the I column.\",\n        flag_type=\"--\",\n    )\n    sigicolumn: Optional[str] = Field(\n        # \"SIG<ICOL>\",\n        description=\"Name for the Sig<I> column.\",\n        flag_type=\"--\",\n    )\n    fcolumn: Optional[str] = Field(\n        # \"F\",\n        description=\"Name for the F column.\",\n        flag_type=\"--\",\n    )\n    sigfcolumn: Optional[str] = Field(\n        # \"F\",\n        description=\"Name for the Sig<F> column.\",\n        flag_type=\"--\",\n    )\n    libin: Optional[str] = Field(\n        description=\"Ligand descriptions for refmac (LIBIN).\", flag_type=\"--\"\n    )\n    refmac_key: Optional[str] = Field(\n        description=\"Extra Refmac keywords to use in refinement.\",\n        flag_type=\"--\",\n        rename_param=\"refmac-key\",\n    )\n    free_r_flags: Optional[str] = Field(\n        description=\"Path to a mtz file with freeR flags.\",\n        flag_type=\"--\",\n        rename_param=\"free-r-flags\",\n    )\n    freecolumn: Optional[Union[int, float]] = Field(\n        # 0,\n        description=\"Refree column with an optional value.\",\n        flag_type=\"--\",\n    )\n    img_format: Optional[str] = Field(\n        description=\"Format of generated images. 
(png, jpeg, none).\",\n        flag_type=\"-\",\n        rename_param=\"f\",\n    )\n    white_bg: bool = Field(\n        False,\n        description=\"Use a white background in Coot and in images.\",\n        flag_type=\"--\",\n        rename_param=\"white-bg\",\n    )\n    no_cleanup: bool = Field(\n        False,\n        description=\"Retain intermediate files.\",\n        flag_type=\"--\",\n        rename_param=\"no-cleanup\",\n    )\n    # Calculations\n    no_blob_search: bool = Field(\n        False,\n        description=\"Do not search for unmodelled blobs.\",\n        flag_type=\"--\",\n        rename_param=\"no-blob-search\",\n    )\n    anode: bool = Field(\n        False, description=\"Use SHELX/AnoDe to find peaks in the anomalous map.\"\n    )\n    # Run customization\n    no_hetatm: bool = Field(\n        False,\n        description=\"Remove heteroatoms from the given model.\",\n        flag_type=\"--\",\n        rename_param=\"no-hetatm\",\n    )\n    rigid_cycles: Optional[PositiveInt] = Field(\n        # 10,\n        description=\"Number of cycles of rigid-body refinement to perform.\",\n        flag_type=\"--\",\n        rename_param=\"rigid-cycles\",\n    )\n    jelly: Optional[PositiveInt] = Field(\n        # 4,\n        description=\"Number of cycles of jelly-body refinement to perform.\",\n        flag_type=\"--\",\n    )\n    restr_cycles: Optional[PositiveInt] = Field(\n        # 8,\n        description=\"Number of cycles of refmac final refinement to perform.\",\n        flag_type=\"--\",\n        rename_param=\"restr-cycles\",\n    )\n    lim_resolution: Optional[PositiveFloat] = Field(\n        description=\"Limit the final resolution.\", flag_type=\"--\", rename_param=\"reso\"\n    )\n    weight: Optional[str] = Field(\n        # \"auto-weight\",\n        description=\"The refmac matrix weight.\",\n        flag_type=\"--\",\n    )\n    mr_prog: Optional[str] = Field(\n        # \"phaser\",\n        description=\"Molecular replacement program. phaser or molrep.\",\n        flag_type=\"--\",\n        rename_param=\"mr-prog\",\n    )\n    mr_num: Optional[Union[str, int]] = Field(\n        # \"auto\",\n        description=\"Number of molecules to use for molecular replacement.\",\n        flag_type=\"--\",\n        rename_param=\"mr-num\",\n    )\n    mr_reso: Optional[PositiveFloat] = Field(\n        # 3.25,\n        description=\"High resolution for molecular replacement. If >10 interpreted as eLLG.\",\n        flag_type=\"--\",\n        rename_param=\"mr-reso\",\n    )\n    itof_prog: Optional[str] = Field(\n        description=\"Program to calculate amplitudes. 
truncate, or ctruncate.\",\n        flag_type=\"--\",\n        rename_param=\"ItoF-prog\",\n    )\n\n    @validator(\"in_file\", always=True)\n    def validate_in_file(cls, in_file: str, values: Dict[str, Any]) -> str:\n        if in_file == \"\":\n            get_hkl_file: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\", \"ManipulateHKL\", \"out_file\"\n            )\n            if get_hkl_file:\n                return get_hkl_file\n        return in_file\n\n    @validator(\"out_dir\", always=True)\n    def validate_out_dir(cls, out_dir: str, values: Dict[str, Any]) -> str:\n        if out_dir == \"\":\n            get_hkl_file: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\", \"ManipulateHKL\", \"out_file\"\n            )\n            if get_hkl_file:\n                return os.path.dirname(get_hkl_file)\n        return out_dir\n
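
As an illustrative sketch only, a corresponding section of the configuration YAML might look like the following. The section key (assumed here to be DimpleSolve) and all paths and values are placeholders; per the validators above, in_file and out_dir may be left empty to reuse the latest ManipulateHKL output recorded in the database.

DimpleSolve:                          # assumed managed Task key; placeholder values throughout\n  in_file: /path/to/reflections.mtz   # omit to reuse the latest ManipulateHKL out_file\n  pdb: /path/to/search_model.pdb      # search model for molecular replacement\n  out_dir: /path/to/dimple_output     # omit to default to the ManipulateHKL output directory\n  slow: true                          # request extra refinement (--slow)\n  mr_thresh: 0.4                      # passed as --mr-when-r\n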
"},{"location":"source/io/config/#io.config.FindOverlapXSSParameters","title":"FindOverlapXSSParameters","text":"

Bases: TaskParameters

TaskParameters model for the FindOverlapXSS Task.

This Task determines spatial or temporal overlap between an optical pulse and the FEL pulse based on difference scattering (XSS) signal. This Task uses SmallData HDF5 files as a source.

Source code in lute/io/models/smd.py
class FindOverlapXSSParameters(TaskParameters):\n    \"\"\"TaskParameter model for FindOverlapXSS Task.\n\n    This Task determines spatial or temporal overlap between an optical pulse\n    and the FEL pulse based on difference scattering (XSS) signal. This Task\n    uses SmallData HDF5 files as a source.\n    \"\"\"\n\n    class ExpConfig(BaseModel):\n        det_name: str\n        ipm_var: str\n        scan_var: Union[str, List[str]]\n\n    class Thresholds(BaseModel):\n        min_Iscat: Union[int, float]\n        min_ipm: Union[int, float]\n\n    class AnalysisFlags(BaseModel):\n        use_pyfai: bool = True\n        use_asymls: bool = False\n\n    exp_config: ExpConfig\n    thresholds: Thresholds\n    analysis_flags: AnalysisFlags\n
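
A hedged sketch of a FindOverlapXSS section in the configuration YAML; the detector, PV and threshold values below are placeholders and must be adapted to the experiment.

FindOverlapXSS:\n  exp_config:\n    det_name: epix_1       # placeholder detector name\n    ipm_var: ipm2/sum      # placeholder intensity-monitor variable\n    scan_var: lxt          # placeholder scan variable (str or list of str)\n  thresholds:\n    min_Iscat: 10          # placeholder minimum scattering intensity\n    min_ipm: 500           # placeholder minimum IPM value\n  analysis_flags:\n    use_pyfai: true\n    use_asymls: false\n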
"},{"location":"source/io/config/#io.config.FindPeaksPsocakeParameters","title":"FindPeaksPsocakeParameters","text":"

Bases: ThirdPartyParameters

Parameters for crystallographic (Bragg) peak finding using Psocake.

This peak-finding Task can optionally compress/decompress data with SZ for compression validation. NOTE: This Task is deprecated and provided for compatibility only.

Source code in lute/io/models/sfx_find_peaks.py
class FindPeaksPsocakeParameters(ThirdPartyParameters):\n    \"\"\"Parameters for crystallographic (Bragg) peak finding using Psocake.\n\n    This peak finding Task optionally has the ability to compress/decompress\n    data with SZ for the purpose of compression validation.\n    NOTE: This Task is deprecated and provided for compatibility only.\n    \"\"\"\n\n    class Config(TaskParameters.Config):\n        set_result: bool = True\n        \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n        result_from_params: str = \"\"\n        \"\"\"Defines a result from the parameters. Use a validator to do so.\"\"\"\n\n    class SZParameters(BaseModel):\n        compressor: Literal[\"qoz\", \"sz3\"] = Field(\n            \"qoz\", description=\"SZ compression algorithm (qoz, sz3)\"\n        )\n        binSize: int = Field(2, description=\"SZ compression's bin size paramater\")\n        roiWindowSize: int = Field(\n            2, description=\"SZ compression's ROI window size paramater\"\n        )\n        absError: float = Field(10, descriptionp=\"Maximum absolute error value\")\n\n    executable: str = Field(\"mpirun\", description=\"MPI executable.\", flag_type=\"\")\n    np: PositiveInt = Field(\n        max(int(os.environ.get(\"SLURM_NPROCS\", len(os.sched_getaffinity(0)))) - 1, 1),\n        description=\"Number of processes\",\n        flag_type=\"-\",\n    )\n    mca: str = Field(\n        \"btl ^openib\", description=\"Mca option for the MPI executable\", flag_type=\"--\"\n    )\n    p_arg1: str = Field(\n        \"python\", description=\"Executable to run with mpi (i.e. python).\", flag_type=\"\"\n    )\n    u: str = Field(\n        \"\", description=\"Python option for unbuffered output.\", flag_type=\"-\"\n    )\n    p_arg2: str = Field(\n        \"findPeaksSZ.py\",\n        description=\"Executable to run with mpi (i.e. 
python).\",\n        flag_type=\"\",\n    )\n    d: str = Field(description=\"Detector name\", flag_type=\"-\")\n    e: str = Field(\"\", description=\"Experiment name\", flag_type=\"-\")\n    r: int = Field(-1, description=\"Run number\", flag_type=\"-\")\n    outDir: str = Field(\n        description=\"Output directory where .cxi will be saved\", flag_type=\"--\"\n    )\n    algorithm: int = Field(1, description=\"PyAlgos algorithm to use\", flag_type=\"--\")\n    alg_npix_min: float = Field(\n        1.0, description=\"PyAlgos algorithm's npix_min parameter\", flag_type=\"--\"\n    )\n    alg_npix_max: float = Field(\n        45.0, description=\"PyAlgos algorithm's npix_max parameter\", flag_type=\"--\"\n    )\n    alg_amax_thr: float = Field(\n        250.0, description=\"PyAlgos algorithm's amax_thr parameter\", flag_type=\"--\"\n    )\n    alg_atot_thr: float = Field(\n        330.0, description=\"PyAlgos algorithm's atot_thr parameter\", flag_type=\"--\"\n    )\n    alg_son_min: float = Field(\n        10.0, description=\"PyAlgos algorithm's son_min parameter\", flag_type=\"--\"\n    )\n    alg1_thr_low: float = Field(\n        80.0, description=\"PyAlgos algorithm's thr_low parameter\", flag_type=\"--\"\n    )\n    alg1_thr_high: float = Field(\n        270.0, description=\"PyAlgos algorithm's thr_high parameter\", flag_type=\"--\"\n    )\n    alg1_rank: int = Field(\n        3, description=\"PyAlgos algorithm's rank parameter\", flag_type=\"--\"\n    )\n    alg1_radius: int = Field(\n        3, description=\"PyAlgos algorithm's radius parameter\", flag_type=\"--\"\n    )\n    alg1_dr: int = Field(\n        1, description=\"PyAlgos algorithm's dr parameter\", flag_type=\"--\"\n    )\n    psanaMask_on: str = Field(\n        \"True\", description=\"Whether psana's mask should be used\", flag_type=\"--\"\n    )\n    psanaMask_calib: str = Field(\n        \"True\", description=\"Psana mask's calib parameter\", flag_type=\"--\"\n    )\n    psanaMask_status: str = Field(\n        \"True\", description=\"Psana mask's status parameter\", flag_type=\"--\"\n    )\n    psanaMask_edges: str = Field(\n        \"True\", description=\"Psana mask's edges parameter\", flag_type=\"--\"\n    )\n    psanaMask_central: str = Field(\n        \"True\", description=\"Psana mask's central parameter\", flag_type=\"--\"\n    )\n    psanaMask_unbond: str = Field(\n        \"True\", description=\"Psana mask's unbond parameter\", flag_type=\"--\"\n    )\n    psanaMask_unbondnrs: str = Field(\n        \"True\", description=\"Psana mask's unbondnbrs parameter\", flag_type=\"--\"\n    )\n    mask: str = Field(\n        \"\", description=\"Path to an additional mask to apply\", flag_type=\"--\"\n    )\n    clen: str = Field(\n        description=\"Epics variable storing the camera length\", flag_type=\"--\"\n    )\n    coffset: float = Field(0, description=\"Camera offset in m\", flag_type=\"--\")\n    minPeaks: int = Field(\n        15,\n        description=\"Minimum number of peaks to mark frame for indexing\",\n        flag_type=\"--\",\n    )\n    maxPeaks: int = Field(\n        15,\n        description=\"Maximum number of peaks to mark frame for indexing\",\n        flag_type=\"--\",\n    )\n    minRes: int = Field(\n        0,\n        description=\"Minimum peak resolution to mark frame for indexing \",\n        flag_type=\"--\",\n    )\n    sample: str = Field(\"\", description=\"Sample name\", flag_type=\"--\")\n    instrument: Union[None, str] = Field(\n        None, description=\"Instrument name\", 
flag_type=\"--\"\n    )\n    pixelSize: float = Field(0.0, description=\"Pixel size\", flag_type=\"--\")\n    auto: str = Field(\n        \"False\",\n        description=(\n            \"Whether to automatically determine peak per event peak \"\n            \"finding parameters\"\n        ),\n        flag_type=\"--\",\n    )\n    detectorDistance: float = Field(\n        0.0, description=\"Detector distance from interaction point in m\", flag_type=\"--\"\n    )\n    access: Literal[\"ana\", \"ffb\"] = Field(\n        \"ana\", description=\"Data node type: {ana,ffb}\", flag_type=\"--\"\n    )\n    szfile: str = Field(\"qoz.json\", description=\"Path to SZ's JSON configuration file\")\n    lute_template_cfg: TemplateConfig = Field(\n        TemplateConfig(\n            template_name=\"sz.json\",\n            output_path=\"\",  # Will want to change where this goes...\n        ),\n        description=\"Template information for the sz.json file\",\n    )\n    sz_parameters: SZParameters = Field(\n        description=\"Configuration parameters for SZ Compression\", flag_type=\"\"\n    )\n\n    @validator(\"e\", always=True)\n    def validate_e(cls, e: str, values: Dict[str, Any]) -> str:\n        if e == \"\":\n            return values[\"lute_config\"].experiment\n        return e\n\n    @validator(\"r\", always=True)\n    def validate_r(cls, r: int, values: Dict[str, Any]) -> int:\n        if r == -1:\n            return values[\"lute_config\"].run\n        return r\n\n    @validator(\"lute_template_cfg\", always=True)\n    def set_output_path(\n        cls, lute_template_cfg: TemplateConfig, values: Dict[str, Any]\n    ) -> TemplateConfig:\n        if lute_template_cfg.output_path == \"\":\n            lute_template_cfg.output_path = values[\"szfile\"]\n        return lute_template_cfg\n\n    @validator(\"sz_parameters\", always=True)\n    def set_sz_compression_parameters(\n        cls, sz_parameters: SZParameters, values: Dict[str, Any]\n    ) -> None:\n        values[\"compressor\"] = sz_parameters.compressor\n        values[\"binSize\"] = sz_parameters.binSize\n        values[\"roiWindowSize\"] = sz_parameters.roiWindowSize\n        if sz_parameters.compressor == \"qoz\":\n            values[\"pressio_opts\"] = {\n                \"pressio:abs\": sz_parameters.absError,\n                \"qoz\": {\"qoz:stride\": 8},\n            }\n        else:\n            values[\"pressio_opts\"] = {\"pressio:abs\": sz_parameters.absError}\n        return None\n\n    @root_validator(pre=False)\n    def define_result(cls, values: Dict[str, Any]) -> Dict[str, Any]:\n        exp: str = values[\"lute_config\"].experiment\n        run: int = int(values[\"lute_config\"].run)\n        directory: str = values[\"outDir\"]\n        fname: str = f\"{exp}_{run:04d}.lst\"\n\n        cls.Config.result_from_params = f\"{directory}/{fname}\"\n        return values\n
"},{"location":"source/io/config/#io.config.FindPeaksPsocakeParameters.Config","title":"Config","text":"

Bases: Config

Source code in lute/io/models/sfx_find_peaks.py
class Config(TaskParameters.Config):\n    set_result: bool = True\n    \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n    result_from_params: str = \"\"\n    \"\"\"Defines a result from the parameters. Use a validator to do so.\"\"\"\n
"},{"location":"source/io/config/#io.config.FindPeaksPsocakeParameters.Config.result_from_params","title":"result_from_params: str = '' class-attribute instance-attribute","text":"

Defines a result from the parameters. Use a validator to do so.

"},{"location":"source/io/config/#io.config.FindPeaksPsocakeParameters.Config.set_result","title":"set_result: bool = True class-attribute instance-attribute","text":"

Whether the Executor should mark a specified parameter as a result.

"},{"location":"source/io/config/#io.config.FindPeaksPyAlgosParameters","title":"FindPeaksPyAlgosParameters","text":"

Bases: TaskParameters

Parameters for crystallographic (Bragg) peak finding using PyAlgos.

This peak-finding Task can optionally compress/decompress data with SZ for compression validation.

Source code in lute/io/models/sfx_find_peaks.py
class FindPeaksPyAlgosParameters(TaskParameters):\n    \"\"\"Parameters for crystallographic (Bragg) peak finding using PyAlgos.\n\n    This peak finding Task optionally has the ability to compress/decompress\n    data with SZ for the purpose of compression validation.\n    \"\"\"\n\n    class Config(TaskParameters.Config):\n        set_result: bool = True\n        \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n    class SZCompressorParameters(BaseModel):\n        compressor: Literal[\"qoz\", \"sz3\"] = Field(\n            \"qoz\", description='Compression algorithm (\"qoz\" or \"sz3\")'\n        )\n        abs_error: float = Field(10.0, description=\"Absolute error bound\")\n        bin_size: int = Field(2, description=\"Bin size\")\n        roi_window_size: int = Field(\n            9,\n            description=\"Default window size\",\n        )\n\n    outdir: str = Field(\n        description=\"Output directory for cxi files\",\n    )\n    n_events: int = Field(\n        0,\n        description=\"Number of events to process (0 to process all events)\",\n    )\n    det_name: str = Field(\n        description=\"Psana name of the detector storing the image data\",\n    )\n    event_receiver: Literal[\"evr0\", \"evr1\"] = Field(\n        description=\"Event Receiver to be used: evr0 or evr1\",\n    )\n    tag: str = Field(\n        \"\",\n        description=\"Tag to add to the output file names\",\n    )\n    pv_camera_length: Union[str, float] = Field(\n        \"\",\n        description=\"PV associated with camera length \"\n        \"(if a number, camera length directly)\",\n    )\n    event_logic: bool = Field(\n        False,\n        description=\"True if only events with a specific event code should be \"\n        \"processed. False if the event code should be ignored\",\n    )\n    event_code: int = Field(\n        0,\n        description=\"Required events code for events to be processed if event logic \"\n        \"is True\",\n    )\n    psana_mask: bool = Field(\n        False,\n        description=\"If True, apply mask from psana Detector object\",\n    )\n    mask_file: Union[str, None] = Field(\n        None,\n        description=\"File with a custom mask to apply. 
If None, no custom mask is \"\n        \"applied\",\n    )\n    min_peaks: int = Field(2, description=\"Minimum number of peaks per image\")\n    max_peaks: int = Field(\n        2048,\n        description=\"Maximum number of peaks per image\",\n    )\n    npix_min: int = Field(\n        2,\n        description=\"Minimum number of pixels per peak\",\n    )\n    npix_max: int = Field(\n        30,\n        description=\"Maximum number of pixels per peak\",\n    )\n    amax_thr: float = Field(\n        80.0,\n        description=\"Minimum intensity threshold for starting a peak\",\n    )\n    atot_thr: float = Field(\n        120.0,\n        description=\"Minimum summed intensity threshold for pixel collection\",\n    )\n    son_min: float = Field(\n        7.0,\n        description=\"Minimum signal-to-noise ratio to be considered a peak\",\n    )\n    peak_rank: int = Field(\n        3,\n        description=\"Radius in which central peak pixel is a local maximum\",\n    )\n    r0: float = Field(\n        3.0,\n        description=\"Radius of ring for background evaluation in pixels\",\n    )\n    dr: float = Field(\n        2.0,\n        description=\"Width of ring for background evaluation in pixels\",\n    )\n    nsigm: float = Field(\n        7.0,\n        description=\"Intensity threshold to include pixel in connected group\",\n    )\n    compression: Optional[SZCompressorParameters] = Field(\n        None,\n        description=\"Options for the SZ Compression Algorithm\",\n    )\n    out_file: str = Field(\n        \"\",\n        description=\"Path to output file.\",\n        flag_type=\"-\",\n        rename_param=\"o\",\n        is_result=True,\n    )\n\n    @validator(\"out_file\", always=True)\n    def validate_out_file(cls, out_file: str, values: Dict[str, Any]) -> str:\n        if out_file == \"\":\n            fname: Path = (\n                Path(values[\"outdir\"])\n                / f\"{values['lute_config'].experiment}_{values['lute_config'].run}_\"\n                f\"{values['tag']}.list\"\n            )\n            return str(fname)\n        return out_file\n
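
An illustrative sketch of a FindPeaksPyAlgos configuration; the detector name, paths and thresholds are placeholders. out_file may be omitted, in which case the validator above builds it from outdir, the experiment, the run and the tag.

FindPeaksPyAlgos:\n  outdir: /path/to/peaks    # placeholder output directory for the .cxi files\n  det_name: Rayonix         # placeholder psana detector name\n  event_receiver: evr0      # evr0 or evr1\n  tag: lyso                 # placeholder tag appended to output file names\n  psana_mask: true\n  min_peaks: 10\n  max_peaks: 2048\n  son_min: 7.0\n  compression:              # optional SZ compression block\n    compressor: qoz\n    abs_error: 10.0\n    bin_size: 2\n    roi_window_size: 9\n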
"},{"location":"source/io/config/#io.config.FindPeaksPyAlgosParameters.Config","title":"Config","text":"

Bases: Config

Source code in lute/io/models/sfx_find_peaks.py
class Config(TaskParameters.Config):\n    set_result: bool = True\n    \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n
"},{"location":"source/io/config/#io.config.FindPeaksPyAlgosParameters.Config.set_result","title":"set_result: bool = True class-attribute instance-attribute","text":"

Whether the Executor should mark a specified parameter as a result.

"},{"location":"source/io/config/#io.config.IndexCrystFELParameters","title":"IndexCrystFELParameters","text":"

Bases: ThirdPartyParameters

Parameters for CrystFEL's indexamajig.

There are many parameters, and many combinations. For more information on usage, please refer to the CrystFEL documentation, here: https://www.desy.de/~twhite/crystfel/manual-indexamajig.html

Source code in lute/io/models/sfx_index.py
class IndexCrystFELParameters(ThirdPartyParameters):\n    \"\"\"Parameters for CrystFEL's `indexamajig`.\n\n    There are many parameters, and many combinations. For more information on\n    usage, please refer to the CrystFEL documentation, here:\n    https://www.desy.de/~twhite/crystfel/manual-indexamajig.html\n    \"\"\"\n\n    class Config(ThirdPartyParameters.Config):\n        set_result: bool = True\n        \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n        long_flags_use_eq: bool = True\n        \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n\n    executable: str = Field(\n        \"/sdf/group/lcls/ds/tools/crystfel/0.10.2/bin/indexamajig\",\n        description=\"CrystFEL's indexing binary.\",\n        flag_type=\"\",\n    )\n    # Basic options\n    in_file: Optional[str] = Field(\n        \"\", description=\"Path to input file.\", flag_type=\"-\", rename_param=\"i\"\n    )\n    out_file: str = Field(\n        \"\",\n        description=\"Path to output file.\",\n        flag_type=\"-\",\n        rename_param=\"o\",\n        is_result=True,\n    )\n    geometry: str = Field(\n        \"\", description=\"Path to geometry file.\", flag_type=\"-\", rename_param=\"g\"\n    )\n    zmq_input: Optional[str] = Field(\n        description=\"ZMQ address to receive data over. `input` and `zmq-input` are mutually exclusive\",\n        flag_type=\"--\",\n        rename_param=\"zmq-input\",\n    )\n    zmq_subscribe: Optional[str] = Field(  # Can be used multiple times...\n        description=\"Subscribe to ZMQ message of type `tag`\",\n        flag_type=\"--\",\n        rename_param=\"zmq-subscribe\",\n    )\n    zmq_request: Optional[AnyUrl] = Field(\n        description=\"Request new data over ZMQ by sending this value\",\n        flag_type=\"--\",\n        rename_param=\"zmq-request\",\n    )\n    asapo_endpoint: Optional[str] = Field(\n        description=\"ASAP::O endpoint. zmq-input and this are mutually exclusive.\",\n        flag_type=\"--\",\n        rename_param=\"asapo-endpoint\",\n    )\n    asapo_token: Optional[str] = Field(\n        description=\"ASAP::O authentication token.\",\n        flag_type=\"--\",\n        rename_param=\"asapo-token\",\n    )\n    asapo_beamtime: Optional[str] = Field(\n        description=\"ASAP::O beatime.\",\n        flag_type=\"--\",\n        rename_param=\"asapo-beamtime\",\n    )\n    asapo_source: Optional[str] = Field(\n        description=\"ASAP::O data source.\",\n        flag_type=\"--\",\n        rename_param=\"asapo-source\",\n    )\n    asapo_group: Optional[str] = Field(\n        description=\"ASAP::O consumer group.\",\n        flag_type=\"--\",\n        rename_param=\"asapo-group\",\n    )\n    asapo_stream: Optional[str] = Field(\n        description=\"ASAP::O stream.\",\n        flag_type=\"--\",\n        rename_param=\"asapo-stream\",\n    )\n    asapo_wait_for_stream: Optional[str] = Field(\n        description=\"If ASAP::O stream does not exist, wait for it to appear.\",\n        flag_type=\"--\",\n        rename_param=\"asapo-wait-for-stream\",\n    )\n    data_format: Optional[str] = Field(\n        description=\"Specify format for ZMQ or ASAP::O. `msgpack`, `hdf5` or `seedee`.\",\n        flag_type=\"--\",\n        rename_param=\"data-format\",\n    )\n    basename: bool = Field(\n        False,\n        description=\"Remove directory parts of filenames. 
Acts before prefix if prefix also given.\",\n        flag_type=\"--\",\n    )\n    prefix: Optional[str] = Field(\n        description=\"Add a prefix to the filenames from the infile argument.\",\n        flag_type=\"--\",\n        rename_param=\"asapo-stream\",\n    )\n    nthreads: PositiveInt = Field(\n        max(int(os.environ.get(\"SLURM_NPROCS\", len(os.sched_getaffinity(0)))) - 1, 1),\n        description=\"Number of threads to use. See also `max_indexer_threads`.\",\n        flag_type=\"-\",\n        rename_param=\"j\",\n    )\n    no_check_prefix: bool = Field(\n        False,\n        description=\"Don't attempt to correct the prefix if it seems incorrect.\",\n        flag_type=\"--\",\n        rename_param=\"no-check-prefix\",\n    )\n    highres: Optional[float] = Field(\n        description=\"Mark all pixels greater than `x` has bad.\", flag_type=\"--\"\n    )\n    profile: bool = Field(\n        False, description=\"Display timing data to monitor performance.\", flag_type=\"--\"\n    )\n    temp_dir: Optional[str] = Field(\n        description=\"Specify a path for the temp files folder.\",\n        flag_type=\"--\",\n        rename_param=\"temp-dir\",\n    )\n    wait_for_file: conint(gt=-2) = Field(\n        0,\n        description=\"Wait at most `x` seconds for a file to be created. A value of -1 means wait forever.\",\n        flag_type=\"--\",\n        rename_param=\"wait-for-file\",\n    )\n    no_image_data: bool = Field(\n        False,\n        description=\"Load only the metadata, no iamges. Can check indexability without high data requirements.\",\n        flag_type=\"--\",\n        rename_param=\"no-image-data\",\n    )\n    # Peak-finding options\n    # ....\n    # Indexing options\n    indexing: Optional[str] = Field(\n        description=\"Comma-separated list of supported indexing algorithms to use. Default is to automatically detect.\",\n        flag_type=\"--\",\n    )\n    cell_file: Optional[str] = Field(\n        description=\"Path to a file containing unit cell information (PDB or CrystFEL format).\",\n        flag_type=\"-\",\n        rename_param=\"p\",\n    )\n    tolerance: str = Field(\n        \"5,5,5,1.5\",\n        description=(\n            \"Tolerances (in percent) for unit cell comparison. \"\n            \"Comma-separated list a,b,c,angle. Default=5,5,5,1.5\"\n        ),\n        flag_type=\"--\",\n    )\n    no_check_cell: bool = Field(\n        False,\n        description=\"Do not check cell parameters against unit cell. Replaces '-raw' method.\",\n        flag_type=\"--\",\n        rename_param=\"no-check-cell\",\n    )\n    no_check_peaks: bool = Field(\n        False,\n        description=\"Do not verify peaks are accounted for by solution.\",\n        flag_type=\"--\",\n        rename_param=\"no-check-peaks\",\n    )\n    multi: bool = Field(\n        False, description=\"Enable multi-lattice indexing.\", flag_type=\"--\"\n    )\n    wavelength_estimate: Optional[float] = Field(\n        description=\"Estimate for X-ray wavelength. Required for some methods.\",\n        flag_type=\"--\",\n        rename_param=\"wavelength-estimate\",\n    )\n    camera_length_estimate: Optional[float] = Field(\n        description=\"Estimate for camera distance. Required for some methods.\",\n        flag_type=\"--\",\n        rename_param=\"camera-length-estimate\",\n    )\n    max_indexer_threads: Optional[PositiveInt] = Field(\n        # 1,\n        description=\"Some indexing algos can use multiple threads. 
In addition to image-based.\",\n        flag_type=\"--\",\n        rename_param=\"max-indexer-threads\",\n    )\n    no_retry: bool = Field(\n        False,\n        description=\"Do not remove weak peaks and try again.\",\n        flag_type=\"--\",\n        rename_param=\"no-retry\",\n    )\n    no_refine: bool = Field(\n        False,\n        description=\"Skip refinement step.\",\n        flag_type=\"--\",\n        rename_param=\"no-refine\",\n    )\n    no_revalidate: bool = Field(\n        False,\n        description=\"Skip revalidation step.\",\n        flag_type=\"--\",\n        rename_param=\"no-revalidate\",\n    )\n    # TakeTwo specific parameters\n    taketwo_member_threshold: Optional[PositiveInt] = Field(\n        # 20,\n        description=\"Minimum number of vectors to consider.\",\n        flag_type=\"--\",\n        rename_param=\"taketwo-member-threshold\",\n    )\n    taketwo_len_tolerance: Optional[PositiveFloat] = Field(\n        # 0.001,\n        description=\"TakeTwo length tolerance in Angstroms.\",\n        flag_type=\"--\",\n        rename_param=\"taketwo-len-tolerance\",\n    )\n    taketwo_angle_tolerance: Optional[PositiveFloat] = Field(\n        # 0.6,\n        description=\"TakeTwo angle tolerance in degrees.\",\n        flag_type=\"--\",\n        rename_param=\"taketwo-angle-tolerance\",\n    )\n    taketwo_trace_tolerance: Optional[PositiveFloat] = Field(\n        # 3,\n        description=\"Matrix trace tolerance in degrees.\",\n        flag_type=\"--\",\n        rename_param=\"taketwo-trace-tolerance\",\n    )\n    # Felix-specific parameters\n    # felix_domega\n    # felix-fraction-max-visits\n    # felix-max-internal-angle\n    # felix-max-uniqueness\n    # felix-min-completeness\n    # felix-min-visits\n    # felix-num-voxels\n    # felix-sigma\n    # felix-tthrange-max\n    # felix-tthrange-min\n    # XGANDALF-specific parameters\n    xgandalf_sampling_pitch: Optional[NonNegativeInt] = Field(\n        # 6,\n        description=\"Density of reciprocal space sampling.\",\n        flag_type=\"--\",\n        rename_param=\"xgandalf-sampling-pitch\",\n    )\n    xgandalf_grad_desc_iterations: Optional[NonNegativeInt] = Field(\n        # 4,\n        description=\"Number of gradient descent iterations.\",\n        flag_type=\"--\",\n        rename_param=\"xgandalf-grad-desc-iterations\",\n    )\n    xgandalf_tolerance: Optional[PositiveFloat] = Field(\n        # 0.02,\n        description=\"Relative tolerance of lattice vectors\",\n        flag_type=\"--\",\n        rename_param=\"xgandalf-tolerance\",\n    )\n    xgandalf_no_deviation_from_provided_cell: Optional[bool] = Field(\n        description=\"Found unit cell must match provided.\",\n        flag_type=\"--\",\n        rename_param=\"xgandalf-no-deviation-from-provided-cell\",\n    )\n    xgandalf_min_lattice_vector_length: Optional[PositiveFloat] = Field(\n        # 30,\n        description=\"Minimum possible lattice length.\",\n        flag_type=\"--\",\n        rename_param=\"xgandalf-min-lattice-vector-length\",\n    )\n    xgandalf_max_lattice_vector_length: Optional[PositiveFloat] = Field(\n        # 250,\n        description=\"Minimum possible lattice length.\",\n        flag_type=\"--\",\n        rename_param=\"xgandalf-max-lattice-vector-length\",\n    )\n    xgandalf_max_peaks: Optional[PositiveInt] = Field(\n        # 250,\n        description=\"Maximum number of peaks to use for indexing.\",\n        flag_type=\"--\",\n        rename_param=\"xgandalf-max-peaks\",\n    )\n    
xgandalf_fast_execution: bool = Field(\n        False,\n        description=\"Shortcut to set sampling-pitch=2, and grad-desc-iterations=3.\",\n        flag_type=\"--\",\n        rename_param=\"xgandalf-fast-execution\",\n    )\n    # pinkIndexer parameters\n    # ...\n    # asdf_fast: bool = Field(False, description=\"Enable fast mode for asdf. 3x faster for 7% loss in accuracy.\", flag_type=\"--\", rename_param=\"asdf-fast\")\n    # Integration parameters\n    integration: str = Field(\n        \"rings-nocen\", description=\"Method for integrating reflections.\", flag_type=\"--\"\n    )\n    fix_profile_radius: Optional[float] = Field(\n        description=\"Fix the profile radius (m^{-1})\",\n        flag_type=\"--\",\n        rename_param=\"fix-profile-radius\",\n    )\n    fix_divergence: Optional[float] = Field(\n        0,\n        description=\"Fix the divergence (rad, full angle).\",\n        flag_type=\"--\",\n        rename_param=\"fix-divergence\",\n    )\n    int_radius: str = Field(\n        \"4,5,7\",\n        description=\"Inner, middle, and outer radii for 3-ring integration.\",\n        flag_type=\"--\",\n        rename_param=\"int-radius\",\n    )\n    int_diag: str = Field(\n        \"none\",\n        description=\"Show detailed information on integration when condition is met.\",\n        flag_type=\"--\",\n        rename_param=\"int-diag\",\n    )\n    push_res: str = Field(\n        \"infinity\",\n        description=\"Integrate `x` higher than apparent resolution limit (nm-1).\",\n        flag_type=\"--\",\n        rename_param=\"push-res\",\n    )\n    overpredict: bool = Field(\n        False,\n        description=\"Over-predict reflections. Maybe useful with post-refinement.\",\n        flag_type=\"--\",\n    )\n    cell_parameters_only: bool = Field(\n        False, description=\"Do not predict refletions at all\", flag_type=\"--\"\n    )\n    # Output parameters\n    no_non_hits_in_stream: bool = Field(\n        False,\n        description=\"Exclude non-hits from the stream file.\",\n        flag_type=\"--\",\n        rename_param=\"no-non-hits-in-stream\",\n    )\n    copy_hheader: Optional[str] = Field(\n        description=\"Copy information from header in the image to output stream.\",\n        flag_type=\"--\",\n        rename_param=\"copy-hheader\",\n    )\n    no_peaks_in_stream: bool = Field(\n        False,\n        description=\"Do not record peaks in stream file.\",\n        flag_type=\"--\",\n        rename_param=\"no-peaks-in-stream\",\n    )\n    no_refls_in_stream: bool = Field(\n        False,\n        description=\"Do not record reflections in stream.\",\n        flag_type=\"--\",\n        rename_param=\"no-refls-in-stream\",\n    )\n    serial_offset: Optional[PositiveInt] = Field(\n        description=\"Start numbering at `x` instead of 1.\",\n        flag_type=\"--\",\n        rename_param=\"serial-offset\",\n    )\n    harvest_file: Optional[str] = Field(\n        description=\"Write parameters to file in JSON format.\",\n        flag_type=\"--\",\n        rename_param=\"harvest-file\",\n    )\n\n    @validator(\"in_file\", always=True)\n    def validate_in_file(cls, in_file: str, values: Dict[str, Any]) -> str:\n        if in_file == \"\":\n            filename: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\", \"FindPeaksPyAlgos\", \"out_file\"\n            )\n            if filename is None:\n                exp: str = values[\"lute_config\"].experiment\n                run: int = 
int(values[\"lute_config\"].run)\n                tag: Optional[str] = read_latest_db_entry(\n                    f\"{values['lute_config'].work_dir}\", \"FindPeaksPsocake\", \"tag\"\n                )\n                out_dir: Optional[str] = read_latest_db_entry(\n                    f\"{values['lute_config'].work_dir}\", \"FindPeaksPsocake\", \"outDir\"\n                )\n                if out_dir is not None:\n                    fname: str = f\"{out_dir}/{exp}_{run:04d}\"\n                    if tag is not None:\n                        fname = f\"{fname}_{tag}\"\n                    return f\"{fname}.lst\"\n            else:\n                return filename\n        return in_file\n\n    @validator(\"out_file\", always=True)\n    def validate_out_file(cls, out_file: str, values: Dict[str, Any]) -> str:\n        if out_file == \"\":\n            expmt: str = values[\"lute_config\"].experiment\n            run: int = int(values[\"lute_config\"].run)\n            work_dir: str = values[\"lute_config\"].work_dir\n            fname: str = f\"{expmt}_r{run:04d}.stream\"\n            return f\"{work_dir}/{fname}\"\n        return out_file\n
"},{"location":"source/io/config/#io.config.IndexCrystFELParameters.Config","title":"Config","text":"

Bases: Config

Source code in lute/io/models/sfx_index.py
class Config(ThirdPartyParameters.Config):\n    set_result: bool = True\n    \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n    long_flags_use_eq: bool = True\n    \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n
"},{"location":"source/io/config/#io.config.IndexCrystFELParameters.Config.long_flags_use_eq","title":"long_flags_use_eq: bool = True class-attribute instance-attribute","text":"

Whether long command-line arguments are passed like --long=arg.

"},{"location":"source/io/config/#io.config.IndexCrystFELParameters.Config.set_result","title":"set_result: bool = True class-attribute instance-attribute","text":"

Whether the Executor should mark a specified parameter as a result.

"},{"location":"source/io/config/#io.config.ManipulateHKLParameters","title":"ManipulateHKLParameters","text":"

Bases: ThirdPartyParameters

Parameters for CrystFEL's get_hkl for manipulating lists of reflections.

This Task is predominantly used internally to convert hkl to mtz files. Note that performing multiple manipulations is undefined behaviour. Run the Task with multiple configurations in explicit separate steps. For more information on usage, please refer to the CrystFEL documentation, here: https://www.desy.de/~twhite/crystfel/manual-partialator.html

Source code in lute/io/models/sfx_merge.py
class ManipulateHKLParameters(ThirdPartyParameters):\n    \"\"\"Parameters for CrystFEL's `get_hkl` for manipulating lists of reflections.\n\n    This Task is predominantly used internally to convert `hkl` to `mtz` files.\n    Note that performing multiple manipulations is undefined behaviour. Run\n    the Task with multiple configurations in explicit separate steps. For more\n    information on usage, please refer to the CrystFEL documentation, here:\n    https://www.desy.de/~twhite/crystfel/manual-partialator.html\n    \"\"\"\n\n    class Config(ThirdPartyParameters.Config):\n        long_flags_use_eq: bool = True\n        \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n\n        set_result: bool = True\n        \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n    executable: str = Field(\n        \"/sdf/group/lcls/ds/tools/crystfel/0.10.2/bin/get_hkl\",\n        description=\"CrystFEL's reflection manipulation binary.\",\n        flag_type=\"\",\n    )\n    in_file: str = Field(\n        \"\",\n        description=\"Path to input HKL file.\",\n        flag_type=\"-\",\n        rename_param=\"i\",\n    )\n    out_file: str = Field(\n        \"\",\n        description=\"Path to output file.\",\n        flag_type=\"-\",\n        rename_param=\"o\",\n        is_result=True,\n    )\n    cell_file: str = Field(\n        \"\",\n        description=\"Path to a file containing unit cell information (PDB or CrystFEL format).\",\n        flag_type=\"-\",\n        rename_param=\"p\",\n    )\n    output_format: str = Field(\n        \"mtz\",\n        description=\"Output format. One of mtz, mtz-bij, or xds. Otherwise CrystFEL format.\",\n        flag_type=\"--\",\n        rename_param=\"output-format\",\n    )\n    expand: Optional[str] = Field(\n        description=\"Reflections will be expanded to fill asymmetric unit of specified point group.\",\n        flag_type=\"--\",\n    )\n    # Reducing reflections to higher symmetry\n    twin: Optional[str] = Field(\n        description=\"Reflections equivalent to specified point group will have intensities summed.\",\n        flag_type=\"--\",\n    )\n    no_need_all_parts: Optional[bool] = Field(\n        description=\"Use with --twin to allow reflections missing a 'twin mate' to be written out.\",\n        flag_type=\"--\",\n        rename_param=\"no-need-all-parts\",\n    )\n    # Noise - Add to data\n    noise: Optional[bool] = Field(\n        description=\"Generate 10% uniform noise.\", flag_type=\"--\"\n    )\n    poisson: Optional[bool] = Field(\n        description=\"Generate Poisson noise. Intensities assumed to be A.U.\",\n        flag_type=\"--\",\n    )\n    adu_per_photon: Optional[int] = Field(\n        description=\"Use with --poisson to convert A.U. 
to photons.\",\n        flag_type=\"--\",\n        rename_param=\"adu-per-photon\",\n    )\n    # Remove duplicate reflections\n    trim_centrics: Optional[bool] = Field(\n        description=\"Duplicated reflections (according to symmetry) are removed.\",\n        flag_type=\"--\",\n    )\n    # Restrict to template file\n    template: Optional[str] = Field(\n        description=\"Only reflections which also appear in specified file are written out.\",\n        flag_type=\"--\",\n    )\n    # Multiplicity\n    multiplicity: Optional[bool] = Field(\n        description=\"Reflections are multiplied by their symmetric multiplicites.\",\n        flag_type=\"--\",\n    )\n    # Resolution cutoffs\n    cutoff_angstroms: Optional[Union[str, int, float]] = Field(\n        description=\"Either n, or n1,n2,n3. For n, reflections < n are removed. For n1,n2,n3 anisotropic trunction performed at separate resolution limits for a*, b*, c*.\",\n        flag_type=\"--\",\n        rename_param=\"cutoff-angstroms\",\n    )\n    lowres: Optional[float] = Field(\n        description=\"Remove reflections with d > n\", flag_type=\"--\"\n    )\n    highres: Optional[float] = Field(\n        description=\"Synonym for first form of --cutoff-angstroms\"\n    )\n    reindex: Optional[str] = Field(\n        description=\"Reindex according to specified operator. E.g. k,h,-l.\",\n        flag_type=\"--\",\n    )\n    # Override input symmetry\n    symmetry: Optional[str] = Field(\n        description=\"Point group symmetry to use to override. Almost always OMIT this option.\",\n        flag_type=\"--\",\n    )\n\n    @validator(\"in_file\", always=True)\n    def validate_in_file(cls, in_file: str, values: Dict[str, Any]) -> str:\n        if in_file == \"\":\n            partialator_file: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\", \"MergePartialator\", \"out_file\"\n            )\n            if partialator_file:\n                return partialator_file\n        return in_file\n\n    @validator(\"out_file\", always=True)\n    def validate_out_file(cls, out_file: str, values: Dict[str, Any]) -> str:\n        if out_file == \"\":\n            partialator_file: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\", \"MergePartialator\", \"out_file\"\n            )\n            if partialator_file:\n                mtz_out: str = partialator_file.split(\".\")[0]\n                mtz_out = f\"{mtz_out}.mtz\"\n                return mtz_out\n        return out_file\n\n    @validator(\"cell_file\", always=True)\n    def validate_cell_file(cls, cell_file: str, values: Dict[str, Any]) -> str:\n        if cell_file == \"\":\n            idx_cell_file: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\",\n                \"IndexCrystFEL\",\n                \"cell_file\",\n                valid_only=False,\n            )\n            if idx_cell_file:\n                return idx_cell_file\n        return cell_file\n
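
A minimal sketch; in practice most fields can be omitted, since the validators above pull in_file and cell_file from earlier MergePartialator and IndexCrystFEL entries in the database and derive out_file from the input file name.

ManipulateHKL:\n  output_format: mtz   # mtz, mtz-bij, xds; otherwise CrystFEL format\n  # in_file, out_file and cell_file are filled from the database if left empty\n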
"},{"location":"source/io/config/#io.config.ManipulateHKLParameters.Config","title":"Config","text":"

Bases: Config

Source code in lute/io/models/sfx_merge.py
class Config(ThirdPartyParameters.Config):\n    long_flags_use_eq: bool = True\n    \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n\n    set_result: bool = True\n    \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n
"},{"location":"source/io/config/#io.config.ManipulateHKLParameters.Config.long_flags_use_eq","title":"long_flags_use_eq: bool = True class-attribute instance-attribute","text":"

Whether long command-line arguments are passed like --long=arg.

"},{"location":"source/io/config/#io.config.ManipulateHKLParameters.Config.set_result","title":"set_result: bool = True class-attribute instance-attribute","text":"

Whether the Executor should mark a specified parameter as a result.

"},{"location":"source/io/config/#io.config.MergePartialatorParameters","title":"MergePartialatorParameters","text":"

Bases: ThirdPartyParameters

Parameters for CrystFEL's partialator.

There are many parameters, and many combinations. For more information on usage, please refer to the CrystFEL documentation, here: https://www.desy.de/~twhite/crystfel/manual-partialator.html

Source code in lute/io/models/sfx_merge.py
class MergePartialatorParameters(ThirdPartyParameters):\n    \"\"\"Parameters for CrystFEL's `partialator`.\n\n    There are many parameters, and many combinations. For more information on\n    usage, please refer to the CrystFEL documentation, here:\n    https://www.desy.de/~twhite/crystfel/manual-partialator.html\n    \"\"\"\n\n    class Config(ThirdPartyParameters.Config):\n        long_flags_use_eq: bool = True\n        \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n\n        set_result: bool = True\n        \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n    executable: str = Field(\n        \"/sdf/group/lcls/ds/tools/crystfel/0.10.2/bin/partialator\",\n        description=\"CrystFEL's Partialator binary.\",\n        flag_type=\"\",\n    )\n    in_file: Optional[str] = Field(\n        \"\", description=\"Path to input stream.\", flag_type=\"-\", rename_param=\"i\"\n    )\n    out_file: str = Field(\n        \"\",\n        description=\"Path to output file.\",\n        flag_type=\"-\",\n        rename_param=\"o\",\n        is_result=True,\n    )\n    symmetry: str = Field(description=\"Point group symmetry.\", flag_type=\"--\")\n    niter: Optional[int] = Field(\n        description=\"Number of cycles of scaling and post-refinement.\",\n        flag_type=\"-\",\n        rename_param=\"n\",\n    )\n    no_scale: Optional[bool] = Field(\n        description=\"Disable scaling.\", flag_type=\"--\", rename_param=\"no-scale\"\n    )\n    no_Bscale: Optional[bool] = Field(\n        description=\"Disable Debye-Waller part of scaling.\",\n        flag_type=\"--\",\n        rename_param=\"no-Bscale\",\n    )\n    no_pr: Optional[bool] = Field(\n        description=\"Disable orientation model.\", flag_type=\"--\", rename_param=\"no-pr\"\n    )\n    no_deltacchalf: Optional[bool] = Field(\n        description=\"Disable rejection based on deltaCC1/2.\",\n        flag_type=\"--\",\n        rename_param=\"no-deltacchalf\",\n    )\n    model: str = Field(\n        \"unity\",\n        description=\"Partiality model. Options: xsphere, unity, offset, ggpm.\",\n        flag_type=\"--\",\n    )\n    nthreads: int = Field(\n        max(int(os.environ.get(\"SLURM_NPROCS\", len(os.sched_getaffinity(0)))) - 1, 1),\n        description=\"Number of parallel analyses.\",\n        flag_type=\"-\",\n        rename_param=\"j\",\n    )\n    polarisation: Optional[str] = Field(\n        description=\"Specification of incident polarisation. 
Refer to CrystFEL docs for more info.\",\n        flag_type=\"--\",\n    )\n    no_polarisation: Optional[bool] = Field(\n        description=\"Synonym for --polarisation=none\",\n        flag_type=\"--\",\n        rename_param=\"no-polarisation\",\n    )\n    max_adu: Optional[float] = Field(\n        description=\"Maximum intensity of reflection to include.\",\n        flag_type=\"--\",\n        rename_param=\"max-adu\",\n    )\n    min_res: Optional[float] = Field(\n        description=\"Only include crystals diffracting to a minimum resolution.\",\n        flag_type=\"--\",\n        rename_param=\"min-res\",\n    )\n    min_measurements: int = Field(\n        2,\n        description=\"Include a reflection only if it appears a minimum number of times.\",\n        flag_type=\"--\",\n        rename_param=\"min-measurements\",\n    )\n    push_res: Optional[float] = Field(\n        description=\"Merge reflections up to higher than the apparent resolution limit.\",\n        flag_type=\"--\",\n        rename_param=\"push-res\",\n    )\n    start_after: int = Field(\n        0,\n        description=\"Ignore the first n crystals.\",\n        flag_type=\"--\",\n        rename_param=\"start-after\",\n    )\n    stop_after: int = Field(\n        0,\n        description=\"Stop after processing n crystals. 0 means process all.\",\n        flag_type=\"--\",\n        rename_param=\"stop-after\",\n    )\n    no_free: Optional[bool] = Field(\n        description=\"Disable cross-validation. Testing ONLY.\",\n        flag_type=\"--\",\n        rename_param=\"no-free\",\n    )\n    custom_split: Optional[str] = Field(\n        description=\"Read a set of filenames, event and dataset IDs from a filename.\",\n        flag_type=\"--\",\n        rename_param=\"custom-split\",\n    )\n    max_rel_B: float = Field(\n        100,\n        description=\"Reject crystals if |relB| > n sq Angstroms.\",\n        flag_type=\"--\",\n        rename_param=\"max-rel-B\",\n    )\n    output_every_cycle: bool = Field(\n        False,\n        description=\"Write per-crystal params after every refinement cycle.\",\n        flag_type=\"--\",\n        rename_param=\"output-every-cycle\",\n    )\n    no_logs: bool = Field(\n        False,\n        description=\"Do not write logs needed for plots, maps and graphs.\",\n        flag_type=\"--\",\n        rename_param=\"no-logs\",\n    )\n    set_symmetry: Optional[str] = Field(\n        description=\"Set the apparent symmetry of the crystals to a point group.\",\n        flag_type=\"-\",\n        rename_param=\"w\",\n    )\n    operator: Optional[str] = Field(\n        description=\"Specify an ambiguity operator. E.g. k,h,-l.\", flag_type=\"--\"\n    )\n    force_bandwidth: Optional[float] = Field(\n        description=\"Set X-ray bandwidth. As percent, e.g. 0.0013 (0.13%).\",\n        flag_type=\"--\",\n        rename_param=\"force-bandwidth\",\n    )\n    force_radius: Optional[float] = Field(\n        description=\"Set the initial profile radius (nm-1).\",\n        flag_type=\"--\",\n        rename_param=\"force-radius\",\n    )\n    force_lambda: Optional[float] = Field(\n        description=\"Set the wavelength. 
In Angstroms.\",\n        flag_type=\"--\",\n        rename_param=\"force-lambda\",\n    )\n    harvest_file: Optional[str] = Field(\n        description=\"Write parameters to file in JSON format.\",\n        flag_type=\"--\",\n        rename_param=\"harvest-file\",\n    )\n\n    @validator(\"in_file\", always=True)\n    def validate_in_file(cls, in_file: str, values: Dict[str, Any]) -> str:\n        if in_file == \"\":\n            stream_file: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\",\n                \"ConcatenateStreamFiles\",\n                \"out_file\",\n            )\n            if stream_file:\n                return stream_file\n        return in_file\n\n    @validator(\"out_file\", always=True)\n    def validate_out_file(cls, out_file: str, values: Dict[str, Any]) -> str:\n        if out_file == \"\":\n            in_file: str = values[\"in_file\"]\n            if in_file:\n                tag: str = in_file.split(\".\")[0]\n                return f\"{tag}.hkl\"\n            else:\n                return \"partialator.hkl\"\n        return out_file\n
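
An illustrative sketch of a MergePartialator section; the point group, model and numeric values are placeholders to be chosen for the sample. in_file and out_file can be left empty to be resolved from a prior ConcatenateStreamFiles result and the input name, respectively.

MergePartialator:\n  symmetry: 4/mmm        # placeholder point group\n  model: unity           # partiality model: xsphere, unity, offset, ggpm\n  niter: 1               # cycles of scaling/post-refinement (passed as -n)\n  min_measurements: 2\n  push_res: 1.5          # placeholder; merge beyond the apparent resolution limit\n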
"},{"location":"source/io/config/#io.config.MergePartialatorParameters.Config","title":"Config","text":"

Bases: Config

Source code in lute/io/models/sfx_merge.py
class Config(ThirdPartyParameters.Config):\n    long_flags_use_eq: bool = True\n    \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n\n    set_result: bool = True\n    \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n
"},{"location":"source/io/config/#io.config.MergePartialatorParameters.Config.long_flags_use_eq","title":"long_flags_use_eq: bool = True class-attribute instance-attribute","text":"

Whether long command-line arguments are passed like --long=arg.

"},{"location":"source/io/config/#io.config.MergePartialatorParameters.Config.set_result","title":"set_result: bool = True class-attribute instance-attribute","text":"

Whether the Executor should mark a specified parameter as a result.

"},{"location":"source/io/config/#io.config.RunSHELXCParameters","title":"RunSHELXCParameters","text":"

Bases: ThirdPartyParameters

Parameters for CCP4's SHELXC program.

SHELXC prepares files for SHELXD and SHELXE.

For more information please refer to the official documentation: https://www.ccp4.ac.uk/html/crank.html

Source code in lute/io/models/sfx_solve.py
class RunSHELXCParameters(ThirdPartyParameters):\n    \"\"\"Parameters for CCP4's SHELXC program.\n\n    SHELXC prepares files for SHELXD and SHELXE.\n\n    For more information please refer to the official documentation:\n    https://www.ccp4.ac.uk/html/crank.html\n    \"\"\"\n\n    executable: str = Field(\n        \"/sdf/group/lcls/ds/tools/ccp4-8.0/bin/shelxc\",\n        description=\"CCP4 SHELXC. Generates input files for SHELXD/SHELXE.\",\n        flag_type=\"\",\n    )\n    placeholder: str = Field(\n        \"xx\", description=\"Placeholder filename stem.\", flag_type=\"\"\n    )\n    in_file: str = Field(\n        \"\",\n        description=\"Input file for SHELXC with reflections AND proper records.\",\n        flag_type=\"\",\n    )\n\n    @validator(\"in_file\", always=True)\n    def validate_in_file(cls, in_file: str, values: Dict[str, Any]) -> str:\n        if in_file == \"\":\n            # get_hkl needed to be run to produce an XDS format file...\n            xds_format_file: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\", \"ManipulateHKL\", \"out_file\"\n            )\n            if xds_format_file:\n                in_file = xds_format_file\n        if in_file[0] != \"<\":\n            # Need to add a redirection for this program\n            # Runs like `shelxc xx <input_file.xds`\n            in_file = f\"<{in_file}\"\n        return in_file\n
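
A hedged sketch, assuming the managed Task key is RunSHELXC; the input path is a placeholder. If in_file is left empty, the validator above reuses the latest ManipulateHKL output (an XDS-format file) and prepends the '<' redirection automatically.

RunSHELXC:                 # assumed Task key\n  placeholder: xx          # filename stem passed to shelxc\n  in_file: /path/to/reflections.xds   # placeholder; redirected as <file when run\n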
"},{"location":"source/io/config/#io.config.SubmitSMDParameters","title":"SubmitSMDParameters","text":"

Bases: ThirdPartyParameters

Parameters for running smalldata to produce reduced HDF5 files.

Source code in lute/io/models/smd.py
class SubmitSMDParameters(ThirdPartyParameters):\n    \"\"\"Parameters for running smalldata to produce reduced HDF5 files.\"\"\"\n\n    class Config(ThirdPartyParameters.Config):\n        \"\"\"Identical to super-class Config but includes a result.\"\"\"\n\n        set_result: bool = True\n        \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n        result_from_params: str = \"\"\n        \"\"\"Defines a result from the parameters. Use a validator to do so.\"\"\"\n\n    executable: str = Field(\"mpirun\", description=\"MPI executable.\", flag_type=\"\")\n    np: PositiveInt = Field(\n        max(int(os.environ.get(\"SLURM_NPROCS\", len(os.sched_getaffinity(0)))) - 1, 1),\n        description=\"Number of processes\",\n        flag_type=\"-\",\n    )\n    p_arg1: str = Field(\n        \"python\", description=\"Executable to run with mpi (i.e. python).\", flag_type=\"\"\n    )\n    u: str = Field(\n        \"\", description=\"Python option for unbuffered output.\", flag_type=\"-\"\n    )\n    m: str = Field(\n        \"mpi4py.run\",\n        description=\"Python option to execute a module's contents as __main__ module.\",\n        flag_type=\"-\",\n    )\n    producer: str = Field(\n        \"\", description=\"Path to the SmallData producer Python script.\", flag_type=\"\"\n    )\n    run: str = Field(\n        os.environ.get(\"RUN_NUM\", \"\"), description=\"DAQ Run Number.\", flag_type=\"--\"\n    )\n    experiment: str = Field(\n        os.environ.get(\"EXPERIMENT\", \"\"),\n        description=\"LCLS Experiment Number.\",\n        flag_type=\"--\",\n    )\n    stn: NonNegativeInt = Field(0, description=\"Hutch endstation.\", flag_type=\"--\")\n    nevents: int = Field(\n        int(1e9), description=\"Number of events to process.\", flag_type=\"--\"\n    )\n    directory: Optional[str] = Field(\n        None,\n        description=\"Optional output directory. If None, will be in ${EXP_FOLDER}/hdf5/smalldata.\",\n        flag_type=\"--\",\n    )\n    ## Need mechanism to set result_from_param=True ...\n    gather_interval: PositiveInt = Field(\n        25, description=\"Number of events to collect at a time.\", flag_type=\"--\"\n    )\n    norecorder: bool = Field(\n        False, description=\"Whether to ignore recorder streams.\", flag_type=\"--\"\n    )\n    url: HttpUrl = Field(\n        \"https://pswww.slac.stanford.edu/ws-auth/lgbk\",\n        description=\"Base URL for eLog posting.\",\n        flag_type=\"--\",\n    )\n    epicsAll: bool = Field(\n        False,\n        description=\"Whether to store all EPICS PVs. Use with care.\",\n        flag_type=\"--\",\n    )\n    full: bool = Field(\n        False,\n        description=\"Whether to store all data. Use with EXTRA care.\",\n        flag_type=\"--\",\n    )\n    fullSum: bool = Field(\n        False,\n        description=\"Whether to store sums for all area detector images.\",\n        flag_type=\"--\",\n    )\n    default: bool = Field(\n        False,\n        description=\"Whether to store only the default minimal set of data.\",\n        flag_type=\"--\",\n    )\n    image: bool = Field(\n        False,\n        description=\"Whether to save everything as images. Use with care.\",\n        flag_type=\"--\",\n    )\n    tiff: bool = Field(\n        False,\n        description=\"Whether to save all images as a single TIFF. 
Use with EXTRA care.\",\n        flag_type=\"--\",\n    )\n    centerpix: bool = Field(\n        False,\n        description=\"Whether to mask center pixels for Epix10k2M detectors.\",\n        flag_type=\"--\",\n    )\n    postRuntable: bool = Field(\n        False,\n        description=\"Whether to post run tables. Also used as a trigger for summary jobs.\",\n        flag_type=\"--\",\n    )\n    wait: bool = Field(\n        False, description=\"Whether to wait for a file to appear.\", flag_type=\"--\"\n    )\n    xtcav: bool = Field(\n        False,\n        description=\"Whether to add XTCAV processing to the HDF5 generation.\",\n        flag_type=\"--\",\n    )\n    noarch: bool = Field(\n        False, description=\"Whether to not use archiver data.\", flag_type=\"--\"\n    )\n\n    lute_template_cfg: TemplateConfig = TemplateConfig(template_name=\"\", output_path=\"\")\n\n    @validator(\"producer\", always=True)\n    def validate_producer_path(cls, producer: str) -> str:\n        return producer\n\n    @validator(\"lute_template_cfg\", always=True)\n    def use_producer(\n        cls, lute_template_cfg: TemplateConfig, values: Dict[str, Any]\n    ) -> TemplateConfig:\n        if not lute_template_cfg.output_path:\n            lute_template_cfg.output_path = values[\"producer\"]\n        return lute_template_cfg\n\n    @root_validator(pre=False)\n    def define_result(cls, values: Dict[str, Any]) -> Dict[str, Any]:\n        exp: str = values[\"lute_config\"].experiment\n        hutch: str = exp[:3]\n        run: int = int(values[\"lute_config\"].run)\n        directory: Optional[str] = values[\"directory\"]\n        if directory is None:\n            directory = f\"/sdf/data/lcls/ds/{hutch}/{exp}/hdf5/smalldata\"\n        fname: str = f\"{exp}_Run{run:04d}.h5\"\n\n        cls.Config.result_from_params = f\"{directory}/{fname}\"\n        return values\n
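
A rough sketch, assuming a SubmitSMD Task key; the producer path and values are placeholders. run and experiment default to the RUN_NUM and EXPERIMENT environment variables if not provided, and directory defaults to the experiment's hdf5/smalldata folder.

SubmitSMD:                              # assumed Task key\n  producer: /path/to/smd_producer.py    # placeholder SmallData producer script\n  run: 12                               # placeholder; defaults to $RUN_NUM\n  experiment: mfxp1234                  # placeholder; defaults to $EXPERIMENT\n  directory: /path/to/hdf5/smalldata    # optional; defaults to ${EXP_FOLDER}/hdf5/smalldata\n  postRuntable: true\n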
"},{"location":"source/io/config/#io.config.SubmitSMDParameters.Config","title":"Config","text":"

Bases: Config

Identical to super-class Config but includes a result.

Source code in lute/io/models/smd.py
class Config(ThirdPartyParameters.Config):\n    \"\"\"Identical to super-class Config but includes a result.\"\"\"\n\n    set_result: bool = True\n    \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n    result_from_params: str = \"\"\n    \"\"\"Defines a result from the parameters. Use a validator to do so.\"\"\"\n
"},{"location":"source/io/config/#io.config.SubmitSMDParameters.Config.result_from_params","title":"result_from_params: str = '' class-attribute instance-attribute","text":"

Defines a result from the parameters. Use a validator to do so.

"},{"location":"source/io/config/#io.config.SubmitSMDParameters.Config.set_result","title":"set_result: bool = True class-attribute instance-attribute","text":"

Whether the Executor should mark a specified parameter as a result.

"},{"location":"source/io/config/#io.config.TaskParameters","title":"TaskParameters","text":"

Bases: BaseSettings

Base class for models of task parameters to be validated.

Parameters are read from a configuration YAML file and validated against subclasses of this type in order to ensure both that all parameters are present and that they are of the correct type.

Note

Pydantic is used for data validation. Pydantic does not perform \"strict\" validation by default. Parameter values may be cast to conform with the model specified by the subclass definition if it is possible to do so. Consider whether this may cause issues (e.g. if a float is cast to an int).

Source code in lute/io/models/base.py
class TaskParameters(BaseSettings):\n    \"\"\"Base class for models of task parameters to be validated.\n\n    Parameters are read from a configuration YAML file and validated against\n    subclasses of this type in order to ensure that both all parameters are\n    present, and that the parameters are of the correct type.\n\n    Note:\n        Pydantic is used for data validation. Pydantic does not perform \"strict\"\n        validation by default. Parameter values may be cast to conform with the\n        model specified by the subclass definition if it is possible to do so.\n        Consider whether this may cause issues (e.g. if a float is cast to an\n        int).\n    \"\"\"\n\n    class Config:\n        \"\"\"Configuration for parameters model.\n\n        The Config class holds Pydantic configuration. A number of LUTE-specific\n        configuration has also been placed here.\n\n        Attributes:\n            env_prefix (str): Pydantic configuration. Will set parameters from\n                environment variables containing this prefix. E.g. a model\n                parameter `input` can be set with an environment variable:\n                `{env_prefix}input`, in LUTE's case `LUTE_input`.\n\n            underscore_attrs_are_private (bool): Pydantic configuration. Whether\n                to hide attributes (parameters) prefixed with an underscore.\n\n            copy_on_model_validation (str): Pydantic configuration. How to copy\n                the input object passed to the class instance for model\n                validation. Set to perform a deep copy.\n\n            allow_inf_nan (bool): Pydantic configuration. Whether to allow\n                infinity or NAN in float fields.\n\n            run_directory (Optional[str]): None. If set, it should be a valid\n                path. The `Task` will be run from this directory. This may be\n                useful for some `Task`s which rely on searching the working\n                directory.\n\n            set_result (bool). False. If True, the model has information about\n                setting the TaskResult object from the parameters it contains.\n                E.g. it has an `output` parameter which is marked as the result.\n                The result can be set with a field value of `is_result=True` on\n                a specific parameter, or using `result_from_params` and a\n                validator.\n\n            result_from_params (Optional[str]): None. Optionally used to define\n                results from information available in the model using a custom\n                validator. E.g. use a `outdir` and `filename` field to set\n                `result_from_params=f\"{outdir}/{filename}`, etc. Only used if\n                `set_result==True`\n\n            result_summary (Optional[str]): None. Defines a result summary that\n                can be known after processing the Pydantic model. Use of summary\n                depends on the Executor running the Task. All summaries are\n                stored in the database, however. Only used if `set_result==True`\n\n            impl_schemas (Optional[str]). Specifies a the schemas the\n                output/results conform to. 
Only used if `set_result==True`.\n        \"\"\"\n\n        env_prefix = \"LUTE_\"\n        underscore_attrs_are_private: bool = True\n        copy_on_model_validation: str = \"deep\"\n        allow_inf_nan: bool = False\n\n        run_directory: Optional[str] = None\n        \"\"\"Set the directory that the Task is run from.\"\"\"\n        set_result: bool = False\n        \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n        result_from_params: Optional[str] = None\n        \"\"\"Defines a result from the parameters. Use a validator to do so.\"\"\"\n        result_summary: Optional[str] = None\n        \"\"\"Format a TaskResult.summary from output.\"\"\"\n        impl_schemas: Optional[str] = None\n        \"\"\"Schema specification for output result. Will be passed to TaskResult.\"\"\"\n\n    lute_config: AnalysisHeader\n
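As a rough illustration of the note above, a first-party parameter model is simply a subclass of TaskParameters whose fields describe that Task's inputs. The class and field names below are invented for this sketch and are not part of LUTE; it only shows the shape of such a model and how non-strict validation can silently cast values.

from pydantic import Field

from lute.io.models.base import TaskParameters


class MyAnalysisParameters(TaskParameters):
    """Hypothetical parameters for an imaginary first-party Task."""

    n_frames: int = Field(1000, description="Number of frames to process.")
    threshold: float = Field(0.5, description="Hit-finding threshold.")

# If the corresponding YAML section provides `n_frames: 99.7`, non-strict
# validation casts the value to the declared int type (here truncating to 99).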
"},{"location":"source/io/config/#io.config.TaskParameters.Config","title":"Config","text":"

Configuration for parameters model.

The Config class holds Pydantic configuration. A number of LUTE-specific configuration options have also been placed here.

Attributes:

Name Type Description env_prefix str

Pydantic configuration. Will set parameters from environment variables containing this prefix. E.g. a model parameter input can be set with an environment variable: {env_prefix}input, in LUTE's case LUTE_input.

underscore_attrs_are_private bool

Pydantic configuration. Whether to hide attributes (parameters) prefixed with an underscore.

copy_on_model_validation str

Pydantic configuration. How to copy the input object passed to the class instance for model validation. Set to perform a deep copy.

allow_inf_nan bool

Pydantic configuration. Whether to allow infinity or NAN in float fields.

run_directory Optional[str]

None. If set, it should be a valid path. The Task will be run from this directory. This may be useful for some Tasks which rely on searching the working directory.

result_from_params Optional[str]

None. Optionally used to define results from information available in the model using a custom validator. E.g. use an outdir and a filename field to set result_from_params=f\"{outdir}/{filename}\". Only used if set_result==True.

result_summary Optional[str]

None. Defines a result summary that can be known after processing the Pydantic model. Use of summary depends on the Executor running the Task. All summaries are stored in the database, however. Only used if set_result==True

Source code in lute/io/models/base.py
class Config:\n    \"\"\"Configuration for parameters model.\n\n    The Config class holds Pydantic configuration. A number of LUTE-specific\n    configuration has also been placed here.\n\n    Attributes:\n        env_prefix (str): Pydantic configuration. Will set parameters from\n            environment variables containing this prefix. E.g. a model\n            parameter `input` can be set with an environment variable:\n            `{env_prefix}input`, in LUTE's case `LUTE_input`.\n\n        underscore_attrs_are_private (bool): Pydantic configuration. Whether\n            to hide attributes (parameters) prefixed with an underscore.\n\n        copy_on_model_validation (str): Pydantic configuration. How to copy\n            the input object passed to the class instance for model\n            validation. Set to perform a deep copy.\n\n        allow_inf_nan (bool): Pydantic configuration. Whether to allow\n            infinity or NAN in float fields.\n\n        run_directory (Optional[str]): None. If set, it should be a valid\n            path. The `Task` will be run from this directory. This may be\n            useful for some `Task`s which rely on searching the working\n            directory.\n\n        set_result (bool). False. If True, the model has information about\n            setting the TaskResult object from the parameters it contains.\n            E.g. it has an `output` parameter which is marked as the result.\n            The result can be set with a field value of `is_result=True` on\n            a specific parameter, or using `result_from_params` and a\n            validator.\n\n        result_from_params (Optional[str]): None. Optionally used to define\n            results from information available in the model using a custom\n            validator. E.g. use a `outdir` and `filename` field to set\n            `result_from_params=f\"{outdir}/{filename}`, etc. Only used if\n            `set_result==True`\n\n        result_summary (Optional[str]): None. Defines a result summary that\n            can be known after processing the Pydantic model. Use of summary\n            depends on the Executor running the Task. All summaries are\n            stored in the database, however. Only used if `set_result==True`\n\n        impl_schemas (Optional[str]). Specifies a the schemas the\n            output/results conform to. Only used if `set_result==True`.\n    \"\"\"\n\n    env_prefix = \"LUTE_\"\n    underscore_attrs_are_private: bool = True\n    copy_on_model_validation: str = \"deep\"\n    allow_inf_nan: bool = False\n\n    run_directory: Optional[str] = None\n    \"\"\"Set the directory that the Task is run from.\"\"\"\n    set_result: bool = False\n    \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n    result_from_params: Optional[str] = None\n    \"\"\"Defines a result from the parameters. Use a validator to do so.\"\"\"\n    result_summary: Optional[str] = None\n    \"\"\"Format a TaskResult.summary from output.\"\"\"\n    impl_schemas: Optional[str] = None\n    \"\"\"Schema specification for output result. Will be passed to TaskResult.\"\"\"\n
"},{"location":"source/io/config/#io.config.TaskParameters.Config.impl_schemas","title":"impl_schemas: Optional[str] = None class-attribute instance-attribute","text":"

Schema specification for output result. Will be passed to TaskResult.

"},{"location":"source/io/config/#io.config.TaskParameters.Config.result_from_params","title":"result_from_params: Optional[str] = None class-attribute instance-attribute","text":"

Defines a result from the parameters. Use a validator to do so.

"},{"location":"source/io/config/#io.config.TaskParameters.Config.result_summary","title":"result_summary: Optional[str] = None class-attribute instance-attribute","text":"

Format a TaskResult.summary from output.

"},{"location":"source/io/config/#io.config.TaskParameters.Config.run_directory","title":"run_directory: Optional[str] = None class-attribute instance-attribute","text":"

Set the directory that the Task is run from.

"},{"location":"source/io/config/#io.config.TaskParameters.Config.set_result","title":"set_result: bool = False class-attribute instance-attribute","text":"

Whether the Executor should mark a specified parameter as a result.

"},{"location":"source/io/config/#io.config.TemplateConfig","title":"TemplateConfig","text":"

Bases: BaseModel

Parameters used for templating of third party configuration files.

Attributes:

Name Type Description template_name str

The name of the template to use. This template must live in config/templates.

output_path str

The FULL path, including the filename, to write the rendered template to.

Source code in lute/io/models/base.py
class TemplateConfig(BaseModel):\n    \"\"\"Parameters used for templating of third party configuration files.\n\n    Attributes:\n        template_name (str): The name of the template to use. This template must\n            live in `config/templates`.\n\n        output_path (str): The FULL path, including filename to write the\n            rendered template to.\n    \"\"\"\n\n    template_name: str\n    output_path: str\n
"},{"location":"source/io/config/#io.config.TemplateParameters","title":"TemplateParameters","text":"

Class for representing parameters for third party configuration files.

These parameters can represent arbitrary data types and are used in conjunction with templates for modifying third party configuration files from the single LUTE YAML. Because arbitrary data types can be stored and a template file is used, a single instance of this class can hold anything from a single template variable to an entire configuration file. The data parsing is done by jinja using the complementary template. All data is stored in the single model variable params.

The pydantic \"dataclass\" is used over the BaseModel/Settings to allow positional argument instantiation of the params Field.

Source code in lute/io/models/base.py
@dataclass\nclass TemplateParameters:\n    \"\"\"Class for representing parameters for third party configuration files.\n\n    These parameters can represent arbitrary data types and are used in\n    conjunction with templates for modifying third party configuration files\n    from the single LUTE YAML. Due to the storage of arbitrary data types, and\n    the use of a template file, a single instance of this class can hold from a\n    single template variable to an entire configuration file. The data parsing\n    is done by jinja using the complementary template.\n    All data is stored in the single model variable `params.`\n\n    The pydantic \"dataclass\" is used over the BaseModel/Settings to allow\n    positional argument instantiation of the `params` Field.\n    \"\"\"\n\n    params: Any\n
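A minimal sketch of how values end up wrapped in this class (the wrapped values are invented). Positional instantiation is what allows the extra-field validator in ThirdPartyParameters to call TemplateParameters(value) directly.

from lute.io.models.base import TemplateParameters

# A single template variable...
single_value = TemplateParameters(10)

# ...or an arbitrarily nested block destined for a jinja template.
nested_block = TemplateParameters({"detector": "epix10k2M", "mask_center": True})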
"},{"location":"source/io/config/#io.config.TestBinaryErrParameters","title":"TestBinaryErrParameters","text":"

Bases: ThirdPartyParameters

Same as TestBinary, but exits with non-zero code.

Source code in lute/io/models/tests.py
class TestBinaryErrParameters(ThirdPartyParameters):\n    \"\"\"Same as TestBinary, but exits with non-zero code.\"\"\"\n\n    executable: str = Field(\n        \"/sdf/home/d/dorlhiac/test_tasks/test_threads_err\",\n        description=\"Multi-threaded test binary with non-zero exit code.\",\n    )\n    p_arg1: int = Field(1, description=\"Number of threads.\")\n
"},{"location":"source/io/config/#io.config.TestMultiNodeCommunicationParameters","title":"TestMultiNodeCommunicationParameters","text":"

Bases: TaskParameters

Parameters for the test Task TestMultiNodeCommunication.

Test verifies communication across multiple machines.

Source code in lute/io/models/mpi_tests.py
class TestMultiNodeCommunicationParameters(TaskParameters):\n    \"\"\"Parameters for the test Task `TestMultiNodeCommunication`.\n\n    Test verifies communication across multiple machines.\n    \"\"\"\n\n    send_obj: Literal[\"plot\", \"array\"] = Field(\n        \"array\", description=\"Object to send to Executor. `plot` or `array`\"\n    )\n    arr_size: Optional[int] = Field(\n        None, description=\"Size of array to send back to Executor.\"\n    )\n
"},{"location":"source/io/config/#io.config.TestParameters","title":"TestParameters","text":"

Bases: TaskParameters

Parameters for the test Task Test.

Source code in lute/io/models/tests.py
class TestParameters(TaskParameters):\n    \"\"\"Parameters for the test Task `Test`.\"\"\"\n\n    float_var: float = Field(0.01, description=\"A floating point number.\")\n    str_var: str = Field(\"test\", description=\"A string.\")\n\n    class CompoundVar(BaseModel):\n        int_var: int = 1\n        dict_var: Dict[str, str] = {\"a\": \"b\"}\n\n    compound_var: CompoundVar = Field(\n        description=(\n            \"A compound parameter - consists of a `int_var` (int) and `dict_var`\"\n            \" (Dict[str, str]).\"\n        )\n    )\n    throw_error: bool = Field(\n        False, description=\"If `True`, raise an exception to test error handling.\"\n    )\n
"},{"location":"source/io/config/#io.config.ThirdPartyParameters","title":"ThirdPartyParameters","text":"

Bases: TaskParameters

Base class for third party task parameters.

Contains special validators for extra arguments and handling of parameters used for filling in third party configuration files.

Source code in lute/io/models/base.py
class ThirdPartyParameters(TaskParameters):\n    \"\"\"Base class for third party task parameters.\n\n    Contains special validators for extra arguments and handling of parameters\n    used for filling in third party configuration files.\n    \"\"\"\n\n    class Config(TaskParameters.Config):\n        \"\"\"Configuration for parameters model.\n\n        The Config class holds Pydantic configuration and inherited configuration\n        from the base `TaskParameters.Config` class. A number of values are also\n        overridden, and there are some specific configuration options to\n        ThirdPartyParameters. A full list of options (with TaskParameters options\n        repeated) is described below.\n\n        Attributes:\n            env_prefix (str): Pydantic configuration. Will set parameters from\n                environment variables containing this prefix. E.g. a model\n                parameter `input` can be set with an environment variable:\n                `{env_prefix}input`, in LUTE's case `LUTE_input`.\n\n            underscore_attrs_are_private (bool): Pydantic configuration. Whether\n                to hide attributes (parameters) prefixed with an underscore.\n\n            copy_on_model_validation (str): Pydantic configuration. How to copy\n                the input object passed to the class instance for model\n                validation. Set to perform a deep copy.\n\n            allow_inf_nan (bool): Pydantic configuration. Whether to allow\n                infinity or NAN in float fields.\n\n            run_directory (Optional[str]): None. If set, it should be a valid\n                path. The `Task` will be run from this directory. This may be\n                useful for some `Task`s which rely on searching the working\n                directory.\n\n            set_result (bool). True. If True, the model has information about\n                setting the TaskResult object from the parameters it contains.\n                E.g. it has an `output` parameter which is marked as the result.\n                The result can be set with a field value of `is_result=True` on\n                a specific parameter, or using `result_from_params` and a\n                validator.\n\n            result_from_params (Optional[str]): None. Optionally used to define\n                results from information available in the model using a custom\n                validator. E.g. use a `outdir` and `filename` field to set\n                `result_from_params=f\"{outdir}/{filename}`, etc.\n\n            result_summary (Optional[str]): None. Defines a result summary that\n                can be known after processing the Pydantic model. Use of summary\n                depends on the Executor running the Task. All summaries are\n                stored in the database, however.\n\n            impl_schemas (Optional[str]). Specifies a the schemas the\n                output/results conform to. Only used if set_result is True.\n\n            -----------------------\n            ThirdPartyTask-specific:\n\n            extra (str): \"allow\". Pydantic configuration. Allow (or ignore) extra\n                arguments.\n\n            short_flags_use_eq (bool): False. If True, \"short\" command-line args\n                are passed as `-x=arg`. ThirdPartyTask-specific.\n\n            long_flags_use_eq (bool): False. If True, \"long\" command-line args\n                are passed as `--long=arg`. 
ThirdPartyTask-specific.\n        \"\"\"\n\n        extra: str = \"allow\"\n        short_flags_use_eq: bool = False\n        \"\"\"Whether short command-line arguments are passed like `-x=arg`.\"\"\"\n        long_flags_use_eq: bool = False\n        \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n        set_result: bool = True\n        \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n    # lute_template_cfg: TemplateConfig\n\n    @root_validator(pre=False)\n    def extra_fields_to_thirdparty(cls, values: Dict[str, Any]):\n        for key in values:\n            if key not in cls.__fields__:\n                values[key] = TemplateParameters(values[key])\n\n        return values\n
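To summarize the pattern, a hypothetical third-party model might look like the sketch below. The class name, executable path, and field names are invented; only the flag_type convention mirrors real models such as SubmitSMDParameters. Any additional keys present in the YAML but absent from the model are wrapped into TemplateParameters by the extra_fields_to_thirdparty validator for use in templating.

from pydantic import Field

from lute.io.models.base import TemplateConfig, ThirdPartyParameters


class MyBinaryParameters(ThirdPartyParameters):
    """Hypothetical parameters for running an external binary."""

    executable: str = Field("/path/to/my_binary", description="Binary to run.", flag_type="")
    n: int = Field(4, description="Number of threads.", flag_type="-")        # short flag: -n
    outdir: str = Field("", description="Output directory.", flag_type="--")  # long flag: --outdir

    # Optionally render a config file from a template in config/templates.
    lute_template_cfg: TemplateConfig = TemplateConfig(
        template_name="", output_path=""
    )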
"},{"location":"source/io/config/#io.config.ThirdPartyParameters.Config","title":"Config","text":"

Bases: Config

Configuration for parameters model.

The Config class holds Pydantic configuration and inherited configuration from the base TaskParameters.Config class. A number of values are also overridden, and there are some configuration options specific to ThirdPartyParameters. A full list of options (with the TaskParameters options repeated) is described below.

Attributes:

Name Type Description env_prefix str

Pydantic configuration. Will set parameters from environment variables containing this prefix. E.g. a model parameter input can be set with an environment variable: {env_prefix}input, in LUTE's case LUTE_input.

underscore_attrs_are_private bool

Pydantic configuration. Whether to hide attributes (parameters) prefixed with an underscore.

copy_on_model_validation str

Pydantic configuration. How to copy the input object passed to the class instance for model validation. Set to perform a deep copy.

allow_inf_nan bool

Pydantic configuration. Whether to allow infinity or NAN in float fields.

run_directory Optional[str]

None. If set, it should be a valid path. The Task will be run from this directory. This may be useful for some Tasks which rely on searching the working directory.

result_from_params Optional[str]

None. Optionally used to define results from information available in the model using a custom validator. E.g. use an outdir and a filename field to set result_from_params=f\"{outdir}/{filename}\".

result_summary Optional[str]

None. Defines a result summary that can be known after processing the Pydantic model. Use of summary depends on the Executor running the Task. All summaries are stored in the database, however.

ThirdPartyTask-specific: extra str

\"allow\". Pydantic configuration. Allow (or ignore) extra arguments.

short_flags_use_eq bool

False. If True, \"short\" command-line args are passed as -x=arg. ThirdPartyTask-specific.

long_flags_use_eq bool

False. If True, \"long\" command-line args are passed as --long=arg. ThirdPartyTask-specific.

Source code in lute/io/models/base.py
class Config(TaskParameters.Config):\n    \"\"\"Configuration for parameters model.\n\n    The Config class holds Pydantic configuration and inherited configuration\n    from the base `TaskParameters.Config` class. A number of values are also\n    overridden, and there are some specific configuration options to\n    ThirdPartyParameters. A full list of options (with TaskParameters options\n    repeated) is described below.\n\n    Attributes:\n        env_prefix (str): Pydantic configuration. Will set parameters from\n            environment variables containing this prefix. E.g. a model\n            parameter `input` can be set with an environment variable:\n            `{env_prefix}input`, in LUTE's case `LUTE_input`.\n\n        underscore_attrs_are_private (bool): Pydantic configuration. Whether\n            to hide attributes (parameters) prefixed with an underscore.\n\n        copy_on_model_validation (str): Pydantic configuration. How to copy\n            the input object passed to the class instance for model\n            validation. Set to perform a deep copy.\n\n        allow_inf_nan (bool): Pydantic configuration. Whether to allow\n            infinity or NAN in float fields.\n\n        run_directory (Optional[str]): None. If set, it should be a valid\n            path. The `Task` will be run from this directory. This may be\n            useful for some `Task`s which rely on searching the working\n            directory.\n\n        set_result (bool). True. If True, the model has information about\n            setting the TaskResult object from the parameters it contains.\n            E.g. it has an `output` parameter which is marked as the result.\n            The result can be set with a field value of `is_result=True` on\n            a specific parameter, or using `result_from_params` and a\n            validator.\n\n        result_from_params (Optional[str]): None. Optionally used to define\n            results from information available in the model using a custom\n            validator. E.g. use a `outdir` and `filename` field to set\n            `result_from_params=f\"{outdir}/{filename}`, etc.\n\n        result_summary (Optional[str]): None. Defines a result summary that\n            can be known after processing the Pydantic model. Use of summary\n            depends on the Executor running the Task. All summaries are\n            stored in the database, however.\n\n        impl_schemas (Optional[str]). Specifies a the schemas the\n            output/results conform to. Only used if set_result is True.\n\n        -----------------------\n        ThirdPartyTask-specific:\n\n        extra (str): \"allow\". Pydantic configuration. Allow (or ignore) extra\n            arguments.\n\n        short_flags_use_eq (bool): False. If True, \"short\" command-line args\n            are passed as `-x=arg`. ThirdPartyTask-specific.\n\n        long_flags_use_eq (bool): False. If True, \"long\" command-line args\n            are passed as `--long=arg`. ThirdPartyTask-specific.\n    \"\"\"\n\n    extra: str = \"allow\"\n    short_flags_use_eq: bool = False\n    \"\"\"Whether short command-line arguments are passed like `-x=arg`.\"\"\"\n    long_flags_use_eq: bool = False\n    \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n    set_result: bool = True\n    \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n
"},{"location":"source/io/config/#io.config.ThirdPartyParameters.Config.long_flags_use_eq","title":"long_flags_use_eq: bool = False class-attribute instance-attribute","text":"

Whether long command-line arguments are passed like --long=arg.

"},{"location":"source/io/config/#io.config.ThirdPartyParameters.Config.set_result","title":"set_result: bool = True class-attribute instance-attribute","text":"

Whether the Executor should mark a specified parameter as a result.

"},{"location":"source/io/config/#io.config.ThirdPartyParameters.Config.short_flags_use_eq","title":"short_flags_use_eq: bool = False class-attribute instance-attribute","text":"

Whether short command-line arguments are passed like -x=arg.

"},{"location":"source/io/config/#io.config.parse_config","title":"parse_config(task_name='test', config_path='')","text":"

Parse a configuration file and validate the contents.

Parameters:

Name Type Description Default task_name str

Name of the specific task that will be run.

'test' config_path str

Path to the configuration file.

''

Returns:

Name Type Description params TaskParameters

A TaskParameters object of validated task-specific parameters. Parameters are accessed with \"dot\" notation. E.g. params.param1.

Raises:

Type Description ValidationError

Raised if there are problems with the configuration file. Passed through from Pydantic.

Source code in lute/io/config.py
def parse_config(task_name: str = \"test\", config_path: str = \"\") -> TaskParameters:\n    \"\"\"Parse a configuration file and validate the contents.\n\n    Args:\n        task_name (str): Name of the specific task that will be run.\n\n        config_path (str): Path to the configuration file.\n\n    Returns:\n        params (TaskParameters): A TaskParameters object of validated\n            task-specific parameters. Parameters are accessed with \"dot\"\n            notation. E.g. `params.param1`.\n\n    Raises:\n        ValidationError: Raised if there are problems with the configuration\n            file. Passed through from Pydantic.\n    \"\"\"\n    task_config_name: str = f\"{task_name}Parameters\"\n\n    with open(config_path, \"r\") as f:\n        docs: Iterator[Dict[str, Any]] = yaml.load_all(stream=f, Loader=yaml.FullLoader)\n        header: Dict[str, Any] = next(docs)\n        config: Dict[str, Any] = next(docs)\n    substitute_variables(header, header)\n    substitute_variables(header, config)\n    LUTE_DEBUG_EXIT(\"LUTE_DEBUG_EXIT_AT_YAML\", pprint.pformat(config))\n    lute_config: Dict[str, AnalysisHeader] = {\"lute_config\": AnalysisHeader(**header)}\n    try:\n        task_config: Dict[str, Any] = dict(config[task_name])\n        lute_config.update(task_config)\n    except KeyError as err:\n        warnings.warn(\n            (\n                f\"{task_name} has no parameter definitions in YAML file.\"\n                \" Attempting default parameter initialization.\"\n            )\n        )\n    parsed_parameters: TaskParameters = globals()[task_config_name](**lute_config)\n    return parsed_parameters\n
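A short usage sketch (the YAML path is a placeholder; Test is one of the bundled test Tasks):

from lute.io.config import parse_config

params = parse_config(task_name="Test", config_path="/path/to/config.yaml")
# Validated parameters are accessed with dot notation.
print(params.float_var, params.str_var)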
"},{"location":"source/io/config/#io.config.substitute_variables","title":"substitute_variables(header, config, curr_key=None)","text":"

Performs variable substitutions on a dictionary read from config YAML file.

Can be used to define input parameters in terms of other input parameters. This is similar to functionality employed by validators for parameters in the specific Task models, but is intended to be more accessible to users. Variable substitutions are defined using a minimal syntax from Jinja: {{ experiment }} defines a substitution of the variable experiment. The characters {{ }} can be escaped if the literal symbols are needed in place.

For example, a path to a file can be defined in terms of experiment and run values in the config file:

MyTask:
  experiment: myexp
  run: 2
  special_file: /path/to/{{ experiment }}/{{ run }}/file.inp

Acceptable variables for substitutions are values defined elsewhere in the YAML file. Environment variables can also be used if prefaced with a $ character. E.g. to get the experiment from an environment variable:

MyTask:
  run: 2
  special_file: /path/to/{{ $EXPERIMENT }}/{{ run }}/file.inp

Parameters:

Name Type Description Default config Dict[str, Any]

A dictionary of parsed configuration.

required curr_key Optional[str]

Used to keep track of recursion level when scanning through iterable items in the config dictionary.

None

Returns:

Name Type Description subbed_config Dict[str, Any]

The config dictionary after substitutions have been made. May be identical to the input if no substitutions are needed. Note that the substitutions are applied in place on the passed dictionary; the function itself returns None.

Source code in lute/io/config.py
def substitute_variables(\n    header: Dict[str, Any], config: Dict[str, Any], curr_key: Optional[str] = None\n) -> None:\n    \"\"\"Performs variable substitutions on a dictionary read from config YAML file.\n\n    Can be used to define input parameters in terms of other input parameters.\n    This is similar to functionality employed by validators for parameters in\n    the specific Task models, but is intended to be more accessible to users.\n    Variable substitutions are defined using a minimal syntax from Jinja:\n                               {{ experiment }}\n    defines a substitution of the variable `experiment`. The characters `{{ }}`\n    can be escaped if the literal symbols are needed in place.\n\n    For example, a path to a file can be defined in terms of experiment and run\n    values in the config file:\n        MyTask:\n          experiment: myexp\n          run: 2\n          special_file: /path/to/{{ experiment }}/{{ run }}/file.inp\n\n    Acceptable variables for substitutions are values defined elsewhere in the\n    YAML file. Environment variables can also be used if prefaced with a `$`\n    character. E.g. to get the experiment from an environment variable:\n        MyTask:\n          run: 2\n          special_file: /path/to/{{ $EXPERIMENT }}/{{ run }}/file.inp\n\n    Args:\n        config (Dict[str, Any]):  A dictionary of parsed configuration.\n\n        curr_key (Optional[str]): Used to keep track of recursion level when scanning\n            through iterable items in the config dictionary.\n\n    Returns:\n        subbed_config (Dict[str, Any]): The config dictionary after substitutions\n            have been made. May be identical to the input if no substitutions are\n            needed.\n    \"\"\"\n    _sub_pattern = r\"\\{\\{[^}{]*\\}\\}\"\n    iterable: Dict[str, Any] = config\n    if curr_key is not None:\n        # Need to handle nested levels by interpreting curr_key\n        keys_by_level: List[str] = curr_key.split(\".\")\n        for key in keys_by_level:\n            iterable = iterable[key]\n    else:\n        ...\n        # iterable = config\n    for param, value in iterable.items():\n        if isinstance(value, dict):\n            new_key: str\n            if curr_key is None:\n                new_key = param\n            else:\n                new_key = f\"{curr_key}.{param}\"\n            substitute_variables(header, config, curr_key=new_key)\n        elif isinstance(value, list):\n            ...\n        # Scalars str - we skip numeric types\n        elif isinstance(value, str):\n            matches: List[str] = re.findall(_sub_pattern, value)\n            for m in matches:\n                key_to_sub_maybe_with_fmt: List[str] = m[2:-2].strip().split(\":\")\n                key_to_sub: str = key_to_sub_maybe_with_fmt[0]\n                fmt: Optional[str] = None\n                if len(key_to_sub_maybe_with_fmt) == 2:\n                    fmt = key_to_sub_maybe_with_fmt[1]\n                sub: Any\n                if key_to_sub[0] == \"$\":\n                    sub = os.getenv(key_to_sub[1:], None)\n                    if sub is None:\n                        print(\n                            f\"Environment variable {key_to_sub[1:]} not found! 
Cannot substitute in YAML config!\",\n                            flush=True,\n                        )\n                        continue\n                    # substitutions from env vars will be strings, so convert back\n                    # to numeric in order to perform formatting later on (e.g. {var:04d})\n                    sub = _check_str_numeric(sub)\n                else:\n                    try:\n                        sub = config\n                        for key in key_to_sub.split(\".\"):\n                            sub = sub[key]\n                    except KeyError:\n                        sub = header[key_to_sub]\n                pattern: str = (\n                    m.replace(\"{{\", r\"\\{\\{\").replace(\"}}\", r\"\\}\\}\").replace(\"$\", r\"\\$\")\n                )\n                if fmt is not None:\n                    sub = f\"{sub:{fmt}}\"\n                else:\n                    sub = f\"{sub}\"\n                iterable[param] = re.sub(pattern, sub, iterable[param])\n            # Reconvert back to numeric values if needed...\n            iterable[param] = _check_str_numeric(iterable[param])\n
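The function is used internally by parse_config, but the call pattern can be sketched as follows (the YAML path is a placeholder; substitutions are applied in place):

import yaml

from lute.io.config import substitute_variables

with open("/path/to/config.yaml", "r") as f:
    header, config = list(yaml.load_all(f, Loader=yaml.FullLoader))

substitute_variables(header, header)  # resolve substitutions within the header itself
substitute_variables(header, config)  # resolve {{ ... }} in the per-Task parameters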
"},{"location":"source/io/db/","title":"db","text":"

Tools for working with the LUTE parameter and configuration database.

The current implementation relies on an SQLite backend database. This may change in the future, so relatively few high-level API functions are intended to be public. These abstract away the details of the database interface and work exclusively on LUTE objects.

Functions:

Name Description record_analysis_db(cfg: DescribedAnalysis) -> None

Writes the configuration to the backend database.

read_latest_db_entry(db_dir: str, task_name: str, param: str) -> Any

Retrieve the most recent entry from a database for a specific Task.

Raises:

Type Description DatabaseError

Generic exception raised for LUTE database errors.

"},{"location":"source/io/db/#io.db.DatabaseError","title":"DatabaseError","text":"

Bases: Exception

General LUTE database error.

Source code in lute/io/db.py
class DatabaseError(Exception):\n    \"\"\"General LUTE database error.\"\"\"\n\n    ...\n
"},{"location":"source/io/db/#io.db.read_latest_db_entry","title":"read_latest_db_entry(db_dir, task_name, param, valid_only=True)","text":"

Read most recent value entered into the database for a Task parameter.

(Will be updated for schema compliance as well as Task name.)

Parameters:

Name Type Description Default db_dir str

Database location.

required task_name str

The name of the Task to check the database for.

required param str

The parameter name for the Task that we want to retrieve.

required valid_only bool

Whether to consider only valid results or not. E.g. An input file may be useful even if the Task result is invalid (Failed). Default = True.

True

Returns:

Name Type Description val Any

The most recently entered value for param of task_name that can be found in the database. Returns None if nothing found.

Source code in lute/io/db.py
def read_latest_db_entry(\n    db_dir: str, task_name: str, param: str, valid_only: bool = True\n) -> Optional[Any]:\n    \"\"\"Read most recent value entered into the database for a Task parameter.\n\n    (Will be updated for schema compliance as well as Task name.)\n\n    Args:\n        db_dir (str): Database location.\n\n        task_name (str): The name of the Task to check the database for.\n\n        param (str): The parameter name for the Task that we want to retrieve.\n\n        valid_only (bool): Whether to consider only valid results or not. E.g.\n            An input file may be useful even if the Task result is invalid\n            (Failed). Default = True.\n\n    Returns:\n        val (Any): The most recently entered value for `param` of `task_name`\n            that can be found in the database. Returns None if nothing found.\n    \"\"\"\n    import sqlite3\n    from ._sqlite import _select_from_db\n\n    con: sqlite3.Connection = sqlite3.Connection(f\"{db_dir}/lute.db\")\n    with con:\n        try:\n            cond: Dict[str, str] = {}\n            if valid_only:\n                cond = {\"valid_flag\": \"1\"}\n            entry: Any = _select_from_db(con, task_name, param, cond)\n        except sqlite3.OperationalError as err:\n            logger.debug(f\"Cannot retrieve value {param} due to: {err}\")\n            entry = None\n    return entry\n
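A usage sketch for chaining Tasks: read a value produced by an earlier managed Task from the database so it can serve as input to a later one. The database location, Task name, and parameter name below are placeholders.

from lute.io.db import read_latest_db_entry

previous_value = read_latest_db_entry(
    db_dir="/path/to/work_dir",  # directory containing lute.db
    task_name="SubmitSMD",
    param="directory",
    valid_only=True,             # ignore entries from failed Task runs
)
if previous_value is None:
    print("No matching entry found in the database.")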
"},{"location":"source/io/db/#io.db.record_analysis_db","title":"record_analysis_db(cfg)","text":"

Write a DescribedAnalysis object to the database.

The DescribedAnalysis object is maintained by the Executor and contains all information necessary to fully describe a single Task execution. The contained fields are split across multiple tables within the database as some of the information can be shared across multiple Tasks. Refer to docs/design/database.md for more information on the database specification.

Source code in lute/io/db.py
def record_analysis_db(cfg: DescribedAnalysis) -> None:\n    \"\"\"Write an DescribedAnalysis object to the database.\n\n    The DescribedAnalysis object is maintained by the Executor and contains all\n    information necessary to fully describe a single `Task` execution. The\n    contained fields are split across multiple tables within the database as\n    some of the information can be shared across multiple Tasks. Refer to\n    `docs/design/database.md` for more information on the database specification.\n    \"\"\"\n    import sqlite3\n    from ._sqlite import (\n        _make_shared_table,\n        _make_task_table,\n        _add_row_no_duplicate,\n        _add_task_entry,\n    )\n\n    try:\n        work_dir: str = cfg.task_parameters.lute_config.work_dir\n    except AttributeError:\n        logger.info(\n            (\n                \"Unable to access TaskParameters object. Likely wasn't created. \"\n                \"Cannot store result.\"\n            )\n        )\n        return\n    del cfg.task_parameters.lute_config.work_dir\n\n    exec_entry, exec_columns = _cfg_to_exec_entry_cols(cfg)\n    task_name: str = cfg.task_result.task_name\n    # All `Task`s have an AnalysisHeader, but this info can be shared so is\n    # split into a different table\n    (\n        task_entry,  # Dict[str, Any]\n        task_columns,  # Dict[str, str]\n        gen_entry,  # Dict[str, Any]\n        gen_columns,  # Dict[str, str]\n    ) = _params_to_entry_cols(cfg.task_parameters)\n    x, y = _result_to_entry_cols(cfg.task_result)\n    task_entry.update(x)\n    task_columns.update(y)\n\n    con: sqlite3.Connection = sqlite3.Connection(f\"{work_dir}/lute.db\")\n    with con:\n        # --- Table Creation ---#\n        if not _make_shared_table(con, \"gen_cfg\", gen_columns):\n            raise DatabaseError(\"Could not make general configuration table!\")\n        if not _make_shared_table(con, \"exec_cfg\", exec_columns):\n            raise DatabaseError(\"Could not make Executor configuration table!\")\n        if not _make_task_table(con, task_name, task_columns):\n            raise DatabaseError(f\"Could not make Task table for: {task_name}!\")\n\n        # --- Row Addition ---#\n        gen_id: int = _add_row_no_duplicate(con, \"gen_cfg\", gen_entry)\n        exec_id: int = _add_row_no_duplicate(con, \"exec_cfg\", exec_entry)\n\n        full_task_entry: Dict[str, Any] = {\n            \"gen_cfg_id\": gen_id,\n            \"exec_cfg_id\": exec_id,\n        }\n        full_task_entry.update(task_entry)\n        # Prepare flag to indicate whether the task entry is valid or not\n        # By default we say it is assuming proper completion\n        valid_flag: int = (\n            1 if cfg.task_result.task_status == TaskStatus.COMPLETED else 0\n        )\n        full_task_entry.update({\"valid_flag\": valid_flag})\n\n        _add_task_entry(con, task_name, full_task_entry)\n
"},{"location":"source/io/elog/","title":"elog","text":"

Provides utilities for communicating with the LCLS eLog.

Makes use of various eLog API endpoints to retrieve information or post results.

Functions:

Name Description get_elog_opr_auth(exp: str)

Return an authorization object to interact with the eLog API as an opr account for the hutch where exp was conducted.

get_elog_kerberos_auth()

Return the authorization headers for the user account submitting the job.

elog_http_request(exp: str, endpoint: str, request_type: str, **params)

Make an HTTP request to the API endpoint at url.

format_file_for_post(in_file: Union[str, tuple, list])

Prepare files according to the specification needed to add them as attachments to eLog posts.

post_elog_message(exp: str, msg: str, tag: Optional[str], title: Optional[str], in_files: List[Union[str, tuple, list]], auth: Optional[Union[HTTPBasicAuth, Dict]] = None)

Post a message to the eLog.

post_elog_run_status(data: Dict[str, Union[str, int, float]], update_url: Optional[str] = None)

Post a run status to the summary section on the Workflows>Control tab.

post_elog_run_table(exp: str, run: int, data: Dict[str, Any], auth: Optional[Union[HTTPBasicAuth, Dict]] = None)

Update run table in the eLog.

get_elog_runs_by_tag(exp: str, tag: str, auth: Optional[Union[HTTPBasicAuth, Dict]] = None)

Return a list of runs with a specific tag.

get_elog_params_by_run(exp: str, params: List[str], runs: Optional[List[int]])

Retrieve the requested parameters by run. If no run is provided, retrieve the requested parameters for all runs.

"},{"location":"source/io/elog/#io.elog.elog_http_request","title":"elog_http_request(exp, endpoint, request_type, **params)","text":"

Make an HTTP request to the eLog.

This method will determine the proper authorization method and update the passed parameters appropriately. Functions implementing specific endpoint functionality and calling this function should only pass the necessary endpoint-specific parameters and not include the authorization objects.

Parameters:

Name Type Description Default exp str

Experiment.

required endpoint str

eLog API endpoint.

required request_type str

Type of request to make. Recognized options: POST or GET.

required **params Dict

Endpoint parameters to pass with the HTTP request! Differs depending on the API endpoint. Do not include auth objects.

{}

Returns:

Name Type Description status_code int

Response status code. Can be checked for errors.

msg str

An error message, or a message saying SUCCESS.

value Optional[Any]

For GET requests ONLY, return the requested information.

Source code in lute/io/elog.py
def elog_http_request(\n    exp: str, endpoint: str, request_type: str, **params\n) -> Tuple[int, str, Optional[Any]]:\n    \"\"\"Make an HTTP request to the eLog.\n\n    This method will determine the proper authorization method and update the\n    passed parameters appropriately. Functions implementing specific endpoint\n    functionality and calling this function should only pass the necessary\n    endpoint-specific parameters and not include the authorization objects.\n\n    Args:\n        exp (str): Experiment.\n\n        endpoint (str): eLog API endpoint.\n\n        request_type (str): Type of request to make. Recognized options: POST or\n            GET.\n\n        **params (Dict): Endpoint parameters to pass with the HTTP request!\n            Differs depending on the API endpoint. Do not include auth objects.\n\n    Returns:\n        status_code (int): Response status code. Can be checked for errors.\n\n        msg (str): An error message, or a message saying SUCCESS.\n\n        value (Optional[Any]): For GET requests ONLY, return the requested\n            information.\n    \"\"\"\n    auth: Union[HTTPBasicAuth, Dict[str, str]] = get_elog_auth(exp)\n    base_url: str\n    if isinstance(auth, HTTPBasicAuth):\n        params.update({\"auth\": auth})\n        base_url = \"https://pswww.slac.stanford.edu/ws-auth/lgbk/lgbk\"\n    elif isinstance(auth, dict):\n        params.update({\"headers\": auth})\n        base_url = \"https://pswww.slac.stanford.edu/ws-kerb/lgbk/lgbk\"\n\n    url: str = f\"{base_url}/{endpoint}\"\n\n    resp: requests.models.Response\n    if request_type.upper() == \"POST\":\n        resp = requests.post(url, **params)\n    elif request_type.upper() == \"GET\":\n        resp = requests.get(url, **params)\n    else:\n        return (-1, \"Invalid request type!\", None)\n\n    status_code: int = resp.status_code\n    msg: str = \"SUCCESS\"\n\n    if resp.json()[\"success\"] and request_type.upper() == \"GET\":\n        return (status_code, msg, resp.json()[\"value\"])\n\n    if status_code >= 300:\n        msg = f\"Error when posting to eLog: Response {status_code}\"\n\n    if not resp.json()[\"success\"]:\n        err_msg = resp.json()[\"error_msg\"]\n        msg += f\"\\nInclude message: {err_msg}\"\n    return (resp.status_code, msg, None)\n
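A hedged sketch of a GET request. The experiment name is a placeholder; the endpoint is the one used by get_elog_runs_by_tag further below.

from lute.io.elog import elog_http_request

status_code, msg, value = elog_http_request(
    exp="myexp123",
    endpoint="myexp123/ws/get_runs_with_tag?tag=DARK",
    request_type="GET",
)
if msg != "SUCCESS":
    print(f"eLog request failed ({status_code}): {msg}")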
"},{"location":"source/io/elog/#io.elog.format_file_for_post","title":"format_file_for_post(in_file)","text":"

Format a file for attachment to an eLog post.

The eLog API expects a specifically formatted tuple when adding file attachments. This function prepares the tuple to specification given a number of different input types.

Parameters:

Name Type Description Default in_file str | tuple | list

File to include as an attachment in an eLog post.

required Source code in lute/io/elog.py
def format_file_for_post(\n    in_file: Union[str, tuple, list]\n) -> Tuple[str, Tuple[str, BufferedReader], Any]:\n    \"\"\"Format a file for attachment to an eLog post.\n\n    The eLog API expects a specifically formatted tuple when adding file\n    attachments. This function prepares the tuple to specification given a\n    number of different input types.\n\n    Args:\n        in_file (str | tuple | list): File to include as an attachment in an\n            eLog post.\n    \"\"\"\n    description: str\n    fptr: BufferedReader\n    ftype: Optional[str]\n    if isinstance(in_file, str):\n        description = os.path.basename(in_file)\n        fptr = open(in_file, \"rb\")\n        ftype = mimetypes.guess_type(in_file)[0]\n    elif isinstance(in_file, tuple) or isinstance(in_file, list):\n        description = in_file[1]\n        fptr = open(in_file[0], \"rb\")\n        ftype = mimetypes.guess_type(in_file[0])[0]\n    else:\n        raise ElogFileFormatError(f\"Unrecognized format: {in_file}\")\n\n    out_file: Tuple[str, Tuple[str, BufferedReader], Any] = (\n        \"files\",\n        (description, fptr),\n        ftype,\n    )\n    return out_file\n
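The accepted input forms can be sketched as follows (file paths and descriptions are placeholders):

from lute.io.elog import format_file_for_post

# A bare path: the basename is used as the attachment description.
attachment = format_file_for_post("/path/to/figure.png")

# A (path, description) tuple or list: the description is set explicitly.
attachment = format_file_for_post(("/path/to/figure.png", "Azimuthal average"))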
"},{"location":"source/io/elog/#io.elog.get_elog_active_expmt","title":"get_elog_active_expmt(hutch, *, endstation=0)","text":"

Get the current active experiment for a hutch.

This function is one of two that manage the HTTP request independently. This is because it does not require an authorization object, and its result is needed for the generic function elog_http_request to work properly.

Parameters:

Name Type Description Default hutch str

The hutch to get the active experiment for.

required endstation int

The hutch endstation to get the experiment for. This should generally be 0.

0 Source code in lute/io/elog.py
def get_elog_active_expmt(hutch: str, *, endstation: int = 0) -> str:\n    \"\"\"Get the current active experiment for a hutch.\n\n    This function is one of two functions to manage the HTTP request independently.\n    This is because it does not require an authorization object, and its result\n    is needed for the generic function `elog_http_request` to work properly.\n\n    Args:\n        hutch (str): The hutch to get the active experiment for.\n\n        endstation (int): The hutch endstation to get the experiment for. This\n            should generally be 0.\n    \"\"\"\n\n    base_url: str = \"https://pswww.slac.stanford.edu/ws/lgbk/lgbk\"\n    endpoint: str = \"ws/activeexperiment_for_instrument_station\"\n    url: str = f\"{base_url}/{endpoint}\"\n    params: Dict[str, str] = {\"instrument_name\": hutch, \"station\": f\"{endstation}\"}\n    resp: requests.models.Response = requests.get(url, params)\n    if resp.status_code > 300:\n        raise RuntimeError(\n            f\"Error getting current experiment!\\n\\t\\tIncorrect hutch: '{hutch}'?\"\n        )\n    if resp.json()[\"success\"]:\n        return resp.json()[\"value\"][\"name\"]\n    else:\n        msg: str = resp.json()[\"error_msg\"]\n        raise RuntimeError(f\"Error getting current experiment! Err: {msg}\")\n
"},{"location":"source/io/elog/#io.elog.get_elog_auth","title":"get_elog_auth(exp)","text":"

Determine the appropriate auth method depending on experiment state.

Returns:

Name Type Description auth HTTPBasicAuth | Dict[str, str]

Depending on whether an experiment is active/live, returns authorization for the hutch operator account or the current user submitting a job.

Source code in lute/io/elog.py
def get_elog_auth(exp: str) -> Union[HTTPBasicAuth, Dict[str, str]]:\n    \"\"\"Determine the appropriate auth method depending on experiment state.\n\n    Returns:\n        auth (HTTPBasicAuth | Dict[str, str]): Depending on whether an experiment\n            is active/live, returns authorization for the hutch operator account\n            or the current user submitting a job.\n    \"\"\"\n    hutch: str = exp[:3]\n    if exp.lower() == get_elog_active_expmt(hutch=hutch).lower():\n        return get_elog_opr_auth(exp)\n    else:\n        return get_elog_kerberos_auth()\n
"},{"location":"source/io/elog/#io.elog.get_elog_kerberos_auth","title":"get_elog_kerberos_auth()","text":"

Returns Kerberos authorization key.

This function returns authorization for the USER account submitting jobs. It assumes that kinit has been run.

Returns:

Name Type Description auth Dict[str, str]

Dictionary containing Kerberos authorization key.

Source code in lute/io/elog.py
def get_elog_kerberos_auth() -> Dict[str, str]:\n    \"\"\"Returns Kerberos authorization key.\n\n    This functions returns authorization for the USER account submitting jobs.\n    It assumes that `kinit` has been run.\n\n    Returns:\n        auth (Dict[str, str]): Dictionary containing Kerberos authorization key.\n    \"\"\"\n    from krtc import KerberosTicket\n\n    return KerberosTicket(\"HTTP@pswww.slac.stanford.edu\").getAuthHeaders()\n
"},{"location":"source/io/elog/#io.elog.get_elog_opr_auth","title":"get_elog_opr_auth(exp)","text":"

Produce authentication for the \"opr\" user associated to an experiment.

This method uses basic authentication using username and password.

Parameters:

Name Type Description Default exp str

Name of the experiment to produce authentication for.

required

Returns:

Name Type Description auth HTTPBasicAuth

HTTPBasicAuth for an active experiment based on username and password for the associated operator account.

Source code in lute/io/elog.py
def get_elog_opr_auth(exp: str) -> HTTPBasicAuth:\n    \"\"\"Produce authentication for the \"opr\" user associated to an experiment.\n\n    This method uses basic authentication using username and password.\n\n    Args:\n        exp (str): Name of the experiment to produce authentication for.\n\n    Returns:\n        auth (HTTPBasicAuth): HTTPBasicAuth for an active experiment based on\n            username and password for the associated operator account.\n    \"\"\"\n    opr: str = f\"{exp[:3]}opr\"\n    with open(\"/sdf/group/lcls/ds/tools/forElogPost.txt\", \"r\") as f:\n        pw: str = f.readline()[:-1]\n    return HTTPBasicAuth(opr, pw)\n
"},{"location":"source/io/elog/#io.elog.get_elog_params_by_run","title":"get_elog_params_by_run(exp, params, runs=None)","text":"

Retrieve requested parameters by run or for all runs.

Parameters:

Name Type Description Default exp str

Experiment to retrieve parameters for.

required params List[str]

A list of parameters to retrieve. These can be any parameter recorded in the eLog (PVs, parameters posted by other Tasks, etc.)

required Source code in lute/io/elog.py
def get_elog_params_by_run(\n    exp: str, params: List[str], runs: Optional[List[int]] = None\n) -> Dict[str, str]:\n    \"\"\"Retrieve requested parameters by run or for all runs.\n\n    Args:\n        exp (str): Experiment to retrieve parameters for.\n\n        params (List[str]): A list of parameters to retrieve. These can be any\n            parameter recorded in the eLog (PVs, parameters posted by other\n            Tasks, etc.)\n    \"\"\"\n    ...\n
"},{"location":"source/io/elog/#io.elog.get_elog_runs_by_tag","title":"get_elog_runs_by_tag(exp, tag, auth=None)","text":"

Retrieve run numbers with a specified tag.

Parameters:

Name Type Description Default exp str

Experiment name.

required tag str

The tag to retrieve runs for.

required Source code in lute/io/elog.py
def get_elog_runs_by_tag(\n    exp: str, tag: str, auth: Optional[Union[HTTPBasicAuth, Dict]] = None\n) -> List[int]:\n    \"\"\"Retrieve run numbers with a specified tag.\n\n    Args:\n        exp (str): Experiment name.\n\n        tag (str): The tag to retrieve runs for.\n    \"\"\"\n    endpoint: str = f\"{exp}/ws/get_runs_with_tag?tag={tag}\"\n    params: Dict[str, Any] = {}\n\n    status_code, resp_msg, tagged_runs = elog_http_request(\n        exp=exp, endpoint=endpoint, request_type=\"GET\", **params\n    )\n\n    if not tagged_runs:\n        tagged_runs = []\n\n    return tagged_runs\n
"},{"location":"source/io/elog/#io.elog.get_elog_workflows","title":"get_elog_workflows(exp)","text":"

Get the current workflow definitions for an experiment.

Returns:

Name Type Description defns Dict[str, str]

A dictionary of workflow definitions.

Source code in lute/io/elog.py
def get_elog_workflows(exp: str) -> Dict[str, str]:\n    \"\"\"Get the current workflow definitions for an experiment.\n\n    Returns:\n        defns (Dict[str, str]): A dictionary of workflow definitions.\n    \"\"\"\n    raise NotImplementedError\n
"},{"location":"source/io/elog/#io.elog.post_elog_message","title":"post_elog_message(exp, msg, *, tag, title, in_files=[])","text":"

Post a new message to the eLog. Inspired by the elog package.

Parameters:

Name Type Description Default exp str

Experiment name.

required msg str

BODY of the eLog post.

required tag str | None

Optional \"tag\" to associate with the eLog post.

required title str | None

Optional title to include in the eLog post.

required in_files List[str | tuple | list]

Files to include as attachments in the eLog post.

[]

Returns:

Name Type Description err_msg str | None

If successful, nothing is returned; otherwise, an error message is returned.

Source code in lute/io/elog.py
def post_elog_message(\n    exp: str,\n    msg: str,\n    *,\n    tag: Optional[str],\n    title: Optional[str],\n    in_files: List[Union[str, tuple, list]] = [],\n) -> Optional[str]:\n    \"\"\"Post a new message to the eLog. Inspired by the `elog` package.\n\n    Args:\n        exp (str): Experiment name.\n\n        msg (str): BODY of the eLog post.\n\n        tag (str | None): Optional \"tag\" to associate with the eLog post.\n\n        title (str | None): Optional title to include in the eLog post.\n\n        in_files (List[str | tuple | list]): Files to include as attachments in\n            the eLog post.\n\n    Returns:\n        err_msg (str | None): If successful, nothing is returned, otherwise,\n            return an error message.\n    \"\"\"\n    # MOSTLY CORRECT\n    out_files: list = []\n    for f in in_files:\n        try:\n            out_files.append(format_file_for_post(in_file=f))\n        except ElogFileFormatError as err:\n            logger.debug(f\"ElogFileFormatError: {err}\")\n    post: Dict[str, str] = {}\n    post[\"log_text\"] = msg\n    if tag:\n        post[\"log_tags\"] = tag\n    if title:\n        post[\"log_title\"] = title\n\n    endpoint: str = f\"{exp}/ws/new_elog_entry\"\n\n    params: Dict[str, Any] = {\"data\": post}\n\n    if out_files:\n        params.update({\"files\": out_files})\n\n    status_code, resp_msg, _ = elog_http_request(\n        exp=exp, endpoint=endpoint, request_type=\"POST\", **params\n    )\n\n    if resp_msg != \"SUCCESS\":\n        return resp_msg\n
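A usage sketch (experiment name, message text, tag, title, and attachment path are placeholders):

from lute.io.elog import post_elog_message

err = post_elog_message(
    exp="myexp123",
    msg="Processing finished for run 12.",
    tag="LUTE",
    title="Analysis summary",
    in_files=["/path/to/summary_plot.png"],
)
if err is not None:
    print(f"eLog post failed: {err}")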
"},{"location":"source/io/elog/#io.elog.post_elog_run_status","title":"post_elog_run_status(data, update_url=None)","text":"

Post a summary to the status/report section of a specific run.

In contrast to most eLog update/post mechanisms, this function searches for a specific environment variable containing the URL to post to. This URL is updated for every job/run as jobs are submitted by the JID. The URL can optionally be passed to this function if it is known.

Parameters:

Name Type Description Default data Dict[str, Union[str, int, float]]

The data to post to the eLog report section. Formatted in key:value pairs.

required update_url Optional[str]

Optional update URL. If not provided, the function searches for the corresponding environment variable. If neither is found, the function aborts.

None Source code in lute/io/elog.py
def post_elog_run_status(\n    data: Dict[str, Union[str, int, float]], update_url: Optional[str] = None\n) -> None:\n    \"\"\"Post a summary to the status/report section of a specific run.\n\n    In contrast to most eLog update/post mechanisms, this function searches\n    for a specific environment variable which contains a specific URL for\n    posting. This is updated every job/run as jobs are submitted by the JID.\n    The URL can optionally be passed to this function if it is known.\n\n    Args:\n        data (Dict[str, Union[str, int, float]]): The data to post to the eLog\n            report section. Formatted in key:value pairs.\n\n        update_url (Optional[str]): Optional update URL. If not provided, the\n            function searches for the corresponding environment variable. If\n            neither is found, the function aborts\n    \"\"\"\n    if update_url is None:\n        update_url = os.environ.get(\"JID_UPDATE_COUNTERS\")\n        if update_url is None:\n            logger.info(\"eLog Update Failed! JID_UPDATE_COUNTERS is not defined!\")\n            return\n    current_status: Dict[str, Union[str, int, float]] = _get_current_run_status(\n        update_url\n    )\n    current_status.update(data)\n    post_list: List[Dict[str, str]] = [\n        {\"key\": f\"{key}\", \"value\": f\"{value}\"} for key, value in current_status.items()\n    ]\n    params: Dict[str, List[Dict[str, str]]] = {\"json\": post_list}\n    resp: requests.models.Response = requests.post(update_url, **params)\n
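
A sketch of posting a run status summary; the update URL below is hypothetical and is only needed if the JID_UPDATE_COUNTERS environment variable is not set.

from lute.io.elog import post_elog_run_status\n\n# Key/value pairs to display in the run report section.\ndata = {\"Number of hits\": 1234, \"Hit rate\": 0.56}\n# update_url may be omitted when JID_UPDATE_COUNTERS is defined in the environment.\npost_elog_run_status(data, update_url=\"https://example.com/jid/update_counters\")\n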
"},{"location":"source/io/elog/#io.elog.post_elog_run_table","title":"post_elog_run_table(exp, run, data)","text":"

Post data for eLog run tables.

Parameters:

Name Type Description Default exp str

Experiment name.

required run int

Run number corresponding to the data being posted.

required data Dict[str, Any]

Data to be posted in format data[\"column_header\"] = value.

required

Returns:

Name Type Description err_msg None | str

If successful, nothing is returned; otherwise, an error message is returned.

Source code in lute/io/elog.py
def post_elog_run_table(\n    exp: str,\n    run: int,\n    data: Dict[str, Any],\n) -> Optional[str]:\n    \"\"\"Post data for eLog run tables.\n\n    Args:\n        exp (str): Experiment name.\n\n        run (int): Run number corresponding to the data being posted.\n\n        data (Dict[str, Any]): Data to be posted in format\n            data[\"column_header\"] = value.\n\n    Returns:\n        err_msg (None | str): If successful, nothing is returned, otherwise,\n            return an error message.\n    \"\"\"\n    endpoint: str = f\"run_control/{exp}/ws/add_run_params\"\n\n    params: Dict[str, Any] = {\"params\": {\"run_num\": run}, \"json\": data}\n\n    status_code, resp_msg, _ = elog_http_request(\n        exp=exp, endpoint=endpoint, request_type=\"POST\", **params\n    )\n\n    if resp_msg != \"SUCCESS\":\n        return resp_msg\n
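
A sketch of posting values to an eLog run table; the experiment name, run number and column values are hypothetical.

from lute.io.elog import post_elog_run_table\n\nerr = post_elog_run_table(\n    exp=\"mfxx00000\",\n    run=12,\n    data={\"Indexed patterns\": 4321, \"Resolution (A)\": 1.8},\n)\nif err is not None:\n    print(f\"Run table update failed: {err}\")\n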
"},{"location":"source/io/elog/#io.elog.post_elog_workflow","title":"post_elog_workflow(exp, name, executable, wf_params, *, trigger='run_end', location='S3DF', **trig_args)","text":"

Create a new eLog workflow, or update an existing one.

The workflow will run a specific executable as a batch job when the specified trigger occurs. The precise arguments may vary depending on the selected trigger type.

Parameters:

Name Type Description Default name str

An identifying name for the workflow. E.g. \"process data\"

required executable str

Full path to the executable to be run.

required wf_params str

All command-line parameters for the executable as a string.

required trigger str

When to trigger execution of the specified executable. One of: - 'manual': Must be manually triggered. No automatic processing. - 'run_start': Execute immediately if a new run begins. - 'run_end': As soon as a run ends. - 'param_is': As soon as a parameter has a specific value for a run.

'run_end' location str

Where to submit the job. S3DF or NERSC.

'S3DF' **trig_args str

Arguments required for a specific trigger type. For trigger='param_is', two arguments are required: trig_param (str), the name of the parameter to watch for, and trig_param_val (str), the value the parameter should have to trigger.

{} Source code in lute/io/elog.py
def post_elog_workflow(\n    exp: str,\n    name: str,\n    executable: str,\n    wf_params: str,\n    *,\n    trigger: str = \"run_end\",\n    location: str = \"S3DF\",\n    **trig_args: str,\n) -> None:\n    \"\"\"Create a new eLog workflow, or update an existing one.\n\n    The workflow will run a specific executable as a batch job when the\n    specified trigger occurs. The precise arguments may vary depending on the\n    selected trigger type.\n\n    Args:\n        name (str): An identifying name for the workflow. E.g. \"process data\"\n\n        executable (str): Full path to the executable to be run.\n\n        wf_params (str): All command-line parameters for the executable as a string.\n\n        trigger (str): When to trigger execution of the specified executable.\n            One of:\n                - 'manual': Must be manually triggered. No automatic processing.\n                - 'run_start': Execute immediately if a new run begins.\n                - 'run_end': As soon as a run ends.\n                - 'param_is': As soon as a parameter has a specific value for a run.\n\n        location (str): Where to submit the job. S3DF or NERSC.\n\n        **trig_args (str): Arguments required for a specific trigger type.\n            trigger='param_is' - 2 Arguments\n                trig_param (str): Name of the parameter to watch for.\n                trig_param_val (str): Value the parameter should have to trigger.\n    \"\"\"\n    endpoint: str = f\"{exp}/ws/create_update_workflow_def\"\n    trig_map: Dict[str, str] = {\n        \"manual\": \"MANUAL\",\n        \"run_start\": \"START_OF_RUN\",\n        \"run_end\": \"END_OF_RUN\",\n        \"param_is\": \"RUN_PARAM_IS_VALUE\",\n    }\n    if trigger not in trig_map.keys():\n        raise NotImplementedError(\n            f\"Cannot create workflow with trigger type: {trigger}\"\n        )\n    wf_defn: Dict[str, str] = {\n        \"name\": name,\n        \"executable\": executable,\n        \"parameters\": wf_params,\n        \"trigger\": trig_map[trigger],\n        \"location\": location,\n    }\n    if trigger == \"param_is\":\n        if \"trig_param\" not in trig_args or \"trig_param_val\" not in trig_args:\n            raise RuntimeError(\n                \"Trigger type 'param_is' requires: 'trig_param' and 'trig_param_val' arguments\"\n            )\n        wf_defn.update(\n            {\n                \"run_param_name\": trig_args[\"trig_param\"],\n                \"run_param_val\": trig_args[\"trig_param_val\"],\n            }\n        )\n    post_params: Dict[str, Dict[str, str]] = {\"json\": wf_defn}\n    status_code, resp_msg, _ = elog_http_request(\n        exp, endpoint=endpoint, request_type=\"POST\", **post_params\n    )\n
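
A sketch of registering a workflow that triggers when a run parameter takes a specific value; the executable path, parameter name and value are hypothetical.

from lute.io.elog import post_elog_workflow\n\npost_elog_workflow(\n    exp=\"mfxx00000\",\n    name=\"process data\",\n    executable=\"/path/to/launch_script.sh\",\n    wf_params=\"--config /path/to/config.yaml\",\n    trigger=\"param_is\",\n    location=\"S3DF\",\n    trig_param=\"DARK\",  # Hypothetical run parameter name.\n    trig_param_val=\"1\",\n)\n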
"},{"location":"source/io/exceptions/","title":"exceptions","text":"

Specifies custom exceptions defined for IO problems.

Raises:

Type Description ElogFileFormatError

Raised if an attachment is specified in an incorrect format.

"},{"location":"source/io/exceptions/#io.exceptions.ElogFileFormatError","title":"ElogFileFormatError","text":"

Bases: Exception

Raised when an eLog attachment is specified in an invalid format.

Source code in lute/io/exceptions.py
class ElogFileFormatError(Exception):\n    \"\"\"Raised when an eLog attachment is specified in an invalid format.\"\"\"\n\n    ...\n
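
A sketch of the typical handling pattern for this exception, mirroring its use in post_elog_message; the attachment path is hypothetical.

from lute.io.elog import format_file_for_post\nfrom lute.io.exceptions import ElogFileFormatError\n\ntry:\n    attachment = format_file_for_post(in_file=\"/path/to/plot.png\")\nexcept ElogFileFormatError as err:\n    # Malformed attachment specifications are skipped rather than aborting the post.\n    print(f\"Skipping attachment: {err}\")\n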
"},{"location":"source/io/models/base/","title":"base","text":"

Base classes for describing Task parameters.

Classes:

Name Description AnalysisHeader

Model holding shared configuration across Tasks. E.g. experiment name, run number and working directory.

TaskParameters

Base class for Task parameters. Subclasses specify a model of parameters and their types for validation.

ThirdPartyParameters

Base class for Third-party, binary executable Tasks.

TemplateParameters

Dataclass to represent parameters of binary (third-party) Tasks which are used for additional config files.

TemplateConfig

Class for holding information on where templates are stored in order to properly handle ThirdPartyParameter objects.

"},{"location":"source/io/models/base/#io.models.base.AnalysisHeader","title":"AnalysisHeader","text":"

Bases: BaseModel

Header information for LUTE analysis runs.

Source code in lute/io/models/base.py
class AnalysisHeader(BaseModel):\n    \"\"\"Header information for LUTE analysis runs.\"\"\"\n\n    title: str = Field(\n        \"LUTE Task Configuration\",\n        description=\"Description of the configuration or experiment.\",\n    )\n    experiment: str = Field(\"\", description=\"Experiment.\")\n    run: Union[str, int] = Field(\"\", description=\"Data acquisition run.\")\n    date: str = Field(\"1970/01/01\", description=\"Start date of analysis.\")\n    lute_version: Union[float, str] = Field(\n        0.1, description=\"Version of LUTE used for analysis.\"\n    )\n    task_timeout: PositiveInt = Field(\n        600,\n        description=(\n            \"Time in seconds until a task times out. Should be slightly shorter\"\n            \" than job timeout if using a job manager (e.g. SLURM).\"\n        ),\n    )\n    work_dir: str = Field(\"\", description=\"Main working directory for LUTE.\")\n\n    @validator(\"work_dir\", always=True)\n    def validate_work_dir(cls, directory: str, values: Dict[str, Any]) -> str:\n        work_dir: str\n        if directory == \"\":\n            std_work_dir = (\n                f\"/sdf/data/lcls/ds/{values['experiment'][:3]}/\"\n                f\"{values['experiment']}/scratch\"\n            )\n            work_dir = std_work_dir\n        else:\n            work_dir = directory\n        # Check existence and permissions\n        if not os.path.exists(work_dir):\n            raise ValueError(f\"Working Directory: {work_dir} does not exist!\")\n        if not os.access(work_dir, os.W_OK):\n            # Need write access for database, files etc.\n            raise ValueError(f\"Not write access for working directory: {work_dir}!\")\n        return work_dir\n\n    @validator(\"run\", always=True)\n    def validate_run(\n        cls, run: Union[str, int], values: Dict[str, Any]\n    ) -> Union[str, int]:\n        if run == \"\":\n            # From Airflow RUN_NUM should have Format \"RUN_DATETIME\" - Num is first part\n            run_time: str = os.environ.get(\"RUN_NUM\", \"\")\n            if run_time != \"\":\n                return int(run_time.split(\"_\")[0])\n        return run\n\n    @validator(\"experiment\", always=True)\n    def validate_experiment(cls, experiment: str, values: Dict[str, Any]) -> str:\n        if experiment == \"\":\n            arp_exp: str = os.environ.get(\"EXPERIMENT\", \"EXPX00000\")\n            return arp_exp\n        return experiment\n
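
A sketch of constructing the header directly; when experiment and run are omitted, the validators above fall back to the EXPERIMENT and RUN_NUM environment variables (the values below are hypothetical).

import os\n\nfrom lute.io.models.base import AnalysisHeader\n\nos.environ[\"EXPERIMENT\"] = \"mfxx00000\"  # Hypothetical values, normally set for the job.\nos.environ[\"RUN_NUM\"] = \"12_1717171717\"\n\nheader = AnalysisHeader(work_dir=\"/tmp\")  # Must be an existing, writable directory.\nassert header.experiment == \"mfxx00000\" and header.run == 12\n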
"},{"location":"source/io/models/base/#io.models.base.TaskParameters","title":"TaskParameters","text":"

Bases: BaseSettings

Base class for models of task parameters to be validated.

Parameters are read from a configuration YAML file and validated against subclasses of this type in order to ensure both that all parameters are present and that they are of the correct type.

Note

Pydantic is used for data validation. Pydantic does not perform \"strict\" validation by default. Parameter values may be cast to conform with the model specified by the subclass definition if it is possible to do so. Consider whether this may cause issues (e.g. if a float is cast to an int).

Source code in lute/io/models/base.py
class TaskParameters(BaseSettings):\n    \"\"\"Base class for models of task parameters to be validated.\n\n    Parameters are read from a configuration YAML file and validated against\n    subclasses of this type in order to ensure that both all parameters are\n    present, and that the parameters are of the correct type.\n\n    Note:\n        Pydantic is used for data validation. Pydantic does not perform \"strict\"\n        validation by default. Parameter values may be cast to conform with the\n        model specified by the subclass definition if it is possible to do so.\n        Consider whether this may cause issues (e.g. if a float is cast to an\n        int).\n    \"\"\"\n\n    class Config:\n        \"\"\"Configuration for parameters model.\n\n        The Config class holds Pydantic configuration. A number of LUTE-specific\n        configuration has also been placed here.\n\n        Attributes:\n            env_prefix (str): Pydantic configuration. Will set parameters from\n                environment variables containing this prefix. E.g. a model\n                parameter `input` can be set with an environment variable:\n                `{env_prefix}input`, in LUTE's case `LUTE_input`.\n\n            underscore_attrs_are_private (bool): Pydantic configuration. Whether\n                to hide attributes (parameters) prefixed with an underscore.\n\n            copy_on_model_validation (str): Pydantic configuration. How to copy\n                the input object passed to the class instance for model\n                validation. Set to perform a deep copy.\n\n            allow_inf_nan (bool): Pydantic configuration. Whether to allow\n                infinity or NAN in float fields.\n\n            run_directory (Optional[str]): None. If set, it should be a valid\n                path. The `Task` will be run from this directory. This may be\n                useful for some `Task`s which rely on searching the working\n                directory.\n\n            set_result (bool). False. If True, the model has information about\n                setting the TaskResult object from the parameters it contains.\n                E.g. it has an `output` parameter which is marked as the result.\n                The result can be set with a field value of `is_result=True` on\n                a specific parameter, or using `result_from_params` and a\n                validator.\n\n            result_from_params (Optional[str]): None. Optionally used to define\n                results from information available in the model using a custom\n                validator. E.g. use a `outdir` and `filename` field to set\n                `result_from_params=f\"{outdir}/{filename}`, etc. Only used if\n                `set_result==True`\n\n            result_summary (Optional[str]): None. Defines a result summary that\n                can be known after processing the Pydantic model. Use of summary\n                depends on the Executor running the Task. All summaries are\n                stored in the database, however. Only used if `set_result==True`\n\n            impl_schemas (Optional[str]). Specifies a the schemas the\n                output/results conform to. 
Only used if `set_result==True`.\n        \"\"\"\n\n        env_prefix = \"LUTE_\"\n        underscore_attrs_are_private: bool = True\n        copy_on_model_validation: str = \"deep\"\n        allow_inf_nan: bool = False\n\n        run_directory: Optional[str] = None\n        \"\"\"Set the directory that the Task is run from.\"\"\"\n        set_result: bool = False\n        \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n        result_from_params: Optional[str] = None\n        \"\"\"Defines a result from the parameters. Use a validator to do so.\"\"\"\n        result_summary: Optional[str] = None\n        \"\"\"Format a TaskResult.summary from output.\"\"\"\n        impl_schemas: Optional[str] = None\n        \"\"\"Schema specification for output result. Will be passed to TaskResult.\"\"\"\n\n    lute_config: AnalysisHeader\n
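
As a minimal sketch, a new parameter model subclasses TaskParameters and declares its fields; the Task name and fields below are hypothetical. Following the note above, a string value such as '3' supplied for param_a would be cast to an int rather than rejected.

from lute.io.models.base import TaskParameters\n\n\nclass MyNewTaskParameters(TaskParameters):\n    \"\"\"Hypothetical parameter model for a first-party Task.\"\"\"\n\n    param_a: int = 3\n    param_b: str = \"default\"\n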
"},{"location":"source/io/models/base/#io.models.base.TaskParameters.Config","title":"Config","text":"

Configuration for parameters model.

The Config class holds Pydantic configuration. A number of LUTE-specific configuration options have also been placed here.

Attributes:

Name Type Description env_prefix str

Pydantic configuration. Will set parameters from environment variables containing this prefix. E.g. a model parameter input can be set with an environment variable: {env_prefix}input, in LUTE's case LUTE_input.

underscore_attrs_are_private bool

Pydantic configuration. Whether to hide attributes (parameters) prefixed with an underscore.

copy_on_model_validation str

Pydantic configuration. How to copy the input object passed to the class instance for model validation. Set to perform a deep copy.

allow_inf_nan bool

Pydantic configuration. Whether to allow infinity or NAN in float fields.

run_directory Optional[str]

None. If set, it should be a valid path. The Task will be run from this directory. This may be useful for some Tasks which rely on searching the working directory.

result_from_params Optional[str]

None. Optionally used to define results from information available in the model using a custom validator. E.g. use an outdir and a filename field to set result_from_params=f\"{outdir}/{filename}\", etc. Only used if set_result==True.

result_summary Optional[str]

None. Defines a result summary that can be known after processing the Pydantic model. Use of summary depends on the Executor running the Task. All summaries are stored in the database, however. Only used if set_result==True

Source code in lute/io/models/base.py
class Config:\n    \"\"\"Configuration for parameters model.\n\n    The Config class holds Pydantic configuration. A number of LUTE-specific\n    configuration has also been placed here.\n\n    Attributes:\n        env_prefix (str): Pydantic configuration. Will set parameters from\n            environment variables containing this prefix. E.g. a model\n            parameter `input` can be set with an environment variable:\n            `{env_prefix}input`, in LUTE's case `LUTE_input`.\n\n        underscore_attrs_are_private (bool): Pydantic configuration. Whether\n            to hide attributes (parameters) prefixed with an underscore.\n\n        copy_on_model_validation (str): Pydantic configuration. How to copy\n            the input object passed to the class instance for model\n            validation. Set to perform a deep copy.\n\n        allow_inf_nan (bool): Pydantic configuration. Whether to allow\n            infinity or NAN in float fields.\n\n        run_directory (Optional[str]): None. If set, it should be a valid\n            path. The `Task` will be run from this directory. This may be\n            useful for some `Task`s which rely on searching the working\n            directory.\n\n        set_result (bool). False. If True, the model has information about\n            setting the TaskResult object from the parameters it contains.\n            E.g. it has an `output` parameter which is marked as the result.\n            The result can be set with a field value of `is_result=True` on\n            a specific parameter, or using `result_from_params` and a\n            validator.\n\n        result_from_params (Optional[str]): None. Optionally used to define\n            results from information available in the model using a custom\n            validator. E.g. use a `outdir` and `filename` field to set\n            `result_from_params=f\"{outdir}/{filename}`, etc. Only used if\n            `set_result==True`\n\n        result_summary (Optional[str]): None. Defines a result summary that\n            can be known after processing the Pydantic model. Use of summary\n            depends on the Executor running the Task. All summaries are\n            stored in the database, however. Only used if `set_result==True`\n\n        impl_schemas (Optional[str]). Specifies a the schemas the\n            output/results conform to. Only used if `set_result==True`.\n    \"\"\"\n\n    env_prefix = \"LUTE_\"\n    underscore_attrs_are_private: bool = True\n    copy_on_model_validation: str = \"deep\"\n    allow_inf_nan: bool = False\n\n    run_directory: Optional[str] = None\n    \"\"\"Set the directory that the Task is run from.\"\"\"\n    set_result: bool = False\n    \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n    result_from_params: Optional[str] = None\n    \"\"\"Defines a result from the parameters. Use a validator to do so.\"\"\"\n    result_summary: Optional[str] = None\n    \"\"\"Format a TaskResult.summary from output.\"\"\"\n    impl_schemas: Optional[str] = None\n    \"\"\"Schema specification for output result. Will be passed to TaskResult.\"\"\"\n
"},{"location":"source/io/models/base/#io.models.base.TaskParameters.Config.impl_schemas","title":"impl_schemas: Optional[str] = None class-attribute instance-attribute","text":"

Schema specification for output result. Will be passed to TaskResult.

"},{"location":"source/io/models/base/#io.models.base.TaskParameters.Config.result_from_params","title":"result_from_params: Optional[str] = None class-attribute instance-attribute","text":"

Defines a result from the parameters. Use a validator to do so.

"},{"location":"source/io/models/base/#io.models.base.TaskParameters.Config.result_summary","title":"result_summary: Optional[str] = None class-attribute instance-attribute","text":"

Format a TaskResult.summary from output.

"},{"location":"source/io/models/base/#io.models.base.TaskParameters.Config.run_directory","title":"run_directory: Optional[str] = None class-attribute instance-attribute","text":"

Set the directory that the Task is run from.

"},{"location":"source/io/models/base/#io.models.base.TaskParameters.Config.set_result","title":"set_result: bool = False class-attribute instance-attribute","text":"

Whether the Executor should mark a specified parameter as a result.

"},{"location":"source/io/models/base/#io.models.base.TemplateConfig","title":"TemplateConfig","text":"

Bases: BaseModel

Parameters used for templating of third party configuration files.

Attributes:

Name Type Description template_name str

The name of the template to use. This template must live in config/templates.

output_path str

The FULL path, including filename, to write the rendered template to.

Source code in lute/io/models/base.py
class TemplateConfig(BaseModel):\n    \"\"\"Parameters used for templating of third party configuration files.\n\n    Attributes:\n        template_name (str): The name of the template to use. This template must\n            live in `config/templates`.\n\n        output_path (str): The FULL path, including filename to write the\n            rendered template to.\n    \"\"\"\n\n    template_name: str\n    output_path: str\n
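
A sketch of attaching a template to a parameter set via this model; the template and output names are hypothetical.

from lute.io.models.base import TemplateConfig\n\n# The named template must exist in config/templates.\ntpl_cfg = TemplateConfig(\n    template_name=\"some_tool.json\",\n    output_path=\"/path/to/work_dir/some_tool.json\",\n)\n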
"},{"location":"source/io/models/base/#io.models.base.TemplateParameters","title":"TemplateParameters","text":"

Class for representing parameters for third party configuration files.

These parameters can represent arbitrary data types and are used in conjunction with templates for modifying third party configuration files from the single LUTE YAML. Because arbitrary data types are stored and a template file is used, a single instance of this class can hold anything from a single template variable to an entire configuration file. The data parsing is done by jinja using the complementary template. All data is stored in the single model variable params.

The pydantic \"dataclass\" is used over the BaseModel/Settings to allow positional argument instantiation of the params Field.

Source code in lute/io/models/base.py
@dataclass\nclass TemplateParameters:\n    \"\"\"Class for representing parameters for third party configuration files.\n\n    These parameters can represent arbitrary data types and are used in\n    conjunction with templates for modifying third party configuration files\n    from the single LUTE YAML. Due to the storage of arbitrary data types, and\n    the use of a template file, a single instance of this class can hold from a\n    single template variable to an entire configuration file. The data parsing\n    is done by jinja using the complementary template.\n    All data is stored in the single model variable `params.`\n\n    The pydantic \"dataclass\" is used over the BaseModel/Settings to allow\n    positional argument instantiation of the `params` Field.\n    \"\"\"\n\n    params: Any\n
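
A sketch showing that arbitrary data can be wrapped, from a lone value to a nested structure; the values are hypothetical.

from lute.io.models.base import TemplateParameters\n\nsingle_value = TemplateParameters(0.01)  # A lone template variable.\nfull_section = TemplateParameters({\"detector\": {\"name\": \"epix10k\", \"threshold\": 5.0}})\n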
"},{"location":"source/io/models/base/#io.models.base.ThirdPartyParameters","title":"ThirdPartyParameters","text":"

Bases: TaskParameters

Base class for third party task parameters.

Contains special validators for extra arguments and handling of parameters used for filling in third party configuration files.

Source code in lute/io/models/base.py
class ThirdPartyParameters(TaskParameters):\n    \"\"\"Base class for third party task parameters.\n\n    Contains special validators for extra arguments and handling of parameters\n    used for filling in third party configuration files.\n    \"\"\"\n\n    class Config(TaskParameters.Config):\n        \"\"\"Configuration for parameters model.\n\n        The Config class holds Pydantic configuration and inherited configuration\n        from the base `TaskParameters.Config` class. A number of values are also\n        overridden, and there are some specific configuration options to\n        ThirdPartyParameters. A full list of options (with TaskParameters options\n        repeated) is described below.\n\n        Attributes:\n            env_prefix (str): Pydantic configuration. Will set parameters from\n                environment variables containing this prefix. E.g. a model\n                parameter `input` can be set with an environment variable:\n                `{env_prefix}input`, in LUTE's case `LUTE_input`.\n\n            underscore_attrs_are_private (bool): Pydantic configuration. Whether\n                to hide attributes (parameters) prefixed with an underscore.\n\n            copy_on_model_validation (str): Pydantic configuration. How to copy\n                the input object passed to the class instance for model\n                validation. Set to perform a deep copy.\n\n            allow_inf_nan (bool): Pydantic configuration. Whether to allow\n                infinity or NAN in float fields.\n\n            run_directory (Optional[str]): None. If set, it should be a valid\n                path. The `Task` will be run from this directory. This may be\n                useful for some `Task`s which rely on searching the working\n                directory.\n\n            set_result (bool). True. If True, the model has information about\n                setting the TaskResult object from the parameters it contains.\n                E.g. it has an `output` parameter which is marked as the result.\n                The result can be set with a field value of `is_result=True` on\n                a specific parameter, or using `result_from_params` and a\n                validator.\n\n            result_from_params (Optional[str]): None. Optionally used to define\n                results from information available in the model using a custom\n                validator. E.g. use a `outdir` and `filename` field to set\n                `result_from_params=f\"{outdir}/{filename}`, etc.\n\n            result_summary (Optional[str]): None. Defines a result summary that\n                can be known after processing the Pydantic model. Use of summary\n                depends on the Executor running the Task. All summaries are\n                stored in the database, however.\n\n            impl_schemas (Optional[str]). Specifies a the schemas the\n                output/results conform to. Only used if set_result is True.\n\n            -----------------------\n            ThirdPartyTask-specific:\n\n            extra (str): \"allow\". Pydantic configuration. Allow (or ignore) extra\n                arguments.\n\n            short_flags_use_eq (bool): False. If True, \"short\" command-line args\n                are passed as `-x=arg`. ThirdPartyTask-specific.\n\n            long_flags_use_eq (bool): False. If True, \"long\" command-line args\n                are passed as `--long=arg`. 
ThirdPartyTask-specific.\n        \"\"\"\n\n        extra: str = \"allow\"\n        short_flags_use_eq: bool = False\n        \"\"\"Whether short command-line arguments are passed like `-x=arg`.\"\"\"\n        long_flags_use_eq: bool = False\n        \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n        set_result: bool = True\n        \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n    # lute_template_cfg: TemplateConfig\n\n    @root_validator(pre=False)\n    def extra_fields_to_thirdparty(cls, values: Dict[str, Any]):\n        for key in values:\n            if key not in cls.__fields__:\n                values[key] = TemplateParameters(values[key])\n\n        return values\n
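
A sketch of the extra-field handling; the field name below is hypothetical, and in practice these models are usually constructed from the configuration YAML rather than directly.

from lute.io.models.base import AnalysisHeader, ThirdPartyParameters\n\nheader = AnalysisHeader(experiment=\"mfxx00000\", run=12, work_dir=\"/tmp\")\nparams = ThirdPartyParameters(lute_config=header, undeclared_option=42)\n# The undeclared field is wrapped by the root validator for use with templates.\nprint(type(params.undeclared_option))  # TemplateParameters\n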
"},{"location":"source/io/models/base/#io.models.base.ThirdPartyParameters.Config","title":"Config","text":"

Bases: Config

Configuration for parameters model.

The Config class holds Pydantic configuration and configuration inherited from the base TaskParameters.Config class. A number of values are also overridden, and there are some configuration options specific to ThirdPartyParameters. A full list of options (with TaskParameters options repeated) is described below.

Attributes:

Name Type Description env_prefix str

Pydantic configuration. Will set parameters from environment variables containing this prefix. E.g. a model parameter input can be set with an environment variable: {env_prefix}input, in LUTE's case LUTE_input.

underscore_attrs_are_private bool

Pydantic configuration. Whether to hide attributes (parameters) prefixed with an underscore.

copy_on_model_validation str

Pydantic configuration. How to copy the input object passed to the class instance for model validation. Set to perform a deep copy.

allow_inf_nan bool

Pydantic configuration. Whether to allow infinity or NAN in float fields.

run_directory Optional[str]

None. If set, it should be a valid path. The Task will be run from this directory. This may be useful for some Tasks which rely on searching the working directory.

result_from_params Optional[str]

None. Optionally used to define results from information available in the model using a custom validator. E.g. use an outdir and a filename field to set result_from_params=f\"{outdir}/{filename}\", etc.

result_summary Optional[str]

None. Defines a result summary that can be known after processing the Pydantic model. Use of summary depends on the Executor running the Task. All summaries are stored in the database, however.

impl_schemas Optional[str]

Specifies the schemas the output/results conform to. Only used if set_result is True.

extra str

\"allow\". Pydantic configuration. Allow (or ignore) extra arguments. ThirdPartyTask-specific.

short_flags_use_eq bool

False. If True, \"short\" command-line args are passed as -x=arg. ThirdPartyTask-specific.

long_flags_use_eq bool

False. If True, \"long\" command-line args are passed as --long=arg. ThirdPartyTask-specific.

Source code in lute/io/models/base.py
class Config(TaskParameters.Config):\n    \"\"\"Configuration for parameters model.\n\n    The Config class holds Pydantic configuration and inherited configuration\n    from the base `TaskParameters.Config` class. A number of values are also\n    overridden, and there are some specific configuration options to\n    ThirdPartyParameters. A full list of options (with TaskParameters options\n    repeated) is described below.\n\n    Attributes:\n        env_prefix (str): Pydantic configuration. Will set parameters from\n            environment variables containing this prefix. E.g. a model\n            parameter `input` can be set with an environment variable:\n            `{env_prefix}input`, in LUTE's case `LUTE_input`.\n\n        underscore_attrs_are_private (bool): Pydantic configuration. Whether\n            to hide attributes (parameters) prefixed with an underscore.\n\n        copy_on_model_validation (str): Pydantic configuration. How to copy\n            the input object passed to the class instance for model\n            validation. Set to perform a deep copy.\n\n        allow_inf_nan (bool): Pydantic configuration. Whether to allow\n            infinity or NAN in float fields.\n\n        run_directory (Optional[str]): None. If set, it should be a valid\n            path. The `Task` will be run from this directory. This may be\n            useful for some `Task`s which rely on searching the working\n            directory.\n\n        set_result (bool). True. If True, the model has information about\n            setting the TaskResult object from the parameters it contains.\n            E.g. it has an `output` parameter which is marked as the result.\n            The result can be set with a field value of `is_result=True` on\n            a specific parameter, or using `result_from_params` and a\n            validator.\n\n        result_from_params (Optional[str]): None. Optionally used to define\n            results from information available in the model using a custom\n            validator. E.g. use a `outdir` and `filename` field to set\n            `result_from_params=f\"{outdir}/{filename}`, etc.\n\n        result_summary (Optional[str]): None. Defines a result summary that\n            can be known after processing the Pydantic model. Use of summary\n            depends on the Executor running the Task. All summaries are\n            stored in the database, however.\n\n        impl_schemas (Optional[str]). Specifies a the schemas the\n            output/results conform to. Only used if set_result is True.\n\n        -----------------------\n        ThirdPartyTask-specific:\n\n        extra (str): \"allow\". Pydantic configuration. Allow (or ignore) extra\n            arguments.\n\n        short_flags_use_eq (bool): False. If True, \"short\" command-line args\n            are passed as `-x=arg`. ThirdPartyTask-specific.\n\n        long_flags_use_eq (bool): False. If True, \"long\" command-line args\n            are passed as `--long=arg`. ThirdPartyTask-specific.\n    \"\"\"\n\n    extra: str = \"allow\"\n    short_flags_use_eq: bool = False\n    \"\"\"Whether short command-line arguments are passed like `-x=arg`.\"\"\"\n    long_flags_use_eq: bool = False\n    \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n    set_result: bool = True\n    \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n
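
The short_flags_use_eq and long_flags_use_eq options only affect how command-line arguments are rendered for the third-party executable. The standalone sketch below (not LUTE code) illustrates the difference.

def render_arg(flag_type: str, name: str, value: str, use_eq: bool) -> str:\n    \"\"\"Illustrative only: format a single command-line argument.\"\"\"\n    if use_eq:\n        return f\"{flag_type}{name}={value}\"\n    return f\"{flag_type}{name} {value}\"\n\nprint(render_arg(\"--\", \"geometry\", \"det.geom\", use_eq=True))   # --geometry=det.geom\nprint(render_arg(\"--\", \"geometry\", \"det.geom\", use_eq=False))  # --geometry det.geom\n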
"},{"location":"source/io/models/base/#io.models.base.ThirdPartyParameters.Config.long_flags_use_eq","title":"long_flags_use_eq: bool = False class-attribute instance-attribute","text":"

Whether long command-line arguments are passed like --long=arg.

"},{"location":"source/io/models/base/#io.models.base.ThirdPartyParameters.Config.set_result","title":"set_result: bool = True class-attribute instance-attribute","text":"

Whether the Executor should mark a specified parameter as a result.

"},{"location":"source/io/models/base/#io.models.base.ThirdPartyParameters.Config.short_flags_use_eq","title":"short_flags_use_eq: bool = False class-attribute instance-attribute","text":"

Whether short command-line arguments are passed like -x=arg.

"},{"location":"source/io/models/sfx_find_peaks/","title":"sfx_find_peaks","text":""},{"location":"source/io/models/sfx_find_peaks/#io.models.sfx_find_peaks.FindPeaksPsocakeParameters","title":"FindPeaksPsocakeParameters","text":"

Bases: ThirdPartyParameters

Parameters for crystallographic (Bragg) peak finding using Psocake.

This peak finding Task optionally has the ability to compress/decompress data with SZ for the purpose of compression validation. NOTE: This Task is deprecated and provided for compatibility only.

Source code in lute/io/models/sfx_find_peaks.py
class FindPeaksPsocakeParameters(ThirdPartyParameters):\n    \"\"\"Parameters for crystallographic (Bragg) peak finding using Psocake.\n\n    This peak finding Task optionally has the ability to compress/decompress\n    data with SZ for the purpose of compression validation.\n    NOTE: This Task is deprecated and provided for compatibility only.\n    \"\"\"\n\n    class Config(TaskParameters.Config):\n        set_result: bool = True\n        \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n        result_from_params: str = \"\"\n        \"\"\"Defines a result from the parameters. Use a validator to do so.\"\"\"\n\n    class SZParameters(BaseModel):\n        compressor: Literal[\"qoz\", \"sz3\"] = Field(\n            \"qoz\", description=\"SZ compression algorithm (qoz, sz3)\"\n        )\n        binSize: int = Field(2, description=\"SZ compression's bin size paramater\")\n        roiWindowSize: int = Field(\n            2, description=\"SZ compression's ROI window size paramater\"\n        )\n        absError: float = Field(10, descriptionp=\"Maximum absolute error value\")\n\n    executable: str = Field(\"mpirun\", description=\"MPI executable.\", flag_type=\"\")\n    np: PositiveInt = Field(\n        max(int(os.environ.get(\"SLURM_NPROCS\", len(os.sched_getaffinity(0)))) - 1, 1),\n        description=\"Number of processes\",\n        flag_type=\"-\",\n    )\n    mca: str = Field(\n        \"btl ^openib\", description=\"Mca option for the MPI executable\", flag_type=\"--\"\n    )\n    p_arg1: str = Field(\n        \"python\", description=\"Executable to run with mpi (i.e. python).\", flag_type=\"\"\n    )\n    u: str = Field(\n        \"\", description=\"Python option for unbuffered output.\", flag_type=\"-\"\n    )\n    p_arg2: str = Field(\n        \"findPeaksSZ.py\",\n        description=\"Executable to run with mpi (i.e. 
python).\",\n        flag_type=\"\",\n    )\n    d: str = Field(description=\"Detector name\", flag_type=\"-\")\n    e: str = Field(\"\", description=\"Experiment name\", flag_type=\"-\")\n    r: int = Field(-1, description=\"Run number\", flag_type=\"-\")\n    outDir: str = Field(\n        description=\"Output directory where .cxi will be saved\", flag_type=\"--\"\n    )\n    algorithm: int = Field(1, description=\"PyAlgos algorithm to use\", flag_type=\"--\")\n    alg_npix_min: float = Field(\n        1.0, description=\"PyAlgos algorithm's npix_min parameter\", flag_type=\"--\"\n    )\n    alg_npix_max: float = Field(\n        45.0, description=\"PyAlgos algorithm's npix_max parameter\", flag_type=\"--\"\n    )\n    alg_amax_thr: float = Field(\n        250.0, description=\"PyAlgos algorithm's amax_thr parameter\", flag_type=\"--\"\n    )\n    alg_atot_thr: float = Field(\n        330.0, description=\"PyAlgos algorithm's atot_thr parameter\", flag_type=\"--\"\n    )\n    alg_son_min: float = Field(\n        10.0, description=\"PyAlgos algorithm's son_min parameter\", flag_type=\"--\"\n    )\n    alg1_thr_low: float = Field(\n        80.0, description=\"PyAlgos algorithm's thr_low parameter\", flag_type=\"--\"\n    )\n    alg1_thr_high: float = Field(\n        270.0, description=\"PyAlgos algorithm's thr_high parameter\", flag_type=\"--\"\n    )\n    alg1_rank: int = Field(\n        3, description=\"PyAlgos algorithm's rank parameter\", flag_type=\"--\"\n    )\n    alg1_radius: int = Field(\n        3, description=\"PyAlgos algorithm's radius parameter\", flag_type=\"--\"\n    )\n    alg1_dr: int = Field(\n        1, description=\"PyAlgos algorithm's dr parameter\", flag_type=\"--\"\n    )\n    psanaMask_on: str = Field(\n        \"True\", description=\"Whether psana's mask should be used\", flag_type=\"--\"\n    )\n    psanaMask_calib: str = Field(\n        \"True\", description=\"Psana mask's calib parameter\", flag_type=\"--\"\n    )\n    psanaMask_status: str = Field(\n        \"True\", description=\"Psana mask's status parameter\", flag_type=\"--\"\n    )\n    psanaMask_edges: str = Field(\n        \"True\", description=\"Psana mask's edges parameter\", flag_type=\"--\"\n    )\n    psanaMask_central: str = Field(\n        \"True\", description=\"Psana mask's central parameter\", flag_type=\"--\"\n    )\n    psanaMask_unbond: str = Field(\n        \"True\", description=\"Psana mask's unbond parameter\", flag_type=\"--\"\n    )\n    psanaMask_unbondnrs: str = Field(\n        \"True\", description=\"Psana mask's unbondnbrs parameter\", flag_type=\"--\"\n    )\n    mask: str = Field(\n        \"\", description=\"Path to an additional mask to apply\", flag_type=\"--\"\n    )\n    clen: str = Field(\n        description=\"Epics variable storing the camera length\", flag_type=\"--\"\n    )\n    coffset: float = Field(0, description=\"Camera offset in m\", flag_type=\"--\")\n    minPeaks: int = Field(\n        15,\n        description=\"Minimum number of peaks to mark frame for indexing\",\n        flag_type=\"--\",\n    )\n    maxPeaks: int = Field(\n        15,\n        description=\"Maximum number of peaks to mark frame for indexing\",\n        flag_type=\"--\",\n    )\n    minRes: int = Field(\n        0,\n        description=\"Minimum peak resolution to mark frame for indexing \",\n        flag_type=\"--\",\n    )\n    sample: str = Field(\"\", description=\"Sample name\", flag_type=\"--\")\n    instrument: Union[None, str] = Field(\n        None, description=\"Instrument name\", 
flag_type=\"--\"\n    )\n    pixelSize: float = Field(0.0, description=\"Pixel size\", flag_type=\"--\")\n    auto: str = Field(\n        \"False\",\n        description=(\n            \"Whether to automatically determine peak per event peak \"\n            \"finding parameters\"\n        ),\n        flag_type=\"--\",\n    )\n    detectorDistance: float = Field(\n        0.0, description=\"Detector distance from interaction point in m\", flag_type=\"--\"\n    )\n    access: Literal[\"ana\", \"ffb\"] = Field(\n        \"ana\", description=\"Data node type: {ana,ffb}\", flag_type=\"--\"\n    )\n    szfile: str = Field(\"qoz.json\", description=\"Path to SZ's JSON configuration file\")\n    lute_template_cfg: TemplateConfig = Field(\n        TemplateConfig(\n            template_name=\"sz.json\",\n            output_path=\"\",  # Will want to change where this goes...\n        ),\n        description=\"Template information for the sz.json file\",\n    )\n    sz_parameters: SZParameters = Field(\n        description=\"Configuration parameters for SZ Compression\", flag_type=\"\"\n    )\n\n    @validator(\"e\", always=True)\n    def validate_e(cls, e: str, values: Dict[str, Any]) -> str:\n        if e == \"\":\n            return values[\"lute_config\"].experiment\n        return e\n\n    @validator(\"r\", always=True)\n    def validate_r(cls, r: int, values: Dict[str, Any]) -> int:\n        if r == -1:\n            return values[\"lute_config\"].run\n        return r\n\n    @validator(\"lute_template_cfg\", always=True)\n    def set_output_path(\n        cls, lute_template_cfg: TemplateConfig, values: Dict[str, Any]\n    ) -> TemplateConfig:\n        if lute_template_cfg.output_path == \"\":\n            lute_template_cfg.output_path = values[\"szfile\"]\n        return lute_template_cfg\n\n    @validator(\"sz_parameters\", always=True)\n    def set_sz_compression_parameters(\n        cls, sz_parameters: SZParameters, values: Dict[str, Any]\n    ) -> None:\n        values[\"compressor\"] = sz_parameters.compressor\n        values[\"binSize\"] = sz_parameters.binSize\n        values[\"roiWindowSize\"] = sz_parameters.roiWindowSize\n        if sz_parameters.compressor == \"qoz\":\n            values[\"pressio_opts\"] = {\n                \"pressio:abs\": sz_parameters.absError,\n                \"qoz\": {\"qoz:stride\": 8},\n            }\n        else:\n            values[\"pressio_opts\"] = {\"pressio:abs\": sz_parameters.absError}\n        return None\n\n    @root_validator(pre=False)\n    def define_result(cls, values: Dict[str, Any]) -> Dict[str, Any]:\n        exp: str = values[\"lute_config\"].experiment\n        run: int = int(values[\"lute_config\"].run)\n        directory: str = values[\"outDir\"]\n        fname: str = f\"{exp}_{run:04d}.lst\"\n\n        cls.Config.result_from_params = f\"{directory}/{fname}\"\n        return values\n
"},{"location":"source/io/models/sfx_find_peaks/#io.models.sfx_find_peaks.FindPeaksPsocakeParameters.Config","title":"Config","text":"

Bases: Config

Source code in lute/io/models/sfx_find_peaks.py
class Config(TaskParameters.Config):\n    set_result: bool = True\n    \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n    result_from_params: str = \"\"\n    \"\"\"Defines a result from the parameters. Use a validator to do so.\"\"\"\n
"},{"location":"source/io/models/sfx_find_peaks/#io.models.sfx_find_peaks.FindPeaksPsocakeParameters.Config.result_from_params","title":"result_from_params: str = '' class-attribute instance-attribute","text":"

Defines a result from the parameters. Use a validator to do so.

"},{"location":"source/io/models/sfx_find_peaks/#io.models.sfx_find_peaks.FindPeaksPsocakeParameters.Config.set_result","title":"set_result: bool = True class-attribute instance-attribute","text":"

Whether the Executor should mark a specified parameter as a result.

"},{"location":"source/io/models/sfx_find_peaks/#io.models.sfx_find_peaks.FindPeaksPyAlgosParameters","title":"FindPeaksPyAlgosParameters","text":"

Bases: TaskParameters

Parameters for crystallographic (Bragg) peak finding using PyAlgos.

This peak finding Task optionally has the ability to compress/decompress data with SZ for the purpose of compression validation.

Source code in lute/io/models/sfx_find_peaks.py
class FindPeaksPyAlgosParameters(TaskParameters):\n    \"\"\"Parameters for crystallographic (Bragg) peak finding using PyAlgos.\n\n    This peak finding Task optionally has the ability to compress/decompress\n    data with SZ for the purpose of compression validation.\n    \"\"\"\n\n    class Config(TaskParameters.Config):\n        set_result: bool = True\n        \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n    class SZCompressorParameters(BaseModel):\n        compressor: Literal[\"qoz\", \"sz3\"] = Field(\n            \"qoz\", description='Compression algorithm (\"qoz\" or \"sz3\")'\n        )\n        abs_error: float = Field(10.0, description=\"Absolute error bound\")\n        bin_size: int = Field(2, description=\"Bin size\")\n        roi_window_size: int = Field(\n            9,\n            description=\"Default window size\",\n        )\n\n    outdir: str = Field(\n        description=\"Output directory for cxi files\",\n    )\n    n_events: int = Field(\n        0,\n        description=\"Number of events to process (0 to process all events)\",\n    )\n    det_name: str = Field(\n        description=\"Psana name of the detector storing the image data\",\n    )\n    event_receiver: Literal[\"evr0\", \"evr1\"] = Field(\n        description=\"Event Receiver to be used: evr0 or evr1\",\n    )\n    tag: str = Field(\n        \"\",\n        description=\"Tag to add to the output file names\",\n    )\n    pv_camera_length: Union[str, float] = Field(\n        \"\",\n        description=\"PV associated with camera length \"\n        \"(if a number, camera length directly)\",\n    )\n    event_logic: bool = Field(\n        False,\n        description=\"True if only events with a specific event code should be \"\n        \"processed. False if the event code should be ignored\",\n    )\n    event_code: int = Field(\n        0,\n        description=\"Required events code for events to be processed if event logic \"\n        \"is True\",\n    )\n    psana_mask: bool = Field(\n        False,\n        description=\"If True, apply mask from psana Detector object\",\n    )\n    mask_file: Union[str, None] = Field(\n        None,\n        description=\"File with a custom mask to apply. 
If None, no custom mask is \"\n        \"applied\",\n    )\n    min_peaks: int = Field(2, description=\"Minimum number of peaks per image\")\n    max_peaks: int = Field(\n        2048,\n        description=\"Maximum number of peaks per image\",\n    )\n    npix_min: int = Field(\n        2,\n        description=\"Minimum number of pixels per peak\",\n    )\n    npix_max: int = Field(\n        30,\n        description=\"Maximum number of pixels per peak\",\n    )\n    amax_thr: float = Field(\n        80.0,\n        description=\"Minimum intensity threshold for starting a peak\",\n    )\n    atot_thr: float = Field(\n        120.0,\n        description=\"Minimum summed intensity threshold for pixel collection\",\n    )\n    son_min: float = Field(\n        7.0,\n        description=\"Minimum signal-to-noise ratio to be considered a peak\",\n    )\n    peak_rank: int = Field(\n        3,\n        description=\"Radius in which central peak pixel is a local maximum\",\n    )\n    r0: float = Field(\n        3.0,\n        description=\"Radius of ring for background evaluation in pixels\",\n    )\n    dr: float = Field(\n        2.0,\n        description=\"Width of ring for background evaluation in pixels\",\n    )\n    nsigm: float = Field(\n        7.0,\n        description=\"Intensity threshold to include pixel in connected group\",\n    )\n    compression: Optional[SZCompressorParameters] = Field(\n        None,\n        description=\"Options for the SZ Compression Algorithm\",\n    )\n    out_file: str = Field(\n        \"\",\n        description=\"Path to output file.\",\n        flag_type=\"-\",\n        rename_param=\"o\",\n        is_result=True,\n    )\n\n    @validator(\"out_file\", always=True)\n    def validate_out_file(cls, out_file: str, values: Dict[str, Any]) -> str:\n        if out_file == \"\":\n            fname: Path = (\n                Path(values[\"outdir\"])\n                / f\"{values['lute_config'].experiment}_{values['lute_config'].run}_\"\n                f\"{values['tag']}.list\"\n            )\n            return str(fname)\n        return out_file\n
"},{"location":"source/io/models/sfx_find_peaks/#io.models.sfx_find_peaks.FindPeaksPyAlgosParameters.Config","title":"Config","text":"

Bases: Config

Source code in lute/io/models/sfx_find_peaks.py
class Config(TaskParameters.Config):\n    set_result: bool = True\n    \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n
"},{"location":"source/io/models/sfx_find_peaks/#io.models.sfx_find_peaks.FindPeaksPyAlgosParameters.Config.set_result","title":"set_result: bool = True class-attribute instance-attribute","text":"

Whether the Executor should mark a specified parameter as a result.

"},{"location":"source/io/models/sfx_index/","title":"sfx_index","text":"

Models for serial femtosecond crystallography indexing.

Classes:

Name Description IndexCrystFELParameters

Perform indexing of hits/peaks using CrystFEL's indexamajig.

"},{"location":"source/io/models/sfx_index/#io.models.sfx_index.ConcatenateStreamFilesParameters","title":"ConcatenateStreamFilesParameters","text":"

Bases: TaskParameters

Parameters for stream concatenation.

Concatenates the stream file output from CrystFEL indexing for multiple experimental runs.

Source code in lute/io/models/sfx_index.py
class ConcatenateStreamFilesParameters(TaskParameters):\n    \"\"\"Parameters for stream concatenation.\n\n    Concatenates the stream file output from CrystFEL indexing for multiple\n    experimental runs.\n    \"\"\"\n\n    class Config(TaskParameters.Config):\n        set_result: bool = True\n        \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n    in_file: str = Field(\n        \"\",\n        description=\"Root of directory tree storing stream files to merge.\",\n    )\n\n    tag: Optional[str] = Field(\n        \"\",\n        description=\"Tag identifying the stream files to merge.\",\n    )\n\n    out_file: str = Field(\n        \"\", description=\"Path to merged output stream file.\", is_result=True\n    )\n\n    @validator(\"in_file\", always=True)\n    def validate_in_file(cls, in_file: str, values: Dict[str, Any]) -> str:\n        if in_file == \"\":\n            stream_file: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\", \"IndexCrystFEL\", \"out_file\"\n            )\n            if stream_file:\n                stream_dir: str = str(Path(stream_file).parent)\n                return stream_dir\n        return in_file\n\n    @validator(\"tag\", always=True)\n    def validate_tag(cls, tag: str, values: Dict[str, Any]) -> str:\n        if tag == \"\":\n            stream_file: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\", \"IndexCrystFEL\", \"out_file\"\n            )\n            if stream_file:\n                stream_tag: str = Path(stream_file).name.split(\"_\")[0]\n                return stream_tag\n        return tag\n\n    @validator(\"out_file\", always=True)\n    def validate_out_file(cls, tag: str, values: Dict[str, Any]) -> str:\n        if tag == \"\":\n            stream_out_file: str = str(\n                Path(values[\"in_file\"]).parent / f\"{values['tag'].stream}\"\n            )\n            return stream_out_file\n        return tag\n
"},{"location":"source/io/models/sfx_index/#io.models.sfx_index.ConcatenateStreamFilesParameters.Config","title":"Config","text":"

Bases: Config

Source code in lute/io/models/sfx_index.py
class Config(TaskParameters.Config):\n    set_result: bool = True\n    \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n
"},{"location":"source/io/models/sfx_index/#io.models.sfx_index.ConcatenateStreamFilesParameters.Config.set_result","title":"set_result: bool = True class-attribute instance-attribute","text":"

Whether the Executor should mark a specified parameter as a result.

"},{"location":"source/io/models/sfx_index/#io.models.sfx_index.IndexCrystFELParameters","title":"IndexCrystFELParameters","text":"

Bases: ThirdPartyParameters

Parameters for CrystFEL's indexamajig.

There are many parameters, and many combinations. For more information on usage, please refer to the CrystFEL documentation, here: https://www.desy.de/~twhite/crystfel/manual-indexamajig.html

Source code in lute/io/models/sfx_index.py
class IndexCrystFELParameters(ThirdPartyParameters):\n    \"\"\"Parameters for CrystFEL's `indexamajig`.\n\n    There are many parameters, and many combinations. For more information on\n    usage, please refer to the CrystFEL documentation, here:\n    https://www.desy.de/~twhite/crystfel/manual-indexamajig.html\n    \"\"\"\n\n    class Config(ThirdPartyParameters.Config):\n        set_result: bool = True\n        \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n        long_flags_use_eq: bool = True\n        \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n\n    executable: str = Field(\n        \"/sdf/group/lcls/ds/tools/crystfel/0.10.2/bin/indexamajig\",\n        description=\"CrystFEL's indexing binary.\",\n        flag_type=\"\",\n    )\n    # Basic options\n    in_file: Optional[str] = Field(\n        \"\", description=\"Path to input file.\", flag_type=\"-\", rename_param=\"i\"\n    )\n    out_file: str = Field(\n        \"\",\n        description=\"Path to output file.\",\n        flag_type=\"-\",\n        rename_param=\"o\",\n        is_result=True,\n    )\n    geometry: str = Field(\n        \"\", description=\"Path to geometry file.\", flag_type=\"-\", rename_param=\"g\"\n    )\n    zmq_input: Optional[str] = Field(\n        description=\"ZMQ address to receive data over. `input` and `zmq-input` are mutually exclusive\",\n        flag_type=\"--\",\n        rename_param=\"zmq-input\",\n    )\n    zmq_subscribe: Optional[str] = Field(  # Can be used multiple times...\n        description=\"Subscribe to ZMQ message of type `tag`\",\n        flag_type=\"--\",\n        rename_param=\"zmq-subscribe\",\n    )\n    zmq_request: Optional[AnyUrl] = Field(\n        description=\"Request new data over ZMQ by sending this value\",\n        flag_type=\"--\",\n        rename_param=\"zmq-request\",\n    )\n    asapo_endpoint: Optional[str] = Field(\n        description=\"ASAP::O endpoint. zmq-input and this are mutually exclusive.\",\n        flag_type=\"--\",\n        rename_param=\"asapo-endpoint\",\n    )\n    asapo_token: Optional[str] = Field(\n        description=\"ASAP::O authentication token.\",\n        flag_type=\"--\",\n        rename_param=\"asapo-token\",\n    )\n    asapo_beamtime: Optional[str] = Field(\n        description=\"ASAP::O beatime.\",\n        flag_type=\"--\",\n        rename_param=\"asapo-beamtime\",\n    )\n    asapo_source: Optional[str] = Field(\n        description=\"ASAP::O data source.\",\n        flag_type=\"--\",\n        rename_param=\"asapo-source\",\n    )\n    asapo_group: Optional[str] = Field(\n        description=\"ASAP::O consumer group.\",\n        flag_type=\"--\",\n        rename_param=\"asapo-group\",\n    )\n    asapo_stream: Optional[str] = Field(\n        description=\"ASAP::O stream.\",\n        flag_type=\"--\",\n        rename_param=\"asapo-stream\",\n    )\n    asapo_wait_for_stream: Optional[str] = Field(\n        description=\"If ASAP::O stream does not exist, wait for it to appear.\",\n        flag_type=\"--\",\n        rename_param=\"asapo-wait-for-stream\",\n    )\n    data_format: Optional[str] = Field(\n        description=\"Specify format for ZMQ or ASAP::O. `msgpack`, `hdf5` or `seedee`.\",\n        flag_type=\"--\",\n        rename_param=\"data-format\",\n    )\n    basename: bool = Field(\n        False,\n        description=\"Remove directory parts of filenames. 
Acts before prefix if prefix also given.\",\n        flag_type=\"--\",\n    )\n    prefix: Optional[str] = Field(\n        description=\"Add a prefix to the filenames from the infile argument.\",\n        flag_type=\"--\",\n        rename_param=\"asapo-stream\",\n    )\n    nthreads: PositiveInt = Field(\n        max(int(os.environ.get(\"SLURM_NPROCS\", len(os.sched_getaffinity(0)))) - 1, 1),\n        description=\"Number of threads to use. See also `max_indexer_threads`.\",\n        flag_type=\"-\",\n        rename_param=\"j\",\n    )\n    no_check_prefix: bool = Field(\n        False,\n        description=\"Don't attempt to correct the prefix if it seems incorrect.\",\n        flag_type=\"--\",\n        rename_param=\"no-check-prefix\",\n    )\n    highres: Optional[float] = Field(\n        description=\"Mark all pixels greater than `x` has bad.\", flag_type=\"--\"\n    )\n    profile: bool = Field(\n        False, description=\"Display timing data to monitor performance.\", flag_type=\"--\"\n    )\n    temp_dir: Optional[str] = Field(\n        description=\"Specify a path for the temp files folder.\",\n        flag_type=\"--\",\n        rename_param=\"temp-dir\",\n    )\n    wait_for_file: conint(gt=-2) = Field(\n        0,\n        description=\"Wait at most `x` seconds for a file to be created. A value of -1 means wait forever.\",\n        flag_type=\"--\",\n        rename_param=\"wait-for-file\",\n    )\n    no_image_data: bool = Field(\n        False,\n        description=\"Load only the metadata, no iamges. Can check indexability without high data requirements.\",\n        flag_type=\"--\",\n        rename_param=\"no-image-data\",\n    )\n    # Peak-finding options\n    # ....\n    # Indexing options\n    indexing: Optional[str] = Field(\n        description=\"Comma-separated list of supported indexing algorithms to use. Default is to automatically detect.\",\n        flag_type=\"--\",\n    )\n    cell_file: Optional[str] = Field(\n        description=\"Path to a file containing unit cell information (PDB or CrystFEL format).\",\n        flag_type=\"-\",\n        rename_param=\"p\",\n    )\n    tolerance: str = Field(\n        \"5,5,5,1.5\",\n        description=(\n            \"Tolerances (in percent) for unit cell comparison. \"\n            \"Comma-separated list a,b,c,angle. Default=5,5,5,1.5\"\n        ),\n        flag_type=\"--\",\n    )\n    no_check_cell: bool = Field(\n        False,\n        description=\"Do not check cell parameters against unit cell. Replaces '-raw' method.\",\n        flag_type=\"--\",\n        rename_param=\"no-check-cell\",\n    )\n    no_check_peaks: bool = Field(\n        False,\n        description=\"Do not verify peaks are accounted for by solution.\",\n        flag_type=\"--\",\n        rename_param=\"no-check-peaks\",\n    )\n    multi: bool = Field(\n        False, description=\"Enable multi-lattice indexing.\", flag_type=\"--\"\n    )\n    wavelength_estimate: Optional[float] = Field(\n        description=\"Estimate for X-ray wavelength. Required for some methods.\",\n        flag_type=\"--\",\n        rename_param=\"wavelength-estimate\",\n    )\n    camera_length_estimate: Optional[float] = Field(\n        description=\"Estimate for camera distance. Required for some methods.\",\n        flag_type=\"--\",\n        rename_param=\"camera-length-estimate\",\n    )\n    max_indexer_threads: Optional[PositiveInt] = Field(\n        # 1,\n        description=\"Some indexing algos can use multiple threads. 
In addition to image-based.\",\n        flag_type=\"--\",\n        rename_param=\"max-indexer-threads\",\n    )\n    no_retry: bool = Field(\n        False,\n        description=\"Do not remove weak peaks and try again.\",\n        flag_type=\"--\",\n        rename_param=\"no-retry\",\n    )\n    no_refine: bool = Field(\n        False,\n        description=\"Skip refinement step.\",\n        flag_type=\"--\",\n        rename_param=\"no-refine\",\n    )\n    no_revalidate: bool = Field(\n        False,\n        description=\"Skip revalidation step.\",\n        flag_type=\"--\",\n        rename_param=\"no-revalidate\",\n    )\n    # TakeTwo specific parameters\n    taketwo_member_threshold: Optional[PositiveInt] = Field(\n        # 20,\n        description=\"Minimum number of vectors to consider.\",\n        flag_type=\"--\",\n        rename_param=\"taketwo-member-threshold\",\n    )\n    taketwo_len_tolerance: Optional[PositiveFloat] = Field(\n        # 0.001,\n        description=\"TakeTwo length tolerance in Angstroms.\",\n        flag_type=\"--\",\n        rename_param=\"taketwo-len-tolerance\",\n    )\n    taketwo_angle_tolerance: Optional[PositiveFloat] = Field(\n        # 0.6,\n        description=\"TakeTwo angle tolerance in degrees.\",\n        flag_type=\"--\",\n        rename_param=\"taketwo-angle-tolerance\",\n    )\n    taketwo_trace_tolerance: Optional[PositiveFloat] = Field(\n        # 3,\n        description=\"Matrix trace tolerance in degrees.\",\n        flag_type=\"--\",\n        rename_param=\"taketwo-trace-tolerance\",\n    )\n    # Felix-specific parameters\n    # felix_domega\n    # felix-fraction-max-visits\n    # felix-max-internal-angle\n    # felix-max-uniqueness\n    # felix-min-completeness\n    # felix-min-visits\n    # felix-num-voxels\n    # felix-sigma\n    # felix-tthrange-max\n    # felix-tthrange-min\n    # XGANDALF-specific parameters\n    xgandalf_sampling_pitch: Optional[NonNegativeInt] = Field(\n        # 6,\n        description=\"Density of reciprocal space sampling.\",\n        flag_type=\"--\",\n        rename_param=\"xgandalf-sampling-pitch\",\n    )\n    xgandalf_grad_desc_iterations: Optional[NonNegativeInt] = Field(\n        # 4,\n        description=\"Number of gradient descent iterations.\",\n        flag_type=\"--\",\n        rename_param=\"xgandalf-grad-desc-iterations\",\n    )\n    xgandalf_tolerance: Optional[PositiveFloat] = Field(\n        # 0.02,\n        description=\"Relative tolerance of lattice vectors\",\n        flag_type=\"--\",\n        rename_param=\"xgandalf-tolerance\",\n    )\n    xgandalf_no_deviation_from_provided_cell: Optional[bool] = Field(\n        description=\"Found unit cell must match provided.\",\n        flag_type=\"--\",\n        rename_param=\"xgandalf-no-deviation-from-provided-cell\",\n    )\n    xgandalf_min_lattice_vector_length: Optional[PositiveFloat] = Field(\n        # 30,\n        description=\"Minimum possible lattice length.\",\n        flag_type=\"--\",\n        rename_param=\"xgandalf-min-lattice-vector-length\",\n    )\n    xgandalf_max_lattice_vector_length: Optional[PositiveFloat] = Field(\n        # 250,\n        description=\"Minimum possible lattice length.\",\n        flag_type=\"--\",\n        rename_param=\"xgandalf-max-lattice-vector-length\",\n    )\n    xgandalf_max_peaks: Optional[PositiveInt] = Field(\n        # 250,\n        description=\"Maximum number of peaks to use for indexing.\",\n        flag_type=\"--\",\n        rename_param=\"xgandalf-max-peaks\",\n    )\n    
xgandalf_fast_execution: bool = Field(\n        False,\n        description=\"Shortcut to set sampling-pitch=2, and grad-desc-iterations=3.\",\n        flag_type=\"--\",\n        rename_param=\"xgandalf-fast-execution\",\n    )\n    # pinkIndexer parameters\n    # ...\n    # asdf_fast: bool = Field(False, description=\"Enable fast mode for asdf. 3x faster for 7% loss in accuracy.\", flag_type=\"--\", rename_param=\"asdf-fast\")\n    # Integration parameters\n    integration: str = Field(\n        \"rings-nocen\", description=\"Method for integrating reflections.\", flag_type=\"--\"\n    )\n    fix_profile_radius: Optional[float] = Field(\n        description=\"Fix the profile radius (m^{-1})\",\n        flag_type=\"--\",\n        rename_param=\"fix-profile-radius\",\n    )\n    fix_divergence: Optional[float] = Field(\n        0,\n        description=\"Fix the divergence (rad, full angle).\",\n        flag_type=\"--\",\n        rename_param=\"fix-divergence\",\n    )\n    int_radius: str = Field(\n        \"4,5,7\",\n        description=\"Inner, middle, and outer radii for 3-ring integration.\",\n        flag_type=\"--\",\n        rename_param=\"int-radius\",\n    )\n    int_diag: str = Field(\n        \"none\",\n        description=\"Show detailed information on integration when condition is met.\",\n        flag_type=\"--\",\n        rename_param=\"int-diag\",\n    )\n    push_res: str = Field(\n        \"infinity\",\n        description=\"Integrate `x` higher than apparent resolution limit (nm-1).\",\n        flag_type=\"--\",\n        rename_param=\"push-res\",\n    )\n    overpredict: bool = Field(\n        False,\n        description=\"Over-predict reflections. Maybe useful with post-refinement.\",\n        flag_type=\"--\",\n    )\n    cell_parameters_only: bool = Field(\n        False, description=\"Do not predict refletions at all\", flag_type=\"--\"\n    )\n    # Output parameters\n    no_non_hits_in_stream: bool = Field(\n        False,\n        description=\"Exclude non-hits from the stream file.\",\n        flag_type=\"--\",\n        rename_param=\"no-non-hits-in-stream\",\n    )\n    copy_hheader: Optional[str] = Field(\n        description=\"Copy information from header in the image to output stream.\",\n        flag_type=\"--\",\n        rename_param=\"copy-hheader\",\n    )\n    no_peaks_in_stream: bool = Field(\n        False,\n        description=\"Do not record peaks in stream file.\",\n        flag_type=\"--\",\n        rename_param=\"no-peaks-in-stream\",\n    )\n    no_refls_in_stream: bool = Field(\n        False,\n        description=\"Do not record reflections in stream.\",\n        flag_type=\"--\",\n        rename_param=\"no-refls-in-stream\",\n    )\n    serial_offset: Optional[PositiveInt] = Field(\n        description=\"Start numbering at `x` instead of 1.\",\n        flag_type=\"--\",\n        rename_param=\"serial-offset\",\n    )\n    harvest_file: Optional[str] = Field(\n        description=\"Write parameters to file in JSON format.\",\n        flag_type=\"--\",\n        rename_param=\"harvest-file\",\n    )\n\n    @validator(\"in_file\", always=True)\n    def validate_in_file(cls, in_file: str, values: Dict[str, Any]) -> str:\n        if in_file == \"\":\n            filename: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\", \"FindPeaksPyAlgos\", \"out_file\"\n            )\n            if filename is None:\n                exp: str = values[\"lute_config\"].experiment\n                run: int = 
int(values[\"lute_config\"].run)\n                tag: Optional[str] = read_latest_db_entry(\n                    f\"{values['lute_config'].work_dir}\", \"FindPeaksPsocake\", \"tag\"\n                )\n                out_dir: Optional[str] = read_latest_db_entry(\n                    f\"{values['lute_config'].work_dir}\", \"FindPeaksPsocake\", \"outDir\"\n                )\n                if out_dir is not None:\n                    fname: str = f\"{out_dir}/{exp}_{run:04d}\"\n                    if tag is not None:\n                        fname = f\"{fname}_{tag}\"\n                    return f\"{fname}.lst\"\n            else:\n                return filename\n        return in_file\n\n    @validator(\"out_file\", always=True)\n    def validate_out_file(cls, out_file: str, values: Dict[str, Any]) -> str:\n        if out_file == \"\":\n            expmt: str = values[\"lute_config\"].experiment\n            run: int = int(values[\"lute_config\"].run)\n            work_dir: str = values[\"lute_config\"].work_dir\n            fname: str = f\"{expmt}_r{run:04d}.stream\"\n            return f\"{work_dir}/{fname}\"\n        return out_file\n
"},{"location":"source/io/models/sfx_index/#io.models.sfx_index.IndexCrystFELParameters.Config","title":"Config","text":"

Bases: Config

Source code in lute/io/models/sfx_index.py
class Config(ThirdPartyParameters.Config):\n    set_result: bool = True\n    \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n    long_flags_use_eq: bool = True\n    \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n
"},{"location":"source/io/models/sfx_index/#io.models.sfx_index.IndexCrystFELParameters.Config.long_flags_use_eq","title":"long_flags_use_eq: bool = True class-attribute instance-attribute","text":"

Whether long command-line arguments are passed like --long=arg.

"},{"location":"source/io/models/sfx_index/#io.models.sfx_index.IndexCrystFELParameters.Config.set_result","title":"set_result: bool = True class-attribute instance-attribute","text":"

Whether the Executor should mark a specified parameter as a result.

"},{"location":"source/io/models/sfx_merge/","title":"sfx_merge","text":"

Models for merging reflections in serial femtosecond crystallography.

Classes:

Name Description MergePartialatorParameters

Perform merging using CrystFEL's partialator.

CompareHKLParameters

Calculate figures of merit using CrystFEL's compare_hkl.

ManipulateHKLParameters

Perform transformations on lists of reflections using CrystFEL's get_hkl.

"},{"location":"source/io/models/sfx_merge/#io.models.sfx_merge.CompareHKLParameters","title":"CompareHKLParameters","text":"

Bases: ThirdPartyParameters

Parameters for CrystFEL's compare_hkl for calculating figures of merit.

There are many parameters, and many combinations. For more information on usage, please refer to the CrystFEL documentation, here: https://www.desy.de/~twhite/crystfel/manual-partialator.html
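
The validators in the source below fill unset values from earlier results recorded in the LUTE database; the standalone sketch here only illustrates the naming convention they apply, using a hypothetical partialator output path.

# Hypothetical partialator output recorded by an earlier MergePartialator run.\npartialator_file = \"/work_dir/myexp_r0001.hkl\"\n\n# in_files defaults to the two half-dataset files written by partialator.\nin_files = f\"{partialator_file}1 {partialator_file}2\"\n\n# shell_file defaults to a name built from the figure of merit and shell count.\nfom, nshells = \"Rsplit\", 10\nshell_file = f\"{partialator_file.split('.')[0]}_{fom}_n{nshells}.dat\"\n\nprint(in_files)    # /work_dir/myexp_r0001.hkl1 /work_dir/myexp_r0001.hkl2\nprint(shell_file)  # /work_dir/myexp_r0001_Rsplit_n10.dat\n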

Source code in lute/io/models/sfx_merge.py
class CompareHKLParameters(ThirdPartyParameters):\n    \"\"\"Parameters for CrystFEL's `compare_hkl` for calculating figures of merit.\n\n    There are many parameters, and many combinations. For more information on\n    usage, please refer to the CrystFEL documentation, here:\n    https://www.desy.de/~twhite/crystfel/manual-partialator.html\n    \"\"\"\n\n    class Config(ThirdPartyParameters.Config):\n        long_flags_use_eq: bool = True\n        \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n\n        set_result: bool = True\n        \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n    executable: str = Field(\n        \"/sdf/group/lcls/ds/tools/crystfel/0.10.2/bin/compare_hkl\",\n        description=\"CrystFEL's reflection comparison binary.\",\n        flag_type=\"\",\n    )\n    in_files: Optional[str] = Field(\n        \"\",\n        description=\"Path to input HKLs. Space-separated list of 2. Use output of partialator e.g.\",\n        flag_type=\"\",\n    )\n    ## Need mechanism to set is_result=True ...\n    symmetry: str = Field(\"\", description=\"Point group symmetry.\", flag_type=\"--\")\n    cell_file: str = Field(\n        \"\",\n        description=\"Path to a file containing unit cell information (PDB or CrystFEL format).\",\n        flag_type=\"-\",\n        rename_param=\"p\",\n    )\n    fom: str = Field(\n        \"Rsplit\", description=\"Specify figure of merit to calculate.\", flag_type=\"--\"\n    )\n    nshells: int = Field(10, description=\"Use n resolution shells.\", flag_type=\"--\")\n    # NEED A NEW CASE FOR THIS -> Boolean flag, no arg, one hyphen...\n    # fix_unity: bool = Field(\n    #    False,\n    #    description=\"Fix scale factors to unity.\",\n    #    flag_type=\"-\",\n    #    rename_param=\"u\",\n    # )\n    shell_file: str = Field(\n        \"\",\n        description=\"Write the statistics in resolution shells to a file.\",\n        flag_type=\"--\",\n        rename_param=\"shell-file\",\n        is_result=True,\n    )\n    ignore_negs: bool = Field(\n        False,\n        description=\"Ignore reflections with negative reflections.\",\n        flag_type=\"--\",\n        rename_param=\"ignore-negs\",\n    )\n    zero_negs: bool = Field(\n        False,\n        description=\"Set negative intensities to 0.\",\n        flag_type=\"--\",\n        rename_param=\"zero-negs\",\n    )\n    sigma_cutoff: Optional[Union[float, int, str]] = Field(\n        # \"-infinity\",\n        description=\"Discard reflections with I/sigma(I) < n. -infinity means no cutoff.\",\n        flag_type=\"--\",\n        rename_param=\"sigma-cutoff\",\n    )\n    rmin: Optional[float] = Field(\n        description=\"Low resolution cutoff of 1/d (m-1). Use this or --lowres NOT both.\",\n        flag_type=\"--\",\n    )\n    lowres: Optional[float] = Field(\n        descirption=\"Low resolution cutoff in Angstroms. Use this or --rmin NOT both.\",\n        flag_type=\"--\",\n    )\n    rmax: Optional[float] = Field(\n        description=\"High resolution cutoff in 1/d (m-1). Use this or --highres NOT both.\",\n        flag_type=\"--\",\n    )\n    highres: Optional[float] = Field(\n        description=\"High resolution cutoff in Angstroms. 
Use this or --rmax NOT both.\",\n        flag_type=\"--\",\n    )\n\n    @validator(\"in_files\", always=True)\n    def validate_in_files(cls, in_files: str, values: Dict[str, Any]) -> str:\n        if in_files == \"\":\n            partialator_file: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\", \"MergePartialator\", \"out_file\"\n            )\n            if partialator_file:\n                hkls: str = f\"{partialator_file}1 {partialator_file}2\"\n                return hkls\n        return in_files\n\n    @validator(\"cell_file\", always=True)\n    def validate_cell_file(cls, cell_file: str, values: Dict[str, Any]) -> str:\n        if cell_file == \"\":\n            idx_cell_file: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\",\n                \"IndexCrystFEL\",\n                \"cell_file\",\n                valid_only=False,\n            )\n            if idx_cell_file:\n                return idx_cell_file\n        return cell_file\n\n    @validator(\"symmetry\", always=True)\n    def validate_symmetry(cls, symmetry: str, values: Dict[str, Any]) -> str:\n        if symmetry == \"\":\n            partialator_sym: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\", \"MergePartialator\", \"symmetry\"\n            )\n            if partialator_sym:\n                return partialator_sym\n        return symmetry\n\n    @validator(\"shell_file\", always=True)\n    def validate_shell_file(cls, shell_file: str, values: Dict[str, Any]) -> str:\n        if shell_file == \"\":\n            partialator_file: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\", \"MergePartialator\", \"out_file\"\n            )\n            if partialator_file:\n                shells_out: str = partialator_file.split(\".\")[0]\n                shells_out = f\"{shells_out}_{values['fom']}_n{values['nshells']}.dat\"\n                return shells_out\n        return shell_file\n
"},{"location":"source/io/models/sfx_merge/#io.models.sfx_merge.CompareHKLParameters.Config","title":"Config","text":"

Bases: Config

Source code in lute/io/models/sfx_merge.py
class Config(ThirdPartyParameters.Config):\n    long_flags_use_eq: bool = True\n    \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n\n    set_result: bool = True\n    \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n
"},{"location":"source/io/models/sfx_merge/#io.models.sfx_merge.CompareHKLParameters.Config.long_flags_use_eq","title":"long_flags_use_eq: bool = True class-attribute instance-attribute","text":"

Whether long command-line arguments are passed like --long=arg.

"},{"location":"source/io/models/sfx_merge/#io.models.sfx_merge.CompareHKLParameters.Config.set_result","title":"set_result: bool = True class-attribute instance-attribute","text":"

Whether the Executor should mark a specified parameter as a result.

"},{"location":"source/io/models/sfx_merge/#io.models.sfx_merge.ManipulateHKLParameters","title":"ManipulateHKLParameters","text":"

Bases: ThirdPartyParameters

Parameters for CrystFEL's get_hkl for manipulating lists of reflections.

This Task is predominantly used internally to convert hkl to mtz files. Note that performing multiple manipulations is undefined behaviour. Run the Task with multiple configurations in explicit separate steps. For more information on usage, please refer to the CrystFEL documentation, here: https://www.desy.de/~twhite/crystfel/manual-partialator.html
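
As an illustrative example only, the snippet below assembles the kind of get_hkl invocation these parameters correspond to for the default hkl to mtz conversion; the paths are hypothetical and the real command line is built by the Executor from the model below.

# Hypothetical inputs; the default output name mirrors the validators below.\nin_file = \"/work_dir/myexp_r0001.hkl\"\ncell_file = \"/work_dir/crystal.cell\"\nout_file = f\"{in_file.split('.')[0]}.mtz\"  # default chosen by validate_out_file\n\ncmd = (\n    \"get_hkl\"\n    f\" -i {in_file} -o {out_file} -p {cell_file}\"\n    \" --output-format=mtz\"\n)\nprint(cmd)\n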

Source code in lute/io/models/sfx_merge.py
class ManipulateHKLParameters(ThirdPartyParameters):\n    \"\"\"Parameters for CrystFEL's `get_hkl` for manipulating lists of reflections.\n\n    This Task is predominantly used internally to convert `hkl` to `mtz` files.\n    Note that performing multiple manipulations is undefined behaviour. Run\n    the Task with multiple configurations in explicit separate steps. For more\n    information on usage, please refer to the CrystFEL documentation, here:\n    https://www.desy.de/~twhite/crystfel/manual-partialator.html\n    \"\"\"\n\n    class Config(ThirdPartyParameters.Config):\n        long_flags_use_eq: bool = True\n        \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n\n        set_result: bool = True\n        \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n    executable: str = Field(\n        \"/sdf/group/lcls/ds/tools/crystfel/0.10.2/bin/get_hkl\",\n        description=\"CrystFEL's reflection manipulation binary.\",\n        flag_type=\"\",\n    )\n    in_file: str = Field(\n        \"\",\n        description=\"Path to input HKL file.\",\n        flag_type=\"-\",\n        rename_param=\"i\",\n    )\n    out_file: str = Field(\n        \"\",\n        description=\"Path to output file.\",\n        flag_type=\"-\",\n        rename_param=\"o\",\n        is_result=True,\n    )\n    cell_file: str = Field(\n        \"\",\n        description=\"Path to a file containing unit cell information (PDB or CrystFEL format).\",\n        flag_type=\"-\",\n        rename_param=\"p\",\n    )\n    output_format: str = Field(\n        \"mtz\",\n        description=\"Output format. One of mtz, mtz-bij, or xds. Otherwise CrystFEL format.\",\n        flag_type=\"--\",\n        rename_param=\"output-format\",\n    )\n    expand: Optional[str] = Field(\n        description=\"Reflections will be expanded to fill asymmetric unit of specified point group.\",\n        flag_type=\"--\",\n    )\n    # Reducing reflections to higher symmetry\n    twin: Optional[str] = Field(\n        description=\"Reflections equivalent to specified point group will have intensities summed.\",\n        flag_type=\"--\",\n    )\n    no_need_all_parts: Optional[bool] = Field(\n        description=\"Use with --twin to allow reflections missing a 'twin mate' to be written out.\",\n        flag_type=\"--\",\n        rename_param=\"no-need-all-parts\",\n    )\n    # Noise - Add to data\n    noise: Optional[bool] = Field(\n        description=\"Generate 10% uniform noise.\", flag_type=\"--\"\n    )\n    poisson: Optional[bool] = Field(\n        description=\"Generate Poisson noise. Intensities assumed to be A.U.\",\n        flag_type=\"--\",\n    )\n    adu_per_photon: Optional[int] = Field(\n        description=\"Use with --poisson to convert A.U. 
to photons.\",\n        flag_type=\"--\",\n        rename_param=\"adu-per-photon\",\n    )\n    # Remove duplicate reflections\n    trim_centrics: Optional[bool] = Field(\n        description=\"Duplicated reflections (according to symmetry) are removed.\",\n        flag_type=\"--\",\n    )\n    # Restrict to template file\n    template: Optional[str] = Field(\n        description=\"Only reflections which also appear in specified file are written out.\",\n        flag_type=\"--\",\n    )\n    # Multiplicity\n    multiplicity: Optional[bool] = Field(\n        description=\"Reflections are multiplied by their symmetric multiplicites.\",\n        flag_type=\"--\",\n    )\n    # Resolution cutoffs\n    cutoff_angstroms: Optional[Union[str, int, float]] = Field(\n        description=\"Either n, or n1,n2,n3. For n, reflections < n are removed. For n1,n2,n3 anisotropic trunction performed at separate resolution limits for a*, b*, c*.\",\n        flag_type=\"--\",\n        rename_param=\"cutoff-angstroms\",\n    )\n    lowres: Optional[float] = Field(\n        description=\"Remove reflections with d > n\", flag_type=\"--\"\n    )\n    highres: Optional[float] = Field(\n        description=\"Synonym for first form of --cutoff-angstroms\"\n    )\n    reindex: Optional[str] = Field(\n        description=\"Reindex according to specified operator. E.g. k,h,-l.\",\n        flag_type=\"--\",\n    )\n    # Override input symmetry\n    symmetry: Optional[str] = Field(\n        description=\"Point group symmetry to use to override. Almost always OMIT this option.\",\n        flag_type=\"--\",\n    )\n\n    @validator(\"in_file\", always=True)\n    def validate_in_file(cls, in_file: str, values: Dict[str, Any]) -> str:\n        if in_file == \"\":\n            partialator_file: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\", \"MergePartialator\", \"out_file\"\n            )\n            if partialator_file:\n                return partialator_file\n        return in_file\n\n    @validator(\"out_file\", always=True)\n    def validate_out_file(cls, out_file: str, values: Dict[str, Any]) -> str:\n        if out_file == \"\":\n            partialator_file: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\", \"MergePartialator\", \"out_file\"\n            )\n            if partialator_file:\n                mtz_out: str = partialator_file.split(\".\")[0]\n                mtz_out = f\"{mtz_out}.mtz\"\n                return mtz_out\n        return out_file\n\n    @validator(\"cell_file\", always=True)\n    def validate_cell_file(cls, cell_file: str, values: Dict[str, Any]) -> str:\n        if cell_file == \"\":\n            idx_cell_file: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\",\n                \"IndexCrystFEL\",\n                \"cell_file\",\n                valid_only=False,\n            )\n            if idx_cell_file:\n                return idx_cell_file\n        return cell_file\n
"},{"location":"source/io/models/sfx_merge/#io.models.sfx_merge.ManipulateHKLParameters.Config","title":"Config","text":"

Bases: Config

Source code in lute/io/models/sfx_merge.py
class Config(ThirdPartyParameters.Config):\n    long_flags_use_eq: bool = True\n    \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n\n    set_result: bool = True\n    \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n
"},{"location":"source/io/models/sfx_merge/#io.models.sfx_merge.ManipulateHKLParameters.Config.long_flags_use_eq","title":"long_flags_use_eq: bool = True class-attribute instance-attribute","text":"

Whether long command-line arguments are passed like --long=arg.

"},{"location":"source/io/models/sfx_merge/#io.models.sfx_merge.ManipulateHKLParameters.Config.set_result","title":"set_result: bool = True class-attribute instance-attribute","text":"

Whether the Executor should mark a specified parameter as a result.

"},{"location":"source/io/models/sfx_merge/#io.models.sfx_merge.MergePartialatorParameters","title":"MergePartialatorParameters","text":"

Bases: ThirdPartyParameters

Parameters for CrystFEL's partialator.

There are many parameters, and many combinations. For more information on usage, please refer to the CrystFEL documentation, here: https://www.desy.de/~twhite/crystfel/manual-partialator.html
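
A small standalone sketch of the default used for nthreads (passed as the -j flag): one fewer than the number of allocated CPUs, with a floor of one.

import os\n\n# Mirrors the default expression used for nthreads (Linux-only, as in the source).\nnprocs = int(os.environ.get(\"SLURM_NPROCS\", len(os.sched_getaffinity(0))))\nnthreads = max(nprocs - 1, 1)\nprint(nthreads)\n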

Source code in lute/io/models/sfx_merge.py
class MergePartialatorParameters(ThirdPartyParameters):\n    \"\"\"Parameters for CrystFEL's `partialator`.\n\n    There are many parameters, and many combinations. For more information on\n    usage, please refer to the CrystFEL documentation, here:\n    https://www.desy.de/~twhite/crystfel/manual-partialator.html\n    \"\"\"\n\n    class Config(ThirdPartyParameters.Config):\n        long_flags_use_eq: bool = True\n        \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n\n        set_result: bool = True\n        \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n    executable: str = Field(\n        \"/sdf/group/lcls/ds/tools/crystfel/0.10.2/bin/partialator\",\n        description=\"CrystFEL's Partialator binary.\",\n        flag_type=\"\",\n    )\n    in_file: Optional[str] = Field(\n        \"\", description=\"Path to input stream.\", flag_type=\"-\", rename_param=\"i\"\n    )\n    out_file: str = Field(\n        \"\",\n        description=\"Path to output file.\",\n        flag_type=\"-\",\n        rename_param=\"o\",\n        is_result=True,\n    )\n    symmetry: str = Field(description=\"Point group symmetry.\", flag_type=\"--\")\n    niter: Optional[int] = Field(\n        description=\"Number of cycles of scaling and post-refinement.\",\n        flag_type=\"-\",\n        rename_param=\"n\",\n    )\n    no_scale: Optional[bool] = Field(\n        description=\"Disable scaling.\", flag_type=\"--\", rename_param=\"no-scale\"\n    )\n    no_Bscale: Optional[bool] = Field(\n        description=\"Disable Debye-Waller part of scaling.\",\n        flag_type=\"--\",\n        rename_param=\"no-Bscale\",\n    )\n    no_pr: Optional[bool] = Field(\n        description=\"Disable orientation model.\", flag_type=\"--\", rename_param=\"no-pr\"\n    )\n    no_deltacchalf: Optional[bool] = Field(\n        description=\"Disable rejection based on deltaCC1/2.\",\n        flag_type=\"--\",\n        rename_param=\"no-deltacchalf\",\n    )\n    model: str = Field(\n        \"unity\",\n        description=\"Partiality model. Options: xsphere, unity, offset, ggpm.\",\n        flag_type=\"--\",\n    )\n    nthreads: int = Field(\n        max(int(os.environ.get(\"SLURM_NPROCS\", len(os.sched_getaffinity(0)))) - 1, 1),\n        description=\"Number of parallel analyses.\",\n        flag_type=\"-\",\n        rename_param=\"j\",\n    )\n    polarisation: Optional[str] = Field(\n        description=\"Specification of incident polarisation. 
Refer to CrystFEL docs for more info.\",\n        flag_type=\"--\",\n    )\n    no_polarisation: Optional[bool] = Field(\n        description=\"Synonym for --polarisation=none\",\n        flag_type=\"--\",\n        rename_param=\"no-polarisation\",\n    )\n    max_adu: Optional[float] = Field(\n        description=\"Maximum intensity of reflection to include.\",\n        flag_type=\"--\",\n        rename_param=\"max-adu\",\n    )\n    min_res: Optional[float] = Field(\n        description=\"Only include crystals diffracting to a minimum resolution.\",\n        flag_type=\"--\",\n        rename_param=\"min-res\",\n    )\n    min_measurements: int = Field(\n        2,\n        description=\"Include a reflection only if it appears a minimum number of times.\",\n        flag_type=\"--\",\n        rename_param=\"min-measurements\",\n    )\n    push_res: Optional[float] = Field(\n        description=\"Merge reflections up to higher than the apparent resolution limit.\",\n        flag_type=\"--\",\n        rename_param=\"push-res\",\n    )\n    start_after: int = Field(\n        0,\n        description=\"Ignore the first n crystals.\",\n        flag_type=\"--\",\n        rename_param=\"start-after\",\n    )\n    stop_after: int = Field(\n        0,\n        description=\"Stop after processing n crystals. 0 means process all.\",\n        flag_type=\"--\",\n        rename_param=\"stop-after\",\n    )\n    no_free: Optional[bool] = Field(\n        description=\"Disable cross-validation. Testing ONLY.\",\n        flag_type=\"--\",\n        rename_param=\"no-free\",\n    )\n    custom_split: Optional[str] = Field(\n        description=\"Read a set of filenames, event and dataset IDs from a filename.\",\n        flag_type=\"--\",\n        rename_param=\"custom-split\",\n    )\n    max_rel_B: float = Field(\n        100,\n        description=\"Reject crystals if |relB| > n sq Angstroms.\",\n        flag_type=\"--\",\n        rename_param=\"max-rel-B\",\n    )\n    output_every_cycle: bool = Field(\n        False,\n        description=\"Write per-crystal params after every refinement cycle.\",\n        flag_type=\"--\",\n        rename_param=\"output-every-cycle\",\n    )\n    no_logs: bool = Field(\n        False,\n        description=\"Do not write logs needed for plots, maps and graphs.\",\n        flag_type=\"--\",\n        rename_param=\"no-logs\",\n    )\n    set_symmetry: Optional[str] = Field(\n        description=\"Set the apparent symmetry of the crystals to a point group.\",\n        flag_type=\"-\",\n        rename_param=\"w\",\n    )\n    operator: Optional[str] = Field(\n        description=\"Specify an ambiguity operator. E.g. k,h,-l.\", flag_type=\"--\"\n    )\n    force_bandwidth: Optional[float] = Field(\n        description=\"Set X-ray bandwidth. As percent, e.g. 0.0013 (0.13%).\",\n        flag_type=\"--\",\n        rename_param=\"force-bandwidth\",\n    )\n    force_radius: Optional[float] = Field(\n        description=\"Set the initial profile radius (nm-1).\",\n        flag_type=\"--\",\n        rename_param=\"force-radius\",\n    )\n    force_lambda: Optional[float] = Field(\n        description=\"Set the wavelength. 
In Angstroms.\",\n        flag_type=\"--\",\n        rename_param=\"force-lambda\",\n    )\n    harvest_file: Optional[str] = Field(\n        description=\"Write parameters to file in JSON format.\",\n        flag_type=\"--\",\n        rename_param=\"harvest-file\",\n    )\n\n    @validator(\"in_file\", always=True)\n    def validate_in_file(cls, in_file: str, values: Dict[str, Any]) -> str:\n        if in_file == \"\":\n            stream_file: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\",\n                \"ConcatenateStreamFiles\",\n                \"out_file\",\n            )\n            if stream_file:\n                return stream_file\n        return in_file\n\n    @validator(\"out_file\", always=True)\n    def validate_out_file(cls, out_file: str, values: Dict[str, Any]) -> str:\n        if out_file == \"\":\n            in_file: str = values[\"in_file\"]\n            if in_file:\n                tag: str = in_file.split(\".\")[0]\n                return f\"{tag}.hkl\"\n            else:\n                return \"partialator.hkl\"\n        return out_file\n
"},{"location":"source/io/models/sfx_merge/#io.models.sfx_merge.MergePartialatorParameters.Config","title":"Config","text":"

Bases: Config

Source code in lute/io/models/sfx_merge.py
class Config(ThirdPartyParameters.Config):\n    long_flags_use_eq: bool = True\n    \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n\n    set_result: bool = True\n    \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n
"},{"location":"source/io/models/sfx_merge/#io.models.sfx_merge.MergePartialatorParameters.Config.long_flags_use_eq","title":"long_flags_use_eq: bool = True class-attribute instance-attribute","text":"

Whether long command-line arguments are passed like --long=arg.

"},{"location":"source/io/models/sfx_merge/#io.models.sfx_merge.MergePartialatorParameters.Config.set_result","title":"set_result: bool = True class-attribute instance-attribute","text":"

Whether the Executor should mark a specified parameter as a result.

"},{"location":"source/io/models/sfx_solve/","title":"sfx_solve","text":"

Models for structure solution in serial femtosecond crystallography.

Classes:

Name Description DimpleSolveParameters

Perform structure solution using CCP4's dimple (molecular replacement).

"},{"location":"source/io/models/sfx_solve/#io.models.sfx_solve.DimpleSolveParameters","title":"DimpleSolveParameters","text":"

Bases: ThirdPartyParameters

Parameters for CCP4's dimple program.

There are many parameters. For more information on usage, please refer to the CCP4 documentation, here: https://ccp4.github.io/dimple/
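
For orientation, the sketch below shows how the three required positional values relate and how two of them default to values derived from an earlier ManipulateHKL result; the paths are placeholders.

import os\n\n# Hypothetical mtz produced earlier by ManipulateHKL.\nget_hkl_file = \"/work_dir/myexp_r0001.mtz\"\n\nin_file = get_hkl_file                   # default filled by validate_in_file\nout_dir = os.path.dirname(get_hkl_file)  # default filled by validate_out_dir\npdb = \"/path/to/search_model.pdb\"        # must be supplied by the user\n\n# dimple takes three positional arguments: mtz, pdb and output directory.\nprint(f\"dimple {in_file} {pdb} {out_dir}\")\n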

Source code in lute/io/models/sfx_solve.py
class DimpleSolveParameters(ThirdPartyParameters):\n    \"\"\"Parameters for CCP4's dimple program.\n\n    There are many parameters. For more information on\n    usage, please refer to the CCP4 documentation, here:\n    https://ccp4.github.io/dimple/\n    \"\"\"\n\n    executable: str = Field(\n        \"/sdf/group/lcls/ds/tools/ccp4-8.0/bin/dimple\",\n        description=\"CCP4 Dimple for solving structures with MR.\",\n        flag_type=\"\",\n    )\n    # Positional requirements - all required.\n    in_file: str = Field(\n        \"\",\n        description=\"Path to input mtz.\",\n        flag_type=\"\",\n    )\n    pdb: str = Field(\"\", description=\"Path to a PDB.\", flag_type=\"\")\n    out_dir: str = Field(\"\", description=\"Output DIRECTORY.\", flag_type=\"\")\n    # Most used options\n    mr_thresh: PositiveFloat = Field(\n        0.4,\n        description=\"Threshold for molecular replacement.\",\n        flag_type=\"--\",\n        rename_param=\"mr-when-r\",\n    )\n    slow: Optional[bool] = Field(\n        False, description=\"Perform more refinement.\", flag_type=\"--\"\n    )\n    # Other options (IO)\n    hklout: str = Field(\n        \"final.mtz\", description=\"Output mtz file name.\", flag_type=\"--\"\n    )\n    xyzout: str = Field(\n        \"final.pdb\", description=\"Output PDB file name.\", flag_type=\"--\"\n    )\n    icolumn: Optional[str] = Field(\n        # \"IMEAN\",\n        description=\"Name for the I column.\",\n        flag_type=\"--\",\n    )\n    sigicolumn: Optional[str] = Field(\n        # \"SIG<ICOL>\",\n        description=\"Name for the Sig<I> column.\",\n        flag_type=\"--\",\n    )\n    fcolumn: Optional[str] = Field(\n        # \"F\",\n        description=\"Name for the F column.\",\n        flag_type=\"--\",\n    )\n    sigfcolumn: Optional[str] = Field(\n        # \"F\",\n        description=\"Name for the Sig<F> column.\",\n        flag_type=\"--\",\n    )\n    libin: Optional[str] = Field(\n        description=\"Ligand descriptions for refmac (LIBIN).\", flag_type=\"--\"\n    )\n    refmac_key: Optional[str] = Field(\n        description=\"Extra Refmac keywords to use in refinement.\",\n        flag_type=\"--\",\n        rename_param=\"refmac-key\",\n    )\n    free_r_flags: Optional[str] = Field(\n        description=\"Path to a mtz file with freeR flags.\",\n        flag_type=\"--\",\n        rename_param=\"free-r-flags\",\n    )\n    freecolumn: Optional[Union[int, float]] = Field(\n        # 0,\n        description=\"Refree column with an optional value.\",\n        flag_type=\"--\",\n    )\n    img_format: Optional[str] = Field(\n        description=\"Format of generated images. 
(png, jpeg, none).\",\n        flag_type=\"-\",\n        rename_param=\"f\",\n    )\n    white_bg: bool = Field(\n        False,\n        description=\"Use a white background in Coot and in images.\",\n        flag_type=\"--\",\n        rename_param=\"white-bg\",\n    )\n    no_cleanup: bool = Field(\n        False,\n        description=\"Retain intermediate files.\",\n        flag_type=\"--\",\n        rename_param=\"no-cleanup\",\n    )\n    # Calculations\n    no_blob_search: bool = Field(\n        False,\n        description=\"Do not search for unmodelled blobs.\",\n        flag_type=\"--\",\n        rename_param=\"no-blob-search\",\n    )\n    anode: bool = Field(\n        False, description=\"Use SHELX/AnoDe to find peaks in the anomalous map.\"\n    )\n    # Run customization\n    no_hetatm: bool = Field(\n        False,\n        description=\"Remove heteroatoms from the given model.\",\n        flag_type=\"--\",\n        rename_param=\"no-hetatm\",\n    )\n    rigid_cycles: Optional[PositiveInt] = Field(\n        # 10,\n        description=\"Number of cycles of rigid-body refinement to perform.\",\n        flag_type=\"--\",\n        rename_param=\"rigid-cycles\",\n    )\n    jelly: Optional[PositiveInt] = Field(\n        # 4,\n        description=\"Number of cycles of jelly-body refinement to perform.\",\n        flag_type=\"--\",\n    )\n    restr_cycles: Optional[PositiveInt] = Field(\n        # 8,\n        description=\"Number of cycles of refmac final refinement to perform.\",\n        flag_type=\"--\",\n        rename_param=\"restr-cycles\",\n    )\n    lim_resolution: Optional[PositiveFloat] = Field(\n        description=\"Limit the final resolution.\", flag_type=\"--\", rename_param=\"reso\"\n    )\n    weight: Optional[str] = Field(\n        # \"auto-weight\",\n        description=\"The refmac matrix weight.\",\n        flag_type=\"--\",\n    )\n    mr_prog: Optional[str] = Field(\n        # \"phaser\",\n        description=\"Molecular replacement program. phaser or molrep.\",\n        flag_type=\"--\",\n        rename_param=\"mr-prog\",\n    )\n    mr_num: Optional[Union[str, int]] = Field(\n        # \"auto\",\n        description=\"Number of molecules to use for molecular replacement.\",\n        flag_type=\"--\",\n        rename_param=\"mr-num\",\n    )\n    mr_reso: Optional[PositiveFloat] = Field(\n        # 3.25,\n        description=\"High resolution for molecular replacement. If >10 interpreted as eLLG.\",\n        flag_type=\"--\",\n        rename_param=\"mr-reso\",\n    )\n    itof_prog: Optional[str] = Field(\n        description=\"Program to calculate amplitudes. 
truncate, or ctruncate.\",\n        flag_type=\"--\",\n        rename_param=\"ItoF-prog\",\n    )\n\n    @validator(\"in_file\", always=True)\n    def validate_in_file(cls, in_file: str, values: Dict[str, Any]) -> str:\n        if in_file == \"\":\n            get_hkl_file: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\", \"ManipulateHKL\", \"out_file\"\n            )\n            if get_hkl_file:\n                return get_hkl_file\n        return in_file\n\n    @validator(\"out_dir\", always=True)\n    def validate_out_dir(cls, out_dir: str, values: Dict[str, Any]) -> str:\n        if out_dir == \"\":\n            get_hkl_file: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\", \"ManipulateHKL\", \"out_file\"\n            )\n            if get_hkl_file:\n                return os.path.dirname(get_hkl_file)\n        return out_dir\n
"},{"location":"source/io/models/sfx_solve/#io.models.sfx_solve.RunSHELXCParameters","title":"RunSHELXCParameters","text":"

Bases: ThirdPartyParameters

Parameters for CCP4's SHELXC program.

SHELXC prepares files for SHELXD and SHELXE.

For more information please refer to the official documentation: https://www.ccp4.ac.uk/html/crank.html
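
The in_file validator prepends an input redirection, since shelxc reads the reflection file from standard input; a minimal sketch of that behaviour, with a hypothetical path:

# Hypothetical XDS-format reflection file from a prior ManipulateHKL step.\nin_file = \"/work_dir/reflections.xds\"\n\n# Mirror of the validator: pass the file as an input redirection so the\n# program runs like `shelxc xx <reflections.xds`.\nif not in_file.startswith(\"<\"):\n    in_file = f\"<{in_file}\"\nprint(f\"shelxc xx {in_file}\")\n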

Source code in lute/io/models/sfx_solve.py
class RunSHELXCParameters(ThirdPartyParameters):\n    \"\"\"Parameters for CCP4's SHELXC program.\n\n    SHELXC prepares files for SHELXD and SHELXE.\n\n    For more information please refer to the official documentation:\n    https://www.ccp4.ac.uk/html/crank.html\n    \"\"\"\n\n    executable: str = Field(\n        \"/sdf/group/lcls/ds/tools/ccp4-8.0/bin/shelxc\",\n        description=\"CCP4 SHELXC. Generates input files for SHELXD/SHELXE.\",\n        flag_type=\"\",\n    )\n    placeholder: str = Field(\n        \"xx\", description=\"Placeholder filename stem.\", flag_type=\"\"\n    )\n    in_file: str = Field(\n        \"\",\n        description=\"Input file for SHELXC with reflections AND proper records.\",\n        flag_type=\"\",\n    )\n\n    @validator(\"in_file\", always=True)\n    def validate_in_file(cls, in_file: str, values: Dict[str, Any]) -> str:\n        if in_file == \"\":\n            # get_hkl needed to be run to produce an XDS format file...\n            xds_format_file: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\", \"ManipulateHKL\", \"out_file\"\n            )\n            if xds_format_file:\n                in_file = xds_format_file\n        if in_file[0] != \"<\":\n            # Need to add a redirection for this program\n            # Runs like `shelxc xx <input_file.xds`\n            in_file = f\"<{in_file}\"\n        return in_file\n
"},{"location":"source/io/models/smd/","title":"smd","text":"

Models for smalldata_tools Tasks.

Classes:

Name Description SubmitSMDParameters

Parameters to run smalldata_tools to produce a smalldata HDF5 file.

FindOverlapXSSParameters

Parameter model for the FindOverlapXSS Task. Used to determine spatial/temporal overlap based on XSS difference signal.

"},{"location":"source/io/models/smd/#io.models.smd.FindOverlapXSSParameters","title":"FindOverlapXSSParameters","text":"

Bases: TaskParameters

TaskParameter model for FindOverlapXSS Task.

This Task determines spatial or temporal overlap between an optical pulse and the FEL pulse based on difference scattering (XSS) signal. This Task uses SmallData HDF5 files as a source.
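
A hedged sketch of the nested parameter structure this model expects, with field names taken from the source below and placeholder values:

# Placeholder values for illustration only; field names come from the model below.\nparams = {\n    \"exp_config\": {\"det_name\": \"epix_1\", \"ipm_var\": \"ipm2\", \"scan_var\": \"lxt\"},\n    \"thresholds\": {\"min_Iscat\": 10.0, \"min_ipm\": 500.0},\n    \"analysis_flags\": {\"use_pyfai\": True, \"use_asymls\": False},\n}\nprint(params)\n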

Source code in lute/io/models/smd.py
class FindOverlapXSSParameters(TaskParameters):\n    \"\"\"TaskParameter model for FindOverlapXSS Task.\n\n    This Task determines spatial or temporal overlap between an optical pulse\n    and the FEL pulse based on difference scattering (XSS) signal. This Task\n    uses SmallData HDF5 files as a source.\n    \"\"\"\n\n    class ExpConfig(BaseModel):\n        det_name: str\n        ipm_var: str\n        scan_var: Union[str, List[str]]\n\n    class Thresholds(BaseModel):\n        min_Iscat: Union[int, float]\n        min_ipm: Union[int, float]\n\n    class AnalysisFlags(BaseModel):\n        use_pyfai: bool = True\n        use_asymls: bool = False\n\n    exp_config: ExpConfig\n    thresholds: Thresholds\n    analysis_flags: AnalysisFlags\n
"},{"location":"source/io/models/smd/#io.models.smd.SubmitSMDParameters","title":"SubmitSMDParameters","text":"

Bases: ThirdPartyParameters

Parameters for running smalldata to produce reduced HDF5 files.
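
As an illustration of the define_result validator in the source below, the default HDF5 output location is built from the experiment and run number when no directory is given; the experiment and run used here are placeholders.

# Placeholder experiment and run number.\nexp, run = \"mfxp1234\", 7\nhutch = exp[:3]\n\ndirectory = f\"/sdf/data/lcls/ds/{hutch}/{exp}/hdf5/smalldata\"\nfname = f\"{exp}_Run{run:04d}.h5\"\nprint(f\"{directory}/{fname}\")\n# /sdf/data/lcls/ds/mfx/mfxp1234/hdf5/smalldata/mfxp1234_Run0007.h5\n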

Source code in lute/io/models/smd.py
class SubmitSMDParameters(ThirdPartyParameters):\n    \"\"\"Parameters for running smalldata to produce reduced HDF5 files.\"\"\"\n\n    class Config(ThirdPartyParameters.Config):\n        \"\"\"Identical to super-class Config but includes a result.\"\"\"\n\n        set_result: bool = True\n        \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n        result_from_params: str = \"\"\n        \"\"\"Defines a result from the parameters. Use a validator to do so.\"\"\"\n\n    executable: str = Field(\"mpirun\", description=\"MPI executable.\", flag_type=\"\")\n    np: PositiveInt = Field(\n        max(int(os.environ.get(\"SLURM_NPROCS\", len(os.sched_getaffinity(0)))) - 1, 1),\n        description=\"Number of processes\",\n        flag_type=\"-\",\n    )\n    p_arg1: str = Field(\n        \"python\", description=\"Executable to run with mpi (i.e. python).\", flag_type=\"\"\n    )\n    u: str = Field(\n        \"\", description=\"Python option for unbuffered output.\", flag_type=\"-\"\n    )\n    m: str = Field(\n        \"mpi4py.run\",\n        description=\"Python option to execute a module's contents as __main__ module.\",\n        flag_type=\"-\",\n    )\n    producer: str = Field(\n        \"\", description=\"Path to the SmallData producer Python script.\", flag_type=\"\"\n    )\n    run: str = Field(\n        os.environ.get(\"RUN_NUM\", \"\"), description=\"DAQ Run Number.\", flag_type=\"--\"\n    )\n    experiment: str = Field(\n        os.environ.get(\"EXPERIMENT\", \"\"),\n        description=\"LCLS Experiment Number.\",\n        flag_type=\"--\",\n    )\n    stn: NonNegativeInt = Field(0, description=\"Hutch endstation.\", flag_type=\"--\")\n    nevents: int = Field(\n        int(1e9), description=\"Number of events to process.\", flag_type=\"--\"\n    )\n    directory: Optional[str] = Field(\n        None,\n        description=\"Optional output directory. If None, will be in ${EXP_FOLDER}/hdf5/smalldata.\",\n        flag_type=\"--\",\n    )\n    ## Need mechanism to set result_from_param=True ...\n    gather_interval: PositiveInt = Field(\n        25, description=\"Number of events to collect at a time.\", flag_type=\"--\"\n    )\n    norecorder: bool = Field(\n        False, description=\"Whether to ignore recorder streams.\", flag_type=\"--\"\n    )\n    url: HttpUrl = Field(\n        \"https://pswww.slac.stanford.edu/ws-auth/lgbk\",\n        description=\"Base URL for eLog posting.\",\n        flag_type=\"--\",\n    )\n    epicsAll: bool = Field(\n        False,\n        description=\"Whether to store all EPICS PVs. Use with care.\",\n        flag_type=\"--\",\n    )\n    full: bool = Field(\n        False,\n        description=\"Whether to store all data. Use with EXTRA care.\",\n        flag_type=\"--\",\n    )\n    fullSum: bool = Field(\n        False,\n        description=\"Whether to store sums for all area detector images.\",\n        flag_type=\"--\",\n    )\n    default: bool = Field(\n        False,\n        description=\"Whether to store only the default minimal set of data.\",\n        flag_type=\"--\",\n    )\n    image: bool = Field(\n        False,\n        description=\"Whether to save everything as images. Use with care.\",\n        flag_type=\"--\",\n    )\n    tiff: bool = Field(\n        False,\n        description=\"Whether to save all images as a single TIFF. 
Use with EXTRA care.\",\n        flag_type=\"--\",\n    )\n    centerpix: bool = Field(\n        False,\n        description=\"Whether to mask center pixels for Epix10k2M detectors.\",\n        flag_type=\"--\",\n    )\n    postRuntable: bool = Field(\n        False,\n        description=\"Whether to post run tables. Also used as a trigger for summary jobs.\",\n        flag_type=\"--\",\n    )\n    wait: bool = Field(\n        False, description=\"Whether to wait for a file to appear.\", flag_type=\"--\"\n    )\n    xtcav: bool = Field(\n        False,\n        description=\"Whether to add XTCAV processing to the HDF5 generation.\",\n        flag_type=\"--\",\n    )\n    noarch: bool = Field(\n        False, description=\"Whether to not use archiver data.\", flag_type=\"--\"\n    )\n\n    lute_template_cfg: TemplateConfig = TemplateConfig(template_name=\"\", output_path=\"\")\n\n    @validator(\"producer\", always=True)\n    def validate_producer_path(cls, producer: str) -> str:\n        return producer\n\n    @validator(\"lute_template_cfg\", always=True)\n    def use_producer(\n        cls, lute_template_cfg: TemplateConfig, values: Dict[str, Any]\n    ) -> TemplateConfig:\n        if not lute_template_cfg.output_path:\n            lute_template_cfg.output_path = values[\"producer\"]\n        return lute_template_cfg\n\n    @root_validator(pre=False)\n    def define_result(cls, values: Dict[str, Any]) -> Dict[str, Any]:\n        exp: str = values[\"lute_config\"].experiment\n        hutch: str = exp[:3]\n        run: int = int(values[\"lute_config\"].run)\n        directory: Optional[str] = values[\"directory\"]\n        if directory is None:\n            directory = f\"/sdf/data/lcls/ds/{hutch}/{exp}/hdf5/smalldata\"\n        fname: str = f\"{exp}_Run{run:04d}.h5\"\n\n        cls.Config.result_from_params = f\"{directory}/{fname}\"\n        return values\n
"},{"location":"source/io/models/smd/#io.models.smd.SubmitSMDParameters.Config","title":"Config","text":"

Bases: Config

Identical to super-class Config but includes a result.

Source code in lute/io/models/smd.py
class Config(ThirdPartyParameters.Config):\n    \"\"\"Identical to super-class Config but includes a result.\"\"\"\n\n    set_result: bool = True\n    \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n    result_from_params: str = \"\"\n    \"\"\"Defines a result from the parameters. Use a validator to do so.\"\"\"\n
"},{"location":"source/io/models/smd/#io.models.smd.SubmitSMDParameters.Config.result_from_params","title":"result_from_params: str = '' class-attribute instance-attribute","text":"

Defines a result from the parameters. Use a validator to do so.

"},{"location":"source/io/models/smd/#io.models.smd.SubmitSMDParameters.Config.set_result","title":"set_result: bool = True class-attribute instance-attribute","text":"

Whether the Executor should mark a specified parameter as a result.

"},{"location":"source/io/models/tests/","title":"tests","text":"

Models for all test Tasks.

Classes:

Name Description TestParameters

Model for most basic test case. Single core first-party Task. Uses only communication via pipes.

TestBinaryParameters

Parameters for a simple multi-threaded binary executable.

TestSocketParameters

Model for first-party test requiring communication via socket.

TestWriteOutputParameters

Model for test Task which writes an output file. Location of file is recorded in database.

TestReadOutputParameters

Model for test Task which locates an output file based on an entry in the database, if no path is provided.

"},{"location":"source/io/models/tests/#io.models.tests.TestBinaryErrParameters","title":"TestBinaryErrParameters","text":"

Bases: ThirdPartyParameters

Same as TestBinary, but exits with non-zero code.

Source code in lute/io/models/tests.py
class TestBinaryErrParameters(ThirdPartyParameters):\n    \"\"\"Same as TestBinary, but exits with non-zero code.\"\"\"\n\n    executable: str = Field(\n        \"/sdf/home/d/dorlhiac/test_tasks/test_threads_err\",\n        description=\"Multi-threaded test binary with non-zero exit code.\",\n    )\n    p_arg1: int = Field(1, description=\"Number of threads.\")\n
"},{"location":"source/io/models/tests/#io.models.tests.TestParameters","title":"TestParameters","text":"

Bases: TaskParameters

Parameters for the test Task Test.

Source code in lute/io/models/tests.py
class TestParameters(TaskParameters):\n    \"\"\"Parameters for the test Task `Test`.\"\"\"\n\n    float_var: float = Field(0.01, description=\"A floating point number.\")\n    str_var: str = Field(\"test\", description=\"A string.\")\n\n    class CompoundVar(BaseModel):\n        int_var: int = 1\n        dict_var: Dict[str, str] = {\"a\": \"b\"}\n\n    compound_var: CompoundVar = Field(\n        description=(\n            \"A compound parameter - consists of a `int_var` (int) and `dict_var`\"\n            \" (Dict[str, str]).\"\n        )\n    )\n    throw_error: bool = Field(\n        False, description=\"If `True`, raise an exception to test error handling.\"\n    )\n
"},{"location":"source/tasks/dataclasses/","title":"dataclasses","text":"

Classes for describing Task state and results.

Classes:

Name Description TaskResult

Output of a specific analysis task.

TaskStatus

Enumeration of possible Task statuses (running, pending, failed, etc.).

DescribedAnalysis

Executor's description of a Task run (results, parameters, env).

"},{"location":"source/tasks/dataclasses/#tasks.dataclasses.DescribedAnalysis","title":"DescribedAnalysis dataclass","text":"

Complete analysis description. Held by an Executor.

Source code in lute/tasks/dataclasses.py
@dataclass\nclass DescribedAnalysis:\n    \"\"\"Complete analysis description. Held by an Executor.\"\"\"\n\n    task_result: TaskResult\n    task_parameters: Optional[TaskParameters]\n    task_env: Dict[str, str]\n    poll_interval: float\n    communicator_desc: List[str]\n
"},{"location":"source/tasks/dataclasses/#tasks.dataclasses.ElogSummaryPlots","title":"ElogSummaryPlots dataclass","text":"

Holds a graphical summary intended for display in the eLog.

Attributes:

Name Type Description display_name str

This represents both a path and how the result will be displayed in the eLog. Can include \"/\" characters. E.g. display_name = \"scans/my_motor_scan\" will have plots shown on a \"my_motor_scan\" page, under a \"scans\" tab. This format mirrors how the file is stored on disk as well.

Source code in lute/tasks/dataclasses.py
@dataclass\nclass ElogSummaryPlots:\n    \"\"\"Holds a graphical summary intended for display in the eLog.\n\n    Attributes:\n        display_name (str): This represents both a path and how the result will be\n            displayed in the eLog. Can include \"/\" characters. E.g.\n            `display_name = \"scans/my_motor_scan\"` will have plots shown\n            on a \"my_motor_scan\" page, under a \"scans\" tab. This format mirrors\n            how the file is stored on disk as well.\n    \"\"\"\n\n    display_name: str\n    figures: Union[pn.Tabs, hv.Image, plt.Figure]\n
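
A minimal usage sketch for the dataclass shown above, assuming a LUTE environment in which these imports resolve; the figure content is an empty placeholder.

import matplotlib.pyplot as plt\n\nfrom lute.tasks.dataclasses import ElogSummaryPlots\n\n# An empty placeholder figure; real Tasks attach their actual plots.\nfig = plt.figure()\nplots = ElogSummaryPlots(display_name=\"scans/my_motor_scan\", figures=fig)\nprint(plots.display_name)  # Shown as \"my_motor_scan\" under a \"scans\" tab.\n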
"},{"location":"source/tasks/dataclasses/#tasks.dataclasses.TaskResult","title":"TaskResult dataclass","text":"

Class for storing the result of a Task's execution with metadata.

Attributes:

Name Type Description task_name str

Name of the associated task which produced it.

task_status TaskStatus

Status of associated task.

summary str

Short message/summary associated with the result.

payload Any

Actual result. May be data in any format.

impl_schemas Optional[str]

A string listing Task schemas implemented by the associated Task. Schemas define the category and expected output of the Task. An individual task may implement/conform to multiple schemas. Multiple schemas are separated by ';', e.g. * impl_schemas = \"schema1;schema2\"

Source code in lute/tasks/dataclasses.py
@dataclass\nclass TaskResult:\n    \"\"\"Class for storing the result of a Task's execution with metadata.\n\n    Attributes:\n        task_name (str): Name of the associated task which produced it.\n\n        task_status (TaskStatus): Status of associated task.\n\n        summary (str): Short message/summary associated with the result.\n\n        payload (Any): Actual result. May be data in any format.\n\n        impl_schemas (Optional[str]): A string listing `Task` schemas implemented\n            by the associated `Task`. Schemas define the category and expected\n            output of the `Task`. An individual task may implement/conform to\n            multiple schemas. Multiple schemas are separated by ';', e.g.\n                * impl_schemas = \"schema1;schema2\"\n    \"\"\"\n\n    task_name: str\n    task_status: TaskStatus\n    summary: str\n    payload: Any\n    impl_schemas: Optional[str] = None\n
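A minimal sketch of constructing a result by hand with illustrative values (a Task normally populates its own result):
from lute.tasks.dataclasses import TaskResult, TaskStatus\n\nresult: TaskResult = TaskResult(\n    task_name=\"Test\",\n    task_status=TaskStatus.COMPLETED,\n    summary=\"Analysis finished.\",\n    payload={\"n_events\": 100},  # Any format is permitted\n    impl_schemas=\"schema1;schema2\",  # ';'-separated schema names\n)\n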
"},{"location":"source/tasks/dataclasses/#tasks.dataclasses.TaskStatus","title":"TaskStatus","text":"

Bases: Enum

Possible Task statuses.

Source code in lute/tasks/dataclasses.py
class TaskStatus(Enum):\n    \"\"\"Possible Task statuses.\"\"\"\n\n    PENDING = 0\n    \"\"\"\n    Task has yet to run. Is Queued, or waiting for prior tasks.\n    \"\"\"\n    RUNNING = 1\n    \"\"\"\n    Task is in the process of execution.\n    \"\"\"\n    COMPLETED = 2\n    \"\"\"\n    Task has completed without fatal errors.\n    \"\"\"\n    FAILED = 3\n    \"\"\"\n    Task encountered a fatal error.\n    \"\"\"\n    STOPPED = 4\n    \"\"\"\n    Task was, potentially temporarily, stopped/suspended.\n    \"\"\"\n    CANCELLED = 5\n    \"\"\"\n    Task was cancelled prior to completion or failure.\n    \"\"\"\n    TIMEDOUT = 6\n    \"\"\"\n    Task did not reach completion due to timeout.\n    \"\"\"\n
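A short, illustrative sketch of inspecting a status value:
from lute.tasks.dataclasses import TaskStatus\n\nstatus: TaskStatus = TaskStatus.PENDING\nif status in (TaskStatus.FAILED, TaskStatus.CANCELLED, TaskStatus.TIMEDOUT):\n    print(\"Task did not complete.\")\nelse:\n    print(f\"Current status: {status.name} ({status.value})\")\n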
"},{"location":"source/tasks/dataclasses/#tasks.dataclasses.TaskStatus.CANCELLED","title":"CANCELLED = 5 class-attribute instance-attribute","text":"

Task was cancelled prior to completion or failure.

"},{"location":"source/tasks/dataclasses/#tasks.dataclasses.TaskStatus.COMPLETED","title":"COMPLETED = 2 class-attribute instance-attribute","text":"

Task has completed without fatal errors.

"},{"location":"source/tasks/dataclasses/#tasks.dataclasses.TaskStatus.FAILED","title":"FAILED = 3 class-attribute instance-attribute","text":"

Task encountered a fatal error.

"},{"location":"source/tasks/dataclasses/#tasks.dataclasses.TaskStatus.PENDING","title":"PENDING = 0 class-attribute instance-attribute","text":"

Task has yet to run. Is Queued, or waiting for prior tasks.

"},{"location":"source/tasks/dataclasses/#tasks.dataclasses.TaskStatus.RUNNING","title":"RUNNING = 1 class-attribute instance-attribute","text":"

Task is in the process of execution.

"},{"location":"source/tasks/dataclasses/#tasks.dataclasses.TaskStatus.STOPPED","title":"STOPPED = 4 class-attribute instance-attribute","text":"

Task was, potentially temporarily, stopped/suspended.

"},{"location":"source/tasks/dataclasses/#tasks.dataclasses.TaskStatus.TIMEDOUT","title":"TIMEDOUT = 6 class-attribute instance-attribute","text":"

Task did not reach completion due to timeout.

"},{"location":"source/tasks/sfx_find_peaks/","title":"sfx_find_peaks","text":"

Classes for peak finding tasks in SFX.

Classes:

Name Description CxiWriter

Utility class for writing peak finding results to CXI files.

FindPeaksPyAlgos

Peak finding using psana's PyAlgos algorithm. Optional data compression and decompression with libpressio for data reduction tests.

"},{"location":"source/tasks/sfx_find_peaks/#tasks.sfx_find_peaks.CxiWriter","title":"CxiWriter","text":"Source code in lute/tasks/sfx_find_peaks.py
class CxiWriter:\n\n    def __init__(\n        self,\n        outdir: str,\n        rank: int,\n        exp: str,\n        run: int,\n        n_events: int,\n        det_shape: Tuple[int, ...],\n        min_peaks: int,\n        max_peaks: int,\n        i_x: Any,  # Not typed becomes it comes from psana\n        i_y: Any,  # Not typed becomes it comes from psana\n        ipx: Any,  # Not typed becomes it comes from psana\n        ipy: Any,  # Not typed becomes it comes from psana\n        tag: str,\n    ):\n        \"\"\"\n        Set up the CXI files to which peak finding results will be saved.\n\n        Parameters:\n\n            outdir (str): Output directory for cxi file.\n\n            rank (int): MPI rank of the caller.\n\n            exp (str): Experiment string.\n\n            run (int): Experimental run.\n\n            n_events (int): Number of events to process.\n\n            det_shape (Tuple[int, int]): Shape of the numpy array storing the detector\n                data. This must be aCheetah-stile 2D array.\n\n            min_peaks (int): Minimum number of peaks per image.\n\n            max_peaks (int): Maximum number of peaks per image.\n\n            i_x (Any): Array of pixel indexes along x\n\n            i_y (Any): Array of pixel indexes along y\n\n            ipx (Any): Pixel indexes with respect to detector origin (x component)\n\n            ipy (Any): Pixel indexes with respect to detector origin (y component)\n\n            tag (str): Tag to append to cxi file names.\n        \"\"\"\n        self._det_shape: Tuple[int, ...] = det_shape\n        self._i_x: Any = i_x\n        self._i_y: Any = i_y\n        self._ipx: Any = ipx\n        self._ipy: Any = ipy\n        self._index: int = 0\n\n        # Create and open the HDF5 file\n        fname: str = f\"{exp}_r{run:0>4}_{rank}{tag}.cxi\"\n        Path(outdir).mkdir(exist_ok=True)\n        self._outh5: Any = h5py.File(Path(outdir) / fname, \"w\")\n\n        # Entry_1 entry for processing with CrystFEL\n        entry_1: Any = self._outh5.create_group(\"entry_1\")\n        keys: List[str] = [\n            \"nPeaks\",\n            \"peakXPosRaw\",\n            \"peakYPosRaw\",\n            \"rcent\",\n            \"ccent\",\n            \"rmin\",\n            \"rmax\",\n            \"cmin\",\n            \"cmax\",\n            \"peakTotalIntensity\",\n            \"peakMaxIntensity\",\n            \"peakRadius\",\n        ]\n        ds_expId: Any = entry_1.create_dataset(\n            \"experimental_identifier\", (n_events,), maxshape=(None,), dtype=int\n        )\n        ds_expId.attrs[\"axes\"] = \"experiment_identifier\"\n        data_1: Any = entry_1.create_dataset(\n            \"/entry_1/data_1/data\",\n            (n_events, det_shape[0], det_shape[1]),\n            chunks=(1, det_shape[0], det_shape[1]),\n            maxshape=(None, det_shape[0], det_shape[1]),\n            dtype=numpy.float32,\n        )\n        data_1.attrs[\"axes\"] = \"experiment_identifier\"\n        key: str\n        for key in [\"powderHits\", \"powderMisses\", \"mask\"]:\n            entry_1.create_dataset(\n                f\"/entry_1/data_1/{key}\",\n                (det_shape[0], det_shape[1]),\n                chunks=(det_shape[0], det_shape[1]),\n                maxshape=(det_shape[0], det_shape[1]),\n                dtype=float,\n            )\n\n        # Peak-related entries\n        for key in keys:\n            if key == \"nPeaks\":\n                ds_x: Any = self._outh5.create_dataset(\n                    
f\"/entry_1/result_1/{key}\",\n                    (n_events,),\n                    maxshape=(None,),\n                    dtype=int,\n                )\n                ds_x.attrs[\"minPeaks\"] = min_peaks\n                ds_x.attrs[\"maxPeaks\"] = max_peaks\n            else:\n                ds_x: Any = self._outh5.create_dataset(\n                    f\"/entry_1/result_1/{key}\",\n                    (n_events, max_peaks),\n                    maxshape=(None, max_peaks),\n                    chunks=(1, max_peaks),\n                    dtype=float,\n                )\n            ds_x.attrs[\"axes\"] = \"experiment_identifier:peaks\"\n\n        # Timestamp entries\n        lcls_1: Any = self._outh5.create_group(\"LCLS\")\n        keys: List[str] = [\n            \"eventNumber\",\n            \"machineTime\",\n            \"machineTimeNanoSeconds\",\n            \"fiducial\",\n            \"photon_energy_eV\",\n        ]\n        key: str\n        for key in keys:\n            if key == \"photon_energy_eV\":\n                ds_x: Any = lcls_1.create_dataset(\n                    f\"{key}\", (n_events,), maxshape=(None,), dtype=float\n                )\n            else:\n                ds_x = lcls_1.create_dataset(\n                    f\"{key}\", (n_events,), maxshape=(None,), dtype=int\n                )\n            ds_x.attrs[\"axes\"] = \"experiment_identifier\"\n\n        ds_x = self._outh5.create_dataset(\n            \"/LCLS/detector_1/EncoderValue\", (n_events,), maxshape=(None,), dtype=float\n        )\n        ds_x.attrs[\"axes\"] = \"experiment_identifier\"\n\n    def write_event(\n        self,\n        img: NDArray[numpy.float_],\n        peaks: Any,  # Not typed becomes it comes from psana\n        timestamp_seconds: int,\n        timestamp_nanoseconds: int,\n        timestamp_fiducials: int,\n        photon_energy: float,\n    ):\n        \"\"\"\n        Write peak finding results for an event into the HDF5 file.\n\n        Parameters:\n\n            img (NDArray[numpy.float_]): Detector data for the event\n\n            peaks: (Any): Peak information for the event, as recovered from the PyAlgos\n                algorithm\n\n            timestamp_seconds (int): Second part of the event's timestamp information\n\n            timestamp_nanoseconds (int): Nanosecond part of the event's timestamp\n                information\n\n            timestamp_fiducials (int): Fiducials part of the event's timestamp\n                information\n\n            photon_energy (float): Photon energy for the event\n        \"\"\"\n        ch_rows: NDArray[numpy.float_] = peaks[:, 0] * self._det_shape[1] + peaks[:, 1]\n        ch_cols: NDArray[numpy.float_] = peaks[:, 2]\n\n        if self._outh5[\"/entry_1/data_1/data\"].shape[0] <= self._index:\n            self._outh5[\"entry_1/data_1/data\"].resize(self._index + 1, axis=0)\n            ds_key: str\n            for ds_key in self._outh5[\"/entry_1/result_1\"].keys():\n                self._outh5[f\"/entry_1/result_1/{ds_key}\"].resize(\n                    self._index + 1, axis=0\n                )\n            for ds_key in (\n                \"machineTime\",\n                \"machineTimeNanoSeconds\",\n                \"fiducial\",\n                \"photon_energy_eV\",\n            ):\n                self._outh5[f\"/LCLS/{ds_key}\"].resize(self._index + 1, axis=0)\n\n        # Entry_1 entry for processing with CrystFEL\n        self._outh5[\"/entry_1/data_1/data\"][self._index, :, :] = img.reshape(\n            -1, img.shape[-1]\n 
       )\n        self._outh5[\"/entry_1/result_1/nPeaks\"][self._index] = peaks.shape[0]\n        self._outh5[\"/entry_1/result_1/peakXPosRaw\"][self._index, : peaks.shape[0]] = (\n            ch_cols.astype(\"int\")\n        )\n        self._outh5[\"/entry_1/result_1/peakYPosRaw\"][self._index, : peaks.shape[0]] = (\n            ch_rows.astype(\"int\")\n        )\n        self._outh5[\"/entry_1/result_1/rcent\"][self._index, : peaks.shape[0]] = peaks[\n            :, 6\n        ]\n        self._outh5[\"/entry_1/result_1/ccent\"][self._index, : peaks.shape[0]] = peaks[\n            :, 7\n        ]\n        self._outh5[\"/entry_1/result_1/rmin\"][self._index, : peaks.shape[0]] = peaks[\n            :, 10\n        ]\n        self._outh5[\"/entry_1/result_1/rmax\"][self._index, : peaks.shape[0]] = peaks[\n            :, 11\n        ]\n        self._outh5[\"/entry_1/result_1/cmin\"][self._index, : peaks.shape[0]] = peaks[\n            :, 12\n        ]\n        self._outh5[\"/entry_1/result_1/cmax\"][self._index, : peaks.shape[0]] = peaks[\n            :, 13\n        ]\n        self._outh5[\"/entry_1/result_1/peakTotalIntensity\"][\n            self._index, : peaks.shape[0]\n        ] = peaks[:, 5]\n        self._outh5[\"/entry_1/result_1/peakMaxIntensity\"][\n            self._index, : peaks.shape[0]\n        ] = peaks[:, 4]\n\n        # Calculate and write pixel radius\n        peaks_cenx: NDArray[numpy.float_] = (\n            self._i_x[\n                numpy.array(peaks[:, 0], dtype=numpy.int64),\n                numpy.array(peaks[:, 1], dtype=numpy.int64),\n                numpy.array(peaks[:, 2], dtype=numpy.int64),\n            ]\n            + 0.5\n            - self._ipx\n        )\n        peaks_ceny: NDArray[numpy.float_] = (\n            self._i_y[\n                numpy.array(peaks[:, 0], dtype=numpy.int64),\n                numpy.array(peaks[:, 1], dtype=numpy.int64),\n                numpy.array(peaks[:, 2], dtype=numpy.int64),\n            ]\n            + 0.5\n            - self._ipy\n        )\n        peak_radius: NDArray[numpy.float_] = numpy.sqrt(\n            (peaks_cenx**2) + (peaks_ceny**2)\n        )\n        self._outh5[\"/entry_1/result_1/peakRadius\"][\n            self._index, : peaks.shape[0]\n        ] = peak_radius\n\n        # LCLS entry dataset\n        self._outh5[\"/LCLS/machineTime\"][self._index] = timestamp_seconds\n        self._outh5[\"/LCLS/machineTimeNanoSeconds\"][self._index] = timestamp_nanoseconds\n        self._outh5[\"/LCLS/fiducial\"][self._index] = timestamp_fiducials\n        self._outh5[\"/LCLS/photon_energy_eV\"][self._index] = photon_energy\n\n        self._index += 1\n\n    def write_non_event_data(\n        self,\n        powder_hits: NDArray[numpy.float_],\n        powder_misses: NDArray[numpy.float_],\n        mask: NDArray[numpy.uint16],\n        clen: float,\n    ):\n        \"\"\"\n        Write to the file data that is not related to a specific event (masks, powders)\n\n        Parameters:\n\n            powder_hits (NDArray[numpy.float_]): Virtual powder pattern from hits\n\n            powder_misses (NDArray[numpy.float_]): Virtual powder pattern from hits\n\n            mask: (NDArray[numpy.uint16]): Pixel ask to write into the file\n\n        \"\"\"\n        # Add powders and mask to files, reshaping them to match the crystfel\n        # convention\n        self._outh5[\"/entry_1/data_1/powderHits\"][:] = powder_hits.reshape(\n            -1, powder_hits.shape[-1]\n        )\n        
self._outh5[\"/entry_1/data_1/powderMisses\"][:] = powder_misses.reshape(\n            -1, powder_misses.shape[-1]\n        )\n        self._outh5[\"/entry_1/data_1/mask\"][:] = (1 - mask).reshape(\n            -1, mask.shape[-1]\n        )  # Crystfel expects inverted values\n\n        # Add clen distance\n        self._outh5[\"/LCLS/detector_1/EncoderValue\"][:] = clen\n\n    def optimize_and_close_file(\n        self,\n        num_hits: int,\n        max_peaks: int,\n    ):\n        \"\"\"\n        Resize data blocks and write additional information to the file\n\n        Parameters:\n\n            num_hits (int): Number of hits for which information has been saved to the\n                file\n\n            max_peaks (int): Maximum number of peaks (per event) for which information\n                can be written into the file\n        \"\"\"\n\n        # Resize the entry_1 entry\n        data_shape: Tuple[int, ...] = self._outh5[\"/entry_1/data_1/data\"].shape\n        self._outh5[\"/entry_1/data_1/data\"].resize(\n            (num_hits, data_shape[1], data_shape[2])\n        )\n        self._outh5[f\"/entry_1/result_1/nPeaks\"].resize((num_hits,))\n        key: str\n        for key in [\n            \"peakXPosRaw\",\n            \"peakYPosRaw\",\n            \"rcent\",\n            \"ccent\",\n            \"rmin\",\n            \"rmax\",\n            \"cmin\",\n            \"cmax\",\n            \"peakTotalIntensity\",\n            \"peakMaxIntensity\",\n            \"peakRadius\",\n        ]:\n            self._outh5[f\"/entry_1/result_1/{key}\"].resize((num_hits, max_peaks))\n\n        # Resize LCLS entry\n        for key in [\n            \"eventNumber\",\n            \"machineTime\",\n            \"machineTimeNanoSeconds\",\n            \"fiducial\",\n            \"detector_1/EncoderValue\",\n            \"photon_energy_eV\",\n        ]:\n            self._outh5[f\"/LCLS/{key}\"].resize((num_hits,))\n        self._outh5.close()\n
"},{"location":"source/tasks/sfx_find_peaks/#tasks.sfx_find_peaks.CxiWriter.__init__","title":"__init__(outdir, rank, exp, run, n_events, det_shape, min_peaks, max_peaks, i_x, i_y, ipx, ipy, tag)","text":"

Set up the CXI files to which peak finding results will be saved.

Parameters:

outdir (str): Output directory for cxi file.\n\nrank (int): MPI rank of the caller.\n\nexp (str): Experiment string.\n\nrun (int): Experimental run.\n\nn_events (int): Number of events to process.\n\ndet_shape (Tuple[int, int]): Shape of the numpy array storing the detector\n    data. This must be a Cheetah-style 2D array.\n\nmin_peaks (int): Minimum number of peaks per image.\n\nmax_peaks (int): Maximum number of peaks per image.\n\ni_x (Any): Array of pixel indexes along x\n\ni_y (Any): Array of pixel indexes along y\n\nipx (Any): Pixel indexes with respect to detector origin (x component)\n\nipy (Any): Pixel indexes with respect to detector origin (y component)\n\ntag (str): Tag to append to cxi file names.\n
Source code in lute/tasks/sfx_find_peaks.py
def __init__(\n    self,\n    outdir: str,\n    rank: int,\n    exp: str,\n    run: int,\n    n_events: int,\n    det_shape: Tuple[int, ...],\n    min_peaks: int,\n    max_peaks: int,\n    i_x: Any,  # Not typed becomes it comes from psana\n    i_y: Any,  # Not typed becomes it comes from psana\n    ipx: Any,  # Not typed becomes it comes from psana\n    ipy: Any,  # Not typed becomes it comes from psana\n    tag: str,\n):\n    \"\"\"\n    Set up the CXI files to which peak finding results will be saved.\n\n    Parameters:\n\n        outdir (str): Output directory for cxi file.\n\n        rank (int): MPI rank of the caller.\n\n        exp (str): Experiment string.\n\n        run (int): Experimental run.\n\n        n_events (int): Number of events to process.\n\n        det_shape (Tuple[int, int]): Shape of the numpy array storing the detector\n            data. This must be aCheetah-stile 2D array.\n\n        min_peaks (int): Minimum number of peaks per image.\n\n        max_peaks (int): Maximum number of peaks per image.\n\n        i_x (Any): Array of pixel indexes along x\n\n        i_y (Any): Array of pixel indexes along y\n\n        ipx (Any): Pixel indexes with respect to detector origin (x component)\n\n        ipy (Any): Pixel indexes with respect to detector origin (y component)\n\n        tag (str): Tag to append to cxi file names.\n    \"\"\"\n    self._det_shape: Tuple[int, ...] = det_shape\n    self._i_x: Any = i_x\n    self._i_y: Any = i_y\n    self._ipx: Any = ipx\n    self._ipy: Any = ipy\n    self._index: int = 0\n\n    # Create and open the HDF5 file\n    fname: str = f\"{exp}_r{run:0>4}_{rank}{tag}.cxi\"\n    Path(outdir).mkdir(exist_ok=True)\n    self._outh5: Any = h5py.File(Path(outdir) / fname, \"w\")\n\n    # Entry_1 entry for processing with CrystFEL\n    entry_1: Any = self._outh5.create_group(\"entry_1\")\n    keys: List[str] = [\n        \"nPeaks\",\n        \"peakXPosRaw\",\n        \"peakYPosRaw\",\n        \"rcent\",\n        \"ccent\",\n        \"rmin\",\n        \"rmax\",\n        \"cmin\",\n        \"cmax\",\n        \"peakTotalIntensity\",\n        \"peakMaxIntensity\",\n        \"peakRadius\",\n    ]\n    ds_expId: Any = entry_1.create_dataset(\n        \"experimental_identifier\", (n_events,), maxshape=(None,), dtype=int\n    )\n    ds_expId.attrs[\"axes\"] = \"experiment_identifier\"\n    data_1: Any = entry_1.create_dataset(\n        \"/entry_1/data_1/data\",\n        (n_events, det_shape[0], det_shape[1]),\n        chunks=(1, det_shape[0], det_shape[1]),\n        maxshape=(None, det_shape[0], det_shape[1]),\n        dtype=numpy.float32,\n    )\n    data_1.attrs[\"axes\"] = \"experiment_identifier\"\n    key: str\n    for key in [\"powderHits\", \"powderMisses\", \"mask\"]:\n        entry_1.create_dataset(\n            f\"/entry_1/data_1/{key}\",\n            (det_shape[0], det_shape[1]),\n            chunks=(det_shape[0], det_shape[1]),\n            maxshape=(det_shape[0], det_shape[1]),\n            dtype=float,\n        )\n\n    # Peak-related entries\n    for key in keys:\n        if key == \"nPeaks\":\n            ds_x: Any = self._outh5.create_dataset(\n                f\"/entry_1/result_1/{key}\",\n                (n_events,),\n                maxshape=(None,),\n                dtype=int,\n            )\n            ds_x.attrs[\"minPeaks\"] = min_peaks\n            ds_x.attrs[\"maxPeaks\"] = max_peaks\n        else:\n            ds_x: Any = self._outh5.create_dataset(\n                f\"/entry_1/result_1/{key}\",\n                (n_events, 
max_peaks),\n                maxshape=(None, max_peaks),\n                chunks=(1, max_peaks),\n                dtype=float,\n            )\n        ds_x.attrs[\"axes\"] = \"experiment_identifier:peaks\"\n\n    # Timestamp entries\n    lcls_1: Any = self._outh5.create_group(\"LCLS\")\n    keys: List[str] = [\n        \"eventNumber\",\n        \"machineTime\",\n        \"machineTimeNanoSeconds\",\n        \"fiducial\",\n        \"photon_energy_eV\",\n    ]\n    key: str\n    for key in keys:\n        if key == \"photon_energy_eV\":\n            ds_x: Any = lcls_1.create_dataset(\n                f\"{key}\", (n_events,), maxshape=(None,), dtype=float\n            )\n        else:\n            ds_x = lcls_1.create_dataset(\n                f\"{key}\", (n_events,), maxshape=(None,), dtype=int\n            )\n        ds_x.attrs[\"axes\"] = \"experiment_identifier\"\n\n    ds_x = self._outh5.create_dataset(\n        \"/LCLS/detector_1/EncoderValue\", (n_events,), maxshape=(None,), dtype=float\n    )\n    ds_x.attrs[\"axes\"] = \"experiment_identifier\"\n
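A minimal construction sketch with hypothetical values: in production det_shape and the index arrays come from a psana Detector (and importing the module requires a psana environment), but numpy arrays of the right shape suffice to open a file for illustration.
import numpy\n\nfrom lute.tasks.sfx_find_peaks import CxiWriter\n\ndet_shape = (512, 1024)  # Hypothetical Cheetah-style 2D layout\nwriter = CxiWriter(\n    outdir=\"/tmp/cxi_out\",  # Hypothetical output directory\n    rank=0,\n    exp=\"myexp123\",  # Hypothetical experiment string\n    run=7,\n    n_events=100,\n    det_shape=det_shape,\n    min_peaks=10,\n    max_peaks=2048,\n    i_x=numpy.zeros((1, 512, 1024), dtype=numpy.int64),  # Normally det.indexes_x(run)\n    i_y=numpy.zeros((1, 512, 1024), dtype=numpy.int64),  # Normally det.indexes_y(run)\n    ipx=0,  # Normally from det.point_indexes(run, pxy_um=(0, 0))\n    ipy=0,\n    tag=\"_test\",\n)\n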
"},{"location":"source/tasks/sfx_find_peaks/#tasks.sfx_find_peaks.CxiWriter.optimize_and_close_file","title":"optimize_and_close_file(num_hits, max_peaks)","text":"

Resize data blocks and write additional information to the file

Parameters:

num_hits (int): Number of hits for which information has been saved to the\n    file\n\nmax_peaks (int): Maximum number of peaks (per event) for which information\n    can be written into the file\n
Source code in lute/tasks/sfx_find_peaks.py
def optimize_and_close_file(\n    self,\n    num_hits: int,\n    max_peaks: int,\n):\n    \"\"\"\n    Resize data blocks and write additional information to the file\n\n    Parameters:\n\n        num_hits (int): Number of hits for which information has been saved to the\n            file\n\n        max_peaks (int): Maximum number of peaks (per event) for which information\n            can be written into the file\n    \"\"\"\n\n    # Resize the entry_1 entry\n    data_shape: Tuple[int, ...] = self._outh5[\"/entry_1/data_1/data\"].shape\n    self._outh5[\"/entry_1/data_1/data\"].resize(\n        (num_hits, data_shape[1], data_shape[2])\n    )\n    self._outh5[f\"/entry_1/result_1/nPeaks\"].resize((num_hits,))\n    key: str\n    for key in [\n        \"peakXPosRaw\",\n        \"peakYPosRaw\",\n        \"rcent\",\n        \"ccent\",\n        \"rmin\",\n        \"rmax\",\n        \"cmin\",\n        \"cmax\",\n        \"peakTotalIntensity\",\n        \"peakMaxIntensity\",\n        \"peakRadius\",\n    ]:\n        self._outh5[f\"/entry_1/result_1/{key}\"].resize((num_hits, max_peaks))\n\n    # Resize LCLS entry\n    for key in [\n        \"eventNumber\",\n        \"machineTime\",\n        \"machineTimeNanoSeconds\",\n        \"fiducial\",\n        \"detector_1/EncoderValue\",\n        \"photon_energy_eV\",\n    ]:\n        self._outh5[f\"/LCLS/{key}\"].resize((num_hits,))\n    self._outh5.close()\n
"},{"location":"source/tasks/sfx_find_peaks/#tasks.sfx_find_peaks.CxiWriter.write_event","title":"write_event(img, peaks, timestamp_seconds, timestamp_nanoseconds, timestamp_fiducials, photon_energy)","text":"

Write peak finding results for an event into the HDF5 file.

Parameters:

img (NDArray[numpy.float_]): Detector data for the event\n\npeaks: (Any): Peak information for the event, as recovered from the PyAlgos\n    algorithm\n\ntimestamp_seconds (int): Second part of the event's timestamp information\n\ntimestamp_nanoseconds (int): Nanosecond part of the event's timestamp\n    information\n\ntimestamp_fiducials (int): Fiducials part of the event's timestamp\n    information\n\nphoton_energy (float): Photon energy for the event\n
Source code in lute/tasks/sfx_find_peaks.py
def write_event(\n    self,\n    img: NDArray[numpy.float_],\n    peaks: Any,  # Not typed becomes it comes from psana\n    timestamp_seconds: int,\n    timestamp_nanoseconds: int,\n    timestamp_fiducials: int,\n    photon_energy: float,\n):\n    \"\"\"\n    Write peak finding results for an event into the HDF5 file.\n\n    Parameters:\n\n        img (NDArray[numpy.float_]): Detector data for the event\n\n        peaks: (Any): Peak information for the event, as recovered from the PyAlgos\n            algorithm\n\n        timestamp_seconds (int): Second part of the event's timestamp information\n\n        timestamp_nanoseconds (int): Nanosecond part of the event's timestamp\n            information\n\n        timestamp_fiducials (int): Fiducials part of the event's timestamp\n            information\n\n        photon_energy (float): Photon energy for the event\n    \"\"\"\n    ch_rows: NDArray[numpy.float_] = peaks[:, 0] * self._det_shape[1] + peaks[:, 1]\n    ch_cols: NDArray[numpy.float_] = peaks[:, 2]\n\n    if self._outh5[\"/entry_1/data_1/data\"].shape[0] <= self._index:\n        self._outh5[\"entry_1/data_1/data\"].resize(self._index + 1, axis=0)\n        ds_key: str\n        for ds_key in self._outh5[\"/entry_1/result_1\"].keys():\n            self._outh5[f\"/entry_1/result_1/{ds_key}\"].resize(\n                self._index + 1, axis=0\n            )\n        for ds_key in (\n            \"machineTime\",\n            \"machineTimeNanoSeconds\",\n            \"fiducial\",\n            \"photon_energy_eV\",\n        ):\n            self._outh5[f\"/LCLS/{ds_key}\"].resize(self._index + 1, axis=0)\n\n    # Entry_1 entry for processing with CrystFEL\n    self._outh5[\"/entry_1/data_1/data\"][self._index, :, :] = img.reshape(\n        -1, img.shape[-1]\n    )\n    self._outh5[\"/entry_1/result_1/nPeaks\"][self._index] = peaks.shape[0]\n    self._outh5[\"/entry_1/result_1/peakXPosRaw\"][self._index, : peaks.shape[0]] = (\n        ch_cols.astype(\"int\")\n    )\n    self._outh5[\"/entry_1/result_1/peakYPosRaw\"][self._index, : peaks.shape[0]] = (\n        ch_rows.astype(\"int\")\n    )\n    self._outh5[\"/entry_1/result_1/rcent\"][self._index, : peaks.shape[0]] = peaks[\n        :, 6\n    ]\n    self._outh5[\"/entry_1/result_1/ccent\"][self._index, : peaks.shape[0]] = peaks[\n        :, 7\n    ]\n    self._outh5[\"/entry_1/result_1/rmin\"][self._index, : peaks.shape[0]] = peaks[\n        :, 10\n    ]\n    self._outh5[\"/entry_1/result_1/rmax\"][self._index, : peaks.shape[0]] = peaks[\n        :, 11\n    ]\n    self._outh5[\"/entry_1/result_1/cmin\"][self._index, : peaks.shape[0]] = peaks[\n        :, 12\n    ]\n    self._outh5[\"/entry_1/result_1/cmax\"][self._index, : peaks.shape[0]] = peaks[\n        :, 13\n    ]\n    self._outh5[\"/entry_1/result_1/peakTotalIntensity\"][\n        self._index, : peaks.shape[0]\n    ] = peaks[:, 5]\n    self._outh5[\"/entry_1/result_1/peakMaxIntensity\"][\n        self._index, : peaks.shape[0]\n    ] = peaks[:, 4]\n\n    # Calculate and write pixel radius\n    peaks_cenx: NDArray[numpy.float_] = (\n        self._i_x[\n            numpy.array(peaks[:, 0], dtype=numpy.int64),\n            numpy.array(peaks[:, 1], dtype=numpy.int64),\n            numpy.array(peaks[:, 2], dtype=numpy.int64),\n        ]\n        + 0.5\n        - self._ipx\n    )\n    peaks_ceny: NDArray[numpy.float_] = (\n        self._i_y[\n            numpy.array(peaks[:, 0], dtype=numpy.int64),\n            numpy.array(peaks[:, 1], dtype=numpy.int64),\n            numpy.array(peaks[:, 2], 
dtype=numpy.int64),\n        ]\n        + 0.5\n        - self._ipy\n    )\n    peak_radius: NDArray[numpy.float_] = numpy.sqrt(\n        (peaks_cenx**2) + (peaks_ceny**2)\n    )\n    self._outh5[\"/entry_1/result_1/peakRadius\"][\n        self._index, : peaks.shape[0]\n    ] = peak_radius\n\n    # LCLS entry dataset\n    self._outh5[\"/LCLS/machineTime\"][self._index] = timestamp_seconds\n    self._outh5[\"/LCLS/machineTimeNanoSeconds\"][self._index] = timestamp_nanoseconds\n    self._outh5[\"/LCLS/fiducial\"][self._index] = timestamp_fiducials\n    self._outh5[\"/LCLS/photon_energy_eV\"][self._index] = photon_energy\n\n    self._index += 1\n
"},{"location":"source/tasks/sfx_find_peaks/#tasks.sfx_find_peaks.CxiWriter.write_non_event_data","title":"write_non_event_data(powder_hits, powder_misses, mask, clen)","text":"

Write to the file data that is not related to a specific event (masks, powders)

Parameters:

powder_hits (NDArray[numpy.float_]): Virtual powder pattern from hits\n\npowder_misses (NDArray[numpy.float_]): Virtual powder pattern from misses\n\nmask: (NDArray[numpy.uint16]): Pixel mask to write into the file\n
Source code in lute/tasks/sfx_find_peaks.py
def write_non_event_data(\n    self,\n    powder_hits: NDArray[numpy.float_],\n    powder_misses: NDArray[numpy.float_],\n    mask: NDArray[numpy.uint16],\n    clen: float,\n):\n    \"\"\"\n    Write to the file data that is not related to a specific event (masks, powders)\n\n    Parameters:\n\n        powder_hits (NDArray[numpy.float_]): Virtual powder pattern from hits\n\n        powder_misses (NDArray[numpy.float_]): Virtual powder pattern from hits\n\n        mask: (NDArray[numpy.uint16]): Pixel ask to write into the file\n\n    \"\"\"\n    # Add powders and mask to files, reshaping them to match the crystfel\n    # convention\n    self._outh5[\"/entry_1/data_1/powderHits\"][:] = powder_hits.reshape(\n        -1, powder_hits.shape[-1]\n    )\n    self._outh5[\"/entry_1/data_1/powderMisses\"][:] = powder_misses.reshape(\n        -1, powder_misses.shape[-1]\n    )\n    self._outh5[\"/entry_1/data_1/mask\"][:] = (1 - mask).reshape(\n        -1, mask.shape[-1]\n    )  # Crystfel expects inverted values\n\n    # Add clen distance\n    self._outh5[\"/LCLS/detector_1/EncoderValue\"][:] = clen\n
"},{"location":"source/tasks/sfx_find_peaks/#tasks.sfx_find_peaks.FindPeaksPyAlgos","title":"FindPeaksPyAlgos","text":"

Bases: Task

Task that performs peak finding using the PyAlgos peak finding algorithms and writes the peak information to CXI files.

Source code in lute/tasks/sfx_find_peaks.py
class FindPeaksPyAlgos(Task):\n    \"\"\"\n    Task that performs peak finding using the PyAlgos peak finding algorithms and\n    writes the peak information to CXI files.\n    \"\"\"\n\n    def __init__(self, *, params: TaskParameters, use_mpi: bool = True) -> None:\n        super().__init__(params=params, use_mpi=use_mpi)\n        if self._task_parameters.compression is not None:\n            from libpressio import PressioCompressor\n\n    def _run(self) -> None:\n        ds: Any = MPIDataSource(\n            f\"exp={self._task_parameters.lute_config.experiment}:\"\n            f\"run={self._task_parameters.lute_config.run}:smd\"\n        )\n        if self._task_parameters.n_events != 0:\n            ds.break_after(self._task_parameters.n_events)\n\n        det: Any = Detector(self._task_parameters.det_name)\n        det.do_reshape_2d_to_3d(flag=True)\n\n        evr: Any = Detector(self._task_parameters.event_receiver)\n\n        i_x: Any = det.indexes_x(self._task_parameters.lute_config.run).astype(\n            numpy.int64\n        )\n        i_y: Any = det.indexes_y(self._task_parameters.lute_config.run).astype(\n            numpy.int64\n        )\n        ipx: Any\n        ipy: Any\n        ipx, ipy = det.point_indexes(\n            self._task_parameters.lute_config.run, pxy_um=(0, 0)\n        )\n\n        alg: Any = None\n        num_hits: int = 0\n        num_events: int = 0\n        num_empty_images: int = 0\n        tag: str = self._task_parameters.tag\n        if (tag != \"\") and (tag[0] != \"_\"):\n            tag = \"_\" + tag\n\n        evt: Any\n        for evt in ds.events():\n\n            evt_id: Any = evt.get(EventId)\n            timestamp_seconds: int = evt_id.time()[0]\n            timestamp_nanoseconds: int = evt_id.time()[1]\n            timestamp_fiducials: int = evt_id.fiducials()\n            event_codes: Any = evr.eventCodes(evt)\n\n            if isinstance(self._task_parameters.pv_camera_length, float):\n                clen: float = self._task_parameters.pv_camera_length\n            else:\n                clen = (\n                    ds.env().epicsStore().value(self._task_parameters.pv_camera_length)\n                )\n\n            if self._task_parameters.event_logic:\n                if not self._task_parameters.event_code in event_codes:\n                    continue\n\n            img: Any = det.calib(evt)\n\n            if img is None:\n                num_empty_images += 1\n                continue\n\n            if alg is None:\n                det_shape: Tuple[int, ...] 
= img.shape\n                if len(det_shape) == 3:\n                    det_shape = (det_shape[0] * det_shape[1], det_shape[2])\n                else:\n                    det_shape = img.shape\n\n                mask: NDArray[numpy.uint16] = numpy.ones(det_shape).astype(numpy.uint16)\n\n                if self._task_parameters.psana_mask:\n                    mask = det.mask(\n                        self.task_parameters.run,\n                        calib=False,\n                        status=True,\n                        edges=False,\n                        centra=False,\n                        unbond=False,\n                        unbondnbrs=False,\n                    ).astype(numpy.uint16)\n\n                hdffh: Any\n                if self._task_parameters.mask_file is not None:\n                    with h5py.File(self._task_parameters.mask_file, \"r\") as hdffh:\n                        loaded_mask: NDArray[numpy.int] = hdffh[\"entry_1/data_1/mask\"][\n                            :\n                        ]\n                        mask *= loaded_mask.astype(numpy.uint16)\n\n                file_writer: CxiWriter = CxiWriter(\n                    outdir=self._task_parameters.outdir,\n                    rank=ds.rank,\n                    exp=self._task_parameters.lute_config.experiment,\n                    run=self._task_parameters.lute_config.run,\n                    n_events=self._task_parameters.n_events,\n                    det_shape=det_shape,\n                    i_x=i_x,\n                    i_y=i_y,\n                    ipx=ipx,\n                    ipy=ipy,\n                    min_peaks=self._task_parameters.min_peaks,\n                    max_peaks=self._task_parameters.max_peaks,\n                    tag=tag,\n                )\n                alg: Any = PyAlgos(mask=mask, pbits=0)  # pbits controls verbosity\n                alg.set_peak_selection_pars(\n                    npix_min=self._task_parameters.npix_min,\n                    npix_max=self._task_parameters.npix_max,\n                    amax_thr=self._task_parameters.amax_thr,\n                    atot_thr=self._task_parameters.atot_thr,\n                    son_min=self._task_parameters.son_min,\n                )\n\n                if self._task_parameters.compression is not None:\n\n                    libpressio_config = generate_libpressio_configuration(\n                        compressor=self._task_parameters.compression.compressor,\n                        roi_window_size=self._task_parameters.compression.roi_window_size,\n                        bin_size=self._task_parameters.compression.bin_size,\n                        abs_error=self._task_parameters.compression.abs_error,\n                        libpressio_mask=mask,\n                    )\n\n                powder_hits: NDArray[numpy.float_] = numpy.zeros(det_shape)\n                powder_misses: NDArray[numpy.float_] = numpy.zeros(det_shape)\n\n            peaks: Any = alg.peak_finder_v3r3(\n                img,\n                rank=self._task_parameters.peak_rank,\n                r0=self._task_parameters.r0,\n                dr=self._task_parameters.dr,\n                #      nsigm=self._task_parameters.nsigm,\n            )\n\n            num_events += 1\n\n            if (peaks.shape[0] >= self._task_parameters.min_peaks) and (\n                peaks.shape[0] <= self._task_parameters.max_peaks\n            ):\n\n                if self._task_parameters.compression is not None:\n\n                    
libpressio_config_with_peaks = (\n                        add_peaks_to_libpressio_configuration(libpressio_config, peaks)\n                    )\n                    compressor = PressioCompressor.from_config(\n                        libpressio_config_with_peaks\n                    )\n                    compressed_img = compressor.encode(img)\n                    decompressed_img = numpy.zeros_like(img)\n                    decompressed = compressor.decode(compressed_img, decompressed_img)\n                    img = decompressed_img\n\n                try:\n                    photon_energy: float = (\n                        Detector(\"EBeam\").get(evt).ebeamPhotonEnergy()\n                    )\n                except AttributeError:\n                    photon_energy = (\n                        1.23984197386209e-06\n                        / ds.env().epicsStore().value(\"SIOC:SYS0:ML00:AO192\")\n                        / 1.0e9\n                    )\n\n                file_writer.write_event(\n                    img=img,\n                    peaks=peaks,\n                    timestamp_seconds=timestamp_seconds,\n                    timestamp_nanoseconds=timestamp_nanoseconds,\n                    timestamp_fiducials=timestamp_fiducials,\n                    photon_energy=photon_energy,\n                )\n                num_hits += 1\n\n            # TODO: Fix bug here\n            # generate / update powders\n            if peaks.shape[0] >= self._task_parameters.min_peaks:\n                powder_hits = numpy.maximum(\n                    powder_hits,\n                    img.reshape(-1, img.shape[-1]),\n                )\n            else:\n                powder_misses = numpy.maximum(\n                    powder_misses,\n                    img.reshape(-1, img.shape[-1]),\n                )\n\n        if num_empty_images != 0:\n            msg: Message = Message(\n                contents=f\"Rank {ds.rank} encountered {num_empty_images} empty images.\"\n            )\n            self._report_to_executor(msg)\n\n        file_writer.write_non_event_data(\n            powder_hits=powder_hits,\n            powder_misses=powder_misses,\n            mask=mask,\n            clen=clen,\n        )\n\n        file_writer.optimize_and_close_file(\n            num_hits=num_hits, max_peaks=self._task_parameters.max_peaks\n        )\n\n        COMM_WORLD.Barrier()\n\n        num_hits_per_rank: List[int] = COMM_WORLD.gather(num_hits, root=0)\n        num_hits_total: int = COMM_WORLD.reduce(num_hits, SUM)\n        num_events_per_rank: List[int] = COMM_WORLD.gather(num_events, root=0)\n\n        if ds.rank == 0:\n            master_fname: Path = write_master_file(\n                mpi_size=ds.size,\n                outdir=self._task_parameters.outdir,\n                exp=self._task_parameters.lute_config.experiment,\n                run=self._task_parameters.lute_config.run,\n                tag=tag,\n                n_hits_per_rank=num_hits_per_rank,\n                n_hits_total=num_hits_total,\n            )\n\n            # Write final summary file\n            f: TextIO\n            with open(\n                Path(self._task_parameters.outdir) / f\"peakfinding{tag}.summary\", \"w\"\n            ) as f:\n                print(f\"Number of events processed: {num_events_per_rank[-1]}\", file=f)\n                print(f\"Number of hits found: {num_hits_total}\", file=f)\n                print(\n                    \"Fractional hit rate: \"\n                    
f\"{(num_hits_total/num_events_per_rank[-1]):.2f}\",\n                    file=f,\n                )\n                print(f\"No. hits per rank: {num_hits_per_rank}\", file=f)\n\n            with open(Path(self._task_parameters.out_file), \"w\") as f:\n                print(f\"{master_fname}\", file=f)\n\n            # Write out_file\n\n    def _post_run(self) -> None:\n        super()._post_run()\n        self._result.task_status = TaskStatus.COMPLETED\n
"},{"location":"source/tasks/sfx_find_peaks/#tasks.sfx_find_peaks.add_peaks_to_libpressio_configuration","title":"add_peaks_to_libpressio_configuration(lp_json, peaks)","text":"

Add peak information to the libpressio configuration

Parameters:

lp_json: Dictionary storing the configuration JSON structure for the libpressio\n    library.\n\npeaks (Any): Peak information as returned by psana.\n

Returns:

lp_json: Updated configuration JSON structure for the libpressio library.\n
Source code in lute/tasks/sfx_find_peaks.py
def add_peaks_to_libpressio_configuration(lp_json, peaks) -> Dict[str, Any]:\n    \"\"\"\n    Add peak infromation to libpressio configuration\n\n    Parameters:\n\n        lp_json: Dictionary storing the configuration JSON structure for the libpressio\n            library.\n\n        peaks (Any): Peak information as returned by psana.\n\n    Returns:\n\n        lp_json: Updated configuration JSON structure for the libpressio library.\n    \"\"\"\n    lp_json[\"compressor_config\"][\"pressio\"][\"roibin\"][\"roibin:centers\"] = (\n        numpy.ascontiguousarray(numpy.uint64(peaks[:, [2, 1, 0]]))\n    )\n    return lp_json\n
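A short usage sketch with an illustrative skeleton dictionary and fabricated peak coordinates (each row begins with segment, row and column indices, which are the only columns used here); importing the module assumes a psana environment.
import numpy\n\nfrom lute.tasks.sfx_find_peaks import add_peaks_to_libpressio_configuration\n\n# Skeleton with only the nested keys this helper touches (illustrative).\nlp_json = {\"compressor_config\": {\"pressio\": {\"roibin\": {\"roibin:centers\": None}}}}\npeaks = numpy.array([[0, 12, 345], [1, 200, 17]])  # (segment, row, column) per peak\n\nlp_json = add_peaks_to_libpressio_configuration(lp_json, peaks)\nprint(lp_json[\"compressor_config\"][\"pressio\"][\"roibin\"][\"roibin:centers\"])\n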
"},{"location":"source/tasks/sfx_find_peaks/#tasks.sfx_find_peaks.generate_libpressio_configuration","title":"generate_libpressio_configuration(compressor, roi_window_size, bin_size, abs_error, libpressio_mask)","text":"

Create the configuration JSON for the libpressio library

Parameters:

compressor (Literal[\"sz3\", \"qoz\"]): Compression algorithm to use\n    (\"qoz\" or \"sz3\").\n\nabs_error (float): Bound value for the absolute error.\n\nbin_size (int): Binning size.\n\nroi_window_size (int): Default size of the ROI window.\n\nlibpressio_mask (NDArray): Mask to be applied to the data.\n

Returns:

lp_json (Dict[str, Any]): Dictionary storing the JSON configuration structure\nfor the libpressio library\n
Source code in lute/tasks/sfx_find_peaks.py
def generate_libpressio_configuration(\n    compressor: Literal[\"sz3\", \"qoz\"],\n    roi_window_size: int,\n    bin_size: int,\n    abs_error: float,\n    libpressio_mask,\n) -> Dict[str, Any]:\n    \"\"\"\n    Create the configuration JSON for the libpressio library\n\n    Parameters:\n\n        compressor (Literal[\"sz3\", \"qoz\"]): Compression algorithm to use\n            (\"qoz\" or \"sz3\").\n\n        abs_error (float): Bound value for the absolute error.\n\n        bin_size (int): Bining Size.\n\n        roi_window_size (int): Default size of the ROI window.\n\n        libpressio_mask (NDArray): mask to be applied to the data.\n\n    Returns:\n\n        lp_json (Dict[str, Any]): Dictionary storing the JSON configuration structure\n        for the libpressio library\n    \"\"\"\n\n    if compressor == \"qoz\":\n        pressio_opts: Dict[str, Any] = {\n            \"pressio:abs\": abs_error,\n            \"qoz\": {\"qoz:stride\": 8},\n        }\n    elif compressor == \"sz3\":\n        pressio_opts = {\"pressio:abs\": abs_error}\n\n    lp_json = {\n        \"compressor_id\": \"pressio\",\n        \"early_config\": {\n            \"pressio\": {\n                \"pressio:compressor\": \"roibin\",\n                \"roibin\": {\n                    \"roibin:metric\": \"composite\",\n                    \"roibin:background\": \"mask_binning\",\n                    \"roibin:roi\": \"fpzip\",\n                    \"background\": {\n                        \"binning:compressor\": \"pressio\",\n                        \"mask_binning:compressor\": \"pressio\",\n                        \"pressio\": {\"pressio:compressor\": compressor},\n                    },\n                    \"composite\": {\n                        \"composite:plugins\": [\n                            \"size\",\n                            \"time\",\n                            \"input_stats\",\n                            \"error_stat\",\n                        ]\n                    },\n                },\n            }\n        },\n        \"compressor_config\": {\n            \"pressio\": {\n                \"roibin\": {\n                    \"roibin:roi_size\": [roi_window_size, roi_window_size, 0],\n                    \"roibin:centers\": None,  # \"roibin:roi_strategy\": \"coordinates\",\n                    \"roibin:nthreads\": 4,\n                    \"roi\": {\"fpzip:prec\": 0},\n                    \"background\": {\n                        \"mask_binning:mask\": None,\n                        \"mask_binning:shape\": [bin_size, bin_size, 1],\n                        \"mask_binning:nthreads\": 4,\n                        \"pressio\": pressio_opts,\n                    },\n                }\n            }\n        },\n        \"name\": \"pressio\",\n    }\n\n    lp_json[\"compressor_config\"][\"pressio\"][\"roibin\"][\"background\"][\n        \"mask_binning:mask\"\n    ] = (1 - libpressio_mask)\n\n    return lp_json\n
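A minimal usage sketch; the mask shape and parameter values are illustrative, and importing the module assumes a psana environment even though this helper itself only needs numpy.
import numpy\n\nfrom lute.tasks.sfx_find_peaks import generate_libpressio_configuration\n\nmask = numpy.ones((512, 1024), dtype=numpy.uint16)  # Hypothetical pixel mask\nlp_json = generate_libpressio_configuration(\n    compressor=\"sz3\",\n    roi_window_size=9,\n    bin_size=2,\n    abs_error=10.0,\n    libpressio_mask=mask,\n)\n# The ROI/binning pipeline is rooted at the \"roibin\" meta-compressor.\nprint(lp_json[\"early_config\"][\"pressio\"][\"pressio:compressor\"])  # roibin\n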
"},{"location":"source/tasks/sfx_find_peaks/#tasks.sfx_find_peaks.write_master_file","title":"write_master_file(mpi_size, outdir, exp, run, tag, n_hits_per_rank, n_hits_total)","text":"

Generate a virtual dataset to map all individual files for this run.

Parameters:

mpi_size (int): Number of ranks in the MPI pool.\n\noutdir (str): Output directory for cxi file.\n\nexp (str): Experiment string.\n\nrun (int): Experimental run.\n\ntag (str): Tag to append to cxi file names.\n\nn_hits_per_rank (List[int]): Array containing the number of hits found on each\n    node processing data.\n\nn_hits_total (int): Total number of hits found across all nodes.\n

Returns:

The path to the written master file\n
Source code in lute/tasks/sfx_find_peaks.py
def write_master_file(\n    mpi_size: int,\n    outdir: str,\n    exp: str,\n    run: int,\n    tag: str,\n    n_hits_per_rank: List[int],\n    n_hits_total: int,\n) -> Path:\n    \"\"\"\n    Generate a virtual dataset to map all individual files for this run.\n\n    Parameters:\n\n        mpi_size (int): Number of ranks in the MPI pool.\n\n        outdir (str): Output directory for cxi file.\n\n        exp (str): Experiment string.\n\n        run (int): Experimental run.\n\n        tag (str): Tag to append to cxi file names.\n\n        n_hits_per_rank (List[int]): Array containing the number of hits found on each\n            node processing data.\n\n        n_hits_total (int): Total number of hits found across all nodes.\n\n    Returns:\n\n        The path to the the written master file\n    \"\"\"\n    # Retrieve paths to the files containing data\n    fnames: List[Path] = []\n    fi: int\n    for fi in range(mpi_size):\n        if n_hits_per_rank[fi] > 0:\n            fnames.append(Path(outdir) / f\"{exp}_r{run:0>4}_{fi}{tag}.cxi\")\n    if len(fnames) == 0:\n        sys.exit(\"No hits found\")\n\n    # Retrieve list of entries to populate in the virtual hdf5 file\n    dname_list, key_list, shape_list, dtype_list = [], [], [], []\n    datasets = [\"/entry_1/result_1\", \"/LCLS/detector_1\", \"/LCLS\", \"/entry_1/data_1\"]\n    f = h5py.File(fnames[0], \"r\")\n    for dname in datasets:\n        dset = f[dname]\n        for key in dset.keys():\n            if f\"{dname}/{key}\" not in datasets:\n                dname_list.append(dname)\n                key_list.append(key)\n                shape_list.append(dset[key].shape)\n                dtype_list.append(dset[key].dtype)\n    f.close()\n\n    # Compute cumulative powder hits and misses for all files\n    powder_hits, powder_misses = None, None\n    for fn in fnames:\n        f = h5py.File(fn, \"r\")\n        if powder_hits is None:\n            powder_hits = f[\"entry_1/data_1/powderHits\"][:].copy()\n            powder_misses = f[\"entry_1/data_1/powderMisses\"][:].copy()\n        else:\n            powder_hits = numpy.maximum(\n                powder_hits, f[\"entry_1/data_1/powderHits\"][:].copy()\n            )\n            powder_misses = numpy.maximum(\n                powder_misses, f[\"entry_1/data_1/powderMisses\"][:].copy()\n            )\n        f.close()\n\n    vfname: Path = Path(outdir) / f\"{exp}_r{run:0>4}{tag}.cxi\"\n    with h5py.File(vfname, \"w\") as vdf:\n\n        # Write the virtual hdf5 file\n        for dnum in range(len(dname_list)):\n            dname = f\"{dname_list[dnum]}/{key_list[dnum]}\"\n            if key_list[dnum] not in [\"mask\", \"powderHits\", \"powderMisses\"]:\n                layout = h5py.VirtualLayout(\n                    shape=(n_hits_total,) + shape_list[dnum][1:], dtype=dtype_list[dnum]\n                )\n                cursor = 0\n                for i, fn in enumerate(fnames):\n                    vsrc = h5py.VirtualSource(\n                        fn, dname, shape=(n_hits_per_rank[i],) + shape_list[dnum][1:]\n                    )\n                    if len(shape_list[dnum]) == 1:\n                        layout[cursor : cursor + n_hits_per_rank[i]] = vsrc\n                    else:\n                        layout[cursor : cursor + n_hits_per_rank[i], :] = vsrc\n                    cursor += n_hits_per_rank[i]\n                vdf.create_virtual_dataset(dname, layout, fillvalue=-1)\n\n        vdf[\"entry_1/data_1/powderHits\"] = powder_hits\n        
vdf[\"entry_1/data_1/powderMisses\"] = powder_misses\n\n    return vfname\n
"},{"location":"source/tasks/sfx_index/","title":"sfx_index","text":"

Classes for indexing tasks in SFX.

Classes:

Name Description ConcatenateStreamFiles

Task that merges multiple stream files into a single file.

"},{"location":"source/tasks/sfx_index/#tasks.sfx_index.ConcatenateStreamFiles","title":"ConcatenateStreamFiles","text":"

Bases: Task

Task that merges stream files located within a directory tree.

Source code in lute/tasks/sfx_index.py
class ConcatenateStreamFiles(Task):\n    \"\"\"\n    Task that merges stream files located within a directory tree.\n    \"\"\"\n\n    def __init__(self, *, params: TaskParameters) -> None:\n        super().__init__(params=params)\n\n    def _run(self) -> None:\n\n        stream_file_path: Path = Path(self._task_parameters.in_file)\n        stream_file_list: List[Path] = list(\n            stream_file_path.rglob(f\"{self._task_parameters.tag}_*.stream\")\n        )\n\n        processed_file_list = [str(stream_file) for stream_file in stream_file_list]\n\n        msg: Message = Message(\n            contents=f\"Merging following stream files: {processed_file_list} into \"\n            f\"{self._task_parameters.out_file}\",\n        )\n        self._report_to_executor(msg)\n\n        wfd: BinaryIO\n        with open(self._task_parameters.out_file, \"wb\") as wfd:\n            infile: Path\n            for infile in stream_file_list:\n                fd: BinaryIO\n                with open(infile, \"rb\") as fd:\n                    shutil.copyfileobj(fd, wfd)\n
"},{"location":"source/tasks/task/","title":"task","text":"

Base classes for implementing analysis tasks.

Classes:

Name Description Task

Abstract base class from which all analysis tasks are derived.

ThirdPartyTask

Class to run a third-party executable binary as a Task.

"},{"location":"source/tasks/task/#tasks.task.DescribedAnalysis","title":"DescribedAnalysis dataclass","text":"

Complete analysis description. Held by an Executor.

Source code in lute/tasks/dataclasses.py
@dataclass\nclass DescribedAnalysis:\n    \"\"\"Complete analysis description. Held by an Executor.\"\"\"\n\n    task_result: TaskResult\n    task_parameters: Optional[TaskParameters]\n    task_env: Dict[str, str]\n    poll_interval: float\n    communicator_desc: List[str]\n
"},{"location":"source/tasks/task/#tasks.task.ElogSummaryPlots","title":"ElogSummaryPlots dataclass","text":"

Holds a graphical summary intended for display in the eLog.

Attributes:

Name Type Description display_name str

This represents both a path and how the result will be displayed in the eLog. Can include \"/\" characters. E.g. display_name = \"scans/my_motor_scan\" will have plots shown on a \"my_motor_scan\" page, under a \"scans\" tab. This format mirrors how the file is stored on disk as well.

Source code in lute/tasks/dataclasses.py
@dataclass\nclass ElogSummaryPlots:\n    \"\"\"Holds a graphical summary intended for display in the eLog.\n\n    Attributes:\n        display_name (str): This represents both a path and how the result will be\n            displayed in the eLog. Can include \"/\" characters. E.g.\n            `display_name = \"scans/my_motor_scan\"` will have plots shown\n            on a \"my_motor_scan\" page, under a \"scans\" tab. This format mirrors\n            how the file is stored on disk as well.\n    \"\"\"\n\n    display_name: str\n    figures: Union[pn.Tabs, hv.Image, plt.Figure]\n
"},{"location":"source/tasks/task/#tasks.task.Task","title":"Task","text":"

Bases: ABC

Abstract base class for analysis tasks.

Attributes:

Name Type Description name str

The name of the Task.

Source code in lute/tasks/task.py
class Task(ABC):\n    \"\"\"Abstract base class for analysis tasks.\n\n    Attributes:\n        name (str): The name of the Task.\n    \"\"\"\n\n    def __init__(self, *, params: TaskParameters, use_mpi: bool = False) -> None:\n        \"\"\"Initialize a Task.\n\n        Args:\n            params (TaskParameters): Parameters needed to properly configure\n                the analysis task. These are NOT related to execution parameters\n                (number of cores, etc), except, potentially, in case of binary\n                executable sub-classes.\n\n            use_mpi (bool): Whether this Task requires the use of MPI.\n                This determines the behaviour and timing of certain signals\n                and ensures appropriate barriers are placed to not end\n                processing until all ranks have finished.\n        \"\"\"\n        self.name: str = str(type(self)).split(\"'\")[1].split(\".\")[-1]\n        self._result: TaskResult = TaskResult(\n            task_name=self.name,\n            task_status=TaskStatus.PENDING,\n            summary=\"PENDING\",\n            payload=\"\",\n        )\n        self._task_parameters: TaskParameters = params\n        timeout: int = self._task_parameters.lute_config.task_timeout\n        signal.setitimer(signal.ITIMER_REAL, timeout)\n\n        run_directory: Optional[str] = self._task_parameters.Config.run_directory\n        if run_directory is not None:\n            try:\n                os.chdir(run_directory)\n            except FileNotFoundError:\n                warnings.warn(\n                    (\n                        f\"Attempt to change to {run_directory}, but it is not found!\\n\"\n                        f\"Will attempt to run from {os.getcwd()}. It may fail!\"\n                    ),\n                    category=UserWarning,\n                )\n        self._use_mpi: bool = use_mpi\n\n    def run(self) -> None:\n        \"\"\"Calls the analysis routines and any pre/post task functions.\n\n        This method is part of the public API and should not need to be modified\n        in any subclasses.\n        \"\"\"\n        self._signal_start()\n        self._pre_run()\n        self._run()\n        self._post_run()\n        self._signal_result()\n\n    @abstractmethod\n    def _run(self) -> None:\n        \"\"\"Actual analysis to run. 
Overridden by subclasses.\n\n        Separating the calling API from the implementation allows `run` to\n        have pre and post task functionality embedded easily into a single\n        function call.\n        \"\"\"\n        ...\n\n    def _pre_run(self) -> None:\n        \"\"\"Code to run BEFORE the main analysis takes place.\n\n        This function may, or may not, be employed by subclasses.\n        \"\"\"\n        ...\n\n    def _post_run(self) -> None:\n        \"\"\"Code to run AFTER the main analysis takes place.\n\n        This function may, or may not, be employed by subclasses.\n        \"\"\"\n        ...\n\n    @property\n    def result(self) -> TaskResult:\n        \"\"\"TaskResult: Read-only Task Result information.\"\"\"\n        return self._result\n\n    def __call__(self) -> None:\n        self.run()\n\n    def _signal_start(self) -> None:\n        \"\"\"Send the signal that the Task will begin shortly.\"\"\"\n        start_msg: Message = Message(\n            contents=self._task_parameters, signal=\"TASK_STARTED\"\n        )\n        self._result.task_status = TaskStatus.RUNNING\n        if self._use_mpi:\n            from mpi4py import MPI\n\n            comm: MPI.Intracomm = MPI.COMM_WORLD\n            rank: int = comm.Get_rank()\n            comm.Barrier()\n            if rank == 0:\n                self._report_to_executor(start_msg)\n        else:\n            self._report_to_executor(start_msg)\n\n    def _signal_result(self) -> None:\n        \"\"\"Send the signal that results are ready along with the results.\"\"\"\n        signal: str = \"TASK_RESULT\"\n        results_msg: Message = Message(contents=self.result, signal=signal)\n        if self._use_mpi:\n            from mpi4py import MPI\n\n            comm: MPI.Intracomm = MPI.COMM_WORLD\n            rank: int = comm.Get_rank()\n            comm.Barrier()\n            if rank == 0:\n                self._report_to_executor(results_msg)\n        else:\n            self._report_to_executor(results_msg)\n        time.sleep(0.1)\n\n    def _report_to_executor(self, msg: Message) -> None:\n        \"\"\"Send a message to the Executor.\n\n        Details of `Communicator` choice are hidden from the caller. This\n        method may be overriden by subclasses with specialized functionality.\n\n        Args:\n            msg (Message): The message object to send.\n        \"\"\"\n        communicator: Communicator\n        if isinstance(msg.contents, str) or msg.contents is None:\n            communicator = PipeCommunicator()\n        else:\n            communicator = SocketCommunicator()\n\n        communicator.delayed_setup()\n        communicator.write(msg)\n        communicator.clear_communicator()\n\n    def clean_up_timeout(self) -> None:\n        \"\"\"Perform any necessary cleanup actions before exit if timing out.\"\"\"\n        ...\n
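A minimal subclassing sketch (illustrative, not from the LUTE sources): only _run needs to be implemented, while run() supplies the signalling plus the optional _pre_run/_post_run hooks.
from lute.tasks.task import Task\n\n\nclass PrintSummary(Task):\n    \"\"\"Toy Task used only to illustrate the subclassing pattern.\"\"\"\n\n    def _run(self) -> None:\n        # The actual analysis goes here; this toy version only reports the\n        # validated parameters it was constructed with.\n        print(f\"Running with parameters: {self._task_parameters}\")\n\n\n# Instantiation requires a validated TaskParameters object, normally produced\n# by parsing the configuration YAML:\n#     task = PrintSummary(params=my_task_parameters)\n#     task()  # Equivalent to task.run()\n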
"},{"location":"source/tasks/task/#tasks.task.Task.result","title":"result: TaskResult property","text":"

TaskResult: Read-only Task Result information.

"},{"location":"source/tasks/task/#tasks.task.Task.__init__","title":"__init__(*, params, use_mpi=False)","text":"

Initialize a Task.

Parameters:

Name Type Description Default params TaskParameters

Parameters needed to properly configure the analysis task. These are NOT related to execution parameters (number of cores, etc), except, potentially, in case of binary executable sub-classes.

required use_mpi bool

Whether this Task requires the use of MPI. This determines the behaviour and timing of certain signals and ensures appropriate barriers are placed to not end processing until all ranks have finished.

False Source code in lute/tasks/task.py
def __init__(self, *, params: TaskParameters, use_mpi: bool = False) -> None:\n    \"\"\"Initialize a Task.\n\n    Args:\n        params (TaskParameters): Parameters needed to properly configure\n            the analysis task. These are NOT related to execution parameters\n            (number of cores, etc), except, potentially, in case of binary\n            executable sub-classes.\n\n        use_mpi (bool): Whether this Task requires the use of MPI.\n            This determines the behaviour and timing of certain signals\n            and ensures appropriate barriers are placed to not end\n            processing until all ranks have finished.\n    \"\"\"\n    self.name: str = str(type(self)).split(\"'\")[1].split(\".\")[-1]\n    self._result: TaskResult = TaskResult(\n        task_name=self.name,\n        task_status=TaskStatus.PENDING,\n        summary=\"PENDING\",\n        payload=\"\",\n    )\n    self._task_parameters: TaskParameters = params\n    timeout: int = self._task_parameters.lute_config.task_timeout\n    signal.setitimer(signal.ITIMER_REAL, timeout)\n\n    run_directory: Optional[str] = self._task_parameters.Config.run_directory\n    if run_directory is not None:\n        try:\n            os.chdir(run_directory)\n        except FileNotFoundError:\n            warnings.warn(\n                (\n                    f\"Attempt to change to {run_directory}, but it is not found!\\n\"\n                    f\"Will attempt to run from {os.getcwd()}. It may fail!\"\n                ),\n                category=UserWarning,\n            )\n    self._use_mpi: bool = use_mpi\n
"},{"location":"source/tasks/task/#tasks.task.Task.clean_up_timeout","title":"clean_up_timeout()","text":"

Perform any necessary cleanup actions before exit if timing out.

Source code in lute/tasks/task.py
def clean_up_timeout(self) -> None:\n    \"\"\"Perform any necessary cleanup actions before exit if timing out.\"\"\"\n    ...\n
"},{"location":"source/tasks/task/#tasks.task.Task.run","title":"run()","text":"

Calls the analysis routines and any pre/post task functions.

This method is part of the public API and should not need to be modified in any subclasses.

Source code in lute/tasks/task.py
def run(self) -> None:\n    \"\"\"Calls the analysis routines and any pre/post task functions.\n\n    This method is part of the public API and should not need to be modified\n    in any subclasses.\n    \"\"\"\n    self._signal_start()\n    self._pre_run()\n    self._run()\n    self._post_run()\n    self._signal_result()\n
"},{"location":"source/tasks/task/#tasks.task.TaskResult","title":"TaskResult dataclass","text":"

Class for storing the result of a Task's execution with metadata.

Attributes:

Name Type Description task_name str

Name of the associated task which produced it.

task_status TaskStatus

Status of associated task.

summary str

Short message/summary associated with the result.

payload Any

Actual result. May be data in any format.

impl_schemas Optional[str]

A string listing Task schemas implemented by the associated Task. Schemas define the category and expected output of the Task. An individual task may implement/conform to multiple schemas. Multiple schemas are separated by ';', e.g. * impl_schemas = \"schema1;schema2\"

Source code in lute/tasks/dataclasses.py
@dataclass\nclass TaskResult:\n    \"\"\"Class for storing the result of a Task's execution with metadata.\n\n    Attributes:\n        task_name (str): Name of the associated task which produced it.\n\n        task_status (TaskStatus): Status of associated task.\n\n        summary (str): Short message/summary associated with the result.\n\n        payload (Any): Actual result. May be data in any format.\n\n        impl_schemas (Optional[str]): A string listing `Task` schemas implemented\n            by the associated `Task`. Schemas define the category and expected\n            output of the `Task`. An individual task may implement/conform to\n            multiple schemas. Multiple schemas are separated by ';', e.g.\n                * impl_schemas = \"schema1;schema2\"\n    \"\"\"\n\n    task_name: str\n    task_status: TaskStatus\n    summary: str\n    payload: Any\n    impl_schemas: Optional[str] = None\n
"},{"location":"source/tasks/task/#tasks.task.TaskStatus","title":"TaskStatus","text":"

Bases: Enum

Possible Task statuses.

Source code in lute/tasks/dataclasses.py
class TaskStatus(Enum):\n    \"\"\"Possible Task statuses.\"\"\"\n\n    PENDING = 0\n    \"\"\"\n    Task has yet to run. Is Queued, or waiting for prior tasks.\n    \"\"\"\n    RUNNING = 1\n    \"\"\"\n    Task is in the process of execution.\n    \"\"\"\n    COMPLETED = 2\n    \"\"\"\n    Task has completed without fatal errors.\n    \"\"\"\n    FAILED = 3\n    \"\"\"\n    Task encountered a fatal error.\n    \"\"\"\n    STOPPED = 4\n    \"\"\"\n    Task was, potentially temporarily, stopped/suspended.\n    \"\"\"\n    CANCELLED = 5\n    \"\"\"\n    Task was cancelled prior to completion or failure.\n    \"\"\"\n    TIMEDOUT = 6\n    \"\"\"\n    Task did not reach completion due to timeout.\n    \"\"\"\n
"},{"location":"source/tasks/task/#tasks.task.TaskStatus.CANCELLED","title":"CANCELLED = 5 class-attribute instance-attribute","text":"

Task was cancelled prior to completion or failure.

"},{"location":"source/tasks/task/#tasks.task.TaskStatus.COMPLETED","title":"COMPLETED = 2 class-attribute instance-attribute","text":"

Task has completed without fatal errors.

"},{"location":"source/tasks/task/#tasks.task.TaskStatus.FAILED","title":"FAILED = 3 class-attribute instance-attribute","text":"

Task encountered a fatal error.

"},{"location":"source/tasks/task/#tasks.task.TaskStatus.PENDING","title":"PENDING = 0 class-attribute instance-attribute","text":"

Task has yet to run. Is Queued, or waiting for prior tasks.

"},{"location":"source/tasks/task/#tasks.task.TaskStatus.RUNNING","title":"RUNNING = 1 class-attribute instance-attribute","text":"

Task is in the process of execution.

"},{"location":"source/tasks/task/#tasks.task.TaskStatus.STOPPED","title":"STOPPED = 4 class-attribute instance-attribute","text":"

Task was, potentially temporarily, stopped/suspended.

"},{"location":"source/tasks/task/#tasks.task.TaskStatus.TIMEDOUT","title":"TIMEDOUT = 6 class-attribute instance-attribute","text":"

Task did not reach completion due to timeout.

"},{"location":"source/tasks/task/#tasks.task.ThirdPartyTask","title":"ThirdPartyTask","text":"

Bases: Task

A Task interface to analysis with binary executables.

Source code in lute/tasks/task.py
class ThirdPartyTask(Task):\n    \"\"\"A `Task` interface to analysis with binary executables.\"\"\"\n\n    def __init__(self, *, params: TaskParameters) -> None:\n        \"\"\"Initialize a Task.\n\n        Args:\n            params (TaskParameters): Parameters needed to properly configure\n                the analysis task. `Task`s of this type MUST include the name\n                of a binary to run and any arguments which should be passed to\n                it (as would be done via command line). The binary is included\n                with the parameter `executable`. All other parameter names are\n                assumed to be the long/extended names of the flag passed on the\n                command line by default:\n                    * `arg_name = 3` is converted to `--arg_name 3`\n                Positional arguments can be included with `p_argN` where `N` is\n                any integer:\n                    * `p_arg1 = 3` is converted to `3`\n\n                Note that it is NOT recommended to rely on this default behaviour\n                as command-line arguments can be passed in many ways. Refer to\n                the dcoumentation at\n                https://slac-lcls.github.io/lute/tutorial/new_task/\n                under \"Speciyfing a TaskParameters Model for your Task\" for more\n                information on how to control parameter parsing from within your\n                TaskParameters model definition.\n        \"\"\"\n        super().__init__(params=params)\n        self._cmd = self._task_parameters.executable\n        self._args_list: List[str] = [self._cmd]\n        self._template_context: Dict[str, Any] = {}\n\n    def _add_to_jinja_context(self, param_name: str, value: Any) -> None:\n        \"\"\"Store a parameter as a Jinja template variable.\n\n        Variables are stored in a dictionary which is used to fill in a\n        premade Jinja template for a third party configuration file.\n\n        Args:\n            param_name (str): Name to store the variable as. This should be\n                the name defined in the corresponding pydantic model. This name\n                MUST match the name used in the Jinja Template!\n            value (Any): The value to store. If possible, large chunks of the\n                template should be represented as a single dictionary for\n                simplicity; however, any type can be stored as needed.\n        \"\"\"\n        context_update: Dict[str, Any] = {param_name: value}\n        if __debug__:\n            msg: Message = Message(contents=f\"TemplateParameters: {context_update}\")\n            self._report_to_executor(msg)\n        self._template_context.update(context_update)\n\n    def _template_to_config_file(self) -> None:\n        \"\"\"Convert a template file into a valid configuration file.\n\n        Uses Jinja to fill in a provided template file with variables supplied\n        through the LUTE config file. 
This facilitates parameter modification\n        for third party tasks which use a separate configuration, in addition\n        to, or instead of, command-line arguments.\n        \"\"\"\n        from jinja2 import Environment, FileSystemLoader, Template\n\n        out_file: str = self._task_parameters.lute_template_cfg.output_path\n        template_name: str = self._task_parameters.lute_template_cfg.template_name\n\n        lute_path: Optional[str] = os.getenv(\"LUTE_PATH\")\n        template_dir: str\n        if lute_path is None:\n            warnings.warn(\n                \"LUTE_PATH is None in Task process! Using relative path for templates!\",\n                category=UserWarning,\n            )\n            template_dir: str = \"../../config/templates\"\n        else:\n            template_dir = f\"{lute_path}/config/templates\"\n        environment: Environment = Environment(loader=FileSystemLoader(template_dir))\n        template: Template = environment.get_template(template_name)\n\n        with open(out_file, \"w\", encoding=\"utf-8\") as cfg_out:\n            cfg_out.write(template.render(self._template_context))\n\n    def _pre_run(self) -> None:\n        \"\"\"Parse the parameters into an appropriate argument list.\n\n        Arguments are identified by a `flag_type` attribute, defined in the\n        pydantic model, which indicates how to pass the parameter and its\n        argument on the command-line. This method parses flag:value pairs\n        into an appropriate list to be used to call the executable.\n\n        Note:\n        ThirdPartyParameter objects are returned by custom model validators.\n        Objects of this type are assumed to be used for a templated config\n        file used by the third party executable for configuration. The parsing\n        of these parameters is performed separately by a template file used as\n        an input to Jinja. This method solely identifies the necessary objects\n        and passes them all along. Refer to the template files and pydantic\n        models for more information on how these parameters are defined and\n        identified.\n        \"\"\"\n        super()._pre_run()\n        full_schema: Dict[str, Union[str, Dict[str, Any]]] = (\n            self._task_parameters.schema()\n        )\n        short_flags_use_eq: bool\n        long_flags_use_eq: bool\n        if hasattr(self._task_parameters.Config, \"short_flags_use_eq\"):\n            short_flags_use_eq: bool = self._task_parameters.Config.short_flags_use_eq\n            long_flags_use_eq: bool = self._task_parameters.Config.long_flags_use_eq\n        else:\n            short_flags_use_eq = False\n            long_flags_use_eq = False\n        for param, value in self._task_parameters.dict().items():\n            # Clunky test with __dict__[param] because compound model-types are\n            # converted to `dict`. E.g. type(value) = dict not AnalysisHeader\n            if (\n                param == \"executable\"\n                or value is None  # Cannot have empty values in argument list for execvp\n                or value == \"\"  # But do want to include, e.g. 
0\n                or isinstance(self._task_parameters.__dict__[param], TemplateConfig)\n                or isinstance(self._task_parameters.__dict__[param], AnalysisHeader)\n            ):\n                continue\n            if isinstance(self._task_parameters.__dict__[param], TemplateParameters):\n                # TemplateParameters objects have a single parameter `params`\n                self._add_to_jinja_context(param_name=param, value=value.params)\n                continue\n\n            param_attributes: Dict[str, Any] = full_schema[\"properties\"][param]\n            # Some model params do not match the commnad-line parameter names\n            param_repr: str\n            if \"rename_param\" in param_attributes:\n                param_repr = param_attributes[\"rename_param\"]\n            else:\n                param_repr = param\n            if \"flag_type\" in param_attributes:\n                flag: str = param_attributes[\"flag_type\"]\n                if flag:\n                    # \"-\" or \"--\" flags\n                    if flag == \"--\" and isinstance(value, bool) and not value:\n                        continue\n                    constructed_flag: str = f\"{flag}{param_repr}\"\n                    if flag == \"--\" and isinstance(value, bool) and value:\n                        # On/off flag, e.g. something like --verbose: No Arg\n                        self._args_list.append(f\"{constructed_flag}\")\n                        continue\n                    if (flag == \"-\" and short_flags_use_eq) or (\n                        flag == \"--\" and long_flags_use_eq\n                    ):  # Must come after above check! Otherwise you get --param=True\n                        # Flags following --param=value or -param=value\n                        constructed_flag = f\"{constructed_flag}={value}\"\n                        self._args_list.append(f\"{constructed_flag}\")\n                        continue\n                    self._args_list.append(f\"{constructed_flag}\")\n            else:\n                warnings.warn(\n                    (\n                        f\"Model parameters should be defined using Field(...,flag_type='')\"\n                        f\" in the future.  
Parameter: {param}\"\n                    ),\n                    category=PendingDeprecationWarning,\n                )\n                if len(param) == 1:  # Single-dash flags\n                    if short_flags_use_eq:\n                        self._args_list.append(f\"-{param_repr}={value}\")\n                        continue\n                    self._args_list.append(f\"-{param_repr}\")\n                elif \"p_arg\" in param:  # Positional arguments\n                    pass\n                else:  # Double-dash flags\n                    if isinstance(value, bool) and not value:\n                        continue\n                    if long_flags_use_eq:\n                        self._args_list.append(f\"--{param_repr}={value}\")\n                        continue\n                    self._args_list.append(f\"--{param_repr}\")\n                    if isinstance(value, bool) and value:\n                        continue\n            if isinstance(value, str) and \" \" in value:\n                for val in value.split():\n                    self._args_list.append(f\"{val}\")\n            else:\n                self._args_list.append(f\"{value}\")\n        if (\n            hasattr(self._task_parameters, \"lute_template_cfg\")\n            and self._template_context\n        ):\n            self._template_to_config_file()\n\n    def _run(self) -> None:\n        \"\"\"Execute the new program by replacing the current process.\"\"\"\n        if __debug__:\n            time.sleep(0.1)\n            msg: Message = Message(contents=self._formatted_command())\n            self._report_to_executor(msg)\n        LUTE_DEBUG_EXIT(\"LUTE_DEBUG_BEFORE_TPP_EXEC\")\n        os.execvp(file=self._cmd, args=self._args_list)\n\n    def _formatted_command(self) -> str:\n        \"\"\"Returns the command as it would passed on the command-line.\"\"\"\n        formatted_cmd: str = \"\".join(f\"{arg} \" for arg in self._args_list)\n        return formatted_cmd\n\n    def _signal_start(self) -> None:\n        \"\"\"Override start signal method to switch communication methods.\"\"\"\n        super()._signal_start()\n        time.sleep(0.05)\n        signal: str = \"NO_PICKLE_MODE\"\n        msg: Message = Message(signal=signal)\n        self._report_to_executor(msg)\n
"},{"location":"source/tasks/task/#tasks.task.ThirdPartyTask.__init__","title":"__init__(*, params)","text":"

Initialize a Task.

Parameters:

Name Type Description Default params TaskParameters

Parameters needed to properly configure the analysis task. Tasks of this type MUST include the name of a binary to run and any arguments which should be passed to it (as would be done via command line). The binary is included with the parameter executable. All other parameter names are assumed to be the long/extended names of the flag passed on the command line by default: * arg_name = 3 is converted to --arg_name 3 Positional arguments can be included with p_argN where N is any integer: * p_arg1 = 3 is converted to 3

Note that it is NOT recommended to rely on this default behaviour as command-line arguments can be passed in many ways. Refer to the documentation at https://slac-lcls.github.io/lute/tutorial/new_task/ under \"Specifying a TaskParameters Model for your Task\" for more information on how to control parameter parsing from within your TaskParameters model definition.

required Source code in lute/tasks/task.py
def __init__(self, *, params: TaskParameters) -> None:\n    \"\"\"Initialize a Task.\n\n    Args:\n        params (TaskParameters): Parameters needed to properly configure\n            the analysis task. `Task`s of this type MUST include the name\n            of a binary to run and any arguments which should be passed to\n            it (as would be done via command line). The binary is included\n            with the parameter `executable`. All other parameter names are\n            assumed to be the long/extended names of the flag passed on the\n            command line by default:\n                * `arg_name = 3` is converted to `--arg_name 3`\n            Positional arguments can be included with `p_argN` where `N` is\n            any integer:\n                * `p_arg1 = 3` is converted to `3`\n\n            Note that it is NOT recommended to rely on this default behaviour\n            as command-line arguments can be passed in many ways. Refer to\n            the dcoumentation at\n            https://slac-lcls.github.io/lute/tutorial/new_task/\n            under \"Speciyfing a TaskParameters Model for your Task\" for more\n            information on how to control parameter parsing from within your\n            TaskParameters model definition.\n    \"\"\"\n    super().__init__(params=params)\n    self._cmd = self._task_parameters.executable\n    self._args_list: List[str] = [self._cmd]\n    self._template_context: Dict[str, Any] = {}\n
"},{"location":"source/tasks/test/","title":"test","text":"

Basic test Tasks for testing functionality.

Classes:

Name Description Test

Simplest test Task - runs a 10 iteration loop and returns a result.

TestSocket

Test Task which sends larger data to test socket IPC.

TestWriteOutput

Test Task which writes an output file.

TestReadOutput

Test Task which reads in a file. Can be used to test database access.

"},{"location":"source/tasks/test/#tasks.test.Test","title":"Test","text":"

Bases: Task

Simple test Task to ensure subprocess and pipe-based IPC work.

Source code in lute/tasks/test.py
class Test(Task):\n    \"\"\"Simple test Task to ensure subprocess and pipe-based IPC work.\"\"\"\n\n    def __init__(self, *, params: TaskParameters) -> None:\n        super().__init__(params=params)\n\n    def _run(self) -> None:\n        for i in range(10):\n            time.sleep(1)\n            msg: Message = Message(contents=f\"Test message {i}\")\n            self._report_to_executor(msg)\n        if self._task_parameters.throw_error:\n            raise RuntimeError(\"Testing Error!\")\n\n    def _post_run(self) -> None:\n        self._result.summary = \"Test Finished.\"\n        self._result.task_status = TaskStatus.COMPLETED\n        time.sleep(0.1)\n
"},{"location":"source/tasks/test/#tasks.test.TestReadOutput","title":"TestReadOutput","text":"

Bases: Task

Simple test Task to read in output from the test Task above.

Its pydantic model relies on a database access to retrieve the output file.

Source code in lute/tasks/test.py
class TestReadOutput(Task):\n    \"\"\"Simple test Task to read in output from the test Task above.\n\n    Its pydantic model relies on a database access to retrieve the output file.\n    \"\"\"\n\n    def __init__(self, *, params: TaskParameters) -> None:\n        super().__init__(params=params)\n\n    def _run(self) -> None:\n        array: np.ndarray = np.loadtxt(self._task_parameters.in_file, delimiter=\",\")\n        self._report_to_executor(msg=Message(contents=\"Successfully loaded data!\"))\n        for i in range(5):\n            time.sleep(1)\n\n    def _post_run(self) -> None:\n        super()._post_run()\n        self._result.summary = \"Was able to load data.\"\n        self._result.payload = \"This Task produces no output.\"\n        self._result.task_status = TaskStatus.COMPLETED\n
"},{"location":"source/tasks/test/#tasks.test.TestSocket","title":"TestSocket","text":"

Bases: Task

Simple test Task to ensure basic IPC over Unix sockets works.

Source code in lute/tasks/test.py
class TestSocket(Task):\n    \"\"\"Simple test Task to ensure basic IPC over Unix sockets works.\"\"\"\n\n    def __init__(self, *, params: TaskParameters) -> None:\n        super().__init__(params=params)\n\n    def _run(self) -> None:\n        for i in range(self._task_parameters.num_arrays):\n            msg: Message = Message(contents=f\"Sending array {i}\")\n            self._report_to_executor(msg)\n            time.sleep(0.05)\n            msg: Message = Message(\n                contents=np.random.rand(self._task_parameters.array_size)\n            )\n            self._report_to_executor(msg)\n\n    def _post_run(self) -> None:\n        super()._post_run()\n        self._result.summary = f\"Sent {self._task_parameters.num_arrays} arrays\"\n        self._result.payload = np.random.rand(self._task_parameters.array_size)\n        self._result.task_status = TaskStatus.COMPLETED\n
"},{"location":"source/tasks/test/#tasks.test.TestWriteOutput","title":"TestWriteOutput","text":"

Bases: Task

Simple test Task to write output other Tasks depend on.

Source code in lute/tasks/test.py
class TestWriteOutput(Task):\n    \"\"\"Simple test Task to write output other Tasks depend on.\"\"\"\n\n    def __init__(self, *, params: TaskParameters) -> None:\n        super().__init__(params=params)\n\n    def _run(self) -> None:\n        for i in range(self._task_parameters.num_vals):\n            # Doing some calculations...\n            time.sleep(0.05)\n            if i % 10 == 0:\n                msg: Message = Message(contents=f\"Processed {i+1} values!\")\n                self._report_to_executor(msg)\n\n    def _post_run(self) -> None:\n        super()._post_run()\n        work_dir: str = self._task_parameters.lute_config.work_dir\n        out_file: str = f\"{work_dir}/{self._task_parameters.outfile_name}\"\n        array: np.ndarray = np.random.rand(self._task_parameters.num_vals)\n        np.savetxt(out_file, array, delimiter=\",\")\n        self._result.summary = \"Completed task successfully.\"\n        self._result.payload = out_file\n        self._result.task_status = TaskStatus.COMPLETED\n
"},{"location":"tutorial/creating_workflows/","title":"Workflows with Airflow","text":"

Note: Airflow uses the term DAG, or directed acyclic graph, to describe workflows of tasks with defined (and acyclic) connectivities. This page will use the terms workflow and DAG interchangeably.

"},{"location":"tutorial/creating_workflows/#relevant-components","title":"Relevant Components","text":"

In addition to the core LUTE package, a number of components are generally involved in running a workflow. The current set of scripts and objects is used to interface with Airflow and the SLURM job scheduler. The core LUTE library could also drive workflows through other backends, and support for these may be added in the future.

For building and running workflows using SLURM and Airflow, the following components are necessary, and will be described in more detail below: - Airflow launch script: launch_airflow.py - This has a wrapper batch submission script: submit_launch_airflow.sh . When running using the ARP (from the eLog), you MUST use this wrapper script instead of the Python script directly. - SLURM submission script: submit_slurm.sh - Airflow operators: - JIDSlurmOperator

"},{"location":"tutorial/creating_workflows/#launchsubmission-scripts","title":"Launch/Submission Scripts","text":""},{"location":"tutorial/creating_workflows/#launch_airflowpy","title":"launch_airflow.py","text":"

Sends a request to an Airflow instance to submit a specific DAG (workflow). This script prepares an HTTP request with the appropriate parameters in a specific format.

A request involves the following information, most of which is retrieved automatically:

dag_run_data: Dict[str, Union[str, Dict[str, Union[str, int, List[str]]]]] = {\n    \"dag_run_id\": str(uuid.uuid4()),\n    \"conf\": {\n        \"experiment\": os.environ.get(\"EXPERIMENT\"),\n        \"run_id\": f\"{os.environ.get('RUN_NUM')}{datetime.datetime.utcnow().isoformat()}\",\n        \"JID_UPDATE_COUNTERS\": os.environ.get(\"JID_UPDATE_COUNTERS\"),\n        \"ARP_ROOT_JOB_ID\": os.environ.get(\"ARP_JOB_ID\"),\n        \"ARP_LOCATION\": os.environ.get(\"ARP_LOCATION\", \"S3DF\"),\n        \"Authorization\": os.environ.get(\"Authorization\"),\n        \"user\": getpass.getuser(),\n        \"lute_params\": params,\n        \"slurm_params\": extra_args,\n        \"workflow\": wf_defn,  # Used only for custom DAGs. See below under advanced usage.\n    },\n}\n

Note that the environment variables are used to fill in the appropriate information because this script is intended to be launched primarily from the ARP (which passes these variables). The ARP allows for the launch job to be defined in the experiment eLog and submitted automatically for each new DAQ run. The environment variables EXPERIMENT and RUN can alternatively be defined prior to submitting the script on the command-line.

The script takes a number of parameters:

launch_airflow.py -c <path_to_config_yaml> -w <workflow_name> [--debug] [--test] [-e <exp>] [-r <run>] [SLURM_ARGS]\n

Lifetime This script will run for the entire duration of the workflow (DAG). After making the initial request to Airflow to launch the DAG, it enters a status update loop which keeps track of each individual job (each job runs one managed Task) submitted by Airflow. At the end of each job it collects that job's log file and appends it to its own log, in addition to providing a few other status updates/debugging messages. This allows all logging for the entire workflow (DAG) to be inspected from a single file, which is particularly useful when running via the eLog, because only a single log file is displayed.

"},{"location":"tutorial/creating_workflows/#submit_launch_airflowsh","title":"submit_launch_airflow.sh","text":"

This script is only necessary when running from the eLog using the ARP. The initial job submitted by the ARP cannot run for longer than 30 seconds, as it will then time out. Since the launch_airflow.py job lives for the entire duration of the workflow, which is often much longer than 30 seconds, the solution is a wrapper which submits the launch_airflow.py script to run on the S3DF batch nodes. Usage of this script is mostly identical to launch_airflow.py: all arguments are passed transparently to the underlying Python script, with the exception of the first argument, which must be the location of the launch_airflow.py script itself. The wrapper simply launches a batch job using minimal resources (1 core). While the primary purpose of the script is to allow running from the eLog, it is also a useful wrapper in general for submitting the previous script as a SLURM job.

Usage:

submit_launch_airflow.sh /path/to/launch_airflow.py -c <path_to_config_yaml> -w <workflow_name> [--debug] [--test] [-e <exp>] [-r <run>] [SLURM_ARGS]\n
"},{"location":"tutorial/creating_workflows/#submit_slurmsh","title":"submit_slurm.sh","text":"

Launches a job on the S3DF batch nodes using the SLURM job scheduler. This script launches a single managed Task at a time. The usage is as follows:

submit_slurm.sh -c <path_to_config_yaml> -t <MANAGED_task_name> [--debug] [SLURM_ARGS ...]\n

As a reminder, the managed Task refers to the Executor-Task combination. The script does not parse any SLURM-specific parameters, and instead passes them transparently to SLURM. At least the following two SLURM arguments must be provided:

--partition=<...> # Usually partition=milano\n--account=<...> # Usually account=lcls:$EXPERIMENT\n

Generally, resource requests will also be included, such as the number of cores to use. A complete call may look like the following:

submit_slurm.sh -c /sdf/data/lcls/ds/hutch/experiment/scratch/config.yaml -t Tester --partition=milano --account=lcls:experiment --ntasks=100 [...]\n

When running a workflow using the launch_airflow.py script, each step of the workflow will be submitted using this script.

"},{"location":"tutorial/creating_workflows/#operators","title":"Operators","text":"

Operators are the objects submitted as individual steps of a DAG by Airflow. They are conceptually linked to the idea of a task in that each task of a workflow is generally an operator. Care should be taken not to confuse them with LUTE Tasks or managed Tasks, though there is usually a one-to-one correspondence between a Task and an Operator.

Airflow runs on a K8S cluster which has no access to the experiment data. When we ask Airflow to run a DAG, it will launch an Operator for each step of the DAG. However, the Operator itself cannot perform productive analysis without access to the data. The solution employed by LUTE is to have a limited set of Operators which do not perform analysis, but instead request that a LUTE managed Task be submitted on the batch nodes where it can access the data. There may be small differences between how the various provided Operators do this, but in general they will all make a request to the job interface daemon (JID) that a new SLURM job be scheduled using the submit_slurm.sh script described above.

Therefore, running a typical Airflow DAG involves the following steps:

  1. launch_airflow.py script is submitted, usually from a definition in the eLog.
  2. The launch_airflow script requests that Airflow run a specific DAG.
  3. The Airflow instance begins submitting the Operators that make up the DAG definition.
  4. Each Operator sends a request to the JID to submit a job.
  5. The JID submits the elog_submit.sh script with the appropriate managed Task.
  6. The managed Task runs on the batch nodes, while the Operator, requesting updates from the JID on job status, waits for it to complete.
  7. Once a managed Task completes, the Operator will receieve this information and tell the Airflow server whether the job completed successfully or resulted in failure.
  8. The Airflow server will then launch the next step of the DAG, and so on, until every step has been executed.

Currently, the following Operators are maintained: - JIDSlurmOperator: The standard Operator. Each instance has a one-to-one correspondence with a LUTE managed Task.

"},{"location":"tutorial/creating_workflows/#jidslurmoperator-arguments","title":"JIDSlurmOperator arguments","text":""},{"location":"tutorial/creating_workflows/#creating-a-new-workflow","title":"Creating a new workflow","text":"

Defining a new workflow involves creating a new module (Python file) in the directory workflows/airflow, creating a number of Operator instances within the module, and then drawing the connectivity between them. At the top of the file an Airflow DAG is created and given a name. By convention all LUTE workflows use the name of the file as the name of the DAG. The following code can be copied exactly into the file:

from datetime import datetime\nimport os\nfrom airflow import DAG\nfrom lute.operators.jidoperators import JIDSlurmOperator # Import other operators if needed\n\ndag_id: str = f\"lute_{os.path.splitext(os.path.basename(__file__))[0]}\"\ndescription: str = (\n    \"Run SFX processing using PyAlgos peak finding and experimental phasing\"\n)\n\ndag: DAG = DAG(\n    dag_id=dag_id,\n    start_date=datetime(2024, 3, 18),\n    schedule_interval=None,\n    description=description,\n)\n

Once the DAG has been created, a number of Operators must be created to run the various LUTE analysis operations. As an example consider a partial SFX processing workflow which includes steps for peak finding, indexing, merging, and calculating figures of merit. Each of the 4 steps will have an Operator instance which will launch a corresponding LUTE managed Task, for example:

# Using only the JIDSlurmOperator\n# syntax: JIDSlurmOperator(task_id=\"LuteManagedTaskName\", dag=dag) # optionally, max_cores=123)\npeak_finder: JIDSlurmOperator = JIDSlurmOperator(task_id=\"PeakFinderPyAlgos\", dag=dag)\n\n# We specify a maximum number of cores for the rest of the jobs.\nindexer: JIDSlurmOperator = JIDSlurmOperator(\n    max_cores=120, task_id=\"CrystFELIndexer\", dag=dag\n)\n# We can alternatively specify this task be only ever run with the following args.\n# indexer: JIDSlurmOperator = JIDSlurmOperator(\n#     custom_slurm_params=\"--partition=milano --ntasks=120 --account=lcls:myaccount\",\n#     task_id=\"CrystFELIndexer\",\n#     dag=dag,\n# )\n\n# Merge\nmerger: JIDSlurmOperator = JIDSlurmOperator(\n    max_cores=120, task_id=\"PartialatorMerger\", dag=dag\n)\n\n# Figures of merit\nhkl_comparer: JIDSlurmOperator = JIDSlurmOperator(\n    max_cores=8, task_id=\"HKLComparer\", dag=dag\n)\n

Finally, the dependencies between the Operators are \"drawn\", defining the execution order of the various steps. The >> operator has been overloaded for the Operator class, allowing it to be used to specify the next step in the DAG. In this case, a completely linear DAG is drawn as:

peak_finder >> indexer >> merger >> hkl_comparer\n

Parallel execution can be added by using the >> operator multiple times. Consider a task1 which upon successful completion starts a task2 and task3 in parallel. This dependency can be added to the DAG using:

#task1: JIDSlurmOperator = JIDSlurmOperator(...)\n#task2 ...\n\ntask1 >> task2\ntask1 >> task3\n

As each DAG is defined in pure Python, standard control structures (loops, if statements, etc.) can be used to create more complex workflow arrangements.
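
For example, a linear chain of steps can be built with a loop. The snippet below is only a minimal sketch: the managed Task names (HypotheticalStep0 through HypotheticalStep3) are placeholders and would need to correspond to real LUTE managed Tasks; it uses only the JIDSlurmOperator arguments shown above.

from datetime import datetime\nimport os\nfrom typing import List\n\nfrom airflow import DAG\nfrom lute.operators.jidoperators import JIDSlurmOperator\n\ndag_id: str = f\"lute_{os.path.splitext(os.path.basename(__file__))[0]}\"\ndag: DAG = DAG(\n    dag_id=dag_id,\n    start_date=datetime(2024, 3, 18),\n    schedule_interval=None,\n    description=\"Example DAG constructed using a loop.\",\n)\n\n# Placeholder managed Task names - replace with real managed Tasks.\nsteps: List[JIDSlurmOperator] = [\n    JIDSlurmOperator(task_id=f\"HypotheticalStep{i}\", dag=dag) for i in range(4)\n]\n\n# Draw a linear chain: Step0 >> Step1 >> Step2 >> Step3\nfor upstream, downstream in zip(steps, steps[1:]):\n    upstream >> downstream\n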

Note: Your DAG will not be available to Airflow until your PR including the file you have defined is merged! Once merged the file will be synced with the Airflow instance and can be run using the scripts described earlier in this document. For testing it is generally preferred that you run each step of your DAG individually using the submit_slurm.sh script and the independent managed Task names. If, however, you want to test the behaviour of Airflow itself (in a modified form) you can use the advanced run-time DAGs defined below as well.

"},{"location":"tutorial/creating_workflows/#advanced-usage","title":"Advanced Usage","text":""},{"location":"tutorial/creating_workflows/#run-time-dag-creation","title":"Run-time DAG creation","text":"

In most cases, standard DAGs should be defined as described above and called by name. However, Airflow also supports the creation of DAGs dynamically, e.g. to vary the input data to various steps, or the number of steps that will occur. Some of this functionality has been used to allow for user-defined DAGs which are passed in the form of a dictionary, allowing Airflow to construct the workflow as it is running.

A basic YAML syntax is used to construct a series of nested dictionaries which define a DAG. Considering the first example DAG defined above (for serial femtosecond crystallography), the standard DAG looked like:

peak_finder >> indexer >> merger >> hkl_comparer\n

We can alternatively define this DAG in YAML:

task_name: PeakFinderPyAlgos\nslurm_params: ''\nnext:\n- task_name: CrystFELIndexer\n  slurm_params: ''\n  next:\n  - task_name: PartialatorMerger\n    slurm_params: ''\n    next:\n    - task_name: HKLComparer\n      slurm_params: ''\n      next: []\n

I.e. we define a tree where each node is constructed using Node(task_name: str, slurm_params: str, next: List[Node]).

As a second example, to run task1 followed by task2 and task3 in parallel we would use:

task_name: Task1\nslurm_params: ''\nnext:\n- task_name: Task2\n  slurm_params: ''\n  next: []\n- task_name: Task3\n  slurm_params: ''\n  next: []\n

In order to run a DAG defined this way we pass the path to the YAML file we have defined it in to the launch script using -W <path_to_dag>. This is instead of calling it by name. E.g.

/path/to/lute/launch_scripts/submit_launch_airflow.sh /path/to/lute/launch_scripts/launch_airflow.py -e <exp> -r <run> -c /path/to/config -W <path_to_dag> --test [--debug] [SLURM_ARGS]\n

Note that fewer options are currently supported for configuring the operators for each step of the DAG. The SLURM arguments can be replaced in their entirety using a custom slurm_params string, but individual options cannot be modified.

"},{"location":"tutorial/new_task/","title":"Integrating a New Task","text":"

Tasks can be broadly categorized into two types: - \"First-party\" - where the analysis or executed code is maintained within this library. - \"Third-party\" - where the analysis, code, or program is maintained elsewhere and is simply called by a wrapping Task.

Creating a new Task of either type generally involves the same steps, although for first-party Tasks, the analysis code must of course also be written. Due to this difference, as well as additional considerations for parameter handling when dealing with \"third-party\" Tasks, the \"first-party\" and \"third-party\" Task integration cases will be considered separately.

"},{"location":"tutorial/new_task/#creating-a-third-party-task","title":"Creating a \"Third-party\" Task","text":"

There are two required steps for third-party Task integration, and one additional step which is optional, and may not be applicable to all possible third-party Tasks. Generally, Task integration requires: 1. Defining a TaskParameters (pydantic) model which fully parameterizes the Task. This involves specifying a path to a binary, and all the required command-line arguments to run the binary. 2. Creating a managed Task by specifying an Executor for the new third-party Task. At this stage, any additional environment variables can be added which are required for the execution environment. 3. (Optional/Maybe applicable) Create a template for a third-party configuration file. If the new Task has its own configuration file, specifying a template will allow that file to be parameterized from the singular LUTE yaml configuration file. A couple of minor additions to the pydantic model specified in 1. are required to support template usage.

Each of these stages will be discussed in detail below. The vast majority of the work is completed in step 1.

"},{"location":"tutorial/new_task/#specifying-a-taskparameters-model-for-your-task","title":"Specifying a TaskParameters Model for your Task","text":"

A brief overview of parameter objects is provided below. The following information goes into detail only about specifics related to LUTE configuration. An in-depth description of pydantic is beyond the scope of this tutorial; please refer to the official documentation for more information. Please note that due to environment constraints pydantic is currently pinned to version 1.10! Make sure to read the appropriate documentation for this version, as many things differ in the newer releases. At the end of this document there is an example highlighting some supported behaviour, as well as a FAQ addressing some common integration considerations.

Tasks and TaskParameters

All Tasks have a corresponding TaskParameters object. These objects are linked exclusively by a naming convention: for a Task named MyThirdPartyTask, the parameters object must be named MyThirdPartyTaskParameters. For third-party Tasks there are a number of additional requirements: - The model must inherit from a base class called ThirdPartyParameters. - The model must specify one field called executable. The presence of this field indicates that the Task is a third-party Task and the specified executable must be called. This allows all third-party Tasks to be defined exclusively by their parameters model; a single ThirdPartyTask class handles execution of all third-party Tasks.

All models are stored in lute/io/models. For any given Task, a new model can be added to an existing module contained in this directory or to a new module. If creating a new module, make sure to add an import statement to lute.io.models.__init__.
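
For example, assuming the new model lives in a hypothetical module lute/io/models/my_task.py and is named RunTaskParameters, an import along the following lines would be added; the exact form should follow the pattern already used in lute.io.models.__init__.

# In lute/io/models/__init__.py (hypothetical module and class names)\nfrom .my_task import RunTaskParameters\n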

Defining TaskParameters

When specifying parameters the default behaviour is a one-to-one correspondence between the Python attribute specified in the parameter model and the parameter specified on the command-line. Single-letter attributes are assumed to be passed using -, e.g. n will be passed as -n when the executable is launched. Longer attributes are passed using --, e.g. by default a model attribute named my_arg will be passed on the command-line as --my_arg. Positional arguments are specified using p_argX where X is a number. All parameters are passed in the order that they are specified in the model.
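
As an illustration of these default rules only, the hypothetical model below (with no Field configuration at all) would produce roughly the command line shown in the trailing comment; real models should prefer the explicit Field-based configuration described next.

from .base import ThirdPartyParameters\n\nclass HypotheticalTaskParameters(ThirdPartyParameters):\n    \"\"\"Illustration of the default parameter conversion rules.\"\"\"\n\n    executable: str = \"/path/to/binary\"\n    n: int = 4              # single-letter attribute -> passed as: -n 4\n    my_arg: str = \"foo\"     # longer attribute        -> passed as: --my_arg foo\n    p_arg1: str = \"in.dat\"  # positional argument     -> passed as: in.dat\n\n# Approximate resulting command line (parameters in definition order):\n#   /path/to/binary -n 4 --my_arg foo in.dat\n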

However, because the number of possible command-line combinations is large, relying on the default behaviour above is NOT recommended. It is provided solely as a fallback. Instead, there are a number of configuration knobs which can be tuned to achieve the desired behaviour. The two main mechanisms for controlling behaviour are specification of model-wide configuration under the Config class within the model's definition, and parameter-by-parameter configuration using field attributes. For the latter, we define all parameters as Field objects. This allows parameters to have their own attributes, which are parsed by LUTE's task-layer. Given this, the preferred starting template for a TaskParameters model is the following - we assume we are integrating a new Task called RunTask:

\nfrom pydantic import Field, validator\n# Also include any pydantic type specifications - Pydantic has many custom\n# validation types already, e.g. types for constrained numberic values, URL handling, etc.\n\nfrom .base import ThirdPartyParameters\n\n# Change class name as necessary\nclass RunTaskParameters(ThirdPartyParameters):\n    \"\"\"Parameters for RunTask...\"\"\"\n\n    class Config(ThirdPartyParameters.Config): # MUST be exactly as written here.\n        ...\n        # Model-wide configuration will go here\n\n    executable: str = Field(\"/path/to/executable\", description=\"...\")\n    ...\n    # Additional params.\n    # param1: param1Type = Field(\"default\", description=\"\", ...)\n

Config settings and options Under the class definition for Config in the model, we can modify global options for all the parameters. In addition, there are a number of configuration options related to specifying what the outputs/results from the associated Task are, and a number of options to modify runtime behaviour. Currently, the available configuration options are:

Config Parameter Meaning Default Value ThirdPartyTask-specific? run_directory If provided, can be used to specify the directory from which a Task is run. None (not provided) NO set_result bool. If True search the model definition for a parameter that indicates what the result is. False NO result_from_params If set_result is True can define a result using this option and a validator. See also is_result below. None (not provided) NO short_flags_use_eq Use equals sign instead of space for arguments of - parameters. False YES - Only affects ThirdPartyTasks long_flags_use_eq Use equals sign instead of space for arguments of -- parameters. False YES - Only affects ThirdPartyTasks

These configuration options modify how the parameter models are parsed and passed along on the command-line, as well as what we consider results and where a Task can run. The default behaviour is that parameters are assumed to be passed as -p arg and --param arg, the Task will be run in the current working directory (or scratch if submitted with the ARP), and we have no information about Task results. Setting the above options can modify this behaviour.

Field attributes In addition to the global configuration options there are a couple of ways to specify individual parameters. The following Field attributes are used when parsing the model:

Field Attribute Meaning Default Value Example flag_type Specify the type of flag for passing this argument. One of \"-\", \"--\", or \"\" N/A p_arg1 = Field(..., flag_type=\"\") rename_param Change the name of the parameter as passed on the command-line. N/A my_arg = Field(..., rename_param=\"my-arg\") description Documentation of the parameter's usage or purpose. N/A arg = Field(..., description=\"Argument for...\") is_result bool. If the set_result Config option is True, we can set this to True to indicate a result. N/A output_result = Field(..., is_result=True)

The flag_type attribute allows us to specify whether the parameter corresponds to a positional (\"\") command line argument, requires a single hyphen (\"-\"), or a double hyphen (\"--\"). By default, the parameter name is passed as-is on the command-line. However, command-line arguments can have characters which would not be valid in Python variable names. In particular, hyphens are frequently used. To handle this case, the rename_param attribute can be used to specify an alternative spelling of the parameter when it is passed on the command-line. This also allows for using more descriptive variable names internally than those used on the command-line. A description can also be provided for each Field to document the usage and purpose of that particular parameter.

As an example, we can again consider defining a model for a RunTask Task. Consider an executable which would normally be called from the command-line as follows:

/sdf/group/lcls/ds/tools/runtask -n <nthreads> --method=<algorithm> -p <algo_param> [--debug]\n

A model specification for this Task may look like:

class RunTaskParameters(ThirdPartyParameters):\n    \"\"\"Parameters for the runtask binary.\"\"\"\n\n    class Config(ThirdPartyParameters.Config):\n        long_flags_use_eq: bool = True  # For the --method parameter\n\n    # Prefer using full/absolute paths where possible.\n    # No flag_type needed for this field\n    executable: str = Field(\n        \"/sdf/group/lcls/ds/tools/runtask\", description=\"Runtask Binary v1.0\"\n    )\n\n    # We can provide a more descriptive name for -n\n    # Let's assume it's a number of threads, or processes, etc.\n    num_threads: int = Field(\n        1, description=\"Number of concurrent threads.\", flag_type=\"-\", rename_param=\"n\"\n    )\n\n    # In this case we will use the Python variable name directly when passing\n    # the parameter on the command-line\n    method: str = Field(\"algo1\", description=\"Algorithm to use.\", flag_type=\"--\")\n\n    # For an actual parameter we would probably have a better name. Lets assume\n    # This parameter (-p) modifies the behaviour of the method above.\n    method_param1: int = Field(\n        3, description=\"Modify method performance.\", flag_type=\"-\", rename_param=\"p\"\n    )\n\n    # Boolean flags are only passed when True! `--debug` is an optional parameter\n    # which is not followed by any arguments.\n    debug: bool = Field(\n        False, description=\"Whether to run in debug mode.\", flag_type=\"--\"\n    )\n

The is_result attribute allows us to specify whether the corresponding Field points to the output/result of the associated Task. Consider a Task, RunTask2 which writes its output to a single file which is passed as a parameter.

class RunTask2Parameters(ThirdPartyParameters):\n    \"\"\"Parameters for the runtask2 binary.\"\"\"\n\n    class Config(ThirdPartyParameters.Config):\n        set_result: bool = True                     # This must be set here!\n        # result_from_params: Optional[str] = None  # We can use this for more complex result setups (see below). Ignore for now.\n\n    # Prefer using full/absolute paths where possible.\n    # No flag_type needed for this field\n    executable: str = Field(\n        \"/sdf/group/lcls/ds/tools/runtask2\", description=\"Runtask Binary v2.0\"\n    )\n\n    # Lets assume we take one input and write one output file\n    # We will not provide a default value, so this parameter MUST be provided\n    input: str = Field(\n        description=\"Path to input file.\", flag_type=\"--\"\n    )\n\n    # We will also not provide a default for the output\n    # BUT, we will specify that whatever is provided is the result\n    output: str = Field(\n        description=\"Path to write output to.\",\n        flag_type=\"-\",\n        rename_param=\"o\",\n        is_result=True,   # This means this parameter points to the result!\n    )\n

Additional Comments 1. Model parameters of type bool are not passed with an argument and are only passed when True. This is a common use-case for boolean flags which enable things like test or debug modes, verbosity or reporting features. E.g. --debug, --test, --verbose, etc. - If you need to pass the literal words \"True\" or \"False\", use a parameter of type str. 2. You can use pydantic types to constrain parameters beyond the basic Python types. E.g. conint can be used to define lower and upper bounds for an integer. There are also types for common categories, positive/negative numbers, paths, URLs, IP addresses, etc. - Even more custom behaviour can be achieved with validators (see below). 3. All TaskParameters objects and their subclasses have access to a lute_config parameter, which is of type lute.io.models.base.AnalysisHeader. This special parameter is ignored when constructing the call for a binary task, but it provides access to shared/common parameters between tasks. For example, the following parameters are available through the lute_config object, and may be of use when constructing validators. All fields can be accessed with . notation. E.g. lute_config.experiment. - title: A user-provided title/description of the analysis. - experiment: The current experiment name - run: The current acquisition run number - date: The date of the experiment or the analysis. - lute_version: The version of the software you are running. - task_timeout: How long a Task can run before it is killed. - work_dir: The main working directory for LUTE. Files and the database are created relative to this directory. This is separate from the run_directory config option. LUTE will write files to the work directory by default; however, the Task itself is run from run_directory if it is specified.
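
To illustrate points 2 and 3 above, the following sketch (with hypothetical field names, not an existing LUTE model) constrains an integer with conint and uses lute_config inside a validator to build a default output prefix:

from typing import Any, Dict\n\nfrom pydantic import Field, conint, validator\n\nfrom .base import ThirdPartyParameters\n\nclass HypotheticalTaskParameters(ThirdPartyParameters):\n    \"\"\"Illustration only - not an existing LUTE model.\"\"\"\n\n    executable: str = Field(\"/path/to/binary\", description=\"Hypothetical binary.\")\n\n    # conint constrains the value: only integers from 1 to 120 validate.\n    num_threads: conint(ge=1, le=120) = Field(\n        1, description=\"Number of threads.\", flag_type=\"-\", rename_param=\"n\"\n    )\n\n    out_prefix: str = Field(\n        \"\", description=\"Prefix for output files.\", flag_type=\"--\", rename_param=\"out-prefix\"\n    )\n\n    @validator(\"out_prefix\", always=True)\n    def set_default_prefix(cls, out_prefix: str, values: Dict[str, Any]) -> str:\n        \"\"\"Build a default prefix from shared lute_config values if none was given.\"\"\"\n        if out_prefix == \"\":\n            lute_config = values[\"lute_config\"]\n            return f\"{lute_config.work_dir}/{lute_config.experiment}_r{lute_config.run}\"\n        return out_prefix\n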

Validators Pydantic uses validators to determine whether a value for a specific field is appropriate. There are default validators for all the standard library types and the types specified within the pydantic package; however, it is straightforward to define custom ones as well. In the template code-snippet above we imported the validator decorator. To create our own validator we define a method (with any name) with the following prototype, and decorate it with the validator decorator:

@validator(\"name_of_field_to_decorate\")\ndef my_custom_validator(cls, field: Any, values: Dict[str, Any]) -> Any: ...\n

In this snippet, the field variable corresponds to the value for the specific field we want to validate. values is a dictionary of fields and their values which have been parsed prior to the current field. This means you can validate the value of a parameter based on the values provided for other parameters. Since pydantic always validates the fields in the order they are defined in the model, fields dependent on other fields should come later in the definition.

For example, consider the method_param1 field defined above for RunTask. We can provide a custom validator which changes the default value for this field depending on what type of algorithm is specified for the --method option. We will also constrain the options for method to two specific strings.

from pydantic import Field, validator, ValidationError, root_validator\nclass RunTaskParameters(ThirdPartyParameters):\n    \"\"\"Parameters for the runtask binary.\"\"\"\n\n    # [...]\n\n    # In this case we will use the Python variable name directly when passing\n    # the parameter on the command-line\n    method: str = Field(\"algo1\", description=\"Algorithm to use.\", flag_type=\"--\")\n\n    # For an actual parameter we would probably have a better name. Lets assume\n    # This parameter (-p) modifies the behaviour of the method above.\n    method_param1: Optional[int] = Field(\n        description=\"Modify method performance.\", flag_type=\"-\", rename_param=\"p\"\n    )\n\n    # We will only allow method to take on one of two values\n    @validator(\"method\")\n    def validate_method(cls, method: str, values: Dict[str, Any]) -> str:\n        \"\"\"Method validator: --method can be algo1 or algo2.\"\"\"\n\n        valid_methods: List[str] = [\"algo1\", \"algo2\"]\n        if method not in valid_methods:\n            raise ValueError(\"method must be algo1 or algo2\")\n        return method\n\n    # Lets change the default value of `method_param1` depending on `method`\n    # NOTE: We didn't provide a default value to the Field above and made it\n    # optional. We can use this to test whether someone is purposefully\n    # overriding the value of it, and if not, set the default ourselves.\n    # We set `always=True` since pydantic will normally not use the validator\n    # if the default is not changed\n    @validator(\"method_param1\", always=True)\n    def validate_method_param1(cls, param1: Optional[int], values: Dict[str, Any]) -> int:\n        \"\"\"method param1 validator\"\"\"\n\n        # If someone actively defined it, lets just return that value\n        # We could instead do some additional validation to make sure that the\n        # value they provided is valid...\n        if param1 is not None:\n            return param1\n\n        # method_param1 comes after method, so this will be defined, or an error\n        # would have been raised.\n        method: str = values['method']\n        if method == \"algo1\":\n            return 3\n        elif method == \"algo2\":\n            return 5\n

The special root_validator(pre=False) can also be used to provide validation of the model as a whole. This is also the recommended method for specifying a result (using result_from_params) which has a complex dependence on the parameters of the model. This latter use-case is described in FAQ 2 below.

"},{"location":"tutorial/new_task/#faq","title":"FAQ","text":"
  1. How can I specify a default value which depends on another parameter?

Use a custom validator. The example above shows how to do this. The parameter that depends on another parameter must come LATER in the model definition than the independent parameter.

  2. My TaskResult is determinable from the parameters model, but it isn't easily specified by one parameter. How can I use result_from_params to indicate the result?

When a result can be identified from the set of parameters defined in a TaskParameters model, but is not simply equal to any one parameter on its own, we can set result_from_params using a custom validator. In the example below, two parameters, output_dir and out_name, together determine what the result is. Using a root validator we will define the result from these two values.

from typing import Any, Dict\n\nfrom pydantic import Field, root_validator\n\nfrom .base import ThirdPartyParameters\n\nclass RunTask3Parameters(ThirdPartyParameters):\n    \"\"\"Parameters for the runtask3 binary.\"\"\"\n\n    class Config(ThirdPartyParameters.Config):\n        set_result: bool = True       # This must be set here!\n        result_from_params: str = \"\"  # We will set this momentarily\n\n    # [...] executable, other params, etc.\n\n    output_dir: str = Field(\n        description=\"Directory to write output to.\",\n        flag_type=\"--\",\n        rename_param=\"dir\",\n    )\n\n    out_name: str = Field(\n        description=\"The name of the final output file.\",\n        flag_type=\"--\",\n        rename_param=\"oname\",\n    )\n\n    # We can still provide other validators as needed,\n    # but for now, we just set result_from_params.\n    # The validator name can be anything; we set pre=False so this runs at the end.\n    @root_validator(pre=False)\n    def define_result(cls, values: Dict[str, Any]) -> Dict[str, Any]:\n        # Extract the values of output_dir and out_name\n        output_dir: str = values[\"output_dir\"]\n        out_name: str = values[\"out_name\"]\n\n        result: str = f\"{output_dir}/{out_name}\"\n        # Now we set result_from_params\n        cls.Config.result_from_params = result\n\n        # We haven't modified any other values, but we MUST return this!\n        return values\n
  3. My new Task depends on the output of a previous Task. How can I specify this dependency? Parameters used to run a Task are recorded in a database for every Task, along with whether or not the execution of that specific parameter set was successful. A utility function is provided to access the most recent values from the database for a specific parameter of a specific Task. It can also be used to specify whether unsuccessful Tasks should be included in the query. This utility can be used within a validator to specify dependencies. For example, suppose the input of RunTask2 (parameter input) depends on the output location of RunTask1 (parameter outfile). A validator of the following type can be used to retrieve the output file and make it the default value of the input parameter.
from typing import Any, Dict, Optional\n\nfrom pydantic import Field, validator\n\nfrom .base import ThirdPartyParameters\nfrom ..db import read_latest_db_entry\n\nclass RunTask2Parameters(ThirdPartyParameters):\n    input: str = Field(\"\", description=\"Input file.\", flag_type=\"--\")\n\n    @validator(\"input\")\n    def validate_input(cls, input: str, values: Dict[str, Any]) -> str:\n        if input == \"\":\n            task1_out: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\",  # Working directory. We search for the database here.\n                \"RunTask1\",                           # Name of Task we want to look up\n                \"outfile\",                            # Name of parameter of the Task\n                valid_only=True,                      # We only want valid output files.\n            )\n            # read_latest_db_entry returns None if nothing is found\n            if task1_out is not None:\n                return task1_out\n        return input\n

There are more examples of this pattern spread throughout the various Task models.

"},{"location":"tutorial/new_task/#specifying-an-executor-creating-a-runnable-managed-task","title":"Specifying an Executor: Creating a runnable, \"managed Task\"","text":"

Overview

After a pydantic model has been created, the next required step is to define a managed Task. In the context of this library, a managed Task refers to the combination of an Executor and a Task to run. The Executor manages the process of Task submission and the execution environment, as well as performing any logging, eLog communication, etc. There are currently two types of Executor to choose from, but only one is applicable to third-party code. The second Executor is listed below for completeness only. If you need MPI, see the note below.

  1. Executor: This is the standard Executor. It should be used for third-party use cases.
  2. MPIExecutor: This performs all the same types of operations as the option above; however, it will submit your Task using MPI.
  Note: The MPIExecutor will submit the Task using the number of available cores - 1. The number of cores is determined from the physical core/thread count on your local machine, or the number of cores allocated by SLURM when submitting on the batch nodes.

Using MPI with third-party Tasks

As mentioned, you should set up a third-party Task to use the first type of Executor. If, however, your third-party Task uses MPI this may seem non-intuitive. When using the MPIExecutor, the LUTE code itself is submitted with MPI. This includes the code that performs signalling to the Executor and execs the third-party code you are interested in running. While it is possible to set this code up to run with MPI, it is more challenging in the case of third-party Tasks because there is no Task code to modify directly! The MPIExecutor is provided mostly for first-party code. This is not an issue, however, since the standard Executor is easily configured to run with MPI in the case of third-party code.

When using the standard Executor for a Task requiring MPI, the executable in the pydantic model must be set to mpirun. For example, a third-party Task model that uses MPI but is intended to be run with the standard Executor may look like the following. We assume this Task runs a Python script using MPI.

import os\n\nfrom pydantic import Field, PositiveInt\n\nfrom .base import ThirdPartyParameters\n\nclass RunMPITaskParameters(ThirdPartyParameters):\n    class Config(ThirdPartyParameters.Config):\n        ...\n\n    executable: str = Field(\"mpirun\", description=\"MPI executable\")\n    np: PositiveInt = Field(\n        max(int(os.environ.get(\"SLURM_NPROCS\", len(os.sched_getaffinity(0)))) - 1, 1),\n        description=\"Number of processes\",\n        flag_type=\"-\",\n    )\n    pos_arg: str = Field(\"python\", description=\"Python...\", flag_type=\"\")\n    script: str = Field(\"\", description=\"Python script to run with MPI\", flag_type=\"\")\n

Selecting the Executor

After deciding on which Executor to use, a single line must be added to the lute/managed_tasks.py module:

# Initialization: Executor(\"TaskName\")\nTaskRunner: Executor = Executor(\"SubmitTask\")\n# TaskRunner: MPIExecutor = MPIExecutor(\"SubmitTask\") ## If using the MPIExecutor\n

To make it easier to tell whether a Task or a managed Task is being discussed, the standard naming convention is that the Task (class name) contains a verb, e.g. RunTask, SubmitTask, while the corresponding managed Task uses a related noun, e.g. TaskRunner, TaskSubmitter, etc.

As a reminder, the Task name is the first part of the class name of the pydantic model, without the Parameters suffix. This name must match exactly: e.g. if your pydantic model's class name is RunTaskParameters, the Task name is RunTask, and this is the string passed to the Executor initializer.

Modifying the environment

If your third-party Task can run in the standard psana environment with no further configuration files, the setup process is now complete and your Task can be run within the LUTE framework. If, on the other hand, your Task requires some changes to the environment, this is managed through the Executor. The Executor provides two principal methods for changing the environment.

  1. Executor.update_environment: if you only need to add a few environment variables or update the PATH, this is the method to use. The method takes a Dict[str, str] as input. Any variables can be passed/defined using this method. By default, any variables in the dictionary will overwrite those variable definitions in the current environment if they are already present, except for the variable PATH. By default, PATH entries in the dictionary are prepended to the current PATH available in the environment the Executor runs in (the standard psana environment). This behaviour can be changed to either append to, or entirely overwrite, the PATH via an optional second argument to the method.
  2. Executor.shell_source: This method will source a shell script which can perform numerous modifications of the environment (PATH changes, new environment variables, conda environments, etc.). The method takes a str which is the path to a shell script to source.

As an example, we will update the PATH of one Task and source a script for a second.

TaskRunner: Executor = Executor(\"RunTask\")\n# update_environment(env: Dict[str,str], update_path: str = \"prepend\") # \"append\" or \"overwrite\"\nTaskRunner.update_environment(\n    { \"PATH\": \"/sdf/group/lcls/ds/tools\" }  # This entry will be prepended to the PATH available after sourcing `psconda.sh`\n)\n\nTask2Runner: Executor = Executor(\"RunTask2\")\nTask2Runner.shell_source(\"/sdf/group/lcls/ds/tools/new_task_setup.sh\") # Will source new_task_setup.sh script\n
"},{"location":"tutorial/new_task/#using-templates-managing-third-party-configuration-files","title":"Using templates: managing third-party configuration files","text":"

Some third-party executables will require their own configuration files. These are often separate JSON or YAML files, although they can also be bash or Python scripts which are intended to be edited. Since LUTE requires its own configuration YAML file, it handles these cases by using Jinja templates. When wrapping a third-party Task, a template can also be provided: with small modifications to the Task's pydantic model, LUTE can process special types of parameters and render them into the template. LUTE offloads all template rendering to Jinja, which keeps the required additions to the pydantic model small. On the other hand, it does require understanding the Jinja syntax and providing a well-formatted template so that the parameters are parsed properly. Some basic examples of this syntax will be shown below; however, it is recommended that the Task implementer refer to the official Jinja documentation for more information.

LUTE provides two additional base models which are used for template parsing in conjunction with the primary Task model. These are: - TemplateParameters objects, which hold the parameters that will be used to render a portion of a template. - TemplateConfig objects, which hold two strings: the name of the template file to use and the full path (including filename) where the rendered result should be written.

Task models which inherit from the ThirdPartyParameters model, as all third-party Tasks should, allow for extra arguments. LUTE will automatically parse any extra arguments provided in the configuration YAML as TemplateParameters objects, which means that they do not need to be explicitly added to the pydantic model (although they can be). As such, the only requirement on the Python side when adding template rendering functionality to a Task is the addition of one parameter: an instance of TemplateConfig. The instance MUST be called lute_template_cfg.

from pydantic import Field, validator\n\nfrom .base import TemplateConfig\n\nclass RunTaskParameters(ThirdPartyParameters):\n    ...\n    # This parameter MUST be called lute_template_cfg!\n    lute_template_cfg: TemplateConfig = Field(\n        TemplateConfig(\n            template_name=\"name_of_template.json\",\n            output_path=\"/path/to/write/rendered_output_to.json\",\n        ),\n        description=\"Template rendering configuration\",\n    )\n

LUTE looks for the template in config/templates, so only the name of the template file within that directory is required for the template_name attribute of lute_template_cfg. LUTE can write the output anywhere the user has permissions, and with any name, so the full absolute path, including filename, should be used for the output_path of lute_template_cfg.

The rest of the work is done by the combination of Jinja, LUTE's configuration YAML file, and the template itself. The interplay between these components is perhaps best illustrated by an example. As such, let us consider a simple third-party Task whose only input parameter (on the command-line) is the location of a configuration JSON file. We'll call the third-party executable jsonuser and our Task model RunJsonUserParameters. We assume the program is run like:

jsonuser -i <input_file.json>\n

The first step is to set up the pydantic model as before.

from pydantic import Field\n\nfrom .base import ThirdPartyParameters\n\nclass RunJsonUserParameters(ThirdPartyParameters):\n    executable: str = Field(\n        \"/path/to/jsonuser\", description=\"Executable which requires a JSON configuration file.\"\n    )\n    # Let's assume the JSON file is passed as \"-i <path_to_json>\"\n    input_json: str = Field(\n        \"\", description=\"Path to the input JSON file.\", flag_type=\"-\", rename_param=\"i\"\n    )\n

The next step is to create a template for the JSON file. Let's assume the JSON file looks like:

{\n    \"param1\": \"arg1\",\n    \"param2\": 4,\n    \"param3\": {\n        \"a\": 1,\n        \"b\": 2\n    },\n    \"param4\": [\n        1,\n        2,\n        3\n    ]\n}\n

Any or all of these values can be substituted for, and we can choose how to provide them: a substitution can be provided for each variable individually, or, e.g. for a nested hierarchy, a dictionary can be provided which substitutes all of the items at once. For this simple case, let's provide variables for param1, param2 and param3.b, and assume that the first and second entries of param4 should be identical for our use case (i.e., we can use one variable for both). In total, this means we will perform 5 substitutions using 4 variables. Jinja will substitute a variable anywhere it sees the syntax {{ variable_name }}. As such, a valid template for our use case may look like:

{\n    \"param1\": {{ str_var }},\n    \"param2\": {{ int_var }},\n    \"param3\": {\n        \"a\": 1,\n        \"b\": {{ p3_b }}\n    },\n    \"param4\": [\n        {{ val }},\n        {{ val }},\n        3\n    ]\n}\n

We save this file as jsonuser.json in config/templates. Next, we will update the original pydantic model to include our template configuration. One issue remains: we need to decide where to write the rendered output of the template. In this case, we can use the input_json parameter. We will assume that the user will provide this, although a default value can also be used. A custom validator will be added so that we can take the input_json value and use it to update the value of lute_template_cfg.output_path.

from typing import Any, Dict  # , Optional\n\nfrom pydantic import Field, validator\n\nfrom .base import ThirdPartyParameters, TemplateConfig  # , TemplateParameters\n\nclass RunJsonUserParameters(ThirdPartyParameters):\n    executable: str = Field(\n        \"jsonuser\", description=\"Executable which requires a JSON configuration file.\"\n    )\n    # Let's assume the JSON file is passed as \"-i <path_to_json>\"\n    input_json: str = Field(\n        \"\", description=\"Path to the input JSON file.\", flag_type=\"-\", rename_param=\"i\"\n    )\n    # Add template configuration! *MUST* be called `lute_template_cfg`\n    lute_template_cfg: TemplateConfig = Field(\n        TemplateConfig(\n            template_name=\"jsonuser.json\", # Only the name of the file here.\n            output_path=\"\",\n        ),\n        description=\"Template rendering configuration\",\n    )\n    # We do not need to include these TemplateParameters, they will be added\n    # automatically if provided in the YAML\n    #str_var: Optional[TemplateParameters]\n    #int_var: Optional[TemplateParameters]\n    #p3_b: Optional[TemplateParameters]\n    #val: Optional[TemplateParameters]\n\n\n    # Tell LUTE to write the rendered template to the location provided with\n    # `input_json`. I.e. update `lute_template_cfg.output_path`\n    @validator(\"lute_template_cfg\", always=True)\n    def update_output_path(\n        cls, lute_template_cfg: TemplateConfig, values: Dict[str, Any]\n    ) -> TemplateConfig:\n        if lute_template_cfg.output_path == \"\":\n            lute_template_cfg.output_path = values[\"input_json\"]\n        return lute_template_cfg\n

All that is left to render the template is to provide the variables we want to substitute in the LUTE configuration YAML. In our case we must provide values for the 4 variable names we used within the substitution syntax ({{ var_name }}). The names in the YAML must match those in the template.

RunJsonUser:\n    input_json: \"/my/chosen/path.json\" # We'll come back to this...\n    str_var: \"arg1\" # Will substitute for \"param1\": \"arg1\"\n    int_var: 4 # Will substitute for \"param2\": 4\n    p3_b: 2  # Will substitute for \"param3\": { \"b\": 2 }\n    val: 2 # Will substitute for \"param4\": [2, 2, 3] in the JSON\n

If, on the other hand, a user already has a valid JSON file, it is possible to turn off template rendering: ALL template variables (TemplateParameters) are simply excluded from the configuration YAML.

RunJsonUser:\n    input_json: \"/path/to/existing.json\"\n    #str_var: ...\n    #...\n
"},{"location":"tutorial/new_task/#additional-jinja-syntax","title":"Additional Jinja Syntax","text":"

There are many other syntactical constructions we can use with Jinja. Some of the useful ones are:

If Statements - E.g. only include portions of the template if a value is defined.

{% if VARNAME is defined %}\n// Stuff to include\n{% endif %}\n

Loops - E.g. Unpacking multiple elements from a dictionary.

{% for name, value in VARNAME.items() %}\n// Do stuff with name and value\n{% endfor %}\n
"},{"location":"tutorial/new_task/#creating-a-first-party-task","title":"Creating a \"First-Party\" Task","text":"

The process for creating a \"First-Party\" Task is very similar to that for a \"Third-Party\" Task, with the difference being that you must also write the analysis code. The steps for integration are: 1. Write the TaskParameters model. 2. Write the Task class. There are a few rules that need to be adhered to. 3. Make your Task available by modifying the import function. 4. Specify an Executor

"},{"location":"tutorial/new_task/#specifying-a-taskparameters-model-for-your-task_1","title":"Specifying a TaskParameters Model for your Task","text":"

Parameter models have a format that must be followed for \"Third-Party\" Tasks, but \"First-Party\" Tasks have a little more liberty in how parameters are dealt with, since the Task will do all the parsing itself.

To create a model, the basic steps are:

  1. If necessary, create a new module (e.g. new_task_category.py) under lute.io.models, or find an appropriate pre-existing module in that directory.
    - An import statement must be added to lute.io.models.__init__ if a new module is created, so it can be found.
    - If defining the model in a pre-existing module, make sure to modify the __all__ statement to include it.
  2. Create a new model that inherits from TaskParameters. You can look at lute.io.models.tests.TestReadOutputParameters for an example. The model must be named <YourTaskName>Parameters.
    - You should include all relevant parameters here, including input file, output file, and any potentially adjustable parameters. These parameters must be included even if there are some implicit dependencies between Tasks and it would make sense for the parameter to be auto-populated based on some other output. Creating this dependency is done with validators (see step 3). All parameters should be overridable, and all Tasks should be fully independently configurable, based solely on their model and the configuration YAML.
    - To follow the preferred format, parameters should be defined as: param_name: type = Field([default value], description=\"This parameter does X.\")
  3. Use validators to do more complex things for your parameters, including populating default values dynamically:
    - E.g. create default values that depend on other parameters in the model - see for example: SubmitSMDParameters.
    - E.g. create default values that depend on other Tasks by reading from the database - see for example: TestReadOutputParameters.
  4. The model will have access to some general configuration values by inheriting from TaskParameters. These parameters are all stored in lute_config, which is an instance of AnalysisHeader (defined here).
    - For example, the experiment and run number can be obtained from this object, and a validator could use these values to define the default input file for the Task.

A minimal sketch following these steps is shown below.
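
The sketch below is illustrative only: the Task name, field names, and default logic are hypothetical, and the attribute used on lute_config (work_dir) follows the configuration header fields shown earlier.

\"\"\"Models for the hypothetical AnalyzeThing Task.\"\"\"\n\n__all__ = [\"AnalyzeThingParameters\"]\n\nfrom typing import Any, Dict\n\nfrom pydantic import Field, validator\n\nfrom .base import TaskParameters\n\nclass AnalyzeThingParameters(TaskParameters):\n    \"\"\"Parameters for the (hypothetical) AnalyzeThing Task.\"\"\"\n\n    input_file: str = Field(\"\", description=\"Path to the input data file.\")\n    output_file: str = Field(\"\", description=\"Path to write results to.\")\n    threshold: float = Field(0.5, description=\"This parameter adjusts the detection threshold.\")\n\n    # Populate a default output location dynamically from the general\n    # configuration (lute_config) if the user did not provide one.\n    @validator(\"output_file\", always=True)\n    def set_default_output(cls, output_file: str, values: Dict[str, Any]) -> str:\n        if output_file == \"\":\n            work_dir: str = values[\"lute_config\"].work_dir\n            return f\"{work_dir}/analyze_thing_output.h5\"\n        return output_file\n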

A number of configuration options and Field attributes are also available for \"First-Party\" Task models. These are identical to those used for the ThirdPartyTasks, although there is a smaller selection. These options are reproduced below for convenience.

Config settings and options Under the class definition for Config in the model, we can modify global options for all the parameters. In addition, there are a number of configuration options related to specifying what the outputs/results from the associated Task are, and a number of options to modify runtime behaviour. Currently, the available configuration options are:

Config Parameter / Meaning / Default Value / ThirdPartyTask-specific?
  - run_directory: If provided, can be used to specify the directory from which a Task is run. Default: None (not provided). ThirdPartyTask-specific: NO.
  - set_result: bool. If True, search the model definition for a parameter that indicates what the result is. Default: False. ThirdPartyTask-specific: NO.
  - result_from_params: If set_result is True, can define a result using this option and a validator. See also is_result below. Default: None (not provided). ThirdPartyTask-specific: NO.
  - short_flags_use_eq: Use equals sign instead of space for arguments of - parameters. Default: False. YES - Only affects ThirdPartyTasks.
  - long_flags_use_eq: Use equals sign instead of space for arguments of -- parameters. Default: False. YES - Only affects ThirdPartyTasks.

These configuration options modify how the parameter models are parsed and passed along on the command-line, as well as what we consider results and where a Task can run. The default behaviour is that parameters are assumed to be passed as -p arg and --param arg, the Task will be run in the current working directory (or scratch if submitted with the ARP), and we have no information about Task results. Setting the above options can modify this behaviour.
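
For instance, the set_result option is typically paired with the is_result Field attribute described below. A minimal sketch (the model and field names here are illustrative, not part of LUTE):

from pydantic import Field\n\nfrom .base import TaskParameters\n\nclass ReduceThingParameters(TaskParameters):\n    \"\"\"Hypothetical first-party parameters illustrating result specification.\"\"\"\n\n    class Config(TaskParameters.Config):\n        set_result: bool = True  # Search the fields below for one marked is_result\n\n    input_file: str = Field(\"\", description=\"Path to the input data file.\")\n    output_file: str = Field(\n        \"\", description=\"Path to write results to.\", is_result=True  # This field is the Task result\n    )\n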

Field attributes In addition to the global configuration options there are a couple of ways to specify individual parameters. The following Field attributes are used when parsing the model:

Field Attribute / Meaning / Default Value / Example
  - description: Documentation of the parameter's usage or purpose. Default: N/A. Example: arg = Field(..., description=\"Argument for...\")
  - is_result: bool. If the set_result Config option is True, we can set this to True to indicate a result. Default: N/A. Example: output_result = Field(..., is_result=True)"},{"location":"tutorial/new_task/#writing-the-task","title":"Writing the Task","text":"

You can write your analysis code (or whatever code is to be executed) however you like, as long as it adheres to the limited rules below. You can create a new module for your Task in lute.tasks or add it to any existing module, if it makes sense for it to belong there. The Task itself is a single class constructed as follows:

  1. Your analysis Task is a class named in a way that matches its Pydantic model. E.g. RunTask is the Task, and RunTaskParameters is the Pydantic model.
  2. The class must inherit from the Task class (see template below). If you intend to use MPI see the following section.
  3. You must provide an implementation of a _run method. This is the method that will be executed when the Task is run. You can in addition write as many methods as you need. For fine-grained execution control you can also provide _pre_run() and _post_run() methods, but this is optional.
  4. For all communication (including print statements) you should use the _report_to_executor(msg: Message) method. Since the Task is run as a subprocess, this method will pass information to the controlling Executor. You can pass any type of object using this method: strings, plots, arrays, etc.
  5. If you did not use the set_result configuration option in your parameters model, make sure to provide a result when finished. This is done by setting self._result.payload = .... You can set the result to be any object. If you have written the result to a file, for example, please provide a path.

A minimal template is provided below.

\"\"\"Standard docstring...\"\"\"\n\n__all__ = [\"RunTask\"]\n__author__ = \"\" # Please include so we know who the SME is\n\n# Include any imports you need here\n\nfrom lute.execution.ipc import Message # Message for communication\nfrom lute.io.models.base import *      # For TaskParameters\nfrom lute.tasks.task import *          # For Task\n\nclass RunTask(Task): # Inherit from Task\n    \"\"\"Task description goes here, or in __init__\"\"\"\n\n    def __init__(self, *, params: TaskParameters) -> None:\n        super().__init__(params=params) # Sets up Task, parameters, etc.\n        # Parameters will be available through:\n          # self._task_parameters\n          # You access with . operator: self._task_parameters.param1, etc.\n        # Your result object is availble through:\n          # self._result\n            # self._result.payload <- Main result\n            # self._result.summary <- Short summary\n            # self._result.task_status <- Semi-automatic, but can be set manually\n\n    def _run(self) -> None:\n        # THIS METHOD MUST BE PROVIDED\n        self.do_my_analysis()\n\n    def do_my_analysis(self) -> None:\n        # Send a message, proper way to print:\n        msg: Message(contents=\"My message contents\", signal=\"\")\n        self._report_to_executor(msg)\n\n        # When done, set result - assume we wrote a file, e.g.\n        self._result.payload = \"/path/to/output_file.h5\"\n        # Optionally also set status - good practice but not obligatory\n        self._result.task_status = TaskStatus.COMPLETED\n
"},{"location":"tutorial/new_task/#using-mpi-for-your-task","title":"Using MPI for your Task","text":"

In the case that your Task is written to use MPI, a slight modification to the template above is needed. Specifically, an additional keyword argument should be passed to the base class initializer: use_mpi=True. This tells the base class to adjust signalling/communication behaviour appropriately for a multi-rank MPI program. Doing this prevents tricky-to-track-down problems due to ranks starting, completing and sending messages at different times. The rest of your code can, as before, be written as you see fit. The use of this keyword argument will also synchronize the start of all ranks and wait until all ranks have finished before exiting.

\"\"\"Task which needs to run with MPI\"\"\"\n\n__all__ = [\"RunTask\"]\n__author__ = \"\" # Please include so we know who the SME is\n\n# Include any imports you need here\n\nfrom lute.execution.ipc import Message # Message for communication\nfrom lute.io.models.base import *      # For TaskParameters\nfrom lute.tasks.task import *          # For Task\n\n# Only the init is shown\nclass RunMPITask(Task): # Inherit from Task\n    \"\"\"Task description goes here, or in __init__\"\"\"\n\n    # Signal the use of MPI!\n    def __init__(self, *, params: TaskParameters, use_mpi: bool = True) -> None:\n        super().__init__(params=params, use_mpi=use_mpi) # Sets up Task, parameters, etc.\n        # That's it.\n
"},{"location":"tutorial/new_task/#message-signals","title":"Message signals","text":"

Signals in Message objects are strings and can be one of the following:

LUTE_SIGNALS: Set[str] = {\n    \"NO_PICKLE_MODE\",\n    \"TASK_STARTED\",\n    \"TASK_FAILED\",\n    \"TASK_STOPPED\",\n    \"TASK_DONE\",\n    \"TASK_CANCELLED\",\n    \"TASK_RESULT\",\n}\n

Each of these signals is associated with a hook on the Executor side. They are, for the most part, used by the base classes; however, you can choose to make use of them manually as well.
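
As a sketch of such manual use (the Task and its failure-handling logic are purely illustrative), a signal can be attached to a Message explicitly when reporting to the Executor:

from lute.execution.ipc import Message\nfrom lute.io.models.base import *\nfrom lute.tasks.task import *\n\nclass RunFragileTask(Task):\n    \"\"\"Hypothetical Task which reports failure manually via a signal.\"\"\"\n\n    def __init__(self, *, params: TaskParameters) -> None:\n        super().__init__(params=params)\n\n    def _run(self) -> None:\n        try:\n            ...  # Analysis goes here\n        except Exception as err:\n            # Manually attach a signal to the message sent to the Executor\n            msg: Message = Message(contents=f\"Analysis failed: {err}\", signal=\"TASK_FAILED\")\n            self._report_to_executor(msg)\n            raise\n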

"},{"location":"tutorial/new_task/#making-your-task-available","title":"Making your Task available","text":"

Once the Task has been written, it needs to be made available for import. Since different Tasks can have conflicting dependencies and environments, this is managed through an import function. When the Task is done, or ready for testing, a condition is added to lute.tasks.__init__.import_task. For example, assume the Task is called RunXASAnalysis and is defined in a module called xas.py; we would then add the following lines to the import_task function:

# in lute.tasks.__init__\n\n# ...\n\ndef import_task(task_name: str) -> Type[Task]:\n    # ...\n    if task_name == \"RunXASAnalysis\":\n        from .xas import RunXASAnalysis\n\n        return RunXASAnalysis\n
"},{"location":"tutorial/new_task/#defining-an-executor","title":"Defining an Executor","text":"

The process of Executor definition is identical to the process described for ThirdPartyTasks above. The one exception is that if you defined the Task to use MPI, as described in the section above (Using MPI for your Task), you will likely want to use the MPIExecutor.
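
For example (the managed Task name here is illustrative), the MPI Task from the earlier section could be registered in lute/managed_tasks.py as:

# in lute/managed_tasks.py\nMPITaskRunner: MPIExecutor = MPIExecutor(\"RunMPITask\")\n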

"}]} \ No newline at end of file diff --git a/dev/sitemap.xml.gz b/dev/sitemap.xml.gz index c19f318b..1ce9fcf6 100644 Binary files a/dev/sitemap.xml.gz and b/dev/sitemap.xml.gz differ diff --git a/dev/source/execution/executor/index.html b/dev/source/execution/executor/index.html index 56831c1f..51705e51 100644 --- a/dev/source/execution/executor/index.html +++ b/dev/source/execution/executor/index.html @@ -2439,7 +2439,9 @@

485 486 487 -488
class BaseExecutor(ABC):
+488
+489
+490
class BaseExecutor(ABC):
     """ABC to manage Task execution and communication with user services.
 
     When running in a workflow, "tasks" (not the class instances) are submitted
@@ -2633,7 +2635,9 @@ 

# network. time.sleep(0.1) # Propagate any env vars setup by Communicators - only update LUTE_ vars - tmp: Dict[str, str] = {key: os.environ[key] for key in os.environ if "LUTE_" in key} + tmp: Dict[str, str] = { + key: os.environ[key] for key in os.environ if "LUTE_" in key + } self._analysis_desc.task_env.update(tmp) def _submit_task(self, cmd: str) -> subprocess.Popen: @@ -3267,9 +3271,7 @@

Source code in lute/execution/executor.py -
309
-310
-311
+              
311
 312
 313
 314
@@ -3310,7 +3312,9 @@ 

349 350 351 -352

def execute_task(self) -> None:
+352
+353
+354
def execute_task(self) -> None:
     """Run the requested Task as a subprocess."""
     self._pre_task()
     lute_path: Optional[str] = os.getenv("LUTE_PATH")
@@ -3378,14 +3382,14 @@ 

Source code in lute/execution/executor.py -
478
-479
-480
+              
480
 481
 482
 483
 484
-485
def process_results(self) -> None:
+485
+486
+487
def process_results(self) -> None:
     """Perform any necessary steps to process TaskResults object.
 
     Processing will depend on subclass. Examples of steps include, moving
@@ -4142,9 +4146,7 @@ 

Source code in lute/execution/executor.py -
491
-492
-493
+                
493
 494
 495
 496
@@ -4319,7 +4321,9 @@ 

665 666 667 -668

class Executor(BaseExecutor):
+668
+669
+670
class Executor(BaseExecutor):
     """Basic implementation of an Executor which manages simple IPC with Task.
 
     Attributes:
@@ -4527,9 +4531,7 @@ 

Source code in lute/execution/executor.py -
525
-526
-527
+              
527
 528
 529
 530
@@ -4602,7 +4604,9 @@ 

597 598 599 -600

def add_default_hooks(self) -> None:
+600
+601
+602
def add_default_hooks(self) -> None:
     """Populate the set of default event hooks."""
 
     def no_pickle_mode(self: Executor, msg: Message):
@@ -4746,9 +4750,7 @@ 

Source code in lute/execution/executor.py -
671
-672
-673
+                
673
 674
 675
 676
@@ -4790,7 +4792,9 @@ 

712 713 714 -715

class MPIExecutor(Executor):
+715
+716
+717
class MPIExecutor(Executor):
     """Runs first-party Tasks that require MPI.
 
     This Executor is otherwise identical to the standard Executor, except it
diff --git a/dev/source/tasks/sfx_find_peaks/index.html b/dev/source/tasks/sfx_find_peaks/index.html
index 2523df7d..dfe3621b 100644
--- a/dev/source/tasks/sfx_find_peaks/index.html
+++ b/dev/source/tasks/sfx_find_peaks/index.html
@@ -1376,7 +1376,8 @@ 

Source code in lute/tasks/sfx_find_peaks.py -
 31
+                
 30
+ 31
  32
  33
  34
@@ -1708,7 +1709,21 @@ 

360 361 362 -363

class CxiWriter:
+363
+364
+365
+366
+367
+368
+369
+370
+371
+372
+373
+374
+375
+376
+377
class CxiWriter:
 
     def __init__(
         self,
@@ -1887,6 +1902,21 @@ 

ch_rows: NDArray[numpy.float_] = peaks[:, 0] * self._det_shape[1] + peaks[:, 1] ch_cols: NDArray[numpy.float_] = peaks[:, 2] + if self._outh5["/entry_1/data_1/data"].shape[0] <= self._index: + self._outh5["entry_1/data_1/data"].resize(self._index + 1, axis=0) + ds_key: str + for ds_key in self._outh5["/entry_1/result_1"].keys(): + self._outh5[f"/entry_1/result_1/{ds_key}"].resize( + self._index + 1, axis=0 + ) + for ds_key in ( + "machineTime", + "machineTimeNanoSeconds", + "fiducial", + "photon_energy_eV", + ): + self._outh5[f"/LCLS/{ds_key}"].resize(self._index + 1, axis=0) + # Entry_1 entry for processing with CrystFEL self._outh5["/entry_1/data_1/data"][self._index, :, :] = img.reshape( -1, img.shape[-1] @@ -2099,7 +2129,8 @@

Source code in lute/tasks/sfx_find_peaks.py -
 33
+              
 32
+ 33
  34
  35
  36
@@ -2241,8 +2272,7 @@ 

172 173 174 -175 -176

def __init__(
+175
def __init__(
     self,
     outdir: str,
     rank: int,
@@ -2414,21 +2444,7 @@ 

Source code in lute/tasks/sfx_find_peaks.py -
314
-315
-316
-317
-318
-319
-320
-321
-322
-323
-324
-325
-326
-327
-328
+              
328
 329
 330
 331
@@ -2463,7 +2479,21 @@ 

360 361 362 -363

def optimize_and_close_file(
+363
+364
+365
+366
+367
+368
+369
+370
+371
+372
+373
+374
+375
+376
+377
def optimize_and_close_file(
     self,
     num_hits: int,
     max_peaks: int,
@@ -2550,7 +2580,8 @@ 

Source code in lute/tasks/sfx_find_peaks.py -
178
+              
177
+178
 179
 180
 181
@@ -2650,7 +2681,21 @@ 

275 276 277 -278

def write_event(
+278
+279
+280
+281
+282
+283
+284
+285
+286
+287
+288
+289
+290
+291
+292
def write_event(
     self,
     img: NDArray[numpy.float_],
     peaks: Any,  # Not typed becomes it comes from psana
@@ -2682,6 +2727,21 @@ 

ch_rows: NDArray[numpy.float_] = peaks[:, 0] * self._det_shape[1] + peaks[:, 1] ch_cols: NDArray[numpy.float_] = peaks[:, 2] + if self._outh5["/entry_1/data_1/data"].shape[0] <= self._index: + self._outh5["entry_1/data_1/data"].resize(self._index + 1, axis=0) + ds_key: str + for ds_key in self._outh5["/entry_1/result_1"].keys(): + self._outh5[f"/entry_1/result_1/{ds_key}"].resize( + self._index + 1, axis=0 + ) + for ds_key in ( + "machineTime", + "machineTimeNanoSeconds", + "fiducial", + "photon_energy_eV", + ): + self._outh5[f"/LCLS/{ds_key}"].resize(self._index + 1, axis=0) + # Entry_1 entry for processing with CrystFEL self._outh5["/entry_1/data_1/data"][self._index, :, :] = img.reshape( -1, img.shape[-1] @@ -2779,21 +2839,7 @@

Source code in lute/tasks/sfx_find_peaks.py -
280
-281
-282
-283
-284
-285
-286
-287
-288
-289
-290
-291
-292
-293
-294
+              
294
 295
 296
 297
@@ -2811,7 +2857,21 @@ 

309 310 311 -312

def write_non_event_data(
+312
+313
+314
+315
+316
+317
+318
+319
+320
+321
+322
+323
+324
+325
+326
def write_non_event_data(
     self,
     powder_hits: NDArray[numpy.float_],
     powder_misses: NDArray[numpy.float_],
@@ -2879,21 +2939,7 @@ 

Source code in lute/tasks/sfx_find_peaks.py -
575
-576
-577
-578
-579
-580
-581
-582
-583
-584
-585
-586
-587
-588
-589
+                
589
 590
 591
 592
@@ -3122,14 +3168,38 @@ 

815 816 817 -818

class FindPeaksPyAlgos(Task):
+818
+819
+820
+821
+822
+823
+824
+825
+826
+827
+828
+829
+830
+831
+832
+833
+834
+835
+836
+837
+838
+839
+840
class FindPeaksPyAlgos(Task):
     """
     Task that performs peak finding using the PyAlgos peak finding algorithms and
     writes the peak information to CXI files.
     """
 
-    def __init__(self, *, params: TaskParameters) -> None:
-        super().__init__(params=params)
+    def __init__(self, *, params: TaskParameters, use_mpi: bool = True) -> None:
+        super().__init__(params=params, use_mpi=use_mpi)
+        if self._task_parameters.compression is not None:
+            from libpressio import PressioCompressor
 
     def _run(self) -> None:
         ds: Any = MPIDataSource(
@@ -3306,9 +3376,15 @@ 

# TODO: Fix bug here # generate / update powders if peaks.shape[0] >= self._task_parameters.min_peaks: - powder_hits = numpy.maximum(powder_hits, img) + powder_hits = numpy.maximum( + powder_hits, + img.reshape(-1, img.shape[-1]), + ) else: - powder_misses = numpy.maximum(powder_misses, img) + powder_misses = numpy.maximum( + powder_misses, + img.reshape(-1, img.shape[-1]), + ) if num_empty_images != 0: msg: Message = Message( @@ -3414,25 +3490,25 @@

Source code in lute/tasks/sfx_find_peaks.py -
554
-555
-556
-557
-558
-559
-560
-561
-562
-563
-564
-565
-566
-567
-568
+              
568
 569
 570
 571
-572
def add_peaks_to_libpressio_configuration(lp_json, peaks) -> Dict[str, Any]:
+572
+573
+574
+575
+576
+577
+578
+579
+580
+581
+582
+583
+584
+585
+586
def add_peaks_to_libpressio_configuration(lp_json, peaks) -> Dict[str, Any]:
     """
     Add peak infromation to libpressio configuration
 
@@ -3488,21 +3564,7 @@ 

Source code in lute/tasks/sfx_find_peaks.py -
466
-467
-468
-469
-470
-471
-472
-473
-474
-475
-476
-477
-478
-479
-480
+              
480
 481
 482
 483
@@ -3573,7 +3635,21 @@ 

548 549 550 -551

def generate_libpressio_configuration(
+551
+552
+553
+554
+555
+556
+557
+558
+559
+560
+561
+562
+563
+564
+565
def generate_libpressio_configuration(
     compressor: Literal["sz3", "qoz"],
     roi_window_size: int,
     bin_size: int,
@@ -3699,21 +3775,7 @@ 

Source code in lute/tasks/sfx_find_peaks.py -
366
-367
-368
-369
-370
-371
-372
-373
-374
-375
-376
-377
-378
-379
-380
+              
380
 381
 382
 383
@@ -3796,7 +3858,21 @@ 

460 461 462 -463

def write_master_file(
+463
+464
+465
+466
+467
+468
+469
+470
+471
+472
+473
+474
+475
+476
+477
def write_master_file(
     mpi_size: int,
     outdir: str,
     exp: str,