diff --git a/dev/search/search_index.json b/dev/search/search_index.json index 97c63c4d..2bd46214 100644 --- a/dev/search/search_index.json +++ b/dev/search/search_index.json @@ -1 +1 @@ -{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Setup","text":"
LUTE is publicly available on GitHub. To run it, the first step is to clone the repository:
# Navigate to the directory of your choice.\ngit clone git@github.com:slac-lcls/lute\n
The repository directory structure is as follows:
lute\n |--- config # Configuration YAML files (see below) and templates for third party config\n |--- docs # Documentation (including this page)\n |--- launch_scripts # Entry points for using SLURM and communicating with Airflow\n |--- lute # Code\n |--- run_task.py # Script to run an individual managed Task\n |--- ...\n |--- utilities # Help utility programs\n |--- workflows # This directory contains workflow definitions. It is synced elsewhere and not used directly.\n\n
In general, most interactions with the software will be through scripts located in the launch_scripts
directory. Some users (for certain use-cases) may also choose to run the run_task.py
script directly - its location is highlighted within the hierarchy above. To begin with, you will need a YAML file, templates for which are available in the config
directory. The structure of the YAML file and how to use the various launch scripts are described in more detail below.
In the utilities
directory there are two useful programs to provide assistance with using the software:
utilities/dbview
: LUTE stores all parameters for every analysis routine it runs (as well as results) in a database. This database is stored in the work_dir
defined in the YAML file (see below). The dbview
utility is a TUI application (Text-based user interface) which runs in the terminal. It allows you to navigate a LUTE database using the arrow keys, etc. Usage is: utilities/dbview -p <path/to/lute.db>
.utilities/lute_help
: This utility provides help and usage information for running LUTE software. E.g., it provides access to parameter descriptions to assist in properly filling out a configuration YAML. Its usage is described in slightly more detail below.LUTE runs code as Task
s that are managed by an Executor
. The Executor
provides modifications to the environment the Task
runs in, as well as controls details of inter-process communication, reporting results to the eLog, etc. Combinations of specific Executor
s and Task
s are already provided, and are referred to as managed Task
s. Managed Task
s are submitted as a single unit. They can be run individually, or a series of independent steps can be submitted all at once in the form of a workflow, or directed acyclic graph (DAG). This latter option makes use of Airflow to manage the individual execution steps.
Running analysis with LUTE is the process of submitting one or more managed Task
s. This is generally a two step process.
Task
s which you may run.Task
submission, or workflow (DAG) submission.These two steps are described below.
"},{"location":"#preparing-a-configuration-yaml","title":"Preparing a Configuration YAML","text":"All Task
s are parameterized through a single configuration YAML file - even third party code which requires its own configuration files is managed through this YAML file. The basic structure is split into two documents, a brief header section which contains information that is applicable across all Task
s, such as the experiment name, run numbers and the working directory, followed by per Task
parameters:
%YAML 1.3\n---\ntitle: \"Some title.\"\nexperiment: \"MYEXP123\"\n# run: 12 # Does not need to be provided\ndate: \"2024/05/01\"\nlute_version: 0.1\ntask_timeout: 600\nwork_dir: \"/sdf/scratch/users/d/dorlhiac\"\n...\n---\nTaskOne:\n param_a: 123\n param_b: 456\n param_c:\n sub_var: 3\n sub_var2: 4\n\nTaskTwo:\n new_param1: 3\n new_param2: 4\n\n# ...\n...\n
In the first document, the header, it is important that the work_dir
is properly specified. This is the root directory from which Task
outputs will be written, and the LUTE database will be stored. It may also be desirable to modify the task_timeout
parameter which defines the time limit for individual Task
jobs. By default it is set to 10 minutes, although this may not be sufficient for long running jobs. This value will be applied to all Task
s so should account for the longest running job you expect.
The actual analysis parameters are defined in the second document. As these vary from Task
to Task
, a full description will not be provided here. An actual template with real Task
parameters is available in config/test.yaml
. Your analysis POC can also help you set up and choose the correct Task
s to include as a starting point. The template YAML file has further descriptions of what each parameter does and how to fill it out. You can also refer to the lute_help
program described under the following sub-heading.
Some things to consider and possible points of confusion:
Task
s, the parameters are defined at the Task
level. I.e. the managed Task
and Task
itself have different names, and the names in the YAML refer to the latter. This is because a single Task
can be run using different Executor
configurations, but using the same parameters. The list of managed Task
s is in lute/managed_tasks.py
. A table is also provided below for some routines of interest.Task
The Task
it Runs Task
Description SmallDataProducer
SubmitSMD
Smalldata production CrystFELIndexer
IndexCrystFEL
Crystallographic indexing PartialatorMerger
MergePartialator
Crystallographic merging HKLComparer
CompareHKL
Crystallographic figures of merit HKLManipulator
ManipulateHKL
Crystallographic format conversions DimpleSolver
DimpleSolve
Crystallographic structure solution with molecular replacement PeakFinderPyAlgos
FindPeaksPyAlgos
Peak finding with PyAlgos algorithm. PeakFinderPsocake
FindPeaksPsocake
Peak finding with psocake algorithm. StreamFileConcatenator
ConcatenateStreamFiles
Stream file concatenation."},{"location":"#how-do-i-know-what-parameters-are-available-and-what-they-do","title":"How do I know what parameters are available, and what they do?","text":"A summary of Task
parameters is available through the lute_help
program.
> utilities/lute_help -t [TaskName]\n
Note, some parameters may say \"Unknown description\" - this either means they are using an old-style definition that does not include parameter help, or they may have some internal use. In particular, you will see this for lute_config
on every Task
; this parameter is filled in automatically and should be ignored. For example:
> utilities/lute_help -t IndexCrystFEL\nINFO:__main__:Fetching parameter information for IndexCrystFEL.\nIndexCrystFEL\n-------------\nParameters for CrystFEL's `indexamajig`.\n\nThere are many parameters, and many combinations. For more information on\nusage, please refer to the CrystFEL documentation, here:\nhttps://www.desy.de/~twhite/crystfel/manual-indexamajig.html\n\n\nRequired Parameters:\n--------------------\n[...]\n\nAll Parameters:\n-------------\n[...]\n\nhighres (number)\n Mark all pixels greater than `x` as bad.\n\nprofile (boolean) - Default: False\n Display timing data to monitor performance.\n\ntemp_dir (string)\n Specify a path for the temp files folder.\n\nwait_for_file (integer) - Default: 0\n Wait at most `x` seconds for a file to be created. A value of -1 means wait forever.\n\nno_image_data (boolean) - Default: False\n Load only the metadata, no images. Can check indexability without high data requirements.\n\n[...]\n
"},{"location":"#running-managed-tasks-and-workflows-dags","title":"Running Managed Task
s and Workflows (DAGs)","text":"After a YAML file has been filled in you can run a Task
. There are multiple ways to submit a Task
, but the following 3 are the most common:
Task
interactively by running python ...
Task
as a batch job (e.g. on S3DF) via a SLURM submission submit_slurm.sh ...
Task
s).These will be covered in turn below; however, in general all methods will require two parameters: the path to a configuration YAML file, and the name of the managed Task
or workflow you want to run. When submitting via SLURM or submitting an entire workflow there are additional parameters to control these processes.
Task
s interactively","text":"The simplest submission method is just to run Python interactively. In most cases this is not practical for long-running analysis, but may be of use for short Task
s or when debugging. From the root directory of the LUTE repository (or after installation) you can use the run_task.py
script:
> python -B [-O] run_task.py -t <ManagedTaskName> -c </path/to/config/yaml>\n
The command-line arguments in square brackets []
are optional, while those in <>
must be provided:
-O
is the flag controlling whether you run in debug or non-debug mode. By default, i.e. if you do NOT provide this flag, you will run in debug mode, which enables verbose printing. Passing -O
will turn off debug to minimize output.-t <ManagedTaskName>
is the name of the managed Task
you want to run.-c </path/...>
is the path to the configuration YAML.Task
as a batch job","text":"On S3DF you can also submit individual managed Task
s to run as batch jobs. To do so use launch_scripts/submit_slurm.sh
> launch_scripts/submit_slurm.sh -t <ManagedTaskName> -c </path/to/config/yaml> [--debug] $SLURM_ARGS\n
As before, command-line arguments in square brackets []
are optional, while those in <>
must be provided:
-t <ManagedTaskName>
is the name of the managed Task
you want to run.-c </path/...>
is the path to the configuration YAML.--debug
is the flag to control whether or not to run in debug mode.In addition to the LUTE-specific arguments, SLURM arguments must also be provided ($SLURM_ARGS
above). You can provide as many as you want; however, you will need to provide at least:
--partition=<partition/queue>
- The queue to run on, in general for LCLS this is milano
--account=lcls:<experiment>
- The account to use for batch job accounting.You will likely also want to provide at a minimum:
--ntasks=<...>
to control the number of cores allocated.In general, it is best to prefer the long form of SLURM arguments (--arg=<...>
) in order to avoid potential clashes with present or future LUTE arguments.
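As a concrete illustration, a full submission command might look like the following - the managed Task name, experiment account, and core count here are placeholder values for the example and should be adapted to your own analysis:
> launch_scripts/submit_slurm.sh -t CrystFELIndexer -c /path/to/config.yaml --partition=milano --account=lcls:<experiment> --ntasks=64\n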
Finally, you can submit a full workflow (e.g. SFX analysis, smalldata production and summary results, geometry optimization...). This can be done using a single script, submit_launch_airflow.sh
, similarly to the SLURM submission above:
> launch_scripts/submit_launch_airflow.sh /path/to/lute/launch_scripts/launch_airflow.py -c </path/to/yaml.yaml> -w <dag_name> [--debug] [--test] [-e <exp>] [-r <run>] $SLURM_ARGS\n
The submission process is slightly more complicated in this case. A more in-depth explanation is provided under \"Airflow Launch Steps\" in the advanced usage section below, if interested. The parameters are as follows - as before, command-line arguments in square brackets []
are optional, while those in <>
must be provided:
launch_scripts/launch_airflow.py
script located in whatever LUTE installation you are running. All other arguments can come afterwards in any order.-c </path/...>
is the path to the configuration YAML to use.-w <dag_name>
is the name of the DAG (workflow) to run. This replaces the task name provided when using the other two methods above. A DAG list is provided below.-W
(capital W) followed by the path to the workflow instead of -w
. See below for further discussion on this use case.--debug
controls whether to use debug mode (verbose printing)--test
controls whether to use the test or production instance of Airflow to manage the DAG. The instances are running identical versions of Airflow, but the test
instance may have \"test\" DAGs or more bleeding-edge development DAGs.-e
is used to pass the experiment name. Needed if not using the ARP, i.e. running from the command-line.-r
is used to pass a run number. Needed if not using the ARP, i.e. running from the command-line.The $SLURM_ARGS
must be provided in the same manner as when submitting an individual managed Task
by hand to be run as batch job with the script above. Note that these parameters will be used as the starting point for the SLURM arguments of every managed Task
in the DAG; however, individual steps in the DAG may have overrides built-in where appropriate to make sure that step is not submitted with potentially incompatible arguments. For example, a single threaded analysis Task
may be capped to running on one core, even if in general everything should be running on 100 cores, per the SLURM argument provided. These caps are added during development and cannot be disabled through configuration changes in the YAML.
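As a concrete illustration, submitting one of the DAGs listed below from the command-line might look like the following - the configuration path, experiment, run number, and account are placeholder values for the example:
> launch_scripts/submit_launch_airflow.sh /path/to/lute/launch_scripts/launch_airflow.py -c /path/to/config.yaml -w pyalgos_sfx -e <exp> -r <run> --partition=milano --account=lcls:<experiment>\n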
DAG List
find_peaks_index
psocake_sfx_phasing
pyalgos_sfx
eLog
","text":"You can use the script in the previous section to submit jobs through the eLog. To do so navigate to the Workflow > Definitions
tab using the blue navigation bar at the top of the eLog. On this tab, in the top-right corner (underneath the help and zoom icons) you can click the +
sign to add a new workflow. This will bring up a \"Workflow definition\" UI window. When filling out the eLog workflow definition the following fields are needed (all of them):
Name
: You can name the workflow anything you like. It should probably be something descriptive, e.g. if you are using LUTE to run smalldata_tools, you may call the workflow lute_smd
.Executable
: In this field you will put the full path to the submit_launch_airflow.sh
script: /path/to/lute/launch_scripts/submit_launch_airflow.sh
.Parameters
: You will use the parameters as described above. Remember the first argument will be the full path to the launch_airflow.py
script (this is NOT the same as the bash script used in the executable!): /full/path/to/lute/launch_scripts/launch_airflow.py -c <path/to/yaml> -w <dag_name> [--debug] [--test] $SLURM_ARGS
Location
: Be sure to set to S3DF
.Trigger
: You can have the workflow trigger automatically or manually. Which option to choose will depend on the type of workflow you are running. In general the options Manually triggered
(which displays as MANUAL
on the definitions page) and End of a run
(which displays as END_OF_RUN
on the definitions page) are safe options for ALL workflows. The latter will be automatically submitted for you when data acquisition has finished. If you are running a workflow with managed Task
s that work as data is being acquired (e.g. SmallDataProducer
), you may also select Start of a run
(which displays as START_OF_RUN
on the definitions page).Upon clicking create you will see a new entry in the table on the definitions page. In order to run MANUAL
workflows, or re-run automatic workflows, you must navigate to the Workflows > Control
tab. For each acquisition run you will find a drop down menu under the Job
column. To submit a workflow you select it from this drop down menu by the Name
you provided when creating its definition.
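For reference, a filled-in definition for a smalldata workflow might look like the following - the paths and DAG name are illustrative and should be replaced with those of your own LUTE installation and analysis:
Name: lute_smd\nExecutable: /path/to/lute/launch_scripts/submit_launch_airflow.sh\nParameters: /path/to/lute/launch_scripts/launch_airflow.py -c /path/to/config.yaml -w <dag_name> --partition=milano --account=lcls:<experiment>\nLocation: S3DF\nTrigger: END_OF_RUN\n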
Using validator
s, it is possible to define (generally, default) model parameters for a Task
in terms of other parameters. It is also possible to use validated Pydantic model parameters to substitute values into a configuration file required to run a third party Task
(e.g. some Task
s may require their own JSON, TOML files, etc. to run properly). For more information on these types of substitutions, refer to the new_task.md
documentation on Task
creation.
These types of substitutions, however, have a limitation in that they are not easily adapted at run time. They therefore address only a small number of the possible combinations in the dependencies between different input parameters. In order to support more complex relationships between parameters, variable substitutions can also be used in the configuration YAML itself. Using a syntax similar to Jinja
templates, you can define values for YAML parameters in terms of other parameters or environment variables. The values are substituted before Pydantic attempts to validate the configuration.
It is perhaps easiest to illustrate with an example. A test case is provided in config/test_var_subs.yaml
and is reproduced here:
%YAML 1.3\n---\ntitle: \"Configuration to Test YAML Substitution\"\nexperiment: \"TestYAMLSubs\"\nrun: 12\ndate: \"2024/05/01\"\nlute_version: 0.1\ntask_timeout: 600\nwork_dir: \"/sdf/scratch/users/d/dorlhiac\"\n...\n---\nOtherTask:\n useful_other_var: \"USE ME!\"\n\nNonExistentTask:\n test_sub: \"/path/to/{{ experiment }}/file_r{{ run:04d }}.input\" # Substitute `experiment` and `run` from header above\n test_env_sub: \"/path/to/{{ $EXPERIMENT }}/file.input\" # Substitute from the environment variable $EXPERIMENT\n test_nested:\n a: \"outfile_{{ run }}_one.out\" # Substitute `run` from header above\n b:\n c: \"outfile_{{ run }}_two.out\" # Also substitute `run` from header above\n d: \"{{ OtherTask.useful_other_var }}\" # Substitute `useful_other_var` from `OtherTask`\n test_fmt: \"{{ run:04d }}\" # Substitute `run` and format as 0012\n test_env_fmt: \"{{ $RUN:04d }}\" # Substitute environment variable $RUN and pad to 4 w/ zeros\n...\n
Input parameters in the config YAML can be substituted with either other input parameters or environment variables, with or without limited string formatting. All substitutions occur between double curly brackets: {{ VARIABLE_TO_SUBSTITUTE }}
. Environment variables are indicated by $
in front of the variable name. Parameters from the header, i.e. the first YAML document (top section) containing the run
, experiment
, version fields, etc. can be substituted without any qualification. If you want to use the run
parameter, you can substitute it using {{ run }}
. All other parameters, i.e. from other Task
s or within Task
s, must use a qualified name. Nested levels are delimited using a .
. E.g. consider a structure like:
Task:\n param_set:\n a: 1\n b: 2\n c: 3\n
In order to use parameter c
, you would use {{ Task.param_set.c }}
as the substitution.
Take care when using substitutions! This process will not try to guess for you. When a substitution is not available, e.g. due to misspelling, one of two things will happen:
param: /my/failed/{{ $SUBSTITUTION }}
as your parameter. This may or may not fail the model validation step, but is likely not what you intended.Defining your own parameters
The configuration file is not validated in its totality, only on a Task
-by-Task
basis, but it is read in its totality. E.g. when running MyTask
only that portion of the configuration is validated even though the entire file has been read, and is available for substitutions. As a result, it is safe to introduce extra entries into the YAML file, as long as they are not entered under a specific Task
's configuration. This may be useful to create your own global substitutions, for example if there is a key variable that may be used across different Task
s. E.g. Consider a case where you want to create a more generic configuration file where a single variable is used by multiple Task
s. This single variable may be changed between experiments, for instance, but is likely static for the duration of a single set of analyses. In order to avoid a mistake when changing the configuration between experiments you can define this special variable (or variables) as a separate entry in the YAML, and make use of substitutions in each Task
's configuration. This way the variable only needs to be changed in one place.
# Define our substitution. This is only for substitutions!\nMY_SPECIAL_SUB: \"EXPMT_DEPENDENT_VALUE\" # Can change here once per experiment!\n\nRunTask1:\n special_var: \"{{ MY_SPECIAL_SUB }}\"\n var_1: 1\n var_2: \"a\"\n # ...\n\nRunTask2:\n special_var: \"{{ MY_SPECIAL_SUB }}\"\n var_3: \"abcd\"\n var_4: 123\n # ...\n\nRunTask3:\n special_var: \"{{ MY_SPECIAL_SUB }}\"\n #...\n\n# ... and so on\n
"},{"location":"#gotchas","title":"Gotchas!","text":"Order matters
While in general you can use parameters that appear later in a YAML document to substitute for values of parameters that appear earlier, the substitutions themselves will be performed in order of appearance. It is therefore NOT possible to correctly use a later parameter as a substitution for an earlier one, if the later one itself depends on a substitution. The YAML document, however, can be rearranged without error. The order in the YAML document has no effect on execution order which is determined purely by the workflow definition. As mentioned above, the document is not validated in its entirety so rearrangements are allowed. For example consider the following situation which produces an incorrect substitution:
%YAML 1.3\n---\ntitle: \"Configuration to Test YAML Substitution\"\nexperiment: \"TestYAMLSubs\"\nrun: 12\ndate: \"2024/05/01\"\nlute_version: 0.1\ntask_timeout: 600\nwork_dir: \"/sdf/data/lcls/ds/exp/experiment/scratch\"\n...\n---\nRunTaskOne:\n input_dir: \"{{ RunTaskTwo.path }}\" # Will incorrectly be \"{{ work_dir }}/additional_path/{{ run }}\"\n # ...\n\nRunTaskTwo:\n # Remember `work_dir` and `run` come from the header document and don't need to\n # be qualified\n path: \"{{ work_dir }}/additional_path/{{ run }}\"\n...\n
This configuration can be rearranged to achieve the desired result:
%YAML 1.3\n---\ntitle: \"Configuration to Test YAML Substitution\"\nexperiment: \"TestYAMLSubs\"\nrun: 12\ndate: \"2024/05/01\"\nlute_version: 0.1\ntask_timeout: 600\nwork_dir: \"/sdf/data/lcls/ds/exp/experiment/scratch\"\n...\n---\nRunTaskTwo:\n # Remember `work_dir` comes from the header document and doesn't need to be qualified\n path: \"{{ work_dir }}/additional_path/{{ run }}\"\n\nRunTaskOne:\n input_dir: \"{{ RunTaskTwo.path }}\" # Will now be /sdf/data/lcls/ds/exp/experiment/scratch/additional_path/12\n # ...\n...\n
On the other hand, relationships such as these may point to inconsistencies in the dependencies between Task
s which may warrant a refactor.
Found unhashable key
To avoid YAML parsing issues when using the substitution syntax, be sure to quote your substitutions. Before substitution is performed, a dictionary is first constructed by the pyyaml
package which parses the document - it may fail to parse the document and raise an exception if the substitutions are not quoted. E.g.
# USE THIS\nMyTask:\n var_sub: \"{{ other_var:04d }}\"\n\n# **DO NOT** USE THIS\nMyTask:\n var_sub: {{ other_var:04d }}\n
During validation, Pydantic will by default cast variables if possible; because of this, it is generally safe to use strings for substitutions. E.g. if your parameter is expecting an integer, and after substitution you pass \"2\"
, Pydantic will cast this to the int
2
, and validation will succeed. As part of the substitution process limited type casting will also be handled if it is necessary for any formatting strings provided. E.g. \"{{ run:04d }}\"
requires that run be an integer, so it will be treated as such in order to apply the formatting.
In most cases, standard DAGs should be called as described above. However, Airflow also supports the dynamic creation of DAGs, e.g. to vary the input data to various steps, or the number of steps that will occur. Some of this functionality has been used to allow for user-defined DAGs which are passed in the form of a dictionary, allowing Airflow to construct the workflow as it is running.
A basic YAML syntax is used to construct a series of nested dictionaries which define a DAG. Consider a simplified serial femtosecond crystallography DAG which runs peak finding through merging and then calculates some statistics. I.e. we want an execution order that looks like:
peak_finder >> indexer >> merger >> hkl_comparer\n
We can alternatively define this DAG in YAML:
task_name: PeakFinderPyAlgos\nslurm_params: ''\nnext:\n- task_name: CrystFELIndexer\n slurm_params: ''\n next:\n - task_name: PartialatorMerger\n slurm_params: ''\n next:\n - task_name: HKLComparer\n slurm_params: ''\n next: []\n
I.e. we define a tree where each node is constructed using Node(task_name: str, slurm_params: str, next: List[Node])
.
task_name
is the name of a managed Task
. This name must be identical to a managed Task
defined in the LUTE installation you are using.slurm_params
. This is a complete string of all the arguments to use for the corresponding managed Task
. Use of this field is all or nothing! - if it is left as an empty string, the default parameters (passed on the command-line using the launch script) are used, otherwise this string is used in its stead. Because of this remember to include a partition and account if using it.next
field is composed of either an empty list (meaning no managed Task
s are run after the current node), or additional nodes. All nodes in the next
list are run in parallel.As a second example, to run task1
followed by task2
and task3
in parallel we would use:
task_name: Task1\nslurm_params: ''\nnext:\n- task_name: Task2\n slurm_params: ''\n next: []\n- task_name: Task3\n slurm_params: ''\n next: []\n
In order to run a DAG defined in this way, we pass the path to the YAML file we have defined it in to the launch script using -W <path_to_dag>
. This is instead of calling it by name. E.g.
/path/to/lute/launch_scripts/submit_launch_airflow.sh /path/to/lute/launch_scripts/launch_airflow.py -e <exp> -r <run> -c /path/to/config -W <path_to_dag> --test [--debug] [SLURM_ARGS]\n
Note that fewer options are currently supported for configuring the operators for each step of the DAG. The slurm arguments can be replaced in their entirety using a custom slurm_params
string but individual options cannot be modified.
Special markers have been inserted at certain points in the execution flow for LUTE. These can be enabled by setting the environment variables detailed below. These are intended to allow developers to exit the program at certain points to investigate behaviour or a bug. For instance, when working on configuration parsing, an environment variable can be set which exits the program after passing this step. This allows you to run LUTE otherwise as normal (described above), without having to modify any additional code or insert your own early exits.
Types of debug markers:
LUTE_DEBUG_EXIT
: Will exit the program at this point if the corresponding environment variable has been set.Developers can insert these markers as needed into their code to add new exit points, although as a rule of thumb they should be used sparingly, and generally only after major steps in the execution flow (e.g. after parsing, after beginning a task, after returning a result, etc.).
In order to include a new marker in your code:
from lute.execution.debug_utils import LUTE_DEBUG_EXIT\n\ndef my_code() -> None:\n # ...\n LUTE_DEBUG_EXIT(\"MYENVVAR\", \"Additional message to print\")\n # If MYENVVAR is not set, the above function does nothing\n
You can enable a marker by setting the corresponding environment variable to 1, e.g. to enable the example marker above while running Tester
:
MYENVVAR=1 python -B run_task.py -t Tester -c config/test.yaml\n
"},{"location":"#currently-used-environment-variables","title":"Currently used environment variables","text":"LUTE_DEBUG_EXIT_AT_YAML
: Exits the program after reading in a YAML configuration file and performing variable substitutions, but BEFORE Pydantic validation.LUTE_DEBUG_BEFORE_TPP_EXEC
: Exits the program after a ThirdPartyTask has prepared its submission command, but before exec
is used to run it.The Airflow launch process actually involves a number of steps, and is rather complicated. There are two wrapper steps prior to getting to the actual Airflow API communication.
launch_scripts/submit_launch_airflow.sh
is run./sdf/group/lcls/ds/tools/lute_launcher
with all the same parameters that it was called with.lute_launcher
runs the launch_scripts/launch_airflow.py
script which was provided as the first argument. This is the true launch scriptlaunch_airflow.py
communicates with the Airflow API, requesting that a specific DAG be launched. It then continues to run, and gathers the individual logs and the exit status of each step of the DAG.launch_scripts/submit_slurm.sh
.There are some specific reasons for this complexity:
submit_launch_airflow.sh
as a thin-wrapper around lute_launcher
is to allow the true Airflow launch script to be a long-lived job. This is for compatibility with the eLog and the ARP. When run from the eLog as a workflow, the job submission process must occur within 30 seconds due to a timeout built into the system. This is fine when submitting jobs to run on the batch-nodes, as the submission to the queue takes very little time. So here, submit_launch_airflow.sh
serves as a thin script to have lute_launcher
run as a batch job. It can then run as a long-lived job (for the duration of the entire DAG) collecting log files all in one place. This allows the log for each stage of the Airflow DAG to be inspected in a single file, and through the eLog browser interface.lute_launcher
as a wrapper around launch_airflow.py
is to manage authentication and credentials. The launch_airflow.py
script requires loading credentials in order to authenticate against the Airflow API. For the average user this is not possible, unless the script is run from within the lute_launcher
process.LUTE is publicly available on GitHub. To run it, the first step is to clone the repository:
# Navigate to the directory of your choice.\ngit clone git@github.com:slac-lcls/lute\n
The repository directory structure is as follows:
lute\n |--- config # Configuration YAML files (see below) and templates for third party config\n |--- docs # Documentation (including this page)\n |--- launch_scripts # Entry points for using SLURM and communicating with Airflow\n |--- lute # Code\n |--- run_task.py # Script to run an individual managed Task\n |--- ...\n |--- utilities # Help utility programs\n |--- workflows # This directory contains workflow definitions. It is synced elsewhere and not used directly.\n\n
In general, most interactions with the software will be through scripts located in the launch_scripts
directory. Some users (for certain use-cases) may also choose to run the run_task.py
script directly - its location is highlighted within the hierarchy above. To begin with, you will need a YAML file, templates for which are available in the config
directory. The structure of the YAML file and how to use the various launch scripts are described in more detail below.
In the utilities
directory there are two useful programs to provide assistance with using the software:
utilities/dbview
: LUTE stores all parameters for every analysis routine it runs (as well as results) in a database. This database is stored in the work_dir
defined in the YAML file (see below). The dbview
utility is a TUI application (Text-based user interface) which runs in the terminal. It allows you to navigate a LUTE database using the arrow keys, etc. Usage is: utilities/dbview -p <path/to/lute.db>
.utilities/lute_help
: This utility provides help and usage information for running LUTE software. E.g., it provides access to parameter descriptions to assist in properly filling out a configuration YAML. Its usage is described in slightly more detail below.LUTE runs code as Task
s that are managed by an Executor
. The Executor
provides modifications to the environment the Task
runs in, as well as controls details of inter-process communication, reporting results to the eLog, etc. Combinations of specific Executor
s and Task
s are already provided, and are referred to as managed Task
s. Managed Task
s are submitted as a single unit. They can be run individually, or a series of independent steps can be submitted all at once in the form of a workflow, or directed acyclic graph (DAG). This latter option makes use of Airflow to manage the individual execution steps.
Running analysis with LUTE is the process of submitting one or more managed Task
s. This is generally a two step process.
Task
s which you may run.Task
submission, or workflow (DAG) submission.These two steps are described below.
"},{"location":"usage/#preparing-a-configuration-yaml","title":"Preparing a Configuration YAML","text":"All Task
s are parameterized through a single configuration YAML file - even third party code which requires its own configuration files is managed through this YAML file. The basic structure is split into two documents, a brief header section which contains information that is applicable across all Task
s, such as the experiment name, run numbers and the working directory, followed by per Task
parameters:
%YAML 1.3\n---\ntitle: \"Some title.\"\nexperiment: \"MYEXP123\"\n# run: 12 # Does not need to be provided\ndate: \"2024/05/01\"\nlute_version: 0.1\ntask_timeout: 600\nwork_dir: \"/sdf/scratch/users/d/dorlhiac\"\n...\n---\nTaskOne:\n param_a: 123\n param_b: 456\n param_c:\n sub_var: 3\n sub_var2: 4\n\nTaskTwo:\n new_param1: 3\n new_param2: 4\n\n# ...\n...\n
In the first document, the header, it is important that the work_dir
is properly specified. This is the root directory from which Task
outputs will be written, and the LUTE database will be stored. It may also be desirable to modify the task_timeout
parameter which defines the time limit for individual Task
jobs. By default it is set to 10 minutes, although this may not be sufficient for long running jobs. This value will be applied to all Task
s so should account for the longest running job you expect.
The actual analysis parameters are defined in the second document. As these vary from Task
to Task
, a full description will not be provided here. An actual template with real Task
parameters is available in config/test.yaml
. Your analysis POC can also help you set up and choose the correct Task
s to include as a starting point. The template YAML file has further descriptions of what each parameter does and how to fill it out. You can also refer to the lute_help
program described under the following sub-heading.
Some things to consider and possible points of confusion:
Task
s, the parameters are defined at the Task
level. I.e. the managed Task
and Task
itself have different names, and the names in the YAML refer to the latter. This is because a single Task
can be run using different Executor
configurations, but using the same parameters. The list of managed Task
s is in lute/managed_tasks.py
. A table is also provided below for some routines of interest.Task
The Task
it Runs Task
Description SmallDataProducer
SubmitSMD
Smalldata production CrystFELIndexer
IndexCrystFEL
Crystallographic indexing PartialatorMerger
MergePartialator
Crystallographic merging HKLComparer
CompareHKL
Crystallographic figures of merit HKLManipulator
ManipulateHKL
Crystallographic format conversions DimpleSolver
DimpleSolve
Crystallographic structure solution with molecular replacement PeakFinderPyAlgos
FindPeaksPyAlgos
Peak finding with PyAlgos algorithm. PeakFinderPsocake
FindPeaksPsocake
Peak finding with psocake algorithm. StreamFileConcatenator
ConcatenateStreamFiles
Stream file concatenation."},{"location":"usage/#how-do-i-know-what-parameters-are-available-and-what-they-do","title":"How do I know what parameters are available, and what they do?","text":"A summary of Task
parameters is available through the lute_help
program.
> utilities/lute_help -t [TaskName]\n
Note, some parameters may say \"Unknown description\" - this either means they are using an old-style definition that does not include parameter help, or they may have some internal use. In particular, you will see this for lute_config
on every Task
; this parameter is filled in automatically and should be ignored. For example:
> utilities/lute_help -t IndexCrystFEL\nINFO:__main__:Fetching parameter information for IndexCrystFEL.\nIndexCrystFEL\n-------------\nParameters for CrystFEL's `indexamajig`.\n\nThere are many parameters, and many combinations. For more information on\nusage, please refer to the CrystFEL documentation, here:\nhttps://www.desy.de/~twhite/crystfel/manual-indexamajig.html\n\n\nRequired Parameters:\n--------------------\n[...]\n\nAll Parameters:\n-------------\n[...]\n\nhighres (number)\n Mark all pixels greater than `x` as bad.\n\nprofile (boolean) - Default: False\n Display timing data to monitor performance.\n\ntemp_dir (string)\n Specify a path for the temp files folder.\n\nwait_for_file (integer) - Default: 0\n Wait at most `x` seconds for a file to be created. A value of -1 means wait forever.\n\nno_image_data (boolean) - Default: False\n Load only the metadata, no images. Can check indexability without high data requirements.\n\n[...]\n
"},{"location":"usage/#running-managed-tasks-and-workflows-dags","title":"Running Managed Task
s and Workflows (DAGs)","text":"After a YAML file has been filled in you can run a Task
. There are multiple ways to submit a Task
, but the following 3 are the most common:
Task
interactively by running python ...
Task
as a batch job (e.g. on S3DF) via a SLURM submission submit_slurm.sh ...
Task
s).These will be covered in turn below; however, in general all methods will require two parameters: the path to a configuration YAML file, and the name of the managed Task
or workflow you want to run. When submitting via SLURM or submitting an entire workflow there are additional parameters to control these processes.
Task
s interactively","text":"The simplest submission method is just to run Python interactively. In most cases this is not practical for long-running analysis, but may be of use for short Task
s or when debugging. From the root directory of the LUTE repository (or after installation) you can use the run_task.py
script:
> python -B [-O] run_task.py -t <ManagedTaskName> -c </path/to/config/yaml>\n
The command-line arguments in square brackets []
are optional, while those in <>
must be provided:
-O
is the flag controlling whether you run in debug or non-debug mode. By default, i.e. if you do NOT provide this flag, you will run in debug mode, which enables verbose printing. Passing -O
will turn off debug to minimize output.-t <ManagedTaskName>
is the name of the managed Task
you want to run.-c </path/...>
is the path to the configuration YAML.Task
as a batch job","text":"On S3DF you can also submit individual managed Task
s to run as batch jobs. To do so use launch_scripts/submit_slurm.sh
> launch_scripts/submit_slurm.sh -t <ManagedTaskName> -c </path/to/config/yaml> [--debug] $SLURM_ARGS\n
As before, command-line arguments in square brackets []
are optional, while those in <>
must be provided:
-t <ManagedTaskName>
is the name of the managed Task
you want to run.-c </path/...>
is the path to the configuration YAML.--debug
is the flag to control whether or not to run in debug mode.In addition to the LUTE-specific arguments, SLURM arguments must also be provided ($SLURM_ARGS
above). You can provide as many as you want; however, you will need to provide at least:
--partition=<partition/queue>
- The queue to run on, in general for LCLS this is milano
--account=lcls:<experiment>
- The account to use for batch job accounting.You will likely also want to provide at a minimum:
--ntasks=<...>
to control the number of cores allocated.In general, it is best to prefer the long form of SLURM arguments (--arg=<...>
) in order to avoid potential clashes with present or future LUTE arguments.
Finally, you can submit a full workflow (e.g. SFX analysis, smalldata production and summary results, geometry optimization...). This can be done using a single script, submit_launch_airflow.sh
, similarly to the SLURM submission above:
> launch_scripts/submit_launch_airflow.sh /path/to/lute/launch_scripts/launch_airflow.py -c </path/to/yaml.yaml> -w <dag_name> [--debug] [--test] [-e <exp>] [-r <run>] $SLURM_ARGS\n
The submission process is slightly more complicated in this case. A more in-depth explanation is provided under \"Airflow Launch Steps\" in the advanced usage section below, if interested. The parameters are as follows - as before, command-line arguments in square brackets []
are optional, while those in <>
must be provided:
launch_scripts/launch_airflow.py
script located in whatever LUTE installation you are running. All other arguments can come afterwards in any order.-c </path/...>
is the path to the configuration YAML to use.-w <dag_name>
is the name of the DAG (workflow) to run. This replaces the task name provided when using the other two methods above. A DAG list is provided below.-W
(capital W) followed by the path to the workflow instead of -w
. See below for further discussion on this use case.--debug
controls whether to use debug mode (verbose printing)--test
controls whether to use the test or production instance of Airflow to manage the DAG. The instances are running identical versions of Airflow, but the test
instance may have \"test\" DAGs or more bleeding-edge development DAGs.-e
is used to pass the experiment name. Needed if not using the ARP, i.e. running from the command-line.-r
is used to pass a run number. Needed if not using the ARP, i.e. running from the command-line.The $SLURM_ARGS
must be provided in the same manner as when submitting an individual managed Task
by hand to be run as batch job with the script above. Note that these parameters will be used as the starting point for the SLURM arguments of every managed Task
in the DAG; however, individual steps in the DAG may have overrides built-in where appropriate to make sure that step is not submitted with potentially incompatible arguments. For example, a single threaded analysis Task
may be capped to running on one core, even if in general everything should be running on 100 cores, per the SLURM argument provided. These caps are added during development and cannot be disabled through configuration changes in the YAML.
DAG List
find_peaks_index
psocake_sfx_phasing
pyalgos_sfx
eLog
","text":"You can use the script in the previous section to submit jobs through the eLog. To do so navigate to the Workflow > Definitions
tab using the blue navigation bar at the top of the eLog. On this tab, in the top-right corner (underneath the help and zoom icons) you can click the +
sign to add a new workflow. This will bring up a \"Workflow definition\" UI window. When filling out the eLog workflow definition the following fields are needed (all of them):
Name
: You can name the workflow anything you like. It should probably be something descriptive, e.g. if you are using LUTE to run smalldata_tools, you may call the workflow lute_smd
.Executable
: In this field you will put the full path to the submit_launch_airflow.sh
script: /path/to/lute/launch_scripts/submit_launch_airflow.sh
.Parameters
: You will use the parameters as described above. Remember the first argument will be the full path to the launch_airflow.py
script (this is NOT the same as the bash script used in the executable!): /full/path/to/lute/launch_scripts/launch_airflow.py -c <path/to/yaml> -w <dag_name> [--debug] [--test] $SLURM_ARGS
Location
: Be sure to set to S3DF
.Trigger
: You can have the workflow trigger automatically or manually. Which option to choose will depend on the type of workflow you are running. In general the options Manually triggered
(which displays as MANUAL
on the definitions page) and End of a run
(which displays as END_OF_RUN
on the definitions page) are safe options for ALL workflows. The latter will be automatically submitted for you when data acquisition has finished. If you are running a workflow with managed Task
s that work as data is being acquired (e.g. SmallDataProducer
), you may also select Start of a run
(which displays as START_OF_RUN
on the definitions page).Upon clicking create you will see a new entry in the table on the definitions page. In order to run MANUAL
workflows, or re-run automatic workflows, you must navigate to the Workflows > Control
tab. For each acquisition run you will find a drop down menu under the Job
column. To submit a workflow you select it from this drop down menu by the Name
you provided when creating its definition.
Using validator
s, it is possible to define (generally, default) model parameters for a Task
in terms of other parameters. It is also possible to use validated Pydantic model parameters to substitute values into a configuration file required to run a third party Task
(e.g. some Task
s may require their own JSON, TOML files, etc. to run properly). For more information on these types of substitutions, refer to the new_task.md
documentation on Task
creation.
These types of substitutions, however, have a limitation in that they are not easily adapted at run time. They therefore address only a small number of the possible combinations in the dependencies between different input parameters. In order to support more complex relationships between parameters, variable substitutions can also be used in the configuration YAML itself. Using a syntax similar to Jinja
templates, you can define values for YAML parameters in terms of other parameters or environment variables. The values are substituted before Pydantic attempts to validate the configuration.
It is perhaps easiest to illustrate with an example. A test case is provided in config/test_var_subs.yaml
and is reproduced here:
%YAML 1.3\n---\ntitle: \"Configuration to Test YAML Substitution\"\nexperiment: \"TestYAMLSubs\"\nrun: 12\ndate: \"2024/05/01\"\nlute_version: 0.1\ntask_timeout: 600\nwork_dir: \"/sdf/scratch/users/d/dorlhiac\"\n...\n---\nOtherTask:\n useful_other_var: \"USE ME!\"\n\nNonExistentTask:\n test_sub: \"/path/to/{{ experiment }}/file_r{{ run:04d }}.input\" # Substitute `experiment` and `run` from header above\n test_env_sub: \"/path/to/{{ $EXPERIMENT }}/file.input\" # Substitute from the environment variable $EXPERIMENT\n test_nested:\n a: \"outfile_{{ run }}_one.out\" # Substitute `run` from header above\n b:\n c: \"outfile_{{ run }}_two.out\" # Also substitute `run` from header above\n d: \"{{ OtherTask.useful_other_var }}\" # Substitute `useful_other_var` from `OtherTask`\n test_fmt: \"{{ run:04d }}\" # Substitute `run` and format as 0012\n test_env_fmt: \"{{ $RUN:04d }}\" # Substitute environment variable $RUN and pad to 4 w/ zeros\n...\n
Input parameters in the config YAML can be substituted with either other input parameters or environment variables, with or without limited string formatting. All substitutions occur between double curly brackets: {{ VARIABLE_TO_SUBSTITUTE }}
. Environment variables are indicated by $
in front of the variable name. Parameters from the header, i.e. the first YAML document (top section) containing the run
, experiment
, version fields, etc. can be substituted without any qualification. If you want to use the run
parameter, you can substitute it using {{ run }}
. All other parameters, i.e. from other Task
s or within Task
s, must use a qualified name. Nested levels are delimited using a .
. E.g. consider a structure like:
Task:\n param_set:\n a: 1\n b: 2\n c: 3\n
In order to use parameter c
, you would use {{ Task.param_set.c }}
as the substitution.
Take care when using substitutions! This process will not try to guess for you. When a substitution is not available, e.g. due to misspelling, one of two things will happen:
param: /my/failed/{{ $SUBSTITUTION }}
as your parameter. This may or may not fail the model validation step, but is likely not what you intended.Defining your own parameters
The configuration file is not validated in its totality, only on a Task
-by-Task
basis, but it is read in its totality. E.g. when running MyTask
only that portion of the configuration is validated even though the entire file has been read, and is available for substitutions. As a result, it is safe to introduce extra entries into the YAML file, as long as they are not entered under a specific Task
's configuration. This may be useful to create your own global substitutions, for example if there is a key variable that may be used across different Task
s. E.g. Consider a case where you want to create a more generic configuration file where a single variable is used by multiple Task
s. This single variable may be changed between experiments, for instance, but is likely static for the duration of a single set of analyses. In order to avoid a mistake when changing the configuration between experiments you can define this special variable (or variables) as a separate entry in the YAML, and make use of substitutions in each Task
's configuration. This way the variable only needs to be changed in one place.
# Define our substitution. This is only for substitutions!\nMY_SPECIAL_SUB: \"EXPMT_DEPENDENT_VALUE\" # Can change here once per experiment!\n\nRunTask1:\n special_var: \"{{ MY_SPECIAL_SUB }}\"\n var_1: 1\n var_2: \"a\"\n # ...\n\nRunTask2:\n special_var: \"{{ MY_SPECIAL_SUB }}\"\n var_3: \"abcd\"\n var_4: 123\n # ...\n\nRunTask3:\n special_var: \"{{ MY_SPECIAL_SUB }}\"\n #...\n\n# ... and so on\n
"},{"location":"usage/#gotchas","title":"Gotchas!","text":"Order matters
While in general you can use parameters that appear later in a YAML document to substitute for values of parameters that appear earlier, the substitutions themselves will be performed in order of appearance. It is therefore NOT possible to correctly use a later parameter as a substitution for an earlier one, if the later one itself depends on a substitution. The YAML document, however, can be rearranged without error. The order in the YAML document has no effect on execution order which is determined purely by the workflow definition. As mentioned above, the document is not validated in its entirety so rearrangements are allowed. For example consider the following situation which produces an incorrect substitution:
%YAML 1.3\n---\ntitle: \"Configuration to Test YAML Substitution\"\nexperiment: \"TestYAMLSubs\"\nrun: 12\ndate: \"2024/05/01\"\nlute_version: 0.1\ntask_timeout: 600\nwork_dir: \"/sdf/data/lcls/ds/exp/experiment/scratch\"\n...\n---\nRunTaskOne:\n input_dir: \"{{ RunTaskTwo.path }}\" # Will incorrectly be \"{{ work_dir }}/additional_path/{{ run }}\"\n # ...\n\nRunTaskTwo:\n # Remember `work_dir` and `run` come from the header document and don't need to\n # be qualified\n path: \"{{ work_dir }}/additional_path/{{ run }}\"\n...\n
This configuration can be rearranged to achieve the desired result:
%YAML 1.3\n---\ntitle: \"Configuration to Test YAML Substitution\"\nexperiment: \"TestYAMLSubs\"\nrun: 12\ndate: \"2024/05/01\"\nlute_version: 0.1\ntask_timeout: 600\nwork_dir: \"/sdf/data/lcls/ds/exp/experiment/scratch\"\n...\n---\nRunTaskTwo:\n # Remember `work_dir` comes from the header document and doesn't need to be qualified\n path: \"{{ work_dir }}/additional_path/{{ run }}\"\n\nRunTaskOne:\n input_dir: \"{{ RunTaskTwo.path }}\" # Will now be /sdf/data/lcls/ds/exp/experiment/scratch/additional_path/12\n # ...\n...\n
On the other hand, relationships such as these may point to inconsistencies in the dependencies between Task
s which may warrant a refactor.
Found unhashable key
To avoid YAML parsing issues when using the substitution syntax, be sure to quote your substitutions. Before substitution is performed, a dictionary is first constructed by the pyyaml
package which parses the document - it may fail to parse the document and raise an exception if the substitutions are not quoted. E.g.
# USE THIS\nMyTask:\n var_sub: \"{{ other_var:04d }}\"\n\n# **DO NOT** USE THIS\nMyTask:\n var_sub: {{ other_var:04d }}\n
During validation, Pydantic will by default cast variables if possible; because of this, it is generally safe to use strings for substitutions. E.g. if your parameter is expecting an integer, and after substitution you pass \"2\"
, Pydantic will cast this to the int
2
, and validation will succeed. As part of the substitution process limited type casting will also be handled if it is necessary for any formatting strings provided. E.g. \"{{ run:04d }}\"
requires that run be an integer, so it will be treated as such in order to apply the formatting.
In most cases, standard DAGs should be called as described above. However, Airflow also supports the dynamic creation of DAGs, e.g. to vary the input data to various steps, or the number of steps that will occur. Some of this functionality has been used to allow for user-defined DAGs which are passed in the form of a dictionary, allowing Airflow to construct the workflow as it is running.
A basic YAML syntax is used to construct a series of nested dictionaries which define a DAG. Consider a simplified serial femtosecond crystallography DAG which runs peak finding through merging and then calculates some statistics. I.e. we want an execution order that looks like:
peak_finder >> indexer >> merger >> hkl_comparer\n
We can alternatively define this DAG in YAML:
task_name: PeakFinderPyAlgos\nslurm_params: ''\nnext:\n- task_name: CrystFELIndexer\n slurm_params: ''\n next:\n - task_name: PartialatorMerger\n slurm_params: ''\n next:\n - task_name: HKLComparer\n slurm_params: ''\n next: []\n
I.e. we define a tree where each node is constructed using Node(task_name: str, slurm_params: str, next: List[Node])
.
task_name
is the name of a managed Task
. This name must be identical to a managed Task
defined in the LUTE installation you are using.slurm_params
. This is a complete string of all the arguments to use for the corresponding managed Task
. Use of this field is all or nothing! - if it is left as an empty string, the default parameters (passed on the command-line using the launch script) are used, otherwise this string is used in its stead. Because of this remember to include a partition and account if using it.next
field is composed of either an empty list (meaning no managed Task
s are run after the current node), or additional nodes. All nodes in the next
list are run in parallel.As a second example, to run task1
followed by task2
and task3
in parallel we would use:
task_name: Task1\nslurm_params: ''\nnext:\n- task_name: Task2\n slurm_params: ''\n next: []\n- task_name: Task3\n slurm_params: ''\n next: []\n
In order to run a DAG defined in this way, we pass the path to the YAML file we have defined it in to the launch script using -W <path_to_dag>
. This is instead of calling it by name. E.g.
/path/to/lute/launch_scripts/submit_launch_airflow.sh /path/to/lute/launch_scripts/launch_airflow.py -e <exp> -r <run> -c /path/to/config -W <path_to_dag> --test [--debug] [SLURM_ARGS]\n
Note that fewer options are currently supported for configuring the operators for each step of the DAG. The slurm arguments can be replaced in their entirety using a custom slurm_params
string but individual options cannot be modified.
Special markers have been inserted at certain points in the execution flow for LUTE. These can be enabled by setting the environment variables detailed below. These are intended to allow developers to exit the program at certain points to investigate behaviour or a bug. For instance, when working on configuration parsing, an environment variable can be set which exits the program after passing this step. This allows you to run LUTE otherwise as normal (described above), without having to modify any additional code or insert your own early exits.
Types of debug markers:
LUTE_DEBUG_EXIT
: Will exit the program at this point if the corresponding environment variable has been set.Developers can insert these markers as needed into their code to add new exit points, although as a rule of thumb they should be used sparingly, and generally only after major steps in the execution flow (e.g. after parsing, after beginning a task, after returning a result, etc.).
In order to include a new marker in your code:
from lute.execution.debug_utils import LUTE_DEBUG_EXIT\n\ndef my_code() -> None:\n # ...\n LUTE_DEBUG_EXIT(\"MYENVVAR\", \"Additional message to print\")\n # If MYENVVAR is not set, the above function does nothing\n
You can enable a marker by setting the corresponding environment variable to 1, e.g. to enable the example marker above while running Tester
:
MYENVVAR=1 python -B run_task.py -t Tester -c config/test.yaml\n
"},{"location":"usage/#currently-used-environment-variables","title":"Currently used environment variables","text":"LUTE_DEBUG_EXIT_AT_YAML
: Exits the program after reading in a YAML configuration file and performing variable substitutions, but BEFORE Pydantic validation.LUTE_DEBUG_BEFORE_TPP_EXEC
: Exits the program after a ThirdPartyTask has prepared its submission command, but before exec
is used to run it.The Airflow launch process actually involves a number of steps, and is rather complicated. There are two wrapper steps prior to getting to the actual Airflow API communication.
launch_scripts/submit_launch_airflow.sh
is run./sdf/group/lcls/ds/tools/lute_launcher
with all the same parameters that it was called with.lute_launcher
runs the launch_scripts/launch_airflow.py
script which was provided as the first argument. This is the true launch scriptlaunch_airflow.py
communicates with the Airflow API, requesting that a specific DAG be launched. It then continues to run, and gathers the individual logs and the exit status of each step of the DAG.launch_scripts/submit_slurm.sh
.There are some specific reasons for this complexity:
submit_launch_airflow.sh
as a thin wrapper around lute_launcher
is to allow the true Airflow launch script to be a long-lived job. This is for compatibility with the eLog and the ARP. When run from the eLog as a workflow, the job submission process must occur within 30 seconds due to a timeout built into the system. This is fine when submitting jobs to run on the batch nodes, as the submission to the queue takes very little time. So here, submit_launch_airflow.sh
serves as a thin script to have lute_launcher
run as a batch job. It can then run as a long-lived job (for the duration of the entire DAG) collecting log files all in one place. This allows the log for each stage of the Airflow DAG to be inspected in a single file, and through the eLog browser interface.lute_launcher
as a wrapper around launch_airflow.py
is to manage authentication and credentials. The launch_airflow.py
script requires loading credentials in order to authenticate against the Airflow API. For the average user this is not possible, unless the script is run from within the lute_launcher
process.madr_template.md
for creating new ADRs. This template was adapted from the MADR template (MIT License).Task
s inherit from a base class Accepted 2 2023-11-06 Analysis Task
submission and communication is performed via Executor
s Accepted 3 2023-11-06 Executor
s will run all Task
s via subprocess Proposed 4 2023-11-06 Airflow Operator
s and LUTE Executor
s are separate entities. Proposed 5 2023-12-06 Task-Executor IPC is Managed by Communicator Objects Proposed 6 2024-02-12 Third-party Config Files Managed by Templates Rendered by ThirdPartyTask
s Proposed 7 2024-02-12 Task
Configuration is Stored in a Database Managed by Executor
s Proposed 8 2024-03-18 Airflow credentials/authorization requires special launch program. Proposed 9 2024-04-15 Airflow launch script will run as long lived batch job. Proposed"},{"location":"adrs/MADR_LICENSE/","title":"MADR LICENSE","text":"Copyright 2022 ADR Github Organization
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \u201cSoftware\u201d), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED \u201cAS IS\u201d, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
"},{"location":"adrs/adr-1/","title":"[ADR-1] All Analysis Tasks Inherit from a Base Class","text":"Date: 2023-11-06
"},{"location":"adrs/adr-1/#status","title":"Status","text":"Accepted
"},{"location":"adrs/adr-1/#context-and-problem-statement","title":"Context and Problem Statement","text":"btx
tasks had heterogenous interfaces.Task
s simultaneously.Date: 2023-11-06
"},{"location":"adrs/adr-2/#status","title":"Status","text":"Accepted
"},{"location":"adrs/adr-2/#context-and-problem-statement","title":"Context and Problem Statement","text":"Task
code itself provides a separation of concerns allowing Task
s to run indepently of execution environment.Executor
can prepare environment, submission requirements, etc.Executor
classes avoids maintaining that code independently for each task (cf. alternatives considered).Executor
level and immediately applied to all Task
s.Task
code.btx
tasks. E.g. task timeout leading to failure of a processing pipeline even if substantial work had been done and subsequent tasks could proceed.Task
submission already exist in the original btx
but the methods were not fully standardized.JobScheduler
submission vs direct submission of the task.Task
class interface as pre/post analysis operations.Task
subclasses for different execution environments.Task
class.Task
code independent of execution environment.Task
failure.Executor
s as the \"Managed Task\"Task
s will not be submitted independently.Executor
s will run all Task
s via subprocess","text":"Date: 2023-11-06
"},{"location":"adrs/adr-3/#status","title":"Status","text":"Proposed
"},{"location":"adrs/adr-3/#context-and-problem-statement","title":"Context and Problem Statement","text":"Task
s from within the Executor
(cf. ADR-2)Task
s, at all locations, but at the very least all Task
s at a single location (e.g. S3DF, NERSC)Task
submission, but have to submit both first-party and third-party code.JobScheduler
for btx
multiprocessing
at the Python level.Operator
s and LUTE Executor
s are Separate Entities","text":"Date: 2023-11-06
"},{"location":"adrs/adr-4/#status","title":"Status","text":"Proposed
"},{"location":"adrs/adr-4/#context-and-problem-statement","title":"Context and Problem Statement","text":"Executor
which in turn submits the Task
*
"},{"location":"adrs/adr-4/#considered-options","title":"Considered Options","text":"*
"},{"location":"adrs/adr-4/#consequences","title":"Consequences","text":"*
"},{"location":"adrs/adr-4/#compliance","title":"Compliance","text":""},{"location":"adrs/adr-4/#metadata","title":"Metadata","text":""},{"location":"adrs/adr-5/","title":"[ADR-5] Task-Executor IPC is Managed by Communicator Objects","text":"Date: 2023-12-06
"},{"location":"adrs/adr-5/#status","title":"Status","text":"Proposed
"},{"location":"adrs/adr-5/#context-and-problem-statement","title":"Context and Problem Statement","text":"Communicator
objects which maintain simple read
and write
mechanisms for Message
objects. These latter can contain arbitrary Python objects. Task
s do not interact directly with the communicator, but rather through specific instance methods which hide the communicator interfaces. Multiple Communicators can be used in parallel. The same Communicator
objects are used identically at the Task
and Executor
layers - any changes to communication protocols are not transferred to the calling objects.
Task
output needs to be routed to other layers of the software, but the Task
s themselves should have no knowledge of where the output ends up.subprocess
Task
and Executor
layers.Communicator
: Abstract base class - defines interfacePipeCommunicator
: Manages communication through pipes (stderr
and stdout
)SocketCommunicator
: Manages communication through Unix socketsTask
and Executor
side, IPC is greatly simplifiedCommunicator
Communicator
objects are non-public. Their interfaces (already limited) are handled by simple methods in the base classes of Task
s and Executor
s.Communicator
should have no need to be directly manipulated by callers (even less so by users)ThirdPartyTask
s","text":"Date: 2024-02-12
"},{"location":"adrs/adr-6/#status","title":"Status","text":"Proposed
"},{"location":"adrs/adr-6/#context-and-problem-statement","title":"Context and Problem Statement","text":"Templates will be used for the third party configuration files. A generic interface to heterogenous templates will be provided through a combination of pydantic models and the ThirdPartyTask
implementation. The pydantic models will label extra arguments to ThirdPartyTask
s as being TemplateParameters
. I.e. any extra parameters are considered to be for a templated configuration file. The ThirdPartyTask
will find the necessary template and render it if any extra parameters are found. This puts the burden of correct parsing on the template definition itself.
Task
interface as possible - but due to the above, need a way of handling multiple output files.Task
to be run before the main ThirdPartyTask
.Task
.ThirdPartyTask
s to be run as instances of a single class.Task
Configuration is Stored in a Database Managed by Executor
s","text":"Date: 2024-02-12
"},{"location":"adrs/adr-7/#status","title":"Status","text":"Proposed
"},{"location":"adrs/adr-7/#context-and-problem-statement","title":"Context and Problem Statement","text":"Task
parameter configurations.Task
's code is designed to be independent of other Task
's aside from code shared by inheritance.Task
s are intended to be defined only at the level of workflows.Task
s may have implicit dependencies on others. E.g. one Task
may use the output files of another, and so could benefit from having knowledge of where they were written.Upon Task
completion the managing Executor
will write the AnalysisConfig
object, including TaskParameters
, results and generic configuration information to a database. Some entries from this database can be retrieved to provide default files for TaskParameter
fields; however, the Task
itself has no knowledge of the database and does not access it.\n
Task
s while allowing information to be shared between them.Task
-independent IO be managed solely at the Executor
level.Task
s write the database.Task
s pass information through other mechanisms, such as Airflow.sqlite
which should make everything transferrable.Task
s without any explicit code dependencies/linkages between them.Date: 2024-03-18
"},{"location":"adrs/adr-8/#status","title":"Status","text":"Proposed
"},{"location":"adrs/adr-8/#context-and-problem-statement","title":"Context and Problem Statement","text":"A closed-source lute_launcher
program will be used to run the Airflow launch scripts. This program accesses credentials with the correct permissions. Users should otherwise not have access to the credentials. This will help ensure the credentials can be used by everyone but only to run workflows and not perform restricted admin activities.
Date: 2024-04-15
"},{"location":"adrs/adr-9/#status","title":"Status","text":"Proposed
"},{"location":"adrs/adr-9/#context-and-problem-statement","title":"Context and Problem Statement","text":"Task
will produce its own log file.The Airflow launch script will be a long lived process, running for the duration of the entire DAG. It will provide basic status logging information, e.g. what Task
s are running, if they succeed or failed. Additionally, at the end of each Task
job, the launch job will collect the log file from that job and append it to its own log.
As the Airflow launch script is an entry point used from the eLog, only its log file is available to users using that UI. By converting the launch script into a long-lived monitoring job it allows the log information to be easily accessible.
In order to accomplish this, the launch script must be submitted as a batch job so that it complies with the 30 second timeout imposed on jobs run by the ARP. This necessitates providing an additional wrapper script.
"},{"location":"adrs/adr-9/#decision-drivers","title":"Decision Drivers","text":"--open-mode=append
for SLURM)submit_launch_airflow.sh
which submits the launch_airflow.py
script (run by lute_launcher
) as a batch job.launch_airflow.py
) and 1 for the Executor
process. {ADR #X : Short description/title of feature/decision}
Date:
"},{"location":"adrs/madr_template/#status","title":"Status","text":"{Accepted | Proposed | Rejected | Deprecated | Superseded} {If this proposal supersedes another, please indicate so, e.g. \"Status: Accepted, supersedes [ADR-3]\"} {Likewise, if this proposal was superseded, e.g. \"Status: Superseded by [ADR-2]\"}
"},{"location":"adrs/madr_template/#context-and-problem-statement","title":"Context and Problem Statement","text":"{Describe the problem context and why this decision has been made/feature implemented.}
"},{"location":"adrs/madr_template/#decision","title":"Decision","text":"{Describe how the solution was arrived at in the manner it was. You may use the sections below to help.}
"},{"location":"adrs/madr_template/#decision-drivers","title":"Decision Drivers","text":"{Short description of anticipated consequences} * {Anticipated consequence 1} * {Anticipated consequence 2}
"},{"location":"adrs/madr_template/#compliance","title":"Compliance","text":"{How will the decision/implementation be enforced. How will compliance be validated?}
"},{"location":"adrs/madr_template/#metadata","title":"Metadata","text":"{Any additional information to include}
"},{"location":"design/database/","title":"LUTE Configuration Database Specification","text":"Date: 2024-02-12 VERSION: v0.1
"},{"location":"design/database/#basic-outline","title":"Basic Outline","text":"Executor
level code.Executor
configurationlute.io.config.AnalysisHeader
)Task
Task
tables by pointing/linking to the entry ids in the above two tables.gen_cfg
table","text":"The general configuration table contains entries which may be shared between multiple Task
s. The format of the table is:
These parameters are extracted from the TaskParameters
object. Each of those contains an AnalysisHeader
object stored in the lute_config
variable. For a given experimental run, this value will be shared across any Task
s that are executed.
id
ID of the entry in this table. title
Arbitrary description/title of the purpose of analysis. E.g. what kind of experiment is being conducted experiment
LCLS Experiment. Can be a placeholder if debugging, etc. run
LCLS Acquisition run. Can be a placeholder if debugging, testing, etc. date
Date the configuration file was first setup. lute_version
Version of the codebase being used to execute Task
s. task_timeout
The maximum amount of time in seconds that a Task
can run before being cancelled."},{"location":"design/database/#exec_cfg-table","title":"exec_cfg
table","text":"The Executor
table contains information on the environment provided to the Executor
for Task
execution, the polling interval used for IPC between the Task
and Executor
and information on the communicator protocols used for IPC. This information can be shared between Task
s or between experimental runs, but not necessarily every Task
of a given run will use exactly the same Executor
configuration and environment.
id
ID of the entry in this table. env
Execution environment used by the Executor and by proxy any Tasks submitted by an Executor matching this entry. Environment is stored as a string with variables delimited by \";\" poll_interval
Polling interval used for Task monitoring. communicator_desc
Description of the Communicators used. NOTE: The env
column currently only stores variables related to SLURM
or LUTE
itself.
Task
tables","text":"For every Task
a table of the following format will be created. The exact number of columns will depend on the specific Task
, as the number of parameters can vary between them, and each parameter gets its own column. Within a table, multiple experiments and runs can coexist. The experiment and run are not recorded directly. Instead, the first two columns point to the id of entries in the general configuration and Executor
tables respectively. The general configuration table entry will contain the experiment and run information.
Parameter sets which can be described as nested dictionaries are flattened and then delimited with a .
to create column names. Parameters which are lists (or Python tuples, etc.) have a column for each entry with names that include an index (counting from 0). E.g. consider the following dictionary of parameters:
param_dict: Dict[str, Any] = {\n \"a\": { # First parameter a\n \"b\": (1, 2),\n \"c\": 1,\n # ...\n },\n \"a2\": 4, # Second parameter a2\n # ...\n}\n
The dictionary a
will produce columns: a.b[0]
, a.b[1]
, a.c
, and so on.
id
ID of the entry in this table. CURRENT_TIMESTAMP
Full timestamp for the entry. gen_cfg_id
ID of the entry in the general config table that applies to this Task
entry. That table has, e.g., experiment and run number. exec_cfg_id
The ID of the entry in the Executor
table which applies to this Task
entry. P1
- Pn
The specific parameters of the Task
. The P{1..n}
are replaced by the actual parameter names. result.task_status
Reported exit status of the Task
. Note that the output may still be labeled invalid by the valid_flag
(see below). result.summary
Short text summary of the Task
result. This is provided by the Task
, or sometimes the Executor
. result.payload
Full description of result from the Task
. If the object is incompatible with the database, will instead be a pointer to where it can be found. result.impl_schemas
A string of semi-colon separated schema(s) implemented by the Task
. Schemas describe conceptually the type output the Task
produces. valid_flag
A boolean flag for whether the result is valid. May be 0
(False) if e.g., data is missing, or corrupt, or reported status is failed. NOTE: The result.payload
may be distinct from the output files. Payloads can be specified in terms of output parameters, specific output files, or are an optional summary of the results provided by the Task
. E.g. this may include graphical descriptions of results (plots, figures, etc.). In many cases, however, the output files will most likely be pointed to by a parameter in one of the columns P{1...n}
- if properly specified in the TaskParameters
model the value of this output parameter will be replicated in the result.payload
column as well..
This API is intended to be used at the Executor
level, with some calls intended to provide default values for Pydantic models. Utilities for reading and inspecting the database outside of normal Task
execution are addressed in the following subheader.
record_analysis_db(cfg: DescribedAnalysis) -> None
: Writes the configuration to the backend database.read_latest_db_entry(db_dir: str, task_name: str, param: str) -> Any
: Retrieve the most recent entry from a database for a specific Task.invalidate_entry
: Marks a database entry as invalid. Common reason to use this is if data has been deleted, or found to be corrupted.dbview
: TUI for database inspection. Read only.LUTE Managed Tasks.
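As a sketch of how the read API can be used from Executor-level code to seed a default value (the import path, Task name, and parameter name below are illustrative assumptions, not a documented interface):
# NOTE: the module path is an assumption for illustration - check lute/io for the actual location.\nfrom lute.io.db import read_latest_db_entry\n\n# Look up the most recent value of a (hypothetical) 'out_file' parameter recorded for FindPeaksPyAlgos.\nlast_peaks_file = read_latest_db_entry('/path/to/work_dir', 'FindPeaksPyAlgos', 'out_file')\n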
Executor-managed Tasks with specific environment specifications are defined here.
"},{"location":"source/managed_tasks/#managed_tasks.BinaryErrTester","title":"BinaryErrTester = Executor('TestBinaryErr')
module-attribute
","text":"Runs a test of a third-party task that fails.
"},{"location":"source/managed_tasks/#managed_tasks.BinaryTester","title":"BinaryTester: Executor = Executor('TestBinary')
module-attribute
","text":"Runs a basic test of a multi-threaded third-party Task.
"},{"location":"source/managed_tasks/#managed_tasks.CrystFELIndexer","title":"CrystFELIndexer: Executor = Executor('IndexCrystFEL')
module-attribute
","text":"Runs crystallographic indexing using CrystFEL.
"},{"location":"source/managed_tasks/#managed_tasks.DimpleSolver","title":"DimpleSolver: Executor = Executor('DimpleSolve')
module-attribute
","text":"Solves a crystallographic structure using molecular replacement.
"},{"location":"source/managed_tasks/#managed_tasks.HKLComparer","title":"HKLComparer: Executor = Executor('CompareHKL')
module-attribute
","text":"Runs analysis on merge results for statistics/figures of merit..
"},{"location":"source/managed_tasks/#managed_tasks.HKLManipulator","title":"HKLManipulator: Executor = Executor('ManipulateHKL')
module-attribute
","text":"Performs format conversions (among other things) of merge results.
"},{"location":"source/managed_tasks/#managed_tasks.MultiNodeCommunicationTester","title":"MultiNodeCommunicationTester: MPIExecutor = MPIExecutor('TestMultiNodeCommunication')
module-attribute
","text":"Runs a test to confirm communication works between multiple nodes.
"},{"location":"source/managed_tasks/#managed_tasks.PartialatorMerger","title":"PartialatorMerger: Executor = Executor('MergePartialator')
module-attribute
","text":"Runs crystallographic merging using CrystFEL's partialator.
"},{"location":"source/managed_tasks/#managed_tasks.PeakFinderPsocake","title":"PeakFinderPsocake: Executor = Executor('FindPeaksPsocake')
module-attribute
","text":"Performs Bragg peak finding using psocake - DEPRECATED.
"},{"location":"source/managed_tasks/#managed_tasks.PeakFinderPyAlgos","title":"PeakFinderPyAlgos: MPIExecutor = MPIExecutor('FindPeaksPyAlgos')
module-attribute
","text":"Performs Bragg peak finding using the PyAlgos algorithm.
"},{"location":"source/managed_tasks/#managed_tasks.ReadTester","title":"ReadTester: Executor = Executor('TestReadOutput')
module-attribute
","text":"Runs a test to confirm database reading.
"},{"location":"source/managed_tasks/#managed_tasks.SHELXCRunner","title":"SHELXCRunner: Executor = Executor('RunSHELXC')
module-attribute
","text":"Runs CCP4 SHELXC - needed for crystallographic phasing.
"},{"location":"source/managed_tasks/#managed_tasks.SmallDataProducer","title":"SmallDataProducer: Executor = Executor('SubmitSMD')
module-attribute
","text":"Runs the production of a smalldata HDF5 file.
"},{"location":"source/managed_tasks/#managed_tasks.SocketTester","title":"SocketTester: Executor = Executor('TestSocket')
module-attribute
","text":"Runs a test of socket-based communication.
"},{"location":"source/managed_tasks/#managed_tasks.StreamFileConcatenator","title":"StreamFileConcatenator: Executor = Executor('ConcatenateStreamFiles')
module-attribute
","text":"Concatenates results from crystallographic indexing of multiple runs.
"},{"location":"source/managed_tasks/#managed_tasks.Tester","title":"Tester: Executor = Executor('Test')
module-attribute
","text":"Runs a basic test of a first-party Task.
"},{"location":"source/managed_tasks/#managed_tasks.WriteTester","title":"WriteTester: Executor = Executor('TestWriteOutput')
module-attribute
","text":"Runs a test to confirm database writing.
"},{"location":"source/execution/debug_utils/","title":"debug_utils","text":"Functions to assist in debugging execution of LUTE.
Functions:
Name DescriptionLUTE_DEBUG_EXIT
str, str_dump: Optional[str]): Exits the program if the provided env_var
is set. Optionally, also prints a message if provided.
Raises:
Type DescriptionValidationError
Error raised by pydantic during data validation. (From Pydantic)
"},{"location":"source/execution/executor/","title":"executor","text":"Base classes and functions for handling Task
execution.
Executors run a Task
as a subprocess and handle all communication with other services, e.g., the eLog. They accept specific handlers to override default stream parsing.
Event handlers/hooks are implemented as standalone functions which can be added to an Executor.
Classes:
Name DescriptionAnalysisConfig
Data class for holding a managed Task's configuration.
BaseExecutor
Abstract base class from which all Executors are derived.
Executor
Default Executor implementing all basic functionality and IPC.
BinaryExecutor
Can execute any arbitrary binary/command as a managed task within the framework provided by LUTE.
"},{"location":"source/execution/executor/#execution.executor--exceptions","title":"Exceptions","text":""},{"location":"source/execution/executor/#execution.executor.BaseExecutor","title":"BaseExecutor
","text":" Bases: ABC
ABC to manage Task execution and communication with user services.
When running in a workflow, \"tasks\" (not the class instances) are submitted as Executors
. The Executor manages environment setup, the actual Task submission, and communication regarding Task results and status with third party services like the eLog.
Attributes:
Methods:
Name Descriptionadd_hook
str, hook: Callable[[None], None]) -> None: Create a new hook to be called each time a specific event occurs.
add_default_hooks
Populate the event hooks with the default functions.
update_environment
Dict[str, str], update_path: str): Update the environment that is passed to the Task subprocess.
execute_task
Run the task as a subprocess.
Source code inlute/execution/executor.py
class BaseExecutor(ABC):\n \"\"\"ABC to manage Task execution and communication with user services.\n\n When running in a workflow, \"tasks\" (not the class instances) are submitted\n as `Executors`. The Executor manages environment setup, the actual Task\n submission, and communication regarding Task results and status with third\n party services like the eLog.\n\n Attributes:\n\n Methods:\n add_hook(event: str, hook: Callable[[None], None]) -> None: Create a\n new hook to be called each time a specific event occurs.\n\n add_default_hooks() -> None: Populate the event hooks with the default\n functions.\n\n update_environment(env: Dict[str, str], update_path: str): Update the\n environment that is passed to the Task subprocess.\n\n execute_task(): Run the task as a subprocess.\n \"\"\"\n\n class Hooks:\n \"\"\"A container class for the Executor's event hooks.\n\n There is a corresponding function (hook) for each event/signal. Each\n function takes two parameters - a reference to the Executor (self) and\n a reference to the Message (msg) which includes the corresponding\n signal.\n \"\"\"\n\n def no_pickle_mode(self: Self, msg: Message): ...\n\n def task_started(self: Self, msg: Message): ...\n\n def task_failed(self: Self, msg: Message): ...\n\n def task_stopped(self: Self, msg: Message): ...\n\n def task_done(self: Self, msg: Message): ...\n\n def task_cancelled(self: Self, msg: Message): ...\n\n def task_result(self: Self, msg: Message): ...\n\n def __init__(\n self,\n task_name: str,\n communicators: List[Communicator],\n poll_interval: float = 0.05,\n ) -> None:\n \"\"\"The Executor will manage the subprocess in which `task_name` is run.\n\n Args:\n task_name (str): The name of the Task to be submitted. Must match\n the Task's class name exactly. The parameter specification must\n also be in a properly named model to be identified.\n\n communicators (List[Communicator]): A list of one or more\n communicators which manage information flow to/from the Task.\n Subclasses may have different defaults, and new functionality\n can be introduced by composing Executors with communicators.\n\n poll_interval (float): Time to wait between reading/writing to the\n managed subprocess. In seconds.\n \"\"\"\n result: TaskResult = TaskResult(\n task_name=task_name, task_status=TaskStatus.PENDING, summary=\"\", payload=\"\"\n )\n task_parameters: Optional[TaskParameters] = None\n task_env: Dict[str, str] = os.environ.copy()\n self._communicators: List[Communicator] = communicators\n communicator_desc: List[str] = []\n for comm in self._communicators:\n comm.stage_communicator()\n communicator_desc.append(str(comm))\n\n self._analysis_desc: DescribedAnalysis = DescribedAnalysis(\n task_result=result,\n task_parameters=task_parameters,\n task_env=task_env,\n poll_interval=poll_interval,\n communicator_desc=communicator_desc,\n )\n\n def add_hook(self, event: str, hook: Callable[[Self, Message], None]) -> None:\n \"\"\"Add a new hook.\n\n Each hook is a function called any time the Executor receives a signal\n for a particular event, e.g. Task starts, Task ends, etc. Calling this\n method will remove any hook that currently exists for the event. I.e.\n only one hook can be called per event at a time. 
Creating hooks for\n events which do not exist is not allowed.\n\n Args:\n event (str): The event for which the hook will be called.\n\n hook (Callable[[None], None]) The function to be called during each\n occurrence of the event.\n \"\"\"\n if event.upper() in LUTE_SIGNALS:\n setattr(self.Hooks, event.lower(), hook)\n\n @abstractmethod\n def add_default_hooks(self) -> None:\n \"\"\"Populate the set of default event hooks.\"\"\"\n\n ...\n\n def update_environment(\n self, env: Dict[str, str], update_path: str = \"prepend\"\n ) -> None:\n \"\"\"Update the stored set of environment variables.\n\n These are passed to the subprocess to setup its environment.\n\n Args:\n env (Dict[str, str]): A dictionary of \"VAR\":\"VALUE\" pairs of\n environment variables to be added to the subprocess environment.\n If any variables already exist, the new variables will\n overwrite them (except PATH, see below).\n\n update_path (str): If PATH is present in the new set of variables,\n this argument determines how the old PATH is dealt with. There\n are three options:\n * \"prepend\" : The new PATH values are prepended to the old ones.\n * \"append\" : The new PATH values are appended to the old ones.\n * \"overwrite\" : The old PATH is overwritten by the new one.\n \"prepend\" is the default option. If PATH is not present in the\n current environment, the new PATH is used without modification.\n \"\"\"\n if \"PATH\" in env:\n sep: str = os.pathsep\n if update_path == \"prepend\":\n env[\"PATH\"] = (\n f\"{env['PATH']}{sep}{self._analysis_desc.task_env['PATH']}\"\n )\n elif update_path == \"append\":\n env[\"PATH\"] = (\n f\"{self._analysis_desc.task_env['PATH']}{sep}{env['PATH']}\"\n )\n elif update_path == \"overwrite\":\n pass\n else:\n raise ValueError(\n (\n f\"{update_path} is not a valid option for `update_path`!\"\n \" Options are: prepend, append, overwrite.\"\n )\n )\n os.environ.update(env)\n self._analysis_desc.task_env.update(env)\n\n def shell_source(self, env: str) -> None:\n \"\"\"Source a script.\n\n Unlike `update_environment` this method sources a new file.\n\n Args:\n env (str): Path to the script to source.\n \"\"\"\n import sys\n\n if not os.path.exists(env):\n logger.info(f\"Cannot source environment from {env}!\")\n return\n\n script: str = (\n f\"set -a\\n\"\n f'source \"{env}\" >/dev/null\\n'\n f'{sys.executable} -c \"import os; print(dict(os.environ))\"\\n'\n )\n logger.info(f\"Sourcing file {env}\")\n o, e = subprocess.Popen(\n [\"bash\", \"-c\", script], stdout=subprocess.PIPE\n ).communicate()\n new_environment: Dict[str, str] = eval(o)\n self._analysis_desc.task_env = new_environment\n\n def _pre_task(self) -> None:\n \"\"\"Any actions to be performed before task submission.\n\n This method may or may not be used by subclasses. 
It may be useful\n for logging etc.\n \"\"\"\n # This prevents the Executors in managed_tasks.py from all acquiring\n # resources like sockets.\n for communicator in self._communicators:\n communicator.delayed_setup()\n # Not great, but experience shows we need a bit of time to setup\n # network.\n time.sleep(0.1)\n # Propagate any env vars setup by Communicators - only update LUTE_ vars\n tmp: Dict[str, str] = {key: os.environ[key] for key in os.environ if \"LUTE_\" in key}\n self._analysis_desc.task_env.update(tmp)\n\n def _submit_task(self, cmd: str) -> subprocess.Popen:\n proc: subprocess.Popen = subprocess.Popen(\n cmd.split(),\n stdout=subprocess.PIPE,\n stderr=subprocess.PIPE,\n env=self._analysis_desc.task_env,\n )\n os.set_blocking(proc.stdout.fileno(), False)\n os.set_blocking(proc.stderr.fileno(), False)\n return proc\n\n @abstractmethod\n def _task_loop(self, proc: subprocess.Popen) -> None:\n \"\"\"Actions to perform while the Task is running.\n\n This function is run in the body of a loop until the Task signals\n that its finished.\n \"\"\"\n ...\n\n @abstractmethod\n def _finalize_task(self, proc: subprocess.Popen) -> None:\n \"\"\"Any actions to be performed after the Task has ended.\n\n Examples include a final clearing of the pipes, retrieving results,\n reporting to third party services, etc.\n \"\"\"\n ...\n\n def _submit_cmd(self, executable_path: str, params: str) -> str:\n \"\"\"Return a formatted command for launching Task subprocess.\n\n May be overridden by subclasses.\n\n Args:\n executable_path (str): Path to the LUTE subprocess script.\n\n params (str): String of formatted command-line arguments.\n\n Returns:\n cmd (str): Appropriately formatted command for this Executor.\n \"\"\"\n cmd: str = \"\"\n if __debug__:\n cmd = f\"python -B {executable_path} {params}\"\n else:\n cmd = f\"python -OB {executable_path} {params}\"\n\n return cmd\n\n def execute_task(self) -> None:\n \"\"\"Run the requested Task as a subprocess.\"\"\"\n self._pre_task()\n lute_path: Optional[str] = os.getenv(\"LUTE_PATH\")\n if lute_path is None:\n logger.debug(\"Absolute path to subprocess_task.py not found.\")\n lute_path = os.path.abspath(f\"{os.path.dirname(__file__)}/../..\")\n self.update_environment({\"LUTE_PATH\": lute_path})\n executable_path: str = f\"{lute_path}/subprocess_task.py\"\n config_path: str = self._analysis_desc.task_env[\"LUTE_CONFIGPATH\"]\n params: str = f\"-c {config_path} -t {self._analysis_desc.task_result.task_name}\"\n\n cmd: str = self._submit_cmd(executable_path, params)\n proc: subprocess.Popen = self._submit_task(cmd)\n\n while self._task_is_running(proc):\n self._task_loop(proc)\n time.sleep(self._analysis_desc.poll_interval)\n\n os.set_blocking(proc.stdout.fileno(), True)\n os.set_blocking(proc.stderr.fileno(), True)\n\n self._finalize_task(proc)\n proc.stdout.close()\n proc.stderr.close()\n proc.wait()\n if ret := proc.returncode:\n logger.info(f\"Task failed with return code: {ret}\")\n self._analysis_desc.task_result.task_status = TaskStatus.FAILED\n self.Hooks.task_failed(self, msg=Message())\n elif self._analysis_desc.task_result.task_status == TaskStatus.RUNNING:\n # Ret code is 0, no exception was thrown, task forgot to set status\n self._analysis_desc.task_result.task_status = TaskStatus.COMPLETED\n logger.debug(f\"Task did not change from RUNNING status. 
Assume COMPLETED.\")\n self.Hooks.task_done(self, msg=Message())\n self._store_configuration()\n for comm in self._communicators:\n comm.clear_communicator()\n\n if self._analysis_desc.task_result.task_status == TaskStatus.FAILED:\n logger.info(\"Exiting after Task failure. Result recorded.\")\n sys.exit(-1)\n\n self.process_results()\n\n def _store_configuration(self) -> None:\n \"\"\"Store configuration and results in the LUTE database.\"\"\"\n record_analysis_db(copy.deepcopy(self._analysis_desc))\n\n def _task_is_running(self, proc: subprocess.Popen) -> bool:\n \"\"\"Whether a subprocess is running.\n\n Args:\n proc (subprocess.Popen): The subprocess to determine the run status\n of.\n\n Returns:\n bool: Is the subprocess task running.\n \"\"\"\n # Add additional conditions - don't want to exit main loop\n # if only stopped\n task_status: TaskStatus = self._analysis_desc.task_result.task_status\n is_running: bool = task_status != TaskStatus.COMPLETED\n is_running &= task_status != TaskStatus.CANCELLED\n is_running &= task_status != TaskStatus.TIMEDOUT\n return proc.poll() is None and is_running\n\n def _stop(self, proc: subprocess.Popen) -> None:\n \"\"\"Stop the Task subprocess.\"\"\"\n os.kill(proc.pid, signal.SIGTSTP)\n self._analysis_desc.task_result.task_status = TaskStatus.STOPPED\n\n def _continue(self, proc: subprocess.Popen) -> None:\n \"\"\"Resume a stopped Task subprocess.\"\"\"\n os.kill(proc.pid, signal.SIGCONT)\n self._analysis_desc.task_result.task_status = TaskStatus.RUNNING\n\n def _set_result_from_parameters(self) -> None:\n \"\"\"Use TaskParameters object to set TaskResult fields.\n\n A result may be defined in terms of specific parameters. This is most\n useful for ThirdPartyTasks which would not otherwise have an easy way of\n reporting what the TaskResult is. There are two options for specifying\n results from parameters:\n 1. A single parameter (Field) of the model has an attribute\n `is_result`. This is a bool indicating that this parameter points\n to a result. E.g. a parameter `output` may set `is_result=True`.\n 2. The `TaskParameters.Config` has a `result_from_params` attribute.\n This is an appropriate option if the result is determinable for\n the Task, but it is not easily defined by a single parameter. The\n TaskParameters.Config.result_from_param can be set by a custom\n validator, e.g. to combine the values of multiple parameters into\n a single result. E.g. an `out_dir` and `out_file` parameter used\n together specify the result. Currently only string specifiers are\n supported.\n\n A TaskParameters object specifies that it contains information about the\n result by setting a single config option:\n TaskParameters.Config.set_result=True\n In general, this method should only be called when the above condition is\n met, however, there are minimal checks in it as well.\n \"\"\"\n # This method shouldn't be called unless appropriate\n # But we will add extra guards here\n if self._analysis_desc.task_parameters is None:\n logger.debug(\n \"Cannot set result from TaskParameters. TaskParameters is None!\"\n )\n return\n if (\n not hasattr(self._analysis_desc.task_parameters.Config, \"set_result\")\n or not self._analysis_desc.task_parameters.Config.set_result\n ):\n logger.debug(\n \"Cannot set result from TaskParameters. 
`set_result` not specified!\"\n )\n return\n\n # First try to set from result_from_params (faster)\n if self._analysis_desc.task_parameters.Config.result_from_params is not None:\n result_from_params: str = (\n self._analysis_desc.task_parameters.Config.result_from_params\n )\n logger.info(f\"TaskResult specified as {result_from_params}.\")\n self._analysis_desc.task_result.payload = result_from_params\n else:\n # Iterate parameters to find the one that is the result\n schema: Dict[str, Any] = self._analysis_desc.task_parameters.schema()\n for param, value in self._analysis_desc.task_parameters.dict().items():\n param_attrs: Dict[str, Any] = schema[\"properties\"][param]\n if \"is_result\" in param_attrs:\n is_result: bool = param_attrs[\"is_result\"]\n if isinstance(is_result, bool) and is_result:\n logger.info(f\"TaskResult specified as {value}.\")\n self._analysis_desc.task_result.payload = value\n else:\n logger.debug(\n (\n f\"{param} specified as result! But specifier is of \"\n f\"wrong type: {type(is_result)}!\"\n )\n )\n break # We should only have 1 result-like parameter!\n\n # If we get this far and haven't changed the payload we should complain\n if self._analysis_desc.task_result.payload == \"\":\n task_name: str = self._analysis_desc.task_result.task_name\n logger.debug(\n (\n f\"{task_name} specified result be set from {task_name}Parameters,\"\n \" but no result provided! Check model definition!\"\n )\n )\n # Now check for impl_schemas and pass to result.impl_schemas\n # Currently unused\n impl_schemas: Optional[str] = (\n self._analysis_desc.task_parameters.Config.impl_schemas\n )\n self._analysis_desc.task_result.impl_schemas = impl_schemas\n # If we set_result but didn't get schema information we should complain\n if self._analysis_desc.task_result.impl_schemas is None:\n task_name: str = self._analysis_desc.task_result.task_name\n logger.debug(\n (\n f\"{task_name} specified result be set from {task_name}Parameters,\"\n \" but no schema provided! Check model definition!\"\n )\n )\n\n def process_results(self) -> None:\n \"\"\"Perform any necessary steps to process TaskResults object.\n\n Processing will depend on subclass. Examples of steps include, moving\n files, converting file formats, compiling plots/figures into an HTML\n file, etc.\n \"\"\"\n self._process_results()\n\n @abstractmethod\n def _process_results(self) -> None: ...\n
"},{"location":"source/execution/executor/#execution.executor.BaseExecutor.Hooks","title":"Hooks
","text":"A container class for the Executor's event hooks.
There is a corresponding function (hook) for each event/signal. Each function takes two parameters - a reference to the Executor (self) and a reference to the Message (msg) which includes the corresponding signal.
Source code inlute/execution/executor.py
class Hooks:\n \"\"\"A container class for the Executor's event hooks.\n\n There is a corresponding function (hook) for each event/signal. Each\n function takes two parameters - a reference to the Executor (self) and\n a reference to the Message (msg) which includes the corresponding\n signal.\n \"\"\"\n\n def no_pickle_mode(self: Self, msg: Message): ...\n\n def task_started(self: Self, msg: Message): ...\n\n def task_failed(self: Self, msg: Message): ...\n\n def task_stopped(self: Self, msg: Message): ...\n\n def task_done(self: Self, msg: Message): ...\n\n def task_cancelled(self: Self, msg: Message): ...\n\n def task_result(self: Self, msg: Message): ...\n
"},{"location":"source/execution/executor/#execution.executor.BaseExecutor.__init__","title":"__init__(task_name, communicators, poll_interval=0.05)
","text":"The Executor will manage the subprocess in which task_name
is run.
Parameters:
Name Type Description Defaulttask_name
str
The name of the Task to be submitted. Must match the Task's class name exactly. The parameter specification must also be in a properly named model to be identified.
requiredcommunicators
List[Communicator]
A list of one or more communicators which manage information flow to/from the Task. Subclasses may have different defaults, and new functionality can be introduced by composing Executors with communicators.
requiredpoll_interval
float
Time to wait between reading/writing to the managed subprocess. In seconds.
0.05
Source code in lute/execution/executor.py
def __init__(\n self,\n task_name: str,\n communicators: List[Communicator],\n poll_interval: float = 0.05,\n) -> None:\n \"\"\"The Executor will manage the subprocess in which `task_name` is run.\n\n Args:\n task_name (str): The name of the Task to be submitted. Must match\n the Task's class name exactly. The parameter specification must\n also be in a properly named model to be identified.\n\n communicators (List[Communicator]): A list of one or more\n communicators which manage information flow to/from the Task.\n Subclasses may have different defaults, and new functionality\n can be introduced by composing Executors with communicators.\n\n poll_interval (float): Time to wait between reading/writing to the\n managed subprocess. In seconds.\n \"\"\"\n result: TaskResult = TaskResult(\n task_name=task_name, task_status=TaskStatus.PENDING, summary=\"\", payload=\"\"\n )\n task_parameters: Optional[TaskParameters] = None\n task_env: Dict[str, str] = os.environ.copy()\n self._communicators: List[Communicator] = communicators\n communicator_desc: List[str] = []\n for comm in self._communicators:\n comm.stage_communicator()\n communicator_desc.append(str(comm))\n\n self._analysis_desc: DescribedAnalysis = DescribedAnalysis(\n task_result=result,\n task_parameters=task_parameters,\n task_env=task_env,\n poll_interval=poll_interval,\n communicator_desc=communicator_desc,\n )\n
"},{"location":"source/execution/executor/#execution.executor.BaseExecutor.add_default_hooks","title":"add_default_hooks()
abstractmethod
","text":"Populate the set of default event hooks.
Source code inlute/execution/executor.py
@abstractmethod\ndef add_default_hooks(self) -> None:\n \"\"\"Populate the set of default event hooks.\"\"\"\n\n ...\n
"},{"location":"source/execution/executor/#execution.executor.BaseExecutor.add_hook","title":"add_hook(event, hook)
","text":"Add a new hook.
Each hook is a function called any time the Executor receives a signal for a particular event, e.g. Task starts, Task ends, etc. Calling this method will remove any hook that currently exists for the event. I.e. only one hook can be called per event at a time. Creating hooks for events which do not exist is not allowed.
Parameters:
Name Type Description Defaultevent
str
The event for which the hook will be called.
required Source code inlute/execution/executor.py
def add_hook(self, event: str, hook: Callable[[Self, Message], None]) -> None:\n \"\"\"Add a new hook.\n\n Each hook is a function called any time the Executor receives a signal\n for a particular event, e.g. Task starts, Task ends, etc. Calling this\n method will remove any hook that currently exists for the event. I.e.\n only one hook can be called per event at a time. Creating hooks for\n events which do not exist is not allowed.\n\n Args:\n event (str): The event for which the hook will be called.\n\n hook (Callable[[None], None]) The function to be called during each\n occurrence of the event.\n \"\"\"\n if event.upper() in LUTE_SIGNALS:\n setattr(self.Hooks, event.lower(), hook)\n
"},{"location":"source/execution/executor/#execution.executor.BaseExecutor.execute_task","title":"execute_task()
","text":"Run the requested Task as a subprocess.
Source code inlute/execution/executor.py
def execute_task(self) -> None:\n \"\"\"Run the requested Task as a subprocess.\"\"\"\n self._pre_task()\n lute_path: Optional[str] = os.getenv(\"LUTE_PATH\")\n if lute_path is None:\n logger.debug(\"Absolute path to subprocess_task.py not found.\")\n lute_path = os.path.abspath(f\"{os.path.dirname(__file__)}/../..\")\n self.update_environment({\"LUTE_PATH\": lute_path})\n executable_path: str = f\"{lute_path}/subprocess_task.py\"\n config_path: str = self._analysis_desc.task_env[\"LUTE_CONFIGPATH\"]\n params: str = f\"-c {config_path} -t {self._analysis_desc.task_result.task_name}\"\n\n cmd: str = self._submit_cmd(executable_path, params)\n proc: subprocess.Popen = self._submit_task(cmd)\n\n while self._task_is_running(proc):\n self._task_loop(proc)\n time.sleep(self._analysis_desc.poll_interval)\n\n os.set_blocking(proc.stdout.fileno(), True)\n os.set_blocking(proc.stderr.fileno(), True)\n\n self._finalize_task(proc)\n proc.stdout.close()\n proc.stderr.close()\n proc.wait()\n if ret := proc.returncode:\n logger.info(f\"Task failed with return code: {ret}\")\n self._analysis_desc.task_result.task_status = TaskStatus.FAILED\n self.Hooks.task_failed(self, msg=Message())\n elif self._analysis_desc.task_result.task_status == TaskStatus.RUNNING:\n # Ret code is 0, no exception was thrown, task forgot to set status\n self._analysis_desc.task_result.task_status = TaskStatus.COMPLETED\n logger.debug(f\"Task did not change from RUNNING status. Assume COMPLETED.\")\n self.Hooks.task_done(self, msg=Message())\n self._store_configuration()\n for comm in self._communicators:\n comm.clear_communicator()\n\n if self._analysis_desc.task_result.task_status == TaskStatus.FAILED:\n logger.info(\"Exiting after Task failure. Result recorded.\")\n sys.exit(-1)\n\n self.process_results()\n
"},{"location":"source/execution/executor/#execution.executor.BaseExecutor.process_results","title":"process_results()
","text":"Perform any necessary steps to process TaskResults object.
Processing will depend on subclass. Examples of steps include, moving files, converting file formats, compiling plots/figures into an HTML file, etc.
Source code inlute/execution/executor.py
def process_results(self) -> None:\n \"\"\"Perform any necessary steps to process TaskResults object.\n\n Processing will depend on subclass. Examples of steps include, moving\n files, converting file formats, compiling plots/figures into an HTML\n file, etc.\n \"\"\"\n self._process_results()\n
"},{"location":"source/execution/executor/#execution.executor.BaseExecutor.shell_source","title":"shell_source(env)
","text":"Source a script.
Unlike update_environment
this method sources a new file.
Parameters:
Name Type Description Defaultenv
str
Path to the script to source.
required Source code inlute/execution/executor.py
def shell_source(self, env: str) -> None:\n \"\"\"Source a script.\n\n Unlike `update_environment` this method sources a new file.\n\n Args:\n env (str): Path to the script to source.\n \"\"\"\n import sys\n\n if not os.path.exists(env):\n logger.info(f\"Cannot source environment from {env}!\")\n return\n\n script: str = (\n f\"set -a\\n\"\n f'source \"{env}\" >/dev/null\\n'\n f'{sys.executable} -c \"import os; print(dict(os.environ))\"\\n'\n )\n logger.info(f\"Sourcing file {env}\")\n o, e = subprocess.Popen(\n [\"bash\", \"-c\", script], stdout=subprocess.PIPE\n ).communicate()\n new_environment: Dict[str, str] = eval(o)\n self._analysis_desc.task_env = new_environment\n
"},{"location":"source/execution/executor/#execution.executor.BaseExecutor.update_environment","title":"update_environment(env, update_path='prepend')
","text":"Update the stored set of environment variables.
These are passed to the subprocess to setup its environment.
Parameters:
Name Type Description Defaultenv
Dict[str, str]
A dictionary of \"VAR\":\"VALUE\" pairs of environment variables to be added to the subprocess environment. If any variables already exist, the new variables will overwrite them (except PATH, see below).
requiredupdate_path
str
If PATH is present in the new set of variables, this argument determines how the old PATH is dealt with. There are three options: * \"prepend\" : The new PATH values are prepended to the old ones. * \"append\" : The new PATH values are appended to the old ones. * \"overwrite\" : The old PATH is overwritten by the new one. \"prepend\" is the default option. If PATH is not present in the current environment, the new PATH is used without modification.
'prepend'
Source code in lute/execution/executor.py
def update_environment(\n self, env: Dict[str, str], update_path: str = \"prepend\"\n) -> None:\n \"\"\"Update the stored set of environment variables.\n\n These are passed to the subprocess to setup its environment.\n\n Args:\n env (Dict[str, str]): A dictionary of \"VAR\":\"VALUE\" pairs of\n environment variables to be added to the subprocess environment.\n If any variables already exist, the new variables will\n overwrite them (except PATH, see below).\n\n update_path (str): If PATH is present in the new set of variables,\n this argument determines how the old PATH is dealt with. There\n are three options:\n * \"prepend\" : The new PATH values are prepended to the old ones.\n * \"append\" : The new PATH values are appended to the old ones.\n * \"overwrite\" : The old PATH is overwritten by the new one.\n \"prepend\" is the default option. If PATH is not present in the\n current environment, the new PATH is used without modification.\n \"\"\"\n if \"PATH\" in env:\n sep: str = os.pathsep\n if update_path == \"prepend\":\n env[\"PATH\"] = (\n f\"{env['PATH']}{sep}{self._analysis_desc.task_env['PATH']}\"\n )\n elif update_path == \"append\":\n env[\"PATH\"] = (\n f\"{self._analysis_desc.task_env['PATH']}{sep}{env['PATH']}\"\n )\n elif update_path == \"overwrite\":\n pass\n else:\n raise ValueError(\n (\n f\"{update_path} is not a valid option for `update_path`!\"\n \" Options are: prepend, append, overwrite.\"\n )\n )\n os.environ.update(env)\n self._analysis_desc.task_env.update(env)\n
"},{"location":"source/execution/executor/#execution.executor.Communicator","title":"Communicator
","text":" Bases: ABC
lute/execution/ipc.py
class Communicator(ABC):\n def __init__(self, party: Party = Party.TASK, use_pickle: bool = True) -> None:\n \"\"\"Abstract Base Class for IPC Communicator objects.\n\n Args:\n party (Party): Which object (side/process) the Communicator is\n managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.\n use_pickle (bool): Whether to serialize data using pickle prior to\n sending it.\n \"\"\"\n self._party = party\n self._use_pickle = use_pickle\n self.desc = \"Communicator abstract base class.\"\n\n @abstractmethod\n def read(self, proc: subprocess.Popen) -> Message:\n \"\"\"Method for reading data through the communication mechanism.\"\"\"\n ...\n\n @abstractmethod\n def write(self, msg: Message) -> None:\n \"\"\"Method for sending data through the communication mechanism.\"\"\"\n ...\n\n def __str__(self):\n name: str = str(type(self)).split(\"'\")[1].split(\".\")[-1]\n return f\"{name}: {self.desc}\"\n\n def __repr__(self):\n return self.__str__()\n\n def __enter__(self) -> Self:\n return self\n\n def __exit__(self) -> None: ...\n\n @property\n def has_messages(self) -> bool:\n \"\"\"Whether the Communicator has remaining messages.\n\n The precise method for determining whether there are remaining messages\n will depend on the specific Communicator sub-class.\n \"\"\"\n return False\n\n def stage_communicator(self):\n \"\"\"Alternative method for staging outside of context manager.\"\"\"\n self.__enter__()\n\n def clear_communicator(self):\n \"\"\"Alternative exit method outside of context manager.\"\"\"\n self.__exit__()\n\n def delayed_setup(self):\n \"\"\"Any setup that should be done later than init.\"\"\"\n ...\n
"},{"location":"source/execution/executor/#execution.executor.Communicator.has_messages","title":"has_messages: bool
property
","text":"Whether the Communicator has remaining messages.
The precise method for determining whether there are remaining messages will depend on the specific Communicator sub-class.
"},{"location":"source/execution/executor/#execution.executor.Communicator.__init__","title":"__init__(party=Party.TASK, use_pickle=True)
","text":"Abstract Base Class for IPC Communicator objects.
Parameters:
Name Type Description Defaultparty
Party
Which object (side/process) the Communicator is managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.
TASK
use_pickle
bool
Whether to serialize data using pickle prior to sending it.
True
Source code in lute/execution/ipc.py
def __init__(self, party: Party = Party.TASK, use_pickle: bool = True) -> None:\n \"\"\"Abstract Base Class for IPC Communicator objects.\n\n Args:\n party (Party): Which object (side/process) the Communicator is\n managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.\n use_pickle (bool): Whether to serialize data using pickle prior to\n sending it.\n \"\"\"\n self._party = party\n self._use_pickle = use_pickle\n self.desc = \"Communicator abstract base class.\"\n
"},{"location":"source/execution/executor/#execution.executor.Communicator.clear_communicator","title":"clear_communicator()
","text":"Alternative exit method outside of context manager.
Source code inlute/execution/ipc.py
def clear_communicator(self):\n \"\"\"Alternative exit method outside of context manager.\"\"\"\n self.__exit__()\n
"},{"location":"source/execution/executor/#execution.executor.Communicator.delayed_setup","title":"delayed_setup()
","text":"Any setup that should be done later than init.
Source code inlute/execution/ipc.py
def delayed_setup(self):\n \"\"\"Any setup that should be done later than init.\"\"\"\n ...\n
"},{"location":"source/execution/executor/#execution.executor.Communicator.read","title":"read(proc)
abstractmethod
","text":"Method for reading data through the communication mechanism.
Source code inlute/execution/ipc.py
@abstractmethod\ndef read(self, proc: subprocess.Popen) -> Message:\n \"\"\"Method for reading data through the communication mechanism.\"\"\"\n ...\n
"},{"location":"source/execution/executor/#execution.executor.Communicator.stage_communicator","title":"stage_communicator()
","text":"Alternative method for staging outside of context manager.
Source code inlute/execution/ipc.py
def stage_communicator(self):\n \"\"\"Alternative method for staging outside of context manager.\"\"\"\n self.__enter__()\n
"},{"location":"source/execution/executor/#execution.executor.Communicator.write","title":"write(msg)
abstractmethod
","text":"Method for sending data through the communication mechanism.
Source code inlute/execution/ipc.py
@abstractmethod\ndef write(self, msg: Message) -> None:\n \"\"\"Method for sending data through the communication mechanism.\"\"\"\n ...\n
"},{"location":"source/execution/executor/#execution.executor.Executor","title":"Executor
","text":" Bases: BaseExecutor
Basic implementation of an Executor which manages simple IPC with Task.
Attributes:
Methods:
Name Descriptionadd_hook
str, hook: Callable[[None], None]) -> None: Create a new hook to be called each time a specific event occurs.
add_default_hooks
Populate the event hooks with the default functions.
update_environment
Dict[str, str], update_path: str): Update the environment that is passed to the Task subprocess.
execute_task
Run the task as a subprocess.
Source code inlute/execution/executor.py
class Executor(BaseExecutor):\n \"\"\"Basic implementation of an Executor which manages simple IPC with Task.\n\n Attributes:\n\n Methods:\n add_hook(event: str, hook: Callable[[None], None]) -> None: Create a\n new hook to be called each time a specific event occurs.\n\n add_default_hooks() -> None: Populate the event hooks with the default\n functions.\n\n update_environment(env: Dict[str, str], update_path: str): Update the\n environment that is passed to the Task subprocess.\n\n execute_task(): Run the task as a subprocess.\n \"\"\"\n\n def __init__(\n self,\n task_name: str,\n communicators: List[Communicator] = [\n PipeCommunicator(Party.EXECUTOR),\n SocketCommunicator(Party.EXECUTOR),\n ],\n poll_interval: float = 0.05,\n ) -> None:\n super().__init__(\n task_name=task_name,\n communicators=communicators,\n poll_interval=poll_interval,\n )\n self.add_default_hooks()\n\n def add_default_hooks(self) -> None:\n \"\"\"Populate the set of default event hooks.\"\"\"\n\n def no_pickle_mode(self: Executor, msg: Message):\n for idx, communicator in enumerate(self._communicators):\n if isinstance(communicator, PipeCommunicator):\n self._communicators[idx] = PipeCommunicator(\n Party.EXECUTOR, use_pickle=False\n )\n\n self.add_hook(\"no_pickle_mode\", no_pickle_mode)\n\n def task_started(self: Executor, msg: Message):\n if isinstance(msg.contents, TaskParameters):\n self._analysis_desc.task_parameters = msg.contents\n # Maybe just run this no matter what? Rely on the other guards?\n # Perhaps just check if ThirdPartyParameters?\n # if isinstance(self._analysis_desc.task_parameters, ThirdPartyParameters):\n if hasattr(self._analysis_desc.task_parameters.Config, \"set_result\"):\n # Third party Tasks may mark a parameter as the result\n # If so, setup the result now.\n self._set_result_from_parameters()\n logger.info(\n f\"Executor: {self._analysis_desc.task_result.task_name} started\"\n )\n self._analysis_desc.task_result.task_status = TaskStatus.RUNNING\n elog_data: Dict[str, str] = {\n f\"{self._analysis_desc.task_result.task_name} status\": \"RUNNING\",\n }\n post_elog_run_status(elog_data)\n\n self.add_hook(\"task_started\", task_started)\n\n def task_failed(self: Executor, msg: Message):\n elog_data: Dict[str, str] = {\n f\"{self._analysis_desc.task_result.task_name} status\": \"FAILED\",\n }\n post_elog_run_status(elog_data)\n\n self.add_hook(\"task_failed\", task_failed)\n\n def task_stopped(self: Executor, msg: Message):\n elog_data: Dict[str, str] = {\n f\"{self._analysis_desc.task_result.task_name} status\": \"STOPPED\",\n }\n post_elog_run_status(elog_data)\n\n self.add_hook(\"task_stopped\", task_stopped)\n\n def task_done(self: Executor, msg: Message):\n elog_data: Dict[str, str] = {\n f\"{self._analysis_desc.task_result.task_name} status\": \"COMPLETED\",\n }\n post_elog_run_status(elog_data)\n\n self.add_hook(\"task_done\", task_done)\n\n def task_cancelled(self: Executor, msg: Message):\n elog_data: Dict[str, str] = {\n f\"{self._analysis_desc.task_result.task_name} status\": \"CANCELLED\",\n }\n post_elog_run_status(elog_data)\n\n self.add_hook(\"task_cancelled\", task_cancelled)\n\n def task_result(self: Executor, msg: Message):\n if isinstance(msg.contents, TaskResult):\n self._analysis_desc.task_result = msg.contents\n logger.info(self._analysis_desc.task_result.summary)\n logger.info(self._analysis_desc.task_result.task_status)\n elog_data: Dict[str, str] = {\n f\"{self._analysis_desc.task_result.task_name} status\": \"COMPLETED\",\n }\n post_elog_run_status(elog_data)\n\n 
self.add_hook(\"task_result\", task_result)\n\n def _task_loop(self, proc: subprocess.Popen) -> None:\n \"\"\"Actions to perform while the Task is running.\n\n This function is run in the body of a loop until the Task signals\n that its finished.\n \"\"\"\n for communicator in self._communicators:\n while True:\n msg: Message = communicator.read(proc)\n if msg.signal is not None and msg.signal.upper() in LUTE_SIGNALS:\n hook: Callable[[Executor, Message], None] = getattr(\n self.Hooks, msg.signal.lower()\n )\n hook(self, msg)\n if msg.contents is not None:\n if isinstance(msg.contents, str) and msg.contents != \"\":\n logger.info(msg.contents)\n elif not isinstance(msg.contents, str):\n logger.info(msg.contents)\n if not communicator.has_messages:\n break\n\n def _finalize_task(self, proc: subprocess.Popen) -> None:\n \"\"\"Any actions to be performed after the Task has ended.\n\n Examples include a final clearing of the pipes, retrieving results,\n reporting to third party services, etc.\n \"\"\"\n self._task_loop(proc) # Perform a final read.\n\n def _process_results(self) -> None:\n \"\"\"Performs result processing.\n\n Actions include:\n - For `ElogSummaryPlots`, will save the summary plot to the appropriate\n directory for display in the eLog.\n \"\"\"\n task_result: TaskResult = self._analysis_desc.task_result\n self._process_result_payload(task_result.payload)\n self._process_result_summary(task_result.summary)\n\n def _process_result_payload(self, payload: Any) -> None:\n if self._analysis_desc.task_parameters is None:\n logger.debug(\"Please run Task before using this method!\")\n return\n if isinstance(payload, ElogSummaryPlots):\n # ElogSummaryPlots has figures and a display name\n # display name also serves as a path.\n expmt: str = self._analysis_desc.task_parameters.lute_config.experiment\n base_path: str = f\"/sdf/data/lcls/ds/{expmt[:3]}/{expmt}/stats/summary\"\n full_path: str = f\"{base_path}/{payload.display_name}\"\n if not os.path.isdir(full_path):\n os.makedirs(full_path)\n\n # Preferred plots are pn.Tabs objects which save directly as html\n # Only supported plot type that has \"save\" method - do not want to\n # import plot modules here to do type checks.\n if hasattr(payload.figures, \"save\"):\n payload.figures.save(f\"{full_path}/report.html\")\n else:\n ...\n elif isinstance(payload, str):\n # May be a path to a file...\n schemas: Optional[str] = self._analysis_desc.task_result.impl_schemas\n # Should also check `impl_schemas` to determine what to do with path\n\n def _process_result_summary(self, summary: str) -> None: ...\n
"},{"location":"source/execution/executor/#execution.executor.Executor.add_default_hooks","title":"add_default_hooks()
","text":"Populate the set of default event hooks.
Source code inlute/execution/executor.py
def add_default_hooks(self) -> None:\n \"\"\"Populate the set of default event hooks.\"\"\"\n\n def no_pickle_mode(self: Executor, msg: Message):\n for idx, communicator in enumerate(self._communicators):\n if isinstance(communicator, PipeCommunicator):\n self._communicators[idx] = PipeCommunicator(\n Party.EXECUTOR, use_pickle=False\n )\n\n self.add_hook(\"no_pickle_mode\", no_pickle_mode)\n\n def task_started(self: Executor, msg: Message):\n if isinstance(msg.contents, TaskParameters):\n self._analysis_desc.task_parameters = msg.contents\n # Maybe just run this no matter what? Rely on the other guards?\n # Perhaps just check if ThirdPartyParameters?\n # if isinstance(self._analysis_desc.task_parameters, ThirdPartyParameters):\n if hasattr(self._analysis_desc.task_parameters.Config, \"set_result\"):\n # Third party Tasks may mark a parameter as the result\n # If so, setup the result now.\n self._set_result_from_parameters()\n logger.info(\n f\"Executor: {self._analysis_desc.task_result.task_name} started\"\n )\n self._analysis_desc.task_result.task_status = TaskStatus.RUNNING\n elog_data: Dict[str, str] = {\n f\"{self._analysis_desc.task_result.task_name} status\": \"RUNNING\",\n }\n post_elog_run_status(elog_data)\n\n self.add_hook(\"task_started\", task_started)\n\n def task_failed(self: Executor, msg: Message):\n elog_data: Dict[str, str] = {\n f\"{self._analysis_desc.task_result.task_name} status\": \"FAILED\",\n }\n post_elog_run_status(elog_data)\n\n self.add_hook(\"task_failed\", task_failed)\n\n def task_stopped(self: Executor, msg: Message):\n elog_data: Dict[str, str] = {\n f\"{self._analysis_desc.task_result.task_name} status\": \"STOPPED\",\n }\n post_elog_run_status(elog_data)\n\n self.add_hook(\"task_stopped\", task_stopped)\n\n def task_done(self: Executor, msg: Message):\n elog_data: Dict[str, str] = {\n f\"{self._analysis_desc.task_result.task_name} status\": \"COMPLETED\",\n }\n post_elog_run_status(elog_data)\n\n self.add_hook(\"task_done\", task_done)\n\n def task_cancelled(self: Executor, msg: Message):\n elog_data: Dict[str, str] = {\n f\"{self._analysis_desc.task_result.task_name} status\": \"CANCELLED\",\n }\n post_elog_run_status(elog_data)\n\n self.add_hook(\"task_cancelled\", task_cancelled)\n\n def task_result(self: Executor, msg: Message):\n if isinstance(msg.contents, TaskResult):\n self._analysis_desc.task_result = msg.contents\n logger.info(self._analysis_desc.task_result.summary)\n logger.info(self._analysis_desc.task_result.task_status)\n elog_data: Dict[str, str] = {\n f\"{self._analysis_desc.task_result.task_name} status\": \"COMPLETED\",\n }\n post_elog_run_status(elog_data)\n\n self.add_hook(\"task_result\", task_result)\n
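Beyond these defaults, additional hooks can be registered with add_hook. The following is a minimal sketch of hypothetical usage, not part of the LUTE source: the event name is assumed to correspond to a signal in LUTE_SIGNALS (lowercased), and the callable receives the Executor and the triggering Message.
from lute.execution.executor import Executor
from lute.execution.ipc import Message

def announce_result(executor: Executor, msg: Message) -> None:
    # Called whenever the Task reports its result back to the Executor.
    print(f"Custom hook saw signal: {msg.signal}")

exe: Executor = Executor(task_name="Tester")  # "Tester" is an assumed managed Task name
exe.add_hook("task_result", announce_result)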
"},{"location":"source/execution/executor/#execution.executor.MPIExecutor","title":"MPIExecutor
","text":" Bases: Executor
Runs first-party Tasks that require MPI.
This Executor is otherwise identical to the standard Executor, except it uses mpirun
for Task
submission. Currently this Executor assumes a job has been submitted using SLURM as a first step. It will determine the number of MPI ranks based on the resources requested. As a fallback, it will try to determine the number of local cores available for cases where a job has not been submitted via SLURM. On S3DF, this fallback determination should match the core count reported by the environment variable that SLURM provides for the allocated resources.
This Executor will submit the Task to run with a number of processes equal to the total number of cores available minus 1. A single core is reserved for the Executor itself. Note that currently this means that you must submit on 3 cores or more, since MPI requires a minimum of 2 ranks, and the number of ranks is determined from the cores dedicated to Task execution.
Methods:
Name Description_submit_cmd
Run the task as a subprocess using mpirun
.
lute/execution/executor.py
class MPIExecutor(Executor):\n \"\"\"Runs first-party Tasks that require MPI.\n\n This Executor is otherwise identical to the standard Executor, except it\n uses `mpirun` for `Task` submission. Currently this Executor assumes a job\n has been submitted using SLURM as a first step. It will determine the number\n of MPI ranks based on the resources requested. As a fallback, it will try\n to determine the number of local cores available for cases where a job has\n not been submitted via SLURM. On S3DF, the second determination mechanism\n should accurately match the environment variable provided by SLURM indicating\n resources allocated.\n\n This Executor will submit the Task to run with a number of processes equal\n to the total number of cores available minus 1. A single core is reserved\n for the Executor itself. Note that currently this means that you must submit\n on 3 cores or more, since MPI requires a minimum of 2 ranks, and the number\n of ranks is determined from the cores dedicated to Task execution.\n\n Methods:\n _submit_cmd: Run the task as a subprocess using `mpirun`.\n \"\"\"\n\n def _submit_cmd(self, executable_path: str, params: str) -> str:\n \"\"\"Override submission command to use `mpirun`\n\n Args:\n executable_path (str): Path to the LUTE subprocess script.\n\n params (str): String of formatted command-line arguments.\n\n Returns:\n cmd (str): Appropriately formatted command for this Executor.\n \"\"\"\n py_cmd: str = \"\"\n nprocs: int = max(\n int(os.environ.get(\"SLURM_NPROCS\", len(os.sched_getaffinity(0)))) - 1, 1\n )\n mpi_cmd: str = f\"mpirun -np {nprocs}\"\n if __debug__:\n py_cmd = f\"python -B -u -m mpi4py.run {executable_path} {params}\"\n else:\n py_cmd = f\"python -OB -u -m mpi4py.run {executable_path} {params}\"\n\n cmd: str = f\"{mpi_cmd} {py_cmd}\"\n return cmd\n
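As a worked illustration of the rank calculation above (values assumed, not taken from a real allocation): with SLURM_NPROCS=4, one core is reserved for the Executor and the Task runs on 3 MPI ranks.
import os

os.environ["SLURM_NPROCS"] = "4"  # Assumed SLURM allocation for this example
nprocs: int = max(int(os.environ["SLURM_NPROCS"]) - 1, 1)
print(f"mpirun -np {nprocs} python -B -u -m mpi4py.run <executable_path> <params>")
# -> mpirun -np 3 python -B -u -m mpi4py.run <executable_path> <params>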
"},{"location":"source/execution/executor/#execution.executor.Party","title":"Party
","text":" Bases: Enum
Identifier for which party (side/end) is using a communicator.
For some types of communication streams there may be different interfaces depending on which side of the communicator you are on. This enum is used by the communicator to determine which interface to use.
Source code inlute/execution/ipc.py
class Party(Enum):\n \"\"\"Identifier for which party (side/end) is using a communicator.\n\n For some types of communication streams there may be different interfaces\n depending on which side of the communicator you are on. This enum is used\n by the communicator to determine which interface to use.\n \"\"\"\n\n TASK = 0\n \"\"\"\n The Task (client) side.\n \"\"\"\n EXECUTOR = 1\n \"\"\"\n The Executor (server) side.\n \"\"\"\n
"},{"location":"source/execution/executor/#execution.executor.Party.EXECUTOR","title":"EXECUTOR = 1
class-attribute
instance-attribute
","text":"The Executor (server) side.
"},{"location":"source/execution/executor/#execution.executor.Party.TASK","title":"TASK = 0
class-attribute
instance-attribute
","text":"The Task (client) side.
"},{"location":"source/execution/executor/#execution.executor.PipeCommunicator","title":"PipeCommunicator
","text":" Bases: Communicator
Provides communication through pipes over stderr/stdout.
The implementation of this communicator has reading and writing occurring on stderr and stdout. In general the Task
will be writing while the Executor
will be reading. stderr
is used for sending signals.
lute/execution/ipc.py
class PipeCommunicator(Communicator):\n \"\"\"Provides communication through pipes over stderr/stdout.\n\n The implementation of this communicator has reading and writing ocurring\n on stderr and stdout. In general the `Task` will be writing while the\n `Executor` will be reading. `stderr` is used for sending signals.\n \"\"\"\n\n def __init__(self, party: Party = Party.TASK, use_pickle: bool = True) -> None:\n \"\"\"IPC through pipes.\n\n Arbitrary objects may be transmitted using pickle to serialize the data.\n If pickle is not used\n\n Args:\n party (Party): Which object (side/process) the Communicator is\n managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.\n use_pickle (bool): Whether to serialize data using Pickle prior to\n sending it. If False, data is assumed to be text whi\n \"\"\"\n super().__init__(party=party, use_pickle=use_pickle)\n self.desc = \"Communicates through stderr and stdout using pickle.\"\n\n def read(self, proc: subprocess.Popen) -> Message:\n \"\"\"Read from stdout and stderr.\n\n Args:\n proc (subprocess.Popen): The process to read from.\n\n Returns:\n msg (Message): The message read, containing contents and signal.\n \"\"\"\n signal: Optional[str]\n contents: Optional[str]\n raw_signal: bytes = proc.stderr.read()\n raw_contents: bytes = proc.stdout.read()\n if raw_signal is not None:\n signal = raw_signal.decode()\n else:\n signal = raw_signal\n if raw_contents:\n if self._use_pickle:\n try:\n contents = pickle.loads(raw_contents)\n except (pickle.UnpicklingError, ValueError, EOFError) as err:\n logger.debug(\"PipeCommunicator (Executor) - Set _use_pickle=False\")\n self._use_pickle = False\n contents = self._safe_unpickle_decode(raw_contents)\n else:\n try:\n contents = raw_contents.decode()\n except UnicodeDecodeError as err:\n logger.debug(\"PipeCommunicator (Executor) - Set _use_pickle=True\")\n self._use_pickle = True\n contents = self._safe_unpickle_decode(raw_contents)\n else:\n contents = None\n\n if signal and signal not in LUTE_SIGNALS:\n # Some tasks write on stderr\n # If the signal channel has \"non-signal\" info, add it to\n # contents\n if not contents:\n contents = f\"({signal})\"\n else:\n contents = f\"{contents} ({signal})\"\n signal = None\n\n return Message(contents=contents, signal=signal)\n\n def _safe_unpickle_decode(self, maybe_mixed: bytes) -> Optional[str]:\n \"\"\"This method is used to unpickle and/or decode a bytes object.\n\n It attempts to handle cases where contents can be mixed, i.e., part of\n the message must be decoded and the other part unpickled. It handles\n only two-way splits. If there are more complex arrangements such as:\n <pickled>:<unpickled>:<pickled> etc, it will give up.\n\n The simpler two way splits are unlikely to occur in normal usage. They\n may arise when debugging if, e.g., `print` statements are mixed with the\n usage of the `_report_to_executor` method.\n\n Note that this method works because ONLY text data is assumed to be\n sent via the pipes. The method needs to be revised to handle non-text\n data if the `Task` is modified to also send that via PipeCommunicator.\n The use of pickle is supported to provide for this option if it is\n necessary. It may be deprecated in the future.\n\n Be careful when making changes. This method has seemingly redundant\n checks because unpickling will not throw an error if a full object can\n be retrieved. That is, the library will ignore extraneous bytes. 
This\n method attempts to retrieve that information if the pickled data comes\n first in the stream.\n\n Args:\n maybe_mixed (bytes): A bytes object which could require unpickling,\n decoding, or both.\n\n Returns:\n contents (Optional[str]): The unpickled/decoded contents if possible.\n Otherwise, None.\n \"\"\"\n contents: Optional[str]\n try:\n contents = pickle.loads(maybe_mixed)\n repickled: bytes = pickle.dumps(contents)\n if len(repickled) < len(maybe_mixed):\n # Successful unpickling, but pickle stops even if there are more bytes\n try:\n additional_data: str = maybe_mixed[len(repickled) :].decode()\n contents = f\"{contents}{additional_data}\"\n except UnicodeDecodeError:\n # Can't decode the bytes left by pickle, so they are lost\n missing_bytes: int = len(maybe_mixed) - len(repickled)\n logger.debug(\n f\"PipeCommunicator has truncated message. Unable to retrieve {missing_bytes} bytes.\"\n )\n except (pickle.UnpicklingError, ValueError, EOFError) as err:\n # Pickle may also throw a ValueError, e.g. this bytes: b\"Found! \\n\"\n # Pickle may also throw an EOFError, eg. this bytes: b\"F0\\n\"\n try:\n contents = maybe_mixed.decode()\n except UnicodeDecodeError as err2:\n try:\n contents = maybe_mixed[: err2.start].decode()\n contents = f\"{contents}{pickle.loads(maybe_mixed[err2.start:])}\"\n except Exception as err3:\n logger.debug(\n f\"PipeCommunicator unable to decode/parse data! {err3}\"\n )\n contents = None\n return contents\n\n def write(self, msg: Message) -> None:\n \"\"\"Write to stdout and stderr.\n\n The signal component is sent to `stderr` while the contents of the\n Message are sent to `stdout`.\n\n Args:\n msg (Message): The Message to send.\n \"\"\"\n if self._use_pickle:\n signal: bytes\n if msg.signal:\n signal = msg.signal.encode()\n else:\n signal = b\"\"\n\n contents: bytes = pickle.dumps(msg.contents)\n\n sys.stderr.buffer.write(signal)\n sys.stdout.buffer.write(contents)\n\n sys.stderr.buffer.flush()\n sys.stdout.buffer.flush()\n else:\n raw_signal: str\n if msg.signal:\n raw_signal = msg.signal\n else:\n raw_signal = \"\"\n\n raw_contents: str\n if isinstance(msg.contents, str):\n raw_contents = msg.contents\n elif msg.contents is None:\n raw_contents = \"\"\n else:\n raise ValueError(\n f\"Cannot send msg contents of type: {type(msg.contents)} when not using pickle!\"\n )\n sys.stderr.write(raw_signal)\n sys.stdout.write(raw_contents)\n
"},{"location":"source/execution/executor/#execution.executor.PipeCommunicator.__init__","title":"__init__(party=Party.TASK, use_pickle=True)
","text":"IPC through pipes.
Arbitrary objects may be transmitted using pickle to serialize the data. If pickle is not used, only plain text (strings) can be sent.
Parameters:
Name Type Description Defaultparty
Party
Which object (side/process) the Communicator is managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.
TASK
use_pickle
bool
Whether to serialize data using Pickle prior to sending it. If False, data is assumed to be text which is decoded directly.
True
Source code in lute/execution/ipc.py
def __init__(self, party: Party = Party.TASK, use_pickle: bool = True) -> None:\n \"\"\"IPC through pipes.\n\n Arbitrary objects may be transmitted using pickle to serialize the data.\n If pickle is not used\n\n Args:\n party (Party): Which object (side/process) the Communicator is\n managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.\n use_pickle (bool): Whether to serialize data using Pickle prior to\n sending it. If False, data is assumed to be text whi\n \"\"\"\n super().__init__(party=party, use_pickle=use_pickle)\n self.desc = \"Communicates through stderr and stdout using pickle.\"\n
"},{"location":"source/execution/executor/#execution.executor.PipeCommunicator.read","title":"read(proc)
","text":"Read from stdout and stderr.
Parameters:
Name Type Description Defaultproc
Popen
The process to read from.
requiredReturns:
Name Type Descriptionmsg
Message
The message read, containing contents and signal.
Source code inlute/execution/ipc.py
def read(self, proc: subprocess.Popen) -> Message:\n \"\"\"Read from stdout and stderr.\n\n Args:\n proc (subprocess.Popen): The process to read from.\n\n Returns:\n msg (Message): The message read, containing contents and signal.\n \"\"\"\n signal: Optional[str]\n contents: Optional[str]\n raw_signal: bytes = proc.stderr.read()\n raw_contents: bytes = proc.stdout.read()\n if raw_signal is not None:\n signal = raw_signal.decode()\n else:\n signal = raw_signal\n if raw_contents:\n if self._use_pickle:\n try:\n contents = pickle.loads(raw_contents)\n except (pickle.UnpicklingError, ValueError, EOFError) as err:\n logger.debug(\"PipeCommunicator (Executor) - Set _use_pickle=False\")\n self._use_pickle = False\n contents = self._safe_unpickle_decode(raw_contents)\n else:\n try:\n contents = raw_contents.decode()\n except UnicodeDecodeError as err:\n logger.debug(\"PipeCommunicator (Executor) - Set _use_pickle=True\")\n self._use_pickle = True\n contents = self._safe_unpickle_decode(raw_contents)\n else:\n contents = None\n\n if signal and signal not in LUTE_SIGNALS:\n # Some tasks write on stderr\n # If the signal channel has \"non-signal\" info, add it to\n # contents\n if not contents:\n contents = f\"({signal})\"\n else:\n contents = f\"{contents} ({signal})\"\n signal = None\n\n return Message(contents=contents, signal=signal)\n
"},{"location":"source/execution/executor/#execution.executor.PipeCommunicator.write","title":"write(msg)
","text":"Write to stdout and stderr.
The signal component is sent to stderr
while the contents of the Message are sent to stdout
.
Parameters:
Name Type Description Defaultmsg
Message
The Message to send.
required Source code inlute/execution/ipc.py
def write(self, msg: Message) -> None:\n \"\"\"Write to stdout and stderr.\n\n The signal component is sent to `stderr` while the contents of the\n Message are sent to `stdout`.\n\n Args:\n msg (Message): The Message to send.\n \"\"\"\n if self._use_pickle:\n signal: bytes\n if msg.signal:\n signal = msg.signal.encode()\n else:\n signal = b\"\"\n\n contents: bytes = pickle.dumps(msg.contents)\n\n sys.stderr.buffer.write(signal)\n sys.stdout.buffer.write(contents)\n\n sys.stderr.buffer.flush()\n sys.stdout.buffer.flush()\n else:\n raw_signal: str\n if msg.signal:\n raw_signal = msg.signal\n else:\n raw_signal = \"\"\n\n raw_contents: str\n if isinstance(msg.contents, str):\n raw_contents = msg.contents\n elif msg.contents is None:\n raw_contents = \"\"\n else:\n raise ValueError(\n f\"Cannot send msg contents of type: {type(msg.contents)} when not using pickle!\"\n )\n sys.stderr.write(raw_signal)\n sys.stdout.write(raw_contents)\n
"},{"location":"source/execution/executor/#execution.executor.SocketCommunicator","title":"SocketCommunicator
","text":" Bases: Communicator
Provides communication over Unix or TCP sockets.
Communication is provided either using sockets with the Python socket library or using ZMQ. The choice of implementation is controlled by the global bool USE_ZMQ
.
LUTE_USE_TCP=1
If this environment variable is defined, TCP sockets will be used; otherwise Unix sockets will be used.
Regardless of socket type, the environment variable LUTE_EXECUTOR_HOST=<hostname>
will be defined by the Executor-side Communicator.
For TCP sockets: The Executor-side Communicator should be run first and will bind to all interfaces on the port determined by the environment variable: LUTE_PORT=###
If no port is defined, a port scan will be performed and the Executor-side Communicator will bind the first one available from a random selection. It will then define the environment variable so the Task-side can pick it up.
For Unix sockets: The path to the Unix socket is defined by the environment variable: LUTE_SOCKET=/path/to/socket
This class assumes proper permissions and that the above environment variable has been defined. The Task
is configured as what would commonly be referred to as the client
, while the Executor
is configured as the server.
If the Task process is run on a different machine than the Executor, the Task-side Communicator will open an SSH tunnel to forward traffic from a local Unix socket to the Executor Unix socket. Opening of the tunnel relies on the environment variable: LUTE_EXECUTOR_HOST=<hostname>
to determine the Executor's host. This variable should be defined by the Executor and passed to the Task process automatically, but it can also be defined manually if launching the Task process separately. The Task will use the local socket <LUTE_SOCKET>.task{##}
. Multiple local sockets may be created. Currently, it is assumed that the user is identical on both the Task machine and Executor machine.
lute/execution/ipc.py
class SocketCommunicator(Communicator):\n \"\"\"Provides communication over Unix or TCP sockets.\n\n Communication is provided either using sockets with the Python socket library\n or using ZMQ. The choice of implementation is controlled by the global bool\n `USE_ZMQ`.\n\n Whether to use TCP or Unix sockets is controlled by the environment:\n `LUTE_USE_TCP=1`\n If defined, TCP sockets will be used, otherwise Unix sockets will be used.\n\n Regardless of socket type, the environment variable\n `LUTE_EXECUTOR_HOST=<hostname>`\n will be defined by the Executor-side Communicator.\n\n\n For TCP sockets:\n The Executor-side Communicator should be run first and will bind to all\n interfaces on the port determined by the environment variable:\n `LUTE_PORT=###`\n If no port is defined, a port scan will be performed and the Executor-side\n Communicator will bind the first one available from a random selection. It\n will then define the environment variable so the Task-side can pick it up.\n\n For Unix sockets:\n The path to the Unix socket is defined by the environment variable:\n `LUTE_SOCKET=/path/to/socket`\n This class assumes proper permissions and that this above environment\n variable has been defined. The `Task` is configured as what would commonly\n be referred to as the `client`, while the `Executor` is configured as the\n server.\n\n If the Task process is run on a different machine than the Executor, the\n Task-side Communicator will open a ssh-tunnel to forward traffic from a local\n Unix socket to the Executor Unix socket. Opening of the tunnel relies on the\n environment variable:\n `LUTE_EXECUTOR_HOST=<hostname>`\n to determine the Executor's host. This variable should be defined by the\n Executor and passed to the Task process automatically, but it can also be\n defined manually if launching the Task process separately. The Task will use\n the local socket `<LUTE_SOCKET>.task{##}`. Multiple local sockets may be\n created. Currently, it is assumed that the user is identical on both the Task\n machine and Executor machine.\n \"\"\"\n\n ACCEPT_TIMEOUT: float = 0.01\n \"\"\"\n Maximum time to wait to accept connections. Used by Executor-side.\n \"\"\"\n MSG_HEAD: bytes = b\"MSG\"\n \"\"\"\n Start signal of a message. The end of a message is indicated by MSG_HEAD[::-1].\n \"\"\"\n MSG_SEP: bytes = b\";;;\"\n \"\"\"\n Separator for parts of a message. Messages have a start, length, message and end.\n \"\"\"\n\n def __init__(self, party: Party = Party.TASK, use_pickle: bool = True) -> None:\n \"\"\"IPC over a TCP or Unix socket.\n\n Unlike with the PipeCommunicator, pickle is always used to send data\n through the socket.\n\n Args:\n party (Party): Which object (side/process) the Communicator is\n managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.\n\n use_pickle (bool): Whether to use pickle. Always True currently,\n passing False does not change behaviour.\n \"\"\"\n super().__init__(party=party, use_pickle=use_pickle)\n\n def delayed_setup(self) -> None:\n \"\"\"Delays the creation of socket objects.\n\n The Executor initializes the Communicator when it is created. 
Since\n all Executors are created and available at once we want to delay\n acquisition of socket resources until a single Executor is ready\n to use them.\n \"\"\"\n self._data_socket: Union[socket.socket, zmq.sugar.socket.Socket]\n if USE_ZMQ:\n self.desc: str = \"Communicates using ZMQ through TCP or Unix sockets.\"\n self._context: zmq.context.Context = zmq.Context()\n self._data_socket = self._create_socket_zmq()\n else:\n self.desc: str = \"Communicates through a TCP or Unix socket.\"\n self._data_socket = self._create_socket_raw()\n self._data_socket.settimeout(SocketCommunicator.ACCEPT_TIMEOUT)\n\n if self._party == Party.EXECUTOR:\n # Executor created first so we can define the hostname env variable\n os.environ[\"LUTE_EXECUTOR_HOST\"] = socket.gethostname()\n # Setup reader thread\n self._reader_thread: threading.Thread = threading.Thread(\n target=self._read_socket\n )\n self._msg_queue: queue.Queue = queue.Queue()\n self._partial_msg: Optional[bytes] = None\n self._stop_thread: bool = False\n self._reader_thread.start()\n else:\n # Only used by Party.TASK\n self._use_ssh_tunnel: bool = False\n self._ssh_proc: Optional[subprocess.Popen] = None\n self._local_socket_path: Optional[str] = None\n\n # Read\n ############################################################################\n\n def read(self, proc: subprocess.Popen) -> Message:\n \"\"\"Return a message from the queue if available.\n\n Socket(s) are continuously monitored, and read from when new data is\n available.\n\n Args:\n proc (subprocess.Popen): The process to read from. Provided for\n compatibility with other Communicator subtypes. Is ignored.\n\n Returns:\n msg (Message): The message read, containing contents and signal.\n \"\"\"\n msg: Message\n try:\n msg = self._msg_queue.get(timeout=SocketCommunicator.ACCEPT_TIMEOUT)\n except queue.Empty:\n msg = Message()\n\n return msg\n\n def _read_socket(self) -> None:\n \"\"\"Read data from a socket.\n\n Socket(s) are continuously monitored, and read from when new data is\n available.\n\n Calls an underlying method for either raw sockets or ZMQ.\n \"\"\"\n\n while True:\n if self._stop_thread:\n logger.debug(\"Stopping socket reader thread.\")\n break\n if USE_ZMQ:\n self._read_socket_zmq()\n else:\n self._read_socket_raw()\n\n def _read_socket_raw(self) -> None:\n \"\"\"Read data from a socket.\n\n Raw socket implementation for the reader thread.\n \"\"\"\n connection: socket.socket\n addr: Union[str, Tuple[str, int]]\n try:\n connection, addr = self._data_socket.accept()\n full_data: bytes = b\"\"\n while True:\n data: bytes = connection.recv(8192)\n if data:\n full_data += data\n else:\n break\n connection.close()\n self._unpack_messages(full_data)\n except socket.timeout:\n pass\n\n def _read_socket_zmq(self) -> None:\n \"\"\"Read data from a socket.\n\n ZMQ implementation for the reader thread.\n \"\"\"\n try:\n full_data: bytes = self._data_socket.recv(0)\n self._unpack_messages(full_data)\n except zmq.ZMQError:\n pass\n\n def _unpack_messages(self, data: bytes) -> None:\n \"\"\"Unpacks a byte stream into individual messages.\n\n Messages are encoded in the following format:\n <HEAD><SEP><len(msg)><SEP><msg><SEP><HEAD[::-1]>\n The items between <> are replaced as follows:\n - <HEAD>: A start marker\n - <SEP>: A separator for components of the message\n - <len(msg)>: The length of the message payload in bytes.\n - <msg>: The message payload in bytes\n - <HEAD[::-1]>: The start marker in reverse to indicate the end.\n\n Partial messages (a series of bytes which cannot be 
converted to a full\n message) are stored for later. An attempt is made to reconstruct the\n message with the next call to this method.\n\n Args:\n data (bytes): A raw byte stream containing anywhere from a partial\n message to multiple full messages.\n \"\"\"\n msg: Message\n working_data: bytes\n if self._partial_msg:\n # Concatenate the previous partial message to the beginning\n working_data = self._partial_msg + data\n self._partial_msg = None\n else:\n working_data = data\n while working_data:\n try:\n # Message encoding: <HEAD><SEP><len><SEP><msg><SEP><HEAD[::-1]>\n end = working_data.find(\n SocketCommunicator.MSG_SEP + SocketCommunicator.MSG_HEAD[::-1]\n )\n msg_parts: List[bytes] = working_data[:end].split(\n SocketCommunicator.MSG_SEP\n )\n if len(msg_parts) != 3:\n self._partial_msg = working_data\n break\n\n cmd: bytes\n nbytes: bytes\n raw_msg: bytes\n cmd, nbytes, raw_msg = msg_parts\n if len(raw_msg) != int(nbytes):\n self._partial_msg = working_data\n break\n msg = pickle.loads(raw_msg)\n self._msg_queue.put(msg)\n except pickle.UnpicklingError:\n self._partial_msg = working_data\n break\n if end < len(working_data):\n # Add len(SEP+HEAD) since end marks the start of <SEP><HEAD[::-1]\n offset: int = len(\n SocketCommunicator.MSG_SEP + SocketCommunicator.MSG_HEAD\n )\n working_data = working_data[end + offset :]\n else:\n working_data = b\"\"\n\n # Write\n ############################################################################\n\n def _write_socket(self, msg: Message) -> None:\n \"\"\"Sends data over a socket from the 'client' (Task) side.\n\n Messages are encoded in the following format:\n <HEAD><SEP><len(msg)><SEP><msg><SEP><HEAD[::-1]>\n The items between <> are replaced as follows:\n - <HEAD>: A start marker\n - <SEP>: A separator for components of the message\n - <len(msg)>: The length of the message payload in bytes.\n - <msg>: The message payload in bytes\n - <HEAD[::-1]>: The start marker in reverse to indicate the end.\n\n This structure is used for decoding the message on the other end.\n \"\"\"\n data: bytes = pickle.dumps(msg)\n cmd: bytes = SocketCommunicator.MSG_HEAD\n size: bytes = b\"%d\" % len(data)\n end: bytes = SocketCommunicator.MSG_HEAD[::-1]\n sep: bytes = SocketCommunicator.MSG_SEP\n packed_msg: bytes = cmd + sep + size + sep + data + sep + end\n if USE_ZMQ:\n self._data_socket.send(packed_msg)\n else:\n self._data_socket.sendall(packed_msg)\n\n def write(self, msg: Message) -> None:\n \"\"\"Send a single Message.\n\n The entire Message (signal and contents) is serialized and sent through\n a connection over Unix socket.\n\n Args:\n msg (Message): The Message to send.\n \"\"\"\n self._write_socket(msg)\n\n # Generic create\n ############################################################################\n\n def _create_socket_raw(self) -> socket.socket:\n \"\"\"Create either a Unix or TCP socket.\n\n If the environment variable:\n `LUTE_USE_TCP=1`\n is defined, a TCP socket is returned, otherwise a Unix socket.\n\n Refer to the individual initialization methods for additional environment\n variables controlling the behaviour of these two communication types.\n\n Returns:\n data_socket (socket.socket): TCP or Unix socket.\n \"\"\"\n import struct\n\n use_tcp: Optional[str] = os.getenv(\"LUTE_USE_TCP\")\n sock: socket.socket\n if use_tcp is not None:\n if self._party == Party.EXECUTOR:\n logger.info(\"Will use raw TCP sockets.\")\n sock = self._init_tcp_socket_raw()\n else:\n if self._party == Party.EXECUTOR:\n logger.info(\"Will use raw Unix 
sockets.\")\n sock = self._init_unix_socket_raw()\n sock.setsockopt(\n socket.SOL_SOCKET, socket.SO_LINGER, struct.pack(\"ii\", 1, 10000)\n )\n return sock\n\n def _create_socket_zmq(self) -> zmq.sugar.socket.Socket:\n \"\"\"Create either a Unix or TCP socket.\n\n If the environment variable:\n `LUTE_USE_TCP=1`\n is defined, a TCP socket is returned, otherwise a Unix socket.\n\n Refer to the individual initialization methods for additional environment\n variables controlling the behaviour of these two communication types.\n\n Returns:\n data_socket (socket.socket): Unix socket object.\n \"\"\"\n socket_type: Literal[zmq.PULL, zmq.PUSH]\n if self._party == Party.EXECUTOR:\n socket_type = zmq.PULL\n else:\n socket_type = zmq.PUSH\n\n data_socket: zmq.sugar.socket.Socket = self._context.socket(socket_type)\n data_socket.set_hwm(160000)\n # Need to multiply by 1000 since ZMQ uses ms\n data_socket.setsockopt(\n zmq.RCVTIMEO, int(SocketCommunicator.ACCEPT_TIMEOUT * 1000)\n )\n # Try TCP first\n use_tcp: Optional[str] = os.getenv(\"LUTE_USE_TCP\")\n if use_tcp is not None:\n if self._party == Party.EXECUTOR:\n logger.info(\"Will use TCP (ZMQ).\")\n self._init_tcp_socket_zmq(data_socket)\n else:\n if self._party == Party.EXECUTOR:\n logger.info(\"Will use Unix sockets (ZMQ).\")\n self._init_unix_socket_zmq(data_socket)\n\n return data_socket\n\n # TCP Init\n ############################################################################\n\n def _find_random_port(\n self, min_port: int = 41923, max_port: int = 64324, max_tries: int = 100\n ) -> Optional[int]:\n \"\"\"Find a random open port to bind to if using TCP.\"\"\"\n from random import choices\n\n sock: socket.socket\n ports: List[int] = choices(range(min_port, max_port), k=max_tries)\n for port in ports:\n sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)\n try:\n sock.bind((\"\", port))\n sock.close()\n del sock\n return port\n except:\n continue\n return None\n\n def _init_tcp_socket_raw(self) -> socket.socket:\n \"\"\"Initialize a TCP socket.\n\n Executor-side code should always be run first. It checks to see if\n the environment variable\n `LUTE_PORT=###`\n is defined, if so binds it, otherwise find a free port from a selection\n of random ports. If a port search is performed, the `LUTE_PORT` variable\n will be defined so it can be picked up by the the Task-side Communicator.\n\n In the event that no port can be bound on the Executor-side, or the port\n and hostname information is unavailable to the Task-side, the program\n will exit.\n\n Returns:\n data_socket (socket.socket): TCP socket object.\n \"\"\"\n data_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)\n port: Optional[Union[str, int]] = os.getenv(\"LUTE_PORT\")\n if self._party == Party.EXECUTOR:\n if port is None:\n # If port is None find one\n # Executor code executes first\n port = self._find_random_port()\n if port is None:\n # Failed to find a port to bind\n logger.info(\n \"Executor failed to bind a port. \"\n \"Try providing a LUTE_PORT directly! Exiting!\"\n )\n sys.exit(-1)\n # Provide port env var for Task-side\n os.environ[\"LUTE_PORT\"] = str(port)\n data_socket.bind((\"\", int(port)))\n data_socket.listen()\n else:\n hostname: str = socket.gethostname()\n executor_hostname: Optional[str] = os.getenv(\"LUTE_EXECUTOR_HOST\")\n if executor_hostname is None or port is None:\n logger.info(\n \"Task-side does not have host/port information!\"\n \" Check environment variables! 
Exiting!\"\n )\n sys.exit(-1)\n if hostname == executor_hostname:\n data_socket.connect((\"localhost\", int(port)))\n else:\n data_socket.connect((executor_hostname, int(port)))\n return data_socket\n\n def _init_tcp_socket_zmq(self, data_socket: zmq.sugar.socket.Socket) -> None:\n \"\"\"Initialize a TCP socket using ZMQ.\n\n Equivalent as the method above but requires passing in a ZMQ socket\n object instead of returning one.\n\n Args:\n data_socket (zmq.socket.Socket): Socket object.\n \"\"\"\n port: Optional[Union[str, int]] = os.getenv(\"LUTE_PORT\")\n if self._party == Party.EXECUTOR:\n if port is None:\n new_port: int = data_socket.bind_to_random_port(\"tcp://*\")\n if new_port is None:\n # Failed to find a port to bind\n logger.info(\n \"Executor failed to bind a port. \"\n \"Try providing a LUTE_PORT directly! Exiting!\"\n )\n sys.exit(-1)\n port = new_port\n os.environ[\"LUTE_PORT\"] = str(port)\n else:\n data_socket.bind(f\"tcp://*:{port}\")\n logger.debug(f\"Executor bound port {port}\")\n else:\n executor_hostname: Optional[str] = os.getenv(\"LUTE_EXECUTOR_HOST\")\n if executor_hostname is None or port is None:\n logger.info(\n \"Task-side does not have host/port information!\"\n \" Check environment variables! Exiting!\"\n )\n sys.exit(-1)\n data_socket.connect(f\"tcp://{executor_hostname}:{port}\")\n\n # Unix Init\n ############################################################################\n\n def _get_socket_path(self) -> str:\n \"\"\"Return the socket path, defining one if it is not available.\n\n Returns:\n socket_path (str): Path to the Unix socket.\n \"\"\"\n socket_path: str\n try:\n socket_path = os.environ[\"LUTE_SOCKET\"]\n except KeyError as err:\n import uuid\n import tempfile\n\n # Define a path, and add to environment\n # Executor-side always created first, Task will use the same one\n socket_path = f\"{tempfile.gettempdir()}/lute_{uuid.uuid4().hex}.sock\"\n os.environ[\"LUTE_SOCKET\"] = socket_path\n logger.debug(f\"SocketCommunicator defines socket_path: {socket_path}\")\n if USE_ZMQ:\n return f\"ipc://{socket_path}\"\n else:\n return socket_path\n\n def _init_unix_socket_raw(self) -> socket.socket:\n \"\"\"Returns a Unix socket object.\n\n Executor-side code should always be run first. It checks to see if\n the environment variable\n `LUTE_SOCKET=XYZ`\n is defined, if so binds it, otherwise it will create a new path and\n define the environment variable for the Task-side to find.\n\n On the Task (client-side), this method will also open a SSH tunnel to\n forward a local Unix socket to an Executor Unix socket if the Task and\n Executor processes are on different machines.\n\n Returns:\n data_socket (socket.socket): Unix socket object.\n \"\"\"\n socket_path: str = self._get_socket_path()\n data_socket = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)\n if self._party == Party.EXECUTOR:\n if os.path.exists(socket_path):\n os.unlink(socket_path)\n data_socket.bind(socket_path)\n data_socket.listen()\n elif self._party == Party.TASK:\n hostname: str = socket.gethostname()\n executor_hostname: Optional[str] = os.getenv(\"LUTE_EXECUTOR_HOST\")\n if executor_hostname is None:\n logger.info(\"Hostname for Executor process not found! 
Exiting!\")\n data_socket.close()\n sys.exit(-1)\n if hostname == executor_hostname:\n data_socket.connect(socket_path)\n else:\n self._local_socket_path = self._setup_unix_ssh_tunnel(\n socket_path, hostname, executor_hostname\n )\n while 1:\n # Keep trying reconnect until ssh tunnel works.\n try:\n data_socket.connect(self._local_socket_path)\n break\n except FileNotFoundError:\n continue\n\n return data_socket\n\n def _init_unix_socket_zmq(self, data_socket: zmq.sugar.socket.Socket) -> None:\n \"\"\"Initialize a Unix socket object, using ZMQ.\n\n Equivalent as the method above but requires passing in a ZMQ socket\n object instead of returning one.\n\n Args:\n data_socket (socket.socket): ZMQ object.\n \"\"\"\n socket_path = self._get_socket_path()\n if self._party == Party.EXECUTOR:\n if os.path.exists(socket_path):\n os.unlink(socket_path)\n data_socket.bind(socket_path)\n elif self._party == Party.TASK:\n hostname: str = socket.gethostname()\n executor_hostname: Optional[str] = os.getenv(\"LUTE_EXECUTOR_HOST\")\n if executor_hostname is None:\n logger.info(\"Hostname for Executor process not found! Exiting!\")\n self._data_socket.close()\n sys.exit(-1)\n if hostname == executor_hostname:\n data_socket.connect(socket_path)\n else:\n # Need to remove ipc:// from socket_path for forwarding\n self._local_socket_path = self._setup_unix_ssh_tunnel(\n socket_path[6:], hostname, executor_hostname\n )\n # Need to add it back\n path: str = f\"ipc://{self._local_socket_path}\"\n data_socket.connect(path)\n\n def _setup_unix_ssh_tunnel(\n self, socket_path: str, hostname: str, executor_hostname: str\n ) -> str:\n \"\"\"Prepares an SSH tunnel for forwarding between Unix sockets on two hosts.\n\n An SSH tunnel is opened with `ssh -L <local>:<remote> sleep 2`.\n This method of communication is slightly slower and incurs additional\n overhead - it should only be used as a backup. If communication across\n multiple hosts is required consider using TCP. The Task will use\n the local socket `<LUTE_SOCKET>.task{##}`. Multiple local sockets may be\n created. It is assumed that the user is identical on both the\n Task machine and Executor machine.\n\n Returns:\n local_socket_path (str): The local Unix socket to connect to.\n \"\"\"\n if \"uuid\" not in globals():\n import uuid\n local_socket_path = f\"{socket_path}.task{uuid.uuid4().hex[:4]}\"\n self._use_ssh_tunnel = True\n ssh_cmd: List[str] = [\n \"ssh\",\n \"-o\",\n \"LogLevel=quiet\",\n \"-L\",\n f\"{local_socket_path}:{socket_path}\",\n executor_hostname,\n \"sleep\",\n \"2\",\n ]\n logger.debug(f\"Opening tunnel from {hostname} to {executor_hostname}\")\n self._ssh_proc = subprocess.Popen(ssh_cmd)\n time.sleep(0.4) # Need to wait... 
-> Use single Task comm at beginning?\n return local_socket_path\n\n # Clean up and properties\n ############################################################################\n\n def _clean_up(self) -> None:\n \"\"\"Clean up connections.\"\"\"\n if self._party == Party.EXECUTOR:\n self._stop_thread = True\n self._reader_thread.join()\n logger.debug(\"Closed reading thread.\")\n\n self._data_socket.close()\n if USE_ZMQ:\n self._context.term()\n else:\n ...\n\n if os.getenv(\"LUTE_USE_TCP\"):\n return\n else:\n if self._party == Party.EXECUTOR:\n os.unlink(os.getenv(\"LUTE_SOCKET\")) # Should be defined\n return\n elif self._use_ssh_tunnel:\n if self._ssh_proc is not None:\n self._ssh_proc.terminate()\n\n @property\n def has_messages(self) -> bool:\n if self._party == Party.TASK:\n # Shouldn't be called on Task-side\n return False\n\n if self._msg_queue.qsize() > 0:\n return True\n return False\n\n def __exit__(self):\n self._clean_up()\n
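For example, a minimal sketch (assumed values) of selecting TCP communication and a fixed port before the Executor-side Communicator is created; both variables are read from the environment as described above.
import os

os.environ["LUTE_USE_TCP"] = "1"    # Use TCP sockets instead of Unix sockets
os.environ["LUTE_PORT"] = "51234"   # Assumed port; if unset, a random open port is chosen and exported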
"},{"location":"source/execution/executor/#execution.executor.SocketCommunicator.ACCEPT_TIMEOUT","title":"ACCEPT_TIMEOUT: float = 0.01
class-attribute
instance-attribute
","text":"Maximum time to wait to accept connections. Used by Executor-side.
"},{"location":"source/execution/executor/#execution.executor.SocketCommunicator.MSG_HEAD","title":"MSG_HEAD: bytes = b'MSG'
class-attribute
instance-attribute
","text":"Start signal of a message. The end of a message is indicated by MSG_HEAD[::-1].
"},{"location":"source/execution/executor/#execution.executor.SocketCommunicator.MSG_SEP","title":"MSG_SEP: bytes = b';;;'
class-attribute
instance-attribute
","text":"Separator for parts of a message. Messages have a start, length, message and end.
"},{"location":"source/execution/executor/#execution.executor.SocketCommunicator.__init__","title":"__init__(party=Party.TASK, use_pickle=True)
","text":"IPC over a TCP or Unix socket.
Unlike with the PipeCommunicator, pickle is always used to send data through the socket.
Parameters:
Name Type Description Defaultparty
Party
Which object (side/process) the Communicator is managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.
TASK
use_pickle
bool
Whether to use pickle. Always True currently, passing False does not change behaviour.
True
Source code in lute/execution/ipc.py
def __init__(self, party: Party = Party.TASK, use_pickle: bool = True) -> None:\n \"\"\"IPC over a TCP or Unix socket.\n\n Unlike with the PipeCommunicator, pickle is always used to send data\n through the socket.\n\n Args:\n party (Party): Which object (side/process) the Communicator is\n managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.\n\n use_pickle (bool): Whether to use pickle. Always True currently,\n passing False does not change behaviour.\n \"\"\"\n super().__init__(party=party, use_pickle=use_pickle)\n
"},{"location":"source/execution/executor/#execution.executor.SocketCommunicator.delayed_setup","title":"delayed_setup()
","text":"Delays the creation of socket objects.
The Executor initializes the Communicator when it is created. Since all Executors are created and available at once, we want to delay acquisition of socket resources until a single Executor is ready to use them.
Source code inlute/execution/ipc.py
def delayed_setup(self) -> None:\n \"\"\"Delays the creation of socket objects.\n\n The Executor initializes the Communicator when it is created. Since\n all Executors are created and available at once we want to delay\n acquisition of socket resources until a single Executor is ready\n to use them.\n \"\"\"\n self._data_socket: Union[socket.socket, zmq.sugar.socket.Socket]\n if USE_ZMQ:\n self.desc: str = \"Communicates using ZMQ through TCP or Unix sockets.\"\n self._context: zmq.context.Context = zmq.Context()\n self._data_socket = self._create_socket_zmq()\n else:\n self.desc: str = \"Communicates through a TCP or Unix socket.\"\n self._data_socket = self._create_socket_raw()\n self._data_socket.settimeout(SocketCommunicator.ACCEPT_TIMEOUT)\n\n if self._party == Party.EXECUTOR:\n # Executor created first so we can define the hostname env variable\n os.environ[\"LUTE_EXECUTOR_HOST\"] = socket.gethostname()\n # Setup reader thread\n self._reader_thread: threading.Thread = threading.Thread(\n target=self._read_socket\n )\n self._msg_queue: queue.Queue = queue.Queue()\n self._partial_msg: Optional[bytes] = None\n self._stop_thread: bool = False\n self._reader_thread.start()\n else:\n # Only used by Party.TASK\n self._use_ssh_tunnel: bool = False\n self._ssh_proc: Optional[subprocess.Popen] = None\n self._local_socket_path: Optional[str] = None\n
"},{"location":"source/execution/executor/#execution.executor.SocketCommunicator.read","title":"read(proc)
","text":"Return a message from the queue if available.
Socket(s) are continuously monitored, and read from when new data is available.
Parameters:
Name Type Description Defaultproc
Popen
The process to read from. Provided for compatibility with other Communicator subtypes. Is ignored.
requiredReturns:
Name Type Descriptionmsg
Message
The message read, containing contents and signal.
Source code inlute/execution/ipc.py
def read(self, proc: subprocess.Popen) -> Message:\n \"\"\"Return a message from the queue if available.\n\n Socket(s) are continuously monitored, and read from when new data is\n available.\n\n Args:\n proc (subprocess.Popen): The process to read from. Provided for\n compatibility with other Communicator subtypes. Is ignored.\n\n Returns:\n msg (Message): The message read, containing contents and signal.\n \"\"\"\n msg: Message\n try:\n msg = self._msg_queue.get(timeout=SocketCommunicator.ACCEPT_TIMEOUT)\n except queue.Empty:\n msg = Message()\n\n return msg\n
"},{"location":"source/execution/executor/#execution.executor.SocketCommunicator.write","title":"write(msg)
","text":"Send a single Message.
The entire Message (signal and contents) is serialized and sent through a connection over a Unix or TCP socket.
Parameters:
Name Type Description Defaultmsg
Message
The Message to send.
required Source code inlute/execution/ipc.py
def write(self, msg: Message) -> None:\n \"\"\"Send a single Message.\n\n The entire Message (signal and contents) is serialized and sent through\n a connection over Unix socket.\n\n Args:\n msg (Message): The Message to send.\n \"\"\"\n self._write_socket(msg)\n
"},{"location":"source/execution/ipc/","title":"ipc","text":"Classes and utilities for communication between Executors and subprocesses.
Communicators manage message passing and parsing between subprocesses. They maintain a limited public interface of \"read\" and \"write\" operations. Behind this interface the methods of communication vary from serialization across pipes to Unix sockets, etc. All communicators pass a single object called a \"Message\" which contains an arbitrary \"contents\" field as well as an optional \"signal\" field.
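A minimal sketch of this shared interface (hypothetical helper, not part of the LUTE source; proc stands for the Task subprocess handle the Executor already holds):
import subprocess
from lute.execution.ipc import Message, Party, PipeCommunicator

def poll_once(comm: PipeCommunicator, proc: subprocess.Popen) -> None:
    # Executor-side: read whatever the Task has written so far.
    msg: Message = comm.read(proc)
    if msg.contents is not None:
        print(msg.contents)
    if msg.signal is not None:
        print(f"Signal received: {msg.signal}")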
Classes:
Name DescriptionParty
Enum describing whether Communicator is on Task-side or Executor-side.
Message
A dataclass used for passing information from Task to Executor.
Communicator
Abstract base class for Communicator types.
PipeCommunicator
Manages communication between Task and Executor via pipes (stderr and stdout).
SocketCommunicator
Manages communication using sockets, either raw or using zmq. Supports both TCP and Unix sockets.
"},{"location":"source/execution/ipc/#execution.ipc.Communicator","title":"Communicator
","text":" Bases: ABC
lute/execution/ipc.py
class Communicator(ABC):\n def __init__(self, party: Party = Party.TASK, use_pickle: bool = True) -> None:\n \"\"\"Abstract Base Class for IPC Communicator objects.\n\n Args:\n party (Party): Which object (side/process) the Communicator is\n managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.\n use_pickle (bool): Whether to serialize data using pickle prior to\n sending it.\n \"\"\"\n self._party = party\n self._use_pickle = use_pickle\n self.desc = \"Communicator abstract base class.\"\n\n @abstractmethod\n def read(self, proc: subprocess.Popen) -> Message:\n \"\"\"Method for reading data through the communication mechanism.\"\"\"\n ...\n\n @abstractmethod\n def write(self, msg: Message) -> None:\n \"\"\"Method for sending data through the communication mechanism.\"\"\"\n ...\n\n def __str__(self):\n name: str = str(type(self)).split(\"'\")[1].split(\".\")[-1]\n return f\"{name}: {self.desc}\"\n\n def __repr__(self):\n return self.__str__()\n\n def __enter__(self) -> Self:\n return self\n\n def __exit__(self) -> None: ...\n\n @property\n def has_messages(self) -> bool:\n \"\"\"Whether the Communicator has remaining messages.\n\n The precise method for determining whether there are remaining messages\n will depend on the specific Communicator sub-class.\n \"\"\"\n return False\n\n def stage_communicator(self):\n \"\"\"Alternative method for staging outside of context manager.\"\"\"\n self.__enter__()\n\n def clear_communicator(self):\n \"\"\"Alternative exit method outside of context manager.\"\"\"\n self.__exit__()\n\n def delayed_setup(self):\n \"\"\"Any setup that should be done later than init.\"\"\"\n ...\n
"},{"location":"source/execution/ipc/#execution.ipc.Communicator.has_messages","title":"has_messages: bool
property
","text":"Whether the Communicator has remaining messages.
The precise method for determining whether there are remaining messages will depend on the specific Communicator sub-class.
"},{"location":"source/execution/ipc/#execution.ipc.Communicator.__init__","title":"__init__(party=Party.TASK, use_pickle=True)
","text":"Abstract Base Class for IPC Communicator objects.
Parameters:
Name Type Description Defaultparty
Party
Which object (side/process) the Communicator is managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.
TASK
use_pickle
bool
Whether to serialize data using pickle prior to sending it.
True
Source code in lute/execution/ipc.py
def __init__(self, party: Party = Party.TASK, use_pickle: bool = True) -> None:\n \"\"\"Abstract Base Class for IPC Communicator objects.\n\n Args:\n party (Party): Which object (side/process) the Communicator is\n managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.\n use_pickle (bool): Whether to serialize data using pickle prior to\n sending it.\n \"\"\"\n self._party = party\n self._use_pickle = use_pickle\n self.desc = \"Communicator abstract base class.\"\n
"},{"location":"source/execution/ipc/#execution.ipc.Communicator.clear_communicator","title":"clear_communicator()
","text":"Alternative exit method outside of context manager.
Source code inlute/execution/ipc.py
def clear_communicator(self):\n \"\"\"Alternative exit method outside of context manager.\"\"\"\n self.__exit__()\n
"},{"location":"source/execution/ipc/#execution.ipc.Communicator.delayed_setup","title":"delayed_setup()
","text":"Any setup that should be done later than init.
Source code inlute/execution/ipc.py
def delayed_setup(self):\n \"\"\"Any setup that should be done later than init.\"\"\"\n ...\n
"},{"location":"source/execution/ipc/#execution.ipc.Communicator.read","title":"read(proc)
abstractmethod
","text":"Method for reading data through the communication mechanism.
Source code inlute/execution/ipc.py
@abstractmethod\ndef read(self, proc: subprocess.Popen) -> Message:\n \"\"\"Method for reading data through the communication mechanism.\"\"\"\n ...\n
"},{"location":"source/execution/ipc/#execution.ipc.Communicator.stage_communicator","title":"stage_communicator()
","text":"Alternative method for staging outside of context manager.
Source code inlute/execution/ipc.py
def stage_communicator(self):\n \"\"\"Alternative method for staging outside of context manager.\"\"\"\n self.__enter__()\n
"},{"location":"source/execution/ipc/#execution.ipc.Communicator.write","title":"write(msg)
abstractmethod
","text":"Method for sending data through the communication mechanism.
Source code inlute/execution/ipc.py
@abstractmethod\ndef write(self, msg: Message) -> None:\n \"\"\"Method for sending data through the communication mechanism.\"\"\"\n ...\n
"},{"location":"source/execution/ipc/#execution.ipc.Party","title":"Party
","text":" Bases: Enum
Identifier for which party (side/end) is using a communicator.
For some types of communication streams there may be different interfaces depending on which side of the communicator you are on. This enum is used by the communicator to determine which interface to use.
Source code inlute/execution/ipc.py
class Party(Enum):\n \"\"\"Identifier for which party (side/end) is using a communicator.\n\n For some types of communication streams there may be different interfaces\n depending on which side of the communicator you are on. This enum is used\n by the communicator to determine which interface to use.\n \"\"\"\n\n TASK = 0\n \"\"\"\n The Task (client) side.\n \"\"\"\n EXECUTOR = 1\n \"\"\"\n The Executor (server) side.\n \"\"\"\n
"},{"location":"source/execution/ipc/#execution.ipc.Party.EXECUTOR","title":"EXECUTOR = 1
class-attribute
instance-attribute
","text":"The Executor (server) side.
"},{"location":"source/execution/ipc/#execution.ipc.Party.TASK","title":"TASK = 0
class-attribute
instance-attribute
","text":"The Task (client) side.
"},{"location":"source/execution/ipc/#execution.ipc.PipeCommunicator","title":"PipeCommunicator
","text":" Bases: Communicator
Provides communication through pipes over stderr/stdout.
The implementation of this communicator has reading and writing occurring on stderr and stdout. In general the Task
will be writing while the Executor
will be reading. stderr
is used for sending signals.
lute/execution/ipc.py
class PipeCommunicator(Communicator):\n \"\"\"Provides communication through pipes over stderr/stdout.\n\n The implementation of this communicator has reading and writing ocurring\n on stderr and stdout. In general the `Task` will be writing while the\n `Executor` will be reading. `stderr` is used for sending signals.\n \"\"\"\n\n def __init__(self, party: Party = Party.TASK, use_pickle: bool = True) -> None:\n \"\"\"IPC through pipes.\n\n Arbitrary objects may be transmitted using pickle to serialize the data.\n If pickle is not used\n\n Args:\n party (Party): Which object (side/process) the Communicator is\n managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.\n use_pickle (bool): Whether to serialize data using Pickle prior to\n sending it. If False, data is assumed to be text whi\n \"\"\"\n super().__init__(party=party, use_pickle=use_pickle)\n self.desc = \"Communicates through stderr and stdout using pickle.\"\n\n def read(self, proc: subprocess.Popen) -> Message:\n \"\"\"Read from stdout and stderr.\n\n Args:\n proc (subprocess.Popen): The process to read from.\n\n Returns:\n msg (Message): The message read, containing contents and signal.\n \"\"\"\n signal: Optional[str]\n contents: Optional[str]\n raw_signal: bytes = proc.stderr.read()\n raw_contents: bytes = proc.stdout.read()\n if raw_signal is not None:\n signal = raw_signal.decode()\n else:\n signal = raw_signal\n if raw_contents:\n if self._use_pickle:\n try:\n contents = pickle.loads(raw_contents)\n except (pickle.UnpicklingError, ValueError, EOFError) as err:\n logger.debug(\"PipeCommunicator (Executor) - Set _use_pickle=False\")\n self._use_pickle = False\n contents = self._safe_unpickle_decode(raw_contents)\n else:\n try:\n contents = raw_contents.decode()\n except UnicodeDecodeError as err:\n logger.debug(\"PipeCommunicator (Executor) - Set _use_pickle=True\")\n self._use_pickle = True\n contents = self._safe_unpickle_decode(raw_contents)\n else:\n contents = None\n\n if signal and signal not in LUTE_SIGNALS:\n # Some tasks write on stderr\n # If the signal channel has \"non-signal\" info, add it to\n # contents\n if not contents:\n contents = f\"({signal})\"\n else:\n contents = f\"{contents} ({signal})\"\n signal = None\n\n return Message(contents=contents, signal=signal)\n\n def _safe_unpickle_decode(self, maybe_mixed: bytes) -> Optional[str]:\n \"\"\"This method is used to unpickle and/or decode a bytes object.\n\n It attempts to handle cases where contents can be mixed, i.e., part of\n the message must be decoded and the other part unpickled. It handles\n only two-way splits. If there are more complex arrangements such as:\n <pickled>:<unpickled>:<pickled> etc, it will give up.\n\n The simpler two way splits are unlikely to occur in normal usage. They\n may arise when debugging if, e.g., `print` statements are mixed with the\n usage of the `_report_to_executor` method.\n\n Note that this method works because ONLY text data is assumed to be\n sent via the pipes. The method needs to be revised to handle non-text\n data if the `Task` is modified to also send that via PipeCommunicator.\n The use of pickle is supported to provide for this option if it is\n necessary. It may be deprecated in the future.\n\n Be careful when making changes. This method has seemingly redundant\n checks because unpickling will not throw an error if a full object can\n be retrieved. That is, the library will ignore extraneous bytes. 
This\n method attempts to retrieve that information if the pickled data comes\n first in the stream.\n\n Args:\n maybe_mixed (bytes): A bytes object which could require unpickling,\n decoding, or both.\n\n Returns:\n contents (Optional[str]): The unpickled/decoded contents if possible.\n Otherwise, None.\n \"\"\"\n contents: Optional[str]\n try:\n contents = pickle.loads(maybe_mixed)\n repickled: bytes = pickle.dumps(contents)\n if len(repickled) < len(maybe_mixed):\n # Successful unpickling, but pickle stops even if there are more bytes\n try:\n additional_data: str = maybe_mixed[len(repickled) :].decode()\n contents = f\"{contents}{additional_data}\"\n except UnicodeDecodeError:\n # Can't decode the bytes left by pickle, so they are lost\n missing_bytes: int = len(maybe_mixed) - len(repickled)\n logger.debug(\n f\"PipeCommunicator has truncated message. Unable to retrieve {missing_bytes} bytes.\"\n )\n except (pickle.UnpicklingError, ValueError, EOFError) as err:\n # Pickle may also throw a ValueError, e.g. this bytes: b\"Found! \\n\"\n # Pickle may also throw an EOFError, eg. this bytes: b\"F0\\n\"\n try:\n contents = maybe_mixed.decode()\n except UnicodeDecodeError as err2:\n try:\n contents = maybe_mixed[: err2.start].decode()\n contents = f\"{contents}{pickle.loads(maybe_mixed[err2.start:])}\"\n except Exception as err3:\n logger.debug(\n f\"PipeCommunicator unable to decode/parse data! {err3}\"\n )\n contents = None\n return contents\n\n def write(self, msg: Message) -> None:\n \"\"\"Write to stdout and stderr.\n\n The signal component is sent to `stderr` while the contents of the\n Message are sent to `stdout`.\n\n Args:\n msg (Message): The Message to send.\n \"\"\"\n if self._use_pickle:\n signal: bytes\n if msg.signal:\n signal = msg.signal.encode()\n else:\n signal = b\"\"\n\n contents: bytes = pickle.dumps(msg.contents)\n\n sys.stderr.buffer.write(signal)\n sys.stdout.buffer.write(contents)\n\n sys.stderr.buffer.flush()\n sys.stdout.buffer.flush()\n else:\n raw_signal: str\n if msg.signal:\n raw_signal = msg.signal\n else:\n raw_signal = \"\"\n\n raw_contents: str\n if isinstance(msg.contents, str):\n raw_contents = msg.contents\n elif msg.contents is None:\n raw_contents = \"\"\n else:\n raise ValueError(\n f\"Cannot send msg contents of type: {type(msg.contents)} when not using pickle!\"\n )\n sys.stderr.write(raw_signal)\n sys.stdout.write(raw_contents)\n
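As a rough usage sketch (not part of the module source): assuming PipeCommunicator, Party, and Message are importable from lute.execution.ipc, and that a hypothetical task_script.py plays the Task role, the Task side writes a Message to its own stdout/stderr while the Executor side launches the Task as a subprocess and reads from its pipes.

# Task side (e.g. inside the hypothetical task_script.py)
from lute.execution.ipc import Message, Party, PipeCommunicator

task_comm = PipeCommunicator(party=Party.TASK)
task_comm.write(Message(contents={"status": "done"}))  # contents -> stdout (pickled), signal -> stderr

# Executor side
import subprocess

from lute.execution.ipc import Party, PipeCommunicator

exec_comm = PipeCommunicator(party=Party.EXECUTOR)
proc = subprocess.Popen(
    ["python", "task_script.py"],  # hypothetical Task entry point
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)
proc.wait()  # for this simple sketch, read everything once the Task has exited
msg = exec_comm.read(proc)  # Message(contents=..., signal=...)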
"},{"location":"source/execution/ipc/#execution.ipc.PipeCommunicator.__init__","title":"__init__(party=Party.TASK, use_pickle=True)
","text":"IPC through pipes.
Arbitrary objects may be transmitted using pickle to serialize the data. If pickle is not used, data is assumed to be text.
Parameters:
Name Type Description Defaultparty
Party
Which object (side/process) the Communicator is managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.
TASK
use_pickle
bool
Whether to serialize data using Pickle prior to sending it. If False, data is assumed to be text.
True
Source code in lute/execution/ipc.py
def __init__(self, party: Party = Party.TASK, use_pickle: bool = True) -> None:\n \"\"\"IPC through pipes.\n\n Arbitrary objects may be transmitted using pickle to serialize the data.\n If pickle is not used\n\n Args:\n party (Party): Which object (side/process) the Communicator is\n managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.\n use_pickle (bool): Whether to serialize data using Pickle prior to\n sending it. If False, data is assumed to be text whi\n \"\"\"\n super().__init__(party=party, use_pickle=use_pickle)\n self.desc = \"Communicates through stderr and stdout using pickle.\"\n
"},{"location":"source/execution/ipc/#execution.ipc.PipeCommunicator.read","title":"read(proc)
","text":"Read from stdout and stderr.
Parameters:
Name Type Description Defaultproc
Popen
The process to read from.
requiredReturns:
Name Type Descriptionmsg
Message
The message read, containing contents and signal.
Source code inlute/execution/ipc.py
def read(self, proc: subprocess.Popen) -> Message:\n \"\"\"Read from stdout and stderr.\n\n Args:\n proc (subprocess.Popen): The process to read from.\n\n Returns:\n msg (Message): The message read, containing contents and signal.\n \"\"\"\n signal: Optional[str]\n contents: Optional[str]\n raw_signal: bytes = proc.stderr.read()\n raw_contents: bytes = proc.stdout.read()\n if raw_signal is not None:\n signal = raw_signal.decode()\n else:\n signal = raw_signal\n if raw_contents:\n if self._use_pickle:\n try:\n contents = pickle.loads(raw_contents)\n except (pickle.UnpicklingError, ValueError, EOFError) as err:\n logger.debug(\"PipeCommunicator (Executor) - Set _use_pickle=False\")\n self._use_pickle = False\n contents = self._safe_unpickle_decode(raw_contents)\n else:\n try:\n contents = raw_contents.decode()\n except UnicodeDecodeError as err:\n logger.debug(\"PipeCommunicator (Executor) - Set _use_pickle=True\")\n self._use_pickle = True\n contents = self._safe_unpickle_decode(raw_contents)\n else:\n contents = None\n\n if signal and signal not in LUTE_SIGNALS:\n # Some tasks write on stderr\n # If the signal channel has \"non-signal\" info, add it to\n # contents\n if not contents:\n contents = f\"({signal})\"\n else:\n contents = f\"{contents} ({signal})\"\n signal = None\n\n return Message(contents=contents, signal=signal)\n
"},{"location":"source/execution/ipc/#execution.ipc.PipeCommunicator.write","title":"write(msg)
","text":"Write to stdout and stderr.
The signal component is sent to stderr
while the contents of the Message are sent to stdout
.
Parameters:
Name Type Description Defaultmsg
Message
The Message to send.
required Source code inlute/execution/ipc.py
def write(self, msg: Message) -> None:\n \"\"\"Write to stdout and stderr.\n\n The signal component is sent to `stderr` while the contents of the\n Message are sent to `stdout`.\n\n Args:\n msg (Message): The Message to send.\n \"\"\"\n if self._use_pickle:\n signal: bytes\n if msg.signal:\n signal = msg.signal.encode()\n else:\n signal = b\"\"\n\n contents: bytes = pickle.dumps(msg.contents)\n\n sys.stderr.buffer.write(signal)\n sys.stdout.buffer.write(contents)\n\n sys.stderr.buffer.flush()\n sys.stdout.buffer.flush()\n else:\n raw_signal: str\n if msg.signal:\n raw_signal = msg.signal\n else:\n raw_signal = \"\"\n\n raw_contents: str\n if isinstance(msg.contents, str):\n raw_contents = msg.contents\n elif msg.contents is None:\n raw_contents = \"\"\n else:\n raise ValueError(\n f\"Cannot send msg contents of type: {type(msg.contents)} when not using pickle!\"\n )\n sys.stderr.write(raw_signal)\n sys.stdout.write(raw_contents)\n
"},{"location":"source/execution/ipc/#execution.ipc.SocketCommunicator","title":"SocketCommunicator
","text":" Bases: Communicator
Provides communication over Unix or TCP sockets.
Communication is provided either using sockets with the Python socket library or using ZMQ. The choice of implementation is controlled by the global bool USE_ZMQ
. Whether to use TCP or Unix sockets is controlled by the environment:
LUTE_USE_TCP=1
If defined, TCP sockets will be used, otherwise Unix sockets will be used.
Regardless of socket type, the environment variable LUTE_EXECUTOR_HOST=<hostname>
will be defined by the Executor-side Communicator.
For TCP sockets: The Executor-side Communicator should be run first and will bind to all interfaces on the port determined by the environment variable: LUTE_PORT=###
If no port is defined, a port scan will be performed and the Executor-side Communicator will bind to the first available port from a random selection. It will then define the environment variable so the Task-side can pick it up.
For Unix sockets: The path to the Unix socket is defined by the environment variable: LUTE_SOCKET=/path/to/socket
This class assumes proper permissions and that the above environment variable has been defined. The Task
is configured as what would commonly be referred to as the client
, while the Executor
is configured as the server.
If the Task process is run on a different machine than the Executor, the Task-side Communicator will open an SSH tunnel to forward traffic from a local Unix socket to the Executor's Unix socket. Opening the tunnel relies on the environment variable: LUTE_EXECUTOR_HOST=<hostname>
to determine the Executor's host. This variable should be defined by the Executor and passed to the Task process automatically, but it can also be defined manually if launching the Task process separately. The Task will use the local socket <LUTE_SOCKET>.task{##}
. Multiple local sockets may be created. Currently, it is assumed that the user is identical on both the Task machine and Executor machine.
lute/execution/ipc.py
class SocketCommunicator(Communicator):\n \"\"\"Provides communication over Unix or TCP sockets.\n\n Communication is provided either using sockets with the Python socket library\n or using ZMQ. The choice of implementation is controlled by the global bool\n `USE_ZMQ`.\n\n Whether to use TCP or Unix sockets is controlled by the environment:\n `LUTE_USE_TCP=1`\n If defined, TCP sockets will be used, otherwise Unix sockets will be used.\n\n Regardless of socket type, the environment variable\n `LUTE_EXECUTOR_HOST=<hostname>`\n will be defined by the Executor-side Communicator.\n\n\n For TCP sockets:\n The Executor-side Communicator should be run first and will bind to all\n interfaces on the port determined by the environment variable:\n `LUTE_PORT=###`\n If no port is defined, a port scan will be performed and the Executor-side\n Communicator will bind the first one available from a random selection. It\n will then define the environment variable so the Task-side can pick it up.\n\n For Unix sockets:\n The path to the Unix socket is defined by the environment variable:\n `LUTE_SOCKET=/path/to/socket`\n This class assumes proper permissions and that this above environment\n variable has been defined. The `Task` is configured as what would commonly\n be referred to as the `client`, while the `Executor` is configured as the\n server.\n\n If the Task process is run on a different machine than the Executor, the\n Task-side Communicator will open a ssh-tunnel to forward traffic from a local\n Unix socket to the Executor Unix socket. Opening of the tunnel relies on the\n environment variable:\n `LUTE_EXECUTOR_HOST=<hostname>`\n to determine the Executor's host. This variable should be defined by the\n Executor and passed to the Task process automatically, but it can also be\n defined manually if launching the Task process separately. The Task will use\n the local socket `<LUTE_SOCKET>.task{##}`. Multiple local sockets may be\n created. Currently, it is assumed that the user is identical on both the Task\n machine and Executor machine.\n \"\"\"\n\n ACCEPT_TIMEOUT: float = 0.01\n \"\"\"\n Maximum time to wait to accept connections. Used by Executor-side.\n \"\"\"\n MSG_HEAD: bytes = b\"MSG\"\n \"\"\"\n Start signal of a message. The end of a message is indicated by MSG_HEAD[::-1].\n \"\"\"\n MSG_SEP: bytes = b\";;;\"\n \"\"\"\n Separator for parts of a message. Messages have a start, length, message and end.\n \"\"\"\n\n def __init__(self, party: Party = Party.TASK, use_pickle: bool = True) -> None:\n \"\"\"IPC over a TCP or Unix socket.\n\n Unlike with the PipeCommunicator, pickle is always used to send data\n through the socket.\n\n Args:\n party (Party): Which object (side/process) the Communicator is\n managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.\n\n use_pickle (bool): Whether to use pickle. Always True currently,\n passing False does not change behaviour.\n \"\"\"\n super().__init__(party=party, use_pickle=use_pickle)\n\n def delayed_setup(self) -> None:\n \"\"\"Delays the creation of socket objects.\n\n The Executor initializes the Communicator when it is created. 
Since\n all Executors are created and available at once we want to delay\n acquisition of socket resources until a single Executor is ready\n to use them.\n \"\"\"\n self._data_socket: Union[socket.socket, zmq.sugar.socket.Socket]\n if USE_ZMQ:\n self.desc: str = \"Communicates using ZMQ through TCP or Unix sockets.\"\n self._context: zmq.context.Context = zmq.Context()\n self._data_socket = self._create_socket_zmq()\n else:\n self.desc: str = \"Communicates through a TCP or Unix socket.\"\n self._data_socket = self._create_socket_raw()\n self._data_socket.settimeout(SocketCommunicator.ACCEPT_TIMEOUT)\n\n if self._party == Party.EXECUTOR:\n # Executor created first so we can define the hostname env variable\n os.environ[\"LUTE_EXECUTOR_HOST\"] = socket.gethostname()\n # Setup reader thread\n self._reader_thread: threading.Thread = threading.Thread(\n target=self._read_socket\n )\n self._msg_queue: queue.Queue = queue.Queue()\n self._partial_msg: Optional[bytes] = None\n self._stop_thread: bool = False\n self._reader_thread.start()\n else:\n # Only used by Party.TASK\n self._use_ssh_tunnel: bool = False\n self._ssh_proc: Optional[subprocess.Popen] = None\n self._local_socket_path: Optional[str] = None\n\n # Read\n ############################################################################\n\n def read(self, proc: subprocess.Popen) -> Message:\n \"\"\"Return a message from the queue if available.\n\n Socket(s) are continuously monitored, and read from when new data is\n available.\n\n Args:\n proc (subprocess.Popen): The process to read from. Provided for\n compatibility with other Communicator subtypes. Is ignored.\n\n Returns:\n msg (Message): The message read, containing contents and signal.\n \"\"\"\n msg: Message\n try:\n msg = self._msg_queue.get(timeout=SocketCommunicator.ACCEPT_TIMEOUT)\n except queue.Empty:\n msg = Message()\n\n return msg\n\n def _read_socket(self) -> None:\n \"\"\"Read data from a socket.\n\n Socket(s) are continuously monitored, and read from when new data is\n available.\n\n Calls an underlying method for either raw sockets or ZMQ.\n \"\"\"\n\n while True:\n if self._stop_thread:\n logger.debug(\"Stopping socket reader thread.\")\n break\n if USE_ZMQ:\n self._read_socket_zmq()\n else:\n self._read_socket_raw()\n\n def _read_socket_raw(self) -> None:\n \"\"\"Read data from a socket.\n\n Raw socket implementation for the reader thread.\n \"\"\"\n connection: socket.socket\n addr: Union[str, Tuple[str, int]]\n try:\n connection, addr = self._data_socket.accept()\n full_data: bytes = b\"\"\n while True:\n data: bytes = connection.recv(8192)\n if data:\n full_data += data\n else:\n break\n connection.close()\n self._unpack_messages(full_data)\n except socket.timeout:\n pass\n\n def _read_socket_zmq(self) -> None:\n \"\"\"Read data from a socket.\n\n ZMQ implementation for the reader thread.\n \"\"\"\n try:\n full_data: bytes = self._data_socket.recv(0)\n self._unpack_messages(full_data)\n except zmq.ZMQError:\n pass\n\n def _unpack_messages(self, data: bytes) -> None:\n \"\"\"Unpacks a byte stream into individual messages.\n\n Messages are encoded in the following format:\n <HEAD><SEP><len(msg)><SEP><msg><SEP><HEAD[::-1]>\n The items between <> are replaced as follows:\n - <HEAD>: A start marker\n - <SEP>: A separator for components of the message\n - <len(msg)>: The length of the message payload in bytes.\n - <msg>: The message payload in bytes\n - <HEAD[::-1]>: The start marker in reverse to indicate the end.\n\n Partial messages (a series of bytes which cannot be 
converted to a full\n message) are stored for later. An attempt is made to reconstruct the\n message with the next call to this method.\n\n Args:\n data (bytes): A raw byte stream containing anywhere from a partial\n message to multiple full messages.\n \"\"\"\n msg: Message\n working_data: bytes\n if self._partial_msg:\n # Concatenate the previous partial message to the beginning\n working_data = self._partial_msg + data\n self._partial_msg = None\n else:\n working_data = data\n while working_data:\n try:\n # Message encoding: <HEAD><SEP><len><SEP><msg><SEP><HEAD[::-1]>\n end = working_data.find(\n SocketCommunicator.MSG_SEP + SocketCommunicator.MSG_HEAD[::-1]\n )\n msg_parts: List[bytes] = working_data[:end].split(\n SocketCommunicator.MSG_SEP\n )\n if len(msg_parts) != 3:\n self._partial_msg = working_data\n break\n\n cmd: bytes\n nbytes: bytes\n raw_msg: bytes\n cmd, nbytes, raw_msg = msg_parts\n if len(raw_msg) != int(nbytes):\n self._partial_msg = working_data\n break\n msg = pickle.loads(raw_msg)\n self._msg_queue.put(msg)\n except pickle.UnpicklingError:\n self._partial_msg = working_data\n break\n if end < len(working_data):\n # Add len(SEP+HEAD) since end marks the start of <SEP><HEAD[::-1]\n offset: int = len(\n SocketCommunicator.MSG_SEP + SocketCommunicator.MSG_HEAD\n )\n working_data = working_data[end + offset :]\n else:\n working_data = b\"\"\n\n # Write\n ############################################################################\n\n def _write_socket(self, msg: Message) -> None:\n \"\"\"Sends data over a socket from the 'client' (Task) side.\n\n Messages are encoded in the following format:\n <HEAD><SEP><len(msg)><SEP><msg><SEP><HEAD[::-1]>\n The items between <> are replaced as follows:\n - <HEAD>: A start marker\n - <SEP>: A separator for components of the message\n - <len(msg)>: The length of the message payload in bytes.\n - <msg>: The message payload in bytes\n - <HEAD[::-1]>: The start marker in reverse to indicate the end.\n\n This structure is used for decoding the message on the other end.\n \"\"\"\n data: bytes = pickle.dumps(msg)\n cmd: bytes = SocketCommunicator.MSG_HEAD\n size: bytes = b\"%d\" % len(data)\n end: bytes = SocketCommunicator.MSG_HEAD[::-1]\n sep: bytes = SocketCommunicator.MSG_SEP\n packed_msg: bytes = cmd + sep + size + sep + data + sep + end\n if USE_ZMQ:\n self._data_socket.send(packed_msg)\n else:\n self._data_socket.sendall(packed_msg)\n\n def write(self, msg: Message) -> None:\n \"\"\"Send a single Message.\n\n The entire Message (signal and contents) is serialized and sent through\n a connection over Unix socket.\n\n Args:\n msg (Message): The Message to send.\n \"\"\"\n self._write_socket(msg)\n\n # Generic create\n ############################################################################\n\n def _create_socket_raw(self) -> socket.socket:\n \"\"\"Create either a Unix or TCP socket.\n\n If the environment variable:\n `LUTE_USE_TCP=1`\n is defined, a TCP socket is returned, otherwise a Unix socket.\n\n Refer to the individual initialization methods for additional environment\n variables controlling the behaviour of these two communication types.\n\n Returns:\n data_socket (socket.socket): TCP or Unix socket.\n \"\"\"\n import struct\n\n use_tcp: Optional[str] = os.getenv(\"LUTE_USE_TCP\")\n sock: socket.socket\n if use_tcp is not None:\n if self._party == Party.EXECUTOR:\n logger.info(\"Will use raw TCP sockets.\")\n sock = self._init_tcp_socket_raw()\n else:\n if self._party == Party.EXECUTOR:\n logger.info(\"Will use raw Unix 
sockets.\")\n sock = self._init_unix_socket_raw()\n sock.setsockopt(\n socket.SOL_SOCKET, socket.SO_LINGER, struct.pack(\"ii\", 1, 10000)\n )\n return sock\n\n def _create_socket_zmq(self) -> zmq.sugar.socket.Socket:\n \"\"\"Create either a Unix or TCP socket.\n\n If the environment variable:\n `LUTE_USE_TCP=1`\n is defined, a TCP socket is returned, otherwise a Unix socket.\n\n Refer to the individual initialization methods for additional environment\n variables controlling the behaviour of these two communication types.\n\n Returns:\n data_socket (socket.socket): Unix socket object.\n \"\"\"\n socket_type: Literal[zmq.PULL, zmq.PUSH]\n if self._party == Party.EXECUTOR:\n socket_type = zmq.PULL\n else:\n socket_type = zmq.PUSH\n\n data_socket: zmq.sugar.socket.Socket = self._context.socket(socket_type)\n data_socket.set_hwm(160000)\n # Need to multiply by 1000 since ZMQ uses ms\n data_socket.setsockopt(\n zmq.RCVTIMEO, int(SocketCommunicator.ACCEPT_TIMEOUT * 1000)\n )\n # Try TCP first\n use_tcp: Optional[str] = os.getenv(\"LUTE_USE_TCP\")\n if use_tcp is not None:\n if self._party == Party.EXECUTOR:\n logger.info(\"Will use TCP (ZMQ).\")\n self._init_tcp_socket_zmq(data_socket)\n else:\n if self._party == Party.EXECUTOR:\n logger.info(\"Will use Unix sockets (ZMQ).\")\n self._init_unix_socket_zmq(data_socket)\n\n return data_socket\n\n # TCP Init\n ############################################################################\n\n def _find_random_port(\n self, min_port: int = 41923, max_port: int = 64324, max_tries: int = 100\n ) -> Optional[int]:\n \"\"\"Find a random open port to bind to if using TCP.\"\"\"\n from random import choices\n\n sock: socket.socket\n ports: List[int] = choices(range(min_port, max_port), k=max_tries)\n for port in ports:\n sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)\n try:\n sock.bind((\"\", port))\n sock.close()\n del sock\n return port\n except:\n continue\n return None\n\n def _init_tcp_socket_raw(self) -> socket.socket:\n \"\"\"Initialize a TCP socket.\n\n Executor-side code should always be run first. It checks to see if\n the environment variable\n `LUTE_PORT=###`\n is defined, if so binds it, otherwise find a free port from a selection\n of random ports. If a port search is performed, the `LUTE_PORT` variable\n will be defined so it can be picked up by the the Task-side Communicator.\n\n In the event that no port can be bound on the Executor-side, or the port\n and hostname information is unavailable to the Task-side, the program\n will exit.\n\n Returns:\n data_socket (socket.socket): TCP socket object.\n \"\"\"\n data_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)\n port: Optional[Union[str, int]] = os.getenv(\"LUTE_PORT\")\n if self._party == Party.EXECUTOR:\n if port is None:\n # If port is None find one\n # Executor code executes first\n port = self._find_random_port()\n if port is None:\n # Failed to find a port to bind\n logger.info(\n \"Executor failed to bind a port. \"\n \"Try providing a LUTE_PORT directly! Exiting!\"\n )\n sys.exit(-1)\n # Provide port env var for Task-side\n os.environ[\"LUTE_PORT\"] = str(port)\n data_socket.bind((\"\", int(port)))\n data_socket.listen()\n else:\n hostname: str = socket.gethostname()\n executor_hostname: Optional[str] = os.getenv(\"LUTE_EXECUTOR_HOST\")\n if executor_hostname is None or port is None:\n logger.info(\n \"Task-side does not have host/port information!\"\n \" Check environment variables! 
Exiting!\"\n )\n sys.exit(-1)\n if hostname == executor_hostname:\n data_socket.connect((\"localhost\", int(port)))\n else:\n data_socket.connect((executor_hostname, int(port)))\n return data_socket\n\n def _init_tcp_socket_zmq(self, data_socket: zmq.sugar.socket.Socket) -> None:\n \"\"\"Initialize a TCP socket using ZMQ.\n\n Equivalent as the method above but requires passing in a ZMQ socket\n object instead of returning one.\n\n Args:\n data_socket (zmq.socket.Socket): Socket object.\n \"\"\"\n port: Optional[Union[str, int]] = os.getenv(\"LUTE_PORT\")\n if self._party == Party.EXECUTOR:\n if port is None:\n new_port: int = data_socket.bind_to_random_port(\"tcp://*\")\n if new_port is None:\n # Failed to find a port to bind\n logger.info(\n \"Executor failed to bind a port. \"\n \"Try providing a LUTE_PORT directly! Exiting!\"\n )\n sys.exit(-1)\n port = new_port\n os.environ[\"LUTE_PORT\"] = str(port)\n else:\n data_socket.bind(f\"tcp://*:{port}\")\n logger.debug(f\"Executor bound port {port}\")\n else:\n executor_hostname: Optional[str] = os.getenv(\"LUTE_EXECUTOR_HOST\")\n if executor_hostname is None or port is None:\n logger.info(\n \"Task-side does not have host/port information!\"\n \" Check environment variables! Exiting!\"\n )\n sys.exit(-1)\n data_socket.connect(f\"tcp://{executor_hostname}:{port}\")\n\n # Unix Init\n ############################################################################\n\n def _get_socket_path(self) -> str:\n \"\"\"Return the socket path, defining one if it is not available.\n\n Returns:\n socket_path (str): Path to the Unix socket.\n \"\"\"\n socket_path: str\n try:\n socket_path = os.environ[\"LUTE_SOCKET\"]\n except KeyError as err:\n import uuid\n import tempfile\n\n # Define a path, and add to environment\n # Executor-side always created first, Task will use the same one\n socket_path = f\"{tempfile.gettempdir()}/lute_{uuid.uuid4().hex}.sock\"\n os.environ[\"LUTE_SOCKET\"] = socket_path\n logger.debug(f\"SocketCommunicator defines socket_path: {socket_path}\")\n if USE_ZMQ:\n return f\"ipc://{socket_path}\"\n else:\n return socket_path\n\n def _init_unix_socket_raw(self) -> socket.socket:\n \"\"\"Returns a Unix socket object.\n\n Executor-side code should always be run first. It checks to see if\n the environment variable\n `LUTE_SOCKET=XYZ`\n is defined, if so binds it, otherwise it will create a new path and\n define the environment variable for the Task-side to find.\n\n On the Task (client-side), this method will also open a SSH tunnel to\n forward a local Unix socket to an Executor Unix socket if the Task and\n Executor processes are on different machines.\n\n Returns:\n data_socket (socket.socket): Unix socket object.\n \"\"\"\n socket_path: str = self._get_socket_path()\n data_socket = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)\n if self._party == Party.EXECUTOR:\n if os.path.exists(socket_path):\n os.unlink(socket_path)\n data_socket.bind(socket_path)\n data_socket.listen()\n elif self._party == Party.TASK:\n hostname: str = socket.gethostname()\n executor_hostname: Optional[str] = os.getenv(\"LUTE_EXECUTOR_HOST\")\n if executor_hostname is None:\n logger.info(\"Hostname for Executor process not found! 
Exiting!\")\n data_socket.close()\n sys.exit(-1)\n if hostname == executor_hostname:\n data_socket.connect(socket_path)\n else:\n self._local_socket_path = self._setup_unix_ssh_tunnel(\n socket_path, hostname, executor_hostname\n )\n while 1:\n # Keep trying reconnect until ssh tunnel works.\n try:\n data_socket.connect(self._local_socket_path)\n break\n except FileNotFoundError:\n continue\n\n return data_socket\n\n def _init_unix_socket_zmq(self, data_socket: zmq.sugar.socket.Socket) -> None:\n \"\"\"Initialize a Unix socket object, using ZMQ.\n\n Equivalent as the method above but requires passing in a ZMQ socket\n object instead of returning one.\n\n Args:\n data_socket (socket.socket): ZMQ object.\n \"\"\"\n socket_path = self._get_socket_path()\n if self._party == Party.EXECUTOR:\n if os.path.exists(socket_path):\n os.unlink(socket_path)\n data_socket.bind(socket_path)\n elif self._party == Party.TASK:\n hostname: str = socket.gethostname()\n executor_hostname: Optional[str] = os.getenv(\"LUTE_EXECUTOR_HOST\")\n if executor_hostname is None:\n logger.info(\"Hostname for Executor process not found! Exiting!\")\n self._data_socket.close()\n sys.exit(-1)\n if hostname == executor_hostname:\n data_socket.connect(socket_path)\n else:\n # Need to remove ipc:// from socket_path for forwarding\n self._local_socket_path = self._setup_unix_ssh_tunnel(\n socket_path[6:], hostname, executor_hostname\n )\n # Need to add it back\n path: str = f\"ipc://{self._local_socket_path}\"\n data_socket.connect(path)\n\n def _setup_unix_ssh_tunnel(\n self, socket_path: str, hostname: str, executor_hostname: str\n ) -> str:\n \"\"\"Prepares an SSH tunnel for forwarding between Unix sockets on two hosts.\n\n An SSH tunnel is opened with `ssh -L <local>:<remote> sleep 2`.\n This method of communication is slightly slower and incurs additional\n overhead - it should only be used as a backup. If communication across\n multiple hosts is required consider using TCP. The Task will use\n the local socket `<LUTE_SOCKET>.task{##}`. Multiple local sockets may be\n created. It is assumed that the user is identical on both the\n Task machine and Executor machine.\n\n Returns:\n local_socket_path (str): The local Unix socket to connect to.\n \"\"\"\n if \"uuid\" not in globals():\n import uuid\n local_socket_path = f\"{socket_path}.task{uuid.uuid4().hex[:4]}\"\n self._use_ssh_tunnel = True\n ssh_cmd: List[str] = [\n \"ssh\",\n \"-o\",\n \"LogLevel=quiet\",\n \"-L\",\n f\"{local_socket_path}:{socket_path}\",\n executor_hostname,\n \"sleep\",\n \"2\",\n ]\n logger.debug(f\"Opening tunnel from {hostname} to {executor_hostname}\")\n self._ssh_proc = subprocess.Popen(ssh_cmd)\n time.sleep(0.4) # Need to wait... 
-> Use single Task comm at beginning?\n return local_socket_path\n\n # Clean up and properties\n ############################################################################\n\n def _clean_up(self) -> None:\n \"\"\"Clean up connections.\"\"\"\n if self._party == Party.EXECUTOR:\n self._stop_thread = True\n self._reader_thread.join()\n logger.debug(\"Closed reading thread.\")\n\n self._data_socket.close()\n if USE_ZMQ:\n self._context.term()\n else:\n ...\n\n if os.getenv(\"LUTE_USE_TCP\"):\n return\n else:\n if self._party == Party.EXECUTOR:\n os.unlink(os.getenv(\"LUTE_SOCKET\")) # Should be defined\n return\n elif self._use_ssh_tunnel:\n if self._ssh_proc is not None:\n self._ssh_proc.terminate()\n\n @property\n def has_messages(self) -> bool:\n if self._party == Party.TASK:\n # Shouldn't be called on Task-side\n return False\n\n if self._msg_queue.qsize() > 0:\n return True\n return False\n\n def __exit__(self):\n self._clean_up()\n
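To make the environment-variable control described above concrete, here is a minimal sketch (values are placeholders; only the variable names are taken from the docstring) of how the socket flavour could be selected before the Executor and Task processes are started:

import os

# TCP sockets: the Executor binds LUTE_PORT, or scans for a free port and exports it.
os.environ["LUTE_USE_TCP"] = "1"
os.environ["LUTE_PORT"] = "51234"  # placeholder port; omit to trigger the port scan

# Unix sockets (the default when LUTE_USE_TCP is unset): both sides share LUTE_SOCKET.
# os.environ["LUTE_SOCKET"] = "/tmp/lute_example.sock"  # placeholder path

# LUTE_EXECUTOR_HOST is normally exported by the Executor-side Communicator, but may be
# set manually if the Task process is launched separately on another machine.
# os.environ["LUTE_EXECUTOR_HOST"] = "example-host"  # placeholder hostname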
"},{"location":"source/execution/ipc/#execution.ipc.SocketCommunicator.ACCEPT_TIMEOUT","title":"ACCEPT_TIMEOUT: float = 0.01
class-attribute
instance-attribute
","text":"Maximum time to wait to accept connections. Used by Executor-side.
"},{"location":"source/execution/ipc/#execution.ipc.SocketCommunicator.MSG_HEAD","title":"MSG_HEAD: bytes = b'MSG'
class-attribute
instance-attribute
","text":"Start signal of a message. The end of a message is indicated by MSG_HEAD[::-1].
"},{"location":"source/execution/ipc/#execution.ipc.SocketCommunicator.MSG_SEP","title":"MSG_SEP: bytes = b';;;'
class-attribute
instance-attribute
","text":"Separator for parts of a message. Messages have a start, length, message and end.
"},{"location":"source/execution/ipc/#execution.ipc.SocketCommunicator.__init__","title":"__init__(party=Party.TASK, use_pickle=True)
","text":"IPC over a TCP or Unix socket.
Unlike with the PipeCommunicator, pickle is always used to send data through the socket.
Parameters:
Name Type Description Defaultparty
Party
Which object (side/process) the Communicator is managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.
TASK
use_pickle
bool
Whether to use pickle. Always True currently; passing False does not change behaviour.
True
Source code in lute/execution/ipc.py
def __init__(self, party: Party = Party.TASK, use_pickle: bool = True) -> None:\n \"\"\"IPC over a TCP or Unix socket.\n\n Unlike with the PipeCommunicator, pickle is always used to send data\n through the socket.\n\n Args:\n party (Party): Which object (side/process) the Communicator is\n managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.\n\n use_pickle (bool): Whether to use pickle. Always True currently,\n passing False does not change behaviour.\n \"\"\"\n super().__init__(party=party, use_pickle=use_pickle)\n
"},{"location":"source/execution/ipc/#execution.ipc.SocketCommunicator.delayed_setup","title":"delayed_setup()
","text":"Delays the creation of socket objects.
The Executor initializes the Communicator when it is created. Since all Executors are created and available at once, we want to delay acquisition of socket resources until a single Executor is ready to use them.
Source code inlute/execution/ipc.py
def delayed_setup(self) -> None:\n \"\"\"Delays the creation of socket objects.\n\n The Executor initializes the Communicator when it is created. Since\n all Executors are created and available at once we want to delay\n acquisition of socket resources until a single Executor is ready\n to use them.\n \"\"\"\n self._data_socket: Union[socket.socket, zmq.sugar.socket.Socket]\n if USE_ZMQ:\n self.desc: str = \"Communicates using ZMQ through TCP or Unix sockets.\"\n self._context: zmq.context.Context = zmq.Context()\n self._data_socket = self._create_socket_zmq()\n else:\n self.desc: str = \"Communicates through a TCP or Unix socket.\"\n self._data_socket = self._create_socket_raw()\n self._data_socket.settimeout(SocketCommunicator.ACCEPT_TIMEOUT)\n\n if self._party == Party.EXECUTOR:\n # Executor created first so we can define the hostname env variable\n os.environ[\"LUTE_EXECUTOR_HOST\"] = socket.gethostname()\n # Setup reader thread\n self._reader_thread: threading.Thread = threading.Thread(\n target=self._read_socket\n )\n self._msg_queue: queue.Queue = queue.Queue()\n self._partial_msg: Optional[bytes] = None\n self._stop_thread: bool = False\n self._reader_thread.start()\n else:\n # Only used by Party.TASK\n self._use_ssh_tunnel: bool = False\n self._ssh_proc: Optional[subprocess.Popen] = None\n self._local_socket_path: Optional[str] = None\n
"},{"location":"source/execution/ipc/#execution.ipc.SocketCommunicator.read","title":"read(proc)
","text":"Return a message from the queue if available.
Socket(s) are continuously monitored, and read from when new data is available.
Parameters:
Name Type Description Defaultproc
Popen
The process to read from. Provided for compatibility with other Communicator subtypes. Is ignored.
requiredReturns:
Name Type Descriptionmsg
Message
The message read, containing contents and signal.
Source code inlute/execution/ipc.py
def read(self, proc: subprocess.Popen) -> Message:\n \"\"\"Return a message from the queue if available.\n\n Socket(s) are continuously monitored, and read from when new data is\n available.\n\n Args:\n proc (subprocess.Popen): The process to read from. Provided for\n compatibility with other Communicator subtypes. Is ignored.\n\n Returns:\n msg (Message): The message read, containing contents and signal.\n \"\"\"\n msg: Message\n try:\n msg = self._msg_queue.get(timeout=SocketCommunicator.ACCEPT_TIMEOUT)\n except queue.Empty:\n msg = Message()\n\n return msg\n
"},{"location":"source/execution/ipc/#execution.ipc.SocketCommunicator.write","title":"write(msg)
","text":"Send a single Message.
The entire Message (signal and contents) is serialized and sent through a connection over a Unix or TCP socket.
Parameters:
Name Type Description Defaultmsg
Message
The Message to send.
required Source code inlute/execution/ipc.py
def write(self, msg: Message) -> None:\n \"\"\"Send a single Message.\n\n The entire Message (signal and contents) is serialized and sent through\n a connection over Unix socket.\n\n Args:\n msg (Message): The Message to send.\n \"\"\"\n self._write_socket(msg)\n
"},{"location":"source/io/_sqlite/","title":"_sqlite","text":"Backend SQLite database utilites.
Functions should be used only by the higher-level database module.
"},{"location":"source/io/config/","title":"config","text":"Machinary for the IO of configuration YAML files and their validation.
Functions:
Name Descriptionparse_config
str, config_path: str) -> TaskParameters: Parse a configuration file and return a TaskParameters object of validated parameters for a specific Task. Raises an exception if the provided configuration does not match the expected model.
Raises:
Type DescriptionValidationError
Error raised by pydantic during data validation. (From Pydantic)
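A minimal usage sketch (the Task name and YAML path below are placeholders; the first parameter name is abbreviated in the docstring above, so it is passed positionally here):

from pydantic import ValidationError

from lute.io.config import parse_config

try:
    params = parse_config("SomeTask", config_path="/path/to/config.yaml")  # placeholders
except ValidationError as err:
    # Raised when the provided configuration does not match the expected model.
    print(f"Invalid configuration: {err}")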
"},{"location":"source/io/config/#io.config.AnalysisHeader","title":"AnalysisHeader
","text":" Bases: BaseModel
Header information for LUTE analysis runs.
Source code inlute/io/models/base.py
class AnalysisHeader(BaseModel):\n \"\"\"Header information for LUTE analysis runs.\"\"\"\n\n title: str = Field(\n \"LUTE Task Configuration\",\n description=\"Description of the configuration or experiment.\",\n )\n experiment: str = Field(\"\", description=\"Experiment.\")\n run: Union[str, int] = Field(\"\", description=\"Data acquisition run.\")\n date: str = Field(\"1970/01/01\", description=\"Start date of analysis.\")\n lute_version: Union[float, str] = Field(\n 0.1, description=\"Version of LUTE used for analysis.\"\n )\n task_timeout: PositiveInt = Field(\n 600,\n description=(\n \"Time in seconds until a task times out. Should be slightly shorter\"\n \" than job timeout if using a job manager (e.g. SLURM).\"\n ),\n )\n work_dir: str = Field(\"\", description=\"Main working directory for LUTE.\")\n\n @validator(\"work_dir\", always=True)\n def validate_work_dir(cls, directory: str, values: Dict[str, Any]) -> str:\n work_dir: str\n if directory == \"\":\n std_work_dir = (\n f\"/sdf/data/lcls/ds/{values['experiment'][:3]}/\"\n f\"{values['experiment']}/scratch\"\n )\n work_dir = std_work_dir\n else:\n work_dir = directory\n # Check existence and permissions\n if not os.path.exists(work_dir):\n raise ValueError(f\"Working Directory: {work_dir} does not exist!\")\n if not os.access(work_dir, os.W_OK):\n # Need write access for database, files etc.\n raise ValueError(f\"Not write access for working directory: {work_dir}!\")\n return work_dir\n\n @validator(\"run\", always=True)\n def validate_run(\n cls, run: Union[str, int], values: Dict[str, Any]\n ) -> Union[str, int]:\n if run == \"\":\n # From Airflow RUN_NUM should have Format \"RUN_DATETIME\" - Num is first part\n run_time: str = os.environ.get(\"RUN_NUM\", \"\")\n if run_time != \"\":\n return int(run_time.split(\"_\")[0])\n return run\n\n @validator(\"experiment\", always=True)\n def validate_experiment(cls, experiment: str, values: Dict[str, Any]) -> str:\n if experiment == \"\":\n arp_exp: str = os.environ.get(\"EXPERIMENT\", \"EXPX00000\")\n return arp_exp\n return experiment\n
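For illustration, a header could be instantiated directly as below (the experiment name, run, and path are placeholders; fields left out fall back to the defaults and validators shown above):

from lute.io.models.base import AnalysisHeader

header = AnalysisHeader(
    title="LUTE Task Configuration",
    experiment="mfxl1234567",  # placeholder experiment name
    run=10,
    task_timeout=600,
    work_dir="/sdf/data/lcls/ds/mfx/mfxl1234567/scratch",  # must exist and be writable
)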
"},{"location":"source/io/config/#io.config.CompareHKLParameters","title":"CompareHKLParameters
","text":" Bases: ThirdPartyParameters
Parameters for CrystFEL's compare_hkl
for calculating figures of merit.
There are many parameters, and many combinations. For more information on usage, please refer to the CrystFEL documentation, here: https://www.desy.de/~twhite/crystfel/manual-partialator.html
Source code inlute/io/models/sfx_merge.py
class CompareHKLParameters(ThirdPartyParameters):\n \"\"\"Parameters for CrystFEL's `compare_hkl` for calculating figures of merit.\n\n There are many parameters, and many combinations. For more information on\n usage, please refer to the CrystFEL documentation, here:\n https://www.desy.de/~twhite/crystfel/manual-partialator.html\n \"\"\"\n\n class Config(ThirdPartyParameters.Config):\n long_flags_use_eq: bool = True\n \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n executable: str = Field(\n \"/sdf/group/lcls/ds/tools/crystfel/0.10.2/bin/compare_hkl\",\n description=\"CrystFEL's reflection comparison binary.\",\n flag_type=\"\",\n )\n in_files: Optional[str] = Field(\n \"\",\n description=\"Path to input HKLs. Space-separated list of 2. Use output of partialator e.g.\",\n flag_type=\"\",\n )\n ## Need mechanism to set is_result=True ...\n symmetry: str = Field(\"\", description=\"Point group symmetry.\", flag_type=\"--\")\n cell_file: str = Field(\n \"\",\n description=\"Path to a file containing unit cell information (PDB or CrystFEL format).\",\n flag_type=\"-\",\n rename_param=\"p\",\n )\n fom: str = Field(\n \"Rsplit\", description=\"Specify figure of merit to calculate.\", flag_type=\"--\"\n )\n nshells: int = Field(10, description=\"Use n resolution shells.\", flag_type=\"--\")\n # NEED A NEW CASE FOR THIS -> Boolean flag, no arg, one hyphen...\n # fix_unity: bool = Field(\n # False,\n # description=\"Fix scale factors to unity.\",\n # flag_type=\"-\",\n # rename_param=\"u\",\n # )\n shell_file: str = Field(\n \"\",\n description=\"Write the statistics in resolution shells to a file.\",\n flag_type=\"--\",\n rename_param=\"shell-file\",\n is_result=True,\n )\n ignore_negs: bool = Field(\n False,\n description=\"Ignore reflections with negative reflections.\",\n flag_type=\"--\",\n rename_param=\"ignore-negs\",\n )\n zero_negs: bool = Field(\n False,\n description=\"Set negative intensities to 0.\",\n flag_type=\"--\",\n rename_param=\"zero-negs\",\n )\n sigma_cutoff: Optional[Union[float, int, str]] = Field(\n # \"-infinity\",\n description=\"Discard reflections with I/sigma(I) < n. -infinity means no cutoff.\",\n flag_type=\"--\",\n rename_param=\"sigma-cutoff\",\n )\n rmin: Optional[float] = Field(\n description=\"Low resolution cutoff of 1/d (m-1). Use this or --lowres NOT both.\",\n flag_type=\"--\",\n )\n lowres: Optional[float] = Field(\n descirption=\"Low resolution cutoff in Angstroms. Use this or --rmin NOT both.\",\n flag_type=\"--\",\n )\n rmax: Optional[float] = Field(\n description=\"High resolution cutoff in 1/d (m-1). Use this or --highres NOT both.\",\n flag_type=\"--\",\n )\n highres: Optional[float] = Field(\n description=\"High resolution cutoff in Angstroms. 
Use this or --rmax NOT both.\",\n flag_type=\"--\",\n )\n\n @validator(\"in_files\", always=True)\n def validate_in_files(cls, in_files: str, values: Dict[str, Any]) -> str:\n if in_files == \"\":\n partialator_file: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\", \"MergePartialator\", \"out_file\"\n )\n if partialator_file:\n hkls: str = f\"{partialator_file}1 {partialator_file}2\"\n return hkls\n return in_files\n\n @validator(\"cell_file\", always=True)\n def validate_cell_file(cls, cell_file: str, values: Dict[str, Any]) -> str:\n if cell_file == \"\":\n idx_cell_file: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\",\n \"IndexCrystFEL\",\n \"cell_file\",\n valid_only=False,\n )\n if idx_cell_file:\n return idx_cell_file\n return cell_file\n\n @validator(\"symmetry\", always=True)\n def validate_symmetry(cls, symmetry: str, values: Dict[str, Any]) -> str:\n if symmetry == \"\":\n partialator_sym: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\", \"MergePartialator\", \"symmetry\"\n )\n if partialator_sym:\n return partialator_sym\n return symmetry\n\n @validator(\"shell_file\", always=True)\n def validate_shell_file(cls, shell_file: str, values: Dict[str, Any]) -> str:\n if shell_file == \"\":\n partialator_file: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\", \"MergePartialator\", \"out_file\"\n )\n if partialator_file:\n shells_out: str = partialator_file.split(\".\")[0]\n shells_out = f\"{shells_out}_{values['fom']}_n{values['nshells']}.dat\"\n return shells_out\n return shell_file\n
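For orientation, a corresponding parameter set might look like the sketch below (paths and the point group are placeholders; fields that are omitted are filled in by the validators above from earlier database entries):

compare_hkl_params = {
    "in_files": "/path/to/partialator.hkl1 /path/to/partialator.hkl2",  # placeholder paths
    "symmetry": "4/mmm",  # placeholder point group
    "cell_file": "/path/to/cell.pdb",  # placeholder unit cell file
    "fom": "Rsplit",
    "nshells": 10,
}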
"},{"location":"source/io/config/#io.config.CompareHKLParameters.Config","title":"Config
","text":" Bases: Config
lute/io/models/sfx_merge.py
class Config(ThirdPartyParameters.Config):\n long_flags_use_eq: bool = True\n \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n
"},{"location":"source/io/config/#io.config.CompareHKLParameters.Config.long_flags_use_eq","title":"long_flags_use_eq: bool = True
class-attribute
instance-attribute
","text":"Whether long command-line arguments are passed like --long=arg
.
set_result: bool = True
class-attribute
instance-attribute
","text":"Whether the Executor should mark a specified parameter as a result.
"},{"location":"source/io/config/#io.config.ConcatenateStreamFilesParameters","title":"ConcatenateStreamFilesParameters
","text":" Bases: TaskParameters
Parameters for stream concatenation.
Concatenates the stream file output from CrystFEL indexing for multiple experimental runs.
Source code inlute/io/models/sfx_index.py
class ConcatenateStreamFilesParameters(TaskParameters):\n \"\"\"Parameters for stream concatenation.\n\n Concatenates the stream file output from CrystFEL indexing for multiple\n experimental runs.\n \"\"\"\n\n class Config(TaskParameters.Config):\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n in_file: str = Field(\n \"\",\n description=\"Root of directory tree storing stream files to merge.\",\n )\n\n tag: Optional[str] = Field(\n \"\",\n description=\"Tag identifying the stream files to merge.\",\n )\n\n out_file: str = Field(\n \"\", description=\"Path to merged output stream file.\", is_result=True\n )\n\n @validator(\"in_file\", always=True)\n def validate_in_file(cls, in_file: str, values: Dict[str, Any]) -> str:\n if in_file == \"\":\n stream_file: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\", \"IndexCrystFEL\", \"out_file\"\n )\n if stream_file:\n stream_dir: str = str(Path(stream_file).parent)\n return stream_dir\n return in_file\n\n @validator(\"tag\", always=True)\n def validate_tag(cls, tag: str, values: Dict[str, Any]) -> str:\n if tag == \"\":\n stream_file: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\", \"IndexCrystFEL\", \"out_file\"\n )\n if stream_file:\n stream_tag: str = Path(stream_file).name.split(\"_\")[0]\n return stream_tag\n return tag\n\n @validator(\"out_file\", always=True)\n def validate_out_file(cls, tag: str, values: Dict[str, Any]) -> str:\n if tag == \"\":\n stream_out_file: str = str(\n Path(values[\"in_file\"]).parent / f\"{values['tag'].stream}\"\n )\n return stream_out_file\n return tag\n
"},{"location":"source/io/config/#io.config.ConcatenateStreamFilesParameters.Config","title":"Config
","text":" Bases: Config
lute/io/models/sfx_index.py
class Config(TaskParameters.Config):\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n
"},{"location":"source/io/config/#io.config.ConcatenateStreamFilesParameters.Config.set_result","title":"set_result: bool = True
class-attribute
instance-attribute
","text":"Whether the Executor should mark a specified parameter as a result.
"},{"location":"source/io/config/#io.config.DimpleSolveParameters","title":"DimpleSolveParameters
","text":" Bases: ThirdPartyParameters
Parameters for CCP4's dimple program.
There are many parameters. For more information on usage, please refer to the CCP4 documentation, here: https://ccp4.github.io/dimple/
Source code inlute/io/models/sfx_solve.py
class DimpleSolveParameters(ThirdPartyParameters):\n \"\"\"Parameters for CCP4's dimple program.\n\n There are many parameters. For more information on\n usage, please refer to the CCP4 documentation, here:\n https://ccp4.github.io/dimple/\n \"\"\"\n\n executable: str = Field(\n \"/sdf/group/lcls/ds/tools/ccp4-8.0/bin/dimple\",\n description=\"CCP4 Dimple for solving structures with MR.\",\n flag_type=\"\",\n )\n # Positional requirements - all required.\n in_file: str = Field(\n \"\",\n description=\"Path to input mtz.\",\n flag_type=\"\",\n )\n pdb: str = Field(\"\", description=\"Path to a PDB.\", flag_type=\"\")\n out_dir: str = Field(\"\", description=\"Output DIRECTORY.\", flag_type=\"\")\n # Most used options\n mr_thresh: PositiveFloat = Field(\n 0.4,\n description=\"Threshold for molecular replacement.\",\n flag_type=\"--\",\n rename_param=\"mr-when-r\",\n )\n slow: Optional[bool] = Field(\n False, description=\"Perform more refinement.\", flag_type=\"--\"\n )\n # Other options (IO)\n hklout: str = Field(\n \"final.mtz\", description=\"Output mtz file name.\", flag_type=\"--\"\n )\n xyzout: str = Field(\n \"final.pdb\", description=\"Output PDB file name.\", flag_type=\"--\"\n )\n icolumn: Optional[str] = Field(\n # \"IMEAN\",\n description=\"Name for the I column.\",\n flag_type=\"--\",\n )\n sigicolumn: Optional[str] = Field(\n # \"SIG<ICOL>\",\n description=\"Name for the Sig<I> column.\",\n flag_type=\"--\",\n )\n fcolumn: Optional[str] = Field(\n # \"F\",\n description=\"Name for the F column.\",\n flag_type=\"--\",\n )\n sigfcolumn: Optional[str] = Field(\n # \"F\",\n description=\"Name for the Sig<F> column.\",\n flag_type=\"--\",\n )\n libin: Optional[str] = Field(\n description=\"Ligand descriptions for refmac (LIBIN).\", flag_type=\"--\"\n )\n refmac_key: Optional[str] = Field(\n description=\"Extra Refmac keywords to use in refinement.\",\n flag_type=\"--\",\n rename_param=\"refmac-key\",\n )\n free_r_flags: Optional[str] = Field(\n description=\"Path to a mtz file with freeR flags.\",\n flag_type=\"--\",\n rename_param=\"free-r-flags\",\n )\n freecolumn: Optional[Union[int, float]] = Field(\n # 0,\n description=\"Refree column with an optional value.\",\n flag_type=\"--\",\n )\n img_format: Optional[str] = Field(\n description=\"Format of generated images. 
(png, jpeg, none).\",\n flag_type=\"-\",\n rename_param=\"f\",\n )\n white_bg: bool = Field(\n False,\n description=\"Use a white background in Coot and in images.\",\n flag_type=\"--\",\n rename_param=\"white-bg\",\n )\n no_cleanup: bool = Field(\n False,\n description=\"Retain intermediate files.\",\n flag_type=\"--\",\n rename_param=\"no-cleanup\",\n )\n # Calculations\n no_blob_search: bool = Field(\n False,\n description=\"Do not search for unmodelled blobs.\",\n flag_type=\"--\",\n rename_param=\"no-blob-search\",\n )\n anode: bool = Field(\n False, description=\"Use SHELX/AnoDe to find peaks in the anomalous map.\"\n )\n # Run customization\n no_hetatm: bool = Field(\n False,\n description=\"Remove heteroatoms from the given model.\",\n flag_type=\"--\",\n rename_param=\"no-hetatm\",\n )\n rigid_cycles: Optional[PositiveInt] = Field(\n # 10,\n description=\"Number of cycles of rigid-body refinement to perform.\",\n flag_type=\"--\",\n rename_param=\"rigid-cycles\",\n )\n jelly: Optional[PositiveInt] = Field(\n # 4,\n description=\"Number of cycles of jelly-body refinement to perform.\",\n flag_type=\"--\",\n )\n restr_cycles: Optional[PositiveInt] = Field(\n # 8,\n description=\"Number of cycles of refmac final refinement to perform.\",\n flag_type=\"--\",\n rename_param=\"restr-cycles\",\n )\n lim_resolution: Optional[PositiveFloat] = Field(\n description=\"Limit the final resolution.\", flag_type=\"--\", rename_param=\"reso\"\n )\n weight: Optional[str] = Field(\n # \"auto-weight\",\n description=\"The refmac matrix weight.\",\n flag_type=\"--\",\n )\n mr_prog: Optional[str] = Field(\n # \"phaser\",\n description=\"Molecular replacement program. phaser or molrep.\",\n flag_type=\"--\",\n rename_param=\"mr-prog\",\n )\n mr_num: Optional[Union[str, int]] = Field(\n # \"auto\",\n description=\"Number of molecules to use for molecular replacement.\",\n flag_type=\"--\",\n rename_param=\"mr-num\",\n )\n mr_reso: Optional[PositiveFloat] = Field(\n # 3.25,\n description=\"High resolution for molecular replacement. If >10 interpreted as eLLG.\",\n flag_type=\"--\",\n rename_param=\"mr-reso\",\n )\n itof_prog: Optional[str] = Field(\n description=\"Program to calculate amplitudes. truncate, or ctruncate.\",\n flag_type=\"--\",\n rename_param=\"ItoF-prog\",\n )\n\n @validator(\"in_file\", always=True)\n def validate_in_file(cls, in_file: str, values: Dict[str, Any]) -> str:\n if in_file == \"\":\n get_hkl_file: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\", \"ManipulateHKL\", \"out_file\"\n )\n if get_hkl_file:\n return get_hkl_file\n return in_file\n\n @validator(\"out_dir\", always=True)\n def validate_out_dir(cls, out_dir: str, values: Dict[str, Any]) -> str:\n if out_dir == \"\":\n get_hkl_file: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\", \"ManipulateHKL\", \"out_file\"\n )\n if get_hkl_file:\n return os.path.dirname(get_hkl_file)\n return out_dir\n
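A minimal parameter sketch (paths are placeholders; if in_file or out_dir are left empty they are filled from the latest ManipulateHKL database entry by the validators above):

dimple_params = {
    "in_file": "/path/to/merged.mtz",  # placeholder input mtz
    "pdb": "/path/to/search_model.pdb",  # placeholder search model
    "out_dir": "/path/to/dimple_output",  # placeholder output directory
}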
"},{"location":"source/io/config/#io.config.FindOverlapXSSParameters","title":"FindOverlapXSSParameters
","text":" Bases: TaskParameters
TaskParameter model for FindOverlapXSS Task.
This Task determines spatial or temporal overlap between an optical pulse and the FEL pulse based on difference scattering (XSS) signal. This Task uses SmallData HDF5 files as a source.
Source code inlute/io/models/smd.py
class FindOverlapXSSParameters(TaskParameters):\n \"\"\"TaskParameter model for FindOverlapXSS Task.\n\n This Task determines spatial or temporal overlap between an optical pulse\n and the FEL pulse based on difference scattering (XSS) signal. This Task\n uses SmallData HDF5 files as a source.\n \"\"\"\n\n class ExpConfig(BaseModel):\n det_name: str\n ipm_var: str\n scan_var: Union[str, List[str]]\n\n class Thresholds(BaseModel):\n min_Iscat: Union[int, float]\n min_ipm: Union[int, float]\n\n class AnalysisFlags(BaseModel):\n use_pyfai: bool = True\n use_asymls: bool = False\n\n exp_config: ExpConfig\n thresholds: Thresholds\n analysis_flags: AnalysisFlags\n
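Because this model nests sub-models rather than flat fields, the configuration is supplied as nested mappings; a sketch with placeholder detector and variable names:

overlap_xss_config = {
    "exp_config": {"det_name": "epix_1", "ipm_var": "ipm4/sum", "scan_var": "lxt"},  # placeholders
    "thresholds": {"min_Iscat": 10, "min_ipm": 500},  # placeholders
    "analysis_flags": {"use_pyfai": True, "use_asymls": False},
}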
"},{"location":"source/io/config/#io.config.FindPeaksPsocakeParameters","title":"FindPeaksPsocakeParameters
","text":" Bases: ThirdPartyParameters
Parameters for crystallographic (Bragg) peak finding using Psocake.
This peak finding Task optionally has the ability to compress/decompress data with SZ for the purpose of compression validation. NOTE: This Task is deprecated and provided for compatibility only.
Source code inlute/io/models/sfx_find_peaks.py
class FindPeaksPsocakeParameters(ThirdPartyParameters):\n \"\"\"Parameters for crystallographic (Bragg) peak finding using Psocake.\n\n This peak finding Task optionally has the ability to compress/decompress\n data with SZ for the purpose of compression validation.\n NOTE: This Task is deprecated and provided for compatibility only.\n \"\"\"\n\n class Config(TaskParameters.Config):\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n result_from_params: str = \"\"\n \"\"\"Defines a result from the parameters. Use a validator to do so.\"\"\"\n\n class SZParameters(BaseModel):\n compressor: Literal[\"qoz\", \"sz3\"] = Field(\n \"qoz\", description=\"SZ compression algorithm (qoz, sz3)\"\n )\n binSize: int = Field(2, description=\"SZ compression's bin size paramater\")\n roiWindowSize: int = Field(\n 2, description=\"SZ compression's ROI window size paramater\"\n )\n absError: float = Field(10, descriptionp=\"Maximum absolute error value\")\n\n executable: str = Field(\"mpirun\", description=\"MPI executable.\", flag_type=\"\")\n np: PositiveInt = Field(\n max(int(os.environ.get(\"SLURM_NPROCS\", len(os.sched_getaffinity(0)))) - 1, 1),\n description=\"Number of processes\",\n flag_type=\"-\",\n )\n mca: str = Field(\n \"btl ^openib\", description=\"Mca option for the MPI executable\", flag_type=\"--\"\n )\n p_arg1: str = Field(\n \"python\", description=\"Executable to run with mpi (i.e. python).\", flag_type=\"\"\n )\n u: str = Field(\n \"\", description=\"Python option for unbuffered output.\", flag_type=\"-\"\n )\n p_arg2: str = Field(\n \"findPeaksSZ.py\",\n description=\"Executable to run with mpi (i.e. python).\",\n flag_type=\"\",\n )\n d: str = Field(description=\"Detector name\", flag_type=\"-\")\n e: str = Field(\"\", description=\"Experiment name\", flag_type=\"-\")\n r: int = Field(-1, description=\"Run number\", flag_type=\"-\")\n outDir: str = Field(\n description=\"Output directory where .cxi will be saved\", flag_type=\"--\"\n )\n algorithm: int = Field(1, description=\"PyAlgos algorithm to use\", flag_type=\"--\")\n alg_npix_min: float = Field(\n 1.0, description=\"PyAlgos algorithm's npix_min parameter\", flag_type=\"--\"\n )\n alg_npix_max: float = Field(\n 45.0, description=\"PyAlgos algorithm's npix_max parameter\", flag_type=\"--\"\n )\n alg_amax_thr: float = Field(\n 250.0, description=\"PyAlgos algorithm's amax_thr parameter\", flag_type=\"--\"\n )\n alg_atot_thr: float = Field(\n 330.0, description=\"PyAlgos algorithm's atot_thr parameter\", flag_type=\"--\"\n )\n alg_son_min: float = Field(\n 10.0, description=\"PyAlgos algorithm's son_min parameter\", flag_type=\"--\"\n )\n alg1_thr_low: float = Field(\n 80.0, description=\"PyAlgos algorithm's thr_low parameter\", flag_type=\"--\"\n )\n alg1_thr_high: float = Field(\n 270.0, description=\"PyAlgos algorithm's thr_high parameter\", flag_type=\"--\"\n )\n alg1_rank: int = Field(\n 3, description=\"PyAlgos algorithm's rank parameter\", flag_type=\"--\"\n )\n alg1_radius: int = Field(\n 3, description=\"PyAlgos algorithm's radius parameter\", flag_type=\"--\"\n )\n alg1_dr: int = Field(\n 1, description=\"PyAlgos algorithm's dr parameter\", flag_type=\"--\"\n )\n psanaMask_on: str = Field(\n \"True\", description=\"Whether psana's mask should be used\", flag_type=\"--\"\n )\n psanaMask_calib: str = Field(\n \"True\", description=\"Psana mask's calib parameter\", flag_type=\"--\"\n )\n psanaMask_status: str = Field(\n \"True\", description=\"Psana mask's status 
parameter\", flag_type=\"--\"\n )\n psanaMask_edges: str = Field(\n \"True\", description=\"Psana mask's edges parameter\", flag_type=\"--\"\n )\n psanaMask_central: str = Field(\n \"True\", description=\"Psana mask's central parameter\", flag_type=\"--\"\n )\n psanaMask_unbond: str = Field(\n \"True\", description=\"Psana mask's unbond parameter\", flag_type=\"--\"\n )\n psanaMask_unbondnrs: str = Field(\n \"True\", description=\"Psana mask's unbondnbrs parameter\", flag_type=\"--\"\n )\n mask: str = Field(\n \"\", description=\"Path to an additional mask to apply\", flag_type=\"--\"\n )\n clen: str = Field(\n description=\"Epics variable storing the camera length\", flag_type=\"--\"\n )\n coffset: float = Field(0, description=\"Camera offset in m\", flag_type=\"--\")\n minPeaks: int = Field(\n 15,\n description=\"Minimum number of peaks to mark frame for indexing\",\n flag_type=\"--\",\n )\n maxPeaks: int = Field(\n 15,\n description=\"Maximum number of peaks to mark frame for indexing\",\n flag_type=\"--\",\n )\n minRes: int = Field(\n 0,\n description=\"Minimum peak resolution to mark frame for indexing \",\n flag_type=\"--\",\n )\n sample: str = Field(\"\", description=\"Sample name\", flag_type=\"--\")\n instrument: Union[None, str] = Field(\n None, description=\"Instrument name\", flag_type=\"--\"\n )\n pixelSize: float = Field(0.0, description=\"Pixel size\", flag_type=\"--\")\n auto: str = Field(\n \"False\",\n description=(\n \"Whether to automatically determine peak per event peak \"\n \"finding parameters\"\n ),\n flag_type=\"--\",\n )\n detectorDistance: float = Field(\n 0.0, description=\"Detector distance from interaction point in m\", flag_type=\"--\"\n )\n access: Literal[\"ana\", \"ffb\"] = Field(\n \"ana\", description=\"Data node type: {ana,ffb}\", flag_type=\"--\"\n )\n szfile: str = Field(\"qoz.json\", description=\"Path to SZ's JSON configuration file\")\n lute_template_cfg: TemplateConfig = Field(\n TemplateConfig(\n template_name=\"sz.json\",\n output_path=\"\", # Will want to change where this goes...\n ),\n description=\"Template information for the sz.json file\",\n )\n sz_parameters: SZParameters = Field(\n description=\"Configuration parameters for SZ Compression\", flag_type=\"\"\n )\n\n @validator(\"e\", always=True)\n def validate_e(cls, e: str, values: Dict[str, Any]) -> str:\n if e == \"\":\n return values[\"lute_config\"].experiment\n return e\n\n @validator(\"r\", always=True)\n def validate_r(cls, r: int, values: Dict[str, Any]) -> int:\n if r == -1:\n return values[\"lute_config\"].run\n return r\n\n @validator(\"lute_template_cfg\", always=True)\n def set_output_path(\n cls, lute_template_cfg: TemplateConfig, values: Dict[str, Any]\n ) -> TemplateConfig:\n if lute_template_cfg.output_path == \"\":\n lute_template_cfg.output_path = values[\"szfile\"]\n return lute_template_cfg\n\n @validator(\"sz_parameters\", always=True)\n def set_sz_compression_parameters(\n cls, sz_parameters: SZParameters, values: Dict[str, Any]\n ) -> None:\n values[\"compressor\"] = sz_parameters.compressor\n values[\"binSize\"] = sz_parameters.binSize\n values[\"roiWindowSize\"] = sz_parameters.roiWindowSize\n if sz_parameters.compressor == \"qoz\":\n values[\"pressio_opts\"] = {\n \"pressio:abs\": sz_parameters.absError,\n \"qoz\": {\"qoz:stride\": 8},\n }\n else:\n values[\"pressio_opts\"] = {\"pressio:abs\": sz_parameters.absError}\n return None\n\n @root_validator(pre=False)\n def define_result(cls, values: Dict[str, Any]) -> Dict[str, Any]:\n exp: str = 
values[\"lute_config\"].experiment\n run: int = int(values[\"lute_config\"].run)\n directory: str = values[\"outDir\"]\n fname: str = f\"{exp}_{run:04d}.lst\"\n\n cls.Config.result_from_params = f\"{directory}/{fname}\"\n return values\n
"},{"location":"source/io/config/#io.config.FindPeaksPsocakeParameters.Config","title":"Config
","text":" Bases: Config
lute/io/models/sfx_find_peaks.py
class Config(TaskParameters.Config):\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n result_from_params: str = \"\"\n \"\"\"Defines a result from the parameters. Use a validator to do so.\"\"\"\n
"},{"location":"source/io/config/#io.config.FindPeaksPsocakeParameters.Config.result_from_params","title":"result_from_params: str = ''
class-attribute
instance-attribute
","text":"Defines a result from the parameters. Use a validator to do so.
"},{"location":"source/io/config/#io.config.FindPeaksPsocakeParameters.Config.set_result","title":"set_result: bool = True
class-attribute
instance-attribute
","text":"Whether the Executor should mark a specified parameter as a result.
"},{"location":"source/io/config/#io.config.FindPeaksPyAlgosParameters","title":"FindPeaksPyAlgosParameters
","text":" Bases: TaskParameters
Parameters for crystallographic (Bragg) peak finding using PyAlgos.
This peak finding Task optionally has the ability to compress/decompress data with SZ for the purpose of compression validation.
Source code inlute/io/models/sfx_find_peaks.py
class FindPeaksPyAlgosParameters(TaskParameters):\n \"\"\"Parameters for crystallographic (Bragg) peak finding using PyAlgos.\n\n This peak finding Task optionally has the ability to compress/decompress\n data with SZ for the purpose of compression validation.\n \"\"\"\n\n class Config(TaskParameters.Config):\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n class SZCompressorParameters(BaseModel):\n compressor: Literal[\"qoz\", \"sz3\"] = Field(\n \"qoz\", description='Compression algorithm (\"qoz\" or \"sz3\")'\n )\n abs_error: float = Field(10.0, description=\"Absolute error bound\")\n bin_size: int = Field(2, description=\"Bin size\")\n roi_window_size: int = Field(\n 9,\n description=\"Default window size\",\n )\n\n outdir: str = Field(\n description=\"Output directory for cxi files\",\n )\n n_events: int = Field(\n 0,\n description=\"Number of events to process (0 to process all events)\",\n )\n det_name: str = Field(\n description=\"Psana name of the detector storing the image data\",\n )\n event_receiver: Literal[\"evr0\", \"evr1\"] = Field(\n description=\"Event Receiver to be used: evr0 or evr1\",\n )\n tag: str = Field(\n \"\",\n description=\"Tag to add to the output file names\",\n )\n pv_camera_length: Union[str, float] = Field(\n \"\",\n description=\"PV associated with camera length \"\n \"(if a number, camera length directly)\",\n )\n event_logic: bool = Field(\n False,\n description=\"True if only events with a specific event code should be \"\n \"processed. False if the event code should be ignored\",\n )\n event_code: int = Field(\n 0,\n description=\"Required events code for events to be processed if event logic \"\n \"is True\",\n )\n psana_mask: bool = Field(\n False,\n description=\"If True, apply mask from psana Detector object\",\n )\n mask_file: Union[str, None] = Field(\n None,\n description=\"File with a custom mask to apply. 
If None, no custom mask is \"\n \"applied\",\n )\n min_peaks: int = Field(2, description=\"Minimum number of peaks per image\")\n max_peaks: int = Field(\n 2048,\n description=\"Maximum number of peaks per image\",\n )\n npix_min: int = Field(\n 2,\n description=\"Minimum number of pixels per peak\",\n )\n npix_max: int = Field(\n 30,\n description=\"Maximum number of pixels per peak\",\n )\n amax_thr: float = Field(\n 80.0,\n description=\"Minimum intensity threshold for starting a peak\",\n )\n atot_thr: float = Field(\n 120.0,\n description=\"Minimum summed intensity threshold for pixel collection\",\n )\n son_min: float = Field(\n 7.0,\n description=\"Minimum signal-to-noise ratio to be considered a peak\",\n )\n peak_rank: int = Field(\n 3,\n description=\"Radius in which central peak pixel is a local maximum\",\n )\n r0: float = Field(\n 3.0,\n description=\"Radius of ring for background evaluation in pixels\",\n )\n dr: float = Field(\n 2.0,\n description=\"Width of ring for background evaluation in pixels\",\n )\n nsigm: float = Field(\n 7.0,\n description=\"Intensity threshold to include pixel in connected group\",\n )\n compression: Optional[SZCompressorParameters] = Field(\n None,\n description=\"Options for the SZ Compression Algorithm\",\n )\n out_file: str = Field(\n \"\",\n description=\"Path to output file.\",\n flag_type=\"-\",\n rename_param=\"o\",\n is_result=True,\n )\n\n @validator(\"out_file\", always=True)\n def validate_out_file(cls, out_file: str, values: Dict[str, Any]) -> str:\n if out_file == \"\":\n fname: Path = (\n Path(values[\"outdir\"])\n / f\"{values['lute_config'].experiment}_{values['lute_config'].run}_\"\n f\"{values['tag']}.list\"\n )\n return str(fname)\n return out_file\n
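If out_file is left empty, validate_out_file above builds a default name from outdir, the experiment, the run and the tag. A standalone sketch of that naming logic (the experiment name and paths below are placeholders):
from pathlib import Path

def default_peak_list(outdir: str, experiment: str, run: int, tag: str = "") -> str:
    # Mirrors validate_out_file: "<outdir>/<experiment>_<run>_<tag>.list"
    return str(Path(outdir) / f"{experiment}_{run}_{tag}.list")

print(default_peak_list("/path/to/work_dir", "exp0000", 12))
# -> /path/to/work_dir/exp0000_12_.list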
"},{"location":"source/io/config/#io.config.FindPeaksPyAlgosParameters.Config","title":"Config
","text":" Bases: Config
lute/io/models/sfx_find_peaks.py
class Config(TaskParameters.Config):\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n
"},{"location":"source/io/config/#io.config.FindPeaksPyAlgosParameters.Config.set_result","title":"set_result: bool = True
class-attribute
instance-attribute
","text":"Whether the Executor should mark a specified parameter as a result.
"},{"location":"source/io/config/#io.config.IndexCrystFELParameters","title":"IndexCrystFELParameters
","text":" Bases: ThirdPartyParameters
Parameters for CrystFEL's indexamajig
.
There are many parameters, and many combinations. For more information on usage, please refer to the CrystFEL documentation, here: https://www.desy.de/~twhite/crystfel/manual-indexamajig.html
Source code inlute/io/models/sfx_index.py
class IndexCrystFELParameters(ThirdPartyParameters):\n \"\"\"Parameters for CrystFEL's `indexamajig`.\n\n There are many parameters, and many combinations. For more information on\n usage, please refer to the CrystFEL documentation, here:\n https://www.desy.de/~twhite/crystfel/manual-indexamajig.html\n \"\"\"\n\n class Config(ThirdPartyParameters.Config):\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n long_flags_use_eq: bool = True\n \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n\n executable: str = Field(\n \"/sdf/group/lcls/ds/tools/crystfel/0.10.2/bin/indexamajig\",\n description=\"CrystFEL's indexing binary.\",\n flag_type=\"\",\n )\n # Basic options\n in_file: Optional[str] = Field(\n \"\", description=\"Path to input file.\", flag_type=\"-\", rename_param=\"i\"\n )\n out_file: str = Field(\n \"\",\n description=\"Path to output file.\",\n flag_type=\"-\",\n rename_param=\"o\",\n is_result=True,\n )\n geometry: str = Field(\n \"\", description=\"Path to geometry file.\", flag_type=\"-\", rename_param=\"g\"\n )\n zmq_input: Optional[str] = Field(\n description=\"ZMQ address to receive data over. `input` and `zmq-input` are mutually exclusive\",\n flag_type=\"--\",\n rename_param=\"zmq-input\",\n )\n zmq_subscribe: Optional[str] = Field( # Can be used multiple times...\n description=\"Subscribe to ZMQ message of type `tag`\",\n flag_type=\"--\",\n rename_param=\"zmq-subscribe\",\n )\n zmq_request: Optional[AnyUrl] = Field(\n description=\"Request new data over ZMQ by sending this value\",\n flag_type=\"--\",\n rename_param=\"zmq-request\",\n )\n asapo_endpoint: Optional[str] = Field(\n description=\"ASAP::O endpoint. zmq-input and this are mutually exclusive.\",\n flag_type=\"--\",\n rename_param=\"asapo-endpoint\",\n )\n asapo_token: Optional[str] = Field(\n description=\"ASAP::O authentication token.\",\n flag_type=\"--\",\n rename_param=\"asapo-token\",\n )\n asapo_beamtime: Optional[str] = Field(\n description=\"ASAP::O beatime.\",\n flag_type=\"--\",\n rename_param=\"asapo-beamtime\",\n )\n asapo_source: Optional[str] = Field(\n description=\"ASAP::O data source.\",\n flag_type=\"--\",\n rename_param=\"asapo-source\",\n )\n asapo_group: Optional[str] = Field(\n description=\"ASAP::O consumer group.\",\n flag_type=\"--\",\n rename_param=\"asapo-group\",\n )\n asapo_stream: Optional[str] = Field(\n description=\"ASAP::O stream.\",\n flag_type=\"--\",\n rename_param=\"asapo-stream\",\n )\n asapo_wait_for_stream: Optional[str] = Field(\n description=\"If ASAP::O stream does not exist, wait for it to appear.\",\n flag_type=\"--\",\n rename_param=\"asapo-wait-for-stream\",\n )\n data_format: Optional[str] = Field(\n description=\"Specify format for ZMQ or ASAP::O. `msgpack`, `hdf5` or `seedee`.\",\n flag_type=\"--\",\n rename_param=\"data-format\",\n )\n basename: bool = Field(\n False,\n description=\"Remove directory parts of filenames. Acts before prefix if prefix also given.\",\n flag_type=\"--\",\n )\n prefix: Optional[str] = Field(\n description=\"Add a prefix to the filenames from the infile argument.\",\n flag_type=\"--\",\n rename_param=\"asapo-stream\",\n )\n nthreads: PositiveInt = Field(\n max(int(os.environ.get(\"SLURM_NPROCS\", len(os.sched_getaffinity(0)))) - 1, 1),\n description=\"Number of threads to use. 
See also `max_indexer_threads`.\",\n flag_type=\"-\",\n rename_param=\"j\",\n )\n no_check_prefix: bool = Field(\n False,\n description=\"Don't attempt to correct the prefix if it seems incorrect.\",\n flag_type=\"--\",\n rename_param=\"no-check-prefix\",\n )\n highres: Optional[float] = Field(\n description=\"Mark all pixels greater than `x` has bad.\", flag_type=\"--\"\n )\n profile: bool = Field(\n False, description=\"Display timing data to monitor performance.\", flag_type=\"--\"\n )\n temp_dir: Optional[str] = Field(\n description=\"Specify a path for the temp files folder.\",\n flag_type=\"--\",\n rename_param=\"temp-dir\",\n )\n wait_for_file: conint(gt=-2) = Field(\n 0,\n description=\"Wait at most `x` seconds for a file to be created. A value of -1 means wait forever.\",\n flag_type=\"--\",\n rename_param=\"wait-for-file\",\n )\n no_image_data: bool = Field(\n False,\n description=\"Load only the metadata, no iamges. Can check indexability without high data requirements.\",\n flag_type=\"--\",\n rename_param=\"no-image-data\",\n )\n # Peak-finding options\n # ....\n # Indexing options\n indexing: Optional[str] = Field(\n description=\"Comma-separated list of supported indexing algorithms to use. Default is to automatically detect.\",\n flag_type=\"--\",\n )\n cell_file: Optional[str] = Field(\n description=\"Path to a file containing unit cell information (PDB or CrystFEL format).\",\n flag_type=\"-\",\n rename_param=\"p\",\n )\n tolerance: str = Field(\n \"5,5,5,1.5\",\n description=(\n \"Tolerances (in percent) for unit cell comparison. \"\n \"Comma-separated list a,b,c,angle. Default=5,5,5,1.5\"\n ),\n flag_type=\"--\",\n )\n no_check_cell: bool = Field(\n False,\n description=\"Do not check cell parameters against unit cell. Replaces '-raw' method.\",\n flag_type=\"--\",\n rename_param=\"no-check-cell\",\n )\n no_check_peaks: bool = Field(\n False,\n description=\"Do not verify peaks are accounted for by solution.\",\n flag_type=\"--\",\n rename_param=\"no-check-peaks\",\n )\n multi: bool = Field(\n False, description=\"Enable multi-lattice indexing.\", flag_type=\"--\"\n )\n wavelength_estimate: Optional[float] = Field(\n description=\"Estimate for X-ray wavelength. Required for some methods.\",\n flag_type=\"--\",\n rename_param=\"wavelength-estimate\",\n )\n camera_length_estimate: Optional[float] = Field(\n description=\"Estimate for camera distance. Required for some methods.\",\n flag_type=\"--\",\n rename_param=\"camera-length-estimate\",\n )\n max_indexer_threads: Optional[PositiveInt] = Field(\n # 1,\n description=\"Some indexing algos can use multiple threads. 
In addition to image-based.\",\n flag_type=\"--\",\n rename_param=\"max-indexer-threads\",\n )\n no_retry: bool = Field(\n False,\n description=\"Do not remove weak peaks and try again.\",\n flag_type=\"--\",\n rename_param=\"no-retry\",\n )\n no_refine: bool = Field(\n False,\n description=\"Skip refinement step.\",\n flag_type=\"--\",\n rename_param=\"no-refine\",\n )\n no_revalidate: bool = Field(\n False,\n description=\"Skip revalidation step.\",\n flag_type=\"--\",\n rename_param=\"no-revalidate\",\n )\n # TakeTwo specific parameters\n taketwo_member_threshold: Optional[PositiveInt] = Field(\n # 20,\n description=\"Minimum number of vectors to consider.\",\n flag_type=\"--\",\n rename_param=\"taketwo-member-threshold\",\n )\n taketwo_len_tolerance: Optional[PositiveFloat] = Field(\n # 0.001,\n description=\"TakeTwo length tolerance in Angstroms.\",\n flag_type=\"--\",\n rename_param=\"taketwo-len-tolerance\",\n )\n taketwo_angle_tolerance: Optional[PositiveFloat] = Field(\n # 0.6,\n description=\"TakeTwo angle tolerance in degrees.\",\n flag_type=\"--\",\n rename_param=\"taketwo-angle-tolerance\",\n )\n taketwo_trace_tolerance: Optional[PositiveFloat] = Field(\n # 3,\n description=\"Matrix trace tolerance in degrees.\",\n flag_type=\"--\",\n rename_param=\"taketwo-trace-tolerance\",\n )\n # Felix-specific parameters\n # felix_domega\n # felix-fraction-max-visits\n # felix-max-internal-angle\n # felix-max-uniqueness\n # felix-min-completeness\n # felix-min-visits\n # felix-num-voxels\n # felix-sigma\n # felix-tthrange-max\n # felix-tthrange-min\n # XGANDALF-specific parameters\n xgandalf_sampling_pitch: Optional[NonNegativeInt] = Field(\n # 6,\n description=\"Density of reciprocal space sampling.\",\n flag_type=\"--\",\n rename_param=\"xgandalf-sampling-pitch\",\n )\n xgandalf_grad_desc_iterations: Optional[NonNegativeInt] = Field(\n # 4,\n description=\"Number of gradient descent iterations.\",\n flag_type=\"--\",\n rename_param=\"xgandalf-grad-desc-iterations\",\n )\n xgandalf_tolerance: Optional[PositiveFloat] = Field(\n # 0.02,\n description=\"Relative tolerance of lattice vectors\",\n flag_type=\"--\",\n rename_param=\"xgandalf-tolerance\",\n )\n xgandalf_no_deviation_from_provided_cell: Optional[bool] = Field(\n description=\"Found unit cell must match provided.\",\n flag_type=\"--\",\n rename_param=\"xgandalf-no-deviation-from-provided-cell\",\n )\n xgandalf_min_lattice_vector_length: Optional[PositiveFloat] = Field(\n # 30,\n description=\"Minimum possible lattice length.\",\n flag_type=\"--\",\n rename_param=\"xgandalf-min-lattice-vector-length\",\n )\n xgandalf_max_lattice_vector_length: Optional[PositiveFloat] = Field(\n # 250,\n description=\"Minimum possible lattice length.\",\n flag_type=\"--\",\n rename_param=\"xgandalf-max-lattice-vector-length\",\n )\n xgandalf_max_peaks: Optional[PositiveInt] = Field(\n # 250,\n description=\"Maximum number of peaks to use for indexing.\",\n flag_type=\"--\",\n rename_param=\"xgandalf-max-peaks\",\n )\n xgandalf_fast_execution: bool = Field(\n False,\n description=\"Shortcut to set sampling-pitch=2, and grad-desc-iterations=3.\",\n flag_type=\"--\",\n rename_param=\"xgandalf-fast-execution\",\n )\n # pinkIndexer parameters\n # ...\n # asdf_fast: bool = Field(False, description=\"Enable fast mode for asdf. 
3x faster for 7% loss in accuracy.\", flag_type=\"--\", rename_param=\"asdf-fast\")\n # Integration parameters\n integration: str = Field(\n \"rings-nocen\", description=\"Method for integrating reflections.\", flag_type=\"--\"\n )\n fix_profile_radius: Optional[float] = Field(\n description=\"Fix the profile radius (m^{-1})\",\n flag_type=\"--\",\n rename_param=\"fix-profile-radius\",\n )\n fix_divergence: Optional[float] = Field(\n 0,\n description=\"Fix the divergence (rad, full angle).\",\n flag_type=\"--\",\n rename_param=\"fix-divergence\",\n )\n int_radius: str = Field(\n \"4,5,7\",\n description=\"Inner, middle, and outer radii for 3-ring integration.\",\n flag_type=\"--\",\n rename_param=\"int-radius\",\n )\n int_diag: str = Field(\n \"none\",\n description=\"Show detailed information on integration when condition is met.\",\n flag_type=\"--\",\n rename_param=\"int-diag\",\n )\n push_res: str = Field(\n \"infinity\",\n description=\"Integrate `x` higher than apparent resolution limit (nm-1).\",\n flag_type=\"--\",\n rename_param=\"push-res\",\n )\n overpredict: bool = Field(\n False,\n description=\"Over-predict reflections. Maybe useful with post-refinement.\",\n flag_type=\"--\",\n )\n cell_parameters_only: bool = Field(\n False, description=\"Do not predict refletions at all\", flag_type=\"--\"\n )\n # Output parameters\n no_non_hits_in_stream: bool = Field(\n False,\n description=\"Exclude non-hits from the stream file.\",\n flag_type=\"--\",\n rename_param=\"no-non-hits-in-stream\",\n )\n copy_hheader: Optional[str] = Field(\n description=\"Copy information from header in the image to output stream.\",\n flag_type=\"--\",\n rename_param=\"copy-hheader\",\n )\n no_peaks_in_stream: bool = Field(\n False,\n description=\"Do not record peaks in stream file.\",\n flag_type=\"--\",\n rename_param=\"no-peaks-in-stream\",\n )\n no_refls_in_stream: bool = Field(\n False,\n description=\"Do not record reflections in stream.\",\n flag_type=\"--\",\n rename_param=\"no-refls-in-stream\",\n )\n serial_offset: Optional[PositiveInt] = Field(\n description=\"Start numbering at `x` instead of 1.\",\n flag_type=\"--\",\n rename_param=\"serial-offset\",\n )\n harvest_file: Optional[str] = Field(\n description=\"Write parameters to file in JSON format.\",\n flag_type=\"--\",\n rename_param=\"harvest-file\",\n )\n\n @validator(\"in_file\", always=True)\n def validate_in_file(cls, in_file: str, values: Dict[str, Any]) -> str:\n if in_file == \"\":\n filename: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\", \"FindPeaksPyAlgos\", \"out_file\"\n )\n if filename is None:\n exp: str = values[\"lute_config\"].experiment\n run: int = int(values[\"lute_config\"].run)\n tag: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\", \"FindPeaksPsocake\", \"tag\"\n )\n out_dir: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\", \"FindPeaksPsocake\", \"outDir\"\n )\n if out_dir is not None:\n fname: str = f\"{out_dir}/{exp}_{run:04d}\"\n if tag is not None:\n fname = f\"{fname}_{tag}\"\n return f\"{fname}.lst\"\n else:\n return filename\n return in_file\n\n @validator(\"out_file\", always=True)\n def validate_out_file(cls, out_file: str, values: Dict[str, Any]) -> str:\n if out_file == \"\":\n expmt: str = values[\"lute_config\"].experiment\n run: int = int(values[\"lute_config\"].run)\n work_dir: str = values[\"lute_config\"].work_dir\n fname: str = f\"{expmt}_r{run:04d}.stream\"\n return f\"{work_dir}/{fname}\"\n return out_file\n
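As a rough illustration of how flag_type, rename_param and Config.long_flags_use_eq shape the generated command line (this is not LUTE's actual argument builder, only a sketch of the convention the fields above describe; file names are placeholders):
def render_arg(flag_type: str, name: str, value, use_eq: bool = False) -> str:
    # `name` is the field name after any rename_param substitution.
    flag = f"{flag_type}{name}"
    return f"{flag}={value}" if use_eq else f"{flag} {value}"

# long_flags_use_eq = True for this model, so long options use "=":
print(render_arg("--", "indexing", "xgandalf", use_eq=True))  # --indexing=xgandalf
# in_file is renamed to "i" via rename_param and passed as a short flag:
print(render_arg("-", "i", "peaks.lst"))                      # -i peaks.lst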
"},{"location":"source/io/config/#io.config.IndexCrystFELParameters.Config","title":"Config
","text":" Bases: Config
lute/io/models/sfx_index.py
class Config(ThirdPartyParameters.Config):\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n long_flags_use_eq: bool = True\n \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n
"},{"location":"source/io/config/#io.config.IndexCrystFELParameters.Config.long_flags_use_eq","title":"long_flags_use_eq: bool = True
class-attribute
instance-attribute
","text":"Whether long command-line arguments are passed like --long=arg
.
set_result: bool = True
class-attribute
instance-attribute
","text":"Whether the Executor should mark a specified parameter as a result.
"},{"location":"source/io/config/#io.config.ManipulateHKLParameters","title":"ManipulateHKLParameters
","text":" Bases: ThirdPartyParameters
Parameters for CrystFEL's get_hkl
for manipulating lists of reflections.
This Task is predominantly used internally to convert hkl
to mtz
files. Note that performing multiple manipulations is undefined behaviour. Run the Task with multiple configurations in explicit separate steps. For more information on usage, please refer to the CrystFEL documentation, here: https://www.desy.de/~twhite/crystfel/manual-partialator.html
lute/io/models/sfx_merge.py
class ManipulateHKLParameters(ThirdPartyParameters):\n \"\"\"Parameters for CrystFEL's `get_hkl` for manipulating lists of reflections.\n\n This Task is predominantly used internally to convert `hkl` to `mtz` files.\n Note that performing multiple manipulations is undefined behaviour. Run\n the Task with multiple configurations in explicit separate steps. For more\n information on usage, please refer to the CrystFEL documentation, here:\n https://www.desy.de/~twhite/crystfel/manual-partialator.html\n \"\"\"\n\n class Config(ThirdPartyParameters.Config):\n long_flags_use_eq: bool = True\n \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n executable: str = Field(\n \"/sdf/group/lcls/ds/tools/crystfel/0.10.2/bin/get_hkl\",\n description=\"CrystFEL's reflection manipulation binary.\",\n flag_type=\"\",\n )\n in_file: str = Field(\n \"\",\n description=\"Path to input HKL file.\",\n flag_type=\"-\",\n rename_param=\"i\",\n )\n out_file: str = Field(\n \"\",\n description=\"Path to output file.\",\n flag_type=\"-\",\n rename_param=\"o\",\n is_result=True,\n )\n cell_file: str = Field(\n \"\",\n description=\"Path to a file containing unit cell information (PDB or CrystFEL format).\",\n flag_type=\"-\",\n rename_param=\"p\",\n )\n output_format: str = Field(\n \"mtz\",\n description=\"Output format. One of mtz, mtz-bij, or xds. Otherwise CrystFEL format.\",\n flag_type=\"--\",\n rename_param=\"output-format\",\n )\n expand: Optional[str] = Field(\n description=\"Reflections will be expanded to fill asymmetric unit of specified point group.\",\n flag_type=\"--\",\n )\n # Reducing reflections to higher symmetry\n twin: Optional[str] = Field(\n description=\"Reflections equivalent to specified point group will have intensities summed.\",\n flag_type=\"--\",\n )\n no_need_all_parts: Optional[bool] = Field(\n description=\"Use with --twin to allow reflections missing a 'twin mate' to be written out.\",\n flag_type=\"--\",\n rename_param=\"no-need-all-parts\",\n )\n # Noise - Add to data\n noise: Optional[bool] = Field(\n description=\"Generate 10% uniform noise.\", flag_type=\"--\"\n )\n poisson: Optional[bool] = Field(\n description=\"Generate Poisson noise. Intensities assumed to be A.U.\",\n flag_type=\"--\",\n )\n adu_per_photon: Optional[int] = Field(\n description=\"Use with --poisson to convert A.U. to photons.\",\n flag_type=\"--\",\n rename_param=\"adu-per-photon\",\n )\n # Remove duplicate reflections\n trim_centrics: Optional[bool] = Field(\n description=\"Duplicated reflections (according to symmetry) are removed.\",\n flag_type=\"--\",\n )\n # Restrict to template file\n template: Optional[str] = Field(\n description=\"Only reflections which also appear in specified file are written out.\",\n flag_type=\"--\",\n )\n # Multiplicity\n multiplicity: Optional[bool] = Field(\n description=\"Reflections are multiplied by their symmetric multiplicites.\",\n flag_type=\"--\",\n )\n # Resolution cutoffs\n cutoff_angstroms: Optional[Union[str, int, float]] = Field(\n description=\"Either n, or n1,n2,n3. For n, reflections < n are removed. 
For n1,n2,n3 anisotropic truncation performed at separate resolution limits for a*, b*, c*.\",\n        flag_type=\"--\",\n        rename_param=\"cutoff-angstroms\",\n    )\n    lowres: Optional[float] = Field(\n        description=\"Remove reflections with d > n\", flag_type=\"--\"\n    )\n    highres: Optional[float] = Field(\n        description=\"Synonym for first form of --cutoff-angstroms\"\n    )\n    reindex: Optional[str] = Field(\n        description=\"Reindex according to specified operator. E.g. k,h,-l.\",\n        flag_type=\"--\",\n    )\n    # Override input symmetry\n    symmetry: Optional[str] = Field(\n        description=\"Point group symmetry to use to override. Almost always OMIT this option.\",\n        flag_type=\"--\",\n    )\n\n    @validator(\"in_file\", always=True)\n    def validate_in_file(cls, in_file: str, values: Dict[str, Any]) -> str:\n        if in_file == \"\":\n            partialator_file: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\", \"MergePartialator\", \"out_file\"\n            )\n            if partialator_file:\n                return partialator_file\n        return in_file\n\n    @validator(\"out_file\", always=True)\n    def validate_out_file(cls, out_file: str, values: Dict[str, Any]) -> str:\n        if out_file == \"\":\n            partialator_file: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\", \"MergePartialator\", \"out_file\"\n            )\n            if partialator_file:\n                mtz_out: str = partialator_file.split(\".\")[0]\n                mtz_out = f\"{mtz_out}.mtz\"\n                return mtz_out\n        return out_file\n\n    @validator(\"cell_file\", always=True)\n    def validate_cell_file(cls, cell_file: str, values: Dict[str, Any]) -> str:\n        if cell_file == \"\":\n            idx_cell_file: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\",\n                \"IndexCrystFEL\",\n                \"cell_file\",\n                valid_only=False,\n            )\n            if idx_cell_file:\n                return idx_cell_file\n        return cell_file\n
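When out_file is empty, validate_out_file above reuses the MergePartialator output name and swaps its extension for .mtz. A standalone sketch of that derivation (the path is a placeholder):
def default_mtz_name(partialator_file: str) -> str:
    # Mirrors validate_out_file: keep everything before the first "." and append ".mtz"
    stem = partialator_file.split(".")[0]
    return f"{stem}.mtz"

print(default_mtz_name("/path/to/work_dir/exp0000_r0012.hkl"))
# -> /path/to/work_dir/exp0000_r0012.mtz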
"},{"location":"source/io/config/#io.config.ManipulateHKLParameters.Config","title":"Config
","text":" Bases: Config
lute/io/models/sfx_merge.py
class Config(ThirdPartyParameters.Config):\n long_flags_use_eq: bool = True\n \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n
"},{"location":"source/io/config/#io.config.ManipulateHKLParameters.Config.long_flags_use_eq","title":"long_flags_use_eq: bool = True
class-attribute
instance-attribute
","text":"Whether long command-line arguments are passed like --long=arg
.
set_result: bool = True
class-attribute
instance-attribute
","text":"Whether the Executor should mark a specified parameter as a result.
"},{"location":"source/io/config/#io.config.MergePartialatorParameters","title":"MergePartialatorParameters
","text":" Bases: ThirdPartyParameters
Parameters for CrystFEL's partialator
.
There are many parameters, and many combinations. For more information on usage, please refer to the CrystFEL documentation, here: https://www.desy.de/~twhite/crystfel/manual-partialator.html
Source code inlute/io/models/sfx_merge.py
class MergePartialatorParameters(ThirdPartyParameters):\n \"\"\"Parameters for CrystFEL's `partialator`.\n\n There are many parameters, and many combinations. For more information on\n usage, please refer to the CrystFEL documentation, here:\n https://www.desy.de/~twhite/crystfel/manual-partialator.html\n \"\"\"\n\n class Config(ThirdPartyParameters.Config):\n long_flags_use_eq: bool = True\n \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n executable: str = Field(\n \"/sdf/group/lcls/ds/tools/crystfel/0.10.2/bin/partialator\",\n description=\"CrystFEL's Partialator binary.\",\n flag_type=\"\",\n )\n in_file: Optional[str] = Field(\n \"\", description=\"Path to input stream.\", flag_type=\"-\", rename_param=\"i\"\n )\n out_file: str = Field(\n \"\",\n description=\"Path to output file.\",\n flag_type=\"-\",\n rename_param=\"o\",\n is_result=True,\n )\n symmetry: str = Field(description=\"Point group symmetry.\", flag_type=\"--\")\n niter: Optional[int] = Field(\n description=\"Number of cycles of scaling and post-refinement.\",\n flag_type=\"-\",\n rename_param=\"n\",\n )\n no_scale: Optional[bool] = Field(\n description=\"Disable scaling.\", flag_type=\"--\", rename_param=\"no-scale\"\n )\n no_Bscale: Optional[bool] = Field(\n description=\"Disable Debye-Waller part of scaling.\",\n flag_type=\"--\",\n rename_param=\"no-Bscale\",\n )\n no_pr: Optional[bool] = Field(\n description=\"Disable orientation model.\", flag_type=\"--\", rename_param=\"no-pr\"\n )\n no_deltacchalf: Optional[bool] = Field(\n description=\"Disable rejection based on deltaCC1/2.\",\n flag_type=\"--\",\n rename_param=\"no-deltacchalf\",\n )\n model: str = Field(\n \"unity\",\n description=\"Partiality model. Options: xsphere, unity, offset, ggpm.\",\n flag_type=\"--\",\n )\n nthreads: int = Field(\n max(int(os.environ.get(\"SLURM_NPROCS\", len(os.sched_getaffinity(0)))) - 1, 1),\n description=\"Number of parallel analyses.\",\n flag_type=\"-\",\n rename_param=\"j\",\n )\n polarisation: Optional[str] = Field(\n description=\"Specification of incident polarisation. Refer to CrystFEL docs for more info.\",\n flag_type=\"--\",\n )\n no_polarisation: Optional[bool] = Field(\n description=\"Synonym for --polarisation=none\",\n flag_type=\"--\",\n rename_param=\"no-polarisation\",\n )\n max_adu: Optional[float] = Field(\n description=\"Maximum intensity of reflection to include.\",\n flag_type=\"--\",\n rename_param=\"max-adu\",\n )\n min_res: Optional[float] = Field(\n description=\"Only include crystals diffracting to a minimum resolution.\",\n flag_type=\"--\",\n rename_param=\"min-res\",\n )\n min_measurements: int = Field(\n 2,\n description=\"Include a reflection only if it appears a minimum number of times.\",\n flag_type=\"--\",\n rename_param=\"min-measurements\",\n )\n push_res: Optional[float] = Field(\n description=\"Merge reflections up to higher than the apparent resolution limit.\",\n flag_type=\"--\",\n rename_param=\"push-res\",\n )\n start_after: int = Field(\n 0,\n description=\"Ignore the first n crystals.\",\n flag_type=\"--\",\n rename_param=\"start-after\",\n )\n stop_after: int = Field(\n 0,\n description=\"Stop after processing n crystals. 0 means process all.\",\n flag_type=\"--\",\n rename_param=\"stop-after\",\n )\n no_free: Optional[bool] = Field(\n description=\"Disable cross-validation. 
Testing ONLY.\",\n flag_type=\"--\",\n rename_param=\"no-free\",\n )\n custom_split: Optional[str] = Field(\n description=\"Read a set of filenames, event and dataset IDs from a filename.\",\n flag_type=\"--\",\n rename_param=\"custom-split\",\n )\n max_rel_B: float = Field(\n 100,\n description=\"Reject crystals if |relB| > n sq Angstroms.\",\n flag_type=\"--\",\n rename_param=\"max-rel-B\",\n )\n output_every_cycle: bool = Field(\n False,\n description=\"Write per-crystal params after every refinement cycle.\",\n flag_type=\"--\",\n rename_param=\"output-every-cycle\",\n )\n no_logs: bool = Field(\n False,\n description=\"Do not write logs needed for plots, maps and graphs.\",\n flag_type=\"--\",\n rename_param=\"no-logs\",\n )\n set_symmetry: Optional[str] = Field(\n description=\"Set the apparent symmetry of the crystals to a point group.\",\n flag_type=\"-\",\n rename_param=\"w\",\n )\n operator: Optional[str] = Field(\n description=\"Specify an ambiguity operator. E.g. k,h,-l.\", flag_type=\"--\"\n )\n force_bandwidth: Optional[float] = Field(\n description=\"Set X-ray bandwidth. As percent, e.g. 0.0013 (0.13%).\",\n flag_type=\"--\",\n rename_param=\"force-bandwidth\",\n )\n force_radius: Optional[float] = Field(\n description=\"Set the initial profile radius (nm-1).\",\n flag_type=\"--\",\n rename_param=\"force-radius\",\n )\n force_lambda: Optional[float] = Field(\n description=\"Set the wavelength. In Angstroms.\",\n flag_type=\"--\",\n rename_param=\"force-lambda\",\n )\n harvest_file: Optional[str] = Field(\n description=\"Write parameters to file in JSON format.\",\n flag_type=\"--\",\n rename_param=\"harvest-file\",\n )\n\n @validator(\"in_file\", always=True)\n def validate_in_file(cls, in_file: str, values: Dict[str, Any]) -> str:\n if in_file == \"\":\n stream_file: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\",\n \"ConcatenateStreamFiles\",\n \"out_file\",\n )\n if stream_file:\n return stream_file\n return in_file\n\n @validator(\"out_file\", always=True)\n def validate_out_file(cls, out_file: str, values: Dict[str, Any]) -> str:\n if out_file == \"\":\n in_file: str = values[\"in_file\"]\n if in_file:\n tag: str = in_file.split(\".\")[0]\n return f\"{tag}.hkl\"\n else:\n return \"partialator.hkl\"\n return out_file\n
"},{"location":"source/io/config/#io.config.MergePartialatorParameters.Config","title":"Config
","text":" Bases: Config
lute/io/models/sfx_merge.py
class Config(ThirdPartyParameters.Config):\n long_flags_use_eq: bool = True\n \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n
"},{"location":"source/io/config/#io.config.MergePartialatorParameters.Config.long_flags_use_eq","title":"long_flags_use_eq: bool = True
class-attribute
instance-attribute
","text":"Whether long command-line arguments are passed like --long=arg
.
set_result: bool = True
class-attribute
instance-attribute
","text":"Whether the Executor should mark a specified parameter as a result.
"},{"location":"source/io/config/#io.config.RunSHELXCParameters","title":"RunSHELXCParameters
","text":" Bases: ThirdPartyParameters
Parameters for CCP4's SHELXC program.
SHELXC prepares files for SHELXD and SHELXE.
For more information please refer to the official documentation: https://www.ccp4.ac.uk/html/crank.html
Source code inlute/io/models/sfx_solve.py
class RunSHELXCParameters(ThirdPartyParameters):\n \"\"\"Parameters for CCP4's SHELXC program.\n\n SHELXC prepares files for SHELXD and SHELXE.\n\n For more information please refer to the official documentation:\n https://www.ccp4.ac.uk/html/crank.html\n \"\"\"\n\n executable: str = Field(\n \"/sdf/group/lcls/ds/tools/ccp4-8.0/bin/shelxc\",\n description=\"CCP4 SHELXC. Generates input files for SHELXD/SHELXE.\",\n flag_type=\"\",\n )\n placeholder: str = Field(\n \"xx\", description=\"Placeholder filename stem.\", flag_type=\"\"\n )\n in_file: str = Field(\n \"\",\n description=\"Input file for SHELXC with reflections AND proper records.\",\n flag_type=\"\",\n )\n\n @validator(\"in_file\", always=True)\n def validate_in_file(cls, in_file: str, values: Dict[str, Any]) -> str:\n if in_file == \"\":\n # get_hkl needed to be run to produce an XDS format file...\n xds_format_file: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\", \"ManipulateHKL\", \"out_file\"\n )\n if xds_format_file:\n in_file = xds_format_file\n if in_file[0] != \"<\":\n # Need to add a redirection for this program\n # Runs like `shelxc xx <input_file.xds`\n in_file = f\"<{in_file}\"\n return in_file\n
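SHELXC reads its reflection file from standard input, which is why validate_in_file above prefixes the resolved path with "<". A sketch of the command form this produces (the input path is a placeholder):
def shelxc_command(executable: str, placeholder: str, in_file: str) -> str:
    # Mirrors validate_in_file: run as `shelxc xx <input_file.xds`
    if not in_file.startswith("<"):
        in_file = f"<{in_file}"
    return f"{executable} {placeholder} {in_file}"

print(shelxc_command("shelxc", "xx", "/path/to/work_dir/exp0000_r0012.xds"))
# -> shelxc xx </path/to/work_dir/exp0000_r0012.xds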
"},{"location":"source/io/config/#io.config.SubmitSMDParameters","title":"SubmitSMDParameters
","text":" Bases: ThirdPartyParameters
Parameters for running smalldata to produce reduced HDF5 files.
Source code inlute/io/models/smd.py
class SubmitSMDParameters(ThirdPartyParameters):\n \"\"\"Parameters for running smalldata to produce reduced HDF5 files.\"\"\"\n\n class Config(ThirdPartyParameters.Config):\n \"\"\"Identical to super-class Config but includes a result.\"\"\"\n\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n result_from_params: str = \"\"\n \"\"\"Defines a result from the parameters. Use a validator to do so.\"\"\"\n\n executable: str = Field(\"mpirun\", description=\"MPI executable.\", flag_type=\"\")\n np: PositiveInt = Field(\n max(int(os.environ.get(\"SLURM_NPROCS\", len(os.sched_getaffinity(0)))) - 1, 1),\n description=\"Number of processes\",\n flag_type=\"-\",\n )\n p_arg1: str = Field(\n \"python\", description=\"Executable to run with mpi (i.e. python).\", flag_type=\"\"\n )\n u: str = Field(\n \"\", description=\"Python option for unbuffered output.\", flag_type=\"-\"\n )\n m: str = Field(\n \"mpi4py.run\",\n description=\"Python option to execute a module's contents as __main__ module.\",\n flag_type=\"-\",\n )\n producer: str = Field(\n \"\", description=\"Path to the SmallData producer Python script.\", flag_type=\"\"\n )\n run: str = Field(\n os.environ.get(\"RUN_NUM\", \"\"), description=\"DAQ Run Number.\", flag_type=\"--\"\n )\n experiment: str = Field(\n os.environ.get(\"EXPERIMENT\", \"\"),\n description=\"LCLS Experiment Number.\",\n flag_type=\"--\",\n )\n stn: NonNegativeInt = Field(0, description=\"Hutch endstation.\", flag_type=\"--\")\n nevents: int = Field(\n int(1e9), description=\"Number of events to process.\", flag_type=\"--\"\n )\n directory: Optional[str] = Field(\n None,\n description=\"Optional output directory. If None, will be in ${EXP_FOLDER}/hdf5/smalldata.\",\n flag_type=\"--\",\n )\n ## Need mechanism to set result_from_param=True ...\n gather_interval: PositiveInt = Field(\n 25, description=\"Number of events to collect at a time.\", flag_type=\"--\"\n )\n norecorder: bool = Field(\n False, description=\"Whether to ignore recorder streams.\", flag_type=\"--\"\n )\n url: HttpUrl = Field(\n \"https://pswww.slac.stanford.edu/ws-auth/lgbk\",\n description=\"Base URL for eLog posting.\",\n flag_type=\"--\",\n )\n epicsAll: bool = Field(\n False,\n description=\"Whether to store all EPICS PVs. Use with care.\",\n flag_type=\"--\",\n )\n full: bool = Field(\n False,\n description=\"Whether to store all data. Use with EXTRA care.\",\n flag_type=\"--\",\n )\n fullSum: bool = Field(\n False,\n description=\"Whether to store sums for all area detector images.\",\n flag_type=\"--\",\n )\n default: bool = Field(\n False,\n description=\"Whether to store only the default minimal set of data.\",\n flag_type=\"--\",\n )\n image: bool = Field(\n False,\n description=\"Whether to save everything as images. Use with care.\",\n flag_type=\"--\",\n )\n tiff: bool = Field(\n False,\n description=\"Whether to save all images as a single TIFF. Use with EXTRA care.\",\n flag_type=\"--\",\n )\n centerpix: bool = Field(\n False,\n description=\"Whether to mask center pixels for Epix10k2M detectors.\",\n flag_type=\"--\",\n )\n postRuntable: bool = Field(\n False,\n description=\"Whether to post run tables. 
Also used as a trigger for summary jobs.\",\n flag_type=\"--\",\n )\n wait: bool = Field(\n False, description=\"Whether to wait for a file to appear.\", flag_type=\"--\"\n )\n xtcav: bool = Field(\n False,\n description=\"Whether to add XTCAV processing to the HDF5 generation.\",\n flag_type=\"--\",\n )\n noarch: bool = Field(\n False, description=\"Whether to not use archiver data.\", flag_type=\"--\"\n )\n\n lute_template_cfg: TemplateConfig = TemplateConfig(template_name=\"\", output_path=\"\")\n\n @validator(\"producer\", always=True)\n def validate_producer_path(cls, producer: str) -> str:\n return producer\n\n @validator(\"lute_template_cfg\", always=True)\n def use_producer(\n cls, lute_template_cfg: TemplateConfig, values: Dict[str, Any]\n ) -> TemplateConfig:\n if not lute_template_cfg.output_path:\n lute_template_cfg.output_path = values[\"producer\"]\n return lute_template_cfg\n\n @root_validator(pre=False)\n def define_result(cls, values: Dict[str, Any]) -> Dict[str, Any]:\n exp: str = values[\"lute_config\"].experiment\n hutch: str = exp[:3]\n run: int = int(values[\"lute_config\"].run)\n directory: Optional[str] = values[\"directory\"]\n if directory is None:\n directory = f\"/sdf/data/lcls/ds/{hutch}/{exp}/hdf5/smalldata\"\n fname: str = f\"{exp}_Run{run:04d}.h5\"\n\n cls.Config.result_from_params = f\"{directory}/{fname}\"\n return values\n
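define_result above derives the result HDF5 path from the experiment name (whose first three characters give the hutch), the run number, and the optional directory override. A standalone sketch of the same logic (the experiment and run are placeholders):
from typing import Optional

def smd_result_path(experiment: str, run: int, directory: Optional[str] = None) -> str:
    # Mirrors define_result: default to the experiment's hdf5/smalldata folder.
    hutch: str = experiment[:3]
    if directory is None:
        directory = f"/sdf/data/lcls/ds/{hutch}/{experiment}/hdf5/smalldata"
    return f"{directory}/{experiment}_Run{run:04d}.h5"

print(smd_result_path("mfxl1234567", 12))
# -> /sdf/data/lcls/ds/mfx/mfxl1234567/hdf5/smalldata/mfxl1234567_Run0012.h5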
"},{"location":"source/io/config/#io.config.SubmitSMDParameters.Config","title":"Config
","text":" Bases: Config
Identical to super-class Config but includes a result.
Source code inlute/io/models/smd.py
class Config(ThirdPartyParameters.Config):\n \"\"\"Identical to super-class Config but includes a result.\"\"\"\n\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n result_from_params: str = \"\"\n \"\"\"Defines a result from the parameters. Use a validator to do so.\"\"\"\n
"},{"location":"source/io/config/#io.config.SubmitSMDParameters.Config.result_from_params","title":"result_from_params: str = ''
class-attribute
instance-attribute
","text":"Defines a result from the parameters. Use a validator to do so.
"},{"location":"source/io/config/#io.config.SubmitSMDParameters.Config.set_result","title":"set_result: bool = True
class-attribute
instance-attribute
","text":"Whether the Executor should mark a specified parameter as a result.
"},{"location":"source/io/config/#io.config.TaskParameters","title":"TaskParameters
","text":" Bases: BaseSettings
Base class for models of task parameters to be validated.
Parameters are read from a configuration YAML file and validated against subclasses of this type in order to ensure both that all parameters are present and that they are of the correct type.
NotePydantic is used for data validation. Pydantic does not perform \"strict\" validation by default. Parameter values may be cast to conform with the model specified by the subclass definition if it is possible to do so. Consider whether this may cause issues (e.g. if a float is cast to an int).
Source code inlute/io/models/base.py
class TaskParameters(BaseSettings):\n \"\"\"Base class for models of task parameters to be validated.\n\n Parameters are read from a configuration YAML file and validated against\n subclasses of this type in order to ensure that both all parameters are\n present, and that the parameters are of the correct type.\n\n Note:\n Pydantic is used for data validation. Pydantic does not perform \"strict\"\n validation by default. Parameter values may be cast to conform with the\n model specified by the subclass definition if it is possible to do so.\n Consider whether this may cause issues (e.g. if a float is cast to an\n int).\n \"\"\"\n\n class Config:\n \"\"\"Configuration for parameters model.\n\n The Config class holds Pydantic configuration. A number of LUTE-specific\n configuration has also been placed here.\n\n Attributes:\n env_prefix (str): Pydantic configuration. Will set parameters from\n environment variables containing this prefix. E.g. a model\n parameter `input` can be set with an environment variable:\n `{env_prefix}input`, in LUTE's case `LUTE_input`.\n\n underscore_attrs_are_private (bool): Pydantic configuration. Whether\n to hide attributes (parameters) prefixed with an underscore.\n\n copy_on_model_validation (str): Pydantic configuration. How to copy\n the input object passed to the class instance for model\n validation. Set to perform a deep copy.\n\n allow_inf_nan (bool): Pydantic configuration. Whether to allow\n infinity or NAN in float fields.\n\n run_directory (Optional[str]): None. If set, it should be a valid\n path. The `Task` will be run from this directory. This may be\n useful for some `Task`s which rely on searching the working\n directory.\n\n set_result (bool). False. If True, the model has information about\n setting the TaskResult object from the parameters it contains.\n E.g. it has an `output` parameter which is marked as the result.\n The result can be set with a field value of `is_result=True` on\n a specific parameter, or using `result_from_params` and a\n validator.\n\n result_from_params (Optional[str]): None. Optionally used to define\n results from information available in the model using a custom\n validator. E.g. use a `outdir` and `filename` field to set\n `result_from_params=f\"{outdir}/{filename}`, etc. Only used if\n `set_result==True`\n\n result_summary (Optional[str]): None. Defines a result summary that\n can be known after processing the Pydantic model. Use of summary\n depends on the Executor running the Task. All summaries are\n stored in the database, however. Only used if `set_result==True`\n\n impl_schemas (Optional[str]). Specifies a the schemas the\n output/results conform to. Only used if `set_result==True`.\n \"\"\"\n\n env_prefix = \"LUTE_\"\n underscore_attrs_are_private: bool = True\n copy_on_model_validation: str = \"deep\"\n allow_inf_nan: bool = False\n\n run_directory: Optional[str] = None\n \"\"\"Set the directory that the Task is run from.\"\"\"\n set_result: bool = False\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n result_from_params: Optional[str] = None\n \"\"\"Defines a result from the parameters. Use a validator to do so.\"\"\"\n result_summary: Optional[str] = None\n \"\"\"Format a TaskResult.summary from output.\"\"\"\n impl_schemas: Optional[str] = None\n \"\"\"Schema specification for output result. Will be passed to TaskResult.\"\"\"\n\n lute_config: AnalysisHeader\n
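As a minimal, hypothetical sketch of how a new first-party Task's parameters might be declared (the class and field names are made up; only the set_result / is_result mechanism comes from the documentation above):
from pydantic import Field  # pydantic v1 API, as used throughout these models

from lute.io.models.base import TaskParameters


class MyAnalysisParameters(TaskParameters):
    """Parameters for a hypothetical first-party Task."""

    class Config(TaskParameters.Config):
        set_result: bool = True  # Tell the Executor that a result is defined below.

    in_file: str = Field("", description="Path to an input file.")
    out_file: str = Field("", description="Path to the output file.", is_result=True)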
"},{"location":"source/io/config/#io.config.TaskParameters.Config","title":"Config
","text":"Configuration for parameters model.
The Config class holds Pydantic configuration. A number of LUTE-specific configuration has also been placed here.
Attributes:
Name Type Descriptionenv_prefix
str
Pydantic configuration. Will set parameters from environment variables containing this prefix. E.g. a model parameter input
can be set with an environment variable: {env_prefix}input
, in LUTE's case LUTE_input
.
underscore_attrs_are_private
bool
Pydantic configuration. Whether to hide attributes (parameters) prefixed with an underscore.
copy_on_model_validation
str
Pydantic configuration. How to copy the input object passed to the class instance for model validation. Set to perform a deep copy.
allow_inf_nan
bool
Pydantic configuration. Whether to allow infinity or NAN in float fields.
run_directory
Optional[str]
None. If set, it should be a valid path. The Task
will be run from this directory. This may be useful for some Task
s which rely on searching the working directory.
result_from_params
Optional[str]
None. Optionally used to define results from information available in the model using a custom validator. E.g. use a outdir
and filename
field to set result_from_params=f\"{outdir}/{filename}
, etc. Only used if set_result==True
result_summary
Optional[str]
None. Defines a result summary that can be known after processing the Pydantic model. Use of summary depends on the Executor running the Task. All summaries are stored in the database, however. Only used if set_result==True
lute/io/models/base.py
class Config:\n \"\"\"Configuration for parameters model.\n\n The Config class holds Pydantic configuration. A number of LUTE-specific\n configuration has also been placed here.\n\n Attributes:\n env_prefix (str): Pydantic configuration. Will set parameters from\n environment variables containing this prefix. E.g. a model\n parameter `input` can be set with an environment variable:\n `{env_prefix}input`, in LUTE's case `LUTE_input`.\n\n underscore_attrs_are_private (bool): Pydantic configuration. Whether\n to hide attributes (parameters) prefixed with an underscore.\n\n copy_on_model_validation (str): Pydantic configuration. How to copy\n the input object passed to the class instance for model\n validation. Set to perform a deep copy.\n\n allow_inf_nan (bool): Pydantic configuration. Whether to allow\n infinity or NAN in float fields.\n\n run_directory (Optional[str]): None. If set, it should be a valid\n path. The `Task` will be run from this directory. This may be\n useful for some `Task`s which rely on searching the working\n directory.\n\n set_result (bool). False. If True, the model has information about\n setting the TaskResult object from the parameters it contains.\n E.g. it has an `output` parameter which is marked as the result.\n The result can be set with a field value of `is_result=True` on\n a specific parameter, or using `result_from_params` and a\n validator.\n\n result_from_params (Optional[str]): None. Optionally used to define\n results from information available in the model using a custom\n validator. E.g. use a `outdir` and `filename` field to set\n `result_from_params=f\"{outdir}/{filename}`, etc. Only used if\n `set_result==True`\n\n result_summary (Optional[str]): None. Defines a result summary that\n can be known after processing the Pydantic model. Use of summary\n depends on the Executor running the Task. All summaries are\n stored in the database, however. Only used if `set_result==True`\n\n impl_schemas (Optional[str]). Specifies a the schemas the\n output/results conform to. Only used if `set_result==True`.\n \"\"\"\n\n env_prefix = \"LUTE_\"\n underscore_attrs_are_private: bool = True\n copy_on_model_validation: str = \"deep\"\n allow_inf_nan: bool = False\n\n run_directory: Optional[str] = None\n \"\"\"Set the directory that the Task is run from.\"\"\"\n set_result: bool = False\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n result_from_params: Optional[str] = None\n \"\"\"Defines a result from the parameters. Use a validator to do so.\"\"\"\n result_summary: Optional[str] = None\n \"\"\"Format a TaskResult.summary from output.\"\"\"\n impl_schemas: Optional[str] = None\n \"\"\"Schema specification for output result. Will be passed to TaskResult.\"\"\"\n
"},{"location":"source/io/config/#io.config.TaskParameters.Config.impl_schemas","title":"impl_schemas: Optional[str] = None
class-attribute
instance-attribute
","text":"Schema specification for output result. Will be passed to TaskResult.
"},{"location":"source/io/config/#io.config.TaskParameters.Config.result_from_params","title":"result_from_params: Optional[str] = None
class-attribute
instance-attribute
","text":"Defines a result from the parameters. Use a validator to do so.
"},{"location":"source/io/config/#io.config.TaskParameters.Config.result_summary","title":"result_summary: Optional[str] = None
class-attribute
instance-attribute
","text":"Format a TaskResult.summary from output.
"},{"location":"source/io/config/#io.config.TaskParameters.Config.run_directory","title":"run_directory: Optional[str] = None
class-attribute
instance-attribute
","text":"Set the directory that the Task is run from.
"},{"location":"source/io/config/#io.config.TaskParameters.Config.set_result","title":"set_result: bool = False
class-attribute
instance-attribute
","text":"Whether the Executor should mark a specified parameter as a result.
"},{"location":"source/io/config/#io.config.TemplateConfig","title":"TemplateConfig
","text":" Bases: BaseModel
Parameters used for templating of third party configuration files.
Attributes:
Name Type Descriptiontemplate_name
str
The name of the template to use. This template must live in config/templates
.
output_path
str
The FULL path, including filename to write the rendered template to.
Source code inlute/io/models/base.py
class TemplateConfig(BaseModel):\n \"\"\"Parameters used for templating of third party configuration files.\n\n Attributes:\n template_name (str): The name of the template to use. This template must\n live in `config/templates`.\n\n output_path (str): The FULL path, including filename to write the\n rendered template to.\n \"\"\"\n\n template_name: str\n output_path: str\n
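TemplateConfig only records which template to render and where the rendered file should be written; the substitution itself is done with Jinja2. A rough, self-contained sketch of the idea (not LUTE's internal rendering code; the template text, values and output path are invented):
from jinja2 import Template

template_text: str = '{ "compressor": "{{ compressor }}", "pressio:abs": {{ abs_error }} }'
rendered: str = Template(template_text).render(compressor="qoz", abs_error=10.0)

# output_path in TemplateConfig is the FULL path for the rendered file.
with open("/tmp/sz.json", "w") as f:
    f.write(rendered)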
"},{"location":"source/io/config/#io.config.TemplateParameters","title":"TemplateParameters
","text":"Class for representing parameters for third party configuration files.
These parameters can represent arbitrary data types and are used in conjunction with templates for modifying third party configuration files from the single LUTE YAML. Due to the storage of arbitrary data types, and the use of a template file, a single instance of this class can hold from a single template variable to an entire configuration file. The data parsing is done by jinja using the complementary template. All data is stored in the single model variable params.
The pydantic \"dataclass\" is used over the BaseModel/Settings to allow positional argument instantiation of the params
Field.
lute/io/models/base.py
@dataclass\nclass TemplateParameters:\n \"\"\"Class for representing parameters for third party configuration files.\n\n These parameters can represent arbitrary data types and are used in\n conjunction with templates for modifying third party configuration files\n from the single LUTE YAML. Due to the storage of arbitrary data types, and\n the use of a template file, a single instance of this class can hold from a\n single template variable to an entire configuration file. The data parsing\n is done by jinja using the complementary template.\n All data is stored in the single model variable `params.`\n\n The pydantic \"dataclass\" is used over the BaseModel/Settings to allow\n positional argument instantiation of the `params` Field.\n \"\"\"\n\n params: Any\n
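Because TemplateParameters is a pydantic dataclass, it can be constructed positionally with whatever a template expects, from a single value to a nested structure (the values below are illustrative):
from lute.io.models.base import TemplateParameters

single = TemplateParameters(0.05)                        # one template variable
nested = TemplateParameters({"qoz": {"qoz:stride": 8}})  # a whole config fragment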
"},{"location":"source/io/config/#io.config.TestBinaryErrParameters","title":"TestBinaryErrParameters
","text":" Bases: ThirdPartyParameters
Same as TestBinary, but exits with non-zero code.
Source code inlute/io/models/tests.py
class TestBinaryErrParameters(ThirdPartyParameters):\n    \"\"\"Same as TestBinary, but exits with non-zero code.\"\"\"\n\n    executable: str = Field(\n        \"/sdf/home/d/dorlhiac/test_tasks/test_threads_err\",\n        description=\"Multi-threaded test binary with non-zero exit code.\",\n    )\n    p_arg1: int = Field(1, description=\"Number of threads.\")\n
"},{"location":"source/io/config/#io.config.TestMultiNodeCommunicationParameters","title":"TestMultiNodeCommunicationParameters
","text":" Bases: TaskParameters
Parameters for the test Task TestMultiNodeCommunication
.
Test verifies communication across multiple machines.
Source code inlute/io/models/mpi_tests.py
class TestMultiNodeCommunicationParameters(TaskParameters):\n \"\"\"Parameters for the test Task `TestMultiNodeCommunication`.\n\n Test verifies communication across multiple machines.\n \"\"\"\n\n send_obj: Literal[\"plot\", \"array\"] = Field(\n \"array\", description=\"Object to send to Executor. `plot` or `array`\"\n )\n arr_size: Optional[int] = Field(\n None, description=\"Size of array to send back to Executor.\"\n )\n
"},{"location":"source/io/config/#io.config.TestParameters","title":"TestParameters
","text":" Bases: TaskParameters
Parameters for the test Task Test
.
lute/io/models/tests.py
class TestParameters(TaskParameters):\n \"\"\"Parameters for the test Task `Test`.\"\"\"\n\n float_var: float = Field(0.01, description=\"A floating point number.\")\n str_var: str = Field(\"test\", description=\"A string.\")\n\n class CompoundVar(BaseModel):\n int_var: int = 1\n dict_var: Dict[str, str] = {\"a\": \"b\"}\n\n compound_var: CompoundVar = Field(\n description=(\n \"A compound parameter - consists of a `int_var` (int) and `dict_var`\"\n \" (Dict[str, str]).\"\n )\n )\n throw_error: bool = Field(\n False, description=\"If `True`, raise an exception to test error handling.\"\n )\n
"},{"location":"source/io/config/#io.config.ThirdPartyParameters","title":"ThirdPartyParameters
","text":" Bases: TaskParameters
Base class for third party task parameters.
Contains special validators for extra arguments and handling of parameters used for filling in third party configuration files.
Source code inlute/io/models/base.py
class ThirdPartyParameters(TaskParameters):\n \"\"\"Base class for third party task parameters.\n\n Contains special validators for extra arguments and handling of parameters\n used for filling in third party configuration files.\n \"\"\"\n\n class Config(TaskParameters.Config):\n \"\"\"Configuration for parameters model.\n\n The Config class holds Pydantic configuration and inherited configuration\n from the base `TaskParameters.Config` class. A number of values are also\n overridden, and there are some specific configuration options to\n ThirdPartyParameters. A full list of options (with TaskParameters options\n repeated) is described below.\n\n Attributes:\n env_prefix (str): Pydantic configuration. Will set parameters from\n environment variables containing this prefix. E.g. a model\n parameter `input` can be set with an environment variable:\n `{env_prefix}input`, in LUTE's case `LUTE_input`.\n\n underscore_attrs_are_private (bool): Pydantic configuration. Whether\n to hide attributes (parameters) prefixed with an underscore.\n\n copy_on_model_validation (str): Pydantic configuration. How to copy\n the input object passed to the class instance for model\n validation. Set to perform a deep copy.\n\n allow_inf_nan (bool): Pydantic configuration. Whether to allow\n infinity or NAN in float fields.\n\n run_directory (Optional[str]): None. If set, it should be a valid\n path. The `Task` will be run from this directory. This may be\n useful for some `Task`s which rely on searching the working\n directory.\n\n set_result (bool). True. If True, the model has information about\n setting the TaskResult object from the parameters it contains.\n E.g. it has an `output` parameter which is marked as the result.\n The result can be set with a field value of `is_result=True` on\n a specific parameter, or using `result_from_params` and a\n validator.\n\n result_from_params (Optional[str]): None. Optionally used to define\n results from information available in the model using a custom\n validator. E.g. use a `outdir` and `filename` field to set\n `result_from_params=f\"{outdir}/{filename}`, etc.\n\n result_summary (Optional[str]): None. Defines a result summary that\n can be known after processing the Pydantic model. Use of summary\n depends on the Executor running the Task. All summaries are\n stored in the database, however.\n\n impl_schemas (Optional[str]). Specifies a the schemas the\n output/results conform to. Only used if set_result is True.\n\n -----------------------\n ThirdPartyTask-specific:\n\n extra (str): \"allow\". Pydantic configuration. Allow (or ignore) extra\n arguments.\n\n short_flags_use_eq (bool): False. If True, \"short\" command-line args\n are passed as `-x=arg`. ThirdPartyTask-specific.\n\n long_flags_use_eq (bool): False. If True, \"long\" command-line args\n are passed as `--long=arg`. ThirdPartyTask-specific.\n \"\"\"\n\n extra: str = \"allow\"\n short_flags_use_eq: bool = False\n \"\"\"Whether short command-line arguments are passed like `-x=arg`.\"\"\"\n long_flags_use_eq: bool = False\n \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n # lute_template_cfg: TemplateConfig\n\n @root_validator(pre=False)\n def extra_fields_to_thirdparty(cls, values: Dict[str, Any]):\n for key in values:\n if key not in cls.__fields__:\n values[key] = TemplateParameters(values[key])\n\n return values\n
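A minimal, hypothetical sketch of wrapping a third-party binary (the tool, its path and the fields are made up). Any additional keys supplied in the YAML that are not declared as fields are caught by extra_fields_to_thirdparty above and wrapped in TemplateParameters for use in a configuration-file template:
from pydantic import Field

from lute.io.models.base import ThirdPartyParameters


class MyToolParameters(ThirdPartyParameters):
    """Parameters for a hypothetical third-party `mytool` binary."""

    executable: str = Field("/path/to/mytool", description="Third-party binary.", flag_type="")
    in_file: str = Field("", description="Input file.", flag_type="-", rename_param="i")
    out_file: str = Field(
        "", description="Output file.", flag_type="-", rename_param="o", is_result=True
    )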
"},{"location":"source/io/config/#io.config.ThirdPartyParameters.Config","title":"Config
","text":" Bases: Config
Configuration for parameters model.
The Config class holds Pydantic configuration and inherited configuration from the base TaskParameters.Config
class. A number of values are also overridden, and there are some configuration options specific to ThirdPartyParameters. A full list of options (with the TaskParameters options repeated) is described below.
Attributes:
Name Type Descriptionenv_prefix
str
Pydantic configuration. Will set parameters from environment variables containing this prefix. E.g. a model parameter input
can be set with an environment variable: {env_prefix}input
, in LUTE's case LUTE_input
.
underscore_attrs_are_private
bool
Pydantic configuration. Whether to hide attributes (parameters) prefixed with an underscore.
copy_on_model_validation
str
Pydantic configuration. How to copy the input object passed to the class instance for model validation. Set to perform a deep copy.
allow_inf_nan
bool
Pydantic configuration. Whether to allow infinity or NAN in float fields.
run_directory
Optional[str]
None. If set, it should be a valid path. The Task
will be run from this directory. This may be useful for some Task
s which rely on searching the working directory.
result_from_params
Optional[str]
None. Optionally used to define results from information available in the model using a custom validator. E.g. use a outdir
and filename
field to set result_from_params=f\"{outdir}/{filename}
, etc.
result_summary
Optional[str]
None. Defines a result summary that can be known after processing the Pydantic model. Use of summary depends on the Executor running the Task. All summaries are stored in the database, however.
set_result
bool
True. If True, the model has information about setting the TaskResult object from the parameters it contains. E.g. it has an output parameter which is marked as the result. The result can be set with a field value of is_result=True on a specific parameter, or using result_from_params and a validator.
impl_schemas
Optional[str]
None. Specifies the schemas the output/results conform to. Only used if set_result is True.
extra
str
\"allow\". Pydantic configuration. Allow (or ignore) extra arguments.
short_flags_use_eq
bool
False. If True, \"short\" command-line args are passed as -x=arg
. ThirdPartyTask-specific.
long_flags_use_eq
bool
False. If True, \"long\" command-line args are passed as --long=arg
. ThirdPartyTask-specific.
lute/io/models/base.py
class Config(TaskParameters.Config):\n \"\"\"Configuration for parameters model.\n\n The Config class holds Pydantic configuration and inherited configuration\n from the base `TaskParameters.Config` class. A number of values are also\n overridden, and there are some specific configuration options to\n ThirdPartyParameters. A full list of options (with TaskParameters options\n repeated) is described below.\n\n Attributes:\n env_prefix (str): Pydantic configuration. Will set parameters from\n environment variables containing this prefix. E.g. a model\n parameter `input` can be set with an environment variable:\n `{env_prefix}input`, in LUTE's case `LUTE_input`.\n\n underscore_attrs_are_private (bool): Pydantic configuration. Whether\n to hide attributes (parameters) prefixed with an underscore.\n\n copy_on_model_validation (str): Pydantic configuration. How to copy\n the input object passed to the class instance for model\n validation. Set to perform a deep copy.\n\n allow_inf_nan (bool): Pydantic configuration. Whether to allow\n infinity or NAN in float fields.\n\n run_directory (Optional[str]): None. If set, it should be a valid\n path. The `Task` will be run from this directory. This may be\n useful for some `Task`s which rely on searching the working\n directory.\n\n set_result (bool). True. If True, the model has information about\n setting the TaskResult object from the parameters it contains.\n E.g. it has an `output` parameter which is marked as the result.\n The result can be set with a field value of `is_result=True` on\n a specific parameter, or using `result_from_params` and a\n validator.\n\n result_from_params (Optional[str]): None. Optionally used to define\n results from information available in the model using a custom\n validator. E.g. use a `outdir` and `filename` field to set\n `result_from_params=f\"{outdir}/{filename}`, etc.\n\n result_summary (Optional[str]): None. Defines a result summary that\n can be known after processing the Pydantic model. Use of summary\n depends on the Executor running the Task. All summaries are\n stored in the database, however.\n\n impl_schemas (Optional[str]). Specifies a the schemas the\n output/results conform to. Only used if set_result is True.\n\n -----------------------\n ThirdPartyTask-specific:\n\n extra (str): \"allow\". Pydantic configuration. Allow (or ignore) extra\n arguments.\n\n short_flags_use_eq (bool): False. If True, \"short\" command-line args\n are passed as `-x=arg`. ThirdPartyTask-specific.\n\n long_flags_use_eq (bool): False. If True, \"long\" command-line args\n are passed as `--long=arg`. ThirdPartyTask-specific.\n \"\"\"\n\n extra: str = \"allow\"\n short_flags_use_eq: bool = False\n \"\"\"Whether short command-line arguments are passed like `-x=arg`.\"\"\"\n long_flags_use_eq: bool = False\n \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n
"},{"location":"source/io/config/#io.config.ThirdPartyParameters.Config.long_flags_use_eq","title":"long_flags_use_eq: bool = False
class-attribute
instance-attribute
","text":"Whether long command-line arguments are passed like --long=arg
.
set_result: bool = True
class-attribute
instance-attribute
","text":"Whether the Executor should mark a specified parameter as a result.
"},{"location":"source/io/config/#io.config.ThirdPartyParameters.Config.short_flags_use_eq","title":"short_flags_use_eq: bool = False
class-attribute
instance-attribute
","text":"Whether short command-line arguments are passed like -x=arg
.
parse_config(task_name='test', config_path='')
","text":"Parse a configuration file and validate the contents.
Parameters:
Name Type Description Defaulttask_name
str
Name of the specific task that will be run.
'test'
config_path
str
Path to the configuration file.
''
Returns:
Name Type Descriptionparams
TaskParameters
A TaskParameters object of validated task-specific parameters. Parameters are accessed with \"dot\" notation. E.g. params.param1
.
Raises:
Type DescriptionValidationError
Raised if there are problems with the configuration file. Passed through from Pydantic.
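A minimal usage sketch (the Task name and path are illustrative):

from lute.io.config import parse_config

# Validate the parameters for a managed Task named "Tester" against its model.
params = parse_config(task_name="Tester", config_path="/path/to/config.yaml")
print(params.lute_config.experiment)  # shared header values live on lute_config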
Source code in lute/io/config.py
def parse_config(task_name: str = \"test\", config_path: str = \"\") -> TaskParameters:\n \"\"\"Parse a configuration file and validate the contents.\n\n Args:\n task_name (str): Name of the specific task that will be run.\n\n config_path (str): Path to the configuration file.\n\n Returns:\n params (TaskParameters): A TaskParameters object of validated\n task-specific parameters. Parameters are accessed with \"dot\"\n notation. E.g. `params.param1`.\n\n Raises:\n ValidationError: Raised if there are problems with the configuration\n file. Passed through from Pydantic.\n \"\"\"\n task_config_name: str = f\"{task_name}Parameters\"\n\n with open(config_path, \"r\") as f:\n docs: Iterator[Dict[str, Any]] = yaml.load_all(stream=f, Loader=yaml.FullLoader)\n header: Dict[str, Any] = next(docs)\n config: Dict[str, Any] = next(docs)\n substitute_variables(header, header)\n substitute_variables(header, config)\n LUTE_DEBUG_EXIT(\"LUTE_DEBUG_EXIT_AT_YAML\", pprint.pformat(config))\n lute_config: Dict[str, AnalysisHeader] = {\"lute_config\": AnalysisHeader(**header)}\n try:\n task_config: Dict[str, Any] = dict(config[task_name])\n lute_config.update(task_config)\n except KeyError as err:\n warnings.warn(\n (\n f\"{task_name} has no parameter definitions in YAML file.\"\n \" Attempting default parameter initialization.\"\n )\n )\n parsed_parameters: TaskParameters = globals()[task_config_name](**lute_config)\n return parsed_parameters\n
"},{"location":"source/io/config/#io.config.substitute_variables","title":"substitute_variables(header, config, curr_key=None)
","text":"Performs variable substitutions on a dictionary read from config YAML file.
Can be used to define input parameters in terms of other input parameters. This is similar to functionality employed by validators for parameters in the specific Task models, but is intended to be more accessible to users. Variable substitutions are defined using a minimal syntax from Jinja: {{ experiment }} defines a substitution of the variable experiment
. The characters {{ }}
can be escaped if the literal symbols are needed in place.
For example, a path to a file can be defined in terms of experiment and run values in the config file: MyTask: experiment: myexp run: 2 special_file: /path/to/{{ experiment }}/{{ run }}/file.inp
Acceptable variables for substitutions are values defined elsewhere in the YAML file. Environment variables can also be used if prefaced with a $
character. E.g. to get the experiment from an environment variable: MyTask: run: 2 special_file: /path/to/{{ $EXPERIMENT }}/{{ run }}/file.inp
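As an illustrative sketch (assuming the function is importable from lute.io.config; the Task and parameter names are hypothetical), the substitution can be exercised directly on dictionaries mirroring the two YAML documents:

from lute.io.config import substitute_variables

header = {"experiment": "myexp", "run": 2}  # mirrors the YAML header document
config = {"MyTask": {"special_file": "/path/to/{{ experiment }}/{{ run }}/file.inp"}}

substitute_variables(header, config)  # substitutions are made in-place
print(config["MyTask"]["special_file"])  # -> /path/to/myexp/2/file.inp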
Parameters:
Name Type Description Defaultconfig
Dict[str, Any]
A dictionary of parsed configuration.
requiredcurr_key
Optional[str]
Used to keep track of recursion level when scanning through iterable items in the config dictionary.
None
Returns:
Name Type Descriptionsubbed_config
Dict[str, Any]
The config dictionary after substitutions have been made. May be identical to the input if no substitutions are needed.
Source code in lute/io/config.py
def substitute_variables(\n header: Dict[str, Any], config: Dict[str, Any], curr_key: Optional[str] = None\n) -> None:\n \"\"\"Performs variable substitutions on a dictionary read from config YAML file.\n\n Can be used to define input parameters in terms of other input parameters.\n This is similar to functionality employed by validators for parameters in\n the specific Task models, but is intended to be more accessible to users.\n Variable substitutions are defined using a minimal syntax from Jinja:\n {{ experiment }}\n defines a substitution of the variable `experiment`. The characters `{{ }}`\n can be escaped if the literal symbols are needed in place.\n\n For example, a path to a file can be defined in terms of experiment and run\n values in the config file:\n MyTask:\n experiment: myexp\n run: 2\n special_file: /path/to/{{ experiment }}/{{ run }}/file.inp\n\n Acceptable variables for substitutions are values defined elsewhere in the\n YAML file. Environment variables can also be used if prefaced with a `$`\n character. E.g. to get the experiment from an environment variable:\n MyTask:\n run: 2\n special_file: /path/to/{{ $EXPERIMENT }}/{{ run }}/file.inp\n\n Args:\n config (Dict[str, Any]): A dictionary of parsed configuration.\n\n curr_key (Optional[str]): Used to keep track of recursion level when scanning\n through iterable items in the config dictionary.\n\n Returns:\n subbed_config (Dict[str, Any]): The config dictionary after substitutions\n have been made. May be identical to the input if no substitutions are\n needed.\n \"\"\"\n _sub_pattern = r\"\\{\\{[^}{]*\\}\\}\"\n iterable: Dict[str, Any] = config\n if curr_key is not None:\n # Need to handle nested levels by interpreting curr_key\n keys_by_level: List[str] = curr_key.split(\".\")\n for key in keys_by_level:\n iterable = iterable[key]\n else:\n ...\n # iterable = config\n for param, value in iterable.items():\n if isinstance(value, dict):\n new_key: str\n if curr_key is None:\n new_key = param\n else:\n new_key = f\"{curr_key}.{param}\"\n substitute_variables(header, config, curr_key=new_key)\n elif isinstance(value, list):\n ...\n # Scalars str - we skip numeric types\n elif isinstance(value, str):\n matches: List[str] = re.findall(_sub_pattern, value)\n for m in matches:\n key_to_sub_maybe_with_fmt: List[str] = m[2:-2].strip().split(\":\")\n key_to_sub: str = key_to_sub_maybe_with_fmt[0]\n fmt: Optional[str] = None\n if len(key_to_sub_maybe_with_fmt) == 2:\n fmt = key_to_sub_maybe_with_fmt[1]\n sub: Any\n if key_to_sub[0] == \"$\":\n sub = os.getenv(key_to_sub[1:], None)\n if sub is None:\n print(\n f\"Environment variable {key_to_sub[1:]} not found! Cannot substitute in YAML config!\",\n flush=True,\n )\n continue\n # substitutions from env vars will be strings, so convert back\n # to numeric in order to perform formatting later on (e.g. {var:04d})\n sub = _check_str_numeric(sub)\n else:\n try:\n sub = config\n for key in key_to_sub.split(\".\"):\n sub = sub[key]\n except KeyError:\n sub = header[key_to_sub]\n pattern: str = (\n m.replace(\"{{\", r\"\\{\\{\").replace(\"}}\", r\"\\}\\}\").replace(\"$\", r\"\\$\")\n )\n if fmt is not None:\n sub = f\"{sub:{fmt}}\"\n else:\n sub = f\"{sub}\"\n iterable[param] = re.sub(pattern, sub, iterable[param])\n # Reconvert back to numeric values if needed...\n iterable[param] = _check_str_numeric(iterable[param])\n
"},{"location":"source/io/db/","title":"db","text":"Tools for working with the LUTE parameter and configuration database.
The current implementation relies on an SQLite backend database. This may change in the future; therefore, relatively few high-level API functions are intended to be public. These abstract away the details of the database interface and work exclusively on LUTE objects.
Functions:
Name Descriptionrecord_analysis_db
(cfg: DescribedAnalysis) -> None: Writes the configuration to the backend database.
read_latest_db_entry
(db_dir: str, task_name: str, param: str) -> Any: Retrieve the most recent entry from a database for a specific Task.
Raises:
Type DescriptionDatabaseError
Generic exception raised for LUTE database errors.
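A sketch of typical use of the high-level read API (the directory, Task and parameter names are illustrative):

from lute.io.db import read_latest_db_entry

# Most recent value recorded for parameter "outfile" of a Task named "Tester",
# read from the lute.db under the given working directory; None if not found.
last_output = read_latest_db_entry(
    db_dir="/path/to/work_dir", task_name="Tester", param="outfile", valid_only=True
)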
"},{"location":"source/io/db/#io.db.DatabaseError","title":"DatabaseError
","text":" Bases: Exception
General LUTE database error.
Source code in lute/io/db.py
class DatabaseError(Exception):\n \"\"\"General LUTE database error.\"\"\"\n\n ...\n
"},{"location":"source/io/db/#io.db.read_latest_db_entry","title":"read_latest_db_entry(db_dir, task_name, param, valid_only=True)
","text":"Read most recent value entered into the database for a Task parameter.
(Will be updated for schema compliance as well as Task name.)
Parameters:
Name Type Description Defaultdb_dir
str
Database location.
requiredtask_name
str
The name of the Task to check the database for.
requiredparam
str
The parameter name for the Task that we want to retrieve.
requiredvalid_only
bool
Whether to consider only valid results or not. E.g. An input file may be useful even if the Task result is invalid (Failed). Default = True.
True
Returns:
Name Type Descriptionval
Any
The most recently entered value for param
of task_name
that can be found in the database. Returns None if nothing found.
lute/io/db.py
def read_latest_db_entry(\n db_dir: str, task_name: str, param: str, valid_only: bool = True\n) -> Optional[Any]:\n \"\"\"Read most recent value entered into the database for a Task parameter.\n\n (Will be updated for schema compliance as well as Task name.)\n\n Args:\n db_dir (str): Database location.\n\n task_name (str): The name of the Task to check the database for.\n\n param (str): The parameter name for the Task that we want to retrieve.\n\n valid_only (bool): Whether to consider only valid results or not. E.g.\n An input file may be useful even if the Task result is invalid\n (Failed). Default = True.\n\n Returns:\n val (Any): The most recently entered value for `param` of `task_name`\n that can be found in the database. Returns None if nothing found.\n \"\"\"\n import sqlite3\n from ._sqlite import _select_from_db\n\n con: sqlite3.Connection = sqlite3.Connection(f\"{db_dir}/lute.db\")\n with con:\n try:\n cond: Dict[str, str] = {}\n if valid_only:\n cond = {\"valid_flag\": \"1\"}\n entry: Any = _select_from_db(con, task_name, param, cond)\n except sqlite3.OperationalError as err:\n logger.debug(f\"Cannot retrieve value {param} due to: {err}\")\n entry = None\n return entry\n
"},{"location":"source/io/db/#io.db.record_analysis_db","title":"record_analysis_db(cfg)
","text":"Write an DescribedAnalysis object to the database.
The DescribedAnalysis object is maintained by the Executor and contains all information necessary to fully describe a single Task
execution. The contained fields are split across multiple tables within the database as some of the information can be shared across multiple Tasks. Refer to docs/design/database.md
for more information on the database specification.
lute/io/db.py
def record_analysis_db(cfg: DescribedAnalysis) -> None:\n \"\"\"Write an DescribedAnalysis object to the database.\n\n The DescribedAnalysis object is maintained by the Executor and contains all\n information necessary to fully describe a single `Task` execution. The\n contained fields are split across multiple tables within the database as\n some of the information can be shared across multiple Tasks. Refer to\n `docs/design/database.md` for more information on the database specification.\n \"\"\"\n import sqlite3\n from ._sqlite import (\n _make_shared_table,\n _make_task_table,\n _add_row_no_duplicate,\n _add_task_entry,\n )\n\n try:\n work_dir: str = cfg.task_parameters.lute_config.work_dir\n except AttributeError:\n logger.info(\n (\n \"Unable to access TaskParameters object. Likely wasn't created. \"\n \"Cannot store result.\"\n )\n )\n return\n del cfg.task_parameters.lute_config.work_dir\n\n exec_entry, exec_columns = _cfg_to_exec_entry_cols(cfg)\n task_name: str = cfg.task_result.task_name\n # All `Task`s have an AnalysisHeader, but this info can be shared so is\n # split into a different table\n (\n task_entry, # Dict[str, Any]\n task_columns, # Dict[str, str]\n gen_entry, # Dict[str, Any]\n gen_columns, # Dict[str, str]\n ) = _params_to_entry_cols(cfg.task_parameters)\n x, y = _result_to_entry_cols(cfg.task_result)\n task_entry.update(x)\n task_columns.update(y)\n\n con: sqlite3.Connection = sqlite3.Connection(f\"{work_dir}/lute.db\")\n with con:\n # --- Table Creation ---#\n if not _make_shared_table(con, \"gen_cfg\", gen_columns):\n raise DatabaseError(\"Could not make general configuration table!\")\n if not _make_shared_table(con, \"exec_cfg\", exec_columns):\n raise DatabaseError(\"Could not make Executor configuration table!\")\n if not _make_task_table(con, task_name, task_columns):\n raise DatabaseError(f\"Could not make Task table for: {task_name}!\")\n\n # --- Row Addition ---#\n gen_id: int = _add_row_no_duplicate(con, \"gen_cfg\", gen_entry)\n exec_id: int = _add_row_no_duplicate(con, \"exec_cfg\", exec_entry)\n\n full_task_entry: Dict[str, Any] = {\n \"gen_cfg_id\": gen_id,\n \"exec_cfg_id\": exec_id,\n }\n full_task_entry.update(task_entry)\n # Prepare flag to indicate whether the task entry is valid or not\n # By default we say it is assuming proper completion\n valid_flag: int = (\n 1 if cfg.task_result.task_status == TaskStatus.COMPLETED else 0\n )\n full_task_entry.update({\"valid_flag\": valid_flag})\n\n _add_task_entry(con, task_name, full_task_entry)\n
"},{"location":"source/io/elog/","title":"elog","text":"Provides utilities for communicating with the LCLS eLog.
Makes use of various eLog API endpoints to retrieve information or post results.
Functions:
Name Descriptionget_elog_opr_auth
(exp: str): Return an authorization object to interact with eLog API as an opr account for the hutch where exp
was conducted.
get_elog_kerberos_auth
Return the authorization headers for the user account submitting the job.
elog_http_request
(exp: str, endpoint: str, request_type: str, **params): Make an HTTP request to the API endpoint at url
.
format_file_for_post
(in_file: Union[str, tuple, list]): Prepare files according to the specification needed to add them as attachments to eLog posts.
post_elog_message
(exp: str, msg: str, tag: Optional[str], title: Optional[str], in_files: List[Union[str, tuple, list]], auth: Optional[Union[HTTPBasicAuth, Dict]] = None): Post a message to the eLog.
post_elog_run_status
(data: Dict[str, Union[str, int, float]], update_url: Optional[str] = None): Post a run status to the summary section on the Workflows>Control tab.
post_elog_run_table
(exp: str, run: int, data: Dict[str, Any], auth: Optional[Union[HTTPBasicAuth, Dict]] = None): Update run table in the eLog.
get_elog_runs_by_tag
(exp: str, tag: str, auth: Optional[Union[HTTPBasicAuth, Dict]] = None): Return a list of runs with a specific tag.
get_elog_params_by_run
(exp: str, params: List[str], runs: Optional[List[int]]): Retrieve the requested parameters by run. If no run is provided, retrieve the requested parameters for all runs.
"},{"location":"source/io/elog/#io.elog.elog_http_request","title":"elog_http_request(exp, endpoint, request_type, **params)
","text":"Make an HTTP request to the eLog.
This method will determine the proper authorization method and update the passed parameters appropriately. Functions implementing specific endpoint functionality and calling this function should only pass the necessary endpoint-specific parameters and not include the authorization objects.
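For instance, an endpoint-specific helper is expected to call it roughly as sketched below (the experiment name and tag are illustrative; the endpoint string follows the pattern used by get_elog_runs_by_tag):

from lute.io.elog import elog_http_request

exp = "mfxl1001021"  # illustrative experiment name
status_code, msg, tagged_runs = elog_http_request(
    exp=exp,
    endpoint=f"{exp}/ws/get_runs_with_tag?tag=calib",  # endpoint-specific parameters only
    request_type="GET",
)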
Parameters:
Name Type Description Defaultexp
str
Experiment.
requiredendpoint
str
eLog API endpoint.
requiredrequest_type
str
Type of request to make. Recognized options: POST or GET.
required**params
Dict
Endpoint parameters to pass with the HTTP request! Differs depending on the API endpoint. Do not include auth objects.
{}
Returns:
Name Type Descriptionstatus_code
int
Response status code. Can be checked for errors.
msg
str
An error message, or a message saying SUCCESS.
value
Optional[Any]
For GET requests ONLY, return the requested information.
Source code in lute/io/elog.py
def elog_http_request(\n exp: str, endpoint: str, request_type: str, **params\n) -> Tuple[int, str, Optional[Any]]:\n \"\"\"Make an HTTP request to the eLog.\n\n This method will determine the proper authorization method and update the\n passed parameters appropriately. Functions implementing specific endpoint\n functionality and calling this function should only pass the necessary\n endpoint-specific parameters and not include the authorization objects.\n\n Args:\n exp (str): Experiment.\n\n endpoint (str): eLog API endpoint.\n\n request_type (str): Type of request to make. Recognized options: POST or\n GET.\n\n **params (Dict): Endpoint parameters to pass with the HTTP request!\n Differs depending on the API endpoint. Do not include auth objects.\n\n Returns:\n status_code (int): Response status code. Can be checked for errors.\n\n msg (str): An error message, or a message saying SUCCESS.\n\n value (Optional[Any]): For GET requests ONLY, return the requested\n information.\n \"\"\"\n auth: Union[HTTPBasicAuth, Dict[str, str]] = get_elog_auth(exp)\n base_url: str\n if isinstance(auth, HTTPBasicAuth):\n params.update({\"auth\": auth})\n base_url = \"https://pswww.slac.stanford.edu/ws-auth/lgbk/lgbk\"\n elif isinstance(auth, dict):\n params.update({\"headers\": auth})\n base_url = \"https://pswww.slac.stanford.edu/ws-kerb/lgbk/lgbk\"\n\n url: str = f\"{base_url}/{endpoint}\"\n\n resp: requests.models.Response\n if request_type.upper() == \"POST\":\n resp = requests.post(url, **params)\n elif request_type.upper() == \"GET\":\n resp = requests.get(url, **params)\n else:\n return (-1, \"Invalid request type!\", None)\n\n status_code: int = resp.status_code\n msg: str = \"SUCCESS\"\n\n if resp.json()[\"success\"] and request_type.upper() == \"GET\":\n return (status_code, msg, resp.json()[\"value\"])\n\n if status_code >= 300:\n msg = f\"Error when posting to eLog: Response {status_code}\"\n\n if not resp.json()[\"success\"]:\n err_msg = resp.json()[\"error_msg\"]\n msg += f\"\\nInclude message: {err_msg}\"\n return (resp.status_code, msg, None)\n
"},{"location":"source/io/elog/#io.elog.format_file_for_post","title":"format_file_for_post(in_file)
","text":"Format a file for attachment to an eLog post.
The eLog API expects a specifically formatted tuple when adding file attachments. This function prepares the tuple to specification given a number of different input types.
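The two accepted input forms are sketched below (paths and descriptions are illustrative); a bare path uses the file name as the description, while a (path, description) pair sets it explicitly:

from lute.io.elog import format_file_for_post

attachment = format_file_for_post("/path/to/plot.png")
attachment_with_desc = format_file_for_post(("/path/to/plot.png", "Hit rate plot"))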
Parameters:
Name Type Description Defaultin_file
str | tuple | list
File to include as an attachment in an eLog post.
required Source code in lute/io/elog.py
def format_file_for_post(\n in_file: Union[str, tuple, list]\n) -> Tuple[str, Tuple[str, BufferedReader], Any]:\n \"\"\"Format a file for attachment to an eLog post.\n\n The eLog API expects a specifically formatted tuple when adding file\n attachments. This function prepares the tuple to specification given a\n number of different input types.\n\n Args:\n in_file (str | tuple | list): File to include as an attachment in an\n eLog post.\n \"\"\"\n description: str\n fptr: BufferedReader\n ftype: Optional[str]\n if isinstance(in_file, str):\n description = os.path.basename(in_file)\n fptr = open(in_file, \"rb\")\n ftype = mimetypes.guess_type(in_file)[0]\n elif isinstance(in_file, tuple) or isinstance(in_file, list):\n description = in_file[1]\n fptr = open(in_file[0], \"rb\")\n ftype = mimetypes.guess_type(in_file[0])[0]\n else:\n raise ElogFileFormatError(f\"Unrecognized format: {in_file}\")\n\n out_file: Tuple[str, Tuple[str, BufferedReader], Any] = (\n \"files\",\n (description, fptr),\n ftype,\n )\n return out_file\n
"},{"location":"source/io/elog/#io.elog.get_elog_active_expmt","title":"get_elog_active_expmt(hutch, *, endstation=0)
","text":"Get the current active experiment for a hutch.
This function is one of two functions to manage the HTTP request independently. This is because it does not require an authorization object, and its result is needed for the generic function elog_http_request
to work properly.
Parameters:
Name Type Description Defaulthutch
str
The hutch to get the active experiment for.
requiredendstation
int
The hutch endstation to get the experiment for. This should generally be 0.
0
Source code in lute/io/elog.py
def get_elog_active_expmt(hutch: str, *, endstation: int = 0) -> str:\n \"\"\"Get the current active experiment for a hutch.\n\n This function is one of two functions to manage the HTTP request independently.\n This is because it does not require an authorization object, and its result\n is needed for the generic function `elog_http_request` to work properly.\n\n Args:\n hutch (str): The hutch to get the active experiment for.\n\n endstation (int): The hutch endstation to get the experiment for. This\n should generally be 0.\n \"\"\"\n\n base_url: str = \"https://pswww.slac.stanford.edu/ws/lgbk/lgbk\"\n endpoint: str = \"ws/activeexperiment_for_instrument_station\"\n url: str = f\"{base_url}/{endpoint}\"\n params: Dict[str, str] = {\"instrument_name\": hutch, \"station\": f\"{endstation}\"}\n resp: requests.models.Response = requests.get(url, params)\n if resp.status_code > 300:\n raise RuntimeError(\n f\"Error getting current experiment!\\n\\t\\tIncorrect hutch: '{hutch}'?\"\n )\n if resp.json()[\"success\"]:\n return resp.json()[\"value\"][\"name\"]\n else:\n msg: str = resp.json()[\"error_msg\"]\n raise RuntimeError(f\"Error getting current experiment! Err: {msg}\")\n
"},{"location":"source/io/elog/#io.elog.get_elog_auth","title":"get_elog_auth(exp)
","text":"Determine the appropriate auth method depending on experiment state.
Returns:
Name Type Descriptionauth
HTTPBasicAuth | Dict[str, str]
Depending on whether an experiment is active/live, returns authorization for the hutch operator account or the current user submitting a job.
Source code inlute/io/elog.py
def get_elog_auth(exp: str) -> Union[HTTPBasicAuth, Dict[str, str]]:\n \"\"\"Determine the appropriate auth method depending on experiment state.\n\n Returns:\n auth (HTTPBasicAuth | Dict[str, str]): Depending on whether an experiment\n is active/live, returns authorization for the hutch operator account\n or the current user submitting a job.\n \"\"\"\n hutch: str = exp[:3]\n if exp.lower() == get_elog_active_expmt(hutch=hutch).lower():\n return get_elog_opr_auth(exp)\n else:\n return get_elog_kerberos_auth()\n
"},{"location":"source/io/elog/#io.elog.get_elog_kerberos_auth","title":"get_elog_kerberos_auth()
","text":"Returns Kerberos authorization key.
This functions returns authorization for the USER account submitting jobs. It assumes that kinit
has been run.
Returns:
Name Type Descriptionauth
Dict[str, str]
Dictionary containing Kerberos authorization key.
Source code inlute/io/elog.py
def get_elog_kerberos_auth() -> Dict[str, str]:\n \"\"\"Returns Kerberos authorization key.\n\n This functions returns authorization for the USER account submitting jobs.\n It assumes that `kinit` has been run.\n\n Returns:\n auth (Dict[str, str]): Dictionary containing Kerberos authorization key.\n \"\"\"\n from krtc import KerberosTicket\n\n return KerberosTicket(\"HTTP@pswww.slac.stanford.edu\").getAuthHeaders()\n
"},{"location":"source/io/elog/#io.elog.get_elog_opr_auth","title":"get_elog_opr_auth(exp)
","text":"Produce authentication for the \"opr\" user associated to an experiment.
This method uses basic authentication using username and password.
Parameters:
Name Type Description Defaultexp
str
Name of the experiment to produce authentication for.
requiredReturns:
Name Type Descriptionauth
HTTPBasicAuth
HTTPBasicAuth for an active experiment based on username and password for the associated operator account.
Source code inlute/io/elog.py
def get_elog_opr_auth(exp: str) -> HTTPBasicAuth:\n \"\"\"Produce authentication for the \"opr\" user associated to an experiment.\n\n This method uses basic authentication using username and password.\n\n Args:\n exp (str): Name of the experiment to produce authentication for.\n\n Returns:\n auth (HTTPBasicAuth): HTTPBasicAuth for an active experiment based on\n username and password for the associated operator account.\n \"\"\"\n opr: str = f\"{exp[:3]}opr\"\n with open(\"/sdf/group/lcls/ds/tools/forElogPost.txt\", \"r\") as f:\n pw: str = f.readline()[:-1]\n return HTTPBasicAuth(opr, pw)\n
"},{"location":"source/io/elog/#io.elog.get_elog_params_by_run","title":"get_elog_params_by_run(exp, params, runs=None)
","text":"Retrieve requested parameters by run or for all runs.
Parameters:
Name Type Description Defaultexp
str
Experiment to retrieve parameters for.
requiredparams
List[str]
A list of parameters to retrieve. These can be any parameter recorded in the eLog (PVs, parameters posted by other Tasks, etc.)
required Source code inlute/io/elog.py
def get_elog_params_by_run(\n exp: str, params: List[str], runs: Optional[List[int]] = None\n) -> Dict[str, str]:\n \"\"\"Retrieve requested parameters by run or for all runs.\n\n Args:\n exp (str): Experiment to retrieve parameters for.\n\n params (List[str]): A list of parameters to retrieve. These can be any\n parameter recorded in the eLog (PVs, parameters posted by other\n Tasks, etc.)\n \"\"\"\n ...\n
"},{"location":"source/io/elog/#io.elog.get_elog_runs_by_tag","title":"get_elog_runs_by_tag(exp, tag, auth=None)
","text":"Retrieve run numbers with a specified tag.
Parameters:
Name Type Description Defaultexp
str
Experiment name.
requiredtag
str
The tag to retrieve runs for.
required Source code inlute/io/elog.py
def get_elog_runs_by_tag(\n exp: str, tag: str, auth: Optional[Union[HTTPBasicAuth, Dict]] = None\n) -> List[int]:\n \"\"\"Retrieve run numbers with a specified tag.\n\n Args:\n exp (str): Experiment name.\n\n tag (str): The tag to retrieve runs for.\n \"\"\"\n endpoint: str = f\"{exp}/ws/get_runs_with_tag?tag={tag}\"\n params: Dict[str, Any] = {}\n\n status_code, resp_msg, tagged_runs = elog_http_request(\n exp=exp, endpoint=endpoint, request_type=\"GET\", **params\n )\n\n if not tagged_runs:\n tagged_runs = []\n\n return tagged_runs\n
"},{"location":"source/io/elog/#io.elog.get_elog_workflows","title":"get_elog_workflows(exp)
","text":"Get the current workflow definitions for an experiment.
Returns:
Name Type Descriptiondefns
Dict[str, str]
A dictionary of workflow definitions.
Source code inlute/io/elog.py
def get_elog_workflows(exp: str) -> Dict[str, str]:\n \"\"\"Get the current workflow definitions for an experiment.\n\n Returns:\n defns (Dict[str, str]): A dictionary of workflow definitions.\n \"\"\"\n raise NotImplementedError\n
"},{"location":"source/io/elog/#io.elog.post_elog_message","title":"post_elog_message(exp, msg, *, tag, title, in_files=[])
","text":"Post a new message to the eLog. Inspired by the elog
package.
Parameters:
Name Type Description Defaultexp
str
Experiment name.
requiredmsg
str
BODY of the eLog post.
requiredtag
str | None
Optional \"tag\" to associate with the eLog post.
requiredtitle
str | None
Optional title to include in the eLog post.
requiredin_files
List[str | tuple | list]
Files to include as attachments in the eLog post.
[]
Returns:
Name Type Descriptionerr_msg
str | None
If successful, nothing is returned, otherwise, return an error message.
Source code inlute/io/elog.py
def post_elog_message(\n exp: str,\n msg: str,\n *,\n tag: Optional[str],\n title: Optional[str],\n in_files: List[Union[str, tuple, list]] = [],\n) -> Optional[str]:\n \"\"\"Post a new message to the eLog. Inspired by the `elog` package.\n\n Args:\n exp (str): Experiment name.\n\n msg (str): BODY of the eLog post.\n\n tag (str | None): Optional \"tag\" to associate with the eLog post.\n\n title (str | None): Optional title to include in the eLog post.\n\n in_files (List[str | tuple | list]): Files to include as attachments in\n the eLog post.\n\n Returns:\n err_msg (str | None): If successful, nothing is returned, otherwise,\n return an error message.\n \"\"\"\n # MOSTLY CORRECT\n out_files: list = []\n for f in in_files:\n try:\n out_files.append(format_file_for_post(in_file=f))\n except ElogFileFormatError as err:\n logger.debug(f\"ElogFileFormatError: {err}\")\n post: Dict[str, str] = {}\n post[\"log_text\"] = msg\n if tag:\n post[\"log_tags\"] = tag\n if title:\n post[\"log_title\"] = title\n\n endpoint: str = f\"{exp}/ws/new_elog_entry\"\n\n params: Dict[str, Any] = {\"data\": post}\n\n if out_files:\n params.update({\"files\": out_files})\n\n status_code, resp_msg, _ = elog_http_request(\n exp=exp, endpoint=endpoint, request_type=\"POST\", **params\n )\n\n if resp_msg != \"SUCCESS\":\n return resp_msg\n
"},{"location":"source/io/elog/#io.elog.post_elog_run_status","title":"post_elog_run_status(data, update_url=None)
","text":"Post a summary to the status/report section of a specific run.
In contrast to most eLog update/post mechanisms, this function searches for a specific environment variable which contains a specific URL for posting. This is updated every job/run as jobs are submitted by the JID. The URL can optionally be passed to this function if it is known.
Parameters:
Name Type Description Defaultdata
Dict[str, Union[str, int, float]]
The data to post to the eLog report section. Formatted in key:value pairs.
requiredupdate_url
Optional[str]
Optional update URL. If not provided, the function searches for the corresponding environment variable. If neither is found, the function aborts
None
Source code in lute/io/elog.py
def post_elog_run_status(\n data: Dict[str, Union[str, int, float]], update_url: Optional[str] = None\n) -> None:\n \"\"\"Post a summary to the status/report section of a specific run.\n\n In contrast to most eLog update/post mechanisms, this function searches\n for a specific environment variable which contains a specific URL for\n posting. This is updated every job/run as jobs are submitted by the JID.\n The URL can optionally be passed to this function if it is known.\n\n Args:\n data (Dict[str, Union[str, int, float]]): The data to post to the eLog\n report section. Formatted in key:value pairs.\n\n update_url (Optional[str]): Optional update URL. If not provided, the\n function searches for the corresponding environment variable. If\n neither is found, the function aborts\n \"\"\"\n if update_url is None:\n update_url = os.environ.get(\"JID_UPDATE_COUNTERS\")\n if update_url is None:\n logger.info(\"eLog Update Failed! JID_UPDATE_COUNTERS is not defined!\")\n return\n current_status: Dict[str, Union[str, int, float]] = _get_current_run_status(\n update_url\n )\n current_status.update(data)\n post_list: List[Dict[str, str]] = [\n {\"key\": f\"{key}\", \"value\": f\"{value}\"} for key, value in current_status.items()\n ]\n params: Dict[str, List[Dict[str, str]]] = {\"json\": post_list}\n resp: requests.models.Response = requests.post(update_url, **params)\n
"},{"location":"source/io/elog/#io.elog.post_elog_run_table","title":"post_elog_run_table(exp, run, data)
","text":"Post data for eLog run tables.
Parameters:
Name Type Description Defaultexp
str
Experiment name.
requiredrun
int
Run number corresponding to the data being posted.
requireddata
Dict[str, Any]
Data to be posted in format data[\"column_header\"] = value.
requiredReturns:
Name Type Descriptionerr_msg
None | str
If successful, nothing is returned, otherwise, return an error message.
Source code inlute/io/elog.py
def post_elog_run_table(\n exp: str,\n run: int,\n data: Dict[str, Any],\n) -> Optional[str]:\n \"\"\"Post data for eLog run tables.\n\n Args:\n exp (str): Experiment name.\n\n run (int): Run number corresponding to the data being posted.\n\n data (Dict[str, Any]): Data to be posted in format\n data[\"column_header\"] = value.\n\n Returns:\n err_msg (None | str): If successful, nothing is returned, otherwise,\n return an error message.\n \"\"\"\n endpoint: str = f\"run_control/{exp}/ws/add_run_params\"\n\n params: Dict[str, Any] = {\"params\": {\"run_num\": run}, \"json\": data}\n\n status_code, resp_msg, _ = elog_http_request(\n exp=exp, endpoint=endpoint, request_type=\"POST\", **params\n )\n\n if resp_msg != \"SUCCESS\":\n return resp_msg\n
"},{"location":"source/io/elog/#io.elog.post_elog_workflow","title":"post_elog_workflow(exp, name, executable, wf_params, *, trigger='run_end', location='S3DF', **trig_args)
","text":"Create a new eLog workflow, or update an existing one.
The workflow will run a specific executable as a batch job when the specified trigger occurs. The precise arguments may vary depending on the selected trigger type.
Parameters:
Name Type Description Defaultname
str
An identifying name for the workflow. E.g. \"process data\"
requiredexecutable
str
Full path to the executable to be run.
requiredwf_params
str
All command-line parameters for the executable as a string.
requiredtrigger
str
When to trigger execution of the specified executable. One of: - 'manual': Must be manually triggered. No automatic processing. - 'run_start': Execute immediately if a new run begins. - 'run_end': As soon as a run ends. - 'param_is': As soon as a parameter has a specific value for a run.
'run_end'
location
str
Where to submit the job. S3DF or NERSC.
'S3DF'
**trig_args
str
Arguments required for a specific trigger type. trigger='param_is' - 2 Arguments trig_param (str): Name of the parameter to watch for. trig_param_val (str): Value the parameter should have to trigger.
{}
Source code in lute/io/elog.py
def post_elog_workflow(\n exp: str,\n name: str,\n executable: str,\n wf_params: str,\n *,\n trigger: str = \"run_end\",\n location: str = \"S3DF\",\n **trig_args: str,\n) -> None:\n \"\"\"Create a new eLog workflow, or update an existing one.\n\n The workflow will run a specific executable as a batch job when the\n specified trigger occurs. The precise arguments may vary depending on the\n selected trigger type.\n\n Args:\n name (str): An identifying name for the workflow. E.g. \"process data\"\n\n executable (str): Full path to the executable to be run.\n\n wf_params (str): All command-line parameters for the executable as a string.\n\n trigger (str): When to trigger execution of the specified executable.\n One of:\n - 'manual': Must be manually triggered. No automatic processing.\n - 'run_start': Execute immediately if a new run begins.\n - 'run_end': As soon as a run ends.\n - 'param_is': As soon as a parameter has a specific value for a run.\n\n location (str): Where to submit the job. S3DF or NERSC.\n\n **trig_args (str): Arguments required for a specific trigger type.\n trigger='param_is' - 2 Arguments\n trig_param (str): Name of the parameter to watch for.\n trig_param_val (str): Value the parameter should have to trigger.\n \"\"\"\n endpoint: str = f\"{exp}/ws/create_update_workflow_def\"\n trig_map: Dict[str, str] = {\n \"manual\": \"MANUAL\",\n \"run_start\": \"START_OF_RUN\",\n \"run_end\": \"END_OF_RUN\",\n \"param_is\": \"RUN_PARAM_IS_VALUE\",\n }\n if trigger not in trig_map.keys():\n raise NotImplementedError(\n f\"Cannot create workflow with trigger type: {trigger}\"\n )\n wf_defn: Dict[str, str] = {\n \"name\": name,\n \"executable\": executable,\n \"parameters\": wf_params,\n \"trigger\": trig_map[trigger],\n \"location\": location,\n }\n if trigger == \"param_is\":\n if \"trig_param\" not in trig_args or \"trig_param_val\" not in trig_args:\n raise RuntimeError(\n \"Trigger type 'param_is' requires: 'trig_param' and 'trig_param_val' arguments\"\n )\n wf_defn.update(\n {\n \"run_param_name\": trig_args[\"trig_param\"],\n \"run_param_val\": trig_args[\"trig_param_val\"],\n }\n )\n post_params: Dict[str, Dict[str, str]] = {\"json\": wf_defn}\n status_code, resp_msg, _ = elog_http_request(\n exp, endpoint=endpoint, request_type=\"POST\", **post_params\n )\n
"},{"location":"source/io/exceptions/","title":"exceptions","text":"Specifies custom exceptions defined for IO problems.
Raises:
Type DescriptionElogFileFormatError
Raised if an attachment is specified in an incorrect format.
"},{"location":"source/io/exceptions/#io.exceptions.ElogFileFormatError","title":"ElogFileFormatError
","text":" Bases: Exception
Raised when an eLog attachment is specified in an invalid format.
Source code inlute/io/exceptions.py
class ElogFileFormatError(Exception):\n \"\"\"Raised when an eLog attachment is specified in an invalid format.\"\"\"\n\n ...\n
"},{"location":"source/io/models/base/","title":"base","text":"Base classes for describing Task parameters.
Classes:
Name DescriptionAnalysisHeader
Model holding shared configuration across Tasks. E.g. experiment name, run number and working directory.
TaskParameters
Base class for Task parameters. Subclasses specify a model of parameters and their types for validation.
ThirdPartyParameters
Base class for Third-party, binary executable Tasks.
TemplateParameters
Dataclass to represent parameters of binary (third-party) Tasks which are used for additional config files.
TemplateConfig
Class for holding information on where templates are stored in order to properly handle ThirdPartyParameter objects.
"},{"location":"source/io/models/base/#io.models.base.AnalysisHeader","title":"AnalysisHeader
","text":" Bases: BaseModel
Header information for LUTE analysis runs.
Source code inlute/io/models/base.py
class AnalysisHeader(BaseModel):\n \"\"\"Header information for LUTE analysis runs.\"\"\"\n\n title: str = Field(\n \"LUTE Task Configuration\",\n description=\"Description of the configuration or experiment.\",\n )\n experiment: str = Field(\"\", description=\"Experiment.\")\n run: Union[str, int] = Field(\"\", description=\"Data acquisition run.\")\n date: str = Field(\"1970/01/01\", description=\"Start date of analysis.\")\n lute_version: Union[float, str] = Field(\n 0.1, description=\"Version of LUTE used for analysis.\"\n )\n task_timeout: PositiveInt = Field(\n 600,\n description=(\n \"Time in seconds until a task times out. Should be slightly shorter\"\n \" than job timeout if using a job manager (e.g. SLURM).\"\n ),\n )\n work_dir: str = Field(\"\", description=\"Main working directory for LUTE.\")\n\n @validator(\"work_dir\", always=True)\n def validate_work_dir(cls, directory: str, values: Dict[str, Any]) -> str:\n work_dir: str\n if directory == \"\":\n std_work_dir = (\n f\"/sdf/data/lcls/ds/{values['experiment'][:3]}/\"\n f\"{values['experiment']}/scratch\"\n )\n work_dir = std_work_dir\n else:\n work_dir = directory\n # Check existence and permissions\n if not os.path.exists(work_dir):\n raise ValueError(f\"Working Directory: {work_dir} does not exist!\")\n if not os.access(work_dir, os.W_OK):\n # Need write access for database, files etc.\n raise ValueError(f\"Not write access for working directory: {work_dir}!\")\n return work_dir\n\n @validator(\"run\", always=True)\n def validate_run(\n cls, run: Union[str, int], values: Dict[str, Any]\n ) -> Union[str, int]:\n if run == \"\":\n # From Airflow RUN_NUM should have Format \"RUN_DATETIME\" - Num is first part\n run_time: str = os.environ.get(\"RUN_NUM\", \"\")\n if run_time != \"\":\n return int(run_time.split(\"_\")[0])\n return run\n\n @validator(\"experiment\", always=True)\n def validate_experiment(cls, experiment: str, values: Dict[str, Any]) -> str:\n if experiment == \"\":\n arp_exp: str = os.environ.get(\"EXPERIMENT\", \"EXPX00000\")\n return arp_exp\n return experiment\n
"},{"location":"source/io/models/base/#io.models.base.TaskParameters","title":"TaskParameters
","text":" Bases: BaseSettings
Base class for models of task parameters to be validated.
Parameters are read from a configuration YAML file and validated against subclasses of this type in order to ensure that both all parameters are present, and that the parameters are of the correct type.
NotePydantic is used for data validation. Pydantic does not perform \"strict\" validation by default. Parameter values may be cast to conform with the model specified by the subclass definition if it is possible to do so. Consider whether this may cause issues (e.g. if a float is cast to an int).
Source code inlute/io/models/base.py
class TaskParameters(BaseSettings):\n \"\"\"Base class for models of task parameters to be validated.\n\n Parameters are read from a configuration YAML file and validated against\n subclasses of this type in order to ensure that both all parameters are\n present, and that the parameters are of the correct type.\n\n Note:\n Pydantic is used for data validation. Pydantic does not perform \"strict\"\n validation by default. Parameter values may be cast to conform with the\n model specified by the subclass definition if it is possible to do so.\n Consider whether this may cause issues (e.g. if a float is cast to an\n int).\n \"\"\"\n\n class Config:\n \"\"\"Configuration for parameters model.\n\n The Config class holds Pydantic configuration. A number of LUTE-specific\n configuration has also been placed here.\n\n Attributes:\n env_prefix (str): Pydantic configuration. Will set parameters from\n environment variables containing this prefix. E.g. a model\n parameter `input` can be set with an environment variable:\n `{env_prefix}input`, in LUTE's case `LUTE_input`.\n\n underscore_attrs_are_private (bool): Pydantic configuration. Whether\n to hide attributes (parameters) prefixed with an underscore.\n\n copy_on_model_validation (str): Pydantic configuration. How to copy\n the input object passed to the class instance for model\n validation. Set to perform a deep copy.\n\n allow_inf_nan (bool): Pydantic configuration. Whether to allow\n infinity or NAN in float fields.\n\n run_directory (Optional[str]): None. If set, it should be a valid\n path. The `Task` will be run from this directory. This may be\n useful for some `Task`s which rely on searching the working\n directory.\n\n set_result (bool). False. If True, the model has information about\n setting the TaskResult object from the parameters it contains.\n E.g. it has an `output` parameter which is marked as the result.\n The result can be set with a field value of `is_result=True` on\n a specific parameter, or using `result_from_params` and a\n validator.\n\n result_from_params (Optional[str]): None. Optionally used to define\n results from information available in the model using a custom\n validator. E.g. use a `outdir` and `filename` field to set\n `result_from_params=f\"{outdir}/{filename}`, etc. Only used if\n `set_result==True`\n\n result_summary (Optional[str]): None. Defines a result summary that\n can be known after processing the Pydantic model. Use of summary\n depends on the Executor running the Task. All summaries are\n stored in the database, however. Only used if `set_result==True`\n\n impl_schemas (Optional[str]). Specifies a the schemas the\n output/results conform to. Only used if `set_result==True`.\n \"\"\"\n\n env_prefix = \"LUTE_\"\n underscore_attrs_are_private: bool = True\n copy_on_model_validation: str = \"deep\"\n allow_inf_nan: bool = False\n\n run_directory: Optional[str] = None\n \"\"\"Set the directory that the Task is run from.\"\"\"\n set_result: bool = False\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n result_from_params: Optional[str] = None\n \"\"\"Defines a result from the parameters. Use a validator to do so.\"\"\"\n result_summary: Optional[str] = None\n \"\"\"Format a TaskResult.summary from output.\"\"\"\n impl_schemas: Optional[str] = None\n \"\"\"Schema specification for output result. Will be passed to TaskResult.\"\"\"\n\n lute_config: AnalysisHeader\n
"},{"location":"source/io/models/base/#io.models.base.TaskParameters.Config","title":"Config
","text":"Configuration for parameters model.
The Config class holds Pydantic configuration. A number of LUTE-specific configuration has also been placed here.
Attributes:
Name Type Descriptionenv_prefix
str
Pydantic configuration. Will set parameters from environment variables containing this prefix. E.g. a model parameter input
can be set with an environment variable: {env_prefix}input
, in LUTE's case LUTE_input
.
underscore_attrs_are_private
bool
Pydantic configuration. Whether to hide attributes (parameters) prefixed with an underscore.
copy_on_model_validation
str
Pydantic configuration. How to copy the input object passed to the class instance for model validation. Set to perform a deep copy.
allow_inf_nan
bool
Pydantic configuration. Whether to allow infinity or NAN in float fields.
run_directory
Optional[str]
None. If set, it should be a valid path. The Task
will be run from this directory. This may be useful for some Task
s which rely on searching the working directory.
result_from_params
Optional[str]
None. Optionally used to define results from information available in the model using a custom validator. E.g. use a outdir
and filename
field to set result_from_params=f\"{outdir}/{filename}
, etc. Only used if set_result==True
result_summary
Optional[str]
None. Defines a result summary that can be known after processing the Pydantic model. Use of summary depends on the Executor running the Task. All summaries are stored in the database, however. Only used if set_result==True
lute/io/models/base.py
class Config:\n \"\"\"Configuration for parameters model.\n\n The Config class holds Pydantic configuration. A number of LUTE-specific\n configuration has also been placed here.\n\n Attributes:\n env_prefix (str): Pydantic configuration. Will set parameters from\n environment variables containing this prefix. E.g. a model\n parameter `input` can be set with an environment variable:\n `{env_prefix}input`, in LUTE's case `LUTE_input`.\n\n underscore_attrs_are_private (bool): Pydantic configuration. Whether\n to hide attributes (parameters) prefixed with an underscore.\n\n copy_on_model_validation (str): Pydantic configuration. How to copy\n the input object passed to the class instance for model\n validation. Set to perform a deep copy.\n\n allow_inf_nan (bool): Pydantic configuration. Whether to allow\n infinity or NAN in float fields.\n\n run_directory (Optional[str]): None. If set, it should be a valid\n path. The `Task` will be run from this directory. This may be\n useful for some `Task`s which rely on searching the working\n directory.\n\n set_result (bool). False. If True, the model has information about\n setting the TaskResult object from the parameters it contains.\n E.g. it has an `output` parameter which is marked as the result.\n The result can be set with a field value of `is_result=True` on\n a specific parameter, or using `result_from_params` and a\n validator.\n\n result_from_params (Optional[str]): None. Optionally used to define\n results from information available in the model using a custom\n validator. E.g. use a `outdir` and `filename` field to set\n `result_from_params=f\"{outdir}/{filename}`, etc. Only used if\n `set_result==True`\n\n result_summary (Optional[str]): None. Defines a result summary that\n can be known after processing the Pydantic model. Use of summary\n depends on the Executor running the Task. All summaries are\n stored in the database, however. Only used if `set_result==True`\n\n impl_schemas (Optional[str]). Specifies a the schemas the\n output/results conform to. Only used if `set_result==True`.\n \"\"\"\n\n env_prefix = \"LUTE_\"\n underscore_attrs_are_private: bool = True\n copy_on_model_validation: str = \"deep\"\n allow_inf_nan: bool = False\n\n run_directory: Optional[str] = None\n \"\"\"Set the directory that the Task is run from.\"\"\"\n set_result: bool = False\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n result_from_params: Optional[str] = None\n \"\"\"Defines a result from the parameters. Use a validator to do so.\"\"\"\n result_summary: Optional[str] = None\n \"\"\"Format a TaskResult.summary from output.\"\"\"\n impl_schemas: Optional[str] = None\n \"\"\"Schema specification for output result. Will be passed to TaskResult.\"\"\"\n
"},{"location":"source/io/models/base/#io.models.base.TaskParameters.Config.impl_schemas","title":"impl_schemas: Optional[str] = None
class-attribute
instance-attribute
","text":"Schema specification for output result. Will be passed to TaskResult.
"},{"location":"source/io/models/base/#io.models.base.TaskParameters.Config.result_from_params","title":"result_from_params: Optional[str] = None
class-attribute
instance-attribute
","text":"Defines a result from the parameters. Use a validator to do so.
"},{"location":"source/io/models/base/#io.models.base.TaskParameters.Config.result_summary","title":"result_summary: Optional[str] = None
class-attribute
instance-attribute
","text":"Format a TaskResult.summary from output.
"},{"location":"source/io/models/base/#io.models.base.TaskParameters.Config.run_directory","title":"run_directory: Optional[str] = None
class-attribute
instance-attribute
","text":"Set the directory that the Task is run from.
"},{"location":"source/io/models/base/#io.models.base.TaskParameters.Config.set_result","title":"set_result: bool = False
class-attribute
instance-attribute
","text":"Whether the Executor should mark a specified parameter as a result.
"},{"location":"source/io/models/base/#io.models.base.TemplateConfig","title":"TemplateConfig
","text":" Bases: BaseModel
Parameters used for templating of third party configuration files.
Attributes:
Name Type Descriptiontemplate_name
str
The name of the template to use. This template must live in config/templates
.
output_path
str
The FULL path, including filename to write the rendered template to.
Source code inlute/io/models/base.py
class TemplateConfig(BaseModel):\n \"\"\"Parameters used for templating of third party configuration files.\n\n Attributes:\n template_name (str): The name of the template to use. This template must\n live in `config/templates`.\n\n output_path (str): The FULL path, including filename to write the\n rendered template to.\n \"\"\"\n\n template_name: str\n output_path: str\n
"},{"location":"source/io/models/base/#io.models.base.TemplateParameters","title":"TemplateParameters
","text":"Class for representing parameters for third party configuration files.
These parameters can represent arbitrary data types and are used in conjunction with templates for modifying third party configuration files from the single LUTE YAML. Due to the storage of arbitrary data types, and the use of a template file, a single instance of this class can hold from a single template variable to an entire configuration file. The data parsing is done by jinja using the complementary template. All data is stored in the single model variable params.
The pydantic \"dataclass\" is used over the BaseModel/Settings to allow positional argument instantiation of the params
Field.
lute/io/models/base.py
@dataclass\nclass TemplateParameters:\n \"\"\"Class for representing parameters for third party configuration files.\n\n These parameters can represent arbitrary data types and are used in\n conjunction with templates for modifying third party configuration files\n from the single LUTE YAML. Due to the storage of arbitrary data types, and\n the use of a template file, a single instance of this class can hold from a\n single template variable to an entire configuration file. The data parsing\n is done by jinja using the complementary template.\n All data is stored in the single model variable `params.`\n\n The pydantic \"dataclass\" is used over the BaseModel/Settings to allow\n positional argument instantiation of the `params` Field.\n \"\"\"\n\n params: Any\n
"},{"location":"source/io/models/base/#io.models.base.ThirdPartyParameters","title":"ThirdPartyParameters
","text":" Bases: TaskParameters
Base class for third party task parameters.
Contains special validators for extra arguments and handling of parameters used for filling in third party configuration files.
Source code in lute/io/models/base.py
class ThirdPartyParameters(TaskParameters):\n \"\"\"Base class for third party task parameters.\n\n Contains special validators for extra arguments and handling of parameters\n used for filling in third party configuration files.\n \"\"\"\n\n class Config(TaskParameters.Config):\n \"\"\"Configuration for parameters model.\n\n The Config class holds Pydantic configuration and inherited configuration\n from the base `TaskParameters.Config` class. A number of values are also\n overridden, and there are some specific configuration options to\n ThirdPartyParameters. A full list of options (with TaskParameters options\n repeated) is described below.\n\n Attributes:\n env_prefix (str): Pydantic configuration. Will set parameters from\n environment variables containing this prefix. E.g. a model\n parameter `input` can be set with an environment variable:\n `{env_prefix}input`, in LUTE's case `LUTE_input`.\n\n underscore_attrs_are_private (bool): Pydantic configuration. Whether\n to hide attributes (parameters) prefixed with an underscore.\n\n copy_on_model_validation (str): Pydantic configuration. How to copy\n the input object passed to the class instance for model\n validation. Set to perform a deep copy.\n\n allow_inf_nan (bool): Pydantic configuration. Whether to allow\n infinity or NAN in float fields.\n\n run_directory (Optional[str]): None. If set, it should be a valid\n path. The `Task` will be run from this directory. This may be\n useful for some `Task`s which rely on searching the working\n directory.\n\n set_result (bool). True. If True, the model has information about\n setting the TaskResult object from the parameters it contains.\n E.g. it has an `output` parameter which is marked as the result.\n The result can be set with a field value of `is_result=True` on\n a specific parameter, or using `result_from_params` and a\n validator.\n\n result_from_params (Optional[str]): None. Optionally used to define\n results from information available in the model using a custom\n validator. E.g. use a `outdir` and `filename` field to set\n `result_from_params=f\"{outdir}/{filename}`, etc.\n\n result_summary (Optional[str]): None. Defines a result summary that\n can be known after processing the Pydantic model. Use of summary\n depends on the Executor running the Task. All summaries are\n stored in the database, however.\n\n impl_schemas (Optional[str]). Specifies a the schemas the\n output/results conform to. Only used if set_result is True.\n\n -----------------------\n ThirdPartyTask-specific:\n\n extra (str): \"allow\". Pydantic configuration. Allow (or ignore) extra\n arguments.\n\n short_flags_use_eq (bool): False. If True, \"short\" command-line args\n are passed as `-x=arg`. ThirdPartyTask-specific.\n\n long_flags_use_eq (bool): False. If True, \"long\" command-line args\n are passed as `--long=arg`. ThirdPartyTask-specific.\n \"\"\"\n\n extra: str = \"allow\"\n short_flags_use_eq: bool = False\n \"\"\"Whether short command-line arguments are passed like `-x=arg`.\"\"\"\n long_flags_use_eq: bool = False\n \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n # lute_template_cfg: TemplateConfig\n\n @root_validator(pre=False)\n def extra_fields_to_thirdparty(cls, values: Dict[str, Any]):\n for key in values:\n if key not in cls.__fields__:\n values[key] = TemplateParameters(values[key])\n\n return values\n
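The extra_fields_to_thirdparty root validator above is what turns unrecognized fields into TemplateParameters objects so they can later be fed to a Jinja template. A standalone sketch of that idea (not LUTE's actual code path; wrap_extra_fields and the example field names are hypothetical):

from typing import Any, Dict, Set

from lute.io.models.base import TemplateParameters


def wrap_extra_fields(values: Dict[str, Any], known_fields: Set[str]) -> Dict[str, Any]:
    """Wrap any value whose key is not a declared model field in TemplateParameters."""
    return {
        key: value if key in known_fields else TemplateParameters(value)
        for key, value in values.items()
    }


wrapped = wrap_extra_fields(
    {"out_file": "peaks.lst", "custom_template_var": 3},  # hypothetical values
    known_fields={"out_file"},
)
# wrapped["custom_template_var"] is now a TemplateParameters instance.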
"},{"location":"source/io/models/base/#io.models.base.ThirdPartyParameters.Config","title":"Config
","text":" Bases: Config
Configuration for parameters model.
The Config class holds Pydantic configuration and inherited configuration from the base TaskParameters.Config
class. A number of values are overridden, and there are some configuration options specific to ThirdPartyParameters. A full list of options (with the TaskParameters options repeated) is described below.
Attributes:
Name Type Description
env_prefix
str
Pydantic configuration. Will set parameters from environment variables containing this prefix. E.g. a model parameter input
can be set with an environment variable: {env_prefix}input
, in LUTE's case LUTE_input
.
underscore_attrs_are_private
bool
Pydantic configuration. Whether to hide attributes (parameters) prefixed with an underscore.
copy_on_model_validation
str
Pydantic configuration. How to copy the input object passed to the class instance for model validation. Set to perform a deep copy.
allow_inf_nan
bool
Pydantic configuration. Whether to allow infinity or NAN in float fields.
run_directory
Optional[str]
None. If set, it should be a valid path. The Task
will be run from this directory. This may be useful for some Task
s which rely on searching the working directory.
result_from_params
Optional[str]
None. Optionally used to define results from information available in the model using a custom validator. E.g. use a outdir
and filename
field to set result_from_params=f\"{outdir}/{filename}
, etc.
result_summary
Optional[str]
None. Defines a result summary that can be known after processing the Pydantic model. Use of summary depends on the Executor running the Task. All summaries are stored in the database, however.
set_result
bool
True. If True, the model has information about setting the TaskResult object from the parameters it contains. E.g. it has an output parameter which is marked as the result. The result can be set with a field value of is_result=True on a specific parameter, or using result_from_params and a validator.
impl_schemas
Optional[str]
Specifies the schemas the output/results conform to. Only used if set_result is True.
extra
str
\"allow\". Pydantic configuration. Allow (or ignore) extra arguments.
short_flags_use_eq
bool
False. If True, \"short\" command-line args are passed as -x=arg
. ThirdPartyTask-specific.
long_flags_use_eq
bool
False. If True, \"long\" command-line args are passed as --long=arg
. ThirdPartyTask-specific.
lute/io/models/base.py
class Config(TaskParameters.Config):\n \"\"\"Configuration for parameters model.\n\n The Config class holds Pydantic configuration and inherited configuration\n from the base `TaskParameters.Config` class. A number of values are also\n overridden, and there are some specific configuration options to\n ThirdPartyParameters. A full list of options (with TaskParameters options\n repeated) is described below.\n\n Attributes:\n env_prefix (str): Pydantic configuration. Will set parameters from\n environment variables containing this prefix. E.g. a model\n parameter `input` can be set with an environment variable:\n `{env_prefix}input`, in LUTE's case `LUTE_input`.\n\n underscore_attrs_are_private (bool): Pydantic configuration. Whether\n to hide attributes (parameters) prefixed with an underscore.\n\n copy_on_model_validation (str): Pydantic configuration. How to copy\n the input object passed to the class instance for model\n validation. Set to perform a deep copy.\n\n allow_inf_nan (bool): Pydantic configuration. Whether to allow\n infinity or NAN in float fields.\n\n run_directory (Optional[str]): None. If set, it should be a valid\n path. The `Task` will be run from this directory. This may be\n useful for some `Task`s which rely on searching the working\n directory.\n\n set_result (bool). True. If True, the model has information about\n setting the TaskResult object from the parameters it contains.\n E.g. it has an `output` parameter which is marked as the result.\n The result can be set with a field value of `is_result=True` on\n a specific parameter, or using `result_from_params` and a\n validator.\n\n result_from_params (Optional[str]): None. Optionally used to define\n results from information available in the model using a custom\n validator. E.g. use a `outdir` and `filename` field to set\n `result_from_params=f\"{outdir}/{filename}`, etc.\n\n result_summary (Optional[str]): None. Defines a result summary that\n can be known after processing the Pydantic model. Use of summary\n depends on the Executor running the Task. All summaries are\n stored in the database, however.\n\n impl_schemas (Optional[str]). Specifies a the schemas the\n output/results conform to. Only used if set_result is True.\n\n -----------------------\n ThirdPartyTask-specific:\n\n extra (str): \"allow\". Pydantic configuration. Allow (or ignore) extra\n arguments.\n\n short_flags_use_eq (bool): False. If True, \"short\" command-line args\n are passed as `-x=arg`. ThirdPartyTask-specific.\n\n long_flags_use_eq (bool): False. If True, \"long\" command-line args\n are passed as `--long=arg`. ThirdPartyTask-specific.\n \"\"\"\n\n extra: str = \"allow\"\n short_flags_use_eq: bool = False\n \"\"\"Whether short command-line arguments are passed like `-x=arg`.\"\"\"\n long_flags_use_eq: bool = False\n \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n
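To make the short_flags_use_eq and long_flags_use_eq options concrete, here is an illustrative helper (not LUTE's actual argument builder) showing how a field would be rendered on a third party command line under each setting:

def render_flag(name: str, value: str, flag_type: str, use_eq: bool) -> str:
    # flag_type is "-" for short options and "--" for long options, matching
    # the flag_type metadata used on the parameter models in this module.
    flag = f"{flag_type}{name}"
    return f"{flag}={value}" if use_eq else f"{flag} {value}"


render_flag("o", "out.stream", "-", use_eq=False)       # '-o out.stream'
render_flag("indexing", "xgandalf", "--", use_eq=True)  # '--indexing=xgandalf'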
"},{"location":"source/io/models/base/#io.models.base.ThirdPartyParameters.Config.long_flags_use_eq","title":"long_flags_use_eq: bool = False
class-attribute
instance-attribute
","text":"Whether long command-line arguments are passed like --long=arg
.
set_result: bool = True
class-attribute
instance-attribute
","text":"Whether the Executor should mark a specified parameter as a result.
"},{"location":"source/io/models/base/#io.models.base.ThirdPartyParameters.Config.short_flags_use_eq","title":"short_flags_use_eq: bool = False
class-attribute
instance-attribute
","text":"Whether short command-line arguments are passed like -x=arg
.
FindPeaksPsocakeParameters
","text":" Bases: ThirdPartyParameters
Parameters for crystallographic (Bragg) peak finding using Psocake.
This peak finding Task optionally has the ability to compress/decompress data with SZ for the purpose of compression validation. NOTE: This Task is deprecated and provided for compatibility only.
Source code in lute/io/models/sfx_find_peaks.py
class FindPeaksPsocakeParameters(ThirdPartyParameters):\n \"\"\"Parameters for crystallographic (Bragg) peak finding using Psocake.\n\n This peak finding Task optionally has the ability to compress/decompress\n data with SZ for the purpose of compression validation.\n NOTE: This Task is deprecated and provided for compatibility only.\n \"\"\"\n\n class Config(TaskParameters.Config):\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n result_from_params: str = \"\"\n \"\"\"Defines a result from the parameters. Use a validator to do so.\"\"\"\n\n class SZParameters(BaseModel):\n compressor: Literal[\"qoz\", \"sz3\"] = Field(\n \"qoz\", description=\"SZ compression algorithm (qoz, sz3)\"\n )\n binSize: int = Field(2, description=\"SZ compression's bin size paramater\")\n roiWindowSize: int = Field(\n 2, description=\"SZ compression's ROI window size paramater\"\n )\n absError: float = Field(10, descriptionp=\"Maximum absolute error value\")\n\n executable: str = Field(\"mpirun\", description=\"MPI executable.\", flag_type=\"\")\n np: PositiveInt = Field(\n max(int(os.environ.get(\"SLURM_NPROCS\", len(os.sched_getaffinity(0)))) - 1, 1),\n description=\"Number of processes\",\n flag_type=\"-\",\n )\n mca: str = Field(\n \"btl ^openib\", description=\"Mca option for the MPI executable\", flag_type=\"--\"\n )\n p_arg1: str = Field(\n \"python\", description=\"Executable to run with mpi (i.e. python).\", flag_type=\"\"\n )\n u: str = Field(\n \"\", description=\"Python option for unbuffered output.\", flag_type=\"-\"\n )\n p_arg2: str = Field(\n \"findPeaksSZ.py\",\n description=\"Executable to run with mpi (i.e. python).\",\n flag_type=\"\",\n )\n d: str = Field(description=\"Detector name\", flag_type=\"-\")\n e: str = Field(\"\", description=\"Experiment name\", flag_type=\"-\")\n r: int = Field(-1, description=\"Run number\", flag_type=\"-\")\n outDir: str = Field(\n description=\"Output directory where .cxi will be saved\", flag_type=\"--\"\n )\n algorithm: int = Field(1, description=\"PyAlgos algorithm to use\", flag_type=\"--\")\n alg_npix_min: float = Field(\n 1.0, description=\"PyAlgos algorithm's npix_min parameter\", flag_type=\"--\"\n )\n alg_npix_max: float = Field(\n 45.0, description=\"PyAlgos algorithm's npix_max parameter\", flag_type=\"--\"\n )\n alg_amax_thr: float = Field(\n 250.0, description=\"PyAlgos algorithm's amax_thr parameter\", flag_type=\"--\"\n )\n alg_atot_thr: float = Field(\n 330.0, description=\"PyAlgos algorithm's atot_thr parameter\", flag_type=\"--\"\n )\n alg_son_min: float = Field(\n 10.0, description=\"PyAlgos algorithm's son_min parameter\", flag_type=\"--\"\n )\n alg1_thr_low: float = Field(\n 80.0, description=\"PyAlgos algorithm's thr_low parameter\", flag_type=\"--\"\n )\n alg1_thr_high: float = Field(\n 270.0, description=\"PyAlgos algorithm's thr_high parameter\", flag_type=\"--\"\n )\n alg1_rank: int = Field(\n 3, description=\"PyAlgos algorithm's rank parameter\", flag_type=\"--\"\n )\n alg1_radius: int = Field(\n 3, description=\"PyAlgos algorithm's radius parameter\", flag_type=\"--\"\n )\n alg1_dr: int = Field(\n 1, description=\"PyAlgos algorithm's dr parameter\", flag_type=\"--\"\n )\n psanaMask_on: str = Field(\n \"True\", description=\"Whether psana's mask should be used\", flag_type=\"--\"\n )\n psanaMask_calib: str = Field(\n \"True\", description=\"Psana mask's calib parameter\", flag_type=\"--\"\n )\n psanaMask_status: str = Field(\n \"True\", description=\"Psana mask's status 
parameter\", flag_type=\"--\"\n )\n psanaMask_edges: str = Field(\n \"True\", description=\"Psana mask's edges parameter\", flag_type=\"--\"\n )\n psanaMask_central: str = Field(\n \"True\", description=\"Psana mask's central parameter\", flag_type=\"--\"\n )\n psanaMask_unbond: str = Field(\n \"True\", description=\"Psana mask's unbond parameter\", flag_type=\"--\"\n )\n psanaMask_unbondnrs: str = Field(\n \"True\", description=\"Psana mask's unbondnbrs parameter\", flag_type=\"--\"\n )\n mask: str = Field(\n \"\", description=\"Path to an additional mask to apply\", flag_type=\"--\"\n )\n clen: str = Field(\n description=\"Epics variable storing the camera length\", flag_type=\"--\"\n )\n coffset: float = Field(0, description=\"Camera offset in m\", flag_type=\"--\")\n minPeaks: int = Field(\n 15,\n description=\"Minimum number of peaks to mark frame for indexing\",\n flag_type=\"--\",\n )\n maxPeaks: int = Field(\n 15,\n description=\"Maximum number of peaks to mark frame for indexing\",\n flag_type=\"--\",\n )\n minRes: int = Field(\n 0,\n description=\"Minimum peak resolution to mark frame for indexing \",\n flag_type=\"--\",\n )\n sample: str = Field(\"\", description=\"Sample name\", flag_type=\"--\")\n instrument: Union[None, str] = Field(\n None, description=\"Instrument name\", flag_type=\"--\"\n )\n pixelSize: float = Field(0.0, description=\"Pixel size\", flag_type=\"--\")\n auto: str = Field(\n \"False\",\n description=(\n \"Whether to automatically determine peak per event peak \"\n \"finding parameters\"\n ),\n flag_type=\"--\",\n )\n detectorDistance: float = Field(\n 0.0, description=\"Detector distance from interaction point in m\", flag_type=\"--\"\n )\n access: Literal[\"ana\", \"ffb\"] = Field(\n \"ana\", description=\"Data node type: {ana,ffb}\", flag_type=\"--\"\n )\n szfile: str = Field(\"qoz.json\", description=\"Path to SZ's JSON configuration file\")\n lute_template_cfg: TemplateConfig = Field(\n TemplateConfig(\n template_name=\"sz.json\",\n output_path=\"\", # Will want to change where this goes...\n ),\n description=\"Template information for the sz.json file\",\n )\n sz_parameters: SZParameters = Field(\n description=\"Configuration parameters for SZ Compression\", flag_type=\"\"\n )\n\n @validator(\"e\", always=True)\n def validate_e(cls, e: str, values: Dict[str, Any]) -> str:\n if e == \"\":\n return values[\"lute_config\"].experiment\n return e\n\n @validator(\"r\", always=True)\n def validate_r(cls, r: int, values: Dict[str, Any]) -> int:\n if r == -1:\n return values[\"lute_config\"].run\n return r\n\n @validator(\"lute_template_cfg\", always=True)\n def set_output_path(\n cls, lute_template_cfg: TemplateConfig, values: Dict[str, Any]\n ) -> TemplateConfig:\n if lute_template_cfg.output_path == \"\":\n lute_template_cfg.output_path = values[\"szfile\"]\n return lute_template_cfg\n\n @validator(\"sz_parameters\", always=True)\n def set_sz_compression_parameters(\n cls, sz_parameters: SZParameters, values: Dict[str, Any]\n ) -> None:\n values[\"compressor\"] = sz_parameters.compressor\n values[\"binSize\"] = sz_parameters.binSize\n values[\"roiWindowSize\"] = sz_parameters.roiWindowSize\n if sz_parameters.compressor == \"qoz\":\n values[\"pressio_opts\"] = {\n \"pressio:abs\": sz_parameters.absError,\n \"qoz\": {\"qoz:stride\": 8},\n }\n else:\n values[\"pressio_opts\"] = {\"pressio:abs\": sz_parameters.absError}\n return None\n\n @root_validator(pre=False)\n def define_result(cls, values: Dict[str, Any]) -> Dict[str, Any]:\n exp: str = 
values[\"lute_config\"].experiment\n run: int = int(values[\"lute_config\"].run)\n directory: str = values[\"outDir\"]\n fname: str = f\"{exp}_{run:04d}.lst\"\n\n cls.Config.result_from_params = f\"{directory}/{fname}\"\n return values\n
"},{"location":"source/io/models/sfx_find_peaks/#io.models.sfx_find_peaks.FindPeaksPsocakeParameters.Config","title":"Config
","text":" Bases: Config
lute/io/models/sfx_find_peaks.py
class Config(TaskParameters.Config):\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n result_from_params: str = \"\"\n \"\"\"Defines a result from the parameters. Use a validator to do so.\"\"\"\n
"},{"location":"source/io/models/sfx_find_peaks/#io.models.sfx_find_peaks.FindPeaksPsocakeParameters.Config.result_from_params","title":"result_from_params: str = ''
class-attribute
instance-attribute
","text":"Defines a result from the parameters. Use a validator to do so.
"},{"location":"source/io/models/sfx_find_peaks/#io.models.sfx_find_peaks.FindPeaksPsocakeParameters.Config.set_result","title":"set_result: bool = True
class-attribute
instance-attribute
","text":"Whether the Executor should mark a specified parameter as a result.
"},{"location":"source/io/models/sfx_find_peaks/#io.models.sfx_find_peaks.FindPeaksPyAlgosParameters","title":"FindPeaksPyAlgosParameters
","text":" Bases: TaskParameters
Parameters for crystallographic (Bragg) peak finding using PyAlgos.
This peak finding Task optionally has the ability to compress/decompress data with SZ for the purpose of compression validation.
Source code in lute/io/models/sfx_find_peaks.py
class FindPeaksPyAlgosParameters(TaskParameters):\n \"\"\"Parameters for crystallographic (Bragg) peak finding using PyAlgos.\n\n This peak finding Task optionally has the ability to compress/decompress\n data with SZ for the purpose of compression validation.\n \"\"\"\n\n class Config(TaskParameters.Config):\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n class SZCompressorParameters(BaseModel):\n compressor: Literal[\"qoz\", \"sz3\"] = Field(\n \"qoz\", description='Compression algorithm (\"qoz\" or \"sz3\")'\n )\n abs_error: float = Field(10.0, description=\"Absolute error bound\")\n bin_size: int = Field(2, description=\"Bin size\")\n roi_window_size: int = Field(\n 9,\n description=\"Default window size\",\n )\n\n outdir: str = Field(\n description=\"Output directory for cxi files\",\n )\n n_events: int = Field(\n 0,\n description=\"Number of events to process (0 to process all events)\",\n )\n det_name: str = Field(\n description=\"Psana name of the detector storing the image data\",\n )\n event_receiver: Literal[\"evr0\", \"evr1\"] = Field(\n description=\"Event Receiver to be used: evr0 or evr1\",\n )\n tag: str = Field(\n \"\",\n description=\"Tag to add to the output file names\",\n )\n pv_camera_length: Union[str, float] = Field(\n \"\",\n description=\"PV associated with camera length \"\n \"(if a number, camera length directly)\",\n )\n event_logic: bool = Field(\n False,\n description=\"True if only events with a specific event code should be \"\n \"processed. False if the event code should be ignored\",\n )\n event_code: int = Field(\n 0,\n description=\"Required events code for events to be processed if event logic \"\n \"is True\",\n )\n psana_mask: bool = Field(\n False,\n description=\"If True, apply mask from psana Detector object\",\n )\n mask_file: Union[str, None] = Field(\n None,\n description=\"File with a custom mask to apply. 
If None, no custom mask is \"\n \"applied\",\n )\n min_peaks: int = Field(2, description=\"Minimum number of peaks per image\")\n max_peaks: int = Field(\n 2048,\n description=\"Maximum number of peaks per image\",\n )\n npix_min: int = Field(\n 2,\n description=\"Minimum number of pixels per peak\",\n )\n npix_max: int = Field(\n 30,\n description=\"Maximum number of pixels per peak\",\n )\n amax_thr: float = Field(\n 80.0,\n description=\"Minimum intensity threshold for starting a peak\",\n )\n atot_thr: float = Field(\n 120.0,\n description=\"Minimum summed intensity threshold for pixel collection\",\n )\n son_min: float = Field(\n 7.0,\n description=\"Minimum signal-to-noise ratio to be considered a peak\",\n )\n peak_rank: int = Field(\n 3,\n description=\"Radius in which central peak pixel is a local maximum\",\n )\n r0: float = Field(\n 3.0,\n description=\"Radius of ring for background evaluation in pixels\",\n )\n dr: float = Field(\n 2.0,\n description=\"Width of ring for background evaluation in pixels\",\n )\n nsigm: float = Field(\n 7.0,\n description=\"Intensity threshold to include pixel in connected group\",\n )\n compression: Optional[SZCompressorParameters] = Field(\n None,\n description=\"Options for the SZ Compression Algorithm\",\n )\n out_file: str = Field(\n \"\",\n description=\"Path to output file.\",\n flag_type=\"-\",\n rename_param=\"o\",\n is_result=True,\n )\n\n @validator(\"out_file\", always=True)\n def validate_out_file(cls, out_file: str, values: Dict[str, Any]) -> str:\n if out_file == \"\":\n fname: Path = (\n Path(values[\"outdir\"])\n / f\"{values['lute_config'].experiment}_{values['lute_config'].run}_\"\n f\"{values['tag']}.list\"\n )\n return str(fname)\n return out_file\n
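When out_file is left empty, the validate_out_file validator above derives a default list file name from outdir, the experiment, the run and the tag. A standalone sketch of that naming convention (the experiment name and paths are hypothetical):

from pathlib import Path


def default_pyalgos_out_file(outdir: str, experiment: str, run: int, tag: str) -> str:
    # Mirrors the default applied by FindPeaksPyAlgosParameters when out_file == "".
    return str(Path(outdir) / f"{experiment}_{run}_{tag}.list")


default_pyalgos_out_file("/hypothetical/outdir", "mfxp00123", 7, "sample1")
# -> '/hypothetical/outdir/mfxp00123_7_sample1.list'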
"},{"location":"source/io/models/sfx_find_peaks/#io.models.sfx_find_peaks.FindPeaksPyAlgosParameters.Config","title":"Config
","text":" Bases: Config
lute/io/models/sfx_find_peaks.py
class Config(TaskParameters.Config):\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n
"},{"location":"source/io/models/sfx_find_peaks/#io.models.sfx_find_peaks.FindPeaksPyAlgosParameters.Config.set_result","title":"set_result: bool = True
class-attribute
instance-attribute
","text":"Whether the Executor should mark a specified parameter as a result.
"},{"location":"source/io/models/sfx_index/","title":"sfx_index","text":"Models for serial femtosecond crystallography indexing.
Classes:
Name Description
IndexCrystFELParameters
Perform indexing of hits/peaks using CrystFEL's indexamajig
.
ConcatenateStreamFilesParameters
","text":" Bases: TaskParameters
Parameters for stream concatenation.
Concatenates the stream file output from CrystFEL indexing for multiple experimental runs.
Source code in lute/io/models/sfx_index.py
class ConcatenateStreamFilesParameters(TaskParameters):\n \"\"\"Parameters for stream concatenation.\n\n Concatenates the stream file output from CrystFEL indexing for multiple\n experimental runs.\n \"\"\"\n\n class Config(TaskParameters.Config):\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n in_file: str = Field(\n \"\",\n description=\"Root of directory tree storing stream files to merge.\",\n )\n\n tag: Optional[str] = Field(\n \"\",\n description=\"Tag identifying the stream files to merge.\",\n )\n\n out_file: str = Field(\n \"\", description=\"Path to merged output stream file.\", is_result=True\n )\n\n @validator(\"in_file\", always=True)\n def validate_in_file(cls, in_file: str, values: Dict[str, Any]) -> str:\n if in_file == \"\":\n stream_file: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\", \"IndexCrystFEL\", \"out_file\"\n )\n if stream_file:\n stream_dir: str = str(Path(stream_file).parent)\n return stream_dir\n return in_file\n\n @validator(\"tag\", always=True)\n def validate_tag(cls, tag: str, values: Dict[str, Any]) -> str:\n if tag == \"\":\n stream_file: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\", \"IndexCrystFEL\", \"out_file\"\n )\n if stream_file:\n stream_tag: str = Path(stream_file).name.split(\"_\")[0]\n return stream_tag\n return tag\n\n @validator(\"out_file\", always=True)\n def validate_out_file(cls, tag: str, values: Dict[str, Any]) -> str:\n if tag == \"\":\n stream_out_file: str = str(\n Path(values[\"in_file\"]).parent / f\"{values['tag'].stream}\"\n )\n return stream_out_file\n return tag\n
"},{"location":"source/io/models/sfx_index/#io.models.sfx_index.ConcatenateStreamFilesParameters.Config","title":"Config
","text":" Bases: Config
lute/io/models/sfx_index.py
class Config(TaskParameters.Config):\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n
"},{"location":"source/io/models/sfx_index/#io.models.sfx_index.ConcatenateStreamFilesParameters.Config.set_result","title":"set_result: bool = True
class-attribute
instance-attribute
","text":"Whether the Executor should mark a specified parameter as a result.
"},{"location":"source/io/models/sfx_index/#io.models.sfx_index.IndexCrystFELParameters","title":"IndexCrystFELParameters
","text":" Bases: ThirdPartyParameters
Parameters for CrystFEL's indexamajig
.
There are many parameters, and many combinations. For more information on usage, please refer to the CrystFEL documentation, here: https://www.desy.de/~twhite/crystfel/manual-indexamajig.html
Source code in lute/io/models/sfx_index.py
class IndexCrystFELParameters(ThirdPartyParameters):\n \"\"\"Parameters for CrystFEL's `indexamajig`.\n\n There are many parameters, and many combinations. For more information on\n usage, please refer to the CrystFEL documentation, here:\n https://www.desy.de/~twhite/crystfel/manual-indexamajig.html\n \"\"\"\n\n class Config(ThirdPartyParameters.Config):\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n long_flags_use_eq: bool = True\n \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n\n executable: str = Field(\n \"/sdf/group/lcls/ds/tools/crystfel/0.10.2/bin/indexamajig\",\n description=\"CrystFEL's indexing binary.\",\n flag_type=\"\",\n )\n # Basic options\n in_file: Optional[str] = Field(\n \"\", description=\"Path to input file.\", flag_type=\"-\", rename_param=\"i\"\n )\n out_file: str = Field(\n \"\",\n description=\"Path to output file.\",\n flag_type=\"-\",\n rename_param=\"o\",\n is_result=True,\n )\n geometry: str = Field(\n \"\", description=\"Path to geometry file.\", flag_type=\"-\", rename_param=\"g\"\n )\n zmq_input: Optional[str] = Field(\n description=\"ZMQ address to receive data over. `input` and `zmq-input` are mutually exclusive\",\n flag_type=\"--\",\n rename_param=\"zmq-input\",\n )\n zmq_subscribe: Optional[str] = Field( # Can be used multiple times...\n description=\"Subscribe to ZMQ message of type `tag`\",\n flag_type=\"--\",\n rename_param=\"zmq-subscribe\",\n )\n zmq_request: Optional[AnyUrl] = Field(\n description=\"Request new data over ZMQ by sending this value\",\n flag_type=\"--\",\n rename_param=\"zmq-request\",\n )\n asapo_endpoint: Optional[str] = Field(\n description=\"ASAP::O endpoint. zmq-input and this are mutually exclusive.\",\n flag_type=\"--\",\n rename_param=\"asapo-endpoint\",\n )\n asapo_token: Optional[str] = Field(\n description=\"ASAP::O authentication token.\",\n flag_type=\"--\",\n rename_param=\"asapo-token\",\n )\n asapo_beamtime: Optional[str] = Field(\n description=\"ASAP::O beatime.\",\n flag_type=\"--\",\n rename_param=\"asapo-beamtime\",\n )\n asapo_source: Optional[str] = Field(\n description=\"ASAP::O data source.\",\n flag_type=\"--\",\n rename_param=\"asapo-source\",\n )\n asapo_group: Optional[str] = Field(\n description=\"ASAP::O consumer group.\",\n flag_type=\"--\",\n rename_param=\"asapo-group\",\n )\n asapo_stream: Optional[str] = Field(\n description=\"ASAP::O stream.\",\n flag_type=\"--\",\n rename_param=\"asapo-stream\",\n )\n asapo_wait_for_stream: Optional[str] = Field(\n description=\"If ASAP::O stream does not exist, wait for it to appear.\",\n flag_type=\"--\",\n rename_param=\"asapo-wait-for-stream\",\n )\n data_format: Optional[str] = Field(\n description=\"Specify format for ZMQ or ASAP::O. `msgpack`, `hdf5` or `seedee`.\",\n flag_type=\"--\",\n rename_param=\"data-format\",\n )\n basename: bool = Field(\n False,\n description=\"Remove directory parts of filenames. Acts before prefix if prefix also given.\",\n flag_type=\"--\",\n )\n prefix: Optional[str] = Field(\n description=\"Add a prefix to the filenames from the infile argument.\",\n flag_type=\"--\",\n rename_param=\"asapo-stream\",\n )\n nthreads: PositiveInt = Field(\n max(int(os.environ.get(\"SLURM_NPROCS\", len(os.sched_getaffinity(0)))) - 1, 1),\n description=\"Number of threads to use. 
See also `max_indexer_threads`.\",\n flag_type=\"-\",\n rename_param=\"j\",\n )\n no_check_prefix: bool = Field(\n False,\n description=\"Don't attempt to correct the prefix if it seems incorrect.\",\n flag_type=\"--\",\n rename_param=\"no-check-prefix\",\n )\n highres: Optional[float] = Field(\n description=\"Mark all pixels greater than `x` has bad.\", flag_type=\"--\"\n )\n profile: bool = Field(\n False, description=\"Display timing data to monitor performance.\", flag_type=\"--\"\n )\n temp_dir: Optional[str] = Field(\n description=\"Specify a path for the temp files folder.\",\n flag_type=\"--\",\n rename_param=\"temp-dir\",\n )\n wait_for_file: conint(gt=-2) = Field(\n 0,\n description=\"Wait at most `x` seconds for a file to be created. A value of -1 means wait forever.\",\n flag_type=\"--\",\n rename_param=\"wait-for-file\",\n )\n no_image_data: bool = Field(\n False,\n description=\"Load only the metadata, no iamges. Can check indexability without high data requirements.\",\n flag_type=\"--\",\n rename_param=\"no-image-data\",\n )\n # Peak-finding options\n # ....\n # Indexing options\n indexing: Optional[str] = Field(\n description=\"Comma-separated list of supported indexing algorithms to use. Default is to automatically detect.\",\n flag_type=\"--\",\n )\n cell_file: Optional[str] = Field(\n description=\"Path to a file containing unit cell information (PDB or CrystFEL format).\",\n flag_type=\"-\",\n rename_param=\"p\",\n )\n tolerance: str = Field(\n \"5,5,5,1.5\",\n description=(\n \"Tolerances (in percent) for unit cell comparison. \"\n \"Comma-separated list a,b,c,angle. Default=5,5,5,1.5\"\n ),\n flag_type=\"--\",\n )\n no_check_cell: bool = Field(\n False,\n description=\"Do not check cell parameters against unit cell. Replaces '-raw' method.\",\n flag_type=\"--\",\n rename_param=\"no-check-cell\",\n )\n no_check_peaks: bool = Field(\n False,\n description=\"Do not verify peaks are accounted for by solution.\",\n flag_type=\"--\",\n rename_param=\"no-check-peaks\",\n )\n multi: bool = Field(\n False, description=\"Enable multi-lattice indexing.\", flag_type=\"--\"\n )\n wavelength_estimate: Optional[float] = Field(\n description=\"Estimate for X-ray wavelength. Required for some methods.\",\n flag_type=\"--\",\n rename_param=\"wavelength-estimate\",\n )\n camera_length_estimate: Optional[float] = Field(\n description=\"Estimate for camera distance. Required for some methods.\",\n flag_type=\"--\",\n rename_param=\"camera-length-estimate\",\n )\n max_indexer_threads: Optional[PositiveInt] = Field(\n # 1,\n description=\"Some indexing algos can use multiple threads. 
In addition to image-based.\",\n flag_type=\"--\",\n rename_param=\"max-indexer-threads\",\n )\n no_retry: bool = Field(\n False,\n description=\"Do not remove weak peaks and try again.\",\n flag_type=\"--\",\n rename_param=\"no-retry\",\n )\n no_refine: bool = Field(\n False,\n description=\"Skip refinement step.\",\n flag_type=\"--\",\n rename_param=\"no-refine\",\n )\n no_revalidate: bool = Field(\n False,\n description=\"Skip revalidation step.\",\n flag_type=\"--\",\n rename_param=\"no-revalidate\",\n )\n # TakeTwo specific parameters\n taketwo_member_threshold: Optional[PositiveInt] = Field(\n # 20,\n description=\"Minimum number of vectors to consider.\",\n flag_type=\"--\",\n rename_param=\"taketwo-member-threshold\",\n )\n taketwo_len_tolerance: Optional[PositiveFloat] = Field(\n # 0.001,\n description=\"TakeTwo length tolerance in Angstroms.\",\n flag_type=\"--\",\n rename_param=\"taketwo-len-tolerance\",\n )\n taketwo_angle_tolerance: Optional[PositiveFloat] = Field(\n # 0.6,\n description=\"TakeTwo angle tolerance in degrees.\",\n flag_type=\"--\",\n rename_param=\"taketwo-angle-tolerance\",\n )\n taketwo_trace_tolerance: Optional[PositiveFloat] = Field(\n # 3,\n description=\"Matrix trace tolerance in degrees.\",\n flag_type=\"--\",\n rename_param=\"taketwo-trace-tolerance\",\n )\n # Felix-specific parameters\n # felix_domega\n # felix-fraction-max-visits\n # felix-max-internal-angle\n # felix-max-uniqueness\n # felix-min-completeness\n # felix-min-visits\n # felix-num-voxels\n # felix-sigma\n # felix-tthrange-max\n # felix-tthrange-min\n # XGANDALF-specific parameters\n xgandalf_sampling_pitch: Optional[NonNegativeInt] = Field(\n # 6,\n description=\"Density of reciprocal space sampling.\",\n flag_type=\"--\",\n rename_param=\"xgandalf-sampling-pitch\",\n )\n xgandalf_grad_desc_iterations: Optional[NonNegativeInt] = Field(\n # 4,\n description=\"Number of gradient descent iterations.\",\n flag_type=\"--\",\n rename_param=\"xgandalf-grad-desc-iterations\",\n )\n xgandalf_tolerance: Optional[PositiveFloat] = Field(\n # 0.02,\n description=\"Relative tolerance of lattice vectors\",\n flag_type=\"--\",\n rename_param=\"xgandalf-tolerance\",\n )\n xgandalf_no_deviation_from_provided_cell: Optional[bool] = Field(\n description=\"Found unit cell must match provided.\",\n flag_type=\"--\",\n rename_param=\"xgandalf-no-deviation-from-provided-cell\",\n )\n xgandalf_min_lattice_vector_length: Optional[PositiveFloat] = Field(\n # 30,\n description=\"Minimum possible lattice length.\",\n flag_type=\"--\",\n rename_param=\"xgandalf-min-lattice-vector-length\",\n )\n xgandalf_max_lattice_vector_length: Optional[PositiveFloat] = Field(\n # 250,\n description=\"Minimum possible lattice length.\",\n flag_type=\"--\",\n rename_param=\"xgandalf-max-lattice-vector-length\",\n )\n xgandalf_max_peaks: Optional[PositiveInt] = Field(\n # 250,\n description=\"Maximum number of peaks to use for indexing.\",\n flag_type=\"--\",\n rename_param=\"xgandalf-max-peaks\",\n )\n xgandalf_fast_execution: bool = Field(\n False,\n description=\"Shortcut to set sampling-pitch=2, and grad-desc-iterations=3.\",\n flag_type=\"--\",\n rename_param=\"xgandalf-fast-execution\",\n )\n # pinkIndexer parameters\n # ...\n # asdf_fast: bool = Field(False, description=\"Enable fast mode for asdf. 
3x faster for 7% loss in accuracy.\", flag_type=\"--\", rename_param=\"asdf-fast\")\n # Integration parameters\n integration: str = Field(\n \"rings-nocen\", description=\"Method for integrating reflections.\", flag_type=\"--\"\n )\n fix_profile_radius: Optional[float] = Field(\n description=\"Fix the profile radius (m^{-1})\",\n flag_type=\"--\",\n rename_param=\"fix-profile-radius\",\n )\n fix_divergence: Optional[float] = Field(\n 0,\n description=\"Fix the divergence (rad, full angle).\",\n flag_type=\"--\",\n rename_param=\"fix-divergence\",\n )\n int_radius: str = Field(\n \"4,5,7\",\n description=\"Inner, middle, and outer radii for 3-ring integration.\",\n flag_type=\"--\",\n rename_param=\"int-radius\",\n )\n int_diag: str = Field(\n \"none\",\n description=\"Show detailed information on integration when condition is met.\",\n flag_type=\"--\",\n rename_param=\"int-diag\",\n )\n push_res: str = Field(\n \"infinity\",\n description=\"Integrate `x` higher than apparent resolution limit (nm-1).\",\n flag_type=\"--\",\n rename_param=\"push-res\",\n )\n overpredict: bool = Field(\n False,\n description=\"Over-predict reflections. Maybe useful with post-refinement.\",\n flag_type=\"--\",\n )\n cell_parameters_only: bool = Field(\n False, description=\"Do not predict refletions at all\", flag_type=\"--\"\n )\n # Output parameters\n no_non_hits_in_stream: bool = Field(\n False,\n description=\"Exclude non-hits from the stream file.\",\n flag_type=\"--\",\n rename_param=\"no-non-hits-in-stream\",\n )\n copy_hheader: Optional[str] = Field(\n description=\"Copy information from header in the image to output stream.\",\n flag_type=\"--\",\n rename_param=\"copy-hheader\",\n )\n no_peaks_in_stream: bool = Field(\n False,\n description=\"Do not record peaks in stream file.\",\n flag_type=\"--\",\n rename_param=\"no-peaks-in-stream\",\n )\n no_refls_in_stream: bool = Field(\n False,\n description=\"Do not record reflections in stream.\",\n flag_type=\"--\",\n rename_param=\"no-refls-in-stream\",\n )\n serial_offset: Optional[PositiveInt] = Field(\n description=\"Start numbering at `x` instead of 1.\",\n flag_type=\"--\",\n rename_param=\"serial-offset\",\n )\n harvest_file: Optional[str] = Field(\n description=\"Write parameters to file in JSON format.\",\n flag_type=\"--\",\n rename_param=\"harvest-file\",\n )\n\n @validator(\"in_file\", always=True)\n def validate_in_file(cls, in_file: str, values: Dict[str, Any]) -> str:\n if in_file == \"\":\n filename: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\", \"FindPeaksPyAlgos\", \"out_file\"\n )\n if filename is None:\n exp: str = values[\"lute_config\"].experiment\n run: int = int(values[\"lute_config\"].run)\n tag: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\", \"FindPeaksPsocake\", \"tag\"\n )\n out_dir: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\", \"FindPeaksPsocake\", \"outDir\"\n )\n if out_dir is not None:\n fname: str = f\"{out_dir}/{exp}_{run:04d}\"\n if tag is not None:\n fname = f\"{fname}_{tag}\"\n return f\"{fname}.lst\"\n else:\n return filename\n return in_file\n\n @validator(\"out_file\", always=True)\n def validate_out_file(cls, out_file: str, values: Dict[str, Any]) -> str:\n if out_file == \"\":\n expmt: str = values[\"lute_config\"].experiment\n run: int = int(values[\"lute_config\"].run)\n work_dir: str = values[\"lute_config\"].work_dir\n fname: str = f\"{expmt}_r{run:04d}.stream\"\n return f\"{work_dir}/{fname}\"\n return out_file\n
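The validate_out_file validator above gives the indexing stream a default name of the form <work_dir>/<experiment>_r<run>.stream when out_file is empty. A small sketch of that convention (hypothetical values):

def default_stream_name(work_dir: str, experiment: str, run: int) -> str:
    # Mirrors IndexCrystFELParameters.validate_out_file for an empty out_file.
    return f"{work_dir}/{experiment}_r{run:04d}.stream"


default_stream_name("/hypothetical/work_dir", "mfxp00123", 7)
# -> '/hypothetical/work_dir/mfxp00123_r0007.stream'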
"},{"location":"source/io/models/sfx_index/#io.models.sfx_index.IndexCrystFELParameters.Config","title":"Config
","text":" Bases: Config
lute/io/models/sfx_index.py
class Config(ThirdPartyParameters.Config):\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n long_flags_use_eq: bool = True\n \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n
"},{"location":"source/io/models/sfx_index/#io.models.sfx_index.IndexCrystFELParameters.Config.long_flags_use_eq","title":"long_flags_use_eq: bool = True
class-attribute
instance-attribute
","text":"Whether long command-line arguments are passed like --long=arg
.
set_result: bool = True
class-attribute
instance-attribute
","text":"Whether the Executor should mark a specified parameter as a result.
"},{"location":"source/io/models/sfx_merge/","title":"sfx_merge","text":"Models for merging reflections in serial femtosecond crystallography.
Classes:
Name Description
MergePartialatorParameters
Perform merging using CrystFEL's partialator
.
CompareHKLParameters
Calculate figures of merit using CrystFEL's compare_hkl
.
ManipulateHKLParameters
Perform transformations on lists of reflections using CrystFEL's get_hkl
.
CompareHKLParameters
","text":" Bases: ThirdPartyParameters
Parameters for CrystFEL's compare_hkl
for calculating figures of merit.
There are many parameters, and many combinations. For more information on usage, please refer to the CrystFEL documentation, here: https://www.desy.de/~twhite/crystfel/manual-partialator.html
Source code in lute/io/models/sfx_merge.py
class CompareHKLParameters(ThirdPartyParameters):\n \"\"\"Parameters for CrystFEL's `compare_hkl` for calculating figures of merit.\n\n There are many parameters, and many combinations. For more information on\n usage, please refer to the CrystFEL documentation, here:\n https://www.desy.de/~twhite/crystfel/manual-partialator.html\n \"\"\"\n\n class Config(ThirdPartyParameters.Config):\n long_flags_use_eq: bool = True\n \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n executable: str = Field(\n \"/sdf/group/lcls/ds/tools/crystfel/0.10.2/bin/compare_hkl\",\n description=\"CrystFEL's reflection comparison binary.\",\n flag_type=\"\",\n )\n in_files: Optional[str] = Field(\n \"\",\n description=\"Path to input HKLs. Space-separated list of 2. Use output of partialator e.g.\",\n flag_type=\"\",\n )\n ## Need mechanism to set is_result=True ...\n symmetry: str = Field(\"\", description=\"Point group symmetry.\", flag_type=\"--\")\n cell_file: str = Field(\n \"\",\n description=\"Path to a file containing unit cell information (PDB or CrystFEL format).\",\n flag_type=\"-\",\n rename_param=\"p\",\n )\n fom: str = Field(\n \"Rsplit\", description=\"Specify figure of merit to calculate.\", flag_type=\"--\"\n )\n nshells: int = Field(10, description=\"Use n resolution shells.\", flag_type=\"--\")\n # NEED A NEW CASE FOR THIS -> Boolean flag, no arg, one hyphen...\n # fix_unity: bool = Field(\n # False,\n # description=\"Fix scale factors to unity.\",\n # flag_type=\"-\",\n # rename_param=\"u\",\n # )\n shell_file: str = Field(\n \"\",\n description=\"Write the statistics in resolution shells to a file.\",\n flag_type=\"--\",\n rename_param=\"shell-file\",\n is_result=True,\n )\n ignore_negs: bool = Field(\n False,\n description=\"Ignore reflections with negative reflections.\",\n flag_type=\"--\",\n rename_param=\"ignore-negs\",\n )\n zero_negs: bool = Field(\n False,\n description=\"Set negative intensities to 0.\",\n flag_type=\"--\",\n rename_param=\"zero-negs\",\n )\n sigma_cutoff: Optional[Union[float, int, str]] = Field(\n # \"-infinity\",\n description=\"Discard reflections with I/sigma(I) < n. -infinity means no cutoff.\",\n flag_type=\"--\",\n rename_param=\"sigma-cutoff\",\n )\n rmin: Optional[float] = Field(\n description=\"Low resolution cutoff of 1/d (m-1). Use this or --lowres NOT both.\",\n flag_type=\"--\",\n )\n lowres: Optional[float] = Field(\n descirption=\"Low resolution cutoff in Angstroms. Use this or --rmin NOT both.\",\n flag_type=\"--\",\n )\n rmax: Optional[float] = Field(\n description=\"High resolution cutoff in 1/d (m-1). Use this or --highres NOT both.\",\n flag_type=\"--\",\n )\n highres: Optional[float] = Field(\n description=\"High resolution cutoff in Angstroms. 
Use this or --rmax NOT both.\",\n flag_type=\"--\",\n )\n\n @validator(\"in_files\", always=True)\n def validate_in_files(cls, in_files: str, values: Dict[str, Any]) -> str:\n if in_files == \"\":\n partialator_file: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\", \"MergePartialator\", \"out_file\"\n )\n if partialator_file:\n hkls: str = f\"{partialator_file}1 {partialator_file}2\"\n return hkls\n return in_files\n\n @validator(\"cell_file\", always=True)\n def validate_cell_file(cls, cell_file: str, values: Dict[str, Any]) -> str:\n if cell_file == \"\":\n idx_cell_file: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\",\n \"IndexCrystFEL\",\n \"cell_file\",\n valid_only=False,\n )\n if idx_cell_file:\n return idx_cell_file\n return cell_file\n\n @validator(\"symmetry\", always=True)\n def validate_symmetry(cls, symmetry: str, values: Dict[str, Any]) -> str:\n if symmetry == \"\":\n partialator_sym: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\", \"MergePartialator\", \"symmetry\"\n )\n if partialator_sym:\n return partialator_sym\n return symmetry\n\n @validator(\"shell_file\", always=True)\n def validate_shell_file(cls, shell_file: str, values: Dict[str, Any]) -> str:\n if shell_file == \"\":\n partialator_file: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\", \"MergePartialator\", \"out_file\"\n )\n if partialator_file:\n shells_out: str = partialator_file.split(\".\")[0]\n shells_out = f\"{shells_out}_{values['fom']}_n{values['nshells']}.dat\"\n return shells_out\n return shell_file\n
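The validators above chain this Task to MergePartialator output: the two half-dataset HKL files are formed by appending 1 and 2 to the partialator output name, and the default shell file name encodes the figure of merit and the number of shells. A standalone sketch of those defaults (hypothetical path):

def compare_hkl_defaults(partialator_out: str, fom: str = "Rsplit", nshells: int = 10):
    # Mirrors validate_in_files and validate_shell_file when the fields are empty.
    in_files = f"{partialator_out}1 {partialator_out}2"
    shell_file = f"{partialator_out.split('.')[0]}_{fom}_n{nshells}.dat"
    return in_files, shell_file


compare_hkl_defaults("/hypothetical/work_dir/mfxp00123_r0007.hkl")
# -> ('/hypothetical/work_dir/mfxp00123_r0007.hkl1 /hypothetical/work_dir/mfxp00123_r0007.hkl2',
#     '/hypothetical/work_dir/mfxp00123_r0007_Rsplit_n10.dat')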
"},{"location":"source/io/models/sfx_merge/#io.models.sfx_merge.CompareHKLParameters.Config","title":"Config
","text":" Bases: Config
lute/io/models/sfx_merge.py
class Config(ThirdPartyParameters.Config):\n long_flags_use_eq: bool = True\n \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n
"},{"location":"source/io/models/sfx_merge/#io.models.sfx_merge.CompareHKLParameters.Config.long_flags_use_eq","title":"long_flags_use_eq: bool = True
class-attribute
instance-attribute
","text":"Whether long command-line arguments are passed like --long=arg
.
set_result: bool = True
class-attribute
instance-attribute
","text":"Whether the Executor should mark a specified parameter as a result.
"},{"location":"source/io/models/sfx_merge/#io.models.sfx_merge.ManipulateHKLParameters","title":"ManipulateHKLParameters
","text":" Bases: ThirdPartyParameters
Parameters for CrystFEL's get_hkl
for manipulating lists of reflections.
This Task is predominantly used internally to convert hkl
to mtz
files. Note that performing multiple manipulations is undefined behaviour. Run the Task with multiple configurations in explicit separate steps. For more information on usage, please refer to the CrystFEL documentation, here: https://www.desy.de/~twhite/crystfel/manual-partialator.html
lute/io/models/sfx_merge.py
class ManipulateHKLParameters(ThirdPartyParameters):\n \"\"\"Parameters for CrystFEL's `get_hkl` for manipulating lists of reflections.\n\n This Task is predominantly used internally to convert `hkl` to `mtz` files.\n Note that performing multiple manipulations is undefined behaviour. Run\n the Task with multiple configurations in explicit separate steps. For more\n information on usage, please refer to the CrystFEL documentation, here:\n https://www.desy.de/~twhite/crystfel/manual-partialator.html\n \"\"\"\n\n class Config(ThirdPartyParameters.Config):\n long_flags_use_eq: bool = True\n \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n executable: str = Field(\n \"/sdf/group/lcls/ds/tools/crystfel/0.10.2/bin/get_hkl\",\n description=\"CrystFEL's reflection manipulation binary.\",\n flag_type=\"\",\n )\n in_file: str = Field(\n \"\",\n description=\"Path to input HKL file.\",\n flag_type=\"-\",\n rename_param=\"i\",\n )\n out_file: str = Field(\n \"\",\n description=\"Path to output file.\",\n flag_type=\"-\",\n rename_param=\"o\",\n is_result=True,\n )\n cell_file: str = Field(\n \"\",\n description=\"Path to a file containing unit cell information (PDB or CrystFEL format).\",\n flag_type=\"-\",\n rename_param=\"p\",\n )\n output_format: str = Field(\n \"mtz\",\n description=\"Output format. One of mtz, mtz-bij, or xds. Otherwise CrystFEL format.\",\n flag_type=\"--\",\n rename_param=\"output-format\",\n )\n expand: Optional[str] = Field(\n description=\"Reflections will be expanded to fill asymmetric unit of specified point group.\",\n flag_type=\"--\",\n )\n # Reducing reflections to higher symmetry\n twin: Optional[str] = Field(\n description=\"Reflections equivalent to specified point group will have intensities summed.\",\n flag_type=\"--\",\n )\n no_need_all_parts: Optional[bool] = Field(\n description=\"Use with --twin to allow reflections missing a 'twin mate' to be written out.\",\n flag_type=\"--\",\n rename_param=\"no-need-all-parts\",\n )\n # Noise - Add to data\n noise: Optional[bool] = Field(\n description=\"Generate 10% uniform noise.\", flag_type=\"--\"\n )\n poisson: Optional[bool] = Field(\n description=\"Generate Poisson noise. Intensities assumed to be A.U.\",\n flag_type=\"--\",\n )\n adu_per_photon: Optional[int] = Field(\n description=\"Use with --poisson to convert A.U. to photons.\",\n flag_type=\"--\",\n rename_param=\"adu-per-photon\",\n )\n # Remove duplicate reflections\n trim_centrics: Optional[bool] = Field(\n description=\"Duplicated reflections (according to symmetry) are removed.\",\n flag_type=\"--\",\n )\n # Restrict to template file\n template: Optional[str] = Field(\n description=\"Only reflections which also appear in specified file are written out.\",\n flag_type=\"--\",\n )\n # Multiplicity\n multiplicity: Optional[bool] = Field(\n description=\"Reflections are multiplied by their symmetric multiplicites.\",\n flag_type=\"--\",\n )\n # Resolution cutoffs\n cutoff_angstroms: Optional[Union[str, int, float]] = Field(\n description=\"Either n, or n1,n2,n3. For n, reflections < n are removed. 
For n1,n2,n3 anisotropic trunction performed at separate resolution limits for a*, b*, c*.\",\n flag_type=\"--\",\n rename_param=\"cutoff-angstroms\",\n )\n lowres: Optional[float] = Field(\n description=\"Remove reflections with d > n\", flag_type=\"--\"\n )\n highres: Optional[float] = Field(\n description=\"Synonym for first form of --cutoff-angstroms\"\n )\n reindex: Optional[str] = Field(\n description=\"Reindex according to specified operator. E.g. k,h,-l.\",\n flag_type=\"--\",\n )\n # Override input symmetry\n symmetry: Optional[str] = Field(\n description=\"Point group symmetry to use to override. Almost always OMIT this option.\",\n flag_type=\"--\",\n )\n\n @validator(\"in_file\", always=True)\n def validate_in_file(cls, in_file: str, values: Dict[str, Any]) -> str:\n if in_file == \"\":\n partialator_file: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\", \"MergePartialator\", \"out_file\"\n )\n if partialator_file:\n return partialator_file\n return in_file\n\n @validator(\"out_file\", always=True)\n def validate_out_file(cls, out_file: str, values: Dict[str, Any]) -> str:\n if out_file == \"\":\n partialator_file: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\", \"MergePartialator\", \"out_file\"\n )\n if partialator_file:\n mtz_out: str = partialator_file.split(\".\")[0]\n mtz_out = f\"{mtz_out}.mtz\"\n return mtz_out\n return out_file\n\n @validator(\"cell_file\", always=True)\n def validate_cell_file(cls, cell_file: str, values: Dict[str, Any]) -> str:\n if cell_file == \"\":\n idx_cell_file: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\",\n \"IndexCrystFEL\",\n \"cell_file\",\n valid_only=False,\n )\n if idx_cell_file:\n return idx_cell_file\n return cell_file\n
"},{"location":"source/io/models/sfx_merge/#io.models.sfx_merge.ManipulateHKLParameters.Config","title":"Config
","text":" Bases: Config
lute/io/models/sfx_merge.py
class Config(ThirdPartyParameters.Config):\n long_flags_use_eq: bool = True\n \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n
"},{"location":"source/io/models/sfx_merge/#io.models.sfx_merge.ManipulateHKLParameters.Config.long_flags_use_eq","title":"long_flags_use_eq: bool = True
class-attribute
instance-attribute
","text":"Whether long command-line arguments are passed like --long=arg
.
set_result: bool = True
class-attribute
instance-attribute
","text":"Whether the Executor should mark a specified parameter as a result.
"},{"location":"source/io/models/sfx_merge/#io.models.sfx_merge.MergePartialatorParameters","title":"MergePartialatorParameters
","text":" Bases: ThirdPartyParameters
Parameters for CrystFEL's partialator
.
There are many parameters, and many combinations. For more information on usage, please refer to the CrystFEL documentation, here: https://www.desy.de/~twhite/crystfel/manual-partialator.html
Source code in lute/io/models/sfx_merge.py
class MergePartialatorParameters(ThirdPartyParameters):\n \"\"\"Parameters for CrystFEL's `partialator`.\n\n There are many parameters, and many combinations. For more information on\n usage, please refer to the CrystFEL documentation, here:\n https://www.desy.de/~twhite/crystfel/manual-partialator.html\n \"\"\"\n\n class Config(ThirdPartyParameters.Config):\n long_flags_use_eq: bool = True\n \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n executable: str = Field(\n \"/sdf/group/lcls/ds/tools/crystfel/0.10.2/bin/partialator\",\n description=\"CrystFEL's Partialator binary.\",\n flag_type=\"\",\n )\n in_file: Optional[str] = Field(\n \"\", description=\"Path to input stream.\", flag_type=\"-\", rename_param=\"i\"\n )\n out_file: str = Field(\n \"\",\n description=\"Path to output file.\",\n flag_type=\"-\",\n rename_param=\"o\",\n is_result=True,\n )\n symmetry: str = Field(description=\"Point group symmetry.\", flag_type=\"--\")\n niter: Optional[int] = Field(\n description=\"Number of cycles of scaling and post-refinement.\",\n flag_type=\"-\",\n rename_param=\"n\",\n )\n no_scale: Optional[bool] = Field(\n description=\"Disable scaling.\", flag_type=\"--\", rename_param=\"no-scale\"\n )\n no_Bscale: Optional[bool] = Field(\n description=\"Disable Debye-Waller part of scaling.\",\n flag_type=\"--\",\n rename_param=\"no-Bscale\",\n )\n no_pr: Optional[bool] = Field(\n description=\"Disable orientation model.\", flag_type=\"--\", rename_param=\"no-pr\"\n )\n no_deltacchalf: Optional[bool] = Field(\n description=\"Disable rejection based on deltaCC1/2.\",\n flag_type=\"--\",\n rename_param=\"no-deltacchalf\",\n )\n model: str = Field(\n \"unity\",\n description=\"Partiality model. Options: xsphere, unity, offset, ggpm.\",\n flag_type=\"--\",\n )\n nthreads: int = Field(\n max(int(os.environ.get(\"SLURM_NPROCS\", len(os.sched_getaffinity(0)))) - 1, 1),\n description=\"Number of parallel analyses.\",\n flag_type=\"-\",\n rename_param=\"j\",\n )\n polarisation: Optional[str] = Field(\n description=\"Specification of incident polarisation. Refer to CrystFEL docs for more info.\",\n flag_type=\"--\",\n )\n no_polarisation: Optional[bool] = Field(\n description=\"Synonym for --polarisation=none\",\n flag_type=\"--\",\n rename_param=\"no-polarisation\",\n )\n max_adu: Optional[float] = Field(\n description=\"Maximum intensity of reflection to include.\",\n flag_type=\"--\",\n rename_param=\"max-adu\",\n )\n min_res: Optional[float] = Field(\n description=\"Only include crystals diffracting to a minimum resolution.\",\n flag_type=\"--\",\n rename_param=\"min-res\",\n )\n min_measurements: int = Field(\n 2,\n description=\"Include a reflection only if it appears a minimum number of times.\",\n flag_type=\"--\",\n rename_param=\"min-measurements\",\n )\n push_res: Optional[float] = Field(\n description=\"Merge reflections up to higher than the apparent resolution limit.\",\n flag_type=\"--\",\n rename_param=\"push-res\",\n )\n start_after: int = Field(\n 0,\n description=\"Ignore the first n crystals.\",\n flag_type=\"--\",\n rename_param=\"start-after\",\n )\n stop_after: int = Field(\n 0,\n description=\"Stop after processing n crystals. 0 means process all.\",\n flag_type=\"--\",\n rename_param=\"stop-after\",\n )\n no_free: Optional[bool] = Field(\n description=\"Disable cross-validation. 
Testing ONLY.\",\n flag_type=\"--\",\n rename_param=\"no-free\",\n )\n custom_split: Optional[str] = Field(\n description=\"Read a set of filenames, event and dataset IDs from a filename.\",\n flag_type=\"--\",\n rename_param=\"custom-split\",\n )\n max_rel_B: float = Field(\n 100,\n description=\"Reject crystals if |relB| > n sq Angstroms.\",\n flag_type=\"--\",\n rename_param=\"max-rel-B\",\n )\n output_every_cycle: bool = Field(\n False,\n description=\"Write per-crystal params after every refinement cycle.\",\n flag_type=\"--\",\n rename_param=\"output-every-cycle\",\n )\n no_logs: bool = Field(\n False,\n description=\"Do not write logs needed for plots, maps and graphs.\",\n flag_type=\"--\",\n rename_param=\"no-logs\",\n )\n set_symmetry: Optional[str] = Field(\n description=\"Set the apparent symmetry of the crystals to a point group.\",\n flag_type=\"-\",\n rename_param=\"w\",\n )\n operator: Optional[str] = Field(\n description=\"Specify an ambiguity operator. E.g. k,h,-l.\", flag_type=\"--\"\n )\n force_bandwidth: Optional[float] = Field(\n description=\"Set X-ray bandwidth. As percent, e.g. 0.0013 (0.13%).\",\n flag_type=\"--\",\n rename_param=\"force-bandwidth\",\n )\n force_radius: Optional[float] = Field(\n description=\"Set the initial profile radius (nm-1).\",\n flag_type=\"--\",\n rename_param=\"force-radius\",\n )\n force_lambda: Optional[float] = Field(\n description=\"Set the wavelength. In Angstroms.\",\n flag_type=\"--\",\n rename_param=\"force-lambda\",\n )\n harvest_file: Optional[str] = Field(\n description=\"Write parameters to file in JSON format.\",\n flag_type=\"--\",\n rename_param=\"harvest-file\",\n )\n\n @validator(\"in_file\", always=True)\n def validate_in_file(cls, in_file: str, values: Dict[str, Any]) -> str:\n if in_file == \"\":\n stream_file: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\",\n \"ConcatenateStreamFiles\",\n \"out_file\",\n )\n if stream_file:\n return stream_file\n return in_file\n\n @validator(\"out_file\", always=True)\n def validate_out_file(cls, out_file: str, values: Dict[str, Any]) -> str:\n if out_file == \"\":\n in_file: str = values[\"in_file\"]\n if in_file:\n tag: str = in_file.split(\".\")[0]\n return f\"{tag}.hkl\"\n else:\n return \"partialator.hkl\"\n return out_file\n
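When out_file is empty, the validator above derives the merged HKL name from the input stream, falling back to partialator.hkl if no input is known. A standalone sketch (hypothetical path):

def default_partialator_out(in_file: str) -> str:
    # Mirrors MergePartialatorParameters.validate_out_file for an empty out_file.
    return f"{in_file.split('.')[0]}.hkl" if in_file else "partialator.hkl"


default_partialator_out("/hypothetical/work_dir/mfxp00123_r0007.stream")
# -> '/hypothetical/work_dir/mfxp00123_r0007.hkl'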
"},{"location":"source/io/models/sfx_merge/#io.models.sfx_merge.MergePartialatorParameters.Config","title":"Config
","text":" Bases: Config
lute/io/models/sfx_merge.py
class Config(ThirdPartyParameters.Config):\n long_flags_use_eq: bool = True\n \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n
"},{"location":"source/io/models/sfx_merge/#io.models.sfx_merge.MergePartialatorParameters.Config.long_flags_use_eq","title":"long_flags_use_eq: bool = True
class-attribute
instance-attribute
","text":"Whether long command-line arguments are passed like --long=arg
.
set_result: bool = True
class-attribute
instance-attribute
","text":"Whether the Executor should mark a specified parameter as a result.
"},{"location":"source/io/models/sfx_solve/","title":"sfx_solve","text":"Models for structure solution in serial femtosecond crystallography.
Classes:
Name DescriptionDimpleSolveParameters
Perform structure solution using CCP4's dimple (molecular replacement).
"},{"location":"source/io/models/sfx_solve/#io.models.sfx_solve.DimpleSolveParameters","title":"DimpleSolveParameters
","text":" Bases: ThirdPartyParameters
Parameters for CCP4's dimple program.
There are many parameters. For more information on usage, please refer to the CCP4 documentation, here: https://ccp4.github.io/dimple/
Source code inlute/io/models/sfx_solve.py
class DimpleSolveParameters(ThirdPartyParameters):\n \"\"\"Parameters for CCP4's dimple program.\n\n There are many parameters. For more information on\n usage, please refer to the CCP4 documentation, here:\n https://ccp4.github.io/dimple/\n \"\"\"\n\n executable: str = Field(\n \"/sdf/group/lcls/ds/tools/ccp4-8.0/bin/dimple\",\n description=\"CCP4 Dimple for solving structures with MR.\",\n flag_type=\"\",\n )\n # Positional requirements - all required.\n in_file: str = Field(\n \"\",\n description=\"Path to input mtz.\",\n flag_type=\"\",\n )\n pdb: str = Field(\"\", description=\"Path to a PDB.\", flag_type=\"\")\n out_dir: str = Field(\"\", description=\"Output DIRECTORY.\", flag_type=\"\")\n # Most used options\n mr_thresh: PositiveFloat = Field(\n 0.4,\n description=\"Threshold for molecular replacement.\",\n flag_type=\"--\",\n rename_param=\"mr-when-r\",\n )\n slow: Optional[bool] = Field(\n False, description=\"Perform more refinement.\", flag_type=\"--\"\n )\n # Other options (IO)\n hklout: str = Field(\n \"final.mtz\", description=\"Output mtz file name.\", flag_type=\"--\"\n )\n xyzout: str = Field(\n \"final.pdb\", description=\"Output PDB file name.\", flag_type=\"--\"\n )\n icolumn: Optional[str] = Field(\n # \"IMEAN\",\n description=\"Name for the I column.\",\n flag_type=\"--\",\n )\n sigicolumn: Optional[str] = Field(\n # \"SIG<ICOL>\",\n description=\"Name for the Sig<I> column.\",\n flag_type=\"--\",\n )\n fcolumn: Optional[str] = Field(\n # \"F\",\n description=\"Name for the F column.\",\n flag_type=\"--\",\n )\n sigfcolumn: Optional[str] = Field(\n # \"F\",\n description=\"Name for the Sig<F> column.\",\n flag_type=\"--\",\n )\n libin: Optional[str] = Field(\n description=\"Ligand descriptions for refmac (LIBIN).\", flag_type=\"--\"\n )\n refmac_key: Optional[str] = Field(\n description=\"Extra Refmac keywords to use in refinement.\",\n flag_type=\"--\",\n rename_param=\"refmac-key\",\n )\n free_r_flags: Optional[str] = Field(\n description=\"Path to a mtz file with freeR flags.\",\n flag_type=\"--\",\n rename_param=\"free-r-flags\",\n )\n freecolumn: Optional[Union[int, float]] = Field(\n # 0,\n description=\"Refree column with an optional value.\",\n flag_type=\"--\",\n )\n img_format: Optional[str] = Field(\n description=\"Format of generated images. 
(png, jpeg, none).\",\n flag_type=\"-\",\n rename_param=\"f\",\n )\n white_bg: bool = Field(\n False,\n description=\"Use a white background in Coot and in images.\",\n flag_type=\"--\",\n rename_param=\"white-bg\",\n )\n no_cleanup: bool = Field(\n False,\n description=\"Retain intermediate files.\",\n flag_type=\"--\",\n rename_param=\"no-cleanup\",\n )\n # Calculations\n no_blob_search: bool = Field(\n False,\n description=\"Do not search for unmodelled blobs.\",\n flag_type=\"--\",\n rename_param=\"no-blob-search\",\n )\n anode: bool = Field(\n False, description=\"Use SHELX/AnoDe to find peaks in the anomalous map.\"\n )\n # Run customization\n no_hetatm: bool = Field(\n False,\n description=\"Remove heteroatoms from the given model.\",\n flag_type=\"--\",\n rename_param=\"no-hetatm\",\n )\n rigid_cycles: Optional[PositiveInt] = Field(\n # 10,\n description=\"Number of cycles of rigid-body refinement to perform.\",\n flag_type=\"--\",\n rename_param=\"rigid-cycles\",\n )\n jelly: Optional[PositiveInt] = Field(\n # 4,\n description=\"Number of cycles of jelly-body refinement to perform.\",\n flag_type=\"--\",\n )\n restr_cycles: Optional[PositiveInt] = Field(\n # 8,\n description=\"Number of cycles of refmac final refinement to perform.\",\n flag_type=\"--\",\n rename_param=\"restr-cycles\",\n )\n lim_resolution: Optional[PositiveFloat] = Field(\n description=\"Limit the final resolution.\", flag_type=\"--\", rename_param=\"reso\"\n )\n weight: Optional[str] = Field(\n # \"auto-weight\",\n description=\"The refmac matrix weight.\",\n flag_type=\"--\",\n )\n mr_prog: Optional[str] = Field(\n # \"phaser\",\n description=\"Molecular replacement program. phaser or molrep.\",\n flag_type=\"--\",\n rename_param=\"mr-prog\",\n )\n mr_num: Optional[Union[str, int]] = Field(\n # \"auto\",\n description=\"Number of molecules to use for molecular replacement.\",\n flag_type=\"--\",\n rename_param=\"mr-num\",\n )\n mr_reso: Optional[PositiveFloat] = Field(\n # 3.25,\n description=\"High resolution for molecular replacement. If >10 interpreted as eLLG.\",\n flag_type=\"--\",\n rename_param=\"mr-reso\",\n )\n itof_prog: Optional[str] = Field(\n description=\"Program to calculate amplitudes. truncate, or ctruncate.\",\n flag_type=\"--\",\n rename_param=\"ItoF-prog\",\n )\n\n @validator(\"in_file\", always=True)\n def validate_in_file(cls, in_file: str, values: Dict[str, Any]) -> str:\n if in_file == \"\":\n get_hkl_file: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\", \"ManipulateHKL\", \"out_file\"\n )\n if get_hkl_file:\n return get_hkl_file\n return in_file\n\n @validator(\"out_dir\", always=True)\n def validate_out_dir(cls, out_dir: str, values: Dict[str, Any]) -> str:\n if out_dir == \"\":\n get_hkl_file: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\", \"ManipulateHKL\", \"out_file\"\n )\n if get_hkl_file:\n return os.path.dirname(get_hkl_file)\n return out_dir\n
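As a minimal sketch (not LUTE code) of the two validators above: when `in_file` or `out_dir` is empty, the model falls back to the most recent `ManipulateHKL` output recorded in the LUTE database, and the output directory is simply that file's parent directory. The path below is hypothetical.

```python
import os

# Hypothetical value standing in for
# read_latest_db_entry(work_dir, "ManipulateHKL", "out_file")
latest_hkl = "/sdf/data/lcls/ds/mfx/mfxl1234567/results/run0001.mtz"

in_file = latest_hkl                   # validate_in_file fallback
out_dir = os.path.dirname(latest_hkl)  # validate_out_dir fallback
print(in_file)
print(out_dir)  # /sdf/data/lcls/ds/mfx/mfxl1234567/results
```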
"},{"location":"source/io/models/sfx_solve/#io.models.sfx_solve.RunSHELXCParameters","title":"RunSHELXCParameters
","text":" Bases: ThirdPartyParameters
Parameters for CCP4's SHELXC program.
SHELXC prepares files for SHELXD and SHELXE.
For more information please refer to the official documentation: https://www.ccp4.ac.uk/html/crank.html
Source code inlute/io/models/sfx_solve.py
class RunSHELXCParameters(ThirdPartyParameters):\n \"\"\"Parameters for CCP4's SHELXC program.\n\n SHELXC prepares files for SHELXD and SHELXE.\n\n For more information please refer to the official documentation:\n https://www.ccp4.ac.uk/html/crank.html\n \"\"\"\n\n executable: str = Field(\n \"/sdf/group/lcls/ds/tools/ccp4-8.0/bin/shelxc\",\n description=\"CCP4 SHELXC. Generates input files for SHELXD/SHELXE.\",\n flag_type=\"\",\n )\n placeholder: str = Field(\n \"xx\", description=\"Placeholder filename stem.\", flag_type=\"\"\n )\n in_file: str = Field(\n \"\",\n description=\"Input file for SHELXC with reflections AND proper records.\",\n flag_type=\"\",\n )\n\n @validator(\"in_file\", always=True)\n def validate_in_file(cls, in_file: str, values: Dict[str, Any]) -> str:\n if in_file == \"\":\n # get_hkl needed to be run to produce an XDS format file...\n xds_format_file: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\", \"ManipulateHKL\", \"out_file\"\n )\n if xds_format_file:\n in_file = xds_format_file\n if in_file[0] != \"<\":\n # Need to add a redirection for this program\n # Runs like `shelxc xx <input_file.xds`\n in_file = f\"<{in_file}\"\n return in_file\n
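The `validate_in_file` validator above also prepends a `<` so that the positional argument becomes a shell redirection (the program runs as `shelxc xx <input_file.xds`). A minimal sketch of that string handling (not LUTE code), with a hypothetical file name:

```python
# Minimal sketch of the redirection handling in validate_in_file; not LUTE code.
def normalize_shelxc_input(in_file: str) -> str:
    if in_file and in_file[0] != "<":
        in_file = f"<{in_file}"  # shelxc reads the reflection file from stdin
    return in_file

print(normalize_shelxc_input("peaks.xds"))   # <peaks.xds
print(normalize_shelxc_input("<peaks.xds"))  # already redirected; unchanged
```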
"},{"location":"source/io/models/smd/","title":"smd","text":"Models for smalldata_tools Tasks.
Classes:
Name DescriptionSubmitSMDParameters
Parameters to run smalldata_tools to produce a smalldata HDF5 file.
FindOverlapXSSParameters
Parameter model for the FindOverlapXSS Task. Used to determine spatial/temporal overlap based on XSS difference signal.
"},{"location":"source/io/models/smd/#io.models.smd.FindOverlapXSSParameters","title":"FindOverlapXSSParameters
","text":" Bases: TaskParameters
TaskParameter model for FindOverlapXSS Task.
This Task determines spatial or temporal overlap between an optical pulse and the FEL pulse based on difference scattering (XSS) signal. This Task uses SmallData HDF5 files as a source.
Source code inlute/io/models/smd.py
class FindOverlapXSSParameters(TaskParameters):\n \"\"\"TaskParameter model for FindOverlapXSS Task.\n\n This Task determines spatial or temporal overlap between an optical pulse\n and the FEL pulse based on difference scattering (XSS) signal. This Task\n uses SmallData HDF5 files as a source.\n \"\"\"\n\n class ExpConfig(BaseModel):\n det_name: str\n ipm_var: str\n scan_var: Union[str, List[str]]\n\n class Thresholds(BaseModel):\n min_Iscat: Union[int, float]\n min_ipm: Union[int, float]\n\n class AnalysisFlags(BaseModel):\n use_pyfai: bool = True\n use_asymls: bool = False\n\n exp_config: ExpConfig\n thresholds: Thresholds\n analysis_flags: AnalysisFlags\n
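Since this model nests three sub-models, the expected parameter structure may be easier to see written out as plain data. A minimal sketch (not LUTE code); the detector, IPM and scan variable names are hypothetical:

```python
# Hypothetical parameter values matching the nested models above.
find_overlap_xss_params = {
    "exp_config": {
        "det_name": "epix10k2M",  # ExpConfig.det_name
        "ipm_var": "ipm4/sum",    # ExpConfig.ipm_var
        "scan_var": "lxt",        # ExpConfig.scan_var (str or list of str)
    },
    "thresholds": {"min_Iscat": 10.0, "min_ipm": 500.0},
    "analysis_flags": {"use_pyfai": True, "use_asymls": False},
}
```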
"},{"location":"source/io/models/smd/#io.models.smd.SubmitSMDParameters","title":"SubmitSMDParameters
","text":" Bases: ThirdPartyParameters
Parameters for running smalldata to produce reduced HDF5 files.
Source code inlute/io/models/smd.py
class SubmitSMDParameters(ThirdPartyParameters):\n \"\"\"Parameters for running smalldata to produce reduced HDF5 files.\"\"\"\n\n class Config(ThirdPartyParameters.Config):\n \"\"\"Identical to super-class Config but includes a result.\"\"\"\n\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n result_from_params: str = \"\"\n \"\"\"Defines a result from the parameters. Use a validator to do so.\"\"\"\n\n executable: str = Field(\"mpirun\", description=\"MPI executable.\", flag_type=\"\")\n np: PositiveInt = Field(\n max(int(os.environ.get(\"SLURM_NPROCS\", len(os.sched_getaffinity(0)))) - 1, 1),\n description=\"Number of processes\",\n flag_type=\"-\",\n )\n p_arg1: str = Field(\n \"python\", description=\"Executable to run with mpi (i.e. python).\", flag_type=\"\"\n )\n u: str = Field(\n \"\", description=\"Python option for unbuffered output.\", flag_type=\"-\"\n )\n m: str = Field(\n \"mpi4py.run\",\n description=\"Python option to execute a module's contents as __main__ module.\",\n flag_type=\"-\",\n )\n producer: str = Field(\n \"\", description=\"Path to the SmallData producer Python script.\", flag_type=\"\"\n )\n run: str = Field(\n os.environ.get(\"RUN_NUM\", \"\"), description=\"DAQ Run Number.\", flag_type=\"--\"\n )\n experiment: str = Field(\n os.environ.get(\"EXPERIMENT\", \"\"),\n description=\"LCLS Experiment Number.\",\n flag_type=\"--\",\n )\n stn: NonNegativeInt = Field(0, description=\"Hutch endstation.\", flag_type=\"--\")\n nevents: int = Field(\n int(1e9), description=\"Number of events to process.\", flag_type=\"--\"\n )\n directory: Optional[str] = Field(\n None,\n description=\"Optional output directory. If None, will be in ${EXP_FOLDER}/hdf5/smalldata.\",\n flag_type=\"--\",\n )\n ## Need mechanism to set result_from_param=True ...\n gather_interval: PositiveInt = Field(\n 25, description=\"Number of events to collect at a time.\", flag_type=\"--\"\n )\n norecorder: bool = Field(\n False, description=\"Whether to ignore recorder streams.\", flag_type=\"--\"\n )\n url: HttpUrl = Field(\n \"https://pswww.slac.stanford.edu/ws-auth/lgbk\",\n description=\"Base URL for eLog posting.\",\n flag_type=\"--\",\n )\n epicsAll: bool = Field(\n False,\n description=\"Whether to store all EPICS PVs. Use with care.\",\n flag_type=\"--\",\n )\n full: bool = Field(\n False,\n description=\"Whether to store all data. Use with EXTRA care.\",\n flag_type=\"--\",\n )\n fullSum: bool = Field(\n False,\n description=\"Whether to store sums for all area detector images.\",\n flag_type=\"--\",\n )\n default: bool = Field(\n False,\n description=\"Whether to store only the default minimal set of data.\",\n flag_type=\"--\",\n )\n image: bool = Field(\n False,\n description=\"Whether to save everything as images. Use with care.\",\n flag_type=\"--\",\n )\n tiff: bool = Field(\n False,\n description=\"Whether to save all images as a single TIFF. Use with EXTRA care.\",\n flag_type=\"--\",\n )\n centerpix: bool = Field(\n False,\n description=\"Whether to mask center pixels for Epix10k2M detectors.\",\n flag_type=\"--\",\n )\n postRuntable: bool = Field(\n False,\n description=\"Whether to post run tables. 
Also used as a trigger for summary jobs.\",\n flag_type=\"--\",\n )\n wait: bool = Field(\n False, description=\"Whether to wait for a file to appear.\", flag_type=\"--\"\n )\n xtcav: bool = Field(\n False,\n description=\"Whether to add XTCAV processing to the HDF5 generation.\",\n flag_type=\"--\",\n )\n noarch: bool = Field(\n False, description=\"Whether to not use archiver data.\", flag_type=\"--\"\n )\n\n lute_template_cfg: TemplateConfig = TemplateConfig(template_name=\"\", output_path=\"\")\n\n @validator(\"producer\", always=True)\n def validate_producer_path(cls, producer: str) -> str:\n return producer\n\n @validator(\"lute_template_cfg\", always=True)\n def use_producer(\n cls, lute_template_cfg: TemplateConfig, values: Dict[str, Any]\n ) -> TemplateConfig:\n if not lute_template_cfg.output_path:\n lute_template_cfg.output_path = values[\"producer\"]\n return lute_template_cfg\n\n @root_validator(pre=False)\n def define_result(cls, values: Dict[str, Any]) -> Dict[str, Any]:\n exp: str = values[\"lute_config\"].experiment\n hutch: str = exp[:3]\n run: int = int(values[\"lute_config\"].run)\n directory: Optional[str] = values[\"directory\"]\n if directory is None:\n directory = f\"/sdf/data/lcls/ds/{hutch}/{exp}/hdf5/smalldata\"\n fname: str = f\"{exp}_Run{run:04d}.h5\"\n\n cls.Config.result_from_params = f\"{directory}/{fname}\"\n return values\n
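A minimal sketch (not LUTE code) of how the `define_result` validator above composes the default smalldata HDF5 path when no `directory` is given; the experiment name and run number are hypothetical:

```python
# Hypothetical experiment/run; mirrors the path construction in define_result.
exp, run = "mfxl1234567", 12
hutch = exp[:3]  # first three characters give the hutch, e.g. "mfx"
directory = f"/sdf/data/lcls/ds/{hutch}/{exp}/hdf5/smalldata"
fname = f"{exp}_Run{run:04d}.h5"
print(f"{directory}/{fname}")
# /sdf/data/lcls/ds/mfx/mfxl1234567/hdf5/smalldata/mfxl1234567_Run0012.h5
```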
"},{"location":"source/io/models/smd/#io.models.smd.SubmitSMDParameters.Config","title":"Config
","text":" Bases: Config
Identical to super-class Config but includes a result.
Source code inlute/io/models/smd.py
class Config(ThirdPartyParameters.Config):\n \"\"\"Identical to super-class Config but includes a result.\"\"\"\n\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n result_from_params: str = \"\"\n \"\"\"Defines a result from the parameters. Use a validator to do so.\"\"\"\n
"},{"location":"source/io/models/smd/#io.models.smd.SubmitSMDParameters.Config.result_from_params","title":"result_from_params: str = ''
class-attribute
instance-attribute
","text":"Defines a result from the parameters. Use a validator to do so.
"},{"location":"source/io/models/smd/#io.models.smd.SubmitSMDParameters.Config.set_result","title":"set_result: bool = True
class-attribute
instance-attribute
","text":"Whether the Executor should mark a specified parameter as a result.
"},{"location":"source/io/models/tests/","title":"tests","text":"Models for all test Tasks.
Classes:
Name DescriptionTestParameters
Model for most basic test case. Single core first-party Task. Uses only communication via pipes.
TestBinaryParameters
Parameters for a simple multi- threaded binary executable.
TestSocketParameters
Model for first-party test requiring communication via socket.
TestWriteOutputParameters
Model for test Task which writes an output file. Location of file is recorded in database.
TestReadOutputParameters
Model for test Task which locates an output file based on an entry in the database, if no path is provided.
"},{"location":"source/io/models/tests/#io.models.tests.TestBinaryErrParameters","title":"TestBinaryErrParameters
","text":" Bases: ThirdPartyParameters
Same as TestBinary, but exits with non-zero code.
Source code inlute/io/models/tests.py
class TestBinaryErrParameters(ThirdPartyParameters):\n    \"\"\"Same as TestBinary, but exits with non-zero code.\"\"\"\n\n    executable: str = Field(\n        \"/sdf/home/d/dorlhiac/test_tasks/test_threads_err\",\n        description=\"Multi-threaded test binary with non-zero exit code.\",\n    )\n    p_arg1: int = Field(1, description=\"Number of threads.\")\n
"},{"location":"source/io/models/tests/#io.models.tests.TestParameters","title":"TestParameters
","text":" Bases: TaskParameters
Parameters for the test Task Test
.
lute/io/models/tests.py
class TestParameters(TaskParameters):\n \"\"\"Parameters for the test Task `Test`.\"\"\"\n\n float_var: float = Field(0.01, description=\"A floating point number.\")\n str_var: str = Field(\"test\", description=\"A string.\")\n\n class CompoundVar(BaseModel):\n int_var: int = 1\n dict_var: Dict[str, str] = {\"a\": \"b\"}\n\n compound_var: CompoundVar = Field(\n description=(\n \"A compound parameter - consists of a `int_var` (int) and `dict_var`\"\n \" (Dict[str, str]).\"\n )\n )\n throw_error: bool = Field(\n False, description=\"If `True`, raise an exception to test error handling.\"\n )\n
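For reference, a minimal sketch (not LUTE code) of the shape of these test parameters, including the nested `compound_var`, written out as plain data with the default values shown above:

```python
# Default-like values for the Test Task parameters shown above.
test_params = {
    "float_var": 0.01,
    "str_var": "test",
    "compound_var": {"int_var": 1, "dict_var": {"a": "b"}},
    "throw_error": False,
}
```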
"},{"location":"source/tasks/dataclasses/","title":"dataclasses","text":"Classes for describing Task state and results.
Classes:
Name DescriptionTaskResult
Output of a specific analysis task.
TaskStatus
Enumeration of possible Task statuses (running, pending, failed, etc.).
DescribedAnalysis
Executor's description of a Task
run (results, parameters, env).
DescribedAnalysis
dataclass
","text":"Complete analysis description. Held by an Executor.
Source code inlute/tasks/dataclasses.py
@dataclass\nclass DescribedAnalysis:\n \"\"\"Complete analysis description. Held by an Executor.\"\"\"\n\n task_result: TaskResult\n task_parameters: Optional[TaskParameters]\n task_env: Dict[str, str]\n poll_interval: float\n communicator_desc: List[str]\n
"},{"location":"source/tasks/dataclasses/#tasks.dataclasses.ElogSummaryPlots","title":"ElogSummaryPlots
dataclass
","text":"Holds a graphical summary intended for display in the eLog.
Attributes:
Name Type Descriptiondisplay_name
str
This represents both a path and how the result will be displayed in the eLog. Can include \"/\" characters. E.g. display_name = \"scans/my_motor_scan\"
will have plots shown on a \"my_motor_scan\" page, under a \"scans\" tab. This format mirrors how the file is stored on disk as well.
lute/tasks/dataclasses.py
@dataclass\nclass ElogSummaryPlots:\n \"\"\"Holds a graphical summary intended for display in the eLog.\n\n Attributes:\n display_name (str): This represents both a path and how the result will be\n displayed in the eLog. Can include \"/\" characters. E.g.\n `display_name = \"scans/my_motor_scan\"` will have plots shown\n on a \"my_motor_scan\" page, under a \"scans\" tab. This format mirrors\n how the file is stored on disk as well.\n \"\"\"\n\n display_name: str\n figures: Union[pn.Tabs, hv.Image, plt.Figure]\n
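A minimal sketch (not LUTE code) of how a `display_name` such as `"scans/my_motor_scan"` splits into an eLog tab and a page name; the same components mirror the on-disk layout:

```python
display_name = "scans/my_motor_scan"
*tabs, page = display_name.split("/")
print(tabs, page)  # ['scans'] my_motor_scan
```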
"},{"location":"source/tasks/dataclasses/#tasks.dataclasses.TaskResult","title":"TaskResult
dataclass
","text":"Class for storing the result of a Task's execution with metadata.
Attributes:
Name Type Descriptiontask_name
str
Name of the associated task which produced it.
task_status
TaskStatus
Status of associated task.
summary
str
Short message/summary associated with the result.
payload
Any
Actual result. May be data in any format.
impl_schemas
Optional[str]
A string listing Task
schemas implemented by the associated Task
. Schemas define the category and expected output of the Task
. An individual task may implement/conform to multiple schemas. Multiple schemas are separated by ';', e.g. * impl_schemas = \"schema1;schema2\"
lute/tasks/dataclasses.py
@dataclass\nclass TaskResult:\n \"\"\"Class for storing the result of a Task's execution with metadata.\n\n Attributes:\n task_name (str): Name of the associated task which produced it.\n\n task_status (TaskStatus): Status of associated task.\n\n summary (str): Short message/summary associated with the result.\n\n payload (Any): Actual result. May be data in any format.\n\n impl_schemas (Optional[str]): A string listing `Task` schemas implemented\n by the associated `Task`. Schemas define the category and expected\n output of the `Task`. An individual task may implement/conform to\n multiple schemas. Multiple schemas are separated by ';', e.g.\n * impl_schemas = \"schema1;schema2\"\n \"\"\"\n\n task_name: str\n task_status: TaskStatus\n summary: str\n payload: Any\n impl_schemas: Optional[str] = None\n
"},{"location":"source/tasks/dataclasses/#tasks.dataclasses.TaskStatus","title":"TaskStatus
","text":" Bases: Enum
Possible Task statuses.
Source code inlute/tasks/dataclasses.py
class TaskStatus(Enum):\n \"\"\"Possible Task statuses.\"\"\"\n\n PENDING = 0\n \"\"\"\n Task has yet to run. Is Queued, or waiting for prior tasks.\n \"\"\"\n RUNNING = 1\n \"\"\"\n Task is in the process of execution.\n \"\"\"\n COMPLETED = 2\n \"\"\"\n Task has completed without fatal errors.\n \"\"\"\n FAILED = 3\n \"\"\"\n Task encountered a fatal error.\n \"\"\"\n STOPPED = 4\n \"\"\"\n Task was, potentially temporarily, stopped/suspended.\n \"\"\"\n CANCELLED = 5\n \"\"\"\n Task was cancelled prior to completion or failure.\n \"\"\"\n TIMEDOUT = 6\n \"\"\"\n Task did not reach completion due to timeout.\n \"\"\"\n
"},{"location":"source/tasks/dataclasses/#tasks.dataclasses.TaskStatus.CANCELLED","title":"CANCELLED = 5
class-attribute
instance-attribute
","text":"Task was cancelled prior to completion or failure.
"},{"location":"source/tasks/dataclasses/#tasks.dataclasses.TaskStatus.COMPLETED","title":"COMPLETED = 2
class-attribute
instance-attribute
","text":"Task has completed without fatal errors.
"},{"location":"source/tasks/dataclasses/#tasks.dataclasses.TaskStatus.FAILED","title":"FAILED = 3
class-attribute
instance-attribute
","text":"Task encountered a fatal error.
"},{"location":"source/tasks/dataclasses/#tasks.dataclasses.TaskStatus.PENDING","title":"PENDING = 0
class-attribute
instance-attribute
","text":"Task has yet to run. Is Queued, or waiting for prior tasks.
"},{"location":"source/tasks/dataclasses/#tasks.dataclasses.TaskStatus.RUNNING","title":"RUNNING = 1
class-attribute
instance-attribute
","text":"Task is in the process of execution.
"},{"location":"source/tasks/dataclasses/#tasks.dataclasses.TaskStatus.STOPPED","title":"STOPPED = 4
class-attribute
instance-attribute
","text":"Task was, potentially temporarily, stopped/suspended.
"},{"location":"source/tasks/dataclasses/#tasks.dataclasses.TaskStatus.TIMEDOUT","title":"TIMEDOUT = 6
class-attribute
instance-attribute
","text":"Task did not reach completion due to timeout.
"},{"location":"source/tasks/sfx_find_peaks/","title":"sfx_find_peaks","text":"Classes for peak finding tasks in SFX.
Classes:
Name DescriptionCxiWriter
Utility class for writing peak finding results to CXI files.
FindPeaksPyAlgos
Peak finding using psana's PyAlgos algorithm. Optional data compression and decompression with libpressio for data reduction tests.
"},{"location":"source/tasks/sfx_find_peaks/#tasks.sfx_find_peaks.CxiWriter","title":"CxiWriter
","text":"Source code in lute/tasks/sfx_find_peaks.py
class CxiWriter:\n\n def __init__(\n self,\n outdir: str,\n rank: int,\n exp: str,\n run: int,\n n_events: int,\n det_shape: Tuple[int, ...],\n min_peaks: int,\n max_peaks: int,\n i_x: Any, # Not typed becomes it comes from psana\n i_y: Any, # Not typed becomes it comes from psana\n ipx: Any, # Not typed becomes it comes from psana\n ipy: Any, # Not typed becomes it comes from psana\n tag: str,\n ):\n \"\"\"\n Set up the CXI files to which peak finding results will be saved.\n\n Parameters:\n\n outdir (str): Output directory for cxi file.\n\n rank (int): MPI rank of the caller.\n\n exp (str): Experiment string.\n\n run (int): Experimental run.\n\n n_events (int): Number of events to process.\n\n det_shape (Tuple[int, int]): Shape of the numpy array storing the detector\n data. This must be aCheetah-stile 2D array.\n\n min_peaks (int): Minimum number of peaks per image.\n\n max_peaks (int): Maximum number of peaks per image.\n\n i_x (Any): Array of pixel indexes along x\n\n i_y (Any): Array of pixel indexes along y\n\n ipx (Any): Pixel indexes with respect to detector origin (x component)\n\n ipy (Any): Pixel indexes with respect to detector origin (y component)\n\n tag (str): Tag to append to cxi file names.\n \"\"\"\n self._det_shape: Tuple[int, ...] = det_shape\n self._i_x: Any = i_x\n self._i_y: Any = i_y\n self._ipx: Any = ipx\n self._ipy: Any = ipy\n self._index: int = 0\n\n # Create and open the HDF5 file\n fname: str = f\"{exp}_r{run:0>4}_{rank}{tag}.cxi\"\n Path(outdir).mkdir(exist_ok=True)\n self._outh5: Any = h5py.File(Path(outdir) / fname, \"w\")\n\n # Entry_1 entry for processing with CrystFEL\n entry_1: Any = self._outh5.create_group(\"entry_1\")\n keys: List[str] = [\n \"nPeaks\",\n \"peakXPosRaw\",\n \"peakYPosRaw\",\n \"rcent\",\n \"ccent\",\n \"rmin\",\n \"rmax\",\n \"cmin\",\n \"cmax\",\n \"peakTotalIntensity\",\n \"peakMaxIntensity\",\n \"peakRadius\",\n ]\n ds_expId: Any = entry_1.create_dataset(\n \"experimental_identifier\", (n_events,), maxshape=(None,), dtype=int\n )\n ds_expId.attrs[\"axes\"] = \"experiment_identifier\"\n data_1: Any = entry_1.create_dataset(\n \"/entry_1/data_1/data\",\n (n_events, det_shape[0], det_shape[1]),\n chunks=(1, det_shape[0], det_shape[1]),\n maxshape=(None, det_shape[0], det_shape[1]),\n dtype=numpy.float32,\n )\n data_1.attrs[\"axes\"] = \"experiment_identifier\"\n key: str\n for key in [\"powderHits\", \"powderMisses\", \"mask\"]:\n entry_1.create_dataset(\n f\"/entry_1/data_1/{key}\",\n (det_shape[0], det_shape[1]),\n chunks=(det_shape[0], det_shape[1]),\n maxshape=(det_shape[0], det_shape[1]),\n dtype=float,\n )\n\n # Peak-related entries\n for key in keys:\n if key == \"nPeaks\":\n ds_x: Any = self._outh5.create_dataset(\n f\"/entry_1/result_1/{key}\",\n (n_events,),\n maxshape=(None,),\n dtype=int,\n )\n ds_x.attrs[\"minPeaks\"] = min_peaks\n ds_x.attrs[\"maxPeaks\"] = max_peaks\n else:\n ds_x: Any = self._outh5.create_dataset(\n f\"/entry_1/result_1/{key}\",\n (n_events, max_peaks),\n maxshape=(None, max_peaks),\n chunks=(1, max_peaks),\n dtype=float,\n )\n ds_x.attrs[\"axes\"] = \"experiment_identifier:peaks\"\n\n # Timestamp entries\n lcls_1: Any = self._outh5.create_group(\"LCLS\")\n keys: List[str] = [\n \"eventNumber\",\n \"machineTime\",\n \"machineTimeNanoSeconds\",\n \"fiducial\",\n \"photon_energy_eV\",\n ]\n key: str\n for key in keys:\n if key == \"photon_energy_eV\":\n ds_x: Any = lcls_1.create_dataset(\n f\"{key}\", (n_events,), maxshape=(None,), dtype=float\n )\n else:\n ds_x = lcls_1.create_dataset(\n f\"{key}\", 
(n_events,), maxshape=(None,), dtype=int\n )\n ds_x.attrs[\"axes\"] = \"experiment_identifier\"\n\n ds_x = self._outh5.create_dataset(\n \"/LCLS/detector_1/EncoderValue\", (n_events,), maxshape=(None,), dtype=float\n )\n ds_x.attrs[\"axes\"] = \"experiment_identifier\"\n\n def write_event(\n self,\n img: NDArray[numpy.float_],\n peaks: Any, # Not typed becomes it comes from psana\n timestamp_seconds: int,\n timestamp_nanoseconds: int,\n timestamp_fiducials: int,\n photon_energy: float,\n ):\n \"\"\"\n Write peak finding results for an event into the HDF5 file.\n\n Parameters:\n\n img (NDArray[numpy.float_]): Detector data for the event\n\n peaks: (Any): Peak information for the event, as recovered from the PyAlgos\n algorithm\n\n timestamp_seconds (int): Second part of the event's timestamp information\n\n timestamp_nanoseconds (int): Nanosecond part of the event's timestamp\n information\n\n timestamp_fiducials (int): Fiducials part of the event's timestamp\n information\n\n photon_energy (float): Photon energy for the event\n \"\"\"\n ch_rows: NDArray[numpy.float_] = peaks[:, 0] * self._det_shape[1] + peaks[:, 1]\n ch_cols: NDArray[numpy.float_] = peaks[:, 2]\n\n # Entry_1 entry for processing with CrystFEL\n self._outh5[\"/entry_1/data_1/data\"][self._index, :, :] = img.reshape(\n -1, img.shape[-1]\n )\n self._outh5[\"/entry_1/result_1/nPeaks\"][self._index] = peaks.shape[0]\n self._outh5[\"/entry_1/result_1/peakXPosRaw\"][self._index, : peaks.shape[0]] = (\n ch_cols.astype(\"int\")\n )\n self._outh5[\"/entry_1/result_1/peakYPosRaw\"][self._index, : peaks.shape[0]] = (\n ch_rows.astype(\"int\")\n )\n self._outh5[\"/entry_1/result_1/rcent\"][self._index, : peaks.shape[0]] = peaks[\n :, 6\n ]\n self._outh5[\"/entry_1/result_1/ccent\"][self._index, : peaks.shape[0]] = peaks[\n :, 7\n ]\n self._outh5[\"/entry_1/result_1/rmin\"][self._index, : peaks.shape[0]] = peaks[\n :, 10\n ]\n self._outh5[\"/entry_1/result_1/rmax\"][self._index, : peaks.shape[0]] = peaks[\n :, 11\n ]\n self._outh5[\"/entry_1/result_1/cmin\"][self._index, : peaks.shape[0]] = peaks[\n :, 12\n ]\n self._outh5[\"/entry_1/result_1/cmax\"][self._index, : peaks.shape[0]] = peaks[\n :, 13\n ]\n self._outh5[\"/entry_1/result_1/peakTotalIntensity\"][\n self._index, : peaks.shape[0]\n ] = peaks[:, 5]\n self._outh5[\"/entry_1/result_1/peakMaxIntensity\"][\n self._index, : peaks.shape[0]\n ] = peaks[:, 4]\n\n # Calculate and write pixel radius\n peaks_cenx: NDArray[numpy.float_] = (\n self._i_x[\n numpy.array(peaks[:, 0], dtype=numpy.int64),\n numpy.array(peaks[:, 1], dtype=numpy.int64),\n numpy.array(peaks[:, 2], dtype=numpy.int64),\n ]\n + 0.5\n - self._ipx\n )\n peaks_ceny: NDArray[numpy.float_] = (\n self._i_y[\n numpy.array(peaks[:, 0], dtype=numpy.int64),\n numpy.array(peaks[:, 1], dtype=numpy.int64),\n numpy.array(peaks[:, 2], dtype=numpy.int64),\n ]\n + 0.5\n - self._ipy\n )\n peak_radius: NDArray[numpy.float_] = numpy.sqrt(\n (peaks_cenx**2) + (peaks_ceny**2)\n )\n self._outh5[\"/entry_1/result_1/peakRadius\"][\n self._index, : peaks.shape[0]\n ] = peak_radius\n\n # LCLS entry dataset\n self._outh5[\"/LCLS/machineTime\"][self._index] = timestamp_seconds\n self._outh5[\"/LCLS/machineTimeNanoSeconds\"][self._index] = timestamp_nanoseconds\n self._outh5[\"/LCLS/fiducial\"][self._index] = timestamp_fiducials\n self._outh5[\"/LCLS/photon_energy_eV\"][self._index] = photon_energy\n\n self._index += 1\n\n def write_non_event_data(\n self,\n powder_hits: NDArray[numpy.float_],\n powder_misses: NDArray[numpy.float_],\n mask: 
NDArray[numpy.uint16],\n clen: float,\n ):\n \"\"\"\n Write to the file data that is not related to a specific event (masks, powders)\n\n Parameters:\n\n powder_hits (NDArray[numpy.float_]): Virtual powder pattern from hits\n\n powder_misses (NDArray[numpy.float_]): Virtual powder pattern from hits\n\n mask: (NDArray[numpy.uint16]): Pixel ask to write into the file\n\n \"\"\"\n # Add powders and mask to files, reshaping them to match the crystfel\n # convention\n self._outh5[\"/entry_1/data_1/powderHits\"][:] = powder_hits.reshape(\n -1, powder_hits.shape[-1]\n )\n self._outh5[\"/entry_1/data_1/powderMisses\"][:] = powder_misses.reshape(\n -1, powder_misses.shape[-1]\n )\n self._outh5[\"/entry_1/data_1/mask\"][:] = (1 - mask).reshape(\n -1, mask.shape[-1]\n ) # Crystfel expects inverted values\n\n # Add clen distance\n self._outh5[\"/LCLS/detector_1/EncoderValue\"][:] = clen\n\n def optimize_and_close_file(\n self,\n num_hits: int,\n max_peaks: int,\n ):\n \"\"\"\n Resize data blocks and write additional information to the file\n\n Parameters:\n\n num_hits (int): Number of hits for which information has been saved to the\n file\n\n max_peaks (int): Maximum number of peaks (per event) for which information\n can be written into the file\n \"\"\"\n\n # Resize the entry_1 entry\n data_shape: Tuple[int, ...] = self._outh5[\"/entry_1/data_1/data\"].shape\n self._outh5[\"/entry_1/data_1/data\"].resize(\n (num_hits, data_shape[1], data_shape[2])\n )\n self._outh5[f\"/entry_1/result_1/nPeaks\"].resize((num_hits,))\n key: str\n for key in [\n \"peakXPosRaw\",\n \"peakYPosRaw\",\n \"rcent\",\n \"ccent\",\n \"rmin\",\n \"rmax\",\n \"cmin\",\n \"cmax\",\n \"peakTotalIntensity\",\n \"peakMaxIntensity\",\n \"peakRadius\",\n ]:\n self._outh5[f\"/entry_1/result_1/{key}\"].resize((num_hits, max_peaks))\n\n # Resize LCLS entry\n for key in [\n \"eventNumber\",\n \"machineTime\",\n \"machineTimeNanoSeconds\",\n \"fiducial\",\n \"detector_1/EncoderValue\",\n \"photon_energy_eV\",\n ]:\n self._outh5[f\"/LCLS/{key}\"].resize((num_hits,))\n self._outh5.close()\n
"},{"location":"source/tasks/sfx_find_peaks/#tasks.sfx_find_peaks.CxiWriter.__init__","title":"__init__(outdir, rank, exp, run, n_events, det_shape, min_peaks, max_peaks, i_x, i_y, ipx, ipy, tag)
","text":"Set up the CXI files to which peak finding results will be saved.
Parameters:
outdir (str): Output directory for cxi file.\n\nrank (int): MPI rank of the caller.\n\nexp (str): Experiment string.\n\nrun (int): Experimental run.\n\nn_events (int): Number of events to process.\n\ndet_shape (Tuple[int, int]): Shape of the numpy array storing the detector\n    data. This must be a Cheetah-style 2D array.\n\nmin_peaks (int): Minimum number of peaks per image.\n\nmax_peaks (int): Maximum number of peaks per image.\n\ni_x (Any): Array of pixel indexes along x\n\ni_y (Any): Array of pixel indexes along y\n\nipx (Any): Pixel indexes with respect to detector origin (x component)\n\nipy (Any): Pixel indexes with respect to detector origin (y component)\n\ntag (str): Tag to append to cxi file names.\n
Source code in lute/tasks/sfx_find_peaks.py
def __init__(\n self,\n outdir: str,\n rank: int,\n exp: str,\n run: int,\n n_events: int,\n det_shape: Tuple[int, ...],\n min_peaks: int,\n max_peaks: int,\n i_x: Any, # Not typed becomes it comes from psana\n i_y: Any, # Not typed becomes it comes from psana\n ipx: Any, # Not typed becomes it comes from psana\n ipy: Any, # Not typed becomes it comes from psana\n tag: str,\n):\n \"\"\"\n Set up the CXI files to which peak finding results will be saved.\n\n Parameters:\n\n outdir (str): Output directory for cxi file.\n\n rank (int): MPI rank of the caller.\n\n exp (str): Experiment string.\n\n run (int): Experimental run.\n\n n_events (int): Number of events to process.\n\n det_shape (Tuple[int, int]): Shape of the numpy array storing the detector\n data. This must be aCheetah-stile 2D array.\n\n min_peaks (int): Minimum number of peaks per image.\n\n max_peaks (int): Maximum number of peaks per image.\n\n i_x (Any): Array of pixel indexes along x\n\n i_y (Any): Array of pixel indexes along y\n\n ipx (Any): Pixel indexes with respect to detector origin (x component)\n\n ipy (Any): Pixel indexes with respect to detector origin (y component)\n\n tag (str): Tag to append to cxi file names.\n \"\"\"\n self._det_shape: Tuple[int, ...] = det_shape\n self._i_x: Any = i_x\n self._i_y: Any = i_y\n self._ipx: Any = ipx\n self._ipy: Any = ipy\n self._index: int = 0\n\n # Create and open the HDF5 file\n fname: str = f\"{exp}_r{run:0>4}_{rank}{tag}.cxi\"\n Path(outdir).mkdir(exist_ok=True)\n self._outh5: Any = h5py.File(Path(outdir) / fname, \"w\")\n\n # Entry_1 entry for processing with CrystFEL\n entry_1: Any = self._outh5.create_group(\"entry_1\")\n keys: List[str] = [\n \"nPeaks\",\n \"peakXPosRaw\",\n \"peakYPosRaw\",\n \"rcent\",\n \"ccent\",\n \"rmin\",\n \"rmax\",\n \"cmin\",\n \"cmax\",\n \"peakTotalIntensity\",\n \"peakMaxIntensity\",\n \"peakRadius\",\n ]\n ds_expId: Any = entry_1.create_dataset(\n \"experimental_identifier\", (n_events,), maxshape=(None,), dtype=int\n )\n ds_expId.attrs[\"axes\"] = \"experiment_identifier\"\n data_1: Any = entry_1.create_dataset(\n \"/entry_1/data_1/data\",\n (n_events, det_shape[0], det_shape[1]),\n chunks=(1, det_shape[0], det_shape[1]),\n maxshape=(None, det_shape[0], det_shape[1]),\n dtype=numpy.float32,\n )\n data_1.attrs[\"axes\"] = \"experiment_identifier\"\n key: str\n for key in [\"powderHits\", \"powderMisses\", \"mask\"]:\n entry_1.create_dataset(\n f\"/entry_1/data_1/{key}\",\n (det_shape[0], det_shape[1]),\n chunks=(det_shape[0], det_shape[1]),\n maxshape=(det_shape[0], det_shape[1]),\n dtype=float,\n )\n\n # Peak-related entries\n for key in keys:\n if key == \"nPeaks\":\n ds_x: Any = self._outh5.create_dataset(\n f\"/entry_1/result_1/{key}\",\n (n_events,),\n maxshape=(None,),\n dtype=int,\n )\n ds_x.attrs[\"minPeaks\"] = min_peaks\n ds_x.attrs[\"maxPeaks\"] = max_peaks\n else:\n ds_x: Any = self._outh5.create_dataset(\n f\"/entry_1/result_1/{key}\",\n (n_events, max_peaks),\n maxshape=(None, max_peaks),\n chunks=(1, max_peaks),\n dtype=float,\n )\n ds_x.attrs[\"axes\"] = \"experiment_identifier:peaks\"\n\n # Timestamp entries\n lcls_1: Any = self._outh5.create_group(\"LCLS\")\n keys: List[str] = [\n \"eventNumber\",\n \"machineTime\",\n \"machineTimeNanoSeconds\",\n \"fiducial\",\n \"photon_energy_eV\",\n ]\n key: str\n for key in keys:\n if key == \"photon_energy_eV\":\n ds_x: Any = lcls_1.create_dataset(\n f\"{key}\", (n_events,), maxshape=(None,), dtype=float\n )\n else:\n ds_x = lcls_1.create_dataset(\n f\"{key}\", (n_events,), 
maxshape=(None,), dtype=int\n )\n ds_x.attrs[\"axes\"] = \"experiment_identifier\"\n\n ds_x = self._outh5.create_dataset(\n \"/LCLS/detector_1/EncoderValue\", (n_events,), maxshape=(None,), dtype=float\n )\n ds_x.attrs[\"axes\"] = \"experiment_identifier\"\n
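A minimal sketch (not LUTE code) of the Cheetah-style layout mentioned in the `det_shape` description: a 3D psana stack (panels, rows, cols) is flattened so that panels are stacked along the slow axis, which is also how `write_event` stores each image. The panel dimensions below are hypothetical (epix10k2M-like):

```python
import numpy

img3d = numpy.zeros((16, 352, 384), dtype=numpy.float32)  # (panels, rows, cols)
img2d = img3d.reshape(-1, img3d.shape[-1])                 # Cheetah-style 2D array
print(img2d.shape)  # (5632, 384)
```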
"},{"location":"source/tasks/sfx_find_peaks/#tasks.sfx_find_peaks.CxiWriter.optimize_and_close_file","title":"optimize_and_close_file(num_hits, max_peaks)
","text":"Resize data blocks and write additional information to the file
Parameters:
num_hits (int): Number of hits for which information has been saved to the\n file\n\nmax_peaks (int): Maximum number of peaks (per event) for which information\n can be written into the file\n
Source code in lute/tasks/sfx_find_peaks.py
def optimize_and_close_file(\n self,\n num_hits: int,\n max_peaks: int,\n):\n \"\"\"\n Resize data blocks and write additional information to the file\n\n Parameters:\n\n num_hits (int): Number of hits for which information has been saved to the\n file\n\n max_peaks (int): Maximum number of peaks (per event) for which information\n can be written into the file\n \"\"\"\n\n # Resize the entry_1 entry\n data_shape: Tuple[int, ...] = self._outh5[\"/entry_1/data_1/data\"].shape\n self._outh5[\"/entry_1/data_1/data\"].resize(\n (num_hits, data_shape[1], data_shape[2])\n )\n self._outh5[f\"/entry_1/result_1/nPeaks\"].resize((num_hits,))\n key: str\n for key in [\n \"peakXPosRaw\",\n \"peakYPosRaw\",\n \"rcent\",\n \"ccent\",\n \"rmin\",\n \"rmax\",\n \"cmin\",\n \"cmax\",\n \"peakTotalIntensity\",\n \"peakMaxIntensity\",\n \"peakRadius\",\n ]:\n self._outh5[f\"/entry_1/result_1/{key}\"].resize((num_hits, max_peaks))\n\n # Resize LCLS entry\n for key in [\n \"eventNumber\",\n \"machineTime\",\n \"machineTimeNanoSeconds\",\n \"fiducial\",\n \"detector_1/EncoderValue\",\n \"photon_energy_eV\",\n ]:\n self._outh5[f\"/LCLS/{key}\"].resize((num_hits,))\n self._outh5.close()\n
"},{"location":"source/tasks/sfx_find_peaks/#tasks.sfx_find_peaks.CxiWriter.write_event","title":"write_event(img, peaks, timestamp_seconds, timestamp_nanoseconds, timestamp_fiducials, photon_energy)
","text":"Write peak finding results for an event into the HDF5 file.
Parameters:
img (NDArray[numpy.float_]): Detector data for the event\n\npeaks: (Any): Peak information for the event, as recovered from the PyAlgos\n algorithm\n\ntimestamp_seconds (int): Second part of the event's timestamp information\n\ntimestamp_nanoseconds (int): Nanosecond part of the event's timestamp\n information\n\ntimestamp_fiducials (int): Fiducials part of the event's timestamp\n information\n\nphoton_energy (float): Photon energy for the event\n
Source code in lute/tasks/sfx_find_peaks.py
def write_event(\n self,\n img: NDArray[numpy.float_],\n peaks: Any, # Not typed becomes it comes from psana\n timestamp_seconds: int,\n timestamp_nanoseconds: int,\n timestamp_fiducials: int,\n photon_energy: float,\n):\n \"\"\"\n Write peak finding results for an event into the HDF5 file.\n\n Parameters:\n\n img (NDArray[numpy.float_]): Detector data for the event\n\n peaks: (Any): Peak information for the event, as recovered from the PyAlgos\n algorithm\n\n timestamp_seconds (int): Second part of the event's timestamp information\n\n timestamp_nanoseconds (int): Nanosecond part of the event's timestamp\n information\n\n timestamp_fiducials (int): Fiducials part of the event's timestamp\n information\n\n photon_energy (float): Photon energy for the event\n \"\"\"\n ch_rows: NDArray[numpy.float_] = peaks[:, 0] * self._det_shape[1] + peaks[:, 1]\n ch_cols: NDArray[numpy.float_] = peaks[:, 2]\n\n # Entry_1 entry for processing with CrystFEL\n self._outh5[\"/entry_1/data_1/data\"][self._index, :, :] = img.reshape(\n -1, img.shape[-1]\n )\n self._outh5[\"/entry_1/result_1/nPeaks\"][self._index] = peaks.shape[0]\n self._outh5[\"/entry_1/result_1/peakXPosRaw\"][self._index, : peaks.shape[0]] = (\n ch_cols.astype(\"int\")\n )\n self._outh5[\"/entry_1/result_1/peakYPosRaw\"][self._index, : peaks.shape[0]] = (\n ch_rows.astype(\"int\")\n )\n self._outh5[\"/entry_1/result_1/rcent\"][self._index, : peaks.shape[0]] = peaks[\n :, 6\n ]\n self._outh5[\"/entry_1/result_1/ccent\"][self._index, : peaks.shape[0]] = peaks[\n :, 7\n ]\n self._outh5[\"/entry_1/result_1/rmin\"][self._index, : peaks.shape[0]] = peaks[\n :, 10\n ]\n self._outh5[\"/entry_1/result_1/rmax\"][self._index, : peaks.shape[0]] = peaks[\n :, 11\n ]\n self._outh5[\"/entry_1/result_1/cmin\"][self._index, : peaks.shape[0]] = peaks[\n :, 12\n ]\n self._outh5[\"/entry_1/result_1/cmax\"][self._index, : peaks.shape[0]] = peaks[\n :, 13\n ]\n self._outh5[\"/entry_1/result_1/peakTotalIntensity\"][\n self._index, : peaks.shape[0]\n ] = peaks[:, 5]\n self._outh5[\"/entry_1/result_1/peakMaxIntensity\"][\n self._index, : peaks.shape[0]\n ] = peaks[:, 4]\n\n # Calculate and write pixel radius\n peaks_cenx: NDArray[numpy.float_] = (\n self._i_x[\n numpy.array(peaks[:, 0], dtype=numpy.int64),\n numpy.array(peaks[:, 1], dtype=numpy.int64),\n numpy.array(peaks[:, 2], dtype=numpy.int64),\n ]\n + 0.5\n - self._ipx\n )\n peaks_ceny: NDArray[numpy.float_] = (\n self._i_y[\n numpy.array(peaks[:, 0], dtype=numpy.int64),\n numpy.array(peaks[:, 1], dtype=numpy.int64),\n numpy.array(peaks[:, 2], dtype=numpy.int64),\n ]\n + 0.5\n - self._ipy\n )\n peak_radius: NDArray[numpy.float_] = numpy.sqrt(\n (peaks_cenx**2) + (peaks_ceny**2)\n )\n self._outh5[\"/entry_1/result_1/peakRadius\"][\n self._index, : peaks.shape[0]\n ] = peak_radius\n\n # LCLS entry dataset\n self._outh5[\"/LCLS/machineTime\"][self._index] = timestamp_seconds\n self._outh5[\"/LCLS/machineTimeNanoSeconds\"][self._index] = timestamp_nanoseconds\n self._outh5[\"/LCLS/fiducial\"][self._index] = timestamp_fiducials\n self._outh5[\"/LCLS/photon_energy_eV\"][self._index] = photon_energy\n\n self._index += 1\n
"},{"location":"source/tasks/sfx_find_peaks/#tasks.sfx_find_peaks.CxiWriter.write_non_event_data","title":"write_non_event_data(powder_hits, powder_misses, mask, clen)
","text":"Write to the file data that is not related to a specific event (masks, powders)
Parameters:
powder_hits (NDArray[numpy.float_]): Virtual powder pattern from hits\n\npowder_misses (NDArray[numpy.float_]): Virtual powder pattern from hits\n\nmask: (NDArray[numpy.uint16]): Pixel ask to write into the file\n
Source code in lute/tasks/sfx_find_peaks.py
def write_non_event_data(\n    self,\n    powder_hits: NDArray[numpy.float_],\n    powder_misses: NDArray[numpy.float_],\n    mask: NDArray[numpy.uint16],\n    clen: float,\n):\n    \"\"\"\n    Write to the file data that is not related to a specific event (masks, powders)\n\n    Parameters:\n\n    powder_hits (NDArray[numpy.float_]): Virtual powder pattern from hits\n\n    powder_misses (NDArray[numpy.float_]): Virtual powder pattern from misses\n\n    mask (NDArray[numpy.uint16]): Pixel mask to write into the file\n\n    \"\"\"\n    # Add powders and mask to files, reshaping them to match the crystfel\n    # convention\n    self._outh5[\"/entry_1/data_1/powderHits\"][:] = powder_hits.reshape(\n        -1, powder_hits.shape[-1]\n    )\n    self._outh5[\"/entry_1/data_1/powderMisses\"][:] = powder_misses.reshape(\n        -1, powder_misses.shape[-1]\n    )\n    self._outh5[\"/entry_1/data_1/mask\"][:] = (1 - mask).reshape(\n        -1, mask.shape[-1]\n    )  # Crystfel expects inverted values\n\n    # Add clen distance\n    self._outh5[\"/LCLS/detector_1/EncoderValue\"][:] = clen\n
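A minimal sketch (not LUTE code) of the mask inversion applied above: the psana-style mask (1 = good pixel) is written as `1 - mask` because, as the code notes, CrystFEL expects the inverted convention:

```python
import numpy

mask = numpy.array([[1, 1, 0], [1, 0, 1]], dtype=numpy.uint16)  # 1 = good pixel
print(1 - mask)  # previously-bad pixels are now marked with 1
```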
"},{"location":"source/tasks/sfx_find_peaks/#tasks.sfx_find_peaks.FindPeaksPyAlgos","title":"FindPeaksPyAlgos
","text":" Bases: Task
Task that performs peak finding using the PyAlgos peak finding algorithms and writes the peak information to CXI files.
Source code inlute/tasks/sfx_find_peaks.py
class FindPeaksPyAlgos(Task):\n \"\"\"\n Task that performs peak finding using the PyAlgos peak finding algorithms and\n writes the peak information to CXI files.\n \"\"\"\n\n def __init__(self, *, params: TaskParameters) -> None:\n super().__init__(params=params)\n\n def _run(self) -> None:\n ds: Any = MPIDataSource(\n f\"exp={self._task_parameters.lute_config.experiment}:\"\n f\"run={self._task_parameters.lute_config.run}:smd\"\n )\n if self._task_parameters.n_events != 0:\n ds.break_after(self._task_parameters.n_events)\n\n det: Any = Detector(self._task_parameters.det_name)\n det.do_reshape_2d_to_3d(flag=True)\n\n evr: Any = Detector(self._task_parameters.event_receiver)\n\n i_x: Any = det.indexes_x(self._task_parameters.lute_config.run).astype(\n numpy.int64\n )\n i_y: Any = det.indexes_y(self._task_parameters.lute_config.run).astype(\n numpy.int64\n )\n ipx: Any\n ipy: Any\n ipx, ipy = det.point_indexes(\n self._task_parameters.lute_config.run, pxy_um=(0, 0)\n )\n\n alg: Any = None\n num_hits: int = 0\n num_events: int = 0\n num_empty_images: int = 0\n tag: str = self._task_parameters.tag\n if (tag != \"\") and (tag[0] != \"_\"):\n tag = \"_\" + tag\n\n evt: Any\n for evt in ds.events():\n\n evt_id: Any = evt.get(EventId)\n timestamp_seconds: int = evt_id.time()[0]\n timestamp_nanoseconds: int = evt_id.time()[1]\n timestamp_fiducials: int = evt_id.fiducials()\n event_codes: Any = evr.eventCodes(evt)\n\n if isinstance(self._task_parameters.pv_camera_length, float):\n clen: float = self._task_parameters.pv_camera_length\n else:\n clen = (\n ds.env().epicsStore().value(self._task_parameters.pv_camera_length)\n )\n\n if self._task_parameters.event_logic:\n if not self._task_parameters.event_code in event_codes:\n continue\n\n img: Any = det.calib(evt)\n\n if img is None:\n num_empty_images += 1\n continue\n\n if alg is None:\n det_shape: Tuple[int, ...] 
= img.shape\n if len(det_shape) == 3:\n det_shape = (det_shape[0] * det_shape[1], det_shape[2])\n else:\n det_shape = img.shape\n\n mask: NDArray[numpy.uint16] = numpy.ones(det_shape).astype(numpy.uint16)\n\n if self._task_parameters.psana_mask:\n mask = det.mask(\n self.task_parameters.run,\n calib=False,\n status=True,\n edges=False,\n centra=False,\n unbond=False,\n unbondnbrs=False,\n ).astype(numpy.uint16)\n\n hdffh: Any\n if self._task_parameters.mask_file is not None:\n with h5py.File(self._task_parameters.mask_file, \"r\") as hdffh:\n loaded_mask: NDArray[numpy.int] = hdffh[\"entry_1/data_1/mask\"][\n :\n ]\n mask *= loaded_mask.astype(numpy.uint16)\n\n file_writer: CxiWriter = CxiWriter(\n outdir=self._task_parameters.outdir,\n rank=ds.rank,\n exp=self._task_parameters.lute_config.experiment,\n run=self._task_parameters.lute_config.run,\n n_events=self._task_parameters.n_events,\n det_shape=det_shape,\n i_x=i_x,\n i_y=i_y,\n ipx=ipx,\n ipy=ipy,\n min_peaks=self._task_parameters.min_peaks,\n max_peaks=self._task_parameters.max_peaks,\n tag=tag,\n )\n alg: Any = PyAlgos(mask=mask, pbits=0) # pbits controls verbosity\n alg.set_peak_selection_pars(\n npix_min=self._task_parameters.npix_min,\n npix_max=self._task_parameters.npix_max,\n amax_thr=self._task_parameters.amax_thr,\n atot_thr=self._task_parameters.atot_thr,\n son_min=self._task_parameters.son_min,\n )\n\n if self._task_parameters.compression is not None:\n\n libpressio_config = generate_libpressio_configuration(\n compressor=self._task_parameters.compression.compressor,\n roi_window_size=self._task_parameters.compression.roi_window_size,\n bin_size=self._task_parameters.compression.bin_size,\n abs_error=self._task_parameters.compression.abs_error,\n libpressio_mask=mask,\n )\n\n powder_hits: NDArray[numpy.float_] = numpy.zeros(det_shape)\n powder_misses: NDArray[numpy.float_] = numpy.zeros(det_shape)\n\n peaks: Any = alg.peak_finder_v3r3(\n img,\n rank=self._task_parameters.peak_rank,\n r0=self._task_parameters.r0,\n dr=self._task_parameters.dr,\n # nsigm=self._task_parameters.nsigm,\n )\n\n num_events += 1\n\n if (peaks.shape[0] >= self._task_parameters.min_peaks) and (\n peaks.shape[0] <= self._task_parameters.max_peaks\n ):\n\n if self._task_parameters.compression is not None:\n\n libpressio_config_with_peaks = (\n add_peaks_to_libpressio_configuration(libpressio_config, peaks)\n )\n compressor = PressioCompressor.from_config(\n libpressio_config_with_peaks\n )\n compressed_img = compressor.encode(img)\n decompressed_img = numpy.zeros_like(img)\n decompressed = compressor.decode(compressed_img, decompressed_img)\n img = decompressed_img\n\n try:\n photon_energy: float = (\n Detector(\"EBeam\").get(evt).ebeamPhotonEnergy()\n )\n except AttributeError:\n photon_energy = (\n 1.23984197386209e-06\n / ds.env().epicsStore().value(\"SIOC:SYS0:ML00:AO192\")\n / 1.0e9\n )\n\n file_writer.write_event(\n img=img,\n peaks=peaks,\n timestamp_seconds=timestamp_seconds,\n timestamp_nanoseconds=timestamp_nanoseconds,\n timestamp_fiducials=timestamp_fiducials,\n photon_energy=photon_energy,\n )\n num_hits += 1\n\n # TODO: Fix bug here\n # generate / update powders\n if peaks.shape[0] >= self._task_parameters.min_peaks:\n powder_hits = numpy.maximum(powder_hits, img)\n else:\n powder_misses = numpy.maximum(powder_misses, img)\n\n if num_empty_images != 0:\n msg: Message = Message(\n contents=f\"Rank {ds.rank} encountered {num_empty_images} empty images.\"\n )\n self._report_to_executor(msg)\n\n file_writer.write_non_event_data(\n 
powder_hits=powder_hits,\n powder_misses=powder_misses,\n mask=mask,\n clen=clen,\n )\n\n file_writer.optimize_and_close_file(\n num_hits=num_hits, max_peaks=self._task_parameters.max_peaks\n )\n\n COMM_WORLD.Barrier()\n\n num_hits_per_rank: List[int] = COMM_WORLD.gather(num_hits, root=0)\n num_hits_total: int = COMM_WORLD.reduce(num_hits, SUM)\n num_events_per_rank: List[int] = COMM_WORLD.gather(num_events, root=0)\n\n if ds.rank == 0:\n master_fname: Path = write_master_file(\n mpi_size=ds.size,\n outdir=self._task_parameters.outdir,\n exp=self._task_parameters.lute_config.experiment,\n run=self._task_parameters.lute_config.run,\n tag=tag,\n n_hits_per_rank=num_hits_per_rank,\n n_hits_total=num_hits_total,\n )\n\n # Write final summary file\n f: TextIO\n with open(\n Path(self._task_parameters.outdir) / f\"peakfinding{tag}.summary\", \"w\"\n ) as f:\n print(f\"Number of events processed: {num_events_per_rank[-1]}\", file=f)\n print(f\"Number of hits found: {num_hits_total}\", file=f)\n print(\n \"Fractional hit rate: \"\n f\"{(num_hits_total/num_events_per_rank[-1]):.2f}\",\n file=f,\n )\n print(f\"No. hits per rank: {num_hits_per_rank}\", file=f)\n\n with open(Path(self._task_parameters.out_file), \"w\") as f:\n print(f\"{master_fname}\", file=f)\n\n # Write out_file\n\n def _post_run(self) -> None:\n super()._post_run()\n self._result.task_status = TaskStatus.COMPLETED\n
"},{"location":"source/tasks/sfx_find_peaks/#tasks.sfx_find_peaks.add_peaks_to_libpressio_configuration","title":"add_peaks_to_libpressio_configuration(lp_json, peaks)
","text":"Add peak infromation to libpressio configuration
Parameters:
lp_json: Dictionary storing the configuration JSON structure for the libpressio\n library.\n\npeaks (Any): Peak information as returned by psana.\n
Returns:
lp_json: Updated configuration JSON structure for the libpressio library.\n
Source code in lute/tasks/sfx_find_peaks.py
def add_peaks_to_libpressio_configuration(lp_json, peaks) -> Dict[str, Any]:\n    \"\"\"\n    Add peak information to libpressio configuration\n\n    Parameters:\n\n    lp_json: Dictionary storing the configuration JSON structure for the libpressio\n        library.\n\n    peaks (Any): Peak information as returned by psana.\n\n    Returns:\n\n    lp_json: Updated configuration JSON structure for the libpressio library.\n    \"\"\"\n    lp_json[\"compressor_config\"][\"pressio\"][\"roibin\"][\"roibin:centers\"] = (\n        numpy.ascontiguousarray(numpy.uint64(peaks[:, [2, 1, 0]]))\n    )\n    return lp_json\n
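A minimal sketch (not LUTE code) of the array manipulation above: the first three peak columns are reversed in order, cast to `uint64`, and made contiguous before being handed to libpressio as ROI centers. The peak values are hypothetical:

```python
import numpy

peaks = numpy.array([[3.0, 10.0, 20.0], [1.0, 5.0, 7.0]])  # hypothetical peak rows
centers = numpy.ascontiguousarray(numpy.uint64(peaks[:, [2, 1, 0]]))
print(centers)
# [[20 10  3]
#  [ 7  5  1]]
```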
"},{"location":"source/tasks/sfx_find_peaks/#tasks.sfx_find_peaks.generate_libpressio_configuration","title":"generate_libpressio_configuration(compressor, roi_window_size, bin_size, abs_error, libpressio_mask)
","text":"Create the configuration JSON for the libpressio library
Parameters:
compressor (Literal[\"sz3\", \"qoz\"]): Compression algorithm to use\n    (\"qoz\" or \"sz3\").\n\nabs_error (float): Bound value for the absolute error.\n\nbin_size (int): Binning size.\n\nroi_window_size (int): Default size of the ROI window.\n\nlibpressio_mask (NDArray): Mask to be applied to the data.\n
Returns:
lp_json (Dict[str, Any]): Dictionary storing the JSON configuration structure\nfor the libpressio library\n
Source code in lute/tasks/sfx_find_peaks.py
def generate_libpressio_configuration(\n compressor: Literal[\"sz3\", \"qoz\"],\n roi_window_size: int,\n bin_size: int,\n abs_error: float,\n libpressio_mask,\n) -> Dict[str, Any]:\n \"\"\"\n Create the configuration JSON for the libpressio library\n\n Parameters:\n\n compressor (Literal[\"sz3\", \"qoz\"]): Compression algorithm to use\n (\"qoz\" or \"sz3\").\n\n abs_error (float): Bound value for the absolute error.\n\n bin_size (int): Bining Size.\n\n roi_window_size (int): Default size of the ROI window.\n\n libpressio_mask (NDArray): mask to be applied to the data.\n\n Returns:\n\n lp_json (Dict[str, Any]): Dictionary storing the JSON configuration structure\n for the libpressio library\n \"\"\"\n\n if compressor == \"qoz\":\n pressio_opts: Dict[str, Any] = {\n \"pressio:abs\": abs_error,\n \"qoz\": {\"qoz:stride\": 8},\n }\n elif compressor == \"sz3\":\n pressio_opts = {\"pressio:abs\": abs_error}\n\n lp_json = {\n \"compressor_id\": \"pressio\",\n \"early_config\": {\n \"pressio\": {\n \"pressio:compressor\": \"roibin\",\n \"roibin\": {\n \"roibin:metric\": \"composite\",\n \"roibin:background\": \"mask_binning\",\n \"roibin:roi\": \"fpzip\",\n \"background\": {\n \"binning:compressor\": \"pressio\",\n \"mask_binning:compressor\": \"pressio\",\n \"pressio\": {\"pressio:compressor\": compressor},\n },\n \"composite\": {\n \"composite:plugins\": [\n \"size\",\n \"time\",\n \"input_stats\",\n \"error_stat\",\n ]\n },\n },\n }\n },\n \"compressor_config\": {\n \"pressio\": {\n \"roibin\": {\n \"roibin:roi_size\": [roi_window_size, roi_window_size, 0],\n \"roibin:centers\": None, # \"roibin:roi_strategy\": \"coordinates\",\n \"roibin:nthreads\": 4,\n \"roi\": {\"fpzip:prec\": 0},\n \"background\": {\n \"mask_binning:mask\": None,\n \"mask_binning:shape\": [bin_size, bin_size, 1],\n \"mask_binning:nthreads\": 4,\n \"pressio\": pressio_opts,\n },\n }\n }\n },\n \"name\": \"pressio\",\n }\n\n lp_json[\"compressor_config\"][\"pressio\"][\"roibin\"][\"background\"][\n \"mask_binning:mask\"\n ] = (1 - libpressio_mask)\n\n return lp_json\n
"},{"location":"source/tasks/sfx_find_peaks/#tasks.sfx_find_peaks.write_master_file","title":"write_master_file(mpi_size, outdir, exp, run, tag, n_hits_per_rank, n_hits_total)
","text":"Generate a virtual dataset to map all individual files for this run.
Parameters:
mpi_size (int): Number of ranks in the MPI pool.\n\noutdir (str): Output directory for cxi file.\n\nexp (str): Experiment string.\n\nrun (int): Experimental run.\n\ntag (str): Tag to append to cxi file names.\n\nn_hits_per_rank (List[int]): Array containing the number of hits found on each\n node processing data.\n\nn_hits_total (int): Total number of hits found across all nodes.\n
Returns:
The path to the written master file\n
Source code in lute/tasks/sfx_find_peaks.py
def write_master_file(\n mpi_size: int,\n outdir: str,\n exp: str,\n run: int,\n tag: str,\n n_hits_per_rank: List[int],\n n_hits_total: int,\n) -> Path:\n \"\"\"\n Generate a virtual dataset to map all individual files for this run.\n\n Parameters:\n\n mpi_size (int): Number of ranks in the MPI pool.\n\n outdir (str): Output directory for cxi file.\n\n exp (str): Experiment string.\n\n run (int): Experimental run.\n\n tag (str): Tag to append to cxi file names.\n\n n_hits_per_rank (List[int]): Array containing the number of hits found on each\n node processing data.\n\n n_hits_total (int): Total number of hits found across all nodes.\n\n Returns:\n\n The path to the the written master file\n \"\"\"\n # Retrieve paths to the files containing data\n fnames: List[Path] = []\n fi: int\n for fi in range(mpi_size):\n if n_hits_per_rank[fi] > 0:\n fnames.append(Path(outdir) / f\"{exp}_r{run:0>4}_{fi}{tag}.cxi\")\n if len(fnames) == 0:\n sys.exit(\"No hits found\")\n\n # Retrieve list of entries to populate in the virtual hdf5 file\n dname_list, key_list, shape_list, dtype_list = [], [], [], []\n datasets = [\"/entry_1/result_1\", \"/LCLS/detector_1\", \"/LCLS\", \"/entry_1/data_1\"]\n f = h5py.File(fnames[0], \"r\")\n for dname in datasets:\n dset = f[dname]\n for key in dset.keys():\n if f\"{dname}/{key}\" not in datasets:\n dname_list.append(dname)\n key_list.append(key)\n shape_list.append(dset[key].shape)\n dtype_list.append(dset[key].dtype)\n f.close()\n\n # Compute cumulative powder hits and misses for all files\n powder_hits, powder_misses = None, None\n for fn in fnames:\n f = h5py.File(fn, \"r\")\n if powder_hits is None:\n powder_hits = f[\"entry_1/data_1/powderHits\"][:].copy()\n powder_misses = f[\"entry_1/data_1/powderMisses\"][:].copy()\n else:\n powder_hits = numpy.maximum(\n powder_hits, f[\"entry_1/data_1/powderHits\"][:].copy()\n )\n powder_misses = numpy.maximum(\n powder_misses, f[\"entry_1/data_1/powderMisses\"][:].copy()\n )\n f.close()\n\n vfname: Path = Path(outdir) / f\"{exp}_r{run:0>4}{tag}.cxi\"\n with h5py.File(vfname, \"w\") as vdf:\n\n # Write the virtual hdf5 file\n for dnum in range(len(dname_list)):\n dname = f\"{dname_list[dnum]}/{key_list[dnum]}\"\n if key_list[dnum] not in [\"mask\", \"powderHits\", \"powderMisses\"]:\n layout = h5py.VirtualLayout(\n shape=(n_hits_total,) + shape_list[dnum][1:], dtype=dtype_list[dnum]\n )\n cursor = 0\n for i, fn in enumerate(fnames):\n vsrc = h5py.VirtualSource(\n fn, dname, shape=(n_hits_per_rank[i],) + shape_list[dnum][1:]\n )\n if len(shape_list[dnum]) == 1:\n layout[cursor : cursor + n_hits_per_rank[i]] = vsrc\n else:\n layout[cursor : cursor + n_hits_per_rank[i], :] = vsrc\n cursor += n_hits_per_rank[i]\n vdf.create_virtual_dataset(dname, layout, fillvalue=-1)\n\n vdf[\"entry_1/data_1/powderHits\"] = powder_hits\n vdf[\"entry_1/data_1/powderMisses\"] = powder_misses\n\n return vfname\n
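The master file above relies on HDF5 virtual datasets to expose all per-rank files as one dataset without copying data. A minimal, self-contained sketch (not LUTE code) of that h5py mechanism, with hypothetical file names and hit counts:

```python
import h5py
import numpy

per_rank = [("rank0.cxi", 3), ("rank1.cxi", 2)]  # (file name, number of hits)
for name, n in per_rank:
    with h5py.File(name, "w") as f:
        f["/entry_1/result_1/nPeaks"] = numpy.arange(n, dtype="int64")

# Stitch the per-rank datasets into a single virtual dataset.
layout = h5py.VirtualLayout(shape=(sum(n for _, n in per_rank),), dtype="int64")
cursor = 0
for name, n in per_rank:
    source = h5py.VirtualSource(name, "/entry_1/result_1/nPeaks", shape=(n,))
    layout[cursor : cursor + n] = source
    cursor += n

with h5py.File("master.cxi", "w") as vdf:
    vdf.create_virtual_dataset("/entry_1/result_1/nPeaks", layout, fillvalue=-1)
```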
"},{"location":"source/tasks/sfx_index/","title":"sfx_index","text":"Classes for indexing tasks in SFX.
Classes:
Name DescriptionConcatenateStreamFiles
Task that merges multiple stream files into a single file.
"},{"location":"source/tasks/sfx_index/#tasks.sfx_index.ConcatenateStreamFiles","title":"ConcatenateStreamFiles
","text":" Bases: Task
Task that merges stream files located within a directory tree.
Source code inlute/tasks/sfx_index.py
class ConcatenateStreamFiles(Task):\n \"\"\"\n Task that merges stream files located within a directory tree.\n \"\"\"\n\n def __init__(self, *, params: TaskParameters) -> None:\n super().__init__(params=params)\n\n def _run(self) -> None:\n\n stream_file_path: Path = Path(self._task_parameters.in_file)\n stream_file_list: List[Path] = list(\n stream_file_path.rglob(f\"{self._task_parameters.tag}_*.stream\")\n )\n\n processed_file_list = [str(stream_file) for stream_file in stream_file_list]\n\n msg: Message = Message(\n contents=f\"Merging following stream files: {processed_file_list} into \"\n f\"{self._task_parameters.out_file}\",\n )\n self._report_to_executor(msg)\n\n wfd: BinaryIO\n with open(self._task_parameters.out_file, \"wb\") as wfd:\n infile: Path\n for infile in stream_file_list:\n fd: BinaryIO\n with open(infile, \"rb\") as fd:\n shutil.copyfileobj(fd, wfd)\n
"},{"location":"source/tasks/task/","title":"task","text":"Base classes for implementing analysis tasks.
Classes:
Name DescriptionTask
Abstract base class from which all analysis tasks are derived.
ThirdPartyTask
Class to run a third-party executable binary as a Task
.
DescribedAnalysis
dataclass
","text":"Complete analysis description. Held by an Executor.
Source code inlute/tasks/dataclasses.py
@dataclass\nclass DescribedAnalysis:\n \"\"\"Complete analysis description. Held by an Executor.\"\"\"\n\n task_result: TaskResult\n task_parameters: Optional[TaskParameters]\n task_env: Dict[str, str]\n poll_interval: float\n communicator_desc: List[str]\n
"},{"location":"source/tasks/task/#tasks.task.ElogSummaryPlots","title":"ElogSummaryPlots
dataclass
","text":"Holds a graphical summary intended for display in the eLog.
Attributes:
Name Type Descriptiondisplay_name
str
This represents both a path and how the result will be displayed in the eLog. Can include \"/\" characters. E.g. display_name = \"scans/my_motor_scan\"
will have plots shown on a \"my_motor_scan\" page, under a \"scans\" tab. This format mirrors how the file is stored on disk as well.
lute/tasks/dataclasses.py
@dataclass\nclass ElogSummaryPlots:\n \"\"\"Holds a graphical summary intended for display in the eLog.\n\n Attributes:\n display_name (str): This represents both a path and how the result will be\n displayed in the eLog. Can include \"/\" characters. E.g.\n `display_name = \"scans/my_motor_scan\"` will have plots shown\n on a \"my_motor_scan\" page, under a \"scans\" tab. This format mirrors\n how the file is stored on disk as well.\n \"\"\"\n\n display_name: str\n figures: Union[pn.Tabs, hv.Image, plt.Figure]\n
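As a usage sketch (not taken from the LUTE codebase; the data below is random and purely illustrative), a Task could wrap a holoviews image for the eLog like this:
import holoviews as hv\nimport numpy as np\n\nfrom lute.tasks.dataclasses import ElogSummaryPlots\n\n# Shown on a \"my_motor_scan\" page under a \"scans\" tab in the eLog.\nimage: hv.Image = hv.Image(np.random.rand(64, 64))\nplots: ElogSummaryPlots = ElogSummaryPlots(\n    display_name=\"scans/my_motor_scan\", figures=image\n)\n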
"},{"location":"source/tasks/task/#tasks.task.Task","title":"Task
","text":" Bases: ABC
Abstract base class for analysis tasks.
Attributes:
Name Type Descriptionname
str
The name of the Task.
Source code inlute/tasks/task.py
class Task(ABC):\n \"\"\"Abstract base class for analysis tasks.\n\n Attributes:\n name (str): The name of the Task.\n \"\"\"\n\n def __init__(self, *, params: TaskParameters, use_mpi: bool = False) -> None:\n \"\"\"Initialize a Task.\n\n Args:\n params (TaskParameters): Parameters needed to properly configure\n the analysis task. These are NOT related to execution parameters\n (number of cores, etc), except, potentially, in case of binary\n executable sub-classes.\n\n use_mpi (bool): Whether this Task requires the use of MPI.\n This determines the behaviour and timing of certain signals\n and ensures appropriate barriers are placed to not end\n processing until all ranks have finished.\n \"\"\"\n self.name: str = str(type(self)).split(\"'\")[1].split(\".\")[-1]\n self._result: TaskResult = TaskResult(\n task_name=self.name,\n task_status=TaskStatus.PENDING,\n summary=\"PENDING\",\n payload=\"\",\n )\n self._task_parameters: TaskParameters = params\n timeout: int = self._task_parameters.lute_config.task_timeout\n signal.setitimer(signal.ITIMER_REAL, timeout)\n\n run_directory: Optional[str] = self._task_parameters.Config.run_directory\n if run_directory is not None:\n try:\n os.chdir(run_directory)\n except FileNotFoundError:\n warnings.warn(\n (\n f\"Attempt to change to {run_directory}, but it is not found!\\n\"\n f\"Will attempt to run from {os.getcwd()}. It may fail!\"\n ),\n category=UserWarning,\n )\n self._use_mpi: bool = use_mpi\n\n def run(self) -> None:\n \"\"\"Calls the analysis routines and any pre/post task functions.\n\n This method is part of the public API and should not need to be modified\n in any subclasses.\n \"\"\"\n self._signal_start()\n self._pre_run()\n self._run()\n self._post_run()\n self._signal_result()\n\n @abstractmethod\n def _run(self) -> None:\n \"\"\"Actual analysis to run. 
Overridden by subclasses.\n\n Separating the calling API from the implementation allows `run` to\n have pre and post task functionality embedded easily into a single\n function call.\n \"\"\"\n ...\n\n def _pre_run(self) -> None:\n \"\"\"Code to run BEFORE the main analysis takes place.\n\n This function may, or may not, be employed by subclasses.\n \"\"\"\n ...\n\n def _post_run(self) -> None:\n \"\"\"Code to run AFTER the main analysis takes place.\n\n This function may, or may not, be employed by subclasses.\n \"\"\"\n ...\n\n @property\n def result(self) -> TaskResult:\n \"\"\"TaskResult: Read-only Task Result information.\"\"\"\n return self._result\n\n def __call__(self) -> None:\n self.run()\n\n def _signal_start(self) -> None:\n \"\"\"Send the signal that the Task will begin shortly.\"\"\"\n start_msg: Message = Message(\n contents=self._task_parameters, signal=\"TASK_STARTED\"\n )\n self._result.task_status = TaskStatus.RUNNING\n if self._use_mpi:\n from mpi4py import MPI\n\n comm: MPI.Intracomm = MPI.COMM_WORLD\n rank: int = comm.Get_rank()\n comm.Barrier()\n if rank == 0:\n self._report_to_executor(start_msg)\n else:\n self._report_to_executor(start_msg)\n\n def _signal_result(self) -> None:\n \"\"\"Send the signal that results are ready along with the results.\"\"\"\n signal: str = \"TASK_RESULT\"\n results_msg: Message = Message(contents=self.result, signal=signal)\n if self._use_mpi:\n from mpi4py import MPI\n\n comm: MPI.Intracomm = MPI.COMM_WORLD\n rank: int = comm.Get_rank()\n comm.Barrier()\n if rank == 0:\n self._report_to_executor(results_msg)\n else:\n self._report_to_executor(results_msg)\n time.sleep(0.1)\n\n def _report_to_executor(self, msg: Message) -> None:\n \"\"\"Send a message to the Executor.\n\n Details of `Communicator` choice are hidden from the caller. This\n method may be overriden by subclasses with specialized functionality.\n\n Args:\n msg (Message): The message object to send.\n \"\"\"\n communicator: Communicator\n if isinstance(msg.contents, str) or msg.contents is None:\n communicator = PipeCommunicator()\n else:\n communicator = SocketCommunicator()\n\n communicator.delayed_setup()\n communicator.write(msg)\n communicator.clear_communicator()\n\n def clean_up_timeout(self) -> None:\n \"\"\"Perform any necessary cleanup actions before exit if timing out.\"\"\"\n ...\n
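A minimal sketch of a first-party subclass is shown below (the class name is hypothetical, and the import location of TaskParameters is an assumption based on the module layout described in this documentation):
from lute.io.models.base import TaskParameters  # Assumed import location\nfrom lute.tasks.dataclasses import TaskStatus\nfrom lute.tasks.task import Task\n\n\nclass MyAnalysis(Task):  # Hypothetical Task name\n    \"\"\"Toy Task illustrating the _run/_post_run split.\"\"\"\n\n    def __init__(self, *, params: TaskParameters) -> None:\n        super().__init__(params=params)\n\n    def _run(self) -> None:\n        # The actual analysis goes here.\n        self._result.payload = sum(range(10))\n\n    def _post_run(self) -> None:\n        self._result.summary = \"Summed the integers 0-9.\"\n        self._result.task_status = TaskStatus.COMPLETED\n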
"},{"location":"source/tasks/task/#tasks.task.Task.result","title":"result: TaskResult
property
","text":"TaskResult: Read-only Task Result information.
"},{"location":"source/tasks/task/#tasks.task.Task.__init__","title":"__init__(*, params, use_mpi=False)
","text":"Initialize a Task.
Parameters:
Name Type Description Defaultparams
TaskParameters
Parameters needed to properly configure the analysis task. These are NOT related to execution parameters (number of cores, etc), except, potentially, in case of binary executable sub-classes.
requireduse_mpi
bool
Whether this Task requires the use of MPI. This determines the behaviour and timing of certain signals and ensures appropriate barriers are placed to not end processing until all ranks have finished.
False
Source code in lute/tasks/task.py
def __init__(self, *, params: TaskParameters, use_mpi: bool = False) -> None:\n \"\"\"Initialize a Task.\n\n Args:\n params (TaskParameters): Parameters needed to properly configure\n the analysis task. These are NOT related to execution parameters\n (number of cores, etc), except, potentially, in case of binary\n executable sub-classes.\n\n use_mpi (bool): Whether this Task requires the use of MPI.\n This determines the behaviour and timing of certain signals\n and ensures appropriate barriers are placed to not end\n processing until all ranks have finished.\n \"\"\"\n self.name: str = str(type(self)).split(\"'\")[1].split(\".\")[-1]\n self._result: TaskResult = TaskResult(\n task_name=self.name,\n task_status=TaskStatus.PENDING,\n summary=\"PENDING\",\n payload=\"\",\n )\n self._task_parameters: TaskParameters = params\n timeout: int = self._task_parameters.lute_config.task_timeout\n signal.setitimer(signal.ITIMER_REAL, timeout)\n\n run_directory: Optional[str] = self._task_parameters.Config.run_directory\n if run_directory is not None:\n try:\n os.chdir(run_directory)\n except FileNotFoundError:\n warnings.warn(\n (\n f\"Attempt to change to {run_directory}, but it is not found!\\n\"\n f\"Will attempt to run from {os.getcwd()}. It may fail!\"\n ),\n category=UserWarning,\n )\n self._use_mpi: bool = use_mpi\n
"},{"location":"source/tasks/task/#tasks.task.Task.clean_up_timeout","title":"clean_up_timeout()
","text":"Perform any necessary cleanup actions before exit if timing out.
Source code inlute/tasks/task.py
def clean_up_timeout(self) -> None:\n \"\"\"Perform any necessary cleanup actions before exit if timing out.\"\"\"\n ...\n
"},{"location":"source/tasks/task/#tasks.task.Task.run","title":"run()
","text":"Calls the analysis routines and any pre/post task functions.
This method is part of the public API and should not need to be modified in any subclasses.
Source code inlute/tasks/task.py
def run(self) -> None:\n \"\"\"Calls the analysis routines and any pre/post task functions.\n\n This method is part of the public API and should not need to be modified\n in any subclasses.\n \"\"\"\n self._signal_start()\n self._pre_run()\n self._run()\n self._post_run()\n self._signal_result()\n
"},{"location":"source/tasks/task/#tasks.task.TaskResult","title":"TaskResult
dataclass
","text":"Class for storing the result of a Task's execution with metadata.
Attributes:
Name Type Descriptiontask_name
str
Name of the associated task which produced it.
task_status
TaskStatus
Status of associated task.
summary
str
Short message/summary associated with the result.
payload
Any
Actual result. May be data in any format.
impl_schemas
Optional[str]
A string listing Task
schemas implemented by the associated Task
. Schemas define the category and expected output of the Task
. An individual task may implement/conform to multiple schemas. Multiple schemas are separated by ';', e.g. * impl_schemas = \"schema1;schema2\"
lute/tasks/dataclasses.py
@dataclass\nclass TaskResult:\n \"\"\"Class for storing the result of a Task's execution with metadata.\n\n Attributes:\n task_name (str): Name of the associated task which produced it.\n\n task_status (TaskStatus): Status of associated task.\n\n summary (str): Short message/summary associated with the result.\n\n payload (Any): Actual result. May be data in any format.\n\n impl_schemas (Optional[str]): A string listing `Task` schemas implemented\n by the associated `Task`. Schemas define the category and expected\n output of the `Task`. An individual task may implement/conform to\n multiple schemas. Multiple schemas are separated by ';', e.g.\n * impl_schemas = \"schema1;schema2\"\n \"\"\"\n\n task_name: str\n task_status: TaskStatus\n summary: str\n payload: Any\n impl_schemas: Optional[str] = None\n
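For illustration, a TaskResult could be constructed by hand as follows (all values are purely illustrative):
from lute.tasks.dataclasses import TaskResult, TaskStatus\n\nresult: TaskResult = TaskResult(\n    task_name=\"RunTask\",\n    task_status=TaskStatus.COMPLETED,\n    summary=\"Wrote one output file.\",\n    payload=\"/path/to/output.out\",\n    impl_schemas=\"schema1;schema2\",  # Multiple schemas separated by ';'\n)\n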
"},{"location":"source/tasks/task/#tasks.task.TaskStatus","title":"TaskStatus
","text":" Bases: Enum
Possible Task statuses.
Source code inlute/tasks/dataclasses.py
class TaskStatus(Enum):\n \"\"\"Possible Task statuses.\"\"\"\n\n PENDING = 0\n \"\"\"\n Task has yet to run. Is Queued, or waiting for prior tasks.\n \"\"\"\n RUNNING = 1\n \"\"\"\n Task is in the process of execution.\n \"\"\"\n COMPLETED = 2\n \"\"\"\n Task has completed without fatal errors.\n \"\"\"\n FAILED = 3\n \"\"\"\n Task encountered a fatal error.\n \"\"\"\n STOPPED = 4\n \"\"\"\n Task was, potentially temporarily, stopped/suspended.\n \"\"\"\n CANCELLED = 5\n \"\"\"\n Task was cancelled prior to completion or failure.\n \"\"\"\n TIMEDOUT = 6\n \"\"\"\n Task did not reach completion due to timeout.\n \"\"\"\n
"},{"location":"source/tasks/task/#tasks.task.TaskStatus.CANCELLED","title":"CANCELLED = 5
class-attribute
instance-attribute
","text":"Task was cancelled prior to completion or failure.
"},{"location":"source/tasks/task/#tasks.task.TaskStatus.COMPLETED","title":"COMPLETED = 2
class-attribute
instance-attribute
","text":"Task has completed without fatal errors.
"},{"location":"source/tasks/task/#tasks.task.TaskStatus.FAILED","title":"FAILED = 3
class-attribute
instance-attribute
","text":"Task encountered a fatal error.
"},{"location":"source/tasks/task/#tasks.task.TaskStatus.PENDING","title":"PENDING = 0
class-attribute
instance-attribute
","text":"Task has yet to run. Is Queued, or waiting for prior tasks.
"},{"location":"source/tasks/task/#tasks.task.TaskStatus.RUNNING","title":"RUNNING = 1
class-attribute
instance-attribute
","text":"Task is in the process of execution.
"},{"location":"source/tasks/task/#tasks.task.TaskStatus.STOPPED","title":"STOPPED = 4
class-attribute
instance-attribute
","text":"Task was, potentially temporarily, stopped/suspended.
"},{"location":"source/tasks/task/#tasks.task.TaskStatus.TIMEDOUT","title":"TIMEDOUT = 6
class-attribute
instance-attribute
","text":"Task did not reach completion due to timeout.
"},{"location":"source/tasks/task/#tasks.task.ThirdPartyTask","title":"ThirdPartyTask
","text":" Bases: Task
A Task
interface to analysis with binary executables.
lute/tasks/task.py
class ThirdPartyTask(Task):\n \"\"\"A `Task` interface to analysis with binary executables.\"\"\"\n\n def __init__(self, *, params: TaskParameters) -> None:\n \"\"\"Initialize a Task.\n\n Args:\n params (TaskParameters): Parameters needed to properly configure\n the analysis task. `Task`s of this type MUST include the name\n of a binary to run and any arguments which should be passed to\n it (as would be done via command line). The binary is included\n with the parameter `executable`. All other parameter names are\n assumed to be the long/extended names of the flag passed on the\n command line by default:\n * `arg_name = 3` is converted to `--arg_name 3`\n Positional arguments can be included with `p_argN` where `N` is\n any integer:\n * `p_arg1 = 3` is converted to `3`\n\n Note that it is NOT recommended to rely on this default behaviour\n as command-line arguments can be passed in many ways. Refer to\n the dcoumentation at\n https://slac-lcls.github.io/lute/tutorial/new_task/\n under \"Speciyfing a TaskParameters Model for your Task\" for more\n information on how to control parameter parsing from within your\n TaskParameters model definition.\n \"\"\"\n super().__init__(params=params)\n self._cmd = self._task_parameters.executable\n self._args_list: List[str] = [self._cmd]\n self._template_context: Dict[str, Any] = {}\n\n def _add_to_jinja_context(self, param_name: str, value: Any) -> None:\n \"\"\"Store a parameter as a Jinja template variable.\n\n Variables are stored in a dictionary which is used to fill in a\n premade Jinja template for a third party configuration file.\n\n Args:\n param_name (str): Name to store the variable as. This should be\n the name defined in the corresponding pydantic model. This name\n MUST match the name used in the Jinja Template!\n value (Any): The value to store. If possible, large chunks of the\n template should be represented as a single dictionary for\n simplicity; however, any type can be stored as needed.\n \"\"\"\n context_update: Dict[str, Any] = {param_name: value}\n if __debug__:\n msg: Message = Message(contents=f\"TemplateParameters: {context_update}\")\n self._report_to_executor(msg)\n self._template_context.update(context_update)\n\n def _template_to_config_file(self) -> None:\n \"\"\"Convert a template file into a valid configuration file.\n\n Uses Jinja to fill in a provided template file with variables supplied\n through the LUTE config file. This facilitates parameter modification\n for third party tasks which use a separate configuration, in addition\n to, or instead of, command-line arguments.\n \"\"\"\n from jinja2 import Environment, FileSystemLoader, Template\n\n out_file: str = self._task_parameters.lute_template_cfg.output_path\n template_name: str = self._task_parameters.lute_template_cfg.template_name\n\n lute_path: Optional[str] = os.getenv(\"LUTE_PATH\")\n template_dir: str\n if lute_path is None:\n warnings.warn(\n \"LUTE_PATH is None in Task process! 
Using relative path for templates!\",\n category=UserWarning,\n )\n template_dir: str = \"../../config/templates\"\n else:\n template_dir = f\"{lute_path}/config/templates\"\n environment: Environment = Environment(loader=FileSystemLoader(template_dir))\n template: Template = environment.get_template(template_name)\n\n with open(out_file, \"w\", encoding=\"utf-8\") as cfg_out:\n cfg_out.write(template.render(self._template_context))\n\n def _pre_run(self) -> None:\n \"\"\"Parse the parameters into an appropriate argument list.\n\n Arguments are identified by a `flag_type` attribute, defined in the\n pydantic model, which indicates how to pass the parameter and its\n argument on the command-line. This method parses flag:value pairs\n into an appropriate list to be used to call the executable.\n\n Note:\n ThirdPartyParameter objects are returned by custom model validators.\n Objects of this type are assumed to be used for a templated config\n file used by the third party executable for configuration. The parsing\n of these parameters is performed separately by a template file used as\n an input to Jinja. This method solely identifies the necessary objects\n and passes them all along. Refer to the template files and pydantic\n models for more information on how these parameters are defined and\n identified.\n \"\"\"\n super()._pre_run()\n full_schema: Dict[str, Union[str, Dict[str, Any]]] = (\n self._task_parameters.schema()\n )\n short_flags_use_eq: bool\n long_flags_use_eq: bool\n if hasattr(self._task_parameters.Config, \"short_flags_use_eq\"):\n short_flags_use_eq: bool = self._task_parameters.Config.short_flags_use_eq\n long_flags_use_eq: bool = self._task_parameters.Config.long_flags_use_eq\n else:\n short_flags_use_eq = False\n long_flags_use_eq = False\n for param, value in self._task_parameters.dict().items():\n # Clunky test with __dict__[param] because compound model-types are\n # converted to `dict`. E.g. type(value) = dict not AnalysisHeader\n if (\n param == \"executable\"\n or value is None # Cannot have empty values in argument list for execvp\n or value == \"\" # But do want to include, e.g. 0\n or isinstance(self._task_parameters.__dict__[param], TemplateConfig)\n or isinstance(self._task_parameters.__dict__[param], AnalysisHeader)\n ):\n continue\n if isinstance(self._task_parameters.__dict__[param], TemplateParameters):\n # TemplateParameters objects have a single parameter `params`\n self._add_to_jinja_context(param_name=param, value=value.params)\n continue\n\n param_attributes: Dict[str, Any] = full_schema[\"properties\"][param]\n # Some model params do not match the commnad-line parameter names\n param_repr: str\n if \"rename_param\" in param_attributes:\n param_repr = param_attributes[\"rename_param\"]\n else:\n param_repr = param\n if \"flag_type\" in param_attributes:\n flag: str = param_attributes[\"flag_type\"]\n if flag:\n # \"-\" or \"--\" flags\n if flag == \"--\" and isinstance(value, bool) and not value:\n continue\n constructed_flag: str = f\"{flag}{param_repr}\"\n if flag == \"--\" and isinstance(value, bool) and value:\n # On/off flag, e.g. something like --verbose: No Arg\n self._args_list.append(f\"{constructed_flag}\")\n continue\n if (flag == \"-\" and short_flags_use_eq) or (\n flag == \"--\" and long_flags_use_eq\n ): # Must come after above check! 
Otherwise you get --param=True\n # Flags following --param=value or -param=value\n constructed_flag = f\"{constructed_flag}={value}\"\n self._args_list.append(f\"{constructed_flag}\")\n continue\n self._args_list.append(f\"{constructed_flag}\")\n else:\n warnings.warn(\n (\n f\"Model parameters should be defined using Field(...,flag_type='')\"\n f\" in the future. Parameter: {param}\"\n ),\n category=PendingDeprecationWarning,\n )\n if len(param) == 1: # Single-dash flags\n if short_flags_use_eq:\n self._args_list.append(f\"-{param_repr}={value}\")\n continue\n self._args_list.append(f\"-{param_repr}\")\n elif \"p_arg\" in param: # Positional arguments\n pass\n else: # Double-dash flags\n if isinstance(value, bool) and not value:\n continue\n if long_flags_use_eq:\n self._args_list.append(f\"--{param_repr}={value}\")\n continue\n self._args_list.append(f\"--{param_repr}\")\n if isinstance(value, bool) and value:\n continue\n if isinstance(value, str) and \" \" in value:\n for val in value.split():\n self._args_list.append(f\"{val}\")\n else:\n self._args_list.append(f\"{value}\")\n if (\n hasattr(self._task_parameters, \"lute_template_cfg\")\n and self._template_context\n ):\n self._template_to_config_file()\n\n def _run(self) -> None:\n \"\"\"Execute the new program by replacing the current process.\"\"\"\n if __debug__:\n time.sleep(0.1)\n msg: Message = Message(contents=self._formatted_command())\n self._report_to_executor(msg)\n LUTE_DEBUG_EXIT(\"LUTE_DEBUG_BEFORE_TPP_EXEC\")\n os.execvp(file=self._cmd, args=self._args_list)\n\n def _formatted_command(self) -> str:\n \"\"\"Returns the command as it would passed on the command-line.\"\"\"\n formatted_cmd: str = \"\".join(f\"{arg} \" for arg in self._args_list)\n return formatted_cmd\n\n def _signal_start(self) -> None:\n \"\"\"Override start signal method to switch communication methods.\"\"\"\n super()._signal_start()\n time.sleep(0.05)\n signal: str = \"NO_PICKLE_MODE\"\n msg: Message = Message(signal=signal)\n self._report_to_executor(msg)\n
"},{"location":"source/tasks/task/#tasks.task.ThirdPartyTask.__init__","title":"__init__(*, params)
","text":"Initialize a Task.
Parameters:
Name Type Description Defaultparams
TaskParameters
Parameters needed to properly configure the analysis task. Task
s of this type MUST include the name of a binary to run and any arguments which should be passed to it (as would be done via command line). The binary is included with the parameter executable
. All other parameter names are assumed to be the long/extended names of the flag passed on the command line by default: * arg_name = 3
is converted to --arg_name 3
Positional arguments can be included with p_argN
where N
is any integer: * p_arg1 = 3
is converted to 3
Note that it is NOT recommended to rely on this default behaviour as command-line arguments can be passed in many ways. Refer to the documentation at https://slac-lcls.github.io/lute/tutorial/new_task/ under \"Specifying a TaskParameters Model for your Task\" for more information on how to control parameter parsing from within your TaskParameters model definition.
required Source code inlute/tasks/task.py
def __init__(self, *, params: TaskParameters) -> None:\n \"\"\"Initialize a Task.\n\n Args:\n params (TaskParameters): Parameters needed to properly configure\n the analysis task. `Task`s of this type MUST include the name\n of a binary to run and any arguments which should be passed to\n it (as would be done via command line). The binary is included\n with the parameter `executable`. All other parameter names are\n assumed to be the long/extended names of the flag passed on the\n command line by default:\n * `arg_name = 3` is converted to `--arg_name 3`\n Positional arguments can be included with `p_argN` where `N` is\n any integer:\n * `p_arg1 = 3` is converted to `3`\n\n Note that it is NOT recommended to rely on this default behaviour\n as command-line arguments can be passed in many ways. Refer to\n the dcoumentation at\n https://slac-lcls.github.io/lute/tutorial/new_task/\n under \"Speciyfing a TaskParameters Model for your Task\" for more\n information on how to control parameter parsing from within your\n TaskParameters model definition.\n \"\"\"\n super().__init__(params=params)\n self._cmd = self._task_parameters.executable\n self._args_list: List[str] = [self._cmd]\n self._template_context: Dict[str, Any] = {}\n
"},{"location":"source/tasks/test/","title":"test","text":"Basic test Tasks for testing functionality.
Classes:
Name DescriptionTest
Simplest test Task - runs a 10 iteration loop and returns a result.
TestSocket
Test Task which sends larger data to test socket IPC.
TestWriteOutput
Test Task which writes an output file.
TestReadOutput
Test Task which reads in a file. Can be used to test database access.
"},{"location":"source/tasks/test/#tasks.test.Test","title":"Test
","text":" Bases: Task
Simple test Task to ensure subprocess and pipe-based IPC work.
Source code inlute/tasks/test.py
class Test(Task):\n \"\"\"Simple test Task to ensure subprocess and pipe-based IPC work.\"\"\"\n\n def __init__(self, *, params: TaskParameters) -> None:\n super().__init__(params=params)\n\n def _run(self) -> None:\n for i in range(10):\n time.sleep(1)\n msg: Message = Message(contents=f\"Test message {i}\")\n self._report_to_executor(msg)\n if self._task_parameters.throw_error:\n raise RuntimeError(\"Testing Error!\")\n\n def _post_run(self) -> None:\n self._result.summary = \"Test Finished.\"\n self._result.task_status = TaskStatus.COMPLETED\n time.sleep(0.1)\n
"},{"location":"source/tasks/test/#tasks.test.TestReadOutput","title":"TestReadOutput
","text":" Bases: Task
Simple test Task to read in output from the test Task above.
Its pydantic model relies on a database access to retrieve the output file.
Source code inlute/tasks/test.py
class TestReadOutput(Task):\n \"\"\"Simple test Task to read in output from the test Task above.\n\n Its pydantic model relies on a database access to retrieve the output file.\n \"\"\"\n\n def __init__(self, *, params: TaskParameters) -> None:\n super().__init__(params=params)\n\n def _run(self) -> None:\n array: np.ndarray = np.loadtxt(self._task_parameters.in_file, delimiter=\",\")\n self._report_to_executor(msg=Message(contents=\"Successfully loaded data!\"))\n for i in range(5):\n time.sleep(1)\n\n def _post_run(self) -> None:\n super()._post_run()\n self._result.summary = \"Was able to load data.\"\n self._result.payload = \"This Task produces no output.\"\n self._result.task_status = TaskStatus.COMPLETED\n
"},{"location":"source/tasks/test/#tasks.test.TestSocket","title":"TestSocket
","text":" Bases: Task
Simple test Task to ensure basic IPC over Unix sockets works.
Source code inlute/tasks/test.py
class TestSocket(Task):\n \"\"\"Simple test Task to ensure basic IPC over Unix sockets works.\"\"\"\n\n def __init__(self, *, params: TaskParameters) -> None:\n super().__init__(params=params)\n\n def _run(self) -> None:\n for i in range(self._task_parameters.num_arrays):\n msg: Message = Message(contents=f\"Sending array {i}\")\n self._report_to_executor(msg)\n time.sleep(0.05)\n msg: Message = Message(\n contents=np.random.rand(self._task_parameters.array_size)\n )\n self._report_to_executor(msg)\n\n def _post_run(self) -> None:\n super()._post_run()\n self._result.summary = f\"Sent {self._task_parameters.num_arrays} arrays\"\n self._result.payload = np.random.rand(self._task_parameters.array_size)\n self._result.task_status = TaskStatus.COMPLETED\n
"},{"location":"source/tasks/test/#tasks.test.TestWriteOutput","title":"TestWriteOutput
","text":" Bases: Task
Simple test Task to write output other Tasks depend on.
Source code inlute/tasks/test.py
class TestWriteOutput(Task):\n \"\"\"Simple test Task to write output other Tasks depend on.\"\"\"\n\n def __init__(self, *, params: TaskParameters) -> None:\n super().__init__(params=params)\n\n def _run(self) -> None:\n for i in range(self._task_parameters.num_vals):\n # Doing some calculations...\n time.sleep(0.05)\n if i % 10 == 0:\n msg: Message = Message(contents=f\"Processed {i+1} values!\")\n self._report_to_executor(msg)\n\n def _post_run(self) -> None:\n super()._post_run()\n work_dir: str = self._task_parameters.lute_config.work_dir\n out_file: str = f\"{work_dir}/{self._task_parameters.outfile_name}\"\n array: np.ndarray = np.random.rand(self._task_parameters.num_vals)\n np.savetxt(out_file, array, delimiter=\",\")\n self._result.summary = \"Completed task successfully.\"\n self._result.payload = out_file\n self._result.task_status = TaskStatus.COMPLETED\n
"},{"location":"tutorial/creating_workflows/","title":"Workflows with Airflow","text":"Note: Airflow uses the term DAG, or directed acyclic graph, to describe workflows of tasks with defined (and acyclic) connectivities. This page will use the terms workflow and DAG interchangeably.
"},{"location":"tutorial/creating_workflows/#relevant-components","title":"Relevant Components","text":"In addition to the core LUTE package, a number of components are generally involved to run a workflow. The current set of scripts and objects are used to interface with Airflow, and the SLURM job scheduler. The core LUTE library can also be used to run workflows using different backends, and in the future these may be supported.
For building and running workflows using SLURM and Airflow, the following components are necessary, and will be described in more detail below: - Airflow launch script: launch_airflow.py
- This has a wrapper batch submission script: submit_launch_airflow.sh
. When running using the ARP (from the eLog), you MUST use this wrapper script instead of the Python script directly. - SLURM submission script: submit_slurm.sh
- Airflow operators: - JIDSlurmOperator
launch_airflow.py
","text":"Sends a request to an Airflow instance to submit a specific DAG (workflow). This script prepares an HTTP request with the appropriate parameters in a specific format.
A request involves the following information, most of which is retrieved automatically:
dag_run_data: Dict[str, Union[str, Dict[str, Union[str, int, List[str]]]]] = {\n \"dag_run_id\": str(uuid.uuid4()),\n \"conf\": {\n \"experiment\": os.environ.get(\"EXPERIMENT\"),\n \"run_id\": f\"{os.environ.get('RUN_NUM')}{datetime.datetime.utcnow().isoformat()}\",\n \"JID_UPDATE_COUNTERS\": os.environ.get(\"JID_UPDATE_COUNTERS\"),\n \"ARP_ROOT_JOB_ID\": os.environ.get(\"ARP_JOB_ID\"),\n \"ARP_LOCATION\": os.environ.get(\"ARP_LOCATION\", \"S3DF\"),\n \"Authorization\": os.environ.get(\"Authorization\"),\n \"user\": getpass.getuser(),\n \"lute_params\": params,\n \"slurm_params\": extra_args,\n \"workflow\": wf_defn, # Used only for custom DAGs. See below under advanced usage.\n },\n}\n
Note that the environment variables are used to fill in the appropriate information because this script is intended to be launched primarily from the ARP (which passes these variables). The ARP allows for the launch job to be defined in the experiment eLog and submitted automatically for each new DAQ run. The environment variables EXPERIMENT
and RUN
can alternatively be defined prior to submitting the script on the command-line.
The script takes a number of parameters:
launch_airflow.py -c <path_to_config_yaml> -w <workflow_name> [--debug] [--test] [-e <exp>] [-r <run>] [SLURM_ARGS]\n
-c
refers to the path of the configuration YAML that contains the parameters for each managed Task
in the requested workflow.-w
is the name of the DAG (workflow) to run. By convention each DAG is named by the Python file it is defined in. (See below).-W
(capital W) can be passed instead, followed by the path to a workflow definition, rather than -w
. See below for further discussion on this use case.--debug
is an optional flag to run all steps of the workflow in debug mode for verbose logging and output.--test
is an optional flag which will use the test Airflow instance. By default the script will make requests of the standard production Airflow instance.-e
is used to pass the experiment name. Needed if not using the ARP, i.e. running from the command-line.-r
is used to pass a run number. Needed if not using the ARP, i.e. running from the command-line.SLURM_ARGS
are SLURM arguments to be passed to the submit_slurm.sh
script which are used for each individual managed Task
. These arguments do NOT affect the submission parameters for the job running launch_airflow.py
(if using submit_launch_airflow.sh
below).Lifetime This script will run for the entire duration of the workflow (DAG). After making the initial request of Airflow to launch the DAG, it will enter a status update loop which will keep track of each individual job (each job runs one managed Task
) submitted by Airflow. At the end of each job it will collect the log file, in addition to providing a few other status updates/debugging messages, and append it to its own log. This allows all logging for the entire workflow (DAG) to be inspected from an individual file. This is particularly useful when running via the eLog, because only a single log file is displayed.
submit_launch_airflow.sh
","text":"This script is only necessary when running from the eLog using the ARP. The initial job submitted by the ARP can not have a duration of longer than 30 seconds, as it will then time out. As the launch_airflow.py
job will live for the entire duration of the workflow, which is often much longer than 30 seconds, the solution was to have a wrapper which submits the launch_airflow.py
script to run on the S3DF batch nodes. Usage of this script is mostly identical to launch_airflow.py
. All the arguments are passed transparently to the underlying Python script with the exception of the first argument which must be the location of the underlying launch_airflow.py
script. The wrapper will simply launch a batch job using minimal resources (1 core). While the primary purpose of the script is to allow running from the eLog, it is also a useful wrapper in general, since it allows the previous script to be submitted as a SLURM job.
Usage:
submit_launch_airflow.sh /path/to/launch_airflow.py -c <path_to_config_yaml> -w <workflow_name> [--debug] [--test] [-e <exp>] [-r <run>] [SLURM_ARGS]\n
"},{"location":"tutorial/creating_workflows/#submit_slurmsh","title":"submit_slurm.sh
","text":"Launches a job on the S3DF batch nodes using the SLURM job scheduler. This script launches a single managed Task
at a time. The usage is as follows:
submit_slurm.sh -c <path_to_config_yaml> -t <MANAGED_task_name> [--debug] [SLURM_ARGS ...]\n
As a reminder the managed Task
refers to the Executor
-Task
combination. The script does not parse any SLURM specific parameters, and instead passes them transparently to SLURM. At least the following two SLURM arguments must be provided:
--partition=<...> # Usually partition=milano\n--account=<...> # Usually account=lcls:$EXPERIMENT\n
Generally, resource requests will also be included, such as the number of cores to use. A complete call may look like the following:
submit_slurm.sh -c /sdf/data/lcls/ds/hutch/experiment/scratch/config.yaml -t Tester --partition=milano --account=lcls:experiment --ntasks=100 [...]\n
When running a workflow using the launch_airflow.py
script, each step of the workflow will be submitted using this script.
Operator
s are the objects submitted as individual steps of a DAG by Airflow. They are conceptually linked to the idea of a task in that each task of a workflow is generally an operator. Care should be taken not to confuse them with LUTE Task
s or managed Task
s though. There is, however, usually a one-to-one correspondence between a Task
and an Operator
.
Airflow runs on a K8S cluster which has no access to the experiment data. When we ask Airflow to run a DAG, it will launch an Operator
for each step of the DAG. However, the Operator
itself cannot perform productive analysis without access to the data. The solution employed by LUTE
is to have a limited set of Operator
s which do not perform analysis, but instead request that a LUTE
managed Task
s be submitted on the batch nodes where it can access the data. There may be small differences between how the various provided Operator
s do this, but in general they will all make a request to the job interface daemon (JID) that a new SLURM job be scheduled using the submit_slurm.sh
script described above.
Therefore, running a typical Airflow DAG involves the following steps:
launch_airflow.py
script is submitted, usually from a definition in the eLog.launch_airflow
script requests that Airflow run a specific DAG.Operator
s that make up the DAG definition.Operator
sends a request to the JID
to submit a job.JID
submits the elog_submit.sh
script with the appropriate managed Task
.Task
runs on the batch nodes, while the Operator
, requesting updates from the JID on job status, waits for it to complete.Task
completes, the Operator
will receieve this information and tell the Airflow server whether the job completed successfully or resulted in failure.Currently, the following Operator
s are maintained: - JIDSlurmOperator
: The standard Operator
. Each instance has a one-to-one correspondence with a LUTE managed Task
.
JIDSlurmOperator
arguments","text":"task_id
: This is nominally the name of the task on the Airflow side. However, for simplicity this is used 1-1 to match the name of a managed Task defined in LUTE's managed_tasks.py
module. I.e., it should be the name of an Executor(\"Task\")
object which will run the specific Task of interest. This must match the name of a defined managed Task.max_cores
: Used to cap the maximum number of cores which should be requested of SLURM. By default all jobs will run with the same number of cores, which should be specified when running the launch_airflow.py
script (either from the ARP, or by hand). This behaviour was chosen because in general we want to increase or decrease the core-count for all Task
s uniformly, and we don't want to have to specify core number arguments for each job individually. Nonetheless, on occasion it may be necessary to cap the number of cores a specific job will use. E.g. if the default value specified when launching the Airflow DAG is multiple cores, and one job is single-threaded, the core count can be capped for that single job to 1, while the rest run with multiple cores.max_nodes
: Similar to the above. This will make sure the Task
is distributed across no more than a maximum number of nodes. This feature is useful for, e.g., multi-threaded software which does not make use of tools like MPI
. So, the Task
can run on multiple cores, but only within a single node.require_partition
: This option is a string that forces the use of a specific S3DF partition for the managed Task
submitted by the Operator. E.g. typically an LCLS user will use --partition=milano
for CPU-based workflows; however, if a specific Task
requires a GPU you may use JIDSlurmOperator(\"MyTaskRunner\", require_partition=\"ampere\")
to override the partition for that single Task
.custom_slurm_params
: You can provide a string of parameters which will be used in its entirety to replace any and all default arguments passed by the launch script. This method is not recommended for general use and is mostly used for dynamic DAGs described at the end of the document.Defining a new workflow involves creating a new module (Python file) in the directory workflows/airflow
, creating a number of Operator
instances within the module, and then drawing the connectivity between them. At the top of the file an Airflow DAG is created and given a name. By convention all LUTE
workflows use the name of the file as the name of the DAG. The following code can be copied exactly into the file:
from datetime import datetime\nimport os\nfrom airflow import DAG\nfrom lute.operators.jidoperators import JIDSlurmOperator # Import other operators if needed\n\ndag_id: str = f\"lute_{os.path.splitext(os.path.basename(__file__))[0]}\"\ndescription: str = (\n \"Run SFX processing using PyAlgos peak finding and experimental phasing\"\n)\n\ndag: DAG = DAG(\n dag_id=dag_id,\n start_date=datetime(2024, 3, 18),\n schedule_interval=None,\n description=description,\n)\n
Once the DAG has been created, a number of Operator
s must be created to run the various LUTE analysis operations. As an example consider a partial SFX processing workflow which includes steps for peak finding, indexing, merging, and calculating figures of merit. Each of the 4 steps will have an Operator
instance which will launch a corresponding LUTE
managed Task
, for example:
# Using only the JIDSlurmOperator\n# syntax: JIDSlurmOperator(task_id=\"LuteManagedTaskName\", dag=dag) # optionally, max_cores=123)\npeak_finder: JIDSlurmOperator = JIDSlurmOperator(task_id=\"PeakFinderPyAlgos\", dag=dag)\n\n# We specify a maximum number of cores for the rest of the jobs.\nindexer: JIDSlurmOperator = JIDSlurmOperator(\n max_cores=120, task_id=\"CrystFELIndexer\", dag=dag\n)\n# We can alternatively specify this task be only ever run with the following args.\n# indexer: JIDSlurmOperator = JIDSlurmOperator(\n# custom_slurm_params=\"--partition=milano --ntasks=120 --account=lcls:myaccount\",\n# task_id=\"CrystFELIndexer\",\n# dag=dag,\n# )\n\n# Merge\nmerger: JIDSlurmOperator = JIDSlurmOperator(\n max_cores=120, task_id=\"PartialatorMerger\", dag=dag\n)\n\n# Figures of merit\nhkl_comparer: JIDSlurmOperator = JIDSlurmOperator(\n max_cores=8, task_id=\"HKLComparer\", dag=dag\n)\n
Finally, the dependencies between the Operator
s are \"drawn\", defining the execution order of the various steps. The >>
operator has been overloaded for the Operator
class, allowing it to be used to specify the next step in the DAG. In this case, a completely linear DAG is drawn as:
peak_finder >> indexer >> merger >> hkl_comparer\n
Parallel execution can be added by using the >>
operator multiple times. Consider a task1
which upon successful completion starts a task2
and task3
in parallel. This dependency can be added to the DAG using:
#task1: JIDSlurmOperator = JIDSlurmOperator(...)\n#task2 ...\n\ntask1 >> task2\ntask1 >> task3\n
As each DAG is defined in pure Python, standard control structures (loops, if statements, etc.) can be used to create more complex workflow arrangements.
Note: Your DAG will not be available to Airflow until your PR including the file you have defined is merged! Once merged the file will be synced with the Airflow instance and can be run using the scripts described earlier in this document. For testing it is generally preferred that you run each step of your DAG individually using the submit_slurm.sh
script and the independent managed Task
names. If, however, you want to test the behaviour of Airflow itself (in a modified form) you can use the advanced run-time DAGs defined below as well.
In most cases, standard DAGs should be defined as described above and called by name. However, Airflow also supports the creation of DAGs dynamically, e.g. to vary the input data to various steps, or the number of steps that will occur. Some of this functionality has been used to allow for user-defined DAGs which are passed in the form of a dictionary, allowing Airflow to construct the workflow as it is running.
A basic YAML syntax is used to construct a series of nested dictionaries which define a DAG. Considering the first example DAG defined above (for serial femtosecond crystallography), the standard DAG looked like:
peak_finder >> indexer >> merger >> hkl_comparer\n
We can alternatively define this DAG in YAML:
task_name: PeakFinderPyAlgos\nslurm_params: ''\nnext:\n- task_name: CrystFELIndexer\n  slurm_params: ''\n  next:\n  - task_name: PartialatorMerger\n    slurm_params: ''\n    next:\n    - task_name: HKLComparer\n      slurm_params: ''\n      next: []\n
I.e. we define a tree where each node is constructed using Node(task_name: str, slurm_params: str, next: List[Node])
.
task_name
is the name of a managed Task
as before, in the same way that would be passed to the JIDSlurmOperator
.slurm_params
. This is a complete string of all the arguments to use for the corresponding managed Task
. Use of this field is all or nothing! - if it is left as an empty string, the default parameters (passed on the command-line using the launch script) are used, otherwise this string is used in its stead. Because of this remember to include a partition and account if using it.next
field is composed of either an empty list (meaning no managed Task
s are run after the current node), or additional nodes. All nodes in the list are run in parallel. As a second example, to run task1
followed by task2
and task3
in parellel we would use:
task_name: Task1\nslurm_params: ''\nnext:\n- task_name: Task2\n slurm_params: ''\n next: []\n- task_name: Task3\n slurm_params: ''\n next: []\n
In order to run a DAG defined this way we pass the path to the YAML file we have defined it in to the launch script using -W <path_to_dag>
. This is instead of calling it by name. E.g.
/path/to/lute/launch_scripts/submit_launch_airflow.sh /path/to/lute/launch_scripts/launch_airflow.py -e <exp> -r <run> -c /path/to/config -W <path_to_dag> --test [--debug] [SLURM_ARGS]\n
Note that fewer options are currently supported for configuring the operators for each step of the DAG. The slurm arguments can be replaced in their entirety using a custom slurm_params
string but individual options cannot be modified.
Task
","text":"Task
s can be broadly categorized into two types: - \"First-party\" - where the analysis or executed code is maintained within this library. - \"Third-party\" - where the analysis, code, or program is maintained elsewhere and is simply called by a wrapping Task
.
Creating a new Task
of either type generally involves the same steps, although for first-party Task
s, the analysis code must of course also be written. Due to this difference, as well as additional considerations for parameter handling when dealing with \"third-party\" Task
s, the \"first-party\" and \"third-party\" Task
integration cases will be considered separately.
Task
","text":"There are two required steps for third-party Task
integration, and one additional step which is optional, and may not be applicable to all possible third-party Task
s. Generally, Task
integration requires: 1. Defining a TaskParameters
(pydantic) model which fully parameterizes the Task
. This involves specifying a path to a binary, and all the required command-line arguments to run the binary. 2. Creating a managed Task
by specifying an Executor
for the new third-party Task
. At this stage, any additional environment variables can be added which are required for the execution environment. 3. (Optional/Maybe applicable) Create a template for a third-party configuration file. If the new Task
has its own configuration file, specifying a template will allow that file to be parameterized from the singular LUTE yaml configuration file. A couple of minor additions to the pydantic
model specified in 1. are required to support template usage.
Each of these stages will be discussed in detail below. The vast majority of the work is completed in step 1.
"},{"location":"tutorial/new_task/#specifying-a-taskparameters-model-for-your-task","title":"Specifying aTaskParameters
Model for your Task
","text":"A brief overview of parameters objects will be provided below. The following information goes into detail only about specifics related to LUTE configuration. An in depth description of pydantic is beyond the scope of this tutorial; please refer to the official documentation for more information. Please note that due to environment constraints pydantic is currently pinned to version 1.10! Make sure to read the appropriate documentation for this version as many things are different compared to the newer releases. At the end this document there will be an example highlighting some supported behaviour as well as a FAQ to address some common integration considerations.
Task
s and TaskParameter
s
All Task
s have a corresponding TaskParameters
object. These objects are linked exclusively by a named relationship. For a Task
named MyThirdPartyTask
, the parameters object must be named MyThirdPartyTaskParameters
. For third-party Task
s there are a number of additional requirements: - The model must inherit from a base class called ThirdPartyParameters
. - The model must have one field specified called executable
. The presence of this field indicates that the Task
is a third-party Task
and the specified executable must be called. This allows all third-party Task
s to be defined exclusively by their parameters model. A single ThirdPartyTask
class handles execution of all third-party Task
s.
All models are stored in lute/io/models
. For any given Task
, a new model can be added to an existing module contained in this directory or to a new module. If creating a new module, make sure to add an import statement to lute.io.models.__init__
.
Defining TaskParameter
s
When specifying parameters the default behaviour is to provide a one-to-one correspondance between the Python attribute specified in the parameter model, and the parameter specified on the command-line. Single-letter attributes are assumed to be passed using -
, e.g. n
will be passed as -n
when the executable is launched. Longer attributes are passed using --
, e.g. by default a model attribute named my_arg
will be passed on the command-line as --my_arg
. Positional arguments are specified using p_argX
where X
is a number. All parameters are passed in the order that they are specified in the model.
However, because the number of possible command-line combinations is large, relying on the default behaviour above is NOT recommended. It is provided solely as a fallback. Instead, there are a number of configuration knobs which can be tuned to achieve the desired behaviour. The two main mechanisms for controlling behaviour are specification of model-wide configuration under the Config
class within the model's definition, and parameter-by-parameter configuration using field attributes. For the latter, we define all parameters as Field
objects. This allows parameters to have their own attributes, which are parsed by LUTE's task-layer. Given this, the preferred starting template for a TaskParameters
model is the following - we assume we are integrating a new Task
called RunTask
:
\nfrom pydantic import Field, validator\n# Also include any pydantic type specifications - Pydantic has many custom\n# validation types already, e.g. types for constrained numberic values, URL handling, etc.\n\nfrom .base import ThirdPartyParameters\n\n# Change class name as necessary\nclass RunTaskParameters(ThirdPartyParameters):\n \"\"\"Parameters for RunTask...\"\"\"\n\n class Config(ThirdPartyParameters.Config): # MUST be exactly as written here.\n ...\n # Model-wide configuration will go here\n\n executable: str = Field(\"/path/to/executable\", description=\"...\")\n ...\n # Additional params.\n # param1: param1Type = Field(\"default\", description=\"\", ...)\n
Config settings and options Under the class definition for Config
in the model, we can modify global options for all the parameters. In addition, there are a number of configuration options related to specifying what the outputs/results from the associated Task
are, and a number of options to modify runtime behaviour. Currently, the available configuration options are:
run_directory
If provided, can be used to specify the directory from which a Task
is run. None
(not provided) NO set_result
bool
. If True
search the model definition for a parameter that indicates what the result is. False
NO result_from_params
If set_result
is True
can define a result using this option and a validator. See also is_result
below. None
(not provided) NO short_flags_use_eq
Use equals sign instead of space for arguments of -
parameters. False
YES - Only affects ThirdPartyTask
s long_flags_use_eq
Use equals sign instead of space for arguments of -
parameters. False
YES - Only affects ThirdPartyTask
s These configuration options modify how the parameter models are parsed and passed along on the command-line, as well as what we consider results and where a Task
can run. The default behaviour is that parameters are assumed to be passed as -p arg
and --param arg
, the Task
will be run in the current working directory (or scratch if submitted with the ARP), and we have no information about Task
results . Setting the above options can modify this behaviour.
short_flags_use_eq
and/or long_flags_use_eq
to True
parameters are instead passed as -p=arg
and --param=arg
.run_directory
to a valid path, we can force a Task
to be run in a specific directory. By default the Task
will be run from the directory you submit the job in, or from your scratch folder (/sdf/scratch/...
) if you submit from the eLog. Some ThirdPartyTask
s rely on searching the correct working directory in order run properly.set_result
to True
we indicate that the TaskParameters
model will provide information on what the TaskResult
is. This setting must be used with one of two options, either the result_from_params
Config
option, described below, or the Field attribute is_result
described in the next sub-section (Field Attributes).result_from_params
is a Config option that can be used when set_result==True
. In conjunction with a validator (described a sections down) we can use this option to specify a result from all the information contained in the model. E.g. if you have a Task
that has parameters for an output_directory
and a output_filename
, you can set result_from_params==f\"{output_directory}/{output_filename}\"
.Field attributes In addition to the global configuration options there are a couple of ways to specify individual parameters. The following Field
attributes are used when parsing the model:
flag_type
Specify the type of flag for passing this argument. One of \"-\"
, \"--\"
, or \"\"
N/A p_arg1 = Field(..., flag_type=\"\")
rename_param
Change the name of the parameter as passed on the command-line. N/A my_arg = Field(..., rename_param=\"my-arg\")
description
Documentation of the parameter's usage or purpose. N/A arg = Field(..., description=\"Argument for...\")
is_result
bool
. If the set_result
Config
option is True
, we can set this to True
to indicate a result. N/A output_result = Field(..., is_result=true)
The flag_type
attribute allows us to specify whether the parameter corresponds to a positional (\"\"
) command line argument, requires a single hyphen (\"-\"
), or a double hyphen (\"--\"
). By default, the parameter name is passed as-is on the command-line. However, command-line arguments can have characters which would not be valid in Python variable names. In particular, hyphens are frequently used. To handle this case, the rename_param
attribute can be used to specify an alternative spelling of the parameter when it is passed on the command-line. This also allows for using more descriptive variable names internally than those used on the command-line. A description
can also be provided for each Field to document the usage and purpose of that particular parameter.
As an example, we can again consider defining a model for a RunTask
Task
. Consider an executable which would normally be called from the command-line as follows:
/sdf/group/lcls/ds/tools/runtask -n <nthreads> --method=<algorithm> -p <algo_param> [--debug]\n
A model specification for this Task
may look like:
class RunTaskParameters(ThirdPartyParameters):\n \"\"\"Parameters for the runtask binary.\"\"\"\n\n class Config(ThirdPartyParameters.Config):\n long_flags_use_eq: bool = True # For the --method parameter\n\n # Prefer using full/absolute paths where possible.\n # No flag_type needed for this field\n executable: str = Field(\n \"/sdf/group/lcls/ds/tools/runtask\", description=\"Runtask Binary v1.0\"\n )\n\n # We can provide a more descriptive name for -n\n # Let's assume it's a number of threads, or processes, etc.\n num_threads: int = Field(\n 1, description=\"Number of concurrent threads.\", flag_type=\"-\", rename_param=\"n\"\n )\n\n # In this case we will use the Python variable name directly when passing\n # the parameter on the command-line\n method: str = Field(\"algo1\", description=\"Algorithm to use.\", flag_type=\"--\")\n\n # For an actual parameter we would probably have a better name. Lets assume\n # This parameter (-p) modifies the behaviour of the method above.\n method_param1: int = Field(\n 3, description=\"Modify method performance.\", flag_type=\"-\", rename_param=\"p\"\n )\n\n # Boolean flags are only passed when True! `--debug` is an optional parameter\n # which is not followed by any arguments.\n debug: bool = Field(\n False, description=\"Whether to run in debug mode.\", flag_type=\"--\"\n )\n
The is_result
attribute allows us to specify whether the corresponding Field points to the output/result of the associated Task
. Consider a Task
, RunTask2
which writes its output to a single file which is passed as a parameter.
class RunTask2Parameters(ThirdPartyParameters):\n \"\"\"Parameters for the runtask2 binary.\"\"\"\n\n class Config(ThirdPartyParameters.Config):\n set_result: bool = True # This must be set here!\n # result_from_params: Optional[str] = None # We can use this for more complex result setups (see below). Ignore for now.\n\n # Prefer using full/absolute paths where possible.\n # No flag_type needed for this field\n executable: str = Field(\n \"/sdf/group/lcls/ds/tools/runtask2\", description=\"Runtask Binary v2.0\"\n )\n\n # Lets assume we take one input and write one output file\n # We will not provide a default value, so this parameter MUST be provided\n input: str = Field(\n description=\"Path to input file.\", flag_type=\"--\"\n )\n\n # We will also not provide a default for the output\n # BUT, we will specify that whatever is provided is the result\n output: str = Field(\n description=\"Path to write output to.\",\n flag_type=\"-\",\n rename_param=\"o\",\n is_result=True, # This means this parameter points to the result!\n )\n
Additional Comments 1. Model parameters of type bool
are not passed with an argument and are only passed when True
. This is a common use-case for boolean flags which enable things like test or debug modes, verbosity or reporting features. E.g. --debug
, --test
, --verbose
, etc. - If you need to pass the literal words \"True\"
or \"False\"
, use a parameter of type str
. 2. You can use pydantic
types to constrain parameters beyond the basic Python types. E.g. conint
can be used to define lower and upper bounds for an integer. There are also types for common categories, positive/negative numbers, paths, URLs, IP addresses, etc. - Even more custom behaviour can be achieved with validator
s (see below). 3. All TaskParameters
objects and their subclasses have access to a lute_config
parameter, which is of type lute.io.models.base.AnalysisHeader
. This special parameter is ignored when constructing the call for a binary task, but it provides access to shared/common parameters between tasks. For example, the following parameters are available through the lute_config
object, and may be of use when constructing validators. All fields can be accessed with .
notation. E.g. lute_config.experiment
. - title
: A user provided title/description of the analysis. - experiment
: The current experiment name - run
: The current acquisition run number - date
: The date of the experiment or the analysis. - lute_version
: The version of the software you are running. - task_timeout
: How long a Task
can run before it is killed. - work_dir
: The main working directory for LUTE. Files and the database are created relative to this directory. This is separate from the run_directory
config option. LUTE will write files to the work directory by default; however, the Task
itself is run from run_directory
if it is specified.
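As a concrete illustration of point 3, the following sketch uses lute_config inside a validator (validators are described in the next section) to build a default output path from the shared header values. The parameter name and the file-naming scheme are hypothetical; only the lute_config fields come from the list above:
from typing import Any, Dict

from pydantic import Field, validator

from .base import ThirdPartyParameters

class RunTaskParameters(ThirdPartyParameters):
    """Sketch: derive a default output file from the shared analysis header."""

    out_file: str = Field("", description="Output file.", flag_type="-", rename_param="o")

    @validator("out_file", always=True)
    def set_default_out_file(cls, out_file: str, values: Dict[str, Any]) -> str:
        if out_file != "":  # Respect an explicitly provided value
            return out_file
        # lute_config is validated before Task-specific fields, so it is available here
        header = values["lute_config"]
        return f"{header.work_dir}/{header.experiment}_r{int(header.run):04d}.out"  # Hypothetical naming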
Validators Pydantic uses validators
to determine whether a value for a specific field is appropriate. There are default validators for all the standard library types and the types specified within the pydantic package; however, it is straightforward to define custom ones as well. In the template code-snippet above we imported the validator
decorator. To create our own validator we define a method (with any name) with the following prototype, and decorate it with the validator
decorator:
@validator(\"name_of_field_to_decorate\")\ndef my_custom_validator(cls, field: Any, values: Dict[str, Any]) -> Any: ...\n
In this snippet, the field
variable corresponds to the value for the specific field we want to validate. values
is a dictionary of fields and their values which have been parsed prior to the current field. This means you can validate the value of a parameter based on the values provided for other parameters. Since pydantic always validates the fields in the order they are defined in the model, fields dependent on other fields should come later in the definition.
For example, consider the method_param1
field defined above for RunTask
. We can provide a custom validator which changes the default value for this field depending on what type of algorithm is specified for the --method
option. We will also constrain the options for method
to two specific strings.
from pydantic import Field, validator, ValidationError, root_validator\nclass RunTaskParameters(ThirdPartyParameters):\n \"\"\"Parameters for the runtask binary.\"\"\"\n\n # [...]\n\n # In this case we will use the Python variable name directly when passing\n # the parameter on the command-line\n method: str = Field(\"algo1\", description=\"Algorithm to use.\", flag_type=\"--\")\n\n # For an actual parameter we would probably have a better name. Lets assume\n # This parameter (-p) modifies the behaviour of the method above.\n method_param1: Optional[int] = Field(\n description=\"Modify method performance.\", flag_type=\"-\", rename_param=\"p\"\n )\n\n # We will only allow method to take on one of two values\n @validator(\"method\")\n def validate_method(cls, method: str, values: Dict[str, Any]) -> str:\n \"\"\"Method validator: --method can be algo1 or algo2.\"\"\"\n\n valid_methods: List[str] = [\"algo1\", \"algo2\"]\n if method not in valid_methods:\n raise ValueError(\"method must be algo1 or algo2\")\n return method\n\n # Lets change the default value of `method_param1` depending on `method`\n # NOTE: We didn't provide a default value to the Field above and made it\n # optional. We can use this to test whether someone is purposefully\n # overriding the value of it, and if not, set the default ourselves.\n # We set `always=True` since pydantic will normally not use the validator\n # if the default is not changed\n @validator(\"method_param1\", always=True)\n def validate_method_param1(cls, param1: Optional[int], values: Dict[str, Any]) -> int:\n \"\"\"method param1 validator\"\"\"\n\n # If someone actively defined it, lets just return that value\n # We could instead do some additional validation to make sure that the\n # value they provided is valid...\n if param1 is not None:\n return param1\n\n # method_param1 comes after method, so this will be defined, or an error\n # would have been raised.\n method: str = values['method']\n if method == \"algo1\":\n return 3\n elif method == \"algo2\":\n return 5\n
The special root_validator(pre=False)
can also be used to provide validation of the model as a whole. This is also the recommended method for specifying a result (using result_from_params
) which has a complex dependence on the parameters of the model. This latter use-case is described in FAQ 2 below.
Use a custom validator. The example above shows how to do this. The parameter that depends on another parameter must come LATER in the model definition than the independent parameter.
My TaskResult
is determinable from the parameters model, but it isn't easily specified by one parameter. How can I use result_from_params
to indicate the result?When a result can be identified from the set of parameters defined in a TaskParameters
model, but is not as straightforward as saying it is equivalent to one of the parameters alone, we can set result_from_params
using a custom validator. In the example below, we have two parameters which together determine what the result is, output_dir
and out_name
. Using a validator we will define a result from these two values.
from pydantic import Field, root_validator\n\nclass RunTask3Parameters(ThirdPartyParameters):\n \"\"\"Parameters for the runtask3 binary.\"\"\"\n\n class Config(ThirdPartyParameters.Config):\n set_result: bool = True # This must be set here!\n result_from_params: str = \"\" # We will set this momentarily\n\n # [...] executable, other params, etc.\n\n output_dir: str = Field(\n description=\"Directory to write output to.\",\n flag_type=\"--\",\n rename_param=\"dir\",\n )\n\n out_name: str = Field(\n description=\"The name of the final output file.\",\n flag_type=\"--\",\n rename_param=\"oname\",\n )\n\n # We can still provide other validators as needed\n # But for now, we just set result_from_params\n # Validator name can be anything, we set pre=False so this runs at the end\n @root_validator(pre=False)\n def define_result(cls, values: Dict[str, Any]) -> Dict[str, Any]:\n # Extract the values of output_dir and out_name\n output_dir: str = values[\"output_dir\"]\n out_name: str = values[\"out_name\"]\n\n result: str = f\"{output_dir}/{out_name}\"\n # Now we set result_from_params\n cls.Config.result_from_params = result\n\n # We haven't modified any other values, but we MUST return this!\n return values\n
My Task
depends on the output of a previous Task
, how can I specify this dependency? Parameters used to run a Task
are recorded in a database for every Task
. It is also recorded whether or not the execution of that specific parameter set was successful. A utility function is provided to access the most recent values from the database for a specific parameter of a specific Task
. It can also be used to specify whether unsuccessful Task
s should be included in the query. This utility can be used within a validator to specify dependencies. For example, suppose the input of RunTask2
(parameter input
) depends on the output location of RunTask1
(parameter outfile
). A validator of the following type can be used to retrieve the output file and make it the default value of the input parameter:
from typing import Any, Dict, Optional\n\nfrom pydantic import Field, validator\n\nfrom .base import ThirdPartyParameters\nfrom ..db import read_latest_db_entry\n\nclass RunTask2Parameters(ThirdPartyParameters):\n input: str = Field(\"\", description=\"Input file.\", flag_type=\"--\")\n\n @validator(\"input\")\n def validate_input(cls, input: str, values: Dict[str, Any]) -> str:\n if input == \"\":\n task1_out: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\", # Working directory. We search for the database here.\n \"RunTask1\", # Name of Task we want to look up\n \"outfile\", # Name of parameter of the Task\n valid_only=True, # We only want valid output files.\n )\n # read_latest_db_entry returns None if nothing is found\n if task1_out is not None:\n return task1_out\n return input\n
There are more examples of this pattern spread throughout the various Task
models.
Executor
: Creating a runnable, \"managed Task
\"","text":"Overview
After a pydantic model has been created, the next required step is to define a managed Task
. In the context of this library, a managed Task
refers to the combination of an Executor
and a Task
to run. The Executor
manages the process of Task
submission and the execution environment, as well as performing any logging, eLog communication, etc. There are currently two types of Executor
to choose from, but only one is applicable to third-party code. The second Executor
is listed below for completeness only. If you need MPI see the note below.
Executor
: This is the standard Executor
. It should be used for third-party use cases.MPIExecutor
: This performs all the same types of operations as the option above; however, it will submit your Task
using MPI.MPIExecutor
will submit the Task
using the number of available cores - 1. The number of cores is determined from the physical core/thread count on your local machine, or the number of cores allocated by SLURM when submitting on the batch nodes.Using MPI with third-party Task
s
As mentioned, you should set up a third-party Task
to use the first type of Executor
. If, however, your third-party Task
uses MPI, this may seem non-intuitive. When using the MPIExecutor,
LUTE code is submitted with MPI. This includes the code that performs signalling to the Executor
and exec
s the third-party code you are interested in running. While it is possible to set this code up to run with MPI, it is more challenging in the case of third-party Task
s because there is no Task
code to modify directly! The MPIExecutor
is provided mostly for first-party code. This is not an issue, however, since the standard Executor
is easily configured to run with MPI in the case of third-party code.
When using the standard Executor
for a Task
requiring MPI, the executable
in the pydantic model must be set to mpirun
. For example, a third-party Task
model that uses MPI but is intended to be run with the Executor
may look like the following. We assume this Task
runs a Python script using MPI.
import os\n\nfrom pydantic import Field, PositiveInt\n\nfrom .base import ThirdPartyParameters\n\nclass RunMPITaskParameters(ThirdPartyParameters):\n class Config(ThirdPartyParameters.Config):\n ...\n\n executable: str = Field(\"mpirun\", description=\"MPI executable\")\n np: PositiveInt = Field(\n max(int(os.environ.get(\"SLURM_NPROCS\", len(os.sched_getaffinity(0)))) - 1, 1),\n description=\"Number of processes\",\n flag_type=\"-\",\n )\n pos_arg: str = Field(\"python\", description=\"Python...\", flag_type=\"\")\n script: str = Field(\"\", description=\"Python script to run with MPI\", flag_type=\"\")\n
Selecting the Executor
After deciding on which Executor
to use, a single line must be added to the lute/managed_tasks.py
module:
# Initialization: Executor(\"TaskName\")\nTaskRunner: Executor = Executor(\"SubmitTask\")\n# TaskRunner: MPIExecutor = MPIExecutor(\"SubmitTask\") ## If using the MPIExecutor\n
In an attempt to make it easier to discern whether discussing a Task
or managed Task
, the standard naming convention is that the Task
(class name) will have a verb in the name, e.g. RunTask
, SubmitTask
. The corresponding managed Task
will use a related noun, e.g. TaskRunner
, TaskSubmitter
, etc.
As a reminder, the Task
name is the first part of the class name of the pydantic model, without the Parameters
suffix. This name must match. E.g. if your pydantic model's class name is RunTaskParameters
, the Task
name is RunTask
, and this is the string passed to the Executor
initializer.
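The relationship between the two names can be summarized in one line of Python; this is just a mnemonic for the convention, not code LUTE requires you to write:
class RunTaskParameters:  # Stand-in for the real pydantic model
    ...

task_name = RunTaskParameters.__name__.removesuffix("Parameters")  # Python 3.9+
print(task_name)  # "RunTask" -> the string passed as Executor("RunTask")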
Modifying the environment
If your third-party Task
can run in the standard psana
environment with no further configuration files, the setup process is now complete and your Task
can be run within the LUTE framework. If, on the other hand, your Task
requires some changes to the environment, this is managed through the Executor
. There are a couple of principal methods that the Executor
has to change the environment.
Executor.update_environment
: if you only need to add a few environment variables, or update the PATH
this is the method to use. The method takes a Dict[str, str]
as input. Any variables can be passed/defined using this method. By default, any variables in the dictionary will overwrite those variable definitions in the current environment if they are already present, except for the variable PATH
. By default PATH
entries in the dictionary are prepended to the current PATH
available in the environment the Executor
runs in (the standard psana
environment). This behaviour can be changed to either append to, or entirely overwrite, the PATH via an optional second argument to the method.Executor.shell_source
: This method will source a shell script which can perform numerous modifications of the environment (PATH changes, new environment variables, conda environments, etc.). The method takes a str
which is the path to a shell script to source.As an example, we will update the PATH
of one Task
and source a script for a second.
TaskRunner: Executor = Executor(\"RunTask\")\n# update_environment(env: Dict[str,str], update_path: str = \"prepend\") # \"append\" or \"overwrite\"\nTaskRunner.update_environment(\n { \"PATH\": \"/sdf/group/lcls/ds/tools\" } # This entry will be prepended to the PATH available after sourcing `psconda.sh`\n)\n\nTask2Runner: Executor = Executor(\"RunTask2\")\nTask2Runner.shell_source(\"/sdf/group/lcls/ds/tools/new_task_setup.sh\") # Will source new_task_setup.sh script\n
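The optional second argument shown in the signature comment above controls how PATH entries are combined. A brief sketch of the other two modes follows, in the same lute/managed_tasks.py context as the example above (the directory and variable names are placeholders):
TaskRunner: Executor = Executor("RunTask")

# Append instead of prepend: existing PATH entries take precedence over the new directory.
TaskRunner.update_environment({"PATH": "/sdf/group/lcls/ds/tools"}, "append")

# Overwrite: replace the PATH entirely with the provided value.
# TaskRunner.update_environment({"PATH": "/sdf/group/lcls/ds/tools"}, "overwrite")

# Non-PATH variables are simply set, overwriting any existing definition.
TaskRunner.update_environment({"MY_TOOL_CONFIG": "/sdf/group/lcls/ds/tools/config.toml"})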
"},{"location":"tutorial/new_task/#using-templates-managing-third-party-configuration-files","title":"Using templates: managing third-party configuration files","text":"Some third-party executables will require their own configuration files. These are often separate JSON or YAML files, although they can also be bash or Python scripts which are intended to be edited. Since LUTE requires its own configuration YAML file, it attempts to handle these cases by using Jinja templates. When wrapping a third-party task a template can also be provided - with small modifications to the Task
's pydantic model, LUTE can process special types of parameters to render them in the template. LUTE offloads all the template rendering to Jinja, making the required additions to the pydantic model small. On the other hand, it does require understanding the Jinja syntax, and the provision of a well-formatted template, to properly parse parameters. Some basic examples of this syntax will be shown below; however, it is recommended that the Task
implementer refer to the official Jinja documentation for more information.
LUTE provides two additional base models which are used for template parsing in conjunction with the primary Task
model. These are: - TemplateParameters
objects which hold parameters which will be used to render a portion of a template. - TemplateConfig
objects which hold two strings: the name of the template file to use and the full path (including filename) of where to output the rendered result.
Task
models which inherit from the ThirdPartyParameters
model, as all third-party Task
s should, allow for extra arguments. LUTE will parse any extra arguments provided in the configuration YAML as TemplateParameters
objects automatically, which means that they do not need to be explicitly added to the pydantic model (although they can be). As such the only requirement on the Python-side when adding template rendering functionality to the Task
is the addition of one parameter - an instance of TemplateConfig
. The instance MUST be called lute_template_cfg
.
from pydantic import Field, validator\n\nfrom .base import TemplateConfig, ThirdPartyParameters\n\nclass RunTaskParameters(ThirdPartyParameters):\n ...\n # This parameter MUST be called lute_template_cfg!\n lute_template_cfg: TemplateConfig = Field(\n TemplateConfig(\n template_name=\"name_of_template.json\",\n output_path=\"/path/to/write/rendered_output_to.json\",\n ),\n description=\"Template rendering configuration\",\n )\n
LUTE looks for the template in config/templates
, so only the name of the template file to use within that directory is required for the template_name
attribute of lute_template_cfg
. LUTE can write the output anywhere the user has write permissions, and with any name, so the full absolute path including the filename should be used for the output_path
of lute_template_cfg
.
The rest of the work is done by the combination of Jinja, LUTE's configuration YAML file, and the template itself. Understanding the interplay between these components is perhaps best illustrated by an example. As such, let us consider a simple third-party Task
whose only input parameter (on the command-line) is the location of a configuration JSON file. We'll call the third-party executable jsonuser
and our Task
model RunJsonUserParameters
. We assume the program is run like:
jsonuser -i <input_file.json>\n
The first step is to set up the pydantic model as before.
from pydantic import Field, validator\n\nfrom .base import TemplateConfig, ThirdPartyParameters\n\nclass RunJsonUserParameters(ThirdPartyParameters):\n executable: str = Field(\n \"/path/to/jsonuser\", description=\"Executable which requires a JSON configuration file.\"\n )\n # Let's assume the JSON file is passed as \"-i <path_to_json>\"\n input_json: str = Field(\n \"\", description=\"Path to the input JSON file.\", flag_type=\"-\", rename_param=\"i\"\n )\n
The next step is to create a template for the JSON file. Let's assume the JSON file looks like:
{\n \"param1\": \"arg1\",\n \"param2\": 4,\n \"param3\": {\n \"a\": 1,\n \"b\": 2\n },\n \"param4\": [\n 1,\n 2,\n 3\n ]\n}\n
Any or all of these values can be substituted, and we can determine the way in which we will provide them. I.e. a substitution can be provided for each variable individually, or, for example for a nested hierarchy, a dictionary can be provided which will substitute all the items at once. For this simple case, let's provide variables for param1
, param2
, param3.b
and assume that we want the first and second entries for param4
to be identical for our use case (i.e., we can use one variable for them both). In total, this means we will perform 5 substitutions using 4 variables. Jinja will substitute a variable anywhere it sees the following syntax: {{ variable_name }}
. As such a valid template for our use-case may look like:
{\n \"param1\": {{ str_var }},\n \"param2\": {{ int_var }},\n \"param3\": {\n \"a\": 1,\n \"b\": {{ p3_b }}\n },\n \"param4\": [\n {{ val }},\n {{ val }},\n 3\n ]\n}\n
We save this file as jsonuser.json
in config/templates
. Next, we will update the original pydantic model to include our template configuration. We still have an issue, however, in that we need to decide where to write the output of the template to. In this case, we can use the input_json
parameter. We will assume that the user will provide this, although a default value can also be used. A custom validator will be added so that we can take the input_json
value and update the value of lute_template_cfg.output_path
with it.
from typing import Any, Dict\n# from typing import Optional\n\nfrom pydantic import Field, validator\n\nfrom .base import TemplateConfig, ThirdPartyParameters #, TemplateParameters\n\nclass RunJsonUserParameters(ThirdPartyParameters):\n executable: str = Field(\n \"jsonuser\", description=\"Executable which requires a JSON configuration file.\"\n )\n # Let's assume the JSON file is passed as \"-i <path_to_json>\"\n input_json: str = Field(\n \"\", description=\"Path to the input JSON file.\", flag_type=\"-\", rename_param=\"i\"\n )\n # Add template configuration! *MUST* be called `lute_template_cfg`\n lute_template_cfg: TemplateConfig = Field(\n TemplateConfig(\n template_name=\"jsonuser.json\", # Only the name of the file here.\n output_path=\"\",\n ),\n description=\"Template rendering configuration\",\n )\n # We do not need to include these TemplateParameters, they will be added\n # automatically if provided in the YAML\n #str_var: Optional[TemplateParameters]\n #int_var: Optional[TemplateParameters]\n #p3_b: Optional[TemplateParameters]\n #val: Optional[TemplateParameters]\n\n\n # Tell LUTE to write the rendered template to the location provided with\n # `input_json`. I.e. update `lute_template_cfg.output_path`\n @validator(\"lute_template_cfg\", always=True)\n def update_output_path(\n cls, lute_template_cfg: TemplateConfig, values: Dict[str, Any]\n ) -> TemplateConfig:\n if lute_template_cfg.output_path == \"\":\n lute_template_cfg.output_path = values[\"input_json\"]\n return lute_template_cfg\n
All that is left to render the template is to provide the variables we want to substitute in the LUTE configuration YAML. In our case we must provide the 4 variable names we included within the substitution syntax ({{ var_name }}
). The names in the YAML must match those in the template.
RunJsonUser:\n input_json: \"/my/chosen/path.json\" # We'll come back to this...\n str_var: \"arg1\" # Will substitute for \"param1\": \"arg1\"\n int_var: 4 # Will substitute for \"param2\": 4\n p3_b: 2 # Will substitute for \"param3: { \"b\": 2 }\n val: 2 # Will substitute for \"param4\": [2, 2, 3] in the JSON\n
If, on the other hand, a user already has a valid JSON file, it is possible to turn off the template rendering: ALL template variables (TemplateParameters
) are simply excluded from the configuration YAML.
RunJsonUser:\n input_json: \"/path/to/existing.json\"\n #str_var: ...\n #...\n
"},{"location":"tutorial/new_task/#additional-jinja-syntax","title":"Additional Jinja Syntax","text":"There are many other syntactical constructions we can use with Jinja. Some of the useful ones are:
If Statements - E.g. only include portions of the template if a value is defined.
{% if VARNAME is defined %}\n// Stuff to include\n{% endif %}\n
Loops - E.g. Unpacking multiple elements from a dictionary.
{% for name, value in VARNAME.items() %}\n// Do stuff with name and value\n{% endfor %}\n
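If you want to sanity-check a template before wiring it into a Task, you can render it by hand with the jinja2 package (the engine LUTE offloads rendering to). This is an optional debugging aid rather than part of the LUTE workflow; the variable values are the ones from the jsonuser example:
from jinja2 import Environment, FileSystemLoader

# Load templates from LUTE's template directory and render jsonuser.json manually.
env = Environment(loader=FileSystemLoader("config/templates"))
template = env.get_template("jsonuser.json")

rendered: str = template.render(str_var="arg1", int_var=4, p3_b=2, val=2)
print(rendered)  # Inspect the output before pointing input_json at a real run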
"},{"location":"tutorial/new_task/#creating-a-first-party-task","title":"Creating a \"First-Party\" Task
","text":"The process for creating a \"First-Party\" Task
is very similar to that for a \"Third-Party\" Task
, with the difference being that you must also write the analysis code. The steps for integration are: 1. Write the TaskParameters
model. 2. Write the Task
class. There are a few rules that need to be adhered to. 3. Make your Task
available by modifying the import function. 4. Specify an Executor
TaskParameters
Model for your Task
","text":"Parameter models have a format that must be followed for \"Third-Party\" Task
s, but \"First-Party\" Task
s have a little more liberty in how parameters are dealt with, since the Task
will do all the parsing itself.
To create a model, the basic steps are: 1. If necessary, create a new module (e.g. new_task_category.py
) under lute.io.models
, or find an appropriate pre-existing module in that directory. - An import
statement must be added to lute.io.models.__init__
if a new module is created, so it can be found. - If defining the model in a pre-existing module, make sure to modify the __all__
statement to include it. 2. Create a new model that inherits from TaskParameters
. You can look at lute.io.models.tests.TestReadOutputParameters
for an example. The model must be named <YourTaskName>Parameters
- You should include all relevant parameters here, including input file, output file, and any potentially adjustable parameters. These parameters must be included even if there are some implicit dependencies between Task
s and it would make sense for the parameter to be auto-populated based on some other output. Creating this dependency is done with validators (see step 3.). All parameters should be overridable, and all Task
s should be fully-independently configurable, based solely on their model and the configuration YAML. - To follow the preferred format, parameters should be defined as: param_name: type = Field([default value], description=\"This parameter does X.\")
3. Use validators to do more complex things for your parameters, including populating default values dynamically: - E.g. create default values that depend on other parameters in the model - see for example: SubmitSMDParameters. - E.g. create default values that depend on other Task
s by reading from the database - see for example: TestReadOutputParameters. 4. The model will have access to some general configuration values by inheriting from TaskParameters
. These parameters are all stored in lute_config
which is an instance of AnalysisHeader
(defined here). - For example, the experiment and run number can be obtained from this object and a validator could use these values to define the default input file for the Task
.
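Putting these steps together, a minimal first-party parameters model might look like the following sketch. The Task name (RunMyAnalysis), module, and parameter names are hypothetical; only the TaskParameters base class, the Field format, and the validator pattern come from the steps above:
"""In a hypothetical lute/io/models/my_analysis.py"""

from typing import Any, Dict

from pydantic import Field, validator

from .base import TaskParameters

class RunMyAnalysisParameters(TaskParameters):
    """Parameters for a hypothetical first-party analysis Task."""

    input_file: str = Field("", description="Path to the input data file.")
    output_file: str = Field("", description="Path to write results to.")
    threshold: float = Field(1.0, description="This parameter sets the detection threshold.")

    @validator("output_file", always=True)
    def set_default_output(cls, output_file: str, values: Dict[str, Any]) -> str:
        # Populate a default dynamically from the shared header (see point 4 above)
        if output_file == "":
            header = values["lute_config"]
            return f"{header.work_dir}/my_analysis_r{int(header.run):04d}.h5"
        return output_file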
A number of configuration options and Field attributes are also available for \"First-Party\" Task
models. These are identical to those used for the ThirdPartyTask
s, although there is a smaller selection. These options are reproduced below for convenience.
Config settings and options Under the class definition for Config
in the model, we can modify global options for all the parameters. In addition, there are a number of configuration options related to specifying what the outputs/results from the associated Task
are, and a number of options to modify runtime behaviour. Currently, the available configuration options are:
run_directory: If provided, can be used to specify the directory from which a Task is run. Default: None (not provided). ThirdPartyTask-only: NO.
set_result: bool. If True, search the model definition for a parameter that indicates what the result is. Default: False. ThirdPartyTask-only: NO.
result_from_params: If set_result is True, can define a result using this option and a validator. See also is_result below. Default: None (not provided). ThirdPartyTask-only: NO.
short_flags_use_eq: Use equals sign instead of space for arguments of - parameters. Default: False. ThirdPartyTask-only: YES - only affects ThirdPartyTasks.
long_flags_use_eq: Use equals sign instead of space for arguments of -- parameters. Default: False. ThirdPartyTask-only: YES - only affects ThirdPartyTasks.
These configuration options modify how the parameter models are parsed and passed along on the command-line, as well as what we consider results and where a Task
can run. The default behaviour is that parameters are assumed to be passed as -p arg
and --param arg
, the Task
will be run in the current working directory (or scratch if submitted with the ARP), and we have no information about Task
results. Setting the above options can modify this behaviour.
By setting short_flags_use_eq and/or long_flags_use_eq to True, parameters are instead passed as -p=arg
and --param=arg
.By setting run_directory
to a valid path, we can force a Task
to be run in a specific directory. By default the Task
will be run from the directory you submit the job in, or from your scratch folder (/sdf/scratch/...
) if you submit from the eLog. Some ThirdPartyTask
s rely on searching the correct working directory in order to run properly.By setting set_result
to True
we indicate that the TaskParameters
model will provide information on what the TaskResult
is. This setting must be used with one of two options, either the result_from_params
Config
option, described below, or the Field attribute is_result
described in the next sub-section (Field Attributes).result_from_params
is a Config option that can be used when set_result==True
. In conjunction with a validator (described a few sections down) we can use this option to specify a result from all the information contained in the model. E.g. if you have a Task
that has parameters for an output_directory
and an output_filename
, you can set result_from_params==f\"{output_directory}/{output_filename}\"
.Field attributes In addition to the global configuration options there are a couple of ways to specify individual parameters. The following Field
attributes are used when parsing the model:
description: Documentation of the parameter's usage or purpose. Default: N/A. Example: arg = Field(..., description=\"Argument for...\")
is_result: bool. If the set_result Config option is True, we can set this to True to indicate a result. Default: N/A. Example: output_result = Field(..., is_result=True)
"},{"location":"tutorial/new_task/#writing-the-task","title":"Writing the Task
","text":"You can write your analysis code (or whatever code to be executed) as long as it adheres to the limited rules below. You can create a new module for your Task
in lute.tasks
or add it to any existing module, if it makes sense for it to belong there. The Task
itself is a single class constructed as:
The Task
is a class named in a way that matches its Pydantic model. E.g. RunTask
is the Task
, and RunTaskParameters
is the Pydantic model. It must inherit from the Task
class (see template below). If you intend to use MPI, see the following section. It must provide a _run
method. This is the method that will be executed when the Task
is run. You can in addition write as many methods as you need. For fine-grained execution control you can also provide _pre_run()
and _post_run()
methods, but this is optional. Information is passed back to the Executor using the _report_to_executor(msg: Message)
method. Since the Task
is run as a subprocess this method will pass information to the controlling Executor
. You can pass any type of object using this method: strings, plots, arrays, etc. If you have used the set_result
configuration option in your parameters model, make sure to provide a result when finished. This is done by setting self._result.payload = ...
. You can set the result to be any object. If you have written the result to a file, for example, please provide a path.A minimal template is provided below.
\"\"\"Standard docstring...\"\"\"\n\n__all__ = [\"RunTask\"]\n__author__ = \"\" # Please include so we know who the SME is\n\n# Include any imports you need here\n\nfrom lute.execution.ipc import Message # Message for communication\nfrom lute.io.models.base import * # For TaskParameters\nfrom lute.tasks.task import * # For Task\n\nclass RunTask(Task): # Inherit from Task\n \"\"\"Task description goes here, or in __init__\"\"\"\n\n def __init__(self, *, params: TaskParameters) -> None:\n super().__init__(params=params) # Sets up Task, parameters, etc.\n # Parameters will be available through:\n # self._task_parameters\n # You access with . operator: self._task_parameters.param1, etc.\n # Your result object is availble through:\n # self._result\n # self._result.payload <- Main result\n # self._result.summary <- Short summary\n # self._result.task_status <- Semi-automatic, but can be set manually\n\n def _run(self) -> None:\n # THIS METHOD MUST BE PROVIDED\n self.do_my_analysis()\n\n def do_my_analysis(self) -> None:\n # Send a message, proper way to print:\n msg: Message(contents=\"My message contents\", signal=\"\")\n self._report_to_executor(msg)\n\n # When done, set result - assume we wrote a file, e.g.\n self._result.payload = \"/path/to/output_file.h5\"\n # Optionally also set status - good practice but not obligatory\n self._result.task_status = TaskStatus.COMPLETED\n
"},{"location":"tutorial/new_task/#using-mpi-for-your-task","title":"Using MPI for your Task
","text":"In the case your Task
is written to use MPI,
a slight modification to the template above is needed. Specifically, an additional keyword argument should be passed to the base class initializer: use_mpi=True
. This tells the base class to adjust signalling/communication behaviour appropriately for a multi-rank MPI program. Doing this prevents tricky-to-track-down problems due to ranks starting, completing and sending messages at different times. The rest of your code can, as before, be written as you see fit. The use of this keyword argument will also synchronize the start of all ranks and wait until all ranks have finished to exit.
\"\"\"Task which needs to run with MPI\"\"\"\n\n__all__ = [\"RunTask\"]\n__author__ = \"\" # Please include so we know who the SME is\n\n# Include any imports you need here\n\nfrom lute.execution.ipc import Message # Message for communication\nfrom lute.io.models.base import * # For TaskParameters\nfrom lute.tasks.task import * # For Task\n\n# Only the init is shown\nclass RunMPITask(Task): # Inherit from Task\n \"\"\"Task description goes here, or in __init__\"\"\"\n\n # Signal the use of MPI!\n def __init__(self, *, params: TaskParameters, use_mpi: bool = True) -> None:\n super().__init__(params=params, use_mpi=use_mpi) # Sets up Task, parameters, etc.\n # That's it.\n
"},{"location":"tutorial/new_task/#message-signals","title":"Message signals","text":"Signals in Message
objects are strings and can be one of the following:
LUTE_SIGNALS: Set[str] = {\n \"NO_PICKLE_MODE\",\n \"TASK_STARTED\",\n \"TASK_FAILED\",\n \"TASK_STOPPED\",\n \"TASK_DONE\",\n \"TASK_CANCELLED\",\n \"TASK_RESULT\",\n}\n
Each of these signals is associated with a hook on the Executor
-side. They are for the most part used by base classes; however, you can choose to make use of them manually as well.
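As an illustration of manual use, the sketch below reports a failure from inside a Task's _run method by attaching one of the signals above to a Message. The helper methods and the choice of signal/payload are hypothetical; check the corresponding Executor-side hook before relying on a particular signal:
from lute.execution.ipc import Message  # Same imports as in the Task template above

class RunTask(Task):
    # __init__ as in the minimal template shown earlier

    def _run(self) -> None:
        if not self.check_calibration():  # Hypothetical helper defined on this class
            # Attaching a signal triggers the corresponding Executor-side hook
            self._report_to_executor(
                Message(contents="Calibration file missing.", signal="TASK_FAILED")
            )
            return
        self.do_my_analysis()  # Hypothetical analysis method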
Task
available","text":"Once the Task
has been written, it needs to be made available for import. Since different Task
s can have conflicting dependencies and environments, this is managed through an import function. When the Task
is done, or ready for testing, a condition is added to lute.tasks.__init__.import_task
. For example, assume the Task
is called RunXASAnalysis
and it's defined in a module called xas.py
; we would add the following lines to the import_task
function:
# in lute.tasks.__init__\n\n# ...\n\ndef import_task(task_name: str) -> Type[Task]:\n # ...\n if task_name == \"RunXASAnalysis\":\n from .xas import RunXASAnalysis\n\n return RunXASAnalysis\n
"},{"location":"tutorial/new_task/#defining-an-executor","title":"Defining an Executor
","text":"The process of Executor
definition is identical to the process as described for ThirdPartyTask
s above. The one exception is that if you defined the Task to use MPI, as described in the section above (Using MPI for your Task), you will likely want to use the MPIExecutor
.
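For instance, if the RunXASAnalysis Task from the previous section were written with use_mpi=True, its entry in lute/managed_tasks.py might look like the line below; the managed Task name XASAnalyzer is a hypothetical choice that follows the noun-based naming convention:
# In lute/managed_tasks.py
XASAnalyzer: MPIExecutor = MPIExecutor("RunXASAnalysis")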
LUTE is publically available on GitHub. In order to run it, the first step is to clone the repository:
# Navigate to the directory of your choice.\ngit clone@github.com:slac-lcls/lute\n
The repository directory structure is as follows:
lute\n |--- config # Configuration YAML files (see below) and templates for third party config\n |--- docs # Documentation (including this page)\n |--- launch_scripts # Entry points for using SLURM and communicating with Airflow\n |--- lute # Code\n |--- run_task.py # Script to run an individual managed Task\n |--- ...\n |--- utilities # Help utility programs\n |--- workflows # This directory contains workflow definitions. It is synced elsewhere and not used directly.\n\n
In general, most interactions with the software will be through scripts located in the launch_scripts
directory. Some users (for certain use-cases) may also choose to run the run_task.py
script directly - it's location has been highlighted within hierarchy. To begin with you will need a YAML file, templates for which are available in the config
directory. The structure of the YAML file and how to use the various launch scripts are described in more detail below.
In the utilities
directory there are two useful programs to provide assistance with using the software:
utilities/dbview
: LUTE stores all parameters for every analysis routine it runs (as well as results) in a database. This database is stored in the work_dir
defined in the YAML file (see below). The dbview
utility is a TUI application (Text-based user interface) which runs in the terminal. It allows you to navigate a LUTE database using the arrow keys, etc. Usage is: utilities/dbview -p <path/to/lute.db>
.utilities/lute_help
: This utility provides help and usage information for running LUTE software. E.g., it provides access to parameter descriptions to assist in properly filling out a configuration YAML. It's usage is described in slightly more detail below.LUTE runs code as Task
s that are managed by an Executor
. The Executor
provides modifications to the environment the Task
runs in, as well as controls details of inter-process communication, reporting results to the eLog, etc. Combinations of specific Executor
s and Task
s are already provided, and are referred to as managed Task
s. Managed Task
s are submitted as a single unit. They can be run individually, or a series of independent steps can be submitted all at once in the form of a workflow, or directed acyclic graph (DAG). This latter option makes use of Airflow to manage the individual execution steps.
Running analysis with LUTE is the process of submitting one or more managed Task
s. This is generally a two step process.
Task
s which you may run.Task
submission, or workflow (DAG) submission.These two steps are described below.
"},{"location":"#preparing-a-configuration-yaml","title":"Preparing a Configuration YAML","text":"All Task
s are parameterized through a single configuration YAML file - even third party code which requires its own configuration files is managed through this YAML file. The basic structure is split into two documents, a brief header section which contains information that is applicable across all Task
s, such as the experiment name, run numbers and the working directory, followed by per Task
parameters:
%YAML 1.3\n---\ntitle: \"Some title.\"\nexperiment: \"MYEXP123\"\n# run: 12 # Does not need to be provided\ndate: \"2024/05/01\"\nlute_version: 0.1\ntask_timeout: 600\nwork_dir: \"/sdf/scratch/users/d/dorlhiac\"\n...\n---\nTaskOne:\n param_a: 123\n param_b: 456\n param_c:\n sub_var: 3\n sub_var2: 4\n\nTaskTwo:\n new_param1: 3\n new_param2: 4\n\n# ...\n...\n
In the first document, the header, it is important that the work_dir
is properly specified. This is the root directory from which Task
outputs will be written, and the LUTE database will be stored. It may also be desirable to modify the task_timeout
parameter which defines the time limit for individual Task
jobs. By default it is set to 10 minutes, although this may not be sufficient for long running jobs. This value will be applied to all Task
s so should account for the longest running job you expect.
The actual analysis parameters are defined in the second document. As these vary from Task
to Task
, a full description will not be provided here. An actual template with real Task
parameters is available in config/test.yaml
. Your analysis POC can also help you set up and choose the correct Task
s to include as a starting point. The template YAML file has further descriptions of what each parameter does and how to fill it out. You can also refer to the lute_help
program described under the following sub-heading.
Some things to consider and possible points of confusion:
Even though you submit managed Task
s, the parameters are defined at the Task
level. I.e. the managed Task
and Task
itself have different names, and the names in the YAML refer to the latter. This is because a single Task
can be run using different Executor
configurations, but using the same parameters. The list of managed Task
s is in lute/managed_tasks.py
. A table of some routines of interest is also provided below; each entry lists the managed Task, the Task it runs, and a short description.
SmallDataProducer (runs SubmitSMD): Smalldata production
CrystFELIndexer (runs IndexCrystFEL): Crystallographic indexing
PartialatorMerger (runs MergePartialator): Crystallographic merging
HKLComparer (runs CompareHKL): Crystallographic figures of merit
HKLManipulator (runs ManipulateHKL): Crystallographic format conversions
DimpleSolver (runs DimpleSolve): Crystallographic structure solution with molecular replacement
PeakFinderPyAlgos (runs FindPeaksPyAlgos): Peak finding with PyAlgos algorithm.
PeakFinderPsocake (runs FindPeaksPsocake): Peak finding with psocake algorithm.
StreamFileConcatenator (runs ConcatenateStreamFiles):
Stream file concatenation."},{"location":"#how-do-i-know-what-parameters-are-available-and-what-they-do","title":"How do I know what parameters are available, and what they do?","text":"A summary of Task
parameters is available through the lute_help
program.
> utilities/lute_help -t [TaskName]\n
Note, some parameters may say \"Unknown description\" - this either means they are using an old-style defintion that does not include parameter help, or they may have some internal use. In particular you will see this for lute_config
on every Task
, this parameter is filled in automatically and should be ignored. E.g. as an example:
> utilities/lute_help -t IndexCrystFEL\nINFO:__main__:Fetching parameter information for IndexCrystFEL.\nIndexCrystFEL\n-------------\nParameters for CrystFEL's `indexamajig`.\n\nThere are many parameters, and many combinations. For more information on\nusage, please refer to the CrystFEL documentation, here:\nhttps://www.desy.de/~twhite/crystfel/manual-indexamajig.html\n\n\nRequired Parameters:\n--------------------\n[...]\n\nAll Parameters:\n-------------\n[...]\n\nhighres (number)\n Mark all pixels greater than `x` has bad.\n\nprofile (boolean) - Default: False\n Display timing data to monitor performance.\n\ntemp_dir (string)\n Specify a path for the temp files folder.\n\nwait_for_file (integer) - Default: 0\n Wait at most `x` seconds for a file to be created. A value of -1 means wait forever.\n\nno_image_data (boolean) - Default: False\n Load only the metadata, no iamges. Can check indexability without high data requirements.\n\n[...]\n
"},{"location":"#running-managed-tasks-and-workflows-dags","title":"Running Managed Task
s and Workflows (DAGs)","text":"After a YAML file has been filled in you can run a Task
. There are multiple ways to submit a Task
, but there are 3 that are most likely:
Run a managed Task interactively by running python ...
Run a managed Task as a batch job (e.g. on S3DF) via a SLURM submission submit_slurm.sh ...
Submit a full workflow (DAG) made up of multiple managed Tasks.
or workflow you want to run. When submitting via SLURM or submitting an entire workflow there are additional parameters to control these processes.
Task
s interactively","text":"The simplest submission method is just to run Python interactively. In most cases this is not practical for long-running analysis, but may be of use for short Task
s or when debugging. From the root directory of the LUTE repository (or after installation) you can use the run_task.py
script:
> python -B [-O] run_task.py -t <ManagedTaskName> -c </path/to/config/yaml>\n
The command-line arguments in square brackets []
are optional, while those in <>
must be provided:
-O
is the flag controlling whether you run in debug or non-debug mode. By default, i.e. if you do NOT provide this flag you will run in debug mode which enables verbose printing. Passing -O
will turn off debug to minimize output.-t <ManagedTaskName>
is the name of the managed Task
you want to run.-c </path/...>
is the path to the configuration YAML.Task
as a batch job","text":"On S3DF you can also submit individual managed Task
s to run as batch jobs. To do so use launch_scripts/submit_slurm.sh
> launch_scripts/submit_slurm.sh -t <ManagedTaskName> -c </path/to/config/yaml> [--debug] $SLURM_ARGS\n
As before command-line arguments in square brackets []
are optional, while those in <>
must be provided
-t <ManagedTaskName>
is the name of the managed Task
you want to run.-c </path/...>
is the path to the configuration YAML.--debug
is the flag to control whether or not to run in debug mode.In addition to the LUTE-specific arguments, SLURM arguments must also be provided ($SLURM_ARGS
above). You can provide as many as you want; however you will need to at least provide:
--partition=<partition/queue>
- The queue to run on, in general for LCLS this is milano
--account=lcls:<experiment>
- The account to use for batch job accounting.You will likely also want to provide at a minimum:
--ntasks=<...>
to control the number of cores allocated.
) in order to avoid potential clashes with present or future LUTE arguments.
Finally, you can submit a full workflow (e.g. SFX analysis, smalldata production and summary results, geometry optimization...). This can be done using a single script, submit_launch_airflow.sh
, similarly to the SLURM submission above:
> launch_scripts/submit_launch_airflow.sh /path/to/lute/launch_scripts/launch_airflow.py -c </path/to/yaml.yaml> -w <dag_name> [--debug] [--test] [-e <exp>] [-r <run>] $SLURM_ARGS\n
The submission process is slightly more complicated in this case. A more in-depth explanation is provided under \"Airflow Launch Steps\", in the advanced usage section below if interested. The parameters are as follows - as before command-line arguments in square brackets []
are optional, while those in <>
must be provided:
launch_scripts/launch_airflow.py
script: the first argument must be the full path to this script, located in whatever LUTE installation you are running. All other arguments can come afterwards in any order.
is the path to the configuration YAML to use.-w <dag_name>
is the name of the DAG (workflow) to run. This replaces the task name provided when using the other two methods above. A DAG list is provided below.-W
(capital W) followed by the path to the workflow instead of -w
. See below for further discussion on this use case.--debug
controls whether to use debug mode (verbose printing)--test
controls whether to use the test or production instance of Airflow to manage the DAG. The instances are running identical versions of Airflow, but the test
instance may have \"test\" or more bleeding edge development DAGs.-e
is used to pass the experiment name. Needed if not using the ARP, i.e. running from the command-line.-r
is used to pass a run number. Needed if not using the ARP, i.e. running from the command-line.The $SLURM_ARGS
must be provided in the same manner as when submitting an individual managed Task
by hand to be run as batch job with the script above. Note that these parameters will be used as the starting point for the SLURM arguments of every managed Task
in the DAG; however, individual steps in the DAG may have overrides built-in where appropriate to make sure that step is not submitted with potentially incompatible arguments. For example, a single threaded analysis Task
may be capped to running on one core, even if in general everything should be running on 100 cores, per the SLURM argument provided. These caps are added during development and cannot be disabled through configuration changes in the YAML.
DAG List
find_peaks_index
psocake_sfx_phasing
pyalgos_sfx
eLog
","text":"You can use the script in the previous section to submit jobs through the eLog. To do so navigate to the Workflow > Definitions
tab using the blue navigation bar at the top of the eLog. On this tab, in the top-right corner (underneath the help and zoom icons) you can click the +
sign to add a new workflow. This will bring up a \"Workflow definition\" UI window. When filling out the eLog workflow definition the following fields are needed (all of them):
Name
: You can name the workflow anything you like. It should probably be something descriptive, e.g. if you are using LUTE to run smalldata_tools, you may call the workflow lute_smd
.Executable
: In this field you will put the full path to the submit_launch_airflow.sh
script: /path/to/lute/launch_scripts/submit_launch_airflow.sh
.Parameters
: You will use the parameters as described above. Remember the first argument will be the full path to the launch_airflow.py
script (this is NOT the same as the bash script used in the executable!): /full/path/to/lute/launch_scripts/launch_airflow.py -c <path/to/yaml> -w <dag_name> [--debug] [--test] $SLURM_ARGS
Location
: Be sure to set to S3DF
.Trigger
: You can have the workflow trigger automatically or manually. Which option to choose will depend on the type of workflow you are running. In general the options Manually triggered
(which displays as MANUAL
on the definitions page) and End of a run
(which displays as END_OF_RUN
on the definitions page) are safe options for ALL workflows. The latter will be automatically submitted for you when data acquisition has finished. If you are running a workflow with managed Task
s that work as data is being acquired (e.g. SmallDataProducer
), you may also select Start of a run
(which displays as START_OF_RUN
on the definitions page).Upon clicking create you will see a new entry in the table on the definitions page. In order to run MANUAL
workflows, or re-run automatic workflows, you must navigate to the Workflows > Control
tab. For each acquisition run you will find a drop down menu under the Job
column. To submit a workflow you select it from this drop down menu by the Name
you provided when creating its definition.
Using validator
s, it is possible to define (generally default) model parameters for a Task
in terms of other parameters. It is also possible to use validated Pydantic model parameters to substitute values into a configuration file required to run a third party Task
(e.g. some Task
s may require their own JSON, TOML files, etc. to run properly). For more information on these types of substitutions, refer to the new_task.md
documentation on Task
creation.
These types of substitutions, however, have a limitation in that they are not easily adapted at run time. They therefore address only a small number of the possible combinations in the dependencies between different input parameters. In order to support more complex relationships between parameters, variable substitutions can also be used in the configuration YAML itself. Using a syntax similar to Jinja
templates, you can define values for YAML parameters in terms of other parameters or environment variables. The values are substituted before Pydantic attempts to validate the configuration.
It is perhaps easiest to illustrate with an example. A test case is provided in config/test_var_subs.yaml
and is reproduced here:
%YAML 1.3\n---\ntitle: \"Configuration to Test YAML Substitution\"\nexperiment: \"TestYAMLSubs\"\nrun: 12\ndate: \"2024/05/01\"\nlute_version: 0.1\ntask_timeout: 600\nwork_dir: \"/sdf/scratch/users/d/dorlhiac\"\n...\n---\nOtherTask:\n useful_other_var: \"USE ME!\"\n\nNonExistentTask:\n test_sub: \"/path/to/{{ experiment }}/file_r{{ run:04d }}.input\" # Substitute `experiment` and `run` from header above\n test_env_sub: \"/path/to/{{ $EXPERIMENT }}/file.input\" # Substitute from the environment variable $EXPERIMENT\n test_nested:\n a: \"outfile_{{ run }}_one.out\" # Substitute `run` from header above\n b:\n c: \"outfile_{{ run }}_two.out\" # Also substitute `run` from header above\n d: \"{{ OtherTask.useful_other_var }}\" # Substitute `useful_other_var` from `OtherTask`\n test_fmt: \"{{ run:04d }}\" # Subsitute `run` and format as 0012\n test_env_fmt: \"{{ $RUN:04d }}\" # Substitute environment variable $RUN and pad to 4 w/ zeros\n...\n
Input parameters in the config YAML can be substituted with either other input parameters or environment variables, with or without limited string formatting. All substitutions occur between double curly brackets: {{ VARIABLE_TO_SUBSTITUTE }}
. Environment variables are indicated by $
in front of the variable name. Parameters from the header, i.e. the first YAML document (top section) containing the run
, experiment
, version fields, etc. can be substituted without any qualification. If you want to use the run
parameter, you can substitute it using {{ run }}
. All other parameters, i.e. from other Task
s or within Task
s, must use a qualified name. Nested levels are delimited using a .
. E.g. consider a structure like:
Task:\n param_set:\n a: 1\n b: 2\n c: 3\n
In order to use parameter c
, you would use {{ Task.param_set.c }}
as the substitution.
Take care when using substitutions! This process will not try to guess for you. When a substitution is not available, e.g. due to misspelling, one of two things will happen:
param: /my/failed/{{ $SUBSTITUTION }}
as your parameter. This may or may not fail the model validation step, but is likely not what you intended.Defining your own parameters
The configuration file is not validated in its totality, only on a Task
-by-Task
basis, but it is read in its totality. E.g. when running MyTask
only that portion of the configuration is validated even though the entire file has been read, and is available for substitutions. As a result, it is safe to introduce extra entries into the YAML file, as long as they are not entered under a specific Task
's configuration. This may be useful to create your own global substitutions, for example if there is a key variable that may be used across different Task
s. E.g. Consider a case where you want to create a more generic configuration file where a single variable is used by multiple Task
s. This single variable may be changed between experiments, for instance, but is likely static for the duration of a single set of analyses. In order to avoid a mistake when changing the configuration between experiments you can define this special variable (or variables) as a separate entry in the YAML, and make use of substitutions in each Task
's configuration. This way the variable only needs to be changed in one place.
# Define our substitution. This is only for substitutions!\nMY_SPECIAL_SUB: \"EXPMT_DEPENDENT_VALUE\" # Can change here once per experiment!\n\nRunTask1:\n special_var: \"{{ MY_SPECIAL_SUB }}\"\n var_1: 1\n var_2: \"a\"\n # ...\n\nRunTask2:\n special_var: \"{{ MY_SPECIAL_SUB }}\"\n var_3: \"abcd\"\n var_4: 123\n # ...\n\nRunTask3:\n special_var: \"{{ MY_SPECIAL_SUB }}\"\n #...\n\n# ... and so on\n
"},{"location":"#gotchas","title":"Gotchas!","text":"Order matters
While in general you can use parameters that appear later in a YAML document to substitute for values of parameters that appear earlier, the substitutions themselves will be performed in order of appearance. It is therefore NOT possible to correctly use a later parameter as a substitution for an earlier one, if the later one itself depends on a substitution. The YAML document, however, can be rearranged without error. The order in the YAML document has no effect on execution order which is determined purely by the workflow definition. As mentioned above, the document is not validated in its entirety so rearrangements are allowed. For example consider the following situation which produces an incorrect substitution:
%YAML 1.3\n---\ntitle: \"Configuration to Test YAML Substitution\"\nexperiment: \"TestYAMLSubs\"\nrun: 12\ndate: \"2024/05/01\"\nlute_version: 0.1\ntask_timeout: 600\nwork_dir: \"/sdf/data/lcls/ds/exp/experiment/scratch\"\n...\n---\nRunTaskOne:\n input_dir: \"{{ RunTaskTwo.path }}\" # Will incorrectly be \"{{ work_dir }}/additional_path/{{ $RUN }}\"\n # ...\n\nRunTaskTwo:\n # Remember `work_dir` and `run` come from the header document and don't need to\n # be qualified\n path: \"{{ work_dir }}/additional_path/{{ run }}\"\n...\n
This configuration can be rearranged to achieve the desired result:
%YAML 1.3\n---\ntitle: \"Configuration to Test YAML Substitution\"\nexperiment: \"TestYAMLSubs\"\nrun: 12\ndate: \"2024/05/01\"\nlute_version: 0.1\ntask_timeout: 600\nwork_dir: \"/sdf/data/lcls/ds/exp/experiment/scratch\"\n...\n---\nRunTaskTwo:\n # Remember `work_dir` comes from the header document and doesn't need to be qualified\n path: \"{{ work_dir }}/additional_path/{{ run }}\"\n\nRunTaskOne:\n input_dir: \"{{ RunTaskTwo.path }}\" # Will now be /sdf/data/lcls/ds/exp/experiment/scratch/additional_path/12\n # ...\n...\n
On the other hand, relationships such as these may point to inconsistencies in the dependencies between Task
s which may warrant a refactor.
Found unhashable key
To avoid YAML parsing issues when using the substitution syntax, be sure to quote your substitutions. Before substitution is performed, a dictionary is first constructed by the pyyaml
package which parses the document - it may fail to parse the document and raise an exception if the substitutions are not quoted. E.g.
# USE THIS\nMyTask:\n var_sub: \"{{ other_var:04d }}\"\n\n# **DO NOT** USE THIS\nMyTask:\n var_sub: {{ other_var:04d }}\n
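If you want to see where this error comes from, the short standalone snippet below (not part of LUTE) reproduces it with pyyaml directly:
import yaml\n\nquoted = '''\nMyTask:\n  var_sub: \"{{ other_var:04d }}\"\n'''\nunquoted = '''\nMyTask:\n  var_sub: {{ other_var:04d }}\n'''\n\nprint(yaml.safe_load(quoted))  # {'MyTask': {'var_sub': '{{ other_var:04d }}'}}\ntry:\n    yaml.safe_load(unquoted)\nexcept yaml.YAMLError as err:\n    # The bare {{ ... }} parses as a flow mapping used as a mapping key, which is\n    # unhashable - this is where the \"found unhashable key\" message comes from.\n    print(f'Unquoted substitution failed to parse: {err}')\n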
During validation, Pydantic will by default cast variables if possible; because of this, it is generally safe to use strings for substitutions. E.g. if your parameter is expecting an integer, and after substitution you pass \"2\"
, Pydantic will cast this to the int
2
, and validation will succeed. As part of the substitution process limited type casting will also be handled if it is necessary for any formatting strings provided. E.g. \"{{ run:04d }}\"
requires that run be an integer, so it will be treated as such in order to apply the formatting.
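Conceptually, the substitution step behaves like the following standalone sketch. This is an illustration only, not LUTE's actual implementation; the function name and regular expression are invented for the example:
import re\n\ndef substitute(value: str, available: dict) -> str:\n    '''Replace {{ name }} or {{ name:fmt }} with values from available.'''\n    pattern = re.compile('{{ *([^:} ]+) *(?::([^} ]+))? *}}')\n\n    def repl(match: re.Match) -> str:\n        name, fmt = match.group(1), match.group(2)\n        sub = available[name]\n        if fmt is None:\n            return str(sub)\n        if fmt.endswith('d'):\n            # A format spec like 04d implies an integer, so cast before formatting.\n            sub = int(sub)\n        return format(sub, fmt)\n\n    return pattern.sub(repl, value)\n\nprint(substitute('file_r{{ run:04d }}.input', {'run': 12}))  # file_r0012.input\n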
In most cases, standard DAGs should be called as described above. However, Airflow also supports the dynamic creation of DAGs, e.g. to vary the input data to various steps, or the number of steps that will occur. Some of this functionality has been used to allow for user-defined DAGs which are passed in the form of a dictionary, allowing Airflow to construct the workflow as it is running.
A basic YAML syntax is used to construct a series of nested dictionaries which define a DAG. Consider a simplified serial femtosecond crystallography DAG which runs peak finding through merging and then calculates some statistics. I.e. we want an execution order that looks like:
peak_finder >> indexer >> merger >> hkl_comparer\n
We can alternatively define this DAG in YAML:
task_name: PeakFinderPyAlgos\nslurm_params: ''\nnext:\n- task_name: CrystFELIndexer\n  slurm_params: ''\n  next:\n  - task_name: PartialatorMerger\n    slurm_params: ''\n    next:\n    - task_name: HKLComparer\n      slurm_params: ''\n      next: []\n
I.e. we define a tree where each node is constructed using Node(task_name: str, slurm_params: str, next: List[Node])
.
task_name
is the name of a managed Task
. This name must be identical to a managed Task
defined in the LUTE installation you are using.slurm_params
. This is a complete string of all the arguments to use for the corresponding managed Task
. Use of this field is all or nothing! - if it is left as an empty string, the default parameters (passed on the command-line using the launch script) are used, otherwise this string is used in its stead. Because of this remember to include a partition and account if using it.next
field is composed of either an empty list (meaning no managed Task
s are run after the current node), or additional nodes. All nodes in the next
list are run in parallel.As a second example, to run task1
followed by task2
and task3
in parallel we would use:
task_name: Task1\nslurm_params: ''\nnext:\n- task_name: Task2\n slurm_params: ''\n next: []\n- task_name: Task3\n slurm_params: ''\n next: []\n
In order to run a DAG defined in this way, we pass the path to the YAML file we have defined it in to the launch script using -W <path_to_dag>
. This is instead of calling it by name. E.g.
/path/to/lute/launch_scripts/submit_launch_airflow.sh /path/to/lute/launch_scripts/launch_airflow.py -e <exp> -r <run> -c /path/to/config -W <path_to_dag> --test [--debug] [SLURM_ARGS]\n
Note that fewer options are currently supported for configuring the operators for each step of the DAG. The slurm arguments can be replaced in their entirety using a custom slurm_params
string but individual options cannot be modified.
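Before submitting a user-defined DAG, it can be helpful to load the file and walk the tree yourself to confirm the intended execution order. The following is a minimal, illustrative sketch (my_dag.yaml is a placeholder path; LUTE and Airflow perform their own parsing):
import yaml\n\ndef print_dag(node: dict, depth: int = 0) -> None:\n    '''Recursively print the execution tree of a user-defined DAG.'''\n    print('  ' * depth + node['task_name'])\n    for child in node.get('next') or []:\n        # Children at the same level run in parallel after the current node.\n        print_dag(child, depth + 1)\n\nwith open('my_dag.yaml', 'r') as f:\n    print_dag(yaml.safe_load(f))\n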
Special markers have been inserted at certain points in the execution flow for LUTE. These can be enabled by setting the environment variables detailed below. These are intended to allow developers to exit the program at certain points to investigate behaviour or a bug. For instance, when working on configuration parsing, an environment variable can be set which exits the program after passing this step. This allows you to run LUTE otherwise as normal (described above), without having to modify any additional code or insert your own early exits.
Types of debug markers:
LUTE_DEBUG_EXIT
: Will exit the program at this point if the corresponding environment variable has been set.Developers can insert these markers as needed into their code to add new exit points, although as a rule of thumb they should be used sparingly, and generally only after major steps in the execution flow (e.g. after parsing, after beginning a task, after returning a result, etc.).
In order to include a new marker in your code:
from lute.execution.debug_utils import LUTE_DEBUG_EXIT\n\ndef my_code() -> None:\n # ...\n LUTE_DEBUG_EXIT(\"MYENVVAR\", \"Additional message to print\")\n # If MYENVVAR is not set, the above function does nothing\n
You can enable a marker by setting to 1, e.g. to enable the example marker above while running Tester
:
MYENVVAR=1 python -B run_task.py -t Tester -c config/test.yaml\n
"},{"location":"#currently-used-environment-variables","title":"Currently used environment variables","text":"LUTE_DEBUG_EXIT_AT_YAML
: Exits the program after reading in a YAML configuration file and performing variable substitutions, but BEFORE Pydantic validation.LUTE_DEBUG_BEFORE_TPP_EXEC
: Exits the program after a ThirdPartyTask has prepared its submission command, but before exec
is used to run it.The Airflow launch process actually involves a number of steps, and is rather complicated. There are two wrapper steps prior to getting to the actual Airflow API communication.
launch_scripts/submit_launch_airflow.sh
is run./sdf/group/lcls/ds/tools/lute_launcher
with all the same parameters that it was called with.lute_launcher
runs the launch_scripts/launch_airflow.py
script which was provided as the first argument. This is the true launch scriptlaunch_airflow.py
communicates with the Airflow API, requesting that a specific DAG be launched. It then continues to run, and gathers the individual logs and the exit status of each step of the DAG.launch_scripts/submit_slurm.sh
.There are some specific reasons for this complexity:
submit_launch_airflow.sh
as a thin-wrapper around lute_launcher
is to allow the true Airflow launch script to be a long-lived job. This is for compatibility with the eLog and the ARP. When run from the eLog as a workflow, the job submission process must occur within 30 seconds due to a timeout built-in to the system. This is fine when submitting jobs to run on the batch-nodes, as the submission to the queue takes very little time. So here, submit_launch_airflow.sh
serves as a thin script to have lute_launcher
run as a batch job. It can then run as a long-lived job (for the duration of the entire DAG) collecting log files all in one place. This allows the log for each stage of the Airflow DAG to be inspected in a single file, and through the eLog browser interface.lute_launcher
as a wrapper around launch_airflow.py
is to manage authentication and credentials. The launch_airflow.py
script requires loading credentials in order to authenticate against the Airflow API. For the average user this is not possible, unless the script is run from within the lute_launcher
process. LUTE is publicly available on GitHub. In order to run it, the first step is to clone the repository:
# Navigate to the directory of your choice.\ngit clone git@github.com:slac-lcls/lute\n
The repository directory structure is as follows:
lute\n |--- config # Configuration YAML files (see below) and templates for third party config\n |--- docs # Documentation (including this page)\n |--- launch_scripts # Entry points for using SLURM and communicating with Airflow\n |--- lute # Code\n |--- run_task.py # Script to run an individual managed Task\n |--- ...\n |--- utilities # Help utility programs\n |--- workflows # This directory contains workflow definitions. It is synced elsewhere and not used directly.\n\n
In general, most interactions with the software will be through scripts located in the launch_scripts
directory. Some users (for certain use-cases) may also choose to run the run_task.py
script directly - its location has been highlighted within the hierarchy above. To begin with you will need a YAML file, templates for which are available in the config
directory. The structure of the YAML file and how to use the various launch scripts are described in more detail below.
In the utilities
directory there are two useful programs to provide assistance with using the software:
utilities/dbview
: LUTE stores all parameters for every analysis routine it runs (as well as results) in a database. This database is stored in the work_dir
defined in the YAML file (see below). The dbview
utility is a TUI application (Text-based user interface) which runs in the terminal. It allows you to navigate a LUTE database using the arrow keys, etc. Usage is: utilities/dbview -p <path/to/lute.db>
.utilities/lute_help
: This utility provides help and usage information for running LUTE software. E.g., it provides access to parameter descriptions to assist in properly filling out a configuration YAML. Its usage is described in slightly more detail below.
s that are managed by an Executor
. The Executor
provides modifications to the environment the Task
runs in, as well as controls details of inter-process communication, reporting results to the eLog, etc. Combinations of specific Executor
s and Task
s are already provided, and are referred to as managed Task
s. Managed Task
s are submitted as a single unit. They can be run individually, or a series of independent steps can be submitted all at once in the form of a workflow, or directed acyclic graph (DAG). This latter option makes use of Airflow to manage the individual execution steps.
Running analysis with LUTE is the process of submitting one or more managed Task
s. This is generally a two step process.
Task
s which you may run.Task
submission, or workflow (DAG) submission.These two steps are described below.
"},{"location":"usage/#preparing-a-configuration-yaml","title":"Preparing a Configuration YAML","text":"All Task
s are parameterized through a single configuration YAML file - even third party code which requires its own configuration files is managed through this YAML file. The basic structure is split into two documents, a brief header section which contains information that is applicable across all Task
s, such as the experiment name, run numbers and the working directory, followed by per Task
parameters:
%YAML 1.3\n---\ntitle: \"Some title.\"\nexperiment: \"MYEXP123\"\n# run: 12 # Does not need to be provided\ndate: \"2024/05/01\"\nlute_version: 0.1\ntask_timeout: 600\nwork_dir: \"/sdf/scratch/users/d/dorlhiac\"\n...\n---\nTaskOne:\n param_a: 123\n param_b: 456\n param_c:\n sub_var: 3\n sub_var2: 4\n\nTaskTwo:\n new_param1: 3\n new_param2: 4\n\n# ...\n...\n
In the first document, the header, it is important that the work_dir
is properly specified. This is the root directory from which Task
outputs will be written, and the LUTE database will be stored. It may also be desirable to modify the task_timeout
parameter which defines the time limit for individual Task
jobs. By default it is set to 10 minutes, although this may not be sufficient for long running jobs. This value will be applied to all Task
s so should account for the longest running job you expect.
The actual analysis parameters are defined in the second document. As these vary from Task
to Task
, a full description will not be provided here. An actual template with real Task
parameters is available in config/test.yaml
. Your analysis POC can also help you set up and choose the correct Task
s to include as a starting point. The template YAML file has further descriptions of what each parameter does and how to fill it out. You can also refer to the lute_help
program described under the following sub-heading.
Some things to consider and possible points of confusion:
Task
s, the parameters are defined at the Task
level. I.e. the managed Task
and Task
itself have different names, and the names in the YAML refer to the latter. This is because a single Task
can be run using different Executor
configurations, but using the same parameters. The list of managed Task
s is in lute/managed_tasks.py
. A table is also provided below for some routines of interest..Task
The Task
it Runs Task
Description SmallDataProducer
SubmitSMD
Smalldata production CrystFELIndexer
IndexCrystFEL
Crystallographic indexing PartialatorMerger
MergePartialator
Crystallographic merging HKLComparer
CompareHKL
Crystallographic figures of merit HKLManipulator
ManipulateHKL
Crystallographic format conversions DimpleSolver
DimpleSolve
Crystallographic structure solution with molecular replacement PeakFinderPyAlgos
FindPeaksPyAlgos
Peak finding with PyAlgos algorithm. PeakFinderPsocake
FindPeaksPsocake
Peak finding with psocake algorithm. StreamFileConcatenator
ConcatenateStreamFiles
Stream file concatenation."},{"location":"usage/#how-do-i-know-what-parameters-are-available-and-what-they-do","title":"How do I know what parameters are available, and what they do?","text":"A summary of Task
parameters is available through the lute_help
program.
> utilities/lute_help -t [TaskName]\n
Note that some parameters may say \"Unknown description\" - this either means they are using an old-style definition that does not include parameter help, or they may have some internal use. In particular you will see this for lute_config
on every Task
: this parameter is filled in automatically and should be ignored. For example:
> utilities/lute_help -t IndexCrystFEL\nINFO:__main__:Fetching parameter information for IndexCrystFEL.\nIndexCrystFEL\n-------------\nParameters for CrystFEL's `indexamajig`.\n\nThere are many parameters, and many combinations. For more information on\nusage, please refer to the CrystFEL documentation, here:\nhttps://www.desy.de/~twhite/crystfel/manual-indexamajig.html\n\n\nRequired Parameters:\n--------------------\n[...]\n\nAll Parameters:\n-------------\n[...]\n\nhighres (number)\n Mark all pixels greater than `x` has bad.\n\nprofile (boolean) - Default: False\n Display timing data to monitor performance.\n\ntemp_dir (string)\n Specify a path for the temp files folder.\n\nwait_for_file (integer) - Default: 0\n Wait at most `x` seconds for a file to be created. A value of -1 means wait forever.\n\nno_image_data (boolean) - Default: False\n Load only the metadata, no iamges. Can check indexability without high data requirements.\n\n[...]\n
"},{"location":"usage/#running-managed-tasks-and-workflows-dags","title":"Running Managed Task
s and Workflows (DAGs)","text":"After a YAML file has been filled in you can run a Task
. There are multiple ways to submit a Task
, but there are 3 that are most likely:
Task
interactively by running python ...
Task
as a batch job (e.g. on S3DF) via a SLURM submission submit_slurm.sh ...
Task
s).These will be covered in turn below; however, in general all methods will require two parameters: the path to a configuration YAML file, and the name of the managed Task
or workflow you want to run. When submitting via SLURM or submitting an entire workflow there are additional parameters to control these processes.
Task
s interactively","text":"The simplest submission method is just to run Python interactively. In most cases this is not practical for long-running analysis, but may be of use for short Task
s or when debugging. From the root directory of the LUTE repository (or after installation) you can use the run_task.py
script:
> python -B [-O] run_task.py -t <ManagedTaskName> -c </path/to/config/yaml>\n
The command-line arguments in square brackets []
are optional, while those in <>
must be provided:
-O
is the flag controlling whether you run in debug or non-debug mode. By default, i.e. if you do NOT provide this flag you will run in debug mode which enables verbose printing. Passing -O
will turn off debug to minimize output.-t <ManagedTaskName>
is the name of the managed Task
you want to run.-c </path/...>
is the path to the configuration YAML.Task
as a batch job","text":"On S3DF you can also submit individual managed Task
s to run as batch jobs. To do so use launch_scripts/submit_slurm.sh
> launch_scripts/submit_slurm.sh -t <ManagedTaskName> -c </path/to/config/yaml> [--debug] $SLURM_ARGS\n
As before command-line arguments in square brackets []
are optional, while those in <>
must be provided
-t <ManagedTaskName>
is the name of the managed Task
you want to run.-c </path/...>
is the path to the configuration YAML.--debug
is the flag to control whether or not to run in debug mode.In addition to the LUTE-specific arguments, SLURM arguments must also be provided ($SLURM_ARGS
above). You can provide as many as you want; however you will need to at least provide:
--partition=<partition/queue>
- The queue to run on, in general for LCLS this is milano
--account=lcls:<experiment>
- The account to use for batch job accounting.You will likely also want to provide at a minimum:
--ntasks=<...>
to control the number of cores in allocated.In general, it is best to prefer the long-form of the SLURM-argument (--arg=<...>
) in order to avoid potential clashes with present or future LUTE arguments.
Finally, you can submit a full workflow (e.g. SFX analysis, smalldata production and summary results, geometry optimization...). This can be done using a single script, submit_launch_airflow.sh
, similarly to the SLURM submission above:
> launch_scripts/submit_launch_airflow.sh /path/to/lute/launch_scripts/launch_airflow.py -c </path/to/yaml.yaml> -w <dag_name> [--debug] [--test] [-e <exp>] [-r <run>] $SLURM_ARGS\n
The submission process is slightly more complicated in this case. A more in-depth explanation is provided under \"Airflow Launch Steps\", in the advanced usage section below if interested. The parameters are as follows - as before command-line arguments in square brackets []
are optional, while those in <>
must be provided:
launch_scripts/launch_airflow.py
script located in whatever LUTE installation you are running. All other arguments can come afterwards in any order.-c </path/...>
is the path to the configuration YAML to use.-w <dag_name>
is the name of the DAG (workflow) to run. This replaces the task name provided when using the other two methods above. A DAG list is provided below.-W
(capital W) followed by the path to the workflow instead of -w
. See below for further discussion on this use case.--debug
controls whether to use debug mode (verbose printing)--test
controls whether to use the test or production instance of Airflow to manage the DAG. The instances are running identical versions of Airflow, but the test
instance may have \"test\" or more bleeding edge development DAGs.-e
is used to pass the experiment name. Needed if not using the ARP, i.e. running from the command-line.-r
is used to pass a run number. Needed if not using the ARP, i.e. running from the command-line.The $SLURM_ARGS
must be provided in the same manner as when submitting an individual managed Task
by hand to be run as batch job with the script above. Note that these parameters will be used as the starting point for the SLURM arguments of every managed Task
in the DAG; however, individual steps in the DAG may have overrides built-in where appropriate to make sure that step is not submitted with potentially incompatible arguments. For example, a single threaded analysis Task
may be capped to running on one core, even if in general everything should be running on 100 cores, per the SLURM argument provided. These caps are added during development and cannot be disabled through configuration changes in the YAML.
DAG List
find_peaks_index
psocake_sfx_phasing
pyalgos_sfx
eLog
","text":"You can use the script in the previous section to submit jobs through the eLog. To do so navigate to the Workflow > Definitions
tab using the blue navigation bar at the top of the eLog. On this tab, in the top-right corner (underneath the help and zoom icons) you can click the +
sign to add a new workflow. This will bring up a \"Workflow definition\" UI window. When filling out the eLog workflow definition the following fields are needed (all of them):
Name
: You can name the workflow anything you like. It should probably be something descriptive, e.g. if you are using LUTE to run smalldata_tools, you may call the workflow lute_smd
.Executable
: In this field you will put the full path to the submit_launch_airflow.sh
script: /path/to/lute/launch_scripts/submit_launch_airflow.sh
.Parameters
: You will use the parameters as described above. Remember the first argument will be the full path to the launch_airflow.py
script (this is NOT the same as the bash script used in the executable!): /full/path/to/lute/launch_scripts/launch_airflow.py -c <path/to/yaml> -w <dag_name> [--debug] [--test] $SLURM_ARGS
Location
: Be sure to set to S3DF
.Trigger
: You can have the workflow trigger automatically or manually. Which option to choose will depend on the type of workflow you are running. In general the options Manually triggered
(which displays as MANUAL
on the definitions page) and End of a run
(which displays as END_OF_RUN
on the definitions page) are safe options for ALL workflows. The latter will be automatically submitted for you when data acquisition has finished. If you are running a workflow with managed Task
s that work as data is being acquired (e.g. SmallDataProducer
), you may also select Start of a run
(which displays as START_OF_RUN
on the definitions page).Upon clicking create you will see a new entry in the table on the definitions page. In order to run MANUAL
workflows, or re-run automatic workflows, you must navigate to the Workflows > Control
tab. For each acquisition run you will find a drop down menu under the Job
column. To submit a workflow you select it from this drop down menu by the Name
you provided when creating its definition.
Using validator
s, it is possible to define (generally, default) model parameters for a Task
in terms of other parameters. It is also possible to use validated Pydantic model parameters to substitute values into a configuration file required to run a third party Task
(e.g. some Task
s may require their own JSON, TOML files, etc. to run properly). For more information on these types of substitutions, refer to the new_task.md
documentation on Task
creation.
These types of substitutions, however, have a limitation in that they are not easily adapted at run time. They therefore address only a small number of the possible combinations in the dependencies between different input parameters. In order to support more complex relationships between parameters, variable substitutions can also be used in the configuration YAML itself. Using a syntax similar to Jinja
templates, you can define values for YAML parameters in terms of other parameters or environment variables. The values are substituted before Pydantic attempts to validate the configuration.
It is perhaps easiest to illustrate with an example. A test case is provided in config/test_var_subs.yaml
and is reproduced here:
%YAML 1.3\n---\ntitle: \"Configuration to Test YAML Substitution\"\nexperiment: \"TestYAMLSubs\"\nrun: 12\ndate: \"2024/05/01\"\nlute_version: 0.1\ntask_timeout: 600\nwork_dir: \"/sdf/scratch/users/d/dorlhiac\"\n...\n---\nOtherTask:\n useful_other_var: \"USE ME!\"\n\nNonExistentTask:\n test_sub: \"/path/to/{{ experiment }}/file_r{{ run:04d }}.input\" # Substitute `experiment` and `run` from header above\n test_env_sub: \"/path/to/{{ $EXPERIMENT }}/file.input\" # Substitute from the environment variable $EXPERIMENT\n test_nested:\n a: \"outfile_{{ run }}_one.out\" # Substitute `run` from header above\n b:\n c: \"outfile_{{ run }}_two.out\" # Also substitute `run` from header above\n d: \"{{ OtherTask.useful_other_var }}\" # Substitute `useful_other_var` from `OtherTask`\n test_fmt: \"{{ run:04d }}\" # Subsitute `run` and format as 0012\n test_env_fmt: \"{{ $RUN:04d }}\" # Substitute environment variable $RUN and pad to 4 w/ zeros\n...\n
Input parameters in the config YAML can be substituted with either other input parameters or environment variables, with or without limited string formatting. All substitutions occur between double curly brackets: {{ VARIABLE_TO_SUBSTITUTE }}
. Environment variables are indicated by $
in front of the variable name. Parameters from the header, i.e. the first YAML document (top section) containing the run
, experiment
, version fields, etc. can be substituted without any qualification. If you want to use the run
parameter, you can substitute it using {{ run }}
. All other parameters, i.e. from other Task
s or within Task
s, must use a qualified name. Nested levels are delimited using a .
. E.g. consider a structure like:
Task:\n param_set:\n a: 1\n b: 2\n c: 3\n
In order to use parameter c
, you would use {{ Task.param_set.c }}
as the substitution.
Take care when using substitutions! This process will not try to guess for you. When a substitution is not available, e.g. due to misspelling, one of two things will happen:
param: /my/failed/{{ $SUBSTITUTION }}
as your parameter. This may or may not fail the model validation step, but is likely not what you intended.Defining your own parameters
The configuration file is not validated in its totality, only on a Task
-by-Task
basis, but it is read in its totality. E.g. when running MyTask
only that portion of the configuration is validated even though the entire file has been read, and is available for substitutions. As a result, it is safe to introduce extra entries into the YAML file, as long as they are not entered under a specific Task
's configuration. This may be useful to create your own global substitutions, for example if there is a key variable that may be used across different Task
s. E.g. Consider a case where you want to create a more generic configuration file where a single variable is used by multiple Task
s. This single variable may be changed between experiments, for instance, but is likely static for the duration of a single set of analyses. In order to avoid a mistake when changing the configuration between experiments you can define this special variable (or variables) as a separate entry in the YAML, and make use of substitutions in each Task
's configuration. This way the variable only needs to be changed in one place.
# Define our substitution. This is only for substitutions!\nMY_SPECIAL_SUB: \"EXPMT_DEPENDENT_VALUE\" # Can change here once per experiment!\n\nRunTask1:\n special_var: \"{{ MY_SPECIAL_SUB }}\"\n var_1: 1\n var_2: \"a\"\n # ...\n\nRunTask2:\n special_var: \"{{ MY_SPECIAL_SUB }}\"\n var_3: \"abcd\"\n var_4: 123\n # ...\n\nRunTask3:\n special_var: \"{{ MY_SPECIAL_SUB }}\"\n #...\n\n# ... and so on\n
"},{"location":"usage/#gotchas","title":"Gotchas!","text":"Order matters
While in general you can use parameters that appear later in a YAML document to substitute for values of parameters that appear earlier, the substitutions themselves will be performed in order of appearance. It is therefore NOT possible to correctly use a later parameter as a substitution for an earlier one, if the later one itself depends on a substitution. The YAML document, however, can be rearranged without error. The order in the YAML document has no effect on execution order which is determined purely by the workflow definition. As mentioned above, the document is not validated in its entirety so rearrangements are allowed. For example consider the following situation which produces an incorrect substitution:
%YAML 1.3\n---\ntitle: \"Configuration to Test YAML Substitution\"\nexperiment: \"TestYAMLSubs\"\nrun: 12\ndate: \"2024/05/01\"\nlute_version: 0.1\ntask_timeout: 600\nwork_dir: \"/sdf/data/lcls/ds/exp/experiment/scratch\"\n...\n---\nRunTaskOne:\n input_dir: \"{{ RunTaskTwo.path }}\" # Will incorrectly be \"{{ work_dir }}/additional_path/{{ $RUN }}\"\n # ...\n\nRunTaskTwo:\n # Remember `work_dir` and `run` come from the header document and don't need to\n # be qualified\n path: \"{{ work_dir }}/additional_path/{{ run }}\"\n...\n
This configuration can be rearranged to achieve the desired result:
%YAML 1.3\n---\ntitle: \"Configuration to Test YAML Substitution\"\nexperiment: \"TestYAMLSubs\"\nrun: 12\ndate: \"2024/05/01\"\nlute_version: 0.1\ntask_timeout: 600\nwork_dir: \"/sdf/data/lcls/ds/exp/experiment/scratch\"\n...\n---\nRunTaskTwo:\n # Remember `work_dir` comes from the header document and doesn't need to be qualified\n path: \"{{ work_dir }}/additional_path/{{ run }}\"\n\nRunTaskOne:\n input_dir: \"{{ RunTaskTwo.path }}\" # Will now be /sdf/data/lcls/ds/exp/experiment/scratch/additional_path/12\n # ...\n...\n
On the other hand, relationships such as these may point to inconsistencies in the dependencies between Task
s which may warrant a refactor.
Found unhashable key
To avoid YAML parsing issues when using the substitution syntax, be sure to quote your substitutions. Before substitution is performed, a dictionary is first constructed by the pyyaml
package which parses the document - it may fail to parse the document and raise an exception if the substitutions are not quoted. E.g.
# USE THIS\nMyTask:\n var_sub: \"{{ other_var:04d }}\"\n\n# **DO NOT** USE THIS\nMyTask:\n var_sub: {{ other_var:04d }}\n
During validation, Pydantic will by default cast variables if possible; because of this, it is generally safe to use strings for substitutions. E.g. if your parameter is expecting an integer, and after substitution you pass \"2\"
, Pydantic will cast this to the int
2
, and validation will succeed. As part of the substitution process limited type casting will also be handled if it is necessary for any formatting strings provided. E.g. \"{{ run:04d }}\"
requires that run be an integer, so it will be treated as such in order to apply the formatting.
In most cases, standard DAGs should be called as described above. However, Airflow also supports the dynamic creation of DAGs, e.g. to vary the input data to various steps, or the number of steps that will occur. Some of this functionality has been used to allow for user-defined DAGs which are passed in the form of a dictionary, allowing Airflow to construct the workflow as it is running.
A basic YAML syntax is used to construct a series of nested dictionaries which define a DAG. Consider a simplified serial femtosecond crystallography DAG which runs peak finding through merging and then calculates some statistics. I.e. we want an execution order that looks like:
peak_finder >> indexer >> merger >> hkl_comparer\n
We can alternatively define this DAG in YAML:
task_name: PeakFinderPyAlgos\nslurm_params: ''\nnext:\n- task_name: CrystFELIndexer\n  slurm_params: ''\n  next:\n  - task_name: PartialatorMerger\n    slurm_params: ''\n    next:\n    - task_name: HKLComparer\n      slurm_params: ''\n      next: []\n
I.e. we define a tree where each node is constructed using Node(task_name: str, slurm_params: str, next: List[Node])
.
task_name
is the name of a managed Task
. This name must be identical to a managed Task
defined in the LUTE installation you are using.slurm_params
. This is a complete string of all the arguments to use for the corresponding managed Task
. Use of this field is all or nothing! - if it is left as an empty string, the default parameters (passed on the command-line using the launch script) are used, otherwise this string is used in its stead. Because of this remember to include a partition and account if using it.next
field is composed of either an empty list (meaning no managed Task
s are run after the current node), or additional nodes. All nodes in the next
list are run in parallel.As a second example, to run task1
followed by task2
and task3
in parallel we would use:
task_name: Task1\nslurm_params: ''\nnext:\n- task_name: Task2\n slurm_params: ''\n next: []\n- task_name: Task3\n slurm_params: ''\n next: []\n
In order to run a DAG defined in this way, we pass the path to the YAML file we have defined it in to the launch script using -W <path_to_dag>
. This is instead of calling it by name. E.g.
/path/to/lute/launch_scripts/submit_launch_airflow.sh /path/to/lute/launch_scripts/launch_airflow.py -e <exp> -r <run> -c /path/to/config -W <path_to_dag> --test [--debug] [SLURM_ARGS]\n
Note that fewer options are currently supported for configuring the operators for each step of the DAG. The slurm arguments can be replaced in their entirety using a custom slurm_params
string but individual options cannot be modified.
Special markers have been inserted at certain points in the execution flow for LUTE. These can be enabled by setting the environment variables detailed below. These are intended to allow developers to exit the program at certain points to investigate behaviour or a bug. For instance, when working on configuration parsing, an environment variable can be set which exits the program after passing this step. This allows you to run LUTE otherwise as normal (described above), without having to modify any additional code or insert your own early exits.
Types of debug markers:
LUTE_DEBUG_EXIT
: Will exit the program at this point if the corresponding environment variable has been set.Developers can insert these markers as needed into their code to add new exit points, although as a rule of thumb they should be used sparingly, and generally only after major steps in the execution flow (e.g. after parsing, after beginning a task, after returning a result, etc.).
In order to include a new marker in your code:
from lute.execution.debug_utils import LUTE_DEBUG_EXIT\n\ndef my_code() -> None:\n # ...\n LUTE_DEBUG_EXIT(\"MYENVVAR\", \"Additional message to print\")\n # If MYENVVAR is not set, the above function does nothing\n
You can enable a marker by setting to 1, e.g. to enable the example marker above while running Tester
:
MYENVVAR=1 python -B run_task.py -t Tester -c config/test.yaml\n
"},{"location":"usage/#currently-used-environment-variables","title":"Currently used environment variables","text":"LUTE_DEBUG_EXIT_AT_YAML
: Exits the program after reading in a YAML configuration file and performing variable substitutions, but BEFORE Pydantic validation.LUTE_DEBUG_BEFORE_TPP_EXEC
: Exits the program after a ThirdPartyTask has prepared its submission command, but before exec
is used to run it.The Airflow launch process actually involves a number of steps, and is rather complicated. There are two wrapper steps prior to getting to the actual Airflow API communication.
launch_scripts/submit_launch_airflow.sh
is run./sdf/group/lcls/ds/tools/lute_launcher
with all the same parameters that it was called with.lute_launcher
runs the launch_scripts/launch_airflow.py
script which was provided as the first argument. This is the true launch scriptlaunch_airflow.py
communicates with the Airflow API, requesting that a specific DAG be launched. It then continues to run, and gathers the individual logs and the exit status of each step of the DAG.launch_scripts/submit_slurm.sh
.There are some specific reasons for this complexity:
submit_launch_airflow.sh
as a thin-wrapper around lute_launcher
is to allow the true Airflow launch script to be a long-lived job. This is for compatibility with the eLog and the ARP. When run from the eLog as a workflow, the job submission process must occur within 30 seconds due to a timeout built-in to the system. This is fine when submitting jobs to run on the batch-nodes, as the submission to the queue takes very little time. So here, submit_launch_airflow.sh
serves as a thin script to have lute_launcher
run as a batch job. It can then run as a long-lived job (for the duration of the entire DAG) collecting log files all in one place. This allows the log for each stage of the Airflow DAG to be inspected in a single file, and through the eLog browser interface.lute_launcher
as a wrapper around launch_airflow.py
is to manage authentication and credentials. The launch_airflow.py
script requires loading credentials in order to authenticate against the Airflow API. For the average user this is not possible, unless the script is run from within the lute_launcher
process.madr_template.md
for creating new ADRs. This template was adapted from the MADR template (MIT License).Task
s inherit from a base class Accepted 2 2023-11-06 Analysis Task
submission and communication is performed via Executor
s Accepted 3 2023-11-06 Executor
s will run all Task
s via subprocess Proposed 4 2023-11-06 Airflow Operator
s and LUTE Executor
s are separate entities. Proposed 5 2023-12-06 Task-Executor IPC is Managed by Communicator Objects Proposed 6 2024-02-12 Third-party Config Files Managed by Templates Rendered by ThirdPartyTask
s Proposed 7 2024-02-12 Task
Configuration is Stored in a Database Managed by Executor
s Proposed 8 2024-03-18 Airflow credentials/authorization requires special launch program. Proposed 9 2024-04-15 Airflow launch script will run as long lived batch job. Proposed"},{"location":"adrs/MADR_LICENSE/","title":"MADR LICENSE","text":"Copyright 2022 ADR Github Organization
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \u201cSoftware\u201d), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED \u201cAS IS\u201d, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
"},{"location":"adrs/adr-1/","title":"[ADR-1] All Analysis Tasks Inherit from a Base Class","text":"Date: 2023-11-06
"},{"location":"adrs/adr-1/#status","title":"Status","text":"Accepted
"},{"location":"adrs/adr-1/#context-and-problem-statement","title":"Context and Problem Statement","text":"btx
tasks had heterogenous interfaces.Task
s simultaneously.Date: 2023-11-06
"},{"location":"adrs/adr-2/#status","title":"Status","text":"Accepted
"},{"location":"adrs/adr-2/#context-and-problem-statement","title":"Context and Problem Statement","text":"Task
code itself provides a separation of concerns allowing Task
s to run indepently of execution environment.Executor
can prepare environment, submission requirements, etc.Executor
classes avoids maintaining that code independently for each task (cf. alternatives considered).Executor
level and immediately applied to all Task
s.Task
code.btx
tasks. E.g. task timeout leading to failure of a processing pipeline even if substantial work had been done and subsequent tasks could proceed.Task
submission already exist in the original btx
but the methods were not fully standardized.JobScheduler
submission vs direct submission of the task.Task
class interface as pre/post analysis operations.Task
subclasses for different execution environments.Task
class.Task
code independent of execution environment.Task
failure.Executor
s as the \"Managed Task\"Task
s will not be submitted independently.Executor
s will run all Task
s via subprocess","text":"Date: 2023-11-06
"},{"location":"adrs/adr-3/#status","title":"Status","text":"Proposed
"},{"location":"adrs/adr-3/#context-and-problem-statement","title":"Context and Problem Statement","text":"Task
s from within the Executor
(cf. ADR-2)Task
s, at all locations, but at the very least all Task
s at a single location (e.g. S3DF, NERSC)Task
submission, but have to submit both first-party and third-party code.JobScheduler
for btx
multiprocessing
at the Python level.Operator
s and LUTE Executor
s are Separate Entities","text":"Date: 2023-11-06
"},{"location":"adrs/adr-4/#status","title":"Status","text":"Proposed
"},{"location":"adrs/adr-4/#context-and-problem-statement","title":"Context and Problem Statement","text":"Executor
which in turn submits the Task
*
"},{"location":"adrs/adr-4/#considered-options","title":"Considered Options","text":"*
"},{"location":"adrs/adr-4/#consequences","title":"Consequences","text":"*
"},{"location":"adrs/adr-4/#compliance","title":"Compliance","text":""},{"location":"adrs/adr-4/#metadata","title":"Metadata","text":""},{"location":"adrs/adr-5/","title":"[ADR-5] Task-Executor IPC is Managed by Communicator Objects","text":"Date: 2023-12-06
"},{"location":"adrs/adr-5/#status","title":"Status","text":"Proposed
"},{"location":"adrs/adr-5/#context-and-problem-statement","title":"Context and Problem Statement","text":"Communicator
objects which maintain simple read
and write
mechanisms for Message
objects. These latter can contain arbitrary Python objects. Task
s do not interact directly with the communicator, but rather through specific instance methods which hide the communicator interfaces. Multiple Communicators can be used in parallel. The same Communicator
objects are used identically at the Task
and Executor
layers - any changes to communication protocols are not transferred to the calling objects.
Task
output needs to be routed to other layers of the software, but the Task
s themselves should have no knowledge of where the output ends up.subprocess
Task
and Executor
layers.Communicator
: Abstract base class - defines interfacePipeCommunicator
: Manages communication through pipes (stderr
and stdout
)SocketCommunicator
: Manages communication through Unix socketsTask
and Executor
side, IPC is greatly simplifiedCommunicator
Communicator
objects are non-public. Their interfaces (already limited) are handled by simple methods in the base classes of Task
s and Executor
s.Communicator
should have no need to be directly manipulated by callers (even less so by users)ThirdPartyTask
s","text":"Date: 2024-02-12
"},{"location":"adrs/adr-6/#status","title":"Status","text":"Proposed
"},{"location":"adrs/adr-6/#context-and-problem-statement","title":"Context and Problem Statement","text":"Templates will be used for the third party configuration files. A generic interface to heterogenous templates will be provided through a combination of pydantic models and the ThirdPartyTask
implementation. The pydantic models will label extra arguments to ThirdPartyTask
s as being TemplateParameters
. I.e. any extra parameters are considered to be for a templated configuration file. The ThirdPartyTask
will find the necessary template and render it if any extra parameters are found. This puts the burden of correct parsing on the template definition itself.
Task
interface as possible - but due to the above, need a way of handling multiple output files.Task
to be run before the main ThirdPartyTask
.Task
.ThirdPartyTask
s to be run as instances of a single class.Task
Configuration is Stored in a Database Managed by Executor
s","text":"Date: 2024-02-12
"},{"location":"adrs/adr-7/#status","title":"Status","text":"Proposed
"},{"location":"adrs/adr-7/#context-and-problem-statement","title":"Context and Problem Statement","text":"Task
parameter configurations.Task
's code is designed to be independent of other Task
's aside from code shared by inheritance.Task
s are intended to be defined only at the level of workflows.Task
s may have implicit dependencies on others. E.g. one Task
may use the output files of another, and so could benefit from having knowledge of where they were written.Upon Task
completion the managing Executor
will write the AnalysisConfig
object, including TaskParameters
, results and generic configuration information to a database. Some entries from this database can be retrieved to provide default files for TaskParameter
fields; however, the Task
itself has no knowledge, and does not access to the database.
Task
s while allowing information to be shared between them.Task
-independent IO be managed solely at the Executor
level.Task
s write the database.Task
s pass information through other mechanisms, such as Airflow.sqlite
which should make everything transferrable.Task
s without any explicit code dependencies/linkages between them.Date: 2024-03-18
"},{"location":"adrs/adr-8/#status","title":"Status","text":"Proposed
"},{"location":"adrs/adr-8/#context-and-problem-statement","title":"Context and Problem Statement","text":"A closed-source lute_launcher
program will be used to run the Airflow launch scripts. This program accesses credentials with the correct permissions. Users should otherwise not have access to the credentials. This will help ensure the credentials can be used by everyone but only to run workflows and not perform restricted admin activities.
Date: 2024-04-15
"},{"location":"adrs/adr-9/#status","title":"Status","text":"Proposed
"},{"location":"adrs/adr-9/#context-and-problem-statement","title":"Context and Problem Statement","text":"Task
will produce its own log file.The Airflow launch script will be a long lived process, running for the duration of the entire DAG. It will provide basic status logging information, e.g. what Task
s are running, if they succeed or failed. Additionally, at the end of each Task
job, the launch job will collect the log file from that job and append it to its own log.
As the Airflow launch script is an entry point used from the eLog, only its log file is available to users using that UI. By converting the launch script into a long-lived monitoring job it allows the log information to be easily accessible.
In order to accomplish this, the launch script must be submitted as a batch job, in order to comply with the 30 second timeout imposed by jobs run by the ARP. This necessitates providing an additional wrapper script.
"},{"location":"adrs/adr-9/#decision-drivers","title":"Decision Drivers","text":"--open-mode=append
for SLURM)submit_launch_airflow.sh
which submits the launch_airflow.py
script (run by lute_launcher
) as a batch job.launch_airflow.py
) and 1 for the Executor
process. {ADR #X : Short description/title of feature/decision}
Date:
"},{"location":"adrs/madr_template/#status","title":"Status","text":"{Accepted | Proposed | Rejected | Deprecated | Superseded} {If this proposal supersedes another, please indicate so, e.g. \"Status: Accepted, supersedes [ADR-3]\"} {Likewise, if this proposal was superseded, e.g. \"Status: Superseded by [ADR-2]\"}
"},{"location":"adrs/madr_template/#context-and-problem-statement","title":"Context and Problem Statement","text":"{Describe the problem context and why this decision has been made/feature implemented.}
"},{"location":"adrs/madr_template/#decision","title":"Decision","text":"{Describe how the solution was arrived at in the manner it was. You may use the sections below to help.}
"},{"location":"adrs/madr_template/#decision-drivers","title":"Decision Drivers","text":"{Short description of anticipated consequences} * {Anticipated consequence 1} * {Anticipated consequence 2}
"},{"location":"adrs/madr_template/#compliance","title":"Compliance","text":"{How will the decision/implementation be enforced. How will compliance be validated?}
"},{"location":"adrs/madr_template/#metadata","title":"Metadata","text":"{Any additional information to include}
"},{"location":"design/database/","title":"LUTE Configuration Database Specification","text":"Date: 2024-02-12 VERSION: v0.1
"},{"location":"design/database/#basic-outline","title":"Basic Outline","text":"Executor
level code.Executor
configurationlute.io.config.AnalysisHeader
)Task
Task
tables by pointing/linking to the entry ids in the above two tables.gen_cfg
table","text":"The general configuration table contains entries which may be shared between multiple Task
s. The format of the table is:
These parameters are extracted from the TaskParameters
object. Each of those contains an AnalysisHeader
object stored in the lute_config
variable. For a given experimental run, this value will be shared across any Task
s that are executed.
id
ID of the entry in this table. title
Arbitrary description/title of the purpose of analysis. E.g. what kind of experiment is being conducted experiment
LCLS Experiment. Can be a placeholder if debugging, etc. run
LCLS Acquisition run. Can be a placeholder if debugging, testing, etc. date
Date the configuration file was first setup. lute_version
Version of the codebase being used to execute Task
s. task_timeout
The maximum amount of time in seconds that a Task
can run before being cancelled."},{"location":"design/database/#exec_cfg-table","title":"exec_cfg
table","text":"The Executor
table contains information on the environment provided to the Executor
for Task
execution, the polling interval used for IPC between the Task
and Executor
and information on the communicator protocols used for IPC. This information can be shared between Task
s or between experimental runs, but not necessarily every Task
of a given run will use exactly the same Executor
configuration and environment.
id
ID of the entry in this table. env
Execution environment used by the Executor and by proxy any Tasks submitted by an Executor matching this entry. Environment is stored as a string with variables delimited by \";\" poll_interval
Polling interval used for Task monitoring. communicator_desc
Description of the Communicators used. NOTE: The env
column currently only stores variables related to SLURM
or LUTE
itself.
Task
tables","text":"For every Task
a table of the following format will be created. The exact number of columns will depend on the specific Task
, as the number of parameters can vary between them, and each parameter gets its own column. Within a table, multiple experiments and runs can coexist. The experiment and run are not recorded directly. Instead, the first two columns point to the id of entries in the general configuration and Executor
tables respectively. The general configuration table entry will contain the experiment and run information.
Parameter sets which can be described as nested dictionaries are flattened and then delimited with a .
to create column names. Parameters which are lists (or Python tuples, etc.) have a column for each entry with names that include an index (counting from 0). E.g. consider the following dictionary of parameters:
param_dict: Dict[str, Any] = {\n \"a\": { # First parameter a\n \"b\": (1, 2),\n \"c\": 1,\n # ...\n },\n \"a2\": 4, # Second parameter a2\n # ...\n}\n
The dictionary a
will produce columns: a.b[0]
, a.b[1]
, a.c
, and so on.
id
ID of the entry in this table. CURRENT_TIMESTAMP
Full timestamp for the entry. gen_cfg_id
ID of the entry in the general config table that applies to this Task
entry. That table has, e.g., experiment and run number. exec_cfg_id
The ID of the entry in the Executor
table which applies to this Task
entry. P1
- Pn
The specific parameters of the Task
. The P{1..n}
are replaced by the actual parameter names. result.task_status
Reported exit status of the Task
. Note that the output may still be labeled invalid by the valid_flag
(see below). result.summary
Short text summary of the Task
result. This is provided by the Task
, or sometimes the Executor
. result.payload
Full description of result from the Task
. If the object is incompatible with the database, will instead be a pointer to where it can be found. result.impl_schemas
A string of semi-colon separated schema(s) implemented by the Task
. Schemas describe conceptually the type output the Task
produces. valid_flag
A boolean flag for whether the result is valid. May be 0
(False) if e.g., data is missing, or corrupt, or reported status is failed. NOTE: The result.payload
may be distinct from the output files. Payloads can be specified in terms of output parameters, specific output files, or are an optional summary of the results provided by the Task
. E.g. this may include graphical descriptions of results (plots, figures, etc.). In many cases, however, the output files will most likely be pointed to by a parameter in one of the columns P{1...n}
- if properly specified in the TaskParameters
model the value of this output parameter will be replicated in the result.payload
column as well..
This API is intended to be used at the Executor
level, with some calls intended to provide default values for Pydantic models. Utilities for reading and inspecting the database outside of normal Task
execution are addressed in the following subheader.
record_analysis_db(cfg: DescribedAnalysis) -> None
: Writes the configuration to the backend database.read_latest_db_entry(db_dir: str, task_name: str, param: str) -> Any
: Retrieve the most recent entry from a database for a specific Task.invalidate_entry
: Marks a database entry as invalid. Common reason to use this is if data has been deleted, or found to be corrupted.dbview
: TUI for database inspection. Read only.LUTE Managed Tasks.
Executor-managed Tasks with specific environment specifications are defined here.
"},{"location":"source/managed_tasks/#managed_tasks.BinaryErrTester","title":"BinaryErrTester = Executor('TestBinaryErr')
module-attribute
","text":"Runs a test of a third-party task that fails.
"},{"location":"source/managed_tasks/#managed_tasks.BinaryTester","title":"BinaryTester: Executor = Executor('TestBinary')
module-attribute
","text":"Runs a basic test of a multi-threaded third-party Task.
"},{"location":"source/managed_tasks/#managed_tasks.CrystFELIndexer","title":"CrystFELIndexer: Executor = Executor('IndexCrystFEL')
module-attribute
","text":"Runs crystallographic indexing using CrystFEL.
"},{"location":"source/managed_tasks/#managed_tasks.DimpleSolver","title":"DimpleSolver: Executor = Executor('DimpleSolve')
module-attribute
","text":"Solves a crystallographic structure using molecular replacement.
"},{"location":"source/managed_tasks/#managed_tasks.HKLComparer","title":"HKLComparer: Executor = Executor('CompareHKL')
module-attribute
","text":"Runs analysis on merge results for statistics/figures of merit..
"},{"location":"source/managed_tasks/#managed_tasks.HKLManipulator","title":"HKLManipulator: Executor = Executor('ManipulateHKL')
module-attribute
","text":"Performs format conversions (among other things) of merge results.
"},{"location":"source/managed_tasks/#managed_tasks.MultiNodeCommunicationTester","title":"MultiNodeCommunicationTester: MPIExecutor = MPIExecutor('TestMultiNodeCommunication')
module-attribute
","text":"Runs a test to confirm communication works between multiple nodes.
"},{"location":"source/managed_tasks/#managed_tasks.PartialatorMerger","title":"PartialatorMerger: Executor = Executor('MergePartialator')
module-attribute
","text":"Runs crystallographic merging using CrystFEL's partialator.
"},{"location":"source/managed_tasks/#managed_tasks.PeakFinderPsocake","title":"PeakFinderPsocake: Executor = Executor('FindPeaksPsocake')
module-attribute
","text":"Performs Bragg peak finding using psocake - DEPRECATED.
"},{"location":"source/managed_tasks/#managed_tasks.PeakFinderPyAlgos","title":"PeakFinderPyAlgos: MPIExecutor = MPIExecutor('FindPeaksPyAlgos')
module-attribute
","text":"Performs Bragg peak finding using the PyAlgos algorithm.
"},{"location":"source/managed_tasks/#managed_tasks.ReadTester","title":"ReadTester: Executor = Executor('TestReadOutput')
module-attribute
","text":"Runs a test to confirm database reading.
"},{"location":"source/managed_tasks/#managed_tasks.SHELXCRunner","title":"SHELXCRunner: Executor = Executor('RunSHELXC')
module-attribute
","text":"Runs CCP4 SHELXC - needed for crystallographic phasing.
"},{"location":"source/managed_tasks/#managed_tasks.SmallDataProducer","title":"SmallDataProducer: Executor = Executor('SubmitSMD')
module-attribute
","text":"Runs the production of a smalldata HDF5 file.
"},{"location":"source/managed_tasks/#managed_tasks.SocketTester","title":"SocketTester: Executor = Executor('TestSocket')
module-attribute
","text":"Runs a test of socket-based communication.
"},{"location":"source/managed_tasks/#managed_tasks.StreamFileConcatenator","title":"StreamFileConcatenator: Executor = Executor('ConcatenateStreamFiles')
module-attribute
","text":"Concatenates results from crystallographic indexing of multiple runs.
"},{"location":"source/managed_tasks/#managed_tasks.Tester","title":"Tester: Executor = Executor('Test')
module-attribute
","text":"Runs a basic test of a first-party Task.
"},{"location":"source/managed_tasks/#managed_tasks.WriteTester","title":"WriteTester: Executor = Executor('TestWriteOutput')
module-attribute
","text":"Runs a test to confirm database writing.
"},{"location":"source/execution/debug_utils/","title":"debug_utils","text":"Functions to assist in debugging execution of LUTE.
Functions:
Name DescriptionLUTE_DEBUG_EXIT
(env_var: str, str_dump: Optional[str]): Exits the program if the provided env_var
is set. Optionally, also prints a message if provided.
Raises:
Type DescriptionValidationError
Error raised by pydantic during data validation. (From Pydantic)
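A minimal usage sketch of LUTE_DEBUG_EXIT; the import path and environment variable name are assumptions:
from lute.execution.debug_utils import LUTE_DEBUG_EXIT  # assumed import path

# Exits here only if the named environment variable is set; otherwise execution continues.
LUTE_DEBUG_EXIT("LUTE_DEBUG_BEFORE_SUBMIT", "Stopping before Task submission for inspection")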
"},{"location":"source/execution/executor/","title":"executor","text":"Base classes and functions for handling Task
execution.
Executors run a Task
as a subprocess and handle all communication with other services, e.g., the eLog. They accept specific handlers to override default stream parsing.
Event handlers/hooks are implemented as standalone functions which can be added to an Executor.
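As a sketch of the hook mechanism, ahead of the class list below; the import paths are assumptions and the hook body is illustrative:
from lute.execution.executor import Executor  # assumed import paths
from lute.execution.ipc import Message

MyTaskRunner: Executor = Executor("Test")

def log_failure(executor: Executor, msg: Message) -> None:
    # Hooks receive the Executor and the Message carrying the signal.
    print(f"Task failed: {msg.contents}")

# Replaces the default task_failed hook; only one hook is registered per event at a time.
MyTaskRunner.add_hook("task_failed", log_failure)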
Classes:
Name DescriptionAnalysisConfig
Data class for holding a managed Task's configuration.
BaseExecutor
Abstract base class from which all Executors are derived.
Executor
Default Executor implementing all basic functionality and IPC.
BinaryExecutor
Can execute any arbitrary binary/command as a managed task within the framework provided by LUTE.
"},{"location":"source/execution/executor/#execution.executor--exceptions","title":"Exceptions","text":""},{"location":"source/execution/executor/#execution.executor.BaseExecutor","title":"BaseExecutor
","text":" Bases: ABC
ABC to manage Task execution and communication with user services.
When running in a workflow, \"tasks\" (not the class instances) are submitted as Executors
. The Executor manages environment setup, the actual Task submission, and communication regarding Task results and status with third party services like the eLog.
Attributes:
Methods:
Name Descriptionadd_hook
(event: str, hook: Callable[[None], None]) -> None: Create a new hook to be called each time a specific event occurs.
add_default_hooks
Populate the event hooks with the default functions.
update_environment
(env: Dict[str, str], update_path: str): Update the environment that is passed to the Task subprocess.
execute_task
Run the task as a subprocess.
Source code inlute/execution/executor.py
class BaseExecutor(ABC):\n \"\"\"ABC to manage Task execution and communication with user services.\n\n When running in a workflow, \"tasks\" (not the class instances) are submitted\n as `Executors`. The Executor manages environment setup, the actual Task\n submission, and communication regarding Task results and status with third\n party services like the eLog.\n\n Attributes:\n\n Methods:\n add_hook(event: str, hook: Callable[[None], None]) -> None: Create a\n new hook to be called each time a specific event occurs.\n\n add_default_hooks() -> None: Populate the event hooks with the default\n functions.\n\n update_environment(env: Dict[str, str], update_path: str): Update the\n environment that is passed to the Task subprocess.\n\n execute_task(): Run the task as a subprocess.\n \"\"\"\n\n class Hooks:\n \"\"\"A container class for the Executor's event hooks.\n\n There is a corresponding function (hook) for each event/signal. Each\n function takes two parameters - a reference to the Executor (self) and\n a reference to the Message (msg) which includes the corresponding\n signal.\n \"\"\"\n\n def no_pickle_mode(self: Self, msg: Message): ...\n\n def task_started(self: Self, msg: Message): ...\n\n def task_failed(self: Self, msg: Message): ...\n\n def task_stopped(self: Self, msg: Message): ...\n\n def task_done(self: Self, msg: Message): ...\n\n def task_cancelled(self: Self, msg: Message): ...\n\n def task_result(self: Self, msg: Message): ...\n\n def __init__(\n self,\n task_name: str,\n communicators: List[Communicator],\n poll_interval: float = 0.05,\n ) -> None:\n \"\"\"The Executor will manage the subprocess in which `task_name` is run.\n\n Args:\n task_name (str): The name of the Task to be submitted. Must match\n the Task's class name exactly. The parameter specification must\n also be in a properly named model to be identified.\n\n communicators (List[Communicator]): A list of one or more\n communicators which manage information flow to/from the Task.\n Subclasses may have different defaults, and new functionality\n can be introduced by composing Executors with communicators.\n\n poll_interval (float): Time to wait between reading/writing to the\n managed subprocess. In seconds.\n \"\"\"\n result: TaskResult = TaskResult(\n task_name=task_name, task_status=TaskStatus.PENDING, summary=\"\", payload=\"\"\n )\n task_parameters: Optional[TaskParameters] = None\n task_env: Dict[str, str] = os.environ.copy()\n self._communicators: List[Communicator] = communicators\n communicator_desc: List[str] = []\n for comm in self._communicators:\n comm.stage_communicator()\n communicator_desc.append(str(comm))\n\n self._analysis_desc: DescribedAnalysis = DescribedAnalysis(\n task_result=result,\n task_parameters=task_parameters,\n task_env=task_env,\n poll_interval=poll_interval,\n communicator_desc=communicator_desc,\n )\n\n def add_hook(self, event: str, hook: Callable[[Self, Message], None]) -> None:\n \"\"\"Add a new hook.\n\n Each hook is a function called any time the Executor receives a signal\n for a particular event, e.g. Task starts, Task ends, etc. Calling this\n method will remove any hook that currently exists for the event. I.e.\n only one hook can be called per event at a time. 
Creating hooks for\n events which do not exist is not allowed.\n\n Args:\n event (str): The event for which the hook will be called.\n\n hook (Callable[[None], None]) The function to be called during each\n occurrence of the event.\n \"\"\"\n if event.upper() in LUTE_SIGNALS:\n setattr(self.Hooks, event.lower(), hook)\n\n @abstractmethod\n def add_default_hooks(self) -> None:\n \"\"\"Populate the set of default event hooks.\"\"\"\n\n ...\n\n def update_environment(\n self, env: Dict[str, str], update_path: str = \"prepend\"\n ) -> None:\n \"\"\"Update the stored set of environment variables.\n\n These are passed to the subprocess to setup its environment.\n\n Args:\n env (Dict[str, str]): A dictionary of \"VAR\":\"VALUE\" pairs of\n environment variables to be added to the subprocess environment.\n If any variables already exist, the new variables will\n overwrite them (except PATH, see below).\n\n update_path (str): If PATH is present in the new set of variables,\n this argument determines how the old PATH is dealt with. There\n are three options:\n * \"prepend\" : The new PATH values are prepended to the old ones.\n * \"append\" : The new PATH values are appended to the old ones.\n * \"overwrite\" : The old PATH is overwritten by the new one.\n \"prepend\" is the default option. If PATH is not present in the\n current environment, the new PATH is used without modification.\n \"\"\"\n if \"PATH\" in env:\n sep: str = os.pathsep\n if update_path == \"prepend\":\n env[\"PATH\"] = (\n f\"{env['PATH']}{sep}{self._analysis_desc.task_env['PATH']}\"\n )\n elif update_path == \"append\":\n env[\"PATH\"] = (\n f\"{self._analysis_desc.task_env['PATH']}{sep}{env['PATH']}\"\n )\n elif update_path == \"overwrite\":\n pass\n else:\n raise ValueError(\n (\n f\"{update_path} is not a valid option for `update_path`!\"\n \" Options are: prepend, append, overwrite.\"\n )\n )\n os.environ.update(env)\n self._analysis_desc.task_env.update(env)\n\n def shell_source(self, env: str) -> None:\n \"\"\"Source a script.\n\n Unlike `update_environment` this method sources a new file.\n\n Args:\n env (str): Path to the script to source.\n \"\"\"\n import sys\n\n if not os.path.exists(env):\n logger.info(f\"Cannot source environment from {env}!\")\n return\n\n script: str = (\n f\"set -a\\n\"\n f'source \"{env}\" >/dev/null\\n'\n f'{sys.executable} -c \"import os; print(dict(os.environ))\"\\n'\n )\n logger.info(f\"Sourcing file {env}\")\n o, e = subprocess.Popen(\n [\"bash\", \"-c\", script], stdout=subprocess.PIPE\n ).communicate()\n new_environment: Dict[str, str] = eval(o)\n self._analysis_desc.task_env = new_environment\n\n def _pre_task(self) -> None:\n \"\"\"Any actions to be performed before task submission.\n\n This method may or may not be used by subclasses. 
It may be useful\n for logging etc.\n \"\"\"\n # This prevents the Executors in managed_tasks.py from all acquiring\n # resources like sockets.\n for communicator in self._communicators:\n communicator.delayed_setup()\n # Not great, but experience shows we need a bit of time to setup\n # network.\n time.sleep(0.1)\n # Propagate any env vars setup by Communicators - only update LUTE_ vars\n tmp: Dict[str, str] = {\n key: os.environ[key] for key in os.environ if \"LUTE_\" in key\n }\n self._analysis_desc.task_env.update(tmp)\n\n def _submit_task(self, cmd: str) -> subprocess.Popen:\n proc: subprocess.Popen = subprocess.Popen(\n cmd.split(),\n stdout=subprocess.PIPE,\n stderr=subprocess.PIPE,\n env=self._analysis_desc.task_env,\n )\n os.set_blocking(proc.stdout.fileno(), False)\n os.set_blocking(proc.stderr.fileno(), False)\n return proc\n\n @abstractmethod\n def _task_loop(self, proc: subprocess.Popen) -> None:\n \"\"\"Actions to perform while the Task is running.\n\n This function is run in the body of a loop until the Task signals\n that its finished.\n \"\"\"\n ...\n\n @abstractmethod\n def _finalize_task(self, proc: subprocess.Popen) -> None:\n \"\"\"Any actions to be performed after the Task has ended.\n\n Examples include a final clearing of the pipes, retrieving results,\n reporting to third party services, etc.\n \"\"\"\n ...\n\n def _submit_cmd(self, executable_path: str, params: str) -> str:\n \"\"\"Return a formatted command for launching Task subprocess.\n\n May be overridden by subclasses.\n\n Args:\n executable_path (str): Path to the LUTE subprocess script.\n\n params (str): String of formatted command-line arguments.\n\n Returns:\n cmd (str): Appropriately formatted command for this Executor.\n \"\"\"\n cmd: str = \"\"\n if __debug__:\n cmd = f\"python -B {executable_path} {params}\"\n else:\n cmd = f\"python -OB {executable_path} {params}\"\n\n return cmd\n\n def execute_task(self) -> None:\n \"\"\"Run the requested Task as a subprocess.\"\"\"\n self._pre_task()\n lute_path: Optional[str] = os.getenv(\"LUTE_PATH\")\n if lute_path is None:\n logger.debug(\"Absolute path to subprocess_task.py not found.\")\n lute_path = os.path.abspath(f\"{os.path.dirname(__file__)}/../..\")\n self.update_environment({\"LUTE_PATH\": lute_path})\n executable_path: str = f\"{lute_path}/subprocess_task.py\"\n config_path: str = self._analysis_desc.task_env[\"LUTE_CONFIGPATH\"]\n params: str = f\"-c {config_path} -t {self._analysis_desc.task_result.task_name}\"\n\n cmd: str = self._submit_cmd(executable_path, params)\n proc: subprocess.Popen = self._submit_task(cmd)\n\n while self._task_is_running(proc):\n self._task_loop(proc)\n time.sleep(self._analysis_desc.poll_interval)\n\n os.set_blocking(proc.stdout.fileno(), True)\n os.set_blocking(proc.stderr.fileno(), True)\n\n self._finalize_task(proc)\n proc.stdout.close()\n proc.stderr.close()\n proc.wait()\n if ret := proc.returncode:\n logger.info(f\"Task failed with return code: {ret}\")\n self._analysis_desc.task_result.task_status = TaskStatus.FAILED\n self.Hooks.task_failed(self, msg=Message())\n elif self._analysis_desc.task_result.task_status == TaskStatus.RUNNING:\n # Ret code is 0, no exception was thrown, task forgot to set status\n self._analysis_desc.task_result.task_status = TaskStatus.COMPLETED\n logger.debug(f\"Task did not change from RUNNING status. 
Assume COMPLETED.\")\n self.Hooks.task_done(self, msg=Message())\n self._store_configuration()\n for comm in self._communicators:\n comm.clear_communicator()\n\n if self._analysis_desc.task_result.task_status == TaskStatus.FAILED:\n logger.info(\"Exiting after Task failure. Result recorded.\")\n sys.exit(-1)\n\n self.process_results()\n\n def _store_configuration(self) -> None:\n \"\"\"Store configuration and results in the LUTE database.\"\"\"\n record_analysis_db(copy.deepcopy(self._analysis_desc))\n\n def _task_is_running(self, proc: subprocess.Popen) -> bool:\n \"\"\"Whether a subprocess is running.\n\n Args:\n proc (subprocess.Popen): The subprocess to determine the run status\n of.\n\n Returns:\n bool: Is the subprocess task running.\n \"\"\"\n # Add additional conditions - don't want to exit main loop\n # if only stopped\n task_status: TaskStatus = self._analysis_desc.task_result.task_status\n is_running: bool = task_status != TaskStatus.COMPLETED\n is_running &= task_status != TaskStatus.CANCELLED\n is_running &= task_status != TaskStatus.TIMEDOUT\n return proc.poll() is None and is_running\n\n def _stop(self, proc: subprocess.Popen) -> None:\n \"\"\"Stop the Task subprocess.\"\"\"\n os.kill(proc.pid, signal.SIGTSTP)\n self._analysis_desc.task_result.task_status = TaskStatus.STOPPED\n\n def _continue(self, proc: subprocess.Popen) -> None:\n \"\"\"Resume a stopped Task subprocess.\"\"\"\n os.kill(proc.pid, signal.SIGCONT)\n self._analysis_desc.task_result.task_status = TaskStatus.RUNNING\n\n def _set_result_from_parameters(self) -> None:\n \"\"\"Use TaskParameters object to set TaskResult fields.\n\n A result may be defined in terms of specific parameters. This is most\n useful for ThirdPartyTasks which would not otherwise have an easy way of\n reporting what the TaskResult is. There are two options for specifying\n results from parameters:\n 1. A single parameter (Field) of the model has an attribute\n `is_result`. This is a bool indicating that this parameter points\n to a result. E.g. a parameter `output` may set `is_result=True`.\n 2. The `TaskParameters.Config` has a `result_from_params` attribute.\n This is an appropriate option if the result is determinable for\n the Task, but it is not easily defined by a single parameter. The\n TaskParameters.Config.result_from_param can be set by a custom\n validator, e.g. to combine the values of multiple parameters into\n a single result. E.g. an `out_dir` and `out_file` parameter used\n together specify the result. Currently only string specifiers are\n supported.\n\n A TaskParameters object specifies that it contains information about the\n result by setting a single config option:\n TaskParameters.Config.set_result=True\n In general, this method should only be called when the above condition is\n met, however, there are minimal checks in it as well.\n \"\"\"\n # This method shouldn't be called unless appropriate\n # But we will add extra guards here\n if self._analysis_desc.task_parameters is None:\n logger.debug(\n \"Cannot set result from TaskParameters. TaskParameters is None!\"\n )\n return\n if (\n not hasattr(self._analysis_desc.task_parameters.Config, \"set_result\")\n or not self._analysis_desc.task_parameters.Config.set_result\n ):\n logger.debug(\n \"Cannot set result from TaskParameters. 
`set_result` not specified!\"\n )\n return\n\n # First try to set from result_from_params (faster)\n if self._analysis_desc.task_parameters.Config.result_from_params is not None:\n result_from_params: str = (\n self._analysis_desc.task_parameters.Config.result_from_params\n )\n logger.info(f\"TaskResult specified as {result_from_params}.\")\n self._analysis_desc.task_result.payload = result_from_params\n else:\n # Iterate parameters to find the one that is the result\n schema: Dict[str, Any] = self._analysis_desc.task_parameters.schema()\n for param, value in self._analysis_desc.task_parameters.dict().items():\n param_attrs: Dict[str, Any] = schema[\"properties\"][param]\n if \"is_result\" in param_attrs:\n is_result: bool = param_attrs[\"is_result\"]\n if isinstance(is_result, bool) and is_result:\n logger.info(f\"TaskResult specified as {value}.\")\n self._analysis_desc.task_result.payload = value\n else:\n logger.debug(\n (\n f\"{param} specified as result! But specifier is of \"\n f\"wrong type: {type(is_result)}!\"\n )\n )\n break # We should only have 1 result-like parameter!\n\n # If we get this far and haven't changed the payload we should complain\n if self._analysis_desc.task_result.payload == \"\":\n task_name: str = self._analysis_desc.task_result.task_name\n logger.debug(\n (\n f\"{task_name} specified result be set from {task_name}Parameters,\"\n \" but no result provided! Check model definition!\"\n )\n )\n # Now check for impl_schemas and pass to result.impl_schemas\n # Currently unused\n impl_schemas: Optional[str] = (\n self._analysis_desc.task_parameters.Config.impl_schemas\n )\n self._analysis_desc.task_result.impl_schemas = impl_schemas\n # If we set_result but didn't get schema information we should complain\n if self._analysis_desc.task_result.impl_schemas is None:\n task_name: str = self._analysis_desc.task_result.task_name\n logger.debug(\n (\n f\"{task_name} specified result be set from {task_name}Parameters,\"\n \" but no schema provided! Check model definition!\"\n )\n )\n\n def process_results(self) -> None:\n \"\"\"Perform any necessary steps to process TaskResults object.\n\n Processing will depend on subclass. Examples of steps include, moving\n files, converting file formats, compiling plots/figures into an HTML\n file, etc.\n \"\"\"\n self._process_results()\n\n @abstractmethod\n def _process_results(self) -> None: ...\n
"},{"location":"source/execution/executor/#execution.executor.BaseExecutor.Hooks","title":"Hooks
","text":"A container class for the Executor's event hooks.
There is a corresponding function (hook) for each event/signal. Each function takes two parameters - a reference to the Executor (self) and a reference to the Message (msg) which includes the corresponding signal.
Source code inlute/execution/executor.py
class Hooks:\n \"\"\"A container class for the Executor's event hooks.\n\n There is a corresponding function (hook) for each event/signal. Each\n function takes two parameters - a reference to the Executor (self) and\n a reference to the Message (msg) which includes the corresponding\n signal.\n \"\"\"\n\n def no_pickle_mode(self: Self, msg: Message): ...\n\n def task_started(self: Self, msg: Message): ...\n\n def task_failed(self: Self, msg: Message): ...\n\n def task_stopped(self: Self, msg: Message): ...\n\n def task_done(self: Self, msg: Message): ...\n\n def task_cancelled(self: Self, msg: Message): ...\n\n def task_result(self: Self, msg: Message): ...\n
"},{"location":"source/execution/executor/#execution.executor.BaseExecutor.__init__","title":"__init__(task_name, communicators, poll_interval=0.05)
","text":"The Executor will manage the subprocess in which task_name
is run.
Parameters:
Name Type Description Defaulttask_name
str
The name of the Task to be submitted. Must match the Task's class name exactly. The parameter specification must also be in a properly named model to be identified.
requiredcommunicators
List[Communicator]
A list of one or more communicators which manage information flow to/from the Task. Subclasses may have different defaults, and new functionality can be introduced by composing Executors with communicators.
requiredpoll_interval
float
Time to wait between reading/writing to the managed subprocess. In seconds.
0.05
Source code in lute/execution/executor.py
def __init__(\n self,\n task_name: str,\n communicators: List[Communicator],\n poll_interval: float = 0.05,\n) -> None:\n \"\"\"The Executor will manage the subprocess in which `task_name` is run.\n\n Args:\n task_name (str): The name of the Task to be submitted. Must match\n the Task's class name exactly. The parameter specification must\n also be in a properly named model to be identified.\n\n communicators (List[Communicator]): A list of one or more\n communicators which manage information flow to/from the Task.\n Subclasses may have different defaults, and new functionality\n can be introduced by composing Executors with communicators.\n\n poll_interval (float): Time to wait between reading/writing to the\n managed subprocess. In seconds.\n \"\"\"\n result: TaskResult = TaskResult(\n task_name=task_name, task_status=TaskStatus.PENDING, summary=\"\", payload=\"\"\n )\n task_parameters: Optional[TaskParameters] = None\n task_env: Dict[str, str] = os.environ.copy()\n self._communicators: List[Communicator] = communicators\n communicator_desc: List[str] = []\n for comm in self._communicators:\n comm.stage_communicator()\n communicator_desc.append(str(comm))\n\n self._analysis_desc: DescribedAnalysis = DescribedAnalysis(\n task_result=result,\n task_parameters=task_parameters,\n task_env=task_env,\n poll_interval=poll_interval,\n communicator_desc=communicator_desc,\n )\n
"},{"location":"source/execution/executor/#execution.executor.BaseExecutor.add_default_hooks","title":"add_default_hooks()
abstractmethod
","text":"Populate the set of default event hooks.
Source code inlute/execution/executor.py
@abstractmethod\ndef add_default_hooks(self) -> None:\n \"\"\"Populate the set of default event hooks.\"\"\"\n\n ...\n
"},{"location":"source/execution/executor/#execution.executor.BaseExecutor.add_hook","title":"add_hook(event, hook)
","text":"Add a new hook.
Each hook is a function called any time the Executor receives a signal for a particular event, e.g. Task starts, Task ends, etc. Calling this method will remove any hook that currently exists for the event. I.e. only one hook can be called per event at a time. Creating hooks for events which do not exist is not allowed.
Parameters:
Name Type Description Defaultevent
str
The event for which the hook will be called.
required Source code inlute/execution/executor.py
def add_hook(self, event: str, hook: Callable[[Self, Message], None]) -> None:\n \"\"\"Add a new hook.\n\n Each hook is a function called any time the Executor receives a signal\n for a particular event, e.g. Task starts, Task ends, etc. Calling this\n method will remove any hook that currently exists for the event. I.e.\n only one hook can be called per event at a time. Creating hooks for\n events which do not exist is not allowed.\n\n Args:\n event (str): The event for which the hook will be called.\n\n hook (Callable[[None], None]) The function to be called during each\n occurrence of the event.\n \"\"\"\n if event.upper() in LUTE_SIGNALS:\n setattr(self.Hooks, event.lower(), hook)\n
"},{"location":"source/execution/executor/#execution.executor.BaseExecutor.execute_task","title":"execute_task()
","text":"Run the requested Task as a subprocess.
Source code inlute/execution/executor.py
def execute_task(self) -> None:\n \"\"\"Run the requested Task as a subprocess.\"\"\"\n self._pre_task()\n lute_path: Optional[str] = os.getenv(\"LUTE_PATH\")\n if lute_path is None:\n logger.debug(\"Absolute path to subprocess_task.py not found.\")\n lute_path = os.path.abspath(f\"{os.path.dirname(__file__)}/../..\")\n self.update_environment({\"LUTE_PATH\": lute_path})\n executable_path: str = f\"{lute_path}/subprocess_task.py\"\n config_path: str = self._analysis_desc.task_env[\"LUTE_CONFIGPATH\"]\n params: str = f\"-c {config_path} -t {self._analysis_desc.task_result.task_name}\"\n\n cmd: str = self._submit_cmd(executable_path, params)\n proc: subprocess.Popen = self._submit_task(cmd)\n\n while self._task_is_running(proc):\n self._task_loop(proc)\n time.sleep(self._analysis_desc.poll_interval)\n\n os.set_blocking(proc.stdout.fileno(), True)\n os.set_blocking(proc.stderr.fileno(), True)\n\n self._finalize_task(proc)\n proc.stdout.close()\n proc.stderr.close()\n proc.wait()\n if ret := proc.returncode:\n logger.info(f\"Task failed with return code: {ret}\")\n self._analysis_desc.task_result.task_status = TaskStatus.FAILED\n self.Hooks.task_failed(self, msg=Message())\n elif self._analysis_desc.task_result.task_status == TaskStatus.RUNNING:\n # Ret code is 0, no exception was thrown, task forgot to set status\n self._analysis_desc.task_result.task_status = TaskStatus.COMPLETED\n logger.debug(f\"Task did not change from RUNNING status. Assume COMPLETED.\")\n self.Hooks.task_done(self, msg=Message())\n self._store_configuration()\n for comm in self._communicators:\n comm.clear_communicator()\n\n if self._analysis_desc.task_result.task_status == TaskStatus.FAILED:\n logger.info(\"Exiting after Task failure. Result recorded.\")\n sys.exit(-1)\n\n self.process_results()\n
"},{"location":"source/execution/executor/#execution.executor.BaseExecutor.process_results","title":"process_results()
","text":"Perform any necessary steps to process TaskResults object.
Processing will depend on subclass. Examples of steps include moving files, converting file formats, compiling plots/figures into an HTML file, etc.
Source code inlute/execution/executor.py
def process_results(self) -> None:\n \"\"\"Perform any necessary steps to process TaskResults object.\n\n Processing will depend on subclass. Examples of steps include, moving\n files, converting file formats, compiling plots/figures into an HTML\n file, etc.\n \"\"\"\n self._process_results()\n
"},{"location":"source/execution/executor/#execution.executor.BaseExecutor.shell_source","title":"shell_source(env)
","text":"Source a script.
Unlike update_environment
this method sources a new file.
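A brief usage sketch; the import path, Task name, and script path are placeholders:
from lute.execution.executor import Executor  # assumed import path

MyTaskRunner: Executor = Executor("MyNewTask")  # hypothetical Task name
# Source an environment setup script so its variables reach the Task subprocess.
MyTaskRunner.shell_source("/path/to/env_setup.sh")  # placeholder path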
Parameters:
Name Type Description Defaultenv
str
Path to the script to source.
required Source code inlute/execution/executor.py
def shell_source(self, env: str) -> None:\n \"\"\"Source a script.\n\n Unlike `update_environment` this method sources a new file.\n\n Args:\n env (str): Path to the script to source.\n \"\"\"\n import sys\n\n if not os.path.exists(env):\n logger.info(f\"Cannot source environment from {env}!\")\n return\n\n script: str = (\n f\"set -a\\n\"\n f'source \"{env}\" >/dev/null\\n'\n f'{sys.executable} -c \"import os; print(dict(os.environ))\"\\n'\n )\n logger.info(f\"Sourcing file {env}\")\n o, e = subprocess.Popen(\n [\"bash\", \"-c\", script], stdout=subprocess.PIPE\n ).communicate()\n new_environment: Dict[str, str] = eval(o)\n self._analysis_desc.task_env = new_environment\n
"},{"location":"source/execution/executor/#execution.executor.BaseExecutor.update_environment","title":"update_environment(env, update_path='prepend')
","text":"Update the stored set of environment variables.
These are passed to the subprocess to setup its environment.
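A short sketch of the PATH handling described in the parameter table below; the import path, Task name, and values are placeholders:
from lute.execution.executor import Executor  # assumed import path

MyTaskRunner: Executor = Executor("MyNewTask")  # hypothetical Task name
# New PATH entries are prepended to the existing PATH by default.
MyTaskRunner.update_environment({"PATH": "/path/to/extra/bin", "MY_VAR": "1"})
# Append instead of prepending:
MyTaskRunner.update_environment({"PATH": "/path/to/other/bin"}, update_path="append")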
Parameters:
Name Type Description Defaultenv
Dict[str, str]
A dictionary of \"VAR\":\"VALUE\" pairs of environment variables to be added to the subprocess environment. If any variables already exist, the new variables will overwrite them (except PATH, see below).
requiredupdate_path
str
If PATH is present in the new set of variables, this argument determines how the old PATH is dealt with. There are three options: * \"prepend\" : The new PATH values are prepended to the old ones. * \"append\" : The new PATH values are appended to the old ones. * \"overwrite\" : The old PATH is overwritten by the new one. \"prepend\" is the default option. If PATH is not present in the current environment, the new PATH is used without modification.
'prepend'
Source code in lute/execution/executor.py
def update_environment(\n self, env: Dict[str, str], update_path: str = \"prepend\"\n) -> None:\n \"\"\"Update the stored set of environment variables.\n\n These are passed to the subprocess to setup its environment.\n\n Args:\n env (Dict[str, str]): A dictionary of \"VAR\":\"VALUE\" pairs of\n environment variables to be added to the subprocess environment.\n If any variables already exist, the new variables will\n overwrite them (except PATH, see below).\n\n update_path (str): If PATH is present in the new set of variables,\n this argument determines how the old PATH is dealt with. There\n are three options:\n * \"prepend\" : The new PATH values are prepended to the old ones.\n * \"append\" : The new PATH values are appended to the old ones.\n * \"overwrite\" : The old PATH is overwritten by the new one.\n \"prepend\" is the default option. If PATH is not present in the\n current environment, the new PATH is used without modification.\n \"\"\"\n if \"PATH\" in env:\n sep: str = os.pathsep\n if update_path == \"prepend\":\n env[\"PATH\"] = (\n f\"{env['PATH']}{sep}{self._analysis_desc.task_env['PATH']}\"\n )\n elif update_path == \"append\":\n env[\"PATH\"] = (\n f\"{self._analysis_desc.task_env['PATH']}{sep}{env['PATH']}\"\n )\n elif update_path == \"overwrite\":\n pass\n else:\n raise ValueError(\n (\n f\"{update_path} is not a valid option for `update_path`!\"\n \" Options are: prepend, append, overwrite.\"\n )\n )\n os.environ.update(env)\n self._analysis_desc.task_env.update(env)\n
"},{"location":"source/execution/executor/#execution.executor.Communicator","title":"Communicator
","text":" Bases: ABC
lute/execution/ipc.py
class Communicator(ABC):\n def __init__(self, party: Party = Party.TASK, use_pickle: bool = True) -> None:\n \"\"\"Abstract Base Class for IPC Communicator objects.\n\n Args:\n party (Party): Which object (side/process) the Communicator is\n managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.\n use_pickle (bool): Whether to serialize data using pickle prior to\n sending it.\n \"\"\"\n self._party = party\n self._use_pickle = use_pickle\n self.desc = \"Communicator abstract base class.\"\n\n @abstractmethod\n def read(self, proc: subprocess.Popen) -> Message:\n \"\"\"Method for reading data through the communication mechanism.\"\"\"\n ...\n\n @abstractmethod\n def write(self, msg: Message) -> None:\n \"\"\"Method for sending data through the communication mechanism.\"\"\"\n ...\n\n def __str__(self):\n name: str = str(type(self)).split(\"'\")[1].split(\".\")[-1]\n return f\"{name}: {self.desc}\"\n\n def __repr__(self):\n return self.__str__()\n\n def __enter__(self) -> Self:\n return self\n\n def __exit__(self) -> None: ...\n\n @property\n def has_messages(self) -> bool:\n \"\"\"Whether the Communicator has remaining messages.\n\n The precise method for determining whether there are remaining messages\n will depend on the specific Communicator sub-class.\n \"\"\"\n return False\n\n def stage_communicator(self):\n \"\"\"Alternative method for staging outside of context manager.\"\"\"\n self.__enter__()\n\n def clear_communicator(self):\n \"\"\"Alternative exit method outside of context manager.\"\"\"\n self.__exit__()\n\n def delayed_setup(self):\n \"\"\"Any setup that should be done later than init.\"\"\"\n ...\n
"},{"location":"source/execution/executor/#execution.executor.Communicator.has_messages","title":"has_messages: bool
property
","text":"Whether the Communicator has remaining messages.
The precise method for determining whether there are remaining messages will depend on the specific Communicator sub-class.
"},{"location":"source/execution/executor/#execution.executor.Communicator.__init__","title":"__init__(party=Party.TASK, use_pickle=True)
","text":"Abstract Base Class for IPC Communicator objects.
Parameters:
Name Type Description Defaultparty
Party
Which object (side/process) the Communicator is managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.
TASK
use_pickle
bool
Whether to serialize data using pickle prior to sending it.
True
Source code in lute/execution/ipc.py
def __init__(self, party: Party = Party.TASK, use_pickle: bool = True) -> None:\n \"\"\"Abstract Base Class for IPC Communicator objects.\n\n Args:\n party (Party): Which object (side/process) the Communicator is\n managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.\n use_pickle (bool): Whether to serialize data using pickle prior to\n sending it.\n \"\"\"\n self._party = party\n self._use_pickle = use_pickle\n self.desc = \"Communicator abstract base class.\"\n
"},{"location":"source/execution/executor/#execution.executor.Communicator.clear_communicator","title":"clear_communicator()
","text":"Alternative exit method outside of context manager.
Source code inlute/execution/ipc.py
def clear_communicator(self):\n \"\"\"Alternative exit method outside of context manager.\"\"\"\n self.__exit__()\n
"},{"location":"source/execution/executor/#execution.executor.Communicator.delayed_setup","title":"delayed_setup()
","text":"Any setup that should be done later than init.
Source code inlute/execution/ipc.py
def delayed_setup(self):\n \"\"\"Any setup that should be done later than init.\"\"\"\n ...\n
"},{"location":"source/execution/executor/#execution.executor.Communicator.read","title":"read(proc)
abstractmethod
","text":"Method for reading data through the communication mechanism.
Source code inlute/execution/ipc.py
@abstractmethod\ndef read(self, proc: subprocess.Popen) -> Message:\n \"\"\"Method for reading data through the communication mechanism.\"\"\"\n ...\n
"},{"location":"source/execution/executor/#execution.executor.Communicator.stage_communicator","title":"stage_communicator()
","text":"Alternative method for staging outside of context manager.
Source code inlute/execution/ipc.py
def stage_communicator(self):\n \"\"\"Alternative method for staging outside of context manager.\"\"\"\n self.__enter__()\n
"},{"location":"source/execution/executor/#execution.executor.Communicator.write","title":"write(msg)
abstractmethod
","text":"Method for sending data through the communication mechanism.
Source code inlute/execution/ipc.py
@abstractmethod\ndef write(self, msg: Message) -> None:\n \"\"\"Method for sending data through the communication mechanism.\"\"\"\n ...\n
"},{"location":"source/execution/executor/#execution.executor.Executor","title":"Executor
","text":" Bases: BaseExecutor
Basic implementation of an Executor which manages simple IPC with Task.
Attributes:
Methods:
Name Descriptionadd_hook
(event: str, hook: Callable[[None], None]) -> None: Create a new hook to be called each time a specific event occurs.
add_default_hooks
Populate the event hooks with the default functions.
update_environment
(env: Dict[str, str], update_path: str): Update the environment that is passed to the Task subprocess.
execute_task
Run the task as a subprocess.
Source code inlute/execution/executor.py
class Executor(BaseExecutor):\n \"\"\"Basic implementation of an Executor which manages simple IPC with Task.\n\n Attributes:\n\n Methods:\n add_hook(event: str, hook: Callable[[None], None]) -> None: Create a\n new hook to be called each time a specific event occurs.\n\n add_default_hooks() -> None: Populate the event hooks with the default\n functions.\n\n update_environment(env: Dict[str, str], update_path: str): Update the\n environment that is passed to the Task subprocess.\n\n execute_task(): Run the task as a subprocess.\n \"\"\"\n\n def __init__(\n self,\n task_name: str,\n communicators: List[Communicator] = [\n PipeCommunicator(Party.EXECUTOR),\n SocketCommunicator(Party.EXECUTOR),\n ],\n poll_interval: float = 0.05,\n ) -> None:\n super().__init__(\n task_name=task_name,\n communicators=communicators,\n poll_interval=poll_interval,\n )\n self.add_default_hooks()\n\n def add_default_hooks(self) -> None:\n \"\"\"Populate the set of default event hooks.\"\"\"\n\n def no_pickle_mode(self: Executor, msg: Message):\n for idx, communicator in enumerate(self._communicators):\n if isinstance(communicator, PipeCommunicator):\n self._communicators[idx] = PipeCommunicator(\n Party.EXECUTOR, use_pickle=False\n )\n\n self.add_hook(\"no_pickle_mode\", no_pickle_mode)\n\n def task_started(self: Executor, msg: Message):\n if isinstance(msg.contents, TaskParameters):\n self._analysis_desc.task_parameters = msg.contents\n # Maybe just run this no matter what? Rely on the other guards?\n # Perhaps just check if ThirdPartyParameters?\n # if isinstance(self._analysis_desc.task_parameters, ThirdPartyParameters):\n if hasattr(self._analysis_desc.task_parameters.Config, \"set_result\"):\n # Third party Tasks may mark a parameter as the result\n # If so, setup the result now.\n self._set_result_from_parameters()\n logger.info(\n f\"Executor: {self._analysis_desc.task_result.task_name} started\"\n )\n self._analysis_desc.task_result.task_status = TaskStatus.RUNNING\n elog_data: Dict[str, str] = {\n f\"{self._analysis_desc.task_result.task_name} status\": \"RUNNING\",\n }\n post_elog_run_status(elog_data)\n\n self.add_hook(\"task_started\", task_started)\n\n def task_failed(self: Executor, msg: Message):\n elog_data: Dict[str, str] = {\n f\"{self._analysis_desc.task_result.task_name} status\": \"FAILED\",\n }\n post_elog_run_status(elog_data)\n\n self.add_hook(\"task_failed\", task_failed)\n\n def task_stopped(self: Executor, msg: Message):\n elog_data: Dict[str, str] = {\n f\"{self._analysis_desc.task_result.task_name} status\": \"STOPPED\",\n }\n post_elog_run_status(elog_data)\n\n self.add_hook(\"task_stopped\", task_stopped)\n\n def task_done(self: Executor, msg: Message):\n elog_data: Dict[str, str] = {\n f\"{self._analysis_desc.task_result.task_name} status\": \"COMPLETED\",\n }\n post_elog_run_status(elog_data)\n\n self.add_hook(\"task_done\", task_done)\n\n def task_cancelled(self: Executor, msg: Message):\n elog_data: Dict[str, str] = {\n f\"{self._analysis_desc.task_result.task_name} status\": \"CANCELLED\",\n }\n post_elog_run_status(elog_data)\n\n self.add_hook(\"task_cancelled\", task_cancelled)\n\n def task_result(self: Executor, msg: Message):\n if isinstance(msg.contents, TaskResult):\n self._analysis_desc.task_result = msg.contents\n logger.info(self._analysis_desc.task_result.summary)\n logger.info(self._analysis_desc.task_result.task_status)\n elog_data: Dict[str, str] = {\n f\"{self._analysis_desc.task_result.task_name} status\": \"COMPLETED\",\n }\n post_elog_run_status(elog_data)\n\n 
self.add_hook(\"task_result\", task_result)\n\n def _task_loop(self, proc: subprocess.Popen) -> None:\n \"\"\"Actions to perform while the Task is running.\n\n This function is run in the body of a loop until the Task signals\n that its finished.\n \"\"\"\n for communicator in self._communicators:\n while True:\n msg: Message = communicator.read(proc)\n if msg.signal is not None and msg.signal.upper() in LUTE_SIGNALS:\n hook: Callable[[Executor, Message], None] = getattr(\n self.Hooks, msg.signal.lower()\n )\n hook(self, msg)\n if msg.contents is not None:\n if isinstance(msg.contents, str) and msg.contents != \"\":\n logger.info(msg.contents)\n elif not isinstance(msg.contents, str):\n logger.info(msg.contents)\n if not communicator.has_messages:\n break\n\n def _finalize_task(self, proc: subprocess.Popen) -> None:\n \"\"\"Any actions to be performed after the Task has ended.\n\n Examples include a final clearing of the pipes, retrieving results,\n reporting to third party services, etc.\n \"\"\"\n self._task_loop(proc) # Perform a final read.\n\n def _process_results(self) -> None:\n \"\"\"Performs result processing.\n\n Actions include:\n - For `ElogSummaryPlots`, will save the summary plot to the appropriate\n directory for display in the eLog.\n \"\"\"\n task_result: TaskResult = self._analysis_desc.task_result\n self._process_result_payload(task_result.payload)\n self._process_result_summary(task_result.summary)\n\n def _process_result_payload(self, payload: Any) -> None:\n if self._analysis_desc.task_parameters is None:\n logger.debug(\"Please run Task before using this method!\")\n return\n if isinstance(payload, ElogSummaryPlots):\n # ElogSummaryPlots has figures and a display name\n # display name also serves as a path.\n expmt: str = self._analysis_desc.task_parameters.lute_config.experiment\n base_path: str = f\"/sdf/data/lcls/ds/{expmt[:3]}/{expmt}/stats/summary\"\n full_path: str = f\"{base_path}/{payload.display_name}\"\n if not os.path.isdir(full_path):\n os.makedirs(full_path)\n\n # Preferred plots are pn.Tabs objects which save directly as html\n # Only supported plot type that has \"save\" method - do not want to\n # import plot modules here to do type checks.\n if hasattr(payload.figures, \"save\"):\n payload.figures.save(f\"{full_path}/report.html\")\n else:\n ...\n elif isinstance(payload, str):\n # May be a path to a file...\n schemas: Optional[str] = self._analysis_desc.task_result.impl_schemas\n # Should also check `impl_schemas` to determine what to do with path\n\n def _process_result_summary(self, summary: str) -> None: ...\n
"},{"location":"source/execution/executor/#execution.executor.Executor.add_default_hooks","title":"add_default_hooks()
","text":"Populate the set of default event hooks.
Source code inlute/execution/executor.py
def add_default_hooks(self) -> None:\n \"\"\"Populate the set of default event hooks.\"\"\"\n\n def no_pickle_mode(self: Executor, msg: Message):\n for idx, communicator in enumerate(self._communicators):\n if isinstance(communicator, PipeCommunicator):\n self._communicators[idx] = PipeCommunicator(\n Party.EXECUTOR, use_pickle=False\n )\n\n self.add_hook(\"no_pickle_mode\", no_pickle_mode)\n\n def task_started(self: Executor, msg: Message):\n if isinstance(msg.contents, TaskParameters):\n self._analysis_desc.task_parameters = msg.contents\n # Maybe just run this no matter what? Rely on the other guards?\n # Perhaps just check if ThirdPartyParameters?\n # if isinstance(self._analysis_desc.task_parameters, ThirdPartyParameters):\n if hasattr(self._analysis_desc.task_parameters.Config, \"set_result\"):\n # Third party Tasks may mark a parameter as the result\n # If so, setup the result now.\n self._set_result_from_parameters()\n logger.info(\n f\"Executor: {self._analysis_desc.task_result.task_name} started\"\n )\n self._analysis_desc.task_result.task_status = TaskStatus.RUNNING\n elog_data: Dict[str, str] = {\n f\"{self._analysis_desc.task_result.task_name} status\": \"RUNNING\",\n }\n post_elog_run_status(elog_data)\n\n self.add_hook(\"task_started\", task_started)\n\n def task_failed(self: Executor, msg: Message):\n elog_data: Dict[str, str] = {\n f\"{self._analysis_desc.task_result.task_name} status\": \"FAILED\",\n }\n post_elog_run_status(elog_data)\n\n self.add_hook(\"task_failed\", task_failed)\n\n def task_stopped(self: Executor, msg: Message):\n elog_data: Dict[str, str] = {\n f\"{self._analysis_desc.task_result.task_name} status\": \"STOPPED\",\n }\n post_elog_run_status(elog_data)\n\n self.add_hook(\"task_stopped\", task_stopped)\n\n def task_done(self: Executor, msg: Message):\n elog_data: Dict[str, str] = {\n f\"{self._analysis_desc.task_result.task_name} status\": \"COMPLETED\",\n }\n post_elog_run_status(elog_data)\n\n self.add_hook(\"task_done\", task_done)\n\n def task_cancelled(self: Executor, msg: Message):\n elog_data: Dict[str, str] = {\n f\"{self._analysis_desc.task_result.task_name} status\": \"CANCELLED\",\n }\n post_elog_run_status(elog_data)\n\n self.add_hook(\"task_cancelled\", task_cancelled)\n\n def task_result(self: Executor, msg: Message):\n if isinstance(msg.contents, TaskResult):\n self._analysis_desc.task_result = msg.contents\n logger.info(self._analysis_desc.task_result.summary)\n logger.info(self._analysis_desc.task_result.task_status)\n elog_data: Dict[str, str] = {\n f\"{self._analysis_desc.task_result.task_name} status\": \"COMPLETED\",\n }\n post_elog_run_status(elog_data)\n\n self.add_hook(\"task_result\", task_result)\n
"},{"location":"source/execution/executor/#execution.executor.MPIExecutor","title":"MPIExecutor
","text":" Bases: Executor
Runs first-party Tasks that require MPI.
This Executor is otherwise identical to the standard Executor, except it uses mpirun
for Task
submission. Currently this Executor assumes a job has been submitted using SLURM as a first step. It will determine the number of MPI ranks based on the resources requested. As a fallback, it will try to determine the number of local cores available for cases where a job has not been submitted via SLURM. On S3DF, the second determination mechanism should accurately match the environment variable provided by SLURM indicating resources allocated.
This Executor will submit the Task to run with a number of processes equal to the total number of cores available minus 1. A single core is reserved for the Executor itself. Note that currently this means that you must submit on 3 cores or more, since MPI requires a minimum of 2 ranks, and the number of ranks is determined from the cores dedicated to Task execution.
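A sketch of declaring an MPI-based managed Task, mirroring the PeakFinderPyAlgos pattern above; the import path and Task name are assumptions:
from lute.execution.executor import MPIExecutor  # assumed import path

MyMPITaskRunner: MPIExecutor = MPIExecutor("MyMPITask")  # hypothetical first-party MPI Task
# E.g., with a 5-core SLURM allocation (SLURM_NPROCS=5) the Task runs under mpirun -np 4;
# one core is reserved for the Executor itself.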
Methods:
Name Description_submit_cmd
Run the task as a subprocess using mpirun
.
lute/execution/executor.py
class MPIExecutor(Executor):\n \"\"\"Runs first-party Tasks that require MPI.\n\n This Executor is otherwise identical to the standard Executor, except it\n uses `mpirun` for `Task` submission. Currently this Executor assumes a job\n has been submitted using SLURM as a first step. It will determine the number\n of MPI ranks based on the resources requested. As a fallback, it will try\n to determine the number of local cores available for cases where a job has\n not been submitted via SLURM. On S3DF, the second determination mechanism\n should accurately match the environment variable provided by SLURM indicating\n resources allocated.\n\n This Executor will submit the Task to run with a number of processes equal\n to the total number of cores available minus 1. A single core is reserved\n for the Executor itself. Note that currently this means that you must submit\n on 3 cores or more, since MPI requires a minimum of 2 ranks, and the number\n of ranks is determined from the cores dedicated to Task execution.\n\n Methods:\n _submit_cmd: Run the task as a subprocess using `mpirun`.\n \"\"\"\n\n def _submit_cmd(self, executable_path: str, params: str) -> str:\n \"\"\"Override submission command to use `mpirun`\n\n Args:\n executable_path (str): Path to the LUTE subprocess script.\n\n params (str): String of formatted command-line arguments.\n\n Returns:\n cmd (str): Appropriately formatted command for this Executor.\n \"\"\"\n py_cmd: str = \"\"\n nprocs: int = max(\n int(os.environ.get(\"SLURM_NPROCS\", len(os.sched_getaffinity(0)))) - 1, 1\n )\n mpi_cmd: str = f\"mpirun -np {nprocs}\"\n if __debug__:\n py_cmd = f\"python -B -u -m mpi4py.run {executable_path} {params}\"\n else:\n py_cmd = f\"python -OB -u -m mpi4py.run {executable_path} {params}\"\n\n cmd: str = f\"{mpi_cmd} {py_cmd}\"\n return cmd\n
"},{"location":"source/execution/executor/#execution.executor.Party","title":"Party
","text":" Bases: Enum
Identifier for which party (side/end) is using a communicator.
For some types of communication streams there may be different interfaces depending on which side of the communicator you are on. This enum is used by the communicator to determine which interface to use.
Source code inlute/execution/ipc.py
class Party(Enum):\n \"\"\"Identifier for which party (side/end) is using a communicator.\n\n For some types of communication streams there may be different interfaces\n depending on which side of the communicator you are on. This enum is used\n by the communicator to determine which interface to use.\n \"\"\"\n\n TASK = 0\n \"\"\"\n The Task (client) side.\n \"\"\"\n EXECUTOR = 1\n \"\"\"\n The Executor (server) side.\n \"\"\"\n
"},{"location":"source/execution/executor/#execution.executor.Party.EXECUTOR","title":"EXECUTOR = 1
class-attribute
instance-attribute
","text":"The Executor (server) side.
"},{"location":"source/execution/executor/#execution.executor.Party.TASK","title":"TASK = 0
class-attribute
instance-attribute
","text":"The Task (client) side.
"},{"location":"source/execution/executor/#execution.executor.PipeCommunicator","title":"PipeCommunicator
","text":" Bases: Communicator
Provides communication through pipes over stderr/stdout.
The implementation of this communicator has reading and writing occurring on stderr and stdout. In general the Task
will be writing while the Executor
will be reading. stderr
is used for sending signals.
lute/execution/ipc.py
class PipeCommunicator(Communicator):\n \"\"\"Provides communication through pipes over stderr/stdout.\n\n The implementation of this communicator has reading and writing ocurring\n on stderr and stdout. In general the `Task` will be writing while the\n `Executor` will be reading. `stderr` is used for sending signals.\n \"\"\"\n\n def __init__(self, party: Party = Party.TASK, use_pickle: bool = True) -> None:\n \"\"\"IPC through pipes.\n\n Arbitrary objects may be transmitted using pickle to serialize the data.\n If pickle is not used\n\n Args:\n party (Party): Which object (side/process) the Communicator is\n managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.\n use_pickle (bool): Whether to serialize data using Pickle prior to\n sending it. If False, data is assumed to be text whi\n \"\"\"\n super().__init__(party=party, use_pickle=use_pickle)\n self.desc = \"Communicates through stderr and stdout using pickle.\"\n\n def read(self, proc: subprocess.Popen) -> Message:\n \"\"\"Read from stdout and stderr.\n\n Args:\n proc (subprocess.Popen): The process to read from.\n\n Returns:\n msg (Message): The message read, containing contents and signal.\n \"\"\"\n signal: Optional[str]\n contents: Optional[str]\n raw_signal: bytes = proc.stderr.read()\n raw_contents: bytes = proc.stdout.read()\n if raw_signal is not None:\n signal = raw_signal.decode()\n else:\n signal = raw_signal\n if raw_contents:\n if self._use_pickle:\n try:\n contents = pickle.loads(raw_contents)\n except (pickle.UnpicklingError, ValueError, EOFError) as err:\n logger.debug(\"PipeCommunicator (Executor) - Set _use_pickle=False\")\n self._use_pickle = False\n contents = self._safe_unpickle_decode(raw_contents)\n else:\n try:\n contents = raw_contents.decode()\n except UnicodeDecodeError as err:\n logger.debug(\"PipeCommunicator (Executor) - Set _use_pickle=True\")\n self._use_pickle = True\n contents = self._safe_unpickle_decode(raw_contents)\n else:\n contents = None\n\n if signal and signal not in LUTE_SIGNALS:\n # Some tasks write on stderr\n # If the signal channel has \"non-signal\" info, add it to\n # contents\n if not contents:\n contents = f\"({signal})\"\n else:\n contents = f\"{contents} ({signal})\"\n signal = None\n\n return Message(contents=contents, signal=signal)\n\n def _safe_unpickle_decode(self, maybe_mixed: bytes) -> Optional[str]:\n \"\"\"This method is used to unpickle and/or decode a bytes object.\n\n It attempts to handle cases where contents can be mixed, i.e., part of\n the message must be decoded and the other part unpickled. It handles\n only two-way splits. If there are more complex arrangements such as:\n <pickled>:<unpickled>:<pickled> etc, it will give up.\n\n The simpler two way splits are unlikely to occur in normal usage. They\n may arise when debugging if, e.g., `print` statements are mixed with the\n usage of the `_report_to_executor` method.\n\n Note that this method works because ONLY text data is assumed to be\n sent via the pipes. The method needs to be revised to handle non-text\n data if the `Task` is modified to also send that via PipeCommunicator.\n The use of pickle is supported to provide for this option if it is\n necessary. It may be deprecated in the future.\n\n Be careful when making changes. This method has seemingly redundant\n checks because unpickling will not throw an error if a full object can\n be retrieved. That is, the library will ignore extraneous bytes. 
This\n method attempts to retrieve that information if the pickled data comes\n first in the stream.\n\n Args:\n maybe_mixed (bytes): A bytes object which could require unpickling,\n decoding, or both.\n\n Returns:\n contents (Optional[str]): The unpickled/decoded contents if possible.\n Otherwise, None.\n \"\"\"\n contents: Optional[str]\n try:\n contents = pickle.loads(maybe_mixed)\n repickled: bytes = pickle.dumps(contents)\n if len(repickled) < len(maybe_mixed):\n # Successful unpickling, but pickle stops even if there are more bytes\n try:\n additional_data: str = maybe_mixed[len(repickled) :].decode()\n contents = f\"{contents}{additional_data}\"\n except UnicodeDecodeError:\n # Can't decode the bytes left by pickle, so they are lost\n missing_bytes: int = len(maybe_mixed) - len(repickled)\n logger.debug(\n f\"PipeCommunicator has truncated message. Unable to retrieve {missing_bytes} bytes.\"\n )\n except (pickle.UnpicklingError, ValueError, EOFError) as err:\n # Pickle may also throw a ValueError, e.g. this bytes: b\"Found! \\n\"\n # Pickle may also throw an EOFError, eg. this bytes: b\"F0\\n\"\n try:\n contents = maybe_mixed.decode()\n except UnicodeDecodeError as err2:\n try:\n contents = maybe_mixed[: err2.start].decode()\n contents = f\"{contents}{pickle.loads(maybe_mixed[err2.start:])}\"\n except Exception as err3:\n logger.debug(\n f\"PipeCommunicator unable to decode/parse data! {err3}\"\n )\n contents = None\n return contents\n\n def write(self, msg: Message) -> None:\n \"\"\"Write to stdout and stderr.\n\n The signal component is sent to `stderr` while the contents of the\n Message are sent to `stdout`.\n\n Args:\n msg (Message): The Message to send.\n \"\"\"\n if self._use_pickle:\n signal: bytes\n if msg.signal:\n signal = msg.signal.encode()\n else:\n signal = b\"\"\n\n contents: bytes = pickle.dumps(msg.contents)\n\n sys.stderr.buffer.write(signal)\n sys.stdout.buffer.write(contents)\n\n sys.stderr.buffer.flush()\n sys.stdout.buffer.flush()\n else:\n raw_signal: str\n if msg.signal:\n raw_signal = msg.signal\n else:\n raw_signal = \"\"\n\n raw_contents: str\n if isinstance(msg.contents, str):\n raw_contents = msg.contents\n elif msg.contents is None:\n raw_contents = \"\"\n else:\n raise ValueError(\n f\"Cannot send msg contents of type: {type(msg.contents)} when not using pickle!\"\n )\n sys.stderr.write(raw_signal)\n sys.stdout.write(raw_contents)\n
"},{"location":"source/execution/executor/#execution.executor.PipeCommunicator.__init__","title":"__init__(party=Party.TASK, use_pickle=True)
","text":"IPC through pipes.
Arbitrary objects may be transmitted using pickle to serialize the data. If pickle is not used, data is assumed to be plain text.
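A minimal Task-side sketch; the import path is assumed from the source listing:
from lute.execution.ipc import Message, Party, PipeCommunicator  # assumed import path

comm = PipeCommunicator(Party.TASK)
# Contents are written to stdout; an optional signal would go to stderr.
comm.write(Message(contents="Processing event batch 1", signal=None))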
Parameters:
Name Type Description Defaultparty
Party
Which object (side/process) the Communicator is managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.
TASK
use_pickle
bool
Whether to serialize data using Pickle prior to sending it. If False, data is assumed to be text.
True
Source code in lute/execution/ipc.py
def __init__(self, party: Party = Party.TASK, use_pickle: bool = True) -> None:\n \"\"\"IPC through pipes.\n\n Arbitrary objects may be transmitted using pickle to serialize the data.\n If pickle is not used\n\n Args:\n party (Party): Which object (side/process) the Communicator is\n managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.\n use_pickle (bool): Whether to serialize data using Pickle prior to\n sending it. If False, data is assumed to be text whi\n \"\"\"\n super().__init__(party=party, use_pickle=use_pickle)\n self.desc = \"Communicates through stderr and stdout using pickle.\"\n
"},{"location":"source/execution/executor/#execution.executor.PipeCommunicator.read","title":"read(proc)
","text":"Read from stdout and stderr.
Parameters:
Name Type Description Defaultproc
Popen
The process to read from.
requiredReturns:
Name Type Descriptionmsg
Message
The message read, containing contents and signal.
Source code inlute/execution/ipc.py
def read(self, proc: subprocess.Popen) -> Message:\n \"\"\"Read from stdout and stderr.\n\n Args:\n proc (subprocess.Popen): The process to read from.\n\n Returns:\n msg (Message): The message read, containing contents and signal.\n \"\"\"\n signal: Optional[str]\n contents: Optional[str]\n raw_signal: bytes = proc.stderr.read()\n raw_contents: bytes = proc.stdout.read()\n if raw_signal is not None:\n signal = raw_signal.decode()\n else:\n signal = raw_signal\n if raw_contents:\n if self._use_pickle:\n try:\n contents = pickle.loads(raw_contents)\n except (pickle.UnpicklingError, ValueError, EOFError) as err:\n logger.debug(\"PipeCommunicator (Executor) - Set _use_pickle=False\")\n self._use_pickle = False\n contents = self._safe_unpickle_decode(raw_contents)\n else:\n try:\n contents = raw_contents.decode()\n except UnicodeDecodeError as err:\n logger.debug(\"PipeCommunicator (Executor) - Set _use_pickle=True\")\n self._use_pickle = True\n contents = self._safe_unpickle_decode(raw_contents)\n else:\n contents = None\n\n if signal and signal not in LUTE_SIGNALS:\n # Some tasks write on stderr\n # If the signal channel has \"non-signal\" info, add it to\n # contents\n if not contents:\n contents = f\"({signal})\"\n else:\n contents = f\"{contents} ({signal})\"\n signal = None\n\n return Message(contents=contents, signal=signal)\n
"},{"location":"source/execution/executor/#execution.executor.PipeCommunicator.write","title":"write(msg)
","text":"Write to stdout and stderr.
The signal component is sent to stderr
while the contents of the Message are sent to stdout
.
Parameters:
Name Type Description Defaultmsg
Message
The Message to send.
required Source code inlute/execution/ipc.py
def write(self, msg: Message) -> None:\n \"\"\"Write to stdout and stderr.\n\n The signal component is sent to `stderr` while the contents of the\n Message are sent to `stdout`.\n\n Args:\n msg (Message): The Message to send.\n \"\"\"\n if self._use_pickle:\n signal: bytes\n if msg.signal:\n signal = msg.signal.encode()\n else:\n signal = b\"\"\n\n contents: bytes = pickle.dumps(msg.contents)\n\n sys.stderr.buffer.write(signal)\n sys.stdout.buffer.write(contents)\n\n sys.stderr.buffer.flush()\n sys.stdout.buffer.flush()\n else:\n raw_signal: str\n if msg.signal:\n raw_signal = msg.signal\n else:\n raw_signal = \"\"\n\n raw_contents: str\n if isinstance(msg.contents, str):\n raw_contents = msg.contents\n elif msg.contents is None:\n raw_contents = \"\"\n else:\n raise ValueError(\n f\"Cannot send msg contents of type: {type(msg.contents)} when not using pickle!\"\n )\n sys.stderr.write(raw_signal)\n sys.stdout.write(raw_contents)\n
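As a hedged usage sketch (assuming LUTE is importable; the child command is illustrative), the Executor-side PipeCommunicator can read a Message from any subprocess started with piped stdout and stderr. A child that prints plain text exercises the text fallback described in read() above:
import subprocess\nimport sys\n\nfrom lute.execution.ipc import Party, PipeCommunicator\n\ncomm = PipeCommunicator(party=Party.EXECUTOR)\nproc = subprocess.Popen(\n    [sys.executable, '-c', 'print(42)'],\n    stdout=subprocess.PIPE,\n    stderr=subprocess.PIPE,\n)\nproc.wait()\nmsg = comm.read(proc)   # Message with decoded text contents and no signal\nprint(msg.contents)\n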
"},{"location":"source/execution/executor/#execution.executor.SocketCommunicator","title":"SocketCommunicator
","text":" Bases: Communicator
Provides communication over Unix or TCP sockets.
Communication is provided either using sockets with the Python socket library or using ZMQ. The choice of implementation is controlled by the global bool USE_ZMQ
.
Whether to use TCP or Unix sockets is controlled by the environment variable: LUTE_USE_TCP=1
If defined, TCP sockets will be used, otherwise Unix sockets will be used.
Regardless of socket type, the environment variable LUTE_EXECUTOR_HOST=<hostname>
will be defined by the Executor-side Communicator.
For TCP sockets: The Executor-side Communicator should be run first and will bind to all interfaces on the port determined by the environment variable: LUTE_PORT=###
If no port is defined, a port scan will be performed and the Executor-side Communicator will bind to the first available port from a random selection. It will then define the environment variable so the Task-side can pick it up.
For Unix sockets: The path to the Unix socket is defined by the environment variable: LUTE_SOCKET=/path/to/socket
This class assumes proper permissions and that this above environment variable has been defined. The Task
is configured as what would commonly be referred to as the client
, while the Executor
is configured as the server.
If the Task process is run on a different machine than the Executor, the Task-side Communicator will open an SSH tunnel to forward traffic from a local Unix socket to the Executor Unix socket. Opening of the tunnel relies on the environment variable: LUTE_EXECUTOR_HOST=<hostname>
to determine the Executor's host. This variable should be defined by the Executor and passed to the Task process automatically, but it can also be defined manually if launching the Task process separately. The Task will use the local socket <LUTE_SOCKET>.task{##}
. Multiple local sockets may be created. Currently, it is assumed that the user is identical on both the Task machine and Executor machine.
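The configuration described above can be illustrated with a short, hedged sketch (the values are examples only); the transport is selected entirely through the environment before the Executor-side Communicator sets up its socket:
import os\n\n# Example values only. Either use TCP...\nos.environ['LUTE_USE_TCP'] = '1'\nos.environ['LUTE_PORT'] = '45551'     # optional; otherwise a free port is found and exported\n\n# ...or comment the two lines above and use a Unix socket instead:\n# os.environ['LUTE_SOCKET'] = '/tmp/lute_example.sock'\n\n# LUTE_EXECUTOR_HOST is defined automatically by the Executor-side Communicator.\n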
lute/execution/ipc.py
class SocketCommunicator(Communicator):\n \"\"\"Provides communication over Unix or TCP sockets.\n\n Communication is provided either using sockets with the Python socket library\n or using ZMQ. The choice of implementation is controlled by the global bool\n `USE_ZMQ`.\n\n Whether to use TCP or Unix sockets is controlled by the environment:\n `LUTE_USE_TCP=1`\n If defined, TCP sockets will be used, otherwise Unix sockets will be used.\n\n Regardless of socket type, the environment variable\n `LUTE_EXECUTOR_HOST=<hostname>`\n will be defined by the Executor-side Communicator.\n\n\n For TCP sockets:\n The Executor-side Communicator should be run first and will bind to all\n interfaces on the port determined by the environment variable:\n `LUTE_PORT=###`\n If no port is defined, a port scan will be performed and the Executor-side\n Communicator will bind the first one available from a random selection. It\n will then define the environment variable so the Task-side can pick it up.\n\n For Unix sockets:\n The path to the Unix socket is defined by the environment variable:\n `LUTE_SOCKET=/path/to/socket`\n This class assumes proper permissions and that this above environment\n variable has been defined. The `Task` is configured as what would commonly\n be referred to as the `client`, while the `Executor` is configured as the\n server.\n\n If the Task process is run on a different machine than the Executor, the\n Task-side Communicator will open a ssh-tunnel to forward traffic from a local\n Unix socket to the Executor Unix socket. Opening of the tunnel relies on the\n environment variable:\n `LUTE_EXECUTOR_HOST=<hostname>`\n to determine the Executor's host. This variable should be defined by the\n Executor and passed to the Task process automatically, but it can also be\n defined manually if launching the Task process separately. The Task will use\n the local socket `<LUTE_SOCKET>.task{##}`. Multiple local sockets may be\n created. Currently, it is assumed that the user is identical on both the Task\n machine and Executor machine.\n \"\"\"\n\n ACCEPT_TIMEOUT: float = 0.01\n \"\"\"\n Maximum time to wait to accept connections. Used by Executor-side.\n \"\"\"\n MSG_HEAD: bytes = b\"MSG\"\n \"\"\"\n Start signal of a message. The end of a message is indicated by MSG_HEAD[::-1].\n \"\"\"\n MSG_SEP: bytes = b\";;;\"\n \"\"\"\n Separator for parts of a message. Messages have a start, length, message and end.\n \"\"\"\n\n def __init__(self, party: Party = Party.TASK, use_pickle: bool = True) -> None:\n \"\"\"IPC over a TCP or Unix socket.\n\n Unlike with the PipeCommunicator, pickle is always used to send data\n through the socket.\n\n Args:\n party (Party): Which object (side/process) the Communicator is\n managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.\n\n use_pickle (bool): Whether to use pickle. Always True currently,\n passing False does not change behaviour.\n \"\"\"\n super().__init__(party=party, use_pickle=use_pickle)\n\n def delayed_setup(self) -> None:\n \"\"\"Delays the creation of socket objects.\n\n The Executor initializes the Communicator when it is created. 
Since\n all Executors are created and available at once we want to delay\n acquisition of socket resources until a single Executor is ready\n to use them.\n \"\"\"\n self._data_socket: Union[socket.socket, zmq.sugar.socket.Socket]\n if USE_ZMQ:\n self.desc: str = \"Communicates using ZMQ through TCP or Unix sockets.\"\n self._context: zmq.context.Context = zmq.Context()\n self._data_socket = self._create_socket_zmq()\n else:\n self.desc: str = \"Communicates through a TCP or Unix socket.\"\n self._data_socket = self._create_socket_raw()\n self._data_socket.settimeout(SocketCommunicator.ACCEPT_TIMEOUT)\n\n if self._party == Party.EXECUTOR:\n # Executor created first so we can define the hostname env variable\n os.environ[\"LUTE_EXECUTOR_HOST\"] = socket.gethostname()\n # Setup reader thread\n self._reader_thread: threading.Thread = threading.Thread(\n target=self._read_socket\n )\n self._msg_queue: queue.Queue = queue.Queue()\n self._partial_msg: Optional[bytes] = None\n self._stop_thread: bool = False\n self._reader_thread.start()\n else:\n # Only used by Party.TASK\n self._use_ssh_tunnel: bool = False\n self._ssh_proc: Optional[subprocess.Popen] = None\n self._local_socket_path: Optional[str] = None\n\n # Read\n ############################################################################\n\n def read(self, proc: subprocess.Popen) -> Message:\n \"\"\"Return a message from the queue if available.\n\n Socket(s) are continuously monitored, and read from when new data is\n available.\n\n Args:\n proc (subprocess.Popen): The process to read from. Provided for\n compatibility with other Communicator subtypes. Is ignored.\n\n Returns:\n msg (Message): The message read, containing contents and signal.\n \"\"\"\n msg: Message\n try:\n msg = self._msg_queue.get(timeout=SocketCommunicator.ACCEPT_TIMEOUT)\n except queue.Empty:\n msg = Message()\n\n return msg\n\n def _read_socket(self) -> None:\n \"\"\"Read data from a socket.\n\n Socket(s) are continuously monitored, and read from when new data is\n available.\n\n Calls an underlying method for either raw sockets or ZMQ.\n \"\"\"\n\n while True:\n if self._stop_thread:\n logger.debug(\"Stopping socket reader thread.\")\n break\n if USE_ZMQ:\n self._read_socket_zmq()\n else:\n self._read_socket_raw()\n\n def _read_socket_raw(self) -> None:\n \"\"\"Read data from a socket.\n\n Raw socket implementation for the reader thread.\n \"\"\"\n connection: socket.socket\n addr: Union[str, Tuple[str, int]]\n try:\n connection, addr = self._data_socket.accept()\n full_data: bytes = b\"\"\n while True:\n data: bytes = connection.recv(8192)\n if data:\n full_data += data\n else:\n break\n connection.close()\n self._unpack_messages(full_data)\n except socket.timeout:\n pass\n\n def _read_socket_zmq(self) -> None:\n \"\"\"Read data from a socket.\n\n ZMQ implementation for the reader thread.\n \"\"\"\n try:\n full_data: bytes = self._data_socket.recv(0)\n self._unpack_messages(full_data)\n except zmq.ZMQError:\n pass\n\n def _unpack_messages(self, data: bytes) -> None:\n \"\"\"Unpacks a byte stream into individual messages.\n\n Messages are encoded in the following format:\n <HEAD><SEP><len(msg)><SEP><msg><SEP><HEAD[::-1]>\n The items between <> are replaced as follows:\n - <HEAD>: A start marker\n - <SEP>: A separator for components of the message\n - <len(msg)>: The length of the message payload in bytes.\n - <msg>: The message payload in bytes\n - <HEAD[::-1]>: The start marker in reverse to indicate the end.\n\n Partial messages (a series of bytes which cannot be 
converted to a full\n message) are stored for later. An attempt is made to reconstruct the\n message with the next call to this method.\n\n Args:\n data (bytes): A raw byte stream containing anywhere from a partial\n message to multiple full messages.\n \"\"\"\n msg: Message\n working_data: bytes\n if self._partial_msg:\n # Concatenate the previous partial message to the beginning\n working_data = self._partial_msg + data\n self._partial_msg = None\n else:\n working_data = data\n while working_data:\n try:\n # Message encoding: <HEAD><SEP><len><SEP><msg><SEP><HEAD[::-1]>\n end = working_data.find(\n SocketCommunicator.MSG_SEP + SocketCommunicator.MSG_HEAD[::-1]\n )\n msg_parts: List[bytes] = working_data[:end].split(\n SocketCommunicator.MSG_SEP\n )\n if len(msg_parts) != 3:\n self._partial_msg = working_data\n break\n\n cmd: bytes\n nbytes: bytes\n raw_msg: bytes\n cmd, nbytes, raw_msg = msg_parts\n if len(raw_msg) != int(nbytes):\n self._partial_msg = working_data\n break\n msg = pickle.loads(raw_msg)\n self._msg_queue.put(msg)\n except pickle.UnpicklingError:\n self._partial_msg = working_data\n break\n if end < len(working_data):\n # Add len(SEP+HEAD) since end marks the start of <SEP><HEAD[::-1]\n offset: int = len(\n SocketCommunicator.MSG_SEP + SocketCommunicator.MSG_HEAD\n )\n working_data = working_data[end + offset :]\n else:\n working_data = b\"\"\n\n # Write\n ############################################################################\n\n def _write_socket(self, msg: Message) -> None:\n \"\"\"Sends data over a socket from the 'client' (Task) side.\n\n Messages are encoded in the following format:\n <HEAD><SEP><len(msg)><SEP><msg><SEP><HEAD[::-1]>\n The items between <> are replaced as follows:\n - <HEAD>: A start marker\n - <SEP>: A separator for components of the message\n - <len(msg)>: The length of the message payload in bytes.\n - <msg>: The message payload in bytes\n - <HEAD[::-1]>: The start marker in reverse to indicate the end.\n\n This structure is used for decoding the message on the other end.\n \"\"\"\n data: bytes = pickle.dumps(msg)\n cmd: bytes = SocketCommunicator.MSG_HEAD\n size: bytes = b\"%d\" % len(data)\n end: bytes = SocketCommunicator.MSG_HEAD[::-1]\n sep: bytes = SocketCommunicator.MSG_SEP\n packed_msg: bytes = cmd + sep + size + sep + data + sep + end\n if USE_ZMQ:\n self._data_socket.send(packed_msg)\n else:\n self._data_socket.sendall(packed_msg)\n\n def write(self, msg: Message) -> None:\n \"\"\"Send a single Message.\n\n The entire Message (signal and contents) is serialized and sent through\n a connection over Unix socket.\n\n Args:\n msg (Message): The Message to send.\n \"\"\"\n self._write_socket(msg)\n\n # Generic create\n ############################################################################\n\n def _create_socket_raw(self) -> socket.socket:\n \"\"\"Create either a Unix or TCP socket.\n\n If the environment variable:\n `LUTE_USE_TCP=1`\n is defined, a TCP socket is returned, otherwise a Unix socket.\n\n Refer to the individual initialization methods for additional environment\n variables controlling the behaviour of these two communication types.\n\n Returns:\n data_socket (socket.socket): TCP or Unix socket.\n \"\"\"\n import struct\n\n use_tcp: Optional[str] = os.getenv(\"LUTE_USE_TCP\")\n sock: socket.socket\n if use_tcp is not None:\n if self._party == Party.EXECUTOR:\n logger.info(\"Will use raw TCP sockets.\")\n sock = self._init_tcp_socket_raw()\n else:\n if self._party == Party.EXECUTOR:\n logger.info(\"Will use raw Unix 
sockets.\")\n sock = self._init_unix_socket_raw()\n sock.setsockopt(\n socket.SOL_SOCKET, socket.SO_LINGER, struct.pack(\"ii\", 1, 10000)\n )\n return sock\n\n def _create_socket_zmq(self) -> zmq.sugar.socket.Socket:\n \"\"\"Create either a Unix or TCP socket.\n\n If the environment variable:\n `LUTE_USE_TCP=1`\n is defined, a TCP socket is returned, otherwise a Unix socket.\n\n Refer to the individual initialization methods for additional environment\n variables controlling the behaviour of these two communication types.\n\n Returns:\n data_socket (socket.socket): Unix socket object.\n \"\"\"\n socket_type: Literal[zmq.PULL, zmq.PUSH]\n if self._party == Party.EXECUTOR:\n socket_type = zmq.PULL\n else:\n socket_type = zmq.PUSH\n\n data_socket: zmq.sugar.socket.Socket = self._context.socket(socket_type)\n data_socket.set_hwm(160000)\n # Need to multiply by 1000 since ZMQ uses ms\n data_socket.setsockopt(\n zmq.RCVTIMEO, int(SocketCommunicator.ACCEPT_TIMEOUT * 1000)\n )\n # Try TCP first\n use_tcp: Optional[str] = os.getenv(\"LUTE_USE_TCP\")\n if use_tcp is not None:\n if self._party == Party.EXECUTOR:\n logger.info(\"Will use TCP (ZMQ).\")\n self._init_tcp_socket_zmq(data_socket)\n else:\n if self._party == Party.EXECUTOR:\n logger.info(\"Will use Unix sockets (ZMQ).\")\n self._init_unix_socket_zmq(data_socket)\n\n return data_socket\n\n # TCP Init\n ############################################################################\n\n def _find_random_port(\n self, min_port: int = 41923, max_port: int = 64324, max_tries: int = 100\n ) -> Optional[int]:\n \"\"\"Find a random open port to bind to if using TCP.\"\"\"\n from random import choices\n\n sock: socket.socket\n ports: List[int] = choices(range(min_port, max_port), k=max_tries)\n for port in ports:\n sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)\n try:\n sock.bind((\"\", port))\n sock.close()\n del sock\n return port\n except:\n continue\n return None\n\n def _init_tcp_socket_raw(self) -> socket.socket:\n \"\"\"Initialize a TCP socket.\n\n Executor-side code should always be run first. It checks to see if\n the environment variable\n `LUTE_PORT=###`\n is defined, if so binds it, otherwise find a free port from a selection\n of random ports. If a port search is performed, the `LUTE_PORT` variable\n will be defined so it can be picked up by the the Task-side Communicator.\n\n In the event that no port can be bound on the Executor-side, or the port\n and hostname information is unavailable to the Task-side, the program\n will exit.\n\n Returns:\n data_socket (socket.socket): TCP socket object.\n \"\"\"\n data_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)\n port: Optional[Union[str, int]] = os.getenv(\"LUTE_PORT\")\n if self._party == Party.EXECUTOR:\n if port is None:\n # If port is None find one\n # Executor code executes first\n port = self._find_random_port()\n if port is None:\n # Failed to find a port to bind\n logger.info(\n \"Executor failed to bind a port. \"\n \"Try providing a LUTE_PORT directly! Exiting!\"\n )\n sys.exit(-1)\n # Provide port env var for Task-side\n os.environ[\"LUTE_PORT\"] = str(port)\n data_socket.bind((\"\", int(port)))\n data_socket.listen()\n else:\n hostname: str = socket.gethostname()\n executor_hostname: Optional[str] = os.getenv(\"LUTE_EXECUTOR_HOST\")\n if executor_hostname is None or port is None:\n logger.info(\n \"Task-side does not have host/port information!\"\n \" Check environment variables! 
Exiting!\"\n )\n sys.exit(-1)\n if hostname == executor_hostname:\n data_socket.connect((\"localhost\", int(port)))\n else:\n data_socket.connect((executor_hostname, int(port)))\n return data_socket\n\n def _init_tcp_socket_zmq(self, data_socket: zmq.sugar.socket.Socket) -> None:\n \"\"\"Initialize a TCP socket using ZMQ.\n\n Equivalent as the method above but requires passing in a ZMQ socket\n object instead of returning one.\n\n Args:\n data_socket (zmq.socket.Socket): Socket object.\n \"\"\"\n port: Optional[Union[str, int]] = os.getenv(\"LUTE_PORT\")\n if self._party == Party.EXECUTOR:\n if port is None:\n new_port: int = data_socket.bind_to_random_port(\"tcp://*\")\n if new_port is None:\n # Failed to find a port to bind\n logger.info(\n \"Executor failed to bind a port. \"\n \"Try providing a LUTE_PORT directly! Exiting!\"\n )\n sys.exit(-1)\n port = new_port\n os.environ[\"LUTE_PORT\"] = str(port)\n else:\n data_socket.bind(f\"tcp://*:{port}\")\n logger.debug(f\"Executor bound port {port}\")\n else:\n executor_hostname: Optional[str] = os.getenv(\"LUTE_EXECUTOR_HOST\")\n if executor_hostname is None or port is None:\n logger.info(\n \"Task-side does not have host/port information!\"\n \" Check environment variables! Exiting!\"\n )\n sys.exit(-1)\n data_socket.connect(f\"tcp://{executor_hostname}:{port}\")\n\n # Unix Init\n ############################################################################\n\n def _get_socket_path(self) -> str:\n \"\"\"Return the socket path, defining one if it is not available.\n\n Returns:\n socket_path (str): Path to the Unix socket.\n \"\"\"\n socket_path: str\n try:\n socket_path = os.environ[\"LUTE_SOCKET\"]\n except KeyError as err:\n import uuid\n import tempfile\n\n # Define a path, and add to environment\n # Executor-side always created first, Task will use the same one\n socket_path = f\"{tempfile.gettempdir()}/lute_{uuid.uuid4().hex}.sock\"\n os.environ[\"LUTE_SOCKET\"] = socket_path\n logger.debug(f\"SocketCommunicator defines socket_path: {socket_path}\")\n if USE_ZMQ:\n return f\"ipc://{socket_path}\"\n else:\n return socket_path\n\n def _init_unix_socket_raw(self) -> socket.socket:\n \"\"\"Returns a Unix socket object.\n\n Executor-side code should always be run first. It checks to see if\n the environment variable\n `LUTE_SOCKET=XYZ`\n is defined, if so binds it, otherwise it will create a new path and\n define the environment variable for the Task-side to find.\n\n On the Task (client-side), this method will also open a SSH tunnel to\n forward a local Unix socket to an Executor Unix socket if the Task and\n Executor processes are on different machines.\n\n Returns:\n data_socket (socket.socket): Unix socket object.\n \"\"\"\n socket_path: str = self._get_socket_path()\n data_socket = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)\n if self._party == Party.EXECUTOR:\n if os.path.exists(socket_path):\n os.unlink(socket_path)\n data_socket.bind(socket_path)\n data_socket.listen()\n elif self._party == Party.TASK:\n hostname: str = socket.gethostname()\n executor_hostname: Optional[str] = os.getenv(\"LUTE_EXECUTOR_HOST\")\n if executor_hostname is None:\n logger.info(\"Hostname for Executor process not found! 
Exiting!\")\n data_socket.close()\n sys.exit(-1)\n if hostname == executor_hostname:\n data_socket.connect(socket_path)\n else:\n self._local_socket_path = self._setup_unix_ssh_tunnel(\n socket_path, hostname, executor_hostname\n )\n while 1:\n # Keep trying reconnect until ssh tunnel works.\n try:\n data_socket.connect(self._local_socket_path)\n break\n except FileNotFoundError:\n continue\n\n return data_socket\n\n def _init_unix_socket_zmq(self, data_socket: zmq.sugar.socket.Socket) -> None:\n \"\"\"Initialize a Unix socket object, using ZMQ.\n\n Equivalent as the method above but requires passing in a ZMQ socket\n object instead of returning one.\n\n Args:\n data_socket (socket.socket): ZMQ object.\n \"\"\"\n socket_path = self._get_socket_path()\n if self._party == Party.EXECUTOR:\n if os.path.exists(socket_path):\n os.unlink(socket_path)\n data_socket.bind(socket_path)\n elif self._party == Party.TASK:\n hostname: str = socket.gethostname()\n executor_hostname: Optional[str] = os.getenv(\"LUTE_EXECUTOR_HOST\")\n if executor_hostname is None:\n logger.info(\"Hostname for Executor process not found! Exiting!\")\n self._data_socket.close()\n sys.exit(-1)\n if hostname == executor_hostname:\n data_socket.connect(socket_path)\n else:\n # Need to remove ipc:// from socket_path for forwarding\n self._local_socket_path = self._setup_unix_ssh_tunnel(\n socket_path[6:], hostname, executor_hostname\n )\n # Need to add it back\n path: str = f\"ipc://{self._local_socket_path}\"\n data_socket.connect(path)\n\n def _setup_unix_ssh_tunnel(\n self, socket_path: str, hostname: str, executor_hostname: str\n ) -> str:\n \"\"\"Prepares an SSH tunnel for forwarding between Unix sockets on two hosts.\n\n An SSH tunnel is opened with `ssh -L <local>:<remote> sleep 2`.\n This method of communication is slightly slower and incurs additional\n overhead - it should only be used as a backup. If communication across\n multiple hosts is required consider using TCP. The Task will use\n the local socket `<LUTE_SOCKET>.task{##}`. Multiple local sockets may be\n created. It is assumed that the user is identical on both the\n Task machine and Executor machine.\n\n Returns:\n local_socket_path (str): The local Unix socket to connect to.\n \"\"\"\n if \"uuid\" not in globals():\n import uuid\n local_socket_path = f\"{socket_path}.task{uuid.uuid4().hex[:4]}\"\n self._use_ssh_tunnel = True\n ssh_cmd: List[str] = [\n \"ssh\",\n \"-o\",\n \"LogLevel=quiet\",\n \"-L\",\n f\"{local_socket_path}:{socket_path}\",\n executor_hostname,\n \"sleep\",\n \"2\",\n ]\n logger.debug(f\"Opening tunnel from {hostname} to {executor_hostname}\")\n self._ssh_proc = subprocess.Popen(ssh_cmd)\n time.sleep(0.4) # Need to wait... 
-> Use single Task comm at beginning?\n return local_socket_path\n\n # Clean up and properties\n ############################################################################\n\n def _clean_up(self) -> None:\n \"\"\"Clean up connections.\"\"\"\n if self._party == Party.EXECUTOR:\n self._stop_thread = True\n self._reader_thread.join()\n logger.debug(\"Closed reading thread.\")\n\n self._data_socket.close()\n if USE_ZMQ:\n self._context.term()\n else:\n ...\n\n if os.getenv(\"LUTE_USE_TCP\"):\n return\n else:\n if self._party == Party.EXECUTOR:\n os.unlink(os.getenv(\"LUTE_SOCKET\")) # Should be defined\n return\n elif self._use_ssh_tunnel:\n if self._ssh_proc is not None:\n self._ssh_proc.terminate()\n\n @property\n def has_messages(self) -> bool:\n if self._party == Party.TASK:\n # Shouldn't be called on Task-side\n return False\n\n if self._msg_queue.qsize() > 0:\n return True\n return False\n\n def __exit__(self):\n self._clean_up()\n
"},{"location":"source/execution/executor/#execution.executor.SocketCommunicator.ACCEPT_TIMEOUT","title":"ACCEPT_TIMEOUT: float = 0.01
class-attribute
instance-attribute
","text":"Maximum time to wait to accept connections. Used by Executor-side.
"},{"location":"source/execution/executor/#execution.executor.SocketCommunicator.MSG_HEAD","title":"MSG_HEAD: bytes = b'MSG'
class-attribute
instance-attribute
","text":"Start signal of a message. The end of a message is indicated by MSG_HEAD[::-1].
"},{"location":"source/execution/executor/#execution.executor.SocketCommunicator.MSG_SEP","title":"MSG_SEP: bytes = b';;;'
class-attribute
instance-attribute
","text":"Separator for parts of a message. Messages have a start, length, message and end.
"},{"location":"source/execution/executor/#execution.executor.SocketCommunicator.__init__","title":"__init__(party=Party.TASK, use_pickle=True)
","text":"IPC over a TCP or Unix socket.
Unlike with the PipeCommunicator, pickle is always used to send data through the socket.
Parameters:
Name Type Description Defaultparty
Party
Which object (side/process) the Communicator is managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.
TASK
use_pickle
bool
Whether to use pickle. Always True currently; passing False does not change behaviour.
True
Source code in lute/execution/ipc.py
def __init__(self, party: Party = Party.TASK, use_pickle: bool = True) -> None:\n \"\"\"IPC over a TCP or Unix socket.\n\n Unlike with the PipeCommunicator, pickle is always used to send data\n through the socket.\n\n Args:\n party (Party): Which object (side/process) the Communicator is\n managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.\n\n use_pickle (bool): Whether to use pickle. Always True currently,\n passing False does not change behaviour.\n \"\"\"\n super().__init__(party=party, use_pickle=use_pickle)\n
"},{"location":"source/execution/executor/#execution.executor.SocketCommunicator.delayed_setup","title":"delayed_setup()
","text":"Delays the creation of socket objects.
The Executor initializes the Communicator when it is created. Since all Executors are created and available at once we want to delay acquisition of socket resources until a single Executor is ready to use them.
Source code inlute/execution/ipc.py
def delayed_setup(self) -> None:\n \"\"\"Delays the creation of socket objects.\n\n The Executor initializes the Communicator when it is created. Since\n all Executors are created and available at once we want to delay\n acquisition of socket resources until a single Executor is ready\n to use them.\n \"\"\"\n self._data_socket: Union[socket.socket, zmq.sugar.socket.Socket]\n if USE_ZMQ:\n self.desc: str = \"Communicates using ZMQ through TCP or Unix sockets.\"\n self._context: zmq.context.Context = zmq.Context()\n self._data_socket = self._create_socket_zmq()\n else:\n self.desc: str = \"Communicates through a TCP or Unix socket.\"\n self._data_socket = self._create_socket_raw()\n self._data_socket.settimeout(SocketCommunicator.ACCEPT_TIMEOUT)\n\n if self._party == Party.EXECUTOR:\n # Executor created first so we can define the hostname env variable\n os.environ[\"LUTE_EXECUTOR_HOST\"] = socket.gethostname()\n # Setup reader thread\n self._reader_thread: threading.Thread = threading.Thread(\n target=self._read_socket\n )\n self._msg_queue: queue.Queue = queue.Queue()\n self._partial_msg: Optional[bytes] = None\n self._stop_thread: bool = False\n self._reader_thread.start()\n else:\n # Only used by Party.TASK\n self._use_ssh_tunnel: bool = False\n self._ssh_proc: Optional[subprocess.Popen] = None\n self._local_socket_path: Optional[str] = None\n
"},{"location":"source/execution/executor/#execution.executor.SocketCommunicator.read","title":"read(proc)
","text":"Return a message from the queue if available.
Socket(s) are continuously monitored, and read from when new data is available.
Parameters:
Name Type Description Defaultproc
Popen
The process to read from. Provided for compatibility with other Communicator subtypes. Is ignored.
requiredReturns:
Name Type Descriptionmsg
Message
The message read, containing contents and signal.
Source code inlute/execution/ipc.py
def read(self, proc: subprocess.Popen) -> Message:\n \"\"\"Return a message from the queue if available.\n\n Socket(s) are continuously monitored, and read from when new data is\n available.\n\n Args:\n proc (subprocess.Popen): The process to read from. Provided for\n compatibility with other Communicator subtypes. Is ignored.\n\n Returns:\n msg (Message): The message read, containing contents and signal.\n \"\"\"\n msg: Message\n try:\n msg = self._msg_queue.get(timeout=SocketCommunicator.ACCEPT_TIMEOUT)\n except queue.Empty:\n msg = Message()\n\n return msg\n
"},{"location":"source/execution/executor/#execution.executor.SocketCommunicator.write","title":"write(msg)
","text":"Send a single Message.
The entire Message (signal and contents) is serialized and sent through a connection over a Unix socket.
Parameters:
Name Type Description Defaultmsg
Message
The Message to send.
required Source code inlute/execution/ipc.py
def write(self, msg: Message) -> None:\n \"\"\"Send a single Message.\n\n The entire Message (signal and contents) is serialized and sent through\n a connection over Unix socket.\n\n Args:\n msg (Message): The Message to send.\n \"\"\"\n self._write_socket(msg)\n
"},{"location":"source/execution/ipc/","title":"ipc","text":"Classes and utilities for communication between Executors and subprocesses.
Communicators manage message passing and parsing between subprocesses. They maintain a limited public interface of \"read\" and \"write\" operations. Behind this interface, the method of communication varies, from serialization across pipes to Unix or TCP sockets. All communicators pass a single object called a \"Message\", which contains an arbitrary \"contents\" field as well as an optional \"signal\" field. A minimal usage sketch follows the class list below.
Classes:
Name DescriptionParty
Enum describing whether Communicator is on Task-side or Executor-side.
Message
A dataclass used for passing information from Task to Executor.
Communicator
Abstract base class for Communicator types.
PipeCommunicator
Manages communication between Task and Executor via pipes (stderr and stdout).
SocketCommunicator
Manages communication using sockets, either raw or using zmq. Supports both TCP and Unix sockets.
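A minimal usage sketch of this shared interface (assuming LUTE is importable; the contents string is illustrative):
from lute.execution.ipc import Message, Party, PipeCommunicator\n\n# Every Communicator exchanges Message objects; the signal field is optional.\nmsg = Message(contents='Processing complete')\ncommunicator = PipeCommunicator(party=Party.TASK)\ncommunicator.write(msg)   # contents are written to stdout, any signal to stderr\n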
"},{"location":"source/execution/ipc/#execution.ipc.Communicator","title":"Communicator
","text":" Bases: ABC
lute/execution/ipc.py
class Communicator(ABC):\n def __init__(self, party: Party = Party.TASK, use_pickle: bool = True) -> None:\n \"\"\"Abstract Base Class for IPC Communicator objects.\n\n Args:\n party (Party): Which object (side/process) the Communicator is\n managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.\n use_pickle (bool): Whether to serialize data using pickle prior to\n sending it.\n \"\"\"\n self._party = party\n self._use_pickle = use_pickle\n self.desc = \"Communicator abstract base class.\"\n\n @abstractmethod\n def read(self, proc: subprocess.Popen) -> Message:\n \"\"\"Method for reading data through the communication mechanism.\"\"\"\n ...\n\n @abstractmethod\n def write(self, msg: Message) -> None:\n \"\"\"Method for sending data through the communication mechanism.\"\"\"\n ...\n\n def __str__(self):\n name: str = str(type(self)).split(\"'\")[1].split(\".\")[-1]\n return f\"{name}: {self.desc}\"\n\n def __repr__(self):\n return self.__str__()\n\n def __enter__(self) -> Self:\n return self\n\n def __exit__(self) -> None: ...\n\n @property\n def has_messages(self) -> bool:\n \"\"\"Whether the Communicator has remaining messages.\n\n The precise method for determining whether there are remaining messages\n will depend on the specific Communicator sub-class.\n \"\"\"\n return False\n\n def stage_communicator(self):\n \"\"\"Alternative method for staging outside of context manager.\"\"\"\n self.__enter__()\n\n def clear_communicator(self):\n \"\"\"Alternative exit method outside of context manager.\"\"\"\n self.__exit__()\n\n def delayed_setup(self):\n \"\"\"Any setup that should be done later than init.\"\"\"\n ...\n
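Concrete subtypes only need to supply the two abstract methods. The following hypothetical subclass (not part of LUTE) keeps Messages in memory purely to illustrate the required interface:
import subprocess\nfrom typing import List\n\nfrom lute.execution.ipc import Communicator, Message, Party\n\n\nclass InMemoryCommunicator(Communicator):\n    def __init__(self, party: Party = Party.TASK, use_pickle: bool = True) -> None:\n        super().__init__(party=party, use_pickle=use_pickle)\n        self.desc = 'Stores Messages in a list instead of using real IPC (illustration only).'\n        self._buffer: List[Message] = []\n\n    def write(self, msg: Message) -> None:\n        self._buffer.append(msg)\n\n    def read(self, proc: subprocess.Popen) -> Message:\n        return self._buffer.pop(0) if self._buffer else Message()\n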
"},{"location":"source/execution/ipc/#execution.ipc.Communicator.has_messages","title":"has_messages: bool
property
","text":"Whether the Communicator has remaining messages.
The precise method for determining whether there are remaining messages will depend on the specific Communicator sub-class.
"},{"location":"source/execution/ipc/#execution.ipc.Communicator.__init__","title":"__init__(party=Party.TASK, use_pickle=True)
","text":"Abstract Base Class for IPC Communicator objects.
Parameters:
Name Type Description Defaultparty
Party
Which object (side/process) the Communicator is managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.
TASK
use_pickle
bool
Whether to serialize data using pickle prior to sending it.
True
Source code in lute/execution/ipc.py
def __init__(self, party: Party = Party.TASK, use_pickle: bool = True) -> None:\n \"\"\"Abstract Base Class for IPC Communicator objects.\n\n Args:\n party (Party): Which object (side/process) the Communicator is\n managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.\n use_pickle (bool): Whether to serialize data using pickle prior to\n sending it.\n \"\"\"\n self._party = party\n self._use_pickle = use_pickle\n self.desc = \"Communicator abstract base class.\"\n
"},{"location":"source/execution/ipc/#execution.ipc.Communicator.clear_communicator","title":"clear_communicator()
","text":"Alternative exit method outside of context manager.
Source code inlute/execution/ipc.py
def clear_communicator(self):\n \"\"\"Alternative exit method outside of context manager.\"\"\"\n self.__exit__()\n
"},{"location":"source/execution/ipc/#execution.ipc.Communicator.delayed_setup","title":"delayed_setup()
","text":"Any setup that should be done later than init.
Source code inlute/execution/ipc.py
def delayed_setup(self):\n \"\"\"Any setup that should be done later than init.\"\"\"\n ...\n
"},{"location":"source/execution/ipc/#execution.ipc.Communicator.read","title":"read(proc)
abstractmethod
","text":"Method for reading data through the communication mechanism.
Source code inlute/execution/ipc.py
@abstractmethod\ndef read(self, proc: subprocess.Popen) -> Message:\n \"\"\"Method for reading data through the communication mechanism.\"\"\"\n ...\n
"},{"location":"source/execution/ipc/#execution.ipc.Communicator.stage_communicator","title":"stage_communicator()
","text":"Alternative method for staging outside of context manager.
Source code inlute/execution/ipc.py
def stage_communicator(self):\n \"\"\"Alternative method for staging outside of context manager.\"\"\"\n self.__enter__()\n
"},{"location":"source/execution/ipc/#execution.ipc.Communicator.write","title":"write(msg)
abstractmethod
","text":"Method for sending data through the communication mechanism.
Source code inlute/execution/ipc.py
@abstractmethod\ndef write(self, msg: Message) -> None:\n \"\"\"Method for sending data through the communication mechanism.\"\"\"\n ...\n
"},{"location":"source/execution/ipc/#execution.ipc.Party","title":"Party
","text":" Bases: Enum
Identifier for which party (side/end) is using a communicator.
For some types of communication streams there may be different interfaces depending on which side of the communicator you are on. This enum is used by the communicator to determine which interface to use.
Source code inlute/execution/ipc.py
class Party(Enum):\n \"\"\"Identifier for which party (side/end) is using a communicator.\n\n For some types of communication streams there may be different interfaces\n depending on which side of the communicator you are on. This enum is used\n by the communicator to determine which interface to use.\n \"\"\"\n\n TASK = 0\n \"\"\"\n The Task (client) side.\n \"\"\"\n EXECUTOR = 1\n \"\"\"\n The Executor (server) side.\n \"\"\"\n
"},{"location":"source/execution/ipc/#execution.ipc.Party.EXECUTOR","title":"EXECUTOR = 1
class-attribute
instance-attribute
","text":"The Executor (server) side.
"},{"location":"source/execution/ipc/#execution.ipc.Party.TASK","title":"TASK = 0
class-attribute
instance-attribute
","text":"The Task (client) side.
"},{"location":"source/execution/ipc/#execution.ipc.PipeCommunicator","title":"PipeCommunicator
","text":" Bases: Communicator
Provides communication through pipes over stderr/stdout.
The implementation of this communicator has reading and writing occurring on stderr and stdout. In general, the Task
will be writing while the Executor
will be reading. stderr
is used for sending signals.
lute/execution/ipc.py
class PipeCommunicator(Communicator):\n \"\"\"Provides communication through pipes over stderr/stdout.\n\n The implementation of this communicator has reading and writing ocurring\n on stderr and stdout. In general the `Task` will be writing while the\n `Executor` will be reading. `stderr` is used for sending signals.\n \"\"\"\n\n def __init__(self, party: Party = Party.TASK, use_pickle: bool = True) -> None:\n \"\"\"IPC through pipes.\n\n Arbitrary objects may be transmitted using pickle to serialize the data.\n If pickle is not used\n\n Args:\n party (Party): Which object (side/process) the Communicator is\n managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.\n use_pickle (bool): Whether to serialize data using Pickle prior to\n sending it. If False, data is assumed to be text whi\n \"\"\"\n super().__init__(party=party, use_pickle=use_pickle)\n self.desc = \"Communicates through stderr and stdout using pickle.\"\n\n def read(self, proc: subprocess.Popen) -> Message:\n \"\"\"Read from stdout and stderr.\n\n Args:\n proc (subprocess.Popen): The process to read from.\n\n Returns:\n msg (Message): The message read, containing contents and signal.\n \"\"\"\n signal: Optional[str]\n contents: Optional[str]\n raw_signal: bytes = proc.stderr.read()\n raw_contents: bytes = proc.stdout.read()\n if raw_signal is not None:\n signal = raw_signal.decode()\n else:\n signal = raw_signal\n if raw_contents:\n if self._use_pickle:\n try:\n contents = pickle.loads(raw_contents)\n except (pickle.UnpicklingError, ValueError, EOFError) as err:\n logger.debug(\"PipeCommunicator (Executor) - Set _use_pickle=False\")\n self._use_pickle = False\n contents = self._safe_unpickle_decode(raw_contents)\n else:\n try:\n contents = raw_contents.decode()\n except UnicodeDecodeError as err:\n logger.debug(\"PipeCommunicator (Executor) - Set _use_pickle=True\")\n self._use_pickle = True\n contents = self._safe_unpickle_decode(raw_contents)\n else:\n contents = None\n\n if signal and signal not in LUTE_SIGNALS:\n # Some tasks write on stderr\n # If the signal channel has \"non-signal\" info, add it to\n # contents\n if not contents:\n contents = f\"({signal})\"\n else:\n contents = f\"{contents} ({signal})\"\n signal = None\n\n return Message(contents=contents, signal=signal)\n\n def _safe_unpickle_decode(self, maybe_mixed: bytes) -> Optional[str]:\n \"\"\"This method is used to unpickle and/or decode a bytes object.\n\n It attempts to handle cases where contents can be mixed, i.e., part of\n the message must be decoded and the other part unpickled. It handles\n only two-way splits. If there are more complex arrangements such as:\n <pickled>:<unpickled>:<pickled> etc, it will give up.\n\n The simpler two way splits are unlikely to occur in normal usage. They\n may arise when debugging if, e.g., `print` statements are mixed with the\n usage of the `_report_to_executor` method.\n\n Note that this method works because ONLY text data is assumed to be\n sent via the pipes. The method needs to be revised to handle non-text\n data if the `Task` is modified to also send that via PipeCommunicator.\n The use of pickle is supported to provide for this option if it is\n necessary. It may be deprecated in the future.\n\n Be careful when making changes. This method has seemingly redundant\n checks because unpickling will not throw an error if a full object can\n be retrieved. That is, the library will ignore extraneous bytes. 
This\n method attempts to retrieve that information if the pickled data comes\n first in the stream.\n\n Args:\n maybe_mixed (bytes): A bytes object which could require unpickling,\n decoding, or both.\n\n Returns:\n contents (Optional[str]): The unpickled/decoded contents if possible.\n Otherwise, None.\n \"\"\"\n contents: Optional[str]\n try:\n contents = pickle.loads(maybe_mixed)\n repickled: bytes = pickle.dumps(contents)\n if len(repickled) < len(maybe_mixed):\n # Successful unpickling, but pickle stops even if there are more bytes\n try:\n additional_data: str = maybe_mixed[len(repickled) :].decode()\n contents = f\"{contents}{additional_data}\"\n except UnicodeDecodeError:\n # Can't decode the bytes left by pickle, so they are lost\n missing_bytes: int = len(maybe_mixed) - len(repickled)\n logger.debug(\n f\"PipeCommunicator has truncated message. Unable to retrieve {missing_bytes} bytes.\"\n )\n except (pickle.UnpicklingError, ValueError, EOFError) as err:\n # Pickle may also throw a ValueError, e.g. this bytes: b\"Found! \\n\"\n # Pickle may also throw an EOFError, eg. this bytes: b\"F0\\n\"\n try:\n contents = maybe_mixed.decode()\n except UnicodeDecodeError as err2:\n try:\n contents = maybe_mixed[: err2.start].decode()\n contents = f\"{contents}{pickle.loads(maybe_mixed[err2.start:])}\"\n except Exception as err3:\n logger.debug(\n f\"PipeCommunicator unable to decode/parse data! {err3}\"\n )\n contents = None\n return contents\n\n def write(self, msg: Message) -> None:\n \"\"\"Write to stdout and stderr.\n\n The signal component is sent to `stderr` while the contents of the\n Message are sent to `stdout`.\n\n Args:\n msg (Message): The Message to send.\n \"\"\"\n if self._use_pickle:\n signal: bytes\n if msg.signal:\n signal = msg.signal.encode()\n else:\n signal = b\"\"\n\n contents: bytes = pickle.dumps(msg.contents)\n\n sys.stderr.buffer.write(signal)\n sys.stdout.buffer.write(contents)\n\n sys.stderr.buffer.flush()\n sys.stdout.buffer.flush()\n else:\n raw_signal: str\n if msg.signal:\n raw_signal = msg.signal\n else:\n raw_signal = \"\"\n\n raw_contents: str\n if isinstance(msg.contents, str):\n raw_contents = msg.contents\n elif msg.contents is None:\n raw_contents = \"\"\n else:\n raise ValueError(\n f\"Cannot send msg contents of type: {type(msg.contents)} when not using pickle!\"\n )\n sys.stderr.write(raw_signal)\n sys.stdout.write(raw_contents)\n
"},{"location":"source/execution/ipc/#execution.ipc.PipeCommunicator.__init__","title":"__init__(party=Party.TASK, use_pickle=True)
","text":"IPC through pipes.
Arbitrary objects may be transmitted using pickle to serialize the data. If pickle is not used, only text (str) contents may be sent; other content types will raise a ValueError.
Parameters:
Name Type Description Defaultparty
Party
Which object (side/process) the Communicator is managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.
TASK
use_pickle
bool
Whether to serialize data using Pickle prior to sending it. If False, data is assumed to be text which is written directly without serialization.
True
Source code in lute/execution/ipc.py
def __init__(self, party: Party = Party.TASK, use_pickle: bool = True) -> None:\n \"\"\"IPC through pipes.\n\n Arbitrary objects may be transmitted using pickle to serialize the data.\n If pickle is not used\n\n Args:\n party (Party): Which object (side/process) the Communicator is\n managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.\n use_pickle (bool): Whether to serialize data using Pickle prior to\n sending it. If False, data is assumed to be text whi\n \"\"\"\n super().__init__(party=party, use_pickle=use_pickle)\n self.desc = \"Communicates through stderr and stdout using pickle.\"\n
"},{"location":"source/execution/ipc/#execution.ipc.PipeCommunicator.read","title":"read(proc)
","text":"Read from stdout and stderr.
Parameters:
Name Type Description Defaultproc
Popen
The process to read from.
requiredReturns:
Name Type Descriptionmsg
Message
The message read, containing contents and signal.
Source code inlute/execution/ipc.py
def read(self, proc: subprocess.Popen) -> Message:\n \"\"\"Read from stdout and stderr.\n\n Args:\n proc (subprocess.Popen): The process to read from.\n\n Returns:\n msg (Message): The message read, containing contents and signal.\n \"\"\"\n signal: Optional[str]\n contents: Optional[str]\n raw_signal: bytes = proc.stderr.read()\n raw_contents: bytes = proc.stdout.read()\n if raw_signal is not None:\n signal = raw_signal.decode()\n else:\n signal = raw_signal\n if raw_contents:\n if self._use_pickle:\n try:\n contents = pickle.loads(raw_contents)\n except (pickle.UnpicklingError, ValueError, EOFError) as err:\n logger.debug(\"PipeCommunicator (Executor) - Set _use_pickle=False\")\n self._use_pickle = False\n contents = self._safe_unpickle_decode(raw_contents)\n else:\n try:\n contents = raw_contents.decode()\n except UnicodeDecodeError as err:\n logger.debug(\"PipeCommunicator (Executor) - Set _use_pickle=True\")\n self._use_pickle = True\n contents = self._safe_unpickle_decode(raw_contents)\n else:\n contents = None\n\n if signal and signal not in LUTE_SIGNALS:\n # Some tasks write on stderr\n # If the signal channel has \"non-signal\" info, add it to\n # contents\n if not contents:\n contents = f\"({signal})\"\n else:\n contents = f\"{contents} ({signal})\"\n signal = None\n\n return Message(contents=contents, signal=signal)\n
"},{"location":"source/execution/ipc/#execution.ipc.PipeCommunicator.write","title":"write(msg)
","text":"Write to stdout and stderr.
The signal component is sent to stderr
while the contents of the Message are sent to stdout
.
Parameters:
Name Type Description Defaultmsg
Message
The Message to send.
required Source code inlute/execution/ipc.py
def write(self, msg: Message) -> None:\n \"\"\"Write to stdout and stderr.\n\n The signal component is sent to `stderr` while the contents of the\n Message are sent to `stdout`.\n\n Args:\n msg (Message): The Message to send.\n \"\"\"\n if self._use_pickle:\n signal: bytes\n if msg.signal:\n signal = msg.signal.encode()\n else:\n signal = b\"\"\n\n contents: bytes = pickle.dumps(msg.contents)\n\n sys.stderr.buffer.write(signal)\n sys.stdout.buffer.write(contents)\n\n sys.stderr.buffer.flush()\n sys.stdout.buffer.flush()\n else:\n raw_signal: str\n if msg.signal:\n raw_signal = msg.signal\n else:\n raw_signal = \"\"\n\n raw_contents: str\n if isinstance(msg.contents, str):\n raw_contents = msg.contents\n elif msg.contents is None:\n raw_contents = \"\"\n else:\n raise ValueError(\n f\"Cannot send msg contents of type: {type(msg.contents)} when not using pickle!\"\n )\n sys.stderr.write(raw_signal)\n sys.stdout.write(raw_contents)\n
"},{"location":"source/execution/ipc/#execution.ipc.SocketCommunicator","title":"SocketCommunicator
","text":" Bases: Communicator
Provides communication over Unix or TCP sockets.
Communication is provided either using sockets with the Python socket library or using ZMQ. The choice of implementation is controlled by the global bool USE_ZMQ
.
Whether to use TCP or Unix sockets is controlled by the environment variable: LUTE_USE_TCP=1
If defined, TCP sockets will be used, otherwise Unix sockets will be used.
Regardless of socket type, the environment variable LUTE_EXECUTOR_HOST=<hostname>
will be defined by the Executor-side Communicator.
For TCP sockets: The Executor-side Communicator should be run first and will bind to all interfaces on the port determined by the environment variable: LUTE_PORT=###
If no port is defined, a port scan will be performed and the Executor-side Communicator will bind to the first available port from a random selection. It will then define the environment variable so the Task-side can pick it up.
For Unix sockets: The path to the Unix socket is defined by the environment variable: LUTE_SOCKET=/path/to/socket
This class assumes proper permissions and that this above environment variable has been defined. The Task
is configured as what would commonly be referred to as the client
, while the Executor
is configured as the server.
If the Task process is run on a different machine than the Executor, the Task-side Communicator will open an SSH tunnel to forward traffic from a local Unix socket to the Executor Unix socket. Opening of the tunnel relies on the environment variable: LUTE_EXECUTOR_HOST=<hostname>
to determine the Executor's host. This variable should be defined by the Executor and passed to the Task process automatically, but it can also be defined manually if launching the Task process separately. The Task will use the local socket <LUTE_SOCKET>.task{##}
. Multiple local sockets may be created. Currently, it is assumed that the user is identical on both the Task machine and Executor machine.
lute/execution/ipc.py
class SocketCommunicator(Communicator):\n \"\"\"Provides communication over Unix or TCP sockets.\n\n Communication is provided either using sockets with the Python socket library\n or using ZMQ. The choice of implementation is controlled by the global bool\n `USE_ZMQ`.\n\n Whether to use TCP or Unix sockets is controlled by the environment:\n `LUTE_USE_TCP=1`\n If defined, TCP sockets will be used, otherwise Unix sockets will be used.\n\n Regardless of socket type, the environment variable\n `LUTE_EXECUTOR_HOST=<hostname>`\n will be defined by the Executor-side Communicator.\n\n\n For TCP sockets:\n The Executor-side Communicator should be run first and will bind to all\n interfaces on the port determined by the environment variable:\n `LUTE_PORT=###`\n If no port is defined, a port scan will be performed and the Executor-side\n Communicator will bind the first one available from a random selection. It\n will then define the environment variable so the Task-side can pick it up.\n\n For Unix sockets:\n The path to the Unix socket is defined by the environment variable:\n `LUTE_SOCKET=/path/to/socket`\n This class assumes proper permissions and that this above environment\n variable has been defined. The `Task` is configured as what would commonly\n be referred to as the `client`, while the `Executor` is configured as the\n server.\n\n If the Task process is run on a different machine than the Executor, the\n Task-side Communicator will open a ssh-tunnel to forward traffic from a local\n Unix socket to the Executor Unix socket. Opening of the tunnel relies on the\n environment variable:\n `LUTE_EXECUTOR_HOST=<hostname>`\n to determine the Executor's host. This variable should be defined by the\n Executor and passed to the Task process automatically, but it can also be\n defined manually if launching the Task process separately. The Task will use\n the local socket `<LUTE_SOCKET>.task{##}`. Multiple local sockets may be\n created. Currently, it is assumed that the user is identical on both the Task\n machine and Executor machine.\n \"\"\"\n\n ACCEPT_TIMEOUT: float = 0.01\n \"\"\"\n Maximum time to wait to accept connections. Used by Executor-side.\n \"\"\"\n MSG_HEAD: bytes = b\"MSG\"\n \"\"\"\n Start signal of a message. The end of a message is indicated by MSG_HEAD[::-1].\n \"\"\"\n MSG_SEP: bytes = b\";;;\"\n \"\"\"\n Separator for parts of a message. Messages have a start, length, message and end.\n \"\"\"\n\n def __init__(self, party: Party = Party.TASK, use_pickle: bool = True) -> None:\n \"\"\"IPC over a TCP or Unix socket.\n\n Unlike with the PipeCommunicator, pickle is always used to send data\n through the socket.\n\n Args:\n party (Party): Which object (side/process) the Communicator is\n managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.\n\n use_pickle (bool): Whether to use pickle. Always True currently,\n passing False does not change behaviour.\n \"\"\"\n super().__init__(party=party, use_pickle=use_pickle)\n\n def delayed_setup(self) -> None:\n \"\"\"Delays the creation of socket objects.\n\n The Executor initializes the Communicator when it is created. 
Since\n all Executors are created and available at once we want to delay\n acquisition of socket resources until a single Executor is ready\n to use them.\n \"\"\"\n self._data_socket: Union[socket.socket, zmq.sugar.socket.Socket]\n if USE_ZMQ:\n self.desc: str = \"Communicates using ZMQ through TCP or Unix sockets.\"\n self._context: zmq.context.Context = zmq.Context()\n self._data_socket = self._create_socket_zmq()\n else:\n self.desc: str = \"Communicates through a TCP or Unix socket.\"\n self._data_socket = self._create_socket_raw()\n self._data_socket.settimeout(SocketCommunicator.ACCEPT_TIMEOUT)\n\n if self._party == Party.EXECUTOR:\n # Executor created first so we can define the hostname env variable\n os.environ[\"LUTE_EXECUTOR_HOST\"] = socket.gethostname()\n # Setup reader thread\n self._reader_thread: threading.Thread = threading.Thread(\n target=self._read_socket\n )\n self._msg_queue: queue.Queue = queue.Queue()\n self._partial_msg: Optional[bytes] = None\n self._stop_thread: bool = False\n self._reader_thread.start()\n else:\n # Only used by Party.TASK\n self._use_ssh_tunnel: bool = False\n self._ssh_proc: Optional[subprocess.Popen] = None\n self._local_socket_path: Optional[str] = None\n\n # Read\n ############################################################################\n\n def read(self, proc: subprocess.Popen) -> Message:\n \"\"\"Return a message from the queue if available.\n\n Socket(s) are continuously monitored, and read from when new data is\n available.\n\n Args:\n proc (subprocess.Popen): The process to read from. Provided for\n compatibility with other Communicator subtypes. Is ignored.\n\n Returns:\n msg (Message): The message read, containing contents and signal.\n \"\"\"\n msg: Message\n try:\n msg = self._msg_queue.get(timeout=SocketCommunicator.ACCEPT_TIMEOUT)\n except queue.Empty:\n msg = Message()\n\n return msg\n\n def _read_socket(self) -> None:\n \"\"\"Read data from a socket.\n\n Socket(s) are continuously monitored, and read from when new data is\n available.\n\n Calls an underlying method for either raw sockets or ZMQ.\n \"\"\"\n\n while True:\n if self._stop_thread:\n logger.debug(\"Stopping socket reader thread.\")\n break\n if USE_ZMQ:\n self._read_socket_zmq()\n else:\n self._read_socket_raw()\n\n def _read_socket_raw(self) -> None:\n \"\"\"Read data from a socket.\n\n Raw socket implementation for the reader thread.\n \"\"\"\n connection: socket.socket\n addr: Union[str, Tuple[str, int]]\n try:\n connection, addr = self._data_socket.accept()\n full_data: bytes = b\"\"\n while True:\n data: bytes = connection.recv(8192)\n if data:\n full_data += data\n else:\n break\n connection.close()\n self._unpack_messages(full_data)\n except socket.timeout:\n pass\n\n def _read_socket_zmq(self) -> None:\n \"\"\"Read data from a socket.\n\n ZMQ implementation for the reader thread.\n \"\"\"\n try:\n full_data: bytes = self._data_socket.recv(0)\n self._unpack_messages(full_data)\n except zmq.ZMQError:\n pass\n\n def _unpack_messages(self, data: bytes) -> None:\n \"\"\"Unpacks a byte stream into individual messages.\n\n Messages are encoded in the following format:\n <HEAD><SEP><len(msg)><SEP><msg><SEP><HEAD[::-1]>\n The items between <> are replaced as follows:\n - <HEAD>: A start marker\n - <SEP>: A separator for components of the message\n - <len(msg)>: The length of the message payload in bytes.\n - <msg>: The message payload in bytes\n - <HEAD[::-1]>: The start marker in reverse to indicate the end.\n\n Partial messages (a series of bytes which cannot be 
converted to a full\n message) are stored for later. An attempt is made to reconstruct the\n message with the next call to this method.\n\n Args:\n data (bytes): A raw byte stream containing anywhere from a partial\n message to multiple full messages.\n \"\"\"\n msg: Message\n working_data: bytes\n if self._partial_msg:\n # Concatenate the previous partial message to the beginning\n working_data = self._partial_msg + data\n self._partial_msg = None\n else:\n working_data = data\n while working_data:\n try:\n # Message encoding: <HEAD><SEP><len><SEP><msg><SEP><HEAD[::-1]>\n end = working_data.find(\n SocketCommunicator.MSG_SEP + SocketCommunicator.MSG_HEAD[::-1]\n )\n msg_parts: List[bytes] = working_data[:end].split(\n SocketCommunicator.MSG_SEP\n )\n if len(msg_parts) != 3:\n self._partial_msg = working_data\n break\n\n cmd: bytes\n nbytes: bytes\n raw_msg: bytes\n cmd, nbytes, raw_msg = msg_parts\n if len(raw_msg) != int(nbytes):\n self._partial_msg = working_data\n break\n msg = pickle.loads(raw_msg)\n self._msg_queue.put(msg)\n except pickle.UnpicklingError:\n self._partial_msg = working_data\n break\n if end < len(working_data):\n # Add len(SEP+HEAD) since end marks the start of <SEP><HEAD[::-1]\n offset: int = len(\n SocketCommunicator.MSG_SEP + SocketCommunicator.MSG_HEAD\n )\n working_data = working_data[end + offset :]\n else:\n working_data = b\"\"\n\n # Write\n ############################################################################\n\n def _write_socket(self, msg: Message) -> None:\n \"\"\"Sends data over a socket from the 'client' (Task) side.\n\n Messages are encoded in the following format:\n <HEAD><SEP><len(msg)><SEP><msg><SEP><HEAD[::-1]>\n The items between <> are replaced as follows:\n - <HEAD>: A start marker\n - <SEP>: A separator for components of the message\n - <len(msg)>: The length of the message payload in bytes.\n - <msg>: The message payload in bytes\n - <HEAD[::-1]>: The start marker in reverse to indicate the end.\n\n This structure is used for decoding the message on the other end.\n \"\"\"\n data: bytes = pickle.dumps(msg)\n cmd: bytes = SocketCommunicator.MSG_HEAD\n size: bytes = b\"%d\" % len(data)\n end: bytes = SocketCommunicator.MSG_HEAD[::-1]\n sep: bytes = SocketCommunicator.MSG_SEP\n packed_msg: bytes = cmd + sep + size + sep + data + sep + end\n if USE_ZMQ:\n self._data_socket.send(packed_msg)\n else:\n self._data_socket.sendall(packed_msg)\n\n def write(self, msg: Message) -> None:\n \"\"\"Send a single Message.\n\n The entire Message (signal and contents) is serialized and sent through\n a connection over Unix socket.\n\n Args:\n msg (Message): The Message to send.\n \"\"\"\n self._write_socket(msg)\n\n # Generic create\n ############################################################################\n\n def _create_socket_raw(self) -> socket.socket:\n \"\"\"Create either a Unix or TCP socket.\n\n If the environment variable:\n `LUTE_USE_TCP=1`\n is defined, a TCP socket is returned, otherwise a Unix socket.\n\n Refer to the individual initialization methods for additional environment\n variables controlling the behaviour of these two communication types.\n\n Returns:\n data_socket (socket.socket): TCP or Unix socket.\n \"\"\"\n import struct\n\n use_tcp: Optional[str] = os.getenv(\"LUTE_USE_TCP\")\n sock: socket.socket\n if use_tcp is not None:\n if self._party == Party.EXECUTOR:\n logger.info(\"Will use raw TCP sockets.\")\n sock = self._init_tcp_socket_raw()\n else:\n if self._party == Party.EXECUTOR:\n logger.info(\"Will use raw Unix 
sockets.\")\n sock = self._init_unix_socket_raw()\n sock.setsockopt(\n socket.SOL_SOCKET, socket.SO_LINGER, struct.pack(\"ii\", 1, 10000)\n )\n return sock\n\n def _create_socket_zmq(self) -> zmq.sugar.socket.Socket:\n \"\"\"Create either a Unix or TCP socket.\n\n If the environment variable:\n `LUTE_USE_TCP=1`\n is defined, a TCP socket is returned, otherwise a Unix socket.\n\n Refer to the individual initialization methods for additional environment\n variables controlling the behaviour of these two communication types.\n\n Returns:\n data_socket (socket.socket): Unix socket object.\n \"\"\"\n socket_type: Literal[zmq.PULL, zmq.PUSH]\n if self._party == Party.EXECUTOR:\n socket_type = zmq.PULL\n else:\n socket_type = zmq.PUSH\n\n data_socket: zmq.sugar.socket.Socket = self._context.socket(socket_type)\n data_socket.set_hwm(160000)\n # Need to multiply by 1000 since ZMQ uses ms\n data_socket.setsockopt(\n zmq.RCVTIMEO, int(SocketCommunicator.ACCEPT_TIMEOUT * 1000)\n )\n # Try TCP first\n use_tcp: Optional[str] = os.getenv(\"LUTE_USE_TCP\")\n if use_tcp is not None:\n if self._party == Party.EXECUTOR:\n logger.info(\"Will use TCP (ZMQ).\")\n self._init_tcp_socket_zmq(data_socket)\n else:\n if self._party == Party.EXECUTOR:\n logger.info(\"Will use Unix sockets (ZMQ).\")\n self._init_unix_socket_zmq(data_socket)\n\n return data_socket\n\n # TCP Init\n ############################################################################\n\n def _find_random_port(\n self, min_port: int = 41923, max_port: int = 64324, max_tries: int = 100\n ) -> Optional[int]:\n \"\"\"Find a random open port to bind to if using TCP.\"\"\"\n from random import choices\n\n sock: socket.socket\n ports: List[int] = choices(range(min_port, max_port), k=max_tries)\n for port in ports:\n sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)\n try:\n sock.bind((\"\", port))\n sock.close()\n del sock\n return port\n except:\n continue\n return None\n\n def _init_tcp_socket_raw(self) -> socket.socket:\n \"\"\"Initialize a TCP socket.\n\n Executor-side code should always be run first. It checks to see if\n the environment variable\n `LUTE_PORT=###`\n is defined, if so binds it, otherwise find a free port from a selection\n of random ports. If a port search is performed, the `LUTE_PORT` variable\n will be defined so it can be picked up by the the Task-side Communicator.\n\n In the event that no port can be bound on the Executor-side, or the port\n and hostname information is unavailable to the Task-side, the program\n will exit.\n\n Returns:\n data_socket (socket.socket): TCP socket object.\n \"\"\"\n data_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)\n port: Optional[Union[str, int]] = os.getenv(\"LUTE_PORT\")\n if self._party == Party.EXECUTOR:\n if port is None:\n # If port is None find one\n # Executor code executes first\n port = self._find_random_port()\n if port is None:\n # Failed to find a port to bind\n logger.info(\n \"Executor failed to bind a port. \"\n \"Try providing a LUTE_PORT directly! Exiting!\"\n )\n sys.exit(-1)\n # Provide port env var for Task-side\n os.environ[\"LUTE_PORT\"] = str(port)\n data_socket.bind((\"\", int(port)))\n data_socket.listen()\n else:\n hostname: str = socket.gethostname()\n executor_hostname: Optional[str] = os.getenv(\"LUTE_EXECUTOR_HOST\")\n if executor_hostname is None or port is None:\n logger.info(\n \"Task-side does not have host/port information!\"\n \" Check environment variables! 
Exiting!\"\n )\n sys.exit(-1)\n if hostname == executor_hostname:\n data_socket.connect((\"localhost\", int(port)))\n else:\n data_socket.connect((executor_hostname, int(port)))\n return data_socket\n\n def _init_tcp_socket_zmq(self, data_socket: zmq.sugar.socket.Socket) -> None:\n \"\"\"Initialize a TCP socket using ZMQ.\n\n Equivalent as the method above but requires passing in a ZMQ socket\n object instead of returning one.\n\n Args:\n data_socket (zmq.socket.Socket): Socket object.\n \"\"\"\n port: Optional[Union[str, int]] = os.getenv(\"LUTE_PORT\")\n if self._party == Party.EXECUTOR:\n if port is None:\n new_port: int = data_socket.bind_to_random_port(\"tcp://*\")\n if new_port is None:\n # Failed to find a port to bind\n logger.info(\n \"Executor failed to bind a port. \"\n \"Try providing a LUTE_PORT directly! Exiting!\"\n )\n sys.exit(-1)\n port = new_port\n os.environ[\"LUTE_PORT\"] = str(port)\n else:\n data_socket.bind(f\"tcp://*:{port}\")\n logger.debug(f\"Executor bound port {port}\")\n else:\n executor_hostname: Optional[str] = os.getenv(\"LUTE_EXECUTOR_HOST\")\n if executor_hostname is None or port is None:\n logger.info(\n \"Task-side does not have host/port information!\"\n \" Check environment variables! Exiting!\"\n )\n sys.exit(-1)\n data_socket.connect(f\"tcp://{executor_hostname}:{port}\")\n\n # Unix Init\n ############################################################################\n\n def _get_socket_path(self) -> str:\n \"\"\"Return the socket path, defining one if it is not available.\n\n Returns:\n socket_path (str): Path to the Unix socket.\n \"\"\"\n socket_path: str\n try:\n socket_path = os.environ[\"LUTE_SOCKET\"]\n except KeyError as err:\n import uuid\n import tempfile\n\n # Define a path, and add to environment\n # Executor-side always created first, Task will use the same one\n socket_path = f\"{tempfile.gettempdir()}/lute_{uuid.uuid4().hex}.sock\"\n os.environ[\"LUTE_SOCKET\"] = socket_path\n logger.debug(f\"SocketCommunicator defines socket_path: {socket_path}\")\n if USE_ZMQ:\n return f\"ipc://{socket_path}\"\n else:\n return socket_path\n\n def _init_unix_socket_raw(self) -> socket.socket:\n \"\"\"Returns a Unix socket object.\n\n Executor-side code should always be run first. It checks to see if\n the environment variable\n `LUTE_SOCKET=XYZ`\n is defined, if so binds it, otherwise it will create a new path and\n define the environment variable for the Task-side to find.\n\n On the Task (client-side), this method will also open a SSH tunnel to\n forward a local Unix socket to an Executor Unix socket if the Task and\n Executor processes are on different machines.\n\n Returns:\n data_socket (socket.socket): Unix socket object.\n \"\"\"\n socket_path: str = self._get_socket_path()\n data_socket = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)\n if self._party == Party.EXECUTOR:\n if os.path.exists(socket_path):\n os.unlink(socket_path)\n data_socket.bind(socket_path)\n data_socket.listen()\n elif self._party == Party.TASK:\n hostname: str = socket.gethostname()\n executor_hostname: Optional[str] = os.getenv(\"LUTE_EXECUTOR_HOST\")\n if executor_hostname is None:\n logger.info(\"Hostname for Executor process not found! 
Exiting!\")\n data_socket.close()\n sys.exit(-1)\n if hostname == executor_hostname:\n data_socket.connect(socket_path)\n else:\n self._local_socket_path = self._setup_unix_ssh_tunnel(\n socket_path, hostname, executor_hostname\n )\n while 1:\n # Keep trying reconnect until ssh tunnel works.\n try:\n data_socket.connect(self._local_socket_path)\n break\n except FileNotFoundError:\n continue\n\n return data_socket\n\n def _init_unix_socket_zmq(self, data_socket: zmq.sugar.socket.Socket) -> None:\n \"\"\"Initialize a Unix socket object, using ZMQ.\n\n Equivalent as the method above but requires passing in a ZMQ socket\n object instead of returning one.\n\n Args:\n data_socket (socket.socket): ZMQ object.\n \"\"\"\n socket_path = self._get_socket_path()\n if self._party == Party.EXECUTOR:\n if os.path.exists(socket_path):\n os.unlink(socket_path)\n data_socket.bind(socket_path)\n elif self._party == Party.TASK:\n hostname: str = socket.gethostname()\n executor_hostname: Optional[str] = os.getenv(\"LUTE_EXECUTOR_HOST\")\n if executor_hostname is None:\n logger.info(\"Hostname for Executor process not found! Exiting!\")\n self._data_socket.close()\n sys.exit(-1)\n if hostname == executor_hostname:\n data_socket.connect(socket_path)\n else:\n # Need to remove ipc:// from socket_path for forwarding\n self._local_socket_path = self._setup_unix_ssh_tunnel(\n socket_path[6:], hostname, executor_hostname\n )\n # Need to add it back\n path: str = f\"ipc://{self._local_socket_path}\"\n data_socket.connect(path)\n\n def _setup_unix_ssh_tunnel(\n self, socket_path: str, hostname: str, executor_hostname: str\n ) -> str:\n \"\"\"Prepares an SSH tunnel for forwarding between Unix sockets on two hosts.\n\n An SSH tunnel is opened with `ssh -L <local>:<remote> sleep 2`.\n This method of communication is slightly slower and incurs additional\n overhead - it should only be used as a backup. If communication across\n multiple hosts is required consider using TCP. The Task will use\n the local socket `<LUTE_SOCKET>.task{##}`. Multiple local sockets may be\n created. It is assumed that the user is identical on both the\n Task machine and Executor machine.\n\n Returns:\n local_socket_path (str): The local Unix socket to connect to.\n \"\"\"\n if \"uuid\" not in globals():\n import uuid\n local_socket_path = f\"{socket_path}.task{uuid.uuid4().hex[:4]}\"\n self._use_ssh_tunnel = True\n ssh_cmd: List[str] = [\n \"ssh\",\n \"-o\",\n \"LogLevel=quiet\",\n \"-L\",\n f\"{local_socket_path}:{socket_path}\",\n executor_hostname,\n \"sleep\",\n \"2\",\n ]\n logger.debug(f\"Opening tunnel from {hostname} to {executor_hostname}\")\n self._ssh_proc = subprocess.Popen(ssh_cmd)\n time.sleep(0.4) # Need to wait... 
-> Use single Task comm at beginning?\n return local_socket_path\n\n # Clean up and properties\n ############################################################################\n\n def _clean_up(self) -> None:\n \"\"\"Clean up connections.\"\"\"\n if self._party == Party.EXECUTOR:\n self._stop_thread = True\n self._reader_thread.join()\n logger.debug(\"Closed reading thread.\")\n\n self._data_socket.close()\n if USE_ZMQ:\n self._context.term()\n else:\n ...\n\n if os.getenv(\"LUTE_USE_TCP\"):\n return\n else:\n if self._party == Party.EXECUTOR:\n os.unlink(os.getenv(\"LUTE_SOCKET\")) # Should be defined\n return\n elif self._use_ssh_tunnel:\n if self._ssh_proc is not None:\n self._ssh_proc.terminate()\n\n @property\n def has_messages(self) -> bool:\n if self._party == Party.TASK:\n # Shouldn't be called on Task-side\n return False\n\n if self._msg_queue.qsize() > 0:\n return True\n return False\n\n def __exit__(self):\n self._clean_up()\n
"},{"location":"source/execution/ipc/#execution.ipc.SocketCommunicator.ACCEPT_TIMEOUT","title":"ACCEPT_TIMEOUT: float = 0.01
class-attribute
instance-attribute
","text":"Maximum time to wait to accept connections. Used by Executor-side.
"},{"location":"source/execution/ipc/#execution.ipc.SocketCommunicator.MSG_HEAD","title":"MSG_HEAD: bytes = b'MSG'
class-attribute
instance-attribute
","text":"Start signal of a message. The end of a message is indicated by MSG_HEAD[::-1].
"},{"location":"source/execution/ipc/#execution.ipc.SocketCommunicator.MSG_SEP","title":"MSG_SEP: bytes = b';;;'
class-attribute
instance-attribute
","text":"Separator for parts of a message. Messages have a start, length, message and end.
"},{"location":"source/execution/ipc/#execution.ipc.SocketCommunicator.__init__","title":"__init__(party=Party.TASK, use_pickle=True)
","text":"IPC over a TCP or Unix socket.
Unlike the PipeCommunicator, pickle is always used to send data through the socket.

Parameters:
Name Type Description Default party
Party
Which object (side/process) the Communicator is managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.
TASK
use_pickle
bool
Whether to use pickle. Always True currently, passing False does not change behaviour.
True
Source code in lute/execution/ipc.py
def __init__(self, party: Party = Party.TASK, use_pickle: bool = True) -> None:\n \"\"\"IPC over a TCP or Unix socket.\n\n Unlike with the PipeCommunicator, pickle is always used to send data\n through the socket.\n\n Args:\n party (Party): Which object (side/process) the Communicator is\n managing IPC for. I.e., is this the \"Task\" or \"Executor\" side.\n\n use_pickle (bool): Whether to use pickle. Always True currently,\n passing False does not change behaviour.\n \"\"\"\n super().__init__(party=party, use_pickle=use_pickle)\n
"},{"location":"source/execution/ipc/#execution.ipc.SocketCommunicator.delayed_setup","title":"delayed_setup()
","text":"Delays the creation of socket objects.
The Executor initializes the Communicator when it is created. Since all Executors are created and available at once, we want to delay acquisition of socket resources until a single Executor is ready to use them.
Source code in lute/execution/ipc.py
def delayed_setup(self) -> None:\n \"\"\"Delays the creation of socket objects.\n\n The Executor initializes the Communicator when it is created. Since\n all Executors are created and available at once we want to delay\n acquisition of socket resources until a single Executor is ready\n to use them.\n \"\"\"\n self._data_socket: Union[socket.socket, zmq.sugar.socket.Socket]\n if USE_ZMQ:\n self.desc: str = \"Communicates using ZMQ through TCP or Unix sockets.\"\n self._context: zmq.context.Context = zmq.Context()\n self._data_socket = self._create_socket_zmq()\n else:\n self.desc: str = \"Communicates through a TCP or Unix socket.\"\n self._data_socket = self._create_socket_raw()\n self._data_socket.settimeout(SocketCommunicator.ACCEPT_TIMEOUT)\n\n if self._party == Party.EXECUTOR:\n # Executor created first so we can define the hostname env variable\n os.environ[\"LUTE_EXECUTOR_HOST\"] = socket.gethostname()\n # Setup reader thread\n self._reader_thread: threading.Thread = threading.Thread(\n target=self._read_socket\n )\n self._msg_queue: queue.Queue = queue.Queue()\n self._partial_msg: Optional[bytes] = None\n self._stop_thread: bool = False\n self._reader_thread.start()\n else:\n # Only used by Party.TASK\n self._use_ssh_tunnel: bool = False\n self._ssh_proc: Optional[subprocess.Popen] = None\n self._local_socket_path: Optional[str] = None\n
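A minimal sketch of the intended call order on the Executor side (module path assumed to be lute.execution.ipc):

from lute.execution.ipc import Party, SocketCommunicator

comm = SocketCommunicator(party=Party.EXECUTOR)  # constructed when the Executor is created
# ... later, once this Executor is selected to run a managed Task ...
comm.delayed_setup()  # acquires the socket and starts the reader thread (see source above)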
"},{"location":"source/execution/ipc/#execution.ipc.SocketCommunicator.read","title":"read(proc)
","text":"Return a message from the queue if available.
Socket(s) are continuously monitored, and read from when new data is available.
Parameters:
Name Type Description Default proc
Popen
The process to read from. Provided for compatibility with other Communicator subtypes. Is ignored.
required Returns:
Name Type Description msg
Message
The message read, containing contents and signal.
Source code in lute/execution/ipc.py
def read(self, proc: subprocess.Popen) -> Message:\n \"\"\"Return a message from the queue if available.\n\n Socket(s) are continuously monitored, and read from when new data is\n available.\n\n Args:\n proc (subprocess.Popen): The process to read from. Provided for\n compatibility with other Communicator subtypes. Is ignored.\n\n Returns:\n msg (Message): The message read, containing contents and signal.\n \"\"\"\n msg: Message\n try:\n msg = self._msg_queue.get(timeout=SocketCommunicator.ACCEPT_TIMEOUT)\n except queue.Empty:\n msg = Message()\n\n return msg\n
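A hedged example of the Executor-side polling pattern this method supports; proc is the Task subprocess handle, accepted only for interface compatibility, and the attribute names follow the "contents and signal" description above:

msg = comm.read(proc)  # returns an empty Message() if the queue was empty
if msg.signal is not None or msg.contents is not None:
    ...  # handle the signal and/or contents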
"},{"location":"source/execution/ipc/#execution.ipc.SocketCommunicator.write","title":"write(msg)
","text":"Send a single Message.
The entire Message (signal and contents) is serialized and sent through a connection over a Unix or TCP socket.
Parameters:
Name Type Description Default msg
Message
The Message to send.
required Source code in lute/execution/ipc.py
def write(self, msg: Message) -> None:\n \"\"\"Send a single Message.\n\n The entire Message (signal and contents) is serialized and sent through\n a connection over Unix socket.\n\n Args:\n msg (Message): The Message to send.\n \"\"\"\n self._write_socket(msg)\n
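For example, on the Task side (comm is a SocketCommunicator instance; the Message keyword names are an assumption based on the "contents and signal" description above):

comm.write(Message(contents={"result": "..."}, signal=None))  # serialized with pickle and framed as described above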
"},{"location":"source/io/_sqlite/","title":"_sqlite","text":"Backend SQLite database utilites.
Functions should be used only by the higher-level database module.
"},{"location":"source/io/config/","title":"config","text":"Machinary for the IO of configuration YAML files and their validation.
Functions:
Name Description parse_config
str, config_path: str) -> TaskParameters: Parse a configuration file and return a TaskParameters object of validated parameters for a specific Task. Raises an exception if the provided configuration does not match the expected model.
Raises:
Type Description ValidationError
Error raised by pydantic during data validation. (From Pydantic)
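A hedged usage sketch (the module path is inferred from the location above, the first keyword name is an assumption since only config_path appears in the signature fragment, and the YAML path is hypothetical):

from lute.io.config import parse_config

params = parse_config(task_name="Test", config_path="config/test.yaml")
# Raises pydantic's ValidationError if the YAML does not satisfy the Task's parameter model.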
"},{"location":"source/io/config/#io.config.AnalysisHeader","title":"AnalysisHeader
","text":" Bases: BaseModel
Header information for LUTE analysis runs.
Source code in lute/io/models/base.py
class AnalysisHeader(BaseModel):\n \"\"\"Header information for LUTE analysis runs.\"\"\"\n\n title: str = Field(\n \"LUTE Task Configuration\",\n description=\"Description of the configuration or experiment.\",\n )\n experiment: str = Field(\"\", description=\"Experiment.\")\n run: Union[str, int] = Field(\"\", description=\"Data acquisition run.\")\n date: str = Field(\"1970/01/01\", description=\"Start date of analysis.\")\n lute_version: Union[float, str] = Field(\n 0.1, description=\"Version of LUTE used for analysis.\"\n )\n task_timeout: PositiveInt = Field(\n 600,\n description=(\n \"Time in seconds until a task times out. Should be slightly shorter\"\n \" than job timeout if using a job manager (e.g. SLURM).\"\n ),\n )\n work_dir: str = Field(\"\", description=\"Main working directory for LUTE.\")\n\n @validator(\"work_dir\", always=True)\n def validate_work_dir(cls, directory: str, values: Dict[str, Any]) -> str:\n work_dir: str\n if directory == \"\":\n std_work_dir = (\n f\"/sdf/data/lcls/ds/{values['experiment'][:3]}/\"\n f\"{values['experiment']}/scratch\"\n )\n work_dir = std_work_dir\n else:\n work_dir = directory\n # Check existence and permissions\n if not os.path.exists(work_dir):\n raise ValueError(f\"Working Directory: {work_dir} does not exist!\")\n if not os.access(work_dir, os.W_OK):\n # Need write access for database, files etc.\n raise ValueError(f\"Not write access for working directory: {work_dir}!\")\n return work_dir\n\n @validator(\"run\", always=True)\n def validate_run(\n cls, run: Union[str, int], values: Dict[str, Any]\n ) -> Union[str, int]:\n if run == \"\":\n # From Airflow RUN_NUM should have Format \"RUN_DATETIME\" - Num is first part\n run_time: str = os.environ.get(\"RUN_NUM\", \"\")\n if run_time != \"\":\n return int(run_time.split(\"_\")[0])\n return run\n\n @validator(\"experiment\", always=True)\n def validate_experiment(cls, experiment: str, values: Dict[str, Any]) -> str:\n if experiment == \"\":\n arp_exp: str = os.environ.get(\"EXPERIMENT\", \"EXPX00000\")\n return arp_exp\n return experiment\n
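For illustration, the model can also be instantiated directly; the experiment name and directory below are hypothetical, and per the validator above work_dir must exist and be writable or validation fails:

header = AnalysisHeader(
    title="LUTE Task Configuration",
    experiment="mfxp00121",               # hypothetical experiment name
    run=10,
    work_dir="/path/to/existing/scratch",  # hypothetical; must exist with write access
)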
"},{"location":"source/io/config/#io.config.CompareHKLParameters","title":"CompareHKLParameters
","text":" Bases: ThirdPartyParameters
Parameters for CrystFEL's compare_hkl
for calculating figures of merit.
There are many parameters, and many combinations. For more information on usage, please refer to the CrystFEL documentation, here: https://www.desy.de/~twhite/crystfel/manual-partialator.html
Source code in lute/io/models/sfx_merge.py
class CompareHKLParameters(ThirdPartyParameters):\n \"\"\"Parameters for CrystFEL's `compare_hkl` for calculating figures of merit.\n\n There are many parameters, and many combinations. For more information on\n usage, please refer to the CrystFEL documentation, here:\n https://www.desy.de/~twhite/crystfel/manual-partialator.html\n \"\"\"\n\n class Config(ThirdPartyParameters.Config):\n long_flags_use_eq: bool = True\n \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n executable: str = Field(\n \"/sdf/group/lcls/ds/tools/crystfel/0.10.2/bin/compare_hkl\",\n description=\"CrystFEL's reflection comparison binary.\",\n flag_type=\"\",\n )\n in_files: Optional[str] = Field(\n \"\",\n description=\"Path to input HKLs. Space-separated list of 2. Use output of partialator e.g.\",\n flag_type=\"\",\n )\n ## Need mechanism to set is_result=True ...\n symmetry: str = Field(\"\", description=\"Point group symmetry.\", flag_type=\"--\")\n cell_file: str = Field(\n \"\",\n description=\"Path to a file containing unit cell information (PDB or CrystFEL format).\",\n flag_type=\"-\",\n rename_param=\"p\",\n )\n fom: str = Field(\n \"Rsplit\", description=\"Specify figure of merit to calculate.\", flag_type=\"--\"\n )\n nshells: int = Field(10, description=\"Use n resolution shells.\", flag_type=\"--\")\n # NEED A NEW CASE FOR THIS -> Boolean flag, no arg, one hyphen...\n # fix_unity: bool = Field(\n # False,\n # description=\"Fix scale factors to unity.\",\n # flag_type=\"-\",\n # rename_param=\"u\",\n # )\n shell_file: str = Field(\n \"\",\n description=\"Write the statistics in resolution shells to a file.\",\n flag_type=\"--\",\n rename_param=\"shell-file\",\n is_result=True,\n )\n ignore_negs: bool = Field(\n False,\n description=\"Ignore reflections with negative reflections.\",\n flag_type=\"--\",\n rename_param=\"ignore-negs\",\n )\n zero_negs: bool = Field(\n False,\n description=\"Set negative intensities to 0.\",\n flag_type=\"--\",\n rename_param=\"zero-negs\",\n )\n sigma_cutoff: Optional[Union[float, int, str]] = Field(\n # \"-infinity\",\n description=\"Discard reflections with I/sigma(I) < n. -infinity means no cutoff.\",\n flag_type=\"--\",\n rename_param=\"sigma-cutoff\",\n )\n rmin: Optional[float] = Field(\n description=\"Low resolution cutoff of 1/d (m-1). Use this or --lowres NOT both.\",\n flag_type=\"--\",\n )\n lowres: Optional[float] = Field(\n descirption=\"Low resolution cutoff in Angstroms. Use this or --rmin NOT both.\",\n flag_type=\"--\",\n )\n rmax: Optional[float] = Field(\n description=\"High resolution cutoff in 1/d (m-1). Use this or --highres NOT both.\",\n flag_type=\"--\",\n )\n highres: Optional[float] = Field(\n description=\"High resolution cutoff in Angstroms. 
Use this or --rmax NOT both.\",\n flag_type=\"--\",\n )\n\n @validator(\"in_files\", always=True)\n def validate_in_files(cls, in_files: str, values: Dict[str, Any]) -> str:\n if in_files == \"\":\n partialator_file: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\", \"MergePartialator\", \"out_file\"\n )\n if partialator_file:\n hkls: str = f\"{partialator_file}1 {partialator_file}2\"\n return hkls\n return in_files\n\n @validator(\"cell_file\", always=True)\n def validate_cell_file(cls, cell_file: str, values: Dict[str, Any]) -> str:\n if cell_file == \"\":\n idx_cell_file: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\",\n \"IndexCrystFEL\",\n \"cell_file\",\n valid_only=False,\n )\n if idx_cell_file:\n return idx_cell_file\n return cell_file\n\n @validator(\"symmetry\", always=True)\n def validate_symmetry(cls, symmetry: str, values: Dict[str, Any]) -> str:\n if symmetry == \"\":\n partialator_sym: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\", \"MergePartialator\", \"symmetry\"\n )\n if partialator_sym:\n return partialator_sym\n return symmetry\n\n @validator(\"shell_file\", always=True)\n def validate_shell_file(cls, shell_file: str, values: Dict[str, Any]) -> str:\n if shell_file == \"\":\n partialator_file: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\", \"MergePartialator\", \"out_file\"\n )\n if partialator_file:\n shells_out: str = partialator_file.split(\".\")[0]\n shells_out = f\"{shells_out}_{values['fom']}_n{values['nshells']}.dat\"\n return shells_out\n return shell_file\n
"},{"location":"source/io/config/#io.config.CompareHKLParameters.Config","title":"Config
","text":" Bases: Config
lute/io/models/sfx_merge.py
class Config(ThirdPartyParameters.Config):\n long_flags_use_eq: bool = True\n \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n
"},{"location":"source/io/config/#io.config.CompareHKLParameters.Config.long_flags_use_eq","title":"long_flags_use_eq: bool = True
class-attribute
instance-attribute
","text":"Whether long command-line arguments are passed like --long=arg
.
set_result: bool = True
class-attribute
instance-attribute
","text":"Whether the Executor should mark a specified parameter as a result.
"},{"location":"source/io/config/#io.config.ConcatenateStreamFilesParameters","title":"ConcatenateStreamFilesParameters
","text":" Bases: TaskParameters
Parameters for stream concatenation.
Concatenates the stream file output from CrystFEL indexing for multiple experimental runs.
Source code inlute/io/models/sfx_index.py
class ConcatenateStreamFilesParameters(TaskParameters):\n \"\"\"Parameters for stream concatenation.\n\n Concatenates the stream file output from CrystFEL indexing for multiple\n experimental runs.\n \"\"\"\n\n class Config(TaskParameters.Config):\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n in_file: str = Field(\n \"\",\n description=\"Root of directory tree storing stream files to merge.\",\n )\n\n tag: Optional[str] = Field(\n \"\",\n description=\"Tag identifying the stream files to merge.\",\n )\n\n out_file: str = Field(\n \"\", description=\"Path to merged output stream file.\", is_result=True\n )\n\n @validator(\"in_file\", always=True)\n def validate_in_file(cls, in_file: str, values: Dict[str, Any]) -> str:\n if in_file == \"\":\n stream_file: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\", \"IndexCrystFEL\", \"out_file\"\n )\n if stream_file:\n stream_dir: str = str(Path(stream_file).parent)\n return stream_dir\n return in_file\n\n @validator(\"tag\", always=True)\n def validate_tag(cls, tag: str, values: Dict[str, Any]) -> str:\n if tag == \"\":\n stream_file: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\", \"IndexCrystFEL\", \"out_file\"\n )\n if stream_file:\n stream_tag: str = Path(stream_file).name.split(\"_\")[0]\n return stream_tag\n return tag\n\n @validator(\"out_file\", always=True)\n def validate_out_file(cls, tag: str, values: Dict[str, Any]) -> str:\n if tag == \"\":\n stream_out_file: str = str(\n Path(values[\"in_file\"]).parent / f\"{values['tag'].stream}\"\n )\n return stream_out_file\n return tag\n
"},{"location":"source/io/config/#io.config.ConcatenateStreamFilesParameters.Config","title":"Config
","text":" Bases: Config
lute/io/models/sfx_index.py
class Config(TaskParameters.Config):\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n
"},{"location":"source/io/config/#io.config.ConcatenateStreamFilesParameters.Config.set_result","title":"set_result: bool = True
class-attribute
instance-attribute
","text":"Whether the Executor should mark a specified parameter as a result.
"},{"location":"source/io/config/#io.config.DimpleSolveParameters","title":"DimpleSolveParameters
","text":" Bases: ThirdPartyParameters
Parameters for CCP4's dimple program.
There are many parameters. For more information on usage, please refer to the CCP4 documentation, here: https://ccp4.github.io/dimple/
Source code inlute/io/models/sfx_solve.py
class DimpleSolveParameters(ThirdPartyParameters):\n \"\"\"Parameters for CCP4's dimple program.\n\n There are many parameters. For more information on\n usage, please refer to the CCP4 documentation, here:\n https://ccp4.github.io/dimple/\n \"\"\"\n\n executable: str = Field(\n \"/sdf/group/lcls/ds/tools/ccp4-8.0/bin/dimple\",\n description=\"CCP4 Dimple for solving structures with MR.\",\n flag_type=\"\",\n )\n # Positional requirements - all required.\n in_file: str = Field(\n \"\",\n description=\"Path to input mtz.\",\n flag_type=\"\",\n )\n pdb: str = Field(\"\", description=\"Path to a PDB.\", flag_type=\"\")\n out_dir: str = Field(\"\", description=\"Output DIRECTORY.\", flag_type=\"\")\n # Most used options\n mr_thresh: PositiveFloat = Field(\n 0.4,\n description=\"Threshold for molecular replacement.\",\n flag_type=\"--\",\n rename_param=\"mr-when-r\",\n )\n slow: Optional[bool] = Field(\n False, description=\"Perform more refinement.\", flag_type=\"--\"\n )\n # Other options (IO)\n hklout: str = Field(\n \"final.mtz\", description=\"Output mtz file name.\", flag_type=\"--\"\n )\n xyzout: str = Field(\n \"final.pdb\", description=\"Output PDB file name.\", flag_type=\"--\"\n )\n icolumn: Optional[str] = Field(\n # \"IMEAN\",\n description=\"Name for the I column.\",\n flag_type=\"--\",\n )\n sigicolumn: Optional[str] = Field(\n # \"SIG<ICOL>\",\n description=\"Name for the Sig<I> column.\",\n flag_type=\"--\",\n )\n fcolumn: Optional[str] = Field(\n # \"F\",\n description=\"Name for the F column.\",\n flag_type=\"--\",\n )\n sigfcolumn: Optional[str] = Field(\n # \"F\",\n description=\"Name for the Sig<F> column.\",\n flag_type=\"--\",\n )\n libin: Optional[str] = Field(\n description=\"Ligand descriptions for refmac (LIBIN).\", flag_type=\"--\"\n )\n refmac_key: Optional[str] = Field(\n description=\"Extra Refmac keywords to use in refinement.\",\n flag_type=\"--\",\n rename_param=\"refmac-key\",\n )\n free_r_flags: Optional[str] = Field(\n description=\"Path to a mtz file with freeR flags.\",\n flag_type=\"--\",\n rename_param=\"free-r-flags\",\n )\n freecolumn: Optional[Union[int, float]] = Field(\n # 0,\n description=\"Refree column with an optional value.\",\n flag_type=\"--\",\n )\n img_format: Optional[str] = Field(\n description=\"Format of generated images. 
(png, jpeg, none).\",\n flag_type=\"-\",\n rename_param=\"f\",\n )\n white_bg: bool = Field(\n False,\n description=\"Use a white background in Coot and in images.\",\n flag_type=\"--\",\n rename_param=\"white-bg\",\n )\n no_cleanup: bool = Field(\n False,\n description=\"Retain intermediate files.\",\n flag_type=\"--\",\n rename_param=\"no-cleanup\",\n )\n # Calculations\n no_blob_search: bool = Field(\n False,\n description=\"Do not search for unmodelled blobs.\",\n flag_type=\"--\",\n rename_param=\"no-blob-search\",\n )\n anode: bool = Field(\n False, description=\"Use SHELX/AnoDe to find peaks in the anomalous map.\"\n )\n # Run customization\n no_hetatm: bool = Field(\n False,\n description=\"Remove heteroatoms from the given model.\",\n flag_type=\"--\",\n rename_param=\"no-hetatm\",\n )\n rigid_cycles: Optional[PositiveInt] = Field(\n # 10,\n description=\"Number of cycles of rigid-body refinement to perform.\",\n flag_type=\"--\",\n rename_param=\"rigid-cycles\",\n )\n jelly: Optional[PositiveInt] = Field(\n # 4,\n description=\"Number of cycles of jelly-body refinement to perform.\",\n flag_type=\"--\",\n )\n restr_cycles: Optional[PositiveInt] = Field(\n # 8,\n description=\"Number of cycles of refmac final refinement to perform.\",\n flag_type=\"--\",\n rename_param=\"restr-cycles\",\n )\n lim_resolution: Optional[PositiveFloat] = Field(\n description=\"Limit the final resolution.\", flag_type=\"--\", rename_param=\"reso\"\n )\n weight: Optional[str] = Field(\n # \"auto-weight\",\n description=\"The refmac matrix weight.\",\n flag_type=\"--\",\n )\n mr_prog: Optional[str] = Field(\n # \"phaser\",\n description=\"Molecular replacement program. phaser or molrep.\",\n flag_type=\"--\",\n rename_param=\"mr-prog\",\n )\n mr_num: Optional[Union[str, int]] = Field(\n # \"auto\",\n description=\"Number of molecules to use for molecular replacement.\",\n flag_type=\"--\",\n rename_param=\"mr-num\",\n )\n mr_reso: Optional[PositiveFloat] = Field(\n # 3.25,\n description=\"High resolution for molecular replacement. If >10 interpreted as eLLG.\",\n flag_type=\"--\",\n rename_param=\"mr-reso\",\n )\n itof_prog: Optional[str] = Field(\n description=\"Program to calculate amplitudes. truncate, or ctruncate.\",\n flag_type=\"--\",\n rename_param=\"ItoF-prog\",\n )\n\n @validator(\"in_file\", always=True)\n def validate_in_file(cls, in_file: str, values: Dict[str, Any]) -> str:\n if in_file == \"\":\n get_hkl_file: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\", \"ManipulateHKL\", \"out_file\"\n )\n if get_hkl_file:\n return get_hkl_file\n return in_file\n\n @validator(\"out_dir\", always=True)\n def validate_out_dir(cls, out_dir: str, values: Dict[str, Any]) -> str:\n if out_dir == \"\":\n get_hkl_file: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\", \"ManipulateHKL\", \"out_file\"\n )\n if get_hkl_file:\n return os.path.dirname(get_hkl_file)\n return out_dir\n
"},{"location":"source/io/config/#io.config.FindOverlapXSSParameters","title":"FindOverlapXSSParameters
","text":" Bases: TaskParameters
TaskParameter model for FindOverlapXSS Task.
This Task determines spatial or temporal overlap between an optical pulse and the FEL pulse based on difference scattering (XSS) signal. This Task uses SmallData HDF5 files as a source.
Source code inlute/io/models/smd.py
class FindOverlapXSSParameters(TaskParameters):\n \"\"\"TaskParameter model for FindOverlapXSS Task.\n\n This Task determines spatial or temporal overlap between an optical pulse\n and the FEL pulse based on difference scattering (XSS) signal. This Task\n uses SmallData HDF5 files as a source.\n \"\"\"\n\n class ExpConfig(BaseModel):\n det_name: str\n ipm_var: str\n scan_var: Union[str, List[str]]\n\n class Thresholds(BaseModel):\n min_Iscat: Union[int, float]\n min_ipm: Union[int, float]\n\n class AnalysisFlags(BaseModel):\n use_pyfai: bool = True\n use_asymls: bool = False\n\n exp_config: ExpConfig\n thresholds: Thresholds\n analysis_flags: AnalysisFlags\n
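In a configuration YAML these fields appear as nested mappings; as a Python sketch of the nested models defined above (the detector and variable names are hypothetical):

exp_config = FindOverlapXSSParameters.ExpConfig(
    det_name="epix_1", ipm_var="ipm2/sum", scan_var="lxt"
)
thresholds = FindOverlapXSSParameters.Thresholds(min_Iscat=10, min_ipm=500.0)
flags = FindOverlapXSSParameters.AnalysisFlags()  # defaults: use_pyfai=True, use_asymls=False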
"},{"location":"source/io/config/#io.config.FindPeaksPsocakeParameters","title":"FindPeaksPsocakeParameters
","text":" Bases: ThirdPartyParameters
Parameters for crystallographic (Bragg) peak finding using Psocake.
This peak finding Task optionally has the ability to compress/decompress data with SZ for the purpose of compression validation. NOTE: This Task is deprecated and provided for compatibility only.
Source code inlute/io/models/sfx_find_peaks.py
class FindPeaksPsocakeParameters(ThirdPartyParameters):\n \"\"\"Parameters for crystallographic (Bragg) peak finding using Psocake.\n\n This peak finding Task optionally has the ability to compress/decompress\n data with SZ for the purpose of compression validation.\n NOTE: This Task is deprecated and provided for compatibility only.\n \"\"\"\n\n class Config(TaskParameters.Config):\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n result_from_params: str = \"\"\n \"\"\"Defines a result from the parameters. Use a validator to do so.\"\"\"\n\n class SZParameters(BaseModel):\n compressor: Literal[\"qoz\", \"sz3\"] = Field(\n \"qoz\", description=\"SZ compression algorithm (qoz, sz3)\"\n )\n binSize: int = Field(2, description=\"SZ compression's bin size paramater\")\n roiWindowSize: int = Field(\n 2, description=\"SZ compression's ROI window size paramater\"\n )\n absError: float = Field(10, descriptionp=\"Maximum absolute error value\")\n\n executable: str = Field(\"mpirun\", description=\"MPI executable.\", flag_type=\"\")\n np: PositiveInt = Field(\n max(int(os.environ.get(\"SLURM_NPROCS\", len(os.sched_getaffinity(0)))) - 1, 1),\n description=\"Number of processes\",\n flag_type=\"-\",\n )\n mca: str = Field(\n \"btl ^openib\", description=\"Mca option for the MPI executable\", flag_type=\"--\"\n )\n p_arg1: str = Field(\n \"python\", description=\"Executable to run with mpi (i.e. python).\", flag_type=\"\"\n )\n u: str = Field(\n \"\", description=\"Python option for unbuffered output.\", flag_type=\"-\"\n )\n p_arg2: str = Field(\n \"findPeaksSZ.py\",\n description=\"Executable to run with mpi (i.e. python).\",\n flag_type=\"\",\n )\n d: str = Field(description=\"Detector name\", flag_type=\"-\")\n e: str = Field(\"\", description=\"Experiment name\", flag_type=\"-\")\n r: int = Field(-1, description=\"Run number\", flag_type=\"-\")\n outDir: str = Field(\n description=\"Output directory where .cxi will be saved\", flag_type=\"--\"\n )\n algorithm: int = Field(1, description=\"PyAlgos algorithm to use\", flag_type=\"--\")\n alg_npix_min: float = Field(\n 1.0, description=\"PyAlgos algorithm's npix_min parameter\", flag_type=\"--\"\n )\n alg_npix_max: float = Field(\n 45.0, description=\"PyAlgos algorithm's npix_max parameter\", flag_type=\"--\"\n )\n alg_amax_thr: float = Field(\n 250.0, description=\"PyAlgos algorithm's amax_thr parameter\", flag_type=\"--\"\n )\n alg_atot_thr: float = Field(\n 330.0, description=\"PyAlgos algorithm's atot_thr parameter\", flag_type=\"--\"\n )\n alg_son_min: float = Field(\n 10.0, description=\"PyAlgos algorithm's son_min parameter\", flag_type=\"--\"\n )\n alg1_thr_low: float = Field(\n 80.0, description=\"PyAlgos algorithm's thr_low parameter\", flag_type=\"--\"\n )\n alg1_thr_high: float = Field(\n 270.0, description=\"PyAlgos algorithm's thr_high parameter\", flag_type=\"--\"\n )\n alg1_rank: int = Field(\n 3, description=\"PyAlgos algorithm's rank parameter\", flag_type=\"--\"\n )\n alg1_radius: int = Field(\n 3, description=\"PyAlgos algorithm's radius parameter\", flag_type=\"--\"\n )\n alg1_dr: int = Field(\n 1, description=\"PyAlgos algorithm's dr parameter\", flag_type=\"--\"\n )\n psanaMask_on: str = Field(\n \"True\", description=\"Whether psana's mask should be used\", flag_type=\"--\"\n )\n psanaMask_calib: str = Field(\n \"True\", description=\"Psana mask's calib parameter\", flag_type=\"--\"\n )\n psanaMask_status: str = Field(\n \"True\", description=\"Psana mask's status 
parameter\", flag_type=\"--\"\n )\n psanaMask_edges: str = Field(\n \"True\", description=\"Psana mask's edges parameter\", flag_type=\"--\"\n )\n psanaMask_central: str = Field(\n \"True\", description=\"Psana mask's central parameter\", flag_type=\"--\"\n )\n psanaMask_unbond: str = Field(\n \"True\", description=\"Psana mask's unbond parameter\", flag_type=\"--\"\n )\n psanaMask_unbondnrs: str = Field(\n \"True\", description=\"Psana mask's unbondnbrs parameter\", flag_type=\"--\"\n )\n mask: str = Field(\n \"\", description=\"Path to an additional mask to apply\", flag_type=\"--\"\n )\n clen: str = Field(\n description=\"Epics variable storing the camera length\", flag_type=\"--\"\n )\n coffset: float = Field(0, description=\"Camera offset in m\", flag_type=\"--\")\n minPeaks: int = Field(\n 15,\n description=\"Minimum number of peaks to mark frame for indexing\",\n flag_type=\"--\",\n )\n maxPeaks: int = Field(\n 15,\n description=\"Maximum number of peaks to mark frame for indexing\",\n flag_type=\"--\",\n )\n minRes: int = Field(\n 0,\n description=\"Minimum peak resolution to mark frame for indexing \",\n flag_type=\"--\",\n )\n sample: str = Field(\"\", description=\"Sample name\", flag_type=\"--\")\n instrument: Union[None, str] = Field(\n None, description=\"Instrument name\", flag_type=\"--\"\n )\n pixelSize: float = Field(0.0, description=\"Pixel size\", flag_type=\"--\")\n auto: str = Field(\n \"False\",\n description=(\n \"Whether to automatically determine peak per event peak \"\n \"finding parameters\"\n ),\n flag_type=\"--\",\n )\n detectorDistance: float = Field(\n 0.0, description=\"Detector distance from interaction point in m\", flag_type=\"--\"\n )\n access: Literal[\"ana\", \"ffb\"] = Field(\n \"ana\", description=\"Data node type: {ana,ffb}\", flag_type=\"--\"\n )\n szfile: str = Field(\"qoz.json\", description=\"Path to SZ's JSON configuration file\")\n lute_template_cfg: TemplateConfig = Field(\n TemplateConfig(\n template_name=\"sz.json\",\n output_path=\"\", # Will want to change where this goes...\n ),\n description=\"Template information for the sz.json file\",\n )\n sz_parameters: SZParameters = Field(\n description=\"Configuration parameters for SZ Compression\", flag_type=\"\"\n )\n\n @validator(\"e\", always=True)\n def validate_e(cls, e: str, values: Dict[str, Any]) -> str:\n if e == \"\":\n return values[\"lute_config\"].experiment\n return e\n\n @validator(\"r\", always=True)\n def validate_r(cls, r: int, values: Dict[str, Any]) -> int:\n if r == -1:\n return values[\"lute_config\"].run\n return r\n\n @validator(\"lute_template_cfg\", always=True)\n def set_output_path(\n cls, lute_template_cfg: TemplateConfig, values: Dict[str, Any]\n ) -> TemplateConfig:\n if lute_template_cfg.output_path == \"\":\n lute_template_cfg.output_path = values[\"szfile\"]\n return lute_template_cfg\n\n @validator(\"sz_parameters\", always=True)\n def set_sz_compression_parameters(\n cls, sz_parameters: SZParameters, values: Dict[str, Any]\n ) -> None:\n values[\"compressor\"] = sz_parameters.compressor\n values[\"binSize\"] = sz_parameters.binSize\n values[\"roiWindowSize\"] = sz_parameters.roiWindowSize\n if sz_parameters.compressor == \"qoz\":\n values[\"pressio_opts\"] = {\n \"pressio:abs\": sz_parameters.absError,\n \"qoz\": {\"qoz:stride\": 8},\n }\n else:\n values[\"pressio_opts\"] = {\"pressio:abs\": sz_parameters.absError}\n return None\n\n @root_validator(pre=False)\n def define_result(cls, values: Dict[str, Any]) -> Dict[str, Any]:\n exp: str = 
values[\"lute_config\"].experiment\n run: int = int(values[\"lute_config\"].run)\n directory: str = values[\"outDir\"]\n fname: str = f\"{exp}_{run:04d}.lst\"\n\n cls.Config.result_from_params = f\"{directory}/{fname}\"\n return values\n
"},{"location":"source/io/config/#io.config.FindPeaksPsocakeParameters.Config","title":"Config
","text":" Bases: Config
lute/io/models/sfx_find_peaks.py
class Config(TaskParameters.Config):\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n result_from_params: str = \"\"\n \"\"\"Defines a result from the parameters. Use a validator to do so.\"\"\"\n
"},{"location":"source/io/config/#io.config.FindPeaksPsocakeParameters.Config.result_from_params","title":"result_from_params: str = ''
class-attribute
instance-attribute
","text":"Defines a result from the parameters. Use a validator to do so.
"},{"location":"source/io/config/#io.config.FindPeaksPsocakeParameters.Config.set_result","title":"set_result: bool = True
class-attribute
instance-attribute
","text":"Whether the Executor should mark a specified parameter as a result.
"},{"location":"source/io/config/#io.config.FindPeaksPyAlgosParameters","title":"FindPeaksPyAlgosParameters
","text":" Bases: TaskParameters
Parameters for crystallographic (Bragg) peak finding using PyAlgos.
This peak finding Task optionally has the ability to compress/decompress data with SZ for the purpose of compression validation.
Source code inlute/io/models/sfx_find_peaks.py
class FindPeaksPyAlgosParameters(TaskParameters):\n \"\"\"Parameters for crystallographic (Bragg) peak finding using PyAlgos.\n\n This peak finding Task optionally has the ability to compress/decompress\n data with SZ for the purpose of compression validation.\n \"\"\"\n\n class Config(TaskParameters.Config):\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n class SZCompressorParameters(BaseModel):\n compressor: Literal[\"qoz\", \"sz3\"] = Field(\n \"qoz\", description='Compression algorithm (\"qoz\" or \"sz3\")'\n )\n abs_error: float = Field(10.0, description=\"Absolute error bound\")\n bin_size: int = Field(2, description=\"Bin size\")\n roi_window_size: int = Field(\n 9,\n description=\"Default window size\",\n )\n\n outdir: str = Field(\n description=\"Output directory for cxi files\",\n )\n n_events: int = Field(\n 0,\n description=\"Number of events to process (0 to process all events)\",\n )\n det_name: str = Field(\n description=\"Psana name of the detector storing the image data\",\n )\n event_receiver: Literal[\"evr0\", \"evr1\"] = Field(\n description=\"Event Receiver to be used: evr0 or evr1\",\n )\n tag: str = Field(\n \"\",\n description=\"Tag to add to the output file names\",\n )\n pv_camera_length: Union[str, float] = Field(\n \"\",\n description=\"PV associated with camera length \"\n \"(if a number, camera length directly)\",\n )\n event_logic: bool = Field(\n False,\n description=\"True if only events with a specific event code should be \"\n \"processed. False if the event code should be ignored\",\n )\n event_code: int = Field(\n 0,\n description=\"Required events code for events to be processed if event logic \"\n \"is True\",\n )\n psana_mask: bool = Field(\n False,\n description=\"If True, apply mask from psana Detector object\",\n )\n mask_file: Union[str, None] = Field(\n None,\n description=\"File with a custom mask to apply. 
If None, no custom mask is \"\n \"applied\",\n )\n min_peaks: int = Field(2, description=\"Minimum number of peaks per image\")\n max_peaks: int = Field(\n 2048,\n description=\"Maximum number of peaks per image\",\n )\n npix_min: int = Field(\n 2,\n description=\"Minimum number of pixels per peak\",\n )\n npix_max: int = Field(\n 30,\n description=\"Maximum number of pixels per peak\",\n )\n amax_thr: float = Field(\n 80.0,\n description=\"Minimum intensity threshold for starting a peak\",\n )\n atot_thr: float = Field(\n 120.0,\n description=\"Minimum summed intensity threshold for pixel collection\",\n )\n son_min: float = Field(\n 7.0,\n description=\"Minimum signal-to-noise ratio to be considered a peak\",\n )\n peak_rank: int = Field(\n 3,\n description=\"Radius in which central peak pixel is a local maximum\",\n )\n r0: float = Field(\n 3.0,\n description=\"Radius of ring for background evaluation in pixels\",\n )\n dr: float = Field(\n 2.0,\n description=\"Width of ring for background evaluation in pixels\",\n )\n nsigm: float = Field(\n 7.0,\n description=\"Intensity threshold to include pixel in connected group\",\n )\n compression: Optional[SZCompressorParameters] = Field(\n None,\n description=\"Options for the SZ Compression Algorithm\",\n )\n out_file: str = Field(\n \"\",\n description=\"Path to output file.\",\n flag_type=\"-\",\n rename_param=\"o\",\n is_result=True,\n )\n\n @validator(\"out_file\", always=True)\n def validate_out_file(cls, out_file: str, values: Dict[str, Any]) -> str:\n if out_file == \"\":\n fname: Path = (\n Path(values[\"outdir\"])\n / f\"{values['lute_config'].experiment}_{values['lute_config'].run}_\"\n f\"{values['tag']}.list\"\n )\n return str(fname)\n return out_file\n
"},{"location":"source/io/config/#io.config.FindPeaksPyAlgosParameters.Config","title":"Config
","text":" Bases: Config
lute/io/models/sfx_find_peaks.py
class Config(TaskParameters.Config):\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n
"},{"location":"source/io/config/#io.config.FindPeaksPyAlgosParameters.Config.set_result","title":"set_result: bool = True
class-attribute
instance-attribute
","text":"Whether the Executor should mark a specified parameter as a result.
"},{"location":"source/io/config/#io.config.IndexCrystFELParameters","title":"IndexCrystFELParameters
","text":" Bases: ThirdPartyParameters
Parameters for CrystFEL's indexamajig
.
There are many parameters, and many combinations. For more information on usage, please refer to the CrystFEL documentation, here: https://www.desy.de/~twhite/crystfel/manual-indexamajig.html
Source code inlute/io/models/sfx_index.py
class IndexCrystFELParameters(ThirdPartyParameters):\n \"\"\"Parameters for CrystFEL's `indexamajig`.\n\n There are many parameters, and many combinations. For more information on\n usage, please refer to the CrystFEL documentation, here:\n https://www.desy.de/~twhite/crystfel/manual-indexamajig.html\n \"\"\"\n\n class Config(ThirdPartyParameters.Config):\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n long_flags_use_eq: bool = True\n \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n\n executable: str = Field(\n \"/sdf/group/lcls/ds/tools/crystfel/0.10.2/bin/indexamajig\",\n description=\"CrystFEL's indexing binary.\",\n flag_type=\"\",\n )\n # Basic options\n in_file: Optional[str] = Field(\n \"\", description=\"Path to input file.\", flag_type=\"-\", rename_param=\"i\"\n )\n out_file: str = Field(\n \"\",\n description=\"Path to output file.\",\n flag_type=\"-\",\n rename_param=\"o\",\n is_result=True,\n )\n geometry: str = Field(\n \"\", description=\"Path to geometry file.\", flag_type=\"-\", rename_param=\"g\"\n )\n zmq_input: Optional[str] = Field(\n description=\"ZMQ address to receive data over. `input` and `zmq-input` are mutually exclusive\",\n flag_type=\"--\",\n rename_param=\"zmq-input\",\n )\n zmq_subscribe: Optional[str] = Field( # Can be used multiple times...\n description=\"Subscribe to ZMQ message of type `tag`\",\n flag_type=\"--\",\n rename_param=\"zmq-subscribe\",\n )\n zmq_request: Optional[AnyUrl] = Field(\n description=\"Request new data over ZMQ by sending this value\",\n flag_type=\"--\",\n rename_param=\"zmq-request\",\n )\n asapo_endpoint: Optional[str] = Field(\n description=\"ASAP::O endpoint. zmq-input and this are mutually exclusive.\",\n flag_type=\"--\",\n rename_param=\"asapo-endpoint\",\n )\n asapo_token: Optional[str] = Field(\n description=\"ASAP::O authentication token.\",\n flag_type=\"--\",\n rename_param=\"asapo-token\",\n )\n asapo_beamtime: Optional[str] = Field(\n description=\"ASAP::O beatime.\",\n flag_type=\"--\",\n rename_param=\"asapo-beamtime\",\n )\n asapo_source: Optional[str] = Field(\n description=\"ASAP::O data source.\",\n flag_type=\"--\",\n rename_param=\"asapo-source\",\n )\n asapo_group: Optional[str] = Field(\n description=\"ASAP::O consumer group.\",\n flag_type=\"--\",\n rename_param=\"asapo-group\",\n )\n asapo_stream: Optional[str] = Field(\n description=\"ASAP::O stream.\",\n flag_type=\"--\",\n rename_param=\"asapo-stream\",\n )\n asapo_wait_for_stream: Optional[str] = Field(\n description=\"If ASAP::O stream does not exist, wait for it to appear.\",\n flag_type=\"--\",\n rename_param=\"asapo-wait-for-stream\",\n )\n data_format: Optional[str] = Field(\n description=\"Specify format for ZMQ or ASAP::O. `msgpack`, `hdf5` or `seedee`.\",\n flag_type=\"--\",\n rename_param=\"data-format\",\n )\n basename: bool = Field(\n False,\n description=\"Remove directory parts of filenames. Acts before prefix if prefix also given.\",\n flag_type=\"--\",\n )\n prefix: Optional[str] = Field(\n description=\"Add a prefix to the filenames from the infile argument.\",\n flag_type=\"--\",\n rename_param=\"asapo-stream\",\n )\n nthreads: PositiveInt = Field(\n max(int(os.environ.get(\"SLURM_NPROCS\", len(os.sched_getaffinity(0)))) - 1, 1),\n description=\"Number of threads to use. 
See also `max_indexer_threads`.\",\n flag_type=\"-\",\n rename_param=\"j\",\n )\n no_check_prefix: bool = Field(\n False,\n description=\"Don't attempt to correct the prefix if it seems incorrect.\",\n flag_type=\"--\",\n rename_param=\"no-check-prefix\",\n )\n highres: Optional[float] = Field(\n description=\"Mark all pixels greater than `x` has bad.\", flag_type=\"--\"\n )\n profile: bool = Field(\n False, description=\"Display timing data to monitor performance.\", flag_type=\"--\"\n )\n temp_dir: Optional[str] = Field(\n description=\"Specify a path for the temp files folder.\",\n flag_type=\"--\",\n rename_param=\"temp-dir\",\n )\n wait_for_file: conint(gt=-2) = Field(\n 0,\n description=\"Wait at most `x` seconds for a file to be created. A value of -1 means wait forever.\",\n flag_type=\"--\",\n rename_param=\"wait-for-file\",\n )\n no_image_data: bool = Field(\n False,\n description=\"Load only the metadata, no iamges. Can check indexability without high data requirements.\",\n flag_type=\"--\",\n rename_param=\"no-image-data\",\n )\n # Peak-finding options\n # ....\n # Indexing options\n indexing: Optional[str] = Field(\n description=\"Comma-separated list of supported indexing algorithms to use. Default is to automatically detect.\",\n flag_type=\"--\",\n )\n cell_file: Optional[str] = Field(\n description=\"Path to a file containing unit cell information (PDB or CrystFEL format).\",\n flag_type=\"-\",\n rename_param=\"p\",\n )\n tolerance: str = Field(\n \"5,5,5,1.5\",\n description=(\n \"Tolerances (in percent) for unit cell comparison. \"\n \"Comma-separated list a,b,c,angle. Default=5,5,5,1.5\"\n ),\n flag_type=\"--\",\n )\n no_check_cell: bool = Field(\n False,\n description=\"Do not check cell parameters against unit cell. Replaces '-raw' method.\",\n flag_type=\"--\",\n rename_param=\"no-check-cell\",\n )\n no_check_peaks: bool = Field(\n False,\n description=\"Do not verify peaks are accounted for by solution.\",\n flag_type=\"--\",\n rename_param=\"no-check-peaks\",\n )\n multi: bool = Field(\n False, description=\"Enable multi-lattice indexing.\", flag_type=\"--\"\n )\n wavelength_estimate: Optional[float] = Field(\n description=\"Estimate for X-ray wavelength. Required for some methods.\",\n flag_type=\"--\",\n rename_param=\"wavelength-estimate\",\n )\n camera_length_estimate: Optional[float] = Field(\n description=\"Estimate for camera distance. Required for some methods.\",\n flag_type=\"--\",\n rename_param=\"camera-length-estimate\",\n )\n max_indexer_threads: Optional[PositiveInt] = Field(\n # 1,\n description=\"Some indexing algos can use multiple threads. 
In addition to image-based.\",\n flag_type=\"--\",\n rename_param=\"max-indexer-threads\",\n )\n no_retry: bool = Field(\n False,\n description=\"Do not remove weak peaks and try again.\",\n flag_type=\"--\",\n rename_param=\"no-retry\",\n )\n no_refine: bool = Field(\n False,\n description=\"Skip refinement step.\",\n flag_type=\"--\",\n rename_param=\"no-refine\",\n )\n no_revalidate: bool = Field(\n False,\n description=\"Skip revalidation step.\",\n flag_type=\"--\",\n rename_param=\"no-revalidate\",\n )\n # TakeTwo specific parameters\n taketwo_member_threshold: Optional[PositiveInt] = Field(\n # 20,\n description=\"Minimum number of vectors to consider.\",\n flag_type=\"--\",\n rename_param=\"taketwo-member-threshold\",\n )\n taketwo_len_tolerance: Optional[PositiveFloat] = Field(\n # 0.001,\n description=\"TakeTwo length tolerance in Angstroms.\",\n flag_type=\"--\",\n rename_param=\"taketwo-len-tolerance\",\n )\n taketwo_angle_tolerance: Optional[PositiveFloat] = Field(\n # 0.6,\n description=\"TakeTwo angle tolerance in degrees.\",\n flag_type=\"--\",\n rename_param=\"taketwo-angle-tolerance\",\n )\n taketwo_trace_tolerance: Optional[PositiveFloat] = Field(\n # 3,\n description=\"Matrix trace tolerance in degrees.\",\n flag_type=\"--\",\n rename_param=\"taketwo-trace-tolerance\",\n )\n # Felix-specific parameters\n # felix_domega\n # felix-fraction-max-visits\n # felix-max-internal-angle\n # felix-max-uniqueness\n # felix-min-completeness\n # felix-min-visits\n # felix-num-voxels\n # felix-sigma\n # felix-tthrange-max\n # felix-tthrange-min\n # XGANDALF-specific parameters\n xgandalf_sampling_pitch: Optional[NonNegativeInt] = Field(\n # 6,\n description=\"Density of reciprocal space sampling.\",\n flag_type=\"--\",\n rename_param=\"xgandalf-sampling-pitch\",\n )\n xgandalf_grad_desc_iterations: Optional[NonNegativeInt] = Field(\n # 4,\n description=\"Number of gradient descent iterations.\",\n flag_type=\"--\",\n rename_param=\"xgandalf-grad-desc-iterations\",\n )\n xgandalf_tolerance: Optional[PositiveFloat] = Field(\n # 0.02,\n description=\"Relative tolerance of lattice vectors\",\n flag_type=\"--\",\n rename_param=\"xgandalf-tolerance\",\n )\n xgandalf_no_deviation_from_provided_cell: Optional[bool] = Field(\n description=\"Found unit cell must match provided.\",\n flag_type=\"--\",\n rename_param=\"xgandalf-no-deviation-from-provided-cell\",\n )\n xgandalf_min_lattice_vector_length: Optional[PositiveFloat] = Field(\n # 30,\n description=\"Minimum possible lattice length.\",\n flag_type=\"--\",\n rename_param=\"xgandalf-min-lattice-vector-length\",\n )\n xgandalf_max_lattice_vector_length: Optional[PositiveFloat] = Field(\n # 250,\n description=\"Minimum possible lattice length.\",\n flag_type=\"--\",\n rename_param=\"xgandalf-max-lattice-vector-length\",\n )\n xgandalf_max_peaks: Optional[PositiveInt] = Field(\n # 250,\n description=\"Maximum number of peaks to use for indexing.\",\n flag_type=\"--\",\n rename_param=\"xgandalf-max-peaks\",\n )\n xgandalf_fast_execution: bool = Field(\n False,\n description=\"Shortcut to set sampling-pitch=2, and grad-desc-iterations=3.\",\n flag_type=\"--\",\n rename_param=\"xgandalf-fast-execution\",\n )\n # pinkIndexer parameters\n # ...\n # asdf_fast: bool = Field(False, description=\"Enable fast mode for asdf. 
3x faster for 7% loss in accuracy.\", flag_type=\"--\", rename_param=\"asdf-fast\")\n # Integration parameters\n integration: str = Field(\n \"rings-nocen\", description=\"Method for integrating reflections.\", flag_type=\"--\"\n )\n fix_profile_radius: Optional[float] = Field(\n description=\"Fix the profile radius (m^{-1})\",\n flag_type=\"--\",\n rename_param=\"fix-profile-radius\",\n )\n fix_divergence: Optional[float] = Field(\n 0,\n description=\"Fix the divergence (rad, full angle).\",\n flag_type=\"--\",\n rename_param=\"fix-divergence\",\n )\n int_radius: str = Field(\n \"4,5,7\",\n description=\"Inner, middle, and outer radii for 3-ring integration.\",\n flag_type=\"--\",\n rename_param=\"int-radius\",\n )\n int_diag: str = Field(\n \"none\",\n description=\"Show detailed information on integration when condition is met.\",\n flag_type=\"--\",\n rename_param=\"int-diag\",\n )\n push_res: str = Field(\n \"infinity\",\n description=\"Integrate `x` higher than apparent resolution limit (nm-1).\",\n flag_type=\"--\",\n rename_param=\"push-res\",\n )\n overpredict: bool = Field(\n False,\n description=\"Over-predict reflections. Maybe useful with post-refinement.\",\n flag_type=\"--\",\n )\n cell_parameters_only: bool = Field(\n False, description=\"Do not predict refletions at all\", flag_type=\"--\"\n )\n # Output parameters\n no_non_hits_in_stream: bool = Field(\n False,\n description=\"Exclude non-hits from the stream file.\",\n flag_type=\"--\",\n rename_param=\"no-non-hits-in-stream\",\n )\n copy_hheader: Optional[str] = Field(\n description=\"Copy information from header in the image to output stream.\",\n flag_type=\"--\",\n rename_param=\"copy-hheader\",\n )\n no_peaks_in_stream: bool = Field(\n False,\n description=\"Do not record peaks in stream file.\",\n flag_type=\"--\",\n rename_param=\"no-peaks-in-stream\",\n )\n no_refls_in_stream: bool = Field(\n False,\n description=\"Do not record reflections in stream.\",\n flag_type=\"--\",\n rename_param=\"no-refls-in-stream\",\n )\n serial_offset: Optional[PositiveInt] = Field(\n description=\"Start numbering at `x` instead of 1.\",\n flag_type=\"--\",\n rename_param=\"serial-offset\",\n )\n harvest_file: Optional[str] = Field(\n description=\"Write parameters to file in JSON format.\",\n flag_type=\"--\",\n rename_param=\"harvest-file\",\n )\n\n @validator(\"in_file\", always=True)\n def validate_in_file(cls, in_file: str, values: Dict[str, Any]) -> str:\n if in_file == \"\":\n filename: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\", \"FindPeaksPyAlgos\", \"out_file\"\n )\n if filename is None:\n exp: str = values[\"lute_config\"].experiment\n run: int = int(values[\"lute_config\"].run)\n tag: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\", \"FindPeaksPsocake\", \"tag\"\n )\n out_dir: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\", \"FindPeaksPsocake\", \"outDir\"\n )\n if out_dir is not None:\n fname: str = f\"{out_dir}/{exp}_{run:04d}\"\n if tag is not None:\n fname = f\"{fname}_{tag}\"\n return f\"{fname}.lst\"\n else:\n return filename\n return in_file\n\n @validator(\"out_file\", always=True)\n def validate_out_file(cls, out_file: str, values: Dict[str, Any]) -> str:\n if out_file == \"\":\n expmt: str = values[\"lute_config\"].experiment\n run: int = int(values[\"lute_config\"].run)\n work_dir: str = values[\"lute_config\"].work_dir\n fname: str = f\"{expmt}_r{run:04d}.stream\"\n return f\"{work_dir}/{fname}\"\n return out_file\n
"},{"location":"source/io/config/#io.config.IndexCrystFELParameters.Config","title":"Config
","text":" Bases: Config
lute/io/models/sfx_index.py
class Config(ThirdPartyParameters.Config):\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n long_flags_use_eq: bool = True\n \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n
"},{"location":"source/io/config/#io.config.IndexCrystFELParameters.Config.long_flags_use_eq","title":"long_flags_use_eq: bool = True
class-attribute
instance-attribute
","text":"Whether long command-line arguments are passed like --long=arg
.
set_result: bool = True
class-attribute
instance-attribute
","text":"Whether the Executor should mark a specified parameter as a result.
"},{"location":"source/io/config/#io.config.ManipulateHKLParameters","title":"ManipulateHKLParameters
","text":" Bases: ThirdPartyParameters
Parameters for CrystFEL's get_hkl
for manipulating lists of reflections.
This Task is predominantly used internally to convert hkl
to mtz
files. Note that performing multiple manipulations is undefined behaviour. Run the Task with multiple configurations in explicit separate steps. For more information on usage, please refer to the CrystFEL documentation, here: https://www.desy.de/~twhite/crystfel/manual-partialator.html
lute/io/models/sfx_merge.py
class ManipulateHKLParameters(ThirdPartyParameters):\n \"\"\"Parameters for CrystFEL's `get_hkl` for manipulating lists of reflections.\n\n This Task is predominantly used internally to convert `hkl` to `mtz` files.\n Note that performing multiple manipulations is undefined behaviour. Run\n the Task with multiple configurations in explicit separate steps. For more\n information on usage, please refer to the CrystFEL documentation, here:\n https://www.desy.de/~twhite/crystfel/manual-partialator.html\n \"\"\"\n\n class Config(ThirdPartyParameters.Config):\n long_flags_use_eq: bool = True\n \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n executable: str = Field(\n \"/sdf/group/lcls/ds/tools/crystfel/0.10.2/bin/get_hkl\",\n description=\"CrystFEL's reflection manipulation binary.\",\n flag_type=\"\",\n )\n in_file: str = Field(\n \"\",\n description=\"Path to input HKL file.\",\n flag_type=\"-\",\n rename_param=\"i\",\n )\n out_file: str = Field(\n \"\",\n description=\"Path to output file.\",\n flag_type=\"-\",\n rename_param=\"o\",\n is_result=True,\n )\n cell_file: str = Field(\n \"\",\n description=\"Path to a file containing unit cell information (PDB or CrystFEL format).\",\n flag_type=\"-\",\n rename_param=\"p\",\n )\n output_format: str = Field(\n \"mtz\",\n description=\"Output format. One of mtz, mtz-bij, or xds. Otherwise CrystFEL format.\",\n flag_type=\"--\",\n rename_param=\"output-format\",\n )\n expand: Optional[str] = Field(\n description=\"Reflections will be expanded to fill asymmetric unit of specified point group.\",\n flag_type=\"--\",\n )\n # Reducing reflections to higher symmetry\n twin: Optional[str] = Field(\n description=\"Reflections equivalent to specified point group will have intensities summed.\",\n flag_type=\"--\",\n )\n no_need_all_parts: Optional[bool] = Field(\n description=\"Use with --twin to allow reflections missing a 'twin mate' to be written out.\",\n flag_type=\"--\",\n rename_param=\"no-need-all-parts\",\n )\n # Noise - Add to data\n noise: Optional[bool] = Field(\n description=\"Generate 10% uniform noise.\", flag_type=\"--\"\n )\n poisson: Optional[bool] = Field(\n description=\"Generate Poisson noise. Intensities assumed to be A.U.\",\n flag_type=\"--\",\n )\n adu_per_photon: Optional[int] = Field(\n description=\"Use with --poisson to convert A.U. to photons.\",\n flag_type=\"--\",\n rename_param=\"adu-per-photon\",\n )\n # Remove duplicate reflections\n trim_centrics: Optional[bool] = Field(\n description=\"Duplicated reflections (according to symmetry) are removed.\",\n flag_type=\"--\",\n )\n # Restrict to template file\n template: Optional[str] = Field(\n description=\"Only reflections which also appear in specified file are written out.\",\n flag_type=\"--\",\n )\n # Multiplicity\n multiplicity: Optional[bool] = Field(\n description=\"Reflections are multiplied by their symmetric multiplicites.\",\n flag_type=\"--\",\n )\n # Resolution cutoffs\n cutoff_angstroms: Optional[Union[str, int, float]] = Field(\n description=\"Either n, or n1,n2,n3. For n, reflections < n are removed. 
For n1,n2,n3 anisotropic trunction performed at separate resolution limits for a*, b*, c*.\",\n flag_type=\"--\",\n rename_param=\"cutoff-angstroms\",\n )\n lowres: Optional[float] = Field(\n description=\"Remove reflections with d > n\", flag_type=\"--\"\n )\n highres: Optional[float] = Field(\n description=\"Synonym for first form of --cutoff-angstroms\"\n )\n reindex: Optional[str] = Field(\n description=\"Reindex according to specified operator. E.g. k,h,-l.\",\n flag_type=\"--\",\n )\n # Override input symmetry\n symmetry: Optional[str] = Field(\n description=\"Point group symmetry to use to override. Almost always OMIT this option.\",\n flag_type=\"--\",\n )\n\n @validator(\"in_file\", always=True)\n def validate_in_file(cls, in_file: str, values: Dict[str, Any]) -> str:\n if in_file == \"\":\n partialator_file: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\", \"MergePartialator\", \"out_file\"\n )\n if partialator_file:\n return partialator_file\n return in_file\n\n @validator(\"out_file\", always=True)\n def validate_out_file(cls, out_file: str, values: Dict[str, Any]) -> str:\n if out_file == \"\":\n partialator_file: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\", \"MergePartialator\", \"out_file\"\n )\n if partialator_file:\n mtz_out: str = partialator_file.split(\".\")[0]\n mtz_out = f\"{mtz_out}.mtz\"\n return mtz_out\n return out_file\n\n @validator(\"cell_file\", always=True)\n def validate_cell_file(cls, cell_file: str, values: Dict[str, Any]) -> str:\n if cell_file == \"\":\n idx_cell_file: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\",\n \"IndexCrystFEL\",\n \"cell_file\",\n valid_only=False,\n )\n if idx_cell_file:\n return idx_cell_file\n return cell_file\n
"},{"location":"source/io/config/#io.config.ManipulateHKLParameters.Config","title":"Config
","text":" Bases: Config
lute/io/models/sfx_merge.py
class Config(ThirdPartyParameters.Config):\n long_flags_use_eq: bool = True\n \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n
"},{"location":"source/io/config/#io.config.ManipulateHKLParameters.Config.long_flags_use_eq","title":"long_flags_use_eq: bool = True
class-attribute
instance-attribute
","text":"Whether long command-line arguments are passed like --long=arg
.
set_result: bool = True
class-attribute
instance-attribute
","text":"Whether the Executor should mark a specified parameter as a result.
"},{"location":"source/io/config/#io.config.MergePartialatorParameters","title":"MergePartialatorParameters
","text":" Bases: ThirdPartyParameters
Parameters for CrystFEL's partialator
.
There are many parameters, and many combinations. For more information on usage, please refer to the CrystFEL documentation, here: https://www.desy.de/~twhite/crystfel/manual-partialator.html
Source code inlute/io/models/sfx_merge.py
class MergePartialatorParameters(ThirdPartyParameters):\n \"\"\"Parameters for CrystFEL's `partialator`.\n\n There are many parameters, and many combinations. For more information on\n usage, please refer to the CrystFEL documentation, here:\n https://www.desy.de/~twhite/crystfel/manual-partialator.html\n \"\"\"\n\n class Config(ThirdPartyParameters.Config):\n long_flags_use_eq: bool = True\n \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n executable: str = Field(\n \"/sdf/group/lcls/ds/tools/crystfel/0.10.2/bin/partialator\",\n description=\"CrystFEL's Partialator binary.\",\n flag_type=\"\",\n )\n in_file: Optional[str] = Field(\n \"\", description=\"Path to input stream.\", flag_type=\"-\", rename_param=\"i\"\n )\n out_file: str = Field(\n \"\",\n description=\"Path to output file.\",\n flag_type=\"-\",\n rename_param=\"o\",\n is_result=True,\n )\n symmetry: str = Field(description=\"Point group symmetry.\", flag_type=\"--\")\n niter: Optional[int] = Field(\n description=\"Number of cycles of scaling and post-refinement.\",\n flag_type=\"-\",\n rename_param=\"n\",\n )\n no_scale: Optional[bool] = Field(\n description=\"Disable scaling.\", flag_type=\"--\", rename_param=\"no-scale\"\n )\n no_Bscale: Optional[bool] = Field(\n description=\"Disable Debye-Waller part of scaling.\",\n flag_type=\"--\",\n rename_param=\"no-Bscale\",\n )\n no_pr: Optional[bool] = Field(\n description=\"Disable orientation model.\", flag_type=\"--\", rename_param=\"no-pr\"\n )\n no_deltacchalf: Optional[bool] = Field(\n description=\"Disable rejection based on deltaCC1/2.\",\n flag_type=\"--\",\n rename_param=\"no-deltacchalf\",\n )\n model: str = Field(\n \"unity\",\n description=\"Partiality model. Options: xsphere, unity, offset, ggpm.\",\n flag_type=\"--\",\n )\n nthreads: int = Field(\n max(int(os.environ.get(\"SLURM_NPROCS\", len(os.sched_getaffinity(0)))) - 1, 1),\n description=\"Number of parallel analyses.\",\n flag_type=\"-\",\n rename_param=\"j\",\n )\n polarisation: Optional[str] = Field(\n description=\"Specification of incident polarisation. Refer to CrystFEL docs for more info.\",\n flag_type=\"--\",\n )\n no_polarisation: Optional[bool] = Field(\n description=\"Synonym for --polarisation=none\",\n flag_type=\"--\",\n rename_param=\"no-polarisation\",\n )\n max_adu: Optional[float] = Field(\n description=\"Maximum intensity of reflection to include.\",\n flag_type=\"--\",\n rename_param=\"max-adu\",\n )\n min_res: Optional[float] = Field(\n description=\"Only include crystals diffracting to a minimum resolution.\",\n flag_type=\"--\",\n rename_param=\"min-res\",\n )\n min_measurements: int = Field(\n 2,\n description=\"Include a reflection only if it appears a minimum number of times.\",\n flag_type=\"--\",\n rename_param=\"min-measurements\",\n )\n push_res: Optional[float] = Field(\n description=\"Merge reflections up to higher than the apparent resolution limit.\",\n flag_type=\"--\",\n rename_param=\"push-res\",\n )\n start_after: int = Field(\n 0,\n description=\"Ignore the first n crystals.\",\n flag_type=\"--\",\n rename_param=\"start-after\",\n )\n stop_after: int = Field(\n 0,\n description=\"Stop after processing n crystals. 0 means process all.\",\n flag_type=\"--\",\n rename_param=\"stop-after\",\n )\n no_free: Optional[bool] = Field(\n description=\"Disable cross-validation. 
Testing ONLY.\",\n flag_type=\"--\",\n rename_param=\"no-free\",\n )\n custom_split: Optional[str] = Field(\n description=\"Read a set of filenames, event and dataset IDs from a filename.\",\n flag_type=\"--\",\n rename_param=\"custom-split\",\n )\n max_rel_B: float = Field(\n 100,\n description=\"Reject crystals if |relB| > n sq Angstroms.\",\n flag_type=\"--\",\n rename_param=\"max-rel-B\",\n )\n output_every_cycle: bool = Field(\n False,\n description=\"Write per-crystal params after every refinement cycle.\",\n flag_type=\"--\",\n rename_param=\"output-every-cycle\",\n )\n no_logs: bool = Field(\n False,\n description=\"Do not write logs needed for plots, maps and graphs.\",\n flag_type=\"--\",\n rename_param=\"no-logs\",\n )\n set_symmetry: Optional[str] = Field(\n description=\"Set the apparent symmetry of the crystals to a point group.\",\n flag_type=\"-\",\n rename_param=\"w\",\n )\n operator: Optional[str] = Field(\n description=\"Specify an ambiguity operator. E.g. k,h,-l.\", flag_type=\"--\"\n )\n force_bandwidth: Optional[float] = Field(\n description=\"Set X-ray bandwidth. As percent, e.g. 0.0013 (0.13%).\",\n flag_type=\"--\",\n rename_param=\"force-bandwidth\",\n )\n force_radius: Optional[float] = Field(\n description=\"Set the initial profile radius (nm-1).\",\n flag_type=\"--\",\n rename_param=\"force-radius\",\n )\n force_lambda: Optional[float] = Field(\n description=\"Set the wavelength. In Angstroms.\",\n flag_type=\"--\",\n rename_param=\"force-lambda\",\n )\n harvest_file: Optional[str] = Field(\n description=\"Write parameters to file in JSON format.\",\n flag_type=\"--\",\n rename_param=\"harvest-file\",\n )\n\n @validator(\"in_file\", always=True)\n def validate_in_file(cls, in_file: str, values: Dict[str, Any]) -> str:\n if in_file == \"\":\n stream_file: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\",\n \"ConcatenateStreamFiles\",\n \"out_file\",\n )\n if stream_file:\n return stream_file\n return in_file\n\n @validator(\"out_file\", always=True)\n def validate_out_file(cls, out_file: str, values: Dict[str, Any]) -> str:\n if out_file == \"\":\n in_file: str = values[\"in_file\"]\n if in_file:\n tag: str = in_file.split(\".\")[0]\n return f\"{tag}.hkl\"\n else:\n return \"partialator.hkl\"\n return out_file\n
"},{"location":"source/io/config/#io.config.MergePartialatorParameters.Config","title":"Config
","text":" Bases: Config
lute/io/models/sfx_merge.py
class Config(ThirdPartyParameters.Config):\n long_flags_use_eq: bool = True\n \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n
"},{"location":"source/io/config/#io.config.MergePartialatorParameters.Config.long_flags_use_eq","title":"long_flags_use_eq: bool = True
class-attribute
instance-attribute
","text":"Whether long command-line arguments are passed like --long=arg
.
set_result: bool = True
class-attribute
instance-attribute
","text":"Whether the Executor should mark a specified parameter as a result.
"},{"location":"source/io/config/#io.config.RunSHELXCParameters","title":"RunSHELXCParameters
","text":" Bases: ThirdPartyParameters
Parameters for CCP4's SHELXC program.
SHELXC prepares files for SHELXD and SHELXE.
For more information please refer to the official documentation: https://www.ccp4.ac.uk/html/crank.html
Source code inlute/io/models/sfx_solve.py
class RunSHELXCParameters(ThirdPartyParameters):\n \"\"\"Parameters for CCP4's SHELXC program.\n\n SHELXC prepares files for SHELXD and SHELXE.\n\n For more information please refer to the official documentation:\n https://www.ccp4.ac.uk/html/crank.html\n \"\"\"\n\n executable: str = Field(\n \"/sdf/group/lcls/ds/tools/ccp4-8.0/bin/shelxc\",\n description=\"CCP4 SHELXC. Generates input files for SHELXD/SHELXE.\",\n flag_type=\"\",\n )\n placeholder: str = Field(\n \"xx\", description=\"Placeholder filename stem.\", flag_type=\"\"\n )\n in_file: str = Field(\n \"\",\n description=\"Input file for SHELXC with reflections AND proper records.\",\n flag_type=\"\",\n )\n\n @validator(\"in_file\", always=True)\n def validate_in_file(cls, in_file: str, values: Dict[str, Any]) -> str:\n if in_file == \"\":\n # get_hkl needed to be run to produce an XDS format file...\n xds_format_file: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\", \"ManipulateHKL\", \"out_file\"\n )\n if xds_format_file:\n in_file = xds_format_file\n if in_file[0] != \"<\":\n # Need to add a redirection for this program\n # Runs like `shelxc xx <input_file.xds`\n in_file = f\"<{in_file}\"\n return in_file\n
"},{"location":"source/io/config/#io.config.SubmitSMDParameters","title":"SubmitSMDParameters
","text":" Bases: ThirdPartyParameters
Parameters for running smalldata to produce reduced HDF5 files.
Source code inlute/io/models/smd.py
class SubmitSMDParameters(ThirdPartyParameters):\n \"\"\"Parameters for running smalldata to produce reduced HDF5 files.\"\"\"\n\n class Config(ThirdPartyParameters.Config):\n \"\"\"Identical to super-class Config but includes a result.\"\"\"\n\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n result_from_params: str = \"\"\n \"\"\"Defines a result from the parameters. Use a validator to do so.\"\"\"\n\n executable: str = Field(\"mpirun\", description=\"MPI executable.\", flag_type=\"\")\n np: PositiveInt = Field(\n max(int(os.environ.get(\"SLURM_NPROCS\", len(os.sched_getaffinity(0)))) - 1, 1),\n description=\"Number of processes\",\n flag_type=\"-\",\n )\n p_arg1: str = Field(\n \"python\", description=\"Executable to run with mpi (i.e. python).\", flag_type=\"\"\n )\n u: str = Field(\n \"\", description=\"Python option for unbuffered output.\", flag_type=\"-\"\n )\n m: str = Field(\n \"mpi4py.run\",\n description=\"Python option to execute a module's contents as __main__ module.\",\n flag_type=\"-\",\n )\n producer: str = Field(\n \"\", description=\"Path to the SmallData producer Python script.\", flag_type=\"\"\n )\n run: str = Field(\n os.environ.get(\"RUN_NUM\", \"\"), description=\"DAQ Run Number.\", flag_type=\"--\"\n )\n experiment: str = Field(\n os.environ.get(\"EXPERIMENT\", \"\"),\n description=\"LCLS Experiment Number.\",\n flag_type=\"--\",\n )\n stn: NonNegativeInt = Field(0, description=\"Hutch endstation.\", flag_type=\"--\")\n nevents: int = Field(\n int(1e9), description=\"Number of events to process.\", flag_type=\"--\"\n )\n directory: Optional[str] = Field(\n None,\n description=\"Optional output directory. If None, will be in ${EXP_FOLDER}/hdf5/smalldata.\",\n flag_type=\"--\",\n )\n ## Need mechanism to set result_from_param=True ...\n gather_interval: PositiveInt = Field(\n 25, description=\"Number of events to collect at a time.\", flag_type=\"--\"\n )\n norecorder: bool = Field(\n False, description=\"Whether to ignore recorder streams.\", flag_type=\"--\"\n )\n url: HttpUrl = Field(\n \"https://pswww.slac.stanford.edu/ws-auth/lgbk\",\n description=\"Base URL for eLog posting.\",\n flag_type=\"--\",\n )\n epicsAll: bool = Field(\n False,\n description=\"Whether to store all EPICS PVs. Use with care.\",\n flag_type=\"--\",\n )\n full: bool = Field(\n False,\n description=\"Whether to store all data. Use with EXTRA care.\",\n flag_type=\"--\",\n )\n fullSum: bool = Field(\n False,\n description=\"Whether to store sums for all area detector images.\",\n flag_type=\"--\",\n )\n default: bool = Field(\n False,\n description=\"Whether to store only the default minimal set of data.\",\n flag_type=\"--\",\n )\n image: bool = Field(\n False,\n description=\"Whether to save everything as images. Use with care.\",\n flag_type=\"--\",\n )\n tiff: bool = Field(\n False,\n description=\"Whether to save all images as a single TIFF. Use with EXTRA care.\",\n flag_type=\"--\",\n )\n centerpix: bool = Field(\n False,\n description=\"Whether to mask center pixels for Epix10k2M detectors.\",\n flag_type=\"--\",\n )\n postRuntable: bool = Field(\n False,\n description=\"Whether to post run tables. 
Also used as a trigger for summary jobs.\",\n flag_type=\"--\",\n )\n wait: bool = Field(\n False, description=\"Whether to wait for a file to appear.\", flag_type=\"--\"\n )\n xtcav: bool = Field(\n False,\n description=\"Whether to add XTCAV processing to the HDF5 generation.\",\n flag_type=\"--\",\n )\n noarch: bool = Field(\n False, description=\"Whether to not use archiver data.\", flag_type=\"--\"\n )\n\n lute_template_cfg: TemplateConfig = TemplateConfig(template_name=\"\", output_path=\"\")\n\n @validator(\"producer\", always=True)\n def validate_producer_path(cls, producer: str) -> str:\n return producer\n\n @validator(\"lute_template_cfg\", always=True)\n def use_producer(\n cls, lute_template_cfg: TemplateConfig, values: Dict[str, Any]\n ) -> TemplateConfig:\n if not lute_template_cfg.output_path:\n lute_template_cfg.output_path = values[\"producer\"]\n return lute_template_cfg\n\n @root_validator(pre=False)\n def define_result(cls, values: Dict[str, Any]) -> Dict[str, Any]:\n exp: str = values[\"lute_config\"].experiment\n hutch: str = exp[:3]\n run: int = int(values[\"lute_config\"].run)\n directory: Optional[str] = values[\"directory\"]\n if directory is None:\n directory = f\"/sdf/data/lcls/ds/{hutch}/{exp}/hdf5/smalldata\"\n fname: str = f\"{exp}_Run{run:04d}.h5\"\n\n cls.Config.result_from_params = f\"{directory}/{fname}\"\n return values\n
"},{"location":"source/io/config/#io.config.SubmitSMDParameters.Config","title":"Config
","text":" Bases: Config
Identical to super-class Config but includes a result.
Source code inlute/io/models/smd.py
class Config(ThirdPartyParameters.Config):\n \"\"\"Identical to super-class Config but includes a result.\"\"\"\n\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n result_from_params: str = \"\"\n \"\"\"Defines a result from the parameters. Use a validator to do so.\"\"\"\n
"},{"location":"source/io/config/#io.config.SubmitSMDParameters.Config.result_from_params","title":"result_from_params: str = ''
class-attribute
instance-attribute
","text":"Defines a result from the parameters. Use a validator to do so.
"},{"location":"source/io/config/#io.config.SubmitSMDParameters.Config.set_result","title":"set_result: bool = True
class-attribute
instance-attribute
","text":"Whether the Executor should mark a specified parameter as a result.
"},{"location":"source/io/config/#io.config.TaskParameters","title":"TaskParameters
","text":" Bases: BaseSettings
Base class for models of task parameters to be validated.
Parameters are read from a configuration YAML file and validated against subclasses of this type in order to ensure both that all parameters are present and that they are of the correct type.
Note: Pydantic is used for data validation. Pydantic does not perform \"strict\" validation by default. Parameter values may be cast to conform with the model specified by the subclass definition if it is possible to do so. Consider whether this may cause issues (e.g. if a float is cast to an int).
Source code inlute/io/models/base.py
class TaskParameters(BaseSettings):\n \"\"\"Base class for models of task parameters to be validated.\n\n Parameters are read from a configuration YAML file and validated against\n subclasses of this type in order to ensure that both all parameters are\n present, and that the parameters are of the correct type.\n\n Note:\n Pydantic is used for data validation. Pydantic does not perform \"strict\"\n validation by default. Parameter values may be cast to conform with the\n model specified by the subclass definition if it is possible to do so.\n Consider whether this may cause issues (e.g. if a float is cast to an\n int).\n \"\"\"\n\n class Config:\n \"\"\"Configuration for parameters model.\n\n The Config class holds Pydantic configuration. A number of LUTE-specific\n configuration has also been placed here.\n\n Attributes:\n env_prefix (str): Pydantic configuration. Will set parameters from\n environment variables containing this prefix. E.g. a model\n parameter `input` can be set with an environment variable:\n `{env_prefix}input`, in LUTE's case `LUTE_input`.\n\n underscore_attrs_are_private (bool): Pydantic configuration. Whether\n to hide attributes (parameters) prefixed with an underscore.\n\n copy_on_model_validation (str): Pydantic configuration. How to copy\n the input object passed to the class instance for model\n validation. Set to perform a deep copy.\n\n allow_inf_nan (bool): Pydantic configuration. Whether to allow\n infinity or NAN in float fields.\n\n run_directory (Optional[str]): None. If set, it should be a valid\n path. The `Task` will be run from this directory. This may be\n useful for some `Task`s which rely on searching the working\n directory.\n\n set_result (bool). False. If True, the model has information about\n setting the TaskResult object from the parameters it contains.\n E.g. it has an `output` parameter which is marked as the result.\n The result can be set with a field value of `is_result=True` on\n a specific parameter, or using `result_from_params` and a\n validator.\n\n result_from_params (Optional[str]): None. Optionally used to define\n results from information available in the model using a custom\n validator. E.g. use a `outdir` and `filename` field to set\n `result_from_params=f\"{outdir}/{filename}`, etc. Only used if\n `set_result==True`\n\n result_summary (Optional[str]): None. Defines a result summary that\n can be known after processing the Pydantic model. Use of summary\n depends on the Executor running the Task. All summaries are\n stored in the database, however. Only used if `set_result==True`\n\n impl_schemas (Optional[str]). Specifies a the schemas the\n output/results conform to. Only used if `set_result==True`.\n \"\"\"\n\n env_prefix = \"LUTE_\"\n underscore_attrs_are_private: bool = True\n copy_on_model_validation: str = \"deep\"\n allow_inf_nan: bool = False\n\n run_directory: Optional[str] = None\n \"\"\"Set the directory that the Task is run from.\"\"\"\n set_result: bool = False\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n result_from_params: Optional[str] = None\n \"\"\"Defines a result from the parameters. Use a validator to do so.\"\"\"\n result_summary: Optional[str] = None\n \"\"\"Format a TaskResult.summary from output.\"\"\"\n impl_schemas: Optional[str] = None\n \"\"\"Schema specification for output result. Will be passed to TaskResult.\"\"\"\n\n lute_config: AnalysisHeader\n
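To make the note above concrete, here is a minimal, self-contained illustration of the non-strict casting behaviour (plain pydantic v1-style code, not part of LUTE; the model and field names are purely illustrative):
from pydantic import BaseModel

class ToyParameters(BaseModel):
    # Hypothetical model used only for illustration.
    n_events: int = 100
    scale: float = 1.0

# Pydantic v1 coerces compatible values rather than rejecting them:
params = ToyParameters(n_events="250", scale=3)   # str -> int, int -> float
print(params.n_events, params.scale)              # 250 3.0

# A float passed for an int field is truncated, which may be surprising:
params = ToyParameters(n_events=9.7)
print(params.n_events)                            # 9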
"},{"location":"source/io/config/#io.config.TaskParameters.Config","title":"Config
","text":"Configuration for parameters model.
The Config class holds Pydantic configuration. A number of LUTE-specific configuration options have also been placed here.
Attributes:
Name Type Descriptionenv_prefix
str
Pydantic configuration. Will set parameters from environment variables containing this prefix. E.g. a model parameter input
can be set with an environment variable: {env_prefix}input
, in LUTE's case LUTE_input
.
underscore_attrs_are_private
bool
Pydantic configuration. Whether to hide attributes (parameters) prefixed with an underscore.
copy_on_model_validation
str
Pydantic configuration. How to copy the input object passed to the class instance for model validation. Set to perform a deep copy.
allow_inf_nan
bool
Pydantic configuration. Whether to allow infinity or NAN in float fields.
run_directory
Optional[str]
None. If set, it should be a valid path. The Task
will be run from this directory. This may be useful for some Task
s which rely on searching the working directory.
result_from_params
Optional[str]
None. Optionally used to define results from information available in the model using a custom validator. E.g. use outdir
and filename
fields to set result_from_params=f\"{outdir}/{filename}\"
, etc. Only used if set_result==True
result_summary
Optional[str]
None. Defines a result summary that can be known after processing the Pydantic model. Use of summary depends on the Executor running the Task. All summaries are stored in the database, however. Only used if set_result==True
lute/io/models/base.py
class Config:\n \"\"\"Configuration for parameters model.\n\n The Config class holds Pydantic configuration. A number of LUTE-specific\n configuration has also been placed here.\n\n Attributes:\n env_prefix (str): Pydantic configuration. Will set parameters from\n environment variables containing this prefix. E.g. a model\n parameter `input` can be set with an environment variable:\n `{env_prefix}input`, in LUTE's case `LUTE_input`.\n\n underscore_attrs_are_private (bool): Pydantic configuration. Whether\n to hide attributes (parameters) prefixed with an underscore.\n\n copy_on_model_validation (str): Pydantic configuration. How to copy\n the input object passed to the class instance for model\n validation. Set to perform a deep copy.\n\n allow_inf_nan (bool): Pydantic configuration. Whether to allow\n infinity or NAN in float fields.\n\n run_directory (Optional[str]): None. If set, it should be a valid\n path. The `Task` will be run from this directory. This may be\n useful for some `Task`s which rely on searching the working\n directory.\n\n set_result (bool). False. If True, the model has information about\n setting the TaskResult object from the parameters it contains.\n E.g. it has an `output` parameter which is marked as the result.\n The result can be set with a field value of `is_result=True` on\n a specific parameter, or using `result_from_params` and a\n validator.\n\n result_from_params (Optional[str]): None. Optionally used to define\n results from information available in the model using a custom\n validator. E.g. use a `outdir` and `filename` field to set\n `result_from_params=f\"{outdir}/{filename}`, etc. Only used if\n `set_result==True`\n\n result_summary (Optional[str]): None. Defines a result summary that\n can be known after processing the Pydantic model. Use of summary\n depends on the Executor running the Task. All summaries are\n stored in the database, however. Only used if `set_result==True`\n\n impl_schemas (Optional[str]). Specifies a the schemas the\n output/results conform to. Only used if `set_result==True`.\n \"\"\"\n\n env_prefix = \"LUTE_\"\n underscore_attrs_are_private: bool = True\n copy_on_model_validation: str = \"deep\"\n allow_inf_nan: bool = False\n\n run_directory: Optional[str] = None\n \"\"\"Set the directory that the Task is run from.\"\"\"\n set_result: bool = False\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n result_from_params: Optional[str] = None\n \"\"\"Defines a result from the parameters. Use a validator to do so.\"\"\"\n result_summary: Optional[str] = None\n \"\"\"Format a TaskResult.summary from output.\"\"\"\n impl_schemas: Optional[str] = None\n \"\"\"Schema specification for output result. Will be passed to TaskResult.\"\"\"\n
"},{"location":"source/io/config/#io.config.TaskParameters.Config.impl_schemas","title":"impl_schemas: Optional[str] = None
class-attribute
instance-attribute
","text":"Schema specification for output result. Will be passed to TaskResult.
"},{"location":"source/io/config/#io.config.TaskParameters.Config.result_from_params","title":"result_from_params: Optional[str] = None
class-attribute
instance-attribute
","text":"Defines a result from the parameters. Use a validator to do so.
"},{"location":"source/io/config/#io.config.TaskParameters.Config.result_summary","title":"result_summary: Optional[str] = None
class-attribute
instance-attribute
","text":"Format a TaskResult.summary from output.
"},{"location":"source/io/config/#io.config.TaskParameters.Config.run_directory","title":"run_directory: Optional[str] = None
class-attribute
instance-attribute
","text":"Set the directory that the Task is run from.
"},{"location":"source/io/config/#io.config.TaskParameters.Config.set_result","title":"set_result: bool = False
class-attribute
instance-attribute
","text":"Whether the Executor should mark a specified parameter as a result.
"},{"location":"source/io/config/#io.config.TemplateConfig","title":"TemplateConfig
","text":" Bases: BaseModel
Parameters used for templating of third party configuration files.
Attributes:
Name Type Descriptiontemplate_name
str
The name of the template to use. This template must live in config/templates
.
output_path
str
The FULL path, including filename to write the rendered template to.
Source code inlute/io/models/base.py
class TemplateConfig(BaseModel):\n \"\"\"Parameters used for templating of third party configuration files.\n\n Attributes:\n template_name (str): The name of the template to use. This template must\n live in `config/templates`.\n\n output_path (str): The FULL path, including filename to write the\n rendered template to.\n \"\"\"\n\n template_name: str\n output_path: str\n
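A short usage sketch (illustrative only; the template name and output path below are placeholders, not files shipped with LUTE):
from lute.io.models.base import TemplateConfig

# The template must live in config/templates; output_path is the full path,
# including the filename, that the rendered template is written to.
tpl_cfg = TemplateConfig(
    template_name="my_tool.cfg",                  # placeholder name
    output_path="/path/to/work_dir/my_tool.cfg",  # placeholder path
)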
"},{"location":"source/io/config/#io.config.TemplateParameters","title":"TemplateParameters
","text":"Class for representing parameters for third party configuration files.
These parameters can represent arbitrary data types and are used in conjunction with templates for modifying third party configuration files from the single LUTE YAML. Due to the storage of arbitrary data types, and the use of a template file, a single instance of this class can hold anything from a single template variable to an entire configuration file. The data parsing is done by Jinja using the complementary template. All data is stored in the single model variable params.
The pydantic \"dataclass\" is used over the BaseModel/Settings to allow positional argument instantiation of the params
Field.
lute/io/models/base.py
@dataclass\nclass TemplateParameters:\n \"\"\"Class for representing parameters for third party configuration files.\n\n These parameters can represent arbitrary data types and are used in\n conjunction with templates for modifying third party configuration files\n from the single LUTE YAML. Due to the storage of arbitrary data types, and\n the use of a template file, a single instance of this class can hold from a\n single template variable to an entire configuration file. The data parsing\n is done by jinja using the complementary template.\n All data is stored in the single model variable `params.`\n\n The pydantic \"dataclass\" is used over the BaseModel/Settings to allow\n positional argument instantiation of the `params` Field.\n \"\"\"\n\n params: Any\n
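Because the pydantic dataclass is used, the single params field can be filled positionally with any type. A brief sketch (the example values are placeholders):
from lute.io.models.base import TemplateParameters

# Positional instantiation: the wrapped value can be a scalar, a list, or a
# whole nested dictionary destined for a Jinja template.
single_value = TemplateParameters(16)
whole_section = TemplateParameters({"n_cores": 16, "detector": "epix10k2M"})

print(single_value.params)               # 16
print(whole_section.params["n_cores"])   # 16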
"},{"location":"source/io/config/#io.config.TestBinaryErrParameters","title":"TestBinaryErrParameters
","text":" Bases: ThirdPartyParameters
Same as TestBinary, but exits with non-zero code.
Source code inlute/io/models/tests.py
class TestBinaryErrParameters(ThirdPartyParameters):\n \"\"\"Same as TestBinary, but exits with non-zero code.\"\"\"\n\n executable: str = Field(\n \"/sdf/home/d/dorlhiac/test_tasks/test_threads_err\",\n description=\"Multi-threaded test binary with non-zero exit code.\",\n )\n p_arg1: int = Field(1, description=\"Number of threads.\")\n
"},{"location":"source/io/config/#io.config.TestMultiNodeCommunicationParameters","title":"TestMultiNodeCommunicationParameters
","text":" Bases: TaskParameters
Parameters for the test Task TestMultiNodeCommunication
.
Test verifies communication across multiple machines.
Source code inlute/io/models/mpi_tests.py
class TestMultiNodeCommunicationParameters(TaskParameters):\n \"\"\"Parameters for the test Task `TestMultiNodeCommunication`.\n\n Test verifies communication across multiple machines.\n \"\"\"\n\n send_obj: Literal[\"plot\", \"array\"] = Field(\n \"array\", description=\"Object to send to Executor. `plot` or `array`\"\n )\n arr_size: Optional[int] = Field(\n None, description=\"Size of array to send back to Executor.\"\n )\n
"},{"location":"source/io/config/#io.config.TestParameters","title":"TestParameters
","text":" Bases: TaskParameters
Parameters for the test Task Test
.
lute/io/models/tests.py
class TestParameters(TaskParameters):\n \"\"\"Parameters for the test Task `Test`.\"\"\"\n\n float_var: float = Field(0.01, description=\"A floating point number.\")\n str_var: str = Field(\"test\", description=\"A string.\")\n\n class CompoundVar(BaseModel):\n int_var: int = 1\n dict_var: Dict[str, str] = {\"a\": \"b\"}\n\n compound_var: CompoundVar = Field(\n description=(\n \"A compound parameter - consists of a `int_var` (int) and `dict_var`\"\n \" (Dict[str, str]).\"\n )\n )\n throw_error: bool = Field(\n False, description=\"If `True`, raise an exception to test error handling.\"\n )\n
"},{"location":"source/io/config/#io.config.ThirdPartyParameters","title":"ThirdPartyParameters
","text":" Bases: TaskParameters
Base class for third party task parameters.
Contains special validators for extra arguments and handling of parameters used for filling in third party configuration files.
Source code inlute/io/models/base.py
class ThirdPartyParameters(TaskParameters):\n \"\"\"Base class for third party task parameters.\n\n Contains special validators for extra arguments and handling of parameters\n used for filling in third party configuration files.\n \"\"\"\n\n class Config(TaskParameters.Config):\n \"\"\"Configuration for parameters model.\n\n The Config class holds Pydantic configuration and inherited configuration\n from the base `TaskParameters.Config` class. A number of values are also\n overridden, and there are some specific configuration options to\n ThirdPartyParameters. A full list of options (with TaskParameters options\n repeated) is described below.\n\n Attributes:\n env_prefix (str): Pydantic configuration. Will set parameters from\n environment variables containing this prefix. E.g. a model\n parameter `input` can be set with an environment variable:\n `{env_prefix}input`, in LUTE's case `LUTE_input`.\n\n underscore_attrs_are_private (bool): Pydantic configuration. Whether\n to hide attributes (parameters) prefixed with an underscore.\n\n copy_on_model_validation (str): Pydantic configuration. How to copy\n the input object passed to the class instance for model\n validation. Set to perform a deep copy.\n\n allow_inf_nan (bool): Pydantic configuration. Whether to allow\n infinity or NAN in float fields.\n\n run_directory (Optional[str]): None. If set, it should be a valid\n path. The `Task` will be run from this directory. This may be\n useful for some `Task`s which rely on searching the working\n directory.\n\n set_result (bool). True. If True, the model has information about\n setting the TaskResult object from the parameters it contains.\n E.g. it has an `output` parameter which is marked as the result.\n The result can be set with a field value of `is_result=True` on\n a specific parameter, or using `result_from_params` and a\n validator.\n\n result_from_params (Optional[str]): None. Optionally used to define\n results from information available in the model using a custom\n validator. E.g. use a `outdir` and `filename` field to set\n `result_from_params=f\"{outdir}/{filename}`, etc.\n\n result_summary (Optional[str]): None. Defines a result summary that\n can be known after processing the Pydantic model. Use of summary\n depends on the Executor running the Task. All summaries are\n stored in the database, however.\n\n impl_schemas (Optional[str]). Specifies a the schemas the\n output/results conform to. Only used if set_result is True.\n\n -----------------------\n ThirdPartyTask-specific:\n\n extra (str): \"allow\". Pydantic configuration. Allow (or ignore) extra\n arguments.\n\n short_flags_use_eq (bool): False. If True, \"short\" command-line args\n are passed as `-x=arg`. ThirdPartyTask-specific.\n\n long_flags_use_eq (bool): False. If True, \"long\" command-line args\n are passed as `--long=arg`. ThirdPartyTask-specific.\n \"\"\"\n\n extra: str = \"allow\"\n short_flags_use_eq: bool = False\n \"\"\"Whether short command-line arguments are passed like `-x=arg`.\"\"\"\n long_flags_use_eq: bool = False\n \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n # lute_template_cfg: TemplateConfig\n\n @root_validator(pre=False)\n def extra_fields_to_thirdparty(cls, values: Dict[str, Any]):\n for key in values:\n if key not in cls.__fields__:\n values[key] = TemplateParameters(values[key])\n\n return values\n
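The root validator above is what lets a configuration YAML pass arbitrary extra keys through to a ThirdPartyTask. A minimal, self-contained reproduction of the pattern (plain pydantic v1; the class and field names below are illustrative only, not LUTE's):
from typing import Any, Dict
from pydantic import BaseModel, root_validator
from pydantic.dataclasses import dataclass

@dataclass
class ToyTemplateParameters:
    params: Any

class ToyThirdParty(BaseModel):
    class Config:
        extra = "allow"   # keep unknown fields instead of rejecting them

    executable: str = "/usr/bin/true"

    @root_validator(pre=False)
    def extra_fields_to_template(cls, values: Dict[str, Any]) -> Dict[str, Any]:
        # Any field not declared on the model is wrapped so it can later be
        # forwarded to a template for a third party configuration file.
        for key in values:
            if key not in cls.__fields__:
                values[key] = ToyTemplateParameters(values[key])
        return values

model = ToyThirdParty(executable="/usr/bin/echo", n_cores=16)
print(model.n_cores.params)  # 16 -- unknown key preserved for templating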
"},{"location":"source/io/config/#io.config.ThirdPartyParameters.Config","title":"Config
","text":" Bases: Config
Configuration for parameters model.
The Config class holds Pydantic configuration and inherited configuration from the base TaskParameters.Config
class. A number of values are also overridden, and there are some configuration options specific to ThirdPartyParameters. A full list of options (with TaskParameters options repeated) is described below.
Attributes:
Name Type Descriptionenv_prefix
str
Pydantic configuration. Will set parameters from environment variables containing this prefix. E.g. a model parameter input
can be set with an environment variable: {env_prefix}input
, in LUTE's case LUTE_input
.
underscore_attrs_are_private
bool
Pydantic configuration. Whether to hide attributes (parameters) prefixed with an underscore.
copy_on_model_validation
str
Pydantic configuration. How to copy the input object passed to the class instance for model validation. Set to perform a deep copy.
allow_inf_nan
bool
Pydantic configuration. Whether to allow infinity or NAN in float fields.
run_directory
Optional[str]
None. If set, it should be a valid path. The Task
will be run from this directory. This may be useful for some Task
s which rely on searching the working directory.
result_from_params
Optional[str]
None. Optionally used to define results from information available in the model using a custom validator. E.g. use outdir
and filename
fields to set result_from_params=f\"{outdir}/{filename}\"
, etc.
result_summary
Optional[str]
None. Defines a result summary that can be known after processing the Pydantic model. Use of summary depends on the Executor running the Task. All summaries are stored in the database, however.
ThirdPartyTask-specific:
extra
str
\"allow\". Pydantic configuration. Allow (or ignore) extra arguments.
short_flags_use_eq
bool
False. If True, \"short\" command-line args are passed as -x=arg
. ThirdPartyTask-specific.
long_flags_use_eq
bool
False. If True, \"long\" command-line args are passed as --long=arg
. ThirdPartyTask-specific.
lute/io/models/base.py
class Config(TaskParameters.Config):\n \"\"\"Configuration for parameters model.\n\n The Config class holds Pydantic configuration and inherited configuration\n from the base `TaskParameters.Config` class. A number of values are also\n overridden, and there are some specific configuration options to\n ThirdPartyParameters. A full list of options (with TaskParameters options\n repeated) is described below.\n\n Attributes:\n env_prefix (str): Pydantic configuration. Will set parameters from\n environment variables containing this prefix. E.g. a model\n parameter `input` can be set with an environment variable:\n `{env_prefix}input`, in LUTE's case `LUTE_input`.\n\n underscore_attrs_are_private (bool): Pydantic configuration. Whether\n to hide attributes (parameters) prefixed with an underscore.\n\n copy_on_model_validation (str): Pydantic configuration. How to copy\n the input object passed to the class instance for model\n validation. Set to perform a deep copy.\n\n allow_inf_nan (bool): Pydantic configuration. Whether to allow\n infinity or NAN in float fields.\n\n run_directory (Optional[str]): None. If set, it should be a valid\n path. The `Task` will be run from this directory. This may be\n useful for some `Task`s which rely on searching the working\n directory.\n\n set_result (bool). True. If True, the model has information about\n setting the TaskResult object from the parameters it contains.\n E.g. it has an `output` parameter which is marked as the result.\n The result can be set with a field value of `is_result=True` on\n a specific parameter, or using `result_from_params` and a\n validator.\n\n result_from_params (Optional[str]): None. Optionally used to define\n results from information available in the model using a custom\n validator. E.g. use a `outdir` and `filename` field to set\n `result_from_params=f\"{outdir}/{filename}`, etc.\n\n result_summary (Optional[str]): None. Defines a result summary that\n can be known after processing the Pydantic model. Use of summary\n depends on the Executor running the Task. All summaries are\n stored in the database, however.\n\n impl_schemas (Optional[str]). Specifies a the schemas the\n output/results conform to. Only used if set_result is True.\n\n -----------------------\n ThirdPartyTask-specific:\n\n extra (str): \"allow\". Pydantic configuration. Allow (or ignore) extra\n arguments.\n\n short_flags_use_eq (bool): False. If True, \"short\" command-line args\n are passed as `-x=arg`. ThirdPartyTask-specific.\n\n long_flags_use_eq (bool): False. If True, \"long\" command-line args\n are passed as `--long=arg`. ThirdPartyTask-specific.\n \"\"\"\n\n extra: str = \"allow\"\n short_flags_use_eq: bool = False\n \"\"\"Whether short command-line arguments are passed like `-x=arg`.\"\"\"\n long_flags_use_eq: bool = False\n \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n
"},{"location":"source/io/config/#io.config.ThirdPartyParameters.Config.long_flags_use_eq","title":"long_flags_use_eq: bool = False
class-attribute
instance-attribute
","text":"Whether long command-line arguments are passed like --long=arg
.
set_result: bool = True
class-attribute
instance-attribute
","text":"Whether the Executor should mark a specified parameter as a result.
"},{"location":"source/io/config/#io.config.ThirdPartyParameters.Config.short_flags_use_eq","title":"short_flags_use_eq: bool = False
class-attribute
instance-attribute
","text":"Whether short command-line arguments are passed like -x=arg
.
parse_config(task_name='test', config_path='')
","text":"Parse a configuration file and validate the contents.
Parameters:
Name Type Description Defaulttask_name
str
Name of the specific task that will be run.
'test'
config_path
str
Path to the configuration file.
''
Returns:
Name Type Descriptionparams
TaskParameters
A TaskParameters object of validated task-specific parameters. Parameters are accessed with \"dot\" notation. E.g. params.param1
.
Raises:
Type DescriptionValidationError
Raised if there are problems with the configuration file. Passed through from Pydantic.
Source code inlute/io/config.py
def parse_config(task_name: str = \"test\", config_path: str = \"\") -> TaskParameters:\n \"\"\"Parse a configuration file and validate the contents.\n\n Args:\n task_name (str): Name of the specific task that will be run.\n\n config_path (str): Path to the configuration file.\n\n Returns:\n params (TaskParameters): A TaskParameters object of validated\n task-specific parameters. Parameters are accessed with \"dot\"\n notation. E.g. `params.param1`.\n\n Raises:\n ValidationError: Raised if there are problems with the configuration\n file. Passed through from Pydantic.\n \"\"\"\n task_config_name: str = f\"{task_name}Parameters\"\n\n with open(config_path, \"r\") as f:\n docs: Iterator[Dict[str, Any]] = yaml.load_all(stream=f, Loader=yaml.FullLoader)\n header: Dict[str, Any] = next(docs)\n config: Dict[str, Any] = next(docs)\n substitute_variables(header, header)\n substitute_variables(header, config)\n LUTE_DEBUG_EXIT(\"LUTE_DEBUG_EXIT_AT_YAML\", pprint.pformat(config))\n lute_config: Dict[str, AnalysisHeader] = {\"lute_config\": AnalysisHeader(**header)}\n try:\n task_config: Dict[str, Any] = dict(config[task_name])\n lute_config.update(task_config)\n except KeyError as err:\n warnings.warn(\n (\n f\"{task_name} has no parameter definitions in YAML file.\"\n \" Attempting default parameter initialization.\"\n )\n )\n parsed_parameters: TaskParameters = globals()[task_config_name](**lute_config)\n return parsed_parameters\n
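A usage sketch (the YAML path below is a placeholder; the file must contain the usual header document followed by a Test parameter block):
from lute.io.config import parse_config

# Hypothetical path to a LUTE configuration YAML.
params = parse_config(task_name="Test", config_path="/path/to/lute/config/test.yaml")

print(params.float_var)               # 0.01 unless overridden in the YAML
print(params.lute_config.experiment)  # taken from the header document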
"},{"location":"source/io/config/#io.config.substitute_variables","title":"substitute_variables(header, config, curr_key=None)
","text":"Performs variable substitutions on a dictionary read from config YAML file.
Can be used to define input parameters in terms of other input parameters. This is similar to functionality employed by validators for parameters in the specific Task models, but is intended to be more accessible to users. Variable substitutions are defined using a minimal syntax from Jinja: {{ experiment }} defines a substitution of the variable experiment
. The characters {{ }}
can be escaped if the literal symbols are needed in place.
For example, a path to a file can be defined in terms of experiment and run values in the config file:
MyTask:
  experiment: myexp
  run: 2
  special_file: /path/to/{{ experiment }}/{{ run }}/file.inp
Acceptable variables for substitutions are values defined elsewhere in the YAML file. Environment variables can also be used if prefaced with a $
character. E.g. to get the experiment from an environment variable:
MyTask:
  run: 2
  special_file: /path/to/{{ $EXPERIMENT }}/{{ run }}/file.inp
Parameters:
Name Type Description Defaultconfig
Dict[str, Any]
A dictionary of parsed configuration.
requiredcurr_key
Optional[str]
Used to keep track of recursion level when scanning through iterable items in the config dictionary.
None
Returns:
Name Type Descriptionsubbed_config
Dict[str, Any]
The config dictionary after substitutions have been made. May be identical to the input if no substitutions are needed.
Source code inlute/io/config.py
def substitute_variables(\n header: Dict[str, Any], config: Dict[str, Any], curr_key: Optional[str] = None\n) -> None:\n \"\"\"Performs variable substitutions on a dictionary read from config YAML file.\n\n Can be used to define input parameters in terms of other input parameters.\n This is similar to functionality employed by validators for parameters in\n the specific Task models, but is intended to be more accessible to users.\n Variable substitutions are defined using a minimal syntax from Jinja:\n {{ experiment }}\n defines a substitution of the variable `experiment`. The characters `{{ }}`\n can be escaped if the literal symbols are needed in place.\n\n For example, a path to a file can be defined in terms of experiment and run\n values in the config file:\n MyTask:\n experiment: myexp\n run: 2\n special_file: /path/to/{{ experiment }}/{{ run }}/file.inp\n\n Acceptable variables for substitutions are values defined elsewhere in the\n YAML file. Environment variables can also be used if prefaced with a `$`\n character. E.g. to get the experiment from an environment variable:\n MyTask:\n run: 2\n special_file: /path/to/{{ $EXPERIMENT }}/{{ run }}/file.inp\n\n Args:\n config (Dict[str, Any]): A dictionary of parsed configuration.\n\n curr_key (Optional[str]): Used to keep track of recursion level when scanning\n through iterable items in the config dictionary.\n\n Returns:\n subbed_config (Dict[str, Any]): The config dictionary after substitutions\n have been made. May be identical to the input if no substitutions are\n needed.\n \"\"\"\n _sub_pattern = r\"\\{\\{[^}{]*\\}\\}\"\n iterable: Dict[str, Any] = config\n if curr_key is not None:\n # Need to handle nested levels by interpreting curr_key\n keys_by_level: List[str] = curr_key.split(\".\")\n for key in keys_by_level:\n iterable = iterable[key]\n else:\n ...\n # iterable = config\n for param, value in iterable.items():\n if isinstance(value, dict):\n new_key: str\n if curr_key is None:\n new_key = param\n else:\n new_key = f\"{curr_key}.{param}\"\n substitute_variables(header, config, curr_key=new_key)\n elif isinstance(value, list):\n ...\n # Scalars str - we skip numeric types\n elif isinstance(value, str):\n matches: List[str] = re.findall(_sub_pattern, value)\n for m in matches:\n key_to_sub_maybe_with_fmt: List[str] = m[2:-2].strip().split(\":\")\n key_to_sub: str = key_to_sub_maybe_with_fmt[0]\n fmt: Optional[str] = None\n if len(key_to_sub_maybe_with_fmt) == 2:\n fmt = key_to_sub_maybe_with_fmt[1]\n sub: Any\n if key_to_sub[0] == \"$\":\n sub = os.getenv(key_to_sub[1:], None)\n if sub is None:\n print(\n f\"Environment variable {key_to_sub[1:]} not found! Cannot substitute in YAML config!\",\n flush=True,\n )\n continue\n # substitutions from env vars will be strings, so convert back\n # to numeric in order to perform formatting later on (e.g. {var:04d})\n sub = _check_str_numeric(sub)\n else:\n try:\n sub = config\n for key in key_to_sub.split(\".\"):\n sub = sub[key]\n except KeyError:\n sub = header[key_to_sub]\n pattern: str = (\n m.replace(\"{{\", r\"\\{\\{\").replace(\"}}\", r\"\\}\\}\").replace(\"$\", r\"\\$\")\n )\n if fmt is not None:\n sub = f\"{sub:{fmt}}\"\n else:\n sub = f\"{sub}\"\n iterable[param] = re.sub(pattern, sub, iterable[param])\n # Reconvert back to numeric values if needed...\n iterable[param] = _check_str_numeric(iterable[param])\n
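A minimal sketch of the in-place substitution, using small dictionaries in place of the two parsed YAML documents:
from lute.io.config import substitute_variables

header = {"experiment": "myexp", "run": 2}
config = {
    "MyTask": {
        "special_file": "/path/to/{{ experiment }}/{{ run }}/file.inp",
    }
}

# The config dictionary is modified in place; values not found in the config
# itself are looked up in the header document.
substitute_variables(header, config)
print(config["MyTask"]["special_file"])  # /path/to/myexp/2/file.inp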
"},{"location":"source/io/db/","title":"db","text":"Tools for working with the LUTE parameter and configuration database.
The current implementation relies on a sqlite backend database. In the future this may change - therefore relatively few high-level API function calls are intended to be public. These abstract away the details of the database interface and work exclusively on LUTE objects.
Functions:
Name Descriptionrecord_analysis_db
(cfg: DescribedAnalysis) -> None: Writes the configuration to the backend database.
read_latest_db_entry
(db_dir: str, task_name: str, param: str) -> Any: Retrieve the most recent entry from a database for a specific Task.
Raises:
Type DescriptionDatabaseError
Generic exception raised for LUTE database errors.
"},{"location":"source/io/db/#io.db.DatabaseError","title":"DatabaseError
","text":" Bases: Exception
General LUTE database error.
Source code inlute/io/db.py
class DatabaseError(Exception):\n \"\"\"General LUTE database error.\"\"\"\n\n ...\n
"},{"location":"source/io/db/#io.db.read_latest_db_entry","title":"read_latest_db_entry(db_dir, task_name, param, valid_only=True)
","text":"Read most recent value entered into the database for a Task parameter.
(Will be updated for schema compliance as well as Task name.)
Parameters:
Name Type Description Defaultdb_dir
str
Database location.
requiredtask_name
str
The name of the Task to check the database for.
requiredparam
str
The parameter name for the Task that we want to retrieve.
requiredvalid_only
bool
Whether to consider only valid results or not. E.g. An input file may be useful even if the Task result is invalid (Failed). Default = True.
True
Returns:
Name Type Descriptionval
Any
The most recently entered value for param
of task_name
that can be found in the database. Returns None if nothing found.
lute/io/db.py
def read_latest_db_entry(\n db_dir: str, task_name: str, param: str, valid_only: bool = True\n) -> Optional[Any]:\n \"\"\"Read most recent value entered into the database for a Task parameter.\n\n (Will be updated for schema compliance as well as Task name.)\n\n Args:\n db_dir (str): Database location.\n\n task_name (str): The name of the Task to check the database for.\n\n param (str): The parameter name for the Task that we want to retrieve.\n\n valid_only (bool): Whether to consider only valid results or not. E.g.\n An input file may be useful even if the Task result is invalid\n (Failed). Default = True.\n\n Returns:\n val (Any): The most recently entered value for `param` of `task_name`\n that can be found in the database. Returns None if nothing found.\n \"\"\"\n import sqlite3\n from ._sqlite import _select_from_db\n\n con: sqlite3.Connection = sqlite3.Connection(f\"{db_dir}/lute.db\")\n with con:\n try:\n cond: Dict[str, str] = {}\n if valid_only:\n cond = {\"valid_flag\": \"1\"}\n entry: Any = _select_from_db(con, task_name, param, cond)\n except sqlite3.OperationalError as err:\n logger.debug(f\"Cannot retrieve value {param} due to: {err}\")\n entry = None\n return entry\n
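For example (the working directory below is a placeholder), retrieving the stream file most recently recorded for the IndexCrystFEL managed Task:
from lute.io.db import read_latest_db_entry

stream_file = read_latest_db_entry(
    db_dir="/path/to/lute/work_dir",   # directory containing lute.db
    task_name="IndexCrystFEL",
    param="out_file",
)
if stream_file is None:
    print("No valid IndexCrystFEL result recorded yet.")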
"},{"location":"source/io/db/#io.db.record_analysis_db","title":"record_analysis_db(cfg)
","text":"Write an DescribedAnalysis object to the database.
The DescribedAnalysis object is maintained by the Executor and contains all information necessary to fully describe a single Task
execution. The contained fields are split across multiple tables within the database as some of the information can be shared across multiple Tasks. Refer to docs/design/database.md
for more information on the database specification.
lute/io/db.py
def record_analysis_db(cfg: DescribedAnalysis) -> None:\n \"\"\"Write an DescribedAnalysis object to the database.\n\n The DescribedAnalysis object is maintained by the Executor and contains all\n information necessary to fully describe a single `Task` execution. The\n contained fields are split across multiple tables within the database as\n some of the information can be shared across multiple Tasks. Refer to\n `docs/design/database.md` for more information on the database specification.\n \"\"\"\n import sqlite3\n from ._sqlite import (\n _make_shared_table,\n _make_task_table,\n _add_row_no_duplicate,\n _add_task_entry,\n )\n\n try:\n work_dir: str = cfg.task_parameters.lute_config.work_dir\n except AttributeError:\n logger.info(\n (\n \"Unable to access TaskParameters object. Likely wasn't created. \"\n \"Cannot store result.\"\n )\n )\n return\n del cfg.task_parameters.lute_config.work_dir\n\n exec_entry, exec_columns = _cfg_to_exec_entry_cols(cfg)\n task_name: str = cfg.task_result.task_name\n # All `Task`s have an AnalysisHeader, but this info can be shared so is\n # split into a different table\n (\n task_entry, # Dict[str, Any]\n task_columns, # Dict[str, str]\n gen_entry, # Dict[str, Any]\n gen_columns, # Dict[str, str]\n ) = _params_to_entry_cols(cfg.task_parameters)\n x, y = _result_to_entry_cols(cfg.task_result)\n task_entry.update(x)\n task_columns.update(y)\n\n con: sqlite3.Connection = sqlite3.Connection(f\"{work_dir}/lute.db\")\n with con:\n # --- Table Creation ---#\n if not _make_shared_table(con, \"gen_cfg\", gen_columns):\n raise DatabaseError(\"Could not make general configuration table!\")\n if not _make_shared_table(con, \"exec_cfg\", exec_columns):\n raise DatabaseError(\"Could not make Executor configuration table!\")\n if not _make_task_table(con, task_name, task_columns):\n raise DatabaseError(f\"Could not make Task table for: {task_name}!\")\n\n # --- Row Addition ---#\n gen_id: int = _add_row_no_duplicate(con, \"gen_cfg\", gen_entry)\n exec_id: int = _add_row_no_duplicate(con, \"exec_cfg\", exec_entry)\n\n full_task_entry: Dict[str, Any] = {\n \"gen_cfg_id\": gen_id,\n \"exec_cfg_id\": exec_id,\n }\n full_task_entry.update(task_entry)\n # Prepare flag to indicate whether the task entry is valid or not\n # By default we say it is assuming proper completion\n valid_flag: int = (\n 1 if cfg.task_result.task_status == TaskStatus.COMPLETED else 0\n )\n full_task_entry.update({\"valid_flag\": valid_flag})\n\n _add_task_entry(con, task_name, full_task_entry)\n
"},{"location":"source/io/elog/","title":"elog","text":"Provides utilities for communicating with the LCLS eLog.
Makes use of various eLog API endpoints to retrieve information or post results.
Functions:
Name Description get_elog_opr_auth(exp: str)
Return an authorization object to interact with the eLog API as an opr account for the hutch where exp was conducted.
get_elog_kerberos_auth()
Return the authorization headers for the user account submitting the job.
elog_http_request(exp: str, endpoint: str, request_type: str, **params)
Make an HTTP request to the API endpoint at url.
format_file_for_post(in_file: Union[str, tuple, list])
Prepare files according to the specification needed to add them as attachments to eLog posts.
post_elog_message(exp: str, msg: str, tag: Optional[str], title: Optional[str], in_files: List[Union[str, tuple, list]], auth: Optional[Union[HTTPBasicAuth, Dict]] = None)
Post a message to the eLog.
post_elog_run_status(data: Dict[str, Union[str, int, float]], update_url: Optional[str] = None)
Post a run status to the summary section on the Workflows>Control tab.
post_elog_run_table(exp: str, run: int, data: Dict[str, Any], auth: Optional[Union[HTTPBasicAuth, Dict]] = None)
Update the run table in the eLog.
get_elog_runs_by_tag(exp: str, tag: str, auth: Optional[Union[HTTPBasicAuth, Dict]] = None)
Return a list of runs with a specific tag.
get_elog_params_by_run(exp: str, params: List[str], runs: Optional[List[int]])
Retrieve the requested parameters by run. If no run is provided, retrieve the requested parameters for all runs.
"},{"location":"source/io/elog/#io.elog.elog_http_request","title":"elog_http_request(exp, endpoint, request_type, **params)
","text":"Make an HTTP request to the eLog.
This method will determine the proper authorization method and update the passed parameters appropriately. Functions implementing specific endpoint functionality and calling this function should only pass the necessary endpoint-specific parameters and not include the authorization objects.
Parameters:
Name Type Description Defaultexp
str
Experiment.
requiredendpoint
str
eLog API endpoint.
requiredrequest_type
str
Type of request to make. Recognized options: POST or GET.
required**params
Dict
Endpoint parameters to pass with the HTTP request! Differs depending on the API endpoint. Do not include auth objects.
{}
Returns:
Name Type Descriptionstatus_code
int
Response status code. Can be checked for errors.
msg
str
An error message, or a message saying SUCCESS.
value
Optional[Any]
For GET requests ONLY, return the requested information.
Source code inlute/io/elog.py
def elog_http_request(\n exp: str, endpoint: str, request_type: str, **params\n) -> Tuple[int, str, Optional[Any]]:\n \"\"\"Make an HTTP request to the eLog.\n\n This method will determine the proper authorization method and update the\n passed parameters appropriately. Functions implementing specific endpoint\n functionality and calling this function should only pass the necessary\n endpoint-specific parameters and not include the authorization objects.\n\n Args:\n exp (str): Experiment.\n\n endpoint (str): eLog API endpoint.\n\n request_type (str): Type of request to make. Recognized options: POST or\n GET.\n\n **params (Dict): Endpoint parameters to pass with the HTTP request!\n Differs depending on the API endpoint. Do not include auth objects.\n\n Returns:\n status_code (int): Response status code. Can be checked for errors.\n\n msg (str): An error message, or a message saying SUCCESS.\n\n value (Optional[Any]): For GET requests ONLY, return the requested\n information.\n \"\"\"\n auth: Union[HTTPBasicAuth, Dict[str, str]] = get_elog_auth(exp)\n base_url: str\n if isinstance(auth, HTTPBasicAuth):\n params.update({\"auth\": auth})\n base_url = \"https://pswww.slac.stanford.edu/ws-auth/lgbk/lgbk\"\n elif isinstance(auth, dict):\n params.update({\"headers\": auth})\n base_url = \"https://pswww.slac.stanford.edu/ws-kerb/lgbk/lgbk\"\n\n url: str = f\"{base_url}/{endpoint}\"\n\n resp: requests.models.Response\n if request_type.upper() == \"POST\":\n resp = requests.post(url, **params)\n elif request_type.upper() == \"GET\":\n resp = requests.get(url, **params)\n else:\n return (-1, \"Invalid request type!\", None)\n\n status_code: int = resp.status_code\n msg: str = \"SUCCESS\"\n\n if resp.json()[\"success\"] and request_type.upper() == \"GET\":\n return (status_code, msg, resp.json()[\"value\"])\n\n if status_code >= 300:\n msg = f\"Error when posting to eLog: Response {status_code}\"\n\n if not resp.json()[\"success\"]:\n err_msg = resp.json()[\"error_msg\"]\n msg += f\"\\nInclude message: {err_msg}\"\n return (resp.status_code, msg, None)\n
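Example (an illustrative sketch only; the experiment name and tag are placeholders, and the endpoint mirrors the one used by get_elog_runs_by_tag below):
from lute.io.elog import elog_http_request\n\nexp: str = \"mfxx00000\"  # Placeholder experiment name\n# Endpoint mirroring get_elog_runs_by_tag; \"sample1\" is a placeholder tag\nendpoint: str = f\"{exp}/ws/get_runs_with_tag?tag=sample1\"\nstatus_code, msg, value = elog_http_request(exp=exp, endpoint=endpoint, request_type=\"GET\")\nif status_code >= 300 or msg != \"SUCCESS\":\n    print(f\"Request failed ({status_code}): {msg}\")\nelse:\n    print(f\"Runs with tag: {value}\")\n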
"},{"location":"source/io/elog/#io.elog.format_file_for_post","title":"format_file_for_post(in_file)
","text":"Format a file for attachment to an eLog post.
The eLog API expects a specifically formatted tuple when adding file attachments. This function prepares the tuple to specification given a number of different input types.
Parameters:
Name Type Description Defaultin_file
str | tuple | list
File to include as an attachment in an eLog post.
required Source code inlute/io/elog.py
def format_file_for_post(\n in_file: Union[str, tuple, list]\n) -> Tuple[str, Tuple[str, BufferedReader], Any]:\n \"\"\"Format a file for attachment to an eLog post.\n\n The eLog API expects a specifically formatted tuple when adding file\n attachments. This function prepares the tuple to specification given a\n number of different input types.\n\n Args:\n in_file (str | tuple | list): File to include as an attachment in an\n eLog post.\n \"\"\"\n description: str\n fptr: BufferedReader\n ftype: Optional[str]\n if isinstance(in_file, str):\n description = os.path.basename(in_file)\n fptr = open(in_file, \"rb\")\n ftype = mimetypes.guess_type(in_file)[0]\n elif isinstance(in_file, tuple) or isinstance(in_file, list):\n description = in_file[1]\n fptr = open(in_file[0], \"rb\")\n ftype = mimetypes.guess_type(in_file[0])[0]\n else:\n raise ElogFileFormatError(f\"Unrecognized format: {in_file}\")\n\n out_file: Tuple[str, Tuple[str, BufferedReader], Any] = (\n \"files\",\n (description, fptr),\n ftype,\n )\n return out_file\n
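Example (illustrative only; the file paths are placeholders). A bare string uses the file's basename as the description, while a tuple or list lets you supply the description explicitly:
from lute.io.elog import format_file_for_post\n\n# Placeholder paths - the files must exist since they are opened for reading\nattachment1 = format_file_for_post(\"/path/to/powder.png\")\nattachment2 = format_file_for_post((\"/path/to/peaks.png\", \"Peak finding summary\"))\n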
"},{"location":"source/io/elog/#io.elog.get_elog_active_expmt","title":"get_elog_active_expmt(hutch, *, endstation=0)
","text":"Get the current active experiment for a hutch.
This function is one of two functions to manage the HTTP request independently. This is because it does not require an authorization object, and its result is needed for the generic function elog_http_request
to work properly.
Parameters:
Name Type Description Defaulthutch
str
The hutch to get the active experiment for.
requiredendstation
int
The hutch endstation to get the experiment for. This should generally be 0.
0
Source code in lute/io/elog.py
def get_elog_active_expmt(hutch: str, *, endstation: int = 0) -> str:\n \"\"\"Get the current active experiment for a hutch.\n\n This function is one of two functions to manage the HTTP request independently.\n This is because it does not require an authorization object, and its result\n is needed for the generic function `elog_http_request` to work properly.\n\n Args:\n hutch (str): The hutch to get the active experiment for.\n\n endstation (int): The hutch endstation to get the experiment for. This\n should generally be 0.\n \"\"\"\n\n base_url: str = \"https://pswww.slac.stanford.edu/ws/lgbk/lgbk\"\n endpoint: str = \"ws/activeexperiment_for_instrument_station\"\n url: str = f\"{base_url}/{endpoint}\"\n params: Dict[str, str] = {\"instrument_name\": hutch, \"station\": f\"{endstation}\"}\n resp: requests.models.Response = requests.get(url, params)\n if resp.status_code > 300:\n raise RuntimeError(\n f\"Error getting current experiment!\\n\\t\\tIncorrect hutch: '{hutch}'?\"\n )\n if resp.json()[\"success\"]:\n return resp.json()[\"value\"][\"name\"]\n else:\n msg: str = resp.json()[\"error_msg\"]\n raise RuntimeError(f\"Error getting current experiment! Err: {msg}\")\n
"},{"location":"source/io/elog/#io.elog.get_elog_auth","title":"get_elog_auth(exp)
","text":"Determine the appropriate auth method depending on experiment state.
Returns:
Name Type Descriptionauth
HTTPBasicAuth | Dict[str, str]
Depending on whether an experiment is active/live, returns authorization for the hutch operator account or the current user submitting a job.
Source code inlute/io/elog.py
def get_elog_auth(exp: str) -> Union[HTTPBasicAuth, Dict[str, str]]:\n \"\"\"Determine the appropriate auth method depending on experiment state.\n\n Returns:\n auth (HTTPBasicAuth | Dict[str, str]): Depending on whether an experiment\n is active/live, returns authorization for the hutch operator account\n or the current user submitting a job.\n \"\"\"\n hutch: str = exp[:3]\n if exp.lower() == get_elog_active_expmt(hutch=hutch).lower():\n return get_elog_opr_auth(exp)\n else:\n return get_elog_kerberos_auth()\n
"},{"location":"source/io/elog/#io.elog.get_elog_kerberos_auth","title":"get_elog_kerberos_auth()
","text":"Returns Kerberos authorization key.
This function returns authorization for the USER account submitting jobs. It assumes that kinit
has been run.
Returns:
Name Type Descriptionauth
Dict[str, str]
Dictionary containing Kerberos authorization key.
Source code inlute/io/elog.py
def get_elog_kerberos_auth() -> Dict[str, str]:\n \"\"\"Returns Kerberos authorization key.\n\n This functions returns authorization for the USER account submitting jobs.\n It assumes that `kinit` has been run.\n\n Returns:\n auth (Dict[str, str]): Dictionary containing Kerberos authorization key.\n \"\"\"\n from krtc import KerberosTicket\n\n return KerberosTicket(\"HTTP@pswww.slac.stanford.edu\").getAuthHeaders()\n
"},{"location":"source/io/elog/#io.elog.get_elog_opr_auth","title":"get_elog_opr_auth(exp)
","text":"Produce authentication for the \"opr\" user associated to an experiment.
This method uses basic authentication using username and password.
Parameters:
Name Type Description Defaultexp
str
Name of the experiment to produce authentication for.
requiredReturns:
Name Type Descriptionauth
HTTPBasicAuth
HTTPBasicAuth for an active experiment based on username and password for the associated operator account.
Source code inlute/io/elog.py
def get_elog_opr_auth(exp: str) -> HTTPBasicAuth:\n \"\"\"Produce authentication for the \"opr\" user associated to an experiment.\n\n This method uses basic authentication using username and password.\n\n Args:\n exp (str): Name of the experiment to produce authentication for.\n\n Returns:\n auth (HTTPBasicAuth): HTTPBasicAuth for an active experiment based on\n username and password for the associated operator account.\n \"\"\"\n opr: str = f\"{exp[:3]}opr\"\n with open(\"/sdf/group/lcls/ds/tools/forElogPost.txt\", \"r\") as f:\n pw: str = f.readline()[:-1]\n return HTTPBasicAuth(opr, pw)\n
"},{"location":"source/io/elog/#io.elog.get_elog_params_by_run","title":"get_elog_params_by_run(exp, params, runs=None)
","text":"Retrieve requested parameters by run or for all runs.
Parameters:
Name Type Description Defaultexp
str
Experiment to retrieve parameters for.
requiredparams
List[str]
A list of parameters to retrieve. These can be any parameter recorded in the eLog (PVs, parameters posted by other Tasks, etc.)
required Source code inlute/io/elog.py
def get_elog_params_by_run(\n exp: str, params: List[str], runs: Optional[List[int]] = None\n) -> Dict[str, str]:\n \"\"\"Retrieve requested parameters by run or for all runs.\n\n Args:\n exp (str): Experiment to retrieve parameters for.\n\n params (List[str]): A list of parameters to retrieve. These can be any\n parameter recorded in the eLog (PVs, parameters posted by other\n Tasks, etc.)\n \"\"\"\n ...\n
"},{"location":"source/io/elog/#io.elog.get_elog_runs_by_tag","title":"get_elog_runs_by_tag(exp, tag, auth=None)
","text":"Retrieve run numbers with a specified tag.
Parameters:
Name Type Description Defaultexp
str
Experiment name.
requiredtag
str
The tag to retrieve runs for.
required Source code inlute/io/elog.py
def get_elog_runs_by_tag(\n exp: str, tag: str, auth: Optional[Union[HTTPBasicAuth, Dict]] = None\n) -> List[int]:\n \"\"\"Retrieve run numbers with a specified tag.\n\n Args:\n exp (str): Experiment name.\n\n tag (str): The tag to retrieve runs for.\n \"\"\"\n endpoint: str = f\"{exp}/ws/get_runs_with_tag?tag={tag}\"\n params: Dict[str, Any] = {}\n\n status_code, resp_msg, tagged_runs = elog_http_request(\n exp=exp, endpoint=endpoint, request_type=\"GET\", **params\n )\n\n if not tagged_runs:\n tagged_runs = []\n\n return tagged_runs\n
"},{"location":"source/io/elog/#io.elog.get_elog_workflows","title":"get_elog_workflows(exp)
","text":"Get the current workflow definitions for an experiment.
Returns:
Name Type Descriptiondefns
Dict[str, str]
A dictionary of workflow definitions.
Source code inlute/io/elog.py
def get_elog_workflows(exp: str) -> Dict[str, str]:\n \"\"\"Get the current workflow definitions for an experiment.\n\n Returns:\n defns (Dict[str, str]): A dictionary of workflow definitions.\n \"\"\"\n raise NotImplementedError\n
"},{"location":"source/io/elog/#io.elog.post_elog_message","title":"post_elog_message(exp, msg, *, tag, title, in_files=[])
","text":"Post a new message to the eLog. Inspired by the elog
package.
Parameters:
Name Type Description Defaultexp
str
Experiment name.
requiredmsg
str
BODY of the eLog post.
requiredtag
str | None
Optional \"tag\" to associate with the eLog post.
requiredtitle
str | None
Optional title to include in the eLog post.
requiredin_files
List[str | tuple | list]
Files to include as attachments in the eLog post.
[]
Returns:
Name Type Descriptionerr_msg
str | None
If successful, nothing is returned, otherwise, return an error message.
Source code inlute/io/elog.py
def post_elog_message(\n exp: str,\n msg: str,\n *,\n tag: Optional[str],\n title: Optional[str],\n in_files: List[Union[str, tuple, list]] = [],\n) -> Optional[str]:\n \"\"\"Post a new message to the eLog. Inspired by the `elog` package.\n\n Args:\n exp (str): Experiment name.\n\n msg (str): BODY of the eLog post.\n\n tag (str | None): Optional \"tag\" to associate with the eLog post.\n\n title (str | None): Optional title to include in the eLog post.\n\n in_files (List[str | tuple | list]): Files to include as attachments in\n the eLog post.\n\n Returns:\n err_msg (str | None): If successful, nothing is returned, otherwise,\n return an error message.\n \"\"\"\n # MOSTLY CORRECT\n out_files: list = []\n for f in in_files:\n try:\n out_files.append(format_file_for_post(in_file=f))\n except ElogFileFormatError as err:\n logger.debug(f\"ElogFileFormatError: {err}\")\n post: Dict[str, str] = {}\n post[\"log_text\"] = msg\n if tag:\n post[\"log_tags\"] = tag\n if title:\n post[\"log_title\"] = title\n\n endpoint: str = f\"{exp}/ws/new_elog_entry\"\n\n params: Dict[str, Any] = {\"data\": post}\n\n if out_files:\n params.update({\"files\": out_files})\n\n status_code, resp_msg, _ = elog_http_request(\n exp=exp, endpoint=endpoint, request_type=\"POST\", **params\n )\n\n if resp_msg != \"SUCCESS\":\n return resp_msg\n
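Example (an illustrative sketch; the experiment name, message, title and attachment path are placeholders):
from lute.io.elog import post_elog_message\n\nerr = post_elog_message(\n    exp=\"mfxx00000\",  # Placeholder experiment name\n    msg=\"Peak finding completed for run 12.\",\n    tag=\"LUTE\",\n    title=\"Peak finding status\",\n    in_files=[(\"/path/to/peaks.png\", \"Peak finding summary\")],  # Placeholder path\n)\nif err is not None:\n    print(f\"eLog post failed: {err}\")\n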
"},{"location":"source/io/elog/#io.elog.post_elog_run_status","title":"post_elog_run_status(data, update_url=None)
","text":"Post a summary to the status/report section of a specific run.
In contrast to most eLog update/post mechanisms, this function searches for a specific environment variable which contains a specific URL for posting. This is updated every job/run as jobs are submitted by the JID. The URL can optionally be passed to this function if it is known.
Parameters:
Name Type Description Defaultdata
Dict[str, Union[str, int, float]]
The data to post to the eLog report section. Formatted in key:value pairs.
requiredupdate_url
Optional[str]
Optional update URL. If not provided, the function searches for the corresponding environment variable. If neither is found, the function aborts
None
Source code in lute/io/elog.py
def post_elog_run_status(\n data: Dict[str, Union[str, int, float]], update_url: Optional[str] = None\n) -> None:\n \"\"\"Post a summary to the status/report section of a specific run.\n\n In contrast to most eLog update/post mechanisms, this function searches\n for a specific environment variable which contains a specific URL for\n posting. This is updated every job/run as jobs are submitted by the JID.\n The URL can optionally be passed to this function if it is known.\n\n Args:\n data (Dict[str, Union[str, int, float]]): The data to post to the eLog\n report section. Formatted in key:value pairs.\n\n update_url (Optional[str]): Optional update URL. If not provided, the\n function searches for the corresponding environment variable. If\n neither is found, the function aborts\n \"\"\"\n if update_url is None:\n update_url = os.environ.get(\"JID_UPDATE_COUNTERS\")\n if update_url is None:\n logger.info(\"eLog Update Failed! JID_UPDATE_COUNTERS is not defined!\")\n return\n current_status: Dict[str, Union[str, int, float]] = _get_current_run_status(\n update_url\n )\n current_status.update(data)\n post_list: List[Dict[str, str]] = [\n {\"key\": f\"{key}\", \"value\": f\"{value}\"} for key, value in current_status.items()\n ]\n params: Dict[str, List[Dict[str, str]]] = {\"json\": post_list}\n resp: requests.models.Response = requests.post(update_url, **params)\n
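Example (illustrative only; the key/value pairs are placeholders, and JID_UPDATE_COUNTERS must be set in the environment, or update_url passed explicitly, for the post to succeed):
from lute.io.elog import post_elog_run_status\n\n# Placeholder summary values to display on the Workflows>Control tab\npost_elog_run_status({\"Number of hits\": 1024, \"Hit rate (%)\": 3.2})\n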
"},{"location":"source/io/elog/#io.elog.post_elog_run_table","title":"post_elog_run_table(exp, run, data)
","text":"Post data for eLog run tables.
Parameters:
Name Type Description Defaultexp
str
Experiment name.
requiredrun
int
Run number corresponding to the data being posted.
requireddata
Dict[str, Any]
Data to be posted in format data[\"column_header\"] = value.
requiredReturns:
Name Type Descriptionerr_msg
None | str
If successful, nothing is returned, otherwise, return an error message.
Source code inlute/io/elog.py
def post_elog_run_table(\n exp: str,\n run: int,\n data: Dict[str, Any],\n) -> Optional[str]:\n \"\"\"Post data for eLog run tables.\n\n Args:\n exp (str): Experiment name.\n\n run (int): Run number corresponding to the data being posted.\n\n data (Dict[str, Any]): Data to be posted in format\n data[\"column_header\"] = value.\n\n Returns:\n err_msg (None | str): If successful, nothing is returned, otherwise,\n return an error message.\n \"\"\"\n endpoint: str = f\"run_control/{exp}/ws/add_run_params\"\n\n params: Dict[str, Any] = {\"params\": {\"run_num\": run}, \"json\": data}\n\n status_code, resp_msg, _ = elog_http_request(\n exp=exp, endpoint=endpoint, request_type=\"POST\", **params\n )\n\n if resp_msg != \"SUCCESS\":\n return resp_msg\n
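Example (an illustrative sketch; the experiment name, run number and column values are placeholders):
from lute.io.elog import post_elog_run_table\n\nerr = post_elog_run_table(\n    exp=\"mfxx00000\",  # Placeholder experiment name\n    run=12,\n    data={\"Indexed frames\": 3511, \"Indexing rate (%)\": 41.7},  # Placeholder values\n)\nif err is not None:\n    print(f\"Run table update failed: {err}\")\n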
"},{"location":"source/io/elog/#io.elog.post_elog_workflow","title":"post_elog_workflow(exp, name, executable, wf_params, *, trigger='run_end', location='S3DF', **trig_args)
","text":"Create a new eLog workflow, or update an existing one.
The workflow will run a specific executable as a batch job when the specified trigger occurs. The precise arguments may vary depending on the selected trigger type.
Parameters:
Name Type Description Defaultname
str
An identifying name for the workflow. E.g. \"process data\"
requiredexecutable
str
Full path to the executable to be run.
requiredwf_params
str
All command-line parameters for the executable as a string.
requiredtrigger
str
When to trigger execution of the specified executable. One of: - 'manual': Must be manually triggered. No automatic processing. - 'run_start': Execute immediately if a new run begins. - 'run_end': As soon as a run ends. - 'param_is': As soon as a parameter has a specific value for a run.
'run_end'
location
str
Where to submit the job. S3DF or NERSC.
'S3DF'
**trig_args
str
Arguments required for a specific trigger type. trigger='param_is' - 2 Arguments trig_param (str): Name of the parameter to watch for. trig_param_val (str): Value the parameter should have to trigger.
{}
Source code in lute/io/elog.py
def post_elog_workflow(\n exp: str,\n name: str,\n executable: str,\n wf_params: str,\n *,\n trigger: str = \"run_end\",\n location: str = \"S3DF\",\n **trig_args: str,\n) -> None:\n \"\"\"Create a new eLog workflow, or update an existing one.\n\n The workflow will run a specific executable as a batch job when the\n specified trigger occurs. The precise arguments may vary depending on the\n selected trigger type.\n\n Args:\n name (str): An identifying name for the workflow. E.g. \"process data\"\n\n executable (str): Full path to the executable to be run.\n\n wf_params (str): All command-line parameters for the executable as a string.\n\n trigger (str): When to trigger execution of the specified executable.\n One of:\n - 'manual': Must be manually triggered. No automatic processing.\n - 'run_start': Execute immediately if a new run begins.\n - 'run_end': As soon as a run ends.\n - 'param_is': As soon as a parameter has a specific value for a run.\n\n location (str): Where to submit the job. S3DF or NERSC.\n\n **trig_args (str): Arguments required for a specific trigger type.\n trigger='param_is' - 2 Arguments\n trig_param (str): Name of the parameter to watch for.\n trig_param_val (str): Value the parameter should have to trigger.\n \"\"\"\n endpoint: str = f\"{exp}/ws/create_update_workflow_def\"\n trig_map: Dict[str, str] = {\n \"manual\": \"MANUAL\",\n \"run_start\": \"START_OF_RUN\",\n \"run_end\": \"END_OF_RUN\",\n \"param_is\": \"RUN_PARAM_IS_VALUE\",\n }\n if trigger not in trig_map.keys():\n raise NotImplementedError(\n f\"Cannot create workflow with trigger type: {trigger}\"\n )\n wf_defn: Dict[str, str] = {\n \"name\": name,\n \"executable\": executable,\n \"parameters\": wf_params,\n \"trigger\": trig_map[trigger],\n \"location\": location,\n }\n if trigger == \"param_is\":\n if \"trig_param\" not in trig_args or \"trig_param_val\" not in trig_args:\n raise RuntimeError(\n \"Trigger type 'param_is' requires: 'trig_param' and 'trig_param_val' arguments\"\n )\n wf_defn.update(\n {\n \"run_param_name\": trig_args[\"trig_param\"],\n \"run_param_val\": trig_args[\"trig_param_val\"],\n }\n )\n post_params: Dict[str, Dict[str, str]] = {\"json\": wf_defn}\n status_code, resp_msg, _ = elog_http_request(\n exp, endpoint=endpoint, request_type=\"POST\", **post_params\n )\n
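Example (an illustrative sketch; the experiment name, executable path, command-line parameters and trigger parameter are placeholders):
from lute.io.elog import post_elog_workflow\n\npost_elog_workflow(\n    exp=\"mfxx00000\",  # Placeholder experiment name\n    name=\"process data\",\n    executable=\"/path/to/launch_script.sh\",  # Placeholder executable\n    wf_params=\"-c /path/to/config.yaml -w my_workflow\",  # Placeholder parameters\n    trigger=\"param_is\",\n    location=\"S3DF\",\n    trig_param=\"POST_RUN_FLAG\",  # Placeholder parameter name\n    trig_param_val=\"1\",\n)\n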
"},{"location":"source/io/exceptions/","title":"exceptions","text":"Specifies custom exceptions defined for IO problems.
Raises:
Type DescriptionElogFileFormatError
Raised if an attachment is specified in an incorrect format.
"},{"location":"source/io/exceptions/#io.exceptions.ElogFileFormatError","title":"ElogFileFormatError
","text":" Bases: Exception
Raised when an eLog attachment is specified in an invalid format.
Source code inlute/io/exceptions.py
class ElogFileFormatError(Exception):\n \"\"\"Raised when an eLog attachment is specified in an invalid format.\"\"\"\n\n ...\n
"},{"location":"source/io/models/base/","title":"base","text":"Base classes for describing Task parameters.
Classes:
Name DescriptionAnalysisHeader
Model holding shared configuration across Tasks. E.g. experiment name, run number and working directory.
TaskParameters
Base class for Task parameters. Subclasses specify a model of parameters and their types for validation.
ThirdPartyParameters
Base class for Third-party, binary executable Tasks.
TemplateParameters
Dataclass to represent parameters of binary (third-party) Tasks which are used for additional config files.
TemplateConfig
Class for holding information on where templates are stored in order to properly handle ThirdPartyParameter objects.
"},{"location":"source/io/models/base/#io.models.base.AnalysisHeader","title":"AnalysisHeader
","text":" Bases: BaseModel
Header information for LUTE analysis runs.
Source code inlute/io/models/base.py
class AnalysisHeader(BaseModel):\n \"\"\"Header information for LUTE analysis runs.\"\"\"\n\n title: str = Field(\n \"LUTE Task Configuration\",\n description=\"Description of the configuration or experiment.\",\n )\n experiment: str = Field(\"\", description=\"Experiment.\")\n run: Union[str, int] = Field(\"\", description=\"Data acquisition run.\")\n date: str = Field(\"1970/01/01\", description=\"Start date of analysis.\")\n lute_version: Union[float, str] = Field(\n 0.1, description=\"Version of LUTE used for analysis.\"\n )\n task_timeout: PositiveInt = Field(\n 600,\n description=(\n \"Time in seconds until a task times out. Should be slightly shorter\"\n \" than job timeout if using a job manager (e.g. SLURM).\"\n ),\n )\n work_dir: str = Field(\"\", description=\"Main working directory for LUTE.\")\n\n @validator(\"work_dir\", always=True)\n def validate_work_dir(cls, directory: str, values: Dict[str, Any]) -> str:\n work_dir: str\n if directory == \"\":\n std_work_dir = (\n f\"/sdf/data/lcls/ds/{values['experiment'][:3]}/\"\n f\"{values['experiment']}/scratch\"\n )\n work_dir = std_work_dir\n else:\n work_dir = directory\n # Check existence and permissions\n if not os.path.exists(work_dir):\n raise ValueError(f\"Working Directory: {work_dir} does not exist!\")\n if not os.access(work_dir, os.W_OK):\n # Need write access for database, files etc.\n raise ValueError(f\"Not write access for working directory: {work_dir}!\")\n return work_dir\n\n @validator(\"run\", always=True)\n def validate_run(\n cls, run: Union[str, int], values: Dict[str, Any]\n ) -> Union[str, int]:\n if run == \"\":\n # From Airflow RUN_NUM should have Format \"RUN_DATETIME\" - Num is first part\n run_time: str = os.environ.get(\"RUN_NUM\", \"\")\n if run_time != \"\":\n return int(run_time.split(\"_\")[0])\n return run\n\n @validator(\"experiment\", always=True)\n def validate_experiment(cls, experiment: str, values: Dict[str, Any]) -> str:\n if experiment == \"\":\n arp_exp: str = os.environ.get(\"EXPERIMENT\", \"EXPX00000\")\n return arp_exp\n return experiment\n
"},{"location":"source/io/models/base/#io.models.base.TaskParameters","title":"TaskParameters
","text":" Bases: BaseSettings
Base class for models of task parameters to be validated.
Parameters are read from a configuration YAML file and validated against subclasses of this type in order to ensure that both all parameters are present, and that the parameters are of the correct type.
NotePydantic is used for data validation. Pydantic does not perform \"strict\" validation by default. Parameter values may be cast to conform with the model specified by the subclass definition if it is possible to do so. Consider whether this may cause issues (e.g. if a float is cast to an int).
Source code inlute/io/models/base.py
class TaskParameters(BaseSettings):\n \"\"\"Base class for models of task parameters to be validated.\n\n Parameters are read from a configuration YAML file and validated against\n subclasses of this type in order to ensure that both all parameters are\n present, and that the parameters are of the correct type.\n\n Note:\n Pydantic is used for data validation. Pydantic does not perform \"strict\"\n validation by default. Parameter values may be cast to conform with the\n model specified by the subclass definition if it is possible to do so.\n Consider whether this may cause issues (e.g. if a float is cast to an\n int).\n \"\"\"\n\n class Config:\n \"\"\"Configuration for parameters model.\n\n The Config class holds Pydantic configuration. A number of LUTE-specific\n configuration has also been placed here.\n\n Attributes:\n env_prefix (str): Pydantic configuration. Will set parameters from\n environment variables containing this prefix. E.g. a model\n parameter `input` can be set with an environment variable:\n `{env_prefix}input`, in LUTE's case `LUTE_input`.\n\n underscore_attrs_are_private (bool): Pydantic configuration. Whether\n to hide attributes (parameters) prefixed with an underscore.\n\n copy_on_model_validation (str): Pydantic configuration. How to copy\n the input object passed to the class instance for model\n validation. Set to perform a deep copy.\n\n allow_inf_nan (bool): Pydantic configuration. Whether to allow\n infinity or NAN in float fields.\n\n run_directory (Optional[str]): None. If set, it should be a valid\n path. The `Task` will be run from this directory. This may be\n useful for some `Task`s which rely on searching the working\n directory.\n\n set_result (bool). False. If True, the model has information about\n setting the TaskResult object from the parameters it contains.\n E.g. it has an `output` parameter which is marked as the result.\n The result can be set with a field value of `is_result=True` on\n a specific parameter, or using `result_from_params` and a\n validator.\n\n result_from_params (Optional[str]): None. Optionally used to define\n results from information available in the model using a custom\n validator. E.g. use a `outdir` and `filename` field to set\n `result_from_params=f\"{outdir}/{filename}`, etc. Only used if\n `set_result==True`\n\n result_summary (Optional[str]): None. Defines a result summary that\n can be known after processing the Pydantic model. Use of summary\n depends on the Executor running the Task. All summaries are\n stored in the database, however. Only used if `set_result==True`\n\n impl_schemas (Optional[str]). Specifies a the schemas the\n output/results conform to. Only used if `set_result==True`.\n \"\"\"\n\n env_prefix = \"LUTE_\"\n underscore_attrs_are_private: bool = True\n copy_on_model_validation: str = \"deep\"\n allow_inf_nan: bool = False\n\n run_directory: Optional[str] = None\n \"\"\"Set the directory that the Task is run from.\"\"\"\n set_result: bool = False\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n result_from_params: Optional[str] = None\n \"\"\"Defines a result from the parameters. Use a validator to do so.\"\"\"\n result_summary: Optional[str] = None\n \"\"\"Format a TaskResult.summary from output.\"\"\"\n impl_schemas: Optional[str] = None\n \"\"\"Schema specification for output result. Will be passed to TaskResult.\"\"\"\n\n lute_config: AnalysisHeader\n
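As an orientation sketch (not an actual LUTE Task), a parameter model subclasses TaskParameters and declares typed fields which are then validated against the configuration YAML:
from lute.io.models.base import TaskParameters\n\n\nclass MyHypotheticalTaskParameters(TaskParameters):\n    \"\"\"Illustrative parameter model; not an actual LUTE Task.\"\"\"\n\n    input_file: str = \"\"  # Hypothetical parameter\n    n_events: int = 100  # Hypothetical parameter\n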
"},{"location":"source/io/models/base/#io.models.base.TaskParameters.Config","title":"Config
","text":"Configuration for parameters model.
The Config class holds Pydantic configuration. A number of LUTE-specific configuration has also been placed here.
Attributes:
Name Type Descriptionenv_prefix
str
Pydantic configuration. Will set parameters from environment variables containing this prefix. E.g. a model parameter input
can be set with an environment variable: {env_prefix}input
, in LUTE's case LUTE_input
.
underscore_attrs_are_private
bool
Pydantic configuration. Whether to hide attributes (parameters) prefixed with an underscore.
copy_on_model_validation
str
Pydantic configuration. How to copy the input object passed to the class instance for model validation. Set to perform a deep copy.
allow_inf_nan
bool
Pydantic configuration. Whether to allow infinity or NAN in float fields.
run_directory
Optional[str]
None. If set, it should be a valid path. The Task
will be run from this directory. This may be useful for some Task
s which rely on searching the working directory.
result_from_params
Optional[str]
None. Optionally used to define results from information available in the model using a custom validator. E.g. use a outdir
and filename
field to set result_from_params=f\"{outdir}/{filename}
, etc. Only used if set_result==True
result_summary
Optional[str]
None. Defines a result summary that can be known after processing the Pydantic model. Use of summary depends on the Executor running the Task. All summaries are stored in the database, however. Only used if set_result==True
lute/io/models/base.py
class Config:\n \"\"\"Configuration for parameters model.\n\n The Config class holds Pydantic configuration. A number of LUTE-specific\n configuration has also been placed here.\n\n Attributes:\n env_prefix (str): Pydantic configuration. Will set parameters from\n environment variables containing this prefix. E.g. a model\n parameter `input` can be set with an environment variable:\n `{env_prefix}input`, in LUTE's case `LUTE_input`.\n\n underscore_attrs_are_private (bool): Pydantic configuration. Whether\n to hide attributes (parameters) prefixed with an underscore.\n\n copy_on_model_validation (str): Pydantic configuration. How to copy\n the input object passed to the class instance for model\n validation. Set to perform a deep copy.\n\n allow_inf_nan (bool): Pydantic configuration. Whether to allow\n infinity or NAN in float fields.\n\n run_directory (Optional[str]): None. If set, it should be a valid\n path. The `Task` will be run from this directory. This may be\n useful for some `Task`s which rely on searching the working\n directory.\n\n set_result (bool). False. If True, the model has information about\n setting the TaskResult object from the parameters it contains.\n E.g. it has an `output` parameter which is marked as the result.\n The result can be set with a field value of `is_result=True` on\n a specific parameter, or using `result_from_params` and a\n validator.\n\n result_from_params (Optional[str]): None. Optionally used to define\n results from information available in the model using a custom\n validator. E.g. use a `outdir` and `filename` field to set\n `result_from_params=f\"{outdir}/{filename}`, etc. Only used if\n `set_result==True`\n\n result_summary (Optional[str]): None. Defines a result summary that\n can be known after processing the Pydantic model. Use of summary\n depends on the Executor running the Task. All summaries are\n stored in the database, however. Only used if `set_result==True`\n\n impl_schemas (Optional[str]). Specifies a the schemas the\n output/results conform to. Only used if `set_result==True`.\n \"\"\"\n\n env_prefix = \"LUTE_\"\n underscore_attrs_are_private: bool = True\n copy_on_model_validation: str = \"deep\"\n allow_inf_nan: bool = False\n\n run_directory: Optional[str] = None\n \"\"\"Set the directory that the Task is run from.\"\"\"\n set_result: bool = False\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n result_from_params: Optional[str] = None\n \"\"\"Defines a result from the parameters. Use a validator to do so.\"\"\"\n result_summary: Optional[str] = None\n \"\"\"Format a TaskResult.summary from output.\"\"\"\n impl_schemas: Optional[str] = None\n \"\"\"Schema specification for output result. Will be passed to TaskResult.\"\"\"\n
"},{"location":"source/io/models/base/#io.models.base.TaskParameters.Config.impl_schemas","title":"impl_schemas: Optional[str] = None
class-attribute
instance-attribute
","text":"Schema specification for output result. Will be passed to TaskResult.
"},{"location":"source/io/models/base/#io.models.base.TaskParameters.Config.result_from_params","title":"result_from_params: Optional[str] = None
class-attribute
instance-attribute
","text":"Defines a result from the parameters. Use a validator to do so.
"},{"location":"source/io/models/base/#io.models.base.TaskParameters.Config.result_summary","title":"result_summary: Optional[str] = None
class-attribute
instance-attribute
","text":"Format a TaskResult.summary from output.
"},{"location":"source/io/models/base/#io.models.base.TaskParameters.Config.run_directory","title":"run_directory: Optional[str] = None
class-attribute
instance-attribute
","text":"Set the directory that the Task is run from.
"},{"location":"source/io/models/base/#io.models.base.TaskParameters.Config.set_result","title":"set_result: bool = False
class-attribute
instance-attribute
","text":"Whether the Executor should mark a specified parameter as a result.
"},{"location":"source/io/models/base/#io.models.base.TemplateConfig","title":"TemplateConfig
","text":" Bases: BaseModel
Parameters used for templating of third party configuration files.
Attributes:
Name Type Descriptiontemplate_name
str
The name of the template to use. This template must live in config/templates
.
output_path
str
The FULL path, including filename to write the rendered template to.
Source code inlute/io/models/base.py
class TemplateConfig(BaseModel):\n \"\"\"Parameters used for templating of third party configuration files.\n\n Attributes:\n template_name (str): The name of the template to use. This template must\n live in `config/templates`.\n\n output_path (str): The FULL path, including filename to write the\n rendered template to.\n \"\"\"\n\n template_name: str\n output_path: str\n
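As a usage sketch (mirroring the lute_template_cfg field of FindPeaksPsocakeParameters further below; the template and output names are placeholders):
from lute.io.models.base import TemplateConfig\n\n# Placeholder names; the template must live in config/templates\ntpl_cfg = TemplateConfig(\n    template_name=\"some_template.json\",\n    output_path=\"/path/to/rendered/some_config.json\",\n)\n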
"},{"location":"source/io/models/base/#io.models.base.TemplateParameters","title":"TemplateParameters
","text":"Class for representing parameters for third party configuration files.
These parameters can represent arbitrary data types and are used in conjunction with templates for modifying third party configuration files from the single LUTE YAML. Due to the storage of arbitrary data types, and the use of a template file, a single instance of this class can hold from a single template variable to an entire configuration file. The data parsing is done by jinja using the complementary template. All data is stored in the single model variable params.
The pydantic \"dataclass\" is used over the BaseModel/Settings to allow positional argument instantiation of the params
Field.
lute/io/models/base.py
@dataclass\nclass TemplateParameters:\n \"\"\"Class for representing parameters for third party configuration files.\n\n These parameters can represent arbitrary data types and are used in\n conjunction with templates for modifying third party configuration files\n from the single LUTE YAML. Due to the storage of arbitrary data types, and\n the use of a template file, a single instance of this class can hold from a\n single template variable to an entire configuration file. The data parsing\n is done by jinja using the complementary template.\n All data is stored in the single model variable `params.`\n\n The pydantic \"dataclass\" is used over the BaseModel/Settings to allow\n positional argument instantiation of the `params` Field.\n \"\"\"\n\n params: Any\n
"},{"location":"source/io/models/base/#io.models.base.ThirdPartyParameters","title":"ThirdPartyParameters
","text":" Bases: TaskParameters
Base class for third party task parameters.
Contains special validators for extra arguments and handling of parameters used for filling in third party configuration files.
Source code inlute/io/models/base.py
class ThirdPartyParameters(TaskParameters):\n \"\"\"Base class for third party task parameters.\n\n Contains special validators for extra arguments and handling of parameters\n used for filling in third party configuration files.\n \"\"\"\n\n class Config(TaskParameters.Config):\n \"\"\"Configuration for parameters model.\n\n The Config class holds Pydantic configuration and inherited configuration\n from the base `TaskParameters.Config` class. A number of values are also\n overridden, and there are some specific configuration options to\n ThirdPartyParameters. A full list of options (with TaskParameters options\n repeated) is described below.\n\n Attributes:\n env_prefix (str): Pydantic configuration. Will set parameters from\n environment variables containing this prefix. E.g. a model\n parameter `input` can be set with an environment variable:\n `{env_prefix}input`, in LUTE's case `LUTE_input`.\n\n underscore_attrs_are_private (bool): Pydantic configuration. Whether\n to hide attributes (parameters) prefixed with an underscore.\n\n copy_on_model_validation (str): Pydantic configuration. How to copy\n the input object passed to the class instance for model\n validation. Set to perform a deep copy.\n\n allow_inf_nan (bool): Pydantic configuration. Whether to allow\n infinity or NAN in float fields.\n\n run_directory (Optional[str]): None. If set, it should be a valid\n path. The `Task` will be run from this directory. This may be\n useful for some `Task`s which rely on searching the working\n directory.\n\n set_result (bool). True. If True, the model has information about\n setting the TaskResult object from the parameters it contains.\n E.g. it has an `output` parameter which is marked as the result.\n The result can be set with a field value of `is_result=True` on\n a specific parameter, or using `result_from_params` and a\n validator.\n\n result_from_params (Optional[str]): None. Optionally used to define\n results from information available in the model using a custom\n validator. E.g. use a `outdir` and `filename` field to set\n `result_from_params=f\"{outdir}/{filename}`, etc.\n\n result_summary (Optional[str]): None. Defines a result summary that\n can be known after processing the Pydantic model. Use of summary\n depends on the Executor running the Task. All summaries are\n stored in the database, however.\n\n impl_schemas (Optional[str]). Specifies a the schemas the\n output/results conform to. Only used if set_result is True.\n\n -----------------------\n ThirdPartyTask-specific:\n\n extra (str): \"allow\". Pydantic configuration. Allow (or ignore) extra\n arguments.\n\n short_flags_use_eq (bool): False. If True, \"short\" command-line args\n are passed as `-x=arg`. ThirdPartyTask-specific.\n\n long_flags_use_eq (bool): False. If True, \"long\" command-line args\n are passed as `--long=arg`. ThirdPartyTask-specific.\n \"\"\"\n\n extra: str = \"allow\"\n short_flags_use_eq: bool = False\n \"\"\"Whether short command-line arguments are passed like `-x=arg`.\"\"\"\n long_flags_use_eq: bool = False\n \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n # lute_template_cfg: TemplateConfig\n\n @root_validator(pre=False)\n def extra_fields_to_thirdparty(cls, values: Dict[str, Any]):\n for key in values:\n if key not in cls.__fields__:\n values[key] = TemplateParameters(values[key])\n\n return values\n
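As an orientation sketch (not an actual LUTE Task), a third-party parameter model subclasses ThirdPartyParameters and uses Field attributes such as flag_type and is_result, mirroring the patterns in the models below; the executable and flags here are hypothetical:
from pydantic import Field\n\nfrom lute.io.models.base import ThirdPartyParameters\n\n\nclass MyHypotheticalThirdPartyParameters(ThirdPartyParameters):\n    \"\"\"Illustrative third-party parameter model; not an actual LUTE Task.\"\"\"\n\n    executable: str = Field(\"/path/to/some_binary\", description=\"Binary to run.\", flag_type=\"\")\n    n: int = Field(4, description=\"Number of threads.\", flag_type=\"-\")\n    out_file: str = Field(\"\", description=\"Output file.\", flag_type=\"--\", is_result=True)\n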
"},{"location":"source/io/models/base/#io.models.base.ThirdPartyParameters.Config","title":"Config
","text":" Bases: Config
Configuration for parameters model.
The Config class holds Pydantic configuration and inherited configuration from the base TaskParameters.Config
class. A number of values are also overridden, and there are some specific configuration options to ThirdPartyParameters. A full list of options (with TaskParameters options repeated) is described below.
Attributes:
Name Type Descriptionenv_prefix
str
Pydantic configuration. Will set parameters from environment variables containing this prefix. E.g. a model parameter input
can be set with an environment variable: {env_prefix}input
, in LUTE's case LUTE_input
.
underscore_attrs_are_private
bool
Pydantic configuration. Whether to hide attributes (parameters) prefixed with an underscore.
copy_on_model_validation
str
Pydantic configuration. How to copy the input object passed to the class instance for model validation. Set to perform a deep copy.
allow_inf_nan
bool
Pydantic configuration. Whether to allow infinity or NAN in float fields.
run_directory
Optional[str]
None. If set, it should be a valid path. The Task
will be run from this directory. This may be useful for some Task
s which rely on searching the working directory.
result_from_params
Optional[str]
None. Optionally used to define results from information available in the model using a custom validator. E.g. use a outdir
and filename
field to set result_from_params=f\"{outdir}/{filename}
, etc.
result_summary
Optional[str]
None. Defines a result summary that can be known after processing the Pydantic model. Use of summary depends on the Executor running the Task. All summaries are stored in the database, however.
ThirdPartyTask-specific
Optional[str]
extra
str
\"allow\". Pydantic configuration. Allow (or ignore) extra arguments.
short_flags_use_eq
bool
False. If True, \"short\" command-line args are passed as -x=arg
. ThirdPartyTask-specific.
long_flags_use_eq
bool
False. If True, \"long\" command-line args are passed as --long=arg
. ThirdPartyTask-specific.
lute/io/models/base.py
class Config(TaskParameters.Config):\n \"\"\"Configuration for parameters model.\n\n The Config class holds Pydantic configuration and inherited configuration\n from the base `TaskParameters.Config` class. A number of values are also\n overridden, and there are some specific configuration options to\n ThirdPartyParameters. A full list of options (with TaskParameters options\n repeated) is described below.\n\n Attributes:\n env_prefix (str): Pydantic configuration. Will set parameters from\n environment variables containing this prefix. E.g. a model\n parameter `input` can be set with an environment variable:\n `{env_prefix}input`, in LUTE's case `LUTE_input`.\n\n underscore_attrs_are_private (bool): Pydantic configuration. Whether\n to hide attributes (parameters) prefixed with an underscore.\n\n copy_on_model_validation (str): Pydantic configuration. How to copy\n the input object passed to the class instance for model\n validation. Set to perform a deep copy.\n\n allow_inf_nan (bool): Pydantic configuration. Whether to allow\n infinity or NAN in float fields.\n\n run_directory (Optional[str]): None. If set, it should be a valid\n path. The `Task` will be run from this directory. This may be\n useful for some `Task`s which rely on searching the working\n directory.\n\n set_result (bool). True. If True, the model has information about\n setting the TaskResult object from the parameters it contains.\n E.g. it has an `output` parameter which is marked as the result.\n The result can be set with a field value of `is_result=True` on\n a specific parameter, or using `result_from_params` and a\n validator.\n\n result_from_params (Optional[str]): None. Optionally used to define\n results from information available in the model using a custom\n validator. E.g. use a `outdir` and `filename` field to set\n `result_from_params=f\"{outdir}/{filename}`, etc.\n\n result_summary (Optional[str]): None. Defines a result summary that\n can be known after processing the Pydantic model. Use of summary\n depends on the Executor running the Task. All summaries are\n stored in the database, however.\n\n impl_schemas (Optional[str]). Specifies a the schemas the\n output/results conform to. Only used if set_result is True.\n\n -----------------------\n ThirdPartyTask-specific:\n\n extra (str): \"allow\". Pydantic configuration. Allow (or ignore) extra\n arguments.\n\n short_flags_use_eq (bool): False. If True, \"short\" command-line args\n are passed as `-x=arg`. ThirdPartyTask-specific.\n\n long_flags_use_eq (bool): False. If True, \"long\" command-line args\n are passed as `--long=arg`. ThirdPartyTask-specific.\n \"\"\"\n\n extra: str = \"allow\"\n short_flags_use_eq: bool = False\n \"\"\"Whether short command-line arguments are passed like `-x=arg`.\"\"\"\n long_flags_use_eq: bool = False\n \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n
"},{"location":"source/io/models/base/#io.models.base.ThirdPartyParameters.Config.long_flags_use_eq","title":"long_flags_use_eq: bool = False
class-attribute
instance-attribute
","text":"Whether long command-line arguments are passed like --long=arg
.
set_result: bool = True
class-attribute
instance-attribute
","text":"Whether the Executor should mark a specified parameter as a result.
"},{"location":"source/io/models/base/#io.models.base.ThirdPartyParameters.Config.short_flags_use_eq","title":"short_flags_use_eq: bool = False
class-attribute
instance-attribute
","text":"Whether short command-line arguments are passed like -x=arg
.
FindPeaksPsocakeParameters
","text":" Bases: ThirdPartyParameters
Parameters for crystallographic (Bragg) peak finding using Psocake.
This peak finding Task optionally has the ability to compress/decompress data with SZ for the purpose of compression validation. NOTE: This Task is deprecated and provided for compatibility only.
Source code inlute/io/models/sfx_find_peaks.py
class FindPeaksPsocakeParameters(ThirdPartyParameters):\n \"\"\"Parameters for crystallographic (Bragg) peak finding using Psocake.\n\n This peak finding Task optionally has the ability to compress/decompress\n data with SZ for the purpose of compression validation.\n NOTE: This Task is deprecated and provided for compatibility only.\n \"\"\"\n\n class Config(TaskParameters.Config):\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n result_from_params: str = \"\"\n \"\"\"Defines a result from the parameters. Use a validator to do so.\"\"\"\n\n class SZParameters(BaseModel):\n compressor: Literal[\"qoz\", \"sz3\"] = Field(\n \"qoz\", description=\"SZ compression algorithm (qoz, sz3)\"\n )\n binSize: int = Field(2, description=\"SZ compression's bin size paramater\")\n roiWindowSize: int = Field(\n 2, description=\"SZ compression's ROI window size paramater\"\n )\n absError: float = Field(10, descriptionp=\"Maximum absolute error value\")\n\n executable: str = Field(\"mpirun\", description=\"MPI executable.\", flag_type=\"\")\n np: PositiveInt = Field(\n max(int(os.environ.get(\"SLURM_NPROCS\", len(os.sched_getaffinity(0)))) - 1, 1),\n description=\"Number of processes\",\n flag_type=\"-\",\n )\n mca: str = Field(\n \"btl ^openib\", description=\"Mca option for the MPI executable\", flag_type=\"--\"\n )\n p_arg1: str = Field(\n \"python\", description=\"Executable to run with mpi (i.e. python).\", flag_type=\"\"\n )\n u: str = Field(\n \"\", description=\"Python option for unbuffered output.\", flag_type=\"-\"\n )\n p_arg2: str = Field(\n \"findPeaksSZ.py\",\n description=\"Executable to run with mpi (i.e. python).\",\n flag_type=\"\",\n )\n d: str = Field(description=\"Detector name\", flag_type=\"-\")\n e: str = Field(\"\", description=\"Experiment name\", flag_type=\"-\")\n r: int = Field(-1, description=\"Run number\", flag_type=\"-\")\n outDir: str = Field(\n description=\"Output directory where .cxi will be saved\", flag_type=\"--\"\n )\n algorithm: int = Field(1, description=\"PyAlgos algorithm to use\", flag_type=\"--\")\n alg_npix_min: float = Field(\n 1.0, description=\"PyAlgos algorithm's npix_min parameter\", flag_type=\"--\"\n )\n alg_npix_max: float = Field(\n 45.0, description=\"PyAlgos algorithm's npix_max parameter\", flag_type=\"--\"\n )\n alg_amax_thr: float = Field(\n 250.0, description=\"PyAlgos algorithm's amax_thr parameter\", flag_type=\"--\"\n )\n alg_atot_thr: float = Field(\n 330.0, description=\"PyAlgos algorithm's atot_thr parameter\", flag_type=\"--\"\n )\n alg_son_min: float = Field(\n 10.0, description=\"PyAlgos algorithm's son_min parameter\", flag_type=\"--\"\n )\n alg1_thr_low: float = Field(\n 80.0, description=\"PyAlgos algorithm's thr_low parameter\", flag_type=\"--\"\n )\n alg1_thr_high: float = Field(\n 270.0, description=\"PyAlgos algorithm's thr_high parameter\", flag_type=\"--\"\n )\n alg1_rank: int = Field(\n 3, description=\"PyAlgos algorithm's rank parameter\", flag_type=\"--\"\n )\n alg1_radius: int = Field(\n 3, description=\"PyAlgos algorithm's radius parameter\", flag_type=\"--\"\n )\n alg1_dr: int = Field(\n 1, description=\"PyAlgos algorithm's dr parameter\", flag_type=\"--\"\n )\n psanaMask_on: str = Field(\n \"True\", description=\"Whether psana's mask should be used\", flag_type=\"--\"\n )\n psanaMask_calib: str = Field(\n \"True\", description=\"Psana mask's calib parameter\", flag_type=\"--\"\n )\n psanaMask_status: str = Field(\n \"True\", description=\"Psana mask's status 
parameter\", flag_type=\"--\"\n )\n psanaMask_edges: str = Field(\n \"True\", description=\"Psana mask's edges parameter\", flag_type=\"--\"\n )\n psanaMask_central: str = Field(\n \"True\", description=\"Psana mask's central parameter\", flag_type=\"--\"\n )\n psanaMask_unbond: str = Field(\n \"True\", description=\"Psana mask's unbond parameter\", flag_type=\"--\"\n )\n psanaMask_unbondnrs: str = Field(\n \"True\", description=\"Psana mask's unbondnbrs parameter\", flag_type=\"--\"\n )\n mask: str = Field(\n \"\", description=\"Path to an additional mask to apply\", flag_type=\"--\"\n )\n clen: str = Field(\n description=\"Epics variable storing the camera length\", flag_type=\"--\"\n )\n coffset: float = Field(0, description=\"Camera offset in m\", flag_type=\"--\")\n minPeaks: int = Field(\n 15,\n description=\"Minimum number of peaks to mark frame for indexing\",\n flag_type=\"--\",\n )\n maxPeaks: int = Field(\n 15,\n description=\"Maximum number of peaks to mark frame for indexing\",\n flag_type=\"--\",\n )\n minRes: int = Field(\n 0,\n description=\"Minimum peak resolution to mark frame for indexing \",\n flag_type=\"--\",\n )\n sample: str = Field(\"\", description=\"Sample name\", flag_type=\"--\")\n instrument: Union[None, str] = Field(\n None, description=\"Instrument name\", flag_type=\"--\"\n )\n pixelSize: float = Field(0.0, description=\"Pixel size\", flag_type=\"--\")\n auto: str = Field(\n \"False\",\n description=(\n \"Whether to automatically determine peak per event peak \"\n \"finding parameters\"\n ),\n flag_type=\"--\",\n )\n detectorDistance: float = Field(\n 0.0, description=\"Detector distance from interaction point in m\", flag_type=\"--\"\n )\n access: Literal[\"ana\", \"ffb\"] = Field(\n \"ana\", description=\"Data node type: {ana,ffb}\", flag_type=\"--\"\n )\n szfile: str = Field(\"qoz.json\", description=\"Path to SZ's JSON configuration file\")\n lute_template_cfg: TemplateConfig = Field(\n TemplateConfig(\n template_name=\"sz.json\",\n output_path=\"\", # Will want to change where this goes...\n ),\n description=\"Template information for the sz.json file\",\n )\n sz_parameters: SZParameters = Field(\n description=\"Configuration parameters for SZ Compression\", flag_type=\"\"\n )\n\n @validator(\"e\", always=True)\n def validate_e(cls, e: str, values: Dict[str, Any]) -> str:\n if e == \"\":\n return values[\"lute_config\"].experiment\n return e\n\n @validator(\"r\", always=True)\n def validate_r(cls, r: int, values: Dict[str, Any]) -> int:\n if r == -1:\n return values[\"lute_config\"].run\n return r\n\n @validator(\"lute_template_cfg\", always=True)\n def set_output_path(\n cls, lute_template_cfg: TemplateConfig, values: Dict[str, Any]\n ) -> TemplateConfig:\n if lute_template_cfg.output_path == \"\":\n lute_template_cfg.output_path = values[\"szfile\"]\n return lute_template_cfg\n\n @validator(\"sz_parameters\", always=True)\n def set_sz_compression_parameters(\n cls, sz_parameters: SZParameters, values: Dict[str, Any]\n ) -> None:\n values[\"compressor\"] = sz_parameters.compressor\n values[\"binSize\"] = sz_parameters.binSize\n values[\"roiWindowSize\"] = sz_parameters.roiWindowSize\n if sz_parameters.compressor == \"qoz\":\n values[\"pressio_opts\"] = {\n \"pressio:abs\": sz_parameters.absError,\n \"qoz\": {\"qoz:stride\": 8},\n }\n else:\n values[\"pressio_opts\"] = {\"pressio:abs\": sz_parameters.absError}\n return None\n\n @root_validator(pre=False)\n def define_result(cls, values: Dict[str, Any]) -> Dict[str, Any]:\n exp: str = 
values[\"lute_config\"].experiment\n run: int = int(values[\"lute_config\"].run)\n directory: str = values[\"outDir\"]\n fname: str = f\"{exp}_{run:04d}.lst\"\n\n cls.Config.result_from_params = f\"{directory}/{fname}\"\n return values\n
"},{"location":"source/io/models/sfx_find_peaks/#io.models.sfx_find_peaks.FindPeaksPsocakeParameters.Config","title":"Config
","text":" Bases: Config
lute/io/models/sfx_find_peaks.py
class Config(TaskParameters.Config):\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n result_from_params: str = \"\"\n \"\"\"Defines a result from the parameters. Use a validator to do so.\"\"\"\n
"},{"location":"source/io/models/sfx_find_peaks/#io.models.sfx_find_peaks.FindPeaksPsocakeParameters.Config.result_from_params","title":"result_from_params: str = ''
class-attribute
instance-attribute
","text":"Defines a result from the parameters. Use a validator to do so.
"},{"location":"source/io/models/sfx_find_peaks/#io.models.sfx_find_peaks.FindPeaksPsocakeParameters.Config.set_result","title":"set_result: bool = True
class-attribute
instance-attribute
","text":"Whether the Executor should mark a specified parameter as a result.
"},{"location":"source/io/models/sfx_find_peaks/#io.models.sfx_find_peaks.FindPeaksPyAlgosParameters","title":"FindPeaksPyAlgosParameters
","text":" Bases: TaskParameters
Parameters for crystallographic (Bragg) peak finding using PyAlgos.
This peak finding Task optionally has the ability to compress/decompress data with SZ for the purpose of compression validation.
Source code inlute/io/models/sfx_find_peaks.py
class FindPeaksPyAlgosParameters(TaskParameters):\n \"\"\"Parameters for crystallographic (Bragg) peak finding using PyAlgos.\n\n This peak finding Task optionally has the ability to compress/decompress\n data with SZ for the purpose of compression validation.\n \"\"\"\n\n class Config(TaskParameters.Config):\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n class SZCompressorParameters(BaseModel):\n compressor: Literal[\"qoz\", \"sz3\"] = Field(\n \"qoz\", description='Compression algorithm (\"qoz\" or \"sz3\")'\n )\n abs_error: float = Field(10.0, description=\"Absolute error bound\")\n bin_size: int = Field(2, description=\"Bin size\")\n roi_window_size: int = Field(\n 9,\n description=\"Default window size\",\n )\n\n outdir: str = Field(\n description=\"Output directory for cxi files\",\n )\n n_events: int = Field(\n 0,\n description=\"Number of events to process (0 to process all events)\",\n )\n det_name: str = Field(\n description=\"Psana name of the detector storing the image data\",\n )\n event_receiver: Literal[\"evr0\", \"evr1\"] = Field(\n description=\"Event Receiver to be used: evr0 or evr1\",\n )\n tag: str = Field(\n \"\",\n description=\"Tag to add to the output file names\",\n )\n pv_camera_length: Union[str, float] = Field(\n \"\",\n description=\"PV associated with camera length \"\n \"(if a number, camera length directly)\",\n )\n event_logic: bool = Field(\n False,\n description=\"True if only events with a specific event code should be \"\n \"processed. False if the event code should be ignored\",\n )\n event_code: int = Field(\n 0,\n description=\"Required events code for events to be processed if event logic \"\n \"is True\",\n )\n psana_mask: bool = Field(\n False,\n description=\"If True, apply mask from psana Detector object\",\n )\n mask_file: Union[str, None] = Field(\n None,\n description=\"File with a custom mask to apply. 
If None, no custom mask is \"\n \"applied\",\n )\n min_peaks: int = Field(2, description=\"Minimum number of peaks per image\")\n max_peaks: int = Field(\n 2048,\n description=\"Maximum number of peaks per image\",\n )\n npix_min: int = Field(\n 2,\n description=\"Minimum number of pixels per peak\",\n )\n npix_max: int = Field(\n 30,\n description=\"Maximum number of pixels per peak\",\n )\n amax_thr: float = Field(\n 80.0,\n description=\"Minimum intensity threshold for starting a peak\",\n )\n atot_thr: float = Field(\n 120.0,\n description=\"Minimum summed intensity threshold for pixel collection\",\n )\n son_min: float = Field(\n 7.0,\n description=\"Minimum signal-to-noise ratio to be considered a peak\",\n )\n peak_rank: int = Field(\n 3,\n description=\"Radius in which central peak pixel is a local maximum\",\n )\n r0: float = Field(\n 3.0,\n description=\"Radius of ring for background evaluation in pixels\",\n )\n dr: float = Field(\n 2.0,\n description=\"Width of ring for background evaluation in pixels\",\n )\n nsigm: float = Field(\n 7.0,\n description=\"Intensity threshold to include pixel in connected group\",\n )\n compression: Optional[SZCompressorParameters] = Field(\n None,\n description=\"Options for the SZ Compression Algorithm\",\n )\n out_file: str = Field(\n \"\",\n description=\"Path to output file.\",\n flag_type=\"-\",\n rename_param=\"o\",\n is_result=True,\n )\n\n @validator(\"out_file\", always=True)\n def validate_out_file(cls, out_file: str, values: Dict[str, Any]) -> str:\n if out_file == \"\":\n fname: Path = (\n Path(values[\"outdir\"])\n / f\"{values['lute_config'].experiment}_{values['lute_config'].run}_\"\n f\"{values['tag']}.list\"\n )\n return str(fname)\n return out_file\n
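Most of these parameter models follow the same convention: a string field that defaults to an empty string is filled in by a validator, typically from the experiment and run information in lute_config or from the output of an earlier managed Task. The following is a minimal, self-contained sketch of that pattern only (not the actual LUTE class), assuming pydantic v1 as used by the models above; the experiment, run, and directory values are hypothetical.

from pathlib import Path
from typing import Any, Dict

from pydantic import BaseModel, Field, validator


class PeakFindingSketch(BaseModel):
    """Illustration only: mirrors how out_file is derived when left empty."""

    experiment: str
    run: int
    outdir: str
    tag: str = ""
    out_file: str = Field("", description="Derived by a validator if left empty.")

    @validator("out_file", always=True)
    def derive_out_file(cls, out_file: str, values: Dict[str, Any]) -> str:
        # An explicit value always wins; only an empty string triggers the default.
        if out_file == "":
            fname = Path(values["outdir"]) / (
                f"{values['experiment']}_{values['run']}_{values['tag']}.list"
            )
            return str(fname)
        return out_file


# Hypothetical values for illustration:
params = PeakFindingSketch(experiment="mfxp1234", run=7, outdir="/tmp/peaks")
print(params.out_file)  # -> /tmp/peaks/mfxp1234_7_.list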
"},{"location":"source/io/models/sfx_find_peaks/#io.models.sfx_find_peaks.FindPeaksPyAlgosParameters.Config","title":"Config
","text":" Bases: Config
lute/io/models/sfx_find_peaks.py
class Config(TaskParameters.Config):\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n
"},{"location":"source/io/models/sfx_find_peaks/#io.models.sfx_find_peaks.FindPeaksPyAlgosParameters.Config.set_result","title":"set_result: bool = True
class-attribute
instance-attribute
","text":"Whether the Executor should mark a specified parameter as a result.
"},{"location":"source/io/models/sfx_index/","title":"sfx_index","text":"Models for serial femtosecond crystallography indexing.
Classes:
Name DescriptionIndexCrystFELParameters
Perform indexing of hits/peaks using CrystFEL's indexamajig
.
ConcatenateStreamFilesParameters
","text":" Bases: TaskParameters
Parameters for stream concatenation.
Concatenates the stream file output from CrystFEL indexing for multiple experimental runs.
Source code inlute/io/models/sfx_index.py
class ConcatenateStreamFilesParameters(TaskParameters):\n    \"\"\"Parameters for stream concatenation.\n\n    Concatenates the stream file output from CrystFEL indexing for multiple\n    experimental runs.\n    \"\"\"\n\n    class Config(TaskParameters.Config):\n        set_result: bool = True\n        \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n    in_file: str = Field(\n        \"\",\n        description=\"Root of directory tree storing stream files to merge.\",\n    )\n\n    tag: Optional[str] = Field(\n        \"\",\n        description=\"Tag identifying the stream files to merge.\",\n    )\n\n    out_file: str = Field(\n        \"\", description=\"Path to merged output stream file.\", is_result=True\n    )\n\n    @validator(\"in_file\", always=True)\n    def validate_in_file(cls, in_file: str, values: Dict[str, Any]) -> str:\n        if in_file == \"\":\n            stream_file: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\", \"IndexCrystFEL\", \"out_file\"\n            )\n            if stream_file:\n                stream_dir: str = str(Path(stream_file).parent)\n                return stream_dir\n        return in_file\n\n    @validator(\"tag\", always=True)\n    def validate_tag(cls, tag: str, values: Dict[str, Any]) -> str:\n        if tag == \"\":\n            stream_file: Optional[str] = read_latest_db_entry(\n                f\"{values['lute_config'].work_dir}\", \"IndexCrystFEL\", \"out_file\"\n            )\n            if stream_file:\n                stream_tag: str = Path(stream_file).name.split(\"_\")[0]\n                return stream_tag\n        return tag\n\n    @validator(\"out_file\", always=True)\n    def validate_out_file(cls, tag: str, values: Dict[str, Any]) -> str:\n        if tag == \"\":\n            stream_out_file: str = str(\n                Path(values[\"in_file\"]).parent / f\"{values['tag']}.stream\"\n            )\n            return stream_out_file\n        return tag\n
"},{"location":"source/io/models/sfx_index/#io.models.sfx_index.ConcatenateStreamFilesParameters.Config","title":"Config
","text":" Bases: Config
lute/io/models/sfx_index.py
class Config(TaskParameters.Config):\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n
"},{"location":"source/io/models/sfx_index/#io.models.sfx_index.ConcatenateStreamFilesParameters.Config.set_result","title":"set_result: bool = True
class-attribute
instance-attribute
","text":"Whether the Executor should mark a specified parameter as a result.
"},{"location":"source/io/models/sfx_index/#io.models.sfx_index.IndexCrystFELParameters","title":"IndexCrystFELParameters
","text":" Bases: ThirdPartyParameters
Parameters for CrystFEL's indexamajig
.
There are many parameters, and many combinations. For more information on usage, please refer to the CrystFEL documentation, here: https://www.desy.de/~twhite/crystfel/manual-indexamajig.html
Source code inlute/io/models/sfx_index.py
class IndexCrystFELParameters(ThirdPartyParameters):\n \"\"\"Parameters for CrystFEL's `indexamajig`.\n\n There are many parameters, and many combinations. For more information on\n usage, please refer to the CrystFEL documentation, here:\n https://www.desy.de/~twhite/crystfel/manual-indexamajig.html\n \"\"\"\n\n class Config(ThirdPartyParameters.Config):\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n long_flags_use_eq: bool = True\n \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n\n executable: str = Field(\n \"/sdf/group/lcls/ds/tools/crystfel/0.10.2/bin/indexamajig\",\n description=\"CrystFEL's indexing binary.\",\n flag_type=\"\",\n )\n # Basic options\n in_file: Optional[str] = Field(\n \"\", description=\"Path to input file.\", flag_type=\"-\", rename_param=\"i\"\n )\n out_file: str = Field(\n \"\",\n description=\"Path to output file.\",\n flag_type=\"-\",\n rename_param=\"o\",\n is_result=True,\n )\n geometry: str = Field(\n \"\", description=\"Path to geometry file.\", flag_type=\"-\", rename_param=\"g\"\n )\n zmq_input: Optional[str] = Field(\n description=\"ZMQ address to receive data over. `input` and `zmq-input` are mutually exclusive\",\n flag_type=\"--\",\n rename_param=\"zmq-input\",\n )\n zmq_subscribe: Optional[str] = Field( # Can be used multiple times...\n description=\"Subscribe to ZMQ message of type `tag`\",\n flag_type=\"--\",\n rename_param=\"zmq-subscribe\",\n )\n zmq_request: Optional[AnyUrl] = Field(\n description=\"Request new data over ZMQ by sending this value\",\n flag_type=\"--\",\n rename_param=\"zmq-request\",\n )\n asapo_endpoint: Optional[str] = Field(\n description=\"ASAP::O endpoint. zmq-input and this are mutually exclusive.\",\n flag_type=\"--\",\n rename_param=\"asapo-endpoint\",\n )\n asapo_token: Optional[str] = Field(\n description=\"ASAP::O authentication token.\",\n flag_type=\"--\",\n rename_param=\"asapo-token\",\n )\n asapo_beamtime: Optional[str] = Field(\n description=\"ASAP::O beatime.\",\n flag_type=\"--\",\n rename_param=\"asapo-beamtime\",\n )\n asapo_source: Optional[str] = Field(\n description=\"ASAP::O data source.\",\n flag_type=\"--\",\n rename_param=\"asapo-source\",\n )\n asapo_group: Optional[str] = Field(\n description=\"ASAP::O consumer group.\",\n flag_type=\"--\",\n rename_param=\"asapo-group\",\n )\n asapo_stream: Optional[str] = Field(\n description=\"ASAP::O stream.\",\n flag_type=\"--\",\n rename_param=\"asapo-stream\",\n )\n asapo_wait_for_stream: Optional[str] = Field(\n description=\"If ASAP::O stream does not exist, wait for it to appear.\",\n flag_type=\"--\",\n rename_param=\"asapo-wait-for-stream\",\n )\n data_format: Optional[str] = Field(\n description=\"Specify format for ZMQ or ASAP::O. `msgpack`, `hdf5` or `seedee`.\",\n flag_type=\"--\",\n rename_param=\"data-format\",\n )\n basename: bool = Field(\n False,\n description=\"Remove directory parts of filenames. Acts before prefix if prefix also given.\",\n flag_type=\"--\",\n )\n prefix: Optional[str] = Field(\n description=\"Add a prefix to the filenames from the infile argument.\",\n flag_type=\"--\",\n rename_param=\"asapo-stream\",\n )\n nthreads: PositiveInt = Field(\n max(int(os.environ.get(\"SLURM_NPROCS\", len(os.sched_getaffinity(0)))) - 1, 1),\n description=\"Number of threads to use. 
See also `max_indexer_threads`.\",\n flag_type=\"-\",\n rename_param=\"j\",\n )\n no_check_prefix: bool = Field(\n False,\n description=\"Don't attempt to correct the prefix if it seems incorrect.\",\n flag_type=\"--\",\n rename_param=\"no-check-prefix\",\n )\n highres: Optional[float] = Field(\n description=\"Mark all pixels greater than `x` has bad.\", flag_type=\"--\"\n )\n profile: bool = Field(\n False, description=\"Display timing data to monitor performance.\", flag_type=\"--\"\n )\n temp_dir: Optional[str] = Field(\n description=\"Specify a path for the temp files folder.\",\n flag_type=\"--\",\n rename_param=\"temp-dir\",\n )\n wait_for_file: conint(gt=-2) = Field(\n 0,\n description=\"Wait at most `x` seconds for a file to be created. A value of -1 means wait forever.\",\n flag_type=\"--\",\n rename_param=\"wait-for-file\",\n )\n no_image_data: bool = Field(\n False,\n description=\"Load only the metadata, no iamges. Can check indexability without high data requirements.\",\n flag_type=\"--\",\n rename_param=\"no-image-data\",\n )\n # Peak-finding options\n # ....\n # Indexing options\n indexing: Optional[str] = Field(\n description=\"Comma-separated list of supported indexing algorithms to use. Default is to automatically detect.\",\n flag_type=\"--\",\n )\n cell_file: Optional[str] = Field(\n description=\"Path to a file containing unit cell information (PDB or CrystFEL format).\",\n flag_type=\"-\",\n rename_param=\"p\",\n )\n tolerance: str = Field(\n \"5,5,5,1.5\",\n description=(\n \"Tolerances (in percent) for unit cell comparison. \"\n \"Comma-separated list a,b,c,angle. Default=5,5,5,1.5\"\n ),\n flag_type=\"--\",\n )\n no_check_cell: bool = Field(\n False,\n description=\"Do not check cell parameters against unit cell. Replaces '-raw' method.\",\n flag_type=\"--\",\n rename_param=\"no-check-cell\",\n )\n no_check_peaks: bool = Field(\n False,\n description=\"Do not verify peaks are accounted for by solution.\",\n flag_type=\"--\",\n rename_param=\"no-check-peaks\",\n )\n multi: bool = Field(\n False, description=\"Enable multi-lattice indexing.\", flag_type=\"--\"\n )\n wavelength_estimate: Optional[float] = Field(\n description=\"Estimate for X-ray wavelength. Required for some methods.\",\n flag_type=\"--\",\n rename_param=\"wavelength-estimate\",\n )\n camera_length_estimate: Optional[float] = Field(\n description=\"Estimate for camera distance. Required for some methods.\",\n flag_type=\"--\",\n rename_param=\"camera-length-estimate\",\n )\n max_indexer_threads: Optional[PositiveInt] = Field(\n # 1,\n description=\"Some indexing algos can use multiple threads. 
In addition to image-based.\",\n flag_type=\"--\",\n rename_param=\"max-indexer-threads\",\n )\n no_retry: bool = Field(\n False,\n description=\"Do not remove weak peaks and try again.\",\n flag_type=\"--\",\n rename_param=\"no-retry\",\n )\n no_refine: bool = Field(\n False,\n description=\"Skip refinement step.\",\n flag_type=\"--\",\n rename_param=\"no-refine\",\n )\n no_revalidate: bool = Field(\n False,\n description=\"Skip revalidation step.\",\n flag_type=\"--\",\n rename_param=\"no-revalidate\",\n )\n # TakeTwo specific parameters\n taketwo_member_threshold: Optional[PositiveInt] = Field(\n # 20,\n description=\"Minimum number of vectors to consider.\",\n flag_type=\"--\",\n rename_param=\"taketwo-member-threshold\",\n )\n taketwo_len_tolerance: Optional[PositiveFloat] = Field(\n # 0.001,\n description=\"TakeTwo length tolerance in Angstroms.\",\n flag_type=\"--\",\n rename_param=\"taketwo-len-tolerance\",\n )\n taketwo_angle_tolerance: Optional[PositiveFloat] = Field(\n # 0.6,\n description=\"TakeTwo angle tolerance in degrees.\",\n flag_type=\"--\",\n rename_param=\"taketwo-angle-tolerance\",\n )\n taketwo_trace_tolerance: Optional[PositiveFloat] = Field(\n # 3,\n description=\"Matrix trace tolerance in degrees.\",\n flag_type=\"--\",\n rename_param=\"taketwo-trace-tolerance\",\n )\n # Felix-specific parameters\n # felix_domega\n # felix-fraction-max-visits\n # felix-max-internal-angle\n # felix-max-uniqueness\n # felix-min-completeness\n # felix-min-visits\n # felix-num-voxels\n # felix-sigma\n # felix-tthrange-max\n # felix-tthrange-min\n # XGANDALF-specific parameters\n xgandalf_sampling_pitch: Optional[NonNegativeInt] = Field(\n # 6,\n description=\"Density of reciprocal space sampling.\",\n flag_type=\"--\",\n rename_param=\"xgandalf-sampling-pitch\",\n )\n xgandalf_grad_desc_iterations: Optional[NonNegativeInt] = Field(\n # 4,\n description=\"Number of gradient descent iterations.\",\n flag_type=\"--\",\n rename_param=\"xgandalf-grad-desc-iterations\",\n )\n xgandalf_tolerance: Optional[PositiveFloat] = Field(\n # 0.02,\n description=\"Relative tolerance of lattice vectors\",\n flag_type=\"--\",\n rename_param=\"xgandalf-tolerance\",\n )\n xgandalf_no_deviation_from_provided_cell: Optional[bool] = Field(\n description=\"Found unit cell must match provided.\",\n flag_type=\"--\",\n rename_param=\"xgandalf-no-deviation-from-provided-cell\",\n )\n xgandalf_min_lattice_vector_length: Optional[PositiveFloat] = Field(\n # 30,\n description=\"Minimum possible lattice length.\",\n flag_type=\"--\",\n rename_param=\"xgandalf-min-lattice-vector-length\",\n )\n xgandalf_max_lattice_vector_length: Optional[PositiveFloat] = Field(\n # 250,\n description=\"Minimum possible lattice length.\",\n flag_type=\"--\",\n rename_param=\"xgandalf-max-lattice-vector-length\",\n )\n xgandalf_max_peaks: Optional[PositiveInt] = Field(\n # 250,\n description=\"Maximum number of peaks to use for indexing.\",\n flag_type=\"--\",\n rename_param=\"xgandalf-max-peaks\",\n )\n xgandalf_fast_execution: bool = Field(\n False,\n description=\"Shortcut to set sampling-pitch=2, and grad-desc-iterations=3.\",\n flag_type=\"--\",\n rename_param=\"xgandalf-fast-execution\",\n )\n # pinkIndexer parameters\n # ...\n # asdf_fast: bool = Field(False, description=\"Enable fast mode for asdf. 
3x faster for 7% loss in accuracy.\", flag_type=\"--\", rename_param=\"asdf-fast\")\n # Integration parameters\n integration: str = Field(\n \"rings-nocen\", description=\"Method for integrating reflections.\", flag_type=\"--\"\n )\n fix_profile_radius: Optional[float] = Field(\n description=\"Fix the profile radius (m^{-1})\",\n flag_type=\"--\",\n rename_param=\"fix-profile-radius\",\n )\n fix_divergence: Optional[float] = Field(\n 0,\n description=\"Fix the divergence (rad, full angle).\",\n flag_type=\"--\",\n rename_param=\"fix-divergence\",\n )\n int_radius: str = Field(\n \"4,5,7\",\n description=\"Inner, middle, and outer radii for 3-ring integration.\",\n flag_type=\"--\",\n rename_param=\"int-radius\",\n )\n int_diag: str = Field(\n \"none\",\n description=\"Show detailed information on integration when condition is met.\",\n flag_type=\"--\",\n rename_param=\"int-diag\",\n )\n push_res: str = Field(\n \"infinity\",\n description=\"Integrate `x` higher than apparent resolution limit (nm-1).\",\n flag_type=\"--\",\n rename_param=\"push-res\",\n )\n overpredict: bool = Field(\n False,\n description=\"Over-predict reflections. Maybe useful with post-refinement.\",\n flag_type=\"--\",\n )\n cell_parameters_only: bool = Field(\n False, description=\"Do not predict refletions at all\", flag_type=\"--\"\n )\n # Output parameters\n no_non_hits_in_stream: bool = Field(\n False,\n description=\"Exclude non-hits from the stream file.\",\n flag_type=\"--\",\n rename_param=\"no-non-hits-in-stream\",\n )\n copy_hheader: Optional[str] = Field(\n description=\"Copy information from header in the image to output stream.\",\n flag_type=\"--\",\n rename_param=\"copy-hheader\",\n )\n no_peaks_in_stream: bool = Field(\n False,\n description=\"Do not record peaks in stream file.\",\n flag_type=\"--\",\n rename_param=\"no-peaks-in-stream\",\n )\n no_refls_in_stream: bool = Field(\n False,\n description=\"Do not record reflections in stream.\",\n flag_type=\"--\",\n rename_param=\"no-refls-in-stream\",\n )\n serial_offset: Optional[PositiveInt] = Field(\n description=\"Start numbering at `x` instead of 1.\",\n flag_type=\"--\",\n rename_param=\"serial-offset\",\n )\n harvest_file: Optional[str] = Field(\n description=\"Write parameters to file in JSON format.\",\n flag_type=\"--\",\n rename_param=\"harvest-file\",\n )\n\n @validator(\"in_file\", always=True)\n def validate_in_file(cls, in_file: str, values: Dict[str, Any]) -> str:\n if in_file == \"\":\n filename: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\", \"FindPeaksPyAlgos\", \"out_file\"\n )\n if filename is None:\n exp: str = values[\"lute_config\"].experiment\n run: int = int(values[\"lute_config\"].run)\n tag: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\", \"FindPeaksPsocake\", \"tag\"\n )\n out_dir: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\", \"FindPeaksPsocake\", \"outDir\"\n )\n if out_dir is not None:\n fname: str = f\"{out_dir}/{exp}_{run:04d}\"\n if tag is not None:\n fname = f\"{fname}_{tag}\"\n return f\"{fname}.lst\"\n else:\n return filename\n return in_file\n\n @validator(\"out_file\", always=True)\n def validate_out_file(cls, out_file: str, values: Dict[str, Any]) -> str:\n if out_file == \"\":\n expmt: str = values[\"lute_config\"].experiment\n run: int = int(values[\"lute_config\"].run)\n work_dir: str = values[\"lute_config\"].work_dir\n fname: str = f\"{expmt}_r{run:04d}.stream\"\n return f\"{work_dir}/{fname}\"\n return out_file\n
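Each field above carries metadata (flag_type, rename_param, and the Config option long_flags_use_eq) that the Executor uses to assemble the indexamajig command line. The sketch below is not LUTE's implementation; it only illustrates how such metadata could be translated into arguments, and the parameter values shown are hypothetical.

from typing import Any, Dict, List


def to_cli_args(
    executable: str,
    params: Dict[str, Any],
    meta: Dict[str, Dict[str, str]],
    long_flags_use_eq: bool = True,
) -> List[str]:
    """Illustrative translation of field metadata into command-line arguments."""
    args: List[str] = [executable]
    for name, value in params.items():
        info = meta.get(name, {})
        flag_type = info.get("flag_type", "--")
        flag_name = info.get("rename_param", name)
        if flag_type == "":  # bare/positional argument
            args.append(str(value))
        elif isinstance(value, bool):  # switches are emitted only when True
            if value:
                args.append(f"{flag_type}{flag_name}")
        elif flag_type == "--" and long_flags_use_eq:  # --long=arg style
            args.append(f"--{flag_name}={value}")
        else:  # short flags: -i value
            args.extend([f"{flag_type}{flag_name}", str(value)])
    return args


# Hypothetical parameter values:
cmd = to_cli_args(
    "indexamajig",
    {
        "in_file": "peaks.lst",
        "out_file": "run7.stream",
        "geometry": "det.geom",
        "indexing": "mosflm,xgandalf",
        "multi": True,
    },
    {
        "in_file": {"flag_type": "-", "rename_param": "i"},
        "out_file": {"flag_type": "-", "rename_param": "o"},
        "geometry": {"flag_type": "-", "rename_param": "g"},
    },
)
print(" ".join(cmd))
# indexamajig -i peaks.lst -o run7.stream -g det.geom --indexing=mosflm,xgandalf --multi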
"},{"location":"source/io/models/sfx_index/#io.models.sfx_index.IndexCrystFELParameters.Config","title":"Config
","text":" Bases: Config
lute/io/models/sfx_index.py
class Config(ThirdPartyParameters.Config):\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n long_flags_use_eq: bool = True\n \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n
"},{"location":"source/io/models/sfx_index/#io.models.sfx_index.IndexCrystFELParameters.Config.long_flags_use_eq","title":"long_flags_use_eq: bool = True
class-attribute
instance-attribute
","text":"Whether long command-line arguments are passed like --long=arg
.
set_result: bool = True
class-attribute
instance-attribute
","text":"Whether the Executor should mark a specified parameter as a result.
"},{"location":"source/io/models/sfx_merge/","title":"sfx_merge","text":"Models for merging reflections in serial femtosecond crystallography.
Classes:
Name DescriptionMergePartialatorParameters
Perform merging using CrystFEL's partialator
.
CompareHKLParameters
Calculate figures of merit using CrystFEL's compare_hkl
.
ManipulateHKLParameters
Perform transformations on lists of reflections using CrystFEL's get_hkl
.
CompareHKLParameters
","text":" Bases: ThirdPartyParameters
Parameters for CrystFEL's compare_hkl
for calculating figures of merit.
There are many parameters, and many combinations. For more information on usage, please refer to the CrystFEL documentation, here: https://www.desy.de/~twhite/crystfel/manual-partialator.html
Source code inlute/io/models/sfx_merge.py
class CompareHKLParameters(ThirdPartyParameters):\n \"\"\"Parameters for CrystFEL's `compare_hkl` for calculating figures of merit.\n\n There are many parameters, and many combinations. For more information on\n usage, please refer to the CrystFEL documentation, here:\n https://www.desy.de/~twhite/crystfel/manual-partialator.html\n \"\"\"\n\n class Config(ThirdPartyParameters.Config):\n long_flags_use_eq: bool = True\n \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n executable: str = Field(\n \"/sdf/group/lcls/ds/tools/crystfel/0.10.2/bin/compare_hkl\",\n description=\"CrystFEL's reflection comparison binary.\",\n flag_type=\"\",\n )\n in_files: Optional[str] = Field(\n \"\",\n description=\"Path to input HKLs. Space-separated list of 2. Use output of partialator e.g.\",\n flag_type=\"\",\n )\n ## Need mechanism to set is_result=True ...\n symmetry: str = Field(\"\", description=\"Point group symmetry.\", flag_type=\"--\")\n cell_file: str = Field(\n \"\",\n description=\"Path to a file containing unit cell information (PDB or CrystFEL format).\",\n flag_type=\"-\",\n rename_param=\"p\",\n )\n fom: str = Field(\n \"Rsplit\", description=\"Specify figure of merit to calculate.\", flag_type=\"--\"\n )\n nshells: int = Field(10, description=\"Use n resolution shells.\", flag_type=\"--\")\n # NEED A NEW CASE FOR THIS -> Boolean flag, no arg, one hyphen...\n # fix_unity: bool = Field(\n # False,\n # description=\"Fix scale factors to unity.\",\n # flag_type=\"-\",\n # rename_param=\"u\",\n # )\n shell_file: str = Field(\n \"\",\n description=\"Write the statistics in resolution shells to a file.\",\n flag_type=\"--\",\n rename_param=\"shell-file\",\n is_result=True,\n )\n ignore_negs: bool = Field(\n False,\n description=\"Ignore reflections with negative reflections.\",\n flag_type=\"--\",\n rename_param=\"ignore-negs\",\n )\n zero_negs: bool = Field(\n False,\n description=\"Set negative intensities to 0.\",\n flag_type=\"--\",\n rename_param=\"zero-negs\",\n )\n sigma_cutoff: Optional[Union[float, int, str]] = Field(\n # \"-infinity\",\n description=\"Discard reflections with I/sigma(I) < n. -infinity means no cutoff.\",\n flag_type=\"--\",\n rename_param=\"sigma-cutoff\",\n )\n rmin: Optional[float] = Field(\n description=\"Low resolution cutoff of 1/d (m-1). Use this or --lowres NOT both.\",\n flag_type=\"--\",\n )\n lowres: Optional[float] = Field(\n descirption=\"Low resolution cutoff in Angstroms. Use this or --rmin NOT both.\",\n flag_type=\"--\",\n )\n rmax: Optional[float] = Field(\n description=\"High resolution cutoff in 1/d (m-1). Use this or --highres NOT both.\",\n flag_type=\"--\",\n )\n highres: Optional[float] = Field(\n description=\"High resolution cutoff in Angstroms. 
Use this or --rmax NOT both.\",\n flag_type=\"--\",\n )\n\n @validator(\"in_files\", always=True)\n def validate_in_files(cls, in_files: str, values: Dict[str, Any]) -> str:\n if in_files == \"\":\n partialator_file: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\", \"MergePartialator\", \"out_file\"\n )\n if partialator_file:\n hkls: str = f\"{partialator_file}1 {partialator_file}2\"\n return hkls\n return in_files\n\n @validator(\"cell_file\", always=True)\n def validate_cell_file(cls, cell_file: str, values: Dict[str, Any]) -> str:\n if cell_file == \"\":\n idx_cell_file: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\",\n \"IndexCrystFEL\",\n \"cell_file\",\n valid_only=False,\n )\n if idx_cell_file:\n return idx_cell_file\n return cell_file\n\n @validator(\"symmetry\", always=True)\n def validate_symmetry(cls, symmetry: str, values: Dict[str, Any]) -> str:\n if symmetry == \"\":\n partialator_sym: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\", \"MergePartialator\", \"symmetry\"\n )\n if partialator_sym:\n return partialator_sym\n return symmetry\n\n @validator(\"shell_file\", always=True)\n def validate_shell_file(cls, shell_file: str, values: Dict[str, Any]) -> str:\n if shell_file == \"\":\n partialator_file: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\", \"MergePartialator\", \"out_file\"\n )\n if partialator_file:\n shells_out: str = partialator_file.split(\".\")[0]\n shells_out = f\"{shells_out}_{values['fom']}_n{values['nshells']}.dat\"\n return shells_out\n return shell_file\n
"},{"location":"source/io/models/sfx_merge/#io.models.sfx_merge.CompareHKLParameters.Config","title":"Config
","text":" Bases: Config
lute/io/models/sfx_merge.py
class Config(ThirdPartyParameters.Config):\n long_flags_use_eq: bool = True\n \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n
"},{"location":"source/io/models/sfx_merge/#io.models.sfx_merge.CompareHKLParameters.Config.long_flags_use_eq","title":"long_flags_use_eq: bool = True
class-attribute
instance-attribute
","text":"Whether long command-line arguments are passed like --long=arg
.
set_result: bool = True
class-attribute
instance-attribute
","text":"Whether the Executor should mark a specified parameter as a result.
"},{"location":"source/io/models/sfx_merge/#io.models.sfx_merge.ManipulateHKLParameters","title":"ManipulateHKLParameters
","text":" Bases: ThirdPartyParameters
Parameters for CrystFEL's get_hkl
for manipulating lists of reflections.
This Task is predominantly used internally to convert hkl
to mtz
files. Note that performing multiple manipulations is undefined behaviour. Run the Task with multiple configurations in explicit separate steps. For more information on usage, please refer to the CrystFEL documentation, here: https://www.desy.de/~twhite/crystfel/manual-partialator.html
lute/io/models/sfx_merge.py
class ManipulateHKLParameters(ThirdPartyParameters):\n \"\"\"Parameters for CrystFEL's `get_hkl` for manipulating lists of reflections.\n\n This Task is predominantly used internally to convert `hkl` to `mtz` files.\n Note that performing multiple manipulations is undefined behaviour. Run\n the Task with multiple configurations in explicit separate steps. For more\n information on usage, please refer to the CrystFEL documentation, here:\n https://www.desy.de/~twhite/crystfel/manual-partialator.html\n \"\"\"\n\n class Config(ThirdPartyParameters.Config):\n long_flags_use_eq: bool = True\n \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n executable: str = Field(\n \"/sdf/group/lcls/ds/tools/crystfel/0.10.2/bin/get_hkl\",\n description=\"CrystFEL's reflection manipulation binary.\",\n flag_type=\"\",\n )\n in_file: str = Field(\n \"\",\n description=\"Path to input HKL file.\",\n flag_type=\"-\",\n rename_param=\"i\",\n )\n out_file: str = Field(\n \"\",\n description=\"Path to output file.\",\n flag_type=\"-\",\n rename_param=\"o\",\n is_result=True,\n )\n cell_file: str = Field(\n \"\",\n description=\"Path to a file containing unit cell information (PDB or CrystFEL format).\",\n flag_type=\"-\",\n rename_param=\"p\",\n )\n output_format: str = Field(\n \"mtz\",\n description=\"Output format. One of mtz, mtz-bij, or xds. Otherwise CrystFEL format.\",\n flag_type=\"--\",\n rename_param=\"output-format\",\n )\n expand: Optional[str] = Field(\n description=\"Reflections will be expanded to fill asymmetric unit of specified point group.\",\n flag_type=\"--\",\n )\n # Reducing reflections to higher symmetry\n twin: Optional[str] = Field(\n description=\"Reflections equivalent to specified point group will have intensities summed.\",\n flag_type=\"--\",\n )\n no_need_all_parts: Optional[bool] = Field(\n description=\"Use with --twin to allow reflections missing a 'twin mate' to be written out.\",\n flag_type=\"--\",\n rename_param=\"no-need-all-parts\",\n )\n # Noise - Add to data\n noise: Optional[bool] = Field(\n description=\"Generate 10% uniform noise.\", flag_type=\"--\"\n )\n poisson: Optional[bool] = Field(\n description=\"Generate Poisson noise. Intensities assumed to be A.U.\",\n flag_type=\"--\",\n )\n adu_per_photon: Optional[int] = Field(\n description=\"Use with --poisson to convert A.U. to photons.\",\n flag_type=\"--\",\n rename_param=\"adu-per-photon\",\n )\n # Remove duplicate reflections\n trim_centrics: Optional[bool] = Field(\n description=\"Duplicated reflections (according to symmetry) are removed.\",\n flag_type=\"--\",\n )\n # Restrict to template file\n template: Optional[str] = Field(\n description=\"Only reflections which also appear in specified file are written out.\",\n flag_type=\"--\",\n )\n # Multiplicity\n multiplicity: Optional[bool] = Field(\n description=\"Reflections are multiplied by their symmetric multiplicites.\",\n flag_type=\"--\",\n )\n # Resolution cutoffs\n cutoff_angstroms: Optional[Union[str, int, float]] = Field(\n description=\"Either n, or n1,n2,n3. For n, reflections < n are removed. 
For n1,n2,n3 anisotropic trunction performed at separate resolution limits for a*, b*, c*.\",\n flag_type=\"--\",\n rename_param=\"cutoff-angstroms\",\n )\n lowres: Optional[float] = Field(\n description=\"Remove reflections with d > n\", flag_type=\"--\"\n )\n highres: Optional[float] = Field(\n description=\"Synonym for first form of --cutoff-angstroms\"\n )\n reindex: Optional[str] = Field(\n description=\"Reindex according to specified operator. E.g. k,h,-l.\",\n flag_type=\"--\",\n )\n # Override input symmetry\n symmetry: Optional[str] = Field(\n description=\"Point group symmetry to use to override. Almost always OMIT this option.\",\n flag_type=\"--\",\n )\n\n @validator(\"in_file\", always=True)\n def validate_in_file(cls, in_file: str, values: Dict[str, Any]) -> str:\n if in_file == \"\":\n partialator_file: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\", \"MergePartialator\", \"out_file\"\n )\n if partialator_file:\n return partialator_file\n return in_file\n\n @validator(\"out_file\", always=True)\n def validate_out_file(cls, out_file: str, values: Dict[str, Any]) -> str:\n if out_file == \"\":\n partialator_file: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\", \"MergePartialator\", \"out_file\"\n )\n if partialator_file:\n mtz_out: str = partialator_file.split(\".\")[0]\n mtz_out = f\"{mtz_out}.mtz\"\n return mtz_out\n return out_file\n\n @validator(\"cell_file\", always=True)\n def validate_cell_file(cls, cell_file: str, values: Dict[str, Any]) -> str:\n if cell_file == \"\":\n idx_cell_file: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\",\n \"IndexCrystFEL\",\n \"cell_file\",\n valid_only=False,\n )\n if idx_cell_file:\n return idx_cell_file\n return cell_file\n
"},{"location":"source/io/models/sfx_merge/#io.models.sfx_merge.ManipulateHKLParameters.Config","title":"Config
","text":" Bases: Config
lute/io/models/sfx_merge.py
class Config(ThirdPartyParameters.Config):\n long_flags_use_eq: bool = True\n \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n
"},{"location":"source/io/models/sfx_merge/#io.models.sfx_merge.ManipulateHKLParameters.Config.long_flags_use_eq","title":"long_flags_use_eq: bool = True
class-attribute
instance-attribute
","text":"Whether long command-line arguments are passed like --long=arg
.
set_result: bool = True
class-attribute
instance-attribute
","text":"Whether the Executor should mark a specified parameter as a result.
"},{"location":"source/io/models/sfx_merge/#io.models.sfx_merge.MergePartialatorParameters","title":"MergePartialatorParameters
","text":" Bases: ThirdPartyParameters
Parameters for CrystFEL's partialator
.
There are many parameters, and many combinations. For more information on usage, please refer to the CrystFEL documentation, here: https://www.desy.de/~twhite/crystfel/manual-partialator.html
Source code inlute/io/models/sfx_merge.py
class MergePartialatorParameters(ThirdPartyParameters):\n \"\"\"Parameters for CrystFEL's `partialator`.\n\n There are many parameters, and many combinations. For more information on\n usage, please refer to the CrystFEL documentation, here:\n https://www.desy.de/~twhite/crystfel/manual-partialator.html\n \"\"\"\n\n class Config(ThirdPartyParameters.Config):\n long_flags_use_eq: bool = True\n \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n executable: str = Field(\n \"/sdf/group/lcls/ds/tools/crystfel/0.10.2/bin/partialator\",\n description=\"CrystFEL's Partialator binary.\",\n flag_type=\"\",\n )\n in_file: Optional[str] = Field(\n \"\", description=\"Path to input stream.\", flag_type=\"-\", rename_param=\"i\"\n )\n out_file: str = Field(\n \"\",\n description=\"Path to output file.\",\n flag_type=\"-\",\n rename_param=\"o\",\n is_result=True,\n )\n symmetry: str = Field(description=\"Point group symmetry.\", flag_type=\"--\")\n niter: Optional[int] = Field(\n description=\"Number of cycles of scaling and post-refinement.\",\n flag_type=\"-\",\n rename_param=\"n\",\n )\n no_scale: Optional[bool] = Field(\n description=\"Disable scaling.\", flag_type=\"--\", rename_param=\"no-scale\"\n )\n no_Bscale: Optional[bool] = Field(\n description=\"Disable Debye-Waller part of scaling.\",\n flag_type=\"--\",\n rename_param=\"no-Bscale\",\n )\n no_pr: Optional[bool] = Field(\n description=\"Disable orientation model.\", flag_type=\"--\", rename_param=\"no-pr\"\n )\n no_deltacchalf: Optional[bool] = Field(\n description=\"Disable rejection based on deltaCC1/2.\",\n flag_type=\"--\",\n rename_param=\"no-deltacchalf\",\n )\n model: str = Field(\n \"unity\",\n description=\"Partiality model. Options: xsphere, unity, offset, ggpm.\",\n flag_type=\"--\",\n )\n nthreads: int = Field(\n max(int(os.environ.get(\"SLURM_NPROCS\", len(os.sched_getaffinity(0)))) - 1, 1),\n description=\"Number of parallel analyses.\",\n flag_type=\"-\",\n rename_param=\"j\",\n )\n polarisation: Optional[str] = Field(\n description=\"Specification of incident polarisation. Refer to CrystFEL docs for more info.\",\n flag_type=\"--\",\n )\n no_polarisation: Optional[bool] = Field(\n description=\"Synonym for --polarisation=none\",\n flag_type=\"--\",\n rename_param=\"no-polarisation\",\n )\n max_adu: Optional[float] = Field(\n description=\"Maximum intensity of reflection to include.\",\n flag_type=\"--\",\n rename_param=\"max-adu\",\n )\n min_res: Optional[float] = Field(\n description=\"Only include crystals diffracting to a minimum resolution.\",\n flag_type=\"--\",\n rename_param=\"min-res\",\n )\n min_measurements: int = Field(\n 2,\n description=\"Include a reflection only if it appears a minimum number of times.\",\n flag_type=\"--\",\n rename_param=\"min-measurements\",\n )\n push_res: Optional[float] = Field(\n description=\"Merge reflections up to higher than the apparent resolution limit.\",\n flag_type=\"--\",\n rename_param=\"push-res\",\n )\n start_after: int = Field(\n 0,\n description=\"Ignore the first n crystals.\",\n flag_type=\"--\",\n rename_param=\"start-after\",\n )\n stop_after: int = Field(\n 0,\n description=\"Stop after processing n crystals. 0 means process all.\",\n flag_type=\"--\",\n rename_param=\"stop-after\",\n )\n no_free: Optional[bool] = Field(\n description=\"Disable cross-validation. 
Testing ONLY.\",\n flag_type=\"--\",\n rename_param=\"no-free\",\n )\n custom_split: Optional[str] = Field(\n description=\"Read a set of filenames, event and dataset IDs from a filename.\",\n flag_type=\"--\",\n rename_param=\"custom-split\",\n )\n max_rel_B: float = Field(\n 100,\n description=\"Reject crystals if |relB| > n sq Angstroms.\",\n flag_type=\"--\",\n rename_param=\"max-rel-B\",\n )\n output_every_cycle: bool = Field(\n False,\n description=\"Write per-crystal params after every refinement cycle.\",\n flag_type=\"--\",\n rename_param=\"output-every-cycle\",\n )\n no_logs: bool = Field(\n False,\n description=\"Do not write logs needed for plots, maps and graphs.\",\n flag_type=\"--\",\n rename_param=\"no-logs\",\n )\n set_symmetry: Optional[str] = Field(\n description=\"Set the apparent symmetry of the crystals to a point group.\",\n flag_type=\"-\",\n rename_param=\"w\",\n )\n operator: Optional[str] = Field(\n description=\"Specify an ambiguity operator. E.g. k,h,-l.\", flag_type=\"--\"\n )\n force_bandwidth: Optional[float] = Field(\n description=\"Set X-ray bandwidth. As percent, e.g. 0.0013 (0.13%).\",\n flag_type=\"--\",\n rename_param=\"force-bandwidth\",\n )\n force_radius: Optional[float] = Field(\n description=\"Set the initial profile radius (nm-1).\",\n flag_type=\"--\",\n rename_param=\"force-radius\",\n )\n force_lambda: Optional[float] = Field(\n description=\"Set the wavelength. In Angstroms.\",\n flag_type=\"--\",\n rename_param=\"force-lambda\",\n )\n harvest_file: Optional[str] = Field(\n description=\"Write parameters to file in JSON format.\",\n flag_type=\"--\",\n rename_param=\"harvest-file\",\n )\n\n @validator(\"in_file\", always=True)\n def validate_in_file(cls, in_file: str, values: Dict[str, Any]) -> str:\n if in_file == \"\":\n stream_file: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\",\n \"ConcatenateStreamFiles\",\n \"out_file\",\n )\n if stream_file:\n return stream_file\n return in_file\n\n @validator(\"out_file\", always=True)\n def validate_out_file(cls, out_file: str, values: Dict[str, Any]) -> str:\n if out_file == \"\":\n in_file: str = values[\"in_file\"]\n if in_file:\n tag: str = in_file.split(\".\")[0]\n return f\"{tag}.hkl\"\n else:\n return \"partialator.hkl\"\n return out_file\n
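The validators above are what chain managed Tasks together: when in_file is left empty, the most recent ConcatenateStreamFiles result is pulled from LUTE's parameter database in work_dir via read_latest_db_entry. The sketch below replaces that database lookup with a plain dictionary so the default-resolution logic can be seen in isolation; it is an illustration, not LUTE's code, and the file paths are hypothetical.

from typing import Optional

# A plain dict standing in for LUTE's parameter database (illustration only).
_FAKE_DB = {
    ("ConcatenateStreamFiles", "out_file"): "/work_dir/mfxp1234_concatenated.stream",
}


def fake_read_latest_db_entry(task_name: str, param: str) -> Optional[str]:
    """Stand-in for LUTE's read_latest_db_entry; the real one queries the database in work_dir."""
    return _FAKE_DB.get((task_name, param))


def default_partialator_in_file(in_file: str) -> str:
    """Mirror of the validate_in_file logic above, with the DB lookup mocked out."""
    if in_file == "":
        stream_file = fake_read_latest_db_entry("ConcatenateStreamFiles", "out_file")
        if stream_file:
            return stream_file
    return in_file


print(default_partialator_in_file(""))
# -> /work_dir/mfxp1234_concatenated.stream (taken from the latest upstream result)
print(default_partialator_in_file("/my/custom.stream"))
# -> /my/custom.stream (an explicit value is always respected)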
"},{"location":"source/io/models/sfx_merge/#io.models.sfx_merge.MergePartialatorParameters.Config","title":"Config
","text":" Bases: Config
lute/io/models/sfx_merge.py
class Config(ThirdPartyParameters.Config):\n long_flags_use_eq: bool = True\n \"\"\"Whether long command-line arguments are passed like `--long=arg`.\"\"\"\n\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n
"},{"location":"source/io/models/sfx_merge/#io.models.sfx_merge.MergePartialatorParameters.Config.long_flags_use_eq","title":"long_flags_use_eq: bool = True
class-attribute
instance-attribute
","text":"Whether long command-line arguments are passed like --long=arg
.
set_result: bool = True
class-attribute
instance-attribute
","text":"Whether the Executor should mark a specified parameter as a result.
"},{"location":"source/io/models/sfx_solve/","title":"sfx_solve","text":"Models for structure solution in serial femtosecond crystallography.
Classes:
Name DescriptionDimpleSolveParameters
Perform structure solution using CCP4's dimple (molecular replacement).
"},{"location":"source/io/models/sfx_solve/#io.models.sfx_solve.DimpleSolveParameters","title":"DimpleSolveParameters
","text":" Bases: ThirdPartyParameters
Parameters for CCP4's dimple program.
There are many parameters. For more information on usage, please refer to the CCP4 documentation, here: https://ccp4.github.io/dimple/
Source code inlute/io/models/sfx_solve.py
class DimpleSolveParameters(ThirdPartyParameters):\n \"\"\"Parameters for CCP4's dimple program.\n\n There are many parameters. For more information on\n usage, please refer to the CCP4 documentation, here:\n https://ccp4.github.io/dimple/\n \"\"\"\n\n executable: str = Field(\n \"/sdf/group/lcls/ds/tools/ccp4-8.0/bin/dimple\",\n description=\"CCP4 Dimple for solving structures with MR.\",\n flag_type=\"\",\n )\n # Positional requirements - all required.\n in_file: str = Field(\n \"\",\n description=\"Path to input mtz.\",\n flag_type=\"\",\n )\n pdb: str = Field(\"\", description=\"Path to a PDB.\", flag_type=\"\")\n out_dir: str = Field(\"\", description=\"Output DIRECTORY.\", flag_type=\"\")\n # Most used options\n mr_thresh: PositiveFloat = Field(\n 0.4,\n description=\"Threshold for molecular replacement.\",\n flag_type=\"--\",\n rename_param=\"mr-when-r\",\n )\n slow: Optional[bool] = Field(\n False, description=\"Perform more refinement.\", flag_type=\"--\"\n )\n # Other options (IO)\n hklout: str = Field(\n \"final.mtz\", description=\"Output mtz file name.\", flag_type=\"--\"\n )\n xyzout: str = Field(\n \"final.pdb\", description=\"Output PDB file name.\", flag_type=\"--\"\n )\n icolumn: Optional[str] = Field(\n # \"IMEAN\",\n description=\"Name for the I column.\",\n flag_type=\"--\",\n )\n sigicolumn: Optional[str] = Field(\n # \"SIG<ICOL>\",\n description=\"Name for the Sig<I> column.\",\n flag_type=\"--\",\n )\n fcolumn: Optional[str] = Field(\n # \"F\",\n description=\"Name for the F column.\",\n flag_type=\"--\",\n )\n sigfcolumn: Optional[str] = Field(\n # \"F\",\n description=\"Name for the Sig<F> column.\",\n flag_type=\"--\",\n )\n libin: Optional[str] = Field(\n description=\"Ligand descriptions for refmac (LIBIN).\", flag_type=\"--\"\n )\n refmac_key: Optional[str] = Field(\n description=\"Extra Refmac keywords to use in refinement.\",\n flag_type=\"--\",\n rename_param=\"refmac-key\",\n )\n free_r_flags: Optional[str] = Field(\n description=\"Path to a mtz file with freeR flags.\",\n flag_type=\"--\",\n rename_param=\"free-r-flags\",\n )\n freecolumn: Optional[Union[int, float]] = Field(\n # 0,\n description=\"Refree column with an optional value.\",\n flag_type=\"--\",\n )\n img_format: Optional[str] = Field(\n description=\"Format of generated images. 
(png, jpeg, none).\",\n flag_type=\"-\",\n rename_param=\"f\",\n )\n white_bg: bool = Field(\n False,\n description=\"Use a white background in Coot and in images.\",\n flag_type=\"--\",\n rename_param=\"white-bg\",\n )\n no_cleanup: bool = Field(\n False,\n description=\"Retain intermediate files.\",\n flag_type=\"--\",\n rename_param=\"no-cleanup\",\n )\n # Calculations\n no_blob_search: bool = Field(\n False,\n description=\"Do not search for unmodelled blobs.\",\n flag_type=\"--\",\n rename_param=\"no-blob-search\",\n )\n anode: bool = Field(\n False, description=\"Use SHELX/AnoDe to find peaks in the anomalous map.\"\n )\n # Run customization\n no_hetatm: bool = Field(\n False,\n description=\"Remove heteroatoms from the given model.\",\n flag_type=\"--\",\n rename_param=\"no-hetatm\",\n )\n rigid_cycles: Optional[PositiveInt] = Field(\n # 10,\n description=\"Number of cycles of rigid-body refinement to perform.\",\n flag_type=\"--\",\n rename_param=\"rigid-cycles\",\n )\n jelly: Optional[PositiveInt] = Field(\n # 4,\n description=\"Number of cycles of jelly-body refinement to perform.\",\n flag_type=\"--\",\n )\n restr_cycles: Optional[PositiveInt] = Field(\n # 8,\n description=\"Number of cycles of refmac final refinement to perform.\",\n flag_type=\"--\",\n rename_param=\"restr-cycles\",\n )\n lim_resolution: Optional[PositiveFloat] = Field(\n description=\"Limit the final resolution.\", flag_type=\"--\", rename_param=\"reso\"\n )\n weight: Optional[str] = Field(\n # \"auto-weight\",\n description=\"The refmac matrix weight.\",\n flag_type=\"--\",\n )\n mr_prog: Optional[str] = Field(\n # \"phaser\",\n description=\"Molecular replacement program. phaser or molrep.\",\n flag_type=\"--\",\n rename_param=\"mr-prog\",\n )\n mr_num: Optional[Union[str, int]] = Field(\n # \"auto\",\n description=\"Number of molecules to use for molecular replacement.\",\n flag_type=\"--\",\n rename_param=\"mr-num\",\n )\n mr_reso: Optional[PositiveFloat] = Field(\n # 3.25,\n description=\"High resolution for molecular replacement. If >10 interpreted as eLLG.\",\n flag_type=\"--\",\n rename_param=\"mr-reso\",\n )\n itof_prog: Optional[str] = Field(\n description=\"Program to calculate amplitudes. truncate, or ctruncate.\",\n flag_type=\"--\",\n rename_param=\"ItoF-prog\",\n )\n\n @validator(\"in_file\", always=True)\n def validate_in_file(cls, in_file: str, values: Dict[str, Any]) -> str:\n if in_file == \"\":\n get_hkl_file: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\", \"ManipulateHKL\", \"out_file\"\n )\n if get_hkl_file:\n return get_hkl_file\n return in_file\n\n @validator(\"out_dir\", always=True)\n def validate_out_dir(cls, out_dir: str, values: Dict[str, Any]) -> str:\n if out_dir == \"\":\n get_hkl_file: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\", \"ManipulateHKL\", \"out_file\"\n )\n if get_hkl_file:\n return os.path.dirname(get_hkl_file)\n return out_dir\n
"},{"location":"source/io/models/sfx_solve/#io.models.sfx_solve.RunSHELXCParameters","title":"RunSHELXCParameters
","text":" Bases: ThirdPartyParameters
Parameters for CCP4's SHELXC program.
SHELXC prepares files for SHELXD and SHELXE.
For more information please refer to the official documentation: https://www.ccp4.ac.uk/html/crank.html
Source code inlute/io/models/sfx_solve.py
class RunSHELXCParameters(ThirdPartyParameters):\n \"\"\"Parameters for CCP4's SHELXC program.\n\n SHELXC prepares files for SHELXD and SHELXE.\n\n For more information please refer to the official documentation:\n https://www.ccp4.ac.uk/html/crank.html\n \"\"\"\n\n executable: str = Field(\n \"/sdf/group/lcls/ds/tools/ccp4-8.0/bin/shelxc\",\n description=\"CCP4 SHELXC. Generates input files for SHELXD/SHELXE.\",\n flag_type=\"\",\n )\n placeholder: str = Field(\n \"xx\", description=\"Placeholder filename stem.\", flag_type=\"\"\n )\n in_file: str = Field(\n \"\",\n description=\"Input file for SHELXC with reflections AND proper records.\",\n flag_type=\"\",\n )\n\n @validator(\"in_file\", always=True)\n def validate_in_file(cls, in_file: str, values: Dict[str, Any]) -> str:\n if in_file == \"\":\n # get_hkl needed to be run to produce an XDS format file...\n xds_format_file: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\", \"ManipulateHKL\", \"out_file\"\n )\n if xds_format_file:\n in_file = xds_format_file\n if in_file[0] != \"<\":\n # Need to add a redirection for this program\n # Runs like `shelxc xx <input_file.xds`\n in_file = f\"<{in_file}\"\n return in_file\n
"},{"location":"source/io/models/smd/","title":"smd","text":"Models for smalldata_tools Tasks.
Classes:
Name DescriptionSubmitSMDParameters
Parameters to run smalldata_tools to produce a smalldata HDF5 file.
FindOverlapXSSParameters
Parameter model for the FindOverlapXSS Task. Used to determine spatial/temporal overlap based on XSS difference signal.
"},{"location":"source/io/models/smd/#io.models.smd.FindOverlapXSSParameters","title":"FindOverlapXSSParameters
","text":" Bases: TaskParameters
TaskParameter model for FindOverlapXSS Task.
This Task determines spatial or temporal overlap between an optical pulse and the FEL pulse based on difference scattering (XSS) signal. This Task uses SmallData HDF5 files as a source.
Source code inlute/io/models/smd.py
class FindOverlapXSSParameters(TaskParameters):\n \"\"\"TaskParameter model for FindOverlapXSS Task.\n\n This Task determines spatial or temporal overlap between an optical pulse\n and the FEL pulse based on difference scattering (XSS) signal. This Task\n uses SmallData HDF5 files as a source.\n \"\"\"\n\n class ExpConfig(BaseModel):\n det_name: str\n ipm_var: str\n scan_var: Union[str, List[str]]\n\n class Thresholds(BaseModel):\n min_Iscat: Union[int, float]\n min_ipm: Union[int, float]\n\n class AnalysisFlags(BaseModel):\n use_pyfai: bool = True\n use_asymls: bool = False\n\n exp_config: ExpConfig\n thresholds: Thresholds\n analysis_flags: AnalysisFlags\n
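Unlike the third-party models, FindOverlapXSSParameters is configured through nested sub-models. The standalone sketch below shows how that nested structure parses from plain dictionaries (for example, as loaded from the configuration YAML), assuming pydantic v1; the detector and variable names are hypothetical, and the real model additionally inherits the TaskParameters fields.

from typing import List, Union

from pydantic import BaseModel


class ExpConfig(BaseModel):
    det_name: str
    ipm_var: str
    scan_var: Union[str, List[str]]


class Thresholds(BaseModel):
    min_Iscat: Union[int, float]
    min_ipm: Union[int, float]


class AnalysisFlags(BaseModel):
    use_pyfai: bool = True
    use_asymls: bool = False


class OverlapConfigSketch(BaseModel):
    """Illustration only: the nested portion of FindOverlapXSSParameters."""

    exp_config: ExpConfig
    thresholds: Thresholds
    analysis_flags: AnalysisFlags


# Hypothetical detector/variable names:
cfg = OverlapConfigSketch(
    exp_config={"det_name": "epix_1", "ipm_var": "ipm4_sum", "scan_var": "lxt"},
    thresholds={"min_Iscat": 10, "min_ipm": 500.0},
    analysis_flags={"use_pyfai": True},
)
print(cfg.analysis_flags.use_asymls)  # -> False (default)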
"},{"location":"source/io/models/smd/#io.models.smd.SubmitSMDParameters","title":"SubmitSMDParameters
","text":" Bases: ThirdPartyParameters
Parameters for running smalldata to produce reduced HDF5 files.
Source code inlute/io/models/smd.py
class SubmitSMDParameters(ThirdPartyParameters):\n \"\"\"Parameters for running smalldata to produce reduced HDF5 files.\"\"\"\n\n class Config(ThirdPartyParameters.Config):\n \"\"\"Identical to super-class Config but includes a result.\"\"\"\n\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n result_from_params: str = \"\"\n \"\"\"Defines a result from the parameters. Use a validator to do so.\"\"\"\n\n executable: str = Field(\"mpirun\", description=\"MPI executable.\", flag_type=\"\")\n np: PositiveInt = Field(\n max(int(os.environ.get(\"SLURM_NPROCS\", len(os.sched_getaffinity(0)))) - 1, 1),\n description=\"Number of processes\",\n flag_type=\"-\",\n )\n p_arg1: str = Field(\n \"python\", description=\"Executable to run with mpi (i.e. python).\", flag_type=\"\"\n )\n u: str = Field(\n \"\", description=\"Python option for unbuffered output.\", flag_type=\"-\"\n )\n m: str = Field(\n \"mpi4py.run\",\n description=\"Python option to execute a module's contents as __main__ module.\",\n flag_type=\"-\",\n )\n producer: str = Field(\n \"\", description=\"Path to the SmallData producer Python script.\", flag_type=\"\"\n )\n run: str = Field(\n os.environ.get(\"RUN_NUM\", \"\"), description=\"DAQ Run Number.\", flag_type=\"--\"\n )\n experiment: str = Field(\n os.environ.get(\"EXPERIMENT\", \"\"),\n description=\"LCLS Experiment Number.\",\n flag_type=\"--\",\n )\n stn: NonNegativeInt = Field(0, description=\"Hutch endstation.\", flag_type=\"--\")\n nevents: int = Field(\n int(1e9), description=\"Number of events to process.\", flag_type=\"--\"\n )\n directory: Optional[str] = Field(\n None,\n description=\"Optional output directory. If None, will be in ${EXP_FOLDER}/hdf5/smalldata.\",\n flag_type=\"--\",\n )\n ## Need mechanism to set result_from_param=True ...\n gather_interval: PositiveInt = Field(\n 25, description=\"Number of events to collect at a time.\", flag_type=\"--\"\n )\n norecorder: bool = Field(\n False, description=\"Whether to ignore recorder streams.\", flag_type=\"--\"\n )\n url: HttpUrl = Field(\n \"https://pswww.slac.stanford.edu/ws-auth/lgbk\",\n description=\"Base URL for eLog posting.\",\n flag_type=\"--\",\n )\n epicsAll: bool = Field(\n False,\n description=\"Whether to store all EPICS PVs. Use with care.\",\n flag_type=\"--\",\n )\n full: bool = Field(\n False,\n description=\"Whether to store all data. Use with EXTRA care.\",\n flag_type=\"--\",\n )\n fullSum: bool = Field(\n False,\n description=\"Whether to store sums for all area detector images.\",\n flag_type=\"--\",\n )\n default: bool = Field(\n False,\n description=\"Whether to store only the default minimal set of data.\",\n flag_type=\"--\",\n )\n image: bool = Field(\n False,\n description=\"Whether to save everything as images. Use with care.\",\n flag_type=\"--\",\n )\n tiff: bool = Field(\n False,\n description=\"Whether to save all images as a single TIFF. Use with EXTRA care.\",\n flag_type=\"--\",\n )\n centerpix: bool = Field(\n False,\n description=\"Whether to mask center pixels for Epix10k2M detectors.\",\n flag_type=\"--\",\n )\n postRuntable: bool = Field(\n False,\n description=\"Whether to post run tables. 
Also used as a trigger for summary jobs.\",\n flag_type=\"--\",\n )\n wait: bool = Field(\n False, description=\"Whether to wait for a file to appear.\", flag_type=\"--\"\n )\n xtcav: bool = Field(\n False,\n description=\"Whether to add XTCAV processing to the HDF5 generation.\",\n flag_type=\"--\",\n )\n noarch: bool = Field(\n False, description=\"Whether to not use archiver data.\", flag_type=\"--\"\n )\n\n lute_template_cfg: TemplateConfig = TemplateConfig(template_name=\"\", output_path=\"\")\n\n @validator(\"producer\", always=True)\n def validate_producer_path(cls, producer: str) -> str:\n return producer\n\n @validator(\"lute_template_cfg\", always=True)\n def use_producer(\n cls, lute_template_cfg: TemplateConfig, values: Dict[str, Any]\n ) -> TemplateConfig:\n if not lute_template_cfg.output_path:\n lute_template_cfg.output_path = values[\"producer\"]\n return lute_template_cfg\n\n @root_validator(pre=False)\n def define_result(cls, values: Dict[str, Any]) -> Dict[str, Any]:\n exp: str = values[\"lute_config\"].experiment\n hutch: str = exp[:3]\n run: int = int(values[\"lute_config\"].run)\n directory: Optional[str] = values[\"directory\"]\n if directory is None:\n directory = f\"/sdf/data/lcls/ds/{hutch}/{exp}/hdf5/smalldata\"\n fname: str = f\"{exp}_Run{run:04d}.h5\"\n\n cls.Config.result_from_params = f\"{directory}/{fname}\"\n return values\n
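The define_result validator above shows the second way a result can be registered: a root_validator assembles the expected HDF5 path from the experiment, hutch, and run, and records it in Config.result_from_params. Below is a simplified, self-contained sketch of that derivation only, assuming pydantic v1; for simplicity the value is stored on a plain field rather than on Config, and the experiment/run values are hypothetical.

from typing import Any, Dict, Optional

from pydantic import BaseModel, root_validator


class SmallDataResultSketch(BaseModel):
    """Illustration only: mirrors the define_result derivation above."""

    experiment: str
    run: int
    directory: Optional[str] = None
    result_path: str = ""  # the real model records this on Config.result_from_params

    @root_validator(pre=False)
    def define_result(cls, values: Dict[str, Any]) -> Dict[str, Any]:
        hutch = values["experiment"][:3]
        directory = values["directory"]
        if directory is None:
            directory = f"/sdf/data/lcls/ds/{hutch}/{values['experiment']}/hdf5/smalldata"
        values["result_path"] = f"{directory}/{values['experiment']}_Run{values['run']:04d}.h5"
        return values


# Hypothetical experiment/run values:
print(SmallDataResultSketch(experiment="mfxp1234", run=7).result_path)
# -> /sdf/data/lcls/ds/mfx/mfxp1234/hdf5/smalldata/mfxp1234_Run0007.h5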
"},{"location":"source/io/models/smd/#io.models.smd.SubmitSMDParameters.Config","title":"Config
","text":" Bases: Config
Identical to super-class Config but includes a result.
Source code inlute/io/models/smd.py
class Config(ThirdPartyParameters.Config):\n \"\"\"Identical to super-class Config but includes a result.\"\"\"\n\n set_result: bool = True\n \"\"\"Whether the Executor should mark a specified parameter as a result.\"\"\"\n\n result_from_params: str = \"\"\n \"\"\"Defines a result from the parameters. Use a validator to do so.\"\"\"\n
"},{"location":"source/io/models/smd/#io.models.smd.SubmitSMDParameters.Config.result_from_params","title":"result_from_params: str = ''
class-attribute
instance-attribute
","text":"Defines a result from the parameters. Use a validator to do so.
"},{"location":"source/io/models/smd/#io.models.smd.SubmitSMDParameters.Config.set_result","title":"set_result: bool = True
class-attribute
instance-attribute
","text":"Whether the Executor should mark a specified parameter as a result.
"},{"location":"source/io/models/tests/","title":"tests","text":"Models for all test Tasks.
Classes:
Name DescriptionTestParameters
Model for most basic test case. Single core first-party Task. Uses only communication via pipes.
TestBinaryParameters
Parameters for a simple multi-threaded binary executable.
TestSocketParameters
Model for first-party test requiring communication via socket.
TestWriteOutputParameters
Model for test Task which writes an output file. Location of file is recorded in database.
TestReadOutputParameters
Model for test Task which locates an output file based on an entry in the database, if no path is provided.
"},{"location":"source/io/models/tests/#io.models.tests.TestBinaryErrParameters","title":"TestBinaryErrParameters
","text":" Bases: ThirdPartyParameters
Same as TestBinary, but exits with non-zero code.
Source code inlute/io/models/tests.py
class TestBinaryErrParameters(ThirdPartyParameters):\n    \"\"\"Same as TestBinary, but exits with non-zero code.\"\"\"\n\n    executable: str = Field(\n        \"/sdf/home/d/dorlhiac/test_tasks/test_threads_err\",\n        description=\"Multi-threaded test binary with non-zero exit code.\",\n    )\n    p_arg1: int = Field(1, description=\"Number of threads.\")\n
"},{"location":"source/io/models/tests/#io.models.tests.TestParameters","title":"TestParameters
","text":" Bases: TaskParameters
Parameters for the test Task Test
.
lute/io/models/tests.py
class TestParameters(TaskParameters):\n \"\"\"Parameters for the test Task `Test`.\"\"\"\n\n float_var: float = Field(0.01, description=\"A floating point number.\")\n str_var: str = Field(\"test\", description=\"A string.\")\n\n class CompoundVar(BaseModel):\n int_var: int = 1\n dict_var: Dict[str, str] = {\"a\": \"b\"}\n\n compound_var: CompoundVar = Field(\n description=(\n \"A compound parameter - consists of a `int_var` (int) and `dict_var`\"\n \" (Dict[str, str]).\"\n )\n )\n throw_error: bool = Field(\n False, description=\"If `True`, raise an exception to test error handling.\"\n )\n
"},{"location":"source/tasks/dataclasses/","title":"dataclasses","text":"Classes for describing Task state and results.
Classes:
Name DescriptionTaskResult
Output of a specific analysis task.
TaskStatus
Enumeration of possible Task statuses (running, pending, failed, etc.).
DescribedAnalysis
Executor's description of a Task
run (results, parameters, env).
DescribedAnalysis
dataclass
","text":"Complete analysis description. Held by an Executor.
Source code inlute/tasks/dataclasses.py
@dataclass\nclass DescribedAnalysis:\n \"\"\"Complete analysis description. Held by an Executor.\"\"\"\n\n task_result: TaskResult\n task_parameters: Optional[TaskParameters]\n task_env: Dict[str, str]\n poll_interval: float\n communicator_desc: List[str]\n
"},{"location":"source/tasks/dataclasses/#tasks.dataclasses.ElogSummaryPlots","title":"ElogSummaryPlots
dataclass
","text":"Holds a graphical summary intended for display in the eLog.
Attributes:
Name Type Descriptiondisplay_name
str
This represents both a path and how the result will be displayed in the eLog. Can include \"/\" characters. E.g. display_name = \"scans/my_motor_scan\"
will have plots shown on a \"my_motor_scan\" page, under a \"scans\" tab. This format mirrors how the file is stored on disk as well.
lute/tasks/dataclasses.py
@dataclass\nclass ElogSummaryPlots:\n \"\"\"Holds a graphical summary intended for display in the eLog.\n\n Attributes:\n display_name (str): This represents both a path and how the result will be\n displayed in the eLog. Can include \"/\" characters. E.g.\n `display_name = \"scans/my_motor_scan\"` will have plots shown\n on a \"my_motor_scan\" page, under a \"scans\" tab. This format mirrors\n how the file is stored on disk as well.\n \"\"\"\n\n display_name: str\n figures: Union[pn.Tabs, hv.Image, plt.Figure]\n
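A minimal sketch (not part of the module) of packaging a matplotlib figure for the eLog; the display_name encodes the tab/page hierarchy described above.

import matplotlib.pyplot as plt

from lute.tasks.dataclasses import ElogSummaryPlots

# Sketch: a figure that would appear on a "my_motor_scan" page under a "scans" tab.
fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3], [0.0, 0.8, 1.1, 0.9])
ax.set_xlabel("Motor position")
ax.set_ylabel("Signal")
plots = ElogSummaryPlots(display_name="scans/my_motor_scan", figures=fig)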
"},{"location":"source/tasks/dataclasses/#tasks.dataclasses.TaskResult","title":"TaskResult
dataclass
","text":"Class for storing the result of a Task's execution with metadata.
Attributes:
Name Type Descriptiontask_name
str
Name of the associated task which produced it.
task_status
TaskStatus
Status of associated task.
summary
str
Short message/summary associated with the result.
payload
Any
Actual result. May be data in any format.
impl_schemas
Optional[str]
A string listing Task
schemas implemented by the associated Task
. Schemas define the category and expected output of the Task
. An individual task may implement/conform to multiple schemas. Multiple schemas are separated by ';', e.g. * impl_schemas = \"schema1;schema2\"
lute/tasks/dataclasses.py
@dataclass\nclass TaskResult:\n \"\"\"Class for storing the result of a Task's execution with metadata.\n\n Attributes:\n task_name (str): Name of the associated task which produced it.\n\n task_status (TaskStatus): Status of associated task.\n\n summary (str): Short message/summary associated with the result.\n\n payload (Any): Actual result. May be data in any format.\n\n impl_schemas (Optional[str]): A string listing `Task` schemas implemented\n by the associated `Task`. Schemas define the category and expected\n output of the `Task`. An individual task may implement/conform to\n multiple schemas. Multiple schemas are separated by ';', e.g.\n * impl_schemas = \"schema1;schema2\"\n \"\"\"\n\n task_name: str\n task_status: TaskStatus\n summary: str\n payload: Any\n impl_schemas: Optional[str] = None\n
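For illustration only, a result conforming to two schemas separates them with ';' as noted above; the schema names and payload path below are hypothetical.

from lute.tasks.dataclasses import TaskResult, TaskStatus

# Sketch: a completed result implementing two (hypothetical) schemas.
result = TaskResult(
    task_name="FindPeaksPyAlgos",
    task_status=TaskStatus.COMPLETED,
    summary="Peak finding finished without errors.",
    payload="/path/to/output.list",  # placeholder path
    impl_schemas="schema1;schema2",
)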
"},{"location":"source/tasks/dataclasses/#tasks.dataclasses.TaskStatus","title":"TaskStatus
","text":" Bases: Enum
Possible Task statuses.
Source code in lute/tasks/dataclasses.py
class TaskStatus(Enum):\n \"\"\"Possible Task statuses.\"\"\"\n\n PENDING = 0\n \"\"\"\n Task has yet to run. Is Queued, or waiting for prior tasks.\n \"\"\"\n RUNNING = 1\n \"\"\"\n Task is in the process of execution.\n \"\"\"\n COMPLETED = 2\n \"\"\"\n Task has completed without fatal errors.\n \"\"\"\n FAILED = 3\n \"\"\"\n Task encountered a fatal error.\n \"\"\"\n STOPPED = 4\n \"\"\"\n Task was, potentially temporarily, stopped/suspended.\n \"\"\"\n CANCELLED = 5\n \"\"\"\n Task was cancelled prior to completion or failure.\n \"\"\"\n TIMEDOUT = 6\n \"\"\"\n Task did not reach completion due to timeout.\n \"\"\"\n
"},{"location":"source/tasks/dataclasses/#tasks.dataclasses.TaskStatus.CANCELLED","title":"CANCELLED = 5
class-attribute
instance-attribute
","text":"Task was cancelled prior to completion or failure.
"},{"location":"source/tasks/dataclasses/#tasks.dataclasses.TaskStatus.COMPLETED","title":"COMPLETED = 2
class-attribute
instance-attribute
","text":"Task has completed without fatal errors.
"},{"location":"source/tasks/dataclasses/#tasks.dataclasses.TaskStatus.FAILED","title":"FAILED = 3
class-attribute
instance-attribute
","text":"Task encountered a fatal error.
"},{"location":"source/tasks/dataclasses/#tasks.dataclasses.TaskStatus.PENDING","title":"PENDING = 0
class-attribute
instance-attribute
","text":"Task has yet to run. Is Queued, or waiting for prior tasks.
"},{"location":"source/tasks/dataclasses/#tasks.dataclasses.TaskStatus.RUNNING","title":"RUNNING = 1
class-attribute
instance-attribute
","text":"Task is in the process of execution.
"},{"location":"source/tasks/dataclasses/#tasks.dataclasses.TaskStatus.STOPPED","title":"STOPPED = 4
class-attribute
instance-attribute
","text":"Task was, potentially temporarily, stopped/suspended.
"},{"location":"source/tasks/dataclasses/#tasks.dataclasses.TaskStatus.TIMEDOUT","title":"TIMEDOUT = 6
class-attribute
instance-attribute
","text":"Task did not reach completion due to timeout.
"},{"location":"source/tasks/sfx_find_peaks/","title":"sfx_find_peaks","text":"Classes for peak finding tasks in SFX.
Classes:
Name DescriptionCxiWriter
Utility class for writing peak finding results to CXI files.
FindPeaksPyAlgos
Peak finding using psana's PyAlgos algorithm. Optional data compression and decompression with libpressio for data reduction tests.
"},{"location":"source/tasks/sfx_find_peaks/#tasks.sfx_find_peaks.CxiWriter","title":"CxiWriter
","text":"Source code in lute/tasks/sfx_find_peaks.py
class CxiWriter:\n\n def __init__(\n self,\n outdir: str,\n rank: int,\n exp: str,\n run: int,\n n_events: int,\n det_shape: Tuple[int, ...],\n min_peaks: int,\n max_peaks: int,\n i_x: Any, # Not typed becomes it comes from psana\n i_y: Any, # Not typed becomes it comes from psana\n ipx: Any, # Not typed becomes it comes from psana\n ipy: Any, # Not typed becomes it comes from psana\n tag: str,\n ):\n \"\"\"\n Set up the CXI files to which peak finding results will be saved.\n\n Parameters:\n\n outdir (str): Output directory for cxi file.\n\n rank (int): MPI rank of the caller.\n\n exp (str): Experiment string.\n\n run (int): Experimental run.\n\n n_events (int): Number of events to process.\n\n det_shape (Tuple[int, int]): Shape of the numpy array storing the detector\n data. This must be aCheetah-stile 2D array.\n\n min_peaks (int): Minimum number of peaks per image.\n\n max_peaks (int): Maximum number of peaks per image.\n\n i_x (Any): Array of pixel indexes along x\n\n i_y (Any): Array of pixel indexes along y\n\n ipx (Any): Pixel indexes with respect to detector origin (x component)\n\n ipy (Any): Pixel indexes with respect to detector origin (y component)\n\n tag (str): Tag to append to cxi file names.\n \"\"\"\n self._det_shape: Tuple[int, ...] = det_shape\n self._i_x: Any = i_x\n self._i_y: Any = i_y\n self._ipx: Any = ipx\n self._ipy: Any = ipy\n self._index: int = 0\n\n # Create and open the HDF5 file\n fname: str = f\"{exp}_r{run:0>4}_{rank}{tag}.cxi\"\n Path(outdir).mkdir(exist_ok=True)\n self._outh5: Any = h5py.File(Path(outdir) / fname, \"w\")\n\n # Entry_1 entry for processing with CrystFEL\n entry_1: Any = self._outh5.create_group(\"entry_1\")\n keys: List[str] = [\n \"nPeaks\",\n \"peakXPosRaw\",\n \"peakYPosRaw\",\n \"rcent\",\n \"ccent\",\n \"rmin\",\n \"rmax\",\n \"cmin\",\n \"cmax\",\n \"peakTotalIntensity\",\n \"peakMaxIntensity\",\n \"peakRadius\",\n ]\n ds_expId: Any = entry_1.create_dataset(\n \"experimental_identifier\", (n_events,), maxshape=(None,), dtype=int\n )\n ds_expId.attrs[\"axes\"] = \"experiment_identifier\"\n data_1: Any = entry_1.create_dataset(\n \"/entry_1/data_1/data\",\n (n_events, det_shape[0], det_shape[1]),\n chunks=(1, det_shape[0], det_shape[1]),\n maxshape=(None, det_shape[0], det_shape[1]),\n dtype=numpy.float32,\n )\n data_1.attrs[\"axes\"] = \"experiment_identifier\"\n key: str\n for key in [\"powderHits\", \"powderMisses\", \"mask\"]:\n entry_1.create_dataset(\n f\"/entry_1/data_1/{key}\",\n (det_shape[0], det_shape[1]),\n chunks=(det_shape[0], det_shape[1]),\n maxshape=(det_shape[0], det_shape[1]),\n dtype=float,\n )\n\n # Peak-related entries\n for key in keys:\n if key == \"nPeaks\":\n ds_x: Any = self._outh5.create_dataset(\n f\"/entry_1/result_1/{key}\",\n (n_events,),\n maxshape=(None,),\n dtype=int,\n )\n ds_x.attrs[\"minPeaks\"] = min_peaks\n ds_x.attrs[\"maxPeaks\"] = max_peaks\n else:\n ds_x: Any = self._outh5.create_dataset(\n f\"/entry_1/result_1/{key}\",\n (n_events, max_peaks),\n maxshape=(None, max_peaks),\n chunks=(1, max_peaks),\n dtype=float,\n )\n ds_x.attrs[\"axes\"] = \"experiment_identifier:peaks\"\n\n # Timestamp entries\n lcls_1: Any = self._outh5.create_group(\"LCLS\")\n keys: List[str] = [\n \"eventNumber\",\n \"machineTime\",\n \"machineTimeNanoSeconds\",\n \"fiducial\",\n \"photon_energy_eV\",\n ]\n key: str\n for key in keys:\n if key == \"photon_energy_eV\":\n ds_x: Any = lcls_1.create_dataset(\n f\"{key}\", (n_events,), maxshape=(None,), dtype=float\n )\n else:\n ds_x = lcls_1.create_dataset(\n f\"{key}\", 
(n_events,), maxshape=(None,), dtype=int\n )\n ds_x.attrs[\"axes\"] = \"experiment_identifier\"\n\n ds_x = self._outh5.create_dataset(\n \"/LCLS/detector_1/EncoderValue\", (n_events,), maxshape=(None,), dtype=float\n )\n ds_x.attrs[\"axes\"] = \"experiment_identifier\"\n\n def write_event(\n self,\n img: NDArray[numpy.float_],\n peaks: Any, # Not typed becomes it comes from psana\n timestamp_seconds: int,\n timestamp_nanoseconds: int,\n timestamp_fiducials: int,\n photon_energy: float,\n ):\n \"\"\"\n Write peak finding results for an event into the HDF5 file.\n\n Parameters:\n\n img (NDArray[numpy.float_]): Detector data for the event\n\n peaks: (Any): Peak information for the event, as recovered from the PyAlgos\n algorithm\n\n timestamp_seconds (int): Second part of the event's timestamp information\n\n timestamp_nanoseconds (int): Nanosecond part of the event's timestamp\n information\n\n timestamp_fiducials (int): Fiducials part of the event's timestamp\n information\n\n photon_energy (float): Photon energy for the event\n \"\"\"\n ch_rows: NDArray[numpy.float_] = peaks[:, 0] * self._det_shape[1] + peaks[:, 1]\n ch_cols: NDArray[numpy.float_] = peaks[:, 2]\n\n if self._outh5[\"/entry_1/data_1/data\"].shape[0] <= self._index:\n self._outh5[\"entry_1/data_1/data\"].resize(self._index + 1, axis=0)\n ds_key: str\n for ds_key in self._outh5[\"/entry_1/result_1\"].keys():\n self._outh5[f\"/entry_1/result_1/{ds_key}\"].resize(\n self._index + 1, axis=0\n )\n for ds_key in (\n \"machineTime\",\n \"machineTimeNanoSeconds\",\n \"fiducial\",\n \"photon_energy_eV\",\n ):\n self._outh5[f\"/LCLS/{ds_key}\"].resize(self._index + 1, axis=0)\n\n # Entry_1 entry for processing with CrystFEL\n self._outh5[\"/entry_1/data_1/data\"][self._index, :, :] = img.reshape(\n -1, img.shape[-1]\n )\n self._outh5[\"/entry_1/result_1/nPeaks\"][self._index] = peaks.shape[0]\n self._outh5[\"/entry_1/result_1/peakXPosRaw\"][self._index, : peaks.shape[0]] = (\n ch_cols.astype(\"int\")\n )\n self._outh5[\"/entry_1/result_1/peakYPosRaw\"][self._index, : peaks.shape[0]] = (\n ch_rows.astype(\"int\")\n )\n self._outh5[\"/entry_1/result_1/rcent\"][self._index, : peaks.shape[0]] = peaks[\n :, 6\n ]\n self._outh5[\"/entry_1/result_1/ccent\"][self._index, : peaks.shape[0]] = peaks[\n :, 7\n ]\n self._outh5[\"/entry_1/result_1/rmin\"][self._index, : peaks.shape[0]] = peaks[\n :, 10\n ]\n self._outh5[\"/entry_1/result_1/rmax\"][self._index, : peaks.shape[0]] = peaks[\n :, 11\n ]\n self._outh5[\"/entry_1/result_1/cmin\"][self._index, : peaks.shape[0]] = peaks[\n :, 12\n ]\n self._outh5[\"/entry_1/result_1/cmax\"][self._index, : peaks.shape[0]] = peaks[\n :, 13\n ]\n self._outh5[\"/entry_1/result_1/peakTotalIntensity\"][\n self._index, : peaks.shape[0]\n ] = peaks[:, 5]\n self._outh5[\"/entry_1/result_1/peakMaxIntensity\"][\n self._index, : peaks.shape[0]\n ] = peaks[:, 4]\n\n # Calculate and write pixel radius\n peaks_cenx: NDArray[numpy.float_] = (\n self._i_x[\n numpy.array(peaks[:, 0], dtype=numpy.int64),\n numpy.array(peaks[:, 1], dtype=numpy.int64),\n numpy.array(peaks[:, 2], dtype=numpy.int64),\n ]\n + 0.5\n - self._ipx\n )\n peaks_ceny: NDArray[numpy.float_] = (\n self._i_y[\n numpy.array(peaks[:, 0], dtype=numpy.int64),\n numpy.array(peaks[:, 1], dtype=numpy.int64),\n numpy.array(peaks[:, 2], dtype=numpy.int64),\n ]\n + 0.5\n - self._ipy\n )\n peak_radius: NDArray[numpy.float_] = numpy.sqrt(\n (peaks_cenx**2) + (peaks_ceny**2)\n )\n self._outh5[\"/entry_1/result_1/peakRadius\"][\n self._index, : peaks.shape[0]\n ] = 
peak_radius\n\n # LCLS entry dataset\n self._outh5[\"/LCLS/machineTime\"][self._index] = timestamp_seconds\n self._outh5[\"/LCLS/machineTimeNanoSeconds\"][self._index] = timestamp_nanoseconds\n self._outh5[\"/LCLS/fiducial\"][self._index] = timestamp_fiducials\n self._outh5[\"/LCLS/photon_energy_eV\"][self._index] = photon_energy\n\n self._index += 1\n\n def write_non_event_data(\n self,\n powder_hits: NDArray[numpy.float_],\n powder_misses: NDArray[numpy.float_],\n mask: NDArray[numpy.uint16],\n clen: float,\n ):\n \"\"\"\n Write to the file data that is not related to a specific event (masks, powders)\n\n Parameters:\n\n powder_hits (NDArray[numpy.float_]): Virtual powder pattern from hits\n\n powder_misses (NDArray[numpy.float_]): Virtual powder pattern from hits\n\n mask: (NDArray[numpy.uint16]): Pixel ask to write into the file\n\n \"\"\"\n # Add powders and mask to files, reshaping them to match the crystfel\n # convention\n self._outh5[\"/entry_1/data_1/powderHits\"][:] = powder_hits.reshape(\n -1, powder_hits.shape[-1]\n )\n self._outh5[\"/entry_1/data_1/powderMisses\"][:] = powder_misses.reshape(\n -1, powder_misses.shape[-1]\n )\n self._outh5[\"/entry_1/data_1/mask\"][:] = (1 - mask).reshape(\n -1, mask.shape[-1]\n ) # Crystfel expects inverted values\n\n # Add clen distance\n self._outh5[\"/LCLS/detector_1/EncoderValue\"][:] = clen\n\n def optimize_and_close_file(\n self,\n num_hits: int,\n max_peaks: int,\n ):\n \"\"\"\n Resize data blocks and write additional information to the file\n\n Parameters:\n\n num_hits (int): Number of hits for which information has been saved to the\n file\n\n max_peaks (int): Maximum number of peaks (per event) for which information\n can be written into the file\n \"\"\"\n\n # Resize the entry_1 entry\n data_shape: Tuple[int, ...] = self._outh5[\"/entry_1/data_1/data\"].shape\n self._outh5[\"/entry_1/data_1/data\"].resize(\n (num_hits, data_shape[1], data_shape[2])\n )\n self._outh5[f\"/entry_1/result_1/nPeaks\"].resize((num_hits,))\n key: str\n for key in [\n \"peakXPosRaw\",\n \"peakYPosRaw\",\n \"rcent\",\n \"ccent\",\n \"rmin\",\n \"rmax\",\n \"cmin\",\n \"cmax\",\n \"peakTotalIntensity\",\n \"peakMaxIntensity\",\n \"peakRadius\",\n ]:\n self._outh5[f\"/entry_1/result_1/{key}\"].resize((num_hits, max_peaks))\n\n # Resize LCLS entry\n for key in [\n \"eventNumber\",\n \"machineTime\",\n \"machineTimeNanoSeconds\",\n \"fiducial\",\n \"detector_1/EncoderValue\",\n \"photon_energy_eV\",\n ]:\n self._outh5[f\"/LCLS/{key}\"].resize((num_hits,))\n self._outh5.close()\n
"},{"location":"source/tasks/sfx_find_peaks/#tasks.sfx_find_peaks.CxiWriter.__init__","title":"__init__(outdir, rank, exp, run, n_events, det_shape, min_peaks, max_peaks, i_x, i_y, ipx, ipy, tag)
","text":"Set up the CXI files to which peak finding results will be saved.
Parameters:
outdir (str): Output directory for cxi file.\n\nrank (int): MPI rank of the caller.\n\nexp (str): Experiment string.\n\nrun (int): Experimental run.\n\nn_events (int): Number of events to process.\n\ndet_shape (Tuple[int, int]): Shape of the numpy array storing the detector\n data. This must be a Cheetah-style 2D array.\n\nmin_peaks (int): Minimum number of peaks per image.\n\nmax_peaks (int): Maximum number of peaks per image.\n\ni_x (Any): Array of pixel indexes along x\n\ni_y (Any): Array of pixel indexes along y\n\nipx (Any): Pixel indexes with respect to detector origin (x component)\n\nipy (Any): Pixel indexes with respect to detector origin (y component)\n\ntag (str): Tag to append to cxi file names.\n
Source code in lute/tasks/sfx_find_peaks.py
def __init__(\n self,\n outdir: str,\n rank: int,\n exp: str,\n run: int,\n n_events: int,\n det_shape: Tuple[int, ...],\n min_peaks: int,\n max_peaks: int,\n i_x: Any, # Not typed becomes it comes from psana\n i_y: Any, # Not typed becomes it comes from psana\n ipx: Any, # Not typed becomes it comes from psana\n ipy: Any, # Not typed becomes it comes from psana\n tag: str,\n):\n \"\"\"\n Set up the CXI files to which peak finding results will be saved.\n\n Parameters:\n\n outdir (str): Output directory for cxi file.\n\n rank (int): MPI rank of the caller.\n\n exp (str): Experiment string.\n\n run (int): Experimental run.\n\n n_events (int): Number of events to process.\n\n det_shape (Tuple[int, int]): Shape of the numpy array storing the detector\n data. This must be aCheetah-stile 2D array.\n\n min_peaks (int): Minimum number of peaks per image.\n\n max_peaks (int): Maximum number of peaks per image.\n\n i_x (Any): Array of pixel indexes along x\n\n i_y (Any): Array of pixel indexes along y\n\n ipx (Any): Pixel indexes with respect to detector origin (x component)\n\n ipy (Any): Pixel indexes with respect to detector origin (y component)\n\n tag (str): Tag to append to cxi file names.\n \"\"\"\n self._det_shape: Tuple[int, ...] = det_shape\n self._i_x: Any = i_x\n self._i_y: Any = i_y\n self._ipx: Any = ipx\n self._ipy: Any = ipy\n self._index: int = 0\n\n # Create and open the HDF5 file\n fname: str = f\"{exp}_r{run:0>4}_{rank}{tag}.cxi\"\n Path(outdir).mkdir(exist_ok=True)\n self._outh5: Any = h5py.File(Path(outdir) / fname, \"w\")\n\n # Entry_1 entry for processing with CrystFEL\n entry_1: Any = self._outh5.create_group(\"entry_1\")\n keys: List[str] = [\n \"nPeaks\",\n \"peakXPosRaw\",\n \"peakYPosRaw\",\n \"rcent\",\n \"ccent\",\n \"rmin\",\n \"rmax\",\n \"cmin\",\n \"cmax\",\n \"peakTotalIntensity\",\n \"peakMaxIntensity\",\n \"peakRadius\",\n ]\n ds_expId: Any = entry_1.create_dataset(\n \"experimental_identifier\", (n_events,), maxshape=(None,), dtype=int\n )\n ds_expId.attrs[\"axes\"] = \"experiment_identifier\"\n data_1: Any = entry_1.create_dataset(\n \"/entry_1/data_1/data\",\n (n_events, det_shape[0], det_shape[1]),\n chunks=(1, det_shape[0], det_shape[1]),\n maxshape=(None, det_shape[0], det_shape[1]),\n dtype=numpy.float32,\n )\n data_1.attrs[\"axes\"] = \"experiment_identifier\"\n key: str\n for key in [\"powderHits\", \"powderMisses\", \"mask\"]:\n entry_1.create_dataset(\n f\"/entry_1/data_1/{key}\",\n (det_shape[0], det_shape[1]),\n chunks=(det_shape[0], det_shape[1]),\n maxshape=(det_shape[0], det_shape[1]),\n dtype=float,\n )\n\n # Peak-related entries\n for key in keys:\n if key == \"nPeaks\":\n ds_x: Any = self._outh5.create_dataset(\n f\"/entry_1/result_1/{key}\",\n (n_events,),\n maxshape=(None,),\n dtype=int,\n )\n ds_x.attrs[\"minPeaks\"] = min_peaks\n ds_x.attrs[\"maxPeaks\"] = max_peaks\n else:\n ds_x: Any = self._outh5.create_dataset(\n f\"/entry_1/result_1/{key}\",\n (n_events, max_peaks),\n maxshape=(None, max_peaks),\n chunks=(1, max_peaks),\n dtype=float,\n )\n ds_x.attrs[\"axes\"] = \"experiment_identifier:peaks\"\n\n # Timestamp entries\n lcls_1: Any = self._outh5.create_group(\"LCLS\")\n keys: List[str] = [\n \"eventNumber\",\n \"machineTime\",\n \"machineTimeNanoSeconds\",\n \"fiducial\",\n \"photon_energy_eV\",\n ]\n key: str\n for key in keys:\n if key == \"photon_energy_eV\":\n ds_x: Any = lcls_1.create_dataset(\n f\"{key}\", (n_events,), maxshape=(None,), dtype=float\n )\n else:\n ds_x = lcls_1.create_dataset(\n f\"{key}\", (n_events,), 
maxshape=(None,), dtype=int\n )\n ds_x.attrs[\"axes\"] = \"experiment_identifier\"\n\n ds_x = self._outh5.create_dataset(\n \"/LCLS/detector_1/EncoderValue\", (n_events,), maxshape=(None,), dtype=float\n )\n ds_x.attrs[\"axes\"] = \"experiment_identifier\"\n
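A hypothetical construction sketch (not from the LUTE reference) using dummy geometry; in practice det_shape and the index arrays come from psana's Detector interface, and importing this module requires a psana environment.

import numpy

from lute.tasks.sfx_find_peaks import CxiWriter

# Dummy Cheetah-style geometry: 1 panel of 8x8 pixels flattened to 2D.
n_panels, rows, cols = 1, 8, 8
i_x = numpy.zeros((n_panels, rows, cols), dtype=numpy.int64)
i_y = numpy.zeros((n_panels, rows, cols), dtype=numpy.int64)

writer = CxiWriter(
    outdir="peak_output",  # hypothetical output directory
    rank=0,
    exp="mfxp12345",       # hypothetical experiment string
    run=1,
    n_events=10,
    det_shape=(n_panels * rows, cols),
    min_peaks=2,
    max_peaks=2048,
    i_x=i_x,
    i_y=i_y,
    ipx=0,
    ipy=0,
    tag="_test",
)
# Normally write_event is called per hit; here we simply finalize the file.
writer.optimize_and_close_file(num_hits=0, max_peaks=2048)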
"},{"location":"source/tasks/sfx_find_peaks/#tasks.sfx_find_peaks.CxiWriter.optimize_and_close_file","title":"optimize_and_close_file(num_hits, max_peaks)
","text":"Resize data blocks and write additional information to the file
Parameters:
num_hits (int): Number of hits for which information has been saved to the\n file\n\nmax_peaks (int): Maximum number of peaks (per event) for which information\n can be written into the file\n
Source code in lute/tasks/sfx_find_peaks.py
def optimize_and_close_file(\n self,\n num_hits: int,\n max_peaks: int,\n):\n \"\"\"\n Resize data blocks and write additional information to the file\n\n Parameters:\n\n num_hits (int): Number of hits for which information has been saved to the\n file\n\n max_peaks (int): Maximum number of peaks (per event) for which information\n can be written into the file\n \"\"\"\n\n # Resize the entry_1 entry\n data_shape: Tuple[int, ...] = self._outh5[\"/entry_1/data_1/data\"].shape\n self._outh5[\"/entry_1/data_1/data\"].resize(\n (num_hits, data_shape[1], data_shape[2])\n )\n self._outh5[f\"/entry_1/result_1/nPeaks\"].resize((num_hits,))\n key: str\n for key in [\n \"peakXPosRaw\",\n \"peakYPosRaw\",\n \"rcent\",\n \"ccent\",\n \"rmin\",\n \"rmax\",\n \"cmin\",\n \"cmax\",\n \"peakTotalIntensity\",\n \"peakMaxIntensity\",\n \"peakRadius\",\n ]:\n self._outh5[f\"/entry_1/result_1/{key}\"].resize((num_hits, max_peaks))\n\n # Resize LCLS entry\n for key in [\n \"eventNumber\",\n \"machineTime\",\n \"machineTimeNanoSeconds\",\n \"fiducial\",\n \"detector_1/EncoderValue\",\n \"photon_energy_eV\",\n ]:\n self._outh5[f\"/LCLS/{key}\"].resize((num_hits,))\n self._outh5.close()\n
"},{"location":"source/tasks/sfx_find_peaks/#tasks.sfx_find_peaks.CxiWriter.write_event","title":"write_event(img, peaks, timestamp_seconds, timestamp_nanoseconds, timestamp_fiducials, photon_energy)
","text":"Write peak finding results for an event into the HDF5 file.
Parameters:
img (NDArray[numpy.float_]): Detector data for the event\n\npeaks: (Any): Peak information for the event, as recovered from the PyAlgos\n algorithm\n\ntimestamp_seconds (int): Second part of the event's timestamp information\n\ntimestamp_nanoseconds (int): Nanosecond part of the event's timestamp\n information\n\ntimestamp_fiducials (int): Fiducials part of the event's timestamp\n information\n\nphoton_energy (float): Photon energy for the event\n
Source code in lute/tasks/sfx_find_peaks.py
def write_event(\n self,\n img: NDArray[numpy.float_],\n peaks: Any, # Not typed becomes it comes from psana\n timestamp_seconds: int,\n timestamp_nanoseconds: int,\n timestamp_fiducials: int,\n photon_energy: float,\n):\n \"\"\"\n Write peak finding results for an event into the HDF5 file.\n\n Parameters:\n\n img (NDArray[numpy.float_]): Detector data for the event\n\n peaks: (Any): Peak information for the event, as recovered from the PyAlgos\n algorithm\n\n timestamp_seconds (int): Second part of the event's timestamp information\n\n timestamp_nanoseconds (int): Nanosecond part of the event's timestamp\n information\n\n timestamp_fiducials (int): Fiducials part of the event's timestamp\n information\n\n photon_energy (float): Photon energy for the event\n \"\"\"\n ch_rows: NDArray[numpy.float_] = peaks[:, 0] * self._det_shape[1] + peaks[:, 1]\n ch_cols: NDArray[numpy.float_] = peaks[:, 2]\n\n if self._outh5[\"/entry_1/data_1/data\"].shape[0] <= self._index:\n self._outh5[\"entry_1/data_1/data\"].resize(self._index + 1, axis=0)\n ds_key: str\n for ds_key in self._outh5[\"/entry_1/result_1\"].keys():\n self._outh5[f\"/entry_1/result_1/{ds_key}\"].resize(\n self._index + 1, axis=0\n )\n for ds_key in (\n \"machineTime\",\n \"machineTimeNanoSeconds\",\n \"fiducial\",\n \"photon_energy_eV\",\n ):\n self._outh5[f\"/LCLS/{ds_key}\"].resize(self._index + 1, axis=0)\n\n # Entry_1 entry for processing with CrystFEL\n self._outh5[\"/entry_1/data_1/data\"][self._index, :, :] = img.reshape(\n -1, img.shape[-1]\n )\n self._outh5[\"/entry_1/result_1/nPeaks\"][self._index] = peaks.shape[0]\n self._outh5[\"/entry_1/result_1/peakXPosRaw\"][self._index, : peaks.shape[0]] = (\n ch_cols.astype(\"int\")\n )\n self._outh5[\"/entry_1/result_1/peakYPosRaw\"][self._index, : peaks.shape[0]] = (\n ch_rows.astype(\"int\")\n )\n self._outh5[\"/entry_1/result_1/rcent\"][self._index, : peaks.shape[0]] = peaks[\n :, 6\n ]\n self._outh5[\"/entry_1/result_1/ccent\"][self._index, : peaks.shape[0]] = peaks[\n :, 7\n ]\n self._outh5[\"/entry_1/result_1/rmin\"][self._index, : peaks.shape[0]] = peaks[\n :, 10\n ]\n self._outh5[\"/entry_1/result_1/rmax\"][self._index, : peaks.shape[0]] = peaks[\n :, 11\n ]\n self._outh5[\"/entry_1/result_1/cmin\"][self._index, : peaks.shape[0]] = peaks[\n :, 12\n ]\n self._outh5[\"/entry_1/result_1/cmax\"][self._index, : peaks.shape[0]] = peaks[\n :, 13\n ]\n self._outh5[\"/entry_1/result_1/peakTotalIntensity\"][\n self._index, : peaks.shape[0]\n ] = peaks[:, 5]\n self._outh5[\"/entry_1/result_1/peakMaxIntensity\"][\n self._index, : peaks.shape[0]\n ] = peaks[:, 4]\n\n # Calculate and write pixel radius\n peaks_cenx: NDArray[numpy.float_] = (\n self._i_x[\n numpy.array(peaks[:, 0], dtype=numpy.int64),\n numpy.array(peaks[:, 1], dtype=numpy.int64),\n numpy.array(peaks[:, 2], dtype=numpy.int64),\n ]\n + 0.5\n - self._ipx\n )\n peaks_ceny: NDArray[numpy.float_] = (\n self._i_y[\n numpy.array(peaks[:, 0], dtype=numpy.int64),\n numpy.array(peaks[:, 1], dtype=numpy.int64),\n numpy.array(peaks[:, 2], dtype=numpy.int64),\n ]\n + 0.5\n - self._ipy\n )\n peak_radius: NDArray[numpy.float_] = numpy.sqrt(\n (peaks_cenx**2) + (peaks_ceny**2)\n )\n self._outh5[\"/entry_1/result_1/peakRadius\"][\n self._index, : peaks.shape[0]\n ] = peak_radius\n\n # LCLS entry dataset\n self._outh5[\"/LCLS/machineTime\"][self._index] = timestamp_seconds\n self._outh5[\"/LCLS/machineTimeNanoSeconds\"][self._index] = timestamp_nanoseconds\n self._outh5[\"/LCLS/fiducial\"][self._index] = timestamp_fiducials\n 
self._outh5[\"/LCLS/photon_energy_eV\"][self._index] = photon_energy\n\n self._index += 1\n
"},{"location":"source/tasks/sfx_find_peaks/#tasks.sfx_find_peaks.CxiWriter.write_non_event_data","title":"write_non_event_data(powder_hits, powder_misses, mask, clen)
","text":"Write to the file data that is not related to a specific event (masks, powders)
Parameters:
powder_hits (NDArray[numpy.float_]): Virtual powder pattern from hits\n\npowder_misses (NDArray[numpy.float_]): Virtual powder pattern from misses\n\nmask (NDArray[numpy.uint16]): Pixel mask to write into the file\n
Source code in lute/tasks/sfx_find_peaks.py
def write_non_event_data(\n self,\n powder_hits: NDArray[numpy.float_],\n powder_misses: NDArray[numpy.float_],\n mask: NDArray[numpy.uint16],\n clen: float,\n):\n \"\"\"\n Write to the file data that is not related to a specific event (masks, powders)\n\n Parameters:\n\n powder_hits (NDArray[numpy.float_]): Virtual powder pattern from hits\n\n powder_misses (NDArray[numpy.float_]): Virtual powder pattern from hits\n\n mask: (NDArray[numpy.uint16]): Pixel ask to write into the file\n\n \"\"\"\n # Add powders and mask to files, reshaping them to match the crystfel\n # convention\n self._outh5[\"/entry_1/data_1/powderHits\"][:] = powder_hits.reshape(\n -1, powder_hits.shape[-1]\n )\n self._outh5[\"/entry_1/data_1/powderMisses\"][:] = powder_misses.reshape(\n -1, powder_misses.shape[-1]\n )\n self._outh5[\"/entry_1/data_1/mask\"][:] = (1 - mask).reshape(\n -1, mask.shape[-1]\n ) # Crystfel expects inverted values\n\n # Add clen distance\n self._outh5[\"/LCLS/detector_1/EncoderValue\"][:] = clen\n
"},{"location":"source/tasks/sfx_find_peaks/#tasks.sfx_find_peaks.FindPeaksPyAlgos","title":"FindPeaksPyAlgos
","text":" Bases: Task
Task that performs peak finding using the PyAlgos peak finding algorithms and writes the peak information to CXI files.
Source code in lute/tasks/sfx_find_peaks.py
class FindPeaksPyAlgos(Task):\n \"\"\"\n Task that performs peak finding using the PyAlgos peak finding algorithms and\n writes the peak information to CXI files.\n \"\"\"\n\n def __init__(self, *, params: TaskParameters, use_mpi: bool = True) -> None:\n super().__init__(params=params, use_mpi=use_mpi)\n if self._task_parameters.compression is not None:\n from libpressio import PressioCompressor\n\n def _run(self) -> None:\n ds: Any = MPIDataSource(\n f\"exp={self._task_parameters.lute_config.experiment}:\"\n f\"run={self._task_parameters.lute_config.run}:smd\"\n )\n if self._task_parameters.n_events != 0:\n ds.break_after(self._task_parameters.n_events)\n\n det: Any = Detector(self._task_parameters.det_name)\n det.do_reshape_2d_to_3d(flag=True)\n\n evr: Any = Detector(self._task_parameters.event_receiver)\n\n i_x: Any = det.indexes_x(self._task_parameters.lute_config.run).astype(\n numpy.int64\n )\n i_y: Any = det.indexes_y(self._task_parameters.lute_config.run).astype(\n numpy.int64\n )\n ipx: Any\n ipy: Any\n ipx, ipy = det.point_indexes(\n self._task_parameters.lute_config.run, pxy_um=(0, 0)\n )\n\n alg: Any = None\n num_hits: int = 0\n num_events: int = 0\n num_empty_images: int = 0\n tag: str = self._task_parameters.tag\n if (tag != \"\") and (tag[0] != \"_\"):\n tag = \"_\" + tag\n\n evt: Any\n for evt in ds.events():\n\n evt_id: Any = evt.get(EventId)\n timestamp_seconds: int = evt_id.time()[0]\n timestamp_nanoseconds: int = evt_id.time()[1]\n timestamp_fiducials: int = evt_id.fiducials()\n event_codes: Any = evr.eventCodes(evt)\n\n if isinstance(self._task_parameters.pv_camera_length, float):\n clen: float = self._task_parameters.pv_camera_length\n else:\n clen = (\n ds.env().epicsStore().value(self._task_parameters.pv_camera_length)\n )\n\n if self._task_parameters.event_logic:\n if not self._task_parameters.event_code in event_codes:\n continue\n\n img: Any = det.calib(evt)\n\n if img is None:\n num_empty_images += 1\n continue\n\n if alg is None:\n det_shape: Tuple[int, ...] 
= img.shape\n if len(det_shape) == 3:\n det_shape = (det_shape[0] * det_shape[1], det_shape[2])\n else:\n det_shape = img.shape\n\n mask: NDArray[numpy.uint16] = numpy.ones(det_shape).astype(numpy.uint16)\n\n if self._task_parameters.psana_mask:\n mask = det.mask(\n self.task_parameters.run,\n calib=False,\n status=True,\n edges=False,\n centra=False,\n unbond=False,\n unbondnbrs=False,\n ).astype(numpy.uint16)\n\n hdffh: Any\n if self._task_parameters.mask_file is not None:\n with h5py.File(self._task_parameters.mask_file, \"r\") as hdffh:\n loaded_mask: NDArray[numpy.int] = hdffh[\"entry_1/data_1/mask\"][\n :\n ]\n mask *= loaded_mask.astype(numpy.uint16)\n\n file_writer: CxiWriter = CxiWriter(\n outdir=self._task_parameters.outdir,\n rank=ds.rank,\n exp=self._task_parameters.lute_config.experiment,\n run=self._task_parameters.lute_config.run,\n n_events=self._task_parameters.n_events,\n det_shape=det_shape,\n i_x=i_x,\n i_y=i_y,\n ipx=ipx,\n ipy=ipy,\n min_peaks=self._task_parameters.min_peaks,\n max_peaks=self._task_parameters.max_peaks,\n tag=tag,\n )\n alg: Any = PyAlgos(mask=mask, pbits=0) # pbits controls verbosity\n alg.set_peak_selection_pars(\n npix_min=self._task_parameters.npix_min,\n npix_max=self._task_parameters.npix_max,\n amax_thr=self._task_parameters.amax_thr,\n atot_thr=self._task_parameters.atot_thr,\n son_min=self._task_parameters.son_min,\n )\n\n if self._task_parameters.compression is not None:\n\n libpressio_config = generate_libpressio_configuration(\n compressor=self._task_parameters.compression.compressor,\n roi_window_size=self._task_parameters.compression.roi_window_size,\n bin_size=self._task_parameters.compression.bin_size,\n abs_error=self._task_parameters.compression.abs_error,\n libpressio_mask=mask,\n )\n\n powder_hits: NDArray[numpy.float_] = numpy.zeros(det_shape)\n powder_misses: NDArray[numpy.float_] = numpy.zeros(det_shape)\n\n peaks: Any = alg.peak_finder_v3r3(\n img,\n rank=self._task_parameters.peak_rank,\n r0=self._task_parameters.r0,\n dr=self._task_parameters.dr,\n # nsigm=self._task_parameters.nsigm,\n )\n\n num_events += 1\n\n if (peaks.shape[0] >= self._task_parameters.min_peaks) and (\n peaks.shape[0] <= self._task_parameters.max_peaks\n ):\n\n if self._task_parameters.compression is not None:\n\n libpressio_config_with_peaks = (\n add_peaks_to_libpressio_configuration(libpressio_config, peaks)\n )\n compressor = PressioCompressor.from_config(\n libpressio_config_with_peaks\n )\n compressed_img = compressor.encode(img)\n decompressed_img = numpy.zeros_like(img)\n decompressed = compressor.decode(compressed_img, decompressed_img)\n img = decompressed_img\n\n try:\n photon_energy: float = (\n Detector(\"EBeam\").get(evt).ebeamPhotonEnergy()\n )\n except AttributeError:\n photon_energy = (\n 1.23984197386209e-06\n / ds.env().epicsStore().value(\"SIOC:SYS0:ML00:AO192\")\n / 1.0e9\n )\n\n file_writer.write_event(\n img=img,\n peaks=peaks,\n timestamp_seconds=timestamp_seconds,\n timestamp_nanoseconds=timestamp_nanoseconds,\n timestamp_fiducials=timestamp_fiducials,\n photon_energy=photon_energy,\n )\n num_hits += 1\n\n # TODO: Fix bug here\n # generate / update powders\n if peaks.shape[0] >= self._task_parameters.min_peaks:\n powder_hits = numpy.maximum(\n powder_hits,\n img.reshape(-1, img.shape[-1]),\n )\n else:\n powder_misses = numpy.maximum(\n powder_misses,\n img.reshape(-1, img.shape[-1]),\n )\n\n if num_empty_images != 0:\n msg: Message = Message(\n contents=f\"Rank {ds.rank} encountered {num_empty_images} empty images.\"\n )\n 
self._report_to_executor(msg)\n\n file_writer.write_non_event_data(\n powder_hits=powder_hits,\n powder_misses=powder_misses,\n mask=mask,\n clen=clen,\n )\n\n file_writer.optimize_and_close_file(\n num_hits=num_hits, max_peaks=self._task_parameters.max_peaks\n )\n\n COMM_WORLD.Barrier()\n\n num_hits_per_rank: List[int] = COMM_WORLD.gather(num_hits, root=0)\n num_hits_total: int = COMM_WORLD.reduce(num_hits, SUM)\n num_events_per_rank: List[int] = COMM_WORLD.gather(num_events, root=0)\n\n if ds.rank == 0:\n master_fname: Path = write_master_file(\n mpi_size=ds.size,\n outdir=self._task_parameters.outdir,\n exp=self._task_parameters.lute_config.experiment,\n run=self._task_parameters.lute_config.run,\n tag=tag,\n n_hits_per_rank=num_hits_per_rank,\n n_hits_total=num_hits_total,\n )\n\n # Write final summary file\n f: TextIO\n with open(\n Path(self._task_parameters.outdir) / f\"peakfinding{tag}.summary\", \"w\"\n ) as f:\n print(f\"Number of events processed: {num_events_per_rank[-1]}\", file=f)\n print(f\"Number of hits found: {num_hits_total}\", file=f)\n print(\n \"Fractional hit rate: \"\n f\"{(num_hits_total/num_events_per_rank[-1]):.2f}\",\n file=f,\n )\n print(f\"No. hits per rank: {num_hits_per_rank}\", file=f)\n\n with open(Path(self._task_parameters.out_file), \"w\") as f:\n print(f\"{master_fname}\", file=f)\n\n # Write out_file\n\n def _post_run(self) -> None:\n super()._post_run()\n self._result.task_status = TaskStatus.COMPLETED\n
"},{"location":"source/tasks/sfx_find_peaks/#tasks.sfx_find_peaks.add_peaks_to_libpressio_configuration","title":"add_peaks_to_libpressio_configuration(lp_json, peaks)
","text":"Add peak infromation to libpressio configuration
Parameters:
lp_json: Dictionary storing the configuration JSON structure for the libpressio\n library.\n\npeaks (Any): Peak information as returned by psana.\n
Returns:
lp_json: Updated configuration JSON structure for the libpressio library.\n
Source code in lute/tasks/sfx_find_peaks.py
def add_peaks_to_libpressio_configuration(lp_json, peaks) -> Dict[str, Any]:\n \"\"\"\n Add peak infromation to libpressio configuration\n\n Parameters:\n\n lp_json: Dictionary storing the configuration JSON structure for the libpressio\n library.\n\n peaks (Any): Peak information as returned by psana.\n\n Returns:\n\n lp_json: Updated configuration JSON structure for the libpressio library.\n \"\"\"\n lp_json[\"compressor_config\"][\"pressio\"][\"roibin\"][\"roibin:centers\"] = (\n numpy.ascontiguousarray(numpy.uint64(peaks[:, [2, 1, 0]]))\n )\n return lp_json\n
"},{"location":"source/tasks/sfx_find_peaks/#tasks.sfx_find_peaks.generate_libpressio_configuration","title":"generate_libpressio_configuration(compressor, roi_window_size, bin_size, abs_error, libpressio_mask)
","text":"Create the configuration JSON for the libpressio library
Parameters:
compressor (Literal[\"sz3\", \"qoz\"]): Compression algorithm to use\n (\"qoz\" or \"sz3\").\n\nabs_error (float): Bound value for the absolute error.\n\nbin_size (int): Bining Size.\n\nroi_window_size (int): Default size of the ROI window.\n\nlibpressio_mask (NDArray): mask to be applied to the data.\n
Returns:
lp_json (Dict[str, Any]): Dictionary storing the JSON configuration structure\nfor the libpressio library\n
Source code in lute/tasks/sfx_find_peaks.py
def generate_libpressio_configuration(\n compressor: Literal[\"sz3\", \"qoz\"],\n roi_window_size: int,\n bin_size: int,\n abs_error: float,\n libpressio_mask,\n) -> Dict[str, Any]:\n \"\"\"\n Create the configuration JSON for the libpressio library\n\n Parameters:\n\n compressor (Literal[\"sz3\", \"qoz\"]): Compression algorithm to use\n (\"qoz\" or \"sz3\").\n\n abs_error (float): Bound value for the absolute error.\n\n bin_size (int): Bining Size.\n\n roi_window_size (int): Default size of the ROI window.\n\n libpressio_mask (NDArray): mask to be applied to the data.\n\n Returns:\n\n lp_json (Dict[str, Any]): Dictionary storing the JSON configuration structure\n for the libpressio library\n \"\"\"\n\n if compressor == \"qoz\":\n pressio_opts: Dict[str, Any] = {\n \"pressio:abs\": abs_error,\n \"qoz\": {\"qoz:stride\": 8},\n }\n elif compressor == \"sz3\":\n pressio_opts = {\"pressio:abs\": abs_error}\n\n lp_json = {\n \"compressor_id\": \"pressio\",\n \"early_config\": {\n \"pressio\": {\n \"pressio:compressor\": \"roibin\",\n \"roibin\": {\n \"roibin:metric\": \"composite\",\n \"roibin:background\": \"mask_binning\",\n \"roibin:roi\": \"fpzip\",\n \"background\": {\n \"binning:compressor\": \"pressio\",\n \"mask_binning:compressor\": \"pressio\",\n \"pressio\": {\"pressio:compressor\": compressor},\n },\n \"composite\": {\n \"composite:plugins\": [\n \"size\",\n \"time\",\n \"input_stats\",\n \"error_stat\",\n ]\n },\n },\n }\n },\n \"compressor_config\": {\n \"pressio\": {\n \"roibin\": {\n \"roibin:roi_size\": [roi_window_size, roi_window_size, 0],\n \"roibin:centers\": None, # \"roibin:roi_strategy\": \"coordinates\",\n \"roibin:nthreads\": 4,\n \"roi\": {\"fpzip:prec\": 0},\n \"background\": {\n \"mask_binning:mask\": None,\n \"mask_binning:shape\": [bin_size, bin_size, 1],\n \"mask_binning:nthreads\": 4,\n \"pressio\": pressio_opts,\n },\n }\n }\n },\n \"name\": \"pressio\",\n }\n\n lp_json[\"compressor_config\"][\"pressio\"][\"roibin\"][\"background\"][\n \"mask_binning:mask\"\n ] = (1 - libpressio_mask)\n\n return lp_json\n
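A rough usage sketch (values are illustrative, and calling this requires the psana/libpressio-capable environment the Task assumes).

import numpy

from lute.tasks.sfx_find_peaks import generate_libpressio_configuration

# Sketch: build a compression configuration for a small dummy mask.
mask = numpy.ones((8, 8), dtype=numpy.uint16)
config = generate_libpressio_configuration(
    compressor="sz3",
    roi_window_size=9,
    bin_size=2,
    abs_error=10.0,
    libpressio_mask=mask,
)
# The ROI size and binning shape end up nested under the "roibin" plugin:
print(config["compressor_config"]["pressio"]["roibin"]["roibin:roi_size"])  # [9, 9, 0]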
"},{"location":"source/tasks/sfx_find_peaks/#tasks.sfx_find_peaks.write_master_file","title":"write_master_file(mpi_size, outdir, exp, run, tag, n_hits_per_rank, n_hits_total)
","text":"Generate a virtual dataset to map all individual files for this run.
Parameters:
mpi_size (int): Number of ranks in the MPI pool.\n\noutdir (str): Output directory for cxi file.\n\nexp (str): Experiment string.\n\nrun (int): Experimental run.\n\ntag (str): Tag to append to cxi file names.\n\nn_hits_per_rank (List[int]): Array containing the number of hits found on each\n node processing data.\n\nn_hits_total (int): Total number of hits found across all nodes.\n
Returns:
The path to the written master file
Source code in lute/tasks/sfx_find_peaks.py
def write_master_file(\n mpi_size: int,\n outdir: str,\n exp: str,\n run: int,\n tag: str,\n n_hits_per_rank: List[int],\n n_hits_total: int,\n) -> Path:\n \"\"\"\n Generate a virtual dataset to map all individual files for this run.\n\n Parameters:\n\n mpi_size (int): Number of ranks in the MPI pool.\n\n outdir (str): Output directory for cxi file.\n\n exp (str): Experiment string.\n\n run (int): Experimental run.\n\n tag (str): Tag to append to cxi file names.\n\n n_hits_per_rank (List[int]): Array containing the number of hits found on each\n node processing data.\n\n n_hits_total (int): Total number of hits found across all nodes.\n\n Returns:\n\n The path to the the written master file\n \"\"\"\n # Retrieve paths to the files containing data\n fnames: List[Path] = []\n fi: int\n for fi in range(mpi_size):\n if n_hits_per_rank[fi] > 0:\n fnames.append(Path(outdir) / f\"{exp}_r{run:0>4}_{fi}{tag}.cxi\")\n if len(fnames) == 0:\n sys.exit(\"No hits found\")\n\n # Retrieve list of entries to populate in the virtual hdf5 file\n dname_list, key_list, shape_list, dtype_list = [], [], [], []\n datasets = [\"/entry_1/result_1\", \"/LCLS/detector_1\", \"/LCLS\", \"/entry_1/data_1\"]\n f = h5py.File(fnames[0], \"r\")\n for dname in datasets:\n dset = f[dname]\n for key in dset.keys():\n if f\"{dname}/{key}\" not in datasets:\n dname_list.append(dname)\n key_list.append(key)\n shape_list.append(dset[key].shape)\n dtype_list.append(dset[key].dtype)\n f.close()\n\n # Compute cumulative powder hits and misses for all files\n powder_hits, powder_misses = None, None\n for fn in fnames:\n f = h5py.File(fn, \"r\")\n if powder_hits is None:\n powder_hits = f[\"entry_1/data_1/powderHits\"][:].copy()\n powder_misses = f[\"entry_1/data_1/powderMisses\"][:].copy()\n else:\n powder_hits = numpy.maximum(\n powder_hits, f[\"entry_1/data_1/powderHits\"][:].copy()\n )\n powder_misses = numpy.maximum(\n powder_misses, f[\"entry_1/data_1/powderMisses\"][:].copy()\n )\n f.close()\n\n vfname: Path = Path(outdir) / f\"{exp}_r{run:0>4}{tag}.cxi\"\n with h5py.File(vfname, \"w\") as vdf:\n\n # Write the virtual hdf5 file\n for dnum in range(len(dname_list)):\n dname = f\"{dname_list[dnum]}/{key_list[dnum]}\"\n if key_list[dnum] not in [\"mask\", \"powderHits\", \"powderMisses\"]:\n layout = h5py.VirtualLayout(\n shape=(n_hits_total,) + shape_list[dnum][1:], dtype=dtype_list[dnum]\n )\n cursor = 0\n for i, fn in enumerate(fnames):\n vsrc = h5py.VirtualSource(\n fn, dname, shape=(n_hits_per_rank[i],) + shape_list[dnum][1:]\n )\n if len(shape_list[dnum]) == 1:\n layout[cursor : cursor + n_hits_per_rank[i]] = vsrc\n else:\n layout[cursor : cursor + n_hits_per_rank[i], :] = vsrc\n cursor += n_hits_per_rank[i]\n vdf.create_virtual_dataset(dname, layout, fillvalue=-1)\n\n vdf[\"entry_1/data_1/powderHits\"] = powder_hits\n vdf[\"entry_1/data_1/powderMisses\"] = powder_misses\n\n return vfname\n
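For orientation only, the merged virtual file can be inspected with h5py like any other CXI file; the path below is hypothetical.

import h5py

# Sketch: read peak counts back from the merged virtual dataset.
with h5py.File("mfxp12345_r0001.cxi", "r") as fh:
    n_peaks = fh["/entry_1/result_1/nPeaks"][:]
    print(f"{n_peaks.size} hits; most peaks in one event: {n_peaks.max()}")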
"},{"location":"source/tasks/sfx_index/","title":"sfx_index","text":"Classes for indexing tasks in SFX.
Classes:
Name DescriptionConcatenateStreamFiles
Task that merges multiple stream files into a single file.
"},{"location":"source/tasks/sfx_index/#tasks.sfx_index.ConcatenateStreamFiles","title":"ConcatenateStreamFiles
","text":" Bases: Task
Task that merges stream files located within a directory tree.
Source code in lute/tasks/sfx_index.py
class ConcatenateStreamFiles(Task):\n \"\"\"\n Task that merges stream files located within a directory tree.\n \"\"\"\n\n def __init__(self, *, params: TaskParameters) -> None:\n super().__init__(params=params)\n\n def _run(self) -> None:\n\n stream_file_path: Path = Path(self._task_parameters.in_file)\n stream_file_list: List[Path] = list(\n stream_file_path.rglob(f\"{self._task_parameters.tag}_*.stream\")\n )\n\n processed_file_list = [str(stream_file) for stream_file in stream_file_list]\n\n msg: Message = Message(\n contents=f\"Merging following stream files: {processed_file_list} into \"\n f\"{self._task_parameters.out_file}\",\n )\n self._report_to_executor(msg)\n\n wfd: BinaryIO\n with open(self._task_parameters.out_file, \"wb\") as wfd:\n infile: Path\n for infile in stream_file_list:\n fd: BinaryIO\n with open(infile, \"rb\") as fd:\n shutil.copyfileobj(fd, wfd)\n
"},{"location":"source/tasks/task/","title":"task","text":"Base classes for implementing analysis tasks.
Classes:
Name DescriptionTask
Abstract base class from which all analysis tasks are derived.
ThirdPartyTask
Class to run a third-party executable binary as a Task
.
DescribedAnalysis
dataclass
","text":"Complete analysis description. Held by an Executor.
Source code in lute/tasks/dataclasses.py
@dataclass\nclass DescribedAnalysis:\n \"\"\"Complete analysis description. Held by an Executor.\"\"\"\n\n task_result: TaskResult\n task_parameters: Optional[TaskParameters]\n task_env: Dict[str, str]\n poll_interval: float\n communicator_desc: List[str]\n
"},{"location":"source/tasks/task/#tasks.task.ElogSummaryPlots","title":"ElogSummaryPlots
dataclass
","text":"Holds a graphical summary intended for display in the eLog.
Attributes:
Name Type Descriptiondisplay_name
str
This represents both a path and how the result will be displayed in the eLog. Can include \"/\" characters. E.g. display_name = \"scans/my_motor_scan\"
will have plots shown on a \"my_motor_scan\" page, under a \"scans\" tab. This format mirrors how the file is stored on disk as well.
lute/tasks/dataclasses.py
@dataclass\nclass ElogSummaryPlots:\n \"\"\"Holds a graphical summary intended for display in the eLog.\n\n Attributes:\n display_name (str): This represents both a path and how the result will be\n displayed in the eLog. Can include \"/\" characters. E.g.\n `display_name = \"scans/my_motor_scan\"` will have plots shown\n on a \"my_motor_scan\" page, under a \"scans\" tab. This format mirrors\n how the file is stored on disk as well.\n \"\"\"\n\n display_name: str\n figures: Union[pn.Tabs, hv.Image, plt.Figure]\n
"},{"location":"source/tasks/task/#tasks.task.Task","title":"Task
","text":" Bases: ABC
Abstract base class for analysis tasks.
Attributes:
Name Type Descriptionname
str
The name of the Task.
Source code in lute/tasks/task.py
class Task(ABC):\n \"\"\"Abstract base class for analysis tasks.\n\n Attributes:\n name (str): The name of the Task.\n \"\"\"\n\n def __init__(self, *, params: TaskParameters, use_mpi: bool = False) -> None:\n \"\"\"Initialize a Task.\n\n Args:\n params (TaskParameters): Parameters needed to properly configure\n the analysis task. These are NOT related to execution parameters\n (number of cores, etc), except, potentially, in case of binary\n executable sub-classes.\n\n use_mpi (bool): Whether this Task requires the use of MPI.\n This determines the behaviour and timing of certain signals\n and ensures appropriate barriers are placed to not end\n processing until all ranks have finished.\n \"\"\"\n self.name: str = str(type(self)).split(\"'\")[1].split(\".\")[-1]\n self._result: TaskResult = TaskResult(\n task_name=self.name,\n task_status=TaskStatus.PENDING,\n summary=\"PENDING\",\n payload=\"\",\n )\n self._task_parameters: TaskParameters = params\n timeout: int = self._task_parameters.lute_config.task_timeout\n signal.setitimer(signal.ITIMER_REAL, timeout)\n\n run_directory: Optional[str] = self._task_parameters.Config.run_directory\n if run_directory is not None:\n try:\n os.chdir(run_directory)\n except FileNotFoundError:\n warnings.warn(\n (\n f\"Attempt to change to {run_directory}, but it is not found!\\n\"\n f\"Will attempt to run from {os.getcwd()}. It may fail!\"\n ),\n category=UserWarning,\n )\n self._use_mpi: bool = use_mpi\n\n def run(self) -> None:\n \"\"\"Calls the analysis routines and any pre/post task functions.\n\n This method is part of the public API and should not need to be modified\n in any subclasses.\n \"\"\"\n self._signal_start()\n self._pre_run()\n self._run()\n self._post_run()\n self._signal_result()\n\n @abstractmethod\n def _run(self) -> None:\n \"\"\"Actual analysis to run. 
Overridden by subclasses.\n\n Separating the calling API from the implementation allows `run` to\n have pre and post task functionality embedded easily into a single\n function call.\n \"\"\"\n ...\n\n def _pre_run(self) -> None:\n \"\"\"Code to run BEFORE the main analysis takes place.\n\n This function may, or may not, be employed by subclasses.\n \"\"\"\n ...\n\n def _post_run(self) -> None:\n \"\"\"Code to run AFTER the main analysis takes place.\n\n This function may, or may not, be employed by subclasses.\n \"\"\"\n ...\n\n @property\n def result(self) -> TaskResult:\n \"\"\"TaskResult: Read-only Task Result information.\"\"\"\n return self._result\n\n def __call__(self) -> None:\n self.run()\n\n def _signal_start(self) -> None:\n \"\"\"Send the signal that the Task will begin shortly.\"\"\"\n start_msg: Message = Message(\n contents=self._task_parameters, signal=\"TASK_STARTED\"\n )\n self._result.task_status = TaskStatus.RUNNING\n if self._use_mpi:\n from mpi4py import MPI\n\n comm: MPI.Intracomm = MPI.COMM_WORLD\n rank: int = comm.Get_rank()\n comm.Barrier()\n if rank == 0:\n self._report_to_executor(start_msg)\n else:\n self._report_to_executor(start_msg)\n\n def _signal_result(self) -> None:\n \"\"\"Send the signal that results are ready along with the results.\"\"\"\n signal: str = \"TASK_RESULT\"\n results_msg: Message = Message(contents=self.result, signal=signal)\n if self._use_mpi:\n from mpi4py import MPI\n\n comm: MPI.Intracomm = MPI.COMM_WORLD\n rank: int = comm.Get_rank()\n comm.Barrier()\n if rank == 0:\n self._report_to_executor(results_msg)\n else:\n self._report_to_executor(results_msg)\n time.sleep(0.1)\n\n def _report_to_executor(self, msg: Message) -> None:\n \"\"\"Send a message to the Executor.\n\n Details of `Communicator` choice are hidden from the caller. This\n method may be overriden by subclasses with specialized functionality.\n\n Args:\n msg (Message): The message object to send.\n \"\"\"\n communicator: Communicator\n if isinstance(msg.contents, str) or msg.contents is None:\n communicator = PipeCommunicator()\n else:\n communicator = SocketCommunicator()\n\n communicator.delayed_setup()\n communicator.write(msg)\n communicator.clear_communicator()\n\n def clean_up_timeout(self) -> None:\n \"\"\"Perform any necessary cleanup actions before exit if timing out.\"\"\"\n ...\n
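A minimal subclass sketch (the TaskParameters import path below is an assumption); only _run must be overridden, while the public run() adds the start/result signalling shown above.

from lute.io.models.base import TaskParameters  # assumed import path
from lute.tasks.dataclasses import TaskStatus
from lute.tasks.task import Task


class SquareTask(Task):
    """Hypothetical Task that squares a numeric parameter."""

    def __init__(self, *, params: TaskParameters) -> None:
        super().__init__(params=params)

    def _run(self) -> None:
        # `value` is a hypothetical field on the corresponding parameter model.
        value: float = getattr(self._task_parameters, "value", 2.0)
        self._result.payload = value**2
        self._result.summary = f"Squared {value}."

    def _post_run(self) -> None:
        super()._post_run()
        self._result.task_status = TaskStatus.COMPLETED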
"},{"location":"source/tasks/task/#tasks.task.Task.result","title":"result: TaskResult
property
","text":"TaskResult: Read-only Task Result information.
"},{"location":"source/tasks/task/#tasks.task.Task.__init__","title":"__init__(*, params, use_mpi=False)
","text":"Initialize a Task.
Parameters:
Name Type Description Defaultparams
TaskParameters
Parameters needed to properly configure the analysis task. These are NOT related to execution parameters (number of cores, etc), except, potentially, in case of binary executable sub-classes.
requireduse_mpi
bool
Whether this Task requires the use of MPI. This determines the behaviour and timing of certain signals and ensures appropriate barriers are placed to not end processing until all ranks have finished.
False
Source code in lute/tasks/task.py
def __init__(self, *, params: TaskParameters, use_mpi: bool = False) -> None:\n \"\"\"Initialize a Task.\n\n Args:\n params (TaskParameters): Parameters needed to properly configure\n the analysis task. These are NOT related to execution parameters\n (number of cores, etc), except, potentially, in case of binary\n executable sub-classes.\n\n use_mpi (bool): Whether this Task requires the use of MPI.\n This determines the behaviour and timing of certain signals\n and ensures appropriate barriers are placed to not end\n processing until all ranks have finished.\n \"\"\"\n self.name: str = str(type(self)).split(\"'\")[1].split(\".\")[-1]\n self._result: TaskResult = TaskResult(\n task_name=self.name,\n task_status=TaskStatus.PENDING,\n summary=\"PENDING\",\n payload=\"\",\n )\n self._task_parameters: TaskParameters = params\n timeout: int = self._task_parameters.lute_config.task_timeout\n signal.setitimer(signal.ITIMER_REAL, timeout)\n\n run_directory: Optional[str] = self._task_parameters.Config.run_directory\n if run_directory is not None:\n try:\n os.chdir(run_directory)\n except FileNotFoundError:\n warnings.warn(\n (\n f\"Attempt to change to {run_directory}, but it is not found!\\n\"\n f\"Will attempt to run from {os.getcwd()}. It may fail!\"\n ),\n category=UserWarning,\n )\n self._use_mpi: bool = use_mpi\n
"},{"location":"source/tasks/task/#tasks.task.Task.clean_up_timeout","title":"clean_up_timeout()
","text":"Perform any necessary cleanup actions before exit if timing out.
Source code in lute/tasks/task.py
def clean_up_timeout(self) -> None:\n \"\"\"Perform any necessary cleanup actions before exit if timing out.\"\"\"\n ...\n
"},{"location":"source/tasks/task/#tasks.task.Task.run","title":"run()
","text":"Calls the analysis routines and any pre/post task functions.
This method is part of the public API and should not need to be modified in any subclasses.
Source code in lute/tasks/task.py
def run(self) -> None:\n \"\"\"Calls the analysis routines and any pre/post task functions.\n\n This method is part of the public API and should not need to be modified\n in any subclasses.\n \"\"\"\n self._signal_start()\n self._pre_run()\n self._run()\n self._post_run()\n self._signal_result()\n
"},{"location":"source/tasks/task/#tasks.task.TaskResult","title":"TaskResult
dataclass
","text":"Class for storing the result of a Task's execution with metadata.
Attributes:
Name Type Descriptiontask_name
str
Name of the associated task which produced it.
task_status
TaskStatus
Status of associated task.
summary
str
Short message/summary associated with the result.
payload
Any
Actual result. May be data in any format.
impl_schemas
Optional[str]
A string listing Task
schemas implemented by the associated Task
. Schemas define the category and expected output of the Task
. An individual task may implement/conform to multiple schemas. Multiple schemas are separated by ';', e.g. * impl_schemas = \"schema1;schema2\"
lute/tasks/dataclasses.py
@dataclass\nclass TaskResult:\n \"\"\"Class for storing the result of a Task's execution with metadata.\n\n Attributes:\n task_name (str): Name of the associated task which produced it.\n\n task_status (TaskStatus): Status of associated task.\n\n summary (str): Short message/summary associated with the result.\n\n payload (Any): Actual result. May be data in any format.\n\n impl_schemas (Optional[str]): A string listing `Task` schemas implemented\n by the associated `Task`. Schemas define the category and expected\n output of the `Task`. An individual task may implement/conform to\n multiple schemas. Multiple schemas are separated by ';', e.g.\n * impl_schemas = \"schema1;schema2\"\n \"\"\"\n\n task_name: str\n task_status: TaskStatus\n summary: str\n payload: Any\n impl_schemas: Optional[str] = None\n
"},{"location":"source/tasks/task/#tasks.task.TaskStatus","title":"TaskStatus
","text":" Bases: Enum
Possible Task statuses.
Source code in lute/tasks/dataclasses.py
class TaskStatus(Enum):\n \"\"\"Possible Task statuses.\"\"\"\n\n PENDING = 0\n \"\"\"\n Task has yet to run. Is Queued, or waiting for prior tasks.\n \"\"\"\n RUNNING = 1\n \"\"\"\n Task is in the process of execution.\n \"\"\"\n COMPLETED = 2\n \"\"\"\n Task has completed without fatal errors.\n \"\"\"\n FAILED = 3\n \"\"\"\n Task encountered a fatal error.\n \"\"\"\n STOPPED = 4\n \"\"\"\n Task was, potentially temporarily, stopped/suspended.\n \"\"\"\n CANCELLED = 5\n \"\"\"\n Task was cancelled prior to completion or failure.\n \"\"\"\n TIMEDOUT = 6\n \"\"\"\n Task did not reach completion due to timeout.\n \"\"\"\n
"},{"location":"source/tasks/task/#tasks.task.TaskStatus.CANCELLED","title":"CANCELLED = 5
class-attribute
instance-attribute
","text":"Task was cancelled prior to completion or failure.
"},{"location":"source/tasks/task/#tasks.task.TaskStatus.COMPLETED","title":"COMPLETED = 2
class-attribute
instance-attribute
","text":"Task has completed without fatal errors.
"},{"location":"source/tasks/task/#tasks.task.TaskStatus.FAILED","title":"FAILED = 3
class-attribute
instance-attribute
","text":"Task encountered a fatal error.
"},{"location":"source/tasks/task/#tasks.task.TaskStatus.PENDING","title":"PENDING = 0
class-attribute
instance-attribute
","text":"Task has yet to run. Is Queued, or waiting for prior tasks.
"},{"location":"source/tasks/task/#tasks.task.TaskStatus.RUNNING","title":"RUNNING = 1
class-attribute
instance-attribute
","text":"Task is in the process of execution.
"},{"location":"source/tasks/task/#tasks.task.TaskStatus.STOPPED","title":"STOPPED = 4
class-attribute
instance-attribute
","text":"Task was, potentially temporarily, stopped/suspended.
"},{"location":"source/tasks/task/#tasks.task.TaskStatus.TIMEDOUT","title":"TIMEDOUT = 6
class-attribute
instance-attribute
","text":"Task did not reach completion due to timeout.
"},{"location":"source/tasks/task/#tasks.task.ThirdPartyTask","title":"ThirdPartyTask
","text":" Bases: Task
A Task
interface to analysis with binary executables.
lute/tasks/task.py
class ThirdPartyTask(Task):\n \"\"\"A `Task` interface to analysis with binary executables.\"\"\"\n\n def __init__(self, *, params: TaskParameters) -> None:\n \"\"\"Initialize a Task.\n\n Args:\n params (TaskParameters): Parameters needed to properly configure\n the analysis task. `Task`s of this type MUST include the name\n of a binary to run and any arguments which should be passed to\n it (as would be done via command line). The binary is included\n with the parameter `executable`. All other parameter names are\n assumed to be the long/extended names of the flag passed on the\n command line by default:\n * `arg_name = 3` is converted to `--arg_name 3`\n Positional arguments can be included with `p_argN` where `N` is\n any integer:\n * `p_arg1 = 3` is converted to `3`\n\n Note that it is NOT recommended to rely on this default behaviour\n as command-line arguments can be passed in many ways. Refer to\n the dcoumentation at\n https://slac-lcls.github.io/lute/tutorial/new_task/\n under \"Speciyfing a TaskParameters Model for your Task\" for more\n information on how to control parameter parsing from within your\n TaskParameters model definition.\n \"\"\"\n super().__init__(params=params)\n self._cmd = self._task_parameters.executable\n self._args_list: List[str] = [self._cmd]\n self._template_context: Dict[str, Any] = {}\n\n def _add_to_jinja_context(self, param_name: str, value: Any) -> None:\n \"\"\"Store a parameter as a Jinja template variable.\n\n Variables are stored in a dictionary which is used to fill in a\n premade Jinja template for a third party configuration file.\n\n Args:\n param_name (str): Name to store the variable as. This should be\n the name defined in the corresponding pydantic model. This name\n MUST match the name used in the Jinja Template!\n value (Any): The value to store. If possible, large chunks of the\n template should be represented as a single dictionary for\n simplicity; however, any type can be stored as needed.\n \"\"\"\n context_update: Dict[str, Any] = {param_name: value}\n if __debug__:\n msg: Message = Message(contents=f\"TemplateParameters: {context_update}\")\n self._report_to_executor(msg)\n self._template_context.update(context_update)\n\n def _template_to_config_file(self) -> None:\n \"\"\"Convert a template file into a valid configuration file.\n\n Uses Jinja to fill in a provided template file with variables supplied\n through the LUTE config file. This facilitates parameter modification\n for third party tasks which use a separate configuration, in addition\n to, or instead of, command-line arguments.\n \"\"\"\n from jinja2 import Environment, FileSystemLoader, Template\n\n out_file: str = self._task_parameters.lute_template_cfg.output_path\n template_name: str = self._task_parameters.lute_template_cfg.template_name\n\n lute_path: Optional[str] = os.getenv(\"LUTE_PATH\")\n template_dir: str\n if lute_path is None:\n warnings.warn(\n \"LUTE_PATH is None in Task process! 
Using relative path for templates!\",\n category=UserWarning,\n )\n template_dir: str = \"../../config/templates\"\n else:\n template_dir = f\"{lute_path}/config/templates\"\n environment: Environment = Environment(loader=FileSystemLoader(template_dir))\n template: Template = environment.get_template(template_name)\n\n with open(out_file, \"w\", encoding=\"utf-8\") as cfg_out:\n cfg_out.write(template.render(self._template_context))\n\n def _pre_run(self) -> None:\n \"\"\"Parse the parameters into an appropriate argument list.\n\n Arguments are identified by a `flag_type` attribute, defined in the\n pydantic model, which indicates how to pass the parameter and its\n argument on the command-line. This method parses flag:value pairs\n into an appropriate list to be used to call the executable.\n\n Note:\n ThirdPartyParameter objects are returned by custom model validators.\n Objects of this type are assumed to be used for a templated config\n file used by the third party executable for configuration. The parsing\n of these parameters is performed separately by a template file used as\n an input to Jinja. This method solely identifies the necessary objects\n and passes them all along. Refer to the template files and pydantic\n models for more information on how these parameters are defined and\n identified.\n \"\"\"\n super()._pre_run()\n full_schema: Dict[str, Union[str, Dict[str, Any]]] = (\n self._task_parameters.schema()\n )\n short_flags_use_eq: bool\n long_flags_use_eq: bool\n if hasattr(self._task_parameters.Config, \"short_flags_use_eq\"):\n short_flags_use_eq: bool = self._task_parameters.Config.short_flags_use_eq\n long_flags_use_eq: bool = self._task_parameters.Config.long_flags_use_eq\n else:\n short_flags_use_eq = False\n long_flags_use_eq = False\n for param, value in self._task_parameters.dict().items():\n # Clunky test with __dict__[param] because compound model-types are\n # converted to `dict`. E.g. type(value) = dict not AnalysisHeader\n if (\n param == \"executable\"\n or value is None # Cannot have empty values in argument list for execvp\n or value == \"\" # But do want to include, e.g. 0\n or isinstance(self._task_parameters.__dict__[param], TemplateConfig)\n or isinstance(self._task_parameters.__dict__[param], AnalysisHeader)\n ):\n continue\n if isinstance(self._task_parameters.__dict__[param], TemplateParameters):\n # TemplateParameters objects have a single parameter `params`\n self._add_to_jinja_context(param_name=param, value=value.params)\n continue\n\n param_attributes: Dict[str, Any] = full_schema[\"properties\"][param]\n # Some model params do not match the commnad-line parameter names\n param_repr: str\n if \"rename_param\" in param_attributes:\n param_repr = param_attributes[\"rename_param\"]\n else:\n param_repr = param\n if \"flag_type\" in param_attributes:\n flag: str = param_attributes[\"flag_type\"]\n if flag:\n # \"-\" or \"--\" flags\n if flag == \"--\" and isinstance(value, bool) and not value:\n continue\n constructed_flag: str = f\"{flag}{param_repr}\"\n if flag == \"--\" and isinstance(value, bool) and value:\n # On/off flag, e.g. something like --verbose: No Arg\n self._args_list.append(f\"{constructed_flag}\")\n continue\n if (flag == \"-\" and short_flags_use_eq) or (\n flag == \"--\" and long_flags_use_eq\n ): # Must come after above check! 
Otherwise you get --param=True\n # Flags following --param=value or -param=value\n constructed_flag = f\"{constructed_flag}={value}\"\n self._args_list.append(f\"{constructed_flag}\")\n continue\n self._args_list.append(f\"{constructed_flag}\")\n else:\n warnings.warn(\n (\n f\"Model parameters should be defined using Field(...,flag_type='')\"\n f\" in the future. Parameter: {param}\"\n ),\n category=PendingDeprecationWarning,\n )\n if len(param) == 1: # Single-dash flags\n if short_flags_use_eq:\n self._args_list.append(f\"-{param_repr}={value}\")\n continue\n self._args_list.append(f\"-{param_repr}\")\n elif \"p_arg\" in param: # Positional arguments\n pass\n else: # Double-dash flags\n if isinstance(value, bool) and not value:\n continue\n if long_flags_use_eq:\n self._args_list.append(f\"--{param_repr}={value}\")\n continue\n self._args_list.append(f\"--{param_repr}\")\n if isinstance(value, bool) and value:\n continue\n if isinstance(value, str) and \" \" in value:\n for val in value.split():\n self._args_list.append(f\"{val}\")\n else:\n self._args_list.append(f\"{value}\")\n if (\n hasattr(self._task_parameters, \"lute_template_cfg\")\n and self._template_context\n ):\n self._template_to_config_file()\n\n def _run(self) -> None:\n \"\"\"Execute the new program by replacing the current process.\"\"\"\n if __debug__:\n time.sleep(0.1)\n msg: Message = Message(contents=self._formatted_command())\n self._report_to_executor(msg)\n LUTE_DEBUG_EXIT(\"LUTE_DEBUG_BEFORE_TPP_EXEC\")\n os.execvp(file=self._cmd, args=self._args_list)\n\n def _formatted_command(self) -> str:\n \"\"\"Returns the command as it would passed on the command-line.\"\"\"\n formatted_cmd: str = \"\".join(f\"{arg} \" for arg in self._args_list)\n return formatted_cmd\n\n def _signal_start(self) -> None:\n \"\"\"Override start signal method to switch communication methods.\"\"\"\n super()._signal_start()\n time.sleep(0.05)\n signal: str = \"NO_PICKLE_MODE\"\n msg: Message = Message(signal=signal)\n self._report_to_executor(msg)\n
"},{"location":"source/tasks/task/#tasks.task.ThirdPartyTask.__init__","title":"__init__(*, params)
","text":"Initialize a Task.
Parameters:
Name Type Description Defaultparams
TaskParameters
Parameters needed to properly configure the analysis task. Task
s of this type MUST include the name of a binary to run and any arguments which should be passed to it (as would be done via command line). The binary is included with the parameter executable
. All other parameter names are assumed to be the long/extended names of the flag passed on the command line by default: * arg_name = 3
is converted to --arg_name 3
Positional arguments can be included with p_argN
where N
is any integer: * p_arg1 = 3
is converted to 3
 Note that it is NOT recommended to rely on this default behaviour as command-line arguments can be passed in many ways. Refer to the documentation at https://slac-lcls.github.io/lute/tutorial/new_task/ under \"Specifying a TaskParameters Model for your Task\" for more information on how to control parameter parsing from within your TaskParameters model definition.
required Source code inlute/tasks/task.py
def __init__(self, *, params: TaskParameters) -> None:\n \"\"\"Initialize a Task.\n\n Args:\n params (TaskParameters): Parameters needed to properly configure\n the analysis task. `Task`s of this type MUST include the name\n of a binary to run and any arguments which should be passed to\n it (as would be done via command line). The binary is included\n with the parameter `executable`. All other parameter names are\n assumed to be the long/extended names of the flag passed on the\n command line by default:\n * `arg_name = 3` is converted to `--arg_name 3`\n Positional arguments can be included with `p_argN` where `N` is\n any integer:\n * `p_arg1 = 3` is converted to `3`\n\n Note that it is NOT recommended to rely on this default behaviour\n as command-line arguments can be passed in many ways. Refer to\n the dcoumentation at\n https://slac-lcls.github.io/lute/tutorial/new_task/\n under \"Speciyfing a TaskParameters Model for your Task\" for more\n information on how to control parameter parsing from within your\n TaskParameters model definition.\n \"\"\"\n super().__init__(params=params)\n self._cmd = self._task_parameters.executable\n self._args_list: List[str] = [self._cmd]\n self._template_context: Dict[str, Any] = {}\n
"},{"location":"source/tasks/test/","title":"test","text":"Basic test Tasks for testing functionality.
Classes:
Name DescriptionTest
Simplest test Task - runs a 10 iteration loop and returns a result.
TestSocket
Test Task which sends larger data to test socket IPC.
TestWriteOutput
Test Task which writes an output file.
TestReadOutput
Test Task which reads in a file. Can be used to test database access.
"},{"location":"source/tasks/test/#tasks.test.Test","title":"Test
","text":" Bases: Task
Simple test Task to ensure subprocess and pipe-based IPC work.
Source code inlute/tasks/test.py
class Test(Task):\n \"\"\"Simple test Task to ensure subprocess and pipe-based IPC work.\"\"\"\n\n def __init__(self, *, params: TaskParameters) -> None:\n super().__init__(params=params)\n\n def _run(self) -> None:\n for i in range(10):\n time.sleep(1)\n msg: Message = Message(contents=f\"Test message {i}\")\n self._report_to_executor(msg)\n if self._task_parameters.throw_error:\n raise RuntimeError(\"Testing Error!\")\n\n def _post_run(self) -> None:\n self._result.summary = \"Test Finished.\"\n self._result.task_status = TaskStatus.COMPLETED\n time.sleep(0.1)\n
"},{"location":"source/tasks/test/#tasks.test.TestReadOutput","title":"TestReadOutput
","text":" Bases: Task
Simple test Task to read in output from the test Task above.
Its pydantic model relies on a database access to retrieve the output file.
Source code inlute/tasks/test.py
class TestReadOutput(Task):\n \"\"\"Simple test Task to read in output from the test Task above.\n\n Its pydantic model relies on a database access to retrieve the output file.\n \"\"\"\n\n def __init__(self, *, params: TaskParameters) -> None:\n super().__init__(params=params)\n\n def _run(self) -> None:\n array: np.ndarray = np.loadtxt(self._task_parameters.in_file, delimiter=\",\")\n self._report_to_executor(msg=Message(contents=\"Successfully loaded data!\"))\n for i in range(5):\n time.sleep(1)\n\n def _post_run(self) -> None:\n super()._post_run()\n self._result.summary = \"Was able to load data.\"\n self._result.payload = \"This Task produces no output.\"\n self._result.task_status = TaskStatus.COMPLETED\n
"},{"location":"source/tasks/test/#tasks.test.TestSocket","title":"TestSocket
","text":" Bases: Task
Simple test Task to ensure basic IPC over Unix sockets works.
Source code inlute/tasks/test.py
class TestSocket(Task):\n \"\"\"Simple test Task to ensure basic IPC over Unix sockets works.\"\"\"\n\n def __init__(self, *, params: TaskParameters) -> None:\n super().__init__(params=params)\n\n def _run(self) -> None:\n for i in range(self._task_parameters.num_arrays):\n msg: Message = Message(contents=f\"Sending array {i}\")\n self._report_to_executor(msg)\n time.sleep(0.05)\n msg: Message = Message(\n contents=np.random.rand(self._task_parameters.array_size)\n )\n self._report_to_executor(msg)\n\n def _post_run(self) -> None:\n super()._post_run()\n self._result.summary = f\"Sent {self._task_parameters.num_arrays} arrays\"\n self._result.payload = np.random.rand(self._task_parameters.array_size)\n self._result.task_status = TaskStatus.COMPLETED\n
"},{"location":"source/tasks/test/#tasks.test.TestWriteOutput","title":"TestWriteOutput
","text":" Bases: Task
Simple test Task to write output other Tasks depend on.
Source code inlute/tasks/test.py
class TestWriteOutput(Task):\n \"\"\"Simple test Task to write output other Tasks depend on.\"\"\"\n\n def __init__(self, *, params: TaskParameters) -> None:\n super().__init__(params=params)\n\n def _run(self) -> None:\n for i in range(self._task_parameters.num_vals):\n # Doing some calculations...\n time.sleep(0.05)\n if i % 10 == 0:\n msg: Message = Message(contents=f\"Processed {i+1} values!\")\n self._report_to_executor(msg)\n\n def _post_run(self) -> None:\n super()._post_run()\n work_dir: str = self._task_parameters.lute_config.work_dir\n out_file: str = f\"{work_dir}/{self._task_parameters.outfile_name}\"\n array: np.ndarray = np.random.rand(self._task_parameters.num_vals)\n np.savetxt(out_file, array, delimiter=\",\")\n self._result.summary = \"Completed task successfully.\"\n self._result.payload = out_file\n self._result.task_status = TaskStatus.COMPLETED\n
"},{"location":"tutorial/creating_workflows/","title":"Workflows with Airflow","text":"Note: Airflow uses the term DAG, or directed acyclic graph, to describe workflows of tasks with defined (and acyclic) connectivities. This page will use the terms workflow and DAG interchangeably.
"},{"location":"tutorial/creating_workflows/#relevant-components","title":"Relevant Components","text":"In addition to the core LUTE package, a number of components are generally involved to run a workflow. The current set of scripts and objects are used to interface with Airflow, and the SLURM job scheduler. The core LUTE library can also be used to run workflows using different backends, and in the future these may be supported.
For building and running workflows using SLURM and Airflow, the following components are necessary, and will be described in more detail below: - Airflow launch script: launch_airflow.py
- This has a wrapper batch submission script: submit_launch_airflow.sh
. When running using the ARP (from the eLog), you MUST use this wrapper script instead of the Python script directly. - SLURM submission script: submit_slurm.sh
- Airflow operators: - JIDSlurmOperator
launch_airflow.py
","text":"Sends a request to an Airflow instance to submit a specific DAG (workflow). This script prepares an HTTP request with the appropriate parameters in a specific format.
A request involves the following information, most of which is retrieved automatically:
dag_run_data: Dict[str, Union[str, Dict[str, Union[str, int, List[str]]]]] = {\n \"dag_run_id\": str(uuid.uuid4()),\n \"conf\": {\n \"experiment\": os.environ.get(\"EXPERIMENT\"),\n \"run_id\": f\"{os.environ.get('RUN_NUM')}{datetime.datetime.utcnow().isoformat()}\",\n \"JID_UPDATE_COUNTERS\": os.environ.get(\"JID_UPDATE_COUNTERS\"),\n \"ARP_ROOT_JOB_ID\": os.environ.get(\"ARP_JOB_ID\"),\n \"ARP_LOCATION\": os.environ.get(\"ARP_LOCATION\", \"S3DF\"),\n \"Authorization\": os.environ.get(\"Authorization\"),\n \"user\": getpass.getuser(),\n \"lute_params\": params,\n \"slurm_params\": extra_args,\n \"workflow\": wf_defn, # Used only for custom DAGs. See below under advanced usage.\n },\n}\n
Note that the environment variables are used to fill in the appropriate information because this script is intended to be launched primarily from the ARP (which passes these variables). The ARP allows for the launch job to be defined in the experiment eLog and submitted automatically for each new DAQ run. The environment variables EXPERIMENT
and RUN
can alternatively be defined prior to submitting the script on the command-line.
The script takes a number of parameters:
launch_airflow.py -c <path_to_config_yaml> -w <workflow_name> [--debug] [--test] [-e <exp>] [-r <run>] [SLURM_ARGS]\n
-c
refers to the path of the configuration YAML that contains the parameters for each managed Task
in the requested workflow.-w
is the name of the DAG (workflow) to run. By convention each DAG is named by the Python file it is defined in. (See below).-W
(capital W) followed by the path to the workflow instead of -w
. See below for further discussion on this use case.--debug
is an optional flag to run all steps of the workflow in debug mode for verbose logging and output.--test
is an optional flag which will use the test Airflow instance. By default the script will make requests of the standard production Airflow instance.-e
is used to pass the experiment name. Needed if not using the ARP, i.e. running from the command-line.-r
is used to pass a run number. Needed if not using the ARP, i.e. running from the command-line.SLURM_ARGS
are SLURM arguments to be passed to the submit_slurm.sh
script which are used for each individual managed Task
. These arguments do NOT affect the submission parameters for the job running launch_airflow.py
(if using submit_launch_airflow.sh
below).Lifetime This script will run for the entire duration of the workflow (DAG). After making the initial request of Airflow to launch the DAG, it will enter a status update loop which will keep track of each individual job (each job runs one managed Task
) submitted by Airflow. At the end of each job it will collect the log file, in addition to providing a few other status updates/debugging messages, and append it to its own log. This allows all logging for the entire workflow (DAG) to be inspected from an individual file. This is particularly useful when running via the eLog, because only a single log file is displayed.
submit_launch_airflow.sh
","text":"This script is only necessary when running from the eLog using the ARP. The initial job submitted by the ARP can not have a duration of longer than 30 seconds, as it will then time out. As the launch_airflow.py
job will live for the entire duration of the workflow, which is often much longer than 30 seconds, the solution was to have a wrapper which submits the launch_airflow.py
script to run on the S3DF batch nodes. Usage of this script is mostly identical to launch_airflow.py
. All the arguments are passed transparently to the underlying Python script with the exception of the first argument which must be the location of the underlying launch_airflow.py
 script. The wrapper will simply launch a batch job using minimal resources (1 core). While the primary purpose of the script is to allow running from the eLog, it is also a useful wrapper in general, allowing the previous script to be submitted as a SLURM job.
Usage:
submit_launch_airflow.sh /path/to/launch_airflow.py -c <path_to_config_yaml> -w <workflow_name> [--debug] [--test] [-e <exp>] [-r <run>] [SLURM_ARGS]\n
"},{"location":"tutorial/creating_workflows/#submit_slurmsh","title":"submit_slurm.sh
","text":"Launches a job on the S3DF batch nodes using the SLURM job scheduler. This script launches a single managed Task
at a time. The usage is as follows:
submit_slurm.sh -c <path_to_config_yaml> -t <MANAGED_task_name> [--debug] [SLURM_ARGS ...]\n
As a reminder the managed Task
refers to the Executor
-Task
combination. The script does not parse any SLURM specific parameters, and instead passes them transparently to SLURM. At least the following two SLURM arguments must be provided:
--partition=<...> # Usually partition=milano\n--account=<...> # Usually account=lcls:$EXPERIMENT\n
Generally, resource requests will also be included, such as the number of cores to use. A complete call may look like the following:
submit_slurm.sh -c /sdf/data/lcls/ds/hutch/experiment/scratch/config.yaml -t Tester --partition=milano --account=lcls:experiment --ntasks=100 [...]\n
When running a workflow using the launch_airflow.py
script, each step of the workflow will be submitted using this script.
Operator
s are the objects submitted as individual steps of a DAG by Airflow. They are conceptually linked to the idea of a task in that each task of a workflow is generally an operator. Care should be taken not to confuse them with LUTE Task
s or managed Task
s though. There is, however, usually a one-to-one correspondence between a Task
and an Operator
.
Airflow runs on a K8S cluster which has no access to the experiment data. When we ask Airflow to run a DAG, it will launch an Operator
for each step of the DAG. However, the Operator
itself cannot perform productive analysis without access to the data. The solution employed by LUTE
is to have a limited set of Operator
s which do not perform analysis, but instead request that a LUTE
managed Task
 be submitted on the batch nodes where it can access the data. There may be small differences between how the various provided Operator
s do this, but in general they will all make a request to the job interface daemon (JID) that a new SLURM job be scheduled using the submit_slurm.sh
script described above.
Therefore, running a typical Airflow DAG involves the following steps:
launch_airflow.py
script is submitted, usually from a definition in the eLog.launch_airflow
script requests that Airflow run a specific DAG.Operator
s that make up the DAG definition.Operator
sends a request to the JID
to submit a job.JID
submits the elog_submit.sh
script with the appropriate managed Task
.Task
runs on the batch nodes, while the Operator
, requesting updates from the JID on job status, waits for it to complete.Task
completes, the Operator
 will receive this information and tell the Airflow server whether the job completed successfully or resulted in failure.
s are maintained: - JIDSlurmOperator
: The standard Operator
. Each instance has a one-to-one correspondance with a LUTE managed Task
.
JIDSlurmOperator
arguments","text":"task_id
: This is nominally the name of the task on the Airflow side. However, for simplicity this is used 1-1 to match the name of a managed Task defined in LUTE's managed_tasks.py
 module. I.e., it should be the name of an Executor(\"Task\")
object which will run the specific Task of interest. This must match the name of a defined managed Task.max_cores
: Used to cap the maximum number of cores which should be requested of SLURM. By default all jobs will run with the same number of cores, which should be specified when running the launch_airflow.py
script (either from the ARP, or by hand). This behaviour was chosen because in general we want to increase or decrease the core-count for all Task
s uniformly, and we don't want to have to specify core number arguments for each job individually. Nonetheless, on occasion it may be necessary to cap the number of cores a specific job will use. E.g. if the default value specified when launching the Airflow DAG is multiple cores, and one job is single-threaded, the core count can be capped for that single job to 1, while the rest run with multiple cores.max_nodes
: Similar to the above. This will make sure the Task
is distributed across no more than a maximum number of nodes. This feature is useful for, e.g., multi-threaded software which does not make use of tools like MPI
. So, the Task
can run on multiple cores, but only within a single node.require_partition
: This option is a string that forces the use of a specific S3DF partition for the managed Task
 submitted by the Operator. E.g. typically an LCLS user will use --partition=milano
for CPU-based workflows; however, if a specific Task
requires a GPU you may use JIDSlurmOperator(\"MyTaskRunner\", require_partition=\"ampere\")
to override the partition for that single Task
.custom_slurm_params
: You can provide a string of parameters which will be used in its entirety to replace any and all default arguments passed by the launch script. This method is not recommended for general use and is mostly used for dynamic DAGs described at the end of the document.Defining a new workflow involves creating a new module (Python file) in the directory workflows/airflow
, creating a number of Operator
instances within the module, and then drawing the connectivity between them. At the top of the file an Airflow DAG is created and given a name. By convention all LUTE
workflows use the name of the file as the name of the DAG. The following code can be copied exactly into the file:
from datetime import datetime\nimport os\nfrom airflow import DAG\nfrom lute.operators.jidoperators import JIDSlurmOperator # Import other operators if needed\n\ndag_id: str = f\"lute_{os.path.splitext(os.path.basename(__file__))[0]}\"\ndescription: str = (\n \"Run SFX processing using PyAlgos peak finding and experimental phasing\"\n)\n\ndag: DAG = DAG(\n dag_id=dag_id,\n start_date=datetime(2024, 3, 18),\n schedule_interval=None,\n description=description,\n)\n
Once the DAG has been created, a number of Operator
s must be created to run the various LUTE analysis operations. As an example consider a partial SFX processing workflow which includes steps for peak finding, indexing, merging, and calculating figures of merit. Each of the 4 steps will have an Operator
instance which will launch a corresponding LUTE
managed Task
, for example:
# Using only the JIDSlurmOperator\n# syntax: JIDSlurmOperator(task_id=\"LuteManagedTaskName\", dag=dag) # optionally, max_cores=123)\npeak_finder: JIDSlurmOperator = JIDSlurmOperator(task_id=\"PeakFinderPyAlgos\", dag=dag)\n\n# We specify a maximum number of cores for the rest of the jobs.\nindexer: JIDSlurmOperator = JIDSlurmOperator(\n max_cores=120, task_id=\"CrystFELIndexer\", dag=dag\n)\n# We can alternatively specify this task be only ever run with the following args.\n# indexer: JIDSlurmOperator = JIDSlurmOperator(\n# custom_slurm_params=\"--partition=milano --ntasks=120 --account=lcls:myaccount\",\n# task_id=\"CrystFELIndexer\",\n# dag=dag,\n# )\n\n# Merge\nmerger: JIDSlurmOperator = JIDSlurmOperator(\n max_cores=120, task_id=\"PartialatorMerger\", dag=dag\n)\n\n# Figures of merit\nhkl_comparer: JIDSlurmOperator = JIDSlurmOperator(\n max_cores=8, task_id=\"HKLComparer\", dag=dag\n)\n
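As an illustrative sketch of the max_nodes and require_partition arguments described in the previous section (the managed Task names here are hypothetical placeholders, not real managed Tasks):
# A sketch only: cap a multi-threaded (non-MPI) step to a single node, and force a GPU partition\n# for another step. These managed Task names are hypothetical placeholders.\nthreaded_runner: JIDSlurmOperator = JIDSlurmOperator(\n    max_nodes=1, task_id=\"MyThreadedTaskRunner\", dag=dag\n)\ngpu_runner: JIDSlurmOperator = JIDSlurmOperator(\n    require_partition=\"ampere\", task_id=\"MyGpuTaskRunner\", dag=dag\n)\n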
Finally, the dependencies between the Operator
s are \"drawn\", defining the execution order of the various steps. The >>
operator has been overloaded for the Operator
class, allowing it to be used to specify the next step in the DAG. In this case, a completely linear DAG is drawn as:
peak_finder >> indexer >> merger >> hkl_comparer\n
Parallel execution can be added by using the >>
operator multiple times. Consider a task1
which upon successful completion starts a task2
and task3
in parallel. This dependency can be added to the DAG using:
#task1: JIDSlurmOperator = JIDSlurmOperator(...)\n#task2 ...\n\ntask1 >> task2\ntask1 >> task3\n
As each DAG is defined in pure Python, standard control structures (loops, if statements, etc.) can be used to create more complex workflow arrangements.
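For instance, a linear chain of steps could be built with a loop rather than writing each dependency by hand. A sketch only, reusing the dag object from above with hypothetical managed Task names:
# A sketch only: build a linear chain of hypothetical managed Tasks with a loop.\nsteps = [\n    JIDSlurmOperator(task_id=name, dag=dag)\n    for name in (\"StepOneRunner\", \"StepTwoRunner\", \"StepThreeRunner\")\n]\nfor upstream, downstream in zip(steps, steps[1:]):\n    upstream >> downstream\n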
Note: Your DAG will not be available to Airflow until your PR including the file you have defined is merged! Once merged the file will be synced with the Airflow instance and can be run using the scripts described earlier in this document. For testing it is generally preferred that you run each step of your DAG individually using the submit_slurm.sh
script and the independent managed Task
names. If, however, you want to test the behaviour of Airflow itself (in a modified form) you can use the advanced run-time DAGs defined below as well.
In most cases, standard DAGs should be defined as described above and called by name. However, Airflow also supports the creation of DAGs dynamically, e.g. to vary the input data to various steps, or the number of steps that will occur. Some of this functionality has been used to allow for user-defined DAGs which are passed in the form of a dictionary, allowing Airflow to construct the workflow as it is running.
A basic YAML syntax is used to construct a series of nested dictionaries which define a DAG. Considering the first example DAG defined above (for serial femtosecond crystallography), the standard DAG looked like:
peak_finder >> indexer >> merger >> hkl_comparer\n
We can alternatively define this DAG in YAML:
task_name: PeakFinderPyAlgos\nslurm_params: ''\nnext:\n- task_name: CrystFELIndexer\n slurm_params: ''\n next: []\n - task_name: PartialatorMerger\n slurm_params: ''\n next: []\n - task_name: HKLComparer\n slurm_params: ''\n next: []\n
I.e. we define a tree where each node is constructed using Node(task_name: str, slurm_params: str, next: List[Node])
.
task_name
is the name of a managed Task
as before, in the same way that would be passed to the JIDSlurmOperator
.slurm_params
. This is a complete string of all the arguments to use for the corresponding managed Task
. Use of this field is all or nothing! - if it is left as an empty string, the default parameters (passed on the command-line using the launch script) are used, otherwise this string is used in its stead. Because of this remember to include a partition and account if using it.next
field is composed of either an empty list (meaning no managed Task
s are run after the current node), or additional nodes. All nodes in the list are run in parallel. As a second example, to run task1
followed by task2
and task3
in parellel we would use:
task_name: Task1\nslurm_params: ''\nnext:\n- task_name: Task2\n slurm_params: ''\n next: []\n- task_name: Task3\n slurm_params: ''\n next: []\n
In order to run a DAG defined this way we pass the path to the YAML file we have defined it in to the launch script using -W <path_to_dag>
. This is instead of calling it by name. E.g.
/path/to/lute/launch_scripts/submit_launch_airflow.sh /path/to/lute/launch_scripts/launch_airflow.py -e <exp> -r <run> -c /path/to/config -W <path_to_dag> --test [--debug] [SLURM_ARGS]\n
Note that fewer options are currently supported for configuring the operators for each step of the DAG. The slurm arguments can be replaced in their entirety using a custom slurm_params
string but individual options cannot be modified.
Task
","text":"Task
s can be broadly categorized into two types: - \"First-party\" - where the analysis or executed code is maintained within this library. - \"Third-party\" - where the analysis, code, or program is maintained elsewhere and is simply called by a wrapping Task
.
Creating a new Task
of either type generally involves the same steps, although for first-party Task
s, the analysis code must of course also be written. Due to this difference, as well as additional considerations for parameter handling when dealing with \"third-party\" Task
s, the \"first-party\" and \"third-party\" Task
integration cases will be considered separately.
Task
","text":"There are two required steps for third-party Task
integration, and one additional step which is optional, and may not be applicable to all possible third-party Task
s. Generally, Task
integration requires: 1. Defining a TaskParameters
(pydantic) model which fully parameterizes the Task
. This involves specifying a path to a binary, and all the required command-line arguments to run the binary. 2. Creating a managed Task
by specifying an Executor
for the new third-party Task
. At this stage, any additional environment variables can be added which are required for the execution environment. 3. (Optional/Maybe applicable) Create a template for a third-party configuration file. If the new Task
has its own configuration file, specifying a template will allow that file to be parameterized from the singular LUTE yaml configuration file. A couple of minor additions to the pydantic
model specified in 1. are required to support template usage.
Each of these stages will be discussed in detail below. The vast majority of the work is completed in step 1.
"},{"location":"tutorial/new_task/#specifying-a-taskparameters-model-for-your-task","title":"Specifying aTaskParameters
Model for your Task
","text":"A brief overview of parameters objects will be provided below. The following information goes into detail only about specifics related to LUTE configuration. An in depth description of pydantic is beyond the scope of this tutorial; please refer to the official documentation for more information. Please note that due to environment constraints pydantic is currently pinned to version 1.10! Make sure to read the appropriate documentation for this version as many things are different compared to the newer releases. At the end this document there will be an example highlighting some supported behaviour as well as a FAQ to address some common integration considerations.
Task
s and TaskParameter
s
All Task
s have a corresponding TaskParameters
object. These objects are linked exclusively by a named relationship. For a Task
named MyThirdPartyTask
, the parameters object must be named MyThirdPartyTaskParameters
. For third-party Task
s there are a number of additional requirements: - The model must inherit from a base class called ThirdPartyParameters
. - The model must have one field specified called executable
. The presence of this field indicates that the Task
is a third-party Task
and the specified executable must be called. This allows all third-party Task
s to be defined exclusively by their parameters model. A single ThirdPartyTask
class handles execution of all third-party Task
s.
All models are stored in lute/io/models
. For any given Task
, a new model can be added to an existing module contained in this directory or to a new module. If creating a new module, make sure to add an import statement to lute.io.models.__init__
.
Defining TaskParameter
s
When specifying parameters the default behaviour is to provide a one-to-one correspondence between the Python attribute specified in the parameter model and the parameter specified on the command-line. Single-letter attributes are assumed to be passed using -
, e.g. n
will be passed as -n
when the executable is launched. Longer attributes are passed using --
, e.g. by default a model attribute named my_arg
will be passed on the command-line as --my_arg
. Positional arguments are specified using p_argX
where X
is a number. All parameters are passed in the order that they are specified in the model.
However, because the number of possible command-line combinations is large, relying on the default behaviour above is NOT recommended. It is provided solely as a fallback. Instead, there are a number of configuration knobs which can be tuned to achieve the desired behaviour. The two main mechanisms for controlling behaviour are specification of model-wide configuration under the Config
class within the model's definition, and parameter-by-parameter configuration using field attributes. For the latter, we define all parameters as Field
objects. This allows parameters to have their own attributes, which are parsed by LUTE's task-layer. Given this, the preferred starting template for a TaskParameters
model is the following - we assume we are integrating a new Task
called RunTask
:
\nfrom pydantic import Field, validator\n# Also include any pydantic type specifications - Pydantic has many custom\n# validation types already, e.g. types for constrained numberic values, URL handling, etc.\n\nfrom .base import ThirdPartyParameters\n\n# Change class name as necessary\nclass RunTaskParameters(ThirdPartyParameters):\n \"\"\"Parameters for RunTask...\"\"\"\n\n class Config(ThirdPartyParameters.Config): # MUST be exactly as written here.\n ...\n # Model-wide configuration will go here\n\n executable: str = Field(\"/path/to/executable\", description=\"...\")\n ...\n # Additional params.\n # param1: param1Type = Field(\"default\", description=\"\", ...)\n
Config settings and options Under the class definition for Config
in the model, we can modify global options for all the parameters. In addition, there are a number of configuration options related to specifying what the outputs/results from the associated Task
are, and a number of options to modify runtime behaviour. Currently, the available configuration options are:
run_directory
If provided, can be used to specify the directory from which a Task
is run. None
(not provided) NO set_result
bool
. If True
search the model definition for a parameter that indicates what the result is. False
NO result_from_params
If set_result
is True
can define a result using this option and a validator. See also is_result
below. None
(not provided) NO short_flags_use_eq
Use equals sign instead of space for arguments of -
parameters. False
YES - Only affects ThirdPartyTask
s long_flags_use_eq
Use equals sign instead of space for arguments of -
parameters. False
YES - Only affects ThirdPartyTask
s These configuration options modify how the parameter models are parsed and passed along on the command-line, as well as what we consider results and where a Task
can run. The default behaviour is that parameters are assumed to be passed as -p arg
and --param arg
, the Task
will be run in the current working directory (or scratch if submitted with the ARP), and we have no information about Task
results . Setting the above options can modify this behaviour.
short_flags_use_eq
and/or long_flags_use_eq
to True
parameters are instead passed as -p=arg
and --param=arg
.run_directory
to a valid path, we can force a Task
to be run in a specific directory. By default the Task
will be run from the directory you submit the job in, or from your scratch folder (/sdf/scratch/...
) if you submit from the eLog. Some ThirdPartyTask
s rely on searching the correct working directory in order run properly.set_result
to True
we indicate that the TaskParameters
model will provide information on what the TaskResult
is. This setting must be used with one of two options, either the result_from_params
Config
option, described below, or the Field attribute is_result
described in the next sub-section (Field Attributes).result_from_params
is a Config option that can be used when set_result==True
. In conjunction with a validator (described a sections down) we can use this option to specify a result from all the information contained in the model. E.g. if you have a Task
that has parameters for an output_directory
and a output_filename
, you can set result_from_params==f\"{output_directory}/{output_filename}\"
.Field attributes In addition to the global configuration options there are a couple of ways to specify individual parameters. The following Field
attributes are used when parsing the model:
flag_type
Specify the type of flag for passing this argument. One of \"-\"
, \"--\"
, or \"\"
N/A p_arg1 = Field(..., flag_type=\"\")
rename_param
Change the name of the parameter as passed on the command-line. N/A my_arg = Field(..., rename_param=\"my-arg\")
description
Documentation of the parameter's usage or purpose. N/A arg = Field(..., description=\"Argument for...\")
is_result
bool
. If the set_result
Config
option is True
, we can set this to True
to indicate a result. N/A output_result = Field(..., is_result=true)
The flag_type
attribute allows us to specify whether the parameter corresponds to a positional (\"\"
) command line argument, requires a single hyphen (\"-\"
), or a double hyphen (\"--\"
). By default, the parameter name is passed as-is on the command-line. However, command-line arguments can have characters which would not be valid in Python variable names. In particular, hyphens are frequently used. To handle this case, the rename_param
attribute can be used to specify an alternative spelling of the parameter when it is passed on the command-line. This also allows for using more descriptive variable names internally than those used on the command-line. A description
can also be provided for each Field to document the usage and purpose of that particular parameter.
As an example, we can again consider defining a model for a RunTask
Task
. Consider an executable which would normally be called from the command-line as follows:
/sdf/group/lcls/ds/tools/runtask -n <nthreads> --method=<algorithm> -p <algo_param> [--debug]\n
A model specification for this Task
may look like:
class RunTaskParameters(ThirdPartyParameters):\n \"\"\"Parameters for the runtask binary.\"\"\"\n\n class Config(ThirdPartyParameters.Config):\n long_flags_use_eq: bool = True # For the --method parameter\n\n # Prefer using full/absolute paths where possible.\n # No flag_type needed for this field\n executable: str = Field(\n \"/sdf/group/lcls/ds/tools/runtask\", description=\"Runtask Binary v1.0\"\n )\n\n # We can provide a more descriptive name for -n\n # Let's assume it's a number of threads, or processes, etc.\n num_threads: int = Field(\n 1, description=\"Number of concurrent threads.\", flag_type=\"-\", rename_param=\"n\"\n )\n\n # In this case we will use the Python variable name directly when passing\n # the parameter on the command-line\n method: str = Field(\"algo1\", description=\"Algorithm to use.\", flag_type=\"--\")\n\n # For an actual parameter we would probably have a better name. Lets assume\n # This parameter (-p) modifies the behaviour of the method above.\n method_param1: int = Field(\n 3, description=\"Modify method performance.\", flag_type=\"-\", rename_param=\"p\"\n )\n\n # Boolean flags are only passed when True! `--debug` is an optional parameter\n # which is not followed by any arguments.\n debug: bool = Field(\n False, description=\"Whether to run in debug mode.\", flag_type=\"--\"\n )\n
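With the default values shown above, the constructed command would be roughly /sdf/group/lcls/ds/tools/runtask -n 1 --method=algo1 -p 3: the --debug flag is only appended when it is True, and --method uses an equals sign because long_flags_use_eq is set.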
The is_result
attribute allows us to specify whether the corresponding Field points to the output/result of the associated Task
. Consider a Task
, RunTask2
which writes its output to a single file which is passed as a parameter.
class RunTask2Parameters(ThirdPartyParameters):\n \"\"\"Parameters for the runtask2 binary.\"\"\"\n\n class Config(ThirdPartyParameters.Config):\n set_result: bool = True # This must be set here!\n # result_from_params: Optional[str] = None # We can use this for more complex result setups (see below). Ignore for now.\n\n # Prefer using full/absolute paths where possible.\n # No flag_type needed for this field\n executable: str = Field(\n \"/sdf/group/lcls/ds/tools/runtask2\", description=\"Runtask Binary v2.0\"\n )\n\n # Lets assume we take one input and write one output file\n # We will not provide a default value, so this parameter MUST be provided\n input: str = Field(\n description=\"Path to input file.\", flag_type=\"--\"\n )\n\n # We will also not provide a default for the output\n # BUT, we will specify that whatever is provided is the result\n output: str = Field(\n description=\"Path to write output to.\",\n flag_type=\"-\",\n rename_param=\"o\",\n is_result=True, # This means this parameter points to the result!\n )\n
Additional Comments 1. Model parameters of type bool
are not passed with an argument and are only passed when True
. This is a common use-case for boolean flags which enable things like test or debug modes, verbosity or reporting features. E.g. --debug
, --test
, --verbose
, etc. - If you need to pass the literal words \"True\"
or \"False\"
, use a parameter of type str
. 2. You can use pydantic
types to constrain parameters beyond the basic Python types. E.g. conint
can be used to define lower and upper bounds for an integer. There are also types for common categories, positive/negative numbers, paths, URLs, IP addresses, etc. - Even more custom behaviour can be achieved with validator
s (see below). 3. All TaskParameters
objects and its subclasses have access to a lute_config
parameter, which is of type lute.io.models.base.AnalysisHeader
. This special parameter is ignored when constructing the call for a binary task, but it provides access to shared/common parameters between tasks. For example, the following parameters are available through the lute_config
object, and may be of use when constructing validators. All fields can be accessed with .
notation. E.g. lute_config.experiment
. - title
: A user provided title/description of the analysis. - experiment
: The current experiment name - run
: The current acquisition run number - date
: The date of the experiment or the analysis. - lute_version
: The version of the software you are running. - task_timeout
: How long a Task
can run before it is killed. - work_dir
: The main working directory for LUTE. Files and the database are created relative to this directory. This is separate from the run_directory
config option. LUTE will write files to the work directory by default; however, the Task
itself is run from run_directory
if it is specified.
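As a small sketch of points 2 and 3 above, the following hypothetical model uses a constrained integer type and a validator which builds a default output path from lute_config:
from typing import Any, Dict, Optional\n\nfrom pydantic import Field, conint, validator\n\nfrom .base import ThirdPartyParameters\n\nclass RunTask4Parameters(ThirdPartyParameters):\n    \"\"\"Sketch only: a constrained integer and a lute_config-derived default.\"\"\"\n\n    executable: str = Field(\"/path/to/binary\", description=\"Hypothetical binary.\")\n    num_threads: conint(gt=0, le=128) = Field(\n        4, description=\"Number of threads (1-128).\", flag_type=\"-\", rename_param=\"n\"\n    )\n    out_file: Optional[str] = Field(None, description=\"Output file path.\", flag_type=\"--\")\n\n    @validator(\"out_file\", always=True)\n    def default_out_file(cls, out_file: Optional[str], values: Dict[str, Any]) -> str:\n        \"\"\"If no output path is given, place the output under the LUTE work_dir.\"\"\"\n        if out_file is None:\n            return f\"{values['lute_config'].work_dir}/runtask4_output.txt\"\n        return out_file\n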
Validators Pydantic uses validators
to determine whether a value for a specific field is appropriate. There are default validators for all the standard library types and the types specified within the pydantic package; however, it is straightforward to define custom ones as well. In the template code-snippet above we imported the validator
decorator. To create our own validator we define a method (with any name) with the following prototype, and decorate it with the validator
decorator:
@validator(\"name_of_field_to_decorate\")\ndef my_custom_validator(cls, field: Any, values: Dict[str, Any]) -> Any: ...\n
In this snippet, the field
variable corresponds to the value for the specific field we want to validate. values
is a dictionary of fields and their values which have been parsed prior to the current field. This means you can validate the value of a parameter based on the values provided for other parameters. Since pydantic always validates the fields in the order they are defined in the model, fields dependent on other fields should come later in the definition.
For example, consider the method_param1
field defined above for RunTask
. We can provide a custom validator which changes the default value for this field depending on what type of algorithm is specified for the --method
option. We will also constrain the options for method
to two specific strings.
from pydantic import Field, validator, ValidationError, root_validator\nclass RunTaskParameters(ThirdPartyParameters):\n \"\"\"Parameters for the runtask binary.\"\"\"\n\n # [...]\n\n # In this case we will use the Python variable name directly when passing\n # the parameter on the command-line\n method: str = Field(\"algo1\", description=\"Algorithm to use.\", flag_type=\"--\")\n\n # For an actual parameter we would probably have a better name. Lets assume\n # This parameter (-p) modifies the behaviour of the method above.\n method_param1: Optional[int] = Field(\n description=\"Modify method performance.\", flag_type=\"-\", rename_param=\"p\"\n )\n\n # We will only allow method to take on one of two values\n @validator(\"method\")\n def validate_method(cls, method: str, values: Dict[str, Any]) -> str:\n \"\"\"Method validator: --method can be algo1 or algo2.\"\"\"\n\n valid_methods: List[str] = [\"algo1\", \"algo2\"]\n if method not in valid_methods:\n raise ValueError(\"method must be algo1 or algo2\")\n return method\n\n # Lets change the default value of `method_param1` depending on `method`\n # NOTE: We didn't provide a default value to the Field above and made it\n # optional. We can use this to test whether someone is purposefully\n # overriding the value of it, and if not, set the default ourselves.\n # We set `always=True` since pydantic will normally not use the validator\n # if the default is not changed\n @validator(\"method_param1\", always=True)\n def validate_method_param1(cls, param1: Optional[int], values: Dict[str, Any]) -> int:\n \"\"\"method param1 validator\"\"\"\n\n # If someone actively defined it, lets just return that value\n # We could instead do some additional validation to make sure that the\n # value they provided is valid...\n if param1 is not None:\n return param1\n\n # method_param1 comes after method, so this will be defined, or an error\n # would have been raised.\n method: str = values['method']\n if method == \"algo1\":\n return 3\n elif method == \"algo2\":\n return 5\n
The special root_validator(pre=False)
can also be used to provide validation of the model as a whole. This is also the recommended method for specifying a result (using result_from_params
) which has a complex dependence on the parameters of the model. This latter use-case is described in FAQ 2 below.
Use a custom validator. The example above shows how to do this. The parameter that depends on another parameter must come LATER in the model defintion than the independent parameter.
TaskResult
is determinable from the parameters model, but it isn't easily specified by one parameter. How can I use result_from_params
to indicate the result?When a result can be identified from the set of parameters defined in a TaskParameters
model, but is not as straightforward as saying it is equivalent to one of the parameters alone, we can set result_from_params
using a custom validator. In the example below, we have two parameters which together determine what the result is, output_dir
and out_name
. Using a validator we will define a result from these two values.
from pydantic import Field, root_validator\n\nclass RunTask3Parameters(ThirdPartyParameters):\n \"\"\"Parameters for the runtask3 binary.\"\"\"\n\n class Config(ThirdPartyParameters.Config):\n set_result: bool = True # This must be set here!\n result_from_params: str = \"\" # We will set this momentarily\n\n # [...] executable, other params, etc.\n\n output_dir: str = Field(\n description=\"Directory to write output to.\",\n flag_type=\"--\",\n rename_param=\"dir\",\n )\n\n out_name: str = Field(\n description=\"The name of the final output file.\",\n flag_type=\"--\",\n rename_param=\"oname\",\n )\n\n # We can still provide other validators as needed\n # But for now, we just set result_from_params\n # Validator name can be anything, we set pre=False so this runs at the end\n @root_validator(pre=False)\n def define_result(cls, values: Dict[str, Any]) -> Dict[str, Any]:\n # Extract the values of output_dir and out_name\n output_dir: str = values[\"output_dir\"]\n out_name: str = values[\"out_name\"]\n\n result: str = f\"{output_dir}/{out_name}\"\n # Now we set result_from_params\n cls.Config.result_from_params = result\n\n # We haven't modified any other values, but we MUST return this!\n return values\n
Task
depends on the output of a previous Task
, how can I specify this dependency? Parameters used to run a Task
are recorded in a database for every Task
. It is also recorded whether or not the execution of that specific parameter set was successful. A utility function is provided to access the most recent values from the database for a specific parameter of a specific Task
. It can also be used to specify whether unsuccessful Task
s should be included in the query. This utility can be used within a validator to specify dependencies. For example, suppose the input of RunTask2
(parameter input
) depends on the output location of RunTask1
(parameter outfile
). A validator of the following type can be used to retrieve the output file and make it the default value of the input parameter.from pydantic import Field, validator\n\nfrom .base import ThirdPartyParameters\nfrom ..db import read_latest_db_entry\n\nclass RunTask2Parameters(ThirdPartyParameters):\n input: str = Field(\"\", description=\"Input file.\", flag_type=\"--\")\n\n @validator(\"input\")\n def validate_input(cls, input: str, values: Dict[str, Any]) -> str:\n if input == \"\":\n task1_out: Optional[str] = read_latest_db_entry(\n f\"{values['lute_config'].work_dir}\", # Working directory. We search for the database here.\n \"RunTask1\", # Name of Task we want to look up\n \"outfile\", # Name of parameter of the Task\n valid_only=True, # We only want valid output files.\n )\n # read_latest_db_entry returns None if nothing is found\n if task1_out is not None:\n return task1_out\n return input\n
There are more examples of this pattern spread throughout the various Task
models.
Executor
: Creating a runnable, \"managed Task
\"","text":"Overview
After a pydantic model has been created, the next required step is to define a managed Task
. In the context of this library, a managed Task
refers to the combination of an Executor
and a Task
to run. The Executor
manages the process of Task
submission and the execution environment, as well as performing any logging, eLog communication, etc. There are currently two types of Executor
to choose from, but only one is applicable to third-party code. The second Executor
is listed below for completeness only. If you need MPI see the note below.
Executor
: This is the standard Executor
. It should be used for third-party uses cases.MPIExecutor
: This performs all the same types of operations as the option above; however, it will submit your Task
using MPI.MPIExecutor
will submit the Task
using the number of available cores - 1. The number of cores is determined from the physical core/thread count on your local machine, or the number of cores allocated by SLURM when submitting on the batch nodes.Using MPI with third-party Task
s
As mentioned, you should setup a third-party Task
to use the first type of Executor
. If, however, your third-party Task
uses MPI this may seem non-intuitive. When using the MPIExecutor
LUTE code is submitted with MPI. This includes the code that performs signalling to the Executor
and exec
s the third-party code you are interested in running. While it is possible to set this code up to run with MPI, it is more challenging in the case of third-party Task
s because there is no Task
code to modify directly! The MPIExecutor
is provided mostly for first-party code. This is not an issue, however, since the standard Executor
is easily configured to run with MPI in the case of third-party code.
When using the standard Executor
for a Task
requiring MPI, the executable
in the pydantic model must be set to mpirun
. For example, a third-party Task
model, that uses MPI but is intended to be run with the Executor
may look like the following. We assume this Task
runs a Python script using MPI.
class RunMPITaskParameters(ThirdPartyParameters):\n class Config(ThirdPartyParameters.Config):\n ...\n\n executable: str = Field(\"mpirun\", description=\"MPI executable\")\n np: PositiveInt = Field(\n max(int(os.environ.get(\"SLURM_NPROCS\", len(os.sched_getaffinity(0)))) - 1, 1),\n description=\"Number of processes\",\n flag_type=\"-\",\n )\n pos_arg: str = Field(\"python\", description=\"Python...\", flag_type=\"\")\n script: str = Field(\"\", description=\"Python script to run with MPI\", flag_type=\"\")\n
Selecting the Executor
After deciding on which Executor
to use, a single line must be added to the lute/managed_tasks.py
module:
# Initialization: Executor(\"TaskName\")\nTaskRunner: Executor = Executor(\"SubmitTask\")\n# TaskRunner: MPIExecutor = MPIExecutor(\"SubmitTask\") ## If using the MPIExecutor\n
In an attempt to make it easier to discern whether discussing a Task
or managed Task
, the standard naming convention is that the Task
(class name) will have a verb in the name, e.g. RunTask
, SubmitTask
. The corresponding managed Task
will use a related noun, e.g. TaskRunner
, TaskSubmitter
, etc.
As a reminder, the Task
name is the first part of the class name of the pydantic model, without the Parameters
suffix. This name must match. E.g. if your pydantic model's class name is RunTaskParameters
, the Task
name is RunTask
, and this is the string passed to the Executor
initializer.
Modifying the environment
If your third-party Task
can run in the standard psana
environment with no further configuration files, the setup process is now complete and your Task
can be run within the LUTE framework. If on the other hand your Task
requires some changes to the environment, this is managed through the Executor
. There are a couple principle methods that the Executor
has to change the environment.
Executor.update_environment
: if you only need to add a few environment variables, or update the PATH
this is the method to use. The method takes a Dict[str, str]
as input. Any variables can be passed/defined using this method. By default, any variables in the dictionary will overwrite those variable definitions in the current environment if they are already present, except for the variable PATH
. By default PATH
entries in the dictionary are prepended to the current PATH
available in the environment the Executor
runs in (the standard psana
environment). This behaviour can be changed to either append, or overwrite the PATH
entirely by an optional second argument to the method.Executor.shell_source
: This method will source a shell script which can perform numerous modifications of the environment (PATH changes, new environment variables, conda environments, etc.). The method takes a str
which is the path to a shell script to source.As an example, we will update the PATH
of one Task
and source a script for a second.
TaskRunner: Executor = Executor(\"RunTask\")\n# update_environment(env: Dict[str,str], update_path: str = \"prepend\") # \"append\" or \"overwrite\"\nTaskRunner.update_environment(\n { \"PATH\": \"/sdf/group/lcls/ds/tools\" } # This entry will be prepended to the PATH available after sourcing `psconda.sh`\n)\n\nTask2Runner: Executor = Executor(\"RunTask2\")\nTask2Runner.shell_source(\"/sdf/group/lcls/ds/tools/new_task_setup.sh\") # Will source new_task_setup.sh script\n
"},{"location":"tutorial/new_task/#using-templates-managing-third-party-configuration-files","title":"Using templates: managing third-party configuration files","text":"Some third-party executables will require their own configuration files. These are often separate JSON or YAML files, although they can also be bash or Python scripts which are intended to be edited. Since LUTE requires its own configuration YAML file, it attempts to handle these cases by using Jinja templates. When wrapping a third-party task a template can also be provided - with small modifications to the Task
's pydantic model, LUTE can process special types of parameters to render them in the template. LUTE offloads all the template rendering to Jinja, making the required additions to the pydantic model small. On the other hand, it does require understanding the Jinja syntax, and the provision of a well-formatted template, to properly parse parameters. Some basic examples of this syntax will be shown below; however, it is recommended that the Task
implementer refer to the official Jinja documentation for more information.
LUTE provides two additional base models which are used for template parsing in conjunction with the primary Task
model. These are: - TemplateParameters
objects which hold parameters which will be used to render a portion of a template. - TemplateConfig
objects which hold two strings: the name of the template file to use and the full path (including filename) of where to output the rendered result.
Task
models which inherit from the ThirdPartyParameters
model, as all third-party Task
s should, allow for extra arguments. LUTE will parse any extra arguments provided in the configuration YAML as TemplateParameters
 objects automatically, which means that they do not need to be explicitly added to the pydantic model (although they can be). As such, the only requirement on the Python side when adding template rendering functionality to the Task
is the addition of one parameter - an instance of TemplateConfig
. The instance MUST be called lute_template_cfg
.
from pydantic import Field, validator\n\nfrom .base import TemplateConfig, ThirdPartyParameters\n\nclass RunTaskParameters(ThirdPartyParameters):\n ...\n # This parameter MUST be called lute_template_cfg!\n lute_template_cfg: TemplateConfig = Field(\n TemplateConfig(\n template_name=\"name_of_template.json\",\n output_path=\"/path/to/write/rendered_output_to.json\",\n ),\n description=\"Template rendering configuration\",\n )\n
LUTE looks for the template in config/templates
, so only the name of the template file to use within that directory is required for the template_name
attribute of lute_template_cfg
 . LUTE can write the output anywhere the user has write permissions, and with any name, so the full absolute path (including the filename) should be used for the output_path
of lute_template_cfg
.
The rest of the work is done by the combination of Jinja, LUTE's configuration YAML file, and the template itself. Understanding the interplay between these components is perhaps best illustrated by an example. As such, let us consider a simple third-party Task
whose only input parameter (on the command-line) is the location of a configuration JSON file. We'll call the third-party executable jsonuser
and our Task
model, the RunJsonUserParameters
. We assume the program is run like:
jsonuser -i <input_file.json>\n
The first step is to setup the pydantic model as before.
from pydantic import Field, validator\n\nfrom .base import TemplateConfig, ThirdPartyParameters\n\nclass RunJsonUserParameters(ThirdPartyParameters):\n executable: str = Field(\n \"/path/to/jsonuser\", description=\"Executable which requires a JSON configuration file.\"\n )\n # Let's assume the JSON file is passed as \"-i <path_to_json>\"\n input_json: str = Field(\n \"\", description=\"Path to the input JSON file.\", flag_type=\"-\", rename_param=\"i\"\n )\n
The next step is to create a template for the JSON file. Let's assume the JSON file looks like:
{\n \"param1\": \"arg1\",\n \"param2\": 4,\n \"param3\": {\n \"a\": 1,\n \"b\": 2\n },\n \"param4\": [\n 1,\n 2,\n 3\n ]\n}\n
Any, or all, of these values can be substituted for, and we can choose how the substitutions are provided. I.e. a substitution can be provided for each variable individually, or, for example for a nested hierarchy, a dictionary can be provided which will substitute all of its items at once. For this simple case, let's provide variables for param1
, param2
, param3.b
and assume that we want the first and second entries for param4
 to be identical for our use case (i.e., we can use one variable for both). In total, this means we will perform 5 substitutions using 4 variables. Jinja will substitute a variable anywhere it sees the following syntax: {{ variable_name }}
 . As such, a valid template for our use case may look like:
{\n \"param1\": {{ str_var }},\n \"param2\": {{ int_var }},\n \"param3\": {\n \"a\": 1,\n \"b\": {{ p3_b }}\n },\n \"param4\": [\n {{ val }},\n {{ val }},\n 3\n ]\n}\n
We save this file as jsonuser.json
in config/templates
 . Next, we will update the original pydantic model to include our template configuration. We still need to decide, however, where to write the rendered template. In this case, we can use the input_json
parameter. We will assume that the user will provide this, although a default value can also be used. A custom validator will be added so that we can take the input_json
value and update the value of lute_template_cfg.output_path
with it.
from typing import Any, Dict # , Optional\n\nfrom pydantic import Field, validator\n\nfrom .base import TemplateConfig, ThirdPartyParameters #, TemplateParameters\n\nclass RunJsonUserParameters(ThirdPartyParameters):\n executable: str = Field(\n \"jsonuser\", description=\"Executable which requires a JSON configuration file.\"\n )\n # Let's assume the JSON file is passed as \"-i <path_to_json>\"\n input_json: str = Field(\n \"\", description=\"Path to the input JSON file.\", flag_type=\"-\", rename_param=\"i\"\n )\n # Add template configuration! *MUST* be called `lute_template_cfg`\n lute_template_cfg: TemplateConfig = Field(\n TemplateConfig(\n template_name=\"jsonuser.json\", # Only the name of the file here.\n output_path=\"\",\n ),\n description=\"Template rendering configuration\",\n )\n # We do not need to include these TemplateParameters, they will be added\n # automatically if provided in the YAML\n #str_var: Optional[TemplateParameters]\n #int_var: Optional[TemplateParameters]\n #p3_b: Optional[TemplateParameters]\n #val: Optional[TemplateParameters]\n\n\n # Tell LUTE to write the rendered template to the location provided with\n # `input_json`. I.e. update `lute_template_cfg.output_path`\n @validator(\"lute_template_cfg\", always=True)\n def update_output_path(\n cls, lute_template_cfg: TemplateConfig, values: Dict[str, Any]\n ) -> TemplateConfig:\n if lute_template_cfg.output_path == \"\":\n lute_template_cfg.output_path = values[\"input_json\"]\n return lute_template_cfg\n
All that is left to render the template is to provide, in the LUTE configuration YAML, values for the variables we want to substitute. In our case we must provide the 4 variable names we included within the substitution syntax ({{ var_name }}
). The names in the YAML must match those in the template.
RunJsonUser:\n input_json: \"/my/chosen/path.json\" # We'll come back to this...\n str_var: \"arg1\" # Will substitute for \"param1\": \"arg1\"\n int_var: 4 # Will substitute for \"param2\": 4\n p3_b: 2 # Will substitute for \"param3\": { \"b\": 2 }\n val: 2 # Will substitute for \"param4\": [2, 2, 3] in the JSON\n
If, on the other hand, a user already has a valid JSON file, it is possible to turn off the template rendering entirely: ALL template variables (TemplateParameters
) are simply excluded from the configuration YAML.
RunJsonUser:\n input_json: \"/path/to/existing.json\"\n #str_var: ...\n #...\n
"},{"location":"tutorial/new_task/#additional-jinja-syntax","title":"Additional Jinja Syntax","text":"There are many other syntactical constructions we can use with Jinja. Some of the useful ones are:
If Statements - E.g. only include portions of the template if a value is defined.
{% if VARNAME is defined %}\n// Stuff to include\n{% endif %}\n
Loops - E.g. Unpacking multiple elements from a dictionary.
{% for name, value in VARNAME.items() %}\n// Do stuff with name and value\n{% endfor %}\n
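To see how these constructions behave, the standalone sketch below renders a template string directly with the jinja2 package. This is purely illustrative: within LUTE the rendering is handled for you, and the variable names used here (optional_section, settings) are made up.
from jinja2 import Template\n\n# A template combining an if-statement and a loop, as described above.\ntemplate_str: str = \"\"\"\n{% if optional_section is defined %}\n// Only included because optional_section was provided\n{% endif %}\n{% for name, value in settings.items() %}\n{{ name }} = {{ value }}\n{% endfor %}\n\"\"\"\n\n# Each keyword argument plays the role of a TemplateParameters entry in the YAML.\nrendered: str = Template(template_str).render(\n    optional_section=True,\n    settings={\"threshold\": 10, \"mode\": \"fast\"},\n)\nprint(rendered)\n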
"},{"location":"tutorial/new_task/#creating-a-first-party-task","title":"Creating a \"First-Party\" Task
","text":"The process for creating a \"First-Party\" Task
is very similar to that for a \"Third-Party\" Task
, with the difference being that you must also write the analysis code. The steps for integration are: 1. Write the TaskParameters
model. 2. Write the Task
class. There are a few rules that need to be adhered to. 3. Make your Task
available by modifying the import function. 4. Specify an Executor
TaskParameters
Model for your Task
","text":"Parameter models have a format that must be followed for \"Third-Party\" Task
s, but \"First-Party\" Task
s have a little more liberty in how parameters are dealt with, since the Task
will do all the parsing itself.
To create a model, the basic steps are: 1. If necessary, create a new module (e.g. new_task_category.py
) under lute.io.models
, or find an appropriate pre-existing module in that directory. - An import
statement must be added to lute.io.models._init_
if a new module is created, so it can be found. - If defining the model in a pre-existing module, make sure to modify the __all__
statement to include it. 2. Create a new model that inherits from TaskParameters
. You can look at lute.models.io.tests.TestReadOutputParameters
for an example. The model must be named <YourTaskName>Parameters
- You should include all relevant parameters here, including input file, output file, and any potentially adjustable parameters. These parameters must be included even if there are some implicit dependencies between Task
s and it would make sense for the parameter to be auto-populated based on some other output. Creating this dependency is done with validators (see step 3.). All parameters should be overridable, and all Task
s should be fully-independently configurable, based solely on their model and the configuration YAML. - To follow the preferred format, parameters should be defined as: param_name: type = Field([default value], description=\"This parameter does X.\")
3. Use validators to do more complex things for your parameters, including populating default values dynamically: - E.g. create default values that depend on other parameters in the model - see for example: SubmitSMDParameters. - E.g. create default values that depend on other Task
s by reading from the database - see for example: TestReadOutputParameters. 4. The model will have access to some general configuration values by inheriting from TaskParameters
. These parameters are all stored in lute_config
which is an instance of AnalysisHeader
(defined here). - For example, the experiment and run number can be obtained from this object and a validator could use these values to define the default input file for the Task
.
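Putting steps 1-4 together, a minimal sketch of a first-party parameter model is shown below. The class and parameter names are hypothetical, the default path layout is invented for illustration, and the exact attribute names on lute_config (e.g. experiment, run) are assumptions based on the description above - check AnalysisHeader for the authoritative definitions.
\"\"\"Models for a hypothetical RunMyAnalysis Task.\"\"\"\n\nfrom typing import Any, Dict\n\nfrom pydantic import Field, validator\n\nfrom .base import TaskParameters\n\nclass RunMyAnalysisParameters(TaskParameters):\n    \"\"\"Parameters for the hypothetical RunMyAnalysis Task.\"\"\"\n\n    input_file: str = Field(\n        \"\", description=\"Path to the input HDF5 file.\"\n    )\n    threshold: float = Field(\n        10.0, description=\"This parameter sets the detection threshold.\"\n    )\n    output_file: str = Field(\n        \"\", description=\"Path to write the analysis results to.\"\n    )\n\n    @validator(\"input_file\", always=True)\n    def set_default_input(cls, input_file: str, values: Dict[str, Any]) -> str:\n        # Populate a default dynamically from the general configuration if the\n        # user did not provide a value. The attribute names and the path layout\n        # below are assumptions for illustration only.\n        if input_file == \"\":\n            exp: str = values[\"lute_config\"].experiment\n            run: int = int(values[\"lute_config\"].run)\n            return f\"/sdf/data/lcls/ds/exp/{exp}/hdf5/run{run:04d}.h5\"\n        return input_file\n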
A number of configuration options and Field attributes are also available for \"First-Party\" Task
models. These are identical to those used for the ThirdPartyTask
s, although there is a smaller selection. These options are reproduced below for convenience.
Config settings and options Under the class definition for Config
in the model, we can modify global options for all the parameters. In addition, there are a number of configuration options related to specifying what the outputs/results from the associated Task
are, and a number of options to modify runtime behaviour. Currently, the available configuration options are:
run_directory
If provided, can be used to specify the directory from which a Task
is run. None
(not provided) NO set_result
bool
. If True
search the model definition for a parameter that indicates what the result is. False
NO result_from_params
If set_result
is True
can define a result using this option and a validator. See also is_result
below. None
(not provided) NO short_flags_use_eq
Use equals sign instead of space for arguments of -
parameters. False
YES - Only affects ThirdPartyTask
s long_flags_use_eq
Use equals sign instead of space for arguments of -
parameters. False
YES - Only affects ThirdPartyTask
s These configuration options modify how the parameter models are parsed and passed along on the command-line, as well as what we consider results and where a Task
can run. The default behaviour is that parameters are assumed to be passed as -p arg
and --param arg
, the Task
will be run in the current working directory (or scratch if submitted with the ARP), and we have no information about Task
results . Setting the above options can modify this behaviour.
short_flags_use_eq
and/or long_flags_use_eq
to True
parameters are instead passed as -p=arg
and --param=arg
.run_directory
to a valid path, we can force a Task
to be run in a specific directory. By default the Task
will be run from the directory you submit the job in, or from your scratch folder (/sdf/scratch/...
) if you submit from the eLog. Some ThirdPartyTask
 s rely on being run from the correct working directory in order to run properly.set_result
to True
we indicate that the TaskParameters
model will provide information on what the TaskResult
is. This setting must be used with one of two options, either the result_from_params
Config
option, described below, or the Field attribute is_result
described in the next sub-section (Field Attributes).result_from_params
is a Config option that can be used when set_result==True
. In conjunction with a validator (described a sections down) we can use this option to specify a result from all the information contained in the model. E.g. if you have a Task
that has parameters for an output_directory
and a output_filename
, you can set result_from_params==f\"{output_directory}/{output_filename}\"
.Field attributes In addition to the global configuration options there are a couple of ways to specify individual parameters. The following Field
attributes are used when parsing the model:
description
Documentation of the parameter's usage or purpose. N/A arg = Field(..., description=\"Argument for...\")
is_result
bool
. If the set_result
Config
option is True
, we can set this to True
to indicate a result. N/A output_result = Field(..., is_result=true)
"},{"location":"tutorial/new_task/#writing-the-task","title":"Writing the Task
","text":"You can write your analysis code (or whatever code to be executed) as long as it adheres to the limited rules below. You can create a new module for your Task
in lute.tasks
or add it to any existing module, if it makes sense for it to belong there. The Task
itself is a single class constructed as:
Task
is a class named in a way that matches its Pydantic model. E.g. RunTask
is the Task
, and RunTaskParameters
is the Pydantic model.Task
class (see template below). If you intend to use MPI see the following section._run
method. This is the method that will be executed when the Task
is run. You can in addition write as many methods as you need. For fine-grained execution control you can also provide _pre_run()
and _post_run()
methods, but this is optional._report_to_executor(msg: Message)
method. Since the Task
is run as a subprocess this method will pass information to the controlling Executor
. You can pass any type of object using this method, strings, plots, arrays, etc.set_result
configuration option in your parameters model, make sure to provide a result when finished. This is done by setting self._result.payload = ...
. You can set the result to be any object. If you have written the result to a file, for example, please provide a path.A minimal template is provided below.
\"\"\"Standard docstring...\"\"\"\n\n__all__ = [\"RunTask\"]\n__author__ = \"\" # Please include so we know who the SME is\n\n# Include any imports you need here\n\nfrom lute.execution.ipc import Message # Message for communication\nfrom lute.io.models.base import * # For TaskParameters\nfrom lute.tasks.task import * # For Task\n\nclass RunTask(Task): # Inherit from Task\n \"\"\"Task description goes here, or in __init__\"\"\"\n\n def __init__(self, *, params: TaskParameters) -> None:\n super().__init__(params=params) # Sets up Task, parameters, etc.\n # Parameters will be available through:\n # self._task_parameters\n # You access with . operator: self._task_parameters.param1, etc.\n # Your result object is availble through:\n # self._result\n # self._result.payload <- Main result\n # self._result.summary <- Short summary\n # self._result.task_status <- Semi-automatic, but can be set manually\n\n def _run(self) -> None:\n # THIS METHOD MUST BE PROVIDED\n self.do_my_analysis()\n\n def do_my_analysis(self) -> None:\n # Send a message, proper way to print:\n msg: Message(contents=\"My message contents\", signal=\"\")\n self._report_to_executor(msg)\n\n # When done, set result - assume we wrote a file, e.g.\n self._result.payload = \"/path/to/output_file.h5\"\n # Optionally also set status - good practice but not obligatory\n self._result.task_status = TaskStatus.COMPLETED\n
"},{"location":"tutorial/new_task/#using-mpi-for-your-task","title":"Using MPI for your Task
","text":"In the case your Task
is written to use MPI
a slight modification to the template above is needed. Specifically, an additional keyword argument should be passed to the base class initializer: use_mpi=True
. This tells the base class to adjust signalling/communication behaviour appropriately for a multi-rank MPI program. Doing this prevents tricky-to-track-down problems due to ranks starting, completing and sending messages at different times. The rest of your code can, as before, be written as you see fit. The use of this keyword argument will also synchronize the start of all ranks and wait until all ranks have finished to exit.
\"\"\"Task which needs to run with MPI\"\"\"\n\n__all__ = [\"RunTask\"]\n__author__ = \"\" # Please include so we know who the SME is\n\n# Include any imports you need here\n\nfrom lute.execution.ipc import Message # Message for communication\nfrom lute.io.models.base import * # For TaskParameters\nfrom lute.tasks.task import * # For Task\n\n# Only the init is shown\nclass RunMPITask(Task): # Inherit from Task\n \"\"\"Task description goes here, or in __init__\"\"\"\n\n # Signal the use of MPI!\n def __init__(self, *, params: TaskParameters, use_mpi: bool = True) -> None:\n super().__init__(params=params, use_mpi=use_mpi) # Sets up Task, parameters, etc.\n # That's it.\n
"},{"location":"tutorial/new_task/#message-signals","title":"Message signals","text":"Signals in Message
objects are strings and can be one of the following:
LUTE_SIGNALS: Set[str] = {\n \"NO_PICKLE_MODE\",\n \"TASK_STARTED\",\n \"TASK_FAILED\",\n \"TASK_STOPPED\",\n \"TASK_DONE\",\n \"TASK_CANCELLED\",\n \"TASK_RESULT\",\n}\n
Each of these signals is associated with a hook on the Executor
-side. They are for the most part used by base classes; however, you can choose to make use of them manually as well.
Task
available","text":"Once the Task
has been written, it needs to be made available for import. Since different Task
s can have conflicting dependencies and environments, this is managed through an import function. When the Task
is done, or ready for testing, a condition is added to lute.tasks.__init__.import_task
. For example, assume the Task
is called RunXASAnalysis
and it's defined in a module called xas.py
, we would add the following lines to the import_task
function:
# in lute.tasks.__init__\n\n# ...\n\ndef import_task(task_name: str) -> Type[Task]:\n # ...\n if task_name == \"RunXASAnalysis\":\n from .xas import RunXASAnalysis\n\n return RunXASAnalysis\n
"},{"location":"tutorial/new_task/#defining-an-executor","title":"Defining an Executor
","text":"The process of Executor
definition is identical to the process as described for ThirdPartyTask
s above. The one exception is if you defined the Task
to use MPI as described in the section above (Using MPI for your Task
 ), you will likely want to use the MPIExecutor
.
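For instance, a managed Task for the RunXASAnalysis example above might be declared as follows (the module shown is an assumption - place the definition wherever the other managed Task declarations live, e.g. lute/managed_tasks.py):
# Declared alongside the other managed Task definitions (assumed: lute/managed_tasks.py)\nfrom lute.execution.executor import Executor, MPIExecutor\n\nXASAnalyzer: Executor = Executor(\"RunXASAnalysis\")\n# Or, if RunXASAnalysis had been written with use_mpi=True:\n# XASAnalyzer: MPIExecutor = MPIExecutor(\"RunXASAnalysis\")\n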
class BaseExecutor(ABC):
+488
+489
+490
class BaseExecutor(ABC):
"""ABC to manage Task execution and communication with user services.
When running in a workflow, "tasks" (not the class instances) are submitted
@@ -2633,7 +2635,9 @@
# network.
time.sleep(0.1)
# Propagate any env vars setup by Communicators - only update LUTE_ vars
- tmp: Dict[str, str] = {key: os.environ[key] for key in os.environ if "LUTE_" in key}
+ tmp: Dict[str, str] = {
+ key: os.environ[key] for key in os.environ if "LUTE_" in key
+ }
self._analysis_desc.task_env.update(tmp)
def _submit_task(self, cmd: str) -> subprocess.Popen:
@@ -3267,9 +3271,7 @@
Source code in lute/execution/executor.py
- 309
-310
-311
+ 311
312
313
314
@@ -3310,7 +3312,9 @@
349
350
351
-352
def execute_task(self) -> None:
+352
+353
+354
def execute_task(self) -> None:
"""Run the requested Task as a subprocess."""
self._pre_task()
lute_path: Optional[str] = os.getenv("LUTE_PATH")
@@ -3378,14 +3382,14 @@
Source code in lute/execution/executor.py
- 478
-479
-480
+ 480
481
482
483
484
-485
def process_results(self) -> None:
+485
+486
+487
def process_results(self) -> None:
"""Perform any necessary steps to process TaskResults object.
Processing will depend on subclass. Examples of steps include, moving
@@ -4142,9 +4146,7 @@
Source code in lute/execution/executor.py
- 491
-492
-493
+ 493
494
495
496
@@ -4319,7 +4321,9 @@
665
666
667
-668
class Executor(BaseExecutor):
+668
+669
+670
class Executor(BaseExecutor):
"""Basic implementation of an Executor which manages simple IPC with Task.
Attributes:
@@ -4527,9 +4531,7 @@
Source code in lute/execution/executor.py
- 525
-526
-527
+ 527
528
529
530
@@ -4602,7 +4604,9 @@
597
598
599
-600
def add_default_hooks(self) -> None:
+600
+601
+602
def add_default_hooks(self) -> None:
"""Populate the set of default event hooks."""
def no_pickle_mode(self: Executor, msg: Message):
@@ -4746,9 +4750,7 @@
Source code in lute/execution/executor.py
- 671
-672
-673
+ 673
674
675
676
@@ -4790,7 +4792,9 @@
712
713
714
-715
class MPIExecutor(Executor):
+715
+716
+717
class MPIExecutor(Executor):
"""Runs first-party Tasks that require MPI.
This Executor is otherwise identical to the standard Executor, except it
diff --git a/dev/source/tasks/sfx_find_peaks/index.html b/dev/source/tasks/sfx_find_peaks/index.html
index 2523df7d..dfe3621b 100644
--- a/dev/source/tasks/sfx_find_peaks/index.html
+++ b/dev/source/tasks/sfx_find_peaks/index.html
@@ -1376,7 +1376,8 @@
Source code in lute/tasks/sfx_find_peaks.py
- 31
+ 30
+ 31
32
33
34
@@ -1708,7 +1709,21 @@
360
361
362
-363
class CxiWriter:
+363
+364
+365
+366
+367
+368
+369
+370
+371
+372
+373
+374
+375
+376
+377
class CxiWriter:
def __init__(
self,
@@ -1887,6 +1902,21 @@
ch_rows: NDArray[numpy.float_] = peaks[:, 0] * self._det_shape[1] + peaks[:, 1]
ch_cols: NDArray[numpy.float_] = peaks[:, 2]
+ if self._outh5["/entry_1/data_1/data"].shape[0] <= self._index:
+ self._outh5["entry_1/data_1/data"].resize(self._index + 1, axis=0)
+ ds_key: str
+ for ds_key in self._outh5["/entry_1/result_1"].keys():
+ self._outh5[f"/entry_1/result_1/{ds_key}"].resize(
+ self._index + 1, axis=0
+ )
+ for ds_key in (
+ "machineTime",
+ "machineTimeNanoSeconds",
+ "fiducial",
+ "photon_energy_eV",
+ ):
+ self._outh5[f"/LCLS/{ds_key}"].resize(self._index + 1, axis=0)
+
# Entry_1 entry for processing with CrystFEL
self._outh5["/entry_1/data_1/data"][self._index, :, :] = img.reshape(
-1, img.shape[-1]
@@ -2099,7 +2129,8 @@
Source code in lute/tasks/sfx_find_peaks.py
- 33
+ 32
+ 33
34
35
36
@@ -2241,8 +2272,7 @@
172
173
174
-175
-176
def __init__(
+175
def __init__(
self,
outdir: str,
rank: int,
@@ -2414,21 +2444,7 @@
Source code in lute/tasks/sfx_find_peaks.py
- 314
-315
-316
-317
-318
-319
-320
-321
-322
-323
-324
-325
-326
-327
-328
+ 328
329
330
331
@@ -2463,7 +2479,21 @@ 360
361
362
-363
def optimize_and_close_file(
+363
+364
+365
+366
+367
+368
+369
+370
+371
+372
+373
+374
+375
+376
+377
def optimize_and_close_file(
self,
num_hits: int,
max_peaks: int,
@@ -2550,7 +2580,8 @@
Source code in lute/tasks/sfx_find_peaks.py
- 178
+ 177
+178
179
180
181
@@ -2650,7 +2681,21 @@
275
276
277
-278
def write_event(
+278
+279
+280
+281
+282
+283
+284
+285
+286
+287
+288
+289
+290
+291
+292
def write_event(
self,
img: NDArray[numpy.float_],
peaks: Any, # Not typed becomes it comes from psana
@@ -2682,6 +2727,21 @@
ch_rows: NDArray[numpy.float_] = peaks[:, 0] * self._det_shape[1] + peaks[:, 1]
ch_cols: NDArray[numpy.float_] = peaks[:, 2]
+ if self._outh5["/entry_1/data_1/data"].shape[0] <= self._index:
+ self._outh5["entry_1/data_1/data"].resize(self._index + 1, axis=0)
+ ds_key: str
+ for ds_key in self._outh5["/entry_1/result_1"].keys():
+ self._outh5[f"/entry_1/result_1/{ds_key}"].resize(
+ self._index + 1, axis=0
+ )
+ for ds_key in (
+ "machineTime",
+ "machineTimeNanoSeconds",
+ "fiducial",
+ "photon_energy_eV",
+ ):
+ self._outh5[f"/LCLS/{ds_key}"].resize(self._index + 1, axis=0)
+
# Entry_1 entry for processing with CrystFEL
self._outh5["/entry_1/data_1/data"][self._index, :, :] = img.reshape(
-1, img.shape[-1]
@@ -2779,21 +2839,7 @@
Source code in lute/tasks/sfx_find_peaks.py
- 280
-281
-282
-283
-284
-285
-286
-287
-288
-289
-290
-291
-292
-293
-294
+ 294
295
296
297
@@ -2811,7 +2857,21 @@ 309
310
311
-312
def write_non_event_data(
+312
+313
+314
+315
+316
+317
+318
+319
+320
+321
+322
+323
+324
+325
+326
def write_non_event_data(
self,
powder_hits: NDArray[numpy.float_],
powder_misses: NDArray[numpy.float_],
@@ -2879,21 +2939,7 @@
Source code in lute/tasks/sfx_find_peaks.py
- 575
-576
-577
-578
-579
-580
-581
-582
-583
-584
-585
-586
-587
-588
-589
+ 589
590
591
592
@@ -3122,14 +3168,38 @@
815
816
817
-818
class FindPeaksPyAlgos(Task):
+818
+819
+820
+821
+822
+823
+824
+825
+826
+827
+828
+829
+830
+831
+832
+833
+834
+835
+836
+837
+838
+839
+840
class FindPeaksPyAlgos(Task):
"""
Task that performs peak finding using the PyAlgos peak finding algorithms and
writes the peak information to CXI files.
"""
- def __init__(self, *, params: TaskParameters) -> None:
- super().__init__(params=params)
+ def __init__(self, *, params: TaskParameters, use_mpi: bool = True) -> None:
+ super().__init__(params=params, use_mpi=use_mpi)
+ if self._task_parameters.compression is not None:
+ from libpressio import PressioCompressor
def _run(self) -> None:
ds: Any = MPIDataSource(
@@ -3306,9 +3376,15 @@
# TODO: Fix bug here
# generate / update powders
if peaks.shape[0] >= self._task_parameters.min_peaks:
- powder_hits = numpy.maximum(powder_hits, img)
+ powder_hits = numpy.maximum(
+ powder_hits,
+ img.reshape(-1, img.shape[-1]),
+ )
else:
- powder_misses = numpy.maximum(powder_misses, img)
+ powder_misses = numpy.maximum(
+ powder_misses,
+ img.reshape(-1, img.shape[-1]),
+ )
if num_empty_images != 0:
msg: Message = Message(
@@ -3414,25 +3490,25 @@
Source code in lute/tasks/sfx_find_peaks.py
- 554
-555
-556
-557
-558
-559
-560
-561
-562
-563
-564
-565
-566
-567
-568
+ 568
569
570
571
-572
def add_peaks_to_libpressio_configuration(lp_json, peaks) -> Dict[str, Any]:
+572
+573
+574
+575
+576
+577
+578
+579
+580
+581
+582
+583
+584
+585
+586
def add_peaks_to_libpressio_configuration(lp_json, peaks) -> Dict[str, Any]:
"""
Add peak infromation to libpressio configuration
@@ -3488,21 +3564,7 @@
Source code in lute/tasks/sfx_find_peaks.py
- 466
-467
-468
-469
-470
-471
-472
-473
-474
-475
-476
-477
-478
-479
-480
+ 480
481
482
483
@@ -3573,7 +3635,21 @@ 548
549
550
-551
def generate_libpressio_configuration(
+551
+552
+553
+554
+555
+556
+557
+558
+559
+560
+561
+562
+563
+564
+565
def generate_libpressio_configuration(
compressor: Literal["sz3", "qoz"],
roi_window_size: int,
bin_size: int,
@@ -3699,21 +3775,7 @@
Source code in lute/tasks/sfx_find_peaks.py
- 366
-367
-368
-369
-370
-371
-372
-373
-374
-375
-376
-377
-378
-379
-380
+ 380
381
382
383
@@ -3796,7 +3858,21 @@
460
461
462
-463
def write_master_file(
+463
+464
+465
+466
+467
+468
+469
+470
+471
+472
+473
+474
+475
+476
+477
def write_master_file(
mpi_size: int,
outdir: str,
exp: str,