Merge branch 'dev' into ENH/yaml_substitutions
gadorlhiac committed May 6, 2024
2 parents 9a0bcfa + 405417f commit d7f116e
Showing 3 changed files with 78 additions and 0 deletions.
2 changes: 2 additions & 0 deletions docs/adrs/README.md
@@ -12,4 +12,6 @@
| 5 | 2023-12-06 | Task-Executor IPC is Managed by Communicator Objects | **Proposed** |
| 6 | 2024-02-12 | Third-party Config Files Managed by Templates Rendered by `ThirdPartyTask`s | **Proposed** |
| 7 | 2024-02-12 | `Task` Configuration is Stored in a Database Managed by `Executor`s | **Proposed** |
| 8 | 2024-03-18 | Airflow credentials/authorization requires special launch program. | **Proposed** |
| 9 | 2024-04-15 | Airflow launch script will run as long lived batch job. | **Proposed** |
| | | | |
37 changes: 37 additions & 0 deletions docs/adrs/adr-8.md
@@ -0,0 +1,37 @@
# [ADR-8] Airflow credentials/authorization requires special launch program

**Date:** 2024-03-18

## Status
**Proposed**

## Context and Problem Statement
- Airflow is used as the workflow manager.
- Airflow does not currently support multi-tenancy, and LDAP is not currently supported for authentication.
- Multiple users will be expected to run the software and thus need to authenticate against the Airflow API.
- We require a mechanism to control shared credentials for multiple users.
- The credentials are admin credentials, so we do not want unconstrained access to them.
- We want users to run workflows, for instance, but not to have free access to add and remove workflows.

## Decision
A closed-source `lute_launcher` program will be used to run the Airflow launch scripts. This program has the permissions required to read the credentials; users should otherwise not have access to them. This helps ensure that everyone can use the credentials, but only to run workflows and not to perform restricted admin activities.
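
As an illustration of the kind of request a launch script makes on a user's behalf, the following is a minimal sketch, not the actual `lute_launcher` or launch script code. It assumes the Airflow stable REST API is reachable with HTTP basic authentication enabled; the function name `build_trigger_request`, the host, the DAG name, and the credentials are all hypothetical. The function only builds the request that triggers a DAG run, so nothing is sent over the network here.

```python
import base64
import json
import urllib.request


def build_trigger_request(base_url, dag_id, user, password, conf=None):
    """Build (but do not send) a request that triggers one run of a DAG.

    Assumption: the Airflow stable REST API with basic authentication.
    Only the "trigger a DAG run" operation is exposed by this helper.
    """
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    body = json.dumps({"conf": conf or {}}).encode()
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/dags/{dag_id}/dagRuns",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical values, for illustration only.
request = build_trigger_request(
    "http://airflow.example:8080",
    "example_dag",
    "admin",
    "secret",
    conf={"experiment": "demo"},
)
```

Because the shared credentials only ever appear inside a helper like this, a wrapper program can offer users the ability to start workflows without handing them the credentials themselves.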

### Decision Drivers
* Need shared access to credentials for the purpose of launching jobs.
* Restricted access to credentials for administrative activities.
* Ease of use for users
* Authentication should be automatic: users cannot be asked for passwords, etc., for jobs that need to run automatically upon data acquisition.

### Considered Options
* LDAP - this may be used in the future, but requires backend work outside of our control. We will revisit the implementation arising from this ADR in the future if LDAP is supported.
*

## Consequences
* Added complexity: launching a workflow now depends on an additional closed-source program (`lute_launcher`).

## Compliance


## Metadata
- This ADR WILL be revisited during the post-mortem of the first prototype.
- Compliance section will be updated as prototype evolves.
39 changes: 39 additions & 0 deletions docs/adrs/adr-9.md
@@ -0,0 +1,39 @@
# [ADR-9] Airflow launch script will run as long lived batch job.

**Date:** 2024-04-15

## Status
**Proposed**

## Context and Problem Statement
- Each `Task` will produce its own log file.
- Log files from jobs (i.e. DAGs/workflows) run by different users will be in different locations/directories.
- None of these log files will be accessible from the Web UI of the eLog unless they are available to the initial launch script which starts the workflow.

## Decision
The Airflow launch script will be a long-lived process, running for the duration of the entire DAG. It will provide basic status logging information, e.g. which `Task`s are running and whether they succeeded or failed. Additionally, at the end of each `Task` job, the launch job will collect the log file from that job and append it to its own log.

As the Airflow launch script is the entry point used from the eLog, only its log file is available to users of that UI. Converting the launch script into a long-lived monitoring job makes the log information from every `Task` easily accessible there.

To accomplish this, the launch script must be submitted as a batch job in order to comply with the 30 second timeout imposed on jobs run by the ARP. This necessitates providing an additional wrapper script.
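
The log collection step described above can be sketched as follows. This is a minimal illustration, not the actual `launch_airflow.py` code: the function name `append_task_log`, the `[launcher]` prefix, and the way the `Task` name, state, and log path are obtained are all assumptions.

```python
import sys
from pathlib import Path


def append_task_log(task_name, state, log_path, out=sys.stdout):
    """Write a status line for a finished Task, then copy its log into `out`.

    Returns True if the Task's log file was found and copied, else False.
    """
    out.write(f"[launcher] Task {task_name} finished with state: {state}\n")
    path = Path(log_path)
    if not path.is_file():
        out.write(f"[launcher] No log file found at {path}\n")
        return False
    last = "\n"
    with path.open("r", errors="replace") as log:
        for last in log:
            out.write(last)
    # Keep the closing marker on its own line even if the log
    # does not end with a newline.
    if not last.endswith("\n"):
        out.write("\n")
    out.write(f"[launcher] End of log for Task {task_name}\n")
    return True
```

A monitoring loop would call a helper like this once per `Task` as each one finishes. Since the launch job's own output is what the eLog displays, writing to standard output is enough to surface every `Task` log in one place.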

### Decision Drivers
* Log availability from the eLog.
* All logs available from a single location.

### Considered Options
* All jobs append to the same initial log file, which is specified at submission (`--open-mode=append` for SLURM).
* A long-lived monitoring job, which also provides the opportunity to include additional information.
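
For comparison, the shared-log-file option could be sketched as below, assuming each job is submitted with `sbatch`. The helper name `build_sbatch_command` and the paths are hypothetical; `--output` and `--open-mode=append` are standard `sbatch` options.

```python
def build_sbatch_command(script, shared_log):
    """Return an sbatch command that appends job output to a shared log file."""
    return [
        "sbatch",
        f"--output={shared_log}",  # every job writes to the same file
        "--open-mode=append",      # append instead of truncating the file
        script,
    ]


# Hypothetical paths, for illustration only.
command = build_sbatch_command("run_task.sh", "/path/to/shared.log")
```

This keeps all output in one file without an extra process, but it offers no place to add the status information that a monitoring job can report.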

## Consequences
* There needs to be an additional wrapper script: `submit_launch_airflow.sh` which submits the `launch_airflow.py` script (run by `lute_launcher`) as a batch job.
* Jobs run directly by the ARP cannot be long-lived: there is a 30 second timeout.
* The ARP was intended to submit batch jobs - it captures the log file from batch jobs, so running the job directly or submitting as a batch job is equivalent in terms of presenting information to the eLog UI.
* Another core is used to run the job. Overhead is now two cores - 1 for the monitoring job (`launch_airflow.py`) and 1 for the `Executor` process.

## Compliance


## Metadata
- This ADR WILL be revisited during the post-mortem of the first prototype.
- Compliance section will be updated as prototype evolves.
