Astronomer-on-AWS proof-of-concept #51

ian-r-rose · 2023-03-20T21:48:57Z

As part of cagov/data-orchestration#4, I'd like to stand up AWS infrastructure for Astronomer.

I'm copying some ideas from #138 on things to look out for when evaluating how easy this approach is to use. I would score each of these dimensions from "makes me happy" to "makes me sad".
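One way to keep track of the evaluation is a small scorecard keyed by the dimensions listed in this issue. This is only a sketch: the `Score` enum, the `NEUTRAL` midpoint, and the helper names are assumptions for illustration, since the issue only names the two ends of the scale.

```python
from enum import IntEnum


class Score(IntEnum):
    # Only the two endpoints come from the issue; NEUTRAL is an assumed midpoint.
    MAKES_ME_SAD = 1
    NEUTRAL = 2
    MAKES_ME_HAPPY = 3


# The evaluation dimensions, as named in this issue.
DIMENSIONS = [
    "Custom software environments",
    "Compute resources",
    "Integration with AWS/GCP services",
    "API access and CI integration",
    "dbt Integration",
]


def blank_scorecard():
    """Return a scorecard with every dimension still unscored (None)."""
    return {dimension: None for dimension in DIMENSIONS}


def unscored(scorecard):
    """List the dimensions that have not been evaluated yet."""
    return [d for d, score in scorecard.items() if score is None]


scorecard = blank_scorecard()
# Placeholder value to show usage; this is not an actual finding.
scorecard["dbt Integration"] = Score.MAKES_ME_HAPPY
print(unscored(scorecard))
```

Using an `IntEnum` keeps the scores comparable across tools if the same scorecard is filled in for other orchestrators.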

Custom software environments

The GIS stack often requires custom software environments. That is to say, whatever default image the tool uses will not do the job, so we'll need to provide our own image. We'll want to evaluate how easy it is to build our own image and provide it to the orchestration tool.

Compute resources

Do they provide a compute cluster or other batch-like service for running custom jobs? Or will we need to bring our own K8s/ECS/Batch cluster? If the latter, how easy is it to set up?

Integration with AWS/GCP services

If we do have services running in our own cloud account, how easy is it to interact with them? Are there nice user interfaces for securely providing service account credentials?

API access and CI integration

How painful is it to deploy new versions of a pipeline? Are there CI tools or custom GitHub actions for doing this? Ideally it is simple to deploy-on-merge.

dbt Integration

This is one of the most important dimensions: all of the major orchestrators have been implementing some level of integration with dbt (which provides its own DAG abstraction). How pleasant are these integrations to use?

- Is it easy to trigger a dbt run from the API? In the test, we'll want to kick off dbt after the initial load.
- Is it easy to have a task triggered by the completion of a dbt run? In the test, we'll want to send off an email/ad-hoc-report after the dbt run is complete.
- Is there visibility into a dbt run while it is going on? Can we see:
  - The structure of a dbt DAG
  - The current status of the dbt DAG as it runs
  - Any error states or warnings from dbt
  - Logs from the dbt cl
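To make the test scenario concrete, here is a rough sketch of the ordering we want (initial load, then dbt, then the report) and of the kind of error/warning summary we would hope the orchestrator surfaces. It does not use Astronomer or Airflow APIs: the standard-library `graphlib` stands in for the orchestrator's scheduler, the task names are hypothetical, and the sample data is made up to resemble the `results` list (with `unique_id`, `status`, and `message` fields) that dbt writes to `target/run_results.json`.

```python
from collections import Counter
from graphlib import TopologicalSorter

# Hypothetical task names for the test pipeline; each maps to the tasks it depends on.
PIPELINE = {
    "initial_load": set(),
    "dbt_run": {"initial_load"},
    "send_report": {"dbt_run"},
}


def execution_order(pipeline):
    """Return the tasks in an order that respects their dependencies."""
    return list(TopologicalSorter(pipeline).static_order())


def summarize_run_results(run_results):
    """Count dbt nodes by status and collect the ones that need attention."""
    counts = Counter(r["status"] for r in run_results["results"])
    problems = [
        (r["unique_id"], r["status"], r.get("message"))
        for r in run_results["results"]
        if r["status"] in {"error", "fail", "warn"}
    ]
    return counts, problems


# Illustrative data shaped like dbt's run_results.json (not real output).
sample = {
    "results": [
        {"unique_id": "model.demo.stg_parcels", "status": "success", "message": "OK"},
        {"unique_id": "model.demo.parcels", "status": "error", "message": "relation not found"},
        {"unique_id": "test.demo.not_null_parcels_id", "status": "warn", "message": "2 rows"},
    ]
}

print(execution_order(PIPELINE))
counts, problems = summarize_run_results(sample)
print(dict(counts))
for unique_id, status, message in problems:
    print(f"{status}: {unique_id} ({message})")
```

If an orchestrator's dbt integration shows this information natively, per model and while the run is in progress, that would count toward "makes me happy"; having to parse artifacts ourselves after the fact would count against it.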

ram-kishore-odi · 2024-12-07T01:45:52Z

Closing this as not planned. The POC story will be taken up later if needed. See #378 for additional information.

ian-r-rose self-assigned this Mar 20, 2023

This was referenced Nov 14, 2024

Create plan for Infrastructure Improvements Step 0: Review Open Issues #377

Closed

Create plan for Infrastructure Improvements Step 1: Create Roadmap #378

Open

ram-kishore-odi closed this as completed Dec 7, 2024
