I'm copying some ideas from #138 on things to look out for when evaluating how easy this approach is to use. Along each of these dimensions, I would score a tool from "makes me happy" to "makes me sad".
Custom software environments
The GIS stack often requires custom software environments. That is to say, whatever default image the tool uses will not do the job, so we'll need to provide our own image. We'll want to evaluate how easy it is to build our own image and provide it to the orchestration tool.
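To make this concrete, here is a minimal sketch of what "providing our own image" looks like if the orchestrator is Airflow-based (as with MWAA, discussed below): point a task at a custom image via the `KubernetesPodOperator`. The registry path and entrypoint are hypothetical, and the import path varies a bit across provider versions.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

with DAG("gis_custom_image_demo", start_date=datetime(2023, 1, 1), schedule=None) as dag:
    reproject = KubernetesPodOperator(
        task_id="reproject_parcels",
        name="reproject-parcels",
        # Custom image with the GIS stack (GDAL, geopandas, etc.) baked in;
        # the registry path here is a placeholder.
        image="123456789012.dkr.ecr.us-west-2.amazonaws.com/gis-stack:latest",
        # Hypothetical module entrypoint inside the image.
        cmds=["python", "-m", "pipelines.reproject"],
        get_logs=True,
    )
```

The evaluation question is then: how much ceremony sits between `docker build` and this operator actually pulling the image (registry auth, image allowlists, and so on)?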
Compute resources
Do they provide a compute cluster or other batch-like service for running custom jobs? Or will we need to bring our own K8s/ECS/Batch cluster? If the latter, how easy is it to set up?
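For the bring-your-own-compute case, one useful yardstick is how little glue it takes to submit work to an existing cluster. A sketch against AWS Batch, assuming a recent Amazon provider (the queue and job definition names are hypothetical, and the `container_overrides` parameter was called `overrides` in older provider releases):

```python
from airflow.providers.amazon.aws.operators.batch import BatchOperator

generate_tiles = BatchOperator(
    task_id="generate_tiles",
    job_name="generate-tiles",
    job_queue="gis-batch-queue",        # hypothetical, pre-provisioned queue
    job_definition="gis-stack-jobdef",  # hypothetical job definition
    container_overrides={"command": ["python", "-m", "pipelines.tiles"]},
)
```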
Integration with AWS/GCP services
If we do have services running in our own cloud account, how easy is it to interact with them? Are there nice user interfaces for securely providing service account credentials?
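In Airflow, for comparison, this usually reduces to named connections: credentials live in the metastore or a secrets backend, and DAG code only references a connection ID. A minimal sketch (the connection ID is Airflow's default; the bucket name is a placeholder):

```python
from airflow.providers.amazon.aws.hooks.s3 import S3Hook

def upload_report(**context):
    # "aws_default" resolves to credentials held in Airflow's connection
    # store or secrets backend -- nothing sensitive appears in DAG code.
    hook = S3Hook(aws_conn_id="aws_default")
    hook.load_string("ok", key="reports/latest.txt", bucket_name="my-gis-bucket")
```

A tool scores "makes me happy" here if rotating those credentials never requires touching pipeline code.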
API access and CI integration
How painful is it to deploy new versions of a pipeline? Are there CI tools or custom GitHub Actions for doing this? Ideally, deploy-on-merge is simple to set up.
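One concrete probe: Airflow 2 exposes a stable REST API, so a CI job can trigger a smoke-test run after deploying. A sketch with `requests` (the host, DAG ID, and credentials are placeholders; MWAA in particular fronts this with its own token exchange):

```python
import requests

resp = requests.post(
    "https://airflow.example.com/api/v1/dags/gis_pipeline/dagRuns",
    auth=("ci-user", "ci-password"),  # placeholder credentials
    json={"conf": {"triggered_by": "ci"}},
    timeout=30,
)
resp.raise_for_status()
```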
dbt Integration
This is one of the most important dimensions: all of the major orchestrators have been implementing some level of integration with dbt (which provides its own DAG abstraction). How pleasant are these integrations to use?
Is it easy to trigger a dbt run from the API? In the test, we'll want to kick off dbt after the initial load.
Is it easy to have a task triggered by the completion of a dbt run? In the test, we'll want to send off an email or ad-hoc report after the dbt run is complete (see the sketch at the end of this section).
Is there visibility into a dbt run while it is going on? Can we see:
The structure of a dbt DAG
The current status of the dbt DAG as it runs
Any error states or warnings from dbt
Logs from the dbt CLI
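As a concrete probe of the first two questions above, the smallest useful test is a three-task DAG: load, then dbt, then notify. A sketch using plain operators rather than any vendor-specific dbt integration (the loader module, project path, and email address are hypothetical, and `EmailOperator` assumes SMTP is configured for the deployment):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.email import EmailOperator

with DAG("dbt_smoke_test", start_date=datetime(2023, 1, 1), schedule=None) as dag:
    load = BashOperator(
        task_id="initial_load",
        bash_command="python -m pipelines.load",  # hypothetical loader
    )
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command="dbt run --project-dir /opt/dbt/project",  # hypothetical path
    )
    notify = EmailOperator(
        task_id="send_report",
        to="data-team@example.com",  # placeholder address
        subject="dbt run complete",
        html_content="The dbt run finished.",
    )
    load >> dbt_run >> notify
```

With this baseline in hand, the visibility questions become: can the tool expand the `dbt_run` node into dbt's model-level DAG, or does it stay a black-box bash task?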
As part of cagov/data-orchestration#4, I'd like to stand up AWS infrastructure for MWAA using Terraform.
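Not the Terraform itself, but for orientation, the equivalent boto3 call shows the moving parts an MWAA environment needs; every identifier below is a placeholder, and the real plan is to express the same resources in Terraform:

```python
import boto3

mwaa = boto3.client("mwaa")
mwaa.create_environment(
    Name="data-orchestration-dev",
    SourceBucketArn="arn:aws:s3:::my-mwaa-dags-bucket",  # S3 bucket holding DAGs
    DagS3Path="dags",
    ExecutionRoleArn="arn:aws:iam::123456789012:role/mwaa-execution-role",
    NetworkConfiguration={
        "SubnetIds": ["subnet-aaaa1111", "subnet-bbbb2222"],  # private subnets
        "SecurityGroupIds": ["sg-cccc3333"],
    },
)
```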