Astronomer-on-AWS proof-of-concept #51

Closed
ian-r-rose opened this issue Mar 20, 2023 · 1 comment
@ian-r-rose
Member

As part of cagov/data-orchestration#4, I'd like to stand up AWS infrastructure for Astronomer.

I'm copying some ideas from #138 on things to look out for when evaluating how easy this approach is to use. I would score the approach along each of these dimensions, from "makes me happy" to "makes me sad".

Custom software environments

The GIS stack often requires custom software environments. That is to say, whatever default image the tool uses will not do the job, so we'll need to provide our own image. We'll want to evaluate how easy it is to build our own image and provide it to the orchestration tool.
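One way to judge whether a custom image "does the job" is a small smoke test run inside the candidate image. The sketch below is only an illustration: the package list in `REQUIRED_MODULES` is an assumption about what a GIS pipeline might import, not something specified in this issue.

```python
from importlib.util import find_spec

# Hypothetical list of GIS packages; replace with whatever the pipelines import.
REQUIRED_MODULES = ["geopandas", "shapely", "pyproj", "rasterio"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if find_spec(name) is None]
```

Running `missing_modules(REQUIRED_MODULES)` in the image and failing the build on a non-empty result would give a quick signal that the environment is usable.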

Compute resources

Do they provide a compute cluster or other batch-like service for running custom jobs? Or will we need to bring our own K8s/ECS/Batch cluster? If the latter, how easy is it to set up?

Integration with AWS/GCP services

If we do have services running in our own cloud account, how easy is it to interact with them? Are there nice user interfaces for securely providing service account credentials?

API access and CI integration

How painful is it to deploy new versions of a pipeline? Are there CI tools or custom GitHub actions for doing this? Ideally it is simple to deploy-on-merge.
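As a rough sketch of what deploy-on-merge could look like from a CI job, the helper below decides whether a run should deploy by reading the standard GitHub Actions variables `GITHUB_EVENT_NAME` and `GITHUB_REF`. The branch name, the deployment ID, and the assumption that the Astro CLI is installed and authenticated in the job are all hypothetical.

```python
def should_deploy(env, branch="main"):
    """Return True when the CI run is a push to the deployment branch."""
    return (
        env.get("GITHUB_EVENT_NAME") == "push"
        and env.get("GITHUB_REF") == f"refs/heads/{branch}"
    )


def deploy_command(deployment_id):
    """Build the command a CI step would run to deploy the project."""
    # Assumption: the Astro CLI is available and already authenticated.
    # deployment_id is a placeholder supplied by the CI configuration.
    return ["astro", "deploy", deployment_id]
```

In a workflow step, one would pass `os.environ` to `should_deploy` and hand the result of `deploy_command` to `subprocess.run` only when it returns `True`.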

dbt Integration

This is one of the most important dimensions: all of the major orchestrators have been implementing some level of integration with dbt (which provides its own DAG abstraction). How pleasant are these integrations to use?

  • Is it easy to trigger a dbt run from the API? In the test, we'll want to kick off dbt after the initial load.
  • Is it easy to have a task triggered by the completion of a dbt run? In the test, we'll want to send off an email/ad-hoc-report after the dbt run is complete.
  • Is there visibility into a dbt run while it is going on? Can we see:
    • The structure of a dbt DAG
    • The current status of the dbt DAG as it runs
    • Any error states or warnings from dbt
    • Logs from the dbt cl
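As a baseline for comparing the visibility questions above, here is a minimal sketch that summarizes the `run_results.json` artifact dbt writes after an invocation. It assumes the default `target/` artifact location and treats `success` and `pass` as the only healthy statuses; a downstream task (for example the email/ad-hoc report) could be gated on the returned `problems` list being empty.

```python
import json
from collections import Counter


def load_run_results(path="target/run_results.json"):
    """Load the dbt run results artifact (assumed default location)."""
    with open(path) as f:
        return json.load(f)


def summarize_run_results(run_results):
    """Return (status counts, list of nodes that did not succeed)."""
    results = run_results.get("results", [])
    counts = Counter(r.get("status", "unknown") for r in results)
    # Collect anything that is not a clean success (errors, failures, warnings, skips).
    problems = [
        (r.get("unique_id"), r.get("status"), r.get("message"))
        for r in results
        if r.get("status") not in ("success", "pass")
    ]
    return dict(counts), problems
```

This only covers status after the fact; whether the orchestrator surfaces the same information while the run is in progress is the thing to evaluate.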
@ram-kishore-odi
Contributor

Closing this as not planned. The POC story will be taken up later if needed. Please refer to this ticket for additional information - #378
