Investigate orchestration options #4
Comments
Orchestration options
This is a fairly basic comparison of some of the available options for workflow orchestration. There are also some more closed-off, proprietary solutions from AWS, GCP, etc. (e.g., Glue) that I haven't really evaluated. They often try to be low-code and serverless. I tend to be a bit more skittish around single-cloud options.
A high-level note: our general approach right now is to prefer an ELT-style workflow over an ETL-style one. So while these orchestration tools allow for quite complex DAGs, I would tend to treat them as simpler hosted services for running small loading scripts on a schedule. This means that some of the neat, more advanced features of these tools would be less relevant to our (initial) deployments.
Airflow
Airflow is the oldest and most popular orchestration tool that is still widely used today.
Pros
Cons
There are at least three managed offerings of Airflow available from major vendors:
GCP Cloud Composer
AWS MWAA
Astronomer
Prefect Cloud
Dagster Cloud
Feature Comparisons
Would be particularly interested in hearing @jasonlally's thoughts about the above. |
@ian-r-rose - this is great! Thanks for putting this together. Some thoughts:
What do you think of doing an intentional test drive on a simple workflow? If we do that, let's hash out a test plan before starting any assessment. |
As a side note, I found Astronomer to be really great when I last used it. They really focus on developer experience and have some useful CLI tools that wrap around Airflow and make managing deployments easier. |
I don't want to stress it too much, since I think setting up AWS SES or SendGrid for an Airflow deployment isn't too much work. That said, I was a bit surprised to read that Astronomer doesn't do this for you, as it seems like a fairly simple value-add. But perhaps I'm not understanding their docs correctly.
That's good to hear. I previously self-managed an Airflow deployment, and it was a significant amount of work. The developer experience around these managed offerings has improved a lot.
Sure, I think that would be instructive. One idea for a test plan could be to load the Microsoft building footprints dataset, as there are a few things that make it a moderately challenging job which might flush out issues:
|
This is relevant to @melanie-logan's work on data loading options. @ian-r-rose, should we close this since Melanie is working on evaluation now? Or I guess not, since we still want to do more evaluation of orchestration. We can keep it open, but it makes sense to me to reassign. |
Makes sense to me! |
Yes, I can do a separate orchestration eval. Thanks! |
Closing this as complete for now. We may revisit orchestration options at some point, but would probably start with a new set of tasks that are relevant. Please refer to this ticket for additional information - #378 |
We intend to do most transformation in our data warehouse(s) via dbt, but there is still a need for scheduled loads of custom data. So while a full DAG framework might be overkill at this point, some sort of workflow orchestration tool is worthwhile.
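To make the "scheduled loads of custom data" idea concrete, here is a rough sketch of the kind of small loading script an orchestrator would run on a schedule. It is only an illustration under stated assumptions: SQLite stands in for the data warehouse, and the function name `load_raw_csv`, the table name `raw_example`, and the sample data are all hypothetical. The script only extracts and loads the rows unmodified; any transformation would happen afterwards in the warehouse (e.g., via dbt).

```python
import csv
import io
import sqlite3


def load_raw_csv(conn, table, csv_text):
    """Load CSV rows as-is into a staging table (the "EL" part of ELT).

    Hypothetical helper: SQLite is a stand-in for a real warehouse.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Every column is loaded as TEXT; typing and cleanup are left to the
    # downstream transformation step.
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    placeholders = ", ".join("?" for _ in header)
    # Full refresh: drop and recreate the table so reruns are idempotent.
    conn.execute(f'DROP TABLE IF EXISTS "{table}"')
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    rows = list(reader)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)


# Made-up sample data, for illustration only.
SAMPLE = "id,name\n1,alpha\n2,beta\n"
conn = sqlite3.connect(":memory:")
loaded = load_raw_csv(conn, "raw_example", SAMPLE)
count = conn.execute('SELECT COUNT(*) FROM "raw_example"').fetchone()[0]
```

Because each run replaces the staging table wholesale, a scheduler can retry or rerun the job without producing duplicate rows, which keeps the orchestration side simple.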
Some requirements:
Nice-to-have: