chore(pipeline): generate main dag from dbt, using cosmos #289

Draft
wants to merge 2 commits into
base: main
from


vmttn
Contributor

@vmttn vmttn commented Sep 11, 2024

The trick is to leverage plpython to do the geocoding
inside the database. Doing so, geocoding can now be
modelled as a dbt model and orchestrated as such.

The geocoding implementation has been moved to an
actual package maintained next to the datawarehouse
image. The plpython udf simply wraps the call.
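
As a rough illustration of the "thin wrapper" idea (a sketch only, not the code in this PR), the package side could expose a single entry point and the UDF body would only forward its argument. The names `geocode_address`, `GeocodeResult`, `Backend` and `my_geocoding_package` are hypothetical, and the result fields are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class GeocodeResult:
    """Hypothetical result shape; the real package may expose other fields."""
    latitude: float
    longitude: float
    score: float


# Hypothetical backend signature: takes an address string and returns a
# result, or None when nothing matched.
Backend = Callable[[str], Optional[GeocodeResult]]


def geocode_address(address: Optional[str], backend: Backend) -> Optional[dict]:
    """Package-side entry point (hypothetical name).

    All logic lives in the package, so the database function can stay a
    thin wrapper that only forwards its arguments.
    """
    if address is None or not address.strip():
        # Nothing to geocode: the UDF would return NULL.
        return None
    result = backend(address.strip())
    if result is None:
        return None
    # PL/Python can map a dict onto a composite return type.
    return {
        "latitude": result.latitude,
        "longitude": result.longitude,
        "score": result.score,
    }


# The body of the PL/Python UDF would then reduce to something like
# (module and backend names are assumptions):
#     from my_geocoding_package import geocode_address, default_backend
#     return geocode_address(address, default_backend)
```

Keeping the backend injectable also makes the package testable without a database or a live geocoding service.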

Is it worth it?

* it heavily simplifies the flow and sets a clearer separation
of concerns between airflow and dbt: dbt does the
transformation, airflow orchestrates it.
* less error-prone, since we do not have to pull data from the db
and map it to python ourselves.
* geocoding can now simply be done per source. Previously, this
would have been terribly cumbersome.
* we can even leverage dbt to do the geocoding incrementally on
inputs that have not been seen before. This will drastically
reduce our carbon footprint...
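
The incremental point can be sketched in plain Python to show the intended behaviour. In this PR the same effect would presumably come from a dbt incremental model rather than from code like this; `geocode_incrementally` and its parameters are hypothetical names.

```python
from typing import Callable, Dict, Iterable, Optional


def geocode_incrementally(
    inputs: Iterable[str],
    already_geocoded: Dict[str, Optional[dict]],
    geocode: Callable[[str], Optional[dict]],
) -> Dict[str, Optional[dict]]:
    """Geocode only the inputs that have not been seen before.

    Returns the newly computed results; the caller is expected to merge
    them into the existing table. Inputs already present in
    `already_geocoded`, and duplicates within `inputs`, are skipped.
    """
    new_results: Dict[str, Optional[dict]] = {}
    for address in inputs:
        if address in already_geocoded or address in new_results:
            continue
        new_results[address] = geocode(address)
    return new_results
```

On a run where most addresses are unchanged, the geocoder is only called for the handful of new ones.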

There are a few enhancements we would probably want:

* obviously clean up
* define a macro with the geocoding model that can be used for all
sources
* re-do geocodings for relatively old inputs
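
For the last enhancement, one possible approach (an assumption, not something decided in this PR) is to keep a timestamp per geocoded input and re-select anything older than a threshold. `select_stale_inputs` and `max_age` are hypothetical, and the threshold value is left to the caller.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def select_stale_inputs(
    geocoded_at: Dict[str, datetime],
    now: datetime,
    max_age: timedelta,
) -> List[str]:
    """Return the inputs whose last geocoding is older than `max_age`.

    `geocoded_at` maps each input to the time it was last geocoded.
    The result is sorted so that runs are deterministic.
    """
    cutoff = now - max_age
    return sorted(addr for addr, ts in geocoded_at.items() if ts < cutoff)
```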
@vmttn vmttn changed the title Generate main dag from dbt, using cosmos chore(pipeline): generate main dag from dbt, using cosmos Sep 11, 2024