One option for orchestration is GCP's managed Airflow service, Cloud Composer (AWS has an equivalent as well). As part of #4, we can stand up an instance and kick the tires on it. I've used Airflow before, but never the managed service. Keeping the Airflow server up and running myself was really annoying, though, so a managed service is attractive!
As a testing ground, I'd like to investigate:
Dev workflow with GitHub
Best practices for developing in a test environment
Environment isolation (e.g., using the PythonVirtualenvOperator or the KubernetesPodOperator; see the sketch after this list)
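For concreteness, here's a minimal sketch of the virtualenv flavor of isolation, where each task pip-installs its own dependencies into a throwaway environment. The DAG id, requirement, and source URL below are made up for illustration:

```python
# Minimal sketch of per-task isolation with PythonVirtualenvOperator.
# The DAG id, requirements, and source URL are illustrative placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonVirtualenvOperator


def load_reference_data():
    # Imports live inside the callable so they resolve in the task's
    # virtualenv, not in the main Composer environment.
    import geopandas as gpd

    gdf = gpd.read_file("https://example.com/reference-layer.geojson")
    print(f"Loaded {len(gdf)} features")


with DAG(
    dag_id="virtualenv_isolation_example",
    start_date=datetime(2022, 1, 1),
    schedule_interval=None,
) as dag:
    PythonVirtualenvOperator(
        task_id="load_reference_data",
        python_callable=load_reference_data,
        requirements=["geopandas"],  # installed into a fresh virtualenv at task runtime
        system_site_packages=False,
    )
```

The tradeoff is that pip has to resolve and install the dependencies on every run, which gets painful for heavy geospatial stacks; that's where the pod operator would come in.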
A good first target for data artifacts could be loading some geospatial reference data into our data warehouse(s). The benefits team is particularly interested in the Healthy Places Index and CalEnviroScreen.
It currently has two DAGs which load two datasets from the California Geoportal (a rough sketch follows the list):
California incorporated cities
California counties
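For reference, the loading DAGs are roughly this shape; the layer URL, destination table, and write step here are placeholders, not the actual values:

```python
# Rough shape of the Geoportal-loading DAGs; the URL, table, and write step
# are placeholders rather than the real values.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

SOURCE_URL = "https://example.arcgis.com/california-counties.geojson"  # placeholder
DESTINATION_TABLE = "reference.california_counties"  # placeholder


def load_geoportal_layer():
    import geopandas as gpd

    gdf = gpd.read_file(SOURCE_URL)
    # Serialize geometries to WKT so the frame can land in an ordinary warehouse table.
    gdf["geometry"] = gdf["geometry"].apply(lambda geom: geom.wkt)
    # A real task would write to the warehouse here (e.g., via pandas-gbq or an Airflow hook).
    print(f"Would load {len(gdf)} rows into {DESTINATION_TABLE}")


with DAG(
    dag_id="load_california_counties",
    start_date=datetime(2022, 1, 1),
    schedule_interval="@monthly",
) as dag:
    PythonOperator(task_id="load_geoportal_layer", python_callable=load_geoportal_layer)
```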
Some early thoughts:
There is a brand-new CLI tool from Google that makes local development somewhat tolerable. It essentially manages a Docker flow using their official Composer images and stands up local Airflow deployments based on them.
I'm currently just installing custom Python packages (specifically geopandas and the GDAL/GEOS stack) into the main Composer environment. This works for now, but definitely doesn't scale. Eventually we'd probably need the KubernetesPodOperator (see the sketch after these notes).
This is (so far) way easier than managing an Airflow deployment manually.
I should also note that this is not free: a minimal deployment is ~$150 per month, and an environment we'd treat as actual infrastructure would cost a few times that. I expect most of the options in #4 have similar price tags, though I haven't priced them out.
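For the eventual pod-operator route, a hypothetical sketch: the image, namespace, and entrypoint below are all assumptions about a setup we don't have yet, not anything that exists today.

```python
# Hypothetical sketch: bake the GDAL/GEOS stack into a custom image and run it
# via KubernetesPodOperator, keeping the main Composer environment clean.
# Note: the exact import path varies across provider versions.
from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import KubernetesPodOperator

with DAG(
    dag_id="pod_operator_example",
    start_date=datetime(2022, 1, 1),
    schedule_interval=None,
) as dag:
    KubernetesPodOperator(
        task_id="load_with_geopandas",
        name="load-with-geopandas",
        namespace="default",  # assumption; depends on how the cluster is set up
        image="gcr.io/our-project/geopandas-loader:latest",  # hypothetical image
        cmds=["python", "-m", "loader"],  # hypothetical entrypoint
        get_logs=True,
    )
```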