Author: @nicseltzer
Status: Planning
This project aims to be the jumping-off point for the OpenSTL Extract, Transform, and Load (ETL) pipeline.
The notes below assume that you've forked the repo to your account and have cloned that fork locally.
Note: This project has only been built on macOS and Linux so far. Submit PRs with instructions for building and running on Windows (or whatever you use).
If you don't have Python 3 installed, you'll need to download it from the Python website for your particular operating system.
There are a lot of resources online for setting up a virtual environment, but I recommend the following (especially if you're using VSCode):
- From your locally cloned repo, run `python3 -m venv .venv`. This will create a `.venv` directory which contains all of the pieces for your local Python project.
- That's it. You've created a venv.
- Using your favorite terminal or the terminal built in to VSCode (which should pick up `.venv` automatically), run `source ./.venv/bin/activate`. This will do the magic of setting up your project in isolation from your global package manifest.
- Next, we need to get the dependencies for the project. You can do this by running `./make.py`. If you need to add dependencies, you can add them with `pip` as you normally would. Just make sure to run `./package.py` before committing back to the repo.
- Run `python3 ./app.py`.
- Run `deactivate`.
Author: @nicseltzer
Status: Alpha
This script is run at a configurable interval and is responsible for fetching data from configured remote sources.
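As a rough illustration of fetching from configured sources on an interval, here is a minimal sketch using only the standard library. The names `SOURCES`, `download`, `fetch_all`, and `run_forever` are hypothetical and are not taken from this repo; the two URLs come from the source list at the end of this README. The real script may be structured quite differently.

```python
import time
import urllib.request
from typing import Callable, Dict, Iterable

# Two of the source URLs listed at the end of this README.
SOURCES = [
    "https://www.stlouis-mo.gov/data/upload/data-files/prcl.zip",
    "https://www.stlouis-mo.gov/data/upload/data-files/forestry-maintenance-properties.csv",
]


def download(url: str, timeout: float = 30.0) -> bytes:
    """Return the raw bytes served at `url`."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_all(
    sources: Iterable[str],
    fetch: Callable[[str], bytes] = download,
) -> Dict[str, bytes]:
    """Fetch every source once; a failure on one source does not stop the rest."""
    results: Dict[str, bytes] = {}
    for url in sources:
        try:
            results[url] = fetch(url)
        except OSError as error:  # urllib.error.URLError subclasses OSError
            print(f"skipping {url}: {error}")
    return results


def run_forever(sources: Iterable[str], interval_seconds: float) -> None:
    """Hypothetical scheduler loop: fetch everything, sleep, repeat."""
    sources = list(sources)
    while True:
        fetch_all(sources)
        time.sleep(interval_seconds)
```

Passing the `fetch` callable in as a parameter is a design choice made for this sketch: it lets the loop be exercised with a stand-in function instead of real network calls.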
Author: @nicseltzer
Status: Alpha
This module is responsible for classifying fetched binary data. The application will hand this data off to the Extractor module.
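One plausible way to classify fetched bytes is to look at their leading "magic" bytes. The sketch below assumes only two formats matter (ZIP archives and CSV text, matching the file extensions in the source list); the function name `classify` and the labels it returns are hypothetical, and the CSV check is a crude heuristic rather than a real parser.

```python
# ZIP archives start with "PK\x03\x04"; an empty archive starts with "PK\x05\x06".
ZIP_MAGIC = (b"PK\x03\x04", b"PK\x05\x06")


def classify(data: bytes) -> str:
    """Return "zip", "csv", or "unknown" for a blob of fetched bytes."""
    if data.startswith(ZIP_MAGIC):
        return "zip"
    sample = data[:4096]
    # Heuristic (an assumption): text with no NUL bytes and a comma in its
    # first line is treated as CSV.
    if sample and b"\x00" not in sample:
        first_line = sample.splitlines()[0]
        if b"," in first_line:
            return "csv"
    return "unknown"
```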
Author: Looking for owner
Status: Unstarted
This module is responsible for taking data of a given format and extracting it to an agreed-upon, uniform format.
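Since the uniform format has not been agreed on yet, the following is only a sketch under one assumption: that each record becomes a plain dict of column name to string value. The names `Record`, `extract_csv`, `extract_zip`, and `extract` are hypothetical, and only CSV members of an archive are handled here; other member types (for example shapefiles) would need their own extractors.

```python
import csv
import io
import zipfile
from typing import Dict, List

Record = Dict[str, str]  # assumed uniform format: one dict per row


def extract_csv(data: bytes, encoding: str = "utf-8") -> List[Record]:
    """Parse CSV bytes into a list of row dicts keyed by the header line."""
    text = io.StringIO(data.decode(encoding), newline="")
    return [dict(row) for row in csv.DictReader(text)]


def extract_zip(data: bytes) -> Dict[str, bytes]:
    """Return the raw bytes of every file in a ZIP archive, keyed by member name."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return {
            info.filename: archive.read(info)
            for info in archive.infolist()
            if not info.is_dir()
        }


def extract(kind: str, data: bytes) -> List[Record]:
    """Dispatch on the classifier's label and return records in the uniform format."""
    if kind == "csv":
        return extract_csv(data)
    if kind == "zip":
        records: List[Record] = []
        for name, member in extract_zip(data).items():
            if name.lower().endswith(".csv"):
                records.extend(extract_csv(member))
        return records
    raise ValueError(f"no extractor for format: {kind!r}")
```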
Author: Looking for owner
Status: Unstarted
This module will mold the data into a usable state - pandas? Ah?
Author: Looking for owner
Status: Unstarted
This module is responsible for pushing the transformed data into a persistent datastore.
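The datastore has not been chosen, so the sketch below uses SQLite from the standard library purely as a stand-in. The function name `load`, the `records` table, and its two columns are all hypothetical; records are stored as JSON text so that no schema has to be assumed for the transformed data.

```python
import json
import sqlite3
from typing import Dict, Iterable


def load(
    connection: sqlite3.Connection,
    source: str,
    records: Iterable[Dict[str, str]],
) -> int:
    """Insert records for one source and return how many rows were written."""
    rows = [(source, json.dumps(record, sort_keys=True)) for record in records]
    with connection:  # commits on success, rolls back if an insert fails
        connection.execute(
            "CREATE TABLE IF NOT EXISTS records "
            "(source TEXT NOT NULL, payload TEXT NOT NULL)"
        )
        connection.executemany(
            "INSERT INTO records (source, payload) VALUES (?, ?)", rows
        )
    return len(rows)
```

For a quick local trial, `sqlite3.connect(":memory:")` gives a throwaway database; a file path would make the data persistent.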
https://www.stlouis-mo.gov/data/upload/data-files/prcl_shape.zip
https://www.stlouis-mo.gov/data/upload/data-files/prcl.zip
https://www.stlouis-mo.gov/data/upload/data-files/par.zip
https://www.stlouis-mo.gov/data/upload/data-files/lra_public.zip
https://www.stlouis-mo.gov/data/upload/data-files/bldginsp.zip
https://www.stlouis-mo.gov/data/upload/data-files/prmbdo.zip
https://www.stlouis-mo.gov/data/upload/data-files/forestry-maintenance-properties.csv