This page documents the local environment setup steps for the Justice40 Data Pipeline and Scoring Application. It covers steps for macOS and Win10. If you are not on either of those platforms, install the software using instructions appropriate for your operating system and device.
⚠️ WARNING
This guide assumes you've performed all prerequisite steps listed in the main installation guide. If you've not performed those steps, now is a good time to do so.
💡 NOTE
If you've not yet read the project README or the data pipeline and scoring application README to familiarize yourself with the project, it would be useful to do so before continuing with this installation guide.
The Justice40 Data Pipeline and Scoring Application is written in Python and can be run using Poetry after installing a few third-party tools. It requires Python 3.8 or newer; we recommend 3.10.
There are many ways to install Python on macOS; choose whichever works for your configuration. One such way is `pyenv`, which allows you to manage multiple Python versions on the same device. To install `pyenv` on your system, follow these instructions. Be sure to follow any post-installation steps listed by Homebrew, as well as any extra steps listed in the installation instructions.

Once `pyenv` is installed, you can use it to install Python. Run `pyenv install 3.10.6` to install Python 3.10. After installing Python, navigate to the `justice40-tool` directory and set this Python as your local default by running `pyenv local 3.10.6`. Run `python --version` to make sure this worked.
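The steps above can be sketched as a short shell session. The install commands are shown as comments (they require Homebrew and a network connection); the version check at the end uses a simulated version string for illustration:

```shell
# Sketch of the pyenv setup described above (assumes Homebrew on macOS):
#   brew install pyenv
#   pyenv install 3.10.6
#   cd justice40-tool && pyenv local 3.10.6
#
# Afterwards, verify the pinned interpreter. The version string below is
# simulated; in practice use: version="$(python --version 2>&1)"
version="Python 3.10.6"
case "$version" in
  "Python 3.10."*) result="ok" ;;
  *) result="mismatch" ;;
esac
echo "$result: $version"
```

If the check reports a mismatch, re-run `pyenv local 3.10.6` inside the `justice40-tool` directory and make sure pyenv's shims are on your `PATH`.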
⚠️ WARNING We've had trouble with third-party dependencies in Python 3.11 on macOS machines with Apple silicon. If you run into odd dependency issues, please use Python 3.10.
Follow the Get Started guide on python.org to download and install Python on your Windows system. Alternately, if you wish to manage your Python installations more carefully, you can use `pyenv-win`.
The Justice40 Data Pipeline and Scoring Application uses Poetry to manage Python dependencies. Those dependencies are defined in `pyproject.toml`, and exact versions of all dependencies can be found in `poetry.lock`.
Once Poetry is installed, you can download project dependencies by navigating to `justice40-tool/data/data-pipeline` and running `poetry install`.
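Put together, the dependency setup looks like the sketch below. The directory guard just makes the sketch safe to run from anywhere; the path is the one given above:

```shell
# Install project dependencies with Poetry (a sketch; assumes Poetry is on PATH)
pipeline_dir="justice40-tool/data/data-pipeline"
if [ -d "$pipeline_dir" ]; then
  cd "$pipeline_dir"
  poetry install          # installs the exact versions pinned in poetry.lock
  status="installed"
else
  status="skipped"        # run this from the directory containing justice40-tool
  echo "Directory $pipeline_dir not found."
fi
```

Note that `poetry install` (not `poetry update`) is the right command here: it respects the versions pinned in `poetry.lock`.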
⚠️ WARNING
While it may be tempting to run `poetry update`, this project is built with older versions of some dependencies. Updating all dependencies will likely cause the application to behave in unexpected ways and may cause it to crash.
To install Poetry on macOS, follow the installation instructions on the Poetry site. There are multiple ways to install Poetry; we prefer installing and managing it through `pipx` (requires `pipx` installation), but feel free to use whatever works for your configuration.
To install Poetry on Win10, follow the installation instructions on the Poetry site.
The application requires the installation of three third-party tools.
Tool | Purpose | Link |
---|---|---|
GDAL | Generate census data | GDAL library |
libspatialindex | Score generation | libspatialindex |
tippecanoe | Generate map tiles | Mapbox tippecanoe |
Use Homebrew to install the three tools.
- GDAL: `brew install gdal`
- libspatialindex: `brew install spatialindex`
- tippecanoe: `brew install tippecanoe`
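Homebrew also accepts multiple formulae in one invocation, so the three commands above can be collapsed into one. The sketch below guards on `brew` being present so it degrades gracefully on other systems:

```shell
# Equivalent to the three separate brew commands above (requires Homebrew)
if command -v brew >/dev/null 2>&1; then
  brew install gdal spatialindex tippecanoe
  note="requested install via Homebrew"
else
  note="Homebrew not found; install it first"
fi
echo "$note"
```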
❗ ATTENTION
For macOS Monterey or Macs with Apple silicon, you may need to follow these steps to install SciPy.
If you want to run tile generation, please install tippecanoe following these instructions. You also need some prerequisites for Geopandas (as specified in the Poetry requirements). Please follow these instructions to install the Geopandas dependencies locally. It's definitely easier if you have access to WSL (Windows Subsystem for Linux) and install these packages using commands similar to our Dockerfile.
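Under WSL with Ubuntu, the geospatial prerequisites can typically be installed with apt. The package names below (`gdal-bin`, `libgdal-dev`, `libspatialindex-dev`) are common Ubuntu names and an assumption on our part; verify them against the project's Dockerfile before relying on them:

```shell
# Sketch: installing geospatial prerequisites under WSL/Ubuntu.
# Package names are assumptions based on common Ubuntu packaging;
# check the project's Dockerfile for the authoritative list.
if command -v apt-get >/dev/null 2>&1; then
  sudo apt-get update
  sudo apt-get install -y gdal-bin libgdal-dev libspatialindex-dev
  note="apt packages requested"
else
  note="apt-get not found; this sketch targets WSL/Ubuntu"
fi
echo "$note"
```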
To promote consistent code style and quality, we use Git pre-commit hooks to automatically lint and reformat our code before every commit. This project's pre-commit hooks are defined in `.pre-commit-config.yaml`.
After following the installation instructions for your platform, navigate to the `justice40-tool/data/data-pipeline` directory and run `pre-commit install` to install the pre-commit hooks used in this repository.
After installing pre-commit hooks, any time you commit code to the repository the hooks will run automatically on all modified files. You can force a re-run on all files with `pre-commit run --all-files`.
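A quick way to confirm the hooks are active in your clone is to look for the hook script pre-commit generates. The path below assumes Git's standard hooks layout; run the check from the repository root:

```shell
# Check whether pre-commit installed its Git hook (run from inside the repo)
hook_path=".git/hooks/pre-commit"
if [ -f "$hook_path" ]; then
  hook_state="installed"
else
  hook_state="missing"    # run 'pre-commit install' in data/data-pipeline
fi
echo "pre-commit hook: $hook_state"
```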
Follow the Homebrew installation instructions on the pre-commit website to install pre-commit on macOS.
Follow the instructions on the pre-commit website to install pre-commit on Win10.
In the client part of the codebase (the `justice40-tool/client` folder), we use a different tool, Husky, to run pre-commit hooks. It is not possible to run both our Husky hooks and `pre-commit` hooks on every commit; either one or the other will run.

Husky is installed every time you run `npm install`. To use the Husky front-end hooks during front-end development, simply run `npm install`.
However, running `npm install` overwrites the backend hooks set up by `pre-commit`. To restore the backend hooks after running `npm install`, do the following:
- Run `pre-commit install` while in the `justice40-tool/data/data-pipeline` directory.
- The terminal should respond with an error message such as:

  ```
  [ERROR] Cowardly refusing to install hooks with `core.hooksPath` set.
  hint: `git config --unset-all core.hooksPath`
  ```

  This error is caused by having previously run `npm install`, which used Husky to overwrite the hooks path.
- Follow the hint and run `git config --unset-all core.hooksPath`.
- Run `pre-commit install` again.
Now `pre-commit` and the backend hooks should work.
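The restore sequence above can be collected into one script. This is a sketch: it assumes you are inside the `justice40-tool/data/data-pipeline` directory of a Git checkout and that `pre-commit` is installed (the guard skips the steps otherwise):

```shell
# Restore backend pre-commit hooks after 'npm install' (sketch of the steps above)
if command -v pre-commit >/dev/null 2>&1; then
  git config --unset-all core.hooksPath   # clear the hooks path Husky set
  pre-commit install                      # reinstall the backend hooks
  restore="done"
else
  restore="skipped"   # pre-commit is not installed in this environment
fi
echo "restore: $restore"
```

Note that `git config --unset-all` exits with an error if `core.hooksPath` is not set; that is harmless here.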
If you are using VS Code, you can make use of the `.vscode` configurations located at `data/data-pipeline/.vscode`. To do this, open VS Code with the command `code data/data-pipeline`.
These configurations include:

- `launch.json` - launch commands that allow for debugging the various commands in `application.py`. Note that because we are using the otherwise excellent Click CLI, and Click in turn uses `console_scripts` to parse and execute command line options, it is necessary to run the equivalent of `python -m data_pipeline.application [command]` within `launch.json` to be able to set and hit breakpoints (this is what is currently implemented); otherwise, you may find that the script times out after 5 seconds. More about this here.
- `settings.json` - these ensure that you're using the default linter (`pylint`), formatter (`flake8`), and test library (`pytest`).
- `tasks.json` - these enable you to use `Terminal → Run Task` to run our preferred formatters and linters within your project.
Please only add settings to this file that should be shared across the team; do not add settings that apply only to local development environments, such as those that use absolute paths. If you are looking to add something to this file, check in with the rest of the team to ensure the proposed settings should be shared.