This template is the result of my years of refining the best way to structure a data science project so that it is reproducible and maintainable.
This template allows you to:
✅ Create a readable structure for your project
✅ Automatically run tests when committing your code
✅ Enforce type hints at runtime
✅ Check for issues in your code before committing
✅ Efficiently manage the dependencies in your project
✅ Create short and readable commands for repeatable tasks
✅ Rerun only modified components of a pipeline
✅ Automatically document your code
✅ Observe and automate your code
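The "rerun only modified components" item relies on caching: a step's result is stored under a hash of its inputs, so a rerun with unchanged inputs skips the work. The stdlib-only sketch below illustrates that idea. It is not Prefect's API (Prefect provides its own task-caching options), and `cached_step`, `_CACHE`, `RUN_LOG`, `process`, `train` and `pipeline` are hypothetical names chosen for illustration.

```python
import hashlib
import json
from functools import wraps
from typing import Any, Callable, Dict, List

# In-memory cache keyed by input hash (a real pipeline would persist to disk).
_CACHE: Dict[str, Any] = {}
# Names of the steps that actually executed, so skipped reruns are observable.
RUN_LOG: List[str] = []


def _input_hash(name: str, args: tuple, kwargs: dict) -> str:
    """Hash a step's name together with its JSON-serializable inputs."""
    payload = json.dumps([name, list(args), kwargs], sort_keys=True, default=str)
    return hashlib.sha256(payload.encode()).hexdigest()


def cached_step(func: Callable) -> Callable:
    """Execute `func` only when it has not yet seen these exact inputs."""

    @wraps(func)
    def wrapper(*args, **kwargs):
        key = _input_hash(func.__name__, args, kwargs)
        if key not in _CACHE:
            RUN_LOG.append(func.__name__)
            _CACHE[key] = func(*args, **kwargs)
        return _CACHE[key]

    return wrapper


@cached_step
def process(values: List[float]) -> List[float]:
    # Stand-in for data processing: double every value.
    return [v * 2 for v in values]


@cached_step
def train(processed: List[float]) -> float:
    # Stand-in for model training: average the processed values.
    return sum(processed) / len(processed)


def pipeline(raw: List[float]) -> float:
    return train(process(raw))
```

Calling `pipeline([1, 2, 3])` twice executes `process` and `train` only once. Changing the input to `[1, 2, 4]` reruns both, because the output of `process`, and therefore the input of `train`, changes too.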
Tools used in this project:
- Poetry: Dependency management
- Prefect: Orchestrate and observe your data pipeline
- Pydantic: Data validation using Python type annotations
- pre-commit plugins: Automate code review and formatting
- Makefile: Create short and readable commands for repeatable tasks
- GitHub Actions: Automate your workflows, making it faster to build, test, and deploy your code
- pdoc: Automatically create API documentation for your project
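The core idea behind Pydantic is that type annotations are checked when an object is created, instead of only serving as hints. The stdlib-only sketch below imitates that for a hypothetical `TrainConfig` using `dataclasses` and `typing.get_type_hints`. The field names are assumptions, not the template's actual config, and this is not Pydantic's API: with Pydantic you subclass `pydantic.BaseModel`, which also handles nested and generic types and can coerce compatible inputs.

```python
from dataclasses import dataclass, fields
from typing import get_type_hints


@dataclass
class TrainConfig:
    """Hypothetical config for train_model.py; field names are illustrative."""

    raw_path: str
    test_size: float
    n_estimators: int

    def __post_init__(self) -> None:
        # Resolve the annotations and reject values of the wrong type at runtime.
        # Only plain classes (str, float, int, ...) are handled in this sketch.
        hints = get_type_hints(type(self))
        for f in fields(self):
            expected = hints[f.name]
            value = getattr(self, f.name)
            if not isinstance(value, expected):
                raise TypeError(
                    f"{f.name} must be {expected.__name__}, "
                    f"got {type(value).__name__}"
                )
```

For example, `TrainConfig(raw_path="data/raw/data.csv", test_size="0.2", n_estimators=100)` raises a `TypeError` naming `test_size`, so a bad config fails immediately instead of deep inside training.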
Project structure:
.
├── data
│ ├── final # data after training the model
│ ├── processed # data after processing
│ ├── raw # raw data
├── docs # documentation for your project
├── .flake8 # configuration for flake8, a Python linter
├── .gitignore # files that should not be committed to Git
├── Makefile # store useful commands to set up the environment
├── models # store models
├── notebooks # store notebooks
├── .pre-commit-config.yaml # configurations for pre-commit
├── pyproject.toml # project metadata and dependencies for Poetry
├── README.md # describe your project
├── src # store source code
│ ├── __init__.py # make src a Python package
│ ├── config.py # store configs
│ ├── process.py # process data before training model
│ ├── run_notebook.py # run notebook
│ └── train_model.py # train model
└── tests # store tests
    ├── __init__.py # make tests a Python package
    ├── test_process.py # test functions for process.py
    └── test_train_model.py # test functions for train_model.py
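The split between `src/` and `tests/` can be illustrated with a hypothetical pair of files. The sketch below assumes `process.py` reads a CSV from `data/raw` and writes a cleaned copy to `data/processed`; the function names `clean_rows` and `process_data` are illustrative, not the template's actual code. Both files are shown in one block for brevity. In the project, the test functions would live in `tests/test_process.py` and import from `src.process`. The docstrings are what pdoc picks up when it generates API documentation.

```python
import csv
from pathlib import Path
from typing import Dict, List


# --- src/process.py (hypothetical) ---
def clean_rows(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Strip whitespace from every field and drop rows with an empty field."""
    cleaned = []
    for row in rows:
        stripped = {key: (value or "").strip() for key, value in row.items()}
        if all(stripped.values()):
            cleaned.append(stripped)
    return cleaned


def process_data(raw_file: Path, processed_dir: Path) -> Path:
    """Read a raw CSV, clean it, and write it to the processed folder."""
    with raw_file.open(newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = list(reader.fieldnames or [])
        rows = clean_rows(list(reader))
    processed_dir.mkdir(parents=True, exist_ok=True)
    out_file = processed_dir / raw_file.name
    with out_file.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return out_file


# --- tests/test_process.py (pytest collects functions named test_*) ---
def test_clean_rows_drops_incomplete_rows():
    rows = [{"a": " 1 ", "b": "x"}, {"a": "", "b": "y"}]
    assert clean_rows(rows) == [{"a": "1", "b": "x"}]


def test_process_data_writes_processed_file(tmp_path: Path):
    # tmp_path is pytest's built-in temporary-directory fixture.
    raw_file = tmp_path / "raw" / "data.csv"
    raw_file.parent.mkdir()
    raw_file.write_text("a,b\n1,x\n,y\n")
    out_file = process_data(raw_file, tmp_path / "processed")
    assert out_file.read_text().splitlines() == ["a,b", "1,x"]
```

Keeping the logic in small functions such as `clean_rows` lets the tests exercise it without touching the real `data/` folders, which is what makes running them automatically on every commit cheap.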
Install Cookiecutter:
pip install cookiecutter
Create a project based on the template:
cookiecutter https://github.com/khuyentran1401/data-science-template --checkout prefect-poetry