Cookiecutter Data Science with luigi

A logical, reasonably standardized, but flexible project structure for doing and sharing data science work.

This project provides a Cookiecutter template for data science projects, based on an existing project template. This version uses luigi tasks for data processing instead of the ad-hoc Python scripts suggested in the original template.

Requirements to use the cookiecutter template:


  • Python 2.7 or 3.5
  • Cookiecutter Python package >= 1.4.0: This can be installed with pip or conda, depending on how you manage your Python packages:
$ pip install cookiecutter

or

$ conda config --add channels conda-forge
$ conda install cookiecutter

To start a new project, run:


cookiecutter https://github.com/ffmmjj/luigi_data_science_project_cookiecutter

Installing development requirements


pip install -r requirements.txt

Run luigi tasks


make data

Clean temporary and processed data


make data_clean 

Running the tests


py.test tests

Adding new luigi tasks

The project comes with a final luigi task called FinalTask in the module src/data_tasks/final.py. New tasks must be placed under the directory src/data_tasks/. The luigi task that generates the final, processed dataset must be added to the list of tasks required by FinalTask, because FinalTask is the "data sink" that luigi runs when you use the Makefile's data target.

See the original project for more details on this project structure.