This project template lets data scientists use Visual Studio Code with a consistent, isolated Docker environment. It is cross-platform (Windows, macOS, and Linux) and suited to AI and data science work. All dependencies are managed inside Docker, so there is no local Python environment to maintain, and the same environment can be rebuilt with the same dependency versions on any platform.
To get started, you’ll need to install the following on your computer:
- Docker – Download and install Docker for your operating system.
- Visual Studio Code (VS Code) – Download and install VS Code.
- Git – Git is often pre-installed on Linux, but you may need to install it on Windows and macOS.
Note: You’ll also need the Dev Containers extension in VS Code, which we’ll cover in the installation steps.
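If a Python interpreter happens to be available on the host, the prerequisites above can be checked with a short script. This is a minimal sketch, not part of the template: it assumes the command-line names docker, code, and git, and the code launcher is only on PATH if VS Code's shell command was installed.

```python
import shutil

# Assumed command names: "docker" and "git" are the usual CLI names;
# "code" is the VS Code launcher and may not be on PATH on every system.
REQUIRED_TOOLS = ("docker", "code", "git")


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("Docker, VS Code, and Git were all found on PATH.")
```

A tool reported as missing may still be installed; it only means the command could not be located on PATH.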
Follow these steps to set up this template on any system:
- Open a terminal on your computer.
- Run the following command to clone this template from GitHub:
git clone https://github.com/donphi/data-science-template.git
- Navigate to the newly cloned folder:
cd data-science-template
- Launch Visual Studio Code.
- Click on File > Open Folder and select the root folder of the project (data-science-template).
- Once opened, you should see a prompt at the bottom of VS Code asking if you want to "Reopen in Container." Click Reopen in Container.
Note: If you don’t see this prompt, ensure the Dev Containers extension is installed in VS Code.
Once VS Code loads the container:
- Confirm that all project folders (e.g., data, models, notebooks) are visible in the left sidebar.
- Check the bottom-left corner of VS Code for a green icon indicating the container is active.
- You’re now ready to work within the container! Any code changes or new files will be saved directly in your project directory.
- When you’re ready to update your work on GitHub, use the following commands to commit and push changes:
git add .
git commit -m "Describe your changes here"
git push
Important: The data folder is ignored by default (based on .gitignore) to prevent large or sensitive files from being tracked in Git.
If you’re not using Visual Studio Code, you can still use the Dockerfile and requirements.txt to set up the environment directly with Docker.
- Build the Docker Image:
docker build -t your_project_name -f docker/Dockerfile .
- Run the Docker Container:
docker run --rm -it --env-file docker/.env -v $(pwd):/workspace your_project_name
- The --env-file option loads environment variables from docker/.env.
- The -v $(pwd):/workspace option mounts your project directory to /workspace inside the container.
This setup gives you a similar environment to the Dev Container in VS Code.
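The $(pwd) substitution is shell syntax and is not understood by the Windows Command Prompt. As one possible workaround, the same command can be assembled in Python, which resolves the project path itself. This is a sketch under assumptions: the function name docker_run_command is hypothetical, and your_project_name is the placeholder image tag from the build step.

```python
import shlex
from pathlib import Path


def docker_run_command(image, project_dir=".", env_file="docker/.env"):
    """Build the argument list for the `docker run` command shown above."""
    project_path = Path(project_dir).resolve()
    return [
        "docker", "run",
        "--rm",                               # remove the container when it exits
        "-it",                                # interactive session with a terminal
        "--env-file", env_file,               # load environment variables from this file
        "-v", f"{project_path}:/workspace",   # mount the project directory at /workspace
        image,
    ]


if __name__ == "__main__":
    # Print the command so it can be reviewed before running it.
    print(shlex.join(docker_run_command("your_project_name")))
```

To start the container from Python instead of printing the command, the returned list can be passed to subprocess.run.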
This template uses Docker and VS Code’s Dev Container configuration to ensure consistency and avoid conflicts between local dependencies on different systems.
- Build the Docker Image
Run the following command to build the Docker image from the Dockerfile in the docker folder:
docker build -t your_project_name -f docker/Dockerfile .
- Run the Docker Container
Start a container with the following command:
docker run --rm -it --env-file docker/.env -v $(pwd):/workspace your_project_name
- Environment Variables
If you don’t want to use the .env.example file, you can skip the .env setup entirely for Docker. In this case, environment variables can be defined directly in the docker run command, like this:
docker run --rm -it -e VARIABLE_NAME=value -v $(pwd):/workspace your_project_name
  - Replace VARIABLE_NAME and value with each environment variable and its value.
  - This method is helpful for quickly setting variables without needing a separate .env file but may not be ideal for more complex configurations.
  - If you don’t need certain variables, you can omit -e VARIABLE_NAME=value entirely from the command.
- Copying .env.example for Project-Specific Configurations
Copy .env.example to .env and update variables as needed.
On Linux/macOS:
cp docker/.env.example docker/.env
On Windows (Command Prompt):
copy docker\.env.example docker\.env
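The copy step and the -e alternative described above can also be handled from Python, which avoids the cp/copy difference between platforms. This is a minimal sketch: ensure_env_file and env_flags are hypothetical helper names, and the parser is a simplification that only handles plain KEY=value lines and skips blank lines, comments, and anything without an assignment.

```python
import shutil
from pathlib import Path


def ensure_env_file(docker_dir="docker"):
    """Copy .env.example to .env inside `docker_dir` unless .env already exists."""
    example = Path(docker_dir) / ".env.example"
    target = Path(docker_dir) / ".env"
    if not target.exists():
        shutil.copyfile(example, target)
    return target


def env_flags(env_path):
    """Turn KEY=value lines of an env file into `-e KEY=value` arguments."""
    flags = []
    for line in Path(env_path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        flags += ["-e", line]
    return flags


if __name__ == "__main__":
    env_file = ensure_env_file("docker")
    print(env_flags(env_file))
```

An existing .env is never overwritten, so values you have already edited are kept.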
- Add Dependencies: Add any required dependencies to docker/requirements.txt.
- Install Dependencies in Docker: When building the Docker image, all dependencies from requirements.txt will be installed automatically.
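After rebuilding the image, one way to confirm that the listed dependencies actually ended up in the container is to compare requirements.txt against the installed packages. This is a sketch, not part of the template: requirement_names and missing_packages are hypothetical helpers, and the parser only understands simple lines such as name, name==1.2, or name>=1.0 (no URLs or editable installs).

```python
import re
from importlib import metadata
from pathlib import Path


def requirement_names(requirements_path):
    """Return the package names listed in a simple requirements file."""
    names = []
    for line in Path(requirements_path).read_text(encoding="utf-8").splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        # Skip blank lines and pip options such as "-r other.txt".
        if not line or line.startswith("-"):
            continue
        # The name is everything before the first specifier, extra, or marker.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(requirements_path):
    """Return the listed packages that are not installed in this environment."""
    missing = []
    for name in requirement_names(requirements_path):
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print(missing_packages("docker/requirements.txt"))
```

Run inside the container from the project root, an empty list suggests every listed package was installed during the build.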
├── .devcontainer            <- Devcontainer files for VS Code Docker setup.
│   └── devcontainer.json    <- VS Code configuration for dev container support.
│
├── docker                   <- Docker-specific files, including Dockerfile and environment files.
│   ├── Dockerfile           <- Dockerfile defining the project environment.
│   ├── .env.example         <- Template for environment variables, to be copied to `.env`.
│   └── requirements.txt     <- List of dependencies for the Docker environment.
│
├── README.md                <- The top-level README for developers using this project
│
├── data
│   ├── external             <- Data from third party sources
│   ├── interim              <- Intermediate data that has been transformed
│   ├── processed            <- The final, canonical data sets for modeling
│   └── raw                  <- The original, immutable data dump
│
├── models                   <- Trained and serialized models, model predictions, or model summaries
│
├── notebooks                <- Jupyter notebooks. Naming convention is a number (for ordering),
│                               the creator's initials, and a short `-` delimited description, e.g.
│                               `1.0-jqp-initial-data-exploration`
│
├── references               <- Data dictionaries, manuals, and all other explanatory materials
│
├── reports                  <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures              <- Generated graphics and figures to be used in reporting
│
└── src                      <- Source code for this project
    │
    ├── __init__.py          <- Makes src a Python module
    │
    ├── config.py            <- Store useful variables and configuration
    │
    ├── dataset.py           <- Scripts to download or generate data
    │
    ├── features.py          <- Code to create features for modeling
    │
    ├── modeling             <- Code for training and inference
    │   ├── __init__.py
    │   ├── predict.py       <- Code to run model inference with trained models
    │   └── train.py         <- Code to train models
    │
    ├── plots.py             <- Code to create visualizations
    │
    └── services             <- Service classes to connect with external platforms, tools, or APIs
        └── __init__.py
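As an illustration of the "useful variables" that src/config.py could hold, the directories in this layout can be collected in one place so scripts and notebooks do not hard-code paths. This is only a sketch: the project_paths function and its key names are assumptions, not the template's actual config.py.

```python
from pathlib import Path


def project_paths(project_root):
    """Map short names to the directories shown in the project layout."""
    root = Path(project_root)
    data = root / "data"
    return {
        "raw": data / "raw",
        "interim": data / "interim",
        "processed": data / "processed",
        "external": data / "external",
        "models": root / "models",
        "notebooks": root / "notebooks",
        "references": root / "references",
        "reports": root / "reports",
        "figures": root / "reports" / "figures",
    }


if __name__ == "__main__":
    # Inside the container the project is mounted at /workspace.
    for name, path in project_paths("/workspace").items():
        print(f"{name:<12}{path}")
```

Passing the root explicitly keeps the helper usable both inside the container and on the host, where the project lives at a different path.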
Designed by chonkie