Start by ⭐️ starring the lakeFS open source project.
This repository includes a Jupyter Notebook which you can run on your local machine. The notebook demonstrates ML Data Version Control and Reproducibility at Scale.
In the ever-evolving landscape of machine learning (ML), data stands as the cornerstone upon which successful models are built. However, as ML projects expand and encompass larger and more complex datasets, the challenge of efficiently managing and controlling data at scale becomes more pronounced.
- Breaking Down Conventional Approaches:
- The Copy/Paste Predicament: In the world of data science, it's commonplace for data scientists to extract subsets of data to their local environments for model training. This method allows for iterative experimentation, but it introduces challenges that hinder the seamless evolution of ML projects.
- Reproducibility Constraints: Traditional practices of copying and modifying data locally lack the version control and auditability crucial for reproducibility. Iterating on models with various data subsets becomes a daunting task.
- Inefficient Data Transfer: Regularly shuttling data between the central repository and local environments strains resources and time, especially when choosing different subsets of data for each training run.
- Limited Compute Power: Operating within a local environment hampers the ability to harness the full power of parallel computing, as well as the distributed prowess of systems like Apache Spark.
- In this demo, we will show:
- How to use lakeFS to version control your data when working with it locally (a short sketch of the branch/commit flow follows the prerequisites below).
- How to use lakeFS to train your model at scale directly in the cloud, without having to copy data.
- We will leverage a technology stack of AWS S3, Databricks Delta Lake, PyTorch, and MLflow.
- Docker installed on your local machine
- This demo requires connecting to a lakeFS Server. You can spin up a lakeFS Server for free on lakeFS Cloud (https://lakefs.cloud).
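To make the local workflow concrete, below is a minimal sketch of the branch/commit flow using the high-level lakefs Python SDK. The endpoint, credentials, repository name, branch name, and file path are placeholders (the notebook sets up its own), so treat this as an outline of the pattern rather than part of the demo itself:

```python
# Minimal sketch of lakeFS data versioning with the high-level `lakefs` Python SDK.
# All names below (endpoint, keys, repository, branch, paths) are placeholders.
import lakefs
from lakefs.client import Client

client = Client(
    host="https://<org>.lakefscloud.io",     # your lakeFS Server endpoint
    username="<LAKEFS_ACCESS_KEY_ID>",
    password="<LAKEFS_SECRET_ACCESS_KEY>",
)

repo = lakefs.Repository("image-segmentation-repo", client=client)

# Work on an isolated branch instead of copying data into a local scratch area.
experiment = repo.branch("experiment-1").create(source_reference="main")

# Add or modify data on the experiment branch only; "main" stays untouched.
experiment.object("datasets/train/labels.csv").upload(data=b"image,label\n", mode="wb")

# Commit, so the exact data state behind this training run stays reproducible.
experiment.commit(message="Training subset for run 1", metadata={"experiment": "run-1"})
```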
- Start by cloning this repository:
git clone https://github.com/treeverse/lakeFS-samples && cd lakeFS-samples/01_standalone_examples/image-segmentation
- Run the following commands to build and run the Docker container, which includes Python, Spark, Jupyter Notebook, and the required Python packages (the Docker image is around 10GB in size):
docker build -t lakefs-image-segmentation-demo .
docker run -d -p 8889:8888 -p 4041:4040 -p 5001:5000 --user root -e GRANT_SUDO=yes -v $PWD:/home/jovyan -v $PWD/jupyter_notebook_config.py:/home/jovyan/.jupyter/jupyter_notebook_config.py --name lakefs-image-segmentation-demo lakefs-image-segmentation-demo
If any of the port numbers (8889, 4041, or 5001) are already in use, change them to any available host ports (the left-hand side of each -p mapping).
- Open the JupyterLab UI at http://127.0.0.1:8889/ in your web browser.
- Once setup is complete, open the "Image Segmentation" notebook from the JupyterLab UI and follow the instructions.
- If you want to run the same notebook on a Databricks cluster:
- Use Databricks Runtime version "14.3 LTS ML". GPUs are not required for this demo.
- Install the pytorch-lightning==1.5.4, segmentation-models-pytorch==0.3.3, and lakefs==0.4.1 Python libraries on your Databricks compute cluster. Also, install the io.lakefs:hadoop-lakefs-assembly:0.2.3 library from the Maven repository.
- Follow the Databricks and lakeFS Integration: Step-by-Step Configuration Tutorial to configure your Databricks compute cluster (a short example of reading data through a lakefs:// path on a configured cluster follows this list).
- Import "Image Segmentation" and "ImageSegmentationSetup" notebooks to your Databricks environment.