Start by ⭐️ starring the lakeFS open source project.
This repository includes the following Databricks notebooks, which you can run on your Databricks cluster:
- Azure Databricks Tutorial:
  - This notebook accompanies the blog post "Databricks and lakeFS Integration: Step-by-Step Configuration Tutorial".
- lakeFS Demo:
  - Use Case: Managing the Data Lifecycle with lakeFS
- Delta Lake Demo:
  - Integration of lakeFS with Delta Lake
  - Use Cases: Isolating an ETL job with atomic promotion to production; atomic rollback of multi-table transactions.
  - This notebook also runs the deltaLakeSetup notebook internally.
- Unstructured Data ML Demo:
  - Use Case: Isolated Reproducible Unstructured Datasets for ML
  - This notebook also runs the unstructuredDataMLDemoSetup notebook internally.
- Unity Catalog Integration Demo:
  - Use Case: Isolated Unity Catalog schema for dev/test environment
  - This notebook also runs the unityCatalogIntegrationDemoSetup notebook internally.

Before running the notebooks, make sure the following setup is in place:
- lakeFS installed and running on your local machine, on a server, or in the cloud. If you don't already have lakeFS running, either use lakeFS Cloud, which provides a lakeFS server on demand with a single click, or refer to the lakeFS Quickstart doc.
- A Databricks server on which you can run compute clusters.
- A Databricks cluster configured to use the lakeFS Hadoop file system in presigned mode. For the configuration steps, read the blog post "Databricks and lakeFS Integration: Step-by-Step Configuration Tutorial" or the lakeFS documentation.
- Permissions to manage the cluster configuration, including adding libraries.
- Download these notebooks from GitHub and import them into your Databricks workspace.
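
As a rough illustration of the cluster configuration step above, the sketch below builds the Spark properties usually associated with the lakeFS Hadoop file system in presigned mode and prints them in the `key value` form that the Databricks cluster Spark config field accepts. The property names are written as recalled from the lakeFS documentation and should be checked against the version you install. The endpoint and credentials are hypothetical placeholders, and `lakefs_spark_conf` and `to_cluster_config_lines` are helper names invented for this example. The lakeFS Hadoop client library must still be added to the cluster separately.

```python
def lakefs_spark_conf(endpoint: str, access_key: str, secret_key: str) -> dict:
    """Return Spark properties for the lakeFS Hadoop file system (presigned mode).

    Assumption: property names follow the lakeFS documentation; verify them
    against the docs for your lakeFS version before using them.
    """
    base = endpoint.rstrip("/")
    # The lakeFS API is served under /api/v1; append it only if it is missing.
    if not base.endswith("/api/v1"):
        base += "/api/v1"
    return {
        "spark.hadoop.fs.lakefs.impl": "io.lakefs.LakeFSFileSystem",
        "spark.hadoop.fs.lakefs.access.key": access_key,
        "spark.hadoop.fs.lakefs.secret.key": secret_key,
        "spark.hadoop.fs.lakefs.endpoint": base,
        "spark.hadoop.fs.lakefs.access.mode": "presigned",
    }


def to_cluster_config_lines(conf: dict) -> str:
    """Format properties as one space-separated 'key value' pair per line."""
    return "\n".join(f"{key} {value}" for key, value in conf.items())


if __name__ == "__main__":
    # Hypothetical placeholder values; replace with your own lakeFS details.
    conf = lakefs_spark_conf(
        endpoint="https://example.lakefs.io",
        access_key="<lakefs-access-key>",
        secret_key="<lakefs-secret-key>",
    )
    print(to_cluster_config_lines(conf))
```

The printed lines can be pasted into the cluster's Spark config. In practice, keep credentials in a secret store rather than in plain text.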
Once you have completed the setup, open any notebook from the Databricks UI and follow the instructions in it.