Modern Data Engineering with Medallion Architecture

Project Overview

This project builds an end-to-end data engineering pipeline on the Azure cloud platform using Apache Spark, Azure Databricks, and dbt (data build tool). Following the Medallion Architecture, the pipeline covers data ingestion, integration, and transformation, preparing data for advanced analytics.

Architecture

Components

Apache Spark: Engine for large-scale data processing.
Azure Databricks: Managed, Spark-based analytics platform.
dbt (data build tool): Data modeling and transformations within the data lakehouse.
Azure Data Factory: Orchestrates pipelines for data integration and transformation.

Data Layers

Bronze: Raw data ingestion and storage.
Silver: Data cleaning and enrichment.
Gold: Aggregated data optimized for business intelligence.
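
As a rough illustration of how data moves through the three layers above, here is a minimal sketch in plain Python. It is not taken from this repository: the records, the field names (order_id, region, amount), and the functions to_silver and to_gold are hypothetical, and in the actual project these steps would run as Spark jobs or dbt models rather than in-memory Python.

```python
from collections import defaultdict

# Hypothetical raw records as they might land in the Bronze layer (assumed schema).
bronze = [
    {"order_id": "1", "region": " east ", "amount": "10.50"},
    {"order_id": "2", "region": "West", "amount": "4.00"},
    {"order_id": "2", "region": "West", "amount": "4.00"},  # duplicate row
    {"order_id": "3", "region": "east", "amount": None},    # missing amount
]


def to_silver(rows):
    """Clean Bronze rows: drop incomplete and duplicate rows, normalize types."""
    seen, cleaned = set(), []
    for row in rows:
        if row["amount"] is None or row["order_id"] in seen:
            continue
        seen.add(row["order_id"])
        cleaned.append({
            "order_id": int(row["order_id"]),
            "region": row["region"].strip().lower(),
            "amount": float(row["amount"]),
        })
    return cleaned


def to_gold(rows):
    """Aggregate Silver rows into total amount per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)  # {'east': 10.5, 'west': 4.0}
```

Bronze keeps the records exactly as received, so the Silver and Gold steps can be re-run at any time without re-ingesting the source data.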

Workflow Commands

dbt run            # Run transformation models
dbt test           # Execute data tests
dbt snapshot       # Capture snapshots to track slowly changing dimensions
dbt docs generate  # Generate project documentation
dbt docs serve     # Serve documentation locally
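
If it helps to run these commands as one step, the sequence above could be wrapped in a small script. The sketch below is an assumption and not part of this repository: run_workflow and DBT_STEPS are hypothetical names, and the steps simply follow the order in which the commands are listed above. It defaults to a dry run so it can be tried without dbt installed.

```python
import subprocess

# Steps follow the listing above. `dbt docs serve` is left out because it
# starts a local web server and blocks until it is interrupted.
DBT_STEPS = [
    ["dbt", "run"],
    ["dbt", "test"],
    ["dbt", "snapshot"],
    ["dbt", "docs", "generate"],
]


def run_workflow(steps=DBT_STEPS, dry_run=True):
    """Run each dbt step in order and return the commands that were handled.

    With dry_run=True the commands are only collected, not executed.
    With dry_run=False a failing step raises CalledProcessError and
    the remaining steps are skipped.
    """
    handled = []
    for cmd in steps:
        if not dry_run:
            subprocess.run(cmd, check=True)
        handled.append(" ".join(cmd))
    return handled


print(run_workflow())
```

Calling run_workflow(dry_run=False) from the dbt project directory would execute the steps for real.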

Repository: sowrabh-m/Data_Pipeline_Spark_Azure_DBT
