This is a simple personal project to extract and model music album data using Azure Data Factory, dbt and Databricks.
Data is sourced from an API provided by the webapp 1001albumsgenerator, based on the book 1001 Albums You Must Hear Before You Die by Robert Dimery.
Every day a new music album is listened to and rated. The API tracks each album listened to and the rating assigned, along with metadata such as release year, Wikipedia link, genres and global rating.
A GitHub Actions workflow is triggered on pushes to the main branch and on completion of a pull request. The pipeline compiles and lints the dbt project code before building it against the target Databricks database. A Databricks personal access token is stored as an environment variable called ADBTOKEN.
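A minimal sketch of such a workflow, assuming dbt-databricks and SQLFluff as the toolchain (the file name, job layout and linter choice are assumptions, not the repo's actual configuration):

```yaml
# .github/workflows/dbt.yml -- illustrative sketch only
name: dbt-build
on:
  push:
    branches: [main]
  pull_request:
    types: [closed]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install dbt-databricks sqlfluff
      - run: sqlfluff lint models/   # lint the dbt SQL
      - run: dbt compile             # compile the project
      - run: dbt build               # build on the target Databricks database
        env:
          ADBTOKEN: ${{ secrets.ADBTOKEN }}  # Databricks personal access token
```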
Data is extracted daily as JSON and stored as raw data in an Azure storage account by the Azure Data Factory pipeline get_albums.
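Conceptually, the extraction step amounts to fetching the project's album history and writing it to a dated blob. The endpoint URL and blob layout below are assumptions for illustration, not the actual ADF configuration:

```python
import json
import urllib.request
from datetime import date

# Hypothetical endpoint; the real URL is configured inside the ADF pipeline.
API_URL = "https://1001albumsgenerator.com/api/v1/projects/{project}"


def raw_blob_path(project: str, day: date) -> str:
    """Build a dated path in the raw zone (assumed layout)."""
    return f"raw/albums/{project}/{day:%Y/%m/%d}.json"


def fetch_albums(project: str) -> dict:
    """Fetch the project's album history as parsed JSON."""
    with urllib.request.urlopen(API_URL.format(project=project)) as resp:
        return json.load(resp)
```

ADF's Copy activity performs the equivalent of fetch_albums plus a blob write to a path like the one raw_blob_path returns.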
The pipeline then calls an Azure Databricks notebook called load_albums_delta, which loads today's JSON file into a Delta table in the ADB workspace.
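A sketch of what such a notebook might do. The storage layout, account placeholder and table name are assumptions; only the Spark read/write pattern in the comment reflects standard Databricks usage:

```python
from datetime import date


def todays_raw_path(container: str, account: str, day: date) -> str:
    """ABFS path of today's raw JSON file (assumed layout)."""
    return (
        f"abfss://{container}@{account}.dfs.core.windows.net"
        f"/raw/albums/{day:%Y/%m/%d}.json"
    )


# Inside the notebook, the load itself would be a standard Spark read/write:
#   path = todays_raw_path("raw", "<storage-account>", date.today())
#   (spark.read.json(path)
#         .write.format("delta")
#         .mode("append")
#         .saveAsTable("raw_albums"))   # table name assumed
```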
Storage account access is managed via an Azure Key Vault: Databricks retrieves a storage account key from the vault and uses it to connect to the storage account.
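In Databricks this pattern is typically implemented with a Key Vault-backed secret scope. A notebook fragment illustrating the idea (it runs only inside a workspace; the scope and key names are placeholders, not this project's actual names):

```python
# Databricks notebook fragment -- scope/key names are placeholders.
storage_key = dbutils.secrets.get(scope="keyvault-scope", key="storage-account-key")
spark.conf.set(
    "fs.azure.account.key.<storage-account>.dfs.core.windows.net",
    storage_key,
)
```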
The source code for the Data Factory is stored in the adf folder. The Databricks notebook lives in the adb folder.
Once the data is loaded into the Delta lake, it can be transformed into the desired shape. A dbt project takes the raw albums data and builds a small set of dimensional models to allow for data analysis and visualisation.
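A dimension model in such a project might look like the following dbt SQL; the model and column names are illustrative, not the project's actual models:

```sql
-- models/marts/dim_album.sql (illustrative)
select
    album_id,
    album_name,
    artist,
    release_year,
    genres
from {{ ref('stg_albums') }}
```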
In the ADB workspace, a dashboard built on the Gold layer of the dbt warehouse provides simple visualisations and analysis.
Users with permission can access the visualisation dashboard via the ADB workspace here.