Skip to content

Latest commit

 

History

History
402 lines (288 loc) · 19 KB

README.md

File metadata and controls

402 lines (288 loc) · 19 KB

dbt Certification Project

The main goal of this project is to deliver the tools and practice required to pass the dbt Analytics Engineering Exam.

Scenario

  • You have been hired by a new company that wants to publish the next big hit of boardgame and needs your help to identify the factors to create this new boardgame, like:
    • Countries with highest demand (based on number of reviews and positive reviews, what are the kind of games that users rank the best).
    • Define metrics to calculate the average rating for each dimension (designers, categories, artists, mechanics)

To do that the company relies on your expertise to perform insights on the datasets available and build recommendations from it.

Data Architecture

Data Architecture

Components:

Data Model

Raw Layer

Data Model

Core Layer

Data Model - Snowflake Schema

Data Dictionary

Data Dictionary


1. Development Setup

The following steps will help you to configure your mac for Python development, automation, and command line use.

You can also check this recorded session that goes through project setup step-by-step. (Passcode: 7%#DU9#b)

1.1 Setup your Mac

You should install some command line tools, and configure your terminal and editor:

1.2 Git

  • Confirm git is installed: git --version

  • Sign up on GitHub using your Hakkoda email

  • Sign in to GitHub using your Hakkoda account

  • Click on the top left button (alongside the GitHub logo) to expand the left panel and press the Single sign-on button like shown below GitHub Single Sign-On

  • Press the Continue button GitHub Single Sign-On

  • Finally you have the main page with top panel with the links to access repositories, projects, teams and so on. GitHub Single Sign-On

Configure global git settings

git config --global user.name "[github_username]"
git config --global user.email "[github_email]"
git config --global init.defaultBranch main
git config --global --add push.default current
git config --global push.autoSetupRemote true
git config --global pull.rebase false

Note: make sure to replace [github_username] and [github_email] in the path above with your own

To confirm your git settings run the following command:

git config --global --list

2. Clone the dbt Certification Project Repository

Like all Hakkoda repos, the dbt Certification Project repo can only be access locally using SSH keys. If you haven't already setup SSH follow these steps or skip straight to the git clone command below.

Generate a New SSH Key

Open a Terminal. If you haven't already, setup your terminal. Then create an SSH key. Ensure that you save your key passphrase in Bitwarden.

# Generate the SSH key
# NOTE: 
#  * Update your email address before running the keygen command

ssh-keygen -t ed25519 -C "[email protected]"

Check the example below on how to adjust the above command to your e-mail account.

When asked to enter file in which to save the key just hit Enter on your keyboard to accept to save on the suggested file. If it already exists, it will ask if you want to overwrite.

SSH-Keys

When asked to Enter passphrase you can use Bitwarden to generate the passphrase and save as something like LaptopSSHKey. You will need to provide this passphrase every time you need to unlock your SSH key

# Add your SSH key to the ssh-agent
eval "$(ssh-agent -s)"

# Update your SSH config to automatically load keys into the ssh-agent and 
# store passphrases in your keychain.
cat <<- EOF >> ~/.ssh/config
Host github.com
  AddKeysToAgent yes
  UseKeychain yes
  IdentityFile ~/.ssh/id_ed25519
EOF

# For MacOS, you can also store your passphrase in the Apple keychain. You may 
# need to provide your mac password to access your keychain. 
ssh-add --apple-use-keychain ~/.ssh/id_ed25519

Add the SSH key to your GitHub account

Follow instructions to add your public key to your GitHub account.

# Copy public key to clipboard
pbcopy < ~/.ssh/id_ed25519.pub

Go to the GitHub Keys page and click the New SSH key button. Then:

  • Add a description title: LaptopSSHKey
  • Paste the key: cmd + v

If requested, enter your GitHub password to save.

Then to enable SSO for your key, click Configure SSO and then Authorize. Click continue to enable SSO. Go through Okta authentication steps. Click continue to complete.

When finalized it should look like this. GitHub SSH Key

Git Clone dbt Certification Project

Before cloning your repo, you should open your terminal to a development folder to store our project code in.

For example:

# Ensure that you are in your "home" directory
cd ~

# Create a new development directory, if it doesn't already exist
mkdir Develop

# Change to the development directory
cd Develop

# Clone project repository (NOTE: uses SSH key)
git clone [email protected]:Hakkoda1/dbt-certification-project.git

Docker Version - Optional

There is a docker version available on this repository that you can use instead of setting up a python environment on your local machine.

Check this video tutorial on how to setup the docker-compose.

To install docker desktop on Mac you can use this url.

Note: If you choose this path, you can jump directly to chapter 6 of the tutorial.


3. Setup Your Python Environment

Setup Python for dbt use with Snowflake. Because dbt-core is currently compatible with Python versions from 3.8 - 3.11, we have setup the project to use Python 3.10, but feel free to install any version that is supported by dbt (source).

brew install [email protected] virtualenv

Create your Python virtual environment. (If you have used another Python version, please change the code below accordingly)

# Change to dbt-certification-project repository root folder
cd dbt-certification-project/

# Setup a project python virtual environement
virtualenv .dbt-env --python=python3.10

4. Activate Your Virtual Environment

Every time you open a new terminal, you will need to source your Project Python environment:

# Source your project environment
source .dbt-env/bin/activate

5. Install Python packages required

Run the following command to install the required Python packages for the project:

pip install -r requirements.txt

6. Setup dbt Project

When you clone the repository the dbt project was already initialized, so you can skip the command dbt init.

After that, you need to use the sample-profiles.yml template file to create your own profiles.yml by running the following command inside dbt folder:

# Change to dbt project folder
 cd dbt/

# Rename the file sample-profiles.yml to profiles.yml
cp sample-profiles.yml profiles.yml

7. Install dbt Power Extension for VSCode

dbt Power User extension is a great tool to bring a similar experience to dbt Cloud version using dbt-core on VSCode.

Below are listed the features that this extension provides:

  • Setting up the dbt Power User extension
  • Column Level Lineage
  • dbt query data results preview with file export and analysis
  • Generate dbt models from source files or SQL queries
  • Generate model and column descriptions
  • Click to execute parent/child models and tests
  • Query explanations for complex dbt models

Follow the video below on a step-by-step tutorial on how to install.

Tutorial Installation

Below there are a couple of additional videos with useful information on how to setup and take advantage of the features from dbt Power User extension.

Note: Those are from the dbt Zero to Hero Udemy course that is part of the learning path for the dbt Analytics Engineering Certification Exam.

8. How to use Git on this Project

Git is an integral part of version control and CI/CD. It makes contributing to a project with multiple contributors extremely easy. Version control is essential and allows any project to move to any snapshot of time in the development history.

8.1.dbt Certification Project Git Workflow

dbt Certification Project uses the Simple Flow branching convention with a few types of branches:

  • Short running branches (features): temporary dedicated branches for work-in-progress features

    • start from develop, to develop changes that are then integrated (merged) back to develop

    • enables parallel development with minimal conflicts with other features

    • For example, to create a new feature branch we need to ensure we are basing it off of the most recent develop branch and then creating the new branch:

      git checkout develop
      git pull
      # In this scenario we should use the [feature_first-initial-last-name] structure to identify each member developing branch (ex: feature_fbalseiro)
      git checkout -b feature_fbalseiro
  • Long running branches (protected): branches that require pull requests to merge in changes

    • develop: changes staged for the next production release, should be merged to main when a release version is ready. This is is part of our CI/CD process.
    • main: current version in production, each merge on this branch should be tag and deployed.
graph BT;
    feature_1 --> develop;
    feature_2 --> develop;
    feature_x --> develop;
    develop --> main;
Loading

8.2. How to get new updates from production to your branch

As you're going through the project and working on your branch ex: feature_fbalseiro there can be new updates on main branch due to new features or challenges that have been added to the repository.

In that scenario, if you want to to get access to that on your local machine, you can perform the following steps:

  1. Check first in which branch are you working with the command git branch and the branch should be identified with a * before the name, like this:
❯ git branch                           
  add-pr-template
  add-snapshot
  develop
* feature_fbalseiro
  github-cronjob
  main
  1. Check if you have all your files staged and pushed to the remote repository by running the command git status 2.1. If you get an output like the following you should first run git push to push your development to the remote repository branch:
 ❯ git status
 On branch develop
 Your branch is ahead of 'origin/develop' by 1 commit.
   (use "git push" to publish your local commits)
 
 Changes to be committed:
   (use "git restore --staged <file>..." to unstage)
         modified:   ../README.md

2.2. If you get an output that your branch is up to date and nothing to commit, then you can move to the next step

  1. Checkout of your branch to the main branch by running the command git checkout main to switch to the main branch

  2. Finally run git pull to pull the latest version of main branch on remote repository to your local machine

  3. If you want to go back again to your working branch just run git checkout feature_fbalseiro (adjust to your branch name)

9. Project Challenges

  1. Customize Profile Directory
  2. Setup Environment Variables
  3. Add Sources
  4. Testing and Documenting Sources
  5. Add Snapshot
  6. Add Staging Models
  7. Add Source Freshness
  8. Add Singular Test
  9. Add Generic Test
  10. Configure Test Severity
  11. Update Staging Models
  12. Customize Data Test Name
  13. Add Intermediate Models
  14. Add Variables
  15. Add Tags
  16. Add Marts Core Models
  17. Generate Documentation
  18. Add Macros
  19. dbt run-operation
  20. Hooks
  21. Understanding DAGs

9.1. Additional Challenges - Not Covered in the Exam

  1. Add dbt Expectations
  2. Add CI Workflow
  3. Add Schedule Daily Run Workflow
  4. Add Deploy to Production (Post-Merge) Workflow

10. Additional Content