In this homework, we'll use the models developed during the week 4 videos and extend the dbt project presented there with the FHV (for-hire vehicle) taxi data for 2019 that is already loaded in our DWH.
This means that in this homework we use the following datasets:
- Yellow taxi data - Years 2019 and 2020
- Green taxi data - Years 2019 and 2020
- FHV data - Year 2019
We will use the loaded data for:
- Building a source table: stg_fhv_tripdata
- Building a fact table: fact_fhv_trips
- Creating a dashboard
If you don't have access to GCP, you can do this homework locally using the data ingested into your Postgres database instead. If you do have access to GCP, there's no need to repeat it on local Postgres, unless you want to.
Note: if your answer doesn't match exactly, select the closest option.
Question 1: What happens when we execute dbt build --vars '{'is_test_run':'true'}'? You'll need to have completed the "Build the first dbt models" video.
- It's the same as running dbt build
- It applies a limit 100 to all of our models
- It applies a limit 100 only to our staging models
- Nothing
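The answer hinges on where the is_test_run variable is read. In the course project, the staging models end with a Jinja block along the lines of {% if var('is_test_run', default=true) %} limit 100 {% endif %}, while the core models have no such block. The Python sketch below imitates that rendering step without dbt or Jinja; the Model class, the layer labels and render_model are hypothetical, and it assumes the variable defaults to true when it is not passed.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    layer: str  # hypothetical label: "staging" or "core"
    sql: str


def render_model(model: Model, cli_vars: dict) -> str:
    """Imitate dbt rendering the is_test_run block (a sketch, not dbt itself)."""
    # Like var('is_test_run', default=true): fall back to True when not passed.
    is_test_run = cli_vars.get("is_test_run", True)
    # Only staging models carry the limit block, so only they get truncated.
    if model.layer == "staging" and is_test_run:
        return f"{model.sql}\nlimit 100"
    return model.sql


models = [
    Model("stg_green_tripdata", "staging", "select * from green_tripdata"),
    Model("stg_yellow_tripdata", "staging", "select * from yellow_tripdata"),
    Model("fact_trips", "core", "select * from trips_unioned"),
]

for m in models:
    rendered = render_model(m, {"is_test_run": True})
    print(m.name, "-> limited" if rendered.endswith("limit 100") else "-> full")
```

Setting is_test_run to false removes the limit from the staging models, and with it the truncation of everything built on top of them.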
Question 2: What is the code that our CI job will run? Where is this code coming from?
- The code that has been merged into the main branch
- The code that is behind the creation of the object on the dbt_cloud_pr_ schema
- The code from any development branch that has been opened based on main
- The code from the development branch we are requesting to merge to main
Question 3: What is the count of records in the model fact_fhv_trips after running all dependencies with the test run variable disabled (is_test_run: false)?
Create a staging model for the FHV data, similar to the ones made for the yellow and green data. Add an additional filter to keep only records with a pickup time in 2019.
Do not add a deduplication step. Run these models without limits (is_test_run: false).
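The staging step boils down to column renaming, type casting and a row filter on the pickup year; in the dbt model that filter would be something like a where clause on the year of pickup_datetime. The plain-Python sketch below mirrors that logic on a few made-up rows; the lowercase column names and the stg_fhv_tripdata function are hypothetical stand-ins, not the actual model. Exact duplicates are deliberately kept, since the homework asks for no deduplication step.

```python
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S"

# Made-up raw FHV rows; column names are simplified stand-ins for the real source.
raw_fhv = [
    {"dispatching_base_num": "B00001", "pickup_datetime": "2019-01-01 00:30:00",
     "dropoff_datetime": "2019-01-01 00:50:00", "pulocationid": 132, "dolocationid": 161},
    {"dispatching_base_num": "B00002", "pickup_datetime": "2018-12-31 23:59:00",
     "dropoff_datetime": "2019-01-01 00:10:00", "pulocationid": 161, "dolocationid": 132},
    # Exact duplicate of the first row: kept, because there is no dedup step.
    {"dispatching_base_num": "B00001", "pickup_datetime": "2019-01-01 00:30:00",
     "dropoff_datetime": "2019-01-01 00:50:00", "pulocationid": 132, "dolocationid": 161},
    {"dispatching_base_num": "B00003", "pickup_datetime": "2020-01-01 00:05:00",
     "dropoff_datetime": "2020-01-01 00:25:00", "pulocationid": 132, "dolocationid": 132},
]


def stg_fhv_tripdata(rows):
    """Parse timestamps, rename location columns and keep only 2019 pickups."""
    staged = []
    for row in rows:
        pickup = datetime.strptime(row["pickup_datetime"], TS_FORMAT)
        if pickup.year != 2019:  # the extra filter the homework asks for
            continue
        staged.append({
            "dispatching_base_num": row["dispatching_base_num"],
            "pickup_datetime": pickup,
            "dropoff_datetime": datetime.strptime(row["dropoff_datetime"], TS_FORMAT),
            "pickup_locationid": row["pulocationid"],
            "dropoff_locationid": row["dolocationid"],
        })
    return staged


staged = stg_fhv_tripdata(raw_fhv)
print(len(staged))  # 2: the 2018 and 2020 pickups are dropped, the duplicate stays
```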
Create a core model similar to fact_trips, but selecting from stg_fhv_tripdata and joining with dim_zones. As in fact_trips, keep only records with known pickup and dropoff locations. Run the dbt model without limits (is_test_run: false).
- 12998722
- 22998722
- 32998722
- 42998722
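In SQL terms, the core model is the staged FHV trips inner-joined to dim_zones twice, once on the pickup location and once on the dropoff location. The Python sketch below reproduces that join on a few sample rows so the effect on the row count is easy to see. It assumes, following the approach of fact_trips, that a location counts as known when its borough is not 'Unknown'; the fact_fhv_trips function and the column names are illustrative, not the real model.

```python
# A few zone rows keyed by locationid (sample values, not the full lookup table).
dim_zones = {
    132: {"borough": "Queens", "zone": "JFK Airport"},
    161: {"borough": "Manhattan", "zone": "Midtown Center"},
    264: {"borough": "Unknown", "zone": "NV"},
}

# Made-up staged FHV trips (timestamps omitted for brevity).
stg_rows = [
    {"dispatching_base_num": "B00001", "pickup_locationid": 132, "dropoff_locationid": 161},
    {"dispatching_base_num": "B00002", "pickup_locationid": 264, "dropoff_locationid": 161},
    {"dispatching_base_num": "B00003", "pickup_locationid": None, "dropoff_locationid": 132},
]


def fact_fhv_trips(trips, zones):
    """Inner-join each trip to its pickup and dropoff zone, skipping unknown zones."""
    # Assumption: "known" means the zone's borough is not 'Unknown'.
    known = {zid: z for zid, z in zones.items() if z["borough"] != "Unknown"}
    facts = []
    for trip in trips:
        pickup_zone = known.get(trip["pickup_locationid"])
        dropoff_zone = known.get(trip["dropoff_locationid"])
        # Inner-join semantics: a trip without a match on either side is dropped.
        if pickup_zone is None or dropoff_zone is None:
            continue
        facts.append({
            **trip,
            "pickup_borough": pickup_zone["borough"],
            "pickup_zone": pickup_zone["zone"],
            "dropoff_borough": dropoff_zone["borough"],
            "dropoff_zone": dropoff_zone["zone"],
        })
    return facts


facts = fact_fhv_trips(stg_rows, dim_zones)
print(len(facts))  # 1: the 'Unknown' pickup and the null pickup are both filtered out
```

Because both joins are inner joins, trips with a null location ID disappear as well, which is one reason the fact table ends up with fewer rows than the staging model.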
Question 4: After building a tile for the fact_fhv_trips table and the fact_trips tile as seen in the videos, which service had the most rides during July 2019 (the month with the biggest number of rides)?
Create a dashboard with some tiles that you find interesting for exploring the data. One tile should show the number of trips per month, as done in the videos for fact_trips, including the fact_fhv_trips data.
- FHV
- Green
- Yellow
- FHV and Green
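A tile showing trips per month by service is essentially a grouped count. The Python sketch below shows that aggregation on a handful of made-up trips (they are not real counts and say nothing about the actual answer); the (service, pickup_datetime) tuple layout and the helper names are hypothetical. In the dashboard itself, the same numbers come from a time-series chart over both fact tables with the service type as the breakdown dimension.

```python
from collections import Counter
from datetime import datetime

# Toy trips as (service_type, pickup_datetime); made-up values, not real counts.
trips = [
    ("FHV", datetime(2019, 7, 1, 9, 0)),
    ("FHV", datetime(2019, 7, 2, 10, 30)),
    ("FHV", datetime(2019, 7, 20, 18, 45)),
    ("Yellow", datetime(2019, 7, 5, 12, 0)),
    ("Yellow", datetime(2019, 7, 6, 13, 0)),
    ("Green", datetime(2019, 7, 7, 14, 0)),
    ("Yellow", datetime(2019, 6, 28, 8, 0)),
    ("Yellow", datetime(2019, 6, 29, 8, 0)),
    ("Yellow", datetime(2019, 6, 30, 23, 59)),
]


def trips_per_month(rows):
    """Count trips per (YYYY-MM, service): the data behind a monthly tile."""
    return Counter((pickup.strftime("%Y-%m"), service) for service, pickup in rows)


def top_service(counts, month):
    """Return the service with the most trips in the given YYYY-MM month."""
    in_month = {service: n for (m, service), n in counts.items() if m == month}
    return max(in_month, key=in_month.get)


counts = trips_per_month(trips)
print(top_service(counts, "2019-07"))  # FHV (toy data only)
```

Restricting to a single month before taking the maximum matters: in this toy data Yellow leads in June but not in July.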
- Form for submitting: https://courses.datatalks.club/de-zoomcamp-2024/homework/hw4
Deadline: 22 February (Thursday), 22:00 CET
- Video: https://youtu.be/3OPggh5Rca8
- Answers:
  - Question 1: It applies a limit 100 only to our staging models
  - Question 2: The code from the development branch we are requesting to merge to main
  - Question 3: 22998722
  - Question 4: Yellow