In this HashiQube DevOps lab you will get hands on experience with DBT - The Data Build Tool.
dbt is a data transformation tool that enables data analysts and engineers to transform, test and document data in the cloud data warehouse.
Review the dbt and adapter versions located in common.sh
To control which adapter and version you would like to install with dbt, change the variable DBT_WITH
to an accepted value
DBT_WITH=postgres
# AVAILABLE OPTIONS ARE:
# postgres
# redshift
# bigquery
# snowflake
# mssql
# ^^ with mssql being SQL Server and Synapase
# spark
# all
# ^^ will install all adapters excluding mssql
Next lets provision Hashiqube with basetools and dbt:
vagrant up --provision-with basetools,docsify,docker,postgresql,dbt
https://github.com/dbt-labs/jaffle_shop#running-this-project
-
Run
vagrant up --provision-with basetools,docsify,docker,postgresql,dbt
-
We have already cloned https://github.com/dbt-labs/jaffle_shop into
/vagrant/dbt/jaffle_shop
and we will be following the tutorial at https://github.com/dbt-labs/jaffle_shop#running-this-project -
See the output and now try to follow the tutorial at https://github.com/dbt-labs/jaffle_shop
-
Enter Hashiqube ssh session using
vagrant ssh
this project can be found in/vagrant/dbt
-
Enter Hashiqube ssh session using
vagrant ssh
-
If you have an existing dbt project under your home directory, you can navigate to your dbt project via the
/osdata
volume which is mapped to your home directory. -
Update your
profile.yml
with the correct credentials of your target database. Usedbt debug
to test connection. -
dbt run
and be awesome.
These adapters require a previous version of dbt (not latest).
dbt --version
will output:
Core:
- installed: 1.1.0
- latest: 1.2.1 - Update available!
Your version of dbt-core is out of date!
You can find instructions for upgrading here:
https://docs.getdbt.com/docs/installation
Plugins:
- postgres: 1.1.0 - Update available!
- synapse: 1.1.0 - Up to date!
- sqlserver: 1.1.0 - Up to date!
Example 2 with other adapters:
Core:
- installed: 1.2.1
- latest: 1.2.1 - Up to date!
Plugins:
- spark: 1.2.0 - Up to date!
- postgres: 1.2.1 - Up to date!
- snowflake: 1.2.0 - Up to date!
- redshift: 1.2.1 - Up to date!
- bigquery: 1.2.0 - Up to date!
For a practical example we are going to use https://github.com/dbt-labs/jaffle_shop
Jaffle shop will automatically cloned down, and instantiated. It will seed to the PosgreSQL database which we provisioned with postgresql
in the command vagrant up --provision-with basetools,docsify,docker,postgresql,dbt
Once the provisioner is done you will be able to access the DBT Web interface at http://localhost:28080/
When the dbt project grows, DBT RUN and DBT TEST become expensive. An alternative to reduce the cost of running the project is to have the content of the folder .dbt/targeton persistent storage to reuse later
Use DBT RUN and DBT TEST with deferring parameters: $ dbt run --select [...] --defer --state path/to/artifacts and
dbt test --select [...] --defer --state path/to/artifacts
In this way, if we have already run the model, the next RUN and TEST will execute exclusively what is new on the code and reuse what has been run previously.
This would open space as well to implement cool stuff, such as running the model over a pull request.