Merge branch 'current' into dbeatty/jinja-statements
matthewshaver authored Dec 18, 2023
2 parents 065a882 + a6d6e6f commit ac2d98e
Showing 310 changed files with 2,556 additions and 2,301 deletions.
5 changes: 4 additions & 1 deletion contributing/content-style-guide.md
@@ -284,7 +284,7 @@ If the list starts getting lengthy and dense, consider presenting the same conte

A bulleted list with introductory text:

> A dbt project is a directory of `.sql` and .yml` files. The directory must contain at a minimum:
> A dbt project is a directory of `.sql` and `.yml` files. The directory must contain at a minimum:
>
> - Models: A model is a single `.sql` file. Each model contains a single `select` statement that either transforms raw data into a dataset that is ready for analytics or, more often, is an intermediate step in such a transformation.
> - A project file: A `dbt_project.yml` file, which configures and defines your dbt project.
@@ -479,6 +479,9 @@ Some common Latin abbreviations and other words to use instead:
| i.e. | that is | Use incremental models when your dbt runs are becoming too slow (that is, don't start with incremental models) |
| e.g. | <ul><li>for example</li><li>like</li></ul> | <ul><li>Join both the dedicated #adapter-ecosystem channel in dbt Slack and the channel for your adapter's data store (for example, #db-sqlserver and #db-athena)</li><li>Using Jinja in SQL provides a way to use control structures (like `if` statements and `for` loops) in your queries </li></ul> |
| etc. | <ul><li>and more</li><li>and so forth</li></ul> | <ul><li>A continuous integration environment running pull requests in GitHub, GitLab, and more</li><li>While reasonable defaults are provided for many such operations (like `create_schema`, `drop_schema`, `create_table`, and so forth), you might need to override one or more macros when building a new adapter</li></ul> |
| N.B. | note | Note: State-based selection is a powerful, complex feature. |

https://www.thoughtco.com/n-b-latin-abbreviations-in-english-3972787

### Prepositions

2 changes: 1 addition & 1 deletion website/blog/2021-11-22-dbt-labs-pr-template.md
@@ -70,7 +70,7 @@ Checking for things like modularity and 1:1 relationships between sources and st

#### Validation of models:

This section should show something to confirm that your model is doing what you intended it to do. This could be a [dbt test](/docs/build/tests) like uniqueness or not null, or could be an ad-hoc query that you wrote to validate your data. Here is a screenshot from a test run on a local development branch:
This section should show something to confirm that your model is doing what you intended it to do. This could be a [dbt test](/docs/build/data-tests) like uniqueness or not null, or could be an ad-hoc query that you wrote to validate your data. Here is a screenshot from a test run on a local development branch:

![test validation](/img/blog/pr-template-test-validation.png "dbt test validation")

2 changes: 1 addition & 1 deletion website/blog/2021-11-22-primary-keys.md
@@ -51,7 +51,7 @@ In the days before testing your data was commonplace, you often found out that y

## How to test primary keys with dbt

Today, you can add two simple [dbt tests](/docs/build/tests) onto your primary keys and feel secure that you are going to catch the vast majority of problems in your data.
Today, you can add two simple [dbt tests](/docs/build/data-tests) onto your primary keys and feel secure that you are going to catch the vast majority of problems in your data.

Not surprisingly, these two tests correspond to the two most common errors found on your primary keys, and are usually the first tests that teams testing data with dbt implement:

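As a hedged illustration of those two tests (the `orders` model and `order_id` column are placeholder names, not from the post), the schema YAML typically looks like this:

```yaml
# Illustrative sketch — swap in your own model name and primary key column.
version: 2

models:
  - name: orders
    columns:
      - name: order_id
        tests:
          - unique    # catches duplicate primary key values
          - not_null  # catches missing primary key values
```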
2 changes: 1 addition & 1 deletion website/blog/2021-11-29-dbt-airflow-spiritual-alignment.md
@@ -90,7 +90,7 @@ So instead of getting bogged down in defining roles, let’s focus on hard skill
The common skills needed for implementing any flavor of dbt (Core or Cloud) are:

* SQL: ‘nuff said
* YAML: required to generate config files for [writing tests on data models](/docs/build/tests)
* YAML: required to generate config files for [writing tests on data models](/docs/build/data-tests)
* [Jinja](/guides/using-jinja): allows you to write DRY code (using [macros](/docs/build/jinja-macros), for loops, if statements, etc)

YAML + Jinja can be learned pretty quickly, but SQL is the non-negotiable you’ll need to get started.
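As a rough, hedged sketch of the kind of DRY Jinja being described (the `payment_methods` list and `payments` model are assumptions for illustration):

```sql
-- Illustrative sketch: a Jinja for loop that pivots one column per payment method.
{% set payment_methods = ['credit_card', 'bank_transfer', 'gift_card'] %}

select
    order_id,
    {% for method in payment_methods %}
    sum(case when payment_method = '{{ method }}' then amount else 0 end) as {{ method }}_amount
    {%- if not loop.last %},{% endif %}
    {% endfor %}
from {{ ref('payments') }}
group by 1
```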
@@ -87,7 +87,7 @@ The most important thing we’re introducing when your project is an infant is t

* Introduce modularity with [{{ ref() }}](/reference/dbt-jinja-functions/ref) and [{{ source() }}](/reference/dbt-jinja-functions/source)

* [Document](/docs/collaborate/documentation) and [test](/docs/build/tests) your first models
* [Document](/docs/collaborate/documentation) and [test](/docs/build/data-tests) your first models

![image alt text](/img/blog/building-a-mature-dbt-project-from-scratch/image_3.png)
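For readers new to those functions, here is a minimal hedged sketch (the `jaffle_shop` source and `stg_customers` model are assumed names): `source()` points at a raw table declared in YAML, while `ref()` points at another dbt model so dbt can infer the dependency graph.

```sql
-- Illustrative sketch of a model that uses both source() and ref().
with raw_orders as (
    -- raw table declared under a `jaffle_shop` source in a .yml file
    select * from {{ source('jaffle_shop', 'orders') }}
),

customers as (
    -- another dbt model; ref() lets dbt order and document the DAG
    select * from {{ ref('stg_customers') }}
)

select
    customers.customer_id,
    count(raw_orders.id) as order_count
from customers
left join raw_orders
    on raw_orders.customer_id = customers.customer_id
group by 1
```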

2 changes: 1 addition & 1 deletion website/blog/2022-04-19-complex-deduplication.md
@@ -146,7 +146,7 @@ select * from filter_real_diffs

> *What happens in this step? You check your data because you are thorough!*
Good thing dbt has already built this for you. Add a [unique test](/docs/build/tests#generic-tests) to your YAML model block for your `grain_id` in this de-duped staging model, and give it a dbt test!
Good thing dbt has already built this for you. Add a [unique test](/docs/build/data-tests#generic-data-tests) to your YAML model block for your `grain_id` in this de-duped staging model, and give it a dbt test!

```yaml
models:
```
2 changes: 1 addition & 1 deletion website/blog/2022-09-28-analyst-to-ae.md
@@ -111,7 +111,7 @@ The analyst caught the issue because they have the appropriate context to valida

An analyst is able to identify which areas do *not* need to be 100% accurate, which means they can also identify which areas *do* need to be 100% accurate.

> dbt makes it very quick to add [data quality tests](/docs/build/tests). In fact, it’s so quick, that it’ll take an analyst longer to write up what tests they want than it would take for an analyst to completely finish coding them.
> dbt makes it very quick to add [data quality tests](/docs/build/data-tests). In fact, it’s so quick, that it’ll take an analyst longer to write up what tests they want than it would take for an analyst to completely finish coding them.
When data quality issues are identified by the business, we often see that analysts are the first ones to be asked:

@@ -133,9 +133,9 @@ This model tries to parse the raw string value into a Python datetime. When not

#### Testing the result

During the build process, dbt will check if any of the values are null. This is using the built-in [`not_null`](https://docs.getdbt.com/docs/building-a-dbt-project/tests#generic-tests) test, which will generate and execute SQL in the data platform.
During the build process, dbt will check if any of the values are null. This is using the built-in [`not_null`](https://docs.getdbt.com/docs/building-a-dbt-project/tests#generic-data-tests) test, which will generate and execute SQL in the data platform.

Our initial recommendation for testing Python models is to use [generic](https://docs.getdbt.com/docs/building-a-dbt-project/tests#generic-tests) and [singular](https://docs.getdbt.com/docs/building-a-dbt-project/tests#singular-tests) tests.
Our initial recommendation for testing Python models is to use [generic](https://docs.getdbt.com/docs/building-a-dbt-project/tests#generic-data-tests) and [singular](https://docs.getdbt.com/docs/building-a-dbt-project/tests#singular-data-tests) tests.

```yaml
version: 2
```
2 changes: 1 addition & 1 deletion website/blog/2023-01-24-aggregating-test-failures.md
@@ -30,7 +30,7 @@ _It should be noted that this framework is for dbt v1.0+ on BigQuery. Small adap

When we talk about high quality data tests, we aren’t just referencing high quality code, but rather the informational quality of our testing framework and their corresponding error messages. Originally, we theorized that any test that cannot be acted upon is a test that should not be implemented. Later, we realized there is a time and place for tests that should receive attention at a critical mass of failures. All we needed was a higher specificity system: tests should have an explicit severity ranking associated with them, equipped to filter out the noise of common, but low concern, failures. Each test should also mesh into established [RACI](https://project-management.com/understanding-responsibility-assignment-matrix-raci-matrix/) guidelines that state which groups tackle what failures, and what constitutes a critical mass.

To ensure that tests are always acted upon, we implement tests differently depending on the user groups that must act when a test fails. This led us to have two main classes of tests — Data Integrity Tests (called [Generic Tests](https://docs.getdbt.com/docs/build/tests) in dbt docs) and Context Driven Tests (called [Singular Tests](https://docs.getdbt.com/docs/build/tests#singular-tests) in dbt docs), with varying levels of severity across both test classes.
To ensure that tests are always acted upon, we implement tests differently depending on the user groups that must act when a test fails. This led us to have two main classes of tests — Data Integrity Tests (called [Generic Tests](https://docs.getdbt.com/docs/build/tests) in dbt docs) and Context Driven Tests (called [Singular Tests](https://docs.getdbt.com/docs/build/tests#singular-data-tests) in dbt docs), with varying levels of severity across both test classes.

Data Integrity tests (Generic Tests)  are simple — they’re tests akin to a uniqueness check or not null constraint. These tests are usually actionable by the data platform team rather than subject matter experts. We define Data Integrity tests in our YAML files, similar to how they are [outlined by dbt’s documentation on generic tests](https://docs.getdbt.com/docs/build/tests). They look something like this —

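The post's own snippet is collapsed in this view of the diff, so as an illustration only (not the author's example), a YAML-defined data integrity test with an explicit severity generally looks something like this:

```yaml
# Illustrative only — model, column, and severity values are placeholders.
version: 2

models:
  - name: fct_payments
    columns:
      - name: payment_id
        tests:
          - unique
          - not_null:
              config:
                severity: warn  # explicit severity ranking, so low-concern failures stay out of the critical path
```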
@@ -16,6 +16,8 @@ This article covers an approach to handling time-varying ragged hierarchies in a

To help visualize this data, we're going to pretend we are a company that manufactures and rents out eBikes in a ride share application. When we build a bike, we keep track of the serial numbers of the components that make up the bike. Any time something breaks and needs to be replaced, we track the old parts that were removed and the new parts that were installed. We also precisely track the mileage accumulated on each of our bikes. Our primary analytical goal is to be able to report on the expected lifetime of each component, so we can prioritize improving that component and reduce costly maintenance.

<!--truncate-->

## Data model

Obviously, a real bike could have a hundred or more separate components. To keep things simple for this article, let's just consider the bike, the frame, a wheel, the wheel rim, tire, and tube. Our component hierarchy looks like:
2 changes: 1 addition & 1 deletion website/blog/2023-07-03-data-vault-2-0-with-dbt-cloud.md
@@ -143,7 +143,7 @@ To help you get started, [we have created a template GitHub project](https://git

### Entity Relation Diagrams (ERDs) and dbt

Data lineage is dbt's strength, but sometimes it's not enough to help you to understand the relationships between Data Vault components like a classic ERD would. There are a few open source packages to visualize the entities in your Data Vault built with dbt. I recommend checking out the [dbterd](https://dbterd.datnguyen.de/1.2/index.html) which turns your [dbt relationship data quality checks](https://docs.getdbt.com/docs/build/tests#generic-tests) into an ERD.
Data lineage is dbt's strength, but sometimes it's not enough to help you to understand the relationships between Data Vault components like a classic ERD would. There are a few open source packages to visualize the entities in your Data Vault built with dbt. I recommend checking out the [dbterd](https://dbterd.datnguyen.de/1.2/index.html) which turns your [dbt relationship data quality checks](https://docs.getdbt.com/docs/build/tests#generic-data-tests) into an ERD.
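For context, the relationship checks dbterd reads are ordinary dbt `relationships` tests; a hedged sketch with Data Vault-flavored placeholder names:

```yaml
# Illustrative sketch — dbterd can render this relationships test as an ERD edge.
version: 2

models:
  - name: link_customer_order
    columns:
      - name: customer_hash_key
        tests:
          - relationships:
              to: ref('hub_customer')
              field: customer_hash_key
```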

## Summary

43 changes: 30 additions & 13 deletions website/blog/2023-08-01-announcing-materialized-views.md
@@ -11,6 +11,11 @@ hide_table_of_contents: false
date: 2023-08-03
is_featured: true
---
:::note
This blog post was updated on December 18, 2023 to cover support for materialized views (MVs) on dbt-bigquery and to add guidance on how to test MVs.
:::


## Introduction

@@ -26,22 +31,21 @@ Today we are announcing that we now support Materialized Views in dbt. So, what

Materialized views are now an out-of-the-box materialization in your dbt project once you upgrade to the latest version of dbt v1.6 on the following adapters:

- dbt-postgres
- dbt-redshift
- dbt-snowflake
- dbt-databricks
- dbt-materialize*
- dbt-trino*
- dbt-bigquery**
- [dbt-postgres](/reference/resource-configs/postgres-configs#materialized-views)
- [dbt-redshift](/reference/resource-configs/redshift-configs#materialized-views)
- [dbt-snowflake](/reference/resource-configs/snowflake-configs#dynamic-tables)
- [dbt-databricks](/reference/resource-configs/databricks-configs#materialized-views-and-streaming-tables)
- [dbt-materialize*](/reference/resource-configs/materialize-configs#incremental-models-materialized-views)
- [dbt-trino*](/reference/resource-configs/trino-configs#materialized-view)
- [dbt-bigquery (available on 1.7)](/reference/resource-configs/bigquery-configs#materialized-views)

*These adapters supported materialized views prior to dbt 1.6.
**dbt-bigquery support will be coming in 1.7.

Just like you would materialize your SQL model as `table` or `view` today, you can use `materialized_view` in your model configuration, dbt_project.yml, and resources.yml files. At release, Python models will not be supported.
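The per-adapter, in-model examples follow below; as a hedged sketch of the `dbt_project.yml` variant mentioned here (the project and folder names are assumptions):

```yaml
# dbt_project.yml (sketch) — apply the materialization to every model in a folder
models:
  my_project:
    marts:
      real_time:
        +materialized: materialized_view
```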



For Postgres/Redshift/Databricks
For Postgres/Redshift/Databricks/Bigquery

```sql
{{
@@ -61,6 +65,7 @@ config(
}}
```


:::note
We are only supporting dynamic tables on Snowflake, not Snowflake’s materialized views (for a comparison between Snowflake Dynamic Tables and Materialized Views, refer to the [docs](https://docs.snowflake.com/en/user-guide/dynamic-tables-comparison#dynamic-tables-compared-to-materialized-views)). Dynamic tables are better suited for continuous transformations due to functionality like the ability to join, union, and aggregate on base tables, views, and other dynamic tables. Due to those features, they are also more aligned with what other data platforms are calling materialized views. For the sake of simplicity, when I refer to materialized views in this blog, I mean dynamic tables in Snowflake.
:::
@@ -137,6 +142,18 @@ config(
)
}}
```
For Bigquery
```sql
{{
config(
materialized = 'materialized_view',
on_configuration_change = 'apply',
enable_refresh = True,
refresh_interval_minutes = 30,
max_staleness = 60
)
}}
```

For Databricks:

@@ -171,12 +188,12 @@ Now if you do need to more fully build out your development pipeline (making sur

### Testing

Now let’s dive into the second question: how do you test? In development and QA, this will look the same as any batch run tests. You can run `dbt build` or  `dbt test` and then have the tests run after execution as validation. But in production, what can you do to continually test? Your options are:
Now let’s dive into the second question: how do you test? In development and QA, this will look the same as any tests you might have while developing your batch pipelines. You can run `dbt build` or  `dbt test` and then have the tests run after execution as validation. But in production, what changes?

- Continue to do batch testing as we wait for [materialized tests](https://github.com/dbt-labs/dbt-core/issues/6914)
- Or overriding the –store-failures macro like what Materialize has created [here](https://materialize.com/blog/real-time-data-quality-tests-using-dbt-and-materialize/) for their adapter to materialize failed rows as a materialized view. This is not a great solution for the long term but if you have urgency to put this into production, it is an option.
I recommend updating any tests applied to a materialized view/dynamic table with the [store_failures_as](/reference/resource-configs/store_failures_as) configuration set to `view`. This creates a view containing every row that failed the test as of the time it is queried. Note that this does not provide a historical record of failures. You can also set up alerting on that view if it fails your expectations.
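A minimal hedged sketch of that test configuration (model and column names are placeholders):

```yaml
# Illustrative sketch — persist this test's failing rows as a view you can query and alert on.
version: 2

models:
  - name: orders_mv
    columns:
      - name: order_id
        tests:
          - not_null:
              config:
                store_failures_as: view  # failing rows are visible at query time; no history is kept
```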

In order to promote materialized views into production, the process will look very much like it did with your incremental models. Using SlimCI, for new MVs, you can build them into your QA environment. For existing MVs without changes, we can skip and defer to the production objects.
In order to promote materialized views into production, the process will look very much like it did with your incremental models. Use [CI jobs](https://docs.getdbt.com/docs/deploy/ci-jobs) with defer so you can build them into your QA environment. For existing MVs without changes, we can skip and defer to the production objects.

### Production

8 changes: 6 additions & 2 deletions website/blog/2023-10-31-to-defer-or-to-clone.md
@@ -87,7 +87,7 @@ Using the cheat sheet above, let’s explore a few common scenarios and explore
1. Make a copy of our production dataset available in our downstream BI tool
2. To safely iterate on this copy without breaking production datasets

Therefore, we should use **clone** in this scenario
Therefore, we should use **clone** in this scenario.

2. **[Slim CI](https://discourse.getdbt.com/t/how-we-sped-up-our-ci-runs-by-10x-using-slim-ci/2603)**

@@ -96,7 +96,11 @@
2. Only run and test models in the CI staging environment that have changed from the production environment
3. Reference models from different environments – prod for unchanged models, and staging for modified models

Therefore, we should use **defer** in this scenario
Therefore, we should use **defer** in this scenario.

:::tip Use `dbt clone` in CI jobs to test incremental models
Learn how to [use `dbt clone` in CI jobs](/best-practices/clone-incremental-models) to efficiently test modified incremental models, simulating post-merge behavior while avoiding full-refresh costs.
:::

3. **[Blue/Green Deployments](https://discourse.getdbt.com/t/performing-a-blue-green-deploy-of-your-dbt-project-on-snowflake/1349)**
