Merge branch 'current' into patch-1

mirnawong1 authored Dec 19, 2023
2 parents 433c4d1 + fa3b9ff commit 8586656
Showing 30 changed files with 341 additions and 178 deletions.
@@ -16,6 +16,8 @@ This article covers an approach to handling time-varying ragged hierarchies in a

To help visualize this data, we're going to pretend we are a company that manufactures and rents out eBikes in a ride share application. When we build a bike, we keep track of the serial numbers of the components that make up the bike. Any time something breaks and needs to be replaced, we track the old parts that were removed and the new parts that were installed. We also precisely track the mileage accumulated on each of our bikes. Our primary analytical goal is to be able to report on the expected lifetime of each component, so we can prioritize improving that component and reduce costly maintenance.

<!--truncate-->

## Data model

Obviously, a real bike could have a hundred or more separate components. To keep things simple for this article, let's just consider the bike, the frame, a wheel, the wheel rim, tire, and tube. Our component hierarchy looks like:
43 changes: 30 additions & 13 deletions website/blog/2023-08-01-announcing-materialized-views.md
@@ -11,6 +11,11 @@ hide_table_of_contents: false
date: 2023-08-03
is_featured: true
---
:::note
This blog post was updated on December 18, 2023, to cover support for materialized views on dbt-bigquery and to update the guidance on testing materialized views.
:::


## Introduction

@@ -26,22 +31,21 @@ Today we are announcing that we now support Materialized Views in dbt. So, what

Materialized views are now an out-of-the-box materialization in your dbt project once you upgrade to the latest version of dbt v1.6 on the following adapters:

- dbt-postgres
- dbt-redshift
- dbt-snowflake
- dbt-databricks
- dbt-materialize*
- dbt-trino*
- dbt-bigquery**
- [dbt-postgres](/reference/resource-configs/postgres-configs#materialized-views)
- [dbt-redshift](/reference/resource-configs/redshift-configs#materialized-views)
- [dbt-snowflake](/reference/resource-configs/snowflake-configs#dynamic-tables)
- [dbt-databricks](/reference/resource-configs/databricks-configs#materialized-views-and-streaming-tables)
- [dbt-materialize*](/reference/resource-configs/materialize-configs#incremental-models-materialized-views)
- [dbt-trino*](/reference/resource-configs/trino-configs#materialized-view)
- [dbt-bigquery (available in 1.7)](/reference/resource-configs/bigquery-configs#materialized-views)

*These adapters supported materialized views in their adapter prior to 1.6.
**dbt-bigquery support will be coming in 1.7.

Just like you would materialize your SQL model as `table` or `view` today, you can use `materialized_view` in your model configuration, `dbt_project.yml`, and `resources.yml` files. At release, Python models will not be supported.



For Postgres/Redshift/Databricks
For Postgres/Redshift/Databricks/BigQuery:

```sql
{{
config(
    materialized = 'materialized_view',
    on_configuration_change = 'apply',
)
}}
```


:::note
We are only supporting dynamic tables on Snowflake, not Snowflake’s materialized views (for a comparison between Snowflake dynamic tables and materialized views, refer to the [docs](https://docs.snowflake.com/en/user-guide/dynamic-tables-comparison#dynamic-tables-compared-to-materialized-views)). Dynamic tables are better suited for continuous transformations due to functionality like the ability to join, union, and aggregate on base tables, views, and other dynamic tables. Due to those features, they are also more aligned with what other data platforms are calling materialized views. For the sake of simplicity, when I refer to materialized views in this blog, I mean dynamic tables in Snowflake.
:::
@@ -137,6 +142,18 @@ config(

```sql
)
}}
```
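
The dynamic table variant on Snowflake, whose closing lines appear above, looks roughly like this in full (a minimal sketch; the `snowflake_warehouse` and `target_lag` values are illustrative):

```sql
{{
config(
    -- dynamic tables refresh automatically based on target_lag
    materialized = 'dynamic_table',
    snowflake_warehouse = 'transforming',
    target_lag = '30 minutes',
    on_configuration_change = 'apply',
)
}}
```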
For BigQuery:

```sql
{{
config(
    materialized = 'materialized_view',
    on_configuration_change = 'apply',
    enable_refresh = True,
    refresh_interval_minutes = 30,
    max_staleness = 60,
)
}}
```

For Databricks:

@@ -171,12 +188,12 @@ Now if you do need to more fully build out your development pipeline (making sur

### Testing

Now let’s dive into the second question: how do you test? In development and QA, this will look the same as any batch run tests. You can run `dbt build` or `dbt test` and then have the tests run after execution as validation. But in production, what can you do to continually test? Your options are:

Now let’s dive into the second question: how do you test? In development and QA, this will look the same as any tests you might have while developing your batch pipelines. You can run `dbt build` or `dbt test` and then have the tests run after execution as validation. But in production, what changes?

- Continue to do batch testing as we wait for [materialized tests](https://github.com/dbt-labs/dbt-core/issues/6914)
- Or override the `--store-failures` macro, as Materialize has done [here](https://materialize.com/blog/real-time-data-quality-tests-using-dbt-and-materialize/) for their adapter, to materialize failed rows as a materialized view. This is not a great long-term solution, but if you urgently need to put this into production, it is an option.

I recommend that you update any tests applied to a materialized view/dynamic table with the [store_failures_as](/reference/resource-configs/store_failures_as) configuration set to `view`. This creates a view that surfaces all the rows failing your test at the time it is queried. Note that this does not provide a historical record. You can also set up alerting on the view if it fails your expectations.
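
A sketch of a singular test configured this way (the file, model, and column names are illustrative):

```sql
-- tests/assert_no_negative_mileage.sql
-- store failed rows as a view so they can be queried and alerted on
{{ config(store_failures_as = 'view') }}

select *
from {{ ref('fct_rides') }}
where mileage < 0
```

Because the failures are stored as a view, querying it always reflects the current contents of the object under test rather than a snapshot from the last `dbt test` run.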

In order to promote materialized views into production, the process will look very much like it did with your incremental models. Using SlimCI, for new MVs, you can build them into your QA environment. For existing MVs without changes, we can skip and defer to the production objects.
In order to promote materialized views into production, the process will look very much like it did with your incremental models. Use [CI jobs](https://docs.getdbt.com/docs/deploy/ci-jobs) with defer so you can build new MVs into your QA environment. For existing MVs without changes, you can skip them and defer to the production objects.

### Production

8 changes: 6 additions & 2 deletions website/blog/2023-10-31-to-defer-or-to-clone.md
@@ -87,7 +87,7 @@ Using the cheat sheet above, let’s explore a few common scenarios and explore
1. Make a copy of our production dataset available in our downstream BI tool
2. To safely iterate on this copy without breaking production datasets

Therefore, we should use **clone** in this scenario
Therefore, we should use **clone** in this scenario.

2. **[Slim CI](https://discourse.getdbt.com/t/how-we-sped-up-our-ci-runs-by-10x-using-slim-ci/2603)**

@@ -96,7 +96,11 @@ Using the cheat sheet above, let’s explore a few common scenarios and explore
2. Only run and test models in the CI staging environment that have changed from the production environment
3. Reference models from different environments – prod for unchanged models, and staging for modified models

Therefore, we should use **defer** in this scenario
Therefore, we should use **defer** in this scenario.

:::tip Use `dbt clone` in CI jobs to test incremental models
Learn how to [use `dbt clone` in CI jobs](/best-practices/clone-incremental-models) to efficiently test modified incremental models, simulating post-merge behavior while avoiding full-refresh costs.
:::

3. **[Blue/Green Deployments](https://discourse.getdbt.com/t/performing-a-blue-green-deploy-of-your-dbt-project-on-snowflake/1349)**

@@ -1,5 +1,5 @@
---
title: "How we built consistent product launch metrics with the dbt Semantic Layer."
title: "How we built consistent product launch metrics with the dbt Semantic Layer"
description: "We built an end-to-end data pipeline for measuring the launch of the dbt Semantic Layer using the dbt Semantic Layer."
slug: product-analytics-pipeline-with-dbt-semantic-layer

4 changes: 4 additions & 0 deletions website/dbt-versions.js
@@ -177,6 +177,10 @@ exports.versionedPages = [
{
"page": "docs/build/saved-queries",
"firstVersion": "1.7",
},
{
"page": "reference/resource-configs/on_configuration_change",
"firstVersion": "1.6",
}
]

12 changes: 9 additions & 3 deletions website/docs/best-practices/clone-incremental-models.md
@@ -35,11 +35,17 @@ This can be suboptimal because:
- Typically incremental models are your largest datasets, so they take a long time to build in their entirety which can slow down development time and incur high warehouse costs.
- There are situations where a `full-refresh` of the incremental model passes successfully in your CI job but an _incremental_ build of that same table in prod would fail when the PR is merged into main (think schema drift where [on_schema_change](/docs/build/incremental-models#what-if-the-columns-of-my-incremental-model-change) config is set to `fail`)

You can alleviate these problems by zero copy cloning the relevant, pre-exisitng incremental models into your PR-specific schema as the first step of the CI job using the `dbt clone` command. This way, the incremental models already exist in the PR-specific schema when you first execute the command `dbt build --select state:modified+` so the `is_incremental` flag will be `true`.
You can alleviate these problems by zero copy cloning the relevant, pre-existing incremental models into your PR-specific schema as the first step of the CI job using the `dbt clone` command. This way, the incremental models already exist in the PR-specific schema when you first execute the command `dbt build --select state:modified+` so the `is_incremental` flag will be `true`.

You'll have two commands for your dbt Cloud CI check to execute:
1. Clone all of the pre-existing incremental models that have been modified or are downstream of another model that has been modified: `dbt clone --select state:modified+,config.materialized:incremental,state:old`
2. Build all of the models that have been modified and their downstream dependencies: `dbt build --select state:modified+`
1. Clone all of the pre-existing incremental models that have been modified or are downstream of another model that has been modified:
```shell
dbt clone --select state:modified+,config.materialized:incremental,state:old
```
2. Build all of the models that have been modified and their downstream dependencies:
```shell
dbt build --select state:modified+
```

Because of your first clone step, the incremental models selected in your `dbt build` on the second step will run in incremental mode.
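
This works because `is_incremental()` only returns `true` when the model's relation already exists in the target schema, which the clone step guarantees. A typical incremental model (names illustrative) shows the code path being exercised:

```sql
-- models/fct_events.sql (illustrative)
{{ config(materialized = 'incremental', unique_key = 'event_id') }}

select *
from {{ ref('stg_events') }}

{% if is_incremental() %}
  -- with the clone in place, this branch compiles in CI just as it does in prod
  where event_time > (select max(event_time) from {{ this }})
{% endif %}
```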

@@ -1,43 +1,35 @@
---
title: "Tips and tricks"
id: dbt-cloud-tips
description: "Check out any dbt Cloud and IDE-related tips."
sidebar_label: "Tips and tricks"
title: "dbt tips and tricks"
description: "Check out any dbt-related tips and tricks to help you work faster and be more productive."
sidebar_label: "dbt tips and tricks"
pagination_next: null
---

# dbt Cloud tips
Use this page for valuable insights and practical advice to enhance your dbt experience. Whether you're new to dbt or an experienced user, these tips are designed to help you work more efficiently and effectively.

The Cloud IDE provides keyboard shortcuts, features, and development tips to help you work faster and be more productive. Use this Cloud IDE cheat sheet to help you quickly reference some common operations.
The tips are organized into the following categories:

## Cloud IDE Keyboard shortcuts
- [Package tips](#package-tips) to help you streamline your workflow.
- [Advanced tips and techniques](#advanced-tips-and-techniques) to help you get the most out of dbt.

There are default keyboard shortcuts that can help make development more productive and easier for everyone.

- Press Fn-F1 to view a full list of the editor shortcuts
- Command-O on macOS or Control-O on Windows to select a file to open
- Command-P/Command-Shift-P on macOS or Control-P/Control-Shift-P on Windows to see the command palette
- Hold Option-click-on-area or press Shift-Option-Command on macOS or Hold-Alt-click-on-area on Windows to select multiple lines and perform a multi-edit. You can also press Command-E to perform this operation on the command line.
- Command-Enter on macOS or Control-Enter on Windows to Preview your code
- Command-Shift-Enter on macOS or Control-Shift-Enter on Windows to Compile
- Highlight a portion of code and use the above shortcuts to Preview or Compile code
- Enter two underscores (__) in the IDE to reveal a list of dbt functions
- Press Control-backtick (or Ctrl + `) to toggle the Invocation history
- Press Command-Option-forward slash on macOS or Control-Alt-forward slash on Windows on the selected code to add a block comment. SQL files will use the Jinja syntax `({# #})` rather than the SQL one `(/* */)`. Markdown files will use the Markdown syntax `(<!-- -->)`
- Option-W on macOS or Alt-W on Windows will close the currently active editor tab

If you're developing with the dbt Cloud IDE, you can refer to the [keyboard shortcuts](/docs/cloud/dbt-cloud-ide/keyboard-shortcuts) page to help make development more productive and easier for everyone.

## Package tips

- Use the [dbt_codegen](https://hub.getdbt.com/dbt-labs/codegen/latest/) package to help you generate YML files for your models and sources and SQL files for your staging models.
- The [dbt_utils](https://hub.getdbt.com/dbt-labs/dbt_utils/latest/) package contains macros useful for daily development. For example, `date_spine` generates a table with all dates between the ones provided as parameters.
- The [dbt_project_evaluator](https://hub.getdbt.com/dbt-labs/dbt_project_evaluator/latest) package compares your dbt project against a list of our best practices and provides suggestions and guidelines on how to update your models.
- The [dbt_expectations](https://hub.getdbt.com/calogica/dbt_expectations/latest) package contains many tests beyond those built into dbt Core.
- The [dbt_audit_helper](https://hub.getdbt.com/#:~:text=adwords-,audit_helper,-codegen) package lets you compare the output of 2 queries. Use it when refactoring existing logic to ensure that the new results are identical.
- The [dbt_artifacts](https://hub.getdbt.com/brooklyn-data/dbt_artifacts/latest) package saves information about your dbt runs directly to your data platform so that you can track the performance of models over time.
- The [dbt_meta_testing](https://hub.getdbt.com/tnightengale/dbt_meta_testing/latest) package checks that your dbt project is sufficiently tested and documented.
Leverage these dbt packages to streamline your workflow:

| Package | Description |
|---------|-------------|
| [`dbt_codegen`](https://hub.getdbt.com/dbt-labs/codegen/latest/) | Use the package to help you generate YML files for your models and sources and SQL files for your staging models. |
| [`dbt_utils`](https://hub.getdbt.com/dbt-labs/dbt_utils/latest/) | The package contains macros useful for daily development. For example, `date_spine` generates a table with all dates between the ones provided as parameters. |
| [`dbt_project_evaluator`](https://hub.getdbt.com/dbt-labs/dbt_project_evaluator/latest) | The package compares your dbt project against a list of our best practices and provides suggestions and guidelines on how to update your models. |
| [`dbt_expectations`](https://hub.getdbt.com/calogica/dbt_expectations/latest) | The package contains many tests beyond those built into dbt. |
| [`dbt_audit_helper`](https://hub.getdbt.com/#:~:text=adwords-,audit_helper,-codegen) | The package lets you compare the output of 2 queries. Use it when refactoring existing logic to ensure that the new results are identical. |
| [`dbt_artifacts`](https://hub.getdbt.com/brooklyn-data/dbt_artifacts/latest) | The package saves information about your dbt runs directly to your data platform so that you can track the performance of models over time. |
| [`dbt_meta_testing`](https://hub.getdbt.com/tnightengale/dbt_meta_testing/latest) | This package checks that your dbt project is sufficiently tested and documented. |
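
For example, `date_spine` from `dbt_utils` can generate a calendar model in a few lines (a sketch; the model name and date range are illustrative):

```sql
-- models/dim_dates.sql (illustrative)
-- one row per day between the two bounds
{{ dbt_utils.date_spine(
    datepart = "day",
    start_date = "cast('2020-01-01' as date)",
    end_date = "cast('2024-01-01' as date)"
) }}
```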

## Advanced tips
## Advanced tips and techniques

- Use your folder structure as your primary selector method. `dbt build --select marts.marketing` is simpler and more resilient than relying on tagging every model.
- Think about jobs in terms of build cadences and SLAs. Run models that have hourly, daily, or weekly build cadences together.
@@ -61,4 +53,4 @@ There are default keyboard shortcuts that can help make development more product

- [Quickstart guide](/guides)
- [About dbt Cloud](/docs/cloud/about-cloud/dbt-cloud-features)
- [Develop in the Cloud](/docs/cloud/dbt-cloud-ide/develop-in-the-cloud)
- [Develop in the Cloud](/docs/cloud/about-develop-dbt)
4 changes: 2 additions & 2 deletions website/docs/docs/build/jinja-macros.md
@@ -71,8 +71,8 @@ group by 1

You can recognize Jinja based on the delimiters the language uses, which we refer to as "curlies":
- **Expressions `{{ ... }}`**: Expressions are used when you want to output a string. You can use expressions to reference [variables](/reference/dbt-jinja-functions/var) and call [macros](/docs/build/jinja-macros#macros).
- **Statements `{% ... %}`**: Statements are used for control flow, for example, to set up `for` loops and `if` statements, or to define macros.
- **Comments `{# ... #}`**: Jinja comments are used to prevent the text within the comment from compiling.
- **Statements `{% ... %}`**: Statements don't output a string. They are used for control flow, for example, to set up `for` loops and `if` statements, to [set](https://jinja.palletsprojects.com/en/3.1.x/templates/#assignments) or [modify](https://jinja.palletsprojects.com/en/3.1.x/templates/#expression-statement) variables, or to define macros.
- **Comments `{# ... #}`**: Jinja comments are used to prevent the text within the comment from executing or outputting a string.
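
All three delimiters often appear together in a single model; here is a short sketch (the `stg_payments` model and column names are illustrative):

```sql
{# one pivoted column per payment method #}
{% set payment_methods = ['credit_card', 'bank_transfer', 'gift_card'] %}

select
    order_id,
    {% for method in payment_methods %}
    sum(case when payment_method = '{{ method }}' then amount end) as {{ method }}_amount,
    {% endfor %}
    sum(amount) as total_amount
from {{ ref('stg_payments') }}
group by 1
```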

When used in a dbt model, your Jinja needs to compile to a valid query. To check what SQL your Jinja compiles to:
* **Using dbt Cloud:** Click the compile button to see the compiled SQL in the Compiled SQL pane