From 7b0efb2a96e769b240915fd0d4bc7c43ce4495cd Mon Sep 17 00:00:00 2001
From: mirnawong1 <89008547+mirnawong1@users.noreply.github.com>
Date: Tue, 19 Dec 2023 13:05:53 -0500
Subject: [PATCH 1/2] add discourse link + simplify language

this pr adds a discourse link to incremental strategies discussion for large datasets and simplifies the 'when should i use an incremental model' paragraph/section.
---
 website/docs/docs/build/incremental-models.md | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/website/docs/docs/build/incremental-models.md b/website/docs/docs/build/incremental-models.md
index 2a247263159..ed0e6b51f02 100644
--- a/website/docs/docs/build/incremental-models.md
+++ b/website/docs/docs/build/incremental-models.md
@@ -154,17 +154,21 @@ For detailed usage instructions, check out the [dbt run](/reference/commands/run

 # Understanding incremental models
 ## When should I use an incremental model?
-It's often desirable to build models as tables in your data warehouse since downstream queries are more performant. While the `table` materialization also creates your models as tables, it rebuilds the table on each dbt run. These runs can become problematic in that they use a lot of compute when either:
-* source data tables have millions, or even billions, of rows.
-* the transformations on the source data are computationally expensive (that is, take a long time to execute), for example, complex Regex functions, or UDFs are being used to transform data.

-Like many things in programming, incremental models are a trade-off between complexity and performance. While they are not as straightforward as the `view` and `table` materializations, they can lead to significantly better performance of your dbt runs.
+Building models as tables in your data warehouse is often preferred for better query performance. However, using `table` materialization can be computationally intensive, especially when:
+
+- Source data has millions or billions of rows.
+- Data transformations on the source data are computationally expensive (take a long time to execute) and complex, like using Regex or UDFs.
+
+Incremental models offer a balance between complexity and improved performance compared to `view` and `table` materializations and offer better performance of your dbt runs.
+
+In addition to these considerations for incremental models, it's important to understand their limits and challenges, particularly with large datasets. For more insights into efficient strategies, performance considerations, and the handling of late-arriving data in incremental models, refer to the [On the Limits of Incrementality](https://discourse.getdbt.com/t/on-the-limits-of-incrementality/303) discourse discussion.

 ## Understanding the is_incremental() macro
 The `is_incremental()` macro will return `True` if _all_ of the following conditions are met:
 * the destination table already exists in the database
 * dbt is _not_ running in full-refresh mode
-* the running model is configured with `materialized='incremental'`
+* The running model is configured with `materialized='incremental'`

 Note that the SQL in your model needs to be valid whether `is_incremental()` evaluates to `True` or `False`.

From 03eb38de5abe022ef49a96ab7899bab68c882cc6 Mon Sep 17 00:00:00 2001
From: mirnawong1 <89008547+mirnawong1@users.noreply.github.com>
Date: Tue, 19 Dec 2023 13:26:37 -0500
Subject: [PATCH 2/2] Update website/docs/docs/build/incremental-models.md

Co-authored-by: Matt Shaver <60105315+matthewshaver@users.noreply.github.com>
---
 website/docs/docs/build/incremental-models.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/website/docs/docs/build/incremental-models.md b/website/docs/docs/build/incremental-models.md
index ed0e6b51f02..cc45290ae15 100644
--- a/website/docs/docs/build/incremental-models.md
+++ b/website/docs/docs/build/incremental-models.md
@@ -162,7 +162,7 @@ Building models as tables in your data warehouse is often preferred for better q

 Incremental models offer a balance between complexity and improved performance compared to `view` and `table` materializations and offer better performance of your dbt runs.

-In addition to these considerations for incremental models, it's important to understand their limits and challenges, particularly with large datasets. For more insights into efficient strategies, performance considerations, and the handling of late-arriving data in incremental models, refer to the [On the Limits of Incrementality](https://discourse.getdbt.com/t/on-the-limits-of-incrementality/303) discourse discussion.
+In addition to these considerations for incremental models, it's important to understand their limitations and challenges, particularly with large datasets. For more insights into efficient strategies, performance considerations, and the handling of late-arriving data in incremental models, refer to the [On the Limits of Incrementality](https://discourse.getdbt.com/t/on-the-limits-of-incrementality/303) discourse discussion.

 ## Understanding the is_incremental() macro
 The `is_incremental()` macro will return `True` if _all_ of the following conditions are met: