From 9db07cc03bca6c12e308d7550b989997d3bacc96 Mon Sep 17 00:00:00 2001 From: Natalie Fiann Date: Wed, 4 Dec 2024 16:43:28 +0000 Subject: [PATCH 01/74] Created new section called Parallel batch execution --- .../docs/docs/build/incremental-microbatch.md | 63 +++++++++++++++++++ 1 file changed, 63 insertions(+) diff --git a/website/docs/docs/build/incremental-microbatch.md b/website/docs/docs/build/incremental-microbatch.md index 9055aa7650b..6482b844dee 100644 --- a/website/docs/docs/build/incremental-microbatch.md +++ b/website/docs/docs/build/incremental-microbatch.md @@ -185,6 +185,7 @@ Several configurations are relevant to microbatch models, and some are required: | `begin` | Date (required) | The "beginning of time" for the microbatch model. This is the starting point for any initial or full-refresh builds. For example, a daily-grain microbatch model run on `2024-10-01` with `begin = '2023-10-01` will process 366 batches (it's a leap year!) plus the batch for "today." | N/A | | `batch_size` | String (required) | The granularity of your batches. Supported values are `hour`, `day`, `month`, and `year` | N/A | | `lookback` | Integer (optional) | Process X batches prior to the latest bookmark to capture late-arriving records. | `1` | +|[`concurrent_batches`](docs/build/incremental-microbatch#concurrent-execution-with-the-concurrent_batches-configuration)|Boolean|Determines whether batches should be run concurrently (at the same time) or sequentially (one after the other)|`True` or `False`| @@ -239,6 +240,68 @@ For now, dbt assumes that all values supplied are in UTC: While we may consider adding support for custom time zones in the future, we also believe that defining these values in UTC makes everyone's lives easier. +## Parallel batch execution + +The microbatch strategy offers the benefit of updating a model in smaller, more manageable batches. Within dbt, these microbatches can often run at the same time (in parallel), enhancing efficiency. 
+ +Parallel batch execution means that multiple batches are processed at the same time, instead of one after the other (sequentially). For example, if you have a micro-batch model with 12 batches, those batches can be executed in parallel. Specifically they'll run in parallel limited by the number of [available threads](docs/running-a-dbt-project/using-threads). + +To run batches in parallel you can use the `concurrent_batches` configuation: + + + +```yaml +models: + +incremental_strategy: "concurrent_batches" +``` + + + +or: + + + +```sql +{{ + config( + materialized='incremental', + incremental_strategy='concurrent_batches' + + ... + ) +}} + +select ... +``` + + + +It's not possible to run batches concurrently when the model definition contains [`{{ this }}`](/reference/dbt-jinja-functions/this). This is because Jinja represents the state of a table. + +### Concurrent Execution with the `concurrent_batches` configuration + +To enable parallel execution when possible, users can check the model's concurrent_batches configuration: + +| `concurrent_batches` | Description | +|----------------------------|----------------------------| +| `True` |Batches will run in parallel.| +| `False` |Batches will run sequentially.| +| `None` (not explicitly set)|dbt will evaluate the model's Jinja logic and evaluate if it contains a reference to `this`. If it references `this`, the batches will run sequentially. This is because Jinja represents states of table. Otherwise, the batches will run in parallel.| + +### Sequential Execution Conditions + +If any of the following conditions are met, the batches will execute sequentially: + +1. The database adapter does not support concurrent batches. +2. The relation in the data warehouse for the model doesn't exist. + +### Supported adapters + +The following adapters support parallel batch execution: + +- dbt-snowflake +- dbt-databricks + ## How `microbatch` compares to other incremental strategies? 
Most incremental models rely on the end user (you) to explicitly tell dbt what "new" means, in the context of each model, by writing a filter in an `{% if is_incremental() %}` conditional block. You are responsible for crafting this SQL in a way that queries [`{{ this }}`](/reference/dbt-jinja-functions/this) to check when the most recent record was last loaded, with an optional look-back window for late-arriving records. From ca9e4e9c6db87159fdacec2bca6ed01f843e1cb2 Mon Sep 17 00:00:00 2001 From: nataliefiann <120089939+nataliefiann@users.noreply.github.com> Date: Wed, 4 Dec 2024 16:52:34 +0000 Subject: [PATCH 02/74] Update website/docs/docs/build/incremental-microbatch.md Co-authored-by: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> --- website/docs/docs/build/incremental-microbatch.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/build/incremental-microbatch.md b/website/docs/docs/build/incremental-microbatch.md index 6482b844dee..9887bb72533 100644 --- a/website/docs/docs/build/incremental-microbatch.md +++ b/website/docs/docs/build/incremental-microbatch.md @@ -185,7 +185,7 @@ Several configurations are relevant to microbatch models, and some are required: | `begin` | Date (required) | The "beginning of time" for the microbatch model. This is the starting point for any initial or full-refresh builds. For example, a daily-grain microbatch model run on `2024-10-01` with `begin = '2023-10-01` will process 366 batches (it's a leap year!) plus the batch for "today." | N/A | | `batch_size` | String (required) | The granularity of your batches. Supported values are `hour`, `day`, `month`, and `year` | N/A | | `lookback` | Integer (optional) | Process X batches prior to the latest bookmark to capture late-arriving records. 
| `1` | -|[`concurrent_batches`](docs/build/incremental-microbatch#concurrent-execution-with-the-concurrent_batches-configuration)|Boolean|Determines whether batches should be run concurrently (at the same time) or sequentially (one after the other)|`True` or `False`| +|`concurrent_batches`| Boolean | Configures whether batches can run concurrently (at the same time) or sequentially (one after the other). When set to `True` and conditions are met, batches run in parallel. |`True` or `False`| From 8fd0faaadb77acaaf40b6ff207d87861ec924c1a Mon Sep 17 00:00:00 2001 From: nataliefiann <120089939+nataliefiann@users.noreply.github.com> Date: Wed, 4 Dec 2024 16:54:23 +0000 Subject: [PATCH 03/74] Update website/docs/docs/build/incremental-microbatch.md Co-authored-by: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> --- website/docs/docs/build/incremental-microbatch.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/build/incremental-microbatch.md b/website/docs/docs/build/incremental-microbatch.md index 9887bb72533..a5fef740c94 100644 --- a/website/docs/docs/build/incremental-microbatch.md +++ b/website/docs/docs/build/incremental-microbatch.md @@ -244,7 +244,7 @@ While we may consider adding support for custom time zones in the future, we als The microbatch strategy offers the benefit of updating a model in smaller, more manageable batches. Within dbt, these microbatches can often run at the same time (in parallel), enhancing efficiency. -Parallel batch execution means that multiple batches are processed at the same time, instead of one after the other (sequentially). For example, if you have a micro-batch model with 12 batches, those batches can be executed in parallel. Specifically they'll run in parallel limited by the number of [available threads](docs/running-a-dbt-project/using-threads). +Parallel batch execution means that multiple batches are processed at the same time, instead of one after the other (sequentially). 
For example, if you have a microbatch model with 12 batches, you can execute those batches in parallel. Specifically they'll run in parallel limited by the number of [available threads](/docs/running-a-dbt-project/using-threads). To run batches in parallel you can use the `concurrent_batches` configuation: From ca3e04c8342c0026fd27531d6cc5789e3fbd07d8 Mon Sep 17 00:00:00 2001 From: nataliefiann <120089939+nataliefiann@users.noreply.github.com> Date: Wed, 4 Dec 2024 16:54:38 +0000 Subject: [PATCH 04/74] Update website/docs/docs/build/incremental-microbatch.md Co-authored-by: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> --- website/docs/docs/build/incremental-microbatch.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/build/incremental-microbatch.md b/website/docs/docs/build/incremental-microbatch.md index a5fef740c94..7aad6218442 100644 --- a/website/docs/docs/build/incremental-microbatch.md +++ b/website/docs/docs/build/incremental-microbatch.md @@ -246,7 +246,7 @@ The microbatch strategy offers the benefit of updating a model in smaller, more Parallel batch execution means that multiple batches are processed at the same time, instead of one after the other (sequentially). For example, if you have a microbatch model with 12 batches, you can execute those batches in parallel. Specifically they'll run in parallel limited by the number of [available threads](/docs/running-a-dbt-project/using-threads). 
-To run batches in parallel you can use the `concurrent_batches` configuation: +To run batches in parallel, use the `concurrent_batches` configuration: From 0cb99bad39fbcbba713453aac98c4204c3092c38 Mon Sep 17 00:00:00 2001 From: nataliefiann <120089939+nataliefiann@users.noreply.github.com> Date: Wed, 4 Dec 2024 16:55:01 +0000 Subject: [PATCH 05/74] Update website/docs/docs/build/incremental-microbatch.md Co-authored-by: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> --- website/docs/docs/build/incremental-microbatch.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/build/incremental-microbatch.md b/website/docs/docs/build/incremental-microbatch.md index 7aad6218442..5e436868bc8 100644 --- a/website/docs/docs/build/incremental-microbatch.md +++ b/website/docs/docs/build/incremental-microbatch.md @@ -242,7 +242,7 @@ While we may consider adding support for custom time zones in the future, we als ## Parallel batch execution -The microbatch strategy offers the benefit of updating a model in smaller, more manageable batches. Within dbt, these microbatches can often run at the same time (in parallel), enhancing efficiency. +The microbatch strategy offers the benefit of updating a model in smaller, more manageable batches. Parallel batch execution means that multiple batches are processed at the same time, instead of one after the other (sequentially). For example, if you have a microbatch model with 12 batches, you can execute those batches in parallel. Specifically they'll run in parallel limited by the number of [available threads](/docs/running-a-dbt-project/using-threads). 
From 099a2bc236e5ee45eb0335e0b50bb7bbdcb6c9e4 Mon Sep 17 00:00:00 2001 From: nataliefiann <120089939+nataliefiann@users.noreply.github.com> Date: Wed, 4 Dec 2024 16:55:22 +0000 Subject: [PATCH 06/74] Update website/docs/docs/build/incremental-microbatch.md Co-authored-by: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> --- website/docs/docs/build/incremental-microbatch.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/build/incremental-microbatch.md b/website/docs/docs/build/incremental-microbatch.md index 5e436868bc8..4a3b2f91c45 100644 --- a/website/docs/docs/build/incremental-microbatch.md +++ b/website/docs/docs/build/incremental-microbatch.md @@ -278,7 +278,7 @@ select ... It's not possible to run batches concurrently when the model definition contains [`{{ this }}`](/reference/dbt-jinja-functions/this). This is because Jinja represents the state of a table. -### Concurrent Execution with the `concurrent_batches` configuration +### Configure `concurrent_batches` To enable parallel execution when possible, users can check the model's concurrent_batches configuration: From 40c57e8e13b0f61a30271936d7b82cd0fd861747 Mon Sep 17 00:00:00 2001 From: nataliefiann <120089939+nataliefiann@users.noreply.github.com> Date: Wed, 4 Dec 2024 16:55:56 +0000 Subject: [PATCH 07/74] Update website/docs/docs/build/incremental-microbatch.md Co-authored-by: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> --- website/docs/docs/build/incremental-microbatch.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/build/incremental-microbatch.md b/website/docs/docs/build/incremental-microbatch.md index 4a3b2f91c45..88d86524f5d 100644 --- a/website/docs/docs/build/incremental-microbatch.md +++ b/website/docs/docs/build/incremental-microbatch.md @@ -288,7 +288,7 @@ To enable parallel execution when possible, users can check the model's concurre | `False` |Batches will run sequentially.| | `None` (not 
explicitly set)|dbt will evaluate the model's Jinja logic and evaluate if it contains a reference to `this`. If it references `this`, the batches will run sequentially. This is because Jinja represents states of table. Otherwise, the batches will run in parallel.| -### Sequential Execution Conditions +### Sequential execution Conditions If any of the following conditions are met, the batches will execute sequentially: From 24911766584b451166749d84d51fa5a4bd7cca01 Mon Sep 17 00:00:00 2001 From: nataliefiann <120089939+nataliefiann@users.noreply.github.com> Date: Wed, 4 Dec 2024 17:31:19 +0000 Subject: [PATCH 08/74] Update website/docs/docs/build/incremental-microbatch.md Co-authored-by: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> --- website/docs/docs/build/incremental-microbatch.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/build/incremental-microbatch.md b/website/docs/docs/build/incremental-microbatch.md index 88d86524f5d..3a1d432e207 100644 --- a/website/docs/docs/build/incremental-microbatch.md +++ b/website/docs/docs/build/incremental-microbatch.md @@ -280,7 +280,7 @@ It's not possible to run batches concurrently when the model definition contains ### Configure `concurrent_batches` -To enable parallel execution when possible, users can check the model's concurrent_batches configuration: +To enable parallel execution when possible, users can check the model's `concurrent_batches` configuration: | `concurrent_batches` | Description | |----------------------------|----------------------------| From e9a42d77bb89508c2b45d96fe346d252a69c2fce Mon Sep 17 00:00:00 2001 From: nataliefiann <120089939+nataliefiann@users.noreply.github.com> Date: Wed, 4 Dec 2024 17:34:19 +0000 Subject: [PATCH 09/74] Update website/docs/docs/build/incremental-microbatch.md --- website/docs/docs/build/incremental-microbatch.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/website/docs/docs/build/incremental-microbatch.md 
b/website/docs/docs/build/incremental-microbatch.md index 3a1d432e207..8f4233de756 100644 --- a/website/docs/docs/build/incremental-microbatch.md +++ b/website/docs/docs/build/incremental-microbatch.md @@ -275,7 +275,10 @@ select ... ``` +If any of the following conditions are met, the batches will execute sequentially: +1. The database adapter does not support concurrent batches. +2. The relation in the data warehouse for the model doesn't exist. It's not possible to run batches concurrently when the model definition contains [`{{ this }}`](/reference/dbt-jinja-functions/this). This is because Jinja represents the state of a table. ### Configure `concurrent_batches` From efbeae560aff01e0bb2a2e22470ef8b47f0ded83 Mon Sep 17 00:00:00 2001 From: nataliefiann <120089939+nataliefiann@users.noreply.github.com> Date: Wed, 4 Dec 2024 17:34:49 +0000 Subject: [PATCH 10/74] Update website/docs/docs/build/incremental-microbatch.md --- website/docs/docs/build/incremental-microbatch.md | 1 - 1 file changed, 1 deletion(-) diff --git a/website/docs/docs/build/incremental-microbatch.md b/website/docs/docs/build/incremental-microbatch.md index 8f4233de756..6bc8a5774c7 100644 --- a/website/docs/docs/build/incremental-microbatch.md +++ b/website/docs/docs/build/incremental-microbatch.md @@ -293,7 +293,6 @@ To enable parallel execution when possible, users can check the model's `concurr ### Sequential execution Conditions -If any of the following conditions are met, the batches will execute sequentially: 1. The database adapter does not support concurrent batches. 2. The relation in the data warehouse for the model doesn't exist. 
From ed3132eee11d77c0be956199a67a9284974452b7 Mon Sep 17 00:00:00 2001 From: nataliefiann <120089939+nataliefiann@users.noreply.github.com> Date: Wed, 4 Dec 2024 17:35:10 +0000 Subject: [PATCH 11/74] Update website/docs/docs/build/incremental-microbatch.md --- website/docs/docs/build/incremental-microbatch.md | 1 - 1 file changed, 1 deletion(-) diff --git a/website/docs/docs/build/incremental-microbatch.md b/website/docs/docs/build/incremental-microbatch.md index 6bc8a5774c7..bdf46f2efae 100644 --- a/website/docs/docs/build/incremental-microbatch.md +++ b/website/docs/docs/build/incremental-microbatch.md @@ -294,7 +294,6 @@ To enable parallel execution when possible, users can check the model's `concurr ### Sequential execution Conditions -1. The database adapter does not support concurrent batches. 2. The relation in the data warehouse for the model doesn't exist. ### Supported adapters From 3e207c02768e62a60515fc8297c350c531476d66 Mon Sep 17 00:00:00 2001 From: nataliefiann <120089939+nataliefiann@users.noreply.github.com> Date: Wed, 4 Dec 2024 17:35:27 +0000 Subject: [PATCH 12/74] Update website/docs/docs/build/incremental-microbatch.md --- website/docs/docs/build/incremental-microbatch.md | 1 - 1 file changed, 1 deletion(-) diff --git a/website/docs/docs/build/incremental-microbatch.md b/website/docs/docs/build/incremental-microbatch.md index bdf46f2efae..cbc40547799 100644 --- a/website/docs/docs/build/incremental-microbatch.md +++ b/website/docs/docs/build/incremental-microbatch.md @@ -294,7 +294,6 @@ To enable parallel execution when possible, users can check the model's `concurr ### Sequential execution Conditions -2. The relation in the data warehouse for the model doesn't exist. 
### Supported adapters From cbed4d6ee30b17e1a522f9aa4255e50ae6cce160 Mon Sep 17 00:00:00 2001 From: nataliefiann <120089939+nataliefiann@users.noreply.github.com> Date: Wed, 4 Dec 2024 17:35:49 +0000 Subject: [PATCH 13/74] Update website/docs/docs/build/incremental-microbatch.md --- website/docs/docs/build/incremental-microbatch.md | 1 - 1 file changed, 1 deletion(-) diff --git a/website/docs/docs/build/incremental-microbatch.md b/website/docs/docs/build/incremental-microbatch.md index cbc40547799..a73712df3d8 100644 --- a/website/docs/docs/build/incremental-microbatch.md +++ b/website/docs/docs/build/incremental-microbatch.md @@ -291,7 +291,6 @@ To enable parallel execution when possible, users can check the model's `concurr | `False` |Batches will run sequentially.| | `None` (not explicitly set)|dbt will evaluate the model's Jinja logic and evaluate if it contains a reference to `this`. If it references `this`, the batches will run sequentially. This is because Jinja represents states of table. Otherwise, the batches will run in parallel.| -### Sequential execution Conditions From 6d9e7c11ede21edd7462c808dc79cf7e49769fab Mon Sep 17 00:00:00 2001 From: nataliefiann <120089939+nataliefiann@users.noreply.github.com> Date: Wed, 4 Dec 2024 17:36:27 +0000 Subject: [PATCH 14/74] Update website/docs/docs/build/incremental-microbatch.md --- website/docs/docs/build/incremental-microbatch.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/website/docs/docs/build/incremental-microbatch.md b/website/docs/docs/build/incremental-microbatch.md index a73712df3d8..67d75604e1a 100644 --- a/website/docs/docs/build/incremental-microbatch.md +++ b/website/docs/docs/build/incremental-microbatch.md @@ -275,6 +275,8 @@ select ... ``` +### Sequential execution Conditions + If any of the following conditions are met, the batches will execute sequentially: 1. The database adapter does not support concurrent batches. 
From f349a164a3241806f4a18c364fb10a343a9a4ca3 Mon Sep 17 00:00:00 2001 From: nataliefiann <120089939+nataliefiann@users.noreply.github.com> Date: Wed, 4 Dec 2024 21:11:23 +0000 Subject: [PATCH 15/74] Update website/docs/docs/build/incremental-microbatch.md --- website/docs/docs/build/incremental-microbatch.md | 1 + 1 file changed, 1 insertion(+) diff --git a/website/docs/docs/build/incremental-microbatch.md b/website/docs/docs/build/incremental-microbatch.md index 67d75604e1a..832cf68e680 100644 --- a/website/docs/docs/build/incremental-microbatch.md +++ b/website/docs/docs/build/incremental-microbatch.md @@ -186,6 +186,7 @@ Several configurations are relevant to microbatch models, and some are required: | `batch_size` | String (required) | The granularity of your batches. Supported values are `hour`, `day`, `month`, and `year` | N/A | | `lookback` | Integer (optional) | Process X batches prior to the latest bookmark to capture late-arriving records. | `1` | |`concurrent_batches`| Boolean | Configures whether batches can run concurrently (at the same time) or sequentially (one after the other). When set to `True` and conditions are met, batches run in parallel. 
|`True` or `False`| +| `{{this}}` | Macro | When the model definition contains `{{ this }}`, batches can't be ran concurrently | Refer to [Concurrent Execution with the `concurrent_batches` configuration](/docs/build/incremental-microbatch#concurrent-execution-with-the-concurrent_batches-configuration) for more information | From a8e56df27a9e3ac0457a5da1fbd6f6a096a4fe0e Mon Sep 17 00:00:00 2001 From: nataliefiann <120089939+nataliefiann@users.noreply.github.com> Date: Wed, 4 Dec 2024 21:58:35 +0000 Subject: [PATCH 16/74] Update website/docs/docs/build/incremental-microbatch.md --- website/docs/docs/build/incremental-microbatch.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/build/incremental-microbatch.md b/website/docs/docs/build/incremental-microbatch.md index 832cf68e680..855039db7ee 100644 --- a/website/docs/docs/build/incremental-microbatch.md +++ b/website/docs/docs/build/incremental-microbatch.md @@ -266,7 +266,7 @@ or: {{ config( materialized='incremental', - incremental_strategy='concurrent_batches' + incremental_strategy='microbatch' ... 
) From 7c89a592e86ee0e02df7bcb4dfe782ac48367687 Mon Sep 17 00:00:00 2001 From: nataliefiann <120089939+nataliefiann@users.noreply.github.com> Date: Wed, 4 Dec 2024 22:00:23 +0000 Subject: [PATCH 17/74] Update website/docs/docs/build/incremental-microbatch.md --- website/docs/docs/build/incremental-microbatch.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/build/incremental-microbatch.md b/website/docs/docs/build/incremental-microbatch.md index 855039db7ee..07a62ec5ab0 100644 --- a/website/docs/docs/build/incremental-microbatch.md +++ b/website/docs/docs/build/incremental-microbatch.md @@ -253,7 +253,7 @@ To run batches in parallel, use the `concurrent_batches` configuration: ```yaml models: - +incremental_strategy: "concurrent_batches" + +incremental_strategy: "microbatch" ``` From c17a31a0d001482899b3f69157822e888ddba63b Mon Sep 17 00:00:00 2001 From: nataliefiann <120089939+nataliefiann@users.noreply.github.com> Date: Wed, 4 Dec 2024 22:03:10 +0000 Subject: [PATCH 18/74] Update website/docs/docs/build/incremental-microbatch.md --- website/docs/docs/build/incremental-microbatch.md | 1 + 1 file changed, 1 insertion(+) diff --git a/website/docs/docs/build/incremental-microbatch.md b/website/docs/docs/build/incremental-microbatch.md index 07a62ec5ab0..16d1aef4685 100644 --- a/website/docs/docs/build/incremental-microbatch.md +++ b/website/docs/docs/build/incremental-microbatch.md @@ -266,6 +266,7 @@ or: {{ config( materialized='incremental', + unique_key='concurrent_batches', incremental_strategy='microbatch' ... 
From 5e24a5af1e2004440e63addc60a8e1c778a80928 Mon Sep 17 00:00:00 2001 From: nataliefiann <120089939+nataliefiann@users.noreply.github.com> Date: Thu, 5 Dec 2024 10:03:13 +0000 Subject: [PATCH 19/74] Update website/docs/docs/build/incremental-microbatch.md --- website/docs/docs/build/incremental-microbatch.md | 1 - 1 file changed, 1 deletion(-) diff --git a/website/docs/docs/build/incremental-microbatch.md b/website/docs/docs/build/incremental-microbatch.md index 16d1aef4685..14c61174974 100644 --- a/website/docs/docs/build/incremental-microbatch.md +++ b/website/docs/docs/build/incremental-microbatch.md @@ -186,7 +186,6 @@ Several configurations are relevant to microbatch models, and some are required: | `batch_size` | String (required) | The granularity of your batches. Supported values are `hour`, `day`, `month`, and `year` | N/A | | `lookback` | Integer (optional) | Process X batches prior to the latest bookmark to capture late-arriving records. | `1` | |`concurrent_batches`| Boolean | Configures whether batches can run concurrently (at the same time) or sequentially (one after the other). When set to `True` and conditions are met, batches run in parallel. 
|`True` or `False`| -| `{{this}}` | Macro | When the model definition contains `{{ this }}`, batches can't be ran concurrently | Refer to [Concurrent Execution with the `concurrent_batches` configuration](/docs/build/incremental-microbatch#concurrent-execution-with-the-concurrent_batches-configuration) for more information | From 4da02b1fbf8d6ab08f16c9d637d269362976e859 Mon Sep 17 00:00:00 2001 From: nataliefiann <120089939+nataliefiann@users.noreply.github.com> Date: Thu, 5 Dec 2024 10:14:07 +0000 Subject: [PATCH 20/74] Update website/docs/docs/build/incremental-microbatch.md Co-authored-by: Quigley Malcolm --- website/docs/docs/build/incremental-microbatch.md | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/website/docs/docs/build/incremental-microbatch.md b/website/docs/docs/build/incremental-microbatch.md index e8c3f48428b..7b35d848cb9 100644 --- a/website/docs/docs/build/incremental-microbatch.md +++ b/website/docs/docs/build/incremental-microbatch.md @@ -245,7 +245,16 @@ While we may consider adding support for custom time zones in the future, we als The microbatch strategy offers the benefit of updating a model in smaller, more manageable batches. Parallel batch execution means that multiple batches are processed at the same time, instead of one after the other (sequentially). For example, if you have a microbatch model with 12 batches, you can execute those batches in parallel. Specifically they'll run in parallel limited by the number of [available threads](/docs/running-a-dbt-project/using-threads). +### How it's determined if a batch will run in parallel +A batch can only run in parallel if: +1. It's **not** the first batch +2. It's **not** the last batch +3. The [adapter supports](/docs/build/incremental-microbatch#supported-adapters) concurrent batches + +After [1], [2], and [3] we check if the [`this` jinja function](https://docs.getdbt.com/reference/dbt-jinja-functions/this) is invoked in the model. 
If `this` is used, then the batch will be run sequentially, as it may be that your batch depends on the existence of prior batches. If `this` isn't used, the batch will be run in parallel. + +You can override the check for `this` by setting `concurrent_batches` to either `True` or `False`. If set to `False`, the batch will be run sequentially. If set to `True` the batch will be run in parallel (assuming [1], [2], and [3]) To run batches in parallel, use the `concurrent_batches` configuration: From bc2adafd8c3ec660fb02b7152071d747fac9afc1 Mon Sep 17 00:00:00 2001 From: nataliefiann <120089939+nataliefiann@users.noreply.github.com> Date: Thu, 5 Dec 2024 10:14:23 +0000 Subject: [PATCH 21/74] Update website/docs/docs/build/incremental-microbatch.md Co-authored-by: Quigley Malcolm --- website/docs/docs/build/incremental-microbatch.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/build/incremental-microbatch.md b/website/docs/docs/build/incremental-microbatch.md index 7b35d848cb9..2dc846b8154 100644 --- a/website/docs/docs/build/incremental-microbatch.md +++ b/website/docs/docs/build/incremental-microbatch.md @@ -255,7 +255,7 @@ A batch can only run in parallel if: After [1], [2], and [3] we check if the [`this` jinja function](https://docs.getdbt.com/reference/dbt-jinja-functions/this) is invoked in the model. If `this` is used, then the batch will be run sequentially, as it may be that your batch depends on the existence of prior batches. If `this` isn't used, the batch will be run in parallel. You can override the check for `this` by setting `concurrent_batches` to either `True` or `False`. If set to `False`, the batch will be run sequentially. 
If set to `True` the batch will be run in parallel (assuming [1], [2], and [3]) -To run batches in parallel, use the `concurrent_batches` configuration: +To override the `this` check, use the `concurrent_batches` configuration: From b30107ccaf5fcdb8d308f67f2d35079770363bbb Mon Sep 17 00:00:00 2001 From: nataliefiann <120089939+nataliefiann@users.noreply.github.com> Date: Thu, 5 Dec 2024 10:14:37 +0000 Subject: [PATCH 22/74] Update website/docs/docs/build/incremental-microbatch.md Co-authored-by: Quigley Malcolm --- website/docs/docs/build/incremental-microbatch.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/build/incremental-microbatch.md b/website/docs/docs/build/incremental-microbatch.md index 2dc846b8154..3f2d73d6e8f 100644 --- a/website/docs/docs/build/incremental-microbatch.md +++ b/website/docs/docs/build/incremental-microbatch.md @@ -261,7 +261,7 @@ To override the `this` check, use the `concurrent_batches` configuration: ```yaml models: - +incremental_strategy: "microbatch" + +concurrent_batches: True ``` From 146052617be25fc9a166ec15ca36497ecb23d280 Mon Sep 17 00:00:00 2001 From: nataliefiann <120089939+nataliefiann@users.noreply.github.com> Date: Thu, 5 Dec 2024 10:14:46 +0000 Subject: [PATCH 23/74] Update website/docs/docs/build/incremental-microbatch.md Co-authored-by: Quigley Malcolm --- website/docs/docs/build/incremental-microbatch.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/build/incremental-microbatch.md b/website/docs/docs/build/incremental-microbatch.md index 3f2d73d6e8f..18b3f999cf8 100644 --- a/website/docs/docs/build/incremental-microbatch.md +++ b/website/docs/docs/build/incremental-microbatch.md @@ -274,7 +274,7 @@ or: {{ config( materialized='incremental', - unique_key='concurrent_batches', + concurrent_batches=True, incremental_strategy='microbatch' ... 
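Reviewer note on the rules added in PATCH 20/74 ("How it's determined if a batch will run in parallel") and the `concurrent_batches` override set in PATCH 22/74 and 23/74: the decision can be sketched roughly as below. This is an illustrative sketch only, not dbt's actual implementation. `BatchContext`, its fields, and `should_run_in_parallel` are hypothetical names introduced for this note; only the rules themselves come from the patch text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BatchContext:
    """Hypothetical container for the facts the documented rules depend on."""
    index: int                          # zero-based position of the batch in the run
    total: int                          # total number of batches in the run
    adapter_supports_concurrency: bool  # rule [3] in the patch text
    model_uses_this: bool               # whether the model's Jinja references `this`
    concurrent_batches: Optional[bool] = None  # None means "not explicitly set"


def should_run_in_parallel(ctx: BatchContext) -> bool:
    """Apply the rules in the order the patch describes them."""
    is_first = ctx.index == 0
    is_last = ctx.index == ctx.total - 1

    # Rules [1], [2], [3]: first batch, last batch, or an adapter without
    # support always means sequential execution.
    if is_first or is_last or not ctx.adapter_supports_concurrency:
        return False

    # An explicit True/False overrides the `this` check.
    if ctx.concurrent_batches is not None:
        return ctx.concurrent_batches

    # Default (None): run sequentially only if the model references `this`.
    return not ctx.model_uses_this
```

For a 12-batch model on a supporting adapter, this sketch runs the middle ten batches in parallel (up to the thread limit) when `this` is not referenced, and keeps the first and last batches sequential regardless of the `concurrent_batches` setting.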
From ed0fa0404037ca9227f6c748cd64540d8f1f0d85 Mon Sep 17 00:00:00 2001 From: nataliefiann <120089939+nataliefiann@users.noreply.github.com> Date: Thu, 5 Dec 2024 10:15:23 +0000 Subject: [PATCH 24/74] Update website/docs/docs/build/incremental-microbatch.md Co-authored-by: Quigley Malcolm --- website/docs/docs/build/incremental-microbatch.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/build/incremental-microbatch.md b/website/docs/docs/build/incremental-microbatch.md index 18b3f999cf8..046030d937f 100644 --- a/website/docs/docs/build/incremental-microbatch.md +++ b/website/docs/docs/build/incremental-microbatch.md @@ -301,7 +301,7 @@ To enable parallel execution when possible, users can check the model's `concurr |----------------------------|----------------------------| | `True` |Batches will run in parallel.| | `False` |Batches will run sequentially.| -| `None` (not explicitly set)|dbt will evaluate the model's Jinja logic and evaluate if it contains a reference to `this`. If it references `this`, the batches will run sequentially. This is because Jinja represents states of table. Otherwise, the batches will run in parallel.| +| `None` (default)|dbt will evaluate the model's Jinja logic and evaluate if it contains a reference to `this`. If it references `this`, the batches will run sequentially. This is because Jinja represents states of table. 
Otherwise, the batches will run in parallel.| From 6a7af2c989c0c650d4319be7c893241956c12f15 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Thu, 5 Dec 2024 12:42:10 +0000 Subject: [PATCH 25/74] Update incremental-microbatch.md committing some changes to the parallel --- .../docs/docs/build/incremental-microbatch.md | 92 ++++++++++++++----- 1 file changed, 68 insertions(+), 24 deletions(-) diff --git a/website/docs/docs/build/incremental-microbatch.md b/website/docs/docs/build/incremental-microbatch.md index 72a9b144c4a..8a0e7445125 100644 --- a/website/docs/docs/build/incremental-microbatch.md +++ b/website/docs/docs/build/incremental-microbatch.md @@ -186,7 +186,7 @@ Several configurations are relevant to microbatch models, and some are required: | `begin` | The "beginning of time" for the microbatch model. This is the starting point for any initial or full-refresh builds. For example, a daily-grain microbatch model run on `2024-10-01` with `begin = '2023-10-01` will process 366 batches (it's a leap year!) plus the batch for "today." | N/A | Date | Required | | `batch_size` | The granularity of your batches. Supported values are `hour`, `day`, `month`, and `year` | N/A | String | Required | | `lookback` | Process X batches prior to the latest bookmark to capture late-arriving records. | `1` | Integer | Optional | -|`concurrent_batches`| Configures whether batches can run concurrently (at the same time) or sequentially (one after the other). Can set to `True` or `False`. When set to `True` and conditions are met, batches run in parallel. | N/A | Boolean | Optional | +|`concurrent_batches`| Configures whether batches can run concurrently (at the same time) or sequentially (one after the other). Can set to `True` or `False`. When set to `True` and conditions are met, batches run in parallel. 
| `None` | Boolean | Optional | @@ -286,19 +286,48 @@ While we may consider adding support for custom time zones in the future, we als The microbatch strategy offers the benefit of updating a model in smaller, more manageable batches. -Parallel batch execution means that multiple batches are processed at the same time, instead of one after the other (sequentially). For example, if you have a microbatch model with 12 batches, you can execute those batches in parallel. Specifically they'll run in parallel limited by the number of [available threads](/docs/running-a-dbt-project/using-threads). -### How it's determined if a batch will run in parallel +Parallel batch execution means that multiple batches are processed at the same time, instead of one after the other (sequentially) for faster processing of your microbatch models. + +For example, if you have a microbatch model with 12 batches, you can execute those batches to run in parallel. Specifically they'll run in parallel limited by the number of [available threads](/docs/running-a-dbt-project/using-threads). + +### Prerequisites + +To enable parallel execution, you must meet the following conditions: + +- You use the following supported adapters: + - Snowflake + - Databricks + - More adapters coming soon! +- The relation in the data warehouse for the model doesn't exist +- You meet [additional conditions](#how-parallel-batch-execution-works) mentioned in the next section + +### How parallel batch execution works A batch can only run in parallel if: -1. It's **not** the first batch -2. It's **not** the last batch -3. The [adapter supports](/docs/build/incremental-microbatch#supported-adapters) concurrent batches -After [1], [2], and [3] we check if the [`this` jinja function](https://docs.getdbt.com/reference/dbt-jinja-functions/this) is invoked in the model. If `this` is used, then the batch will be run sequentially, as it may be that your batch depends on the existence of prior batches. 
If `this` isn't used, the batch will be run in parallel. +| Step | Condition | Parallel execution | Sequentially | +| ---- | ---------------| :------------------: | :----------: | +| 1. | **Not** the first batch | ✅ | - | +| 2. | **Not** the last batch | ✅ | - | +| 3. | [Adapter supports](#prerequisites) parallel batches | ✅ | - | +| 4. | `concurrent_batches` set to `True` | ✅ | - | +| 5. | `concurrent_batches` set to `False` | - | ✅ | + +:::info +After checking for 1, 2, and 3 — and if `concurrent_batches` value isn't set, dbt will intelligently auto-detect if the model invokes the [`{{ this }}`](/reference/dbt-jinja-functions/this) Jinja function. If it references `{{ this }}`, the batches will run sequentially since `{{ this }}` represents the database of the current model and referencing the same relation causes conflict. Otherwise, if the `concurrent_batches` value isn't set _and_ `{{ this }}` isn't detected (and other conditions are met), the batches will run in parallel. +::: + +EXPANDABLE IDEA OR H3/H4 + +ADD GUIDE/INFORMATION ON WHEN TO USE PARALLEL AND WHEN TO USE SEQUENTIAL FOR USERS. -You can override the check for `this` by setting `concurrent_batches` to either `True` or `False`. If set to `False`, the batch will be run sequentially. If set to `True` the batch will be run in parallel (assuming [1], [2], and [3]) +- parallel is more performant / faster, but means your logic needs to be independent to the order the batches are executed +- sequential is slower, but means you can calculate things like cumulative metrics in your microbatch models + + -If any of the following conditions are met, the batches will execute sequentially: +### Configure `concurrent_batches` -1. The database adapter does not support concurrent batches. -2. The relation in the data warehouse for the model doesn't exist. -It's not possible to run batches concurrently when the model definition contains [`{{ this }}`](/reference/dbt-jinja-functions/this). 
This is because Jinja represents the state of a table. +If you meet all the [conditions](#prerequisites), set the `concurrent_batches` config in your `dbt_project.yml` or incremental microbatch model `.sql` file to run batches in parallel: -### Configure `concurrent_batches` + + -| `concurrent_batches` | Description | -|----------------------------|----------------------------| -| `True` |Batches will run in parallel.| -| `False` |Batches will run sequentially.| -| `None` (default)|dbt will evaluate the model's Jinja logic and evaluate if it contains a reference to `this`. If it references `this`, the batches will run sequentially. This is because Jinja represents states of table. Otherwise, the batches will run in parallel.| +```yaml +models: + +concurrent_batches: True # value set to True to run batches in parallel +``` + + + -### Supported adapters +```sql +{{ + config( + materialized='incremental', + incremental_strategy='microbatch', + event_time='session_start', + begin='2020-01-01', + batch_size='day', + concurrent_batches=True, # value set to True to run batches in parallel + ... + ) +}} -The following adapters support parallel batch execution: +select ... +``` + + -- dbt-snowflake -- dbt-databricks +Depending on your use case, configuring your microbatch models to run in parallel offer faster processing, in comparison to having models run sequentially, which is slower. ## How `microbatch` compares to other incremental strategies? 
From 3c9539ba6dd99bbea76d6877056742790406f37f Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Thu, 5 Dec 2024 12:58:21 +0000 Subject: [PATCH 26/74] Update incremental-microbatch.md --- website/docs/docs/build/incremental-microbatch.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/website/docs/docs/build/incremental-microbatch.md b/website/docs/docs/build/incremental-microbatch.md index 8a0e7445125..1aea8f86e7a 100644 --- a/website/docs/docs/build/incremental-microbatch.md +++ b/website/docs/docs/build/incremental-microbatch.md @@ -363,7 +363,7 @@ select ... If you meet all the [conditions](#prerequisites), set the `concurrent_batches` config in your `dbt_project.yml` or incremental microbatch model `.sql` file to run batches in parallel: - @@ -375,7 +375,7 @@ models: - From 9a4b3c0dda2803e979db87aca802ba968fbe2cfe Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Thu, 5 Dec 2024 12:59:23 +0000 Subject: [PATCH 27/74] Update incremental-microbatch.md --- website/docs/docs/build/incremental-microbatch.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/website/docs/docs/build/incremental-microbatch.md b/website/docs/docs/build/incremental-microbatch.md index 1aea8f86e7a..bb6eb21dcbb 100644 --- a/website/docs/docs/build/incremental-microbatch.md +++ b/website/docs/docs/build/incremental-microbatch.md @@ -363,7 +363,7 @@ select ... 
If you meet all the [conditions](#prerequisites), set the `concurrent_batches` config in your `dbt_project.yml` or incremental microbatch model `.sql` file to run batches in parallel: - @@ -375,7 +375,7 @@ models: - From 4cfa1519244557a34e19513e1996b9a8d453541f Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Thu, 5 Dec 2024 13:03:49 +0000 Subject: [PATCH 28/74] Update incremental-microbatch.md --- website/docs/docs/build/incremental-microbatch.md | 1 + 1 file changed, 1 insertion(+) diff --git a/website/docs/docs/build/incremental-microbatch.md b/website/docs/docs/build/incremental-microbatch.md index bb6eb21dcbb..6e71d0094b1 100644 --- a/website/docs/docs/build/incremental-microbatch.md +++ b/website/docs/docs/build/incremental-microbatch.md @@ -394,6 +394,7 @@ models: select ... ``` + From f98903791dbd159b259c5975d0f5334c5332943a Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Thu, 5 Dec 2024 13:11:03 +0000 Subject: [PATCH 29/74] Update incremental-microbatch.md --- website/docs/docs/build/incremental-microbatch.md | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/website/docs/docs/build/incremental-microbatch.md b/website/docs/docs/build/incremental-microbatch.md index 6e71d0094b1..87db68f0470 100644 --- a/website/docs/docs/build/incremental-microbatch.md +++ b/website/docs/docs/build/incremental-microbatch.md @@ -305,7 +305,7 @@ To enable parallel execution, you must meet the following conditions: A batch can only run in parallel if: -| Step | Condition | Parallel execution | Sequentially | +| Step | Condition | Parallel execution | Sequential execution| | ---- | ---------------| :------------------: | :----------: | | 1. | **Not** the first batch | ✅ | - | | 2. | **Not** the last batch | ✅ | - | @@ -313,9 +313,11 @@ A batch can only run in parallel if: | 4. | `concurrent_batches` set to `True` | ✅ | - | | 5. 
| `concurrent_batches` set to `False` | - | ✅ | -:::info -After checking for 1, 2, and 3 — and if `concurrent_batches` value isn't set, dbt will intelligently auto-detect if the model invokes the [`{{ this }}`](/reference/dbt-jinja-functions/this) Jinja function. If it references `{{ this }}`, the batches will run sequentially since `{{ this }}` represents the database of the current model and referencing the same relation causes conflict. Otherwise, if the `concurrent_batches` value isn't set _and_ `{{ this }}` isn't detected (and other conditions are met), the batches will run in parallel. -::: + +- After checking for 1, 2, and 3 in the previous table — and if `concurrent_batches` value isn't set, dbt will intelligently auto-detect if the model invokes the [`{{ this }}`](/reference/dbt-jinja-functions/this) Jinja function. If it references `{{ this }}`, the batches will run sequentially since `{{ this }}` represents the database of the current model and referencing the same relation causes conflict. + +- Otherwise, if the `concurrent_batches` value isn't set _and_ `{{ this }}` isn't detected (and other conditions are met), the batches will run in parallel. + EXPANDABLE IDEA OR H3/H4 @@ -398,7 +400,7 @@ select ... -Depending on your use case, configuring your microbatch models to run in parallel offer faster processing, in comparison to having models run sequentially, which is slower. +Depending on your use case, configuring your microbatch models to run in parallel offer faster processing, in comparison to running batches sequentially. ## How `microbatch` compares to other incremental strategies? 
From 28e11af1cf6c0ca857152e69f048118cee8c8535 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Thu, 5 Dec 2024 13:22:02 +0000 Subject: [PATCH 30/74] Update website/docs/docs/build/incremental-microbatch.md --- website/docs/docs/build/incremental-microbatch.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/build/incremental-microbatch.md b/website/docs/docs/build/incremental-microbatch.md index 87db68f0470..f14bf9419a9 100644 --- a/website/docs/docs/build/incremental-microbatch.md +++ b/website/docs/docs/build/incremental-microbatch.md @@ -314,7 +314,7 @@ A batch can only run in parallel if: | 5. | `concurrent_batches` set to `False` | - | ✅ | -- After checking for 1, 2, and 3 in the previous table — and if `concurrent_batches` value isn't set, dbt will intelligently auto-detect if the model invokes the [`{{ this }}`](/reference/dbt-jinja-functions/this) Jinja function. If it references `{{ this }}`, the batches will run sequentially since `{{ this }}` represents the database of the current model and referencing the same relation causes conflict. +After checking for 1, 2, and 3 in the previous table — and if `concurrent_batches` value isn't set, dbt will intelligently auto-detect if the model invokes the [`{{ this }}`](/reference/dbt-jinja-functions/this) Jinja function. If it references `{{ this }}`, the batches will run sequentially since `{{ this }}` represents the database of the current model and referencing the same relation causes conflict. - Otherwise, if the `concurrent_batches` value isn't set _and_ `{{ this }}` isn't detected (and other conditions are met), the batches will run in parallel. 
From e4c9bf29b476501d1485a443ed0deadf5378f54f Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Thu, 5 Dec 2024 13:22:14 +0000 Subject: [PATCH 31/74] Update website/docs/docs/build/incremental-microbatch.md --- website/docs/docs/build/incremental-microbatch.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/build/incremental-microbatch.md b/website/docs/docs/build/incremental-microbatch.md index f14bf9419a9..87b17c0e8d0 100644 --- a/website/docs/docs/build/incremental-microbatch.md +++ b/website/docs/docs/build/incremental-microbatch.md @@ -316,7 +316,7 @@ A batch can only run in parallel if: After checking for 1, 2, and 3 in the previous table — and if `concurrent_batches` value isn't set, dbt will intelligently auto-detect if the model invokes the [`{{ this }}`](/reference/dbt-jinja-functions/this) Jinja function. If it references `{{ this }}`, the batches will run sequentially since `{{ this }}` represents the database of the current model and referencing the same relation causes conflict. -- Otherwise, if the `concurrent_batches` value isn't set _and_ `{{ this }}` isn't detected (and other conditions are met), the batches will run in parallel. +Otherwise, if the `concurrent_batches` value isn't set _and_ `{{ this }}` isn't detected (and other conditions are met), the batches will run in parallel. 
EXPANDABLE IDEA OR H3/H4 From be2d33230c1450760ccb5fe3c8f79712ba063e71 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Thu, 5 Dec 2024 13:22:21 +0000 Subject: [PATCH 32/74] Update website/docs/docs/build/incremental-microbatch.md --- website/docs/docs/build/incremental-microbatch.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/build/incremental-microbatch.md b/website/docs/docs/build/incremental-microbatch.md index 87b17c0e8d0..9eff9e9ff4d 100644 --- a/website/docs/docs/build/incremental-microbatch.md +++ b/website/docs/docs/build/incremental-microbatch.md @@ -286,7 +286,7 @@ While we may consider adding support for custom time zones in the future, we als The microbatch strategy offers the benefit of updating a model in smaller, more manageable batches. -Parallel batch execution means that multiple batches are processed at the same time, instead of one after the other (sequentially) for faster processing of your microbatch models. +Parallel batch execution means that multiple batches are processed at the same time using the `concurrent_batches` config, instead of one after the other (sequentially) for faster processing of your microbatch models. For example, if you have a microbatch model with 12 batches, you can execute those batches to run in parallel. Specifically they'll run in parallel limited by the number of [available threads](/docs/running-a-dbt-project/using-threads). 
From 7384dd9e02a3d8a8656b6e6bdc5e71c4db6c7586 Mon Sep 17 00:00:00 2001 From: nataliefiann <120089939+nataliefiann@users.noreply.github.com> Date: Thu, 5 Dec 2024 14:01:13 +0000 Subject: [PATCH 33/74] Update website/docs/docs/build/incremental-microbatch.md --- website/docs/docs/build/incremental-microbatch.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/build/incremental-microbatch.md b/website/docs/docs/build/incremental-microbatch.md index 9eff9e9ff4d..24fdc38b848 100644 --- a/website/docs/docs/build/incremental-microbatch.md +++ b/website/docs/docs/build/incremental-microbatch.md @@ -321,7 +321,7 @@ Otherwise, if the `concurrent_batches` value isn't set _and_ `{{ this }}` isn't EXPANDABLE IDEA OR H3/H4 -ADD GUIDE/INFORMATION ON WHEN TO USE PARALLEL AND WHEN TO USE SEQUENTIAL FOR USERS. +Parallel batch execution or sequential? - parallel is more performant / faster, but means your logic needs to be independent to the order the batches are executed - sequential is slower, but means you can calculate things like cumulative metrics in your microbatch models From 979f689f1d678f380a3228143b7e6799a0ff6de4 Mon Sep 17 00:00:00 2001 From: nataliefiann <120089939+nataliefiann@users.noreply.github.com> Date: Thu, 5 Dec 2024 14:27:30 +0000 Subject: [PATCH 34/74] Update website/docs/docs/build/incremental-microbatch.md --- website/docs/docs/build/incremental-microbatch.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/build/incremental-microbatch.md b/website/docs/docs/build/incremental-microbatch.md index 24fdc38b848..8180c97216c 100644 --- a/website/docs/docs/build/incremental-microbatch.md +++ b/website/docs/docs/build/incremental-microbatch.md @@ -321,7 +321,7 @@ Otherwise, if the `concurrent_batches` value isn't set _and_ `{{ this }}` isn't EXPANDABLE IDEA OR H3/H4 -Parallel batch execution or sequential? +### Parallel batch execution or sequential? 
- parallel is more performant / faster, but means your logic needs to be independent to the order the batches are executed - sequential is slower, but means you can calculate things like cumulative metrics in your microbatch models From 690286f40554fbfaab72f0a4ba9355ad6a7121fc Mon Sep 17 00:00:00 2001 From: nataliefiann <120089939+nataliefiann@users.noreply.github.com> Date: Thu, 5 Dec 2024 14:31:07 +0000 Subject: [PATCH 35/74] Update website/docs/docs/build/incremental-microbatch.md --- website/docs/docs/build/incremental-microbatch.md | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/website/docs/docs/build/incremental-microbatch.md b/website/docs/docs/build/incremental-microbatch.md index 8180c97216c..7e69a64e0a6 100644 --- a/website/docs/docs/build/incremental-microbatch.md +++ b/website/docs/docs/build/incremental-microbatch.md @@ -319,7 +319,13 @@ After checking for 1, 2, and 3 in the previous table — and if `concurrent_ Otherwise, if the `concurrent_batches` value isn't set _and_ `{{ this }}` isn't detected (and other conditions are met), the batches will run in parallel. -EXPANDABLE IDEA OR H3/H4 + +Parallel batch execution is faster but requires logic that is independent of batch execution order. For example, if you are developing a data processing pipeline for a system that processes user transactions in batches, each batch is executed in parallel for better performance. However, the logic used to process each transaction must not depend on the order of how batches are executed or completed. + + + +Sequential processing isn't as performant and is slower however, it enables calculations such as cumulative metrics in microbatch models. Since cumulative metrics require data to be processed in the correct order to ensure each step builds on the previous one, sequential processing is ideal. + ### Parallel batch execution or sequential? 
From dc700abedd05d7669a2c284882004b531da5b20d Mon Sep 17 00:00:00 2001 From: nataliefiann <120089939+nataliefiann@users.noreply.github.com> Date: Thu, 5 Dec 2024 14:31:15 +0000 Subject: [PATCH 36/74] Update website/docs/docs/build/incremental-microbatch.md --- website/docs/docs/build/incremental-microbatch.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/build/incremental-microbatch.md b/website/docs/docs/build/incremental-microbatch.md index 7e69a64e0a6..6c1d5b2909f 100644 --- a/website/docs/docs/build/incremental-microbatch.md +++ b/website/docs/docs/build/incremental-microbatch.md @@ -329,7 +329,7 @@ Sequential processing isn't as performant and is slower however, it enables calc ### Parallel batch execution or sequential? -- parallel is more performant / faster, but means your logic needs to be independent to the order the batches are executed +- Parallel batch execution is faster but requires logic that is independent of batch execution order. For example, if you are developing a data processing pipeline for a system that processes user transactions in batches, each batch is executed in parallel for better performance. However, the logic used to process each transaction must not depend on the order of how batches are executed or completed. - sequential is slower, but means you can calculate things like cumulative metrics in your microbatch models --- 🚀 Deployment available! Here are the direct links to the updated files: - https://docs-getdbt-com-git-new-branch-name-1-dbt-labs.vercel.app/docs/build/incremental-microbatch - https://docs-getdbt-com-git-new-branch-name-1-dbt-labs.vercel.app/docs/dbt-versions/core-upgrade/06-upgrading-to-v1.9 --------- Co-authored-by: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Co-authored-by: Leona B. 
Campbell <3880403+runleonarun@users.noreply.github.com> --- website/docs/docs/build/incremental-microbatch.md | 3 ++- .../docs/dbt-versions/core-upgrade/06-upgrading-to-v1.9.md | 2 ++ 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/website/docs/docs/build/incremental-microbatch.md b/website/docs/docs/build/incremental-microbatch.md index 2cb32dd3527..67b297df2f1 100644 --- a/website/docs/docs/build/incremental-microbatch.md +++ b/website/docs/docs/build/incremental-microbatch.md @@ -25,7 +25,8 @@ Incremental models in dbt are a [materialization](/docs/build/materializations) Microbatch is an incremental strategy designed for large time-series datasets: - It relies solely on a time column ([`event_time`](/reference/resource-configs/event-time)) to define time-based ranges for filtering. Set the `event_time` column for your microbatch model and its direct parents (upstream models). Note, this is different to `partition_by`, which groups rows into partitions. - It complements, rather than replaces, existing incremental strategies by focusing on efficiency and simplicity in batch processing. -- Unlike traditional incremental strategies, microbatch doesn't require implementing complex conditional logic for [backfilling](#backfills). +- Unlike traditional incremental strategies, microbatch enables you to [reprocess failed batches](/docs/build/incremental-microbatch#retry), auto-detect [parallel batch execution](#parallel-batch-execution), and eliminate the need to implement complex conditional logic for [backfilling](#backfills). + - Note, microbatch might not be the best strategy for all use cases. Consider other strategies for use cases such as not having a reliable `event_time` column or if you want more control over the incremental logic. Read more in [How `microbatch` compares to other incremental strategies](#how-microbatch-compares-to-other-incremental-strategies). 
### How microbatch works diff --git a/website/docs/docs/dbt-versions/core-upgrade/06-upgrading-to-v1.9.md b/website/docs/docs/dbt-versions/core-upgrade/06-upgrading-to-v1.9.md index 6ade3d5013f..a7d8be0e8a1 100644 --- a/website/docs/docs/dbt-versions/core-upgrade/06-upgrading-to-v1.9.md +++ b/website/docs/docs/dbt-versions/core-upgrade/06-upgrading-to-v1.9.md @@ -49,6 +49,8 @@ Starting in Core 1.9, you can use the new [microbatch strategy](/docs/build/incr - Simplified query design: Write your model query for a single batch of data. dbt will use your `event_time`, `lookback`, and `batch_size` configurations to automatically generate the necessary filters for you, making the process more streamlined and reducing the need for you to manage these details. - Independent batch processing: dbt automatically breaks down the data to load into smaller batches based on the specified `batch_size` and processes each batch independently, improving efficiency and reducing the risk of query timeouts. If some of your batches fail, you can use `dbt retry` to load only the failed batches. - Targeted reprocessing: To load a *specific* batch or batches, you can use the CLI arguments `--event-time-start` and `--event-time-end`. +- [Automatic parallel batch execution](/docs/build/incremental-microbatch#parallel-batch-execution): Process multiple batches at the same time, instead of one after the other (sequentially) for faster processing of your microbatch models. dbt intelligently auto-detects if your batches can run in parallel, while also allowing you to manually override parallel execution with the `concurrent_batches` config. 
+ Currently microbatch is supported on these adapters with more to come: * postgres From 261f781d30a399470257dbcb06bcc5f961f2dceb Mon Sep 17 00:00:00 2001 From: nataliefiann <120089939+nataliefiann@users.noreply.github.com> Date: Fri, 6 Dec 2024 23:07:23 +0000 Subject: [PATCH 69/74] Created concurrent batches page (#6601) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit ## What are you changing in this pull request and why? I've created a new page to describe how the concurrent batches config works ## Checklist - [ ] I have reviewed the [Content style guide](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/content-style-guide.md) so my content adheres to these guidelines. - [ ] The topic I'm writing about is for specific dbt version(s) and I have versioned it according to the [version a whole page](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#adding-a-new-version) and/or [version a block of content](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#versioning-blocks-of-content) guidelines. - [ ] I have added checklist item(s) to this list for anything anything that needs to happen before this PR is merged, such as "needs technical review" or "change base branch." - [ ] The content in this PR requires a dbt release note, so I added one to the [release notes page](https://docs.getdbt.com/docs/dbt-versions/dbt-cloud-release-notes). --- 🚀 Deployment available! 
Here are the direct links to the updated files: - https://docs-getdbt-com-git-new-branch-name-dbt-labs.vercel.app/docs/build/incremental-microbatch - https://docs-getdbt-com-git-new-branch-name-dbt-labs.vercel.app/docs/core/connect-data-platform/mssql-setup - https://docs-getdbt-com-git-new-branch-name-dbt-labs.vercel.app/docs/dbt-versions/core-upgrade/06-upgrading-to-v1.9 - https://docs-getdbt-com-git-new-branch-name-dbt-labs.vercel.app/reference/resource-properties/concurrent_batches --------- Co-authored-by: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Co-authored-by: Leona B. Campbell <3880403+runleonarun@users.noreply.github.com> --- .../resource-properties/concurrent_batches.md | 90 +++++++++++++++++++ website/sidebars.js | 1 + 2 files changed, 91 insertions(+) create mode 100644 website/docs/reference/resource-properties/concurrent_batches.md diff --git a/website/docs/reference/resource-properties/concurrent_batches.md b/website/docs/reference/resource-properties/concurrent_batches.md new file mode 100644 index 00000000000..4d6b2ea0af4 --- /dev/null +++ b/website/docs/reference/resource-properties/concurrent_batches.md @@ -0,0 +1,90 @@ +--- +title: "concurrent_batches" +resource_types: [models] +datatype: boolean +description: "Learn about concurrent_batches in dbt." +--- + +:::note + +Available in dbt Core v1.9+ or the [dbt Cloud "Latest" release tracks](/docs/dbt-versions/cloud-release-tracks). + +::: + + + + + + + +```yaml +models: + +concurrent_batches: true +``` + + + + + + + + + + +```sql +{{ + config( + materialized='incremental', + concurrent_batches=true, + incremental_strategy='microbatch' + ... + ) +}} +select ... +``` + + + + + + +## Definition + +`concurrent_batches` is an override that lets you decide whether batches run in parallel or sequentially (one at a time). + +For more information, refer to [how batch execution works](/docs/build/incremental-microbatch#how-parallel-batch-execution-works). 
+## Example + +By default, dbt auto-detects whether batches can run in parallel for microbatch models. However, you can override dbt's detection by setting the `concurrent_batches` config to `false` in your `dbt_project.yml` or model `.sql` file to force sequential execution, given you meet these conditions: +* You've configured a microbatch incremental strategy. +* You're working with cumulative metrics or any logic that depends on batch order. + +Set the `concurrent_batches` config to `false` to ensure batches are processed sequentially. For example: + + + +```yaml +models: + my_project: + cumulative_metrics_model: + +concurrent_batches: false +``` + + + + + +```sql +{{ + config( + materialized='incremental', + incremental_strategy='microbatch', + concurrent_batches=false + ) +}} +select ... + +``` + + + diff --git a/website/sidebars.js b/website/sidebars.js index 5d6e0582765..08494e4c713 100644 --- a/website/sidebars.js +++ b/website/sidebars.js @@ -956,6 +956,7 @@ const sidebarSettings = { "reference/resource-configs/materialized", "reference/resource-configs/on_configuration_change", "reference/resource-configs/sql_header", + "reference/resource-properties/concurrent_batches", ], }, { From 6f84d2b303478acdf1c267bc7db58e46940853f0 Mon Sep 17 00:00:00 2001 From: "Leona B. 
Campbell" <3880403+runleonarun@users.noreply.github.com> Date: Fri, 6 Dec 2024 15:25:59 -0800 Subject: [PATCH 70/74] Apply suggestions from code review --- website/docs/docs/build/incremental-microbatch.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/website/docs/docs/build/incremental-microbatch.md b/website/docs/docs/build/incremental-microbatch.md index 67b297df2f1..749e4a38038 100644 --- a/website/docs/docs/build/incremental-microbatch.md +++ b/website/docs/docs/build/incremental-microbatch.md @@ -296,9 +296,9 @@ For example, if you have a microbatch model with 12 batches, you can execute tho ### Prerequisites -To enable parallel execution, you must meet the following conditions: +To enable parallel execution, you must: -- You use the following supported adapters: +- Use a supported adapter: - Snowflake - Databricks - More adapters coming soon! @@ -306,7 +306,7 @@ To enable parallel execution, you must meet the following conditions: -- You meet [additional conditions](#how-parallel-batch-execution-works) mentioned in the next section +- Meet [additional conditions](#how-parallel-batch-execution-works) described in the next section. ### How parallel batch execution works From ef3ccc3034835be98e42366d8bf52cd757465a89 Mon Sep 17 00:00:00 2001 From: "Leona B. Campbell" <3880403+runleonarun@users.noreply.github.com> Date: Fri, 6 Dec 2024 15:33:46 -0800 Subject: [PATCH 71/74] Update incremental-microbatch.md --- .../docs/docs/build/incremental-microbatch.md | 31 +++++++++---------- 1 file changed, 15 insertions(+), 16 deletions(-) diff --git a/website/docs/docs/build/incremental-microbatch.md b/website/docs/docs/build/incremental-microbatch.md index 749e4a38038..cc17ca26fd5 100644 --- a/website/docs/docs/build/incremental-microbatch.md +++ b/website/docs/docs/build/incremental-microbatch.md @@ -304,31 +304,28 @@ To enable parallel execution, you must: - More adapters coming soon! 
     - We'll be continuing to test and add concurrency support for adapters. This means that some adapters might get concurrency support _after_ the 1.9 initial release.
-
-
-- Meet [additional conditions](#how-parallel-batch-execution-works) described in the next section.
+- Meet [additional conditions](#how-parallel-batch-execution-works) described in the following section.

 ### How parallel batch execution works

-A batch can only run in parallel if:
+A batch can only run in parallel if all of these conditions are met:

-| Step | Condition | Parallel execution | Sequential execution|
-| ---- | ---------------| :------------------: | :----------: |
-| 1. | **Not** the first batch | ✅ | - |
-| 2. | **Not** the last batch | ✅ | - |
-| 3. | [Adapter supports](#prerequisites) parallel batches | ✅ | - |
+| Condition | Parallel execution | Sequential execution|
+| ---------------| :------------------: | :----------: |
+| **Not** the first batch | ✅ | - |
+| **Not** the last batch | ✅ | - |
+| [Adapter supports](#prerequisites) parallel batches | ✅ | - |

-After checking for 1, 2, and 3 in the previous table — and if `concurrent_batches` value isn't set, dbt will intelligently auto-detect if the model invokes the [`{{ this }}`](/reference/dbt-jinja-functions/this) Jinja function. If it references `{{ this }}`, the batches will run sequentially since `{{ this }}` represents the database of the current model and referencing the same relation causes conflict.
-
-Otherwise, if `{{ this }}` isn't detected (and other conditions are met), the batches will run in parallel. This can be overriden by setting a value for `concurrent_batches`.

-### Parallel or sequential execution
+After checking the conditions in the previous table, and if the `concurrent_batches` value isn't set, dbt auto-detects whether the model invokes the [`{{ this }}`](/reference/dbt-jinja-functions/this) Jinja function. If it references `{{ this }}`, the batches will run sequentially, since `{{ this }}` represents the database representation of the current model and referencing the same relation causes a conflict.
+Otherwise, if `{{ this }}` isn't detected (and the other conditions are met), the batches will run in parallel. You can override this by setting a value for `concurrent_batches`.
+### Parallel or sequential execution

 Choosing between parallel batch execution and sequential processing depends on the specific requirements of your use case.

-- Parallel batch execution is faster but requires logic that's independent of batch execution order. For example, if you're developing a data pipeline for a system that processes user transactions in batches, each batch is executed in parallel for better performance. However, the logic used to process each transaction shouldn't depend on the order of how batches are executed or completed.
+- Parallel batch execution is faster but requires logic independent of batch execution order. For example, if you're developing a data pipeline for a system that processes user transactions in batches, each batch is executed in parallel for better performance. However, the logic used to process each transaction shouldn't depend on the order in which batches are executed or completed.
 - Sequential processing is slower but essential for calculations like [cumulative metrics](/docs/build/cumulative) in microbatch models. It processes data in the correct order, allowing each step to build on the previous one.