grace's feedback
mirnawong1 committed Nov 27, 2024
1 parent e3d8e96 commit 9c52827
Showing 2 changed files with 5 additions and 3 deletions.
6 changes: 4 additions & 2 deletions website/docs/docs/build/incremental-microbatch.md
@@ -23,16 +23,18 @@ Refer to [Supported incremental strategies by adapter](/docs/build/incremental-s
Incremental models in dbt are a [materialization](/docs/build/materializations) designed to efficiently update your data warehouse tables by only transforming and loading _new or changed data_ since the last run. Instead of reprocessing an entire dataset every time, incremental models process a smaller number of rows, and then append, update, or replace those rows in the existing table. This can significantly reduce the time and resources required for your data transformations.
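To make that concrete, here's a minimal sketch of an incremental model that only processes new rows on subsequent runs. The model, source, and column names are hypothetical, not taken from this page.

```sql
-- models/stg_events.sql (hypothetical model, source, and column names)
{{ config(materialized='incremental') }}

select
    event_id,
    event_timestamp,
    payload
from {{ source('app', 'events') }}

{% if is_incremental() %}
  -- on incremental runs, only transform rows newer than what's already in the target table
  where event_timestamp > (select max(event_timestamp) from {{ this }})
{% endif %}
```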

Microbatch is an incremental strategy designed for large time-series datasets:
- It relies solely on a time column ([`event_time`](/reference/resource-configs/event-time)) to define time-based ranges for filtering. Set the `event_time` column for your microbatch model and its direct parents (upstream models), as in the sketch after this list. Note that this is different from `partition_by`, which groups rows into partitions.
- It complements, rather than replaces, existing incremental strategies by focusing on efficiency and simplicity in batch processing.
- Unlike traditional incremental strategies, microbatch doesn't require implementing complex conditional logic for [backfilling](#backfills).
- It relies solely on a time column ([`event_time`](/reference/resource-configs/event-time)) to handle data partitioning and filtering.
- Note that microbatch might not be the best strategy for all use cases. Consider other strategies if you don't have a reliable `event_time` column or if you want more control over the incremental logic. Read more in [How `microbatch` compares to other incremental strategies](#how-microbatch-compares-to-other-incremental-strategies).
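As a rough sketch of the configuration these points describe (assuming dbt 1.9+ with microbatch support; the model, column, and date values are hypothetical), a microbatch model sets `event_time`, `begin`, and `batch_size` in its config:

```sql
-- models/fct_sessions.sql (hypothetical names; assumes dbt 1.9+ with microbatch support)
-- event_time:  the column dbt filters on for each batch
-- begin:       the earliest date to process when building the full history
-- batch_size:  each batch covers one day of data
{{ config(
    materialized='incremental',
    incremental_strategy='microbatch',
    event_time='session_started_at',
    begin='2024-01-01',
    batch_size='day'
) }}

select * from {{ ref('stg_events') }}
```

The upstream model it reads from (`stg_events` here) would also need its own `event_time` configured so dbt can filter it per batch.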

### How microbatch works

When dbt runs a microbatch model — whether for the first time, during incremental runs, or in specified backfills — it will split the processing into multiple queries (or "batches"), based on the `event_time` and `batch_size` you configure.

Each "batch" corresponds to a single bounded time period (by default, a single day of data). Where other incremental strategies operate only on "old" and "new" data, microbatch models treat every batch as an atomic unit that can be built or replaced on its own. Each batch is independent and <Term id="idempotent" />. This is a powerful abstraction that makes it possible for dbt to run batches [separately](#backfills) — in the future, concurrently — and to [retry](#retry) them independently.
Each "batch" corresponds to a single bounded time period (by default, a single day of data). Where other incremental strategies operate only on "old" and "new" data, microbatch models treat every batch as an atomic unit that can be built or replaced on its own. Each batch is independent and <Term id="idempotent" />.

This is a powerful abstraction that makes it possible for dbt to run batches [separately](#backfills), concurrently, and to [retry](#retry) them independently.
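For illustration only (this isn't the exact SQL dbt generates, and the relation and column names are hypothetical), a single daily batch is effectively bounded like this:

```sql
-- roughly the filter dbt applies to the upstream model for a single daily batch
select *
from analytics.stg_events               -- hypothetical upstream relation
where event_timestamp >= '2024-11-26'   -- batch start (inclusive)
  and event_timestamp <  '2024-11-27'   -- batch end (exclusive)
```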

### Example

2 changes: 1 addition & 1 deletion website/docs/docs/build/incremental-models.md
@@ -114,7 +114,7 @@ When you define a `unique_key`, you'll see this behavior for each row of "new" d
Please note that if a `unique_key` value matches more than one row in either the existing target table or the new incremental rows, the incremental model may fail, depending on your database and [incremental strategy](/docs/build/incremental-strategy). If you're having issues running an incremental model, it's a good idea to double-check that the unique key is truly unique in both your existing database table and your new incremental rows. You can [learn more about surrogate keys here](https://www.getdbt.com/blog/guide-to-surrogate-key).
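If you suspect a duplicate key is the culprit, a quick ad-hoc query along these lines can confirm it; the table and column names are hypothetical:

```sql
-- hypothetical check: any unique_key value appearing more than once is a problem
select event_id, count(*) as row_count
from analytics.stg_events
group by event_id
having count(*) > 1
```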

:::info
While common incremental strategies, such as`delete+insert` + `merge`, might use `unique_key`, others don't. For example, the `insert_overwrite` strategy does not use `unique_key`, because it operates on partitions of data rather than individual rows. For more information, see [About incremental_strategy](/docs/build/incremental-strategy).
While common incremental strategies, such as `delete+insert` + `merge`, might use `unique_key`, others don't. For example, the `insert_overwrite` strategy does not use `unique_key`, because it operates on partitions of data rather than individual rows. For more information, see [About incremental_strategy](/docs/build/incremental-strategy).

:::

#### `unique_key` example
