This branch was auto-updated!
github-actions[bot] authored Oct 5, 2024
2 parents 1145b7d + bd73e21 commit 52b7cae
Showing 6 changed files with 402 additions and 2 deletions.
website/docs/docs/build/incremental-microbatch.md (2 changes: 1 addition & 1 deletion)
@@ -28,7 +28,7 @@ A `sessions` model is aggregating and enriching data that comes from two other models:
- `page_views` is a large, time-series table. It contains many rows, new records almost always arrive after existing ones, and existing records rarely update.
- `customers` is a relatively small dimensional table. Customer attributes update often, and not in a time-based manner — that is, older customers are just as likely to change column values as newer customers.

- The `page_view_start` column in `page_views` is configured as that model's `event_time`. The `customers` model does not configure an `event_time`. Therefore, each batch of `sessions` will filter `page_views` to the equivalent time-bounded batch, and it will not filter `sessions` (a full scan for every batch).
+ The `page_view_start` column in `page_views` is configured as that model's `event_time`. The `customers` model does not configure an `event_time`. Therefore, each batch of `sessions` will filter `page_views` to the equivalent time-bounded batch, and it will not filter `customers` (a full scan for every batch).
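
For reference (not part of this diff): `event_time` is set in the upstream model's config block. A minimal sketch of how `page_views` might declare it, where the model body and the `stg_page_views` ref are assumptions for illustration:

```sql
-- models/page_views.sql: illustrative sketch only.
-- The event_time column name comes from the docs text above; the upstream
-- ref('stg_page_views') and the select are assumed, not taken from the repo.
{{ config(
    event_time='page_view_start'
) }}

select * from {{ ref('stg_page_views') }}
```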

We run the `sessions` model on October 1, 2024, and then again on October 2. It produces the following queries:

@@ -42,7 +42,7 @@ Historically, managing incremental models involved several manual steps and resp

While this works for many use-cases, there’s a clear limitation with this approach: *Some datasets are just too big to fit into one query.*

- Starting in Core 1.9, you can use the new microbatch strategy to optimize your largest datasets -- **process your event data in discrete periods with their own SQL queries, rather than all at once.** The benefits include:
+ Starting in Core 1.9, you can use the new [microbatch strategy](/docs/build/incremental-microbatch#what-is-microbatch-in-dbt) to optimize your largest datasets -- **process your event data in discrete periods with their own SQL queries, rather than all at once.** The benefits include:

- Simplified query design: Write your model query for a single batch of data. dbt will use your `event_time`, `lookback`, and `batch_size` configurations to automatically generate the necessary filters for you, making the process more streamlined and reducing the need for you to manage these details (a config sketch follows this list).
- Independent batch processing: dbt automatically breaks down the data to load into smaller batches based on the specified `batch_size` and processes each batch independently, improving efficiency and reducing the risk of query timeouts. If some of your batches fail, you can use `dbt retry` to load only the failed batches.
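
To make the configs named in these bullets concrete, here is a minimal sketch of a microbatch model. The config keys (`incremental_strategy`, `event_time`, `batch_size`, `lookback`, `begin`) are the ones the text refers to; the model name, column name, and specific values are assumptions for illustration:

```sql
-- models/sessions.sql: hypothetical example, not taken from the repo.
{{ config(
    materialized='incremental',
    incremental_strategy='microbatch',
    event_time='session_start',   -- assumed column marking when each row occurred
    batch_size='day',             -- one query per day of data
    lookback=3,                   -- also reprocess the 3 batches before the latest
    begin='2024-01-01'            -- earliest batch to build on a full refresh
) }}

select * from {{ ref('page_views') }}  -- filtered to each batch's window because page_views configures an event_time
```

With this setup, dbt generates one time-filtered query per daily batch, and running `dbt retry` after a partial failure reprocesses only the batches that failed.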
