[Core] More robust example for incremental runs #5321

dbeatty10 · 2024-04-22T18:41:13Z

Link to the page(s) on docs.getdbt.com requiring updates

https://docs.getdbt.com/docs/build/incremental-models

Tell us more about this update

The docs currently have the following example:

{% if is_incremental() %}
  where event_time >= (select max(event_time) from {{ this }})
{% endif %}

But this assumes that the the table is not empty.

Improved logic handles the case when the table is empty by using a default date in the distant past:

  where event_time >= (select coalesce(max(event_time), '1900-01-01') from {{ this }})

Reviewers/Stakeholders/SMEs

@b-per

Related GitHub issues

dbt-labs/dbt-core#9997 (comment)

Additional information

Tip

Instead of '1900-01-01', the ideal scenario would be if there were a cross-database way to specify the minimum date / timestamp_tz / timestamp_ntz / timestamp_ltz supported by the database engine. Then we'd just include that in our example instead.

The text was updated successfully, but these errors were encountered:

BenameurZyad · 2024-04-22T19:35:41Z

We used something like this in our case :

where event_time >= ( select coalesce(max(event_time), '' ) from {{ this }} )

dbeatty10 · 2024-04-23T13:22:50Z

@BenameurZyad coalesce(max(event_time), '' ) didn't work for me when I tried it out -- which database did you use? And are you certain it worked as intended when the incremental table is empty?

I've got a draft PR to update the docs here: #5326

BenameurZyad · 2024-04-23T14:07:49Z

Ouch. Sorry @dbeatty10 !
this is our actual implementation :
coalesce ( select (max(event_time) fom {{ this }} ), '' )

I changed it a bit to align with your logic above without trying, thinking it would be the same. Sorry for the confusion.
This one in this comment works for sure for us in Snowflake.

dbeatty10 · 2024-04-23T14:58:54Z

That doesn't look like it would work if I copy-pasted it. i.e., fom vs. from

For sake of completeness and correctness, would you mind copy-pasting your exact logic that you think would work in dbt-snowflake?

BenameurZyad · 2024-04-23T15:07:19Z

here it is :


with raw as (
    select
        properties,                 -- this is a json (stored as VARIANT type)
        content,                    -- this is a json (stored as VARIANT type)
        __INGESTED_AT               -- this is a timestamp UTC (stored as TEXT)
    from {{source('********','********')}}

    {% if is_incremental() %}

  -- this filter will only be applied on an incremental run
  -- (uses >= to include records arriving later on the same day as the last run of this model)

    -- this should prevent a bug where the incremental model would not work if the target table is empty
    where __INGESTED_AT >= coalesce((select max(__INGESTED_AT) from {{ this }}), '')

    {% endif %}
),

[Preview](https://docs-getdbt-com-git-dbeatty10-patch-4-dbt-labs.vercel.app//docs/build/incremental-models#filtering-rows-on-an-incremental-run) ## What are you changing in this pull request and why? resolves #5321 To ensure that the updated code will work for a broad number of users without issues, I tested the following example against these data platforms: - bigquery - databricks - duckdb - postgres - redshift - snowflake <img width="782" alt="image" src="https://github.com/dbt-labs/docs.getdbt.com/assets/44704949/0739892e-6f5d-45b8-ac1d-5bbd844cf096"> ☝️ Notice the table is empty, like the edge case scenario described in dbt-labs/dbt-core#9997 <img width="772" alt="image" src="https://github.com/dbt-labs/docs.getdbt.com/assets/44704949/87ad6438-082f-4d65-9a8b-f97d36497c8e"> ☝️ Notice it successfully added new data when it arrived. <details> <summary> ### Reprex </summary> Create this file: `models/my_incremental.sql` ```sql {{ config(materialized="incremental") }} with non_empty_cte as ( select 1 as id, cast('2024-01-01' as date) as event_time ), empty_cte as ( select 0 as id, cast('1999-12-31' as date) as event_time from non_empty_cte where 0=1 ) select * {% if var("scenario", "empty") == "empty" %} from empty_cte {% else %} from non_empty_cte {% endif %} {% if is_incremental() %} -- this filter will only be applied on an incremental run -- (uses >= to include records whose timestamp occurred since the last run of this model) where event_time >= (select coalesce(max(event_time), cast('1900-01-01' as date)) from {{ this }}) {% endif %} ``` Assuming a `profiles.yml` with all the relevant profile names, run these commands: ```shell dbt run --profile duckdb -s my_incremental --vars '{scenario: empty}' --full-refresh dbt show --profile duckdb --inline "select * from {{ ref('my_incremental') }}" dbt run --profile duckdb -s my_incremental --vars '{scenario: empty}' dbt show --profile duckdb --inline "select * from {{ ref('my_incremental') }}" dbt run --profile duckdb -s my_incremental --vars '{scenario: non_empty}' dbt show --profile duckdb --inline "select * from {{ ref('my_incremental') }}" dbt run --profile postgres -s my_incremental --vars '{scenario: empty}' --full-refresh dbt show --profile postgres --inline "select * from {{ ref('my_incremental') }}" dbt run --profile postgres -s my_incremental --vars '{scenario: empty}' dbt show --profile postgres --inline "select * from {{ ref('my_incremental') }}" dbt run --profile postgres -s my_incremental --vars '{scenario: non_empty}' dbt show --profile postgres --inline "select * from {{ ref('my_incremental') }}" dbt run --profile redshift -s my_incremental --vars '{scenario: empty}' --full-refresh dbt show --profile redshift --inline "select * from {{ ref('my_incremental') }}" dbt run --profile redshift -s my_incremental --vars '{scenario: empty}' dbt show --profile redshift --inline "select * from {{ ref('my_incremental') }}" dbt run --profile redshift -s my_incremental --vars '{scenario: non_empty}' dbt show --profile redshift --inline "select * from {{ ref('my_incremental') }}" dbt run --profile databricks -s my_incremental --vars '{scenario: empty}' --full-refresh dbt show --profile databricks --inline "select * from {{ ref('my_incremental') }}" dbt run --profile databricks -s my_incremental --vars '{scenario: empty}' dbt show --profile databricks --inline "select * from {{ ref('my_incremental') }}" dbt run --profile databricks -s my_incremental --vars '{scenario: non_empty}' dbt show --profile databricks --inline "select * from {{ ref('my_incremental') }}" dbt run --profile snowflake -s my_incremental --vars '{scenario: empty}' --full-refresh dbt show --profile snowflake --inline "select * from {{ ref('my_incremental') }}" dbt run --profile snowflake -s my_incremental --vars '{scenario: empty}' dbt show --profile snowflake --inline "select * from {{ ref('my_incremental') }}" dbt run --profile snowflake -s my_incremental --vars '{scenario: non_empty}' dbt show --profile snowflake --inline "select * from {{ ref('my_incremental') }}" dbt run --profile bigquery -s my_incremental --vars '{scenario: empty}' --full-refresh dbt show --profile bigquery --inline "select * from {{ ref('my_incremental') }}" dbt run --profile bigquery -s my_incremental --vars '{scenario: empty}' dbt show --profile bigquery --inline "select * from {{ ref('my_incremental') }}" dbt run --profile bigquery -s my_incremental --vars '{scenario: non_empty}' dbt show --profile bigquery --inline "select * from {{ ref('my_incremental') }}" ``` </details> ## Checklist - [x] Review the [Content style guide](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/content-style-guide.md) so my content adheres to these guidelines.

dbeatty10 added content Improvements or additions to content dbt Core The changes proposed in this issue relate to dbt Core labels Apr 22, 2024

dbeatty10 mentioned this issue Apr 22, 2024

[Feature] Incremental model logic fails when initial loaded table is empty dbt-labs/dbt-core#9997

Closed

2 tasks

dbeatty10 mentioned this issue Apr 23, 2024

Handle case when an incremental table is empty #5326

Merged

1 task

dbeatty10 closed this as completed in #5326 Apr 24, 2024

dbeatty10 mentioned this issue Jul 15, 2024

[Feature] Incremental Strategy - Edge Case Safeguarding on Merge Incremental Model dbt-labs/dbt-core#10440

Closed

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Core] More robust example for incremental runs #5321

[Core] More robust example for incremental runs #5321

dbeatty10 commented Apr 22, 2024 •

edited

Loading

BenameurZyad commented Apr 22, 2024 •

edited

Loading

dbeatty10 commented Apr 23, 2024

BenameurZyad commented Apr 23, 2024

dbeatty10 commented Apr 23, 2024

BenameurZyad commented Apr 23, 2024

[Core] More robust example for incremental runs #5321

[Core] More robust example for incremental runs #5321

Comments

dbeatty10 commented Apr 22, 2024 • edited Loading

Link to the page(s) on docs.getdbt.com requiring updates

Tell us more about this update

Reviewers/Stakeholders/SMEs

Related GitHub issues

Additional information

BenameurZyad commented Apr 22, 2024 • edited Loading

dbeatty10 commented Apr 23, 2024

BenameurZyad commented Apr 23, 2024

dbeatty10 commented Apr 23, 2024

BenameurZyad commented Apr 23, 2024

dbeatty10 commented Apr 22, 2024 •

edited

Loading

BenameurZyad commented Apr 22, 2024 •

edited

Loading