Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ADAP-946] [CT-2689] [Bug] Incremental models ran from scratch when created in a pre-hook #918

Closed
2 tasks done
jelstongreen opened this issue Jun 13, 2023 · 9 comments
Closed
2 tasks done
Labels
bug Something isn't working Stale

Comments

@jelstongreen
Copy link

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

When running incremental models in a dev / ci environment I am trying to shallow clone the production tables into the dev schema to fully simulate how the changes might affect the production model. Unfortunately whether a model should be run with is_incremental or not is determined at compile time, before the pre-hook creating the dev version of the model is run. This leads to dbt running the dev model from scratch even though it already exists.

Expected Behavior

Ideally, the check for whether to run the model in incremental mode or not should be executed at runtime.

Steps To Reproduce

Opened a fresh branch which will execute tables in branch db. Run the model and it executes:
pre-hook:

CREATE TABLE IF NOT EXISTS `hive_metastore`.`branch`.`model` SHALLOW CLONE prod.model

model:

create
or replace table `hive_metastore`.`branch`.`model` using delta as with src as (
  SELECT
    *
  FROM
    `hive_metastore`.`branch`.`src`
)
SELECT
  *
FROM
  src
WHERE
  timestamp >= (SELECT min(timestamp) FROM src)

If I then cancel that query and rerun dbt:
pre-hook:

CREATE TABLE IF NOT EXISTS `hive_metastore`.`branch`.`model` SHALLOW CLONE prod.model

model:

merge into `hive_metastore`.`branch`.`model` as DBT_INTERNAL_DEST using `model__dbt_tmp` as DBT_INTERNAL_SOURCE on DBT_INTERNAL_SOURCE.id = DBT_INTERNAL_DEST.id
when matched then
update
set
  *
  when not matched then
insert
  *

The only difference between the two runs is that the dev version of the model (the clone) exists in the 2nd one at compile time.

Relevant log output

No response

Environment

- OS: MacOS 13.3.1 (a) 
- Python: 3.10.8
- dbt: 1.4.1

Which database adapter are you using with dbt?

spark, other (mention it in "Additional Context")

Additional Context

Databricks

@jelstongreen jelstongreen added bug Something isn't working triage labels Jun 13, 2023
@github-actions github-actions bot changed the title [Bug] Incremental models ran from scratch when created in a pre-hook [CT-2689] [Bug] Incremental models ran from scratch when created in a pre-hook Jun 13, 2023
@dbeatty10
Copy link
Contributor

dbeatty10 commented Jun 13, 2023

Thanks for opening this @jelstongreen !

Before we get into the details of this bug report, you might be interested in subscribing to dbt-labs/dbt-core#7256 since the big idea in that feature is:

Clone my production state into my development schema, please!

Reproducing this

Could you help me reproduce this by supplying example model .sql file(s) (and any related YAML configuration files)?

And also the commands to trigger the error?

i.e., something like this:

models/my_incremental_model.sql

{{ config(
    materialized='incremental',
    pre_hook="SQL-statement",
    ...
) }}

select ...

dbt_project.yml

# anything applicable from a dbt_project.yml file here

models/_models.yml

# anything applicable from a (model) properties file here

Commands:

dbt build --full-refresh
dbt run --select my_incremental_model

@jelstongreen
Copy link
Author

Hi @dbeatty10. Here you go:
models/my_incremental_model.sql

{{ config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key="unique_id",
    pre_hook = [
        """
        {{ shallow_clone_production() }}
        """
    ],
    post_hook = [
        """
        {{ optimize_and_zorder(zorder_cols=['col_a','col_b']) }}
        """
    ] 
) }}

SELECT col_a, col_b...

shallow_clone_production.sql

{%- macro shallow_clone_production() %}
    {% if target.name in ('dev','testing') %}
        {% if var('clone_incremental_dev', 'True') == 'True' %}
            CREATE TABLE IF NOT EXISTS {{ this }}
            SHALLOW CLONE {{ var('prod_schema') }}.{{ this.name }}
        {% endif %}
    {% endif %}
{%- endmacro %}

command

dbt run -s my_incremental_model --project-dir projects/my_project

@jelstongreen
Copy link
Author

Given that this is being looked at for 1.6 I imagine this isn't going to be a priority. I can create a workaround in the meantime and run the macro as an operation in a step for invoking dbt unless this is something you'd like to look at.

@dbeatty10
Copy link
Contributor

Thanks for that example @jelstongreen -- that helps me see exactly what is going on.

When you invoke dbt like dbt run -s my_model --target dev with the setup you described, here's what it does:

  • Introspects the database to see which tables already exist
  • The table for my_model doesn't exist yet, so it will need to be created from scratch rather than inserting into it
  • There is some SQL to run as a pre-hook right before that

So you can see from the above how the error follows:

  1. The pre-hook creates the dev version of the table
  2. Then the materialization tries to create that same dev version of the table 💥

Running a macro as an operation is a good workaround until v1.6 comes out.

Another option you could try if you are so inclined:

@github-actions
Copy link
Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@github-actions github-actions bot added the Stale label Sep 15, 2023
@github-actions
Copy link
Contributor

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Sep 23, 2023
@amyfeng-klaviyo
Copy link

can we check the status of this please?

@dbeatty10
Copy link
Contributor

@amyfeng-klaviyo we've closed this issue since we didn't find any bug within dbt-core to fix.

If you think you're experiencing a bug within dbt-core, could you open a new issue here and describe it in detail so we can try to reproduce it?

If you have a substantially similar use-case as jelstongreen described here, then I'd suggest one of the following:

@dbeatty10 dbeatty10 removed the triage label Oct 13, 2023
@amyfeng-klaviyo
Copy link

amyfeng-klaviyo commented Oct 15, 2023

@dbeatty10 thanks for the update! my question is really the same as the bug described above.
We have set up a pre-hook macro to clone the production table for an incremental model (mainly for CI). while i m testing the pre-hook in local, The table:

  1. pre-hook: create or replace table dev.abc clone prod.abc
  2. model run: create or replace table dev.abc as ... (another full build instead of running it as incremental model)
    I am wondering y the 2nd step wound not run as the model already existed in the dev env. we are currently on v1.3.3 and would need some testing before we move on to v1.6. i can see dbt clone might be the option here.

@dbeatty10 dbeatty10 transferred this issue from dbt-labs/dbt-core Oct 15, 2023
@github-actions github-actions bot changed the title [CT-2689] [Bug] Incremental models ran from scratch when created in a pre-hook [ADAP-946] [CT-2689] [Bug] Incremental models ran from scratch when created in a pre-hook Oct 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Stale
Projects
None yet
Development

No branches or pull requests

3 participants