-
Notifications
You must be signed in to change notification settings - Fork 179
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CT-1507] [Regression] v1.3: views instead of temp tables cause performance issue #323
Comments
@asouletdebrugiere This is surprising to hear! The entire justification for #93, which switched out temp tables for views, was to improve performance. You can switch it back by reimplementing the macro named {% macro dbt_snowflake_get_tmp_relation_type(strategy, unique_key, language) %}
-- always table
{{ return('table') }}
{% endmacro %} If you confirm that switching back to temp tables speeds up your incremental runs significantly, then we should preserve both behaviors, and make this an opt-in config instead of an across-the-board default. I'd also like to figure out why this is faster for some tables, and slower for others. |
Thanks @jtcohen6 for your quick answer ! Overriding the |
@asouletdebrugiere It would be very helpful to see the query profiles for
Tmp table:
|
Thanks @asouletdebrugiere - what's interesting is that the first query is scanning 11 billion rows from the target table on the final merge, whereas the merge from tmp table query is only scanning 520 million. It seems like the merge from tmp table is creating a join filter which is pushed down onto the target table scan, but the merge from view is not creating that join filter on the target table scan, which is causing the massive disk spillage in the join step. Join filter (from screenshot 3): No join filter (from screenshot 1): Would you be able to send screenshots of:
2 From the third screenshot: |
Yes sure :
Full "Equality Join Condition" is based on a surrogate key we created for the table :
|
It sounds like there's merit in making this behavior opt-in, since there are cases where it's better and cases where it's worse. Questions for us are:
In the meantime, users can opt into the previous view behavior by overriding the |
Hi there, Therefore, I feel like the default behavior should be reverted to 'table' to avoid misunderstood long runs until the matter is better understood. Also, the naming sounds right. As usual, you all rock, thanks for the time! |
My inclination would be to say the default should be whatever is the more performant in the majority of cases and better document the cases where this default can be problematic. |
Ok so I think we're good to move forward with adding a model level config for incremental models using the merge strategy, to switch the temporary relation type from view (default from 1.3) to table.
Any preference @jtcohen6 ? I think I like |
May be relevant to this conversation : dbt-labs/dbt-core#5236 |
Sorry I missed this last month — agree with your proposed name! |
just sharing some thoughts from an internal thread from an optimizer guru at Snowflake from last month.
|
As @McKnight-42 and I have been iterating on the implementation of this feature, a question came up how we should handle the Consider the
Three optionsWe have three main options:
Trade-offsPrimary trade-off of each option (in corresponding order):
ProposalSo I'd like to propose that we actually do the third option! This would expand the scope slightly to cover other incremental strategies also. One advantage is that writing the documentation for the third option would be the most simple. It would also be easiest to understand from a user perspective:
Trying on Option ThreeSuppose the new config name is
Expanding this out to include all the current logic (and simplifying the
Current Logic in PR #389If we keep the original decision to have a
Ultimately, the two variations aren't that substantially different from each other. However, if we do ever want to come back and allow strategies other than If we allow all incremental strategies to provide this config in one fell swoop right now, then we'll complete the offering of a balance of rational defaults along with configurability for intermediate temp objects used by incremental models. |
Is this a regression in a recent version of dbt-snowflake?
Current Behavior
A change has been introduced in dbt 1.3.0 that uses a view instead of a temporary table for incremental models when the merge incremental strategy is used.
I noticed this change because one of my hourly job is now taking 30 minutes vs 3 minutes previously.
I know that using the
delete+insert
strategy would work to get temp tables back but we did some tests and the performance of our jobs is better when we use merge.Expected/Previous Behavior
Would it be possible to go back to temporary table instead of views ? Or at least to add a parameter to be able to use temporary tables when the incremental strategy is a merge here ?
Steps To Reproduce
create or replace view
vscreate or replace temporary table
in v<=1.2Relevant log output
No response
Environment
Additional Context
No response
The text was updated successfully, but these errors were encountered: