Add Snowflake-specific implementation of min_or_max
#32
base: main
Conversation
Hey @brown5628 - thanks for opening this! For sake of auditing, would you mind installing your fork in the Snowflake-based project you have and running a query that uses the …
@bcodell Should be ready for review. Thanks for your patience on this one with the slow turnaround time. Quick callouts:
Two questions from me:
Thank you for tackling this! I think we just need to make a small tweak to leverage dbt dispatch functionality. See the comment in the code.
Otherwise, if you wouldn't mind turning off sqlfmt so the diff is cleaner!
100% we should implement a style guide in the near future. Until then, let's refrain from style changes outside the PR scope.
Awesome work though!
@@ -1,56 +1,65 @@
{% macro _min_or_max(min_or_max, qualified_col) %}

{% if target.type == "snowflake" %}
Great work! But I think we ought to change this to use dbt dispatch.
I think we could just abstract this by doing {% set aggregation = get_db_aggregation() %}, and have that get_db_aggregation() macro contain default__ and snowflake__ dispatches. Sound reasonable?
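For reference, a dispatch setup along those lines might look roughly like the following; the get_db_aggregation name comes from the suggestion above, the dbt_activity_schema package name comes from the diff below, and the macro bodies are only placeholders rather than the package's actual implementation:

{% macro get_db_aggregation(min_or_max, qualified_col) %}
    {{ return(adapter.dispatch('get_db_aggregation', 'dbt_activity_schema')(min_or_max, qualified_col)) }}
{% endmacro %}

{% macro default__get_db_aggregation(min_or_max, qualified_col) %}
    {# generic (cross-database) aggregation logic would go here #}
{% endmacro %}

{% macro snowflake__get_db_aggregation(min_or_max, qualified_col) %}
    {# Snowflake-specific aggregation logic (e.g. min_by/max_by) would go here #}
{% endmacro %}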
Yep, absolutely! Wasn't aware of that functionality but seems like a perfect fit here. Will make the changes & tee this up!
Hi @tnightengale:
- Restated the file without sqlfmt, so the diff should be cleaner.
- Struggling a bit with how to implement the dbt dispatch logic. I see how it can be used to replace the if/then logic I was using and have done so, but I'm not following applying this at the {% set aggregation = get_db_aggregation() %} level, since there are other changes to the Snowflake-specific logic. Would you be able to spell out a bit further what you are looking for there? Happy to take another pass, but I recognize that I'm spinning my wheels.
Validation
So, the _min_or_max macro actually uses a trick to prepend the ts, then aggregate, then trim off the prepended ts column, so that the aggregation abstraction can work the same across sum and count.

So I think that logic needs to be replicated for Snowflake as well, as far as I understand.
I don't think this will work as is?
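For illustration only, the prepend-then-trim trick described above can be sketched in plain SQL along these lines; the table name, column names, and the fixed-width timestamp format are assumptions, not the macro's actual code:

-- prepend a fixed-width (19-character) timestamp string, aggregate, then trim the prefix back off
select
    substr(min(to_char(ts, 'YYYY-MM-DD HH24:MI:SS') || feature_value), 20) as first_feature_value,
    substr(max(to_char(ts, 'YYYY-MM-DD HH24:MI:SS') || feature_value), 20) as last_feature_value
from activity_stream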
{{ return(adapter.dispatch('_min_or_max','dbt_activity_schema')(min_or_max, qualified_col)) }}
{%- endmacro -%}

{% macro default___min_or_max(min_or_max, qualified_col) -%}

{% set aggregation = "min" if min_or_max == "min" else "max" %}
I think this assignment should be via a dispatched macro. And then we can call that aggregation with {{ }} in the relevant place.
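If I'm reading the suggestion right, that would be roughly the following, where get_db_aggregation is the hypothetical dispatched macro discussed earlier in the thread:

{% set aggregation = get_db_aggregation(min_or_max) %}
{# ...later, at the relevant place in the SQL... #}
{{ aggregation }}({{ qualified_col }})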
@tnightengale Apologies for the back and forth. I think I'm missing something here that would help me address this feedback. Three questions for you:
- I follow how the default _min_or_max macro works. Per the discussion in TRY_CAST SQL compilation error #26, this code uses built-in Snowflake functionality to achieve the same end rather than replicate the prepend approach. In my dbt project and the Snowflake version of the integration tests, I see that the min_by/max_by version produces the expected results. Could you explain further what you mean by "I don't think this will work as is"?
- For "So I think that logic needs to be replicated for Snowflake as well, as far as I understand.", are you saying that you would prefer a Snowflake version of the prepend logic rather than the native Snowflake approach used here, or something else?
- I'm not following the comment on the assignment via a dispatched macro. The version here in this PR is essentially a carbon copy of the example provided in the dbt docs for this functionality. Could you explain more about what you are looking for, or point me to an example on which I can base the refactoring?
Appreciate the feedback to give me what I need to get this all squared away!
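For context on the built-in functionality mentioned above: Snowflake's min_by/max_by take a value column and an ordering column and return the value from the row where the ordering column is smallest or largest. A minimal usage sketch, with made-up table and column names:

select
    customer,
    min_by(feature_value, ts) as first_feature_value,
    max_by(feature_value, ts) as last_feature_value
from activity_stream
group by customer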
Per #26, the generic implementation of the _min_or_max macro does not work correctly on Snowflake, with the root cause being dbt's safe_cast using a function in Snowflake with limitations that make it unsuitable for this purpose. Given that this macro is intended to create the equivalent of Snowflake's min_by/max_by functions, the work of this PR is to:

Test 1- Integration Suite
Test 2- Prod project
dbt_project.yml vars
query
compiled query