You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Support composition key that result in a join based on multiple columns when upserting an increment.
Describe alternatives you've considered
The dbt docs recommends you to create a new column which combines the two columns. I do not like to add a column just for that purpose. Moreover, (I expect) it is slower than having a composition key.
Additional context
This PR in dbt-core is about allowing unique_key to be a list.
Who will this benefit?
Me. I think it is "nicer" to not have to create a new column that combines your composition key into one column to make a join on multiple columns work. I expect it will be faster too, however I did not benchmark it.
Are you interested in contributing this feature?
This is the macro I use currently:
{% macro spark__get_merge_sql(target, source, unique_key, dest_columns, predicates=none) %}
{# skip dest_columns, use merge_update_columns config if provided, otherwise use "*" #}
{%-set update_columns =config.get("merge_update_columns") -%}
{% set merge_condition %}
{% if unique_key is not none %}
{%- if unique_key is string -%}
{%-set unique_key = [unique_key] -%}
{%- endif -%}
on
{%- for column in unique_key %}
DBT_INTERNAL_SOURCE.{{ column }} = DBT_INTERNAL_DEST.{{ column }}
{%- if not loop.last %} and {%- endif %}
{%- endfor %}
{% else %}
on false
{% endif %}
{% endset %}
merge into {{ target }} as DBT_INTERNAL_DEST
using {{ source.include(schema=false) }} as DBT_INTERNAL_SOURCE
{{ merge_condition }}
when matched then updateset
{% if update_columns -%}{%- for column_name in update_columns %}
{{ column_name }} = DBT_INTERNAL_SOURCE.{{ column_name }}
{%- if not loop.last %}, {%- endif %}
{%- endfor %}
{%- else %} * {% endif %}
when not matched then insert *
{% endmacro %}
The text was updated successfully, but these errors were encountered:
@JCZuurmond I'd welcome this change to dbt-spark, along with parallel changes to dbt-core and any other adapters where we'd need to make the update. Luckily, I think both dbt-bigquery and dbt-snowflake use or shell out to the default get_merge_sql implementation.
I wonder if the merge_condition conditional you have above could even be rolled into its own macro. It could be dispatched, with a reasonable default, from dbt-core:
{% macro get_merge_condition(unique_key) %}
{% set merge_condition %}
{% if unique_key is not none %}
{%- if unique_key is string -%}
{%-set unique_key = [unique_key] -%}
{%- endif -%}
on
{%- for column in unique_key %}
DBT_INTERNAL_SOURCE.{{ column }} = DBT_INTERNAL_DEST.{{ column }}
{%- if not loop.last %} and {%- endif %}
{%- endfor %}
{% else %}
on false
{% endif %}
{% endset %}
{{ return(merge_condition) }}
{% endmacro %}
Then, the only change we'd need in spark__get_merge_sql would be to call that macro.
Additionally, I'm all about adding unique_key as an actual validated config in dbt-core, and having that config accept either a string or a list, as described in dbt-labs/dbt-core#2479 (comment).
So: I think a PR in dbt-core, resolving #2479, is the right way to start. I see you've already jumped onto that thread, so I'll say no more :)
I forgot we had this issue open! Going to close this one in favor of #282, which will have us make the same change to spark__get_merge_sql and add a test for it
Describe the feature
Support composition key that result in a join based on multiple columns when upserting an increment.
Describe alternatives you've considered
The dbt docs recommends you to create a new column which combines the two columns. I do not like to add a column just for that purpose. Moreover, (I expect) it is slower than having a composition key.
Additional context
This PR in
dbt-core
is about allowingunique_key
to be a list.Who will this benefit?
Me. I think it is "nicer" to not have to create a new column that combines your composition key into one column to make a join on multiple columns work. I expect it will be faster too, however I did not benchmark it.
Are you interested in contributing this feature?
This is the macro I use currently:
The text was updated successfully, but these errors were encountered: