
Support composite key (multiple columns) when merging an increment #238

Closed
JCZuurmond opened this issue Oct 22, 2021 · 3 comments
Labels
good_first_issue (Good for newcomers), type:enhancement (New feature or request)

Comments

@JCZuurmond
Collaborator

Describe the feature

Support a composite key that results in a join on multiple columns when upserting an increment.
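
For illustration, the merge this feature should generate looks roughly like the following (table and column names are made up):

merge into analytics.orders as DBT_INTERNAL_DEST
    using orders__dbt_tmp as DBT_INTERNAL_SOURCE
    on DBT_INTERNAL_SOURCE.order_id = DBT_INTERNAL_DEST.order_id
        and DBT_INTERNAL_SOURCE.order_date = DBT_INTERNAL_DEST.order_date
    when matched then update set *
    when not matched then insert *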

Describe alternatives you've considered

The dbt docs recommend creating a new column that combines the key columns. I do not like adding a column just for that purpose. Moreover, I expect it is slower than having a composite key.
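
For reference, that workaround looks roughly like this (a minimal sketch; the model and column names are hypothetical):

-- models/orders.sql
{{ config(materialized='incremental', unique_key='order_key') }}

select
    -- surrogate key combining the two real key columns
    order_id || '-' || cast(order_date as string) as order_key,
    *
from {{ ref('stg_orders') }}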

Additional context

This PR in dbt-core is about allowing unique_key to be a list.

Who will this benefit?

Me. I think it is "nicer" not to have to combine your composite key into a single column just to make a join on multiple columns work. I expect it will be faster too; however, I did not benchmark it.

Are you interested in contributing this feature?

This is the macro I use currently:

{% macro spark__get_merge_sql(target, source, unique_key, dest_columns, predicates=none) %}
  {# skip dest_columns, use merge_update_columns config if provided, otherwise use "*" #}
  {%- set update_columns = config.get("merge_update_columns") -%}

  {% set merge_condition %}
    {% if unique_key is not none %}
      {%- if unique_key is string -%}
        {%- set unique_key = [unique_key] -%}
      {%- endif -%}
      on
      {%- for column in unique_key %}
        DBT_INTERNAL_SOURCE.{{ column }} = DBT_INTERNAL_DEST.{{ column }}
        {%- if not loop.last %} and {%- endif %}
      {%- endfor %}
    {% else %}
      on false
    {% endif %}
  {% endset %}

  merge into {{ target }} as DBT_INTERNAL_DEST
    using {{ source.include(schema=false) }} as DBT_INTERNAL_SOURCE

    {{ merge_condition }}

    when matched then update set
      {% if update_columns -%}{%- for column_name in update_columns %}
        {{ column_name }} = DBT_INTERNAL_SOURCE.{{ column_name }}
        {%- if not loop.last %}, {%- endif %}
      {%- endfor %}
      {%- else %} * {% endif %}

    when not matched then insert *
{% endmacro %}
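
With this override in the project's macros/ directory, an incremental model can then pass a list directly. A hypothetical model config, for illustration:

{{ config(
    materialized='incremental',
    incremental_strategy='merge',
    file_format='delta',
    unique_key=['order_id', 'order_date']
) }}

select * from {{ ref('stg_orders') }}
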
@jtcohen6
Contributor

jtcohen6 commented Oct 22, 2021

@JCZuurmond I'd welcome this change to dbt-spark, along with parallel changes to dbt-core and any other adapters where we'd need to make the update. Luckily, I think both dbt-bigquery and dbt-snowflake use or shell out to the default get_merge_sql implementation.

I wonder if the merge_condition conditional you have above could even be rolled into its own macro. It could be dispatched, with a reasonable default, from dbt-core:

{% macro get_merge_condition(unique_key) %}
  {% set merge_condition %}
    {% if unique_key is not none %}
      {%- if unique_key is string -%}
        {%- set unique_key = [unique_key] -%}
      {%- endif -%}
      on
      {%- for column in unique_key %}
        DBT_INTERNAL_SOURCE.{{ column }} = DBT_INTERNAL_DEST.{{ column }}
        {%- if not loop.last %} and {%- endif %}
      {%- endfor %}
    {% else %}
      on false
    {% endif %}
  {% endset %}
  {{ return(merge_condition) }}
{% endmacro %}

Then, the only change we'd need in spark__get_merge_sql would be to call that macro.
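
A sketch of what that call site could look like, assuming get_merge_condition is dispatched from dbt-core as proposed:

{% macro spark__get_merge_sql(target, source, unique_key, dest_columns, predicates=none) %}
  {# skip dest_columns, use merge_update_columns config if provided, otherwise use "*" #}
  {%- set update_columns = config.get("merge_update_columns") -%}

  merge into {{ target }} as DBT_INTERNAL_DEST
    using {{ source.include(schema=false) }} as DBT_INTERNAL_SOURCE

    {{ get_merge_condition(unique_key) }}

    when matched then update set
      {% if update_columns -%}{%- for column_name in update_columns %}
        {{ column_name }} = DBT_INTERNAL_SOURCE.{{ column_name }}
        {%- if not loop.last %}, {%- endif %}
      {%- endfor %}
      {%- else %} * {% endif %}

    when not matched then insert *
{% endmacro %}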

Additionally, I'm all about adding unique_key as an actual validated config in dbt-core, and having that config accept either a string or a list, as described in dbt-labs/dbt-core#2479 (comment).

So: I think a PR in dbt-core, resolving #2479, is the right way to start. I see you've already jumped onto that thread, so I'll say no more :)

jtcohen6 added the good_first_issue label and removed the triage:product label on Oct 22, 2021
@JCZuurmond
Collaborator Author

PR and issue in dbt-core that make unique_key a list

@jtcohen6
Contributor

jtcohen6 commented Feb 7, 2022

I forgot we had this issue open! Going to close this one in favor of #282, which will have us make the same change to spark__get_merge_sql and add a test for it.

jtcohen6 closed this as completed Feb 7, 2022