-
Notifications
You must be signed in to change notification settings - Fork 235
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ADAP-614] [Bug] unique_key config for snapshots using hudi file_format #801
Comments
Hi @darcyMaJiyue thank you so much for opening this issue! We will try to update the docs as soon as I have a clear understanding what to update. @dbeatty10 Can you help me understand what the change to the docs should be? |
@runleonarun I can't tell if this is a request to update the docs or a request to update the implementation in dbt-spark. It looks more like the latter, so I'm going to transfer this issue to dbt-spark -- we can always transfer it back again if needed 😎 |
@darcyMaJiyue which of these two are you requesting / suggesting?
My colleague @dataders will help determine next steps either way. |
oh, It's a long time. @dbeatty10 thanks for your response. I'm suggesting the first one. |
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days. |
Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers. |
Sorry for the long time @darcyMaJiyue. Could you share reproducible example?Could you provide an example snapshots SQL + config so that we can try to reproduce the issue that you are seeing? We'd be looking for two things:
We'd need both in order to add new tests to dbt-spark that trigger this error so that we can be sure any fix works as we expect. ExampleHere's what those two things should look similar to: {% snapshot orders_snapshot %}
{{
config(
target_database='analytics',
target_schema='snapshots',
unique_key='id',
file_format='hudi'
strategy='timestamp',
updated_at='updated_at',
)
}}
select * from {{ source('jaffle_shop', 'orders') }}
{% endsnapshot %} And then also whichever command(s) trigger the error. So maybe in your case running the snapshot twice in a row causes the issue? dbt snapshot -s orders_snapshot
dbt snapshot -s orders_snapshot Could you confirm the necessary code changes?To be clear, is this the only change you are proposing? i.e., remove these lines from 9,10d8
< {%- elif options is not none and 'primaryKey' in options and options['primaryKey'] != unique_key -%}
< {{ exceptions.raise_compiler_error("unique_key and options('primaryKey') should be the same column(s).") }} Or is there some change needed in 2a3,4
> {%- set insert_cols_csv = insert_cols join(', ') -%}
>
14c16,17
< then insert *
---
> then insert ({{ insert cols csv }})
> values ({{ insert_cols_csv}}) |
p.s. support for We would need to add relevant tests for snapshots for |
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days. |
Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers. |
1 similar comment
Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers. |
Contributions
What page(s) or areas on docs.getdbt.com are affected?
unique_key config for snapshots
What changes are you suggesting?
I think for the snapshot hudi table, the primaryKey should be ‘dbt_scd_id’, which is used in the ‘merge into on’ statement generated by ‘dbt snapshot’ command. And the dbt_scd_id column is related to the primary key of the source table and the update_at referenced column when we reference to timestamp strategy.
But now the primaryKey of the snapshot CTAS hudi table, which is created as we run the first ‘dbt snapshot’ command, is the same with the primary key of the source table.
In my experiment, I change two parts of the sql generated by the dbt, one is the primaryKey of the snapshot CTAS hudi table using ‘dbt_scd_id’, the other one is the ‘insert [referenced columns]’ in the 'merge into insert ’ statement instead of ‘insert *’. After these changes, I get the scd2 table, stored in hudi.
Additional information
In my solution, I copy the files include/spark/macros/adapters.sql and include/spark/macros/materializations/snapshot.sql to the macro directory in my own project
And, I change the related code, that works.
The text was updated successfully, but these errors were encountered: