Ct 276/iceberg support patch #651
Conversation
Found a way to identify Iceberg tables, given that Spark returns an error when executing "SHOW TABLE EXTENDED ..." against them (see https://issues.apache.org/jira/browse/SPARK-33393). Instead of SHOW TABLE EXTENDED, a "DESCRIBE EXTENDED" is performed to retrieve the provider information. This allows Iceberg tables to be identified through an is_iceberg member variable. Also allow multiple join conditions, so that multiple columns can be used to make a row distinct, and use is_iceberg everywhere that handling Iceberg tables differs from other sources of data.
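The provider-detection idea described above can be sketched in Python. This is a minimal illustration, not the adapter's actual code; the row shape (tuples of `col_name`, `data_type`, `comment`, mirroring Spark's DESCRIBE TABLE EXTENDED output) and the function name are assumptions:

```python
def is_iceberg_table(describe_rows):
    """Scan DESCRIBE TABLE EXTENDED output for the Provider row.

    `describe_rows` is assumed to be an iterable of
    (col_name, data_type, comment) tuples, as Spark returns them.
    Returns True when the table's provider is Iceberg.
    """
    for row in describe_rows:
        # The "Detailed Table Information" section contains a Provider row
        if row[0].strip().lower() == "provider":
            return row[1].strip().lower() == "iceberg"
    return False
```

A hypothetical usage example: feeding in the rows from `DESCRIBE TABLE EXTENDED my_db.my_table` would return `True` only when the `Provider` row reads `iceberg`.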
[CT-276] Apache Iceberg Support #294 The _schema variable was used for non-Iceberg tables but was being overridden by the work for Iceberg v2 tables. I've made it so the Iceberg condition sets _schema rather than blanket-changing the schema for all providers.
On second look I wasn't happy with my name choices for the macro and method; hopefully what I have now makes more sense. [CT-276] Apache Iceberg Support #294
Upon further investigation this check is not needed since self.database will not be set.
[skip ci]
Fix incremental runs
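The multiple-join-condition change mentioned earlier (letting several columns together make a row distinct in an incremental merge) might be sketched like this. The `DBT_INTERNAL_SOURCE`/`DBT_INTERNAL_DEST` aliases follow dbt's merge-macro convention, but the helper function itself is hypothetical, not the PR's actual macro:

```python
def build_merge_predicate(unique_keys):
    """Build a SQL ON-clause joining source and destination on every
    unique-key column, e.g. for a MERGE INTO statement.

    `unique_keys` is a list of column names, e.g. ["id", "region"].
    """
    return " AND ".join(
        f"DBT_INTERNAL_SOURCE.{key} = DBT_INTERNAL_DEST.{key}"
        for key in unique_keys
    )
```

With `["id", "region"]` this yields a two-column predicate, so rows are matched on the combination of both columns rather than a single key.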
Add Iceberg to the list
- Upstream dbt changed, breaking the use of ParsedSourceDefinition; using SourceDefinition appears to work instead.
- Added a change to include Iceberg in adapters.sql (macro spark__alter_column_comment).
tox on certain platforms will complain that /bin/bash is not allowed to be used; this change allows it.
Fixed conflicts:
- dbt/adapters/spark/impl.py
- dbt/adapters/spark/relation.py
- dbt/include/spark/macros/materializations/incremental/incremental.sql
- dbt/include/spark/macros/materializations/incremental/strategies.sql

Updated tox.ini, removing my own addition of allowlist_externals; main has it set now.
Cleanup based on comments
- Fixing up merge conflicts around Exception/Error types - Putting back accidental newline removal in tox.ini
- In impl.py, changed List[agate.Row] to AttrDict. This fixed the pre-commit error after merging in changes from main.
I noticed that two Spark tests were failing, so I reverted some of the changes that seemed related. They are now passing, and my Iceberg dbt project is still running fine.
Thanks @VersusFacit for working on this. Looks even better now it has been refactored. Ran some tests locally, and it looks great!
I've run my original test case with this change and it is working for me! Thanks so much for this @VersusFacit, I'll close my other PR.
@@ -111,4 +110,4 @@
 amount: ["integer", "number"]
 amount_usd: ["decimal", "number"]
 created: ["timestamp", "string"]
-"""
+""".strip()
Like my mom used to say when she was bundling me up for school, "Remember to strip your whitespace on seeds, dear"
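For readers wondering what the `.strip()` buys: a triple-quoted string starts with a newline right after the opening quotes, so stripping keeps the seed schema fixture from beginning (or ending) with a blank line. A minimal, hypothetical illustration:

```python
# A triple-quoted seed schema like the one in the diff above picks up a
# leading newline from the opening quotes; .strip() removes it along
# with the trailing newline. The YAML content here is made up.
schema = """
seeds:
  quote_columns: false
"""

stripped = schema.strip()
# The raw string starts with "\n"; the stripped one starts with content.
```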
Thanks @dparent1 for creating the PR and @VersusFacit & @mikealfare for picking this up! Great to see this happening!
This made my day when I turned my computer on this morning. Also @Fokko was very much instrumental in keeping this PR alive, thanks goes to you as well as the others.
resolves #432
Description
Patches to dparent1's PR here.
I have addressed all comments that I could find open/unaddressed in their PR. I also rewrote the bulk of the `list_relations_without_caching` method. Me running some `run`s against my dockerized Iceberg instance.

Checklist
- `changie new` to create a changelog entry