Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ct 276/iceberg support patch #651

Merged
merged 52 commits into from
Mar 2, 2023
Merged

Ct 276/iceberg support patch #651

merged 52 commits into from
Mar 2, 2023

Conversation

VersusFacit
Copy link
Contributor

@VersusFacit VersusFacit commented Feb 24, 2023

resolves #432

Description

Patches to dparent1's pr here.

I have addressed all comments that I could find open/unaddressed in their PR. I also rewrite the bulk of the list_relations_without_caching method.

image

Me running some runs against my dockerized iceberg instance.

Checklist

dparent1 and others added 30 commits August 16, 2022 12:51
Found a way to identify iceberg tables given that spark returns
an error when trying to execute "SHOW TABLE EXTENDED..."  See
https://issues.apache.org/jira/browse/SPARK-33393

Instead of show table extended a "DESCRIBE EXTENDED" is
performed to retrieve the provider information.  This allows
for identification of iceberg through an is_iceberg member
variable.

Allow for multiple join conditions to allow for mutliple columns to
make a row distinct

Use is_iceberg everywhere handling iceberg tables differs from other
sources of data.
[CT-276] Apache Iceberg Support #294

The _schema variable was used for non-iceberg tables but was being
overridden by work for iceberg v2 tables.  I've made it so the iceberg
condition will set _schema rather than blanket changing the schema for
all providers.
On second look I wasn't happy with my name choices for macro name and
method, hopefully what I have now makes more sense.

[CT-276] Apache Iceberg Support #294
Upon further investigation this check is not needed since
self.database will not be set.
- upstream dbt changed breaking the use of ParsedSourceDefinition, using
  SourceDefinition appears to work instead
- Added in change to include iceberg in adapters.sql:
   macro spark__alter_column_comment
tox on certain platforms will complain that /bin/bash is not allowed to
be used.  I'm allowing it to be used with this change.
Fixed conflicts:
dbt/adapters/spark/impl.py
dbt/adapters/spark/relation.py
dbt/include/spark/macros/materializations/incremental/incremental.sql
dbt/include/spark/macros/materializations/incremental/strategies.sql

Updated tox.ini removing my own addition of allowlist_externals, main
has it set now.
Cleanup based on comments
- Fixing up merge conflicts around Exception/Error types
- Putting back accidental newline removal in tox.ini
- in impl.py changed:
    List[agate.Row]
    to
    AttrDict

This fixed the pre-commit error after merging in changes from main.
I noticed that two Spark tests are failing, so tried to revert
some of the changes that seemed related. They are now passing
and also my Iceberg dbt project is still running fine
@VersusFacit VersusFacit mentioned this pull request Feb 24, 2023
4 tasks
@VersusFacit VersusFacit self-assigned this Feb 24, 2023
Copy link
Contributor

@Fokko Fokko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @VersusFacit for working on this. Looks even better now it has been refactored. Ran some tests locally, and it looks great!

image

@dparent1
Copy link
Contributor

I've run my original testcase with this change and it is working for me! Thanks so much for this @VersusFacit, I'll close my other PR.

dbt/adapters/spark/impl.py Outdated Show resolved Hide resolved
dbt/adapters/spark/impl.py Outdated Show resolved Hide resolved
dbt/adapters/spark/impl.py Outdated Show resolved Hide resolved
dbt/adapters/spark/impl.py Outdated Show resolved Hide resolved
dbt/adapters/spark/impl.py Show resolved Hide resolved
dbt/adapters/spark/relation.py Show resolved Hide resolved
@VersusFacit VersusFacit requested a review from mikealfare March 2, 2023 11:47
@VersusFacit VersusFacit force-pushed the CT-276/iceberg-support-patch branch from f878748 to 36f1d4c Compare March 2, 2023 21:50
@@ -111,4 +110,4 @@
amount: ["integer", "number"]
amount_usd: ["decimal", "number"]
created: ["timestamp", "string"]
"""
""".strip()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like my mom used to say when she was bundling me up for school, "Remember to strip your whitespace on seeds, dear"

@mikealfare mikealfare merged commit 8987e66 into main Mar 2, 2023
@mikealfare mikealfare deleted the CT-276/iceberg-support-patch branch March 2, 2023 23:24
@Fokko
Copy link
Contributor

Fokko commented Mar 3, 2023

Thanks @dparent1 for creating the PR and @VersusFacit & @mikealfare for picking this up! Great to see this happening!

@dparent1
Copy link
Contributor

dparent1 commented Mar 6, 2023

Thanks @dparent1 for creating the PR and @VersusFacit & @mikealfare for picking this up! Great to see this happening!

This made my day when I turned my computer on this morning. Also @Fokko was very much instrumental in keeping this PR alive, thanks goes to you as well as the others.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants