[CT-783] Spark with Iceberg tables: catalog.json is empty #376
Comments
Hi @zsvoboda, thank you for opening this issue! Apache Iceberg is not yet an officially supported file format: https://docs.getdbt.com/reference/resource-configs/spark-configs#configuring-tables Would you be interested in contributing this? If so, we can likely take some time to give guidance on how this could be implemented. I'll mark this as help wanted, as it's likely not something we can prioritize in the near future.
I'm not sure what the longer-term solution is here. If using OSS Delta, Iceberg, or other file formats, do we need to revert to the much older way of doing this (
I think there will eventually be a migration to some sort of information_schema, but we'd need a generic API to support it (like MERGE INTO does) so that data sources could implement it. That will probably be a while, so having format v1 vs. v2 for the provider in the table's general configuration would be a good idea. That's the difference in Spark between the two statements (and why the schema looks the way it does) and the SQL queries they need. But information_schema is not yet part of the Spark catalog API at all, so I wouldn't recommend relying on it if more formats are to be supported. My 2 cents. Happy to help where I can when I get back from my break if there's interest!
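To make the v1/v2 difference concrete: the per-schema metadata statement dbt-spark issues is rejected by v2 catalogs such as Iceberg's SparkCatalog, while a per-table statement succeeds. A rough Spark SQL sketch (the schema and table names here are illustrative, not from this issue):

```sql
-- Per-schema call dbt-spark uses today; rejected by v2 (Iceberg) catalogs
-- with "SHOW TABLE EXTENDED is not supported for v2 tables":
SHOW TABLE EXTENDED IN analytics LIKE '*';

-- Per-table fallback that v2 catalogs do accept (one call per relation):
DESCRIBE TABLE EXTENDED analytics.orders;
```

The trade-off is the one the thread discusses: the per-table form means one round trip per relation instead of one per schema, which is the "much older way of doing this" referenced above.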
Is there some workaround for this at the moment?
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.
This issue has been fixed in #294.
Describe the bug
Generating the docs catalog for Iceberg tables produces an empty catalog.json. I'm using Spark 3.2.1 with Iceberg 0.13.2 (session started via spark-shell --packages org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.13.2).
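For anyone trying to reproduce this, the runtime jar alone is not enough; an Iceberg-enabled session also needs catalog configuration. A sketch of one such invocation, with the session-extension and catalog settings taken from the Iceberg quickstart (the catalog type here is an assumption; the original report does not say which was used):

```
spark-shell \
  --packages org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.13.2 \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
  --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.spark_catalog.type=hive
```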
Steps To Reproduce
Expected behavior
catalog.json is populated with the table schema and docs.
Screenshots and log output
System information
The output of dbt --version:
The operating system you're using: macOS Monterey
The output of python --version: Python 3.8.12
Additional context
This seems to be a similar problem to the one with Delta tables (#295):
SQL Error: org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.sql.AnalysisException: SHOW TABLE EXTENDED is not supported for v2 tables.;
ShowTableExtended *, [namespace#21, tableName#22, isTemporary#23, information#24]
+- ResolvedNamespace org.apache.iceberg.spark.SparkCatalog@7929bdd7
Caused by: org.apache.spark.sql.AnalysisException: SHOW TABLE EXTENDED is not supported for v2 tables.;
ShowTableExtended *, [namespace#21, tableName#22, isTemporary#23, information#24]
+- ResolvedNamespace org.apache.iceberg.spark.SparkCatalog@7929bdd7