
[Bug]: S3 paths with special characters are not escaped #10163

Open
c-thiel opened this issue Jan 5, 2025 · 7 comments

Comments

@c-thiel
Contributor

c-thiel commented Jan 5, 2025

What happened

Maybe related to #8524

In our deployment, a user created a namespace with a question mark in its name. Since then we cannot run GC anymore. I managed to reproduce it in our tests.
GC fails with:

Caused by: java.lang.RuntimeException: Failed to get manifest files for ICEBERG_TABLE tests.a.C?mpl Name .table, content-ID 0749c4e9-ac98-4c82-ad2f-1bc387f5045f at commit 4696d1290522386c8ee4db51c64c1fd3ed567351a0026a5fefb6154faa1629d2 via s3://wyrm-lakehouse-pytest/tests.a.C?mpl Name /table_2559f350-e7c7-4216-b29e-5033091a4d06/metadata/00000-6cc479b6-0667-4c01-93e8-d59ed46c4c4c.metadata.json
...
Caused by: org.apache.iceberg.exceptions.RuntimeIOException: Failed to open file: s3://wyrm-lakehouse-pytest/tests.a.C?mpl Name /table_2559f350-e7c7-4216-b29e-5033091a4d06/metadata/snap-6014480353781191283-1-fe723339-fd23-4db3-a534-95f1de851af9.avro

The file exists. It seems to me that the ? is not escaped, and thus the rest of the path is interpreted as a query argument.
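
To illustrate why an unescaped ? would break the path (a generic java.net.URI demo, not Nessie's actual code path; bucket and key are made up, and the space in the real name would additionally make this single-argument parse throw):

    import java.net.URI;

    public class UnescapedQuestionMark {
        public static void main(String[] args) {
            // java.net.URI treats an unescaped '?' as the start of the query
            // component, so everything after it stops being part of the key.
            URI uri = URI.create("s3://my-bucket/ns?mpl/table/metadata/snap.avro");
            System.out.println(uri.getPath());  // /ns   <- key truncated at '?'
            System.out.println(uri.getQuery()); // mpl/table/metadata/snap.avro
        }
    }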

Is there a flag to make GC altogether more robust? I don't want it to abort halfway through because of a single failing table.

How to reproduce it

  1. Create a namespace "a" with a subnamespace "C?mpl3 Name " (I used Spark)
  2. Create a table in it called "table"
  3. Drop the table
  4. Run Nessie gc

Nessie server type (docker/uber-jar/built from source) and version

docker

Client type (Ex: UI/Spark/pynessie ...) and version

None - Nessie gc

Additional information

Nessie 0.99.0

@dimas-b
Member

dimas-b commented Jan 6, 2025

Sorry, @c-thiel , I was not able to reproduce this problem with the current main code (local build) 🤔

Could you share your full exception stack trace and exact gc CLI arguments (minus secret stuff, of course)?

@c-thiel
Contributor Author

c-thiel commented Jan 8, 2025

Hm, strange.

Here is the callstack:

java.lang.RuntimeException: java.lang.RuntimeException: java.lang.RuntimeException: Failed to get manifest files for ICEBERG_TABLE tests.2023-10-13-101415-edd293e4-1eee-47d7-b964-171340e13ae9.a.C?mpl Name .table, content-ID 0749c4e9-ac98-4c82-ad2f-1bc387f5045f at commit f150347568bc34bfd27b611d6588a00bf472e7bd76c477b3b96864deddf3ed5d via s3://my-bucket/tests.2023-10-13-101415-edd293e4-1eee-47d7-b964-171340e13ae9.a.C?mpl Name /table_2559f350-e7c7-4216-b29e-5033091a4d06/metadata/00000-6cc479b6-0667-4c01-93e8-d59ed46c4c4c.metadata.json
        at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62)
        at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:502)
        at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:486)
        at java.base/java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:540)
        at java.base/java.util.concurrent.ForkJoinTask.reportException(ForkJoinTask.java:567)
        at java.base/java.util.concurrent.ForkJoinTask.join(ForkJoinTask.java:653)
        at java.base/java.util.concurrent.ForkJoinPool.invoke(ForkJoinPool.java:2822)
        at org.projectnessie.gc.expire.local.DefaultLocalExpire.expire(DefaultLocalExpire.java:75)
        at org.projectnessie.gc.tool.cli.commands.BaseRepositoryCommand.expire(BaseRepositoryCommand.java:250)
        at org.projectnessie.gc.tool.cli.commands.MarkAndSweep.call(MarkAndSweep.java:57)
        at org.projectnessie.gc.tool.cli.commands.BaseRepositoryCommand.call(BaseRepositoryCommand.java:84)
        at org.projectnessie.gc.tool.cli.commands.BaseCommand.call(BaseCommand.java:26)
        at org.projectnessie.gc.tool.cli.commands.BaseCommand.call(BaseCommand.java:21)
        at picocli.CommandLine.executeUserObject(CommandLine.java:2045)
        at picocli.CommandLine.access$1500(CommandLine.java:148)
        at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2465)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2457)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2419)
        at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2277)
        at picocli.CommandLine$RunLast.execute(CommandLine.java:2421)
        at picocli.CommandLine.execute(CommandLine.java:2174)
        at org.projectnessie.gc.tool.cli.CLI.runMain(CLI.java:104)
        at org.projectnessie.gc.tool.cli.CLI.runMain(CLI.java:75)
        at org.projectnessie.gc.tool.cli.CLI.main(CLI.java:70)
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: Failed to get manifest files for ICEBERG_TABLE tests.2023-10-13-101415-edd293e4-1eee-47d7-b964-171340e13ae9.a.C?mpl Name .table, content-ID 0749c4e9-ac98-4c82-ad2f-1bc387f5045f at commit f150347568bc34bfd27b611d6588a00bf472e7bd76c477b3b96864deddf3ed5d via s3://my-bucket/tests.2023-10-13-101415-edd293e4-1eee-47d7-b964-171340e13ae9.a.C?mpl Name /table_2559f350-e7c7-4216-b29e-5033091a4d06/metadata/00000-6cc479b6-0667-4c01-93e8-d59ed46c4c4c.metadata.json
        at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62)
        at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:502)
        at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:486)
        at java.base/java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:540)
        at java.base/java.util.concurrent.ForkJoinTask.reportException(ForkJoinTask.java:567)
        at java.base/java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:670)
        at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateParallel(ReduceOps.java:927)
        at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
        at java.base/java.util.stream.ReferencePipeline.reduce(ReferencePipeline.java:657)
        at org.projectnessie.gc.expire.local.DefaultLocalExpire.expireInForkJoinPool(DefaultLocalExpire.java:96)
        at java.base/java.util.concurrent.ForkJoinTask$AdaptedCallable.exec(ForkJoinTask.java:1456)
        at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:387)
        at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1312)
        at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1843)
        at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1808)
        at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:188)
Caused by: java.lang.RuntimeException: Failed to get manifest files for ICEBERG_TABLE tests.2023-10-13-101415-edd293e4-1eee-47d7-b964-171340e13ae9.a.C?mpl Name .table, content-ID 0749c4e9-ac98-4c82-ad2f-1bc387f5045f at commit f150347568bc34bfd27b611d6588a00bf472e7bd76c477b3b96864deddf3ed5d via s3://my-bucket/tests.2023-10-13-101415-edd293e4-1eee-47d7-b964-171340e13ae9.a.C?mpl Name /table_2559f350-e7c7-4216-b29e-5033091a4d06/metadata/00000-6cc479b6-0667-4c01-93e8-d59ed46c4c4c.metadata.json
        at org.projectnessie.gc.iceberg.IcebergContentToFiles.lambda$extractTableFiles$0(IcebergContentToFiles.java:170)
        at java.base/java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:273)
        at java.base/java.util.stream.Streams$StreamBuilderImpl.forEachRemaining(Streams.java:411)
        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
        at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
        at java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.forEachRemaining(StreamSpliterators.java:310)
        at java.base/java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:735)
        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
        at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
        at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
        at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
        at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
        at java.base/java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:276)
        at java.base/java.util.HashMap$KeySpliterator.forEachRemaining(HashMap.java:1715)
        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
        at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
        at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921)
        at java.base/java.util.stream.ReduceOps$5.evaluateSequential(ReduceOps.java:258)
        at java.base/java.util.stream.ReduceOps$5.evaluateSequential(ReduceOps.java:248)
        at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.base/java.util.stream.ReferencePipeline.count(ReferencePipeline.java:709)
        at org.projectnessie.gc.expire.PerContentDeleteExpired.identifyLiveFiles(PerContentDeleteExpired.java:136)
        at org.projectnessie.gc.expire.PerContentDeleteExpired.expire(PerContentDeleteExpired.java:73)
        at org.projectnessie.gc.expire.local.DefaultLocalExpire.expireSingleContent(DefaultLocalExpire.java:109)
        at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
        at java.base/java.util.concurrent.ConcurrentHashMap$KeySpliterator.forEachRemaining(ConcurrentHashMap.java:3573)
        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
        at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
        at java.base/java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:960)
        at java.base/java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:934)
        at java.base/java.util.stream.AbstractTask.compute(AbstractTask.java:327)
        at java.base/java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:754)
        ... 5 more
Caused by: org.apache.iceberg.exceptions.RuntimeIOException: Failed to open file: s3://my-bucket/tests.2023-10-13-101415-edd293e4-1eee-47d7-b964-171340e13ae9.a.C?mpl Name /table_2559f350-e7c7-4216-b29e-5033091a4d06/metadata/snap-6014480353781191283-1-fe723339-fd23-4db3-a534-95f1de851af9.avro
        at org.apache.iceberg.avro.AvroIterable.newFileReader(AvroIterable.java:104)
        at org.apache.iceberg.avro.AvroIterable.iterator(AvroIterable.java:77)
        at org.apache.iceberg.avro.AvroIterable.iterator(AvroIterable.java:37)
        at org.apache.iceberg.relocated.com.google.common.collect.Iterables.addAll(Iterables.java:333)
        at org.apache.iceberg.relocated.com.google.common.collect.Lists.newLinkedList(Lists.java:242)
        at org.apache.iceberg.ManifestLists.read(ManifestLists.java:45)
        at org.apache.iceberg.BaseSnapshot.cacheManifests(BaseSnapshot.java:146)
        at org.apache.iceberg.BaseSnapshot.allManifests(BaseSnapshot.java:164)
        at org.projectnessie.gc.iceberg.IcebergContentToFiles.allManifests(IcebergContentToFiles.java:268)
        at org.projectnessie.gc.iceberg.IcebergContentToFiles.allManifestsAndDataFiles(IcebergContentToFiles.java:254)
        at org.projectnessie.gc.iceberg.IcebergContentToFiles.lambda$extractTableFiles$0(IcebergContentToFiles.java:155)
        ... 37 more
Caused by: org.apache.avro.InvalidAvroMagicException: Not an Avro data file
        at org.apache.avro.file.DataFileReader.openReader(DataFileReader.java:78)
        at org.apache.iceberg.avro.AvroIterable.newFileReader(AvroIterable.java:102)
        ... 47 more

The gc command is:

        - gc
        - --uri
        - http://....:19120/api/v2
        - --inmemory
        - --default-cutoff
        - PT80H
        - --iceberg
        - $(S3_CREDENTIALS)

The credential has access to the file. I am on version 0.99; if main doesn't have the problem, we could just try an update and see whether it persists.

@dimas-b
Member

dimas-b commented Jan 8, 2025

@c-thiel : From the stack trace it looks like this is not a problem with S3 URI handling, but might be a problem with the file contents. Could you double-check in the S3 UI/CLI whether that Avro file is correct?
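
If you download the object (quoting the key carefully so the shell/CLI does not mangle the ? and space), a quick check is to look at its first four bytes: every Avro data file starts with the magic bytes 'O', 'b', 'j', 0x01, which is exactly what InvalidAvroMagicException complains about. A small hypothetical helper (not part of Nessie) for a local copy of the file:

    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.Arrays;

    public class AvroMagicCheck {
        // Every Avro data file starts with the 4-byte magic 'O', 'b', 'j', 0x01.
        private static final byte[] AVRO_MAGIC = {'O', 'b', 'j', 1};

        public static void main(String[] args) throws Exception {
            // args[0]: path to the locally downloaded .avro object
            try (InputStream in = Files.newInputStream(Path.of(args[0]))) {
                byte[] head = in.readNBytes(4);
                System.out.println(Arrays.equals(head, AVRO_MAGIC)
                    ? "looks like an Avro data file"
                    : "NOT an Avro data file, first bytes: " + Arrays.toString(head));
            }
        }
    }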

@dimas-b
Member

dimas-b commented Jan 8, 2025

I tried Spark + AWS and created a top-level namespace with a ? and a space in the name.

Then Iceberg fails to insert on the Spark side.

spark-sql ()> create namespace `ns?test abc`;

then

use `ns?test abc`

then

spark-sql (`ns?test abc`)> create table t2 (n string)
                         > ;
Time taken: 1.609 seconds
spark-sql (`ns?test abc`)> insert into t2 values ('t2');
25/01/08 10:57:35 WARN SnapshotProducer: Failed to load committed table metadata or during cleanup, skipping further cleanup
org.apache.iceberg.exceptions.RuntimeIOException: Failed to open file: s3://EXAMPLE/ns?test abc/t2_11c36b10-a8c1-477a-8a91-b90733dad13f/metadata/snap-4983815506953660385-1-b650452a-4aba-45ae-b417-c9a3adc99462.avro
	at org.apache.iceberg.avro.AvroIterable.newFileReader(AvroIterable.java:104)
	at org.apache.iceberg.avro.AvroIterable.iterator(AvroIterable.java:77)
	at org.apache.iceberg.avro.AvroIterable.iterator(AvroIterable.java:37)
	at org.apache.iceberg.relocated.com.google.common.collect.Iterables.addAll(Iterables.java:333)
	at org.apache.iceberg.relocated.com.google.common.collect.Lists.newLinkedList(Lists.java:258)
	at org.apache.iceberg.ManifestLists.read(ManifestLists.java:45)
	at org.apache.iceberg.BaseSnapshot.cacheManifests(BaseSnapshot.java:146)
	at org.apache.iceberg.BaseSnapshot.allManifests(BaseSnapshot.java:164)
	at org.apache.iceberg.SnapshotProducer.commit(SnapshotProducer.java:451)
	at org.apache.iceberg.spark.source.SparkWrite.commitOperation(SparkWrite.java:233)
	at org.apache.iceberg.spark.source.SparkWrite$BatchAppend.commit(SparkWrite.java:301)
	at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2(WriteToDataSourceV2Exec.scala:399)
	at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2$(WriteToDataSourceV2Exec.scala:359)
	at org.apache.spark.sql.execution.datasources.v2.AppendDataExec.writeWithV2(WriteToDataSourceV2Exec.scala:225)
	at org.apache.spark.sql.execution.datasources.v2.V2ExistingTableWriteExec.run(WriteToDataSourceV2Exec.scala:337)
	at org.apache.spark.sql.execution.datasources.v2.V2ExistingTableWriteExec.run$(WriteToDataSourceV2Exec.scala:336)
	at org.apache.spark.sql.execution.datasources.v2.AppendDataExec.run(WriteToDataSourceV2Exec.scala:225)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:107)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:125)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:108)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:66)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:107)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98)
	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:461)
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:76)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:461)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:32)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:437)
	at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:98)
	at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:85)
	at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:83)
	at org.apache.spark.sql.Dataset.<init>(Dataset.scala:220)
	at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:100)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
	at org.apache.spark.sql.SparkSession.$anonfun$sql$4(SparkSession.scala:691)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:682)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:713)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:744)
	at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:651)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:68)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:501)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1(SparkSQLCLIDriver.scala:619)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1$adapted(SparkSQLCLIDriver.scala:613)
	at scala.collection.Iterator.foreach(Iterator.scala:943)
	at scala.collection.Iterator.foreach$(Iterator.scala:943)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
	at scala.collection.IterableLike.foreach(IterableLike.scala:74)
	at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
	at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processLine(SparkSQLCLIDriver.scala:613)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:310)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:75)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:52)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1029)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:194)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:217)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1120)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1129)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.iceberg.shaded.org.apache.avro.InvalidAvroMagicException: Not an Avro data file
	at org.apache.iceberg.shaded.org.apache.avro.file.DataFileReader.openReader(DataFileReader.java:79)
	at org.apache.iceberg.avro.AvroIterable.newFileReader(AvroIterable.java:102)
	... 74 more
Time taken: 2.957 seconds

So, my guess is that it's a bug in Iceberg code 🤷
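
For what it's worth, plain java.net.URI shows the two behaviors side by side: the single-argument form re-interprets ? as the query delimiter (as sketched above), while the multi-argument constructors percent-encode characters that are illegal in a path. Whether Iceberg's S3 path handling hits the former is only my guess; this is a generic illustration reusing the EXAMPLE bucket from the trace:

    import java.net.URI;
    import java.net.URISyntaxException;

    public class EscapeDemo {
        public static void main(String[] args) throws URISyntaxException {
            // The multi-argument constructor quotes illegal path characters,
            // so '?' and ' ' survive as %3F and %20 instead of being re-parsed.
            URI escaped = new URI("s3", "EXAMPLE", "/ns?test abc/t2/metadata/snap.avro", null);
            System.out.println(escaped.toASCIIString());
            // s3://EXAMPLE/ns%3Ftest%20abc/t2/metadata/snap.avro
        }
    }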

@dimas-b
Member

dimas-b commented Jan 8, 2025

@c-thiel : Does this use case work with a JDBC Iceberg catalog?

@c-thiel
Contributor Author

c-thiel commented Jan 8, 2025

@c-thiel : Does this use case work with a JDBC Iceberg catalog?

I haven't actually tried. Thanks for investigating, @dimas-b.

I don't really care much about this specific use case, but if someone manages to create a broken table (a very small fraction), it would be great if Nessie continued with GC instead of aborting. I'm not really sure how to get GC running normally again without touching MongoDB.

@dimas-b
Member

dimas-b commented Jan 8, 2025

Nessie GC uses the Iceberg jars for metadata access, very similarly to how Spark uses them.

So it is conceivable that, for example, Python might be able to write something that Iceberg Java would not be able to read 🤷
