[Bug]: S3 paths with special characters are not escaped #10163
Comments
Sorry, @c-thiel, I was not able to reproduce this problem with the current main. Could you share your full exception stack trace and the exact gc command?
Hm, strange. Here is the call stack:
The gc command is:
- gc
- --uri
- http://....:19120/api/v2
- --inmemory
- --default-cutoff
- PT80H
- --iceberg
- $(S3_CREDENTIALS)
The credential has access to the file. I am on version 0.99. If main doesn't have the problem, we could just try an update and see if the problem persists.
@c-thiel: From the stack trace it looks like it is not a problem with S3 URI handling, but might be a problem with the file contents. Could you double check in the S3 UI/CLI whether that avro file is correct?
I tried Spark + AWS and created the top-level namespace with Then Iceberg fails to insert on the Spark side.
then
then
So, my guess is that it's a bug in Iceberg code 🤷
@c-thiel: Does this use case work with a JDBC Iceberg catalog?
I haven't tried, actually. Thanks for investigating, @dimas-b. I don't really care much about this specific use case, but if someone manages to create a broken table (a very small fraction), it would be great if Nessie continued with gc instead of aborting. I'm not really sure how I can get gc to run normally again without touching MongoDB.
Nessie GC uses Iceberg jars for metadata access very similarly to how Spark uses them. So it is conceivable that, for example, Python might be able to write something that Iceberg Java would not be able to read 🤷
What happened
Maybe related to #8524
In our deployment we had a user who created a namespace with a question mark in its name. Since then we cannot run gc anymore. I managed to reproduce it in our tests.
GC fails with:
The file exists. It seems to me that the ? is not escaped, and thus the rest of the path is interpreted as a query argument.
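To illustrate the suspicion above (a hypothesis only, not a confirmed diagnosis of what Nessie GC or Iceberg does internally), here is a minimal Python sketch of how a generic URI parser treats an unescaped ? in an object path. The bucket name and path are made up for illustration.

```python
from urllib.parse import quote, urlsplit

# Hypothetical object location; the namespace "ns?name" contains a question mark.
bucket = "example-bucket"
object_path = "/warehouse/ns?name/table/metadata/snap-1.avro"

# Unescaped: a generic URI parser cuts the path at the first "?" and
# treats everything after it as the query string.
broken = urlsplit("s3://" + bucket + object_path)
print(broken.path)   # path component stops before the "?"
print(broken.query)  # the remainder of the object key lands here

# Percent-encoding the path first ("?" becomes "%3F", "/" is kept)
# leaves the whole key inside the path component.
fixed = urlsplit("s3://" + bucket + quote(object_path))
print(fixed.path)
print(fixed.query)
```

If something like this happens anywhere along the read path, the lookup would target a key that ends at the namespace prefix, which would look like a missing or unreadable file even though the object exists.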
Is there a flag to make gc altogether more robust? I don't want to abort it halfway through due to a single table failure.
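What I have in mind is roughly the following behavior. This is only a sketch of the idea in Python, not an existing Nessie GC option or API; process_all, fake_process, and the table names are hypothetical.

```python
from typing import Callable, Dict, Iterable


def process_all(
    tables: Iterable[str],
    process_table: Callable[[str], None],
) -> Dict[str, Exception]:
    """Run process_table on every table, collecting failures instead of aborting."""
    failures: Dict[str, Exception] = {}
    for table in tables:
        try:
            process_table(table)
        except Exception as exc:  # isolate the failure to this one table
            failures[table] = exc
    return failures


def fake_process(table: str) -> None:
    # Stand-in for per-table work; fails for the table with "?" in its name.
    if "?" in table:
        raise ValueError(f"cannot read metadata for {table}")


failures = process_all(["db.ok_one", "db?.broken", "db.ok_two"], fake_process)
for table, error in failures.items():
    print(f"skipped {table}: {error}")
```

The run would then finish for all readable tables and report the broken ones at the end, so a single bad table does not block everything else.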
How to reproduce it
Nessie server type (docker/uber-jar/built from source) and version
docker
Client type (Ex: UI/Spark/pynessie ...) and version
None - Nessie gc
Additional information
Nessie 0.99.0