Is predicate pushdown supported when using the Gateway with posix backend? #684
-
Thank you for this tool. I was wondering whether tools like duckdb will be able to do predicate pushdown when using the Gateway with the posix backend. If my understanding is correct, they do the predicate pushdown through range headers in the GET object call. So does the gateway translate that into the corresponding posix calls?
-
I don't have direct experience with duckdb, but according to this https://stackoverflow.com/questions/76696239/predicate-pushdown-in-duckdb-for-a-parquet-file-in-s3 you are correct that it will retrieve specific sections of the objects using range GETs.
The gateway does translate the range headers to only read the requested section of the file using io.NewSectionReader():
versitygw/backend/posix/posix.go
Line 1729 in d521c66
So I expect this to be optimized to read only the file offsets requested by duckdb. If this ends up working, it might be worth an article in the wiki on how to set up duckdb to access files via versitygw. Thanks for sharing this use case, it's not something I have previously seen.
-
I was able to run a quick test using the duckdb cli. I configured it for use against my local versitygw with the following:
and I downloaded their example parquet file:
then in duckdb:
and ran their test query:
It seems to be issuing small range GETs without needing to download the whole test file. Is this the behavior you were looking for?
-
Thank you for checking this, Ben McClelland. Since we have S3 API compatibility with Versity, I expected duckdb to do just that. I mean, the ability to "issue small range GETs without needing to download the whole file" was kind of expected to be there with Versity. Whether that would translate into smaller disk I/O was what I was wondering. I think the following part answers my question:
rdr := io.NewSectionReader(f, startOffset, length)
It must indeed be resulting in smaller disk I/O. Thank you once again for checking this.