-
Notifications
You must be signed in to change notification settings - Fork 92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Data Explorer: Always read fresh file contents when ingesting a delimited file into duckdb #5890
Conversation
E2E Tests 🚀 ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this PR, I see the problem outlined in #5860 fixed. ✅
However, with this PR, I believe that I see we are back to no longer having #5551 fixed for parquet files i.e. I believe we are regressing from #5651. If an underlying parquet file changes while a Data Explorer tab is open, the Data Explorer does not update.
Note that an open Data Explorer tab will update if it is watching a CSV or TSV file. 😱
What do you think? Can we get the behavior to work for parquet files as well?
Wait, I just tested this out with: Positron Version: 2025.01.0 (Universal) build 152 And it looks like this updating behavior never worked for parquet files? Is this expected? |
I will take a closer look with Parquet files. There is probably a similar file handle caching problem which is very annoying |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do see the problem fixed for delimited files, and maybe the treatment for Parquet files will need a more thorough revamp.
88a65ed
to
87b948f
Compare
I used the same approach to fix the issue for Parquet files, too. This is a temporary triage to get the behavior correct -- it will need some additional work for avoid using too much memory in the common scenario that the file doesn't change (in which case we can use |
@juliasilge it would be great to get this into the 2025.01 release if possible! It can be merged whenever the build is green |
Addresses #5860. duckdb-wasm has some kind of file caching behavior that is not resilient to file changes on the underlying file system. This patch nukes this issue from orbit by always registering the file contents as a virtual file and reading from that. I think if and when we migrate to the duckdb NodeJS bindings per #5889 that we can avoid this chicanery.
e2e: @:data-explorer
QA Notes
Here was the reproducible failure from the issue: