Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data Explorer: Always read fresh file contents when ingesting a delimited file into duckdb #5890

Merged
merged 3 commits into from
Jan 7, 2025

Conversation

wesm
Copy link
Contributor

@wesm wesm commented Jan 6, 2025

Addresses #5860. duckdb-wasm has some kind of file caching behavior that is not resilient to file changes on the underlying file system. This patch nukes this issue from orbit by always registering the file contents as a virtual file and reading from that. I think if and when we migrate to the duckdb NodeJS bindings per #5889 that we can avoid this chicanery.

e2e: @:data-explorer

QA Notes

Here was the reproducible failure from the issue:

  • Run iris |> readr::write_csv("test.csv")
  • Click on the file in the File Explorer to open it up; note that we correctly see the iris data
  • Run mtcars |> readr::write_csv("test.csv")
  • Again click on the file in the File Explorer to open it up; note that we incorrectly still see the iris data

@wesm wesm requested a review from juliasilge January 6, 2025 21:31
Copy link

github-actions bot commented Jan 6, 2025

E2E Tests 🚀  ?
This PR will run tests tagged with: @critical @data-explorer

Copy link
Contributor

@juliasilge juliasilge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this PR, I see the problem outlined in #5860 fixed. ✅

However, with this PR, I believe that I see we are back to no longer having #5551 fixed for parquet files i.e. I believe we are regressing from #5651. If an underlying parquet file changes while a Data Explorer tab is open, the Data Explorer does not update.

Note that an open Data Explorer tab will update if it is watching a CSV or TSV file. 😱

What do you think? Can we get the behavior to work for parquet files as well?

@juliasilge
Copy link
Contributor

Wait, I just tested this out with:

Positron Version: 2025.01.0 (Universal) build 152
Code - OSS Version: 1.95.0
Commit: 66aa3fb
Date: 2025-01-06T02:47:02.050Z
Electron: 32.2.1
Chromium: 128.0.6613.186
Node.js: 20.18.0
V8: 12.8.374.38-electron.0
OS: Darwin arm64 24.2.0

And it looks like this updating behavior never worked for parquet files? Is this expected?

@wesm
Copy link
Contributor Author

wesm commented Jan 7, 2025

I will take a closer look with Parquet files. There is probably a similar file handle caching problem which is very annoying

juliasilge
juliasilge previously approved these changes Jan 7, 2025
Copy link
Contributor

@juliasilge juliasilge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do see the problem fixed for delimited files, and maybe the treatment for Parquet files will need a more thorough revamp.

@wesm
Copy link
Contributor Author

wesm commented Jan 7, 2025

I used the same approach to fix the issue for Parquet files, too. This is a temporary triage to get the behavior correct -- it will need some additional work for avoid using too much memory in the common scenario that the file doesn't change (in which case we can use CREATE VIEW to avoid fully ingesting the Parquet file from scratch)

@wesm wesm added this to the 2025.01.0 Pre-Release milestone Jan 7, 2025
@wesm
Copy link
Contributor Author

wesm commented Jan 7, 2025

@juliasilge it would be great to get this into the 2025.01 release if possible! It can be merged whenever the build is green

@wesm wesm merged commit a66d65e into main Jan 7, 2025
6 checks passed
@wesm wesm deleted the bug/de-duckdb-file-updated branch January 7, 2025 18:48
@github-actions github-actions bot locked and limited conversation to collaborators Jan 7, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants