Support content diff for parquet files #8439

Open
isaac-jordan opened this issue Dec 19, 2024 · 0 comments
If I make a change to a Parquet file in a commit and then look at the commit diff in the UI, I'd like to be able to see what was changed in the Parquet file.

As an example of the current experience, here is what a changed Parquet file looks like in a commit diff (in this case the file was deleted entirely):

image

In this example, I'd maybe expect a view of the Parquet file (similar to the DuckDB integration) showing the contents of the file that was deleted. This would be similar to what you get when viewing a text format such as JSON, e.g. this snippet of JSON being added:

image

Example scenarios:

  • new rows added
  • some rows deleted
  • cell values changed
  • columns added
  • columns deleted
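
To make these scenarios concrete, here is a rough sketch of what such a content diff could report. It is only an illustration, not how lakeFS works or would implement this: it assumes both versions of the file have already been read into lists of row dictionaries (for example via DuckDB or a Parquet reader) and that rows can be matched on a key column. The function name `diff_tables`, the `key` parameter, and the sample data are all hypothetical.

```python
from typing import Any

Row = dict[str, Any]


def diff_tables(old: list[Row], new: list[Row], key: str) -> dict[str, Any]:
    """Compare two row sets by a key column (hypothetical helper).

    Assumes the rows were already loaded from the two Parquet versions
    and that `key` uniquely identifies a row in each version.
    """
    old_cols = {c for row in old for c in row}
    new_cols = {c for row in new for c in row}
    old_by_key = {row[key]: row for row in old}
    new_by_key = {row[key]: row for row in new}

    # Only compare cell values in columns present in both versions;
    # added and deleted columns are reported separately.
    shared_cols = (old_cols & new_cols) - {key}
    changed = []
    for k in sorted(old_by_key.keys() & new_by_key.keys()):
        for col in sorted(shared_cols):
            before = old_by_key[k].get(col)
            after = new_by_key[k].get(col)
            if before != after:
                changed.append((k, col, before, after))

    return {
        "columns_added": sorted(new_cols - old_cols),
        "columns_deleted": sorted(old_cols - new_cols),
        "rows_added": sorted(new_by_key.keys() - old_by_key.keys()),
        "rows_deleted": sorted(old_by_key.keys() - new_by_key.keys()),
        "cells_changed": changed,
    }


# Made-up sample data covering all five scenarios.
old_rows = [
    {"id": 1, "name": "a", "score": 10},
    {"id": 2, "name": "b", "score": 20},
    {"id": 3, "name": "c", "score": 30},
]
new_rows = [
    {"id": 1, "name": "a", "city": "x"},
    {"id": 2, "name": "B", "city": "y"},
    {"id": 4, "name": "d", "city": "z"},
]
result = diff_tables(old_rows, new_rows, key="id")
```

A UI could render each of these five buckets separately. Files without a natural key column would need a different matching strategy (for example, comparing rows by position), which this sketch does not cover.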

Originally asked on Slack: https://lakefs.slack.com/archives/C016726JLJW/p1734620253055009

@isaac-jordan isaac-jordan changed the title Support visual diff for parquet files Support content diff for parquet files Dec 19, 2024
@talSofer talSofer added the P3 label Dec 23, 2024