Replies: 5 comments
Looking at the checkpoint code, I see the line that sets `data_change`. Am I misunderstanding what checkpoint does? Why is it being set to `false`?
@echai58 That change seems to have been made 3 years ago ^^, but the protocol mentions data_change=false for files that were already present in a table, so I guess that's why it is set that way in this context. You could check how delta-spark behaves and whether it preserves data_change as-is.
Yeah, in my eyes a checkpoint isn't really a transaction because it doesn't generate a commit file, so I feel it should preserve the value. But yeah, I'll look into what the delta-spark behavior is, good call.
@ion-elgreco Seems like checkpointing with delta-spark also sets `dataChange` to `false`. I brought this up because I have a use case where I'm interested in figuring out which partitions were edited in the past
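For the use case above, one possible workaround is to read `dataChange` straight from the JSON commit files instead of from the checkpointed state. This is only a rough sketch, assuming the commit files are still retained under `_delta_log` and follow the newline-delimited JSON layout from the Delta protocol. The helper name `partitions_with_data_change` is made up for illustration and is not part of delta-rs or delta-spark.

```python
import json
from pathlib import Path


def partitions_with_data_change(log_dir):
    """Map commit version -> set of partitions touched with dataChange=true.

    Hypothetical helper: reads only the numbered JSON commit files in
    `log_dir` and ignores checkpoint parquet files entirely.
    """
    changed = {}
    for commit in sorted(Path(log_dir).glob("*.json")):
        if not commit.stem.isdigit():
            continue  # skip anything that is not a numbered commit file
        version = int(commit.stem)
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            # Both add and remove actions carry a dataChange flag.
            for kind in ("add", "remove"):
                body = action.get(kind)
                if body and body.get("dataChange") is True:
                    values = body.get("partitionValues") or {}
                    key = tuple(sorted(values.items()))
                    changed.setdefault(version, set()).add(key)
    return changed
```

Calling `partitions_with_data_change("path/to/table/_delta_log")` would return something like a dict keyed by commit version. It only sees commits whose JSON files have not been cleaned up yet, so it is not a substitute for the flag being preserved in the checkpoint.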
Environment
Delta-rs version: 0.16.3
Binding: Python
Bug
What happened:
Before checkpointing, `table.get_add_actions(flatten=True).to_pandas()` looks like:
After checkpointing, the same call looks like:
which shows everything is unchanged, except `data_change` goes from True to False.
Examining the checkpoint parquet file, I see the following entry for `add`:
while the original commit file shows:
What you expected to happen:
Unless I'm misunderstanding what `checkpoint` does, I believe it is a bug that `dataChange` is being set to `False`, instead of preserving the value from the original commit file.
How to reproduce it: