However, yesterday we realized this causes a problem: it forces every barrier to have more than 256 rows, which could cause barriers to pile up in some bad cases (such as when there is huge amplification in a downstream MV).
For each barrier, we currently force the snapshot read to read at least 1 chunk. That can cause issues when there is huge amplification.
So instead we read exactly 1 row, if no rows were read before we received the barrier.
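To make the policy change concrete, here is a minimal sketch of the per-barrier accounting. It is not the actual executor code: `Event`, `rows_per_barrier`, and the `forced_rows` parameter are hypothetical names invented for illustration, and `CHUNK_SIZE = 256` is assumed from the 256-row figure mentioned above.

```rust
/// Assumed chunk size, taken from the 256-row figure in the discussion.
const CHUNK_SIZE: usize = 256;

/// Hypothetical events seen by a snapshot-read loop.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Event {
    /// The snapshot stream yielded a chunk with this many rows.
    Chunk(usize),
    /// An upstream barrier arrived.
    Barrier,
}

/// Returns the number of snapshot rows emitted in each epoch (one entry per barrier).
/// If an epoch read nothing before its barrier, `forced_rows` rows are read
/// so that the backfill always makes some progress.
fn rows_per_barrier(events: &[Event], forced_rows: usize) -> Vec<usize> {
    let mut per_barrier = Vec::new();
    let mut rows_this_epoch = 0;
    for event in events {
        match *event {
            Event::Chunk(n) => rows_this_epoch += n,
            Event::Barrier => {
                if rows_this_epoch == 0 {
                    // No progress in this epoch: force a read before
                    // passing the barrier on.
                    rows_this_epoch = forced_rows;
                }
                per_barrier.push(rows_this_epoch);
                rows_this_epoch = 0;
            }
        }
    }
    per_barrier
}
```

With three back-to-back barriers and no chunks in between, `rows_per_barrier(&events, CHUNK_SIZE)` models the old behaviour (256 rows forced per barrier), while `rows_per_barrier(&events, 1)` models the new one (1 row per barrier), which is what keeps amplification downstream bounded.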
kwannoel changed the title from "snapshot read: Read 1 record at least rather than 1 chunk at least per barrier" to "snapshot read: Read 1 record exactly rather than 1 chunk at least per barrier" on Dec 20, 2023
We spend a lot of time skipping tombstones when creating the iterator, but in the end, we are only going to read one row. This seems quite wasteful and inefficient to me. 😕
The dominant cost in that scenario is skipping tombstones. Even though we only read 1 row, we have already skipped the tombstones, so in the next epoch we don't need to skip them again. It will therefore be pretty fast to read the next CHUNK_SIZE - 1 rows.
It's a worthwhile trade-off IMO, so that we can deal with amplification cases.
In normal cases (few or no tombstones) we won't encounter this cost.
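The amortization argument can be illustrated with a toy model. This is only a sketch under assumptions: the store is modelled as a slice in which `None` stands for a tombstone, and `scan` is a hypothetical helper, not a real storage API. The point is that the tombstone-skipping cost is paid once, by the 1-row read, and the follow-up read that starts from the saved position skips nothing.

```rust
/// Scans `store` from `start`, skipping tombstones (`None`), until `limit`
/// live rows have been read or the store is exhausted.
/// Returns (rows read, tombstones skipped, position to resume from).
fn scan(store: &[Option<u64>], start: usize, limit: usize) -> (Vec<u64>, usize, usize) {
    let mut rows = Vec::new();
    let mut skipped = 0;
    let mut pos = start;
    while pos < store.len() && rows.len() < limit {
        match store[pos] {
            Some(value) => rows.push(value),
            None => skipped += 1, // tombstone: costs time, yields nothing
        }
        pos += 1;
    }
    (rows, skipped, pos)
}
```

For a store that begins with 1000 tombstones followed by 256 live rows, `scan(&store, 0, 1)` skips all 1000 tombstones to return a single row, and a second `scan` from the returned position reads the remaining 255 rows without skipping any.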
credit: @fuyufjh and @st1page