snapshot read: Read 1 record exactly rather than 1 chunk at least per barrier #14077

kwannoel · 2023-12-20T03:44:45Z

However, yesterday we suddenly realized this will cause problems: it forces every barrier to have more than 256 rows, which could cause barrier pile-up in some bad cases (such as when there is huge amplification in a downstream MV).

For each barrier, we currently enforce reading at least 1 chunk. That can cause issues when there is huge amplification.
So we should instead read exactly 1 row, if no rows were read before we received the barrier.
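
To make the difference concrete, here is a minimal sketch of the two policies. It is not the actual backfill code: `Policy`, `forced_rows`, and `downstream_rows` are hypothetical names, `CHUNK_SIZE = 256` is taken from the 256-row figure quoted above, the amplification factor is made up, and it assumes both policies only force a read when nothing was read before the barrier.

```rust
/// Assumed chunk size, based on the 256-row figure mentioned in this issue.
const CHUNK_SIZE: usize = 256;

/// The two per-barrier snapshot read policies discussed here (hypothetical names).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Policy {
    /// Current behaviour: force a whole chunk.
    AtLeastOneChunk,
    /// Proposed behaviour: force exactly one row.
    ExactlyOneRow,
}

/// Number of snapshot rows we are forced to read when a barrier arrives.
/// Assumption: a read is only forced if no rows were read before the barrier.
fn forced_rows(policy: Policy, rows_read_before_barrier: usize) -> usize {
    if rows_read_before_barrier > 0 {
        return 0;
    }
    match policy {
        Policy::AtLeastOneChunk => CHUNK_SIZE,
        Policy::ExactlyOneRow => 1,
    }
}

/// Rows a downstream MV has to process, for a hypothetical amplification factor.
fn downstream_rows(upstream_rows: usize, amplification: usize) -> usize {
    upstream_rows * amplification
}
```

With a hypothetical amplification of 1000x, `downstream_rows(forced_rows(Policy::AtLeastOneChunk, 0), 1000)` is 256,000 rows per barrier, while the same call with `Policy::ExactlyOneRow` is 1,000.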

credit: @fuyufjh and @st1page

StrikeW · 2023-12-20T05:31:55Z

> For each barrier, we currently enforce to read 1 chunk at least.

IIRC, the background is to skip the tombstone in storage, right?

BugenZhao · 2023-12-21T03:04:37Z

We spend a lot of time skipping tombstones when creating the iterator, but in the end, we are only going to read one line. This was quite wasteful and inefficient to me. 😕

kwannoel · 2023-12-21T03:14:48Z

> We spend a lot of time skipping tombstones when creating the iterator, but in the end, we are only going to read one line. This was quite wasteful and inefficient to me. 😕

The dominant cost in that scenario is skipping tombstones. Even though we only read 1 row, in the next epoch, since we already skipped the tombstones, we don't need to skip them again. So it will be pretty fast to read the next CHUNK_SIZE - 1 rows.

It's a worthwhile trade-off IMO so we can deal with amplification cases.

In normal cases (few or no tombstones) we won't encounter it.
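
A toy model of the amortization argument, in case it helps. This is only a sketch and not how the storage iterator is implemented: `ToyStore`, `Progress`, `read_rows`, and `demo_store` are hypothetical, and it assumes the backfill keeps its read position across epochs so the next read starts after the tombstones it already skipped.

```rust
/// A toy ordered store: `None` marks a tombstone, `Some(v)` a live row.
struct ToyStore {
    entries: Vec<Option<u64>>,
}

/// Snapshot read progress, kept across epochs: index of the next entry to examine.
struct Progress {
    next: usize,
}

/// Reads up to `limit` live rows starting at `progress`.
/// Returns the rows and the number of entries scanned (live rows plus tombstones).
fn read_rows(store: &ToyStore, progress: &mut Progress, limit: usize) -> (Vec<u64>, usize) {
    let mut rows = Vec::new();
    let mut scanned = 0;
    while rows.len() < limit && progress.next < store.entries.len() {
        if let Some(v) = store.entries[progress.next] {
            rows.push(v);
        }
        progress.next += 1;
        scanned += 1;
    }
    (rows, scanned)
}

/// Builds a store with `tombstones` tombstones followed by `live` live rows.
fn demo_store(tombstones: usize, live: usize) -> ToyStore {
    let mut entries = vec![None; tombstones];
    entries.extend((0..live as u64).map(Some));
    ToyStore { entries }
}
```

For `demo_store(10_000, 256)`, the forced 1-row read scans 10,001 entries, and reading the remaining 255 rows in the next epoch scans only 255. The tombstone cost is paid once, whichever epoch the rows are emitted in.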

kwannoel added the type/feature label Dec 20, 2023

github-actions bot added this to the release-1.6 milestone Dec 20, 2023

kwannoel self-assigned this Dec 20, 2023

kwannoel added the priority/high label Dec 20, 2023

kwannoel changed the title ~~snapshot read: Read 1 record at least rather than 1 chunk at least per barrier~~ snapshot read: Read 1 record exactly rather than 1 chunk at least per barrier Dec 20, 2023

kwannoel mentioned this issue Dec 22, 2023

fix(stream): read exactly a single row if no snapshot read on barrier #14146

Closed

9 tasks

kwannoel modified the milestones: release-1.6, release-1.7 Jan 10, 2024

kwannoel mentioned this issue Jan 12, 2024

fix(stream): read exactly a single row if no snapshot read on barrier #14544

Merged

9 tasks

kwannoel closed this as completed in #14544 Jan 17, 2024

kwannoel mentioned this issue Jan 30, 2024

feat(stream): read exactly 1 row if no snapshot read per barrier for arrangement backfill #14842

Merged

9 tasks
