arrangement backfill can be slow when there are consecutive tombstones in upstream table #17267

Open
hzxa21 opened this issue Jun 14, 2024 · 1 comment
Labels
type/bug Something isn't working

Comments

@hzxa21
Collaborator

hzxa21 commented Jun 14, 2024

Describe the bug

Assume we have only two vnodes and the upstream MV/Table state contains runs of consecutive tombstones, which is possible when there is a temporal filter in the upstream:

vnode_1:
pk_1 -> tomb
pk_2 -> tomb
...
pk_N -> tomb
pk_(N+1) -> row
pk_(N+2) -> row
...

vnode_2:
pk_1 -> tomb
pk_2 -> tomb
...
pk_M -> tomb
pk_(M+1) -> row
pk_(M+2) -> row
...

With arrangement backfill, vnode_1 and vnode_2 are iterated independently. On seeing a barrier, backfill stops for the current epoch as long as at least one visible row has been emitted by either of the two vnode iterators. It is therefore possible that the slow vnode never updates its current position and can hardly make progress in the next epoch, because the consecutive tombstones are scanned again and again. Consider the following case:

  1. In epoch1, vnode_1 and vnode_2 start the backfill snapshot read by scanning the upstream table independently.
  2. vnode_1 scans [pk_1, pk_(N+1)] and emits pk_(N+2) -> row.
  3. vnode_2 is slightly slower and just scans [pk_1, pk_(M-1)], which are all tombstones.
  4. epoch2 arrives and backfill is interrupted. vnode_1's position is updated to pk_(N+2), while vnode_2's position stays unbounded because it has not emitted any row, so its next scan starts over from pk_1.
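
To make the case above concrete, here is a toy simulation in Python. It is only a sketch and not RisingWave code: `VnodeState`, `scan_epoch`, and the fixed per-epoch `budget` are hypothetical names and simplifications. The budget stands in for "how many entries a vnode manages to read before the barrier interrupts backfill", and a vnode's position only advances when it emits a visible row.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VnodeState:
    """Toy model of one vnode: `tombstones` deleted keys followed by
    `rows` live keys, in pk order (pk is 1-based). Hypothetical model."""
    tombstones: int
    rows: int
    position: Optional[int] = None  # last emitted pk; None means unbounded
    scanned: int = 0                # total entries read across all epochs

    def scan_epoch(self, budget: int) -> int:
        """Read up to `budget` entries starting right after `position`.
        Returns the number of visible rows emitted in this epoch."""
        start = 1 if self.position is None else self.position + 1
        end = min(start + budget - 1, self.tombstones + self.rows)
        emitted = 0
        for pk in range(start, end + 1):
            self.scanned += 1
            if pk > self.tombstones:
                # Live row: emit it and remember it as the new position.
                self.position = pk
                emitted += 1
            # Tombstone: nothing is emitted, so the position cannot advance.
        return emitted


def run(epochs: int, budget: int, n: int, m: int):
    """Run both vnodes for `epochs` epochs with the same per-epoch budget."""
    v1 = VnodeState(tombstones=n, rows=100)
    v2 = VnodeState(tombstones=m, rows=100)
    for _ in range(epochs):
        v1.scan_epoch(budget)
        v2.scan_epoch(budget)
    return v1, v2


# N = 5 tombstones in vnode_1, M = 8 in vnode_2, and each epoch only
# leaves time to read M - 1 = 7 entries per vnode.
v1, v2 = run(epochs=10, budget=7, n=5, m=8)
print("vnode_1 position:", v1.position, "entries read:", v1.scanned)
print("vnode_2 position:", v2.position, "entries read:", v2.scanned)
```

In this model vnode_1 advances every epoch, while vnode_2 reads the same 7 tombstones in every epoch and its position stays unbounded, which is the stall described in step 4.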

Error message/log

No response

To Reproduce

No response

Expected behavior

No response

How did you deploy RisingWave?

No response

The version of RisingWave

No response

Additional context

No response

@hzxa21
Collaborator Author

hzxa21 commented Jul 10, 2024

Given that we are targeting v1.11 for the first version of serverless backfill, which can fundamentally solve this issue because backfill will no longer be interrupted, I wonder whether we still need to fix arrangement backfill for this corner case.
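
As a rough illustration of why an uninterrupted scan avoids the problem, the sketch below counts how many entries a single vnode reads before it emits its first live row. It assumes only what is stated above, namely that an interrupted scan restarts from pk_1 when no row has been emitted, and it says nothing about how serverless backfill is actually implemented. `reads_until_first_row` and `epoch_budgets` are hypothetical names.

```python
from typing import List, Optional


def reads_until_first_row(tombstones: int, epoch_budgets: List[int]) -> Optional[int]:
    """Total entries read before the first live row is emitted.

    Each element of `epoch_budgets` is the number of entries the vnode can
    read in that epoch before being interrupted. If an epoch ends without
    reaching a live row, the next epoch starts again from pk_1.
    Returns None if no epoch is long enough to get past the tombstones.
    """
    reads = 0
    for budget in epoch_budgets:
        if budget > tombstones:
            # Long enough: read every tombstone plus the first live row.
            return reads + tombstones + 1
        # Too short: the whole budget is spent on tombstones and discarded.
        reads += budget
    return None


M = 8  # consecutive tombstones at the head of the vnode

# Interrupted three times after 7 entries, then given a longer epoch.
interrupted = reads_until_first_row(M, [7, 7, 7, 9])

# Never long enough: the vnode makes no progress at all.
stuck = reads_until_first_row(M, [7] * 100)

# A single scan that is not interrupted.
uninterrupted = reads_until_first_row(M, [10**9])

print(interrupted, stuck, uninterrupted)
```

With these numbers the interrupted scan reads 30 entries to emit one row, the stuck case never emits anything, and the uninterrupted scan reads only the 8 tombstones plus the first row.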
