Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: use high watermark to finish backfill faster #18342

Merged

Conversation

xxchan
Copy link
Member

@xxchan xxchan commented Aug 30, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

step 2 & 3 of #18299, see there for the motivation and a state diagram of the new design.

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

@xxchan xxchan marked this pull request as ready for review August 30, 2024 09:44
Copy link
Member Author

xxchan commented Aug 30, 2024

@xxchan xxchan force-pushed the 08-24-refactor_use_high_watermark_to_finish_backfill_faster branch from 6fb622b to 0eaf49f Compare September 2, 2024 05:52
@xxchan xxchan force-pushed the 08-24-refactor_add_target_offsets_to_determinine_if_source_backfill_finished branch from aa5610d to 272a7a4 Compare September 2, 2024 08:03
@xxchan xxchan force-pushed the 08-24-refactor_use_high_watermark_to_finish_backfill_faster branch from 0eaf49f to d2bc71a Compare September 2, 2024 08:04
@xxchan xxchan force-pushed the 08-24-refactor_add_target_offsets_to_determinine_if_source_backfill_finished branch from c8b80b7 to 2b8f062 Compare September 2, 2024 08:08
@xxchan xxchan force-pushed the 08-24-refactor_use_high_watermark_to_finish_backfill_faster branch from d2bc71a to 051239b Compare September 2, 2024 08:08
Base automatically changed from 08-24-refactor_add_target_offsets_to_determinine_if_source_backfill_finished to main September 3, 2024 04:05
@graphite-app graphite-app bot requested a review from a team September 3, 2024 04:05
@xxchan xxchan force-pushed the 08-24-refactor_use_high_watermark_to_finish_backfill_faster branch from efc2a77 to dc3aae9 Compare September 3, 2024 04:19
@xxchan xxchan force-pushed the 08-24-refactor_use_high_watermark_to_finish_backfill_faster branch from 4a5c008 to 1a663a9 Compare September 3, 2024 10:10
Copy link
Contributor

@chenzl25 chenzl25 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Great work

@@ -21,7 +44,7 @@ create source s0 (v1 int, v2 varchar) with (
scan.startup.mode = 'earliest'
) FORMAT PLAIN ENCODE JSON;

query I
query ?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When do we need to use ? instead of I ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently sqllogictest --override will produce ?. Actually the character doesn't have any meaning now, any character will pass test..

system ok
internal_table.mjs --name s0 --type source
----
(empty)


# offset 0 must be backfilled, not from upstream.
# SourceBackfill starts from offset 0, with backfill_info: HasDataToBackfill { latest_offset: "0" } (decided by kafka high watermark).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, what will the high watermark value be?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1

@xxchan xxchan added this pull request to the merge queue Sep 4, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Sep 4, 2024
@xxchan xxchan added this pull request to the merge queue Sep 4, 2024
Merged via the queue into main with commit 847610a Sep 4, 2024
30 of 31 checks passed
@xxchan xxchan deleted the 08-24-refactor_use_high_watermark_to_finish_backfill_faster branch September 4, 2024 02:26
xxchan added a commit that referenced this pull request Jan 23, 2025
This is a follow up of #18342.
For "there's no history data to backfill at all", we just checked `low_watermark == high_watermark`, but `start_offset == high_watermark-1` is also a case.

We may meet this case when `scan.startup.mode=latest`: #20083 (comment)

Signed-off-by: xxchan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants