-
Notifications
You must be signed in to change notification settings - Fork 590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: persist processed rows for SourceBackfill for SHOW JOBS #18983
feat: persist processed rows for SourceBackfill for SHOW JOBS #18983
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking. |
6436f64
to
1fa501c
Compare
impl BackfillStateWithCnt { | ||
pub fn encode_to_json(self) -> JsonbVal { | ||
serde_json::to_value(self).unwrap().into() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This changes the state table representation, and should be a breaking change. But if we can get this into 2.1, then we don't need to consider compatibility.
@@ -668,23 +682,15 @@ impl<S: StateStore> SourceBackfillExecutorInner<S> { | |||
let new_chunk = chunk.clone_with_vis(new_vis); | |||
yield Message::Chunk(new_chunk); | |||
source_backfill_row_count.inc_by(card as u64); | |||
total_backfilled_rows += card as u64; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously, we just need to += card
here for total cnt
@@ -206,6 +216,8 @@ impl BackfillStage { | |||
/// Updates backfill states and returns whether the row backfilled from external system is visible. | |||
fn handle_backfill_row(&mut self, split_id: &str, offset: &str) -> bool { | |||
let state = self.states.get_mut(split_id).unwrap(); | |||
state.num_consumed_rows += 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now we need to +=1
since we need to count and persist separately for each split.
c9d083b
to
16ae682
Compare
7c2e849
to
e6086ef
Compare
total_backfilled_rows, | ||
backfill_stage.total_backfilled_rows(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now we calculate per-split cnt, and sum here
16ae682
to
3498778
Compare
e6086ef
to
09a9737
Compare
3498778
to
4a18292
Compare
09a9737
to
7fa4f3f
Compare
4a18292
to
1c50a1b
Compare
7fa4f3f
to
b08b8af
Compare
1c50a1b
to
eaa34e4
Compare
b08b8af
to
ef793bd
Compare
eaa34e4
to
0f6248b
Compare
45f6490
to
874ca44
Compare
You mean it's just for observability/debugging purposes? |
Yes. |
Hey @BugenZhao @kwannoel , would you mind taking a look at the downstack PR #18925? 🤣 |
7d123f2
to
d3ef9bb
Compare
874ca44
to
f0130cd
Compare
2 {"v1": 3, "v2": "c"} | ||
3 {"v1": 4, "v2": "d"} | ||
3 {"v1": 4, "v2": "d"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
produce more data to make partitions look different
Inspired by #18925 (comment), I also persist 👍 if it looks good @BugenZhao @kwannoel 👀 |
BTW, I just realized that we can merge this PR first, not necessarily wait for #18925 |
0,"{""num_consumed_rows"": 0, ""state"": ""Finished"", ""target_offset"": null}" | ||
1,"{""num_consumed_rows"": 0, ""state"": ""Finished"", ""target_offset"": null}" | ||
2,"{""num_consumed_rows"": 0, ""state"": ""Finished"", ""target_offset"": null}" | ||
3,"{""num_consumed_rows"": 0, ""state"": ""Finished"", ""target_offset"": null}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like we don't have any tests to check the output of show jobs
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The DDL finishes immediately. It seems not easy to test show jobs
(and it has non deterministic results).
Actually do we still need to persist backfilled rows? Since we can just use offset to show progress. |
I'm thinking How do you think? |
We may also deprecate some fields in the future if they are not useful. |
f7949f1
to
4d00aac
Compare
Signed-off-by: xxchan <[email protected]>
Signed-off-by: xxchan <[email protected]>
#19039) Signed-off-by: xxchan <[email protected]> Co-authored-by: xxchan <[email protected]>
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
Follow up of #18925
If we don't persist, after recovery the progress will start from 0 again.
However, TBH I'm a little hesitant about this change, since it is a little ugly. And we have to persist it just for SHOW JOBS.
This changes the state table representation, and should be a breaking change. But if we can get this into 2.1, then we don't need to consider compatibility.
Besides
SHOW JOBS
, this also benefits scanning from state table, making it more informative. (See diff onslt
)Checklist
./risedev check
(or alias,./risedev c
)Documentation
Release note
If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.