Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(tolerate inconsistency): join match failure #16823

Closed
fuyufjh opened this issue May 20, 2024 · 7 comments · Fixed by #16859
Closed

feat(tolerate inconsistency): join match failure #16823

fuyufjh opened this issue May 20, 2024 · 7 comments · Fixed by #16859
Assignees
Milestone

Comments

@fuyufjh
Copy link
Member

fuyufjh commented May 20, 2024

This code branch seems not to be covered yet, nor did it been triggered by our TroubleMakerExecutor.

https://github.com/risingwavelabs/risingwave/blob/1e848526ba487836d72a29b5390689f712b50e0d/src/stream/src/executor/managed_state/join/mod.rs#L4[…]63

Logically, this situation should be covered by the exception injections in state_table.insert/delete/update.

Hmm, interesting. I've never encountered this issue locally.

Could it be because the join key is too random and didn't match up?

It's possible. With random data, some paths might be hard to encounter.

Yes, if it's purely random (from INT_MIN to INT_MAX), it might not work well.

It seems like a good idea to use a fixed value occasionally within the random range.

Yes, indeed.

@github-actions github-actions bot added this to the release-1.10 milestone May 20, 2024
@fuyufjh fuyufjh changed the title feat(tolerate inconsistency): feat(tolerate inconsistency): join match failure May 20, 2024
@stdrc
Copy link
Member

stdrc commented May 20, 2024

loop {
// Iterate on both iterators and ensure they have same size. Basically `zip_eq()`.
let (row, degree) =
join(pinned_table_iter.next(), pinned_degree_table_iter.next()).await;
let (row, degree) = match (row, degree) {
(None, None) => break,
(None, Some(_)) | (Some(_), None) => {
panic!("mismatched row and degree table of join key: {:?}", &key)
}
(Some(r), Some(d)) => (r, d),
};
let row = row?;
let degree_row = degree?;
let pk1 = row.key();
let pk2 = degree_row.key();
debug_assert_eq!(
pk1, pk2,
"mismatched pk in degree table: pk1: {pk1:?}, pk2: {pk2:?}",
);
let pk = row
.as_ref()
.project(&self.state.pk_indices)
.memcmp_serialize(&self.pk_serializer);
let degree_i64 = degree_row
.datum_at(degree_row.len() - 1)
.expect("degree should not be NULL");
entry_state
.insert(
pk,
JoinRow::new(row.row(), degree_i64.into_int64() as u64).encode(),
)
.with_context(|| self.state.error_context(row.row()))?;
}

The loop highly relies on the consistency of data in the two state tables, no missing data or extra data is allowed in one of the tables, because of the fact that it only checks pk equality in debug build.

I'm afraid that only tolerating the panic call will cause massive data corruption when inconsistency really happens, because all the pairs of row and degree can mismatch. IMO we'd better change the loop logic to pair row and degree according to pk, but this may hurt performance. Any comments?

cc @yuhao-su

@stdrc
Copy link
Member

stdrc commented May 21, 2024

Oh, we can only fallback to pk-based row-degree matching once after an inconsistency happened.

@yuhao-su
Copy link
Contributor

@stdrc
Copy link
Member

stdrc commented May 21, 2024

@stdrc
Copy link
Member

stdrc commented May 21, 2024

May I ask a dumb question? Why we need to separate data table and degree table? If they are always updated at the same time, and share the same pk, why we don't add a degree column to the data table?🥵

Is it just for historical reason (backward compatibility?) or for performance concern?

@yuhao-su
Copy link
Contributor

You mean zip_by_order_key?

Yes

Why we need to separate data table and degree table?

In case when we need the degree table, it is very common only degree table get updated, in which case we don't want update the whole row.

@stdrc
Copy link
Member

stdrc commented May 21, 2024

Why we need to separate data table and degree table?

In case when we need the degree table, it is very common only degree table get updated, in which case we don't want update the whole row.

Oh, you're right. I was just kinda confused between the two join sides. Now I understand.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants