batch(hash join): remove the filter(|(start_row_id, end_row_id)| start_row_id < end_row_id) when process first_output_row_ids #15854

Open
st1page opened this issue Mar 21, 2024 · 1 comment

Comments

Contributor

st1page commented Mar 21, 2024

In BatchHashJoin there are many places with logic like

for (&start_row_id, &end_row_id) in iter::once(&0)
    .chain(first_output_row_ids.iter())
    .tuple_windows()
    .filter(|(start_row_id, end_row_id)| start_row_id < end_row_id)
{

But I guess the filter is not needed; it only exists to tolerate wrongly inserted row ids, such as in

}
non_equi_state.found_matched = false;
non_equi_state
    .first_output_row_id
    .push(chunk_builder.buffered_count());
if let Some(first_matched_build_row_id) = hash_map.get(probe_key) {
    let mut build_row_id_iter = next_build_row_with_same_key
        .row_id_iter(Some(*first_matched_build_row_id))
        .peekable();
    while let Some(build_row_id) = build_row_id_iter.next() {
        shutdown_rx.check()?;
        let build_chunk = &build_side[build_row_id.chunk_id()];

The push should happen inside the matching branch (if let Some(first_matched_build_row_id) = hash_map.get(probe_key) {), so that a row id is only pushed for probe rows that have a match.

I think we should fix this and then remove the filter, so that it no longer hides potential issues of this kind.
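
To make the role of the filter concrete, here is a minimal standalone sketch of the windowing pattern quoted above. It is not the RisingWave code: the function name `output_ranges` and the sample boundaries are hypothetical, and the standard library's `windows(2)` stands in for itertools' `tuple_windows` so the snippet has no dependencies. It assumes the recorded row ids are non-decreasing.

```rust
/// Turns recorded boundaries into `(start, end)` ranges the same way the
/// quoted loop does: prepend 0, pair up consecutive boundaries, and
/// optionally drop the empty ranges where `start == end`.
///
/// `output_ranges` is a hypothetical helper for illustration only.
fn output_ranges(first_output_row_ids: &[usize], drop_empty: bool) -> Vec<(usize, usize)> {
    // Equivalent of `iter::once(&0).chain(first_output_row_ids.iter())`.
    let mut boundaries = Vec::with_capacity(first_output_row_ids.len() + 1);
    boundaries.push(0);
    boundaries.extend_from_slice(first_output_row_ids);

    boundaries
        .windows(2) // std counterpart of itertools' `tuple_windows`
        .map(|w| (w[0], w[1]))
        // With `drop_empty == false` this filter lets everything through,
        // which models the loop after the filter has been removed.
        .filter(|&(start, end)| !drop_empty || start < end)
        .collect()
}
```

With the hypothetical boundaries `[0, 2, 2, 5]`, calling `output_ranges(&[0, 2, 2, 5], true)` keeps only the non-empty ranges `(0, 2)` and `(2, 5)`, while `output_ranges(&[0, 2, 2, 5], false)` also yields the empty ranges `(0, 0)` and `(2, 2)`. Every repeated boundary becomes an empty range, so the filter can only be dropped safely once no redundant boundary is recorded in the first place.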

github-actions bot added this to the release-1.8 milestone Mar 21, 2024
st1page modified the milestones: release-1.8, release-1.9 Apr 8, 2024
st1page modified the milestones: release-1.9, release-1.10 May 14, 2024
st1page self-assigned this May 14, 2024
Contributor

github-actions bot commented Aug 1, 2024

This issue has been open for 60 days with no activity.

If you think it is still relevant today, and needs to be done in the near future, you can comment to update the status, or just manually remove the no-issue-activity label.

You can also confidently close this issue as not planned to keep our backlog clean.
Don't worry if you think the issue is still valuable to continue in the future.
It's searchable and can be reopened when it's time. 😄
