Refactor full join iterator to allow access to build tracker #10246
Relates to #10240. When looping over build partitions for each stream batch, a full outer join must track which build-side rows have been referenced across all stream batches. Once every stream batch has been processed across all sub-partitions, the per-partition build-side tracking data can be used to perform the anti-join that completes the full outer join result.
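As a rough illustration of the tracking described above, here is a minimal sketch in plain Python. It is not the code in this PR: the row shape `(key, payload)`, the function name `full_outer_join`, and the nested-loop matching are all assumptions made for the example.

```python
from typing import List, Optional, Tuple

Row = Tuple[int, str]  # hypothetical row shape: (join key, payload)
Joined = Tuple[Optional[Row], Optional[Row]]  # (build row, stream row)


def full_outer_join(build_partitions: List[List[Row]],
                    stream_batches: List[List[Row]]) -> List[Joined]:
    # One "was referenced" flag per build row, kept per build partition.
    matched = [[False] * len(part) for part in build_partitions]
    out: List[Joined] = []

    # Outer-join phase: every stream batch is checked against every
    # build partition, and matched build rows are recorded as we go.
    for batch in stream_batches:
        for stream_row in batch:
            found = False
            for p_idx, part in enumerate(build_partitions):
                for r_idx, build_row in enumerate(part):
                    if build_row[0] == stream_row[0]:
                        matched[p_idx][r_idx] = True
                        out.append((build_row, stream_row))
                        found = True
            if not found:
                out.append((None, stream_row))

    # Anti-join phase: only after all stream batches are done can we know
    # which build rows were never referenced.
    for p_idx, part in enumerate(build_partitions):
        for r_idx, build_row in enumerate(part):
            if not matched[p_idx][r_idx]:
                out.append((build_row, None))
    return out
```

The point of the sketch is the ordering: the unmatched build rows cannot be emitted until the flags have seen every stream batch, which is why the tracking data has to outlive any single batch.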
This refactors the full outer join iterator to use a sub-iterator that performs the left or right outer join and tracks the build-side rows as it goes. Once it is done iterating, callers can release the tracking data. This removes the need for a final-batch concept in the abstract join iterator, but the abstract iterator does need to know whether it is safe to close the iterator early when hasNext returns false.
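The shape of that split can be sketched as follows. This is a simplified Python model, not the classes touched by this PR: `TrackingOuterJoinIterator`, `release_tracker`, and `full_join_batches` are hypothetical names, and rows are assumed to be `(key, payload)` tuples.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Row = Tuple[int, str]  # hypothetical row shape: (join key, payload)
Joined = Tuple[Optional[Row], Optional[Row]]  # (build row, stream row)


class TrackingOuterJoinIterator:
    """Outer join that keeps every stream row and records matched build rows."""

    def __init__(self, build_rows: List[Row],
                 stream_batches: Iterable[List[Row]]) -> None:
        self._build = build_rows
        self._batches = iter(stream_batches)
        self._tracker: Optional[List[bool]] = [False] * len(build_rows)

    def __iter__(self) -> "TrackingOuterJoinIterator":
        return self

    def __next__(self) -> List[Joined]:
        batch = next(self._batches)  # StopIteration ends the iteration
        out: List[Joined] = []
        for stream_row in batch:
            hits = [i for i, b in enumerate(self._build)
                    if b[0] == stream_row[0]]
            for i in hits:
                self._tracker[i] = True
                out.append((self._build[i], stream_row))
            if not hits:
                out.append((None, stream_row))
        return out

    def release_tracker(self) -> List[bool]:
        # Hand the tracking data to the caller; this iterator drops its copy.
        tracker, self._tracker = self._tracker, None
        return tracker


def full_join_batches(build_rows: List[Row],
                      stream_batches: Iterable[List[Row]]
                      ) -> Iterator[List[Joined]]:
    sub = TrackingOuterJoinIterator(build_rows, stream_batches)
    yield from sub  # outer-join batches, produced by the sub-iterator
    tracker = sub.release_tracker()
    unmatched = [(b, None) for b, seen in zip(build_rows, tracker) if not seen]
    if unmatched:
        yield unmatched  # anti-join output, built from the released tracker
```

In this model the outer iterator needs no special "final batch" hook in the sub-iterator: it simply drains it, takes the tracker, and produces the remaining build rows itself. It also shows why the sub-iterator must not be torn down as soon as it is exhausted, since the tracker is read afterwards.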