-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Updating plans when using joins with unnest on the left #15075
Conversation
// Will need an instanceof check here | ||
// A future work should look into if the flattenJoin | ||
// can be refactored to omit these instanceof checks | ||
while (current instanceof JoinDataSource || current instanceof UnnestDataSource || current instanceof FilteredDataSource) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add test cases for self join with unnest datasource if we do not have already?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, added a test with self join on an unnest data source
looks good to me. |
@@ -476,10 +476,18 @@ private Function<SegmentReference, SegmentReference> createSegmentMapFunctionInt | |||
.orElse(null) | |||
) | |||
); | |||
|
|||
final Function<SegmentReference, SegmentReference> baseMapFn; | |||
if (left instanceof JoinDataSource) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems worth a comment on what is going on. Is it still ok to do if left is not concrete?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add a comment as into why we are not using the isConcrete() check and instead using the instanceof check here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment is added
joinDataSource.getConditionAnalysis() | ||
) | ||
); | ||
} else if (current instanceof UnnestDataSource) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it doesn't seem intuitive to me that we can flatten away unnest and filtered datasources, could we add comments explaining why its ok? is it still ok if the unnest datasource is wrapping a join datasource? like does it flatten through it? where does the unnest and filters go in that case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add comments. The getAnalysis()
of an Unnest or a filteredDS always delegates to its base. So flattening through a Join->Unnest->Join kind of scenario to get the base data source makes sense as it goes down to find the base concrete data source. In this PR, the filters on the filteredDataSource and unnestDataSource are not pushed down to the left of the join, the unnest filter and the filter on the filteredDataSource remain on the data source. I have added an unit test of Join->Unnest->Join will add another UT of Join->Unnest->Filter->Join
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment and unit test added
* Updating plans when using joins with unnest on the left * Correcting segment map function for hashJoin * The changes done here are not reflected into MSQ yet so these tests might not run in MSQ * native tests * Self joins with unnest data source * Making this pass * Addressing comments by adding explanation and new test
* Updating plans when using joins with unnest on the left * Correcting segment map function for hashJoin * The changes done here are not reflected into MSQ yet so these tests might not run in MSQ * native tests * Self joins with unnest data source * Making this pass * Addressing comments by adding explanation and new test
Previously a query as
would be planned as a join between a query data source on the left and a query data source on the right. Although the results were correct this is limiting performance as query data sources are evaluated at the broker where the number of rows is limited by
maxSubqueryRows
.Additionally native queries like
would fail with an error
Through this PR we do the following to address this:
This PR has: