-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Segregate advance and advanceUninterruptibly flow in postJoinCursor to allow for interrupts in advance #15222
Segregate advance and advanceUninterruptibly flow in postJoinCursor to allow for interrupts in advance #15222
Conversation
…r by calling baseCursor.advance and baseCursor.advanceInterruptibly resp.
can we add test for this scenario if possible unless it is covered already in some other test cases? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add an example in CalciteJoinQueryTest of the form that was mentioned in the issue ? Since the PostJoinCursor is used used to wrap around the Unnest Cursor can you also please check if we are unnesting a high cardinality array column or a MVD the problem mentioned in the issue does not appear anymore
@@ -99,15 +108,15 @@ public Filter getPostJoinFilter() | |||
@Override | |||
public void advance() | |||
{ | |||
advanceUninterruptibly(); | |||
BaseQuery.checkInterrupted(); | |||
baseCursor.advance(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will be good to put some comments as in why this change is being done in the code for a future developer to follow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comments.
@@ -99,15 +108,15 @@ public Filter getPostJoinFilter() | |||
@Override | |||
public void advance() | |||
{ | |||
advanceUninterruptibly(); | |||
BaseQuery.checkInterrupted(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a note on why the checkInterrupted() is not needed anymore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment.
I dunno why the checkstyle is failing though. If using IntelliJ it'll be good to add this https://raw.githubusercontent.com/apache/incubator-druid/master/dev/druid_intellij_formatting.xml to IntelliJ |
My intellij doesn't clean up unused imports automatically. I run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add few test cases. LGTM otherwise.
} | ||
} | ||
|
||
private void advanceToMatchUninterruptibly() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please leave a comment here that this can be a long-running CPU call. Which is why advanceUninterruptibly is not directly used in advance() call unlike other cursors. Please link the github issue in the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
processing/src/test/java/org/apache/druid/segment/join/PostJoinCursorTest.java
Fixed
Show fixed
Hide fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pranavbhole
I've added a test now to mimic the original issue of postJoinCursor
filter condition not matching any tuples coming out of high-cardinality joins, thereby sending it into an uninterruptible, cpu-intensive and long-running call.
@soumyava
For unnest cases, the scenario (with prior implementation), though possible, is less likely due to absence of (join's) output row_count multiplication property, leading to a much faster exhaustion of input if postJoinCursor
filter condition is never met. But in principle it can potentially happen with any long-running advanceUninterruptibly()
call at any level -- join itself not finding any matching rows for high cardinality tables, for example. Probably maxRowsPerSegment
would act as the limit in such cases.
} | ||
} | ||
|
||
private void advanceToMatchUninterruptibly() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -99,15 +108,15 @@ public Filter getPostJoinFilter() | |||
@Override | |||
public void advance() | |||
{ | |||
advanceUninterruptibly(); | |||
BaseQuery.checkInterrupted(); | |||
baseCursor.advance(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comments.
@@ -99,15 +108,15 @@ public Filter getPostJoinFilter() | |||
@Override | |||
public void advance() | |||
{ | |||
advanceUninterruptibly(); | |||
BaseQuery.checkInterrupted(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment.
@@ -0,0 +1,230 @@ | |||
package org.apache.druid.segment.join; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add the license header here
|
||
cursor.advance(); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to leave a blank line at the end of each file. Not sure though. If checkstyle oks it am good with it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it is needed, the checkstyle is complaining. It is good to run a mvn checkstyle:checkstyle
beforehand as Laksh have suggested earlier. Another thing I do is in IntelliJ when adding a new section of code have the druid coding style and run a reformat code that fixes most of the codestyling for me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall follow. Thanks!
@Override | ||
public boolean isDone() | ||
{ | ||
return cursor.isDone(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this return false? May not matter but given its infinite cursor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
catch (InterruptedException e) { | ||
throw new RuntimeException(e); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit : catch clause can be removed and exception declared on method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
sleep(5000); | ||
joinCursorThread.interrupt(); | ||
sleep(5000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A 10 second test is expensive. Rather, we should use a count latch or something to wait for X records to be processed. The latch wait can have 5 second timeout. Similarly, the 5 second sleep after interrupting should be removed and we can wait in a loop or use count latch again in ExceptionHandler.
I also like https://github.com/awaitility/awaitility/wiki/Usage#simple in case you want to bring that in here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Went with a latch, but thanks for the pointer to this library.
if (exceptionHandler.getException() == null) { | ||
sleep(1); | ||
} else { | ||
assertTrue(exceptionHandler.getException() instanceof QueryInterruptedException); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you should just return the call here and also add assertFalse
at the end of the method. So we are checking that we did get QueryInterruptedException.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Done - thanks.
…o allow for interrupts in advance (apache#15222) Currently advance function in postJoinCursor calls advanceUninterruptibly which in turn keeps calling baseCursor.advanceUninterruptibly until the post join condition matches, without checking for interrupts. This causes the CPU to hit 100% without getting a chance for query to be cancelled. With this change, the call flow of advance and advanceUninterruptibly is separated out so that they call baseCursor.advance and baseCursor.advanceUninterruptibly in them, respectively, giving a chance for interrupts in the former case between successive calls to baseCursor.advance.
…o allow for interrupts in advance (#15222) (#15278) Currently advance function in postJoinCursor calls advanceUninterruptibly which in turn keeps calling baseCursor.advanceUninterruptibly until the post join condition matches, without checking for interrupts. This causes the CPU to hit 100% without getting a chance for query to be cancelled. With this change, the call flow of advance and advanceUninterruptibly is separated out so that they call baseCursor.advance and baseCursor.advanceUninterruptibly in them, respectively, giving a chance for interrupts in the former case between successive calls to baseCursor.advance. Co-authored-by: Vishesh Garg <[email protected]>
…o allow for interrupts in advance (apache#15222) Currently advance function in postJoinCursor calls advanceUninterruptibly which in turn keeps calling baseCursor.advanceUninterruptibly until the post join condition matches, without checking for interrupts. This causes the CPU to hit 100% without getting a chance for query to be cancelled. With this change, the call flow of advance and advanceUninterruptibly is separated out so that they call baseCursor.advance and baseCursor.advanceUninterruptibly in them, respectively, giving a chance for interrupts in the former case between successive calls to baseCursor.advance.
Fixes: #14514
Description
Currently
advance
function in postJoinCursor callsadvanceUninterruptibly
which in turn keeps callingbaseCursor.advanceUninterruptibly
until the post join condition matches, without checking for interrupts. This causes the CPU to hit 100% without getting a chance for query to be cancelled.With this change, the call flow of
advance
andadvanceUninterruptibly
is separated out so that they callbaseCursor.advance
andbaseCursor.advanceUninterruptibly
in them, respectively, giving a chance for interrupts in the former case between successive calls tobaseCursor.advance
.