-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix query timeout handling by checking searchContext.isSearchTimedOut() #16882
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #16882 +/- ##
============================================
- Coverage 72.21% 72.13% -0.08%
+ Complexity 65335 65325 -10
============================================
Files 5318 5318
Lines 304081 304083 +2
Branches 43995 43996 +1
============================================
- Hits 219578 219365 -213
- Misses 66541 66740 +199
- Partials 17962 17978 +16 ☔ View full report in Codecov by Sentry. |
43e54c0
to
e47aab2
Compare
e47aab2
to
da3160c
Compare
@@ -305,6 +305,10 @@ protected void search(List<LeafReaderContext> leaves, Weight weight, Collector c | |||
*/ | |||
@Override | |||
protected void searchLeaf(LeafReaderContext ctx, Weight weight, Collector collector) throws IOException { | |||
// If search is already flagged as timed out, do not continue | |||
if (searchContext.isSearchTimedOut()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @reta, I will have more time to take a closer look at this later this week.
From what I recall off the top of my head the cancellable.checkCancelled();
right below this should already be doing the timeout checking and then throwing the QueryPhase.TimeExceededException
. The searchContext.setSearchTimedOut(true);
logic was needed because with concurrent search any exceptions thrown in searchLeaf
would be handled by the Lucene TaskExecutor
and it was difficult to figure out from the query phase thread if the search had actually timed out or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cancellable.checkCancelled()
seems to be somehow not short-circuiting the search
That's definitely strange. I do see that searchLeaf
gets called for each leaf even post the timeout and we leave it to cancellable.checkCancelled()
to throw an exception which we than catch for each leaf. I wonder if this repeated exception handling is what is causing it to take longer?
Maybe we should even have the searchContext.isSearchTimedOut()
check in the search
method above so we can short circuit those for loops when the timeout is hit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should even have the
searchContext.isSearchTimedOut()
check in thesearch
method above so we can short circuit those for loops when the timeout is hit?
What I've seen, the timeout is hit by first searchLeaf
call (those are called in loop), so the check in search
method may not need this check (I think)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@reta, In the next code: cancellable.checkCancelled()
(line 316) will do the timeout check, it looks like duplicate work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could clearly see that by checking how long tests are taking w/o this change.
Currently the only reason I can think why this would be the case is because cancellable.checkCancelled()
involves throwing and catching an exception which is slowing things down. I think we may need to run this piece of code through a profiler to understand why.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Signed-off-by: Andriy Redko <[email protected]>
da3160c
to
751ad99
Compare
Closing, not solving the issue |
Description
I only briefly looked at overall implementation and it looks like we may lost some timeout handling guarantees, likely while implementing the concurrent search. Fix query timeout handling by checking searchContext.isSearchTimedOut().
Related Issues
Resolves #16056 #9401
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.