fix: Improve performance for Gravsearch queries #2857
This is an attempt at a minimally invasive fix for some of the performance issues we face with Gravsearch. Gravsearch queries contain `FILTER NOT EXISTS` blocks, which are used to retrieve only resources that have not been marked as deleted. For reasons not yet clear, these blocks slow the query down substantially. Replacing them with `MINUS` makes simple queries faster. In simple cases, `MINUS` is often more efficient than `FILTER NOT EXISTS` because it computes the difference between two independently evaluated result sets, which can benefit from query optimization and indexing. `FILTER NOT EXISTS`, by contrast, is defined per solution: conceptually, each candidate result is checked for the absence of a matching pattern, which can be slower, especially with complex subqueries or patterns. The substitution is safe here because the deletion pattern shares the resource variable with the main query: `MINUS` only removes solutions that share at least one variable with its right-hand side, so without a shared variable it would remove nothing.
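To make the semantic difference concrete, here is a minimal Python sketch of the two operators as the SPARQL 1.1 specification defines them. It models semantics only, not performance, and it is neither Gravsearch's code nor any triplestore's implementation. It assumes a toy in-memory graph of string triples; the `ex:` names, the helper function names, and modelling the deletion flag as `knora-base:isDeleted` with a plain `"true"` literal are all illustrative assumptions. It shows that both forms drop the deleted resource when the deletion pattern shares `?res` with the main pattern, and that they diverge when no variable is shared.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Solution = Dict[str, str]  # variable name (e.g. "?res") -> bound term


def unify(pat: Triple, triple: Triple, sol: Solution) -> Optional[Solution]:
    """Extend sol so that pat matches triple, or return None on a conflict."""
    candidate = dict(sol)
    for term, value in zip(pat, triple):
        if term.startswith("?"):  # variable: bind it, or check its existing binding
            if candidate.setdefault(term, value) != value:
                return None
        elif term != value:  # constant: must match exactly
            return None
    return candidate


def match(pattern: List[Triple], graph: Set[Triple], binding: Solution) -> List[Solution]:
    """Evaluate a basic graph pattern, starting from an initial binding."""
    results = [dict(binding)]
    for pat in pattern:
        extended = []
        for sol in results:
            for triple in sorted(graph):  # sorted for deterministic output order
                candidate = unify(pat, triple, sol)
                if candidate is not None:
                    extended.append(candidate)
        results = extended
    return results


def filter_not_exists(solutions: List[Solution], pattern: List[Triple],
                      graph: Set[Triple]) -> List[Solution]:
    # Keep a solution only if the pattern, with that solution's bindings
    # substituted in, has no match in the graph.
    return [sol for sol in solutions if not match(pattern, graph, sol)]


def compatible(a: Solution, b: Solution) -> bool:
    # Two solutions are compatible if they agree on every shared variable.
    return all(a[k] == b[k] for k in a.keys() & b.keys())


def minus(left: List[Solution], right: List[Solution]) -> List[Solution]:
    # Drop a left solution only if some right solution is compatible with it
    # AND the two share at least one variable.
    return [l for l in left
            if not any(compatible(l, r) and (l.keys() & r.keys()) for r in right)]


GRAPH = {
    ("ex:r1", "rdf:type", "ex:Bibliography"),
    ("ex:r2", "rdf:type", "ex:Bibliography"),
    ("ex:r2", "knora-base:isDeleted", "true"),
}
MAIN = [("?res", "rdf:type", "ex:Bibliography")]
DELETED = [("?res", "knora-base:isDeleted", "true")]    # shares ?res with MAIN
UNRELATED = [("?x", "knora-base:isDeleted", "true")]    # shares no variable

if __name__ == "__main__":
    main = match(MAIN, GRAPH, {})
    print(filter_not_exists(main, DELETED, GRAPH))     # [{'?res': 'ex:r1'}]
    print(minus(main, match(DELETED, GRAPH, {})))      # [{'?res': 'ex:r1'}]
    print(filter_not_exists(main, UNRELATED, GRAPH))   # []
    print(minus(main, match(UNRELATED, GRAPH, {})))    # both r1 and r2 survive
```

With the shared `?res`, both operators return only the non-deleted `ex:r1`. With `UNRELATED`, `FILTER NOT EXISTS` removes every solution (some deleted resource exists somewhere), while `MINUS` removes none, which is why the rewrite is only valid when the deletion pattern is tied to the resource variable.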
Codecov Report: All modified lines are covered by tests ✅
Additional details and impacted files:
@@             Coverage Diff             @@
##             main    #2857       +/-   ##
===========================================
+ Coverage   18.00%   88.05%   +70.05%
===========================================
  Files         281      244       -37
  Lines       28899    23123     -5776
===========================================
+ Hits         5202    20362    +15160
+ Misses      23697     2761    -20936
Cool! How big is the difference?
I have not measured it exactly. The query in the issue targets resources of the webern project, restricted to a Bibliography class with two query patterns
For the difference between these two constructs, see the official SPARQL documentation:
https://www.w3.org/TR/sparql11-query/#neg-notexists-minus
Pull Request Checklist
Task Description/Number
Issue Number: DEV-2714