Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Improve performance for Gravsearch queries #2857

Merged
merged 6 commits into from
Oct 2, 2023

Conversation

seakayone
Copy link
Contributor

@seakayone seakayone commented Sep 29, 2023

This is an attempt at a minimally invasive fix for some of the performance issue we face with Gravsearch.

Gravsearch queries contain FILTER NOT EXISTS blocks which are used to only retrieve resources which were not marked as deleted. For some reason this slows down the query substantially. Replacing this block with a MINUS makes simple queries faster.

For the difference between those two statements in the official SPARQL documentation:
https://www.w3.org/TR/sparql11-query/#neg-notexists-minus

NOT EXISTS and MINUS represent two ways of thinking about negation, one based on testing whether a pattern exists in the data, given the bindings already determined by the query pattern, and one based on removing matches based on the evaluation of two patterns.

Pull Request Checklist

Task Description/Number

Issue Number: DEV-2714

Basic Requirements

Please check if your PR fulfills the following requirements:

  • Tests for the changes have been added (for bug fixes / features)
  • Docs have been added / updated (for bug fixes / features)

PR Type

What kind of change does this PR introduce?

  • fix: represents bug fixes
  • refactor: represents production code refactoring
  • feat: represents a new feature
  • docs: documentation changes (no production code change)
  • chore: maintenance tasks (no production code change)
  • test: all about tests: adding, refactoring tests (no production code change)
  • other... Please describe:

Does this PR introduce a breaking change?

  • Yes
  • No
  • Maybe (not 100% sure => check with FE)

Does this PR change client-test-data?

  • Yes (don't forget to update the JS-LIB team about the change)
  • No

This is an attempt at a minimally invasive fix for some of the performance issue we face with Gravsearch.

Gravsearch queries contain `FILTER NOT EXISTS` blocks which are used to only retrieve resources which were not marked as deleted.
For some reason this slows down the query substantially.
Replacing this block with a `MINUS` makes simple queries faster.

In general, MINUS is often more efficient than FILTER NOT EXISTS for simple cases because it directly computes the difference between two result sets. It can benefit from query optimization and indexing.

FILTER NOT EXISTS, especially when used with complex subqueries or patterns, may be slower, as it needs to check for the non-existence of a certain pattern within the result set.
@seakayone seakayone self-assigned this Sep 29, 2023
@codecov
Copy link

codecov bot commented Sep 29, 2023

Codecov Report

All modified lines are covered by tests ✅

Comparison is base (12402f3) 18.00% compared to head (4d3a090) 88.05%.
Report is 75 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #2857       +/-   ##
===========================================
+ Coverage   18.00%   88.05%   +70.05%     
===========================================
  Files         281      244       -37     
  Lines       28899    23123     -5776     
===========================================
+ Hits         5202    20362    +15160     
+ Misses      23697     2761    -20936     

see 229 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@seakayone seakayone requested review from BalduinLandolt and mpro7 and removed request for BalduinLandolt September 29, 2023 14:06
@seakayone seakayone marked this pull request as ready for review September 29, 2023 14:06
Copy link
Contributor

@BalduinLandolt BalduinLandolt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! How big is the difference?

@seakayone
Copy link
Contributor Author

Cool! How big is the difference?

Have not measured it exactly: The query in the issue is on a resource of webern restricted to a Bibliography class with two query pattern is like on two different properties. That query times out on Fuseki. The improvement in this issue makes it work in a couple of seconds. Hope I can see the exact improvement once I set up our dashboard for the query metrics.

@seakayone seakayone merged commit 86cc4f2 into main Oct 2, 2023
14 checks passed
@seakayone seakayone deleted the fix/speed-up-gravsearch-filter-deleted branch October 2, 2023 08:08
SepidehAlassi pushed a commit to SepidehAlassi/knora that referenced this pull request Nov 17, 2023
@SepidehAlassi SepidehAlassi mentioned this pull request Nov 17, 2023
11 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants