-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(gravsearch): Prevent duplicate results #1626
Conversation
# Conflicts: # webapi/src/main/scala/org/knora/webapi/util/StringFormatter.scala
so for now, this is only the test case demonstrating the bug of the issue #1588 Sorry @benjamingeer, I assign this to you as you are also the assignee on #1588 |
@loicjaouen What would you like to change in the generated SPARQL SELECT? We need the GROUP_CONCAT for each variable whose values you want to receive in the API response. |
I can confirm that omitting This is the simplified JSON for the first resource: {
"@graph": [
{
"@id": "http://rdfh.ch/0803/2e0252679a35",
"@type": "knora-api:Region",
"knora-api:isRegionOfValue": {
"@id": "http://rdfh.ch/0803/2e0252679a35/values/498d07ec-dddb-4051-bfd1-62626f282fc5",
"@type": "knora-api:LinkValue",
"knora-api:linkValueHasTarget": {
"@id": "http://rdfh.ch/0803/3f89a693a501",
"@type": "http://0.0.0.0:3333/ontology/0803/incunabula/v2#page",
"http://0.0.0.0:3333/ontology/0803/incunabula/v2#partOfValue": {
"@id": "http://rdfh.ch/0803/3f89a693a501/values/133ea4ce-b9a5-4c66-a3ab-38be07ccd201",
"@type": "knora-api:LinkValue",
"knora-api:linkValueHasTarget": {
"@id": "http://rdfh.ch/0803/ff17e5ef9601",
"@type": "http://0.0.0.0:3333/ontology/0803/incunabula/v2#book",
"http://0.0.0.0:3333/ontology/0803/incunabula/v2#title": {
"@id": "http://rdfh.ch/0803/ff17e5ef9601/values/d9a522845006",
"@type": "knora-api:TextValue",
"knora-api:valueAsString": "Zeitglöcklein des Lebens und Leidens Christi",
},
"knora-api:hasIncomingLinkValue": [
{
"@id": "http://rdfh.ch/0803/6240ce899801/values/e29da7bf-bfa5-4842-a4fc-e455c72dbb0a",
"@type": "knora-api:LinkValue",
"knora-api:linkValueHasSource": {
"@id": "http://rdfh.ch/0803/6240ce899801",
"@type": "http://0.0.0.0:3333/ontology/0803/incunabula/v2#page",
"http://0.0.0.0:3333/ontology/0803/incunabula/v2#partOfValue": {
"@id": "http://rdfh.ch/0803/6240ce899801/values/e29da7bf-bfa5-4842-a4fc-e455c72dbb0a",
"@type": "knora-api:LinkValue",
"knora-api:linkValueHasTargetIri": {
"@id": "http://rdfh.ch/0803/ff17e5ef9601"
}
},
"rdfs:label": "a3v"
}
},
{
"@id": "http://rdfh.ch/0803/c568b7239a01/values/46951e64-8f06-47fb-a07a-c8b19c146447",
"@type": "knora-api:LinkValue",
"knora-api:linkValueHasSource": {
"@id": "http://rdfh.ch/0803/c568b7239a01",
"@type": "http://0.0.0.0:3333/ontology/0803/incunabula/v2#page",
"http://0.0.0.0:3333/ontology/0803/incunabula/v2#partOfValue": {
"@id": "http://rdfh.ch/0803/c568b7239a01/values/46951e64-8f06-47fb-a07a-c8b19c146447",
"@type": "knora-api:LinkValue",
"knora-api:linkValueHasTargetIri": {
"@id": "http://rdfh.ch/0803/ff17e5ef9601"
}
},
"rdfs:label": "a7r"
}
}
],
"rdfs:label": "Zeitglöcklein des Lebens und Leidens Christi"
}
},
"rdfs:label": "d8r"
}
},
"rdfs:label": "DEMOX"
}
],
"@context": {
"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
"xsd": "http://www.w3.org/2001/XMLSchema#",
"knora-api": "http://api.knora.org/ontology/knora-api/v2#"
}
} |
After talking with @tobiasschweizer, we decided to handle further refactoring in a separate PR (see issue #1632). |
# Conflicts: # webapi/src/main/scala/org/knora/webapi/responders/v2/search/MainQueryResultProcessor.scala
- Simplify Gravsearch paging. - Add design docs.
# Conflicts: # docs/src/paradox/05-internals/design/api-v2/gravsearch.md
# Conflicts: # docs/src/paradox/05-internals/design/api-v2/gravsearch.md # webapi/src/main/scala/org/knora/webapi/responders/v2/SearchResponderV2.scala # webapi/src/main/scala/org/knora/webapi/responders/v2/search/MainQueryResultProcessor.scala # webapi/src/main/scala/org/knora/webapi/responders/v2/search/QueryTraverser.scala # webapi/src/main/scala/org/knora/webapi/responders/v2/search/gravsearch/prequery/AbstractPrequeryGenerator.scala # webapi/src/main/scala/org/knora/webapi/responders/v2/search/gravsearch/prequery/NonTriplestoreSpecificGravsearchToPrequeryGenerator.scala
# Conflicts: # docs/src/paradox/05-internals/design/api-v2/gravsearch.md # webapi/src/main/scala/org/knora/webapi/messages/v2/responder/resourcemessages/ResourceMessagesV2.scala # webapi/src/main/scala/org/knora/webapi/responders/v2/ResourcesResponderV2.scala # webapi/src/main/scala/org/knora/webapi/responders/v2/SearchResponderV2.scala # webapi/src/main/scala/org/knora/webapi/responders/v2/StandoffResponderV2.scala # webapi/src/main/scala/org/knora/webapi/util/ConstructResponseUtilV2.scala # webapi/src/test/scala/org/knora/webapi/util/ConstructResponseUtilV2Spec.scala
@benjamingeer I will review this PR today in the afternoon. |
Doing this review will also help me understand our own paper :-) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a first look, please see my comments. However, I would appreciate if I could talk to you on video chat some time next week to make sure I understand the changes introduced by this PR.
The prequery's SELECT clause is built by | ||
`NonTriplestoreSpecificGravsearchToPrequeryTransformer.getSelectColumns`, | ||
based on the variables used in the input query's `CONSTRUCT` clause. | ||
The resulting SELECT clause looks as follows: | ||
|
||
```sparql | ||
SELECT DISTINCT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am unaware of the construct (GROUP_CONCAT(DISTINCT(IF(BOUND(?book), STR(?book), "")); SEPARATOR='') AS ?book__Concat)
(82f8a55). I presume this means that if the variable is bound, an IRI is returned, otherwise an empty string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. This is explained in the paragraph starting on line 211:
Each
GROUP_CONCAT
checks whether the concatenated variable is bound in each result in the group; if a variable is unbound, we concatenate an empty string. This is necessary because, in Apache Jena (and perhaps other triplestores), "IfGROUP_CONCAT
has an unbound value in the list of values to concat, the overall result is 'error'" (see this Jena issue).
@@ -281,7 +282,8 @@ class SearchResponderV2(responderData: ResponderData) extends ResponderWithStand | |||
|
|||
// Create a Select prequery | |||
|
|||
nonTriplestoreSpecificConstructToSelectTransformer: NonTriplestoreSpecificGravsearchToCountPrequeryGenerator = new NonTriplestoreSpecificGravsearchToCountPrequeryGenerator( | |||
nonTriplestoreSpecificConstructToSelectTransformer: NonTriplestoreSpecificGravsearchToCountPrequeryTransformer = new NonTriplestoreSpecificGravsearchToCountPrequeryTransformer( | |||
constructClause = inputQuery.constructClause, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume you need to know about the Construct clause because you only want to return the IRIs of the resources and values that the client asks for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
* the search criteria. This query will be used to get resource IRIs for a single page of results. These IRIs will be included in a CONSTRUCT | ||
* query to get the actual results for the page. | ||
* | ||
* @param typeInspectionResult the result of type inspection of the original query. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add all args to the doc string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 8d4fdfb.
|
||
/** | ||
* Transforms a preprocessed CONSTRUCT query into a SELECT query that returns only the IRIs and sort order of the main resources that matched | ||
* the search criteria. This query will be used to get resource IRIs for a single page of results. These IRIs will be included in a CONSTRUCT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the comment should include additional information such as:
Transforms a preprocessed CONSTRUCT query into a SELECT query that returns only the IRIs and sort order of the main resources that matched the search criteria and are requested by client in the input query's WHERE clause.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 8d4fdfb.
@@ -0,0 +1,345 @@ | |||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Has NonTriplestoreSpecificGravsearchToPrequeryGenerator.scala
been renamed / moved to NonTriplestoreSpecificGravsearchToPrequeryTransformer
? If yes, I don't see a diff (it looks like the files are unrelated) and it's hard for me to see what has changed exactly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, will attach the diff below.
Here's the diff you asked for: |
No problem. I'll take a couple of days of holiday next week. How about we talk on Friday 16 April? |
Works for me. I would prefer to have an online meeting before 12. Would 10 o'clock work for you? |
Perfect! |
The appointment you created is on a Thursday, but that's also ok for me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the additional comments and adaptions.
Thanks for the review! |
fixes #1588
a simple query in the available
anything
data set:/v2/searchextended/count
/v2/searchextended
the in and outs of this have been discussed in the above mentionned #1588
if ever it helps, here are the generated sparql requests, for the count:
for the search: