Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(gravsearch): Prevent duplicate results #1626

Merged
merged 47 commits into from
Apr 16, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
47 commits
Select commit Hold shift + click to select a range
3dbb59c
feat(api-v2): Refactor to use ForbiddenResource consistently (ongoing).
Feb 19, 2020
b6d3cf2
Merge branch 'develop' into wip/1543-forbidden-resource
Feb 28, 2020
49f4214
test: Fix tests.
Feb 28, 2020
9171746
feat(api-v2): Use ForbiddenResource consistently (ongoing).
Feb 28, 2020
15c3be5
Merge branch 'develop' into wip/1543-forbidden-resource
Feb 28, 2020
cae954d
refactor(api-v2): Move ForbiddenResource from triplestore to app.
Feb 28, 2020
d6c8fc0
refactor(test): Simplify code.
Feb 28, 2020
7f1d91f
refactor(api-v2): Simplify response processing (ongoing).
Feb 28, 2020
623630c
Merge branch 'develop' into wip/1543-forbidden-resource
Feb 28, 2020
5d1b28e
feat(api-v2): Add ForbiddenValue (ongoing).
Mar 3, 2020
e86b20b
feat(api-v2): Remove ForbiddenResource (ongoing).
Mar 4, 2020
c25808e
test(api-v2): Fix compile error.
Mar 4, 2020
d8d1c38
feat(api-v2): Get rid of ForbiddenResource (ongoing).
Mar 5, 2020
da23fc6
feat(api-v2): Remove ForbiddenResource (ongoing).
Mar 5, 2020
baeff8a
feat(api-v2): Remove ForbiddenResource (ongoing).
Mar 5, 2020
144863d
feat(api-v2): Remove ForbiddenResource (ongoing).
Mar 5, 2020
f3ed00c
refactor(api-v2): Move checkResourceIris into ReadResourcesSequenceV2.
Mar 5, 2020
5de2b47
style(api-v2): Clean up a few things.
Mar 5, 2020
ba61578
docs(gravsearch): Update docs.
Mar 6, 2020
8a0c8c6
Merge branch 'develop' into wip/1543-forbidden-resource
Mar 9, 2020
ce602dc
initial commit of a possible test case
loicjaouen Mar 9, 2020
9499f85
add the count route
loicjaouen Mar 10, 2020
62d843c
Merge branch 'develop' into wip/1543-forbidden-resource
Mar 10, 2020
81ec5c9
Merge branch 'develop' into wip/1588_duplicates
Mar 11, 2020
06a2c41
Merge branch 'develop' into wip/1588_duplicates
Mar 11, 2020
d56811c
Merge branch 'develop' into wip/1543-forbidden-resource
Mar 11, 2020
01e1eef
Merge branch 'wip/1543-forbidden-resource' into wip/1588_duplicates
Mar 11, 2020
85541cd
refactor(gravsearch): Don't query values that the user didn't request…
Mar 11, 2020
40fa184
test(gravsearch): Add client test data with paging.
Mar 12, 2020
926c942
refactor(gravsearch): Don't query values that the user didn't request…
Mar 12, 2020
b7a132e
refactor(gravsearch): Don't query values that the user didn't request.
Mar 12, 2020
6bf6d43
fix(gravsearch): Merge duplicate rows in prequery results.
Mar 12, 2020
d6a0f3a
refactor(gravsearch): Get rid of some vars, make things more functional.
Mar 13, 2020
3934254
style(gravesearch): Rename method.
Mar 13, 2020
717da58
style(gravsearch): Fix comment.
Mar 13, 2020
b46c4f0
docs(gravsearch): Update design docs.
Mar 20, 2020
5232359
Merge branch 'wip/1543-forbidden-resource' into wip/1588_duplicates
Mar 20, 2020
8617571
docs(gravsearch): Update design docs.
Mar 20, 2020
8e81c67
feat(upgrade): Increment knora-base version.
Mar 20, 2020
c3b19cd
Merge branch 'wip/1543-forbidden-resource' into wip/1588_duplicates
Mar 20, 2020
9794698
test(ResourcesResponderV2): Increase timeouts.
Mar 20, 2020
8993daa
Merge branch 'develop' into wip/1588_duplicates
Apr 3, 2020
3db3a82
Merge branch 'develop' into wip/1588_duplicates
Apr 8, 2020
c7b9703
fix(gravsearch): Handle empty GROUP_CONCAT values resulting from unbo…
Apr 8, 2020
9f38418
fix(gravsearch): Ignore empty strings in GROUP_CONCAT results.
Apr 8, 2020
8d4fdfb
style(gravsearch): Improve comments.
Apr 9, 2020
06be757
refactor(gravsearch): Clarify code after review.
Apr 16, 2020
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
32 changes: 20 additions & 12 deletions docs/src/paradox/05-internals/design/api-v2/gravsearch.md
Original file line number Diff line number Diff line change
Expand Up @@ -174,16 +174,10 @@ PREFIX knora-api: <http://api.knora.org/ontology/knora-api/simple/v2#>
}
```

The prequery's SELECT clause is built using the member variables defined in `AbstractPrequeryGenerator`.
State of member variables after transformation of the input query into the prequery:

- `mainResourceVariable`: `QueryVariable(page)`
- `dependentResourceVariables`: `Set(QueryVariable(book))`
- `dependentResourceVariablesGroupConcat`: `Set(QueryVariable(book__Concat))`
- `valueObjectVariables`: `Set(QueryVariable(book__LinkValue), QueryVariable(seqnum))`: `?book` represents the dependent resource and `?book__LinkValue` the link value connecting `?page` and `?book`.
- `valueObjectVariablesGroupConcat`: `Set(QueryVariable(seqnum__Concat), QueryVariable(book__LinkValue__Concat))`

The resulting SELECT clause of the prequery looks as follows:
The prequery's SELECT clause is built by
`NonTriplestoreSpecificGravsearchToPrequeryTransformer.getSelectColumns`,
based on the variables used in the input query's `CONSTRUCT` clause.
The resulting SELECT clause looks as follows:

```sparql
SELECT DISTINCT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unaware of the construct (GROUP_CONCAT(DISTINCT(IF(BOUND(?book), STR(?book), "")); SEPARATOR='') AS ?book__Concat) (82f8a55). I presume this means that if the variable is bound, an IRI is returned, otherwise an empty string.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. This is explained in the paragraph starting on line 211:

Each GROUP_CONCAT checks whether the concatenated variable is bound in each result in the group; if a variable is unbound, we concatenate an empty string. This is necessary because, in Apache Jena (and perhaps other triplestores), "If GROUP_CONCAT has an unbound value in the list of values to concat, the overall result is 'error'" (see this Jena issue).

Expand Down Expand Up @@ -219,6 +213,11 @@ is unbound, we concatenate an empty string. This is necessary because, in Apache
triplestores), "If `GROUP_CONCAT` has an unbound value in the list of values to concat, the overall result is 'error'"
(see [this Jena issue](https://issues.apache.org/jira/browse/JENA-1856)).

If the input query contains a `UNION`, and a variable is bound in one branch
of the `UNION` and not in another branch, it is possible that the prequery
will return more than one row per main resource. To deal with this situation,
`SearchResponderV2` merges rows that contain the same main resource IRI.

### Main Query

The purpose of the main query is to get all requested information about the main resource, dependent resources, and value objects.
Expand All @@ -233,8 +232,17 @@ The classes involved in generating the main query can be found in

The main query is a SPARQL CONSTRUCT query. Its generation is handled by the
method `GravsearchMainQueryGenerator.createMainQuery`.
It takes three arguments: `mainResourceIris: Set[IriRef], dependentResourceIris:
Set[IriRef], valueObjectIris: Set[IRI]`. From the given Iris, statements are
It takes three arguments:
`mainResourceIris: Set[IriRef], dependentResourceIris: Set[IriRef], valueObjectIris: Set[IRI]`.

These sets are constructed based on information about variables representing
dependent resources and value objects in the prequery, which is provided by
`NonTriplestoreSpecificGravsearchToPrequeryTransformer`:

- `dependentResourceVariablesGroupConcat`: `Set(QueryVariable(book__Concat))`
- `valueObjectVariablesGroupConcat`: `Set(QueryVariable(seqnum__Concat), QueryVariable(book__LinkValue__Concat))`

From the given Iris, statements are
generated that ask for complete information on *exactly* these resources and
values. For any given resource Iri, only the values present in
`valueObjectIris` are to be queried. This is achieved by using SPARQL's
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -824,6 +824,17 @@ case class ReadResourcesSequenceV2(resources: Seq[ReadResourceV2],
)
}

private def getOntologiesFromResource(resource: ReadResourceV2): Set[SmartIri] = {
val propertyIriOntologies: Set[SmartIri] = resource.values.keySet.map(_.getOntologyFromEntity)

val valueOntologies: Set[SmartIri] = resource.values.values.flatten.collect {
case readLinkValueV2: ReadLinkValueV2 =>
readLinkValueV2.valueContent.nestedResource.map(nested => getOntologiesFromResource(nested))
}.flatten.flatten.toSet

propertyIriOntologies ++ valueOntologies + resource.resourceClassIri.getOntologyFromEntity
}

// #generateJsonLD
private def generateJsonLD(targetSchema: ApiV2Schema, settings: SettingsImpl, schemaOptions: Set[SchemaOption]): JsonLDDocument = {
// #generateJsonLD
Expand All @@ -843,14 +854,7 @@ case class ReadResourcesSequenceV2(resources: Seq[ReadResourceV2],
// Make JSON-LD prefixes for the project-specific ontologies used in the response.

val projectSpecificOntologiesUsed: Set[SmartIri] = resources.flatMap {
resource =>
val resourceOntology = resource.resourceClassIri.getOntologyFromEntity

val propertyOntologies = resource.values.keySet.map {
property => property.getOntologyFromEntity
}

propertyOntologies + resourceOntology
resource => getOntologiesFromResource(resource)
}.toSet.filter(!_.isKnoraBuiltInDefinitionIri)

// Make the knora-api prefix for the target schema.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -744,7 +744,7 @@ class ValuesResponderV1(responderData: ResponderData) extends Responder(responde
// If we're updating a link, findResourceWithValueResult will contain the IRI of the property that points to the
// knora-base:LinkValue, but we'll need the IRI of the corresponding link property.
val propertyIri = changeValueRequest.value match {
case linkUpdateV1: LinkUpdateV1 => stringFormatter.linkValuePropertyIri2LinkPropertyIri(findResourceWithValueResult.propertyIri)
case linkUpdateV1: LinkUpdateV1 => stringFormatter.linkValuePropertyIriToLinkPropertyIri(findResourceWithValueResult.propertyIri)
case _ => findResourceWithValueResult.propertyIri
}

Expand Down Expand Up @@ -1075,7 +1075,7 @@ class ValuesResponderV1(responderData: ResponderData) extends Responder(responde
case (p, o) => p == OntologyConstants.KnoraBase.HasPermissions
}.map(_._2).getOrElse(throw InconsistentTriplestoreDataException(s"Value ${deleteValueRequest.valueIri} has no permissions"))

val linkPropertyIri = stringFormatter.linkValuePropertyIri2LinkPropertyIri(findResourceWithValueResult.propertyIri)
val linkPropertyIri = stringFormatter.linkValuePropertyIriToLinkPropertyIri(findResourceWithValueResult.propertyIri)

for {
// Get project info
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -1210,6 +1210,7 @@ class ResourcesResponderV2(responderData: ResponderData) extends ResponderWithSt
apiResponse: ReadResourcesSequenceV2 <- ConstructResponseUtilV2.createApiResponse(
mainResourcesAndValueRdfData = mainResourcesAndValueRdfData,
orderByResourceIri = resourceIrisDistinct,
pageSizeBeforeFiltering = resourceIris.size, // doesn't matter because we're not doing paging
mappings = mappingsAsMap,
queryStandoff = queryStandoff,
versionDate = versionDate,
Expand Down Expand Up @@ -1264,6 +1265,7 @@ class ResourcesResponderV2(responderData: ResponderData) extends ResponderWithSt
apiResponse: ReadResourcesSequenceV2 <- ConstructResponseUtilV2.createApiResponse(
mainResourcesAndValueRdfData = mainResourcesAndValueRdfData,
orderByResourceIri = resourceIrisDistinct,
pageSizeBeforeFiltering = resourceIris.size, // doesn't matter because we're not doing paging
mappings = Map.empty[IRI, MappingAndXSLTransformation],
queryStandoff = false,
versionDate = None,
Expand Down
Loading