Level of SPARQL Update support #125

Open
kjetilk opened this issue Nov 26, 2019 · 43 comments
Labels: doc: Ecosystem · status: Nominated (an issue that has been nominated for the next monthly milestone) · topic: querying · topic: resource access


@kjetilk
Member

kjetilk commented Nov 26, 2019

A certain level of SPARQL Update support is expected in Solid, to be used with the PATCH method (#85). This discussion has begun, and this issue is to discuss some details that we need to decide upon.

The main questions are:

  1. What subset of SPARQL Update is suited as a minimal requirement for Solid?
  2. How does WAC apply to the minimal subset?
  3. How does WAC apply to SPARQL Update for implementations that will use a fully compliant SPARQL implementation?
  4. What would be the URI of SPARQL Endpoint(s)?
  5. Should any SPARQL Update operations be forbidden?
  6. Should complex HTTP Verbs rather be SPARQL operations?

A short overview of SPARQL Update

SPARQL Update has three operations that are relevant to us: INSERT DATA, DELETE DATA, and INSERT/DELETE. SPARQL always operates over quads and quad patterns, whether the quads are passed directly as data to the two former operations or selected with the keywords WITH and USING in the INSERT/DELETE operation. In the context of Solid, each resource is represented with triples, and since the graph part is optional, we can safely ignore it for now.

INSERT DATA takes the triples in its curly brackets and RDF-merges them into the resource. DELETE DATA deletes exactly the triples in its curly brackets, if they exist. It is important to note that these two operations take no variables, so no pattern matching is going on. They are the simplest forms of SPARQL Update.

If pattern matching is required, i.e. you need variables, then those variables go in a WHERE clause, and thus the more advanced INSERT/DELETE operation must be used.

Minimal SPARQL Update requirement

Quite clearly, the two operations INSERT DATA and DELETE DATA have some interesting properties. Supporting them does not require a query engine; it only requires that the RDF library can parse the queries (which is trivial, since they contain just triples, no patterns), perform an RDF merge, and delete triples. The DELETE DATA operation can't contain blank nodes, which simplifies things further. Moreover, since both operations can be sent in a single HTTP request, the update can be implemented as an atomic operation with relative ease.

Once a WHERE clause is added, for the more complex INSERT/DELETE operation, pretty much a full SPARQL engine is required, with a much larger, almost complete parser and a query planner.
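To illustrate why these two operations need no query engine, here is a minimal Python sketch that treats a resource as a set of ground triples. The `Triple` alias and the `apply_patch` function are hypothetical names, terms are plain strings, parsing is assumed to have happened already, and blank-node handling during the merge is left out.

```python
from typing import FrozenSet, Iterable, Set, Tuple

# A ground triple: no variables. Terms are plain strings in this sketch.
Triple = Tuple[str, str, str]


def apply_patch(
    resource: FrozenSet[Triple],
    delete_data: Iterable[Triple] = (),
    insert_data: Iterable[Triple] = (),
) -> FrozenSet[Triple]:
    """Return a new triple set with deletions applied before insertions.

    The input is never mutated, so a caller can swap the old set for the
    new one in a single step, which is one way to keep the patch atomic.
    """
    result: Set[Triple] = set(resource)
    # DELETE DATA: remove exactly the listed triples; absent ones are ignored.
    result.difference_update(delete_data)
    # INSERT DATA: plain set union (blank-node renaming is not modelled here).
    result.update(insert_data)
    return frozenset(result)


profile = frozenset({("<alice#me>", "ex:age", "14")})
patched = apply_patch(
    profile,
    delete_data=[("<alice#me>", "ex:age", "14")],
    insert_data=[("<alice#me>", "ex:age", "15")],
)
print(sorted(patched))  # [('<alice#me>', 'ex:age', '15')]
```

Both operations reduce to set difference and set union, which is the whole point of the minimal subset.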

Thus, a requirement to support INSERT DATA and DELETE DATA in a single HTTP PATCH request seems like an attractive option.

WAC as applied to Minimal SPARQL Update

INSERT DATA seems clearly an acl:Append operation, and DELETE DATA is clearly an acl:Write operation.

The question is whether acl:Read should also be required. Imagine a malicious user "Mallory": Mallory is authorized to write but not to read, and does not particularly care if he destroys things; he just wants to check whether certain triples are there. In that case, he can send the query

DELETE DATA {
  <alice/profile#me> ex:age 14 . 
}

The fear would be that Mallory can figure out from the response that Alice was in fact 14 years old. With SPARQL as defined, deleting a triple that isn't there has no effect and still succeeds, so the response reveals nothing and it shouldn't be a problem. However, we have challenged this behaviour, so it may become a problem in Solid, which could be solved by requiring acl:Read to perform a DELETE DATA operation.

The risk may be so remote that it isn't a real concern, but I think we need to discuss it.
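The concern can be made concrete with a small Python sketch that compares two hypothetical server behaviours: one that succeeds regardless of whether the triple existed, and one that reports an error when it did not. The function names and the status codes 200 and 409 are assumptions made for this illustration only; nothing here is taken from a spec or an implementation.

```python
from typing import Callable, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]

# Placeholder status codes for this sketch, not taken from any spec.
OK, CONFLICT = 200, 409


def delete_lenient(graph: Set[Triple], triple: Triple) -> int:
    """Succeed whether or not the triple existed."""
    graph.discard(triple)
    return OK


def delete_strict(graph: Set[Triple], triple: Triple) -> int:
    """Hypothetical stricter server: fail when the triple is absent."""
    if triple not in graph:
        return CONFLICT
    graph.remove(triple)
    return OK


def probe(
    delete: Callable[[Set[Triple], Triple], int],
    graph: Set[Triple],
    guesses: Iterable[Triple],
) -> List[int]:
    """Return the status Mallory sees for each guess, each on a fresh copy."""
    return [delete(set(graph), guess) for guess in guesses]


secret = {("<alice/profile#me>", "ex:age", "14")}
guesses = [("<alice/profile#me>", "ex:age", str(age)) for age in (13, 14, 15)]

print(probe(delete_lenient, secret, guesses))  # [200, 200, 200]
print(probe(delete_strict, secret, guesses))   # [409, 200, 409]
```

Only the second behaviour lets a write-only user distinguish the correct guess, which is the case where requiring acl:Read for DELETE DATA would matter.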

WAC applied to SPARQL as a whole

Some implementations may have a full SPARQL Engine available and will wish to use it. For them, we need to define how WAC applies. As above, INSERT is clearly an acl:Append operation and DELETE is clearly an acl:Write operation, although, with the caveat above, DELETE may also require acl:Read. Whenever a WHERE clause is added, acl:Read would also be required. There is a long-term possibility that data could participate in the query without being exposed to the user, but let's only be concerned with the permission modes we currently have for now. Then, obviously, all the SPARQL read queries require acl:Read.
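The mapping proposed above can be sketched as a small Python function. The `Mode` enum, the `required_modes` function and the `delete_needs_read` switch are hypothetical names; the switch stands for the still-open question of whether DELETE should also require acl:Read.

```python
from enum import Enum
from typing import FrozenSet, Set


class Mode(Enum):
    READ = "acl:Read"
    APPEND = "acl:Append"
    WRITE = "acl:Write"


def required_modes(
    has_insert: bool,
    has_delete: bool,
    has_where: bool,
    delete_needs_read: bool = False,  # the open question in this issue
) -> FrozenSet[Mode]:
    """Collect the access modes a patch would need under this proposal."""
    modes: Set[Mode] = set()
    if has_insert:
        modes.add(Mode.APPEND)
    if has_delete:
        modes.add(Mode.WRITE)
        if delete_needs_read:
            modes.add(Mode.READ)
    if has_where:
        # Pattern matching reads the resource, so Read is always needed.
        modes.add(Mode.READ)
    return frozenset(modes)


# INSERT DATA only
print(sorted(m.value for m in required_modes(True, False, False)))
# DELETE ... INSERT ... WHERE
print(sorted(m.value for m in required_modes(True, True, True)))
```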

SPARQL Endpoint

Historically, SPARQL has been queried through a server-wide SPARQL Endpoint, but the PATCH use case typically makes every resource its own endpoint, and will only query data from that resource. This is a useful simplification, because it removes the need to use graph naming. This assumption may be relaxed in the future, but for now, I suggest we keep it that way.

Other SPARQL Update operations

SPARQL Update also defines operations LOAD, CLEAR, CREATE, DROP, COPY, MOVE and ADD. We might need a brief note on what to do with them.

COPY and MOVE operations

The COPY use case has been proposed in #19, and a possible solution could be to use the SPARQL Update COPY operation instead of a protocol verb. Similarly for MOVE.

Forbidden SPARQL Update operations?

Most of the other operations map trivially to HTTP methods as defined in Solid through LDP. It may be problematic to support them, as WAC must be applied in a consistent manner, and failure to do so may cause leaks. On the other hand, those who have a full SPARQL engine may find it bothersome if they cannot use them. We need to define the behaviour.

@kjetilk kjetilk mentioned this issue Nov 26, 2019
@RubenVerborgh
Contributor

Couple of quick points from my side:

  • What would be the URI of SPARQL Endpoint(s)?

None.

As it stands, there is no notion of a SPARQL endpoint, in the sense of the SPARQL protocol (which would use GET or POST).

Rather, we are using the patch format with its MIME type application/sparql-update as one (mandatory?) accepted patch document of a PATCH operation.
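As a rough illustration of that distinction, the Python sketch below builds (but does not send) a PATCH request aimed at the resource itself, carrying a SPARQL Update body. The URL, the `ex:` prefix and the `build_patch_request` helper are made up for the example.

```python
import urllib.request

# Example patch document; the prefix and triple are placeholders.
PATCH_BODY = """\
PREFIX ex: <http://example.org/>
INSERT DATA { <#me> ex:age 14 . }
"""


def build_patch_request(resource_url: str, body: str) -> urllib.request.Request:
    """Build a PATCH request whose target is the resource, not an endpoint."""
    return urllib.request.Request(
        resource_url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/sparql-update"},
        method="PATCH",
    )


request = build_patch_request("https://alice.example/profile", PATCH_BODY)
print(request.get_method(), request.full_url)
```

Nothing is sent over the network here; passing the request to `urllib.request.urlopen` would be the next step against a real server.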

  • What subset of SPARQL Update is suited as a minimal requirement for Solid?

Additional question: And what should happen when clients go outside of that subset?

  • Should complex HTTP Verbs rather be SPARQL operations?

No, not by default as the minimal interface, given that:

  • We do not use the SPARQL protocol, but rather the SPARQL UPDATE syntax and semantics.
  • Other patch documents such as Notation3 patches exist (support to be decided); SPARQL does not have a special relationship (other than that its support for patch documents might be mandatory).

Other question:

  • What semaphore semantics do we want? The current Solid draft spec deviates from the SPARQL UPDATE standard, which is—in my opinion—highly undesired.
    • My suggestion there would be to follow the SPARQL standard by default, but allow different behaviors, either through Link headers from the client, or by using a different patch body altogether (such as Notation3), for the semantics are still ours to define.

@kjetilk
Member Author

kjetilk commented Dec 19, 2019

As it stands, there is no notion of a SPARQL endpoint, in the sense of the SPARQL protocol (which would use GET or POST).

Right, a flaw in my mental model. Thanks for pointing that out.

  • What subset of SPARQL Update is suited as a minimal requirement for Solid?

Additional question: And what should happen when clients go outside of that subset?

👍

  • Should complex HTTP Verbs rather be SPARQL operations?

No, not by default as the minimal interface, given that:

* We do not use the SPARQL protocol, but rather the SPARQL UPDATE syntax and semantics.

Ah, but I think you misunderstood my point there. I'm not talking about HTTP verbs in relation to the SPARQL Protocol, I'm talking about them in relation to Solid, as in the proposal in #19 to introduce the HTTP verb COPY from WebDAV. Another implementation option there might be to use the SPARQL Update syntax and semantics, not the WebDAV one.

* What semaphore semantics do we want? The current Solid draft spec deviates from the SPARQL UPDATE standard, which is—in my opinion—highly undesired.
  
  * My suggestion there would be to follow the SPARQL standard by default, but allow different behaviors, either through `Link` headers from the client, or by using a different patch body altogether (such as Notation3), for the semantics are still ours to define.

Yeah, it is a pain. I would like to add some more sophistication in SPARQL at this point, but it would take quite an effort to argue for that, I think.

Meanwhile, I would like to see the queries that are used, especially if the DELETE/INSERT/WHERE can be dropped in favour of DELETE DATA ; INSERT DATA.

@kjetilk
Member Author

kjetilk commented Jan 14, 2020

Since one of the most urgent decisions that we need from this is the minimal SPARQL Update requirement, I started to look into what could inform this decision. The TL;DR is: "Is it sufficient for a Solid server to support DELETE DATA and INSERT DATA query forms?"

I'd like to hear the input of @RubenVerborgh and @rubensworks , as it can be informed by the LDFlex work.

I also looked into rdflib, and found that it appears to check whether a statement has a blank node, interprets such a statement as a quad pattern, and therefore uses a WHERE clause: https://github.com/linkeddata/rdflib.js/blob/master/src/update-manager.ts#L776-L797
The key to understanding the requirement is therefore to see to what extent blank nodes are used in updates using rdflib.
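The decision described here can be sketched in a few lines of Python. This is not rdflib.js code and does not reproduce its logic; it only illustrates the rule "a blank node anywhere forces a WHERE clause", with blank nodes assumed to be written as `_:` labels and all names being hypothetical.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]


def is_blank(term: str) -> bool:
    """Assumption for this sketch: blank nodes are written as '_:label'."""
    return term.startswith("_:")


def needs_where_clause(statements: Iterable[Triple]) -> bool:
    """True if any statement mentions a blank node and so must be matched."""
    return any(is_blank(term) for statement in statements for term in statement)


ground = [("<#me>", "ex:name", '"Alice"')]
with_bnode = [("<#me>", "ex:address", "_:b0"), ("_:b0", "ex:city", '"Oslo"')]

print(needs_where_clause(ground))      # False
print(needs_where_clause(with_bnode))  # True
```

If most real-world updates fall in the first category, DELETE DATA and INSERT DATA would cover them.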

@rubensworks

Currently, LDflex can also produce WHERE clauses for insertions and deletions.
Several examples can be seen in the unit tests.

I do however think that it may be possible to disallow WHERE clauses, require the client to perform a query beforehand, and fill in all the triples that need to be mutated directly.
In some cases, this could cause a blowup in the number of triples though, but this may be manageable in the context of Solid.
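A minimal Python sketch of that client-side approach follows, assuming the client has already fetched the resource's triples. It handles a single triple pattern only, and `match` and `expand_to_delete_data` are hypothetical helpers, not LDflex functions.

```python
from typing import Dict, Iterable, Optional, Tuple

Triple = Tuple[str, str, str]


def match(pattern: Triple, triple: Triple) -> Optional[Dict[str, str]]:
    """Bind variables (terms starting with '?') or return None on mismatch."""
    binding: Dict[str, str] = {}
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            # A repeated variable must bind to the same term each time.
            if binding.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return binding


def expand_to_delete_data(pattern: Triple, fetched: Iterable[Triple]) -> str:
    """Replace a pattern by the concrete triples it matches, as DELETE DATA."""
    hits = sorted(t for t in fetched if match(pattern, t) is not None)
    lines = [f"  {s} {p} {o} ." for s, p, o in hits]
    return "DELETE DATA {\n" + "\n".join(lines) + "\n}"


fetched = [
    ("<#me>", "ex:knows", "<bob#me>"),
    ("<#me>", "ex:knows", "<carol#me>"),
    ("<#me>", "ex:name", '"Alice"'),
]
print(expand_to_delete_data(("<#me>", "ex:knows", "?who"), fetched))
```

The output grows with the number of matches, which is the blowup mentioned above. Matched triples containing blank nodes could not be written this way, since DELETE DATA does not allow them.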

@RubenVerborgh
Contributor

The TL;DR is: "Is it sufficient for a Solid server to support DELETE DATA and INSERT DATA query forms?"

I don't think so; the semaphore functionality is important to many Solid apps. See #139

@kjetilk
Member Author

kjetilk commented Jan 15, 2020

But isn't that orthogonal to the semaphore issue?

I just saw you restarted discussion in solid/solid-spec#193 , I'll go over there.

@RubenVerborgh
Contributor

RubenVerborgh commented Jan 15, 2020

But isn't that orthogonal to the semaphore issue?

The current semaphore mechanism relies on INSERT … WHERE, in which the WHERE clause ensures the existence of one thing before writing another. The less related part is whether the semaphore should also work if there is more than one match to the WHERE clause (spec says yes, Tim says no).
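As a rough Python sketch of that mechanism, the function below inserts a triple only when a WHERE pattern matches, and has a switch for the disputed multiple-match case. It is limited to a single triple pattern, and `bind`, `insert_where` and `single_match_only` are hypothetical names, not taken from the spec or any implementation.

```python
from typing import Dict, Optional, Set, Tuple

Triple = Tuple[str, str, str]


def bind(pattern: Triple, triple: Triple) -> Optional[Dict[str, str]]:
    """Bind variables (terms starting with '?') or return None on mismatch."""
    binding: Dict[str, str] = {}
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if binding.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return binding


def insert_where(
    graph: Set[Triple],
    where: Triple,
    template: Triple,
    single_match_only: bool = False,
) -> int:
    """Apply INSERT ... WHERE in place; return the number of solutions used."""
    solutions = [b for b in (bind(where, t) for t in graph) if b is not None]
    if single_match_only and len(solutions) != 1:
        return 0  # refuse: zero or several matches
    for b in solutions:
        s, p, o = (b.get(term, term) for term in template)
        graph.add((s, p, o))
    return len(solutions)


graph = {("<#lock>", "ex:heldBy", "<#nobody>")}
applied = insert_where(
    graph,
    where=("<#lock>", "ex:heldBy", "<#nobody>"),
    template=("<#doc>", "ex:editedBy", "<#me>"),
)
print(applied)  # 1
```

A return value of 0 means the precondition did not hold and nothing was written; how a server should report that to the client is exactly what is under discussion.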

@kjetilk
Member Author

kjetilk commented Jan 16, 2020

I have made a loose proposal to the SPARQL 1.2 CG mailing list, which I think would address the semaphore problem as well as the confidentiality problem:
https://lists.w3.org/Archives/Public/public-sparql-12/2020Jan/0000.html

I suggest that further discussion is held in a query-panel repository (solid/process#186) or in the SPARQL 1.2 CG as appropriate.

@RubenVerborgh
Contributor

Note: we might (or might not) want to move issues such as this one over there.

@kjetilk
Member Author

kjetilk commented Jan 16, 2020

Yeah, actually, my idea, which is codified in solid/process#182, is that this is exactly the kind of overarching issue that should live in the spec repo, for the editors to track and for the panel to report progress on, so that it moves along the editors' project board. Issues like "what permissions are required for different operations" would instead be opened by the panel in the panel repo, and those aren't the editors' task to track.

@kjetilk
Member Author

kjetilk commented Jan 17, 2020

A new Query Panel has been formed, and the issues from here have been detailed as individual issues there. There's also a gitter channel. Further detailed discussion should happen there.

This will now serve as the birds-eye view issue that serves as a contact point between the Query Panel and the Editors.

@ericprud

ericprud commented Aug 3, 2021

The Solid Editors Meeting today, with @timbl , @csarven and @kjetilk present resolved:

  1. We adopt @ericprud 's proposed subset, but add INSERT DATA and DELETE DATA.

I just touched that proposal page to fix the link to an HTML-ized RFC.

Feel free to edit that proposal and the associated yacker. If you want to keep the orig grammar around for posterity, you can save the edited grammar under a new name. One way to decide whether to duplicate is whether someone somewhere would want to instantiate that proposal without INSERT DATA and DELETE DATA. I kinda doubt it but leave it to your discretion.

@TallTed
Contributor

TallTed commented Aug 3, 2021

Because it's important to keep things clear, and wiki pages should always be considered moving targets, and future wiki edits might change the SPARQL UPDATE subset on that page...

Subset page as resolved for adoption -- https://www.w3.org/2001/sw/wiki/index.php?title=SparqlPatch&oldid=4800

Today's tweaked page (which did not change the SPARQL UPDATE subset) -- https://www.w3.org/2001/sw/wiki/index.php?title=SparqlPatch&oldid=5335

@kjetilk kjetilk added the status: Nominated An issue that has been nominated for the next monthly milestone label Sep 14, 2021
@kjetilk kjetilk added this to the October 2021 milestone Sep 22, 2021
@kjetilk
Member Author

kjetilk commented Sep 22, 2021

This issue has been nominated for drafting phase for the next milestone.

@kjetilk
Member Author

kjetilk commented Oct 5, 2021

I have started work on the draft, and for the first iteration I have attempted to produce a BNF, but yacker was down, so I couldn't use that. Also, it is the first time I'm writing a BNF, and I have some confusion around the differences between the variations of BNF.

I tried to stay close to SPARQL-patch, but made some changes and also introduced no fewer than 10 new rules to accommodate INSERT DATA and DELETE DATA, which aren't well defined in the SPARQL 1.1 spec. It would be great if you could have a pass, @ericprud, to see if this looks sensible...

[30p]  Update1                    ::=  Prologue ( InsertData | DeleteData | Modify )
[38p]  InsertData                 ::=  'INSERT DATA' TripleData
[39p]  DeleteData                 ::=  'DELETE DATA' TripleData

[41p]  Modify                     ::=  ( DeleteClause InsertClause? | InsertClause ) 'WHERE' GroupGraphPattern
[4]    Prologue                   ::=  ( BaseDecl | PrefixDecl )*
[5]    BaseDecl                   ::=  'BASE' IRIREF
[6]    PrefixDecl                 ::=  'PREFIX' PNAME_NS IRIREF

[42p]  DeleteClause               ::=  'DELETE' '{' TriplesTemplate '}'
[43p]  InsertClause               ::=  'INSERT' '{' TriplesTemplate '}'
[52]   TriplesTemplate            ::=  TriplesSameSubject ( '.' TriplesTemplate? )?
[52d]  TripleData                 ::=  TriplesDataSameSubject ( '.' TripleData? )?
[53p]  GroupGraphPattern          ::=  '{' GroupGraphPatternSub '}'
[54p]  GroupGraphPatternSub       ::=  TriplesBlock
[55p]  TriplesBlock               ::=  TriplesSameSubject ( '.' TriplesBlock? )?
[75]   TriplesSameSubject         ::=  VarOrTerm PropertyListNotEmpty | TriplesNode PropertyList
[75d]  TriplesDataSameSubject     ::=  GraphTerm PropertyDataListNotEmpty | TriplesDataNode PropertyDataList
[76]   PropertyList               ::=  PropertyListNotEmpty?
[77]   PropertyListNotEmpty       ::=  Verb ObjectList ( ';' ( Verb ObjectList )? )*
[76d]  PropertyDataList           ::=  PropertyDataListNotEmpty?
[77d]  PropertyDataListNotEmpty   ::=  Verb ObjectDataList ( ';' ( Verb ObjectDataList )? )*
[78p]  Verb                       ::=  iri | 'a'
[79]   ObjectList                 ::=  Object ( ',' Object )*
[79d]  ObjectDataList             ::=  ObjectData ( ',' ObjectData )*
[80]   Object                     ::=  GraphNode
[80d]  ObjectData                 ::=  GraphDataNode
[98]   TriplesNode                ::=  Collection | BlankNodePropertyList
[98d]  TriplesDataNode            ::=  CollectionData | BlankNodeDataPropertyList
[99]   BlankNodePropertyList      ::=  '[' PropertyListNotEmpty ']'
[99d]  BlankNodeDataPropertyList  ::=  '[' PropertyDataListNotEmpty ']'
[102]  Collection                 ::=  '(' GraphNode+ ')'
[102d] CollectionData             ::=  '(' GraphDataNode+ ')'
[104]  GraphNode                  ::=  VarOrTerm | TriplesNode
[104d] GraphDataNode              ::=  GraphTerm | TriplesDataNode
[106]  VarOrTerm                  ::=  Var | GraphTerm
[108]  Var                        ::=  VAR1 | VAR2
[109]  GraphTerm                  ::=  iri | RDFLiteral | NumericLiteral | BooleanLiteral | BlankNode | NIL
[129]  RDFLiteral                 ::=  String ( LANGTAG | ( '^^' iri ) )?
[130]  NumericLiteral             ::=  NumericLiteralUnsigned | NumericLiteralPositive | NumericLiteralNegative
[131]  NumericLiteralUnsigned     ::=  INTEGER | DECIMAL | DOUBLE
[132]  NumericLiteralPositive     ::=  INTEGER_POSITIVE | DECIMAL_POSITIVE | DOUBLE_POSITIVE
[133]  NumericLiteralNegative     ::=  INTEGER_NEGATIVE | DECIMAL_NEGATIVE | DOUBLE_NEGATIVE
[134]  BooleanLiteral             ::=  'true' | 'false'
[135]  String                     ::=  STRING_LITERAL1 | STRING_LITERAL2
                                       | STRING_LITERAL_LONG1 | STRING_LITERAL_LONG2
[136]  iri                        ::=  IRIREF | PrefixedName
[137]  PrefixedName               ::=  PNAME_LN | PNAME_NS
[138]  BlankNode                  ::=  BLANK_NODE_LABEL | ANON

@terminals
[139]  IRIREF                     ::=  '<' ([^<>\"{}|^`\\]-[#x00-#x20])* '>'
[140]  PNAME_NS                   ::=  PN_PREFIX? ':'
[141]  PNAME_LN                   ::=  PNAME_NS PN_LOCAL
[142]  BLANK_NODE_LABEL           ::=  '_:' ( PN_CHARS_U | [0-9] ) ((PN_CHARS|'.')* PN_CHARS)?
[143]  VAR1                       ::=  '?' VARNAME
[144]  VAR2                       ::=  '$' VARNAME
[145]  LANGTAG                    ::=  '@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)*
[146]  INTEGER                    ::=  [0-9]+
[147]  DECIMAL                    ::=  [0-9]* '.' [0-9]+
[148]  DOUBLE                     ::=  [0-9]+ '.' [0-9]* EXPONENT | '.' ([0-9])+ EXPONENT | ([0-9])+ EXPONENT
[149]  INTEGER_POSITIVE           ::=  '+' INTEGER
[150]  DECIMAL_POSITIVE           ::=  '+' DECIMAL
[151]  DOUBLE_POSITIVE            ::=  '+' DOUBLE
[152]  INTEGER_NEGATIVE           ::=  '-' INTEGER
[153]  DECIMAL_NEGATIVE           ::=  '-' DECIMAL
[154]  DOUBLE_NEGATIVE            ::=  '-' DOUBLE
[155]  EXPONENT                   ::=  [eE] [+-]? [0-9]+
[156]  STRING_LITERAL1            ::=  "'" ( ([^#x27#x5C#xA#xD]) | ECHAR )* "'"
[157]  STRING_LITERAL2            ::=  '"' ( ([^#x22#x5C#xA#xD]) | ECHAR )* '"'
[158]  STRING_LITERAL_LONG1       ::=  "'''" ( ( "'" | "''" )? ( [^'\\] | ECHAR ) )* "'''"
[159]  STRING_LITERAL_LONG2       ::=  '"""' ( ( '"' | '""' )? ( [^"\\] | ECHAR ) )* '"""'
[160]  ECHAR                      ::=  '\\' [tbnrf\\"']
[161]  NIL                        ::=  '(' WS* ')'
[162]  WS                         ::=  #x20 | #x9 | #xD | #xA
[163]  ANON                       ::=  '[' WS* ']'
[164]  PN_CHARS_BASE              ::=  [A-Z] | [a-z] | [#x00C0-#x00D6] | [#x00D8-#x00F6] | [#x00F8-#x02FF] |
                                       [#x0370-#x037D] | [#x037F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] |
                                       [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] |
                                       [#x10000-#xEFFFF]
[165]  PN_CHARS_U                 ::=  PN_CHARS_BASE | '_'
[166]  VARNAME                    ::=  ( PN_CHARS_U | [0-9] ) ( PN_CHARS_U | [0-9] | #x00B7 | [#x0300-#x036F] | [#x203F-#x2040] )*
[167]  PN_CHARS                   ::=  PN_CHARS_U | '-' | [0-9] | #x00B7 | [#x0300-#x036F] | [#x203F-#x2040]
[168]  PN_PREFIX                  ::=  PN_CHARS_BASE ((PN_CHARS|'.')* PN_CHARS)?
[169]  PN_LOCAL                   ::=  (PN_CHARS_U | ':' | [0-9] | PLX ) ((PN_CHARS | '.' | ':' | PLX)* (PN_CHARS | ':' | PLX) )?
[170]  PLX                        ::=  PERCENT | PN_LOCAL_ESC
[171]  PERCENT                    ::=  '%' HEX HEX
[172]  HEX                        ::=  [0-9] | [A-F] | [a-f]
[173]  PN_LOCAL_ESC               ::=  '\\' ( '_' | '~' | '.' | '-' | '!' | '$' | '&' | "'" | '(' | ')' | '*' | '+' | ','
                                       | ';' | '=' | '/' | '?' | '#' | '@' | '%' )

@pass ::= [ \t\r\n]+
 | "#" [^\r\n]* 
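For readers who want a feel for what the three `Update1` alternatives look like in practice, here is a rough Python classifier. It is deliberately not a parser for the grammar above: it skips a prologue with a regular expression, inspects the leading keywords, ignores comments, and can be fooled by the word WHERE inside a literal. The function name and return labels are hypothetical.

```python
import re

# Matches one BASE or PREFIX declaration (simplified, no comment handling).
_PROLOGUE = re.compile(
    r"\s*(?:BASE\s*<[^>]*>|PREFIX\s*[^\s:]*:\s*<[^>]*>)", re.IGNORECASE
)


def classify_update(body: str) -> str:
    """Guess which Update1 alternative a patch body uses."""
    pos = 0
    # Skip any number of prologue declarations.
    while (m := _PROLOGUE.match(body, pos)) is not None:
        pos = m.end()
    rest = body[pos:].lstrip().upper()
    if re.match(r"INSERT\s+DATA\b", rest):
        return "InsertData"
    if re.match(r"DELETE\s+DATA\b", rest):
        return "DeleteData"
    # Modify: DELETE {..} and/or INSERT {..} followed by a WHERE clause.
    if re.match(r"(?:INSERT|DELETE)\s*\{", rest) and re.search(r"\bWHERE\b", rest):
        return "Modify"
    return "unsupported"


print(classify_update("PREFIX ex: <http://example.org/>\nINSERT DATA { <#a> ex:p 1 . }"))
print(classify_update("DELETE { ?s ?p ?o } WHERE { ?s ?p ?o }"))
print(classify_update("DELETE WHERE { ?s ?p ?o }"))
```

The last call falls outside the three alternatives, so this sketch labels it `unsupported`.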

@jeff-zucker
Member

Where is TripleData defined?

@kjetilk
Member Author

kjetilk commented Oct 5, 2021

ooops, good catch, I'll update :-)

@rubensworks

The BNF itself looks correct to me at first glance.

For reference, it does not cover quad support. Perhaps we want to await the resolution of #291 before this issue here is pushed forward?

@kjetilk
Member Author

kjetilk commented Oct 6, 2021

The BNF itself looks correct to me at first glance.

Great!

For reference, it does not cover quad support. Perhaps we want to await the resolution of #291 before this issue here is pushed forward?

No, that is deliberate, this is what we need for defining 1.0. Quad support will need to be after 1.0. :-)

@kjetilk
Member Author

kjetilk commented Oct 11, 2021

One thing I came to think of is that @ericprud's subset doesn't seem to support DELETE WHERE, which I find very useful (that feature was motivated by a commercial project I had back in the day, where most of our queries were like DELETE WHERE queries).

Any comment on that, @ericprud ? I wonder if we could/should support that by making a simple adjustment to DeleteClause?

@kjetilk
Member Author

kjetilk commented Oct 15, 2021

@ericprud and I are now analysing the subset, and we just had a list of the things that we've taken out of the grammar. I figured this list is nice to see for completeness, so here goes (now updated with a more complete list):

So these are grammar rules that are currently not in the subset that we are defining.

@kjetilk
Member Author

kjetilk commented Oct 15, 2021

One question that just arose, should we have more than one query in a single HTTP request?

@RubenVerborgh
Contributor

One question that just arose, should we have more than one query in a single HTTP request?

Just not yet, please 🙂

Another question from me: do we want a (sub-) MIME type or a profile for this?

@kjetilk
Member Author

kjetilk commented Oct 15, 2021

One question that just arose, should we have more than one query in a single HTTP request?

Just not yet, please 🙂

👍 Just wondered if that was in the wild :-)

Another question from me: do we want a (sub-) MIME type or a profile for this?

I think that can be a post-0.9 question?

@RubenVerborgh
Contributor

👍 Just wondered if that was in the wild :-)

Unfortunately it is, as I realized again in #322 (comment)

NSS supports what rdflib.js supports, which seems to be a sequence of two queries, DELETE DATA / INSERT DATA, but only in that order.

@kjetilk
Member Author

kjetilk commented Nov 3, 2021

Solid Editors decided in meeting https://github.com/solid/specification/blob/main/meetings/2021-11-03.md#level-of-sparql-update-support with @justinwb , @RubenVerborgh , @dmitrizagidulin , @csarven and @kjetilk present to define N3 Patch #332 for 0.9, and put the current SPARQL PATCH behaviour at risk, but to come back to it for 1.0.

Projects
Status: Drafting Phase

8 participants