Preamble

Let us first disabuse ourselves of the notion that this is anything more than a toy database.
That said, it’s written in a language which is easy to experiment with, on top of a simple database which is easy to use.
Also, none of what I’m proposing in here is peculiar to either Ruby or LMDB.
- Indeed, any language and any direct-attached key-value store that does transactions could support this (I think?)
So whereas other products like Oxigraph are focused on features like SPARQL, I am particularly interested in how you lay out a key-value store in general such that you can represent an RDF store with characteristics like:
- RDF-star (which I should just do anyway)
- a change history (i.e., undo)
- dealing with multiple users
  - (i.e., access control)
- efficient storage of typed literals
- efficient handling of large literals and data: URIs
  - unicode normalization for literals for sure
  - outsourcing to content-addressable storage would be ideal
  - There are going to be really silly SPARQL queries like searching substrings in data: URIs
    - at the basic graph level we will probably just have to serve those up and deal with the cost of doing that
- Inferencing:
  - RDFS, OWL, SHACL inferencing for basic graph queries
    - don’t generate statements here, just return them if the inferences resolve
- Layering:
  - think ~unionfs~ but for RDF stores
  - “union graphs”, contexts which merge two or more other contexts together
    - no context is kind of like the union of all contexts
    - except triples have to be stored in an invisible null context if they aren’t explicitly ascribed a context
      - if you select without a context it should return statements from all contexts at once
      - if you delete a triple (i.e., not a quad) it should delete it from all contexts (y/n?)
    - it should be possible to specify contexts that union other arbitrary contexts together
      - this should recurse but probably not loop/self-reference
      - the question (as ever) will be when you write to one of these, what happens?
  - “consensus graphs” which extend the idea of union graphs to a shared reality for multiple users
  - “proxy graphs” that map to other systems (e.g. SQL)
    - or even other RDF stores
  - statement-generating layers that do things we actually do want statements in the graph for, but generated rather than stored (or perhaps merely cached, and thus not subject to versioning)
    - e.g. “soft” inferences, stuff written in vocab specs that there was no way to formally express at the time
      - I’m thinking specifically how ?c a skos:OrderedCollection; skos:memberList (?m1 ?m2 ?mn) implies ?c skos:member ?m1 and so on.
      - Totally achievable with SHACL rules
    - e.g. stateful or aggregate statements computed from other statements
      - again this is totally doable with SHACL.
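
The union-graph bullets above can be sketched roughly as follows. This is a minimal Ruby sketch, not a design: plain Hashes stand in for key-value tables, and the names `leaf_contexts`, `triples_in`, and the `ctx:` identifiers are made up for illustration. It only covers reads; what a write to a union context should do is still the open question.

```ruby
require 'set'

# Expand a context into the stored contexts it ultimately draws from.
# `unions` maps a union context to the contexts it merges (hypothetical layout).
# `seen` guards against loops, so self-reference is skipped rather than fatal.
def leaf_contexts(ctx, unions, seen = Set.new)
  return Set.new unless seen.add?(ctx) # already visited: contributes nothing new
  members = unions[ctx]
  return Set[ctx] unless members       # not a union, so it is a stored context
  members.reduce(Set.new) { |acc, m| acc | leaf_contexts(m, unions, seen) }
end

# Read every triple visible through a (possibly union) context.
# `stored` maps a stored context to a Set of [s, p, o] triples.
def triples_in(ctx, stored, unions)
  leaf_contexts(ctx, unions).reduce(Set.new) do |acc, c|
    acc | stored.fetch(c, Set.new)
  end
end

stored = {
  'ctx:a' => Set[%w[ex:s ex:p ex:o1]],
  'ctx:b' => Set[%w[ex:s ex:p ex:o2]]
}
unions = {
  'ctx:ab'   => %w[ctx:a ctx:b],
  'ctx:loop' => %w[ctx:ab ctx:loop] # refers to itself; the guard ignores that
}

visible = triples_in('ctx:loop', stored, unions)
```

Passing the tables in as arguments keeps the resolver independent of where the union definitions live, which may matter if they end up stored as statements themselves.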

RDF-star

at root there are terms
- terms can be normalized and hashed
- each term is assigned a numeric identifier that is local to the database and not otherwise exposed
  - assume this is a native-endian size_t integer; we are not gonna screw around with portability across cpu architectures
    - so intel (and apple silicon coincidentally) will be 64-bit little-endian
- statements are composed of terms
  - statements can be represented as: statement id => [subject id, predicate id, object id]
- quad stores have contexts
  - a context is just a term
  - context id => statement id
    - also statement id => context id
- the gist of RDF* is that entire statements can also be terms
  - and this can be recursive
  - so subjects and objects can now be statements in addition to URIs and bnodes (and literals for objects)
- so it shouldn’t be the end of the world to make that a thing
- albeit backward compatibility with existing stores might be a problem
  - well if anybody wants to hire me to do that for them, they can
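
A rough Ruby sketch of the term table described above, with statements interned as terms so that RDF-star nesting falls out for free. Everything here is an assumption for illustration: the class name `TermTable`, the use of SHA-256, the one-byte `T`/`S` tag, and normalizing every string term to NFC (the notes only commit to normalizing literals). Hashes stand in for the key-value tables; `pack('J')` is Ruby's native-endian, pointer-width unsigned integer, which is the closest thing to the `size_t` mentioned above.

```ruby
require 'digest'

class TermTable
  def initialize
    @id_by_hash = {} # SHA-256 of normalized form => term id
    @term_by_id = {} # term id => String, or [s, p, o] ids for a statement
  end

  # Intern a plain term (IRI, blank node, or literal, given as a String).
  def intern(term)
    norm = term.unicode_normalize(:nfc)
    assign('T' + norm, norm)
  end

  # Intern a statement whose three positions are ids of interned terms.
  # Since a statement id is just another term id, statements can nest.
  def intern_statement(s, p, o)
    assign('S' + [s, p, o].pack('J3'), [s, p, o])
  end

  def lookup(id)
    @term_by_id[id]
  end

  private

  # Return the existing id for this key material, or allocate the next one.
  def assign(key_material, value)
    digest = Digest::SHA256.digest(key_material)
    @id_by_hash[digest] ||= begin
      id = @term_by_id.size + 1
      @term_by_id[id] = value
      id
    end
  end
end

terms = TermTable.new
s = terms.intern('http://example.org/alice')
p = terms.intern('http://example.org/knows')
o = terms.intern('http://example.org/bob')
stmt = terms.intern_statement(s, p, o)
# A statement about a statement is one more row in the same table.
meta = terms.intern_statement(stmt, terms.intern('http://example.org/since'),
                              terms.intern('"2020"'))
```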

change history

anyway, that aside, what we’re actually after is being able to access the state of the database at the instant of a particular transaction
- random access is ideal
- indeed random access is probably necessary, all things considered
so there should be a basic key-value map that maps statement identifiers to statements
- then there is another one that maps statements to contexts; this is how contexts are handled
each transaction can basically be seen as a “meta-context”
- i.e., the state after the transaction is committed may as well have its own context URL.
- the grammar of change in an rdf store reduces to:
  - statements added
  - statements removed
- we can work with this
again, you have layer zero which maps between terms and hashes/internal IDs
- this is like saying “the database has seen these terms.”
you have layer one which maps statements (which are also considered terms) to their referents
- this is like saying “the database has seen these statements.”
- (again note statements are also terms under RDF*.)
layer two says which contexts the statements belong to.
- this is like saying “the context currently contains these statements.”
- there is a “null” context that includes all statements ever
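
Since the grammar of change reduces to statements added and statements removed, the state at any transaction is, in principle, just a fold over the change log. A minimal Ruby sketch of that baseline follows; `Change` and `state_at` are hypothetical names, and quads are written as plain four-element arrays. This is deliberately the naive replay version, not the random access the notes are after.

```ruby
require 'set'

# One committed transaction: the quads it added and the quads it removed.
Change = Struct.new(:added, :removed)

# Rebuild the set of quads as of change number `upto` (1-based, inclusive)
# by replaying the log from the beginning.
def state_at(log, upto)
  log.first(upto).reduce(Set.new) do |state, change|
    (state - change.removed) | change.added
  end
end

log = [
  Change.new(Set[%w[s p o1 g]], Set.new),
  Change.new(Set[%w[s p o2 g]], Set[%w[s p o1 g]])
]

before = state_at(log, 1) # only the first quad
after  = state_at(log, 2) # first quad swapped for the second
```

Replay costs time proportional to the length of the log, which is why an index from statement to change is worth having.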

make a sandwich layer between raw statements and context for current state

between-/ish/: you can easily imagine removing a statement from one context and adding it to another within a single transaction
every transaction can be represented as adding and/or removing zero or more quads such that the union of both sets is nonempty
- otherwise there’s nothing to record
- in other words to be recorded as a transaction you have to either add or remove at least one quad, otherwise it’s a no-op
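
The no-op rule can be sketched like this in Ruby. The function name `effective_change` is hypothetical, and so is the tie-break: if the same quad shows up on both the add and remove side, this sketch lets the add win. The notes do not say what should happen in that case.

```ruby
require 'set'

# Reduce a requested transaction to what would actually change, given the
# quads currently present. Returns nil when there is nothing to record.
def effective_change(current, adds, removes)
  added   = adds - current              # adding what is already there is moot
  removed = (removes & current) - adds  # can only remove what exists
  return nil if added.empty? && removed.empty?
  { added: added, removed: removed }
end

current = Set[%w[s p o ctx:a]]

# Moving a statement between contexts is one remove plus one add.
move = effective_change(current, Set[%w[s p o ctx:b]], Set[%w[s p o ctx:a]])

# Re-adding an existing quad changes nothing, so no transaction is recorded.
noop = effective_change(current, Set[%w[s p o ctx:a]], Set.new)
```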
originally considered using generated contexts as a surrogate interface for identifying individual states
- this obviously isn’t going to work: spending the context slot on the state identifier leaves only a triple, not a quad, so diffs that change nothing but the context of a given statement aren’t going to be visible
- although ehhh that’s gonna be weird already because you’ll have to have individual contexts for the add side and remove side
  - how else are you going to represent statements that were removed?
anyway there is the technical problem of how to implement this without a shitload of waste
- change ID
- statements removed
- statements added
if the change ID monotonically increases (it should, at least internally), then on retrieval we just do this:
- retrieve the statement from whatever stateless storage
- check if it has been added by whatever change ID we’re currently looking at
- check if it has not been subsequently removed
  - if it has been subsequently removed, check if it has been re-added
  - basically we need a mapping of statement ID to change ID
    - why not just stick a bit on the end of that as to whether it’s added or removed
    - so we have added and removed tables of the form change id => statement id
    - we also have i dunno, state or something of the form statement id => change id, bit for added/removed
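
The three tables and the retrieval check above might look something like this in Ruby. Hashes stand in for key-value tables, and `History`, `commit`, `present?`, and `diff` are hypothetical names. The added/removed bit is packed as the low bit of the change ID (one reading of "stick a bit on the end"), and each statement's events are kept in ascending order, the way a sorted-duplicates table would keep them. The sketch assumes a statement is never both added and removed within the same change.

```ruby
class History
  def initialize
    @added   = Hash.new { |h, k| h[k] = [] } # change id => statement ids
    @removed = Hash.new { |h, k| h[k] = [] } # change id => statement ids
    @state   = Hash.new { |h, k| h[k] = [] } # statement id => ascending events
    @change_id = 0
  end

  # Record one transaction and return its change id.
  # An event is (change id << 1) with the low bit set for "added".
  def commit(add: [], remove: [])
    @change_id += 1
    add.each do |s|
      @added[@change_id] << s
      @state[s] << ((@change_id << 1) | 1)
    end
    remove.each do |s|
      @removed[@change_id] << s
      @state[s] << (@change_id << 1)
    end
    @change_id
  end

  # Was the statement present once change `at` had been committed?
  def present?(statement_id, at)
    events = @state.fetch(statement_id, [])
    # Index of the first event after `at`; the event just before it decides.
    idx = events.bsearch_index { |e| (e >> 1) > at } || events.size
    idx > 0 && events[idx - 1].odd?
  end

  # What a given change did, straight from the two forward tables.
  def diff(change_id)
    { added: @added.fetch(change_id, []), removed: @removed.fetch(change_id, []) }
  end
end

history = History.new
c1 = history.commit(add: [10])    # statement 10 appears
c2 = history.commit(remove: [10]) # then goes away
c3 = history.commit(add: [10])    # then comes back
```

The binary search is what makes this random access: checking a statement against any change ID costs a lookup plus a search over that statement's own events, independent of how long the overall log is.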

principals (multi-user)

each individual user gets their own quad store from their point of view
“consensus graph” for multiple users
- union of individual spaces
  - one context identifier everybody involved can read in its totality
- every statement you add goes into your own slice and is visible to everybody in the group
- you can’t add or delete statements in other people’s slices and they can’t change yours
- though they should be able to transfer ownership of a set of statements to you somehow
  - (but the person receiving should be able to decline the transfer)
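
A small Ruby sketch of the consensus graph rules above: one slice per user, everybody reads the union, writes only land in your own slice, and ownership transfers have to be accepted by the recipient. The class and method names (`ConsensusGraph`, `offer`, `respond`, and so on) are invented for illustration, and a real version would sit on the context machinery rather than in-memory Sets.

```ruby
require 'set'

class ConsensusGraph
  def initialize(members)
    @slices = members.to_h { |m| [m, Set.new] } # user => their statements
    @offers = {}                                # offer id => [from, to, statements]
    @next_offer = 0
  end

  # Everybody in the group reads the union of all slices.
  def read
    @slices.values.reduce(Set.new, :|)
  end

  def owned_by(user)
    slice_for(user).dup
  end

  # Writes name the acting user and only ever touch that user's slice.
  def add(user, statement)
    slice_for(user) << statement
  end

  def remove(user, statement)
    slice_for(user).delete(statement)
  end

  # Propose handing statements you own to another member.
  def offer(from, to, statements)
    owned = statements.select { |s| slice_for(from).include?(s) }
    slice_for(to) # raises if the recipient is not a member
    @offers[@next_offer += 1] = [from, to, owned]
    @next_offer
  end

  # The recipient accepts or declines; declining leaves everything as it was.
  def respond(offer_id, user, accept:)
    from, to, statements = @offers.fetch(offer_id)
    raise ArgumentError, 'only the recipient can respond' unless user == to
    @offers.delete(offer_id)
    return false unless accept
    statements.each { |s| @slices[to] << s if @slices[from].delete?(s) }
    true
  end

  private

  def slice_for(user)
    @slices.fetch(user) { raise ArgumentError, "not a member: #{user}" }
  end
end

graph = ConsensusGraph.new(%w[alice bob])
graph.add('alice', %w[s p o])
declined = graph.respond(graph.offer('alice', 'bob', [%w[s p o]]), 'bob', accept: false)
```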

access control

evaluate different approaches
- resource-based
  - individual resources or sets of resources?
  - privileges:
    - know the existence of a resource
      - i.e. you don’t see statements with this resource
    - read statements where the resource is a subject
      - going to have to censor owl:inverseOf etc, i.e. access control will have to be evaluated before inferences
    - add statements with this subject
    - remove statements with this subject
- statement-based
  - just access-control entire contexts?
  - that would probably be easiest
- identity-oriented vs capability-oriented
  - would kinda love to do capability-oriented
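
As one data point for the evaluation, here is what the easiest combination (access control on entire contexts, capability-oriented) could look like in Ruby. This is only a sketch of one option, not a decision: `ContextGuard`, `grant`, `allowed?`, and `readable` are hypothetical names, and a capability here is nothing more than a random token that maps to a context and a set of rights.

```ruby
require 'securerandom'
require 'set'

class ContextGuard
  def initialize
    @grants = {} # token => [context, Set of rights]
  end

  # Mint a capability: holding the token is the authorization.
  def grant(context, *rights)
    token = SecureRandom.hex(16)
    @grants[token] = [context, rights.to_set]
    token
  end

  # Does any presented token carry this right on this context?
  def allowed?(tokens, context, right)
    tokens.any? do |t|
      grant = @grants[t]
      grant && grant[0] == context && grant[1].include?(right)
    end
  end

  # Filter quads ([s, p, o, context]) down to what the tokens may read.
  # This would have to run before any inferencing sees the statements.
  def readable(quads, tokens)
    quads.select { |_s, _p, _o, ctx| allowed?(tokens, ctx, :read) }
  end
end

guard = ContextGuard.new
cap = guard.grant('ctx:a', :read)
quads = [%w[s p o ctx:a], %w[s p o ctx:b]]
visible = guard.readable(quads, [cap])
```

Note there is no notion of a user anywhere in the check, which is the point of the capability style: sharing access means handing over a token.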

layered graphs

yeah this is gonna be hard lol
