Codec proposal: N-Quads (RDF format) #180
Comments
IPLD <-> RDF interop has also been discussed a few times in the past, without concrete results:
I suspect @mikeal and @vmx will have more mature thoughts about RDF than me, but I'd say that in general multicodec can be used to disambiguate types of objects where any such ambiguity exists. It's not strictly tied to IPLD, although IPLD is a logical consumer of multicodecs. Where something is being transmitted or stored and you want to ensure clarity about what type of thing it is, multicodec should be helpful. So with that in mind, if you have a use-case where that's applicable, IPLD or not, then an entry in the multicodec table would be a good thing. My preference would be to add things where there are concrete examples of them existing in the wild where multicodec could be applied, or at least concrete plans for how they could be applied, but we're taking a fairly relaxed approach to that lately, and the idea of explicitly labelling things as "draft" for this purpose is on the cards: #165. Do you see a path to this being used any time soon, or would this be more of a symbolic move for now, saying that multicodec & RDF have potential connectivity?
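As a rough illustration of "labelling what type of thing it is", here is a minimal sketch of tagging a payload with a multicodec prefix, which is an unsigned varint placed before the bytes. The names `NQUADS_CODE`, `tag`, and `untag` are hypothetical, and the code value is a placeholder chosen for this example, not an assigned entry in the multicodec table.

```python
from typing import Tuple

# Hypothetical placeholder; NOT an assigned multicodec code for N-Quads.
NQUADS_CODE = 0x300001


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint (7 bits per byte)."""
    if n < 0:
        raise ValueError("varint must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)  # final byte, high bit clear
            return bytes(out)


def decode_varint(data: bytes) -> Tuple[int, int]:
    """Decode a varint from the start of data; return (value, bytes consumed)."""
    value = 0
    shift = 0
    for i, byte in enumerate(data):
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, i + 1
        shift += 7
    raise ValueError("truncated varint")


def tag(code: int, payload: bytes) -> bytes:
    """Prefix payload with its codec identifier."""
    return encode_varint(code) + payload


def untag(data: bytes) -> Tuple[int, bytes]:
    """Split tagged bytes back into (codec identifier, payload)."""
    code, consumed = decode_varint(data)
    return code, data[consumed:]


payload = b"<http://example.com/a> <http://example.com/p> <http://example.com/b> .\n"
tagged = tag(NQUADS_CODE, payload)
code, recovered = untag(tagged)
```

A consumer that reads the prefix can tell that the remaining bytes are N-Quads without guessing from the content, whether or not IPLD is involved.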
I know that I'd use it right away! For the Underlay we're currently storing and referencing lots of N-Quads files as |
Is there utility you’d get out of an IPLD representation beyond The only thing a codec would give you is a Data Model representation of the file format (for this it would just be JSON types), but you’d have to keep the serialized representation below the block size limit (1 MB). That is going to be hard, since you don’t have a way to link between blocks in IPLD to handle N-Quads files larger than the limit, because N-Quads doesn’t link by hash. That said, if you can get some utility out of it, there’s no real barrier to adding the codec as long as we document these constraints. I’d just caution against using it if you’re going to be encoding large data structures this way.
Yes! It would give us a way of referencing individual quads in a dataset (using integer index paths), which we want to do for tracing provenance. There's no widely accepted method for doing this in the RDF world right now. You're right that the graph structure (what nodes are connected by what edges) won't be directly represented in IPLD, but it couldn't be even if we tried, since RDF is a directed labelled multigraph (i.e. possibly containing cycles). I understand that codecs are a different abstraction level than the IPLD data model, and that there would have to be different representation strategies for 1 MB+ datasets, but I still see this as having real utility as a building block for people working to decentralize RDF.
@joeltg I went down the RDF-over-IPLD path and ran into the fact that RDF graphs contain cycles and thus wouldn't be a good fit for IPLD.
@jonnycrunch the IPLD data model representation of an N-Quads file wouldn't represent the dataset "directly" by having nodes be maps and edges be keys like in JSON-LD, it would represent the dataset at the lower-level RDFJS Data Model, as a flat array of quads. IPLD data model stuff could be its own conversation; this issue is just about getting an N-Quads multicodec. |
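To make the "flat array of quads" idea concrete, here is a minimal sketch that reads N-Quads text into a list and resolves an integer index path such as `1/2` (second quad, third term). It is only an illustration: the regex is a simplified line matcher, not a conforming N-Quads parser, terms are kept as raw strings, and the names `Quad`, `parse_nquads`, and `resolve` are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional

# Simplified term pattern: IRI, blank node, or literal with optional
# datatype / language tag. Not a full implementation of the N-Quads grammar.
TERM = r'(<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:\^\^<[^>]*>|@[A-Za-z0-9-]+)?)'
LINE = re.compile(
    r"^\s*" + TERM + r"\s+" + TERM + r"\s+" + TERM
    + r"(?:\s+" + TERM + r")?\s*\.\s*$"
)


class Quad(NamedTuple):
    subject: str
    predicate: str
    object: str
    graph: Optional[str]  # None when the quad is in the default graph


def parse_nquads(text: str) -> List[Quad]:
    """Turn N-Quads text into a flat list of quads, one per statement line."""
    quads = []
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        match = LINE.match(line)
        if match is None:
            raise ValueError("unsupported or malformed line: " + repr(line))
        quads.append(Quad(*match.groups()))
    return quads


def resolve(quads: List[Quad], path: str):
    """Follow an integer index path like '1' or '1/2' into the quad list."""
    node = quads
    for segment in path.strip("/").split("/"):
        node = node[int(segment)]
    return node


sample = (
    "<http://example.com/a> <http://example.com/knows> <http://example.com/b> .\n"
    '<http://example.com/b> <http://example.com/name> "Bob" <http://example.com/g> .\n'
)
dataset = parse_nquads(sample)
second_quad = resolve(dataset, "1")
second_object = resolve(dataset, "1/2")
```

The list carries no graph structure of its own, so cycles in the RDF graph are not a problem at this level; the path only addresses a position in the serialized dataset.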
Multicodecs describe a lot. We started to put them into categories. One of them is "ipld" to describe codecs that make sense within the IPLD ecosystem. I don't think it's written down anywhere, but I think formats in that category need to support at least So we could put it into another category. Then it would be just an identifier of how things are encoded. I think it would be OK to add such a code, but I think it won't add much value to IPLD. IPLD might link to an N-Quad, but that would always be the end of the traversal (a sink), just like the |
This is very interesting... I did some related CBOR work here: https://github.com/transmute-industries/decentralized-cbor. In particular, I represent ZLIB_Compressed_NQuads as CBOR, providing a compressed representation for JSON-LD with bi-directional transformation between CBOR and JSON-LD. There is also work in progress on CBOR-LD (and obviously DAG_CBOR, which powers IPLD). I agree with vmx: N-Quads are the end of pure IPLD, but there is nothing stopping you from leaving IPLD and following them further, for example across DIDs or URIs in the N-Quads...
Some DIDs rely on multicodec already like |
Not sure if this is the right way to bring this up, but I'd like to propose adding a codec for N-Quads files. RDF is the graph data model for the semantic web, and although N-Quads is just one of many RDF serializations, it's commonly regarded as the lowest-level representation with the most regular structure and the least syntactic sugar.
In particular, N-Quads is the output format of the Universal Dataset Normalization Algorithm (URDNA2015) (also brought up in this issue). URDNA2015 is a big deal for the RDF world because it produces a canonical representation (ie two isomorphic datasets will produce the exact same serialized N-Quads string) that is required for all the digital signatures work that's starting to happen, and it's a representation that people will commonly want to hash!
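As a small sketch of why a canonical string matters for hashing: the example below hashes N-Quads strings that are assumed to already be URDNA2015 output (the canonicalization itself is not implemented here) and wraps the digest as a sha2-256 multihash. The helper name `sha256_multihash` and the sample data are hypothetical.

```python
import hashlib


def sha256_multihash(data: bytes) -> bytes:
    """Return a sha2-256 multihash: code 0x12, digest length 0x20, then digest."""
    digest = hashlib.sha256(data).digest()
    return bytes([0x12, len(digest)]) + digest


# Assumed to be canonical output for two isomorphic datasets, produced
# independently; canonicalization would make them byte-for-byte identical.
canonical_a = "<http://example.com/a> <http://example.com/knows> _:c14n0 .\n"
canonical_b = "<http://example.com/a> <http://example.com/knows> _:c14n0 .\n"

# The same statement with an arbitrary blank node label, as it might look
# before canonicalization.
uncanonical = "<http://example.com/a> <http://example.com/knows> _:b42 .\n"

hash_a = sha256_multihash(canonical_a.encode("utf-8"))
hash_b = sha256_multihash(canonical_b.encode("utf-8"))
hash_raw = sha256_multihash(uncanonical.encode("utf-8"))

same_after_canonicalization = hash_a == hash_b
differs_without_canonicalization = hash_a != hash_raw
```

Only the canonical strings hash to the same value; the blank node label alone is enough to change the digest of the uncanonicalized form.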
This would also enable a natural interpretation of RDF datasets as IPLD objects, using an IPLD schema for the RDFJS data model with N-Quads as a custom representation.
I see this as a great concrete foundation for bringing the semantic web & decentralized web communities closer together. Is this the kind of codec we're open to adding? Would it be appropriate to open a pull request to table.csv?