Codec proposal: N-Quads (RDF format) #180

joeltg · 2020-06-30T17:10:06Z

Not sure if this is the right way to bring this up, but I'd like to propose adding a codec for N-Quads files. RDF is the graph data model for the semantic web, and although N-Quads is just one of many RDF serializations, it's commonly regarded as the lowest-level representation with the most regular structure and the least syntactic sugar.

In particular, N-Quads is the output format of the Universal RDF Dataset Normalization Algorithm 2015 (URDNA2015) (also brought up in this issue). URDNA2015 is a big deal for the RDF world because it produces a canonical representation (i.e. two isomorphic datasets will produce the exact same serialized N-Quads string), which is required for all the digital signatures work that's starting to happen, and it's a representation that people will commonly want to hash!
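To illustrate why a canonical serialization matters for hashing, here is a minimal sketch. It is not URDNA2015: it only sorts and deduplicates lines, which is enough to show the idea under the assumption that the dataset contains no blank nodes and every line is already written in canonical N-Quads form. The function names and example IRIs are made up for illustration.

```python
import hashlib


def naive_canonical_nquads(nquads: str) -> str:
    """Sort and deduplicate N-Quads lines.

    Assumption: no blank nodes, and each line is already in canonical
    N-Quads form. Real URDNA2015 also relabels blank nodes.
    """
    lines = {line.strip() for line in nquads.splitlines() if line.strip()}
    return "".join(line + "\n" for line in sorted(lines))


def dataset_digest(nquads: str) -> str:
    """SHA-256 hex digest of the (naively) canonicalized dataset."""
    canonical = naive_canonical_nquads(nquads)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# The same two statements, serialized in a different order.
a = (
    '<http://example.com/s> <http://example.com/p> "1" .\n'
    '<http://example.com/s> <http://example.com/q> "2" .\n'
)
b = (
    '<http://example.com/s> <http://example.com/q> "2" .\n'
    '<http://example.com/s> <http://example.com/p> "1" .\n'
)

# Hashing the raw strings would differ; hashing the canonical form agrees.
assert dataset_digest(a) == dataset_digest(b)
```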

This would also enable a natural interpretation of RDF datasets as IPLD objects, using an IPLD schema for the RDFJS data model with N-Quads as a custom representation.

I see this as a great concrete foundation for bringing the semantic web & decentralized web communities closer together. Is this the kind of codec we're open to adding? Would it be appropriate to open a pull request to table.csv?

joeltg · 2020-06-30T17:21:18Z

IPLD <-> RDF interop has also been discussed a few times in the past, without concrete results.

rvagg · 2020-07-01T04:21:21Z

I suspect @mikeal and @vmx will have more mature thoughts about RDF than me, but I'd say that in general multicodec can be used to disambiguate types of objects where any such ambiguity exists. It's not strictly tied to IPLD, although IPLD is a logical consumer of multicodecs. Where something is being transmitted or stored and you want to ensure clarity about what type of thing it is, multicodec should be helpful.
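As a rough sketch of how a multicodec code labels content, the snippet below builds a CIDv1 by hand: a version varint, a codec varint, then a multihash. The codes for raw (0x55) and sha2-256 (0x12) come from the multicodec table; the N-Quads code is a made-up placeholder, not an assigned value, and the helper names are illustrative only.

```python
import base64
import hashlib

RAW_CODEC = 0x55  # "raw" in the multicodec table
SHA2_256 = 0x12   # "sha2-256" in the multicodec table
HYPOTHETICAL_NQUADS_CODEC = 0x300000  # placeholder only, NOT an assigned code


def varint(n: int) -> bytes:
    """Encode an unsigned integer as an LEB128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def cid_v1(codec: int, data: bytes) -> str:
    """Build a base32 CIDv1 string for `data`, labelled with `codec`."""
    digest = hashlib.sha256(data).digest()
    multihash = varint(SHA2_256) + varint(len(digest)) + digest
    cid_bytes = varint(1) + varint(codec) + multihash
    b32 = base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
    return "b" + b32  # "b" is the multibase prefix for lowercase base32


def codec_of(cid: str) -> int:
    """Read the codec code back out of a base32 CIDv1 string."""
    body = cid[1:].upper()
    body += "=" * (-len(body) % 8)  # restore the stripped padding
    raw = base64.b32decode(body)
    pos, shift, code = 1, 0, 0  # byte 0 is the version (0x01)
    while True:
        byte = raw[pos]
        code |= (byte & 0x7F) << shift
        pos += 1
        if not byte & 0x80:
            return code
        shift += 7


data = b'<http://example.com/s> <http://example.com/p> "1" .\n'
as_raw = cid_v1(RAW_CODEC, data)
as_nquads = cid_v1(HYPOTHETICAL_NQUADS_CODEC, data)
```

The same bytes get different CIDs depending on the codec, and a consumer can call `codec_of` to decide how to interpret the block before fetching or decoding it.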

So with that in mind, if you have a use-case where that's applicable, IPLD or not, then an entry in the multicodec table would be a good thing. My preference would be to be adding things where there are concrete examples of them existing in the wild where multicodec could be applied, or at least concrete plans on how they could be applied, but we're taking a fairly relaxed approach to that lately and the idea of explicitly labelling things as "draft" for this purpose is on the cards: #165

Do you see a path to this being used any time soon, or would this be more a symbolic move for now by saying that multicodec & RDF have potential connectivity?

joeltg · 2020-07-01T12:27:48Z

I know that I'd use it right away! For the Underlay we're currently storing and referencing lots of N-Quads files as raw objects - including linking to N-Quads files from other N-Quads files using a dweb:/ipld/ URI format (all identifiers in RDF are URIs). One use case we'd really like to pull off is using CAR archives (or something similar) to collect and package all transitively linked files, so we want to be able to tell whether a CID is an N-Quads file, and we want IPLD to know how to traverse its links.
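A minimal sketch of the link discovery described above, assuming links appear as IRIs of the form <dweb:/ipld/CID> with an optional trailing path. The regular expression, the function name, and the CID strings in the example are placeholders for illustration; a real implementation would parse the N-Quads properly so that text inside literals is not mistaken for a link.

```python
import re
from typing import List

# Assumption: a link is an IRI like <dweb:/ipld/<cid>> or <dweb:/ipld/<cid>/path>.
LINK_PATTERN = re.compile(r"<dweb:/ipld/([A-Za-z0-9]+)[^>]*>")


def linked_cids(nquads: str) -> List[str]:
    """Return the distinct CIDs referenced in the file, in first-seen order."""
    seen = {}
    for match in LINK_PATTERN.finditer(nquads):
        seen.setdefault(match.group(1), None)
    return list(seen)


# Placeholder CIDs, not real content addresses.
example = (
    "<dweb:/ipld/bafkreiexamplea> <http://example.com/imports> <dweb:/ipld/bafkreiexampleb> .\n"
    "<dweb:/ipld/bafkreiexamplea> <http://example.com/cites> <dweb:/ipld/bafkreiexampleb/3> .\n"
    '<http://example.com/s> <http://example.com/p> "no links here" .\n'
)

cids = linked_cids(example)
```

A packaging tool could walk these CIDs recursively to collect every transitively linked file into one archive.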

mikeal · 2020-07-01T18:08:06Z

Is there utility you’d get out of an IPLD representation beyond raw though? My understanding is that links in this format are not addressed by hash, so there’s no way to represent them as links in IPLD, so you’re never actually going to get a graph for this format even if there’s a codec.

The only thing a codec would give you is a Data Model representation of the file format (for this it would just be JSON types), but you’d have to ensure the serialized representation is kept below the block size limit (1 MB). That is going to be hard: because the format doesn’t link by hash, you have no way to link between blocks in IPLD to handle N-Quads files that are larger than the limit.

That said, if you can get some utility out of it there’s no real barrier to adding the codec as long as we document these constraints, I’d just caution against using it if you’re going to be encoding large data structures this way.

joeltg · 2020-07-02T15:42:38Z

> Is there utility you’d get out of an IPLD representation beyond raw though?

Yes! It would give us a way of referencing individual quads in a dataset (using integer index paths), which we want to do for tracing provenance. There's no widely accepted method for doing this in the RDF world right now.
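A small sketch of what integer index paths could look like, assuming a quad is addressed by its zero-based position among the non-empty, non-comment lines of the file. The "/1" path syntax and the function names are hypothetical, not an agreed convention.

```python
from typing import List


def quad_lines(nquads: str) -> List[str]:
    """Return the statement lines of an N-Quads file, skipping blanks and comments."""
    return [
        line.strip()
        for line in nquads.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]


def resolve(nquads: str, path: str) -> str:
    """Resolve a hypothetical index path such as "/1" to a single quad."""
    index = int(path.strip("/"))
    return quad_lines(nquads)[index]


dataset = (
    "# provenance example\n"
    '<http://example.com/a> <http://example.com/p> "first" .\n'
    '<http://example.com/b> <http://example.com/p> "second" .\n'
)

second = resolve(dataset, "/1")
```

Combined with a CID for the file, a path like this would give a stable reference to one statement, which is only meaningful if the file's line order is fixed, for example by canonicalization.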

You're right that the graph structure (what nodes are connected by what edges) won't be directly represented in IPLD - but it couldn't be even if we tried, since RDF is a directed labelled multigraph (i.e. possibly containing cycles).

I understand that codecs are a different abstraction level than the IPLD data model, and that there would have to be different representation strategies for 1mb+ datasets, but I still see this as having real utility as a building block for people working to decentralize RDF.

jonnycrunch · 2020-07-10T03:17:50Z

@joeltg I went down the RDF-over-IPLD path and ran into the fact that RDF graphs contain cycles and thus wouldn't be a good fit for IPLD.

joeltg · 2020-07-10T12:21:12Z

@jonnycrunch the IPLD data model representation of an N-Quads file wouldn't represent the dataset "directly" by having nodes be maps and edges be keys like in JSON-LD, it would represent the dataset at the lower-level RDFJS Data Model, as a flat array of quads.
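To make the "flat array of quads" idea concrete, here is a simplified sketch that parses N-Quads into a list of maps shaped like RDFJS terms (termType, value, and for literals language and datatype). The regular expressions cover only the common cases and leave escape sequences undecoded; they are an assumption for illustration, not a conformant N-Quads parser.

```python
import re

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
RDF_LANG_STRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"

# Simplified term grammar: IRI, blank node, or literal with optional datatype/language.
TERM = r'(<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:\^\^<[^>]*>|@[A-Za-z0-9-]+)?)'
QUAD = re.compile(rf"^\s*{TERM}\s+{TERM}\s+{TERM}(?:\s+{TERM})?\s*\.\s*$")
LITERAL = re.compile(r'^"((?:[^"\\]|\\.)*)"(?:\^\^<([^>]*)>|@([A-Za-z0-9-]+))?$')


def parse_term(token):
    """Convert one N-Quads token into an RDFJS-like map (escapes left as-is)."""
    if token is None:  # no graph label on the line
        return {"termType": "DefaultGraph", "value": ""}
    if token.startswith("<"):
        return {"termType": "NamedNode", "value": token[1:-1]}
    if token.startswith("_:"):
        return {"termType": "BlankNode", "value": token[2:]}
    value, datatype, language = LITERAL.match(token).groups()
    if language:
        datatype = RDF_LANG_STRING
    return {
        "termType": "Literal",
        "value": value,
        "language": language or "",
        "datatype": {"termType": "NamedNode", "value": datatype or XSD_STRING},
    }


def parse_nquads(text):
    """Parse N-Quads text into a flat list of quads."""
    quads = []
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        match = QUAD.match(line)
        if match is None:
            raise ValueError(f"unsupported line: {line!r}")
        s, p, o, g = (parse_term(t) for t in match.groups())
        quads.append({"subject": s, "predicate": p, "object": o, "graph": g})
    return quads


dataset = parse_nquads(
    "<http://example.com/s> <http://example.com/p> <http://example.com/o> .\n"
    '_:b0 <http://example.com/name> "Alice"@en <http://example.com/g> .\n'
)
```

The result contains only lists, maps, and strings, so cycles in the RDF graph never show up as cycles in the data structure: they live in repeated IRI values rather than in nested nodes.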

IPLD data model stuff could be its own conversation; this issue is just about getting an N-Quads multicodec.

vmx · 2020-07-10T15:14:21Z

Multicodecs describe a lot. We started to put them into categories. One of them is "ipld" to describe codecs that make sense within the IPLD ecosystem. I don't think it's written down anywhere, but I think formats in that category need to support at least Links. Obviously that's not the case for N-Quads.

So we could put it into another category. Then it would be just an identifier of how things are encoded. I think it would be OK to add such a code, but it won't add much value to IPLD. IPLD might link to an N-Quads file, but that would always be the end of the traversal (a sink), just like the raw codec.

OR13 · 2020-08-14T18:15:44Z

This is very interesting... I did some related CBOR work here:

https://github.com/transmute-industries/decentralized-cbor

in particular, I represent ZLIB_Compressed_NQuads as CBOR... providing compressed representation for JSON-LD with bi-directional transformation between CBOR and JSON-LD....

There is also work in progress of CBOR-LD as well.... (and obviously DAG_CBOR which powers IPLD).

I agree with vmx, N-Quads are the end of pure IPLD, but there is nothing stopping you from leaving IPLD and following them further... for example, across DIDs or URIs in the N-Quads...

IPLD1 -> IPLD2 -> NQuads  -> did:sov:123
                          -> did:ethr:456
                          -> https://public.oracle.example.com/credentials/123
                          -> https://ipfs.io/CID
                          -> IPLD3

Some DIDs rely on multicodec already like did:key, and obviously any IRI in an N-Quad might rely on multicodec as well.

OR13 mentioned this issue Jul 8, 2020

Security audit on this approach w3c/vc-di-bbs#27

Closed

OR13 mentioned this issue Aug 14, 2020

CBOR core representation is solid and should NOT be marked as at-risk w3c/did-core#339

Closed

clehner mentioned this issue Mar 25, 2022

Initial implementation spruceid/cacao-zcap-rs#1

Merged
