Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add CID.encodedLength() & remove leading uint32 CID length from encoded aes form #60

Merged
merged 1 commit into from
Jan 22, 2021

Conversation

rvagg
Copy link
Member

@rvagg rvagg commented Jan 14, 2021

There's two commits in here, the first one could go against master but it's used for the second one which is intended for crypto, #349. If this is an acceptable approach I'll write up some additional tests to exercise encodedLength() a bit more (CIDv0 and bad bytes - and these would be good to do with the aes tests too).

Ref: ipld/specs#349 (comment)

@rvagg
Copy link
Member Author

rvagg commented Jan 14, 2021

Re performance - this means there's a double-decode of 3 varints, once to figure out the length and another to do the decode. An alternative method that would avoid this is to allow CID.decode() to be sloppy with the bytes it accepts. Currently it borks at Digest.decode() which assumes the right number of bytes. We could add a flag to both decode functions that lets it be sloppy, but in terms of clean API I prefer the encodedLength() approach and just eating the double decode.

@mikeal
Copy link
Contributor

mikeal commented Jan 20, 2021

Re performance

Instead of changing decode() we could add another method (CID.fromBytes()) that returns a pair of [ CID, remainingBytes ] for parsing CID’s out of larger buffers.

@rvagg
Copy link
Member Author

rvagg commented Jan 20, 2021

The use-case I have for this in @ipld/car is streaming - I can get access to a fixed number of bytes but I don't know how many I need for a CID, it could be quite large. So I need to know ahead of time how many bytes I'll need for the whole lot and do this by ensuring I have enough bytes to look in the 3 varints and then work out the total and then I can ensure I have at least that many bytes. So for me the encodedBytes() is perfect as it can act like a peek to tell me how many bytes I need to buffer. I could give it ~10 bytes and it'll give me a the answer I need. A fromBytes() isn't useful at all for the streaming case as it assumes that you're dealing with all of the bytes that there are.

I'm not saying fromBytes() is a bad idea, it just doesn't fit the streaming case. So we either deal with the two cases separately or try and make a single solution that fits both.

Some of the relevant code for the streaming case is here https://github.com/ipld/js-car/blob/16d11873a33ae1935beb17847de6fbb90faba1d3/lib/decoder.js#L53-L69 . Note how I have a wrapper around the incoming stream data that lets me request exactly(size), so I can ensure sufficient buffering and cross chunk boundaries transparently. It would be nice to upstream the CID/multihash portions of this, but not essential yet as I don't have a concrete second use-case for it yet, but it'll come I'm sure.

@rvagg
Copy link
Member Author

rvagg commented Jan 20, 2021

There's a why not both? in here perhaps - maybe an inspectBytes() that returns an object that has the initial varints and a calculation of the total length. CID.inspectBytes(initialBytes) -> { version, codec, multihashCode, multihashLength, prefixLength } - I could use that to combine prefixLength and multihashLength to figure out what I need. Internally this could be the start of decode() and fromBytes() so we reduce varint decode duplication. And it could also be used in cases where you want to peek into a CID without fully decoding it to check on it -- perhaps you want to check if the bytes you have are a DAG-CBOR+Blake2-256 and that's all you care about.

@mikeal
Copy link
Contributor

mikeal commented Jan 21, 2021 via email

@rvagg rvagg force-pushed the rvagg/crypto-cid-length branch from 09a6573 to 9d83ab8 Compare January 21, 2021 03:02
@rvagg
Copy link
Member Author

rvagg commented Jan 21, 2021

OK, so I'm trying to reconcile API complexity and naming concerns here, but this is what I've come up with:

  • CID.decode(bytes): the existing API, unchanged externally but heavily modified internally
  • CID.decodeFirst(bytes): same as CID.decode() but returns [cid:CID, remainder:Uint8Array] and can be used for this non-streaming case where you have all the bites and you want to chomp off the CID from the start.
  • CID.inspectBytes(initialBytes): provide it with the first 6-10 (or more) bytes of a CID and you'll get: { version, codec, multihashCode, digestSize, multihashSize, size } (all numbers). Which is useful for the streaming case, and is used internally for these other methods.

One major change internally is that it now bypasses Digest.decode() and constructs new Digest() directly because it has everything needed to dissect the multihash bytes. This is a little unfortunate because there's a tiny amount of duplication now and we're not pushing off multihash decoding entirely to Digest but retaining some of it in CID. I'm not super happy about this but I think it's workable as long as we keep the test coverage solid (I've duplicated some of the Digest cases into CID to ensure that we're solid).

@rvagg rvagg force-pushed the rvagg/crypto-cid-length branch from 9d83ab8 to 80ba05f Compare January 21, 2021 03:12
@rvagg rvagg changed the base branch from crypto to master January 21, 2021 03:12
@rvagg
Copy link
Member Author

rvagg commented Jan 21, 2021

I've rebased this against master as a stand-alone commit that just adds these factory methods to CID.

For the record (so it can be re-applied later), the additional diff for the crypto PR #59 that I've removed is:

diff --git a/src/codecs/aes.js b/src/codecs/aes.js
index cd14de5..e46c236 100644
--- a/src/codecs/aes.js
+++ b/src/codecs/aes.js
@@ -3,24 +3,6 @@ import aes from 'js-crypto-aes'
 import CID from '../cid.js'
 import { codec } from './codec.js'

-const enc32 = value => {
-  value = +value
-  const buff = new Uint8Array(4)
-  buff[3] = (value >>> 24)
-  buff[2] = (value >>> 16)
-  buff[1] = (value >>> 8)
-  buff[0] = (value & 0xff)
-  return buff
-}
-
-const readUInt32LE = (buffer) => {
-  const offset = buffer.byteLength - 4
-  return ((buffer[offset]) |
-          (buffer[offset + 1] << 8) |
-          (buffer[offset + 2] << 16)) +
-          (buffer[offset + 3] * 0x1000000)
-}
-
 let code = 0x1400

 const concat = buffers => Uint8Array.from(buffers.map(b => [...b]).flat())
@@ -37,18 +19,17 @@ const mkcrypto = ({ name, code, ivsize }) => {
   const decrypt = async ({ key, value }) => {
     let { bytes, iv } = value
     bytes = await aes.decrypt(bytes, key, { name: name.toUpperCase(), iv, tagLength: 16 })
-    const len = readUInt32LE(bytes.subarray(0, 4))
-    const cid = CID.decode(bytes.subarray(4, 4 + len))
-    bytes = bytes.subarray(4 + len)
-    return { cid, bytes }
+    const [cid, remainder] = CID.decodeFirst(bytes)
+    return { cid, bytes: remainder }
   }
+
   const encrypt = async ({ key, cid, bytes }) => {
-    const len = enc32(cid.bytes.byteLength)
     const iv = random.getRandomBytes(ivsize)
-    const msg = concat([len, cid.bytes, bytes])
+    const msg = concat([cid.bytes, bytes])
     bytes = await aes.encrypt(msg, key, { name: name.toUpperCase(), iv, tagLength: 16 })
     return { bytes, iv }
   }
+
   return { encode, decode, encrypt, decrypt, code, name: name.toLowerCase() }
 }

@rvagg rvagg force-pushed the rvagg/crypto-cid-length branch from 80ba05f to 8e57fdd Compare January 21, 2021 03:23
@mikeal
Copy link
Contributor

mikeal commented Jan 21, 2021

Looks great! :)

@mikeal mikeal self-requested a review January 21, 2021 05:07
@rvagg rvagg merged commit 1f976fb into master Jan 22, 2021
@rvagg rvagg deleted the rvagg/crypto-cid-length branch January 22, 2021 05:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants