add CID.encodedLength() & remove leading uint32 CID length from encoded aes form #60

rvagg · 2021-01-14T11:04:25Z

There's two commits in here, the first one could go against master but it's used for the second one which is intended for crypto, #349. If this is an acceptable approach I'll write up some additional tests to exercise encodedLength() a bit more (CIDv0 and bad bytes - and these would be good to do with the aes tests too).

Ref: ipld/specs#349 (comment)

rvagg · 2021-01-14T11:07:50Z

Re performance - this means there's a double-decode of 3 varints, once to figure out the length and another to do the decode. An alternative method that would avoid this is to allow CID.decode() to be sloppy with the bytes it accepts. Currently it borks at Digest.decode() which assumes the right number of bytes. We could add a flag to both decode functions that lets it be sloppy, but in terms of clean API I prefer the encodedLength() approach and just eating the double decode.

mikeal · 2021-01-20T19:04:11Z

Re performance

Instead of changing decode() we could add another method (CID.fromBytes()) that returns a pair of [ CID, remainingBytes ] for parsing CID’s out of larger buffers.

rvagg · 2021-01-20T22:58:59Z

The use-case I have for this in @ipld/car is streaming - I can get access to a fixed number of bytes but I don't know how many I need for a CID, it could be quite large. So I need to know ahead of time how many bytes I'll need for the whole lot and do this by ensuring I have enough bytes to look in the 3 varints and then work out the total and then I can ensure I have at least that many bytes. So for me the encodedBytes() is perfect as it can act like a peek to tell me how many bytes I need to buffer. I could give it ~10 bytes and it'll give me a the answer I need. A fromBytes() isn't useful at all for the streaming case as it assumes that you're dealing with all of the bytes that there are.

I'm not saying fromBytes() is a bad idea, it just doesn't fit the streaming case. So we either deal with the two cases separately or try and make a single solution that fits both.

Some of the relevant code for the streaming case is here https://github.com/ipld/js-car/blob/16d11873a33ae1935beb17847de6fbb90faba1d3/lib/decoder.js#L53-L69 . Note how I have a wrapper around the incoming stream data that lets me request exactly(size), so I can ensure sufficient buffering and cross chunk boundaries transparently. It would be nice to upstream the CID/multihash portions of this, but not essential yet as I don't have a concrete second use-case for it yet, but it'll come I'm sure.

rvagg · 2021-01-20T23:05:36Z

There's a why not both? in here perhaps - maybe an inspectBytes() that returns an object that has the initial varints and a calculation of the total length. CID.inspectBytes(initialBytes) -> { version, codec, multihashCode, multihashLength, prefixLength } - I could use that to combine prefixLength and multihashLength to figure out what I need. Internally this could be the start of decode() and fromBytes() so we reduce varint decode duplication. And it could also be used in cases where you want to peek into a CID without fully decoding it to check on it -- perhaps you want to check if the bytes you have are a DAG-CBOR+Blake2-256 and that's all you care about.

mikeal · 2021-01-21T00:15:06Z

Ya, probably best to just return all the data you’ve parsed that could possibly be useful since you went and parsed it already anyway :)

…

-Mikeal

________________________________ From: Rod Vagg <[email protected]> Sent: Wednesday, January 20, 2021 3:05:51 PM To: multiformats/js-multiformats <[email protected]> Cc: Mikeal Rogers <[email protected]>; Comment <[email protected]> Subject: Re: [multiformats/js-multiformats] add CID.encodedLength() & remove leading uint32 CID length from encoded aes form (#60) There's a why not both? in here perhaps - maybe an inspectBytes() that returns an object that has the initial varints and a calculation of the total length. CID.inspectBytes(initialBytes) -> { version, codec, multihashCode, multihashLength, prefixLength } - I could use that to combine prefixLength and multihashLength to figure out what I need. Internally this could be the start of decode() and fromBytes() so we reduce varint decode duplication. And it could also be used in cases where you want to peek into a CID without fully decoding it to check on it -- perhaps you want to check if the bytes you have are a DAG-CBOR+Blake2-256 and that's all you care about. — You are receiving this because you commented. Reply to this email directly, view it on GitHub<#60 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AAAAEQ7AKSP3XL3U7IH37QTS25OU7ANCNFSM4WCIFXJA>.

rvagg · 2021-01-21T03:07:42Z

OK, so I'm trying to reconcile API complexity and naming concerns here, but this is what I've come up with:

CID.decode(bytes): the existing API, unchanged externally but heavily modified internally
CID.decodeFirst(bytes): same as CID.decode() but returns [cid:CID, remainder:Uint8Array] and can be used for this non-streaming case where you have all the bites and you want to chomp off the CID from the start.
CID.inspectBytes(initialBytes): provide it with the first 6-10 (or more) bytes of a CID and you'll get: { version, codec, multihashCode, digestSize, multihashSize, size } (all numbers). Which is useful for the streaming case, and is used internally for these other methods.

One major change internally is that it now bypasses Digest.decode() and constructs new Digest() directly because it has everything needed to dissect the multihash bytes. This is a little unfortunate because there's a tiny amount of duplication now and we're not pushing off multihash decoding entirely to Digest but retaining some of it in CID. I'm not super happy about this but I think it's workable as long as we keep the test coverage solid (I've duplicated some of the Digest cases into CID to ensure that we're solid).

rvagg · 2021-01-21T03:14:10Z

I've rebased this against master as a stand-alone commit that just adds these factory methods to CID.

For the record (so it can be re-applied later), the additional diff for the crypto PR #59 that I've removed is:

diff --git a/src/codecs/aes.js b/src/codecs/aes.js
index cd14de5..e46c236 100644
--- a/src/codecs/aes.js
+++ b/src/codecs/aes.js
@@ -3,24 +3,6 @@ import aes from 'js-crypto-aes'
 import CID from '../cid.js'
 import { codec } from './codec.js'

-const enc32 = value => {
-  value = +value
-  const buff = new Uint8Array(4)
-  buff[3] = (value >>> 24)
-  buff[2] = (value >>> 16)
-  buff[1] = (value >>> 8)
-  buff[0] = (value & 0xff)
-  return buff
-}
-
-const readUInt32LE = (buffer) => {
-  const offset = buffer.byteLength - 4
-  return ((buffer[offset]) |
-          (buffer[offset + 1] << 8) |
-          (buffer[offset + 2] << 16)) +
-          (buffer[offset + 3] * 0x1000000)
-}
-
 let code = 0x1400

 const concat = buffers => Uint8Array.from(buffers.map(b => [...b]).flat())
@@ -37,18 +19,17 @@ const mkcrypto = ({ name, code, ivsize }) => {
   const decrypt = async ({ key, value }) => {
     let { bytes, iv } = value
     bytes = await aes.decrypt(bytes, key, { name: name.toUpperCase(), iv, tagLength: 16 })
-    const len = readUInt32LE(bytes.subarray(0, 4))
-    const cid = CID.decode(bytes.subarray(4, 4 + len))
-    bytes = bytes.subarray(4 + len)
-    return { cid, bytes }
+    const [cid, remainder] = CID.decodeFirst(bytes)
+    return { cid, bytes: remainder }
   }
+
   const encrypt = async ({ key, cid, bytes }) => {
-    const len = enc32(cid.bytes.byteLength)
     const iv = random.getRandomBytes(ivsize)
-    const msg = concat([len, cid.bytes, bytes])
+    const msg = concat([cid.bytes, bytes])
     bytes = await aes.encrypt(msg, key, { name: name.toUpperCase(), iv, tagLength: 16 })
     return { bytes, iv }
   }
+
   return { encode, decode, encrypt, decrypt, code, name: name.toLowerCase() }
 }

mikeal · 2021-01-21T05:07:14Z

Looks great! :)

rvagg mentioned this pull request Jan 14, 2021

codec: AES encrypted blocks ipld/specs#349

Open

rvagg force-pushed the rvagg/crypto-cid-length branch from 09a6573 to 9d83ab8 Compare January 21, 2021 03:02

rvagg force-pushed the rvagg/crypto-cid-length branch from 9d83ab8 to 80ba05f Compare January 21, 2021 03:12

rvagg changed the base branch from crypto to master January 21, 2021 03:12

feat: CID.inspectBytes() and CID.decodeFirst()

8e57fdd

rvagg force-pushed the rvagg/crypto-cid-length branch from 80ba05f to 8e57fdd Compare January 21, 2021 03:23

mikeal self-requested a review January 21, 2021 05:07

rvagg merged commit 1f976fb into master Jan 22, 2021

rvagg deleted the rvagg/crypto-cid-length branch January 22, 2021 05:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add CID.encodedLength() & remove leading uint32 CID length from encoded aes form #60

add CID.encodedLength() & remove leading uint32 CID length from encoded aes form #60

rvagg commented Jan 14, 2021

rvagg commented Jan 14, 2021

mikeal commented Jan 20, 2021

rvagg commented Jan 20, 2021

rvagg commented Jan 20, 2021

mikeal commented Jan 21, 2021 via email

rvagg commented Jan 21, 2021

rvagg commented Jan 21, 2021

mikeal commented Jan 21, 2021

add CID.encodedLength() & remove leading uint32 CID length from encoded aes form #60

add CID.encodedLength() & remove leading uint32 CID length from encoded aes form #60

Conversation

rvagg commented Jan 14, 2021

rvagg commented Jan 14, 2021

mikeal commented Jan 20, 2021

rvagg commented Jan 20, 2021

rvagg commented Jan 20, 2021

mikeal commented Jan 21, 2021 via email

rvagg commented Jan 21, 2021

rvagg commented Jan 21, 2021

mikeal commented Jan 21, 2021