[IETF-readiness] Add Prior Art and Translation section, update deprecation FAQ entry #164

Open: wants to merge 7 commits into `master`
README.md: 64 changes (56 additions, 8 deletions)
@@ -18,14 +18,24 @@ It is useful to write applications that future-proof their use of hashes, and al
- [Format](#format)
- [Implementations](#implementations)
- [Table for Multihash](#table-for-multihash)
- [Prior Art And Translation](#prior-art-and-translation)
- [Named Information Hash](#named-information-hash)
- [Translation from multihash to named-information hash](#translation-from-multihash-to-named-information-hash)
- [Namespaced UUIDs](#namespaced-uuids)
- [Notes](#notes)
- [Multihash and randomness](#multihash-and-randomness)
- [Insecure / obsolete hash functions](#insecure--obsolete-hash-functions)
- [Non-cryptographic hash functions](#non-cryptographic-hash-functions)
- [Visual Examples](#visual-examples)
- [Maintainers](#maintainers)
- [Consider these 4 different hashes of same input](#consider-these-4-different-hashes-of-same-input)
- [Same length: 256 bits](#same-length-256-bits)
- [Different hash functions](#different-hash-functions)
- [Idea: self-describe the values to distinguish](#idea-self-describe-the-values-to-distinguish)
- [Multihash: fn code + length prefix](#multihash-fn-code--length-prefix)
- [Multihash: a pretty good multiformat](#multihash-a-pretty-good-multiformat)
- [Multihash: has a bunch of implementations already](#multihash-has-a-bunch-of-implementations-already)
- [Contribute](#contribute)
- [References](#references)
- [License](#license)

## Example
@@ -126,18 +136,45 @@ Yes, but we already have to agree on functions, so this is not hard. The table e

## Table for Multihash

We use a single [Multicodec][] table across all of our multiformat projects. The shared namespace reduces the chances of accidentally interpreting a code in the wrong context. Multihash entries are identified with a `multihash` value in the `tag` column.

The current table lives [here](https://github.com/multiformats/multicodec/blob/master/table.csv).
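As a rough illustration of how the `tag` column separates multihash entries from other codes, the Python sketch below filters a CSV in the shape of `table.csv`. The rows are a simplified excerpt (only the `name`, `tag` and `code` columns, and only a handful of entries), and `codes_with_tag` is a hypothetical helper, not part of any multiformats library.

```python
import csv
import io

# Simplified, illustrative excerpt of multicodec's table.csv: only the
# name, tag and code columns are reproduced, and only a few rows.
TABLE_EXCERPT = """name,tag,code
identity,multihash,0x00
sha1,multihash,0x11
sha2-256,multihash,0x12
raw,ipld,0x55
dag-pb,ipld,0x70
"""


def codes_with_tag(table_csv: str, tag: str = "multihash") -> dict[str, int]:
    """Return {name: code} for every row whose `tag` column equals `tag`."""
    reader = csv.DictReader(io.StringIO(table_csv), skipinitialspace=True)
    # Tolerate column padding in the header as well as in the rows.
    reader.fieldnames = [name.strip() for name in reader.fieldnames or []]
    return {
        row["name"].strip(): int(row["code"].strip(), 16)
        for row in reader
        if row["tag"].strip() == tag
    }


if __name__ == "__main__":
    for name, code in codes_with_tag(TABLE_EXCERPT).items():
        print(f"{name}: 0x{code:02x}")
```

Filtering on the shared table, rather than keeping a separate multihash list, is what lets a single namespace serve every multiformat.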

## Prior Art And Translation

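IETF's corpus of normative protocols contains two partial overlaps with multihash that are worth knowing about, for example when ingesting data of unknown provenance that may carry another self-describing hash convention instead of a multiformat prefix: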

to ensure a safe implementation

What does this mean?

Author:

meaning, if you're really brownfield or ingesting unknown data and you get something that isn't a multiformat, here are some other prefixes you might want to sniff for as fallback, that might have been put there by other IETF self-description conventions 😄 . any wordsmithing help appreciated!


* "Named Information Hash", a.k.a. [RFC-6920](https://datatracker.ietf.org/doc/html/rfc6920), defines a hierarchical URI scheme for content identifiers, partitioned by an enumerated set of hash functions. The IANA [NIH registry][] lists all of them.
* UUIDv5, a.k.a. "Namespaced UUIDs", defined in [RFC-9562](https://datatracker.ietf.org/doc/html/rfc9562#uuidv5), does the inverse: it defines a universal namespace for a single hash function (SHA-1), partitioned by the kind of name being hashed (e.g., DNS names, URLs, ISO OIDs, or X.500 DNs).
* The IANA [NIH registry][] has a similar shape and governance model to the IANA [hashAlgorithm registry][] that TLS 1.2 implementations use to compactly signal supported hash and signature combinations. Since the former has separate entries for some hash functions depending on output length (e.g., truncated SHA-256 variants) and the latter does not, the two registries cannot be aligned. Given their different contexts, however, collisions between the two are not a practical concern for users of either.

### Named Information Hash

The "Named Information Hash" URI scheme lets minimally self-describing hash strings serve as content identifiers for arbitrary binary inputs.
This lightweight identifier scheme is defined in [RFC-6920](https://datatracker.ietf.org/doc/html/rfc6920), and the supported hash algorithm identifiers live in the IANA [Named Information Hash Algorithm Registry][NIH registry].
Its syntactic similarity to HTTP headers and its [support for MIME content-types](https://datatracker.ietf.org/doc/html/rfc6920#section-3.1) make it potentially useful for web use-cases, but its use is not constrained by the URI scheme itself; the specification only suggests use-cases, in sections 3 through 7.
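A minimal Python sketch of building a Named Information Hash for some bytes with the `sha-256` algorithm (ID 1 in the NIH registry). It assumes RFC 6920's ASCII form (the digest base64url-encoded without `=` padding) and its binary form (one header octet holding two reserved zero bits and the 6-bit suite ID, followed by the digest). The function names are hypothetical, and optional parts such as the content-type query parameter are left out.

```python
import base64
import hashlib


def b64url_nopad(data: bytes) -> str:
    """base64url without trailing '=' padding, as used in ni URIs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def ni_sha256_uri(data: bytes, authority: str = "") -> str:
    """ASCII form: ni://<authority>/sha-256;<base64url digest>."""
    digest = hashlib.sha256(data).digest()
    return f"ni://{authority}/sha-256;{b64url_nopad(digest)}"


def ni_sha256_binary(data: bytes) -> bytes:
    """Binary form: header octet (2 reserved zero bits + 6-bit suite ID), then the digest."""
    suite_id = 1  # "sha-256" in the NIH registry
    return bytes([suite_id & 0x3F]) + hashlib.sha256(data).digest()


if __name__ == "__main__":
    print(ni_sha256_uri(b"Hello World!"))
    print(ni_sha256_binary(b"Hello World!").hex())
```

With an empty authority the URI starts with `ni:///`, which is the form used in the translation steps that follow.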

#### Translation from multihash to named-information hash

Translating a bare, binary multihash into a named-information hash in binary or ASCII form is fairly easy, since a generic tag for self-describing multihashes was proposed to the [NIH registry][] in [an appendix](https://www.ietf.org/archive/id/draft-multiformats-multihash-03.html#appendix-D.2) of the 2021 [multihash internet draft](https://www.ietf.org/archive/id/draft-multiformats-multihash-03.html). A bare multihash is the hash function code and the digest length, each encoded as an [`unsigned-varint`](https://github.com/multiformats/unsigned-varint) (i.e., a minimally-encoded ULEB128 under 64 bits in total length), followed by the raw digest bytes. To translate it:

1. Read the prefix (the varint hash function code and the varint digest length), strip it from the digest, and use the code to identify the hash function in the [Multicodec][] table.
2. If the hash function and digest length correspond to an entry in the [NIH registry][]:
   1. Translate the multicodec code to the NIH ID, e.g., `0x12` (`sha2-256`) in the `multicodec` table corresponds to `0x01` (`sha-256`) in the `named-information` registry.
   2. Keep the digest as-is: unlike the prefix, the digest is not varint-encoded, so it needs no transcoding.
   3. (For binary form:) prepend the NIH ID to the digest.
   4. (For ASCII form:) convert the NIH ID to its URI form, e.g., `ni:///sha-256;` for `0x01`, and append the base64url-encoded (no padding) digest.
3. If the multihash prefix does NOT map cleanly to a registered value in the [NIH registry][]:
   1. (For binary form:) prefix the existing binary multihash with `0x42` to designate that what follows is a complete multihash (multicodec prefix, digest length, and digest).
   2. (For ASCII form:) convert the `0x42` prefix to URI form, i.e., `ni:///mh;`, and append a base64url, no-padding encoding of the entire binary multihash, including its own prefix (and _without_ the base64url-no-padding prefix `u` that a [multibase](https://github.com/multiformats/multibase) library would add).
Comment on lines +168 to +169

Is this a spec proposal? Doesn't seem to be anywhere else and seems to effectively be a separate spec.

Is the idea that if we merge this to then go to the nih registry and request 0x42?

Side note: which number we use doesn't bother me but we used 0x42 for the CID tag in dag-cbor. I don't see NIH wanting CIDs more than multihashes so probably fine, but wanted to flag so it's documented here.

Author:

> Is the idea that if we merge this to then go to the nih registry and request 0x42?

that's the beauty of it:
https://www.ietf.org/archive/id/draft-multiformats-multihash-07.html#name-the-mh-digest-algorithm

(I assumed the 42 was an intentional nod to that other iana registration!)
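The translation steps above can be sketched in Python as follows. This is a minimal, non-authoritative sketch under several assumptions: `MULTICODEC_TO_NIH` holds only the `sha2-256` to `sha-256` example from step 2, only the ASCII `ni:` forms are produced (binary forms are left out), varints are not checked for minimal encoding, and `read_uvarint` and `multihash_to_ni` are hypothetical names.

```python
import base64
import hashlib

# multicodec code -> (NIH hash name, digest length in bytes).
# Only the sha2-256 example from step 2 is filled in; a fuller table
# would be built from the Multicodec table and the NIH registry.
MULTICODEC_TO_NIH = {0x12: ("sha-256", 32)}


def read_uvarint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned varint (ULEB128) at buf[pos]; return (value, next_pos).

    Does not reject non-minimal encodings.
    """
    value, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, pos
        shift += 7
        if shift >= 63:  # unsigned-varint allows at most 9 bytes
            raise ValueError("varint too long")


def b64url_nopad(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def multihash_to_ni(mh: bytes) -> str:
    """Translate a binary multihash into an ASCII ni URI with an empty authority."""
    code, pos = read_uvarint(mh, 0)
    length, pos = read_uvarint(mh, pos)
    digest = mh[pos:]  # the digest itself is raw bytes, not a varint
    if len(digest) != length:
        raise ValueError("digest length does not match the multihash prefix")
    mapped = MULTICODEC_TO_NIH.get(code)
    if mapped is not None and mapped[1] == length:
        # Step 2: swap the multihash prefix for the NIH algorithm name.
        return f"ni:///{mapped[0]};{b64url_nopad(digest)}"
    # Step 3: no clean mapping, so carry the whole multihash under "mh".
    return f"ni:///mh;{b64url_nopad(mh)}"


if __name__ == "__main__":
    data = b"hello world"
    sha256_mh = bytes([0x12, 0x20]) + hashlib.sha256(data).digest()
    sha1_mh = bytes([0x11, 0x14]) + hashlib.sha1(data).digest()
    print(multihash_to_ni(sha256_mh))  # mapped: ni:///sha-256;...
    print(multihash_to_ni(sha1_mh))    # unmapped: ni:///mh;...
```

In this sketch, a `sha2-256` multihash with a truncated digest also falls through to the `mh` form, because only the full-length entry is mapped; the NIH registry does list some truncated SHA-256 variants that a fuller table could use.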


### Namespaced UUIDs

Since the "Named Information Hash" URI scheme conforms to URL syntax (with or without an authority), each valid Named Information Hash URI can be assumed to be unique within the namespace of all valid URLs.
As such, any `ni://` URI (with or without an authority) can serve as the name from which a [UUIDv5](https://datatracker.ietf.org/doc/html/rfc9562#uuidv5) is derived in the URL namespace, whose namespace ID is `6ba7b811-9dad-11d1-80b4-00c04fd430c8` (see [section 6.6](https://datatracker.ietf.org/doc/html/rfc9562#namespaces)).

Since this approach relies on SHA-1 and keeps only 122 bits of its 160-bit output (the leading 128 bits, 6 of which are then overwritten by the version and variant fields), its security may not be adequate for all applications, as the specification notes.
Alternatives that still use a bounded namespace include registering a new UUIDv5 namespace, or using UUIDv8, whose layout is application-defined, to content-address arbitrary information with namespaced UUID variants.
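A minimal Python sketch of the UUIDv5 derivation described above, using the standard library's `uuid` module; `ni_uuid` and `uuid5_by_hand` are hypothetical helper names. The hand-rolled version spells out the SHA-1 truncation discussed above and should agree with `uuid.uuid5`.

```python
import base64
import hashlib
import uuid


def uuid5_by_hand(namespace: uuid.UUID, name: str) -> uuid.UUID:
    """UUIDv5: SHA-1 over namespace bytes + name, truncated to 128 bits."""
    digest = hashlib.sha1(namespace.bytes + name.encode("utf-8")).digest()  # 160 bits
    # Keep the first 16 bytes; passing version=5 makes UUID() overwrite the
    # 4 version bits and 2 variant bits, leaving 122 bits taken from the hash.
    return uuid.UUID(bytes=digest[:16], version=5)


def ni_uuid(ni_uri: str) -> uuid.UUID:
    """Name-based UUIDv5 for an ni URI in the URL namespace.

    uuid.NAMESPACE_URL is 6ba7b811-9dad-11d1-80b4-00c04fd430c8 (RFC 9562, section 6.6).
    """
    return uuid.uuid5(uuid.NAMESPACE_URL, ni_uri)


if __name__ == "__main__":
    digest = hashlib.sha256(b"Hello World!").digest()
    ni_uri = "ni:///sha-256;" + base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    u = ni_uuid(ni_uri)
    print(ni_uri, "->", u)
    print(u == uuid5_by_hand(uuid.NAMESPACE_URL, ni_uri))  # True: same construction
```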

## Notes

@@ -149,6 +186,9 @@ They disagree. :(

**Obsolete and deprecated hash functions are included** in this list. [MD4](https://en.wikipedia.org/wiki/MD4), [MD5](https://en.wikipedia.org/wiki/MD5) and [SHA-1](https://en.wikipedia.org/wiki/SHA-1) should no longer be used for cryptographic purposes, but since many such hashes already exist they are included in this specification and may be implemented in multihash libraries.

MD5 and SHA-1 were previously allowed as signature hash algorithms in TLS 1.2, as defined in [RFC5246](https://www.rfc-editor.org/rfc/rfc5246#section-1.2), and in DTLS 1.2, but that use was deprecated by [RFC9155](https://www.rfc-editor.org/rfc/rfc9155.html).
MD4 seems to have fallen out of favor even before TLS 1.2 was finalized at the IETF, and was officially moved to Historic status by [RFC-6150](https://www.rfc-editor.org/rfc/rfc6150).

### Non-cryptographic hash functions

Multihash is intended for *"well-established cryptographic hash functions"*, as **non-cryptographic hash functions are not suitable for content addressing systems**. However, there may be use-cases where it is desirable to identify non-cryptographic hash functions or their digests by use of a multihash. Non-cryptographic hash functions are identified in the [Multicodec table](https://github.com/multiformats/multicodec/blob/master/table.csv) with a `hash` value in the `tag` column.
@@ -195,6 +235,14 @@ Check out our [contributing document](https://github.com/multiformats/multiforma

Small note: If editing the README, please conform to the [standard-readme](https://github.com/RichardLitt/standard-readme) specification.

## References

The [Prior Art and Translation](#prior-art-and-translation) section is heavily indebted to an earlier 2024 blog post, ["The Secret of NIMHs: Naming Things with Multihashes"](https://bengo.is/blogging/the-secret-of-nimhs/), by GitHub user @gobengo.

[multicodec]: https://github.com/multiformats/multicodec
[NIH registry]: https://www.iana.org/assignments/named-information/named-information.xhtml#hash-alg
[hashAlgorithm registry]: https://www.iana.org/assignments/tls-parameters/tls-parameters.xhtml#tls-parameters-18

## License

This repository is only for documents. All of these are licensed under the [CC-BY-SA 3.0](https://ipfs.io/ipfs/QmVreNvKsQmQZ83T86cWSjPu2vR3yZHGPm5jnxFuunEB9u) license © 2016 Protocol Labs Inc. Any code is under a [MIT](LICENSE) © 2016 Protocol Labs Inc.