diff --git a/README.md b/README.md
index 25089c2..50f7680 100644
--- a/README.md
+++ b/README.md
@@ -18,14 +18,24 @@ It is useful to write applications that future-proof their use of hashes, and al
 - [Format](#format)
 - [Implementations](#implementations)
 - [Table for Multihash](#table-for-multihash)
-  - [Other Tables](#other-tables)
+- [Prior Art And Translation](#prior-art-and-translation)
+  - [Named Information Hash](#named-information-hash)
+    - [Translation from multihash to named-information hash](#translation-from-multihash-to-named-information-hash)
+  - [Namespaced UUIDs](#namespaced-uuids)
 - [Notes](#notes)
   - [Multihash and randomness](#multihash-and-randomness)
   - [Insecure / obsolete hash functions](#insecure--obsolete-hash-functions)
   - [Non-cryptographic hash functions](#non-cryptographic-hash-functions)
 - [Visual Examples](#visual-examples)
-- [Maintainers](#maintainers)
+  - [Consider these 4 different hashes of same input](#consider-these-4-different-hashes-of-same-input)
+  - [Same length: 256 bits](#same-length-256-bits)
+  - [Different hash functions](#different-hash-functions)
+  - [Idea: self-describe the values to distinguish](#idea-self-describe-the-values-to-distinguish)
+  - [Multihash: fn code + length prefix](#multihash-fn-code--length-prefix)
+  - [Multihash: a pretty good multiformat](#multihash-a-pretty-good-multiformat)
+  - [Multihash: has a bunch of implementations already](#multihash-has-a-bunch-of-implementations-already)
 - [Contribute](#contribute)
+- [References](#references)
 - [License](#license)
 
 ## Example
@@ -126,18 +136,45 @@ Yes, but we already have to agree on functions, so this is not hard. The table e
 
 ## Table for Multihash
 
-We use a single [Multicodec](https://github.com/multiformats/multicodec) table across all of our multiformat projects. The shared namespace reduces the chances of accidentally interpreting a code in the wrong context. Multihash entries are identified with a `multihash` value in the `tag` column.
+We use a single [Multicodec][] table across all of our multiformat projects. The shared namespace reduces the chances of accidentally interpreting a code in the wrong context. Multihash entries are identified with a `multihash` value in the `tag` column.
 
 The current table lives [here](https://github.com/multiformats/multicodec/blob/master/table.csv)
 
-### Other Tables
+## Prior Art And Translation
 
-Cannot find a good standard on this. Found some _different_ IANA ones:
+In the IETF's corpus of normative protocols, there are a few partial overlaps worth knowing about to ensure a safe implementation:
 
-- https://www.iana.org/assignments/tls-parameters/tls-parameters.xhtml#tls-parameters-18
-- http://tools.ietf.org/html/rfc6920#section-9.4
+* "Named Information Hash", a.k.a. [RFC-6920](https://datatracker.ietf.org/doc/html/rfc6920), defines a hierarchical URI scheme for content-identifiers, partitioned by enumerated hash functions. The [NIH registry][] at IANA contains all of these.
+* UUIDv5, a.k.a. "Namespaced UUIDs", defined in [RFC-9562](https://datatracker.ietf.org/doc/html/rfc9562#uuidv5), does the inverse, defining a universal namespace for one hash function, partitioned by the application of that function to multiple URI schemes (e.g., DNS names, valid URLs, etc.).
+* The IANA [NIH registry][] has a similar shape and governance mode to the IANA [hashAlgorithm registry][] that TLS 1.2 implementations use to compactly signal supported hash+signature combinations. Since the former has different entries for some hash functions based on output length and the latter does not, the two registries are not alignable. However, given their different contexts, collisions between the two would not be a practical concern for users of either.
 
-They disagree. :(
+### Named Information Hash
+
+The "Named Information Hash" URI scheme allows for minimally self-describing hash strings to serve as content-identifiers for arbitrary binary inputs.
+This lightweight identifier scheme is defined in [RFC-6920](https://datatracker.ietf.org/doc/html/rfc6920) and the supported hash-context prefixes live in the IANA ["Named Information Hash Algorithm Registry"](https://www.iana.org/assignments/named-information/named-information.xhtml#hash-alg).
+Its syntactic similarity to HTTP headers and [support for MIME content-types](https://datatracker.ietf.org/doc/html/rfc6920#section-3.1) make it potentially useful for web use-cases, but use-cases are not constrained by the URI scheme, only hinted at by the specification in sections 3 through 7.
+
+#### Translation from multihash to named-information hash
+
+Translating from a bare, binary multihash (i.e., a hash value in [`unsigned_varint`](https://github.com/multiformats/unsigned-varint) format, a minimally-encoded ULEB128 under 64 bits in total length) to a named-information hash in binary format is fairly easy, because a generic tag for self-describing multihashes was proposed to the [NIH registry][] in [Appendix B](https://www.ietf.org/archive/id/draft-multiformats-multihash-03.html#appendix-D.2) of the 2021 [multihash internet draft](https://www.ietf.org/archive/id/draft-multiformats-multihash-03.html):
+
+1. Strip the prefix bytes from the hash value and use the prefix bytes to identify the hash function used from the [Multicodec][] table.
+2. If the multihash prefix corresponds to a tag in the [NIH registry][]:
+   1. translate the multicodec tag to the NIH tag, e.g., `0x12` (`sha2-256`) in the `multicodec` registry becomes `0x01` (`sha-256`) in the `named-information` registry
+   2. transcode the hash value from [`unsigned_varint`](https://github.com/multiformats/unsigned-varint) to standard MSB binary
+   3. (for binary form:) reattach the new prefix to the transcoded hash value
+   4. (for ASCII form:) convert the prefix to URL format, e.g., `ni:///sha-256;` for `0x01`, and reattach it to the base64url-encoded (no padding) transcoded hash value
+3. If the multihash prefix does NOT map cleanly to a registered value in the [NIH registry][]:
+   1. (for binary form:) prefix the existing binary multihash with `0x42` to designate that what follows is a multicodec prefix followed by a ULEB128 hash value.
+   2. (for ASCII form:) convert the `0x42` prefix to URL format, i.e., `ni:///mh;`, and then append a base64url, no-padding encoding of the entire binary multihash with prefix (_without_ adding the additional base-64-url-no-padding prefix, `u`, if using a [multibase](https://github.com/multiformats/multibase) library for this base-encoding).
+
+### Namespaced UUIDs
+
+Since the "Named Information Hash" URI scheme conforms to URL syntax (with or without an authority), each valid Named Information Hash URI can be assumed to be unique within the namespace of all valid URLs.
+As such, any `ni://` URL (with or without an authority) can be hashed and used as a [UUIDv5](https://datatracker.ietf.org/doc/html/rfc9562#uuidv5) in the URL namespace, i.e. `6ba7b811-9dad-11d1-80b4-00c04fd430c8` (see [section 6.6](https://datatracker.ietf.org/doc/html/rfc9562#namespaces)).
+
+Since this approach relies on SHA-1, and discards all but the most significant 128 bits of the hash output, its security may not be adequate for all applications, as noted in the specification.
+Alternative ways of using a bounded namespace could include a novel namespace registration for UUIDv5, or a UUIDv8 approach, to content-address arbitrary information with namespaced UUID variants.
 
 ## Notes
 
@@ -149,6 +186,9 @@ They disagree. :(
 
 **Obsolete and deprecated hash functions are included** in this list. [MD4](https://en.wikipedia.org/wiki/MD4), [MD5](https://en.wikipedia.org/wiki/MD5) and [SHA-1](https://en.wikipedia.org/wiki/SHA-1) should no longer be used for cryptographic purposes, but since many such hashes already exist they are included in this specification and may be implemented in multihash libraries.
 
+MD5 and SHA-1 were previously used in TLS 1.2, as defined in [RFC-5246](https://www.rfc-editor.org/rfc/rfc5246#section-1.2), and in DTLS 1.2, but were later deprecated by [RFC-9155](https://www.rfc-editor.org/rfc/rfc9155.html).
+MD4 seems to have gone out of favor even before TLS 1.2 was finalized at the IETF, and was officially moved to Historic status by [RFC-6150](https://www.rfc-editor.org/rfc/rfc6150).
+
 ### Non-cryptographic hash functions
 
 Multihash is intended for *"well-established cryptographic hash functions"* as **non-cryptographic hash functions are not suitable for content addressing systems**. However, there may be use-cases where it is desireable to identify non-cryptographic hash functions or their digests by use of a multihash. Non-cryptographic hash functions are identified in the [Multicodec table](https://github.com/multiformats/multicodec/blob/master/table.csv) with a tag `hash` value in the `tag` column.
@@ -195,6 +235,14 @@ Check out our [contributing document](https://github.com/multiformats/multiforma
 
 Small note: If editing the README, please conform to the [standard-readme](https://github.com/RichardLitt/standard-readme) specification.
 
+## References
+
+The [Prior Art and Translation](#prior-art-and-translation) section is heavily indebted to an earlier 2024 blog post, ["The Secret of NIMHs: Naming Things with Multihashes"](https://bengo.is/blogging/the-secret-of-nimhs/), by GitHub user @gobengo.
+
+[multicodec]: https://github.com/multiformats/multicodec
+[NIH registry]: https://www.iana.org/assignments/named-information/named-information.xhtml#hash-alg
+[hashAlgorithm registry]: https://www.iana.org/assignments/tls-parameters/tls-parameters.xhtml#tls-parameters-18
+
 ## License
 
 This repository is only for documents. All of these are licensed under the [CC-BY-SA 3.0](https://ipfs.io/ipfs/QmVreNvKsQmQZ83T86cWSjPu2vR3yZHGPm5jnxFuunEB9u) license © 2016 Protocol Labs Inc. Any code is under a [MIT](LICENSE) © 2016 Protocol Labs Inc.
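
Review note: the translation steps this diff adds under "Translation from multihash to named-information hash", and the UUIDv5 idea under "Namespaced UUIDs", can be sketched in Python roughly as below. This is a minimal sketch under stated assumptions, not a reference implementation. It assumes the usual multihash layout (varint function code, varint digest length, then raw digest bytes), so the digest is copied across unchanged. The names `MULTICODEC_TO_NIH`, `read_varint`, `multihash_to_ni` and `ni_to_uuid5` are hypothetical, the mapping table holds only the one pairing quoted in the diff, and the `0x42` / `mh` tag is the proposed value described in the diff text.

```python
import base64
import hashlib
import uuid
from typing import Tuple

# Hypothetical, partial map: multicodec code -> (NIH suite ID, NIH name).
# Only the sha2-256 pairing is taken from the text of this diff.
MULTICODEC_TO_NIH = {0x12: (0x01, "sha-256")}

# Generic "mh" tag for self-describing multihashes, as described in the diff.
NIH_MULTIHASH_SUITE_ID = 0x42


def read_varint(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one unsigned varint; return (value, position after it)."""
    value, shift = 0, 0
    while True:
        if shift > 56:
            raise ValueError("varint longer than 9 bytes")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def b64url_nopad(data: bytes) -> str:
    """base64url without '=' padding, as used in ni: URIs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def multihash_to_ni(mh: bytes) -> Tuple[bytes, str]:
    """Return (binary NI form, ni: URI) for a binary multihash."""
    code, pos = read_varint(mh)
    length, pos = read_varint(mh, pos)
    digest = mh[pos:]
    if len(digest) != length:
        raise ValueError("digest length does not match multihash header")
    if code in MULTICODEC_TO_NIH:
        # Step 2: swap the multicodec prefix for the registered NIH tag.
        suite_id, name = MULTICODEC_TO_NIH[code]
        return bytes([suite_id]) + digest, f"ni:///{name};{b64url_nopad(digest)}"
    # Step 3: no registered NIH tag, so wrap the whole multihash under "mh".
    return bytes([NIH_MULTIHASH_SUITE_ID]) + mh, f"ni:///mh;{b64url_nopad(mh)}"


def ni_to_uuid5(ni_url: str) -> uuid.UUID:
    """Derive a UUIDv5 from an ni: URI in the standard URL namespace."""
    return uuid.uuid5(uuid.NAMESPACE_URL, ni_url)


if __name__ == "__main__":
    digest = hashlib.sha256(b"Hello World!").digest()
    mh = bytes([0x12, len(digest)]) + digest  # sha2-256 multihash
    binary, url = multihash_to_ni(mh)
    print(url)
    print(ni_to_uuid5(url))
```

Treating the digest as opaque bytes keeps the two branches symmetric: only the prefix changes in the registered case, and nothing inside the multihash changes in the fallback case.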