Order of fields in asset definition file is significant for hashing #15
Comments
Good points.
@jeffomatic what is the practical use case where checking/enforcing some sort of canonical encoding is important? If I understand correctly, it's not only the ordering of fields that matters; the Unicode encoding must also have some canonical form. As of now, OA clients just need to preserve the original binary file, so no one cares how exactly it is formatted (we don't have malleability issues like Bitcoin transactions do). This proposal requires every client to do additional checks, or to sort and canonically encode the data, to be sure it matches the hash. Not every language/platform has appropriate Unicode equivalence conversion functions, or a correct implementation of them. Also, sorting JSON object keys is not trivial; I haven't yet seen a library that exposes an API for that. In addition to the above, a good chunk of existing, highly important assets use pretty-printed JSON formatting, which probably won't be compatible with any kind of strict encoding rules.
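For a sense of what that extra work would involve, here is a minimal sketch in Python (purely illustrative, not part of any Open Assets spec, and using SHA-256 only as a placeholder digest) of a canonical encoding step: recursive key sorting, whitespace stripping, and NFC normalization of every string. Even this ignores harder cases such as number formatting, which is part of why a strict rule may be impractical.

```python
import hashlib
import json
import unicodedata

def normalize_strings(value):
    # Recursively apply Unicode NFC normalization to every string in the structure.
    if isinstance(value, str):
        return unicodedata.normalize("NFC", value)
    if isinstance(value, list):
        return [normalize_strings(v) for v in value]
    if isinstance(value, dict):
        return {normalize_strings(k): normalize_strings(v) for k, v in value.items()}
    return value

def canonical_hash(definition):
    # Sort keys and strip insignificant whitespace so equivalent data always
    # serializes to the same bytes, then hash the UTF-8 encoding.
    canonical = json.dumps(normalize_strings(definition), sort_keys=True,
                           separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```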
@oleganza Yeah, good observations! Even if canonicalizing the JSON turns out to be impossible, I think it would be nice to update the documentation to make the hashing requirements explicit.
The rest of this message might be purely academic. Regarding your concerns about practicality, I think the issues you point out with canonicalizing JSON are very real, and could be deal-breakers. But if we magically had some kind of canonical (if imperfect) serialization method, I'm tempted to think it would be worth using.

Without such a procedure, we put a burden on every server app and every client app to persist, render, and retrieve the asset definition in a specific way. In particular, regardless of how servers store the asset definition (e.g. as multiple columns in a SQL table, a MongoDB document, etc.), you'd want to make sure they also persist the original JSON message used to generate the hash digest at the time of issuance, AND make sure they deliver that original JSON message instead of rendering the structured data through an arbitrary pretty-printer.

Likewise, clients would need to keep track of the raw HTTP body of the asset definition before proceeding with any kind of deserialization, which means that dynamic-language applications need to be careful about using HTTP request libraries that perform automatic deserialization (which is a pretty widespread practice).

I'm not sure if this is such a huge deal, so to reiterate: I think even making some adjustments to the documentation would be beneficial.
The best bug-free way to make it work is to hash the bytes provided by the asset definition URL. I think it is correct to say that the field ordering is not important, and to state explicitly that the hash refers to the raw bytes served at the asset definition URL.
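A minimal sketch of that approach, assuming SHA-256 purely for illustration (substitute whichever digest the protocol actually specifies): the client hashes the exact bytes returned by the URL and only parses them afterwards.

```python
import hashlib
import json
import urllib.request

def verify_asset_definition(url, expected_hash):
    # Hash the raw bytes exactly as served, before any JSON parsing,
    # so formatting and field order cannot affect the result.
    with urllib.request.urlopen(url) as response:
        raw = response.read()
    if hashlib.sha256(raw).hexdigest() != expected_hash:
        raise ValueError("asset definition does not match the committed hash")
    # Only deserialize after the raw bytes have been verified.
    return json.loads(raw)
```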
This is a subtle point, but if we're going to use hashing for verifying the asset definition file, then it's probably best that we define a strict ordering and formatting of the JSON so that idiosyncrasies in JSON serialization and/or rendering don't affect the value of the hash.
Given that the asset definition file is JSON, it's likely to be stored as such, and possibly deserialized and re-serialized before being rendered, or after being fetched for verification. JSON as a format doesn't care about indentation, newlines, or field ordering, but the hash operation does care about those things.
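To make the problem concrete, here is a small Python illustration (the field names and SHA-256 are chosen only for the demonstration): both strings describe the same asset definition, but only the original bytes reproduce the original digest.

```python
import hashlib
import json

original = '{"name": "Example asset", "issuer": "Example Inc."}'

# Round-tripping through a JSON library preserves the semantics but not the
# bytes: indentation, spacing, and key order are all up to the serializer.
reserialized = json.dumps(json.loads(original), indent=2)

print(hashlib.sha256(original.encode("utf-8")).hexdigest())
print(hashlib.sha256(reserialized.encode("utf-8")).hexdigest())  # different digest
```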
Off the top of my head, there are a couple of options:
- Treat keys with `null` values as if the key was removed, and then take the hash (see the sketch after this list).
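A rough sketch of that option, reusing the deterministic-rendering idea from above and SHA-256 only as a placeholder digest:

```python
import hashlib
import json

def strip_nulls(value):
    # Drop any object key whose value is null, recursing into nested objects
    # and arrays, so {"a": 1, "b": null} hashes the same as {"a": 1}.
    if isinstance(value, dict):
        return {k: strip_nulls(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [strip_nulls(v) for v in value]
    return value

def hash_without_nulls(definition):
    # Combine null removal with a deterministic rendering before hashing.
    rendered = json.dumps(strip_nulls(definition), sort_keys=True,
                          separators=(",", ":"))
    return hashlib.sha256(rendered.encode("utf-8")).hexdigest()
```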